University of Turku

The University of Turku participates in sharing resources developed in the context of two thematic areas: Digital Futures, and Cultural Memory and Social Change. It hosts and supports the digitisation of the Archives of History, Culture and Arts Studies, and it hosts the Agricola portal and the Digilang portal for language corpora. The DARIAH node is led by two research groups, TurkuNLP and the Turku Data Science Group, known for compiling language corpora and for digital curation in the humanities.

Website

https://www.utu.fi/en

Contact

Veronika Laippala
Digital Language Studies

Affiliated Groups

TurkuNLP

TurkuNLP is a leading Finnish research group in natural language processing (NLP) and digital language studies. Known for its strong interdisciplinary focus, ranging from digital humanities to BioNLP and large language models, the group develops advanced language technology tools and resources and runs a number of projects targeting language use in digital collections. TurkuNLP is internationally recognized for its work on dependency parsing, with widely used tools such as the Turku Neural Parser Pipeline and contributions to Universal Dependencies. TurkuNLP also works on the development of large language models such as FinGPT and Viking, and is a pioneer in structuring web corpora through web register (genre) identification.

Expertise

language technology

digital language studies

NLP

Large Language Models (LLMs)

Contact

DARIAH node

Turku Data Science Group

The Turku Data Science Group is a leading Finnish research group in computational humanities. Its main research focus is on the analysis of complex natural and social systems using state-of-the-art approaches in scientific data analysis. The group has developed data science frameworks to advance computational analyses of cultural production, population studies, and ecological systems, and runs a number of internationally recognized projects in these areas. The team coordinates the international rOpenGov developer network (ropengov.org), which provides open data science methods for open government data.

Expertise

computational humanities

Contact

DARIAH node

Tools

Tools to make sense of web data

This resource consists of two tools: one to classify toxic data in Finnish (e.g., insults, obscene language) from datasets retrieved from social media platforms; and another to identify registers (genres, e.g., reviews, interviews, news reports) from web content in diverse languages.

Toxicity classifier: https://github.com/TurkuNLP/toxicity-classifier
Multilingual modeling of web registers: https://github.com/TurkuNLP/multilingual-register-labeling

Resource developed by TurkuNLP / University of Turku in partnership with CSC.
Guidance can be found on the websites of the resources.

Tutorial/Demo

https://youtu.be/q8kOJB6nA2M?feature=shared

Developed by

University of Turku

Contact

Veronika Laippala

Text Network Tools for Parliamentary Data

This resource provides network-analysis tools for the study of political text. With these tools, researchers can, for example, analyze keyword embeddings of the FinParl corpus and identify how phrases or longer text passages are re-used over time in MPs' plenary debates in the Finnish Parliament.

KWIC keyword tool for FinParl corpus: http://finparl-01.utu.fi/apps/KWIC/
TNA tool for the analysis of speeches of Finnish MPs: http://finparl-01.utu.fi/apps/TNA
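
As a rough illustration of what detecting text re-use can involve, the following minimal Python sketch indexes word n-grams ("shingles") and reports those that occur in more than one speech. It is not a description of how the TNA tool works; the function names, the n-gram length, and the toy speeches are all hypothetical and are not taken from the FinParl corpus.

```python
from collections import defaultdict


def shingles(text, n=5):
    """Return the set of word n-grams (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def find_reuse(speeches, n=5):
    """Map each n-gram to the sorted ids of the speeches that contain it,
    keeping only n-grams that occur in more than one speech."""
    index = defaultdict(set)
    for speech_id, text in speeches.items():
        for gram in shingles(text, n):
            index[gram].add(speech_id)
    return {gram: sorted(ids) for gram, ids in index.items() if len(ids) > 1}


# Hypothetical toy data: the keys are made-up year/speaker labels,
# not real FinParl records.
speeches = {
    "1995-A": "we must secure the future of the welfare state for all citizens",
    "2003-B": "the government will secure the future of the welfare state today",
    "2010-C": "taxation should be simple and fair",
}

reused = find_reuse(speeches, n=5)
for gram, ids in sorted(reused.items()):
    print(" ".join(gram), "->", ids)
```

Because the labels start with a year, the sorted ids list the speeches in chronological order, which gives a crude view of how a passage travels over time. A real corpus would need proper tokenisation and fuzzy matching, which this sketch leaves out.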

Resource developed by the University of Turku in partnership with Aalto University. Collaborators: the University of Jyväskylä.

Tutorial/Demo

https://youtu.be/_5CCKOnOSfg?feature=shared

Developed by

University of Turku

Contact

Kimmo Elo

Harmonized Finnish National Bibliography

This resource provides a harmonized version of the Finnish national bibliography (Fennica) dataset, as well as the code used for cleaning and enriching the data and for automatically generating reports on it. With this resource, researchers can extract bibliographic metadata for large-scale statistical analysis.

Access to resource: https://fennica-fennica.2.rahtiapp.fi/
Code used to harmonize metadata: https://github.com/fennicahub/fennica

Information and guidance can be found on the website of the resource.
This resource has been developed by the University of Turku in partnership with the University of Helsinki. Collaborators: National Library of Finland, University of Jyväskylä.

Tutorial/Demo

https://youtu.be/9QYKQ1IYIjQ?feature=shared

Developed by

University of Turku

Contact

Leo Lahti

Automated Harmonisation and Enrichment of Metadata

This resource provides R packages for collecting and enriching Finnish cultural heritage metadata. The finna R package collects cultural metadata using the Finna API, the finto R package enriches the metadata using the Finto API of the Finto service, and the geofi R package supports geospatial analysis and visualization of metadata. These tools are designed to offer cultural heritage researchers easy access to metadata, along with geospatial analysis and visualization.

Finna R package: https://github.com/fennicahub/finna
Finto R package: https://github.com/fennicahub/finto
Geofi R package: https://github.com/rOpenGov/geofi

Information and guidance can be found on the websites of the resources.

This resource has been developed by the University of Turku. Collaborators: National Library of Finland.

Tutorial/Demo

Developed by

University of Turku

Contact

Leo Lahti

Teaching and Training

Introduction to Digital Humanities

(Course in Finnish) The course offers a broad overview of the digital humanities and its subfields. It examines in particular the significance and impact of digitality in humanities research, and introduces both qualitative and quantitative methods for approaching digitality and digital material. Themes, tools and methods covered include, for example, text mining, network analysis, data visualisation, digital interaction and computer-assisted authorship attribution.

Bachelor’s level

Digital Humanities for Literary Studies

(Course in Finnish) The student becomes familiar with digital humanities as it relates to literary studies, both theoretically and through research articles.

Bachelor’s level

Master’s Degree Programme in Information and Communication Technology: Data Analytics

The Master’s Degree Programme in Information and Communication Technology provides versatile and high quality ICT education in selected fields of ICT, with an established reputation in innovative, interdisciplinary, and international education.

Master’s level

Master’s Degree in Digital Linguistics

Digital language studies is a multidisciplinary field of research that examines language use in digital environments and combines methods from linguistics and natural language processing (NLP). A Master’s Degree in Digital Linguistics can be obtained from the Language Specialist Degree Programme offered by the School of Languages and Translation Studies.

Master’s level

Doctoral Programme in Languages and Translation Studies (Utuling)

The doctoral researchers’ areas of research belong to linguistic, translation and literary scholarship and to other humanistic studies, and their approach is often multi- or cross-disciplinary.

Doctoral level