
University of Turku
The University of Turku participates in sharing resources developed in the context of two thematic areas: Digital Futures, and Cultural Memory and Social Change. It hosts and supports the digitisation of the Archives of History, Culture and Arts Studies, and it hosts the Agricola portal and the Digilang portal for language corpora. The DARIAH node is led by two research groups, TurkuNLP and the Turku Data Science group, known for compiling language corpora and for digital curation in the humanities.
Website
Contact

Veronika Laippala
Digital Language Studies
Affiliated Groups

TurkuNLP is a leading Finnish research group in natural language processing (NLP) and digital language studies. Known for its strong interdisciplinary focus, ranging from digital humanities to BioNLP and large language models, the group develops advanced language technology tools and resources and runs a number of projects targeting language use in digital collections. TurkuNLP is internationally recognized for its work on dependency parsing, with widely used tools such as the Turku Neural Parser Pipeline and contributions to Universal Dependencies. TurkuNLP also works on the development of large language models such as FinGPT and Viking, and is a pioneer in structuring web corpora through web register (genre) identification.
Expertise
Contact
DARIAH node

The Turku Data Science group is a leading Finnish research group in computational humanities. Its main research focus is the analysis of complex natural and social systems using state-of-the-art approaches in scientific data analysis. The group has developed data science frameworks to advance computational analyses of cultural production, population studies, and ecological systems, and runs a number of internationally recognized projects in these areas. The team coordinates the international rOpenGov developer network (ropengov.org), which provides open data science methods for open government data.
Expertise
Contact
DARIAH node
Tools
Tools to make sense of web data
This resource consists of two tools: one to classify toxic data in Finnish (e.g., insults, obscene language) from datasets retrieved from social media platforms; and another to identify registers (genres, e.g., reviews, interviews, news reports) from web content in diverse languages.
Toxicity classifier: https://github.com/TurkuNLP/toxicity-classifier
Multilingual modeling of web registers: https://github.com/TurkuNLP/multilingual-register-labeling
Resource developed by TurkuNLP (University of Turku) in partnership with CSC.
Guidance can be found on the websites of the resources.
Tutorial/Demo
Developed by
Contact
Text Network Tools for Parliamentary Data
This resource provides network-analysis tools for studying political text. With these tools, researchers can, for example, analyze keyword embeddings of the FinParl corpus and identify how phrases or longer text passages are reused over time in MPs' plenary debates in the Finnish parliament.
KWIC keyword tool for FinParl corpus: http://finparl-01.utu.fi/apps/KWIC/
TNA tool for the analysis of speeches of Finnish MPs: http://finparl-01.utu.fi/apps/TNA
Resource developed by the University of Turku in partnership with Aalto University. Collaborators: the University of Jyväskylä.
Tutorial/Demo
Developed by
Contact
Harmonized Finnish National Bibliography
This resource provides a harmonized version of the Finnish national bibliography (Fennica) dataset as well as the code used for cleaning, enriching and automatically generating reports on the data. Thanks to this resource, researchers will be able to extract bibliographic metadata for large scale statistical analysis.
Access to resource: https://fennica-fennica.2.rahtiapp.fi/
Code used to harmonize metadata: https://github.com/fennicahub/fennica
Information and guidance can be found on the website of the resource.
This resource has been developed by the University of Turku in partnership with the University of Helsinki. Collaborators: National Library of Finland, University of Jyväskylä.
Tutorial/Demo
Developed by
Contact
Teaching and Training
Introduction to Digital Humanities
(Course in Finnish) The course offers a broad overview of the digital humanities and their various subfields. In particular, it examines the significance and impact of digitality in humanities research and introduces both qualitative and quantitative methods for approaching digitality and digital material. Themes, tools, and methods covered include, for example, text mining, network analysis, data visualisation, digital interaction, and computer-assisted authorship attribution.
Bachelor’s level
Digital Humanities for Literary Studies
(Course in Finnish) The student becomes familiar with digital humanities as they relate to literary studies, both theoretically and through research articles.
Bachelor’s level
Master’s Degree Programme in Information and Communication Technology: Data Analytics
The Master’s Degree Programme in Information and Communication Technology provides versatile, high-quality education in selected fields of ICT, with an established reputation for innovative, interdisciplinary, and international education.
Master’s level
Master’s Degree in Digital Linguistics
Digital language studies is a multidisciplinary field of research that examines language use in digital environments and combines methods from linguistics and natural language processing (NLP). A Master’s Degree in Digital Linguistics can be obtained from the Language Specialist Degree Programme offered by the School of Languages and Translation Studies.
Master’s level
Doctoral Programme in Languages and Translation Studies (Utuling)
The doctoral researchers’ areas of research fall within linguistics, translation studies, literary studies, and other humanities disciplines, and their approach is often multi- or cross-disciplinary.
Doctoral level