Resources
Tools to make sense of web data
This resource consists of two tools: one to classify toxic data in Finnish (e.g., insults, obscene language) from datasets retrieved from social media platforms; and another to identify registers (genres, e.g., reviews, interviews, news reports) from web content in diverse languages.
Toxicity classifier: https://github.com/TurkuNLP/toxicity-classifier
Multilingual modeling of web registers: https://github.com/TurkuNLP/multilingual-register-labeling
Resource developed by the TurkuNLP / University of Turku in partnership with the CSC.
Guidance can be found on the websites of the resources.
Tutorial/Demo
Developed by
Contact
Twitch Chat Collector & Analysis Tool
This resource collects chat data from the livestreaming services Twitch and YouTube, so that researchers can retrieve and analyze larger samples of chat data.
The tools sidebar offers multiple ways to collect data, as well as sections for chat content classification based on machine learning and video clip analysis based on multimodal large language models.
Resource developed by the University of Jyväskylä with collaboration from Tampere University.
Guidance can be found on the website of the resource.
Tutorial/Demo
https://youtu.be/BN6ikEOy54U (chat analysis) / https://youtu.be/4DKX8O3auhE (video analysis)
Developed by
Contact
Document Understanding Tools
The archival data team, consisting of Venla and Ida, has worked on producing tools for document understanding. This refers to various kinds of processing of documents, such as named entity recognition and document type classification.
Named entity recognition (UI): https://arkkiivi.fi/
Named entity recognition (Hugging Face): https://huggingface.co/Kansallisarkisto/finbert-ner
Document type classification (Hugging Face): https://huggingface.co/jyu-digihum/findoctype
Most of the tool development has been conducted in collaboration with the National Archives of Finland.
Tutorial/Demo
Developed by
Contact
L2 Finnish model
The L2 Finnish model is a classification model trained on CEFR-annotated data containing fictional and non-fictional texts written by speakers of Finnish as a second language (L2). With the model, you can classify texts into the following CEFR classes: A1, A2, B1, B2, and C.
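As a rough illustration of how a five-class text classifier's output can be turned into a CEFR label, here is a minimal Python sketch. It does not load or reproduce the L2 Finnish model itself: the order of the labels, the function names, and the example scores are all assumptions made for illustration.

```python
import math

# Class labels as listed in the description; their order here is an assumption.
CEFR_LABELS = ["A1", "A2", "B1", "B2", "C"]


def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict_cefr(scores):
    """Return the most probable CEFR label and its probability."""
    if len(scores) != len(CEFR_LABELS):
        raise ValueError("expected one score per CEFR class")
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return CEFR_LABELS[best], probs[best]


# Hypothetical raw scores for one text, one value per class.
label, probability = predict_cefr([0.1, 0.3, 2.5, 1.0, -0.4])
print(label, round(probability, 2))
```

With a real model, the scores would come from the classifier's output layer and the label order from the model's own configuration.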
Tutorial/Demo
Developed by
Contact
This resource, the Nordic Tweet Stream (NTS), makes Twitter/X data available to researchers. Altogether, it contains nearly 74 million messages from hundreds of thousands of user accounts from the five Nordic countries. The NTS data cover the period between January 2013 and May 2023 and were collected using the Academic API, which is now closed. The NTS comes with an easy-to-use graphical interface that supports quick data access: researchers can search, subset, visualize, and download data. For instance, it can be used to study public discourse and sentiment concerning events in recent history.
Access to resource: https://nordictweetstream.fi/
Resource developed by the University of Eastern Finland, in collaboration with Linnaeus University.
Contact information and guidance can be found on the website of the resource.
Tutorial/Demo
Developed by
This resource provides a framework for building customizable and responsive user interfaces for semantic portals without requiring extensive coding skills.
Sampo UI: https://seco.cs.aalto.fi/tools/sampo-ui/
Tutorial: https://seco.cs.aalto.fi/tools/sampo-ui/Sampo-UI-tutorial.pdf
An example of a semantic portal created with this resource is ParliamentSampo: https://parlamenttisampo.fi/. In this portal, users can test its functionality for studying parliamentary speeches and browse examples of queries that the portal can address.
Resource developed by Aalto University in partnership with the University of Turku and the University of Helsinki.
Tutorial/Demo
Developed by
Contact
Text Network Tools for Parliamentary Data
This resource provides tools based on network analysis for the analysis of political text. With these tools, researchers will be able, for example, to analyze keyword embeddings of the FinParl corpus and identify how phrases or longer text passages are re-used over time in MPs' plenary debates in the Finnish parliament.
KWIC keyword tool for FinParl corpus: http://finparl-01.utu.fi/apps/KWIC/
TNA tool for the analysis of speeches of Finnish MPs: http://finparl-01.utu.fi/apps/TNA
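To make the idea of text re-use concrete, here is a minimal Python sketch that finds word sequences shared by two speeches using word n-grams (shingles) and scores their overlap with Jaccard similarity. This is a generic technique chosen for illustration, not necessarily the method used by the tools above; the function names and the two example sentences are invented.

```python
def shingles(text, n=3):
    """Return the set of word n-grams (shingles) in a text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_passages(earlier, later, n=3):
    """Return the n-grams that occur in both texts, as sorted strings."""
    common = shingles(earlier, n) & shingles(later, n)
    return sorted(" ".join(gram) for gram in common)


def jaccard(earlier, later, n=3):
    """Share of distinct n-grams that the two texts have in common."""
    a, b = shingles(earlier, n), shingles(later, n)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Invented example sentences, standing in for two plenary speeches.
speech_a = "we must secure the future of the welfare state"
speech_b = "the government will secure the future of the welfare state now"

print(shared_passages(speech_a, speech_b))
print(round(jaccard(speech_a, speech_b), 2))
```

Applied to speeches ordered by date, the same comparison would indicate when a phrase first appears and where it is picked up again.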
Resource developed by the University of Turku in partnership with Aalto University. Collaborators: the University of Jyväskylä.
Tutorial/Demo
Developed by
Contact
This resource provides a set of easy-to-use tools for conducting qualitative analysis on survey responses in Finnish. Thanks to this resource, researchers will be able to better understand data retrieved from open-ended questions.
CRAN webpage: https://CRAN.R-project.org/package=finnsurveytext
Guidance can be found on the website of the resource.
Tutorial/Demo
Developed by
Contact
This is an application for scraping comment data from Finnish online resources with high user traffic.
Access: https://github.com/uh-dcm/finnish-forum-scrapers
Tutorial/Demo
Developed by
Contact
Historical Newspapers in the CSC Supercomputing Environment
This resource allows users to download copyright-free materials from the National Library of Finland through the CSC.
Access the resource: https://github.com/CSCfi/kielipankki-nlf-harvester
Technical documentation: https://urn.fi/urn:nbn:fi:lb-202311261
Resource developed by the National Library of Finland in partnership with the CSC, University of Helsinki and University of Turku. Collaborators: National Archives of Finland and University of Jyväskylä.
Developed by
Contact
kk-tutkijapalvelut@helsinki.fi
Harmonized Finnish National Bibliography
This resource provides a harmonized version of the Finnish national bibliography (Fennica) dataset as well as the code used for cleaning, enriching and automatically generating reports on the data. Thanks to this resource, researchers will be able to extract bibliographic metadata for large scale statistical analysis.
Access to resource: https://fennica-fennica.2.rahtiapp.fi/
Code used to harmonize metadata: https://github.com/fennicahub/fennica
Information and guidance can be found on the website of the resource.
This resource has been developed by the University of Turku in partnership with the University of Helsinki. Collaborators: National Library of Finland, University of Jyväskylä.
Tutorial/Demo
Developed by
Contact
Tool to evaluate biases and errors
This resource provides tools for subsetting and evaluating datasets that have not originally been created for research. Thanks to this resource, researchers will be able to robustly explore large datasets, examine their representativeness, and extract the subset they are interested in.
End-user interface links and usage instructions for centrally indexed datasets: https://github.com/hsci-r/elasticsearch-openshift/blob/main/documentation/exported_query.md
Technical documentation enabling people to set up their own instances for their own datasets: https://github.com/hsci-r/elasticsearch-openshift
Resource developed by the University of Helsinki (ARTS) in partnership with the CSC.
Tutorial/Demo
Developed by
Contact
Forensic Linguistics Corpus and Search Interface C.R.I.M.E.
This resource is a structured, searchable corpus comprising audio and ASR-generated transcripts from investigative interviews, courtroom interactions, and related media.
Access the database: https://forensic.corpora.li
Access the static dataset: https://doi.org/10.7910/DVN/MLMB6E
Additional information (user guide, proceedings article) is linked on the websites.
This resource has been developed by Steven Coats, University of Oulu.
Tutorial/Demo
User guide in the resources
Developed by
Contact
Automated Harmonisation and Enrichment of Metadata
This resource provides R packages for collecting and enriching Finnish cultural heritage metadata. The finna R package collects cultural metadata using the Finna API, the finto R package enriches the metadata using the Finto API of the Finto service, and the geofi R package supports geospatial analysis and visualization of metadata. These tools are designed to give cultural heritage researchers easy access to metadata, together with geospatial analysis and visualization.
Finna R package: https://github.com/fennicahub/finna
Finto R package: https://github.com/fennicahub/finto
Geofi R package: https://github.com/rOpenGov/geofi
Information and guidance can be found on the websites of the resources.
This resource has been developed by the University of Turku. Collaborators: National Library of Finland.
Tutorial/Demo
Developed by
Contact
Research Data Management handbooks
A collection of open-access digital handbooks on research data management for SSH fields, edited by the Helsinki Institute for Social Sciences and Humanities in spring/autumn 2024. The five guides cover texts, register data, surveys, social media, and audiovisual recordings.
Developed by
Contact
A public directory of publications (research articles, conference proceedings, data publications) that point to, explain, or introduce use cases for the infrastructures developed by the DARIAH-FI partners for the FIN-CLARIAH project.
Contact
UX questionnaire developed within DARIAH-FI to test and evaluate tools, datasets, or workflows developed for the project. The questionnaire was created and updated in several phases between 2022 and 2023, based on a literature review, semi-structured interviews, and tests with end-users.
Developed by
Guideline for collecting user experiences from workshops and training sessions
This document is intended to serve as an initial guide for collecting user experience data from workshops and training sessions related to the resources developed by the FIN-CLARIAH consortium.
Developed by
This document includes information on the educational materials relevant to the DARIAH-FI research infrastructure and guidance on which courses might help in using its resources more efficiently. The document also includes an overview of the state of digital humanities and computational social sciences education in Finland.
Developed by
Educational resource development
This document provides an updated report on the educational resource development in DARIAH-FI for the 2024–2025 funding period.
Developed by
Recommender system for NLF data
This resource provides code for developing recommender systems to assist information retrieval in digital libraries based on log data gathered from their use. The resource was developed by Tampere University in partnership with CSC and the University of Helsinki. Collaborators: National Library of Finland, University of Turku.
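As a hedged illustration of how usage logs can drive recommendations, here is a minimal Python sketch based on item co-occurrence within sessions. It is a generic baseline, not the approach implemented in the resource; the log format, function names, and document identifiers are assumptions made for the example.

```python
from collections import Counter, defaultdict
from itertools import combinations


def build_cooccurrence(sessions):
    """Count how often two items were viewed within the same session."""
    counts = defaultdict(Counter)
    for items in sessions:
        # Deduplicate so repeated views in one session count once.
        for a, b in combinations(sorted(set(items)), 2):
            counts[a][b] += 1
            counts[b][a] += 1
    return counts


def recommend(counts, item, k=3):
    """Return up to k items most often viewed together with `item`."""
    neighbours = counts.get(item, {})
    # Sort by descending count, then by name to keep ties deterministic.
    ranked = sorted(neighbours.items(), key=lambda pair: (-pair[1], pair[0]))
    return [other for other, _ in ranked[:k]]


# Hypothetical log data: each inner list is one user session.
sessions = [
    ["doc1", "doc2", "doc3"],
    ["doc1", "doc2"],
    ["doc2", "doc4"],
]
counts = build_cooccurrence(sessions)
print(recommend(counts, "doc1"))
```

Real log data would first need to be split into sessions and cleaned of automated traffic before counts like these become meaningful.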
Developed by


