Resources

Twitch Chat Collector & Analysis Tool

Contact

Veronika Laippala

This resource collects chat data from the livestream services Twitch and YouTube. With it, researchers can retrieve and analyze larger samples of chat data from these services.

The tools sidebar offers several ways to collect data, as well as sections for chat content classification based on machine learning and video clip analysis based on multimodal large language models.

Resource developed by the University of Jyväskylä in collaboration with Tampere University.
Guidance can be found on the website of the resource.

Tutorial/Demo

https://youtu.be/BN6ikEOy54U (chat analysis) / https://youtu.be/4DKX8O3auhE (video analysis)

Developed by

University of Jyväskylä

Contact

Raine Koskimaa, Jari Lindroos

Document Understanding Tools

The archival data team, consisting of Venla Poso and Ida Toivanen, has worked on producing tools for document understanding. Document understanding covers various kinds of document processing, such as named entity recognition and document type classification.

Named entity recognition (UI): https://arkkiivi.fi/

Named entity recognition (Huggingface): https://huggingface.co/Kansallisarkisto/finbert-ner

Document type classification (Huggingface): https://huggingface.co/jyu-digihum/findoctype

Most of the tool development has been conducted in collaboration with the National Archives of Finland.
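
The named entity recognition model linked above can in principle be tried out locally. The following is a minimal sketch, not the team's own workflow: it assumes the model loads through the token-classification pipeline of the Hugging Face transformers library, and the helper names load_ner and group_entities are hypothetical. The model card should be checked for the recommended usage and the actual label set.

```python
from collections import defaultdict

# Model id taken from the Huggingface link above.
MODEL_ID = "Kansallisarkisto/finbert-ner"


def load_ner():
    """Build a token-classification pipeline (downloads the model on first use).

    Assumption: the model is compatible with the standard transformers pipeline.
    """
    from transformers import pipeline  # imported lazily: heavy optional dependency

    # aggregation_strategy="simple" merges word pieces into whole entity spans.
    return pipeline(
        "token-classification",
        model=MODEL_ID,
        aggregation_strategy="simple",
    )


def group_entities(predictions):
    """Group aggregated pipeline output by entity label.

    Each prediction is expected to be a dict with at least the keys
    "entity_group" and "word", as produced by an aggregating pipeline.
    """
    grouped = defaultdict(list)
    for item in predictions:
        grouped[item["entity_group"]].append(item["word"])
    return dict(grouped)
```

With transformers installed, group_entities(load_ner()("some Finnish sentence")) would return the recognized entities grouped under whatever labels the model emits.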

Tutorial/Demo

https://github.com/JYU-digihum

Developed by

University of Jyväskylä

Contact

Venla Poso, Ida Toivanen

L2 Finnish model

The L2 Finnish model is a classification model trained on CEFR-annotated data containing fictional and non-fictional texts written by speakers of Finnish as a second language (L2). With the model, texts can be classified into the following CEFR classes: A1, A2, B1, B2, and C.
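
As an illustration of how such a classifier could be applied to a set of learner texts, the sketch below tallies predicted CEFR classes. It rests on assumptions that are not confirmed here: that the model loads through the text-classification pipeline of the Hugging Face transformers library, and that its output labels are the class names listed above. The function names are hypothetical, and predictions with any other label are counted under "other" rather than dropped.

```python
from collections import Counter

# Model id taken from the Tutorial/Demo link of this resource.
MODEL_ID = "jyu-digihum/l2finnishmodel"
# CEFR classes as listed in the description.
CEFR_CLASSES = ("A1", "A2", "B1", "B2", "C")


def load_classifier():
    """Build a text-classification pipeline (assumed to fit this model)."""
    from transformers import pipeline  # imported lazily: heavy optional dependency

    return pipeline("text-classification", model=MODEL_ID)


def cefr_distribution(texts, classify):
    """Count the top predicted class for each text.

    classify is any callable that, given one text, returns a list of
    {"label": ..., "score": ...} dicts with the best prediction first.
    """
    counts = Counter(classify(text)[0]["label"] for text in texts)
    result = {level: counts.pop(level, 0) for level in CEFR_CLASSES}
    # Labels outside the expected set (e.g. generic "LABEL_0") end up here.
    result["other"] = sum(counts.values())
    return result
```

Passing the classifier in as an argument keeps the counting logic independent of how the model is loaded, so it can be checked with a stand-in function before downloading anything.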

Tutorial/Demo

https://huggingface.co/jyu-digihum/l2finnishmodel

Developed by

University of Jyväskylä

Contact

Jenny Tarvainen, Ida Toivanen, Ari Huhta

NTS: Nordic Tweet Stream

This resource makes Twitter/X data available to researchers. Altogether, it contains nearly 74 million messages from hundreds of thousands of user accounts in the five Nordic countries. The NTS data cover the period between January 2013 and May 2023 and were collected using the Academic API, which is now closed. The NTS comes with an easy-to-use graphical interface that supports quick data access. It is possible, for instance, to study public discourse and sentiment concerning events in recent history. Researchers can search, subset, visualize, and download data.

Access to resource: https://nordictweetstream.fi/

Resource developed by the University of Eastern Finland, in collaboration with Linnaeus University.
Contact information and guidance can be found on the website of the resource.

Tutorial/Demo

https://youtu.be/XvJhfp8pWM4

Developed by

University of Eastern Finland

Sampo UI builder

This resource provides a framework for building customizable and responsive user interfaces for semantic portals without requiring extensive coding skills.

Sampo UI: https://seco.cs.aalto.fi/tools/sampo-ui/
Tutorial: https://seco.cs.aalto.fi/tools/sampo-ui/Sampo-UI-tutorial.pdf
An example of a semantic portal created with this resource is ParliamentSampo: https://parlamenttisampo.fi/. The portal can be used to try out this functionality on parliamentary speeches, and it offers some examples of queries that can be addressed with it.

Resource developed by Aalto University in partnership with the University of Turku and the University of Helsinki.

Tutorial/Demo

https://youtu.be/VV7Hw_uEtHM?feature=shared

Developed by

Aalto University

Contact

Eero Hyvönen

Text Network Tools for Parliamentary Data

This resource provides tools based on network analysis for the analysis of political text. With these tools, researchers can, for example, analyze keyword embeddings of the FinParl corpus and identify how phrases or longer text passages are re-used over time in the MPs' plenary debates in the Finnish parliament.

KWIC keyword tool for FinParl corpus: http://finparl-01.utu.fi/apps/KWIC/
TNA tool for the analysis of speeches of Finnish MPs: http://finparl-01.utu.fi/apps/TNA

Resource developed by the University of Turku in partnership with Aalto University. Collaborators: the University of Jyväskylä.

Tutorial/Demo

https://youtu.be/_5CCKOnOSfg?feature=shared

Developed by

https://youtu.be/UN7viKRzQvI

Contact

Kimmo Elo

Finnsurveytext

This resource provides a set of easy-to-use tools for conducting qualitative analysis on survey responses in Finnish. Thanks to this resource, researchers will be able to better understand data retrieved from open-ended questions.
CRAN webpage: https://CRAN.R-project.org/package=finnsurveytext
Guidance can be found on the website of the resource.

Tutorial/Demo

Developed by

Finnish Forum Scrapers

Contact

Krista Lagus

This application scrapes comment data from Finnish resources with high user traffic.

Access: https://github.com/uh-dcm/finnish-forum-scrapers

Tutorial/Demo

https://youtu.be/yawd9zSb_xY

Developed by

Historical Newspapers in the CSC Supercomputing Environment

Contact

Matti Nelimarkka

This resource allows users to download copyright-free materials from the National Library of Finland through the CSC.

Access the resource: https://github.com/CSCfi/kielipankki-nlf-harvester
Technical documentation: https://urn.fi/urn:nbn:fi:lb-202311261

Resource developed by the National Library of Finland in partnership with the CSC, University of Helsinki and University of Turku. Collaborators: National Archives of Finland and University of Jyväskylä.

Developed by

National Library of Finland

Contact

kk-tutkijapalvelut@helsinki.fi

Harmonized Finnish National Bibliography

This resource provides a harmonized version of the Finnish national bibliography (Fennica) dataset as well as the code used for cleaning, enriching and automatically generating reports on the data. Thanks to this resource, researchers will be able to extract bibliographic metadata for large scale statistical analysis.

Access to resource: https://fennica-fennica.2.rahtiapp.fi/
Code used to harmonize metadata: https://github.com/fennicahub/fennica

Information and guidance can be found on the website of the resource.
This resource has been developed by the University of Turku in partnership with the University of Helsinki. Collaborators: National Library of Finland, University of Jyväskylä.

Tutorial/Demo

https://youtu.be/9QYKQ1IYIjQ?feature=shared

Developed by

https://youtu.be/EVe5ZUo8tOM?feature=shared

Contact

Leo Lahti

Tool to evaluate biases and errors

This resource provides tools for subsetting and evaluating datasets that have not originally been created for research. Thanks to this resource, researchers will be able to robustly explore large datasets, examine their representativeness, and extract the subset they are interested in.

End-user interface links and usage instructions for centrally indexed datasets: https://github.com/hsci-r/elasticsearch-openshift/blob/main/documentation/exported_query.md
Technical documentation enabling people to set up their own instances for their own datasets: https://github.com/hsci-r/elasticsearch-openshift

Resource developed by the University of Helsinki (ARTS) in partnership with the CSC.

Tutorial/Demo

Developed by

Contact

Eetu Mäkelä

Forensic Linguistics Corpus and Search Interface C.R.I.M.E.

This resource is a structured, searchable corpus comprising audio and ASR-generated transcripts from investigative interviews, courtroom interactions, and related media.

Access the database: https://forensic.corpora.li
Access the static dataset: https://doi.org/10.7910/DVN/MLMB6E

Additional information (user guide, proceedings article) is linked on the websites.

This resource has been developed by Steven Coats, University of Oulu.

Tutorial/Demo

User guide in the resources

Developed by

University of Oulu

Contact

Steven Coats

Automated Harmonisation and Enrichment of Metadata

This resource provides R packages for collecting and enriching Finnish cultural heritage metadata. The finna R package collects cultural metadata through the Finna API, the finto R package enriches the metadata through the Finto API of the Finto service, and the geofi R package supports geospatial analysis and visualization of metadata. Together, these tools are designed to give cultural heritage researchers easy access to metadata and to its geospatial analysis and visualization.

Finna R package: https://github.com/fennicahub/finna
Finto R package: https://github.com/fennicahub/finto
Geofi R package: https://github.com/rOpenGov/geofi
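
The packages above are written in R and their own functions are not shown here. To illustrate the kind of request that lies underneath metadata collection from Finna, the following Python sketch builds a search URL and reads titles from a response. It is a sketch under stated assumptions: the endpoint and the parameter names (lookfor, type, limit, page) are taken from the public Finna API as understood at the time of writing and should be verified against its documentation, and the function names are hypothetical.

```python
from urllib.parse import urlencode

# Assumed public search endpoint of the Finna API; verify before use.
FINNA_SEARCH_URL = "https://api.finna.fi/v1/search"


def build_search_url(query, limit=20, page=1):
    """Return a search URL for a free-text query across all fields."""
    params = {
        "lookfor": query,     # search terms
        "type": "AllFields",  # search every indexed field
        "limit": limit,       # records per page
        "page": page,         # 1-based page number
    }
    return f"{FINNA_SEARCH_URL}?{urlencode(params)}"


def record_titles(response):
    """Extract titles from a decoded JSON response.

    Assumes the response holds a "records" list whose items may
    carry a "title" field; missing titles become empty strings.
    """
    return [record.get("title", "") for record in response.get("records", [])]
```

The URL can be fetched with any HTTP client and the decoded JSON passed to record_titles. Building the URL separately from fetching it makes it easy to inspect a query before sending it.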

Information and guidance can be found on the websites of the resources.

This resource has been developed by the University of Turku. Collaborators: National Library of Finland.

Tutorial/Demo

Developed by

Contact

Leo Lahti

Research Data Management handbooks

A collection of open-access digital handbooks on research data management for SSH fields, edited by the Helsinki Institute for Social Sciences and Humanities in Spring/Autumn 2024. The five guides cover texts, register data, surveys, social media, and audiovisual recordings.

Developed by

https://www.zotero.org/groups/6332635/dariah-fi/library

Contact

Jouni Tuominen

DARIAH-FI Zotero library

A public directory of publications (research articles, conference proceedings, data publications) that point to, explain, or introduce use cases for the infrastructures developed by the DARIAH-FI partners for the FIN-CLARIAH project.

Link

Contact

Inés Matres

User Experience Questionnaire

UX questionnaire developed within DARIAH-FI to test and evaluate tools, datasets, or workflows developed for the project. The questionnaire was created and updated in several phases between 2022 and 2023, drawing on a literature review, semi-structured interviews, and tests with end users.

Developed by

Guideline for collecting user experiences from workshops and training sessions

This document is intended to serve as an initial guide for collecting user experience data from workshops and training sessions related to the resources developed by the FIN-CLARIAH consortium.

Developed by

Educational material

This document includes information regarding the educational materials relevant to the DARIAH-FI research infrastructure and guidance on which courses might be relevant to use its resources more efficiently. The document also includes an overview of the state of the digital humanities and computational social sciences education in Finland.

Developed by

Educational resource development

This document provides an updated report on the educational resource development in DARIAH-FI for the 2024–2025 funding period.

Developed by

https://drive.google.com/file/d/1rYT_xqot9FKourz0DUoK5mD5urzeNVDE/view

Recommender system for NLF data

This resource provides code for developing recommender systems to assist information retrieval in digital libraries based on log data gathered from their use. The resource was developed by Tampere University in partnership with CSC and the University of Helsinki. Collaborators: National Library of Finland, University of Turku.

Tutorial/Demo

Developed by