DARIAH-FI Workshop: New tools and data services for cultural heritage research
Organisers: Inés Matres (University of Helsinki), Ida Toivanen (University of Jyväskylä), and Ilkka Lähteenmäki (University of Oulu)
Date: 22 January 2026, Online, Full-day with breaks
(Registration will open in November)
This workshop unveils new infrastructure developed over the last two years that makes use of LLMs, computer vision and semantic web technologies to help process visual, multimodal or large-scale historical documents. In this full-day workshop, participants will get acquainted with tools to find, acquire and process manuscripts, letters, historical images and other cultural heritage materials accessible through the FINNA and ASTIA services. The workshop includes sessions for researchers not familiar with computational methods and one session for advanced digital humanities research.
This workshop is organised by the universities of Helsinki, Jyväskylä, Oulu and Turku and Aalto University, with the participation of the National Library of Finland and the National Archives of Finland, and is facilitated by DARIAH-FI, the Finnish Network for Data-intensive Research in the Humanities and Social Sciences.
Preliminary schedule
9:00 Welcome
9:10 Session 1
From manuscripts to data: JYU MessyDesk for processing your data
Ida Toivanen (University of Jyväskylä), Ari Häyrinen (University of Jyväskylä), Venla Poso (University of Jyväskylä), Tanja Välisalo (National Archives of Finland)
We show how to process a dataset using MessyDesk, a user interface that incorporates tools such as multi-century transcription and named entity recognition models. Audience: Users interested in digital humanities (no coding skills needed)
LetterSampo Finland – Finnish Nineteenth-Century Letters on the Semantic Web
Petri Leskinen, Ilona Pikkanen, Jouni Tuominen, Eero Hyvönen, Heikki Rantala, Annastiina Ahola, Henna Poikkimäki, Rafael Leal (Aalto University), in collaboration with HELDIG, HSSH and SKS
This presentation introduces the LetterSampo Finland – Finnish Nineteenth-Century Letters on the Semantic Web data service and semantic portal. It will cover transforming and assembling the source data into Linked Data and enriching the data, and will demonstrate how to use the portal for browsing and searching the letter collections and visualizing the data. Audience: Users interested in digital humanities (no coding skills needed). More information on the project homepage: https://seco.cs.aalto.fi/project/coco/
SampoSampo – Connecting Everything to Everything Else
Eero Hyvönen, Petri Leskinen, Annastiina Ahola, Heikki Rantala, Jouni Tuominen (Aalto University and University of Helsinki (HELDIG and HSSH))
SampoSampo is a global Linked Open Data service and portal based on a data alignment service on top of a cloud of interlinked Cultural Heritage knowledge graphs (KGs) and data services from different application domains. This provides a more comprehensive global view for searching, exploring, and analyzing entities, with enriched linked data and their semantic connections, than using local KGs separately. The portal can be used for searching and exploring a cloud of linked KGs through a single user interface (UI), for finding “interesting” semantic connections (relations) between their entities with natural language explanations, and for validating the linked datasets against each other. Audience: Users interested in digital humanities (no coding skills needed). More information on the project homepage: https://seco.cs.aalto.fi/project/ss/
10:40 Break
11:00 Session 2: Tools for visual cultural heritage
Bonus demo: Content Search
Satu Sorvali (The National Archives of Finland)
Content Search (Sisältöhaku) is a Finnish-language demo service that enables mass loading and text searches within archive material processed with content recognition by the National Archives of Finland.
Acquiring images and metadata from cultural heritage organisations with the new FINNA API
Joona Manner, Julia Huovinen (The National Library of Finland), Inés Matres (University of Helsinki)
In this session we demonstrate the renewed FINNA API, which is still under development, and show a possible workflow to organize and cluster a mid-size dataset of historical drawings for further qualitative analysis in Tropy or Orange. The session includes a discussion of further development with researchers. Audience: Researchers and teachers interested in images (no coding skills needed)
ArtSampo – Finnish Art on the Semantic Web
Annastiina Ahola, Heikki Rantala, Eero Hyvönen, Rafael Leal (Aalto University)
This presentation introduces the ArtSampo – Finnish Art on the Semantic Web data service and semantic portal. It will cover transforming the original data into Linked Data and enriching the data using Generative AI, and will demonstrate how to use the portal for browsing, searching and visualizing the data. Audience: Users interested in digital humanities (no coding skills needed). More information on the project homepage: https://seco.cs.aalto.fi/projects/taidesampo/
12:30 Lunch Break
13:30 Welcome back
Computational Access to Finnish Cultural Heritage: Hands-On with the finna R Package
Leo Lahti, Akewak Jeba, Julia Matveeva (University of Turku)
This session demonstrates how the open-source finna R package enables researchers to access and analyze Finnish cultural heritage metadata from the FINNA platform, which integrates resources from libraries, museums and archives across Finland and is maintained by the National Library of Finland. The workshop introduces participants to the structure of FINNA data and shows how subsets such as Fennica (the national bibliography) and Viola (the national discography) can be accessed programmatically. Through guided examples and hands-on exercises, participants will practice retrieving, analyzing and visualizing metadata while gaining practical skills for integrating cultural data into computational research workflows. Audience & prerequisites: Previous knowledge of R needed or useful (to be confirmed); installations needed beforehand
14:45 Round table discussion (15 min) Moderator: Ilkka Lähteenmäki (University of Oulu)
Join the discussion on future development and the audience’s wants and needs (with the option to comment on individual demos and infrastructures).
