FIN-CLARIAH summer meeting builds "Roads to Multimodality"

Before the summer hiatus begins, we bring news from the latest FIN-CLARIAH day, dedicated to building Roads to Multimodality. On June 13, the members of the national consortium FIN-CLARIAH were hosted by the DARIAH-FI node at the University of Jyväskylä in the green surroundings of Seminaarimäki. The venue was Lyhty (“Lantern”), a building designed by Alvar Aalto on top of the campus hill. This symbolic place was a fitting setting to celebrate the news that FIN-CLARIAH has been granted lighthouse status, which means that FIN-CLARIAH is one of six research infrastructures selected to lead the way in service provision, impact and shared use. The day in Jyväskylä was dedicated to setting standards in the latest multimodal modes of research, with insights into processing audiovisual data.

In his keynote, Erkut Erdem, Professor of Computer Science at Hacettepe University (and soon visiting professor in Jyväskylä), presented three promising application scenarios that combine computer vision and GenAI to add context to audiovisual material.

Prof. Erdem explaining the Video-LLaMA model. (Click to watch full lecture)
Video: CC-BY reproduced with permission of speaker

One of the goals of FIN-CLARIAH since 2024 has been to create better infrastructures for researchers using social media and multimodal content. Past events have shown us that SSH researchers are interested in both (see reports from FIN-CLARIAH day in Helsinki or Roadshow in Vaasa). It is undeniable that social media is multimodal. This is evident in the “convergence” of platforms such as Instagram, TikTok, YouTube or Twitch, where users upload live-streams and short video clips, or react to them textually and visually. As for historical sources, our most recent history has produced an audiovisual heritage that will be available on an ever larger scale. Thus, a wide community of researchers can benefit if analysis tools include functionalities such as the creation of synchronous content descriptions or, inversely, the capacity to search for visual elements or key moments using textual prompts, even when the material is not annotated or subtitled.

Prof. Erdem’s talk made evident the limitations of models trained on short, non-licensed content from YouTube. The audience asked whether these models would perform equally well on other types of content or on longer videos, which require more computational power. Within FIN-CLARIAH, these issues have been noted, for example, in the Twitch videostream research. Ongoing work to support the capture and analysis of Twitch streams has focused on generating automatic summaries of livestream video clips by integrating the video and audio with the chat interactions. The future roadmap aims to expand this into long-context video understanding, capable of analyzing up to entire livestreams. This work relies on the state-of-the-art capabilities of models such as Google’s Gemini 2.5 for video understanding tasks.

In the afternoon, Juhanna Salonen introduced the Jyväskylä-based Corpus Project of Finland’s Sign Languages (CFINSL → Sign bank). This project recorded over a hundred interviews documenting Finnish and Finland-Swedish Sign Language. The corpus is now published in Kielipankki as two datasets: a fully annotated Finnish Sign Language¹ corpus and a non-annotated Finland-Swedish Sign Language² corpus.

Details from the interviews recorded for the CFINSL project. (Click to watch full lecture / subtitled version coming soon)
Video: CC-BY reproduced with permission of speaker

The CFINSL project required thorough documentation of each interview, shot with several cameras, and the work of volunteers who transcribed the interviews with ELAN. The corpus contains open-access conversations (see image above, in green), which can be re-used for teaching sign language or for developing much-needed solutions to aid interpreters, and more restricted-access material (see image above, in yellow) containing personal narratives from the Sign Language community. While this research field has special demands, similar recording and annotation infrastructures are used in oral history and visual anthropology.

This event covered three areas of development that DARIAH-FI and the Language Bank of Finland (together FIN-CLARIAH) are collaborating to advance:

Lowering barriers for SSH research communities to adopt LLMs and computer vision technology,
Advancing access to and tools for audiovisual and multimodal content,
Creating infrastructures for minority languages in Finland.

Text: Inés Matres & Venla Poso / Header image: Rosalma

1 Suomalaisen viittomakielen korpus http://urn.fi/urn:nbn:fi:lb-2021092401

2 Finlandssvensk teckenspråkskorpus http://urn.fi/urn:nbn:fi:lb-2024090329