On November 28, the Centre for Social Data Science (CSDS) at the University of Helsinki hosted this milestone meeting to support FIN-CLARIAH in developing national infrastructure for digital social sciences. The event focused on current practices in annotating datasets across disciplines interested in qualitative data, narrative analysis, secure practices for sensitive data, and the responsible use of AI tools in the annotation pipeline. The event was open to the local research community and drew approximately 70 participants.
The day opened with the keynote “Dream Infrastructures for a Social Scientist” by Salla-Maaria Laaksonen (above). Her talk (watch here) caught us up on the last ten years of research with social, ‘messy’ big data and offered guidelines on the infrastructures social media researchers need today. Laaksonen has a long track record of studying the power of social media platforms in democratic, institutional and political processes through ‘computational hermeneutics’. With this concept she referred to combining a range of methods that come from getting entangled in the analysis of big data while also trying to understand the broader phenomena in which the data is embedded. Laaksonen has been active in rajapinta.co, an association that is a reference community in Finland for digital social sciences. The talk provided examples of accessible but often transitory infrastructures for acquiring social media data, which Laaksonen characterised as “Tamagotxi”: digital creatures that need feeding and care and are constantly on the verge of disappearing.
Other great challenges that social science researchers face derive not from working with big data, but from working with sensitive data. Prof. Hisayo Katsui, in her insight talk, discussed the ethical concerns and challenges of generating and working with such data. Katsui is professor of disability studies and is active in fostering emancipatory and participatory methods in research situations that involve severely marginalized groups. In this and related fields, researchers must make very difficult decisions that take into account trauma, normalized unfair power structures, strict ethical committees and pressure to comply with openness.

Finally, our host for the day, Prof. Krista Lagus, introduced theory-based annotation using large language models (watch here). She tackled one last key technological development that comes with its own set of challenges and opportunities, and which researchers at the Centre for Social Data Science in Helsinki are currently addressing in various projects. She illustrated this work with their Pharma H project, which produces coding aids with the help of LLMs for big data such as Suomi24 discussions, with the concrete task of detecting meaningful expressions related to psychological, social and physical well-being.

These talks set the stage for three afternoon workshops, which introduced participants to environments such as CSC’s AITTA environment for working with LLMs and the Sensitive Data (SD) environment for working across organisational borders with sensitive material in a secure virtual desktop that already contains a catalogue of tools, e.g. for transcription or annotation. The third workshop presented participants with best practices and examples of agreements for the reuse of social media and interview data.
Text and images: Inés Matres

