Resources
Tools to make sense of web data
This resource consists of two tools: one to detect toxic language in Finnish (e.g., insults, obscene language) in datasets retrieved from social media platforms, and another to identify registers (genres such as reviews, interviews, and news reports) in web content in diverse languages.
Toxicity classifier: https://github.com/TurkuNLP/toxicity-classifier
Multilingual modeling of web registers: https://github.com/TurkuNLP/multilingual-register-labeling
Resource developed by TurkuNLP (University of Turku) in partnership with the CSC.
Guidance can be found on the websites of the resources.
Tutorial/Demo
Developed by
Contact
This resource collects chat data from the livestreaming service Twitch. With it, researchers can retrieve and analyze larger samples of Twitch chat data.
Resource developed by the University of Jyväskylä in partnership with the CSC.
Guidance can be found on the website of the resource.
Tutorial/Demo
Developed by
Contact
This resource, the Nordic Tweet Stream (NTS), makes Twitter/X data available to researchers. Altogether, it contains nearly 74 million messages from hundreds of thousands of user accounts in the five Nordic countries. The NTS data cover the period between January 2013 and May 2023 and were collected using the Academic API, which is now closed. The NTS comes with an easy-to-use graphical interface that supports quick data access. Researchers can search, subset, visualize, and download data, for instance to study public discourse and sentiment concerning events in recent history.
Access to resource: https://nordictweetstream.fi/
Resource developed by the University of Eastern Finland, in collaboration with Linnaeus University.
Contact information and guidance can be found on the website of the resource.
Tutorial/Demo
Developed by
UX questionnaire developed within DARIAH-FI to test and evaluate tools, datasets, or workflows developed for the project. The questionnaire was created and updated in several phases between 2022 and 2023, drawing on a literature review, semi-structured interviews, and tests with end users.
Developed by
This resource provides a framework for building customizable and responsive user interfaces for semantic portals without requiring extensive coding skills.
Sampo UI: https://seco.cs.aalto.fi/tools/sampo-ui/
Tutorial: https://seco.cs.aalto.fi/tools/sampo-ui/Sampo-UI-tutorial.pdf
An example of a semantic portal created with this resource is ParliamentSampo: https://parlamenttisampo.fi/ . In this portal, users can test the functionality for studying parliamentary speeches and find examples of queries that the portal can answer.
Resource developed by Aalto University in partnership with the University of Turku and the University of Helsinki.
Tutorial/Demo
Developed by
Contact
Text Network Tools for Parliamentary Data
This resource provides network-analysis tools for the study of political text. With these tools, researchers can, for example, analyze keyword embeddings of the FinParl corpus and identify how phrases or longer text passages are re-used over time in the plenary debates of MPs in the Finnish parliament.
KWIC keyword tool for FinParl corpus: http://finparl-01.utu.fi/apps/KWIC/
TNA tool for the analysis of speeches of Finnish MPs: http://finparl-01.utu.fi/apps/TNA
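As a rough illustration of what a keyword-in-context (KWIC) view does, the following minimal Python sketch lists each occurrence of a keyword together with a few words on either side. It is not the code behind the tools linked above; the function name `kwic`, the `width` parameter, and the sample sentence are assumptions made for this example only.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left, keyword, right) context triples for each keyword hit.

    Hypothetical helper for illustration; not part of the FinParl tools.
    """
    # Lowercase and split into word tokens (Unicode-aware, so Finnish
    # characters such as ä and ö are kept inside tokens).
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    hits = []
    for i, tok in enumerate(tokens):
        if tok == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, tok, right))
    return hits


# Invented sample text, used only to demonstrate the output shape.
speech = ("The budget debate continued. "
          "The opposition criticised the budget proposal sharply.")

for left, kw, right in kwic(speech, "budget", width=2):
    print(f"{left:>20} [{kw}] {right}")
```

Aligning the keyword in a fixed column, as the print loop does, makes it easier to compare the contexts in which a term appears across speeches.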
Resource developed by the University of Turku in partnership with Aalto University. Collaborators: the University of Jyväskylä.
Tutorial/Demo
Developed by
Contact
This resource provides a set of easy-to-use tools for conducting qualitative analysis on survey responses in Finnish. Thanks to this resource, researchers will be able to better understand data retrieved from open-ended questions.
CRAN webpage: https://CRAN.R-project.org/package=finnsurveytext
Guidance can be found on the website of the resource.
Tutorial/Demo
Developed by
Contact
Historical Newspapers in the CSC Supercomputing Environment
This resource allows users to download copyright-free materials from the National Library of Finland through the CSC.
Access the resource: https://github.com/CSCfi/kielipankki-nlf-harvester
Technical documentation: https://urn.fi/urn:nbn:fi:lb-202311261
Resource developed by the National Library of Finland in partnership with the CSC, University of Helsinki and University of Turku. Collaborators: National Archives of Finland and University of Jyväskylä.
Developed by
Contact
kk-tutkijapalvelut@helsinki.fi
Harmonized Finnish National Bibliography
This resource provides a harmonized version of the Finnish national bibliography (Fennica) dataset, as well as the code used for cleaning and enriching the data and for automatically generating reports on it. Thanks to this resource, researchers will be able to extract bibliographic metadata for large-scale statistical analysis.
Access to resource: https://fennica-fennica.2.rahtiapp.fi/
Code used to harmonize metadata: https://github.com/fennicahub/fennica
Information and guidance can be found on the website of the resource.
This resource has been developed by the University of Turku in partnership with the University of Helsinki. Collaborators: National Library of Finland, University of Jyväskylä.
Tutorial/Demo
Developed by
Contact
Tool to evaluate biases and errors
This resource provides tools for subsetting and evaluating datasets that have not originally been created for research. Thanks to this resource, researchers will be able to robustly explore large datasets, examine their representativeness, and extract the subset they are interested in.
End-user interface links and usage instructions for centrally indexed datasets: https://github.com/hsci-r/elasticsearch-openshift/blob/main/documentation/exported_query.md
Technical documentation enabling people to set up their own instances for their own datasets: https://github.com/hsci-r/elasticsearch-openshift
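One simple way to examine how representative a subset is, sketched below under stated assumptions, is to compare the share of each category (here, a decade) in the subset with its share in the full dataset. This is an illustrative Python sketch only and does not reflect how the linked tools are implemented; the helper names, the `decade` field, and the sample records are all hypothetical.

```python
from collections import Counter


def share_by(records, key):
    """Return the proportion of records per value of `key`."""
    counts = Counter(r[key] for r in records)
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}


def representativeness_gap(full, subset, key):
    """Subset share minus full-dataset share for each value of `key`.

    Positive values mean the category is over-represented in the subset.
    Hypothetical helper for illustration only.
    """
    full_share = share_by(full, key)
    sub_share = share_by(subset, key)
    return {k: sub_share.get(k, 0.0) - full_share[k] for k in full_share}


# Invented sample records; a real dataset would have many more fields.
full = [
    {"decade": 1900, "text": "harvest report"},
    {"decade": 1900, "text": "war news"},
    {"decade": 1910, "text": "war bulletin"},
    {"decade": 1910, "text": "war editorial"},
]

# Subset of interest: records mentioning "war".
subset = [r for r in full if "war" in r["text"]]

gap = representativeness_gap(full, subset, "decade")
for decade, diff in sorted(gap.items()):
    print(decade, f"{diff:+.2f}")
```

A gap far from zero for some category suggests that conclusions drawn from the subset may not generalize to the full dataset for that category.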
Resource developed by the University of Helsinki (ARTS) in partnership with the CSC.
Tutorial/Demo
Developed by
Contact