Tool – Tools to make sense of web data

This resource consists of two tools. The first classifies toxic language in Finnish (e.g., insults, obscene language) in datasets retrieved from social media platforms. The second identifies registers (genres such as reviews, interviews, and news reports) in web content across diverse languages.

Toxicity classifier: https://github.com/TurkuNLP/toxicity-classifier
Multilingual modeling of web registers: https://github.com/TurkuNLP/multilingual-register-labeling

Resource developed by the TurkuNLP group at the University of Turku in partnership with CSC.
Guidance can be found on the websites of the resources linked above.

Tutorial/Demo

https://youtu.be/q8kOJB6nA2M?feature=shared

Developed by

University of Turku

Contact

Veronika Laippala