Noise-resistant NLP Models and Data

Leader: Veronika Laippala, University of Turku; Partners: CSC; Collaborators: JYU

This work package (WP) provides infrastructure for processing noisy or otherwise non-standard data, such as historical language, dialects, spoken genres, and text with OCR noise. Existing tools such as the Turku Neural Parser and the FinBERT language model perform extremely well on standard language, but their performance deteriorates on noisy, non-standard input. We therefore need infrastructure that tolerates noise and non-standard language. To achieve this, we will develop datasets and language models that target such departures from the norm: corpora of non-standard language; statistical models of noise that allow various types of noise to be introduced automatically; large language models pre-trained on noisy language; and noise-resistant, fine-tuned, task-specific models for tasks such as parsing, named entity tagging, and sentiment analysis.
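
As a rough illustration of what introducing noise automatically could look like, the following minimal Python sketch randomly deletes characters and swaps in look-alike characters. It is not the work package's actual noise model: the function name `add_noise`, the `CONFUSIONS` table, and the probabilities are all hypothetical, and a real statistical model would presumably estimate such values from aligned clean and noisy text.

```python
import random

# Hypothetical OCR-style confusion pairs, chosen for illustration only.
# A real noise model would estimate these (and their probabilities) from data.
CONFUSIONS = {"l": "1", "o": "0", "ä": "a", "ö": "o", "m": "rn"}


def add_noise(text, p_sub=0.05, p_del=0.02, seed=None):
    """Return a copy of text with random character deletions and substitutions."""
    rng = random.Random(seed)  # a fixed seed makes the noise reproducible
    out = []
    for ch in text:
        r = rng.random()
        if r < p_del:
            continue  # drop the character
        if r < p_del + p_sub and ch in CONFUSIONS:
            out.append(CONFUSIONS[ch])  # swap in a look-alike
        else:
            out.append(ch)
    return "".join(out)


if __name__ == "__main__":
    print(add_noise("Tämä on esimerkkilause.", p_sub=0.3, p_del=0.05, seed=0))
```

Applying a function like this to clean text yields paired clean and noisy examples, which is one way noisy training data for the models described above might be produced.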

DARIAH-FI OFFICE:

The National Archives of Finland

Tanja Välisalo

DARIAH-FI: GENERAL

DARIAH-FI OFFICE:

University of Jyväskylä

Tanja Välisalo

DARIAH-FI OFFICE:

University of Helsinki

Risto Turunen

DARIAH-FI OFFICE:

CSC – IT Centre for Science

Katri Tegel

DARIAH-FI OFFICE:

National Library of Finland

Johanna Lilja

DARIAH-FI OFFICE:

Tampere University

Sanna Kumpulainen

DARIAH-FI OFFICE:

Aalto University

Eero Hyvönen

DARIAH-FI OFFICE:

University of Oulu

Marika Rauhala

DARIAH-FI OFFICE:

University of Eastern Finland

Paula Rautionaho

DARIAH-FI OFFICE:

University of Jyväskylä

Venla Poso

DARIAH-FI OFFICE:

University of Turku

Veronika Laippala