Leader: Jari Ojala, Jyväskylä University; Partners: UHEL/SOC, CSC, UHEL/ARTS; Collaborators: NARC, TAU, UTU
The foreseen impact is an infrastructure that enables better use of the mass digitisation service of the National Archives of Finland, in which the printed documents that are in the possession of State authorities and are kept permanently or in long-term storage will be digitised. Once digitisation is complete, the resulting dataset will comprise no less than 130 shelf-kilometres of State authority documents dating from the early 1970s to the present day. Content recognition tools are to be used during the digitisation process. Because c. 20 million files are created annually, researchers in various fields need AI-based solutions to use this massive dataset effectively. To this end, we will further develop tools from UHEL/ARTS and UTU that recognise named entities (personal, organisational and place names) in the dataset.