About the project
Despite major advances in artificial intelligence, computational linguistics and neural networks, a general handwriting recognition system of acceptable quality does not yet exist for Norwegian. There are only specialized systems, which recognize handwriting with sufficient quality only for writers included in the training set.
Intermediate goals are to further improve the specialized recognition for writers in the training set, to increase the number of writers in the training set, and to automate the training process as much as possible.
The following steps will be used to achieve the goals:
-Building on existing systems to create a robust layout system (i.e. one that finds text lines) that can adapt to a new writer's style.
-Using and adapting state-of-the-art neural network technology for character recognition.
-Utilizing advanced linguistics for historical Norwegian to improve the recognition.
-Incorporating novel techniques such as generating artificial documents that mimic a writer's handwriting (using generative adversarial networks, GANs) but have known content, so that they can be used for training without any manual effort. A trainable feature-based method ("zero-shot word spotting") will also be used to recognize words and augment the results from the other processing steps.
-Generating a large training set with a diverse set of writing styles while minimizing the manual effort needed for transcription.
The project will place great emphasis on testing and on analysing the test results, feeding them back into development to track progress and identify issues that need special attention.
The project period
From the starting date: 01.10.2021
To the date of completion: 01.02.2025
Project type
Collaborative Project to Meet Societal and Industry-related Challenges
Funding
From Research Council of Norway: 11 995 kNOK. Total for the project: 15 366 kNOK
Partners
HØGSKOLEN I ØSTFOLD
NASJONALBIBLIOTEKET
TIDVIS AS
ANAHIT AS
TEKLIA