Correcting OCR errors for German in Fraktur font

Abstract

We present ongoing experiments on correcting OCR errors in German newspapers printed in Fraktur. Our approach borrows techniques from context-sensitive spelling correction, combining a probabilistic edit-operation error model with lexical resources. We highlight the conditions under which high error reduction rates can be obtained and report where the approach currently stands on real data.
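
To make the idea of a probabilistic edit-operation error model combined with a lexicon concrete, the following is a minimal sketch and not the method used in the paper. It scores each lexicon entry by a weighted edit distance, where each edit costs the negative log of its probability, plus the negative log of a word prior. Everything in it is a hypothetical assumption for illustration: the toy lexicon, the frequencies, the confusion probabilities, and the names `LEXICON`, `CONFUSION`, `edit_cost`, and `correct`. It also replaces the context model mentioned in the abstract with a plain unigram prior and treats a matching character as cost-free.

```python
import math

# Hypothetical toy lexicon with made-up relative frequencies (not from the paper).
LEXICON = {"haus": 0.4, "hans": 0.1, "maus": 0.2, "aus": 0.3}

# Hypothetical substitution probabilities P(observed char | intended char)
# for a few confusable pairs; the values are invented for illustration.
CONFUSION = {
    ("u", "n"): 0.05,
    ("n", "u"): 0.05,
    ("s", "f"): 0.08,
    ("f", "s"): 0.08,
}

DEFAULT_SUB_PROB = 0.001  # assumed probability of any other substitution
INDEL_PROB = 0.001        # assumed probability of an insertion or deletion


def edit_cost(intended: str, observed: str) -> float:
    """Weighted Levenshtein cost: minimal sum of -log P over edit operations."""
    indel = -math.log(INDEL_PROB)
    m, n = len(intended), len(observed)
    # d[i][j] = cheapest way to turn intended[:i] into observed[:j]
    d = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        d[i][0] = d[i - 1][0] + indel
    for j in range(1, n + 1):
        d[0][j] = d[0][j - 1] + indel
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            a, b = intended[i - 1], observed[j - 1]
            if a == b:
                sub = 0.0  # simplification: a correct character costs nothing
            else:
                sub = -math.log(CONFUSION.get((a, b), DEFAULT_SUB_PROB))
            d[i][j] = min(
                d[i - 1][j - 1] + sub,  # match or substitution
                d[i - 1][j] + indel,    # character dropped by the OCR
                d[i][j - 1] + indel,    # spurious character in the OCR output
            )
    return d[m][n]


def correct(observed: str) -> str:
    """Return the lexicon word minimising error-model cost plus prior cost."""
    return min(
        LEXICON,
        key=lambda word: edit_cost(word, observed) - math.log(LEXICON[word]),
    )
```

Under these assumed values, `correct("hauf")` returns `"haus"`, because a single likely substitution outweighs the alternatives, while `correct("hans")` keeps `"hans"`, since an exact lexicon match incurs no edit cost. A context-sensitive variant would replace the unigram prior with a score conditioned on neighbouring words.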

Publication
Proceedings of the First Italian Conference on Computational Linguistics (CLiC-it 2014)