About the Project
The coordination project OCR-D aims to develop methods of Optical Character Recognition (OCR) for printed historical material.
To this end, existing workflows and methods of automatic text recognition are examined, described, and optimized. An important goal is to lay the conceptual groundwork for converting German-language prints from the 16th to the 19th century into machine-readable full text.
The project partners are the Herzog August Bibliothek in Wolfenbüttel, the Berlin-Brandenburg Academy of Sciences and Humanities (in particular the German Text Archive in Berlin), and the Berlin State Library.
In addition, leading experts, scholars, and libraries in the fields of digitization and digital humanities will support the project.
In recent years, academic libraries in particular have digitized large holdings of historical material and made the images available online. An OCR process can automatically generate searchable full text from this image data. The added value of such digital texts is now indispensable in many academic disciplines, particularly in the humanities.
To date, however, access to the electronic full text is often impossible or inadequate, even though many historical documents are accessible online through the "Bibliography of Books Printed in the German Speaking Countries from the 16th-18th century".
Results from established OCR methods have so far been insufficient for recognizing old typefaces, especially Gothic types. This is where the work of OCR-D comes in. The aim is to examine existing tools and recent studies, to describe whether and how the latest research results are used in established OCR processes, and to determine how these tools and findings can be used to develop the OCR process for the mass digitization of printed historical material.
The project is funded by the German Research Foundation (DFG) and runs for three years, until September 2018. In the first phase, requirements are identified and the second phase is prepared conceptually. In the second phase, calls for tenders for pilot projects will be issued, allowing other institutions to participate. These pilot projects should find answers to the technical and organizational challenges we face today.
At every stage, we welcome active exchange with colleagues from related projects and institutions as well as with service providers.
At the end of the overall project, we will present a consolidated concept for the OCR processing of digitized printed heritage from the 16th to the 19th century.