• HOME
  • NEWS
  • EXPLORE
    • CAREER
      • Companies
      • Jobs
    • EVENTS
    • iGEM
      • News
      • Team
    • PHOTOS
    • VIDEO
    • WIKI
  • BLOG
  • COMMUNITY
    • FACEBOOK
    • INSTAGRAM
    • TWITTER
Monday, July 21, 2025
BIOENGINEER.ORG
No Result
View All Result
  • Login
  • HOME
  • NEWS
  • EXPLORE
    • CAREER
      • Companies
      • Jobs
        • Lecturer
        • PhD Studentship
        • Postdoc
        • Research Assistant
    • EVENTS
    • iGEM
      • News
      • Team
    • PHOTOS
    • VIDEO
    • WIKI
  • BLOG
  • COMMUNITY
    • FACEBOOK
    • INSTAGRAM
    • TWITTER
  • HOME
  • NEWS
  • EXPLORE
    • CAREER
      • Companies
      • Jobs
        • Lecturer
        • PhD Studentship
        • Postdoc
        • Research Assistant
    • EVENTS
    • iGEM
      • News
      • Team
    • PHOTOS
    • VIDEO
    • WIKI
  • BLOG
  • COMMUNITY
    • FACEBOOK
    • INSTAGRAM
    • TWITTER
No Result
View All Result
Bioengineer.org
No Result
View All Result
Home NEWS Science News Science

OCR: Modern tool for old texts

Bioengineer by Bioengineer
April 23, 2019
in Science
Reading Time: 5 mins read
0
ADVERTISEMENT
Share on FacebookShare on TwitterShare on LinkedinShare on RedditShare on Telegram

IMAGE

Credit: Dresden State and University Library, CC BY-SA 4.0 https://creativecommons.org/licenses/by-sa/4.0/deed.de…

Historians and other Humanities’ scholars often have to deal with difficult research objects: centuries-old printed works that are difficult to decipher and often in an unsatisfactory state of conservation. Many of these documents have now been digitized – usually photographed or scanned – and are available online worldwide. For research purposes, this is already a step forward.

However, there is still a challenge to overcome: bringing the digitized old fonts into a modern form with text recognition software that is readable for non-specialists as well as for computers. Scientists at the Center for Philology and Digitality at Julius-Maximilians-Universität Würzburg (JMU) in Bavaria, Germany, have made a significant contribution to further development in this field.

With OCR4all, the JMU research team is making a new tool available to the scientific community. It converts digitized historical prints with an error rate of less than one percent into computer-readable texts. And it offers a graphical user interface that requires no IT expertise. With previous tools of this kind, user-friendliness was not always given as the users mostly had to work with programming commands.

Developed in cooperation with the humanities

The new OCR4all tool was developed under the direction of Christian Reul together with his computer science colleagues Professor Frank Puppe (Chair of Artificial Intelligence and Applied computer science) and Christoph Wick as well as Uwe Springmann (Digital Humanities expert) and numerous students and assistants.

OCR4all originates from the JMU Kallimachos project, which is funded by the German Federal Ministry of Education and Research. This cooperation between the Humanities and computer science will be continued and institutionalized in the newly founded JMU Center for Philology and Digitality.

In developing OCR4all, computer scientists have collaborated with the Humanities at JMU – including German and Romance studies and literature studies in the project “Narragonien digital”. The aim was to digitize the “Narrenschiff”, a moral satire by Sebastian Brant, a bestseller of the 15th century that was translated into many languages. Furthermore, OCR4all has been frequently used in the JMU’s Kolleg “Medieval and Early Modern Times”.

OCR4all is freely available to the public on the GitHub platform (with instructions and examples): https://github.com/OCR4all

Each print shop had its own font

Christian Reul explains the challenges involved in the development of OCR4all: Automatic text recognition (OCR = Optical Character Recognition) has been working very well for modern fonts for some time now. However, this has not yet been the case for historical fonts.

“One of the biggest problems was typography,” says Reul. One of the reasons for this is that the first printers of the 15th century did not use uniform fonts. “Their printing stamps were all carved by themselves, each printing house practically had its own letters.”

Error rates below one percent

Whether e or c, whether v or r – it is often not easy to distinguish in old prints, but software can learn to recognize such subtleties. To do so, it has to be trained on sample material. In his work, Reul has developed methods to make training more efficient. In a case study with six historical prints from the years 1476 to 1572, the average error rate in automatic text recognition was reduced from 3.9 to 1.7 percent.

Not only the methodology was improved, JMU computer scientist Christoph Wick has also decisively further refined the technical component by developing the Calamari OCR tool, which is also freely available and has since been fully integrated into OCR4all. Therefore, one gained even better results: Now, even for the oldest printed works, error rates of less than one percent can be achieved in general.

Lexical projects

Reul has also convinced external partners of the quality of Würzburg’s OCR research. In cooperation with the “Zentrum für digitale Lexikographie der deutschen Sprache” (Berlin), Daniel Sanders’ “Wörterbuch der deutschen Sprache” (Dictionary of the German Language) has been digitally indexed and a scientific publication on this work is currently being prepared. The various lines of this text often contain different fonts, representing different semantic information. Here, the existing approach to character recognition was extended in such a way that not only the text but also the typography and thus the complex content structure of the lexicon may be reproduced very precisely.

The computer scientist from Würzburg will soon complete his doctoral thesis, but he is also willing to proceed working with OCR in the future: “The computer science behind OCR is extremely exciting,” he says. A possible project in the near future: the creators of the “Idiotikon”, a dictionary of the Swiss-German language, have indicated their interest in collaboration since they might well need the Würzburg’s specialist knowledge.

###

Weblinks

OCR4all on GitHub (https://github.com/OCR4all)

Calamari on GitHub (https://github.com/Calamari-OCR)

Link to publication (case study with six historical books) (https://jlcl.org/content/2-allissues/1-heft1-2018/jlcl_2018-1_1.pdf)

Publication combining methodological and technical improvements (https://jlcl.org/content/2-allissues/1-heft1-2018/jlcl_2018-1_4.pdf)

Center for Philology and Digitality

The JMU Center for Philology and Digitality is the result of an initiative launched by Professors Dag Nikolaus Hasse, Fotis Jannidis and Ulrich Konrad. The Center bridges the gap between the Humanities, computer science and Digital Humanities. It represents the first building block for a new Humanities Center on the JMU North Campus.

A new building for the ZPD is to be erected there, close to the Graduate School building. From 2022 on one expects around 100 people working in the new ZPD building on a total area of 2,700 square metres. The total cost of the building is estimated at 15 million euros. A digital lab, research rooms and lecture halls are planned on the ground floor of the ZPD. The upper floors will be used primarily for offices and communication rooms.

Media Contact
Christian Reul
[email protected]

Original Source

https://www.uni-wuerzburg.de/en/news-and-events/news/detail/news/modern-tool-for-old-texts/

Tags: Computer ScienceResearch/DevelopmentSoftware EngineeringTechnology/Engineering/Computer Science
Share12Tweet8Share2ShareShareShare2

Related Posts

Five or more hours of smartphone usage per day may increase obesity

July 25, 2019
IMAGE

NASA’s terra satellite finds tropical storm 07W’s strength on the side

July 25, 2019

NASA finds one burst of energy in weakening Depression Dalila

July 25, 2019

Researcher’s innovative flood mapping helps water and emergency management officials

July 25, 2019
Please login to join discussion

POPULAR NEWS

  • Enhancing Broiler Growth: Mannanase Boosts Performance with Reduced Soy and Energy

    Enhancing Broiler Growth: Mannanase Boosts Performance with Reduced Soy and Energy

    73 shares
    Share 29 Tweet 18
  • Overlooked Dangers: Debunking Common Myths About Skin Cancer Risk in the U.S.

    53 shares
    Share 21 Tweet 13
  • New Organic Photoredox Catalysis System Boosts Efficiency, Drawing Inspiration from Photosynthesis

    54 shares
    Share 22 Tweet 14
  • IIT Researchers Unveil Flying Humanoid Robot: A Breakthrough in Robotics

    53 shares
    Share 21 Tweet 13

About

We bring you the latest biotechnology news from best research centers and universities around the world. Check our website.

Follow us

Recent News

Additive Manufacturing of Monolithic Gyroidal Solid Oxide Cells

Machine Learning Uncovers Sorghum’s Complex Mold Resistance

Pathology Multiplexing Revolutionizes Disease Mapping

  • Contact Us

Bioengineer.org © Copyright 2023 All Rights Reserved.

Welcome Back!

Login to your account below

Forgotten Password?

Retrieve your password

Please enter your username or email address to reset your password.

Log In
No Result
View All Result
  • Homepages
    • Home Page 1
    • Home Page 2
  • News
  • National
  • Business
  • Health
  • Lifestyle
  • Science

Bioengineer.org © Copyright 2023 All Rights Reserved.