Machine Learning in Linux: Surya – multilingual document OCR toolkit adds text recognition

In Operation

In our review of an earlier version, we explored the CLI. This time, let’s look at the Streamlit app. Start it with the command:

$ surya_gui

Your default web browser will open at http://localhost:8501/.
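If the browser does not open by itself, it can help to confirm that the app is actually listening before typing the address in manually. The following is only a minimal sketch using Python's standard library; the helper name wait_for_port is our own and is not part of Surya or Streamlit, and the port 8501 is simply the one shown in the URL above.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout`
    seconds have passed. Hypothetical helper, for illustration only.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Nothing listening yet (or connection refused): retry until timeout.
            if time.monotonic() >= deadline:
                return False
            time.sleep(0.2)


# Example use, after running `surya_gui` in another terminal:
#     if wait_for_port("localhost", 8501):
#         print("App is up at http://localhost:8501/")
```

The check only tells you that some process accepts connections on that port; it does not verify that the process is Surya.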

We’re going to show a very simple example: a PNG image of the first page of A Room with a View, a 1908 novel by E. M. Forster.

The right-hand side shows the uploaded image, and the left-hand side shows the OCR output. As you can see, the rendered output has some strange font sizes, but that’s not an issue. The text rendering is designed for debugging only, and how it looks has no bearing on the quality of the OCR.

Surya in action

The next image shows the OCR text. It’s this output that matters.

Surya OCR

Summary

Surya is already generating impressive results. In our tests on a variety of images, the text recognition held up well, particularly given that the software is at an early stage of development.

The latest release adds text recognition. The software works better on documents with printed text, and results can be improved by preprocessing images or by changing their resolution.
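To make the idea of preprocessing concrete, here is a minimal sketch of two common steps: stretching the contrast of a grayscale page and enlarging it by a whole-number factor. It is not Surya's own pipeline. The function names are hypothetical, and a plain list of pixel rows stands in for a real image so that the example runs with nothing but the Python standard library. In practice you would do the same thing to an image file with an imaging library.

```python
from typing import List

Grid = List[List[int]]  # rows of 8-bit grayscale values (0 = black, 255 = white)


def stretch_contrast(pixels: Grid) -> Grid:
    """Rescale values so the darkest pixel becomes 0 and the brightest 255."""
    lo = min(min(row) for row in pixels)
    hi = max(max(row) for row in pixels)
    if hi == lo:
        # A flat image has no contrast to stretch; return an unchanged copy.
        return [row[:] for row in pixels]
    # Integer arithmetic keeps the result exact and within 0..255.
    return [[(v - lo) * 255 // (hi - lo) for v in row] for row in pixels]


def upscale_nearest(pixels: Grid, factor: int) -> Grid:
    """Enlarge the image by repeating each pixel `factor` times in both directions."""
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    out: Grid = []
    for row in pixels:
        wide = [v for v in row for _ in range(factor)]  # widen the row
        out.extend(wide[:] for _ in range(factor))      # repeat it vertically
    return out


# A tiny washed-out "scan": values only span 100..200.
page = [[100, 150], [200, 100]]
cleaned = upscale_nearest(stretch_contrast(page), 2)
```

Whether steps like these help depends on the source image, so it is worth comparing the OCR output before and after.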

The software supports more than 90 languages.

Website: github.com/VikParuchuri/surya
Support:
Developer: Vik Paruchuri
License: GNU General Public License v3.0

Surya is written in Python.

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.
