6 Useful OCR Tools
Optical Character Recognition (OCR) is the conversion of
scanned images of handwritten, typewritten or printed text into
searchable, editable documents. OCR software is able to distinguish
characters from images, and one character from another.
Paper has been displaced from some activities. For
example, the vast majority of journeys on the London Underground are
made using the Oyster card, without a paper ticket being
issued. There has been talk of a paperless office for more than 40
years. However, offices have been slow to get rid of the mountain of
paper they generate. Things have changed in the past few years, with a
marked shift towards the paperless office. Paper documents contain a
wealth of important management data and information that would be
better stored electronically, and software exists that makes this
conversion possible. The benefit of scanning documents is not purely
archival. OCR technology is vital for gaining access to paper-based
information, as well as for integrating that information into digital
workflows.
Choosing the right OCR tool depends on your specific
needs. For some, online OCR services may be useful, but they raise
privacy concerns and impose file size limits. This article focuses on
desktop, open source OCR software that offers good recognition accuracy
and supports a wide range of file formats. We cover OCR engines as
well as front-end tools.
OCR software is not mainstream, so open source alternatives to
heavyweight proprietary software (such as OmniPage, ReadIRIS, CVision
PdfCompressor, or the Linux-supported ABBYY FineReader) are fairly
thin on the ground. Matters are further complicated because OCR
software needs very sophisticated algorithms to turn an image of text
into accurate, machine-readable text. The software also has to cope
with pages that contain much more than text, such as complex layouts,
images, graphics and tables, in both single-page and multi-page
documents.
Now, let's explore the 6 OCR tools at hand. For
each title we have compiled a portal page with a full description,
an in-depth analysis of its features, and links to relevant
resources and reviews.
| OCR Tools | |
| Tesseract | High-quality OCR engine |
| OCRopus | Open source document analysis and OCR system |
| Cuneiform | OCR engine that converts scanned documents into editable form |
| Lios | Linux-Intelligent-Ocr-Solution |
| OCRFeeder | Desktop OCR suite |
| GOCR | Reads images in many formats |
Last Updated Friday, April 26 2013 @ 08:26 AM EDT