OCRopus is an OCR system that focuses on the use of large-scale
machine learning to address problems in document analysis.
Current releases are written in Python, using NumPy and SciPy.
OCRopus can be used from the command line or inside gscan2pdf.
OCRopus is aimed primarily at high-volume document conversion,
notably for Google Book Search, but also at desktop and office use
and at use by visually impaired people. Earlier releases were written
mostly in C++, with some Python, and used a build system based on jam.
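
As a rough illustration of command-line use, the sketch below assembles the four-stage pipeline described in the Python release's README: binarization, page segmentation, line recognition, and hOCR output. The script names (ocropus-nlbin, ocropus-gpageseg, ocropus-rpred, ocropus-hocr) come from that release. The input file, working directory, model path, and the helper functions build_pipeline and run_pipeline are placeholders chosen for this example, and the exact flags may differ between versions.

```python
import shutil
import subprocess


def build_pipeline(image, workdir="book", model="models/en-default.pyrnn.gz"):
    """Return the four pipeline commands as argument lists, without running them.

    The file patterns are passed through literally; this sketch assumes the
    OCRopus scripts expand them on their own, as the README examples suggest.
    """
    pages = f"{workdir}/????.bin.png"          # binarized page images
    lines = f"{workdir}/????/??????.bin.png"   # segmented text-line images
    return [
        ["ocropus-nlbin", image, "-o", workdir],           # binarize the scan
        ["ocropus-gpageseg", pages],                       # split pages into lines
        ["ocropus-rpred", "-m", model, lines],             # recognize each line
        ["ocropus-hocr", pages, "-o", f"{workdir}.html"],  # assemble hOCR output
    ]


def run_pipeline(image, **kwargs):
    """Run the commands in order, stopping at the first failure."""
    commands = build_pipeline(image, **kwargs)
    if shutil.which(commands[0][0]) is None:
        raise RuntimeError("OCRopus scripts were not found on PATH")
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands instead of running them, so the sketch is safe
    # to try on a machine without OCRopus installed.
    for cmd in build_pipeline("page.png"):
        print(" ".join(cmd))
```

Keeping command construction separate from execution makes it possible to inspect or log the commands before anything touches the scans.
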
Features:
- Pluggable layout analysis
- Pluggable character recognition
- Pluggable language modeling
- Text line recognizer based on recurrent neural networks
  (does not require language modeling)
- Models for both Latin script and Fraktur
- Tools for ground truth labeling
- Sample scripts illustrating recognition and training
- Layout analysis plugin that performs image preprocessing and layout analysis
- Unicode and ligature support
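
The "pluggable" stages listed above can be pictured with a small sketch. Everything in it is hypothetical: the type aliases, the Pipeline class, and the toy stand-in functions are invented for illustration and are not the OCRopus API. A page is represented by a plain string rather than an image so that the example runs on its own. The language model stage is optional, mirroring the note that the line recognizer does not require one.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional

# Hypothetical stage signatures (not part of OCRopus).
LayoutAnalyzer = Callable[[str], List[str]]  # page -> text lines
LineRecognizer = Callable[[str], str]        # one line -> raw text
LanguageModel = Callable[[str], str]         # raw text -> corrected text


@dataclass
class Pipeline:
    """Chains interchangeable stages; any stage can be swapped out."""
    layout: LayoutAnalyzer
    recognizer: LineRecognizer
    language_model: Optional[LanguageModel] = None  # optional stage

    def run(self, page: str) -> List[str]:
        results = []
        for line in self.layout(page):
            text = self.recognizer(line)
            if self.language_model is not None:
                text = self.language_model(text)
            results.append(text)
        return results


# Toy stand-ins so the sketch runs without any image data.
def split_lines(page: str) -> List[str]:
    """Pretend layout analysis: keep the non-blank lines of a string."""
    return [line for line in page.splitlines() if line.strip()]


def echo_recognizer(line: str) -> str:
    """Pretend recognition: return the line with whitespace trimmed."""
    return line.strip()


def fix_zero_for_o(text: str) -> str:
    """Pretend language model: repair one common digit/letter confusion."""
    return text.replace("0CR", "OCR")


if __name__ == "__main__":
    page = "0CR text\n\n  line two "
    print(Pipeline(split_lines, echo_recognizer).run(page))
    print(Pipeline(split_lines, echo_recognizer, fix_zero_for_o).run(page))
```

Passing stages in as plain callables keeps each one replaceable independently, which is the property the feature list describes.
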
Last Updated Tuesday, May 06 2014 @ 02:06 PM EDT