Excellent Utilities: Paperwork – personal document manager

This is the third in a new series highlighting best-of-breed utilities. We’ll be covering a wide range of utilities including tools that boost your productivity, help you manage your workflow, and lots more besides. For this article, we’ll put Paperwork under the spotlight.

Paperwork is designed to simplify the management of your paperwork. The software lets you scan or import your documents, and quickly find what you want, wrapped together in a GTK interface.

Paperwork relies on a raft of open source projects for its core functionality. Specifically, it uses SANE/Pyinsane to scan pages (Libinsane, the successor to Pyinsane, is in development). Optical character recognition is handled by Tesseract/Pyocr. Whoosh indexes and searches documents, Simplebayes handles label identification, and Pillow/Libpillowfight take care of image manipulation. Libpoppler provides PDF support.
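
To give a feel for how Bayesian label identification can work, here is a minimal sketch in plain Python. It is not Paperwork's code and does not use the Simplebayes API; the `LabelGuesser` class, its method names, and the sample training texts are all hypothetical, chosen only to illustrate the general naive Bayes technique on text that OCR might produce.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


class LabelGuesser:
    """Toy multinomial naive Bayes classifier with add-one smoothing.

    Hypothetical illustration only; not the Simplebayes API.
    """

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # label -> word -> count
        self.doc_counts = Counter()              # label -> number of documents
        self.vocabulary = set()                  # every word seen in training

    def train(self, label, text):
        """Record the words of one document that carries the given label."""
        words = tokenize(text)
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocabulary.update(words)

    def score(self, text):
        """Return a log-probability score per label (higher is more likely)."""
        words = tokenize(text)
        total_docs = sum(self.doc_counts.values())
        scores = {}
        for label, counts in self.word_counts.items():
            total_words = sum(counts.values())
            # Start from the prior: how common the label is overall.
            log_prob = math.log(self.doc_counts[label] / total_docs)
            for word in words:
                # Add-one smoothing keeps unseen words from zeroing the score.
                log_prob += math.log(
                    (counts[word] + 1) / (total_words + len(self.vocabulary))
                )
            scores[label] = log_prob
        return scores

    def guess(self, text):
        """Return the best-scoring label, or None if nothing was trained."""
        scores = self.score(text)
        return max(scores, key=scores.get) if scores else None


# Made-up training data standing in for OCR output of labelled documents.
guesser = LabelGuesser()
guesser.train("invoice", "invoice total amount due payment")
guesser.train("invoice", "payment reminder invoice number")
guesser.train("bank", "bank statement account balance")
guesser.train("bank", "account transfer bank interest")

print(guesser.guess("please settle the invoice amount"))  # prints: invoice
```

The more documents you label by hand, the more word counts a classifier of this kind has to draw on, so its guesses for new documents tend to improve over time.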

The software is written in Python and is cross-platform, with both Linux and Windows supported.

Installation

You can install the software using Flatpak, a technology for building and distributing desktop applications on Linux. Installing with Flatpak means Paperwork runs in a container, so you have to ensure the scanning daemon is enabled on the host system and accepts connections from 127.0.0.1.
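
If scanning does not work from the Flatpak build, a quick check of whether anything is listening on the loopback address can help narrow things down. The sketch below assumes the scanning daemon is SANE's saned on its registered TCP port 6566; neither the daemon name nor the port is stated in this article, so adjust them for your system. The function name `saned_reachable` is made up for illustration.

```python
import socket


def saned_reachable(host="127.0.0.1", port=6566, timeout=1.0):
    """Return True if a TCP connection to host:port can be opened.

    Port 6566 is an assumption (the registered port for SANE's network
    daemon); change it if your scanning daemon listens elsewhere.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or the host is unreachable.
        return False


if __name__ == "__main__":
    if saned_reachable():
        print("A service is accepting connections on 127.0.0.1:6566")
    else:
        print("Nothing is listening on 127.0.0.1:6566; check the scanning daemon")
```

A successful connection only shows that something accepts connections on that port. It does not confirm that the daemon's access rules let the container use the scanner.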

The simplest way to install the software is arguably with an unofficial package. There are packages available for Debian / Ubuntu, Fedora, Gentoo, Arch Linux, and other distributions.

The full source code is available too.

Complete list of articles in this series:

Excellent Utilities
tmux – A terminal multiplexer that offers a massive boost to your workflow
lnav – Advanced log file viewer for the small-scale; great for troubleshooting
Paperwork – Designed to simplify the management of your paperwork
Abricotine – Markdown editor with inline preview functionality
mdless – Formatted and highlighted view of Markdown files
fkill – Kill processes quick and easy
Tusk – An unofficial Evernote client with bags of potential
Ulauncher – Sublime application launcher
McFly – Navigate through your bash shell history
LanguageTool – Style and grammar checker for 30+ languages
peco – Simple interactive filtering tool that's remarkably useful
Liquid Prompt – Adaptive prompt for Bash & Zsh
Ananicy – Shell daemon created to manage processes’ IO and CPU priorities
cheat.sh – Community driven unified cheat sheet
ripgrep – Recursively search directories for a regex pattern
exa – A turbo-charged alternative to the venerable ls command
OCRmyPDF – Add OCR text layer to scanned PDFs
Watson – Track the time spent on projects
fontpreview – Quickly search and preview fonts
fd – Wonderful alternative to the venerable find
scrcpy – Display and control Android devices
duf – Disk usage utility with more polished presentation than the classic df
tldr – Simplified and community-driven man pages
5 comments

  1. It seems like such a good idea, but on my Ryzen 2700X with GeForce GTX 1080 Ti, it is impossibly slow on documents of a few hundred pages. I can’t get the cut and paste to work either.

  2. Smooth GUI, but not intuitive and most of the time it is not clear what it is doing. I cannot tell when OCR was successful, little to no progress indication on most actions. It has a lot of potential, but also a lot of potential for improvement.
