Tabula is a tool for liberating data tables locked inside PDF files through a simple web interface.
Turn PDF reports into Excel spreadsheets, CSVs, and JSON files for use in analysis and database applications.
Tabula only works on text-based PDFs, not scanned documents.
The software is written in Java.
Key Features
- Extract rows of data from PDF files into a CSV file or Microsoft Excel spreadsheet using an easy-to-use interface.
- Templates – create selections, save them, and reload those same templates for another PDF.
- Bindings for JRuby and R.
- tabula-java, the table extraction engine that powers Tabula, is also available as a library for extracting tables from PDF files.
- Support for RTL languages like Hebrew and Arabic.
- Designed with security in mind. Your PDF and the extracted data never touch the net.
- Cross-platform support – as the software is developed in Java it runs under Linux, Mac, and Windows.
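
Because tabula-java is a separate library with its own command-line front end, extraction can also be scripted without the web interface. The following is a minimal sketch, not the project's own code: it only assembles a tabula-java command line using the `--pages`, `--format`, and `--outfile` options. The jar name `tabula.jar` and the file names `report.pdf` and `report.csv` are hypothetical placeholders, and the class and method names are invented for illustration.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

class TabulaCommand {
    /**
     * Builds the argument list for one tabula-java run.
     * jarPath, pdfPath and outFile are caller-supplied paths (assumptions here).
     */
    static List<String> build(String jarPath, String pdfPath,
                              String pages, String format, String outFile) {
        List<String> cmd = new ArrayList<>();
        cmd.add("java");
        cmd.add("-jar");
        cmd.add(jarPath);
        cmd.add("--pages");   // e.g. "1-3,5" or "all"
        cmd.add(pages);
        cmd.add("--format");  // CSV, TSV or JSON
        cmd.add(format);
        cmd.add("--outfile");
        cmd.add(outFile);
        cmd.add(pdfPath);     // the input PDF goes last
        return cmd;
    }

    /** Runs the command and returns the exit code of the child process. */
    static int run(List<String> command) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(command).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical file names; this only prints the command it would run.
        List<String> cmd = build("tabula.jar", "report.pdf", "all", "CSV", "report.csv");
        System.out.println(String.join(" ", cmd));
    }
}
```

Passing the list to `run` would start the extraction, provided the jar and the PDF exist at the given paths. Keeping the command as a list rather than a single string avoids quoting problems with file names that contain spaces.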
Website: tabula.technology
Support: GitHub Code Repository
Developer: Manuel Aristarán, Mike Tigas, and Jeremy B. Merrill, with the support of ProPublica, La Nación DATA, Knight-Mozilla OpenNews, and The New York Times
License: MIT License
Tabula requires a Java Runtime Environment compatible with Java 7 (i.e. Java 7, 8 or higher).
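
To confirm that an installed runtime meets this requirement, the version can be read from the standard `java.specification.version` system property. The sketch below is an illustration only; the class and method names are invented. It assumes the usual version strings: the legacy `1.x` form used up to Java 8 and the plain major number used from Java 9 onward.

```java
class JavaVersionCheck {
    /** Returns the major Java version for strings such as "1.7", "1.8", "11" or "17". */
    static int majorVersion(String specVersion) {
        String[] parts = specVersion.split("\\.");
        int first = Integer.parseInt(parts[0]);
        // Legacy scheme: "1.7" means Java 7, "1.8" means Java 8.
        if (first == 1 && parts.length > 1) {
            return Integer.parseInt(parts[1]);
        }
        return first;
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        if (major >= 7) {
            System.out.println("Java " + major + " detected: meets the Java 7 requirement.");
        } else {
            System.out.println("Java " + major + " detected: too old, Java 7 or later is required.");
        }
    }
}
```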
Related Software
| PDF Manipulation Tool | Description |
|---|---|
| Stirling PDF | Locally hosted web based PDF manipulation tool |
| PDFsam | Extract pages, split, merge, mix and rotate PDF files |
| PDF Mix Tool | Perform common editing operations on PDF files |
| PDF Arranger | Merge, rearrange, split, rotate, and crop PDFs |
| cpdf | Set of command-line tools that let you modify PDF files |
| pdftk | The PDF toolkit |
| pstoedit | Translates PostScript and PDF graphics into other vector formats |
| img2pdf | Lossless conversion of raster images to PDF |
| PDF Chain | Graphical user interface for The PDF Toolkit |
| Tabula | Extract data tables inside PDF files |
| PDFStitcher | Utility for stitching together PDF pages |
| wkhtmltopdf | Render HTML into PDF |
| krop | Simple graphical tool to crop the pages of PDF files |
| Qpdf Tools | Qt interface for Ghostscript and QPDF |
| Quick PDF Join | Joins multiple PDF files together |
| PDF Tricks | Offer small manipulations in PDF files |
| OnePDFPlease | TUI for working with PDF files |
| PdfJumbler | Rearrange, merge, delete, and rotate pages |
| PDF Juggler | Mix, reorder and select PDF pages |
| jpeg2pdf | Command-line tool which lets you convert images to PDF |

