Tabula – extract data tables inside PDF files

Tabula is a tool for liberating data tables locked inside PDF files through a simple web interface.

Turn PDF reports into Excel spreadsheets, CSVs, and JSON files for use in analysis and database applications.
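Once a table has been exported, using it in analysis is ordinary CSV handling. The following is a minimal sketch, assuming a comma-separated export with a header row. The class name, method names, column layout, and sample values are all invented for illustration and are not part of Tabula; a real project would more likely rely on an established CSV library.

```java
import java.util.ArrayList;
import java.util.List;

// Minimal sketch of consuming a CSV export. All names and sample data are hypothetical.
class CsvExportReader {

    // Splits one CSV line into fields, honouring double-quoted fields and
    // doubled quotes ("") inside them. Newlines inside fields are not handled.
    static List<String> parseLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote inside a quoted field
                        i++;
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString());
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());
        return fields;
    }

    // Sums a numeric column, skipping the header row at index 0.
    // Thousands separators such as "1,250.50" are stripped before parsing.
    static double sumColumn(List<String> lines, int column) {
        double total = 0;
        for (int i = 1; i < lines.size(); i++) {
            List<String> row = parseLine(lines.get(i));
            total += Double.parseDouble(row.get(column).replace(",", ""));
        }
        return total;
    }

    public static void main(String[] args) {
        // Invented sample rows standing in for an exported table.
        List<String> lines = new ArrayList<>();
        lines.add("Region,Units,Revenue");
        lines.add("North,12,\"1,250.50\"");
        lines.add("South,8,900.25");
        System.out.println("Total revenue: " + sumColumn(lines, 2));
    }
}
```

The quoted-field handling matters because numeric cells in reports often contain commas, which a plain `split(",")` would break apart.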

Tabula only works on text-based PDFs, not scanned documents.

The software is written in Java.

Key Features

  • Extract rows of data from PDF files into a CSV or Microsoft Excel spreadsheet using a simple, easy-to-use interface.
  • Templates – create selections, save them, and reload those same templates for another PDF.
  • Bindings for JRuby and R.
  • tabula-java is a library for extracting tables from PDF files — it’s the table extraction engine that powers Tabula.
  • Support for RTL languages like Hebrew and Arabic.
  • Designed with security in mind. Your PDF and the extracted data never touch the net.
  • Cross-platform support – as the software is developed in Java it runs under Linux, Mac, and Windows.

Website: tabula.technology
Support: GitHub Code Repository
Developer: Manuel Aristarán, Mike Tigas and Jeremy B. Merrill, with the support of ProPublica, La Nación DATA, Knight-Mozilla OpenNews, and The New York Times
License: MIT License

Tabula requires a Java Runtime Environment compatible with Java 7 (i.e. Java 7, 8 or higher).
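To see whether an installed runtime meets that minimum, the JVM can be asked for its specification version. The following is a minimal sketch: the class and method names are hypothetical and not part of Tabula. It relies on the standard `java.specification.version` system property, which reports values such as "1.7" or "1.8" on older runtimes and "9", "11", "17" on newer ones.

```java
// Minimal sketch: check that the running JVM satisfies a minimum major version.
// The class and method names are hypothetical; they are not part of Tabula.
class JavaVersionCheck {

    // Minimum major version taken from the requirement stated above.
    static final int REQUIRED_MAJOR = 7;

    // Converts a "java.specification.version" style string to a major version.
    // Java 8 and earlier use the "1.x" scheme; Java 9 and later drop the "1." prefix.
    static int majorVersion(String specVersion) {
        String v = specVersion.trim();
        if (v.startsWith("1.")) {
            v = v.substring(2);          // "1.8" becomes "8"
        }
        int dot = v.indexOf('.');
        if (dot >= 0) {
            v = v.substring(0, dot);     // "17.0.1" becomes "17"
        }
        return Integer.parseInt(v);
    }

    // True when the given version string is at least the required major version.
    static boolean isSupported(String specVersion) {
        return majorVersion(specVersion) >= REQUIRED_MAJOR;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        System.out.println("Java specification version: " + spec);
        System.out.println(isSupported(spec)
                ? "This runtime meets the Java 7 minimum."
                : "This runtime is older than Java 7.");
    }
}
```

This only checks the minimum stated here; a particular Tabula release may document a different requirement, so the project's own notes take precedence.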


Related Software

PDF Manipulation Tools
Stirling PDF – Locally hosted web based PDF manipulation tool
PDFsam – Extract pages, split, merge, mix and rotate PDF files
PDF Mix Tool – Perform common editing operations on PDF files
PDF Arranger – Merge, rearrange, split, rotate, and crop PDFs
cpdf – Set of command-line tools that let you modify PDF files
pdftk – The PDF toolkit
pstoedit – Translates PostScript and PDF graphics into other vector formats
img2pdf – Lossless conversion of raster images to PDF
PDF Chain – Graphical user interface for The PDF Toolkit
Tabula – Extract data tables inside PDF files
PDFStitcher – Utility for stitching together PDF pages
wkhtmltopdf – Render HTML into PDF
krop – Simple graphical tool to crop the pages of PDF files
Qpdf Tools – Qt interface for Ghostscript and QPDF
Quick PDF Join – Joins multiple PDF files together
PDF Tricks – Offer small manipulations in PDF files
OnePDFPlease – TUI for working with PDF files
PdfJumbler – Rearrange, merge, delete, and rotate pages
PDF Juggler – Mix, reorder and select PDF pages
jpeg2pdf – Command-line tool which lets you convert images to PDF

Read our verdict in the software roundup.

