OpenRefine is a powerful tool for working with messy data: cleaning it; transforming it from one format into another; and extending it with web services and external data.
It resembles a spreadsheet application and can handle spreadsheet file formats such as CSV, but it behaves more like a database. Unlike spreadsheets, OpenRefine doesn’t store formulas and display the output of those calculations; it only shows the value inside each cell. It doesn’t support cell colors or text formatting.
OpenRefine lets users clean, correct, codify, and extend data. Without ever needing to type inside a single cell, users can automatically fix typos, convert things to the right format, and add structured categories from trusted sources.
This is free and open source software.
Key Features
- Faceting – drill through large datasets using facets and apply operations on filtered views of your dataset.
- Clustering – use a variety of comparison methods to find text entries that are similar but not identical, then review the results and merge the cells that should match.
- Transformation of data – convert values to other formats, normalizing and denormalizing them as needed.
- Reconciliation – match your dataset against that of an external source.
- Infinite undo/redo – go back to any previous state of your dataset and replay your operation history.
- Privacy – data is cleaned locally. It doesn’t require internet access to run its basic functions.
- Export – CSV, Excel, Google Sheets, HTML table, and TSV.
- Import – CSV, Google Sheets, JSON, RDF triples, TSV, and XML.
- Cross-platform support – runs under Linux, macOS, and Windows.
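
To give a feel for what the clustering feature does, here is a minimal Python sketch of key-collision clustering: each value is reduced to a normalized key, and values that share a key are grouped as merge candidates. This is an illustration only, not OpenRefine's own code. The function names `fingerprint` and `cluster` and the sample data are assumptions made for this example, and the normalization steps are a simplified approximation of the idea.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a string to a normalized key (hypothetical helper for this sketch)."""
    # Trim surrounding whitespace and lowercase
    text = value.strip().lower()
    # Decompose accented characters, then drop anything outside ASCII
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Remove ASCII punctuation
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split on whitespace, drop duplicate tokens, and sort them
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values whose keys collide; keep only groups with more than one member."""
    groups = defaultdict(set)
    for v in values:
        groups[fingerprint(v)].add(v)
    return [sorted(g) for g in groups.values() if len(g) > 1]


# Made-up sample data for illustration
names = ["Acme Inc.", "acme inc", "Inc. Acme", "Café Olé", "cafe ole", "Globex"]
clusters = cluster(names)
print(clusters)
```

In this sketch the three spellings of "Acme Inc." end up in one group and the two spellings of "Café Olé" in another, while "Globex" has no near-duplicate and is left out. In OpenRefine itself, the user reviews such groups and chooses which to merge.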
Website: openrefine.org
Support: GitHub Code Repository
Developer: Community
License: BSD 3-Clause “New” or “Revised” License

OpenRefine is written in Java. Learn Java with our recommended free books and free tutorials.
Related Software
| Python Data Validation | |
|---|---|
| Pydantic | Data validation using Python type hints |
| pandera | Framework for precision data testing |
| jsonschema | Implementation of JSON Schema for Python |
| Cerberus | Lightweight and extensible data validation library |
| schema | Library for validating Python data structures |
| GX | Validating, documenting, and profiling data |
| marshmallow | ORM/ODM/framework-agnostic library for converting complex datatypes to and from native Python datatypes |
| Voluptuous | Python data validation library |
| Schematics | Combine types into structures, validate, and transform the shapes of data |
| Colander | Serialization / deserialization / validation library |
| Valideer | Lightweight data validation and adaptation Python library |
| OpenRefine | Desktop program for data cleanup and transformation |
| Soda Core | Data quality and data contract verification engine |
| OpenMetadata | Unified metadata platform |
| Elementary OSS | dbt-native data observability command-line tool |
Read our verdict in the software roundup.

