Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed.
With Miller, you get to use named fields without needing to count positional indices, using familiar formats such as CSV, TSV, JSON, JSON Lines, and positionally-indexed. Then, on the fly, you can add new fields which are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more.
This is free and open source software.
Key Features
- Multi-purpose: it’s useful for data cleaning, data reduction, statistical reporting, devops, system administration, log-file processing, format conversion, and database-query post-processing.
- Snarf and munge log-file data, including selecting out relevant substreams, then produce CSV format and load that into all-in-memory/data-frame utilities for further statistical and/or graphical processing.
- Complements data-analysis tools such as R, pandas, etc.: you can use Miller to clean and prepare your data. While you can do basic statistics entirely in Miller, its streaming-data feature and single-pass algorithms enable you to reduce very large data sets.
- Complements SQL databases: you can slice, dice, and reformat data on the client side on its way into or out of a database. You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.
- Step fully into our modern, no-SQL world: its essential record-heterogeneity property allows Miller to operate on data where records with different schema (field names) are interleaved.
- Streaming: most operations need only a single record in memory at a time, rather than ingesting all input before producing any output. For those operations which require deeper retention (sort, tac, stats1), Miller retains only as much data as needed. This means that whenever functionally possible, you can operate on files which are larger than your system’s available RAM, and you can use Miller in tail -f contexts.
- Pipe-friendly and interoperates with the Unix toolkit.
- I/O formats include tabular pretty-printing, positionally indexed (Unix-toolkit style), CSV, TSV, JSON, JSON Lines, and others.
- Conversion between formats.
- Processing is format-aware: e.g. CSV sort and tac keep header lines first.
- High-throughput performance on par with the Unix toolkit.
Website: github.com/johnkerl/miller
Support:
Developer: John Kerl
License: 2-clause BSD License

Miller is written in Go. Learn Go with our recommended free books and free tutorials.
Related Software
| Alternatives to awk | |
|---|---|
| gawk | Implementation of the awk programming language |
| GoAWK | POSIX-compliant awk interpreter written in Go |
| Miller | Small and powerful CLI tool to do all your data processing |
| frawk | Small programming language for writing short programs processing textual data |
| choose | Human-friendly and fast alternative to cut and (sometimes) awk |
| gema | General purpose text processing utility based on the concept of pattern matching |
| mawk | Interpreter for the awk programming language |
| ruplacer | Find and replace text in source files |
| rawk | POSIX compatible AWK written in Rust |
| wak | awk implementation for toybox and standalone |
| PAWK | Python line processor |
| Hawk | Haskell text processor for the command-line |
Explore our comprehensive directory of recommended free and open source software. Our carefully curated collection spans every major software category.This directory is part of our ongoing series of informative articles for Linux enthusiasts. It features hundreds of detailed reviews, along with open source alternatives to proprietary solutions from major corporations such as Google, Microsoft, Apple, Adobe, IBM, Cisco, Oracle, and Autodesk. You’ll also find interesting projects to try, hardware coverage, free programming books and tutorials, and much more. Discovered a useful open source Linux program that we haven’t covered yet? Let us know by completing this form. |

