Miller - small and powerful CLI tool to do all your data processing

Miller is like awk, sed, cut, join, and sort for data in formats such as CSV, TSV, JSON, JSON Lines, and positionally indexed text.

With Miller, you refer to fields by name instead of counting positional indices. On the fly, you can add new fields that are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more.
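To make the idea of working with named fields concrete, here is a minimal sketch in Python using only the standard library. It is not Miller's implementation or its syntax. The sample data, the field names (`quantity`, `price`, `note`, `total`), and the helper names `process` and `to_csv` are all assumptions made for illustration.

```python
import csv
import io

# Hypothetical sample data; the field names are illustrative only.
SAMPLE = """name,quantity,price,note
apple,3,0.50,fresh
pear,1,0.75,ripe
fig,10,0.20,dried
"""


def process(text):
    """Add a 'total' field, drop 'note', and sort by total, descending."""
    records = list(csv.DictReader(io.StringIO(text)))
    for rec in records:
        # New field computed from existing fields, addressed by name.
        rec["total"] = round(int(rec["quantity"]) * float(rec["price"]), 2)
        # Drop a field by name rather than by column position.
        del rec["note"]
    records.sort(key=lambda rec: rec["total"], reverse=True)
    return records


def to_csv(records):
    """Serialize records back to CSV, header line first."""
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(process(SAMPLE)), end="")
```

In Miller itself, steps like these are expressed by chaining verbs such as put, cut, and sort on the command line, so no script of this kind is needed.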

This is free and open source software.

Key Features

Multi-purpose: it’s useful for data cleaning, data reduction, statistical reporting, devops, system administration, log-file processing, format conversion, and database-query post-processing.
Extract and transform log-file data, including selecting relevant substreams, then produce CSV and load it into in-memory or data-frame tools for further statistical or graphical processing.
Complements data-analysis tools such as R and pandas: you can use Miller to clean and prepare your data. You can do basic statistics entirely in Miller, and its streaming design and single-pass algorithms let you reduce very large data sets.
Complements SQL databases: you can slice, dice, and reformat data on the client side on its way into or out of a database. You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.
Handles heterogeneous, NoSQL-style data: Miller can operate on data in which records with different schemas (field names) are interleaved.
Streaming: most operations need only a single record in memory at a time, rather than ingesting all input before producing any output. For operations that must retain more (sort, tac, stats1), Miller keeps only as much data as needed. Whenever functionally possible, you can therefore operate on files larger than your system’s available RAM, and you can use Miller in tail -f contexts.
Pipe-friendly and interoperates with the Unix toolkit.
I/O formats include tabular pretty-printing, positionally indexed (Unix-toolkit style), CSV, TSV, JSON, JSON Lines, and others.
Conversion between formats.
Processing is format-aware: e.g. CSV sort and tac keep header lines first.
High-throughput performance on par with the Unix toolkit.
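The streaming and record-heterogeneity features listed above can be illustrated with a short Python sketch. This is a conceptual illustration under stated assumptions, not Miller's code: the JSON Lines sample, the field names (`host`, `latency_ms`, `error`), and the functions `read_records` and `mean_by` are hypothetical. It reads one record at a time, tolerates records with different field names, and computes a per-group mean in a single pass while keeping only a count and a sum per group.

```python
import io
import json

# Hypothetical JSON Lines input in which records with different
# field names are interleaved.
SAMPLE = """{"host": "a", "latency_ms": 12}
{"host": "b", "error": "timeout"}
{"host": "a", "latency_ms": 20}
"""


def read_records(lines):
    """Yield one record (a dict) per non-blank line; nothing is buffered."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def mean_by(records, value_field, group_field):
    """Single-pass mean per group, storing only (count, sum) for each group."""
    acc = {}
    for rec in records:
        # Records lacking either field are skipped rather than treated as errors.
        if value_field not in rec or group_field not in rec:
            continue
        count, total = acc.get(rec[group_field], (0, 0.0))
        acc[rec[group_field]] = (count + 1, total + rec[value_field])
    return {group: total / count for group, (count, total) in acc.items()}


if __name__ == "__main__":
    print(mean_by(read_records(io.StringIO(SAMPLE)), "latency_ms", "host"))
```

Because `read_records` is a generator, memory use in this sketch depends on the number of groups, not on the number of input lines, which is the property that makes processing inputs larger than RAM feasible.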

Website: github.com/johnkerl/miller
Support:
Developer: John Kerl
License: 2-clause BSD License

Miller is written in Go.

Related Software

Alternatives to awk
gawk	Implementation of the awk programming language
GoAWK	POSIX-compliant awk interpreter written in Go
Miller	Small and powerful CLI tool to do all your data processing
frawk	Small programming language for writing short programs processing textual data
choose	Human-friendly and fast alternative to cut and (sometimes) awk
gema	General purpose text processing utility based on the concept of pattern matching
mawk	Interpreter for the awk programming language
ruplacer	Find and replace text in source files
rawk	POSIX compatible AWK written in Rust
wak	awk implementation for toybox and standalone
PAWK	Python line processor
Hawk	Haskell text processor for the command-line

