Miller – small and powerful CLI tool to do all your data processing

Miller is like awk, sed, cut, join, and sort for data formats such as CSV, TSV, JSON, JSON Lines, and positionally indexed text.

With Miller, you refer to fields by name instead of counting positional indices, whatever the input format. Then, on the fly, you can add new fields that are functions of existing fields, drop fields, sort, aggregate statistically, pretty-print, and more.
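
As a rough illustration of working with named fields (a Python sketch of the idea, not Miller's own implementation), the snippet below reads CSV by field name, adds a derived field, and sorts on it. The sample data, the column names item, price, and quantity, and the function name add_total_and_sort are all made up for the example. In Miller itself, a command along the lines of mlr --icsv --opprint put '$total = $price * $quantity' then sort -nr total example.csv should express the same thing, assuming a file example.csv with those columns.

```python
import csv
import io

# Hypothetical sample data; the column names are illustrative only.
SAMPLE = """item,price,quantity
pen,1.50,10
notebook,4.00,3
stapler,12.00,1
"""


def add_total_and_sort(csv_text):
    """Read CSV by field name, add a 'total' field, sort by it descending."""
    records = list(csv.DictReader(io.StringIO(csv_text)))
    for rec in records:
        # Fields are addressed by name, not by column position.
        rec["total"] = float(rec["price"]) * int(rec["quantity"])
    # sorted() is stable, so records with equal totals keep their input order.
    return sorted(records, key=lambda r: r["total"], reverse=True)


if __name__ == "__main__":
    for rec in add_total_and_sort(SAMPLE):
        print(rec["item"], rec["total"])
```

Because the code looks fields up by name, reordering the columns in the input would not change the result, which is the main convenience over positional tools such as cut.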

This is free and open source software.

Key Features

  • Multi-purpose: it’s useful for data cleaning, data reduction, statistical reporting, devops, system administration, log-file processing, format conversion, and database-query post-processing.
  • Extract and transform log-file data, including selecting relevant substreams, then produce CSV and load it into in-memory or data-frame tools for further statistical or graphical processing.
  • Complements data-analysis tools such as R, pandas, etc.: you can use Miller to clean and prepare your data. While you can do basic statistics entirely in Miller, its streaming-data feature and single-pass algorithms enable you to reduce very large data sets.
  • Complements SQL databases: you can slice, dice, and reformat data on the client side on its way into or out of a database. You can also reap some of the benefits of databases for quick, setup-free one-off tasks when you just need to query some data in disk files in a hurry.
  • Step fully into our modern, NoSQL world: Miller’s record-heterogeneity property lets it operate on data where records with different schemas (field names) are interleaved.
  • Streaming: most operations need only a single record in memory at a time, rather than ingesting all input before producing any output. For those operations which require deeper retention (sort, tac, stats1), Miller retains only as much data as needed. This means that whenever functionally possible, you can operate on files which are larger than your system’s available RAM, and you can use Miller in tail -f contexts.
  • Pipe-friendly and interoperates with the Unix toolkit.
  • I/O formats include tabular pretty-printing, positionally indexed (Unix-toolkit style), CSV, TSV, JSON, JSON Lines, and others.
  • Conversion between formats.
  • Processing is format-aware: e.g. CSV sort and tac keep header lines first.
  • High-throughput performance on par with the Unix toolkit.
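
The streaming and record-heterogeneity points above can be sketched in a few lines of Python. This is only an illustration of the concepts, not how Miller is implemented: the function names stream_records and with_elapsed_ms, the field names, and the sample records are hypothetical. Each record is read, transformed, and passed on one at a time, and records that lack the relevant fields pass through unchanged.

```python
import io
import json

# Hypothetical JSON Lines input in which records have different field names.
SAMPLE = """{"host": "a", "start_ms": 100, "end_ms": 250}
{"event": "restart", "host": "b"}
{"host": "c", "start_ms": 300, "end_ms": 310}
"""


def stream_records(lines):
    """Yield one record (a dict) per non-empty JSON Lines input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def with_elapsed_ms(records):
    """Add a derived field only to records that carry both input fields."""
    for rec in records:
        if "start_ms" in rec and "end_ms" in rec:
            rec["elapsed_ms"] = rec["end_ms"] - rec["start_ms"]
        yield rec


if __name__ == "__main__":
    # Generators keep only the current record in memory at any time.
    for rec in with_elapsed_ms(stream_records(io.StringIO(SAMPLE))):
        print(json.dumps(rec))
```

Replacing io.StringIO(SAMPLE) with sys.stdin would let the same pipeline run on piped or continuously growing input, since nothing here waits for the end of the input. An operation such as sort could not be written this way, because it has to see every record before emitting the first one.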

Website: github.com/johnkerl/miller
Support:
Developer: John Kerl
License: 2-clause BSD License

Help for Miller

Miller is written in Go.


Related Software

Alternatives to awk

  • gawk: Implementation of the awk programming language
  • GoAWK: POSIX-compliant awk interpreter written in Go
  • Miller: Small and powerful CLI tool to do all your data processing
  • frawk: Small programming language for writing short programs processing textual data
  • choose: Human-friendly and fast alternative to cut and (sometimes) awk
  • gema: General purpose text processing utility based on the concept of pattern matching
  • mawk: Interpreter for the awk programming language
  • ruplacer: Find and replace text in source files
  • rawk: POSIX compatible AWK written in Rust
  • wak: awk implementation for toybox and standalone
  • PAWK: Python line processor
  • Hawk: Haskell text processor for the command-line
