Last Updated on November 20, 2023
Chukwa is an open source data collection system for managing large distributed systems. It is a Hadoop subproject devoted to large-scale log collection and analysis, built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework.
Chukwa aims to provide a flexible and powerful platform for distributed data collection and rapid data processing, one that can be adapted to use newer storage technologies (HDFS appends, HBase, etc.) as they mature.
In order to maintain this flexibility, Chukwa is structured as a pipeline of collection and processing stages, with clean and narrow interfaces between stages.
Chukwa has four primary components:
- Agents that run on each machine and emit data.
- Collectors that receive data from the agents and write it to stable storage.
- MapReduce jobs for parsing and archiving the data.
- HICC, the Hadoop Infrastructure Care Center: a web-portal style interface for displaying data. Data is fetched from a MySQL database, which in turn is populated by a MapReduce job that runs on the collected data after Demux. It is the central dashboard for visualizing and monitoring the metrics collected by Chukwa.
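The staged design described above can be illustrated with a toy sketch. This is not Chukwa's API: the names `Chunk`, `Agent`, `Collector`, `demux` and `parse_syslog` are hypothetical, an in-memory list stands in for HDFS, and a plain function stands in for the MapReduce parsing job. It only shows how narrow interfaces between stages let each stage be swapped independently; the display stage (HICC) is left out.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Chunk:
    """A unit of collected data (hypothetical shape, not Chukwa's own class)."""
    source: str      # host that produced the data
    data_type: str   # label used later to choose a parser
    payload: str     # raw log lines


@dataclass
class Collector:
    """Receives chunks from agents and appends them to storage (a list here)."""
    storage: List[Chunk] = field(default_factory=list)

    def receive(self, chunk: Chunk) -> None:
        self.storage.append(chunk)


@dataclass
class Agent:
    """Runs on one machine and emits data to a collector."""
    host: str
    collector: Collector

    def emit(self, data_type: str, payload: str) -> None:
        self.collector.receive(Chunk(self.host, data_type, payload))


def demux(storage: List[Chunk],
          parsers: Dict[str, Callable[[str], dict]]) -> Dict[str, List[dict]]:
    """Processing stage: parse each stored chunk into records, grouped by data type."""
    records: Dict[str, List[dict]] = {}
    for chunk in storage:
        parse = parsers.get(chunk.data_type)
        if parse is None:
            continue  # no parser registered for this data type; skip the chunk
        for line in chunk.payload.splitlines():
            records.setdefault(chunk.data_type, []).append(
                {"host": chunk.source, **parse(line)}
            )
    return records


def parse_syslog(line: str) -> dict:
    """Hypothetical parser: the first word is the level, the rest is the message."""
    level, _, message = line.partition(" ")
    return {"level": level, "message": message}


# Two agents feed one collector; the processing stage then parses what was stored.
collector = Collector()
Agent("node1", collector).emit("syslog", "INFO started\nWARN disk 91% full")
Agent("node2", collector).emit("syslog", "ERROR task failed")
records = demux(collector.storage, {"syslog": parse_syslog})
```

Because `demux` depends only on the stored chunks and a parser table, the storage backend or the parsers could be replaced without touching the agents, which is the kind of flexibility the pipeline structure is meant to preserve.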
Website: chukwa.apache.org
Developer: The Apache Software Foundation
License: Apache License 2.0

Chukwa is written in Java and JavaScript.
Related Software
| Log Analyzers | |
|---|---|
| Kibana | Browser based interface for logstash and ElasticSearch |
| logstash | Log processing, search, and analytics |
| OpenObserve | Cloud-native observability platform |
| GoAccess | Real-time web log analyzer and interactive viewer |
| Fluentd | Data collector for unified logging layer |
| Loki | Horizontally-scalable, highly-available, multi-tenant log aggregation system |
| Graylog2 | Log management solution implementation storing logs in ElasticSearch |
| Graphite | Enterprise scalable realtime graphing |
| SigNoz | Monitor your applications and troubleshoot problems |
| Apache Flume | Delivers data from applications to Apache Hadoop's HDFS |
| OpenTSDB | Scalable, distributed Time Series Database |
| VictoriaLogs | High-performance log database designed to ingest, store, and query log data |
| Scribe | Server for aggregating log data that is streamed in real time from clients |
| LogoRRR | Cross-platform log analysis tool |
| Chukwa | Hadoop sub-project devoted to large-scale log collection and analysis |

