Chukwa – data collection system for monitoring large distributed systems

Last Updated on November 20, 2023

Chukwa is an open source data collection system for monitoring large distributed systems. It is a Hadoop subproject devoted to large-scale log collection and analysis, built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework.

Chukwa aims to provide a flexible and powerful platform for distributed data collection and rapid data processing. It is designed so that it can be adapted to use newer storage technologies (HDFS appends, HBase, etc.) as they mature.

In order to maintain this flexibility, Chukwa is structured as a pipeline of collection and processing stages, with clean and narrow interfaces between stages.

Chukwa has four primary components:

    1. Agents that run on each machine and emit data.
    2. Collectors that receive data from the agents and write it to stable storage.
    3. MapReduce jobs for parsing and archiving the data.
    4. HICC, the Hadoop Infrastructure Care Center: a web-portal style interface for displaying data. It is the central dashboard for visualizing and monitoring the metrics collected by Chukwa. Data is fetched from a MySQL database, which in turn is populated by a MapReduce job that runs on the collected data after Demux.
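
The staged design described above can be illustrated with a small, self-contained Java sketch. This is not Chukwa's actual API: the names `PipelineSketch`, `Chunk`, `Agent`, `Collector`, `InMemoryCollector`, and `countByLevel` are hypothetical, the in-memory list merely stands in for stable storage such as HDFS, and the counting method is a plain-Java stand-in for a MapReduce parsing job. It only shows how narrow interfaces between stages let each stage be swapped independently.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative only: every type here is hypothetical, not part of Chukwa.
public class PipelineSketch {

    /** A unit of collected data: the emitting host and one raw log line. */
    static final class Chunk {
        final String host;
        final String line;
        Chunk(String host, String line) { this.host = host; this.line = line; }
    }

    /** Stage 1: an agent runs on one machine and emits data. */
    interface Agent { List<Chunk> emit(); }

    /** Stage 2: a collector receives data from agents and stores it. */
    interface Collector { void receive(List<Chunk> chunks); }

    /** In-memory stand-in for stable storage (assumption: HDFS in a real setup). */
    static final class InMemoryCollector implements Collector {
        final List<Chunk> stored = new ArrayList<>();
        @Override public void receive(List<Chunk> chunks) { stored.addAll(chunks); }
    }

    /** Stage 3: stand-in for a parsing job; counts stored lines per log level. */
    static Map<String, Integer> countByLevel(List<Chunk> stored) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Chunk c : stored) {
            String level = c.line.split(" ", 2)[0]; // first token is the level
            counts.merge(level, 1, Integer::sum);
        }
        return counts;
    }

    /** Wires the stages together: agents -> collector -> parsing step. */
    static Map<String, Integer> run() {
        Agent node1 = () -> List.of(
                new Chunk("node1", "ERROR disk full"),
                new Chunk("node1", "INFO started"));
        Agent node2 = () -> List.of(new Chunk("node2", "INFO started"));

        InMemoryCollector collector = new InMemoryCollector();
        for (Agent agent : List.of(node1, node2)) {
            collector.receive(agent.emit());
        }
        return countByLevel(collector.stored);
    }

    public static void main(String[] args) {
        // Stage 4 stand-in: print the aggregated result instead of a dashboard.
        System.out.println(run());
    }
}
```

Because the parsing step depends only on the list of stored chunks, the collector could be replaced by one backed by a different storage layer without touching the agents or the analysis code, which is the kind of flexibility the pipeline structure is meant to preserve.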

Website: chukwa.apache.org
Support:
Developer: The Apache Software Foundation
License: Apache License 2.0

Chukwa is written in Java and JavaScript.


Related Software

Log Analyzers
Kibana – Browser based interface for logstash and ElasticSearch
logstash – Log processing, search, and analytics
OpenObserve – Cloud-native observability platform
GoAccess – Real-time web log analyzer and interactive viewer
Fluentd – Data collector for unified logging layer
Loki – Horizontally-scalable, highly-available, multi-tenant log aggregation system
Graylog2 – Log management solution implementation storing logs in ElasticSearch
Graphite – Enterprise scalable realtime graphing
SigNoz – Monitor your applications and troubleshoot problems
Apache Flume – Delivers data from applications to Apache Hadoop's HDFS
OpenTSDB – Scalable, distributed Time Series Database
VictoriaLogs – High-performance log database designed to ingest, store, and query log data
Scribe – Server for aggregating log data that is streamed in real time from clients
LogoRRR – Cross-platform log analysis tool
Chukwa – Hadoop sub-project devoted to large-scale log collection and analysis
