Chukwa is an open source data collection system for managing
large distributed systems. It is a Hadoop subproject devoted to
large-scale log collection and analysis built on top of the Hadoop
Distributed File System (HDFS) and Map/Reduce framework.
Chukwa aims to provide a flexible and powerful platform for
distributed data collection and rapid data processing, one that can
be adapted to use newer storage technologies (HDFS appends, HBase,
etc.) as they mature. To maintain this flexibility, Chukwa is
structured as a pipeline of collection and processing stages, with
clean and narrow interfaces between stages.
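
The idea of stages joined by narrow interfaces can be illustrated with a minimal sketch. This is not Chukwa code: the `Sink` interface, `InMemorySink`, and the stage functions are hypothetical names chosen for illustration. The point is only that when the storage backend sits behind a one-method interface, it can be swapped without touching the collection or processing stages.

```python
from typing import Iterable, Iterator, List, Protocol


class Sink(Protocol):
    """Narrow interface a storage backend must satisfy (hypothetical)."""

    def write(self, record: str) -> None: ...


class InMemorySink:
    """Stand-in for a real backend such as an HDFS or HBase writer."""

    def __init__(self) -> None:
        self.records: List[str] = []

    def write(self, record: str) -> None:
        self.records.append(record)


def collect(lines: Iterable[str]) -> Iterator[str]:
    """Collection stage: strip whitespace and drop blank lines."""
    for line in lines:
        stripped = line.strip()
        if stripped:
            yield stripped


def process(records: Iterable[str]) -> Iterator[str]:
    """Processing stage: prefix each record with its sequence number."""
    for seq, record in enumerate(records):
        yield f"{seq}\t{record}"


def run_pipeline(lines: Iterable[str], sink: Sink) -> None:
    """Wire the stages together; only `sink` knows about storage."""
    for record in process(collect(lines)):
        sink.write(record)


sink = InMemorySink()
run_pipeline(["  disk full\n", "\n", "job started\n"], sink)
print(sink.records)
```

Replacing `InMemorySink` with another class that offers the same `write` method would leave `collect`, `process`, and `run_pipeline` unchanged, which is the property the pipeline structure is meant to preserve.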
Chukwa has four primary components:
- Agents that run on each machine and emit data.
- Collectors that receive data from the agents and write it to
stable storage.
- MapReduce jobs for parsing and archiving the data.
- HICC, the Hadoop Infrastructure Care Center: a web-portal
style interface for displaying data. Data is fetched from a MySQL
database, which in turn is populated by a MapReduce job that runs on
the collected data after Demux. HICC is the central dashboard for
visualizing and monitoring the collected metrics.
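
The flow through the four components can be sketched end to end. The following is a toy model under stated assumptions, not Chukwa's implementation: the record format (`host metric value`), the table name `metrics`, and the averaging step are all hypothetical, and an in-memory SQLite database stands in for the MySQL database that HICC reads from.

```python
import sqlite3
from collections import defaultdict

# Hypothetical raw records as agents might emit them: "host metric value".
agent_output = [
    "node1 cpu 0.25",
    "node2 cpu 0.75",
    "node1 cpu 0.75",
]

# Collector stand-in: gather everything the agents sent into one store.
collected = list(agent_output)

# Demux-like step: parse each raw line into (host, metric, value).
parsed = []
for line in collected:
    host, metric, value = line.split()
    parsed.append((host, metric, float(value)))

# Follow-up job stand-in: average the values per (host, metric) pair.
sums = defaultdict(lambda: [0.0, 0])
for host, metric, value in parsed:
    sums[(host, metric)][0] += value
    sums[(host, metric)][1] += 1

db = sqlite3.connect(":memory:")  # stand-in for the MySQL database
db.execute("CREATE TABLE metrics (host TEXT, metric TEXT, avg_value REAL)")
db.executemany(
    "INSERT INTO metrics VALUES (?, ?, ?)",
    [(h, m, total / n) for (h, m), (total, n) in sums.items()],
)

# Dashboard stand-in: fetch the rows a widget would display.
rows = db.execute(
    "SELECT host, avg_value FROM metrics ORDER BY host"
).fetchall()
print(rows)
```

In a real deployment each of these steps runs as a separate component on different machines; the sketch only shows the order in which data moves from emission to display.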
- Hadoop Infrastructure Care Center:
  - Individual user view management
  - Basic drag-and-drop web portal functions
  - Multiple views per user
  - Multiple tab pages per view
  - Multiple widgets per tab page
  - Drag-and-drop re-layout of individual views
  - Basic view permission system
  - External widget integration:
    - Data integration: use HICC's existing UI components to
      display charts, tables, and graphs.
    - Direct UI integration: a component can provide a URL,
      and HICC will display the HTML page as a widget on the portal page.
- Collection components of Chukwa -- adaptors, agents, and
- Use the default
- Write your own parser
- Flexible and powerful toolkit for displaying, monitoring, and
analyzing results to make the best use of the collected data
- Uses the same configuration system as Hadoop
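
The HICC portal features listed above (multiple views per user, multiple tab pages per view, multiple widgets per tab page, and drag-and-drop re-layout) suggest a simple nested data model. The sketch below is one way to picture it; the class names, fields, and the `move_widget` helper are hypothetical and do not reflect HICC's actual schema or API. The example URL is a placeholder.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Widget:
    title: str
    # For direct UI integration, a widget only points at an external page.
    url: Optional[str] = None


@dataclass
class Tab:
    name: str
    widgets: List[Widget] = field(default_factory=list)

    def move_widget(self, old_index: int, new_index: int) -> None:
        """Re-layout: move a widget to a new position within the tab."""
        self.widgets.insert(new_index, self.widgets.pop(old_index))


@dataclass
class View:
    owner: str
    name: str
    tabs: List[Tab] = field(default_factory=list)


tab = Tab(
    "cluster",
    [
        Widget("CPU chart"),
        Widget("Disk table"),
        Widget("Job status", url="http://example.invalid/jobs"),
    ],
)
view = View(owner="alice", name="default", tabs=[tab])

# Simulate dragging the last widget to the top of the tab page.
tab.move_widget(2, 0)
print([w.title for w in view.tabs[0].widgets])
```

A user would own several `View` objects, each holding several `Tab` objects, which matches the "multiple views per user" and "multiple tab pages per view" items in the list.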
Last Updated Sunday, September 22 2013 @ 12:41 AM EDT