Apache Flume is an open source, scalable, distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data. It has a simple and flexible architecture based on streaming data flows. It is robust and fault tolerant, with tunable reliability mechanisms and many failover and recovery mechanisms.
The system is centrally managed and allows for intelligent dynamic management. It uses a simple extensible data model that allows for online analytic applications.
The main goal of Apache Flume is to deliver data from applications to Apache Hadoop's HDFS.
Key Features
- Complex flows:
  - Build multi-hop flows where events travel through multiple agents before reaching the final destination.
  - Fan-in and fan-out flows.
  - Contextual routing.
  - Backup routes (fail-over) for failed hops.
- Channel-based transactions to guarantee reliable message delivery.
- Durable, high-performance File Channel backed by the local file system. Events are staged in the channel, which manages recovery from failure.
- ElasticSearch Sink.
- Spooling Directory Source and Client.
- Regex Extractor Interceptor.
- Load Balancing RPC client.
- Hive Sink based on the new Hive Streaming support.
- End-to-end authentication.
- Simple regex search-and-replace interceptor.
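
To make the source, channel, and sink features above more concrete, here is a minimal sketch of a single-agent configuration in Flume's properties format. It wires a Spooling Directory Source to a File Channel and an HDFS sink. The agent name (`agent1`), the component names, the host name, and every path are placeholders chosen for illustration, not values from the Flume documentation or from any real deployment.

```
# Hypothetical single-agent flow: spooling directory source -> file channel -> HDFS sink.
# agent1, spool-src, file-ch, hdfs-sink, the host name, and all paths are assumptions.
agent1.sources = spool-src
agent1.channels = file-ch
agent1.sinks = hdfs-sink

# Source: picks up files dropped into a local directory
agent1.sources.spool-src.type = spooldir
agent1.sources.spool-src.spoolDir = /var/log/app/spool
agent1.sources.spool-src.channels = file-ch

# Channel: durable, staged on the local file system
agent1.channels.file-ch.type = file
agent1.channels.file-ch.checkpointDir = /var/flume/checkpoint
agent1.channels.file-ch.dataDirs = /var/flume/data

# Sink: writes events to HDFS as a plain data stream
agent1.sinks.hdfs-sink.type = hdfs
agent1.sinks.hdfs-sink.hdfs.path = hdfs://namenode.example.com/flume/events
agent1.sinks.hdfs-sink.hdfs.fileType = DataStream
agent1.sinks.hdfs-sink.channel = file-ch
```

Assuming a standard Flume installation, a file like this would typically be started with something along the lines of `bin/flume-ng agent --conf conf --conf-file <this file> --name agent1`. Note that a source lists its `channels` (plural, since it can fan out to several), while a sink reads from exactly one `channel`.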
Website: flume.apache.org
Support: User Guide
Developer: The Apache Software Foundation
License: Apache License 2.0
Apache Flume is written in Java.
Related Software
| Log Analyzer | Description |
|---|---|
| Kibana | Browser based interface for logstash and ElasticSearch |
| logstash | Log processing, search, and analytics |
| OpenObserve | Cloud-native observability platform |
| GoAccess | Real-time web log analyzer and interactive viewer |
| Fluentd | Data collector for unified logging layer |
| Loki | Horizontally-scalable, highly-available, multi-tenant log aggregation system |
| Graylog2 | Log management solution that stores logs in ElasticSearch |
| Graphite | Enterprise-scale real-time graphing |
| SigNoz | Monitor your applications and troubleshoot problems |
| Apache Flume | Delivers data from applications to Apache Hadoop's HDFS |
| OpenTSDB | Scalable, distributed Time Series Database |
| VictoriaLogs | High-performance log database designed to ingest, store, and query log data |
| Scribe | Server for aggregating log data that is streamed in real time from clients |
| LogoRRR | Cross-platform log analysis tool |
| Chukwa | Hadoop sub-project devoted to large-scale log collection and analysis |

