Storm is an open source big-data processing system. Unlike batch-oriented systems, it is designed for distributed realtime processing, and it is language independent. This free software makes it easy to reliably process unbounded streams of data, doing for realtime processing what Hadoop did for batch processing. It is a complex-event processing system.
It uses custom-created “spouts” and “bolts” to define information sources and manipulations, allowing distributed processing of streaming data. A Storm application is designed as a “topology”: a graph of spouts and bolts connected by streams, through which data flows and is transformed. It provides functionality similar to a MapReduce job, except that a topology will, in principle, run indefinitely until it is manually terminated.
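The spout and bolt idea can be illustrated with a small sketch. This is plain Python, not Storm's API: the function names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical, and the input is a finite list so that the example terminates, whereas a real spout would emit indefinitely.

```python
from collections import Counter
from typing import Iterable, Iterator, List, Tuple


def sentence_spout(sentences: Iterable[str]) -> Iterator[Tuple[str]]:
    """Toy spout: the source of the stream, one single-field tuple per sentence."""
    for sentence in sentences:
        yield (sentence,)


def split_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str]]:
    """Toy bolt: splits each sentence tuple into one tuple per word."""
    for (sentence,) in stream:
        for word in sentence.split():
            yield (word,)


def count_bolt(stream: Iterable[Tuple[str]]) -> Iterator[Tuple[str, int]]:
    """Toy bolt: keeps a running count and emits (word, count) for every word seen."""
    counts: Counter = Counter()
    for (word,) in stream:
        counts[word] += 1
        yield (word, counts[word])


def run_topology(sentences: Iterable[str]) -> List[Tuple[str, int]]:
    """Wire spout -> split bolt -> count bolt and drain the resulting stream."""
    return list(count_bolt(split_bolt(sentence_spout(sentences))))


results = run_topology(["the cow jumped", "the moon"])
```

Each stage consumes the stream produced by the previous one, which is roughly how data moves along the edges of a topology. In Storm itself the stages run as parallel tasks spread across a cluster rather than as chained generators in one process.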
Storm has many use cases: realtime analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable, fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.
Storm integrates with any queueing system and any database system. Storm’s spout abstraction makes it easy to integrate a new queueing system. There is also a large and growing ecosystem of libraries and tools to use in conjunction with Storm.
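As a rough illustration of how a spout can wrap a queueing system, the sketch below adapts Python's standard `queue.Queue`. The class `QueueSpout` and its methods `next_tuple`, `ack`, and `fail` are hypothetical names chosen to echo the spout concept; they are not Storm's actual interface. Replaying a failed message by putting it back on the queue is one simple strategy, assumed here for illustration.

```python
import queue
import uuid
from typing import Dict, Optional, Tuple


class QueueSpout:
    """Hypothetical spout-like adapter around a queue (not Storm's API)."""

    def __init__(self, source: "queue.Queue[str]") -> None:
        self.source = source
        # Messages that were emitted but not yet acknowledged, keyed by message id.
        self.pending: Dict[str, str] = {}

    def next_tuple(self) -> Optional[Tuple[str, Tuple[str]]]:
        """Emit the next message as (message id, tuple), or None if the queue is empty."""
        try:
            message = self.source.get_nowait()
        except queue.Empty:
            return None
        msg_id = uuid.uuid4().hex
        self.pending[msg_id] = message
        return msg_id, (message,)

    def ack(self, msg_id: str) -> None:
        """The tuple was fully processed downstream: forget the message."""
        self.pending.pop(msg_id, None)

    def fail(self, msg_id: str) -> None:
        """Processing failed: put the message back on the queue so it is replayed."""
        message = self.pending.pop(msg_id, None)
        if message is not None:
            self.source.put(message)
```

A short usage example: after `q = queue.Queue(); q.put("a"); spout = QueueSpout(q)`, calling `spout.next_tuple()` returns a message id together with `("a",)`. Calling `spout.fail(msg_id)` makes the same message available again, while `spout.ack(msg_id)` drops it for good.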
Key Features
- Easy integration with any queueing and database system.
- Simple and easy to use API.
- Highly scalable – processes very high throughputs of messages with very low latency.
- Fault-tolerant – runs the topology until it is killed or the cluster is shut down.
- Uses tuples as its data model. A tuple is a named list of values, and a field in a tuple can be an object of any type.
- Guarantees every tuple will be fully processed.
- Can be used with any programming language. Spouts and bolts can be defined in any language. Non-JVM spouts and bolts communicate with Storm through a JSON-based protocol over stdin/stdout. Adapters that implement this protocol exist for Ruby, Python, JavaScript, Perl, and PHP.
- Easy to deploy, requiring a minimum of setup and configuration to get up and running.
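Two of the features above, the tuple data model and the JSON-over-stdin/stdout protocol, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not an implementation of Storm's adapters: the helper names `make_tuple`, `write_message`, and `read_message` are hypothetical, the message fields `command` and `tuple` are illustrative, and the framing (a JSON document followed by a line containing only `end`) reflects our reading of Storm's multi-language protocol and should be checked against the official documentation.

```python
import io
import json
from typing import Any, Dict, List, TextIO


def make_tuple(fields: List[str], values: List[Any]) -> Dict[str, Any]:
    """Model a tuple as a named list of values: field names paired with values."""
    if len(fields) != len(values):
        raise ValueError("each field needs exactly one value")
    return dict(zip(fields, values))


def write_message(stream: TextIO, message: Dict[str, Any]) -> None:
    """Write one JSON message, terminated by a line containing only 'end' (assumed framing)."""
    stream.write(json.dumps(message))
    stream.write("\nend\n")
    stream.flush()


def read_message(stream: TextIO) -> Dict[str, Any]:
    """Read lines until the 'end' marker, then parse the collected text as JSON."""
    lines: List[str] = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)
    raise EOFError("stream closed before the 'end' marker")


# An in-memory buffer stands in for the stdin/stdout pipes of a real process.
buffer = io.StringIO()
write_message(buffer, {"command": "emit", "tuple": ["storm", 1]})
buffer.seek(0)
message = read_message(buffer)
word_count = make_tuple(["word", "count"], message["tuple"])
```

In a real non-JVM component the same functions would read from `sys.stdin` and write to `sys.stdout`. The in-memory buffer is used here only so the sketch runs on its own.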
Website: storm.apache.org
Support: Documentation
Developer: Apache Software Foundation (originally BackType)
License: Apache License 2.0
Storm is written in Java. Learn Java with our recommended free books and free tutorials.
Related Software
| Data Analysis Tools | |
|---|---|
| Hadoop | Distributed processing of large data sets across clusters of computers |
| Storm | Distributed and fault-tolerant realtime computation |
| Drill | Distributed system for interactive analysis of large-scale datasets |
| Flink | Framework and distributed processing engine |
| Spark | Unified analytics engine for large-scale data processing |
| Pentaho | Enterprise reporting, analysis, dashboard, data mining, workflow and more |
| HPCC Systems | Designed for the enterprise to resolve Big Data challenges |
| Rapid Miner | Knowledge discovery in databases, machine learning, and data mining |

