
Storm – big-data processing system

Storm is an open source big-data processing system that differs from many other systems: it is designed for distributed real-time processing and is language independent. This free software makes it easy to reliably process unbounded streams of data, doing for real-time processing what Hadoop did for batch processing. It can also be used as a complex-event processing system.

It uses custom-created “spouts” and “bolts” to define, respectively, sources of information and the manipulations applied to them, allowing distributed processing of streaming data. A Storm application is designed as a “topology”: a graph of spouts and bolts through which data flows as “streams” of tuples that undergo successive transformations. It provides functionality similar to a MapReduce job, except that a topology will, in principle, run indefinitely until it is manually terminated.
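The spout/bolt vocabulary is easier to follow with a concrete example. Below is a minimal single-process sketch of a word-count topology in Python. It does not use Storm's API. `StreamTuple`, `SentenceSpout`, `SplitBolt`, `CountBolt` and `run_topology` are hypothetical names invented for illustration, and everything runs in one thread rather than across a cluster. The sketch also shows the tuple data model, in which values are looked up by field name.

```python
from collections import Counter


class StreamTuple:
    """A named list of values: field names paired positionally with values."""

    def __init__(self, fields, values):
        if len(fields) != len(values):
            raise ValueError("fields and values must have the same length")
        self.fields = list(fields)
        self.values = list(values)

    def get(self, name):
        # Look a value up by field name rather than by position.
        return self.values[self.fields.index(name)]


class SentenceSpout:
    """Hypothetical spout: emits one sentence per call, None once drained."""

    def __init__(self, sentences):
        self._sentences = iter(sentences)

    def next_tuple(self):
        sentence = next(self._sentences, None)
        if sentence is None:
            return None
        return StreamTuple(["sentence"], [sentence])


class SplitBolt:
    """Hypothetical bolt: turns one sentence tuple into one tuple per word."""

    def execute(self, tup):
        return [StreamTuple(["word"], [w]) for w in tup.get("sentence").split()]


class CountBolt:
    """Hypothetical terminal bolt: keeps a running count per word."""

    def __init__(self):
        self.counts = Counter()

    def execute(self, tup):
        self.counts[tup.get("word")] += 1
        return []  # emits nothing further downstream


def run_topology(spout, bolts):
    """Push every spout tuple through a linear chain of bolts.

    A real Storm topology keeps running until it is killed; this loop
    stops when the spout is drained so that the example terminates.
    """
    while True:
        tup = spout.next_tuple()
        if tup is None:
            return
        batch = [tup]
        for bolt in bolts:
            batch = [out for t in batch for out in bolt.execute(t)]


if __name__ == "__main__":
    counter = CountBolt()
    run_topology(SentenceSpout(["the cow jumped", "over the moon"]),
                 [SplitBolt(), counter])
    print(counter.counts.most_common(1))  # [('the', 2)]
```

In real Storm, spouts and bolts are wired together with a topology builder (`TopologyBuilder` in the Java API) and run in parallel across worker processes. The linear chain here only mirrors the direction of data flow.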

Storm has many use cases: real-time analytics, online machine learning, continuous computation, distributed RPC, ETL, and more. Storm is fast: a benchmark clocked it at over a million tuples processed per second per node. It is scalable and fault-tolerant, guarantees your data will be processed, and is easy to set up and operate.

Storm integrates with any queueing system and any database system. Its spout abstraction makes it easy to integrate a new queueing system. Storm also has a large and growing ecosystem of libraries and tools that can be used alongside it.
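Here is one way a spout can wrap a queueing system, as a minimal Python sketch around the standard library's `queue.Queue`. `QueueSpout` and its methods are hypothetical. They mirror the next-tuple / ack / fail shape of Storm's spout contract (in Storm's Java API these are `nextTuple`, `ack` and `fail`), but this is not Storm's API. Each message stays "in flight" until it is acked. A failed message goes back on the queue to be replayed, which is one way a spout can support the processing guarantee.

```python
import itertools
import queue


class QueueSpout:
    """Hypothetical spout wrapping an in-process queue.Queue (not Storm's API)."""

    def __init__(self, source):
        self.source = source            # any queue.Queue of messages
        self.in_flight = {}             # msg_id -> message awaiting ack or fail
        self._ids = itertools.count(1)  # simple increasing message ids

    def next_tuple(self):
        """Return (msg_id, values), or None if the queue is empty right now."""
        try:
            message = self.source.get_nowait()
        except queue.Empty:
            return None
        msg_id = next(self._ids)
        self.in_flight[msg_id] = message
        return msg_id, [message]

    def ack(self, msg_id):
        # Fully processed downstream: stop tracking it.
        self.in_flight.pop(msg_id)

    def fail(self, msg_id):
        # Failed or timed out: put it back on the queue so it is emitted again.
        self.source.put(self.in_flight.pop(msg_id))


if __name__ == "__main__":
    q = queue.Queue()
    q.put("order-1")
    spout = QueueSpout(q)
    msg_id, values = spout.next_tuple()
    spout.fail(msg_id)          # simulate a downstream failure
    print(spout.next_tuple())   # (2, ['order-1']) -- replayed under a new id
```

Because the spout holds every unacked message, a real spout also needs to bound how many are in flight. Storm exposes this through the `topology.max.spout.pending` setting.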

Key Features

  • Easy integration with any queueing and database system.
  • Simple, easy-to-use API.
  • Highly scalable – handles very high message throughput with very low latency.
  • Fault-tolerant – automatically restarts workers that die, reassigning them to other nodes if necessary, so a topology keeps running until it is killed or the cluster is shut down.
  • Uses tuples as its data model. A tuple is a named list of values, and a field in a tuple can be an object of any type.
  • Guarantees every tuple will be fully processed.
  • Can be used with any programming language. Spouts and bolts can be defined in any language: non-JVM spouts and bolts communicate with Storm using a JSON-based protocol over stdin/stdout. Adapters that implement this protocol exist for Ruby, Python, JavaScript, Perl, and PHP.
  • Easy to deploy, requiring a minimum of setup and configuration to get up and running.
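Storm's documentation on guaranteeing message processing describes how an "acker" can track an entire tuple tree with a single value updated by XOR. Below is a simplified single-process Python model of that idea. `AckTracker` and its methods are hypothetical names, and this is not Storm's implementation. Every tuple id is XORed in once when the tuple is created and once when it is acked, so the running value returns to zero exactly when the whole tree has been processed.

```python
import random


class AckTracker:
    """Simplified model of XOR-based tuple-tree tracking (hypothetical names).

    For each root (spout) tuple we keep one integer: the XOR of every tuple id
    created in its tree and every tuple id acked. Each id is XORed in twice,
    once when created and once when acked, so the value returns to 0 exactly
    when the whole tree has been processed (barring an unlikely id collision).
    """

    def __init__(self):
        self.pending = {}  # root tuple id -> running XOR value

    @staticmethod
    def new_id():
        # Random 64-bit ids make an accidental zero extremely unlikely.
        return random.getrandbits(64)

    def emit_root(self, root_id):
        # The spout emitted a new root tuple: XOR its id in.
        self.pending[root_id] = root_id

    def ack(self, root_id, acked_id, child_ids=()):
        """Record that `acked_id` was processed and anchored `child_ids`.

        Returns True once the root's whole tuple tree has been processed.
        """
        value = self.pending[root_id] ^ acked_id
        for child in child_ids:
            value ^= child
        if value == 0:
            del self.pending[root_id]
            return True
        self.pending[root_id] = value
        return False


if __name__ == "__main__":
    tracker = AckTracker()
    root, word1, word2 = (AckTracker.new_id() for _ in range(3))
    tracker.emit_root(root)
    tracker.ack(root, root, [word1, word2])  # split bolt acks root, emits 2 words
    tracker.ack(root, word1)                 # count bolt acks the first word
    print(tracker.ack(root, word2))          # True: tree fully processed
```

Storm keeps one such fixed-size value per spout tuple, however large the tree grows. That small memory cost is what makes tracking every tuple practical.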
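The JSON-over-stdin/stdout protocol lets a non-JVM bolt talk to Storm. As we understand Storm's multilang documentation, each message is a JSON object followed by a line containing only `end`, and a bolt answers an incoming tuple with `emit` and `ack` commands. The Python sketch below shows only that framing. The helper names are hypothetical. The initial handshake (reporting the process PID), heartbeats, and error reporting are all omitted, so for real work the existing adapters listed above are the better choice.

```python
import io
import json


def write_message(out, obj):
    """Write one message: the JSON encoding followed by a line containing 'end'."""
    out.write(json.dumps(obj) + "\n")
    out.write("end\n")
    out.flush()


def read_message(inp):
    """Read lines up to the 'end' delimiter and decode them as one JSON object."""
    lines = []
    while True:
        line = inp.readline()
        if not line:
            raise EOFError("stream closed before the 'end' delimiter")
        line = line.rstrip("\n")
        if line == "end":
            return json.loads("\n".join(lines))
        lines.append(line)


def handle_bolt_tuple(msg):
    """Toy bolt logic: upper-case the first value, emit it anchored, then ack."""
    emit = {"command": "emit", "anchors": [msg["id"]],
            "tuple": [str(msg["tuple"][0]).upper()]}
    ack = {"command": "ack", "id": msg["id"]}
    return [emit, ack]


if __name__ == "__main__":
    # Simulate Storm sending one tuple, using in-memory streams instead of stdin/stdout.
    incoming = io.StringIO()
    write_message(incoming, {"id": "42", "comp": "1", "stream": "1",
                             "task": 9, "tuple": ["snow white"]})
    incoming.seek(0)
    outgoing = io.StringIO()
    for reply in handle_bolt_tuple(read_message(incoming)):
        write_message(outgoing, reply)
    print(outgoing.getvalue(), end="")
```

In a real bolt, `inp` and `out` would be `sys.stdin` and `sys.stdout`. Anchoring the emitted tuple to the incoming tuple's id is what keeps it in the same tuple tree for the processing guarantee.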

Website: storm.apache.org
Support: Documentation
Developer: Apache Software Foundation (originally created at BackType)
License: Apache License 2.0

Storm is written in Java. Learn Java with our recommended free books and free tutorials.


Related Software

Data Analysis Tools
Hadoop – Distributed processing of large data sets across clusters of computers
Storm – Distributed and fault-tolerant real-time computation
Drill – Distributed system for interactive analysis of large-scale datasets
Flink – Framework and distributed processing engine
Spark – Unified analytics engine for large-scale data processing
Pentaho – Enterprise reporting, analysis, dashboard, data mining, workflow and more
HPCC Systems – Designed for the enterprise to resolve Big Data challenges
RapidMiner – Knowledge discovery in databases, machine learning, and data mining


