Storm
Storm is an open source big-data processing system designed for
distributed real-time computation. Unlike batch-oriented systems, it
processes data continuously as it arrives, and it is language
independent. This free software makes it easy to reliably process
unbounded streams of data, doing for real-time processing what Hadoop
did for batch processing. It is a complex-event processing system.
It uses custom-created "spouts" and "bolts" to define information
sources and manipulations, allowing batch, distributed processing of
streaming data. A Storm application is designed as a topology: a graph
of spouts and bolts connected by streams, through which data passes as
a series of transformations. A topology plays a role similar to a
MapReduce job, except that it runs indefinitely until it is manually
terminated.
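The spout-and-bolt structure can be illustrated with a small sketch in plain Python. This is not Storm's API: the names `sentence_spout`, `split_bolt`, `count_bolt`, and `run_topology` are hypothetical, and the source is finite so that the example terminates, whereas a real topology would keep running.

```python
from collections import Counter
from typing import Iterable, Iterator

# Hypothetical stand-ins for Storm concepts; none of these names
# come from Storm's own API.

def sentence_spout(source: Iterable[str]) -> Iterator[dict]:
    """Spout: turns an external source into a stream of tuples."""
    for line in source:
        yield {"sentence": line}

def split_bolt(stream: Iterable[dict]) -> Iterator[dict]:
    """Bolt: consumes sentence tuples and emits one tuple per word."""
    for tup in stream:
        for word in tup["sentence"].split():
            yield {"word": word.lower()}

def count_bolt(stream: Iterable[dict]) -> Iterator[dict]:
    """Bolt: keeps running counts and emits the updated count per word."""
    counts = Counter()
    for tup in stream:
        counts[tup["word"]] += 1
        yield {"word": tup["word"], "count": counts[tup["word"]]}

def run_topology(source: Iterable[str]) -> list:
    """Wires spout -> split bolt -> count bolt and drains the stream."""
    return list(count_bolt(split_bolt(sentence_spout(source))))

results = run_topology(["the cow jumped", "the moon"])
for tup in results:
    print(tup)
```

Each stage only sees the tuples produced by the stage before it, which is the property that lets a real cluster spread spouts and bolts across many machines.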
Storm has many use cases: real-time analytics, online machine
learning, continuous computation, distributed RPC, ETL, and more. Storm
is fast: a benchmark clocked it at over a million tuples processed per
second per node. It is scalable and fault-tolerant, guarantees that
your data will be processed, and is easy to set up and operate.
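One way a processing guarantee of this kind can be tracked is with acknowledgements: every emitted tuple must be acked, and the original message counts as fully processed only when nothing is outstanding. Storm's documentation describes an XOR-based scheme for this; the sketch below is a simplified toy version of that idea, not Storm's code. The class name `AckTracker` and the fixed tuple ids are assumptions made for illustration.

```python
class AckTracker:
    """Toy tracker: XORs tuple ids in on emit and out again on ack.

    When the running XOR for a root message returns to zero, every
    tuple derived from that message has been acked.
    """

    def __init__(self):
        self.pending = {}  # root message id -> XOR of outstanding tuple ids

    def emit(self, root_id, tuple_id):
        """Record a tuple that belongs to the tree of root_id."""
        self.pending[root_id] = self.pending.get(root_id, 0) ^ tuple_id

    def ack(self, root_id, tuple_id):
        """Ack one tuple; return True once the whole tree is processed."""
        self.pending[root_id] ^= tuple_id
        if self.pending[root_id] == 0:
            del self.pending[root_id]
            return True
        return False


tracker = AckTracker()
tracker.emit("msg-1", 0x1A)          # spout emits the root tuple
tracker.emit("msg-1", 0x2B)          # a bolt emits a child tuple
first = tracker.ack("msg-1", 0x1A)   # root acked, child still outstanding
second = tracker.ack("msg-1", 0x2B)  # child acked, tree complete
print(first, second)
```

In a real system the ids would be random and a message still pending after a timeout would be replayed from its source; both details are left out here.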
Storm integrates with any queueing system and any database
system. Storm's spout abstraction makes it easy to integrate a new
queueing system. Storm also has a large and growing ecosystem of
libraries and tools.
Features include:
- Easy integration with any queueing and database system
- Simple and easy to use API
- Highly scalable - processes very high throughputs of
messages with very low latency
- Fault-tolerant - runs the topology until it is
killed or the cluster is shut down
- Uses tuples as its data model. A tuple is a named list of
values, and a field in a tuple can be an object of any type
- Guarantees every tuple will be fully processed
- Can be used with any programming language. Spouts and bolts
can be defined in any language. Non-JVM spouts and bolts communicate
with Storm through a JSON-based protocol over stdin/stdout. Adapters
that implement this protocol exist for Ruby, Python, JavaScript, Perl,
and PHP
- Easy to deploy, requiring a minimum of setup and
configuration to get up and running
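The JSON-over-stdin/stdout idea mentioned in the list above can be sketched as follows. The text only states that the protocol is JSON-based; the framing used here (each message followed by a line containing `end`) and the field names `id`, `tuple`, `command`, and `anchors` are assumptions based on recollection of Storm's multi-language protocol and should be checked against its documentation. In-memory streams stand in for stdin and stdout so that the example is self-contained.

```python
import io
import json

def write_message(out, message):
    """Write one JSON message followed by an 'end' line (assumed framing)."""
    out.write(json.dumps(message) + "\n")
    out.write("end\n")
    out.flush()

def read_message(inp):
    """Read lines up to the next 'end' line and parse them as JSON."""
    lines = []
    for line in inp:
        if line.strip() == "end":
            return json.loads("\n".join(lines))
        lines.append(line.rstrip("\n"))
    return None  # stream closed before a complete message arrived

# Simulate the parent process sending one tuple to a non-JVM bolt.
incoming = io.StringIO()
write_message(incoming, {"id": "42", "tuple": ["the cow jumped"]})
incoming.seek(0)

# The bolt reads the tuple, emits one word per tuple, then acks the input.
outgoing = io.StringIO()
msg = read_message(incoming)
for word in msg["tuple"][0].split():
    write_message(outgoing, {"command": "emit",
                             "anchors": [msg["id"]],
                             "tuple": [word]})
write_message(outgoing, {"command": "ack", "id": msg["id"]})

print(outgoing.getvalue())
```

Because the exchange is just text on standard streams, an adapter for a new language only needs a JSON library and line-based I/O.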
Last Updated Saturday, April 06 2013 @ 03:12 AM EST