HPCC Systems
HPCC (High-Performance Computing Cluster) is an open source,
data-intensive computing platform designed to help enterprises
tackle Big Data challenges. It stores and processes large
quantities of data, handling billions of records per second with
massively parallel processing. Large amounts of data from
disparate sources can be accessed, analyzed and manipulated in
fractions of a second. HPCC functions as both a processing environment
and a distributed data storage environment, capable of analyzing
terabytes of information.
The HPCC Thor (the Data Refinery Cluster) technology is
designed to process and analyze high volumes of complex data and to
find links and associations within them. It functions as a distributed
file system with parallel processing power spread across the nodes of a
cluster, which can scale from a single node to thousands of nodes. Thor
can detect non-obvious relationships and scale to support petabytes of
data, and it is significantly faster than competing technologies while
requiring less hardware and fewer resources. HPCC Thor works well on
Amazon EC2.
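The idea of spreading processing across nodes can be illustrated with a small sketch. This is not Thor code: it is a plain Python simulation in which threads stand in for worker nodes, and the names `partition`, `count_words_on_node` and `run_job` are hypothetical. It assumes a simple round-robin split of the records; how Thor actually distributes data is not described here.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(records, node_count):
    """Split records round-robin into one partition per simulated node."""
    return [records[i::node_count] for i in range(node_count)]


def count_words_on_node(part):
    """Work done locally by one simulated worker on its own partition."""
    counts = Counter()
    for line in part:
        counts.update(line.lower().split())
    return counts


def run_job(records, node_count=4):
    """Coordinator: hand out partitions, run them in parallel, merge results."""
    parts = partition(records, node_count)
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partials = list(pool.map(count_words_on_node, parts))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total


records = ["big data", "Big clusters", "data refinery", "data"]
result = run_job(records, node_count=2)
print(result.most_common(1))  # [('data', 3)]
```

Each worker only ever touches its own partition, so adding nodes shrinks the amount of data any single worker has to read. The merged result does not depend on how many partitions were used.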
The HPCC Roxie technology (Rapid Online XML Inquiry Engine), also
known as the Rapid Data Delivery Engine (RDDE), uses a combination of
technologies and techniques to deliver extremely fast throughput for
queries on indexed data. It is the data delivery engine used in HPCC to
serve data quickly, and it can support many thousands of requests per
node per second.
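Why queries on indexed data are fast can be shown with a minimal sketch. It assumes a sorted key list searched with Python's standard `bisect` module as a stand-in for a real index structure. The class `SortedIndex` and the sample rows are hypothetical and have nothing to do with Roxie's actual implementation.

```python
import bisect


class SortedIndex:
    """Keys are kept sorted, so a lookup is a binary search rather than a scan."""

    def __init__(self, rows, key_field):
        ordered = sorted(rows, key=lambda row: row[key_field])
        self._keys = [row[key_field] for row in ordered]
        self._rows = ordered

    def lookup(self, key):
        """Return every row whose key equals `key` (empty list if none)."""
        lo = bisect.bisect_left(self._keys, key)
        hi = bisect.bisect_right(self._keys, key)
        return self._rows[lo:hi]


rows = [
    {"last_name": "Smith", "city": "Boston"},
    {"last_name": "Jones", "city": "Atlanta"},
    {"last_name": "Smith", "city": "Denver"},
]
index = SortedIndex(rows, "last_name")
matches = index.lookup("Smith")
print(len(matches))  # 2
```

The index is built once, ahead of time. Each query then costs a number of comparisons proportional to the logarithm of the key count instead of a pass over every record.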
HPCC compiles ECL code to C++ rather than Java, which gives it an
efficiency advantage. HPCC has also been used in critical production
environments for over a decade. The Community Edition is an open source
version of the HPCC
platform that is supported by an active community of open source
developers and enthusiasts.
Features include:
- Services for job execution
- Services for distributed file system access
- A Thor cluster configured with a master node and
multiple slave nodes
- A Roxie cluster is a peer-coupled cluster where each node
runs Server and Agent tasks for query execution and key and file
processing
- The file system on the Roxie cluster is a distributed
index-based file system that uses a custom B+Tree structure for data
storage
- Indexes and data supporting queries are pre-built on Thor
clusters and deployed to Roxie with portions of the index and data
stored on each node
- ECL Agent acting on behalf of a client program to manage
the execution of a job on a Thor cluster
- A Roxie file system optimized for highly concurrent query
processing
- ESP Server (Enterprise Services Platform) providing
authentication, logging, security, and other services for the job
execution and Web services environment
- Dali server which functions as the system data store for
job workunit information and provides naming services for the
distributed file systems
- ECL IDE - an integrated development environment
for the ECL language designed to make ECL coding easy and
programmer-friendly. Using the ECL IDE you can build, edit and execute
ECL queries, and mix and match your data with any of the ECL built-in
functions and/or definitions that you have created. The ECL
IDE offers a built-in Attribute Editor, Syntax Checking, and ECL
Repository Access. You can execute queries and review your results
interactively, making the ECL IDE a robust and powerful programming
tool
- ECL code migration tool
- Distributed File Utility (DFU)
- Environment Configuration Utility
- ECLWatch is a Web-based utility program for monitoring the
HPCC environment and includes queue management, distributed file system
management, job monitoring, and system performance monitoring tools
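The build-then-deploy pattern in the list above, where indexes are pre-built and each query node holds a portion, can be sketched as follows. This is an illustration only. Hash partitioning by key is an assumption made for the example, since the text says only that portions are stored on each node. The functions `node_for_key`, `build_index_parts` and `query` are hypothetical names.

```python
import zlib


def node_for_key(key, node_count):
    """Pick the owning node with a stable hash (an assumption for this sketch)."""
    return zlib.crc32(key.encode("utf-8")) % node_count


def build_index_parts(rows, key_field, node_count):
    """Build step, done ahead of time: one key -> rows map per query node."""
    parts = [dict() for _ in range(node_count)]
    for row in rows:
        key = row[key_field]
        parts[node_for_key(key, node_count)].setdefault(key, []).append(row)
    return parts


def query(parts, key):
    """Query step: go straight to the node that owns the key."""
    return parts[node_for_key(key, len(parts))].get(key, [])


rows = [
    {"id": "a1", "name": "Ann"},
    {"id": "b2", "name": "Bob"},
    {"id": "c3", "name": "Cy"},
]
parts = build_index_parts(rows, "id", node_count=3)
print(query(parts, "b2"))  # [{'id': 'b2', 'name': 'Bob'}]
```

Because all of the expensive work happens in the build step, a query only has to compute one hash and read one node's portion.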
Last Updated Thursday, April 11 2013 @ 03:33 PM EDT