7 Search Engines for Big Data
Big Data is an all-inclusive term for data sets so large and
complex that they must be processed by specially designed hardware and
software tools. Such data sets typically range from terabytes to
exabytes in size. They are created from a diverse range of sources,
including sensors that gather climate information and publicly
available material such as magazines, newspapers, and articles. Other
sources of big data include purchase transaction records, web logs,
medical records, military surveillance, video and image archives, and
large-scale e-commerce.
There is heightened interest in Big Data, Big Data analytics,
and their implications for businesses. However, Big Data is more than
simply a matter of size: it is characterised by its volume, velocity,
variety, and veracity. Analysing Big Data provides a unique opportunity
to find insights in new and emerging types of data and content, to make
a business more responsive to change, and to answer questions that
could not previously be addressed.
Many organisations simply cannot keep up with the volume and
velocity of the data being generated. Processing it calls for an
approach entirely different from that of on-hand database management
tools or traditional data processing applications.
This is the first in a series of articles highlighting the
best open source software for making sense of Big Data. This article
examines the finest open source software providing full-featured
search engines through an application programming interface. With
scalable, high-performance indexing, the software featured here is
designed to perform information retrieval on Big Data.
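Indexing is what makes fast retrieval possible. Engines such as Lucene build an inverted index, which maps each term to the documents containing it, so a query looks up terms instead of scanning every document. The following is a minimal, in-memory sketch of that idea, assuming simple lowercase tokenisation and Boolean AND queries. The functions `tokenize`, `build_index`, and `search` and the sample documents are hypothetical illustrations, not the API of any engine listed below.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map each term to the set of document IDs containing it (its posting list)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search(index: dict[str, set[int]], query: str) -> list[int]:
    """Boolean AND query: return sorted IDs of documents containing every query term."""
    terms = tokenize(query)
    if not terms:
        return []
    # Copy the first posting list so intersecting never mutates the index.
    result = set(index.get(terms[0], set()))
    for term in terms[1:]:
        result &= index.get(term, set())
    return sorted(result)


# Hypothetical sample documents, for illustration only.
docs = {
    1: "Sensors gather climate information",
    2: "Web logs and purchase transaction records",
    3: "Climate records from web sensors",
}
index = build_index(docs)
print(search(index, "climate sensors"))  # [1, 3]
print(search(index, "web records"))      # [2, 3]
```

Production engines go well beyond this sketch: they rank results with relevance scoring, compress posting lists, store the index on disk, and (in distributed engines such as ElasticSearch) shard it across machines.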
Now, let's explore the 7 Big Data search engines. For each
title we have compiled a portal page containing a full description, an
in-depth analysis of its features, and links to relevant resources and
reviews.
Search Engines for Big Data

| Software | Description |
|---|---|
| Apache Lucene | Search engine library |
| Apache Solr | Search engine server that uses Lucene |
| ElasticSearch | Flexible and powerful distributed RESTful search and analytics engine |
| Sphinx | Search engine designed with indexing database content in mind |
| Xapian | Probabilistic information retrieval library |
| Nutch | Web-search software project |
| LGTE | Information retrieval tool |
Last Updated Wednesday, April 03 2013 @ 07:15 AM EST