Big Data is an all-inclusive term that refers to data sets so large and complex that they need to be processed by specially designed hardware and software tools. The data sets are typically of the order of terabytes to exabytes in size. They are created from a diverse range of sources, from sensors that gather climate information to publicly available material such as magazines, newspapers, and articles. Other sources of Big Data include purchase transaction records, web logs, medical records, military surveillance, video and image archives, and large-scale e-commerce.
There is heightened interest in Big Data and Big Data analytics and in the implications they have for businesses. Big Data is more than simply a matter of size, though. It is characterised by its volume, velocity, variety, and veracity. The analysis of Big Data provides a unique opportunity to find insights in new and emerging types of data and content, to make a business more responsive to change, and to answer questions that could not previously be addressed.
Many organisations simply cannot keep up with the volume and velocity of the data being generated. Handling it calls for an entirely different approach from on-hand database management tools or traditional data processing applications.
This article is one of a series that highlights the best open source software for making sense of Big Data. It examines the finest open source software that provides full-featured search engines through an application programming interface. With scalable, high-performance indexing, the software featured here is designed for performing information retrieval functions on Big Data.
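To give a feel for what indexing and retrieval mean in practice, here is a minimal Python sketch of an inverted index, the data structure that many full-text search engines are built around. It is a toy illustration only and is not how any of the featured programs is implemented. The function names (`tokenize`, `build_index`, `search`) and the sample documents are invented for this example.

```python
from collections import defaultdict
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    """Map each token to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in tokenize(text):
            index[token].add(doc_id)
    return index


def search(index, query):
    """Return the sorted ids of documents containing every query token."""
    tokens = tokenize(query)
    if not tokens:
        return []
    # Use .get so that unknown tokens do not add empty entries to the index.
    postings = [index.get(token, set()) for token in tokens]
    return sorted(set.intersection(*postings))


# Hypothetical sample documents, invented for this sketch.
docs = {
    1: "Web logs from a large-scale e-commerce site",
    2: "Sensor readings that gather climate information",
    3: "Purchase transaction records from an e-commerce site",
}
index = build_index(docs)
print(search(index, "e-commerce site"))
```

A lookup only touches the postings for the query terms rather than scanning every document. Real engines add much more on top of this, such as relevance ranking, on-disk index storage, and distribution across machines.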
Here are our recommendations. Each featured program is published under an open source license.
Now, let’s explore the 8 Big Data search engines. For each title we have compiled its own portal page, a full description with an in-depth analysis of its features, together with links to relevant resources.
Search Engines for Big Data
|Software|Description|
|---|---|
|Apache Solr|Search engine server that uses Lucene|
|Apache Lucene|Search engine library|
|ElasticSearch|Flexible and powerful distributed RESTful search engine and analytics engine|
|MeiliSearch|Easy to use and deploy search engine|
|Sphinx|Search engine designed with indexing database content in mind|
|Nutch|Web-search software project|
|Xapian|Probabilistic information retrieval library|
|Typesense|Fast, typo-tolerant search engine|