Big Data is an all-inclusive term for data sets so large and complex that they must be processed by specially designed hardware and software tools. Such data sets typically range from terabytes to exabytes in size. They are generated by a diverse range of sources, from sensors that gather climate information to publicly available material such as magazines, newspapers, and articles. Other sources of big data include purchase transaction records, web logs, medical records, military surveillance, video and image archives, and large-scale e-commerce.
There is heightened interest in Big Data and Big Data analytics, and in the implications they have for businesses. Big Data is more than simply a matter of size, though: it is commonly characterised by its volume, velocity, variety, and veracity (how much data there is, how fast it arrives, how many forms it takes, and how trustworthy it is). Analysing Big Data offers a unique opportunity to find insights in new and emerging types of data and content, to make a business more responsive to change, and to answer questions that could not previously be addressed.
Many organisations simply cannot keep up with the volume and velocity of the data being generated. Handling it calls for an entirely different approach from on-hand database management tools or traditional data processing applications.
This article is one of a series highlighting the best open source software for making sense of Big Data. Here we examine the finest open source software that provides full-featured search engines through an application programming interface. With scalable, high-performance indexing, the software featured here is designed to perform information retrieval on Big Data.
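Full-text engines such as Lucene build their indexes around an inverted index: a map from each term to the set of documents that contain it, so a query only has to look up its own terms instead of scanning every document. The minimal Python sketch below illustrates the idea with an in-memory dictionary and a Boolean AND query. The function names and sample documents are hypothetical, and production engines add on-disk storage, compression, and relevance ranking on top of this basic structure.

```python
import re
from collections import defaultdict


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> dict[str, set[int]]:
    """Map every term to the set of document IDs that contain it (the posting set)."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return dict(index)


def search_all(index: dict[str, set[int]], query: str) -> list[int]:
    """Return the IDs of documents containing every query term (Boolean AND)."""
    terms = tokenize(query)
    if not terms:
        return []
    # Start from the smallest posting set so intersections stay small.
    postings = sorted((index.get(term, set()) for term in terms), key=len)
    result = set(postings[0])
    for posting in postings[1:]:
        result &= posting
    return sorted(result)


# Hypothetical sample documents.
docs = {
    1: "Sensors gather climate information",
    2: "Web logs and purchase transaction records",
    3: "Climate records from weather sensors",
}
index = build_index(docs)
print(search_all(index, "climate sensors"))  # [1, 3]
```

Intersecting the smallest posting set first is a common optimisation, because the running result can never grow larger than the smallest set it has been intersected with.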
The table below summarises our recommendations. All the software is free and open source.
Search Engines for Big Data

| Search Engine | Description |
|---|---|
| Solr | Search engine server that uses Lucene |
| Lucene | Search engine library |
| ElasticSearch | Flexible and powerful distributed RESTful search and analytics engine |
| MeiliSearch | Easy to use and deploy search engine |
| Sphinx | Search engine designed with indexing database content in mind |
| Xapian | Probabilistic information retrieval library |
| Typesense | Fast, typo-tolerant search engine |
| Manticore Search | Easy to use fast database for search |
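Several of these engines rank matches with a probabilistic relevance model. Okapi BM25 is the default similarity in current Lucene releases, and therefore in Solr and Elasticsearch, and Xapian's default weighting scheme is also BM25. The Python sketch below implements the textbook BM25 formula with the conventional constants k1 = 1.2 and b = 0.75 and a Lucene-style IDF. The function name and sample documents are hypothetical, and because each engine uses its own IDF variant and normalisation details, the scores will not match any engine exactly.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_rank(docs: dict[int, str], query: str,
              k1: float = 1.2, b: float = 0.75) -> list[tuple[int, float]]:
    """Score each document against the query with Okapi BM25, best match first."""
    tokenized = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    if not tokenized:
        return []
    n_docs = len(tokenized)
    # Average document length in tokens (fall back to 1.0 if every document is empty).
    avgdl = sum(len(toks) for toks in tokenized.values()) / n_docs or 1.0
    # Document frequency: the number of documents containing each term.
    df = Counter(term for toks in tokenized.values() for term in set(toks))
    query_terms = set(tokenize(query))
    scores = {}
    for doc_id, toks in tokenized.items():
        tf = Counter(toks)
        # Length normalisation: longer-than-average documents get a larger denominator.
        norm = k1 * (1 - b + b * len(toks) / avgdl)
        score = 0.0
        for term in query_terms:
            if tf[term] == 0:
                continue
            # Lucene-style IDF, which is always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation: repeated occurrences add less and less.
            score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
        scores[doc_id] = score
    # Sort by descending score, breaking ties by document ID.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical sample documents.
docs = {
    1: "search engine library",
    2: "search engine server that uses the search library",
    3: "probabilistic information retrieval library",
}
ranking = bm25_rank(docs, "search library")
print([doc_id for doc_id, _ in ranking])  # [1, 2, 3]
```

In this example, document 1 outranks document 2 even though document 2 mentions "search" twice: the b parameter penalises documents longer than average, and the k1 parameter makes each repeated occurrence of a term add less to the score than the one before.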