Xapian
Xapian is an open-source probabilistic information retrieval
library: a full-text search engine library for programmers.
The Xapian search engine library is a highly adaptable toolkit
which allows developers to easily add advanced indexing and search
facilities to their own applications. It implements the probabilistic
model of information
retrieval, and provides facilities for performing ranked free-text
searches, relevance feedback, phrase searching, boolean searching,
stemming, and simultaneous update and searching. It is highly scalable,
and is capable of working with collections containing hundreds of
millions of documents.
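To make "ranked free-text search" under a probabilistic model concrete, here is a minimal sketch of BM25, a well-known weighting formula from the probabilistic family. It is a textbook version written for illustration, not Xapian's code or API: the function name `bm25_rank`, the whitespace tokenisation, the sample documents and the parameter values `k1=1.2` and `b=0.75` are all assumptions.

```python
import math
from collections import Counter


def bm25_rank(query, docs, k1=1.2, b=0.75):
    """Return (doc_index, score) pairs sorted by descending BM25 score.

    Hypothetical helper for illustration only; not part of Xapian.
    """
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(tokens) for tokens in tokenized) / n_docs

    # Document frequency: how many documents contain each term.
    df = Counter(term for tokens in tokenized for term in set(tokens))

    scores = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Rare terms get a higher inverse document frequency.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalisation: long documents are penalised slightly.
            norm = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append((idx, score))
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up sample collection.
docs = [
    "the race condition was fixed in the scheduler",
    "a horse race was held on saturday",
    "weather report for saturday",
]
ranking = bm25_rank("race condition", docs)
for doc_index, score in ranking:
    print(doc_index, round(score, 3))
```

The document matching both query terms is listed first, the one matching a single term second, and the unrelated one scores zero, which is the "more relevant documents first" behaviour described above.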
It supports a rich set of boolean query operators.
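As a rough illustration of what Boolean operators do, the sketch below evaluates the query "race AND condition NOT horse" with set operations over a toy inverted index. It is a conceptual model only and does not use Xapian's API or query parser; the helper `build_index` and the sample documents are made up for this example.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the set of document ids that contain it (hypothetical helper)."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


# Made-up sample collection.
docs = {
    1: "race condition in the thread pool",
    2: "horse race results",
    3: "a race condition affecting the horse racing site",
    4: "air condition repair",
}
index = build_index(docs)

# AND is set intersection; NOT (in the sense of "AND NOT") is set difference.
matches = (index["race"] & index["condition"]) - index["horse"]
print(sorted(matches))
```

Only document 1 contains both "race" and "condition" without also containing "horse", so it is the only match.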
Xapian is written in C++, with bindings to allow use from
Perl, Python, PHP, Java, Tcl, C#, Ruby, Lua and Erlang.
Features include:
- Supports database files > 2GB - essential for
scaling to large document collections
- Transactions: if a database update fails in the middle of a
transaction, the database is guaranteed to remain in a consistent state
- Simultaneous search and update, with new documents being
immediately visible
- Support for large databases: Xapian has been proven to be
scalable to hundreds of millions of documents
- Accurate probabilistic ranking: more relevant documents are
listed first
- Phrase and proximity searching - users can search
for words occurring in an exact phrase or within a specified number of
words, either in a specified order, or in any order
- Relevance feedback, which improves ranking and can be used to expand a
query, find related documents, categorise documents, etc.
- Structured Boolean queries, e.g. "race AND condition NOT
horse"
- Wildcard search, e.g. "wiki*"
- Spelling correction
- Synonyms
- Omega, a packaged solution for adding a search engine to a
web site or intranet. Omega can easily be extended and adapted to fit
changing requirements
- Faceted search - dynamically generate complete
lists of category values which feature in matching documents
- Supports Unicode (including codepoints beyond the BMP), and
stores indexed data in UTF-8
- Highly portable
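The phrase and proximity feature listed above depends on knowing where each word occurs in a document. The following sketch shows the idea with a toy positional index. It is not Xapian's implementation: the function names are hypothetical, and the window is assumed here to mean "at most this many positions apart", which may differ from how Xapian counts its own window.

```python
from collections import defaultdict


def positional_index(text):
    """Map each term to the list of positions where it occurs (hypothetical helper)."""
    positions = defaultdict(list)
    for pos, term in enumerate(text.lower().split()):
        positions[term].append(pos)
    return positions


def is_phrase(index, terms):
    """True if the terms occur consecutively, in the given order."""
    return any(
        all(start + offset in index.get(term, []) for offset, term in enumerate(terms))
        for start in index.get(terms[0], [])
    )


def within(index, first, second, window, ordered=False):
    """True if the two terms occur at most `window` positions apart.

    With ordered=True, `first` must also come before `second`.
    """
    for pos_first in index.get(first, []):
        for pos_second in index.get(second, []):
            gap = pos_second - pos_first
            if ordered and 0 < gap <= window:
                return True
            if not ordered and 0 < abs(gap) <= window:
                return True
    return False


idx = positional_index("full text search engine library for programmers")
print(is_phrase(idx, ["search", "engine"]))               # exact phrase
print(within(idx, "library", "text", 3))                  # any order
print(within(idx, "library", "text", 3, ordered=True))    # specified order
```

In this sample, "search engine" matches as a phrase, and "library" and "text" are within three positions of each other in any order but not in the order "library" then "text".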
Last Updated Wednesday, April 03 2013 @ 07:03 AM EST