Sphinx
SQL Phrase Index (Sphinx) is an open source software search
engine designed with indexing database content in mind. It is written
in C++.
Sphinx is a standalone software package that provides
fast and relevant full-text search functionality to client
applications. It was specifically designed to integrate well with the SQL
databases that store the data, and to be easily accessed from scripting
languages.
Applications can access the Sphinx search daemon (searchd) using
any of three access methods: a) via the native search API
(SphinxAPI), b) via Sphinx's own implementation of the MySQL network protocol
(using a small SQL subset called SphinxQL), or c) via a MySQL server with
a pluggable storage engine (SphinxSE).
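As a rough illustration of access method b), the sketch below only builds the text of a SphinxQL full-text query. It does not talk to searchd. The helper name build_sphinxql_search and the index name "articles" are hypothetical and not part of Sphinx. Sending the statement would need any MySQL client library pointed at the SphinxQL listener configured for searchd.

```python
def build_sphinxql_search(index: str, query: str, limit: int = 10) -> str:
    """Return a SphinxQL SELECT statement as a string.

    Hypothetical helper for illustration only; it builds the query text
    and does not connect to searchd.
    """
    # Accept only plain identifiers as index names, since the name is
    # interpolated directly into the statement.
    if not index.isidentifier():
        raise ValueError(f"invalid index name: {index!r}")
    # Escape backslashes first, then single quotes, so the full-text
    # query stays inside the quoted MATCH() argument.
    escaped = query.replace("\\", "\\\\").replace("'", "\\'")
    return f"SELECT * FROM {index} WHERE MATCH('{escaped}') LIMIT {int(limit)}"


if __name__ == "__main__":
    # "articles" is an assumed index name.
    print(build_sphinxql_search("articles", "full text search"))
```

The resulting string can be passed to a MySQL client's query call. Because SphinxQL is only a small SQL subset, statements that work against MySQL are not guaranteed to work here.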
Sphinx powers many popular websites.
Features include:
- Batch and Real-Time full-text indexes
- Indexing speed of up to 10-15 MB/sec per single core and HDD
- Sphinx clusters scale up to tens of billions of documents
and hundreds of millions of search queries per day
- Searching speed of over 500 queries/sec against a
1,000,000-document collection on a 2-core desktop system with 2 GB of RAM
- Batch and incremental (soft real-time) full-text indexing
- Support for non-text attributes (scalars, strings, sets)
- Supports boolean, phrase, word proximity and other types of
queries
- Direct indexing of SQL databases. Native support for MySQL,
MariaDB, PostgreSQL, MSSQL, plus ODBC connectivity
- XML documents indexing support
- Distributed searching support out of the box. Searches
can be distributed across multiple machines, enabling horizontal
scale-out and HA (High Availability).
- Integration via access APIs
- SQL-like syntax support via MySQL protocol
- Full-text searching syntax
- Database-like result set processing
- Relevance ranking utilizing additional factors besides
standard BM25
- Sphinx comes with three different APIs: SphinxAPI,
SphinxSE, and SphinxQL
- Text processing support for SBCS and UTF-8 encodings,
stopwords, indexing of selected words without positional information
("hitless"), stemming, word forms, tokenizing exceptions, and "blended
characters" (dual-indexing as both a real character and a word
separator)
- Supports user-defined functions (UDFs)
- Supports stemming (stemmers for English, Russian, Czech and
Arabic are built in; stemmers for French, Spanish, Portuguese,
Italian, Romanian, German, Dutch, Swedish, Norwegian, Danish, Finnish
and Hungarian are available by building third party
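
The relevance ranking item in the feature list names BM25 as the standard baseline. For readers unfamiliar with it, here is a minimal sketch of the textbook Okapi BM25 formula over pre-tokenized documents. It is not Sphinx's internal ranker, which the list says uses additional factors. The function name bm25_scores, the parameter defaults k1=1.2 and b=0.75, and the sample documents are assumptions made for illustration.

```python
import math
from collections import Counter


def bm25_scores(query_terms, documents, k1=1.2, b=0.75):
    """Score each tokenized document against the query with textbook BM25.

    Illustrative only: this is the generic formula, not Sphinx's ranker.
    """
    if not documents:
        return []
    n_docs = len(documents)
    avg_len = sum(len(doc) for doc in documents) / n_docs

    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter()
    for doc in documents:
        doc_freq.update(set(doc))

    scores = []
    for doc in documents:
        term_freq = Counter(doc)
        score = 0.0
        for term in query_terms:
            tf = term_freq[term]
            if tf == 0:
                continue  # term absent: contributes nothing
            df = doc_freq[term]
            # Smoothed IDF variant that stays non-negative.
            idf = math.log(1 + (n_docs - df + 0.5) / (df + 0.5))
            # Length normalization: longer documents are penalized.
            norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
            score += idf * tf * (k1 + 1) / norm
        scores.append(score)
    return scores


if __name__ == "__main__":
    docs = [
        ["sphinx", "search", "engine"],
        ["sql", "database"],
        ["sphinx", "sphinx", "index"],
    ]
    print(bm25_scores(["sphinx"], docs))
```

In this sample the second document scores zero because it lacks the query term, and the third outranks the first because it has the same length but a higher term frequency.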
Last Updated Wednesday, April 03 2013 @ 05:24 AM EST