Terrier – flexible, efficient, and effective open source search engine

Terrier is billed as a highly flexible, efficient, and effective open source search engine, readily deployable on large-scale collections of documents. Terrier implements state-of-the-art indexing and retrieval functionalities, and provides an ideal platform for the rapid development and evaluation of large-scale retrieval applications.

Terrier follows a plugin architecture, and is easy to extend to develop new retrieval techniques, add new ranking features or experiment with low-level functionality such as index compression.
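To show what a plugin architecture for ranking means in practice, here is a minimal Python sketch in which weighting models register themselves by name and the retrieval code looks them up at run time. All names (`WeightingModel`, `register`, `MODELS`, `TfIdf`, `load_model`) are hypothetical and are not Terrier's Java API; the sketch only illustrates the general design.

```python
import math
from abc import ABC, abstractmethod

# Hypothetical registry mapping a model name to its class.
MODELS = {}


def register(name):
    """Class decorator that adds a weighting model to the registry under `name`."""
    def wrap(cls):
        MODELS[name] = cls
        return cls
    return wrap


class WeightingModel(ABC):
    """Interface that every pluggable weighting model implements."""

    @abstractmethod
    def score(self, tf: int, df: int, num_docs: int) -> float:
        ...


@register("TF_IDF")
class TfIdf(WeightingModel):
    def score(self, tf, df, num_docs):
        # Raw term frequency times a plain (natural log) inverse document frequency.
        return tf * math.log(num_docs / df)


def load_model(name: str) -> WeightingModel:
    """Instantiate a model chosen by name, e.g. a name read from a config file."""
    return MODELS[name]()


model = load_model("TF_IDF")
print(model.score(tf=3, df=10, num_docs=1000))
```

Adding a new model is then a matter of writing one more decorated class; the code that calls `load_model` does not change.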

It’s written in the Java programming language, and therefore runs on all major operating systems that provide a Java runtime.

Features include:

  • Indexing support for common desktop file formats, and for commonly used TREC research collections (e.g. TREC CDs 1-5, WT2G, WT10G, GOV, GOV2, Blogs06, Blogs08, ClueWeb09, ClueWeb12).
  • Many document weighting models, including parameter-free Divergence from Randomness weighting models, Okapi BM25 and language modelling.
  • Supervised (machine learned) ranking models are supported via learning to rank.
  • Conventional query language supported, including phrases, and terms occurring in tags.
  • Full-text indexing of large-scale document collections: at least 50 million documents in a centralised architecture, and the Hadoop MapReduce distributed indexing scheme for even larger collections.
  • Incremental indexing and retrieval capabilities to support real-time search.
  • Modular and open indexing and querying APIs, to allow easy extension for your own applications and research.
  • Active Information Retrieval research fed into the Open Source platform.
  • Indexing:
    • Out-of-the-box indexing of tagged document collections, such as the TREC test collections.
    • Out-of-the-box indexing for documents of various formats, such as HTML, PDF, or Microsoft Word, Excel and PowerPoint files.
    • Out-of-the-box support for distributed indexing in a Hadoop MapReduce setting.
    • Indexing of field information, such as the frequency of a term in a TITLE or H1 HTML tag.
    • Indexing of position information at the word level, or at the block level (e.g. a window of terms within a given distance).
    • Support for various document character encodings, including Unicode (UTF), to facilitate multilingual retrieval.
    • Support for changing the tokenisation being used.
    • Updatable indices to support real-time search.
    • Indexing support for query-biased summarisation.
    • Support for fetching files to index by HTTP, allowing intranets to be easily searched.
    • Highly compressed index disk data structures with built-in pluggable compression algorithms.
    • Highly compressed direct file for efficient query expansion.
    • Alternative faster single-pass and MapReduce based indexing.
    • Various stemming techniques supported, including the Snowball stemmer for European languages.
  • Retrieval:
    • Provides desktop, command-line and Web based querying interfaces.
    • Provides standard querying facilities, as well as Query Expansion (pseudo-relevance feedback).
    • Can be applied in interactive applications, such as the included Desktop Search, or in a batch setting for research and experimentation.
    • Provides many standard document weighting models, including up to 126 Divergence From Randomness (DFR) document ranking models, and other models such as Okapi BM25, language modelling and TF-IDF. Two new second-generation DFR weighting models, JsKLs and XSqrA_M, are also included; they provide robust performance on a range of test collections without the need for any parameter tuning or training.
    • Advanced query language that supports synonyms, +/- operators, phrase and proximity search, and fields.
    • Learning-to-rank support enables out-of-the-box supervised ranking models.
    • Provides a number of parameter-free DFR term weighting models for automatic query expansion, in addition to Rocchio’s query expansion.
    • Flexible processing of terms through a pipeline of components, such as stopword removers and stemmers.
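Distributed indexing with MapReduce splits the work into a map phase, which emits (term, posting) pairs for each document, and a reduce phase, which gathers all postings for a term into one posting list. The sketch below simulates both phases, plus the framework's grouping step, in a single Python process to show the data flow; it does not use Hadoop, and the function names are illustrative assumptions rather than Terrier's classes.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Posting = Tuple[int, int]  # (doc_id, term_frequency)


def map_phase(doc_id: int, text: str) -> Iterator[Tuple[str, Posting]]:
    """Emit (term, (doc_id, tf)) for every distinct term in one document."""
    for term, tf in Counter(text.lower().split()).items():
        yield term, (doc_id, tf)


def shuffle(pairs: Iterable[Tuple[str, Posting]]) -> Dict[str, List[Posting]]:
    """Group mapper output by term, as the framework does between the two phases."""
    groups: Dict[str, List[Posting]] = defaultdict(list)
    for term, posting in pairs:
        groups[term].append(posting)
    return groups


def reduce_phase(term: str, postings: List[Posting]) -> Tuple[str, List[Posting]]:
    """Produce the final posting list for one term, sorted by document ID."""
    return term, sorted(postings)


corpus = {1: "Terrier indexes documents", 2: "Hadoop indexes large collections"}
mapped = (pair for doc_id, text in corpus.items() for pair in map_phase(doc_id, text))
index = dict(reduce_phase(t, p) for t, p in shuffle(mapped).items())
print(index["indexes"])
```

In a real cluster, many mappers run in parallel over different document splits, and each reducer handles a subset of the terms.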
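The term pipeline mentioned above passes each token through a chain of components, and each component may transform the term or drop it. Below is a minimal Python sketch with a tiny stopword list and a deliberately crude suffix-stripping stemmer. The stemmer is a stand-in for Porter or Snowball, not an implementation of either, and all function names are hypothetical.

```python
from typing import Callable, Iterable, List, Optional

# Each stage maps a term to a new term, or to None to drop it.
Stage = Callable[[str], Optional[str]]

STOPWORDS = {"the", "a", "an", "of", "and", "in", "is"}


def lowercase(term: str) -> Optional[str]:
    return term.lower()


def remove_stopwords(term: str) -> Optional[str]:
    return None if term in STOPWORDS else term


def crude_stem(term: str) -> Optional[str]:
    # Toy suffix stripping for illustration only; not the Porter or Snowball algorithm.
    for suffix in ("ing", "es", "s"):
        if term.endswith(suffix) and len(term) - len(suffix) >= 3:
            return term[: -len(suffix)]
    return term


def run_pipeline(tokens: Iterable[str], stages: List[Stage]) -> List[str]:
    out = []
    for token in tokens:
        term: Optional[str] = token
        for stage in stages:
            term = stage(term)
            if term is None:
                break  # a stage dropped the term, so skip the remaining stages
        if term is not None:
            out.append(term)
    return out


print(run_pipeline("The Indexing of documents".split(), [lowercase, remove_stopwords, crude_stem]))
```

Stage order matters: lowercasing runs before stopword removal here so that "The" matches the lowercase stopword list.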
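Compressed inverted indices commonly store each posting list's sorted document IDs as gaps between successive IDs (d-gaps), then encode those small integers with a variable-length code. The sketch below uses Elias gamma coding on a Python bit string for readability. It illustrates the general technique only and is not a description of Terrier's on-disk format or its default compression codec.

```python
from typing import List


def to_gaps(doc_ids: List[int]) -> List[int]:
    """Convert sorted, strictly increasing doc IDs (>= 1) into d-gaps."""
    gaps, prev = [], 0
    for d in doc_ids:
        gaps.append(d - prev)
        prev = d
    return gaps


def from_gaps(gaps: List[int]) -> List[int]:
    """Rebuild doc IDs by taking running sums of the gaps."""
    ids, total = [], 0
    for g in gaps:
        total += g
        ids.append(total)
    return ids


def gamma_encode(n: int) -> str:
    """Elias gamma code of n >= 1: (binary length - 1) zeros, then n in binary."""
    if n < 1:
        raise ValueError("Elias gamma encodes positive integers only")
    binary = bin(n)[2:]
    return "0" * (len(binary) - 1) + binary


def gamma_decode_all(bits: str) -> List[int]:
    """Decode a concatenation of gamma codes back into integers."""
    values, i = [], 0
    while i < len(bits):
        zeros = 0
        while bits[i] == "0":  # the count of leading zeros gives the binary length
            zeros += 1
            i += 1
        values.append(int(bits[i : i + zeros + 1], 2))
        i += zeros + 1
    return values


postings = [3, 7, 8, 20, 21]
encoded = "".join(gamma_encode(g) for g in to_gaps(postings))
print(len(encoded), "bits:", encoded)
```

Because frequent terms have dense posting lists with small gaps, most gaps get very short codes; doc IDs that start at 0 would need shifting by one before encoding.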
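Position information is what makes phrase and proximity queries possible: a phrase matches when its terms occur at consecutive positions in the same document. Below is a minimal word-level positional index in Python; block-level indexing would store coarser block numbers instead of exact word positions. The names are illustrative and do not correspond to Terrier's index structures.

```python
from collections import defaultdict
from typing import Dict, List


def build_positional_index(docs: Dict[str, List[str]]) -> Dict[str, Dict[str, List[int]]]:
    """Map term -> {doc_id -> list of word positions, in increasing order}."""
    index: Dict[str, Dict[str, List[int]]] = defaultdict(lambda: defaultdict(list))
    for doc_id, terms in docs.items():
        for pos, term in enumerate(terms):
            index[term][doc_id].append(pos)
    return index


def phrase_match(index, phrase: List[str]) -> List[str]:
    """Return IDs of documents where the phrase terms occur at consecutive positions."""
    if not phrase or any(t not in index for t in phrase):
        return []
    # Only documents containing every phrase term are candidates.
    candidates = set(index[phrase[0]])
    for term in phrase[1:]:
        candidates &= set(index[term])
    matches = []
    for doc_id in sorted(candidates):
        # Keep start positions p such that phrase[k] occurs at p + k for every k.
        starts = set(index[phrase[0]][doc_id])
        for k, term in enumerate(phrase[1:], start=1):
            starts &= {p - k for p in index[term][doc_id]}
        if starts:
            matches.append(doc_id)
    return matches


docs = {
    "d1": "open source search engine".split(),
    "d2": "search the open source code".split(),
    "d3": "engine search".split(),
}
idx = build_positional_index(docs)
print(phrase_match(idx, ["search", "engine"]))
```

A proximity query would relax the "exactly p + k" test to "within a window of n positions".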
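The +/- operators and quoted phrases of the query language can be illustrated with a small parser that splits a query into required, excluded, optional and phrase clauses. The syntax handled here (`+term`, `-term`, `"a phrase"`) mirrors the operators named above, but the parser itself is a hypothetical sketch, not Terrier's query parser, and it ignores fields, synonyms and proximity.

```python
import re
from dataclasses import dataclass, field
from typing import List

# Optional +/- prefix followed by either a quoted phrase or a bare word.
TOKEN = re.compile(r'([+-]?)(?:"([^"]*)"|(\S+))')


@dataclass
class ParsedQuery:
    required: List[str] = field(default_factory=list)
    excluded: List[str] = field(default_factory=list)
    optional: List[str] = field(default_factory=list)
    phrases: List[List[str]] = field(default_factory=list)


def parse_query(text: str) -> ParsedQuery:
    q = ParsedQuery()
    for op, phrase, word in TOKEN.findall(text):
        if phrase:
            q.phrases.append(phrase.split())  # any +/- prefix on a phrase is ignored here
        elif not word:
            continue  # empty quotes ""
        elif op == "+":
            q.required.append(word)
        elif op == "-":
            q.excluded.append(word)
        else:
            q.optional.append(word)
    return q


print(parse_query('+java -python "search engine" lucene'))
```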
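Okapi BM25, one of the weighting models listed above, scores a document by summing, over the query terms, a saturated and length-normalised term frequency multiplied by an inverse document frequency. The Python sketch below assumes the usual defaults k1 = 1.2 and b = 0.75 and uses the smoothed IDF log(1 + (N - df + 0.5)/(df + 0.5)), which stays positive even for terms that appear in most documents. These are assumptions; Terrier's own BM25 implementation may differ in details such as the IDF variant or a query-term-frequency component.

```python
import math
from collections import Counter
from typing import Dict, List


def bm25_score(query_terms: List[str], doc_terms: List[str], df: Dict[str, int],
               num_docs: int, avg_doc_len: float, k1: float = 1.2, b: float = 0.75) -> float:
    """Score one document against a query with Okapi BM25.

    df maps each term to the number of documents containing it.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    # Length normalisation: documents longer than average are penalised.
    norm = k1 * (1 - b + b * doc_len / avg_doc_len)
    score = 0.0
    for term in query_terms:
        if tf[term] == 0:
            continue
        # Smoothed IDF, always positive.
        idf = math.log(1 + (num_docs - df[term] + 0.5) / (df[term] + 0.5))
        # Term-frequency saturation is controlled by k1.
        score += idf * tf[term] * (k1 + 1) / (tf[term] + norm)
    return score


docs = {
    "d1": "open source search engine written in java".split(),
    "d2": "search engine evaluation with trec collections".split(),
    "d3": "java virtual machine tuning".split(),
}
df = Counter(t for terms in docs.values() for t in set(terms))
avg_len = sum(len(t) for t in docs.values()) / len(docs)
ranking = sorted(docs, key=lambda d: bm25_score(["java", "search"], docs[d], df, len(docs), avg_len),
                 reverse=True)
print(ranking)
```

Here d1 matches both query terms and ranks first; d3 and d2 each match one term, and the shorter d3 ranks ahead of d2 because of length normalisation.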
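Language modelling, also in the list above, ranks documents by the probability that a document's language model generates the query. A common smoothing choice is the Dirichlet prior, which blends the document's term counts with μ pseudo-counts drawn from the whole collection. The sketch below computes that log query likelihood; the default μ = 2500 is a conventional value and an assumption here, not necessarily Terrier's setting.

```python
import math
from collections import Counter
from typing import Dict, List


def dirichlet_lm_score(query_terms: List[str], doc_terms: List[str],
                       coll_tf: Dict[str, int], coll_len: int, mu: float = 2500.0) -> float:
    """Log query likelihood of a document under a Dirichlet-smoothed language model.

    coll_tf maps each term to its number of occurrences in the whole collection;
    coll_len is the total number of tokens in the collection.
    """
    tf = Counter(doc_terms)
    doc_len = len(doc_terms)
    score = 0.0
    for term in query_terms:
        p_coll = coll_tf.get(term, 0) / coll_len
        if p_coll == 0:
            continue  # unseen in the collection: skip rather than take log(0)
        # Smoothed probability: document counts plus mu pseudo-counts from the collection.
        p = (tf[term] + mu * p_coll) / (doc_len + mu)
        score += math.log(p)
    return score


collection = ["java search engine".split(), "java virtual machine".split()]
coll_tf = Counter(t for doc in collection for t in doc)
coll_len = sum(len(doc) for doc in collection)
for doc in collection:
    print(doc, dirichlet_lm_score(["search"], doc, coll_tf, coll_len, mu=10))
```

A small μ is used in the usage lines only because the toy collection is tiny; larger values smooth more heavily towards the collection model.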
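Query expansion by pseudo-relevance feedback assumes that the top-ranked documents are relevant and adds their strongest terms to the query. The sketch below follows Rocchio's formula, q' = α·q + β·(centroid of the top k documents), with no negative-feedback component. Using length-normalised term frequencies for the document vectors, α = 1, β = 0.75 and three expansion terms are all illustrative assumptions; Terrier's parameter-free DFR expansion models weight candidate terms differently.

```python
from collections import Counter
from typing import Dict, List


def rocchio_expand(query_terms: List[str], feedback_docs: List[List[str]], alpha: float = 1.0,
                   beta: float = 0.75, num_expansion_terms: int = 3) -> Dict[str, float]:
    """Return an expanded, weighted query using Rocchio with pseudo-relevance feedback.

    feedback_docs holds the token lists of the top-ranked documents, assumed relevant.
    """
    query_vec = Counter(query_terms)
    # Centroid of the feedback documents' term vectors (term frequency / document length).
    centroid: Counter = Counter()
    for doc in feedback_docs:
        for term, count in Counter(doc).items():
            centroid[term] += count / len(doc) / len(feedback_docs)
    new_query = Counter({t: alpha * w for t, w in query_vec.items()})
    # Pick the highest-weighted centroid terms that are not already in the query.
    candidates = [t for t, _ in centroid.most_common() if t not in query_vec][:num_expansion_terms]
    # Original terms are reweighted too; candidates enter with weight beta * centroid.
    for term in list(query_vec) + candidates:
        new_query[term] += beta * centroid[term]
    return dict(new_query)


top_docs = [["java", "search", "lucene"], ["java", "search", "terrier"]]
print(rocchio_expand(["search"], top_docs))
```

In a full system, the expanded query is then run a second time against the index to produce the final ranking.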

Website: terrier.org
Support: Documentation, GitHub Code Repository
Developer: School of Computing Science, University of Glasgow
License: Mozilla Public Licence
