Natural Language Processing

TiMBL – Tilburg Memory-Based Learner

The Tilburg Memory-Based Learner, TiMBL, is an open source tool for NLP research and for many other domains where classification tasks are learned from examples. It is an efficient implementation of the k-nearest neighbor classifier and a core component of several NLP software systems, such as MBT (memory-based tagger generator), Frog (Dutch morpho-syntactic analyzer), Valkuil.net (Dutch context-sensitive spelling corrector), and SoothSayer (Dutch word completion).

TiMBL is an open source software package implementing several memory-based learning algorithms. Among them are IB1-IG, an implementation of k-nearest neighbor classification with feature weighting suited to symbolic feature spaces, and IGTree, a decision-tree approximation of IB1-IG. All implemented algorithms store some representation of the training set explicitly in memory. During testing, new cases are classified by extrapolation from the most similar stored cases.
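To make the IB1-IG idea concrete, here is a minimal C++ sketch, assuming a small in-memory training set with purely symbolic features. It computes an information-gain weight per feature, measures distance with the weighted overlap metric (the sum of the weights of mismatching features), and takes a majority vote among the k closest stored instances. This is not TiMBL's code or API: the names Instance, informationGain, weightedOverlap and classify are hypothetical, and ties are broken arbitrarily.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One stored example: symbolic feature values plus its class label (hypothetical type).
struct Instance {
    std::vector<std::string> features;
    std::string label;
};

// Entropy in bits of a class distribution given as label counts.
double entropy(const std::map<std::string, std::size_t>& counts, std::size_t total) {
    double h = 0.0;
    for (const auto& [label, n] : counts) {
        const double p = static_cast<double>(n) / static_cast<double>(total);
        h -= p * std::log2(p);
    }
    return h;
}

// Information gain per feature: H(C) minus the value-weighted H(C | feature = v).
// Assumes a non-empty training set where every instance has the same number of features.
std::vector<double> informationGain(const std::vector<Instance>& data) {
    std::map<std::string, std::size_t> classCounts;
    for (const auto& inst : data) ++classCounts[inst.label];
    const double hClass = entropy(classCounts, data.size());

    std::vector<double> weights(data.front().features.size());
    for (std::size_t f = 0; f < weights.size(); ++f) {
        // Class counts for each value of feature f.
        std::map<std::string, std::map<std::string, std::size_t>> byValue;
        for (const auto& inst : data) ++byValue[inst.features[f]][inst.label];
        double hCond = 0.0;
        for (const auto& [value, counts] : byValue) {
            std::size_t nv = 0;
            for (const auto& [label, n] : counts) nv += n;
            hCond += static_cast<double>(nv) / static_cast<double>(data.size()) * entropy(counts, nv);
        }
        weights[f] = hClass - hCond;
    }
    return weights;
}

// Weighted overlap distance: sum of the weights of features whose values differ.
double weightedOverlap(const std::vector<std::string>& a, const std::vector<std::string>& b,
                       const std::vector<double>& w) {
    double d = 0.0;
    for (std::size_t f = 0; f < w.size(); ++f)
        if (a[f] != b[f]) d += w[f];
    return d;
}

// Majority vote among the k closest stored instances (ties broken arbitrarily).
std::string classify(const std::vector<Instance>& memory, const std::vector<double>& w,
                     const std::vector<std::string>& query, std::size_t k) {
    std::vector<std::pair<double, const Instance*>> dists;
    for (const auto& inst : memory)
        dists.emplace_back(weightedOverlap(inst.features, query, w), &inst);
    k = std::min(k, dists.size());
    std::partial_sort(dists.begin(), dists.begin() + static_cast<std::ptrdiff_t>(k), dists.end(),
                      [](const auto& x, const auto& y) { return x.first < y.first; });
    std::map<std::string, std::size_t> votes;
    for (std::size_t i = 0; i < k; ++i) ++votes[dists[i].second->label];
    std::string best;
    std::size_t bestCount = 0;
    for (const auto& [label, n] : votes)
        if (n > bestCount) { best = label; bestCount = n; }
    return best;
}
```

With a toy training set where the first feature determines the class and the second is noise, informationGain assigns the first feature a weight of 1 bit and the second a weight of 0, so mismatches on the noisy feature no longer affect the distance. That is the point of feature weighting in symbolic spaces, where plain overlap would count every mismatch equally.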

Memory-Based Learning (MBL) is a machine-learning method applicable to a wide range of tasks in Natural Language Processing (NLP).

TiMBL is a product of the ILK Research Group (Tilburg University, The Netherlands) and the CLiPS Research Centre (University of Antwerp, Belgium).

Key Features

  • Multi-CPU support.
  • Fast, decision-tree-based implementation of k-nearest neighbor classification.
  • Implementations of the IB1, IB2, IGTree, TRIBL, and TRIBL2 algorithms.
  • Similarity metrics: Overlap, MVDM, Jeffrey Divergence, Dot product, Cosine.
  • Per-value similarity metrics: Levenshtein, Dice coefficient.
  • Feature weighting metrics: information gain, gain ratio, chi-squared, shared variance.
  • Distance weighting metrics: inverse, inverse linear, exponential decay.
  • Extensive verbosity options to inspect nearest neighbor sets.
  • Server functionality and extensive API.
  • Fast leave-one-out testing and internal cross-validation.
  • Handles user-defined example weighting.
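The distance weighting options listed above change how much each of the k nearest neighbors counts in the class vote. The following minimal C++ sketch assumes the neighbors arrive as (distance, label) pairs sorted by ascending distance. It uses the commonly cited forms of each scheme: inverse distance 1/(d + ε), Dudani-style inverse linear weighting that gives the nearest neighbor weight 1 and the k-th neighbor weight 0, and exponential decay e^(−α·d^β). These formulas, the ε constant, and the names Scheme and weightedVote are assumptions for illustration, not TiMBL's API or source.

```cpp
#include <cmath>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical selector for the three distance weighting schemes.
enum class Scheme { Inverse, InverseLinear, ExponentialDecay };

// neighbours: (distance, label) pairs for the k nearest, sorted by ascending distance (non-empty).
std::string weightedVote(const std::vector<std::pair<double, std::string>>& neighbours,
                         Scheme scheme, double alpha = 1.0, double beta = 1.0) {
    const double dFirst = neighbours.front().first;  // distance of the nearest neighbour
    const double dLast = neighbours.back().first;    // distance of the k-th neighbour
    std::map<std::string, double> votes;
    for (const auto& [d, label] : neighbours) {
        double w = 1.0;
        switch (scheme) {
        case Scheme::Inverse:
            w = 1.0 / (d + 1e-6);  // small constant (assumed) avoids division by zero
            break;
        case Scheme::InverseLinear:
            // Nearest gets 1, k-th gets 0; all equal distances fall back to 1.
            w = (dLast == dFirst) ? 1.0 : (dLast - d) / (dLast - dFirst);
            break;
        case Scheme::ExponentialDecay:
            w = std::exp(-alpha * std::pow(d, beta));
            break;
        }
        votes[label] += w;
    }
    // Return the label with the highest summed weight.
    std::string best;
    double bestScore = -1.0;
    for (const auto& [label, score] : votes)
        if (score > bestScore) { best = label; bestScore = score; }
    return best;
}
```

With one exact match labelled A and two neighbors labelled B at distance 1, inverse distance weighting lets the exact match dominate, while exponential decay with a small α (say 0.1) flattens the weights enough for the two B neighbors to win. The choice of scheme and its parameters therefore directly trades off locality against majority.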

Website: languagemachines.github.io
Support: Reference Guide, Frog, GitHub Code Repository
Developer: Ko van der Sloot, Antal van den Bosch and contributors
License: GNU GPL v3

TiMBL is written in C++.


Related Software

C++ Natural Language Processing Tools
  • text2vec: Framework with API for text analysis and natural language processing
  • Moses: Statistical machine translation system
  • TiMBL: Tilburg Memory-Based Learner
  • MITIE: MIT Information Extraction
  • MeTA: Modern C++ data sciences toolkit
  • Colibri Core: Efficient n-gram & skipgram modelling on text corpora
  • CRF++: Yet Another CRF toolkit
  • BLLIP Parser: Statistical natural language parser

Read our verdict in the software roundup.

