Natural Language Processing

MeTA – modern C++ data sciences toolkit

MeTA is a modern C++ data sciences toolkit. It provides a suite of tools for natural language processing, classification, information retrieval, data mining, and other text processing applications.

MeTA emphasizes the tight integration of search capabilities (indeed, text access capabilities in general) with text analysis functions, so that it can fully support building a powerful text analysis application.

Another design goal of MeTA is to facilitate education and research experiments with various algorithms. In this respect, it is similar to Indri/Lemur in its emphasis on modularity and extensibility achieved through object-oriented design. A selected subset of modules can be configured flexibly, which makes it easy to design course assignments or to experiment with a few selected algorithms in focused research projects.

Key Features

  • Text tokenization, including deep semantic features like parse trees.
  • Inverted and forward indexes with compression and various caching strategies.
  • A collection of ranking functions for searching the indexes.
  • Topic models.
  • Classification algorithms.
  • Graph algorithms.
  • Language models.
  • CRF implementation (POS-tagging, shallow parsing).
  • Wrappers for liblinear and libsvm (including libsvm dataset parsers).
  • UTF-8 support for analysis of text in various languages.
  • Multithreaded algorithms.
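
To make the indexing and ranking features above more concrete, here is a minimal sketch of the general idea behind an inverted index with a simple ranking function. It is not MeTA's API: the class name TinyInvertedIndex and its methods are hypothetical, tokenization is plain whitespace splitting, and the score is just the summed term frequency of the query terms. It also has none of the compression, caching, or multithreading listed above.

```cpp
#include <algorithm>
#include <cstddef>
#include <sstream>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical illustration only; none of these names come from MeTA.
class TinyInvertedIndex {
public:
    // Split the text on whitespace and record how often each term
    // occurs in the new document. Returns the assigned document id.
    std::size_t add_document(const std::string& text) {
        const std::size_t doc_id = num_docs_++;
        std::istringstream stream(text);
        std::string term;
        while (stream >> term) {
            ++postings_[term][doc_id];
        }
        return doc_id;
    }

    // Score every document that contains at least one query term by the
    // summed term frequency, and return (doc_id, score) pairs sorted by
    // descending score (ties broken by ascending doc_id).
    std::vector<std::pair<std::size_t, int>> search(const std::string& query) const {
        std::unordered_map<std::size_t, int> scores;
        std::istringstream stream(query);
        std::string term;
        while (stream >> term) {
            const auto it = postings_.find(term);
            if (it == postings_.end()) {
                continue;  // term does not occur in any document
            }
            for (const auto& entry : it->second) {
                scores[entry.first] += entry.second;
            }
        }
        std::vector<std::pair<std::size_t, int>> ranked(scores.begin(), scores.end());
        std::sort(ranked.begin(), ranked.end(),
                  [](const auto& a, const auto& b) {
                      if (a.second != b.second) {
                          return a.second > b.second;
                      }
                      return a.first < b.first;
                  });
        return ranked;
    }

private:
    std::size_t num_docs_ = 0;
    // term -> (doc_id -> term frequency in that document)
    std::unordered_map<std::string, std::unordered_map<std::size_t, int>> postings_;
};
```

A caller would add documents with add_document and then pass a query string to search. A real toolkit would replace the raw term-frequency score with a proper ranking function and store postings in a compressed on-disk format.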

Website: github.com/meta-toolkit/meta
Developer: MeTA Team – University of Illinois at Urbana-Champaign
License: Open source

MeTA is written in C++.


Related Software

C++ Natural Language Processing Tools

  • text2vec: Framework with API for text analysis and natural language processing
  • Moses: Statistical machine translation system
  • TiMBL: Tilburg Memory-Based Learner
  • MITIE: MIT Information Extraction
  • MeTA: Modern C++ data sciences toolkit
  • Colibri Core: Efficient n-gram & skipgram modelling on text corpora
  • CRF++: Yet Another CRF toolkit
  • BLLIP Parser: Statistical natural language parser


