CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data.
CRF++ is designed for general-purpose use and can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction, and Text Chunking.
The software is published under an open source license.
Key Features
- Designed as a general purpose tool.
- Feature sets can be redefined by the user.
- Written in C++ with STL.
- Fast training based on L-BFGS, a quasi-Newton algorithm for large-scale numerical optimization problems.
- Lower memory usage in both training and testing.
- Encoding/decoding in practical time.
- Can produce n-best outputs.
- Can perform single-best MIRA training.
- Can output marginal probabilities for all candidates.
- Uses exactly the same data format as YamCha, a generic, customizable, open source text chunker aimed at many NLP tasks, such as POS tagging, Named Entity Recognition, base NP chunking, and Text Chunking. YamCha uses a state-of-the-art machine learning algorithm called Support Vector Machines (SVMs), first introduced by Vapnik in 1995.
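To make the "marginal probabilities for all candidates" feature more concrete, here is a minimal sketch of how per-position label marginals can be computed for a linear-chain CRF with the forward-backward algorithm. It is a toy illustration, not CRF++'s actual implementation: the function name `marginals`, the score tables, and the two-label setup are all assumptions made for the example, and it works in plain probability space rather than the log space a real toolkit would likely use for numerical stability.

```python
import math


def marginals(emit, trans):
    """Return P(y_t = y | x) for every position t and label y.

    emit[t][y]  : score of label y at position t (hypothetical values)
    trans[a][b] : score of moving from label a to label b (hypothetical values)
    """
    n, k = len(emit), len(emit[0])
    alpha = [[0.0] * k for _ in range(n)]
    beta = [[0.0] * k for _ in range(n)]

    # Forward pass: alpha[t][y] sums the weights of all prefixes ending in y at t.
    for y in range(k):
        alpha[0][y] = math.exp(emit[0][y])
    for t in range(1, n):
        for y in range(k):
            incoming = sum(alpha[t - 1][p] * math.exp(trans[p][y]) for p in range(k))
            alpha[t][y] = math.exp(emit[t][y]) * incoming

    # Backward pass: beta[t][y] sums the weights of all suffixes following y at t.
    for y in range(k):
        beta[n - 1][y] = 1.0
    for t in range(n - 2, -1, -1):
        for y in range(k):
            beta[t][y] = sum(
                math.exp(trans[y][q] + emit[t + 1][q]) * beta[t + 1][q]
                for q in range(k)
            )

    # Partition function: total weight of all label sequences.
    z = sum(alpha[n - 1])
    return [[alpha[t][y] * beta[t][y] / z for y in range(k)] for t in range(n)]


# Made-up scores for a 3-token sentence with 2 candidate labels.
emit = [[1.0, 0.0], [0.0, 0.5], [0.2, 0.2]]
trans = [[0.5, -0.5], [-0.5, 0.5]]
for row in marginals(emit, trans):
    print([round(p, 3) for p in row])
```

Each printed row gives the probabilities of the candidate labels at one position, and each row sums to 1.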
Both the training file and the test file must follow a particular format for CRF++ to work properly. Each file consists of multiple tokens, and each token consists of multiple columns; the number of columns is fixed across all tokens. The definition of a token depends on the task, but in most typical cases tokens simply correspond to words. Each token must be written on one line, with the columns separated by white space (spaces or tab characters). A sequence of tokens forms a sentence, and an empty line marks the boundary between sentences.
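The format rules above can be sketched as a small reader. This is an illustrative helper, not part of CRF++: the function name `parse_sentences` is hypothetical, and the sample data (word, part-of-speech tag, and chunk label as the three columns) is an assumed layout chosen only to show the structure. The sketch checks the two constraints described in the text: a fixed number of columns per token, and empty lines as sentence boundaries.

```python
from typing import List


def parse_sentences(text: str) -> List[List[List[str]]]:
    """Split column-formatted text into sentences -> tokens -> columns."""
    sentences: List[List[List[str]]] = []
    current: List[List[str]] = []
    width = None  # number of columns, fixed by the first token seen

    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # An empty line closes the current sentence (if any).
            if current:
                sentences.append(current)
                current = []
            continue

        columns = line.split()  # splits on spaces and tabs
        if width is None:
            width = len(columns)
        elif len(columns) != width:
            raise ValueError(
                f"line {line_no}: expected {width} columns, got {len(columns)}"
            )
        current.append(columns)

    if current:  # the file may not end with an empty line
        sentences.append(current)
    return sentences


# Hypothetical sample: word, POS tag, chunk label.
SAMPLE = """\
He PRP B-NP
reckons VBZ B-VP
the DT B-NP
deficit NN I-NP
. . O

Stocks NNS B-NP
fell VBD B-VP
. . O
"""

sentences = parse_sentences(SAMPLE)
for sentence in sentences:
    print(" ".join(token[0] for token in sentence))
```

Running a check like this on a data file before training is one way to catch a token with a missing column early.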
Website: taku910.github.io/crfpp
Support: GitHub Code Repository
Developer: Taku Kudo
License: GNU Lesser General Public License or new BSD License
CRF++ is written in Shell and C++.
Related Software
| C++ Natural Language Processing Tools | |
|---|---|
| text2vec | Framework with API for text analysis and natural language processing |
| Moses | Statistical machine translation system |
| TiMBL | Tilburg Memory-Based Learner |
| MITIE | MIT Information Extraction |
| MeTA | Modern C++ data sciences toolkit |
| Colibri Core | Efficient n-gram & skipgram modelling on text corpora |
| CRF++ | Yet Another CRF toolkit |
| BLLIP Parser | Statistical natural language parser |

