CRF++: Yet Another CRF toolkit

CRF++ is a simple, customizable, and open source implementation of Conditional Random Fields (CRFs) for segmenting/labeling sequential data.

CRF++ is designed as a general-purpose tool and can be applied to a variety of NLP tasks, such as Named Entity Recognition, Information Extraction, and Text Chunking.

The software is published under an open source license.

Key Features

Designed as a general purpose tool.
Can redefine feature sets.
Written in C++ with STL.
Fast training based on L-BFGS, a quasi-Newton algorithm for large-scale numerical optimization problems.
Low memory usage in both training and testing.
Encoding/decoding in practical time.
Can perform n-best outputs.
Can perform single-best MIRA training.
Can output marginal probabilities for all candidates.
Uses exactly the same data format as YamCha, a generic, customizable, and open source text chunker designed for many NLP tasks, such as POS tagging, Named Entity Recognition, base NP chunking, and Text Chunking. YamCha uses a state-of-the-art machine learning algorithm called Support Vector Machines (SVMs), first introduced by Vapnik in 1995.

Both the training file and the test file must follow a particular format for CRF++ to work properly. Each file consists of multiple tokens, and each token consists of multiple columns; the number of columns is fixed across all tokens. The definition of a token depends on the task, but in most typical cases tokens simply correspond to words. Each token must be written on one line, with the columns separated by white space (spaces or tab characters). A sequence of tokens forms a sentence, and an empty line marks the boundary between sentences.
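To make the format concrete, here is a minimal Python sketch of a reader for it. It is not part of CRF++; the function name parse_sequences and the sample data (word, part-of-speech tag, and chunk label columns) are illustrative assumptions. It only applies the rules described above: one token per line, whitespace-separated columns, the same number of columns everywhere, and an empty line between sentences.

```python
from typing import List, Optional

Token = List[str]        # one token = its list of columns
Sentence = List[Token]   # one sentence = a sequence of tokens


def parse_sequences(text: str) -> List[Sentence]:
    """Split text in the column format described above into sentences.

    Hypothetical helper for illustration; not a CRF++ API.
    """
    sentences: List[Sentence] = []
    current: Sentence = []
    width: Optional[int] = None  # column count, fixed by the first token

    for line_no, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            # An empty line closes the current sentence, if any.
            if current:
                sentences.append(current)
                current = []
            continue

        columns = line.split()  # splits on spaces and tabs
        if width is None:
            width = len(columns)
        elif len(columns) != width:
            raise ValueError(
                f"line {line_no}: expected {width} columns, got {len(columns)}"
            )
        current.append(columns)

    # Keep the last sentence even if the file lacks a trailing empty line.
    if current:
        sentences.append(current)
    return sentences


# Illustrative sample: two sentences, three columns per token.
SAMPLE = """\
He PRP B-NP
reckons VBZ B-VP

The DT B-NP
deficit NN I-NP
"""

sentences = parse_sequences(SAMPLE)
```

With the sample above, sentences holds two sentences of two tokens each, and a line with a different number of columns raises ValueError. Checking the column count up front is a design choice of this sketch, intended to catch malformed input files early.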

Website: taku910.github.io/crfpp
Support: GitHub Code Repository
Developer: Taku Kudo
License: GNU Lesser General Public License or new BSD License

CRF++ is written in Shell and C++.

Related Software

C++ Natural Language Processing Tools
text2vec	Framework with API for text analysis and natural language processing
Moses	Statistical machine translation system
TiMBL	Tilburg Memory-Based Learner
MITIE	MIT Information Extraction
MeTA	Modern C++ data sciences toolkit
Colibri Core	Efficient n-gram & skipgram modelling on text corpora
CRF++	Yet Another CRF toolkit
BLLIP Parser	Statistical natural language parser
