The Apache OpenNLP library is an open source, machine learning-based toolkit for processing natural language text.
It includes a sentence detector, a tokenizer, a name finder, a part-of-speech (POS) tagger, a chunker, and a parser. Its APIs can be easily integrated into Java programs.
The goal of the OpenNLP project is to create a mature toolkit. A further goal is to provide a large number of pre-built models for a variety of languages, as well as the annotated text resources from which those models are derived.
Features include:
- Tokenization. OpenNLP offers multiple tokenizer implementations:
  - Whitespace Tokenizer – A whitespace tokenizer; non-whitespace sequences are identified as tokens.
  - Simple Tokenizer – A character class tokenizer; sequences of the same character class are tokens.
  - Learnable Tokenizer – A maximum entropy tokenizer that detects token boundaries based on a probability model.
- Sentence segmentation.
- Part-of-speech tagging – marks tokens with their corresponding word type based on the token itself and the context of the token.
- Named entity extraction – the Name Finder can detect named entities and numbers in text.
- Chunking – divides a text into syntactically correlated groups of words, such as noun groups and verb groups, but does not specify their internal structure or their role in the main sentence.
- Parsing – offers two different parser implementations: the chunking parser and the treeinsert parser. The models available from the model download page were trained on various corpora with OpenNLP's command-line tool.
- Coreference resolution – links multiple mentions of an entity in a document together. The OpenNLP implementation is currently limited to noun phrase mentions; other mention types cannot be resolved.
- Maximum entropy machine learning.
- Perceptron-based machine learning.
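The difference between the whitespace tokenizer and the character class tokenizer described above can be illustrated with a short, self-contained Java sketch. It does not call OpenNLP (which ships these tokenizers as `WhitespaceTokenizer` and `SimpleTokenizer` in the `opennlp.tools.tokenize` package); the class and method names below are hypothetical, and the four character classes (whitespace, letter, digit, other) are a simplifying assumption that may differ in detail from OpenNLP's own implementation.

```java
import java.util.ArrayList;
import java.util.List;

public class TokenizerSketch {

    // Character classes assumed for this sketch (not taken from OpenNLP).
    enum CharClass { WHITESPACE, LETTER, DIGIT, OTHER }

    static CharClass classify(char c) {
        if (Character.isWhitespace(c)) return CharClass.WHITESPACE;
        if (Character.isLetter(c)) return CharClass.LETTER;
        if (Character.isDigit(c)) return CharClass.DIGIT;
        return CharClass.OTHER;
    }

    // Whitespace tokenization: every maximal run of non-whitespace characters is a token.
    static List<String> whitespaceTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        for (String part : text.trim().split("\\s+")) {
            if (!part.isEmpty()) {   // splitting an empty string yields one empty part
                tokens.add(part);
            }
        }
        return tokens;
    }

    // Character-class tokenization: a token is a maximal run of characters of the same class.
    // Whitespace only separates tokens. Runs of "other" characters (punctuation, symbols)
    // stay together only while the same character repeats, so "..." remains one token.
    static List<String> characterClassTokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int start = -1;                              // start index of the open token, -1 if none
        CharClass prevClass = CharClass.WHITESPACE;
        char prevChar = 0;
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            CharClass cls = classify(c);
            boolean boundary = cls != prevClass
                    || (cls == CharClass.OTHER && c != prevChar);
            if (boundary) {
                if (start >= 0) {
                    tokens.add(text.substring(start, i));    // close the previous token
                }
                start = (cls == CharClass.WHITESPACE) ? -1 : i;
            }
            prevClass = cls;
            prevChar = c;
        }
        if (start >= 0) {
            tokens.add(text.substring(start));               // close the final token
        }
        return tokens;
    }

    public static void main(String[] args) {
        String text = "Mr. Smith paid $3.50, didn't he?";
        System.out.println(whitespaceTokenize(text));
        System.out.println(characterClassTokenize(text));
    }
}
```

Running `main` prints `[Mr., Smith, paid, $3.50,, didn't, he?]` followed by `[Mr, ., Smith, paid, $, 3, ., 50, ,, didn, ', t, he, ?]`. The whitespace approach leaves punctuation attached to words, while the character-class approach splits it off but also breaks apart `$3.50` and `didn't`; cases like these are where a trained, probability-based tokenizer can be expected to do better.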
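Chunker output is commonly represented as one tag per token, where a tag such as `B-NP` begins a noun group, `I-NP` continues it, and `O` marks a token outside any group. The Java sketch below assumes tags in this B-/I-/O style (as in the CoNLL-2000 chunking data) and shows how they turn into the flat word groups described above. The tag array is written by hand for illustration rather than produced by OpenNLP, and `groupChunks` is a hypothetical helper, not part of the OpenNLP API.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkSketch {

    // Groups tokens into flat chunks using B-/I-/O tags:
    // "B-XX" starts a chunk of type XX, "I-XX" continues an open chunk of the same type,
    // and "O" marks a token outside any chunk (kept as a bare token in the output).
    // An "I-XX" tag with no matching open chunk is treated as the start of a new chunk.
    static List<String> groupChunks(String[] tokens, String[] tags) {
        List<String> out = new ArrayList<>();
        String type = null;                       // type of the open chunk, null if none
        StringBuilder words = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            String tag = tags[i];
            boolean continues = tag.startsWith("I-") && tag.substring(2).equals(type);
            if (continues) {
                words.append(' ').append(tokens[i]);
                continue;
            }
            if (type != null) {                   // close the open chunk
                out.add("[" + type + " " + words + "]");
                type = null;
            }
            if (tag.startsWith("B-") || tag.startsWith("I-")) {
                type = tag.substring(2);          // open a new chunk
                words.setLength(0);
                words.append(tokens[i]);
            } else {
                out.add(tokens[i]);               // "O": outside any chunk
            }
        }
        if (type != null) {
            out.add("[" + type + " " + words + "]");
        }
        return out;
    }

    public static void main(String[] args) {
        String[] tokens = {"The", "quick", "brown", "fox", "jumped", "over", "the", "lazy", "dog", "."};
        // Hand-written, hypothetical chunk tags for the tokens above.
        String[] tags = {"B-NP", "I-NP", "I-NP", "I-NP", "B-VP", "B-PP", "B-NP", "I-NP", "I-NP", "O"};
        System.out.println(String.join(" ", groupChunks(tokens, tags)));
    }
}
```

Running `main` prints `[NP The quick brown fox] [VP jumped] [PP over] [NP the lazy dog] .`. Each group is a flat span of words: nothing in the output says which word heads a group or how the groups relate to each other in the sentence, which is the structure a full parser adds.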
Website: opennlp.apache.org
Support: Documentation, GitHub
Developer: The Apache Software Foundation
License: Apache License Version 2.0
Apache OpenNLP is written in Java.