Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. Statistical machine translation was for many years the dominant approach in the field, and it was employed by the online translation systems deployed by the likes of Google and Microsoft.
All you need is a collection of translated texts (a parallel corpus). Once you have a trained model, an efficient search algorithm quickly finds the highest-probability translation among an exponential number of possible choices.
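To show why the search does not have to enumerate every choice, here is a minimal sketch of a monotone (no reordering) phrase-based decoder that uses dynamic programming over a toy phrase table, where a phrase is a contiguous sequence of words. The table contents, the name `decode_monotone`, and scoring by phrase probability alone are illustrative assumptions. The actual Moses decoder uses a beam search over a richer log-linear model that also includes a language model and allows reordering.

```python
import math

# Hypothetical toy phrase table: source phrase (tuple of words) ->
# list of (target phrase, probability). Real models score many more features.
PHRASE_TABLE = {
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("haus",): [("house", 0.8), ("home", 0.2)],
    ("das", "haus"): [("the house", 0.6)],
    ("ist",): [("is", 0.9)],
    ("klein",): [("small", 0.7), ("little", 0.3)],
}

def decode_monotone(source, table, max_len=3):
    """Return (best translation, log-probability) for `source`, or None.

    best[i] holds the best (log-prob, target phrases) covering source[:i].
    Each state is extended by one source phrase, so the work grows with
    n * max_len lookups instead of with the exponential number of segmentations.
    """
    n = len(source)
    best = [None] * (n + 1)
    best[0] = (0.0, [])
    for i in range(n):
        if best[i] is None:
            continue  # no way to cover source[:i] with known phrases
        score_i, words_i = best[i]
        for j in range(i + 1, min(n, i + max_len) + 1):
            for target, prob in table.get(tuple(source[i:j]), []):
                cand = (score_i + math.log(prob), words_i + [target])
                if best[j] is None or cand[0] > best[j][0]:
                    best[j] = cand
    if best[n] is None:
        return None
    score, words = best[n]
    return " ".join(words), score

print(decode_monotone("das haus ist klein".split(), PHRASE_TABLE)[0])  # prints: the house is small
```

Here the two-word phrase "das haus" (0.6) beats translating the words separately (0.7 × 0.8 = 0.56), and the dynamic program finds this without listing every segmentation.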
The training process in Moses takes in the parallel data and uses co-occurrences of words and segments (known as phrases) to infer translation correspondences between the two languages of interest. In phrase-based machine translation, these correspondences are simply between contiguous sequences of words, whereas in hierarchical phrase-based machine translation or syntax-based translation, more structure is added to the correspondences.
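The co-occurrence idea can be illustrated with IBM Model 1, a classic expectation-maximization algorithm that learns word translation probabilities from sentence pairs alone. This is a simplified sketch (no NULL word, no alignment output) with a hypothetical function name; in the Moses pipeline, word alignment is typically delegated to an external aligner such as GIZA++ rather than computed by code like this.

```python
from collections import defaultdict

def ibm_model1(corpus, iterations=10):
    """Estimate t(f | e) with IBM Model 1 EM (simplified: no NULL word).

    corpus: list of (source_words, target_words) sentence pairs.
    Returns a dict mapping (f, e) -> t(f | e).
    """
    src_vocab = {f for fs, _ in corpus for f in fs}
    # Uniform start: every co-occurring pair is initially equally likely.
    t = defaultdict(lambda: 1.0 / len(src_vocab))
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        for fs, es in corpus:
            for f in fs:
                # E-step: share one unit of count for f among the target
                # words it co-occurs with, proportional to current t(f | e).
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm
                    count[(f, e)] += frac
                    total[e] += frac
        # M-step: renormalise expected counts into probabilities per e.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

corpus = [
    ("das haus".split(), "the house".split()),
    ("das buch".split(), "the book".split()),
    ("ein buch".split(), "a book".split()),
]
t = ibm_model1(corpus)
print(max(["the", "house"], key=lambda e: t[("haus", e)]))  # prints: house
```

Because "das" appears with "the" in two sentences, EM gradually attributes "das" to "the", which in turn frees "haus" to pair with "house".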
The two main components in Moses are the training pipeline and the decoder. There are also a variety of contributed tools and utilities. The training pipeline is a collection of tools (mainly written in Perl, with some in C++) that take the raw data (parallel and monolingual) and turn it into a machine translation model. The decoder is a single C++ application which, given a trained machine translation model and a source sentence, translates the source sentence into the target language.
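One step of turning aligned parallel data into a model can be sketched as phrase-pair extraction followed by relative-frequency scoring. The sketch assumes word alignments are already available as sets of (source index, target index) links, and the function names are hypothetical. It is deliberately simplified: it does not extend phrases over unaligned words, and a real Moses phrase table stores several scores per pair (for example, translation probabilities in both directions and lexical weights) rather than the single direct probability computed here.

```python
from collections import Counter

def extract_phrases(src, tgt, alignment, max_len=3):
    """Return (source phrase, target phrase) pairs consistent with `alignment`.

    alignment: set of (i, j) links meaning src[i] is aligned to tgt[j].
    A span pair is consistent if it contains at least one link and no link
    connects a word inside one span to a word outside the other.
    """
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(len(src), i1 + max_len)):
            # Target positions linked to the source span src[i1..i2].
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            if j2 - j1 + 1 > max_len:
                continue
            # Reject if a word in tgt[j1..j2] links outside src[i1..i2].
            if any(j1 <= j <= j2 and not i1 <= i <= i2 for (i, j) in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]), " ".join(tgt[j1:j2 + 1])))
    return pairs

def score_phrases(corpus, max_len=3):
    """Estimate phi(target | source) by relative frequency of extracted pairs."""
    pair_counts = Counter()
    for src, tgt, alignment in corpus:
        pair_counts.update(extract_phrases(src, tgt, alignment, max_len))
    src_counts = Counter()
    for (s, _), c in pair_counts.items():
        src_counts[s] += c
    return {(s, tp): c / src_counts[s] for (s, tp), c in pair_counts.items()}

links = {(0, 0), (1, 1), (2, 2), (3, 3)}
for pair in extract_phrases("das haus ist klein".split(), "the house is small".split(), links):
    print(pair)
```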
Key Features
- Two types of translation models: phrase-based and tree-based.
- Factored translation models, which enable the integration of linguistic and other information at the word level.
- Moses allows the decoding of confusion networks and word lattices, enabling easy integration with ambiguous upstream tools, such as automatic speech recognizers or morphological analyzers.
- Supports models that have become known as hierarchical phrase-based models and syntax-based models. These models use a grammar consisting of SCFG (Synchronous Context-Free Grammar) rules.
- The Experiment Management System (EMS) makes using Moses much easier by automating the full pipeline of training, tuning and evaluation.
- The decoder runs on Linux (32 and 64-bit), Windows, Cygwin and Mac OS X (Intel and PowerPC). The training and tuning scripts are regularly run on Linux (32 and 64-bit), and occasionally on Mac (Intel).
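Confusion networks are a compact way to hand ambiguous upstream output to the decoder: a sequence of slots, each holding alternative words with probabilities. The sketch below uses a hypothetical in-memory representation and made-up recognizer output (it is not Moses's input file format) to show that the number of paths grows multiplicatively with the number of slots while the network itself stays linear in size. A decoder that accepts the whole network can weigh every path during translation instead of committing to the single most likely transcription.

```python
import math
from itertools import product

# Hypothetical speech-recognizer output for "das haus ist klein".
# Each slot lists (word, probability); "" is an empty (skip) alternative.
CN = [
    [("das", 0.6), ("dass", 0.4)],
    [("haus", 0.7), ("aus", 0.3)],
    [("ist", 0.9), ("", 0.1)],
    [("klein", 1.0)],
]

def num_paths(cn):
    """Number of paths through the network: the product of the slot sizes."""
    return math.prod(len(slot) for slot in cn)

def enumerate_paths(cn):
    """Yield (sentence, probability) for every path through the network."""
    for choice in product(*cn):
        words = [w for w, _ in choice if w]
        yield " ".join(words), math.prod(p for _, p in choice)

def best_path(cn):
    """Most probable path; slots are independent, so take the argmax per slot."""
    choice = [max(slot, key=lambda wp: wp[1]) for slot in cn]
    words = [w for w, _ in choice if w]
    return " ".join(words), math.prod(p for _, p in choice)

print(num_paths(CN))     # prints: 8
print(best_path(CN)[0])  # prints: das haus ist klein
```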
Website: github.com/moses-smt/mosesdecoder
Developer: Hieu Hoang, Philipp Koehn
License: GNU Lesser General Public License
Moses is written in Perl and C++.
Related Software
| Natural Language Processing | Description |
|---|---|
| PyTorch-Transformers | Library of state-of-the-art pre-trained models |
| Natural Language Toolkit | Suite of open source Python modules, data sets and tutorials |
| Stanford CoreNLP | Extensible annotation-based NLP pipeline |
| spaCy | Industrial strength natural language processing |
| scikit-learn | Machine learning library for Python |
| Gensim | Python-based vector space modeling and topic modeling toolkit |
| flair | Simple framework for state-of-the-art NLP |
| Apache OpenNLP | Machine learning based toolkit |
| DL4J | Deploy and train deep learning models |
| Apache Lucene | Full-featured information retrieval software library |
| UIMA | Implementation of the UIMA specification |
| tidytext | Text mining using dplyr, ggplot2, and other tidy tools |
| text2vec | Framework with API for text analysis and NLP |
| quanteda | R package for Quantitative Analysis of Textual Data |
| Moses | Statistical machine translation system |
Read our verdict in the software roundup.
| C++ Natural Language Processing Tools | Description |
|---|---|
| text2vec | Framework with API for text analysis and natural language processing |
| Moses | Statistical machine translation system |
| TiMBL | Tilburg Memory-Based Learner |
| MITIE | MIT Information Extraction |
| MeTA | Modern C++ data sciences toolkit |
| Colibri Core | Efficient n-gram & skipgram modelling on text corpora |
| CRF++ | Yet Another CRF toolkit |
| BLLIP Parser | Statistical natural language parser |

