Moses – statistical machine translation system

Moses is a statistical machine translation system that allows you to automatically train translation models for any language pair. Statistical machine translation is the dominant approach in the field at the moment, and is employed by the online translation systems deployed by the likes of Google and Microsoft.

All you need is a collection of translated texts (a parallel corpus). Once you have a trained model, an efficient search algorithm quickly finds the highest-probability translation among an exponential number of candidates.
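As a sketch of what that search optimizes, the standard formulation of statistical machine translation can be written as below. The notation is an assumption, not taken from this article: f is the source sentence, e ranges over candidate translations, and the feature functions h_i with weights λ_i are the usual log-linear model notation. The second equality holds because the normalizing constant does not depend on e.

```latex
\hat{e} = \arg\max_{e} \, p(e \mid f)
        = \arg\max_{e} \sum_{i} \lambda_i \, h_i(e, f)
```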

The training process in Moses takes in the parallel data and uses co-occurrences of words and segments (known as phrases) to infer translation correspondences between the two languages of interest. In phrase-based machine translation, these correspondences are simply between continuous sequences of words, whereas in hierarchical phrase-based machine translation or syntax-based translation, more structure is added to the correspondences.
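To give a feel for what counting co-occurrences means, here is a minimal Python sketch. It is only an illustration and not how Moses itself trains: the function name, the toy German-English corpus, and the whitespace tokenization are all assumptions made for the example, and a real pipeline goes on to perform word alignment and phrase extraction.

```python
from collections import Counter
from itertools import product


def cooccurrence_counts(parallel_corpus):
    """Count how often each (source word, target word) pair appears
    in the same sentence pair. Hypothetical helper for illustration."""
    counts = Counter()
    for source_sentence, target_sentence in parallel_corpus:
        # Use sets so a word pair counts at most once per sentence pair.
        source_words = set(source_sentence.lower().split())
        target_words = set(target_sentence.lower().split())
        counts.update(product(source_words, target_words))
    return counts


# Toy parallel corpus (invented for this example).
toy_corpus = [
    ("das haus", "the house"),
    ("das buch", "the book"),
    ("ein buch", "a book"),
]

counts = cooccurrence_counts(toy_corpus)
print(counts[("das", "the")])    # 2
print(counts[("buch", "book")])  # 2
print(counts[("haus", "book")])  # 0
```

Pairs that really are translations of each other, such as "buch" and "book", tend to co-occur more often than unrelated pairs, which is the signal that training exploits.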

The two main components in Moses are the training pipeline and the decoder. There is also a variety of contributed tools and utilities. The training pipeline is really a collection of tools (mainly written in Perl, with some in C++) which take the raw data (parallel and monolingual) and turn it into a machine translation model. The decoder is a single C++ application which, given a trained machine translation model and a source sentence, will translate the source sentence into the target language.

Features include:

  • Two types of translation models: phrase-based and tree-based.
  • Factored translation models, which enable the integration of linguistic and other information at the word level.
  • Moses allows the decoding of confusion networks and word lattices, enabling easy integration with ambiguous upstream tools, such as automatic speech recognizers or morphological analyzers.
  • Supports models that have become known as hierarchical phrase-based models and syntax-based models. These models use a grammar consisting of SCFG (Synchronous Context-Free Grammar) rules.
  • The Experiment Management System makes using Moses much easier.
  • The decoder runs on Linux (32 and 64-bit), Windows, Cygwin, and Mac OS X (Intel and PowerPC). The training and tuning scripts are regularly run on Linux (32 and 64-bit), and occasionally on Mac (Intel).
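
The SCFG rules mentioned in the feature list pair a source-side pattern with a target-side pattern that share the same nonterminal slots, so one rule can both translate and reorder. The Python sketch below is a toy illustration only: the `SyncRule` class, the `realize` helper, and the French-English rule are hypothetical and do not reflect the rule table format Moses actually uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncRule:
    """One synchronous rule: both sides share numbered nonterminal slots."""
    lhs: str
    source: tuple  # strings are terminals, ints are nonterminal slots
    target: tuple


def realize(side, fillers):
    """Replace each integer slot in `side` with its filler tokens."""
    tokens = []
    for symbol in side:
        if isinstance(symbol, int):
            tokens.extend(fillers[symbol])
        else:
            tokens.append(symbol)
    return tokens


# Hypothetical rule: X -> < X1 de X2 , X2 's X1 >
rule = SyncRule("X", (1, "de", 2), (2, "'s", 1))

# The slots are filled with sub-phrases and their (assumed) translations.
source_fillers = {1: ["la", "maison"], 2: ["Jean"]}
target_fillers = {1: ["house"], 2: ["Jean"]}

source_text = " ".join(realize(rule.source, source_fillers))
target_text = " ".join(realize(rule.target, target_fillers))
print(source_text)  # la maison de Jean
print(target_text)  # Jean 's house
```

Because slots 1 and 2 appear in a different order on the two sides, the rule swaps the two sub-phrases, which is the kind of added structure that plain phrase pairs cannot express.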

Support: Manual, Mailing Lists, GitHub Code Repository
Developer: Hieu Hoang, Philipp Koehn
License: GNU Lesser General Public License

Moses is written in Perl and C++.
