CMUSphinx (Sphinx) is the collective name for a group of speech recognition systems developed at Carnegie Mellon University.
CMUSphinx contains a number of packages for different tasks and applications:
- Pocketsphinx — a lightweight speech recognition engine, specifically tuned for handheld and mobile devices, written in C.
- Sphinxbase — contains the basic libraries shared by the CMU Sphinx trainer and all the Sphinx decoders (Sphinx-II, Sphinx-III, and PocketSphinx), as well as some common utilities for manipulating acoustic feature and audio files.
- Sphinx4 — a state-of-the-art, speaker-independent, continuous speech recognition system written in the Java programming language. The design of Sphinx-4 is based on patterns that have emerged from the design of past systems as well as new requirements based on areas that researchers currently want to explore. To exercise this framework, and to provide researchers with a “research-ready” system, Sphinx-4 also includes several implementations of both simple and state-of-the-art techniques.
- Sphinxtrain — Carnegie Mellon University’s open source acoustic model trainer.
Key Features
- State-of-the-art algorithms for efficient speech recognition. CMUSphinx tools are designed specifically for low-resource platforms.
- A flexible design.
- Focuses on practical application development and not on research.
- Wide range of tools for many speech-recognition related purposes (keyword spotting, alignment, pronunciation evaluation).
- Support for several languages including English, French, Mandarin, German, Dutch, Russian, and the ability to build models for other languages.
Website: cmusphinx.github.io
Support: FAQ, GitHub
Developer: Many contributors
License: BSD-like license
Sphinx4 is written in Java, while Pocketsphinx is written in C. Learn Java with our recommended free books and free tutorials.
Related Software
| Speech Recognition Tools | |
|---|---|
| Whisper | Automatic speech recognition (ASR) system trained on 680,000 hours of data |
| Flashlight | Fast, flexible machine learning library written entirely in C++. |
| Coqui STT | Deep-learning toolkit for training and deploying speech-to-text models |
| Kaldi | C++ toolkit designed for speech recognition researchers. |
| SpeechBrain | All-in-one conversational AI toolkit based on PyTorch |
| Handy | Offline speech-to-text application |
| ESPnet | End-to-End speech processing toolkit |
| deepspeech.pytorch | Implementation of DeepSpeech2 using Baidu Warp-CTC. |
| Whispering | Transcription application with global speech-to-text functionality |
| Julius | Two-pass large vocabulary continuous speech recognition engine |
| CMUSphinx | Speech recognition system for mobile and server applications |
| Simon | Flexible speech recognition software |
| hyprwhspr | Native speech-to-text designed for Arch / Omarchy |
| ostt | Open Speech-to-Text |
| DeepSpeech | TensorFlow implementation of Baidu's DeepSpeech architecture. |
| OpenSeq2Seq | TensorFlow-based toolkit for sequence-to-sequence models |
| Eesen | End-to-End Speech Recognition |


PocketSphinx looks very interesting, particularly its fixed-point arithmetic and efficient algorithms for GMM computation.
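
For readers unfamiliar with the idea, here is a rough sketch of scoring a diagonal-covariance GMM with integer-only arithmetic in the inner loop. It is a generic illustration and not PocketSphinx's actual code: the Q16 format, all function names, and the use of the best single component in place of a full log-sum over components are assumptions made for this example.

```python
import math

FRAC_BITS = 16            # assumed Q16 format: real value = integer / 2**16
ONE = 1 << FRAC_BITS


def to_fixed(x: float) -> int:
    """Quantize a float to the assumed Q16 fixed-point format."""
    return int(round(x * ONE))


def to_float(q: int) -> float:
    """Convert a Q16 integer back to a float (for inspection only)."""
    return q / ONE


def fixed_mul(a: int, b: int) -> int:
    """Multiply two Q16 values; the shift restores the Q16 scale."""
    # Python ints never overflow; a C version would need a wider
    # intermediate type for the product.
    return (a * b) >> FRAC_BITS


def precompute(mean, var, weight):
    """One-time float setup per Gaussian, stored as integers.

    log N(x) = log_const - sum_d prec_d * (x_d - mean_d)^2
    with prec_d = 1 / (2 * var_d) and the mixture weight folded
    into log_const.
    """
    log_const = math.log(weight) - 0.5 * sum(
        math.log(2 * math.pi * v) for v in var
    )
    return {
        "mean": [to_fixed(m) for m in mean],
        "prec": [to_fixed(1.0 / (2.0 * v)) for v in var],
        "log_const": to_fixed(log_const),
    }


def gaussian_log_score(x_q, g):
    """Weighted log-density of one Gaussian using only integer ops."""
    score = g["log_const"]
    for xq, mq, pq in zip(x_q, g["mean"], g["prec"]):
        diff = xq - mq
        score -= fixed_mul(fixed_mul(diff, diff), pq)
    return score


def gmm_log_score_max(x_q, gaussians):
    """Approximate the mixture score by its best-scoring component."""
    return max(gaussian_log_score(x_q, g) for g in gaussians)


def float_reference(x, mean, var, weight):
    """Floating-point version of the same weighted log-density."""
    return math.log(weight) - 0.5 * sum(
        math.log(2 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )
```

To try it, build the components once with `precompute(...)`, quantize each feature vector with `to_fixed`, and call `gmm_log_score_max`. The expensive logarithms are computed once up front, so scoring a frame needs only integer subtraction, multiplication, and shifts, which is the property that makes this style attractive on hardware without a floating-point unit.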