DeepSpeech is an open source Speech-To-Text engine, using a model trained by machine learning techniques. It’s a TensorFlow implementation of Baidu’s DeepSpeech architecture.
This open-source platform is designed for advanced decoding with flexible knowledge integration.
The software is in an early stage of development.
The core of the system is a bidirectional recurrent neural network (BRNN) trained to ingest speech spectrograms and generate English text transcriptions. A pre-trained English model is available for use.
The software requires Python 2.7 and Git Large File Storage, a Git extension for versioning large files.
Key Features
- Three ways to use the software:
  - Python package.
  - Command-line client.
  - Node.js package.
- Works with signed 16-bit PCM data.
- Takes a word lattice as input, performs feature extraction specified by developers, generates factor graphs based on descriptive rules, and performs learning and inference automatically.
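Because the engine works with signed 16-bit PCM data, it can help to check an audio file before passing it on. The following is a minimal sketch that uses only the Python 3 standard library; it does not call the DeepSpeech package itself, and the helper name `load_pcm16` is hypothetical, introduced here for illustration.

```python
import array
import sys
import wave


def load_pcm16(source):
    """Return (sample_rate, channels, samples) for a 16-bit PCM WAV.

    `source` is a file path or a binary file-like object (hypothetical helper,
    not part of DeepSpeech). Raises ValueError if samples are not 16 bits wide.
    """
    with wave.open(source, "rb") as wav:
        width = wav.getsampwidth()
        if width != 2:  # 2 bytes per sample means 16-bit audio
            raise ValueError("expected 16-bit samples, got %d-bit" % (8 * width))
        rate = wav.getframerate()
        channels = wav.getnchannels()
        frames = wav.readframes(wav.getnframes())

    samples = array.array("h")  # "h" is a signed 16-bit integer
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is stored little-endian
    return rate, channels, samples
```

Calling `load_pcm16("audio.wav")` on a file of your own (the file name is a placeholder) returns the sample rate, the channel count, and the samples as signed 16-bit integers, or raises `ValueError` for any other sample width.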
Supported operating systems:
- OS X 10.10, 10.11, 10.12 and 10.13
- 64-bit Linux on x86 with a modern CPU (requires at least AVX/FMA)
- 64-bit Linux on x86 with a modern CPU and an NVIDIA GPU (Compute Capability of at least 3)
- Raspbian Jessie on Raspberry Pi 3
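On Linux, the AVX/FMA requirement can be checked from the CPU flags that the kernel reports in `/proc/cpuinfo`. The sketch below assumes that both `avx` and `fma` must be present (the listing above does not say whether one alone is enough); the function name `missing_cpu_flags` is hypothetical.

```python
def missing_cpu_flags(cpuinfo_text, required=("avx", "fma")):
    """Return the required flags that are absent from the first 'flags' line.

    `cpuinfo_text` is the text of /proc/cpuinfo. The default `required`
    tuple is an assumption based on the AVX/FMA requirement.
    """
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            present = set(value.split())
            return [flag for flag in required if flag not in present]
    # No 'flags' line found: report every required flag as missing.
    return list(required)
```

On a Linux machine, pass it the contents of the file, for example `missing_cpu_flags(open("/proc/cpuinfo").read())`; an empty list means both flags were found.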
Website: github.com/mozilla/DeepSpeech
Support: Releases
Developer: Mozilla
License: Mozilla Public License 2.0
DeepSpeech is written in C++ and Python.
Related Software
| Speech Recognition Tools | |
|---|---|
| Whisper | Automatic speech recognition system trained on 680,000 hours of data |
| Flashlight | Fast, flexible machine learning library written entirely in C++. |
| Coqui STT | Deep-learning toolkit for training and deploying speech-to-text models |
| Kaldi | C++ toolkit designed for speech recognition researchers. |
| SpeechBrain | All-in-one conversational AI toolkit based on PyTorch |
| Handy | Offline speech-to-text application |
| ESPnet | End-to-End speech processing toolkit |
| deepspeech.pytorch | Implementation of DeepSpeech2 using Baidu Warp-CTC. |
| Whispering | Transcription application with global speech-to-text functionality |
| Julius | Two-pass large vocabulary continuous speech recognition engine |
| CMUSphinx | Speech recognition system for mobile and server applications |
| Simon | Flexible speech recognition software |
| hyprwhspr | Native speech-to-text designed for Arch / Omarchy |
| ostt | Open Speech-to-Text |
| DeepSpeech | TensorFlow implementation of Baidu's DeepSpeech architecture. |
| OpenSeq2Seq | TensorFlow-based toolkit for sequence-to-sequence models |
| Eesen | End-to-End Speech Recognition |
Read our verdict in the software roundup.

