deepspeech.pytorch is an implementation of DeepSpeech2 using Baidu Warp-CTC. The software creates a network based on the DeepSpeech2 architecture, trained with the CTC (Connectionist Temporal Classification) loss function.
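To make the role of CTC more concrete: a CTC-trained network emits one label per audio frame, including a special blank symbol, and the transcript is recovered by collapsing repeated labels and then dropping blanks. The snippet below is a minimal sketch of that greedy decoding rule in plain Python. It is not taken from the project; the alphabet, the blank index and the function name `ctc_greedy_decode` are assumptions chosen for illustration.

```python
from itertools import groupby

BLANK = 0  # assumed index of the CTC blank symbol (illustrative choice)


def ctc_greedy_decode(frame_labels, alphabet, blank=BLANK):
    """Turn per-frame label indices into text using the CTC collapse rule."""
    # Step 1: merge runs of identical consecutive labels into one label.
    collapsed = [label for label, _ in groupby(frame_labels)]
    # Step 2: drop blanks; a blank between two equal labels keeps them distinct.
    return "".join(alphabet[i] for i in collapsed if i != blank)


# Hypothetical toy alphabet: index 0 is the blank.
alphabet = ["_", "c", "a", "t"]
frames = [0, 1, 1, 0, 2, 2, 2, 0, 0, 3, 3]
print(ctc_greedy_decode(frames, alphabet))  # prints "cat"
```

In practice the per-frame labels would come from taking the most likely output at each time step of the network; a language model or beam search can replace this greedy rule.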
Key Features
- Train DeepSpeech with configurable RNN types and architectures, with multi-GPU support.
- Language model support using kenlm (WIP currently).
- Multiple dataset downloaders, support for AN4, TEDLIUM, Voxforge and Librispeech. Datasets can be merged, and support for custom datasets is included.
- Noise injection (dynamic) for online training to improve noise robustness.
- Audio augmentation, which applies small changes to the tempo and gain when loading audio to increase robustness.
- Easy start/stop capabilities in the event of crash or hard stop during training.
- Visdom/Tensorboard support for visualizing training graphs.
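The noise injection and gain augmentation features listed above can be illustrated with a short sketch. This is not the project's implementation; it is a plain-Python illustration under stated assumptions: audio is a list of float samples, and the helper names (`apply_gain`, `inject_noise`, `rms`), the gain range and the signal-to-noise ratio are all hypothetical values picked for the example. Tempo changes are left out because they need a proper resampling or time-stretching routine.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_gain(samples, gain_db):
    """Scale the waveform by a gain given in decibels."""
    scale = 10 ** (gain_db / 20)
    return [s * scale for s in samples]


def inject_noise(samples, noise, snr_db):
    """Mix noise into the signal at the requested signal-to-noise ratio.

    Assumes the noise clip is not silent (its RMS must be non-zero).
    """
    target_noise_rms = rms(samples) / (10 ** (snr_db / 20))
    scale = target_noise_rms / rms(noise)
    return [s + n * scale for s, n in zip(samples, noise)]


rng = random.Random(0)  # fixed seed so the example is repeatable

# One second of a 440 Hz tone at 16 kHz stands in for a speech clip.
signal = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.uniform(-1.0, 1.0) for _ in range(16000)]

# Draw a fresh random gain each time a clip is loaded (range is illustrative).
gain_db = rng.uniform(-6.0, 8.0)
augmented = inject_noise(apply_gain(signal, gain_db), noise, snr_db=10.0)
```

Because the perturbation is drawn again on every load, the network sees a slightly different version of each utterance in every epoch, which is the idea behind applying augmentation dynamically during training.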
This software has the following dependencies: python-levenshtein, torch, visdom, wget, librosa, and tqdm.
The project also offers a set of pre-trained networks for evaluation usage.
Website: github.com/SeanNaren/deepspeech.pytorch
Support:
Developer: Sean Naren
License: MIT License
deepspeech.pytorch is written in Python.
Related Software
| Speech Recognition Tools | |
|---|---|
| Whisper | Automatic speech recognition system trained on 680,000 hours of data |
| Flashlight | Fast, flexible machine learning library written entirely in C++. |
| Coqui STT | Deep-learning toolkit for training and deploying speech-to-text models |
| Kaldi | C++ toolkit designed for speech recognition researchers. |
| SpeechBrain | All-in-one conversational AI toolkit based on PyTorch |
| Handy | Offline speech-to-text application |
| ESPnet | End-to-End speech processing toolkit |
| deepspeech.pytorch | Implementation of DeepSpeech2 using Baidu Warp-CTC. |
| Whispering | Transcription application with global speech-to-text functionality |
| Julius | Two-pass large vocabulary continuous speech recognition engine |
| CMUSphinx | Speech recognition system for mobile and server applications |
| Simon | Flexible speech recognition software |
| hyprwhspr | Native speech-to-text designed for Arch / Omarchy |
| ostt | Open Speech-to-Text |
| DeepSpeech | TensorFlow implementation of Baidu's DeepSpeech architecture. |
| OpenSeq2Seq | TensorFlow-based toolkit for sequence-to-sequence models |
| Eesen | End-to-End Speech Recognition |