deepspeech.pytorch is an implementation of DeepSpeech2 using Baidu Warp-CTC. The software creates a network based on the DeepSpeech2 architecture, trained with the CTC (Connectionist Temporal Classification) loss function.
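To illustrate what CTC output looks like, here is a minimal sketch of the collapsing rule applied to a per-frame label sequence: runs of repeated labels are merged, then blank symbols are dropped. The function name `ctc_collapse` and the `_` blank symbol are assumptions chosen for this example and are not taken from deepspeech.pytorch.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this example only


def ctc_collapse(frame_labels, blank=BLANK):
    """Collapse per-frame labels: merge runs of repeats, then drop blanks."""
    # groupby yields one key per run of identical consecutive labels
    merged = [label for label, _ in groupby(frame_labels)]
    return "".join(label for label in merged if label != blank)


# The blank between the two "ll" runs is what keeps the double letter.
print(ctc_collapse("hh_e_ll_ll_oo"))  # prints "hello"
```

Because blanks separate repeated characters, the network can emit a variable number of frames per character without an explicit frame-level alignment in the training data.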
- Train DeepSpeech, configurable RNN types and architectures with multi-GPU support.
- Language model support using kenlm (currently a work in progress).
- Multiple dataset downloaders, with support for AN4, TED-LIUM, VoxForge and LibriSpeech. Datasets can be merged, and support for custom datasets is included.
- Noise injection (dynamic) for online training to improve noise robustness.
- Audio augmentation, which applies small changes to the tempo and gain when loading audio to increase robustness.
- Easy start/stop capabilities in the event of a crash or hard stop during training.
- Visdom/TensorBoard support for visualizing training graphs.
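The noise injection and gain augmentation features listed above can be sketched in plain Python. This is an illustration under stated assumptions, not the project's own code: the function names, the gain range and the target signal-to-noise ratio are all hypothetical, audio is represented as a list of floats, and tempo perturbation is left out because it requires resampling.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a non-empty list of samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def apply_gain(samples, gain_db):
    """Scale the samples by a gain given in decibels."""
    scale = 10 ** (gain_db / 20)
    return [s * scale for s in samples]


def inject_noise(samples, noise, snr_db):
    """Mix noise into samples so the result has the requested SNR in dB.

    Assumes both lists have the same length.
    """
    noise_rms = rms(noise)
    if noise_rms == 0:
        return list(samples)  # silent noise: nothing to mix in
    target_noise_rms = rms(samples) / (10 ** (snr_db / 20))
    scale = target_noise_rms / noise_rms
    return [s + n * scale for s, n in zip(samples, noise)]


rng = random.Random(0)  # fixed seed so the example is repeatable
# One second of a 440 Hz tone at an assumed 16 kHz sample rate
clean = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
noise = [rng.gauss(0.0, 1.0) for _ in clean]

# Hypothetical augmentation settings: gain in [-6, 8] dB, noise at 10 dB SNR
augmented = inject_noise(apply_gain(clean, rng.uniform(-6, 8)), noise, 10)
```

Drawing new random settings each time an utterance is loaded means the network rarely sees exactly the same waveform twice, which is the idea behind applying augmentation dynamically during training.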
This software has the following dependencies: python-levenshtein, torch, visdom, wget, librosa, and tqdm.
The project also offers a set of pre-trained networks for evaluation.
Developer: Sean Naren
License: MIT License