deepspeech.pytorch – Implementation of DeepSpeech2 using Baidu Warp-CTC

deepspeech.pytorch is a PyTorch implementation of DeepSpeech2 that uses Baidu's Warp-CTC. The software builds a network based on the DeepSpeech2 architecture and trains it with the CTC (Connectionist Temporal Classification) loss function.
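To give a feel for what the CTC loss computes, here is a minimal pure-Python sketch of the standard CTC forward algorithm. It is not the Warp-CTC implementation or the project's own code; the function name `ctc_neg_log_likelihood` and its parameters are hypothetical, and it works in plain probability space, which is only suitable for tiny inputs (practical implementations work in log space for numerical stability).

```python
import math


def ctc_neg_log_likelihood(probs, target, blank=0):
    """Return -log P(target | probs) under CTC.

    probs[t][k] is the probability of symbol k at frame t (each row sums to 1).
    target is a list of label indices that does not contain the blank index.
    """
    # Interleave blanks around the labels: [a, b] -> [blank, a, blank, b, blank]
    ext = [blank]
    for label in target:
        ext += [label, blank]
    n_states = len(ext)

    # At the first frame a path may start in the leading blank or the first label.
    alpha = [0.0] * n_states
    alpha[0] = probs[0][ext[0]]
    if n_states > 1:
        alpha[1] = probs[0][ext[1]]

    for t in range(1, len(probs)):
        prev = alpha
        alpha = [0.0] * n_states
        for s in range(n_states):
            total = prev[s]                      # stay in the same state
            if s >= 1:
                total += prev[s - 1]             # advance by one state
            # Skipping a blank is allowed only between two different labels.
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                total += prev[s - 2]
            alpha[s] = total * probs[t][ext[s]]

    # A valid path ends in the last label or the trailing blank.
    likelihood = alpha[-1]
    if n_states > 1:
        likelihood += alpha[-2]
    return -math.log(likelihood) if likelihood > 0.0 else math.inf
```

With two frames, two symbols (blank and one label) and uniform probabilities, three of the four possible alignments collapse to the single-label target, so the likelihood is 0.75 and the function returns -log(0.75).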

Features include:

  • Training of DeepSpeech models with configurable RNN types and architectures, with multi-GPU support.
  • Language model support using kenlm (currently a work in progress).
  • Multiple dataset downloaders, with support for AN4, TEDLIUM, Voxforge and Librispeech. Datasets can be merged, and custom datasets are supported.
  • Dynamic noise injection during online training to improve noise robustness.
  • Audio augmentation, which applies small changes to tempo and gain when audio is loaded, to improve noise robustness.
  • Easy start/stop capabilities in the event of a crash or hard stop during training.
  • Visdom/Tensorboard support for visualizing training graphs.
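The gain and noise-injection ideas in the list above can be sketched roughly as follows. This is an illustrative assumption, not the project's actual code: the function names, the decibel ranges and the requirement that the noise clip has the same length as the signal are all hypothetical, and tempo perturbation is left out because it needs a resampling or time-stretching library.

```python
import math
import random


def apply_gain(samples, gain_db):
    """Scale the amplitude by a gain given in decibels."""
    factor = 10.0 ** (gain_db / 20.0)
    return [s * factor for s in samples]


def mean_power(samples):
    """Average squared amplitude of a clip."""
    return sum(s * s for s in samples) / len(samples)


def inject_noise(samples, noise, snr_db):
    """Mix noise into the signal at the requested signal-to-noise ratio.

    Assumes `noise` has the same length as `samples`.
    """
    noise_power = mean_power(noise)
    if noise_power == 0.0:
        return list(samples)  # silent noise clip: nothing to mix in
    target_noise_power = mean_power(samples) / (10.0 ** (snr_db / 10.0))
    scale = math.sqrt(target_noise_power / noise_power)
    return [s + scale * n for s, n in zip(samples, noise)]


def augment(samples, noise, rng,
            gain_range_db=(-6.0, 8.0),    # hypothetical range
            snr_range_db=(10.0, 30.0)):   # hypothetical range
    """Draw a fresh random gain and SNR on every call."""
    gain_db = rng.uniform(*gain_range_db)
    snr_db = rng.uniform(*snr_range_db)
    return inject_noise(apply_gain(samples, gain_db), noise, snr_db)
```

Because the parameters are drawn again each time a clip is loaded, the network sees a slightly different version of the same utterance in every epoch, which is the sense in which the augmentation is dynamic.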

This software has the following dependencies: python-levenshtein, torch, visdom, wget, librosa, and tqdm.

The project also offers a set of pre-trained networks for evaluation.

Developer: Sean Naren
License: MIT License

deepspeech.pytorch is written in Python.
