Machine Learning in Linux: Coqui STT – deep-learning toolkit for training and deploying speech-to-text models

We used to recommend DeepSpeech as the finest open source speech-to-text engine. Its developers released models capable of transcribing lectures, conversations, television and radio shows, and other live streams with “human accuracy”. Sadly, DeepSpeech is no longer maintained. Fortunately, there are other solutions.

Coqui STT (STT) is a deep-learning toolkit for training and deploying speech-to-text models.

This is free and open source software.


To avoid polluting your system, we recommend installing STT with Anaconda, a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. Alternatively, use Miniconda, a minimal installer for conda.

Download the Anaconda installer using wget:

$ wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
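Before executing an installer downloaded from the internet, it is worth checking its integrity. A minimal sketch: print the file's SHA-256 digest and compare it by eye against the value published on the Anaconda archive page (the expected hash is not reproduced here).

```shell
# Print the SHA-256 digest of the downloaded installer and compare it
# manually with the value listed for this release at
# https://repo.anaconda.com/archive/
sha256sum Anaconda3-2022.10-Linux-x86_64.sh
```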

Run the shell script:

$ bash Anaconda3-2022.10-Linux-x86_64.sh

You’ll be asked to accept Anaconda’s license and whether to initialize Anaconda3 by running conda init. For changes to take effect, close and re-open your current shell.
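Instead of reopening the terminal, you can reload the startup file that conda init modified. This sketch assumes bash, the default shell on most distributions; zsh users would source ~/.zshrc instead:

```shell
# Re-read the startup file that `conda init` appended to (assumes bash;
# adjust the path for other shells)
source ~/.bashrc

# conda should now be on the PATH; this prints its version string
conda --version
```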

Create a conda environment and activate it:

$ conda create --name coqui-stt
$ conda activate coqui-stt

Install the Coqui STT model manager with pip:

$ pip install coqui-stt-model-manager

This command installs: Flask 2.0.1, Flask-Cors 3.0.10, Flask-SocketIO 4.3.2, Jinja2 3.0.1, Werkzeug 2.0.3, coqpit 0.0.9, coqui-stt-model-manager 0.0.21, idna 2.10, itsdangerous 2.1.2, python-engineio 3.14.2, python-socketio 4.6.1, requests 2.25.1, stt 1.4.0, and webrtcvad 2.0.10.
