Last Updated on March 6, 2023
We used to recommend DeepSpeech as the finest open-source Speech-To-Text engine. Its released models were capable of transcribing lectures, conversations, television and radio shows, and other live streams with “human accuracy”. Sadly, DeepSpeech is no longer maintained. Fortunately, there are other solutions.
Coqui STT (STT) is a deep-learning toolkit for training and deploying speech-to-text models.
This is free and open source software.
To avoid polluting your system, we recommend installing STT with Anaconda, a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. Alternatively, use Miniconda (a minimal installer for conda).
Download the Anaconda installer using wget.
$ wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
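Before running the installer, it can be worth checking the download against the SHA-256 hash that Anaconda publishes for each release. The sketch below is one way to do that, assuming sha256sum is available on your system. The helper name verify_sha256 is hypothetical, and the expected hash must be copied from Anaconda's own listing; none is given here.

```shell
# verify_sha256 FILE EXPECTED_HASH
# Hypothetical helper: succeeds only if FILE's SHA-256 digest equals EXPECTED_HASH.
verify_sha256() {
  # sha256sum prints "<digest>  <filename>"; keep only the digest.
  actual=$(sha256sum "$1" | cut -d ' ' -f 1)
  [ "$actual" = "$2" ]
}
```

Call it as verify_sha256 Anaconda3-2022.10-Linux-x86_64.sh followed by the published hash. A non-zero exit status means the file does not match and should be downloaded again.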
Run the shell script:
$ bash Anaconda3-2022.10-Linux-x86_64.sh
You’ll be asked to accept Anaconda’s license, and then whether to initialize Anaconda3 by running conda init. For the changes to take effect, close and re-open your current shell.
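Create a conda environment with its own Python interpreter (so that pip installs into the environment rather than into the base installation), and activate it.
$ conda create --name coqui-stt python=3.9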
$ conda activate coqui-stt
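To confirm that the environment is active before installing anything into it, you can inspect the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment. This is a minimal sketch; the helper name env_is_active is hypothetical.

```shell
# Hypothetical helper: succeeds when the named conda environment is the active one.
# conda sets CONDA_DEFAULT_ENV to the active environment's name.
env_is_active() {
  [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active coqui-stt; then
  echo "coqui-stt is active"
else
  echo "coqui-stt is not active; run: conda activate coqui-stt"
fi
```

The active environment's name is normally also shown in parentheses at the start of your shell prompt.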
Install the Coqui STT model manager with pip:
$ pip install coqui-stt-model-manager
This command installs the following packages: Flask-2.0.1, Flask-Cors-3.0.10, Flask-SocketIO-4.3.2, Jinja2-3.0.1, Werkzeug-2.0.3, coqpit-0.0.9, coqui-stt-model-manager-0.0.21, idna-2.10, itsdangerous-2.1.2, python-engineio-3.14.2, python-socketio-4.6.1, requests-2.25.1, stt-1.4.0, webrtcvad-2.0.10
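A quick way to check that the installation succeeded is to see whether the command-line tools are reachable from the activated environment. The sketch below assumes the packages expose the commands stt and stt-model-manager on your PATH. The helper name report_cmd is hypothetical.

```shell
# Hypothetical helper: report whether a command is reachable on the current PATH.
report_cmd() {
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: found"
  else
    echo "$1: missing"
  fi
}

# Assumed command names; adjust them if your installation differs.
report_cmd stt
report_cmd stt-model-manager
```

If either command is reported as missing, make sure the coqui-stt environment is active and re-run the pip command.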