Machine Learning in Linux: Coqui STT - deep-learning toolkit for training and deploying speech-to-text models

Last Updated on March 6, 2023

We used to recommend DeepSpeech as the finest open-source speech-to-text engine. The project released models capable of transcribing lectures, conversations, television and radio shows, and other live streams with “human accuracy”. Sadly, DeepSpeech is no longer maintained. Fortunately, there are alternatives.

Coqui STT (STT) is a deep-learning toolkit for training and deploying speech-to-text models.

This is free and open source software.

Installation

To avoid polluting your system, we recommend installing STT with Anaconda, a distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. Alternatively, use Miniconda (a minimal installer for conda).

Download the Anaconda installer using wget.

$ wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
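Before running a downloaded installer, it can be worth checking that the file arrived intact. The sketch below assumes the sha256sum utility is available and that you have copied the expected digest from the hash list Anaconda publishes for its installers; the function name verify_sha256 and the EXPECTED variable are our own placeholders, not part of Anaconda.

```shell
# verify_sha256 FILE EXPECTED_HASH
# Returns 0 if the SHA-256 digest of FILE equals EXPECTED_HASH, 1 otherwise.
# (Hypothetical helper written for this article.)
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}

# Usage (EXPECTED is a placeholder; take the real value from Anaconda's published hashes):
# verify_sha256 Anaconda3-2022.10-Linux-x86_64.sh "$EXPECTED"
```

If the function reports a mismatch, download the file again rather than running it.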

Run the shell script:

$ bash Anaconda3-2022.10-Linux-x86_64.sh

You’ll be asked to accept Anaconda’s license and to choose whether to initialize Anaconda3 by running conda init. For the changes to take effect, close and re-open your current shell.

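Create a conda environment with its own Python interpreter, and activate it. Without a python package in the environment, pip would resolve to an installation outside it. Python 3.9 is used here as an example; any version supported by STT should work.

$ conda create --name coqui-stt python=3.9
$ conda activate coqui-stt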
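
Before installing anything with pip, it can help to confirm that pip resolves to the active environment rather than to a copy elsewhere on the system. The following is a minimal sketch: it relies on the CONDA_PREFIX variable that conda activate sets, and the helper name in_active_env is our own invention.

```shell
# in_active_env TOOL
# Returns 0 if TOOL resolves to a path inside the active conda environment.
# CONDA_PREFIX is set by "conda activate" to the environment's directory.
# (Hypothetical helper written for this article.)
in_active_env() {
    [ -n "$CONDA_PREFIX" ] || return 1
    tool_path=$(command -v "$1") || return 1
    case "$tool_path" in
        "$CONDA_PREFIX"/*) return 0 ;;
        *) return 1 ;;
    esac
}

if in_active_env pip; then
    echo "pip belongs to the active environment"
else
    echo "pip is outside the active environment (or no environment is active)"
fi
```

If the check reports that pip is outside the environment, installing Python into the environment (for example with conda install python) gives it a pip of its own.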

Next, install the STT Model Manager with pip:

$ pip install coqui-stt-model-manager

This command installs: Flask-2.0.1 Flask-Cors-3.0.10 Flask-SocketIO-4.3.2 Jinja2-3.0.1 Werkzeug-2.0.3 coqpit-0.0.9 coqui-stt-model-manager-0.0.21 idna-2.10 itsdangerous-2.1.2 python-engineio-3.14.2 python-socketio-4.6.1 requests-2.25.1 stt-1.4.0 webrtcvad-2.0.10
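
To reproduce this exact set of packages on another machine, the versions listed above can be turned into pip requirement pins. This is only a sketch: the helper name to_pin is hypothetical, and the versions are a snapshot of what was installed when this article was written, so newer releases may exist.

```shell
# to_pin NAME-VERSION
# Rewrites "name-1.2.3" as "name==1.2.3", the form pip accepts in a requirements file.
# (Hypothetical helper written for this article.)
to_pin() {
    printf '%s\n' "$1" | sed -E 's/-([0-9][0-9.]*)$/==\1/'
}

# Package versions exactly as reported by the install above.
for pkg in Flask-2.0.1 Flask-Cors-3.0.10 Flask-SocketIO-4.3.2 Jinja2-3.0.1 \
           Werkzeug-2.0.3 coqpit-0.0.9 coqui-stt-model-manager-0.0.21 idna-2.10 \
           itsdangerous-2.1.2 python-engineio-3.14.2 python-socketio-4.6.1 \
           requests-2.25.1 stt-1.4.0 webrtcvad-2.0.10; do
    to_pin "$pkg"
done
```

Redirect the output to a file of your choosing and pass that file to pip install -r inside a fresh environment.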
