Machine Learning in Linux: Whisper - automatic speech recognition system - Page 2 of 2

Last Updated on December 19, 2023

In Operation

whisper is run from the command-line, there’s no fancy graphical user interface included with the project.

The software comes with a range of pre-trained models in varying sizes which is useful to examine the scaling properties of Whisper. Here’s the complete list: ‘tiny.en’, ‘tiny’, ‘base.en’, ‘base’, ‘small.en’, ‘small’, ‘medium.en’, ‘medium’, ‘large-v1’, ‘large-v2’, and ‘large’.

Let’s try the software using the medium model on a MP3 file (FLAC and WAV are also supported). The first time you use a model, the model is downloaded. The medium model is a 461MB download (the large model is 2.87GB download).

If we don’t specify the language with the flag --language the software automatically detects the language using up to the first 30 seconds. We can tell the software the spoken language which avoids the overhead of auto-detection. There’s support for more than 100 languages.

We want a transcription of the audio.mp3 file using the medium model. We’ll tell the software this file is spoken English.

$ whisper audio.mp3 --model medium --language English

The image below shows transcribing in progress.

We verify that this transcription is using our GPU.

GreenWithEnvy — Click image for full size

You can see our GPU has 8GB of VRAM. Note the large model won’t run on this GPU as it requires over 8GB of VRAM.

There are tons of options available which can be viewed with $ whisper --help

Summary

Whisper gets our highest recommendation. From our testing, the accuracy of transcription is excellent approaching human level robustness and accuracy.

There’s support for an impressive number of languages.

Whisper doesn’t come with graphical interface, nor can it record audio. It can only take existing audio files and output text files.

There are some interesting uses of Whisper detailed on the project’s Show and tell page. Examples include a transcriber for WhatsApp voice notes, and a script to burn whisper AI generated transcription / translation subtitles into provided video using ffmpeg.

Whisper has amassed over 25,000 GitHub stars.

Website: openai.com/research/whisper
Support: GitHub Code Repository
Developer: OpenAI
License: MIT License

Whisper is written in Python. Learn Python with our recommended free books and free tutorials.

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.

Pages in this article:
Page 1 – Introduction and Installation
Page 2 – In Operation and Summary

Related Software

Speech Recognition Tools
Whisper	Automatic speech recognition (system trained on 680,000 hours of data
Flashlight	Fast, flexible machine learning library written entirely in C++.
Coqui STT	Deep-learning toolkit for training and deploying speech-to-text models
Kaldi	C++ toolkit designed for speech recognition researchers.
SpeechBrain	All-in-one conversational AI toolkit based on PyTorch
Handy	Offline speech-to-text application
ESPnet	End-to-End speech processing toolkit
deepspeech.pytorch	Implementation of DeepSpeech2 using Baidu Warp-CTC.
Whispering	Transcription application with global speech-to-text functionality
Julius	Two-pass large vocabulary continuous speech recognition engine
CMUSphinx	Speech recognition system for mobile and server applications
Simon	Flexible speech recognition software
hyprwhspr	Native speech-to-text designed for Arch / Omarchy
ostt	Open Speech-to-Text
DeepSpeech	TensorFlow implementation of Baidu's DeepSpeech architecture.
OpenSeq2Seq	TensorFlow-based toolkit for sequence-to-sequence models
Eesen	End-to-End Speech Recognition

Read our verdict in the software roundup.

Explore our comprehensive directory of recommended free and open source software. Our carefully curated collection spans every major software category.

This directory is part of our ongoing series of informative articles for Linux enthusiasts. It features hundreds of detailed reviews, along with open source alternatives to proprietary solutions from major corporations such as Google, Microsoft, Apple, Adobe, IBM, Cisco, Oracle, and Autodesk.

You’ll also find interesting projects to try, hardware coverage, free programming books and tutorials, and much more.

Discovered a useful open source Linux program that we haven’t covered yet? Let us know by completing this form.

Pages: 1 2

11 Comments

Oldest

Newest Most Voted

Inline Feedbacks

View all comments

Tom J.

3 years ago

Whisper is Amazing! I haven’t tried the API for C++ yet but hopefully there’s finally hope for Linux speech recognition!

Gaf the Horse

2 years ago

This is a really useful tutorial on installing and setting up Whisper, so many times tutorials have errors leading to frustration but this one guided me through without a hitch….Many Thanks

Dick D.

No matter the input file size, I got: untyped_storage = torch.UntypedStorage(
torch.cuda.OutOfMemoryError: CUDA out of memory…. Very frustrating.

Author

Steve Emms

Reply to Dick D.

What graphics card are you using? Try using the tiny model to start with.

Mali

It didn’t work. How do I uninstall all those massive packages?

triple5

Reply to Mali

possibly with ubuntu 23.04 it doesn’t work? in my case it’s like this, and it is mentioned that at the top (as we ran into issues using Ubuntu 22.10).
It worked on my laptop with ubuntu 22.04,
on my laptop I used anaconda3 https://repo.anaconda.com/archive/Anaconda3-2023.09-0-Linux-x86_64.sh instead of the above older version

If you want to uninstall, just remove the directory:
$ rm $HOME/anaconda3

Aaron

Whisper is awesome.

Dave

Long message saying it didn’t work. suggested using pipx after installing pipx tried command but it didn’t like the -U flag. Currently says it is installing without that. Seems to go on for ever, no change in size of Anaconda folder. xubuntu23.10

Reply to Dave

I’ve updated the installation section for Ubuntu 23.10. The old way of install via pip -U was deprecated after publishing my review.

Wilbur Ince

This worked great! I was able to get this running on Ubuntu 22.04, but had to remove pipx, and reinstall. I did this:
rm -rf ~/.local/pipx
Then just reinstall pipx, and it worked.

Is there a way to clean up the text output of Whisper?

I figured it out! I am using this command to output to a txt file.
whisper intro.wav --model medium --language English --output_format txt

Documents	Internet	Education
Audio	Video	Graphics
Admin	Desktop	Productivity
Science	Games	Security
Utilities	Coding	Finance
Web Apps	Other	Books

Google	Microsoft	Apple
Adobe	IBM	Autodesk
Oracle	Atlassian	Corel
Cisco	Intuit	SAS
Progress	Salesforce	Citrix