Machine Learning in Linux: Whisper – automatic speech recognition system

In Operation

Whisper is run from the command line; there’s no fancy graphical user interface included with the project.

The software comes with pre-trained models in a range of sizes, which is useful for examining the scaling properties of Whisper. Here’s the complete list: ‘tiny.en’, ‘tiny’, ‘base.en’, ‘base’, ‘small.en’, ‘small’, ‘medium.en’, ‘medium’, ‘large-v1’, ‘large-v2’, and ‘large’. The models suffixed with ‘.en’ are English-only.

Let’s try the software using the medium model on an MP3 file (FLAC and WAV are also supported). The first time you use a model, it is downloaded; the medium model is a 461MB download (the large model is a 2.87GB download).

If we don’t specify the language with the --language flag, the software automatically detects the language from up to the first 30 seconds of audio. Telling the software the spoken language avoids the overhead of auto-detection. There’s support for more than 100 languages.

We want a transcription of the audio.mp3 file using the medium model, and we’ll tell the software the file contains spoken English.

$ whisper audio.mp3 --model medium --language English

The image below shows transcription in progress.

whisper in action

We verify that this transcription is using our GPU.


You can see our GPU has 8GB of VRAM. Note that the large model won’t run on this GPU, as it requires more than 8GB of VRAM.

There are many other options available, which can be viewed with $ whisper --help.
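In scripts, an invocation like the one above can be assembled programmatically before handing it to a shell or subprocess. A minimal sketch — the helper name and default option values are our own illustration, not part of Whisper, though --model, --language and --output_dir are genuine flags:

```python
import shlex

def whisper_cmd(audio, model="medium", language="English", output_dir="."):
    """Assemble a whisper CLI invocation with a few common options."""
    return ["whisper", audio,
            "--model", model,
            "--language", language,
            "--output_dir", output_dir]

# Build (but don't run) the command line
print(shlex.join(whisper_cmd("audio.mp3")))
```

The resulting list can be passed directly to subprocess.run(), which avoids shell-quoting pitfalls with audio file names containing spaces.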


Whisper gets our highest recommendation. From our testing, transcription accuracy is excellent, approaching human-level robustness.

There’s support for an impressive number of languages.

Whisper doesn’t come with a graphical interface, nor can it record audio. It can only take existing audio files and write out text files.
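Among those text outputs, Whisper can write subtitle files in SRT format. As an illustration of what downstream code might do with one — the parser below is our own minimal sketch, not part of Whisper, and the sample captions are invented:

```python
import re

def parse_srt(text):
    """Return a list of (start, end, caption) tuples from SRT subtitle text.

    SRT blocks are separated by blank lines: an index line, a
    'HH:MM:SS,mmm --> HH:MM:SS,mmm' timing line, then caption lines.
    """
    entries = []
    for block in re.split(r"\n\s*\n", text.strip()):
        lines = block.splitlines()
        if len(lines) < 3:
            continue
        start, end = lines[1].split(" --> ")
        entries.append((start.strip(), end.strip(), " ".join(lines[2:])))
    return entries

sample = """1
00:00:00,000 --> 00:00:04,000
Machine learning on Linux

2
00:00:04,000 --> 00:00:08,500
Whisper transcribes speech to text.
"""
print(parse_srt(sample))
```

This makes it straightforward to post-process a transcription, for example to search captions or re-time them.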

There are some interesting uses of Whisper detailed on the project’s Show and tell page. Examples include a transcriber for WhatsApp voice notes, and a script that uses ffmpeg to burn Whisper-generated transcription or translation subtitles into a video.
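That subtitle-burning trick boils down to a single ffmpeg video filter. A sketch that only constructs the command — the file names are illustrative, and actually running it requires ffmpeg plus a Whisper-generated .srt file:

```python
import shlex

def burn_subtitles_cmd(video, srt, output):
    """Build an ffmpeg command that hard-burns an SRT subtitle file
    into a video using ffmpeg's 'subtitles' video filter."""
    return ["ffmpeg", "-i", video, "-vf", f"subtitles={srt}", output]

# Build (but don't run) the command line
print(shlex.join(burn_subtitles_cmd("talk.mp4", "talk.srt", "talk-subbed.mp4")))
```

As with the earlier sketch, the list form can be passed to subprocess.run() once ffmpeg is available.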

Whisper has amassed over 25,000 GitHub stars.

Support: GitHub Code Repository
Developer: OpenAI
License: MIT License

Whisper is written in Python. Learn Python with our recommended free books and free tutorials.

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.

Pages in this article:
Page 1 – Introduction and Installation
Page 2 – In Operation and Summary

One comment

  1. Whisper is Amazing! I haven’t tried the API for C++ yet but hopefully there’s finally hope for Linux speech recognition!
