Machine Learning in Linux: Whisper – automatic speech recognition system

Whisper is an automatic speech recognition (ASR) system from OpenAI, trained on 680,000 hours of multilingual and multitask supervised data collected from the web. It’s a deep learning model built on PyTorch.

The software offers transcription in multiple languages, as well as translation from those languages into English.

This is free and open source software.

Installation

We tested Whisper with Ubuntu 22.04 LTS (as we ran into issues using Ubuntu 22.10).

To avoid polluting your system, we recommend installing Whisper inside a conda environment. You can get conda from Anaconda, or from Miniconda if you only want conda itself without the bundled packages.

Download the Anaconda installer using wget.

$ wget https://repo.anaconda.com/archive/Anaconda3-2022.10-Linux-x86_64.sh
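Before running an installer fetched from the web, it’s worth checking its integrity. The sketch below is our addition, not part of Anaconda’s tooling: verify_sha256 is a hypothetical helper name, and the expected digest has to be copied from the hashes Anaconda publishes for its installers. We deliberately don’t reproduce a digest here.

```shell
# Compare a file's SHA-256 digest with an expected value.
# Usage: verify_sha256 FILE EXPECTED_HEX
# (verify_sha256 is a hypothetical helper, not an Anaconda command.)
verify_sha256() {
    file=$1
    expected=$2
    # sha256sum prints "<digest>  <filename>"; keep only the digest.
    actual=$(sha256sum "$file" | awk '{print $1}')
    if [ "$actual" = "$expected" ]; then
        echo "OK: $file"
    else
        echo "MISMATCH: $file" >&2
        return 1
    fi
}
```

With the helper defined, usage would look like this, where <expected-digest> stands for the published value:

$ verify_sha256 Anaconda3-2022.10-Linux-x86_64.sh <expected-digest>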

Run the shell script:

$ bash Anaconda3-2022.10-Linux-x86_64.sh

The installer asks you to accept Anaconda’s license, and then whether to initialize Anaconda3 by running conda init. For the changes to take effect, close and re-open your shell.

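Create a conda environment with its own Python interpreter, and activate it. Without a Python version, conda creates an empty environment, and pip in the next step may then resolve to a copy outside the environment. Python 3.10 is our choice here; other recent versions should also work.

$ conda create --name whisper python=3.10
$ conda activate whisper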
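To confirm that the environment is active and that pip will install into it, a quick check along these lines can help. It’s a minimal sketch that relies on the CONDA_PREFIX and CONDA_DEFAULT_ENV variables conda sets on activation; in_prefix is a hypothetical helper name of our own.

```shell
# Succeed if an executable path lies inside a given environment prefix.
# Usage: in_prefix /path/to/executable /path/to/prefix
# (in_prefix is a hypothetical helper, not a conda command.)
in_prefix() {
    case "$1" in
        "$2"/*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Only run the check when a conda environment is active.
if [ -n "${CONDA_PREFIX:-}" ]; then
    pip_path=$(command -v pip || true)
    if in_prefix "$pip_path" "$CONDA_PREFIX"; then
        echo "pip belongs to ${CONDA_DEFAULT_ENV:-unknown}: $pip_path"
    else
        echo "pip is outside the active environment: ${pip_path:-not found}" >&2
    fi
fi
```

If the check reports that pip is outside the environment, packages installed with it would land elsewhere on the system, which defeats the point of using conda.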

Now we’re ready to install Whisper using pip, a package manager for Python.

$ pip install -U openai-whisper

The command finishes with output similar to this (version numbers will differ over time).

Successfully built openai-whisper
Installing collected packages: tokenizers, huggingface-hub, transformers, openai-whisper
Successfully installed huggingface-hub-0.12.1 openai-whisper-20230124 tokenizers-0.13.2 transformers-4.26.1
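If the output has scrolled away, the installed version can be checked afterwards. The sketch below reads the Version field from pip show; parse_version is a hypothetical helper name, and the check only runs when pip is found on the PATH.

```shell
# Extract the "Version:" field from `pip show` output read on stdin.
# Exits non-zero when no such field is present.
# (parse_version is a hypothetical helper, not part of pip.)
parse_version() {
    awk -F': ' '$1 == "Version" { print $2; found = 1 } END { exit !found }'
}

# Run the real check only when pip is available on this machine.
if command -v pip >/dev/null 2>&1; then
    pip show openai-whisper 2>/dev/null | parse_version \
        || echo "openai-whisper is not installed in this environment" >&2
fi
```

Running whisper --help is another quick way to confirm that the command-line entry point is on the PATH.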

