Audio generation

Machine Learning in Linux: Audiocraft – audio processing and generation with deep learning

Our Machine Learning in Linux series focuses on apps that make it easy to experiment with machine learning.

We recently explored Bark, a transformer-based text-to-audio model. From text, the software can generate realistic multilingual speech as well as other audio, including music, background noise, and simple sound effects.

But what if, instead of speech accompanied by some music, you want to generate music extracts themselves? Audiocraft might be your cup of tea. It’s Python-based software that provides the code and models for MusicGen, a simple and controllable model for music generation.

The models generate short music extracts from a text description you provide, producing up to 30 seconds of audio in one pass.

MusicGen is a single-stage auto-regressive Transformer model trained over a 32 kHz EnCodec tokenizer with 4 codebooks sampled at 50 Hz.
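Those figures imply a concrete token budget. As a back-of-envelope sketch, using only the numbers quoted above (50 Hz frame rate, 4 codebooks, 30-second clip), the sequence length for a full-length extract works out as:

```python
# Back-of-envelope token budget for MusicGen's EnCodec representation,
# using the figures quoted above: 50 Hz frame rate, 4 codebooks, 30 s clip.
frame_rate_hz = 50      # EnCodec frames per second
num_codebooks = 4       # parallel codebooks per frame
clip_seconds = 30       # maximum length in one pass

frames = frame_rate_hz * clip_seconds      # autoregressive steps: 1500
tokens = frames * num_codebooks            # discrete tokens overall: 6000

print(frames, tokens)  # → 1500 6000
```

So a 30-second extract is only 1,500 autoregressive steps over 6,000 tokens, which is what makes single-pass generation tractable.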


We tested Audiocraft with the Arch distro.

To avoid polluting our system, we’ll use conda to install Audiocraft. A conda environment is a directory that contains a specific collection of conda packages that you have installed.

If your system doesn’t have conda, install either Anaconda or Miniconda. The latter is a minimal installer for conda: a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages such as pip and zlib.

There’s a package for Miniconda in the AUR which we’ll install with the command:

$ yay -S miniconda3

There are Miniconda packages available for many other distros.

If your shell is Bash or a Bourne variant, enable conda for the current user with the command:

$ echo "[ -f /opt/miniconda3/etc/profile.d/conda.sh ] && source /opt/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc

Create our conda environment with the command:

$ conda create --name audiocraft

Activate that environment with the command:

$ conda activate audiocraft

Clone the project’s GitHub repository:

$ git clone https://github.com/facebookresearch/audiocraft

Change into the newly created directory:

$ cd audiocraft

In our conda environment, we can now install the software.

$ pip install 'torch>=2.0'

$ pip install -U audiocraft
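Once the install finishes, generation can be driven from Python. The sketch below follows the MusicGen example in the project’s README; the prompt, helper name, and output stub are our own, and it assumes the `facebook/musicgen-small` checkpoint, the smallest of the released models:

```python
def generate_clip(prompt, seconds=8, out_stub="clip"):
    """Generate a short music extract from a text prompt.

    Sketch based on the project's README example; assumes audiocraft
    (and its torch dependency) are installed as above. Writes
    out_stub + '.wav' and returns the output stub.
    """
    from audiocraft.models import MusicGen
    from audiocraft.data.audio import audio_write

    model = MusicGen.get_pretrained("facebook/musicgen-small")
    model.set_generation_params(duration=seconds)   # seconds of audio to generate
    wav = model.generate([prompt])                  # one waveform per description
    audio_write(out_stub, wav[0].cpu(), model.sample_rate, strategy="loudness")
    return out_stub
```

Calling `generate_clip("lo-fi chill beat with mellow piano")` downloads the checkpoint on first use, so expect a delay; on CPU, generation itself is also slow.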

We’ll also install Gradio in our conda environment. Gradio offers a really quick way to demo machine learning models with a friendly web interface.

$ pip install gradio
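Audiocraft ships its own Gradio demo, but a minimal interface only takes a few lines. The sketch below is our own illustration, not the project’s bundled demo; it assumes you pass in a helper that takes a text prompt and returns the path of a generated audio file:

```python
def build_demo(generate):
    """Wrap a text-to-audio function in a minimal Gradio web UI.

    `generate` is any callable taking a text prompt and returning the
    path of an audio file; this is an illustrative sketch, not the
    demo bundled with Audiocraft.
    """
    import gradio as gr

    return gr.Interface(
        fn=generate,
        inputs=gr.Textbox(label="Describe the music"),
        outputs=gr.Audio(label="Generated extract"),
        title="MusicGen demo",
    )

# build_demo(my_generate).launch() serves the UI, by default at http://127.0.0.1:7860
```

The import lives inside the function so the module loads even before Gradio is installed; `.launch()` on the returned interface starts the local web server.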


1 Comment
Thanks for the article. This is one of the best deep learning tools I’ve tried although it’s dog slow on my AMD CPU.

It’s like Stable Diffusion but for audio.

What irks me is the large RAM requirements. Fortunately I’ll be getting a GeForce RTX 4060 Ti so this will have enough RAM.