
Machine Learning in Linux: Bark – Text-Prompted Generative Audio

Our Machine Learning in Linux series focuses on apps that make it easy to experiment with machine learning.

One of the standout machine learning apps is Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. We’ve explored quite a few hugely impressive web frontends such as Easy Diffusion, InvokeAI, and Stable Diffusion web UI.

Extending this theme but from an audio perspective, step forward Bark. This is a transformer-based text-to-audio model. The software can generate realistic, multilingual speech as well as other audio from text – including music, background noise and simple sound effects. The model also generates nonverbal communications like laughing, sighing, crying, and hesitations.

Bark follows a GPT style architecture. It is not a conventional Text-to-Speech model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script.
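To give a flavour of what that means in practice, here's a minimal Python sketch based on the functions the project documents (preload_models, generate_audio and SAMPLE_RATE). Treat it as illustrative rather than a full recipe, and note it assumes Bark is already installed – installation is covered below.

from scipy.io.wavfile import write as write_wav
from bark import SAMPLE_RATE, generate_audio, preload_models

# Download and cache the model checkpoints (slow on the first run)
preload_models()

# Turn a short text prompt, including a nonverbal cue, into an audio array
text_prompt = "Hello, I was generated entirely from text. [laughs]"
audio_array = generate_audio(text_prompt)

# Write the result to disk at Bark's native sample rate
write_wav("bark_hello.wav", SAMPLE_RATE, audio_array)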

Installation

We tested Bark with a fresh installation of the Arch distro.

To avoid polluting our system, we’ll use conda to install Bark. A conda environment is a directory that contains a specific collection of conda packages that you have installed.

If your system doesn’t have conda, install either Anaconda or Miniconda. The latter is a minimal installer for conda: a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a handful of other useful packages such as pip and zlib.

There’s a package for Miniconda in the AUR which we’ll install with the command:

$ yay -S miniconda3

If your shell is Bash or a Bourne variant, enable conda for the current user with the command:

$ echo "[ -f /opt/miniconda3/etc/profile.d/conda.sh ] && source /opt/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc

Create our conda environment with the command:

$ conda create --name bark

Activate that environment with the command:

$ conda activate bark

Clone the project’s GitHub repository:

$ git clone https://github.com/suno-ai/bark

Change into the newly created directory, and install with pip (remember we’re installing to our conda environment, without polluting our system).

$ cd bark && pip install .

There are a few extra steps you might need to take. The full version of Bark requires around 12GB of VRAM. If your GPU has less than that (our test machine hosts a GeForce RTX 3060 Ti card with only 8GB of VRAM), you’ll get errors such as this:

Oops, an error occurred: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.76 GiB total capacity; 6.29 GiB already allocated; 62.19 MiB free; 6.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC

Instead, we need to use smaller versions of the models. To tell Bark to use the smaller models, set the environment variable SUNO_USE_SMALL_MODELS=True.

$ export SUNO_USE_SMALL_MODELS=True
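The flag can also be set from inside Python rather than the shell. The sketch below only assumes the variable is read when Bark loads its models, so set it before importing the package.

import os

# Must be set before bark is imported, or the full-size models may be selected
os.environ["SUNO_USE_SMALL_MODELS"] = "True"

from bark import preload_models
preload_models()  # loads the smaller checkpoints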

We’ll also install IPython, an interactive command-line terminal for Python.

$ pip install ipython # Again, only use this command in the conda environment.
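Before downloading several gigabytes of model weights, it’s worth a quick sanity check from IPython that PyTorch (pulled in as a Bark dependency) can actually see your GPU. The session below is a sketch; the reported device name will match whatever card you have.

$ ipython
In [1]: import torch
In [2]: torch.cuda.is_available()
Out[2]: True
In [3]: torch.cuda.get_device_name(0)
Out[3]: 'NVIDIA GeForce RTX 3060 Ti'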

Next page: Page 2 – In Operation and Summary

Pages in this article:
Page 1 – Introduction and Installation
Page 2 – In Operation and Summary
Page 3 – Example Python File

6 Comments
Carlos
10 months ago

Never heard of Bark before. It looks kinda interesting. I’ll give it a whirl under Ubuntu.

James
10 months ago
Reply to Carlos

I’m using Debian so I should be able to get it working.

Neil
10 months ago
Reply to James

do what?

Mel
10 months ago

Can you run Bark without a dedicated graphics card? I’ve got a 5th generation Intel machine with 8GB of RAM.