Machine Learning in Linux: Bark - Text-Prompted Generative Audio

Our Machine Learning in Linux series focuses on apps that make it easy to experiment with machine learning.

One of the standout machine learning apps is Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. We’ve explored quite a few hugely impressive web frontends such as Easy Diffusion, InvokeAI, and Stable Diffusion web UI.

Extending this theme to audio, step forward Bark, a transformer-based text-to-audio model. The software can generate realistic multilingual speech from text, as well as other audio including music, background noise and simple sound effects. The model also generates nonverbal communication such as laughing, sighing, crying, and hesitations.

Bark follows a GPT-style architecture. It is not a conventional text-to-speech model, but a fully generative text-to-audio model that can deviate in unexpected ways from any given script.

Installation

We tested Bark with a fresh installation of the Arch distro.

To avoid polluting our system, we’ll use conda to install Bark. A conda environment is a directory that contains a specific collection of conda packages that you have installed.

If your system doesn’t have conda, install either Anaconda or Miniconda. The latter is a minimal installer for conda: a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a small number of other useful packages such as pip and zlib.

There’s a package for Miniconda in the AUR which we’ll install with the command:

$ yay -S miniconda3

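If your shell is Bash or a Bourne variant, enable conda for the current user with the following command, then open a new terminal (or run source ~/.bashrc) so the change takes effect: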

$ echo "[ -f /opt/miniconda3/etc/profile.d/conda.sh ] && source /opt/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc

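Create our conda environment with the command below. We ask conda to include Python and pip so that the later pip commands install into the environment rather than the system:

$ conda create --name bark python pip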

Activate that environment with the command:

$ conda activate bark
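
To confirm the activation took effect, you can ask Python which environment it is running in. This is a minimal sketch and not part of Bark. It assumes conda has set the CONDA_DEFAULT_ENV and CONDA_PREFIX variables on activation. The helper name in_conda_env and its parameters are our own, chosen for illustration.

```python
import os
import sys


def in_conda_env(name, environ=None, prefix=None):
    """Return True if the running interpreter belongs to the conda env `name`.

    `name`, `environ` and `prefix` are illustrative parameters; by default the
    real process environment and sys.prefix are inspected.
    """
    environ = os.environ if environ is None else environ
    prefix = sys.prefix if prefix is None else prefix
    # conda sets both variables when an environment is activated.
    active_name = environ.get("CONDA_DEFAULT_ENV")
    active_prefix = environ.get("CONDA_PREFIX")
    if active_name != name or not active_prefix:
        return False
    # The interpreter must also live inside the activated environment,
    # otherwise pip would install somewhere else.
    return os.path.normpath(prefix) == os.path.normpath(active_prefix)


if __name__ == "__main__":
    print("Running inside 'bark':", in_conda_env("bark"))
```

If this reports False while the prompt shows the environment as active, the python on your PATH probably belongs to the system rather than to the environment.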

Clone the project’s GitHub repository:

$ git clone https://github.com/suno-ai/bark

Change into the newly created directory, and install with pip (remember we’re installing to our conda environment, without polluting our system).

$ cd bark && pip install .

You may need a couple of extra steps. The full version of Bark requires around 12GB of VRAM. If your GPU has less than that (our test machine has a GeForce RTX 3060 Ti card with only 8GB of VRAM), you’ll get errors such as this:

Oops, an error occurred: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.76 GiB total capacity; 6.29 GiB already allocated; 62.19 MiB free; 6.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC

Instead, we need to use smaller versions of the models. To tell Bark to use the smaller models, set the environment variable SUNO_USE_SMALL_MODELS=True.

$ export SUNO_USE_SMALL_MODELS=True
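
The same choice can be made from a Python script instead of the shell. The sketch below is one hedged way to do it: it compares the card's memory against the roughly 12GB figure quoted above and sets the variable if the card is too small. The function names, and the idea of passing the VRAM size in by hand, are our own illustration. We also assume that Bark reads the variable when it is imported, so the call should come before any import of bark.

```python
import os

FULL_MODEL_VRAM_GB = 12.0  # approximate figure quoted in this article


def needs_small_models(total_vram_gb, required_gb=FULL_MODEL_VRAM_GB):
    """Return True when the GPU has less memory than the full models need."""
    return total_vram_gb < required_gb


def configure_bark(total_vram_gb, environ=None):
    """Set SUNO_USE_SMALL_MODELS when the card is too small.

    Call this before `import bark`, on the assumption that Bark reads
    the variable at import time.
    """
    environ = os.environ if environ is None else environ
    if needs_small_models(total_vram_gb):
        environ["SUNO_USE_SMALL_MODELS"] = "True"
    # Report the resulting setting (None if the variable is unset).
    return environ.get("SUNO_USE_SMALL_MODELS")


if __name__ == "__main__":
    # 8GB matches the RTX 3060 Ti used for this article.
    print(configure_bark(8.0))
```

Setting the variable in the script keeps the choice with the code that depends on it, whereas the export above only lasts for the current shell session.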

We’ll also install IPython, an interactive command-line terminal for Python.

$ pip install ipython # Again, only use this command in the conda environment.
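
Before moving on, it can be worth checking that both packages are visible to the environment's interpreter. The following is a small sketch using only the standard library. The helper name missing_modules is ours, and it assumes the packages are imported as bark and IPython.

```python
import importlib.util


def missing_modules(names):
    """Return the module names from `names` that cannot be found."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules(["bark", "IPython"])
    print("Missing:", missing if missing else "nothing")
```

The check only looks the modules up without importing them, so it does not load any models or touch the GPU.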

Pages in this article:
Page 1 – Introduction and Installation
Page 2 – In Operation and Summary
Page 3 – Example Python File

6 Comments

Carlos

2 years ago

Never heard of Bark before. It looks kinda interesting. I’ll give it a whirl under Ubuntu.

James

Reply to Carlos

I’m using Debian so I should be able to get it working.

Neil

Reply to James

do what?

Mel

Can you run Bark without a dedicated graphics card? I’ve got a 5th generation Intel machine with 8GB of RAM.

Author

Steve Emms

Reply to Mel

We don’t recommend using Bark without a dedicated GPU, but it’s definitely possible to run it without one.

You’ll get a warning

“No GPU being used. Careful, inference might be very slow!”

And that’s definitely the case. A 5 second clip took over a minute to be generated on an Intel i5-10400 machine.

Reply to Steve Emms

Even with an i9-13900K, processing is slow. A dedicated graphics card is a must for these machine learning apps.
