Our Machine Learning in Linux series focuses on apps that make it easy to experiment with machine learning.
One of the standout machine learning apps is Stable Diffusion, a latent text-to-image diffusion model capable of generating photo-realistic images given any text input. We’ve explored quite a few hugely impressive web frontends such as Easy Diffusion, InvokeAI, and Stable Diffusion web UI.
Extending this theme but from an audio perspective, step forward Bark. This is a transformer-based text-to-audio model. The software can generate realistic multilingual speech as well as other audio – including music, background noise, and simple sound effects – from text. The model can also produce nonverbal communication such as laughing, sighing, crying, and hesitations.
Bark follows a GPT-style architecture. It is not a conventional text-to-speech model, but instead a fully generative text-to-audio model capable of deviating in unexpected ways from any given script.
We tested Bark with a fresh installation of the Arch distro.
To avoid polluting our system, we’ll use conda to install Bark. A conda environment is a directory that contains a specific collection of conda packages that you have installed.
If your system doesn’t have conda, install either Anaconda or Miniconda. The latter is a minimal installer for conda: a small, bootstrap version of Anaconda that includes only conda, Python, the packages they depend on, and a handful of other useful packages such as pip and zlib.
There’s a package for Miniconda in the AUR which we’ll install with the command:
$ yay -S miniconda3
If your shell is Bash or a Bourne variant, enable conda for the current user with the command:
$ echo "[ -f /opt/miniconda3/etc/profile.d/conda.sh ] && source /opt/miniconda3/etc/profile.d/conda.sh" >> ~/.bashrc
Create our conda environment with the command:
$ conda create --name bark
Activate that environment with the command:
$ conda activate bark
Clone the project’s GitHub repository:
$ git clone https://github.com/suno-ai/bark
Change into the newly created directory, and install with pip (remember we’re installing to our conda environment, without polluting our system).
$ cd bark && pip install .
There are a few extra steps you might need to take. The full version of Bark requires around 12GB of VRAM. If your GPU has less than 12GB of VRAM (our test machine hosts a GeForce RTX 3060 Ti card with only 8GB of VRAM), you’ll see errors such as this:
Oops, an error occurred: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.76 GiB total capacity; 6.29 GiB already allocated; 62.19 MiB free; 6.30 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC
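The error message itself points at one mitigation: tuning PyTorch’s caching allocator via the PYTORCH_CUDA_ALLOC_CONF environment variable. A minimal sketch, noting that the 512 MiB value below is purely illustrative and worth experimenting with, not a recommendation from the Bark project:

```python
import os

# Must be set before PyTorch initialises CUDA. It caps the size at which
# the caching allocator splits blocks, which can reduce fragmentation.
# The 512 MiB value is illustrative - experiment for your own card.
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
```

This only reduces fragmentation; it cannot conjure up VRAM the card doesn’t have, which is why we opt for the smaller models.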
Instead, we need to use the smaller versions of the models. To tell Bark to use them, set the environment variable SUNO_USE_SMALL_MODELS=True.
$ export SUNO_USE_SMALL_MODELS=True
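The flag appears to be read when Bark’s modules load, so if you would rather work from a Python session than export it in the shell, you can set it programmatically, provided you do so before importing bark:

```python
import os

# Set before importing bark - the flag is read when the module loads
os.environ["SUNO_USE_SMALL_MODELS"] = "True"
```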
We’ll also install IPython, an enhanced interactive shell for Python.
$ pip install ipython # Again, only use this command in the conda environment.
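With everything in place, a first generation can be run from an ipython session. The snippet below follows the pattern in the project’s README (preload_models, generate_audio, and SAMPLE_RATE come from Bark’s own API); expect a sizeable model download on first run:

```python
from bark import SAMPLE_RATE, generate_audio, preload_models
from scipy.io.wavfile import write as write_wav

# Download and cache the models (the small ones, given the flag above)
preload_models()

# Generate speech from text; nonverbal cues such as [laughs] are allowed
text_prompt = "Hello, my name is Suno. And, uh, I like pizza. [laughs]"
audio_array = generate_audio(text_prompt)

# Save the result as a WAV file at Bark's native sample rate (24 kHz)
write_wav("bark_generation.wav", SAMPLE_RATE, audio_array)
```

Play the resulting bark_generation.wav with any audio player to hear the output.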