Machine Learning in Linux: Audiocraft - audio processing and generation with deep learning - Page 3 of 3

Summary

Audiocraft produces remarkable results. It’s not going to turn us into music maestros, but the samples generated are impressive even without a lot of tweaking of the text descriptions.

We were initially disappointed to read that a GPU with at least 16GB of VRAM is necessary to use the melody model. Graphics cards with this amount of VRAM are expensive for the average user. Fortunately, that information doesn’t appear to be correct. Our test machine, which has a mid-range graphics card with 8GB of VRAM, is able to generate 30-second clips with the melody model.

If you don’t have an NVIDIA GPU, how long does it take to generate music extracts with just the CPU? We made a small code change to audiocraft/models/musicgen.py to force the software to use the CPU instead of the dedicated GPU.
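As a rough illustration of the idea, here is a minimal sketch of choosing the compute device in Python. It is not the change we made to musicgen.py. The helper name pick_device and its force_cpu flag are hypothetical. Whether your installed Audiocraft release lets you pass a device to its model loader is an assumption you should check against that release.

```python
import importlib.util


def pick_device(force_cpu: bool = False) -> str:
    """Return "cuda" when PyTorch reports a usable CUDA GPU, otherwise "cpu".

    pick_device and force_cpu are hypothetical names used for illustration.
    """
    if force_cpu:
        return "cpu"
    # PyTorch may not be installed where this sketch is run.
    if importlib.util.find_spec("torch") is None:
        return "cpu"
    import torch

    return "cuda" if torch.cuda.is_available() else "cpu"


# Force CPU generation even when a GPU is present.
device = pick_device(force_cpu=True)
print(f"Generating on: {device}")

# Assumption: the resulting string is then handed to whatever loads the model,
# for example a loader that accepts a device argument.
```

Setting force_cpu=False lets the same helper fall back to the CPU automatically on machines without an NVIDIA GPU.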

Here are the times taken to generate a 10-second music extract using the text description “A cheerful country song with acoustic guitars”. For the melody model we used an MP3 file of Ravel’s Bolero.

Model	CPU (s)	GPU (s)
Melody	178.6	10.9
Small	53.1	5.8
Medium	186.3	11.6
Large	339.5	---
All times in seconds with model pre-loaded. CPU: Intel i5-12400F; GPU: NVIDIA GeForce RTX 3060 Ti. --- means the model could not be run on the GPU (insufficient VRAM).

The table should help give you an indication of how long it will take to generate music extracts on your system.
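To put the figures in perspective, the short sketch below computes the GPU speed-up and the compute time needed per second of audio from the timings in the table. The numbers are copied from our results. The function and variable names are just illustrative.

```python
CLIP_SECONDS = 10  # length of the generated extract

# (CPU seconds, GPU seconds); None means the model could not run on the GPU.
timings = {
    "Melody": (178.6, 10.9),
    "Small": (53.1, 5.8),
    "Medium": (186.3, 11.6),
    "Large": (339.5, None),
}


def speedup(cpu_s, gpu_s):
    """Return how many times faster the GPU run was, or None if there was no GPU run."""
    if gpu_s is None:
        return None
    return cpu_s / gpu_s


for model, (cpu_s, gpu_s) in timings.items():
    ratio = speedup(cpu_s, gpu_s)
    ratio_text = f"{ratio:.1f}x" if ratio is not None else "n/a"
    cpu_per_audio_s = cpu_s / CLIP_SECONDS
    print(f"{model:<7} GPU speed-up: {ratio_text:>6}  "
          f"CPU time per second of audio: {cpu_per_audio_s:.1f}s")
```

On our machine the GPU is roughly 16 times faster for the melody and medium models, and about 9 times faster for the small model.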

Using the GPU offers a huge speed advantage over the CPU. No surprise there. But if you’re happy waiting anywhere from about a minute (small model) to nearly six minutes (large model) for a 10-second clip, you can use the software without a dedicated graphics card. Or you can use Google Colab.

With our test machine, we can only use the large model with the CPU as the GPU has insufficient VRAM, borking out with the error message torch.cuda.OutOfMemoryError: CUDA out of memory.

Website: github.com/facebookresearch/audiocraft
Developer: Meta Platforms, Inc. and affiliates
License: MIT License

Audiocraft is written in Python. Learn Python with our recommended free books and free tutorials.

For other useful open source apps that use machine learning/deep learning, we’ve compiled this roundup.

Pages in this article:
Page 1 – Introduction and Installation
Page 2 – In Operation
Page 3 – Summary


1 Comment

Alan

2 years ago

Thanks for the article. This is one of the best deep learning tools I’ve tried although it’s dog slow on my AMD CPU.

It’s like Stable Diffusion but for audio.

What irks me is the large RAM requirements. Fortunately I’ll be getting a GeForce RTX 4060 Ti so this will have enough RAM.
