Machine Learning in Linux: Demucs - music source separation - Page 2 of 3

Last Updated on March 6, 2023

In Operation

demucs is command-line software.

Let’s say we want to process a FLAC file into stems. Here’s an example command:

$ demucs test-music-file.flac

As we haven’t specified a folder to put the extracted tracks into (-o folder), nor a model (-n NAME), demucs uses the default Hybrid Transformer based source separation model (htdemucs, a single model) and creates a folder separated/htdemucs/test-music-file/ under the current working directory (~/separated/htdemucs/test-music-file/ if the command is run from the home directory). By default, this model splits the FLAC file into four stems: vocals, drums, bass, and other (everything else).
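To make the output layout concrete, here is a minimal Python sketch that lists where we would expect the four stems to land. The helper name expected_stem_paths is our own invention and not part of demucs, and the sketch assumes the default WAV output and the folder/model/track-name layout described above.

```python
from pathlib import Path

# The four stems produced by the default htdemucs model.
DEFAULT_STEMS = ("vocals", "drums", "bass", "other")


def expected_stem_paths(track, model="htdemucs", out_dir="separated",
                        stems=DEFAULT_STEMS):
    """Hypothetical helper: guess the output paths for one input track.

    Assumes the layout <out_dir>/<model>/<track name without extension>/
    and one WAV file per stem.
    """
    track_name = Path(track).stem  # "test-music-file.flac" -> "test-music-file"
    folder = Path(out_dir) / model / track_name
    return [folder / f"{stem}.wav" for stem in stems]


for path in expected_stem_paths("test-music-file.flac"):
    print(path)
```

Passing a different out_dir or model mirrors what the -o and -n flags do on the command line.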

By default, demucs uses CUDA to process the audio file on the GPU when one is available. If we want to use the CPU instead, we pass the -d flag.

$ demucs -d cpu test-music-file.flac
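If we are scripting demucs, we may want to choose the device ourselves. The sketch below is one way to do that, assuming PyTorch may or may not be importable on the machine. The function name pick_device is hypothetical; only torch.cuda.is_available() and the -d flag come from the real tools.

```python
def pick_device():
    """Return "cuda" when PyTorch reports a usable GPU, otherwise "cpu"."""
    try:
        import torch  # demucs depends on PyTorch, but we do not assume it here
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


# Build the argument list for the command shown above, with the device filled in.
command = ["demucs", "-d", pick_device(), "test-music-file.flac"]
print(command)
```

On a machine without an NVIDIA GPU this simply falls back to the CPU, at the cost of longer processing times.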

To give a flavour of the time taken to process a local music file, we took a FLAC file with a duration of 6 minutes 24 seconds. With a 12th generation Intel CPU (i5-12400F) machine sporting a midrange graphics card (NVIDIA GeForce RTX 3060 Ti), the software took 15.6 seconds to process the file. Using only the CPU, processing the song took 187.8 seconds, about 12 times longer. It’s possible to speed up the separation process by increasing the segment length (--segment), but this requires more memory.
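For readers who like to see the arithmetic, this short Python sketch works out the GPU speedup and how many times faster than real time each run was. It only uses the figures quoted above; the variable names are ours.

```python
# Figures from our test run.
duration_s = 6 * 60 + 24   # track length: 6 minutes 24 seconds
gpu_s = 15.6               # processing time with the RTX 3060 Ti
cpu_s = 187.8              # processing time with the CPU only

speedup = cpu_s / gpu_s            # how much faster the GPU run was
gpu_realtime = duration_s / gpu_s  # seconds of audio handled per second (GPU)
cpu_realtime = duration_s / cpu_s  # seconds of audio handled per second (CPU)

print(f"GPU vs CPU speedup: {speedup:.1f}x")
print(f"GPU: {gpu_realtime:.1f}x faster than real time")
print(f"CPU: {cpu_realtime:.1f}x faster than real time")
```

In other words, even the CPU-only run finished in roughly half the playing time of the track on this machine.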

Let’s suppose we want to create an instrumental (i.e. a track with all the stems excluding vocals). We use the --two-stems option.

$ demucs --two-stems vocals test-music-file.flac

This creates two files: no_vocals.wav and vocals.wav. The first file is our instrumental track. Perfect for karaoke.
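Conceptually, the instrumental is just the remaining stems added back together. The toy Python sketch below illustrates that idea with a handful of made-up sample values. It is not demucs code, and the mix_stems helper is purely hypothetical.

```python
def mix_stems(stems, exclude):
    """Sum every stem except `exclude`, sample by sample."""
    kept = [samples for name, samples in stems.items() if name != exclude]
    return [sum(frame) for frame in zip(*kept)]


# Made-up sample values standing in for real audio data.
stems = {
    "vocals": [0.5, 0.25, 0.0],
    "drums": [0.125, 0.0, 0.25],
    "bass": [0.25, 0.25, 0.25],
    "other": [0.0, 0.125, 0.125],
}

no_vocals = mix_stems(stems, exclude="vocals")
print(no_vocals)
```

Real audio has many thousands of samples per second and usually two channels, but the principle is the same.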

We can tell demucs to use a specific pretrained model with the -n NAME option. If this option isn’t specified, the htdemucs model is used.
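When calling demucs from a script, it can be handy to assemble the command from the options used on this page. The following Python sketch does that for -n, -o, -d and --two-stems. The function build_demucs_command is our own illustration rather than anything shipped with demucs, and it only builds the argument list; running it is left commented out because it needs demucs installed.

```python
import shlex


def build_demucs_command(tracks, model=None, out_dir=None, device=None,
                         two_stems=None):
    """Hypothetical helper: build a demucs argument list from a few options."""
    cmd = ["demucs"]
    if model is not None:
        cmd += ["-n", model]          # pretrained model name
    if out_dir is not None:
        cmd += ["-o", str(out_dir)]   # folder for the extracted tracks
    if device is not None:
        cmd += ["-d", device]         # e.g. "cpu" or "cuda"
    if two_stems is not None:
        cmd += ["--two-stems", two_stems]  # e.g. "vocals"
    cmd += list(tracks)
    return cmd


cmd = build_demucs_command(["test-music-file.flac"], model="htdemucs",
                           two_stems="vocals")
print(shlex.join(cmd))

# To actually run the command (requires demucs to be installed):
# import subprocess
# subprocess.run(cmd, check=True)
```

Leaving an option as None omits the flag, so demucs falls back to its own default for it.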

We’ve reproduced all the flags below.

usage: demucs.separate [-h] [-s SIG | -n NAME] [--repo REPO] [-v] [-o OUT]
                       [--filename FILENAME] [-d DEVICE] [--shifts SHIFTS]
                       [--overlap OVERLAP] [--no-split | --segment SEGMENT]
                       [--two-stems STEM] [--int24 | --float32]
                       [--clip-mode {rescale,clamp}] [--mp3]
                       [--mp3-bitrate MP3_BITRATE] [-j JOBS]
                       tracks [tracks ...]

For an explanation of these options, we’ve reproduced the help message on Page 3 of this article.

Summary

demucs is truly sublime software and produces impressive results. Your system will need a decent GPU with a good dollop of RAM if you want fast processing!

The models have been trained on data which is biased towards pop/rock music. The basic training set is a mere 87 songs, but it still works well. The extra models are trained with an additional 150 full-length music tracks (~10h duration) of different genres along with their isolated drums, bass, vocals and other stems. Obviously this doesn’t cover all instruments and styles. Of course, it’s possible to train the software with data you own.

If we want to try the 6-source model (which adds guitar and piano stems), we can type:

$ time demucs -n htdemucs_6s test-music-file.flac

The piano stem is currently pretty ropey from our testing but hopefully this will improve with a later release.

The project has attracted more than 5,000 GitHub stars.

Website: github.com/facebookresearch/demucs
Developer: Meta Platforms, Inc. and affiliates.
License: MIT License

Demucs is written in Python.
