How to separate audio source data using Spleeter

Picture by John Matychuk, Unsplash.


Have you ever wanted to remove the singing from a track, so you can create an instrumental version to sing karaoke to? Or do you want to remix a track, or just listen to your favourite singer performing a cappella? If so, the Spleeter model is what you’ve been looking for…

Spleeter is an audio source separation model built on top of TensorFlow and created by researchers at the music streaming business Deezer. It’s a state-of-the-art system that comes with pre-trained models, so you just need to set it up, load up your track, and it will perform audio source separation for you.

Deezer built the Spleeter model to help with Music Information Retrieval (MIR) tasks. These include the analysis and transcription of vocal lyrics, as well as music transcription tasks such as chord transcription, drum and bass transcription, and beat tracking. Here’s a basic guide to using it.

Install FFmpeg

To use Spleeter you’ll need the FFmpeg transcoding software installed on your machine. This lets you convert files between various formats. On Debian or Ubuntu, you can install it by entering sudo apt install ffmpeg in your terminal.

Install Spleeter

Next, install Spleeter. You can either clone the GitHub repository or install it from PyPI by entering pip3 install spleeter in your terminal. Spleeter is a Python application that uses TensorFlow internally and is supplied with several pre-trained models to separate audio sources.
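Before moving on, it can be worth checking that both tools ended up on your PATH. The snippet below is a minimal sketch of that check; the missing_tools helper is my own illustrative name, not part of Spleeter, and it assumes the pip install has put a spleeter command on your PATH.

```python
import shutil


def missing_tools(tools=("ffmpeg", "spleeter")):
    """Return the names of the given executables that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        # Something did not install, or its location is not on PATH.
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("ffmpeg and spleeter are both available.")
```

If anything is reported as missing, re-run the relevant install step before continuing.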

Create your script

Now, open a Jupyter notebook and import Audio from IPython.display, then go off to your music collection and find an MP3 to use. I’ve used Moving to LA by Art Brut, because I thought lead singer Eddie Argos’ vocals would sound particularly amusing without the benefit of guitar-based accompaniment.

from IPython.display import Audio

Transcode the MP3 to WAV

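Spleeter reads audio through FFmpeg, so it should accept MP3 files directly, but I opted to transcode mine to .wav to avoid some bitrate issues I was encountering with the MP3. You can run the command within Jupyter by prefixing it with an exclamation mark. In the command below, -acodec pcm_u8 writes 8-bit PCM audio and -ar 22050 sets the sample rate to 22.05 kHz, which keeps the file small at the cost of some fidelity. Adding Audio('moving_to_la.wav') will insert an embedded music player in your notebook.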

!ffmpeg -i moving_to_la.mp3 -acodec pcm_u8 -ar 22050 moving_to_la.wav
Audio('moving_to_la.wav')
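If you’d rather do the transcoding from Python than with the exclamation mark trick, something like the following sketch should work. The function names are hypothetical, and the defaults (16-bit PCM at 44.1 kHz, plus -y to overwrite an existing file) are my own assumptions rather than the settings used above.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(src, sample_rate=44100, codec="pcm_s16le"):
    """Build an ffmpeg command that transcodes src to a .wav file beside it."""
    src = Path(src)
    dst = src.with_suffix(".wav")
    cmd = [
        "ffmpeg", "-y",          # -y: overwrite the output file if it exists
        "-i", str(src),          # input file
        "-acodec", codec,        # PCM codec used for the WAV data
        "-ar", str(sample_rate), # output sample rate in Hz
        str(dst),
    ]
    return cmd, dst


def transcode_to_wav(src, **kwargs):
    """Run ffmpeg and return the path of the WAV file it wrote."""
    cmd, dst = build_ffmpeg_command(src, **kwargs)
    subprocess.run(cmd, check=True)  # raises CalledProcessError if ffmpeg fails
    return dst
```

To reproduce the settings from the command above, you would call transcode_to_wav('moving_to_la.mp3', sample_rate=22050, codec='pcm_u8').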

Separate the audio sources

Now that the source audio file is transcoded, we can tell Spleeter to perform the audio source separation using TensorFlow. As the Spleeter application is designed to be run from the terminal, we can again use the exclamation mark trick to run it in Jupyter. With no model specified, Spleeter uses its default two-stem model, which splits the track into vocals and accompaniment. We’ll store the output in a directory called moving_to_la.

!spleeter separate -o moving_to_la/ moving_to_la.wav

INFO:spleeter:File moving_to_la/moving_to_la/vocals.wav written succesfully
INFO:spleeter:File moving_to_la/moving_to_la/accompaniment.wav written succesfully

After a couple of seconds, the Spleeter model will have separated the audio sources into the vocal track and the accompaniment track and saved them to .wav files in the destination directory. You can view the file names by doing an ls to list the directory contents.

!ls moving_to_la/moving_to_la

accompaniment.wav  vocals.wav
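Spleeter can also be called as a Python library, which avoids shelling out from the notebook. The sketch below assumes the Separator class and its separate_to_file method from the spleeter package; the two wrapper functions are hypothetical names of my own. The path helper simply mirrors the output layout shown in the listing above.

```python
from pathlib import Path


def expected_stem_paths(audio_path, out_dir, stems=("vocals", "accompaniment")):
    """Return the paths Spleeter is expected to write: out_dir/<track>/<stem>.wav."""
    track = Path(audio_path).stem
    return [Path(out_dir) / track / f"{stem}.wav" for stem in stems]


def separate_two_stems(audio_path, out_dir):
    """Split a track into vocals and accompaniment with the two-stem model."""
    # Imported lazily so the helper above works without Spleeter installed.
    from spleeter.separator import Separator

    separator = Separator("spleeter:2stems")
    separator.separate_to_file(str(audio_path), str(out_dir))
    return expected_stem_paths(audio_path, out_dir)
```

Calling separate_two_stems('moving_to_la.wav', 'moving_to_la') should write the same two files as the terminal command.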

Listen to the separated audio

To listen to the separated audio sources, you can either navigate to the output directory and play them in your media player, or use the Audio() function to play them from within Jupyter. The vocals.wav file contains the vocals, with the accompaniment cleverly stripped away. You can barely hear the backing music, so it sounds like Eddie is singing/speaking in an empty room.

Audio('moving_to_la/moving_to_la/vocals.wav')

The accompaniment.wav file has had the singing stripped out, giving you an instrumental version of the track. This works really well too. There are some slightly quieter or muffled parts where the singing has been removed, but it’s pretty bloody close for a first attempt with a pre-trained model and no tuning!

Audio('moving_to_la/moving_to_la/accompaniment.wav')

If your music features other “stems”, such as drums, bass, or piano, Spleeter can separate these into individual audio tracks too. Alongside the default 2stems model (vocals and accompaniment), it provides a 4stems model (vocals, drums, bass, and other) and a 5stems model (which adds piano), selected with the -p option, for example -p spleeter:4stems. It works amazingly well, and has apparently already been used in a number of commercial music production applications. Check out the paper below for a detailed explanation of how it works under the hood.
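As a rough sketch of the multi-stem option, the helper below builds the terminal command for a chosen model. The STEMS table and build_separate_command are hypothetical names of my own, and the argument order follows the command used earlier in this post; some Spleeter versions expect the input file after an -i flag instead, so check spleeter separate --help for yours.

```python
# Stems produced by each pre-trained model.
STEMS = {
    "spleeter:2stems": ("vocals", "accompaniment"),
    "spleeter:4stems": ("vocals", "drums", "bass", "other"),
    "spleeter:5stems": ("vocals", "drums", "bass", "piano", "other"),
}


def build_separate_command(audio_path, out_dir, model="spleeter:4stems"):
    """Build a 'spleeter separate' command for the given pre-trained model."""
    if model not in STEMS:
        raise ValueError(f"Unknown model {model!r}; expected one of {sorted(STEMS)}")
    return ["spleeter", "separate", "-p", model, "-o", str(out_dir), str(audio_path)]


if __name__ == "__main__":
    cmd = build_separate_command("moving_to_la.wav", "moving_to_la_4stems")
    # Paste this after an exclamation mark in Jupyter, or pass cmd to subprocess.run.
    print(" ".join(cmd))
```

Each name in STEMS[model] should then appear as a separate .wav file in the output directory.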