Have you ever wanted to remove the singing from a track so you can create an instrumental version to sing karaoke to? Or perhaps you want to remix a track, or just listen to your favourite singer performing a cappella? If so, the Spleeter model is what you’ve been looking for…
Spleeter is an audio source separation model built on top of TensorFlow by researchers at the music streaming service Deezer. It’s a state-of-the-art system that ships with pre-trained models, so you just need to set it up and load your track, and it will perform the source separation for you.
Deezer built Spleeter to help with Music Information Retrieval (MIR) tasks, such as analysing vocal lyrics, lyric transcription, and music transcription tasks like chord transcription, drum and bass transcription, and beat tracking. Here’s a basic guide to using it.
To use Spleeter you’ll need the FFmpeg transcoding software installed on your machine. It converts audio between various formats and, on Debian or Ubuntu, can be installed by entering sudo apt install ffmpeg in your terminal.
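Before going further, it can be worth checking from Python that the ffmpeg binary is actually on your PATH, since Spleeter needs it to read and write audio. This is a minimal sketch using only the standard library; ffmpeg_available() and ffmpeg_version_line() are hypothetical helper names, not part of FFmpeg or Spleeter.

```python
import shutil
import subprocess
from typing import Optional


def ffmpeg_available() -> bool:
    """Hypothetical helper: True if an ffmpeg executable is found on the PATH."""
    return shutil.which("ffmpeg") is not None


def ffmpeg_version_line() -> Optional[str]:
    """Hypothetical helper: first line of `ffmpeg -version`, or None if ffmpeg is missing."""
    if not ffmpeg_available():
        return None
    result = subprocess.run(
        ["ffmpeg", "-version"], capture_output=True, text=True, check=False
    )
    lines = result.stdout.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    print(ffmpeg_version_line() or "ffmpeg not found; install it before using Spleeter")
```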
Next, install Spleeter. You can either clone the GitHub repository or install it from PyPI by entering pip3 install spleeter in your terminal. Spleeter is a Python application that uses TensorFlow internally and comes with several pre-trained models for separating audio sources.
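To confirm the pip3 install worked, and to see which TensorFlow version came along with it, you can query the installed package metadata. A small sketch using the standard library's importlib.metadata (Python 3.8+); installed_version() is a hypothetical helper name.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Hypothetical helper: installed version string of `package`, or None if absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Report both Spleeter and the TensorFlow it runs on.
    for pkg in ("spleeter", "tensorflow"):
        print(pkg, installed_version(pkg) or "not installed")
```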
Now, open a Jupyter notebook, import Audio from IPython.display, then go off to your music collection and find an MP3 to use. I’ve used Moving to LA by Art Brut, because I thought lead singer Eddie Argos’ vocals would sound particularly amusing without the benefit of guitar-based accompaniment.
from IPython.display import Audio
While I think Spleeter can handle MP3 files directly, I opted to transcode mine to .wav to avoid some bitrate issues I was encountering with the MP3. You can transcode the file within Jupyter by prefixing the FFmpeg command with an exclamation mark, which runs it as a shell command. Calling Audio('moving_to_la.wav') will then insert an embedded music player in your notebook.
!ffmpeg -i moving_to_la.mp3 -acodec pcm_u8 -ar 22050 moving_to_la.wav
Audio('moving_to_la.wav')
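If you prefer plain Python to the notebook's exclamation-mark escape, the same FFmpeg call can be built as an argument list and run with subprocess. This is a sketch, assuming ffmpeg is on your PATH; build_transcode_command() and transcode() are hypothetical helpers, and the -y flag (overwrite the output without prompting) is an addition not present in the original command.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List


def build_transcode_command(src: str, dst: str, sample_rate: int = 22050,
                            codec: str = "pcm_u8", overwrite: bool = True) -> List[str]:
    """Hypothetical helper mirroring `ffmpeg -i SRC -acodec pcm_u8 -ar 22050 DST`.

    When overwrite is True, -y is added so ffmpeg replaces DST without asking.
    """
    cmd = ["ffmpeg"]
    if overwrite:
        cmd.append("-y")
    cmd += ["-i", src, "-acodec", codec, "-ar", str(sample_rate), dst]
    return cmd


def transcode(src: str, dst: str, **kwargs) -> Path:
    """Run the transcode via subprocess; raises CalledProcessError if ffmpeg fails."""
    if shutil.which("ffmpeg") is None:
        raise RuntimeError("ffmpeg was not found on the PATH")
    subprocess.run(build_transcode_command(src, dst, **kwargs), check=True)
    return Path(dst)


if __name__ == "__main__":
    print(" ".join(build_transcode_command("moving_to_la.mp3", "moving_to_la.wav")))
```

The codec and sample rate are parameters so you can experiment; pcm_u8 at 22050 Hz stores 8-bit samples at half the CD sample rate, so pcm_s16le at 44100 Hz is an option if you want to keep more of the original detail.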
Now that the source audio file is transcoded, we can tell Spleeter to perform the audio source separation using TensorFlow. As Spleeter is designed to be run from the terminal, we can again use the exclamation mark trick to run it in Jupyter. We’ll store the output in a directory called moving_to_la/.
!spleeter separate -o moving_to_la/ moving_to_la.wav
INFO:spleeter:File moving_to_la/moving_to_la/vocals.wav written succesfully
INFO:spleeter:File moving_to_la/moving_to_la/accompaniment.wav written succesfully
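If you would rather drive Spleeter from a plain Python script, one option is to build the same command as an argument list and hand it to subprocess. This is a sketch under that assumption: -p spleeter:2stems just spells out the default two-stem model explicitly, and build_separate_command() and separate() are hypothetical helper names. The audio file is passed positionally, as in the command above; older Spleeter releases took it via -i instead.

```python
import shutil
import subprocess
from typing import List


def build_separate_command(audio_file: str, output_dir: str,
                           model: str = "spleeter:2stems") -> List[str]:
    """Hypothetical helper: argument list for `spleeter separate`.

    -p selects the pre-trained model, -o the output directory;
    the audio file itself is a positional argument.
    """
    return ["spleeter", "separate", "-p", model, "-o", output_dir, audio_file]


def separate(audio_file: str, output_dir: str, model: str = "spleeter:2stems") -> None:
    """Run the Spleeter CLI via subprocess instead of a notebook `!` escape."""
    if shutil.which("spleeter") is None:
        raise RuntimeError("spleeter command not found; try `pip3 install spleeter`")
    subprocess.run(build_separate_command(audio_file, output_dir, model), check=True)


if __name__ == "__main__":
    print(" ".join(build_separate_command("moving_to_la.wav", "moving_to_la/")))
```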
After a couple of seconds, Spleeter will have separated the audio sources into a vocal track and an accompaniment track and saved them as .wav files in the destination directory. As the log shows, they are written to a sub-directory named after the input file (moving_to_la/moving_to_la/). You can view the file names by running ls to list the directory contents.
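The same listing can be done from Python with pathlib, which is handy if you want to loop over the stems afterwards. A minimal sketch, assuming the output layout shown in the log above (output directory, then a sub-directory named after the input file); list_stems() is a hypothetical helper.

```python
from pathlib import Path
from typing import List


def list_stems(output_dir: str, audio_file: str) -> List[Path]:
    """Hypothetical helper: sorted .wav stems Spleeter wrote for `audio_file`.

    Assumes results sit in <output_dir>/<input file name without extension>/,
    e.g. moving_to_la/moving_to_la/vocals.wav, as in the log output above.
    """
    track_dir = Path(output_dir) / Path(audio_file).stem
    return sorted(track_dir.glob("*.wav"))


if __name__ == "__main__":
    for stem in list_stems("moving_to_la/", "moving_to_la.wav"):
        print(stem)
```

In the notebook, each returned path can then be passed to Audio(str(path)) to get one player per stem.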
To listen to the separated audio sources, you can either navigate to the output directory and play them in your media player, or use the Audio() function to play them from within Jupyter. The vocals.wav file contains only the vocals: the accompaniment has been cleverly separated out and you can barely hear it, so it sounds like Eddie is singing/speaking in an empty room.
The accompaniment.wav file has had the singing stripped out, giving you a pure instrumental version of the track. This works really well too. There are some slightly quieter or muffled parts where the singing has been removed, but it’s pretty bloody close for an out-of-the-box attempt!
If you have music that features other “stems”, such as drums, bass, or piano, Spleeter can also separate these out into individual audio tracks using its built-in 4stems and 5stems models (only 5stems gives piano its own stem). It works amazingly well, and has apparently already been used in a number of commercial music production applications. Check out the Deezer team’s Spleeter paper for a detailed explanation of how it works under the hood.
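A small lookup table makes the difference between the pre-trained models concrete. The stem lists below reflect the 2stems, 4stems and 5stems models as described in the Spleeter README; the output paths assume the same output-directory/track-name/stem.wav layout seen in the log earlier, and expected_outputs() is a hypothetical helper.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Stems produced by each pre-trained Spleeter model (one .wav file per stem).
SPLEETER_STEMS: Dict[str, List[str]] = {
    "spleeter:2stems": ["vocals", "accompaniment"],
    "spleeter:4stems": ["vocals", "drums", "bass", "other"],
    "spleeter:5stems": ["vocals", "drums", "bass", "piano", "other"],
}


def expected_outputs(model: str, output_dir: str, audio_file: str) -> List[str]:
    """Hypothetical helper: file paths Spleeter should write for `audio_file`."""
    if model not in SPLEETER_STEMS:
        raise ValueError(f"unknown model {model!r}; choose from {sorted(SPLEETER_STEMS)}")
    # Results go into a sub-directory named after the input file (no extension).
    track_dir = PurePosixPath(output_dir) / PurePosixPath(audio_file).stem
    return [str(track_dir / f"{stem}.wav") for stem in SPLEETER_STEMS[model]]


if __name__ == "__main__":
    for path in expected_outputs("spleeter:5stems", "moving_to_la/", "moving_to_la.wav"):
        print(path)
```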
Matt Clarke, Tuesday, March 02, 2021