Vocal separation using Karaoke U-net
8 Pages Posted: 14 Dec 2021
Date Written: December 12, 2021
Abstract
Currently, karaoke tracks for songs have to be specially made by an audio engineer. The process for generating a high-quality karaoke track is not accessible to the general public, because specialized software such as Audacity has to be used. In this paper we propose a modified U-net, called Karaoke U-net, which quickly and simply separates the vocals from a song that contains both vocal and instrumental components and yields a high-quality karaoke track. It does not require any special audio processing software. The proposed system takes a song as input, generates spectrograms of it, and passes them through the Karaoke U-net. The U-net generates the spectrograms of the vocals and of the instrumental of the input song. Finally, the generated spectrograms are used to create audio files of the vocals and the instrumental. Karaoke U-net is the first U-net model created specifically for generating a karaoke track. It achieves an overall accuracy of 88.6% and performs better on the MUSDB18 dataset than other similar systems. The model allows a user to create an instrumental for any song with vocal components. It can also be used by students who are learning audio mixing and mastering to analyze the vocals separately from the track and understand what processing has been applied to them. A further application is removing background noise during live video conferencing, which in turn helps users communicate more effectively.
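The pipeline described above (song, spectrogram, network, separated spectrograms, audio files) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not say how the network output is turned back into audio, so the sketch assumes a soft mask applied to the mixture spectrogram with the mixture phase reused. The function `placeholder_vocal_mask` is a hypothetical stand-in for the trained Karaoke U-net (it simply keeps frequencies above 1 kHz), and the frame and hop sizes are arbitrary choices.

```python
import numpy as np


def stft(signal, frame=1024, hop=256):
    """Short-time Fourier transform with a Hann window; returns (n_frames, frame // 2 + 1)."""
    window = np.hanning(frame)
    # Pad both ends so every original sample is covered by several frames.
    padded = np.pad(signal, (frame, frame))
    n_frames = 1 + (len(padded) - frame) // hop
    frames = np.stack([padded[i * hop:i * hop + frame] * window for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)


def istft(spec, length, frame=1024, hop=256):
    """Inverse STFT by weighted overlap-add; returns `length` samples."""
    window = np.hanning(frame)
    n_frames = spec.shape[0]
    out = np.zeros(frame + hop * (n_frames - 1))
    norm = np.zeros_like(out)
    frames = np.fft.irfft(spec, n=frame, axis=1)
    for i in range(n_frames):
        out[i * hop:i * hop + frame] += frames[i] * window
        norm[i * hop:i * hop + frame] += window ** 2
    out = out / np.maximum(norm, 1e-8)
    # Drop the padding that stft() added at the front.
    return out[frame:frame + length]


def placeholder_vocal_mask(magnitude, sample_rate, frame=1024):
    """Hypothetical stand-in for the trained network: keep bins above 1 kHz.

    A real model would predict this mask (or the vocal spectrogram itself)
    from the magnitude spectrogram.
    """
    freqs = np.fft.rfftfreq(frame, d=1.0 / sample_rate)
    band = (freqs > 1000.0).astype(float)
    return np.broadcast_to(band, magnitude.shape)


def separate(mix, sample_rate, frame=1024, hop=256):
    """Split a mono mixture into (vocals, instrumental) waveforms."""
    spec = stft(mix, frame, hop)
    mask = placeholder_vocal_mask(np.abs(spec), sample_rate, frame)
    # Scaling the complex spectrogram masks the magnitude and reuses the mixture phase.
    vocals = istft(mask * spec, len(mix), frame, hop)
    instrumental = istft((1.0 - mask) * spec, len(mix), frame, hop)
    return vocals, instrumental


# Synthetic example: a 220 Hz "backing" tone plus a 2000 Hz "voice" tone.
sr = 8000
t = np.arange(sr) / sr
backing = 0.5 * np.sin(2 * np.pi * 220 * t)
voice = 0.5 * np.sin(2 * np.pi * 2000 * t)
mix = backing + voice
vocals, instrumental = separate(mix, sr)
```

Because the two masks sum to one, the separated waveforms add back up to the input mixture, which is a convenient sanity check for any mask-based variant of this pipeline.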
Keywords: Karaoke, Vocal separation, U-net, Audio processing
JEL Classification: Y9