Vocal separation using Karaoke U-net

8 Pages Posted: 14 Dec 2021

Ninad Mehendale

University of Mumbai - K. J. Somaiya College of Engineering (K.J.S.C.E.); Ninad's research Lab

Vipul Dube

K J Somaiya College of Engineering, Somaiya Vidyavihar University, Mumbai, India

Rutwik Patel

University of Mumbai - K. J. Somaiya College of Engineering (K.J.S.C.E.)

Vrushali Sule

K. J. Somaiya College of Engineering

Date Written: December 12, 2021

Abstract

Currently, karaoke tracks for songs have to be specially made by an audio engineer. The process of generating a high-quality karaoke track is not accessible to the general public, because specialized software such as Audacity has to be used. Hence, in this paper we propose a modified U-net, called Karaoke U-net, which provides a simple and quick separation of the vocals from a song containing both vocal and instrumental components, and produces a high-quality karaoke track. It does not require any special audio processing software. The proposed system takes a song as input, generates its spectrogram, and passes it through the Karaoke U-net. The U-net generates spectrograms of the vocals and of the instrumental of the input song. Finally, the generated spectrograms are used to create audio files of the vocals and the instrumental. We have created the first U-net model specifically designed for generating a karaoke track. The model achieves an overall accuracy of 88.6%, and its performance on the MUSDB18 dataset is better than that of other similar systems. Our U-net allows the user to create an instrumental for any song with vocal components. It can also be used by students learning audio mixing and mastering to analyze the vocals separately from the track and understand what processing has been applied to them. A further application of the U-net is removing background noise during live video conferencing, which in turn helps users communicate more effectively.
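The pipeline described in the abstract (song, then spectrogram, then U-net, then vocal and instrumental spectrograms, then audio files) can be sketched as below. This is a minimal illustration under stated assumptions: it uses SciPy's STFT/inverse STFT and a soft-masking formulation, whereas the abstract does not say whether Karaoke U-net predicts masks or spectrograms directly. The function `predict_vocal_mask` is a hypothetical placeholder heuristic standing in for the trained network; it is not the authors' model.

```python
import numpy as np
from scipy.signal import stft, istft


def predict_vocal_mask(magnitude: np.ndarray) -> np.ndarray:
    """Hypothetical stand-in for the trained Karaoke U-net (NOT the paper's model).

    Returns a soft mask in [0, 1] with the same shape as the magnitude
    spectrogram (frequency x time). A real U-net would be called here.
    """
    # Placeholder heuristic: weight each bin by how much it exceeds the
    # median level of its frequency row across time.
    median = np.median(magnitude, axis=1, keepdims=True)
    return magnitude / (magnitude + median + 1e-8)


def separate(song: np.ndarray, sr: int, nperseg: int = 1024):
    """Split a mono song into (vocals, instrumental) via spectrogram masking."""
    # 1. Complex spectrogram of the input song (Hann window, 50% overlap).
    _, _, spec = stft(song, fs=sr, nperseg=nperseg)
    magnitude = np.abs(spec)

    # 2. The (placeholder) model predicts a vocal mask; the instrumental
    #    mask is its complement, so the two masks sum to one.
    vocal_mask = predict_vocal_mask(magnitude)
    vocal_spec = vocal_mask * spec
    instr_spec = (1.0 - vocal_mask) * spec

    # 3. Inverse STFT back to audio, trimmed to the input length.
    _, vocals = istft(vocal_spec, fs=sr, nperseg=nperseg)
    _, instrumental = istft(instr_spec, fs=sr, nperseg=nperseg)
    n = len(song)
    return vocals[:n], instrumental[:n]
```

Because the two masks sum to one and the inverse STFT is linear, the vocal and instrumental outputs add back up to the original song; a trained U-net would only change how the energy is divided between the two stems.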

Keywords: Karaoke, Vocal separation, U-net, Audio processing

JEL Classification: Y9

Suggested Citation

Mehendale, Ninad and Dube, Vipul and Patel, Rutwik and Sule, Vrushali, Vocal separation using Karaoke U-net (December 12, 2021). Available at SSRN: https://ssrn.com/abstract=3983514 or http://dx.doi.org/10.2139/ssrn.3983514

Ninad Mehendale (Contact Author)

University of Mumbai - K. J. Somaiya College of Engineering (K.J.S.C.E.)

Mumbai, Maharashtra 400007
India

Ninad's research Lab

M.G. Road, Naupada Thane
Thane, 400602
India

Vipul Dube

K J Somaiya College of Engineering, Somaiya Vidyavihar University, Mumbai, India

India

Rutwik Patel

University of Mumbai - K. J. Somaiya College of Engineering (K.J.S.C.E.)

Mumbai, Maharashtra 400007
India

Vrushali Sule

K. J. Somaiya College of Engineering

India
