AI-based Vocal Detection Algorithm
“In this song, the vocals come in after 16 bars…”
“In this one, the chorus begins after the 8th bar…”
The DJs of yesteryear, mixing music with LPs and turntables, had to memorize exactly when the vocals kicked in for each song. It was enough to drive anyone frantic.
For DJs playing genres built around sung and rapped vocals, such as house and hip-hop, a single mistake could lay one vocal line on top of another. This kind of overlap grated on the ears and had to be avoided above all else. Today’s DJs use data to guide them. Waveforms can now be visualized on screen, making the structure of a song much easier to recognize by sight. Even so, identifying where the vocals start from the waveform alone is no easy task.
In this project, Qosmo, Inc. has partnered with AlphaTheta Corporation to develop a mechanism that detects whether a song contains vocals and, if so, where they start and stop. Using deep learning, the system is trained on songs that contain vocals (singing, rap, choruses, etc.) and songs that do not, and learns to predict where in a song vocals occur. The input data are mainly represented as spectrograms, graphs that show how the strength of each frequency changes over time.
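To make the idea of a spectrogram concrete, here is a minimal sketch of computing a magnitude spectrogram with a short-time Fourier transform. It is an illustration only, not Qosmo's feature pipeline: the Hann window, the frame size and hop length, and the function names are assumptions chosen for readability, and a naive DFT stands in for the FFT a real system would use.

```python
import cmath
import math


def hann(n: int) -> list[float]:
    """Periodic Hann window of length n (assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def spectrogram(samples: list[float], frame_size: int = 256, hop: int = 128) -> list[list[float]]:
    """Return a magnitude spectrogram: one row per time frame, one column per frequency bin.

    Bin k corresponds to k * sample_rate / frame_size Hz. A naive O(N^2) DFT is
    used per frame for clarity; production code would use an FFT.
    """
    window = hann(frame_size)
    n_bins = frame_size // 2 + 1  # keep non-negative frequencies only
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        # Window the current slice of audio to reduce spectral leakage.
        chunk = [samples[start + i] * window[i] for i in range(frame_size)]
        row = []
        for k in range(n_bins):
            acc = sum(chunk[i] * cmath.exp(-2j * math.pi * k * i / frame_size)
                      for i in range(frame_size))
            row.append(abs(acc))  # magnitude of frequency bin k in this frame
        frames.append(row)
    return frames
```

For a pure tone, every frame peaks at the bin matching the tone's frequency; vocals instead show up as time-varying harmonic patterns, which is the kind of structure a model trained on spectrograms can learn to recognize.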
The software displays the timing of the detected vocals as an overlay on the waveform. DJs can tell at a glance where the vocals appear in the song, so they can use this information in mixing and other performances.
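To draw an overlay like this, frame-by-frame predictions have to be turned into time ranges. The sketch below assumes, hypothetically, that a model outputs one vocal probability per fixed-length frame; the `vocal_segments` function, the 0.5 threshold, and the minimum segment length are illustrative choices, not details of the shipped engine.

```python
def vocal_segments(
    probs: list[float],
    frame_seconds: float,
    threshold: float = 0.5,
    min_frames: int = 2,
) -> list[tuple[float, float]]:
    """Convert hypothetical per-frame vocal probabilities into (start_sec, end_sec) ranges.

    Frames with probability >= threshold count as vocal; runs shorter than
    min_frames are discarded as likely false positives.
    """
    segments = []
    start = None
    # A -inf sentinel guarantees that a run still open at the end gets closed.
    for i, p in enumerate(list(probs) + [float("-inf")]):
        if p >= threshold and start is None:
            start = i  # a vocal run begins
        elif p < threshold and start is not None:
            if i - start >= min_frames:
                segments.append((start * frame_seconds, i * frame_seconds))
            start = None  # the run ends
    return segments


# Example: 0.5-second frames; the isolated spike at frame 6 is dropped.
print(vocal_segments([0.1, 0.2, 0.9, 0.8, 0.7, 0.1, 0.6, 0.2, 0.9, 0.95], 0.5))
```

Each returned range can then be shaded on the waveform display. Dropping very short runs is one simple way to keep brief misdetections from cluttering the overlay.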
The engine developed for this software is built into the newly released Pioneer DJ rekordbox for Mac/Windows (ver. 6.0.1).
Going forward, Qosmo plans to apply similar technology in its long-running AI DJ Project.
One of our missions is to apply the power of AI technology to help musicians and other artists expand their range of expression. Qosmo is delighted to release the technology created in this joint-research project as a product that DJs everywhere can use. We hope this technology will provide musicians and other artists with a valuable tool to assist them in their creative efforts.
Nao Tokui (Qosmo, Inc.), Max Frenzel (Qosmo, Inc.)
Jun Kato (Dentsu Craft Tokyo), Yumi Takahashi (Qosmo, Inc.)
This project uses Qosmo Music & Sound AI.