“AI DJ Project” is a live performance featuring an Artificial Intelligence (AI) DJ playing alongside a human DJ. Using several deep neural networks, the software (AI DJ) selects vinyl records and mixes songs. The two DJs play alternately, each selecting one song at a time, embodying a dialogue between human and AI through music. DJing “Back to Back” serves as a critical investigation into the unique relationship between humans and machines. The AI DJ system consists of the following three features:
1. Music Selection
We trained three different neural networks to infer, from spectrogram images, the genre of a track and the musical instruments and drum machines used in it. AI DJ “listens” to what the human DJ plays and extracts auditory features using those networks. The extracted features are compared with those of all tracks in our pre-selected record box, so that the system can select the closest one, which presumably has a similar musical tone/mood.
2. Beat Matching
AI DJ must also control the pitch (speed) of the turntable to match the beat. We used reinforcement learning (RL) to teach the model, through trial and error, how to speed up, slow down, nudge and pull the turntable to align downbeats. For this purpose, we built an OSC-compatible custom turntable and robot fingers to manipulate it.
3. Crowd Reading
A good DJ should pay attention to the energy of the audience. We use a deep-learning-based motion tracking technique to quantify how much people in the audience dance to the music the AI plays, and feed this into future music selection.
We have performed several times in different locations in Japan and Europe. The AI’s slight unpredictability always brought an amusing tension to the performance and gave the human DJs new ideas about what to play and how to play it. AI is not a replacement for the human DJ. Instead, it is a partner that can think and play alongside its human counterpart, bringing forth a wider perspective on our relationship to contemporary technologies.
A DJ (or disc jockey) is a person who mixes different sources of pre-existing recorded music, usually for a live audience in a nightclub. Selecting appropriate music and mixing it in a smooth and pleasant way is regarded as a highly creative process.
The art of DJing has been one of many testbeds for computational creativity. ‘AlgoRhythms’ is a Turing-test competition in which DJ software automatically mixes given music and tries to convince human evaluators that the mixes were made by human DJs. ‘2045’ is an AI-themed DJ party where each DJ brings a custom DJ algorithm and lets it play in their place.
Unlike these previous attempts, our AI DJ project doesn’t aim to automate the whole DJ process, but rather tries to achieve a successful collaboration between the AI and a human DJ. Hence, in our DJ sessions, the software and the human DJ play alternately, one track at a time (usually referred to as Back to Back, or B2B).
In B2B, the AI system and the human DJ perform under conditions that are as similar as possible. For example, the AI uses the same physical vinyl records and turntables as the human DJ. The system listens to the tracks played by the human DJ and chooses the next record to be played. (Human assistants then look for the selected record and place it on the turntable.) Once the record is in place, the AI adjusts the tempo of the next track to the tempo of the track played by its human counterpart. The beats of both tracks are matched by controlling the pitch (rotation speed) of the turntable. For this purpose, we built a custom DJ turntable and a robot finger, which can be plugged into a computer and manipulated via the OSC protocol.
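The tempo adjustment described above reduces to simple arithmetic: the turntable speed has to change by the ratio of the two tempos. The following is a minimal sketch of that calculation, not the project's actual control code. The function names and the ±8% pitch range (typical of many DJ turntables) are assumptions for illustration.

```python
def pitch_adjust_percent(track_bpm: float, target_bpm: float) -> float:
    """Percent change in rotation speed needed so that a record whose
    native tempo is track_bpm plays back at target_bpm."""
    if track_bpm <= 0 or target_bpm <= 0:
        raise ValueError("BPM values must be positive")
    return (target_bpm / track_bpm - 1.0) * 100.0


def within_pitch_range(percent: float, max_percent: float = 8.0) -> bool:
    """True if the adjustment fits the pitch fader range.
    The default of +/-8% is an assumed, hypothetical hardware limit."""
    return abs(percent) <= max_percent


if __name__ == "__main__":
    # The human DJ plays at 126 BPM; the AI's next record is pressed at 120 BPM.
    change = pitch_adjust_percent(track_bpm=120.0, target_bpm=126.0)
    print(f"speed change: {change:+.2f}%")
    print("reachable:", within_pitch_range(change))
```

A record that would need a larger change than the fader allows cannot be beat-matched by pitch control alone, which is one practical reason tempo could also inform track selection.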
1. MUSIC SELECTION
The minimum requirement for a DJ is to maintain the “flow” of the music, so it is common practice to select a next track that sounds somewhat similar to what is being played but also brings something new in its rhythm structure, sound texture, etc. DJs also often use the instruments, or sometimes the prominent drum-machine sounds, in a track as clues for music selection (e.g., going from a track with a piano solo to a track with an organ riff, or pairing two tracks that both use a Roland TR-808 snare).
Based on these observations, we trained three different neural networks. The models, and the dataset used for each, are as follows:
Genre Inference (wasabeat dance music dataset)
Instrument Inference (IRMAS dataset)
Drum Machine Inference (200.Drum.Machines dataset)
Each model is a convolutional neural network that takes spectrogram images of sounds and infers genres (minimal techno, tech house, hip-hop, ...), instruments (piano, trumpet, ...) and drum machines (TR-808, TR-909, ...).
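For readers unfamiliar with the input representation, the sketch below shows how a magnitude spectrogram can be computed from raw audio samples. It is only an illustration under stated assumptions: the frame size, hop size and Hann window are hypothetical choices (the source does not give the spectrogram parameters), and a naive DFT is used to keep the example dependency-free, where a real pipeline would use an FFT library.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of DFT magnitudes for
    bins 0..frame_size//2. frame_size and hop are assumed values."""
    # Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    n_bins = frame_size // 2 + 1
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        spectrum = []
        for k in range(n_bins):
            # Naive O(N^2) DFT for bin k.
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames


if __name__ == "__main__":
    # A sine wave that completes exactly 8 cycles per 64-sample frame.
    sine = [math.sin(2 * math.pi * 8 * n / 64) for n in range(256)]
    spec = magnitude_spectrogram(sine)
    peak_bins = [frame.index(max(frame)) for frame in spec]
    print(len(spec), "frames, peak bins:", peak_bins)
```

Stacking these frames side by side (time on one axis, frequency bin on the other) gives the image-like array that a convolutional network can consume.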
Once a network is trained, we can use the same model to extract auditory features as a high-dimensional vector. While the human DJ is playing, the system feeds the incoming audio into the model and generates a feature vector. This vector is compared with those of all tracks in our pre-selected record box (currently over 350 tracks), so that the system can select the closest track, which presumably has a similar musical tone/mood/texture, as the next track to play.
It’s worth noting that we initially collected and analyzed a dataset of DJ playlists (visualized in the image) and used it to select the most likely candidate according to the data, as in collaborative filtering. We soon realized, however, that this resulted in banal music selections, so we decided to ignore all metadata associated with the music (genre, artist name, label, etc.) and focus only on the audio data.
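The comparison step can be sketched as a nearest-neighbor search over feature vectors. This is a minimal illustration, not the project's code: the source does not say which distance measure is used, so cosine distance is an assumption here, and the track names, the three-dimensional vectors and the `exclude` parameter are hypothetical.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity; 0 means the vectors point the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 1.0  # treat an all-zero vector as maximally dissimilar
    return 1.0 - dot / (norm_a * norm_b)


def select_next_track(query, record_box, exclude=()):
    """Return the name of the track in record_box whose feature vector is
    closest to query, skipping names in exclude (e.g. already played)."""
    candidates = {name: vec for name, vec in record_box.items()
                  if name not in exclude}
    if not candidates:
        raise ValueError("no tracks left to choose from")
    return min(candidates,
               key=lambda name: cosine_distance(query, candidates[name]))


if __name__ == "__main__":
    # Hypothetical record box; real feature vectors would be high-dimensional.
    record_box = {
        "track_a": [1.0, 0.0, 0.0],
        "track_b": [0.0, 1.0, 0.0],
        "track_c": [0.9, 0.1, 0.0],
    }
    now_playing = [1.0, 0.05, 0.0]  # features of the human DJ's track
    print(select_next_track(now_playing, record_box))
    print(select_next_track(now_playing, record_box, exclude={"track_a"}))
```

With a record box of a few hundred tracks, a linear scan like this is fast enough that no index structure is needed.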
The second task for AI DJ is to control the pitch (i.e., rotation speed) of the turntable so that its beat matches the music the human DJ plays. We used reinforcement learning (RL) to teach the model, through trial and error, how to speed up, slow down, nudge and pull the turntable to align downbeats. We use various metrics to compute rewards for the model.
We have found that it is relatively easy to match the tempo of two tracks, but very difficult to align the “phase” of the beats at the same time, because of a long-term dependency: the result of any manipulation can be observed as a change in tempo only after several bars. Hence, beat matching through RL is still an open challenge.
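The distinction between tempo and phase can be made concrete with a small model of two beat grids. The sketch below is an illustration under stated assumptions, not the reward actually used in the project: the function names, the idealized constant-tempo beat grids and the weighted-sum reward are all hypothetical.

```python
def beat_phase(t: float, bpm: float, offset: float = 0.0) -> float:
    """Position within the current beat, in [0, 1), at time t (seconds)
    for a track whose first beat falls at time offset."""
    return ((t - offset) * bpm / 60.0) % 1.0


def phase_error(t: float, bpm_a: float, bpm_b: float,
                offset_b: float = 0.0) -> float:
    """Signed beat-phase difference (track B minus track A),
    wrapped into [-0.5, 0.5) beats."""
    diff = beat_phase(t, bpm_b, offset_b) - beat_phase(t, bpm_a)
    return (diff + 0.5) % 1.0 - 0.5


def reward(t: float, bpm_a: float, bpm_b: float, offset_b: float = 0.0,
           tempo_weight: float = 1.0, phase_weight: float = 1.0) -> float:
    """Hypothetical reward: 0 when tempo and phase both match,
    increasingly negative as either one drifts."""
    tempo_err = abs(bpm_b - bpm_a) / bpm_a
    return -(tempo_weight * tempo_err
             + phase_weight * abs(phase_error(t, bpm_a, bpm_b, offset_b)))


if __name__ == "__main__":
    # Same tempo, but track B started 0.125 s late: the phase error
    # never changes, however long the tracks play.
    for t in (1.0, 10.0, 60.0):
        print(t, phase_error(t, 120.0, 120.0, offset_b=0.125))
    # A 1 BPM tempo mismatch makes the phase drift over time.
    print(phase_error(15.0, 120.0, 121.0))
```

In this model a tempo-matched pair keeps whatever phase offset it has, so correcting the phase requires temporarily leaving the matched tempo and then returning to it, which is what makes the two objectives hard to satisfy at once.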
“A good DJ is always looking at the crowd, seeing what they like, seeing whether it’s working; communicating with them, smiling at them. And a bad DJ is always looking down at the decks and just doing whatever they practiced in their bedroom, regardless of whether the crowd are enjoying it or not.” (Norman Cook, aka Fatboy Slim)
During the music selection process, the system tries to select tracks with a similar mood, as described above, as long as the amount of body movement in the audience stays above a given threshold. Once this movement index falls below the threshold, random noise, inversely proportional to the amount of body movement, is added to the feature vectors of the incoming music, so that the system can explore new musical territory and (hopefully) stimulate the seemingly bored audience.
Unsurprisingly, this randomness apparently worked as a feedback loop in the performance: the randomness brought more confusion to the audience, which led to more randomness. It ended up demonstrating how difficult it is to maintain a subtle balance between regularity and unexpectedness in a DJ’s music selection process.
At the latest AI DJ performance, in Dec 2017, we introduced a new feature: “reading” the crowd. It is an essential role of a DJ to read the audience and play music that suits the atmosphere. In the performance, we deployed a camera system that tracks the movement of bodies in the crowd using the OpenPose library. The system quantifies how much the audience appreciates (i.e., dances to) the music being played and uses this information in the music selection process.
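The exploration rule described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the Gaussian noise distribution, the `gain` constant and the `max_scale` cap are assumptions, since the source only states that the noise is inversely proportional to body movement once movement falls below a threshold.

```python
import random


def exploration_noise_scale(movement: float, threshold: float,
                            gain: float = 0.1, max_scale: float = 1.0) -> float:
    """Standard deviation of the noise to add to the feature vector.
    Zero while the crowd moves enough; otherwise gain / movement,
    capped at max_scale (gain and max_scale are hypothetical)."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if movement >= threshold:
        return 0.0
    if movement <= 0:
        return max_scale
    return min(max_scale, gain / movement)


def perturb_features(features, movement, threshold,
                     gain=0.1, max_scale=1.0, rng=None):
    """Return a copy of features with exploration noise applied."""
    rng = rng or random.Random()
    scale = exploration_noise_scale(movement, threshold, gain, max_scale)
    if scale == 0.0:
        return list(features)
    return [x + rng.gauss(0.0, scale) for x in features]


if __name__ == "__main__":
    features = [0.2, 0.7, 0.1]
    # Lively crowd: the features are left untouched.
    print(perturb_features(features, movement=5.0, threshold=2.0))
    # Quiet crowd: the features are perturbed before the nearest-track search.
    print(perturb_features(features, movement=0.5, threshold=2.0,
                           rng=random.Random(0)))
```

The cap on the noise scale is one possible guard against the runaway feedback loop noted above, since without it a motionless crowd would push the selection toward arbitrary tracks.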
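One simple way to turn tracked body keypoints into a single movement value is to average how far each keypoint moves between consecutive video frames. The sketch below is an assumption-laden illustration, not the project's method: it does not call OpenPose itself, it assumes keypoints arrive as (x, y, confidence) triples, it assumes people appear in the same order in both frames, and the `min_confidence` cutoff is hypothetical.

```python
import math


def movement_index(prev_frame, curr_frame, min_confidence=0.3):
    """Mean keypoint displacement (in pixels) between two frames.

    Each frame is a list of people; each person is a list of
    (x, y, confidence) keypoints. Keypoints whose confidence is below
    min_confidence in either frame are ignored."""
    total = 0.0
    count = 0
    for prev_person, curr_person in zip(prev_frame, curr_frame):
        for (x0, y0, c0), (x1, y1, c1) in zip(prev_person, curr_person):
            if c0 < min_confidence or c1 < min_confidence:
                continue  # unreliable detection, skip it
            total += math.hypot(x1 - x0, y1 - y0)
            count += 1
    return total / count if count else 0.0


if __name__ == "__main__":
    # One person, three keypoints; the third is a low-confidence detection.
    prev = [[(0.0, 0.0, 0.9), (10.0, 10.0, 0.9), (5.0, 5.0, 0.1)]]
    curr = [[(3.0, 4.0, 0.9), (10.0, 10.0, 0.9), (50.0, 50.0, 0.9)]]
    print(movement_index(prev, curr))
```

In practice the value would likely be smoothed over many frames and normalized by the size of each person in the image, so that people close to the camera do not dominate the index.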
|Date||Event||Venue|
|2016/9/4||2045 Generation #4 (part of the Kyoto Okazaki Music Festival “OKAZAKI LOOPS”)||The National Museum of Modern Art, Kyoto|
|2016/10/27||2045 × LIFE PAINT Supported by VOLVO CAR JAPAN||Daikanyama UNIT|
|2017/2/17||DIGITAL CHOC: Machines Désirantes (Desiring Machines)||Shibuya WWW|
|2017/9/14||Festival Speculum Artium 2017 in Slovenia||Zavod za kulturo Delavski dom Trbovlje, Slovenia|
|2017/9/21||SCOPITONE FESTIVAL 2017 in France||STEREOLUX, Nantes, France|
|2017/12/15||sound tectonics #20 (Guest DJ: tofubeats, Licaxxx)||Yamaguchi Center for Arts and Media [YCAM]|
|2019/5/7||Google I/O 2019||Mountain View, California, USA|
|2019/6/1||Japan Media Arts Festival x MUTEK.JP||National Museum of Emerging Science and Innovation (Miraikan)|
Nao Tokui (Qosmo, Inc.)
Shoya Dozono (Qosmo, Inc.)
Yuma Kajihara (Qosmo, Inc.)
Miyu Hosoi (Qosmo, Inc.)
wasabeat, Chris Romero, Yansu Kim, Rakutaro Ogiwara