We human beings can imagine sounds just by glancing at a photo: a beach scene may bring the sound of crashing waves to mind, and a picture of a busy crossing may evoke car horns and street advertising. “Imaginary Soundscape” is a web-based sound installation that focuses on this unconscious behavior: viewers can freely walk around Google Street View and immerse themselves in imaginary soundscapes generated with deep learning models.
This work builds on recent developments in cross-modal information retrieval with deep learning, such as image-to-audio and text-to-image retrieval. The system is trained on video inputs with two models: a well-established, pre-trained image recognition model processes the frames, while a second convolutional neural network reads the audio as spectrogram images and is trained so that the distribution of its output gets as close as possible to that of the first. Once trained, the two networks allow us to retrieve the best-matching sound file for a scene from our massive environmental sound dataset.
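The retrieval step can be sketched roughly as follows. This is a minimal illustration, not the installation's actual code: it assumes that each network outputs a probability distribution over the same set of classes and that "closeness" is measured with the Kullback-Leibler divergence, which the text above does not specify. The function names, file names, and toy scores are all hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete distributions of equal length.

    Assumption: KL divergence is the distance measure; the source only
    says the two output distributions should be "as close as possible".
    """
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))


def retrieve_best_sound(image_dist, sound_index):
    """Return the name of the sound whose distribution is closest to the image's.

    sound_index maps a (hypothetical) file name to the distribution that
    the audio network produced for that file.
    """
    if not sound_index:
        raise ValueError("sound_index is empty")
    return min(sound_index, key=lambda name: kl_divergence(image_dist, sound_index[name]))


# Toy data over three hypothetical classes: beach, street, forest.
image_dist = softmax([4.0, 0.5, 0.1])  # stand-in for the image model's output
sound_index = {
    "waves.wav": softmax([3.5, 0.2, 0.3]),
    "traffic.wav": softmax([0.1, 4.0, 0.2]),
    "birds.wav": softmax([0.2, 0.1, 3.8]),
}

best = retrieve_best_sound(image_dist, sound_index)
print(best)
```

In a real system the distributions for every sound file would presumably be computed once and stored, so that only the image for the current Street View scene has to be processed at viewing time.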
The soundscapes generated by the AI sometimes amaze us by meeting our expectations, but occasionally they ignore the cultural and geographical context (the sound of waves over an icy field in Greenland, for instance). These differences and mistakes lead us to contemplate how imagination works and how rich the sound environments surrounding us are. By externalizing our synesthetic thinking, we tried to shed light on the power of imagination we all share.
We exhibited a sound installation based on this system, “Imaginary Soundwalk,” at Media Ambition Tokyo 2018.
Nao Tokui (Qosmo, Inc.)
Shoya Dozono (Qosmo, Inc.)
Yuma Kajihara (Qosmo, Inc.)
Robin Jungers (Qosmo, Inc.)