Qosmo Music & Sound AI

Img2Sound

FEATURES

  • Quickly extract sound clips from the library that “match” a given image

  • Smart indexing lets you search any sound library without retraining the model

  • Supports not only “image to music” but also other modalities, such as “music to image”, “text to sound”, and “video to sound”
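
The indexing feature can be pictured as a one-time embedding pass over a sound library. The sketch below is illustrative only: `embed_audio` is a hypothetical stand-in for a frozen, pretrained audio encoder (it hashes bytes and carries no acoustic meaning), and `build_index` and the clip names are likewise assumptions, not part of the product's API. It shows only why a new library needs indexing rather than retraining.

```python
import hashlib
import math

DIM = 8  # toy embedding size, chosen for illustration


def embed_audio(samples: bytes) -> list[float]:
    """Hypothetical stand-in for a frozen, pretrained audio encoder.

    Hashes the bytes into a fixed-length unit vector so the sketch runs
    without any model. The output has no acoustic meaning.
    """
    digest = hashlib.sha256(samples).digest()
    vec = [b / 255.0 - 0.5 for b in digest[:DIM]]
    norm = math.sqrt(sum(x * x for x in vec)) or 1.0
    return [x / norm for x in vec]


def build_index(library: dict[str, bytes]) -> dict[str, list[float]]:
    """One indexing pass: embed every clip once and store the vectors."""
    return {clip_id: embed_audio(samples) for clip_id, samples in library.items()}


# Two unrelated libraries are indexed with the same frozen encoder;
# no training step is involved for either of them.
stock_index = build_index({"rain.wav": b"rain-audio", "cafe.wav": b"cafe-audio"})
client_index = build_index({"theme.wav": b"theme-audio"})
```

Because the encoder stays fixed in this sketch, adding or swapping a library costs one embedding pass per clip, and the stored vectors can be reused for every later query.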

USE CASE

  • Music selection according to content

    Finding the right sound for the content you are producing among a large collection of stock sounds can be a daunting task. Img2Sound makes that search simple and fast.

  • New listening experience

    By selecting music that fits the scenery and atmosphere of a place, or by generating music that matches previously taken photos, the system can create a new way to enjoy music.

IMPLEMENTATION

  • Our website Imaginary Soundscape allows you to experience sound matching with Img2Sound by uploading your own images or by navigating Google Street View.

    Demo site for this product: Imaginary Soundscape (https://imaginarysoundscape.net/)

TECHNOLOGY

  • The system uses deep learning based on convolutional neural networks to select, with high accuracy, the most appropriate sound clip (environmental sound) for a given image. Since last year, we have further improved its accuracy and expanded the selection target from environmental sounds to musical compositions, so the algorithm can now match a wider variety of sounds more accurately. One factor in this development is the CLIP model released by OpenAI last year. Because this model was trained on a very wide range of data, it enables accurate interpretation of a wide range of images and sounds in a broader context.
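
One common way to implement this kind of cross-modal matching is to embed images and sounds into a shared vector space and rank clips by cosine similarity to the query. The sketch below assumes that approach; the text above does not state that Img2Sound works exactly this way. The function names, clip names, and 3-dimensional vectors are made up for illustration, and real encoder outputs would be much larger.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def top_matches(query, index, k=2):
    """Return the k clips whose embeddings are most similar to the query."""
    scored = [(cosine(query, vec), clip_id) for clip_id, vec in index.items()]
    scored.sort(reverse=True)  # highest similarity first
    return [(clip_id, score) for score, clip_id in scored[:k]]


# Toy embeddings (assumed values, not real model outputs).
sound_index = {
    "waves.wav": [0.9, 0.1, 0.0],
    "traffic.wav": [0.0, 0.2, 0.9],
    "birdsong.wav": [0.3, 0.9, 0.1],
}
image_embedding = [1.0, 0.2, 0.0]  # stands in for an embedded beach photo

results = top_matches(image_embedding, sound_index, k=2)
for clip_id, score in results:
    print(f"{clip_id}: {score:.3f}")
```

With these toy vectors, `waves.wav` ranks first and `birdsong.wav` second. The same ranking function would serve the other modalities listed under FEATURES, provided the query is embedded into the same space as the indexed items.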

TECH SPEC

  • Pricing

    License term: Monthly

    Developer's license: Yes

  • Input/Output

    Input: Image, video

    Output: Audio (WAV)

  • Operating environment

    Cloud computing: Standard API provided

    On-premises environment: Available on consultation

  • Processing speed

    Real time

Get in touch with us here!

CONTACT
