Neutone – AI Audio Plugin and Community

Bringing the latest AI audio technology to the hands of music creators



Neutone introduction video

Neutone is a project built around an audio plugin that enables innovative musical expression by bringing the latest AI models, developed by AI researchers, into the hands of music/sound creators. The plugin runs on top of a Digital Audio Workstation (DAW) and can drive real-time Digital Signal Processing (DSP) models built using deep learning. Before Neutone, a large technical gap separated AI researchers from the artists and creators who wished to use AI in their creative process. Neutone closes this gap without requiring specialized hardware. In addition, AI researchers and engineers can easily share their latest achievements with music/sound creators on this platform.


There used to be a steep learning curve for those who wished to use AI models in music production. Python runs on any generic computer nowadays, and with the advent of libraries like PyTorch, AI programming has become significantly simpler. However, these skills are still rare among musicians and creators, and only a handful of them have been able to run pre-existing code, let alone develop new models.

In addition, AI models take a long time to train, especially given the large amounts of data they require. Computers with GPUs can accelerate this time-consuming process, but such hardware is not only expensive but also requires considerable experience to set up.

Furthermore, the AI models generated in this way have no interface for use with general-purpose music production software, and there are many limitations on how they can be used for creative purposes. In most cases, existing interfaces are not sufficient for real-time operation, making them unsuitable for live performances.

DAW Plugin

Neutone VST/Audio Unit plug-in

At the heart of the Neutone project is a VST/AudioUnit format plugin, developed by Qosmo. This plugin allows real-time audio signal processing models developed with PyTorch to run inside DAWs such as Ableton Live and Logic on general-purpose computers. A number of AI audio conversion models have already been published for this plugin and can be loaded from the repository within it.
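The real-time constraint the plugin deals with can be illustrated with a minimal, plugin-agnostic sketch (plain Python, not the actual Neutone SDK API): the DAW hands the effect audio in fixed-size buffers, and any internal state — here a one-pole gain smoother standing in for a model's learned state — must persist across buffer boundaries so the output stays seamless.

```python
import math

class SmoothedDriveEffect:
    """Toy stand-in for a real-time model: a tanh overdrive whose gain is
    smoothed by a one-pole filter that keeps its state across buffers."""

    def __init__(self, target_gain=4.0, smoothing=0.999):
        self.gain = 1.0             # current (smoothed) gain
        self.target_gain = target_gain
        self.smoothing = smoothing  # closer to 1.0 = slower glide

    def process(self, buffer):
        out = []
        for x in buffer:
            # glide the gain toward its target, one sample at a time
            self.gain = (self.smoothing * self.gain
                         + (1 - self.smoothing) * self.target_gain)
            out.append(math.tanh(self.gain * x))
        return out

# The host calls process() once per buffer; state carries over seamlessly.
effect = SmoothedDriveEffect()
signal = [math.sin(2 * math.pi * 440 * n / 48000) for n in range(4096)]
blocks = [signal[i:i + 2048] for i in range(0, len(signal), 2048)]
processed = []
for block in blocks:
    processed.extend(effect.process(block))
```

A real model in the plugin faces the same contract, with the added requirement that each buffer be processed faster than its playback duration.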

Some of the models available at the time of writing are listed below:

  • RAVE.amen – Transform input sounds into Amen Break
  • RAVE.evoice and RAVE.jvoice – Transform input sounds into voice
  • RAVE.kora – Transform input sounds into kora (African harp)
  • DDSP.violin – Transform input sounds into violin
  • DDSP.sax – Transform input sounds into saxophone
  • DDSP.shakuhachi – Transform input sounds into shakuhachi (Japanese bamboo flute)
  • conv1d-overdrive.random – Overdrive with deep learning
  • temporalconv.reverb – Reverb using deep learning

Neutone model repository view
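Two of the listed models, conv1d-overdrive.random and temporalconv.reverb, build effects from learned convolutions. The actual architectures are more involved, but the core idea can be sketched in plain Python with a causal 1-D convolution, where a hand-written kernel stands in for learned weights:

```python
def causal_conv1d(signal, kernel):
    """Causal 1-D convolution: each output sample depends only on the
    current and past input samples, a prerequisite for real-time use."""
    out = []
    for n in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            if n - k >= 0:
                acc += w * signal[n - k]
        out.append(acc)
    return out

# A hand-written kernel standing in for learned weights: pass the signal
# through unchanged and add a half-volume copy delayed by 3 samples.
kernel = [1.0, 0.0, 0.0, 0.5]
echoed = causal_conv1d([1.0, 0.0, 0.0, 0.0, 0.0, 0.0], kernel)
# → [1.0, 0.0, 0.0, 0.5, 0.0, 0.0]
```

In a trained model the kernel values come from gradient descent rather than by hand, and stacks of such convolutions (with nonlinearities) can approximate effects like overdrive and reverb.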

Unlike conventional audio plugins, which are generally limited to pre-programmed functions, this mechanism produces completely different outputs depending on the model loaded. It also allows Neutone to serve as a platform where AI researchers and developers can make new AI models available to creators.


Many of the models currently available in Neutone use techniques that transform the timbre of the input sound. AI timbre transfer technology has developed significantly over the past few years; in particular, the RAVE and DDSP algorithms have made it possible to generate high-quality output at a 48 kHz sampling rate in real time on a general-purpose CPU.

Training and inference process of a timbre transfer model

These models are trained with several hours of input sounds and are evaluated on their ability to reproduce the input timbre. Each model has its preferred input sounds at inference time, but a major advantage is that the same model can be used regardless of the type of sound fed into the conversion.

The following is an example of voice input to a drum-sound model trained using RAVE, recorded as a demonstration during prototype development of the Neutone plugin. You can hear that the output drum sound reflects changes in verbal expressions such as “dokodokodokodoko” and “tsk tsk tsk.” While technologies that convert singing voices into MIDI to control musical instruments have existed for some time, they fail to capture nuances in performance. Therefore, this new breed of timbre transfer technology has the potential to create novel and unique expressions.

Qosmo develops RAVE models for Neutone under license from IRCAM, the developer of the RAVE technology. We also develop and license timbre transfer models for commercial purposes. You can find more details here.


The Neutone project aims to bring researchers and creators closer together and encourage the application of technology to music creation, but we are only at the starting line. To allow AI researchers to publish their own models on the Neutone platform, we have created an SDK for developers on GitHub.

Neutone Developer SDK:

Neutone Developer SDK on GitHub

We sincerely hope to engage researchers, engineers, and music creators with the Neutone project in order to further explore the creative potential of AI. If you are interested in contributing, please come and join us in the Neutone Discord channel!

Neutone Discord channel:




  • Project Direction

    Akira Shibata

  • Concept / Tech Lead, Machine Learning

    Nao Tokui

  • Tech Lead, Plugin Front-end

    Robin Jungers

  • Back-end

    Bogdan Teleaga

  • Tech Lead, Plugin and Architecture

    Andrew Fyfe

  • Machine Learning

    Christopher Mitcheltree

  • Machine Learning

    Naotake Masuda

  • The RAVE algorithm was developed by Antoine Caillon and Philippe Esling of the STMS Laboratory (IRCAM, CNRS, Sorbonne University, Ministry of Culture and Communication) and is licensed by IRCAM

Get in touch with us here!