1 of 40

Parameter Inference of Music Synthesizers using Deep Learning

Hao Hao Tan

helloharry66@gmail.com

2 of 40

My first taste of a synth plugin…

3 of 40

Let’s see how parameter inference works…

4 of 40

Synthesis Methods - A Brief Look

5 of 40

Types of Synthesizers

Additive synthesis

6 of 40

Types of Synthesizers

Subtractive synthesis

7 of 40

Types of Synthesizers

Wavetable synthesis

8 of 40

Types of Synthesizers

FM synthesis

9 of 40

Types of Synthesizers

Many other synthesis methods!

  • Granular synthesis, waveshaping, physical modelling, etc.
  • Combinations of synthesis methods (e.g. additive + noise)
  • This talk will focus on additive, subtractive, FM, and wavetable synthesis

10 of 40

Parameter Inference

11 of 40

Why do we need parameter inference?

  • Amateurs
    • Sound design made easier with an inferred preset as a starting point
    • Educational tool for learning sound design
  • Professionals
    • Save time!
    • Discover new sounds by searching the “audio-parameter space”

12 of 40

Which parameters?

  • Oscillator part
    • ADSR envelope
    • Oscillator level, wave type (sine, square, saw)
    • FM - Modulation index, FM configuration
    • Wavetable - wavetable, phase offset
    • Filter - cutoff frequency, resonance
  • FX part
    • Mix level of each FX
    • (Multi-band) compressor - compression ratio, threshold, attack / release
    • Reverb - reverb size, frequency cutoff
    • EQ - cutoff, resonance, gain level

13 of 40

Past Works (non-DL) on Parameter Inference

  • Genetic algorithms
    • Horner et al., 1993. Genetic Algorithms and Their Application to FM Matching Synthesis [link]
    • Mitchell et al., 2005. Frequency Modulation Tone Matching Using a Fuzzy Clustering Evolution Strategy [link]
    • Chinen and Osaka, 2007. Genesynth: Noise Band-Based Genetic Algorithm Analysis/Synthesis Framework [link]
    • McDermott, 2008. Evolutionary Computation Applied to Sound Synthesis [link]
    • Tatar et al., 2016. Automatic Synthesizer Preset Generation with PresetGen [link]
  • Regression models
    • Itoyama et al., 2014. Parameter Estimation of Virtual Musical Instrument Synthesizers [link]
  • Particle swarm optimization
    • Heise et al., 2009. Automatic Cloning of Recorded Sounds by Software Synthesizers [link]
  • Hill climbing
    • Yee-King et al., 2016. Automatic Programming of VST Sound Synthesizers using Deep Networks and Other Techniques [link]
  • Gaussian processes
    • Huang et al., 2014. Active Learning of Intuitive Control Knobs for Synthesizers Using Gaussian Processes [link]

14 of 40

Deep Learning

15 of 40

Basic Concepts of Deep Learning

  • Deep learning: machine learning with (many layers of) neural networks
  • Neural networks: function approximators with learnable parameters
  • Types of learning:
    • Supervised learning - input data x paired with labels y
      • Loss function is commonly a distance metric between f(x) and the actual y
    • Unsupervised / self-supervised learning - input data x only, without labels y
      • Commonly, the objective involves the input itself, e.g. reconstruction: f(x) ≈ x
    • Semi-supervised / weakly-supervised learning
      • Combination of both
      • Commonly, the objective is a weighted sum of both modes

16 of 40

Gradient Descent & Back-Propagation

*Differentiable: gradients can be calculated and back-propagated to the learnable parameters
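
As a minimal sketch of the idea (assuming PyTorch; the parameter, target, and learning rate here are illustrative):

```python
import torch

# A learnable parameter (e.g. one synth parameter estimate) and a target value
theta = torch.tensor([0.2], requires_grad=True)
target = torch.tensor([0.8])

optimizer = torch.optim.SGD([theta], lr=0.1)
for _ in range(100):
    loss = ((theta - target) ** 2).mean()  # differentiable loss
    optimizer.zero_grad()
    loss.backward()    # back-propagation: compute d(loss)/d(theta)
    optimizer.step()   # gradient descent: update theta using its gradient
```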

17 of 40

Neural Network Blocks

Feedforward Neural Network (FFN / MLP)

Convolution Neural Network (CNN)

Recurrent Neural Network (RNN)

Temporal Convolution Network (TCN)
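
As a rough sketch of what each block looks like in code (assuming PyTorch; all layer sizes are arbitrary):

```python
import torch.nn as nn

ffn = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 16))  # FFN / MLP
cnn = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)         # CNN layer, e.g. over spectrograms
rnn = nn.GRU(input_size=128, hidden_size=64, batch_first=True)         # RNN layer
tcn = nn.Conv1d(64, 64, kernel_size=3, dilation=4)                     # dilated convolution, the TCN building block
```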

18 of 40

Why Deep Learning for Parameter Inference?

  • Neural networks are good at learning highly complex, nonlinear mapping functions
  • In the benchmark by Yee-King et al. (2016):
    • Performance - the deep learning method outperforms the other methods
    • Inference speed - deep learning method close to real-time; search-based method ~40 ms

Yee-King et al., 2016. Automatic Programming of VST Sound Synthesizers using Deep Networks and Other Techniques [link]

19 of 40

Supervised Learning

20 of 40

Supervised Learning Formulation
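
A hedged sketch of the formulation (the notation, g for the synthesizer and f_φ for the estimator network, is assumed): render audio x = g(θ) from known parameters θ, then train f_φ to recover them by minimizing a parameter loss:

```latex
\hat{\theta} = f_{\phi}(x), \qquad x = g(\theta), \qquad
\mathcal{L}_{\text{param}}(\phi) = \lVert f_{\phi}(g(\theta)) - \theta \rVert_2^2
```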

21 of 40

How to build your dataset?

  • Programmatically generated (see the sketch below)
    • Write a programmatic synthesizer yourself!
    • Automate preset setting + rendering on the desired VST
  • Gather online
    • Ready-made preset banks (e.g. Splice)
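
A minimal sketch of the programmatic route, using a toy NumPy synth (synth_render, its parameters, and the ranges are all illustrative):

```python
import numpy as np

SR = 16000

def synth_render(attack, decay, cutoff_ratio, dur=1.0):
    """Toy synth: 220 Hz saw oscillator, attack/decay envelope, brick-wall low-pass."""
    t = np.arange(int(SR * dur)) / SR
    saw = 2.0 * ((220.0 * t) % 1.0) - 1.0                  # naive saw oscillator
    env = np.minimum(t / max(attack, 1e-4), 1.0) * np.exp(-np.maximum(t - attack, 0) / max(decay, 1e-4))
    spec = np.fft.rfft(saw * env)
    spec[int(len(spec) * cutoff_ratio):] = 0.0             # crude low-pass filter
    return np.fft.irfft(spec, n=len(t))

# Sample random parameter vectors and render audio -> (audio, params) training pairs
rng = np.random.default_rng(0)
dataset = []
for _ in range(1000):
    params = rng.uniform([0.01, 0.05, 0.1], [0.5, 1.0, 1.0])  # attack, decay, cutoff
    dataset.append((synth_render(*params), params))
```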

22 of 40

Example: Syntheon Baseline Model on Vital

  • Parameters: wavetable + ADSR values
  • Wavetable inferred directly from audio
  • Dataset: 100k+ audio samples generated from Eurorack module wavetables, with random attack, decay and sustain values

23 of 40

Example: InverSynth on a subtractive + FM synth

  • Parameters: ADSR envelopes, FM oscillator function parameters, LP filter cutoff frequency & resonance
  • Dataset: generated via FM synthesis

Barkan et al., 2018. InverSynth: Deep Estimation of Synthesizer Parameter Configurations from Audio Signals - https://arxiv.org/pdf/1812.06349.pdf
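
The common recipe in this family is a CNN that maps a spectrogram to a parameter vector; a hedged sketch of that recipe (this toy architecture is illustrative, not the paper's exact model):

```python
import torch
import torch.nn as nn

class ParamEstimator(nn.Module):
    """Spectrogram in, synth parameter vector (normalized to [0, 1]) out."""
    def __init__(self, n_params: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # pool to a fixed-size embedding
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, n_params), nn.Sigmoid())

    def forward(self, spec):                  # spec: (batch, 1, freq, time)
        return self.head(self.conv(spec))

model = ParamEstimator(n_params=16)
pred = model(torch.randn(4, 1, 128, 100))    # -> (4, 16) parameter estimates
```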

24 of 40

[Figure slide from Barkan et al., 2018. InverSynth: Deep Estimation of Synthesizer Parameter Configurations from Audio Signals - https://arxiv.org/pdf/1812.06349.pdf]

25 of 40

Example: SerumRNN on Serum

  • Parameters: Effect chain and effect parameters
  • Dataset: audio generated from Serum presets, with up to 5 effects added
  • The model has 2 parts:
    • Effect parameter model (estimates parameter values)
    • Effect selection model (estimates the effect sequence)

Mitcheltree et al., 2021. SerumRNN: Step by Step Audio VST Effect Programming - https://arxiv.org/pdf/2104.03876.pdf

26 of 40

[Figure slide from Mitcheltree et al., 2021. SerumRNN: Step by Step Audio VST Effect Programming - https://arxiv.org/pdf/2104.03876.pdf]

27 of 40

Other Works

  • Yee-King et al., 2016. Automatic Programming of VST Sound Synthesizers using Deep Networks and Other Techniques [link]
  • Esling et al., 2019. Universal Audio Synthesizer Control With Normalizing Flows [link]
  • Vaillant et al., 2021. Improving Synthesizer Programming From Variational Autoencoders Latent Space [link]
  • Chen et al., 2022. Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation [link]

28 of 40

Semi / Self-supervised Learning

29 of 40

Formulation
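
A hedged sketch, mirroring the supervised formulation earlier (notation assumed): the predicted parameters are re-rendered through a differentiable synthesizer g and compared against the input audio, so no parameter labels are needed:

```latex
\hat{\theta} = f_{\phi}(x), \qquad
\mathcal{L}_{\text{recon}}(\phi) = d\big(g(f_{\phi}(x)),\, x\big)
```

where d is an audio distance, commonly a multi-scale spectral loss.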

30 of 40

DDSP (Differentiable Digital Signal Processing)

  • Differentiable DSP components (synthesizers / audio effects)
  • Gradients can back-propagate through the components
  • Enables end-to-end learning
  • Downside:
    • Every component you need has to be rewritten to be differentiable

Engel et al., 2020. DDSP: Differentiable Digital Signal Processing - https://arxiv.org/pdf/2001.04643.pdf

31 of 40

Example: A differentiable wavetable oscillator
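
A minimal sketch of such an oscillator, assuming PyTorch (the class, table size, and sample rate are illustrative, not the implementation shown here): the wavetable is read with linear interpolation, so gradients flow back into the table values themselves.

```python
import math
import torch
import torch.nn as nn

class DiffWavetableOsc(nn.Module):
    """Wavetable read with linear interpolation: gradients reach both the
    learnable table (initialized to one cycle of a sine) and the phase."""
    def __init__(self, table_size: int = 512):
        super().__init__()
        init = torch.sin(2 * math.pi * torch.arange(table_size) / table_size)
        self.table = nn.Parameter(init)

    def forward(self, freq_hz: torch.Tensor, sr: int = 16000) -> torch.Tensor:
        phase = torch.cumsum(freq_hz / sr, dim=0) % 1.0   # normalized phase in [0, 1)
        pos = phase * len(self.table)
        i0 = pos.long() % len(self.table)                 # lower table index
        i1 = (i0 + 1) % len(self.table)                   # upper table index (wraps)
        frac = pos - pos.floor()
        return (1 - frac) * self.table[i0] + frac * self.table[i1]

osc = DiffWavetableOsc()
audio = osc(torch.full((16000,), 220.0))   # one second of a 220 Hz tone
```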

32 of 40

Example: DDX7 on Dexed

  • Dataset: URMP dataset (violin, flute, trumpet)
  • Parameters: output levels of the 6 oscillators
  • Differentiable FM synthesis, trained by minimizing a multi-scale spectral loss (sketched below)
  • Uses a temporal convolution network (TCN)

Caspe et al., 2022. DDX7: Differentiable FM Synthesis Of Musical Instrument Sounds - https://arxiv.org/pdf/2208.06169.pdf
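
A hedged sketch of a multi-scale spectral loss (the FFT sizes and the log-magnitude term follow the common DDSP-style formulation; exact scales and weighting vary between papers):

```python
import torch

def multiscale_spectral_loss(x, y, fft_sizes=(2048, 1024, 512, 256)):
    """Sum of L1 distances between magnitude (and log-magnitude) spectrograms
    of x and y at several FFT resolutions."""
    loss = 0.0
    for n_fft in fft_sizes:
        window = torch.hann_window(n_fft, device=x.device)
        X = torch.stft(x, n_fft, hop_length=n_fft // 4, window=window, return_complex=True).abs()
        Y = torch.stft(y, n_fft, hop_length=n_fft // 4, window=window, return_complex=True).abs()
        loss = loss + (X - Y).abs().mean() \
                    + (torch.log(X + 1e-6) - torch.log(Y + 1e-6)).abs().mean()
    return loss
```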

33 of 40

Example: Masuda et al. on additive synthesis

  • Semi-supervised training
  • Dataset:
    • In-domain (supervised): generated using random parameter values on a Harmor-like synthesizer
    • Out-of-domain (self-supervised): NSynth dataset
  • Parameters: oscillator amplitude, filter cutoff frequency, saw/square wave mix, ADSR envelope
  • Differentiable additive synthesizer
    • Pre-train using a parameter loss (in-domain)
    • Fine-tune using a multi-scale spectral loss (out-of-domain)

Masuda et al., 2021. Synthesizer Sound Matching With Differentiable DSP - https://archives.ismir.net/ismir2021/paper/000053.pdf
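
A hedged sketch of how the two stages fit together in a training step (the batch layout and helper names are assumptions; multiscale_spectral_loss is the sketch from the DDX7 slide above):

```python
def train_step(model, synth, batch):
    pred = model(batch["audio"])                   # estimated synth parameters
    if batch["labeled"]:
        # In-domain pre-training: ground-truth parameters are known
        return ((pred - batch["params"]) ** 2).mean()             # parameter loss
    # Out-of-domain fine-tuning: audio only, so re-render through the
    # differentiable synth and compare spectra
    return multiscale_spectral_loss(synth(pred), batch["audio"])  # spectral loss
```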

34 of 40

[Figure slide from Masuda et al., 2021. Synthesizer Sound Matching With Differentiable DSP - https://archives.ismir.net/ismir2021/paper/000053.pdf]

35 of 40

Discussion

36 of 40

Discussion

  • Modulation
    • “Warping” sounds
    • LFO, macros, automation…
  • Polyphonic cases (chords, ambient soundscapes)
    • Most work is done on single-note / monophonic cases
    • Polyphony requires relying on onset detection / music transcription
  • Alternative self-supervised methods
    • Implementing your own DDSP modules is troublesome

37 of 40

Summary

  • Parameter inference of music synthesizers is useful for simplifying the parameter-tuning process when crafting a desired sound profile
  • In a deep learning context, we can formulate this problem in 3 different ways:
    • Supervised:
      • minimize a parameter loss
    • Self-supervised:
      • needs a differentiable synthesizer / DSP modules
      • minimize an audio reconstruction loss
    • Semi-supervised:
      • combines both
      • uses labeled data as guidance + unlabeled data to improve performance

38 of 40

References

39 of 40

References

  • Deep Learning for Audio
    • Russell McClellan - A Practical Perspective on Deep Learning in Audio Software. ADC’19 [link]
  • Theses
    • Matthew John Yee-King, 2011. Automatic Sound Synthesizer Programming: Techniques and Applications [link]
    • Jordie Shier, 2017. The Synthesizer Programming Problem: Improving the Usability of Sound Synthesizers [link]
  • Code repositories
    • torchsynth, GPU-enabled & differentiable modular synthesis [github]
    • SpiegeLib, automatic synthesizer sound matching [github]
  • Differentiable filters
    • Kuznetsov et al., 2020. Differentiable IIR Filters for Machine Learning Applications [link]
    • Nercessian, 2020. Neural Parametric Equalizer Matching Using Differentiable Biquads [link]

40 of 40

Thank you!

gudgud96.github.io
@GoodGood014

@gudgud96
helloharry66@gmail.com

Hao Hao Tan