1 of 40

Parameter Inference of Music Synthesizers using Deep Learning

Hao Hao Tan

helloharry66@gmail.com

2 of 40

My first taste of a synth plugin…

3 of 40

Let’s see how parameter inference works…

4 of 40

Synthesis Methods - A Brief Look

5 of 40

Types of Synthesizers

Additive synthesis

6 of 40

Types of Synthesizers

Subtractive synthesis

7 of 40

Types of Synthesizers

Wavetable synthesis

8 of 40

Types of Synthesizers

FM synthesis

9 of 40

Types of Synthesizers

Many other synthesis methods!

  • Granular synthesis, waveshaping, physical modelling, etc.
  • Combinations of synthesis methods (e.g. additive + noise)
  • This talk will focus on additive, subtractive, FM, and wavetable synthesis

10 of 40

Parameter Inference

11 of 40

Why do we need parameter inference?

  • Amateurs
    • Sound design made easier with an inferred preset as a starting point
    • Educational tool for learning sound design
  • Professionals
    • Save time!
    • Discover new sounds by searching the “audio-parameter space”

12 of 40

Which parameters?

  • Oscillator part
    • ADSR envelope
    • Oscillator level, wave type (sine, square, saw)
    • FM - Modulation index, FM configuration
    • Wavetable - wavetable, phase offset
    • Filter - cutoff frequency, resonance
  • FX part
    • Mix level of each FX
    • (Multi-band) compressor - compression ratio, threshold, attack / release
    • Reverb - reverb size, frequency cutoff
    • EQ - cutoff, resonance, gain level

13 of 40

Past Works (non-DL) on Parameter Inference

  • Genetic algorithms
    • Horner et al., 1993. Genetic Algorithms and Their Application to FM Matching Synthesis [link]
    • Mitchell et al., 2005. Frequency Modulation Tone Matching Using a Fuzzy Clustering Evolution Strategy [link]
    • Chinen and Osaka, 2007. Genesynth: Noise Band-Based Genetic Algorithm Analysis/Synthesis Framework [link]
    • McDermott, 2008. Evolutionary Computation Applied to Sound Synthesis [link]
    • Tatar et al., 2016. Automatic Synthesizer Preset Generation with PresetGen [link]
  • Regression models
    • Itoyama et al., 2014. Parameter Estimation of Virtual Musical Instrument Synthesizers [link]
  • Particle swarm optimization
    • Heise et al., 2009. Automatic Cloning of Recorded Sounds by Software Synthesizers [link]
  • Hill climbing
    • Yee-King et al., 2016. Automatic Programming of VST Sound Synthesizers using Deep Networks and Other Techniques [link]
  • Gaussian processes
    • Huang et al., 2014. Active Learning of Intuitive Control Knobs for Synthesizers Using Gaussian Processes [link]

14 of 40

Deep Learning

15 of 40

Basic Concepts of Deep Learning

  • Deep learning: machine learning with (many layers of) neural networks
  • Neural networks: function approximators with learnable parameters
  • Types of learning:
    • Supervised learning - input data x paired with labels y
      • Loss function is commonly a distance metric between f(x) and the actual y
    • Unsupervised / self-supervised learning - input data x only, without labels y
      • Commonly, the objective involves the input itself, e.g. reconstruction: f(x) ≈ x
    • Semi-supervised / weakly-supervised learning
      • Combination of both
      • Commonly, the objective is a weighted sum of both modes

16 of 40

Gradient Descent & Back-Propagation

*Differentiable: gradients can be calculated and back-propagated to the learnable parameters
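
As a minimal sketch of the idea (assuming PyTorch; the parameter, target, and learning rate here are illustrative):

```python
import torch

# A learnable parameter (e.g. one synth parameter estimate) and a target value
theta = torch.tensor([0.2], requires_grad=True)
target = torch.tensor([0.8])

optimizer = torch.optim.SGD([theta], lr=0.1)
for _ in range(100):
    loss = ((theta - target) ** 2).mean()  # differentiable loss
    optimizer.zero_grad()
    loss.backward()    # back-propagation: compute d(loss)/d(theta)
    optimizer.step()   # gradient descent: update theta using its gradient
```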

17 of 40

Neural Network Blocks

Feedforward Neural Network (FFN / MLP)

Convolution Neural Network (CNN)

Recurrent Neural Network (RNN)

Temporal Convolution Network (TCN)
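
As a rough sketch of what each block looks like in code (assuming PyTorch; all layer sizes are arbitrary):

```python
import torch.nn as nn

ffn = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 16))  # FFN / MLP
cnn = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3)         # CNN layer, e.g. over spectrograms
rnn = nn.GRU(input_size=128, hidden_size=64, batch_first=True)         # RNN layer
tcn = nn.Conv1d(64, 64, kernel_size=3, dilation=4)                     # dilated convolution, the TCN building block
```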

18 of 40

Why Deep Learning for Parameter Inference?

  • Neural networks are good at learning highly complex, nonlinear mapping functions
  • In the benchmark by Yee-King et al. (2016):
    • Performance - the deep learning method outperforms the other methods
    • Inference speed - deep learning method close to real-time; search-based method ~40 ms

Yee-King et al., 2016. Automatic Programming of VST Sound Synthesizers using Deep Networks and Other Techniques [link]

19 of 40

Supervised Learning

20 of 40

Supervised Learning Formulation
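
A hedged sketch of the formulation (the notation, g for the synthesizer and f_φ for the estimator network, is assumed): render audio x = g(θ) from known parameters θ, then train f_φ to recover them by minimizing a parameter loss:

```latex
\hat{\theta} = f_{\phi}(x), \qquad x = g(\theta), \qquad
\mathcal{L}_{\text{param}}(\phi) = \lVert f_{\phi}(g(\theta)) - \theta \rVert_2^2
```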

21 of 40

How to build your dataset?

  • Programmatically generated (see the sketch below)
    • Write a programmatic synthesizer yourself!
    • Automate preset setting + rendering on the desired VST
  • Gather online
    • Ready-made preset banks (e.g. Splice)
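
A minimal sketch of the programmatic route, using a toy NumPy synth (synth_render, its parameters, and the ranges are all illustrative):

```python
import numpy as np

SR = 16000

def synth_render(attack, decay, cutoff_ratio, dur=1.0):
    """Toy synth: 220 Hz saw oscillator, attack/decay envelope, brick-wall low-pass."""
    t = np.arange(int(SR * dur)) / SR
    saw = 2.0 * ((220.0 * t) % 1.0) - 1.0                  # naive saw oscillator
    env = np.minimum(t / max(attack, 1e-4), 1.0) * np.exp(-np.maximum(t - attack, 0) / max(decay, 1e-4))
    spec = np.fft.rfft(saw * env)
    spec[int(len(spec) * cutoff_ratio):] = 0.0             # crude low-pass filter
    return np.fft.irfft(spec, n=len(t))

# Sample random parameter vectors and render audio -> (audio, params) training pairs
rng = np.random.default_rng(0)
dataset = []
for _ in range(1000):
    params = rng.uniform([0.01, 0.05, 0.1], [0.5, 1.0, 1.0])  # attack, decay, cutoff
    dataset.append((synth_render(*params), params))
```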

22 of 40

Example: Syntheon Baseline Model on Vital

  • Parameters: wavetable + ADSR values
  • Wavetable inferred directly from audio
  • Dataset: 100k+ audio samples generated from Eurorack module wavetables, with random attack, decay and sustain values

23 of 40

Example: InverSynth on a subtractive + FM synth

  • Parameters: ADSR envelopes, FM oscillator function parameters, LP filter cutoff frequency & resonance
  • Dataset: generated via FM synthesis

Barkan et al., 2018. InverSynth: Deep Estimation of Synthesizer Parameter Configurations from Audio Signals - https://arxiv.org/pdf/1812.06349.pdf
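
The common recipe in this family is a CNN that maps a spectrogram to a parameter vector; a hedged sketch of that recipe (this toy architecture is illustrative, not the paper's exact model):

```python
import torch
import torch.nn as nn

class ParamEstimator(nn.Module):
    """Spectrogram in, synth parameter vector (normalized to [0, 1]) out."""
    def __init__(self, n_params: int):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),          # pool to a fixed-size embedding
        )
        self.head = nn.Sequential(nn.Flatten(), nn.Linear(64, n_params), nn.Sigmoid())

    def forward(self, spec):                  # spec: (batch, 1, freq, time)
        return self.head(self.conv(spec))

model = ParamEstimator(n_params=16)
pred = model(torch.randn(4, 1, 128, 100))    # -> (4, 16) parameter estimates
```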

24 of 40

[Figure slide from Barkan et al., 2018. InverSynth: Deep Estimation of Synthesizer Parameter Configurations from Audio Signals - https://arxiv.org/pdf/1812.06349.pdf]

25 of 40

Example: SerumRNN on Serum

  • Parameters: Effect chain and effect parameters
  • Dataset: audio generated from Serum presets, with up to 5 effects added
  • The model has 2 parts:
    • Effect parameter model (estimates parameter values)
    • Effect selection model (estimates the effect sequence)

Mitcheltree et al., 2021. SerumRNN: Step by Step Audio VST Effect Programming - https://arxiv.org/pdf/2104.03876.pdf

26 of 40

[Figure slide from Mitcheltree et al., 2021. SerumRNN: Step by Step Audio VST Effect Programming - https://arxiv.org/pdf/2104.03876.pdf]

27 of 40

Other Works

  • Yee-King et al., 2016. Automatic Programming of VST Sound Synthesizers using Deep Networks and Other Techniques [link]
  • Esling et al., 2019. Universal Audio Synthesizer Control With Normalizing Flows [link]
  • Vaillant et al., 2021. Improving Synthesizer Programming From Variational Autoencoders Latent Space [link]
  • Chen et al., 2022. Sound2Synth: Interpreting Sound via FM Synthesizer Parameters Estimation [link]

28 of 40

Semi / Self-supervised Learning

29 of 40

Formulation
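
A hedged sketch, mirroring the supervised formulation earlier (notation assumed): the predicted parameters are re-rendered through a differentiable synthesizer g and compared against the input audio, so no parameter labels are needed:

```latex
\hat{\theta} = f_{\phi}(x), \qquad
\mathcal{L}_{\text{recon}}(\phi) = d\big(g(f_{\phi}(x)),\, x\big)
```

where d is an audio distance, commonly a multi-scale spectral loss.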

30 of 40

DDSP (Differentiable Digital Signal Processing)

  • Differentiable DSP components (synthesizers / audio effects)
  • Gradients can back-propagate through the components
  • Enables end-to-end learning
  • Downside:
    • Every component you need has to be rewritten to be differentiable

Engel et al., 2020. DDSP: Differentiable Digital Signal Processing - https://arxiv.org/pdf/2001.04643.pdf

31 of 40

Example: A differentiable wavetable oscillator
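
A minimal sketch of such an oscillator, assuming PyTorch (the class, table size, and sample rate are illustrative, not the implementation shown here): the wavetable is read with linear interpolation, so gradients flow back into the table values themselves.

```python
import math
import torch
import torch.nn as nn

class DiffWavetableOsc(nn.Module):
    """Wavetable read with linear interpolation: gradients reach both the
    learnable table (initialized to one cycle of a sine) and the phase."""
    def __init__(self, table_size: int = 512):
        super().__init__()
        init = torch.sin(2 * math.pi * torch.arange(table_size) / table_size)
        self.table = nn.Parameter(init)

    def forward(self, freq_hz: torch.Tensor, sr: int = 16000) -> torch.Tensor:
        phase = torch.cumsum(freq_hz / sr, dim=0) % 1.0   # normalized phase in [0, 1)
        pos = phase * len(self.table)
        i0 = pos.long() % len(self.table)                 # lower table index
        i1 = (i0 + 1) % len(self.table)                   # upper table index (wraps)
        frac = pos - pos.floor()
        return (1 - frac) * self.table[i0] + frac * self.table[i1]

osc = DiffWavetableOsc()
audio = osc(torch.full((16000,), 220.0))   # one second of a 220 Hz tone
```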

32 of 40

Example: DDX7 on Dexed

  • Dataset: URMP dataset (violin, flute, trumpet)
  • Parameters: output levels of the 6 oscillators
  • Differentiable FM synthesis, trained by minimizing a multi-scale spectral loss (sketched below)
  • Uses a temporal convolution network (TCN)

Caspe et al., 2022. DDX7: Differentiable FM Synthesis Of Musical Instrument Sounds - https://arxiv.org/pdf/2208.06169.pdf
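
A hedged sketch of a multi-scale spectral loss (the FFT sizes and the log-magnitude term follow the common DDSP-style formulation; exact scales and weighting vary between papers):

```python
import torch

def multiscale_spectral_loss(x, y, fft_sizes=(2048, 1024, 512, 256)):
    """Sum of L1 distances between magnitude (and log-magnitude) spectrograms
    of x and y at several FFT resolutions."""
    loss = 0.0
    for n_fft in fft_sizes:
        window = torch.hann_window(n_fft, device=x.device)
        X = torch.stft(x, n_fft, hop_length=n_fft // 4, window=window, return_complex=True).abs()
        Y = torch.stft(y, n_fft, hop_length=n_fft // 4, window=window, return_complex=True).abs()
        loss = loss + (X - Y).abs().mean() \
                    + (torch.log(X + 1e-6) - torch.log(Y + 1e-6)).abs().mean()
    return loss
```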

33 of 40

Example: Masuda et al. on additive synthesis

  • Semi-supervised training
  • Dataset:
    • In-domain (supervised): generated using random parameter values on a Harmor-like synthesizer
    • Out-of-domain (self-supervised): NSynth dataset
  • Parameters: oscillator amplitude, filter cutoff frequency, saw/square wave mix, ADSR envelope
  • Differentiable additive synthesizer
    • Pre-train using a parameter loss (in-domain)
    • Fine-tune using a multi-scale spectral loss (out-of-domain)

Masuda et al., 2021. Synthesizer Sound Matching With Differentiable DSP - https://archives.ismir.net/ismir2021/paper/000053.pdf
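
A hedged sketch of how the two stages fit together in a training step (the batch layout and helper names are assumptions; multiscale_spectral_loss is the sketch from the DDX7 slide above):

```python
def train_step(model, synth, batch):
    pred = model(batch["audio"])                   # estimated synth parameters
    if batch["labeled"]:
        # In-domain pre-training: ground-truth parameters are known
        return ((pred - batch["params"]) ** 2).mean()             # parameter loss
    # Out-of-domain fine-tuning: audio only, so re-render through the
    # differentiable synth and compare spectra
    return multiscale_spectral_loss(synth(pred), batch["audio"])  # spectral loss
```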

34 of 40

[Figure slide from Masuda et al., 2021. Synthesizer Sound Matching With Differentiable DSP - https://archives.ismir.net/ismir2021/paper/000053.pdf]

35 of 40

Discussion

36 of 40

Discussion

  • Modulation
    • “Warping” sounds
    • LFO, macros, automation…
  • Polyphonic cases (chords, ambient soundscapes)
    • Most work is done on single-note / monophonic cases
    • Polyphony requires relying on onset detection / music transcription
  • Alternative self-supervised methods
    • Implementing your own DDSP modules is troublesome

37 of 40

Summary

  • Parameter inference of music synthesizers is useful for simplifying the parameter-tuning process when crafting a desired sound profile
  • In a deep learning context, we can formulate this problem in 3 different ways:
    • Supervised:
      • minimize a parameter loss
    • Self-supervised:
      • needs a differentiable synthesizer / DSP modules
      • minimize an audio reconstruction loss
    • Semi-supervised:
      • combines both
      • uses labeled data as guidance + unlabeled data to improve performance

38 of 40

References

39 of 40

References

  • Deep Learning for Audio
    • Russell McClellan - A Practical Perspective on Deep Learning in Audio Software. ADC’19 [link]
  • Theses
    • Matthew John Yee-King, 2011. Automatic Sound Synthesizer Programming: Techniques and Applications [link]
    • Jordie Shier, 2017. The Synthesizer Programming Problem: Improving the Usability of Sound Synthesizers [link]
  • Code repositories
    • torchsynth, GPU-enabled & differentiable modular synthesis [github]
    • SpiegeLib, automatic synthesizer sound matching [github]
  • Differentiable filters
    • Kuznetsov et al., 2020. Differentiable IIR Filters for Machine Learning Applications [link]
    • Nercessian, 2020. Neural Parametric Equalizer Matching Using Differentiable Biquads [link]

40 of 40

Thank you!

gudgud96.github.io
@GoodGood014

@gudgud96
helloharry66@gmail.com

Hao Hao Tan