Diffusion-based generative models for audio
GLADIA Research Group
Speakers: Michele Mancusi, Giorgio Mariani
Generative Models
Generative Models: Diffusion models
Understanding Diffusion Models: Langevin Dynamics
Brownian motion
Pollen grains in water
Simulation of particles moving in water
https://water.lsbu.ac.uk/water/Brownian.html
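The particle simulation mentioned above can be approximated by a random walk with Gaussian increments. The following is a minimal sketch, not the simulation from the linked page: the diffusion coefficient, time step, and particle count are hypothetical values chosen for illustration. It checks the textbook property that the mean squared displacement in 2D grows as 4 D t.

```python
import math
import random

def simulate_brownian(n_particles=500, n_steps=200, dt=0.01, diffusion=1.0, seed=0):
    """Simulate independent 2D Brownian particles; return their final (x, y) positions."""
    rng = random.Random(seed)
    step_std = math.sqrt(2.0 * diffusion * dt)  # per-axis std of one increment
    positions = []
    for _ in range(n_particles):
        x = y = 0.0
        for _ in range(n_steps):
            x += rng.gauss(0.0, step_std)
            y += rng.gauss(0.0, step_std)
        positions.append((x, y))
    return positions

positions = simulate_brownian()
# Mean squared displacement from the origin, averaged over particles
msd = sum(x * x + y * y for x, y in positions) / len(positions)
# Theory for 2D free diffusion: MSD = 4 * D * t, with D = 1.0 and t = 200 * 0.01
expected_msd = 4.0 * 1.0 * (200 * 0.01)
print(f"measured MSD = {msd:.2f}, theory = {expected_msd:.2f}")
```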
Langevin Dynamics
For very small particles (a few microns):
Brownian motion
Fokker-Planck equation
Steady State of Langevin Dynamics
If we want to sample from the distribution, we need to set the potential energy to be
Our equation becomes
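One standard way to write this step is sketched below. It assumes overdamped Langevin dynamics in units where k_B T = 1 and the mobility is 1; these conventions, and the symbols U, p, and W_t, are assumptions for illustration and may differ from the normalization used on the slides.

```latex
\begin{align}
dx_t &= -\nabla_x U(x_t)\,dt + \sqrt{2}\,dW_t, \\
p_\infty(x) &\propto \exp\bigl(-U(x)\bigr), \\
U(x) = -\log p(x) \;&\Longrightarrow\; dx_t = \nabla_x \log p(x_t)\,dt + \sqrt{2}\,dW_t .
\end{align}
```

The first line is the dynamics, the second is the steady state of the associated Fokker-Planck equation, and the third shows that choosing the potential as the negative log-density makes the target distribution the steady state.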
Sampling Using Langevin Dynamics
We can use the Euler-Maruyama method
Illustration from https://yang-song.net/blog/2021/score
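The Euler-Maruyama discretization mentioned above can be sketched on a toy problem. This is a minimal illustration, assuming a hypothetical 1D Gaussian target whose score is known in closed form; the step size and chain length are arbitrary choices, and the discretization introduces a small bias in the sampled variance.

```python
import math
import random

# Hypothetical 1D target: a Gaussian whose score is known in closed form.
TARGET_MEAN = 2.0
TARGET_STD = 0.5

def score(x):
    """Score of N(TARGET_MEAN, TARGET_STD^2), i.e. d/dx log p(x)."""
    return -(x - TARGET_MEAN) / TARGET_STD**2

def langevin_sample(n_steps=300, step=0.01, rng=random):
    """Run one Euler-Maruyama chain for dx = score(x) dt + sqrt(2) dW."""
    x = rng.uniform(-5.0, 5.0)  # arbitrary initialization
    for _ in range(n_steps):
        x += step * score(x) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [langevin_sample(rng=rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
print(f"mean = {sample_mean:.3f} (target {TARGET_MEAN}), std = {sample_std:.3f} (target {TARGET_STD})")
```

Only the score of the target is used; the normalizing constant of the density never appears.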
Diffusion Models
Score function
Score function!
Score-based model
We need to train a model to estimate the score function by minimizing this loss (a.k.a. the Fisher divergence)
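For reference, the Fisher divergence between the data distribution and a score model is usually written as below. The symbol s_θ for the model is an assumed notation, and some references include an extra factor of 1/2.

```latex
\mathbb{E}_{p(x)}\left[\, \bigl\lVert \nabla_x \log p(x) - s_\theta(x) \bigr\rVert_2^2 \,\right]
```

The expectation is taken over the data distribution p(x), so the error is weighted by data density.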
The ideal workflow…
Two important issues
1st Problem: Low-density region
How can we bypass the difficulty of accurate score estimation in regions of low data density?
Solution: add noise to the data
How much noise?
Solution: Multiple Levels!
Annealed Langevin dynamics combines a sequence of Langevin chains with gradually decreasing noise scales.
Annealed Langevin Dynamics
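A minimal sketch of annealed Langevin dynamics on a toy problem follows. It assumes a hypothetical two-mode 1D Gaussian mixture whose noise-perturbed score is available in closed form, so no network is trained. The geometric noise schedule and the step-size rule (step proportional to the squared noise level) follow a common convention; all numeric values are illustrative assumptions.

```python
import math
import random

# Hypothetical toy data distribution: two 1D Gaussian modes with unequal weights.
MEANS = (-4.0, 4.0)
LOG_WEIGHTS = (math.log(0.2), math.log(0.8))
DATA_STD = 0.5

def perturbed_score(x, sigma):
    """Exact score of the toy mixture after adding N(0, sigma^2) noise."""
    var = DATA_STD**2 + sigma**2
    logits = [lw - (x - m) ** 2 / (2.0 * var) for lw, m in zip(LOG_WEIGHTS, MEANS)]
    top = max(logits)
    resp = [math.exp(l - top) for l in logits]  # unnormalized mode responsibilities
    total = sum(resp)
    return sum(r / total * (m - x) / var for r, m in zip(resp, MEANS))

def geometric_sigmas(sigma_max=8.0, sigma_min=0.1, n_levels=10):
    """Noise scales decreasing geometrically from sigma_max to sigma_min."""
    ratio = (sigma_min / sigma_max) ** (1.0 / (n_levels - 1))
    return [sigma_max * ratio**i for i in range(n_levels)]

def annealed_langevin(sigmas, steps_per_level=100, eps=0.002, rng=random):
    """Run one Langevin chain per noise level, from the largest to the smallest."""
    x = rng.uniform(-8.0, 8.0)
    for sigma in sigmas:
        alpha = eps * (sigma / sigmas[-1]) ** 2  # larger steps at larger noise
        for _ in range(steps_per_level):
            x += 0.5 * alpha * perturbed_score(x, sigma) + math.sqrt(alpha) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
sigmas = geometric_sigmas()
samples = [annealed_langevin(sigmas, rng=rng) for _ in range(400)]
right_fraction = sum(s > 0 for s in samples) / len(samples)
print(f"fraction of samples in the right-hand mode: {right_fraction:.2f} (mode weight 0.8)")
```

At large noise levels the two modes merge and chains move freely between them, which is what lets the relative mode weights be recovered before the noise is annealed away.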
2nd Problem: Unknown data score
Solution: Denoising Score Matching
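Denoising score matching can be illustrated without a neural network. The sketch below assumes hypothetical 1D standard-normal data and a single noise level, and fits a one-parameter linear score model by least squares. The regression target is the score of the Gaussian noising kernel, so the unknown data score is never needed; for this toy setup the fitted slope can be compared with the exact score of the noised distribution.

```python
import random

SIGMA = 0.5  # hypothetical noise level

rng = random.Random(0)
num = 0.0
den = 0.0
for _ in range(50_000):
    x = rng.gauss(0.0, 1.0)                     # clean sample; its score is never used
    x_noisy = x + SIGMA * rng.gauss(0.0, 1.0)   # perturbed sample
    target = -(x_noisy - x) / SIGMA**2          # score of N(x_noisy; x, SIGMA^2) w.r.t. x_noisy
    num += x_noisy * target
    den += x_noisy * x_noisy

# Least-squares fit of the linear model s(x_noisy) = slope * x_noisy to the targets
slope = num / den
# Noised data is N(0, 1 + SIGMA^2), whose exact score is -x / (1 + SIGMA^2)
expected_slope = -1.0 / (1.0 + SIGMA**2)
print(f"fitted slope = {slope:.3f}, exact = {expected_slope:.3f}")
```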
Noise Conditional Score Networks (NCSN)
The Network
Forward and Backward Diffusion
Forward Diffusion Process
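A forward diffusion process can be sketched in discrete time. The version below assumes the variance-preserving formulation with a linear noise schedule in the style of Ho et al. (2020), which may not be the exact parameterization used on the slides; the sine wave standing in for an audio clip and all function names are hypothetical.

```python
import math
import random

def linear_beta_schedule(n_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances increasing linearly from beta_start to beta_end."""
    step = (beta_end - beta_start) / (n_steps - 1)
    return [beta_start + i * step for i in range(n_steps)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{i <= t} (1 - beta_i): the signal power remaining at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I) in one shot."""
    signal = math.sqrt(alpha_bars[t])
    noise = math.sqrt(1.0 - alpha_bars[t])
    return [signal * v + noise * rng.gauss(0.0, 1.0) for v in x0]

rng = random.Random(0)
alpha_bars = cumulative_alphas(linear_beta_schedule())
# Toy "audio": 256 samples of a sine wave
x0 = [math.sin(2.0 * math.pi * 5.0 * n / 256) for n in range(256)]
x_mid = q_sample(x0, 250, alpha_bars, rng)   # partially noised
x_end = q_sample(x0, 999, alpha_bars, rng)   # close to pure Gaussian noise
end_var = sum(v * v for v in x_end) / len(x_end)
print(f"alpha_bar at final step = {alpha_bars[-1]:.2e}, variance of x_T = {end_var:.2f}")
```

By the final step almost no signal remains, so the noised clip is approximately a standard Gaussian sample.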
Backward Diffusion Process
(Song et al., 2019; Ho et al., 2020; Song et al., 2021)
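The backward diffusion process can be illustrated with a toy sampler. The sketch assumes a hypothetical 1D Gaussian data distribution, so the score of every noised marginal is known exactly and stands in for a trained network. It uses a discrete variance-preserving schedule and an ancestral update in the style of Ho et al. (2020), with the step noise variance set to beta_t; both are assumed choices.

```python
import math
import random

DATA_MEAN, DATA_STD = 3.0, 0.5  # hypothetical 1D Gaussian "data" distribution
N_STEPS = 1000
BETAS = [1e-4 + i * (0.02 - 1e-4) / (N_STEPS - 1) for i in range(N_STEPS)]
ALPHA_BARS = []
running = 1.0
for beta in BETAS:
    running *= 1.0 - beta
    ALPHA_BARS.append(running)

def exact_score(x, t):
    """Score of the noised marginal N(sqrt(ab) * mean, ab * std^2 + 1 - ab) at step t."""
    ab = ALPHA_BARS[t]
    mean = math.sqrt(ab) * DATA_MEAN
    var = ab * DATA_STD**2 + 1.0 - ab
    return -(x - mean) / var

def reverse_sample(rng):
    """Start from Gaussian noise and step backward through all noise levels."""
    x = rng.gauss(0.0, 1.0)
    for t in range(N_STEPS - 1, -1, -1):
        beta = BETAS[t]
        # Denoising drift driven by the score, rescaled to undo the forward shrinkage
        x = (x + beta * exact_score(x, t)) / math.sqrt(1.0 - beta)
        if t > 0:
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)  # no noise on the final step
    return x

rng = random.Random(0)
samples = [reverse_sample(rng) for _ in range(300)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
print(f"mean = {sample_mean:.2f} (data {DATA_MEAN}), std = {sample_std:.2f} (data {DATA_STD})")
```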
What about audio?
Focus on music – Commercial interest
Music production (DAWs)
Mixing
Recommendation systems
Lyrics extraction
Immersive music
Data Representation
Symbolic
Continuous
Multi-Source Diffusion Models
Core idea
Inference procedure: Total Generation
Inference procedure: Partial Generation
Inference procedure: Source Separation
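Source separation can be pictured as sampling sources that are constrained to sum to an observed mixture. The toy sketch below is not the speakers' implementation. It assumes two independent 1D Gaussian sources with known scores (standing in for a learned joint score), enforces the constraint by substituting x2 = mixture - x1, and runs Langevin dynamics on x1 alone. All names and values are hypothetical.

```python
import math
import random

# Hypothetical independent Gaussian sources and an observed mixture y = x1 + x2
M1, S1 = 1.0, 1.0
M2, S2 = -2.0, 0.5
MIXTURE = 0.5

def constrained_score(x1):
    """d/dx1 of log p1(x1) + log p2(MIXTURE - x1)."""
    x2 = MIXTURE - x1
    score_x1 = -(x1 - M1) / S1**2
    score_x2 = -(x2 - M2) / S2**2
    return score_x1 - score_x2  # chain rule: dx2/dx1 = -1

def separate(n_steps=300, step=0.01, rng=random):
    """Langevin sampling of x1 given the mixture; x2 follows from the constraint."""
    x1 = rng.gauss(0.0, 1.0)
    for _ in range(n_steps):
        x1 += step * constrained_score(x1) + math.sqrt(2.0 * step) * rng.gauss(0.0, 1.0)
    return x1, MIXTURE - x1

rng = random.Random(0)
pairs = [separate(rng=rng) for _ in range(500)]
mean_x1 = sum(x1 for x1, _ in pairs) / len(pairs)
# Closed-form posterior mean of x1 given the mixture, available for Gaussians
posterior_mean = M1 + S1**2 / (S1**2 + S2**2) * (MIXTURE - M1 - M2)
print(f"sampled mean of x1 = {mean_x1:.2f}, closed-form posterior mean = {posterior_mean:.2f}")
```

Every sampled pair sums to the mixture by construction, and in this Gaussian case the samples can be checked against the closed-form posterior.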
Quantitative metrics
Future work
Thank You!