Diffusion models
Ruba Haroun
Senior Research Engineer, Google DeepMind
I. Generative modelling
II. Iterative refinement
III. Diffusion models
IV. Guidance
V. Other topics
VI. Examples
I. Generative modelling
Generative modelling: the probabilistic perspective
x ~ p(x)
[figure: a probability density p(x) over data x, annotated with its entropy]
Explicit: autoregression, flows, VAEs, …
Implicit: GANs, …
Conditional generative models
[spectrum from sparsely conditioned to densely conditioned: class labels → bounding boxes → segmentation → grayscale image (colorisation)]
y = “cat”
Control model output using a control signal: p(x|c) vs. p(x)
II. Iterative refinement
Two approaches to iterative refinement
Autoregression: step-by-step
Turn everything into a 1D sequence, generate it one step at a time
Diffusion: iterative denoising
Gradually add noise until all information is destroyed, then learn to invert this procedure step by step
Autoregression: step-by-step
chain rule of probability
p(x) = Πi p(xi|x<i)
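The chain-rule factorisation above can be sketched as a sampling loop: each symbol is drawn conditioned on the prefix generated so far. The model below is a hypothetical stand-in (it merely favours repeating the previous symbol), not anything from the slides; a real autoregressive model would be a trained network such as PixelCNN or WaveNet.

```python
import numpy as np

def next_distribution(prefix, vocab_size=4):
    """Toy stand-in for p(x_i | x_{<i}): uniform, but favours repeating the last symbol."""
    probs = np.full(vocab_size, 1.0 / vocab_size)
    if prefix:
        probs[prefix[-1]] += 1.0  # bias towards the previous symbol
        probs /= probs.sum()
    return probs

def sample_autoregressively(length, rng):
    # Generate one step at a time, conditioning each step on the prefix.
    prefix = []
    for _ in range(length):
        probs = next_distribution(prefix)
        prefix.append(int(rng.choice(len(probs), p=probs)))
    return prefix

rng = np.random.default_rng(0)
seq = sample_autoregressively(8, rng)
```

The same loop structure underlies all the autoregressive models cited above; only the conditional distribution changes.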
Autoregression in pixel space: PixelRNN & PixelCNN
van den Oord et al. ‘Pixel Recurrent Neural Networks’ (2016)
van den Oord et al. ‘Conditional Image Generation with PixelCNN Decoders’ (2016)
Autoregression in amplitude space: WaveNet & SampleRNN
van den Oord et al. ‘WaveNet: a Generative Model for Raw Audio’ (2016)
Mehri et al. ‘SampleRNN: An Unconditional End-to-End Neural Audio Generation Model’ (2016)
III. Diffusion models
Diffusion: iterative denoising
Diffusion: forward process
[diagram: x0 (training data) → +δ → … → xt (noisy data) → +δ → … → x∞ (Gaussian noise)]
xt = x0 + σ(t)·ε
Diffusion: forward process
[diagram: x0 (training data) → ·γ, +δ → … → xt (noisy data) → ·γ, +δ → … → xT (Gaussian noise)]
xt = α(t)·x0 + σ(t)·ε
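The forward process xt = α(t)·x0 + σ(t)·ε can be sketched directly. The particular schedule below, α(t) = cos(½πt) and σ(t) = sin(½πt) for t ∈ [0, 1], is an illustrative variance-preserving choice (an assumption; the slides do not fix a schedule): α(0) = 1 recovers the clean data, and α(1) = 0 leaves pure Gaussian noise.

```python
import numpy as np

def alpha(t):
    return np.cos(0.5 * np.pi * t)

def sigma(t):
    return np.sin(0.5 * np.pi * t)

def noise(x0, t, rng):
    # Forward process: x_t = α(t)·x0 + σ(t)·ε with ε ~ N(0, I).
    eps = rng.standard_normal(x0.shape)
    return alpha(t) * x0 + sigma(t) * eps

rng = np.random.default_rng(0)
x0 = rng.standard_normal((32, 32, 3))  # stand-in "image"
x_half = noise(x0, t=0.5, rng=rng)     # partially noised
x_end = noise(x0, t=1.0, rng=rng)      # α(1) = 0: pure Gaussian noise
```

Since cos² + sin² = 1, this schedule keeps the marginal variance of xt constant, which is what "variance-preserving" means.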
Diffusion: backward process
[diagram: run the forward chain x0 (training data) → … → xt (noisy data) → … → xT (Gaussian noise) in reverse]
Diffusion: backward process
[diagram, one denoising step at time t:
1. predict x̂0 from xt
2. take a small step from xt towards x̂0
3. add some noise ξ to obtain xt−1
repeat until reaching x0]
Diffusion: predict ε instead of x0?
Salimans & Ho ‘Progressive Distillation [..]’ (2022)
Lipman et al. ‘Flow matching [..]’ (2022)
xt = α(t)·x0 + σ(t)·ε
Diffusion training: summary
For each training example x0:
1. sample a time step t and noise ε ~ N(0, I)
2. form xt = α(t)·x0 + σ(t)·ε
3. train the model to predict x0 (or ε) from xt, minimising the prediction error
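One training step could look like this in code. The schedule (α, σ as a variance-preserving cosine pair) and the placeholder denoiser are assumptions for illustration; in practice `model` is a neural network conditioned on both xt and t.

```python
import numpy as np

def alpha(t): return np.cos(0.5 * np.pi * t)
def sigma(t): return np.sin(0.5 * np.pi * t)

def model(xt, t):
    # Placeholder denoiser: a real network would predict x0 from (xt, t).
    return xt

def training_loss(x0, rng):
    t = rng.uniform(0.0, 1.0)                 # sample a time step
    eps = rng.standard_normal(x0.shape)       # sample noise ε ~ N(0, I)
    xt = alpha(t) * x0 + sigma(t) * eps       # forward process
    x0_hat = model(xt, t)                     # predict x0
    return np.mean((x0_hat - x0) ** 2)        # MSE on the x0 prediction

rng = np.random.default_rng(0)
x0 = rng.standard_normal((8, 8))
loss = training_loss(x0, rng)
```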
Diffusion sampling: summary
At each sampling time step t:
1. predict x̂0 from xt
2. take a small step from xt towards x̂0
3. add some noise ξ to obtain xt−1
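The full sampling loop, starting from pure noise xT and repeating the predict/step/re-noise recipe until t = 0, could be sketched as follows. The schedule, the stand-in denoiser, and the noise-mixing coefficient η are all illustrative assumptions; a trained network would replace `predict_x0`.

```python
import numpy as np

def alpha(t): return np.cos(0.5 * np.pi * t)
def sigma(t): return np.sin(0.5 * np.pi * t)

def predict_x0(xt, t):
    return 0.5 * xt  # hypothetical denoiser stand-in

def sample(shape, num_steps, rng, eta=0.5):
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    xt = rng.standard_normal(shape)  # start from pure Gaussian noise x_T
    for t, t_prev in zip(ts[:-1], ts[1:]):
        x0_hat = predict_x0(xt, t)                              # 1. predict x̂0
        eps_hat = (xt - alpha(t) * x0_hat) / max(sigma(t), 1e-8)
        xi = rng.standard_normal(shape)                         # 3. fresh noise ξ
        # 2.+3. step towards x̂0 at the lower noise level, mixing in ξ
        xt = alpha(t_prev) * x0_hat + sigma(t_prev) * (
            np.sqrt(1 - eta**2) * eps_hat + eta * xi)
    return xt

rng = np.random.default_rng(0)
x_generated = sample((4, 4), num_steps=10, rng=rng)
```

With η = 0 the update becomes deterministic (DDIM-style); η > 0 re-injects some noise at each step, as on the slides.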
IV. Guidance
Guidance: a cheat code for diffusion models
https://sander.ai/2022/05/26/guidance.html
https://sander.ai/2023/08/28/geometry.html
Diffusion: classifier guidance
[diagram, one guided step at time t:
1. predict x̂0 from xt
2. calculate ∇x log p(c|xt) using a classifier, e.g. ∇x log p(c=‘bunny’|xt)
3. combine the two directions]
Classifier guidance: the Bayesian perspective
∇x log p(xt|c) = ∇x log p(xt) + ∇x log p(c|xt)
Diffusion: classifier guidance
[diagram, one guided step at time t with guidance scale ɣ:
1. predict x̂0 from xt
2. calculate ∇x log p(c|xt) and scale it by ɣ: ɣ·∇x log p(c=‘bunny’|xt)
3. combine the two directions and take a small step
4. add some noise ξ to obtain xt−1]
Classifier guidance: the Bayesian perspective
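The Bayesian combination above reduces to a one-line sum of score terms. Both gradient functions below are hypothetical stand-ins (a Gaussian score and a constant classifier pull); in practice ∇x log p(c|xt) comes from backpropagating through a classifier trained on noisy inputs.

```python
import numpy as np

def score_unconditional(xt):
    # Stand-in for ∇x log p(xt): the score of a standard Gaussian.
    return -xt

def classifier_grad(xt, c):
    # Stand-in for ∇x log p(c|xt): a constant pull towards class c.
    return np.ones_like(xt)

def guided_score(xt, c, gamma):
    # Classifier guidance: ∇x log p(xt) + ɣ·∇x log p(c|xt).
    return score_unconditional(xt) + gamma * classifier_grad(xt, c)

xt = np.zeros((2, 2))
s = guided_score(xt, c="bunny", gamma=3.0)
```

Setting ɣ = 1 recovers exact Bayes; ɣ > 1 over-weights the conditioning signal, trading diversity for fidelity.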
Diffusion: classifier-free guidance
[diagram, one sampling step at time t:
1. predict x̂0 from xt (unconditional) and x̂0|c (conditional on c)
2. calculate the difference δ between the two predictions
3. amplify the difference: ɣ·δ
4. take a small step from xt in the amplified direction
5. add some noise ξ to obtain xt−1]
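The core of classifier-free guidance is the amplification step: extrapolate from the unconditional prediction past the conditional one. The two predictions below are stand-in arrays; in practice both come from one network, run with and without the conditioning signal c.

```python
import numpy as np

def cfg_combine(x0_uncond, x0_cond, gamma):
    # Classifier-free guidance: amplify the difference δ = x̂0|c − x̂0 by ɣ.
    delta = x0_cond - x0_uncond
    return x0_uncond + gamma * delta  # ɣ = 1 recovers the conditional prediction

x0_uncond = np.zeros((2, 2))
x0_cond = np.ones((2, 2))
x0_guided = cfg_combine(x0_uncond, x0_cond, gamma=2.0)
```

Unlike classifier guidance, no separate classifier is needed; the conditional model itself supplies the guidance direction.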
The power of classifier-free guidance
A stained glass window of a panda eating bamboo
Nichol et al. ‘GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models’ (2021)
https://sander.ai/2022/05/26/guidance.html
The power of classifier-free guidance
A cozy living room with a painting of a corgi on the wall […]
V. Other topics
Latent diffusion
https://sander.ai/2020/09/01/typicality.html
Rombach et al. ‘High-Resolution Image Synthesis with Latent Diffusion Models’ (2021)
[diagram, training stage 1 (autoencoder): input → encoder 𝛁 → latents → decoder 𝛁 → reconstruction, trained with ℒregression + ℒperceptual + ℒadversarial, plus ℒbottleneck on the latents]
[diagram, training stage 2: input → encoder ❄ (frozen) → latents → iterative generator (AR or diffusion) 𝛁, trained with ℒgenerator]
[diagram, sampling: iterative generator (AR or diffusion) → latents → decoder ❄ (frozen) → output]
pixels: w=256, h=256, c=3 → latents: w=32, h=32, c=8
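The shapes above imply a substantial saving: quick arithmetic (using only the dimensions stated on the slide) shows the iterative generator models 24× fewer values in latent space than in pixel space.

```python
# Compression implied by the slide's shapes: 256×256×3 pixels vs 32×32×8 latents.
pixel_elements = 256 * 256 * 3   # 196,608 values per image
latent_elements = 32 * 32 * 8    # 8,192 values per latent
compression = pixel_elements // latent_elements  # 24× fewer values to generate
```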
Kouzelis et al. ‘EQ-VAE: Equivariance Regularized Latent Space for Improved Generative Image Modeling’ (2025)
VI. Examples
Image generation at scale: Imagen 4
https://deepmind.google/models/imagen/
Image generation at scale: Nano Banana
https://deepmind.google/models/gemini-image/flash/
Prompt: Make woman underwater, and remove the couch and wallpaper
Image generation at scale: Nano Banana
https://deepmind.google/models/gemini-image/flash/
Prompt 1: Remove the door mirror.
Thank you!