Dynamical System Modeling and Stability Investigation, DSMSI-2025

May 08-10, 2025, Kyiv, Ukraine

Architecture and Training Principles of Stable Diffusion

Dmitriy Klyushin, Professor,

Doctor of Physical and Mathematical Sciences

Pavlo Lysyi, PhD Student

Architecture of the Stable Diffusion Model

Three main parts of Stable Diffusion

U-Net is a neural network that predicts the noise present at each diffusion step so that it can be removed, gradually restoring the image.
Autoencoders compress images into a compact latent space and decode latents back into images, which speeds up generation and reduces memory usage.
Text Encoder transforms text prompts into embedding vectors that guide image generation; Stable Diffusion uses a CLIP text encoder, and BERT-style encoders are an alternative.
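
To make the memory saving from the latent space concrete, the sketch below computes the shapes that flow between the three parts. The numbers are assumptions based on the commonly cited Stable Diffusion v1 configuration (512 × 512 RGB input, 8× spatial downsampling, 4 latent channels, 77 CLIP tokens of width 768); they are not taken from the slides, and other versions such as SDXL use different sizes. The function names are hypothetical.

```python
from math import prod

# Assumed Stable Diffusion v1 sizes; other versions differ.
IMAGE_SHAPE = (3, 512, 512)   # channels, height, width in pixel space
VAE_DOWNSAMPLE = 8            # spatial reduction factor of the autoencoder
LATENT_CHANNELS = 4           # channels of the latent representation
TEXT_SHAPE = (77, 768)        # prompt tokens x embedding width (CLIP)


def latent_shape(image_shape, factor=VAE_DOWNSAMPLE, channels=LATENT_CHANNELS):
    """Shape of the latent the autoencoder produces for a given image shape."""
    _, height, width = image_shape
    return (channels, height // factor, width // factor)


def compression_ratio(image_shape):
    """How many times fewer values the U-Net processes in latent space."""
    return prod(image_shape) / prod(latent_shape(image_shape))


if __name__ == "__main__":
    print("image values: ", prod(IMAGE_SHAPE))
    print("latent shape: ", latent_shape(IMAGE_SHAPE))
    print("text embedding shape:", TEXT_SHAPE)
    print("compression ratio:", compression_ratio(IMAGE_SHAPE))
```

Under these assumptions the U-Net denoises a 4 × 64 × 64 latent instead of a 3 × 512 × 512 image, which is 48 times fewer values per step.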

U-Net architecture (256 × 256 input)

U-Net

Autoencoders

General structure of an autoencoder

Text encoder

General structure of a text encoder

Diffusion process

General structure of diffusion process
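
The diffusion process can be illustrated with a minimal sketch of DDPM-style forward noising. This is an assumption about the formulation, not the authors' implementation: it uses a linear noise schedule, the closed form x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps, and a plain Python list standing in for a latent. All function names and schedule values are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Noise variances for each step, linearly spaced (an assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t: running product of (1 - beta_s) up to and including step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: sample x_t from x_0 in one shot; returns (x_t, noise)."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    noise = [rng.gauss(0.0, 1.0) for _ in x0]
    x_t = [signal * v + sigma * e for v, e in zip(x0, noise)]
    return x_t, noise


def recover_x0(x_t, predicted_noise, t, alpha_bars):
    """Invert add_noise given a noise estimate, the quantity a U-Net predicts."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    return [(v - sigma * e) / signal for v, e in zip(x_t, predicted_noise)]


if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -1.0, 0.25, 2.0]            # toy stand-in for a latent
    x_t, noise = add_noise(x0, 500, alpha_bars, rng)
    print("noised:", x_t)
    print("recovered with the true noise:", recover_x0(x_t, noise, 500, alpha_bars))
```

In training, the network would be asked to predict `noise` from `x_t`, the step `t`, and the text embedding; here the true noise is passed to `recover_x0` only to show that an accurate prediction recovers the original latent.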

Image generation results

Additional results comparing the output of SDXL with previous versions of Stable Diffusion. For each prompt, three random samples are shown.

Thank you for your attention