Dynamical System Modeling and Stability Investigation, DSMSI-2025

May 08-10, 2025, Kyiv, Ukraine

Architecture and Training Principles of Stable Diffusion

Dmitriy Klyushin, Professor,

Doctor of Physical and Mathematical Sciences

Pavlo Lysyi, PhD Student

Architecture of the Stable Diffusion Model

Three main parts of Stable Diffusion

  • U-Net: a neural network that predicts the noise present at each diffusion step so that it can be removed, gradually restoring the image.
  • Autoencoder: compresses images into a compact latent space, which speeds up generation and reduces memory usage.
  • Text encoder: transforms text prompts into embedding vectors that guide image generation, typically using a pretrained model such as CLIP or BERT.
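
How the three parts hand data to one another can be sketched as follows. This is a toy illustration only, not the actual Stable Diffusion code: the function names (`encode_text`, `predict_noise`, `decode_latent`), the tiny sizes, and the simplified update rule are all assumptions made for the example.

```python
import random

LATENT_SIZE = 4   # toy latent length (hypothetical; real latents are 2-D feature maps)
NUM_STEPS = 10    # toy number of denoising steps (hypothetical)


def encode_text(prompt):
    """Stand-in text encoder: maps a prompt to a fixed-length vector."""
    rng = random.Random(sum(ord(c) for c in prompt))  # deterministic seed from the prompt
    return [rng.uniform(-1.0, 1.0) for _ in range(LATENT_SIZE)]


def predict_noise(latent, step, condition):
    """Stand-in U-Net: treats the gap between latent and condition as the noise."""
    return [z - c for z, c in zip(latent, condition)]


def decode_latent(latent):
    """Stand-in decoder: 'upsamples' the latent by repeating each value twice."""
    return [z for z in latent for _ in range(2)]


def generate(prompt, seed=0):
    rng = random.Random(seed)
    condition = encode_text(prompt)                              # 1. text encoder
    latent = [rng.gauss(0.0, 1.0) for _ in range(LATENT_SIZE)]   # start from pure noise
    for step in reversed(range(NUM_STEPS)):                      # 2. denoise in latent space
        noise = predict_noise(latent, step, condition)
        latent = [z - 0.5 * n for z, n in zip(latent, noise)]    # remove part of the noise
    return decode_latent(latent)                                 # 3. decoder: latent to image


image = generate("a cat on a sofa")
```

The point of the sketch is the order of operations: the prompt is encoded once, the noise predictor runs repeatedly on the small latent, and the decoder runs once at the end.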

U-Net

U-Net architecture (256 × 256 input)
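
The encoder-decoder shape of a U-Net can be made concrete by tracing feature-map sizes for a 256 × 256 input. The sketch below is an assumption-laden illustration, not the configuration used in the talk: it assumes "same"-padded convolutions, 2× downsampling, channel doubling per stage, 64 base channels and four stages, and it omits the time-step embedding and cross-attention layers that the Stable Diffusion U-Net adds.

```python
def unet_shapes(size=256, base_channels=64, depth=4):
    """Trace (channels, height, width) through a U-Net with `depth` down/up stages."""
    encoder = []
    channels = base_channels
    for _ in range(depth):
        encoder.append((channels, size, size))  # feature map kept for the skip connection
        size //= 2                              # downsampling halves the resolution
        channels *= 2                           # deeper stages use more channels
    bottleneck = (channels, size, size)

    decoder = []
    for skip in reversed(encoder):
        size *= 2                               # upsampling doubles the resolution
        channels //= 2
        decoder.append({
            # concatenating the skip adds its channels before the convolutions reduce them
            "concat": (channels + skip[0], size, size),
            "output": (channels, size, size),
        })
    return encoder, bottleneck, decoder


encoder, bottleneck, decoder = unet_shapes()
for stage in decoder:
    print(stage)
```

Each decoder stage receives the encoder feature map of the same resolution, which is how fine spatial detail lost during downsampling is reintroduced.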

Autoencoders

General structure of an autoencoder
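
A rough sense of how much the autoencoder compresses an image comes from simple shape arithmetic. The sketch below assumes a downsampling factor of 8 and 4 latent channels, values commonly cited for Stable Diffusion v1; the helper names `latent_shape` and `compression_ratio` are hypothetical.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent for a given image size."""
    if height % downsample or width % downsample:
        raise ValueError("image size must be divisible by the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_ratio(height, width, image_channels=3, downsample=8, latent_channels=4):
    """How many times fewer values the latent holds than the RGB image."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (image_channels * height * width) / (c * h * w)


print(latent_shape(512, 512))       # (4, 64, 64)
print(compression_ratio(512, 512))  # 48.0
```

Under these assumptions the diffusion model works on 48 times fewer values than the pixel image, which is where the speed and memory savings come from.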

Text encoder

General structure of a text encoder

Diffusion process

General structure of diffusion process
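
The forward (noising) half of the diffusion process can be sketched in a few lines. This is a minimal illustration assuming a DDPM-style linear noise schedule with 1000 steps and variances from 1e-4 to 0.02; Stable Diffusion's own schedule and its latent-space tensors differ, and the function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T, linearly spaced (assumed DDPM-style defaults)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: the share of signal variance left."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Sample x_t from x_0: x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [signal * x + sigma * e for x, e in zip(x0, eps)]
    return xt, eps  # eps is what a noise-predicting network is trained to recover


alpha_bars = cumulative_alphas(linear_beta_schedule())
xt, eps = add_noise([1.0, -1.0, 0.5], t=999, alpha_bars=alpha_bars, rng=random.Random(0))
```

At the last step almost no signal remains, so `xt` is nearly identical to the sampled noise `eps`; generation runs this process in reverse, removing the predicted noise step by step.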

Image generation results

Additional results comparing the output of SDXL with previous versions of Stable Diffusion. For each prompt, three random samples are shown.

Thank you for your attention