1 of 181

Text-to-Image AI “Stable Diffusion” - A Revolution in Images?

Stable Diffusion

2 of 181

Stable Diffusion

generated with FLUX1.dev (1-3)

3 of 181

Slideshow - generated images

4 of 181

Source

5 of 181

flux1.dev (12)

6 of 181

Source

7 of 181

flux1.dev (14)

8 of 181

Source

9 of 181

Source

10 of 181

Source

11 of 181

flux1.dev (11)

12 of 181

Source

13 of 181

Source

14 of 181

Source

15 of 181

flux1.dev (4)

16 of 181

flux1.dev (16)

17 of 181

Source

18 of 181

flux1.dev (13)

19 of 181

Source

20 of 181

It’s important to have good advisors

Source

21 of 181

Source

22 of 181

flux1.dev (5)

23 of 181

flux1.dev (9)

24 of 181

Source

25 of 181

Source

26 of 181

Source

27 of 181

Source

28 of 181

Source

29 of 181

flux1.dev (8)

30 of 181

Source

31 of 181

Source

32 of 181

flux1.dev (10)

33 of 181

flux1.dev (6)

34 of 181

flux1.dev (7)

35 of 181


End of slideshow

36 of 181

About me

Julian Egner
More than 20 years in software development
At tarent (now Qvest Digital) since 2016
Software developer

Java, Kotlin
Angular
Android

I am not an AI expert

Flux1.dev
Comic image of a nerdy half bald programmer with glasses in a black hoodie with orange text "QVEST"

37 of 181

AGENDA

Introduction
How it works
Prompting
Use cases
Ecosystem
Development

SDXL, SD3 and Flux
From weaknesses to solutions

Social and societal impact
Questions & discussion

Questions are welcome at the end

real photo

38 of 181

Introduction - what is text-to-image?

photograph of a (one massive colorful crystal:1.2) growing out of the rocky mountain, (focus on crystal:1.2), 4k, 8k, (highly detailed), ((landscape)),(translucent crystal:1.1), light going trough the crystal, bokeh, chromatic aberration, mountain view

The software creates an image from a text (the prompt).

Prompt from: https://www.reddit.com/r/StableDiffusion/comments/ylajxn/hyperrealistic_crystalsrocks/

Generated with https://huggingface.co/spaces/stabilityai/stable-diffusion

39 of 181

Introduction - what is image-to-image?

a tropical island from above

As a Monster from monster inc movie

Created with https://huggingface.co/spaces/huggingface-projects/diffuse-the-rest

40 of 181

How it works

41 of 181

How it works: overview

Latent Space

Text-Conditioning

Diffusion Model

?

“A quiet forest in winter”

Stable Diffusion

?

42 of 181

How it works: diffusion models

https://twitter.com/tomgoldsteincs/status/1562503816679145474
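The idea behind the animation (gradually add noise to an image, then learn to remove it again) can be written down in a few lines. A minimal sketch in plain Python, assuming the closed-form forward process from DDPM with a linear noise schedule (1000 steps, beta from 1e-4 to 0.02, the values of the original DDPM paper; Stable Diffusion uses its own schedule values). The neural network that predicts the noise is replaced here by the true noise, so the reverse formula recovers the input exactly; a real model only approximates this.

```python
import math
import random


def linear_alpha_bars(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Cumulative products alpha_bar_t = prod_{s<=t} (1 - beta_s) for a linear beta schedule."""
    alpha_bars = []
    product = 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        product *= 1.0 - beta
        alpha_bars.append(product)
    return alpha_bars


def add_noise(x0, t, alpha_bars, rng):
    """Forward process: x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def estimate_x0(xt, eps_pred, t, alpha_bars):
    """Solve the forward formula for x0, given a predicted noise eps_pred."""
    a = alpha_bars[t]
    return [(x - math.sqrt(1.0 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_pred)]


if __name__ == "__main__":
    alpha_bars = linear_alpha_bars()
    rng = random.Random(0)
    x0 = [0.5, -0.2, 0.9]  # a tiny stand-in for an image (or a latent)
    for t in (0, 500, 999):
        xt, _ = add_noise(x0, t, alpha_bars, rng)
        print(t, round(alpha_bars[t], 5), [round(v, 3) for v in xt])
    # With a perfect noise prediction, x0 is recovered even from almost pure noise
    xt, eps = add_noise(x0, 999, alpha_bars, rng)
    print([round(v, 6) for v in estimate_x0(xt, eps, 999, alpha_bars)])
```

At t = 999 almost nothing of the original signal is left (alpha_bar is close to 0), which is why the model must learn to predict the noise well.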

43 of 181

How it works: latent space

“To enable DM [Diffusion Model] training on limited computational resources while retaining their quality and flexibility, we apply them in the latent space of powerful pretrained autoencoders.” [https://arxiv.org/abs/2112.10752]

https://hackernoon.com/latent-space-visualization-deep-learning-bits-2-bd09a46920df
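How much smaller the latent space is can be computed directly. A minimal sketch, assuming the autoencoder of Stable Diffusion 1.x, which downsamples by a factor of 8 in each direction and outputs 4 latent channels; the function names are illustrative only.

```python
def latent_shape(height, width, downsample=8, latent_channels=4):
    """Shape (channels, height, width) of the latent that the VAE encoder produces."""
    if height % downsample or width % downsample:
        raise ValueError("height and width must be multiples of the downsampling factor")
    return (latent_channels, height // downsample, width // downsample)


def compression_factor(height, width, image_channels=3, downsample=8, latent_channels=4):
    """How many times fewer numbers the diffusion model has to process in latent space."""
    c, h, w = latent_shape(height, width, downsample, latent_channels)
    return (image_channels * height * width) / (c * h * w)


if __name__ == "__main__":
    for size in (512, 768):
        print(size, latent_shape(size, size), compression_factor(size, size))
```

For a 512x512 RGB image the diffusion model works on a 4x64x64 latent, i.e. 48 times fewer values; this is what makes training and generation feasible on limited hardware.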

44 of 181

How it works: latent space

Source: https://towardsdatascience.com/understanding-latent-space-in-machine-learning-de5a7c687d8d

The concept of latent space is at the core of deep learning: learning the features of data and simplifying data representations in order to find patterns.

Imagine a large dataset of handwritten digits (0-9).
Images of the same digit (e.g. 3) are more similar to each other than to images of other digits.

If we have trained a model to classify digits, it has also been trained to recognize the “structural similarities” between images. By learning the features of each digit, the model can then recognize digits.

This happens in the background, hidden: “latent”.
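“Similar images lie close together in latent space” can be made concrete with cosine similarity. A toy sketch: the three latent vectors below are invented for illustration (hypothetical, not produced by any real model), and the nearest-neighbour lookup only shows the principle.

```python
import math


def cosine_similarity(a, b):
    """1.0 = same direction, 0.0 = unrelated (orthogonal)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical latent vectors for three handwritten digits (made up for this example)
latents = {
    "three_a": [0.9, 0.1, 0.2],
    "three_b": [0.8, 0.2, 0.1],
    "eight": [0.1, 0.9, 0.3],
}


def nearest(query, candidates):
    """Name of the candidate whose latent vector is most similar to the query."""
    return max(candidates, key=lambda name: cosine_similarity(query, candidates[name]))


if __name__ == "__main__":
    print(cosine_similarity(latents["three_a"], latents["three_b"]))
    print(cosine_similarity(latents["three_a"], latents["eight"]))
    print(nearest([0.85, 0.15, 0.15], latents))
```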

45 of 181

How it works: text conditioning

https://jalammar.github.io/illustrated-stable-diffusion/
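In Stable Diffusion the text enters the denoising network through cross-attention: every position of the image latent (query) looks at the token embeddings of the text encoder (keys and values); in SD 1.x these come from CLIP ViT-L/14, 77 tokens with 768 dimensions each. A minimal plain-Python sketch of scaled dot-product cross-attention, assuming tiny hand-written vectors; the learned projection matrices W_Q, W_K, W_V of the real model are omitted.

```python
import math


def softmax(scores):
    m = max(scores)  # subtract the maximum for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, keys, values):
    """softmax(Q K^T / sqrt(d)) V, computed row by row.

    queries: one vector per image (latent) position
    keys, values: one vector per text token (from the text encoder)
    """
    d = len(keys[0])
    output = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)  # how strongly this position attends to each token
        output.append([sum(w * v[j] for w, v in zip(weights, values))
                       for j in range(len(values[0]))])
    return output


if __name__ == "__main__":
    tokens = [[1.0, 0.0], [0.0, 1.0]]       # e.g. "forest", "winter"
    positions = [[10.0, 0.0], [0.0, 0.0]]   # one position matches "forest", one matches nothing
    print(cross_attention(positions, tokens, tokens))
```

A position whose query matches one token strongly takes almost all of its information from that token; a neutral query averages over all tokens.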

46 of 181

How it works: training data

The Stable Diffusion model was trained on 3 subsets of LAION-5B:

laion2B-en

English captions
2 billion items

laion-high-resolution
laion-aesthetics v2 5+

a subset of LAION-5B
filtered by how likely people are to perceive the images as “beautiful” (predicted aesthetic score of 5 or higher)
hundreds of millions of images were used

https://laion.ai/blog/laion-5b/
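The “5+” in laion-aesthetics v2 5+ is a threshold on a score predicted by an aesthetics model. A minimal sketch of such a filter, assuming a hypothetical record type with a precomputed score; the field names and the example records are invented and do not reflect the real LAION file format.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    url: str                # LAION stores links to images, not the images themselves
    caption: str            # the alt text
    aesthetic_score: float  # predicted by an aesthetics model, not rated by humans


def aesthetic_subset(samples, threshold=5.0):
    """Keep samples whose predicted aesthetic score is at least `threshold`."""
    return [s for s in samples if s.aesthetic_score >= threshold]


if __name__ == "__main__":
    data = [
        Sample("https://example.com/a.jpg", "a sunset over the sea", 6.2),
        Sample("https://example.com/b.jpg", "scanned receipt", 3.1),
    ]
    print([s.caption for s in aesthetic_subset(data)])
```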

47 of 181

Prompting

48 of 181

Prompting: what are prompts?

With the input (the prompt), the user steers the software toward generating what they want.

Prompts often become very detailed, but even a short prompt can deliver interesting results.

photograph of a (one massive colorful crystal:1.2) growing out of the rocky mountain, (focus on crystal:1.2), 4k, 8k, (highly detailed), ((landscape)),(translucent crystal:1.1), light going trough the crystal, bokeh, chromatic aberration, mountain view

crystal growing out of the rocky mountain

49 of 181

Prompting: weighting

Weighting indicates which image elements are more important.

photograph of a (one massive colorful crystal:1.2) growing out of the rocky mountain, (focus on crystal:1.2), 4k, 8k, (highly detailed), ((landscape)),(translucent crystal:1.1), light going trough the crystal, bokeh, chromatic aberration, mountain view

Several words can be grouped into one image element with parentheses.

Appending :1.2 gives this image element 20% more weight.
A lower weight is also possible, e.g. :0.8.

Each pair of parentheses, as in ((landscape)), multiplies the weight by 1.1, so two pairs give 1.1 * 1.1 = 1.21.

Square brackets, as in [landscape], divide the weight by 1.1 (a reduction of roughly 9%).

The notation (landscape:1.2) is recommended because it is easier to read.
In addition, triple parentheses are used by some far-right extremists in the US to mark antisemitic comments. Posting text in triple parentheses can therefore get your account suspended.
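The weighting rules above can be checked with a small parser. A simplified sketch in the style of the AUTOMATIC1111 syntax, assuming only the cases described on this slide: (text:weight), nested (( )) and [ ]; escaping, BREAK and unclosed brackets are not handled as the real web UI does. The function name parse_weights is hypothetical.

```python
import re

# Tokens: "(", ")", "[", "]", an explicit weight ":<number>)", plain text, or a stray ":"
TOKEN = re.compile(r"\(|\)|\[|\]|:\s*([0-9]*\.?[0-9]+)\s*\)|[^()\[\]:]+|:")


def parse_weights(prompt):
    """Split a prompt into (text, weight) chunks."""
    chunks = []       # each entry is [text, weight]
    round_open = []   # chunk index where each "(" group starts
    square_open = []  # chunk index where each "[" group starts

    def scale_from(start, factor):
        for chunk in chunks[start:]:
            chunk[1] *= factor

    for match in TOKEN.finditer(prompt):
        token = match.group(0)
        if token == "(":
            round_open.append(len(chunks))
        elif token == "[":
            square_open.append(len(chunks))
        elif match.group(1) is not None and round_open:
            # (text:1.2) -> multiply the group by the explicit weight
            scale_from(round_open.pop(), float(match.group(1)))
        elif token == ")" and round_open:
            # ((text)) -> every pair of parentheses multiplies by 1.1
            scale_from(round_open.pop(), 1.1)
        elif token == "]" and square_open:
            # [text] -> divide by 1.1
            scale_from(square_open.pop(), 1 / 1.1)
        else:
            chunks.append([token, 1.0])
    return [(text, weight) for text, weight in chunks]


if __name__ == "__main__":
    for text, weight in parse_weights("a (red:1.2) ((big)) [dog]"):
        if text.strip():
            print(f"{text.strip()!r}: {weight:.3f}")
```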

50 of 181

Prompting: negative prompts

Negative prompts tell the software what you do not want (duplicate heads, blurry images, etc.).

Negative prompts appear to be particularly important for version 2.0.

https://www.reddit.com/r/StableDiffusion/comments/z6nyu0/sd_20_since_images_of_lovely_ladies_seem_to_get/

Negative Prompt: make up, ugly, hands, blurry, low resolution, animated, cartoon, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck

50 steps using Euler a, 768x768, CFG: 7.

Prompt: a beautiful ((closeup)) cinematic still image of a natural ((young)) woman as (medieval knight) standing in a forest near a ruined castle, sunset, headshot, textured skin, freckles, (((skin pores))), birthmark, auburn hair, blue eyes, dramatic lighting, Zeiss lens, 35mm film, f/2.8

Stable Diffusion 2.0
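The setting “CFG: 7” is the classifier-free guidance scale, and it is also where the negative prompt comes in: at each step the model predicts the noise once with the prompt and once without it, and the negative prompt's prediction takes the place of the “without” (empty prompt) prediction. A minimal sketch of that combination on toy two-element vectors (the numbers are invented; real predictions are latent-sized tensors).

```python
def guided_noise(noise_uncond, noise_cond, guidance_scale):
    """Classifier-free guidance: eps = eps_uncond + scale * (eps_cond - eps_uncond)."""
    return [u + guidance_scale * (c - u) for u, c in zip(noise_uncond, noise_cond)]


if __name__ == "__main__":
    # Toy "noise predictions"; with a negative prompt, its prediction
    # is used in place of the unconditional (empty-prompt) one.
    prediction_negative = [1.0, 0.0]  # e.g. "blurry, extra fingers, ..."
    prediction_positive = [0.0, 1.0]  # the actual prompt
    for scale in (1.0, 7.0):
        print(scale, guided_noise(prediction_negative, prediction_positive, scale))
```

With scale 1 the result is just the prompt's prediction; with 7 it is pushed far beyond it and away from everything the negative prompt describes.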

51 of 181

Generation

https://artificial-art.eu/ (no login, many options and many models)

powered by stable horde (crowdsourced distributed computing)

see appendix and stablehorde.net

Current alternatives:

https://clipdrop.co/: fine-tuned SDXL; most features are paid, but simple prompts are possible
https://huggingface.co/black-forest-labs/FLUX.1-dev: flux1.dev, login required

I had originally planned to show a live generation here. However, the changes that lead to better results also make generation take longer (about 5 minutes for 10 images).

We don't have time for that now, so I will only show what it would look like; there is more time in the workshop.

—----------

Links follow: fullscreen is switched off automatically.

If generation takes too long, simply switch to “Images” or go to the next slide.

When it is done, press F11 and wait a moment for fullscreen.

Prompt: a beautiful ((closeup)) cinematic still image of a natural ((young)) woman as (medieval knight) standing in a forest near a ruined castle, sunset, headshot, textured skin, freckles, (((skin pores))), birthmark, auburn hair, blue eyes, dramatic lighting, Zeiss lens, 35mm film, f/2.8

Negative Prompt: make up, ugly, hands, blurry, low resolution, animated, cartoon, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck

52 of 181

53 of 181

Prompting: beyond prompting

Img2Img
Inpainting/ Outpainting
depth2Img (Stable Diffusion 2.0)

Inpainting: replacing part of an image

Outpainting: extending an image

https://www.reddit.com/r/StableDiffusion/comments/xcsbhg/because_soon_or_later_all_of_us_are_going_to_do/

See also clipdrop uncrop
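Both techniques rely on a mask: 1 means “generate new content here”, 0 means “keep the original”. Outpainting is inpainting on an enlarged canvas whose new border is masked. A minimal sketch on a single 1-D row of pixels, showing only the final blend; real implementations also use the mask during the denoising steps, typically in latent space. The helper names are hypothetical.

```python
def composite(original, generated, mask):
    """Per pixel: mask 1 -> take the generated value, mask 0 -> keep the original."""
    return [m * g + (1 - m) * o for o, g, m in zip(original, generated, mask)]


def outpaint_canvas(row, pad, fill=0.0):
    """Extend a row by `pad` pixels on each side and build the matching mask."""
    canvas = [fill] * pad + list(row) + [fill] * pad
    mask = [1.0] * pad + [0.0] * len(row) + [1.0] * pad
    return canvas, mask


if __name__ == "__main__":
    # Inpainting: replace only the middle pixel
    print(composite([10, 20, 30], [99, 99, 99], [0, 1, 0]))
    # Outpainting: the new border pixels are masked, the original stays untouched
    print(outpaint_canvas([10, 20, 30], pad=2))
```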

54 of 181

Prompting: beyond prompting

Img2Img
Inpainting/ Outpainting
depth2Img (Stable Diffusion 2.0)

The new depth-guided stable diffusion model, depth2img, extends the prior image-to-image feature from V1 with entirely new creative possibilities. Depth2img determines the depth of an input image (using an existing model) and then generates new images based on both the text and the depth information.

55 of 181

Prompting: region prompter extension

https://www.reddit.com/r/StableDiffusion/comments/13360c1/allure_of_the_lake_txt2img_region_prompter/

https://github.com/hako-mikan/sd-webui-regional-prompter

When you try to adjust one part of an image (e.g. a person's blue hair), other image elements are sometimes changed as well, e.g. a curtain also turns blue (prompt bleeding).

Getting completely different elements into one image is therefore not easy without post-processing.

This extension makes it possible to use different prompts for different regions of the image.

Prompt bleeding appears to be fixed in Flux.
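The core idea of the extension is to split the image into regions and give each region its own sub-prompt. A minimal sketch, assuming sub-prompts separated by the keyword BREAK and a column split given as ratios such as 1,1 (modelled loosely on the extension's UI; the helper names are hypothetical). The real extension applies the split inside the attention layers in latent space, not to pixel columns as shown here.

```python
def split_regions(width, ratios):
    """Divide the width into consecutive column ranges proportional to `ratios`."""
    total = sum(ratios)
    bounds, start = [], 0
    for i, ratio in enumerate(ratios):
        # the last region always ends exactly at the image border
        end = width if i == len(ratios) - 1 else start + round(width * ratio / total)
        bounds.append((start, end))
        start = end
    return bounds


def assign_prompts(prompt, width, ratios, separator="BREAK"):
    """Pair each column range with its sub-prompt."""
    parts = [p.strip() for p in prompt.split(separator)]
    if len(parts) != len(ratios):
        raise ValueError("need exactly one sub-prompt per region")
    return list(zip(split_regions(width, ratios), parts))


if __name__ == "__main__":
    print(assign_prompts("a woman with blue hair BREAK a red curtain", 1024, [1, 1]))
```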

56 of 181

Use cases

57 of 181

Use cases: art

https://www.reddit.com/r/StableDiffusion/comments/12fdexx/dreaming_of_the_universe

58 of 181

Use cases: music albums

SDXL 1.0
Prompt: Cover of metal music album with evil dark duck and title text "Fear of the Duck"
https://clipdrop.co/stable-diffusion

59 of 181

Use cases: shirts

Generated by me. Workflow: see appendix.

Ada Lovelace

real photo

60 of 181

Use cases: logos

SDXL

clipdrop.co/stable-diffusion
Prompt: Kangaroo
Style: Neon Punk

The background was removed with
https://clipdrop.co/remove-background,
otherwise unedited

61 of 181

Use cases: AI actors

Source: https://www.reddit.com/r/StableDiffusion/comments/11qexu0/animate_your_stable_diffusion_portraits/

62 of 181

Fake Video

Flux & Kling AI

https://www.reddit.com/r/StableDiffusion/comments/1eppdz7/flux_with_kling_ai_no_upscale_done_its_scary_isnt/

63 of 181

Use cases: (game) development

https://www.reddit.com/r/StableDiffusion/comments/13cdl2j/my_ongoing_mission_to_create_the_perfect/

reddit.com/StableDiffusion

Isometric tiles or worlds

App Icons

64 of 181

Use cases: comic panels

“Consistent comic character - test with SD Dreambooth trained on myself (plus a couple of celeb co-stars)”

65 of 181

Use cases: reverse comic

https://www.reddit.com/r/StableDiffusion/comments/14okexp/simpsons_house

66 of 181

Use cases: advertising

Burger image from: https://www.reddit.com/r/StableDiffusion/comments/z0rzm2/we_really_enjoyed_having_lunch_at_this_restaurant/

Generating items such as dishes for a menu

Salmon image from: https://www.reddit.com/r/StableDiffusion/comments/12afsuq/its_addicting_creating_food_in_sd_when_youre

67 of 181

Use cases: advertising (fashion)

https://www.reddit.com/r/StableDiffusion/comments/155k1ck/a_real_skirt_in_the_ai_fashion_model/

https://www.reddit.com/r/StableDiffusion/comments/157tm5w/generate_models_wearing_specific_clothes_using

68 of 181

Use cases: restoration

Soviet Soldier, 1946 https://www.reddit.com/r/StableDiffusion/comments/12fcgdq/soviet_soldier_1946/

69 of 181

Ecosystem

70 of 181

Ecosystem

Stable Diffusion the Software

Automatic1111’s WebUi

stable horde

dreambooth

SD- models

upscaler

webservices

Apps

stability.ai

eleuther.ai

CompVis
Machine Vision and Learning, LMU Munich

Correction GANs

ControlNet-models

Stable Diffusion the Model [1.4,1.5,2.0,2.1, 3.0]

SDXL [0.9, 1.0]

TI

LoRA

ComfyUI

clipdrop.co

flux

black forest labs

LAION

Companies / groups

Personalization

User interfaces

Post-processing

Models

Applications

crowdsourced distributed computing

71 of 181

Ecosystem: models

https://www.reddit.com/r/StableDiffusion/comments/z6eg5x/generating_porsches_with_the_knollingcase_model

https://www.reddit.com/r/StableDiffusion/comments/yhi8zo/modern_disney_lara_croft_prompt_settings_in

https://www.reddit.com/r/StableDiffusion/comments/yujief/samdoesarts_model_v2_huggingface_link_in_comments

Knollingcase

Modern Disney

samdoesart

more models: see appendix

Hugging Face: > 13,800 models
https://huggingface.co/models?other=stable-diffusion

CivitAI: > 6,000 models (as of 2023)

https://civitai.com/models

72 of 181

Ecosystem: models

Model freedom

Based on Stable Diffusion 2.1

https://civitai.com/models/87288/freedomredmond

https://www.reddit.com/r/StableDiffusion/comments/146272w/freedom_is_here_the_generalist_21_768x_finetuned

73 of 181

Ecosystem: models

Comparison of some example models using the same prompt and 2 different seeds.
More in the appendix. Note: as of 20.07.2023

Source: https://www.reddit.com/r/StableDiffusion/comments/154nd8y/more_realistic_model_comparisons/
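Comparisons like this fix the seed because the seed determines the random starting noise: same prompt, same settings and same seed give the same image, while a different seed gives a different one. A minimal sketch of that determinism using Python's own random generator; real tools draw the noise with their framework's generator (e.g. PyTorch), so the actual numbers differ, and the function name is hypothetical.

```python
import random


def initial_latent_noise(seed, num_values=4 * 64 * 64):
    """Starting noise for one image (flattened 4x64x64 latent); the same seed gives the same noise."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(num_values)]


if __name__ == "__main__":
    a = initial_latent_noise(1234, num_values=5)
    b = initial_latent_noise(1234, num_values=5)
    c = initial_latent_noise(1235, num_values=5)
    print(a == b, a == c)
```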

74 of 181

Ecosystem: ControlNet

Here, too, we want more precise control over what we get.
ControlNet lets us define the shape more precisely.

What is ControlNet?

ControlNet is a neural network for controlling Stable Diffusion models more precisely.

The simplest way to use models is text-to-image, where text prompts serve as the conditioning (steering), as shown earlier. ControlNet adds an additional conditioning.

This additional conditioning can take different forms, depending on which ControlNet model is used. ControlNet models are used together with the usual Stable Diffusion model.

Original

Prompt =
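How the additional conditioning is added without damaging the base model: in the ControlNet design, a trainable copy of the network processes the control image, and its output is added to the frozen model's features through “zero convolutions”, layers initialized to zero. At the start of training the base model's behaviour is therefore unchanged. A minimal sketch with a linear map standing in for the 1x1 zero convolution; the strength factor corresponds roughly to the control weight offered by the UIs. All names are hypothetical.

```python
def zero_initialized(out_dim, in_dim):
    """Weights of a 'zero convolution' (here a linear map), all zeros before training."""
    return [[0.0] * in_dim for _ in range(out_dim)]


def apply_linear(weights, x):
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


def controlled_features(base, control, weights, strength=1.0):
    """Add the ControlNet branch's output to the frozen model's features."""
    residual = apply_linear(weights, control)
    return [b + strength * r for b, r in zip(base, residual)]


if __name__ == "__main__":
    base = [0.5, -1.0]      # features of the frozen Stable Diffusion model
    control = [3.0, 4.0]    # features derived from e.g. an edge map
    print(controlled_features(base, control, zero_initialized(2, 2)))  # unchanged
    trained = [[0.1, 0.0], [0.0, 0.1]]                                 # after some training
    print(controlled_features(base, control, trained))
```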

75 of 181

Ecosystem: ControlNet

Original

Source: https://www.reddit.com/r/StableDiffusion/comments/12gu6u4/i_know_youre_probably_tired_of_seeing_this_but/

76 of 181

Ecosystem: ControlNet

ControlNet Models

Information is extracted from an image and used together with a prompt to generate a new image.

More info: https://civitai.com/articles/157/openpose-controlnets-v11-using-poses-and-generating-new-ones

edge detection (here: Canny edge)

human pose detection (here: OpenPose)
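What an edge-detection preprocessor extracts can be sketched in a few lines. This is a simplified Sobel gradient with a single threshold, i.e. only the first stage of the Canny algorithm; full Canny additionally applies Gaussian smoothing, non-maximum suppression and hysteresis with two thresholds. The image is a tiny hand-made grayscale example.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def edge_map(img, threshold):
    """1 where the gradient magnitude reaches the threshold, else 0 (border pixels stay 0)."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            gx = sum(SOBEL_X[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            gy = sum(SOBEL_Y[j][i] * img[y + j - 1][x + i - 1] for j in range(3) for i in range(3))
            out[y][x] = 1 if math.hypot(gx, gy) >= threshold else 0
    return out


if __name__ == "__main__":
    # Dark left half, bright right half: the edge lies in the middle
    image = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
    for row in edge_map(image, threshold=100):
        print(row)
```

The resulting black-and-white map is the kind of input an edge ControlNet receives alongside the prompt.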

77 of 181

Ecosystem: ControlNet

ControlNet-Model: Controlnet for QR Code

created by https://qrbtf.com/
Model not released, usable at https://qrbtf.com/en
Released model: QR Code Monster
https://huggingface.co/monster-labs/control_v1p_sd15_qrcode_monster

Yes, these are working QR codes!

qrbtf

qrcode monster

78 of 181

Ecosystem: Dreambooth, TI & LoRA

Personalizing Stable Diffusion

We can add new concepts and “teach” the AI something: a new object or a new style.

Dreambooth: powerful, but produces new models (2-7 GB)
Textual Inversion (TI): very small (< 1 MB), small changes, used in addition to the model
Low-Rank Adaptation (LoRA): medium (20-700 MB), used in addition to the model

Details: see appendix
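Why LoRA files are so much smaller than a Dreambooth model: instead of changing a weight matrix W (d x k) directly, LoRA trains two thin matrices B (d x r) and A (r x k) with a small rank r and adds their product, W' = W + (alpha / r) * B A. B usually starts at zero, so the model is unchanged before training. A minimal plain-Python sketch; the 768x768 layer size is only an example, and the function names are hypothetical.

```python
def matmul(a, b):
    """Plain-Python matrix product of a (n x m) and b (m x p)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def lora_merge(weight, lora_a, lora_b, alpha):
    """W' = W + (alpha / r) * B @ A, where A is r x k, B is d x r and W is d x k."""
    rank = len(lora_a)
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(weight, delta)]


def lora_parameters(d, k, rank):
    """Trainable numbers in A and B, compared with d * k for the full matrix."""
    return rank * (d + k)


if __name__ == "__main__":
    d = k = 768
    print("full matrix:", d * k, "LoRA rank 8:", lora_parameters(d, k, 8))
```

For this example layer a rank-8 LoRA stores about 2% of the numbers of the full matrix; only selected layers (typically the attention layers) get such an update.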

79 of 181

Development: SDXL, SD3 and Flux

80 of 181

Development: news 2023 & 2024

26.07.2023 SDXL 1.0 release
21.11.2023 SVD (Stable Video Diffusion) release (enables img2video)
28.11.2023 SDXL Turbo release (needs only 1-4 steps instead of 50 for SDXL)
22.02.2024 SD3 preview
18.03.2024 SV3D (stable video 3D)
12.06.2024 SD3 medium release
24.07.2024 SV4D (stable video 4D)
01.08.2024 Stable Fast 3D: 3D Asset Generation from single Picture
01.08.2024 flux (dev, pro, schnell) release

SVD
https://stability.ai/news/stable-video-diffusion-open-ai-video-model
https://www.reddit.com/r/StableDiffusion/comments/183v0sa/seems_legit

SDXL Turbo
https://stability.ai/news/stability-ai-sdxl-turbo
https://clipdrop.co/stable-diffusion-turbo

SD3
https://stability.ai/news/stable-diffusion-3
https://stability.ai/news/stable-diffusion-3-medium

https://stability.ai/news/introducing-stable-fast-3d

Flux

https://blackforestlabs.ai/

81 of 181

Stable Diffusion XL (SDXL)

SDXL is a model for Stable Diffusion (essentially the successor of 1.5 and 2.1), not “new” software.

SDXL is an intermediate step on the way to Stable Diffusion 3.

SDXL 1.0 was released on 26.07.2023.

Generate it yourself (unfortunately now paid):

https://clipdrop.co/stable-diffusion

Made with https://clipdrop.co/stable-diffusion (SDXL 0.9)

Prompt: a realistic blue frog with text 'FrosCon' on the back, sitting in green grass

Type: Photo

82 of 181

SDXL

SDXL is an intermediate step on the way to Stable Diffusion 3.

More parameters: SDXL 6.6 billion (base and refiner together), 1.5: 0.98 billion.

SDXL consists of two models: a base model and a refiner model.

More info: see appendix

83 of 181

SDXL

Made with https://clipdrop.co/stable-diffusion (SDXL 0.9)

Prompt: a duck with a sign saying Quack!

Type: No Style

84 of 181

Development: SD3

only SD3 Medium is public
heavily censored: “woman in a meadow”
community reaction: disappointing

Quote from Golem.de:
“The community considers the new version of Stable Diffusion a joke.”

Humans are NSFW?

"It works as long as no people can be seen in the image," one user writes in the thread. "I think their stricter NSFW filter, which filters the training data, decided that anything human gets flagged as NSFW (not safe for work)." NSFW filters are applied during model training to restrict the use of the image generator for pornographic or erotic images of people.
(golem.de)

The SD3 API version was apparently not censored in this way, because the input can be checked there, which is of course not possible with the downloadable versions.

https://marketing-ki.de/ki-news/stable-diffusion-3-medium-ernuchterung-statt-begeisterung/

https://www.reddit.com/r/StableDiffusion/comments/1deaahg/sd3_dead_on_arrival/

https://www.golem.de/news/wegen-nsfw-filter-stable-diffusion-3-macht-menschen-zu-abscheulichkeiten-2406-186033.html

https://www.reddit.com/r/StableDiffusion/comments/1axo5hz/how_is_sd_3_censored

85 of 181

Development: Flux

Open-weight base model by Black Forest Labs,
former employees of Stability AI.

Released 01.08.2024
3 variants: Dev, Pro and Schnell*
Source: https://github.com/black-forest-labs/flux
Try it online (login required)
https://replicate.com/black-forest-labs
https://fal.ai/

* yes, it really is called flux-schnell (German for “fast”),
because Black Forest Labs is a startup from Germany (Freiburg im Breisgau)

Prompt: Close-up of LEGO chef minifigure cooking for homeless. Focus on LEGO hands using utensils, showing culinary skill. Warm kitchen lighting, late morning atmosphere. Canon EOS R5, 50mm f/1.4 lens. Capture intricate cooking techniques. Background hints at charitable setting. Inspired by Paul Bocuse and Massimo Bottura's styles. Freeze-frame moment of food preparation. Convey compassion and altruism through scene details.

86 of 181

Development: Flux

12 billion parameters
Text works
Hands work even in complicated cases
Much better prompt adherence,
no more prompt bleeding
No excessive censorship as with SD3,
so “woman in a meadow” works
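What 12 billion parameters mean for the graphics card can be estimated with simple arithmetic: memory for the weights = number of parameters x bytes per parameter. A minimal sketch, assuming 2 bytes per parameter for 16-bit formats (fp16/bf16) and 1 byte for 8-bit formats; text encoders, the VAE and intermediate activations need additional memory on top of this.

```python
def weight_memory_gib(num_params, bytes_per_param):
    """Memory needed just to hold the weights, in GiB (1 GiB = 1024**3 bytes)."""
    return num_params * bytes_per_param / 1024**3


if __name__ == "__main__":
    models = {"SD 1.5": 0.98e9, "Flux.1": 12e9}
    for name, params in models.items():
        for label, size in (("16-bit", 2), ("8-bit", 1)):
            print(f"{name} {label}: {weight_memory_gib(params, size):.1f} GiB")
```

Flux needs roughly 22 GiB in 16-bit for its weights alone, compared with under 2 GiB for SD 1.5, which is why quantized 8-bit versions are popular for consumer GPUs.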

https://www.reddit.com/r/StableDiffusion/s/T0tJRcHhLI

Note the hands with interlaced fingers!

87 of 181

Ecosystem: Flux models

Flux civitAI: https://civitai.com/models/618692/flux

https://huggingface.co/black-forest-labs/FLUX.1-schnell

https://huggingface.co/black-forest-labs/FLUX.1-dev

Flux.1-pro is not released;
access is only possible via the API

88 of 181

Development: from weaknesses to solutions

89 of 181

Development: eyes

poor in 1.4, better from 1.5 onward
even with SDXL, pupils are misshapen on close inspection
fixed in Flux1

https://lexica.art/?prompt=ba8ad686-3b5a-49bf-9b98-d25f0225881d

flux1.dev
headshot photo of red haired woman with freckles and blue eyes

Detail (crop)

90 of 181

Development: text

unusable up to 2.1
much better from SDXL on, but usually does not work on the first try
with Flux1 it mostly works; sometimes tricks such as capital letters are still needed

https://www.reddit.com/r/StableDiffusion/comments/z22mru/i_was_generating_some_random_dinotopia_got_way_to/

Flux1.dev
a box of a boardgame about zootopia on a table, photorealistic

flux1.dev (15)

91 of 181

Development: hands

improved from 2.0 on
since SDXL, six fingers appear more often
with Flux1, crossed fingers work
Even with Flux, hand gestures such as the
Vulcan greeting 🖖,
sign of the horns 🤘
and the “unicorn” 🖕
do not work (yet), but “peace” ✌️ does

"woman showing her hands"

https://www.reddit.com/r/StableDiffusion/comments/z3a4ye/prompt_woman_showing_her_hands_on_stable

SDXL 1.0, prompt: spock vulcan salute

FLUX1

92 of 181

Development: prompt bleeding

region prompter extension

Flux

https://www.reddit.com/r/StableDiffusion/s/l5HzxCBquc

Flux follows the prompt much better than previous models; prompt bleeding appears to be fixed

93 of 181

Development: prompt adherence

Until now, color information from the prompt could easily end up in other parts of the image as well; e.g. “green dress” could also lead to green curtains. This is called “prompt bleeding” or poor “prompt adherence”.

flux1.dev
blonde woman wearing a red dress next to a ginger girl wearing a green dress in a bedroom with purple curtains and yellow bedsheets

94 of 181

Development

SDXL 1.0
Woman with black tshirt with iron maiden logo and text "fear of the duck" and a picture of an evil duck

Style: Photographic

https://clipdrop.co/stable-diffusion

flux1.dev
a female metalhead, wearing a black t-shirt with the iron maiden logo, an evil duck and the text "Fear of the Duck"

https://huggingface.co/black-forest-labs/FLUX.1-dev

once again much closer to the prompt, with a sharper image

95 of 181

Development

Previously, it was not possible to depict archers correctly, with the bowstring, bow, arrow and hands in the right places.

flux1.dev

an archer aiming at a target, photorealistic

SDXL

Here the hand was in front of the bowstring, which is physically impossible

96 of 181

Development

flux1.dev

an airbus a380 with lufthansa colors flying over new york

Flux knows far more than earlier models and can therefore achieve a lot even with little information

97 of 181

Development: prompt adherence & realism

https://huggingface.co/XLabs-AI/flux-RealismLora

Prompt: a young blonde woman in a black dress waving. she has a red handbag over her right shoulder

That was the very first attempt. Even more realistic than flux1.dev alone, especially the skin

98 of 181

Social and societal impact

99 of 181

Copyright

Many questions are still open:

Training data

LAION only stores references to publicly available images and their alt texts
No explicit consent from the rights holders

Styles and trademark law (“by Greg Rutkowski”, “modern/classic Disney”)
Prompts

In general: no matter how an image was created, you had better not use an image of “the Mouse” in public commercial advertising…

100 of 181

Impact

Commercial

Will it become harder to make a living from art?
Or will entirely new business models emerge?

Institutional

Museums, art prizes

Aesthetic

Inflation of images

Societal

“It's all fake!”
Hyper-individuality bubbles

Political

Images for election campaigns
Images for propaganda

101 of 181

Questions & discussion

102 of 181

Questions and discussion

This presentation is available at https://tinyurl.com/y7jpyf7f

The appendix has > 50 pages with further information,
plus 20 pages of notes for a workshop.

SDXL 1.0
Prompt: A robot with a speach bubble saying "Fragen?"

https://clipdrop.co/stable-diffusion

103 of 181

Presentation: Stable Diffusion


104 of 181

Questions & discussion


105 of 181

Further information and sources

Appendix

106 of 181

Workflows

Each generated with FLUX1.dev online at https://huggingface.co/black-forest-labs/FLUX.1-dev

a white frog with a sign with text "FrOSCon 2024"
a white frog holding a sign with the text "Text-zu-Bild-KI "Stable Diffusion" - Revolution der Bilder?", blue background
a white frog holding a sign with the text “Revolution der BILDER?”, blue background
hyperrealistic photo of a middle aged woman, slightly smiling
A wizard holding a tiny dragon in his right hand, the dragon spits fire from its mouth, the wizard lights up his cigar with this fire (post-processed to remove fire above the dragon's back)
in a glass are an ice cube and a mini titanic steamship (much smaller than the ice cube) swim next to each other on top of whiskey in a glass, photorealistic
french revolution, on the flag is text "REVOLUTION der BILDER"
photo of a small child hugging a teddy bear
photo of a painting on the wall which shows a fly
a young blonde woman petting the white rabbit in her arm, photorealistic
photo of an old pirate with tricorn, eye patch and a wooden leg, grinning and looking into a treasure chest with shining gold coins in it
photo of a young woman in jeans and a white shirt with peace sign on it, showing the peace sign with her right hand
mermaid swimming with fishes above a coral reef, photorealistic
photo of a roman soldier
photo of a book, oldfashioned with leather cover, on a wooden table, the book is open, the left page is blank, on the right page is the Text "Once upon a time, ..."
photo of a ginger woman wearing a green dress, smiling

107 of 181

How it works: training costs

⚠️ Models need a great deal of resources for training

“We actually used 256 A100s for this per the model card, 150k hours in total so at market price $600k” [Emad Mostaque, CEO stability.ai]
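The quoted figures can be cross-checked with simple arithmetic. A minimal sketch: the implied price per A100 hour follows from $600k / 150k hours, and the wall-clock estimate assumes, idealistically, that all 256 GPUs ran in parallel the whole time.

```python
def training_cost(gpu_hours, price_per_gpu_hour):
    return gpu_hours * price_per_gpu_hour


def implied_price(total_cost, gpu_hours):
    return total_cost / gpu_hours


def wall_clock_days(gpu_hours, num_gpus):
    """Duration if all GPUs run in parallel the whole time (an idealized assumption)."""
    return gpu_hours / num_gpus / 24


if __name__ == "__main__":
    price = implied_price(600_000, 150_000)
    print(f"implied price: ${price:.2f} per A100 hour")
    print(f"wall clock: about {wall_clock_days(150_000, 256):.1f} days on 256 GPUs")
```

That works out to $4 per GPU hour and roughly 24 days of continuous training.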

108 of 181

How it works: hardware requirements

Intel/AMD processor + graphics card

⚠️ GPU with relatively large VRAM (min. 8 GB for ready-made models, 30 GB for fine-tuning models, e.g. with DreamBooth)
⚠️ Can be hard to get in the cloud (I am looking at you, AWS!)

Apple Silicon

It just works, even on the small MacBook Air M1, but takes about 60 seconds per image

Mobile

Depends on the device; modern iPhones work

Notes: see appendix

109 of 181

Appendix: how can I use it?

Online: a huge number of providers

My recommendation: stable horde (see next page)

https://huggingface.co/spaces/stabilityai/stable-diffusion

free, long waiting times, few settings

https://lexica.art/ Generate button, login, free
http://beta.dreamstudio.ai/ 150 image credits free
https://getimg.ai/ 100 credits free (per month), with GFPGAN & Real-ESRGAN upscaling and inpainting/outpainting
https://openart.ai/ 100 free tokens
https://theartbutton.ai/ (login needed, tokens needed, refreshed to 100 tokens every day)
https://visualise.ai/ 3 free runs, 100 tokens after login
https://histre.com/integrations/generative/

online generation without login, but the results are published immediately

110 of 181

Appendix: how can I use it?

Online: stable horde (crowdsourced computing)

https://stablehorde.net/
https://artificial-art.eu/ fork of the UI used by aqualxx; no login, no tokens
https://aqualxx.github.io/stable-ui/ stable horde deployment: no login, no tokens, < 60 sec,

WebP only

https://diffusionui.com/b/stable_horde
https://tinybots.net/artbot
Mastodon bot: https://sigmoid.social/@stablehorde_generator
Example: "@stablehorde_generator draw for me a beautiful night style: fantasy"

My recommendation: https://artificial-art.eu/

111 of 181

Appendix: how can I use it?

Self-hosted: cloud

Install SD on GCP

https://towardsdatascience.com/how-to-run-a-stable-diffusion-server-on-google-cloud-platform-gcp-c879357808bf

Caution: getting VMs with GPUs is not easy (this also applies to AWS)

In our attempt, it took 2 weeks to get GPUs (AWS), and then there were too few CPUs.

Similar at paperspace.com: there are special Ubuntu-based images, but larger amounts of RAM have to be requested with a justification:

112 of 181

Appendix: how can I use it?

Self-hosted: local

GPU required, at least 4 GB VRAM, more is better - NVIDIA or Apple

https://blog.admin-intelligence.de/stable-diffusion-mit-docker/

Run local with nvidia graphics card

https://medium.com/geekculture/run-stable-diffusion-in-your-local-computer-heres-a-step-by-step-guide-af128397d424

1-click install with GFPGAN & upscaling (min 4GB VRAM)

https://github.com/cmdr2/stable-diffusion-ui

Local, CPU only (no GPU needed)

https://www.reddit.com/r/MachineLearning/comments/x3pvqa/p_run_stable_diffusion_cpu_only_with_web/

local super-hi-res:

https://www.reddit.com/r/StableDiffusion/comments/x5wkj7/testing_even_higher_res_3200x1920_i_think_my_3090/

MacOS only

https://diffusionbee.com/

113 of 181

Appendix: how can I use it?

locally on a smartphone (iOS)

maple diffusion

https://twitter.com/amasad/status/1580772494230704128

https://github.com/madebyollin/maple-diffusion

promptArt

https://labml.ai/#promptArt

Draw Things: AI Generation (ios App)

114 of 181

Appendix: how can I use it?

smartphone (Android)

My recommendation: stable horde via https://artificial-art.eu/

https://www.arinteli.com/stable-diffusion-android-download/

Computation runs on servers

https://play.google.com/store/apps/details?id=com.triceratop.aiapp

apparently free, but has in-app purchases??

Computation runs on servers

https://play.google.com/store/apps/details?id=ai.pixelz.mobileApp

tokens, paid

Computation runs on servers

115 of 181

Appendix: how can I use it?

rocketchat integration

https://www.appypie.com/integrate/apps/stable-diffusion/integrations/rocket-chat

Paid, can be tested

116 of 181

Prompting: Hints

https://publicprompts.art/

How a Stable Diffusion prompt changes its output for the style of 1500 artists

https://mpost.io/best-100-stable-diffusion-prompts-the-most-beautiful-ai-text-to-image-prompts/

lexica.art (as always)

https://www.reddit.com/r/StableDiffusion/

117 of 181

Prompting: Tricks

Delayed Keywords

https://www.reddit.com/r/StableDiffusion/comments/156s26v/delayed_keywords_is_a_nice_little_trick/

AND combinations in the prompt for multiple elements

https://www.reddit.com/r/StableDiffusion/comments/155zidp/the_invention_of_the_razor_blade_c_5000_bc/
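Both tricks can be sketched in a few lines, assuming the AUTOMATIC1111 syntax: prompt editing [from:to:when] (and [to:when] for a keyword that only appears later), where a value of when below 1 is read as a fraction of the sampling steps; and “a AND b”, where each sub-prompt gets its own noise prediction and the predictions are combined with weights (composable diffusion). The idea behind delayed keywords is that early steps decide the composition and later steps the details. The helper names are hypothetical, and the exact step boundary in the web UI may differ by one.

```python
def switch_step(when, steps):
    """A value below 1 is read as a fraction of the steps, otherwise as an absolute step number."""
    return int(when * steps) if when < 1 else int(when)


def prompt_at_step(step, before, after, when, steps):
    """Prompt editing [before:after:when]; a delayed keyword is the case before == ''."""
    return before if step < switch_step(when, steps) else after


def compose_and(noise_uncond, noise_conds, weights, guidance_scale):
    """'a AND b': add the guided difference of every sub-prompt's noise prediction."""
    combined = list(noise_uncond)
    for cond, weight in zip(noise_conds, weights):
        for i, (c, u) in enumerate(zip(cond, noise_uncond)):
            combined[i] += guidance_scale * weight * (c - u)
    return combined


if __name__ == "__main__":
    # [castle:0.5] with 20 steps: the keyword only takes effect from step 10 on
    print([prompt_at_step(s, "", "castle", 0.5, 20) for s in range(0, 20, 4)])
    # Toy noise predictions for "a cat AND a dog"
    print(compose_and([0.0, 0.0], [[1.0, 0.0], [0.0, 1.0]], [1.0, 1.0], 7.0))
```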

118 of 181

Ecosystem: models

Model: dreamscapes & dragonfire

(Author: DarkAgent)

https://civitai.com/models/50294/dreamscapes-and-dragonfire-new-v20

https://www.reddit.com/r/StableDiffusion/comments/144wsoe/goddess_of_creation/

119 of 181

Development: SDXL

https://www.reddit.com/r/StableDiffusion/comments/14wog4r/sdxl_buffy_the_vampire_slayer_shirt_text_reads

Buffy the vampire slayer, shirt text reads slayer, metal festival with vampires

120 of 181

Prompting: Everything “by Greg Rutkowski”??

Applies to versions up to 1.4 (which are still heavily used, though)

To get usable results, many people add “by Greg Rutkowski” to their prompts (a living Polish artist who mainly illustrates fantasy, such as D&D and Magic).

As a result, his own works can hardly be found anymore, which he understandably does not like.

“Well I guess soon I won't be able to find my own work on the internet cause it will be flooded with ai stuff.” https://twitter.com/GrzegorzRutko14/status/1568294080756473858

One possible solution is to name a (non-existent) artist whose mention has the same effect: Sjampinjon Grzybski

https://www.reddit.com/r/StableDiffusion/comments/xn4jnr/art_by_sjampinjon_grzybski/

Or add an art movement instead of an artist.

Note: in Stable Diffusion 2.0, the connection to Greg was removed.

121 of 181

Use cases: AI Actors

Source: Google Colab - Thin-Plate-Spline-Motion-Model for SD.ipynb

Source: Reddit

122 of 181

Use cases: Video Art

Video “History of Earth, Life and Civilization – made with AI” [1]

[1] Probably created with Deforum Stable Diffusion

123 of 181

Use cases: Image Compression

Stable Diffusion Based Image Compression by Matthias Bühlmann

Original

124 of 181

Appendix: Example costs

Provider	Product	Cost
Paperspace.com	P5000	$0.78/h
AWS	g4ad.xlarge	$0.379/h
runpod.io	RTX A5000	$0.490/h
vast.ai	RTX A4000	$0.242/h

Note: these figures are from December 2022.
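To compare the offers for a concrete session, the hourly rates from the table can simply be multiplied by the session length. A minimal sketch using only the December 2022 prices above; the four-hour session is an arbitrary example, not a recommendation.

```python
# Hourly GPU rates from the table above (USD, as of December 2022)
HOURLY_RATES = {
    ("Paperspace.com", "P5000"): 0.78,
    ("AWS", "g4ad.xlarge"): 0.379,
    ("runpod.io", "RTX A5000"): 0.490,
    ("vast.ai", "RTX A4000"): 0.242,
}


def session_costs(hours: float) -> list[tuple[str, str, float]]:
    """Cost per offer for a session of the given length, cheapest first."""
    costs = [
        (provider, product, round(rate * hours, 2))
        for (provider, product), rate in HOURLY_RATES.items()
    ]
    return sorted(costs, key=lambda entry: entry[2])


for provider, product, cost in session_costs(4):
    print(f"{provider:15} {product:12} ${cost:.2f}")
```

This ignores storage, data transfer and setup time, which the providers may bill separately.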

125 of 181

Appendix: Alternatives

DALL·E 2 (OpenAI)

https://www.heise.de/hintergrund/KI-System-DALL-E-Ein-Alleskoenner-fuer-Kreative-7206468.html?hg=1&hgi=0&hgf=false

Midjourney

Copyright: all images you create end up in the archive.

Commercial Terms: https://midjourney.gitbook.io/docs/billing#commercial-terms

“If you delete your account, you may no longer call the images your own.”

https://mizine.de/midjourney/einrichten/

Comic created with Midjourney: https://www.reddit.com/r/StableDiffusion/comments/yjqxhw/the_lesson_a_free_comic_download_made_with/

Imagen https://imagen.research.google/

Wombo dream https://dream.ai

126 of 181

Appendix: Social and societal impact

127 of 181

Ecosystem: 1.5 vs. 2.0

“a professional photograph of a cat in a space suit”, generated in Stable Diffusion 1.5 and 2.0

In 2.0, many images were excluded from training (NSFW filter) and the association with artists was removed. According to early evaluations, 2.0 is less suited than 1.5 for depicting people, but better for other subjects.

On the other hand, 2.0 added an upscaler, depth2img and improved inpainting.

Usability improved again with 2.1; nevertheless, many of the models in use are still based on 1.5.

https://www.reddit.com/r/StableDiffusion/comments/z3csv7/15_and_20_with_a_professional_photograph_of_a_cat/

128 of 181

Appendix: Ecosystem: 1.5 vs. 2.0

Note: technically outdated (the current version is 2.1), but 1.5 (or models based on it) is still used intensively

https://stable-diffusion-art.com/how-to-run-stable-diffusion-2-0/

https://huggingface.co/spaces/fffiloni/prompt-converter

https://www.reddit.com/r/StableDiffusion/comments/z6r79u/hmm_sd_20_is_actually_better/

https://www.reddit.com/r/StableDiffusion/comments/z6nyu0/sd_20_since_images_of_lovely_ladies_seem_to_get/

Negative prompts apparently matter more in 2.0

https://www.reddit.com/r/StableDiffusion/comments/z6ao2j/please_use_negative_prompts_with_stable_diffusion/

https://www.reddit.com/r/StableDiffusion/comments/z66wch/comparing_20_768_and_15_at_their_native/

https://www.reddit.com/r/StableDiffusion/comments/z5vqqu/beautiful_farmers_daughter_stable_diffusion_20/

https://www.reddit.com/r/StableDiffusion/comments/z637bb/sd_20/

129 of 181

Appendix: Ecosystem: 1.5 vs. 2.0

Note: technically outdated (the current version is 2.1), but 1.5 (or models based on it) is still used intensively

https://www.reddit.com/r/StableDiffusion/comments/z5jnq0/stable_diffusion_20_has_surprisingly_good_lighting/

https://www.reddit.com/r/StableDiffusion/comments/z6apqi/how_to_generate_better_images_with_stable/

https://www.reddit.com/r/StableDiffusion/comments/z74iyj/sd_2_can_take_photorealistic_photos_good_tips_are

https://www.reddit.com/r/StableDiffusion/comments/z73jv6/living_room_stable_diffusion_20

https://www.reddit.com/r/StableDiffusion/comments/z76udu/sd_20_is_amazing_on_photorealism

https://www.reddit.com/r/StableDiffusion/comments/z7ghbf/not_only_is_stable_diffusion_20_not_bad_but

https://www.reddit.com/r/StableDiffusion/comments/z7k8nd/20_realistic_jewelry_withwithout_gems_prompt

130 of 181

Appendix: Ecosystem: Models

https://huggingface.co/models?other=stable-diffusion (> 4000 models)

https://stablehorde.net/ (> 200 models, directly usable via the UIs)

Interesting ones include “Asim Simpsons”, “realistic vision”, “mo-di-diffusion” (Pixar style), “dreamshaper” and “pixhell” (pixel art)

Dreamshaper 5 model https://www.reddit.com/r/StableDiffusion/comments/12jt5y1/dreamshaper_5_is_here_sorry_it_took_me_a_while_i

Model pixhell https://www.reddit.com/r/StableDiffusion/comments/11v52ql/pixhell_21_sd_model

Model freedom https://www.reddit.com/r/StableDiffusion/comments/146272w/freedom_is_here_the_generalist_21_768x_finetuned

https://www.reddit.com/r/StableDiffusion/comments/z6eg5x/generating_porsches_with_the_knollingcase_model

https://huggingface.co/nitrosocke/mo-di-diffusion
https://www.reddit.com/r/StableDiffusion/comments/yhi8zo/modern_disney_lara_croft_prompt_settings_in

https://www.reddit.com/r/StableDiffusion/comments/yujief/samdoesarts_model_v2_huggingface_link_in_comments

https://www.reddit.com/r/StableDiffusion/comments/yskhce/my_new_dd_model_trained_for_30000_steps_on_2500

https://www.reddit.com/r/StableDiffusion/comments/yxat2s/new_release_nitrodiffusion_multistyle_model_with

https://www.reddit.com/r/StableDiffusion/comments/z42idl/new_release_sd_20_dreambooth_model_futurediffusion

https://huggingface.co/dallinmackay/Tron-Legacy-diffusion

https://huggingface.co/nitrosocke/classic-anim-diffusion

131 of 181

Appendix: Ecosystem: Model comparison

Source: https://www.reddit.com/r/StableDiffusion/comments/154nd8y/more_realistic_model_comparisons/

132 of 181

Appendix: Ecosystem: Model comparison

Source: https://www.reddit.com/r/StableDiffusion/comments/154nd8y/more_realistic_model_comparisons/

133 of 181

Appendix: Ecosystem: Model comparison

Source: https://www.reddit.com/r/StableDiffusion/comments/154nd8y/more_realistic_model_comparisons/

134 of 181

Ecosystem: DreamBooth

DreamBooth is a method to personalize text-to-image models like Stable Diffusion given just a few (3-5) images of a subject. It allows the model to generate contextualized images of the subject in different scenes, poses, and views.

135 of 181

Ecosystem: DreamBooth

DreamBooth is the most “powerful” option for fine-tuning a model.

But: it creates a new model with a correspondingly large file size (2-7 GB).

See also:

https://huggingface.co/docs/diffusers/training/dreambooth

https://stable-diffusion-art.com/dreambooth/

136 of 181

Ecosystem: Textual Inversion (TI)

We can add new concepts and “teach” the AI something new, such as an object or a style. The concept is assigned a term that has not been used before (a “pseudo-word”).

The idea behind Textual Inversion is to teach the text model a new word using a few representative images.

TI is also often used for style transfer.

Textual Inversions are very small (< 1 MB).

See: https://medium.com/@onkarmishra/how-textual-inversion-works-and-its-applications-5e3fda4aa0bc
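Why are Textual Inversions so small? The model weights stay frozen; only the embedding vector(s) for the new pseudo-word are learned. A quick size estimate, assuming the 768-dimensional token embeddings of the SD 1.x text encoder (SD 2.x uses 1024) stored as 32-bit floats. The vector counts are illustrative, and real files add some metadata on top of the raw payload.

```python
def embedding_bytes(num_vectors: int, dim: int = 768, bytes_per_value: int = 4) -> int:
    """Raw size of a textual-inversion embedding: vectors x dimension x bytes per float."""
    return num_vectors * dim * bytes_per_value


for vectors in (1, 8, 32):
    print(f"{vectors:2} vector(s): {embedding_bytes(vectors) / 1024:.1f} KiB")

# For comparison: a full checkpoint of roughly 2 GB is several orders of magnitude larger.
```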

137 of 181

Ecosystem: LoRA

Low-Rank Adaptation (LoRA) models are small modifiers of checkpoint models

LoRA models are small Stable Diffusion models that apply tiny changes to standard checkpoint models. They are usually 10 to 100 times smaller than checkpoint models. That makes them very attractive to people having an extensive collection of models.

LoRA applies small changes to the most critical part of Stable Diffusion models: The cross-attention layers. It is the part of the model where the image and the prompt meet.

See also: Appendix

Dreambooth is powerful but results in large model files (2-7 GBs). Textual inversions are tiny (about 100 KBs), but you can’t do as much.
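The size advantage follows from the low-rank trick: instead of changing a full weight matrix W (d_out × d_in), LoRA trains two thin matrices B (d_out × r) and A (r × d_in) and adds their product, scaled by α/r, to the frozen W. A minimal pure-Python sketch; the 768 × 768 layer and rank 4 are hypothetical values chosen only to illustrate the arithmetic.

```python
def full_params(d_out: int, d_in: int) -> int:
    """Trainable values when fine-tuning the whole matrix W."""
    return d_out * d_in


def lora_params(d_out: int, d_in: int, rank: int) -> int:
    """Trainable values in B (d_out x rank) plus A (rank x d_in)."""
    return rank * (d_out + d_in)


def matmul(a: list[list[float]], b: list[list[float]]) -> list[list[float]]:
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def apply_lora(w, b, a, scale: float = 1.0):
    """W' = W + scale * (B @ A); in LoRA, scale is usually alpha / rank."""
    delta = matmul(b, a)
    return [[wv + scale * dv for wv, dv in zip(w_row, d_row)]
            for w_row, d_row in zip(w, delta)]


full, lora = full_params(768, 768), lora_params(768, 768, rank=4)
print(full, lora, full / lora)  # 589824 6144 96.0

print(apply_lora([[1, 0], [0, 1]], [[1], [2]], [[3, 4]]))  # [[4.0, 4.0], [6.0, 9.0]]
```

A factor of about 100 for a single layer matches the order of magnitude stated above.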

138 of 181

Appendix: Ecosystem: LoRA

Dreambooth is powerful but results in large model files (2-7 GBs). Textual inversions are tiny (about 100 KBs), but you can’t do as much.

https://stable-diffusion-art.com/lora/#What_are_LoRA_models

https://softwarekeep.com/help-center/how-to-use-stable-diffusion-lora-models

https://aituts.com/stable-diffusion-lora/

Where can I find LoRAs? https://huggingface.co/models?other=stable-diffusion&search=lora

139 of 181

Appendix: SDXL

Comparison of SDXL (base + refiner model) and 1.5 (base and fine-tuned models)

https://stable-diffusion-art.com/sdxl-model/#Differences_between_SDXL_and_v15_models

https://venturebeat.com/ai/stability-ai-announces-stable-diffusion-xl-beta-for-api-and-dreamstudio/

140 of 181

Appendix: SDXL models

https://www.reddit.com/r/StableDiffusion/comments/155jx09/sdnext_kandinsky_v2_diffuser_model_not_just_sdxl

https://www.reddit.com/r/StableDiffusion/comments/1550f0s/dreamshaper_xl_alpha

https://www.reddit.com/r/StableDiffusion/comments/15694hh/sdxl_10_candidate_models_are_insane

141 of 181

Appendix: SDXL 1.0

Information and, above all, the first good results with SDXL 1.0

https://www.reddit.com/r/StableDiffusion/comments/15aapcb/sdxl_10_is_out

https://www.reddit.com/r/StableDiffusion/comments/15azdjo/sdxl_10_a1111_heron_lamp_designs

https://www.reddit.com/r/StableDiffusion/comments/15b67mp/sdxl_10_an_ai_noob_fan_of_the_lord_of_the_rings

https://www.reddit.com/r/StableDiffusion/comments/15b7oqt/a_different_perspective_on_beauty_sdxl_10

https://www.reddit.com/r/StableDiffusion/comments/15b5dxq/stable_diffusion_incorporate_text_in_image

https://www.reddit.com/r/StableDiffusion/comments/15ap2hg/sdxl_10_artist_reference_sheet_with_rabbits

https://www.reddit.com/r/StableDiffusion/comments/15b2doz/sdxl_base_refiner_10_gothic_cyberpunk_portraits

https://www.reddit.com/r/StableDiffusion/comments/15at3ko/first_results_sdxl_10

https://www.reddit.com/r/StableDiffusion/comments/15be7jg/sdxl_10_a1111_what_a_difference_refine_makes

142 of 181

Appendix: Ecosystem: Web UIs

automatic’s webUI
https://github.com/AUTOMATIC1111/stable-diffusion-webui
hlky / stable-diffusion-webui
https://github.com/Sygil-Dev/sygil-webui

Note: there are more UIs by now. The most widespread is still the one by AUTOMATIC1111.

143 of 181

Ecosystem: the others

DALL-E 2

Proprietary cloud service
OpenAI (Elon Musk, Sam Altman, Microsoft)
Diffusion model

Imagen

Proprietary cloud service
Google
Diffusion model

MidJourney

Proprietary cloud service

Wombo Dream

Proprietary cloud service

Example image from Midjourney

https://www.reddit.com/r/StableDiffusion/comments/z2xhvq/scottish_landscapes/

144 of 181

Appendix: Ecosystem: DeepFloyd IF

DeepFloyd IF

Open source
Stability AI (like SD)
Still in development
Diffusion model
Can render text

145 of 181

Appendix: Stable Horde

Stable Horde (crowdsourced distributed computing)

stablehorde.net

Stable Horde: volunteers donate computing capacity that everyone can use.
It can be used without registration. Registered users can earn “kudos” (mainly by providing computing capacity), which can be spent on “expensive” generations (higher resolution, many steps, special models…).

Sites for generating images: https://aqualxx.github.io/stable-ui/ https://artificial-art.eu/

Call for support: https://www.reddit.com/r/StableDiffusion/comments/12tultf/reminder_you_can_help_everyone_and_especially/

Collaboration between Stable Horde and LAION: https://laion.ai/blog/laion-stable-horde/

146 of 181

Appendix: Weaknesses: Hands

2.0 can render hands decently

https://www.reddit.com/r/StableDiffusion/comments/z7salo/with_the_right_prompt_stable_diffusion_20_can_do

SDXL 0.9 does not handle this particularly well (without extensions and post-processing); the image below took me 200 attempts:

Generated with https://clipdrop.co/stable-diffusion

prompt: spock vulcan salute

147 of 181

Appendix

Prompt: “Climate change report on woooden table, readable text”

This shows partly flooded land, the bottom left apparently reads “climate change”, and it is on a table…

Generated with lexica.art (November 2022), SD 1.5

https://lexica.art/prompt/5c031846-5f8f-4d60-92c9-41db095bbed0

148 of 181

Appendix: How it works: the user's perspective

Software + model + parameters + prompt = image output

Seed
Sampler
Image dimensions
Steps
Guidance

With identical input, the output is generated deterministically, i.e. it is always the same.

Note: identical input may still produce different results on different hardware. I could not verify this, so I moved this slide to the appendix.
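The seed's role can be illustrated with a toy stand-in for the starting noise: generation begins from random noise in latent space, and seeding the random generator makes that noise reproducible. This sketch uses Python's `random` module instead of the GPU tensor generators real implementations use, and assumes the SD 1.x latent layout (4 channels, 1/8 of the image resolution); the function `initial_latent` is hypothetical.

```python
import random


def initial_latent(seed: int, width: int = 512, height: int = 512,
                   channels: int = 4, downscale: int = 8) -> list[float]:
    """Toy starting noise: one Gaussian value per latent element, from a seeded generator."""
    rng = random.Random(seed)  # independent generator, seeded explicitly
    size = channels * (height // downscale) * (width // downscale)
    return [rng.gauss(0.0, 1.0) for _ in range(size)]


a = initial_latent(400424973)
b = initial_latent(400424973)
c = initial_latent(400424974)
print(len(a), a == b, a == c)  # 16384 True False
```

Same seed means same starting noise; with the other parameters also unchanged, the denoising then follows the same path. Differences in floating-point arithmetic between GPUs or libraries are one plausible source of the hardware-dependent deviations mentioned above.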

149 of 181

Appendix: depth2img vs. InstructPix2Pix

Two models that preserve the composition of an image while replacing individual elements via a prompt

https://stable-diffusion-art.com/depth-to-image/

https://stable-diffusion-art.com/instruct-pix2pix/

150 of 181

Appendix: Workflow for Ada

Beginning with the workflow from here https://www.reddit.com/r/StableDiffusion/comments/10noz4f/sorry_had_to_post_this_a_good_prompt_a_good_model/

I fiddled around and after about 100 generations I finally got this.

Prompt: Vibrant neon colors comic, Portrait (woman) (30 year old woman), ((((ada lovelace)))), (long hair), Smirk, realistic shaded, elegant, award winning half body portrait of (((ada lovelace))) with flower in ((dark hair)), (rococo hair style), shaded flat illustration, comic sketch

Negative Prompt: highly detailed, fine detail, intricate, poorly drawn, crippled, crooked, broken, weird, odd, distorted, (big breasts), (big tits), erased, cut, mutilated, sloppy, hideous, ((ugly)), pixelated, ((bad hands)), aliasing, lowres, (monochrome), (black and white), ((b&w)), poorly drawn, sloppy, over exposed, over saturated, burnt image, sloppy, broken, fuzzy, aliasing, cheap, oldschool, poor quality, pixelated, sleepy, closed-eyes, lowres, pixelated, aliasing, old, granny, ugly, ((bad anatomy)), hideous, deformed, mutant, butchered, gore, sloppy, artifacts, mutilated, poorly drawn, smudged, pencil, glossy skin, doll, plastic, (signature), (watermark), (words), (letters), (logo), (username), ((disfigured)), ((close up))

Sampler: k_euler_a
Model: realistic vision
Postprocessor: RealESRGAN_4xplus
Size: 512x512
Steps: 30
Guidance: 7
Karras: enabled
Seed: 400424973

Made with https://aqualxx.github.io/stable-ui/
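UIs like this one send the settings to the Stable Horde as a generation request. The following sketch maps the settings above onto such a request body. The field names, the `###` separator between prompt and negative prompt, the post-processor name `RealESRGAN_x4plus` and the model name are assumptions based on the AI Horde API v2 (`POST /api/v2/generate/async`) and should be checked against its current documentation; the helper `build_horde_request` is hypothetical, and the prompts are shortened excerpts. Nothing is sent here.

```python
import json


def build_horde_request(prompt: str, negative_prompt: str = "", *, model: str,
                        sampler: str, steps: int, cfg_scale: float,
                        width: int, height: int, seed: int, karras: bool = False,
                        post_processing: tuple[str, ...] = (), n: int = 1) -> dict:
    """Assemble a generation request body (field names assumed, see above)."""
    # Assumption: the Horde expects the negative prompt after a "###" separator.
    full_prompt = f"{prompt} ### {negative_prompt}" if negative_prompt else prompt
    return {
        "prompt": full_prompt,
        "params": {
            "sampler_name": sampler,
            "cfg_scale": cfg_scale,
            "steps": steps,
            "width": width,
            "height": height,
            "seed": str(seed),  # assumption: seed is passed as a string
            "karras": karras,
            "post_processing": list(post_processing),
            "n": n,
        },
        "models": [model],
    }


request = build_horde_request(
    "Vibrant neon colors comic, Portrait (woman) (30 year old woman), ((((ada lovelace))))",
    "highly detailed, fine detail, intricate, poorly drawn",
    model="Realistic Vision", sampler="k_euler_a", steps=30, cfg_scale=7,
    width=512, height=512, seed=400424973, karras=True,
    post_processing=("RealESRGAN_x4plus",),
)
print(json.dumps(request, indent=2))
```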

151 of 181

Appendix: Talk teaser

Title: Text-to-image AI “Stable Diffusion”: a revolution in images?

Short: “Stable Diffusion” generates images from text input and has been openly available to everyone since the end of August 2022. Are we facing a revolution in images and the “end of art”?

We discuss how it works, its ecosystem, copyright, social and societal impact, and the art of “prompting”.

Description:

The talk aims to provide an introduction and the pointers you need to generate images yourself.

Topics include how it works, variants such as text-to-image, image-to-image and ControlNet, the effect of models and seeds, and prompting. It also covers the rapidly growing ecosystem and possible use cases.
Current developments, expected innovations and societal impact are further topics.

So that you can get started right away, I will show how to use Stable Horde (crowdsourced distributed computing).

This talk is an updated version of the one we gave in December (https://www.youtube.com/watch?v=ahvO05zEbf4).

152 of 181

Food for thought

“This release is the culmination of many hours of collective effort to create a single file that compresses the visual information of humanity into a few gigabytes.”

[Stable Diffusion Public Release Announcement, stability.ai Blog]

153 of 181

Appendix: Previous talks

Tech Friday 2022

I previously gave this talk together with my colleague Martin Lukas in December 2022 at the in-house conference “Tech Friday”.

The slides from that talk are available here: tinyurl.com/48w4vfm2
The video of that talk can be found here: https://youtu.be/ahvO05zEbf4

FrOSCon 2023

Slides: https://tinyurl.com/2ackd4ex
Video: https://media.ccc.de/v/froscon2023-2893-text-zu-bild-ki_stable_diffusion

Tech Friday 2023

Slides: https://tinyurl.com/u59vydwb
Video: https://www.youtube.com/watch?v=jgF1a-5nw6g

154 of 181

Ecosystem: History

22.08.2022 Release of Stable Diffusion (1.4), start of AUTOMATIC1111 development

24.08.2022 Lexica

06.09.2022 Start of SD-Dreambooth development

20.10.2022 Runway brings in-/outpainting to the level of DALL-E 2

21.10.2022 Stable Diffusion 1.5

27.10.2022 Model “Modern Disney” (Pixar style)

Explosive growth of new models and model merges

24.11.2022 Stable Diffusion 2.0

There is an interesting summary of SD's history here

https://www.reddit.com/r/StableDiffusion/comments/154p01c/before_sdxl_new_era_starts_can_we_make_a_summary

Outdated! As of December 2022.

A lot has happened since then (SD 2.1, UIs, SDXL, ControlNet …)

155 of 181

Overview: artists in SDXL

You can always write “A Picture of [x] in the Style of [y]”, where [y] is the name of the artist, e.g. Picasso. The page https://clio.so/rabbitsxl gives an overview with sample images for each artist.
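To compare several artists with the same subject, the template can be filled in programmatically, producing one prompt per artist to paste into a UI. A trivial sketch; the helper name `style_prompts` and the example subject and artists are illustrative only.

```python
def style_prompts(subject: str, artists: list[str]) -> list[str]:
    """Fill the template 'A Picture of [x] in the Style of [y]' once per artist."""
    return [f"A Picture of {subject} in the Style of {artist}" for artist in artists]


for prompt in style_prompts("a rabbit", ["Picasso", "Claude Monet"]):
    print(prompt)
```

Keeping the seed and all other parameters fixed while only the artist changes makes the style differences easier to compare.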

156 of 181

Those were the images from December 2022; now come the new ones

It’s important to have good advisors

This is not a photo!

This is not a photo either!

End of the slideshow
Image sources in the appendix

157 of 181

Appendix: Slideshow images

The slideshow images come from (in this order):

https://www.reddit.com/r/StableDiffusion/comments/z8jp2y/its_really_hard_developing_artist_styles_in_sd_20

https://www.reddit.com/r/StableDiffusion/comments/z9h9yo/is_this_detailed_enough_yet_sd_v2_768_a1111_fork

https://www.reddit.com/r/StableDiffusion/comments/z8gje0/hard_to_tell_these_are_not_real_women_damn

https://www.reddit.com/r/StableDiffusion/comments/z7ghbf/not_only_is_stable_diffusion_20_not_bad_but

https://www.reddit.com/r/StableDiffusion/comments/yokohg/thanks_to_nitrosocke_for_the_fantastic

https://www.reddit.com/r/StableDiffusion/comments/yo5cla/pplease_dont_hurt_me_im_not_a_bad_slime_if_you

https://www.reddit.com/r/StableDiffusion/comments/yc17bg/ice_cream

https://www.reddit.com/r/StableDiffusion/comments/z7ko3m/using_takeon_hassanblend_and_knollingcase_to_make

https://www.reddit.com/r/StableDiffusion/comments/z97a0w/documenting_the_imaginary_architecture_of_havana

https://www.reddit.com/r/StableDiffusion/comments/yguwv6/prompt_to_create_silhouette_wallpapers_newest

https://www.reddit.com/r/StableDiffusion/comments/z9gf3x/knollingcase_is_a_great_model_for_models

158 of 181

Appendix: Slideshow images

The new images (July 2023) in the opening slideshow come from (in this order):

https://www.reddit.com/r/StableDiffusion/comments/1429jj2/its_important_to_have_good_advisors/

https://www.reddit.com/r/StableDiffusion/comments/13uerov/some_dnd_inspired_watercolor_style_portraits/

https://www.reddit.com/r/StableDiffusion/comments/136knxi/you_understand_that_this_is_not_a_photo_right/

https://www.reddit.com/r/StableDiffusion/comments/155iir2/most_realistic_image_by_accident/

https://www.reddit.com/r/StableDiffusion/comments/13b0tkk/i_know_people_like_their_waifus_but_here_are_some/

https://www.reddit.com/r/StableDiffusion/comments/12x2m1t/midnight/

159 of 181

Break

clipdrop.co/stable-diffusion

SDXL 1.0

Prompt: Mug with paws printed on it and steam from the inside

This is an old joke that was already built into Lemmings.

“Paws” sounds like “pause”

On to the workshop

160 of 181

Thank you

Created with SDXL 0.9 https://clipdrop.co/stable-diffusion

Prompt: crocodile with a sign that says See you later …

You will still find me at the Qvest Digital booth at the entrance.

161 of 181

Do it yourself …

Workshop

162 of 181

Access information

This presentation is available at https://tinyurl.com/y7jpyf7f

The information for the workshop starts on page 140!

163 of 181

How to start?

First, put together a prompt, or take a similar prompt and adapt it
Choose a model that might fit
Then keep refining the prompt

In general, it makes sense to generate many images at once and then pick the suitable ones.

You should also try at least a few models, ideally with the same prompt.

And most importantly: don't expect miracles. The images you see on the internet are usually the result of a lot of work, hundreds of generations, sometimes even special models, or pure luck.

164 of 181

How can I use it?

Online: Stable Horde (crowdsourced computing)

https://stablehorde.net/
https://artificial-art.eu/ fork of the UI used by aqualxx; no login, no tokens
https://aqualxx.github.io/stable-ui/ no login, no tokens, < 60 sec, WebP only
https://diffusionui.com/b/stable_horde
https://tinybots.net/artbot
Mastodon bot: https://sigmoid.social/@stablehorde_generator
Example: "@stablehorde_generator draw for me a beautiful night style: fantasy"

Locally on a smartphone (iOS)

maple diffusion
https://twitter.com/amasad/status/1580772494230704128
https://github.com/madebyollin/maple-diffusion
promptArt
https://labml.ai/#promptArt
Draw Things: AI Generation (iOS app)

My recommendation: https://artificial-art.eu/

If you're willing to pay:
https://clipdrop.co/stable-diffusion

(refined SDXL)

165 of 181

166 of 181

Number of images/batch size: how many images are created in one generation.
Recommendation: 10

Width and height: the image format. Should be 512*512 for SD and 1024*1024 for SDXL

The rest can stay as it is for now

Once the first images are finished, they are shown on the right-hand side.

You can also download them there (.jpeg); the workflow is downloaded as a .json file as well.

167 of 181

Choosing the model

The number in parentheses after the model name is the number of active workers for that model.

Model: which SD(XL) model should be used?

SDXL 1.0 - SDXL, general purpose
stable diffusion
stable diffusion 2.1
Realistic Vision - for realistic images and comics

Asim Simpsons - Simpsons
Knollingcase - objects in a display case
ICBINP - I can’t believe it’s not Photography
mo-di-diffusion - Pixar style

Examples

168 of 181

Prompting example

Negative Prompt: make up, ugly, hands, blurry, low resolution, animated, cartoon, lowres, text, error, cropped, worst quality, low quality, jpeg artifacts, ugly, duplicate, morbid, mutilated, out of frame, extra fingers, mutated hands, poorly drawn hands, poorly drawn face, mutation, deformed, blurry, dehydrated, bad anatomy, bad proportions, extra limbs, cloned face, disfigured, gross proportions, malformed limbs, missing arms, missing legs, extra arms, extra legs, fused fingers, too many fingers, long neck

Stable Diffusion 2.0

50 steps using Euler a, 768x768, CFG: 7.

Prompt: a beautiful ((closeup)) cinematic still image of a natural ((young)) woman as (medieval knight) standing in a forest near a ruined castle, sunset, headshot, textured skin, freckles, (((skin pores))), birthmark, auburn hair, blue eyes, dramatic lighting, Zeiss lens, 35mm film, f/2.8

Try the prompt with any model

Then add the negative prompt

Is there a visible difference?
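Why should the negative prompt make a difference at all? “CFG: 7” is the classifier-free guidance scale: at every sampling step the model predicts the noise twice, once with the prompt and once without it, and the result is pushed away from the unconditioned prediction. In tools such as the AUTOMATIC1111 web UI, the negative prompt takes the place of the empty prompt in that second prediction, so the image is steered away from it. A toy sketch on plain lists; in practice these are latent tensors, and the function name is hypothetical.

```python
def classifier_free_guidance(uncond: list[float], cond: list[float],
                             scale: float) -> list[float]:
    """Guided prediction: uncond + scale * (cond - uncond), element by element.

    `uncond` is the prediction for the empty (or negative) prompt,
    `cond` the prediction for the actual prompt.
    """
    if len(uncond) != len(cond):
        raise ValueError("predictions must have the same shape")
    return [u + scale * (c - u) for u, c in zip(uncond, cond)]


print(classifier_free_guidance([0.0, 0.5], [1.0, 1.0], 7.0))  # [7.0, 4.0]
```

A scale of 1 simply returns the conditioned prediction; higher values follow the prompt (and avoid the negative prompt) more strictly, at the risk of oversaturated images.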

169 of 181

Prompting example 2

photograph of a (one massive colorful crystal:1.2) growing out of the rocky mountain, (focus on crystal:1.2), 4k, 8k, (highly detailed), ((landscape)),(translucent crystal:1.1), light going trough the crystal, bokeh, chromatic aberration, mountain view

A realistic model is probably a good choice here
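The brackets in this prompt follow the emphasis syntax of the AUTOMATIC1111 web UI, which other UIs have largely adopted: each pair of round brackets multiplies a phrase's attention weight by 1.1, square brackets divide it by 1.1, and `(text:1.2)` sets the factor explicitly. The following is a simplified re-implementation for illustration, not the web UI's actual parser; it ignores escaped brackets and the `[from:to:when]` prompt-editing syntax, and the function name is hypothetical.

```python
import re

# Tokens: brackets, an explicit weight closing a group (":1.2)"),
# runs of plain text, or a stray colon.
TOKEN = re.compile(r"\(|\)|\[|\]|:\s*([+-]?(?:\d+\.?\d*|\.\d+))\s*\)|[^()\[\]:]+|:")
ROUND_FACTOR = 1.1       # each (...) multiplies the weight by 1.1
SQUARE_FACTOR = 1 / 1.1  # each [...] divides the weight by 1.1


def parse_weights(prompt: str) -> list[tuple[str, float]]:
    """Split a prompt into (text, weight) chunks, A1111-style emphasis (simplified)."""
    chunks: list[list] = []     # mutable [text, weight] pairs
    round_open: list[int] = []  # chunk index where each "(" started
    square_open: list[int] = []

    def scale(start: int, factor: float) -> None:
        for chunk in chunks[start:]:
            chunk[1] *= factor

    for match in TOKEN.finditer(prompt):
        token, explicit = match.group(0), match.group(1)
        if token == "(":
            round_open.append(len(chunks))
        elif token == "[":
            square_open.append(len(chunks))
        elif explicit is not None and round_open:
            scale(round_open.pop(), float(explicit))  # "(text:1.2)"
        elif token == ")" and round_open:
            scale(round_open.pop(), ROUND_FACTOR)
        elif token == "]" and square_open:
            scale(square_open.pop(), SQUARE_FACTOR)
        else:
            chunks.append([token, 1.0])  # plain text or unmatched bracket

    # Brackets that are never closed still apply to everything after them.
    for start in round_open:
        scale(start, ROUND_FACTOR)
    for start in square_open:
        scale(start, SQUARE_FACTOR)
    return [(text.strip(" ,"), round(weight, 4))
            for text, weight in chunks if text.strip(" ,")]


for text, weight in parse_weights(
        "photograph of a (one massive colorful crystal:1.2), ((landscape)), [blurry]"):
    print(f"{weight:<7} {text}")
```

This also explains the nested brackets in the Ada prompt: `((((ada lovelace))))` amounts to a weight of 1.1⁴ ≈ 1.46.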

170 of 181

Prompting: Hints

https://publicprompts.art/

lexica.art (as always)

https://www.reddit.com/r/StableDiffusion/

Generate a prompt from an image

https://huggingface.co/spaces/fffiloni/CLIP-Interrogator-2

171 of 181

Workflow for Ada

Beginning with the workflow from here https://www.reddit.com/r/StableDiffusion/comments/10noz4f/sorry_had_to_post_this_a_good_prompt_a_good_model/

-> See also the next page!

I fiddled around and after about 100 generations I finally got this.

Prompt: Vibrant neon colors comic, Portrait (woman) (30 year old woman), ((((ada lovelace)))), (long hair), Smirk, realistic shaded, elegant, award winning half body portrait of (((ada lovelace))) with flower in ((dark hair)), (rococo hair style), shaded flat illustration, comic sketch

Negative Prompt: highly detailed, fine detail, intricate, poorly drawn, crippled, crooked, broken, weird, odd, distorted, (big breasts), (big tits), erased, cut, mutilated, sloppy, hideous, ((ugly)), pixelated, ((bad hands)), aliasing, lowres, (monochrome), (black and white), ((b&w)), poorly drawn, sloppy, over exposed, over saturated, burnt image, sloppy, broken, fuzzy, aliasing, cheap, oldschool, poor quality, pixelated, sleepy, closed-eyes, lowres, pixelated, aliasing, old, granny, ugly, ((bad anatomy)), hideous, deformed, mutant, butchered, gore, sloppy, artifacts, mutilated, poorly drawn, smudged, pencil, glossy skin, doll, plastic, (signature), (watermark), (words), (letters), (logo), (username), ((disfigured)), ((close up))

Sampler: k_euler_a
Model: realistic vision
Postprocessor: RealESRGAN_4xplus
Size: 512x512
Steps: 30
Guidance: 7
Karras: enabled
Seed: 400424973

172 of 181

Prompt template

https://www.reddit.com/r/StableDiffusion/comments/10noz4f/sorry_had_to_post_this_a_good_prompt_a_good_model/

PROMPT: synthwave style, nvinkpunk Detailed portrait cyberpunk (sks woman) (20 year old sks woman), futuristic neon reflective wear, sci-fi, robot parts, perfect face, ((tattoo)), (long hair), matte skin, pores, sharp detail, sharpness, wrinkles, hyperdetailed, hyperrealistic, subsurface scattering, Hasselblad Award Winner, Soft Diffuse Lighting, Smirk, machine face, fine details, realistic shaded, intricate, elegant, award winning half body portrait of a woman in a croptop and cargo pants with ombre navy red teal hairstyle with head in motion and hair flying, paint splashes, splatter, outrun, vaporware, shaded flat illustration, digital art, highly detailed, fine detail, intricate

Negative prompt: lowres, poorly drawn, crippled, crooked, broken, weird, odd, distorted, (big breasts), (big tits), erased, cut, mutilated, sloppy, hideous, ((ugly)), pixelated, ((bad hands)), aliasing, lowres, (monochrome), (black and white), ((b&w)), poorly drawn, sloppy, over exposed, over saturated, burnt image, sloppy, broken, fuzzy, aliasing, cheap, oldschool, poor quality, pixelated, sleepy, closed-eyes, lowres, pixelated, aliasing, old, granny, ugly, ((bad anatomy)), hideous, deformed, mutant, butchered, gore, sloppy, artifacts, mutilated, poorly drawn, poorly detailed, smudged, sketch, pencil, glossy skin, doll, plastic, (signature), (watermark), (words), (letters), (logo), (username), ((disfigured)), ((close up))

Steps: 32, Sampler: DPM++ SDE, CFG scale: 7, Seed: 3856152164, Size: 1536x2304, Model hash: 8194f84cdc, Denoising strength: 0.35, Mask blur: 4, Ultimate SD upscale upscaler: R-ESRGAN 4x+, Ultimate SD upscale tile_size: 512, Ultimate SD upscale mask_blur: 16, Ultimate SD upscale padding: 32
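The last line lists the generation parameters in the “Key: value, Key: value” form in which the AUTOMATIC1111 web UI records them, so they can be copied into another tool. A minimal parsing sketch; it naively splits on ", " and would break on values that themselves contain commas (which the web UI quotes), so treat it as an illustration, not a complete parser. The function names are hypothetical.

```python
def parse_generation_params(line: str) -> dict[str, str]:
    """Turn 'Steps: 32, Sampler: DPM++ SDE, ...' into a dict of strings (naive split)."""
    params: dict[str, str] = {}
    for part in line.split(", "):
        key, sep, value = part.partition(": ")
        if sep:  # skip fragments without a "key: value" shape
            params[key.strip()] = value.strip()
    return params


def parse_size(size: str) -> tuple[int, int]:
    """'1536x2304' -> (width, height)."""
    width, height = size.lower().split("x")
    return int(width), int(height)


line = ("Steps: 32, Sampler: DPM++ SDE, CFG scale: 7, Seed: 3856152164, "
        "Size: 1536x2304, Model hash: 8194f84cdc, Denoising strength: 0.35")
params = parse_generation_params(line)
print(params["Sampler"], params["Seed"], parse_size(params["Size"]))
```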

173 of 181

Live paint: quick and easy

https://tinybots.net/artbot/live-paint

(this also runs via Stable Horde, free of charge and without registration)

The result on the right took less time than writing this sentence…

174 of 181

What is easy? What isn't?

A single person / object is easy
Multiple people / objects that relate to each other are not.

A magician pulling a rabbit out of a hat.

This contains not only 3 people / objects that relate to each other in a specific way; there is also a cultural image of how it should look (a magician's rabbit will be white…)

Real photo

https://www.thestandard.com.hk/section-news/section/5/252801/HSBC-pulls-rabbit-out-of-a-hat-when-needed

Prompt: a white rabbit jumping out of a wizards hat

175 of 181

Results

Prompt: a magician pulling a rabbit out of a hat, 4k photo, high quality

model: SDXL 1.0

size 1024*1024

About 120 generations…

and not really what we had in mind.

That's why we try img2img and ControlNet.

See the next page!

176 of 181

Img2Img & ControlNet

Unfortunately broken in AAA, so use https://tinybots.net/artbot instead

Upload image
Prompt
Set Number of Images to 10
Image Model depending on the style (realistic, comic, painting)

a colored photo of a magician pulling a white rabbit out of his hat

Real photo

177 of 181

ControlNet

https://jspaint.app: draw an image
Upload the image
controlType openpose

178 of 181

Clipdrop

clipdrop.co is run by stability.ai, the company that develops Stable Diffusion.

Some features are free, others require a license (€15/month).

Among the paid features, SDXL deserves special mention, because it achieves quite appealing results even with very simple prompts.

More on the following pages (free or demo versions)

clipdrop.co/stable-diffusion

SDXL 1.0

Prompt: “Penguin”

Style: Neonpunk

… on the very first try

179 of 181

Clipdrop SDXL Turbo

https://clipdrop.co/stable-diffusion-turbo

Released on 28.11.2023

Only 1-4 steps instead of 50, which makes it extremely fast: images are generated while you type.

Without paying, it can only be tested briefly (about one minute).

180 of 181

Clipdrop uncrop (outpainting)

https://clipdrop.co/uncrop

Real photo

181 of 181

Clipdrop Sky Replacer

Real photo