Quick Guide to Deforum v05
Art by: neuro @ https://twitter.com/neurodiculous
This quick user guide is intended as a LITE reference for different aspects and items found within the Deforum notebook. It is intended for version 05, which was released 10/2/2022.
While this reference guide includes different explanations of parameters, it is not to be used as a complete troubleshooting resource. The user is encouraged to explore and create their own style, using this guide as a compass to help better their inspiration. The best way to make this guide effective is to share your findings and experiences with the community! -ScottieFox
The AI art scene is evolving rapidly. Take this guide lightly. Methods, models, and notebooks will change. All the info in this guide will become irrelevant. Sad but true :’( -huemin
Stability.AI Model Terms of Use
By using this Notebook, you agree to the following Terms of Use, and license:
This model is open access and available to all, with a CreativeML OpenRAIL-M license further specifying rights and usage.
The CreativeML OpenRAIL License specifies:
You can't use the model to deliberately produce nor share illegal or harmful outputs or content
CompVis claims no rights on the outputs you generate, you are free to use them and are accountable for their use which must not go against the provisions set in the license
You may re-distribute the weights and use the model commercially and/or as a service. If you do, please be aware you have to include the same use restrictions as the ones in the license and share a copy of the CreativeML OpenRAIL-M to all your users (please read the license entirely and carefully)
Please read the full license here: https://huggingface.co/spaces/CompVis/stable-diffusion-license
## What's Changed
* Print some useful anim_args by @johnnypeck in https://github.com/deforum/stable-diffusion/pull/63
* Update: cv2-based frame unpacking (vid2frames) by @Bardia323 in https://github.com/deforum/stable-diffusion/pull/66
* Add perspective flipping to 2D animation mode by @kabachuha in https://github.com/deforum/stable-diffusion/pull/64
* Add custom settings import from file by @kabachuha in https://github.com/deforum/stable-diffusion/pull/65
* Add schedule parameters evaluation as math expressions by @kabachuha in https://github.com/deforum/stable-diffusion/pull/72
* Printing/saving of intermediate steps by @enzymezoo-code in https://github.com/deforum/stable-diffusion/pull/73
* Video init masking by @enzymezoo-code in https://github.com/deforum/stable-diffusion/pull/73
* Improved masking by @enzymezoo-code in https://github.com/deforum/stable-diffusion/pull/74
* Weighted prompts by @kabachuha in https://github.com/deforum/stable-diffusion/pull/78
* Auto-download models and added robo-diffusion by @nousr in https://github.com/deforum/stable-diffusion/pull/83
* Add waifu v3 by @deforum in https://github.com/deforum/stable-diffusion/pull/88
## New Contributors
* @johnnypeck made their first contribution in https://github.com/deforum/stable-diffusion/pull/63
* @Bardia323 made their first contribution in https://github.com/deforum/stable-diffusion/pull/66
* @kabachuha made their first contribution in https://github.com/deforum/stable-diffusion/pull/64
* @nicolai256 made their first contribution in https://github.com/deforum/stable-diffusion/pull/69
* @nousr made their first contribution in https://github.com/deforum/stable-diffusion/pull/83
**Full Changelog**: https://github.com/deforum/stable-diffusion/compare/v0.4.0...v0.5.0
In Deforum an automatic model download feature has been added - you no longer need to download models manually and place them in the correct folder. All model weights, when selected, are downloaded from huggingface and placed in the appropriate model folder.
To download official model weights within Deforum you will need to have an account on huggingface and provide your username and an access token within the notebook.
You can make an access token by going to your profile > settings > access tokens > new token
Provide your username and access token when prompted in Deforum. Afterwards, the notebook will attempt to download the model. If you run into errors while automatically downloading models, please try again; sometimes the Colab session times out.
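For reference, a minimal sketch of what the login step amounts to, assuming the huggingface_hub library (the notebook's own cell may handle this differently):

```python
# A hedged sketch of authenticating with an access token via huggingface_hub;
# the notebook prompts you for this automatically.
from huggingface_hub import login

login(token="hf_xxxxxxxxxxxxxxxx")  # token created under profile > settings > access tokens > new token
```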
Deforum v05 has the following models configured for automatic download:
Official Stable Diffusion Weights (requires huggingface login and token)
Unofficial Nousr Robo Diffusion
If you prefer to set up the weights manually: the Deforum Stable Diffusion notebook requires the user to download the model weights (~4GB) and correctly link them to the Colab notebook. Make an account on huggingface, download the .ckpt file, and place the file in your Google Drive.
1. SETUP
NVIDIA GPU:
This cell will give you information regarding the GPU you are connected to in the current run session. Diffusion in general makes heavy use of VRAM (video RAM) to render images. Colab GPU tier list, from best to worst: A100 (40GB VRAM), V100 (16GB VRAM), P100 (16GB VRAM), T4, K80.
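For reference, a Colab cell can query the attached GPU roughly like this (a minimal sketch; the notebook's own cell may differ):

```python
# Print the attached GPU, its VRAM, and current utilization in a Colab session
import subprocess
print(subprocess.run(["nvidia-smi"], capture_output=True, text=True).stdout)
```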
Model and Output Paths:
Google Drive Path Variables (Optional):
The notebook expects the following path variables to be defined: models_path and output_path. These locations are used to access the Stable Diffusion .ckpt model weights and to save the diffusion output renders, respectively. There is the option to use paths locally or on Google Drive. If you want to use paths on Google Drive, mount_google_drive must be True. Mounting Google Drive will prompt you to grant access to your Drive so the notebook can read, write, and save images.
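A minimal sketch of the expected path variables (the folder locations below are examples, not required paths):

```python
# Define where model weights are read from and where renders are saved.
mount_google_drive = True

if mount_google_drive:
    from google.colab import drive
    drive.mount('/content/drive')  # prompts you to authorize Drive access
    models_path = "/content/drive/MyDrive/AI/models"
    output_path = "/content/drive/MyDrive/AI/StableDiffusion"
else:
    models_path = "/content/models"
    output_path = "/content/output"
```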
Setup Environment:
Running this cell will download github repositories, import python libraries, and create the necessary folders and files to configure the Stable Diffusion model. Sometimes there may be issues where the Setup Environment cells do not load properly and you will encounter errors when you start the run. Verify the Setup Environment cells have been run without any errors.
Python Definitions:
Running this cell will define the required functions to proceed with making images. Verify the Python Definitions cell has been run without any errors.
Select and Load Model:
In order to load the Stable Diffusion model, Colab needs to know where to find the model_config file and the model_checkpoint. The model_config file describes the model architecture; the model_checkpoint contains the model weights that correspond to that architecture. For troubleshooting, verify that both the config and weight path variables are correct. By default, the notebook expects the model config and weights to be located in the models_path. You can provide custom model weights and config paths by selecting “custom” in both the model_config and model_checkpoint dropdowns.

Sometimes the model weight download fails and the file is corrupt. The check_sha256 function will verify the integrity of the model weights and let you know if they are okay to use. The map_location option lets you specify where to load the model weights. For most Colab users, the default “GPU” map location is best.
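A hedged sketch of how a check like check_sha256 can be implemented (the function name comes from the notebook; the implementation here is illustrative):

```python
# Compare the SHA-256 digest of a checkpoint file against a known-good hash.
import hashlib

def check_sha256(ckpt_path, expected_hash):
    sha = hashlib.sha256()
    with open(ckpt_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):  # read in 1 MB chunks
            sha.update(chunk)
    return sha.hexdigest() == expected_hash
```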
settings on next page →
2. SETTINGS
2a. Animation Settings
Animation modes:
Border, translation_x, translation_y, rotation_3d_x, rotation_3d_y, rotation_3d_z, noise_schedule, contrast_schedule, color_coherence, diffusion_cadence, 3D depth warping, midas_weight, fov, padding_mode, sampling_mode, and save_depth_map. Resume_from_timestring is available in 2D & 3D modes. (more details below)
Animation Parameters:
Motion Parameters:
Motion parameters are instructions that move the canvas, expressed in units per frame; example schedules are shown below.
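Deforum's motion schedules use a "frame:(value)" keyframe syntax (the values below are illustrative):

```python
# Example keyframe schedules: each entry is frame_number:(value), comma-separated.
translation_x = "0:(0)"              # no horizontal pan
translation_z = "0:(2), 60:(0.5)"    # push forward 2 units/frame, easing to 0.5 by frame 60
rotation_3d_y = "0:(0.3)"            # slow constant yaw (degrees per frame) in 3D mode
```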
Coherence:
The color coherence setting will attempt to sample the overall pixel color information of frame 0, and apply those values to future frames. LAB is a more linear approach to mimicking human perception of color space - a good default setting for most users.
HSV is a good method for balancing the presence of vibrant colors, but may produce unrealistic results (e.g. blue apples). RGB is good for enforcing unbiased amounts of color in each red, green, and blue channel - some images may yield colorized artifacts if sampling is too low.
The diffusion cadence will attempt to follow the 2D or 3D schedule of movement as specified in the motion parameters, while enforcing diffusion only on the frames specified. The default setting of 1 causes every frame in the sequence of image outputs to receive diffusion. A setting of 2 will only diffuse every other frame, yet motion will still be in effect. The images produced during the cadence sequence are automatically blended additively and saved to the specified drive. This may improve the illusion of coherence in some workflows, as the content and context of an image will not change or diffuse during frames that were skipped.

Higher cadence values of 4-8 will skip over a larger number of frames and only diffuse every “Nth” frame, as set by the diffusion_cadence value. This may produce more continuity in an animation, at the cost of less opportunity to add newly diffused content. In extreme examples, motion within a frame will fail to produce diverse prompt context, and the space will be filled with lines or approximations of content - resulting in unexpected animation patterns and artifacts. Video Input & Interpolation modes are not affected by diffusion_cadence.
3D Depth Warping:
FOV (field of view) in Deforum gives specific instructions as to how the translation_z value affects the canvas. The range is -180 to +180. The value follows an inverse-square-style curve such that a FOV of 0 is undefined and will produce a blank image output. A FOV of 180 will flatten and place the canvas plane in line with the view, causing no motion in the Z direction. Negative FOV values will cause the translation_z instructions to invert, moving in the opposite direction along the Z plane, while retaining other normal functions. A FOV of 30 is the default, whereas a value of 100 would cause translation in the Z direction to be smoother and slower. Each type of art and context will benefit differently from different FOV values. (ex. “Still-life photo of an apple” will react differently than “A large room with plants”)
FOV also lends instruction as to how a midas depth map is interpreted. The depth map (a greyscale image) will have its range of pixel values stretched or compressed in accordance with the FOV in such a fashion that the illusion of 3D is more pronounced at lower FOV values, and more shallow at values closer to 180. At full FOV of 180, no depth is perceived, as the midas depth map has been compressed to a single value range.
Sampling_mode controls how pixels are resampled during warping. In image processing, bicubic interpolation is often chosen over bilinear or nearest-neighbor interpolation when speed is not an issue. In contrast to bilinear interpolation, which only takes 4 pixels (2×2) into account, bicubic interpolation considers 16 pixels (4×4). Images resampled with bicubic interpolation are smoother and have fewer interpolation artifacts.
Video Input:
When using video_input mode, the run will be instructed to write video frames to the drive. If you’ve already populated the frames needed, uncheck this box to skip past redundant extraction, and immediately start the render. If you have not extracted frames, you must run at least once with this box checked to write the necessary frames.
Interpolation:
Resume Animation:
Resume is currently only available in 2D & 3D modes. The timestring is saved as the name of the settings .txt file, as well as in the names of the images produced during your previous run. The format is:
yyyymmddhhmmss - a timestamp of when the run started diffusing.
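For example (the parameter names come from the notebook; the timestring value is illustrative):

```python
# Resume a previous 2D/3D run from its saved frames and settings.
resume_from_timestring = True
resume_timestring = "20221002123456"  # yyyymmddhhmmss taken from the earlier run's filenames
```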
prompts on next page →
2b. PROMPTS
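A representative layout of the two prompt groups referenced below, adapted from the notebook's defaults (the wording and frame numbers are illustrative):

```python
# Still-frame prompts: used only in the "None" animation mode (no frame numbers).
prompts = [
    "a beautiful forest by Asher Brown Durand, trending on Artstation",
    "a beautiful portrait of a woman by Artgerm, trending on Artstation",
]

# Animation prompts: used in 2D, 3D, Video Input, and Interpolation modes.
# Keys are the frame numbers at which each prompt takes effect.
animation_prompts = {
    0: "a beautiful apple, trending on Artstation",
    20: "a beautiful banana, trending on Artstation",
    30: "a beautiful coconut, trending on Artstation",
    40: "a beautiful durian, trending on Artstation",
}
```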
In the above example, we have two groupings of prompts: the still frames *prompts* on top, and the animation_prompts below. During the “NONE” animation mode, the diffusion will look to the top group of prompts to produce images. In all other modes, (2D, 3D etc) the diffusion will reference the second lower group of prompts.
Careful attention to the syntax of these prompts is critical to be able to run the diffusion.
For still frame image output, numbers are not to be placed in front of the prompt, since no “schedule” is expected during a batch of images. The above prompts will produce and display a forest image and a separate image of a woman, as the outputs.
During 2D/3D animation runs, the lower group with prompt numbering will be referenced as specified. In the example above, we start at frame 0: an apple image is produced. As the frames progress, the output remains an apple until frame 20, at which point the diffusion will be directed to start including a banana as the main subject, eventually replacing the apple, which is no longer referenced.
Interpolation mode, however, will “tween” the prompts in such a way that, first, one image each is produced from the list of prompts: an apple, a banana, a coconut, and a durian are drawn. The diffusion then draws frames that should exist between the prompts, making hybrids of apples and bananas, then proceeding to fill the gap between bananas and coconuts, finally resolving and stopping on the last image of the durian as its destination. (Remember that this mode ignores max_frames and follows the interpolate_key_frames/interpolate_x_frames schedule instead.)
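A hedged sketch of the related settings (parameter names as they appear in the notebook's animation arguments; values and exact semantics should be checked against your version):

```python
# Interpolation mode settings (illustrative values)
interpolate_key_frames = True   # when True, spacing is taken from the animation_prompts frame numbers
interpolate_x_frames = 4        # otherwise, number of tween frames generated between each pair of prompts
```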
Many resources exist for what a prompt should include. It is up to YOU, the dreamer, to select items you feel belong in your art. Currently, prompt weights are not implemented in Deforum; however, following a template should yield fair results:
[Medium] [Subject] [Artist] [Details] [Repository]
Ex. “A Sculpture of a Purple Fox by Alex Grey, with tiny ornaments, popular on CGSociety”,
run on next page →
3. Run
Load Settings:
Image settings:
Output dimensions must be multiples of 64 pixels; otherwise, the resolution will be rounded down to the nearest compatible value. Proper values: 128, 192, 256, 320, 384, 448, 512, 576, 640, 704, 768, 832, 896, 960, 1024. Values above these recommended settings are possible, yet may yield OOM (out of memory) issues, as well as improper midas calculations. The model was trained on a 512x512 dataset, and therefore must extend its diffusion outside of this “footprint” to cover the canvas size. A wide landscape image may produce 2 trees side-by-side as a result, or perhaps 2 moons on either side of the sky. A tall portrait image may produce faces that are stacked instead of centered.
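A quick sketch of the rounding behavior described above (not the notebook's exact code):

```python
# Round requested dimensions down to the nearest multiple of 64.
W, H = 852, 480
W, H = (W // 64) * 64, (H // 64) * 64   # -> 832, 448
```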
Sampling Settings:
Stable Diffusion outputs are deterministic, meaning you can recreate images using the exact same settings and seed number. Choosing a seed number of -1 tells the code to pick a random number to use as the seed. When a random seed is chosen, it is printed to the notebook and saved in the image settings .txt file.
During a single frame, the model attempts to fully resolve its prompt by the final step of that frame. Adding more steps slices the frame into smaller increments as the model approaches completion. Higher step counts will add more defining features to an output, at the cost of time. Lower values will cause the model to rush towards its goal, providing vague attempts at your prompt. Beyond a certain value, if the model has already achieved its prompt, further steps will have very little impact on the final output, yet time will still be wasted. Some prompts also require fewer steps to achieve an acceptable output.
During 2D & 3D animation modes, coherence is important to produce continuity of motion during video playback. The Motion Parameters value “strength_schedule” achieves this coherence by feeding a proportion of the previous frame into the current diffusion. This proportion is a scale of 0 - 1.0, with 0 meaning there is no cohesion whatsoever and a brand-new, unrelated image will be diffused, and 1.0 meaning ALL of the previous frame is carried over and no diffusion is needed. Since the previous frame has already been diffused for some number of steps, a formula compensates by reducing the steps run on subsequent frames:
Target Steps - (strength_schedule * Target Steps)
Your first frame, however, will run the full number of steps; the formula takes effect afterwards.
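A worked example of the formula above (values illustrative):

```python
# Steps actually diffused on frames after the first
target_steps = 50
strength_schedule = 0.65
actual_steps = int(target_steps - strength_schedule * target_steps)  # -> 17
```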
For scale (classifier-free guidance), a normal range of 7-10 is appropriate for most scenes; however, some styles and art will require more extreme values. At scale values below 3, the model will only loosely impose a prompt, with many areas skipped and left uninteresting or simply grayed-out. Values higher than 25 may over-enforce a prompt, causing extreme oversaturation, artifacts, and unbalanced details. For some use-cases this might be a desirable effect. During some animation modes, a scale that is too high may trend color in a direction that causes bias and overexposed output.
Save & Display Settings:
Prompt Settings:
Batch Settings:
Iter = incremental change (ex 77, 78, 79, 80, 81, 82, 83…)
Fixed = no change in seed (ex 33, 33, 33, 33, 33, 33…)
Random = random seed (ex 472, 12, 927812, 8001, 724…)
Note: seed -1 will choose a random starting point, following the seed behavior thereafter
Troubleshoot: a “fixed” seed in 2D/3D mode will overbloom your output. Switch to “iter”
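For example (parameter names from the notebook; values illustrative):

```python
# Seed settings
seed = -1               # -1 picks a random starting seed; the chosen value is printed and saved
seed_behavior = "iter"  # "iter", "fixed", or "random"
```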
Init_Settings:
Note: even with use_init unchecked, video input is still affected.
Note: in ‘none’ animation mode, a folder of images may be referenced here.
create video from frames on next page →
4. Create video from frames