EE P 596 Presentation: ControlNet + LoRA Line-Art Colorization Workflow
Xinghao Chen
PROBLEM STATEMENT
PROJECT SUMMARY
PROPOSED METHOD
First, we load the Stable Diffusion base model and a pre-trained LoRA model.
Then we use positive and negative CLIP text encodings to describe the desired (and undesired) features of the final image.
Next, we load the line-art image and feed it into two ControlNet models: one captures the key lines, the other the body structure.
All of these components are then fed into a KSampler for sampling, followed by a VAE decoder that produces the output image.
Finally, to achieve a higher-resolution output, we use a super-resolution model trained on the DF2K dataset to upscale the generated image by a factor of 4.
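The steps above can be sketched with the diffusers API roughly as follows. This is a minimal illustration, not the exact workflow: the checkpoint names, ControlNet variants, LoRA file path, and conditioning scales are all assumptions.

```python
def build_pipeline():
    """Sketch of the proposed workflow: SD base model + LoRA + two ControlNets.
    Heavy imports stay inside the function so the sketch is cheap to load.
    All model IDs and file names below are illustrative assumptions."""
    import torch
    from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

    # Two ControlNets: one for key lines, one for body structure (pose).
    controlnets = [
        ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-scribble"),  # key lines
        ControlNetModel.from_pretrained("lllyasviel/sd-controlnet-openpose"),  # body structure
    ]
    pipe = StableDiffusionControlNetPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5",
        controlnet=controlnets,
        torch_dtype=torch.float16,
    )
    pipe.load_lora_weights("character_lora.safetensors")  # pre-trained LoRA (assumed path)
    return pipe


def generate(pipe, line_art, pose_map, positive, negative):
    """Run the denoising loop (the KSampler step); the pipeline's VAE
    then decodes the final latents into the output image."""
    return pipe(
        prompt=positive,                           # positive CLIP text conditioning
        negative_prompt=negative,                  # negative CLIP text conditioning
        image=[line_art, pose_map],                # one conditioning image per ControlNet
        controlnet_conditioning_scale=[1.0, 0.6],  # assumed weights
        num_inference_steps=25,
    ).images[0]


def upscaled_size(width, height, factor=4):
    """The final super-resolution stage (e.g. an ESRGAN-style model trained
    on the DF2K dataset) multiplies both output dimensions by the factor."""
    return width * factor, height * factor
```

For example, a 512x512 generation passed through the 4x upscaler yields `upscaled_size(512, 512)`, i.e. a 2048x2048 image.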
Baseline 1: Prompt-Guided Image-to-Image
With image-to-image guided only by text prompts, the outputs failed to replicate the input line art's contours accurately, and without LoRA the model also missed the specified character features.
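A minimal sketch of this baseline with the diffusers image-to-image pipeline (the checkpoint name and `strength` value are assumptions). The `strength` parameter helps explain the failure mode: image-to-image only re-runs the last fraction of the denoising schedule, so the higher the strength, the more freedom the sampler has to drift away from the input contours.

```python
def baseline1_img2img(line_art, prompt):
    """Baseline 1 sketch: plain image-to-image guided only by text prompts.
    No structural conditioning is applied, so the line-art contours are
    only weakly preserved. Checkpoint name is an assumed example."""
    import torch
    from diffusers import StableDiffusionImg2ImgPipeline

    pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
        "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
    return pipe(
        prompt=prompt,
        image=line_art,
        strength=0.75,  # fraction of the schedule re-run; assumed value
    ).images[0]


def effective_steps(num_inference_steps, strength):
    """Image-to-image re-runs only the last `strength` fraction of the
    denoising schedule; a high strength erases the input's line structure."""
    return int(num_inference_steps * strength)
```

For instance, 50 scheduled steps at strength 0.75 actually run `effective_steps(50, 0.75)` = 37 denoising steps on top of the noised input.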
Input: a line art image of Mr. Joe Biden
Output images
Baseline 2: Image-to-Image with LoRA
This baseline adds the LoRA model to Baseline 1. Although the outputs recovered the desired character attributes, they still failed to align with the input image's structural features.
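Adding LoRA to the Baseline 1 pipeline is a small change, sketched below (the LoRA file path and scale are assumptions). The second helper shows why LoRA is attractive here: a rank-r update to a weight matrix trains only r*(d_out + d_in) parameters, a tiny fraction of the full layer.

```python
def add_lora(pipe, lora_path="character_lora.safetensors", scale=0.8):
    """Baseline 2 sketch: the same image-to-image pipeline as Baseline 1,
    plus a LoRA that injects the target character's appearance.
    Path and scale are assumed example values."""
    pipe.load_lora_weights(lora_path)
    pipe.fuse_lora(lora_scale=scale)  # bake the low-rank update into the weights
    return pipe


def lora_params(d_out, d_in, rank):
    """A rank-r LoRA factors the weight update of a (d_out x d_in) layer
    into B (d_out x r) and A (r x d_in), so it trains r*(d_out + d_in)
    parameters instead of d_out*d_in."""
    return rank * (d_out + d_in)
```

For a 768x768 attention projection at rank 4, that is `lora_params(768, 768, 4)` = 6144 trainable parameters versus 589,824 for the full matrix.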
Input: a line art image of Mr. Joe Biden
Output images
Baseline 3: ControlNet without LoRA
In this baseline, ControlNet captured the structural features of the input image but failed to reproduce character-specific details.
Input: a line art image of Mr. Joe Biden
Output images
RESULTS
By integrating ControlNet and LoRA, we generated images that accurately preserve the input's structural features while capturing the character's distinctive attributes.
Input: a line art image of Mr. Joe Biden
Output images
SOME OTHER RESULTS
CONCLUSION
Future Work: