1 of 10

EE P 596 Presentation: ControlNet + LoRA Line-Art Colorization Workflow

Xinghao Chen

2 of 10

PROBLEM STATEMENT

  • Nowadays, Diffusion models demonstrate strong image generation capabilities.

  • But character depiction remains challenging, particularly for precise, character-specific details.

  • To address this, this project integrates ControlNet and Low-Rank Adaptation (LoRA) into a diffusion workflow.

3 of 10

PROJECT SUMMARY

  • 1. Character Selection and Data Preparation: We gathered about 20 reference images of the target character from the Internet, then used a Vision Transformer tagging model (wd-vit-v3) to generate detailed descriptive tags for each image.
  • 2. LoRA Model Training: Using the prepared image–tag pairs, we trained a LoRA model on Stable Diffusion 1.5 (SD1.5) to learn the character's unique visual features.
  • 3. Line Art to Image Generation: Given a line-art image as input, our proposed workflow generated high-quality images that accurately matched the character's features while staying true to the line-art structure.
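As a rough sketch of why LoRA training is practical on a ~20-image dataset: the update to a weight matrix W is factored into two small matrices B and A of rank r, so only a few percent of the parameters are trained. The dimensions and hyper-parameters below are illustrative (roughly the size of one SD1.5 attention projection), not the exact values used in this project.

```python
# Minimal sketch of the Low-Rank Adaptation (LoRA) idea: instead of
# fine-tuning a full d_out x d_in weight matrix W, train two small
# factors B (d_out x r) and A (r x d_in); at inference the effective
# weight is W_eff = W + (alpha / r) * B @ A.
# Dimensions are illustrative (about one SD1.5 attention projection).

d_out, d_in = 768, 768   # size of one attention projection matrix
r, alpha = 16, 16        # LoRA rank and scaling hyper-parameters

full_finetune_params = d_out * d_in     # train all of W
lora_params = d_out * r + r * d_in      # train only B and A

print(full_finetune_params)  # 589824
print(lora_params)           # 24576, about 4% of the full matrix
```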

4 of 10

PROPOSED METHOD

First, we load the Stable Diffusion model and the pre-trained LoRA model.

Then, positive and negative prompts are passed through the CLIP text encoder to describe the desired and undesired features of the final image.
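The positive/negative prompt pair is consumed by classifier-free guidance inside the sampler: the noise prediction conditioned on the negative prompt replaces the unconditional one, and the sampler extrapolates toward the positive prediction. A one-line numeric sketch (the scalars stand in for the model's noise predictions, and `guidance_scale` is the usual CFG weight):

```python
# Classifier-free guidance combines the two conditioned predictions.
# eps_negative / eps_positive are scalar stand-ins for the noise
# predictions under the negative and positive prompts.
def cfg(eps_negative, eps_positive, guidance_scale):
    return eps_negative + guidance_scale * (eps_positive - eps_negative)

print(round(cfg(0.2, 0.8, 7.5), 3))  # 4.7
```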

Next, we load the line-art image and feed it into two ControlNet models: one captures the key lines, and the other captures the body structure.

All these components are then fed into a KSampler for sampling, followed by a VAE decoder that produces the output image.

Finally, to obtain a higher-resolution output, we use a DF2K upscaling model to enlarge the generated image by a factor of 4.
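The full pipeline can be summarized as a node graph. The node names below are illustrative stand-ins for the ComfyUI-style nodes described above, and the topological sort only demonstrates the execution order; it does not load or run any model.

```python
# Structural sketch of the proposed workflow as a dependency graph.
# Node names are illustrative, not exact ComfyUI node identifiers.
workflow = {
    "CheckpointLoader": {"inputs": []},                     # SD1.5 weights
    "LoraLoader":       {"inputs": ["CheckpointLoader"]},   # character LoRA
    "PositivePrompt":   {"inputs": ["LoraLoader"]},         # CLIP text encode
    "NegativePrompt":   {"inputs": ["LoraLoader"]},
    "LineArtImage":     {"inputs": []},                     # input line art
    "ControlNetLines":  {"inputs": ["PositivePrompt", "NegativePrompt",
                                    "LineArtImage"]},       # key lines
    "ControlNetPose":   {"inputs": ["ControlNetLines",
                                    "LineArtImage"]},       # body structure
    "KSampler":         {"inputs": ["LoraLoader", "ControlNetPose"]},
    "VAEDecode":        {"inputs": ["KSampler", "CheckpointLoader"]},
    "Upscale4x":        {"inputs": ["VAEDecode"]},          # DF2K upscaler
}

def topological_order(graph):
    """Return node names in dependency order (Kahn's algorithm)."""
    indegree = {name: len(node["inputs"]) for name, node in graph.items()}
    ready = [name for name, deg in indegree.items() if deg == 0]
    order = []
    while ready:
        name = ready.pop(0)
        order.append(name)
        for other, node in graph.items():
            if name in node["inputs"]:
                indegree[other] -= 1
                if indegree[other] == 0:
                    ready.append(other)
    return order

order = topological_order(workflow)
print(order[-1])  # Upscale4x -- upscaling is the final step
```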

5 of 10

Baseline 1: Prompt-Guided Image-to-Image

With plain image-to-image generation guided only by text prompts, the outputs failed to replicate the input line art's contours, and without LoRA the model missed the specified character features.

Input: a line art image of Mr. Joe Biden

Output images

6 of 10

Baseline 2: Image-to-Image with LoRA

This baseline adds the LoRA model to Baseline 1. The outputs recovered the desired character attributes, but still failed to align with the input image's structural features.

Input: a line art image of Mr. Joe Biden

Output images

7 of 10

Baseline 3: ControlNet without LoRA

In this baseline, ControlNet captured the structural features of the input image but failed to reproduce character-specific details.

Input: a line art image of Mr. Joe Biden

Output images

8 of 10

RESULTS

By integrating ControlNet and LoRA, we generated images that accurately preserved the input's structural features while capturing the character's distinctive appearance.

Input: a line art image of Mr. Joe Biden

Output images

9 of 10

SOME OTHER RESULTS

10 of 10

CONCLUSION

  • Developed a hybrid approach combining ControlNet and LoRA.
  • Addressed limitations of existing methods, offering more accurate and consistent results.
  • Opened new opportunities for applications in artistic style transfer and creative design.

Future Work:

  • Cross-Domain Feature Transfer: Explore methods for transferring features across different art domains, such as blending anime and photorealistic styles.
  • Integration with 3D Workflows: Investigate the potential for extending the method to 3D character modeling.