1 of 35

Text2Cinemagraph: Text-Guided Synthesis of Eulerian Cinemagraphs

Aniruddha Mahapatra

CMU

Hsin-Ying Lee

Snap Research

Aliaksandr Siarohin

Snap Research

Sergey Tulyakov

Snap Research

Jun-Yan Zhu

CMU

2 of 35

What is a Cinemagraph?

  • Perpetually and seamlessly looping videos in which some regions move while the rest of the scene stays still

* Adobe Stock footage

3 of 35

Motivation

* Adobe Photoshop Elements 2023 (Moving Elements)

4 of 35

Motivation

  • Online papers
  • Commercial websites
  • Games (background)
  • Virtual meeting backgrounds
  • Picture books
  • Animated movies

5 of 35

Academic Prior Art

  • Uncontrollable cinemagraph generation from a single image

* Animating Pictures with Eulerian Motion Fields, Holynski et al., CVPR 2021

6 of 35

Academic Prior Art

  • Controllable cinemagraph generation from a single image

* Controllable Animation of Fluid Elements in Still Images, Mahapatra et al., CVPR 2022

7 of 35

Limitations!

  • Prior works only handle real-life images

8 of 35

Limitations!

  • Prior works only handle real-life images
  • But what is wrong with that?

9 of 35

Limitations!

  • Prior works only handle real-life images
  • But what is wrong with that?
    • Why should we stop at generating cinemagraphs only for real-life images?

10 of 35

Limitations!

  • Prior works only handle real-life images
  • But what is wrong with that?
    • Why should we stop at generating cinemagraphs only for real-life images?
      • Commercial websites
      • Games (background)
      • Virtual meeting backgrounds
      • Picture books
      • Animated movies

11 of 35

What more can we do?

  • Go beyond real-life images with Stable Diffusion

* PromptBase (Stable Diffusion v1.4)

12 of 35

What more can we do?

  • Go beyond real-life images with Stable Diffusion, starting from text!

* PromptBase (Stable Diffusion v1.4)

13 of 35

Artistic Style Cinemagraph From Text

14 of 35

Method Overview

Example prompt: "a large waterfall falling from hills during sunset in the style of Leonid Afremov"

  • Text to image with Stable Diffusion
  • Image to optical flow prediction
  • Optical flow warping to cinemagraph

15 of 35

Method Overview - Hypothesis

  • A single, time-invariant optical flow field can describe the motion in such cinemagraphs
    • Waterfalls, lakes, rivers, clouds, etc.
  • Warping uses symmetric splatting (Holynski et al., 2021)
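A minimal NumPy sketch of the two ideas on this slide, under simplifying assumptions. The static (Eulerian) flow field is integrated with Euler steps to get a displacement per frame. Each frame t of an N-frame loop then blends a copy of the image warped forward by the integrated flow (weight 1 - t/N) with a copy warped by the integrated negated flow over N - t steps (weight t/N), so frames 0 and N both equal the input and the video loops seamlessly. Holynski et al. splat deep features and decode them with a network that also fills holes; this sketch instead splats raw pixels to the nearest pixel and leaves holes at zero. The function names are illustrative, not from the original code.

```python
import numpy as np


def euler_integrate(flow, steps):
    """Per-frame displacements F_{0->t}, t = 0..steps, from a static flow field.

    flow: (H, W, 2) Eulerian motion field, (dx, dy) per frame at each pixel.
    Each pixel is treated as a particle: x_{t+1} = x_t + flow(x_t).
    Returns an array of shape (steps + 1, H, W, 2).
    """
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(np.float64)
    start = np.stack([xs, ys], axis=-1)
    pos = start.copy()
    disps = [np.zeros_like(start)]
    for _ in range(steps):
        # Look up the flow at the nearest pixel to each particle (clamped to the image).
        xi = np.clip(np.rint(pos[..., 0]).astype(int), 0, w - 1)
        yi = np.clip(np.rint(pos[..., 1]).astype(int), 0, h - 1)
        pos = pos + flow[yi, xi]
        disps.append(pos - start)
    return np.stack(disps)


def forward_splat(image, disp, weight):
    """Nearest-pixel forward splatting of `image` by displacement `disp`.

    Returns weighted color sums and weight sums so overlaps can be averaged.
    """
    h, w = image.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    tx = np.rint(xs + disp[..., 0]).astype(int)
    ty = np.rint(ys + disp[..., 1]).astype(int)
    valid = (tx >= 0) & (tx < w) & (ty >= 0) & (ty < h)
    color = np.zeros(image.shape, dtype=np.float64)
    wsum = np.zeros((h, w), dtype=np.float64)
    np.add.at(color, (ty[valid], tx[valid]), weight * image[valid])
    np.add.at(wsum, (ty[valid], tx[valid]), weight)
    return color, wsum


def symmetric_splat_frames(image, flow, num_frames):
    """Frames 0..num_frames of a loop; frame 0 and frame num_frames equal `image`."""
    fwd = euler_integrate(flow, num_frames)    # fwd[t] = F_{0->t}
    bwd = euler_integrate(-flow, num_frames)   # bwd[N - t] = F_{0->t-N}
    frames = []
    for t in range(num_frames + 1):
        alpha = t / num_frames
        c_fwd, w_fwd = forward_splat(image, fwd[t], 1.0 - alpha)
        c_bwd, w_bwd = forward_splat(image, bwd[num_frames - t], alpha)
        denom = np.maximum(w_fwd + w_bwd, 1e-8)
        if image.ndim == 3:
            denom = denom[..., None]
        # Pixels that nothing lands on (holes) stay 0 in this sketch.
        frames.append((c_fwd + c_bwd) / denom)
    return np.stack(frames)
```

The symmetric blend is what makes the loop seamless: pixels leaving the scene in the forward-warped copy are replaced by pixels entering from the backward-warped copy, and the weights shift smoothly from one copy to the other over the loop.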

16 of 35

Optical Flow Prediction

  • The training data consists only of real-life videos!

17 of 35

Optical Flow Prediction

  • How do we predict flow for an artistic-style image?
    • It is not in the training distribution!

18 of 35

Optical Flow Prediction

  • Instead, generate a realistic twin counterpart of the artistic image and predict flow on it
    • Self-attention maps in Stable Diffusion capture most of the image structure
    • Generate the twin from a modified prompt while copying the self-attention maps from the artistic image's generation
    • Similar to Plug-and-Play (Tumanyan et al., 2023)
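A toy NumPy sketch of the attention-copying idea, assuming a single attention head and flattened spatial tokens. While denoising with the artistic prompt, a layer's self-attention map softmax(QK^T / sqrt(d)) is recorded; in the twin pass (modified prompt), that recorded map replaces the twin's own map, so the twin keeps the artistic image's layout while its values follow the new prompt. Which layers and timesteps Text2Cinemagraph injects is not stated on the slide; the function names and random stand-in activations are hypothetical.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax along `axis`."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(q, k, v, injected_probs=None):
    """Single-head scaled dot-product self-attention over N spatial tokens.

    q, k, v: (N, d) arrays. If `injected_probs` (N, N) is given, it replaces
    this pass's own attention map, so the output combines this pass's values
    with the spatial layout encoded in another pass's attention map.
    Returns (output, attention map used).
    """
    probs = softmax(q @ k.T / np.sqrt(q.shape[-1]))
    if injected_probs is not None:
        probs = injected_probs
    return probs @ v, probs


# Usage with random stand-ins for one layer's activations at one timestep.
rng = np.random.default_rng(0)
q_art, k_art, v_art = rng.normal(size=(3, 16, 8))     # artistic-prompt pass
_, art_probs = self_attention(q_art, k_art, v_art)     # record its attention map
q_twin, k_twin, v_twin = rng.normal(size=(3, 16, 8))  # twin (modified-prompt) pass
twin_out, _ = self_attention(q_twin, k_twin, v_twin, injected_probs=art_probs)
```

Only the attention map is shared; the twin's queries and keys are discarded for that layer, while its values still carry the content of the modified prompt.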

19 of 35

Twin Image Generation

20 of 35

Mask Generation

  • Besides the image itself, a mask of the moving region is important for reliable flow prediction; we obtain it with a pretrained segmentation model
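Assuming the segmentation model returns a panoptic map (a segment index per pixel) plus a category name per segment, the moving-region mask can be formed by keeping the segments whose category belongs to a chosen set of fluid classes. The label set and function name below are hypothetical and do not reflect ODISE's actual output API; since an open-vocabulary model produces free-form category names, which names count as "fluid" is a design choice.

```python
import numpy as np

# Hypothetical set of categories treated as animatable fluid regions.
FLUID_LABELS = {"water", "waterfall", "river", "sea", "sky", "cloud", "smoke"}


def fluid_mask(segment_ids, segment_labels, fluid_labels=FLUID_LABELS):
    """Binary mask of pixels whose panoptic segment has a fluid category.

    segment_ids: (H, W) int array giving the segment index of each pixel.
    segment_labels: dict mapping segment index -> category name.
    """
    keep = [sid for sid, name in segment_labels.items() if name in fluid_labels]
    return np.isin(segment_ids, keep)
```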

ODISE

* ODISE: Open-Vocabulary Panoptic Segmentation with Text-to-Image Diffusion Models, Xu et al., CVPR 2023

21 of 35

Mask Generation

  • Boundary Artifacts

22 of 35

Mask Generation

  • Self-Attention Maps
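The slide sequence suggests that Stable Diffusion's self-attention maps are used to clean up the boundary artifacts of the segmentation mask. One plausible way to do this, assumed here rather than taken from the slides: treat each pixel's row of the self-attention map as a feature, cluster the rows with k-means, and keep a cluster when most of its pixels fall inside the coarse segmentation mask, so mask edges snap to the diffusion model's own region boundaries. The cluster count `k`, the 0.5 vote threshold, and the function names are hypothetical, and resizing the low-resolution attention map to image resolution is omitted.

```python
import numpy as np


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means on (N, D) features; returns an (N,) array of cluster ids."""
    rng = np.random.default_rng(seed)
    centers = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(iters):
        dists = ((features[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        assign = dists.argmin(axis=1)
        for c in range(k):
            if np.any(assign == c):  # leave empty clusters where they are
                centers[c] = features[assign == c].mean(axis=0)
    return assign


def refine_mask(attn, coarse_mask, k=4, vote=0.5):
    """Snap a coarse mask to clusters of a self-attention map.

    attn: (N, N) self-attention map over N = H * W tokens (rows sum to 1).
    coarse_mask: (H, W) bool mask from the segmentation model.
    A cluster is kept when more than `vote` of its pixels lie inside the mask.
    """
    h, w = coarse_mask.shape
    if attn.shape[0] != h * w:
        raise ValueError("attention map must have one row per mask pixel")
    labels = kmeans(attn, k)
    flat = coarse_mask.reshape(-1)
    refined = np.zeros(h * w, dtype=bool)
    for c in np.unique(labels):
        members = labels == c
        if flat[members].mean() > vote:
            refined[members] = True
    return refined.reshape(h, w)
```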

23 of 35

Mask Generation

  • Boundary Artifacts

24 of 35

Optical Flow Prediction

25 of 35

Video (Cinemagraph) Generation

26 of 35

Method (Revisited)

27 of 35

Text-Guided Direction Control

28 of 35

Text-Guided Direction Control

29 of 35

Baseline Comparisons (Artistic)

Compared methods: Ours, Animating Landscape, Holynski et al. 2021, SLR-SFS, CogVideo, Text2Video-Zero, VideoCrafter

Prompt: "a large waterfall falling between hills in the style Van Gogh painting during sunset, 4k"

30 of 35

Baseline Comparisons (Real)

Compared methods (input: a single image): Ours, Animating Landscape, Holynski et al. 2021, SLR-SFS

31 of 35

Metrics

32 of 35

"a large waterfall falling between hills in the style Van Gogh painting during sunset, 4k"

"super detailed color lowpoly art, northern sunset on a lake, monochrome high contrast color palette, 3 d render, digital art"

"pirate ship in turbulent ocean, ancient photo, brown tint"

33 of 35

"dense black smoke over a large grassland, after fire, vibrant colors, saturated, RGB, psychedelic colors"

"World War scene, fighter planes in the sky, clouds, cinematic ancient photo, brown tint"

"Old Victorian architecture in a Victorian valley, dramatic sky, cloudy sky, digital art, 4k, 8k"

34 of 35

Animating a Historic Painting

* The Ninth Wave (1850) - Ivan Aivazovsky

35 of 35
