1 of 37

Motion Texture

Akihito Maruya

2 of 37

The barberpole illusion demonstrates that perceived motion direction tends to align with the longer axis of the viewing aperture, overriding the true direction of the moving stimulus.

An increase in the aperture's aspect ratio amplifies its influence on the perceived direction of motion.

3 of 37

Aperture Problem

  • Imagine an infinitely long line observed at time t.
  • At time t + Δt, you observe another infinitely long line.
  • Given a specific point on the line at time t (e.g., marked with a circle),
  • There are infinitely many possible velocities that could have resulted in the observation at t + Δt.
  • This illustrates the ambiguity in determining motion direction from local measurements along extended contours.
  • How does the brain—or a computer—resolve this ambiguity to infer true motion?
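A minimal numeric sketch of this ambiguity (Python; the line orientation and normal displacement are arbitrary choices for illustration): any velocity whose component along the line's normal equals the observed shift produces exactly the same line at t + Δt.

    import numpy as np

    theta = np.deg2rad(30.0)                            # orientation of the infinite line (arbitrary)
    t_hat = np.array([np.cos(theta), np.sin(theta)])    # unit vector along the line
    n_hat = np.array([-np.sin(theta), np.cos(theta)])   # unit normal to the line
    v_perp = 1.0                                        # observed shift of the line along its normal

    # Every velocity of the form v = v_perp * n_hat + s * t_hat (any s) is consistent
    # with the observation, because sliding along an infinite line is invisible.
    for s in (-2.0, 0.0, 3.0):
        v = v_perp * n_hat + s * t_hat
        print(s, np.round(v, 3), float(v @ n_hat))      # the normal component is always 1.0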

 

 

4 of 37

How does the brain—or a computer—resolve this ambiguity to infer true motion?


5 of 37

How does the brain—or a computer—resolve this ambiguity to infer true motion? (Lucas & Kanade, 1981)


6 of 37

How does the brain—or a computer—resolve this ambiguity to infer true motion? (Lucas & Kanade, 1981)
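As a rough illustration of the least-squares idea behind Lucas & Kanade (1981), the sketch below pools the brightness-constancy constraint over a local window; the gradient arrays Ix, Iy, It are assumed to have been computed elsewhere.

    import numpy as np

    def lucas_kanade_velocity(Ix, Iy, It):
        """Estimate a single velocity (vx, vy) for a window of pixels by solving
        [Ix Iy] v = -It in the least-squares sense over the whole window."""
        A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)   # N x 2 matrix of spatial gradients
        b = -It.ravel()                                  # N temporal gradients
        v, *_ = np.linalg.lstsq(A, b, rcond=None)
        return v

Pooling gradients over a window that contains more than one contour orientation makes the system overdetermined, which is how the purely local ambiguity gets resolved.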


7 of 37

How does the brain—or a computer—resolve this ambiguity to infer global motion? (Weiss et al., 2002)


8 of 37

A Bayesian interpretation of the barberpole illusion explains our percept by incorporating a rigidity assumption.

Figure: Bayesian model predictions of the deviation from true motion as a function of aperture aspect ratio and aperture orientation.
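A hedged sketch in the spirit of Weiss et al. (2002): combine the local motion-constraint likelihoods with a zero-mean Gaussian prior that favors slow motion, and take the MAP velocity by brute-force search over a velocity grid. The function name, grid range, and noise parameters are illustrative, not the paper's values.

    import numpy as np

    def map_velocity(normals, v_perp, sigma_noise=0.1, sigma_prior=1.0):
        """normals: (N, 2) unit normals of local contour measurements;
        v_perp: (N,) observed speeds along those normals.
        Returns the velocity minimizing the negative log posterior on a coarse grid."""
        grid = np.linspace(-3.0, 3.0, 301)
        vx, vy = np.meshgrid(grid, grid)
        V = np.stack([vx.ravel(), vy.ravel()], axis=1)   # candidate velocities
        err = V @ normals.T - v_perp                     # violation of each local constraint
        neg_log_post = (err ** 2).sum(axis=1) / (2 * sigma_noise ** 2) \
                       + (V ** 2).sum(axis=1) / (2 * sigma_prior ** 2)
        return V[np.argmin(neg_log_post)]

The prior pulls the estimate toward the slowest velocity that is still roughly consistent with the local constraints, which is how the model trades the data off against its preference for slow motion.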

9 of 37

Neural Evidence Supporting the Integration of Local Motion Signals (Rust et al., 2006)

10 of 37

Structure From Motion (Tomasi & Kanade, 1992)


11 of 37

Structure From Motion (Tomasi & Kanade, 1992)

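A compact sketch of the rank-3 factorization idea from Tomasi & Kanade (1992), assuming the tracked-point measurement matrix W is already available; the metric-upgrade step that resolves the remaining affine ambiguity is omitted.

    import numpy as np

    def factorize(W):
        """W: 2F x P matrix of image coordinates of P points tracked over F frames
        (all x rows stacked above all y rows). Under orthographic projection of a
        rigid object, the centered W has rank 3 and factors into a motion matrix
        (2F x 3) and a shape matrix (3 x P)."""
        W = W - W.mean(axis=1, keepdims=True)            # remove the per-frame centroid
        U, s, Vt = np.linalg.svd(W, full_matrices=False)
        M = U[:, :3] * np.sqrt(s[:3])                    # motion factor (up to a 3x3 affine ambiguity)
        S = np.sqrt(s[:3])[:, None] * Vt[:3]             # shape factor
        return M, S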

12 of 37

3D Reconstruction Under Rigid Rotation: What Happens When Points Can't Be Tracked?

Figure: rigid rotation of two rings, and the percept if you track each point.

When the shape is symmetric, the visual system struggles to track individual points, leading to an illusory wobbling percept.

13 of 37

Many objects and scenes deform as they move.

Despite the abundance of nonrigid 3D organisms and scenes in the world, and despite the evolutionary advantage of being able to judge their actions from shape deformations, nonrigid motion perception remains an understudied phenomenon.

14 of 37

How is the non-rigid motion texture represented in the brain?

  • Non-rigid motion texture is characterized by continuously evolving shapes and fluid dynamics.
  • Despite its seemingly chaotic patterns, our visual system effortlessly distinguishes between different classes of non-rigid motion.
  • Furthermore, even though the human body is composed of relatively rigid parts, we can still infer the dynamic expressions found in dance and choreography.
  • How does the brain represent these complex non-rigid motions?

15 of 37

A Parametric Texture Model (2D texture) Based on Joint Statistics (Portilla and Simoncelli, 2000)

  • The image is decomposed using a steerable pyramid:
  • Outputs subbands representing structure at different scales and orientations (like V1 neurons)
  • From this decomposition, the algorithm computes joint statistics including:
  • Marginal statistics (mean, variance, skewness, kurtosis) of subbands
  • Auto-correlations within subbands (spatial relationships)
  • Cross-correlations between orientations and across scales
  • Synthesis by Iterative Matching

Figure: original texture and reconstructed (synthesized) texture.
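A small sketch of the kind of joint statistics listed above, assuming the subbands have already been computed by some steerable-pyramid implementation; only a handful of the statistics used by Portilla and Simoncelli (2000) are shown, and same-sized subbands (e.g., orientations within one scale) are assumed for the correlation step.

    import numpy as np
    from scipy.stats import skew, kurtosis

    def subband_statistics(subbands):
        """subbands: list of 2D arrays (e.g., one per orientation at a given scale)."""
        stats = {}
        for i, band in enumerate(subbands):
            x = band.ravel()
            stats[f"marginal_{i}"] = (x.mean(), x.var(), skew(x), kurtosis(x))
        # Cross-correlations between subband magnitudes capture how oriented
        # structure co-occurs across bands.
        mags = np.stack([np.abs(band).ravel() for band in subbands])
        stats["cross_corr"] = np.corrcoef(mags)
        return stats

Synthesis then starts from noise and iteratively adjusts the subbands until their statistics match those measured on the original texture.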

16 of 37

2D sine wave and its Fourier transform: spatial frequency is indicated by the radius from the center.

 

17 of 37

2D sine wave and its Fourier transform: orientation is represented by the angle from the center.
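A quick self-contained check of the claims on these two slides (the image size, frequency, and orientation are arbitrary): the 2D FFT of a sine grating concentrates at a radius equal to its spatial frequency and at an angle equal to its orientation.

    import numpy as np

    N, f, theta = 128, 8, np.deg2rad(30)                # image size, cycles/image, orientation
    y, x = np.mgrid[0:N, 0:N] / N
    grating = np.sin(2 * np.pi * f * (x * np.cos(theta) + y * np.sin(theta)))
    spectrum = np.abs(np.fft.fftshift(np.fft.fft2(grating)))

    row, col = np.unravel_index(np.argmax(spectrum), spectrum.shape)
    radius = np.hypot(row - N // 2, col - N // 2)       # distance of the peak from the center
    angle = np.degrees(np.arctan2(row - N // 2, col - N // 2))
    print(radius, angle)                                # radius ~ 8; angle ~ 30 (or 30 - 180)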

 

18 of 37

FFT-Based Decomposition That Preserves Every Detail—Without Distortion

Aliasing

Non-uniform representation

19 of 37

The 2D Steerable Pyramid decomposes an image's FFT into orientation- and scale-specific subbands without distortion.

20 of 37

The 2D Steerable Pyramid decomposes an image's FFT into orientation- and scale-specific subbands without distortion.

 

21 of 37

Speed as Orientation in the Spatiotemporal Frequency Domain
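The standard relation behind this slide, written out: a pattern translating rigidly at velocity (v_x, v_y) has all of its spatiotemporal Fourier energy on a plane through the origin, and the tilt of that plane encodes speed and direction.

    I(x, y, t) = I_0(x - v_x t,\, y - v_y t)
    \;\Longrightarrow\;
    \hat{I}(\omega_x, \omega_y, \omega_t) \propto \hat{I}_0(\omega_x, \omega_y)\,
    \delta(\omega_t + v_x \omega_x + v_y \omega_y)

so the spectrum lives on the plane \omega_t + v_x \omega_x + v_y \omega_y = 0, whose slope relative to the spatial-frequency plane is the speed.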

 

22 of 37

The 3D Steerable Pyramid enables distortion-free decomposition of a video’s Fourier spectrum into orientation- and scale-tuned spatiotemporal subbands.

23 of 37

The 3D Steerable Pyramid enables distortion-free decomposition of a video’s Fourier spectrum into orientation- and scale-tuned spatiotemporal subbands.

Figure: high-pass, low-pass, and angle (orientation) masks applied to the video's Fourier spectrum.
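A rough 2D construction of such masks in numpy. The real steerable pyramid uses smooth raised-cosine radial ramps and a specific family of angular functions; the hard radial cutoff and squared-cosine angular windows below are simplifications, and the 3D version adds a temporal-frequency axis.

    import numpy as np

    def frequency_masks(N, cutoff=0.25, n_orient=4):
        """Low-pass / high-pass split at `cutoff` (cycles/pixel) plus angular
        (orientation) windows that sum to 1 across orientations."""
        fy, fx = np.meshgrid(np.fft.fftfreq(N), np.fft.fftfreq(N), indexing="ij")
        radius, angle = np.hypot(fx, fy), np.arctan2(fy, fx)
        lowpass = (radius <= cutoff).astype(float)
        highpass = 1.0 - lowpass
        angle_masks = [(2.0 / n_orient) * np.cos(angle - k * np.pi / n_orient) ** 2
                       for k in range(n_orient)]
        return lowpass, highpass, angle_masks

    # A subband is obtained by multiplying the image/video FFT by (radial mask x angle mask)
    # and inverting the transform.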

24 of 37

The 3D Steerable Pyramid enables distortion-free decomposition of a video’s Fourier spectrum into orientation- and scale-tuned spatiotemporal subbands.

25 of 37

The 3D Steerable Pyramid enables distortion-free decomposition of a video’s Fourier spectrum into orientation- and scale-tuned spatiotemporal subbands.

26 of 37

The Motion Texture Model synthesizes a video that is physically different from the original but perceptually similar.

  • The video is decomposed using a steerable pyramid:
  • Outputs subbands representing structure at different scales and spatiotemporal orientations (like V1 direction selective neurons)
  • From this decomposition, the algorithm computes joint statistics including:
  • Marginal statistics (mean, variance, skewness, kurtosis) of subbands
  • Auto-correlations within subbands (spatiotemporal relationships)
  • Cross-correlations between spatiotemporal orientations and across scales
  • Synthesis by Iterative Matching

Figure: original motion texture and its synthesized metamer.

27 of 37

How are different classes of motion textures represented in the model?


Three motion texture classes—Fire, Wave, and Human Wave Dance—950 videos per class with random spatial rotations.

28 of 37

How are different classes of motion textures represented in the model?

  • The representational vector is high-dimensional.
  • To reduce dimensionality, I performed Principal Component Analysis (PCA).
  • The first two principal components explain approximately 60% of the total variance.
  • When projected onto these components, the vectors for each video form two clearly separated clusters (Fire and Wave).

Figure: explained variance per principal component; projection of each video onto PC1 and PC2, showing the Wave and Fire clusters.
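A minimal sketch of this dimensionality-reduction step with scikit-learn; the array shapes and random stand-in features below are placeholders for the model's actual statistics vectors.

    import numpy as np
    from sklearn.decomposition import PCA

    rng = np.random.default_rng(0)
    X = rng.standard_normal((2850, 1000))        # stand-in: one row per video, one column per statistic
    X = (X - X.mean(axis=0)) / X.std(axis=0)     # standardize features before PCA

    pca = PCA(n_components=2).fit(X)
    scores = pca.transform(X)                    # PC1 / PC2 coordinates used for the scatter plots
    print(pca.explained_variance_ratio_)         # fraction of variance captured by each component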

29 of 37

How are different classes of motion textures represented in the model?

Figure: Human Wave Dance samples projected onto the same PC1/PC2 space as the Wave and Fire clusters.

30 of 37

How are different classes of motion textures represented in the model?

Figure: Wave and Fire clusters in the PC1/PC2 space.

31 of 37

How are different classes of motion textures represented in the model?

  • The dotted line represents the decision boundary, and the arrow points toward the region classified as ocean wave.
  • Most human wave dance samples fall on the side closer to natural wave motion.
  • An obvious outlier in the Fire class pulls the decision boundary; without it, a larger portion of the human wave dance samples would likely be classified as ocean wave.

Figure: decision boundary separating the Wave and Fire clusters in the PC1/PC2 space, with Human Wave Dance samples overlaid.

32 of 37

The model representation appears to align with human perception, though further analysis is needed to confirm this resemblance.

Figure: proportion classified as "Wave" by a Fisher discriminant classifier for the Fire, Wave, and Human Wave Dance classes, with the most wave-like and misclassified examples indicated.
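A sketch of the classification step using scikit-learn's linear discriminant (Fisher's criterion for two classes); the Gaussian blobs below are stand-ins for the real PC1/PC2 scores.

    import numpy as np
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    rng = np.random.default_rng(0)
    fire = rng.normal([2.0, 0.0], 0.5, size=(950, 2))     # stand-in Fire scores
    wave = rng.normal([-2.0, 0.0], 0.5, size=(950, 2))    # stand-in Wave scores
    X = np.vstack([fire, wave])
    y = np.array([0] * 950 + [1] * 950)                   # 0 = Fire, 1 = Wave

    lda = LinearDiscriminantAnalysis().fit(X, y)
    dance = rng.normal([-1.0, 0.5], 0.7, size=(950, 2))   # stand-in Human Wave Dance scores
    print((lda.predict(dance) == 1).mean())               # proportion classified as "Wave"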

33 of 37

Why does speed have such a profound effect on our perception of non-rigidity? 

34 of 37

Each video, whether exhibiting rigid or non-rigid motion, is decomposed into five distinct speed bands.

35 of 37

Each video, whether exhibiting rigid or non-rigid motion, is decomposed into five distinct speed bands.
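One way to implement such a decomposition, sketched with plain FFT masks; the five speed-band edges below are arbitrary, and an actual implementation would use smooth spatiotemporal filters rather than hard cuts.

    import numpy as np

    def speed_band_energies(video, edges=(0.0, 0.25, 0.5, 1.0, 2.0, np.inf)):
        """Split a video's 3D power spectrum into five speed bands, where speed is the
        ratio of temporal to spatial frequency, and return the total energy per band."""
        T, H, W = video.shape
        wt = np.abs(np.fft.fftfreq(T))[:, None, None]
        wy = np.fft.fftfreq(H)[None, :, None]
        wx = np.fft.fftfreq(W)[None, None, :]
        speed = wt / (np.hypot(wx, wy) + 1e-8)            # cycles/frame over cycles/pixel ~ pixels/frame
        power = np.abs(np.fft.fftn(video)) ** 2
        return [power[(speed >= lo) & (speed < hi)].sum()
                for lo, hi in zip(edges[:-1], edges[1:])]

    energies = speed_band_energies(np.random.rand(16, 64, 64))   # toy example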

36 of 37

Non-rigid motion elicits a slightly higher mean response in the higher-speed filter bands.

  • Rigid motion elicits slightly higher mean responses in lower-speed filter bands.
  • Non-rigid motion shows slightly higher mean responses in higher-speed bands.
  • Note: the non-rigid videos include rigid background components.
  • More controlled data is needed for definitive conclusions.

Figure: mean response as a function of speed filter index (slow to fast) for rigid and non-rigid motion.

37 of 37

Slow-speed bands offer stable features across lighting and distance for rigid motion, while high-speed bands convey dynamic texture cues like direction and intensity.