Motion Texture
Akihito Maruya
The barberpole illusion demonstrates that perceived motion direction tends to align with the longer axis of the viewing aperture, overriding the true direction of the moving stimulus.
An increase in the aperture's aspect ratio amplifies its influence on the perceived direction of motion.
Aperture Problem
How does the brain, or a computer, resolve this ambiguity to infer true motion? (Lucas & Kanade, 1981)
How does the brain, or a computer, resolve this ambiguity to infer global motion? (Weiss et al., 2002)
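The ambiguity can be made concrete with a small least-squares sketch in the style of Lucas & Kanade. It is an illustration under stated assumptions, not the method used in these slides: the function name `lucas_kanade_patch`, the synthetic gradients, and the `rcond` cutoff are all hypothetical choices. A patch whose gradients share a single orientation yields only the velocity component normal to the edges, while a patch with two orientations recovers the full velocity.

```python
import numpy as np

def lucas_kanade_patch(Ix, Iy, It):
    """Least-squares velocity for one patch from brightness-constancy
    constraints Ix*u + Iy*v + It = 0 (one constraint per pixel)."""
    A = np.stack([Ix.ravel(), Iy.ravel()], axis=1)    # N x 2 gradient matrix
    b = -It.ravel()
    eigvals = np.linalg.eigvalsh(A.T @ A)             # structure tensor spectrum, ascending
    v, *_ = np.linalg.lstsq(A, b, rcond=1e-10)        # minimum-norm solution
    return v, eigvals

rng = np.random.default_rng(0)
v_true = np.array([1.0, 0.0])                         # assumed true velocity

# Patch 1: all gradients point along one direction n (a grating seen through an aperture).
n = np.array([np.cos(np.pi / 4), np.sin(np.pi / 4)])
g = rng.normal(size=100)
Ix, Iy = g * n[0], g * n[1]
It = -(Ix * v_true[0] + Iy * v_true[1])
v_aperture, eig_aperture = lucas_kanade_patch(Ix, Iy, It)

# Patch 2: add gradients along a second direction m (a corner-like patch).
m = np.array([np.cos(-np.pi / 4), np.sin(-np.pi / 4)])
h = rng.normal(size=100)
Ix2 = np.concatenate([Ix, h * m[0]])
Iy2 = np.concatenate([Iy, h * m[1]])
It2 = -(Ix2 * v_true[0] + Iy2 * v_true[1])
v_corner, eig_corner = lucas_kanade_patch(Ix2, Iy2, It2)

print("single orientation:", v_aperture)  # only the component of v_true along n
print("two orientations:  ", v_corner)    # full velocity
```

In the single-orientation case the structure tensor has one near-zero eigenvalue, which is the algebraic signature of the aperture problem.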
A Bayesian interpretation of the barberpole illusion explains our percept by incorporating a rigidity assumption.
Figure (Bayesian Model): Deviation From True Motion as a function of Aspect Ratio and Aperture Orientation.
Neural Evidence Supporting the Integration of Local Motion Signals (Rust et al., 2006)
Structure From Motion (Tomasi & Kanade, 1992)
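A minimal sketch of the factorization idea behind Tomasi & Kanade (1992), assuming orthographic projection and perfectly tracked points: the centered measurement matrix of image coordinates has rank 3 for a rigid object, so a truncated SVD separates motion from shape. The synthetic point cloud, the rotation schedule, and all variable names are assumptions made for illustration, and the metric upgrade step (enforcing orthonormal camera axes) is omitted, so the shape is recovered only up to an affine transformation.

```python
import numpy as np

rng = np.random.default_rng(0)
n_points, n_frames = 20, 8
shape_3d = rng.normal(size=(3, n_points))            # hypothetical rigid point cloud

def rot_y(angle):
    """Rotation about the vertical axis."""
    c, s = np.cos(angle), np.sin(angle)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])

# Orthographic projection keeps the x and y rows of each rotated point cloud.
rows = [rot_y(0.2 * f)[:2] @ shape_3d for f in range(n_frames)]
W = np.vstack(rows)                                   # (2 * n_frames) x n_points
W_centered = W - W.mean(axis=1, keepdims=True)        # remove per-row centroid

U, s, Vt = np.linalg.svd(W_centered, full_matrices=False)
motion_hat = U[:, :3] * np.sqrt(s[:3])                # affine camera estimate
shape_hat = np.sqrt(s[:3])[:, None] * Vt[:3]          # affine shape estimate

print("singular values:", np.round(s[:5], 6))         # only the first three are non-negligible
```

The factorization depends on knowing which image point corresponds to which across frames, which is exactly what fails when points cannot be tracked.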
3D Reconstruction Under Rigid Rotation: What Happens When Points Can't Be Tracked?
Rigid rotation of two rings
If you track each point
When the shape is symmetric, the visual system struggles to track individual points, leading to an illusory wobbling percept.
Many objects and scenes deform as they move.
Despite the abundance of nonrigid 3D organisms and scenes in the world, and despite the evolutionary advantage of being able to judge their actions from shape deformations, nonrigid motion perception is an understudied phenomenon.
How is the non-rigid motion texture represented in the brain?
A Parametric Texture Model (2D texture) Based on Joint Statistics (Portilla and Simoncelli, 2000)
Reconstructed Texture
Original Texture
2D sine wave and its Fourier transform: spatial frequency is indicated by the radius from the center.
2D sine wave and its Fourier transform: orientation is represented by the angle from the center.
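The radius and angle reading of the Fourier plane can be checked numerically. The sketch below builds a sine grating with integer frequencies (an assumption chosen so that the spectrum has exact peaks with no leakage) and reads spatial frequency and orientation from the peak location. The image size and frequency values are illustrative.

```python
import numpy as np

N = 128
fx, fy = 6, 8                                   # cycles per image along x and y (assumed values)
y, x = np.mgrid[0:N, 0:N]
img = np.sin(2 * np.pi * (fx * x + fy * y) / N)

# Centered amplitude spectrum: zero frequency sits at index N // 2.
spec = np.fft.fftshift(np.abs(np.fft.fft2(img)))
ky, kx = np.unravel_index(np.argmax(spec), spec.shape)
u, v = kx - N // 2, ky - N // 2                 # peak position relative to the center

radius = np.hypot(u, v)                         # spatial frequency in cycles per image
angle = np.rad2deg(np.arctan2(v, u)) % 180      # orientation, modulo 180 degrees

print(radius, angle)
```

A real-valued grating produces two mirror-symmetric peaks, so the angle is reported modulo 180 degrees and either peak gives the same answer.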
FFT-Based Decomposition That Preserves Every Detail—Without Distortion
Aliasing
Non-uniform representation
The 2D Steerable Pyramid decomposes an image's FFT into orientation- and scale-specific subbands without distortion.
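One way to see why such a decomposition can be distortion-free: if the squared frequency-domain masks sum to one at every frequency, then applying the masks twice (once for analysis, once for synthesis) returns the original spectrum. The sketch below is a single-level illustration under assumptions, not the pyramid implementation used in these slides: the cutoff, the one-octave raised-cosine transition, the four orientation bands, and the helper names `polar_grid`, `radial_masks`, and `angle_masks` are all hypothetical.

```python
import numpy as np
from math import comb

def polar_grid(n):
    """Radius and angle of each centered FFT bin (frequencies in cycles/sample)."""
    f = np.fft.fftshift(np.fft.fftfreq(n))
    fx, fy = np.meshgrid(f, f)
    return np.hypot(fx, fy), np.arctan2(fy, fx)

def radial_masks(r, cutoff=0.25):
    """Low/high-pass pair with a one-octave transition; low**2 + high**2 == 1."""
    with np.errstate(divide="ignore"):
        t = np.log2(r / cutoff)                 # -inf at the DC bin
    t = np.clip(t, -1.0, 0.0)                   # -1: fully low-pass, 0: fully high-pass
    return np.cos(np.pi / 2 * (t + 1)), np.sin(np.pi / 2 * (t + 1))

def angle_masks(theta, n_orient=4):
    """|cos|^(K-1) orientation masks, scaled so their squares sum to one."""
    m = n_orient - 1
    alpha2 = 4.0 ** m / (n_orient * comb(2 * m, m))
    return [np.sqrt(alpha2) * np.abs(np.cos(theta - np.pi * k / n_orient)) ** m
            for k in range(n_orient)]

rng = np.random.default_rng(0)
img = rng.normal(size=(64, 64))                 # stand-in image
r, theta = polar_grid(64)
low, high = radial_masks(r)
masks = [low] + [high * a for a in angle_masks(theta)]

spectrum = np.fft.fftshift(np.fft.fft2(img))
subbands = [m * spectrum for m in masks]                        # analysis
recon_spectrum = sum(m * s for m, s in zip(masks, subbands))    # synthesis
recon = np.fft.ifft2(np.fft.ifftshift(recon_spectrum)).real
tiling = sum(m ** 2 for m in masks)

print(np.abs(tiling - 1).max(), np.abs(recon - img).max())
```

A full pyramid would apply the same split recursively to a subsampled low-pass band; that recursion is left out here to keep the tiling property easy to inspect.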
Speed as Orientation in the Spatiotemporal Frequency Domain
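A one-dimensional pattern translating at constant speed concentrates its energy on a line through the origin of the space-time frequency plane, and the slope of that line gives the speed. The sketch below checks this on a drifting grating. The grid size, spatial frequency, and speed are assumed values chosen so that the spectral peak falls exactly on an FFT bin.

```python
import numpy as np

N = 64                      # samples along both x and t
kx = 4                      # spatial frequency in cycles per N samples
speed = 2.0                 # pixels per frame (assumed)
t, x = np.mgrid[0:N, 0:N]   # axis 0 is time, axis 1 is space
video = np.sin(2 * np.pi * kx * (x - speed * t) / N)

spec = np.fft.fftshift(np.abs(np.fft.fft2(video)))
it, ix = np.unravel_index(np.argmax(spec), spec.shape)
ft_peak, fx_peak = it - N // 2, ix - N // 2     # peak in (temporal, spatial) frequency

# For I(x, t) = f(x - v t), energy lies on ft = -v * fx.
speed_est = -ft_peak / fx_peak

print(speed_est)
```

Both mirror-symmetric peaks of the real-valued signal give the same ratio, so the estimate does not depend on which one `argmax` returns.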
The 3D Steerable Pyramid enables distortion-free decomposition of a video’s Fourier spectrum into orientation- and scale-tuned spatiotemporal subbands.
High-pass Mask
Low-pass Mask
Angle Mask
The Motion Texture Model synthesizes a video that is physically different from the original but perceptually similar.
Metamer
Original Texture
How are different classes of motion textures represented in the model?
Three motion texture classes (Fire, Wave, and Human Wave Dance), 950 videos per class with random spatial rotations.
Figure (PCA of model representations): Explained Variance per Principal Component, and PC1 vs. PC2 plots for the Wave, Fire, and Human Wave Dance classes.
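For readers who want to reproduce this kind of plot, a minimal PCA sketch via the SVD follows. The feature vectors here are random stand-ins, not the model statistics from these slides: the class sizes, the dimensionality, and the offset between classes are all assumptions made so the example runs on its own.

```python
import numpy as np

def pca(X, n_components=2):
    """Project rows of X onto the top principal components.
    Returns the scores and the explained-variance ratio of every component."""
    Xc = X - X.mean(axis=0)                          # center each feature
    U, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    explained = s ** 2 / np.sum(s ** 2)              # sorted from largest to smallest
    scores = Xc @ Vt[:n_components].T
    return scores, explained

rng = np.random.default_rng(0)
# Hypothetical stand-in statistics: 50 "wave" and 50 "fire" videos, 10 features each.
wave = rng.normal(size=(50, 10))
fire = rng.normal(size=(50, 10))
fire[:, 0] += 6.0                                    # assumed class difference on one feature
X = np.vstack([wave, fire])

scores, explained = pca(X)
print("explained variance (first 3):", np.round(explained[:3], 3))
print("PC1 class means:", scores[:50, 0].mean(), scores[50:, 0].mean())
```

With real data, each row of `X` would hold the model's statistics for one video, and the PC1 and PC2 columns of `scores` would be the plotted coordinates.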
The model representation appears to align with human perception, though further analysis is needed to confirm this resemblance.
Figure (Fisher Discriminant Classifier): Proportion Classified as "Wave" for Fire, Wave, and Human Wave Dance. Annotations: Most Wave-Like, Misclassified.
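A two-class Fisher discriminant can be sketched in a few lines: project onto the direction that maximizes between-class separation relative to within-class scatter, then threshold at the midpoint of the projected class means. The data below are synthetic stand-ins and the function names `fit_fisher` and `predict` are hypothetical. How the classifier in these slides was trained and thresholded is not stated, so the midpoint rule is an assumption.

```python
import numpy as np

def fit_fisher(X0, X1):
    """Fisher direction w = Sw^{-1} (mu1 - mu0) and a midpoint threshold."""
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter: sum of the two unnormalized class covariances.
    Sw = (np.cov(X0, rowvar=False) * (len(X0) - 1)
          + np.cov(X1, rowvar=False) * (len(X1) - 1))
    w = np.linalg.solve(Sw, mu1 - mu0)
    threshold = 0.5 * (w @ mu0 + w @ mu1)
    return w, threshold

def predict(X, w, threshold):
    """Return 1 where a sample falls on the class-1 side of the threshold."""
    return (X @ w > threshold).astype(int)

rng = np.random.default_rng(0)
offset = np.array([3.0, 0.0, 0.0, 0.0, 0.0])         # assumed class separation
fire = rng.normal(size=(200, 5))                     # stand-in for class 0
wave = rng.normal(size=(200, 5)) + offset            # stand-in for class 1

w, threshold = fit_fisher(fire, wave)
prop_wave_given_fire = predict(fire, w, threshold).mean()
prop_wave_given_wave = predict(wave, w, threshold).mean()
print(prop_wave_given_fire, prop_wave_given_wave)
```

Applying `predict(...).mean()` to videos from a third class gives the kind of "proportion classified as Wave" quantity reported in the figure.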
Why does speed have such a profound effect on our perception of non-rigidity?
Each video, whether exhibiting rigid or non-rigid motion, is decomposed into five distinct speed bands.
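As a rough illustration of what a speed-band decomposition measures, the sketch below bins the spectral energy of a space-time signal by the ratio of temporal to spatial frequency. It is a simplified stand-in for speed-tuned pyramid filters, not the decomposition used in these slides: the hard-edged bins, the band edges, the one-dimensional space axis, and the function name `speed_band_energy` are all assumptions.

```python
import numpy as np

def speed_band_energy(video, edges):
    """Mean spectral energy of a (time, space) array in each speed band,
    where speed = |temporal frequency| / |spatial frequency|."""
    nt, nx = video.shape
    energy = np.abs(np.fft.fft2(video)) ** 2
    FT, FX = np.meshgrid(np.fft.fftfreq(nt), np.fft.fftfreq(nx), indexing="ij")
    valid = FX != 0                                   # speed is undefined at fx = 0
    speed = np.full(energy.shape, -1.0)               # -1 marks excluded bins
    speed[valid] = np.abs(FT[valid]) / np.abs(FX[valid])
    means = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        band = (speed >= lo) & (speed < hi)
        means.append(energy[band].mean())
    return np.array(means)

N, kx, true_speed = 64, 4, 3.0                        # assumed test stimulus
t, x = np.mgrid[0:N, 0:N]
video = np.sin(2 * np.pi * kx * (x - true_speed * t) / N)

edges = [0.0, 0.5, 1.0, 2.0, 4.0, 8.0]                # five bands, slow to fast (assumed)
band_means = speed_band_energy(video, edges)
print(np.argmax(band_means))                          # index of the band containing speed 3
```

A grating drifting at 3 pixels per frame puts nearly all of its energy in the fourth band, which spans speeds from 2 to 4.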
• Rigid motion elicits slightly higher mean responses in lower-speed filter bands.
• Non-rigid motion elicits slightly higher mean responses in higher-speed filter bands.
• Note: Non-rigid motion includes rigid background components.
• More controlled data is needed for definitive conclusions.
Figure: Mean Response vs. Speed Filter Index (Slow to Fast) for Rigid and Non-rigid motion.
Slow-speed bands offer stable features across lighting and distance for rigid motion, while high-speed bands convey dynamic texture cues like direction and intensity.