1 of 11

Saliency-aware Stereoscopic Video Retargeting

Hassan Imani, Md Baharul Islam, Lai-Kuan Wong

2 of 11

Stereoscopic Video Retargeting

3 of 11

Saliency-aware Stereoscopic Video Retargeting

https://github.com/z65451/SVR/

SVT: Stereo Video Transformer

CoSD: co-saliency detection

4 of 11

Stereo Video Transformer (SVT) architecture

We factorize all parts of the encoder into spatial, temporal, and disparity channels.

MHDPA: multi-head dot-product attention

LN: layer normalization

MLP: multilayer perceptron

Attention-based frameworks are a natural choice for modeling long-range contextual relations in video.
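The factorization described on this slide can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the token layout `(views, frames, spatial tokens, embedding)`, the pre-norm residual arrangement, the order spatial, temporal, disparity, and all function and parameter names are assumptions made for illustration. Each attention step applies MHDPA along a single axis of the token grid, so tokens are mixed only within that axis.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    """LN over the embedding axis (no learned scale/shift in this sketch)."""
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gelu(x):
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))

def mhdpa(x, wq, wk, wv, wo, num_heads):
    """Multi-head dot-product attention over the second-to-last axis of x (..., N, D)."""
    *lead, n, d = x.shape
    hd = d // num_heads

    def split(t):
        # (..., N, D) -> (..., H, N, hd)
        return np.swapaxes(t.reshape(*lead, n, num_heads, hd), -2, -3)

    q, k, v = split(x @ wq), split(x @ wk), split(x @ wv)
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(hd)      # (..., H, N, N)
    out = softmax(scores) @ v                              # (..., H, N, hd)
    out = np.swapaxes(out, -2, -3).reshape(*lead, n, d)    # merge heads
    return out @ wo

def attend_along(x, axis, attn_params, num_heads):
    """Pre-norm residual attention restricted to one axis of the token grid."""
    y = np.moveaxis(x, axis, -2)
    y = y + mhdpa(layer_norm(y), *attn_params, num_heads=num_heads)
    return np.moveaxis(y, -2, axis)

def factorized_block(x, p, num_heads=4):
    """x: (views, frames, spatial_tokens, dim). Axis order is an assumption."""
    for name, axis in (("spatial", 2), ("temporal", 1), ("disparity", 0)):
        x = attend_along(x, axis, p[name], num_heads)
    return x + gelu(layer_norm(x) @ p["w1"]) @ p["w2"]     # MLP with residual

def init_params(rng, d, hidden):
    def attn():
        return tuple(rng.standard_normal((d, d)) * 0.1 for _ in range(4))
    return {"spatial": attn(), "temporal": attn(), "disparity": attn(),
            "w1": rng.standard_normal((d, hidden)) * 0.1,
            "w2": rng.standard_normal((hidden, d)) * 0.1}

rng = np.random.default_rng(0)
params = init_params(rng, d=32, hidden=64)
x = rng.standard_normal((2, 3, 16, 32))   # 2 views, 3 frames, 16 tokens, dim 32
out = factorized_block(x, params)         # same shape as x
```

Restricting each attention step to one axis keeps the attention matrices small (16×16, 3×3, and 2×2 here) compared with joint attention over all 96 tokens.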

5 of 11

Qualitative results of stereo video retargeting

Qualitative results of stereo video retargeting on randomly selected frames from the KITTI stereo 2015 (first row) and 2012 (second row) datasets, with the horizontal video size reduced by 50%.

[1] Shai Avidan and Ariel Shamir. Seam carving for content-aware image resizing. In ACM SIGGRAPH 2007 papers, pages 10–es. 2007.

[2] Chuning Zhu. Fast video retargeting based on seam carving with parental labeling. arXiv preprint arXiv:1903.03180, 2019.

Columns, left to right: original frame, linear scaling, manual cropping, seam carving [1], fast video retargeting [2], ours.
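For context on the seam carving baseline [1], here is a minimal single-image sketch in NumPy. It covers only the classic per-image algorithm (gradient energy, dynamic-programming seam search, seam removal), not the video variant in [2] and not the method presented in these slides. The function names and the choice of energy function are assumptions for illustration.

```python
import numpy as np

def energy_map(gray):
    """Gradient-magnitude energy |dI/dx| + |dI/dy| of a 2D grayscale image."""
    gy, gx = np.gradient(gray.astype(float))
    return np.abs(gx) + np.abs(gy)

def find_vertical_seam(energy):
    """Return, for each row, the column index of the minimum-energy connected seam."""
    h, w = energy.shape
    cost = energy.astype(float).copy()
    # Forward pass: cumulative minimum cost of reaching each pixel from the top row.
    for i in range(1, h):
        prev = cost[i - 1]
        left = np.concatenate(([np.inf], prev[:-1]))
        right = np.concatenate((prev[1:], [np.inf]))
        cost[i] += np.minimum(np.minimum(left, prev), right)
    # Backward pass: trace the cheapest path from the bottom row upward.
    seam = np.empty(h, dtype=int)
    seam[-1] = int(np.argmin(cost[-1]))
    for i in range(h - 2, -1, -1):
        j = seam[i + 1]
        lo, hi = max(j - 1, 0), min(j + 2, w)
        seam[i] = lo + int(np.argmin(cost[i, lo:hi]))
    return seam

def remove_vertical_seam(gray, seam):
    """Delete one pixel per row, shrinking the width by one."""
    h, w = gray.shape
    mask = np.ones((h, w), dtype=bool)
    mask[np.arange(h), seam] = False
    return gray[mask].reshape(h, w - 1)

def seam_carve_width(gray, new_width):
    """Repeatedly remove the lowest-energy vertical seam until new_width is reached."""
    out = gray.astype(float)
    while out.shape[1] > new_width:
        out = remove_vertical_seam(out, find_vertical_seam(energy_map(out)))
    return out

# Usage: halve the width of a small synthetic frame.
frame = np.random.default_rng(0).random((6, 8))
carved = seam_carve_width(frame, new_width=4)   # shape (6, 4)
```

Applying this independently to each frame and each view ignores temporal and stereo consistency, which is the gap that video- and stereo-aware retargeting methods aim to address.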

6 of 11

Qualitative Results

Qualitative results of retargeting on randomly selected frames from the KITTI stereo 2015 dataset at the 30% and 20% horizontal size reduction settings.

20%

30%

7 of 11

Qualitative Results

Retargeting results with their respective depth maps. Left to right: the input video frame, then its retargeted results at horizontal resizing settings of 50%, 30%, 20%, and 150% (enlargement), respectively.

8 of 11

Quantitative Results

Comparison based on bidirectional similarity

Comparison based on VGG19 feature difference
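As a rough illustration of a patch-based bidirectional similarity measure, the sketch below sums a completeness term (every source patch should have a close match in the target) and a coherence term (every target patch should have a close match in the source). The slides do not give the exact formulation used, so the patch size, stride, squared-difference patch distance, and function names are all assumptions. Lower values mean the two images are more similar.

```python
import numpy as np

def extract_patches(img, size, stride):
    """Collect flattened size x size patches from a 2D grayscale image."""
    h, w = img.shape
    patches = [img[i:i + size, j:j + size].ravel()
               for i in range(0, h - size + 1, stride)
               for j in range(0, w - size + 1, stride)]
    return np.asarray(patches, dtype=float)

def bidirectional_distance(source, target, size=4, stride=2):
    """Completeness + coherence, each the mean nearest-patch squared distance."""
    ps = extract_patches(source, size, stride)
    pt = extract_patches(target, size, stride)
    # Pairwise sum of squared differences between all source and target patches.
    d = ((ps[:, None, :] - pt[None, :, :]) ** 2).sum(axis=-1)
    completeness = d.min(axis=1).mean()   # source patches explained by target
    coherence = d.min(axis=0).mean()      # target patches explained by source
    return completeness + coherence

# Usage: compare a frame with itself and with a cropped version.
src = np.random.default_rng(0).random((12, 16))
same = bidirectional_distance(src, src, stride=1)            # identical images
cropped = bidirectional_distance(src, src[:, :8], stride=1)  # crop loses content
```

With a cropped target, the coherence term is zero (every target patch exists in the source) while the completeness term penalizes the content that was discarded.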

9 of 11

Ablation study

Comparison of the similarity between the input and retargeted videos based on the VGG19 features.

- w/o CoSD: without CoSD saliency detection
- w/o Trans: without the Transformer block
- With all of the blocks

10 of 11

Ablation study

Performance comparison of our model without using CoSD (w/o CoSD), without using the SVT (w/o Trans), and with all modules (ours). Videos are selected from the KITTI stereo 2015 dataset.

11 of 11

Thanks

Do you have any questions?

Hassan Imani <hassan.imani1987@gmail.com>

Baharul Islam <bislam.eng@gmail.com>

Wong Lai Kuan <lkwong@mmu.edu.my>
