1 of 29

Understanding How Deaf and Hard of Hearing Viewers Visually Explore Captioned Live TV News

Akhter Al Amin, Saad Hassan, Sooyeon Lee, Matt Huenerfauth

W4A’23: 20th International Web for All Conference, April 2023, Austin, Texas

2 of 29

Image Credit: Proxima Studio

3 of 29

Image Credit: Sylvain Pedneault and Mike Liao

4 of 29

Caption Occlusion

5 of 29

Prior Work on Caption Placement and Occlusion

SIGDOC ’13

UAHCI ’21

W4A ’21

CHI ’22

6 of 29

Eye-tracking Study with 19 DHH Participants

16 D/deaf and 3 hard of hearing

Watch captioned TV news ~3 hrs/week

Screening questions: "Do you identify as Deaf or Hard of Hearing?" AND "Do you use captioning when viewing videos or television?"

Mean age: 27.33 years (SD = 6.46)

7 of 29

Stimuli Preparation and Annotation

Stimuli Video Preparation

Reviewed 100 video samples from 15 TV channels and selected a total of 28 video stimuli from 9 TV channels.

Created a top-third and a bottom-third caption version of each stimulus, giving a total of 56 videos.

Caption style: Arial, font size 14, white text on a black background; 3-6 seconds of latency.

Area of Interest Annotation

Annotated the location and timing of each information region that appeared on the screen.

A researcher used rectangular boxes to annotate areas of interest.

Two researchers verified that each rectangular box fit its information region tightly.

Restricted ourselves to videos with information regions that are common in the TV news genre.

Avoided contentious or emotionally disturbing topics, which could have affected participants’ preferences.

Unlike previous studies, we placed captions so that they did not block other information regions.
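To illustrate this placement constraint, the Python sketch below checks whether a candidate caption rectangle overlaps any annotated information region. It is not the procedure used in the study: the Box class, the blocked_regions function, the 1920x1080 frame size, and all coordinates are hypothetical, and boxes are assumed to be (x, y, width, height) in pixels with the origin at the top-left corner.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned rectangle in pixels; (x, y) is the top-left corner (assumed layout)."""
    x: float
    y: float
    w: float
    h: float

    def overlaps(self, other: "Box") -> bool:
        # Rectangles overlap unless one lies entirely left/right of, or above/below, the other.
        return not (
            self.x + self.w <= other.x
            or other.x + other.w <= self.x
            or self.y + self.h <= other.y
            or other.y + other.h <= self.y
        )


def blocked_regions(caption: Box, regions: dict[str, Box]) -> list[str]:
    """Names of the information regions that the caption box would occlude."""
    return [name for name, box in regions.items() if caption.overlaps(box)]


# Hypothetical 1920x1080 frame with two annotated regions.
regions = {
    "speaker_face": Box(800, 150, 300, 350),
    "scrolling_news": Box(0, 1000, 1920, 80),
}
caption_top = Box(200, 20, 1520, 120)      # lies in the top third (y < 360)
caption_bottom = Box(200, 860, 1520, 160)  # lies in the bottom third, over the ticker
print(blocked_regions(caption_top, regions))     # []
print(blocked_regions(caption_bottom, regions))  # ['scrolling_news']
```

A placement tool could try several candidate positions per frame and keep one for which blocked_regions returns an empty list.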

For annotation, we extracted the individual frames from each 30-frames-per-second video and annotated every frame separately.

For each frame, we drew a rectangle around each information region.

After one researcher performed this initial annotation, two other researchers reviewed every frame to ensure that each rectangular box fully contained its information region while remaining as tight as possible.

The two researchers performed this task together and discussed their work.

More details can be found in the paper.
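Per-frame annotations make it possible to map each gaze sample to an information region. The following is a minimal sketch, assuming annotations are stored as a dictionary from frame index to named boxes and that gaze timestamps are measured in seconds from the start of the video; the names frame_index, region_at, and Annotations and the data layout are hypothetical, not the study's tooling.

```python
import math
from typing import Dict, NamedTuple, Optional

FPS = 30  # the stimuli were 30-frames-per-second videos


class Box(NamedTuple):
    x: float  # top-left corner, pixels
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


# Hypothetical layout: frame index -> {region name: box drawn on that frame}.
Annotations = Dict[int, Dict[str, Box]]


def frame_index(t_seconds: float, fps: int = FPS) -> int:
    """Frame on screen at time t (seconds since the video started)."""
    return math.floor(t_seconds * fps)


def region_at(annotations: Annotations, t_seconds: float, px: float, py: float) -> Optional[str]:
    """Annotated region under a gaze point at time t, or None if the gaze is elsewhere.

    If boxes overlap, the first matching region in insertion order is returned.
    """
    for name, box in annotations.get(frame_index(t_seconds), {}).items():
        if box.contains(px, py):
            return name
    return None


annotations: Annotations = {15: {"logo": Box(1700, 40, 180, 90)}}
print(region_at(annotations, 0.5, 1750, 60))  # logo
```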

8 of 29

Tobii Pro Nano remote eye tracker

65 cm

Note-taking

Secondary screen with live gaze tracking

9 of 29

RQ1: Gaze Behavior vs. Subjective Numeric Ratings

Mean Proportional Fixation Time

Subjective Numeric Ratings
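Proportional fixation time can be defined in more than one way. The Python sketch below assumes one definition: for each participant, the fixation time on a region divided by that participant's total fixation time, averaged across participants. The denominator, the tuple format, and the function name are assumptions; see the paper for the definition actually used.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Optional, Tuple

# (participant id, region name or None if outside every region, duration in ms)
Fixation = Tuple[str, Optional[str], float]


def proportional_fixation_time(fixations: List[Fixation]) -> Dict[str, float]:
    """Mean across participants of each region's share of total fixation time.

    Assumption: the denominator is all of a participant's fixation time,
    including fixations that fall outside every annotated region.
    """
    total: Dict[str, float] = defaultdict(float)
    by_region: Dict[str, Dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for pid, region, duration in fixations:
        total[pid] += duration
        if region is not None:
            by_region[pid][region] += duration

    regions = sorted({r for per_pid in by_region.values() for r in per_pid})
    result: Dict[str, float] = {}
    for region in regions:
        # A participant who never fixated on the region contributes a share of 0.
        shares = [by_region[pid][region] / t for pid, t in total.items() if t > 0]
        if shares:
            result[region] = mean(shares)
    return result


fixations = [
    ("P1", "speaker_face", 600), ("P1", "logo", 200), ("P1", None, 200),
    ("P2", "speaker_face", 300), ("P2", "logo", 300),
]
print(proportional_fixation_time(fixations))  # speaker_face ≈ 0.55, logo ≈ 0.35
```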

10 of 29

RQ2: Gaze Behavior Over Time For Different Regions

11 of 29

RQ2: Gaze Behavior Over Time For Different Regions

Group 1

Group 2

Group 3

Group 4
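Attention-over-time curves like those grouped above can be built by splitting fixation time into fixed-width bins measured from the start of the news story. This Python sketch computes, for one region, the share of fixation time in each bin. The 1-second bin width, the rule of splitting a fixation across bin boundaries, and the name attention_curve are assumptions, and the sketch does not reproduce how the four groups were identified.

```python
import math
from typing import List, Optional, Tuple

# (start_s, end_s, region name or None), times relative to the start of the news story
Fixation = Tuple[float, float, Optional[str]]


def attention_curve(fixations: List[Fixation], region: str, bin_s: float = 1.0) -> List[float]:
    """Share of fixation time spent on `region` in each time bin of width `bin_s` seconds.

    A fixation that crosses a bin boundary is split between the bins it overlaps.
    Bins with no fixations at all get 0.0.
    """
    if not fixations:
        return []
    n_bins = math.ceil(max(end for _, end, _ in fixations) / bin_s)
    on_region = [0.0] * n_bins
    total = [0.0] * n_bins
    for start, end, name in fixations:
        first = int(start // bin_s)
        last = min(n_bins - 1, math.ceil(end / bin_s) - 1)
        for b in range(first, last + 1):
            # Portion of this fixation that falls inside bin b.
            overlap = min(end, (b + 1) * bin_s) - max(start, b * bin_s)
            if overlap <= 0:
                continue
            total[b] += overlap
            if name == region:
                on_region[b] += overlap
    return [r / t if t > 0 else 0.0 for r, t in zip(on_region, total)]


fixations = [(0.0, 1.5, "discussion_topic"), (1.5, 2.0, "speaker_face"), (2.0, 3.0, "speaker_face")]
print(attention_curve(fixations, "discussion_topic"))  # [1.0, 0.5, 0.0]
print(attention_curve(fixations, "speaker_face"))      # [0.0, 0.5, 1.0]
```

Averaging such curves across participants and stories would give one curve per region that can then be compared by shape.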

12 of 29

Gaze Behavior Over Time For Different Information Regions

Group 1: Peak Followed by Slowly Decreasing Sustained Attention

13 of 29

Factors Explaining Variation in Viewers’ Attention Over Time

Group 1: Peak Followed by Slowly Decreasing Sustained Attention

✓ High Attention Priority

✓ Initial Visual Scan

✓ Provides Context

14 of 29

Factors Explaining Variation in Viewers’ Attention Over Time

“The information on the bottom, the discussion topic, and the running headlines should be visible at any time. I want to be able to read those things and have those things not be blocked. It is fine if some of the information is blocked for a few seconds." - P12

15 of 29

What do we recommend?

During the first few seconds of a news story, it is especially important not to block the over-the-shoulder text, discussion topic, or scrolling news. Later in the story, it is still better to avoid blocking these high-priority information regions, but not at the expense of blocking any dynamic information regions.

16 of 29

Gaze Behavior Over Time For Different Information Regions

Group 2: Sustained Attention

17 of 29

Factors Explaining Variation in Viewers’ Attention Over Time

Group 2: Sustained Attention

✓ Human Faces

✓ Dynamic Information

✓ Identification of Speaker

✓ Provides Context

18 of 29

Factors Explaining Variation in Viewers’ Attention Over Time

“The person’s mouth, facial expression, and sometimes body language [are important]. You can really get a lot of information from body language and facial expressions about the context of the video.” - P15

19 of 29

What do we recommend?

The speaker’s face, the listener’s face, and over-the-shoulder text should not be blocked at any point during a news video, because they receive continuous attention. We did not find that these regions receive additional priority during the first few seconds of a news story.

20 of 29

Gaze Behavior Over Time For Different Information Regions

Group 3: Low Attention with Some Peaks

21 of 29

Factors Explaining Variation in Viewers’ Attention Over Time

Group 3: Low Attention with Some Peaks

✓ Understanding Source

✓ Static Text

22 of 29

What do we recommend?

It may be acceptable to block the speaker’s information and the program title, as long as there are short gaps between caption blocks during which viewers can briefly see them.

23 of 29

Gaze Behavior Over Time For Different Information Regions

Group 4: Very Low Attention

24 of 29

Factors Explaining Variation in Viewers’ Attention Over Time

Group 4: Very Low Attention

✓ Unrelated to News Story

✓ Brief Attention Required

25 of 29

Factors Explaining Variation in Viewers’ Attention Over Time

“for the most part, there is some information that is more important than others. Like the weather… temperature isn’t as important as long as the other discussion topics and news are still able to be seen.” - P13

26 of 29

What do we recommend?

It is always best not to block the logo, time, and temperature, but blocking these regions when necessary should not be problematic. Brief gaps between caption blocks, during which these regions are visible, may be enough for DHH viewers to read them.
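Whether a blocked region can still be read depends on the gaps between caption blocks. The Python sketch below lists the intervals within a story when no caption block is on screen. It assumes, hypothetically, that each caption block fully covers the region while displayed and that times are in seconds; the names visible_gaps and longest_gap are illustrative.

```python
from typing import List, Tuple

Interval = Tuple[float, float]  # (start_s, end_s)


def visible_gaps(caption_blocks: List[Interval], story_start: float, story_end: float) -> List[Interval]:
    """Intervals within [story_start, story_end) when no caption block is displayed."""
    gaps: List[Interval] = []
    cursor = story_start  # earliest time not yet known to be covered by a caption
    for start, end in sorted(caption_blocks):
        if cursor >= story_end:
            break
        if start > cursor:
            gaps.append((cursor, min(start, story_end)))
        cursor = max(cursor, end)
    if cursor < story_end:
        gaps.append((cursor, story_end))
    return gaps


def longest_gap(gaps: List[Interval]) -> float:
    """Length of the longest uncovered interval in seconds (0.0 if there is none)."""
    return max((end - start for start, end in gaps), default=0.0)


blocks = [(0.0, 3.0), (3.5, 7.0), (7.0, 9.0)]
gaps = visible_gaps(blocks, 0.0, 10.0)
print(gaps)               # [(3.0, 3.5), (9.0, 10.0)]
print(longest_gap(gaps))  # 1.0
```

How long a gap must be for a DHH viewer to read a low-priority region was not established here; any threshold applied to longest_gap would be an assumption.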

27 of 29

How do our findings inform captioning regulatory agencies?

Using direct behavioral measures of attention can shed new light on how DHH viewers use information regions.

More specific guidelines could be developed for placing captions during television news programs, taking into account how DHH viewers distribute their attention both spatially and temporally.

Captioned-video quality metrics could be developed that penalize occlusions more severely at specific times during a video.
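As one possible form of such a metric, the Python sketch below sums per-frame occlusions weighted by both region priority and time, so that blocking a region early in a story costs more. The function names, the region weights, the 5-second window, and the 2x boost are hypothetical choices loosely motivated by the Group 1 findings, not values proposed in the study.

```python
from functools import partial
from typing import Callable, Dict, List


def early_emphasis(t: float, window_s: float = 5.0, boost: float = 2.0) -> float:
    """Hypothetical time weight: occlusions in the first `window_s` seconds count `boost` times."""
    return boost if t < window_s else 1.0


def occlusion_penalty(
    occluded: Dict[str, List[bool]],
    region_weight: Dict[str, float],
    time_weight: Callable[[float], float],
    fps: int = 30,
) -> float:
    """Weighted seconds of occlusion per second of video.

    occluded[region][i] is True if the caption covers `region` in frame i.
    Regions missing from region_weight get weight 1.0.
    """
    n_frames = max((len(flags) for flags in occluded.values()), default=0)
    if n_frames == 0:
        return 0.0
    penalty = 0.0
    for region, flags in occluded.items():
        w_region = region_weight.get(region, 1.0)
        for i, is_blocked in enumerate(flags):
            if is_blocked:
                # Each frame lasts 1/fps seconds; weight it by region priority and by when it occurs.
                penalty += w_region * time_weight(i / fps) / fps
    return penalty / (n_frames / fps)


weights = {"discussion_topic": 3.0, "logo": 0.5}  # hypothetical priorities
two_s = partial(early_emphasis, window_s=2.0)
# One frame per second keeps the arithmetic readable.
early = occlusion_penalty({"discussion_topic": [True, False, False, False]}, weights, two_s, fps=1)
late = occlusion_penalty({"discussion_topic": [False, False, False, True]}, weights, two_s, fps=1)
print(early, late)  # 1.5 0.75
```

The same single-frame occlusion of the discussion topic scores twice as high at the start of the story as near its end, which is the behavior the metric idea calls for.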

28 of 29

Acknowledgements

Dr. Matt Huenerfauth

Dr. Akhter Al Amin

Dr. Sooyeon Lee

Max Shengelia

Saad Hassan

And there is more…

Velvet Howland

29 of 29

Recruiting 1-2 PhD Students at Tulane University

Contact Information: saadh.info

Design of robust and flexible human-AI systems to provide access to audio and visual information

Socio-technical challenges related to algorithmic discrimination and transparency in AI systems

Community experiences and perceptions of AI systems used in healthcare (co-advisees)