1 of 29

Automatic Speech Recognition of Radio in the CLARIAH Media Suite

Alec Badenoch

CDH Webinar 26 November 2021

a.w.badenoch@uu.nl

Eurovision Song Contest 2021 in Rotterdam. Source: Wikimedia/Sietske

2 of 29

Who’s got time for time-based media?

Hundreds of thousands of hours of material
Sometimes only a few moments are needed
Multiple genres
Difficult to search

...print sources still dominate for gauging public debate, reactions to popular media, events, etc.

3 of 29

“The CLARIAH Media Suite is one of the applications of the Dutch infrastructure for Digital Humanities and Social Sciences developed in the CLARIAH project. It facilitates access to key Dutch media collections with advanced multimedia search and analysis tools.

The Media Suite is an innovative digital research environment, an experimental environment (LAB), in which we are experimenting with new ways of working with multimedia data collections…. The Media Suite is in a constant process of co-development with its users and, in that sense, it is not a “finished” environment.”

https://mediasuite.clariah.nl/documentation/faq/what-is-it

4 of 29

Today

Background on ASR in general and in the Media Suite specifically
Familiarization with searching, close reading and distant reading with ASR in the Media Suite
Some tips and tricks for international scholars

7 of 29

ASR (huh! yeah!): what is it good for?

(actually, all kinds of things:)

Searching beyond standard metadata fields
Pinpointing relevant material: specific quotes, speeches, etc.
Comparing with print sources
Discovering change in language and/or discourse over time
Further language operations (e.g. sentiment analysis), at least in theory

8 of 29

ASR and the Media Suite: parameters

Only accessible in a closed environment

“We bring tools to the data, because for reasons of copyright or privacy these data can not be brought to the tools by simply downloading them.”

So far, Dutch language only (with some exceptions…)
No tools (yet) for language analysis
Gauging completeness of corpus is difficult (completeness/accuracy of ASR; completeness of digitization…)

9 of 29

ASR in the Media Suite: process & principles

Determining speech/non-speech -> coupling speech to specific vocabularies
Aligning the transcribed text to specific times in the recording
Aiming for an inclusive AI (accents, dialects, domains)
Errors remain: substitutions, insertions, deletions
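
The three error types named on this slide (substitution, insertion, deletion) are the ones counted by the word error rate, a common measure of ASR quality. The following is a minimal sketch, not the Media Suite's own evaluation code: the function name and the example sentences are invented for illustration, and it assumes you have a hand-corrected reference transcript to compare against.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + insertions + deletions) / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution_cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,                      # deletion
                dist[i][j - 1] + 1,                      # insertion
                dist[i - 1][j - 1] + substitution_cost,  # match or substitution
            )
    return dist[len(ref)][len(hyp)] / len(ref)


# Invented example: what was said vs. what a recogniser might output.
spoken = "the winner of the song contest"
recognised = "the dinner of song contest now"
print(word_error_rate(spoken, recognised))
```

A score of 0 means the transcript matches the reference word for word. Scores can exceed 1 when the recogniser inserts many extra words.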

See (in Dutch) Roeland Ordelman “Spraakherkenning voor onderzoek in AV-archieven – Twintig jaar ontwikkeling in Nederland” AVA_net, 2021 https://www.avanet.nl/spraakherkenning-voor-onderzoek-in-av-archieven-twintig-jaar-ontwikkeling-in-nederland/

“Speech Recognition” Beeld en Geluid https://archiefstats.beeldengeluid.nl/speech-recognition

10 of 29

ASR in the media suite: Sound and Vision

More complete overview here https://archiefstats.beeldengeluid.nl/speech-recognition/availability

Radio 1 (Hilversum 2, Radio 1, NPO Radio 1)
Radio 5 (Hilversum 5, Radio 747, 747 AM, Radio 5, NPO Radio 5, NPO Radio 5 Nostalgia)
Source catalogs (items from the Radio Programma, Weken Nederlandse Radio, and Hoorspelen collections)
Television (news and current affairs)

Currently on hold; to resume after the new system has been built

11 of 29

Radio in the media suite

More complete overview here https://mediasuitedata.clariah.nl/dataset/radio-collection-daan

12 of 29

Radio + ASR in the media suite

Status as of 22 November 2021. A more extensive (but outdated) overview is available here: https://archiefstats.beeldengeluid.nl/speech-recognition/availability

13 of 29

Radio + ASR in the media suite: transcript availability

Status as of 22 November 2021. A more extensive (but outdated) overview is available here: https://archiefstats.beeldengeluid.nl/speech-recognition/availability

14 of 29

Learning features + developing strategies

Accepting/working with fuzziness

At the level of distant reading – corpus
At the level of close reading (errors, translation, etc.)

Working with and around ASR as an international scholar
Exploration and serendipity

15 of 29

Using and exploring ASR in the Media Suite

Getting in, searching in the ASR layer
Forming and refining a query with the search tools
Distant reading: historical charts and the compare tool
Close reading: understanding and working with the transcripts

16 of 29

Case study: TV on the Radio

Long-running: 1956 to present
Eurovision Song Contest as a point of conversation and public debate (Sandvoss)
Expression of sentiments about Europe and/or mirror of national identities (Pajala, Vuletic, etc.)
Site of political controversy
Point of pride for queer identities (Raykoff, etc.)
International(ly comparable) event
Intermedial interests: influence of televised music on radio, stardom, etc. (Badenoch 2013)
Radio as source for discussion about TV

At a loss for scholarly inspiration? Check out: https://escincontext.com/resources/bibliography-of-esc-research/

images from https://eurovision.tv/events

17 of 29

PSA:

If you have not already, please:

Open the media suite
Log in (if possible)
Make a user project for use in this webinar
Tutorial: https://mediasuite.clariah.nl/learn/subject-tutorials/media-suite-tutorial-logging-in-workspace-and-creating-a-user-project

18 of 29

1. Getting in, searching in the ASR layer

Select collection: choose “Sound and Vision Radio Archive”
Metadata: “Speech transcripts ASR”
Date field: “Date, sorting (PREFERRED)”

19 of 29

2.1 Design a query

Enter “Eurovisie” (Dutch for Eurovision). Alternative: what happens when you try “Eurovision”? To match both spellings at once, enter the wildcard query “Eurovisi*”
Save this query
By way of comparison: try doing the same search using only archival metadata (titles, descriptions, subject keywords, etc.) – how big is the difference?
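
To see why “Eurovisi*” catches both spellings, here is a small illustration of `*`-style wildcard matching using Python's standard `fnmatch` module. It is only a sketch of the general idea: the helper name and the sample sentence are invented, and the Media Suite's search engine may treat wildcards, case and punctuation differently.

```python
from fnmatch import fnmatchcase


def matching_tokens(pattern: str, text: str) -> list[str]:
    """Return the words in `text` that match a *-style wildcard pattern, ignoring case."""
    pattern = pattern.lower()
    # Lower-case the text and strip surrounding punctuation from each word.
    words = [word.strip(".,;:!?\"'()") for word in text.lower().split()]
    return [word for word in words if word and fnmatchcase(word, pattern)]


# Invented sample text containing both the Dutch and the English spelling.
sample = "Het Eurovisie Songfestival, or the Eurovision Song Contest"
print(matching_tokens("Eurovisie", sample))   # exact term only
print(matching_tokens("Eurovisi*", sample))   # wildcard covers both spellings
```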

20 of 29

2.2 Refine your query

Try to refine it with a Boolean operator

AND (e.g. Eurovisie AND Israel; NB: use the Dutch name of the country). List of participating countries with year of debut: https://en.wikipedia.org/wiki/List_of_countries_in_the_Eurovision_Song_Contest; list of winners: https://en.wikipedia.org/wiki/List_of_Eurovision_Song_Contest_winners#Winners_by_year
NOT (e.g. Eurovisie NOT songfestival; song title or artist NOT Eurovisie)

Try refining by genre, e.g. ‘muziekuitzending’ (music broadcast), ‘muziekuitvoering’ (music performance) and/or ‘muziekprogramma’ (music programme); ‘nieuws’ (news) or current affairs
Combine with linked data (next slide)
Save your refined queries
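
The effect of AND and NOT can be pictured as set operations on the items whose transcripts contain each term. The sketch below uses four invented mini-transcripts with hypothetical item ids; it illustrates Boolean logic in general, not how the Media Suite implements its search.

```python
def items_containing(term: str, transcripts: dict[str, str]) -> set[str]:
    """Return the ids of all items whose transcript contains `term` as a whole word."""
    term = term.lower()
    return {
        item_id
        for item_id, text in transcripts.items()
        if term in text.lower().split()
    }


# Invented toy transcripts (hypothetical ids and wording).
transcripts = {
    "item-1": "eurovisie songfestival in rotterdam",
    "item-2": "eurovisie uitzending over israel",
    "item-3": "songfestival finale",
    "item-4": "eurovisie nieuws",
}

eurovisie = items_containing("eurovisie", transcripts)
israel = items_containing("israel", transcripts)
songfestival = items_containing("songfestival", transcripts)

and_hits = eurovisie & israel        # Eurovisie AND Israel: both terms present
not_hits = eurovisie - songfestival  # Eurovisie NOT songfestival: first term only

print(sorted(and_hits))
print(sorted(not_hits))
```

AND narrows the result list to items containing both terms, while NOT removes every item containing the second term, including ones that might still have been relevant.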

21 of 29

2.3 Refine your query: combine with linked data

Clear search
In the right (‘in:’) column, select ‘performers [...]’, ‘guests’ and/or ‘persons discussed’
Enter a name (e.g. Björn Ulvaeus). NB: linked data covers people only, so ‘ABBA’ won’t work here
In the right (‘in:’) column, select ‘Speech Transcripts (ASR)’
Add a Boolean operator (AND; NOT) and a search term (eg. ‘Eurovisie’)

Please remember to save your refined queries!

22 of 29

3.1 Distant reading: “Eurovisie” (Eurovision)

Click on ‘show chart’

What general trends are apparent?
(How) do they change significantly when you slide from ‘absolute’ to ‘relative’ values?
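
Why the chart can look different after switching from ‘absolute’ to ‘relative’: a rising number of hits may simply reflect a growing number of transcribed items. The sketch below assumes that ‘relative’ means hits as a percentage of all items in the collection for that year (an assumption about the chart, not a documented formula), and every number in it is invented purely for illustration.

```python
def relative_frequencies(
    hits_per_year: dict[int, int],
    items_per_year: dict[int, int],
) -> dict[int, float]:
    """Express each year's hits as a percentage of all items available for that year."""
    return {
        year: 100 * hits / items_per_year[year]
        for year, hits in hits_per_year.items()
        if items_per_year.get(year)  # skip years with no items (or missing totals)
    }


# Invented numbers: more hits in 2000, but also far more transcribed items.
hits_per_year = {1980: 10, 2000: 40}
items_per_year = {1980: 100, 2000: 1000}

print(hits_per_year)                                        # absolute values
print(relative_frequencies(hits_per_year, items_per_year))  # relative values
```

In this toy case the absolute count quadruples while the relative share falls, so the two views would suggest opposite trends.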

23 of 29

3.2 Distant reading: comparing

Go to ‘tools’ at the top of the screen, and select ‘compare’ from the menu
Compare your general and/or your refined queries

What patterns do you see in the data? (cf. known data)
What hypotheses might you form?
To what degree may you confidently rely on your observations based on this visualisation?

Browse your refined list of results: can you tell how useful they are?

24 of 29

4.0 From distant to close reading: filtering

Clear search; re-select ‘speech transcripts’
Enter “douze points” (the quotation marks are necessary)
Have a look at the list provided: which of these results seems most likely to be relevant to you?
Have a look/listen to each
In your own query, select

source: https://eurovision.tv/douze-points
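
The quotation marks matter because a phrase search requires the words to appear next to each other and in order. The sketch below contrasts that with matching the words anywhere in the text, which is one way an unquoted query could plausibly behave; how the Media Suite actually treats unquoted multi-word queries is an assumption here, and both sample sentences are invented.

```python
def contains_phrase(phrase: str, text: str) -> bool:
    """True if the words of `phrase` occur consecutively and in order in `text`."""
    target = phrase.lower().split()
    words = text.lower().split()
    return any(
        words[start:start + len(target)] == target
        for start in range(len(words) - len(target) + 1)
    )


def contains_all_words(phrase: str, text: str) -> bool:
    """True if every word of `phrase` occurs somewhere in `text`, in any order."""
    words = set(text.lower().split())
    return all(word in words for word in phrase.lower().split())


# Invented sample transcripts.
vote = "nederland krijgt douze points van belgie"
noise = "douze pays gave few points"

for text in (vote, noise):
    print(contains_phrase("douze points", text), contains_all_words("douze points", text))
```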

25 of 29

4.1 Close reading:

Click on one of your results (It will play the segment automatically: you’ll probably want to stop it.)
In the right column, click on ‘content annotations’
Choose “speech transcript” from the drop-down menu
Note that your search term is in the box above…

26 of 29

4.2 Close reading:

Dutch speakers: scan through the whole speech transcript (you may need to delete your search term).

How many errors can you spot?
How reliable does the transcription seem to you?

Non-Dutch speakers: select the relevant segment of text, then copy and paste it into an online translator such as Google Translate or DeepL:

To what extent are you able to make sense of the translation?
To what extent are you able to make use of the translation?
What kinds of question would this help you to answer?
See further: https://mediasuite.clariah.nl/learn/subject-tutorials/work-arounds-for-analyzing-dutch-content-in-english

27 of 29

4.3 Close reading:

Take a moment to explore the whole broadcast:

How much of the programme is intelligible?
(How) does the ASR render music?
Where does the transcript become more or less reliable?

28 of 29

So...

Background on ASR in general and in the Media Suite specifically
Familiarization with searching, close reading and distant reading with ASR in the Media Suite
Some tips and tricks for international scholars
