
Automatic Speech Recognition of Radio

in the CLARIAH Media Suite

Alec Badenoch

CDH Webinar 26 November 2021

a.w.badenoch@uu.nl

Eurovision Song Contest 2021 in Rotterdam. Source: Wikimedia/Sietske


Who’s got time for time-based media?

  • 100,000s of hours of material
  • sometimes mere moments are needed
  • multiple genres
  • difficult to search

...print sources still dominate for gauging public debate, reactions to popular media, events, etc.



“The CLARIAH Media Suite is one of the applications of the Dutch infrastructure for Digital Humanities and Social Sciences developed in the CLARIAH project. It facilitates access to key Dutch media collections with advanced multimedia search and analysis tools.

The Media Suite is an innovative digital research environment, an experimental environment (LAB), in which we are experimenting with new ways of working with multimedia data collections…. The Media Suite is in a constant process of co-development with its users and, in that sense, it is not a “finished” environment.”

https://mediasuite.clariah.nl/documentation/faq/what-is-it



Today

  • Background on ASR in general and in the Media Suite in particular
  • Familiarization with searching and with close and distant reading using ASR in the Media Suite
  • Some tips and tricks for international scholars


ASR (huh! yeah!): what is it good for?

(actually, all kinds of things:)

  • Searching beyond standard metadata fields
  • Pinpointing relevant material: specific quotes, speeches, etc.
  • Comparing with print sources

  • Discovering change in language and/or discourse over time
  • Further language operations (e.g. sentiment analysis), theoretically...


ASR and the Media Suite: parameters

  • Only accessible in closed environment

“We bring tools to the data, because for reasons of copyright or privacy these data can not be brought to the tools by simply downloading them.”

  • (so far) only Dutch language (with some exceptions…)
  • No tools (yet) for language analysis
  • Gauging the completeness of the corpus is difficult (completeness/accuracy of the ASR; completeness of digitization…)


ASR in the Media Suite: process & principles

  • Determining speech/non-speech -> coupling speech to specific vocabularies
  • Alignment of the text to specific points in time
  • Aiming for an inclusive AI (accents, dialects, domains)
  • Still makes mistakes: substitution, insertion, deletion (see the sketch below)

See (in Dutch) Roeland Ordelman, “Spraakherkenning voor onderzoek in AV-archieven – Twintig jaar ontwikkeling in Nederland” (“Speech recognition for research in AV archives: twenty years of development in the Netherlands”), AVA_net, 2021, https://www.avanet.nl/spraakherkenning-voor-onderzoek-in-av-archieven-twintig-jaar-ontwikkeling-in-nederland/

“Speech Recognition” Beeld en Geluid https://archiefstats.beeldengeluid.nl/speech-recognition
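
To make those error types concrete: ASR quality is conventionally scored as word error rate (WER), the minimum number of word substitutions, insertions and deletions needed to turn the recognizer’s output into a reference transcript, divided by the reference length. A minimal Python sketch, purely illustrative and independent of the Media Suite’s own pipeline:

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Minimum substitutions + insertions + deletions (Levenshtein
    distance over words), divided by the reference word count."""
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # i deletions
    for j in range(len(hyp) + 1):
        d[0][j] = j  # j insertions
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)

# One substituted word out of four reference words -> WER 0.25
print(word_error_rate("het eurovisie songfestival begint",
                      "het eurovisie songfestival begon"))
```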


ASR in the Media Suite: Sound and Vision

  • Radio 1 (Hilversum 2, Radio 1, NPO Radio 1)
  • Radio 5 (Hilversum 5, Radio 747, 747 AM, Radio 5, NPO Radio 5, NPO Radio 5 Nostalgia)
  • Source catalogs (items from the Radio Programma, Weken Nederlandse Radio, and Hoorspelen collections)
  • Television (news and current affairs)

Currently on hold; to resume after the new system build


Radio in the Media Suite


Radio + ASR in the Media Suite

Status as of 22 November 2021. A more extensive (but outdated) overview is available here: https://archiefstats.beeldengeluid.nl/speech-recognition/availability


Radio + ASR in the Media Suite: transcript availability

Status as of 22 November 2021. A more extensive (but outdated) overview is available here: https://archiefstats.beeldengeluid.nl/speech-recognition/availability


Learning features + developing strategies

  • Accepting/working with fuzziness
    • At the level of distant reading (the corpus)
    • At the level of close reading (errors, translation, etc.)
  • Working with and around ASR as an international scholar
  • Exploration and serendipity


Using and exploring ASR in the Media Suite

  1. Getting in, searching in the ASR layer
  2. Forming and refining a query with the search tools
  3. Distant reading: historical charts and the compare tool
  4. Close reading: understanding and working with the transcripts


Case study: TV on the Radio

  • Long-running: 1956 to the present
  • Eurovision Song Contest as a point of conversation and public debate (Sandvoss)
  • Expressing sentiments about Europe and/or mirroring national identities (Pajala, Vuletic, etc.)
  • Site of political controversy
  • Point of pride for queer identities (Raykoff, etc.)
  • International(ly comparable) event
  • Intermedial interests: influence of televised music on radio, stardom, etc. (Badenoch 2013)
  • Radio as a source for discussion about TV

At a loss for scholarly inspiration? Check out: https://escincontext.com/resources/bibliography-of-esc-research/


PSA:

If you have not already, please:

  • Open the media suite
  • Log in (if possible)
  • Make a user project for use in this webinar
  • Tutorial: https://mediasuite.clariah.nl/learn/subject-tutorials/media-suite-tutorial-logging-in-workspace-and-creating-a-user-project


1. Getting in, searching in the ASR layer

  1. Select collection: choose “Sound and Vision Radio Archive”
  2. Metadata: “Speech transcripts ASR”
  3. Date field: “Date, sorting (PREFERRED)”


2.1 Design a query

  1. Enter “Eurovisie” (Dutch for Eurovision). As an alternative: what happens when you try “Eurovision”? To capture both, enter “Eurovisi*”
  2. Save this query
  3. By way of comparison: try the same search using only archival metadata (titles, descriptions, subject keywords, etc.). How big is the difference?


2.2 Refine your query

  1. Try to refine it with a Boolean operator
    1. AND (e.g. Eurovisie AND Israël; NB: use the Dutch name for the country) (list of participating countries with year of debut: https://en.wikipedia.org/wiki/List_of_countries_in_the_Eurovision_Song_Contest; list of winners: https://en.wikipedia.org/wiki/List_of_Eurovision_Song_Contest_winners#Winners_by_year)
    2. NOT (e.g. Eurovisie NOT songfestival; song title or artist NOT Eurovisie)
  2. Try refining by genre, e.g. ‘muziekuitzending’ (music broadcast), ‘muziekuitvoering’ (music performance) and/or ‘muziekprogramma’ (music programme); news (nieuws) or current affairs
  3. Combine with linked data (next slide)
  4. Save your refined queries
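
For intuition about what the wildcard and the Boolean operators actually match, here is a toy Python sketch at the level of a single transcript. It assumes simple whole-word, case-insensitive semantics; the Media Suite’s actual search engine may tokenise and rank differently.

```python
import re

def matches(transcript: str, term: str) -> bool:
    """Whole-word, case-insensitive match; a trailing * matches any
    continuation, so 'eurovisi*' hits 'Eurovisie' and 'Eurovision'."""
    pattern = re.escape(term).replace(r"\*", r"\w*")
    return re.search(rf"\b{pattern}\b", transcript, re.IGNORECASE) is not None

# An invented transcript snippet, for illustration only.
asr = "vanavond bij het Eurovisie Songfestival treedt Israël aan"

print(matches(asr, "eurovisi*"))                             # True
print(matches(asr, "eurovisi*") and matches(asr, "Israël"))  # AND -> True
print(matches(asr, "eurovisi*") and not matches(asr, "songfestival"))
# NOT -> False here, because this transcript does mention the festival
```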


2.2.3 Refine your query: Combine with linked data

  1. Clear search
  2. In the right (‘in:’) column, select ‘performers [...]’, ‘guests’ and/or ‘persons discussed’
  3. Enter the name (e.g. Björn Ulvaeus; NB: linked data covers persons only, so ‘ABBA’ won’t work here)
  4. In the right (‘in:’) column, select ‘Speech Transcripts (ASR)’
  5. Add a Boolean operator (AND; NOT) and a search term (eg. ‘Eurovisie’)

Please remember to save your refined queries!
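
Conceptually, this step combines a structured catalogue field with the full-text ASR layer. The toy sketch below mimics that combination over made-up records; the field names (‘guests’, ‘asr’) are hypothetical, not the Media Suite’s actual schema.

```python
# Made-up records pairing catalogue fields with an ASR transcript.
records = [
    {"title": "Radiojournaal", "guests": ["Björn Ulvaeus"],
     "asr": "we spreken vanavond over het eurovisie songfestival"},
    {"title": "Muziekuur", "guests": [],
     "asr": "een nieuw album van abba"},
]

def linked_and_asr(records, person, term):
    """Keep records that name `person` in the linked-data field
    AND contain `term` in the speech transcript."""
    return [r for r in records
            if person in r["guests"] and term in r["asr"].lower()]

print(linked_and_asr(records, "Björn Ulvaeus", "eurovisie"))
# -> [the 'Radiojournaal' record]
```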


3.1 Distant reading: “Eurovisie” (Eurovision)

  1. Click on ‘show chart’
    1. What general trends are apparent?
    2. (How) do they change significantly when you slide from ‘absolute’ to ‘relative’ values?
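
What the absolute/relative slider does, roughly: relative values divide the yearly hit count by how much material the collection holds for that year, so growth of the archive itself does not masquerade as growth of the topic. A sketch with invented numbers:

```python
# Invented counts, purely to illustrate the absolute/relative toggle.
hits_per_year  = {1975: 12, 1985: 30, 1995: 60}         # items matching the query
items_per_year = {1975: 400, 1985: 2_000, 1995: 6_000}  # all items with ASR

relative = {y: hits_per_year[y] / items_per_year[y] for y in hits_per_year}
print(relative)
# {1975: 0.03, 1985: 0.015, 1995: 0.01}: absolute hits quintuple while the
# relative share falls, because the corpus grew even faster.
```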


3.2 Distant reading: comparing

  1. Go to ‘tools’ at the top of the screen, and select ‘compare’ from the menu
  2. Compare your general and/or your refined queries
    1. What patterns do you see in the data? (cf. known data)
    2. What hypotheses might you form?
    3. To what degree may you confidently rely on your observations based on this visualisation?
  3. Browse your refined list of results: can you tell how useful they are?
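
Should you want to reproduce such a comparison outside the suite (for instance from counts you have noted down per year), it amounts to plotting two hit series on shared axes. A minimal matplotlib sketch with invented numbers:

```python
import matplotlib.pyplot as plt

years = [1970, 1980, 1990, 2000, 2010]
# Invented hit counts for two saved queries, for illustration only.
broad   = [5, 14, 22, 31, 40]   # "Eurovisi*"
refined = [1, 4, 9, 12, 20]     # "Eurovisi* AND songfestival"

plt.plot(years, broad, label="Eurovisi*")
plt.plot(years, refined, label="Eurovisi* AND songfestival")
plt.xlabel("year")
plt.ylabel("matching broadcasts")
plt.legend()
plt.show()
```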


4.0 From distant to close reading: filtering

  1. Clear search; re-select ‘speech transcripts’
  2. Enter “douze points” (the quotation marks are necessary)
  3. Have a look at the list provided – which of these seems the most likely to you?
  4. Have a look/listen to each
  5. In your own query, select

Source: https://eurovision.tv/douze-points
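
Why the quotation marks matter: quoted input is (as in most search engines) treated as an exact phrase, so the words must occur adjacent and in order, whereas unquoted words may match anywhere in the transcript. A toy contrast:

```python
t1 = "de jury geeft douze points aan het winnende lied"
t2 = "points verdienen ze zeker, maar douze is overdreven"

def loose(t):   # unquoted: each word may occur anywhere
    return all(w in t for w in ("douze", "points"))

def phrase(t):  # quoted "douze points": the exact adjacent sequence
    return "douze points" in t

print(loose(t1), phrase(t1))  # True True
print(loose(t2), phrase(t2))  # True False: both words present, not adjacent
```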


4.1 Close reading:

  1. Click on one of your results (it will play the segment automatically; you’ll probably want to stop it)
  2. In the right column, click on ‘content annotations’
  3. Choose “speech transcript” from the drop-down menu
  4. Note that your search term is in the box above…


4.2 Close reading:

  • Dutch speakers: scan through the whole speech transcript (you may need to delete your search term).
    • How many errors can you spot?
    • How reliable does the transcription seem to you?

  • Non-Dutch speakers: select the segment of text in question and copy and paste it into an online translator like Google Translate or DeepL:
    • To what extent are you able to make sense of the translation?
    • To what extent are you able to make use of the translation?
    • What kinds of question would this help you to answer?
    • See further: https://mediasuite.clariah.nl/learn/subject-tutorials/work-arounds-for-analyzing-dutch-content-in-english
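
The same copy-and-paste workflow can also be scripted, for instance with the third-party deep-translator package (pip install deep-translator). This is a workaround outside the Media Suite, and machine translation of noisy ASR output deserves the same caution as the transcript itself:

```python
# Hypothetical workaround: translating an ASR segment from Dutch to
# English with the third-party deep-translator package.
from deep_translator import GoogleTranslator

segment = ("de winnaar van het eurovisie songfestival "
           "krijgt douze points van de jury")

english = GoogleTranslator(source="nl", target="en").translate(segment)
print(english)
```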


4.3 Close reading:

  • Take a moment to explore the whole broadcast:
    • How much of the programme is intelligible?
    • (How) does it render music?
    • Where does it become more or less reliable?


So...

  • Background on ASR in general and in the Media Suite in particular
  • Familiarization with searching and with close and distant reading using ASR in the Media Suite
  • Some tips and tricks for international scholars


(thank you)