1 of 14

Radio Dicer

Experiments in automatically segmenting radio content

James Dooley

BBC News Labs

2 of 14

There is so much content

Most of it is story driven

3 of 14

Chopping up programmes is a pain


4 of 14

News consumption is changing


5 of 14

Segmentation can help

  • Visual breakdown of content
  • Jump to points of interest
  • Linking to timestamps in programmes
  • Personalisation


6 of 14

So, how do we do it?

Through the magic of text alignment and fuzzy matching
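
A minimal sketch of what fuzzy matching can mean in practice. The slides do not say which algorithm Radio Dicer uses; Levenshtein edit distance is assumed here as one common choice, and the names `editDistance`, `similarity`, `isFuzzyMatch` and the 0.8 threshold are hypothetical.

```javascript
// Levenshtein edit distance between two strings (two-row dynamic programming).
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1,        // deletion
        curr[j - 1] + 1,    // insertion
        prev[j - 1] + cost  // substitution or match
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 means identical, 0 means every character differs.
function similarity(a, b) {
  const longest = Math.max(a.length, b.length);
  return longest === 0 ? 1 : 1 - editDistance(a, b) / longest;
}

// Hypothetical threshold, chosen only for illustration.
const isFuzzyMatch = (a, b, threshold = 0.8) =>
  similarity(a.toLowerCase(), b.toLowerCase()) >= threshold;
```

A scripted word and a transcribed word that differ by a letter or two would still count as a match, which is the tolerance a machine transcript needs.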


7 of 14

What tools do we have at our disposal?

Running Orders

Account of what is in the programme, with prepared scripts

Machine Transcription

Speech to Text processing of the audio programme


8 of 14

What tools do we have at our disposal?

Machine Transcription

  • BBC Kaldi
  • Trained on BBC content

Running Orders

  • Vary by programme
  • Different systems
  • Hard to extract data


9 of 14

Inputs

Words Array

  {
    start: 0.17,
    confidence: 1,
    end: 0.39,
    word: "good",
    punct: "Good",
    index: 0
  }

Rundown Array

  {
    story: "Headlines",
    script: "Good evening this is the Six O'Clock News..."
  }
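
One way the two inputs could be combined, as a rough sketch rather than the actual Radio Dicer code. It takes the first few words of each story's script, slides them along the words array, and records the start time of the best-scoring position. The functions `tokenise`, `windowScore` and `segment`, and the parameters `probeLength` and `minScore`, are all hypothetical; words are compared exactly here to keep the example short, where a real system would presumably use a fuzzy comparison.

```javascript
// Lowercase and strip punctuation so script text is comparable to transcript words.
const tokenise = (text) =>
  text.toLowerCase().replace(/[^a-z0-9'\s]/g, " ").split(/\s+/).filter(Boolean);

// Fraction of probe positions where the transcript window agrees with the script.
function windowScore(words, offset, scriptTokens) {
  let hits = 0;
  for (let i = 0; i < scriptTokens.length; i++) {
    const w = words[offset + i];
    if (w && w.word === scriptTokens[i]) hits++;
  }
  return hits / scriptTokens.length;
}

// For each story, find the transcript position that best matches the opening
// of its script. Searching forward from the previous match keeps the stories
// in running order.
function segment(words, rundown, probeLength = 8, minScore = 0.6) {
  const segments = [];
  let searchFrom = 0;
  for (const { story, script } of rundown) {
    const probe = tokenise(script).slice(0, probeLength);
    if (probe.length === 0) continue; // nothing spoken to align against
    let best = { score: 0, offset: -1 };
    for (let offset = searchFrom; offset < words.length; offset++) {
      const score = windowScore(words, offset, probe);
      if (score > best.score) best = { score, offset };
    }
    if (best.offset >= 0 && best.score >= minScore) {
      segments.push({ story, start: words[best.offset].start, score: best.score });
      searchFrom = best.offset + 1;
    }
  }
  return segments;
}
```

Calling `segment(words, rundown)` returns one `{ story, start, score }` entry per story that clears the threshold; stories that score too low are skipped rather than guessed at.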


10 of 14


11 of 14

Results

  • Tried with different spoken programmes
  • 85%+ perfect match for scripted programmes
  • When presenters ad lib, accuracy decreases
  • Non-spoken notes in the running order cause errors
  • Transcription errors
    • Misunderstanding names
    • Brexit = “breaks it”, “Breck’s. It”, “breakfast”
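
To illustrate why mis-transcribed names need tolerant matching, here is a small sketch that scores the scripted word against what was heard. The slides do not describe how these errors are handled; the Sørensen-Dice coefficient over character pairs is an assumption made for this example, and `bigrams` and `dice` are hypothetical helper names.

```javascript
// Set of adjacent character pairs, ignoring case, spaces and punctuation.
function bigrams(text) {
  const letters = text.toLowerCase().replace(/[^a-z0-9]/g, "");
  const pairs = new Set();
  for (let i = 0; i < letters.length - 1; i++) {
    pairs.add(letters.slice(i, i + 2));
  }
  return pairs;
}

// Sørensen-Dice coefficient over character bigrams, in [0, 1].
// Returns 0 when either side is too short to form a pair.
function dice(a, b) {
  const x = bigrams(a);
  const y = bigrams(b);
  if (x.size === 0 || y.size === 0) return 0;
  let shared = 0;
  for (const pair of x) {
    if (y.has(pair)) shared++;
  }
  return (2 * shared) / (x.size + y.size);
}

// Score each transcription seen in the talk against the scripted word.
const heard = ["breaks it", "Breck's. It", "breakfast"];
const scores = heard.map((phrase) => ({ phrase, score: dice("Brexit", phrase) }));
console.log(scores);
```

An exact comparison rejects all three variants outright. With this measure the first two score higher than “breakfast”, so a threshold could in principle tell a near miss from a different word, though where to set it would need testing on real transcripts.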


12 of 14


13 of 14

Closing Thoughts

  • Relatively niche use-case
  • To be open sourced
  • Feeding improvements into the model
  • Short term solution to the problem
  • OpenMedia and other systems will help


14 of 14

Thanks!

Any questions?

You can find me at:

james.dooley@bbc.co.uk | github.com/jamesdools
