1 of 21

Artificial Intelligence and Archived Television

Dan Schultz, The Internet Archive

2 of 21

Story Arc

  1. The Archive
  2. The Data
  3. The Experiments
  4. The Vision

3 of 21

1: The Archive

4 of 21

5 of 21

The Archive

  • Comprehensive TV News (2009-present)
  • Python API
  • mp4, mp3, mpg, ...
  • Clip-based metadata
  • Also, you know, The Internet
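
As a rough illustration of querying the Archive programmatically, the sketch below builds a URL for the public archive.org advanced search endpoint using only the standard library. It is not necessarily the Python API this slide refers to. The helper name `build_search_url` is hypothetical, and any query string passed to it (for example a collection name) is an assumption that should be checked against the Archive's own documentation.

```python
from urllib.parse import urlencode

# Public search endpoint on archive.org; it returns JSON when output=json.
SEARCH_ENDPOINT = "https://archive.org/advancedsearch.php"


def build_search_url(query, fields=("identifier", "title"), rows=10):
    """Return a search URL for `query` (hypothetical helper, not an official client)."""
    params = [("q", query)]
    # Each requested metadata field is passed as a repeated fl[] parameter.
    params += [("fl[]", field) for field in fields]
    params += [("rows", rows), ("output", "json")]
    return SEARCH_ENDPOINT + "?" + urlencode(params)
```

The resulting URL can be fetched with any HTTP client; the sketch stops short of making a network request.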

6 of 21

7 of 21

8 of 21

2: The Data

9 of 21

The Data - Categories

Video

Audio

Captions

Program

Context

Audience

10 of 21

The Data - Signals

Video

  • Faces / Expression
  • Objects
  • Effects
  • Chyrons / Words

Audio

  • Pitch / Frequency
  • Language
  • Volume
  • Music
  • Sound Effects

Captions

  • Text
  • Timestamps
  • Speech to text

Program

  • Airtime / Date
  • Themes
  • Hosts
  • Description

Context

  • Ownership / Sponsor
  • Related Programming
  • Current Events
  • Meta Narratives

Audience

  • Location
  • Social
  • Demographics
  • Networks

11 of 21

The Data - Techniques (Examples)

  • Optical Character Recognition (OCR)
  • Speaker Identification
  • Audio / Video Fingerprinting
  • Automated Speech Recognition (ASR)
  • Speaker Diarization
  • Claim Detection
  • Topic Modeling
  • Sentiment Analysis

12 of 21

3: The Experiments

13 of 21

The Experiments - Political TV Ad Archive

  • Audio Fingerprinting
  • Downloadable Data
  • Semiautomated Curation

FOLLOW UP: Coverage Analysis
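
To make the idea of audio fingerprinting concrete, here is a deliberately simplified sketch of how a known ad could be located inside a longer broadcast. It is a toy scheme and not the method used by the Political TV Ad Archive: each frame boundary contributes one bit (did the energy rise or fall?), and a clip is matched by sliding its bit sequence over the broadcast and minimizing Hamming distance. All function names and the frame size are assumptions made for illustration, and the clip is assumed to start on a frame boundary.

```python
def frame_energies(samples, frame_size):
    """Sum of squared amplitudes for each non-overlapping frame."""
    return [
        sum(x * x for x in samples[i:i + frame_size])
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]


def fingerprint(samples, frame_size=64):
    """One bit per pair of adjacent frames: 1 if energy rose, else 0."""
    energies = frame_energies(samples, frame_size)
    return [1 if b > a else 0 for a, b in zip(energies, energies[1:])]


def best_offset(haystack_fp, needle_fp):
    """Slide the needle over the haystack.

    Returns (offset_in_frames, hamming_distance) for the best match,
    or None if the needle is longer than the haystack.
    """
    best = None
    for offset in range(len(haystack_fp) - len(needle_fp) + 1):
        window = haystack_fp[offset:offset + len(needle_fp)]
        distance = sum(h != n for h, n in zip(window, needle_fp))
        if best is None or distance < best[1]:
            best = (offset, distance)
    return best
```

Production fingerprinting systems typically work on spectral features and tolerate misalignment and noise; this sketch only shows the match-by-compact-signature idea.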

14 of 21

The Experiments - Political TV Ad Archive

15 of 21

The Experiments - Political TV Ad Archive

16 of 21

The Experiments - Duplitron 5000

http://bit.ly/worst-dvr

17 of 21

The Experiments - Face-o-Matic

  • FaceNet (Matroid)
  • Face Identification
  • Slack Prototype

(Just Launched Today)

http://bit.ly/faceomatic
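
Face identification with an embedding model such as FaceNet generally reduces to a nearest-neighbor lookup: each face is mapped to a vector, and two faces are treated as the same person when their vectors are close. The sketch below shows only that lookup step with made-up vectors. The `identify` function, the threshold value, and the reference dictionary are hypothetical and do not describe how Face-o-Matic itself is implemented.

```python
import math


def identify(embedding, known_faces, threshold=0.8):
    """Return the name of the closest known face, or None if nothing is close enough.

    `known_faces` maps a name to a reference embedding (hypothetical data);
    `threshold` is an assumed maximum Euclidean distance for a match.
    """
    best_name, best_dist = None, float("inf")
    for name, reference in known_faces.items():
        dist = math.dist(embedding, reference)  # Euclidean distance
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None
```

In practice the embeddings would come from a trained model and the threshold would be tuned on labeled examples; returning None for distant faces avoids labeling every unknown person as the nearest known one.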

18 of 21

The Experiments - Chyron Extraction

(Tracey will talk about this tomorrow)

19 of 21

4: The Vision

20 of 21

The Vision

Distribution

  • Opened Captions
  • REST API

Digital Library Branches

  • Collaborative experimentation clusters

Community Contribution

  • Search and contribute data on your own
  • More sources

21 of 21

Thanks!

Dan Schultz

dan.schultz@archive.org

@slifty