1 of 54

Can computers REALLY understand human languages?

2 of 54

Who am I?

  • Liling: NTU graduate student @ LMS, HSS

  • alvations: my online alias; semi-hacker (Cheat Engine, script kiddie, not CEH, pentest enthusiast)

  • bigmax: my Othello nickname (sometimes I drop by the NUS Intellectual Games Club)

3 of 54

Overview

  • What is NLP?
    • Human vs Computer Interaction?

  • Siri - Talking to iPhone
    • Speech Recognition
    • Speech Synthesizer

  • Watson - Supercomputer in Jeopardy
    • NLP components
    • NLP resources

  • Can computers understand language?

4 of 54

What is Natural Language Processing?

NLP is where computer scientists try to make computers talk and where linguists try to talk to computers.

The ultimate aim is to make computers produce and understand human language.

5 of 54

What is Natural Language Processing?

  • Today, NLP is largely used to improve service standards by
    • understanding customers' preferences
    • simplifying complex information
    • simulating human interaction

  • Secretly, we know our altruistic, ideal goal is: to create talking androids or Cylons

6 of 54

How do humans communicate using language?

7 of 54

Hear

Think

Speak

Write

Read

8 of 54

How do computers communicate using human language?

9 of 54

Hear

Think

Speak

Write

Read

10 of 54

Speech Recognition

Grammar, Semantics, Knowledge processing, ...

Text-to-Speech Synthesizers

Various Applications

Corpora, Dictionaries, Ontologies, ...

11 of 54

12 of 54

Siri - Talking to iPhone

  • Siri is a voice-activated personal assistant and knowledge navigator
    • Personal assistant = you don't need to drag, poke or press to get things done on an iPhone
    • Knowledge navigator = automatically suggests information by filtering a large set of search results (usually based on the user's preferences)

13 of 54

Science behind Siri

Language Understanding

Feature Extraction

Word Recognition

Grammatical Structure

Literal meaning of words

Inference, implications, humor, etc

Speech Recognition (ASR)

Natural Language Generation (NLG)

Speech Synthesis

Generating and Ranking Answers

How to reply appropriately

14 of 54

Science behind Siri

Language Understanding (ASU)

Feature Extraction

Word Recognition

Syntactic Analysis

Semantic Analysis

Pragmatic Analysis

Speech Recognition (ASR)

Natural Language Generation (NLG)

Speech Synthesis

Answer Generation

Dialog Systems

15 of 54

Automatic Speech Recognition

ASR is straightforward in principle: record some sound, analyze it, and then guess what was said:

  • Feature extraction
  • Word matching (disclaimer: I won't go into much detail here)

16 of 54

Feature Extraction

Spotting patterns (i.e. features) in a sound wave.

  1. Record sound waves
  2. Pick a sample of the sound waves
  3. Analyze the energy patterns of the sample waves
  4. Spot patterns in these sample waves
  5. Apply the spotted patterns across the whole data set
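The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not any particular recognizer's front end. It assumes the recording is already a list of amplitude samples at an assumed 8 kHz rate. The function names, the 20 ms frame with a 10 ms hop, and the synthetic "silence then tone" signal are all illustrative choices.

```python
import math

def make_frames(signal, frame_len, hop):
    """Step 2: cut the recording into short, overlapping samples (frames)."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]

def short_time_energy(frame):
    """Step 3: the energy of one frame, as its mean squared amplitude."""
    return sum(x * x for x in frame) / len(frame)

def energy_profile(signal, frame_len=160, hop=80):
    """Steps 4-5: apply the same measurement to every frame of the recording."""
    return [short_time_energy(f) for f in make_frames(signal, frame_len, hop)]

if __name__ == "__main__":
    RATE = 8000  # assumed sampling rate in samples per second
    # Step 1 stand-in: 0.1 s of silence followed by 0.1 s of a loud 220 Hz tone.
    silence = [0.0] * 800
    tone = [0.8 * math.sin(2 * math.pi * 220 * n / RATE) for n in range(800)]
    profile = energy_profile(silence + tone)
    # Zeros for the silent frames, rising to about 0.3 once the tone starts.
    print([round(e, 2) for e in profile])
```

Real systems usually go on to compute spectral features per frame. Plain energy is used here only because it is the "energy pattern" the slide describes.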

17 of 54

Feature Extraction: IPA

Before spotting patterns, we ask ourselves: what types of sounds are there?

  • s z sh th (airy, hissing sound)
  • t d b p (explosive sound)
  • a e i o u (vowels = energetic sound from the vocal tract)
  • yada yada (many many other types ...)

18 of 54

Feature Extraction: IPA

Before spotting patterns, we ask ourselves: what types of sounds are there?

  • s z sh th (spirant)
  • t d b p (plosive)
  • a e i o u (vocalic)
  • yada yada (many many more ...)

19 of 54

20 of 54

Strong burst of energy from vocalic sounds

21 of 54

Airy, fuzzy sound from spirants
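The two slides above contrast loud vocalic frames with airy, fuzzy spirant frames. Below is a minimal sketch of telling them apart. It assumes two hand-picked features, short-time energy and zero-crossing rate, with hypothetical thresholds. A real recognizer would learn such boundaries from data rather than hard-code them.

```python
import math
import random

def short_time_energy(frame):
    """Mean squared amplitude: high for loud vocalic sounds."""
    return sum(x * x for x in frame) / len(frame)

def zero_crossing_rate(frame):
    """Fraction of neighbouring sample pairs whose signs differ: high for hissy noise."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)

def classify_frame(frame, silence_energy=0.001, hiss_zcr=0.3):
    """Label one frame using hypothetical, hand-picked thresholds."""
    if short_time_energy(frame) < silence_energy:
        return "silence"
    if zero_crossing_rate(frame) >= hiss_zcr:
        return "spirant"   # airy, fuzzy: the wave flips sign very often
    return "vocalic"       # strong energy carried by a smooth, slower wave

if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the synthetic "hiss" is reproducible
    vowel = [0.8 * math.sin(2 * math.pi * 220 * n / 8000) for n in range(160)]
    hiss = [rng.uniform(-0.2, 0.2) for _ in range(160)]
    for name, frame in [("vowel", vowel), ("hiss", hiss), ("quiet", [0.0] * 160)]:
        print(name, classify_frame(frame))
```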

22 of 54

Word matching

  • Ask lots of undergraduates to annotate which words go with which chunks of energy, and then

  • Use machine learning methods to determine which word matches which chunk of energy
    • Speech technologies are largely based on supervised machine learning
      • Bayesian models, HMMs (hidden Markov models), FSA (finite-state automaton) graphs
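To make the HMM bullet concrete, here is a toy Viterbi decoder. It finds the most likely sequence of hidden sound classes behind a sequence of coarse energy labels. The three states, the three observation labels and every probability are hypothetical. They stand in for values that would normally be estimated from the annotated data. A full recognizer would chain phone-level models like this into word models.

```python
STATES = ("silence", "spirant", "vocalic")

# Hypothetical probabilities; in practice they are estimated from annotated recordings.
START_P = {"silence": 0.6, "spirant": 0.2, "vocalic": 0.2}
TRANS_P = {
    "silence": {"silence": 0.5, "spirant": 0.3, "vocalic": 0.2},
    "spirant": {"silence": 0.2, "spirant": 0.4, "vocalic": 0.4},
    "vocalic": {"silence": 0.3, "spirant": 0.2, "vocalic": 0.5},
}
EMIT_P = {
    "silence": {"quiet": 0.8, "fuzzy": 0.15, "loud": 0.05},
    "spirant": {"quiet": 0.1, "fuzzy": 0.8, "loud": 0.1},
    "vocalic": {"quiet": 0.05, "fuzzy": 0.15, "loud": 0.8},
}

def viterbi(observations, states=STATES, start_p=START_P, trans_p=TRANS_P, emit_p=EMIT_P):
    """Return the most probable sequence of hidden states for the observations."""
    # best[t][s] = (probability of the best path ending in state s at time t, previous state)
    best = [{s: (start_p[s] * emit_p[s][observations[0]], None) for s in states}]
    for obs in observations[1:]:
        prev = best[-1]
        best.append({
            s: max((prev[r][0] * trans_p[r][s] * emit_p[s][obs], r) for r in states)
            for s in states
        })
    # Trace back from the most probable final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    return path[::-1]

if __name__ == "__main__":
    # Coarse energy labels for a word like "see": silence, hiss, vowel, silence.
    frames = ["quiet", "fuzzy", "fuzzy", "loud", "loud", "quiet"]
    print(viterbi(frames))
    # ['silence', 'spirant', 'spirant', 'vocalic', 'vocalic', 'silence']
```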

23 of 54

θ ɪ ʌ ɹ aɪ v ʌ l o̞ f bɛ n w a lː ɘ sː

24 of 54

If you have the Google Chrome browser and a Linux distro, open a terminal, copy and paste this, and press Return:

echo '<input type="text" x-webkit-speech />' > chrome-speak.html && google-chrome chrome-speak.html

25 of 54

Revived History:

Articulatory synthesis

They really built "robots" that imitated the human nose, mouth, lips, vocal tract and tongue (e.g. von Kempelen's speaking machine, the Voder, Vocandroids)

26 of 54

Some-time-ago History:

Concatenative Synthesis

They asked people to record large chunks of speech, then cut and pasted the recordings together.

27 of 54

State of the art:

Diphone Synthesis (Unit Selection)

Now the Stephen Hawking way...

What is a phone? A single speech sound.

What is a diphone? 2 sounds lor... (the stretch from the middle of one phone to the middle of the next)

Example:

green day

-g, gr, ri, i:, in, n-, -d, de, ei, i-
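The diphone split can be sketched as a short function. The phone transcriptions passed in are assumptions. Writing "day" as the phones d, e, i reproduces the slide's -d, de, ei, i-. As on the slide, '-' marks the silence at a word boundary. For "green", the sketch assumes four phones with /i:/ as a single long vowel.

```python
def diphones(phones):
    """Split one word's phones into diphones; '-' marks the silence at the word edges."""
    if not phones:
        return []
    units = ["-" + phones[0]]                               # silence -> first phone
    units += [a + b for a, b in zip(phones, phones[1:])]   # each phone -> next phone
    units.append(phones[-1] + "-")                          # last phone -> silence
    return units

if __name__ == "__main__":
    print(diphones(["d", "e", "i"]))        # ['-d', 'de', 'ei', 'i-']
    print(diphones(["g", "r", "i:", "n"]))  # ['-g', 'gr', 'ri:', 'i:n', 'n-']
```

A synthesizer then looks up one recorded unit per diphone and joins them in order.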

28 of 54

State of the art:

Diphone Synthesis (Unit Selection)

Q: So in English there are 26 letters in the alphabet, so we get 26^2 diphones?

A: No. There are 26 letters, but there are ~40 phonemes, so about 40^2 = 1600 diphones.
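The arithmetic in the answer can be checked by enumerating ordered pairs. The phoneme labels below are placeholders for an assumed inventory of 40. The optional word-edge units (silence to phone, phone to silence) add 2 × 40 more, which is why "about 1600" is a rounded figure.

```python
from itertools import product

def count_diphones(num_phonemes, include_silence=False):
    """Count the distinct ordered phone pairs a synthesizer could need to record."""
    inventory = [f"p{i}" for i in range(num_phonemes)]  # placeholder phoneme labels
    units = set(product(inventory, repeat=2))
    if include_silence:
        # Word-edge units such as '-g' (silence -> phone) and 'i-' (phone -> silence).
        units |= {("-", p) for p in inventory} | {(p, "-") for p in inventory}
    return len(units)

if __name__ == "__main__":
    print(count_diphones(26))                        # 676: the letter-based guess
    print(count_diphones(40))                        # 1600: the phoneme-based estimate
    print(count_diphones(40, include_silence=True))  # 1680 with word-edge units
```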

(see the Google synthesizer hack by alvas)

29 of 54

Siri's competitors

  • S-Voice (Samsung's version of Siri)
  • Evi (Android's alternative to Siri)
  • Iris (AAA, Another Android App)
  • Ahmed's Iris (an indie project)
  • Watson (from supercomputer to your iPhone)

  • Anyone else who can apply the above ASR and TTS knowledge coupled with simple NLP techniques for QnA.

30 of 54

31 of 54

What is Jeopardy?

You are given the answer (a hint), and you need to give the question that asks about the thing / time / person / location / etc.

For example,

A: The person who walked on the moon and is crowned the King of Pop.

Q: Who is Michael Jackson?
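Once the answer has been found, the last step is phrasing it as a question. Below is a toy sketch of that step. It assumes the answer's type (person, thing, time, location) has already been identified. The type-to-wh-word mapping is a hypothetical simplification.

```python
# Hypothetical mapping from the type of the answer to the wh-word of the question.
WH_WORD = {"person": "Who", "thing": "What", "time": "When", "location": "Where"}

def jeopardy_question(answer, answer_type):
    """Phrase an already-found answer as a Jeopardy-style question."""
    wh = WH_WORD.get(answer_type, "What")  # fall back to "What" for unknown types
    return f"{wh} is {answer}?"

if __name__ == "__main__":
    print(jeopardy_question("Michael Jackson", "person"))  # Who is Michael Jackson?
```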

32 of 54

Computers DON'T:

  • understand human language; they only know bits and bytes

(language structure and meaning)

  • know how to ask questions; until now, we have only seen computers answer questions.

(question forming)

33 of 54

Computers DON'T:

  • listen to pop music in the 90s

(world knowledge)

  • understand humor / puns

(subtlety in language, pragmatics)

34 of 54

Watson: Grammar and Semantics

Q: How do computers understand our grammar? How do computers know:

[ [The person] [who walked on the moon] ] and [is crowned [the king of pop] ].

A: Using Deep or Shallow processing methods

35 of 54

Deep Linguistic Processing

You have a grammarian or computational linguist sitting down in his office telling the computer,

When you see: (Part of Speech Tagging)

  • [crowned], it is a Verb in the past tense
  • [king], it is a Noun
  • [pop], it can be a Noun or a Verb depending on the words nearby
  • [moon], it's a Noun
  • ...
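The grammarian's instructions amount to a hand-written lexicon plus disambiguation rules. Below is a minimal sketch of that idea. The tiny lexicon and tag names are hypothetical. It has one rule for ambiguous words like [pop]: after a determiner or preposition, prefer the noun reading. Otherwise, prefer the verb reading.

```python
# A tiny hand-written lexicon (hypothetical): word -> set of possible tags.
LEXICON = {
    "the": {"DET"}, "person": {"NOUN"}, "who": {"PRON"},
    "walked": {"VERB-PAST"}, "on": {"PREP"}, "moon": {"NOUN"},
    "and": {"CONJ"}, "is": {"VERB"}, "crowned": {"VERB-PAST"},
    "king": {"NOUN"}, "of": {"PREP"}, "pop": {"NOUN", "VERB"},
}

def rule_based_tag(words):
    """Tag words from the lexicon, resolving ambiguity with hand-written rules."""
    tags = []
    for word in words:
        options = LEXICON.get(word.lower(), {"UNKNOWN"})
        if len(options) == 1:
            tags.append(next(iter(options)))
        elif tags and tags[-1] in {"DET", "PREP"} and "NOUN" in options:
            tags.append("NOUN")   # rule: right after [the], [of], [on] -> noun reading
        elif "VERB" in options:
            tags.append("VERB")   # otherwise prefer the verb reading
        else:
            tags.append(sorted(options)[0])
    return list(zip(words, tags))

if __name__ == "__main__":
    print(rule_based_tag("the king of pop".split()))  # [pop] tagged NOUN
    print(rule_based_tag("pop the moon".split()))     # [pop] tagged VERB
```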

36 of 54

Deep Linguistic Processing

You have a grammarian or computational linguist sitting down in his office telling the computer,

When you see: (Combinatory Rules/Constraints)

  • [the], the next word is a Noun.
  • [on], [of], the next phrase is a Noun Phrase.
  • [on], it's telling you the location of a Noun nearby
  • [of], it's telling you some property of a Noun nearby
  • [who], it's describing some property of a Noun nearby
  • ...
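Constraints like the first two bullets can be checked mechanically against an already-tagged sentence. Below is a minimal sketch that assumes the tags from the previous step. It also assumes, as a simplification, that a noun phrase starts with a determiner or a noun.

```python
def check_constraints(tagged):
    """Return (position, message) for every hand-written constraint the sentence breaks."""
    problems = []
    for i, (word, _tag) in enumerate(tagged):
        next_tag = tagged[i + 1][1] if i + 1 < len(tagged) else None
        if word.lower() == "the" and next_tag != "NOUN":
            problems.append((i, "[the] must be followed by a noun"))
        if word.lower() in {"on", "of"} and next_tag not in {"DET", "NOUN"}:
            problems.append((i, f"[{word}] must be followed by a noun phrase"))
    return problems

if __name__ == "__main__":
    good = [("the", "DET"), ("king", "NOUN"), ("of", "PREP"), ("pop", "NOUN")]
    bad = [("walked", "VERB-PAST"), ("on", "PREP"), ("walked", "VERB-PAST")]
    print(check_constraints(good))  # []
    print(check_constraints(bad))   # [(1, '[on] must be followed by a noun phrase')]
```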

37 of 54

Shallow Language Processing

Extract a sample from your data, hire some undergrads and tell them to assign a tag to each word:

  • [the] is a determiner
  • [of], [on] are prepositions
  • [crowned], [walked] are past-tense verbs
  • [king], [moon] are nouns

Then take what the students tagged, feed it to some machine learning software, and tag the rest automatically (MAGIC!!! Yes, it is!!!)
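The simplest possible version of "learn from the students' tags, then tag the rest" is a most-frequent-tag baseline, sketched below. Production taggers use stronger learners such as HMMs, CRFs or neural networks, so treat this as an illustration of the workflow only. The annotated sample and the NOUN fallback for unseen words are hypothetical choices.

```python
from collections import Counter, defaultdict

def train_unigram_tagger(annotated_sentences):
    """Learn each word's most frequent tag from the hand-annotated sample."""
    counts = defaultdict(Counter)
    for sentence in annotated_sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}

def tag_text(words, model, default="NOUN"):
    """Tag the rest of the data automatically; unseen words get a default guess."""
    return [(w, model.get(w.lower(), default)) for w in words]

# Hypothetical annotations, standing in for the undergrads' work.
ANNOTATED = [
    [("the", "DET"), ("king", "NOUN"), ("walked", "VERB-PAST")],
    [("the", "DET"), ("moon", "NOUN"), ("of", "PREP"), ("pop", "NOUN")],
    [("he", "PRON"), ("crowned", "VERB-PAST"), ("the", "DET"), ("king", "NOUN")],
    [("on", "PREP"), ("the", "DET"), ("moon", "NOUN")],
]

if __name__ == "__main__":
    model = train_unigram_tagger(ANNOTATED)
    print(tag_text("The king walked on the moon".split(), model))
```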

38 of 54

Shallow Language Processing

After tagging, we write some simple rules to tell the computer which words can combine with which:

  • [the] + [king] = NP
  • [of] + [pop] = PP
  • NP + PP = bigger NP
  • [walked] forms a VP by itself
  • [walked] + PP[on the moon] = bigger VP
  • NP + VP = S
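The combination rules above can be applied by a small greedy reducer. It repeatedly merges the rightmost adjacent pair that matches a rule, and falls back to the one-word rule for [walked] only when no pair matches. This greedy right-to-left strategy is an assumption chosen for brevity. It happens to suit this right-branching example, but real parsers (for example, chart parsers) search over alternatives instead of committing to one merge at a time.

```python
BINARY_RULES = {
    ("DET", "NOUN"): "NP",   # [the] + [king] = NP
    ("PREP", "NOUN"): "PP",  # [of] + [pop] = PP
    ("PREP", "NP"): "PP",    # [on] + [the moon] = PP
    ("NP", "PP"): "NP",      # NP + PP = bigger NP
    ("VP", "PP"): "VP",      # [walked] + PP[on the moon] = bigger VP
    ("NP", "VP"): "S",       # NP + VP = S
}
UNARY_RULES = {"VERB-PAST": "VP"}  # [walked] forms a VP by itself

def reduce_tags(tags):
    """Greedily combine adjacent labels, rightmost pair first, until nothing changes."""
    stack = list(tags)
    while True:
        for i in range(len(stack) - 2, -1, -1):
            parent = BINARY_RULES.get((stack[i], stack[i + 1]))
            if parent:
                stack[i:i + 2] = [parent]
                break
        else:
            # No pair rule fired: try promoting a single word instead.
            for i, label in enumerate(stack):
                if label in UNARY_RULES:
                    stack[i] = UNARY_RULES[label]
                    break
            else:
                return stack

if __name__ == "__main__":
    # "the king of pop walked on the moon"
    tags = ["DET", "NOUN", "PREP", "NOUN", "VERB-PAST", "PREP", "DET", "NOUN"]
    print(reduce_tags(tags))  # ['S']
```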

39 of 54

Shallow Language Processing

After tagging, we write some simple rules to tell the computer which words can combine with which:

40 of 54

Beyond Grammar

Now that the computer can understand human language structure, what about:

- meaning ???

- world knowledge ???

- humor ???

(there's a tonne of science behind each of these, but for today I'll just tell you what is required to solve them)

41 of 54

Resources needed to go beyond grammar

  • Ontologies

42 of 54

Resources needed to go beyond grammar

  • Ontologies
  • Corpora and dictionaries

43 of 54

Resources needed to go beyond grammar

  • Ontologies
  • Corpora and dictionaries
  • Annotations - undergrads who invest their time in doing these

44 of 54

Resources needed to go beyond grammar

  • Ontologies
  • Corpora and dictionaries
  • Annotations
  • Machine Learning techniques

45 of 54

Resources needed to go beyond grammar

  • Ontologies
  • Corpora and dictionaries
  • Annotations
  • Machine learning techniques
  • Information Retrieval techniques

46 of 54

Science behind Siri

Language Understanding (ASU)

Feature Extraction

Word Recognition

Syntactic Analysis

Semantic Analysis

Pragmatic Analysis

Speech Recognition (ASR)

Natural Language Generation (NLG)

Speech Synthesis

Answer Generation

Dialog Systems

47 of 54

Can computers REALLY understand human languages?

48 of 54

49 of 54

No,

at least not yet

50 of 54

51 of 54

Appendix

52 of 54

ASR Demos to try (in your free time)

  • ispeech.org (web demo)

  • ASR lists on wiki (linux friendly)

  • Nuance Dragon (commercial software)

53 of 54

Speech Synthesizer Demos + Videos

  • AT&T TTS Demo - Synthesizer (web)

  • Yamaha Synthesizer

  • Vocaloids Holograph - Synthesizer + 4D

  • Vocandroids - Synthesizer + Vocawatch

54 of 54

Computational Meaning

Lexical: each word has meanings called senses, and the combination of the chosen senses gives the meaning of the sentence
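One classic way to choose a sense for each word is the simplified Lesk algorithm, sketched below. It picks the sense whose dictionary gloss shares the most words with the surrounding context. The sense inventory and glosses are hypothetical. A real system would take them from a dictionary or a resource such as WordNet, and would usually ignore stop words.

```python
# Hypothetical sense inventory: word -> {sense id: dictionary gloss}.
SENSES = {
    "pop": {
        "pop.music": "popular music such as songs by a singer or band",
        "pop.sound": "a short sharp explosive sound",
    },
    "king": {
        "king.monarch": "a male ruler of a country",
        "king.best": "the most important or successful person in a field such as music",
    },
}

def simplified_lesk(word, context_words):
    """Pick the sense whose gloss shares the most words with the context."""
    context = {w.lower() for w in context_words}
    glosses = SENSES[word]
    return max(glosses, key=lambda sense: len(context & set(glosses[sense].split())))

if __name__ == "__main__":
    context = "the king of pop sang songs in his music career".split()
    print(simplified_lesk("pop", context), simplified_lesk("king", context))
    # pop.music king.best
```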

Logical: each word denotes a certain mathematical operation, and combining these operations yields statements that are either true or false