1 of 17

Introduction to Speech & Natural Language Processing

Lecture 9

Speech Production & Phonetics

Krishnendu Ghosh

2 of 17

What is Speech?

  • Produced by the human vocal apparatus (lungs, larynx and vocal tract)

  • Natural mode of communication among humans

  • Best communication aid for humans

3 of 17

Speech Processing: Applications

  • Speech Recognition
    • Speaking interface with machines
    • Automatic dictation system
    • Healthcare
  • Speech Synthesis
    • Speaking interface with machines
    • Voice response system
    • Screen readers, reading story books, etc.
    • Aid for visually challenged people
  • Speaker Recognition/Speaker Verification
    • Voice-based person authentication system
    • Forensic investigation application

4 of 17

Speech Processing: Applications

  • Language Identification
  • Voice-based Information Retrieval
  • Speech Enhancement
  • Pathological Speech Analysis: Detection & Classification of Voice Disorders
  • Speech Coding
  • Paralinguistic Analysis
    • Emotion recognition
    • Speaker style modeling (speaking rate, pronunciation, etc.)

5 of 17

Developing Speech Applications

  • Speech Recognition: Features to characterize sound units
  • Speech Synthesis: Parameters of Vocal-tract and Excitation
  • Speaker Recognition: Features to characterize speaker
  • Language Identification: Features to characterize language
  • Information (data) Retrieval (voice-based): Pattern discovery
  • Speech Coding: Features for efficient coding and reproduction
  • Speech Enhancement: Features to characterize speech, non-speech, noise, reverberation, etc.

6 of 17

Elements of Speech Communication

  • Talker: Message formulation and conveying via speech mode.
  • Listener: Reception of speech and message comprehension.
  • Medium: Physical medium which carries speech from talker to listener.

7 of 17

Speech Chain: Steps in Human Speech Communication

8 of 17

Speech Production System

  • Speech is produced during exhalation of air
  • Lungs and associated structures provide the required energy
  • The vocal folds inside the larynx are the main excitation source; a constriction (total or partial) along the vocal tract is an additional source
  • The supra-glottal system, which includes the pharynx, oral cavity and nasal cavity, behaves as a time-varying resonator
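
The source-plus-resonator view above can be illustrated with a minimal sketch. It assumes an idealized impulse train stands in for vocal-fold excitation and a single fixed two-pole digital resonator stands in for the supra-glottal system. The function name `resonator`, the 16 kHz sampling rate, the 100 Hz pitch, and the 700 Hz / 100 Hz resonance values are illustrative assumptions, not values from the lecture.

```python
import math

def resonator(x, fs, freq, bandwidth):
    """Two-pole digital resonator (hypothetical helper).

    y[n] = x[n] + 2*r*cos(theta)*y[n-1] - r^2*y[n-2]
    theta sets the resonance frequency, r sets its bandwidth.
    """
    r = math.exp(-math.pi * bandwidth / fs)   # pole radius from bandwidth
    theta = 2.0 * math.pi * freq / fs         # pole angle from centre frequency
    a1 = 2.0 * r * math.cos(theta)
    a2 = -r * r
    y = []
    y1 = y2 = 0.0                             # previous two outputs
    for sample in x:
        out = sample + a1 * y1 + a2 * y2
        y.append(out)
        y2, y1 = y1, out
    return y

fs = 16000                                    # assumed sampling rate (Hz)
period = fs // 100                            # assumed 100 Hz pitch -> 160 samples
# Idealized voiced excitation: one unit pulse per pitch period, 0.1 s long.
excitation = [1.0 if n % period == 0 else 0.0 for n in range(fs // 10)]
# Excitation shaped by one resonance (assumed 700 Hz centre, 100 Hz bandwidth).
speech_like = resonator(excitation, fs, freq=700.0, bandwidth=100.0)
```

A real vocal tract has several resonances that change as the articulators move; recomputing `a1` and `a2` frame by frame, and cascading several such sections, would be one way to approximate that time-varying behaviour.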

9 of 17

Excitation Sources

Voiced Excitation

  • Vibration of vocal folds
  • Voiced speech

Unvoiced Excitation

  • Total constriction along the vocal tract
  • Partial constriction along the vocal tract
  • Unvoiced speech

Mixed Excitation

  • Combination of above
  • Mixed speech
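
As a rough sketch of the three excitation types, the code below models voiced excitation as a periodic impulse train, unvoiced excitation as white Gaussian noise, and mixed excitation as their weighted sum. These are common idealizations rather than the lecture's own definitions, and the function names, the 16 kHz sampling rate, the 100 Hz pitch, and `noise_gain` are assumptions chosen for illustration.

```python
import random

def voiced_excitation(n_samples, fs=16000, f0=100.0):
    """Idealized vocal-fold vibration: one unit pulse per pitch period."""
    period = round(fs / f0)                   # pitch period in samples
    return [1.0 if n % period == 0 else 0.0 for n in range(n_samples)]

def unvoiced_excitation(n_samples, seed=0):
    """Idealized turbulence at a constriction: white Gaussian noise."""
    rng = random.Random(seed)                 # seeded for repeatability
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]

def mixed_excitation(n_samples, fs=16000, f0=100.0, noise_gain=0.1, seed=0):
    """Pulses plus scaled noise, e.g. for voiced fricatives such as /z/."""
    pulses = voiced_excitation(n_samples, fs, f0)
    noise = unvoiced_excitation(n_samples, seed)
    return [p + noise_gain * u for p, u in zip(pulses, noise)]

voiced = voiced_excitation(1600)              # 0.1 s at the assumed 16 kHz
unvoiced = unvoiced_excitation(1600)
mixed = mixed_excitation(1600)
```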

10 of 17

Production of Speech Sounds

Vowels

  • Articulation: Oral cavity is wide open; no major obstruction. Tongue position and lip shape determine vowel type. Vocal folds vibrate (voiced).
  • Example Sounds: /a/, /e/, /i/, /o/, /u/
  • Example Words: cat, bed, sit, dog, put

Unvoiced Consonants

  • Articulation: There is a complete or partial constriction somewhere in the vocal tract, and no glottal vibration.
  • Example Sounds: /p/, /t/, /k/, /s/, /f/
  • Example Words: pen, top, cat, sun, fan

11 of 17

Production of Speech Sounds

Voiced Consonants

  • Articulation: Constriction as in unvoiced consonants, but with glottal vibration.
  • Example Sounds: /b/, /d/, /g/, /z/, /v/
  • Example Words: bat, dog, go, zoo, van

Nasal Sounds

  • Articulation: The oral cavity is closed and the nasal cavity is open, so air passes through the nose.
  • Example Sounds: /m/, /n/, /ŋ/ (as in sing)
  • Example Words: man, nose, song

Fricatives

  • Articulation: Partial closure creates a narrow opening, causing continuous friction as air passes through.
  • Example Sounds: /f/, /v/, /s/, /z/, /ʃ/ (as in shoe), /ʒ/ (as in measure)
  • Example Words: fan, vase, sun, zebra, shoe, measure
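
The sound classes on these two slides are not mutually exclusive: the fricative examples also appear among the voiced and unvoiced consonants. A small lookup makes this overlap explicit. The sketch below only encodes the example sounds listed on the slides; the names `CLASSES` and `classes_of` are hypothetical, and sounds such as /ʃ/ and /ʒ/ carry only the labels the slides give them.

```python
# Example sounds copied from the slides, grouped by the class they illustrate.
CLASSES = {
    "vowel": {"a", "e", "i", "o", "u"},
    "unvoiced consonant": {"p", "t", "k", "s", "f"},
    "voiced consonant": {"b", "d", "g", "z", "v"},
    "nasal": {"m", "n", "ŋ"},
    "fricative": {"f", "v", "s", "z", "ʃ", "ʒ"},
}

def classes_of(phone):
    """Return every slide class whose example list contains the sound."""
    return [name for name, members in CLASSES.items() if phone in members]

print(classes_of("s"))   # ['unvoiced consonant', 'fricative']
print(classes_of("z"))   # ['voiced consonant', 'fricative']
print(classes_of("m"))   # ['nasal']
```

Voicing (whether the vocal folds vibrate) and manner (how the airflow is constricted) are separate dimensions, which is why /s/ and /z/ share the fricative label but differ in voicing.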

12 of 17

Graphical Model: Speech Production

13 of 17

Digital Model: Speech Production

14 of 17

Speech Perception System

15 of 17

Speech Perception Mechanism

Mainly 3 regions: outer ear, middle ear & inner ear

  • Outer ear directs speech pressure variations towards the middle ear
  • Middle ear transforms pressure variations into mechanical motion
  • Inner ear converts mechanical vibrations into electrical firings in the auditory neurons, which lead to the brain
  • Language decoding and message understanding take place at the higher centers of the brain; this stage is less well understood

16 of 17

Steps in Speech Reception and Message Comprehension

  • Acoustic pressure variations are funnelled into the middle ear by the outer ear.
  • The eardrum converts acoustic pressure variations into mechanical vibrations.
  • Mechanical vibrations are transferred to the inner ear by the middle ear bones.
  • Standing wave patterns are generated in the inner ear liquid.
  • Standing waves are converted into neural firings on the auditory nerve.
  • Neural firings are decoded and the message is comprehended in the brain.

17 of 17