
DESIGN AND DEVELOPMENT OF A PERSIAN TEXT TO SPEECH SYSTEM

Mojtaba Moattari; Supervisor: Prof. M.H. Moradi; Electrical and Computer Engineering Department, Shiraz University

PROBLEM DESCRIPTION

  • TTS systems are either concatenative or parametric synthesizers
  • Designing a TTS system from scratch needs a large amount of data

Sample of resulting parts of speech

DATA

  • Recorded 5 hours of sentences

Sentence in Persian: «چه روان، روان را روان کرد»

Lipsync program for word alignment

SPEECH SYNTHESIS APPROACH

OBJECTIVE

FUTURE WORK:

  • Design a Farsi TTS system with a small amount of data for educational purposes
  • Use pun words so that the same word easily appears in different parts of speech
  • Design a TTS system from scratch for a new language

METHOD

Sample of resulting chinks and chunks

  • Recorded 5 hours of sentences containing only pun words (the Persian word for "psyche" also means current, ego, thought, lunatic, going, … all in one word!)
  • Used Audacity to remove the second channel and noisy recordings
  • Used the Lipsync tool to align audio to text at the phoneme level
  • Extracted the part of speech of each word in the sentence
  • Extracted emission (1-gram) and transition (2-gram) probabilities for a hidden Markov model
  • Selected a list of candidate voices for each phoneme in the text
  • Used a cascade of linear classifiers to filter the candidates
  • Chose the best candidate as the synthesized speech
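The probability-extraction step above can be sketched as simple counting. This is a minimal illustration, not the poster's implementation: it assumes the training data is a list of label sequences (for example part-of-speech tags or phone labels), and the function name `ngram_probabilities` and the toy tags are hypothetical.

```python
from collections import Counter


def ngram_probabilities(sequences):
    """Estimate 1-gram and 2-gram probabilities from label sequences.

    Returns (unigram_p, bigram_p) where unigram_p[a] is the relative
    frequency of label a and bigram_p[(a, b)] approximates P(b | a).
    """
    unigrams = Counter()
    bigrams = Counter()
    for seq in sequences:
        unigrams.update(seq)
        bigrams.update(zip(seq, seq[1:]))

    total = sum(unigrams.values())
    unigram_p = {label: count / total for label, count in unigrams.items()}

    # Normalize each bigram by how often its left label starts any bigram.
    left_counts = Counter()
    for (left, _), count in bigrams.items():
        left_counts[left] += count
    bigram_p = {
        (left, right): count / left_counts[left]
        for (left, right), count in bigrams.items()
    }
    return unigram_p, bigram_p


# Toy, hypothetical tag sequences (not data from the poster).
toy = [["N", "V"], ["N", "N", "V"]]
uni, bi = ngram_probabilities(toy)
```

No smoothing is applied here; with only a few hours of recordings, unseen pairs would probably need a small floor probability.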

Audacity Program

  • Work on Fisher CSP for feature extraction / improve Fisher-CSP
  • Delve into tempo-spectral features / use the Heisenberg Uncertainty Principle as a metric / make K-SVD usable
  • Another Fuzzy-CSP improvement using supervised clustering

CVCC and character-to-phoneme look-up table
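A character-to-phoneme look-up can be sketched as a plain dictionary. The four entries and the phoneme symbols below are illustrative assumptions only, not the poster's actual table, and `to_phonemes` is a hypothetical helper name.

```python
# Illustrative entries only; the poster's actual table is not reproduced here.
CHAR_TO_PHONEME = {
    "ر": "r",
    "و": "v",
    "ا": "A",  # long vowel, symbol chosen for this sketch
    "ن": "n",
}


def to_phonemes(word, table=CHAR_TO_PHONEME):
    """Map each character through the table.

    Returns (phones, unknown): the phonemes found, and the characters
    that have no entry in the table.
    """
    phones, unknown = [], []
    for ch in word:
        if ch in table:
            phones.append(table[ch])
        else:
            unknown.append(ch)
    return phones, unknown


phones, unknown = to_phonemes("روان")
```

Persian script usually leaves short vowels unwritten, so a pure per-character look-up like this one would miss them; how the poster's table handles that is not stated.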

Results of TTS on a sample sentence:

Aligning the words with voice:

Making a phase modulator to account for delay

Using dynamic programming to search for the best synthesized speech (HMM emission and transition probabilities)
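One common form of such a dynamic-programming search is the Viterbi algorithm; the sketch below assumes that is what is meant, which the poster does not confirm. It also assumes each phoneme slot has a list of candidate unit ids, and that `emission` and `transition` are dictionaries of probabilities. All names and the toy numbers are hypothetical.

```python
import math


def viterbi(candidates, emission, transition, floor=1e-6):
    """Pick one candidate per slot, maximizing the product of emission
    and transition probabilities along the path.

    candidates: list of lists of unit ids, one list per phoneme slot.
    emission:   dict unit -> probability.
    transition: dict (previous_unit, unit) -> probability.
    floor:      probability used for missing or zero entries.
    """
    def logp(p):
        return math.log(max(p, floor))

    # best[i][u] = (log score of best path ending in u at slot i, backpointer)
    best = [{u: (logp(emission.get(u, 0.0)), None) for u in candidates[0]}]
    for slot in candidates[1:]:
        prev = best[-1]
        layer = {}
        for u in slot:
            score, back = max(
                (prev[p][0] + logp(transition.get((p, u), 0.0)), p)
                for p in prev
            )
            layer[u] = (score + logp(emission.get(u, 0.0)), back)
        best.append(layer)

    # Backtrack from the best final unit to recover the path.
    last = max(best[-1], key=lambda u: best[-1][u][0])
    path = [last]
    for i in range(len(best) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


# Toy, hypothetical candidates: two recordings for each of two phonemes.
candidates = [["r1", "r2"], ["a1", "a2"]]
emission = {"r1": 0.5, "r2": 0.5, "a1": 0.5, "a2": 0.5}
transition = {
    ("r1", "a1"): 0.1, ("r1", "a2"): 0.9,
    ("r2", "a1"): 0.5, ("r2", "a2"): 0.5,
}
best_path = viterbi(candidates, emission, transition)
```

Working in log space avoids underflow on long sentences, and the cost grows with the number of slots times the square of the candidates per slot rather than with the number of full paths.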

Samples of resulting word-to-phone conversion

Result of randomized phone concatenation (no TTS training): there is no intonation

Result of phone concatenation after TTS training: the sentence has a rhythm