
DESIGN AND DEVELOPMENT OF A PERSIAN TEXT TO SPEECH SYSTEM

Mojtaba Moattari
Supervisor: Prof. M.H. Moradi
Electrical and Computer Engineering Department, Shiraz University

PROBLEM DESCRIPTION

Sample of the resulting parts of speech

DATA

Sentence in Persian: «چه روان، روان را روان کرد»

Lipsync program for word alignment

SPEECH SYNTHESIS APPROACH

OBJECTIVE

FUTURE WORK

METHOD

Sample of the resulting chinks and chunks

Recorded 5 hours of sentences made up only of pun words (the Persian word for psyche also means current, ego, thought, lunatic, going, …, all in one word!)
Used Audacity to remove the second channel and discard noisy recordings
Used the Lipsync tool to align audio to text at the phoneme level
Extracted the part of speech of each word in the sentence
Estimated emission (1-gram) and transition (2-gram) probabilities for a hidden Markov model
Selected a list of candidate voice segments for each phoneme in the text
Used a cascade of linear classifiers to filter the candidates
Chose the best candidate as the synthesized speech
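
The 1-gram and 2-gram estimation step can be illustrated with a minimal sketch. It assumes the statistics are plain relative frequencies counted over label sequences; the function name `ngram_probabilities` and the toy labels are hypothetical and are not taken from the recorded corpus or from the poster.

```python
from collections import Counter

def ngram_probabilities(sequences):
    """Return (unigram, bigram) relative-frequency estimates.

    unigram[a]     = count(a) / total number of labels
    bigram[(a, b)] = count(a followed by b) / count(a as a left neighbour)
    """
    unigram_counts = Counter()
    bigram_counts = Counter()
    left_counts = Counter()
    for seq in sequences:
        unigram_counts.update(seq)
        # Walk over adjacent pairs inside one sequence only.
        for a, b in zip(seq, seq[1:]):
            bigram_counts[(a, b)] += 1
            left_counts[a] += 1
    total = sum(unigram_counts.values())
    unigram = {a: c / total for a, c in unigram_counts.items()}
    bigram = {(a, b): c / left_counts[a] for (a, b), c in bigram_counts.items()}
    return unigram, bigram

# Toy label sequences (hypothetical).
toy = [["N", "V"], ["N", "N", "V"]]
unigram, bigram = ngram_probabilities(toy)
```

No smoothing is applied here, so unseen pairs simply have no entry; a real system would likely need some form of smoothing or back-off.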

Audacity Program

Work on Fisher-CSP as feature extraction / improve Fisher-CSP
Delve into tempo-spectral features / use the Heisenberg uncertainty principle as a metric / make K-SVD usable
Another Fuzzy-CSP improvement using supervised clustering

CVCC and character to phoneme look-up table
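
As a rough illustration of a character-to-phoneme look-up followed by syllable splitting, the sketch below assumes the CV, CVC and CVCC syllable shapes of Persian, where every syllable begins with exactly one consonant followed by a vowel. The table `CHAR_TO_PHONEME` is hypothetical and deliberately tiny; the poster does not show the actual table, and unwritten short vowels are not handled.

```python
# Hypothetical toy table: a few Persian letters with an unambiguous reading.
# "A" stands for the long vowel written with alef in word-medial position.
CHAR_TO_PHONEME = {"ب": "b", "د": "d", "ر": "r", "م": "m", "ن": "n", "ا": "A"}
VOWELS = {"A"}

def to_phonemes(word):
    """Map each written character to a phoneme symbol via the look-up table."""
    return [CHAR_TO_PHONEME[ch] for ch in word]

def syllabify(phonemes):
    """Split a phoneme list into syllables that each start one consonant before a vowel."""
    starts = [i - 1 for i, p in enumerate(phonemes) if p in VOWELS]
    if not starts or starts[0] != 0:
        raise ValueError("expected the word to begin with consonant + vowel")
    bounds = starts + [len(phonemes)]
    return [phonemes[a:b] for a, b in zip(bounds, bounds[1:])]

one_syllable = syllabify(to_phonemes("باند"))    # b A n d  (CVCC)
two_syllables = syllabify(to_phonemes("بادام"))  # b A . d A m  (CV + CVC)
```

The sketch does not verify that a coda has at most two consonants; it only places the syllable boundaries.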

Results of TTS on a sample sentence:

Aligning the words with voice:

Making a phase modulator to account for delay

Using dynamic programming to search for the best synthesized speech (HMM emission and transition probabilities)
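
A dynamic-programming search over candidates can be sketched with the standard Viterbi recursion. This is a generic illustration, not the poster's implementation: the candidate ids, the `emission` and `transition` callables, and the toy probabilities are all hypothetical, and every probability is assumed to be strictly positive so that logarithms are defined.

```python
import math

def viterbi(candidates, emission, transition):
    """Return the highest-probability candidate sequence.

    candidates: list of lists, the candidate ids available at each position
    emission(t, c): probability of candidate c at position t
    transition(p, c): probability of moving from candidate p to candidate c
    """
    # Log-score of the best path ending in each candidate of the first position.
    best = {c: math.log(emission(0, c)) for c in candidates[0]}
    back = []
    for t in range(1, len(candidates)):
        new_best, pointers = {}, {}
        for c in candidates[t]:
            # Pick the predecessor that maximizes the path score into c.
            prev = max(best, key=lambda p: best[p] + math.log(transition(p, c)))
            new_best[c] = (best[prev] + math.log(transition(prev, c))
                           + math.log(emission(t, c)))
            pointers[c] = prev
        best = new_best
        back.append(pointers)
    # Trace the back-pointers from the best final candidate.
    path = [max(best, key=best.get)]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]

# Toy example with two positions and two candidates each (hypothetical numbers).
emis = {"a1": 0.6, "a2": 0.4, "b1": 0.5, "b2": 0.5}
trans = {("a1", "b1"): 0.1, ("a1", "b2"): 0.9, ("a2", "b1"): 0.8, ("a2", "b2"): 0.2}
best_path = viterbi([["a1", "a2"], ["b1", "b2"]],
                    lambda t, c: emis[c],
                    lambda p, c: trans[(p, c)])
```

Working in log space avoids numerical underflow on long sentences; the search cost grows linearly with the number of positions and quadratically with the number of candidates per position.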

Samples of the resulting word-to-phone conversion

Result of randomized phone concatenation (no TTS training): there is no intonation

Result of phone concatenation after TTS training: the sentence has a rhythm