DESIGN AND DEVELOPMENT OF A PERSIAN TEXT TO SPEECH SYSTEM
Mojtaba Moattari
Supervisor: Prof. M.H. Moradi
Electrical and Computer Engineering Department, Shiraz University
- TTS systems are either concatenative or parametric synthesizers
- Designing a TTS system from scratch requires a large amount of data
Sample of the resulting parts of speech
- Recorded 5 hours of sentences
Sentence in Persian: «چه روان، روان را روان کرد»
Lipsync program for word alignment
SPEECH SYNTHESIS APPROACH
- Designing a Farsi TTS system with a small amount of data for educational purposes
- Using pun words so that the same word can easily appear as different parts of speech
- Designing a TTS system from scratch for a new language
Sample of the resulting chinks and chunks
- Recorded 5 hours of sentences containing only pun words (the Persian word for "psyche" also means current, ego, thought, lunatic, going, …, all in one word!)
- Used Audacity to remove the second channel and noisy recordings
- Used the Lipsync tool to align audio to text at the phoneme level
- Extracted the part of speech of each word in the sentence
- Extracted emission (1-gram) and transition (2-gram) probabilities for the hidden Markov model
- Selected a list of candidate voices for each phoneme in the text
- Used a cascade of linear classifiers to filter the candidates
- Chose the best candidate as the synthesized speech
- Work on Fisher-CSP as feature extraction / improve Fisher-CSP
- Delve into spectro-temporal features / use the Heisenberg uncertainty principle as a metric / make K-SVD usable
- Another Fuzzy-CSP improvement using supervised clustering
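Illustrative sketch of the 1-gram / 2-gram extraction step listed above, assuming the probabilities are plain relative frequencies over label sequences (for example part-of-speech tags). The function name `estimate_ngram_probs` and the toy tags are hypothetical and not taken from the original system.

```python
from collections import Counter

def estimate_ngram_probs(sequences):
    """Return (unigram, bigram) relative-frequency estimates.

    sequences: iterable of label lists, e.g. one tag list per sentence.
    unigram[a]     ~ P(a)
    bigram[(a, b)] ~ P(b | a)
    """
    unigram_counts = Counter()
    bigram_counts = Counter()
    for seq in sequences:
        unigram_counts.update(seq)
        bigram_counts.update(zip(seq, seq[1:]))

    total = sum(unigram_counts.values())
    unigram = {a: c / total for a, c in unigram_counts.items()}

    # Normalize each bigram by how often its first label occurs as a left context.
    left_counts = Counter()
    for (a, _b), c in bigram_counts.items():
        left_counts[a] += c
    bigram = {(a, b): c / left_counts[a] for (a, b), c in bigram_counts.items()}
    return unigram, bigram

# Toy tag sequences (made up for illustration only).
toy = [["N", "ADJ", "V"], ["N", "N", "V"]]
unigram, bigram = estimate_ngram_probs(toy)
```

No smoothing is applied here; with a small corpus, unseen bigrams would get zero probability, so a real system would likely need some form of smoothing.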
CVCC and character-to-phoneme look-up table
Results of TTS on a sample sentence:
Aligning the words with voice:
Making a phase modulator to account for delay
Using dynamic programming to search for the best synthesized speech (HMM emission and transition probabilities)
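Illustrative sketch of such a dynamic-programming search, assuming a Viterbi-style recursion over one list of candidate units per phoneme position. The names `best_candidate_path`, `unit_score`, and `join_score` and all toy numbers are hypothetical; the scores stand in for the emission and transition probabilities and must be greater than zero.

```python
import math

def best_candidate_path(candidates, unit_score, join_score):
    """Pick one candidate per position maximizing the product of scores.

    candidates: list of lists, candidates[i] holds the options at position i.
    unit_score(c): probability-like score of candidate c, in (0, 1].
    join_score(p, c): probability-like score of following p with c, in (0, 1].
    """
    # best[i][c] = (log score of the best path ending in c at position i, predecessor)
    best = [{c: (math.log(unit_score(c)), None) for c in candidates[0]}]
    for i in range(1, len(candidates)):
        layer = {}
        for c in candidates[i]:
            prev, score = max(
                ((p, s + math.log(join_score(p, c))) for p, (s, _) in best[-1].items()),
                key=lambda item: item[1],
            )
            layer[c] = (score + math.log(unit_score(c)), prev)
        best.append(layer)

    # Backtrack from the best final candidate.
    last = max(best[-1], key=lambda c: best[-1][c][0])
    path = [last]
    for i in range(len(candidates) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Toy example: two positions with two candidate recordings each (made-up scores).
toy_candidates = [["a1", "a2"], ["b1", "b2"]]
toy_unit = {"a1": 0.6, "a2": 0.4, "b1": 0.5, "b2": 0.5}
toy_join = {("a1", "b1"): 0.1, ("a1", "b2"): 0.2,
            ("a2", "b1"): 0.9, ("a2", "b2"): 0.3}
toy_path = best_candidate_path(
    toy_candidates,
    lambda c: toy_unit[c],
    lambda p, c: toy_join[(p, c)],
)
```

Working in log space avoids numerical underflow on long sentences, and the search cost grows linearly with the number of positions rather than exponentially.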
Samples of the resulting word-to-phone conversion
Result of randomized phone concatenation (no TTS training): there is no intonation
Result of phone concatenation after TTS training: the sentence has a rhythm