Accent Transfer
Sameer Pusapaty, Patrick Wang
Problem statement
The goal of this project is to convert any speaker's accent to a given target accent. Specifically, we aimed to convert American to British and American to Spanish.
“to-ma-to”
“to-mah-to”
Background Research
Data sources
Kaggle:
Forvo
Pre-processing: isolating words
Parsing individual words
“Please call Stella ...”
Pre-processing: aligning speech
FastDTW (Dynamic Time Warping)
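To illustrate the alignment step, here is a minimal sketch of classic dynamic time warping over two sequences of feature vectors. It is the exact quadratic-time algorithm, not the FastDTW approximation named on the slide, and the function name `dtw_path` and the Euclidean frame distance are assumptions made for illustration.

```python
import math


def dtw_path(a, b):
    """Classic DTW between two non-empty sequences of feature vectors.

    Returns (total_cost, path), where path is a list of (i, j) index pairs
    matching frame i of `a` to frame j of `b`.
    Assumption: Euclidean distance between frames.
    """
    n, m = len(a), len(b)
    inf = math.inf
    # cost[i][j] = best cumulative cost aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b repeats
                                 cost[i][j - 1],      # b advances, a repeats
                                 cost[i - 1][j - 1])  # both advance

    # Walk back from the end to recover the warping path.
    i, j = n, m
    path = [(i - 1, j - 1)]
    while (i, j) != (1, 1):
        candidates = [
            (cost[i - 1][j - 1], i - 1, j - 1),
            (cost[i - 1][j], i - 1, j),
            (cost[i][j - 1], i, j - 1),
        ]
        _, i, j = min(candidates)
        path.append((i - 1, j - 1))
    path.reverse()
    return cost[n][m], path


# Toy example: the second sequence holds its middle value one frame longer.
slow = [[0.0], [1.0], [2.0]]
slower = [[0.0], [1.0], [1.0], [2.0]]
total, path = dtw_path(slow, slower)
```

The path repeats a frame index wherever one utterance lingers on a sound longer than the other, which is what lets two recordings of the same word be compared frame by frame.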
Pre-processing: extracting MFCC coefficients
[Figure: the audio is split into 5 ms frames, each represented by 25 MFCC coefficients: [1, 2, 3, 4 … 25], [1, 2, 3, 4 … 25], ...]
Information is lost when turning sound into MFCC coefficients!
Data Pipeline and Preparation
Speech Samples
Speech separated into words using Watson API
Sample Accent Audio Files and Target Accent Audio Files
Find MFCCs (for each set of audio files)
Align using FastDTW
DATA and LABELS
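The last pipeline step, turning an alignment into DATA and LABELS, might be sketched as follows. The helper `build_pairs` is hypothetical, and the assignment of sample-accent frames to DATA and target-accent frames to LABELS is an assumption, since the slide does not say which is which. The alignment path is taken as a list of (sample index, target index) pairs.

```python
def build_pairs(sample_frames, target_frames, path):
    """Pair up aligned MFCC frames.

    Assumption: sample-accent frames become DATA and target-accent frames
    become LABELS. `path` is a list of (sample_index, target_index) pairs
    produced by an alignment step such as DTW.
    """
    data = [sample_frames[i] for i, _ in path]
    labels = [target_frames[j] for _, j in path]
    return data, labels


# Toy frames with a single coefficient each, plus a hand-written alignment path.
sample_frames = [[0.0], [1.0], [2.0]]
target_frames = [[0.1], [1.1], [1.2], [2.1]]
path = [(0, 0), (1, 1), (1, 2), (2, 3)]
data, labels = build_pairs(sample_frames, target_frames, path)
```

After pairing, DATA and LABELS have the same number of rows, so each input frame has exactly one training target.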
Post-processing
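Averaging rows to “smooth” outcome
[Figure: groups of rows of 25 MFCC coefficients are averaged element-wise into single rows. [1, 2, 3, 4 … 25] alone stays [1, 2, 3, 4 … 25]; [1, 2, 3, 4 … 25] and [0, 0, 0, 0 … 0] average to [0.5, 1, 1.5, … 12.5]; [1, 2, 3, 4 … 25], [25, 24, 23, 22 … 1], and [1, 1, 1, 1 … 1] average to [9, 9, 9, 9 … 9].]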
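The numbers on this slide are consistent with an element-wise average over each group of rows. A minimal sketch follows; `average_rows` is a hypothetical helper, and how rows are grouped before averaging is not stated on the slide, so the groups below are simply taken from its example values.

```python
def average_rows(rows):
    """Element-wise mean of equally long rows of MFCC coefficients."""
    n = len(rows)
    return [sum(column) / n for column in zip(*rows)]


ramp = list(range(1, 26))   # [1, 2, 3, 4 ... 25]
zeros = [0] * 25            # [0, 0, 0, 0 ... 0]
reverse = ramp[::-1]        # [25, 24, 23, 22 ... 1]
ones = [1] * 25             # [1, 1, 1, 1 ... 1]

two_way = average_rows([ramp, zeros])            # [0.5, 1.0, 1.5 ... 12.5]
three_way = average_rows([ramp, reverse, ones])  # [9.0, 9.0, 9.0 ... 9.0]
```

Averaging several rows damps the differences between them, which matches the slide's stated aim of smoothing the outcome.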
Model
Details:
Input layer
Output layer
Training
Loss: MSE between prediction and label
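As a reminder of what the training loss measures, here is a minimal sketch of mean squared error between one predicted coefficient vector and its label. The function name `mse` and the flat-vector inputs are assumptions; the slides do not show how the loss was computed in the actual training code.

```python
def mse(prediction, label):
    """Mean squared error between two equally long vectors."""
    if len(prediction) != len(label):
        raise ValueError("prediction and label must have the same length")
    squared_errors = [(p - t) ** 2 for p, t in zip(prediction, label)]
    return sum(squared_errors) / len(squared_errors)


# Only the last coefficient differs, by 2, so the loss is 2**2 / 3.
loss = mse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0])
```

Because the errors are squared, one badly predicted coefficient contributes more to the loss than several slightly wrong ones.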
Results
Extensions and applications