DLHLP - HW3
Source Separation
TA: 黃冠博、陳泓廷、楊采綸
Source separation
mixture of two people speaking
Time-domain audio separation network (TasNet)
Conv-TasNet
Problem
mixture of two people speaking
Permutation Invariant Training (PIT)
[Figure: output 1 and output 2 are compared by MSE with the ground-truth speech under both assignments: (output 1, output 2) against (speech 1, speech 2), and (output 1, output 2) against (speech 2, speech 1).]
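The idea in the figure can be sketched in a few lines: compute the loss for every assignment of outputs to ground-truth speakers and keep the smallest one. This is a minimal sketch using plain Python lists and MSE as in the figure; the names `mse` and `pit_mse` are made up for illustration and are not taken from the homework code, which may use a different loss.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_mse(outputs, targets):
    """Return (lowest mean MSE, best permutation) over all assignments.

    best permutation[i] is the index of the target matched to outputs[i].
    """
    best_loss, best_perm = float("inf"), None
    for perm in permutations(range(len(targets))):
        loss = sum(mse(outputs[i], targets[p]) for i, p in enumerate(perm)) / len(perm)
        if loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy signals: the network emitted the two speakers in swapped order.
speech1 = [0.0, 1.0, 0.0, -1.0]
speech2 = [0.5, 0.5, -0.5, -0.5]
output1, output2 = speech2, speech1

loss, perm = pit_mse([output1, output2], [speech1, speech2])
print(loss, perm)
```

With a fixed assignment the swapped outputs would be penalized even though the separation is perfect; taking the minimum over permutations removes that penalty.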
Requirements
Run the following two tasks:
Dataset
1. Download from Google Drive, or
2. Use the gdown command on the command line
(run pip install gdown if it is not installed)
Implementation - Conv-TasNet
Run
Make sure you change this path (data) to the directory where min/ is located.
You can ignore these two paths.
Network Configuration
Feel free to change the configuration yourself.
A larger model takes longer to train.
PIT?
Set pit to 0 to disable PIT.
Submit testing result
Do not push separate/ to GitHub without zipping it!
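One way to zip the folder from Python is sketched below, using only the standard library. The function name `zip_results` and its default arguments are assumptions for illustration; check the archive name against what the submission instructions require before pushing.

```python
import shutil
from pathlib import Path


def zip_results(folder="separate", archive_name="separate"):
    """Zip `folder` into `<archive_name>.zip` and return the archive path."""
    folder = Path(folder)
    if not folder.is_dir():
        raise FileNotFoundError(f"{folder} does not exist")
    # root_dir/base_dir keep the top-level folder name inside the archive.
    return shutil.make_archive(
        str(archive_name),
        "zip",
        root_dir=str(folder.parent),
        base_dir=folder.name,
    )
```

Calling `zip_results("separate")` from the directory that contains separate/ would write separate.zip next to it.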
Report (1/2)
(Run experiments on at least two different hyperparameters.)
Your best model should pass the baseline: Si-SNR = 10.
NOTE: For question 1, report the Si-SNR of the separation.zip that you finally upload to GitHub.
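For reference, Si-SNR (scale-invariant signal-to-noise ratio) projects the estimate onto the reference signal and compares the energy of that projection with the energy of the residual. The sketch below is one common formulation in plain Python, returning a value in dB; the name `si_snr` and the `eps` parameter are assumptions, and the evaluation script used for this homework may differ in details.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    # Remove the mean so a constant offset does not affect the score.
    est_mean = sum(estimate) / len(estimate)
    tgt_mean = sum(target) / len(target)
    est = [x - est_mean for x in estimate]
    tgt = [x - tgt_mean for x in target]

    # Project the estimate onto the target to get the "signal" part.
    dot = sum(e * t for e, t in zip(est, tgt))
    tgt_energy = sum(t * t for t in tgt) + eps
    s_target = [dot / tgt_energy * t for t in tgt]

    # Whatever is left after the projection counts as noise.
    e_noise = [e - s for e, s in zip(est, s_target)]
    signal_energy = sum(s * s for s in s_target)
    noise_energy = sum(n * n for n in e_noise)
    return 10 * math.log10((signal_energy + eps) / (noise_energy + eps))


reference = [0.0, 1.0, 0.0, -1.0]
estimated = [0.1, 0.9, -0.1, -0.9]
print(si_snr(estimated, reference))
```

Because of the projection step, multiplying the estimate by a constant leaves the score essentially unchanged, which is what "scale-invariant" refers to.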
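Report (2/2)
Bonus (2%): Find two audio clips yourself and mix them (do not use the data provided for the homework), then test whether your model can separate them.
Upload the audio (the original files, the mixed file, and the files separated by the model), record the Si-SNR in your report, and give at least one method for improving Si-SNR (other than tuning parameters).
Put the bonus results in hw3/results/bonus/separate.zip. Besides the separated audio, create a folder named "origin" inside it containing your original files (s1, s2, mix).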
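Separated with TasNet to get an Si-SNR score, then separated again after the improvement to get a better Si-SNR, with the method explained in the report -> 2 points
Separated with TasNet to get an Si-SNR score, but only proposed a method -> 0.5 to 1 point
Separated with TasNet to get an Si-SNR score, with no discussion of improvement -> 0 points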
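For the bonus, building the mixture from two clips can be as simple as adding them sample by sample. The sketch below assumes both clips are already loaded as lists of mono float samples in [-1, 1] at the same sample rate; loading and saving audio files is left out, and the name `mix_signals` is made up for illustration.

```python
def mix_signals(s1, s2):
    """Sum two mono signals after trimming both to the shorter length.

    Returns (s1, s2, mix), all rescaled by the same factor if the sum
    would otherwise clip outside [-1, 1].
    """
    n = min(len(s1), len(s2))
    s1, s2 = s1[:n], s2[:n]
    mix = [a + b for a, b in zip(s1, s2)]

    # Use one shared scale so that mix is still exactly s1 + s2.
    peak = max((abs(x) for x in mix), default=0.0)
    scale = 1.0 / peak if peak > 1.0 else 1.0
    return (
        [a * scale for a in s1],
        [b * scale for b in s2],
        [m * scale for m in mix],
    )


clip1 = [0.5, -0.8, 0.2]
clip2 = [0.7, -0.6]
s1, s2, mix = mix_signals(clip1, clip2)
print(len(mix), max(abs(x) for x in mix))
```

Scaling the sources together with the mixture keeps them valid references when the Si-SNR of the separated output is measured.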
Reference
Deadline