1 of 19

DLHLP - HW3

Source Separation

TA: 黃冠博、陳泓廷、楊采綸

dlhlp.ta@gmail.com

2 of 19

Source separation

mixture of two people speaking

3 of 19

Time-domain audio separation network (TasNet)


4 of 19

Conv-TasNet


5 of 19

Conv-TasNet


6 of 19

Problem

mixture of two people speaking

7 of 19

Problem

mixture of two people speaking

8 of 19

Permutation Invariant Training (PIT)


https://arxiv.org/pdf/1607.00325.pdf

[Figure: output 1 and output 2 are compared by MSE against the ground truth in both orders: (speech 1, speech 2) and (speech 2, speech 1).]
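The idea on this slide can be sketched in a few lines: compute the loss for every assignment of outputs to ground-truth signals and keep the smallest one. This is a minimal illustration in pure Python, assuming MSE as the criterion (as drawn on the slide) and toy signals; the helper names `mse` and `pit_mse` are made up for this example and are not taken from the Conv-TasNet code.

```python
from itertools import permutations

def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def pit_mse(outputs, targets):
    """Return (lowest loss, best permutation) over all output-to-target assignments."""
    best_loss, best_perm = None, None
    for perm in permutations(range(len(targets))):
        # perm[i] is the index of the target assigned to output i
        loss = sum(mse(outputs[i], targets[p]) for i, p in enumerate(perm)) / len(outputs)
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm

# Toy signals (hypothetical data, for illustration only)
speech1 = [1.0, 0.0, -1.0, 0.0]
speech2 = [0.5, 0.5, 0.5, 0.5]

# The "network" happens to emit the two speakers in swapped order.
loss, perm = pit_mse([speech2, speech1], [speech1, speech2])
print(loss, perm)
```

With a fixed assignment the swapped outputs would be penalized even though the separation is perfect; taking the minimum over permutations removes that penalty.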

9 of 19

Requirements

Run the following two tasks:

3-1 speaker dependent

only two speakers
training data and testing data contain the same two speakers

3-2 speaker independent

many different speakers
the speaker sets in the training data and testing data do not overlap
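The difference between the two settings comes down to whether the training and testing speaker sets intersect. A small sketch of that check, using hypothetical speaker IDs (the real IDs come from the dataset and are not shown here):

```python
def speaker_overlap(train_speakers, test_speakers):
    """Return the set of speakers that appear in both splits."""
    return set(train_speakers) & set(test_speakers)

# Hypothetical speaker IDs, for illustration only.
dep_train, dep_test = ["spk_a", "spk_b"], ["spk_a", "spk_b"]              # like 3-1
indep_train, indep_test = ["spk_a", "spk_b", "spk_c"], ["spk_x", "spk_y"]  # like 3-2

dep_overlap = speaker_overlap(dep_train, dep_test)        # both speakers shared
indep_overlap = speaker_overlap(indep_train, indep_test)  # empty set
print(dep_overlap, indep_overlap)
```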


10 of 19

Dataset

3-1 (2 GB) only two speakers

3-2 (5.7 GB) many speakers


1. Download from Google Drive, or

2. Use the gdown command in the command line

(pip install gdown if it is not installed)
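gdown can also be called from Python. The sketch below assumes the dataset is shared as a single Google Drive file; the file ID and output name are placeholders, since the actual links are not reproduced in this text.

```python
def drive_url(file_id):
    """Build a direct-download URL for a Google Drive file ID."""
    return f"https://drive.google.com/uc?id={file_id}"

def download_dataset(file_id, output):
    """Download one Drive file to `output` using gdown (third-party package)."""
    import gdown  # imported here so the helper above works without gdown installed
    return gdown.download(url=drive_url(file_id), output=output, quiet=False)

# Placeholder values: replace with the file ID from the course's Drive link.
# download_dataset("YOUR_FILE_ID", "hw3_dataset.zip")
```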

11 of 19

Implementation - Conv-TasNet

[Link]


12 of 19

Run

egs/wsj0/run.sh


Make sure you change this path (data) to the directory where min/ is located.

You can ignore these two paths.

13 of 19

Network Configuration

egs/wsj0/run.sh


Feel free to change the configuration yourself.

Larger model size results in longer training time.
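The network configuration also shows up in the experiment directory name (see the example directory on the submission slide), which makes it easy to record which hyperparameters a run used. The sketch below parses the single-letter symbols out of that name; the symbol meanings follow the Conv-TasNet paper, and the helper `parse_tag` is a made-up name for illustration, not part of the repository.

```python
import re

# Meaning of each symbol as defined in the Conv-TasNet paper.
SYMBOLS = {
    "N": "number of filters in the autoencoder",
    "L": "length of the filters (in samples)",
    "B": "channels in the bottleneck 1x1-conv blocks",
    "H": "channels in the convolutional blocks",
    "P": "kernel size in the convolutional blocks",
    "X": "convolutional blocks in each repeat",
    "R": "number of repeats",
    "C": "number of speakers",
}

def parse_tag(tag):
    """Extract the N/L/B/H/P/X/R/C values from an experiment directory name."""
    config = {}
    for token in tag.split("_"):
        # Only tokens of the form <uppercase symbol><integer>, e.g. "N256"
        match = re.fullmatch(r"([NLBHPXRC])(\d+)", token)
        if match:
            config[match.group(1)] = int(match.group(2))
    return config

tag = ("train_r8000_N256_L20_B256_H512_P3_X8_R4_C2_gLN_causal0_relu_epoch100"
       "_half1_norm5_bs3_worker4_adam_lr1e-3_mmt0_l20_tr")
config = parse_tag(tag)
for symbol, value in config.items():
    print(f"{symbol} = {value}: {SYMBOLS[symbol]}")
```

Logging this dictionary next to the Si-SNR of each run gives a ready-made table for the hyperparameter comparison in the report.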

14 of 19

PIT?

egs/wsj0/run.sh


change pit to 0 to disable PIT

15 of 19

Submit testing result

For example, in the following directory: “Conv-TasNet/egs/wsj0/exp/train_r8000_N256_L20_B256_H512_P3_X8_R4_C2_gLN_causal0_relu_epoch100_half1_norm5_bs3_worker4_adam_lr1e-3_mmt0_l20_tr/”

Zip separate/ and push the zip to your GitHub repository (it should be less than 100 MB):

hw3/results/3-1/separate.zip
hw3/results/3-2/separate.zip
hw3/results/bonus/separate.zip (optional)
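Zipping and checking the size can be scripted. This is a minimal sketch using only the standard library; the function name `zip_separate` is made up, the 100 MB threshold is taken as 100,000,000 bytes to stay on the safe side, and the demo runs on a throwaway directory rather than a real experiment folder.

```python
import os
import shutil
import tempfile

MAX_BYTES = 100 * 1000 * 1000  # "less than 100MB", interpreted conservatively

def zip_separate(exp_dir, out_dir):
    """Zip exp_dir/separate into out_dir/separate.zip; return (path, size in bytes)."""
    os.makedirs(out_dir, exist_ok=True)
    archive = shutil.make_archive(
        os.path.join(out_dir, "separate"), "zip", root_dir=exp_dir, base_dir="separate"
    )
    size = os.path.getsize(archive)
    if size >= MAX_BYTES:
        raise ValueError(f"{archive} is {size} bytes, which is not under the limit")
    return archive, size

# Demo on a temporary directory with one dummy file standing in for real output.
with tempfile.TemporaryDirectory() as tmp:
    exp_dir = os.path.join(tmp, "exp")
    os.makedirs(os.path.join(exp_dir, "separate"))
    with open(os.path.join(exp_dir, "separate", "demo.wav"), "wb") as f:
        f.write(b"\0" * 1024)
    archive, size = zip_separate(exp_dir, os.path.join(tmp, "hw3", "results", "3-1"))
    archive_name = os.path.basename(archive)
    print(archive_name, size)
```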

Report the average Si-SNR from evaluate.log in report.pdf


Do not push separate/ to GitHub without zipping!

16 of 19

Report(1/2)

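(5%) Record the Si-SNR value in evaluate.log and the hyperparameters used for that run (for this question, do not use PIT for 3-1; use PIT for 3-2).
(5%) Try adjusting different hyperparameters, compare the differences, and analyze the results.

(Run experiments on at least 2 different hyperparameters.)

(3%) For 3-1 and 3-2 separately, compare training with and without PIT and record the results (loss learning curve, Si-SNR).
(2%) Think about why the presence or absence of PIT affects the results of 3-1 and 3-2, and write down your view.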


Your best model should pass the baseline! Si-SNR = 10
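For reference, Si-SNR (scale-invariant signal-to-noise ratio) can be computed as follows. This is a pure-Python sketch of the standard definition (zero-mean both signals, project the estimate onto the target, compare projection energy with residual energy); it is not the repository's evaluation code, and the test signals are synthetic.

```python
import math

def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a reference signal."""
    n = len(target)
    mean_e, mean_t = sum(estimate) / n, sum(target) / n
    e = [x - mean_e for x in estimate]   # zero-mean estimate
    t = [x - mean_t for x in target]     # zero-mean target
    dot = sum(a * b for a, b in zip(e, t))
    t_energy = sum(b * b for b in t) + eps
    proj = [dot / t_energy * b for b in t]        # part of the estimate along the target
    noise = [a - p for a, p in zip(e, proj)]      # everything else
    ratio = sum(p * p for p in proj) / (sum(x * x for x in noise) + eps)
    return 10 * math.log10(ratio + eps)

# Synthetic example: a sine target plus an orthogonal disturbance whose
# energy is 1/100 of the target energy, i.e. a 20 dB ratio.
target = [math.sin(2 * math.pi * 5 * i / 100) for i in range(100)]
noisy = [s + 0.1 * math.cos(2 * math.pi * 13 * i / 100) for i, s in enumerate(target)]

score = si_snr(noisy, target)
score_scaled = si_snr([2 * x for x in noisy], target)  # rescaling the estimate
print(round(score, 2), round(score_scaled, 2))
```

Rescaling the estimate leaves the score unchanged, which is the "scale-invariant" part of the name.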

NOTE: For question 1, report the Si-SNR of the separate.zip that you finally upload to GitHub.

[Template]

17 of 19

Report(2/2)

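bonus (2%): Find two audio clips yourself and mix them (do not use the data provided with the assignment), then test whether they can be separated successfully.

Upload the audio (the original files, the mixed file, and the files separated by the model), record the Si-SNR in the report, and give at least one method to improve Si-SNR (other than tuning hyperparameters).

Put the bonus results in hw3/results/bonus/separate.zip. In addition to the separated audio, create a folder "origin" inside it containing your original audio files (s1, s2, mix).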

separate/
    s1.wav, s2.wav
    origin/
        s1.wav, s2.wav, mix.wav
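Creating mix.wav from two recordings can be done with the standard library alone. The sketch below assumes 16-bit mono WAV files at the same sample rate (the `r8000` in the example experiment directory suggests 8 kHz, but check your own setup); the helper names are made up, and the demo uses two generated tones in place of real speech.

```python
import math
import os
import struct
import tempfile
import wave

def read_wav(path):
    """Read a 16-bit mono WAV file; return (sample_rate, list of samples)."""
    with wave.open(path, "rb") as w:
        assert w.getnchannels() == 1 and w.getsampwidth() == 2
        frames = w.readframes(w.getnframes())
        return w.getframerate(), list(struct.unpack("<%dh" % (len(frames) // 2), frames))

def write_wav(path, rate, samples):
    """Write a list of 16-bit integer samples as a mono WAV file."""
    with wave.open(path, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(samples), *samples))

def mix(s1, s2):
    """Average two signals sample by sample, truncated to the shorter one."""
    n = min(len(s1), len(s2))
    return [(s1[i] + s2[i]) // 2 for i in range(n)]  # averaging avoids clipping

# Demo with generated tones standing in for two recordings.
with tempfile.TemporaryDirectory() as tmp:
    origin = os.path.join(tmp, "origin")
    os.makedirs(origin)
    rate = 8000
    tone1 = [int(10000 * math.sin(2 * math.pi * 220 * i / rate)) for i in range(800)]
    tone2 = [int(10000 * math.sin(2 * math.pi * 330 * i / rate)) for i in range(600)]
    write_wav(os.path.join(origin, "s1.wav"), rate, tone1)
    write_wav(os.path.join(origin, "s2.wav"), rate, tone2)
    r1, s1 = read_wav(os.path.join(origin, "s1.wav"))
    r2, s2 = read_wav(os.path.join(origin, "s2.wav"))
    assert r1 == r2, "both sources must share one sample rate"
    write_wav(os.path.join(origin, "mix.wav"), r1, mix(s1, s2))
    mix_rate, mix_samples = read_wav(os.path.join(origin, "mix.wav"))
    print(mix_rate, len(mix_samples))
```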


[Template]

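Separated with TasNet and obtained an Si-SNR score, then improved the method, separated again with a better Si-SNR, and explained the method in the report -> 2 points

Separated with TasNet and obtained an Si-SNR score, but only proposed an improvement method -> 0.5 to 1 point

Separated with TasNet and obtained an Si-SNR score, but said nothing about improvement -> 0 points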

18 of 19

Reference

Paper - TasNet
Paper - Conv-TasNet
Source code - Conv-TasNet implemented by kaituoxu (forked)


19 of 19

Deadline

2020/05/06 9:00
