1 of 19

DLHLP - HW3

Source Separation

TA: 黃冠博、陳泓廷、楊采綸

dlhlp.ta@gmail.com

2 of 19

Source separation

mixture of two people speaking

3 of 19

Time-domain audio separation network (TasNet)

4 of 19

Conv-TasNet

5 of 19

Conv-TasNet

6 of 19

Problem

mixture of two people speaking

7 of 19

Problem

mixture of two people speaking

8 of 19

Permutation Invariant Training (PIT)

[Figure: output 1 and output 2 are compared by MSE against the ground-truth signals under both possible assignments: (speech 1, speech 2) and (speech 2, speech 1).]
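The idea in the figure can be sketched in a few lines of plain Python. This is only a minimal illustration, assuming MSE as the per-pair loss (as drawn on the slide) and short lists of floats as signals. The helper names `mse` and `pit_mse` are hypothetical and are not taken from the homework repository, whose training loss may differ.

```python
from itertools import permutations


def mse(a, b):
    """Mean squared error between two equal-length sequences of samples."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def pit_mse(outputs, targets):
    """Try every output-to-target assignment and keep the one with the lowest loss.

    Returns (best_loss, best_perm), where best_perm[i] is the index of the
    target matched to outputs[i].
    """
    best_loss, best_perm = None, None
    for perm in permutations(range(len(targets))):
        loss = sum(mse(outputs[i], targets[j]) for i, j in enumerate(perm)) / len(outputs)
        if best_loss is None or loss < best_loss:
            best_loss, best_perm = loss, perm
    return best_loss, best_perm


# Toy example: the model separated both speakers perfectly but in swapped order.
speech1 = [1.0, 0.0, -1.0, 0.0]
speech2 = [0.0, 0.5, 0.0, -0.5]
output1 = [0.0, 0.5, 0.0, -0.5]  # matches speech 2
output2 = [1.0, 0.0, -1.0, 0.0]  # matches speech 1

# Loss with a fixed assignment (output 1 -> speech 1, output 2 -> speech 2).
fixed = (mse(output1, speech1) + mse(output2, speech2)) / 2
# Loss with PIT: the swapped assignment is found automatically.
loss, perm = pit_mse([output1, output2], [speech1, speech2])
print(fixed, loss, perm)
```

With a fixed assignment, the swapped outputs are penalized even though the separation itself is perfect. PIT avoids this by scoring the best assignment only.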

9 of 19

Requirements

Run the following two tasks:

  • 3-1 speaker dependent
    • only two speakers
    • training and testing data come from the same two speakers
  • 3-2 speaker independent
    • many different speakers
    • the speaker sets of the training and testing data do not overlap
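The difference between the two tasks comes down to whether the training and testing speaker sets intersect. A minimal sketch of that check follows. The speaker IDs and the helper `shared_speakers` are hypothetical and only illustrate the definition. They do not describe the actual dataset.

```python
def shared_speakers(train_speakers, test_speakers):
    """Return the sorted list of speakers that appear in both splits."""
    return sorted(set(train_speakers) & set(test_speakers))


# Hypothetical speaker IDs, for illustration only.
# 3-1 (speaker dependent): both splits contain the same two speakers.
task_3_1 = shared_speakers(["spk_a", "spk_b"], ["spk_a", "spk_b"])
# 3-2 (speaker independent): the splits share no speaker.
task_3_2 = shared_speakers(["spk_a", "spk_b", "spk_c"], ["spk_d", "spk_e"])
print(task_3_1, task_3_2)
```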

10 of 19

Dataset

1. Download from Google Drive, or

2. Use the gdown command on the command line

(run pip install gdown if it is not installed)

11 of 19

Implementation - Conv-TasNet

12 of 19

Run

  • egs/wsj0/run.sh

Make sure you change this path (data) to the directory where min/ is located.

You can ignore these two paths.

13 of 19

Network Configuration

  • egs/wsj0/run.sh

Feel free to change the configuration yourself.

A larger model takes longer to train.

14 of 19

PIT?

  • egs/wsj0/run.sh

Set pit to 0 to disable PIT.

15 of 19

Submit testing results

  • For example, in the directory “Conv-TasNet/egs/wsj0/exp/train_r8000_N256_L20_B256_H512_P3_X8_R4_C2_gLN_causal0_relu_epoch100_half1_norm5_bs3_worker4_adam_lr1e-3_mmt0_l20_tr/”:
    • zip separate/ and push it to your GitHub repository (the zip file should be less than 100MB)
      • hw3/results/3-1/separate.zip
      • hw3/results/3-2/separate.zip
      • hw3/results/bonus/separate.zip (optional)
    • report the average Si-SNR found in evaluate.log
    • report.pdf
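The experiment directory name above encodes the network configuration, which is handy when recording hyperparameters for the report. The sketch below pulls out only the single-letter network tokens. It assumes the letters follow the notation of the Conv-TasNet paper (N, L, B, H, P, X, R, C) and that `r` is the sample rate. The helper `parse_network_config` is hypothetical and not part of the repository.

```python
import re

# Directory name copied from the slide.
EXP_DIR = (
    "train_r8000_N256_L20_B256_H512_P3_X8_R4_C2_gLN_causal0_relu_"
    "epoch100_half1_norm5_bs3_worker4_adam_lr1e-3_mmt0_l20_tr"
)

# Case-sensitive on purpose: "L20" (filter length) matches, "l20" does not.
TOKEN = re.compile(r"^([rNLBHPXRC])(\d+)$")


def parse_network_config(exp_dir):
    """Extract the single-letter network tokens from an experiment directory name.

    Assumed meanings (Conv-TasNet paper notation, not confirmed by the slides):
    r sample rate, N encoder filters, L filter length, B bottleneck channels,
    H conv-block channels, P kernel size, X blocks per repeat, R repeats,
    C number of speakers.
    """
    config = {}
    for token in exp_dir.split("_"):
        match = TOKEN.match(token)
        if match:
            config[match.group(1)] = int(match.group(2))
    return config


config = parse_network_config(EXP_DIR)
print(config)
```

The remaining tokens (normalization type, epochs, batch size, optimizer, learning rate, and so on) are deliberately left unparsed here, since their exact encoding is not described on the slides.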

Do not push separate/ to GitHub without zipping it!
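One possible way to create the archive and check its size before pushing is sketched below, using only the Python standard library. The function name `zip_directory` and the paths in the usage comment are illustrative assumptions. Any zip tool works just as well.

```python
import os
import zipfile

MAX_BYTES = 100 * 1024 * 1024  # the slides ask for a zip smaller than 100MB


def zip_directory(src_dir, zip_path):
    """Zip src_dir (keeping its folder name as the archive root).

    Returns the size of the resulting archive in bytes. zip_path should be
    located outside src_dir so the archive does not try to include itself.
    """
    src_dir = os.path.abspath(src_dir)
    parent = os.path.dirname(src_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(src_dir):
            for name in files:
                full_path = os.path.join(root, name)
                # Store paths relative to the parent, e.g. "separate/s1.wav".
                zf.write(full_path, os.path.relpath(full_path, parent))
    return os.path.getsize(zip_path)


# Example usage (paths are placeholders):
#   size = zip_directory("separate", "separate.zip")
#   assert size < MAX_BYTES, "archive is too large to push"
```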

16 of 19

Report(1/2)

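  1. (5%) Record the Si-SNR value in evaluate.log and the hyperparameters used for that run (for this question, run 3-1 without PIT and 3-2 with PIT)
  2. (5%) Try adjusting different hyperparameters, compare the differences, and try to analyze the results

(run experiments on at least 2 different hyperparameters)

  • (3%) For both 3-1 and 3-2, compare training with and without PIT and record the results (loss learning curve, Si-SNR)
  • (2%) Think about why the presence or absence of PIT affects the results of 3-1 and 3-2, and write down your view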

Your best model should pass the baseline! Si-SNR = 10

NOTE: For question 1, report the Si-SNR value of the separate.zip that you finally uploaded to GitHub
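For reference, Si-SNR (scale-invariant signal-to-noise ratio) is commonly computed by projecting the zero-mean estimate onto the zero-mean target and comparing the energy of that projection with the energy of the residual. The sketch below follows that common definition in plain Python. The function `si_snr` is illustrative, and the repository's evaluation script may differ in details such as the epsilon handling.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB between an estimated and a target signal."""
    n = len(target)
    mean_e = sum(estimate) / n
    mean_t = sum(target) / n
    # Remove the mean from both signals.
    e = [x - mean_e for x in estimate]
    t = [x - mean_t for x in target]
    # Project the estimate onto the target: s_target = (<e, t> / ||t||^2) * t
    dot = sum(a * b for a, b in zip(e, t))
    target_energy = sum(b * b for b in t) + eps
    scale = dot / target_energy
    s_target = [scale * b for b in t]
    # Everything not explained by the target counts as noise.
    e_noise = [a - b for a, b in zip(e, s_target)]
    signal_energy = sum(x * x for x in s_target)
    noise_energy = sum(x * x for x in e_noise) + eps
    return 10 * math.log10(signal_energy / noise_energy + eps)


target = [1.0, -1.0, 1.0, -1.0]
estimate = [1.1, -0.9, 0.9, -1.1]
score = si_snr(estimate, target)
# Rescaling the estimate leaves the score (almost exactly) unchanged.
rescaled_score = si_snr([2.0 * x for x in estimate], target)
print(score, rescaled_score)
```

Because the score ignores the overall gain of the estimate, a separated signal that is simply louder or quieter than the ground truth is not penalized.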

17 of 19

Report(2/2)

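bonus (2%): Find two audio clips of your own and mix them together (do not use the data provided for the homework), then test whether the model can separate them successfully.

Upload the audio (including the original files, the mixed file, and the files separated by the model), record the Si-SNR in the report, and give at least one method to improve Si-SNR (other than tuning hyperparameters).

Put the bonus results in hw3/results/bonus/separate.zip. Besides the separated audio files, create an additional folder “origin” and put your original audio files in it (s1, s2, mix).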

  • separate/
    • s1.wav, s2.wav
    • origin/
      • s1.wav, s2.wav, mix.wav
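Mixing two clips amounts to adding their samples. A minimal sketch is shown below, assuming both clips are already loaded as lists of float samples in [-1, 1] at the same sample rate. Reading and writing the .wav files is left out, and the helper `mix_signals` is hypothetical. All three signals are rescaled together when the sum would clip, so that mix still equals s1 + s2.

```python
def mix_signals(s1, s2):
    """Mix two clips sample by sample.

    Returns (s1, s2, mix) truncated to the shorter clip. If the mixture would
    exceed the [-1, 1] range, all three signals are scaled by the same gain.
    """
    n = min(len(s1), len(s2))
    s1, s2 = list(s1[:n]), list(s2[:n])
    mix = [a + b for a, b in zip(s1, s2)]
    peak = max((abs(x) for x in mix), default=0.0)
    if peak > 1.0:
        gain = 1.0 / peak
        s1 = [a * gain for a in s1]
        s2 = [b * gain for b in s2]
        mix = [m * gain for m in mix]
    return s1, s2, mix


# Toy example with made-up samples; the second clip is shorter.
s1, s2, mix = mix_signals([0.8, -0.5, 0.1], [0.6, -0.5])
print(s1, s2, mix)
```

The three returned signals correspond to the s1.wav, s2.wav, and mix.wav files expected in the origin/ folder.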

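Separated with TasNet and obtained a Si-SNR score, then improved the model, separated again to obtain a better Si-SNR, and explained the method in the report -> 2 points

Separated with TasNet and obtained a Si-SNR score, but only proposed a method -> 0.5~1 point

Separated with TasNet and obtained a Si-SNR score, but said nothing about improvement -> 0 points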

18 of 19

Reference

19 of 19

Deadline

  • 2020/05/06 9:00
