Frontend Optimization Methods for Robust Speaker Verification
Xuechen Liu
PhD Student, MULTISPEECH & CSG@UEF
MULTISPEECH Weekly, 2021.09.02
What is Speaker Verification?
Tasks of Speaker Recognition
Speaker Identification
Speaker Verification
Figures are from the course Machine Learning for Speech, UEF, Fall 2020.
How Does Modern Speaker Verification Work?
Feature Extractor -> Speaker Embedding Extractor -> Backend
TDNNs, ResNet34, Self-attention pooling, Multi-task learning, Angular Softmax,
Data augmentation via RIR, MUSAN…
Windowing -> DFT -> Power Spectrum -> Mel Filterbanks -> Log -> DCT -> MFCC
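The MFCC chain on this slide can be sketched for a single frame as follows. This is a minimal illustration, not the extractor used in the experiments: the helper names, the Hamming window, the 16 kHz sample rate, 30 filters and 20 coefficients are all assumptions chosen for the example.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters equally spaced on the mel scale, shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    hz_points = mel_to_hz(mel_points)
    freqs = np.linspace(0.0, sample_rate / 2.0, n_fft // 2 + 1)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(n_filters):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (freqs - left) / (center - left)
        falling = (right - freqs) / (right - center)
        fbank[i] = np.maximum(0.0, np.minimum(rising, falling))
    return fbank

def dct_matrix(n_ceps, n_filters):
    """First n_ceps rows of the orthonormal DCT-II matrix."""
    n = np.arange(n_filters)
    k = np.arange(n_ceps)[:, None]
    mat = np.sqrt(2.0 / n_filters) * np.cos(np.pi * k * (2 * n + 1) / (2.0 * n_filters))
    mat[0] /= np.sqrt(2.0)
    return mat

def mfcc_frame(frame, sample_rate=16000, n_filters=30, n_ceps=20):
    """Windowing -> DFT -> power spectrum -> mel filterbank -> log -> DCT."""
    n_fft = len(frame)
    windowed = frame * np.hamming(n_fft)                  # Windowing
    power = np.abs(np.fft.rfft(windowed)) ** 2            # DFT + power spectrum
    mel_energies = mel_filterbank(n_filters, n_fft, sample_rate) @ power
    log_mel = np.log(mel_energies + 1e-10)                # Log (epsilon avoids log(0))
    return dct_matrix(n_ceps, n_filters) @ log_mel        # DCT -> cepstral coefficients
```

For example, `mfcc_frame(np.random.default_rng(0).standard_normal(400))` returns a vector of 20 coefficients for one 25 ms frame at 16 kHz.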
Why Deep Speaker Embeddings?
But Then, Why the Frontend?
Feature Extractor -> Speaker Embedding Extractor -> Backend
(and less computationally demanding…)
Windowing -> DFT -> Power Spectrum -> Mel Filterbanks -> Log -> DCT -> MFCC
Anyway - We First Need to Benchmark
Short-term features, magnitude spectrum
Short-term features, phase spectrum
Short-term features, with long-term processing
Fundamental frequency features
Feature Extractor -> x-vector -> PLDA
14 hand-crafted feature extractors
We re-assess…
Anyway - We First Need to Benchmark
Constant-Q cepstral coefficients (CQCCs, Todisco et al., 2016)
Multi-taper MFCCs (Kinnunen et al., 2012)
Spectral centroid frequency coefficients / spectral centroid magnitude coefficients (SCFCs/SCMCs, Kua et al., 2010)
Mel frequency cepstral coefficients (MFCCs, baseline)
Linear prediction cepstral coefficients / perceptual linear prediction cepstral coefficients (LPCCs, Makhoul, 1975; PLPCCs, Hermansky, 1990)
Anyway - We First Need to Benchmark
Modified group delay function (MGDF, Murthy and Gadde, 2003)
All-pole group delay function (APGDF, Rajan et al., 2013)
Cosine phase function (Cosphase, Wu et al., 2012): unwrapping + cosine function
Cepstral magnitude-phase octave coefficients (CMPOCs, Yang et al., 2018)
Anyway - We First Need to Benchmark
Mean Hilbert envelope coefficients (MHECs, Sadjadi et al., 2012)
Power-normalized cepstral coefficients (PNCCs, Kim et al., 2016)
Also, MFCC+pitch: MFCC base with a 3-dimensional pitch vector appended (Ghahremani et al., 2014)
Anyway - We First Need to Benchmark
VoxCeleb1-E Results
Feature | EER (%) | minDCF |
MFCC | 4.65 | 0.5937 |
SCMC | 4.57 | 0.5875 |
Multi-taper | 4.84 | 0.5459 |
MFCC+pitch | 4.67 | 0.5223 |
MFCC+SCMC+Multi-taper | 3.89 | 0.5396 |
SITW-DEV Results
Feature | EER (%) | minDCF |
MFCC | 8.12 | 0.8531 |
PNCC | 6.08 | 0.7614 |
SCMC | 6.62 | 0.762 |
MFCC+cosphase+PNCC | 6.24 | 0.7998 |
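The EER column is the operating point where the miss rate equals the false-alarm rate. A minimal sketch of how it could be computed from trial scores is given here; the function name and the simple threshold sweep are illustrative assumptions, not the scoring tool behind the numbers in these tables.

```python
import numpy as np

def compute_eer(target_scores, nontarget_scores):
    """Return the equal error rate in [0, 1] by sweeping over all observed scores."""
    target_scores = np.asarray(target_scores, dtype=float)
    nontarget_scores = np.asarray(nontarget_scores, dtype=float)
    thresholds = np.sort(np.concatenate([target_scores, nontarget_scores]))
    # Miss rate: fraction of target trials rejected (score below threshold).
    fnr = np.array([(target_scores < t).mean() for t in thresholds])
    # False-alarm rate: fraction of non-target trials accepted (score at or above threshold).
    fpr = np.array([(nontarget_scores >= t).mean() for t in thresholds])
    # Take the threshold where the two error rates are closest and average them.
    idx = np.argmin(np.abs(fnr - fpr))
    return (fnr[idx] + fpr[idx]) / 2.0
```

Multiplying the returned value by 100 gives the percentage form used in the tables.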
Anyway - We First Need to Benchmark
Feature Extractor -> Speaker Embedding Extractor -> Backend
Alternatives to MFCCs?
Short-term features, magnitude-based
Short-term features, phase-based
Short-term features with long-term processing
BEST IN-DOMAIN!
MOST ROBUST!
We Have DNNs. Can a Data-Driven Frontend Be an Option?
Feature Extractor -> Speaker Embedding Extractor -> Backend
Windowing -> DFT -> Power Spectrum -> Mel Filterbanks -> Log -> DCT -> MFCC
Learnable MFCCs
Windowing -> DFT -> Power Spectrum -> Mel Filterbanks -> Log -> DCT -> MFCC
Feature Extractor -> Speaker Embedding Extractor -> Backend
Learnable MFCCs
Windowing -> DFT -> Power Spectrum -> Mel Filterbanks -> Log -> DCT -> MFCC
We adapt the four linear operations (windowing, DFT, Mel filterbanks, DCT)
Feature Extractor -> Speaker Embedding Extractor -> Backend
Learnable MFCCs
Log
MFCC
Kernel Initialization
Feature Extractor -> Speaker Embedding Extractor -> Backend
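To illustrate what kernel initialization could look like, the sketch below writes the window, DFT and DCT as explicit arrays filled with their standard (static MFCC) values, so that a training loop could later update them. The helper names, the Hamming window and the sizes are assumptions for this example; the Mel filterbank kernel would be initialized in the same spirit from the usual triangular filters and is omitted for brevity.

```python
import numpy as np

def init_kernels(n_fft=400, n_filters=30, n_ceps=20):
    """Standard initial values for the window, DFT and DCT kernels (hypothetical helper)."""
    n = np.arange(n_fft)
    k = np.arange(n_fft // 2 + 1)[:, None]
    window = np.hamming(n_fft)                    # windowing kernel, shape (n_fft,)
    angle = 2.0 * np.pi * k * n / n_fft
    dft_real = np.cos(angle)                      # real part of the DFT matrix
    dft_imag = -np.sin(angle)                     # imaginary part of the DFT matrix
    m = np.arange(n_filters)
    q = np.arange(n_ceps)[:, None]
    dct = np.sqrt(2.0 / n_filters) * np.cos(np.pi * q * (2 * m + 1) / (2.0 * n_filters))
    dct[0] /= np.sqrt(2.0)                        # orthonormal DCT-II scaling
    return window, dft_real, dft_imag, dct

def power_spectrum(frame, window, dft_real, dft_imag):
    """Windowing and DFT as plain multiplications, followed by the squared magnitude."""
    x = window * frame
    return (dft_real @ x) ** 2 + (dft_imag @ x) ** 2
```

With these initial values, `power_spectrum` matches `np.abs(np.fft.rfft(frame * np.hamming(len(frame)))) ** 2` up to rounding, which is the point of initializing from the static pipeline.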
Learnable MFCCs
Windowing -> DFT + Power Spectrum -> Mel Filterbanks -> Log -> DCT -> MFCC
Loss Regularization (+loss.)
Feature Extractor -> Speaker Embedding Extractor -> Backend
[1] Y. Zhu and B. Mak, Orthogonality Regularizations for End-to-End Speaker Verification. Odyssey 2020.
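The slide points to orthogonality regularization [1]. One common form is a soft penalty on how far a kernel's Gram matrix is from the identity, added to the training loss with a small weight. The sketch below shows that generic form only; whether it matches the exact regularizer or the kernels it is applied to in this work is an assumption, and the function name and weight are hypothetical.

```python
import numpy as np

def soft_orthogonality_penalty(kernel):
    """Squared Frobenius norm of (K K^T - I) for a kernel of shape (rows, cols)."""
    gram = kernel @ kernel.T
    identity = np.eye(kernel.shape[0])
    return float(np.sum((gram - identity) ** 2))

def regularized_loss(speaker_loss, kernel, weight=1e-3):
    """Speaker loss plus the weighted penalty (weight is an illustrative value)."""
    return speaker_loss + weight * soft_orthogonality_penalty(kernel)
```

The penalty is zero for a kernel with orthonormal rows, such as an orthonormal DCT matrix, and grows as training moves the kernel away from that structure.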
Learnable MFCCs
Windowing -> DFT + Power Spectrum -> Mel Filterbanks -> Log -> DCT -> MFCC
Kernel Update (+kernel.)
Feature Extractor -> Speaker Embedding Extractor -> Backend
Learnable MFCCs
Windowing -> DFT + Power Spectrum -> Mel Filterbanks -> Log -> DCT -> MFCC
6.09% EER on SITW (9.7% rel. lower)
4.33% EER on Vox-1 (6.7% rel. lower)
0.7689 minDCF on SITW (6.7% rel. lower)
0.4971 minDCF on Vox-1 (18.1% rel. lower)
+kernel.
+kernel.
+loss.
Baseline | EER (%) | minDCF |
Vox-1 test | 4.64 | 0.6071 |
SITW | 6.72 | 0.8243 |
Feature Extractor -> Speaker Embedding Extractor -> Backend
Robustness of Features Against Recent Challenges
Feature Extractor -> Speaker Embedding Extractor -> Backend
Multi-Taper
PCEN&PCMN
Filterbank
PNCCs
Robustness of Features Against Recent Challenges
Feature Extractor -> Speaker Embedding Extractor -> Backend
Multi-Taper
PNCCs
Kernel Initialization
PCEN&PCMN
Filterbank
How About Their Robustness Against Recent Challenges?
Feature Extractor -> Speaker Embedding Extractor -> Backend
PNCCs
Multi-Taper
VoxCeleb1-{E,H}
VoxMovies (new!)
PCEN&PCMN
Filterbank
Key Take-Aways
What’s Next?
Feature Extractor -> Speaker Embedding Extractor -> Backend
Speech attributes
Signal processing/filtering
Temporal/long-term operations
More robust kernels/architectures
General Speaker Verification
Scenario-Specific Speaker Verification
With applications to…
Papers Mentioned in This Presentation
THANKS FOR LISTENING!
For questions, please either ask on Mattermost or email:
(I’m not in Nancy anymore, but I’m not going anywhere either)