1 of 42

Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis

Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noe, Junichi Yamagishi

NII, JST PRESTO, UEF, PolyU, Inria

Interspeech 2024

A4-O2.3 #442

wangxin@nii.ac.jp


2 of 42

Summary in one slide

Question: how should ASV and a spoofing countermeasure (CM) be fused in theory?
Message: fusing ASV and CM != fusing ASVs (or CMs)
Methods

Linear fusion of log likelihood ratios (LLRs)
Non-linear fusion of LLRs

Results: both better than baseline, non-linear the best

Bayesian decision theory


3 of 42

Background: spoofing CM

protect human listeners

protect ASV

Spoofing CM

bona fide

spoofed

bona fide


4 of 42

Background: spoofing CM protecting ASV

ASV

Spoofing CM

enroll

bona fide

matched

bona fide

not matched


5 of 42

Background: spoofing-robust ASV (SASV)

A single deep neural network (DNN)

ASV

Spoofing CM

enroll

SASV


6 of 42

Background: spoofing-robust ASV (SASV)

Approach 1: end-to-end

A single deep neural network (DNN)

ASV

Spoofing CM

enroll

A single deep neural network (DNN)

easy to get hands on
no extra explanation


7 of 42

Background: spoofing-robust ASV (SASV)

Approach 2: fusion-based

technically demanding
re-use CM & ASV
extra explanations

enroll

Fusion

ASV

Spoofing CM


8 of 42

Question: how to properly fuse ASV and CM

enroll

ASV

Spoofing CM

Fusion


9 of 42

Question: how to properly fuse ASV and CM

Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, and Tomi Kinnunen. 2022. SASV 2022: The first spoofing-aware speaker verification challenge. In Proc. Interspeech, 2022. 2893–2897.

baseline approach ^{(Jung 2022)}

ASV

Spoofing CM

+


10 of 42

Question: how to properly fuse ASV and CM

baseline approach ^{(Jung 2022)}

ASV

Spoofing CM

+

What to do if, say,


11 of 42

Question: how to properly fuse ASV and CM

baseline approach ^{(Jung 2022)}

ASV

Spoofing CM

+

tanh

What to do if, say,
Why not normalize both, why summation …

Is there any theory to support good practice?


12 of 42

Answers by this work

Fusion in SASV != fusion in ASV (or CM) ensemble (Sec. 2.1)

Spoofing CM and ASV deal with different pairs of hypotheses
A different theory is needed

ASV subsystem

+

Logistic regression

ASV

Spoofing CM

+

tanh


13 of 42

Answers by this work

Fusion in SASV != fusion in ASV (or CM) ensemble (Sec. 2.1)

Spoofing CM and ASV deal with different pairs of hypotheses
A different theory is needed

Linear summation (Sec. 2.2–2.4)

Bayesian decision theory + compositional data analysis
In practice: calibration + sum of CM and ASV LLRs

Non-linear fusion (Sec. 2.5)

Bayesian decision theory (arXiv appendix)

the “optimal” solution to minimize a decision cost

In practice: calibration & non-linear fusion

We explain the practice in this talk


14 of 42

Method 1: linear fusion in good practice

Score calibrations are needed

ASV

Spoofing CM

+

Calibration

Why normalize , not


15 of 42

Method 1: linear fusion in good practice

Score calibrations are needed
LLRs should be summed
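
The two points above can be sketched in a few lines of Python. This is a minimal illustration, assuming both subsystem scores have already been calibrated into LLRs; the function names and the default threshold of 0 are hypothetical and not taken from the slides.

```python
import numpy as np

def linear_fusion(llr_asv, llr_cm):
    """Fuse calibrated ASV and CM scores by summing their LLRs."""
    return np.asarray(llr_asv, dtype=float) + np.asarray(llr_cm, dtype=float)

def accept(fused_llr, threshold=0.0):
    """Accept a trial when the fused score exceeds the threshold.

    The threshold is a hypothetical operating point; in practice it
    would follow from the chosen priors and decision costs.
    """
    return np.asarray(fused_llr, dtype=float) > threshold

# Two trials: calibrated ASV LLRs and calibrated CM LLRs.
fused = linear_fusion([1.0, -2.0], [0.5, 3.0])
decisions = accept(fused)
```

Summing only makes sense once both inputs live on the same LLR scale, which is why calibration comes first.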

ASV

Spoofing CM

+

Calibration

Why normalize , not
summation, product


16 of 42

Method 1: linear fusion in good practice

Score calibrations are needed
LLRs should be summed

Why normalize , not
summation, product

ASV

Spoofing CM

+

Calibration

Decisions in compositional data analysis

Three data classes but binary decisions!

(Sec. 2.2 and appendix)


17 of 42

Method 1: linear fusion in good practice

Geoffrey Stewart Morrison. 2013. Tutorial on logistic-regression calibration and fusion: converting a score to a likelihood ratio. Australian Journal of Forensic Sciences 45, 2 (2013), 173–197.

Scikit-learn: https://scikit-learn.org/stable/modules/calibration.html

Score calibration – nothing new

ASV

+

calibration

Calibration

Spoofing CM

estimate {a,b} using hold-out data

Logistic regression ^{(Morrison 2013)}
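
A minimal sketch of logistic-regression calibration follows, assuming the calibrated score is an affine map a * score + b whose parameters are fitted on hold-out data. The function names and the plain Newton solver are illustrative choices; scikit-learn's LogisticRegression could serve the same purpose.

```python
import numpy as np

def fit_linear_calibration(scores, labels, n_iter=50):
    """Estimate {a, b} so that a * score + b approximates an LLR.

    scores: raw scores from one subsystem on hold-out data.
    labels: 1 for the positive class, 0 for the negative class.
    """
    s = np.asarray(scores, dtype=float)
    y = np.asarray(labels, dtype=float)
    X = np.stack([s, np.ones_like(s)], axis=1)
    w = np.zeros(2)
    for _ in range(n_iter):
        # Newton step on the logistic-regression log-loss.
        p = 1.0 / (1.0 + np.exp(-(X @ w)))
        grad = X.T @ (p - y)
        hess = (X * (p * (1.0 - p))[:, None]).T @ X + 1e-8 * np.eye(2)
        w = w - np.linalg.solve(hess, grad)
    a, b = w
    # Logistic regression models posterior log-odds; removing the
    # hold-out prior log-odds leaves a log likelihood ratio.
    prior_log_odds = np.log(y.mean() / (1.0 - y.mean()))
    return float(a), float(b - prior_log_odds)

def calibrate(scores, a, b):
    """Map raw scores to calibrated LLRs."""
    return a * np.asarray(scores, dtype=float) + b
```

The same routine would be run once for the ASV scores and once for the CM scores, each with its own hold-out labels.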


18 of 42

Method 1: linear fusion in good practice

Niko Brummer, Albert Swart, and David Van Leeuwen. 2014. A comparison of linear and non-linear calibrations for speaker recognition. In Proc. Odyssey, 2014. 14–18.

Score calibration – nothing new

Summing LLRs

ASV

+

calibration

Calibration

Spoofing CM

Logistic regression

Generative calibration ^{(Brummer 2014)}

choose a parametric distribution
estimate the distribution parameters on the development set
compute
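
The three steps above can be sketched as follows, assuming a Gaussian is chosen as the parametric distribution for each class and that the quantity computed in the last step is the log likelihood ratio. Both are assumptions made for illustration; the function names are hypothetical.

```python
import numpy as np

def fit_gaussian_calibration(pos_scores, neg_scores):
    """Estimate a (mean, std) pair per class on development-set scores."""
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    return (pos.mean(), pos.std(ddof=1)), (neg.mean(), neg.std(ddof=1))

def _log_gauss(s, mu, sigma):
    """Log density of a univariate Gaussian."""
    return -0.5 * np.log(2.0 * np.pi * sigma ** 2) - 0.5 * ((s - mu) / sigma) ** 2

def gaussian_llr(scores, pos_params, neg_params):
    """LLR = log p(score | positive) - log p(score | negative)."""
    s = np.asarray(scores, dtype=float)
    return _log_gauss(s, *pos_params) - _log_gauss(s, *neg_params)
```

With equal variances the resulting LLR is linear in the score; with unequal variances it becomes quadratic, which is one way this approach differs from the affine logistic-regression map.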


19 of 42

Method 1: linear fusion in good practice

Luciana Ferrer, "Analysis and Comparison of Classification Metrics", arXiv:2209.05355, https://github.com/luferrer/CalibrationTutorial

David A. van Leeuwen and Niko Brümmer. 2013. The distribution of calibrated likelihood-ratios in speaker recognition. In Proc. Interspeech, 2013. 1619–1623.

Score calibration – nothing new

Summing LLRs

ASV

+

calibration

Calibration

Spoofing CM

Logistic regression

Generative calibration

Many other methods exist ^{(Ferrer 2022, van Leeuwen 2013)}


20 of 42

Method 1: linear fusion in good practice

Is linear fusion optimal for decision making?

No

+

ASV

calibration

Spoofing CM

See more in Sec. 2.5 & Appendix

Cost	Accept	Reject
Bona fide matched	0	Cmiss
Bona fide unmatched	Cfa	0
Spoofed	Cfa	0
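
To make the cost table concrete, here is a small sketch of a minimum-expected-cost decision over the three classes. It assumes class posteriors are available in the order [bona fide matched, bona fide unmatched, spoofed]; the function name and default cost values are hypothetical.

```python
import numpy as np

def bayes_decision(posteriors, c_miss=1.0, c_fa=1.0):
    """Return True (accept) where accepting has the lower expected cost.

    posteriors: array of shape (..., 3), ordered as
    [bona fide matched, bona fide unmatched, spoofed].
    """
    p = np.asarray(posteriors, dtype=float)
    # Accepting is wrong for unmatched and spoofed trials (cost Cfa).
    cost_accept = c_fa * (p[..., 1] + p[..., 2])
    # Rejecting is wrong only for matched bona fide trials (cost Cmiss).
    cost_reject = c_miss * p[..., 0]
    return cost_accept < cost_reject
```

Although there are three classes, the decision stays binary because the two negative classes share the same cost row.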


21 of 42

Method 2: non-linear fusion is better

Non-linear fusion minimizes the cost

ASV

calibration

Spoofing CM

Cost	Accept	Reject
Bona fide matched	0	Cmiss
Bona fide unmatched	Cfa	0
Spoofed	Cfa	0

fuse

for Cfa=Cmiss

See more in Sec. 2.5 & Appendix
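
One way to read non-linear fusion is sketched below. The fusion formula itself did not survive in this text, so the expression used here is an assumption: the fused score is taken to be the log-ratio of the target likelihood to a mixture of the non-target and spoofed likelihoods, weighted by a hypothetical spoofing prior rho, which gives -log[(1 - rho) exp(-llr_asv) + rho exp(-llr_cm)]. The function and parameter names are illustrative.

```python
import numpy as np

def nonlinear_fusion(llr_asv, llr_cm, rho=0.5):
    """Fuse calibrated LLRs as -log[(1 - rho) e^{-asv} + rho e^{-cm}].

    rho is an assumed prior probability that a negative trial is
    spoofed rather than a bona fide non-target; it must lie in (0, 1).
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must be strictly between 0 and 1")
    asv = np.asarray(llr_asv, dtype=float)
    cm = np.asarray(llr_cm, dtype=float)
    # Work in the log domain to avoid overflow for large |LLR| values.
    return -np.logaddexp(np.log1p(-rho) - asv, np.log(rho) - cm)
```

Under this form the fused score behaves like a soft minimum: a trial scores high only when both the ASV and the CM LLRs are high, whereas a plain sum lets one large LLR compensate for the other.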


22 of 42

Method 2: non-linear fusion is better

Tomi H. Kinnunen, Kong Aik Lee, Hemlata Tak, Nicholas Evans, and Andreas Nautsch. 2023. t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators. IEEE Trans. Pattern Anal. Mach. Intell. (2023), 1–16. https://doi.org/10.1109/TPAMI.2023.3313648

Non-linear fusion minimizes the cost

ASV

Calibration

Spoofing CM

fuse

for Cfa=Cmiss

Asserted spoofing prior ^{(Kinnunen 2023)}


24 of 42

Method 2: non-linear fusion is better

Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen, and Junichi Yamagishi. 2018. Integrated presentation attack detection and automatic speaker verification: Common features and gaussian back-end fusion. In Proc. Interspeech, 2018. 77–81.

Non-linear fusion minimizes the cost

ASV

Calibration

Spoofing CM

fuse

for Cfa=Cmiss

Asserted spoofing prior ^{(Kinnunen 2023)}

A general form of Gaussian fusion ^{(Todisco 2018)}


25 of 42

Demo on toy data set


28 of 42

Recap the practices

ASV

Calibration

Spoofing CM

fuse

Linear fusion

Non-linear fusion

All are supported by decision theory


29 of 42

Experiments

Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, and Tomi Kinnunen. 2022. SASV 2022: The first spoofing-aware speaker verification challenge. In Proc. Interspeech, 2022. 2893–2897.

Data

SASV 2022 challenge database, official protocols ^{(Jung 2022)}

Systems

All use pre-trained ASV and CM from SASV 2022 B1 ^{(Jung 2022)}
Systems differ in score calibration & fusion

Misc

Training & evaluation in six rounds
Averaged results are reported


30 of 42

Experiments

better

worse

Systems with different fusion & calibration methods

SASV-EER

^{(Jung 2022)}

From other papers

other metrics

linear

non-linear
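
For reference, the SASV-EER named above can be computed as an equal error rate in which bona fide matched trials are the positives and both bona fide unmatched and spoofed trials are the negatives. The sketch below is a simple threshold sweep, not the official challenge scoring code; the function names are hypothetical.

```python
import numpy as np

def compute_eer(pos_scores, neg_scores):
    """Equal error rate from a sweep over all observed scores."""
    pos = np.sort(np.asarray(pos_scores, dtype=float))
    neg = np.sort(np.asarray(neg_scores, dtype=float))
    thresholds = np.sort(np.concatenate([pos, neg]))
    # A trial is accepted when its score is >= the threshold.
    frr = np.searchsorted(pos, thresholds, side="left") / pos.size
    far = 1.0 - np.searchsorted(neg, thresholds, side="left") / neg.size
    # Take the threshold where the two error rates are closest.
    idx = np.argmin(np.abs(far - frr))
    return 0.5 * (far[idx] + frr[idx])

def sasv_eer(target, nontarget, spoof):
    """EER with non-target and spoofed trials pooled as negatives."""
    return compute_eer(target, np.concatenate([nontarget, spoof]))
```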


31 of 42

Experiments

ASV

CM

+

ASV

CM

+

logistic reg. calibration

ASV

CM

+

Gaussian +

logistic reg.

Gaussian +

logistic reg.

log.reg. calibration

no

calibration

log.reg. + Gaussian calibration

linear

baseline

good linear fusion


32 of 42

Experiments

ASV

CM

+

ASV

CM

+

logistic reg. calibration

ASV

CM

+

Gaussian +

logistic reg.

Gaussian +

logistic reg.

bona fide matched

bona fide unmatched

spoofed

baseline

good linear fusion


33 of 42

Experiments

ASV

CM

+

ASV

CM

+

logistic reg. calibration

ASV

CM

+

Gaussian +

logistic reg.

Gaussian +

logistic reg.

bona fide matched

bona fide unmatched

spoofed


34 of 42

Experiments

Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, and Tomi Kinnunen. 2022. SASV 2022: The first spoofing-aware speaker verification challenge. In Proc. Interspeech, 2022. 2893–2897.

You Zhang, Ge Zhu, and Zhiyao Duan. 2022. A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification. In Proc. Odyssey, June 28, 2022. ISCA, 77–84.

linear

non-linear

^{(Jung 2022)}

^{(Zhang 2022)}

The difference is small on this database

good linear fusion

good non-linear fusion


35 of 42

Main messages

Fusion in SASV != fusion in an ASV or CM ensemble
Both linear and non-linear fusion can be supported by theory
Calibration affects discrimination

ASV

Calibration

Spoofing CM

fuse


36 of 42

Pointers

Evaluation using the same Bayes decision cost

SOTA ASV is not robust to spoofing attacks

Non-linear fusion has been used by many teams in the ASVspoof 5 challenge

Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, and Itshak Lapidot. 2024. a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification. In Proc. Odyssey, 2024. 158–164. https://doi.org/10.21437/odyssey.2024-23

Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Sidhhant Arora, Junichi Yamagishi, and Joon Son Chung. 2024. To what extent can ASV systems naturally defend against spoofing attacks? In Proc. Interspeech, 2024.

A4-05.5


37 of 42

Thank you

Appendix

theory in detail

Code & Jupyter notebook

step-by-step explanation

ASVspoof


38 of 42

Fusing CM & ASV is special

Assuming a 1-0 decision cost

A single ASV

ASV

match

not match

decision

scoring


39 of 42

Fusing CM & ASV is special

Fusing ASV, face recognition, and other biometrics

ASV

Face recognition

decision

scoring

+


40 of 42

Fusing CM & ASV is special

CM and ASV are dealing with different hypotheses

ASV

CM

decision

scoring

+


41 of 42

Fusing CM & ASV is special

We have three classes of data in two separate hypothesis tests

FAKE

Bayes’ rule

&

Isometric-log-ratio

1

Simplex

Optimal way using ternary hypothesis testing

What we need

FAKE


42 of 42

Fusing CM & ASV is special

We have three classes of data in two separate hypothesis tests

FAKE

Bayes’ rule

&

Isometric-log-ratio

1

Simplex

FAKE

vs

log likelihood ratio

vs

log likelihood ratio
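
The isometric log-ratio step can be illustrated with a short sketch. It assumes a three-part composition ordered as [bona fide matched, bona fide unmatched, spoofed] and a standard pivot-style basis; the basis actually used in the paper is not shown in this text, so this particular choice is an assumption.

```python
import numpy as np

def ilr3(composition):
    """Isometric log-ratio coordinates of a 3-part composition.

    Input order (assumed): [matched, unmatched, spoofed].
    z1 contrasts matched vs unmatched (an ASV-like log-ratio).
    z2 contrasts the two bona fide parts vs spoofed (a CM-like log-ratio).
    """
    p = np.asarray(composition, dtype=float)
    p = p / p.sum(axis=-1, keepdims=True)  # closure onto the simplex
    log_p = np.log(p)
    z1 = (log_p[..., 0] - log_p[..., 1]) / np.sqrt(2.0)
    z2 = (log_p[..., 0] + log_p[..., 1] - 2.0 * log_p[..., 2]) / np.sqrt(6.0)
    return np.stack([z1, z2], axis=-1)
```

With this basis the two coordinates line up with the two log likelihood ratios contrasted on the slide: one separates matched from unmatched speakers, the other separates bona fide from spoofed speech.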
