1 of 42

Revisiting and Improving Scoring Fusion for Spoofing-aware Speaker Verification Using Compositional Data Analysis

Xin Wang, Tomi Kinnunen, Kong Aik Lee, Paul-Gauthier Noé, Junichi Yamagishi

NII, JST PRESTO, UEF, PolyU, Inria

Interspeech 2024

A4-O2.3 #442

wangxin@nii.ac.jp

2 of 42

Summary in one slide

  • Question: how should ASV and spoofing countermeasure (CM) be fused in theory?
  • Message: fusing ASV and CM != fusing ASVs (or CMs)
  • Methods
    • Linear fusion of log likelihood ratios (LLRs)
    • Non-linear fusion of LLRs
  • Results: both better than baseline, non-linear the best

Bayesian decision theory

3 of 42

Background: spoofing CM

protect human listeners

protect ASV

Spoofing CM

bona fide

spoofed

bona fide

4 of 42

Background: spoofing CM protecting ASV

ASV

Spoofing CM

enroll

bona fide

matched

bona fide

not matched

5 of 42

Background: spoofing-robust ASV (SASV)

A single deep neural network (DNN)

ASV

Spoofing CM

enroll

SASV

6 of 42

Background: spoofing-robust ASV (SASV)

  • Approach 1: end-to-end

A single deep neural network (DNN)

ASV

Spoofing CM

enroll

A single deep neural network (DNN)

  • easy to get hands on
  • no extra explanation

7 of 42

Background: spoofing-robust ASV (SASV)

  • Approach 2: fusion-based
  • technically demanding
  • re-use CM & ASV
  • extra explanations

enroll

Fusion

ASV

Spoofing CM

8 of 42

Question: how to properly fuse ASV and CM

enroll

ASV

Spoofing CM

Fusion

9 of 42

Question: how to properly fuse ASV and CM

Jee-weon Jung, Hemlata Tak, Hye-jin Shim, Hee-Soo Heo, Bong-Jin Lee, Soo-Whan Chung, Ha-Jin Yu, Nicholas Evans, and Tomi Kinnunen. 2022. SASV 2022: The first spoofing-aware speaker verification challenge. In Proc. Interspeech, 2022. 2893–2897.

  • baseline approach (Jung 2022)

ASV

Spoofing CM

+

10 of 42

Question: how to properly fuse ASV and CM

  • baseline approach (Jung 2022)

ASV

Spoofing CM

+

  • What to do if, say,

11 of 42

Question: how to properly fuse ASV and CM

  • baseline approach (Jung 2022)

ASV

Spoofing CM

+

tanh

  • What to do if, say,
  • Why not normalize both, why summation …

Any theory to support the good practice?

12 of 42

Answers by this work

  • Fusion in SASV != fusion in ASV (or CM) ensemble (Sec. 2.1)
    • Spoofing CM and ASV are dealing with different pairs of hypotheses
    • A different theory is needed

ASV subsystem

ASV subsystem

+

Logistic regression

ASV

Spoofing CM

+

tanh

13 of 42

Answers by this work

  • Fusion in SASV != fusion in ASV (or CM) ensemble (Sec. 2.1)
    • Spoofing CM and ASV are dealing with different pairs of hypotheses
    • A different theory is needed

  • Linear summation (Sec. 2.2–2.4)
    • Bayesian decision theory + compositional data analysis
    • In practice: calibration + sum of CM and ASV LLRs
  • Non-linear fusion (Sec. 2.5)
    • Bayesian decision theory (arXiv appendix)
      • the “optimal” solution to minimize a decision cost
    • In practice: calibration & non-linear fusion

We explain the practice in this talk

14 of 42

Method 1: linear fusion in good practice

  • Score calibrations are needed

ASV

Spoofing CM

+

Calibration

Calibration

  • Why normalize , not

15 of 42

Method 1: linear fusion in good practice

  • Score calibrations are needed
  • LLRs should be summed

ASV

Spoofing CM

+

Calibration

Calibration

  • Why normalize , not
  • summation, product

16 of 42

Method 1: linear fusion in good practice

  • Score calibrations are needed
  • LLRs should be summed

  • Why normalize , not
  • summation, product

ASV

Spoofing CM

+

Calibration

Calibration

Decisions in compositional data analysis

Three data classes but binary decisions!

(Sec. 2.2 and appendix)
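
The practice on this slide (calibrate each score, then sum the LLRs) can be sketched as follows. The LLR values, the function name `linear_fusion`, and the decision threshold of 0 are illustrative assumptions and are not taken from the paper.

```python
import numpy as np

def linear_fusion(llr_asv, llr_cm):
    """Sum calibrated LLRs element-wise to get one SASV score per trial."""
    return np.asarray(llr_asv, dtype=float) + np.asarray(llr_cm, dtype=float)

# Hypothetical calibrated LLRs for three trials, in this order:
# bona fide matched, bona fide unmatched, spoofed.
llr_asv = np.array([4.0, -5.0, 3.5])   # ASV: same speaker vs different speaker
llr_cm = np.array([3.0, 2.5, -6.0])    # CM: bona fide vs spoofed

fused = linear_fusion(llr_asv, llr_cm)
accept = fused > 0.0                   # threshold 0 is an illustrative choice only
```

Only the first trial scores high on both subsystems, so it is the only one accepted; the binary decision is made over three data classes.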

17 of 42

Method 1: linear fusion in good practice

Geoffrey Stewart Morrison. 2013. Tutorial on logistic-regression calibration and fusion: converting a score to a likelihood ratio. Australian Journal of Forensic Sciences 45, 2 (2013), 173–197.

Scikit-learn: https://scikit-learn.org/stable/modules/calibration.html

  • Score calibration – nothing new

ASV

+

calibration

Calibration

Spoofing CM

estimate {a,b} using hold-out data

Logistic regression (Morrison 2013)
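
A minimal sketch of logistic-regression calibration, assuming the common affine form `a * s + b` and balanced class weights so that the output can be read as an LLR. The function name, the Newton solver, and the synthetic hold-out scores are assumptions made for illustration; a library such as scikit-learn could be used instead.

```python
import numpy as np

def fit_logreg_calibration(pos_scores, neg_scores, n_iter=50):
    """Estimate {a, b} so that a * s + b approximates an LLR.

    Each class gets total weight 0.5, so the fitted log-odds carry no
    prior offset and can be read as a log likelihood ratio.
    """
    pos = np.asarray(pos_scores, dtype=float)
    neg = np.asarray(neg_scores, dtype=float)
    s = np.concatenate([pos, neg])
    y = np.concatenate([np.ones_like(pos), np.zeros_like(neg)])
    w = np.concatenate([np.full(pos.size, 0.5 / pos.size),
                        np.full(neg.size, 0.5 / neg.size)])
    X = np.stack([s, np.ones_like(s)], axis=1)    # columns: score, bias
    theta = np.zeros(2)                            # theta = [a, b]
    for _ in range(n_iter):                        # Newton's method
        p = 1.0 / (1.0 + np.exp(-(X @ theta)))
        grad = X.T @ (w * (p - y))
        hess = (X * (w * p * (1.0 - p))[:, None]).T @ X + 1e-9 * np.eye(2)
        theta = theta - np.linalg.solve(hess, grad)
    return theta[0], theta[1]

# Synthetic hold-out scores (hypothetical): unit-variance Gaussians at 2 and 0,
# for which the exact LLR is 2 * s - 2.
rng = np.random.default_rng(0)
dev_pos = rng.normal(2.0, 1.0, 5000)
dev_neg = rng.normal(0.0, 1.0, 5000)
a, b = fit_logreg_calibration(dev_pos, dev_neg)
calibrated_llr = a * dev_pos + b
```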

18 of 42

Method 1: linear fusion in good practice

Niko Brümmer, Albert Swart, and David Van Leeuwen. 2014. A comparison of linear and non-linear calibrations for speaker recognition. In Proc. Odyssey, 2014. 14–18.

  • Score calibration – nothing new

    • Summing LLRs

ASV

+

calibration

Calibration

Spoofing CM

Logistic regression

Generative calibration (Brümmer 2014)

  1. choose a parametric distribution
  2. estimate distribution parameters on dev. set
  3. compute
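
The three steps above can be sketched with a Gaussian as the parametric distribution. This assumes that the quantity computed in step 3 is the LLR, taken as the difference of the two class log-densities; the function names and the synthetic dev. scores are hypothetical.

```python
import numpy as np

def fit_gaussian(scores):
    """Step 2: estimate mean and standard deviation for one class."""
    scores = np.asarray(scores, dtype=float)
    return scores.mean(), scores.std()

def gaussian_log_pdf(s, mean, std):
    """Log-density of a univariate Gaussian."""
    return -0.5 * np.log(2.0 * np.pi * std ** 2) - 0.5 * ((s - mean) / std) ** 2

def gaussian_llr(s, pos_params, neg_params):
    """Step 3 (assumed): LLR = log p(s | positive) - log p(s | negative)."""
    s = np.asarray(s, dtype=float)
    return gaussian_log_pdf(s, *pos_params) - gaussian_log_pdf(s, *neg_params)

# Step 1 is the choice of a Gaussian per class. Synthetic dev. scores follow.
rng = np.random.default_rng(0)
pos_params = fit_gaussian(rng.normal(2.0, 1.0, 5000))
neg_params = fit_gaussian(rng.normal(0.0, 1.0, 5000))
llr = gaussian_llr(np.array([0.0, 1.0, 2.0]), pos_params, neg_params)
```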

19 of 42

Method 1: linear fusion in good practice

Luciana Ferrer, "Analysis and Comparison of Classification Metrics", arXiv:2209.05355, https://github.com/luferrer/CalibrationTutorial

David A. van Leeuwen and Niko Brümmer. 2013. The distribution of calibrated likelihood-ratios in speaker recognition. In Proc. Interspeech, 2013. 1619–1623.

  • Score calibration – nothing new

    • Summing LLRs

ASV

+

calibration

Calibration

Spoofing CM

Logistic regression

Generative calibration

Many other methods exist (Ferrer 2022, van Leeuwen 2013)

20 of 42

Method 1: linear fusion in good practice

  • Is linear fusion optimal for decision making?
    • No

+

ASV

calibration

calibration

Spoofing CM

See more in Sec. 2.5 & Appendix

Cost                  accept   reject
Bona fide matched     0        Cmiss
Bona fide unmatched   Cfa      0
Spoofed               Cfa      0
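
The cost table can be turned into an empirical average cost for a set of decisions. A minimal sketch, assuming integer class labels (0 = bona fide matched, 1 = bona fide unmatched, 2 = spoofed) and a plain average over trials; the function names and the example values are hypothetical.

```python
import numpy as np

def cost_matrix(c_miss, c_fa):
    """Rows: bona fide matched, bona fide unmatched, spoofed.
    Columns: accept, reject."""
    return np.array([[0.0, c_miss],
                     [c_fa, 0.0],
                     [c_fa, 0.0]])

def average_cost(labels, accept, c_miss=1.0, c_fa=1.0):
    """Mean cost over trials for the given accept/reject decisions."""
    labels = np.asarray(labels, dtype=int)
    column = np.where(np.asarray(accept, dtype=bool), 0, 1)
    return cost_matrix(c_miss, c_fa)[labels, column].mean()

# Four hypothetical trials: one miss (cost 1) and one false acceptance (cost 10).
example = average_cost([0, 0, 1, 2], [True, False, True, False],
                       c_miss=1.0, c_fa=10.0)
```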

21 of 42

Method 2: non-linear fusion is better

  • Non-linear fusion minimizes the cost

ASV

calibration

calibration

Spoofing CM

Cost                  accept   reject
Bona fide matched     0        Cmiss
Bona fide unmatched   Cfa      0
Spoofed               Cfa      0

fuse

for Cfa=Cmiss

See more in Sec. 2.5 & Appendix
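
A sketch of how a non-linear fusion rule of this kind can arise, under assumptions that are not spelled out on this slide: $x$ is a trial; tar, non, and spf denote bona fide matched, bona fide unmatched, and spoofed; the two classes to be rejected are pooled with an asserted spoofing prior $\rho$; and the CM LLR stands in for $\log p(x \mid \mathrm{tar}) / p(x \mid \mathrm{spf})$. The symbols $s_{\mathrm{asv}}$, $s_{\mathrm{cm}}$, and $\rho$ are notation chosen here and may differ from the paper's.

```latex
\begin{align}
p(x \mid \mathrm{rej}) &= (1-\rho)\, p(x \mid \mathrm{non}) + \rho\, p(x \mid \mathrm{spf}), \\
s_{\mathrm{asv}} &= \log \frac{p(x \mid \mathrm{tar})}{p(x \mid \mathrm{non})}, \qquad
s_{\mathrm{cm}} = \log \frac{p(x \mid \mathrm{tar})}{p(x \mid \mathrm{spf})}, \\
\log \frac{p(x \mid \mathrm{tar})}{p(x \mid \mathrm{rej})}
  &= -\log \left[ (1-\rho)\, e^{-s_{\mathrm{asv}}} + \rho\, e^{-s_{\mathrm{cm}}} \right].
\end{align}
```

With Cfa = Cmiss, the Bayes decision accepts a trial when this fused LLR exceeds the log prior odds of rejection versus acceptance.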

22 of 42

Method 2: non-linear fusion is better

Tomi H. Kinnunen, Kong Aik Lee, Hemlata Tak, Nicholas Evans, and Andreas Nautsch. 2023. t-EER: Parameter-Free Tandem Evaluation of Countermeasures and Biometric Comparators. IEEE Trans. Pattern Anal. Mach. Intell. (2023), 1–16. https://doi.org/10.1109/TPAMI.2023.3313648

  • Non-linear fusion minimizes the cost

ASV

Calibration

Calibration

Spoofing CM

fuse

for Cfa=Cmiss

Asserted spoofing prior (Kinnunen 2023)

23 of 42

Method 2: non-linear fusion is better

  • Non-linear fusion minimizes the cost

ASV

Calibration

Calibration

Spoofing CM

fuse

for Cfa=Cmiss

Asserted spoofing prior (Kinnunen 2023)

 

24 of 42

Method 2: non-linear fusion is better

Massimiliano Todisco, Héctor Delgado, Kong Aik Lee, Md Sahidullah, Nicholas Evans, Tomi Kinnunen, and Junichi Yamagishi. 2018. Integrated presentation attack detection and automatic speaker verification: Common features and gaussian back-end fusion. In Proc. Interspeech, 2018. 77–81.

  • Non-linear fusion minimizes the cost

ASV

Calibration

Calibration

Spoofing CM

fuse

for Cfa=Cmiss

Asserted spoofing prior (Kinnunen 2023)

A general form of Gaussian fusion (Todisco 2018)
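
A minimal sketch of a non-linear fusion of two calibrated LLRs with an asserted spoofing prior. It assumes the rule -log[(1 - rho) * exp(-llr_asv) + rho * exp(-llr_cm)], which follows from pooling bona fide unmatched and spoofed trials into one reject class; the exact expression in the paper may differ, and `nonlinear_fusion` and `rho` are names introduced here.

```python
import numpy as np

def nonlinear_fusion(llr_asv, llr_cm, rho=0.5):
    """Fuse calibrated ASV and CM LLRs.

    Computes -log[(1 - rho) * exp(-llr_asv) + rho * exp(-llr_cm)] in the
    log domain so that large |LLR| values do not overflow.
    """
    if not 0.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between 0 and 1")
    llr_asv = np.asarray(llr_asv, dtype=float)
    llr_cm = np.asarray(llr_cm, dtype=float)
    return -np.logaddexp(np.log1p(-rho) - llr_asv, np.log(rho) - llr_cm)

# Hypothetical trials: both LLRs high, then ASV high but CM strongly negative.
fused = nonlinear_fusion([2.0, 10.0], [2.0, -10.0], rho=0.5)
```

The rule acts like a soft minimum: the fused score is high only when both the ASV and the CM LLRs are high, whereas a plain sum lets one large LLR compensate for the other.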

 

25 of 42

Demo on toy data set

26 of 42

Demo on toy data set

27 of 42

Demo on toy data set
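
A toy comparison in the same spirit can be sketched as below. This is not the authors' demo: the data are synthetic, the LLRs are drawn from Gaussians whose variance is twice the mean (a form that makes them calibrated), spoofed trials are assumed to look like matched trials to the ASV, and all names and parameter values are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20000    # trials per reject class; 2 * n bona fide matched trials
mu = 4.0     # |mean| of each LLR; the variance is 2 * mu

def draw(sign, size):
    """Calibrated Gaussian LLRs with mean sign * mu and variance 2 * mu."""
    return rng.normal(sign * mu, np.sqrt(2.0 * mu), size)

# Columns: ASV LLR, CM LLR.
tar = np.stack([draw(+1, 2 * n), draw(+1, 2 * n)], axis=1)  # bona fide matched
non = np.stack([draw(-1, n), draw(+1, n)], axis=1)          # bona fide unmatched
spf = np.stack([draw(+1, n), draw(-1, n)], axis=1)          # spoofed
rej = np.concatenate([non, spf])

def linear(x):
    return x[:, 0] + x[:, 1]

def nonlinear(x, rho=0.5):
    return -np.logaddexp(np.log1p(-rho) - x[:, 0], np.log(rho) - x[:, 1])

def error_rate(fuse, threshold):
    """Average of miss rate and false acceptance rate (Cfa = Cmiss)."""
    miss = np.mean(fuse(tar) <= threshold)
    false_accept = np.mean(fuse(rej) > threshold)
    return 0.5 * (miss + false_accept)

# Give the linear fusion its best threshold; the non-linear one uses 0.
thresholds = np.linspace(-10.0, 20.0, 301)
best_linear = min(error_rate(linear, t) for t in thresholds)
err_nonlinear = error_rate(nonlinear, 0.0)
print(f"linear fusion, best threshold: {best_linear:.3f}")
print(f"non-linear fusion, threshold 0: {err_nonlinear:.3f}")
```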

 

28 of 42

Recap the practices

ASV

Calibration

Calibration

Spoofing CM

fuse

Linear fusion

Non-linear fusion

All are supported by decision theory

29 of 42

Experiments

  • Data
    • SASV 2022 challenge database, official protocols (Jung 2022)

  • Systems
    • All use pre-trained ASV and CM from SASV 2022 B1 (Jung 2022)
    • Systems differ in score calibration & fusion

  • Misc
    • Training & evaluation in six rounds
    • Averaged results are reported

30 of 42

Experiments

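[Figure: results of systems with different fusion & calibration methods, including (Jung 2022), two linear fusions, one non-linear fusion, and systems from other papers. Metrics: SASV-EER and other metrics. Annotations mark which direction is better and which is worse.]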

31 of 42

Experiments

Three systems, each summing (+) the ASV and CM scores:

  • baseline: ASV + CM, no calibration
  • good linear fusion: ASV + CM, log.reg. calibration on each score (linear)
  • good linear fusion: ASV + CM, log.reg. + Gaussian calibration on each score (linear)

32 of 42

Experiments

[Figure: bona fide matched, bona fide unmatched, and spoofed trials for three systems]

  • baseline: ASV + CM
  • good linear fusion: ASV + CM, logistic reg. calibration on each score
  • good linear fusion: ASV + CM, Gaussian + logistic reg. on each score

33 of 42

Experiments

[Figure: bona fide matched, bona fide unmatched, and spoofed trials for three systems]

  • ASV + CM
  • ASV + CM, logistic reg. calibration on each score
  • ASV + CM, Gaussian + logistic reg. on each score

34 of 42

Experiments

You Zhang, Ge Zhu, and Zhiyao Duan. 2022. A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification. In Proc. Odyssey, June 28, 2022. ISCA, 77–84.

linear

non-linear

(Jung 2022)

(Zhang 2022)

The difference is small on this database

good linear fusion

good non-linear fusion

35 of 42

Main messages

  • Fusion in SASV != fusion in ASV (or CM) ensemble
  • Both linear and non-linear fusion can be supported by theory
  • Calibration affects discrimination

ASV

Calibration

Calibration

Spoofing CM

fuse

36 of 42

Pointers

  • Evaluation using the same Bayes decision cost

  • SOTA ASV is not robust to spoofing attacks

  • The non-linear fusion has been used by many teams in the ASVspoof 5 challenge

Hye-jin Shim, Jee-weon Jung, Tomi Kinnunen, Nicholas Evans, Jean-Francois Bonastre, and Itshak Lapidot. 2024. a-DCF: an architecture agnostic metric with application to spoofing-robust speaker verification. In Proc. Odyssey, 2024. 158–164. https://doi.org/10.21437/odyssey.2024-23

Jee-weon Jung, Xin Wang, Nicholas Evans, Shinji Watanabe, Hye-jin Shim, Hemlata Tak, Siddhant Arora, Junichi Yamagishi, and Joon Son Chung. 2024. To what extent can ASV systems naturally defend against spoofing attacks? In Proc. Interspeech, 2024.

A4-05.5

37 of 42

Thank you

Appendix

theory in detail

Code & Jupyter notebook

step-by-step explanation

ASVspoof

38 of 42

Fusing CM & ASV is special

Assuming a 1-0 decision cost

  • A single ASV

ASV

match

not match

decision

scoring

39 of 42

Fusing CM & ASV is special

  • Fusing ASV, face recognition, and other biometrics

ASV

Face recognition

decision

scoring

+

40 of 42

Fusing CM & ASV is special

  • CM and ASV are dealing with different hypotheses

ASV

CM

decision

scoring

+

41 of 42

Fusing CM & ASV is special

  • We have three classes of data in two separate hypothesis tests

FAKE

Bayes’ rule

&

Isometric-log-ratio

1

Simplex

Optimal way using ternary hypothesis testing

What we need

FAKE

FAKE

42 of 42

Fusing CM & ASV is special

  • We have three classes of data in two separate hypothesis tests

FAKE

Bayes’ rule

&

Isometric-log-ratio

1

Simplex

FAKE

vs

log likelihood ratio

vs

log likelihood ratio
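
The isometric log-ratio (ilr) step named on this slide can be illustrated on a 3-part composition, for example the posterior probabilities of the three classes. A minimal sketch, assuming the class order (bona fide matched, bona fide unmatched, spoofed) and one particular orthonormal basis; the paper may use a different basis, and `ilr3` is a name introduced here.

```python
import numpy as np

def ilr3(p):
    """Isometric log-ratio coordinates of 3-part compositions.

    p holds positive parts along its last axis; it is renormalized to sum
    to one, so unnormalized inputs are accepted.
    """
    p = np.asarray(p, dtype=float)
    p = p / p.sum(axis=-1, keepdims=True)      # closure onto the simplex
    logp = np.log(p)
    # Coordinate 1: part 0 against part 1.
    z1 = (logp[..., 0] - logp[..., 1]) / np.sqrt(2.0)
    # Coordinate 2: geometric mean of parts 0 and 1 against part 2.
    z2 = (logp[..., 0] + logp[..., 1] - 2.0 * logp[..., 2]) / np.sqrt(6.0)
    return np.stack([z1, z2], axis=-1)

uniform = ilr3([1.0, 1.0, 1.0])        # centre of the simplex
skewed = ilr3([0.5, 0.25, 0.25])       # hypothetical posterior
```

With this basis, the first coordinate is proportional to the log-odds between bona fide matched and bona fide unmatched, and the second contrasts the two bona fide classes with the spoofed class, so points on the simplex map to a plane spanned by two log-ratios.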
