1 of 40

Decision Theory in Practice: Validation, Accuracy, and Trade-offs

By Vera Wilde, Ph.D.

2 of 40

About me

Academic:

    • Ph.D. in Politics, University of Virginia (NSF-supported)
    • Postdoctoral research: UCLA Psychology (Prejudice & Violence Lab), Harvard Kennedy School (Malcolm Wiener Center for Inequality)
    • Research focus: Bias and neutrality in tech-mediated decision-making (security, medicine, welfare)
    • Postdoc work on U.S. national policing database initiative

Civil society:

    • FOIA activism including a Knight Foundation-supported lawsuit resulting in a U.S. Circuit Court precedent (Sack v. DoD)
    • Co-creator of iBorderCtrl.no, cited as impactful science advocacy by Access Now and EDRi; engaged with CCC
    • Collaboration with media like McClatchy Newspapers and Wired

3 of 40

4 of 40

The Validation Problem

“There’s a major difference between asking people about something that you can verify and asking them about something that you can’t…”

- Stephen Fienberg, 2009 interview

Image: The American Academy of Political and Social Science, https://www.aapss.org/fellow/stephen-e-fienberg/.

5 of 40

6 of 40

Binary Screening Test Results

7 of 40

Case Study: Chat Control

Scaled up from Fienberg et al.’s 2003 NAS polygraph report, Table S-1: https://nap.nationalacademies.org/read/10420/chapter/2#5.
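As a rough illustration of what this kind of scaling involves, the sketch below runs the same arithmetic with assumed numbers; the population size, prevalence, sensitivity, and specificity are placeholders for the sake of the calculation, not figures from the NAS report or from any Chat Control proposal.

```python
# Illustrative only: the inputs below are assumptions, not report figures.
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Scale a binary screening test up to a population and return the four cells."""
    positives = population * prevalence          # items truly in the target class
    negatives = population - positives           # everything else
    tp = sensitivity * positives                 # true positives (hits)
    fn = positives - tp                          # false negatives (misses)
    fp = (1 - specificity) * negatives           # false positives (false alarms)
    tn = negatives - fp                          # true negatives (correct rejections)
    ppv = tp / (tp + fp)                         # share of flagged items that are real
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn, "PPV": ppv}

if __name__ == "__main__":
    # e.g. 1 billion messages scanned, 1 in 10,000 truly illegal,
    # and a generously accurate 90%-sensitive / 99%-specific classifier:
    print(screening_outcomes(1_000_000_000, 1e-4, 0.90, 0.99))
    # ~90,000 true positives drowned in ~10 million false positives (PPV ≈ 0.9%).
```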

8 of 40

Case Study: Chat Control

9 of 40

Irresolvable Tension

  • Binary classifications yield four types of results.
  • Maximizing true-positive rates and minimizing false-positive rates are in tension (see the sketch below).
  • Both have implications for practical outcomes of interest (e.g., security, health, research/information quality).
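A minimal sketch of the four result types and the rates in tension, using hypothetical counts (only the definitions matter here):

```python
# Hypothetical confusion-matrix counts; only the definitions matter.
tp, fp, fn, tn = 80, 500, 20, 9400

tpr = tp / (tp + fn)   # true-positive rate (sensitivity): share of real cases caught
fpr = fp / (fp + tn)   # false-positive rate: share of negatives wrongly flagged
ppv = tp / (tp + fp)   # precision: share of flags that are real cases
npv = tn / (tn + fn)   # share of non-flags that are truly negative

print(f"TPR={tpr:.2f}  FPR={fpr:.3f}  PPV={ppv:.2f}  NPV={npv:.3f}")
# Tightening the flagging criterion lowers FPR but also lowers TPR, and vice versa.
```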

10 of 40

Why the Trade-Off?

  • Idealized Holy Grail model.
  • Probabilistic signal detection realities (sketched below).
  • Tech just categorizes.
  • No exit from universal mathematical laws.
11 of 40

12 of 40

What’s the Problem?

  • Net harm is possible.
  • Probability theory implies this when three things combine (worked through in the sketch below):
    • Rarity.
    • Uncertainty.
    • Harm.

Image: Sherkiya Wedgeworth, CC Attribution-NonComm. 4.0 Int’l Lic.
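A back-of-the-envelope sketch of the rarity/uncertainty/harm point, with all parameters assumed for illustration rather than estimated for any particular program:

```python
# Rough expected-value sketch of when screening does net harm (assumed numbers).
def net_benefit(population, prevalence, sensitivity, specificity,
                benefit_per_tp, harm_per_fp):
    """Expected benefit of catches minus expected harm of false alarms."""
    positives = population * prevalence
    negatives = population - positives
    tp = sensitivity * positives
    fp = (1 - specificity) * negatives
    return benefit_per_tp * tp - harm_per_fp * fp

# Rarity (0.1% prevalence) + uncertainty (95%/95% test) + harm (each false
# positive costs a tenth of what each true positive is worth):
print(net_benefit(100_000, 0.001, 0.95, 0.95, benefit_per_tp=10, harm_per_fp=1))
# ≈ -4,045: the false positives outweigh the catches even with a "95% accurate" test.
```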

13 of 40

14 of 40

15 of 40

Case Study: UTIs in Primary Care

Table 1. Summary statistics for laboratory tests and initial antibiotic prescribing

        |           All tested            |   Positive test   |   Negative test
Year    |      N   Bacterial   Prescrib.  |      N  Prescrib. |      N  Prescrib.
        |          rate        rate       |         rate      |         rate
--------|---------------------------------|-------------------|------------------
2010    | 17,513   0.37        0.39       |  6,411  0.60      | 11,102  0.27
2011    | 21,237   0.39        0.39       |  8,305  0.60      | 12,932  0.25
2012    | 27,169   0.39        0.39       | 10,510  0.61      | 16,659  0.25
Total   | 65,919   0.38        0.39       | 25,226  0.61      | 40,693  0.26
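As a quick sanity check on how the columns relate, the snippet below recovers the overall prescribing rate in the Total row as the weighted average of the prescribing rates among positive and negative tests (numbers as read from the table; rates are rounded to two decimals in the source):

```python
# Consistency check on the Total row of Table 1.
n_pos, n_neg = 25_226, 40_693        # positive / negative test counts
rx_pos, rx_neg = 0.61, 0.26          # prescribing rates by test result

n_all = n_pos + n_neg
overall_rx = (n_pos * rx_pos + n_neg * rx_neg) / n_all
print(n_all, round(overall_rx, 2))   # 65919, ~0.39 — matches the "All tested" column
```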

16 of 40

17 of 40

Bias-variance trade-off

Image: The American Academy of Political and Social Science, https://www.aapss.org/fellow/stephen-e-fienberg/.
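A small simulation can make the trade-off concrete. The toy setup below (all parameters assumed) compares a plain sample mean with versions shrunk toward a fixed guess: shrinkage adds bias but cuts variance, and total error is lowest somewhere in between.

```python
# Toy simulation of the bias-variance trade-off (assumed setup).
import random

random.seed(1)
TRUE_MEAN, SIGMA, N, GUESS = 2.0, 4.0, 5, 0.0

def simulate(shrink, trials=20_000):
    """Return (bias^2, variance, mse) of a sample mean shrunk toward GUESS."""
    estimates = []
    for _ in range(trials):
        sample = [random.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
        mean = sum(sample) / N
        estimates.append((1 - shrink) * mean + shrink * GUESS)
    avg = sum(estimates) / trials
    bias2 = (avg - TRUE_MEAN) ** 2
    var = sum((e - avg) ** 2 for e in estimates) / trials
    return bias2, var, bias2 + var

for shrink in (0.0, 0.3, 0.6):
    b2, v, mse = simulate(shrink)
    print(f"shrink={shrink:.1f}  bias²={b2:.2f}  variance={v:.2f}  MSE={mse:.2f}")
# More shrinkage: bias² rises, variance falls; total error is minimised in between.
```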

18 of 40

19 of 40

20 of 40

21 of 40

22 of 40

23 of 40

24 of 40

25 of 40

26 of 40

Persistent Uncertainties

  • When and how do we need to worry about reverse causality – screening/intervention contributing to exactly the problem it seeks to mitigate?
  • What about the acceptability of algorithmic decision-making?
  • What about Ullrich’s concerns about strategic actors trying to game the system when they know there’s a threshold – equilibrium effects? (See the sketch after this list.)
  • Where could there be heterogeneity in how people use the tech (e.g., automation bias and its opposite)?
  • Under what conditions can we solve the validation problem? How generalizable are Ullrich’s results?
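The sketch below is one toy way to frame the threshold-gaming question (all parameters assumed): if some share of target-class actors learn the cut-off and shift just below it, the deployed detection rate falls well below the rate measured before adaptation.

```python
# Toy sketch of the equilibrium worry: adversaries who learn the threshold
# adapt, so pre-deployment detection rates overstate deployed performance.
from statistics import NormalDist

threshold = 2.0
bad = NormalDist(mu=2.5, sigma=1.0)      # assumed risk scores of target-class actors
naive_tpr = 1 - bad.cdf(threshold)       # detection rate if nobody adapts

adapt_share = 0.7                        # assumed share who learn the threshold
# Adapters push their score just under the threshold and are missed.
strategic_tpr = (1 - adapt_share) * naive_tpr
print(f"TPR without gaming: {naive_tpr:.2f}")
print(f"TPR when {adapt_share:.0%} game the threshold: {strategic_tpr:.2f}")
```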

27 of 40

Image: Cristian Faezi & Omar Vidal.

28 of 40

Application                    | Target                           | Bycatch
-------------------------------|----------------------------------|--------------------------
Trawling                       | Tuna                             | Dolphin
Polygraph                      | Spies, terrorists                | Non-spies, non-terrorists
iBorderCtrl                    | Bad crossings                    | Innocent crossings
Chat Control                   | CSAM                             | Innocent comms
Asymptomatic cancer screenings | Deaths                           | Healthy people
Lifestyle diseases             | Big problems                     | Mild cases
Advanced medical imaging       | See problems                     | Harmless anomalies
Educational ethics             | Plagiarism and AI use in writing | Innocent students
Misinformation                 | Provably wrong                   | Ambiguity, dissent
Disinformation                 | Hostile propaganda               | Counterpoint

29 of 40


30 of 40


31 of 40


32 of 40


33 of 40

A Dangerous Structure

  • Mass screenings for low-prevalence problems (MaSLoPP).
  • Signal detection.
  • Shared structure.
  • Shared problems.

Image: Russell Lee, 1942, public domain. Coolidge, Pinal County, Arizona. Casa Grande Farms, FSA (Farm Security Administration) project. Pigs at a feed trough.

34 of 40

35 of 40

Validation problem spectrum

Chat Control (unsolved)

UTIs in Danish primary care

36 of 40

Validation problem spectrum

Chat Control (unsolved)

UTIs in Danish primary care

Policy problems:

  • Border security - Online promotion of terrorism
  • Scientific integrity - Educational ethics
  • Diabetes - Breast cancer - Hypertension
  • (Vaccine) misinformation - (Ukraine) disinformation
  • Opioid prescriptions - Lupus diagnosis

37 of 40

References

Signal Detection Theory and Psychophysics, by Green & Swets. John Wiley, 1966.

“How to Improve Bayesian Reasoning Without Instruction: Frequency Formats,” Gerd Gigerenzer & Ulrich Hoffrage, Psychological Review, 102(4), October 1995.

The Polygraph and Lie Detection, Stephen Fienberg et al., National Academies Press, 2003.

“The Need for Cognitive Science in Methodology,” Sander Greenland, American Journal of Epidemiology, Vol. 186, No. 6, 15 September 2017, pp. 639–645.

38 of 40

References

Bayesian Inference in Statistical Analysis, by Box & Tiao, especially the aphorism about point estimates (relevant, e.g., to quoted accuracy rates of screening tests): “To the idea that people like to have a single number we answer that usually they shouldn’t get it,” p. 310.

Statistical Rethinking, by Richard McElreath, especially the vampire example in Chapter 3, “Sampling the Imaginary.”

Inevitable Illusions: How Mistakes of Reason Rule Our Minds, by Massimo Piattelli-Palmarini, especially Chapters 6, “The Fallacy of Near Certainty” (Bayes’ rule is required to reason about screening tests, and native intuitions tend to be poor), and 7, “The Seven Deadly Sins” (e.g., overconfidence increases more than prediction accuracy for experts, anchoring works, and untrained statistical intuitions tend to be wrong).

39 of 40

References

Michael A. Ribers & Hannes Ullrich, “Complementarities between algorithmic and human decision-making: The case of antibiotic prescribing,” Quantitative Marketing and Economics, 2024.

Harding Center for Risk Literacy, “Early detection of breast cancer by mammography screening,” Fact Box, https://www.hardingcenter.de/en/transfer-and-impact/fact-boxes/early-detection-of-cancer/early-detection-of-breast-cancer-by-mammography-screening.

“Risk stratification in breast screening workshop,” Andrew Anderson, Cristina Visintin, Antonis Antoniou, Nora Pashayan, Fiona J. Gilbert, Allan Hackshaw, Rikesh Bhatt, Harry Hill, Stuart Wright, Katherine Payne, Gabriel Rogers, Bethany Shinkins, Sian Taylor-Phillips & Rosalind Given-Wilson, BMC Proceedings, Vol. 18, No. 22, 2024.

40 of 40

References

Overdiagnosed: Making People Sick in the Pursuit of Health, H. Gilbert Welch, Lisa M. Schwartz, and Steven Woloshin (MDs; Beacon Press, 2011).

“Quantifying Biases in Causal Models: Classical Confounding vs Collider-Stratification Bias,” Sander Greenland, Epidemiology, 14(3), pp. 300–306, May 2003.

“Causal Diagrams,” by M. Maria Glymour and Sander Greenland, Chapter 12 in Modern Epidemiology (3rd ed.).