1 of 37

Unsupervised out-of-distribution detection in digital pathology

Gabriel Raya

30th November 2020

Thesis for the degree of Master of Science in Computing Science – Data Science specialization

2 of 37

Twan van Laarhoven

Assistant professor - Data Science

Jasper Linmans

PhD Candidate - Uncertainty estimation

3 of 37

Deep Learning Success

  • Deep Learning is everywhere!


4 of 37

Example: Handwritten Digit Recognition System



Figure: a model trained on MNIST (in-distribution) is shown a smiley face (out-of-distribution, OoD).

5 of 37

Deep learning fails on OOD inputs

  • This failure is known as the out-of-distribution problem.


6 of 37

Pipeline: Protecting predictive models


Figure: an input first passes through an OOD detector; only then does the deep learning model trained on MNIST predict, e.g., “4”.

Out-of-distribution detection allows us to measure how a model generalizes under domain shift, detecting whether the model knows what it doesn't know!
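A minimal sketch of this guard in Python (the names `classifier`, `ood_score`, and `threshold` are hypothetical placeholders, not from the thesis):

    def predict_with_guard(x, classifier, ood_score, threshold):
        """Screen the input with an OOD detector before predicting."""
        if ood_score(x) > threshold:
            return None  # abstain: the input looks out-of-distribution
        return classifier(x)  # in-distribution: predict, e.g. "4"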

7 of 37

Why does deep learning fail on OOD inputs?



a) Neural Network (NN)

8 of 37

Predictive Uncertainty

Sources of uncertainty:

    • Data uncertainty (irreducible)
    • Epistemic/Model uncertainty (reducible)


a) Neural Network (NN)

b) Ensembles of 50 NNs

 

c) NN trained using Adaptive scale Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) [Chen et al., 2014; Springenberg et al., 2016]

9 of 37

Predictive Uncertainty

  • Supervised learning OOD methods:
    • Task-specific
    • Require labeled data

  • Unsupervised learning OOD methods:
    • Task-agnostic
    • Only require input data (no labels required)


10 of 37

Digital Pathology

  • Whole Slide Images (WSIs) typically contain billions of pixels.
  • Thousands of samples in the form of patches can be extracted!


Figure: patient tissue samples are mounted on glass slides and scanned into a WSI with healthy tissue.

11 of 37

Digital Pathology

Labels are expensive!

Unsupervised out-of-distribution detection is a promising avenue!


Figure: healthy tissue vs. tumor tissue patches.

12 of 37

OOD detection applications in pathology

  • Anomaly detection
  • Novelty detection


Figure: healthy tissue vs. tumor tissue patches.

13 of 37

Research question


14 of 37

Data


15 of 37

Methods

  • Random prior networks
  • Likelihood-based models (VAEs)
  • Bayesian VAE
  • Ensembles of VAEs
  • Density of States Estimation


16 of 37

Random Prior Networks

  • Evaluated random prior networks [Ciosek et al., 2020]
  • Originally proposed for regression tasks


Figure: on top, two predictors (green) are trained to fit two randomly generated priors (red); on the bottom, uncertainties are obtained from the difference between predictors and priors. Dots correspond to training points x_i.
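A minimal sketch of the mechanism in PyTorch, under illustrative assumptions (toy 1-D regression, small MLPs; not the architecture of Ciosek et al.): a randomly initialized prior network is frozen, a predictor is trained to match it on the training inputs, and the squared prior/predictor mismatch at a test point serves as the uncertainty.

    import torch
    import torch.nn as nn

    def make_net():
        return nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))

    prior = make_net()                      # fixed random prior: never trained
    for p in prior.parameters():
        p.requires_grad_(False)

    predictor = make_net()
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

    x_train = torch.randn(100, 1)           # toy training inputs x_i
    for _ in range(1000):                   # fit the predictor to the prior on x_i
        opt.zero_grad()
        loss = ((predictor(x_train) - prior(x_train)) ** 2).mean()
        loss.backward()
        opt.step()

    x_test = torch.linspace(-5, 5, 200).unsqueeze(1)
    with torch.no_grad():                   # mismatch grows away from training data
        uncertainty = (predictor(x_test) - prior(x_test)) ** 2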

  • I found an error in their paper:
    • their experiments were not validated properly


17 of 37

Random Prior Networks

  • Evaluation


18 of 37

Doesn’t work!


19 of 37

Likelihood-based Deep Generative Model (DGM)



20 of 37

Likelihood-based Deep Generative Model (DGM)

  • If p(x* | model) < threshold, then reject x* [Bishop, 1994]
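A minimal sketch of this rejection rule, assuming a hypothetical per-sample `log_likelihood` function (for a VAE, the ELBO is the usual proxy) and a threshold calibrated on held-out in-distribution data:

    import numpy as np

    def fit_threshold(log_likelihood, x_val, quantile=0.05):
        """Calibrate the threshold so ~5% of in-distribution data is rejected."""
        scores = np.array([log_likelihood(x) for x in x_val])
        return np.quantile(scores, quantile)

    def is_ood(x, log_likelihood, threshold):
        return log_likelihood(x) < threshold  # reject low-likelihood inputs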


 

Figure: a VAE trained on healthy tissue is tested on tumor tissue.

21 of 37

Variational Autoencoders (VAEs)


Samples generated by the VAE resemble the data distribution.
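For concreteness, a minimal fully-connected VAE sketch in PyTorch (layer sizes are illustrative, not the thesis architecture); the per-sample ELBO it returns is the kind of log-likelihood proxy discussed on these slides:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        """Minimal VAE for flattened image patches."""
        def __init__(self, x_dim=784, z_dim=32, h_dim=256):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
            self.mu = nn.Linear(h_dim, z_dim)
            self.logvar = nn.Linear(h_dim, z_dim)
            self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                     nn.Linear(h_dim, x_dim))

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
            return self.dec(z), mu, logvar

    def elbo(recon_logits, x, mu, logvar):
        """Per-sample evidence lower bound on log p(x)."""
        rec = -F.binary_cross_entropy_with_logits(recon_logits, x,
                                                  reduction="none").sum(1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1)
        return rec - kl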

22 of 37

Variational Autoencoders

  • Findings:
    1. VAEs sometimes assign higher log-likelihoods to OoD data than to in-distribution data!


 

* The same phenomenon occurs in autoregressive models and flow-based models [Nalisnick et al., 2018]

23 of 37

Why do VAEs fail at OOD detection?



24 of 37

Epistemic uncertainty in VAEs


Bayesian VAEs (BVAEs) [Daxberger and Hernandez-Lobato, 2019] require Bayesian inference: Stochastic Gradient Hamiltonian Monte Carlo [Chen et al., 2014] instead of SGD
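A sketch of a single SGHMC update in its common discretized form [Chen et al., 2014], ignoring the mini-batch noise estimate; `lr` and `alpha` (friction) are illustrative values:

    import torch

    def sghmc_step(params, velocities, grads, lr=1e-4, alpha=0.01):
        """v <- (1 - alpha) v - lr * grad + N(0, 2 * alpha * lr);  theta <- theta + v"""
        for p, v, g in zip(params, velocities, grads):
            noise = torch.randn_like(p) * (2 * alpha * lr) ** 0.5  # injected noise
            v.mul_(1 - alpha).add_(noise - lr * g)                 # friction + gradient
            p.data.add_(v)                                         # position update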

25 of 37

Why do VAEs fail at OOD detection?

  • VAEs only tell us about the areas of high density (likelihoods).
  • But they do not tell us where the points concentrate (the typical set).

  • Gaussian Annulus Theorem: for a high-dimensional isotropic Gaussian, samples concentrate in a thin annulus of radius √d (see the numeric check below).
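A quick numeric check of this concentration (illustrative, not from the thesis): the average norm of isotropic Gaussian samples tracks √d while its spread stays roughly constant.

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 100, 10_000):
        x = rng.standard_normal((1000, d))      # 1000 samples from N(0, I_d)
        norms = np.linalg.norm(x, axis=1)
        print(f"d={d:>6}: mean |x| = {norms.mean():8.2f}, "
              f"sqrt(d) = {np.sqrt(d):8.2f}, std = {norms.std():.2f}")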


Figure: a 2-dimensional projection of a 100-dimensional isotropic Gaussian; the point with the highest density is at the origin, but samples concentrate in the annulus.

26 of 37

Typicality in VAEs



Figure: the points with the highest density are not where samples concentrate.

27 of 37

Density of States Estimation (DoSE) [Morningstar et al., 2020]



28 of 37

Density of States Estimation (DoSE)

  • Compute statistics T1, …, Tn from the VAE for each sample
  • Estimate how frequently these statistic values occur using a one-class SVM (see the sketch after the table below)


T1         T2         T3         T4         T5
160.1767   361.4301   201.2534   236.4269   -384.114
97.28395   353.8612   256.5773   238.588    -313.327
135.925    356.8224   220.8974   218.1117   -333.544
196.5389   386.6683   190.1294   244.7873   -424.759
237.7352   409.1786   171.4434   262.2122   -467.235
186.4038   394.6253   208.2215   214.5554   -367.506
...        ...        ...        ...        ...
94.06613   346.7294   252.6633   206.3037   -302.648

↓

One-class SVM
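A sketch of this scoring stage with scikit-learn, under stated assumptions: each row holds the per-sample statistics T1–T5 as in the table, and the CSV file names are hypothetical. (Morningstar et al. estimate the density of each statistic with kernel density estimators; the one-class SVM here follows the slide.)

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import OneClassSVM

    T_train = np.loadtxt("stats_train.csv", delimiter=",")  # hypothetical file
    dose = make_pipeline(StandardScaler(),
                         OneClassSVM(nu=0.05, gamma="scale"))
    dose.fit(T_train)                 # model the typical region of the statistics

    T_test = np.loadtxt("stats_test.csv", delimiter=",")    # hypothetical file
    scores = dose.decision_function(T_test)  # low score => atypical => flag as OOD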

 

 

29 of 37

Results


30 of 37

Results in Digital Pathology


31 of 37

Conclusions

  • Unsupervised OoD detection using VAEs

  • VAEs sometimes assign higher log-likelihoods to OoD data

  • The log-likelihood is just one dimension that characterizes the data.

  • Log-likelihoods in DGMs only tell us about high-probability-density regions, but they don't tell us where the samples concentrate!


32 of 37

Conclusions

  • Typical-set-based approaches are promising for this problem.

  • DoSE outperforms all the other approaches, with an AUC score of 0.84 in a real pathology case.

  • How should we choose these statistics?


33 of 37

Conclusions



34 of 37

Future work

  • Hyperparameters:
    • VAE: architecture complexity, image size, latent prior.
    • Bayesian VAE: prior, number of samples, etc.
  • Combine normalizing flows and VAEs
  • Learn the statistics used in DoSE


35 of 37

Take home message

  • Deep learning fails on OOD samples
  • We can use OOD detection methods to make models more robust
  • Be cautious when using likelihood-based DGMs for OOD detection
  • OOD detection is hard because OOD samples can be anything.
  • Using the frequencies of several statistics lets us model in-distribution data better.


36 of 37

Questions?


37 of 37

References

  • [Gal and Ghahramani, 2016] Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050-1059.
  • [Chen et al., 2014] Chen, T., Fox, E. B., and Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning.
  • [Springenberg et al., 2016] Springenberg, J. T., Klein, A., Falkner, S., and Hutter, F. (2016). Bayesian optimization with robust Bayesian neural networks. In Advances in Neural Information Processing Systems, pages 4134-4142.
  • [Ciosek et al., 2020] Ciosek, K., Fortuin, V., Tomioka, R., Hofmann, K., and Turner, R. (2020). Conservative uncertainty estimation by fitting prior networks. In International Conference on Learning Representations.
  • [Nalisnick et al., 2018] Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., and Lakshminarayanan, B. (2018). Do deep generative models know what they don't know?
  • [Bishop, 1994] Bishop, C. M. (1994). Novelty detection and neural network validation. IEE Proceedings - Vision, Image and Signal Processing, 141(4):217-222.
  • [Daxberger and Hernandez-Lobato, 2019] Daxberger, E. and Hernandez-Lobato, J. M. (2019). Bayesian variational autoencoders for unsupervised out-of-distribution detection.
  • [Morningstar et al., 2020] Morningstar, W. R., Ham, C., Gallagher, A. G., Lakshminarayanan, B., Alemi, A. A., and Dillon, J. V. (2020). Density of states estimation for out-of-distribution detection.
