1 of 37

Unsupervised out-of-distribution detection in digital pathology

Gabriel Raya

30th November 2020

Thesis for the degree of Master of Science in Computing Science – Data Science specialization

2 of 37

Twan van Laarhoven

Assistant professor - Data Science

Jasper Linmans

PhD Candidate - Uncertainty estimation

3 of 37

Deep Learning Success

  • Deep Learning is everywhere!


4 of 37

Example: Handwritten Digit Recognition System



Figure: a model trained on MNIST (in-distribution) is shown a smiley face (out-of-distribution, OoD).

5 of 37

Deep learning fails on OOD inputs

  • This failure is known as the out-of-distribution problem.


6 of 37

Pipeline: Protecting predictive models


Figure: an input first passes through an OOD detector; only then does the deep learning model trained on MNIST predict, e.g., “4”.

Out-of-distribution detection allows us to measure how a model generalizes under domain shift, detecting whether the model knows what it doesn't know!
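A minimal sketch of this guard in Python (the names `classifier`, `ood_score`, and `threshold` are hypothetical placeholders, not from the thesis):

    def predict_with_guard(x, classifier, ood_score, threshold):
        """Screen the input with an OOD detector before predicting."""
        if ood_score(x) > threshold:
            return None  # abstain: the input looks out-of-distribution
        return classifier(x)  # in-distribution: predict, e.g. "4"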

7 of 37

Why does deep learning fail on OOD inputs?



a) Neural Network (NN)

8 of 37

Predictive Uncertainty

Sources of uncertainty:

    • Data uncertainty (irreducible)
    • Epistemic/Model uncertainty (reducible)


a) Neural Network (NN)

b) Ensembles of 50 NNs

 

c) NN trained using Adaptive scale Stochastic Gradient Hamiltonian Monte Carlo (SGHMC) [Chen et al., 2014; Springenberg et al., 2016]

9 of 37

Predictive Uncertainty

  • Supervised learning OOD methods:
    • Task-specific
    • Require labeled data

  • Unsupervised learning OOD methods:
    • Task-agnostic
    • Only require input data (no labels required)


10 of 37

Digital Pathology

  • Whole Slide Images (WSIs) typically contain billions of pixels.
  • Thousands of samples in the form of patches can be extracted!


Figure: patient tissue samples are mounted on glass slides and scanned into a WSI with healthy tissue.

11 of 37

Digital Pathology

Labels are expensive!

Unsupervised out-of-distribution detection is a promising avenue!


Figure: healthy tissue vs. tumor tissue patches.

12 of 37

OOD detection applications in pathology

  • Anomaly detection
  • Novelty detection


Figure: healthy tissue vs. tumor tissue patches.

13 of 37

Research question


14 of 37

Data


15 of 37

Methods

  • Random prior networks
  • Likelihood-based models (VAEs)
  • Bayesian VAE
  • Ensembles of VAEs
  • Density of States Estimation


16 of 37

Random Prior Networks

  • Evaluated random prior networks [Ciosek et al., 2020]
  • Originally proposed for regression tasks


Figure: on top, two predictors (green) are trained to fit two randomly generated priors (red); on the bottom, uncertainties are obtained from the difference between predictors and priors. Dots correspond to training points x_i.
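A minimal sketch of the mechanism in PyTorch, under illustrative assumptions (toy 1-D regression, small MLPs; not the architecture of Ciosek et al.): a randomly initialized prior network is frozen, a predictor is trained to match it on the training inputs, and the squared prior/predictor mismatch at a test point serves as the uncertainty.

    import torch
    import torch.nn as nn

    def make_net():
        return nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 1))

    prior = make_net()                      # fixed random prior: never trained
    for p in prior.parameters():
        p.requires_grad_(False)

    predictor = make_net()
    opt = torch.optim.Adam(predictor.parameters(), lr=1e-3)

    x_train = torch.randn(100, 1)           # toy training inputs x_i
    for _ in range(1000):                   # fit the predictor to the prior on x_i
        opt.zero_grad()
        loss = ((predictor(x_train) - prior(x_train)) ** 2).mean()
        loss.backward()
        opt.step()

    x_test = torch.linspace(-5, 5, 200).unsqueeze(1)
    with torch.no_grad():                   # mismatch grows away from training data
        uncertainty = (predictor(x_test) - prior(x_test)) ** 2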

  • I found an error in their paper:
    • their experiments were not validated properly


17 of 37

Random Prior Networks

  • Evaluation


18 of 37

Doesn’t work!


19 of 37

Likelihood-based Deep Generative Model (DGM)



20 of 37

Likelihood-based Deep Generative Model (DGM)

  • If p(x* | model) < threshold, then reject x* [Bishop, 1994]
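A minimal sketch of this rejection rule, assuming a hypothetical per-sample `log_likelihood` function (for a VAE, the ELBO is the usual proxy) and a threshold calibrated on held-out in-distribution data:

    import numpy as np

    def fit_threshold(log_likelihood, x_val, quantile=0.05):
        """Calibrate the threshold so ~5% of in-distribution data is rejected."""
        scores = np.array([log_likelihood(x) for x in x_val])
        return np.quantile(scores, quantile)

    def is_ood(x, log_likelihood, threshold):
        return log_likelihood(x) < threshold  # reject low-likelihood inputs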


 

Figure: a VAE trained on healthy tissue is tested on tumor tissue.

21 of 37

Variational Autoencoders (VAEs)


Samples generated by the VAE resemble the data distribution.
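For concreteness, a minimal fully-connected VAE sketch in PyTorch (layer sizes are illustrative, not the thesis architecture); the per-sample ELBO it returns is the kind of log-likelihood proxy discussed on these slides:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class VAE(nn.Module):
        """Minimal VAE for flattened image patches."""
        def __init__(self, x_dim=784, z_dim=32, h_dim=256):
            super().__init__()
            self.enc = nn.Sequential(nn.Linear(x_dim, h_dim), nn.ReLU())
            self.mu = nn.Linear(h_dim, z_dim)
            self.logvar = nn.Linear(h_dim, z_dim)
            self.dec = nn.Sequential(nn.Linear(z_dim, h_dim), nn.ReLU(),
                                     nn.Linear(h_dim, x_dim))

        def forward(self, x):
            h = self.enc(x)
            mu, logvar = self.mu(h), self.logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization
            return self.dec(z), mu, logvar

    def elbo(recon_logits, x, mu, logvar):
        """Per-sample evidence lower bound on log p(x)."""
        rec = -F.binary_cross_entropy_with_logits(recon_logits, x,
                                                  reduction="none").sum(1)
        kl = -0.5 * (1 + logvar - mu.pow(2) - logvar.exp()).sum(1)
        return rec - kl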

22 of 37

Variational Autoencoders

  • Findings:
    1. VAEs sometimes assign higher log-likelihoods to OoD data than to in-distribution data!


 

* The same phenomenon occurs in autoregressive models and flow-based models [Nalisnick et al., 2018]

23 of 37

Why do VAEs fail at OOD detection?



24 of 37

Epistemic uncertainty in VAEs


Bayesian VAEs (BVAEs) [Daxberger and Hernandez-Lobato, 2019] require Bayesian inference: Stochastic Gradient Hamiltonian Monte Carlo [Chen et al., 2014] instead of SGD
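A sketch of a single SGHMC update in its common discretized form [Chen et al., 2014], ignoring the mini-batch noise estimate; `lr` and `alpha` (friction) are illustrative values:

    import torch

    def sghmc_step(params, velocities, grads, lr=1e-4, alpha=0.01):
        """v <- (1 - alpha) v - lr * grad + N(0, 2 * alpha * lr);  theta <- theta + v"""
        for p, v, g in zip(params, velocities, grads):
            noise = torch.randn_like(p) * (2 * alpha * lr) ** 0.5  # injected noise
            v.mul_(1 - alpha).add_(noise - lr * g)                 # friction + gradient
            p.data.add_(v)                                         # position update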

25 of 37

Why do VAEs fail at OOD detection?

  • VAEs only tell us about the areas of high density (likelihoods).
  • But they do not tell us where the points concentrate (the typical set).

  • Gaussian Annulus Theorem: for a high-dimensional isotropic Gaussian, samples concentrate in a thin annulus of radius √d (see the numeric check below).
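A quick numeric check of this concentration (illustrative, not from the thesis): the average norm of isotropic Gaussian samples tracks √d while its spread stays roughly constant.

    import numpy as np

    rng = np.random.default_rng(0)
    for d in (2, 100, 10_000):
        x = rng.standard_normal((1000, d))      # 1000 samples from N(0, I_d)
        norms = np.linalg.norm(x, axis=1)
        print(f"d={d:>6}: mean |x| = {norms.mean():8.2f}, "
              f"sqrt(d) = {np.sqrt(d):8.2f}, std = {norms.std():.2f}")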


Figure: a 2-dimensional projection of a 100-dimensional isotropic Gaussian; the point with the highest density is at the origin, but samples concentrate in the annulus.

26 of 37

Typicality in VAEs



Figure: the points with the highest density are not where samples concentrate.

27 of 37

Density of States Estimation (DoSE) [Morningstar et al., 2020]



28 of 37

Density of States Estimation (DoSE)

  • Compute statistics T1, …, Tn from the VAE for each sample
  • Estimate how frequently these statistic values occur using a one-class SVM (see the sketch after the table below)


T1         T2         T3         T4         T5
160.1767   361.4301   201.2534   236.4269   -384.114
97.28395   353.8612   256.5773   238.588    -313.327
135.925    356.8224   220.8974   218.1117   -333.544
196.5389   386.6683   190.1294   244.7873   -424.759
237.7352   409.1786   171.4434   262.2122   -467.235
186.4038   394.6253   208.2215   214.5554   -367.506
...        ...        ...        ...        ...
94.06613   346.7294   252.6633   206.3037   -302.648

↓

One-class SVM
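A sketch of this scoring stage with scikit-learn, under stated assumptions: each row holds the per-sample statistics T1–T5 as in the table, and the CSV file names are hypothetical. (Morningstar et al. estimate the density of each statistic with kernel density estimators; the one-class SVM here follows the slide.)

    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import OneClassSVM

    T_train = np.loadtxt("stats_train.csv", delimiter=",")  # hypothetical file
    dose = make_pipeline(StandardScaler(),
                         OneClassSVM(nu=0.05, gamma="scale"))
    dose.fit(T_train)                 # model the typical region of the statistics

    T_test = np.loadtxt("stats_test.csv", delimiter=",")    # hypothetical file
    scores = dose.decision_function(T_test)  # low score => atypical => flag as OOD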

 

 

29 of 37

Results


30 of 37

Results in Digital Pathology


31 of 37

Conclusions

  • Unsupervised OoD detection using VAEs

  • VAEs sometimes assign higher log-likelihoods to OoD data

  • The log-likelihood is just one dimension that characterizes the data.

  • Log-likelihoods in DGMs only tell us about high-probability-density regions, but they don't tell us where the samples concentrate!


32 of 37

Conclusions

  • Typical-set-based approaches are promising for this problem.

  • DoSE outperforms all the other approaches, with an AUC score of 0.84 in a real pathology case.

  • How should we choose these statistics?


33 of 37

Conclusions



34 of 37

Future work

  • Hyperparameters:
    • VAE: architecture complexity, image size, latent prior.
    • Bayesian VAE: prior, number of samples, etc.
  • Combine normalizing flows and VAEs
  • Learn the statistics used in DoSE


35 of 37

Take home message

  • Deep learning fails on OOD samples
  • We can use OOD detection methods to make models more robust
  • Be cautious when using likelihood-based DGMs for OOD detection
  • OOD detection is hard because OOD samples can be anything.
  • Using the frequencies of several statistics lets us model in-distribution data better.


36 of 37

Questions?


37 of 37

References

  • [Gal and Ghahramani, 2016] Gal, Y. and Ghahramani, Z. (2016). Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. In International Conference on Machine Learning, pages 1050-1059.
  • [Chen et al., 2014] Chen, T., Fox, E. B., and Guestrin, C. (2014). Stochastic gradient Hamiltonian Monte Carlo. In International Conference on Machine Learning.
  • [Springenberg et al., 2016] Springenberg, J. T., Klein, A., Falkner, S., and Hutter, F. (2016). Bayesian optimization with robust Bayesian neural networks. In Advances in Neural Information Processing Systems, pages 4134-4142.
  • [Ciosek et al., 2020] Ciosek, K., Fortuin, V., Tomioka, R., Hofmann, K., and Turner, R. (2020). Conservative uncertainty estimation by fitting prior networks. In International Conference on Learning Representations.
  • [Nalisnick et al., 2018] Nalisnick, E., Matsukawa, A., Teh, Y. W., Gorur, D., and Lakshminarayanan, B. (2018). Do deep generative models know what they don't know?
  • [Bishop, 1994] Bishop, C. M. (1994). Novelty detection and neural network validation. IEE Proceedings - Vision, Image and Signal Processing, 141(4):217-222.
  • [Daxberger and Hernandez-Lobato, 2019] Daxberger, E. and Hernandez-Lobato, J. M. (2019). Bayesian variational autoencoders for unsupervised out-of-distribution detection.
  • [Morningstar et al., 2020] Morningstar, W. R., Ham, C., Gallagher, A. G., Lakshminarayanan, B., Alemi, A. A., and Dillon, J. V. (2020). Density of states estimation for out-of-distribution detection.
