1 of 84

Machine Learning Group

Big Data Summer Institute 2022

Department of Electrical Engineering and Computer Science

Department of Biostatistics

Claire Chu Sara Colando Ricardo Gloria Picazzo Savannah Gonzales Audrey Kim

Amaan Jogia-Sattar Jonathan Lin Dhruba Nandi Rui Nie Xavier Serrano Nguyen Tran-Bach

1

2 of 84

Presentation Outline

  • Background
  • Different eXplainable AI methods
    • Group 1
    • Group 2
    • Group 3
  • Joint conclusion
  • Questions

2


4 of 84

In medicine, diagnoses = important

  • Proper patient care hinges on specific, accurate, and timely diagnostics

  • e.g. different brain tumor grades require different interventions

4

5 of 84

Radiologists!

Radiologists to the rescue!

segmentation →

5

6 of 84

Radiologist’s dilemma: accuracy vs efficiency

Radiologists to the rescue!

segmentation →

6

  • Large workloads
  • Small time frames
  • High stakes


9 of 84

Optimizing radiology with AI

9

10 of 84

Do we trust AI?

10

11 of 84

AI Explainability

  • Demystifying the “Black-Box Problem”
  • eXplainable AI (XAI)

Precision Health → Clinician-Model Synchronicity

11

12 of 84

Drawbacks of Existing XAI Methods

XAI methods exist! So what’s the problem?

  • Huge computational demands
  • Incompatibility
    • Inputs (data)
    • Task (classification vs. segmentation)

12



16 of 84

Drawbacks of Existing XAI Methods

XAI methods exist! So what’s the problem?

  • Huge computational demands
  • Incompatibility
    • Inputs (data)
    • Task (classification vs. segmentation)

[Figure: classification (cow vs. no cow; class labels such as “cow”, “goat”, “bull”) contrasted with segmentation, which identifies where in the image the cow is]

16

Difficulty: XAI methods not prepared for segmentation


18 of 84

AI Model: U-Net

  • State-of-the-art convolutional neural network used for medical image segmentation
  • Layers exploit spatial correlations in the input data to create weighted feature maps
  • Segmentation: identifying the precise location of the entity of interest

Specifically,

  • Trained for brain tumor segmentation

18

7.8 million parameters!

Difficulty: Huge model → hard to train (computational costs)
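For intuition only, here is a minimal Keras sketch of the U-Net idea (encoder, bottleneck, decoder with skip connections), assuming one 144 × 144 slice with the 4 MRI sequences stacked as channels; the layer sizes are illustrative and this is not the group's 7.8-million-parameter model.

```python
# Minimal U-Net-style encoder-decoder sketch (hypothetical sizes, Keras assumed).
import tensorflow as tf
from tensorflow.keras import layers

def tiny_unet(input_shape=(144, 144, 4)):
    inputs = tf.keras.Input(shape=input_shape)       # 4 MRI sequences as channels (assumption)

    # Encoder: convolutions exploit spatial correlations, pooling downsamples.
    c1 = layers.Conv2D(16, 3, padding="same", activation="relu")(inputs)
    p1 = layers.MaxPooling2D()(c1)                   # 72 x 72
    c2 = layers.Conv2D(32, 3, padding="same", activation="relu")(p1)
    p2 = layers.MaxPooling2D()(c2)                   # 36 x 36

    # Bottleneck.
    b = layers.Conv2D(64, 3, padding="same", activation="relu")(p2)

    # Decoder: upsample and concatenate encoder features (skip connections).
    u2 = layers.Conv2DTranspose(32, 2, strides=2, padding="same")(b)
    c3 = layers.Conv2D(32, 3, padding="same", activation="relu")(layers.concatenate([u2, c2]))
    u1 = layers.Conv2DTranspose(16, 2, strides=2, padding="same")(c3)
    c4 = layers.Conv2D(16, 3, padding="same", activation="relu")(layers.concatenate([u1, c1]))

    # One tumor probability per pixel (segmentation, not classification).
    outputs = layers.Conv2D(1, 1, activation="sigmoid")(c4)
    return tf.keras.Model(inputs, outputs)

tiny_unet().summary()   # far fewer parameters than 7.8 million; the shape of the network is the point
```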


23 of 84

The Dataset

Per patient,

  • MRI scans:

Per slice,

  • Ground truth: radiologist-determined segmentations
  • Predictions: AI-generated segmentations

23

[Figure: 4 MRI sequences per patient, each a stack of 144 slices (images) of 144 × 144 pixels, which the U-Net maps to a predicted segmentation]

Difficulty: Using 4 sequences to produce one prediction

GDC Data. National Cancer Institute. https://portal.gdc.cancer.gov/
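To make those shapes concrete, here is a hypothetical NumPy sketch of one patient's data; the array names, the (sequence, slice, height, width) ordering, and the random placeholder values are all assumptions.

```python
import numpy as np

# One patient: 4 MRI sequences (FLAIR, T1, T1Gd, T2), each 144 slices of 144 x 144 pixels.
scan = np.random.rand(4, 144, 144, 144).astype("float32")        # (sequence, slice, height, width)
truth = (np.random.rand(144, 144, 144) > 0.95).astype("uint8")   # per-slice radiologist masks

# One model input: the 4 sequences of a single slice stacked as channels,
# i.e. the "4 sequences -> one prediction" difficulty noted above.
z = 75
x = np.stack([scan[s, z] for s in range(4)], axis=-1)            # (144, 144, 4)
y = truth[z]                                                      # (144, 144) ground-truth mask
print(x.shape, y.shape)
```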

24 of 84

Explainability Attempts

  • Group 1: LIME
  • Group 2: Quantifying Uncertainty
  • Group 3: Embeddings Projector

24

25 of 84

Modifying LIME to Explain Tumor Segmentation Predictions

Amaan Jogia-Sattar, Audrey Kim, Rui Nie

25

GROUP 1

26 of 84

26

LIME:

  • Model-agnostic
  • Individual predictions

[Diagram: expert segmentation vs. U-Net prediction; ideally they agree, otherwise false positives or false negatives arise]

27 of 84

27

Research trajectory

  • Modify and implement LIME
    • Accommodate the data inputs and the segmentation task
    • Which parts of the brain scan, in each MRI sequence, contribute positively to the final tumor segmentation prediction?
  • An exploratory study of the different image segmentation algorithms used within LIME

28 of 84

LIME: What is it?

28

  • Local: approximates model behavior in the vicinity of an individual prediction
  • Interpretable: the local surrogate model is sparse and linear
  • Model-agnostic: treats the original model as a black box
  • Explanation: the surrogate model's weights indicate the approximate importance of features
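To make these four properties concrete, here is a toy sketch of the LIME recipe for a single image, using scikit-image superpixels and a weighted Lasso surrogate; `predict_fn` (the black-box score for an image) and all parameter values are hypothetical, and this simplifies the method rather than reproducing the official lime package.

```python
import numpy as np
from skimage.segmentation import felzenszwalb
from sklearn.linear_model import Lasso

def lime_explain(image, predict_fn, n_samples=200, seed=0):
    """Toy LIME: perturb superpixels, query the black box, fit a sparse linear surrogate."""
    rng = np.random.default_rng(seed)
    segments = felzenszwalb(image, scale=100)         # superpixels = interpretable features
    n_segs = segments.max() + 1

    # Local neighborhood: random on/off masks over superpixels.
    masks = rng.integers(0, 2, size=(n_samples, n_segs))
    baseline = image.mean()                           # "off" superpixels are greyed out

    scores = []
    for m in masks:
        perturbed = image.copy()
        perturbed[~m[segments].astype(bool)] = baseline
        scores.append(predict_fn(perturbed))          # black-box prediction for this perturbation
    scores = np.asarray(scores)

    # Samples closer to the original image get more weight (locality).
    weights = np.exp(-((1.0 - masks.mean(axis=1)) ** 2) / 0.25)

    surrogate = Lasso(alpha=0.01)                     # sparse, linear, interpretable
    surrogate.fit(masks, scores, sample_weight=weights)
    return segments, surrogate.coef_                  # per-superpixel importance
```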

29 of 84

LIME Modifications

29

Traditional LIME:
  • Model input: one grayscale or RGB image at a time
  • Task: classification
  • Explanation target: class labels for each image
  • Explanation format: a mask of 0s and 1s indicating regions of contribution

LIME for U-Net:
  • Model input: 4 MRI sequences (images) for each scan
  • Task: segmentation (simulated with binary classification by pixel)
  • Explanation target: tumor/non-tumor label for each pixel
  • Explanation format: a mask of 0s and 1s indicating regions of contribution

30 of 84

LIME for U-Net

30

  • Sequence Extraction + Perturbation
  • “Black box” Prediction
  • Sparse Linear Surrogate Model
  • Superpixels
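A hedged sketch of how those steps might fit together for the U-Net case: perturb superpixels of one MRI sequence, read the black-box tumor probability at a single target pixel, and fit the sparse surrogate to that scalar. `unet_predict` is a hypothetical wrapper around the trained model, and the channel ordering and parameters are assumptions, not the group's exact implementation.

```python
import numpy as np
from skimage.segmentation import felzenszwalb
from sklearn.linear_model import Lasso

def lime_for_unet_pixel(x, unet_predict, pixel, sequence=0, n_samples=200, seed=0):
    """Explain the tumor probability that the U-Net assigns to one pixel.

    x            : (144, 144, 4) slice with the 4 MRI sequences as channels (assumption).
    unet_predict : callable returning a (144, 144) tumor-probability map (hypothetical wrapper).
    pixel        : (row, col) of the pixel whose prediction we explain.
    sequence     : index of the MRI sequence to perturb (assumed ordering).
    """
    rng = np.random.default_rng(seed)
    segments = felzenszwalb(x[..., sequence], scale=100)    # superpixels on one sequence
    n_segs = segments.max() + 1

    masks = rng.integers(0, 2, size=(n_samples, n_segs))
    baseline = x[..., sequence].mean()

    targets = []
    for m in masks:
        xp = x.copy()
        off = ~m[segments].astype(bool)
        xp[..., sequence][off] = baseline                   # perturb only that sequence
        targets.append(unet_predict(xp)[pixel])             # black-box output at the target pixel
    targets = np.asarray(targets)

    weights = np.exp(-((1.0 - masks.mean(axis=1)) ** 2) / 0.25)
    surrogate = Lasso(alpha=0.01)
    surrogate.fit(masks, targets, sample_weight=weights)
    return segments, surrogate.coef_    # which superpixels push this pixel toward "tumor"
```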


34 of 84

Results: LIME explanations for single pixels

34

Patient case: ‘TCGA-HT-7874’

Brain slice: 75

Segmentation Algorithm: quick shift

Sequences: FLAIR

[Panels: Explanation | Prediction by U-Net | Tumor Label]

35 of 84

Results (cont.): Explanatory Plots

35

[Figure grid: rows = sequences (original), quick shift, Felzenszwalb; columns = FLAIR, T1, T1Gd, T2]
Heatmap masks:

  • Idea: weighted mean of two types of masks: pixels predicted as tumor vs. non-tumor

Mask Boundary plots:

  • Idea: delineate using thresholds (e.g. 0.5) on heatmap masks
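A minimal sketch of how these two plot types could be produced, assuming the per-pixel explanation masks for tumor-predicted and non-tumor-predicted pixels are already available; the slides do not specify the weighting, so pixel counts are used as the weights here.

```python
import numpy as np
from skimage.segmentation import mark_boundaries

def explanation_plots(mask_tumor, mask_nontumor, background, n_tumor, n_nontumor, thresh=0.5):
    """Combine two explanation masks into a heatmap mask and a mask-boundary overlay.

    mask_tumor / mask_nontumor : (H, W) explanation masks for pixels the model predicted
                                 as tumor / non-tumor (assumed to be precomputed).
    background                 : (H, W) MRI slice, used only for display.
    n_tumor / n_nontumor       : counts used as the weights of the weighted mean (assumption).
    """
    # Heatmap mask: weighted mean of the two mask types.
    heatmap = (n_tumor * mask_tumor + n_nontumor * mask_nontumor) / (n_tumor + n_nontumor)

    # Mask-boundary plot: threshold the heatmap (e.g. at 0.5) and outline the region.
    region = (heatmap >= thresh).astype(int)
    overlay = mark_boundaries(background, region)
    return heatmap, overlay
```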

36 of 84

Results (cont.): Metrics of explanations

36

Table: proportion of tumor pixels included in explanations (%)

                FLAIR   T1      T1Gd    T2
  Quick shift   74.9    72.0    89.7    74.9
  Felzenszwalb  26.1    34.1    12.8    22.2
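The metric is not defined in code on the slide; a straightforward reading, sketched below, is the percentage of ground-truth tumor pixels that fall inside the explanation's highlighted region (the function name and exact definition are assumptions).

```python
import numpy as np

def tumor_coverage(explanation_mask, tumor_mask):
    """Percentage of ground-truth tumor pixels included in the explanation.

    explanation_mask : (H, W) boolean mask of pixels the LIME explanation highlights.
    tumor_mask       : (H, W) boolean ground-truth tumor mask.
    """
    tumor = tumor_mask.astype(bool)
    if tumor.sum() == 0:
        return np.nan                      # undefined when the slice has no tumor
    return 100.0 * (explanation_mask.astype(bool) & tumor).sum() / tumor.sum()
```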

37 of 84

Future directions

37

  • Reflect contextual information (e.g. clinical observations) in explanations, rather than relying solely on ground-truth segmentations and tumor vs. non-tumor labels.
  • Develop metrics for assessing explanation quality, and determine whether particular MRI sequences lead to better diagnoses.
  • Attempt global explanations built from a set of local instances.

38 of 84

Thank you!

Amaan Jogia-Sattar amaanjs@umich.edu

Audrey Kim audreyki@umich.edu

Rui Nie ruinie@umich.edu

38

39 of 84

Quantifying Uncertainty in a Tumor Segmentation Model

Claire Chu, Sara Colando, Dhruba Nandi, Xavier Serrano

39

GROUP 2

40 of 84

Transparency as Explainability

Transparency exposes a model’s properties to various stakeholders to better understand, improve, and contest model predictions.

40

Uncertainty Quantification in models communicates to stakeholders:

(a) whether and when they should trust model predictions, and

(b) how fair those predictions are, in both sample-wide and patient-specific cases.

So, Uncertainty is Transparency and Uncertainty is Explainability

41 of 84

How Does Uncertainty Enhance Explainability?

41

Explainable to Clinicians:
  • Allowing physicians to more confidently segment tumors
  • Clarity in review processes leading up to implementation of models in a clinical setting

Explainable to Patients:
  • Encourage trust between clinician and patient
  • Help patients understand strengths and limitations of models without an overload of technical information

Explainable to Model Designers:
  • Help model designers understand weaknesses
  • Collaboration with domain experts can clarify various types of errors and their implications

42 of 84

Central Goal:

Quantify model uncertainty by using a partially Bayesian neural network (pBNN) to communicate where the model is uncertain of its prediction.

Research Questions:

  1. Where is this model failing, and how is it failing to properly segment the tumor?
  2. In what cases is the model certain but still makes a mistake in tumor segmentation?

42


44 of 84

Outline of Methods

44

(Snehal Prabhudesai 2022)

45 of 84

U-Net Architecture

45

Selected Layer

(Snehal Prabhudesai 2022)

46 of 84

Outline of Methods

46

(Snehal Prabhudesai 2022)

47 of 84

Bayesian Inference

Allows us to update the probability of a hypothesis as more data becomes available!

In a neural net:

Using Bayesian inference, the weights are sampled from the posterior distribution learned during training; pushing these samples forward through the network yields a push-forward posterior distribution over the outputs.

47

[Diagram: example of a fully Bayesian neural net with input, hidden layer, and output; the sampled weights induce a push-forward posterior distribution over the output]

48 of 84

48

49 of 84

Why Use a Partially Bayesian Neural Net?

49

Targeted Bayesian inference on a small, strategically chosen layer of the deep neural network, with the rest of the network trained using less expensive deterministic methods.

Promises of using a pBNN:

  • Less computationally expensive than a fully Bayesian neural network.
  • Outputs a predicted value between 0 (no tumor) and 1 (tumor) for each pixel, which serves as a probability for pixel classification.
  • The standard deviation of sampled predictions quantifies model uncertainty, which increases explainability (sketched below).
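A minimal sketch of how the last two promises could be realized, assuming the trained pBNN is wrapped in a callable whose Bayesian layer re-samples its weights from the learned posterior on every forward pass; the wrapper name and sample count are hypothetical.

```python
import numpy as np

def predict_with_uncertainty(pbnn_sample_fn, x, n_samples=50):
    """Monte Carlo summary of a pBNN's per-pixel tumor probabilities.

    pbnn_sample_fn : callable performing one stochastic forward pass, i.e. the Bayesian
                     layer's weights are re-sampled on every call (hypothetical wrapper).
    x              : one input slice, e.g. shape (144, 144, 4).
    """
    draws = np.stack([np.asarray(pbnn_sample_fn(x)) for _ in range(n_samples)])  # (S, H, W)
    prob = draws.mean(axis=0)          # per-pixel probability in [0, 1] (0 = no tumor, 1 = tumor)
    uncertainty = draws.std(axis=0)    # per-pixel standard deviation -> uncertainty map
    segmentation = prob >= 0.5         # hard mask, if one is needed
    return prob, uncertainty, segmentation
```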

50 of 84

50

Tuning the Hyperparameters

Training summary:
  • Epochs: 400
  • Batch size/epoch: 256
  • Parameters: 7.8 million
  • Training time: 11 hours

51 of 84

Outline of Methods

51

52 of 84

52

Inaccurate Prediction but Not Uncertain?

Clustering of False Positive and False Negative?

[Figure: model inputs and outputs for this patient]

Female, age 41

37.13 month survival time

Tissue Source Site: Case Western - St. Joes

Study: Brain Lower Grade Glioma

Histology: oligodendroglioma (G3)


54 of 84

54

Female, age 66,

15.97 month survival time

Tissue Source Site: Duke

Study: Glioblastoma multiforme

Histology: glioblastoma (G3)

High Sensitivity

Higher Uncertainty in Predicted Boundary Regions

55 of 84

Comparing Uncertainty Across Truth Prediction Discrepancy Values

55

More certain for accurate classification.

More certain for false negatives than false positives.

  • Less certain when classifying a pixel as “tumor”.
  • More likely to be falsely confident that a pixel is “non-tumor” than “tumor”.

[Plot: normalized uncertainty distributions (0.0 to 1.0) for false negative, false positive, and accurate pixels]
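A sketch of how per-pixel uncertainties might be grouped by truth-prediction discrepancy to produce a comparison like the one above; normalizing by the maximum uncertainty is an assumption, since the slides do not state how the distributions were normalized.

```python
import numpy as np

def uncertainty_by_discrepancy(pred_mask, truth_mask, uncertainty):
    """Group per-pixel uncertainties by truth-prediction discrepancy.

    pred_mask, truth_mask : (H, W) boolean tumor masks (model prediction / radiologist truth).
    uncertainty           : (H, W) per-pixel uncertainty (e.g. Monte Carlo standard deviation).
    """
    pred, truth = pred_mask.astype(bool), truth_mask.astype(bool)
    u = uncertainty / uncertainty.max() if uncertainty.max() > 0 else uncertainty  # normalize (assumption)
    groups = {
        "false_positive": u[pred & ~truth],
        "false_negative": u[~pred & truth],
        "accurate": u[pred == truth],
    }
    means = {k: (v.mean() if v.size else np.nan) for k, v in groups.items()}
    return means, groups
```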

56 of 84

Sample-wide Certainty ≠ Individual Level Certainty

56

[Plots: normalized uncertainty distributions (0.0 to 1.0) for false positive, false negative, and accurate pixels, shown sample-wide and for two individual patients]

Male, age 67, 7.69 month survival time

Tissue Source Site: Thomas Jefferson University

Study: Brain Lower Grade Glioma

Histology: Astrocytoma (G3)

Female, age 70, 5.32 month survival time

Tissue Source Site: Case Western St. Joes

Study: Brain Lower Grade Glioma

Histology: Astrocytoma (G3)

57 of 84

These patients’ clinical information is highly similar

57

…But the Normalized Uncertainty Distributions Vary

Especially in False Positive and Accurate Discrepancies

58 of 84

Future Work

Investigating the implications of the different kinds of model failure for clinical outcomes, and which kinds of failure clinicians consider most dangerous.

58

Collaborating with clinicians to better understand why the model fails in specific brain regions, and why false positive and false negative results tend to cluster.

Comparing model performance and uncertainty levels across various subsets (e.g. different tumor histologies, tissue source sites, patient sex, vital status, etc.).

59 of 84

References

59

Bhatt, Umang, Javier Antorán, Yunfeng Zhang, Q. Vera Liao, Prasanna Sattigeri, Riccardo Fogliato, Gabrielle Melançon, et al. 2021. “Uncertainty as a Form of Transparency: Measuring, Communicating, and Using Uncertainty.” In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 401–13. AIES ’21. New York, NY, USA: Association for Computing Machinery. https://doi.org/10.1145/3461702.3462571.

Prabhudesai, Snehal, Nicholas Wang, Vinayak Ahluwalia, Xun Huan, Jayapalli Bapuraj, Nikola Banovic, and Arvind Rao. 2021. “Stratification by Tumor Grade Groups in a Holistic Evaluation of Machine Learning for Brain Tumor Segmentation.” Frontiers in Neuroscience 15 (October). https://doi.org/10.3389/fnins.2021.740353.

Prabhudesai, Snehal, Dingkun Guo, and Jeremiah Hauth. 2022. “Partially Bayesian Neural Networks: Low-Cost Bayesian Uncertainty Quantification for Deep Learning in Medical Image Segmentation.”

60 of 84

Thank you!

60

61 of 84

Visualization Using Embedding Projector

Jonathan Lin, Nguyen Tran-Bach, Ricardo Gloria-Picazzo, Savannah Gonzales

GROUP 3

62 of 84

Overview

Goal: Use TensorBoard to visualize and explain certain aspects of the machine learning model

  • While Uncertainty Maps and LIME are grounded in explainable AI, TensorBoard remains largely experimental, and has not been used extensively in the field.
  • Very little has been done to apply TensorBoard to explain tumor segmentation models.

After applying TensorBoard, we hope to obtain an intuitive visualization that sheds light on why the machine learning model behaves the way it does.

63 of 84

Embedding Projector

  • TensorFlow is an end-to-end open source platform for machine learning.
  • TensorBoard is a visualization toolkit provided by TensorFlow which includes an embedding projector tool.
  • Embedding: a technique that takes high-dimensional input data and plots them in a 2D or 3D space preserving some geometric structure.
  • Dimensionality reduction is extremely useful to visualize data especially when the dimension of the data is too high.

63


Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, and Martin Wattenberg. Embedding projector: Interactive visualization and interpretation of embeddings, 2016.

64 of 84

GIF taken from Google

65 of 84

Embeddings shown in TensorBoard

  • PCA (Principal Component Analysis)

finds a linear subspace onto which the projected data points retain the highest empirical variance.

  • t-SNE (t-distributed Stochastic Neighbor Embedding)

creates a probability distribution by determining similarities in the data, and tries to minimize the KL divergence between the high-dimensional distribution and the low-dimensional one; works well for clustering.

  • UMAP (Uniform Manifold Approximation and Projection)

similar to t-SNE, but with additional mathematical assumptions.
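For intuition, the same three reductions can be run offline with scikit-learn and the umap-learn package (assumed installed); random vectors stand in for flattened masks or layer outputs, and the parameter choices are illustrative.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.manifold import TSNE
import umap   # the umap-learn package (assumed installed)

# One row per item to visualize; random placeholders stand in for flattened 144 x 144 images.
X = np.random.rand(117, 144 * 144).astype("float32")

pca_2d = PCA(n_components=2).fit_transform(X)                              # highest-variance projection
tsne_2d = TSNE(n_components=2, perplexity=30, init="pca").fit_transform(X) # neighbor-preserving, good for clusters
umap_2d = umap.UMAP(n_components=2).fit_transform(X)                       # t-SNE-like, with manifold assumptions

print(pca_2d.shape, tsne_2d.shape, umap_2d.shape)                          # (117, 2) each
```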

65


Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, and Martin Wattenberg. Embedding projector: Interactive visualization and interpretation of embeddings, 2016.

66 of 84

Ground Truth vs. Prediction

  • Aim to explain how accurate the model is.

  • 144 points representing the ground truth; 144 points representing the predictions; all of these are for a single patient.

66

UMAP of ground truth and prediction images
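A sketch of how such points might be logged for the embedding projector, following the standard TensorFlow projector recipe; the log directory, labels, and random placeholder vectors are assumptions, and in practice each row of `features` would be one flattened ground-truth or predicted mask.

```python
import os
import numpy as np
import tensorflow as tf
from tensorboard.plugins import projector

log_dir = "logs/embeddings"            # hypothetical location
os.makedirs(log_dir, exist_ok=True)

# 144 ground-truth + 144 predicted masks for one patient, flattened (placeholders here).
features = np.random.rand(288, 144 * 144).astype("float32")
labels = ["truth"] * 144 + ["prediction"] * 144

# Metadata file lets the projector color points by label.
with open(os.path.join(log_dir, "metadata.tsv"), "w") as f:
    f.write("\n".join(labels))

# Save the features as a checkpointed variable and point the projector at it.
weights = tf.Variable(features)
ckpt = tf.train.Checkpoint(embedding=weights)
ckpt.save(os.path.join(log_dir, "embedding.ckpt"))

config = projector.ProjectorConfig()
emb = config.embeddings.add()
emb.tensor_name = "embedding/.ATTRIBUTES/VARIABLE_VALUE"
emb.metadata_path = "metadata.tsv"
projector.visualize_embeddings(log_dir, config)
# Then run:  tensorboard --logdir logs/embeddings  and open the Projector tab,
# where PCA, t-SNE and UMAP views are available interactively.
```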

67 of 84

67

PCA of ground truth vs. prediction:

68 of 84

68

t-SNE of ground truth vs. prediction:

69 of 84

Output of First Layer

  • Aims to explain what the first layer does to the input.

  • 39 patients; from each patient we selected the 3 middle z-slices.

  • Each of the 39 * 3 = 117 points represents the output values of the first layer.

69

UMAP output of first layer
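A hypothetical Keras sketch of how those points could be produced: run inputs through only the first convolutional layer and flatten its output into one row per slice. The stand-in model below is a placeholder for the trained U-Net, so treat this as a pattern rather than the group's actual pipeline.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the trained U-Net; only the first Conv2D layer matters for this sketch.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(144, 144, 4)),
    tf.keras.layers.Conv2D(16, 3, padding="same", activation="relu", name="first_conv"),
    tf.keras.layers.Conv2D(1, 1, activation="sigmoid"),
])

def first_layer_features(model, slices):
    """One flattened row of first-layer activations per input slice."""
    extractor = tf.keras.Model(inputs=model.inputs, outputs=model.get_layer("first_conv").output)
    feats = extractor.predict(np.asarray(slices), verbose=0)     # (n, 144, 144, 16)
    return feats.reshape(len(slices), -1)

# e.g. 39 patients x 3 middle slices = 117 inputs (random placeholders here).
X = first_layer_features(model, np.random.rand(117, 144, 144, 4).astype("float32"))
print(X.shape)   # rows to feed into PCA / t-SNE / UMAP or the embedding projector
```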

70 of 84

First layer (Conv2D) and filters

70

71 of 84

71

PCA of outputs of first layer:

72 of 84

72

UMAP of outputs of first layer:

73 of 84

73

t-SNE of outputs of first layer:

74 of 84

Challenges

  • Computational intensiveness of certain techniques implemented by TensorBoard.
  • The visualization that TensorBoard provides, while capable of clustering data points, does not actually explain why those points are clustered the way they are.
  • Technical difficulties running TensorBoard on Armis2.
  • TensorBoard generally is used for classification models.
  • The layers may be too complex for the visualization to yield good results.

74

75 of 84

Future Developments

  • Apply TensorBoard and other dimensionality reduction techniques to subsequent layers in the model
  • Extend our use of dimensionality reduction for explaining model accuracy to all patients simultaneously, rather than only a single patient at a time.
  • Multidimensional scaling (MDS): maps the points into a lower dimension while trying to minimize a loss over pairwise distances. It is significantly faster than other methods when the number of data points is much smaller than the dimension of the data (see the sketch below).
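A small sketch of that MDS idea with scikit-learn, assuming a precomputed pairwise-distance matrix; the point count mirrors the next slide (n = 282) but the dimension is kept small here so the snippet runs quickly.

```python
import numpy as np
from sklearn.manifold import MDS
from sklearn.metrics import pairwise_distances

# n points, each of dimension d (random placeholders; d kept small here for speed).
X = np.random.rand(282, 2000).astype("float64")
D = pairwise_distances(X)                      # (282, 282): all that MDS actually needs
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
emb = mds.fit_transform(D)                     # (282, 2) layout approximating pairwise distances
print(emb.shape)
```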

75

76 of 84

MDS (n = 282; d = 11,943,936)

76

77 of 84

Thank you!

Jonathan Lin jlin900@berkeley.edu

Nguyen Tran-Bach tactb@mit.edu

Ricardo Gloria Picazzo ricardo.gloria@cimat.mx

Savannah Gonzales srgonzal@umich.edu

77

78 of 84

Takeaways

Three XAI frameworks:

  • LIME aims to explain local (instance-specific) decisions.
  • Uncertainty Quantification aims to pinpoint where and how the model’s predictions fail.
  • TensorBoard aims to help visualize how accurate the model is and what it learns in each layer.

78

[Diagram: us → AI explainability → broader adoption in healthcare and beyond]

79 of 84

References - Group 3

  1. Janet Bastiman. Explainability in AI: why you need it. Napier, 2021.
  2. Ahmed Hosny, Chintan Parmar, John Quackenbush, Lawrence H. Schwartz, and Hugo J. W. L. Aerts. Artificial intelligence in radiology. Nature Reviews Cancer, 18(8):500–510, 2018.
  3. Leland McInnes, John Healy, and James Melville. UMAP: Uniform manifold approximation and projection for dimension reduction, 2018.
  4. Long Nguyen. Multivariate and categorical data analysis (UMich STAT 601), fall 2016.
  5. Keiron O’Shea and Ryan Nash. An introduction to convolutional neural networks, 2015.
  6. Snehal Prabhudesai, Nicholas Chandler Wang, Vinayak Ahluwalia, Xun Huan, Jayapalli Rajiv Bapuraj, Nikola Banovic, and Arvind Rao. Stratification by tumor grade groups in a holistic evaluation of machine learning for brain tumor segmentation. Frontiers in Neuroscience, 15, 2021.
  7. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. Medical Image Computing and Computer-Assisted Intervention – MICCAI 2015, 2015.
  8. Daniel Smilkov, Nikhil Thorat, Charles Nicholson, Emily Reif, Fernanda B. Viégas, and Martin Wattenberg. Embedding projector: Interactive visualization and interpretation of embeddings, 2016.
  9. Jie Tian, Di Dong, Zhenyu Liu, and Jingwei Wei. Chapter 1 - introduction. In Jie Tian, Di Dong, Zhenyu Liu, and Jingwei Wei, editors, Radiomics and Its Clinical Application, The Elsevier and MICCAI Society Book Series, pages 1–18. Academic Press, 2021.

79

80 of 84

Acknowledgements

We would like to thank

  • Dr. Nikola Banovic and Snehal Prabhudesai for proposing this project and fearlessly and patiently mentoring us.
  • Dan Barker for his technical expertise.
  • The BDSI Coordinators – Sabrina Olsson, Hanna Venera, and Jenna Bedrava – for organizing and making our research possible.
  • Dr. Bhramar Mukherjee for everything: this opportunity, bringing us together, and more.

80


82 of 84

Neural networks: machine learning models that mirror the way human brains process information

  • Input layer: receive inputs of high-dimensional data, which are split up and mapped to a hidden layer
  • Hidden layer: determines how each input will improve or worsen the final output, using what it learned from the previous layer
  • Purpose is to learn from inputs in order to optimize outputs

82

Keiron O’Shea and Ryan Nash. An introduction to convolutional neural networks, 2015.

83 of 84

Convolutional neural networks (CNNs): a type of neural network primarily used for pattern recognition and image classification

  • Hidden layers create a vector that tells us what parts of the image have identifiable features
  • Parts of image with easily identifiable features → larger weights; Parts of image with harder to identify features → smaller weights
  • Weights represent the effect that each pixel has on the final output image (see the sketch below)
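A minimal illustration of that idea, assuming TensorFlow/Keras: one convolutional layer turns a slice into a stack of weighted feature maps, and the layer's kernels are the weights being learned (toy sizes, unrelated to the project's U-Net).

```python
import numpy as np
import tensorflow as tf

image = np.random.rand(1, 144, 144, 1).astype("float32")      # one grayscale slice (batch of 1)
conv = tf.keras.layers.Conv2D(filters=8, kernel_size=3, padding="same", activation="relu")

feature_maps = conv(image)                 # (1, 144, 144, 8): one map per learned filter
kernels, biases = conv.get_weights()       # kernels: (3, 3, 1, 8), the layer's learned weights
print(feature_maps.shape, kernels.shape)
```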

83

Keiron O’Shea and Ryan Nash. An introduction to convolutional neural networks, 2015.
