Medical Imaging with Deep Learning Overview
Popular image problems:
Multi-modality/view
Segmentation
Counting
Incorrect feature attribution
Slides by Joseph Paul Cohen 2020
Email: joseph@josephpcohen.com
License: Creative Commons Attribution-Sharealike
Chapter 1
Radiology and multi-view
Common X-ray projections
PA = PosteroAnterior (the most common projection)
Image: [Bustos, “PadChest: A Large Chest x-Ray Image Dataset with Multi-Label Annotated Reports.” 2019]
Chest X-ray14 Dataset
Ronald Summers, NIH Clinical Center
Released in 2017: the first large-scale chest X-ray dataset.
>100k frontal images released without copyright restrictions.
Enabled the deep learning radiology revolution.
Stanford Pneumonia study (CheXNet)
https://stanfordmlgroup.github.io/projects/chexnet/
In 2017, Pranav Rajpurkar and Jeremy Irvin trained a DenseNet-121 on the NIH data, scaled to 224x224 pixels.
It set a benchmark performance that has not been significantly improved upon since.
They evaluated pneumonia predictions against 4 radiologists:
"We find that the model exceeds the average radiologist performance on the pneumonia detection task."
Criticism of the Chest X-ray14 Dataset
https://lukeoakdenrayner.wordpress.com/2017/12/18/the-chestxray14-dataset-problems/
In 2017, Luke Oakden-Rayner published a blog post discussing issues with the labels in the NIH data. This led to more work on automatic label extraction.
[Figure: in a sample of images, those marked in red are said to be wrongly labelled.]
2019: the year of X-ray data

PadChest
160k images; multiple views
Almost 200 labels
27% hand labelled, the rest extracted with an RNN
License: Creative Commons Attribution-ShareAlike

CheXpert
224k images; PA and L views
13 labels
Automated rule-based labeler
License: non-commercial research purposes only

MIMIC-CXR
377k images; PA and L views
13 labels
Automated rule-based labelers: both the NIH (NegBio) and CheXpert labelers were run
License: non-commercial research purposes only; confidentiality training required
Multi-modal/view inference (X-ray use case)
[Figure: saliency maps for flattened diaphragm and pleural effusion on PA and lateral views; the maps come from models trained on single views.]
These two tasks perform better when using lateral views.
[Bertrand, 2019]
Also: Multi-modal/view inference (MRI use case)
Modalities: T1, T2, T1C, FLAIR
Tasks: ischemic stroke lesion segmentation, stroke perfusion estimation, brain tumor segmentation
Image credit: Mohammad Havaei
Challenge: missing modalities/views
[Figure: three patients, each with a different incomplete set of modalities; the model expects the full set of inputs but is given only a subset.]
Integrating multiple views
Four options (a sketch of the third follows below):
1. Combine the images right at the input.
2. Take the mean of activations in the middle of the network.
3. Concatenate the output features of two models into a single prediction.
4. Three losses: a network for each modality, with losses that regularize each network.
Image: [Hashir, Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays, 2020]
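A minimal sketch of the third option (concatenating the output features of two single-view encoders), assuming PyTorch/torchvision; the encoder choice and names are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoViewFusion(nn.Module):
    """Late fusion: one encoder per view, features concatenated
    before a single prediction head."""
    def __init__(self, num_classes):
        super().__init__()
        # One DenseNet-121 trunk per view; weights are not shared.
        self.pa_encoder = models.densenet121(weights=None).features
        self.lat_encoder = models.densenet121(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1024 * 2, num_classes)

    def forward(self, pa, lateral):
        f_pa = self.pool(self.pa_encoder(pa)).flatten(1)         # (B, 1024)
        f_lat = self.pool(self.lat_encoder(lateral)).flatten(1)  # (B, 1024)
        return self.classifier(torch.cat([f_pa, f_lat], dim=1))
```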
Integrating multiple views (X-ray images)
All models perform about equally well given the right hyperparameters; however, hyperparameter tuning is easier for some models than for others.
Image: [Hashir, Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays, 2020]
Chapter 2
Histology and segmentation
CAMELYON17: a large, high-resolution open histology dataset for cancer detection
Peter Bandi, et al. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE-TMI 2018
CAMELYON17 Dataset
1000 whole-slide images (WSIs) of sentinel lymph nodes (~3GB each!).
5 medical centers, 40 patients from each center, 5 whole-slide images per patient.
Patch-wise segmentation. Use case: Invasive Ductal Carcinoma (the most common subtype of breast cancer)
Start with a full slide image of breast tissue; the whole image is labelled as IDC or not.
The image is chopped into patches, and each patch is labelled as IDC or not.
Pipeline: extract a patch, run it through a CNN, and classify the patch's center pixel (e.g. "cancer").
Slide design: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Class imbalance is an issue; patch-wise training allows easy balancing of classes using standard methods (see the sampling sketch below).
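For example, a sketch of balanced patch sampling with PyTorch's WeightedRandomSampler; `patch_labels` and `patch_dataset` are illustrative names:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# One 0/1 label per extracted patch (toy example).
patch_labels = torch.tensor([0, 0, 0, 0, 1, 0, 1, 0])
class_counts = torch.bincount(patch_labels)

# Weight each patch by the inverse frequency of its class so the
# sampler draws positive and negative patches roughly 50/50.
weights = 1.0 / class_counts[patch_labels].float()
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# loader = DataLoader(patch_dataset, batch_size=32, sampler=sampler)
```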
Fully convolutional processing (FCN)
[Figure: a two-layer convolutional network, kernel size 3 followed by kernel size 2. An input of size 4 gives an output of size 1; an input of size 5 gives an output of size 2.]
What is this model's receptive field? (4)
How many multiplications were saved by sharing the overlapping computation? (4 nodes)
How many are saved for an input size of 6? (8)
This allows for very fast inference. However, training this way requires a lot of memory, since the intermediate activations for the whole image must be kept for the backward pass. Patch-wise training together with FCN inference is a good balance, as sketched below.
[Figure: an input image and dense output maps for class 0 and class 1; a pixel is assigned class 1 where class 1 > class 0.]
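A tiny runnable illustration of the idea in 1D (kernel sizes 3 and 2 as in the example above); channel counts are arbitrary:

```python
import torch
import torch.nn as nn

# A "patch classifier" that sees patches of size 4 is already fully
# convolutional: on a larger input, one forward pass scores every
# location, reusing the overlapping layer-1 computation.
patch_net = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=3),   # layer 1, kernel size 3
    nn.ReLU(),
    nn.Conv1d(8, 2, kernel_size=2),   # layer 2, kernel size 2 -> 2 class scores
)

x = torch.randn(1, 1, 5)         # input size 5
scores = patch_net(x)            # shape (1, 2, 2): two output positions
pred = scores.argmax(dim=1)      # predicted class per spatial position
print(scores.shape, pred)
```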
Recap: Segmentation using a bottleneck
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
Slide design: Fei-Fei Li & Andrej Karpathy & Justin Johnson
A normal VGG encoder is followed by an “upside down” VGG decoder.
Upsampling is possible with unpooling and transposed (“deconvolution”) convolutions.
Recap: U-NET
Difference: skip connections (like ResNet, though here they concatenate rather than add).
Dogma: the skips carry spatial information, while the bottleneck carries high-level structure.
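A minimal one-level U-Net-style sketch (PyTorch, illustrative channel sizes) showing the concatenating skip:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # learned upsampling
        # The decoder sees the upsampled bottleneck concatenated with the
        # skip: spatial detail from `enc`, high-level structure from below.
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, n_classes, 1)

    def forward(self, x):                  # x: (B, in_ch, H, W), H and W even
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))
        return self.head(d)
```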
Segmentation metrics
[Figure: ground truth (gt) vs. prediction (pred), partitioned into True Positive, True Negative, False Negative, and False Positive regions; example overlaps with IoU = 0.4, 0.7, and 0.9.]
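A small sketch of the IoU computation on binary masks (NumPy, illustrative):

```python
import numpy as np

def iou(gt, pred):
    """Intersection over Union for boolean masks of the same shape."""
    tp = np.logical_and(gt, pred).sum()    # true positives
    union = np.logical_or(gt, pred).sum()  # TP + FP + FN
    return tp / union if union > 0 else 1.0
```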
Training with Dice
Using the dot product to compute the intersection allows for a differentiable loss. For multiple classes, a basic approach is to average over all classes.
What maximizes the numerator? Predicting probability 1 exactly on the target pixels.
Use a sigmoid or a softmax to restrict the output to [0, 1].
More reading: https://arxiv.org/abs/1707.00478
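A minimal soft Dice loss sketch in PyTorch (assuming `probs` has already been through a sigmoid/softmax and `target` is one-hot):

```python
import torch

def soft_dice_loss(probs, target, eps=1e-7):
    # probs, target: (B, C, H, W)
    dims = (0, 2, 3)
    # The elementwise product sums to a differentiable "intersection";
    # it is maximized by predicting 1 exactly on the target pixels.
    intersection = (probs * target).sum(dims)
    denom = probs.sum(dims) + target.sum(dims)
    dice = (2 * intersection + eps) / (denom + eps)
    return 1 - dice.mean()   # average over classes
```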
Tricks: improving edges in segmentations by predicting edges
Task: segment cortical layers in brain histology. The network predicts the segmentation boundaries as an auxiliary output (a sketch of such a loss follows below).
[Figure: input, ground-truth segmentation, output probability, edge prediction, and predicted segmentation; baseline vs. with edge prediction.]
Images provided by Konrad Wagstyl (University College London), 2020
More reading about the idea: [Polzounov, WordFence: Text Detection in Natural Images with Border Awareness, 2017]
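A rough sketch of such an auxiliary edge loss (PyTorch; the extra-channel layout and the 3x3 edge-target derivation are assumptions, not the exact method from the slide):

```python
import torch
import torch.nn.functional as F

def seg_plus_edge_loss(output, gt_seg, edge_weight=1.0):
    # output: (B, n_classes + 1, H, W); the last channel predicts edges.
    seg_logits, edge_logits = output[:, :-1], output[:, -1]
    # Derive the edge target from the ground-truth segmentation: a pixel
    # is an edge if any 3x3 neighbour carries a different label.
    gt = gt_seg.float().unsqueeze(1)
    local_max = F.max_pool2d(gt, 3, stride=1, padding=1)
    local_min = -F.max_pool2d(-gt, 3, stride=1, padding=1)
    edge_target = (local_max != local_min).squeeze(1).float()
    seg_loss = F.cross_entropy(seg_logits, gt_seg)
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_target)
    return seg_loss + edge_weight * edge_loss
```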
Challenge: extreme class imbalance (e.g. lung nodules)
Background classes can dominate the loss and cause learning instability due to large gradients.
Balanced sampling may not work well either, because patches that could yield false positives are rarely seen during training.
CASED: importance sampling for large images
[Jesson, https://arxiv.org/abs/1807.10819]
General idea (rough sketch below):
Store a probability for each patch.
Generate patches based on this probability.
The probability is the inverse of how well your model performs on that patch.
Samples are stratified by class.
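A rough sketch of the sampling loop under these assumptions (class stratification omitted for brevity; see the paper for the real method):

```python
import numpy as np

rng = np.random.default_rng(0)

class PatchSampler:
    """Keep one weight per stored patch; sample patches in proportion
    to how poorly the model currently does on them."""
    def __init__(self, n_patches):
        self.weights = np.ones(n_patches)

    def sample(self, k):
        p = self.weights / self.weights.sum()
        return rng.choice(len(self.weights), size=k, p=p)

    def update(self, idx, loss):
        # Higher loss -> model performs worse there -> sampled more often.
        self.weights[idx] = loss
```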
Chapter 3
Counting
Use case: Proliferation/cell growth studies
Treat cells with different compounds and observe proliferation over time.
Standard 96-well plate
Bachstetter, MW151 Inhibited IL-1β Levels after Traumatic Brain Injury with No Effect on Microglia Physiological Responses, PLOS ONE, 2017
At the Cell Counter: THP-1 Cells, Molecular Devices
https://www.moleculardevices.com/cell-counter-thp-1-cells
Use case: Counting in histology slides
Complicated cell structure
Cell counting (classic CV)
This works well on easy tasks but doesn't scale: such "pipelines" end up breaking on new images with different lighting or staining.
How to get labels?
Counting via segmentation
V. Lempitsky and A. Zisserman, “Learning To Count Objects in Images,” 2010.
Targets for regression: place a Gaussian at each annotated object center (sigma is typically small, a few pixels) and train the model to regress the resulting density map (sketch below).
To recover the count: sum (integrate) the predicted density map.
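A small sketch of building such targets (NumPy/SciPy; `points` is an illustrative list of annotated centers):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_target(points, shape, sigma=2.0):
    """Unit impulse at each annotated centre, blurred by a small Gaussian."""
    target = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        target[int(r), int(c)] += 1.0
    # gaussian_filter preserves the total mass, so the map still
    # integrates to the number of annotated objects.
    return gaussian_filter(target, sigma=sigma)

points = [(10, 12), (30, 40), (31, 42)]
d = density_target(points, (64, 64))
print(d.sum())  # ~3.0: summing the map recovers the count
```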
Multiple output classes
Count and classify different cell types [Bidart 2018].
Counting and classifying is also possible using multiple output channels: combine the losses together, then take the max prediction over the output channels for each identified cell (brief sketch below).
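A brief sketch of the multi-channel variant (PyTorch; shapes and names are illustrative):

```python
import torch.nn.functional as F

def multiclass_density_loss(pred, target):
    # pred, target: (B, C, H, W), one density channel per cell type;
    # the per-channel regression losses are combined into one.
    return F.mse_loss(pred, target)

def classify_detections(pred, centers):
    # For each detected cell centre, take the max over output channels.
    return [pred[0, :, r, c].argmax().item() for r, c in centers]
```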
Chapter 4
GANs
Medical image-to-image translation considered harmful
Examples: MR → CT, CT → PET, synthesized H&E staining.
Adversarial losses are very good at distribution matching (e.g. CycleGAN), but artifacts can be introduced and then used in diagnosis, which is dangerous.
Many papers have proposed methods that can "translate between modalities", but a bias in the training data can lead to incorrect translations.
Use case: MRI modality transformation
[Figure: a real source image is passed through image translation/synthesis; in the transformed T1 images the tumors vanish relative to the real T1 ("Everyone is so healthy!").]
Cohen, Distribution Matching Losses Can Hallucinate Features in Medical Image Translation, 2018
CycleGAN results
[Figure: CycleGAN results as a function of the % of training data containing tumors; biased transformations between real FLAIR and real T1 images.]
Chapter 5
Right for the right reasons
Incorrect feature attribution
Example: a systematic discrepancy between the average image of the NIH dataset and that of PADCHEST.
Models can overfit to confounding variables in the data.
Overfitting while predicting emphysema [Viviano 2019]
[Zech, Confounding variables can degrade generalization performance of radiological deep learning models, 2018]
[Viviano, Underwhelming Generalization Improvements From Controlling Feature Attribution, 2019]
[Simpson, GradMask: Reduce Overfitting by Regularizing Saliency, 2019]
[Ross, Right for the Right Reasons, 2017]
Mitigation approaches
Feature engineering
During training (a sketch of one such penalty follows below)
What if the feature artifact is correlated with the target label? Is the reason that should be used for prediction known? What if it is not known?
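A sketch of one "during training" penalty in the spirit of [Ross 2017] (assuming a binary `mask` of known-relevant pixels is available, i.e. the "reason is known" case):

```python
import torch
import torch.nn.functional as F

def rrr_loss(model, x, y, mask, lam=1.0):
    """Cross-entropy plus a penalty on input gradients outside `mask`."""
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    grads, = torch.autograd.grad(ce, x, create_graph=True)
    # Saliency falling on pixels the model should not rely on is penalized.
    penalty = ((1 - mask) * grads).pow(2).sum()
    return ce + lam * penalty
```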
Not discussed
Image Registration
Cell morphology representation (e.g. BBBC021)