Medical Imaging with Deep Learning Overview
Popular image problems:
Multi-modality/view
Segmentation
Counting
Incorrect feature attribution
Slides by Joseph Paul Cohen 2020
Email: joseph@josephpcohen.com
License: Creative Commons Attribution-Sharealike
Chapter 1
Radiology and multi-view
Common X-ray projections
PA = PosteroAnterior (the most common projection)
Image: [Bustos, “PadChest: A Large Chest x-Ray Image Dataset with Multi-Label Annotated Reports.” 2019]
Chest X-ray14 Dataset
Ronald Summers, NIH Clinical Center
Released in 2017: the first large-scale chest X-ray dataset.
>100k frontal images released without copyright restrictions.
Enabled the deep learning radiology revolution.
Stanford Pneumonia study (CheXNet)
https://stanfordmlgroup.github.io/projects/chexnet/
In 2017, Pranav Rajpurkar and Jeremy Irvin trained a DenseNet-121 on the NIH data, scaled to 224x224 pixels.
It set a benchmark performance that has not been significantly improved upon since.
They evaluated pneumonia predictions against 4 radiologists:
"We find that the model exceeds the average radiologist performance on the pneumonia detection task."
Criticism of the Chest X-ray14 Dataset
https://lukeoakdenrayner.wordpress.com/2017/12/18/the-chestxray14-dataset-problems/
In 2017, Luke Oakden-Rayner published a blog post discussing issues with the labels in the NIH data. This led to more work on automatic label extraction.
[Figure: in a sample of images, those marked in red are said to be wrongly labelled.]
2019: the year of X-ray data

PadChest
160k images; multiple views
Almost 200 labels
27% hand labelled, the rest extracted with an RNN
License: Creative Commons Attribution-ShareAlike

CheXpert
224k images; PA and L views
13 labels
Automated rule-based labeler
License: non-commercial research purposes only

MIMIC-CXR
377k images; PA and L views
13 labels
Automated rule-based labelers: both the NIH (NegBio) and CheXpert labelers were run
License: non-commercial research purposes only; confidentiality training required
Multi-modal/view inference (X-ray use case)
[Figure: saliency maps for flattened diaphragm and pleural effusion on PA and lateral views; the maps come from models trained on single views.]
These two tasks perform better when using lateral views.
[Bertrand, 2019]
Also: Multi-modal/view inference (MRI use case)
Modalities: T1, T2, T1C, FLAIR
Tasks: ischemic stroke lesion segmentation, stroke perfusion estimation, brain tumor segmentation
Image credit: Mohammad Havaei
Challenge: missing modalities/views
[Figure: three patients, each with a different incomplete set of modalities; the model expects the full set of inputs but is given only a subset.]
Integrating multiple views
Four options (a sketch of the third follows below):
1. Combine the images right at the input.
2. Take the mean of activations in the middle of the network.
3. Concatenate the output features of two models into a single prediction.
4. Three losses: a network for each modality, with losses that regularize each network.
Image: [Hashir, Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays, 2020]
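A minimal sketch of the third option (concatenating the output features of two single-view encoders), assuming PyTorch/torchvision; the encoder choice and names are illustrative:

```python
import torch
import torch.nn as nn
from torchvision import models

class TwoViewFusion(nn.Module):
    """Late fusion: one encoder per view, features concatenated
    before a single prediction head."""
    def __init__(self, num_classes):
        super().__init__()
        # One DenseNet-121 trunk per view; weights are not shared.
        self.pa_encoder = models.densenet121(weights=None).features
        self.lat_encoder = models.densenet121(weights=None).features
        self.pool = nn.AdaptiveAvgPool2d(1)
        self.classifier = nn.Linear(1024 * 2, num_classes)

    def forward(self, pa, lateral):
        f_pa = self.pool(self.pa_encoder(pa)).flatten(1)         # (B, 1024)
        f_lat = self.pool(self.lat_encoder(lateral)).flatten(1)  # (B, 1024)
        return self.classifier(torch.cat([f_pa, f_lat], dim=1))
```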
Integrating multiple views (X-ray images)
All models perform about equally well given the right hyperparameters; however, hyperparameter tuning is easier for some models than for others.
Image: [Hashir, Quantifying the Value of Lateral Views in Deep Learning for Chest X-rays, 2020]
Chapter 2
Histology and segmentation
CAMELYON17: a large, high-resolution open histology dataset for cancer detection
Peter Bandi, et al. From detection of individual metastases to classification of lymph node status at the patient level: the CAMELYON17 challenge. IEEE-TMI 2018
CAMELYON17 Dataset
1000 whole-slide images (WSIs) of sentinel lymph nodes (~3GB each!).
5 medical centers, 40 patients from each center, 5 whole-slide images per patient.
Patch-wise segmentation. Use case: Invasive Ductal Carcinoma (the most common subtype of breast cancer)
Start with a full slide image of breast tissue; the whole image is labelled as IDC or not.
The image is chopped into patches, and each patch is labelled as IDC or not.
Pipeline: extract a patch, run it through a CNN, and classify the patch's center pixel (e.g. "cancer").
Slide design: Fei-Fei Li & Andrej Karpathy & Justin Johnson
Class imbalance is an issue; patch-wise training allows easy balancing of classes using standard methods (see the sampling sketch below).
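For example, a sketch of balanced patch sampling with PyTorch's WeightedRandomSampler; `patch_labels` and `patch_dataset` are illustrative names:

```python
import torch
from torch.utils.data import WeightedRandomSampler

# One 0/1 label per extracted patch (toy example).
patch_labels = torch.tensor([0, 0, 0, 0, 1, 0, 1, 0])
class_counts = torch.bincount(patch_labels)

# Weight each patch by the inverse frequency of its class so the
# sampler draws positive and negative patches roughly 50/50.
weights = 1.0 / class_counts[patch_labels].float()
sampler = WeightedRandomSampler(weights, num_samples=len(weights), replacement=True)
# loader = DataLoader(patch_dataset, batch_size=32, sampler=sampler)
```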
Fully convolutional processing (FCN)
[Figure: a two-layer convolutional network, kernel size 3 followed by kernel size 2. An input of size 4 gives an output of size 1; an input of size 5 gives an output of size 2.]
What is this model's receptive field? (4)
How many multiplications were saved by sharing the overlapping computation? (4 nodes)
How many are saved for an input size of 6? (8)
This allows for very fast inference. However, training this way requires a lot of memory, since the intermediate activations for the whole image must be kept for the backward pass. Patch-wise training together with FCN inference is a good balance, as sketched below.
[Figure: an input image and dense output maps for class 0 and class 1; a pixel is assigned class 1 where class 1 > class 0.]
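A tiny runnable illustration of the idea in 1D (kernel sizes 3 and 2 as in the example above); channel counts are arbitrary:

```python
import torch
import torch.nn as nn

# A "patch classifier" that sees patches of size 4 is already fully
# convolutional: on a larger input, one forward pass scores every
# location, reusing the overlapping layer-1 computation.
patch_net = nn.Sequential(
    nn.Conv1d(1, 8, kernel_size=3),   # layer 1, kernel size 3
    nn.ReLU(),
    nn.Conv1d(8, 2, kernel_size=2),   # layer 2, kernel size 2 -> 2 class scores
)

x = torch.randn(1, 1, 5)         # input size 5
scores = patch_net(x)            # shape (1, 2, 2): two output positions
pred = scores.argmax(dim=1)      # predicted class per spatial position
print(scores.shape, pred)
```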
Recap: Segmentation using a bottleneck
Noh et al, “Learning Deconvolution Network for Semantic Segmentation”, ICCV 2015
Slide design: Fei-Fei Li & Andrej Karpathy & Justin Johnson
A normal VGG encoder is followed by an “upside down” VGG decoder.
Upsampling is possible with unpooling and transposed (“deconvolution”) convolutions.
Recap: U-NET
Difference: skip connections (like ResNet, though here they concatenate rather than add).
Dogma: the skips carry spatial information, while the bottleneck carries high-level structure.
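A minimal one-level U-Net-style sketch (PyTorch, illustrative channel sizes) showing the concatenating skip:

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self, in_ch=1, n_classes=2):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.bottleneck = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)  # learned upsampling
        # The decoder sees the upsampled bottleneck concatenated with the
        # skip: spatial detail from `enc`, high-level structure from below.
        self.dec = nn.Sequential(nn.Conv2d(32, 16, 3, padding=1), nn.ReLU())
        self.head = nn.Conv2d(16, n_classes, 1)

    def forward(self, x):                  # x: (B, in_ch, H, W), H and W even
        e = self.enc(x)
        b = self.bottleneck(self.down(e))
        d = self.dec(torch.cat([self.up(b), e], dim=1))
        return self.head(d)
```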
Segmentation metrics
[Figure: ground truth (gt) vs. prediction (pred), partitioned into True Positive, True Negative, False Negative, and False Positive regions; example overlaps with IoU = 0.4, 0.7, and 0.9.]
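A small sketch of the IoU computation on binary masks (NumPy, illustrative):

```python
import numpy as np

def iou(gt, pred):
    """Intersection over Union for boolean masks of the same shape."""
    tp = np.logical_and(gt, pred).sum()    # true positives
    union = np.logical_or(gt, pred).sum()  # TP + FP + FN
    return tp / union if union > 0 else 1.0
```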
Training with Dice
Using the dot product to compute the intersection allows for a differentiable loss. For multiple classes, a basic approach is to average over all classes.
What maximizes the numerator? Predicting probability 1 exactly on the target pixels.
Use a sigmoid or a softmax to restrict the output to [0, 1].
More reading: https://arxiv.org/abs/1707.00478
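A minimal soft Dice loss sketch in PyTorch (assuming `probs` has already been through a sigmoid/softmax and `target` is one-hot):

```python
import torch

def soft_dice_loss(probs, target, eps=1e-7):
    # probs, target: (B, C, H, W)
    dims = (0, 2, 3)
    # The elementwise product sums to a differentiable "intersection";
    # it is maximized by predicting 1 exactly on the target pixels.
    intersection = (probs * target).sum(dims)
    denom = probs.sum(dims) + target.sum(dims)
    dice = (2 * intersection + eps) / (denom + eps)
    return 1 - dice.mean()   # average over classes
```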
Tricks: improving edges in segmentations by predicting edges
Task: segment cortical layers in brain histology. The network predicts the segmentation boundaries as an auxiliary output (a sketch of such a loss follows below).
[Figure: input, ground-truth segmentation, output probability, edge prediction, and predicted segmentation; baseline vs. with edge prediction.]
Images provided by Konrad Wagstyl (University College London), 2020
More reading about the idea: [Polzounov, WordFence: Text Detection in Natural Images with Border Awareness, 2017]
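A rough sketch of such an auxiliary edge loss (PyTorch; the extra-channel layout and the 3x3 edge-target derivation are assumptions, not the exact method from the slide):

```python
import torch
import torch.nn.functional as F

def seg_plus_edge_loss(output, gt_seg, edge_weight=1.0):
    # output: (B, n_classes + 1, H, W); the last channel predicts edges.
    seg_logits, edge_logits = output[:, :-1], output[:, -1]
    # Derive the edge target from the ground-truth segmentation: a pixel
    # is an edge if any 3x3 neighbour carries a different label.
    gt = gt_seg.float().unsqueeze(1)
    local_max = F.max_pool2d(gt, 3, stride=1, padding=1)
    local_min = -F.max_pool2d(-gt, 3, stride=1, padding=1)
    edge_target = (local_max != local_min).squeeze(1).float()
    seg_loss = F.cross_entropy(seg_logits, gt_seg)
    edge_loss = F.binary_cross_entropy_with_logits(edge_logits, edge_target)
    return seg_loss + edge_weight * edge_loss
```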
Challenge: extreme class imbalance (e.g. lung nodules)
Background classes can dominate the loss and cause learning instability due to large gradients.
Balanced sampling may not work well either, because patches that could yield false positives are rarely seen during training.
CASED: importance sampling for large images
[Jesson, https://arxiv.org/abs/1807.10819]
General idea (rough sketch below):
Store a probability for each patch.
Generate patches based on this probability.
The probability is the inverse of how well your model performs on that patch.
Samples are stratified by class.
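A rough sketch of the sampling loop under these assumptions (class stratification omitted for brevity; see the paper for the real method):

```python
import numpy as np

rng = np.random.default_rng(0)

class PatchSampler:
    """Keep one weight per stored patch; sample patches in proportion
    to how poorly the model currently does on them."""
    def __init__(self, n_patches):
        self.weights = np.ones(n_patches)

    def sample(self, k):
        p = self.weights / self.weights.sum()
        return rng.choice(len(self.weights), size=k, p=p)

    def update(self, idx, loss):
        # Higher loss -> model performs worse there -> sampled more often.
        self.weights[idx] = loss
```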
Chapter 3
Counting
Use case: Proliferation/cell growth studies
Treat cells with different compounds and observe proliferation over time.
Standard 96-well plate
Bachstetter, MW151 Inhibited IL-1β Levels after Traumatic Brain Injury with No Effect on Microglia Physiological Responses, PLOS ONE, 2017
At the Cell Counter: THP-1 Cells, Molecular Devices
https://www.moleculardevices.com/cell-counter-thp-1-cells
Use case: Counting in histology slides
Complicated cell structure
Cell counting (classic CV)
This works well on easy tasks but doesn't scale: such "pipelines" end up breaking on new images with different lighting or staining.
How to get labels?
Counting via segmentation
V. Lempitsky and A. Zisserman, “Learning To Count Objects in Images,” 2010.
Targets for regression: place a Gaussian at each annotated object center (sigma is typically small, a few pixels) and train the model to regress the resulting density map (sketch below).
To recover the count: sum (integrate) the predicted density map.
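A small sketch of building such targets (NumPy/SciPy; `points` is an illustrative list of annotated centers):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def density_target(points, shape, sigma=2.0):
    """Unit impulse at each annotated centre, blurred by a small Gaussian."""
    target = np.zeros(shape, dtype=np.float32)
    for r, c in points:
        target[int(r), int(c)] += 1.0
    # gaussian_filter preserves the total mass, so the map still
    # integrates to the number of annotated objects.
    return gaussian_filter(target, sigma=sigma)

points = [(10, 12), (30, 40), (31, 42)]
d = density_target(points, (64, 64))
print(d.sum())  # ~3.0: summing the map recovers the count
```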
Multiple output classes
Count and classify different cell types [Bidart 2018].
Counting and classifying is also possible using multiple output channels: combine the losses together, then take the max prediction over the output channels for each identified cell (brief sketch below).
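A brief sketch of the multi-channel variant (PyTorch; shapes and names are illustrative):

```python
import torch.nn.functional as F

def multiclass_density_loss(pred, target):
    # pred, target: (B, C, H, W), one density channel per cell type;
    # the per-channel regression losses are combined into one.
    return F.mse_loss(pred, target)

def classify_detections(pred, centers):
    # For each detected cell centre, take the max over output channels.
    return [pred[0, :, r, c].argmax().item() for r, c in centers]
```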
Chapter 4
GANs
Medical image-to-image translation considered harmful
Examples: MR → CT, CT → PET, synthesized H&E staining.
Adversarial losses are very good at distribution matching (e.g. CycleGAN), but artifacts can be introduced and then used in diagnosis, which is dangerous.
Many papers have proposed methods that can "translate between modalities", but a bias in the training data can lead to incorrect translations.
Use case: MRI modality transformation
[Figure: a real source image is passed through image translation/synthesis; in the transformed T1 images the tumors vanish relative to the real T1 ("Everyone is so healthy!").]
Cohen, Distribution Matching Losses Can Hallucinate Features in Medical Image Translation, 2018
CycleGAN results
[Figure: CycleGAN results as a function of the % of training data containing tumors; biased transformations between real FLAIR and real T1 images.]
Chapter 5
Right for the right reasons
Incorrect feature attribution
Example: a systematic discrepancy between the average image of the NIH dataset and that of PADCHEST.
Models can overfit to confounding variables in the data.
Overfitting while predicting emphysema [Viviano 2019]
[Zech, Confounding variables can degrade generalization performance of radiological deep learning models, 2018]
[Viviano, Underwhelming Generalization Improvements From Controlling Feature Attribution, 2019]
[Simpson, GradMask: Reduce Overfitting by Regularizing Saliency, 2019]
[Ross, Right for the Right Reasons, 2017]
Mitigation approaches
Feature engineering
During training (a sketch of one such penalty follows below)
What if the feature artifact is correlated with the target label? Is the reason that should be used for prediction known? What if it is not known?
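A sketch of one "during training" penalty in the spirit of [Ross 2017] (assuming a binary `mask` of known-relevant pixels is available, i.e. the "reason is known" case):

```python
import torch
import torch.nn.functional as F

def rrr_loss(model, x, y, mask, lam=1.0):
    """Cross-entropy plus a penalty on input gradients outside `mask`."""
    x = x.clone().requires_grad_(True)
    ce = F.cross_entropy(model(x), y)
    grads, = torch.autograd.grad(ce, x, create_graph=True)
    # Saliency falling on pixels the model should not rely on is penalized.
    penalty = ((1 - mask) * grads).pow(2).sum()
    return ce + lam * penalty
```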
Not discussed
Image Registration
Cell morphology representation (e.g. BBBC021)