1 of 62

MRI-Based Alzheimer’s Disease Classification Using Deep Learning: A Novel Small-Data Approach

Raja Haseeb

(raja@rit.kaist.ac.kr)

Advisor: Prof. Jong-Hwan Kim

May 26, 2021

M.S. Dissertation

School of Electrical Engineering, KAIST

RIT

Robot Intelligence Technology Laboratory: Challenge for Knowledge Creation and Innovative Technology

2 of 62

Contents

  1. Introduction

1.1 Research Background

1.2 Research Motivation

1.3 Research Outline

  2. Proposed Framework

2.1 Overall Approach

2.2 Data Augmentation

2.3 Attention Mechanism

2.4 Contrastive Learning

2.5 Classification Network

  3. Experiments and Results

3.1 Dataset

3.2 Data Augmentation

3.3 Comparison of Various Architectures

3.4 Proposed Architecture

3.5 Results

3.6 Comparison with Existing Methods

  4. Conclusion and Future Work

3 of 62

1. Introduction


4 of 62

1.1 Research Background

    • What is Dementia
      • Dementia is a word used to describe a group of symptoms that occur when brain cells stop working properly.
      • There are over 100 diseases that may cause dementia
      • Types
        • Alzheimer’s disease
        • Vascular dementia
        • Dementia with Lewy bodies
        • Frontotemporal dementia
        • Alcohol related dementia

      • Although often thought of as a disease of older people, around 4% of people with Alzheimer’s are under 65. This is called early-onset or young-onset Alzheimer’s. It usually affects people in their 40s, 50s and early 60s.

    • What is Alzheimer’s Disease?
      • Alzheimer’s is a progressive neurodegenerative disease
      • Most common cause of Dementia (70% of the cases)
      • 5.7 million Americans had AD in 2018; the number is projected to rise to 14 million by 2050
      • Causes cognitive impairment and problems with memory, thinking and behavior

  1. Alzheimer's Association. "2018 Alzheimer's disease facts and figures." Alzheimer's & Dementia 14.3 (2018): 367-429.


5 of 62

1.1 Research Background

    • Classification
      • Cognitive Normal (CN)
      • Alzheimer’s Disease (AD)
      • Mild Cognitive Impairment (MCI)
        • MCI describes people with mild symptoms of brain impairment. MCI patients are still able to perform daily activities to some extent; however, this ability declines as the disease progresses, and patients in this phase have a high chance of progressing to dementia

      • Stable MCI (sMCI), stable Normal Controls (sNC), progressive Normal Controls (pNC), progressive MCI (pMCI), stable AD (sAD)

    • Traditionally, physicians diagnosed patients themselves using clinical methods
      • Cerebrospinal fluid (CSF) concentration in the brain is reported to indicate the presence of AD.
      • A ventricular puncture is used for the collection of CSF
      • This process can be arduous and can cause bleeding in the brain


6 of 62

1.1 Research Background

    • Medical Imaging Techniques
      • A lot of focus has been put on the development of medical imaging techniques in recent years.
        • MRI, PET, CT are used to diagnose functional and structural changes in the brain.
      • Observe changes in the brain structure (Changes in WM, GM, CSF, ventricles etc. caused by Alzheimer’s disease)
      • This process can be costly, laborious, time-consuming and prone to human errors

    • The healthcare sector is no longer small
      • A large number of patients and records
      • Therefore, there is a need for an automated diagnosis method that takes less time and effort, is reliable, less costly, and helps practitioners.

    • Machine learning to the rescue
      • Use of traditional ML algorithms for medical diagnosis
        • SVM, Random forests, kNN and so on.
      • Deep learning, the new paradigm
      • Huge success in the medical domain


7 of 62

1.1 Research Background

    • Various methods in recent years for AD classification and prediction using machine learning
      • Single modal and multi-modal approaches
      • Multimodal and Multiscale Deep Neural Networks (Lu et al., 2018), GRU-based (Lee et al., 2019), Non-linear SVM (Rallabandi et al., 2020), Multi-modal deep learning (Goto et al., 2020), Transfer learning (Khan et al., 2019), LSTM-based (Hong et al., 2019)

    • Good accuracy results (range 80%~90%)


8 of 62

1.1 Research Background

  • Multimodal and Multiscale Deep Neural Networks for the Early Diagnosis of Alzheimer’s disease using structural MRI and FDG-PET images
    • ADNI (Alzheimer's Disease Neuroimaging Initiative) dataset (2402 T1 MRI + 2402 FDG-PET images)
    • Segment gray matter, then divide into patches and extract features
    • Six independent DNNs, corresponding to each scale of single modality
    • Features from these 6 fused together by another DNN to predict final score
    • 3 scales for each MRI and FDG-PET image (based on different patch sizes)
    • Accuracy up to 82%

Multimodal and Multiscale Deep Neural Networks (Lu et al., 2018)

  1. Lu, Donghuan, et al. "Multimodal and multiscale deep neural networks for the early diagnosis of Alzheimer's disease using structural MR and FDG-PET images." Scientific Reports 8.1 (2018): 1-13.

9 of 62

1.1 Research Background

  • Predicting Alzheimer’s disease progression using multi-modal deep learning approach
    • 1,618 ADNI participants aged 55 to 91 were used, including 415 cognitively normal older adult controls (CN), 865 MCI patients, and 338 AD patients
    • Separately build GRU feature extractors for each modality
    • Each GRU component takes both time series and non-time series data
    • Integrate the four extracted features at the end for final prediction
    • Accuracy up to 81%

Multimodal Deep Learning (Lee et al., 2019)

  1. Lee, Garam, et al. "Predicting Alzheimer’s disease progression using multi-modal deep learning approach." Scientific reports 9.1 (2019): 1-12.


10 of 62

1.1 Research Background

  • Convolution neural network–based Alzheimer's disease classification using hybrid enhanced independent component analysis based segmented gray matter of T2 weighted magnetic resonance imaging with clinical evaluation
    • 1820 T2-weighted brain MRI (635 AD MRIs, 548 MCI MRIs, 637 CN MRIs)
    • Extract gray matter from brain voxels and then perform classification using CNN
    • Accuracy up to 90.47%

Architecture design (Basheera et al., 2019)

  1. Basheera, Shaik, and M. Satya Sai Ram. "Convolution neural network–based Alzheimer's disease classification using hybrid enhanced independent component analysis based segmented gray matter of T2 weighted magnetic resonance imaging with clinical evaluation." Alzheimer's & Dementia: Translational Research & Clinical Interventions 5 (2019): 974-986.

11 of 62

1.1 Research Background

  • Automatic classification of cognitively normal, mild cognitive impairment and Alzheimer’s disease using structural MRI analysis
    • 1167 whole-brain T1 MRI from ADNI
    • Used libraries/tools to extract brain tissues and segment them into gray matter, white matter and cerebrospinal fluid
    • Compute the regional cortical thickness (CT) of several anatomical regions
    • AD progression affects regional cortical thickness
    • Cortical thickness is the distance between the white-gray interface and the gray-CSF interface

    • 68 CT features extracted
    • Training using several ML algorithms with Auto-WEKA 2.6 tool
    • Non-linear SVM found to be best classifier
    • Accuracy up to 75%

  1. Rallabandi, VP Subramanyam, et al. "Automatic classification of cognitively normal, mild cognitive impairment and Alzheimer's disease using structural MRI analysis." Informatics in Medicine Unlocked 18 (2020): 100305.

12 of 62

1.1 Research Background

  • Automatic classification of cognitively normal, mild cognitive impairment and Alzheimer’s disease using structural MRI analysis

Schematic diagram of proposed approach (Rallabandi et al., 2020)

  1. Rallabandi, VP Subramanyam, et al. "Automatic classification of cognitively normal, mild cognitive impairment and Alzheimer's disease using structural MRI analysis." Informatics in Medicine Unlocked 18 (2020): 100305.

13 of 62

1.1 Research Background

  • Ensembles of Patch-Based Classifiers for Diagnosis of Alzheimer Diseases
    • Structural Magnetic Resonance Imaging (sMRI) as the modality (352 MRI scans belonging to AD, NC and MCI)
    • Hippocampus region focused as the input feature for the CNN
    • Localize hippocampus manually, then generate 32 x 32 patches from the local region
    • 32 x 32 patches from each of the sagittal, axial and coronal view and merged them as a single sample
    • These three view patches (TVPs) are fed into the network
    • Three individual models are trained and their results combined, i.e., a CNN for the left hippocampus, a CNN for the right hippocampus, and a CNN for combined left-and-right hippocampus classification
    • The ensemble of the three models achieves accuracy of 85.55% on ADNI dataset

  1. Ahmed, Samsuddin, et al. "Ensembles of patch-based classifiers for diagnosis of Alzheimer diseases." IEEE Access 7 (2019): 73373-73383.


14 of 62

1.1 Research Background

  • Ensembles of Patch-Based Classifiers for Diagnosis of Alzheimer Diseases

Schematic diagram of proposed approach (Ahmed et al., 2019)

  1. Ahmed, Samsuddin, et al. "Ensembles of patch-based classifiers for diagnosis of Alzheimer diseases." IEEE Access 7 (2019): 73373-73383.


15 of 62

1.1 Research Background

  • Multi-modal deep learning for predicting progression of Alzheimer's disease using bi-linear shake fusion
  • Predicting Alzheimer’s Disease Using LSTM
  • Transfer Learning With Intelligent Training Data Selection for Prediction of Alzheimer’s Disease

  1. Goto, Tsubasa, et al. "Multi-modal deep learning for predicting progression of Alzheimer's disease using bi-linear shake fusion."
  2. Hong, Xin, et al. "Predicting Alzheimer's disease using LSTM." IEEE Access 7 (2019): 80893-80901.
  3. Khan, Naimul Mefraz, Nabila Abraham, and Marcia Hon. "Transfer learning with intelligent training data selection for prediction of Alzheimer's disease." IEEE Access 7 (2019): 72726-72735.

16 of 62

1.2 Research Motivation

    • Timely diagnosis and treatment of AD patients

    • Limitations of past work
      • No focus on small-data regime
        • All the previous work on AD diagnosis utilized a large amount of labeled data
        • There are many cases where there is not enough labeled data
        • One main reason is that patient data is well protected by patient data laws
        • It is also expensive and time-consuming to obtain more medical data, especially annotated data, since a medical expert is needed for annotation
        • For rare or newly emerging diseases (such as Covid-19 at its outbreak), enough data simply does not exist
        • Imaging and data retrieval standards also differ from country to country, and even from one hospital to another, which makes the process even more complicated

  1. Wen, Junhao, et al. "Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation." Medical image analysis 63 (2020): 101694.


17 of 62

1.2 Research Motivation

      • Transfer learning
        • Another limitation is the use of transfer learning, e.g., pre-training on ImageNet data
        • Medical data differs from natural images, so pre-training on real-world images like ImageNet is of limited use for medical diagnosis

      • Different data and metrics
        • Also, it is not possible to make a direct comparison between various past works
        • These approaches differ in the participants involved, image processing procedures, cross-validation methods, and the evaluation metrics

      • Different data selection methods
        • There have been limitations in data selection methods as well
        • Some work utilized all the slices from a patient's data; not all slices are informative for AD classification, and bad slices can degrade performance
        • Entropy-based slice selection also does not show good results

  1. Wen, Junhao, et al. "Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation." Medical image analysis 63 (2020): 101694.


18 of 62

1.2 Research Motivation

      • Biased evaluation (Data Leakage)
        • Data Leakage, which refers to the presence of test data in any part of the training process, is a major source of bias during the evaluation.
        • Since DL approaches are flexible and complex, data leakage can be very hard to detect

        • Wrong data split
            • Training, validation, and test set should be separated at the subject-level and not the data-level.
            • If not, then data from the same patient can appear in several sets, resulting in biased evaluation of the model.

        • Late split
            • Procedures like feature selection, data augmentation, or pre-training must never use test data.
            • These steps should be performed on the training set after separating the data into training, validation, and test set.

  1. Wen, Junhao, et al. "Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation." Medical image analysis 63 (2020): 101694.


19 of 62

1.2 Research Motivation

        • Absence of independent test set
            • To correctly evaluate the performance of the classifier, the test set should be kept separate and used only in the final stage to assess the classifier.
            • Most authors reported only high training accuracy, cross-validation results, or 80/20 train-test split results where the split is made within the same set, which is not a good way to evaluate model performance.
            • To properly evaluate a model, a separate and unseen test set is needed

  1. Wen, Junhao, et al. "Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation." Medical image analysis 63 (2020): 101694.


20 of 62

1.3 Research Outline

    • ADNI dataset
      • The Alzheimer's Disease Neuroimaging Initiative (ADNI) began in 2004 under the leadership of Dr. Michael W. Weiner
      • Foundation for the National Institutes of Health and National Institute on Aging
      • The Alzheimer’s Disease Neuroimaging Initiative (ADNI) unites researchers with study data as they work to define the progression of Alzheimer’s disease
      • ADNI researchers collect, validate and utilize data, including MRI and PET images, genetics, cognitive tests, CSF and blood biomarkers as predictors of the disease
      • Study resources and data from the North American ADNI study, including Alzheimer's disease patients, mild cognitive impairment subjects, and elderly controls, are available through the ADNI website


21 of 62

1.3 Research Outline

    • Research Objective
      • Classification of Alzheimer’s Disease (AD) and Cognitive Normal (CN) subjects
      • AD vs. CN vs. MCI (Mild Cognitive Impairment)

    • Dealing with medical data scarcity
      • Various approaches
        • N-shot learning methods, Matching networks, Siamese networks, GAN-based methods, Meta-learning, Surrogate data methods, self-supervised methods and so on.
      • GAN-based medical image synthesis

    • Architecture selection
      • Testing various CNN architectures
        • Custom CNN, Inception network, ResNet, DenseNet, Multi-scale CNN, Residual Attention Network, Transfer Learning


22 of 62

1.3 Research Outline

    • Proposed architecture
      • ResNet-18 [1] architecture with CBAM [2] (Convolutional Block Attention Module)
      • Pretraining with the SimCLR [3] framework

  1. Wang, Fei, et al. "Residual attention network for image classification." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  2. Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.
  3. Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020.

[Figure: proposed pipeline. Input Data → Data Augmentation (Traditional + PGGAN-Based) → Contrastive Learning (pretraining with SimCLR to learn useful representations) → Classification Network (ResNet-18 + CBAM) → Results (AD / CN / MCI)]

23 of 62

1.3 Research Outline

    • Experiments and Results
      • Two category classification (AD vs. CN)
      • Three category classification (AD vs. CN vs. MCI)


24 of 62

2. Proposed Framework


25 of 62

2.1 Overall Approach

[Figure: overall approach. Pre-training phase: contrastive learning on training data plus generated data, using a ResNet-18 + CBAM encoder. Classification phase: the pretrained ResNet-18 + CBAM network (input → stacked 3x3 conv blocks → AvgPool → FC → Softmax) classifies AD / CN / MCI]

26 of 62

2.2 Data Augmentation

    • Dealing with data scarcity

    • Adopted two data augmentation methods for a small dataset with only a few MRI scans
      • Traditional data augmentation
      • GAN-based augmentation


27 of 62

2.2 Data Augmentation

  • Traditional data augmentation
      • Remove class imbalance
      • Increase data set for GAN training
      • Rotation, shear, zoom, shift (translation)

  • GAN-based image generation
    • Progressive Growing of GAN with Wasserstein Gradient Penalty [1] (PGGAN-WGP)
    • Generate same number of images as the original training data
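The traditional augmentations listed above can be sketched as random parameter sampling per slice. The ranges below are illustrative assumptions, not the values used in this work:

```python
import random

def sample_augmentation(seed=None):
    """Sample one set of traditional augmentation parameters
    (rotation, shear, zoom, shift) for a single MRI slice.
    All ranges are illustrative assumptions."""
    rng = random.Random(seed)
    return {
        "rotation_deg": rng.uniform(-10, 10),  # small rotations keep anatomy plausible
        "shear_deg": rng.uniform(-5, 5),
        "zoom": rng.uniform(0.9, 1.1),         # mild scaling in/out
        "shift_frac": (rng.uniform(-0.05, 0.05),   # (dx, dy) translation as a
                       rng.uniform(-0.05, 0.05)),  # fraction of the image size
    }
```

In practice these parameters would be handed to an image library (e.g., torchvision's RandomAffine) to warp the slice.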

  1. Karras, Tero, et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation." International Conference on Learning Representations. 2018.


28 of 62

2.2 Data Augmentation

    • What is GAN
      • Generator and a Discriminator
      • Discriminator acts as a critic

      • The formula derives from the cross-entropy between the real and generated distributions
      • Discriminator tries to maximize this loss function while Generator tries to minimize it
      • The goal of the generator is to fool the discriminator network by generating as real images as possible
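The objective referenced above is the standard GAN minimax game from the cited Goodfellow et al. (2014) paper:

```latex
\min_G \max_D V(D, G) =
  \mathbb{E}_{x \sim p_{\text{data}}(x)}\big[\log D(x)\big]
  + \mathbb{E}_{z \sim p_z(z)}\big[\log\big(1 - D(G(z))\big)\big]
```

The discriminator D maximizes V (correctly separating real from generated samples) while the generator G minimizes it.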

[Figure: GAN training loop. The Generator produces generated data; the Discriminator receives both generated data and training data and classifies each sample as real or fake]

  1. Goodfellow, Ian J., et al. "Generative adversarial networks." arXiv preprint arXiv:1406.2661 (2014).


29 of 62

2.2 Data Augmentation

    • Classical GAN can only produce low resolution images
      • Higher resolution makes it easier to tell the generated images apart from training images
      • Large resolutions also necessitate using smaller mini-batches due to memory constraints, further compromising training stability
      • Going straight from the latent z variable to a 1024² image contains an enormous amount of variance in the space.

    • PGGAN, released by Nvidia
      • Outputs good-quality high-resolution images of up to 1024x1024

    • Progressively increase network size
      • Grow both generator and discriminator progressively

  1. Karras, Tero, et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation." International Conference on Learning Representations. 2018.


30 of 62

2.2 Data Augmentation

    • Training starts from low-resolution and proceeds to higher resolutions
      • Starting from easier low-resolution images (4x4, 8x8), and add new layers that introduce higher-resolution details as the training progresses (up to 1024x1024). This greatly speeds up training and improves stability in high resolutions
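The doubling schedule described above can be sketched as a simple helper (illustrative only):

```python
def progressive_schedule(start=4, final=1024):
    """Resolutions visited during progressive growing: training begins
    at start x start and doubles until final x final is reached."""
    sizes = []
    res = start
    while res <= final:
        sizes.append(res)
        res *= 2
    return sizes  # e.g. [4, 8, 16, ..., 1024]
```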

    • Wasserstein Gradient Penalty
      • WGAN-GP enhances training stability.
      • Produces better results
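The WGAN-GP critic objective from the cited Gulrajani et al. (2017) paper is:

```latex
L = \mathbb{E}_{\tilde{x} \sim p_g}\big[D(\tilde{x})\big]
  - \mathbb{E}_{x \sim p_r}\big[D(x)\big]
  + \lambda \, \mathbb{E}_{\hat{x} \sim p_{\hat{x}}}\big[\big(\lVert \nabla_{\hat{x}} D(\hat{x}) \rVert_2 - 1\big)^2\big]
```

The gradient penalty term (weight λ) replaces weight clipping and is what improves training stability.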

PGGAN architecture

  1. Karras, Tero, et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation." International Conference on Learning Representations. 2018.
  2. Gulrajani, Ishaan, et al. "Improved training of Wasserstein GANs." arXiv preprint arXiv:1704.00028 (2017).


31 of 62

2.2 Data Augmentation

Generator and Discriminator architecture

  1. Karras, Tero, et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation." International Conference on Learning Representations. 2018.


32 of 62

2.2 Data Augmentation

  1. Karras, Tero, et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation." International Conference on Learning Representations. 2018.


33 of 62

2.3 Attention Mechanism

    • What is attention?
      • Initially in NLP
      • Tells the model which part of the input sentence to focus on
      • Attention modules are used in computer vision to make the model learn and focus more on the important information, rather than learning background information.
      • A typical attention module generates a mask of the input feature map using a simple 2D-convolutional layer, multi-layer perceptron (MLP), and a sigmoid function at the end.

    • CBAM
      • Provided an input feature map, it computes the attention maps along two dimensions i.e. channel and spatial.
      • These inferred attention maps are then multiplied with the input feature map to further refine the features.
      • The intuition behind this idea is that blindly attaching an attention module would produce a full 3D attention map, which can be computationally expensive
      • Results indicate that this factorized method achieves a similar effect with far fewer parameters


  1. Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.


34 of 62

2.3 Attention Mechanism


  1. Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.

[Figure: Convolutional Block Attention Module. Input Feature → Channel Attention Module → Spatial Attention Module → Refined Feature]

35 of 62

2.3 Attention Mechanism

    • Channel attention module
      • Two pooling methods i.e. average and max pooling, are used at the same time to compute channel-wise attention for the given input feature map.
      • This results in two Cx1x1 vectors, one produced by max-pooling and the other by average pooling.
      • These are then passed through a simple bottleneck dense layer and then combined with a summation.
      • The sigmoid function is applied at the end to obtain a Cx1x1 vector which shows the importance of each channel in the original feature map.
      • This channel attention vector is applied to the input feature map in a pointwise manner, creating a new feature map F' with the same shape as the original input feature map F
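In equation form, the channel attention described above (Woo et al., 2018) is:

```latex
M_c(F) = \sigma\big(\mathrm{MLP}(\mathrm{AvgPool}(F)) + \mathrm{MLP}(\mathrm{MaxPool}(F))\big),
\qquad F' = M_c(F) \otimes F
```

where σ is the sigmoid function and ⊗ denotes pointwise (broadcast) multiplication.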


  1. Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.

[Figure: channel attention module. AvgPool and MaxPool descriptors of the input feature F are passed through a shared MLP and summed to produce the channel attention map Mc]

36 of 62

2.3 Attention Mechanism

    • Spatial attention module
      • After channel dimension, the next step is to process features in width and height dimensions
      • From the channel attention module, we obtain a CxHxW map. In the spatial attention module, average and max pooling are applied along the channel axis, which results in two 1xHxW features.
      • A 7x7 convolution is applied after concatenating the two features
      • In the end, the sigmoid function is applied to get a 1xHxW shaped feature, which is called a spatial attention map.
      • This spatial attention map is applied to F' in a pointwise manner, resulting in a CxHxW feature map, which is the final output of CBAM.
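The corresponding equation for the spatial attention described above (Woo et al., 2018) is:

```latex
M_s(F') = \sigma\big(f^{7 \times 7}\big([\mathrm{AvgPool}(F');\ \mathrm{MaxPool}(F')]\big)\big),
\qquad F'' = M_s(F') \otimes F'
```

where f^{7×7} is the 7x7 convolution and [ ; ] denotes channel-wise concatenation.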


  1. Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.

[Figure: spatial attention module. The channel-refined feature F' is pooled along the channel axis ([MaxPool, AvgPool]), passed through a conv layer, and squashed by a sigmoid to produce the spatial attention map Ms]

37 of 62

2.4 Contrastive Learning

    • A Simple Framework for Contrastive Learning of Visual Representations (SimCLR)
    • The basic intuition behind contrastive learning is to teach a machine how to distinguish between similar and dissimilar things. (maximize similarity)
    • Self-supervised learning
      • Self-supervised learning empowers us to exploit a variety of labels that come with the data for free. 
      • No human supervision. Data itself provides supervision


  1. Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020.
  2. https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html


38 of 62

2.4 Contrastive Learning

    • The idea behind the SimCLR framework is quite simple. Given an input image, random transformations are applied to obtain two augmented versions of the image, xi and xj
    • Representations are then obtained for these augmented images by passing them through an encoder network. These encoded vectors are denoted hi and hj
    • A non-linear fully connected projection head is then applied to obtain the representations z
    • The objective is to maximize the similarity between the two representations zi and zj


  1. Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020.
  2. https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html

An illustration of SimCLR by Google AI Blog: [Figure: two augmented views xi and xj of an input image are encoded into representations hi and hj, then projected to zi and zj]

39 of 62

2.4 Contrastive Learning

    • We use ResNet-18 with CBAM as the encoder network; the original paper used ResNet-50
    • Cosine similarity, sim(u, v) = u·v / (||u|| ||v||), is calculated between the representations zi and zj

    • The similarity of augmented patches belonging to the same image/class will be higher than the similarity between images from different classes.
    • The augmented pairs in the batch are taken one by one, and the probability of the two images being similar is calculated by applying the softmax function.
    • The loss is calculated by taking the negative log of this probability

    • Loss is computed for the same pair a second time, by interchanging the positions of the images in the pair. Then we take the average.
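The steps above amount to the NT-Xent loss. A minimal pure-Python sketch (an illustrative reimplementation, not the thesis code; `tau` is the temperature):

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def nt_xent(z, tau=0.5):
    """z: list of 2N projections, where z[2k] and z[2k+1] are a positive pair.
    Returns the NT-Xent loss averaged over all 2N anchors (both orderings
    of each pair, matching the description above)."""
    n2 = len(z)
    sim = [[cosine(z[i], z[j]) for j in range(n2)] for i in range(n2)]
    total = 0.0
    for i in range(n2):
        j = i + 1 if i % 2 == 0 else i - 1  # index of i's positive partner
        denom = sum(math.exp(sim[i][k] / tau) for k in range(n2) if k != i)
        total += -math.log(math.exp(sim[i][j] / tau) / denom)
    return total / n2
```

The loss drops as positive pairs become more similar than negatives, which is exactly the behavior the pretraining exploits.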


  1. Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020.
  2. https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html


40 of 62

2.4 Contrastive Learning

    • Motivated by this, we leverage the SimCLR framework to pretrain our model for the AD classification task


  1. Chen, Ting, et al. "A simple framework for contrastive learning of visual representations." International conference on machine learning. PMLR, 2020.
  2. https://ai.googleblog.com/2020/04/advancing-self-supervised-and-semi.html


41 of 62

2.5 Classification Network

    • ResNet-18 with the CBAM module

  1. Wang, Fei, et al. "Residual attention network for image classification." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.
  2. Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.

[Figure: ResNet-18 architecture (input → stacked 3x3 conv blocks → AvgPool → FC → Softmax) with CBAM inserted in each residual block: the feature F from the previous conv blocks passes through channel attention Mc and spatial attention Ms to give the refined feature F'' fed to the next conv blocks (ResBlock + CBAM)]

42 of 62

2.5 Classification Network

    • Cross-Entropy Loss is used:

      L = -Σ_{i=1..n} t_i log(p_i)

      • Where n is the number of classes, t_i is the truth label and p_i is the softmax probability for the i-th class.

    • For binary classification, this reduces to the binary cross-entropy:

      L = -(t log(p) + (1 - t) log(1 - p))
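As a quick numeric check, both losses in pure Python (illustrative, not the training code):

```python
import math

def cross_entropy(t, p):
    """Multi-class CE: t is a one-hot truth vector, p the softmax probabilities."""
    return -sum(ti * math.log(pi) for ti, pi in zip(t, p) if ti > 0)

def binary_cross_entropy(t, p):
    """Binary CE for a scalar label t in {0, 1} and predicted probability p."""
    return -(t * math.log(p) + (1 - t) * math.log(1 - p))
```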


43 of 62

3. Experiments and Results


44 of 62

3.1 Dataset

  • The Alzheimer’s Disease Neuroimaging Initiative (ADNI) dataset
    • T2-weighted MRI scans (GE Medical Systems)
    • DICOM/NIFTI to PNG (256x256)
    • 246 subjects (82 AD, 82 MCI, 82 CN) for training
    • All images were manually reviewed to select slices with good brain regions
    • Validation data consisted of 115 slices (51 AD, 64 CN)
    • Test data consisted of 10 subjects for each category

Demographic representation of training data


45 of 62

3.2 Data Augmentation

  • Traditional data augmentation
    • Used rotation, shear, zoom, shift augmentation
    • To remove class imbalance and increase dataset size for GAN training


46 of 62

3.2 Data Augmentation

  • GAN-based image generation
    • Comparison of RaLSGAN and PGGAN

    • The Fréchet Inception Distance (FID), which captures the similarity of generated images to real ones, is used for evaluation

    • PGGAN-WGP achieved an FID score of 35-40
      • Better than RaLSGAN
      • More realistic images

  1. Jolicoeur-Martineau, Alexia. "The relativistic discriminator: a key element missing from standard GAN." arXiv preprint arXiv:1807.00734 (2018).
  2. Karras, Tero, et al. "Progressive Growing of GANs for Improved Quality, Stability, and Variation." International Conference on Learning Representations. 2018.


47 of 62

3.2 Data Augmentation

  • Final data set

Training data set after augmentation


48 of 62

3.3 Comparison of various architectures

    • Comparison on AD vs. CN classification task
      • After PGGAN generation, a detailed analysis was made on the AD vs. CN classification accuracy using different architectures.
      • These architectures include a custom CNN architecture, ResNet [1], the Inception model [2], the residual attention network [3], the CBAM architecture [4], and a multi-scale CNN. We also evaluate models pretrained on ImageNet data.
      • Testing results are reported based on the majority voting decision for the patients.
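The patient-level majority voting mentioned above can be sketched as follows (a hypothetical helper, not the thesis code):

```python
from collections import Counter

def patient_decision(slice_predictions):
    """Aggregate per-slice class predictions (e.g. 'AD', 'CN', 'MCI')
    into a single patient-level decision by majority vote."""
    votes = Counter(slice_predictions)
    label, _count = votes.most_common(1)[0]
    return label
```

For example, `patient_decision(["AD", "AD", "CN"])` yields `"AD"`.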

  1. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  2. Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
  3. Wang, Fei, et al. "Residual attention network for image classification." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

  4. Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.

[Figure: custom CNN architecture. Input → 5x5 conv, 32 → 5x5 conv, 64 → 3x3 conv, 128 → 3x3 conv, 256 → 3x3 conv, 512 → AvgPool → FC1-FC5 → Softmax]

49 of 62

3.3 Comparison of various architectures

    • Results

    • Observations
      • All these architectures showed high accuracy during training; however, the validation accuracy was much lower.
      • The best cases achieved 2-3 misclassifications per category.
      • One reason may be that these large ImageNet models are over-parametrized for very small data sets.
      • Transfer learning from real-world data was also not very effective

Comparison of various architectures for AD vs. CN classification task

  1. He, Kaiming, et al. "Deep residual learning for image recognition." Proceedings of the IEEE conference on computer vision and pattern recognition. 2016.
  2. Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.
  3. Wang, Fei, et al. "Residual attention network for image classification." Proceedings of the IEEE conference on computer vision and pattern recognition. 2017.

  4. Woo, Sanghyun, et al. "Cbam: Convolutional block attention module." Proceedings of the European conference on computer vision (ECCV). 2018.

49 / 59

50 of 62

3.4 Proposed Architecture

  • Training Details
    • System and environment
      • NVIDIA RTX 2080 Ti GPU.
      • PyTorch

    • Model
      • ResNet-18 with CBAM

    • SimCLR Pretraining
      • 1000 epochs
      • Batch size of 128
      • SGD optimizer
      • Learning rate of 0.05
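The SimCLR pretraining stage optimizes the NT-Xent contrastive loss over pairs of augmented views; a minimal NumPy sketch of that loss (batch size, embedding dimension, and temperature below are illustrative, not the training settings above):

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy) loss from SimCLR.

    z1, z2: (N, D) embeddings of the two augmented views of the same N images."""
    z = np.concatenate([z1, z2], axis=0)              # (2N, D) stacked views
    z = z / np.linalg.norm(z, axis=1, keepdims=True)  # L2-normalize rows
    sim = z @ z.T / temperature                       # scaled cosine similarities
    np.fill_diagonal(sim, -np.inf)                    # exclude self-similarity
    n = z1.shape[0]
    # the positive for sample i is its other augmented view (offset by N)
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    logsumexp = np.log(np.exp(sim).sum(axis=1))       # denominator over all others
    return (-(sim[np.arange(2 * n), pos] - logsumexp)).mean()

# toy check: perfectly aligned view pairs give a lower loss than mismatched ones
z = np.eye(2)
print(nt_xent_loss(z, z) < nt_xent_loss(z, z[::-1].copy()))  # True
```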

50 / 59

51 of 62

3.4 Proposed Architecture

  • Training Details
    • Classification task
      • 100 epochs
      • SGD optimizer
      • Learning rate 0.001
      • Batch size 128
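A minimal sketch of this fine-tuning stage with the stated optimizer and learning rate; the stand-in model, dummy batch, momentum setting, and shortened loop are placeholders, not the actual ResNet-18 + CBAM pipeline.

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(64, 2))  # stand-in for ResNet-18+CBAM
optimizer = torch.optim.SGD(model.parameters(), lr=0.001)  # stated lr; no momentum assumed
criterion = nn.CrossEntropyLoss()                      # applies softmax internally

x = torch.randn(128, 64)                # one dummy batch (batch size 128 as stated)
y = torch.randint(0, 2, (128,))
for epoch in range(2):                  # 100 epochs in the actual setup
    optimizer.zero_grad()
    loss = criterion(model(x), y)
    loss.backward()
    optimizer.step()
```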

51 / 59

52 of 62

3.5 Results

  • Two category classification
    • AD vs. CN classification
      • We trained a plain ResNet-18 as a baseline, and compared results with and without the CBAM and SimCLR components to isolate their effect. We also compared against a custom-designed CNN.
      • When trained from scratch, the model overfits and training is unstable.
      • With SimCLR pretraining, classification training is more stable and accuracy rises to 75%.
      • Combining with CBAM refines performance further, since the attention modules learn to focus on the most informative features.
      • Only 1 misclassification per category.

AD vs. CN classification results of proposed framework

52 / 59

53 of 62

3.5 Results

  • Three category classification
    • AD vs. CN vs. MCI classification
      • Accuracy up to 65%
      • The MCI class is very hard to distinguish from the AD and CN classes.

AD vs. CN vs. MCI classification results of proposed framework

53 / 59

54 of 62

3.6 Comparison with existing methods

Study                   | Total Subjects     | Performance | Approach         | Data Leakage
Aderghal et al., 2017   | 815 (T1 MRI)       | ACC=0.84    | ROI-based        | None
Cheng and Liu, 2017     | 193 (T1 MRI + PET) | ACC=0.85    | 3D subject-level | None
Korolev et al., 2017    | 231 (T1 MRI)       | ACC=0.80    | 3D subject-level | None
Valliani and Soni, 2017 | 417 (T1 MRI)       | ACC=0.81    | 2D slice-level   | None
Senanayake et al., 2018 | 515 (T1 MRI)       | ACC=0.76    | 3D subject-level | None
Li et al., 2017         | 427 (T1 MRI)       | ACC=0.88    | 3D patch-level   | None
Basaia et al., 2019     | 646 (T1 MRI)       | ACC=0.99    | 3D subject-level | Unclear (b)
Hon and Khan, 2017      | 416 (T1 MRI)       | ACC=0.96    | 2D slice-level   | Unclear (a, c)
Hosseini et al., 2018   | 140 (T1 MRI)       | ACC=0.99    | 3D subject-level | Unclear (a)
Lin et al., 2018        | 417 (T1 MRI)       | ACC=0.89    | ROI-based        | Unclear (b)
Taqi et al., 2018       | 400 (T2 MRI)       | ACC=1.00    | 2D slice-level   | Unclear (b)
Vu et al., 2017         | 317 (T1 MRI)       | ACC=0.85    | 3D subject-level | Unclear (a)
Vu et al., 2018         | 400 (T1 MRI)       | ACC=0.86    | 3D subject-level | Clear (a, c)
Wang et al., 2019       | 400 (T1 MRI)       | ACC=0.99    | 3D subject-level | Clear (b)
Basheera et al., 2019   | 242 (T2 MRI)       | ACC=1.00    | 2D slice-level   | Clear (b)
Proposed                | 164 (T2 MRI)       | ACC=0.83    | 2D slice-level   | None

  1. Wen, Junhao, et al. "Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation." Medical image analysis 63 (2020): 101694.

    • Types of data leakage
      • a: wrong data split; b: absence of independent test set; c: late split

Comparison of AD vs. CN classification task

54 / 59

55 of 62

3.6 Comparison with existing methods

    • Observations
      • Most of these approaches suffer from data leakage.
      • They used larger datasets (more subjects).
      • T1-weighted scans yield more slices than T2-weighted scans.
      • Most of these approaches lack a proper evaluation with separate validation and test sets.
      • Their reported numbers therefore amount to training accuracy.

    • Even with small data and very few slices, our approach achieves good results

    • In our approach, we avoided all forms of data leakage and provided an unbiased evaluation of our model, which is essential for clinical applications.

  1. Wen, Junhao, et al. "Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation." Medical image analysis 63 (2020): 101694.

55 / 59

56 of 62

3.6 Comparison with existing methods

Study                   | Total Subjects | Performance | Approach         | Data Leakage
Valliani and Soni, 2017 | 660 (T1 MRI)   | ACC=0.57    | 2D slice-level   | None
Hosseini et al., 2018   | 210 (T1 MRI)   | ACC=0.97    | 3D subject-level | Unclear (a)
Farooq et al., 2017     | 355 (T1 MRI)   | ACC=0.99    | 2D slice-level   | Clear (a, c)
Vu et al., 2018         | 615 (T1 MRI)   | ACC=0.80    | 3D subject-level | Clear (a, c)
Wang et al., 2019       | 624 (T1 MRI)   | ACC=0.97    | 3D subject-level | Clear (b)
Basheera et al., 2019   | 349 (T2 MRI)   | ACC=0.86    | 2D slice-level   | Clear (b)
Proposed                | 246 (T2 MRI)   | ACC=0.65    | 2D slice-level   | None

  1. Wen, Junhao, et al. "Convolutional neural networks for classification of Alzheimer's disease: Overview and reproducible evaluation." Medical image analysis 63 (2020): 101694.

    • Types of data leakage
      • a: wrong data split; b: absence of independent test set; c: late split

    • Only one existing approach (Valliani and Soni, 2017) is free of data leakage

Comparison of AD vs. CN vs. MCI classification task

56 / 59

57 of 62

4. Conclusion and Future Work

57

58 of 62

4. Conclusion

    • Achieved good results despite the small dataset
      • Two-category (AD vs. CN): 83% accuracy
      • Three-category (AD vs. CN vs. MCI): 65% accuracy

58 / 59

59 of 62

4. Conclusion

    • Contributions
      • Proposed a novel framework for the AD classification task 
        • ResNet-18 + CBAM
        • Self-supervised pretraining

      • Dealt with medical data scarcity
        • PGGAN based image synthesis
        • Novel diseases

      • Proper unbiased evaluation
        • Separate test and validation data

      • Application of SimCLR to AD classification

      • An analysis of various approaches and architectures

59 / 59

60 of 62

4. Future Work

    • Improving results
      • Three category classification

    • Test model with large data sets and other modalities
      • PET, cognitive scores, APOE genotype, CSF biomarkers, demographic data etc.

    • Working with further preprocessing techniques
      • FMRIB Software Library
      • FreeSurfer
      • Skull stripping, segmentation, bias correction etc.

    • Alzheimer's disease prediction

    • Diagnosis of other types of dementia
      • Vascular dementia, Frontotemporal dementia, Dementia with Lewy bodies

60 / 59

61 of 62

4. Future Work

    • FMRIB Software Library (FSL)
      • Skull stripping, bias correction etc.

61 / 59

62 of 62

Thank you!

62