1 of 124

Glaucoma classification from Fundus Images

BPEye PBL Training for

Shirshak Acharya

Research Assistant, NAAMII

2024-25

2 of 124

Outline

  • How Doctors Classify Glaucoma
  • How do we learn to see?
  • Extract Features from Image
    • Hand Engineered Feature Extraction
    • Challenging Conditions in Feature Extraction
    • Feature Extraction : Learned from Data – CNN
    • CNN Feature Maps Visualization
  • Linear Functions to Deep Networks

3 of 124

Outline (2)

  • Composition with Non-Linear Functions
    • Composing Sigmoid with Linear Function
    • Training Logistic Regression
  • Why CNNs
    • Sparse Connectivity of CNN
    • Receptive Field
    • Parameter Sharing
    • Translation Equivariance and Invariance
  • MaxPooling & Invariance to small shifts

4 of 124

Outline (3)

  • Strided Convolutions to Downsample Images
    • Comparative Analysis : Maxpool vs Strided Convolution
  • Layer Patterns
    • Common ConvNet Architectures
  • Successful Examples and Network Engineering
    • LeNet-5
    • AlexNet
    • Inception
    • ResNet
    • EfficientNet

5 of 124

Glaucoma Classification

Classify whether each image below shows glaucoma or not.

6 of 124

Glaucoma Classification

Classify whether each image below shows glaucoma or not.

7 of 124

Fundus Images

  • An image giving a detailed view of the retina, optic disc, macula, and blood vessels at the back of the eye

Fundus Images

8 of 124

Fundus Camera

  • An intricate microscope attached to a flash-enabled camera

9 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

10 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

11 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

12 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

13 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

14 of 124

AIROGS Dataset

15 of 124

How Do Doctors Classify Glaucoma?

  • Features :
    • Optic Cup-to-Disc Ratio (CDR)
    • Thickness of the nerve fiber layer
    • Blood Vessel Patterns
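The cup-to-disc ratio can be made concrete with a small sketch. It assumes we already have binary segmentation masks of the optic cup and the optic disc (obtaining them is a separate problem not covered here) and computes one common variant, the vertical CDR: the ratio of the two structures' vertical extents. The helper `vertical_cdr` and the toy masks are hypothetical.

```python
import numpy as np

def vertical_cdr(cup_mask: np.ndarray, disc_mask: np.ndarray) -> float:
    """Vertical cup-to-disc ratio from two binary masks of the same shape (hypothetical helper)."""
    def vertical_extent(mask: np.ndarray) -> int:
        rows = np.flatnonzero(mask.any(axis=1))  # indices of rows that contain the structure
        return 0 if rows.size == 0 else int(rows[-1] - rows[0] + 1)

    disc_height = vertical_extent(disc_mask)
    if disc_height == 0:
        raise ValueError("disc mask is empty")
    return vertical_extent(cup_mask) / disc_height

# Toy 20x20 masks: disc spans rows 2..17 (16 rows), cup spans rows 6..13 (8 rows).
disc = np.zeros((20, 20), dtype=bool)
disc[2:18, 4:16] = True
cup = np.zeros((20, 20), dtype=bool)
cup[6:14, 7:13] = True
print(vertical_cdr(cup, disc))  # 0.5
```

A larger CDR (a bigger cup relative to the disc) is one of the signs doctors look for; no particular clinical cut-off is assumed here.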

16 of 124

Glaucoma Classification Training

healthy optic disc

glaucomatous optic disc

17 of 124

Glaucoma Classification Training

Using the features from the previous slide, classify each image as glaucomatous or not.

18 of 124

Glaucoma Classification Training

Using the features from the previous slide, classify each image as glaucomatous or not.

not glaucoma

19 of 124

Glaucoma Classification Training

Using the features from the previous slide, classify each image as glaucomatous or not.

20 of 124

Glaucoma Classification Training

Using the features from the previous slide, classify each image as glaucomatous or not.

glaucomatous

21 of 124

Identifying Glaucoma

Not Glaucoma

Glaucoma

Doctor

Fundus Images

22 of 124

How did we learn to classify Glaucoma Images?

  • We recognized features and patterns in the image
  • Better Glaucoma Feature Learning = Better Glaucoma Classification

23 of 124

Problems with Current Approaches

  • Diagnosis is time-consuming
  • Specialists are scarce in remote and underserved areas

24 of 124

Solution : Use of AI models

  • Handles large datasets efficiently, flagging potential cases
  • Automates routine analysis & saves time for specialists

25 of 124

How can we teach this to machines?

  • How can machines perform feature learning and pattern recognition?

  • How we see vs. how machines see

26 of 124

How do we learn to see

What do we see in an image?

What do machines see in an image?

27 of 124

What Is an Image to a Machine?

  • Computers just see numbers (in fact, a lot of numbers)
  • 2D image: each discrete position (x, y) holds an intensity value
  • 3D image: each discrete position (x, y, z) holds an intensity value
  • Grayscale images are represented as an N × M matrix
  • Color (R, G, B) images are represented as a 3 × N × M array
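The "images are just numbers" view maps directly onto NumPy arrays. This minimal sketch uses a made-up 2 × 3 image; the channel values are arbitrary and only illustrate the shapes.

```python
import numpy as np

# A tiny 2x3 grayscale image: one intensity value (0-255) per (x, y) position.
gray = np.array([[0, 128, 255],
                 [64, 32, 16]], dtype=np.uint8)
print(gray.shape)  # (2, 3) -> N x M

# A color image stacks three such matrices, one per R, G, B channel (channels-first layout).
rgb = np.stack([gray, gray // 2, 255 - gray])
print(rgb.shape)   # (3, 2, 3) -> 3 x N x M

# The R, G, B values of the pixel at row 0, column 2.
print(rgb[:, 0, 2])
```

Libraries such as Pillow and OpenCV usually return color images as N × M × 3 (channels last, and OpenCV in BGR order); `np.transpose(img, (2, 0, 1))` converts that to the 3 × N × M layout above.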

28 of 124

How Do Doctors Classify Glaucoma?

  • Features :
    • Optic Cup-to-Disc Ratio (CDR)
    • Thickness of the nerve fiber layer
    • Blood Vessel Patterns

29 of 124

Extract Feature from image

  • Feature Extraction:
  • Hand Engineered Feature Extraction
    • Easy to understand
    • If designed well, delivers good performance with small computational overhead
    • Requires a lot of expertise and domain knowledge
  • Learned from Data
    • Quite popular these days (Why?)
    • Requires a lot of data and computational might
    • Difficult (and sometimes impossible) to understand (a black box)

30 of 124

Hand Engineered Feature Extraction:

Edge Detection

31 of 124

How?

Hand Engineered Feature Extraction:

Edge Detection

32 of 124

Edge Detection from Hand Engineered Features

33 of 124

  • Edges represent boundaries, i.e. abrupt changes in intensity between adjacent pixels.
  • Edge = sharp variation in the image
  • Edge ⇒ large first derivative

Edge Detection (Details Heavy)
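The "edge ⇒ large first derivative" idea can be checked on a single image row. This sketch uses a made-up row and a forward difference as the discrete first derivative; the intensity values are hypothetical.

```python
import numpy as np

# One row of a grayscale image: a dark region followed by a bright region.
row = np.array([10, 10, 10, 10, 200, 200, 200, 200], dtype=float)

# Discrete first derivative (forward difference): I(x + 1) - I(x).
derivative = np.diff(row)
print(derivative)  # zero everywhere except 190 at index 3

# The edge sits where the magnitude of the derivative is largest.
edge_position = int(np.argmax(np.abs(derivative)))
print(edge_position)  # 3 -> the edge lies between pixels 3 and 4
```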

34 of 124

Sobel Edge Detection (Details Heavy)
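A minimal Sobel sketch on a toy image with a single vertical edge: the two 3 × 3 kernels estimate horizontal and vertical gradients, and their combined magnitude highlights the edge. The helper `correlate2d` is written here for illustration; it computes cross-correlation, which only flips the sign of each gradient compared with convolution and therefore leaves the magnitude unchanged.

```python
import numpy as np

def correlate2d(image, kernel):
    """'Valid' 2D cross-correlation: slide the kernel and take a weighted sum at each position."""
    kh, kw = kernel.shape
    out_h, out_w = image.shape[0] - kh + 1, image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# Sobel kernels: horizontal gradient (responds to vertical edges) and its transpose.
sobel_x = np.array([[-1, 0, 1],
                    [-2, 0, 2],
                    [-1, 0, 1]], dtype=float)
sobel_y = sobel_x.T

# Toy image: dark left half, bright right half -> one vertical edge.
image = np.zeros((5, 6))
image[:, 3:] = 100.0

gx = correlate2d(image, sobel_x)
gy = correlate2d(image, sobel_y)
magnitude = np.hypot(gx, gy)  # sqrt(gx^2 + gy^2)
print(magnitude)  # each row: 0, 400, 400, 0 -> strong response around the edge
```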

35 of 124

Hand Engineered Feature Extraction: Thresholding
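Thresholding turns a grayscale image into a binary mask, for example to isolate its brightest regions. In this sketch both the patch and the threshold of 128 are hypothetical; in practice the threshold is tuned or chosen automatically (e.g. with Otsu's method).

```python
import numpy as np

# Hypothetical 3x4 grayscale patch (intensities 0-255).
patch = np.array([[ 12,  40, 200, 220],
                  [ 30, 180, 210,  25],
                  [ 90, 130,  60,  15]], dtype=np.uint8)

threshold = 128  # assumed value
binary = (patch > threshold).astype(np.uint8)  # 1 where brighter than the threshold, else 0
print(binary)
```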

36 of 124

Hand Engineered Feature Extraction: Harris Corner

37 of 124

Hand Engineered Feature Extraction: Harris Corner

38 of 124

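  • Compute the horizontal and vertical image gradients Ix and Iy
  • Smooth the gradient products Ix², Iy² and Ix·Iy with a Gaussian kernel; together they form the second-moment matrix M at each pixel
  • Compute the corner response: R = det(M) − k · trace(M)², with k typically around 0.04 to 0.06
  • Threshold to get strong corners: R > threshold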

Hand Engineered Feature Extraction: Harris Corner
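The four Harris steps can be sketched in plain NumPy. This is an illustrative implementation, not an optimized one: the Sobel kernels for the gradients, the 5 × 5 Gaussian with sigma = 1, k = 0.05 and the relative threshold of 0.1 · max(R) are all assumed choices.

```python
import numpy as np

def correlate2d_same(image, kernel):
    """2D cross-correlation with zero padding so the output has the input's shape."""
    kh, kw = kernel.shape
    padded = np.pad(image, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    out = np.zeros_like(image, dtype=float)
    for i in range(image.shape[0]):
        for j in range(image.shape[1]):
            out[i, j] = np.sum(padded[i:i + kh, j:j + kw] * kernel)
    return out

def gaussian_kernel(size=5, sigma=1.0):
    ax = np.arange(size) - size // 2
    g = np.exp(-(ax[:, None] ** 2 + ax[None, :] ** 2) / (2 * sigma ** 2))
    return g / g.sum()

def harris_response(image, k=0.05):
    sobel_x = np.array([[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]], dtype=float)
    ix = correlate2d_same(image, sobel_x)    # step 1: horizontal gradient
    iy = correlate2d_same(image, sobel_x.T)  # step 1: vertical gradient
    g = gaussian_kernel()
    sxx = correlate2d_same(ix * ix, g)       # step 2: Gaussian-smoothed gradient products
    syy = correlate2d_same(iy * iy, g)
    sxy = correlate2d_same(ix * iy, g)
    det = sxx * syy - sxy ** 2
    trace = sxx + syy
    return det - k * trace ** 2              # step 3: corner response R

# Toy image: a bright square on a dark background has four corners.
image = np.zeros((20, 20))
image[5:15, 5:15] = 1.0
R = harris_response(image)
corners = R > 0.1 * R.max()                  # step 4: keep strong responses
```

R is positive near the square's corners, negative along its edges (one gradient direction only) and zero in flat regions.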

39 of 124

Feature Extraction (Details)

40 of 124

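  • Compute the new image G by sliding a (2k+1) × (2k+1) kernel H over the input image F and taking a weighted sum at each position: G[i, j] = Σ_{u=−k..k} Σ_{v=−k..k} H[u, v] · F[i + u, j + v]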

Linear Filter
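A linear filter is just a weighted sum over a sliding window. The sketch below applies a 3 × 3 box (averaging) filter to a made-up 4 × 4 image and keeps only positions where the kernel fits entirely, so output index (0, 0) corresponds to the window centred on F[1, 1].

```python
import numpy as np

def linear_filter(F, H):
    """G[i, j] = sum over u, v of H[u, v] * F[i + u, j + v] for a (2k+1)x(2k+1) kernel, valid positions only."""
    k = H.shape[0] // 2
    G = np.zeros((F.shape[0] - 2 * k, F.shape[1] - 2 * k))
    for i in range(G.shape[0]):
        for j in range(G.shape[1]):
            window = F[i:i + 2 * k + 1, j:j + 2 * k + 1]  # neighbourhood centred on F[i + k, j + k]
            G[i, j] = np.sum(H * window)
    return G

F = np.array([[10, 20, 30, 40],
              [50, 60, 70, 80],
              [90, 100, 110, 120],
              [130, 140, 150, 160]], dtype=float)
H = np.ones((3, 3)) / 9.0  # 3x3 box (averaging) filter
print(linear_filter(F, H))  # approximately [[60, 70], [100, 110]]
```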

41 of 124

Computation Example

42 of 124

Computation Example

43 of 124

Linear Filtering

44 of 124

Linear Filtering

45 of 124

Linear Filtering

46 of 124

Linear Filtering

47 of 124

Correlation and Convolution

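Correlation (previous slides): G[i, j] = Σ_u Σ_v H[u, v] · F[i + u, j + v]

Convolution (kernel flipped): G[i, j] = Σ_u Σ_v H[u, v] · F[i − u, j − v]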
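Convolution is correlation with the kernel flipped in both directions. Filtering a single-pixel impulse makes the difference visible: correlation stamps a flipped copy of the kernel, convolution stamps the kernel as-is. The kernel values are arbitrary.

```python
import numpy as np

def correlate2d(F, H):
    """Cross-correlation, valid positions: G[i, j] = sum H[u, v] * F[i + u, j + v]."""
    kh, kw = H.shape
    G = np.zeros((F.shape[0] - kh + 1, F.shape[1] - kw + 1))
    for i in range(G.shape[0]):
        for j in range(G.shape[1]):
            G[i, j] = np.sum(H * F[i:i + kh, j:j + kw])
    return G

def convolve2d(F, H):
    """Convolution = correlation with the kernel flipped horizontally and vertically."""
    return correlate2d(F, H[::-1, ::-1])

F = np.zeros((5, 5))
F[2, 2] = 1.0                                      # impulse image
H = np.arange(1, 10, dtype=float).reshape(3, 3)    # asymmetric kernel [[1,2,3],[4,5,6],[7,8,9]]

print(correlate2d(F, H))  # kernel appears flipped: [[9,8,7],[6,5,4],[3,2,1]]
print(convolve2d(F, H))   # kernel appears as-is:   [[1,2,3],[4,5,6],[7,8,9]]
```

For symmetric kernels such as the box filter the two operations coincide. Deep learning libraries generally implement "convolution" layers as cross-correlation (PyTorch's `Conv2d` documentation says so explicitly); since the weights are learned, the flip makes no practical difference.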

48 of 124

Features depend on imaging conditions, and it is hard to design “invariant” features. Relatively strong domain knowledge is required.

Source: http://cs231n.github.io/classification

Challenging conditions in Feature Extraction

49 of 124

  • Fully Connected Networks
  • Convolution Neural Networks
  • (Variational) Auto-encoders
  • Generative Adversarial Networks
  • Spatial Transformers

Feature Extraction : Learned from data

50 of 124

Feature Extraction : Learned from data - CNN

51 of 124


CNN Feature Maps Visualization (VGG-16)

52 of 124

Deeper Layer Feature Map of Glaucoma

Deeper Layer Feature Map of Normal

  • Early layers detect simple features such as edges and gradients.
  • Deeper layers detect more abstract, higher-level features.

Feature Maps

53 of 124

Glaucoma

Not-Glaucoma

Feature Maps

Fully Connected (4096)

Fully Connected (4096)

Fully Connected (1000)

Glaucomatous

Non-glaucomatous

Classification from feature maps (VGG-16)

54 of 124

Linear Functions to Deep Networks

55 of 124

Linear Regression

Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)

56 of 124

Linear Regression

57 of 124

Linear Regression

58 of 124

Linear Regression

Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)

Which m and c should we use?

  • The pair with the least sum of squared distances (residuals) between the line and the data points
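For a single input variable, the least-squares m and c have a closed form: m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)² and c = ȳ − m·x̄. The sketch uses hypothetical toy data, not the dataset shown on the slides.

```python
import numpy as np

# Hypothetical toy data (not the dataset from the slides).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Closed-form least-squares estimates for y = m * x + c.
x_mean, y_mean = x.mean(), y.mean()
m = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
c = y_mean - m * x_mean

residuals = y - (m * x + c)
sse = np.sum(residuals ** 2)  # the quantity least squares minimizes
print(m, c, sse)              # m ≈ 1.99, c ≈ 0.05
```

`np.polyfit(x, y, 1)` returns the same two numbers (slope first), which is a convenient cross-check.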

59 of 124

Least Squares Linear Regression

Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)

60 of 124

Least Squares Linear Regression

Data from G. Witt, Journal of Statistics Education, Volume 21, Number 1 (2013)

61 of 124

Least Squares Linear Regression

X1 & X2 – independent variables
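With two independent variables X1 and X2, least squares fits y ≈ c + m1·X1 + m2·X2. A sketch with NumPy's `np.linalg.lstsq`, using hypothetical noise-free data generated from y = 3 + 2·X1 − X2 so the fit should recover exactly those coefficients:

```python
import numpy as np

# Hypothetical data generated from y = 3 + 2*x1 - 1*x2 (no noise).
x1 = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
x2 = np.array([1.0, 0.0, 2.0, 1.0, 3.0])
y = 3 + 2 * x1 - 1 * x2

# Design matrix: a column of ones for the intercept, then one column per independent variable.
A = np.column_stack([np.ones_like(x1), x1, x2])
coeffs, *_ = np.linalg.lstsq(A, y, rcond=None)  # minimizes ||A @ coeffs - y||^2
print(coeffs)  # approximately [3, 2, -1]
```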

62 of 124

Gradient Descent

63 of 124

Gradient Descent

64 of 124

Gradient Descent

65 of 124

Gradient Descent

66 of 124

Gradient Descent

67 of 124

Gradient Descent

68 of 124

Gradient Descent Example

69 of 124

Gradient Descent Example

70 of 124

Gradient Descent Example

71 of 124

Gradient Descent Example
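Gradient descent reaches the same line iteratively instead of in closed form. This sketch minimizes the mean squared error for y = m·x + c on the same hypothetical toy data used for the closed-form example; the learning rate of 0.02 and the 5000 steps are assumed values chosen so the iteration converges on this data.

```python
import numpy as np

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])  # hypothetical toy data

m, c = 0.0, 0.0        # initial guess
learning_rate = 0.02   # assumed step size
n = len(x)
for step in range(5000):
    error = (m * x + c) - y
    # Gradients of the mean squared error L = (1/n) * sum(error^2)
    grad_m = (2.0 / n) * np.sum(error * x)
    grad_c = (2.0 / n) * np.sum(error)
    # Step against the gradient
    m -= learning_rate * grad_m
    c -= learning_rate * grad_c

print(round(m, 3), round(c, 3))  # approaches the closed-form answer m ≈ 1.99, c ≈ 0.05
```

Too large a learning rate makes the updates overshoot and diverge; too small a rate needs many more steps.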

72 of 124

Linear Functions to Deep Networks

73 of 124

Linear Functions to Deep Networks

74 of 124

Linear Functions to Deep Networks

75 of 124

Linear Functions to Deep Networks
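Stacking linear functions alone does not make a model deeper in any useful sense: two linear (affine) layers compose into a single one, W = W2·W1 and b = W2·b1 + b2. The random weights in this sketch are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)  # "layer 1": R^3 -> R^4
W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)  # "layer 2": R^4 -> R^2
x = rng.normal(size=3)

two_layers = W2 @ (W1 @ x + b1) + b2

# The same mapping as ONE linear layer.
W, b = W2 @ W1, W2 @ b1 + b2
one_layer = W @ x + b
print(np.allclose(two_layers, one_layer))  # True
```

This is why a non-linear function has to be inserted between the linear layers.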

76 of 124

Composition with Non-Linear Functions

77 of 124

Sigmoid Function

78 of 124

Composing Sigmoid with Linear Function

Activation Functions

Adding non-linearity and composing functions:

Lets the model capture complex relationships between input and output
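Composing the sigmoid with a linear function gives the logistic model ŷ = σ(w·x + b), whose output always lies in (0, 1) and can be read as the probability of the positive class. The weights and inputs in this sketch are hypothetical.

```python
import numpy as np

def sigmoid(z):
    """Squashes any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def logistic_model(x, w, b):
    """Linear function followed by a sigmoid: y_hat = sigmoid(w . x + b)."""
    return sigmoid(np.dot(x, w) + b)

w = np.array([1.5, -2.0])  # hypothetical weights
b = 0.5                    # hypothetical bias
x = np.array([[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]])
p = logistic_model(x, w, b)  # linear parts are 0.5, 3.5 and -3.5
print(p)                     # roughly 0.62, 0.97, 0.03
```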

79 of 124

Training a Logistic Regression

Model

Loss Function

80 of 124

Training a Logistic Regression

Loss function over m training samples (binary cross-entropy): J(w, b) = −(1/m) Σ_{i=1..m} [y_i log ŷ_i + (1 − y_i) log(1 − ŷ_i)]

81 of 124

Training a Logistic Regression

82 of 124

Training a Logistic Regression

83 of 124

Training a Logistic Regression

84 of 124

Training a Logistic Regression
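Putting model, loss and gradient descent together gives a complete, if tiny, training loop. The 1-D data, learning rate and number of steps below are hypothetical choices; the gradient expressions are those of the binary cross-entropy loss for a sigmoid output.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical 1-D toy data: label 1 (positive class) for larger feature values.
x = np.array([0.5, 1.0, 1.5, 2.0, 3.0, 3.5, 4.0, 4.5])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)

w, b = 0.0, 0.0
learning_rate = 0.5   # assumed step size
m = len(x)            # number of training samples
for step in range(5000):
    y_hat = sigmoid(w * x + b)       # forward pass: linear function + sigmoid
    # Gradients of J(w, b) = -(1/m) * sum(y log y_hat + (1 - y) log(1 - y_hat))
    grad_w = np.sum((y_hat - y) * x) / m
    grad_b = np.sum(y_hat - y) / m
    w -= learning_rate * grad_w      # gradient descent update
    b -= learning_rate * grad_b

predictions = (sigmoid(w * x + b) >= 0.5).astype(int)
print(predictions)  # expected to match y once training has separated the classes
```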

85 of 124

CNN in a typical Scenario

86 of 124

  • Sparse Interactions

  (sparse -> fewer parameters -> less computation)

  • Parameter Sharing
  • Equivariance to Translation

Why CNNs

87 of 124

Sparse Connectivity of CNN

Sparse connectivity:

  • One input (e.g., X3) affects three output nodes

Full (dense) connectivity:

  • One input (e.g., X3) affects every output node
  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

88 of 124

Sparse Connectivity of CNN

Sparse connectivity:

  • One input (e.g., X3) affects three output nodes
  • One output unit is affected by 3 input units (2 at the borders)
  • 2 + 3 + 3 + 3 + 2 = 13 parameters

Full (dense) connectivity:

  • One input (e.g., X3) affects every output node
  • One output unit is affected by every input node
  • 5 × 5 = 25 parameters
  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
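The parameter counts on this slide can be reproduced by counting connections. The sketch assumes 5 inputs, 5 outputs and an odd kernel width, with each output seeing the inputs within `kernel_width // 2` positions of it; the function names are hypothetical.

```python
def dense_params(n_in: int, n_out: int) -> int:
    """Fully connected: every input connects to every output."""
    return n_in * n_out

def sparse_params(n: int, kernel_width: int) -> int:
    """Locally connected, no sharing: count input-output edges with |i - j| <= kernel_width // 2.
    Assumes an odd kernel width; outputs at the borders see fewer inputs."""
    half = kernel_width // 2
    return sum(1 for i in range(n) for j in range(n) if abs(i - j) <= half)

def shared_params(kernel_width: int) -> int:
    """Convolution: one set of kernel weights reused at every position."""
    return kernel_width

print(dense_params(5, 5))   # 25
print(sparse_params(5, 3))  # 13
print(shared_params(3))     # 3
```

With the width-2 kernel (weights a, b) used on the parameter-sharing slides, `shared_params(2)` gives 2.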

89 of 124

Receptive Field of CNN

Stacking more layers increases the receptive field of a CNN

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
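The growth of the receptive field can be computed layer by layer: each layer adds (kernel_size − 1) times the current spacing between neighbouring units, and strides multiply that spacing. The layer stacks below are hypothetical examples given as (kernel_size, stride) pairs.

```python
def receptive_field(layers):
    """Receptive field (in input pixels) of one output unit after a stack of conv/pool layers.

    `layers` is a list of (kernel_size, stride) pairs, first layer first.
    """
    rf, jump = 1, 1  # jump = distance in input pixels between neighbouring units of the current layer
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump
        jump *= stride
    return rf

print(receptive_field([(3, 1)]))                  # 3
print(receptive_field([(3, 1)] * 2))              # 5
print(receptive_field([(3, 1)] * 3))              # 7
print(receptive_field([(3, 1), (2, 2), (3, 1)]))  # 8
```

Each extra 3-wide, stride-1 layer widens the view by 2 pixels; downsampling layers make every later layer widen it faster.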

90 of 124

CNNs Parameter Sharing

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

91 of 124

CNNs Parameter Sharing

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

Fully Connected (Bottom), Convolutional (Top)

Because of parameter sharing, a single parameter is used at all input locations.

92 of 124

CNNs Parameter Sharing

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

Fully Connected (Bottom), Convolutional (Top)

Because of parameter sharing, a single parameter is used at all input locations.

No parameter sharing: each parameter is used only once, connecting one input to one output.

93 of 124

CNNs Parameter Sharing

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
  • Convolution with a kernel size of 2
  • The same 2 weights (a, b) are repeated over the input.
  • Forward-pass runtime is the same, but storage is greatly reduced.

94 of 124

CNNs Parameter Sharing

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
  • Convolution with a kernel size of 2
  • The same 2 weights (a, b) are repeated over the input.
  • Forward-pass runtime is the same, but storage is greatly reduced.

No. of parameters originally = 25

After sparse connectivity = 13

After parameter sharing = 2

95 of 124

CNNs Parameter Sharing

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
  • Convolution with a kernel size of 2
  • The same 2 weights (a, b) are repeated over the input.
  • Forward-pass runtime is the same, but storage is greatly reduced.

No. of parameters originally = 25

After sparse connectivity = 13

After parameter sharing = 2

Parameters reduced by 12.5× (25 / 2)

96 of 124

  • CNN parameter sharing ---> Equivariance to Translation

CNNs Equivariance to Translation

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

97 of 124

Translation Equivariance
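Translation equivariance means that shifting the input and then filtering gives the same result as filtering and then shifting the output. This 1-D sketch uses wrap-around (circular) indexing so the identity holds exactly; with ordinary zero padding it holds away from the borders. The signal and kernel are arbitrary.

```python
import numpy as np

def circular_correlate(x, kernel):
    """1-D cross-correlation with wrap-around, so shifts are exact and no borders are lost."""
    n, k = len(x), len(kernel)
    return np.array([sum(kernel[u] * x[(i + u) % n] for u in range(k)) for i in range(n)])

x = np.array([0., 0., 1., 3., 1., 0., 0., 0.])  # a small "feature" in a signal
kernel = np.array([-1., 0., 1.])                # simple edge-like filter
shift = 2

conv_then_shift = np.roll(circular_correlate(x, kernel), shift)
shift_then_conv = circular_correlate(np.roll(x, shift), kernel)
print(np.array_equal(conv_then_shift, shift_then_conv))  # True: f(shift(x)) == shift(f(x))
```

The output moves with the input rather than staying fixed, so convolution alone is equivariant, not invariant; pooling is what adds (approximate) invariance.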

98 of 124

Translation Invariance

99 of 124

  • Pooling: replaces the output of the network at a certain location with a summary statistic of nearby outputs [1]
  • E.g., max pooling, average pooling, weighted average pooling, etc.

Pooling

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
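A minimal sketch of max and average pooling with 2 × 2 windows and stride 2 on a made-up 4 × 4 feature map; the window size and stride are common choices, assumed here.

```python
import numpy as np

def pool2d(x, size=2, stride=2, mode="max"):
    """Summarize each size x size window of a 2D feature map by its max or its mean."""
    reduce = {"max": np.max, "avg": np.mean}[mode]
    out_h = (x.shape[0] - size) // stride + 1
    out_w = (x.shape[1] - size) // stride + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            window = x[i * stride:i * stride + size, j * stride:j * stride + size]
            out[i, j] = reduce(window)
    return out

fmap = np.array([[1, 3, 2, 0],
                 [4, 6, 1, 1],
                 [0, 2, 5, 7],
                 [1, 1, 8, 2]], dtype=float)
print(pool2d(fmap, mode="max"))  # [[6, 2], [2, 8]]
print(pool2d(fmap, mode="avg"))  # [[3.5, 1], [1, 5.5]]
```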

100 of 124

Max Pooling and Invariance to Small Shifts

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
  • “Detector stage” => non-linear activation, e.g., ReLU
  • After shifting the input by one pixel, every value in the bottom (detector) row changes, but only half of the values in the top (pooled) row change
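The effect described above can be reproduced with stride-1 max pooling of width 3 on hypothetical detector outputs: shifting the input right by one changes every detector value but only half of the pooled values.

```python
import numpy as np

def max_pool_1d(x, width=3):
    """Stride-1 max pooling; windows are centred on each position and truncated at the borders."""
    half = width // 2
    return np.array([x[max(0, i - half):i + half + 1].max() for i in range(len(x))])

detector = np.array([0.1, 1.0, 0.2, 0.1])  # detector-stage (e.g. ReLU) outputs
shifted = np.array([0.3, 0.1, 1.0, 0.2])   # same pattern shifted right by one; a new value 0.3 enters
pooled, pooled_shifted = max_pool_1d(detector), max_pool_1d(shifted)

print(pooled, pooled_shifted)
print(np.sum(detector != shifted), np.sum(pooled != pooled_shifted))  # 4 changed before pooling, 2 after
```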

101 of 124

Max Pooling and Invariance to Small Shifts

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
  • “Detector stage” => non-linear activation, e.g., ReLU
  • After shifting the input by one pixel, every value in the bottom (detector) row changes, but only half of the values in the top (pooled) row change

What about other invariances, such as rotation?

102 of 124

Max Pooling and Invariance to Small Shifts

  1. Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.
  • “Detector stage” => non-linear activation, e.g., ReLU
  • After shifting the input by one pixel, every value in the bottom (detector) row changes, but only half of the values in the top (pooled) row change

What about other invariances, such as rotation?

Pool over multiple feature maps!

But what are feature maps?

103 of 124

CNN in a typical Scenario

104 of 124

CNN in a typical scenario

105 of 124

Need for Data Augmentation

Original glaucoma image

Rotated

Glaucomatous

Not Glaucomatous

Same
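Data augmentation adds transformed copies of training images with the label unchanged. This sketch assumes, as the rotated example on this slide suggests, that 90-degree rotations and mirror images do not change whether an image is glaucomatous; the helper `augment` and the label encoding (1 = glaucomatous) are hypothetical.

```python
import numpy as np

def augment(image, label):
    """Return label-preserving variants of one image (hypothetical helper)."""
    variants = [np.rot90(image, k) for k in range(4)]  # 0, 90, 180, 270 degrees
    variants += [np.fliplr(v) for v in variants]        # plus horizontal mirror images
    return [(v, label) for v in variants]

image = np.arange(12).reshape(3, 4)  # stand-in for a fundus image
samples = augment(image, label=1)    # 1 = glaucomatous (assumed encoding)
print(len(samples))                  # 8
print({s[1] for s in samples})       # {1}
```

Real pipelines usually also use small arbitrary-angle rotations, which need interpolation (e.g. `scipy.ndimage.rotate` or torchvision's `RandomRotation`).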

106 of 124

Max Pooling Over Multiple Feature Maps

Three learned filters + max pooling

= Invariant to rotation (for the orientations the filters have learned)
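Pooling across several filters, rather than across positions, can give invariance to transformations the filters have learned. In this sketch we simply assume the three "learned" filters are rotated copies of one made-up pattern; the maximum over their responses is then identical for all three input orientations, even though each individual response changes.

```python
import numpy as np

# A small asymmetric pattern (hypothetical).
pattern = np.array([[1., 2., 0.],
                    [0., 3., 0.],
                    [0., 1., 0.]])

# Three "learned" filters: assumed here to be the pattern rotated by 0, 90 and 180 degrees.
filters = [np.rot90(pattern, k) for k in range(3)]

def pooled_response(patch):
    """Each filter gives one feature-map value; max pooling is taken ACROSS the filters."""
    return max(np.sum(f * patch) for f in filters)

for k in range(3):
    rotated_input = np.rot90(pattern, k)
    per_filter = [float(np.sum(f * rotated_input)) for f in filters]
    print(k * 90, per_filter, pooled_response(rotated_input))  # pooled value is 15.0 every time
```

The invariance only covers the orientations represented among the filters; an unseen rotation would not necessarily give the same pooled value.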

107 of 124

Pooling to Downsample Images

108 of 124

  • By keeping only the max or average value, pooling can discard small but crucial variations in the image.
  • Max pooling downsamples uniformly, so it does not account for the varying importance of different image regions.

Pooling Drawbacks :

  • https://arxiv.org/abs/1412.6806

109 of 124

Strided Convolutions to Downsample Images

  • Learnable downsampling
  • The downsampling (pooling) operation is learned, which increases the model's expressiveness

  • https://arxiv.org/abs/1412.6806
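A strided convolution evaluates the kernel only every `stride` pixels, so it downsamples while its weights remain learnable. The sketch uses the standard output-size formula ⌊(n + 2p − k) / s⌋ + 1; the input, kernel and sizes are random or hypothetical.

```python
import numpy as np

def conv_output_size(n, kernel_size, stride=1, padding=0):
    """Spatial output size of a convolution (or pooling) layer along one dimension."""
    return (n + 2 * padding - kernel_size) // stride + 1

def strided_correlate2d(x, kernel, stride=2):
    """Valid 2D cross-correlation evaluated only every `stride` pixels."""
    k = kernel.shape[0]
    out_h = conv_output_size(x.shape[0], k, stride)
    out_w = conv_output_size(x.shape[1], k, stride)
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i * stride:i * stride + k, j * stride:j * stride + k] * kernel)
    return out

x = np.random.default_rng(0).normal(size=(8, 8))
kernel = np.random.default_rng(1).normal(size=(3, 3))  # in a CNN these weights are learned
print(strided_correlate2d(x, kernel, stride=1).shape)  # (6, 6)
print(strided_correlate2d(x, kernel, stride=2).shape)  # (3, 3)
print(conv_output_size(224, 3, stride=2, padding=1))   # 112: halves the resolution
```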

110 of 124

Comparative Analysis: Maxpool vs Strided Conv

111 of 124

Layer Patterns

Legend: * = repetition; ? = optional
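The layer-pattern figure itself is not part of this text. Assuming it shows the common pattern INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC (as in the CS231n notes), this sketch expands it into a concrete layer list; the function `layer_pattern` is hypothetical.

```python
def layer_pattern(n: int, m: int, k: int, pool: bool = True) -> list:
    """Expand INPUT -> [[CONV -> RELU]*N -> POOL?]*M -> [FC -> RELU]*K -> FC into a layer list.

    '*' means repeat; '?' means optional (controlled here by `pool`).
    """
    layers = ["INPUT"]
    for _ in range(m):
        layers += ["CONV", "RELU"] * n
        if pool:
            layers.append("POOL")
    layers += ["FC", "RELU"] * k
    layers.append("FC")
    return layers

print(" -> ".join(layer_pattern(n=2, m=1, k=1)))
# INPUT -> CONV -> RELU -> CONV -> RELU -> POOL -> FC -> RELU -> FC
```

With N = M = K = 0 the pattern reduces to INPUT -> FC, i.e. a linear classifier.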

112 of 124

Common ConvNet Architectures

113 of 124

Successful Examples and Network Engineering

AlexNet, VGG, GoogLeNet (Inception module), ResNet

114 of 124

Successful Examples and Network Engineering

  • Y. LeCun, L. Bottou, Y. Bengio and P. Haffner. Gradient-Based Learning Applied to Document Recognition. Proc. of the IEEE, 1998

The LeNet-5 Architecture, 1998 [1]

115 of 124

Successful Examples and Network Engineering

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

AlexNet Architecture, 2012 [1]

  • ReLU non-linearity
  • Dropout
  • Data Augmentation

    - Image translations & horizontal reflections

    - Altering the intensities of the RGB channels

116 of 124

Successful Examples and Network Engineering

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

5x5 conv layer - larger-scale (more global) features captured

Max-pool - low-level features captured

3x3 conv layer - distributed features captured

1x1 conv layer - depth reduction

Inception module with dimension reductions
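A 1x1 convolution mixes channels independently at every pixel, which is why it can reduce depth cheaply before an expensive 5x5 convolution. All sizes in this sketch (256 input channels reduced to 64, a 5x5 layer producing 128 channels, a 28 × 28 map) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=(256, 28, 28))  # feature map: channels x height x width
w_1x1 = rng.normal(size=(64, 256))  # 1x1 kernel: each of 64 outputs is a weighted sum of 256 channels

# A 1x1 convolution is a matrix multiply applied at every pixel.
y = np.einsum("oc,chw->ohw", w_1x1, x)
print(y.shape)  # (64, 28, 28): depth 256 -> 64, spatial size unchanged

# Weight counts (biases ignored) for a 5x5 convolution producing 128 channels:
direct = 5 * 5 * 256 * 128                      # straight from 256 channels
reduced = 1 * 1 * 256 * 64 + 5 * 5 * 64 * 128   # 1x1 reduction to 64 channels first
print(direct, reduced)                          # 819200 221184
```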

117 of 124

Successful Examples and Network Engineering

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE conference on computer vision and pattern recognition. 2015.

Inception module with dimension reductions

5x5 conv layer - larger-scale (more global) features captured

3x3 conv layer - distributed features captured

1x1 conv layer - depth reduction

Max-pool - low-level features captured

118 of 124

Successful Examples and Network Engineering

  • He, Kaiming, et al. "Identity mappings in deep residual networks." European conference on computer vision. Springer, Cham, 2016.

Residual Module of ResNet, 2016 [1]

  • Identity mappings (skip connections)
  • Large number of Batch Normalization layers
  • Enables very deep networks while mitigating the vanishing gradient problem
  • E.g. 152 layers compared to 6 layers
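A residual block computes y = x + F(x). Because ∂y/∂x = I + ∂F/∂x, gradients always have a direct identity path back to earlier layers, and if F outputs zero the block is exactly the identity. This is a simplified sketch in the pre-activation style of the cited paper (ReLU before each weight layer) with batch normalization omitted; the weights and sizes are hypothetical.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def residual_block(x, W1, W2):
    """y = x + F(x), with F = W2 @ relu(W1 @ relu(x)); batch norm omitted for brevity."""
    f = W2 @ relu(W1 @ relu(x))  # residual branch F(x)
    return x + f                 # skip connection adds the input back unchanged

rng = np.random.default_rng(0)
x = rng.normal(size=8)
W1 = rng.normal(size=(8, 8))
W2 = np.zeros((8, 8))            # residual branch switched off

print(np.allclose(residual_block(x, W1, W2), x))  # True: the block reduces to the identity
```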

119 of 124

Successful Examples and Network Engineering

Tan, Mingxing, and Quoc V. Le. "EfficientNet: Rethinking model scaling for convolutional neural networks." International conference on machine learning. 2019.

Different scaling methods

vs. EfficientNet Compound scaling
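Compound scaling grows depth, width and input resolution together with one coefficient φ: depth × α^φ, width × β^φ, resolution × γ^φ, under the constraint α · β² · γ² ≈ 2 so that FLOPs roughly double per unit of φ. The base values below are the ones reported in the EfficientNet paper; the function name is hypothetical.

```python
# Compound scaling bases reported in the EfficientNet paper.
alpha, beta, gamma = 1.2, 1.1, 1.15  # depth, width, resolution

def compound_scale(phi: float):
    """Multipliers for depth, width and input resolution for compound coefficient phi."""
    return alpha ** phi, beta ** phi, gamma ** phi

# FLOPs grow roughly with depth * width^2 * resolution^2.
print(alpha * beta ** 2 * gamma ** 2)  # about 1.92, close to the target of 2

for phi in range(4):
    d, w, r = compound_scale(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")
```

Scaling only one dimension (e.g. depth alone) tends to saturate; scaling all three in fixed proportion is the idea behind compound scaling.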

120 of 124

  • What is glaucoma classification? Classifying glaucoma from a given fundus image ----> 8, 9
  • What is a fundus image? ----> 10
  • How is a fundus image captured? With a fundus camera ----> 11
  • How is glaucoma classified from a fundus image? From its features ---> 12
  • Glaucoma classification training with people ----> 13 : 17
  • How did people learn to classify? Feature learning & pattern recognition ----> 18
  • Can we teach that to machines? -----> 19
  • What we see vs. what machines see -------> 20, 21

  • Concept covered : Fundus Images, Data, Features, RGB Representation, Binary Classification, Label

Narrative : Glaucoma Classification

121 of 124

  • What does feature extraction from an image mean? Multiplying the image elementwise with a filter and summing, at every position ----> 25
  • Why not use hand-engineered features? They are hard to design for varied imaging conditions, so let the machine learn features by itself ------> 42, 43
  • How are fully connected layers (ANN) used with CNNs? ------> 45
  • Why are CNNs used? Fewer parameters = less computational cost ----> 61

  • Concepts covered : Feature extraction, kernels, grayscale/RGB images, thresholding, elementwise multiplication, Harris corner, Convolution, Fully Connected Layers, Activation, Loss Functions, Forward & Backward Propagation, Metrics

Narrative : Glaucoma Classification (2)

122 of 124

  • Is a CNN invariant to translation? How can we make it invariant to rotation? -----> 72, 73, 80
  • Why is pooling used with CNNs? Uniform downsampling of the image -> 81
  • Why use strided convolutions instead? Learnable downsampling ------> 83
  • Common convnet architectures -------> 85:93
  • Challenging conditions in feature extraction --------> 94

  • Concepts covered : Down Sampling, Pooling, Receptive field, Parameter Sharing, Sparse Connection, Data Augmentations, Activation Functions, Dimensionality Reduction, Dropout, Skip Connections, vanishing & exploding gradients

Narrative : Glaucoma Classification (3)

123 of 124

Pooling - Downsampling Image, Invariance to small shift

Pooling with data augmentation and transformation - Invariance to rotation, scaling, size

Why can strided convolution be better than pooling?

What does a simple CNN layer pattern look like? (73, 74)

Engineered Architectures and Examples of best CNN Architectures (75)

LeNet-5 popularized convolution and subsampling (pooling) layers, using tanh activations (76)

AlexNet: added data augmentation and dropout, used ReLU instead of tanh, and established the use of GPUs for fast training (77)

Inception: uses 1x1 convolutions to reduce dimensionality and applies multiple filter sizes (1x1, 3x3, 5x5) in parallel, which helps extract relevant features without the need for deeper or wider networks; introduces additional hyperparameters such as the number and sizes of filters.

ResNet: uses skip connections that let gradients flow directly to earlier layers, mitigating the vanishing gradient problem.

EfficientNet: instead of scaling dimensions arbitrarily, introduces compound scaling to scale up width, depth, and resolution together.

Technical Detailed Narrative: Glaucoma Classification

124 of 124

Extract feature from image – Feature Extraction (20)

Hand-engineered feature extraction – extracts edges and corners (21, 22)

Elementwise multiplication of a filter with image patches, followed by summation, creates a feature map (23)

Edge detection – sharp changes in pixel intensity (24, 25)

Image Thresholding, Harris Corner (26, 27)

Challenging conditions in feature extraction (28)

Feature Learning and Classification in CNN (30)

CNN feature maps visualization – How does a CNN learn? (31, 32, 33)

How do linear functions in a CNN work? (34 - 38)

Why is it necessary to compose linear functions with non-linear activations? (37, 38)

Composing linear models with a non-linear activation: sigmoid (39, 40, 41)

Logistic Regression – the building block of neural networks (42 - 47)

Why are CNNs so powerful? (48, 49)

Sparse Connectivity of CNN (50, 51), Receptive Field (52), Parameter Sharing (53-58),

Translation Equivariance & Invariance (59, 60, 61)

Technical Detailed Narrative : Glaucoma Classification (2)