2 of 124

Outline

How Is Glaucoma Classified by Doctors
How do we learn to see?
Extract Features from Image

Hand Engineered Feature Extraction
Challenging Condition in Feature Extraction
Feature Extraction : Learned from Data – CNN
CNN Feature Maps Visualization

Linear Functions to Deep Networks

3 of 124

Outline (2)

Composition with Non-Linear Functions

Composing Sigmoid with Linear Function
Training Logistic Regression

Why CNNs

Sparse Connectivity of CNN
Receptive Field
Parameter Sharing
Translation Equivariance and Invariance

MaxPooling & Invariance to small shifts

4 of 124

Outline (3)

Strided Convolutions to Down Sample Image

Comparative Analysis : Maxpool vs Strided Convolution

Layer Patterns

Common ConvNet Architectures

Successful Examples and Network Engineering

LeNet-5
AlexNet
Inception
ResNet
EfficientNet

5 of 124

Glaucoma Classification

Does the image below show glaucoma or not?

6 of 124

Glaucoma Classification

Does the image below show glaucoma or not?

7 of 124

Fundus Images

An image containing a detailed view of the retina, optic disc, macula, and blood vessels at the back of the eye

Fundus Images

8 of 124

Fundus Camera

An intricate microscope attached to a flash-enabled camera

9 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

10 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

11 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

12 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

13 of 124

Fundus Images : Data Understanding

Glaucoma

Not Glaucoma

14 of 124

AIROGS Dataset

https://www.kaggle.com/datasets/deathtrooper/glaucoma-dataset-eyepacs-airogs-light-v2

15 of 124

How do doctors classify glaucoma?

Features :

Optic Cup-to-Disc Ratio (CDR)
Thickness of the nerve fiber layer
Blood Vessel Patterns

16 of 124

Glaucoma Classification Training

healthy optic disc

glaucomatous optic disc

https://biomedical-engineering-online.biomedcentral.com/articles/10.1186/s12938-019-0649-y/figures/1

17 of 124

Glaucoma Classification Training

Using the features from the previous slide, classify glaucoma.

18 of 124

Glaucoma Classification Training

Using the features from the previous slide, classify glaucoma.

not glaucoma

19 of 124

Glaucoma Classification Training

Using the features from the previous slide, classify glaucoma.

20 of 124

Glaucoma Classification Training

Using the features from the previous slide, classify glaucoma.

glaucomatous

21 of 124

Identifying Glaucoma

Not Glaucoma

Glaucoma

Doctor

Fundus Images

22 of 124

How did we learn to classify Glaucoma Images?

We recognized features and patterns in the images
Better Glaucoma Feature Learning = Better Glaucoma Classification

23 of 124

Problem with current approaches

Diagnosis is time-consuming
Specialists are scarce in remote and underserved areas

24 of 124

Solution : Use of AI models

Handles large datasets efficiently, flagging potential cases
Automates routine analysis & saves time for specialists

25 of 124

How can we teach these to machines?

How can machines be made to learn features and recognize patterns?

How we see vs. how machines see

26 of 124

How do we learn to see

What do we see in an image?

What do machines see in an image?

27 of 124

What is an image to a machine?

Computers just see numbers (in fact, a lot of numbers)
2D image: each discrete position (x, y) holds an intensity value
3D image: each discrete position (x, y, z) holds an intensity value
Grayscale images are represented as an N × M matrix
Color images (R, G, B) are represented as a 3 × N × M array
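A minimal sketch of the two representations using plain nested lists. The pixel values and the helper name shape are made up for illustration; real code would normally use an array library.

```python
# A tiny 2 x 3 grayscale image: N rows, M columns, one intensity (0-255) per pixel.
gray = [
    [0, 128, 255],
    [64, 192, 32],
]

# A colour image of the same size stored channel-first as 3 x N x M (R, G, B planes).
red = [[255, 0, 0], [10, 20, 30]]
green = [[0, 255, 0], [40, 50, 60]]
blue = [[0, 0, 255], [70, 80, 90]]
color = [red, green, blue]


def shape(a):
    """Return the dimensions of a rectangular nested list."""
    dims = []
    while isinstance(a, list):
        dims.append(len(a))
        a = a[0]
    return tuple(dims)


print(shape(gray))   # (2, 3)
print(shape(color))  # (3, 2, 3)
# The pixel at row 0, column 1 is pure green.
print(tuple(channel[0][1] for channel in color))  # (0, 255, 0)
```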

28 of 124

How do doctors classify glaucoma?

Features :

Optic Cup-to-Disc Ratio (CDR)
Thickness of the nerve fiber layer
Blood Vessel Patterns

29 of 124

Extract Feature from image

Feature Extraction :
Hand Engineered Feature Extraction

Easy to understand
If designed well, delivers good performance with small computation overhead
Requires a lot of expertise and domain knowledge

Learned from data

Quite popular these days (Why?)
Requires a lot of data and computational power
Difficult (and sometimes impossible) to understand (a black box)

30 of 124

Hand Engineered Feature Extraction:

Edge Detection

https://github.com/one2clouds/NAAMII-BPEye-Training/blob/main/Week%202.5%20image_kernels.ipynb

31 of 124

How ??

Hand Engineered Feature Extraction:

Edge Detection

https://github.com/one2clouds/NAAMII-BPEye-Training/blob/main/Week%202.5%20image_kernels.ipynb

32 of 124

Edge Detection from Hand Engineered Features

33 of 124

Edges represent boundaries, i.e. abrupt changes in intensity between adjacent pixels.
Edge = sharp variation in the image
Edge ⇒ large first derivative

Edge Detection (Details Heavy)

34 of 124

Sobel Edge Detection (Details Heavy)
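A minimal pure-Python sketch of Sobel edge detection, assuming the standard 3 x 3 Sobel kernels and no padding. The toy image and the function names are illustrative and are not taken from the linked notebook.

```python
import math

SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]   # responds to left-right intensity changes
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]   # responds to top-bottom intensity changes


def correlate2d(image, kernel):
    """Slide the kernel over the image ('valid' region only, no padding)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [[sum(kernel[i][j] * image[r + i][c + j]
                 for i in range(kh) for j in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def sobel_magnitude(image):
    """Gradient magnitude sqrt(gx^2 + gy^2) at every valid position."""
    gx = correlate2d(image, SOBEL_X)
    gy = correlate2d(image, SOBEL_Y)
    return [[math.hypot(x, y) for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Toy image: dark left half, bright right half, so there is one vertical edge.
image = [[0, 0, 0, 10, 10, 10] for _ in range(5)]
edges = sobel_magnitude(image)
print(edges[0])  # [0.0, 40.0, 40.0, 0.0]
```

The response is large only where the window straddles the intensity jump, which is the "large first derivative" idea in kernel form.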

35 of 124

Hand Engineered Feature Extraction: Threshold

https://github.com/one2clouds/NAAMII-BPEye-Training/blob/main/Week%202.5%20image_kernels.ipynb
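A minimal sketch of global thresholding, assuming an 8-bit grayscale image and a hand-picked cut-off of 127. Both the cut-off and the sample pixels are illustrative.

```python
def threshold(image, t):
    """Map every pixel above t to 255 (white) and every other pixel to 0 (black)."""
    return [[255 if pixel > t else 0 for pixel in row] for row in image]


image = [
    [12, 200, 90],
    [130, 45, 255],
]
binary = threshold(image, 127)
print(binary)  # [[0, 255, 0], [255, 0, 255]]
```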

36 of 124

Hand Engineered Feature Extraction: Harris Corner

https://github.com/one2clouds/NAAMII-BPEye-Training/blob/main/Week%202.5%20image_kernels.ipynb

37 of 124

Hand Engineered Feature Extraction: Harris Corner

38 of 124

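Compute the horizontal and vertical edges (image gradients Ix, Iy)
Convolve the gradient products with a Gaussian smoothing kernel
Compute the corner response: R = det(M) - k · trace(M)^2, where M is the smoothed 2 × 2 matrix of gradient products
Threshold to get strong corners: R > threshold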
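The four steps above can be sketched in pure Python. This is a simplified illustration, not the notebook's implementation. It assumes central-difference gradients, a 3 x 3 box window in place of the Gaussian, and the commonly used k = 0.04.

```python
def harris_response(image, k=0.04):
    h, w = len(image), len(image[0])
    # Step 1: horizontal and vertical gradients (central differences, zero at the border).
    ix = [[0.0] * w for _ in range(h)]
    iy = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            ix[r][c] = (image[r][c + 1] - image[r][c - 1]) / 2
            iy[r][c] = (image[r + 1][c] - image[r - 1][c]) / 2
    # Gradient products that make up the structure matrix M.
    ixx = [[a * a for a in row] for row in ix]
    iyy = [[a * a for a in row] for row in iy]
    ixy = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(ix, iy)]

    def window_sum(grid, r, c):
        # Step 2 (simplified): a 3 x 3 box sum stands in for Gaussian smoothing.
        return sum(grid[rr][cc] for rr in range(r - 1, r + 2) for cc in range(c - 1, c + 2))

    response = [[0.0] * w for _ in range(h)]
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            sxx, syy, sxy = window_sum(ixx, r, c), window_sum(iyy, r, c), window_sum(ixy, r, c)
            # Step 3: R = det(M) - k * trace(M)^2
            response[r][c] = (sxx * syy - sxy ** 2) - k * (sxx + syy) ** 2
    return response


# Toy image: a bright square occupying the bottom-right quadrant of a 10 x 10 image.
image = [[1.0 if r >= 5 and c >= 5 else 0.0 for c in range(10)] for r in range(10)]
R = harris_response(image)
# Step 4: threshold. Corners give R > 0, straight edges R < 0, flat regions R = 0.
print(R[5][5] > 0, R[7][4] < 0, R[2][2] == 0)  # True True True
```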


Hand Engineered Feature Extraction: Harris Corner

39 of 124

Feature Extraction (Details)

40 of 124

Compute new image as

Linear Filter

41 of 124

Computation Example

42 of 124

Computation Example

43 of 124

Linear Filtering

44 of 124

Linear Filtering

45 of 124

Linear Filtering

46 of 124

Linear Filtering

47 of 124

Correlation and Convolution

Correlation (previous slides) :

Convolution :
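The formulas on this slide did not survive extraction. Under the standard definitions, correlation computes G[i, j] = sum over (u, v) of H[u, v] · F[i + u, j + v], and convolution computes G[i, j] = sum over (u, v) of H[u, v] · F[i - u, j - v], i.e. the same operation with the kernel flipped in both directions. A minimal sketch with illustrative names:

```python
def correlate2d(image, kernel):
    """Cross-correlation over the 'valid' region: slide the kernel as-is."""
    kh, kw = len(kernel), len(kernel[0])
    return [[sum(kernel[u][v] * image[r + u][c + v]
                 for u in range(kh) for v in range(kw))
             for c in range(len(image[0]) - kw + 1)]
            for r in range(len(image) - kh + 1)]


def convolve2d(image, kernel):
    """Convolution = correlation with the kernel flipped in both directions."""
    flipped = [row[::-1] for row in kernel[::-1]]
    return correlate2d(image, flipped)


kernel = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
impulse = [[0] * 5 for _ in range(5)]
impulse[2][2] = 1  # a single bright pixel in the centre

print(convolve2d(impulse, kernel))   # [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
print(correlate2d(impulse, kernel))  # [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
```

Convolving with an impulse reproduces the kernel, while correlating reproduces it flipped. For symmetric kernels the two operations give the same result.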

48 of 124

Features depend on imaging conditions, and "invariant" features are hard to design
Relatively strong domain knowledge is required

Source: http://cs231n.github.io/classification

Challenging conditions in Feature Extraction

49 of 124

Fully Connected Networks
Convolutional Neural Networks
(Variational) Auto-encoders
Generative Adversarial Networks
Spatial Transformers

Feature Extraction : Learned from data

50 of 124

Feature Extraction : Learned from data - CNN

51 of 124

X 2

X 3

CNN Feature Maps Visualization (VGG-16)

https://github.com/one2clouds/NAAMII-BPEye-Training/blob/main/Visualizing%20Feature%20Maps.ipynb

52 of 124

Deeper Layer Feature Map of Glaucoma

Deeper Layer Feature Map of Normal

Early layers detect simple features: edges, gradients.
Deeper layers detect abstract features.

Feature Maps

53 of 124

Glaucoma

Not-Glaucoma

Feature Maps

Fully Connected (4096)

Fully Connected (1000)

Glaucomatous

Non-Glaucomatous

Classification from feature maps (VGG-16)

54 of 124

Linear Functions to Deep Network

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

55 of 124

Linear Regression

Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)

56 of 124

Linear Regression

57 of 124

Linear Regression

58 of 124

Linear Regression

Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)

Which m & c to use?

The pair with the smallest sum of squared distances between the line and the data points
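A minimal sketch of choosing m and c by least squares, assuming a single input variable and the standard closed-form solution. The data points are made up and are not the dataset cited on the slide.

```python
def fit_line(xs, ys):
    """Return (m, c) minimising the sum of squared errors of y = m * x + c."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    c = mean_y - m * mean_x
    return m, c


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
m, c = fit_line(xs, ys)
print(m, c)  # 2.0 1.0
```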

59 of 124

Least Square Linear Regression

Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)

60 of 124

Least Square Linear Regression

Data from G. Witt. Journal of Statistics Education, Volume 21, Number 1 (2013)

61 of 124

Least Square Linear Regression

X1 & X2 – independent variables

62 of 124

Gradient Descent

63 of 124

Gradient Descent

64 of 124

Gradient Descent

65 of 124

Gradient Descent

66 of 124

Gradient Descent

67 of 124

Gradient Descent

68 of 124

Gradient Descent Example

69 of 124

Gradient Descent Example

70 of 124

Gradient Descent Example

71 of 124

Gradient Descent Example
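A minimal sketch of gradient descent on the least-squares line fit, assuming a mean-squared-error loss. The data, learning rate, and step count are illustrative choices.

```python
def gradient_descent(xs, ys, lr=0.05, steps=2000):
    """Fit y = m * x + c by repeatedly stepping against the gradient of the MSE."""
    m, c = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [(m * x + c) - y for x, y in zip(xs, ys)]
        grad_m = 2.0 / n * sum(r * x for r, x in zip(residuals, xs))  # dL/dm
        grad_c = 2.0 / n * sum(residuals)                             # dL/dc
        m -= lr * grad_m
        c -= lr * grad_c
    return m, c


xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]   # generated from y = 2x + 1
m, c = gradient_descent(xs, ys)
print(round(m, 4), round(c, 4))  # 2.0 1.0
```

A learning rate that is too large makes the updates diverge, and one that is too small makes convergence slow. 0.05 happens to work for this toy data.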

72 of 124

Linear Functions to Deep Network

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

73 of 124

Linear Functions to Deep Network

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

74 of 124

Linear Functions to Deep Network

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

75 of 124

Linear Functions to Deep Network

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

76 of 124

Composition with Non-Linear Functions

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

77 of 124

Sigmoid Function

https://commons.wikimedia.org/w/index.php?curid=4310325

78 of 124

Composing Sigmoid with Linear Function

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

Activation Functions

Adding non-linearity and composing :

Can model complex relationships between input & output
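A minimal sketch of composing a sigmoid with a linear function, assuming a single input x. The weight w and bias b are hypothetical values chosen for illustration.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(x, w, b):
    """Linear function w * x + b followed by the sigmoid non-linearity."""
    return sigmoid(w * x + b)


print(sigmoid(0.0))                 # 0.5
print(predict(3.0, w=2.0, b=-1.0))  # close to 1: the linear score is +5
print(predict(-2.0, w=2.0, b=-1.0)) # close to 0: the linear score is -5
```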

79 of 124

Training a Logistic Regression

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

Model

Loss Function

80 of 124

Training a Logistic Regression

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

Loss function over m training samples
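The model and loss formulas on these slides were images. The sketch below assumes the usual choices: the model p = sigmoid(w · x + b) and the binary cross-entropy loss averaged over the m samples, trained by gradient descent. The data and hyperparameters are made up.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mean_loss(w, b, xs, ys):
    """Binary cross-entropy averaged over the m training samples."""
    total = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(w * x + b)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(xs)


def train(xs, ys, lr=0.5, steps=200):
    """Gradient descent; for this loss dL/dw = mean((p - y) * x) and dL/db = mean(p - y)."""
    w, b = 0.0, 0.0
    m = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= lr * sum(e * x for e, x in zip(errors, xs)) / m
        b -= lr * sum(errors) / m
    return w, b


xs = [-2.0, -1.0, 1.0, 2.0]
ys = [0, 0, 1, 1]            # label 1 for positive inputs
w, b = train(xs, ys)
print(mean_loss(0.0, 0.0, xs, ys), "->", mean_loss(w, b, xs, ys))  # the loss goes down
print([round(sigmoid(w * x + b)) for x in xs])  # [0, 0, 1, 1]
```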

81 of 124

Training a Logistic Regression

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

82 of 124

Training a Logistic Regression

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

83 of 124

Training a Logistic Regression

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

84 of 124

Training a Logistic Regression

https://medium.com/@melodious/understanding-deep-neural-networks-from-first-principles-logistic-regression-bd2f01c9e263

85 of 124

CNN in a typical Scenario

86 of 124

Sparse Interactions

(sparse -> fewer parameters -> less computation)

Parameter Sharing
Equivariance to Translation

Why CNNs

87 of 124

Sparse Connectivity of CNN

Sparse connectivity:

One input (e.g. X3) affects three output nodes

Dense (full) connectivity:

One input (e.g. X3) affects every output node

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

88 of 124

Sparse Connectivity of CNN

Sparse connectivity:

One input (e.g. X3) affects three output nodes
One output unit is affected by 3 input units
2+3+3+3+2 = 13 variables (parameters)

Dense (full) connectivity:

One input (e.g. X3) affects every output node
One output unit is affected by every input node
5x5 = 25 variables (parameters)
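The two counts above can be reproduced with a short sketch, assuming 5 inputs, 5 outputs, and a kernel of width 3 with no padding beyond the borders. The function name is illustrative.

```python
def count_connections(n_inputs, n_outputs, kernel_width=None):
    """Count input-output connections; kernel_width=None means fully connected."""
    if kernel_width is None:
        return n_inputs * n_outputs
    half = kernel_width // 2
    # Output o only sees the inputs within `half` positions of it.
    return sum(1 for o in range(n_outputs) for i in range(n_inputs) if abs(i - o) <= half)


print(count_connections(5, 5, kernel_width=3))  # 13  (2 + 3 + 3 + 3 + 2)
print(count_connections(5, 5))                  # 25  (5 x 5)
```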

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

89 of 124

Receptive Field of CNN

Increasing Layers increases receptive field of CNN
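A minimal sketch of how the receptive field grows with depth, assuming a plain stack of convolution layers described by (kernel size, stride) pairs. The helper name is illustrative.

```python
def receptive_field(layers):
    """layers: list of (kernel_size, stride) pairs, input side first."""
    rf, jump = 1, 1
    for kernel_size, stride in layers:
        rf += (kernel_size - 1) * jump   # each layer widens the view by (k - 1) input steps
        jump *= stride                   # strides spread later layers further apart
    return rf


for depth in (1, 2, 3):
    print(depth, receptive_field([(3, 1)] * depth))  # 1 3, then 2 5, then 3 7
```

With width-3 kernels and stride 1, each extra layer lets a unit see two more input positions, so deeper units are indirectly connected to most of the input.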

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

90 of 124

CNNs Parameter Sharing

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

91 of 124

CNNs Parameter Sharing

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

Fully Connected (Bottom), Convolutional (Top)

Because of parameter sharing, a single parameter is used across all locations.

92 of 124

CNNs Parameter Sharing

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

Fully Connected (Bottom), Convolutional (Top)

Because of parameter sharing, a single parameter is used across all locations.

No parameter sharing: each parameter is used for exactly one input-output connection.

93 of 124

CNNs Parameter Sharing

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

Convolution with kernel size of 2
Same 2 weights (a,b) repeated over input.
Runtime same, but storage massively reduced.

94 of 124

CNNs Parameter Sharing

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

Convolution with kernel size of 2
Same 2 weights (a,b) repeated over input.
Runtime same, but storage massively reduced.

No. of parameters originally = 25

After sparse connectivity = 13

After parameter sharing = 2

95 of 124

CNNs Parameter Sharing

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

Convolution with kernel size of 2
Same 2 weights (a,b) repeated over input.
Runtime same, but storage massively reduced.

No. of parameters originally = 25

After sparse connectivity = 13

After parameter sharing = 2

Parameters reduced by 12.5x (25 -> 2)
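A minimal sketch of parameter sharing with a kernel of size 2, assuming a 1D input of length 5. The weight values a and b and the input are made up.

```python
def conv1d(x, kernel):
    """Apply the same kernel weights at every position of the input."""
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, x[i:i + k]))
            for i in range(len(x) - k + 1)]


a, b = 1.0, -1.0                     # the only two stored weights
x = [1.0, 2.0, 4.0, 7.0, 11.0]
print(conv1d(x, [a, b]))             # [-1.0, -2.0, -3.0, -4.0]

dense_parameters = 5 * 5             # fully connected: one weight per input-output pair
shared_parameters = len([a, b])      # convolution: the kernel, reused at every location
print(dense_parameters / shared_parameters)  # 12.5
```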

96 of 124

CNN parameter sharing ---> Equivariance to Translation

CNNs Equivariance to Translation

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

97 of 124

Translation Equivariance

https://www.baeldung.com/cs/translation-invariance-equivariance
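A minimal sketch of translation equivariance, assuming a 1D signal padded with zeros so that nothing falls off the edge when it is shifted. The signal, kernel, and helper names are illustrative.

```python
def conv1d(x, kernel):
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, x[i:i + k]))
            for i in range(len(x) - k + 1)]


def shift_right(x, steps=1):
    """Translate the signal by `steps` positions, filling with zeros on the left."""
    return [0] * steps + x[:-steps]


signal = [0, 0, 1, 2, 0, 0, 0]
kernel = [1, -1]

shift_then_conv = conv1d(shift_right(signal), kernel)
conv_then_shift = shift_right(conv1d(signal, kernel))
print(shift_then_conv)  # [0, 0, -1, -1, 2, 0]
print(conv_then_shift)  # [0, 0, -1, -1, 2, 0]
```

Shifting the input shifts the feature map by the same amount. The output moves with the input, which is equivariance, rather than staying unchanged, which would be invariance.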

98 of 124

Translation Invariance

https://www.baeldung.com/cs/translation-invariance-equivariance

99 of 124

Pooling: replaces the output of the network at a certain location with a summary statistic of the nearby outputs [1]
E.g. max pooling, average pooling, weighted average pooling, etc.

Pooling

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

100 of 124

Max Pooling and Invariance to Small Shifts

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

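“Detector Stage” => non-linear activation, e.g. ReLU
After the input shifts by one pixel, every value in the bottom row (detector outputs) changes, but only half of the values in the top row (pooled outputs) change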
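A minimal sketch of this effect, assuming max pooling of width 3 with stride 1 and windows truncated at the borders. The detector values are illustrative.

```python
def max_pool_same(values, width=3):
    """Max over a window centred on each position (truncated at the borders)."""
    half = width // 2
    return [max(values[max(0, i - half): i + half + 1]) for i in range(len(values))]


def count_changed(before, after):
    return sum(1 for u, v in zip(before, after) if u != v)


detector = [0.1, 1.0, 0.2, 0.1]   # detector-stage outputs
shifted = [0.3, 0.1, 1.0, 0.2]    # the same outputs after the input moves one pixel right

print(max_pool_same(detector))    # [1.0, 1.0, 1.0, 0.2]
print(max_pool_same(shifted))     # [0.3, 1.0, 1.0, 1.0]
print(count_changed(detector, shifted))                                # 4 of 4 detector values
print(count_changed(max_pool_same(detector), max_pool_same(shifted)))  # 2 of 4 pooled values
```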

101 of 124

Max Pooling and Invariance to Small Shifts

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

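“Detector Stage” => non-linear activation, e.g. ReLU
After the input shifts by one pixel, every value in the bottom row (detector outputs) changes, but only half of the values in the top row (pooled outputs) change

What about other invariances, such as rotation?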

102 of 124

Max Pooling and Invariance to Small Shifts

Ian Goodfellow, Yoshua Bengio and Aaron Courville. Deep Learning.

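“Detector Stage” => non-linear activation, e.g. ReLU
After the input shifts by one pixel, every value in the bottom row (detector outputs) changes, but only half of the values in the top row (pooled outputs) change

What about other invariances, such as rotation?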

Pool over multiple feature maps!

But what are feature maps ?

103 of 124

CNN in a typical Scenario

104 of 124

CNN in a typical scenario

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/deploying-convolutional-neural-network-on-cortex-m-with-cmsis-nn

105 of 124

Need of Data Augmentation

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/deploying-convolutional-neural-network-on-cortex-m-with-cmsis-nn

Original glaucoma image

Rotated

Glaucomatous

Not Glaucomatous

Same

106 of 124

Max Pooling Over Multiple Feature Maps

https://community.arm.com/arm-community-blogs/b/architectures-and-processors-blog/posts/deploying-convolutional-neural-network-on-cortex-m-with-cmsis-nn

Three learned filters + max pooling

= Invariant to rotation

107 of 124

Pooling to Down Sample Image

http://cs231n.github.io/convolutional-networks/

108 of 124

By keeping only max or average values, pooling can overlook small but crucial variations in the image.
The uniform downsampling of max pooling does not account for the varying importance of different image regions.

Pooling Drawbacks :

https://arxiv.org/abs/1412.6806

109 of 124

Strided Convolutions to Down Sample Image

Learnable downsampling
The network learns the pooling operation itself, which increases the model's expressive power
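A minimal 1D sketch comparing the two ways of halving the resolution, assuming window size 2 and stride 2. The kernel weights would be learned during training and are fixed here purely for illustration.

```python
def max_pool1d(x, size=2, stride=2):
    """Fixed rule: keep the largest value in each window."""
    return [max(x[i:i + size]) for i in range(0, len(x) - size + 1, stride)]


def strided_conv1d(x, kernel, stride=2):
    """Weighted sum per window; the weights are trainable parameters."""
    k = len(kernel)
    return [sum(w * v for w, v in zip(kernel, x[i:i + k]))
            for i in range(0, len(x) - k + 1, stride)]


signal = [1, 3, 2, 8, 5, 4]
print(max_pool1d(signal))                  # [3, 8, 5]
print(strided_conv1d(signal, [0.5, 0.5]))  # [2.0, 5.0, 4.5]
```

Both outputs have half the input length. The difference is that the strided convolution's summary is determined by weights the network can adapt.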

https://arxiv.org/abs/1412.6806

110 of 124

Comparative Analysis: Maxpool vs Strided Conv

https://github.com/DuaneNielsen/maxpoolvsconv?tab=readme-ov-file

111 of 124

Layer Patterns

http://cs231n.github.io/convolutional-networks/

* Repetition; ? Optional;

112 of 124

Common ConvNet Architectures

http://cs231n.github.io/convolutional-networks/

113 of 124

Successful Examples and Network Engineering

http://cs231n.github.io/convolutional-networks/

AlexNet, VGG, GoogLeNet with Inception module, ResNet

114 of 124

Successful Examples and Network Engineering

Y. LeCun, L. Bottou, Y. Bengio and P. Haffner. Gradient Based Learning Applied to Document Recognition. Proc. of the IEEE, 1998

The LeNet-5 Architecture, 1998 [1]

115 of 124

Successful Examples and Network Engineering

Krizhevsky, Alex, Ilya Sutskever, and Geoffrey E. Hinton. "Imagenet classification with deep convolutional neural networks." Advances in neural information processing systems. 2012.

AlexNet Architecture, 2012 [1]

ReLU non-linearity
Dropout
Data Augmentation

- Image Translation & horizontal Reflections

- Altering the intensities of the RGB

116 of 124

Successful Examples and Network Engineering

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

5x5 conv layer - Global features captured,

Max-pool - low-level features captured

3x3 conv layer - distributed features captured

1x1 conv layer - depth reduction

Inception module with dimension reductions
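A minimal sketch of why the 1x1 "depth reduction" saves parameters, counting weights only (no biases). The channel counts 256, 32, and 64 are hypothetical and are not taken from the GoogLeNet configuration.

```python
def conv_weights(kernel_size, in_channels, out_channels):
    """Number of weights in a kernel_size x kernel_size convolution layer."""
    return kernel_size * kernel_size * in_channels * out_channels


# 5x5 convolution applied directly to a 256-channel input, producing 64 channels.
direct = conv_weights(5, 256, 64)

# The same output, but a 1x1 convolution first reduces the depth from 256 to 32.
reduced = conv_weights(1, 256, 32) + conv_weights(5, 32, 64)

print(direct)   # 409600
print(reduced)  # 59392
```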

117 of 124

Successful Examples and Network Engineering

Szegedy, Christian, et al. "Going deeper with convolutions." Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. 2015.

Inception module with dimension reductions

5x5 conv layer - Global features captured,

3x3 conv layer - distributed features captured

1x1 conv layer - depth reduction

Max-pool - low-level features captured

118 of 124

Successful Examples and Network Engineering

He, Kaiming, et al. "Identity mappings in deep residual networks." European Conference on Computer Vision. Springer, Cham, 2016.

Residual Module of ResNet, 2016 [1]

Identity Mappings, skip connection
Large number of Batch Normalization layers
Very deep networks without vanishing gradient problem
E.g. 152 layers compared to 6 layers
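A minimal sketch of a skip connection, assuming a toy residual branch F made of two small linear layers with a ReLU in between. Batch normalization is omitted and all weights are illustrative.

```python
def relu(v):
    return [max(0.0, a) for a in v]


def linear(v, weights):
    """Matrix-vector product: one output per weight row."""
    return [sum(w * a for w, a in zip(row, v)) for row in weights]


def residual_block(x, w1, w2):
    """Output = x + F(x): the input is added back through the skip connection."""
    fx = linear(relu(linear(x, w1)), w2)
    return [a + b for a, b in zip(x, fx)]


x = [1.0, -2.0]
zero = [[0.0, 0.0], [0.0, 0.0]]
print(residual_block(x, zero, zero))  # [1.0, -2.0]: with F = 0 the block is the identity

w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]
print(residual_block(x, w1, w2))      # [1.5, -2.0]
```

Because the input always passes straight through the addition, the gradient also has a direct path back to earlier layers, which is why very deep stacks remain trainable.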

119 of 124

Successful Examples and Network Engineering

Tan, Mingxing, and Quoc V. Le. "EfficientNet: Rethinking model scaling for convolutional neural networks." International Conference on Machine Learning. PMLR, 2019.

Different scaling methods

vs. EfficientNet Compound scaling
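A minimal sketch of compound scaling, assuming the base coefficients reported in the EfficientNet paper (alpha = 1.2 for depth, beta = 1.1 for width, gamma = 1.15 for resolution). A single coefficient phi scales all three dimensions together. The function name is illustrative.

```python
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15   # depth, width, resolution bases (assumed from the paper)


def compound_scale(phi):
    """Multipliers applied to the baseline network for a given compound coefficient."""
    return {
        "depth": ALPHA ** phi,
        "width": BETA ** phi,
        "resolution": GAMMA ** phi,
    }


for phi in (0, 1, 2):
    print(phi, compound_scale(phi))

# The bases are chosen so that alpha * beta^2 * gamma^2 is roughly 2,
# i.e. each increment of phi roughly doubles the compute cost.
print(ALPHA * BETA ** 2 * GAMMA ** 2)
```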

120 of 124

What is glaucoma classification? Classifying glaucoma from a given fundus image ----> 8, 9
What is a fundus image? ----> 10
How is a fundus image captured? With a fundus camera ----> 11
How is glaucoma classified from a fundus image? From its features ----> 12
Glaucoma classification training with people ----> 13 : 17
How did people learn to classify? Feature learning & pattern recognition ----> 18
Can we teach that to machines? ----> 19
What we see vs. what machines see ----> 20, 21

Concept covered : Fundus Images, Data, Features, RGB Representation, Binary Classification, Label

Narrative : Glaucoma Classification

121 of 124

What does feature extraction from an image mean? Multiplying the image with a filter ----> 25
Why not use hand-engineered features? They are hard to design; let the machine learn the features by itself ----> 42, 43
How are fully connected layers (ANN) used with a CNN? ----> 45
Why are CNNs used? Fewer parameters = lower computational cost ----> 61

Concepts covered : Feature extraction, kernels, grayscale/RGB images, thresholding, elementwise multiplication, Harris corner, Convolution, Fully Connected Layers, Activation, Loss Functions, Forward & Backward Propagation, Metrics

Narrative : Glaucoma Classification (2)

122 of 124

Is a CNN invariant to translation? How can it be made invariant to rotation? ----> 72, 73, 80
Why is pooling used with CNNs? Uniform downsampling of the image ----> 81
Why not use strided convolutions? They give learnable downsampling ----> 83
Common ConvNet architectures ----> 85:93
Challenging conditions in feature extraction ----> 94

Concepts covered : Down Sampling, Pooling, Receptive field, Parameter Sharing, Sparse Connection, Data Augmentations, Activation Functions, Dimensionality Reduction, Dropout , Skip Connections, vanishing & exploding gradients

Narrative : Glaucoma Classification (3)

123 of 124

Pooling - downsamples the image, gives invariance to small shifts

Pooling with data augmentation and transformation - invariance to rotation, scaling, size

Why is strided convolution better than pooling?

What does a simple CNN layer pattern look like? (73, 74)

Engineered architectures and examples of the best CNN architectures (75)

LeNet-5: introduced the use of convolution and subsampling (pooling), with tanh activations (76)

AlexNet: added data augmentation, dropout, and ReLU instead of tanh, and established the use of GPUs for fast training (77)

Inception: used 1x1 convolutions to reduce dimensionality; parallel 1x1, 3x3, and 5x5 filters help extract relevant features without the need for deeper or wider networks, but introduce additional hyperparameters such as the number and sizes of filters.

ResNet: used skip connections to let gradients flow directly to earlier layers (addresses the vanishing gradient problem)

EfficientNet: instead of scaling arbitrarily, introduced compound scaling to scale up width, depth, and resolution together.

Technical Detailed Narrative:Glaucoma Classification

124 of 124

Extract feature from image – Feature Extraction (20)

Hand-engineered feature extraction – extracts edges and corners (21, 22)

Elementwise multiplication of filter and images creates feature map. (23)

Edge detection – sharp change in image pixels (24, 25)

Image Thresholding, Harris Corner (26, 27)

Challenging conditions in feature extractions (28)

Feature Learning and Classification in CNN (30)

CNN feature maps visualization – How does a CNN learn? (31, 32, 33)

How do linear functions in a CNN work? (34 - 38)

Why is it necessary to add non-linear activations to linear functions? (37, 38)

Composing linear models with a non-linear activation: sigmoid (39, 40, 41)

Logistic Regression – the powerful backbone of neural networks is born (42 - 47)

Why is a CNN so powerful? (48, 49)

Sparse Connectivity of CNN (50, 51), Receptive Field (52), Parameter Sharing (53-58),

Translation Equivariance & Invariance (59, 61, 61)

Technical Detailed Narrative : Glaucoma Classification (2)