1 of 33

EGDCL (Evidence-Guided Dual-Curriculum Learning)

An Adaptive Curriculum Learning Framework for Unbiased Glaucoma Diagnosis

Shirshak Acharya

RA, NAAMII

2 of 33

Method

Reweighted Loss Function

3 of 33

Student (Self Attention CNN network)

4 of 33

Spatial Attention

A CNN transforms the input image into downsampled feature maps
Bilinear sampling is used in later stages of the spatial attention network to generate an attention score for the whole image

In our case, attention is calculated for the feature map, so no bilinear sampling is needed

5 of 33

Spatial Attention

Let Ms(F) be the spatial attention for the image feature map F.

Spatial attention across a C × H × W feature map:

an H-dimensional vector for each of the W columns of the feature map, i.e. an H × W map

6 of 33

Other techniques for Spatial Attention

https://www.digitalocean.com/community/tutorials/attention-mechanisms-in-computer-vision-cbam

Input tensor (C × H × W) is reduced to a 2 × H × W tensor

1st channel = max pooling across channels
2nd channel = average pooling across channels

Then: Convolution + Batch Norm + ReLU (optional)

Passed to a sigmoid layer, which gives the importance of each pixel
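
The steps above can be sketched in NumPy as follows. This is only an illustration under assumptions: the 7 × 7 kernel size, the random weights, and the helper names `conv2d_same` and `spatial_attention` are hypothetical, and batch norm and the optional ReLU are left out for brevity.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, kernel):
    """Naive zero-padded 2D convolution of a (2, H, W) input with a (2, k, k) kernel.

    Like the convolution layers of deep-learning libraries, this is a
    cross-correlation (the kernel is not flipped). Output shape is (H, W).
    """
    k = kernel.shape[-1]
    pad = k // 2
    xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
    _, H, W = x.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            out[i, j] = np.sum(xp[:, i:i + k, j:j + k] * kernel)
    return out

def spatial_attention(F, kernel):
    """F: (C, H, W) feature map -> (H, W) attention scores in (0, 1)."""
    # Pool across the channel axis: max map and average map, stacked to (2, H, W).
    pooled = np.stack([F.max(axis=0), F.mean(axis=0)])
    # Convolution mixes the two pooled maps; sigmoid gives per-pixel importance.
    return sigmoid(conv2d_same(pooled, kernel))

rng = np.random.default_rng(0)
F = rng.normal(size=(8, 6, 6))               # toy feature map, C=8, H=W=6
kernel = 0.1 * rng.normal(size=(2, 7, 7))    # assumed 7x7 kernel, random weights
Ms = spatial_attention(F, kernel)
refined = F * Ms[None, :, :]                 # same spatial weights for every channel
```

Each pixel gets one score that is shared by all channels, which is why the (H, W) map is broadcast over the channel axis in the last line.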

7 of 33

Channel Attention

8 of 33

Squeeze and Excitation Network

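Global average pooling squeezes the H × W × C feature map to 1 × 1 × C
Passed to an MLP whose hidden layer has C/r nodes (r = reduction ratio, a hyperparameter)
MLP output layer has C nodes, the same as the number of input channels
Sigmoid activation gives one weight per channel, used to rescale the feature map
Residual connection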
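
A minimal NumPy sketch of the squeeze-and-excitation steps listed above. The weight matrices are random, the function name `se_block` is hypothetical, biases are omitted, and the residual connection of the surrounding block is not shown.

```python
import numpy as np

def se_block(F, W1, W2):
    """F: (C, H, W); W1: (C // r, C); W2: (C, C // r).

    Returns the rescaled feature map and the per-channel weights.
    """
    z = F.mean(axis=(1, 2))                    # squeeze: global average pool -> (C,)
    h = np.maximum(W1 @ z, 0.0)                # hidden layer with C // r nodes, ReLU
    s = 1.0 / (1.0 + np.exp(-(W2 @ h)))        # excitation: sigmoid weights -> (C,)
    return F * s[:, None, None], s             # rescale each channel by its weight

rng = np.random.default_rng(0)
C, H, W, r = 8, 5, 5, 4                        # r = reduction ratio (hyperparameter)
F = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C // r, C))
W2 = rng.normal(size=(C, C // r))
F_scaled, s = se_block(F, W1, W2)
```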

9 of 33

Channel Attention

Same as the SE network
New: max pooling is also used, to preserve edge information of the image

10 of 33

Channel Attention

Let Mc(F) be the channel attention for the image feature map F.

Channel attention across a C × H × W feature map:

a C-dimensional vector that gives a weight to each channel based on glaucoma classification
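
One way to compute such a C-dimensional vector is the CBAM-style channel attention, where average-pooled and max-pooled descriptors go through a shared MLP. The sketch below assumes that design; whether the student network uses exactly this form is an assumption, and the weights and the name `channel_attention` are hypothetical.

```python
import numpy as np

def channel_attention(F, W1, W2):
    """F: (C, H, W); W1: (C // r, C); W2: (C, C // r). Returns Mc(F) of shape (C,)."""
    avg = F.mean(axis=(1, 2))                  # average-pooled descriptor, (C,)
    mx = F.max(axis=(1, 2))                    # max-pooled descriptor, (C,)

    def mlp(v):
        # Shared two-layer MLP with a ReLU hidden layer of C // r nodes.
        return W2 @ np.maximum(W1 @ v, 0.0)

    # Sum both branches, then squash to (0, 1) with a sigmoid.
    return 1.0 / (1.0 + np.exp(-(mlp(avg) + mlp(mx))))

rng = np.random.default_rng(1)
C, H, W, r = 8, 4, 4, 2
F = rng.normal(size=(C, H, W))
W1 = rng.normal(size=(C // r, C))
W2 = rng.normal(size=(C, C // r))
Mc = channel_attention(F, W1, W2)
```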

11 of 33

Student (Self Attention CNN network)

3D attention map across C × H × W

What does this mean?

12 of 33

Student (Self Attention CNN network)

Overall Attention Map

Refined Feature Map, F of input image then becomes
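
As a rough illustration of what a 3D attention map across C × H × W can mean: a C-dimensional channel vector and an H × W spatial map can be broadcast against each other so that every feature gets its own weight. Combining them by element-wise product, as done below, is an assumption made for this sketch and is not taken from the slide; the attention values are random placeholders.

```python
import numpy as np

C, H, W = 4, 3, 3
rng = np.random.default_rng(0)
F = rng.normal(size=(C, H, W))     # toy feature map
Mc = rng.uniform(size=C)           # placeholder channel attention, one weight per channel
Ms = rng.uniform(size=(H, W))      # placeholder spatial attention, one weight per pixel

# Broadcast (C, 1, 1) against (1, H, W) to get one weight per feature: (C, H, W).
M = Mc[:, None, None] * Ms[None, :, :]

# Refined feature map: every feature is scaled by its own attention weight.
F_refined = F * M
```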

13 of 33

Example of attention map calculation

14 of 33

Example of attention map calculation

15 of 33

Student (Evidence Identification Algorithm)

3D attention map across C × H × W

A relevance matrix E is computed, which gives the importance of all features

17 of 33

Student (Evidence Identification Algorithm)

Suppose label c ∈ {0, 1}; 0 = Normal, 1 = Glaucoma
p(c | F) = probability of class c given all features
p(c | F\i) = probability of class c given all features except Fi

The difference shows how the prediction changes for each feature i of the feature map

3D attention map across C × H × W

18 of 33

Student (Evidence Identification Algorithm)

We can't simply drop feature Fi from F and evaluate the classifier, so p(c | F\i) can't be computed directly

So, marginalize out the effect of Fi from the joint distribution:

3D attention map across C × H × W

p(Fi | F\i) = probability of observing feature Fi given the other features F\i

p(c | Fi, F\i) = probability of observing label c given feature Fi and F\i

20 of 33

Student (Evidence Identification Algorithm)

3D attention map across C × H × W

p(Fi | F\i) = probability of observing feature Fi given the other features F\i

p(c | Fi, F\i) = probability of observing label c given feature Fi and F\i

Their product = joint probability of observing feature Fi and class c given the other features F\i

Marginalize over Fi to get the probability of observing label c given F\i
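
Written out, the marginalization described above takes the following form. This is a reconstruction from the term descriptions on the slides, not a copy of the slide's own formula; the symbol E_i for the relevance of feature i is taken from the relevance matrix E and the "difference" described earlier.

```latex
% Probability of class c when feature F_i is treated as unknown:
p(c \mid F_{\setminus i})
  = \sum_{F_i} p(F_i \mid F_{\setminus i}) \, p(c \mid F_{\setminus i}, F_i)

% Relevance of feature i as the change in prediction:
E_i = p(c \mid F) - p(c \mid F_{\setminus i})
```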

21 of 33

Student (Evidence Identification Algorithm)

3D attention map across C × H × W

Simplifying assumption:

the probability of feature Fi occurring is independent of the features at neighboring pixels of the image

Finally, we get the overall evidence matrix, equal in size to the input image
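
Under the independence assumption, p(Fi | F\i) can be replaced by p(Fi), so the marginal can be approximated by resampling one feature at a time and averaging the predictions. The sketch below does this for a toy logistic classifier. The classifier, the pool of replacement values `samples`, and the names `predict_proba` and `evidence_map` are all hypothetical stand-ins, and the map here has the size of the toy feature tensor rather than of an input image.

```python
import numpy as np

def predict_proba(F, w, b):
    """Toy classifier: p(c=1 | F) from a logistic model on the flattened features."""
    return 1.0 / (1.0 + np.exp(-(F.ravel() @ w + b)))

def evidence_map(F, w, b, samples, rng, n_draws=20):
    """Relevance E_i = p(c | F) - p(c | F without F_i), one entry per feature."""
    p_full = predict_proba(F, w, b)
    E = np.zeros(F.shape)
    for idx in np.ndindex(F.shape):
        # Independence assumption: draw replacement values from p(F_i) only.
        draws = rng.choice(samples, size=n_draws)
        p_without = 0.0
        for v in draws:
            G = F.copy()
            G[idx] = v                       # overwrite feature i with a sampled value
            p_without += predict_proba(G, w, b)
        E[idx] = p_full - p_without / n_draws
    return E

rng = np.random.default_rng(0)
F = rng.normal(size=(2, 2, 2))               # toy feature tensor
w = rng.normal(size=F.size)
w[0] = 0.0                                   # feature 0 is ignored by the classifier
b = 0.1
samples = rng.normal(size=100)               # stand-in for observed feature values
E = evidence_map(F, w, b, samples, rng)
```

A feature the classifier ignores (here the first one, whose weight is zero) gets a relevance of zero, since resampling it never changes the prediction.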

22 of 33

Overall Student Network

23 of 33

Method

Reweighted Loss Function

24 of 33

Dual Curriculum Generation

Curriculum learning [1]:

Order the training data the way humans learn: from simple to complex samples

[1] Bengio, Y., Louradour, J., Collobert, R., Weston, J.: Curriculum learning. In: ICML
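
A minimal sketch of the simple-to-complex ordering idea, assuming a per-sample difficulty score is already available. The staged schedule and the name `curriculum_stages` are illustrative assumptions, not the scheme used by EGDCL.

```python
import numpy as np

def curriculum_stages(difficulty, n_stages):
    """Yield index sets that grow from the easiest samples to the full dataset."""
    order = np.argsort(difficulty, kind="stable")    # easiest samples first
    for stage in range(1, n_stages + 1):
        cutoff = int(np.ceil(len(order) * stage / n_stages))
        yield order[:cutoff]

difficulty = np.array([0.9, 0.1, 0.5, 0.3])          # hypothetical difficulty scores
stages = [s.tolist() for s in curriculum_stages(difficulty, n_stages=2)]
# stages == [[1, 3], [1, 3, 2, 0]]
```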

25 of 33

Dual Curriculum

26 of 33

Sample Curriculum

Weight α is altered not just by the evidence map but also by the teacher network's prediction

27 of 33

Sample Curriculum

Takes into account:

the teacher model's estimated probability, only for the positive (disease) label
the evidence map's estimated probability, only for the positive (disease) label

Note:

The evidence map Ei for sample xi is passed to a compact "correct sub-network" to get the evidence map's estimated probability

Gives a weight value for each sample: less weight if the model classifies it correctly, more weight if it does not
Details in the next section
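
To show how a per-sample weight enters a reweighted loss, here is a small sketch of a weighted binary cross-entropy. The weights `alpha` are passed in as given numbers; the sketch makes no claim about how EGDCL computes them, and averaging over samples is an assumption.

```python
import numpy as np

def weighted_bce(p, y, alpha, eps=1e-7):
    """Mean of alpha_i * BCE(p_i, y_i); p = predicted probability of label 1."""
    p = np.clip(p, eps, 1.0 - eps)                       # avoid log(0)
    per_sample = -(y * np.log(p) + (1 - y) * np.log(1 - p))
    return float(np.mean(alpha * per_sample))

p = np.array([0.9, 0.2])      # second sample is misclassified (true label is 1)
y = np.array([1.0, 1.0])
plain = weighted_bce(p, y, alpha=np.array([1.0, 1.0]))
upweighted = weighted_bce(p, y, alpha=np.array([1.0, 3.0]))
```

With the larger weight on the misclassified sample, `upweighted` exceeds `plain`, so that sample contributes more to the gradient.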

28 of 33

Sample Curriculum

Weight value (α) will be:

piT and piE denote the model's estimated probability for the class with label y = 1, based on the teacher network and the evidence maps respectively

booli = indicator: the weight factor is affected by the evidence map only if the prediction is wrong

29 of 33

Sample Curriculum

Weight altered by evidence maps

Weight altered by teacher network

30 of 33

Properties of Weight (α)

The above formula only applies if booli = 1, i.e. the evidence map does not classify the sample correctly (hard samples)

As piE gets closer to 0.5 and the sample is misclassified, the weighting factor αi becomes larger and the loss is up-weighted

Weight altered by evidence maps

31 of 33

Properties of Weight (α)

The teacher also focuses on hard samples: as piT gets closer to 0.5,

the weighting factor αi becomes larger and the loss is up-weighted

Weight altered by teacher model
