1 of 14

Carcinoma Classification

OxML 2023 Cases

Dr. M. Singh

Mr. A. Asgharpoor

2 of 14

Dataset

Imbalanced Dataset

186 Biopsy Slides

62 Labeled Images

- Normal: 36 (58.1%)

- Benign: 14 (22.6%)

- Malignant: 12 (19.4%)

3 of 14

Methods to Handle Imbalanced Datasets

An overview of methods to handle imbalanced datasets

Oversampling: Increase the number of instances in the minority class

Undersampling: Reduce the number of instances in the majority class

Class Weighting: Give more importance to the minority class

Ensemble Methods: Combine multiple classifiers to improve performance
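As an illustration of class weighting, the sketch below derives inverse-frequency weights from the class counts on the Dataset slide. The formula `n_samples / (n_classes * class_count)` is one common convention and is an assumption here; the deck does not state which weighting scheme was used.

```python
# Class counts taken from the labeled set on the Dataset slide.
counts = {"normal": 36, "benign": 14, "malignant": 12}


def inverse_frequency_weights(class_counts):
    """Weight each class by n_samples / (n_classes * class_count)."""
    total = sum(class_counts.values())
    n_classes = len(class_counts)
    return {name: total / (n_classes * c) for name, c in class_counts.items()}


weights = inverse_frequency_weights(counts)
for name, w in weights.items():
    print(f"{name}: {w:.3f}")
```

Rarer classes receive larger weights, so a weighted loss penalizes mistakes on Malignant and Benign slides more than mistakes on Normal slides.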

4 of 14

Dataset Preprocessing

  • AddGaussianNoise(noise_std)
  • transforms.RandomHorizontalFlip()
  • transforms.RandomVerticalFlip()
  • transforms.RandomRotation(45)
  • transforms.ColorJitter(brightness=0.4, contrast=0.4, saturation=0.4, hue=0.2)

  • transforms.Resize((max_height, max_width))
  • transforms.ToTensor()
  • transforms.Normalize((0.5, 0.5, 0.5), (0.5, 0.5, 0.5))  # Stabilizes training and ensures consistent input ranges

Data Augmentation
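To make the effect of the `Normalize` step concrete, here is a small worked example in plain Python. It assumes the inputs are already scaled to [0, 1], as `ToTensor()` does for 8-bit images, and applies the same per-channel arithmetic, `(value - mean) / std`, with mean and std both 0.5.

```python
def normalize(value, mean=0.5, std=0.5):
    """Per-channel normalization arithmetic: (value - mean) / std."""
    return (value - mean) / std


# Pixel intensities after scaling to [0, 1].
for pixel in (0.0, 0.25, 0.5, 1.0):
    print(pixel, "->", normalize(pixel))
```

With these settings the input range [0, 1] maps to [-1, 1], centered on zero.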

5 of 14

K-Fold Cross-Validation

What is K-fold Cross-Validation?

Why Stratified K-fold cross-validation?

Benefits of using Stratified K-fold:

Training set

Test set

K Iterations
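A minimal sketch of stratified K-fold splitting, using only the standard library. It deals each class's samples round-robin across folds so every test fold keeps roughly the original class proportions. The function name `stratified_k_fold` is hypothetical, and this is not necessarily the splitter used in the project; the counts and `k=8` come from the Dataset and Hyperparameters slides.

```python
import random
from collections import defaultdict


def stratified_k_fold(labels, k, seed=0):
    """Yield (train_idx, test_idx) pairs with classes spread evenly over folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal this class's samples round-robin across the k folds.
        for pos, idx in enumerate(indices):
            folds[pos % k].append(idx)

    for i in range(k):
        test_idx = sorted(folds[i])
        train_idx = sorted(j for f, fold in enumerate(folds) if f != i for j in fold)
        yield train_idx, test_idx


labels = ["normal"] * 36 + ["benign"] * 14 + ["malignant"] * 12
splits = list(stratified_k_fold(labels, k=8))
for fold, (train_idx, test_idx) in enumerate(splits):
    print(fold, len(train_idx), len(test_idx))
```

With only 12 malignant images and 8 folds, each test fold holds just one or two malignant samples, which is why plain (unstratified) splitting could easily leave a fold with none.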

6 of 14

Weighted Sampling

Why should it be used?

Imbalanced Dataset:

Applying a naive classifier would lead to bias toward the majority class

Minority Class Importance:

It is crucial to correctly identify samples from the minority classes.

Performance Improvement:

Mitigating the issue of class imbalance

With vs Without
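The idea behind weighted sampling can be sketched with the standard library alone: give each sample a weight of `1 / class_count`, then draw with replacement. In PyTorch this is typically done with `torch.utils.data.WeightedRandomSampler`; the version below is an assumed stand-in for illustration, not the project's code.

```python
import random
from collections import Counter

# Labels reconstructed from the class counts on the Dataset slide.
labels = ["normal"] * 36 + ["benign"] * 14 + ["malignant"] * 12

class_counts = Counter(labels)
# Each sample's weight is the inverse of its class frequency.
sample_weights = [1.0 / class_counts[label] for label in labels]

rng = random.Random(0)
# Draw with replacement; every class now has the same total probability mass.
drawn = rng.choices(labels, weights=sample_weights, k=6000)
drawn_counts = Counter(drawn)
print(drawn_counts)
```

Because each class's weights sum to the same total, the three classes appear in roughly equal numbers in the drawn batches, even though Normal slides make up more than half of the data.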

7 of 14

Few Shot Learning

Credit: IARAI

8 of 14

Ensemble Learning

Dataset

ResNet50

Inception V3

GoogLeNet

EfficientNet V2

Combine
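The slide does not say how the four models' outputs are combined. One common option is soft voting: average the per-model class probabilities and pick the highest. The sketch below assumes that strategy; the logits are hypothetical values for a single image.

```python
import math

CLASSES = ["normal", "benign", "malignant"]


def softmax(logits):
    """Convert raw scores to probabilities (shifted for numerical stability)."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def soft_vote(per_model_logits):
    """Average class probabilities across models and return the top class."""
    probs = [softmax(logits) for logits in per_model_logits]
    n_models = len(probs)
    avg = [sum(p[i] for p in probs) / n_models for i in range(len(CLASSES))]
    best = max(range(len(avg)), key=avg.__getitem__)
    return CLASSES[best], avg


# Hypothetical logits for one image, one row per model.
example_logits = [
    [2.0, 0.5, 0.1],  # e.g. ResNet50
    [0.2, 1.5, 1.0],  # e.g. Inception V3
    [0.1, 0.3, 2.2],  # e.g. GoogLeNet
    [0.0, 0.4, 1.9],  # e.g. EfficientNet V2
]
label, avg = soft_vote(example_logits)
print(label, [round(p, 3) for p in avg])
```

Here the models disagree individually, but the averaged probabilities favor Malignant. Majority (hard) voting is an alternative when probability outputs are poorly calibrated.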

9 of 14

Checkpoint

10 of 14

Hyperparameters

Here are the hyperparameter values we used:

- `noise_std`: 0.1

- `max_height`: value determined by finding the maximum height among the images

- `max_width`: value determined by finding the maximum width among the images

- `num_classes`: 3

- `batch_size`: 8

- `k_folds`: 8

- `num_epochs`: 5

- `learning_rate` (for each optimizer):

  - `optimizer_resnet`: 0.001

  - `optimizer_efficientnet`: 0.001

  - `optimizer_inception`: 0.001

  - `optimizer_googlenet`: 0.001
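Since `max_height` and `max_width` are derived from the data rather than fixed, a short sketch of that computation may help. The image sizes and both helper names are hypothetical. The second helper computes the border needed to center an image in the target size, returned in (left, top, right, bottom) order, the order `torchvision.transforms.Pad` accepts for a 4-element padding.

```python
def max_dimensions(sizes):
    """Return (max_height, max_width) over a list of (height, width) pairs."""
    max_height = max(h for h, _ in sizes)
    max_width = max(w for _, w in sizes)
    return max_height, max_width


def centering_padding(size, target):
    """Return (left, top, right, bottom) padding that centers size in target."""
    (h, w), (target_h, target_w) = size, target
    pad_h, pad_w = target_h - h, target_w - w
    top, left = pad_h // 2, pad_w // 2
    return left, top, pad_w - left, pad_h - top


# Hypothetical (height, width) sizes for three slides.
image_sizes = [(480, 640), (512, 512), (300, 720)]
max_height, max_width = max_dimensions(image_sizes)
padding = centering_padding((300, 720), (max_height, max_width))
print(max_height, max_width, padding)
```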

11 of 14

Limitations

1. Small Training Data

2. Imbalanced Dataset

3. Preprocessing Challenges

4. Limited Model Training

5. Evaluation Metric

6. Limited Experimentation

12 of 14

Future Work

1. Use Multi-Model Approach:

- Carcinoma: ⊖ OR ⊕

- If ⊕ : Benign OR Malignant

2. White Padding Approach:

- We only tried black padding (low contrast with cancer cells)
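The multi-model approach in item 1 amounts to a two-stage cascade: a first model decides negative vs. positive, and a second model runs only on positives to separate Benign from Malignant. The sketch below shows the control flow only. The two classifiers are hypothetical threshold functions standing in for trained networks, and `cascade_predict` is an illustrative name.

```python
from typing import Callable, Sequence

Image = Sequence[float]  # stand-in for a real image tensor


def cascade_predict(
    image: Image,
    detect_positive: Callable[[Image], bool],
    detect_malignant: Callable[[Image], bool],
) -> str:
    """Stage 1: negative vs. positive. Stage 2 (positives only): benign vs. malignant."""
    if not detect_positive(image):
        return "normal"
    return "malignant" if detect_malignant(image) else "benign"


# Hypothetical stand-in classifiers based on a mean-intensity threshold.
def toy_stage1(image: Image) -> bool:
    return sum(image) / len(image) > 0.3


def toy_stage2(image: Image) -> bool:
    return sum(image) / len(image) > 0.7


predictions = [
    cascade_predict(img, toy_stage1, toy_stage2)
    for img in ([0.1, 0.2], [0.4, 0.6], [0.8, 0.9])
]
print(predictions)
```

Splitting the task this way lets each binary model be trained and tuned on its own class balance, at the cost of stage 1 errors propagating to the final label.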

13 of 14

Any Questions?

14 of 14

Thank you for your time and attention 🙂