Image Segmentation

Vatsal Sivaratri

Modified from presentation by Arnav Jain and Sauman Das

TJ Machine Learning Club


What is Segmentation?

  • Identifying a region of an image
  • Regions are usually representative of an object
    • Chairs in a classroom
    • Tumors in a medical scan
  • Goal: Create a mask around the region(s) of interest
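
To make "mask" concrete, here is a minimal sketch assuming a tiny made-up grayscale image. The threshold is only a stand-in for a trained model's prediction, and every value below is hypothetical.

```python
import numpy as np

# Hypothetical 4x4 grayscale "scan"; the bright pixels stand in for the region of interest.
image = np.array([
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.8, 0.9, 0.2],
    [0.0, 0.1, 0.2, 0.1],
])

# A mask has the same height and width as the image: 1 inside the region, 0 outside.
# Thresholding at 0.5 is an illustrative assumption, not how a segmentation model decides.
mask = (image > 0.5).astype(np.uint8)

# Multiplying by the mask keeps only the pixels inside the region.
region_only = image * mask

print(mask)
```

The point to notice is that the mask is an image-shaped array, which is why segmentation is a per-pixel prediction problem.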


Types of Segmentation

  • Semantic Segmentation
    • Every pixel is assigned a class label
    • Example: U-Net
  • Instance Segmentation
    • Distinguishes individual objects within the same class
    • Example: Mask R-CNN (Faster R-CNN, which it extends, only predicts bounding boxes)
  • Panoptic Segmentation
    • Combines semantic and instance segmentation
    • Tracks both classes and individual instances
    • Examples: UPSNet, FPSNet (Mask R-CNN alone covers only the instance part)
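
The difference between the three types shows up in the label maps they produce. The sketch below uses a hand-written scene with two chairs. The arrays are hypothetical, and the `class * 1000 + instance` encoding for the panoptic map is just one possible convention, assumed here for illustration.

```python
import numpy as np

# Semantic map: one class per pixel (0 = background, 1 = chair).
# Both chairs get the same label, so they cannot be told apart.
semantic = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance map: one id per object (0 = no object, 1 and 2 = the two chairs).
instance = np.array([
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Panoptic map: a single id per pixel that carries both class and instance.
# The class * 1000 + instance encoding is an assumption made for this sketch.
panoptic = semantic * 1000 + instance

num_classes_seen = len(np.unique(semantic))   # background and chair
num_objects = len(np.unique(instance)) - 1    # ignore the "no object" id 0

print(np.unique(panoptic))
```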


U-Net

What is the input to an image segmentation model?

What is the output of an image segmentation model? What dimensions should this output have?
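
One common convention, sketched below under assumed sizes, is an input of shape (height, width, channels) and an output of per-pixel class scores of shape (height, width, num_classes). The random scores are only a stand-in for a real model, and all sizes are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
height, width, channels, num_classes = 8, 8, 3, 4  # illustrative sizes

# Input: an RGB image.
image = rng.random((height, width, channels))

# Output: one score per class for every pixel.
# Random numbers stand in for what a trained model would predict.
scores = rng.random((height, width, num_classes))

# Taking the highest-scoring class per pixel gives the final label map,
# which has the same height and width as the input.
label_map = scores.argmax(axis=-1)

print(image.shape, scores.shape, label_map.shape)
```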


Types of Layers

Convolution

Note: What deep learning calls a "convolution" is actually a cross-correlation, because the kernel is not flipped!
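
The note above can be checked directly. This is a minimal sketch in NumPy with hypothetical function names: the cross-correlation slides the kernel as-is, while a true convolution flips it first.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide the kernel over the image without flipping it ("valid" positions only)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """True convolution: flip the kernel along both axes, then cross-correlate."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

corr = cross_correlate2d(image, kernel)  # what a "Conv" layer computes
conv = convolve2d(image, kernel)         # the textbook convolution

print(corr)
print(conv)
```

With this asymmetric kernel the two results differ in sign. Since the kernel weights are learned, the distinction does not matter in practice.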

MaxPooling
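
A minimal sketch of 2x2 max pooling on a single-channel feature map, assuming non-overlapping windows. The function name and the input values are made up for illustration.

```python
import numpy as np

def max_pool2d(x, size=2):
    """Keep the largest value in each non-overlapping size x size window."""
    h_out, w_out = x.shape[0] // size, x.shape[1] // size
    x = x[:h_out * size, :w_out * size]           # drop rows/cols that do not fill a window
    blocks = x.reshape(h_out, size, w_out, size)  # blocks[i, a, j, b] = x[i*size + a, j*size + b]
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 6],
])

pooled = max_pool2d(feature_map)
print(pooled)
# [[4 5]
#  [3 7]]
```

Each output value summarizes four input values, so height and width are halved and three of every four values are discarded.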


Types of Layers

Upsampling
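
A minimal sketch of nearest-neighbor upsampling, which simply repeats each value. This is one of several upsampling methods, and the function name is hypothetical.

```python
import numpy as np

def upsample_nearest(x, factor=2):
    """Repeat every value `factor` times along both height and width."""
    return np.repeat(np.repeat(x, factor, axis=0), factor, axis=1)

small = np.array([[4, 5],
                  [3, 7]])

large = upsample_nearest(small)
print(large)
# [[4 4 5 5]
#  [4 4 5 5]
#  [3 3 7 7]
#  [3 3 7 7]]
```

Upsampling restores the size but not the detail: the output is blocky because no new information is created.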


U-Net

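  • Symmetric structure (“U”): a downsampling path mirrored by an upsampling path
  • Convolutional layers increase the number of filters as the image gets smaller
  • MaxPool is the downsampler: it decreases the spatial size and loses information
  • UpSampling2D is the upsampler: it restores the original image size
  • Skip connections pass features from the downsampling path directly to the matching upsampling stage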
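
The sketch below traces one level of the "U" in NumPy to show why the skip path helps. It assumes channels-last arrays with even height and width. The helper names and sizes are hypothetical, and the convolutions are left out to keep the focus on shapes.

```python
import numpy as np

def max_pool(x):
    """(H, W, C) -> (H/2, W/2, C) with 2x2 max pooling; H and W must be even."""
    h, w, c = x.shape
    return x.reshape(h // 2, 2, w // 2, 2, c).max(axis=(1, 3))

def upsample(x):
    """(H, W, C) -> (2H, 2W, C) by repeating each value (nearest neighbor)."""
    return x.repeat(2, axis=0).repeat(2, axis=1)

rng = np.random.default_rng(0)
encoder_features = rng.random((8, 8, 16))  # hypothetical output of an encoder block

bottleneck = max_pool(encoder_features)    # (4, 4, 16): smaller, detail lost
decoder_features = upsample(bottleneck)    # (8, 8, 16): size restored, detail still lost

# Skip connection: concatenate the untouched encoder features with the upsampled ones,
# so the decoder sees both fine detail and coarse context.
merged = np.concatenate([encoder_features, decoder_features], axis=-1)

print(bottleneck.shape, decoder_features.shape, merged.shape)
```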


Max-Pool


Code for U-Net

  • Convolutional layers increase the number of filters as the image gets smaller
  • MaxPool is the downsampler: it decreases the spatial size and loses information
  • UpSampling2D is the upsampler: it restores the original image size
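
The code shown on this slide did not survive in the text, and the following is not a reconstruction of it. It is only a rough shape trace of the three points above. The doubling of filters at each level, the base of 16 filters, and the depth of 3 are all assumptions chosen for illustration.

```python
def unet_shapes(height, width, base_filters=16, depth=3):
    """Return (stage, height, width, filters) for each stage of a hypothetical U-Net.

    Assumes height and width are divisible by 2 ** depth.
    """
    shapes = []
    h, w, f = height, width, base_filters
    # Downsampling path: each max pool halves the size, and the filter count doubles.
    for level in range(depth):
        shapes.append((f"down{level}", h, w, f))
        h, w, f = h // 2, w // 2, f * 2
    shapes.append(("bottleneck", h, w, f))
    # Upsampling path: each upsampling step doubles the size, and the filter count halves.
    for level in reversed(range(depth)):
        h, w, f = h * 2, w * 2, f // 2
        shapes.append((f"up{level}", h, w, f))
    return shapes

for stage in unet_shapes(64, 64):
    print(stage)
```

The first and last stages have the same height and width, which is what lets the output mask line up with the input image.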


FCN

  • FCN (Fully Convolutional Network): End-to-end segmentation with transposed convolution layers.
  • Flexible: Handles arbitrary image sizes because there are no fully connected layers; adaptable to different architectures.
  • Convolutional Layers: Extract hierarchical features.
  • Upsampling Layers: Increase spatial resolution.
  • Skip Connections: Fuse coarse and fine features.
  • Output Layer: Produce pixel-wise predictions.
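
A minimal sketch of a transposed convolution on a single channel, assuming stride 2 and no padding. The function name is hypothetical, and the all-ones kernel is fixed here only for readability; in an FCN the kernel weights are learned.

```python
import numpy as np

def conv_transpose2d(x, kernel, stride=2):
    """Each input pixel stamps a scaled copy of the kernel onto the output."""
    h, w = x.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += x[i, j] * kernel
    return out

coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
kernel = np.ones((2, 2))  # fixed for illustration; learned in a real network

fine = conv_transpose2d(coarse, kernel)
print(fine)
# [[1. 1. 2. 2.]
#  [1. 1. 2. 2.]
#  [3. 3. 4. 4.]
#  [3. 3. 4. 4.]]
```

With an all-ones 2x2 kernel and stride 2 this reproduces nearest-neighbor upsampling. Other kernel values give other interpolations, which is the sense in which the upsampling is learnable.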


R-CNN

  • Region-based CNN (not a recurrent network)
  • Region proposals mark candidate object locations
  • Mask R-CNN adds a per-object mask for segmentation

  1. Region Proposals: Generate candidate boxes (selective search in the original R-CNN).
  2. Convolutional Layers: Extract features for each region.
  3. Classification: Assign a class to each region.
  4. Bounding-Box Regression: Refine the location of each box.
  5. Mask Branch (Mask R-CNN): Predict a pixel mask for each detected object.


When to use each?

FCN:

  • End-to-End Semantic Segmentation
  • Adaptability to Varying Image Sizes
  • Spatial Context Understanding
  • Learnable Upsampling

U-Net:

  • High-Quality Image Segmentation
  • Biomedical Image Analysis
  • Limited Training Data
  • Local Feature Emphasis

R-CNN:

  • Object Detection with Precise Localization
  • Instance Segmentation
  • Selective Search Region Proposals
  • Complex Scenes with Multiple Objects


Loss Functions/Metrics

  • Standard Classification Task Losses
    • Binary Cross-Entropy

    • Categorical Cross-Entropy

  • Intersection-over-Union

  • Dice Score
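
The losses and metrics listed above can be sketched in a few lines of NumPy. The function names are hypothetical, the masks are made-up examples, and returning 1.0 when both masks are empty is an assumed convention.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean per-pixel loss for a two-class mask; y_prob holds predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(y_true * np.log(y_prob) + (1 - y_true) * np.log(1 - y_prob)))

def categorical_cross_entropy(one_hot, probs, eps=1e-7):
    """Mean per-pixel loss for several classes; the last axis indexes the class."""
    probs = np.clip(probs, eps, 1.0)
    return float(-np.mean(np.sum(one_hot * np.log(probs), axis=-1)))

def iou(y_true, y_pred):
    """Intersection-over-Union of two binary masks."""
    intersection = np.logical_and(y_true, y_pred).sum()
    union = np.logical_or(y_true, y_pred).sum()
    return float(intersection / union) if union else 1.0  # assumed convention for empty masks

def dice(y_true, y_pred):
    """Dice score of two binary masks: 2 * overlap / (sum of both mask sizes)."""
    intersection = np.logical_and(y_true, y_pred).sum()
    total = y_true.sum() + y_pred.sum()
    return float(2 * intersection / total) if total else 1.0  # assumed convention for empty masks

y_true = np.array([[1, 1, 0, 0],
                   [1, 1, 0, 0]])
y_pred = np.array([[1, 0, 0, 0],
                   [1, 1, 1, 0]])

print(iou(y_true, y_pred))   # 3 overlapping pixels / 5 in the union
print(dice(y_true, y_pred))  # 2 * 3 / (4 + 4)
```

For the same pair of masks the Dice score is never lower than IoU, and the two are related by Dice = 2 * IoU / (1 + IoU).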


Image Segmentation Applications

  • Medicine
  • Self-Driving Cars/Robotics
  • Video Surveillance
  • Traffic Control Systems
