1 of 14

Image Segmentation

Vatsal Sivaratri

Modified from a presentation by Arnav Jain and Sauman Das

TJ Machine Learning Club

2 of 14

What is Segmentation?

Identifying a region of an image
Regions usually correspond to an object, for example:

Chairs in a classroom
Tumors in a medical scan

Goal: Create a mask around the region(s) of interest
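
A mask can be stored as an array with the same height and width as the image, where 1 marks pixels inside the region of interest and 0 marks everything else. The sketch below is a minimal NumPy illustration: the 4x4 image and the 0.5 threshold are made-up values, and thresholding only stands in for a real segmentation model.

```python
import numpy as np

# Hypothetical 4x4 grayscale image with intensities in [0, 1].
image = np.array([
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.9, 0.2],
    [0.0, 0.1, 0.2, 0.1],
])

# Binary mask: 1 where the pixel belongs to the region of interest.
# Thresholding is an assumed stand-in for a trained model's prediction.
mask = (image > 0.5).astype(np.uint8)

# Applying the mask keeps the region and zeroes out the background.
masked_image = image * mask
```

Multiplying by the mask is one simple way to isolate the region; drawing the mask as an overlay is another common choice.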

3 of 14

Types of Segmentation

Semantic Segmentation

Each pixel has an associated class
U-Nets

Instance Segmentation

Can distinguish objects within the same class
Mask R-CNN (adds a mask branch to the Faster R-CNN object detector)

Panoptic Segmentation

Mix of both semantic and instance segmentation
Tracks both classes and individual instances
Panoptic FPN (built on Mask R-CNN), UPSNet, FPSNet
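
The three output types can be compared as small label maps. This is a hand-built sketch, not the output of any of the models named above: the 3x6 scene is hypothetical, and packing the panoptic label as class_id * 1000 + instance_id is an assumed encoding chosen for illustration.

```python
import numpy as np

# Hypothetical 3x6 scene: class 0 = background, class 1 = chair.
# Semantic segmentation: one class id per pixel; the two chairs look identical.
semantic = np.array([
    [0, 1, 1, 0, 1, 1],
    [0, 1, 1, 0, 1, 1],
    [0, 0, 0, 0, 0, 0],
])

# Instance segmentation: one id per object; 0 means "no object".
instance = np.array([
    [0, 1, 1, 0, 2, 2],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 0, 0],
])

# Panoptic segmentation: every pixel carries both a class and an instance id.
# The class_id * 1000 + instance_id packing is an assumption for this sketch.
panoptic = semantic * 1000 + instance

num_classes_seen = len(np.unique(semantic))   # background and chair
num_chairs = len(np.unique(instance)) - 1     # exclude the 0 "no object" id
```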

4 of 14

U-Net

What is the input to an image segmentation model?

What is the output of an image segmentation model? What dimensions should this output have?
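
One common answer for semantic segmentation, sketched with NumPy shapes: the input is an image, and the output has the same height and width with one score per class at every pixel. The 128x128 RGB size and the four classes are assumed values, and random numbers stand in for a model's prediction.

```python
import numpy as np

rng = np.random.default_rng(0)

height, width, channels = 128, 128, 3   # assumed RGB input size
num_classes = 4                         # assumed number of classes

# Input: one image of shape (height, width, channels).
image = rng.random((height, width, channels))

# Output: one score per class for every pixel. Random scores stand in
# for a real model's prediction here.
scores = rng.random((height, width, num_classes))

# Final label map: the highest-scoring class at each pixel.
label_map = scores.argmax(axis=-1)
```

For a single foreground class, the output is often reduced to one channel of probabilities with shape (height, width, 1).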

5 of 14

Types of Layers

Convolution

Note: This is actually a cross-correlation!

MaxPooling
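
The note about cross-correlation can be checked numerically. The sketch below uses hypothetical helper names and a made-up 2x2 kernel: the first function slides the kernel as-is, which is what a "convolution" layer typically computes, and the second flips the kernel first, which is the textbook definition of convolution.

```python
import numpy as np

def cross_correlate2d(image, kernel):
    """Slide the kernel over the image without flipping it ('valid' padding)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    out = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def convolve2d(image, kernel):
    """True convolution: flip the kernel in both axes, then cross-correlate."""
    return cross_correlate2d(image, kernel[::-1, ::-1])

image = np.arange(16, dtype=float).reshape(4, 4)
kernel = np.array([[1.0, 0.0],
                   [0.0, -1.0]])

corr = cross_correlate2d(image, kernel)   # every entry is -5
conv = convolve2d(image, kernel)          # every entry is +5
```

Because the kernel weights are learned, the flip makes no practical difference to what a network can represent, which is why the two names are used interchangeably.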

6 of 14

Types of Layers

Upsampling
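
A minimal sketch of the simplest form of upsampling, nearest-neighbor, where each value is copied into a larger block. The helper name is hypothetical; Keras's UpSampling2D layer uses nearest-neighbor interpolation by default, but this NumPy version only illustrates the idea on a single 2D array.

```python
import numpy as np

def upsample_nearest(feature_map, factor=2):
    """Repeat every row and column `factor` times (nearest-neighbor)."""
    rows_repeated = np.repeat(feature_map, factor, axis=0)
    return np.repeat(rows_repeated, factor, axis=1)

small = np.array([[1, 2],
                  [3, 4]])
large = upsample_nearest(small)
# large is:
# [[1 1 2 2]
#  [1 1 2 2]
#  [3 3 4 4]
#  [3 3 4 4]]
```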

7 of 14

U-Net

Symmetric Structure (“U”)
Convolutional layers increase in number of filters as the image is downsampled
MaxPool is the downsampler (decreases spatial size and loses information)
UpSampling2D is the upsampler (restores the original image size)
The symmetric structure allows for skip connections between matching encoder and decoder levels
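
A skip connection can be sketched as a concatenation along the channel axis. The shapes below are assumptions (channels last, one level of downsampling), and random arrays stand in for real feature maps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed shapes, channels last: (height, width, channels).
encoder_features = rng.random((64, 64, 32))   # saved before downsampling
bottleneck = rng.random((32, 32, 64))         # after a 2x2 max pool and more convolutions

# Decoder: upsample back to the encoder's resolution (nearest-neighbor).
upsampled = bottleneck.repeat(2, axis=0).repeat(2, axis=1)

# Skip connection: concatenate along the channel axis so the decoder sees
# both the coarse upsampled features and the fine encoder features.
merged = np.concatenate([encoder_features, upsampled], axis=-1)
```

The concatenation only works because the symmetric structure guarantees that the two feature maps have the same height and width.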

8 of 14

Max-Pool
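
A minimal sketch of 2x2 max pooling on a single 2D feature map. The helper name and the 4x4 values are made up for illustration; each non-overlapping 2x2 block is replaced by its largest value, so the height and width are halved.

```python
import numpy as np

def max_pool2d(feature_map, size=2):
    """Non-overlapping max pooling; height and width must be divisible by size."""
    h, w = feature_map.shape
    blocks = feature_map.reshape(h // size, size, w // size, size)
    return blocks.max(axis=(1, 3))

feature_map = np.array([
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [2, 6, 3, 4],
])
pooled = max_pool2d(feature_map)
# pooled is:
# [[4 5]
#  [6 7]]
```

Three of every four values are discarded, which is the information loss that upsampling alone cannot undo.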

9 of 14

Code for U-Net

Convolutional layers increase in number of filters as the image is downsampled
MaxPool is the downsampler (decreases spatial size and loses information)
UpSampling2D is the upsampler (restores the original image size)
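
The pattern in these bullets can be traced with a small helper. This is a sketch of the shape bookkeeping only, not the club's U-Net code: the function name is hypothetical, and it assumes a 256x256 input, 64 base filters that double at each level, 'same'-padded convolutions, 2x2 max pooling, and 2x upsampling.

```python
def unet_shape_trace(input_size=256, base_filters=64, depth=4):
    """Return (stage, height_or_width, filters) for a symmetric U-Net.

    Assumes 'same'-padded convolutions, 2x2 max pooling on the way down,
    and 2x upsampling on the way up.
    """
    trace = []
    size, filters = input_size, base_filters
    # Encoder: filters double while the spatial size halves.
    for level in range(depth):
        trace.append((f"down{level + 1}", size, filters))
        size //= 2
        filters *= 2
    trace.append(("bottleneck", size, filters))
    # Decoder: mirror image of the encoder.
    for level in range(depth, 0, -1):
        size *= 2
        filters //= 2
        trace.append((f"up{level}", size, filters))
    return trace

for stage, size, filters in unet_shape_trace():
    print(f"{stage:>10}: {size}x{size}, {filters} filters")
```

In a Keras model these stages would typically be built from Conv2D, MaxPooling2D, and UpSampling2D layers, with a concatenation at each matching level.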

10 of 14

FCN

FCN (Fully Convolutional Network): End-to-End Segmentation with Transposed Convolution Layers.
Flexible: Handles Arbitrary Image Sizes, Adaptable to Different Architectures.
Convolutional Layers: Extract hierarchical features.
Upsampling Layers: Increase spatial resolution.
Skip Connections: Fuse coarse and fine features.
Output Layer: Produce pixel-wise predictions.
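
The upsampling step in an FCN is learnable because it uses a transposed convolution. The sketch below is a single-channel NumPy version with a hypothetical helper name, stride 2, and no padding; the all-ones kernel is an assumption chosen to keep the result easy to read, whereas a real FCN would learn these weights.

```python
import numpy as np

def conv_transpose2d(feature_map, kernel, stride=2):
    """Single-channel transposed convolution with no padding."""
    h, w = feature_map.shape
    kh, kw = kernel.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            # Each input value stamps a scaled copy of the kernel into the output.
            out[i * stride:i * stride + kh, j * stride:j * stride + kw] += (
                feature_map[i, j] * kernel
            )
    return out

coarse = np.array([[1.0, 2.0],
                   [3.0, 4.0]])
kernel = np.ones((2, 2))          # in an FCN these weights would be learned
fine = conv_transpose2d(coarse, kernel)
```

With this particular kernel the result matches nearest-neighbor upsampling; other kernel values give other interpolation schemes, which is what training can adjust.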

11 of 14

R-CNN

Region-Based Convolutional Processing
Region Proposals
Per-Region Classification

Region Proposals: Suggest candidate object regions (selective search in the original R-CNN).
Convolutional Layers: Extract spatial features.
Pooling Layers: Downsample spatial dimensions.
Fully Connected Layers: Aggregate each region's features for classification and box refinement.
Output Layer: Generate final predictions (a class and bounding box per region; Mask R-CNN adds a mask).

12 of 14

When to use each?

FCN:

End-to-End Semantic Segmentation
Adaptability to Varying Image Sizes
Spatial Context Understanding
Learnable Upsampling

U-Net:

High-Quality Image Segmentation
Biomedical Image Analysis
Limited Training Data
Local Feature Emphasis

R-CNN:

Object Detection with Precise Localization
Instance Segmentation (via Mask R-CNN)
Selective Search Region Proposals
Complex Scenes with Multiple Objects
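
Region proposals come in different sizes, but the layers that classify them expect a fixed-size input. The original R-CNN handles this by warping each proposed image crop; later variants (Fast R-CNN onward) pool features per region instead. The sketch below illustrates the pooling idea only: the helper name, the 6x6 feature map, and the two proposal boxes are all hypothetical.

```python
import numpy as np

def roi_max_pool(feature_map, box, output_size=2):
    """Max-pool the region box = (row0, col0, row1, col1) to a fixed grid.

    Rows row0..row1-1 and columns col0..col1-1 are included. The region must
    be at least output_size pixels in each direction.
    """
    row0, col0, row1, col1 = box
    region = feature_map[row0:row1, col0:col1]
    # Split rows and columns into output_size roughly equal bins.
    row_edges = np.linspace(0, region.shape[0], output_size + 1).astype(int)
    col_edges = np.linspace(0, region.shape[1], output_size + 1).astype(int)
    pooled = np.zeros((output_size, output_size))
    for i in range(output_size):
        for j in range(output_size):
            cell = region[row_edges[i]:row_edges[i + 1],
                          col_edges[j]:col_edges[j + 1]]
            pooled[i, j] = cell.max()
    return pooled

feature_map = np.arange(36, dtype=float).reshape(6, 6)
# Two hypothetical region proposals of different sizes.
proposals = [(0, 0, 4, 4), (2, 1, 6, 4)]
pooled = [roi_max_pool(feature_map, box) for box in proposals]
```

Both proposals end up as 2x2 arrays even though one region is 4x4 and the other is 4x3.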

13 of 14

Loss Functions/Metrics

Standard Classification Task Losses

Binary Cross-Entropy

Categorical Cross-Entropy
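
For segmentation, these classification losses are applied at every pixel and then averaged. The sketch below is a minimal NumPy version with hypothetical function names; the `eps` clipping value and the 2x2 example masks are assumptions for illustration.

```python
import numpy as np

def binary_cross_entropy(y_true, y_prob, eps=1e-7):
    """Mean per-pixel BCE between a 0/1 mask and predicted probabilities."""
    y_prob = np.clip(y_prob, eps, 1 - eps)   # avoid log(0)
    return float(np.mean(-(y_true * np.log(y_prob)
                           + (1 - y_true) * np.log(1 - y_prob))))

def categorical_cross_entropy(y_true_onehot, y_prob, eps=1e-7):
    """Mean per-pixel CCE; the last axis holds one probability per class."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return float(np.mean(-np.sum(y_true_onehot * np.log(y_prob), axis=-1)))

true_mask = np.array([[1.0, 0.0],
                      [0.0, 1.0]])
good_pred = np.array([[0.9, 0.1],
                      [0.2, 0.8]])
bad_pred = 1.0 - good_pred

loss_good = binary_cross_entropy(true_mask, good_pred)
loss_bad = binary_cross_entropy(true_mask, bad_pred)   # larger than loss_good
```

With two classes, categorical cross-entropy on one-hot labels gives the same value as binary cross-entropy on the foreground channel.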

Intersection-over-Union

Dice Score
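
Both overlap metrics can be computed directly from two binary masks. This is a minimal NumPy sketch with hypothetical function names and made-up 3x3 masks; returning 1.0 when both masks are empty is an assumed convention, and other choices exist.

```python
import numpy as np

def iou_score(pred_mask, true_mask):
    """Intersection over union of two binary masks."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    union = np.logical_or(pred, true).sum()
    if union == 0:
        return 1.0   # assumed convention: two empty masks match perfectly
    return float(np.logical_and(pred, true).sum() / union)

def dice_score(pred_mask, true_mask):
    """Dice = 2 * |A and B| / (|A| + |B|) for two binary masks."""
    pred, true = pred_mask.astype(bool), true_mask.astype(bool)
    total = pred.sum() + true.sum()
    if total == 0:
        return 1.0   # same assumed convention for empty masks
    return float(2 * np.logical_and(pred, true).sum() / total)

true_mask = np.array([[1, 1, 0],
                      [1, 1, 0],
                      [0, 0, 0]])
pred_mask = np.array([[0, 1, 1],
                      [0, 1, 1],
                      [0, 0, 0]])

iou = iou_score(pred_mask, true_mask)     # 2 / 6
dice = dice_score(pred_mask, true_mask)   # 4 / 8
```

The two scores always rank predictions the same way, since Dice = 2 * IoU / (1 + IoU), but Dice is the more forgiving number for the same overlap.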

14 of 14

Image Segmentation Applications

Medicine
Self-Driving Cars/Robotics
Video Surveillance
Traffic Control Systems
