
Lit Review: nnU-Net

For Biomedical Image Segmentation

Selina Liu


Presentation Outline: U-Net → nnU-Net → application

  • ML image segmentation pipeline: image input → image output
  • nnU-Net: designs a U-Net for us
  • Application: lung tumor segmentation


Image Segmentation: extract important info from an image

  • The process of dividing an image into meaningful regions/objects
  • Idea: assign each pixel (or group of pixels) a specific label/class
  • Relevant application: analyze and understand the various structures and features of biomedical images
    • E.g., identify and analyze tumors in CT scans
  • Assists medical research, diagnosis, and treatment planning
  • Many image segmentation methods: manual annotation, thresholding, … (a minimal thresholding sketch follows below)
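As a minimal sketch of the simplest method on that list, thresholding labels each pixel by intensity (NumPy only; the threshold value 0.5 is an arbitrary assumption, not from the slides):

```python
import numpy as np

def threshold_segment(image: np.ndarray, thresh: float = 0.5) -> np.ndarray:
    """Label each pixel 1 (foreground) or 0 (background) by intensity."""
    return (image > thresh).astype(np.uint8)

# Example: a synthetic 4x4 "image" with a bright 2x2 blob
img = np.zeros((4, 4))
img[1:3, 1:3] = 0.9
mask = threshold_segment(img)  # 1s mark the blob, 0s the background
```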


Biomedical image segmentation pipeline: ML approach

[Pipeline diagram: four stages with their supporting inputs.]

  • Model Architecture: propose a new architecture (CNN, random forest, U-Net), informed by research papers and international competitions
  • Pre-processing: dataset properties, dataset challenges, data splitting
  • Model training: initialization, loss function, monitoring and validation, hyperparameter tuning
  • Post-processing: performance metric, testing, additional steps


Our choice of machine learning architecture: U-Net

Anand, V.; Gupta, S.; Koundal, D.; Nayak, S.R.; Barsocchi, P.; Bhoi, A.K. Modified U-NET Architecture for Segmentation of Skin Lesion. Sensors 2022, 22, 867. https://doi.org/10.3390/s22030867

  • Our approach: U-Net
    • Effective feature representation: captures both low-level and high-level features
    • Reduced need for training data: performs well even with limited data [our case]
    • Relatively flexible: can easily be extended to additional biomedical imaging datasets
    • Promising: has shown significant success in the literature
    • Available pretrained models: transfer learning is faster to train and often yields better results
    • Applies to both 2D and 3D data



U-Net topology: Encoder & Skip Connections & Decoder

Encoder

  • Contracting path
  • Gradually reduces the spatial resolution while increasing the number of captured image features
  • Extracts higher-level / abstract representations from the input image

Decoder

  • Expanding path
  • Gradually increases the spatial resolution, concatenating the skip connections
  • Recovers lower-level / detailed representations of the input image

Skip Connections

  • Directly link corresponding layers between the encoding and decoding paths
  • Enable the flow of information at different scales and preserve fine-grained details



Encoder: capture high-level features hierarchically

  • A series of convolutional layers and pooling layers
  • Convolutional layer:
    • Applies multiple convolutional filters to extract abstract features from the input data
    • Each filter learns to extract a different set of features from the input image
    • Outputs a stack of feature maps
  • Pooling layer:
    • Downsamples the feature maps and captures larger-scale information via max pooling
    • Max pooling retains the most prominent feature while discarding less relevant information (see the sketch below)
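A minimal PyTorch sketch of one encoder stage as just described; the channel counts and two-convolutions-per-stage layout are common conventions, not prescriptions from the slides:

```python
import torch
import torch.nn as nn

class EncoderBlock(nn.Module):
    """Two 3x3 convolutions (feature extraction) followed by 2x2 max pooling (downsampling)."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )
        self.pool = nn.MaxPool2d(2)  # halves spatial resolution, keeps strongest activations

    def forward(self, x):
        features = self.convs(x)  # stack of feature maps (one per filter)
        return self.pool(features), features  # pooled output + pre-pool maps for the skip connection

# One stage: 1-channel input -> 64 feature maps, resolution 128 -> 64
down, skip = EncoderBlock(1, 64)(torch.randn(1, 1, 128, 128))
```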


Skip Connections: enable multi-level information flow

  • Preserve spatial details and enable information flow at multiple levels
  • Skip connections are created by concatenating/merging the feature maps from the encoding path with the corresponding layers in the decoding path (see the example below)
  • These connections allow the decoder to access both local and global information, aiding accurate segmentation
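In PyTorch terms, the merge is a single torch.cat along the channel dimension (the shapes below are illustrative assumptions):

```python
import torch

decoder_maps = torch.randn(1, 64, 64, 64)  # upsampled feature maps in the decoder
encoder_maps = torch.randn(1, 64, 64, 64)  # matching-resolution maps saved from the encoder

# Skip connection: merge encoder detail with decoder context along the channel axis
merged = torch.cat([decoder_maps, encoder_maps], dim=1)  # -> shape (1, 128, 64, 64)
```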



Decoder: reconstruct lower-level features hierarchically

  • A series of upsampling and convolutional layers
  • Upsampling
    • via transposed convolutions (deconvolutions)
  • The upsampled feature maps are concatenated with the corresponding skip connection, which enables the decoder to recover spatial details
  • A series of convolutional filters is then applied to the concatenated feature maps to extract and learn relevant features and further refine the segmentation (see the sketch below)
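A minimal PyTorch sketch of one decoder stage matching the description above; channel counts are illustrative assumptions:

```python
import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Transposed convolution (upsampling) -> concatenate skip -> two 3x3 convolutions."""
    def __init__(self, in_ch: int, out_ch: int):
        super().__init__()
        self.up = nn.ConvTranspose2d(in_ch, out_ch, kernel_size=2, stride=2)  # doubles resolution
        self.convs = nn.Sequential(
            nn.Conv2d(out_ch * 2, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, kernel_size=3, padding=1), nn.ReLU(inplace=True),
        )

    def forward(self, x, skip):
        x = self.up(x)                   # recover spatial resolution
        x = torch.cat([x, skip], dim=1)  # skip connection restores fine detail
        return self.convs(x)             # refine the merged features

# 32x32 decoder maps + 64x64 encoder skip -> refined 64x64 feature maps
out = DecoderBlock(128, 64)(torch.randn(1, 128, 32, 32), torch.randn(1, 64, 64, 64))
```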


Dataset Properties: variability & class imbalance & limited data

  • Large variability
    • Due to variations in imaging modalities, patient populations, disease states, and imaging protocols
    • Poses challenges for designing robust models that generalize well across these variations
  • Class imbalance
    • Regions of interest may be significantly underrepresented compared to others
    • Dealing with class imbalance is crucial so that the model learns to segment all classes effectively and avoids biased predictions
  • Limited annotated data
    • Annotated data is often scarce, which necessitates techniques like data augmentation and/or transfer learning to mitigate the limitations of scarce annotations (a minimal augmentation sketch follows below)
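A minimal NumPy sketch of two simple augmentations (flipping and 90° rotation); note that for segmentation the identical transform must be applied to the image and its label mask. Elastic deformation needs more machinery and is omitted:

```python
import numpy as np

def augment(image: np.ndarray, mask: np.ndarray, rng: np.random.Generator):
    """Apply the same random flip/rotation to an image and its label mask."""
    if rng.random() < 0.5:  # random horizontal flip
        image, mask = np.flip(image, axis=1), np.flip(mask, axis=1)
    k = int(rng.integers(0, 4))  # random multiple of 90 degrees
    return np.rot90(image, k), np.rot90(mask, k)

rng = np.random.default_rng(0)
img_aug, mask_aug = augment(np.zeros((64, 64)), np.zeros((64, 64), np.uint8), rng)
```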


Model Training: learn from data & extract useful info

[Flow diagram: data pre-processing splits the data into train and test sets. A hyperparameter-initialized model (ML architecture: U-Net) learns from the train input/output pairs, yielding a trained model. The trained model maps the test input to a predicted output, which is compared against the test output for model evaluation.]
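A skeletal sketch of the train/evaluate cycle in the diagram, assuming PyTorch and that the model, loss, metric, and data loaders are defined elsewhere (all names here are placeholders, not from the slides):

```python
import torch

def train_one_epoch(model, loader, loss_fn, optimizer, device="cpu"):
    """One pass over the training set: predict, compare to labels, update weights."""
    model.train()
    for images, labels in loader:  # train input / train output pairs
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        preds = model(images)          # predicted segmentation
        loss = loss_fn(preds, labels)  # compare prediction with ground truth
        loss.backward()                # learn from the error
        optimizer.step()

@torch.no_grad()
def evaluate(model, loader, metric_fn, device="cpu"):
    """Model evaluation: compare predicted output against held-out test output."""
    model.eval()
    scores = [metric_fn(model(x.to(device)), y.to(device)) for x, y in loader]
    return sum(scores) / len(scores)
```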


Model Training: hyper-parameters fine-tuning

[Same flow diagram as the previous slide, with a feedback loop: hyperparameters are fine-tuned and the train/evaluate cycle repeated until the best/final model is obtained.]


Multiple design choices are needed to obtain an optimal model

U-Net and its Variants

  • Attention U-Net
  • Residual U-Net
  • Dense U-Net
  • Ensemble U-Net
  • Adversarial U-Net
  • Inception U-Net
  • Recurrent U-Net
  • 2.5D U-Net

Other Design Choices

  • Data Augmentation
    • Rotation, flipping, scaling
    • Elastic deformation
  • Loss function
    • Dice loss
    • Cross-entropy loss
  • Qualification Metric
    • Dice coefficient
    • Hausdorff distance

(A minimal Dice sketch follows below.)
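A minimal sketch of the Dice coefficient and the corresponding Dice loss from the list above (binary case, with pred assumed to hold probabilities; the eps smoothing term is a common convention, not from the slides):

```python
import torch

def dice_coefficient(pred: torch.Tensor, target: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    """Dice = 2|A ∩ B| / (|A| + |B|); 1.0 means perfect overlap."""
    pred, target = pred.flatten(), target.flatten()
    intersection = (pred * target).sum()
    return (2 * intersection + eps) / (pred.sum() + target.sum() + eps)

def dice_loss(pred: torch.Tensor, target: torch.Tensor) -> torch.Tensor:
    """Minimizing 1 - Dice maximizes overlap; comparatively robust to class imbalance."""
    return 1 - dice_coefficient(pred, target)
```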



Post-Processing: potential further improvement

  • Additional steps after the initial segmentation output of the U-Net model
  • [Potentially] refine and improve the segmentation results
  • Involves applying various techniques to address specific challenges and enhance the quality of the segmentation output:
    • Smoothing operations
      • Create more coherent segmentations
    • Conditional rules
      • Domain-specific rules [shape constraints]
    • Connected component analysis
      • Better connectivity of segmented objects (see the sketch below)
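A minimal sketch of connected component analysis using SciPy's ndimage; keeping only the largest connected foreground region is one common post-processing rule (the "keep largest" choice is an illustrative assumption, not from the slides):

```python
import numpy as np
from scipy import ndimage

def keep_largest_component(mask: np.ndarray) -> np.ndarray:
    """Suppress spurious islands by keeping only the largest connected region."""
    labeled, n = ndimage.label(mask)  # assign an integer id to each connected blob
    if n == 0:
        return mask
    sizes = ndimage.sum(mask, labeled, index=range(1, n + 1))  # pixels per blob
    largest = 1 + int(np.argmax(sizes))
    return (labeled == largest).astype(mask.dtype)
```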



… Oops! The U-Net pipeline has major drawbacks

  • Cumbersome U-Net pipeline design
    • Most design choices are highly dependent on each other
      • Many U-Net variants
      • Number of neurons, number of convolutional layers, learning rate, dropout, loss function, qualification metric…
    • Difficult to follow the literature and ascertain design choices that generalize beyond the experiments that demonstrate them

  • Time-consuming
    • Current practice is expert-driven
    • Involves manual trial-and-error experiments
    • Specific to the task at hand


Systematic U-Net pipeline design? No New U-Net!

  • Motivation:
    • Researchers don't want to design a new U-Net pipeline every time they have a new segmentation task.
  • Thought:
    • Is there a higher-level model that designs a U-Net for me?

  • Solution:
    • nnU-Net (no new U-Net)
    • The method designs a U-Net pipeline for the specific dataset and segmentation task
    • Achieves state-of-the-art performance on several medical segmentation benchmarks

Isensee, F.; Jäger, P. F.; Kohl, S. A. A.; Petersen, J.; Maier-Hein, K. H. Automated Design of Deep Learning Methods for Biomedical Image Segmentation. arXiv 2020, arXiv:1904.08128. https://arxiv.org/abs/1904.08128


Pipeline comparison: expert-driven vs. nnU-Net


nnU-Net: an ML to design ML(s) to make predictions

A segmentation algorithm can be formalized as ŷ = f(x; θ), where:

  • f = segmentation algorithm (U-Net)
  • x = input image
  • ŷ = predicted segmentation
  • θ = set of hyper-parameters

nnU-Net formalizes the process of adjusting θ based on the dataset, θ = g(X, Y), where:

  • g = nnU-Net
  • X, Y = dataset properties
  • θ = the formalized optimal set of hyper-parameters


nnU-Net configures the segmentation pipeline using a 3-step recipe

  • Fixed parameters
    • Are not adapted
    • Certain architecture and training properties that can simply be used all the time
    • E.g., nnU-Net's loss function, (most of) the data augmentation strategy, and the learning rate
  • Rule-based parameters
    • Use the dataset fingerprint to adapt certain segmentation pipeline properties by following hard-coded heuristic rules (a toy sketch follows below)
    • E.g., the patch size, network topology, and batch size are optimized jointly given some GPU memory constraint
  • Empirical parameters
    • Are essentially learned by trial and error
    • E.g., the optimization of the post-processing strategy
  • Training time estimate: 18 hrs – 3 days (dependent on dataset)
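As a toy illustration only (this is NOT nnU-Net's actual heuristic, just a sketch of the rule-based idea): derive the patch size from the dataset fingerprint's median image shape, then shrink the batch size until an assumed memory budget is met. All names and numbers below are made up:

```python
def configure(median_shape, mem_budget_voxels: int = 2**18):
    """Toy heuristic: derive patch/batch size from a dataset fingerprint under a memory budget."""
    patch = [min(s, 256) for s in median_shape]  # cap the patch at the median image size
    batch = 8
    voxels = patch[0] * patch[1]
    while batch > 1 and batch * voxels > mem_budget_voxels:
        batch //= 2  # trade batch size for patch size under the fixed budget
    return {"patch_size": patch, "batch_size": batch}

print(configure([512, 512]))  # -> {'patch_size': [256, 256], 'batch_size': 4}
```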


nnU-Net output & its performance in competitions

Based on a given dataset, nnU-Net creates three U-Net configurations:

  • a 2D U-Net (for 2D and 3D datasets)
  • a 3D U-Net that operates at high image resolution (for 3D datasets only)
  • a 3D U-Net cascade, where a first 3D U-Net operates on low-resolution images and a second high-resolution 3D U-Net then refines the predictions of the former (for 3D datasets with large image sizes only)
  • Image segmentation time estimate: <60 s – 10 min (dependent on dataset)

nnU-Net outcompetes many specialized deep learning pipelines, and in the [KiTS challenge] semi-target setting (lung tumor segmentation) it achieved the best performance!


nnU-Net could be sub-optimal: possible further improvements!

nnU-Net can be suboptimal for some segmentation tasks:

  • It was developed with a focus on the Dice coefficient [2D scenarios] as the performance metric, which may not be optimal for other metrics [3D scenarios]
  • Unconsidered dataset properties could exist, which may cause suboptimal segmentation performance
  • Post-processing techniques specific to our dataset may not be included in nnU-Net

For highly domain-specific cases, nnU-Net should be seen as a good starting point for necessary modifications.

E.g., in this study, the proposed modifications to the default nnU-Net pipeline substantially improved the results, both in training-set cross-validation and on the official validation set.


Further Improvements & Next Steps

Further improvements we can make:

  • Train our own nnU-Net on extended benchmark datasets [lung-tumor-oriented] (candidate datasets listed below)
  • Faster reaction (inference) time:
    • 10 min → ~1 s [this study]

Next steps:

  • Pretrained model performance test
  • Train our nnU-Net [lung cancer oriented]
  • Validation on UCSF datasets

Future steps:

  • Extended segmentation tasks
  • User-friendly reaction time
  • 3D interactive feature

Candidate benchmark datasets:

  • Lung Nodule Analysis 2016: 880 patients, 2D
  • Kaggle Data Science Bowl: 1397 patients, 2D
  • The Lung Image Database Consortium dataset [LIDC]: 1024 patients, 2D


Index Page: annotated perspective papers & ML terms

If you are interested, please check out this doc that summarizes the related literature.

Each paper is highlighted at four levels:

  • Important content that is highly related to our project
  • Semi-related examples / explanations / supplements
  • ML terms that are annotated in more detail
  • Alternative methods / research themes that could be further explored

If you are interested, please also check out this doc that gives more detailed information on the machine-learning terms noted in the literature review.

  • The doc is organized along the standard pipeline of biomedical image segmentation
  • For each major action, the functionality of the action and the methods are noted


Summary: nnU-Net for biomedical segmentation task

ML image seg pipeline

  • Model architecture
    • U-Net
  • Pre-processing
    • Data augmentation
  • Model training
    • hyperparameter
  • Post-processing
    • Further improve

nnU-Net to design the U-Net (motivated by the cumbersome manual design)

  • 3-step recipe:
    • Fixed
    • Rule-based
    • Empirical
  • Default output:
    • 2D
    • 3D
    • 3D cascade
  • Good starting point

Application & next steps (since nnU-Net may be sub-optimal)

  • Pretrained model test
  • Train task-specific nnU-Net
  • Validate on UCSF dataset


THANK U

Selina Liu😊