1 of 11

SEMINAR WEB ENGINEERING (WS 2020/2021) | DATA AUGMENTATION

Data Augmentation

Model Checkpointing

2 of 11

  • What are model checkpoints?
    • Saved snapshots of model state during training to resume or restore later.
  • Basic concept: Like saving your progress in a game. A checkpoint stores what is needed to restore the model at one point in time (weights, optimizer state, training progress).

Introduction

3 of 11

  • Real-world analogy: Think of checkpoints like taking snapshots on your phone - you can always go back to that exact moment.
  • Why we need them: Training can take days. Without a checkpoint, a power failure or crash means starting again from scratch.

What is a Checkpoint?

4 of 11

Model weights

  • The actual numbers your model has learned.
  • Think of them as your model's "memory".
  • Stored as large arrays of floating-point numbers.
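Because the weights are just arrays of floating-point numbers, the size of a checkpoint can be estimated from the parameter count. The sketch below assumes 32-bit floats (4 bytes per value); the parameter count and the helper name `checkpoint_size_mb` are made up for illustration.

```python
def checkpoint_size_mb(num_parameters: int, bytes_per_value: int = 4) -> float:
    """Approximate size of the weights alone, in megabytes (1 MB = 10**6 bytes)."""
    return num_parameters * bytes_per_value / 1_000_000


# Hypothetical model with 125 million float32 parameters.
weights_mb = checkpoint_size_mb(125_000_000)
print(f"{weights_mb:.0f} MB")  # prints "500 MB"
```

This counts the weights only. Anything else stored in the checkpoint, such as optimizer state, adds to the file size.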

What's Inside a Checkpoint?

Source: Medium

5 of 11

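  • Stores the optimizer's internal running values, e.g. Adam's first- and second-moment estimates (often described as momentum and velocity)
  • Important because resuming without it resets these values, so the first updates after a restart behave differently from an uninterrupted run
  • Example: Adam keeps moving averages of past gradients and squared gradients and uses them to scale future updates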
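To make the Adam example concrete, here is a minimal sketch of one Adam update for a single scalar parameter, written in plain Python. The `state` dictionary (step counter, `m`, `v`) is exactly the part that has to go into the checkpoint. The gradients are invented numbers and the hyperparameters are the commonly used defaults, assumed here for illustration.

```python
import math


def adam_step(param, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Apply one Adam update to a scalar parameter and update `state` in place."""
    state["step"] += 1
    # Moving average of past gradients (first moment).
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    # Moving average of past squared gradients (second moment).
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad * grad
    # Bias correction compensates for both averages starting at zero.
    m_hat = state["m"] / (1 - beta1 ** state["step"])
    v_hat = state["v"] / (1 - beta2 ** state["step"])
    return param - lr * m_hat / (math.sqrt(v_hat) + eps)


state = {"step": 0, "m": 0.0, "v": 0.0}
w = 1.0
for g in [0.5, 0.4, 0.3]:  # made-up gradients
    w = adam_step(w, g, state)

# Next update when resuming WITH the saved optimizer state ...
resumed_w = adam_step(w, 0.3, dict(state))
# ... versus resuming with a freshly initialised optimizer.
reset_w = adam_step(w, 0.3, {"step": 0, "m": 0.0, "v": 0.0})
print(resumed_w, reset_w)
```

The two results differ even though the weight and the gradient are identical, which is why a checkpoint that only contains the weights does not reproduce the original training run.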

Optimizer state

6 of 11

  • Current epoch number (which training round you're on)
  • Current learning rate
  • Best validation accuracy achieved so far
  • Like keeping track of your high score in a game
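A checkpoint can be pictured as one dictionary that bundles the weights, the optimizer state, and the progress values listed above. The sketch below is framework-agnostic and uses Python's `pickle` as a stand-in; in PyTorch the same role is usually played by `torch.save` together with `model.state_dict()` and `optimizer.state_dict()`. All values and the file name are placeholders.

```python
import os
import pickle
import tempfile


def save_checkpoint(path, weights, optimizer_state, epoch, learning_rate, best_accuracy):
    """Write everything needed to resume training into a single file."""
    checkpoint = {
        "weights": weights,
        "optimizer_state": optimizer_state,
        "epoch": epoch,
        "learning_rate": learning_rate,
        "best_accuracy": best_accuracy,
    }
    with open(path, "wb") as f:
        pickle.dump(checkpoint, f)


def load_checkpoint(path):
    """Read a checkpoint written by save_checkpoint."""
    with open(path, "rb") as f:
        return pickle.load(f)


# Placeholder values standing in for a real model and optimizer.
path = os.path.join(tempfile.mkdtemp(), "checkpoint.pkl")
save_checkpoint(path, weights=[0.12, -0.5], optimizer_state={"step": 300},
                epoch=10, learning_rate=0.001, best_accuracy=85.3)

restored = load_checkpoint(path)
start_epoch = restored["epoch"] + 1  # continue with the epoch after the saved one
print(start_epoch, restored["best_accuracy"])
```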

Training progress

7 of 11

How to Save Checkpoints?

8 of 11

  • After every epoch (might waste storage)
  • Every N epochs (e.g., every 5 epochs)
  • When you hit a new "high score" (best validation accuracy)
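The "every N epochs" and "new high score" rules can be combined in a few lines. This is a sketch with invented validation accuracies; `should_save` is a hypothetical helper, not part of any library.

```python
def should_save(epoch, val_accuracy, best_accuracy, every_n=5):
    """Return (save_periodic, save_best) for the current epoch."""
    save_periodic = epoch % every_n == 0
    save_best = val_accuracy > best_accuracy
    return save_periodic, save_best


best = float("-inf")
history = [70.1, 74.8, 74.2, 79.0, 78.5, 78.9]  # made-up validation accuracies
saved_epochs = []

for epoch, acc in enumerate(history, start=1):
    periodic, is_best = should_save(epoch, acc, best)
    if is_best:
        best = acc
    if periodic or is_best:
        saved_epochs.append(epoch)

print(saved_epochs)  # [1, 2, 4, 5]
```

Epochs 1, 2 and 4 are saved because they set a new best accuracy, and epoch 5 is saved because of the periodic rule.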

When to save?

9 of 11

  • Each checkpoint might be 500 MB or larger!
  • Solution: Keep only the last 3 checkpoints plus the best one
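One way to implement "last 3 plus best" is to prune the checkpoint folder after every save. The sketch below works on empty placeholder files in a temporary folder; `prune_checkpoints` and the file names are assumptions made for the example.

```python
import os
import tempfile


def prune_checkpoints(paths_oldest_first, best_path, keep_last=3):
    """Delete every checkpoint except the newest `keep_last` and the best one."""
    keep = set(paths_oldest_first[-keep_last:]) if keep_last > 0 else set()
    keep.add(best_path)
    for path in paths_oldest_first:
        if path not in keep and os.path.exists(path):
            os.remove(path)
    return [p for p in paths_oldest_first if p in keep]


# Create six empty placeholder files standing in for real checkpoints.
directory = tempfile.mkdtemp()
paths = []
for epoch in range(1, 7):
    path = os.path.join(directory, f"model_epoch{epoch}.pth")
    open(path, "wb").close()
    paths.append(path)

# Assume epoch 2 had the best validation accuracy.
kept = prune_checkpoints(paths, best_path=paths[1])
print(sorted(os.listdir(directory)))
```

After pruning, the folder holds the checkpoints for epochs 2, 4, 5 and 6: the best one and the three most recent.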

Naming convention:

  • Bad: 'model.pth'
  • Good: 'model_epoch10_acc85.3.pth'
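A small helper keeps the naming consistent and lets a script read the epoch and accuracy back out of the file name. The pattern below follows the "good" example above; the function names are hypothetical.

```python
import re


def checkpoint_name(epoch, accuracy, prefix="model"):
    """Build a descriptive file name such as 'model_epoch10_acc85.3.pth'."""
    return f"{prefix}_epoch{epoch}_acc{accuracy:.1f}.pth"


NAME_PATTERN = re.compile(r"^.+_epoch(?P<epoch>\d+)_acc(?P<acc>\d+(?:\.\d+)?)\.pth$")


def parse_checkpoint_name(name):
    """Return (epoch, accuracy) for a descriptive name, or None if it does not match."""
    match = NAME_PATTERN.match(name)
    if match is None:
        return None
    return int(match["epoch"]), float(match["acc"])


name = checkpoint_name(10, 85.3)
print(name)                              # model_epoch10_acc85.3.pth
print(parse_checkpoint_name(name))       # (10, 85.3)
print(parse_checkpoint_name("model.pth"))  # None
```

The "bad" name carries no information, so nothing can be recovered from it.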

Storage problems

10 of 11

  • Always verify your checkpoint after saving
  • Keep backup checkpoints
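Both tips can be sketched together: write to a temporary file, reload it to check that it is readable and complete, only then move it into place, and finally confirm the backup copy matches via a checksum. This uses `pickle` and plain Python values as a stand-in for a real framework; the equality check `reloaded != checkpoint` would need adapting for tensor data. All names and paths are illustrative.

```python
import hashlib
import os
import pickle
import shutil
import tempfile


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()


def save_verified(checkpoint, path):
    """Write to a temporary file, reload it to verify, then move it into place."""
    tmp_path = path + ".tmp"
    with open(tmp_path, "wb") as f:
        pickle.dump(checkpoint, f)
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data to disk
    with open(tmp_path, "rb") as f:
        reloaded = pickle.load(f)
    if reloaded != checkpoint:
        os.remove(tmp_path)
        raise IOError("checkpoint failed verification")
    # An older checkpoint at `path` stays intact until this rename succeeds.
    os.replace(tmp_path, path)


directory = tempfile.mkdtemp()
main_path = os.path.join(directory, "model_epoch10_acc85.3.pth")
backup_path = os.path.join(directory, "backup_model_epoch10_acc85.3.pth")

checkpoint = {"weights": [0.12, -0.5], "epoch": 10, "best_accuracy": 85.3}
save_verified(checkpoint, main_path)

# Keep a backup copy and confirm it is byte-identical to the original.
shutil.copy2(main_path, backup_path)
backup_ok = file_sha256(main_path) == file_sha256(backup_path)
print(backup_ok)
```

Writing to a temporary file first means a crash in the middle of saving cannot overwrite the previous good checkpoint with a half-written one.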

Corrupted saves

11 of 11

Demo