1 of 10

Pre-Trained Models.

By Christine Muthee.

2 of 10

OUTLINE.

What are Pre-trained Models?
Why do we need them?
How do we obtain them?
Tips on using pre-trained Models.

3 of 10

1.

WHAT ARE PRETRAINED MODELS?

4 of 10

A pre-trained model is a neural network that has been trained on a large dataset and whose learned parameters can be reused.

5 of 10

CHARACTERISTICS OF PRETRAINED MODELS.

Prior Knowledge

They have learned general features (e.g., edges, shapes, and textures in images, or grammar and context in text) from vast datasets.

This knowledge serves as a foundation for new tasks.

Saved Weights

The parameters are saved after the initial training and can be loaded to initialize a model.

Open Source *

Many pre-trained models are openly available in model zoos or libraries (e.g. PyTorch, Hugging Face Hub, TensorFlow Hub).
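The "Saved Weights" point above can be illustrated with a minimal PyTorch sketch. It assumes a tiny made-up network (`make_model` is a hypothetical helper, not part of any library) and an in-memory buffer standing in for a weights file; real pre-trained checkpoints are loaded the same way, just from disk or a model hub.

```python
import io

import torch
from torch import nn


def make_model() -> nn.Module:
    # Hypothetical tiny network standing in for a real architecture.
    return nn.Sequential(nn.Linear(4, 8), nn.ReLU(), nn.Linear(8, 2))


# The "initial training" would happen here; we only keep the resulting parameters.
trained = make_model()

# Save just the parameters (the state dict), not the Python object itself.
buffer = io.BytesIO()  # in-memory stand-in for a file such as "weights.pt"
torch.save(trained.state_dict(), buffer)

# Later: build the same architecture and initialize it from the saved weights.
buffer.seek(0)
restored = make_model()
restored.load_state_dict(torch.load(buffer))

# Both models now compute the same function.
x = torch.randn(3, 4)
same_output = torch.equal(trained(x), restored(x))
print(same_output)
```

The architecture has to match the saved parameters, which is why model zoos usually ship the model definition together with the weights.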

6 of 10

BIG CONCEPT

Transfer learning: the practice of taking a pre-trained model and adapting it to a new but related task.

7 of 10

2.

WHY DO WE NEED THEM?

8 of 10

Importance of Pretrained Models.

Reduced Training Time and Cost.

Avoiding Overfitting.

Better Performance on Small Datasets.

Feature Extraction.

Benchmarks.

Source: CS231N CNN for Visual Rec

Reduced Training Time and Cost: Training a deep neural network from scratch can be time-consuming and resource-intensive. Training a modern CNN on ImageNet from scratch can take days, or even weeks, on powerful hardware. With a pre-trained model you can skip this step and save time, because the model has already converged on a large dataset.
Better Performance on Small Datasets: In your workplace or research, it is hard to have a very large dataset for every new task. Pre-trained models come in handy here, giving you comparable performance with a small amount of data. The model has learned general features from a big dataset and can apply them to your problem. As a result, models fine-tuned from pre-trained weights often outperform those trained from scratch on small datasets.
Avoiding Overfitting: Initializing with pre-trained weights can act as a form of regularization. The model's parameters start from values that already encode general knowledge, rather than from random values. This often means you need less data to train effectively, and you are less likely to overfit (memorize) the small training set you have.
Feature Extraction: Even without fine-tuning, pre-trained models can be used as feature extractors. Early layers of CNNs learn very generic features (edges, colors, basic shapes) that are useful for many tasks. You can freeze the convolutional base of a CNN, use it to transform raw images into feature vectors, and then feed those vectors to a classifier (the head).
Benchmarks: Popular pre-trained models (ResNet, VGG, BERT, etc.) are often well tested and benchmarked. Using them gives you a reliable starting point.
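The feature-extraction idea in these notes can be sketched roughly as follows. This is only an illustration: the `backbone` here is a small randomly initialized stand-in so the snippet runs offline, whereas in practice it would be a pre-trained convolutional base from a model zoo. The 5-class task, batch size, and learning rate are made-up values.

```python
import torch
from torch import nn

# Hypothetical stand-in for a pre-trained convolutional base. In practice this
# would come from a model zoo (e.g. a ResNet with its final layer removed).
backbone = nn.Sequential(
    nn.Conv2d(3, 8, kernel_size=3, padding=1),
    nn.ReLU(),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),  # -> feature vectors of size 8
)

# Freeze the base: its parameters receive no gradients and stay unchanged.
for param in backbone.parameters():
    param.requires_grad = False
backbone.eval()

# New classifier head for a hypothetical 5-class task; only this part is trained.
head = nn.Linear(8, 5)
optimizer = torch.optim.SGD(head.parameters(), lr=0.1)

images = torch.randn(16, 3, 32, 32)  # dummy batch of images
labels = torch.randint(0, 5, (16,))  # dummy labels

before = [p.clone() for p in backbone.parameters()]
with torch.no_grad():
    features = backbone(images)  # raw images -> feature vectors

# One training step that updates the head only.
loss = nn.functional.cross_entropy(head(features), labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()

backbone_unchanged = all(
    torch.equal(a, b) for a, b in zip(before, backbone.parameters())
)
```

Because the base is frozen, the feature vectors can also be computed once and cached, which makes training the head very cheap.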

9 of 10

3. SOURCES OF PRE-TRAINED MODELS.

1. PyTorch Model Zoo.

2. Hugging Face Hub.

3. TensorFlow Hub.

4. ONNX Model Zoo.

5. Kaggle (Pretrained Models).

And many more …

PyTorch has libraries such as torchvision from which you can access pre-trained models. PyTorch also offers a broader model hub (PyTorch Hub) for models contributed by the community (including NLP and audio models).
Hugging Face Hub: Hugging Face hosts hundreds of thousands of models contributed by the community and organizations. The Hub is a rich resource for models not just in PyTorch, but also in TensorFlow, Keras, and JAX.
Although we use PyTorch for our assignments, TensorFlow also offers an open-source repository of pre-trained models. You can check out TensorFlow Hub to load and reuse these models in TF/Keras.
ONNX (Open Neural Network Exchange) is an open format for representing models. The ONNX Model Zoo is a collection of pre-trained, state-of-the-art models in the ONNX format. These models can be used across frameworks that support ONNX. The model zoo includes vision models (classification, detection), NLP models, etc., often converted from PyTorch or TensorFlow.
Kaggle has a section for pre-trained models (Kaggle Kernels / Models) where community members share models they've trained, including weights. I will let you explore this by yourselves, since you're already familiar with the territory.

10 of 10

TIPS ON USING PRETRAINED MODELS.

Understand the Source Dataset.
Leverage Transfer Learning (can be parameter efficient).
Choose the Right Model for Your Task.
Use Pre-trained Models as Primary Feature Extractors.
Experiment with different Fine-Tuning Strategies.

Pre-trained models are typically trained on large datasets like ImageNet (for vision tasks) or large text corpora (for NLP tasks). Understanding the source dataset helps you gauge whether the model’s learned features are transferable to your task. Also, ensure that your input data matches the format and pre-processing steps used during the model's original training (e.g., resizing images, normalization of pixel values, etc.).
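To make the pre-processing point concrete, here is a rough sketch of preparing an image for a model trained on ImageNet. The mean and standard deviation are the per-channel values commonly used with ImageNet-trained torchvision models; the `preprocess` helper and the 224-pixel input size are assumptions for illustration, and real pipelines usually resize and then center-crop rather than resize directly.

```python
import torch

# Per-channel statistics commonly used with ImageNet-trained torchvision models.
IMAGENET_MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
IMAGENET_STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)


def preprocess(image_uint8: torch.Tensor, size: int = 224) -> torch.Tensor:
    """Turn a (3, H, W) uint8 image into a normalized (3, size, size) float tensor."""
    image = image_uint8.float() / 255.0  # scale pixel values to [0, 1]
    # Resize to the resolution the model was trained on (simplified: no crop).
    image = torch.nn.functional.interpolate(
        image.unsqueeze(0), size=(size, size), mode="bilinear", align_corners=False
    ).squeeze(0)
    return (image - IMAGENET_MEAN) / IMAGENET_STD  # match training normalization


raw = torch.randint(0, 256, (3, 300, 400), dtype=torch.uint8)  # dummy image
ready = preprocess(raw)
```

If the model came from a hub, prefer the pre-processing it documents, since other source datasets use different sizes and statistics.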
Start with a smaller learning rate: Fine-tuning a pre-trained model requires careful learning-rate tuning. A small learning rate helps preserve the pre-trained weights while you adapt the model to your data.
Consider model size and complexity: Pre-trained models come in various sizes (e.g., ResNet-18, ResNet-50, ResNet-152), which trade accuracy against computational cost. Choose a model that balances accuracy with speed and memory use for your task. Consider the type of data as well.
Freeze lower layers: You can use pre-trained models as feature extractors by freezing the lower layers and only training a small classifier on top. This is helpful when you have limited data and still want to use the model's learned features.
Freeze layers based on task similarity: If your task is very similar to the task the model was originally trained on (e.g., image classification on ImageNet), you might only need to fine-tune the final classification layers. For more dissimilar tasks, you may need to fine-tune more layers.
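The fine-tuning tips above (small learning rate, freezing lower layers, tuning more layers for dissimilar tasks) can be combined in one sketch. Everything here is illustrative: `backbone` is a small random stand-in for pre-trained layers, and the learning rates 1e-5 and 1e-3 are assumed values, not recommendations from this course.

```python
import torch
from torch import nn

# Hypothetical stand-ins: `backbone` plays the role of the pre-trained layers,
# `head` is the new task-specific classifier.
backbone = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 16), nn.ReLU())
head = nn.Linear(16, 3)
model = nn.Sequential(backbone, head)

# Freeze the earliest layer; leave the later backbone layer trainable.
for param in backbone[0].parameters():
    param.requires_grad = False

# Parameter groups: small learning rate for the pre-trained layers,
# larger learning rate for the freshly initialized head.
optimizer = torch.optim.Adam(
    [
        {"params": [p for p in backbone.parameters() if p.requires_grad], "lr": 1e-5},
        {"params": head.parameters(), "lr": 1e-3},
    ]
)

# One training step on dummy data.
inputs, targets = torch.randn(8, 10), torch.randint(0, 3, (8,))
loss = nn.functional.cross_entropy(model(inputs), targets)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```

For a task very close to the original one, more of the backbone can stay frozen; for a more dissimilar task, more layers can be unfrozen and added to the low-learning-rate group.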

These are just a few tips on using pre-trained models; there are others that you will come across as you continue to work with DL models. Remember that in this course WE DO NOT USE PRETRAINED MODELS! The purpose of this is for us to learn how to tweak models and understand what is happening under the hood.