O-TPT: Orthogonality Constraints for Calibrating Test-Time Prompt Tuning in Vision-Language Models

GitHub Repository

Motivation

    • Vision-language models (VLMs) are powerful but struggle with calibration.

    • Test-time prompt tuning (TPT) improves accuracy but often yields overconfident predictions.

    • Poor calibration is risky in domains like healthcare and autonomous systems.

    • Existing methods like C-TPT focus on L2 dispersion but ignore angular structure.

Core Insight

    • Calibration error rises with the cosine similarity between class text features.
    • Greater angular separation = better calibration.
    • Dispersion alone (measured by L2 distance) underutilises the feature space.
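
A toy sketch of the contrast above, using only the Python standard library. The function names and the example vectors are illustrative assumptions, not taken from the O-TPT code, and real CLIP text features are high-dimensional. It shows that an L2 dispersion measure can grow while the angles between features stay exactly the same.

```python
import math

def l2_dispersion(features):
    """Mean L2 distance of each feature vector from the centroid."""
    n, dim = len(features), len(features[0])
    centroid = [sum(f[d] for f in features) / n for d in range(dim)]
    return sum(math.dist(f, centroid) for f in features) / n

def mean_pairwise_cosine(features):
    """Mean cosine similarity over all distinct pairs of feature vectors."""
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        return dot / (math.hypot(*a) * math.hypot(*b))
    pairs = [(i, j) for i in range(len(features))
             for j in range(i + 1, len(features))]
    return sum(cos(features[i], features[j]) for i, j in pairs) / len(pairs)

# Hypothetical "class text features": three vectors inside a narrow cone.
narrow = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0], [0.9, 0.0, 0.1]]
# Scaling by 10 multiplies the L2 dispersion by 10 but changes no angle.
scaled = [[10 * x for x in f] for f in narrow]
# Mutually orthogonal vectors: maximal angular separation for three classes.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

for name, feats in [("narrow", narrow), ("scaled", scaled), ("orthogonal", orthogonal)]:
    print(name, round(l2_dispersion(feats), 4), round(mean_pairwise_cosine(feats), 4))
```

In this toy setting, "scaled" has ten times the dispersion of "narrow" yet the same mean cosine similarity, while "orthogonal" brings the cosine similarity to zero.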

Our Method – O-TPT

Orthogonality constraint on class text features

    • Encourages low cosine similarity between text embeddings
    • Promotes better angular utilisation of hyperspherical space
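
The slide does not state the exact loss, so the following is a minimal sketch under an assumption: the constraint is written as the squared Frobenius norm of (E Eᵀ − I), where E holds the L2-normalised class text features as rows. The names `normalize` and `orthogonality_penalty` are hypothetical. The penalty is zero when all class features are mutually orthogonal and grows as their cosine similarities move away from zero.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.hypot(*v)
    return [x / norm for x in v]

def orthogonality_penalty(text_features):
    """Squared Frobenius norm of (E E^T - I) for row-normalised features E.

    Assumed form of an orthogonality regulariser; not confirmed by the slides.
    """
    E = [normalize(f) for f in text_features]
    penalty = 0.0
    for i, a in enumerate(E):
        for j, b in enumerate(E):
            gram = sum(x * y for x, y in zip(a, b))  # cosine similarity of rows i, j
            target = 1.0 if i == j else 0.0          # identity matrix entry
            penalty += (gram - target) ** 2
    return penalty

# Orthogonal class features incur no penalty, whatever their norms.
print(orthogonality_penalty([[2.0, 0.0], [0.0, 3.0]]))
# Two identical class features are penalised on both off-diagonal entries.
print(orthogonality_penalty([[1.0, 0.0], [1.0, 0.0]]))
```

In a test-time tuning loop, a term like this would typically be added to the tuning objective with a weighting coefficient; that weighting is also an assumption here.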

Experimental Setup

    • Backbones: CLIP-B/16 (ViT) and CLIP-RN50 (ResNet)
    • Tasks:
      • Fine-grained: Food101, DTD, Pets, Cars, etc. (9 datasets)
      • General: ImageNet, Caltech101
      • Natural distribution shifts: ImageNet-A, ImageNet-V2, ImageNet-R, ImageNet-Sketch
    • Metrics: Accuracy, Expected Calibration Error (ECE), Static Calibration Error (SCE)
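
For reference, a minimal sketch of how ECE is commonly computed: predictions are grouped into equal-width confidence bins, and the gap between accuracy and mean confidence in each bin is averaged, weighted by bin size. The function name, the default of 10 bins, and the half-open bin edges are assumptions made for illustration, not the evaluation code behind the reported numbers.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE with equal-width bins over [0, 1].

    confidences: predicted probability of the chosen class, one per sample.
    correct: 1 if the prediction was right, 0 otherwise.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Half-open bins [k/n_bins, (k+1)/n_bins); conf == 1.0 goes in the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, hit))
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += (len(members) / n) * abs(accuracy - mean_conf)
    return ece

# Four predictions at 90% confidence, three of them correct.
print(expected_calibration_error([0.9, 0.9, 0.9, 0.9], [1, 1, 1, 0]))
```

SCE follows the same binning idea but is computed per class over the full probability vector and then averaged across classes.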

Experiments

CLIP-B/16

CLIP-RN50

Experiments

Natural Distribution Shift

Calibration Quality – Reliability Diagrams

    • C-TPT drifts away from the diagonal (poorer calibration)
    • O-TPT stays close to the diagonal (well calibrated)
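
A reliability diagram plots accuracy against mean confidence per confidence bin; points on the diagonal mean the two agree. The sketch below computes those points for a made-up set of predictions. The function name `reliability_bins` and the sample values are hypothetical and unrelated to the reported results.

```python
def reliability_bins(confidences, correct, n_bins=10):
    """Return (mean confidence, accuracy, count) for each non-empty bin."""
    totals = [[0.0, 0, 0] for _ in range(n_bins)]  # confidence sum, hits, count
    for conf, hit in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        totals[idx][0] += conf
        totals[idx][1] += int(hit)
        totals[idx][2] += 1
    return [(s / n, h / n, n) for s, h, n in totals if n > 0]

# Made-up predictions: two at 55% confidence, two at 95%, half of each correct.
points = reliability_bins([0.95, 0.95, 0.55, 0.55], [1, 0, 1, 0])
for mean_conf, accuracy, count in points:
    # A positive gap means the point lies below the diagonal (overconfident).
    print(f"confidence={mean_conf:.2f} accuracy={accuracy:.2f} "
          f"gap={mean_conf - accuracy:+.2f} n={count}")
```

Here the high-confidence bin sits far below the diagonal, which is the kind of drift the diagram is meant to expose.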

Conclusion

    • O-TPT improves calibration in test-time prompt tuning without requiring supervision
    • Enforcing orthogonality on textual features leads to more reliable confidence estimates
    • Consistently achieves lower ECE across datasets, backbones, and domain shifts
    • Future directions:
      • Dynamic Prompt Regularization: Adjust orthogonality strength based on uncertainty or input shifts.

      • Medical and Safety-critical Applications: Deploy O-TPT in high-risk domains like radiology or autonomous driving where calibrated confidence is essential.

Thank You

GitHub Repository