O-TPT: Orthogonality Constraints for Calibrating Test-Time Prompt Tuning in Vision-Language Models

GitHub Repository

Motivation

Vision-language models (VLMs) are powerful but struggle with calibration.

Test-time prompt tuning (TPT) improves accuracy, but the tuned predictions are often overconfident.

Poor calibration is risky in domains like healthcare and autonomous systems.

Existing methods such as C-TPT focus on the L2 dispersion of text features but ignore their angular structure.

Core Insight

Calibration error correlates with the cosine similarity between class text features.

Greater angular separation = better calibration.

Dispersion alone (via the L2 norm) underutilises the feature space.
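To make the contrast concrete, here is a minimal sketch that computes two statistics for a set of class text features: the mean pairwise cosine similarity (the angular quantity linked to calibration error above) and a simple L2 dispersion, taken as the mean distance of each unit-normalized feature from the centroid. That dispersion formula is an assumption standing in for C-TPT's measure; the function names and toy vectors are illustrative and not taken from the O-TPT code.

```python
import math
from itertools import combinations

def normalize(v):
    """Scale a vector to unit L2 norm so features are compared on the unit hypersphere."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def mean_pairwise_cosine(features):
    """Average cosine similarity over all unordered pairs of distinct class features."""
    unit = [normalize(f) for f in features]
    sims = [sum(a * b for a, b in zip(u, w)) for u, w in combinations(unit, 2)]
    return sum(sims) / len(sims)

def l2_dispersion(features):
    """Mean L2 distance of each unit-normalized feature from their centroid (assumed C-TPT-style measure)."""
    unit = [normalize(f) for f in features]
    dim = len(unit[0])
    centroid = [sum(u[d] for u in unit) / len(unit) for d in range(dim)]
    return sum(math.dist(u, centroid) for u in unit) / len(unit)

# Toy features: three mutually orthogonal classes vs. three classes clustered near one direction.
orthogonal = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
clustered = [[1.0, 0.1, 0.0], [1.0, 0.0, 0.1], [1.0, 0.1, 0.1]]
print("orthogonal:", mean_pairwise_cosine(orthogonal), l2_dispersion(orthogonal))
print("clustered: ", mean_pairwise_cosine(clustered), l2_dispersion(clustered))
```

The cosine statistic reads the angular structure directly, while the L2 dispersion summarizes spread around a single centre.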

Our Method – O-TPT

Orthogonality constraint on class text features

Encourages low cosine similarity between text embeddings

Promotes better angular utilisation of hyperspherical space
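One common way to express an orthogonality constraint is to push the Gram matrix of the unit-normalized class text features toward the identity, penalizing the squared Frobenius norm of (E E^T - I), where each row of E is one class feature. The sketch below assumes that form; the exact loss, its weight (`lam`, a hypothetical name) and how it is combined with the TPT objective may differ in O-TPT itself.

```python
import math

def normalize(v):
    """Scale a vector to unit L2 norm."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def orthogonality_penalty(features):
    """Squared Frobenius norm of (E E^T - I), rows of E being unit-normalized class text features.

    Off-diagonal Gram entries are pairwise cosine similarities, so the penalty is
    zero exactly when every pair of class features is orthogonal.
    """
    unit = [normalize(f) for f in features]
    k = len(unit)
    penalty = 0.0
    for i in range(k):
        for j in range(k):
            gram_ij = sum(a * b for a, b in zip(unit[i], unit[j]))
            target = 1.0 if i == j else 0.0
            penalty += (gram_ij - target) ** 2
    return penalty

def total_loss(tpt_loss, features, lam=1.0):
    """Hypothetical combined objective: a TPT loss value plus a weighted orthogonality term."""
    return tpt_loss + lam * orthogonality_penalty(features)

print(orthogonality_penalty([[1.0, 0.0], [0.0, 1.0]]))  # orthogonal classes
print(orthogonality_penalty([[1.0, 0.0], [1.0, 1.0]]))  # classes 45 degrees apart
```

In an actual test-time loop the features would come from the text encoder for each class prompt, and the penalty would be minimized with respect to the learnable prompt tokens through autograd; the plain loops here are only for readability.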

Experimental Setup

Backbones: CLIP-B/16 (ViT) and CLIP-RN50 (ResNet)

Tasks:

Fine-grained: Food101, DTD, Pets, Cars, etc. (9 datasets)

General: ImageNet, Caltech101

Natural distribution shifts: ImageNet-A, ImageNet-V2, ImageNet-R, ImageNet-Sketch

Metrics: Accuracy, Expected Calibration Error (ECE), Static Calibration Error (SCE)
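For reference, ECE groups predictions into equal-width bins by top-class confidence and sums the gap between accuracy and mean confidence in each bin, weighted by the fraction of samples in it; SCE applies the same binning separately to every class probability and averages over classes. The sketch below follows these standard definitions; `n_bins` and the bin-edge convention are assumptions, and the evaluation code behind the reported numbers may differ.

```python
def _binned_gap(pairs, n_total, n_bins):
    """Sum over bins of (bin size / n_total) * |accuracy - mean confidence|.

    `pairs` is a list of (confidence, is_correct) tuples with confidence in [0, 1].
    """
    bins = [[] for _ in range(n_bins)]
    for conf, correct in pairs:
        idx = min(int(conf * n_bins), n_bins - 1)  # equal-width bins; conf == 1.0 goes to the last bin
        bins[idx].append((conf, correct))
    gap = 0.0
    for b in bins:
        if b:
            mean_conf = sum(c for c, _ in b) / len(b)
            accuracy = sum(1 for _, ok in b if ok) / len(b)
            gap += len(b) / n_total * abs(accuracy - mean_conf)
    return gap

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE over top-1 predictions; probs[i] is the class-probability vector of sample i."""
    pairs = []
    for p, y in zip(probs, labels):
        pred = max(range(len(p)), key=p.__getitem__)
        pairs.append((p[pred], pred == y))
    return _binned_gap(pairs, len(probs), n_bins)

def static_calibration_error(probs, labels, n_bins=15):
    """Classwise (static) calibration error: per-class binned gap, averaged over classes."""
    n_classes = len(probs[0])
    total = 0.0
    for k in range(n_classes):
        pairs = [(p[k], y == k) for p, y in zip(probs, labels)]
        total += _binned_gap(pairs, len(probs), n_bins)
    return total / n_classes

probs = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7]]
labels = [0, 1, 1]
print(expected_calibration_error(probs, labels, n_bins=10))
print(static_calibration_error(probs, labels, n_bins=10))
```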

Experiments

CLIP-B/16

CLIP-RN50

Experiments

Natural Distribution Shift

Calibration Quality – Reliability Diagrams

C-TPT drifts away from the diagonal (poor calibration)

O-TPT stays close to the diagonal (well calibrated)
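A reliability diagram plots, for each confidence bin, the accuracy of the predictions in that bin against their mean confidence; a perfectly calibrated model lies on the diagonal where the two are equal. As a minimal sketch, assuming equal-width bins and top-1 confidences with correctness flags as input, the per-bin points such a plot would draw can be computed as follows; `reliability_points` and the bin count are illustrative names, and the plotting itself is left out.

```python
def reliability_points(confidences, correct, n_bins=10):
    """Return (bin_lower, bin_upper, mean_confidence, accuracy, count) for each non-empty bin.

    Points with accuracy below mean confidence sit under the diagonal (overconfidence);
    points above it indicate underconfidence.
    """
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)  # conf == 1.0 is placed in the last bin
        bins[idx].append((conf, ok))
    points = []
    for idx, b in enumerate(bins):
        if not b:
            continue
        mean_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        points.append((idx / n_bins, (idx + 1) / n_bins, mean_conf, accuracy, len(b)))
    return points

# Toy overconfident predictor: confidence above 0.9, but only half of the predictions are correct.
confidences = [0.92, 0.95, 0.91, 0.97]
correct = [True, False, True, False]
for lo, hi, conf, acc, n in reliability_points(confidences, correct):
    print(f"bin {lo:.1f}-{hi:.1f}: mean conf={conf:.3f}, acc={acc:.2f}, n={n}")
```

A drift from the diagonal like the one described for C-TPT corresponds to bins where the accuracy and mean-confidence values printed here are far apart.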

Conclusion

O-TPT improves calibration in test-time prompt tuning without requiring supervision

Enforcing orthogonality on textual features leads to more reliable confidence estimates

Consistently achieves lower ECE across datasets, backbones, and domain shifts

Future Directions:

Dynamic Prompt Regularization: Adjust orthogonality strength based on uncertainty or input shifts.

Medical and Safety-Critical Applications: Deploy O-TPT in high-risk domains such as radiology or autonomous driving, where calibrated confidence is essential.

Thank You

GitHub Repository