⚕ Caduceus Distill Report
Agenda
Goal:
Can distillation be applied to the Caduceus model to reduce inference cost while preserving good performance?
Distillation
Caduceus Model (teacher) → Distillation → Learned Model (student)
About distillation
"Hard" Loss
"Soft" Loss
L = α("Soft" Loss) + (1 - α)("Hard" Loss)
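A minimal PyTorch sketch of this combined objective. Using KL divergence for the soft term, an alpha weighting, and the T**2 gradient rescaling are the standard choices from Hinton et al. (2015), not necessarily the exact ones used in this project:

import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, alpha=0.5, T=2.0):
    # student_logits, teacher_logits: (batch, vocab); labels: (batch,)
    # "Soft" loss: KL divergence between temperature-softened distributions.
    # The T**2 factor keeps gradient magnitudes comparable across temperatures.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T ** 2)
    # "Hard" loss: ordinary cross-entropy against the ground-truth labels.
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

With alpha = 1 the student only imitates the teacher; with alpha = 0 this reduces to ordinary supervised training.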
The impact of Temperature
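A higher temperature T flattens the teacher's softmax, exposing the relative probabilities of the non-target classes (the "dark knowledge" the student learns from). A tiny illustration with made-up logits:

import torch
import torch.nn.functional as F

logits = torch.tensor([4.0, 2.0, 1.0, 0.5])  # made-up teacher logits
for T in (1.0, 2.0, 5.0):
    probs = F.softmax(logits / T, dim=-1)
    # As T grows, the distribution flattens toward uniform.
    print(f"T={T}:", [round(p, 3) for p in probs.tolist()])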
Hidden states loss
The full objective combines three terms, following DistilBERT:
Cross Entropy (Distillation Loss)
Cosine (Hidden States Loss)
Masked Language Modeling (Original BERT Loss)
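The soft and hard terms were sketched above; the hidden-states term can be written with PyTorch's cosine embedding loss. The shapes, the flattening, and the weights in the final sum below are assumptions in the spirit of DistilBERT:

import torch
import torch.nn.functional as F

def hidden_states_loss(student_h, teacher_h):
    # student_h, teacher_h: (batch, seq_len, dim). If the teacher is wider,
    # a linear projection of the student states would be applied first.
    s = student_h.flatten(0, 1)  # (batch * seq_len, dim)
    t = teacher_h.flatten(0, 1)
    target = torch.ones(s.shape[0], device=s.device)  # 1 = "make the pair similar"
    return F.cosine_embedding_loss(s, t, target)

# Hypothetical total objective, one weight per term:
# loss = w_ce * distillation_ce + w_cos * hidden_states_loss(s_h, t_h) + w_mlm * mlm_ce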
Caduceus
DNA Sequence (from HG38) → Nucleotides (V = 12 tokens!) → Caduceus
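Because Caduceus tokenizes DNA at the character level, the vocabulary is tiny. A sketch of such a tokenizer; the specific special tokens are an assumption (modeled on a HyenaDNA-style character tokenizer), and only A/C/G/T/N come from the sequence itself:

# Special tokens below are assumed; with the five nucleotide characters the
# full vocabulary is just 12 tokens.
SPECIALS = ["[CLS]", "[SEP]", "[BOS]", "[MASK]", "[PAD]", "[RESERVED]", "[UNK]"]
BASES = ["A", "C", "G", "T", "N"]
VOCAB = {tok: i for i, tok in enumerate(SPECIALS + BASES)}  # len(VOCAB) == 12

def encode(seq: str) -> list[int]:
    # Map each nucleotide character to its id; anything else falls back to [UNK].
    unk = VOCAB["[UNK]"]
    return [VOCAB.get(ch, unk) for ch in seq.upper()]

print(encode("ACGTN"))  # -> [7, 8, 9, 10, 11]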
Experimental results
Our Problem Statement
Results
[Plots: train loss, "global" validation loss, and "local" validation loss]
Results - (Some) Hyperparameters
Results - Code/Links
uv run distill --help
Future Work
Log
wandb.watch(model=model, log="all", log_freq=...)
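wandb.watch with log="all" records both gradient and parameter histograms every log_freq batches. A minimal usage sketch; the project name and frequency are placeholders:

import torch.nn as nn
import wandb

model = nn.Linear(8, 2)  # stand-in for the student model

wandb.init(project="caduceus-distill")  # placeholder project name
# log="all" captures both gradients and parameters;
# log_freq sets how often (in batches) histograms are logged.
wandb.watch(model, log="all", log_freq=100)  # 100 is a placeholder value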
Debugging Batch 8590
Thank You! Questions?
Scaling Laws