Spectra: A Comprehensive Study of Ternary, Quantized and FP16 Language Models
Will ternary models outshine half-precision and quantised models?
22-07-2024
Introduction
Background:
Compute (FLOPs) is growing faster than memory capacity and bandwidth
How are these Memory Bottlenecks in LLMs addressed?
Memory Bottlenecks and Low-Bitwidth Language Modelling
Deployment: Memory Capacity over peak TFLOPs
Consider recent microarchitectures
Memory Capacity and Low-Bitwidth Modelling
We do not consider the overhead of the KV cache, activations, and compilation incurred during model deployment.
A single H-100 can easily fit:
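As a rough, weight-only illustration of the capacity argument (these are not figures from the slides), the sketch below estimates how many parameters fit in a given amount of accelerator memory at different bit widths. The function names, the 80 GB capacity, and the chosen bit widths are assumptions made for illustration; ternary weights are counted at their information-theoretic ~1.58 bits (log2 of 3), whereas practical packing formats may use closer to 2 bits.

```python
def weight_memory_gb(num_params: float, bits_per_param: float) -> float:
    """Weight-only footprint in GB (1 GB = 1e9 bytes).

    Ignores KV cache, activations, and any runtime overhead.
    """
    return num_params * bits_per_param / 8 / 1e9


def max_params_for_capacity(capacity_gb: float, bits_per_param: float) -> float:
    """Largest parameter count whose weights alone fit in capacity_gb."""
    return capacity_gb * 1e9 * 8 / bits_per_param


# Assumption: an accelerator with 80 GB of memory; bit widths are illustrative.
CAPACITY_GB = 80.0
for label, bits in [("FP16", 16.0), ("4-bit", 4.0), ("ternary, ~1.58 bits", 1.58)]:
    billions = max_params_for_capacity(CAPACITY_GB, bits) / 1e9
    print(f"{label}: about {billions:.0f}B parameters (weights only)")
```

Because the estimate scales inversely with bits per parameter, halving the bit width doubles the parameter count that fits, before any deployment overhead is accounted for.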
Latency: Memory Bandwidth over FLOPs
Consider recent microarchitectures
Latency and Low-Bitwidth Language Modelling
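One common way to reason about this is a memory-bound lower limit on decoding latency: generating one token requires reading every weight once, so the time per token is at least the weight bytes divided by the memory bandwidth. The sketch below is a back-of-the-envelope model under that assumption only; the function name and the example numbers (a 1B-parameter model, 1000 GB/s of bandwidth) are hypothetical and do not come from the slides.

```python
def min_decode_latency_ms(num_params: float,
                          bits_per_param: float,
                          bandwidth_gb_per_s: float) -> float:
    """Lower bound on per-token latency in milliseconds.

    Assumes decoding is memory-bound: all weights are streamed from
    memory once per generated token, and compute time is ignored.
    """
    bytes_to_read = num_params * bits_per_param / 8
    seconds = bytes_to_read / (bandwidth_gb_per_s * 1e9)
    return seconds * 1e3


# Hypothetical example: 1B parameters, 1000 GB/s memory bandwidth.
for label, bits in [("FP16", 16.0), ("4-bit", 4.0), ("2-bit packed ternary", 2.0)]:
    ms = min_decode_latency_ms(1e9, bits, 1000.0)
    print(f"{label}: at least {ms:.2f} ms per token")
```

Under this model the bound shrinks in direct proportion to the bits per parameter, which is why lower bit widths are attractive when bandwidth rather than FLOPs is the limiting resource.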
TriLM (Language Modelling with Ternary Weights)
Architecture of TriLM
Key Architectural Features
Linear Layers
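For readers unfamiliar with ternary linear layers, here is a minimal generic sketch in plain Python. It assumes a single shared scale per weight matrix, computed as the mean absolute weight, and rounds each scaled weight to one of the states -1, 0, or +1. These choices and the helper names `ternarize` and `ternary_matvec` are illustrative assumptions, not a description of TriLM's exact implementation.

```python
def ternarize(weights):
    """Map a float matrix to states in {-1, 0, +1} plus one shared scale.

    Assumption: the scale is the mean absolute value of all weights.
    """
    magnitudes = [abs(w) for row in weights for w in row]
    scale = sum(magnitudes) / len(magnitudes)
    if scale == 0:
        scale = 1.0  # avoid division by zero for an all-zero matrix
    states = [[max(-1, min(1, round(w / scale))) for w in row]
              for row in weights]
    return states, scale


def ternary_matvec(states, scale, x):
    """Compute y = scale * (states @ x) using only additions and subtractions."""
    out = []
    for row in states:
        acc = 0.0
        for s, xi in zip(row, x):
            if s == 1:
                acc += xi
            elif s == -1:
                acc -= xi
        out.append(acc * scale)  # one multiplication per output element
    return out


W = [[0.9, -0.1, -1.1],
     [0.0, 2.0, -0.5]]
states, scale = ternarize(W)
y = ternary_matvec(states, scale, [1.0, 2.0, 3.0])
print(states, scale, y)
```

The point of the sketch is that the inner loop needs no multiplications: the matrix product reduces to signed accumulation, with the floating-point scale applied once per output.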
Computational Flow
TriLM vs BitNet
Relative Performance across architecture
Key Highlights:
Optimisation Schedule
Training loss over 100B tokens for different optimization interventions: both L2 Regularization and Peak LR, only L2 Regularization, only Peak LR, and neither.
Spectra Suite:
Spanning Parameters & Bitwidth
Overview of suite
The suite includes three model families:
Key Properties of our suite:
About FloatLM (Float16 LM)
About QuantLM (Quantized LM)
Training Dynamics and Scaling Laws
Final Validation Loss across Size and Parameters
At the size of TriLM 3.9B, these ternary models start offering better performance than models more than five times their size
As size increases, TriLMs offer much better performance than FloatLMs with the same number of bits, and the gap in validation perplexity closes at large scale.
Advancing research via open access
SpectraSuite Models
Results
Commonsense & Reasoning
Knowledge
Conclusion and Discussion
Conclusion
Broader Impact:
Environmental Benefits and Resource Efficiency
Benefits on Specialised Hardware like Groq, Cerebras
Reduced Training Costs.
Thanks
Tejas Pandey: Nolano AI, IIT Kharagpur
Tejas Vaidhya: Nolano AI, MILA, University of Montreal
Ayush Kaushal: Nolano AI, University of Montreal
Aaryan Bhagat: UC Riverside
Irina Rish: Nolano AI, MILA, University of Montreal
Thank you <3
Read our paper: https://arxiv.org/pdf/2210.17323
SpectraSuite Models: https://huggingface.co/SpectraSuite
GitHub: https://github.com/NolanoOrg/SpectraSuite