Origin and occurrence of LLM innovations
CC-BY Epoch
Welcome to Epoch's dataset of key algorithmic innovations that underpin large language models. As of November 2023, it is maintained by Ben Cottier (ben@epochai.org).

All of the innovations were identified based on: Zhao, W. X., Zhou, K., Li, J., Tang, T., Wang, X., Hou, Y., ... & Wen, J. R. (2023). A survey of large language models. arXiv preprint arXiv:2303.18223.

See the tab "Origins_clean" for the list of key algorithmic innovations and data about their origin. See the tab "Occurrences" for data on the adoption of the innovations in large language models.
Acknowledgements:
These data have been collected by Ben Cottier, David Owen, and Tamay Besiroglu.
BibTeX for citations:

@misc{epochLLMInnovationData2023,
title = {Origin and occurrence of LLM innovations},
author = {Epoch},
year = {2023},
copyright = {CC-BY},
howpublished = {https://epochai.org/data/llm-innovations},
}