The comparison below lists, for each base model and its derivatives: release date, profile (text completion vs. dialog), availability on Hugging Face, number of parameters, languages, German language support, size of the training data, training dataset, data availability, technology, training hardware and time, availability of finetuning code, minimum hardware for finetuning, LangChain support, and minimum hardware for inference. Fields for which the source gives no information are marked with "?" or omitted.

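Since LangChain support is one of the comparison criteria, here is a minimal, hedged sketch of how a locally hosted Hugging Face checkpoint is typically exposed to LangChain through its HuggingFacePipeline wrapper. The model ID (databricks/dolly-v2-3b) and the generation settings are illustrative assumptions, not values taken from the table.

```python
# Minimal sketch: exposing a local Hugging Face model to LangChain.
# The model ID and generation settings are placeholder assumptions.
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain.llms import HuggingFacePipeline

model_id = "databricks/dolly-v2-3b"  # assumption: any causal LM from the table could be used
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")  # needs accelerate

generate = pipeline(
    "text-generation",
    model=model,
    tokenizer=tokenizer,
    max_new_tokens=128,
)

llm = HuggingFacePipeline(pipeline=generate)
print(llm("What is LangChain used for?"))
```

Any of the models below that publish standard Transformers checkpoints can be wired up the same way; quantized llama.cpp-style builds typically go through a different wrapper.
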
LLaMA (base model), 7B
Release date: 24.02.2023
Profile: Text completion
Languages: mainly EN, but also bg, ca, cs, da, de, es, fr, hr, hu, it, nl, pl, pt, ro, ru, sl, sr, sv, uk
German language support: very little
Training data size: 1 trillion tokens
Training dataset: CCNet (67%), C4 (15%), GitHub (4.5%), Wikipedia (4.5%), Books (4.5%), ArXiv (2.5%), Stack Exchange (2%)
Technology: Transformer architecture, pre-normalization with RMSNorm, SwiGLU activation function, rotary embeddings (see the sketch after this entry)
Trained on: ? x A100 80GB GPUs for 82,432 GPU-hours
Min. hardware for finetuning: RTX 4090
Min. hardware for inference: 6GB VRAM, 16GB RAM (e.g. RTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060)

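The architecture notes for LLaMA (pre-normalization with RMSNorm, the SwiGLU activation, rotary embeddings) can be made concrete with a short PyTorch sketch. This illustrates only the normalization and feed-forward blocks, not LLaMA's actual code; the tensor sizes and the hidden dimension are assumptions.

```python
# Simplified sketches of two LLaMA building blocks: RMSNorm and a SwiGLU feed-forward.
# Illustrative only; dimensions and details are assumptions, not the LLaMA source.
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Pre-normalization layer: rescales by the root mean square of the activations."""
    def __init__(self, dim: int, eps: float = 1e-6):
        super().__init__()
        self.eps = eps
        self.weight = nn.Parameter(torch.ones(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return x * rms * self.weight

class SwiGLU(nn.Module):
    """Feed-forward block with the SwiGLU activation: w2(silu(w1(x)) * w3(x))."""
    def __init__(self, dim: int, hidden_dim: int):
        super().__init__()
        self.w1 = nn.Linear(dim, hidden_dim, bias=False)
        self.w2 = nn.Linear(hidden_dim, dim, bias=False)
        self.w3 = nn.Linear(dim, hidden_dim, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.w2(F.silu(self.w1(x)) * self.w3(x))

x = torch.randn(2, 16, 512)              # (batch, sequence, model dim): assumed sizes
y = SwiGLU(512, 1376)(RMSNorm(512)(x))   # pre-norm, then the gated feed-forward
print(y.shape)                           # torch.Size([2, 16, 512])
```

Rotary embeddings are applied to the attention queries and keys and are omitted here to keep the sketch short.
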
LLaMA (base model), 13B
Release date: 24.02.2023
Profile: Text completion
Training data size: 1 trillion tokens
Trained on: ? x A100 80GB GPUs for 135,168 GPU-hours
Min. hardware for finetuning: 1x NVIDIA Titan RTX 24GB
Min. hardware for inference: 10GB VRAM, 32GB RAM (e.g. AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000)

LLaMA (base model), 33B
Release date: 24.02.2023
Profile: Text completion
Training data size: 1.4 trillion tokens
Trained on: ? x A100 80GB GPUs for 530,432 GPU-hours
Min. hardware for finetuning: 1x A100 80GB
Min. hardware for inference: 20GB VRAM, 64GB RAM (e.g. RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100)

LLaMA (base model), 65B
Release date: 24.02.2023
Profile: Text completion
Training data size: 1.4 trillion tokens
Trained on: 2048 A100 80GB GPUs for 21 days (1,022,362 GPU-hours)
Min. hardware for finetuning: ?
Min. hardware for inference: 40GB VRAM, 128GB RAM (e.g. A100 40GB, 2x 3090, 2x 4090, A40, RTX A6000, RTX 8000, Titan Ada)

Alpaca
Release date: 13.03.2023
Profile: Dialog
Parameters: 7B initially by Stanford (13B, 33B, 65B trained by other parties)
Languages: + EN
German language support: very little
Training data size: 52k examples
Training dataset: instruction-following data generated in the style of self-instruct using text-davinci-003
Technology: Hugging Face's training framework, Fully Sharded Data Parallel, mixed-precision training (a configuration sketch follows this entry)
Trained on: 8x A100 80GB for 3 hours
Min. hardware for finetuning: 1x A100 GPU
Min. hardware for inference: CPU with 5GB of RAM (alpaca.cpp, 4-bit)

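The training setup quoted for Alpaca (Hugging Face's training framework with Fully Sharded Data Parallel and mixed-precision training) roughly corresponds to a Trainer configuration like the one below. This is a hedged sketch of the general pattern, not Stanford's actual launch command; all names and values are placeholder assumptions.

```python
# Hedged sketch of the kind of TrainingArguments implied by "Fully Sharded Data Parallel"
# plus "mixed precision training". All names and values here are placeholder assumptions.
from transformers import TrainingArguments

args = TrainingArguments(
    output_dir="alpaca-style-finetune",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=8,
    num_train_epochs=3,
    learning_rate=2e-5,
    bf16=True,                      # mixed-precision training (assumes A100-class GPUs)
    fsdp="full_shard auto_wrap",    # shard parameters/gradients/optimizer state across GPUs
    logging_steps=10,
)
# A Trainer would then be constructed with a LLaMA checkpoint, the 52k instruction
# dataset, and this `args` object; that part is omitted from the sketch.
```
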
Alpaca-LoRA
Release date: 28.03.2023
Profile: Dialog
Parameters: 7B initial model (13B, 33B, 65B trained by other parties)
Languages: EN, plus unofficial weights for 7B: BR, CN, JP, FR, TH, DE, PL, IT, RU, UA; 13B: JP, CR, CN, ES; 30B: JP
German language support: very little
Technology: low-rank adaptation (LoRA), Hugging Face's PEFT, Tim Dettmers' bitsandbytes (see the LoRA sketch after this entry)
Trained on: 1x RTX 4090 for 5 hours
Min. hardware for finetuning: 1x A100 GPU / RTX 4090 / NVIDIA T4
Min. hardware for inference: 7B/13B: T4 16GB; 30B: A100 40GB (without quantization)

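The LoRA/PEFT/bitsandbytes combination listed for Alpaca-LoRA usually looks like the sketch below: load the base model in 8-bit, then attach low-rank adapters to the attention projections so that only a few million parameters are trained. The checkpoint name, target modules, and ranks are assumptions, not the repository's exact settings.

```python
# Hedged sketch of LoRA finetuning with Hugging Face PEFT + bitsandbytes 8-bit loading,
# in the spirit of alpaca-lora. Checkpoint name, target modules and ranks are assumptions.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model, prepare_model_for_int8_training

base_model = "decapoda-research/llama-7b-hf"   # assumption: a converted 7B LLaMA checkpoint
model = AutoModelForCausalLM.from_pretrained(
    base_model,
    load_in_8bit=True,     # bitsandbytes int8 weights keep the 7B model around 8-10 GB VRAM
    device_map="auto",
)
model = prepare_model_for_int8_training(model)  # newer PEFT versions: prepare_model_for_kbit_training

lora_config = LoraConfig(
    r=8,                                  # low-rank adapter dimension
    lora_alpha=16,
    target_modules=["q_proj", "v_proj"],  # attention projections, a common choice for LLaMA
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()        # only the adapter weights are trainable
```
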
Lit-LLaMA
Release date: 04.04.2023
Profile: Text completion
Parameters: 7B
Languages: inherited from LLaMA
German language support: very little
Training data size: ?
Training dataset: ?
Technology: ?
Trained on: ?
Min. hardware for finetuning: GPU with ~24GB memory (e.g. RTX 3090)
Min. hardware for inference: 8-10GB VRAM

Vicuna
Release date: ~01.04.2023
Profile: Dialog
Parameters: 7B
Languages: + EN
German language support: very little
Training data size: 70k conversations
Training dataset: conversations from ShareGPT
Technology: gradient checkpointing, flash attention, SkyPilot managed spot instances
Trained on: 8x A100 80GB GPUs
Min. hardware for finetuning: 8x A100 80GB GPUs (or fewer with adjustments)
Min. hardware for inference: 14GB VRAM (GPU) or 30GB CPU memory

Vicuna, 13B
Min. hardware for inference: 28GB VRAM (GPU) or 60GB of CPU memory

Cabrita
Release date: 17.03.2023
Profile: Dialog
Parameters: 7B
Languages: + Portuguese
German language support: very little
Training data size: 52k examples
Training dataset: Alpaca dataset translated to Portuguese
Technology: low-rank adaptation (LoRA), Hugging Face's PEFT
Trained on: 1x A100 on Colab for 4 hours
Min. hardware for finetuning: 1x A100 GPU
Min. hardware for inference: 14GB VRAM (T4)

ColossalAI
Release date: 14.02.2023
Profile: Dialog
Parameters: 7B
Languages: + EN + CN
German language support: very little
Training data size: 104k examples
Training dataset: bilingual datasets of Chinese and English
Technology: RLHF (Reinforcement Learning from Human Feedback), ZeRO (Zero Redundancy Optimizer), LoRA
Trained on: up to 8-GPU servers
Min. hardware for finetuning: 4x 32GB GPUs
Min. hardware for inference: 4GB VRAM (4-bit quantized, 7B)

Koala
Release date: 03.04.2023
Profile: Dialog
Parameters: 13B (but also 7B, 30B, 65B)
Languages: mainly EN, but also code
German language support: very little
Training data size: ~500k examples
Training dataset: ShareGPT, HC3, OIG, Stanford Alpaca, Anthropic HH, OpenAI WebGPT, OpenAI Summarization
Technology: implemented with JAX/Flax in EasyLM
Trained on: 8x A100 GPUs for 6 hours
Min. hardware for finetuning: ?
Min. hardware for inference: 14GB VRAM, T4 (7B model)

Baize
Release date: 03.04.2023
Profile: Dialog
Parameters: 7B
Languages: + EN
German language support: very little
Training data size: 111.5k + 52k examples
Training dataset: dialogs generated by letting ChatGPT chat with itself on topics from Quora and Stack Overflow, plus the Alpaca dataset
Technology: Adapter, BitFit, Diff pruning, Prefix Tuning, LoRA
Trained on: A100 80GB GPU
Min. hardware for finetuning: 26GB VRAM (with int8)
Min. hardware for inference: 16GB VRAM (without int8)

Baize, 13B
Min. hardware for finetuning: 25GB VRAM (with int8)
Min. hardware for inference: 28GB VRAM (without int8)

Baize, 30B
Min. hardware for finetuning: 67GB VRAM (with int8)
Min. hardware for inference: 67GB VRAM (without int8)

Baize, 7B (Medical)
Languages: + EN (medical)
Training data size: (111.5k + 47k) + 52k examples
Training dataset: as above, plus generated dialogs on MedQuAD questions
Min. hardware for finetuning: 42GB VRAM (with int8)
Min. hardware for inference: 16GB VRAM (without int8)

GPT4All
Release date: 29.03.2023
Profile: Dialog
Parameters: 7B
Languages: + EN + code
German language support: very little
Training data size: ~440k examples
Training dataset: GPT-3.5-Turbo generations, based on LLaMA
Technology: LoRA
Trained on: 8x A100 80GB GPUs for 8 hours
Min. hardware for finetuning: 16GB VRAM
Min. hardware for inference: full model on GPU (16GB of RAM required), or quantized model on CPU with 8GB RAM (see the CPU-inference sketch after this entry)

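Several entries in this overview quote CPU-only figures for 4-bit quantized (GGML-style) builds, as alpaca.cpp and the quantized GPT4All weights do. The snippet below is a minimal sketch of that pattern using the llama-cpp-python bindings; the model path is a placeholder, and the quantized weights have to be obtained separately.

```python
# Minimal sketch of CPU-only inference on 4-bit quantized weights via llama-cpp-python.
# The model path is a placeholder; quantized GGML weights must be downloaded separately.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/ggml-model-q4_0.bin",  # placeholder path to 4-bit quantized weights
    n_ctx=512,                                  # context window
    n_threads=8,                                # CPU threads to use
)

output = llm(
    "### Instruction:\nName three open-source LLMs.\n\n### Response:\n",
    max_tokens=128,
    stop=["###"],
)
print(output["choices"][0]["text"])
```
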
BLOOM (also distributed via Petals)
Release date: July 2022
Profile: Text completion
Parameters: (560M-)176B
Languages: 46 languages and 13 programming languages
German language support: no
Training data size: 1.6TB of pre-processed text, converted into 350B unique tokens
Training dataset: ROOTS
Technology: decoder-only, ALiBi embeddings
Trained on: 416x A100 80GB GPUs for 4 months
Min. hardware for finetuning: 3x A100 (8-bit), or essentially none if using Petals
Min. hardware for inference: possible on CPU (16GB RAM, 1TB SSD) at roughly 3 minutes per token

BLOOMz (also distributed via Petals)
Release date: 03.11.2022
Profile: Dialog
Parameters: (560M-)176B
Languages: 13 training tasks across 46 languages with English prompts, plus prompts in 19 additional languages that were machine-translated from the English prompts
German language support: very little
Training data size: 13 tasks
Training dataset: BigScience's xP3 datasets
Trained on: 288x A100 80GB GPUs
Min. hardware for finetuning: 144x A100, or essentially none if using Petals
Min. hardware for inference: 4x A100 (8-bit), or essentially none if using Petals (an 8-bit loading sketch follows this entry)

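The BLOOM-family hardware figures above assume 8-bit weights loaded with bitsandbytes and accelerate. As a hedged sketch, the snippet below loads one of the smaller BLOOMZ checkpoints that way; the bloomz-7b1 variant is chosen purely for illustration, since the 176B model needs multiple A100s or the Petals swarm.

```python
# Hedged sketch: 8-bit loading of a BLOOMZ checkpoint with accelerate's device_map.
# bloomz-7b1 is used for illustration; the 176B model does not fit on a single GPU.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "bigscience/bloomz-7b1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",     # let accelerate place layers on the available GPUs/CPU
    load_in_8bit=True,     # bitsandbytes int8 roughly halves VRAM versus fp16
)

inputs = tokenizer("Translate to German: The weather is nice today.",
                   return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=30)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
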
BLOOM LoRA
Release date: 27.03.2023
Profile: Dialog
Parameters: 7B
Languages: same as BLOOM, plus more EN
German language support: no
Training data size: 50k + 5k examples
Training dataset: Alpaca (cleaned), ChatDoctor
Technology: LoRA, PEFT
Trained on: RTX 4090 for 5 hours
Min. hardware for finetuning: ?
Min. hardware for inference: ?

BLOOM-CLP German
Release date: 25.01.2023
Profile: Dialog
Parameters: 6.4B
Languages: DE
German language support: yes
Training data size: 50.4B tokens
Training dataset: German OSCAR dataset, German court decisions from Open Legal Data
Technology: cross-lingual and progressive transfer learning
Trained on: 32x A100 40GB GPUs for 12.5 days
Min. hardware for finetuning: ?
Min. hardware for inference: ?

IGEL (on BLOOM-CLP German)
Release date: 04.04.2023
Profile: Dialog
Parameters: 6.4B
Languages: DE
German language support: yes
Training data size: ?
Training dataset: instructions in English translated into German using an automated translation tool
Technology: LoRA
Trained on: ?
Min. hardware for finetuning: ?
Min. hardware for inference: ?

GPT-NeoXT-Chat-Base-20B (GPT-NeoX-20B)
Release date: 10.03.2023
Profile: Dialog
Parameters: 20B
Languages: EN
German language support: no
Training data size: 43M instructions
Training dataset: instructions from OIG-43M
Technology: finetuning focused on question answering, classification, extraction, and summarization
Trained on: 2x 8x A100 GPUs
Min. hardware for finetuning: 8x A100 GPUs (6x A100 80GB GPUs for int8)
Min. hardware for inference: 48GB VRAM (24GB VRAM for int8); CPU inference also possible

Pythia-Chat-Base-7B (Pythia)
Release date: 10.03.2023
Profile: Dialog
Parameters: 7B
Trained on: 8x A100 GPUs
Min. hardware for finetuning: 8x GPUs with 32GB
Min. hardware for inference: 24GB VRAM (12GB VRAM for int8); CPU inference also possible

Open Assistant (Pythia 12B)
Release date: 15.04.2023
Profile: Dialog
Parameters: 12B
Languages: EN
German language support: in the dataset, but not in the model
Training data size: 625k tasks, or >10k conversation trees
Training dataset: the OpenAssistant Conversations dataset, a collection of conversational data obtained through a crowdsourcing effort involving more than 13,000 volunteers; the process was divided into five separate steps: prompting, labelling prompts, adding reply messages as prompter or assistant, labelling replies, and ranking assistant replies (a loading sketch follows this entry)
Technology: Supervised Fine-Tuning (SFT)
Trained on: ?
Min. hardware for finetuning: ? (does not seem to be meant for finetuning)
Min. hardware for inference: 40GB VRAM (8-bit)

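The OpenAssistant Conversations data described above has been released publicly on Hugging Face; the sketch below shows one way to inspect it with the datasets library. The dataset ID "OpenAssistant/oasst1" and the field names are assumptions based on that public release.

```python
# Hedged sketch: inspecting the crowdsourced OpenAssistant Conversations data.
# The dataset ID and field names are assumptions based on the public HF release.
from datasets import load_dataset

oasst = load_dataset("OpenAssistant/oasst1", split="train")
print("messages:", len(oasst))
first = oasst[0]
print(first["role"], ":", first["text"][:80])   # each row is one prompter/assistant message
```
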
Dolly 2.0
Release date: 12.04.2023
Profile: Dialog
Parameters: 13B
Languages: EN
German language support: no
Training data size: 15k examples
Training dataset: a new, high-quality, human-generated instruction-following dataset, crowdsourced among Databricks employees
Technology: Rotary Position Embedding (RoPE)
Trained on: ?
Min. hardware for finetuning: 8x A100 GPUs, or V100 instances with 32GB of GPU memory
Min. hardware for inference: 13B: A100, or A10 24GB (8-bit); smaller hardware for the smaller models

Dolly (GPT-J)
Release date: 24.03.2023
Profile: Dialog
Parameters: 6B
Languages: EN
German language support: no
Training data size: 52k examples
Training dataset: Alpaca
Technology: Rotary Position Embedding (RoPE)
Trained on: 8x A100 40GB GPUs for 30 minutes
Min. hardware for finetuning: 4x A10 24GB, or 8x V100 with 32GB of GPU memory
Min. hardware for inference: ?

GPT4All-J (GPT-J)
Release date: 13.04.2023
Profile: Dialog
Parameters: 6B
Languages: EN
German language support: no
Training data size: 800k data points
Training dataset: subsets of LAION OIG, coding questions with a random sub-sample of Stack Overflow questions, a sub-sample of BigScience/P3, and custom-generated creative questions
Technology: ?
Trained on: 8x A100 80GB GPUs for ~12 hours
Min. hardware for finetuning: ?
Min. hardware for inference: runs on CPU only with 16GB RAM

FLAN-UL2 (UL2)
Release date: 03.03.2023
Profile: Dialog
Parameters: 20B
Languages: EN, FR, DE + code
German language support: yes
Training data size: 1 trillion tokens
Training dataset: taskmaster2, djaym7/wiki_dialog, deepmind/code_contests, lambada, gsm8k, aqua_rat, esnli, qasc, and qed
Technology: T5 architecture, Mixture-of-Denoisers (MoD)
Trained on: TPU v3 or TPU v4 pods, using the t5x codebase together with JAX
Min. hardware for finetuning: ?
Min. hardware for inference: A10G with 24GB VRAM via Amazon SageMaker, or an A100 GPU (a loading sketch follows this entry)

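Unlike most models in this overview, FLAN-UL2 is an encoder-decoder (T5-style) model, so it is loaded through the seq2seq classes rather than the causal-LM classes. The sketch below pairs that with 8-bit loading, which is one way to stay within the roughly 24GB of VRAM quoted above; the prompt is an arbitrary example.

```python
# Hedged sketch: FLAN-UL2 inference with 8-bit weights (seq2seq, not causal LM).
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

model_id = "google/flan-ul2"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id, device_map="auto", load_in_8bit=True)

prompt = "Answer the following question: what is the capital of Bavaria?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
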
RWKV-4-Raven (RWKV)
Release date: 30.03.2023
Profile: Dialog
Parameters: 3B/7B/14B
Languages: EN
German language support: no
Training data size: ?
Training dataset: Alpaca, CodeAlpaca, Guanaco, GPT4All, ShareGPT and more
Technology: RNN
Trained on: n x A100 40GB GPUs provided by Stability AI and EleutherAI
Min. hardware for finetuning: 80GB VRAM (for the 14B model)
Min. hardware for inference: 12-16GB RAM or 9-15GB VRAM (can also be combined)

Cerebras-GPT
Release date: 06.04.2023
Profile: Text completion
Parameters: (111M-)13B
Languages: EN
German language support: no
Training data size: 400B tokens
Training dataset: The Pile, which consists of 22 smaller datasets, including Common Crawl, PubMed Central, Books3, OpenWebText2, GitHub, and arXiv
Technology: Chinchilla scaling laws, Cerebras' weight streaming technology
Trained on: the Andromeda AI supercomputer, comprising 16 CS-2 wafer-scale systems
Min. hardware for finetuning: ?
Min. hardware for inference: ?

Visualization in good resolution