Row | Organization | Model Name | Parameters | Context | Type | License | Release | Notes | See Also | Properties |
---|---|---|---|---|---|---|---|---|---|---|
2 | Skywork | Skywork-MoE | 22Ba146B | 8K | EN/CN | Lawful | 2024-06-03 | https://github.com/SkyworkAI/Skywork-MoE | | |
3 | LLM360 | K2 | 65B | 2K | General | Apache 2.0 | 2024-05-29 | https://www.llm360.ai/blog/several-new-releases-to-further-our-mission.html | | |
4 | IEIT-Yuan | Yuan2.0-M32 | 3.7Ba40B | 8K | | Unknown | 2024-05-28 | https://huggingface.co/IEITYuan/Yuan2-M32-hf | | |
5 | DeepSeek | DeepSeek-V2-Lite | 2.4Ba16B | 32K | EN/CN | Lawful | 2024-05-16 | https://huggingface.co/deepseek-ai/DeepSeek-V2-Lite | | |
6 | IBM | Granite | 3B, 8B, 20B, 34B | 2K-8K | Code | Apache 2.0 | 2024-05-07 | https://research.ibm.com/blog/granite-code-models-open-source | | |
7 | DeepSeek | DeepSeek-V2 | 21Ba236B | 128K | EN/CN | Lawful | 2024-05-06 | https://github.com/deepseek-ai/DeepSeek-V2 | | |
8 | Snowflake | Arctic | 17Ba480B | 4K | | Apache 2.0 | 2024-04-24 | https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/ | | |
9 | Microsoft | Phi-3 | 3.8B, 7B, 14B | 4K-128K | | MIT | 2024-04-23 | https://news.microsoft.com/source/features/ai/the-phi-3-small-language-models-with-big-potential/ | | |
10 | Meta | Llama 3 | 8B, 70B, 405B | 8K | | <700M MAU | 2024-04-18 | https://ai.meta.com/blog/meta-llama-3/ | | |
11 | Mistral | Mixtral 8x22B | 39Ba141B | 64K | | Apache 2.0 | 2024-04-17 | https://mistral.ai/news/mixtral-8x22b/ | | |
12 | Hugging Face | Idefics2 | 8B | 32K | Vision | Apache 2.0 | 2024-04-15 | https://huggingface.co/blog/idefics2 | | |
13 | Cohere | Command-R+ | 104B | 128K | Multilingual | CC-BY-NC | 2024-04-04 | https://cohere.com/blog/command-r-plus-microsoft-azure | | |
14 | AI21 | Jamba | 12Ba52B | 256K | | Apache 2.0 | 2024-03-28 | https://www.ai21.com/blog/announcing-jamba | | |
15 | Databricks | DBRX | 36Ba132B | 32K | | Apache 2.0 | 2024-03-27 | https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm | | |
16 | xAI | Grok-1 | 79Ba314B | 8K | | Apache 2.0 | 2024-03-17 | https://x.ai/blog/grok-os | | |
17 | Cohere | Command-R | 35B | 128K | Multilingual | CC-BY-NC | 2024-03-11 | https://cohere.com/blog/command-r | | |
18 | BigCode | StarCoder2 | 3B, 7B, 15B | 16K | Code | Apache 2.0 | 2024-02-28 | https://huggingface.co/blog/starcoder2 | | |
19 | Alibaba | Qwen 1.5 | 0.5B, 1.8B, 4B, 7B, 14B, 72B | 32K | | <100M MAU | 2024-02-04 | https://qwenlm.github.io/blog/qwen1.5/ | | |
20 | OpenBMB | MiniCPM | 2.7B | | | China | 2024-02-01 | https://openbmb.vercel.app/minicpm-en | | |
21 | Allen Institute | OLMo | 7B | 2K | | Apache 2.0 | 2024-02-01 | https://allenai.org/olmo | | |
22 | LLaVA | LLaVA-NeXT | 7B, 13B, 34B | 4K-32K | Vision | Various | 2024-01-30 | https://llava-vl.github.io/blog/2024-01-30-llava-next/ | | |
23 | BlinkDL | RWKV-5 | 7B | 4K+ | | Apache 2.0 | 2024-01-28 | https://twitter.com/BlinkDL_AI/status/1751542433039651304 | | |
24 | ORION STAR | Orion | 14B | 4K, 200K | Multilingual | Lawful/+Comm | 2024-01-22 | https://github.com/OrionStarAI/Orion | | |
25 | InternLM | InternLM2 | 7B, 20B | 200K | EN/CN | Lawful/+Comm | 2024-01-17 | https://github.com/InternLM/InternLM | | |
26 | Mistral | Mixtral 8x7B | 13Ba47B | | | Apache 2.0 | 2023-12-11 | https://mistral.ai/news/mixtral-of-experts/ | | |
27 | XVERSE | XVERSE | 7B, 13B, 65B, 65B-2 | 16K | EN/CN | Lawful/+Comm | 2023-12-08 | https://github.com/xverse-ai | | |
28 | DeepSeek | LLM | 7B, 67B | 4K | EN/CN | Lawful | 2023-11-29 | https://github.com/deepseek-ai/DeepSeek-LLM | | |
29 | DeepSeek | Coder | 1.3B, 6.7B, 33B | 16K | Code | MIT | 2023-11-03 | https://deepseekcoder.github.io/ | | |
30 | 01.ai | Yi | 6B, 34B | 4K | EN/CN | Apache 2.0 | 2023-11-02 | https://github.com/01-ai/Yi | | |
31 | Mistral | Mistral | 7B | 4K | General | Apache 2.0 | 2023-09-27 | https://mistral.ai/news/announcing-mistral-7b/ | | |
32 | CofeAI | FLM | 101B | 2K+ | EN/CN | Apache 2.0 | 2023-09-08 | Terrible perf https://www.reddit.com/r/LocalLLaMA/comments/16danhb/flm101b_an_open_llm_and_how_to_train_it_with_100k/ | | |
33 | Meta | CodeLlama | 7B, 13B, 34B | 16K | Code | <700M MAU | 2023-08-24 | https://about.fb.com/news/2023/08/code-llama-ai-for-coding/ | | |
34 | matsuo-lab | weblab | 10B | 2K | JP | CC-BY-NC 4.0 | 2023-08-18 | https://weblab.t.u-tokyo.ac.jp/100%E5%84%84%E3%83%91%E3%83%A9%E3%83%A1%E3%83%BC%E3%82%BF%E3%82%B5%E3%82%A4%E3%82%BA%E3%83%BB%E6%97%A5%E8%8B%B12%E3%83%B6%E5%9B%BD%E8%AA%9E%E5%AF%BE%E5%BF%9C%E3%81%AE%E5%A4%A7%E8%A6%8F%E6%A8%A1/ | | |
35 | LINE | japanese-large-lm | 3.6B | 2K | JP | Apache 2.0 | 2023-08-14 | https://engineering.linecorp.com/ja/blog/3.6-billion-parameter-japanese-language-model | | |
36 | Stability AI | Japanese StableLM | 7B | 2K | JP | Apache 2.0 | 2023-08-10 | https://stability.ai/blog/stability-ai-new-jplm-japanese-language-model-stablelm | | |
37 | Alibaba | Qwen | 7B, 14B | 8K | EN/CN | <100M MAU | 2023-08-03 | https://github.com/QwenLM/Qwen-7B | | |
38 | Meta | Llama 2 | 7B, 13B, 34B, 70B | 4K | General | <700M MAU | 2023-07-18 | https://ai.meta.com/llama/ | | |
39 | Salesforce | CodeGen2.5 | 7B | | Code | Apache 2.0 | 2023-07-06 | https://blog.salesforceairesearch.com/codegen25/ | | |
40 | Salesforce | XGen | 7B | | General/Code | Apache 2.0 | 2023-06-28 | https://blog.salesforceairesearch.com/xgen/ | | |
41 | BAAI | Aquila | 7B, 33B | | EN/CN | Lawful SA | 2023-06-09 | | | |
42 | TII | Falcon | 7B, 40B | 2K | General | Apache 2.0 | 2023-05-25 | license changed 5/31 | | |
43 | s-JoL | Open-Llama | 7B | | General | MIT | 2023-05-11 | https://github.com/s-JoL/Open-Llama | | |
44 | conceptofmind | PaLM | 1B | | General | MIT | 2023-05-08 | Open Source Reimplementation of Google's PaLM, only C4 trained https://www.reddit.com/r/MachineLearning/comments/13bxu2g/p_opensource_palm_models_trained_at_8k_context/ | | |
45 | RedPajama | INCITE | 3B, 7B | | General | Apache 2.0 | 2023-05-05 | https://www.together.xyz/blog/redpajama-models-v1 | | |
46 | BigCode | StarCoder | 15.5B | | Code | OpenRAIL | 2023-05-04 | | | |
47 | openlm-research | OpenLLaMA | 3B, 7B, 13B, 20B | | General | Apache 2.0 | 2023-05-02 | All models done training to 1T | | |
48 | MosaicML | MPT | 1B, 7B, 30B | | General | Apache 2.0 | 2023-04-20 | More being trained https://twitter.com/jefrankle/status/1649060478910357504 | | |
49 | Stability AI | StableLM | 3B, 7B, 15B, 30B | | General | CC-BY-SA 4.0 | 2023-04-19 | Still training (alpha checkpoint not good) | | |
50 | NVIDIA | GPT-2B-001 | 2B | | General | CC-BY 4.0 | 2023-04-17 | | | |
51 | GeoV | GeoV | 9B | | General | OpenRAIL | 2023-04-02 | Still Training (checkpoints available) | | |
52 | Cerebras | Cerebras-GPT | 1.3B, 2.7B, 6.7B, 13B | | General | Apache 2.0 | 2023-03-28 | | | |
53 | Anthropic | Claude | claude-instant-1, claude-2 | 100K | General | Commercial API | 2023-03-14 | Wait List | | |
54 | OpenAI | GPT-4 | 1.8T? | 32K | General | Commercial API | 2023-03-14 | Wait List | | |
55 | Together | GPT-JT-Moderation | 6B | | Moderation | Apache 2.0 | 2023-03-10 | | | |
56 | Together | GPT-NeoXT-Chat-Base | 20B | | Instruction | Apache 2.0 | 2023-03-10 | | | |
57 | AI21 | J2 | 7.5B, 17B, 178B | | General | Commercial API | 2023-03-09 | | | |
58 | Meta | LLaMA | 7B, 13B, 33B, 65B | 2K | General | NC Research | 2023-02-24 | | | |
59 | BlinkDL | RWKV | 1B, 3B, 7B, 14B | 4K+ | General | Apache 2.0 | 2023-02-15 | | | |
60 | EleutherAI | Pythia | 1B, 1.4B, 2.8B, 6.9B, 12B | | General | Apache 2.0 | 2023-02-13 | | | |
61 | BigCode | Santacoder | 1.1B | | Code | OpenRAIL | 2022-12-22 | | | |
62 | Stanford | BioMedLM | 2.7B | | Academic (Bio) | RAIL | 2022-12-16 | | | |
63 | EleutherAI | Polyglot | 1.3B, 3.8B, 5.8B | | KO | Apache 2.0 | 2022-12-15 | | | |
64 | OpenAI | GPT-3.5 | 175B? | | General | Commercial API | 2022-11-30 | | | |
65 | Meta | Galactica | 120B | | Academic | CC-BY-NC 4.0 | 2022-11-16 | | | |
66 | Cohere | cohere | 6B, 13B, 52B | | General | Commercial API | 2022-11-08 | | | |
67 | | Flan-T5 | 3B, 11B | | General | Apache 2.0 | 2022-10-22 | | | |
68 | OpenBMB | CPM-Ant | 1B, 3B, 7B, 10B | | EN/CN | GML Open | 2022-10-12 | | | |
69 | NVIDIA | NeMo | 1.3B, 5B, 20B | | General | CC-BY 4.0 | 2022-09-15 | | | |
70 | THUDM | GLM-130B | 130B | | EN/CN | NC China | 2022-08-04 | | | |
71 | BigScience | BLOOM | 1B, 3B, 7B, 176B | | Multilingual | OpenRAIL | 2022-07-12 | | | |
72 | Yandex | YaLM | 100B | | EN/RU | Apache 2.0 | 2022-06-23 | | | |
73 | Meta | OPT | 1.3B, 2.7B, 13B, 30B, 66B, 175B | | General | NC Research | 2022-05-03 | | | |
74 | Salesforce | CodeGen | 2B, 6B, 16B | | Code | BSD | 2022-04-06 | | | |
75 | EleutherAI | GPT-Neo | 1.3B, 2.7B | | General | Apache 2.0 | 2022-03-21 | | | |
76 | | Flan-UL2 | 20B | | General | Apache 2.0 | 2022-03-03 | | | |
77 | EleutherAI | GPT-NeoX | 20B | | General | Apache 2.0 | 2022-02-02 | | | |
78 | Huawei | PanGu-α | 2.6B, 13B, (200B) | | EN/CN | Apache 2.0 | 2021-12-30 | | | |
79 | Meta | Fairseq | 1.3B, 2.7B, 6.7B, 13B | | General | MIT | 2021-12-21 | | | |
80 | OpenAI | Codex | cushman, davinci | | Code | Commercial API | 2021-08-10 | | | |
81 | EleutherAI | GPT-J | 6B | | General | Apache 2.0 | 2021-06-04 | | | |
82 | | mT5 | 1.2B, 3.7B, 13B | | Multilingual | Apache 2.0 | 2020-12-02 | | | |
83 | OpenAI | GPT-3 | ada, babbage, curie, davinci | | General | Commercial API | 2020-06-11 | | | |
84 | Meta | Megatron | 11B | | General | MIT | 2020-04-04 | | | |