# | Model | Creator | MMLU | LMArena score | SWE-Bench score | ALScore | Parameters (B) | Tokens trained (B) | Tokens:Params ratio | Announced | Year | Month | Date (numeric) | Lab | Playground | MMLU-Pro | GPQA | Link | Architecture | Note | Open access | Force label | Show only |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Source: LifeArchitect, https://docs.google.com/spreadsheets/d/1kc262HZSMAWI6FVsh0zJwbB-ooYvzhCHaHcNUiA0_hY/edit?gid=1158069878#gid=1158069878 (in the original sheet, pink = recently added) | LMArena scores from https://huggingface.co/spaces/lmarena-ai/chatbot-arena-leaderboard where available | SWE-Bench scores from https://www.vals.ai/benchmarks/legal_bench-09-08-2025 | ALScore is a quick-and-dirty rating of a model's power: √(parameters × tokens) ÷ 300, with both counts in billions. Any ALScore ≥ 1.0 is a powerful model in mid-2023 (see the sketch after this table). | Tokens:Params ratio (Chinchilla scaling ≥ 20:1) | Date column: announcement date as a numeric value |
3 | AMD-Llama-135m | other | 23.0 | 0.0 | 0.135 | 670 | 4,963:1 | Sep/2024 | 2024 | 9 | 5.75 | AMD | https://huggingface.co/amd/AMD-Llama-135m | https://www.amd.com/en/developer/resources/technical-articles/introducing-amd-first-slm-135m-model-fuels-ai-advancements.html | Dense | Small language model (SLM) trained on 70,000 open access books | |||||||||
4 | Apple On-Device Jun 24 | other | 26.8 | 0.2 | 3.04 | 1,500 | 494:1 | Jun/2024 | 2024 | 6 | 5.50 | Apple | https://github.com/apple/corenet/tree/main/projects/openelm | https://arxiv.org/abs/2404.14619 | Dense | YES | significant models | ||||||||
5 | Arctic | other | 67.3 | 4.3 | 480 | 3,500 | 8:1 | Apr/2024 | 2024 | 4 | 5.33 | Snowflake AI Research | https://arctic.streamlit.app/ | https://www.snowflake.com/blog/arctic-open-efficient-foundation-language-models-snowflake/ | Hybrid | ||||||||||
6 | Atlas | meta | 47.9 | 0.1 | 11 | 40 | 4:1 | Aug/2022 | 2022 | 8 | 3.67 | Meta AI | https://arxiv.org/abs/2208.03299 | Dense | |||||||||||
7 | Baichuan 2 | chinese | 54.2 | 0.6 | 13 | 2,600 | 200:1 | Sep/2023 | 2023 | 9 | 4.75 | Baichuan Intelligence | https://cdn.baichuan-ai.com/paper/Baichuan2-technical-report.pdf | Dense | Chinese open-access equivalent to Meta's Llama model | YES | |||||||||
8 | BLOOM | other | 39.1 | 0.8 | 176 | 366 | 3:1 | Jul/2022 | 2022 | 7 | 3.58 | BigScience | https://huggingface.co/spaces/huggingface/bloom_demo | https://github.com/bigscience-workshop/bigscience/tree/master/train/tr11-176B-ml | Dense | (tr11-176B-ml) | YES | significant models | |||||||
9 | BloombergGPT | other | 39.2 | 0.6 | 50 | 569 | 12:1 | Mar/2023 | 2023 | 3 | 4.25 | Bloomberg | https://arxiv.org/abs/2303.17564 | Dense | Finance-focused (of course), based on BLOOM; underperforms GPT-3 | significant models |
10 | Bytedance 175B | chinese | 44 | 0.8 | 175 | 300 | 2:1 | Feb/2024 | 2024 | 2 | 5.17 | ByteDance | https://arxiv.org/abs/2402.15627 | Dense | GPT-3-esque |
11 | Bytedance 530B | chinese | 44 | 1.3 | 530 | 300 | 1:1 | Feb/2024 | 2024 | 2 | 5.17 | ByteDance | https://arxiv.org/abs/2402.15627 | Dense | GPT-3-esque | YES |
12 | Chameleon | meta | 65.8 | 1.9 | 34 | 9,200 | 271:1 | May/2024 | 2024 | 5 | 5.42 | Meta AI | https://ai.meta.com/resources/models-and-libraries/chameleon-downloads/?gk_enable=chameleon_web_flow_is_live | https://arxiv.org/abs/2405.09818 | Dense | ||||||||||
13 | ChatGPT (gpt-3.5-turbo) | openAI | 70.0 | 20 | Nov/2022 | 2022 | 11 | 3.92 | OpenAI | https://chat.openai.com/ | 28.1 | https://openai.com/blog/chatgpt | Dense | significant models | |||||||||||
14 | Chinchilla | google | 67.5 | 1.0 | 70 | 1,400 | 20:1 | Mar/2022 | 2022 | 3 | 3.25 | DeepMind | https://arxiv.org/abs/2203.15556 | Dense | First to double training tokens for each doubling of model size (compute-optimal 20:1 scaling) |
15 | ChuXin | other | 0.2 | 1.6 | 2,300 | 1,438:1 | May/2024 | 2024 | 5 | 5.42 | Independent | https://huggingface.co/chuxin-llm/Chuxin-1.6B-Base | https://arxiv.org/abs/2405.04828 | Dense | ChuXin-1M performs well across all context window lengths up to 1M (!) | ||||||||||
16 | Claude 2 | anthropic | 78.5 | 1.9 | 130 | 2,500 | 20:1 | Jul/2023 | 2023 | 8 | 4.67 | Anthropic | https://claude.ai/ | https://www-files.anthropic.com/production/images/Model-Card-Claude-2.pdf | Dense | Expanded input and output length (up to 100,000 tokens), allowing the model to analyze long documents such as technical guides or entire books | significant models |
17 | Claude 2.1 | anthropic | 78.5 | 1.9 | 130 | 2,500 | 20:1 | Nov/2023 | 2023 | 11 | 4.92 | Anthropic | https://claude.ai/ | https://www.anthropic.com/index/claude-2-1 | Dense | Fewer hallucinations, 200k context length, tool use | YES | significant models | |||||||
18 | Claude 3 Opus | anthropic | 86.8 | 1,248 | 29.8 | 2000 | 40,000 | 20:1 | Mar/2024 | 2024 | 3 | 5.25 | Anthropic | https://claude.ai/ | 68.5 | 59.5 | https://www.anthropic.com/claude-3-model-card | Dense | 200K context window! 1M for researchers | YES | significant models | ||||
19 | Claude 3.5 Sonnet* | anthropic | 90.5 | 1,283 | Jun/2024 | 2024 | 6 | 5.50 | Anthropic | https://poe.com/Claude-3.5-Sonnet | 78 | 65 | https://www.anthropic.com/news/claude-3-5-sonnet | Dense | YES | significant models | |||||||||
20 | Command-R | other | 37.9 | 0.5 | 35 | 700 | 20:1 | Mar/2024 | 2024 | 3 | 5.25 | Cohere | Cohere | 37.9 | https://txt.cohere.com/command-r/ | Dense | Retrieval-augmented generation (RAG) for knowledge outside the training set; tool use |
21 | Command-R+ | other | 75.7 | 2.1 | 104 | 4,000 | 39:1 | Apr/2024 | 2024 | 4 | 5.33 | Cohere | https://huggingface.co/spaces/CohereForAI/c4ai-command-r-plus | https://huggingface.co/CohereForAI/c4ai-command-r-plus | Dense | Business-oriented: "purpose-built to excel at real-world enterprise use cases." |
22 | DBRX | other | 73.7 | 4.2 | 132 | 12,000 | 91:1 | Mar/2024 | 2024 | 3 | 5.25 | MosaicML | https://huggingface.co/spaces/databricks/dbrx-instruct | https://www.databricks.com/blog/introducing-dbrx-new-state-art-open-llm | MoE | ||||||||||
23 | DCLM-Baseline 7B 2.6T | other | 63.7 | 0.4 | 7 | 2,600 | 372:1 | Jun/2024 | 2024 | 6 | 5.50 | International | https://huggingface.co/apple/DCLM-Baseline-7B | https://arxiv.org/abs/2406.11794 | Dense | ||||||||||
24 | DeepSeek-V2 | chinese | 54.8 | 1,258 | 4.6 | 236 | 8,100 | 35:1 | May/2024 | 2024 | 5 | 5.42 | DeepSeek-AI | https://chat.deepseek.com/ | 54.8 | https://arxiv.org/abs/2405.04434 | MoE | Huge dataset, 12% Chinese: "Therefore, we acknowledge that DeepSeek-V2 still has a slight gap in basic English capabilities with LLaMA3 70B". |
25 | DeepSeek-V3 | chinese | 87.1 | 1,317 | 671 | 14,800 | 22:1 | Dec/2024 | 2024 | 12 | 6.00 | DeepSeek-AI | https://chat.deepseek.com/ | 64.4 | 59.1 | https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf | MoE | Prelude to DeepSeek-R1 | YES | YES | significant models |
26 | DeepSeek-R1 | chinese | 90.8 | 1,357 | 671 | 14,800 | 22:1 | Jan/2025 | 2025 | 1 | 6.08 | DeepSeek-AI | https://chat.deepseek.com/ | 84 | 71.5 | https://github.com/deepseek-ai/DeepSeek-R1/blob/main/DeepSeek_R1.pdf | MoE | Major reasoning open-source LLM with performance comparable to ChatGPT-o1 | YES | YES | significant models | ||||
27 | Ernie 3.5 | chinese | 65.1 | Jul/2023 | 2023 | 7 | 4.58 | Baidu | https://yiyan.baidu.com/ | https://www.reuters.com/technology/chinas-baidu-unveils-latest-version-its-ernie-ai-model-2023-10-17/ | Dense | "Enhanced Representation through kNowledge IntEgration" | |||||||||||||
28 | Ernie 4.0 | chinese | 86 | 14.9 | 1000 | 20,000 | 20:1 | Oct/2023 | 2023 | 10 | 4.83 | Baidu | https://yiyan.baidu.com/ | https://www.reuters.com/technology/chinas-baidu-unveils-latest-version-its-ernie-ai-model-2023-10-17/ | Dense | Enhanced Representation through kNowledge IntEgration | |||||||||
29 | Ernie 4.0 Turbo* | chinese | 86 | 14.9 | 1000 | 20,000 | 20:1 | Jun/2024 | 2024 | 6 | 5.50 | Baidu | https://www.reuters.com/technology/artificial-intelligence/baidu-launches-upgraded-ai-model-says-user-base-hits-300-mln-2024-06-28/ | https://www.reuters.com/technology/chinas-baidu-unveils-latest-version-its-ernie-ai-model-2023-10-17/ | Dense | "rivals GPT-4 in capabilities" | |||||||||
30 | EXAONE 3.0 | other | 27.4 | 0.8 | 7.8 | 8,000 | 1,026:1 | Aug/2024 | 2024 | 8 | 5.67 | LG | https://huggingface.co/LGAI-EXAONE/EXAONE-3.0-7.8B-Instruct | 27.4 | 10.1 | https://arxiv.org/abs/2408.03541 | Dense | “EXAONE”=“EXpert AI for EveryONE” | significant models | ||||||
31 | EXAONE 3.5 | other | 78.3 | 1.5 | 32 | 6,500 | 204:1 | Dec/2024 | 2024 | 12 | 6.00 | LG | https://huggingface.co/collections/LGAI-EXAONE/exaone-35-674d0e1bb3dcd2ab6f39dbb4 | 39.7 | https://huggingface.co/collections/LGAI-EXAONE/exaone-35-674d0e1bb3dcd2ab6f39dbb4 | Dense | “EXAONE” = “EXpert AI for EveryONE” |
32 | Falcon 180B | other | 70.6 | 2.6 | 180 | 3,500 | 20:1 | Sep/2023 | 2023 | 9 | 4.75 | TII | https://huggingface.co/spaces/tiiuae/falcon-180b-demo | https://arxiv.org/abs/2311.16867 | Dense | Largest open-access model | YES | ||||||||
33 | Falcon 2 11B | other | 58.4 | 0.8 | 11 | 5,500 | 500:1 | May/2024 | 2024 | 5 | 5.42 | TII | https://huggingface.co/tiiuae/falcon-11B | https://www.tii.ae/news/falcon-2-uaes-technology-innovation-institute-releases-new-ai-model-series-outperforming-metas | Dense | Foundation LLM with 11 billion parameters trained on 5.5 trillion tokens | significant models |
34 | Falcon Mamba 7B | other | 62.1 | 0.7 | 7 | 6,000 | 858:1 | Aug/2024 | 2024 | 8 | 5.67 | TII | https://falconllm.tii.ae/falcon-models.html | 14.47 | 8.05 | https://falconllm.tii.ae/tii-releases-first-sslm-with-falcon-mamba-7b.html | Dense | ||||||||
35 | Flan-PaLM | google | 73.5 | 2.2 | 540 | 780 | 2:1 | Oct/2022 | 2022 | 10 | 3.83 | Google | https://arxiv.org/abs/2210.11416 | Dense |
36 | Galactica | meta | 52.6 | 0.8 | 120 | 450 | 4:1 | Nov/2022 | 2022 | 11 | 3.92 | Meta AI | https://galactica.org/ | https://galactica.org/static/paper.pdf | Dense | scientific only | YES | significant models | |||||||
37 | Gemini 1.5 Flash | 78.9 | 1,271 | May/2024 | 2024 | 5 | 5.42 | Google DeepMind | https://aistudio.google.com/app/prompts/new_chat | 59.1 | 39.5 | https://goo.gle/GeminiV1-5 | MoE | 1M context length. | significant models | ||||||||||
38 | Gemini 1.5 Pro | 85.9 | 1,260 | 22.4 | 1500 | 30,000 | 20:1 | Feb/2024 | 2024 | 2 | 5.17 | Google DeepMind | https://aistudio.google.com/app/prompts/new_chat | 69 | 46.2 | https://goo.gle/GeminiV1-5 | MoE | 100 languages | YES | significant models | |||||
39 | Gemini 2.0 | google | 76.4 | 1,369 | Dec/2024 | 2024 | 12 | 6.00 | Google DeepMind | https://console.cloud.google.com/vertex-ai/generative/multimodal/create/text?model=gemini-2.0-flash-exp&pli=1&inv=1&invt=Abj6Sg | 76.4 | 62.1 | https://console.cloud.google.com/vertex-ai/generative/multimodal/create/text?model=gemini-2.0-flash-exp&pli=1&inv=1&invt=Abj6Sg | MoE | Introduces native image generation and controllable text-to-speech capabilities | YES |
40 | Gemini Ultra 1.0 | google | 83.7 | 22.4 | 1500 | 30,000 | 20:1 | Dec/2023 | 2023 | 12 | 5.00 | Google DeepMind | https://deepmind.google/technologies/gemini/ | 35.7 | https://storage.googleapis.com/deepmind-media/gemini/gemini_1_report.pdf | Dense | Bot, based on Chinchilla | significant models |
41 | Gemini-1.5 | 75.8 | 1,301 | 22.4 | 1500 | 30,000 | 20:1 | Sep/2024 | 2024 | 9 | 5.75 | Google DeepMind | https://aistudio.google.com/app/prompts/new_chat | 75.8 | 59.1 | https://developers.googleblog.com/en/updated-production-ready-gemini-models-reduced-15-pro-pricing-increased-rate-limits-and-more/ | MoE | YES | significant models | ||||||
42 | Gemma | 64.3 | 0.7 | 7 | 6,000 | 858:1 | Feb/2024 | 2024 | 2 | 5.17 | Google DeepMind | https://labs.pplx.ai/ | 33.7 | https://storage.googleapis.com/deepmind-media/gemma/gemma-report.pdf | Dense | YES | |||||||||
43 | Gemma 2 | 75.2 | 1,220 | 2 | 27 | 13,000 | 482:1 | Jun/2024 | 2024 | 6 | 5.50 | Google DeepMind | https://huggingface.co/google/gemma-2-27b-it | https://storage.googleapis.com/deepmind-media/gemma/gemma-2-report.pdf | Dense | YES | |||||||||
44 | GLM-4 | chinese | 81.5 | 1,274 | 3 | 200 | 4,000 | 20:1 | Jan/2024 | 2024 | 1 | 5.08 | Tsinghua | https://open.bigmodel.cn/ | https://pandaily.com/zhipu-ai-unveils-glm-4-model-with-advanced-performance-paralleling-gpt-4/ | Dense | Best Chinese model to date based on analysis. Follows OpenAI roadmap | significant models | |||||||
45 | Gopher | 60.0 | 1.0 | 280 | 300 | 2:1 | Dec/2021 | 2021 | 12 | 3.00 | DeepMind | https://arxiv.org/abs/2112.11446 | Dense | ||||||||||||
46 | GPT-2 | openAI | 32.4 | 0.0 | 1.5 | 10 | 7:1 | Feb/2019 | 2019 | 2 | 0.17 | OpenAI | Hugging Face | https://openai.com/blog/better-language-models/ | Dense | trained on Reddit only | YES | significant models | |||||||
47 | GPT-3 | openAI | 43.9 | 0.8 | 175 | 300 | 2:1 | May/2020 | 2020 | 5 | 2.65 | OpenAI | https://arxiv.org/abs/2005.14165 | Dense | significant models | ||||||||||
48 | GPT-4 Classic | openAI | 86.4 | 1,186 | 15.9 | 1760 | 13,000 | 8:1 | Mar/2023 | 2023 | 3 | 4.25 | OpenAI | https://chat.openai.com/ | 35.7 | https://cdn.openai.com/papers/gpt-4.pdf | MoE | significant models | |||||||
49 | GPT-4 Turbo* | openAI | 86.4 | 1,256 | 13,000 | Nov/2023 | 2023 | 11 | 4.92 | OpenAI | https://chat.openai.com/ | 46.5 | https://cdn.openai.com/papers/gpt-4.pdf | MoE | Significantly better than earlier GPT models | significant models |
50 | GPT-4o mini* | openAI | 82.0 | 1,273 | 1.1 | 8 | 13,000 | 1,625:1 | Jul/2024 | 2024 | 7 | 5.58 | OpenAI | https://chatgpt.com/ | 40.2 | https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/ | MoE | "OpenAI would not disclose exactly how large GPT-4o mini is, but said it’s roughly in the same tier as other small AI models, such as Llama 3 8b, Claude Haiku and Gemini 1.5 Flash." | significant models | ||||||
51 | GPT-4o* | openAI | 88.7 | 1,265 | 6.7 | 200 | 20,000 | 100:1 | May/2024 | 2024 | 5 | 5.42 | OpenAI | https://chatgpt.com/ | 72.6 | 53.6 | https://openai.com/index/gpt-4o-system-card/ | MoE | likely early "beta" of GPT-5 | YES | significant models | ||||
52 | GPT-NeoX | other | 33.6 | 20 | Mar/2023 | 2023 | 3 | 4.25 | Together | https://huggingface.co/spaces/togethercomputer/OpenChatKit | https://github.com/togethercomputer/OpenChatKit | Dense | YES | ||||||||||||
53 | Granite | other | 57.0 | 0.6 | 13 | 2,500 | 193:1 | Sep/2023 | 2023 | 9 | 4.75 | IBM | https://www.ibm.com/granite | https://www.ibm.com/downloads/cas/X9W4O6BM | Trained on 2.5T tokens |
54 | Griffin | 49.5 | 0.2 | 14 | 300 | 22:1 | Feb/2024 | 2024 | 2 | 5.17 | Google DeepMind | https://arxiv.org/abs/2402.19427 | Dense | ||||||||||||
55 | GRIN MoE | microsoft | 79.4 | 1.6 | 60 | 4,025 | 68:1 | Sep/2024 | 2024 | 9 | 5.75 | Microsoft | https://huggingface.co/microsoft/GRIN-MoE | https://huggingface.co/microsoft/GRIN-MoE/blob/main/GRIN_MoE.pdf | MoE | ||||||||||
56 | Grok-1.5 | other | 81.3 | 4.6 | 314 | 6,000 | 20:1 | Mar/2024 | 2024 | 3 | 5.25 | xAI | https://grok.x.ai/ | https://x.ai/blog/grok-1.5 | MoE | Twitter chatbot trained on Twitter data, Context=128k. | |||||||||
57 | Grok-2 | other | 87.5 | 1,288 | 10.0 | 600 | 15,000 | 25:1 | Aug/2024 | 2024 | 8 | 5.67 | xAI | https://x.com/i/grok | 75.5 | 56 | https://x.ai/blog/grok-2 | MoE | Twitter chatbot trained on Twitter data | YES | significant models | ||||
58 | H2O-Danube3-4B | other | 55.2 | 0.5 | 4 | 6,000 | 1,500:1 | Jul/2024 | 2024 | 7 | 5.58 | H2O.ai | https://h2o.ai/platform/danube/personal-gpt/ | https://arxiv.org/abs/2407.09276 | Dense | "Runs natively and fully offline on mobile phone." |
59 | Hawk | 35.0 | 0.2 | 7 | 300 | 43:1 | Feb/2024 | 2024 | 2 | 5.17 | Google DeepMind | https://arxiv.org/abs/2402.19427 | Dense | ||||||||||||
60 | HLAT | other | 41.3 | 0.4 | 7 | 1,800 | 258:1 | Apr/2024 | 2024 | 4 | 5.33 | Amazon | https://arxiv.org/abs/2404.10630 | Dense | |||||||||||
61 | iFlytekSpark-13B | other | 63.0 | 0.7 | 13 | 3,000 | 231:1 | Jan/2024 | 2024 | 1 | 5.08 | iFlyTek | https://gitee.com/iflytekopensource/iFlytekSpark-13B | https://www.ithome.com/0/748/030.htm | Dense | ||||||||||
62 | Inflection-2.5 | other | 85.5 | 16.3 | 1200 | 20,000 | 17:1 | Mar/2024 | 2024 | 3 | 5.25 | Inflection AI | https://inflection.ai/inflection-2 | 38.4 | https://inflection.ai/inflection-2-5 | Dense | |||||||||
63 | InternLM2 | chinese | 67.7 | 0.8 | 20 | 2,600 | 130:1 | Jan/2024 | 2024 | 1 | 5.08 | Shanghai AI Laboratory/SenseTime | https://github.com/InternLM/InternLM | https://arxiv.org/abs/2403.17297 | Dense | Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS | |||||||||
64 | InternLM2.5 | chinese | 73.5 | 0.8 | 20 | 2,600 | 130:1 | Jul/2024 | 2024 | 7 | 5.58 | Shanghai AI Laboratory/SenseTime | https://huggingface.co/internlm/internlm2_5-20b-chat | 38.4 | https://github.com/InternLM/InternLM/blob/main/model_cards/internlm2.5_7b.md | Dense | Image-aware Decoder Enhanced à la Flamingo with Interleaved Cross-attentionS | YES | |||||||
65 | Jamba 1 | other | 67.4 | 1.7 | 52 | 5,000 | 97:1 | Mar/2024 | 2024 | 3 | 5.25 | AI21 | https://huggingface.co/ai21labs/Jamba-v0.1 | https://arxiv.org/abs/2403.19887 | MoE | ||||||||||
66 | Jamba 1.5 | other | 81.2 | 1,221 | 5.9 | 398 | 8,000 | 21:1 | Aug/2024 | 2024 | 8 | 5.67 | AI21 | https://huggingface.co/collections/ai21labs/jamba-15-66c44befa474a917fcf55251 | 53.5 | 36.9 | https://arxiv.org/abs/2408.12570 | MoE | optimized for business use cases and capabilities such as function calling, structured output (JSON), and grounded generation. | ||||||
67 | JetMoE-8B | other | 49.2 | 0.3 | 8 | 1,250 | 157:1 | Apr/2024 | 2024 | 4 | 5.33 | MIT | https://www.lepton.ai/playground/chat?model=jetmoe-8b-chat | https://huggingface.co/jetmoe/jetmoe-8b | MoE | ||||||||||
68 | K2 | other | 64.8 | 1.0 | 65 | 1,400 | 22:1 | May/2024 | 2024 | 5 | 5.42 | LLM360 | https://huggingface.co/LLM360/K2 | https://www.llm360.ai/blog/several-new-releases-to-further-our-mission.html | Dense | Outperforms Llama 2 70B using 35% less compute |
69 | LFM-40B | other | 78.8 | 0.9 | 40 | 2,000 | 50:1 | Sep/2024 | 2024 | 9 | 5.75 | Liquid AI | https://labs.perplexity.ai/ | 55.6 | https://www.liquid.ai/liquid-foundation-models | MoE | |||||||||
70 | Llama 2 | meta | 68.9 | 1.2 | 70 | 2,000 | 29:1 | Jul/2023 | 2023 | 8 | 4.67 | Meta AI | https://www.llama2.ai/ | 37.5 | 26.26 | https://ai.meta.com/research/publications/llama-2-open-foundation-and-fine-tuned-chat-models/ | Dense | Open-source LLM in 3 parameter sizes (7, 13, and 70B); context window = 4,096 | YES | significant models |
71 | Llama 3 70B | meta | 82.0 | 1,206 | 3.4 | 70 | 15,000 | 215:1 | Apr/2024 | 2024 | 4 | 5.33 | Meta AI | https://meta.ai/ | 52.8 | https://ai.meta.com/blog/meta-llama-3/ | Dense | YES | significant models | ||||||
72 | Llama 3.1 405B | meta | 88.6 | 1,269 | 8.2 | 405 | 15,000 | 38:1 | Jul/2024 | 2024 | 7 | 5.58 | Meta AI | https://www.meta.ai/ | 73.3 | 51.1 | https://ai.meta.com/research/publications/the-llama-3-herd-of-models/ | Dense | YES | YES | significant models | ||||
73 | Llama 3.2 3B | meta | 63.4 | 0.6 | 3.21 | 9,000 | 2,804:1 | Sep/2024 | 2024 | 9 | 5.75 | Meta AI | https://www.llama.com/ | 32.8 | https://www.llama.com/ | Dense | YES | significant models | |||||||
74 | Llama 3.3 | meta | 86.0 | 1,256 | 3.4 | 70 | 15,000 | 215:1 | Dec/2024 | 2024 | 12 | 6.00 | Meta AI | https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct | 68.9 | 50.5 | https://huggingface.co/meta-llama/Llama-3.3-70B-Instruct | Dense | Enhanced performance | YES | YES |
75 | LLaMA-65B | meta | 68.9 | 1.0 | 65 | 1,400 | 22:1 | Feb/2023 | 2023 | 2 | 4.17 | Meta AI | Weights leaked: https://github.com/facebookresearch/llama/pull/73/files | https://research.facebook.com/publications/llama-open-and-efficient-foundation-language-models/ | Dense | Researchers only, noncommercial only. | YES | ||||||||
76 | Mamba | other | 26.2 | 0.1 | 2.8 | 300 | 108:1 | Dec/2023 | 2023 | 12 | 5.00 | CMU | https://huggingface.co/havenhq/mamba-chat | https://arxiv.org/abs/2312.00752 | Dense | ||||||||||
77 | MAP-Neo | other | 58.1 | 0.6 | 7 | 4,500 | 643:1 | May/2024 | 2024 | 5 | 5.42 | International | https://map-neo.github.io/ | https://arxiv.org/abs/2405.19327 | Dense | YES | |||||||||
78 | Minitron-4B | other | 58.6 | 0.1 | 4 | 94 | 24:1 | Aug/2024 | 2024 | 8 | 5.67 | NVIDIA | https://huggingface.co/nvidia/Minitron-4B-Base | https://arxiv.org/abs/2407.14679 | Dense | ||||||||||
79 | Minitron-8B | other | 63.8 | 0.1 | 8 | 94 | 12:1 | Jul/2024 | 2024 | 7 | 5.58 | NVIDIA | https://huggingface.co/nvidia/Mistral-NeMo-Minitron-8B-Base | https://blogs.nvidia.com/blog/mistral-nemo-minitron-8b-small-language-model/ | Dense | significant models |
80 | Mistral 7B | mistral | 30.9 | 0.3 | 7.3 | 800 | 110:1 | Sep/2023 | 2023 | 9 | 4.75 | Mistral | https://huggingface.co/mistralai | https://mistral.ai/news/announcing-mistral-7b/ | Dense | Open source, outperforms Llama 2 | YES | significant models |
81 | Mistral Large | mistral | 81.2 | 1,251 | 5.2 | 300 | 8,000 | 27:1 | Feb/2024 | 2024 | 2 | 5.17 | Mistral | https://poe.com/Mistral-Large | https://mistral.ai/news/mistral-large/ | Dense | natively fluent in English, French, Spanish, German, and Italian | significant models | |||||||
82 | Mistral Large 2 | mistral | 84.0 | 1,249 | 3.3 | 123 | 8,000 | 66:1 | Jul/2024 | 2024 | 7 | 5.58 | Mistral | https://huggingface.co/mistralai/Mistral-Large-Instruct-2407 | https://mistral.ai/news/mistral-large-2407/ | Dense | significant models | ||||||||
83 | Mistral Small | mistral | 72.2 | 0.5 | 7 | 3,000 | 429:1 | Feb/2024 | 2024 | 2 | 5.17 | Mistral | https://chat.mistral.ai/chat | https://mistral.ai/news/mistral-large/ | Dense | Optimised for latency and cost. | significant models | ||||||||
84 | Mistral-medium | mistral | 75.3 | 2.6 | 180 | 3,500 | 20:1 | Dec/2023 | 2023 | 12 | 5.00 | Mistral | https://poe.com/ | https://mistral.ai/news/la-plateforme/ | Dense | ||||||||||
85 | mixtral-8x22b | mistral | 77.8 | 1.8 | 141 | 2,000 | 15:1 | Apr/2024 | 2024 | 4 | 5.33 | Mistral | https://huggingface.co/mistral-community/Mixtral-8x22B-v0.1 | https://mistral.ai/news/mixtral-8x22b/ | MoE | ||||||||||
86 | mixtral-8x7b-32kseqlen | mistral | 70.6 | 2.0 | 46.7 | 8,000 | 172:1 | Dec/2023 | 2023 | 12 | 5.00 | Mistral | https://www.together.ai/blog/mixtral | 43.3 | https://arxiv.org/abs/2401.04088 | MoE | "Processes input and generates output at the same speed and for the same cost as a 12B model." |
87 | NeMo | mistral | 68.0 | 0.5 | 12 | 2,000 | 167:1 | Jul/2024 | 2024 | 7 | 5.58 | Mistral | https://huggingface.co/mistralai/Mistral-Nemo-Base-2407 | https://mistral.ai/news/mistral-nemo/ | Dense | ||||||||||
88 | Nemotron-3 22B | other | 54.4 | 1.0 | 22 | 3,800 | 173:1 | Nov/2023 | 2023 | 11 | 4.92 | NVIDIA | https://huggingface.co/nvidia/nemotron-3-8b-base-4k | https://developer.nvidia.com/blog/nvidia-ai-foundation-models-build-custom-enterprise-chatbots-and-co-pilots-with-production-ready-llms/ | Dense | Nvidia's LLM | |||||||||
89 | Nemotron-4 15B | other | 64.2 | 1.2 | 15 | 8,000 | 534:1 | Feb/2024 | 2024 | 2 | 5.17 | NVIDIA | https://arxiv.org/abs/2402.16819 | Dense | Nvidia's LLM | ||||||||||
90 | Nemotron-4-340B | other | 81.1 | 5.8 | 340 | 9,000 | 27:1 | Jun/2024 | 2024 | 6 | 5.50 | NVIDIA | https://build.nvidia.com/nvidia/nemotron-4-340b-instruct | https://d1qx31qr3h6wln.cloudfront.net/publications/Nemotron_4_340B_8T.pdf | Dense | Nvidia's LLM. Open-source equivalent of Mar/2023 GPT-4 | YES | YES | significant models |
91 | NVLM 1.0 | other | 82.0 | 3.8 | 72 | 18,000 | 250:1 | Sep/2024 | 2024 | 9 | 5.75 | NVIDIA | https://huggingface.co/nvidia/NVLM-D-72B | https://arxiv.org/abs/2409.11402 | Dense | Flamingo clone. |
92 | Nova Pro* | other | 85.9 | 1,244 | 3.2 | 90 | 10,000 | 112:1 | Dec/2024 | 2024 | 12 | 6.00 | Amazon | https://aws.amazon.com/bedrock/ | 46.9 | https://www.aboutamazon.com/news/aws/amazon-nova-artificial-intelligence-bedrock-aws | Dense | First major LLM from Amazon; performance similar to Llama 3.2 |
93 | Nova* | other | 88.8 | May/2024 | 2024 | 5 | 5.42 | Rubiks AI | https://rubiks.ai/ | https://rubiks.ai/nova/release/ | Dense | ||||||||||||||
94 | o1 pro* | openAI | 92.3 | 1,364 | 6.7 | 200 | 20,000 | 100:1 | Dec/2024 | 2024 | 12 | 6.00 | OpenAI | https://chatgpt.com/ | 91 | 79 | https://chatgpt.com/ | MoE | "A version of our most intelligent model that thinks longer for the most reliable responses" | YES |
95 | o1* | openAI | 92.3 | 1,335 | 6.7 | 200 | 20,000 | 100:1 | Sep/2024 | 2024 | 9 | 5.75 | OpenAI | https://chatgpt.com/ | 91 | 78.3 | https://openai.com/index/introducing-openai-o1-preview/ | MoE | YES | significant models | |||||
96 | OLMoE-1B-7B | other | 54.1 | 0.7 | 6.9 | 5,900 | 856:1 | Sep/2024 | 2024 | 9 | 5.75 | Allen AI | https://huggingface.co/collections/allenai/olmoe-66cf678c047657a30c8cd3da | 23 | https://arxiv.org/abs/2409.02060v1 | MoE | |||||||||
97 | OpenELM | other | 26.8 | 0.2 | 3.04 | 1,500 | 494:1 | Apr/2024 | 2024 | 4 | 5.33 | Apple | https://huggingface.co/apple/OpenELM-3B-Instruct | https://arxiv.org/abs/2404.14619 | Dense | ||||||||||
98 | Orion-14B | other | 69.6 | 0.6 | 14 | 2,500 | 179:1 | Jan/2024 | 2024 | 1 | 5.08 | OrionStar | https://github.com/OrionStarAI/Orion | https://arxiv.org/abs/2401.12246 | Dense | English, Chinese, Japanese, Korean, and other languages. | |||||||||
99 | Palmyra X | other | 70.2 | 1.0 | 72 | 1,200 | 17:1 | Jan/2024 | 2024 | 1 | 5.08 | Writer | https://writer.com/blog/palmyra-helm-benchmark/ | Dense | |||||||||||
100 | phi-3-medium | microsoft | 78.2 | 0.9 | 14 | 4,800 | 343:1 | Apr/2024 | 2024 | 4 | 5.33 | Microsoft | https://huggingface.co/microsoft/Phi-3-medium-128k-instruct | 55.7 | https://arxiv.org/abs/2404.14219 | Dense |