| | A | B | C | D | E | F | G | H | I | J | K | L |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Model | without template | | with template | | with template multiturn | | n-shot | diff (with template minus without template) | | diff (with template multiturn minus without template) | |
2 | | generate | multi_choice | generate | multi_choice | generate | multi_choice | | generate | multi_choice | generate | multi_choice |
3 | Remek/OpenChat3.5-0106-Spichlerz-Inst-001 | 56,21 | 50,90 | 55,62 | 49,46 | 57,02 | 52,11 | 5 | -0,59 | -1,44 | 0,81 | 1,21 |
4 | berkeley-nest/Starling-LM-7B-alpha | 52,47 | 49,42 | 51,44 | 46,55 | 55,18 | 51,16 | 5 | -1,03 | -2,87 | 2,71 | 1,74 |
5 | mistralai/Mistral-7B-Instruct-v0.2 | 47,40 | 42,83 | 40,31 | 39,94 | 43,60 | 41,50 | 5 | -7,09 | -2,88 | -3,80 | -1,33 |
6 | HuggingFaceH4/zephyr-7b-alpha | 44,00 | 31,59 | 33,19 | 15,75 | 43,31 | 37,44 | 5 | -10,80 | -15,84 | -0,69 | 5,85 |
7 | openchat/openchat-3.5-0106 | 56,35 | 50,86 | 56,28 | 50,58 | 57,07 | 50,97 | 5 | -0,06 | -0,28 | 0,73 | 0,11 |
8 | Remek/OpenChat3.5-0106-Spichlerz-Bocian | 56,08 | 48,15 | 52,39 | 47,53 | | | 5 | -3,69 | -0,62 | | |
9 | Nexusflow/Starling-LM-7B-beta | 56,38 | 48,43 | 51,42 | 47,90 | | | 5 | -4,96 | -0,53 | | |
10 | Qwen/Qwen1.5-32B-Chat | 54,48 | 54,61 | 55,46 | 51,83 | | | 5 | 0,97 | -2,79 | | |
11 | Qwen/Qwen1.5-72B-Chat | 54,34 | 58,92 | 60,60 | 56,06 | | | 5 | 6,26 | -2,86 | | |
12 | Qwen/Qwen2-7B-Instruct | 54,73 | 47,61 | 53,30 | 42,98 | | | 5 | -1,42 | -4,63 | | |
13 | meta-llama/Meta-Llama-3-70B-Instruct | 66,40 | 60,75 | 63,09 | 54,30 | | | 5 | -3,31 | -6,45 | | |
14 | meta-llama/Meta-Llama-3-8B-Instruct | 53,53 | 44,19 | 13,29 | 19,16 | 54,23 | 40,75 | 5 | -40,24 | -25,03 | 0,70 | -3,44 |
15 | speakleash/Bielik-7B-Instruct-v0.1 | 47,97 | 37,56 | 47,95 | 12,04 | 43,32 | 42,52 | 5 | -0,02 | -25,52 | -4,66 | 4,96 |
16 | mistralai/Mistral-7B-Instruct-v0.3 | 49,50 | 43,02 | 43,85 | 39,75 | 49,44 | 40,42 | 5 | -5,65 | -3,26 | -0,06 | -2,59 |
17 | mistralai/Mixtral-8x22B-Instruct-v0.1 | 67,91 | 60,79 | 62,83 | 57,42 | | | 5 | -5,09 | -3,37 | | |
18 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 57,95 | 50,00 | 43,13 | 42,11 | | | 5 | -14,82 | -7,89 | | |
19 | models/gwint2 | 65,58 | 62,65 | 65,45 | 63,31 | | | 5 | -0,13 | 0,66 | | |
20 | openchat/openchat-3.5-0106-gemma | 57,36 | 53,34 | 39,02 | 41,36 | 50,09 | 41,25 | 5 | -18,34 | -11,99 | -7,27 | -12,09 |
21 | openchat/openchat-3.6-8b-20240522 | 56,77 | 51,85 | 57,78 | 47,57 | | | 5 | 1,01 | -4,28 | | |
22 | | | | | | | | | | | | |
23 | Remek/OpenChat3.5-0106-Spichlerz-Inst-001 | 35,91 | 27,43 | 36,36 | 35,24 | 36,36 | 35,24 | 0 | 0,45 | 7,82 | 0,45 | 7,82 |
24 | berkeley-nest/Starling-LM-7B-alpha | 36,83 | 25,38 | 42,22 | 46,49 | 42,22 | 46,49 | 0 | 5,39 | 21,11 | 5,39 | 21,11 |
25 | mistralai/Mistral-7B-Instruct-v0.2 | 33,75 | 23,48 | 30,41 | 33,17 | 30,41 | 33,17 | 0 | -3,34 | 9,69 | -3,34 | 9,69 |
26 | HuggingFaceH4/zephyr-7b-alpha | 27,25 | 24,36 | 27,52 | 30,11 | 27,52 | 30,11 | 0 | 0,27 | 5,75 | 0,27 | 5,75 |
27 | openchat/openchat-3.5-0106 | 37,48 | 28,15 | 39,48 | 43,20 | 39,48 | 43,20 | 0 | 2,00 | 15,05 | 2,00 | 15,05 |
28 | Remek/OpenChat3.5-0106-Spichlerz-Bocian | 36,33 | 28,80 | 30,98 | 39,27 | | | 0 | -5,34 | 10,47 | | |
29 | Nexusflow/Starling-LM-7B-beta | 34,93 | 29,17 | 31,36 | 38,25 | | | 0 | -3,57 | 9,07 | | |
30 | Qwen/Qwen1.5-32B-Chat | 25,20 | 42,33 | 33,76 | 40,21 | | | 0 | 8,56 | -2,12 | | |
31 | Qwen/Qwen1.5-72B-Chat | 1,11 | 48,81 | 46,46 | 48,95 | | | 0 | 45,35 | 0,14 | | |
32 | Qwen/Qwen2-7B-Instruct | 41,71 | 38,12 | 35,86 | 44,73 | | | 0 | -5,85 | 6,61 | | |
33 | meta-llama/Meta-Llama-3-70B-Instruct | 44,84 | 41,33 | 48,05 | 42,88 | | | 0 | 3,22 | 1,55 | | |
34 | meta-llama/Meta-Llama-3-8B-Instruct | 35,38 | 23,26 | 35,14 | 36,10 | 35,14 | 36,12 | 0 | -0,24 | 12,85 | -0,24 | 12,87 |
35 | speakleash/Bielik-7B-Instruct-v0.1 | 19,38 | 19,93 | 40,48 | 18,31 | 40,48 | 18,31 | 0 | 21,10 | -1,62 | 21,10 | -1,62 |
36 | mistralai/Mistral-7B-Instruct-v0.3 | 33,27 | 21,68 | 31,84 | 29,98 | 31,41 | 28,89 | 0 | -1,43 | 8,30 | -1,86 | 7,22 |
37 | mistralai/Mixtral-8x22B-Instruct-v0.1 | 55,32 | 50,30 | 46,90 | 52,87 | | | 0 | -8,42 | 2,57 | | |
38 | mistralai/Mixtral-8x7B-Instruct-v0.1 | 41,50 | 30,08 | 36,69 | 38,25 | | | 0 | -4,81 | 8,17 | | |
39 | models/gwint2 | 48,63 | 53,50 | 50,59 | 59,39 | | | 0 | 1,95 | 5,89 | | |
40 | openchat/openchat-3.5-0106-gemma | 46,82 | 41,92 | 33,79 | 8,76 | 33,63 | 8,36 | 0 | -13,03 | -33,16 | -13,19 | -33,57 |
41 | openchat/openchat-3.6-8b-20240522 | 39,18 | 36,54 | 41,14 | 44,54 | | | 0 | 1,96 | 8,00 | | |
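
The diff columns appear to be the templated score minus the untemplated score for the same task type, with values written using a decimal comma. The sketch below recomputes two of the 5-shot `generate` diffs under that reading. The function names `parse_score` and `template_diff` are illustrative and not part of the sheet. Several rows in the table differ from this recomputation by 0,01, presumably because the sheet subtracts unrounded scores; that explanation is an assumption.

```python
from decimal import Decimal


def parse_score(text: str) -> Decimal:
    """Parse a score written with a decimal comma, e.g. '56,21'."""
    return Decimal(text.strip().replace(",", "."))


def template_diff(without: str, with_template: str) -> Decimal:
    """Return the with-template score minus the without-template score."""
    return parse_score(with_template) - parse_score(without)


# (without template, with template) generate scores copied from the 5-shot rows.
rows = {
    "Remek/OpenChat3.5-0106-Spichlerz-Inst-001": ("56,21", "55,62"),
    "meta-llama/Meta-Llama-3-8B-Instruct": ("53,53", "13,29"),
}

for model, (without, with_template) in rows.items():
    # Negative values mean the chat template lowered the score.
    print(model, template_diff(without, with_template))
```

Using `Decimal` rather than `float` keeps the two-decimal subtraction exact, so the recomputed values can be compared directly with the reported ones.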