Reddit: https://www.reddit.com/r/LocalLLaMA/
Discord: https://discord.gg/Y8H8uUtxc3
For inference I use KoboldCPP (always the latest version as releases come out).
I create a .bat file for each model and run it. Here's a .bat example for the Mixtral 8x7B model:

```bat
title koboldcpp
:start
koboldcpp ^
--model mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf ^
--contextsize 4096 ^
--usecublas ^
--gpulayers 6 ^
--threads 9
pause
goto start
```
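If you run many models, one script can serve them all by taking the model file as an argument. A minimal sketch using the same flags as above; the run-model.bat name and the %1 argument convention are my own addition, not something KoboldCPP requires:

```bat
title koboldcpp
:start
REM Usage: run-model.bat mixtral-8x7b-instruct-v0.1.Q4_K_M.gguf
REM %1 is the .gguf filename passed on the command line.
koboldcpp ^
--model %1 ^
--contextsize 4096 ^
--usecublas ^
--gpulayers 6 ^
--threads 9
pause
goto start
```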
When the GUI opens up, here's how I configure the KoboldCPP settings. First I change "Quick Presets" to [Default], then I move Temperature and Rep Penalty to their minimums and deactivate all samplers for the most deterministic output I can get.
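The same near-deterministic settings can also be sent per request if you hit the model over KoboldCPP's KoboldAI-compatible API instead of the browser UI. A minimal sketch, assuming the default port 5001 and curl (bundled with recent Windows); the exact parameter set follows the KoboldAI API and may vary by version:

```bat
REM Ask the running KoboldCPP instance for a completion with samplers
REM effectively disabled: minimum temperature, no repetition penalty,
REM top_p = 1 and top_k = 0 (both neutral values).
curl http://localhost:5001/api/v1/generate ^
  -H "Content-Type: application/json" ^
  -d "{\"prompt\": \"What is Mixtral?\", \"max_length\": 200, \"temperature\": 0.1, \"rep_pen\": 1.0, \"top_p\": 1, \"top_k\": 0}"
```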
I change the "Instruct Tag Preset" (or "Start/End Sequence") to whatever prompt template is listed on the model's Hugging Face page.
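As a concrete example, the Mixtral 8x7B Instruct model card on Hugging Face lists the template below (the BOS token is typically added by the loader automatically), which maps to a Start Sequence of "[INST] " and an End Sequence of " [/INST]":

```
[INST] {prompt} [/INST]
```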
Some good benchmarks:
https://oobabooga.github.io/benchmark.html