Hardware | Threads | Engine version/type | Net size | Speed (nps) | Network | Remarks / command line
---|---|---|---|---|---|---
RTX 3080 & 3070 | 6 | lc0 v0.27 | 20x256 | 131229 | 11248 | lc0 -t 6 --backend=multiplexing --backend-opts="backend=cuda-fp16,a(gpu=0),b(gpu=1),c(gpu=0),d(gpu=1)" -w weights_11248.txt --nncache=2000000 --minibatch-size=1024
RTX 3080 & 2070 Super | 6 | lc0 v0.26.3 | 20x256 | 120096 | 11248 | lc0 -t 6 --backend=multiplexing --backend-opts="backend=cuda-fp16,a(gpu=0),b(gpu=1),c(gpu=0),d(gpu=1)" -w weights_11248.txt --nncache=2000000 --minibatch-size=1024
RTX 3090 | 4 | lc0 v0.26.3 | 20x256 | 101006 | 42850 | -t 4 -b cuda-fp16 --nncache=2000000 --minibatch-size=1024
3x RTX 2080 Ti | 2 | lc0 v0.23.1+git.6837b83 | 20x256 | 80814 | 42850 | -t 2 --backend=demux --backend-opts="(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1),(backend=cudnn-fp16,gpu=2)" --nncache=2000000 --minibatch-size=1024; go nodes 5000000
2x RTX Titan | 4 | lc0 v0.20 dev (with PR 619) | 20x256 | 80000 | | --threads=4 --backend=roundrobin --nncache=10000000 --cpuct=3.0 --minibatch-size=256 --max-collision-events=64 --max-prefetch=64 --backend-opts=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1); go infinite; NPS checked after 100 seconds (peak was over 100k, then it starts dropping)
4x V100 | 4 | lc0 cuda92 cudnn714 ubuntu | 20x256 | 78700 | 10040 | ./lc0 --backend=multiplexing --backend-opts="x(backend=cudnnhalf,gpu=0,max_batch=512),y(backend=cudnnhalf,gpu=1,max_batch=512),yy(backend=cudnnhalf,gpu=2,max_batch=512),yyy(backend=cudnnhalf,gpu=3,max_batch=512)" --no-smart-pruning --minibatch-size=1024 --threads=4
RTX 2070 Super & 2070 | 4 | lc0 v0.20 dev (with PR 619) | 20x256 | 78401 | 32332 | same as above
RTX 2070 Super & 2070 | 12 | lc0 v0.22.0 | 20x256 | 76052 | 42425 |
 | 2 | lc0-cudnn | 20x256 | 66723 | kb1-256x20-2100000 | batch size 256, node collisions 32
GTX 1070 @ stock | 2 | lc0 v0.20.3 | 20x256 | 54150 | 32930 | -t 2 --backend=cudnn-fp16 --minibatch-size=1024 --nncache=20000000; go nodes 5000000
 | 3 | lc0 v0.20.4 | 20x256 | 53838 | 11248 | --threads=3 --backend=roundrobin --nncache=10000000 --cpuct=3.0 --minibatch-size=256 --max-collision-events=64 --max-prefetch=64 --backend-opts=(backend=cudnn-fp16,gpu=0); go infinite; NPS checked at 100 seconds
2x RTX 2060 | 4 | lc0 v0.22.0 | 20x256 | 52410 | 40685 | -t 4 -backend=demux -nncache=1000000 -minibatch-size=512 -max-prefetch=32 -backend-opts=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1); go nodes 5000000
2x RTX 2060 | 4 | lc0 v0.22.0 | 20x320 | 52340 | T40B.2-106 | -t 4 -backend=demux -nncache=1000000 -minibatch-size=512 -max-prefetch=32 -backend-opts=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1); go nodes 5000000
RTX Titan | 3 | lc0 v0.20.1-rc1 | 20x256 | 50558 | 32392 | --minibatch-size=512 -t 3 --backend=cudnn-fp16 --nncache=10000000; go infinite; NPS noted after 1 min
RTX 2080 Ti @ 338W (~1785 MHz) | 2 | lc0 v0.20.2 (Linux: Fedora 29, 415.27, cuda 10.0, cudnn 7.4.2.24) | 20x256 | 50456 | 32392 | -t 2 --backend=cudnn-fp16 --minibatch-size=1024 --nncache=20000000; go nodes 5000000
 | 2 | lc0 v0.20.1-rc1 | 20x256 | 46446 | | --backend=cudnn-fp16 --nncache=10000000; go infinite; NPS noted after 1 min
RTX Titan | 3 | lc0 v0.20.2 (Linux: Fedora 29, 415.27, cuda 10.0, cudnn 7.4.2.24) | 20x256 | 39390 | 32930 | --minibatch-size=512 -t 3 --backend=cudnn-fp16 --nncache=2000000; go nodes 5000000
RTX 2070 (slight OC, +66 MHz core) | 2 | lc0-v0.19.1.1-windows-cuda.zip | 20x256 | 31723 | 32085 | --minibatch-size=1024 -t 2 --backend=multiplexing --backend-opts="x(backend=cudnn-fp16,gpu=0)"; go nodes 1000000
 | 2 | lc0 v0.18.1 | | 31433 | 11250 |
TITAN V | 2 | lc0 v0.20.1-rc2 | 20x256 | 31004 | 10048 | --minibatch-size=512 -t 2 --backend=cudnn-fp16 --nncache=2000000; go nodes 1000000
 | 6 | lc0-v0.19.0 | 20x256 | 29329 | 31748 | lc0 -t 6 -w weights_31748.txt --backend=multiplexing --backend-opts="a(backend=cudnn-fp16,gpu=0,minibatch-size=512,nncache=2000000),b(backend=cudnn,gpu=1)"
 | 2 | lc0-v0.18.1-windows-cuda10.0-cudnn7.3-for-2080.zip | | 26135 | 11250? | --futile-search-aversion=0 --minibatch-size=1024 -t 2 --backend-opts="x(backend=cudnn-fp16,gpu=0)"
RTX 2070 | | lc0 v0.26.3 | 20x256 | 25480 | 42850 | lc0 benchmark, default settings
RTX 2080 (laptop) | 2 | lc0 v0.22.0 | 20x256 | 24142 | T40B.2-106 | lc0.exe -b cudnn-fp16 -w T40B.4-160; go nodes 100000
 | 2 | lc0-v0.20.2 | ? | 21797 | 32742 | --minibatch-size=512 -t 2 --backend=cudnn-fp16 --nncache=2000000; go nodes 1000000
RTX 2060 | 4 | lc0-v0.18.1-windows-cuda10.0-cudnn7.3-for-2080.zip | 20x256 | 21413 | 11250 | .\lc0 --weights=weights_run1_11248.pb.gz --threads=4 --minibatch-size=256 --allowed-node-collisions=256 --cpuct=2.8 --nncache=10000000 --backend=multiplexing --backend-opts="(backend=cudnn,gpu=0),(backend=cudnn,gpu=1)"
 | 4 | lc0-v0.18.1 | 20x256 | 21413 | 11248 |
Radeon R9 390X | 2 | lc0-v0.26.3-windows10-gpu-dx12 | 10x128 | 19897 | 703810 | lc0 benchmark
RTX 2060 (laptop) | | lc0 v0.26.3 | 20x256 | 17522 | 42850 | lc0 benchmark, default settings
GeForce GTX 780 Ti | 4 | lc0-v0.26.0 | 20x256 | 11000 | 1541000 | default settings; backend=cudnn-fp16, NNCacheSize=20000000, MiniBatchSize=1024, MaxCollisionEvents=1024, Threads=4
RX 5700 XT | 2 | lc0-v0.25.1-windows10-gpu-dx12 | 20x256 | 10981 | 42850 | ./lc0 benchmark
GTX 1080 Ti | | lc0 v0.17.2 dev | 20x256 | 9208 | 10954 | GPU load 98-99%, GPU temp ≤82 °C, fan speed 95%; go movetime 130000
GTX 1650 SUPER | 2 | lc0-v0.24.1-windows-gpu-nvidia-cuda | 20x256 | 8773 | 42850 | ./lc0 benchmark --threads=2 --backend=cudnn-fp16 --minibatch-size=512 --movetime=50000
GTX 1080 | 2 | ? | | 8000 | 11250 |
2x GTX 1060 (6GB) | 1 | lc0-win-20180526 (cuda 9.2) | 20x256 | 7596 | kb1-256x20-2000000 | -t 12 --backend=multiplexing --backend-opts="(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)" --nncache=2000000 --minibatch-size=1024; go nodes 5000000
 | 4 | lc0 v0.17 RC2 (Windows) | | 7005 | 10970 | -t 4 --minibatch-size=512 --backend=multiplexing --backend-opts=(backend=cudnn,gpu=0,max_batch=1024),(backend=cudnn,gpu=1,max_batch=1024)
GTX 1070 Ti | 2 | lc0 v0.20.1 (cuda) | 20x256 | 6496 | 32965 | --nncache=8000000 --max-collision-events=256 --minibatch-size=256 --backend=multiplexing --cpuct=3.1
GTX 1070 Ti | 2 | lc0 v0.17 | 20x256 | 5657 | 11149 | --futile-search-aversion=0 (the equivalent of --no-smart-pruning), otherwise default settings
 | 2 | lc0 v0.17 RC2 (Windows) | 20x256 | 3598 | 10970 | --threads=1 --fpu-reduction=0.2 --cpuct=1.2 --slowmover=1.5 --move-overhead=10 --no-smart-pruning
GeForce GTX 1060 (3GB) | | lc0 v0.24.1 (cuda 10.0.0, cudnn 7.4.2) | 20x256 | 3365 | LS 14.3 | lc0 benchmark (v0.24.1)
GTX 1060 (6GB) | 2 | lc0 v0.24.1 (cuda 10.2) | 20x256 | 3359 | 20x256SE-jj-9-53420000 |
GTX 980M (10% underclock) | 1 | lc0-win-20180522 (cuda 9.2) | 20x256 | 2258 | kb1-256x20-2000000 |
GTX 960 | 2 | lc0 v0.18.1 windows cuda | 20x256 | 2123 | 11248 |
GTX 980M (10% underclock) | 2 | | 20x256 | 1855 | kb1-256x20-2000000 | --no-smart-pruning (value corrected from 1093 to 1855 after rerun)
GTX 750 Ti @ 1350/1350 MHz | 1 | lc0 v0.18.1 windows cuda | 20x256 | 1658 | 11258 | GPU load 98%, GPU temp 59 °C, fan speed 39%; --no-smart-pruning not recognized by this build; --nncache=300000; go nodes 725000
AMD R9 Fury X (core + mem 20% overclock) | 4 | v0.21.2-rc3 (Manjaro, OpenCL 2.1 AMD-APP (2841.4)) | 20x256 | 1518 | 42512 | result at 130,000 nodes; peaks around 550,000 nodes at 1880 nps, then starts decreasing
GTX 950 | 1 | v0.16.0 custom build for Debian (cuda 9.1.85) | 20x256 | 1501 | 10687 | --no-smart-pruning --minibatch-size=1024 --threads=2
GTX 750 Ti @ stock | 2 | lc0 v0.18.1 windows cuda | 20x256 | 1314 | 11258 | GPU load 100%, GPU temp 50 °C, fan speed 35%; --no-smart-pruning not recognized by this build
Nvidia Quadro K2200 | 2 | lc0 v0.20.1 rc1 (Windows) | 20x256 | 1304 | 32194 |
AMD RX 480 (core + mem 20% overclock) | 4 | v0.21.2-rc3 (Manjaro, OpenCL 2.1 AMD-APP (2841.4)) | 20x256 | 1133 | 42512 | result at 130,000 nodes; peaks around 540,000 nodes at 1385 nps, then starts decreasing
Nvidia GeForce 840M | 4 | v0.21.2-rc3 (Manjaro, cuda 10.1.0, cudnn 7.5.1) | 20x256 | 433 | 42512 |
 | 16 | v0.21.2 OpenBlas (Windows) | 20x256 | 288 | 42699 |
GTX 470 | 2 | lczero v0.10 OpenCL (Windows) | 20x256 | 130 | 10021 |
AMD Ryzen 3 1200 @ stock | 4 | lczero v0.10 OpenBlas (Windows) | 20x256 | 35 | 10021 | tested using bumblebee + optirun; peaks around 570,000 nodes at 539 nps, then decreases and stays around 527-528 nps

Notes:

- LCZero Benchmark, Nodes/sec, GPU & CPU. Please put your own bench scores here, in sorted NPS order if you can. If you don't know the engine type: GPU builds are OpenCL and CPU builds are OpenBLAS. Please include the network ID.
- For the cudnn client, start with the --no-smart-pruning flag, run "go nodes 130000" and wait until it finishes. For faster GPUs, let it run to 1M or even 5M nodes.
- Alternatively, run "go infinite" from the start position, abort after depth 30, and report the NPS output.
- Cleaned up for the last time! After that the sheet will be closed.

Incomplete entries (cells preserved as submitted):

- Intel Core i7-4900MQ (4x 2.80 GHz, 500GB, 16GB, Quadro K2100M) | lc0 v0.20.1 rc1 (Windows) | 20x256 | 32805 | This was performed at the starting position; the whole-game average was 63k (see the latest paper and the talkchess clarifications).
- TITAN V | 3 threads | lc0-win-20180708-cuda92-cudnn714 | 20x256 | 11248
- MX150 | 2 threads | lc0 v0.17 RC2 (Windows) | 20x256 | 11045 | --minibatch-size=256 --backend=cudnn
- TITAN XP | 2 threads | 20x256 | 10751 | This benchmark was done in Arena Chess; I suspect Leela would achieve an even higher NPS under cutechess-cli.
- RTX 2080 Ti @ 1290 MHz (~169W) | 2 threads | -t 2 --backend=cudnn-fp16 --minibatch-size=1024 --nncache=20000000; go nodes 5000000
- 4x TPU | 20x256? | --threads=4 --backend=roundrobin --nncache=10000000 --cpuct=3.0 --minibatch-size=256 --max-collision-events=64 --max-prefetch=64 --backend-opts=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1); go infinite; NPS checked after 100 seconds (peak was over 100k, then it starts dropping)
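The measurement methods above can be sketched as a shell session. This is a sketch, not the one canonical procedure: the weights file name (weights_11248.txt) and the cudnn-fp16 backend are taken from entries in the table and will need adjusting for your hardware and network.

```shell
# Method 1: fixed node count. --no-smart-pruning keeps the search from
# stopping early, so the final nps reflects raw throughput.
./lc0 --no-smart-pruning --backend=cudnn-fp16 -w weights_11248.txt
# In the UCI session that opens, type:
#   go nodes 130000
# (1M-5M nodes on faster GPUs) and note the nps from the last info line.

# Method 2: open-ended search. Start the engine the same way, run
#   go infinite
# from the start position, abort after depth 30, and report the printed nps.

# Newer builds (see the v0.24+ rows above) also ship a built-in benchmark:
./lc0 benchmark
```

Whichever method you use, report it in the remarks column so the numbers stay comparable.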