LCZero Benchmarks
# | GPU / CPU | Threads | Engine version/type | Neural net size | Speed (nps) | Network | Remark
2 | 2x RTX TITAN | 4 | lc0 v0.20 dev (with PR 619) | 20x256 | 80000 | | --threads=4 --backend=roundrobin --nncache=10000000 --cpuct=3.0 --minibatch-size=256 --max-collision-events=64 --max-prefetch=64 --backend-opts=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1); go infinite; NPS checked after 100 seconds (peak was over 100k, then it starts dropping)
3 | 4x V100 | 4 | lc0 cuda92 cudnn714 ubuntu | 20x256 | 78700 | 10040 | ./lc0 --backend=multiplexing --backend-opts="x(backend=cudnnhalf,gpu=0,max_batch=512),y(backend=cudnnhalf,gpu=1,max_batch=512),yy(backend=cudnnhalf,gpu=2,max_batch=512),yyy(backend=cudnnhalf,gpu=3,max_batch=512)" --no-smart-pruning --minibatch-size=1024 --threads=4
4 | 2x RTX TITAN | 4 | lc0 v0.20 dev (with PR 619) | 20x256 | 78401 | 32332 | same as row 2
5 | RTX 2070 & 2060 | 6 | lc0 v0.20.2 | 20x256 | 68417 | 40685 | -t 6 --backend=multiplexing "--backend-opts=(backend=cudnn-fp16,gpu=0),(backend=cudnn-fp16,gpu=1)" --nncache=2000000 --minibatch-size=1024; go nodes 5000000
6 | RTX 2070 & 2060 | 2 | lc0 v0.20.3 | 20x256 | 54150 | 32930 | -t 2 --backend=cudnn-fp16 --minibatch-size=1024 --nncache=20000000; go nodes 5000000
7 | mkl | 3 | lc0 v0.20.4 | 20x256 | 53838 | 11248 | --threads=3 --backend=roundrobin --nncache=10000000 --cpuct=3.0 --minibatch-size=256 --max-collision-events=64 --max-prefetch=64 --backend-opts=(backend=cudnn-fp16,gpu=0); go infinite; NPS checked at 100 seconds
8 | RTX Titan | 3 | lc0 v0.20.1-rc1 | 20x256 | 50558 | 32392 | --minibatch-size=512 -t 3 --backend=cudnn-fp16 --nncache=10000000; go infinite; note down NPS after 1 min
9 | RTX 2080 Ti @ 338W (~1785 MHz) | 2 | lc0 v0.20.2 (linux: fedora 29, 415.27, cuda 10.0, cudnn 7.4.2.24) | 20x256 | 50456 | 32930 | -t 2 --backend=cudnn-fp16 --minibatch-size=1024 --nncache=20000000; go nodes 5000000
10 | RTX Titan | 2 | lc0 v0.20.1-rc1 | 20x256 | 46446 | 32392 | --backend=cudnn-fp16 --nncache=10000000; go infinite; note down NPS after 1 min
11 | RTX 2080 Ti | 3 | lc0 v0.18.1 | 20x256 | 42915 | 11250 | --minibatch-size=512 -t 3 --backend=cudnn-fp16 --nncache=2000000; go nodes 5000000
12 | RTX 2080 Ti @ 1290 MHz (~169W) | 2 | lc0 v0.20.2 (linux: fedora 29, 415.27, cuda 10.0, cudnn 7.4.2.24) | 20x256 | 39390 | 32930 | -t 2 --backend=cudnn-fp16 --minibatch-size=1024 --nncache=20000000; go nodes 5000000
13 | RTX 2080 Ti | 2 | lc0 v0.18.1 | 20x256 | 37499 | 11250 | --minibatch-size=512 -t 2 --backend=cudnn-fp16 --nncache=2000000; go nodes 1000000
14 | RTX 2080 | | lc0-v0.19.1.1-windows-cuda.zip | 20x256 | 31723 | 32085 | --minibatch-size=1024 -t 2 --backend=multiplexing --backend-opts="x(backend=""cudnn-fp16"",gpu=0)"; go nodes 1000000
15 | RTX 2070 (slight OC, +66 MHz core) | 2 | lc0 v0.18.1 | 20x256 | 31433 | 11250 |
16 | TITAN V | 2 | lc0-win-20180708-cuda92-cudnn714 | 20x256 | 31004 | 10048 | --minibatch-size=512 -t 2 --backend=cudnn-fp16 --nncache=2000000; go nodes 1000000
17 | 10x GTX 1080 Ti | 20 | lc0 cuda92 cudnn714 ubuntu | 20x256 | 30302 | 10040 |
18 | RTX 2070 & GTX 1080 | 6 | lc0-v0.19.0 | 20x256 | 29329 | 31748 | lc0 -t 6 -w weights_31748.txt --backend=multiplexing --backend-opts="a(backend=cudnn-fp16,gpu=0,minibatch-size=512,nncache=2000000),b(backend=cudnn,gpu=1)"
19 | RTX 2080 | 2 | lc0-v0.18.1-windows-cuda10.0-cudnn7.3-for-2080.zip | 20x256 | 26135 | 11250? | --futile-search-aversion=0 --minibatch-size=1024 -t 2 --backend=multiplexing --backend-opts="x(backend=""cudnn-fp16"",gpu=0)"
20 | | 2 | lc0-v0.20.2 | ? | 21797 | 32742 | --minibatch-size=512 -t 2 --backend=cudnn-fp16 --nncache=2000000; go nodes 1000000
21 | RTX 2060 | 4 | lc0-v0.18.1-windows-cuda10.0-cudnn7.3-for-2080.zip | 20x256 | 21413 | 11250 |
22 | 2x GTX 1080 Ti | 4 | lc0-v0.18.1 | 20x256 | 21413 | 11248 | .\lc0 --weights=weights_run1_11248.pb.gz --threads=4 --minibatch-size=256 --allowed-node-collisions=256 --cpuct=2.8 --nncache=10000000 --backend=multiplexing --backend-opts="(backend=cudnn,gpu=0),(backend=cudnn,gpu=1)"
23 | GTX 1080 Ti | | LC0 v17.2 dev | 20x256 | 9208 | 10954 | GPU load 98-99%, GPU temp <= 82°C, fan speed 95%; go movetime 130000
24 | TITAN XP | 2 | lc0-win-20180708-cuda92-cudnn714 | 20x256 | 8686 | 10048 | --no-smart-pruning --minibatch-size=1024 -t 2 --backend=multiplexing --backend-opts="x(backend=cudnn,gpu=1)"
25 | GTX 1080 Ti | Default (2) | (will add later) v16 | 20x256 | 8478 | 10751 | This benchmark was done in Arena Chess; with cutechess-cli Leela would likely achieve an even higher NPS.
26 | GTX 1080 | 2 | ? | | 8000 | 11250 |
27 | 1x Titan V (overclock: +115 MHz memory, +20 MHz core) | 1 | lc0-win-20180526 (cuda 9.2) | 20x256 | 7596 | kb1-256x20-2000000 | --fpu-reduction=0.2 --cpuct=1.2 --slowmover=1.5 --move-overhead=10 --no-smart-pruning
28 | 2x GTX 1060 (6GB) | 4 | LC0 v17 RC2 (Windows) | 20x256 | 7005 | 10970 | -t 4 --minibatch-size=512 --backend=multiplexing --backend-opts=(backend=cudnn,gpu=0,max_batch=1024),(backend=cudnn,gpu=1,max_batch=1024)
29 | GTX 1070 @ stock | 2 | lc0-cudnn (batchsize=256, node-collisions=32) | 20x256 | 6672 | kb1-256x20-2100000 |
30 | GTX 1070 Ti | 2 | lc0 v0.20.1 (cuda) | 20x256 | 6496 | 32965 | --nncache=8000000 --max-collision-events=256 --minibatch-size=256 --backend=multiplexing --cpuct=3.1
31 | GTX 1070 Ti | 2 | Lc0 v0.17 | 20x256 | 5657 | 11149 | --futile-search-aversion=0 (the equivalent of --no-smart-pruning), otherwise default settings
32 | GTX 980M (10% underclock) | 2 | LC0 v17 RC2 (Windows) | 20x256 | 3598 | 10970 | -t 2 --minibatch-size=256 --backend=cudnn
33 | | 1 | lc0-win-20180522 (cuda 9.2) | 20x256 | 2258 | kb1-256x20-2000000 | --threads=1 --fpu-reduction=0.2 --cpuct=1.2 --slowmover=1.5 --move-overhead=10 --no-smart-pruning
34 | GTX 960 | 2 | lc0 v0.18.1 windows cuda | 20x256 | 2123 | 11248 |
35 | GTX 980M (10% underclock) | 2 | | 20x256 | 1855 | kb1-256x20-2000000 | --no-smart-pruning (value adjusted from 1093 to 1855 after rerun)
36 | GTX 750 Ti @ 1350/1350 MHz | 1 | lc0 v0.18.1 windows cuda | 20x256 | 1658 | 11258 | GPU load 98%, GPU temp 59°C, fan speed 39%; --no-smart-pruning reported as unknown flag; -nncache 300000; go nodes 725000
37 | AMD R9 Fury X (core + mem 20% overclock) | 4 | v0.21.2-rc3 (Manjaro, OpenCL 2.1 AMD-APP (2841.4)) | 20x256 | 1518 | 42512 | result at 130,000 nodes; peaks at 550,000 nodes at 1880 nps, then starts decreasing
38 | GTX 950 | 1 | v0.16.0 custom build for Debian (cuda 9.1.85) | 20x256 | 1501 | 10687 | --no-smart-pruning --minibatch-size=1024 --threads=2
39 | GTX 750 Ti @ stock | 2 | lc0 v0.18.1 windows cuda | 20x256 | 1314 | 11258 | GPU load 100%, GPU temp 50°C, fan speed 35%; --no-smart-pruning reported as unknown flag
40 | Nvidia Quadro K2200 | 2 | LC0 v20.1 rc1 (Windows) | 20x256 | 1304 | 32194 |
41 | AMD RX 480 (core + mem 20% overclock) | 4 | v0.21.2-rc3 (Manjaro, OpenCL 2.1 AMD-APP (2841.4)) | 20x256 | 1133 | 42512 | result at 130,000 nodes; peaks at 540,000 nodes at 1385 nps, then starts decreasing
42 | MX150 | 2 | LC0 v17 RC2 (Windows) | 20x256 | 805 | 11045 | --minibatch-size=256 --backend=cudnn
43 | Nvidia GeForce 840M | 4 | v0.21.2-rc3 (Manjaro, cuda 10.1.0, cudnn 7.5.1) | 20x256 | 433 | 42512 | tested using bumblebee + optirun; peaks at 570,000 nodes at 539 nps, then decreases and settles around 527-528 nps
44 | GTX 470 | 2 | lczero v0.10 OpenCL (Windows) | 20x256 | 130 | 10021 |
45 | AMD Ryzen 3 1200 @ stock | 4 | lczero v0.10 OpenBLAS (Windows) | 20x256 | 35 | 10021 |
46 | 4x TPU | | | 20x256 | ? | | same flags and procedure as row 2
47 | TITAN V | 3 | lc0 v0.20.1-rc2 | 20x256 | | 11248 |
48 | Intel Core i7-4900MQ 4x 2.80 GHz, 500 GB, 16 GB, Quadro K2100M | | LC0 v20.1 rc1 (Windows) | 20x256 | | 32805 | This was performed at the starting position; the whole-game average was 63k (see the latest paper and talkchess clarifications)
Notes
- Benchmark procedure (cudnn client): start lc0 with the --no-smart-pruning flag, run "go nodes 130000", and wait until it finishes. For faster GPUs, let it run for 1M or even 5M nodes.
- Alternatively, run "go infinite" from the start position, abort after depth 30, and report the NPS output.
- LCZero benchmark, nodes/sec, GPU & CPU: please add your own benchmark scores here, sorted by NPS if you can. If you don't know the engine type, GPU means OpenCL and CPU means OpenBLAS. Please also include the network ID.
- This sheet has been cleaned up for the last time; after this it will be closed.
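The procedure above amounts to driving lc0 over UCI and reading the nps field from its search info output. Below is a minimal sketch in Python, assuming a local ./lc0 binary; the weights filename (taken from a row above), the cudnn-fp16 backend, the other flags, and the node count are placeholders and should be chosen to match the row you want to compare against.

```python
#!/usr/bin/env python3
# Rough benchmark sketch: start lc0, run a fixed-node search from the start
# position, and print the last reported nps. Binary path, weights file, and
# flags are placeholders; pick them to match the relevant row in the table.
import subprocess

cmd = [
    "./lc0",
    "--weights=weights_run1_11248.pb.gz",  # example network from the table
    "--backend=cudnn-fp16",                # or cudnn / opencl / blas
    "--minibatch-size=512",
    "--nncache=2000000",
]

proc = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                        text=True, bufsize=1)

def send(command: str) -> None:
    proc.stdin.write(command + "\n")
    proc.stdin.flush()

send("uci")
for line in proc.stdout:          # wait for the UCI handshake to finish
    if line.strip() == "uciok":
        break

send("position startpos")
send("go nodes 130000")           # use 1M-5M nodes on fast GPUs

last_nps = None
for line in proc.stdout:
    fields = line.split()
    if "nps" in fields:           # UCI "info" lines carry "nps <value>"
        last_nps = int(fields[fields.index("nps") + 1])
    if line.startswith("bestmove"):  # search finished
        break

send("quit")
proc.wait()
print("nodes/sec:", last_nps)
```

Note that the reported NPS usually ramps up (and sometimes later drops) as the search progresses, which is why several rows above record the value only after a fixed time or node count; state in the remark how and when you measured it.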