Multi-GPU measurements

	A	B	C	D	E	F	G	H	I	J	K	L	M	N	O	P	Q	R	S
1	Data from official TF benchmark
2	https://www.tensorflow.org/performance/benchmarks
3
4	Data format: NCHW; NCCL: false; Variable update: parameter server; PS: CPU; Dataset shape: imagenet
5
6
7		batch size	64	64	64	64	64	64	64	64	32	32	32	32	32	32	32	32
8		dataset	synth	synth	synth	synth	real	real	real	real	synth	synth	synth	synth	real	real	real	real
9		model	inception3	inception3	resnet50	resnet50	inception3	inception3	resnet50	resnet50	inception3	inception3	resnet50	resnet50	inception3	inception3	resnet50	resnet50
10		machine	dgx-1	gce	dgx-1	gce	dgx-1	gce	dgx-1	gce	dgx-1	gce	dgx-1	gce	dgx-1	gce	dgx-1	gce
11		gpu type	P100	K80	P100	K80	P100	K80	P100	K80	P100	K80	P100	K80	P100	K80	P100	K80	median:
12	gpus:
13	images/s	1	142	30	219	52	142	30.6	218	51.2	128	29.3	195	49.5	130	29.5	193	49.3
14		2	284	58	422	99	278	58.4	425	98.8	259	55	368	95.4	257	55.4	369	95.3
15		4	596	116	852	195	551	115	853	194	520	109	768	183	507	110	760	186
16		8	1131	227	1734	387	1079	225	1630	381	995	216	1485	362	966	216	1410	359
17	speedup	1	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x
18		2	2.00x	1.93x	1.93x	1.90x	1.96x	1.91x	1.95x	1.93x	2.02x	1.88x	1.89x	1.93x	1.98x	1.88x	1.91x	1.93x	1.93x
19		4	4.20x	3.87x	3.89x	3.75x	3.88x	3.76x	3.91x	3.79x	4.06x	3.72x	3.94x	3.70x	3.90x	3.73x	3.94x	3.77x	3.87x
20		8	7.96x	7.57x	7.92x	7.44x	7.60x	7.35x	7.48x	7.44x	7.77x	7.37x	7.62x	7.31x	7.43x	7.32x	7.31x	7.28x	7.44x
21	efficiency	1	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%	100.00%
22		2	100.00%	96.67%	96.35%	95.19%	97.89%	95.42%	97.48%	96.48%	101.17%	93.86%	94.36%	96.36%	98.85%	93.90%	95.60%	96.65%	96.42%
23		4	104.93%	96.67%	97.26%	93.75%	97.01%	93.95%	97.82%	94.73%	101.56%	93.00%	98.46%	92.42%	97.50%	93.22%	98.45%	94.32%	96.84%
24		8	99.56%	94.58%	98.97%	93.03%	94.98%	91.91%	93.46%	93.02%	97.17%	92.15%	95.19%	91.41%	92.88%	91.53%	91.32%	91.02%	93.02%
25	speedup from using synth vs. real dataset	1	1.00x	0.98x	1.00x	1.02x					0.98x	0.99x	1.01x	1.00x					1.00x
26		2	1.02x	0.99x	0.99x	1.00x					1.01x	0.99x	1.00x	1.00x					1.00x
27		4	1.08x	1.01x	1.00x	1.01x					1.03x	0.99x	1.01x	0.98x					1.01x
28		8	1.05x	1.01x	1.06x	1.02x					1.03x	1.00x	1.05x	1.01x					1.02x
29	ratio of scaling speedup, batch size 64 vs. 32	1	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x	1.00x									1.00x
30		2	0.99x	1.03x	1.02x	0.99x	0.99x	1.02x	1.02x	1.00x									1.01x
31		4	1.03x	1.04x	0.99x	1.01x	0.99x	1.01x	0.99x	1.00x									1.01x
32		8	1.02x	1.03x	1.04x	1.02x	1.02x	1.00x	1.02x	1.02x									1.02x
33	speedup from GPU P100 vs. K80	1	4.73x		4.21x		4.64x		4.26x		4.37x		3.94x		4.41x		3.91x		4.40x
34		2	4.90x		4.26x		4.76x		4.30x		4.71x		3.86x		4.64x		3.87x
35		4	5.14x		4.37x		4.79x		4.40x		4.77x		4.20x		4.61x		4.09x
36		8	4.98x		4.48x		4.80x		4.28x		4.61x		4.10x		4.47x		3.93x
37	speedup of resnet50 vs. inception3	1			1.54x	1.73x			1.54x	1.67x			1.52x	1.69x			1.48x	1.67x	1.61x
38		2			1.49x	1.71x			1.53x	1.69x			1.42x	1.73x			1.44x	1.72x
39		4			1.43x	1.68x			1.55x	1.69x			1.48x	1.68x			1.50x	1.69x
40		8			1.53x	1.70x			1.51x	1.69x			1.49x	1.68x			1.46x	1.66x
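The speedup and efficiency rows can be reproduced from the images/s rows. The following is a minimal sketch, assuming speedup is throughput divided by the 1-GPU throughput and efficiency is speedup divided by the number of GPUs. The function name `scaling` and the dictionary layout are illustrative and not part of the benchmark.

```python
# Throughput (images/s) for one column of the table:
# inception3, dgx-1 (P100), synthetic data, batch size 64.
throughput = {1: 142, 2: 284, 4: 596, 8: 1131}


def scaling(throughput):
    """Return {gpus: (speedup, efficiency)} relative to the 1-GPU run."""
    base = throughput[1]
    result = {}
    for gpus, images_per_s in sorted(throughput.items()):
        speedup = images_per_s / base   # how many times faster than 1 GPU
        efficiency = speedup / gpus     # fraction of ideal linear scaling
        result[gpus] = (speedup, efficiency)
    return result


for gpus, (speedup, efficiency) in scaling(throughput).items():
    print(f"{gpus} GPUs: {speedup:.2f}x speedup, {efficiency:.2%} efficiency")
```

An efficiency above 100% (as for 4 GPUs in this column) corresponds to the superlinear speedup mentioned in the observations.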
41
42
43
44	Observations:
45	- scaling is very good: efficiency drops slightly with more GPUs but stays above 90% (median 93% on 8 GPUs)
	- occasional superlinear speedup (e.g. 4.20x on 4 GPUs) is probably due to noise in the 1-GPU measurement
	- real vs. synthetic dataset makes no significant difference (median 1.00x to 1.02x), so the synthetic dataset can be used to estimate performance on the real dataset
	- batch size 64 vs. 32 makes no significant difference to scaling (median 1.00x to 1.02x)
	- this kind of training is 4.4x faster (median) on P100 than on K80
	- training resnet50 is 1.61x faster (median) than inception3 in this benchmark
	- both architectures have roughly 24-26 million parameters on this dataset
	- baseline performance on 1x Tesla K80 is about 30 images/s on InceptionV3 and about 50 images/s on ResNet50
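The median P100 vs. K80 figure can be checked against the images/s rows. This is a rough sketch, assuming the median is taken over all 32 per-column ratios (8 configurations times 4 GPU counts); the variable names are illustrative only.

```python
from statistics import median

# images/s keyed by GPU count; each list follows the table's column order
# (batch 64 synth, batch 64 real, batch 32 synth, batch 32 real;
#  inception3 then resnet50 within each pair).
p100 = {
    1: [142, 219, 142, 218, 128, 195, 130, 193],
    2: [284, 422, 278, 425, 259, 368, 257, 369],
    4: [596, 852, 551, 853, 520, 768, 507, 760],
    8: [1131, 1734, 1079, 1630, 995, 1485, 966, 1410],
}
k80 = {
    1: [30, 52, 30.6, 51.2, 29.3, 49.5, 29.5, 49.3],
    2: [58, 99, 58.4, 98.8, 55, 95.4, 55.4, 95.3],
    4: [116, 195, 115, 194, 109, 183, 110, 186],
    8: [227, 387, 225, 381, 216, 362, 216, 359],
}

# One ratio per (GPU count, configuration) pair.
ratios = [p / k for gpus in p100 for p, k in zip(p100[gpus], k80[gpus])]
p100_vs_k80 = median(ratios)
print(f"median P100 vs. K80 speedup: {p100_vs_k80:.2f}x")
```

The same pattern (pair up matching columns, divide, take the median) would apply to the resnet50 vs. inception3 comparison.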