Lassen Latency

	A	B	C	D	E	F	H	I	J	K	L	M
1	system	lassen					system	lassen
2	cuda	10.1.168					cuda	10.1.168
3	job	426694					job	426729
4	nodes	1					nodes	2
5	message size (B)	GPU-GPU	GPU-CPU	CPU-GPU	CPU-CPU	GPU-GPU/CPU-CPU	message size (B)	GPU-GPU	GPU-CPU	CPU-GPU	CPU-CPU	GPU-GPU/CPU-CPU
6	0	0.32	0.31	0.33	0.33	1.0	0	1.24	1.23	1.23	1.22	1.0
7	1	18.63	30.25	29.85	0.37	50.4	1	6.23	15.51	15.14	1.26	4.9
8	2	18.65	30.26	29.84	0.37	50.4	2	6.25	15.44	15.02	1.26	5.0
9	4	18.65	30.24	29.82	0.37	50.4	4	6.27	15.48	15.02	1.26	5.0
10	8	18.66	30.2	29.87	0.38	49.1	8	6.29	15.78	15.14	1.27	5.0
11	16	18.62	30.24	29.92	0.38	49.0	16	6.21	15.7	15.07	1.3	4.8
12	32	18.65	30.19	29.86	0.4	46.6	32	6.21	15.73	15.04	1.3	4.8
13	64	18.64	30.01	29.69	0.4	46.6	64	6.21	15.81	15.19	1.31	4.7
14	128	18.7	29.87	29.5	0.41	45.6	128	6.22	15.82	15.13	1.44	4.3
15	256	18.72	29.93	29.58	0.45	41.6	256	6.29	16.28	15.41	1.85	3.4
16	512	18.67	29.92	29.6	0.48	38.9	512	6.34	16.38	15.52	1.95	3.3
17	1024	18.69	30.04	29.72	0.54	34.6	1024	6.43	16.44	15.65	2.19	2.9
18	2048	18.71	30.26	29.94	0.67	27.9	2048	6.66	16.78	16.08	3	2.2
19	4096	18.64	41.91	41.45	0.79	23.6	4096	7.08	17.34	16.71	3.66	1.9
20	8192	18.72	54.19	53.46	1.01	18.5	8192	7.62	18.16	17.99	5.1	1.5
21	16384	18.78	35.45	35.19	3.49	5.4	16384	8.6	31.12	30.97	7.32	1.2
22	32768	18.63	36.66	36.34	3.95	4.7	32768	10.32	45.5	45.16	9.54	1.1
23	65536	18.97	40.96	40.17	4.86	3.9	65536	13.7	72.42	71.5	12.46	1.1
24	131072	20.33	45.09	44.88	7.29	2.8	131072	18.18	17.9	17.98	17.91	1.0
25	262144	21.89	53.03	52.85	12.48	1.8	262144	27.33	26.89	27.54	26.55	1.0
26	524288	25.72	68.75	68.44	20.46	1.3	524288	46.92	46.3	46.26	46.01	1.0
27	1048576	33.31	98.84	98.53	37.52	0.9	1048576	83.52	83.54	83.29	83.03	1.0
28	2097152	48.31	150.94	163.42	71.85	0.7	2097152	159.15	158.74	158.73	158.36	1.0
29	4194304	79.04	326.54	323.19	250.9	0.3	4194304	401.65	354.74	354.89	309.25	1.3
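The GPU-GPU/CPU-CPU column appears to be the GPU-GPU latency divided by the CPU-CPU latency for the same message size, rounded to one decimal. A minimal sketch that recomputes a few entries from the two tables above; the function name `latency_ratio` is made up for illustration and is not part of the benchmark:

```python
def latency_ratio(gpu_gpu, cpu_cpu):
    """Return GPU-GPU latency divided by CPU-CPU latency, rounded to one decimal."""
    return round(gpu_gpu / cpu_cpu, 1)


# (message size in bytes, GPU-GPU, CPU-CPU) copied from the one-node table
one_node = [(1, 18.63, 0.37), (16384, 18.78, 3.49), (4194304, 79.04, 250.9)]
# same columns copied from the two-node table
two_node = [(1, 6.23, 1.26), (4194304, 401.65, 309.25)]

for label, rows in (("1 node", one_node), ("2 nodes", two_node)):
    for size, gg, cc in rows:
        print(f"{label}: {size} B -> {latency_ratio(gg, cc)}")
```

A ratio above 1 means the GPU-GPU path is slower than CPU-CPU at that size. On one node the ratio only drops below 1 at 1048576 B and larger.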
30
31
32
33
34
35	The IBM paper 'AC922 Data Movement for CORAL' reports 8 us for DD_1 small messages. I can't reproduce this result on lassen; see the tests below with various resource set configurations. All runs below were made with both '-latency gpu-gpu' and '-latency gpu-cpu'; the two differed by less than a few percent, so only the '-latency gpu-gpu' results are reported.
36
37
38
39	grep -A 23 ^0 <output_file> | awk '{print $2}'
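The extraction command above keeps the line starting with `0` plus the 23 lines after it (24 message sizes in total) and prints the second whitespace-separated field. A rough Python equivalent, as a sketch only: the function name `extract_column` and the sample text are made up for illustration, and the real benchmark output format is assumed to be "size latency" per line.

```python
def extract_column(text, context=23, field=2):
    """Roughly mirror `grep -A <context> ^0 | awk '{print $<field>}'`."""
    values = []
    remaining = 0  # lines still to emit after the most recent match
    for line in text.splitlines():
        if line.startswith("0"):
            remaining = context + 1  # the matching line plus its trailing context
        if remaining > 0:
            parts = line.split()  # awk's default whitespace splitting
            values.append(parts[field - 1] if len(parts) >= field else "")
            remaining -= 1
    return values


# Made-up sample in the assumed "size latency" layout
sample = "# size latency\n0 0.31\n1 18.57\n2 18.51\n"
print(extract_column(sample, context=2))
```

Unlike grep, this sketch does not emit `--` separators between non-adjacent match groups, which only matters if the output file contains more than one line starting with `0`.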
40		cores	gpus	task-per-rs	num-rs	notes
41	rsA	10	1	1	2	one task per half socket, on one socket
42	rsB	1	1	1	2	naive
43	rsC	1	2	1	2	one task and gpu per socket
44	rsD	40	4	2	1	all resources, let jsrun figure it out
45	rsE	2	1	2	1	two tasks using the same gpu
46	nodes	1
47	rs	rsA	rsB	rsC	rsD	rsE
48	job	429115	429033	429110	429161	429193
49	message size (B)	GPU-GPU	GPU-GPU	GPU-GPU	GPU-GPU	GPU-GPU
50	0	0.46	0.31	0.87	0.31	0.3
51	1	17.55	18.57	19.13	18.27	95.56
52	2	17.48	18.51	19.34	18.36	93.95
53	4	17.51	18.49	19.46	18.38	93.91
54	8	17.45	18.52	19.43	18.29	93.88
55	16	17.5	18.43	19.48	18.29	93.84
56	32	17.48	18.45	19.45	18.29	93.81
57	64	17.48	18.48	19.44	18.3	93.79
58	128	17.48	18.44	19.46	18.28	93.82
59	256	17.51	18.44	19.39	18.36	93.89
60	512	17.49	18.45	19.38	18.36	93.85
61	1024	17.47	18.47	19.45	18.25	93.86
62	2048	17.53	18.48	19.4	18.37	93.88
63	4096	17.47	18.42	19.45	18.26	93.89
64	8192	17.46	18.51	19.52	18.32	94.02
65	16384	17.45	18.42	19.45	18.25	93.95
66	32768	17.55	18.41	19.72	18.26	93.94
67	65536	17.82	18.68	20.41	18.4	94.12
68	131072	19.04	19.42	21.96	19.5	94.13
69	262144	21.05	21.58	25.19	21.46	94.25
70	524288	24.93	25.14	31.63	25.27	95.26
71	1048576	32.56	32.79	44.9	33.04	96.77
72	2097152	47.7	47.85	70.6	47.85	101.33
73	4194304	78.45	78.55	121.69	78.53	107.85
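To put the gap to the IBM figure in numbers, the sketch below divides each configuration's 1-byte latency from the table above by the 8 us reported in the paper. It assumes the table values are in microseconds, which the sheet does not state explicitly; the names `one_byte` and `slowdown` are made up for illustration.

```python
IBM_REPORTED_US = 8.0  # small-message DD_1 figure quoted from the IBM paper

# 1-byte GPU-GPU latency per resource set, copied from the table above
# (assumed to be microseconds)
one_byte = {"rsA": 17.55, "rsB": 18.57, "rsC": 19.13, "rsD": 18.27, "rsE": 95.56}


def slowdown(measured, reference=IBM_REPORTED_US):
    """Return measured latency as a multiple of the reference, to two decimals."""
    return round(measured / reference, 2)


best = min(one_byte, key=one_byte.get)  # configuration with the lowest latency
for name, latency in one_byte.items():
    print(f"{name}: {latency} -> {slowdown(latency)}x the reported value")
print(f"best configuration: {best}")
```

Under that assumption, even the best configuration (rsA) is a little over twice the reported value, and rsE (two tasks sharing one GPU) is far worse than the others.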
74
75
76	nv_peer_mem version
77	smith516@lassen709: ~/develop/ac922NetworkPerformance (master)$ modinfo nv_peer_mem
78	filename: /lib/modules/4.14.0-49.18.1.bl6.ppc64le/extra/nv_peer_mem.ko
79	version: 1.0-4
80	license: Dual BSD/GPL
81	description: NVIDIA GPU memory plug-in
82	author: Yishai Hadas
83	rhelversion: 7.5
84	srcversion: 5608581351378487D59AF1A
85	depends: nvidia,ib_core
86	name: nv_peer_mem
87	vermagic: 4.14.0-49.18.1.bl6.ppc64le SMP mod_unload modversions