system: lassen
cuda: 10.1.168

nodes: 1
job: 426694

message size (B)  GPU-GPU   GPU-CPU   CPU-GPU   CPU-CPU   GPU-GPU/CPU-CPU
0                 0.32      0.31      0.33      0.33      1.0
1                 18.63     30.25     29.85     0.37      50.4
2                 18.65     30.26     29.84     0.37      50.4
4                 18.65     30.24     29.82     0.37      50.4
8                 18.66     30.2      29.87     0.38      49.1
16                18.62     30.24     29.92     0.38      49.0
32                18.65     30.19     29.86     0.4       46.6
64                18.64     30.01     29.69     0.4       46.6
128               18.7      29.87     29.5      0.41      45.6
256               18.72     29.93     29.58     0.45      41.6
512               18.67     29.92     29.6      0.48      38.9
1024              18.69     30.04     29.72     0.54      34.6
2048              18.71     30.26     29.94     0.67      27.9
4096              18.64     41.91     41.45     0.79      23.6
8192              18.72     54.19     53.46     1.01      18.5
16384             18.78     35.45     35.19     3.49      5.4
32768             18.63     36.66     36.34     3.95      4.7
65536             18.97     40.96     40.17     4.86      3.9
131072            20.33     45.09     44.88     7.29      2.8
262144            21.89     53.03     52.85     12.48     1.8
524288            25.72     68.75     68.44     20.46     1.3
1048576           33.31     98.84     98.53     37.52     0.9
2097152           48.31     150.94    163.42    71.85     0.7
4194304           79.04     326.54    323.19    250.9     0.3

nodes: 2
job: 426729

message size (B)  GPU-GPU   GPU-CPU   CPU-GPU   CPU-CPU   GPU-GPU/CPU-CPU
0                 1.24      1.23      1.23      1.22      1.0
1                 6.23      15.51     15.14     1.26      4.9
2                 6.25      15.44     15.02     1.26      5.0
4                 6.27      15.48     15.02     1.26      5.0
8                 6.29      15.78     15.14     1.27      5.0
16                6.21      15.7      15.07     1.3       4.8
32                6.21      15.73     15.04     1.3       4.8
64                6.21      15.81     15.19     1.31      4.7
128               6.22      15.82     15.13     1.44      4.3
256               6.29      16.28     15.41     1.85      3.4
512               6.34      16.38     15.52     1.95      3.3
1024              6.43      16.44     15.65     2.19      2.9
2048              6.66      16.78     16.08     3         2.2
4096              7.08      17.34     16.71     3.66      1.9
8192              7.62      18.16     17.99     5.1       1.5
16384             8.6       31.12     30.97     7.32      1.2
32768             10.32     45.5      45.16     9.54      1.1
65536             13.7      72.42     71.5      12.46     1.1
131072            18.18     17.9      17.98     17.91     1.0
262144            27.33     26.89     27.54     26.55     1.0
524288            46.92     46.3      46.26     46.01     1.0
1048576           83.52     83.54     83.29     83.03     1.0
2097152           159.15    158.74    158.73    158.36    1.0
4194304           401.65    354.74    354.89    309.25    1.3

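
The GPU-GPU/CPU-CPU column is the GPU-GPU latency divided by the CPU-CPU latency at the same message size. A minimal Python sketch that recomputes it for a few rows of the tables above; the function name is hypothetical, and rounding to one decimal place is an assumption based on how the column is displayed:

```python
def latency_ratio(gpu_gpu: float, cpu_cpu: float) -> float:
    """Return GPU-GPU latency divided by CPU-CPU latency.

    Rounding to one decimal is an assumption that mirrors the
    precision shown in the spreadsheet column.
    """
    return round(gpu_gpu / cpu_cpu, 1)


# (nodes, message size in bytes, GPU-GPU latency, CPU-CPU latency),
# values copied from the tables above.
rows = [
    (1, 1, 18.63, 0.37),
    (2, 1, 6.23, 1.26),
    (1, 4194304, 79.04, 250.9),
    (2, 4194304, 401.65, 309.25),
]

for nodes, size, gpu_gpu, cpu_cpu in rows:
    ratio = latency_ratio(gpu_gpu, cpu_cpu)
    print(f"nodes={nodes} size={size} GPU-GPU/CPU-CPU={ratio}")
```

A ratio above 1 means the GPU-GPU transfer is slower than the CPU-CPU transfer at that message size; below 1 means it is faster.
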
The IBM paper 'AC922 Data Movement for CORAL' reports 8us for DD_1 small messages. I can't reproduce this result on lassen; see the tests below with various resource set configurations. All runs below were tested with both '-latency gpu-gpu' and '-latency gpu-cpu'. The two differed by less than a few percent, so only the '-latency gpu-gpu' results are reported below.
grep -A 23 ^0 <output_file> | awk '{print $2}'
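
The command above keeps the first line that starts with 0 plus the 23 lines after it (24 message sizes, 0 B through 4194304 B) and prints the second whitespace-separated column. A rough Python equivalent is sketched below. The function name is hypothetical, and the sample input layout (size in column 1, latency in column 2) is an assumption inferred from the command, not a confirmed description of the benchmark output:

```python
def extract_second_column(lines, rows_after=23):
    """Mimic: grep -A 23 ^0 <output_file> | awk '{print $2}'.

    Every line starting with "0" opens (or re-opens) a window covering
    that line and the next `rows_after` lines; the second field of each
    line in the window is collected (empty string if there is none).
    Unlike grep, no "--" separator is emitted between separate windows.
    """
    values = []
    remaining = 0  # lines still to emit in the current window
    for line in lines:
        if line.startswith("0"):
            remaining = rows_after + 1
        if remaining > 0:
            fields = line.split()
            values.append(fields[1] if len(fields) > 1 else "")
            remaining -= 1
    return values


# Assumed output layout; the numbers are the first rsB rows reported below.
sample = [
    "# size latency",
    "0 0.31",
    "1 18.57",
    "2 18.51",
    "trailing line that falls outside the window",
]
print(extract_second_column(sample, rows_after=2))
```

For a real output file, passing the open file object works the same way, for example `extract_second_column(open(path))` with the default `rows_after=23`.
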

rs    cores  gpus  task-per-rs  num-rs  notes
rsA   10     1     1            2       one task per half socket, on one socket
rsB   1      1     1            2       naive
rsC   1      2     1            2       one task and gpu per socket
rsD   40     4     2            1       all resources, let jsrun figure it out
rsE   2      1     2            1       two tasks using the same gpu

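
As an illustration of how the resource sets above might translate into launch commands, the sketch below builds one jsrun command line per row. Several things here are assumptions and not taken from the notes: that cores, gpus, task-per-rs and num-rs correspond to the jsrun options -c, -g, -a and -n respectively, and the executable name `./latency_benchmark`, which is a placeholder. The actual commands used for these runs may have carried additional options.

```python
# Resource set definitions copied from the table above.
RESOURCE_SETS = {
    "rsA": {"cores": 10, "gpus": 1, "tasks_per_rs": 1, "num_rs": 2},
    "rsB": {"cores": 1, "gpus": 1, "tasks_per_rs": 1, "num_rs": 2},
    "rsC": {"cores": 1, "gpus": 2, "tasks_per_rs": 1, "num_rs": 2},
    "rsD": {"cores": 40, "gpus": 4, "tasks_per_rs": 2, "num_rs": 1},
    "rsE": {"cores": 2, "gpus": 1, "tasks_per_rs": 2, "num_rs": 1},
}


def jsrun_command(rs_name, executable="./latency_benchmark",
                  args=("-latency", "gpu-gpu")):
    """Build a jsrun command string for one resource set.

    Assumed mapping: -n = number of resource sets, -a = tasks per
    resource set, -c = CPU cores per resource set, -g = GPUs per
    resource set. The executable name is a placeholder.
    """
    cfg = RESOURCE_SETS[rs_name]
    parts = [
        "jsrun",
        "-n", str(cfg["num_rs"]),
        "-a", str(cfg["tasks_per_rs"]),
        "-c", str(cfg["cores"]),
        "-g", str(cfg["gpus"]),
        executable,
        *args,
    ]
    return " ".join(parts)


for name in RESOURCE_SETS:
    print(f"{name}: {jsrun_command(name)}")
```

Keeping the table in one dictionary makes it easy to loop over all five configurations from a single batch script and to keep the spreadsheet and the launch commands in sync.
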
nodes: 1

rs                rsA       rsB       rsC       rsD       rsE
job               429115    429033    429110    429161    429193
message size (B)  GPU-GPU   GPU-GPU   GPU-GPU   GPU-GPU   GPU-GPU
0                 0.46      0.31      0.87      0.31      0.3
1                 17.55     18.57     19.13     18.27     95.56
2                 17.48     18.51     19.34     18.36     93.95
4                 17.51     18.49     19.46     18.38     93.91
8                 17.45     18.52     19.43     18.29     93.88
16                17.5      18.43     19.48     18.29     93.84
32                17.48     18.45     19.45     18.29     93.81
64                17.48     18.48     19.44     18.3      93.79
128               17.48     18.44     19.46     18.28     93.82
256               17.51     18.44     19.39     18.36     93.89
512               17.49     18.45     19.38     18.36     93.85
1024              17.47     18.47     19.45     18.25     93.86
2048              17.53     18.48     19.4      18.37     93.88
4096              17.47     18.42     19.45     18.26     93.89
8192              17.46     18.51     19.52     18.32     94.02
16384             17.45     18.42     19.45     18.25     93.95
32768             17.55     18.41     19.72     18.26     93.94
65536             17.82     18.68     20.41     18.4      94.12
131072            19.04     19.42     21.96     19.5      94.13
262144            21.05     21.58     25.19     21.46     94.25
524288            24.93     25.14     31.63     25.27     95.26
1048576           32.56     32.79     44.9      33.04     96.77
2097152           47.7      47.85     70.6      47.85     101.33
4194304           78.45     78.55     121.69    78.53     107.85

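
To put the gap to the IBM figure in numbers, the sketch below divides each resource set's 1-byte GPU-GPU latency from the table above by the 8us reported in the paper. It assumes the table values are in microseconds, which the notes imply but do not state; the names are hypothetical.

```python
IBM_REPORTED_US = 8.0  # small-message DD_1 latency quoted from the IBM paper

# 1-byte GPU-GPU latency per resource set, copied from the table above.
# Assumption: these values are in microseconds.
MEASURED_1B = {
    "rsA": 17.55,
    "rsB": 18.57,
    "rsC": 19.13,
    "rsD": 18.27,
    "rsE": 95.56,
}


def slowdown_vs_ibm(latency_us: float) -> float:
    """Return measured latency as a multiple of the IBM-reported value."""
    return round(latency_us / IBM_REPORTED_US, 1)


for rs, latency in MEASURED_1B.items():
    print(f"{rs}: {latency} vs {IBM_REPORTED_US} -> {slowdown_vs_ibm(latency)}x")
```

Under that assumption, rsA through rsD land at roughly 2.2x to 2.4x the reported latency, and rsE (two tasks sharing one GPU) at roughly 12x.
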
nv_peer_mem version
smith516@lassen709: ~/develop/ac922NetworkPerformance (master)$ modinfo nv_peer_mem
filename:       /lib/modules/4.14.0-49.18.1.bl6.ppc64le/extra/nv_peer_mem.ko
version:        1.0-4
license:        Dual BSD/GPL
description:    NVIDIA GPU memory plug-in
author:         Yishai Hadas
rhelversion:    7.5
srcversion:     5608581351378487D59AF1A
depends:        nvidia,ib_core
name:           nv_peer_mem
vermagic:       4.14.0-49.18.1.bl6.ppc64le SMP mod_unload modversions