SOFA Basics
Quick Start (1/2): Prepare, Install, and Run
Quick Start (2/2): Visualization
The visualizations of the heterogeneous performance data are stored in the ./sofalog/ directory; you could
X-axis = Unix timestamps (seconds); Y-axis = metrics in different units (log10 scale)
| Trace | Y-axis unit |
| CPU | CPU time (seconds) |
| NET | payload of each packet (bytes) |
| VMSTAT_CS / VMSTAT_BI / VMSTAT_BO | counts per second |
| STRACE | duration (seconds) |
| MPSTAT_USR | seconds per 10 ms |
| GPU kernel, CUDA_COPY_H2D (host-to-device), CUDA_COPY_D2H (device-to-host) | duration (seconds) |
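Because the traces in the table use different units, they can only share one Y-axis on a logarithmic scale. The sketch below illustrates the idea with hypothetical sample values; it is not SOFA's plotting code. A 1,448-byte packet and a 25-microsecond kernel differ by nearly eight orders of magnitude, yet their log10 values (about 3.16 and -4.60) sit comfortably on the same axis.

```python
import math
from typing import List, NamedTuple, Tuple

class Sample(NamedTuple):
    timestamp: float  # Unix time in seconds (X-axis)
    trace: str        # trace name, e.g. "NET" or "GPU Kernel"
    value: float      # metric in the trace's own unit (bytes, seconds, counts/s, ...)

def to_log10_points(samples: List[Sample]) -> List[Tuple[float, float]]:
    """Map each sample to (timestamp, log10(value)) so all traces share one Y-axis.

    Non-positive values have no logarithm, so this sketch skips them
    (an assumption; a real plotter might clamp them instead).
    """
    return [(s.timestamp, math.log10(s.value)) for s in samples if s.value > 0]

# Hypothetical samples: a 1,448-byte packet, a 25-microsecond GPU kernel, and a zero count.
samples = [
    Sample(1517293000.10, "NET", 1448.0),
    Sample(1517293000.12, "GPU Kernel", 25e-6),
    Sample(1517293000.13, "VMSTAT_CS", 0.0),  # skipped: log10(0) is undefined
]
for ts, y in to_log10_points(samples):
    print(f"{ts:.2f} {y:+.2f}")
```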
Visualization of Heterogeneous Traces in SOFA
[Figure: SOFA timeline showing GPU H2D memcpy, GPU D2H memcpy, GPU DNN backward propagation, GPU DNN forward propagation, CPU utilization, and network bandwidth traces]
SOFA & Deep Learning
Case Study: Storage
Commands:
sudo sysctl -w vm.drop_caches=3
sofa record "dd if=/dev/zero of=dummy.out bs=10M count=500"
Case Study: Storage (cont.)
Commands:
sudo sysctl -w vm.drop_caches=3
sofa record "dd if=/dev/zero of=dummy.out bs=10M count=500"
sofa report --with_gui
Case Study: Storage (cont.)
ls -lah /dev/mapper/
...
lrwxrwxrwx. 1 root root 7 Jan 30 14:16 cl-home -> ../dm-2
lrwxrwxrwx. 1 root root 7 Jan 30 14:16 cl-root -> ../dm-0
lrwxrwxrwx. 1 root root 7 Jan 30 14:16 cl-swap -> ../dm-1
Commands:
sofa record "dd if=/dev/zero of=dummy.out bs=1M count=1000"
sofa report --with_gui
diskstat monitoring at 10 Hz; unit: sectors read/written.
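The diskstat values count sectors. Assuming they come from Linux's /proc/diskstats, a sector there is always a 512-byte unit, regardless of the device's physical sector size. The sketch below (not SOFA's own parser) turns two consecutive samples taken 0.1 s apart (10 Hz) into a write rate, and computes how many sectors the dd command above should write in total: 1000 blocks of 1 MiB (GNU dd reads the M suffix as 1,048,576 bytes) equal 2,048,000 sectors. The device name dm-0 comes from the ls output above; the sample lines themselves are hypothetical.

```python
from typing import NamedTuple

SECTOR_BYTES = 512  # /proc/diskstats always counts 512-byte sectors

class DiskSample(NamedTuple):
    device: str
    sectors_read: int
    sectors_written: int

def parse_diskstats_line(line: str) -> DiskSample:
    """Parse one /proc/diskstats line: major, minor, device name, then I/O counters."""
    fields = line.split()
    # fields[5] = sectors read, fields[9] = sectors written
    return DiskSample(fields[2], int(fields[5]), int(fields[9]))

def write_rate_mib_s(before: DiskSample, after: DiskSample, interval_s: float = 0.1) -> float:
    """Write throughput between two samples in MiB/s (0.1 s interval = 10 Hz)."""
    delta_bytes = (after.sectors_written - before.sectors_written) * SECTOR_BYTES
    return delta_bytes / interval_s / (1024 * 1024)

def expected_sectors(bs_bytes: int, count: int) -> int:
    """Sectors that a dd run with the given block size and count should write."""
    return bs_bytes * count // SECTOR_BYTES

# Hypothetical consecutive samples for dm-0, taken 0.1 s apart.
t0 = parse_diskstats_line(" 253  0 dm-0 9127 0 524030 6418 52000 0 1000000 90000 0 40000 96418")
t1 = parse_diskstats_line(" 253  0 dm-0 9127 0 524030 6418 52800 0 1040960 91500 0 40100 97918")
print(f"{t1.device}: {write_rate_mib_s(t0, t1):.1f} MiB/s")
print(expected_sectors(1024 * 1024, 1000))  # dd bs=1M count=1000
```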
Case Study: Storage (cont.)
MPSTAT Profiling:
CPU Utilization (%):
core USR SYS IDL IOW IRQ
0 0 0 97 0 0
1 0 9 75 14 0
2 0 0 99 0 0
3 1 3 88 6 0
4 1 7 90 0 0
5 0 54 32 12 0
6 0 6 46 46 0
7 0 0 95 3 0
CPU Time (s):
core USR SYS IDL IOW IRQ
0 0.03 0.03 3.11 0.02 0.00
1 0.00 0.32 2.41 0.46 0.00
2 0.01 0.02 3.17 0.00 0.00
3 0.06 0.10 2.84 0.22 0.00
4 0.04 0.25 2.91 0.00 0.00
5 0.00 1.72 1.02 0.40 0.00
6 0.03 0.20 1.48 1.48 0.00
7 0.00 0.03 3.06 0.11 0.00
Active CPU Time (s): 5.510
Active CPU ratio (%): 22
Definition: Active CPU ratio = total non-idle time / (elapsed time * number of CPU cores)
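As a worked check of the definition, the sketch below recomputes the numbers from the CPU Time table above. It assumes that non-idle time is USR + SYS + IOW + IRQ, and it estimates the elapsed time as the average per-core sum of all five columns (mpstat reports further columns not shown here, so this is approximate). Because the table is rounded to 0.01 s, the sum comes out as 5.53 s instead of the reported 5.510 s, but the ratio still rounds to 22%.

```python
from typing import Dict, Tuple

Row = Tuple[float, float, float, float, float]  # (USR, SYS, IDL, IOW, IRQ)

# CPU Time (s) per core, copied from the MPSTAT table.
CPU_TIME: Dict[int, Row] = {
    0: (0.03, 0.03, 3.11, 0.02, 0.00),
    1: (0.00, 0.32, 2.41, 0.46, 0.00),
    2: (0.01, 0.02, 3.17, 0.00, 0.00),
    3: (0.06, 0.10, 2.84, 0.22, 0.00),
    4: (0.04, 0.25, 2.91, 0.00, 0.00),
    5: (0.00, 1.72, 1.02, 0.40, 0.00),
    6: (0.03, 0.20, 1.48, 1.48, 0.00),
    7: (0.00, 0.03, 3.06, 0.11, 0.00),
}

def active_cpu_time(table: Dict[int, Row]) -> float:
    """Total non-idle time: every column except IDL, summed over all cores."""
    return sum(usr + sys_ + iow + irq for usr, sys_, _idl, iow, irq in table.values())

def estimated_elapsed(table: Dict[int, Row]) -> float:
    """Approximate wall-clock time as the mean per-core total of all columns."""
    return sum(sum(row) for row in table.values()) / len(table)

def active_cpu_ratio(non_idle_s: float, elapsed_s: float, cores: int) -> float:
    """Active CPU ratio = total non-idle time / (elapsed time * CPU cores)."""
    return non_idle_s / (elapsed_s * cores)

active = active_cpu_time(CPU_TIME)
elapsed = estimated_elapsed(CPU_TIME)
ratio = active_cpu_ratio(active, elapsed, len(CPU_TIME))
print(f"active={active:.2f} s, elapsed~{elapsed:.2f} s, ratio={ratio:.0%}")
```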
Case Study: Storage (cont.)
Exercise 1
Exercise 2
Exercise 3
Case Study: CUDA Memory Copy
What causes the network traces (i.e., tcpdump traces) to appear?
Command:
sofa record ~/NVIDIA_CUDA-9.1_Samples/1_Utilities/bandwidthTest/bandwidthTest
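bandwidthTest measures memory copy bandwidth, including host-to-device and device-to-host transfers. Since the CUDA_COPY_H2D and CUDA_COPY_D2H traces report a duration in seconds, an effective bandwidth can be derived as bytes / duration. The sketch below does this for hypothetical 32 MiB copy records; the record layout and the numbers are assumptions for illustration, not SOFA's trace schema.

```python
from typing import List, NamedTuple

class CopyRecord(NamedTuple):
    kind: str          # "H2D" (host-to-device) or "D2H" (device-to-host)
    size_bytes: int    # bytes moved by the copy
    duration_s: float  # copy duration in seconds

def bandwidth_gb_s(rec: CopyRecord) -> float:
    """Effective bandwidth of a single copy in GB/s (1 GB = 1e9 bytes)."""
    return rec.size_bytes / rec.duration_s / 1e9

def mean_bandwidth(records: List[CopyRecord], kind: str) -> float:
    """Aggregate bandwidth for one direction: total bytes / total copy time."""
    picked = [r for r in records if r.kind == kind]
    total_bytes = sum(r.size_bytes for r in picked)
    total_time = sum(r.duration_s for r in picked)
    return total_bytes / total_time / 1e9

MiB = 1024 * 1024
# Hypothetical copy records.
records = [
    CopyRecord("H2D", 32 * MiB, 0.0028),
    CopyRecord("H2D", 32 * MiB, 0.0030),
    CopyRecord("D2H", 32 * MiB, 0.0026),
]
for rec in records:
    print(rec.kind, f"{bandwidth_gb_s(rec):.2f} GB/s")
for kind in ("H2D", "D2H"):
    print(kind, "mean", f"{mean_bandwidth(records, kind):.2f} GB/s")
```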
SOFA - Advanced Usage
SOFA Advanced Usage
usage: sofa [-h] [--logdir /path/to/logdir/] [--verbose] [--pid PID]
[--profile_all_cpus] [--enable_strace] [--enable_tcpdump]
[--enable_py_stacks]
[--perf_events "cycles,instructions,cache-misses"]
[--blkdev BLKTRACE_DEVICE] [--netstat_interface NETSTAT_INTERFACE]
[--nvprof_inside] [--skip_preprocess]
[--gpu_filters "keyword1:color1,keyword2:color2"]
[--cpu_filters "keyword1:color1,keyword2:color2"]
[--cluster_ip "192.168.0.1,192.168.0.2"] [--cpu_top_k N]
[--num_iterations N] [--num_swarms N] [--cpu_time_offset_ms N]
[--strace_min_time F] [--plot_ratio N] [--viz_port N]
[--enable_aisi] [--enable_encode_decode] [--aisi_via_strace]
[--display_swarms] [--enable_swarms] [--base_logdir BASE_LOGDIR]
[--match_logdir MATCH_LOGDIR] [--hsg_multifeatures]
[--enable_vmstat] [--network_filters "ip1,ip2,ip3"]
[--cuda_api_tracing] [--potato_server "ip:port"]
[--absolute_timestamp] [--profile_region begin_time,end_time]
[--spotlight_gpu] [--with_gui] [--nvsmi_time_zone 8]
<SOFA_COMMAND> [<PROFILED_COMMAND>]
SOFA Advanced Usage (cont.)
More performance metrics:
sofa record "dd if=/dev/zero of=dummy.out bs=10M count=500" --perf_events="cycles,instructions,cache-misses,branch-misses"
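The counters requested with --perf_events are most informative as ratios: instructions per cycle (IPC) = instructions / cycles, and misses are often normalized per thousand instructions (MPKI). The sketch below computes these from hypothetical totals; the numbers are made up, and how SOFA itself aggregates the counters is not covered here.

```python
from typing import Dict

def derived_metrics(counts: Dict[str, int]) -> Dict[str, float]:
    """Derive common ratios from raw perf event totals."""
    instructions = counts["instructions"]
    return {
        # instructions retired per CPU cycle
        "ipc": instructions / counts["cycles"],
        # misses per 1,000 instructions (MPKI)
        "cache_mpki": counts["cache-misses"] * 1000 / instructions,
        "branch_mpki": counts["branch-misses"] * 1000 / instructions,
    }

# Hypothetical totals for one run, keyed by the perf event names used above.
counts = {
    "cycles": 8_000_000_000,
    "instructions": 6_000_000_000,
    "cache-misses": 30_000_000,
    "branch-misses": 12_000_000,
}
for name, value in derived_metrics(counts).items():
    print(f"{name}: {value:.2f}")
```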
CUDA API tracing:
sofa record "~/samples/1_Utilities/bandwidthTest/bandwidthTest" --cuda_api_tracing
System call tracing (strace):
sofa record "~/samples/1_Utilities/bandwidthTest/bandwidthTest" --enable_strace
Network packet capture (tcpdump):
sofa record "sleep 5" --enable_tcpdump
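With --enable_tcpdump, each packet appears in the NET trace with its payload in bytes. To read those per-packet points as bandwidth over time, the payloads can be summed per time bin. The sketch below does this for hypothetical (timestamp, payload) pairs with an assumed 1-second bin; it illustrates the technique and is not how SOFA builds its network bandwidth view.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def bandwidth_per_bin(packets: Iterable[Tuple[float, int]], bin_s: float = 1.0) -> Dict[float, float]:
    """Sum packet payloads (bytes) into fixed time bins and return bytes/s per bin start.

    Bins that contain no packets are omitted from the result.
    """
    totals: Dict[float, int] = defaultdict(int)
    for timestamp, payload in packets:
        bin_start = (timestamp // bin_s) * bin_s
        totals[bin_start] += payload
    return {start: total / bin_s for start, total in sorted(totals.items())}

# Hypothetical packets: (timestamp in seconds, payload in bytes)
packets = [(100.2, 1448), (100.7, 1448), (101.1, 512), (103.9, 64)]
for start, rate in bandwidth_per_bin(packets).items():
    print(f"t={start:.0f}s: {rate:.0f} B/s")
```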
Background recording for a daemon or a multi-command bash script:
sofa record "sleep 20" --profile_all_cpus
Then execute the target command(s) while the recording is running.
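The same background pattern can be scripted. The sketch below starts the recorder as a background process, runs the target command in the foreground, and then waits for the recorder to exit. The helper name and its startup_delay_s parameter are hypothetical, and the short delay is an assumed grace period so the recorder's monitors are up before the target starts.

```python
import subprocess
import time
from typing import List

def record_in_background(recorder: List[str], target: List[str],
                         startup_delay_s: float = 1.0) -> int:
    """Run `recorder` in the background while `target` runs in the foreground.

    Returns the target's exit code. The recorder is always waited on, even if
    the target fails to start.
    """
    rec = subprocess.Popen(recorder)
    try:
        time.sleep(startup_delay_s)  # assumed grace period for the recorder to start
        result = subprocess.run(target)
    finally:
        # `sofa record "sleep 20"` stops on its own once its 20 s sleep ends.
        rec.wait()
    return result.returncode
```

For example, record_in_background(["sofa", "record", "sleep 20", "--profile_all_cpus"], ["bash", "./my_workload.sh"]), where my_workload.sh is a placeholder for the daemon or bash file being profiled. The target should finish inside the 20-second recording window.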
SOFA Advanced Usage (cont.)
Verbose mode shows more information, such as the progress of report generation, or prints detailed reports (e.g., total system call time):
sofa report --verbose
Automatically identify iterative swarms and then show a per-iteration performance summary:
sofa report --enable_aisi --num_iterations 20
Display the top-10 hotspot swarms, highlighted in different colors:
sofa report --verbose --display_swarms
Reduce the number of points shown on the visualization interfaces:
sofa report --plot_ratio 10
Relative (default) or absolute timestamps:
sofa report
sofa report --absolute_timestamp
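Relative timestamps can be understood as absolute Unix timestamps shifted so that the earliest event sits at zero. The sketch below shows that conversion under this assumption (it is not taken from SOFA's source), using hypothetical timestamps.

```python
from typing import List

def to_relative(timestamps: List[float]) -> List[float]:
    """Shift absolute Unix timestamps (s) so the earliest one becomes 0.0."""
    if not timestamps:
        return []
    t0 = min(timestamps)
    return [t - t0 for t in timestamps]

# Hypothetical absolute timestamps from one recording.
absolute = [1517293000.50, 1517293000.75, 1517293002.00]
print([round(t, 2) for t in to_relative(absolute)])
```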
SOFA Advanced Usage (cont.)
Apply filters to highlight traces of interest:
sofa report --cpu_filters="tensorflow:orange" --gpu_filters="fw:blue,bw:red,nccl:purple"
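Each filter is a comma-separated list of keyword:color pairs, as in the usage line. The sketch below parses such a string and picks a color for a trace name that contains a keyword. This is one plausible reading of how highlighting works: the matching rule (first case-sensitive substring match wins) is an assumption, and the kernel names are hypothetical.

```python
from typing import List, Optional, Tuple

def parse_filters(spec: str) -> List[Tuple[str, str]]:
    """Parse "keyword1:color1,keyword2:color2" into [(keyword, color), ...]."""
    pairs = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue  # tolerate stray commas
        keyword, _, color = item.partition(":")
        pairs.append((keyword, color))
    return pairs

def color_for(trace_name: str, filters: List[Tuple[str, str]]) -> Optional[str]:
    """Return the color of the first keyword found in the trace name, else None."""
    for keyword, color in filters:
        if keyword in trace_name:
            return color
    return None

gpu_filters = parse_filters("fw:blue,bw:red,nccl:purple")
for name in ["conv2d_fw_kernel", "ncclAllReduceKernel", "memcpy"]:
    print(name, color_for(name, gpu_filters))
```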
Compare the traces of two runs swarm by swarm to find the swarms affected by hardware, software, or system changes:
sofa record "dd if=/dev/zero of=dummy.out bs=100M count=10" --logdir log1
sofa record "dd if=/dev/zero of=dummy.out bs=10M count=100" --logdir log2
sofa diff --base_logdir log1 --match_logdir log2
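The two recordings above write exactly the same amount of data and differ only in how it is split into blocks. GNU dd reads the K, M, and G suffixes as powers of 1024, so the arithmetic check below (a plain sketch, not part of SOFA) shows that both runs write 1000 MiB, the second as ten times as many blocks of one tenth the size. Any difference that sofa diff reports therefore comes from the block size rather than the data volume. Decimal suffixes such as MB are deliberately not handled in this sketch.

```python
import re

# GNU dd binary suffixes: K = 1024, M = 1024**2, G = 1024**3
SUFFIX = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3}

def dd_bytes(bs: str, count: int) -> int:
    """Total bytes written by `dd bs=<bs> count=<count>` (e.g. bs="10M")."""
    match = re.fullmatch(r"(\d+)([KMG]?)", bs)
    if match is None:
        raise ValueError(f"unsupported block size: {bs!r}")
    number, suffix = match.groups()
    return int(number) * SUFFIX[suffix] * count

run1 = dd_bytes("100M", 10)   # recorded with --logdir log1
run2 = dd_bytes("10M", 100)   # recorded with --logdir log2
print(run1 == run2, run1 // 1024**2, "MiB each")
```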
Full blktrace example:
Commands:
sudo sofa record "sleep 10" --blkdev=/dev/sda1
OR
sudo sofa record "dd if=/dev/zero of=dummy.out bs=1K count=2000000" --blkdev /dev/sda1
sudo sofa report --blkdev=/dev/sda1 --with_gui
Appendix