Vernard Martin, Solutions Architect and Infrastructure Team Lead
Scientific Computing & Bioinformatics Services
Office of Advanced Molecular Detection
Centers for Disease Control & Prevention
Data Movement Hardware & Software
This document is a result of work by volunteer LCI instructors and is licensed under CC BY-NC-ND 4.0 (https://creativecommons.org/licenses/by-nc-nd/4.0/).
Data Transfer Node
Motherboard and Chassis selection
PCI Slot Considerations
PCI Slot Considerations
PCI Bus Considerations
Storage Architectures - Internal
Storage Architectures - External
DTNs in a Facility
[Diagram: facility network with a 40G/100G border connection feeding downstream 10G links]
DTNs in a Facility – Network Paths
[Diagram: the same facility network, highlighting the network paths from the 40G/100G border to the downstream 10G links]
DTNs in a Facility – Security
[Diagram: the same facility network, highlighting where security policy is applied between the 40G/100G border and the downstream 10G links]
Storage Subsystem Selection
SSDs and HDs
SSD Form Factors
RAID Controllers
RAID Controller
Network Subsystem Selection
Reference Implementation (2023)
Reference Implementation (2020)
Reference Implementation (2020)
Preliminary System Performance Results for this configuration
Initial system performance was evaluated using two identical units and a topology that consisted of:
The testing was performed on Ubuntu 18.04 using XFS on mdraid with O_DIRECT and standard POSIX I/O. File sizes varied between 256 GB and 1 TB, with consistent results across file sizes, and consistent results when writing pseudo-random data versus all zeroes.
Testing revealed it was possible to maintain a sustained total of 80 Gbps across both connections of the NIC. Future testing will focus on the use of 100 Gbps cables and optics.
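The storage side of a test like this can be approximated with fio; a minimal sketch, assuming the mdraid array is mounted at /data (the path, size, and queue depth here are illustrative, not the original test parameters):
# O_DIRECT sequential write to XFS on mdraid, roughly matching the test above
# (/data/testfile is an assumed path, not from the original testing)
fio --name=seq-write --filename=/data/testfile \
    --rw=write --bs=1M --size=256G \
    --ioengine=libaio --direct=1 --iodepth=32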
Reference Implementation (2020)
Disk configuration effects on performance
The test systems employed several disk configurations that were somewhat more complicated:
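For illustration (these are not the exact tested layouts, and the device names are assumptions), a striped mdraid array with XFS on top can be built like this:
# 4-device RAID-0 stripe for streaming throughput (no redundancy)
mdadm --create /dev/md0 --level=0 --raid-devices=4 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1
mkfs.xfs /dev/md0
mount -o noatime /dev/md0 /data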
Reference Implementation (2020)
CPU & Memory effects on performance
Testing has revealed that not all 12 cores (on a single chip) are regularly active unless the degree of parallelism increases; a busy DTN will push this requirement higher. Populating all memory channels in the RAM configuration also ensures peak performance, as the system can fully utilize every channel and balance load as needed.
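A sketch for checking core and memory-channel layout, and for pinning a transfer tool to one NUMA node (the node number and tool name are placeholders):
# inspect core counts and NUMA topology
lscpu
numactl --hardware
# run a transfer tool pinned to node 0's cores and memory
numactl --cpunodebind=0 --membind=0 <transfer-tool>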
Take Home Points
DTN Tuning is Art & Science
DTN Tuning
http://fasterdata.es.net/science-dmz/DTN/tuning
DTN Tuning
Network Tuning
# add to /etc/sysctl.conf
net.core.rmem_max = 67108864
net.core.wmem_max = 67108864
net.ipv4.tcp_rmem = 4096 87380 33554432
net.ipv4.tcp_wmem = 4096 65536 33554432
net.core.netdev_max_backlog = 250000
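Once added, the settings can be applied without a reboot:
# re-read /etc/sysctl.conf and apply the new values
/sbin/sysctl -p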
Add to /etc/rc.local to increase send queue depth
# increase txqueuelen
/sbin/ifconfig eth2 txqueuelen 10000
/sbin/ifconfig eth3 txqueuelen 10000
This info (and more) is available on fasterdata, formatted for cut and paste
# make sure cubic and htcp are loaded
/sbin/modprobe tcp_htcp
/sbin/modprobe tcp_cubic
# set default to CC alg to htcp
net.ipv4.tcp_congestion_control=htcp
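The available and active congestion control algorithms can be confirmed before testing:
# list the algorithms the running kernel offers
cat /proc/sys/net/ipv4/tcp_available_congestion_control
# confirm the current default
cat /proc/sys/net/ipv4/tcp_congestion_control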
# with the Myricom 10G NIC
# using interrupt coalescing helps a lot:
/usr/sbin/ethtool -C ethN rx-usecs 75
Please consider jumbo frames, but don’t just blindly turn them on
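If jumbo frames are enabled, verify that the full path actually carries them; a minimal sketch (interface and host names are placeholders):
# set a 9000-byte MTU on the data interface
/sbin/ifconfig ethN mtu 9000
# verify end to end: 8972 = 9000 - 20 (IP header) - 8 (ICMP header),
# with fragmentation disallowed
ping -M do -s 8972 remote-dtn.example.org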
I/O Scheduler
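As a sketch, the scheduler in use can be inspected and changed per block device (the device name is a placeholder, and scheduler names vary by kernel version):
# the bracketed entry is the active scheduler
cat /sys/block/sda/queue/scheduler
# switch at runtime, e.g. to deadline (mq-deadline on newer kernels)
echo deadline > /sys/block/sda/queue/scheduler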
Interrupt Affinity
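A minimal sketch of pinning NIC interrupts to specific cores (the interface name and IRQ number are placeholders); irqbalance usually needs to be stopped first so it does not undo the pinning:
# stop irqbalance so manual affinity settings persist
service irqbalance stop
# find the IRQs assigned to the NIC
grep ethN /proc/interrupts
# pin IRQ 120 to CPU 0 (the value is a CPU bitmask)
echo 1 > /proc/irq/120/smp_affinity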
Know Your Layout
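A sketch for checking which NUMA node a NIC or RAID controller hangs off of (the PCI address is a placeholder):
# show the PCI tree
lspci -tv
# report the NUMA node of a specific device (-1 means unknown/UMA)
cat /sys/bus/pci/devices/0000:5e:00.0/numa_node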
SSD Issues – Write Sparingly
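Wear can be monitored with smartmontools; a sketch (the device name is a placeholder, and attribute names vary by vendor):
# dump SMART data and look for endurance-related attributes
smartctl -a /dev/sda | grep -i -e wear -e written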
Benchmarking
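Memory-to-memory network tests isolate the network from the storage subsystem; a minimal iperf3 sketch (the hostname is a placeholder):
# on the receiving DTN
iperf3 -s
# on the sending DTN: 4 parallel streams for 30 seconds
iperf3 -c remote-dtn.example.org -P 4 -t 30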
Sample Data Transfer Results (2005)
Tool | Throughput |
scp | 140 Mbps |
HPN-patched scp | 1.2 Gbps |
ftp | 1.4 Gbps |
GridFTP, 4 streams | 5.4 Gbps |
GridFTP, 8 streams | 6.6 Gbps |
Say NO to SCP (2016)
Tool | Throughput |
scp | 330 Mbps |
wget, GridFTP, FDT, 1 stream | 6 Gbps |
GridFTP and FDT, 4 streams | 8 Gbps (disk limited) |
Data Transfer Tools
Why Not Use SCP or SFTP?
A Fix For scp/sftp
sftp
Commercial Data Transfer Tools
GridFTP
The End
© 2019, Engagement and Performance Operations Center (EPOC)