Network Troubleshooting: Techniques and Approaches
Eli Dart, Network Engineer
ESnet Science Engagement
Lawrence Berkeley National Laboratory
CaRCC Systems-Facing Track meeting
Virtual (Pandemic)
October 21, 2021
Outline
10/20/21
The Internet
Selected networks and their missions
Notes about different networks
Topology Matters
Common Problems
Packet Loss
Figure (labels only; plot not reproduced). Path length categories: Local (LAN), Metro Area, Regional, Continental, International. Series: Measured (TCP Reno), Measured (HTCP), Theoretical (TCP Reno), Measured (no loss).
See Eli Dart, Lauren Rotman, Brian Tierney, Mary Hester, and Jason Zurawski. The Science DMZ: A Network Design Pattern for Data-Intensive Science. In Proceedings of the IEEE/ACM Annual SuperComputing Conference (SC13), Denver CO, 2013.
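A "Theoretical (TCP Reno)" curve of this kind is commonly derived from the Mathis et al. model, which bounds steady-state throughput by (MSS / RTT) * (C / sqrt(p)), where p is the packet loss rate. The sketch below is an illustration under stated assumptions and is not taken from the slides: the function name, the constant C = 1, the loss rate, and the round-trip times per path category are all hypothetical values chosen for the example.

```python
import math


def mathis_throughput_bps(mss_bytes, rtt_s, loss_rate, c=1.0):
    """Approximate upper bound on TCP Reno throughput, in bits per second.

    Uses the Mathis et al. model: rate <= (MSS / RTT) * (C / sqrt(p)).
    c=1.0 is a simplifying assumption; other values of C appear in the
    literature depending on the loss and ACK model.
    """
    if mss_bytes <= 0 or rtt_s <= 0:
        raise ValueError("mss_bytes and rtt_s must be positive")
    if not 0 < loss_rate <= 1:
        raise ValueError("loss_rate must be in (0, 1]")
    return (mss_bytes * 8 / rtt_s) * (c / math.sqrt(loss_rate))


if __name__ == "__main__":
    # Hypothetical round-trip times (seconds) for each path category.
    example_rtts = {
        "Local (LAN)": 0.001,
        "Metro Area": 0.005,
        "Regional": 0.020,
        "Continental": 0.080,
        "International": 0.150,
    }
    loss = 1 / 22000  # illustrative loss rate, not a measured value
    for name, rtt in example_rtts.items():
        mbps = mathis_throughput_bps(1460, rtt, loss) / 1e6
        print(f"{name:15s} RTT {rtt * 1000:6.1f} ms  <= {mbps:10.1f} Mbit/s")
```

Because the bound scales with 1/RTT, the same small loss rate that is barely noticeable on a local path limits a long-distance path to a small fraction of link capacity.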
Common Causes of Packet Loss
Approaching WAN Performance Problems
Real-World Example – Using perfSONAR
11 – ESnet Science Engagement (engage@es.net) - 10/20/21
Wide Area Testing – User Problem Statement
Wide Area Testing – Full Context
Wide Area Testing – Long Clean Test
Wide Area Testing – Dirty Tests
Wide Area Testing – Problem Localization
Slow tests indicate likely problem area
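One way to read "slow tests indicate likely problem area" is as a process of elimination: a path segment that appears in slow tests but in no clean test is a candidate for the fault. The sketch below illustrates that idea only. It is not the procedure from the slides and does not use any perfSONAR API; the function name, segment names, threshold, and throughput figures are all hypothetical.

```python
from collections import Counter


def suspect_segments(tests, threshold_mbps):
    """Rank path segments seen in slow tests but in no clean test.

    tests: iterable of (segments, throughput_mbps), where segments is a
    sequence of hypothetical segment names the test traversed.
    """
    clean = set()
    slow_counts = Counter()
    for segments, mbps in tests:
        if mbps >= threshold_mbps:
            clean.update(segments)        # a clean test exonerates its segments
        else:
            slow_counts.update(segments)  # a slow test implicates its segments
    # Keep only segments never exonerated, most frequently implicated first.
    return [seg for seg, _ in slow_counts.most_common() if seg not in clean]


if __name__ == "__main__":
    # Made-up results between hypothetical test points along one path.
    results = [
        (("campus-A", "regional"), 9000),
        (("regional", "backbone"), 9200),
        (("backbone", "campus-B"), 300),
        (("campus-A", "regional", "backbone", "campus-B"), 250),
    ]
    print(suspect_segments(results, threshold_mbps=5000))
```

This assumes a clean test fully clears every segment it crosses, which is a simplification: intermittent loss or direction-dependent problems would need repeated tests in both directions.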
Example: Chile to California via Miami
Example: KSTAR (Fusion) to DOE HPC – 2013
Example: LHC Data – Europe to Pakistan
Example: Petascale DTN
Managing Sociology
Community Resources
Thanks!
Eli Dart
Energy Sciences Network (ESnet)
Lawrence Berkeley National Laboratory