EPOC and NetSage
for WestNet
Jennifer Schopf, TACC / UT Austin
Jason Zurawski, ESnet
EPOC/NetSage is supported by NSF award #1826994
Thermodynamics, Economics, and Gravity win
(Dec 1, 2020)
© 2021, Engagement and Performance Operations Center (EPOC)
5/12/2021
Data Center Building 1.
….Let me tell you a story….
Today’s Discussion
Why an Engagement Operations Center?
Because, remember….
…..We’re building networks to support research and education, not just to have cool networks
Engagement and Performance
Operations Center (EPOC)
Core Mission
Current Regional Partners (13)
EPOC Six Focus Areas Currently
Roadside Assistance
Roadside Assistance - Consulting
Roadside Assistance is not “normal network engineering problem solving”
Soft Failures are Different from Hard Failures
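Soft failures, such as a small but persistent packet loss rate, leave connectivity intact, so nothing looks "down," yet they can cripple throughput on long paths. The Mathis et al. approximation for a single TCP Reno-style flow makes the effect concrete. The sketch below is illustrative only: the function name, MSS, RTTs, and loss rate are assumptions, not EPOC measurements.

```python
import math

# Mathis et al. (1997) steady-state approximation for a single TCP Reno flow:
#   throughput <= (MSS / RTT) * (C / sqrt(p)),  with C = sqrt(3/2)
C = math.sqrt(1.5)


def mathis_bound_bps(mss_bytes: int, rtt_s: float, loss: float) -> float:
    """Approximate upper bound on one flow's throughput, in bits per second."""
    if loss <= 0:
        raise ValueError("the model needs a nonzero loss rate")
    return (mss_bytes * 8 / rtt_s) * (C / math.sqrt(loss))


if __name__ == "__main__":
    # Illustrative inputs only: 1460-byte MSS and 0.01% packet loss.
    for rtt_ms in (1, 10, 50, 100):
        bound = mathis_bound_bps(1460, rtt_ms / 1000, 1e-4)
        print(f"RTT {rtt_ms:>3} ms: at most ~{bound / 1e6:,.0f} Mbit/s")
```

Because the bound scales with 1/RTT, the same 0.01% loss that still allows over 1.4 Gbit/s on a 1 ms path caps a 100 ms path at roughly 14 Mbit/s. Modern congestion control (CUBIC, BBR) behaves differently, so treat this as intuition rather than a prediction.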
EPOC Year 3 RA Overview - 97 Cases
RA Focus: Routing Issues
Routing Working Group
RA Focus: MTU settings impacting performance
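A frequent MTU problem is a host configured for 9000-byte jumbo frames talking across a path where one link is still at 1500 bytes. Path MTU Discovery should make the sender shrink its packets to the smallest MTU on the path, but it depends on ICMP feedback; if that is filtered, full-size packets are dropped silently and transfers stall or crawl. A minimal sketch of the arithmetic, with hypothetical function names and link values:

```python
from __future__ import annotations

# Header overhead for TCP without options.
IPV4_TCP_HEADERS = 20 + 20  # IPv4 header + TCP header
IPV6_TCP_HEADERS = 40 + 20  # IPv6 header + TCP header


def path_mtu(link_mtus: list[int]) -> int:
    """Largest packet that can cross every link without fragmentation."""
    return min(link_mtus)


def tcp_mss(mtu: int, ipv6: bool = False) -> int:
    """TCP payload bytes per segment for a given MTU."""
    return mtu - (IPV6_TCP_HEADERS if ipv6 else IPV4_TCP_HEADERS)


def undersized_links(link_mtus: list[int], host_mtu: int) -> list[int]:
    """Indices of links that cannot carry the host's full-size packets."""
    return [i for i, mtu in enumerate(link_mtus) if mtu < host_mtu]


if __name__ == "__main__":
    # Hypothetical path: jumbo-frame hosts with one 1500-byte link in the middle.
    links = [9000, 9000, 1500, 9000]
    print("path MTU:", path_mtu(links))
    print("IPv4 MSS:", tcp_mss(path_mtu(links)))
    print("links below a 9000-byte host MTU:", undersized_links(links, 9000))
```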
RA Current Lessons Learned
EPOC Deep Dive Vision
oil change, or planning to buy a car
actual scientists
Deep Dive Overview
Deep Dive: Face-to-Face Discussion
We Walk Through Scientific Components…
And Also More Technical Aspects…
5. Software Infrastructure
6. Network and Data Architecture
7. Cloud Services
8. Outstanding Issues and Pain Points
Local and regional IT staff are critical for this information, and help form valuable partnerships that may not exist or could use strengthening
Virtual Deep Dives (Shallow Splash)
Deep Dive: Outputs
Deep Dives So Far
Deep Dive Futures
Deep Dive Futures (2)
Data Mobility Exhibition (DME) / Baseline Performance Testing
Example: University of Idaho to UCAR/NCAR
EPOC Going Forward
Training
Joint work with University of South Carolina
How EPOC Relates
EPOC:
Understanding and supporting science use cases
Smallest difference for the biggest change
Debugging any and all network complications
Data mobility across the ecosystem: software, hardware, network
Targeted work with anyone, focusing mostly on those who aren’t affiliated with an R1 or a large center
CaRCC/RCD:
CI community for Research Computing and Data
Broad set of focus areas, including workforce, emerging centers, practitioners
CI focused, not science focused
NRP:
National overlay network that joins distributed computational resources
Novel tech and connections
Focus on largest of large applications
XSEDE/ACCESS:
Links and supports NSF computational & storage resources
Robust and broad communities
Networking and data mobility are not a core focus
TrustedCI
NSF #1920430
XSEDE/ ACCESS
NSF #1548562
CaRCC
CI CoE:
RCD Resource & Career Ctr
NSF #2100003
CI CoE:
CI Compass
NSF #2127548
NRP
NSF #2112167
EPOC
NSF #1826994
CI Compass:
Support NSF Major Facilities
Focus - data lifecycle
Not looking at data movement, smaller projects
Trusted CI:
All aspects of the cybersecurity landscape and how it relates to R&E use
Library of training and reference materials built internally, and through community collaboration
Any Questions on the Rest of EPOC Before I go into NetSage in detail?
Monitoring using NetSage
NetSage Data Sources
NetSage Ingest
Ingest Pipeline
Flow Data collection
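The slides don't spell out the pipeline internals. As a hedged illustration: flow exporters commonly report a long-lived transfer as several consecutive records (for example at each active timeout), and an ingest step can merge ("stitch") them before analysis. A minimal sketch, assuming records keyed by the usual 5-tuple; the `FlowRecord` type, the `stitch` function, and the 60-second gap tolerance are all hypothetical:

```python
from dataclasses import dataclass, replace


@dataclass
class FlowRecord:
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int      # IP protocol number, e.g. 6 for TCP
    start: float       # seconds since the epoch
    end: float
    num_bytes: int
    packets: int


def stitch(records, max_gap_s=60.0):
    """Merge records that share a 5-tuple and follow each other within max_gap_s."""
    merged = []
    latest = {}  # 5-tuple -> index in `merged` of that tuple's most recent flow
    for rec in sorted(records, key=lambda r: r.start):
        key = (rec.src_ip, rec.dst_ip, rec.src_port, rec.dst_port, rec.protocol)
        idx = latest.get(key)
        if idx is not None and rec.start - merged[idx].end <= max_gap_s:
            flow = merged[idx]
            flow.end = max(flow.end, rec.end)
            flow.num_bytes += rec.num_bytes
            flow.packets += rec.packets
        else:
            latest[key] = len(merged)
            merged.append(replace(rec))  # copy, so the input records are not mutated
    return merged
```

Merging by 5-tuple plus a gap limit keeps two separate transfers that happen to reuse the same ports hours apart from being counted as one flow.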
NetSage Privacy
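The slide names privacy as a design concern without giving details. One common approach for flow data is to truncate addresses before storage, so that traffic can still be attributed to an organization or network but not to an individual host. A minimal sketch, assuming a /24 mask for IPv4 and a /64 mask for IPv6; the function name and both prefix lengths are hypothetical, not a statement of NetSage's actual policy:

```python
import ipaddress

# Hypothetical prefix lengths to keep; everything after them is zeroed.
IPV4_KEEP_BITS = 24
IPV6_KEEP_BITS = 64


def deidentify(ip: str) -> str:
    """Zero the host portion of an address, keeping only its network prefix."""
    addr = ipaddress.ip_address(ip)
    keep = IPV4_KEEP_BITS if addr.version == 4 else IPV6_KEEP_BITS
    net = ipaddress.ip_network(f"{addr}/{keep}", strict=False)
    return str(net.network_address)


if __name__ == "__main__":
    for ip in ("192.0.2.57", "2001:db8:1:2:3:4:5:6"):
        print(ip, "->", deidentify(ip))
```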
NetSage - Built around answering questions
Built around answering questions:
Interesting pattern. What does it mean?
Singapore to Taiwan via LA?
Why so slow?
NetSage Focus on Use Cases and Questions
EPOC NetSage Deployments
FRGP: SNMP, Flow
GPN: SNMP, Flow
iLight: Flow
LEARN: SNMP, Flow
PNWGP: SNMP, Flow
SoX: SNMP, Flow
Sun Corridor: Flow
TACC: Flow
Sample NetSage - FRGP
FRGP Front Page: https://frgp.netsage.global
Pick the question to answer
Change the timeframe
Top Pairs as a table
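A "top pairs" table boils down to summing bytes per source/destination organization pair and sorting by volume. A minimal sketch, assuming flows already tagged with organization names; the function names, organizations, and volumes are hypothetical:

```python
from collections import Counter


def top_pairs(flows, n=10):
    """Sum bytes per (source org, destination org) and return the n largest pairs.

    flows: iterable of (src_org, dst_org, num_bytes) tuples.
    """
    totals = Counter()
    for src_org, dst_org, num_bytes in flows:
        totals[(src_org, dst_org)] += num_bytes
    return totals.most_common(n)


def print_table(pairs):
    print(f"{'Source':<20} {'Destination':<20} {'Volume (GB)':>12}")
    for (src, dst), num_bytes in pairs:
        print(f"{src:<20} {dst:<20} {num_bytes / 1e9:>12.1f}")


if __name__ == "__main__":
    # Hypothetical organizations and volumes.
    flows = [
        ("Univ A", "Lab X", 40_000_000_000),
        ("Univ B", "Univ A", 5_000_000_000),
        ("Univ A", "Lab X", 20_000_000_000),
        ("Lab X", "Univ B", 12_000_000_000),
    ]
    print_table(top_pairs(flows, n=3))
```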
Bandwidth Data from SNMP
Bandwidth data from SNMP per-circuit
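Per-circuit bandwidth graphs are typically derived from interface octet counters polled at a fixed interval. A minimal sketch of the conversion, assuming 64-bit counters such as IF-MIB `ifHCInOctets` and at most one wrap between polls; the function name and poll values are hypothetical:

```python
def rate_bps(prev_octets: int, curr_octets: int, interval_s: float,
             counter_bits: int = 64) -> float:
    """Average bits/s between two polls of an interface octet counter.

    Modular arithmetic handles one counter wrap between polls. A counter reset
    (for example after a reboot) cannot be told apart from a wrap here.
    """
    if interval_s <= 0:
        raise ValueError("interval must be positive")
    delta = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta * 8 / interval_s


if __name__ == "__main__":
    # Hypothetical 30-second poll of ifHCInOctets on one circuit.
    print(rate_bps(1_000_000_000_000, 1_037_500_000_000, 30))  # bits/s
```

The 64-bit counters matter at R&E speeds: a 32-bit octet counter wraps after 2^32 bytes, which on a saturated 10 Gbit/s link takes under four seconds, far shorter than a typical polling interval.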
What do flows look like for my institution?
Sun Corridor - Top Senders over Time
Sun Corridor - Top Receivers over Time
Sun Corridor General Information
Click on institution
OR
Click on spinning S, then pick “What are top flows by organization”;
then type in institution’s name
Click on UT Austin (or Individual Flows and set the other end to UT Austin…)
Protocols and Ports
Everything we know about one of the flows
NetSage Great Plains Network
From another example -
Regular backup/sync with the cloud…
…but, did we know this was happening? Should the performance be better?
Using SWIP for better Organizational Detail
Adding Data to the Science Registry
Global Science Registry
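Enriching flows with organization and science-discipline tags amounts to mapping each address onto the most specific registered prefix that contains it. The slides don't show how the registry is stored or queried; a minimal sketch, assuming a plain prefix-to-metadata table, with a hypothetical `lookup` function and made-up entries on documentation prefixes:

```python
import ipaddress

# Hypothetical registry entries built on documentation prefixes (RFC 5737);
# they do not describe real Science Registry contents.
REGISTRY = {
    "192.0.2.0/24": {"org": "Example University", "discipline": "Astronomy"},
    "192.0.2.128/25": {"org": "Example University", "discipline": "Climate"},
    "198.51.100.0/24": {"org": "Example Lab", "discipline": "Genomics"},
}


def lookup(ip, registry=REGISTRY):
    """Return the registry entry for the most specific prefix containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best_net, best_info = None, None
    for prefix, info in registry.items():
        net = ipaddress.ip_network(prefix)
        # `in` is False when the address and network are different IP versions.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_info = net, info
    return best_info


if __name__ == "__main__":
    for ip in ("192.0.2.200", "192.0.2.5", "203.0.113.9"):
        print(ip, "->", lookup(ip))
```

Longest-prefix match lets a small, well-described subnet (a single lab's transfer node, say) override the broader campus entry. A linear scan is fine for a sketch; a large registry would want a radix tree or a similar structure.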
Heat Map - darker means more volume
Sun Corridor - Geo traffic
Science Data Patterns
If we also had SNMP Data: Analysis
Things that we’ve added…
Another Use Case
Sometimes the heat map makes patterns easier to see
Who’s transferring data to/from U Wyoming?
On the “all” portal
On the “all” portal
On the CENIC portal
Ah - CENIC in Denver, that makes more sense
NOAA and NetSage
…. And more!
What orgs connected to iLight are transferring data with NOAA in Boulder?
What orgs connected to iLight are transferring data with NOAA in Boulder? (2)
What orgs connected to iLight are transferring data with NOAA in Boulder? (3)
NSF International Circuits – Flow Data Collectors
NOAA over NSF International Links - To
What orgs connected to NSF links are transferring data with NOAA in Boulder?
What orgs connected to NSF links are transferring data with NOAA in Boulder? (2)
What orgs connected to NSF links are transferring data to NOAA GFDL?
What orgs connected to NSF links are transferring data from NOAA GFDL?
What else can I see about the GFDL-Chonbuk flows?
Set Source
Set Dest
What else can I see about the GFDL-Chonbuk flows? (2)
What do the GFDL-Chonbuk flows look like over time? - Volume
What do the GFDL-Chonbuk flows look like over time? - Rate
NSF International Circuits – Flow Data Collectors
What path?
Change Sensor
What path? (2)
Change Sensor
Jason Zurawski [4:10 PM]: the path is just a bit weird in general…
6 irb.3901.brtr.denv.nwave.noaa.gov (137.75.72.10) 6.926 ms 6.906 ms 6.911 ms
7 137.75.72.131 (137.75.72.131) 7.157 ms 7.160 ms 7.151 ms
8 et-4-3-0.3532.rtsw.seat.net.internet2.edu (198.71.46.247) 32.380 ms 32.397 ms 32.390 ms
9 et-4-0-0.4070.rtsw.port.net.internet2.edu (162.252.70.82) 36.150 ms 36.090 ms 36.160 ms
10 et-3-0-0.4070.rtsw.sunn.net.internet2.edu (162.252.70.85) 49.405 ms 49.428 ms 49.405 ms
11 et-2-1-0.4070.rtsw.losa.net.internet2.edu (162.252.70.71) 56.335 ms 56.352 ms 56.347 ms
12 vlan-966.rtr.hong.transpac.org (198.71.45.137) 255.984 ms 256.047 ms 233.723 ms
13 134.75.107.17 (134.75.107.17) 196.534 ms 219.096 ms 219.091 ms
14 kreonet2-hongkong.daej.kreonet2.net (134.75.105.17) 226.032 ms 194.154 ms 194.062 ms
15 kreonet-dj-bb1-kreonet2-gr-bb2.daej.kreonet2.net (134.75.105.113) 215.589 ms 193.395 ms 193.388 ms
16 134.75.8.22 (134.75.8.22) 217.603 ms 195.386 ms 195.470 ms
17 210.98.55.2 (210.98.55.2) 224.620 ms 247.868 ms 246.877 ms
so NOAA TIC in Denv to Seat, then down to LOSA, then across transpac to hong kong
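The jump in round-trip time shows where this path leaves North America: the forward trace goes from Denver up to Seattle, down the West Coast to Los Angeles, and only then across to Hong Kong. Finding the largest RTT increase between consecutive hops is easy to automate. A minimal sketch, assuming classic Linux `traceroute` output (hop number, hostname, address in parentheses, three RTT samples) and using hops 8 to 13 of the trace above; `parse_hops` and `biggest_jump` are hypothetical helper names:

```python
import re

# Hops 8-13 of the forward trace shown above.
SAMPLE = """
 8 et-4-3-0.3532.rtsw.seat.net.internet2.edu (198.71.46.247) 32.380 ms 32.397 ms 32.390 ms
 9 et-4-0-0.4070.rtsw.port.net.internet2.edu (162.252.70.82) 36.150 ms 36.090 ms 36.160 ms
10 et-3-0-0.4070.rtsw.sunn.net.internet2.edu (162.252.70.85) 49.405 ms 49.428 ms 49.405 ms
11 et-2-1-0.4070.rtsw.losa.net.internet2.edu (162.252.70.71) 56.335 ms 56.352 ms 56.347 ms
12 vlan-966.rtr.hong.transpac.org (198.71.45.137) 255.984 ms 256.047 ms 233.723 ms
13 134.75.107.17 (134.75.107.17) 196.534 ms 219.096 ms 219.091 ms
"""

HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\(([\d.]+)\)\s+(.*)$")
RTT_RE = re.compile(r"([\d.]+) ms")


def parse_hops(text):
    """Return [(hop_number, hostname, min_rtt_ms), ...] for hops that answered."""
    hops = []
    for line in text.strip().splitlines():
        m = HOP_RE.match(line)
        if not m:
            continue  # e.g. "* * *" lines for hops that did not reply
        rtts = [float(x) for x in RTT_RE.findall(m.group(4))]
        if rtts:
            # The minimum sample is the least affected by routers' slow ICMP handling.
            hops.append((int(m.group(1)), m.group(2), min(rtts)))
    return hops


def biggest_jump(hops):
    """Return (hop_number, hostname, increase_ms) for the largest RTT increase."""
    best = None
    for (_, _, prev_rtt), (num, name, rtt) in zip(hops, hops[1:]):
        increase = rtt - prev_rtt
        if best is None or increase > best[2]:
            best = (num, name, increase)
    return best


if __name__ == "__main__":
    print(biggest_jump(parse_hops(SAMPLE)))
```

On this sample the largest increase (about 177 ms) lands on hop 12, `vlan-966.rtr.hong.transpac.org`, the first router after Los Angeles. A transpacific hop is expected to add latency; the question the trace raises is why traffic to Korea takes the Seattle, Los Angeles, Hong Kong route at all.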
Path….
So this path was…
Return Path:
1 gateway (210.117.228.1) 0.299 ms 0.461 ms 0.626 ms
2 134.75.14.6 (134.75.14.6) 0.429 ms 0.556 ms 0.736 ms
3 134.75.105.241 (134.75.105.241) 0.755 ms 0.814 ms 0.836 ms
4 seattle-kreonet2.seat.kreonet2.net (134.75.105.82) 110.118 ms 110.131 ms 110.119 ms
5 abilene-1-lo-jmb-706.sttlwa.pacificwave.net (207.231.240.8) 110.211 ms 110.228 ms 110.223 ms
6 et-4-0-0.4079.rtsw.miss2.net.internet2.edu (162.252.70.0) 120.847 ms 120.765 ms 120.814 ms
7 et-4-0-0.4079.rtsw.minn.net.internet2.edu (162.252.70.58) 144.183 ms 144.208 ms 143.942 ms
8 et-1-1-5.4079.rtsw.eqch.net.internet2.edu (162.252.70.106) 152.047 ms 152.062 ms 152.116 ms
9 ae-0.4079.rtsw3.eqch.net.internet2.edu (162.252.70.163) 152.113 ms 152.021 ms 152.060 ms
10 ae-1.4079.rtsw.clev.net.internet2.edu (162.252.70.130) 177.572 ms 160.933 ms 160.930 ms
11 ae-0.4079.rtsw.ashb.net.internet2.edu (162.252.70.128) 168.358 ms 168.397 ms 168.263 ms
12 et-11-3-0-1275.clpk-core.maxgigapop.net (206.196.177.2) 170.035 ms 170.070 ms 170.126 ms
13 nwave-clpk-re.demarc.maxgigapop.net (206.196.177.189) 169.990 ms 170.215 ms 170.180 ms
14 lo-0.1.brtr.wash.nwave.noaa.gov (137.75.100.7) 170.921 ms 170.969 ms 171.090 ms
15 137.75.68.19 (137.75.68.19) 170.778 ms 170.804 ms 170.721 ms
16 irb.3903.rtr3.wash.nwave.noaa.gov (137.75.68.22) 171.369 ms 171.201 ms 171.932 ms
17 ae-3.2.rtr.wash.nwave.noaa.gov (140.172.70.100) 175.550 ms 175.488 ms 175.806 ms
18 140.208.63.9 (140.208.63.9) 176.860 ms 176.797 ms 175.603 ms
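Comparing the two traces shows the paths are asymmetric: outbound traffic rides Internet2 down the West Coast and crosses via TransPAC in Hong Kong, while the return enters at Seattle through KREONET2 and Pacific Wave, then crosses Internet2 eastward before handing off to NOAA's N-Wave routers in Washington (`*.wash.nwave.noaa.gov`). A minimal sketch of that comparison, assuming the PoP code is the label after `rtsw`/`rtsw3` in Internet2 router hostnames (an inference from the hostnames above, not a documented naming rule); `internet2_pops` is a hypothetical helper:

```python
import re

# Router hostnames from the forward trace (hops 8-12) and the return trace (hops 4-13).
FORWARD = """
et-4-3-0.3532.rtsw.seat.net.internet2.edu
et-4-0-0.4070.rtsw.port.net.internet2.edu
et-3-0-0.4070.rtsw.sunn.net.internet2.edu
et-2-1-0.4070.rtsw.losa.net.internet2.edu
vlan-966.rtr.hong.transpac.org
"""
RETURN = """
seattle-kreonet2.seat.kreonet2.net
abilene-1-lo-jmb-706.sttlwa.pacificwave.net
et-4-0-0.4079.rtsw.miss2.net.internet2.edu
et-4-0-0.4079.rtsw.minn.net.internet2.edu
et-1-1-5.4079.rtsw.eqch.net.internet2.edu
ae-0.4079.rtsw3.eqch.net.internet2.edu
ae-1.4079.rtsw.clev.net.internet2.edu
ae-0.4079.rtsw.ashb.net.internet2.edu
et-11-3-0-1275.clpk-core.maxgigapop.net
nwave-clpk-re.demarc.maxgigapop.net
"""

I2_POP_RE = re.compile(r"\.rtsw\d*\.([a-z0-9]+)\.net\.internet2\.edu")


def internet2_pops(trace):
    """Ordered Internet2 PoP codes seen in a trace, merging consecutive repeats."""
    pops = []
    for pop in I2_POP_RE.findall(trace):
        if not pops or pops[-1] != pop:  # two routers in one PoP count once
            pops.append(pop)
    return pops


if __name__ == "__main__":
    fwd, ret = internet2_pops(FORWARD), internet2_pops(RETURN)
    print("forward:", fwd)
    print("return: ", ret)
    print("shared PoPs:", sorted(set(fwd) & set(ret)))
```

The two directions share no Internet2 PoP at all, which is worth knowing before interpreting any per-direction performance numbers.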
Return Path
What else can I learn about this interaction – zooming in
Let's Zoom In
Zoom In
Time Frame
Zoom In (2) – Every 5 minutes…?
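Regular spikes in a zoomed-in view suggest a scheduled job. One way to check a hunch like "every 5 minutes" is to look at the gaps between successive flow start times. A minimal sketch, assuming flow start times in seconds; the function name, the 10-second rounding tolerance, and the sample timestamps are hypothetical:

```python
from collections import Counter


def dominant_interval(start_times, resolution_s=10):
    """Most common gap between consecutive flow starts, and the share of gaps it covers.

    Gaps are rounded to the nearest multiple of resolution_s so that small
    scheduling jitter does not split one period into several buckets.
    """
    ts = sorted(start_times)
    gaps = [round((b - a) / resolution_s) * resolution_s for a, b in zip(ts, ts[1:])]
    if not gaps:
        return None
    interval, count = Counter(gaps).most_common(1)[0]
    return interval, count / len(gaps)


if __name__ == "__main__":
    # Hypothetical start times (seconds) for flows between one pair of hosts.
    starts = [0, 301, 598, 902, 1200, 1499, 1530]
    print(dominant_interval(starts))
```

A dominant 300-second gap covering most of the flows is consistent with a cron-style schedule rather than interactive use.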
Zoom In (3)
Other Time Frame
Performance of the Individual Flows
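Once the flows of interest are isolated, a quick check is whether large transfers are moving as fast as the path should allow. A minimal sketch, assuming each flow record carries a byte count plus start and end times; the function names and the size and rate cutoffs are hypothetical:

```python
def flow_rate_bps(num_bytes, start_s, end_s):
    """Average rate of one flow in bits/s, or None if it has no usable duration."""
    duration = end_s - start_s
    if duration <= 0:
        return None  # single-packet or zero-duration records have no meaningful rate
    return num_bytes * 8 / duration


def slow_flows(flows, min_bytes, threshold_bps):
    """Large flows whose average rate falls below threshold_bps.

    flows: iterable of (name, num_bytes, start_s, end_s) tuples.
    """
    out = []
    for name, num_bytes, start, end in flows:
        rate = flow_rate_bps(num_bytes, start, end)
        if num_bytes >= min_bytes and rate is not None and rate < threshold_bps:
            out.append((name, rate))
    return out


if __name__ == "__main__":
    # Hypothetical flows: 10 GB in 100 s, 10 GB in 10 s, and a tiny flow.
    flows = [("a", 10**10, 0, 100), ("b", 10**10, 0, 10), ("c", 1000, 0, 100)]
    print(slow_flows(flows, min_bytes=10**9, threshold_bps=1e9))
```

An average over the whole flow hides bursts and idle periods, so a flow flagged here is a lead for a closer look, not a diagnosis.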
So… now what?
What NetSage Does Best
Takeaways
NetSage is funded by US NSF award #1540933
EPOC is funded by US NSF award #1826994
Acknowledgements