Report on the Research Networking Technical WG
Shawn McKee / University of Michigan
Internet2 Community Measurement, Metrics, and Telemetry meeting
May 12, 2020
Presentation Overview
Since this group is involved with network monitoring and metrics, I wanted to provide a quick update on a new activity that is relevant.
For High-Energy Physics (HEP), we have identified a need to better understand our traffic in-flight (as well as a few other items where we have a very good level of consensus).
I want to update you on a new effort to organize a technical working group to address some specific areas of interest to HEP that are relevant for the broader R&E community globally.
WLCG Network Requirements
New Research Networking Technical WG
The HEPiX NFV report was presented to the WLCG experiments and NRENs during the January 2020 LHCONE/LHCOPN meeting and discussed in detail. We achieved a strong consensus that this work needed to move forward ASAP!
The three areas proposed for work are:
Making our network use visible (packet marking)
Pacing/shaping WAN data flows
Network orchestration
To move forward, we organized a new Research Networking Technical Working Group (RNTWG) focused on addressing the identified needs of HEP and the NRENs (and others!)
The charter for the group is at https://docs.google.com/document/d/1l4U5dpH556kCnoIHzyRpBl74IPc0gpgAG3VPUp98lo0/edit?usp=sharing
The kickoff meeting was held April 21: https://indico.cern.ch/event/911274/
Making our network use visible
Understanding HEP traffic flows in detail is critical for understanding how our complex systems actually use the network. Current monitoring and logging tell us where data flows start and end, but they cannot identify the data while it is in flight. In general, the monitoring we have is experiment-specific and very difficult to correlate with what is happening in the network.
(See the example on the next slide.)
Packet Marking Overview (Feasibility)
The proposal is to provide a mechanism to mark our network packets with the experiment/owner and activity
Pacing/Shaping WAN data flows
It remains a challenge for HEP storage endpoints to utilize the network efficiently and fully.
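As a feasibility illustration, here is a minimal Python sketch of endpoint-side pacing using the Linux SO_MAX_PACING_RATE socket option (enforced by the fq qdisc, or TCP internal pacing on newer kernels). The option number fallback and the example rate are assumptions for illustration, not a WG-agreed mechanism.

```python
import socket

# SO_MAX_PACING_RATE (value 47 in Linux <asm-generic/socket.h>) caps the
# kernel's transmit rate for a single socket. Older Python builds do not
# expose the constant, hence the fallback value.
SO_MAX_PACING_RATE = getattr(socket, "SO_MAX_PACING_RATE", 47)

def pace_socket(sock: socket.socket, rate_bytes_per_sec: int) -> None:
    """Ask the kernel to pace this socket's sends to the given byte rate."""
    sock.setsockopt(socket.SOL_SOCKET, SO_MAX_PACING_RATE, rate_bytes_per_sec)

# Example: cap one transfer stream at roughly 1 Gb/s (illustrative value).
sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
pace_socket(sock, 125_000_000)
```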
Network orchestration
Straw man proposal for a work plan
We have already identified the areas of work, so our proposed work plan, per area, would be:
Goal: to finish prototype packet marking stage by EoY (or Q1 2021)*
Packet Marking Sub Group
Since Packet Marking was first on the list, we have a soon-to-be-announced document focused on organizing this work
See draft here
Please join the mailing list if you are interested in participating!
My goal is to have some amount of R&E traffic being labeled by the end of this calendar year.
Acknowledgements
We would like to thank the WLCG, HEPiX, perfSONAR and OSG organizations for their work on the topics presented.
In addition, we want to explicitly acknowledge the support of the National Science Foundation, which supported this work via:
Questions?
Questions, Comments, Suggestions?
References
WG Meetings and Notes: https://indico.cern.ch/category/10031/
SDN/NFV Tutorial: https://indico.cern.ch/event/715631/
2018 IEEE/ACM Innovating the Network for Data-Intensive Science (INDIS) – http://conferences.computer.org/scw/2018/#!/toc/3
OVN/OVS overview: https://www.openvswitch.org/
GEANT Automation, Orchestration and Virtualisation (link)
Cloud Native Data Centre Networking (book)
MPLS in the SDN Era (book)
Backup slides
Packet Marking Challenges
We would like this to be applicable to ALL significant R&E network users/science domains, not just HEP
How best to use the number of bits we can get? (See the packing sketch after this list.)
What can we rely on from the Linux network stack and what do we need to provide?
What can the network operators provide for accounting?
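To make the "how best to use the bits" question concrete, here is a Python sketch of one hypothetical split of a 20-bit mark (the IPv6 flow-label width) into experiment and activity fields. The 10/10 split and the field names are illustrative assumptions, not a WG decision.

```python
# Hypothetical layout: high 10 bits = experiment/science domain,
# low 10 bits = activity type. The real allocation (and whether bits are
# reserved for entropy or versioning) is an open question for the WG.
EXP_BITS, ACT_BITS = 10, 10

def encode_mark(experiment_id: int, activity_id: int) -> int:
    assert 0 <= experiment_id < (1 << EXP_BITS)
    assert 0 <= activity_id < (1 << ACT_BITS)
    return (experiment_id << ACT_BITS) | activity_id

def decode_mark(mark: int):
    return mark >> ACT_BITS, mark & ((1 << ACT_BITS) - 1)

# Round-trip check with placeholder ids (e.g. experiment 3, activity 7).
assert decode_mark(encode_mark(3, 7)) == (3, 7)
```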
Packet Marking - Storage Elements
The primary challenge here lies in two areas:
Packet Marking - Jobs
As jobs source data onto the network OR pull data into the job, we should try to ensure the corresponding packets are marked appropriately
Packet Marking - IPv6
IPv6 incorporates a “Flow Label” field in its header (20 bits)
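A minimal Python sketch of how an application might attach a flow label to a socket on Linux via the kernel's flow-label manager. The socket-option and flag values come from <linux/in6.h> (they are not exposed by the Python standard library), and the destination address and label value are placeholders.

```python
import socket, struct

# Constants from <linux/in6.h>; not exposed by the Python stdlib.
IPV6_FLOWLABEL_MGR = 32
IPV6_FLOWINFO_SEND = 33
IPV6_FL_A_GET      = 0    # request a label ...
IPV6_FL_F_CREATE   = 1    # ... creating it if it does not yet exist
IPV6_FL_S_ANY      = 255  # allow any process to share this label

def set_flow_label(sock: socket.socket, dst: str, label: int) -> None:
    """Attach a 20-bit flow label to everything sent on this IPv6 socket."""
    # struct in6_flowlabel_req: dst(16s), label(be32), action(B), share(B),
    # flags(H), expires(H), linger(H), pad(I)
    req = struct.pack(
        "=16sIBBHHHI",
        socket.inet_pton(socket.AF_INET6, dst),
        socket.htonl(label & 0xFFFFF),  # the flow label field is 20 bits
        IPV6_FL_A_GET, IPV6_FL_S_ANY, IPV6_FL_F_CREATE,
        0, 0, 0,
    )
    sock.setsockopt(socket.IPPROTO_IPV6, IPV6_FLOWLABEL_MGR, req)
    sock.setsockopt(socket.IPPROTO_IPV6, IPV6_FLOWINFO_SEND, 1)

# Usage (placeholder address and label); set the label before connect():
# sock = socket.socket(socket.AF_INET6, socket.SOCK_STREAM)
# set_flow_label(sock, "2001:db8::1", 0x12345)
# sock.connect(("2001:db8::1", 1094))
```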
Packet Marking - IPv4
IPv4 incorporates “Options” in its header (allowing additional 32-bit words to be added)
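For feasibility only, here is a Python sketch of carrying a small experiment/activity mark in an IPv4 option set via setsockopt(IP_OPTIONS). The option type (the RFC 4727 experimental option, 0x1E) and the two-byte payload layout are assumptions; note that many routers slow-path, strip, or drop packets carrying IPv4 options, which is a major caveat for this approach.

```python
import socket

IP_OPTIONS = getattr(socket, "IP_OPTIONS", 4)  # value from <netinet/in.h>

def build_marking_option(experiment_id: int, activity_id: int) -> bytes:
    """Build a hypothetical IPv4 option: one experiment byte, one activity byte."""
    opt_type = 0x1E  # RFC 4727 experimental option number (placeholder choice)
    payload = bytes([experiment_id & 0xFF, activity_id & 0xFF])
    opt = bytes([opt_type, 2 + len(payload)]) + payload  # type, length, data
    while len(opt) % 4:            # pad to a 32-bit boundary with EOOL (0)
        opt += b"\x00"
    return opt

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_IP, IP_OPTIONS, build_marking_option(3, 7))
```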
Network Functions Virtualisation WG
Mandate: Identify use cases, survey existing approaches and evaluate whether and how Software Defined Networking (SDN) and Network Functions Virtualisation (NFV) should be deployed in HEP.
Team: 60 members including R&E networks (GEANT, ESnet, Internet2, AARNet, CANARIE, SURFnet, GARR, Jisc, RENATER, NORDUnet) and sites (ASGC, PIC, BNL, CNAF, CERN, KIAE, FIU, AGLT2, Caltech, DESY, IHEP, Nikhef)
Monthly meetings started in January 2018 (https://indico.cern.ch/category/10031/)
NFV WG Report
The NFV WG produced an interim report that could serve as one of the inputs for the LHCOPN/LHCONE feedback
Executive summary for NFV Phase 1 report is at https://docs.google.com/document/d/1w7XUPxE23DJXn--j-M3KvXlfXHUnYgsVUhBpKFjyjUQ/edit#heading=h.flthknqgm3ub
The report has 3 chapters:
Cloud Native DC Networking
Programmable WAN
Proposed Areas of Future Work
The future (phase 2) work is partially the remit of this RNT WG, but we may end up splitting off a more focused NFV/SDN group.
Future Work for Experiments/NRENs
The report proposes areas of future work with the experiments
During the LHCONE/LHCOPN meeting we heard consistent interest in making network use more visible (all VOs), more effective (CMS pacing, others), and orchestrated (managed, controlled). This matches what we identified:
Areas proposed for this WG (pages 53-56): making our network use visible (packet marking), pacing/shaping WAN data flows, and network orchestration
NFV Report Conclusions
The primary challenge we face is ensuring that WLCG and its constituent collaborations will have the networking capabilities required to most effectively exploit LHC data for the lifetime of the LHC. To deliver on this challenge, automation is a must: the dynamism and agility of our evolving applications, tools, middleware and infrastructure require automating at least part of our networks, which is a significant challenge in itself. While there are many technology choices that need discussion and exploration, the most important thing is ensuring the experiments and sites collaborate with the RENs, network engineers and researchers to develop, prototype and implement a useful, agile network infrastructure. That infrastructure must be well integrated with the computing and storage frameworks being evolved by the experiments, as well as with the technology choices being implemented at the sites and RENs.
Research Networking Technical WG
Charter:
https://docs.google.com/document/d/1l4U5dpH556kCnoIHzyRpBl74IPc0gpgAG3VPUp98lo0/edit#
Mailing list:
http://cern.ch/simba3/SelfSubscription.aspx?groupName=net-wg
Members (79 as of today, in no particular order):
Christian Todorov (Internet2) Frank Burstein (BNL) Richard Carlson (DOE) Marcos Schwarz (RNP) Susanne Naegele Jackson (FAU)
Alexander Germain (OHSU) Casey Russell (CANREN) Chris Robb (GlobalNOC/IU) Dale Carder (ESnet) Doug Southworth (IU)
Eli Dart (ESnet) Eric Brown (VT) Evgeniy Kuznetsov (JINR) Ezra Kissel (ESnet) Fatema Bannat Wala (LBL) Joseph Breen (UTAH) James Blessing (Jisc) James Deaton (Great Plains Network) Jason Lomonaco (Internet2) Jerome Bernier (IN2P3) Jerry Sobieski
Ji Li (BNL) Joel Mambretti (Northwestern) Karl Newell (Internet2) Li Wang (IHEP) Mariam Kiran (ESnet) Mark Lukasczyk (BNL)
Matt Zekauskas (Internet2) Michal Hazlinsky (Cesnet) Mingshan Xia (IHEP) Paul Acosta (MIT) Paul Howell (Internet2)
Paul Ruth (RENCI) Pieter de Boer (SURFnet) Roman Lapacz (PSNC) Sri N () Stefano Zani (CNAF) Tamer Nadeem (VCU)
Tim Chown (Jisc) Tom Lehman (ESnet) Vincenzo Capone (GEANT) Wenji Wu (FNAL) Xi Yang (ESnet) Chin Guok (ESnet)
Tony Cass (CERN) Eric Lancon (BNL) James Letts (UCSD) Harvey Newman (Caltech) Duncan Rand (Jisc)
Edoardo Martelli (CERN) Shawn McKee (Univ. of Michigan) Simone Campana (CERN) Andrew Hanushevsky (SLAC)
Marian Babik (CERN) James William Walder () Petr Vokac () Alexandr Zaytsev (BNL) Raul Cardoso Lopes () Mario Lassnig (CERN) Han-Wei Yen () Wei Yang (Stanford) Edward Karavakis (CERN) Tristan Suerink (Nikhef) Garhan Attebury (UNL) Pavlo Svirin ()
Shan Zeng (IHEP) Jin Kim (KISTI) Richard Cziva (ESnet) Phil Demar (FNAL) Justas Balcas (Caltech) Bruno Hoeft (FZK)