1 of 25

Packet and Flow Marking Technical Specification Update

Marian Babik (CERN), Shawn McKee (Univ. of Michigan)

net-wg@cern.ch | www.scitags.org

On behalf of the Research Networking Technical Working Group

2 of 25

News

  • Round-table ?


3 of 25

Scitags Architecture


4 of 25

Meetings Plan

Technical meetings to be scheduled on the different architectural areas (in no particular order, TBD):

  • Registry
    • Science domains and activities registry (Google Sheet); JSON API at http://api.scitags.org/
  • Flow Marking (UDP fireflies)
    • Updates based on feedback from XRootd development
    • Schema extensions, new attributes (bytes transferred, netlink information, etc.)
    • Communication with collectors (anycast, ports)
  • Packet Marking
    • IPv6 flow label, IPv6 header extensions (HbH, Dst options), SRv6 potentially
  • Collectors
    • Software collectors (syslog) - capturing UDP fireflies only - new deployments
    • In-line/HW collectors - capturing packets
  • DDM/FTS extensions to support passing flow id to the storages
  • R&E analytics and feedback loop to DDM


5 of 25

Registry

  • Science domain and activities registry (Google Sheet)
  • The plan is to develop a script that will migrate data to api.scitags.org
    • This can be automated/triggered on commit to scitags.org
    • api.scitags.org was created as an alias to make it possible to migrate away from GitHub (which limits the number of queries)
  • Proposal is to add all relevant science domains and activities using edit-mode
    • Anyone can add new science domain and/or activity
      • WG should be notified of any changes (alternatively a small team of moderators)
    • Once a year we can re-evaluate the list of the top ~500 science domains based on their share of global traffic (threshold would be > 0.2%)
      • 500 is our current limit of science domains that we can support for packet marking
      • Note that we have no limit for flow marking (but we should have a way to clean up the list in any case)
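
The yearly re-evaluation proposed above could be sketched as follows. This is only an illustration: the function name, the input mapping and the example shares are hypothetical, and only the 0.2% threshold and the limit of 500 come from the proposal.

```python
def select_domains(traffic_share, threshold=0.002, limit=500):
    """Return science domains eligible for packet marking, busiest first.

    traffic_share maps a science domain name to its fraction of global
    traffic (0.0 to 1.0). This input format is an assumption of the sketch.
    """
    eligible = [
        (name, share)
        for name, share in traffic_share.items()
        if share > threshold
    ]
    # Sort by descending share; break ties by name for a stable result.
    eligible.sort(key=lambda item: (-item[1], item[0]))
    return [name for name, _ in eligible[:limit]]


# Hypothetical shares, for illustration only.
example = {"domain-a": 0.45, "domain-b": 0.30, "domain-c": 0.001}
selected = select_domains(example)
```

Since traffic shares sum to at most 1, fewer than 500 domains can each exceed 0.2%, so the threshold and the limit are consistent with each other.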


6 of 25

Flow and Packet Marking Technical Specification


7 of 25

Technical Spec Changes

Overall, the technical specification document has proven very useful during implementation as an essential reference guide. We received a number of proposals for changes based on feedback from implementation. We need to discuss how we want to keep it updated.

Proposal is to raise any changes in the WG meeting and agree on one of four possible outcomes:

  • Accept as mandatory (requires schema version update + new implementation)
  • Accept as optional (requires schema version update)
  • Postpone until more information is available
  • Reject


8 of 25

UDP firefly proposed new schema attributes

  • Bytes transferred per flow
    • Add “usage”: {“received”: bytes, “sent”: bytes}
    • Proposal is to add this field as mandatory
  • Netlink information attribute
    • Add “netlink”: {list of key-value pairs from netlink}
    • Examples (from CS8): retransmits, rto, sndmss, rcvmss, rtt, pmtu, cwnd, cong-algo (~30 in total per flow); see M-Lab’s tcp-info for a list of possible attributes and an existing implementation
    • Content depends on the operating system/kernel version
    • Proposal is to add this field as optional (using a superset of key-value pairs from CS9)
  • Timestamps
    • Right now: start_time and end_time = current_time for start and end firefly packets
    • current_time thus only makes sense in continuation firefly packets
      • No implementation yet (neither flowd nor xrootd issue continuation packets)
    • Proposal is to make current_time optional (if continuation packets are not implemented)
      • This assumes continuation packets will be kept as optional
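
A minimal sketch of a firefly payload with the proposed attributes. The "usage" and "netlink" keys and the start_time/end_time/current_time names follow the proposals above; the "lifecycle" wrapper, the "state" values and the function itself are assumptions made for illustration and are not taken from the schema.

```python
import json
from datetime import datetime, timezone


def build_firefly(state, start_time, received, sent,
                  end_time=None, netlink=None, now=None):
    """Build a firefly payload dict (hypothetical layout)."""
    if state not in ("start", "ongoing", "end"):
        raise ValueError(f"unknown state: {state}")
    now = now or datetime.now(timezone.utc)

    lifecycle = {"state": state, "start_time": start_time.isoformat()}
    if state == "end":
        lifecycle["end_time"] = (end_time or now).isoformat()
    if state == "ongoing":
        # current_time is only emitted for continuation packets,
        # matching the proposal to make it optional otherwise.
        lifecycle["current_time"] = now.isoformat()

    payload = {
        "lifecycle": lifecycle,
        # Proposed mandatory attribute: bytes transferred per flow.
        "usage": {"received": received, "sent": sent},
    }
    if netlink:
        # Proposed optional attribute: key-value pairs from netlink.
        payload["netlink"] = dict(netlink)
    return payload


t0 = datetime(2022, 1, 1, tzinfo=timezone.utc)
start = build_firefly("start", t0, 0, 0, now=t0)
ongoing = build_firefly("ongoing", t0, 10, 20, netlink={"rtt": 1.5}, now=t0)
encoded = json.dumps(ongoing)
```

Leaving "netlink" out entirely when no data is available keeps the optional field truly optional for implementations that cannot read netlink.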


9 of 25

Science domain and activities defaults

Defaults for science domain and activity are needed in case the storage is unable to determine them (or they are not found in the registry). This likely needs to be split into different cases:

  • UDP firefly - neither of the two is available
  • UDP firefly - science domain available but not activity
  • Packet marking - neither of the two is available

Proposal is to create a default science domain and a default activity and assign them the value 0
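
The fallback to the default value 0 could look roughly like this. The registry layout used here (a dict keyed by domain name) and the example entries are hypothetical and chosen only to keep the sketch self-contained.

```python
DEFAULT_ID = 0  # proposed value for the default science domain/activity


def resolve_ids(registry, domain=None, activity=None):
    """Return (domain_id, activity_id), falling back to the defaults.

    registry maps a domain name to {"id": int, "activities": {name: id}}.
    This layout is an assumption of the sketch, not the registry schema.
    """
    entry = registry.get(domain) if domain is not None else None
    if entry is None:
        # Neither of the two is available (or the domain is unknown).
        return DEFAULT_ID, DEFAULT_ID
    # Domain known; the activity may be missing or unregistered.
    return entry["id"], entry["activities"].get(activity, DEFAULT_ID)


# Hypothetical registry content.
registry = {"domain-a": {"id": 2, "activities": {"production": 3}}}
```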


10 of 25

UDP firefly communication changes:

  • UDP firefly continuation packets interleave
    • To have intermediate/continuation fireflies we need to define their interleave period
    • However this is not simple - we could try to determine this from avg. flow durations once we have more data (proposal to postpone)
  • Discovery of external IP address
    • Currently we have different implementations, but no clear recommendation
    • This likely needs more time/experience (proposal to postpone)
  • UDP firefly dst port for packets sent to the destination of the original flow
    • Currently we’re using 10514. The main concern is that if some service on the target storage is listening on that port, firefly packets would be classified as DoS
    • Possible options are to change to 514 or try to use 0
  • Use anycast for the UDP firefly dst address
    • Not intended for firefly packets sent to the dst of the original flow
    • Only for firefly packets sent directly to a collector (as we have been using during testing)
    • This requires a more in-depth discussion on how we want to design/operate a network of collectors (proposal to postpone)
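
To make the two delivery paths concrete, here is a small sketch that sends one firefly to the destination of the original flow and, optionally, directly to a collector. The function names and the (host, port) collector tuple are hypothetical; only port 10514 comes from the text, and it is kept configurable because the port choice is still under discussion.

```python
import socket

FIREFLY_PORT = 10514  # port currently in use according to the discussion above


def firefly_targets(flow_dst, collector=None, port=FIREFLY_PORT):
    """List the (host, port) pairs a firefly should be sent to."""
    targets = [(flow_dst, port)]
    if collector is not None:
        # A directly addressed collector; this is where an anycast
        # address could be used if that proposal is adopted.
        targets.append(collector)
    return targets


def send_firefly(payload, targets):
    """Send the payload bytes as one UDP datagram per target."""
    sent = 0
    for host, port in targets:
        family = socket.AF_INET6 if ":" in host else socket.AF_INET
        with socket.socket(family, socket.SOCK_DGRAM) as sock:
            sock.sendto(payload, (host, port))
            sent += 1
    return sent
```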


11 of 25

Summary

Schema attributes:

  • Bytes transferred
  • Netlink
  • Timestamps

Science domain and activities defaults

UDP firefly communication changes:

  • Continuation packets
  • Discovery of ext. addresses
  • Dst port for fireflies sent to the destination of the original flow
  • Anycast


12 of 25

Meetings Plan

Technical meetings to be scheduled on the different architectural areas:

  • Flow Marking (UDP fireflies)
    • Updates based on feedback from XRootd development
    • Communication with collectors (anycast, ports)
    • Schema extensions, new attributes (bytes transferred, netlink information, etc.)
  • Registry
    • Science domains and activities registry (Google Sheet); JSON API at http://api.scitags.org/
  • Packet Marking
    • Based on feedback from XRootd development, we will re-discuss the current status and possible options
    • IPv6 flow label, IPv6 header extensions (HbH, Dst options), SRv6 potentially
    • Important aspect is potential deployment of CS9 Linux (with kernel 5.14)
  • Collectors
    • Software collectors (syslog) - capturing UDP fireflies only - new deployments
    • In-line/HW collectors - capturing packets and/or UDP fireflies
    • Collectors network and organisation
  • DDM/FTS extensions to support passing flow id to the storages
  • R&E analytics and feedback loop to DDM


13 of 25

Collectors intro

  • What we would like to have

  • What we have now


[Diagrams not reproduced. Labels in order of appearance: src, dst, site’s edge collector, R&E1 collector, R&E2 collector; src, dst, R&E collector. Legend: UDP fireflies; original flow (w/ packet marking). Note: collects UDP fireflies and/or packet markings.]

14 of 25

Questions, comments ?


Prototype code of the flow service (flowd) implementing UDP fireflies

Prototype testing as part of the WLCG Data Challenges effort in collaboration with ESnet

15 of 25

Backup slides


16 of 25

Concepts

  • Marking is based on two different approaches
    • Flow marking using UDP fireflies (works for both IPv4 and IPv6)
    • Packet marking using IPv6 flow label and/or header extensions
  • Both carry flow identifier, which at present is an encoded representation of experiment/science domain and activity
    • For UDP fireflies flow id can be extended with other fields in the future
    • For packet marking the space is restricted due to number of bits available in the headers
  • Experiments and activities need to be registered prior to their usage
    • Registry serves this purpose and ensures RENs and DDMs have consistent view
  • Designed to work with proxies, cached proxies and private networks
  • Generators, collectors, storage and analytics can evolve independently
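
To illustrate why the space for packet marking is restricted, the sketch below packs a science domain and an activity into the 20-bit IPv6 flow label. The bit layout (9 bits for the domain, 6 for the activity, 5 left over as entropy) is purely illustrative and is not claimed to be the layout from the technical specification.

```python
FLOW_LABEL_BITS = 20   # size of the IPv6 flow label field
DOMAIN_BITS = 9        # illustrative split, an assumption of this sketch
ACTIVITY_BITS = 6      # illustrative split, an assumption of this sketch
ENTROPY_BITS = FLOW_LABEL_BITS - DOMAIN_BITS - ACTIVITY_BITS


def pack_flow_label(domain_id, activity_id, entropy=0):
    """Encode the three fields into a single 20-bit integer."""
    for name, value, bits in (
        ("domain_id", domain_id, DOMAIN_BITS),
        ("activity_id", activity_id, ACTIVITY_BITS),
        ("entropy", entropy, ENTROPY_BITS),
    ):
        if not 0 <= value < (1 << bits):
            raise ValueError(f"{name} does not fit in {bits} bits")
    return (
        (domain_id << (ACTIVITY_BITS + ENTROPY_BITS))
        | (activity_id << ENTROPY_BITS)
        | entropy
    )


def unpack_flow_label(label):
    """Decode a 20-bit label back into (domain_id, activity_id, entropy)."""
    entropy = label & ((1 << ENTROPY_BITS) - 1)
    activity_id = (label >> ENTROPY_BITS) & ((1 << ACTIVITY_BITS) - 1)
    domain_id = label >> (ACTIVITY_BITS + ENTROPY_BITS)
    return domain_id, activity_id, entropy
```

With such a split, 9 bits allow 512 distinct domain values and 6 bits allow 64 activities, whereas a UDP firefly carries JSON and has no comparable limit.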


17 of 25

Technical Specification Updates

  • Content
    • Packet and Flow Marking Definitions
    • Flow Service
    • Flow Identifier Lifecycle
      • Provides overview of the expected functionality from each storage/transfer component
      • Proposes extension to Xroot and HTTP TPC protocols
    • Prototype Implementation Plan
  • Protocols updates
    • Xroot protocol extension with <scitag.flow> attribute to pass flow identifier as part of the URL
    • HTTP TPC protocol extension (passing flow identifier as part of the HTTP headers)
  • UDP firefly packet specification
    • Payload is a syslog message that conforms to RFC5424
      • Last part of the syslog message is a structured data specification (in JSON)
      • JSON schema for the structured data is also available
  • Flow registry specification
    • Maps experiments and activities to IDs
    • Draft JSON schema, which is already used in the API
    • https://www.scitags.org/api.json
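
A sketch of wrapping a JSON payload in an RFC 5424 syslog line, as the UDP firefly packet specification above describes. The application name, the MSGID value "firefly-json", the facility/severity and the choice to place the JSON after a nil STRUCTURED-DATA field are assumptions of this sketch; the specification itself is the authority on the exact layout.

```python
import json
from datetime import datetime, timezone


def format_firefly(payload, hostname, app_name="flowd",
                   timestamp=None, facility=16, severity=6):
    """Return an RFC 5424 formatted syslog message as bytes.

    timestamp must be timezone-aware so that isoformat() includes
    the UTC offset that RFC 5424 timestamps require.
    """
    pri = facility * 8 + severity  # local0.info gives 134
    ts = (timestamp or datetime.now(timezone.utc)).isoformat()
    body = json.dumps(payload, separators=(",", ":"), sort_keys=True)
    # Fields: <PRI>VERSION TIMESTAMP HOSTNAME APP-NAME PROCID MSGID SD MSG
    # "-" is the RFC 5424 nil value (used here for PROCID and SD).
    line = f"<{pri}>1 {ts} {hostname} {app_name} - firefly-json - {body}"
    return line.encode("utf-8")


message = format_firefly(
    {"a": 1},
    "host1",
    timestamp=datetime(2022, 1, 1, tzinfo=timezone.utc),
)
```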


18 of 25

Implementation

  • Flow service (flowd) - developed to help test and validate the approach
    • Provides reference implementation of the technical specification
    • Storage systems can either provide their own implementation or use flowd
    • Written in Python, runs as a Linux service (integrates with systemd/journal, supports CC8/C8/Docker)
  • Provides a pluggable system to test different flow/packet marking strategies.
    • Currently supports flow marking (UDP fireflies) via sampling plugin (netstat) or storage API
    • Sampling plugin using netlink instead of netstat is also in development
      • Can provide additional information per connection (TCP cong. algo, RTT/RTO, CWND, bytes sent/rcvd)
    • Possibility to combine storage API to mark start/end flow and sampling plugin to add additional information
      • This might be needed for storages that don’t have access to the underlying socket interface
  • XRoot 5.4.0 release
    • Full implementation of the UDP firefly spec (marks start and end of each flow)
    • UDP fireflies are sent to a dedicated endpoint
    • Supports different options to detect flow identifiers (both experiments and activities)
    • Connects to flow registry API
  • Initial implementation of packet marking in XRoot also exists but requires further testing


19 of 25

WLCG Data Challenge

  • Aim was to test and validate our approach in gradual steps. Our initial goals:
    • Test flow service deployment directly on the site’s storages (done)
    • Generate UDP fireflies based on real traffic (done)
    • Capture UDP packets (initially using a dedicated endpoint) (done)
    • Understand how UDP firefly information can be correlated with R&E netflow data (on-going)
  • Flow service (flowd) deployment
    • Currently deployed at AGLT2, BNL, KIT, UNL and Caltech
    • Runs directly on the storage nodes, uses netstat plugin
    • Generates UDP fireflies based on real traffic
  • ESnet has set up a dedicated collector to capture the UDP fireflies
    • Will attempt to correlate them with their netflow data
  • Results
    • Deployment, packet generation and collection worked fine
    • On-going - summary/results on the correlation with netflow
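
One plausible way to correlate fireflies with netflow records is to join them on the flow 5-tuple. The sketch below shows that idea only; it is not a description of the method used in the ESnet exercise, and all record field names are hypothetical.

```python
def flow_key(record):
    """Return the 5-tuple identifying a flow (hypothetical field names)."""
    return (
        record["src_ip"],
        record["src_port"],
        record["dst_ip"],
        record["dst_port"],
        record["protocol"],
    )


def correlate(fireflies, netflow):
    """Attach experiment/activity from fireflies to matching netflow records."""
    tags = {flow_key(f): f for f in fireflies}
    matched = []
    for rec in netflow:
        tag = tags.get(flow_key(rec))
        if tag is not None:
            matched.append(
                {**rec, "experiment": tag["experiment"], "activity": tag["activity"]}
            )
    return matched


# Hypothetical records using documentation address ranges.
flow = {"src_ip": "192.0.2.1", "src_port": 40000,
        "dst_ip": "198.51.100.2", "dst_port": 1094, "protocol": "tcp"}
fireflies = [{**flow, "experiment": 2, "activity": 3}]
netflow = [{**flow, "bytes": 1000},
           {**flow, "dst_port": 443, "bytes": 5}]
result = correlate(fireflies, netflow)
```

A real correlation would also have to deal with sampling, NAT and time windows, which this sketch ignores.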


20 of 25

Plans

  • Near-term objectives
    • Finalise validation and get feedback from ESnet correlation exercise
    • Extend testing to Xrootd using dedicated R&E collection endpoint(s) and partial-marking
      • Detect flow identifiers from storage path/url, activities from user role mapping
      • Test proxies, cached proxies, private networks (K8s)
    • Involve other storage systems (dCache, etc.); discuss possible design/implementation
    • Instrument Rucio/FTS to pass flow identifiers to the storages
  • Continue with the validation and testing using the existing deployment
    • Improve existing prototypes based on the feedback from the initial DC tests
  • Engage other R&Es and explore available technologies for collectors
    • Deploy additional collectors and perform R&D in the packet collectors
    • Improve existing data collection and analytics
  • Test and validate ways to propagate flow identifiers
    • Engage experiments and data management systems
    • Validate, test protocol extensions and FTS integration
    • Explore other possibilities for flow identifier propagation, e.g. tokens
  • R&D activities
    • Packet marking - further testing and validation is required for IPv6 flow label implementation.
    • Packet collectors - currently UDP fireflies are sent to dedicated collector(s). R&D is needed to understand how to run generic collectors (that would capture UDP fireflies from real traffic).


21 of 25

Packet Marking - IPv6

IPv6 header


Extension headers

For more details and discussion of various trade-offs please refer to the Packet Marking Document

22 of 25

IPv6 Ext. headers: Dst Option

The Destination Options header is used to carry optional information that need be examined only by a packet's destination node(s)

  • Allocated as one or more blocks of 8 octets; options are TLV encoded

Can be set/changed using standard socket interface (IPV6_DSTOPTS), but requires the options to be built first

  • This can be done using standard ancillary data functions

Reading options is performed via socket interface (IPV6_2292PKTOPTIONS)
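
A minimal sketch of building the options buffer before handing it to the socket interface. It assumes a hypothetical option type 0x1E (one of the values RFC 4727 sets aside for experimentation) and pads the header to a multiple of 8 octets with Pad1/PadN options as RFC 8200 requires. Passing the result to setsockopt with IPV6_DSTOPTS is deliberately left out, since that step depends on the platform and may need elevated privileges.

```python
PAD1 = 0x00  # one-octet padding option (RFC 8200)
PADN = 0x01  # multi-octet padding option (RFC 8200)


def build_dstopts(opt_type, opt_data):
    """Build a Destination Options header carrying a single TLV option.

    The Next Header octet is left as 0 in this sketch.
    """
    if not 0 <= opt_type <= 255:
        raise ValueError("option type must fit in one octet")
    if len(opt_data) > 255:
        raise ValueError("option data too long")

    # TLV encoding: type, length of the data, data.
    body = bytes([opt_type, len(opt_data)]) + opt_data

    # The header (2 fixed octets + options) must be a multiple of 8 octets.
    pad = (-(2 + len(body))) % 8
    if pad == 1:
        body += bytes([PAD1])
    elif pad > 1:
        body += bytes([PADN, pad - 2]) + bytes(pad - 2)

    # Hdr Ext Len counts 8-octet units, not including the first 8 octets.
    hdr_ext_len = (2 + len(body)) // 8 - 1
    return bytes([0, hdr_ext_len]) + body


buf = build_dstopts(0x1E, b"\x00\x41")  # hypothetical 2-byte marking value
```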



23 of 25

IPv6 Flow Label


24 of 25

Flow Label in Linux Kernel

  • Ways to implement:
    • Advanced socket interface
      • Native socket interface, uses kernel network subsystem directly
      • Comes with limitations due to the complexity of the network stack
    • eBPF (XDP, TC-BPF)
      • Sandbox programs running via JIT directly in Linux Kernel
    • Netfilter
      • Kernel module using netfilter subsystem/hooks
    • DPDK, VPP - vendor-specific technologies
    • Software switches (Open vSwitch) - requires OpenFlow
    • SmartNICs (via P4, etc.)
      • Requires dedicated HW, but can be very useful for analytics


25 of 25

Linux Flow Label Implementation Status


OS/Kernel  | Flow Label Socket Interface                                                                                 | Netfilter | TC-BPF
           | Flow UDP client/server | Flow TCP client | Flow TCP server | Remote flow read | Flow label change on client |           |
CC7 (3.10) | client only            | ok              | --              | --               | --                          | ok        | --
C8 (4.15)  | ok                     | ok              | ok              | ok               | --                          | ok        | ok
5.8        | ok                     | ok              | ok              | ok               | --                          | ok        | ok