1 of 31

The “Emerging” Edge

Edge Services, Edge Computing, Flexible Service Orchestration & Dynamic Infrastructure Overlays

Joe Breen

University of Utah

Snowmass Computational Frontier Workshop

Aug 10, 2020

3 of 31

Where does the science need to happen?

  • On the instrument?
  • At the workstation?
  • At the High Performance Computing Center?
  • In the Cloud?
  • At the “edge” of the network?

  • All of the above

4 of 31

The “Edge” is emerging as an area of rapid growth

  • Changes are happening in both the industry and science edge
    • These changes intertwine and supplement each other
  • New and rapidly evolving technologies are enabling different approaches
    • Wi-Fi 6 - more bandwidth, improved wireless scheduling
    • 5G - more bandwidth, lower latency, network isolation, private cellular infrastructure on CBRS bands
    • Network Function Virtualization - virtual networks, migration and scaling of virtual network functions
    • Containers (Docker, Singularity, Shifter) - ability to package one or more processes in self-contained environments
    • Container orchestration tools such as Kubernetes and Docker Compose - ability to deploy multiple processes together to create composable services
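
As an illustration of the container orchestration point above, here is a minimal sketch that builds a Kubernetes Deployment manifest as a plain Python dictionary and prints it as JSON. The service name, image reference, and port are hypothetical placeholders, not taken from the talk; only the `apps/v1` Deployment layout is standard Kubernetes.

```python
import json


def make_deployment(name: str, image: str, replicas: int = 2, port: int = 8080) -> dict:
    """Build a Kubernetes apps/v1 Deployment manifest as a plain dict."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # The selector must match the labels on the pod template.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


if __name__ == "__main__":
    # Hypothetical service name and image, for illustration only.
    manifest = make_deployment("edge-cache", "example.org/edge-cache:1.0", replicas=3)
    print(json.dumps(manifest, indent=2))
```

Saved to a file, output of this shape could be handed to `kubectl apply -f`, which accepts JSON as well as YAML.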

5 of 31

The “Edge” is emerging as an area of rapid growth

  • New workflows and workflow requirements
    • DevOps - developers are increasingly responsible for the complete software lifecycle, from development to deployment
    • Groups want to create mash-ups of software technologies to create workflows
    • Reproducible and repeatable science requirements - scientists want to package their workflows with all of their dependencies and move them between cloud, HPC, and local resources
  • Scaling workflows
    • LHC experiments
    • Plant Genomics
    • Increased number of sensors in IoT deployments

6 of 31

The “Edge” is emerging as an area of rapid growth

  • There are two distinct but closely coupled aspects of “Edge”
    • Edge Services
    • Edge Computing

7 of 31

Edge Services

  • What are Edge Services?
    • User-facing services that need to tie tightly to the local compute and storage resources
    • User-facing services that need to be as close to the end user as possible in order to maintain specific network and security characteristics
      • Really about maintaining the user experience
    • Services which might benefit from “multi-cloud” capabilities
    • Persistent user-managed application services
    • Services that strengthen a resiliency strategy: https://www.networkworld.com/article/3280232/living-on-the-edge-5-reasons-why-edge-services-are-critical-to-your-resiliency-strategy.html

8 of 31

Edge Services examples

  • Persistent services
    • Science Gateways
    • Web portals
    • Reverse proxy and local web caching engines
    • Workflow management systems for specific disciplines
  • Services which need to tie tightly to compute, network and storage
    • Jupyter, Matlab and Mathematica analysis notebooks and portals
    • Data Transfer services
    • Data caching
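
To make the data caching example concrete, the following is a minimal sketch of a read-through cache with least-recently-used eviction: requests are served locally when possible and fetched from a remote origin otherwise. The class name, the `fetch_from_origin` callback, and the file names are hypothetical; this illustrates the general idea only and is not how any particular caching service is implemented.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Tiny LRU read-through cache: serve locally when possible, else fetch from origin."""

    def __init__(self, fetch_from_origin: Callable[[str], bytes], capacity: int = 2):
        self._fetch = fetch_from_origin
        self._capacity = capacity
        self._items: "OrderedDict[str, bytes]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key in self._items:
            self.hits += 1
            self._items.move_to_end(key)  # mark as most recently used
            return self._items[key]
        self.misses += 1
        data = self._fetch(key)  # slow path: go back to the remote origin
        self._items[key] = data
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # evict the least recently used entry
        return data


if __name__ == "__main__":
    # Hypothetical origin store standing in for a remote data source.
    origin = {"run1.root": b"aaa", "run2.root": b"bbb", "run3.root": b"ccc"}
    cache = ReadThroughCache(origin.__getitem__, capacity=2)
    for name in ["run1.root", "run1.root", "run2.root", "run3.root", "run1.root"]:
        cache.get(name)
    print("hits:", cache.hits, "misses:", cache.misses)
```

The hit and miss counters show the trade-off an edge cache makes: repeated reads stay local, while a small capacity forces trips back to the origin.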

9 of 31

Industry Content Delivery Networks

  • Akamai
  • Google Global Cache (content)
  • Google Cloud CDN (service)
    • Hosting web and other content for entities
    • https://cloud.google.com/cdn
  • Netflix Open Connect

10 of 31

Science Examples of Content Delivery Networks

U.S. ATLAS is using SLATE to deploy XCache for XRootD-based data caching in the US and Europe.

Ilija Vukotic, UChicago

Edgar Fajardo, UCSD

The Open Science Grid operates a world-wide network of XCache-based caches, used by LIGO, Virgo, DUNE and other scientific communities.

11 of 31

Edge Computing

  • What is edge computing?
    • Compute, storage, and network resources that exist at the edge of the network close to the resources
    • Compute, storage, and network resources that exist at the edge of the network close to the user
      • Supports End User Experience requirements of specific network and security characteristics, i.e. low latency, minimum exposure to the internet, distributed availability and resiliency
    • Resources that support specific edge services (usually dedicated)
      • Services sometimes centrally managed by another entity
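
The low latency requirement above can be made concrete with a back-of-the-envelope bound on propagation delay. This is a minimal sketch, assuming a signal speed of roughly 2 x 10^8 m/s in optical fiber (about two thirds of the speed of light) and ignoring queuing, routing, and processing delays. The distances are hypothetical.

```python
# Approximate signal speed in optical fiber: about two thirds of c (assumption).
FIBER_SPEED_M_PER_S = 2.0e8


def min_round_trip_ms(distance_km: float) -> float:
    """Lower bound on round-trip time, in milliseconds, from propagation delay alone."""
    one_way_s = (distance_km * 1000.0) / FIBER_SPEED_M_PER_S
    return 2.0 * one_way_s * 1000.0


if __name__ == "__main__":
    # Hypothetical distances: an on-campus edge rack, a regional site, a distant cloud region.
    for label, km in [("campus edge", 1), ("regional site", 200), ("distant cloud region", 3000)]:
        print(f"{label:>22}: >= {min_round_trip_ms(km):.2f} ms round trip")
```

Real round-trip times are higher than this bound, but the bound alone shows why distance to the user or instrument matters for latency-sensitive services.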

12 of 31

Oak Ridge Leadership Computing Facility

  • Support persistent workflows that leverage local resources
    • Deploying Science Gateways
  • Support user/developer deployed workflow systems
  • Support Continuous Integration / Continuous Deployment

13 of 31

NERSC

  • Ability to “spin up” services on Rancher (a management system for Kubernetes)
  • Foster “Build, Ship, Run” within scope of NERSC HPC infrastructure
    • Create your own application stack in Docker containers
    • Bundle all dependencies within the containers
    • Mount local NERSC filesystems
  • Support long-term, constantly running services which require persistence and the ability to scale
    • Example applications:
      • Frontier Squid Cache
      • CVMFS
      • License servers
      • Science Gateways
      • Jupyter analysis portals
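
Kubernetes-style orchestrators keep long-running services at a requested scale by comparing a declared desired state with the observed state and acting on the difference. The following is a minimal sketch of that comparison step only; the `plan` function, the action tuples, and the service names are hypothetical and do not reflect how Rancher or Kubernetes are implemented internally.

```python
def plan(desired: dict, actual: dict) -> list:
    """Compute the actions needed to move `actual` replica counts to `desired`.

    Both arguments map a service name to a replica count. A service missing
    from a mapping is treated as having zero replicas.
    """
    actions = []
    for name in sorted(desired.keys() | actual.keys()):
        want = desired.get(name, 0)
        have = actual.get(name, 0)
        if want > have:
            actions.append(("start", name, want - have))
        elif want < have:
            actions.append(("stop", name, have - want))
    return actions


if __name__ == "__main__":
    # Hypothetical desired and observed states.
    desired = {"jupyter": 2, "squid": 1}
    actual = {"jupyter": 1, "cvmfs": 1}
    for action in plan(desired, actual):
        print(action)
```

Running such a comparison repeatedly is what lets a persistent service recover after a failure or scale when the declared replica count changes.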

14 of 31

Edge Services in the Cloud

  • AWS, Google and Azure all provide “Edge Services” in the Cloud
    • Support persistent user applications with access to cloud compute, network, and storage
    • Support containers and microservices, allowing packaging of dependencies
    • Support large scalability

15 of 31

Edge Computing near the end user or experiment

  • Industry example: 5G and autonomous vehicles
    • Central offices and cell towers are converting into micro-datacenters to house Edge compute
    • Edge Compute necessary to support real-time responses and sensor streaming for autonomous vehicles
  • Science Example: Edge computing at the remote experiment
    • Raspberry Pi with SenseHat on space station supporting primary and secondary education
      • https://astro-pi.org/

16 of 31

Rapid growth of Edge + Shifting paradigms of software delivery

  • Edge Services and Edge Computing are naturally distributed and larger in scale
    • Large number of software management points
  • Software delivery is moving towards Continuous Integration / Continuous Deployment
    • In the past, Netscape browser upgrades were local, manual upgrades
    • Today, Google, Mozilla and Apple continually release new versions and push them through central update mechanisms
  • Software deployment is moving more towards DevOps, where developers have more responsibility for the full software lifecycle
    • Developers are the experts on the software wherever it may land (cloud, edge, end-user machine) and can rapidly see and fix issues

17 of 31

Rapid growth of Edge + Shifting paradigm of software delivery

  • Software deployment is happening on hardware owned by others
    • At home, you may own a Nest Thermostat, Ring doorbell, Google Home device or Amazon Alexa device, but do you handle the software updates?
    • AT&T plans to host software from multiple vendors in their micro data centers while providing isolated 5G networks for the deployments
  • Security requirements are driving the curation of software and deployment of only known validated updates and services
    • Microsoft Azure Sphere now provides hardware, OS and cloud components for deploying IoT
    • SLATE CI provides a curated catalog of known, vetted science software to deploy
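
One way to picture deployment of only known validated updates is a catalog that maps each approved package and version to a cryptographic digest, checked before anything is deployed. The sketch below assumes a simple in-memory catalog keyed by (name, version). The function names and package contents are hypothetical and are not taken from SLATE or Azure Sphere.

```python
import hashlib


def sha256_hex(payload: bytes) -> str:
    """Return the SHA-256 digest of the payload as a hex string."""
    return hashlib.sha256(payload).hexdigest()


def is_approved(name: str, version: str, payload: bytes, catalog: dict) -> bool:
    """Return True only if (name, version) is in the catalog and the digest matches."""
    expected = catalog.get((name, version))
    return expected is not None and expected == sha256_hex(payload)


if __name__ == "__main__":
    # Hypothetical vetted package contents recorded by a curator.
    vetted = b"vetted package contents"
    catalog = {("xcache", "1.0"): sha256_hex(vetted)}

    print(is_approved("xcache", "1.0", vetted, catalog))       # matches the catalog entry
    print(is_approved("xcache", "1.0", b"tampered", catalog))   # digest differs
    print(is_approved("unknown-app", "0.1", vetted, catalog))   # not in the catalog
```

A production curation workflow would add signatures and review steps; the digest check shows only the final gate.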

18 of 31

Supporting multiple tenants at the Edge

  • Industry has financial contracts, legal recourse and proprietary rules of use
    • Each entity has terms and conditions by which to abide
    • The owner of the underlying hardware and infrastructure software can charge the entity providing the services -- Telco, Cloud or Edge provider
    • Each entity operates their own layer or portion of the edge.
  • Science has collaboration, federation models and the need for transparency
    • Science still has security requirements and policies to adhere to
    • Social and security standards need more development for allowing third party deployment of software at edge sites
      • WLCG security working group exploring this topic
      • SCIv2 framework

19 of 31

Examples of Federated Operations: Gateways

Harvester

U.S. ATLAS and U.S. CMS are both investigating or developing SLATE-based solutions for sending workloads to the Frontera supercomputer at TACC.

SLATE @ TACC

OSG Compute Element

20 of 31

Examples of Federated Operations: Content Delivery Networks

U.S. ATLAS is using SLATE to deploy XCache for XRootD-based data caching in the US and Europe.

Ilija Vukotic, UChicago

Edgar Fajardo, UCSD

The Open Science Grid operates a world-wide network of XCache-based caches, used by LIGO, Virgo, DUNE and other scientific communities.

21 of 31

Tying the Edge, Service delivery and support of multiple tenants together

  • Allow deployment of Edge Services and Edge Computing for multiple tenants
    • Ability to deploy in the Science DMZ next to the HPC resources
    • Ability to deploy closer to the user and the instrument
    • Ability to deploy in cloud next to cloud resources
  • Allow deployment of known services in a continuous and secure fashion
  • Allow deployment of federated services
  • Allow federated operations of services
  • Allow deployment of persistent services

22 of 31

Federated Operations: Your Services

SLATE is designed to allow you to build your own multi-institutional scientific computing collaborations and deploy your services around the world in minutes.

For Developers: Design new services, deploy, rapidly iterate.

For Sites: Allow trusted entities to deploy applications, but retain control over who and what.

New Services

Award #: 1724821
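
The site-side control described on this slide, where trusted entities deploy applications while the site decides who and what, can be pictured as a small allow-list. This is a minimal sketch with a hypothetical `SitePolicy` class and made-up group and application names. It is not SLATE's actual interface.

```python
from dataclasses import dataclass, field


@dataclass
class SitePolicy:
    """Hypothetical site-side policy: which groups may deploy which applications."""

    # Maps a group name to the set of application names it may deploy.
    allowed: dict = field(default_factory=dict)

    def grant(self, group: str, app: str) -> None:
        """Allow `group` to deploy `app` at this site."""
        self.allowed.setdefault(group, set()).add(app)

    def revoke_group(self, group: str) -> None:
        """Remove every permission held by `group`."""
        self.allowed.pop(group, None)

    def may_deploy(self, group: str, app: str) -> bool:
        """Return True only if the site has granted `group` permission for `app`."""
        return app in self.allowed.get(group, set())


if __name__ == "__main__":
    policy = SitePolicy()
    policy.grant("atlas-ops", "xcache")
    print(policy.may_deploy("atlas-ops", "xcache"))   # granted above
    print(policy.may_deploy("atlas-ops", "jupyter"))  # app never granted
    policy.revoke_group("atlas-ops")
    print(policy.may_deploy("atlas-ops", "xcache"))   # group revoked
```

The point of the design is that the site owner, not the deploying group, holds the policy and can withdraw access at any time.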

23 of 31

Application deployment that is:

  • Rapid
  • Consistent
  • Secure

Built with Security baked in

  • Trust relationships and responsibilities
  • Security Policy and Posture Definitions
  • Application Curation Workflow

Software Catalogs

  • Distributed
  • Scalable
  • Manageable

Award #: 1724821

24 of 31

Options for developing/testing Edge Services with at-scale workflows - National Testbeds

  • GENI - distributed compute racks
    • https://www.geni.net
    • Utilizes Internet2 OESS orchestration software for WAN orchestration
  • Cloudlab - Cloud testbed and workflows at scale
    • https://cloudlab.us
    • Utilizes Internet2 OESS orchestration software for WAN orchestration
    • Utah/Wisconsin/Clemson and GENI resources
  • Chameleon - Cloud testbed and workflows at scale
    • https://www.chameleoncloud.org
    • Utilizes Internet2 OESS orchestration software for WAN orchestration
    • Computation Institute (Chicago)/iCAIR/TACC/RENCI and GENI resources

25 of 31

National Testbeds

  • POWDER - 5G/Wireless testbed
    • https://www.powderwireless.net
    • Develop wireless protocols, test wireless IoT, workflow and edge service development/deployment in terms of wireless technologies
  • COSMOS - 5G/Wireless testbed
    • https://www.cosmos-lab.org/
    • Develop wireless protocols, test wireless IoT, workflow and edge service development/deployment in terms of wireless technologies
  • AERPAW - Aerial wireless experiment platform
    • https://aerpaw.org
    • Develop 5G/wireless protocols integrated with drones and other mobile devices in order to improve coverage, signal and location data on fully mobile platforms

26 of 31

FABRIC

27 of 31

FABRIC - tying Testbeds and CI infrastructure together

  • Interconnects testbeds and computation facilities to support long-running, at-scale workflow experiments (NOT production domain science)
    • Support research in Cyberinfrastructure and Security
  • Programmable core network infrastructure
  • Programmable core racks
  • Programmable edge racks
  • Guaranteed quality of service
  • Highly accurate timestamping
  • Support of bare metal/VMs/Containers
  • FPGA and SmartNIC support
  • Full visibility at each layer and through the entire network

https://fabric-testbed.net

28 of 31

Summary

  • Science is happening in multiple locations -- the Edge is the current emerging growth area
  • Edge Services/Compute are either close to the resources or close to the user/experiment
  • Software paradigms are shifting to Continuous Integration/Continuous Deployment, with developers moving to a full-lifecycle DevOps support model
  • A Federated Operations model supports multi-tenant services on 3rd party edge hardware in the science domains
  • Security policies and models need to continue to develop to support the new operations models
  • FABRIC will tie national testbeds and national compute together for development of long-running and at-scale workflow scenarios

29 of 31

thanks

30 of 31

  • Explain the paradigm shift of who operates the services vs who owns the hardware. Acknowledge the social challenges.
  • Give examples of network caches in the network backbone, and other services
  • Workflow management / Facility-apis (for all compute and storage not just for ‘analysis’ or edge-services)
  • K8s & Helm - Declarative infrastructure
  • Expanse as an example of joint Slurm & K8S facility including cloud scale out
  • Plethora of other tools/approaches - Dask; Ray; Lambda step-functions ..

31 of 31

  • Where does the science need to happen?
  • Edge services
    • What are edge services
    • Example: OLCF Slate
    • Example: NERSC SPIN
    • Example: Edge services in cloud -- whole AWS, Google and Azure premise
  • Edge computing
    • What is edge computing
      • Example: CDNs
      • Example: 5G and automotive
      • Example: Physics with sensors -- staging compute/storage or latency sensitive
  • Deliver software on other people’s hardware?
    • Every group might have edge services but do they also have to deploy edge compute AND keep the software up to date?
    • Need a change in mindset (or do we just need to expand?) and need to address security
    • Example: Netscape manual upgrades vs Google, Mozilla, Apple constant delivery and updates -- continuous deployment / continuous integration
    • Example: Nest thermostat or Ring doorbell or Google Home or Alexa
    • Example: Microsoft and Ford?
    • Example: Microsoft IoT - curated/hardened software
    • Example of software on Mars Rover or space station with Raspberry Pi/SenseHat - curated software
  • SLATE - combining these thoughts
    • TACC example - Edge services
    • Edge computing -- target small schools or in the field where IoT is
    • Software delivery from curated platform
    • Working with WLCG working group on security policies
  • Designing the full ecosystem with full visibility - what are the next advances?
    • GENI, Chameleon, Cloudlab Testbeds - Create workflows, orchestrate movement of compute
      • Slices between compute blocks but limited programmability and visibility in middle
    • 5G - POWDER - sliced network, isolated services
    • FABRIC - programmability and visibility at all layers - edge services, edge computing, cloud computing, and fully programmable/visible middle
      • Developing with technologies that came from GENI, Chameleon and Cloudlab
