1 of 58

Massively Distributed Use Cases

Draft for discussion

OpenStack FEMDC

2 of 58

Different Kinds of use cases

  • Deployment scenarios
    • These concern the architecture of a Massively Distributed Fog/Edge cloud.
    • Many have been proposed in presentations at the OpenStack Summit or in prior work on Fog or MEC.
  • Service scenarios
    • What service is being provided (video streaming, AR, …)?
  • Edge node component list
  • Test use cases
    • Specific events that could happen on a FEMD cloud
    • Chosen because they may stress the control plane.

3 of 58

Deployment Scenarios

(Compute hosts with lower overhead, e.g. FPGA, DSP, etc.; 10,000+ locations; zero-touch provisioning)

4 of 58

Service Scenarios:

Fog/Edge Massively Distributed

  • Mobile use cases from MEC call for edge node deployment near the base station (ref: ETSI GS MEC-IEG 004 v1.1.1 (2015-11)) or at the customer premises

5 of 58

Uplink Limited Service Scenario

  • At the fog/edge, many IoT use cases are uplink heavy, so there is no real need to bring the data all the way up to the cloud.

6 of 58

What does it mean

  • Tens of thousands or more edge nodes in a single organization (like a telco).
    • Zero-touch provisioning
  • Characteristics of an edge node
    • Compute:
      • Edge nodes may be as small as a single x86 server but may also be as large as a few racks
      • The compute hosts will usually be low in overhead and will include non-Linux types such as FPGAs, DSPs, etc.
    • Storage: Edge nodes have limited storage capability
    • Networking: Connectivity back to central nodes and the cloud is
      • Expensive
      • High latency
    • Location: Proximity is relevant and not all nodes are the same (e.g. uCPE)
    • Either a regional or a thin control plane (for the edge node)

7 of 58

Edge Node Components list Use Cases:

Three proposed Use cases for Edge node components:

1- Edge Nodes as Regions

2- Edge Nodes as Cells

3- Edge Nodes as plain compute nodes

8 of 58

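Option 1: Edge Nodes as OpenStack Regions

[Diagram: a Central site (Region1) and an Edge site (Region2). The grouping below is inferred from the order of the extracted labels.
  • Central (Region1): Keystone, Horizon, Nova, Glance, Cinder, Swift, Network, Ceilometer, Heat, DB, Compute, Storage, RabbitMQ
  • Edge (Region2): Nova, Glance, Network, DB, Compute, Storage, Ceilometer, RabbitMQ]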

9 of 58

Option 2: Edge Nodes as OpenStack Cells

[Diagram: a Central site and two Edge cells (Cell 1, Cell 2). The grouping below is inferred from the order of the extracted labels.
  • Central: Neutron-server, Neutron-ml2, Neutron-linuxbr, Neutron-dhcp, Neutron-metadata, DB, RabbitMQ, Keystone, Horizon, Glance, Cinder, Ceilometer, Heat, Storage, nova-api
  • Each cell (Cell 1, Cell 2): nova-compute, Libvirt (KVM/QEMU), DB, RabbitMQ, Ceilometer-Compute, Neutron-ml2 (?), Neutron-linuxbr (?), nova-conductor, nova-console, nova-novncproxy, nova-scheduler, nova-placement-api]

10 of 58

Option 3: Edge Nodes as Remote Compute Nodes

[Diagram: a Central site and two Edge compute nodes (Compute 1, Compute 2). The grouping below is inferred from the order of the extracted labels.
  • Central: Nova-API, Neutron-API, Neutron-ML2, Neutron-linuxbr, Neutron-L3, Neutron-DHCP, Rabbit-MQ, Horizon, Keystone, Glance, DB
  • Each compute node (Compute 1, Compute 2): Nova-Compute, Neutron-ML2, Neutron-LBR, Libvirt (KVM/QEMU)]

11 of 58

Networking in Edge Node

The performance of a specific use case will depend on how networking is implemented in the edge node.

1- Do we use Neutron or something else (e.g. Tricircle)?

2- Is networking in the edge node done at L2 or at L3?

3- Is there a need for encryption (between edge and central nodes)? Is there a performance impact (e.g. MTU)?

4- How is the control plane separated from the user plane (L2 or L3, VLAN, how Neutron has been configured)?

5- Network partitioning (resilience when connectivity to the central node is lost)

Let’s not underestimate the importance and complexity of the networking implementation when we define use cases.
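The MTU question in item 3 can be estimated with simple arithmetic: every encapsulation or encryption layer between edge and central nodes takes bytes away from the tenant packet. A minimal sketch, assuming illustrative overheads (50 bytes for VXLAN over IPv4, and a hypothetical 73-byte worst case for IPsec ESP, whose real value depends on cipher and mode); the function names are made up for this example.

```python
# Illustrative per-packet overheads in bytes. These are assumptions for the
# sketch; measure the real values for the chosen tunnel and cipher.
OVERHEAD_BYTES = {
    "vxlan_ipv4": 50,  # outer IPv4 + UDP + VXLAN + inner Ethernet header
    "ipsec_esp": 73,   # hypothetical worst case; depends on cipher and mode
}


def effective_mtu(link_mtu, layers):
    """Return the largest inner packet that fits without fragmentation."""
    remaining = link_mtu - sum(OVERHEAD_BYTES[name] for name in layers)
    if remaining <= 0:
        raise ValueError("overheads exceed the link MTU")
    return remaining


def payload_efficiency(link_mtu, layers):
    """Fraction of each full-size packet left for the tenant payload."""
    return effective_mtu(link_mtu, layers) / link_mtu
```

Under these assumptions, a 1500-byte backhaul link leaves 1450 bytes with VXLAN alone and 1377 bytes once the assumed ESP overhead is added, so tenant MTUs need to be lowered accordingly or the backhaul MTU raised.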

12 of 58

User Plane Considerations

13 of 58

Performance Considerations on User Plane

  • East-West Traffic
  • Local Caching
  • Local Breakout
  • Are all nodes the same?
  • Backhaul Capacity
  • Backhaul Latency
  • Security
  • High Availability and Failure Recovery
  • Relation to Slicing (for wireless use cases: are there QoS implications in relation to network slices in 5G?)
  • Storage (Is Swift the right solution?)

14 of 58

Control Plane: Latency and Bandwidth Impact?

15 of 58

Control Plane Use cases

  1. Commissioning of a new edge node (updating inventories ..)
  2. Decommissioning of an edge node
  3. Link Failure and Recovery (many nodes losing connection and then coming back)
  4. Disaster Recovery (many nodes going out of commission and then coming back)
  5. SW upgrade on Edge Node (e.g. new OS version)
  6. Instantiating a complex virtual function composed of several VMs (e.g. a chain) on the edge node
  7. Migrating a function from an edge node to another nearby
  8. Database Synchronisation (if a local DB on the edge needs to synchronize with the central node or with a redundant node)
  9. Telemetry (depending on the frequency of telemetry measurements, the amount of data can become overwhelming for an edge node or for the transport)
  10. Autoscaling on Edge Node
  11. VM Snapshot on Edge Node
  12. Dual homing of an edge node on two core nodes (e.g. can there be two central nodes, for redundancy reasons or because two administrative domains share the edge node?)
  13. Storage on Edge Node

33 of 58

1. Commissioning of Edge Node (Draft)

  • Control plane needs to be reconfigured for the new edge node:
    • Impact on Communications:
      • New API endpoints being created
      • New AMQP queues
    • Impact on Compute, Storage and Networking and their respective db
      • Is the db centralized or at the edge?
    • Impact on orchestration?
      • Is it just a matter of updating db?
    • New Telemetry streams
  • Questions:
    • What is the impact in each area?
    • What is the call flow?

34 of 58

2. Decommissioning of Edge Node (Draft)

  • Control plane needs to be reconfigured to remove the edge node:
    • Impact on Communications:
      • API endpoints to be removed
      • AMQP queues to be deleted
    • Impact on Compute, Storage and Networking and their respective db
      • Is the db centralized or at the edge?
    • Impact on orchestration?
      • Is it just a matter of updating db?
  • Questions:
    • If cleanup is not done, what would the impact be?
    • What is the call flow?

35 of 58

3. Link Failure and Recovery, i.e. many nodes losing connection and coming back (Draft)

  • Control plane needs to recover from failure:
    • Impact on Communications:
      • Multiple failed transactions on APIs and on AMQP
    • Impact on Compute, Storage and Networking and their respective db
      • Is there a chance for db corruption?
    • Impact on orchestration

Questions:

    • Which databases keep track of node availability (Nova, Neutron, Cinder)?
    • Is there a heartbeat or other keepalive mechanism, or would transactions just fail?
    • What is the call flow when nodes come back?
    • If many nodes come back at once, is there a risk of flood?
    • Would the isolated nodes be able to keep providing some level of service? If yes, how are things resynchronized?
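On the flood question above: one common mitigation is for each returning node to wait a randomized, exponentially growing delay before it reconnects. A minimal sketch, assuming a hypothetical `reconnect_delay` helper running on the edge node; it is not an existing OpenStack mechanism, and `base` and `cap` are illustrative values.

```python
import random


def reconnect_delay(attempt, base=1.0, cap=300.0, rng=random):
    """Full-jitter exponential backoff.

    Returns a delay in seconds drawn uniformly from
    [0, min(cap, base * 2**attempt)], so nodes that lost the link at the
    same moment do not all reconnect at the same moment.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    upper = min(cap, base * (2 ** attempt))
    return rng.uniform(0.0, upper)
```

With `base=1.0`, a node on attempt 3 waits somewhere between 0 and 8 seconds, and no node ever waits longer than `cap`. Whether the central control plane can absorb the resulting reconnection rate still has to be measured.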

36 of 58

4. Disaster Recovery, i.e. many nodes going out of commission and coming back (Draft)

  • Control plane needs to recover from failure:
    • Impact on Communications:
      • Multiple failed transactions on APIs and on AMQP
      • Would recovery re-create same endpoints and queues (or new ones)?
    • Impact on Compute, Storage and Networking and their respective db
    • Impact on orchestration

Questions:

    • Which databases keep track of node availability (Nova, Neutron, Cinder)?
    • Is there a heartbeat or other keepalive mechanism, or would transactions just fail?
    • What is the call flow when nodes come back?
    • If many nodes come back at once, is there a risk of flood?
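The heartbeat question above can be made concrete with a timeout-based liveness tracker on the central side. Nova services, for example, report their status periodically (see the `report_interval` and `service_down_time` options); the sketch below is a generic illustration of that idea, not Nova's implementation, and `LivenessTracker` with its 60-second timeout is a hypothetical name and value.

```python
import time


class LivenessTracker:
    """Marks a node as down when no heartbeat arrived within timeout_s."""

    def __init__(self, timeout_s=60.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock          # injectable so the logic can be tested
        self.last_seen = {}         # node id -> time of last heartbeat

    def heartbeat(self, node_id):
        """Record that node_id reported in just now."""
        self.last_seen[node_id] = self.clock()

    def is_up(self, node_id):
        """True if the node is known and its last heartbeat is recent enough."""
        seen = self.last_seen.get(node_id)
        return seen is not None and (self.clock() - seen) <= self.timeout_s

    def down_nodes(self):
        """Known nodes whose heartbeat is overdue, in sorted order."""
        return sorted(n for n in self.last_seen if not self.is_up(n))
```

The timeout trades detection speed against false alarms: over a high-latency or lossy backhaul, a short timeout would flag healthy edge nodes as down.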

37 of 58

5. SW Upgrade on Edge Node (Draft)

  • ?

38 of 58

6. Instantiating a complex VF composed of several VMs (e.g. a chain) on the Edge Node (Draft)

  • A series of Nova and Neutron operations needs to be performed at once:
    • Impact on Communications: pretty much business as usual (BAU)
    • Impact on Compute, Storage and Networking and their respective db: BAU
    • Impact on orchestration
      • Need a special recovery mechanism if instantiation succeeds for some VMs but fails for others

Questions:

    • Need some sort of affinity rule to ensure all VMs are on the same node. Does that exist in OpenStack?
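On the recovery mechanism mentioned above: the simplest policy is all-or-nothing, where the first failed VM triggers deletion of the VMs already created for the chain. A minimal sketch, assuming hypothetical `create_vm` and `delete_vm` callables supplied by the orchestrator (they stand in for whatever Nova and Neutron calls are actually used). On the affinity question, Nova server groups do offer `affinity` and `anti-affinity` policies at the compute host level; whether that maps onto an "edge node" depends on which deployment option is chosen.

```python
def instantiate_chain(vm_specs, create_vm, delete_vm):
    """Create every VM in vm_specs, or none of them.

    create_vm(spec) must return an identifier for the new VM and raise on
    failure; delete_vm(vm_id) removes a VM. Both are hypothetical hooks.
    """
    created = []
    try:
        for spec in vm_specs:
            created.append(create_vm(spec))
    except Exception:
        # Undo in reverse order so later VMs are removed before earlier ones.
        for vm_id in reversed(created):
            delete_vm(vm_id)
        raise
    return created
```

The sketch does not handle a failing `delete_vm`; a real orchestrator would need to record such leftovers for later cleanup, which matters when the link to the edge node is unreliable.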

39 of 58

7. Migrating a function from an edge node to another nearby (Draft)

  • Control plane Impact:
    • Impact on Communications: BAU
    • Impact on Compute, Storage and Networking and their respective db: BAU
    • Impact on orchestration

Questions:

    • How to identify edge nodes that are nearby?
    • What does the call flow look like?
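One way to approach the "nearby" question above is to record coordinates for each edge node and rank candidates by great-circle distance. A minimal sketch, assuming a hypothetical inventory that maps node names to (latitude, longitude) in degrees; geographic distance is only a proxy, and measured network latency may be the better metric.

```python
import math

EARTH_RADIUS_KM = 6371.0


def distance_km(a, b):
    """Great-circle (haversine) distance between two (lat, lon) points in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def nearby_nodes(source, nodes, max_km):
    """Names of nodes within max_km of source, nearest first, excluding source.

    nodes is a dict: node name -> (lat, lon).
    """
    origin = nodes[source]
    ranked = sorted(
        (distance_km(origin, pos), name)
        for name, pos in nodes.items()
        if name != source
    )
    return [name for dist, name in ranked if dist <= max_km]
```

A migration target would then be picked from this list, subject to capacity and to the node types noted earlier (not all nodes are the same).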

40 of 58

8. DB synchronisation (if a local DB on the edge needs to synchronize with the central node or with a redundant node) (Draft)

  • Control plane needs to recover from failure:
    • Impact on Communications:
      • What is the transport mechanism between databases?
      • Is traffic bursty?
    • Impact on Compute, Storage and Networking and their respective db
    • Impact on orchestration

Questions:

    • ?

41 of 58

9. Telemetry (Draft)

  • Control plane needs to handle the volume of data:
    • Impact on Communications:
      • Most data is transported by AMQP. Can it be kept local to the edge node?
    • Impact on Telemetry db
      • If database is local, it will be size limited.
      • If database is central, it might be transport limited.

Questions:

    • What are the current challenges with Telemetry implementation?
    • In recent releases, there may be separate databases for Gnocchi, Aodh and Panko. Would all of them need to be at the edge?
    • How would rules set on a central node apply at the edge? How about rules that require information from multiple edge nodes?
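The local-versus-central trade-off above can be sized with back-of-the-envelope arithmetic. A minimal sketch; every input (node count, meters per node, bytes per sample, sampling interval) is a hypothetical figure to be replaced with measured values.

```python
def telemetry_rate_bps(nodes, meters_per_node, sample_bytes, interval_s):
    """Aggregate telemetry rate, in bits per second, arriving at a central site."""
    samples_per_s = nodes * meters_per_node / interval_s
    return samples_per_s * sample_bytes * 8


def daily_volume_gb(nodes, meters_per_node, sample_bytes, interval_s):
    """Aggregate telemetry volume per day, in gigabytes (10**9 bytes)."""
    bytes_per_s = telemetry_rate_bps(
        nodes, meters_per_node, sample_bytes, interval_s) / 8
    return bytes_per_s * 86400 / 1e9
```

For example, 10,000 edge nodes each reporting 200 meters of 500 bytes every 10 seconds would send 800 Mbit/s to a central database, or 8,640 GB per day. Doubling the interval halves both figures, which is why the sampling frequency is the first parameter to examine.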

42 of 58

10. Autoscaling on Edge Node (Draft)

  • Control plane needs to recover from failure:
    • Impact on Communications:
      • Multiple failed transactions on APIs and on AMQP
      • Would recovery re-create same endpoints and queues (or new ones)?
    • Impact on Telemetry
    • Impact on orchestration

Questions:

    • Would orchestration be done centrally or at the edge?
    • Would telemetry be done centrally or at the edge?
    • Can a VM be instantiated on a nearby edge node?

43 of 58

11. VM Snapshot (Draft)

  • Control plane needs to recover from failure:
    • Impact on Communications:
      • Impact would be on the Glance API. What if Glance is in a central node?
    • Impact on Compute, Storage and Networking and their respective db
    • Impact on orchestration

Questions:

    • Other than the expected duration of the process, are there impacts to take into account?
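The expected duration mentioned above is dominated by moving the snapshot over the backhaul when Glance sits in a central node. A minimal sketch of that estimate; the image size, the link speed and the `share` of the link available to the upload are hypothetical inputs, and protocol overhead is ignored.

```python
def transfer_time_s(image_gb, backhaul_mbps, share=1.0):
    """Seconds needed to move an image of image_gb gigabytes (10**9 bytes).

    backhaul_mbps is the link speed in Mbit/s; share is the fraction of the
    link the upload may use (0 < share <= 1).
    """
    if not 0 < share <= 1:
        raise ValueError("share must be in (0, 1]")
    usable_bps = backhaul_mbps * 1e6 * share
    return image_gb * 8e9 / usable_bps
```

Under these assumptions, a 20 GB snapshot over a 100 Mbit/s backhaul takes 1,600 seconds, and twice that if only half the link may be used, during which user-plane traffic on the same link competes with the upload.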

44 of 58

12. Dual Homing (Draft)

  • ?

45 of 58

13. Storage on Edge Node (Draft)

  • ?
