Content will be constantly updated at https://docs.google.com/presentation/d/1UQWeAMIJgJsWw-cyz9R7NvcAuSWUnKvaZFXLfRAQ6fI/
The uplink challenge in centralized cloud
[Diagram: two central data centers serving all users from one place]
Long paths and limited uplink bandwidth give a bad uplink experience in a centralized cloud.
The downlink is a little better thanks to CDN and other re-distribution services.
Massively distributed small edge clouds:
VNF* / App / Storage placement close to the end user for a better user experience
[Diagram: three edge data centers in front of two central data centers]
Thousands of edge data center clouds place VNF/App/Storage close to the end user for a better user experience, for example video processing as soon as it is uploaded/streamed by the user to the nearby cloud, and better personalized networking capabilities.
VNF: virtualized network function, a telecom application running in the cloud
Massively distributed small edge clouds:
Bandwidth-sensitive, heavy-load applications such as CAD modeling and video editing/uplink streaming ask for a cloud close to the end user
[Diagram: three edge data centers in front of two central data centers]
Enterprises asked for a cloud close to the end user in the production cloud, because heavy-load applications like CAD modeling and video editing are very bandwidth sensitive. An enterprise often has multiple branches in different locations that need to collaborate, for example video editing collaboration across branches, so the edge clouds serving the enterprise also need to be distributed.
VNF / App / Storage movement/distribution on demand among edge clouds
for better user experience
[Diagram: three edge data centers and two central data centers]
VNF / App / Storage can be moved/distributed on demand among edge clouds for the best personalized user experience in computation/storage/networking.
Distributed VNF/APP for better reliability, availability and user experience
[Diagram: three edge data centers]
VNF / App / Storage are distributed into multiple edge data centers for better reliability, availability and user experience.
[Diagram: vEPC distributed into multiple data centers, each connected to the Internet]
Service function chaining across sites for flexible service logic
[Diagram: three edge data centers and two central data centers]
Flexible service logic by dynamically chaining apps across data centers
[Diagram: edge data centers connected to the Internet, with one OpenStack at Site1, Site2 and Site3]
Why not just put one OpenStack in each site for distributed edge cloud?
[Diagram: one OpenStack instance at each of Site1, Site2 and Site3]
Tenant-level L2/L3 networking and its automation are needed for tenant E-W traffic isolation.
If a tenant has resources distributed across multiple sites, for example one company's branches, inter-connection and isolation between them are needed.
[Diagram: VM1 and VM2 behind routers in OpenStack @Site1 and OpenStack @Site2, inter-connected at the tenant level]
Can we make one OpenStack distributed into multiple sites?
The questions are why we want to do this, and how many sites.
[Diagram: VM1 and VM2 behind routers at Site1 and Site2, managed by one OpenStack]
The benefits of using one OpenStack to manage multiple sites:
[Diagram: one OpenStack managing Site1, Site2 and Site3]
Challenges in using one OpenStack instance to manage massive multi-site deployments:
[Diagram: one OpenStack spanning Site1, Site2 and Site3, with its API, Scheduler, DB and MessageBus stretched across all three sites]
Massively distributed edge clouds bring new requirements
Amazon Region, AZ (1)
(Source: http://www.slideshare.net/AmazonWebServices/spot301-aws-innovation-at-scale-aws-reinvent-2014)
Amazon AZ capacity
(Source: http://www.slideshare.net/AmazonWebServices/spot301-aws-innovation-at-scale-aws-reinvent-2014)
Challenges in capacity expansion for one OpenStack in a public cloud
[Diagram: one OpenStack with API server, compute nodes, message bus, DB and network node]
Sizing is a headache during capacity expansion in a production public cloud:
you have to estimate, calculate, monitor, simulate, test and do online grey expansion for controller nodes and network nodes whenever you add new machines to the cloud. That is too much work when expanding to 50,000 compute nodes.
Number of Nova-API servers…
Number of Cinder-API servers…
Number of Neutron-API servers…
Number of schedulers…
Number of conductors…
Specification of physical servers…
Specification of physical switches…
Size of storage for images…
Size of management-plane bandwidth…
Size of data-plane bandwidth…
Reservation of rack space…
Reservation of networking slots…
…
1000 (compute nodes) -> 2000 -> 50000
Capacity expansion should be controllable and modularized in a public cloud.
You can't test every size; sometimes you don't even have enough resources to test,
but you can add already-tested and verified building blocks for capacity expansion.
(Experience from a production public cloud.)
[Diagram: capacity is expanded by adding building blocks, each a full OpenStack instance with API server, compute nodes, message bus, network node and DB]
1000 (compute nodes) -> 2000 -> 50000
Capacity expansion should be controllable and modularized in a public cloud.
Don't break user expectations when adding a new building block for capacity expansion; most of those expectations are about networking...
[Diagram: scaling from 1000 to 2000 to 50000 compute nodes by adding OpenStack building blocks; VM1-VM9 on networks Net1@AZ1 and Net2@AZ1 behind router R span the original and the new blocks within the same AZ]
After capacity expansion, the new building block is added to the same AZ. End users should not be aware of the capacity expansion.
App placement in multi-AZ for higher reliability and availability
DNS and/or load balancing for apps in different AZs…
(AZ is a fault-domain concept. It is strange that OpenStack AZs share controller nodes, message bus and DB; each AZ should have its own.)
[Diagram: OpenStack (AZ1) hosting VM1-VM5 on SEG Net1@AZ1 behind router R, and OpenStack (AZ2) hosting VM6-VM10 on SEG Net2@AZ2 behind router R]
App placement in multi-AZ for higher reliability and availability
vEPC is designed as a distributed application; its DB/session-processing units and front-end load balancers can be distributed into multiple AZs for higher reliability and availability.
vEPC distributed into multiple AZs
[Diagram: vEPC components spread across OpenStack (AZ1), OpenStack (AZ2) and OpenStack (AZ3)]
But end users, PaaS layers, CLIs, SDKs, … expect a single endpoint and a single OpenStack API in front of these building blocks.
[Diagram: one OpenStack API in front of AZ1 and AZ2, each backed by its own OpenStack instance]
Large-scale clouds bring new requirements
[Diagram: hybrid cloud with a private OpenStack (AZ1) hosting VM1-VM5 on SEG Net1@AZ1 behind router R, alongside VMs behind routers in AWS and Azure]
Having already built a private cloud and also wanting to use the power of a public cloud, how do you manage the resources in such a hybrid cloud? Networking, migration, backup/restore.
Tricircle is an OpenStack API gateway with added value such as cross-OpenStack L2/L3 networking, volume/VM movement, image distribution, a global resource view, distributed quota management, …
This makes massively distributed edge clouds work like one inter-connected cloud, one OpenStack.
[Diagram: Tricircle in front of three edge data centers, talking to each one through its OpenStack API]
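To make the "one endpoint, plain OpenStack API" idea concrete, here is a minimal sketch using the openstacksdk against a single Tricircle endpoint. The endpoint URL, credentials, AZ names and resource UUIDs are illustrative placeholders, not values from this deck; the only point is that the client code is unchanged OpenStack API usage.

```python
# Illustrative sketch only: endpoint URL, credentials, AZ names and UUIDs are
# assumptions. The client keeps using the plain OpenStack SDK against one
# Tricircle endpoint, while Tricircle routes each request to a bottom
# OpenStack instance behind the scenes.
import openstack

conn = openstack.connect(
    auth_url="https://tricircle.example.com:5000/v3",  # single endpoint (assumed URL)
    project_name="demo", username="demo", password="secret",
    user_domain_name="Default", project_domain_name="Default",
)

# Boot two VMs into different AZs; each AZ maps to a different edge pod,
# but the calls are ordinary Nova API usage.
for az in ("az1", "az2"):
    conn.compute.create_server(
        name=f"app-{az}",
        image_id="IMAGE_UUID",            # placeholder
        flavor_id="FLAVOR_UUID",          # placeholder
        networks=[{"uuid": "NET_UUID"}],  # placeholder
        availability_zone=az,
    )
```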
Tricircle is an OpenStack API gateway with added value such as cross-OpenStack L2/L3 networking, volume/VM movement, image distribution, a global resource view, distributed quota management, …
This lets Tricircle address the capacity-expansion and multi-AZ challenges in a large-scale cloud and work like one OpenStack.
[Diagram: Tricircle exposes one OpenStack API in front of AZ1 and AZ2; each building block is a full OpenStack instance (API server, compute nodes, message bus, network node, DB), and capacity grows from 1000 to 2000 to 50000 compute nodes by adding blocks]
Tricircle is an OpenStack API gateway with added value such as cross-OpenStack L2/L3 networking, volume/VM movement, image distribution, a global resource view, distributed quota management, …
This also lets Tricircle manage a hybrid cloud with the help of the "Jacket" project (https://wiki.openstack.org/wiki/Jacket).
Region, AZ, Pod, DC, Top, Bottom
[Diagram: Region One. Tricircle on top exposes the OpenStack API; DC1, DC2 and DC3 each host one or more bottom OpenStack instances (Pods), each with its own Nova, Cinder and Neutron; the Pods are grouped into AZ1, AZ3 and AZ4]
Tricircle provides an OpenStack API gateway and networking automation that allow multiple OpenStack instances, spanning one site, multiple sites or a hybrid cloud, to be managed as a single OpenStack cloud.
*Tricircle itself can be deployed across multiple distributed data centers.
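As an illustration of the hierarchy above (not the actual Tricircle Admin API or data model), the Region / AZ / Pod / DC relationship could be sketched like this; the pod, DC and AZ names are taken from the diagram or invented for the example.

```python
# Illustrative data model only (not the real Tricircle Admin API): how the
# Region / AZ / Pod / DC hierarchy from the diagram could be represented.
from dataclasses import dataclass

@dataclass
class Pod:
    pod_name: str      # a bottom OpenStack instance
    data_center: str   # DC1 / DC2 / DC3
    az_name: str       # availability zone exposed to the end user

# "Region One" is the top region served by Tricircle; each pod below it is a
# complete bottom OpenStack. Names are illustrative.
PODS = [
    Pod("pod-dc1-1", "DC1", "az1"),
    Pod("pod-dc1-2", "DC1", "az1"),
    Pod("pod-dc2-1", "DC2", "az3"),
    Pod("pod-dc3-1", "DC3", "az4"),
]

def pods_in_az(az_name: str) -> list[Pod]:
    """All candidate bottom OpenStack instances for an availability zone."""
    return [p for p in PODS if p.az_name == az_name]

print([p.pod_name for p in pods_in_az("az1")])  # ['pod-dc1-1', 'pod-dc1-2']
```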
V1 Tricircle was modified from OpenStack itself … tight coupling
[Diagram: V1 Tricircle exposes the OpenStack API and embeds modified Nova, Cinder and Neutron services with drivers/agents (Nova driver, L2/L3 agent) that drive OpenStack @Site1, @Site2 and @Site3 through their OpenStack APIs]
V2 Tricircle is an OpenStack API gateway and networking automation layer, decoupled from OpenStack
[Diagram: V2 Tricircle exposes the OpenStack API through a Nova API gateway, a Cinder API gateway and a Neutron API with the Tricircle plugin, which forward requests to OpenStack @Site1, @Site2 and @Site3 over their OpenStack APIs]
Tricircle: the stateless design is like cells, but better...
[Diagram: in Nova cells, the API cell (Nova-API + routing DB) talks to child cells (message bus, DB, compute nodes) over RPC; in Tricircle, the Nova/Cinder/Neutron API gateways with a routing DB talk to complete bottom OpenStack instances (Nova, Cinder, Neutron, Glance controllers, message bus, DB, compute nodes) over REST]
|  | Cells | Tricircle |
| Sharding the cloud | Yes | Yes |
| Interface between API entrance and edge data center | RPC, SQL | RESTful OpenStack API; easier for upgrade, troubleshooting and multi-vendor integration |
| Services involved | Nova | Nova, Cinder, Neutron, Glance(*), Ceilometer(*) |
| North API | Nova API | OpenStack API |
Tricircle is an API gateway (API entrance), just like the Nova API in the API cell, but even simpler: it only forwards requests and leaves request-parameter validation to the bottom OpenStack instances. No VM/volume/backup/snapshot data is stored in Tricircle.
* Glance and Ceilometer will be involved later.
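A much-simplified sketch of that forwarding idea, assuming hypothetical bottom endpoints and a plain dict standing in for the routing DB; Tricircle's real implementation differs, but the shape is the same: look up where the resource lives, then forward the unmodified REST call.

```python
# Simplified sketch of the forwarding idea (not Tricircle's actual code): the
# gateway keeps only a routing entry per resource, then forwards the request
# to the owning bottom OpenStack over REST; no VM/volume state or status is
# stored at the top.
import requests

# resource_id -> bottom OpenStack Nova endpoint (the "routing DB"); assumed URLs
ROUTING = {
    "vm-uuid-1": "https://pod1.example.com:8774/v2.1",
    "vm-uuid-2": "https://pod2.example.com:8774/v2.1",
}

def forward_server_action(token: str, project_id: str, server_id: str, body: dict):
    """Forward a Nova server action (e.g. reboot) to the owning bottom OpenStack."""
    base = ROUTING[server_id]                       # 1) look up the routing entry
    url = f"{base}/{project_id}/servers/{server_id}/action"
    resp = requests.post(url, json=body,            # 2) forward the request as-is
                         headers={"X-Auth-Token": token})
    return resp.status_code, resp.json() if resp.content else None
```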
Tricircle, OpenStack API gateway to route requests - VM placement
[Diagram: VM placement message route (steps 1-3): a request reaches the Tricircle Nova/Cinder/Neutron API gateways, the routing DB is consulted, and the request is forwarded to one of the bottom OpenStack instances (Nova, Cinder, Neutron, Glance controllers, message bus, DB, compute nodes); the API cell is shown for comparison]
Tricircle, OpenStack API gateway to route requests - operation forwarding
[Diagram: operation-forwarding message route (steps 1-5): the request passes through the Tricircle API gateways and routing DB (1-3) and is forwarded to the owning bottom OpenStack, where steps 4-5 take place; the API cell is shown for comparison]
Steps 4 and 5 are just a local reboot-VM operation inside that OpenStack instance.
Tricircle, OpenStack API gateway to route requests - VM/volume co-location
[Diagram: VM/volume co-location (steps 1-3): the boot-VM and create-volume requests pass through the Tricircle API gateways and routing DB and land in the same bottom OpenStack; the API cell is shown for comparison]
If one AZ includes more than one OpenStack instance, then within that AZ there is one currently active bound OpenStack instance for a given project ID.
Example: boot VM(AZ1) and create Volume(AZ1) are both routed to the bound OpenStack in AZ1.
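The co-location rule can be sketched as follows (illustrative only, not Tricircle's real scheduling code): remember one bound pod per (project, AZ) pair so that the VM and the volume land in the same bottom OpenStack.

```python
# Sketch of the co-location rule described above (illustrative, not the real
# Tricircle scheduler): within an AZ that contains several bottom OpenStack
# instances, the first request for a project picks one pod and the binding is
# remembered, so boot VM(AZ1) and create Volume(AZ1) land in the same pod.
BINDINGS: dict[tuple[str, str], str] = {}   # (project_id, az) -> pod_name

def bound_pod(project_id: str, az: str, candidate_pods: list[str]) -> str:
    key = (project_id, az)
    if key not in BINDINGS:
        BINDINGS[key] = candidate_pods[0]   # real code would schedule/balance here
    return BINDINGS[key]

pods_in_az1 = ["pod-dc1-1", "pod-dc1-2"]
print(bound_pod("proj-a", "az1", pods_in_az1))  # chooses and remembers pod-dc1-1
print(bound_pod("proj-a", "az1", pods_in_az1))  # same pod again -> co-location
```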
Neutron cannot simply forward API requests; it has to build networking for tenant VMs located in different edge data centers.
So the Neutron API-GW is not a simple Neutron API forwarding gateway; it needs real networking functionality.
The Neutron API-GW consists of the Neutron API plus the Tricircle plugin, and it keeps the Neutron DB for tenant-level IP/MAC address management spanning multiple OpenStack instances; otherwise, address conflicts would occur.
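A toy example of why the address allocation has to happen once at the top Neutron DB rather than in each pod; the class and names are invented for illustration.

```python
# Toy illustration of the conflict the top Neutron DB prevents: if each bottom
# OpenStack allocated addresses for the same tenant subnet independently, two
# pods could hand out the same IP. Allocating once at the top keeps addresses
# unique across pods. (Sketch only, not Tricircle code.)
import ipaddress

class TopSubnetPool:
    def __init__(self, cidr: str):
        self.hosts = ipaddress.ip_network(cidr).hosts()
        self.allocated = set()

    def allocate(self) -> str:
        ip = str(next(self.hosts))      # single source of truth for the subnet
        self.allocated.add(ip)
        return ip

pool = TopSubnetPool("10.0.1.0/24")
ip_for_vm1_in_pod1 = pool.allocate()    # 10.0.1.1
ip_for_vm2_in_pod2 = pool.allocate()    # 10.0.1.2, no conflict across pods
```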
Tricircle, networking of my VMs in different bottom OpenStack?
[Diagram: VM1 and VM2 live in different bottom OpenStack instances and are connected through router R; in Tricircle, the Neutron API-GW = Neutron API + Tricircle plugin + Neutron DB, next to the Nova API-GW, Cinder API-GW and routing DB; the API cell is shown for comparison]
Networking - L2 networking (mixed VLAN/VxLAN)
[Diagram: Tricircle (Nova API-GW; Neutron API + Tricircle plugin + L2GW driver) on top of two bottom OpenStack instances; VM1 attaches to Network1-1 (VLAN1) behind L2GW1, with L2GW2 at the second site]
1. Create Network1
2. Create VM1(Network1, AZ1)
3. Create Network1-1 in the bottom OpenStack
4. Update Network1 (segment1 = Network1-1@AZ1)
5. Create Port1 for VM1
6. Create VM1(Port1, Network1-1)
*supported by the networking-l2gw project
Networking - L2 networking (mixed VLAN/VxLAN)
[Diagram: VM1 on Network1-1 (VLAN1, L2GW1) and VM2 on Network1-2 (VxLAN2, L2GW2) in two bottom OpenStack instances; Tricircle (Nova API-GW; Neutron API + Tricircle plugin + L2GW driver) on top]
7. Create VM2(Network1, AZ2)
8. Create Network1-2 in the second bottom OpenStack
9. Update Network1 (segment2 = Network1-2@AZ2)
10. Create Port2 for VM2
11. Create VM2(Port2, Network1-2)
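From the tenant's point of view, steps 1, 2 and 7 above are just ordinary Neutron/Nova calls against the single Tricircle endpoint, for example with the openstacksdk (the cloud name, UUIDs, CIDR and AZ names below are placeholders); Tricircle creates Network1-1/Network1-2 in the bottom pods and stitches them through the L2 gateways behind the scenes.

```python
# Tenant-side sketch of steps 1-2 and 7 above (IDs and AZ names are
# illustrative): the network is created once against the Tricircle endpoint,
# and the two VMs are simply booted into different AZs on that same network.
import openstack

conn = openstack.connect(cloud="tricircle")   # assumes a clouds.yaml entry

net1 = conn.network.create_network(name="Network1")
conn.network.create_subnet(network_id=net1.id, ip_version=4,
                           cidr="10.0.1.0/24", name="Subnet1")

for name, az in (("VM1", "az1"), ("VM2", "az2")):
    conn.compute.create_server(
        name=name,
        image_id="IMAGE_UUID", flavor_id="FLAVOR_UUID",   # placeholders
        networks=[{"uuid": net1.id}],
        availability_zone=az,
    )
```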
Networking - L2 networking (mixed VLAN/VxLAN)
[Diagram: VM1 (Network1-1, VLAN1, L2GW1) and VM2 (Network1-2, VxLAN2, L2GW2) in two bottom OpenStack instances across edge data centers, stitched by L2 networking (EVPN) between the L2 gateways; XJob drives the steps below]
11. Start an async job for L2 networking between (Network1-1, Network1-2)
12. Create the L2GW local connection (on each side)
13. Create the L2GW remote connection (on each side)
14. Populate remote MAC/IP info (on each side)
When a new VM is booted, networking is a relatively time-consuming task: security groups, networks, subnets, router attachments and so on need to be created. For a better user experience, the end user should not have to wait 30 seconds or longer for the VM boot request, so this work can be done as an asynchronous job. XJob is introduced for such async background jobs.
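A sketch of that fire-and-forget pattern using oslo.messaging, which OpenStack services use for RPC; the broker URL, topic and method name here are hypothetical and do not describe Tricircle's actual XJob interface.

```python
# Sketch of the fire-and-forget pattern XJob is meant for (hypothetical topic,
# method and broker URL; not the actual Tricircle XJob code). The API gateway
# answers the user immediately and hands the slow cross-pod networking work
# to a background worker over the message bus.
from oslo_config import cfg
import oslo_messaging

transport = oslo_messaging.get_rpc_transport(
    cfg.CONF, url="rabbit://guest:guest@localhost:5672/")  # assumed broker URL
target = oslo_messaging.Target(topic="xjob")               # assumed topic
client = oslo_messaging.RPCClient(transport, target)

def boot_vm_api_handler(ctxt, vm_request):
    # ... synchronous part: route the request, create the VM in the bottom pod ...
    # Then cast (no reply expected) the slow L2-networking job to XJob and
    # return to the caller right away instead of blocking for ~30 seconds.
    client.cast(ctxt, "setup_cross_pod_networking",        # hypothetical method
                network_id=vm_request["network_id"])
    return {"status": "BUILD", "note": "networking finished asynchronously"}
```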
Tricircle, balance between user experience and simplicity
[Diagram: the Tricircle components (Nova API-GW, Cinder API-GW, Neutron API + Tricircle plugin, DB, XJob) in front of OpenStack instances in edge data centers; the API cell is shown for comparison]
Networking - L3 networking ( E-W/N-S, VxLAN or Mixed VLAN/VxLAN )
This will be implemented after cross-OpenStack L2 networking (VxLAN, mixed VLAN/VxLAN) is ready, using VxLAN or mixed VLAN/VxLAN as the N-S bridging network; for E-W traffic the L2 network is simply stretched to wherever it is needed.
For Local Network and L2/L3 networking, refer to the spec:
https://review.openstack.org/#/c/304540/
Networking - L3 networking ( Shared VLAN E-W )
[Diagram: two bottom OpenStack instances, with VM1 on Network1 behind Router1 and VM2 on Network2 behind Router2, bridged E-W by a provider VLAN1 L2 network; Tricircle (Neutron API + Tricircle plugin, Nova API-GW, XJob) issues the top-level operations a-g, which map to the bottom operations 1-13]
Top-level OpenStack API calls (a-g) and the bottom OpenStack operations they trigger (1-12):
a Create Network1, b Create VM1(Network1) -> 1 Create Network1, 2 Create VM1(Network1)
c Create Router, d Add router-interface (Router1, Network1) -> 3 Create Router1, 4 Add router-interface (Router1, Network1)
e Create Network2, f Create VM2(Network2) -> 5 Create Network2, 6 Create VM2(Network2)
g Add router-interface (Router, Network2) -> 7 Create Router2, 8 Add router-interface (Router2, Network2)
E-W bridging -> 9 Create provider network VLAN1, 10 Add router interface (Router1, VLAN1), 11 Create provider network VLAN1, 12 Add router interface (Router2, VLAN1)
Shared VLAN is mainly used intra data center, but that doesn't mean it can't be used inter data center. For example, put a physical VxLAN gateway (1:1 VLAN-to-VxLAN mapping) in each DC. It then looks like this: VLAN 100 in DC1 -> VxLAN gateway in DC1 converts VLAN 100 to VxLAN 5095 -> … -> VxLAN gateway in DC2 converts VxLAN 5095 back to VLAN 100 -> VLAN 100 in DC2. The VLAN/VxLAN mapping has to be configured manually.
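The 1:1 mapping described above as a tiny illustration; the VLAN and VNI values are the ones from the note, and real gateways are configured through their own management interfaces, not Python.

```python
# Illustration of the manual 1:1 VLAN<->VxLAN mapping mentioned above
# (values taken from the note; sketch only).
VLAN_TO_VXLAN = {100: 5095}   # same mapping configured on the gateway in each DC

def dc1_gateway_encap(vlan_id: int) -> int:
    """DC1 gateway: VLAN in, VxLAN VNI out."""
    return VLAN_TO_VXLAN[vlan_id]

def dc2_gateway_decap(vni: int) -> int:
    """DC2 gateway: VxLAN VNI in, VLAN out (reverse of the same 1:1 mapping)."""
    return {v: k for k, v in VLAN_TO_VXLAN.items()}[vni]

assert dc2_gateway_decap(dc1_gateway_encap(100)) == 100   # VLAN 100 end to end
```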
Networking - L3 networking ( Shared VLAN E-W/N-S )
[Diagram: three bottom OpenStack instances; VM1 (Network1, Router1) and VM2 (Network2, Router2) are connected E-W over a provider VLAN bridge network (CIDR: cidr E-W); a third bottom OpenStack hosts Router3 attached to the external network and is reached N-S over another provider VLAN bridge network (CIDR: cidr N-S); floating IP FIP1 (external) maps to fip1 (internal) and then to the VM addresses (IP1-IP6, ip1, ip2)]
Tenant data movement across OpenStack
[Diagram: Tricircle as the OpenStack API gateway in front of three OpenStack controllers; VM1 and VM2, each running a transportation tool, have volumes attached and are connected across sites]
Create VMs with a transportation tool, attach the volume (the data to be moved) to the VM, and move the data across OpenStack instances through tenant-level L2/L3 networking.
*Conveyor, a project built on top of Tricircle, will help to do this: https://launchpad.net/conveyor
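The manual flow that Conveyor automates might look roughly like this with the openstacksdk; the cloud name, image, flavor, network and volume IDs are placeholders, and the "transfer tool" image is an assumption.

```python
# Sketch of the manual tenant-data-movement flow (illustrative names and IDs):
# boot a "transfer tool" VM in each pod on the shared tenant network, attach
# the source volume to VM1, and copy the data to VM2 over the tenant L2/L3
# network (e.g. with rsync inside the VMs).
import openstack

conn = openstack.connect(cloud="tricircle")   # single Tricircle endpoint assumed

vm1 = conn.compute.create_server(
    name="trans-vm1", image_id="TRANS_TOOL_IMAGE", flavor_id="FLAVOR_UUID",
    networks=[{"uuid": "TENANT_NET_UUID"}], availability_zone="az1")
vm2 = conn.compute.create_server(
    name="trans-vm2", image_id="TRANS_TOOL_IMAGE", flavor_id="FLAVOR_UUID",
    networks=[{"uuid": "TENANT_NET_UUID"}], availability_zone="az2")

vm1 = conn.compute.wait_for_server(vm1)
vm2 = conn.compute.wait_for_server(vm2)

# Attach the volume holding the data to be moved to the source-side VM.
conn.compute.create_volume_attachment(vm1, volume_id="SOURCE_VOLUME_UUID")
# Inside the VMs, the transfer tool then copies the data across the tenant
# network, e.g. rsync from trans-vm1 to trans-vm2.
```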
Summary: Stateless designed Tricircle
[Diagram: new components of Tricircle (Admin API, DB, Nova API-GW, Cinder API-GW, Neutron API + Tricircle plugin, XJob, message bus) exposing the RESTful OpenStack API; the message bus carries async XJob RPC calls for cross-OpenStack functionality such as networking and volume migration; the DB is accessed for site management and resource routing; below sit the bottom OpenStack (Pod) instances, each with its own Nova, Cinder and Neutron; KeyStone and Glance are also shown]
What does stateless mean for Tricircle?
Tricircle, which works only as an API gateway, does not store object data such as VMs, volumes, backups or snapshots, and especially not the status of these objects. Networking logical objects such as Network/Subnet/Router are concepts that can span multiple sites, so these logical abstractions must be stored for global IP/MAC address management and networking purposes. But even ports are queried from the bottom OpenStack instances.
Stateless designed Tricircle
The new components of Tricircle are fully decoupled from OpenStack services such as Nova and Cinder, and the Tricircle plugin works just like the OVN or ODL plugin in the Neutron project. The stateless architecture proposal removes the UUID mapping and status synchronization and lets resource provisioning happen in each bottom OpenStack instance, making Tricircle a very slim API gateway.
More information
wiki of Tricircle: https://wiki.openstack.org/wiki/Tricircle
play and contribute: https://github.com/openstack/tricircle
Design doc: https://docs.google.com/document/d/18kZZ1snMOCD9IQvUKI5NVDzSASpw-QKj7l2zNqMEd3g
Backup Slides
The good aspect of cells is that they divide the DB and message bus into separate, smaller DBs and message buses in each cell. A cell can be deployed into a data center, so resource locality can be achieved.
Cells V2
[Diagram: Cells V2. The API cell (Nova-API + routing DB) sits above child cells, each located in an edge data center with its own message bus, DB and compute nodes]
The challenge for cells is the interface: a RESTful interface between the API cell and the child cells would be better, because a site (child cell) would remain manageable even if the link to it were broken. But a RESTful interface does not allow direct access to the DB and message bus. That is why Tricircle uses the OpenStack API as the interface between the "API cell" and the "child cell" in each site.
Cells V2
[Diagram: Cells V2 message flow (steps 1-5), from Nova-API and the routing DB in the API cell down to the child cells (message bus, DB, compute nodes) in the edge data centers]