µONOS - Next Generation ONOS
The goal of this document is to kick off the collaboration on the next generation architecture for the ONOS controller, code-named µONOS.
Like its predecessor, µONOS will be an open-source SDN control and configuration platform. The new architecture is:
µONOS is based on our 5+ years of experience building and deploying ONOS, which has been a leader in the SDN control-plane space when it comes to high availability, performance and scalability.
The platform enables a comprehensive set of network operations:
Table of Contents
Software Enabled Broadband Access (SEBA)
Topology Models / Abstractions
Generalized abstractions for runtime control
Platform for specialized NB pipeline-level APIs
Version Control & Issue Tracking
Naming Conventions, Code Style Guidelines, and Related Tools
With the release of ONOS 2.0, today’s ONOS architecture provides a stable platform with many nice characteristics:
But - as with all things - the current architecture also has some caveats and limitations:
While the above caveats are largely technical in nature, they affect some of the prominent industry use-cases, which then have to work around these limitations of ONOS - or ODL for that matter, since the caveats of both are similar.
There are a number of technical motives and business use-case requirements, both of which signal that this effort would be highly beneficial.
With ONOS 2.0 having gone through a major infrastructure upgrade and thus being a stable platform for some time to come, now is the time to consider the next generation architecture.
Furthermore, since the initial ONOS architecture was developed, a number of new standards and technologies have emerged or matured: P4Runtime, gNMI, gNOI, gRIBI (and gRPC in general), OpenConfig YANG models, the ygot toolchain, Kubernetes, Golang, etc. While the existing ONOS architecture was able to successfully accommodate many of these on an incremental basis (e.g. P4, gNMI), there remain significant opportunities to exploit these advancements by adjusting our approach to them in a less incremental fashion. Moreover, many of these play an important part in the Stratum project and it is therefore fitting for them to be incorporated natively into the next generation of the SDN controller architecture as well.
Thus, the goal is to establish the next generation SDN controller architecture and, in order to ensure the broadest possible acceptance, the intent is to formulate it completely in the open with the help of the ONOS community.
Clearly, the ONOS team will also continue to curate ONOS 1.x & 2.x maintenance and releases. However, the core team will focus solely on bug fixes, code reviews and release engineering and will rely on the rest of the ONOS community to continue new feature development as needed. This will make sure that the existing ONOS architecture will continue to be a stable platform for deployments and further app development.
The following are high-level descriptions of a few specific deployment use-cases that the µONOS effort will be targeted at supporting and demonstrating.
One of the major deployment use-cases for service providers is the network access edge. There are a variety of technologies for providing wired access (cable, fibre, etc.) as well as wireless access (RAN). The following use-cases describe the various parts of the network access edge solution.
While the access technologies differ and have their own specific sets of requirements - and hence also their own use-cases - the one thing they all have in common is the need for a fabric that serves as an efficient means to connect to the internet and as a place to host various network functions. Given that the network edge provides access to end-users (residences or businesses), which need to be treated distinctly, there are specific scaling requirements for such a fabric.
Such a fabric will typically consist of ~30 network infrastructure switches and ~2000 ports organized in some form of a leaf-spine topology. Although the number of devices and ports is low, the scalability and performance pressures will come from the high number of routes (~150K) and the resulting flows (~1.5M) that are required to properly handle service for ~20K users/subscribers.
Route/flow programming rates - both sustained and peak rates
Nominal latency for flow/group/meter programming
Configuration operation rates
Description
...
Route/flow programming rates - both sustained and peak rates
Nominal latency for flow/group/meter programming
Configuration operation rates
The wireless access (RAN) presents some unique and challenging problems due to the mobility of devices and the physics of the radio medium. While a number of these challenges are tackled directly by the various devices (hand-sets, base units, etc.), some control-plane concerns still need to be addressed by the platform and the applications alike in order for the users to experience the great service they expect.
The principal challenge comes from the limited response times (sub-10ms) given to the platform and applications for exercising certain control-plane functions that are involved in admitting users to the network and handling session hand-over between base stations as a user roams from one coverage area to another. These response times are near real-time in that they need to be both fast and predictable.
Although the entire platform does not need to support the near real-time aspects throughout, the architecture must facilitate construction of a narrow set of interactions that are near real-time in order for it to be suitable for application at the RAN edge.
Given that the platform allows components to be written in various languages, in some cases it may make sense for RAN-specific modules to be written in languages such as C++, either due to the availability of ASN.1 libraries, or to escape GC-related latencies. Furthermore, adequate caching strategies may need to be employed so that data required for speedy decision-making can be accessed without engaging in consensus protocols, which could also induce undesired delays.
Number of subscribers, RU/DUs
Number of routes/flows
Expected environmental events (hand-offs, admissions)... both sustained and peak rates
Route/flow programming rates - both sustained and peak rates
Nominal latency for flow/group/meter programming
Maximum allowable latency for critical operations (latency and characterization of operations)
Configuration operation rates
One of the attributes of a next-generation SDN network should be that it can be deployed with a suite of applications that allow it to simply replace the classic non-SDN network. As such the various data-plane and control-plane modules need to work together in order to provide the following:
See the following figure for an example of possible modules in such a network:
The next generation SDN controller platform must allow applications to engage with network infrastructure devices and the network as a whole. The operators expect to be able to:
Furthermore, it is expected that these activities can be incorporated into a variety of custom work-flows in order to support the network operations consistent with the desired policies. Such work-flows will often involve a combination of configuration, control, monitoring and validation activities and can generically be categorized as “zero-touch” operations.
Examples include the following:
Some of the more prominent usage scenarios are outlined below.
The platform (and supporting applications) should be able to demonstrate how a new switch can be brought up on the network by providing only high-level information such as:
The demonstration application guiding the work-flow should be able to use the platform’s configuration, operational and programming interfaces to discover and pre-configure the switch with the proper security certificates, configuration profiles, pipeline structure and then provide the operator with visual cues via LEDs on how the switch should be connected to the existing network. Once the proper connectivity has been validated, the switch can be brought on-line either automatically or through an explicit operator action, depending on the desired network behaviour.
Similarly, the platform should be able to demonstrate how an existing switch can be safely upgraded. In order for this to occur with no (or minimal) impact on the network operation, the demonstration application guiding the work-flow should be able to administratively bring the switch off-line by evacuating any of its work-load and then use the operational interfaces to initiate a software update. Once the software update has been completed and validated, the switch can be brought back on-line automatically.
...
The following are basic tenets of the next generation architecture and methods of construction.
In principle, and at a macro level, the NG ONOS architecture will resemble the old one in that the notions of core, south-bound and apps will remain as separate entities.
Clearly, the core itself will not be a monolith as depicted above. As in the current architecture, it will be an assembly of various subsystems, each responsible for a certain aspect of control or configuration. However, unlike in the current architecture, where the subsystems are Java components interfacing with each other via Java interfaces and are housed in a single JVM process, the core components in the new architecture will be deployed as containers that use gRPC interfaces.
gRPC will be the canonical form of interfacing not just among the core components, but between the architectural tiers as well. This will allow the various parts of the system to be written in different languages. However, it is not sufficient to just specify gRPC alone. We must also - and especially - give attention to the abstractions or models that prescribe the structure of information being exchanged. Here, the goal is to rely on existing open standards, such as OpenConfig or TAPI as much as possible.
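To make this concrete, the sketch below shows (in Go, the proposed implementation language) roughly what a core subsystem packaged as its own micro-service could look like: a process that serves its API over gRPC. Everything here is illustrative only; the commented-out `topopb` registration stands in for stubs that would be generated from a yet-to-be-designed .proto file, and the port number is arbitrary.

```go
package main

import (
	"log"
	"net"

	"google.golang.org/grpc"
)

func main() {
	// Each core subsystem runs as its own container/process and serves gRPC.
	lis, err := net.Listen("tcp", ":5150") // arbitrary port, for illustration only
	if err != nil {
		log.Fatalf("failed to listen: %v", err)
	}
	s := grpc.NewServer()
	// topopb.RegisterTopoServiceServer(s, newTopoServer()) // hypothetical generated registration
	log.Printf("subsystem gRPC server listening on %s", lis.Addr())
	if err := s.Serve(lis); err != nil {
		log.Fatalf("serving failed: %v", err)
	}
}
```

The same pattern applies whether a component is written in Go or another language, since the contract is the .proto definition rather than a language-level interface.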
With respect to the southbound interfaces, it should suffice to rely heavily, if not solely, on gNMI, gNOI, P4Runtime and OpenConfig. It is not entirely clear, though, whether the existing OpenConfig, TAPI or IETF models are adequate for northbound or intra-core APIs. This is an opportunity to address the long-standing issue of standardization of northbound controller interfaces; both ONOS and ODL have been targets of this valid criticism.
Therefore, it will be critical to establish a set of abstractions that can gain wide acceptance as a means for apps to interact with the northbound and/or for intra-core interactions themselves. It may very well be that gNMI and gNOI fit the bill here as well, coupled with YANG models that convey network-level (rather than just device-level) information.
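As a thought experiment only, the snippet below shows what a northbound read could look like if the controller itself exposed a gNMI interface populated from network-level YANG models. The target address and the /devices/device[id=...] path are invented for illustration; only the openconfig gNMI proto package and the gRPC calls are real.

```go
package main

import (
	"context"
	"log"
	"time"

	gnmi "github.com/openconfig/gnmi/proto/gnmi"
	"google.golang.org/grpc"
)

func main() {
	// Dial the controller NB endpoint (hypothetical address; TLS omitted for brevity).
	conn, err := grpc.Dial("onos-nb:5150", grpc.WithInsecure())
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()

	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()

	// A network-level path such as /devices/device[id=leaf-1] would come from a
	// network-level YANG model, which does not exist yet and is assumed here.
	resp, err := gnmi.NewGNMIClient(conn).Get(ctx, &gnmi.GetRequest{
		Path: []*gnmi.Path{{
			Elem: []*gnmi.PathElem{
				{Name: "devices"},
				{Name: "device", Key: map[string]string{"id": "leaf-1"}},
			},
		}},
		Encoding: gnmi.Encoding_JSON_IETF,
	})
	if err != nil {
		log.Fatalf("gNMI Get failed: %v", err)
	}
	log.Printf("received %d notification(s)", len(resp.Notification))
}
```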
Today, the ONOS core has roughly 50 different subsystems, some of which work closely with each other. For example, the network graph abstraction is provided by the topology subsystem, which works closely with device, link and host subsystems. While the goal of the new architecture is to disaggregate the controller and the core itself, it may not make sense to indiscriminately break apart the core along the subsystem boundaries. Instead, we ought to consider the affinities between the different parts of the core, their functional roles and also their horizontal scalability requirements.
For example, it may make sense to keep the topology subsystem together with the required device and link inventory subsystems, but separate from the configuration subsystem and from the flow control programming subsystem comprised of flow rule, group and meter subsystems.
Clearly, these are just initial thoughts and given that the interfaces will be gRPC, we will always have the flexibility to adjust the physical boundaries if necessary.
Both modularity and performance characteristics will be factors in achieving a proper balance in how the various services are split or aggregated.
The following diagram shows a sample deployment architecture. Note that the only notion of a cluster resides with the key/value store, where consensus is required to guarantee consistency of the various data structures. Otherwise, the key platform services providing access to the network information base and to the network control and configuration functions will run as load-balanced banks of individual services that can be scaled as needed using the cloud orchestrator, e.g. Kubernetes. Load balancing for the gRPC services can be set up using a third-party mechanism such as Linkerd or similar means.
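For illustration, one lightweight alternative (or complement) to a mesh like Linkerd is gRPC's own client-side load balancing: if each core service is exposed through a headless Kubernetes Service, the DNS resolver returns one address per pod and the round-robin policy spreads RPCs across the replicas. The service name below is hypothetical.

```go
package main

import (
	"log"

	"google.golang.org/grpc"
)

func main() {
	// "dns:///" makes gRPC resolve all A records behind the (hypothetical) headless
	// Service and balance requests across the resulting pod addresses.
	conn, err := grpc.Dial(
		"dns:///onos-config.onos.svc.cluster.local:5150",
		grpc.WithInsecure(), // TLS omitted for brevity
		grpc.WithDefaultServiceConfig(`{"loadBalancingPolicy":"round_robin"}`),
	)
	if err != nil {
		log.Fatalf("dial failed: %v", err)
	}
	defer conn.Close()
	// ... create per-subsystem client stubs on conn as usual ...
}
```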
Drivers and other southbound adapters, if needed, will be explicitly separated from the core via gRPC-based interfaces and thus are expected to be deployed as separate micro-service(s), leaving them free to use whatever implementation language, protocol or libraries they wish.
(See the Device Configuration section where the NB API and SB API tenets are described. A similar approach will apply to control facets.)
This will allow drivers for Stratum, VOLTHA, ORAN or other systems to be authored using their “native” means - using low level libraries and merchant silicon stacks, while at the same time allowing reuse of the “legacy” ONOS drivers (OpenFlow or otherwise). In fact, it may also allow reuse of OpenDaylight drivers.
Capture thoughts on stateful vs stateless. Where the state is kept, etc.
There are two important aspects of this: (1) the controller architecture itself, and (2) the control applications, where application state (and possibly even network protocol state) might need to be catered to.
For distributed state, the plan is to use Atomix (equivalent to etcd, but faster) for constructing resilient stores.
Distributed state should be thought about holistically - with respect to the drivers mentioned above, and also with respect to the apps. The platform cannot dictate how apps are written, but it can provide tools that make it simpler to create apps with distributed state.
In order to keep the deployment as simple as possible and to minimize the number of means for propagating changes to various components, the architecture will initially rely on individual services for streaming notifications via gRPC as a principal means for apps to remain informed about changes in state.
However, the platform may also need to provide a mechanism for broadcasting events using publish/subscribe as a looser coordination mechanism. For this, we will draw on any existing technologies such as Kafka for example. In this case, in order to maintain the ability to efficiently serialize and deserialize payloads from multiple different languages we will use GPB as the means to encode message payloads.
The initial design will abstain from relying on a separate pub/sub bus.
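The per-service streaming approach can be sketched as follows. None of the types below are proposed APIs; they merely stand in for what protoc would generate for a server-streaming "Watch" RPC and illustrate the replay-then-follow pattern that keeps apps informed without a separate event bus.

```go
package main

import "fmt"

// DeviceEvent stands in for a generated protobuf message describing a change.
type DeviceEvent struct {
	Type   string // e.g. ADDED, UPDATED, REMOVED
	Device string
}

// eventStream mimics the stream interface protoc generates for a
// server-streaming RPC (e.g. a hypothetical TopoService_WatchServer).
type eventStream interface {
	Send(*DeviceEvent) error
}

// watch first replays the current state and then forwards live changes, so a
// newly attached app converges without needing a separate pub/sub bus.
func watch(current []DeviceEvent, changes <-chan DeviceEvent, stream eventStream) error {
	for i := range current {
		if err := stream.Send(&current[i]); err != nil {
			return err
		}
	}
	for ev := range changes {
		ev := ev
		if err := stream.Send(&ev); err != nil {
			return err
		}
	}
	return nil
}

// printStream is a trivial stand-in that makes the sketch runnable on its own.
type printStream struct{}

func (printStream) Send(e *DeviceEvent) error {
	fmt.Println(e.Type, e.Device)
	return nil
}

func main() {
	changes := make(chan DeviceEvent, 1)
	changes <- DeviceEvent{Type: "UPDATED", Device: "leaf-2"}
	close(changes)
	_ = watch([]DeviceEvent{{Type: "ADDED", Device: "leaf-1"}}, changes, printStream{})
}
```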
The purpose of the topology subsystem is to provide a central base for accessing information about various elements on the network and about the network structure. This includes the various network infrastructure devices (switches, ROADMs, base stations, etc.), links that connect them into a traversable graph, and end-station hosts (NICs, hosts, hand units, etc.).
The system needs to provide a unified interface for accessing the topology information, but needs to be extensible to allow for various means of network discovery.
Leaning on the existing ONOS topology graph abstractions is certainly one of the options. However, there are some drawbacks in that the existing graph abstraction does not capture the limited intra-device connectivity that exists in many optical devices, meaning that not all ingress ports can be connected to all egress ports.
We may need to extend the graph abstraction to allow capture of this information. If there are existing models in place that allow this, we may need to lean on those instead.
More research is required before we can make the final determination here. Currently some of the options are as follows (in no particular order):
Design documentation has been moved to the ONOS Configuration GitHub repository.
To access the operational state of a device, a cache or a passthrough in the configuration system may be needed.
The cache would enable fast read times for the apps, decoupling the application reads from the speed of retrieval from the device. The cache should leverage the gNMI streaming model. If the device is not gNMI-enabled, the adapter would take care of the polling-to-streaming conversion.
A caching mechanism does, however, introduce additional complexity.
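As a rough illustration of the streaming-fed cache idea, the sketch below opens a gNMI Subscribe stream (ON_CHANGE, where the device supports it) and folds every update into an in-memory map keyed by path. The openconfig gNMI proto package is real; the target address and the naive cache structure are illustrative only.

```go
package main

import (
	"context"
	"log"

	gnmi "github.com/openconfig/gnmi/proto/gnmi"
	"google.golang.org/grpc"
)

func main() {
	conn, err := grpc.Dial("device1:9339", grpc.WithInsecure()) // TLS omitted for brevity
	if err != nil {
		log.Fatal(err)
	}
	defer conn.Close()

	stream, err := gnmi.NewGNMIClient(conn).Subscribe(context.Background())
	if err != nil {
		log.Fatal(err)
	}
	// Request streaming updates for the interfaces subtree.
	err = stream.Send(&gnmi.SubscribeRequest{
		Request: &gnmi.SubscribeRequest_Subscribe{
			Subscribe: &gnmi.SubscriptionList{
				Mode: gnmi.SubscriptionList_STREAM,
				Subscription: []*gnmi.Subscription{{
					Path: &gnmi.Path{Elem: []*gnmi.PathElem{{Name: "interfaces"}}},
					Mode: gnmi.SubscriptionMode_ON_CHANGE,
				}},
			},
		},
	})
	if err != nil {
		log.Fatal(err)
	}

	cache := map[string]*gnmi.TypedValue{} // naive operational-state cache
	for {
		resp, err := stream.Recv()
		if err != nil {
			log.Fatal(err)
		}
		for _, upd := range resp.GetUpdate().GetUpdate() {
			cache[upd.Path.String()] = upd.Val
		}
	}
}
```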
The controller is the owner of the authoritative state of mutable configuration. All changes to a device configuration must be done through the configuration subsystem and controller via its NB API. If a device changes a controller-prescribed configuration (not operational state), this will be treated as an error and the controller will take action.
Authority in the controller is needed for network-wide transactionality.
The system may allow a transient mode where initial configuration can be “learned” from the current device configuration; after that the system should revert to asserting authority over the configuration.
The ownership of configuration holds true only for the scope of configuration the system takes care of; if configuration changes fall outside the scope of the ONOS config subsystem, they can be ignored.
Use of gNMI on both the SB and NB APIs will allow the same toolchain to be used for compiling YANG models. The ideal candidate for this is ygot. It is open-source, authored by Google and maintained by an established community. The tool compiles YANG models into protobuf models, which can in turn be used to generate various language-specific bindings.
Timelines and specific deliverable milestones are to be determined. There is the desire to show this at a conference, maybe ONF Connect but this is not a hard commitment.
Different components will move at different paces.
The initial focus will be on a single device and the ability to roll its configuration back and forward.
A second step will be the addition of a basic device inventory subsystem as a precursor to the topology subsystem. Such an inventory will track device addresses, certificates, etc. required for connection.
Building on top of the device inventory, the team will implement multi-device transactions.
Demonstrations and milestones covering the different use cases will be shown through the implementation of exemplar applications.
Following this, the team will focus on adding and demonstrating high-availability aspects and failure scenarios.
The store will start with the basic functionality needed for device configuration. The team will focus on defining the interface to the store, while the implementation remains somewhat independent and can be an in-memory cache or built on Atomix, etcd or other backends (one possible shape of such an interface is sketched below).
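The sketch below shows one possible shape for such an interface-first approach; all names are hypothetical. The point is that the device-configuration subsystem codes against a small store interface, and the backing implementation (an in-memory map for early development, Atomix or etcd later) can be swapped without touching callers.

```go
package store

import "sync"

// Revision lets callers detect and reject concurrent configuration updates.
type Revision uint64

// ConfigStore is the minimal surface the device-configuration subsystem needs.
type ConfigStore interface {
	Get(key string) ([]byte, Revision, error)
	Put(key string, value []byte) (Revision, error)
	Delete(key string) error
}

// memStore is a trivial in-memory implementation usable for early development.
type memStore struct {
	mu   sync.RWMutex
	rev  Revision
	data map[string][]byte
}

// NewMemStore returns an in-memory ConfigStore suitable only for prototyping.
func NewMemStore() ConfigStore {
	return &memStore{data: map[string][]byte{}}
}

func (s *memStore) Get(key string) ([]byte, Revision, error) {
	s.mu.RLock()
	defer s.mu.RUnlock()
	return s.data[key], s.rev, nil
}

func (s *memStore) Put(key string, value []byte) (Revision, error) {
	s.mu.Lock()
	defer s.mu.Unlock()
	s.rev++
	s.data[key] = value
	return s.rev, nil
}

func (s *memStore) Delete(key string) error {
	s.mu.Lock()
	defer s.mu.Unlock()
	delete(s.data, key)
	return nil
}
```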
Today's FlowRule, FlowObjective, Group, and Meter APIs are all modeled after the OpenFlow switch abstraction. After years of experience operating OpenFlow networks with ONOS, some fundamental limitations of this model have surfaced, such as the ambiguity of the abstraction (e.g. action specification), which makes it hard for developers to write portable apps; the lack of a pipeline specification (which table supports what?); and the inability to natively support new protocols. New core abstractions should be based on the following principles.
We should offer core abstractions that are stripped of any protocol semantics (e.g. IPvX, Ethernet, etc.) and instead limit them to model the essential capabilities of forwarding devices (e.g. match-action tables, action indirection for WCMP, controller packet I/O, etc.), while also giving attention to what is required to correctly and efficiently manage such state at runtime (e.g. the dependency between flow rules and groups - which one has to be written first?). In doing this, we should look at reusing existing work such as the Portable Switch Architecture (PSA), which defines a P4 architecture common to a large variety of switches (including OpenFlow), and P4Runtime, which defines pipeline/protocol-independent structures for runtime entities such as table entries, groups, etc. Whether the core should be based on the same P4Runtime protobuf messages (possibly augmented with network-level information), or on a more convenient representation of them (e.g. the PI runtime classes of current ONOS), is up for debate.
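Purely as an illustration of "stripped of protocol semantics" (and not a proposed API), the types below model forwarding state in the spirit of P4Runtime/PSA: matches and actions refer to IDs declared by the device's pipeline specification rather than to fixed header fields such as an IPv4 destination or an Ethernet source.

```go
package forwarding

// TableEntry is a match-action entry in a table identified by the pipeline spec.
type TableEntry struct {
	TableID  uint32       // table as named by the device's pipeline specification
	Matches  []FieldMatch // values for the match fields the table declares
	Action   Action       // action (or action-group reference) to execute on a hit
	Priority int32        // relevant only for ternary/range matches
}

// FieldMatch binds a pipeline-declared match field to a value/mask.
type FieldMatch struct {
	FieldID uint32
	Value   []byte
	Mask    []byte // nil for exact matches
}

// Action refers to a pipeline-declared action and its parameters; it can also
// point indirectly to a group, which is what enables WCMP-style load sharing.
type Action struct {
	ActionID uint32
	Params   map[uint32][]byte
	GroupID  uint32 // non-zero when the entry points to an action group instead
}
```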
NG ONOS should be aware that each device can be associated with a different pipeline specification. Having the core aware of such a specification allows it to (1) support programmable devices (which require a pipeline to be configured before their tables can be programmed), and, most importantly, (2) manage data-plane state more efficiently, with better error reporting to apps (e.g. table almost full, or action/match field not supported), more efficient stats collection (e.g. not polling tables that do not support counters), etc. NG ONOS should offer infrastructure to maintain many of these pipeline specifications, without prescribing a specific one, but allowing apps to bring their own. How to abstract pipeline specifications is up for debate. One option is to use the P4Info structure like the current ONOS PI model classes, which provides all information for runtime control but lacks a specification of the packet flow and pipeline implementation useful for T3-like debugging (e.g. which is the first table in the pipeline? What is the next one if a given action is hit?). At the other end of the spectrum there is the full P4 program, which might be consumed via a friendlier P4 compiler output (e.g. BMv2 JSON, or we could implement our own p4c backend).
Like FlowObjective today, NG ONOS should offer means to opportunistically augment basic NB forwarding APIs with pipeline-level semantics that favor app portability, BUT without prescribing any such pipeline. FlowObjective is an example of a pipeline API (3 stages: filtering, forwarding, and next), with its merits (flexibility, can cover many use cases) and faults (highly underspecified, making it hard for drivers to provide a mapping without knowledge of the apps). Other pipeline APIs are possible, for example one for standard bridging/routing based on SAI, or one specifically designed for Trellis. Coming up with ONE pipeline API that can cover all app use cases and that is easy for drivers to map is the holy grail of SDN research (and perhaps a utopian goal). Instead, it is easier to imagine a future where a few such pipeline APIs are adopted by the industry. NG ONOS should provide infrastructure to let such APIs flourish and be interchangeable, allowing apps to choose which one to use and allowing device drivers to choose which one to implement. Moreover, NG ONOS should provide infrastructure to translate from one API to another (if such a translation is possible).[b]
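As a thought experiment on how such interchangeable pipeline APIs and translators might hang together (not a design), consider the following hypothetical interfaces:

```go
package pipelines

// PipelineSpec stands in for the pipeline-specification abstraction discussed
// above (P4Info-like today, possibly richer in the future).
type PipelineSpec struct{ ID string }

// TableEntry stands in for the protocol-independent entry type sketched earlier.
type TableEntry struct{ /* match/action fields omitted */ }

// Objective is whatever a particular pipeline API exposes to applications, e.g.
// a FlowObjective-like filtering/forwarding/next triple, or a bridging/routing
// construct modelled after SAI.
type Objective interface{}

// PipelineAPI lowers its own objectives onto a concrete device pipeline.
type PipelineAPI interface {
	Name() string
	Compile(obj Objective, spec PipelineSpec) ([]TableEntry, error)
}

// Translator maps objectives of one pipeline API onto another, where such a
// translation is possible; apps pick an API, drivers pick what they implement,
// and translators bridge the gap when the two differ.
type Translator interface {
	From() string
	To() string
	Translate(obj Objective) (Objective, error)
}
```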
We need to decide on a name for the project. Based on the multiple online and offline discussion threads, the overwhelming feedback is to preserve the ONOS branding and to stick with the ONOS name or the µONOS variant, rather than going with something entirely different. Given this feedback, it is proposed we skip forward and proceed as follows:
The team has decided to use GitHub. Rather than using Gerrit (as is done with the current ONOS, for good historical reasons), the project will follow the path of Stratum by using GitHub directly. This has many benefits:
The team will create three repositories at the minimum:
The project will use GitHub Markdown for documentation. This has the benefit of keeping documentation close to - and versioned with - the code to which it pertains.
Google docs can be used as an early form of collaboration, but established information needs to be recorded as Markdown documents in GitHub.
For the CI pipeline the project will use Jenkins. Travis, although integrated with GitHub, is currently more limited than Jenkins, especially with respect to unit and integration testing.
See https://wiki.jenkins.io/display/JENKINS/GitHub+Plugin (hosted at https://github.com/jenkinsci/github-plugin).
We may revisit this decision once the project establishes some momentum.
Testing goals will be established early on. These may include:
Responsibilities for testing should be well defined
As code will be split across repos, testing discipline and tooling to help with refactoring may gain importance. Worth a watch: https://www.youtube.com/watch?v=TrC6ROeV4GI (and possibly evaluating the Kythe toolset, although that might be more applicable to an existing larger codebase).
Bazel will be used as a build system. The reasons for this are as follows:
During the early stages of development and experimentation, the team will be using the native Go tools. As the project starts to take shape and gains some level of complexity, we will transition to using Bazel.
The build mechanism should:
The repositories containing the code will be open from the get-go. Community participation is encouraged and welcomed. The core team will move forward at its own pace. However, until stated explicitly, all initial implementations are to be considered experimental and interfaces are non-binding (so as not to create legacy liability too early); basically, all code is to be treated as "beta" at version 0.x, allowing the APIs to recover from any mistakes made in early experiments. This temporary "beta" label will be lifted at some point in the future.
The preferred implementation language is Go. These are the reasons for this recommendation:
Clearly, the choice of IDE will remain the developer's. These are some options for the Go language:
The ideal solution for organizing the source code is to use the same project layout pattern and the same set of development rules and guidelines for all of the subsystems, allowing developers to contribute to any subsystem easily. Furthermore, using the same project layout pattern for all of the subsystems makes maintenance of the whole project easier.
The projects will use patterns in compliance with the suggested Go language project layout; directories that are not needed will of course be omitted. This layout is also used by a number of well-known projects such as Kubernetes and others.
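For illustration, a repository following that layout could look roughly like the tree below; the repository name and the exact directory set are a hedged example drawn from the common Go project-layout convention, not a mandated structure.

```
onos-config/
├── cmd/            # main packages, one per executable (service binary, CLI)
├── pkg/            # library code intended to be importable by other projects
├── internal/       # code private to this repository
├── api/            # .proto definitions and other API contracts
├── build/          # packaging and CI (Dockerfiles, scripts)
├── deployments/    # Kubernetes/Helm manifests
├── docs/           # Markdown documentation kept next to the code
└── test/           # additional integration/system tests
```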
[a]We should revise this section to make sure we articulate the high-level design objectives and tenets first.
Some of the lower level design concerns can then be moved off to a separate "NG ONOS Control Design" document.
[b]Carmelo to add a diagram that shows how apps can use one or more pipeline APIs, how such APIs can be mapped one to each other, and how drivers can map such APIs to one or multiple P4 programs.
[c]Go modules are still young and the experience varies. I personally believe this is the right way to go, but there are projects that prefer to "vendor" their dependencies as it gets a more stable build.