Simpler, Faster, More Flexible:
Forward Evolution of NSM
The Dream
Solve the unsolved L2/L3 problems in K8s in a Cloud Native Way
The Plan
Borrow ideas from Application Service Mesh
(Network) Service Registry
(Network) Services
(Network Service) Endpoints
The First Speed Bump
There is no one true connection protocol, like TCP, for things that carry L2/L3 payloads. So we created one:
Networkservice
message NetworkServiceRequest {
  connection.Connection connection = 1;
  repeated connection.Mechanism mechanism_preferences = 2;
}

service NetworkService {
  rpc Request(NetworkServiceRequest) returns (connection.Connection);
  rpc Close(connection.Connection) returns (google.protobuf.Empty);
}

service MonitorConnection {
  rpc MonitorConnections(MonitorScopeSelector) returns (stream ConnectionEvent);
}
Have repo will travel
Stuff started working
We realized this was bigger than one cluster
[Diagram: four K8s Clusters, each with App Pods on K8s Networking (via CNI) providing Intra-Cluster Connectivity, Security. Cross-cluster needs shown: DB Replication, Istio.]
We realized this was bigger than just K8s
[Diagram: four K8s Clusters, each with App Pods on K8s Networking (via CNI) providing Intra-Cluster Connectivity, Security, plus VMs on Virtual Networking and On-prem Servers. Cross-domain needs shown: DB Replication, Istio.]
Complexity Grew
[Diagram: a Kubernetes Cluster with the Kubernetes API Server (Network Service Registry via CRDs) and two Nodes. Each Node shows a Network Service Manager (NSMgr) (Daemonset), a Forwarder (kernel/vswitch), Network Service Client (NSC) Pods, and Network Service Endpoint (NSE) Pods.]
APIs Proliferated
Networkservice
message NetworkServiceRequest {
  connection.Connection connection = 1;
  repeated connection.Mechanism mechanism_preferences = 2;
}

service NetworkService {
  rpc Request(NetworkServiceRequest) returns (connection.Connection);
  rpc Close(connection.Connection) returns (google.protobuf.Empty);
}

Registry
service NetworkServiceRegistry {
  rpc RegisterNSE (NSERegistration) returns (NSERegistration);
  rpc RemoveNSE (RemoveNSERequest) returns (google.protobuf.Empty);
}

CrossConnect
message CrossConnect {
  string id = 1;
  string payload = 2;
  connection.Connection source = 4;
  connection.Connection destination = 7;
}

service Forwarder {
  rpc Request(crossconnect.CrossConnect) returns (crossconnect.CrossConnect);
  rpc Close(crossconnect.CrossConnect) returns (google.protobuf.Empty);
}

ForwarderRegistration
service ForwarderRegistration {
  rpc RequestForwarderRegistration (ForwarderRegistrationRequest) returns (ForwarderRegistrationReply);
  rpc RequestLiveness (stream google.protobuf.Empty) returns (stream google.protobuf.Empty);
}
API Proliferation
Link to diagram in draw.io
CI Slowed Down
1h40m
Adding Features got hard
Simplify
Simplify - Fewer fundamental things
Fundamentally, there are only three kinds of things:
(Network) Service Registry
(Network Service) Client
(Network Service) Endpoint
Simplify - Fewer APIs
(Network) Service Registry
(Network Service) Client
(Network Service) Endpoint
Register/UnRegister
Find
Request, Close, Monitor
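The registry side of this reduced API surface can be illustrated with a minimal in-memory sketch. Everything here is an assumption made for illustration: the `Registry` and `Endpoint` types and their method signatures are hypothetical and are not the NSM registry API. Endpoints call Register/Unregister, and Clients call Find.

```go
package main

import (
	"fmt"
	"sort"
	"sync"
)

// Endpoint is a hypothetical registry entry: an endpoint name plus the
// network services it offers. It is not the NSM registry type.
type Endpoint struct {
	Name     string
	Services []string
}

// Registry is a hypothetical in-memory (Network) Service Registry.
type Registry struct {
	mu        sync.RWMutex
	endpoints map[string]Endpoint
}

// NewRegistry returns an empty registry.
func NewRegistry() *Registry {
	return &Registry{endpoints: make(map[string]Endpoint)}
}

// Register adds or replaces an endpoint (called by Endpoints).
func (r *Registry) Register(e Endpoint) {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.endpoints[e.Name] = e
}

// Unregister removes an endpoint by name (called by Endpoints).
func (r *Registry) Unregister(name string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	delete(r.endpoints, name)
}

// Find returns the sorted names of endpoints offering the given service
// (called by Clients).
func (r *Registry) Find(service string) []string {
	r.mu.RLock()
	defer r.mu.RUnlock()
	var names []string
	for _, e := range r.endpoints {
		for _, s := range e.Services {
			if s == service {
				names = append(names, e.Name)
				break
			}
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	reg := NewRegistry()
	reg.Register(Endpoint{Name: "nse-1", Services: []string{"vpn"}})
	reg.Register(Endpoint{Name: "nse-2", Services: []string{"vpn", "firewall"}})
	fmt.Println(reg.Find("vpn")) // [nse-1 nse-2]

	reg.Unregister("nse-1")
	fmt.Println(reg.Find("vpn")) // [nse-2]
}
```

Request, Close and Monitor are not shown; in this picture they run between a Client and the Endpoint that Find returned.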
Simplify - Easy to Secure
(Network) Service Registry
(Network Service) Client
(Network Service) Endpoint
Register/UnRegister
Find
Request, Close, Monitor
Simplify - Easy to Heal
(Network) Service Registry
(Network Service) Client
(Network Service) Endpoint
Simplify - Basic Heal Pattern
Client Detects Failure Fast - Repeats Request
Requests have expiration
Client Repeats Request before it expires
Server cleans up expired state
Compose to Complexity
[Diagram: an NSC followed by a series of NSE/NSC hops, with Control Plane and Dataplane paths between them. Labels: NSC, NSE/NSC (five), Control Plane, Dataplane (four), Node 1.]
Example
[Diagram: NSC, Sidecar, two NSMgr and two Forwarder/Cross Connect components, and an NSE, each in its own Pod, with Control Plane and Dataplane paths between them. Labels also include Node 2.]
Compose to Complex Heal
message PathSegment {
  string name = 1;
  string id = 2;
  string token = 3;
  google.protobuf.Timestamp expires = 4;
  map<string, string> metrics = 5;
}

message Path {
  uint32 index = 1;
  repeated PathSegment path_segments = 2;
}

message Connection {
  ...
  Path path = 6;
  ...
}
Modularity
(sdk)
Build NSEs out of chain elements
Chain: authz, updatepath, monitor, timeout, updatetoken, ...

func NewServer(ctx context.Context, name string, authzServer networkservice.NetworkServiceServer, tokenGenerator token.GeneratorFunc, additionalFunctionality ...networkservice.NetworkServiceServer) Endpoint {
	rv := &endpoint{}
	var ns networkservice.NetworkServiceServer = rv
	rv.NetworkServiceServer = chain.NewNetworkServiceServer(
		append([]networkservice.NetworkServiceServer{
			authzServer,
			updatepath.NewServer(name),
			monitor.NewServer(ctx, &rv.MonitorConnectionServer),
			timeout.NewServer(&ns),
			updatetoken.NewServer(tokenGenerator),
		}, additionalFunctionality...)...)
	return rv
}
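The idea of building an endpoint from chain elements can be shown with a self-contained sketch. It does not use the sdk's `chain` package: the `Request`, `Next`, `Element` and `Chain` names below are hypothetical, and the real sdk elements implement `networkservice.NetworkServiceServer` instead. The point is only the shape: each element does one small job, then hands the request to the rest of the chain.

```go
package main

import (
	"errors"
	"fmt"
)

// Request is a hypothetical stand-in for NetworkServiceRequest.
type Request struct {
	Token string
	Path  []string
}

// Next invokes the remainder of the chain.
type Next func(*Request) (*Request, error)

// Element is one chain element: do one job, then call next.
type Element func(req *Request, next Next) (*Request, error)

// Chain composes elements so that each one wraps all that follow it.
// An empty chain simply returns the request unchanged.
func Chain(elements ...Element) Next {
	if len(elements) == 0 {
		return func(req *Request) (*Request, error) { return req, nil }
	}
	head, tail := elements[0], Chain(elements[1:]...)
	return func(req *Request) (*Request, error) { return head(req, tail) }
}

// Authz rejects requests that carry no token (a toy policy).
func Authz(req *Request, next Next) (*Request, error) {
	if req.Token == "" {
		return nil, errors.New("unauthorized")
	}
	return next(req)
}

// UpdatePath returns an element that appends name to the request path.
func UpdatePath(name string) Element {
	return func(req *Request, next Next) (*Request, error) {
		req.Path = append(req.Path, name)
		return next(req)
	}
}

func main() {
	endpoint := Chain(Authz, UpdatePath("nse-1"))

	conn, err := endpoint(&Request{Token: "t"})
	fmt.Println(conn.Path, err) // [nse-1] <nil>

	_, err = endpoint(&Request{})
	fmt.Println(err) // unauthorized
}
```

Because every element has the same signature, adding a feature means adding one element to the list, which is the property the `additionalFunctionality` parameter relies on.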
Build NSCs out of chain elements
Chain: authorize, updatepath, heal, refresh, injectpeer, updatetoken, ...

func NewClient(ctx context.Context, name string, onHeal *networkservice.NetworkServiceClient, tokenGenerator token.GeneratorFunc, cc grpc.ClientConnInterface, additionalFunctionality ...networkservice.NetworkServiceClient) networkservice.NetworkServiceClient {
	return chain.NewNetworkServiceClient(
		append(
			append([]networkservice.NetworkServiceClient{
				authorize.NewClient(),
				updatepath.NewClient(name),
				heal.NewClient(ctx, networkservice.NewMonitorConnectionClient(cc), onHeal),
				refresh.NewClient(ctx),
				injectpeer.NewClient(),
				updatetoken.NewClient(tokenGenerator),
			}, additionalFunctionality...),
			networkservice.NewNetworkServiceClient(cc),
		)...)
}
NSM is a Fractal System
Modularity
(multi-repo)
Repo pipelining
api - our top level NSM APIs
sdk - platform independent sdk
sdk-${platform} - platform dependent sdk
cmd-${cmd} - various commands (one per repo)
integration-${platform} - runs integration tests
[Diagram: repos api/, sdk/, sdk-${platform}/, cmd-${cmd}/, integration-${platform}/ and integration-tests/, with helm and operator shown alongside. Stage labels: api, sdk, cmd.]
CI Time
[Diagram: the repo pipeline annotated with CI times. Times in the order they appear on the slide: ~1m20s, ~16m30s, ~4m15s, ~3m30s, ~2m40s, ~4m40s.]
[Diagram: the repo pipeline. PR is merged; an Update PR follows; the Update PR passes CI and is merged.]
Failure Detection/Mitigation
[Diagram: the repo pipeline, with an Update PR marked.]
Chase back failure:
Failure Detection/Mitigation
[Diagram: the repo pipeline. Fix PR is merged; the Update PR passes CI and is merged; the Update PR succeeds.]
Questions?