1 of 34

A journey through the migration of a monolith to a multi-tenant cloud platform, and the lessons learned.

2 of 34

About me...

Graeme Baillie

Cloud Solution Architect & Community Evangelist

SWITCH (NREN Switzerland)

3 of 34

Target Audience

IT Administration

Cloud Administration / Engineering

Migration Projects

4 of 34

What is our product?

An on-premises video encoding and learning management system.

5 of 34

What is our dev/ops environment?

6 of 34

Target?

  1. SaaS solution on Public Cloud
  2. Put everything on AWS
  3. Make it multi-tenant (??)
  4. Do it in 6 months
  5. There will be a few consultants to help

7 of 34

Our Reaction??

8 of 34

9 of 34

What next?

  • 2+2 tasked with creating hundreds of user stories to reach “MVP”
  • Agile process redesigned

Everyone else?

  • Continue stabilising and professionalising our OpenShift runtime and development environment
  • Continue with existing product development plans

10 of 34

Our Reaction??

11 of 34

  • Will we be replaced by consultants?

  • Are we not trusted?

  • Will our current actions be deemed a waste?

12 of 34

The Bombshell

13 of 34

Migrate the container platform!

14 of 34

Reaction?

15 of 34

Huge questions!!

  • OpenShift experience thrown away
  • Decision did not follow our recommendation
  • Cost reasons
  • Listening to consultants and not us

16 of 34

Lessons learned

Nobody at the decision point understood the scope and size of migrating the most critical component - think carefully and plan it!

Little or no understanding of dependencies (e.g. CI/CD and monitoring systems) that might have to be redone - map out the existing platform, its dependencies and future deficiencies before making any decision!

Large-scale re-architecting was required, not just Kubernetes -> Kubernetes - understand how the two platforms differ and plan accordingly

17 of 34

Migrate the CI/CD tools!

18 of 34

Reaction?

19 of 34

More (huge) questions

  • Careful recommendations were (more or less) ignored
  • PoCs with acceptance criteria were made pointless
  • Use cases unclear, reason for the selection unclear

20 of 34

Lessons learned

Migrating CI/CD at the same time as the platform adds one more unnecessary variable - don’t do it

Ansible playbooks had to be rewritten, and we were not sure exactly what their dependencies were

It turned out that containers could not be deployed natively, so we had to misuse multiple CodeBuild projects - not a good idea

Product development still had to continue on the existing CI/CD, which made the migration too difficult to plan

21 of 34

Split into two tracks

PODS (Scrum)

  • Cloud Architecture / Monitoring / Platform
  • Product Development / CI/CD

22 of 34

Application Migration Scope

  1. Lift and Shift all containers
  2. Move some components that contain no application logic (ES, DB, MQ) from containers to managed services
  3. Move some event-driven parts to serverless (Lambda)
  4. Complete re-architecture

23 of 34

Lessons learned

Don’t aim for perfection with MVP 1.0

Don’t keep moving the goalposts - Clearly define the scope and stick to it

Realise that no system is EVER finished - aim for constant evolution and continual improvement

Understand the available resources

Plan for the unexpected - don’t book every second until the end of the project

24 of 34

Application Design - Front End ( simplified )

http://tenant1.app.company.com/browser -> Service “tenant1/browser”

http://tenant2.app.company.com/editor -> Service “tenant2/editor”
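The host-plus-path routing above can be expressed as a small pure function. This is a minimal sketch, assuming each tenant is a single-label subdomain of the placeholder domain app.company.com and that the service name is simply "<tenant>/<component>"; `route_to_service` is a hypothetical helper, not the project's actual ingress configuration.

```python
from urllib.parse import urlsplit

# Placeholder base domain taken from the slide.
BASE_DOMAIN = "app.company.com"


def route_to_service(url: str) -> str:
    """Map a tenant URL to a "<tenant>/<component>" service name (hypothetical helper)."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    suffix = "." + BASE_DOMAIN
    if not host.endswith(suffix):
        raise ValueError(f"host {host!r} is not under {BASE_DOMAIN}")
    tenant = host[: -len(suffix)]
    # Assumption: only a single-label subdomain is treated as a tenant name.
    if not tenant or "." in tenant:
        raise ValueError(f"cannot derive a tenant from host {host!r}")
    # The first path segment selects the front-end component, e.g. /browser or /editor.
    segments = [segment for segment in parts.path.split("/") if segment]
    if not segments:
        raise ValueError("no component in path")
    return f"{tenant}/{segments[0]}"


print(route_to_service("http://tenant1.app.company.com/browser"))  # tenant1/browser
print(route_to_service("http://tenant2.app.company.com/editor"))   # tenant2/editor
```

Keeping the tenant in the hostname rather than in the path means the application code itself never has to parse or trust a tenant identifier supplied in the URL path.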

25 of 34

Application Design - Content Distribution

A single CloudFront distribution with two cache behaviors:

  • /video/
  • /api/

Signed URLs restrict access to content and add an extra layer of security.

A single S3 bucket with a separate folder for each tenant, and subfolders matching the cache behaviors:

  • /tenant1/video/*
  • /tenant1/api/*
  • /tenant2/video/*
  • /tenant2/api/*

Lambda@Edge rewrites URLs per tenant, removing this overhead from the application

Files are only available through CloudFront, not directly from S3, because of the OAI (Origin Access Identity)
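A viewer-request rewrite of this kind could look roughly like the sketch below. It uses the standard Lambda@Edge event shape (`event["Records"][0]["cf"]["request"]`, lowercase header names holding lists of key/value pairs) and assumes the function is attached only to the /video/ and /api/ cache behaviors; the domain is the slide's placeholder and is hard-coded because Lambda@Edge does not support environment variables. Returning 403 for unknown hosts is an illustrative choice, not necessarily what the project did.

```python
# Placeholder base domain from the slides.
BASE_DOMAIN = "app.company.com"


def handler(event, context):
    """Lambda@Edge viewer-request handler: prefix the tenant name to the URI."""
    request = event["Records"][0]["cf"]["request"]
    # CloudFront passes header names in lowercase, each with a list of key/value pairs.
    host = request["headers"]["host"][0]["value"].lower()
    suffix = "." + BASE_DOMAIN
    tenant = host[: -len(suffix)] if host.endswith(suffix) else ""
    if not tenant or "." in tenant:
        # Unknown host: answer directly instead of reading from the bucket root.
        return {"status": "403", "statusDescription": "Forbidden"}
    # /video/video1.mpg -> /tenant1/video/video1.mpg
    request["uri"] = f"/{tenant}{request['uri']}"
    # Returning the request tells CloudFront to continue processing it.
    return request
```

Because the rewrite happens on the viewer request, the cache key already contains the tenant prefix, so tenants never share cached objects. As the lessons below note, this layer later turned out to be unnecessary.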

26 of 34

Application Design - Distribution workflow

1. A user requests content for the first time. Route 53 resolves the hostname to CloudFront, which passes the request to the application at http://tenant1.app.company.com

2. The application creates a signed URL for a GET request to the media folder in the S3 bucket where the content resides.

Lambda@Edge intercepts the viewer request and rewrites the URL to add the tenant name to the path:

https://tenant1.app.company.com/tenant1/video/video1.mpg

3. CloudFront forwards the origin request to the tenant-specific subfolder in the S3 bucket.

4. The data is returned to the end user for consumption and cached by CloudFront if it matches a cache behavior.

5. Later, the user makes the same request for video1. Lambda@Edge rewrites it again, and this time the request stops at the CloudFront cache, where a matching object is found.

6. The user consumes the cached object directly.
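Step 2 relies on CloudFront signed URLs. The sketch below builds a canned-policy signed URL, assuming the documented format (a whitespace-free JSON policy, the `Expires`, `Signature` and `Key-Pair-Id` query parameters, and CloudFront's URL-safe base64 variant). The RSA-SHA1 signing step is injected as a callable so the example needs no crypto library; the key pair ID and URL used with it are hypothetical.

```python
import base64
import json
from typing import Callable


def cloudfront_b64(data: bytes) -> str:
    """Base64-encode, then swap the characters CloudFront treats as invalid in URLs."""
    encoded = base64.b64encode(data).decode("ascii")
    return encoded.replace("+", "-").replace("=", "_").replace("/", "~")


def canned_policy(url: str, expires: int) -> str:
    """Build a canned policy; CloudFront expects it without any whitespace."""
    policy = {
        "Statement": [
            {
                "Resource": url,
                "Condition": {"DateLessThan": {"AWS:EpochTime": expires}},
            }
        ]
    }
    return json.dumps(policy, separators=(",", ":"))


def sign_url(url: str, expires: int, key_pair_id: str,
             rsa_sha1_sign: Callable[[bytes], bytes]) -> str:
    """Return a canned-policy signed URL.

    rsa_sha1_sign must sign the bytes with the CloudFront private key using
    RSA PKCS#1 v1.5 with SHA-1; it is injected so this sketch stays dependency-free.
    """
    policy = canned_policy(url, expires).encode("utf-8")
    signature = cloudfront_b64(rsa_sha1_sign(policy))
    separator = "&" if "?" in url else "?"
    return (f"{url}{separator}Expires={expires}"
            f"&Signature={signature}&Key-Pair-Id={key_pair_id}")
```

In practice, botocore's `CloudFrontSigner` (with `generate_presigned_url`) does the same job given an RSA signer function, which avoids hand-rolling the encoding.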

27 of 34

Lessons learned

Lambda@Edge was an early case of over-engineering - don’t waste time on efficiencies that are clearly no longer required!

28 of 34

AWS as a Platform - Scope

A Platform for growth

29 of 34

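[Diagram: AWS account structure, grouped as GLOBAL, APP SHARED, APPS and R&D / TESTING. Accounts shown: security, logging, auth, shared, monitoring, tools, App-1 prod, App-1 staging, App-2 prod, App-2 staging, Playground-1 and Playground-2, with federation marked as an open question.]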

30 of 34

Lessons learned

Global SSO/Authentication system

Avoid the hell of per-person IAM users with custom inline policies

Centralise logging, monitoring and CI/CD in their own accounts, designed so that multiple teams/projects/apps can use them

Security account for audit/expedited access

Centralised secret management (Secrets Manager / Parameter Store)

31 of 34

Lessons learned - there are more

MAKE IT EASY FOR EMPLOYEES TO USE THE PLATFORM!

Let people connect to accounts with a minimum of fuss

Enforce MFA & password policies

A single place to log in, then assume roles with short-term credentials

Design & promote tooling (aws-cli, profiles, aws-vault)

Tooling for edge cases (e.g. EKS)
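The "single place to log in, then assume roles" pattern maps onto AWS CLI profiles that use `role_arn`, `source_profile` and `mfa_serial`. As a minimal sketch, the helper below renders such a ~/.aws/config from one login profile plus a list of target accounts; the function name, profile names, role name and account IDs are all hypothetical.

```python
import configparser
import io


def render_aws_config(login_profile: str, mfa_serial: str, accounts: dict,
                      role_name: str, region: str) -> str:
    """Render ~/.aws/config profiles that assume a role in each account (hypothetical helper)."""
    config = configparser.ConfigParser()
    # The login profile is the only one whose credentials a user actually holds.
    config[f"profile {login_profile}"] = {"region": region}
    for profile_name, account_id in accounts.items():
        config[f"profile {profile_name}"] = {
            # Short-term credentials come from assuming this role in the target account...
            "role_arn": f"arn:aws:iam::{account_id}:role/{role_name}",
            # ...using the login profile's credentials as the source...
            "source_profile": login_profile,
            # ...with an MFA prompt when the role is assumed.
            "mfa_serial": mfa_serial,
            "region": region,
        }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


print(render_aws_config(
    "login",
    "arn:aws:iam::111111111111:mfa/jane",
    {"app1-prod": "222222222222", "app1-staging": "333333333333"},
    "Developer",
    "eu-central-1",
))
```

With a file like this in place, `aws s3 ls --profile app1-prod` or `aws-vault exec app1-prod -- aws s3 ls` assumes the role on demand, so nobody needs IAM users in the application accounts.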

32 of 34

Sustainability

Should be planned from day 1

Beware of consultants - their priorities may differ from yours

Insist that the Operations / Patching / Support concept is built up front rather than at the end.

Decide on Monitoring / Logging stack early - ensure developers take this into consideration every day.

Insist on Auto-Scaling / Elasticity as much as possible.

33 of 34

Sustainability (2)

Use Infrastructure as Code tools - FROM DAY 1 - Configuration Management will save the day

34 of 34

Top 5 Takeaways

  1. Understand the Impact of big decisions
  2. Clearly define the scope and stick to it. Understand that nothing is perfect at v1.0 - aim for continuous improvement
  3. Getting something working is easy. Maintaining & supporting the cloud, the platforms and the applications is hard. Plan for it from day 1
  4. Moving to the cloud doesn’t need to mean 100% buy-in for managed services. It’s OK (perhaps even desirable) to have other SaaS or vendors
  5. Insist on what is best for you - don’t be seduced by consultants or by what others are doing