1 of 19

Rootconf 2017 // Ruchi Singh

Migration of 300 microservices from AWS Cloud to Snapdeal Cloud

@rchsngh569

2 of 19

The next 15 mins...

  • Who are we?
  • Why did we migrate 300 microservices?
  • Planning for migration
  • Runtime gotchas and our fix to them
  • Rollback strategies in case of failure
  • Key learnings
  • Party time

3 of 19

Largest e-commerce marketplace in India

Buyer Seller

4 of 19

Overview of Snapdeal Cloud: Cirrus

5 of 19

And the story begins... Our plan for migration

6 of 19

Checklist before starting migration

  • Understand why you’re migrating
  • Map out a clear plan
  • Regaining knowledge
  • Risk reduction & ensuring compatibility
  • Network particularities & limiting latencies
  • Get everyone onboard

7 of 19

Why did we build a private cloud?

8 of 19

Cost - A major driving factor

Public clouds are great while growth is unpredictable.

At an inflection point, public clouds stop being cost-effective, so we needed an alternative.

How we made it cost-effective:

SD cloud is built using 100% open-source components.

Analysing some enterprise technologies and calculating operational costs gave us clarity about our idea.

We have a team to build, automate and manage the DC and Cloud Platform.

9 of 19

Other factors...

Performance - one machine, one service; much higher performance, optimized for our own use

Security - advanced enterprise firewall, intrusion detection, DDoS prevention

Data sovereignty - keeping our critical data within boundaries

10 of 19

To Summarize...

11 of 19

Gaining knowledge about infrastructure...

12 of 19

After planning steps

  • Listed all the services and their dependencies in one central place: a YAML file that acts as our infrastructure as code
  • Divided services into smaller groups, keeping tightly dependent services together
  • Made a dependency graph to facilitate the migration by showing dependency between services
  • Kept all datastores in replication mode and ensured that data is in sync continuously
  • Migrated tightly coupled services together according to the business flows
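
The grouping steps above can be sketched in a few lines of Python. This is only an illustration, not the tooling used at Snapdeal: the service names and the INVENTORY dict (standing in for the YAML file) are hypothetical, and migrating dependencies before their dependents is an assumed ordering rule.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical inventory: service name -> services it depends on.
# In practice this would be loaded from the central YAML file.
INVENTORY = {
    "cart": ["catalog", "pricing"],
    "catalog": ["search"],
    "pricing": [],
    "search": [],
    "login": ["accounts"],
    "accounts": [],
}


def dependency_groups(inventory):
    """Group services linked by any dependency edge (direction ignored)."""
    neighbours = {name: set() for name in inventory}
    for name, deps in inventory.items():
        for dep in deps:
            neighbours[name].add(dep)
            neighbours.setdefault(dep, set()).add(name)

    seen, groups = set(), []
    for start in sorted(neighbours):
        if start in seen:
            continue
        # Depth-first walk collects everything reachable from `start`.
        stack, group = [start], set()
        while stack:
            node = stack.pop()
            if node in group:
                continue
            group.add(node)
            stack.extend(neighbours[node] - group)
        seen |= group
        groups.append(sorted(group))
    return groups


def migration_order(inventory, group):
    """Order one group so each service comes after its dependencies."""
    subgraph = {name: inventory.get(name, []) for name in group}
    # Raises graphlib.CycleError if the group has circular dependencies.
    return list(TopologicalSorter(subgraph).static_order())


for group in dependency_groups(INVENTORY):
    print(group, "->", migration_order(INVENTORY, group))
```

Each printed group is a set of services that would move together; a circular dependency surfaces as an error instead of a silent bad ordering.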

13 of 19

Gotchas during migration and our fix to them

-- Security groups were needed to redirect traffic between services running in the old cloud and the new cloud

-- Data was out of sync, so we took a delta and dumped that data onto the new machines

-- Launched machines could not handle the load, so we extended our infrastructure at run time

-- The new systems and applications needed strong monitoring, and some parts of our monitoring were missing. We use EFK and Icinga for monitoring

14 of 19

Our plan for failure... Rollback strategies

If a service migration fails for any reason, we have these rollback strategies:

  1. Traffic redirection - switching where applications point
  2. Database resync - database servers in the old cloud and the new cloud are kept in sync for a few days; if we see any failure, we just point the application back to the old database server
  3. Cooling period - we set a cooling period before shutting down the servers in the old cloud
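
A minimal sketch of how traffic redirection and the cooling period fit together, assuming a per-service record of when the cutover happened. The class, its fields, the endpoint names and the seven-day period are all hypothetical, not taken from our tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class ServiceCutover:
    """Hypothetical record of one service's move to the new cloud."""
    name: str
    old_endpoint: str
    new_endpoint: str
    cooling_period: timedelta
    cut_over_at: Optional[datetime] = None  # None means still on the old cloud

    def cut_over(self, now: datetime) -> None:
        # Traffic redirection: start pointing clients at the new cloud.
        self.cut_over_at = now

    def roll_back(self) -> None:
        # Rollback: point clients at the old cloud again.
        self.cut_over_at = None

    def active_endpoint(self) -> str:
        if self.cut_over_at is not None:
            return self.new_endpoint
        return self.old_endpoint

    def old_cloud_can_shut_down(self, now: datetime) -> bool:
        # Old servers stay up until the cooling period has fully elapsed.
        if self.cut_over_at is None:
            return False
        return now - self.cut_over_at >= self.cooling_period


start = datetime(2017, 1, 1)
svc = ServiceCutover("cart", "cart.old.example", "cart.new.example",
                     cooling_period=timedelta(days=7))
svc.cut_over(start)
print(svc.active_endpoint())
print(svc.old_cloud_can_shut_down(start + timedelta(days=3)))
```

Rollback stays cheap in this model because the old servers and the old database are still running and in sync until the cooling period ends.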

15 of 19

Execution part: technical tools

-- Our YAML files for each individual service (our infrastructure as code source)

-- Dendrite for service discovery (nerve and synapse)

-- SaltStack, Chef (orchestration tools)

-- Git, Jenkins (CI/CD pipeline)

-- Automation scripts

16 of 19

Key Learnings

-- Plan, plan and plan for your cloud migration! This is where projects fail.

-- Understand your services, architecture and their dependencies

-- We created a live service dependency graph to facilitate migration

-- Don’t migrate as-is; fix the problems you never had time to fix

-- Follow strict naming conventions and make sure all launched services are registered with every orchestration tool you use

-- Automate and monitor everything!
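
The registration check from the learnings above could be automated along these lines. This is a sketch under assumptions: the service names and registry contents are made up, and in practice each list would come from the relevant tool's own inventory.

```python
def missing_registrations(launched, registries):
    """Return {tool: sorted services that are launched but not registered there}."""
    launched = set(launched)
    gaps = {}
    for tool, registered in registries.items():
        missing = sorted(launched - set(registered))
        if missing:
            gaps[tool] = missing
    return gaps


# Hypothetical data for illustration only.
launched = ["cart-api", "search-api", "pricing-api"]
registries = {
    "salt": ["cart-api", "search-api", "pricing-api"],
    "chef": ["cart-api"],
    "service-discovery": ["cart-api", "pricing-api"],
}

for tool, services in missing_registrations(launched, registries).items():
    print(f"{tool}: missing {', '.join(services)}")
```

Running a check like this after every launch turns a missed registration into an alert rather than a surprise during cutover.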

17 of 19

But our cloud is hybrid...

We didn’t stop there; we still use the public cloud for these purposes:

-- Disaster recovery: for data backup and recovery

-- New service/company acquisition: newly acquired companies that ran in a public cloud take some time to migrate

-- On-demand: at critical times like Diwali, when we reach maximum capacity and still need to grow

18 of 19

And it’s party time!

After 1.5 years of ups and downs, a few downtimes and challenges, Snapdeal.com is running on its own cloud.

19 of 19

Thank you!