Distributed Marketplace
draft - comments encouraged!
(pretty outdated as of Dec 2014)
Current single datacenter architecture
[Diagram: current single-datacenter architecture. Internet → Zeus (load balancer) → Web servers plus Admin/Cron hosts, backed by Memcache, Rabbit feeding Celery workers, Redis, multiple MySQL servers, Elasticsearch, and the Filesystem.]
Goals in order of priority
Specifically a non-goal:
MySQL
App-level Routing
A synchronized index stores the location of all our data.
Master & Slaves
Index & Slaves
App Server
Master & Slaves
Master & Slaves
Master & Slaves
Index & Slaves
App Server
Where is User 26?
User 26 is in DC2, Shard 7.
Hi DC2, Shard 7. Please give me user 26.
Here is user 26. Please cache it well.
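To make that lookup concrete, here is a minimal sketch of the routing path in Python. The index, shards, and cache are stand-in dicts for whatever synchronized store we actually pick; all names are illustrative, not our real code.

```python
# Stand-ins for the synchronized index, the per-shard databases, and memcache.
INDEX = {("user", 26): ("dc2", 7)}   # object -> (datacenter, shard)
SHARDS = {("dc2", 7): {26: {"id": 26, "name": "example"}}}
CACHE = {}

def fetch_user(user_id):
    key = "user:%d" % user_id
    if key in CACHE:                          # cache hit: no routing needed
        return CACHE[key]
    dc, shard = INDEX[("user", user_id)]      # "Where is User 26?" -> "DC2, Shard 7"
    user = SHARDS[(dc, shard)].get(user_id)   # "Hi DC2, Shard 7. Please give me user 26."
    CACHE[key] = user                         # "Please cache it well."
    return user
```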
App-level Routing
Pros
Cons
Conclusions: This is an expensive change (developer time) but leaves us a lot of options for scaling in the future.
Complete Abstraction of Data Objects
For effective caching and invalidation we need to cache objects, not queries. This is an extensive change to our code, replacing every database query with an equivalent object lookup. At a minimum this means putting another abstraction layer on the ORM; in other places it may replace the ORM completely (particularly where it compiles an object’s data from multiple sources).
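A rough sketch of what such a layer could look like, with hypothetical names; the point is that callers ask for objects by id and never see queries.

```python
# Illustrative object layer: cache-aside reads, explicit invalidation.
class ObjectStore(object):
    def __init__(self, cache, orm):
        self.cache = cache
        self.orm = orm

    def get(self, model, obj_id):
        key = "%s:%s" % (model.__name__, obj_id)
        obj = self.cache.get(key)
        if obj is None:
            # May eventually compile the object from multiple sources,
            # not just one ORM lookup.
            obj = self.orm.get(model, pk=obj_id)
            self.cache.set(key, obj)
        return obj

    def invalidate(self, model, obj_id):
        # Called when the underlying row changes, so we never serve
        # stale objects from cache.
        self.cache.delete("%s:%s" % (model.__name__, obj_id))
```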
Replace APIPinningMiddleware with Remote Flags
We have a naive method of avoiding stale data in APIPinningMiddleware. We should copy Facebook’s improved version of this, where the databases flush the cache. This would require us to build a McSqueal clone (the non-open tool they use to send signals from their databases to memcache).
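For illustration, here is one way such a clone could start out, assuming the python-mysql-replication and python-memcached libraries (our choice for the sketch, not what Facebook uses) and an illustrative table:id key scheme.

```python
# Tail the MySQL binlog and delete the matching cache key on every row change.
import memcache
from pymysqlreplication import BinLogStreamReader
from pymysqlreplication.row_event import (
    DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent)

mc = memcache.Client(["127.0.0.1:11211"])
stream = BinLogStreamReader(
    connection_settings={"host": "db-master", "port": 3306,
                         "user": "repl", "passwd": "secret"},
    server_id=100,  # must be unique among replication clients
    only_events=[DeleteRowsEvent, UpdateRowsEvent, WriteRowsEvent],
    blocking=True)

for event in stream:
    for row in event.rows:
        # Updates carry after_values; inserts and deletes carry values.
        values = row.get("after_values") or row.get("values", {})
        pk = values.get("id")
        if pk is not None:
            mc.delete("%s:%s" % (event.table, pk))  # key scheme is illustrative
```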
Legacy API Support
Older versions of our APIs are currently maintained in the code base with if statements or, in an extreme case, an entirely different method. This is not scalable for developers to work on or QA to test.
Instead of this fragile versioning strategy we should be tagging our API versions in git and releasing them separately. When we finish an API version we tag it 1.1 and ask the ops team to deploy it. Then when we deploy 1.2 it is a completely different system and doesn’t change 1.1 at all, despite potentially massive code changes. This is easier for QA, easier for developers, and if not easier, at least straightforward for ops.
Note that this will require the Complete Abstraction of Data Objects from the previous slide so all the API versions don’t need to track db changes.
Elasticsearch
Memcache
Filesystem
Monolith
Monolith is a service that simply holds all our statistics (in MySQL). Our frontends all cache this in Elasticsearch, so it doesn’t need to be super performant or super resilient.
Worst case: We could get by with a single master cluster in a single DC and just mirror this out everywhere else read-only. If we have a DC go down, we can manually bring this back up elsewhere. Stats are only calculated every 24 hours and we can backfill any missing days from the log files.
Better case: What if we wanted stats faster than every 24 hours? The load balancers tie off log files every hour, I think - what if we processed those every hour (see the sketch at the end of this section)? That doesn’t seem to change the worst case plan; it would still work fine...
@robhudson - thoughts on above?
Open Question: How do multiple data centers affect how we collect metrics? Currently we parse log files the next day. < Do we still parse log files? I thought it all came from Google Analytics or the database these days.
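For illustration, a toy sketch of the hourly processing idea; the combined log format and the path/status tally are assumptions, and backfilling a missed day is just re-running this over that day’s hourly files.

```python
# Tally successful hits per path from one hour's access log.
import collections
import re

LINE = re.compile(r'"[A-Z]+ (?P<path>\S+) HTTP/1\.\d" (?P<status>\d{3})')

def aggregate(log_path):
    hits = collections.Counter()
    with open(log_path) as fh:
        for line in fh:
            m = LINE.search(line)
            if m and m.group("status") == "200":
                hits[m.group("path")] += 1
    return hits  # caller writes these into Monolith's MySQL store
```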
Redis
We use Redis to keep track of lists of apps because it can append() without sending the whole list over the network. We originally tried this with memcache and saturated the network.
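For example, with redis-py (the list key name is illustrative):

```python
# RPUSH sends only the new element over the wire; the memcache equivalent
# has to fetch and rewrite the entire list on every append.
import redis

r = redis.StrictRedis(host="localhost", port=6379)

def record_app(region, app_id):
    r.rpush("apps:%s" % region, app_id)       # O(1), one small packet

def apps_for(region):
    return r.lrange("apps:%s" % region, 0, -1)

# The memcache version we abandoned (illustrative):
#   apps = mc.get(key) or []    # whole list over the network
#   apps.append(app_id)
#   mc.set(key, apps)           # whole list back again
```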
One thing I think I’ve learned: no one runs Redis with slaves. All these stories of people weeping with delight at how incredible and unstoppably scalable Redis is - I’m pretty sure they run straight Redis, not a master/slave configuration. Perhaps we should do the same.
Any solutions we come up with here could leverage the messaging service to send invalidation lists between DCs, so worst case, we continue to use Redis as it is today.
Rabbit / Celery
tbd
Payments
Data about who has purchased what is stored in zamboni and solitude, with the latter being the definitive source.
When a new receipt needs to be issued, zamboni asks trunion to sign it (a sketch of this flow is below).
I’d need to know what consistency zamboni’s MySQL has to see if it needs to be “stronger” than that.
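For illustration only, a sketch of the issuing flow assuming a hypothetical trunion signing endpoint; the URL, payload fields, and response shape here are guesses, not trunion’s actual API.

```python
# Hypothetical receipt-issuing flow: zamboni records the purchase
# (solitude is the definitive source), then asks trunion to sign.
import time
import requests

def issue_receipt(user_id, app_id):
    receipt = {
        "typ": "purchase-receipt",
        "user": {"type": "directed-identifier", "value": str(user_id)},
        "product": {"url": "https://marketplace.firefox.com",
                    "storedata": "id=%d" % app_id},
        "iat": int(time.time()),
    }
    resp = requests.post("https://trunion.example.com/sign",  # hypothetical URL
                         json=receipt, timeout=5)
    resp.raise_for_status()
    return resp.json()["receipt"]  # signed receipt (assumed response shape)
```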
Notes
Extra Slides
Google’s analysis of data storage engines across multiple data centers (2009). Today they offer both master/slave (M/S) and Paxos (Paxos is the default).
Current Datacenter Options
Mozilla
AWS (details)
Service Breakdown
[Chart: services plotted on two axes - SLA (downtime is ok ← → uptime is critical) and Traffic (just a little ← → lots). Services shown: Search, Payments, Catalog, Ratings, App Submission, Abuse Reports, App Review, My Apps, Feedback, Metrics, FxA, Installation, Admin Tools.]
Backups
Database: 5 year retention. details
File System: NFS, backed up incrementally daily and fully each month to tape, plus 4 hour and 24 hour snapshots. Once we move to AWS this will change.
Geographic Segmentation
This is an open conversation I’m having with legal and our stakeholders. It’s still unclear how much this would benefit us specifically for privacy (it would certainly help for performance, but that’s already a goal taken into account in the rest of this document) and where we would draw the line. Some scenarios:
App-level Routing Bonus Slide (with sharding)
[Diagram: DC 1 through DC N, each holding a copy of the Global Data plus its own Regional Data Shards.]
We would need to divide application data into Global or Regional. Global data examples would be Apps, Categories, Prices. Regional data examples would be Users, Logs.
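A sketch of what that classification could look like in code; the model names mirror the examples above, everything else is illustrative.

```python
# Global data is replicated to every DC; regional data lives in its home DC.
GLOBAL_MODELS = {"App", "Category", "Price"}
REGIONAL_MODELS = {"User", "Log"}

def read_from(model, local_dc, home_dc):
    """Pick the DC a read should go to."""
    if model in GLOBAL_MODELS:
        return local_dc   # every DC has a full copy of global data
    if model in REGIONAL_MODELS:
        return home_dc    # regional data is served by its home shard
    raise ValueError("unclassified model: %s" % model)

# e.g. read_from("App", "dc1", "dc2") -> "dc1" (local replica)
#      read_from("User", "dc1", "dc2") -> "dc2" (user's home DC)
```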