Love cloud cost savings? Let’s talk!
rolland @ rubrik
For the next 30 mins or so …
Efforts of 3 engineers (+ yours truly) reduced costs by 50% in under 3 months
Some answers to:
“How many people do I need? How much time do I spend?”
“Do I need to re-write my stuff?” “Will I break my SLOs?”
“Is this even relevant for me?”
About us …
Rubrik’s products & solutions
Team mission & goals
Services & capabilities offered
Also see Proactive, Real-time Monitoring and Alerting for Customer Engagement
Workflow under consideration
Pipeline #1: custom code for data prep, extractors
Pipeline #2: generic extract & load
Yeah, these are ETL pipelines
Where’s the problem?
More data for decisions + Business growth (Yes, it’s always scale!)
Where were we heading? “Can you reduce costs by 90%?”
“What’s the plan?” Measure, analyze, correct (repeat)
Cost analysis tools: CloudHealth + something custom (why?)
Cost observability
Cost attribution
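Cost attribution boils down to rolling billing line items up by an ownership tag. A minimal sketch of the idea (the row shape and tag values here are hypothetical, not from the talk; real tooling like CloudHealth works off provider billing exports):

```python
from collections import defaultdict

def attribute_costs(cost_rows):
    """Roll billing rows up to per-owner totals.

    Each row is (resource_id, owner_tag, usd_cost); untagged
    resources land in an 'unattributed' bucket so tagging gaps
    stay visible instead of silently disappearing.
    """
    totals = defaultdict(float)
    for resource_id, owner_tag, usd in cost_rows:
        totals[owner_tag or "unattributed"] += usd
    return dict(totals)

rows = [
    ("i-1", "pipeline-1", 12.0),
    ("i-2", "pipeline-2", 30.0),
    ("i-3", None, 5.0),
    ("i-4", "pipeline-1", 8.0),
]
print(attribute_costs(rows))
# {'pipeline-1': 20.0, 'pipeline-2': 30.0, 'unattributed': 5.0}
```

The “unattributed” bucket is the useful part: a large number there means the tags, not the pipelines, are the first thing to fix.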
Problems, insights & solutions
Data not used? No ROI => stop processing!
Pipeline #1: Extractor compute => waste => pre-filter data
Pipeline #2: Input size skew => utilization => bucketize
Reduced cost-dominant resource: compute (doh!)
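The skew fix for Pipeline #2 can be sketched as a greedy bucketizer (names and the byte budget are hypothetical; it assumes each input’s size is known before scheduling): pack inputs into batches of roughly equal total size, so one giant file no longer leaves every other worker idle.

```python
def bucketize(sizes, target_bytes):
    """Greedily pack input sizes into buckets of ~target_bytes each.

    Oversized inputs get a bucket of their own; small inputs are
    coalesced, so per-bucket work (and worker utilization) stays even.
    """
    buckets, current, current_total = [], [], 0
    for size in sorted(sizes, reverse=True):  # biggest first
        if current and current_total + size > target_bytes:
            buckets.append(current)           # close the full bucket
            current, current_total = [], 0
        current.append(size)
        current_total += size
    if current:
        buckets.append(current)
    return buckets

# The 9 GB file rides alone; the small ones share buckets.
print(bucketize([9, 1, 2, 3, 1, 4], target_bytes=10))
# [[9], [4, 3, 2, 1], [1]]
```

The same shape works for the pre-filter fix in Pipeline #1: drop rows the extractors never read before they reach compute, then bucketize what remains.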
Where did we end up ?
[Chart: compute-vs-storage $$$ split shifted from 70:30 to 45:55]
What else is possible ?
Flexible processing (e.g. on-demand) vs. every upload “event”
Data reduction at source
Pipeline #1: generic “declarative” extractor
Infra changes & tuning: e.g. spot nodes
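The “on-demand vs. every upload event” idea is essentially a batch trigger. A minimal sketch under assumed semantics (class and thresholds are hypothetical): buffer upload events and launch one pipeline run when the batch is big enough or old enough, instead of one run per upload.

```python
class BatchTrigger:
    """Fire the pipeline after max_events uploads, or once
    max_age_s has passed since the first pending event,
    rather than once per upload."""

    def __init__(self, max_events, max_age_s):
        self.max_events = max_events
        self.max_age_s = max_age_s
        self.pending = []
        self.first_ts = None

    def offer(self, event, now_s):
        """Buffer one event; return a batch when it's time to run."""
        if not self.pending:
            self.first_ts = now_s
        self.pending.append(event)
        if (len(self.pending) >= self.max_events
                or now_s - self.first_ts >= self.max_age_s):
            batch, self.pending = self.pending, []
            return batch
        return None

trigger = BatchTrigger(max_events=3, max_age_s=300)
trigger.offer("u1", now_s=0)             # buffered
trigger.offer("u2", now_s=10)            # buffered
print(trigger.offer("u3", now_s=20))     # ['u1', 'u2', 'u3'] -> one job, not three
```

The age cap bounds data freshness, so the SLO trade-off is explicit: you pick the largest delay your consumers tolerate and bank the compute saved by amortizing startup cost across the batch.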
Learnings
Start quick & dirty: no EMR metrics? No problem
Low hanging fruit: top 5 extractors, top 2 tables
Make it a “team sport” - validate ROI
Preparation? Yes, familiarity with the system helped
Misses? “How do I reduce my extractor costs?”
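“Top 5 extractors, top 2 tables” is the output of a simple cost ranking over whatever billing data is at hand; no fancy tooling is required to find the low-hanging fruit. A quick-and-dirty sketch (job names and figures are made up for illustration):

```python
def top_n(cost_by_job, n=5):
    """Rank jobs by cost with each job's share of total spend,
    so the biggest savings targets are obvious at a glance."""
    total = sum(cost_by_job.values())
    ranked = sorted(cost_by_job.items(), key=lambda kv: kv[1], reverse=True)
    return [(job, usd, round(100 * usd / total, 1)) for job, usd in ranked[:n]]

costs = {
    "extractor-a": 900.0,
    "extractor-b": 50.0,
    "extractor-c": 40.0,
    "extractor-d": 10.0,
}
for job, usd, pct in top_n(costs, n=2):
    print(f"{job}: ${usd} ({pct}%)")
# extractor-a: $900.0 (90.0%)
# extractor-b: $50.0 (5.0%)
```

When one job carries 90% of the spend, optimizing anything else first is wasted effort; that is the whole “low hanging fruit” learning in one line of arithmetic.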
Hope that helped …