1 of 54

Distributed storage with Shameless

Scaling rate storage at HotelTonight

Olek Janiszewski

@exviva

2 of 54

About me

  • Ruby ♡ since 2007
  • Berlin ♡ since 2011
  • Platform Engineer at HotelTonight
  • Into scalability, machine learning
  • Wannabe Red Hot Chili Pepper

twitter.com/exviva

github.com/exviva

3 of 54

HotelTonight

world’s first mobile-only last-minute booking app

4 of 54

Hotels

40% rooms unsold

Customers

20-70% discounts

5 of 54

make the world more spontaneous

6 of 54

spontaneity brings

technical challenges

7 of 54

Simplicity

  • few taps and a swipe
  • curated list of hotels
  • fast

8 of 54

Perishable inventory

  • availability and pricing change constantly
  • real-time
  • book up to 7 days in advance
  • book up to 5 nights
  • inventory grows by 100% YoY

9 of 54

The problem

10 of 54

11 of 54

checkin_date

stay_length

hotel_id

price

discount_type

discount_amount

...

12 of 54

CREATE TABLE rates (
  hotel_id INT,
  checkin_date DATE,
  stay_length INT,
  room_type VARCHAR,
  rate_plan VARCHAR,
  price DECIMAL,
  ...
)

13 of 54

rates

hotel_id  checkin_date  stay_length  price   ...
3         2016-09-03    1            90.00   ...
4         2016-09-04    2            170.00  ...

14 of 54

Issues with a single table solution

  • single table, single db
  • long, scary migrations
  • write-heavy

15 of 54

Goals

  • scale storage to handle any foreseeable future amount of data
  • consistent read and write performance
  • no migration anxiety

16 of 54

The solution

Shameless

17 of 54

Shameless

  • append-only
  • schemaless
  • sharded & partitioned

18 of 54

Shameless

  • built on top of Sequel (we’re using MySQL)
  • data stored in schemaless columns
  • “indices” are stored in separate tables

19 of 54

Append Only

  • only allows insert operations, no updates or deletes
  • all data is written to the end of the table
  • possible to get a snapshot of a point in time
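
The three properties above can be sketched with a small in-memory log. This is only an illustration under stated assumptions: `AppendOnlyLog`, its methods, the `written_at` field and the integer timestamps are hypothetical names chosen for the example, not part of Shameless.

```ruby
# A minimal in-memory sketch of an append-only, versioned store.
# AppendOnlyLog and its methods are hypothetical, not the Shameless API.
class AppendOnlyLog
  Row = Struct.new(:uuid, :version, :body, :written_at)

  def initialize
    @rows = [] # rows are only ever appended, never updated or deleted
  end

  # Appends a new version for the uuid and returns its version number.
  def append(uuid, body, written_at)
    version = @rows.count { |row| row.uuid == uuid } + 1
    @rows << Row.new(uuid, version, body.dup.freeze, written_at)
    version
  end

  # Latest body for a uuid, optionally as of a point in time.
  def latest(uuid, as_of: nil)
    candidates = @rows.select { |row| row.uuid == uuid }
    candidates = candidates.select { |row| row.written_at <= as_of } if as_of
    row = candidates.max_by(&:version)
    row && row.body
  end
end

log = AppendOnlyLog.new
log.append('rate-a', { price: 90.0 }, 100) # written at (made-up) time 100
log.append('rate-a', { price: 95.0 }, 200) # "update" is just another insert

p log.latest('rate-a')             # the newest version
p log.latest('rate-a', as_of: 150) # the snapshot as of time 150
```

Because old versions are never overwritten, a point-in-time snapshot is only a filter on the write time.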

20 of 54

Sharded & Partitioned

  • n shards (tables)
  • across m partitions (databases)

partition_01
  rates_01  index_01
  rates_02  index_02
  rates_03  index_03
  rates_04  index_04

...

partition_04
  rates_13  index_13
  rates_14  index_14
  rates_15  index_15
  rates_16  index_16
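
One way to route a key to a shard and a partition, matching the 16-shard, 4-partition layout in the diagram. The modulo scheme, the helper names and the UUID-to-integer conversion are assumptions made for this sketch; the slides do not say how the mapping is actually computed.

```ruby
# Hypothetical routing helpers; the modulo scheme is an assumption.
SHARDS_COUNT = 16
PARTITIONS_COUNT = 4
SHARDS_PER_PARTITION = SHARDS_COUNT / PARTITIONS_COUNT

# Maps a numeric key (e.g. a hotel_id) to a shard number in 0...SHARDS_COUNT.
def find_shard(key)
  key % SHARDS_COUNT
end

# Maps a UUID string to a shard by reading its hex digits as an integer.
def find_shard_for_uuid(uuid)
  find_shard(uuid.delete('-').to_i(16))
end

# Consecutive shards live in the same partition (database).
def find_partition(shard)
  shard / SHARDS_PER_PARTITION
end

# Table names use 1-based, zero-padded suffixes, as in the diagram.
def table_name(prefix, shard)
  format('%s_%02d', prefix, shard + 1)
end

shard = find_shard(1781)
puts table_name('rates', shard)
puts format('partition_%02d', find_partition(shard) + 1)
```

With this scheme any process can compute a row's table and database from the key alone, without a lookup service.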

21 of 54

Schemaless “schema”

CREATE TABLE rates_000245 (
  uuid VARCHAR,
  version INT,
  body TEXT
)

22 of 54

Example rate body

MessagePack.pack(
  hotel_id: 5,
  checkin_date: "2016-03-08",
  stay_length: 3,
  price: 355.55,
  discount: 18.0,
  nightly_rates: [ {...}, {...}, {...} ],
  # ...
)

23 of 54

CREATE TABLE rates_000245 (
  uuid VARCHAR,
  version INT,
  body TEXT
)

MessagePack.pack(
  hotel_id: 5,
  checkin_date: "2016-03-08",
  stay_length: 3,
  price: 355.55,
  discount: 18.0,
  nightly_rates: [ {...}, {...}, {...} ],
  # ...
)

24 of 54

How do we query MessagePack?

25 of 54

Index Tables

  • separate table per query type
  • allows us to index against the contents of the body
  • maps index fields to rate UUIDs

26 of 54

Example index

CREATE TABLE primary_index_000245 (
  hotel_id INT,
  checkin_date DATE,
  stay_length INT,
  uuid VARCHAR
)

27 of 54

Index Tables

  • one query to Shameless = 1+n SQL queries
  • 1 query to index
  • n queries for n rates
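
The 1+n pattern can be made visible by counting lookups against in-memory stand-ins for the two kinds of table. `QueryCounter`, `INDEX`, `CONTENT`, `find_rates` and the sample rows are all made up for this sketch; real lookups would be SQL queries.

```ruby
# In-memory stand-ins for an index table and a content table.
# QueryCounter and the data below are illustrative, not Shameless code.
class QueryCounter
  attr_reader :count

  def initialize
    @count = 0
  end

  # Counts one "query" and returns the block's result.
  def run
    @count += 1
    yield
  end
end

INDEX = [
  { hotel_id: 1, checkin_date: '2016-09-03', stay_length: 1, uuid: 'a' },
  { hotel_id: 1, checkin_date: '2016-09-03', stay_length: 1, uuid: 'b' },
  { hotel_id: 2, checkin_date: '2016-09-04', stay_length: 2, uuid: 'c' }
].freeze

CONTENT = {
  'a' => { price: 90.0 },
  'b' => { price: 110.0 },
  'c' => { price: 170.0 }
}.freeze

def find_rates(counter, criteria)
  # 1 query against the index to collect the matching UUIDs
  uuids = counter.run do
    INDEX.select { |row| criteria.all? { |key, value| row[key] == value } }
         .map { |row| row[:uuid] }
  end
  # n queries against the content table, one per UUID
  uuids.map { |uuid| counter.run { CONTENT.fetch(uuid) } }
end

counter = QueryCounter.new
rates = find_rates(counter, { hotel_id: 1, checkin_date: '2016-09-03', stay_length: 1 })
puts rates.size    # number of matching rates (n)
puts counter.count # 1 index query + n content queries
```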

28 of 54

rates

hotel_id  checkin_date  stay_length  price   ...
3         2016-09-03    1            90.00   ...
4         2016-09-04    2            170.00  ...

29 of 54

rates

hotel_id  checkin_date  stay_length  price   ...
1781      2016-09-03    1            90.00
2365      2016-09-04    2            170.00

primary_index_000317

hotel_id  checkin_date  stay_length  uuid
2365      2016-09-04    2            c88dfa...

primary_index_000245

hotel_id  checkin_date  stay_length  uuid
1781      2016-09-03    1            1fec09...

30 of 54

primary_index_000245

hotel_id  checkin_date  stay_length  uuid
1781      2016-09-03    1            1fec09...

rates_000098

uuid       version  body
1fec09...  1        msgpack(h, cd, sl, price, ...)

31 of 54

rates

hotel_id  checkin_date  stay_length  price   ...
1781      2016-09-03    1            90.00

primary_index_000245

hotel_id  checkin_date  stay_length  uuid
1781      2016-09-03    1            1fec09...

rates_000098

uuid       version  body
1fec09...  1        msgpack(h, cd, sl, pr, ...)

32 of 54

Show me the Ruby!

33 of 54

First some pseudocode

34 of 54

Writing

def write(body)
  uuid = SecureRandom.uuid
  write_content(uuid, body)
  write_to_index(uuid, body)
end

35 of 54

Writing - content

def write_content(uuid, body)
  shard_id = find_shard(uuid)
  partition_id = find_partition(shard_id)
  db = DB_CONNECTIONS[partition_id]
  content_table = db["rates_#{shard_id}"]

  content_table.insert(
    uuid: uuid,
    version: 1,
    body: serialize(body) # serialize to msgpack
  )
end

36 of 54

Writing - index

def write_to_index(uuid, body)
  shard_id = find_shard(body[:hotel_id])
  partition_id = find_partition(shard_id)
  db = DB_CONNECTIONS[partition_id]
  primary_index = db["primary_index_#{shard_id}"]

  primary_index.insert(
    hotel_id: body[:hotel_id],
    checkin_date: body[:checkin_date],
    stay_length: body[:stay_length],
    uuid: uuid
  )
end

37 of 54

Querying

def find(hotel_id, checkin_date, stay_length)
  uuids = find_rate_uuids(hotel_id, checkin_date, stay_length)
  uuids.map { |u| find_latest_rate(u) }
end

38 of 54

Querying - index

def find_rate_uuids(hotel_id, checkin_date, stay_length)
  shard_id = find_shard(hotel_id)
  partition_id = find_partition(shard_id)
  db = DB_CONNECTIONS[partition_id]
  primary_index = db["primary_index_#{shard_id}"]

  # filter on the method arguments and return only the uuid column
  primary_index.where(
    hotel_id: hotel_id,
    checkin_date: checkin_date,
    stay_length: stay_length
  ).select_map(:uuid)
end

39 of 54

Querying - content

def find_latest_rate(uuid)
  shard_id = find_shard(uuid)
  partition_id = find_partition(shard_id)
  db = DB_CONNECTIONS[partition_id]
  content_table = db["rates_#{shard_id}"]

  row = content_table.where(uuid: uuid).order(:version).last

  deserialize(row[:body]) # deserialize from msgpack
end
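
The pseudocode above leans on database connections and helpers that are not shown. As a rough, runnable approximation, the sketch below swaps the SQL tables for plain Ruby hashes and skips serialization. `DEMO_SHARDS`, `shard_for_uuid`, `shard_for_hotel` and the modulo-based shard choice are assumptions for illustration. It highlights that content is sharded by UUID while the index is sharded by `hotel_id`, so one lookup may touch different shards.

```ruby
require 'securerandom'

DEMO_SHARDS = 4 # deliberately small; hypothetical value

# Hash-backed stand-ins for per-shard tables, keyed by shard number.
CONTENT_TABLES = Hash.new { |tables, shard| tables[shard] = [] }
INDEX_TABLES   = Hash.new { |tables, shard| tables[shard] = [] }

def shard_for_uuid(uuid)
  uuid.delete('-').to_i(16) % DEMO_SHARDS
end

def shard_for_hotel(hotel_id)
  hotel_id % DEMO_SHARDS
end

def write(body)
  uuid = SecureRandom.uuid
  # content rows are sharded by uuid
  CONTENT_TABLES[shard_for_uuid(uuid)] << { uuid: uuid, version: 1, body: body }
  # index rows are sharded by hotel_id
  INDEX_TABLES[shard_for_hotel(body[:hotel_id])] << {
    hotel_id: body[:hotel_id],
    checkin_date: body[:checkin_date],
    stay_length: body[:stay_length],
    uuid: uuid
  }
  uuid
end

def find(hotel_id, checkin_date, stay_length)
  wanted = [hotel_id, checkin_date, stay_length]
  # 1 lookup in the index shard chosen by hotel_id
  uuids = INDEX_TABLES[shard_for_hotel(hotel_id)]
          .select { |row| row.values_at(:hotel_id, :checkin_date, :stay_length) == wanted }
          .map { |row| row[:uuid] }
  # n lookups in the content shards chosen by uuid, keeping the latest version
  uuids.map do |uuid|
    versions = CONTENT_TABLES[shard_for_uuid(uuid)].select { |row| row[:uuid] == uuid }
    versions.max_by { |row| row[:version] }[:body]
  end
end

write({ hotel_id: 1781, checkin_date: '2016-09-03', stay_length: 1, price: 90.0 })
write({ hotel_id: 2365, checkin_date: '2016-09-04', stay_length: 2, price: 170.0 })
p find(1781, '2016-09-03', 1)
```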

40 of 54

gem 'shameless'

41 of 54

Initialization

# config/initializers/rate_store.rb

RateStore = Shameless::Store.new(:rate_store) do |c|
  c.partition_urls = [
    ENV['RATE_STORE_DATABASE_URL_0'],
    ENV['RATE_STORE_DATABASE_URL_1']
  ]

  # total number of shards across all partitions
  c.shards_count = 512
end

42 of 54

Defining models

# app/models/rate.rb

class Rate
  RateStore.attach(self)

  # ...
end

43 of 54

Defining indices

class Rate
  RateStore.attach(self)

  index do
    integer :hotel_id
    string :room_type
    string :check_in_date

    shard_on :hotel_id # required, values need to be numeric
  end
end

44 of 54

Writing - first insert

# All index fields are required,
# the rest is the schemaless content
rate = Rate.put(hotel_id: 1,
                room_type: '1 bed',
                check_in_date: Date.today,
                discount_type: 'geo',
                net_price: 120.0) # potentially ~20 fields

rate[:hotel_id] # => 1
rate[:net_price] # => 120.0

45 of 54

Writing - inserting a new version

rate[:net_price] = 130.0

rate.ref_key # => 0
rate.save
rate.ref_key # => 1

46 of 54

Querying

rates = Rate.where(hotel_id: 1,
                   room_type: '1 bed',
                   check_in_date: Date.today)

47 of 54

Creating tables

RateStore.create_tables!

48 of 54

The stats

49 of 54

  • 1600 writes/sec
  • Up to 3000 reads/sec

50 of 54

  • 2 ms read latency
  • 5 ms write latency

51 of 54

Trade-offs

  • transactions
  • parallel writes
  • no JOINs
  • no aggregations
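
One reading of "no aggregations" is that anything like MIN or GROUP BY has to happen in application code after the rates are fetched. The sketch below assumes that reading; the sample rates and the `cheapest_by_hotel` name are made up for illustration.

```ruby
# Rates as plain hashes, standing in for records already fetched from the store.
rates = [
  { hotel_id: 1, room_type: '1 bed',  net_price: 120.0 },
  { hotel_id: 1, room_type: '2 beds', net_price: 150.0 },
  { hotel_id: 2, room_type: '1 bed',  net_price: 95.0 }
]

# Application-side equivalent of:
#   SELECT hotel_id, MIN(net_price) ... GROUP BY hotel_id
cheapest_by_hotel = rates
  .group_by { |rate| rate[:hotel_id] }
  .transform_values { |hotel_rates| hotel_rates.map { |rate| rate[:net_price] }.min }

p cheapest_by_hotel
```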

52 of 54

Inspirations

  • Uber
  • Pinterest
  • FriendFeed

53 of 54

54 of 54

Thank you!

Questions?

Olek Janiszewski

@exviva

20€ off first booking with invite code OLEK

(audience downloads app)