Distributed storage with Shameless
Scaling rate storage at HotelTonight
Olek Janiszewski
@exviva
About me
twitter.com/exviva
github.com/exviva
HotelTonight
world’s first mobile-only last-minute booking app
Hotels
40% rooms unsold
Customers
20-70% discounts
make the world more spontaneous
spontaneity brings
technical challenges
Simplicity
Perishable inventory
The problem
checkin_date
stay_length
hotel_id
price
discount_type
discount_amount
...
CREATE TABLE rates (
  hotel_id INT,
  checkin_date DATE,
  stay_length INT,
  room_type VARCHAR,
  rate_plan VARCHAR,
  price DECIMAL,
  ...
)
rates
hotel_id | checkin_date | stay_length | price | ...
3 | 2016-09-03 | 1 | 90.00 | ...
4 | 2016-09-04 | 2 | 170.00 | ...
Issues with a single table solution
Goals
The solution
Shameless
Append Only
Sharded & Partitioned
rates_01
index_01
rates_02
index_02
rates_03
index_03
rates_04
index_04
partition_01
rates_13
index_13
rates_14
index_14
rates_15
index_15
rates_16
index_16
partition_04
...
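The layout in the diagram (16 shards spread over 4 partitions, four shards each) can be sketched as two small lookup functions. This is only an illustration under assumptions: the constant names, the CRC32 hashing of string keys, and the `table_name` helper are hypothetical and are not taken from Shameless itself.

```ruby
require 'zlib'

# Assumed layout, read off the diagram: 16 shards over 4 partitions.
SHARDS_COUNT = 16
PARTITIONS_COUNT = 4
SHARDS_PER_PARTITION = SHARDS_COUNT / PARTITIONS_COUNT

# Map a key (numeric hotel_id or string UUID) to a shard number in 1..16.
# Hashing strings with CRC32 is an illustrative choice, not Shameless's rule.
def find_shard(key)
  numeric = key.is_a?(Integer) ? key : Zlib.crc32(key.to_s)
  (numeric % SHARDS_COUNT) + 1
end

# Shards are assigned to partitions in contiguous blocks of four.
def find_partition(shard_id)
  ((shard_id - 1) / SHARDS_PER_PARTITION) + 1
end

# Zero-padded table name (hypothetical helper), e.g. "rates_01".
def table_name(prefix, shard_id)
  format('%s_%02d', prefix, shard_id)
end
```

With this mapping, shards 1 to 4 land on partition 1 and shards 13 to 16 on partition 4, matching the diagram. Because the shard is derived from the key alone, any application server can locate a row without consulting a central directory.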
Schemaless “schema”
CREATE TABLE rates_000245 (
  uuid VARCHAR,
  version INT,
  body TEXT
)
Example rate body
MessagePack.pack(
  hotel_id: 5,
  checkin_date: "2016-03-08",
  stay_length: 3,
  price: 355.55,
  discount: 18.0,
  nightly_rates: [ {...}, {...}, {...} ],
  # ...
)
How do we query MessagePack?
Index Tables
Example index
CREATE TABLE primary_index_000245 (
  hotel_id INT,
  checkin_date DATE,
  stay_length INT,
  uuid VARCHAR
)
rates
hotel_id | checkin_date | stay_length | price | ...
1781 | 2016-09-03 | 1 | 90.00 | ...
2365 | 2016-09-04 | 2 | 170.00 | ...
primary_index_000317
hotel_id | checkin_date | stay_length | uuid
2365 | 2016-09-04 | 2 | c88dfa...
primary_index_000245
hotel_id | checkin_date | stay_length | uuid
1781 | 2016-09-03 | 1 | 1fec09...
rates_000098
uuid | version | body
1fec09... | 1 | msgpack(h, cd, sl, price, ...)
Show me the Ruby!
First some pseudocode
Writing
require 'securerandom'

def write(body)
  uuid = SecureRandom.uuid
  write_content(uuid, body)
  write_to_index(uuid, body)
end
Writing - content
def write_content(uuid, body)
  shard_id = find_shard(uuid)
  partition_id = find_partition(shard_id)
  db = DB_CONNECTIONS[partition_id]
  content_table = db["rates_#{shard_id}"]

  content_table.insert(
    uuid: uuid,
    version: 1,
    body: serialize(body)) # serialize to msgpack
end
Writing - index
def write_to_index(uuid, body)
  # The index is sharded on hotel_id, the content on uuid
  shard_id = find_shard(body[:hotel_id])
  partition_id = find_partition(shard_id)
  db = DB_CONNECTIONS[partition_id]
  primary_index = db["primary_index_#{shard_id}"]

  primary_index.insert(
    hotel_id: body[:hotel_id],
    checkin_date: body[:checkin_date],
    stay_length: body[:stay_length],
    uuid: uuid)
end
Querying
def find(hotel_id, checkin_date, stay_length)
  uuids = find_rate_uuids(hotel_id, checkin_date, stay_length)
  uuids.map { |u| find_latest_rate(u) }
end
Querying - index
def find_rate_uuids(hotel_id, checkin_date, stay_length)
  shard_id = find_shard(hotel_id)
  partition_id = find_partition(shard_id)
  db = DB_CONNECTIONS[partition_id]
  primary_index = db["primary_index_#{shard_id}"]

  # Filter on the method arguments and return only the uuids
  primary_index.where(
    hotel_id: hotel_id,
    checkin_date: checkin_date,
    stay_length: stay_length).map { |row| row[:uuid] }
end
Querying - content
def find_latest_rate(uuid)
  shard_id = find_shard(uuid)
  partition_id = find_partition(shard_id)
  db = DB_CONNECTIONS[partition_id]
  content_table = db["rates_#{shard_id}"]

  # Append-only: the row with the highest version is the current one
  row = content_table.where(uuid: uuid).
    order(:version).last

  deserialize(row[:body]) # deserialize from msgpack
end
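The write and query paths above can be tried without a database. The following is a minimal in-memory sketch, not the Shameless implementation: plain arrays stand in for the per-shard tables, `Marshal` stands in for MessagePack so that only the standard library is needed, and `SHARDS_COUNT`, `CONTENT_TABLES`, `INDEX_TABLES`, `append_version`, and the sharding rule in `find_shard` are all hypothetical names and choices made for illustration.

```ruby
require 'securerandom'

SHARDS_COUNT = 4 # hypothetical, kept small for illustration

# In-memory stand-ins for the per-shard content and index tables.
CONTENT_TABLES = Hash.new { |h, shard| h[shard] = [] }
INDEX_TABLES   = Hash.new { |h, shard| h[shard] = [] }

# Illustrative rule: integers shard by modulo, UUIDs by their leading hex digits.
def find_shard(key)
  numeric = key.is_a?(Integer) ? key : key.delete('-')[0, 8].to_i(16)
  numeric % SHARDS_COUNT
end

# Append-only: an update never modifies a row, it adds one with version + 1.
def append_version(uuid, body)
  table = CONTENT_TABLES[find_shard(uuid)]
  latest = table.select { |row| row[:uuid] == uuid }.map { |row| row[:version] }.max || 0
  table << { uuid: uuid, version: latest + 1, body: Marshal.dump(body) }
end

# First write: store the content (sharded on uuid), then the index row
# (sharded on hotel_id) that points back at the uuid.
def write(body)
  uuid = SecureRandom.uuid
  append_version(uuid, body)
  index_row = body.slice(:hotel_id, :checkin_date, :stay_length).merge(uuid: uuid)
  INDEX_TABLES[find_shard(body[:hotel_id])] << index_row
  uuid
end

# Read the highest version stored for a uuid.
def find_latest_rate(uuid)
  rows = CONTENT_TABLES[find_shard(uuid)].select { |row| row[:uuid] == uuid }
  Marshal.load(rows.max_by { |row| row[:version] }[:body])
end

# Two-step lookup: index table first, then content table per uuid.
def find(hotel_id, checkin_date, stay_length)
  INDEX_TABLES[find_shard(hotel_id)]
    .select do |row|
      row[:hotel_id] == hotel_id &&
        row[:checkin_date] == checkin_date &&
        row[:stay_length] == stay_length
    end
    .map { |row| find_latest_rate(row[:uuid]) }
end
```

Calling `write` once and `append_version` once for the same uuid leaves two content rows and one index row; `find` then returns only the body of the newer version. A real store would also need to handle the case where the content insert succeeds and the index insert fails, which this sketch ignores.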
gem 'shameless'
Initialization
# config/initializers/rate_store.rb

RateStore = Shameless::Store.new(:rate_store) do |c|
  c.partition_urls = [
    ENV['RATE_STORE_DATABASE_URL_0'],
    ENV['RATE_STORE_DATABASE_URL_1']
  ]
  # total number of shards across all partitions
  c.shards_count = 512
end
Defining models
# app/models/rate.rb

class Rate
  RateStore.attach(self)

  # ...
end
Defining indices
class Rate
  RateStore.attach(self)

  index do
    integer :hotel_id
    string :room_type
    string :check_in_date

    shard_on :hotel_id # required, values need to be numeric
  end
end
Writing - first insert
# All index fields are required,
# the rest is the schemaless content
rate = Rate.put(hotel_id: 1,
                room_type: '1 bed',
                check_in_date: Date.today,
                discount_type: 'geo',
                net_price: 120.0) # potentially ~20 fields

rate[:hotel_id] # => 1
rate[:net_price] # => 120.0
Writing - inserting a new version
rate[:net_price] = 130.0
rate.ref_key # => 0
rate.save
rate.ref_key # => 1
Querying
rates = Rate.where(hotel_id: 1,
room_type: '1 bed',
check_in_date: Date.today)
Creating tables
RateStore.create_tables!
The stats
Trade-offs
Inspirations
Thank you!
Questions?
Olek Janiszewski
@exviva
20€ off first booking with invite code OLEK
(audience downloads app)