1 of 47

Distributed Databases

an exploration of approaches and best practices

Julia Ferraioli

Developer Advocate

Brian Dorsey

Developer Programs Engineer

2 of 47

Your Hosts

Julia Ferraioli

Developer Advocate

@juliaferraioli

Brian Dorsey

Developer Programs Engineer

@briandorsey

3 of 47

Why Distributed Databases?

Image courtesy of Allie Brosh of Hyperbole and a Half

4 of 47

Images by Connie Zhou

5 of 47

Images by Connie Zhou

6 of 47

7 of 47

Your Panelists

Google Cloud Datastore

Tyler Hannan

@tylerhannan

Chris Ramsdale

@cramsdale

Will Shulman

@willshulman

Mike Miller

@mlmilleratmit

8 of 47

Riak: An Open Source, Distributed Key/Value Database

Basho Technologies

Tyler Hannan

9 of 47

About Basho Technologies

Who we are, what we do

  • Founded January 2008
  • ~140 employees worldwide
  • Headquarters in Cambridge, MA with offices in Reston, San Francisco, London, and Tokyo
  • A distributed company building distributed systems
  • Basho makes Riak & Riak CS

10 of 47

What Is Riak?

The Benefits of Riak

Riak is an Ops-friendly database that is:

  • Fault-tolerant
  • Highly available
  • Scalable
  • Self-healing

11 of 47

How Does That Work?

The Properties of a Distributed Database

Riak is a key/value store that is:

  • Open source
  • Distributed
  • Masterless
  • Eventually consistent

12 of 47

Riak is a Key/Value Store

Simple Operations, Opaque Values, Layered with Extras

  • GET / PUT / DELETE
  • Value is mostly opaque
  • HTTP & Protobufs API + Client Libraries
  • Extras:
    • MapReduce
    • Full-text search
    • Secondary indices
    • Pre/post-commit hooks

[Diagram: a Bucket containing Key / Value pairs]
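The GET / PUT / DELETE model above can be sketched in a few lines. This is a toy in-memory illustration, not Riak's client API: the class name `KeyValueStore` and its method names are made up for this example, and values are treated as opaque bytes addressed by a (bucket, key) pair.

```python
from typing import Dict, Optional, Tuple


class KeyValueStore:
    """Hypothetical toy store: opaque values addressed by (bucket, key)."""

    def __init__(self) -> None:
        self._data: Dict[Tuple[str, str], bytes] = {}

    def put(self, bucket: str, key: str, value: bytes) -> None:
        # The store never inspects the value; it only stores the bytes.
        self._data[(bucket, key)] = value

    def get(self, bucket: str, key: str) -> Optional[bytes]:
        # Returns None when the key is absent.
        return self._data.get((bucket, key))

    def delete(self, bucket: str, key: str) -> bool:
        # Returns True only if something was actually removed.
        return self._data.pop((bucket, key), None) is not None


store = KeyValueStore()
store.put("users", "alice", b'{"plan": "free"}')
```

Because the value is opaque, anything richer (search, secondary indices, MapReduce) has to be layered on top, which is why the slide lists those as extras.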

13 of 47

Riak is Masterless

Deployed as a Cluster of Nodes

  • Based on the principles of Amazon's Dynamo paper
  • Any node can serve any request
  • Data and load are spread evenly
  • Gossip protocol (mesh network)
  • Hinted handoff
  • Achieve near-linear scale by adding hardware

[Diagram: a cluster of five Nodes]
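One way to picture how a masterless cluster spreads data so that any node can locate any key is a consistent hash ring. The sketch below shows the generic technique only; it is not Riak's actual partition-claim algorithm. The class `HashRing`, the `vnodes_per_node` parameter, and the node names are assumptions made for illustration.

```python
import bisect
import hashlib
from typing import List


def ring_position(label: str) -> int:
    # Map any string onto a large integer ring using SHA-1.
    return int(hashlib.sha1(label.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Hypothetical consistent hash ring with several virtual nodes per node."""

    def __init__(self, nodes: List[str], vnodes_per_node: int = 8) -> None:
        # Each physical node claims several positions to even out the load.
        self._ring = sorted(
            (ring_position(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes_per_node)
        )
        self._positions = [pos for pos, _ in self._ring]

    def owners(self, key: str, n: int = 3) -> List[str]:
        # Walk clockwise from the key's position, collecting n distinct nodes.
        start = bisect.bisect_right(self._positions, ring_position(key))
        found: List[str] = []
        for offset in range(len(self._ring)):
            node = self._ring[(start + offset) % len(self._ring)][1]
            if node not in found:
                found.append(node)
                if len(found) == n:
                    break
        return found


nodes = ["riak1", "riak2", "riak3", "riak4", "riak5"]
ring = HashRing(nodes)
replicas = ring.owners("users/alice")
```

Every node can compute the same owner list from the key alone, so no master is needed to route a request.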

14 of 47

"Big Data", "Web Scale", "Other Terms"

When Your Data Is Critical, Scalability Is Critical

$ gcutil --project=RiakCluster addinstance \
    riak5 --machine_type=n1-standard-4
$ gcutil --project=RiakCluster ssh riak5

# Install Riak programmatically or via startup script

$ riak-admin cluster join riak1@192.168.2.2
$ riak-admin cluster plan
$ riak-admin cluster commit

15 of 47

When Would I Use Riak on Google Compute?

The situations & the circumstances

Operationally-friendly database

- combined with -

Operationally-scalable compute platform

for gaming, social, mobile, retail, advertising, etc.

16 of 47

Getting to Know Cloudant

Your Friendly Neighborhood NoSQL Database Service

Mike Miller

Co-Founder, Chief Scientist

17 of 47

Ships with a mobile strategy

18 of 47

19 of 47

20 of 47

Google Cloud Datastore

Scale with your users, not your servers

Chris Ramsdale

Product Manager, Google Cloud Platform

21 of 47

Google Cloud Platform Storage

Family of Managed Storage Services

23 of 47

Announcing the Google Cloud Datastore

App Engine High Replication Datastore (HRD)

Fully Managed Schemaless Storage

[Diagram labels: Google Cloud Datastore, HRD, Memcache, Managed Runtimes, Task Queues, HTTP Interface]

24 of 47

Google Cloud Datastore

Bringing Google Infrastructure to Developers

[Diagram: infrastructure stack of API Frontend, Cloud Datastore Service, Megastore, BigTable, Colossus, Networking, Server Hardware]

26 of 47

Google Cloud Datastore

High Availability

  • Auto-replication across multiple datacenters
  • Paxos consensus
  • Strong and Eventual consistency

27 of 47

Google Cloud Datastore

High Scalability

  • Horizontal auto-scaling
  • Huge capacity
  • High durability

28 of 47

Google Cloud Datastore

Access from Anywhere

[Diagram: the infrastructure stack, with access paths labeled Managed Frontend, App Engine SDK, Unmanaged Backend, Cloud Datastore API]

29 of 47

Google Cloud Datastore

Fully Managed

32 of 47

An intro to MongoDB and MongoLab

in < 5 minutes

Will Shulman

CEO MongoLab

33 of 47

What is MongoDB?

34 of 47

MongoDB is an open source, high-performance, distributed, and document-oriented database.

35 of 47

MongoDB is document-oriented

a.k.a. object-oriented

{
  _id: 1234,
  author: { name: "Bob Davis", email: "bob@davis.com" },
  post: "In these troubled times I like to …",
  date: { $date: "2010-07-12 13:23UTC" },
  location: [ -121.2322, 42.1223222 ],
  rating: 2.2,
  comments: [
    { user: "jgs32@gmail.com", upVotes: 22, downVotes: 14, text: "Great point" },
    { user: "holly.lu@gmail.com", upVotes: 421, downVotes: 22, text: "You're a moron" }
  ],
  tags: [ "Politics", "Virginia" ]
}

36 of 47

MongoDB is great as an operational data store

. . . with a rich query language

db.posts.find({ "author.name": "mike" })

db.posts.find({ rating: { $gt: 2 }})

db.posts.find({ tags: "Software" })

db.posts.find().sort({date: -1}).limit(10)

db.places.find({loc: {$within : {$center : [[40,40],10]}}})

db.places.aggregate({$group: { _id: "$state", pop: { $sum: "$pop" }}})
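To make the semantics of some of these queries concrete, here is roughly what they compute, written as plain Python over in-memory lists. The sample documents are invented for illustration, and this is not how MongoDB executes queries; it only mirrors the results of the dotted-path match, the `$gt` filter, the array-membership match on `tags`, the sort with limit, and the `$group` / `$sum` aggregation.

```python
from collections import defaultdict

# Invented sample data, shaped like the post document shown earlier.
posts = [
    {"author": {"name": "mike"}, "rating": 3.5, "tags": ["Software"], "date": "2013-05-01"},
    {"author": {"name": "bob"}, "rating": 1.0, "tags": ["Politics"], "date": "2013-05-03"},
    {"author": {"name": "mike"}, "rating": 2.2, "tags": ["Software", "Cloud"], "date": "2013-05-02"},
]

# find({ "author.name": "mike" }): the dotted path reaches into the subdocument.
by_author = [p for p in posts if p["author"]["name"] == "mike"]

# find({ rating: { $gt: 2 } })
highly_rated = [p for p in posts if p["rating"] > 2]

# find({ tags: "Software" }): a scalar matches if any array element equals it.
tagged = [p for p in posts if "Software" in p["tags"]]

# find().sort({ date: -1 }).limit(10)
latest = sorted(posts, key=lambda p: p["date"], reverse=True)[:10]

# aggregate({ $group: { _id: "$state", pop: { $sum: "$pop" } } })
places = [
    {"state": "CA", "pop": 100},
    {"state": "CA", "pop": 50},
    {"state": "WA", "pop": 30},
]
pop_by_state = defaultdict(int)
for place in places:
    pop_by_state[place["state"]] += place["pop"]
```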

37 of 47

MongoDB is great as an operational data store

. . . with support for indexes on any field

db.posts.ensureIndex({ "author.name": 1 })

db.posts.find({ "author.name": "mike" })
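The benefit of an index on a nested field can be sketched as a lookup table from field value to document positions. This is a deliberate simplification: a hash map stands in for the ordered structure a real database index uses, and `build_index` is a hypothetical helper, not a MongoDB function.

```python
from collections import defaultdict
from typing import Any, Dict, List

# Invented sample documents.
posts = [
    {"author": {"name": "mike"}, "post": "first"},
    {"author": {"name": "bob"}, "post": "second"},
    {"author": {"name": "mike"}, "post": "third"},
]


def build_index(docs: List[dict], path: str) -> Dict[Any, List[int]]:
    """Map each value found at a dotted path to the positions of matching docs."""
    index: Dict[Any, List[int]] = defaultdict(list)
    for position, doc in enumerate(docs):
        value: Any = doc
        for part in path.split("."):
            value = value[part]  # descend one level per path segment
        index[value].append(position)
    return index


author_index = build_index(posts, "author.name")

# With the index, a lookup touches only matching documents instead of scanning all.
mikes_posts = [posts[i] for i in author_index["mike"]]
```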

38 of 47

MongoDB is a distributed database

. . . with high availability via Replica Set clusters

[Diagram: a client, a primary, and secondaries secondary_0 . . . secondary_n, linked by replication]

  • Single master (read / write)
  • Multiple secondaries (read)
  • Automatic failover
  • Strong consistency or eventual consistency
  • Configurable write-concerns
    • w = 1
    • w = 3
    • w = "majority"
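The write-concern values above boil down to a count of acknowledgements a write must collect before it is reported as successful. The sketch below shows that arithmetic under simplifying assumptions: every member counts toward the majority, only `w` values of 1 or more are modeled, and the function names are hypothetical rather than part of any MongoDB driver.

```python
from typing import Union


def required_acks(w: Union[int, str], members: int) -> int:
    """Number of members that must acknowledge a write for a given w."""
    if w == "majority":
        # Strictly more than half of the members.
        return members // 2 + 1
    if isinstance(w, int) and 1 <= w <= members:
        return w
    raise ValueError(f"unsupported write concern {w!r} for {members} members")


def write_acknowledged(w: Union[int, str], members: int, acks_received: int) -> bool:
    # The write is reported successful once enough members have confirmed it.
    return acks_received >= required_acks(w, members)
```

With a three-member replica set, `w = 1` returns as soon as the primary has the write, while `w = "majority"` waits for two members, trading latency for durability across a failover.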

39 of 47

MongoDB is a distributed database

. . . with horizontal scalability via Sharded Clusters

[Diagram: clients connect through mongos routers, which consult config servers config_0, config_1, config_2 and route to shard_0, shard_1 . . . shard_n; each shard is a Replica Set]
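The routing step a query router performs can be sketched as a range lookup on the shard key. This is an illustration of range-based partitioning in general, not mongos internals: the `ShardRouter` class, the split points, and the key values are assumptions chosen for the example.

```python
import bisect
from typing import List


class ShardRouter:
    """Hypothetical router: split points divide the key space into ranges."""

    def __init__(self, split_points: List[str], shards: List[str]) -> None:
        # k split points define k + 1 contiguous ranges, one per shard.
        if len(shards) != len(split_points) + 1:
            raise ValueError("need exactly one more shard than split points")
        self._split_points = sorted(split_points)
        self._shards = list(shards)

    def shard_for(self, shard_key: str) -> str:
        # Each range includes its lower bound and excludes its upper bound.
        return self._shards[bisect.bisect_right(self._split_points, shard_key)]


router = ShardRouter(["h", "p"], ["shard_0", "shard_1", "shard_2"])
target = router.shard_for("mike")
```

Adding capacity then amounts to adding a split point and a shard, which is what makes this form of scaling horizontal.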

40 of 47

What is MongoLab?

41 of 47

MongoLab is MongoDB-as-a-Service

42 of 47

MongoLab is MongoDB-as-a-Service

Features/benefits

  • provisioning and scaling
  • replication and backups
  • monitoring and alerting
  • rich web UI and tools
  • expert support

We automate the operational aspects of running MongoDB (so you don't have to)

Product offering

  • shared and dedicated VM plans
  • SSD plans
  • single-node and multi-zone Replica Set clusters
  • support for Sharded Clusters in 2014

43 of 47

MongoLab is MongoDB-as-a-Service

We support all the major cloud providers

New as of today!

44 of 47

SELECT questions FROM audience

45 of 47

Tyler Hannan

@tylerhannan

Chris Ramsdale

@cramsdale

Will Shulman

@willshulman

Mike Miller

@mlmilleratmit

Google Cloud Datastore

46 of 47

<Thank You!>

jrf@google.com

google.com/+JuliaFerraioli

@juliaferraioli

briandorsey@google.com

google.com/+BrianDorsey

@briandorsey

47 of 47