Session goals

For those new to App Engine:

Get a feel for how the Datastore is used on App Engine

For those who already know App Engine:

Learn about the new NDB

What is NDB?

NDB is an API for working with the Datastore

What is the Datastore?

Where App Engine stores its data
Not an SQL database
Loose schema
Scales
Has queries and transactions (not a simple key-value store)

Model

from google.appengine.ext import ndb

class Person(ndb.Model):
    name = ndb.StringProperty()
    email = ndb.StringProperty()
    created = ndb.DateTimeProperty(auto_now_add=True)

Works just like an ordinary ORM

In this example, Person is the table-like thing (called a Kind)

Properties

IntegerProperty, FloatProperty, BooleanProperty, StringProperty, TextProperty, BlobProperty, DateTimeProperty, DateProperty, TimeProperty, GeoPtProperty, KeyProperty, BlobKeyProperty, UserProperty, StructuredProperty, LocalStructuredProperty, JsonProperty, PickleProperty, GenericProperty, ComputedProperty

A wide range of property types is available.

Create, Read, Delete

# create
obj = Person(name=u'ぐーぐる太郎')
key = obj.put()

# read
obj = key.get()

# delete
key.delete()

StructuredProperty

from google.appengine.ext import ndb

class Address(ndb.Model):
    street = ndb.StringProperty()
    city = ndb.StringProperty()

class Person(ndb.Model):
    name = ndb.StringProperty()
    address = ndb.StructuredProperty(Address)

address = Address(street='...', city='...')
person = Person(name='Jotaro', address=address)
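A StructuredProperty does not create a separate entity: its sub-properties are stored inside the Person entity under dotted names such as address.city. The toy below sketches that flattening in plain Python, assuming an entity can be modelled as a nested dict; `flatten` is a hypothetical helper, not part of NDB.

```python
def flatten(entity, prefix=""):
    """Return a flat dict mapping dotted property names to leaf values.

    Toy illustration only: nested dicts play the role of structured
    sub-models, roughly how NDB names sub-properties (address.street, ...).
    """
    flat = {}
    for name, value in entity.items():
        full_name = prefix + name
        if isinstance(value, dict):
            # Nested structure: recurse and extend the dotted prefix.
            flat.update(flatten(value, full_name + "."))
        else:
            flat[full_name] = value
    return flat


person = {
    "name": "Jotaro",
    "address": {"street": "...", "city": "..."},
}
print(sorted(flatten(person).items()))
# [('address.city', '...'), ('address.street', '...'), ('name', 'Jotaro')]
```

Because the sub-properties live on the Person entity itself, they can be used in Person queries without a join.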

Query

q = Person.query()
q2 = q.filter(Person.name == u'太郎')
q3 = q2.order(-Person.created)
persons = q3.fetch(10)
for person in persons:
    print person.email
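Each call in the chain above returns a new query and leaves the previous one untouched, which is why q, q2 and q3 can be kept and reused independently. The toy class below sketches that immutable-builder style over an in-memory list; `ToyQuery` and its predicate and sort-key arguments are hypothetical, plain Python rather than NDB.

```python
class ToyQuery(object):
    """Immutable query builder (illustrative only, not ndb.Query)."""

    def __init__(self, rows, filters=(), order_key=None, descending=False):
        self._rows = rows
        self._filters = tuple(filters)
        self._order_key = order_key
        self._descending = descending

    def filter(self, predicate):
        # Return a NEW query with one more filter; self is not modified.
        return ToyQuery(self._rows, self._filters + (predicate,),
                        self._order_key, self._descending)

    def order(self, key, descending=False):
        # Return a NEW query with the given sort order.
        return ToyQuery(self._rows, self._filters, key, descending)

    def fetch(self, limit):
        matched = [r for r in self._rows if all(f(r) for f in self._filters)]
        if self._order_key is not None:
            matched.sort(key=self._order_key, reverse=self._descending)
        return matched[:limit]


rows = [
    {"name": "Taro", "email": "a@example.com", "created": 1},
    {"name": "Hanako", "email": "b@example.com", "created": 2},
    {"name": "Taro", "email": "c@example.com", "created": 3},
]
q = ToyQuery(rows)
q2 = q.filter(lambda r: r["name"] == "Taro")
q3 = q2.order(lambda r: r["created"], descending=True)  # like -Person.created
print([r["email"] for r in q3.fetch(10)])  # ['c@example.com', 'a@example.com']
```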

Paging, Cursor

# page 1
q = Person.query(...)
rets, cur, more = q.fetch_page(5)

# page 2
q = Person.query(...)
rets, cur, more = q.fetch_page(5, start_cursor=cur)
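Conceptually, a cursor records where the previous page stopped, so the next request can resume there without re-reading earlier results. The toy below models that with a list index; `fetch_page` here is a hypothetical plain-Python function that only mimics the (results, cursor, more) triple. A real NDB cursor is opaque: to carry it across HTTP requests you would serialize it with cur.urlsafe() and rebuild it with ndb.Cursor(urlsafe=...).

```python
def fetch_page(rows, page_size, start_cursor=None):
    """Toy pagination (not NDB): the 'cursor' is the index where the
    previous page stopped."""
    start = 0 if start_cursor is None else start_cursor
    page = rows[start:start + page_size]
    next_cursor = start + len(page)
    more = next_cursor < len(rows)
    return page, next_cursor, more


rows = list(range(12))  # stand-in for query results in sort order
page1, cur, more = fetch_page(rows, 5)
page2, cur, more = fetch_page(rows, 5, start_cursor=cur)
page3, cur, more = fetch_page(rows, 5, start_cursor=cur)
print(page2)  # [5, 6, 7, 8, 9]
```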

Transaction

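@ndb.transactional(xg=True)  # xg=True: a cross-group transaction
def move_money():
    # key_a, key_b: keys of two entities with an amount property,
    # assumed to be defined elsewhere
    act_a = key_a.get()
    act_b = key_b.get()
    act_a.amount -= 100
    act_b.amount += 100
    ndb.put_multi([act_a, act_b])

move_money()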
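What @ndb.transactional guarantees here is that both updates happen together or not at all. The sketch below imitates only that all-or-nothing property in plain Python; `run_in_transaction`, `move_money(accounts)` and `broken_move` are hypothetical names. The real Datastore additionally detects conflicting concurrent writes, in which case NDB can retry the function.

```python
import copy


def run_in_transaction(store, func):
    """Toy all-or-nothing wrapper (not ndb.transactional): func works on a
    deep copy, which replaces the store only if func returns normally."""
    working = copy.deepcopy(store)
    result = func(working)  # an exception here leaves `store` unchanged
    store.clear()
    store.update(working)
    return result


def move_money(accounts):
    accounts["a"]["amount"] -= 100
    accounts["b"]["amount"] += 100


def broken_move(accounts):
    accounts["a"]["amount"] -= 100
    raise RuntimeError("failed before crediting b")


accounts = {"a": {"amount": 500}, "b": {"amount": 0}}
run_in_transaction(accounts, move_money)   # a: 400, b: 100

try:
    run_in_transaction(accounts, broken_move)
except RuntimeError:
    pass                                   # balances are left untouched
print(accounts)
```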

Cache

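Caching is supported out of the box

Why do we need a cache?

A read from the Datastore takes tens of milliseconds or more
Fast responses are king on the Web
A cache is fast

It also saves money

Datastore usage is pay-per-use
Memcache is free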

Cache

Context (in-process)

Microseconds
In memory
Lives only for the duration of one request

Memcache

Milliseconds
Faster than the Datastore, slower than the Context
Shared across the whole app
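The lookup order implied by the two tiers can be sketched as follows. This is a simplified model, not NDB's implementation: `TieredStore` is a hypothetical class, both caches are plain dicts, and real Memcache stores serialized data and may evict entries at any time.

```python
class TieredStore(object):
    """Simplified model of the lookup order behind key.get():
    context cache, then Memcache, then the Datastore."""

    def __init__(self, datastore):
        self.context_cache = {}    # in-process, one request only
        self.memcache = {}         # shared by every request of the app
        self.datastore = datastore
        self.datastore_reads = 0

    def get(self, key):
        if key in self.context_cache:      # fastest: microseconds
            return self.context_cache[key]
        if key in self.memcache:           # milliseconds
            value = self.memcache[key]
        else:                              # slowest, and billed
            self.datastore_reads += 1
            value = self.datastore[key]
            self.memcache[key] = value
        self.context_cache[key] = value
        return value

    def new_request(self):
        # The context cache dies with the request; Memcache survives.
        self.context_cache = {}


store = TieredStore({"k": "v"})
store.get("k")        # Datastore read, fills both caches
store.get("k")        # context cache hit
store.new_request()
store.get("k")        # Memcache hit in the new request
print(store.datastore_reads)  # 1
```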

Context

Each request gets its own context
A transaction also runs in its own context
Holds cache settings
Holds Datastore settings

Use it to switch options depending on the operation

Context Cache

a = key.get()  # calls the Datastore RPC
b = key.get()  # served from the context cache, not the Datastore
# a and b are the very same object
a.foo = 'newvalue'
print b.foo  # 'newvalue'
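Handing back the same object for the same key is the classic identity-map pattern. A minimal sketch in plain Python, assuming a dict in place of the Datastore (`ToyContextCache`, `Entity` and `rpc_count` are illustrative names, not NDB API). One consequence worth remembering: an unsaved change made through one reference is visible to any other code in the same request that gets the same key.

```python
class Entity(object):
    def __init__(self, foo):
        self.foo = foo


class ToyContextCache(object):
    """Identity map (illustrative, not ndb.Context): one object per key."""

    def __init__(self, datastore):
        self._datastore = datastore  # dict: key -> dict of property values
        self._cache = {}
        self.rpc_count = 0

    def get(self, key):
        if key not in self._cache:
            self.rpc_count += 1  # stands in for the Datastore RPC
            self._cache[key] = Entity(**self._datastore[key])
        return self._cache[key]


ctx = ToyContextCache({"k": {"foo": "old"}})
a = ctx.get("k")  # "RPC"
b = ctx.get("k")  # cache hit: the very same object
a.foo = "newvalue"
print(b.foo)  # newvalue
```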

Controlling cache

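# context-wide: never put Person entities in the context cache
ctx = ndb.get_context()
ctx.set_cache_policy(lambda key: key.kind() != 'Person')

# one-time: bypass the context cache for this call
obj = key.get(use_cache=False)

# clear the context cache (Memcache is not touched)
ctx.clear_cache()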
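A cache policy is just a function from a key to a bool, consulted before anything is cached, and a per-call use_cache overrides it. The toy below loosely mirrors that behaviour in plain Python; `PolicyCache` is hypothetical, and keys are (kind, id) tuples standing in for ndb.Key and key.kind().

```python
class PolicyCache(object):
    """Toy cache (not ndb.Context) that asks a policy function whether
    each key may be cached."""

    def __init__(self, datastore, policy):
        self._datastore = datastore
        self._policy = policy
        self._cache = {}
        self.rpc_count = 0

    def get(self, key, use_cache=None):
        # An explicit use_cache overrides the policy for this call only.
        use = self._policy(key) if use_cache is None else use_cache
        if use and key in self._cache:
            return self._cache[key]
        self.rpc_count += 1  # stands in for a Datastore RPC
        value = self._datastore[key]
        if use:
            self._cache[key] = value
        return value

    def clear_cache(self):
        self._cache.clear()


def no_person_policy(key):
    kind, _ = key
    return kind != "Person"


data = {("Person", 1): "taro", ("Blog", 1): "post"}
cache = PolicyCache(data, no_person_policy)
cache.get(("Blog", 1))                   # RPC, then cached
cache.get(("Blog", 1))                   # cache hit
cache.get(("Person", 1))                 # RPC: Person is never cached
cache.get(("Person", 1))                 # RPC again
cache.get(("Blog", 1), use_cache=False)  # forced RPC
print(cache.rpc_count)  # 4
```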

Cache

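Queries

Queries do not read entities from the cache
Entities they return are stored in the Context Cache

Transactions

Memcache is not used
The inside of a transaction is a separate context from the outside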

Asynchronous

Every API comes in both async and sync versions

get_async
put_async
delete_async
fetch_async
transaction_async

Inside NDB, each sync API wraps its async counterpart

return get_async(...).get_result()
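The wrapping pattern can be sketched with Python 3's standard concurrent.futures, assuming a thread pool and a dict in place of the Datastore (`get_async`, `get` and `_DATASTORE` here are hypothetical). This is only an analogy: NDB itself uses no threads but drives asynchronous RPCs from an event loop, and its futures expose get_result() where concurrent.futures uses result().

```python
from concurrent.futures import ThreadPoolExecutor

_DATASTORE = {"k1": "v1", "k2": "v2"}  # illustrative stand-in data
_executor = ThreadPoolExecutor(max_workers=4)


def get_async(key):
    """Start the lookup and return a Future right away."""
    return _executor.submit(_DATASTORE.get, key)


def get(key):
    """Synchronous version: start the async call, then block on it."""
    return get_async(key).result()


print(get("k1"))  # v1
```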

Asynchronous

Why asynchronous?

Make good use of time spent waiting on I/O such as the Datastore
Return responses faster
Save instance hours

Synchronous API

blog = Blog.get_by_id(blog_id)
comments = Comment.latest(blog_id)  # app-defined query helper
# processing

= Blog.get_by_id =>
                   = Comment.latest =>

Asynchronous API

b_future = Blog.get_by_id_async(blog_id)
c_future = Comment.latest_async(blog_id)
blog = b_future.get_result()
comments = c_future.get_result()
# processing

= Blog.get_by_id_async =>
= Comment.latest_async =>
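The difference between the two timelines can be measured with a small experiment, assuming time.sleep stands in for Datastore I/O and Python 3's concurrent.futures stands in for NDB's futures (`slow_call`, `run_sync`, `run_async` and `DELAY` are hypothetical). Sequential calls cost roughly the sum of the waits; overlapping calls cost roughly the longest one.

```python
import time
from concurrent.futures import ThreadPoolExecutor

DELAY = 0.2  # simulated I/O wait per call, in seconds (assumption)


def slow_call(name):
    time.sleep(DELAY)  # stands in for waiting on a Datastore RPC
    return name


def run_sync():
    blog = slow_call("blog")
    comments = slow_call("comments")  # starts only after blog is done
    return blog, comments


def run_async(executor):
    b_future = executor.submit(slow_call, "blog")
    c_future = executor.submit(slow_call, "comments")  # both in flight
    return b_future.result(), c_future.result()


start = time.time()
sync_result = run_sync()
sync_elapsed = time.time() - start    # about 2 * DELAY

with ThreadPoolExecutor(max_workers=2) as executor:
    start = time.time()
    async_result = run_async(executor)
    async_elapsed = time.time() - start  # about 1 * DELAY

print("sync: %.2fs, async: %.2fs" % (sync_elapsed, async_elapsed))
```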

Asynchronous API

future_a = async_a()
future_b = async_b()
ret_a = future_a.get_result()
# processing using ret_a
ret_b = future_b.get_result()
# processing using ret_b

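Asynchronous API

# You can't know in advance which of several async calls
# will finish first, so this order is not always optimal

==== async_a ====>
= async_b =>
                  | = process ret_a =>
                  |                   = process ret_b =>

# ret_b was usable as soon as async_b finished, yet it waits...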

Callbacks

future_a = async_a()
future_b = async_b()
# NDB calls proc_a(future_a) once future_a is done
future_a.add_callback(proc_a, future_a)
future_b.add_callback(proc_b, future_b)
ndb.Future.wait_all([future_a, future_b])

Callbacks

# Each callback is invoked as soon as its own call finishes

==== async_a ====>
= async_b =>
                  | = proc_a =>
            |= proc_b =>
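The completion-order behaviour can be reproduced with Python 3's concurrent.futures, assuming sleeps in place of async_a and async_b (`slow_call`, `on_done` and `finished` are hypothetical). Unlike NDB's add_callback, which calls the function only with the extra arguments you pass, add_done_callback always passes the finished future.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait

finished = []  # records the order in which callbacks ran


def slow_call(name, seconds):
    time.sleep(seconds)  # stands in for an async Datastore call
    return name


def on_done(future):
    # concurrent.futures hands the finished Future to the callback.
    finished.append(future.result())


with ThreadPoolExecutor(max_workers=2) as executor:
    future_a = executor.submit(slow_call, "a", 0.3)   # long, like async_a
    future_b = executor.submit(slow_call, "b", 0.05)  # short, like async_b
    future_a.add_done_callback(on_done)
    future_b.add_done_callback(on_done)
    wait([future_a, future_b])  # like Future.wait_all
# Leaving the with-block joins the workers, so both callbacks have run.
print(finished)  # ['b', 'a']
```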

Callbacks

Looks familiar: callback hell...

Python lets you write this far more elegantly

Use tasklets!

tasklets

Run work concurrently without threads.

Split the work into small pieces and run whichever pieces are ready to run.

(concurrent)

# Similar to Tornado's gen module

Without yield...

def fib(limit):
    r, x, y = [], 1, 1
    while x <= limit:
        r.append(x)
        x, y = y, x + y
    return r

for i in fib(10):
    print i  # 1, 1, 2, 3, 5, 8

# fib returns only after the whole result list has been built
# With a large limit, nothing is printed (not even the first 1) until then

With yield...

def fib(limit):
    x, y = 1, 1
    while x <= limit:
        yield x
        x, y = y, x + y

for i in fib(10):
    print i  # 1, 1, 2, 3, 5, 8

# yield leaves the function and hands back a value
# Even with a large limit, printing starts right away
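The claim that printing starts right away can be checked directly: with the generator version, even an absurdly large limit produces its first value at once, because nothing beyond the requested values is computed. A small sketch, using the standard itertools.islice to take only the first few values:

```python
import itertools


def fib(limit):
    x, y = 1, 1
    while x <= limit:
        yield x
        x, y = y, x + y


huge = 10 ** 100   # the list-based fib would build the whole list first
gen = fib(huge)    # nothing has run yet
print(next(gen))   # 1, returned immediately
print(list(itertools.islice(fib(huge), 6)))  # [1, 1, 2, 3, 5, 8]
```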

yield

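yield leaves a function by "pausing" it

(return leaves a function by "ending" it)

Execution can later resume at the point where yield paused

By yielding right after calling an async API,

other work can run in the meantime, and the function resumes once the API call has finished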
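The pause-and-resume mechanics are plain Python generator features: next() runs a generator up to its first yield, and send(value) resumes it with value as the result of the yield expression. A minimal sketch, with `worker` and the request strings as made-up examples of what a scheduler would receive:

```python
def worker(log):
    log.append("start")
    ret = yield "fetch blog"        # pause: hand a request to the caller
    log.append("got " + ret)        # resumed with the caller's value
    ret2 = yield "fetch comments"   # pause again
    log.append("got " + ret2)


log = []
gen = worker(log)
request = next(gen)                # runs until the first yield
# ... the caller is free to do other work here ...
request2 = gen.send("blog data")   # resume; runs until the next yield
try:
    gen.send("comment data")       # resume; the generator then finishes
except StopIteration:
    pass
print(log)  # ['start', 'got blog data', 'got comment data']
```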

tasklets

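@ndb.tasklet
def proc_a():
    ret_a = yield async_a()
    # processing using ret_a

@ndb.tasklet
def proc_b():
    ret_b = yield async_b()
    # processing using ret_b

@ndb.tasklet
def main():
    # run both tasklets concurrently and wait for both
    yield proc_a(), proc_b()

main().get_result()

# With callbacks...
def proc_a(future_a):
    ret_a = future_a.get_result()
    ...

def proc_b(future_b):
    ret_b = future_b.get_result()
    ...

future_a = async_a()
future_b = async_b()
future_a.add_callback(proc_a, future_a)
future_b.add_callback(proc_b, future_b)
ndb.Future.wait_all([future_a, future_b])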

tasklets

# Each tasklet resumes at its yield as soon as the call it waits on
# finishes, so this is optimal

==== async_a ====>
= async_b =>
                  | = proc_a =>
            |= proc_b =>
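To see why resuming at the finished yield is optimal, here is a toy event loop in plain Python that drives generator-based tasklets in a single thread. It is not NDB's implementation: `FakeRpc`, `run` and the tick-based timing are hypothetical stand-ins for real RPCs and NDB's event loop, but the control flow is the same idea (yield a call or a tuple of calls, and get resumed with the results when they finish).

```python
class FakeRpc(object):
    """Hypothetical stand-in for an async call that completes after
    `ticks` rounds of the event loop."""

    def __init__(self, name, ticks):
        self.name = name
        self.ticks = ticks


def run(*tasklets):
    """Toy single-threaded event loop for generator-based tasklets.

    A tasklet yields a FakeRpc, or a tuple of them, and is resumed with the
    result (or tuple of results) as soon as everything it waits on is done.
    """
    waiting = []  # entries: (generator, rpcs, yielded_a_tuple)

    def step(gen, value):
        try:
            wanted = gen.send(value)  # run until the next yield
        except StopIteration:
            return  # tasklet finished
        is_tuple = isinstance(wanted, tuple)
        waiting.append((gen, wanted if is_tuple else (wanted,), is_tuple))

    for gen in tasklets:
        step(gen, None)  # start each tasklet

    while waiting:
        # One tick: every unfinished call gets one step closer to done.
        in_flight = {}
        for _, rpcs, _ in waiting:
            for rpc in rpcs:
                if rpc.ticks > 0:
                    in_flight[id(rpc)] = rpc
        for rpc in in_flight.values():
            rpc.ticks -= 1

        # Resume, in order, every tasklet whose calls have all finished.
        ready = [e for e in waiting if all(r.ticks <= 0 for r in e[1])]
        waiting[:] = [e for e in waiting if e not in ready]
        for gen, rpcs, is_tuple in ready:
            results = tuple(r.name + " result" for r in rpcs)
            step(gen, results if is_tuple else results[0])


log = []


def proc_a():
    ret_a = yield FakeRpc("async_a", 3)  # slow call
    log.append("proc_a got " + ret_a)


def proc_b():
    ret_b = yield FakeRpc("async_b", 1)  # fast call
    log.append("proc_b got " + ret_b)


def blog_page():
    # Like yielding two async calls as a tuple: wait for both at once.
    blog, comments = yield FakeRpc("blog", 2), FakeRpc("comments", 2)
    log.append("page got " + blog + " and " + comments)


run(proc_a(), proc_b(), blog_page())
print(log)
# ['proc_b got async_b result',
#  'page got blog result and comments result',
#  'proc_a got async_a result']
```

The fast call's continuation (proc_b) runs first even though proc_a was started first, which is exactly the behaviour in the timeline above.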

Comparison

# synchronous
blog = key.get()
comments = Comment.latest(key)

# asynchronous
fb = key.get_async()
fc = Comment.latest_async(key)
blog = fb.get_result()
comments = fc.get_result()

# tasklets (asynchronous; must run inside an @ndb.tasklet function)
blog, comments = yield key.get_async(), \
    Comment.latest_async(key)

tasklets

While waiting for async operations, other work can run
Even though the code is asynchronous, you can write it sequentially, like synchronous code
Waiting time is kept to a minimum

Summary

With NDB, the Datastore is nothing to fear
It has queries and transactions (a feature-rich NoSQL store)
Caching makes it fast and cheap
Async makes it fast and cheap

References

This document

http://goo.gl/Ugif4

Google App Engine

https://developers.google.com/appengine/

NDB

https://developers.google.com/appengine/docs/python/ndb/