Live Google IO notes
Richard L. Burton III from SmartCode LLC
http://www.EasyShout.com
http://twitter.com/rburton
Dushyanth Inguva from Lab49
http://twitter.com/dushyanth
These are running notes. Please bear with us. We will post them as a blog entry soon.
Keynote speech:
-
Five things that Google is excited about
-
Canvas
-
Drawing on the screen without Images, Silverlight, Flex..
-
API for drawing and animation with pixel-level control.
-
A custom Canvas tag using HTML 5.
-
Avoids browser lock-ups
-
3D standards
-
Very nice in-browser demo of 3D games.
-
Could this lead into 3D online games that are pay to play?
-
Very Nice performance from the demo.
-
Javascript talks to the GPU (Graphics processing unit). Plug-in..
-
Browsers supported: Chrome, Firefox, Safari, Opera
-
Video
-
YouTube.com HTML5 Video demo
-
Interesting ideas on what the video tag can do for YouTube.
-
Browsers supported: Firefox, Chrome, Safari, Opera (no IE, since they haven't yet shown commitment)
-
Geolocation
-
Google and other companies have created large databases of cell-tower and other signal sources for triangulating a single location.
-
Pretty good coverage of the world.
-
Jay Sullivan presentation (VP of Mozilla)
-
Google Maps (Share my location) #failed - second try worked.
-
Jay nervously tries another Geolocation.. :)
-
Supported Browsers: Chrome, Firefox, Safari, Opera
-
Supported on the iPhone for OS3
-
App Cache & Databases
-
Allows for offline storing of data
-
Nice UI for looking into the database, which is SQLite
-
Manifest file tells the browser where to store items locally. What about more complex items?
-
Android offline mode..
-
Michael Abbott, VP at Palm
-
Has a Bad haircut
-
Mobile applications using more JavaScript + HTML + CSS
-
Wants Accelerometer API in HTML5... that makes no sense
-
Web Workers
-
HTML5 Threading..
-
Allows for background threads to avoid locking of the view.
Sessions
How do I code Thee? Let me count the ways
By: Dan Morrill Google Developer
-
What is Android
-
Code stack for making phone calls
-
Network stack and internet client
-
Platform for running code.
-
Android is a framework for interacting components
-
apps can pull in pieces of other apps, the web, or even native
-
But we didn't create android in an attempt to "own mobile"
-
Android has an application platform, true
-
Agenda
-
Three ways of writing code
-
how to use them
-
Useful comparisons and statistics
-
Future direction
-
Not Going to:
-
Not covering how to write apps
-
Rehash stuff that's covered elsewhere
-
Not going to pass Judgement
-
Three Flavors of android development
-
Managed Code
-
Ajax
-
Native Code - C code for the ARM processor
-
Dalvik
-
VM, like the JVM
-
Memory-protected, garbage-collected, lifecycle-managed
-
Optimized for embedded machines
-
Built to reduce much of the need for a JIT compiler
-
Custom bytecode convertor
-
Provides core frameworks
-
APIs are backed by system infrastructure in native code
-
OpenGL, Binder IPC, media,
-
Rich UIs
-
Background services
-
shared components
-
tight integration with system ui events
-
What Dalvik can't do?
-
Some apps need raw speed
-
Some apps don't need tight integration
-
AJAX Applications
-
Broken up into convenient declarative layouts & code
-
JavaScript code mutates the DOM to create UI effects
-
Network access is available via XMLHttpRequest
-
Recently, <canvas> allows JavaScript to do direct painting
-
Android 1.5
-
Android's browser is based on WebKit + SquirrelFish
-
WebKit v528.5, equivalent to Safari 4 Beta
-
Reports wrong User-Agent string as 3.1.2
-
Includes Gears 0.5
-
HTML 5 - some elements
-
Run Code outside the main thread
-
Store data and pages locally
-
Can't do
-
HTML 5
-
Background processing
-
Code only runs when browser is open and your page is loaded
-
access system frameworks
-
Demo is about - K-Means clustering in 2,000 words
-
NDK Native Development Kit
-
Designed for Physics Simulations
-
fast loading of largish data files
-
Speed-intensive lookups, such as for IMEs
-
Custom VMs
-
Unsupported things
-
technically, other libraries are present, but have no guarantees
-
Use them, and you deserve the market user ratings you will get when your app breaks
-
Current Set of APIs is limited
-
Supported libraries libc and libm, more to come
-
Still runs in a sandbox
-
Future libraries TBD
-
Supports Java objects injected into JavaScript. Java objects now can be interfaced via JavaScript
-
Nice way to enhance your AJAX applications for speed or low level APIs
Effective GWT: Developing a complex, high-performance app with Google Web Toolkit
By:
-
Lombardi Software - Blueprint
-
GWT What and Why
-
Generates optimized javascript (like escape analysis etc)
-
High Fidelity Mockup
-
Done in photoshop
-
More expensive
-
Finalize the icons and colors etc
-
Going to code
-
Be involved in the design
-
You need to know css and HTML DOM
-
What is the appropriate DOM structure
-
How to create and manipulate GWT
-
Design
-
Design outer layer with divs
-
Faster way is to do html panel and divs ? (what does that mean?)
-
DOM structure is created by GWT decorator panel
-
And, you can apply css on them
-
Handling Window Resizing
-
Goal is to handle browser window resizing
-
Static HTML you're limited to what you can achieve in css
-
Listen to ResizeEvent from window and propagate sizes down to children
-
Because they only do fixed sized row, they can do background images to do styling in table rows
-
Animation
-
Not all browsers do CSS3 ?
-
Helps users understand the behavior of application (provides visual feedback)
-
Done all in java in GWT
-
Original Implementation
-
Iterate through your objects, create widgets and add it to containers
-
Javascript - object creation and GC is expensive especially in IE6
-
New Implementation
-
Generate raw HTML in Javascript
-
Use flyweight pattern for event handling
-
They create html inside java (javascript) and do a DOM.setInnerHTML()
-
Event Handling
-
When All Else Fails
-
They dual compile code to Java and Javascript
-
If they find that the browser's JavaScript engine is slow, they render on the server and send the result to the client
-
So based on performance, they can dynamically move rendering between server and client
-
Compiling GWT code is slow
-
By default, GWT compiles code to 5 different browsers
-
You can tell GWT to compile code only for a single browser - locale, this speeds up development time
-
Well, you can run hosted mode and that never compiles :-) or use GWT 2.0 it never compiles and supports out of process hosted mode !
-
Instead of doing DOM manipulation over objects like Element.getStyle().setProperty('css property'), put that property in a css file
-
Check out the Episodes plugin from the creator of YSlow. It sends client performance numbers back to the server.
This presentation is more of a war story. It deals with the Blueprint product. Because most of their clients run it in IE6, Lombardi had to go through extra steps to optimize their application to rely less on IE6's JavaScript engine. These techniques also apply when you have a really rich GWT application.
Transactions Across Datacenters (and Other Weekend Projects)
-
Talked about Weak Consistency, Eventual Consistency (thanks to Werner for making this popular), Strong consistency (AppEngine datastore, File systems, RDBMSes, Azure tables)
-
Transactions
-
Why across datacenters?
-
Catastrophic failures, expected failures, routine maintenance, geo locality (CDN, edge caching etc)
-
Basically vertically partitioning your application
-
Packet roundtrip from west to east coast is 30ms
-
Why not across datacenters
-
Within a datacenter, it costs much less to communicate, with low latency (1ms within rack, 1-5 ms across)
-
Outside datacenter
-
Multihoming
-
????
-
As soon as you write across multiple locations, you will have consistency problems
-
Real-time writes are always the hardest
-
Don't do it
-
A datacenter in silicon valley went down and twitter and friendfeed went down for more than 2 hrs. Both did not have multihoming
-
Option 2:
-
Better but not ideal
-
Have multiple datacenters, have primary and secondary
-
Mediocre at catastrophic failure
-
window of lost data because of asynchronous replication
-
Examples:
-
Amazon Web Services
-
Banks, Brokerages etc
-
Depending on systems, all your slaves can serve reads
-
Option 3: True Multihoming
-
Simultaneous writes in different data centers
-
Two way: hard
-
NASDAQ does 2 datacenters and does 2phase commit across them for transactions
-
Expensive and definitely slower
-
Techniques and Tradeoffs
-
Backups
-
Make a copy
-
Dog Fooding - they make other teams use their internal systems so they get to iterate and then release the API
-
Master/Slave replication
-
Usually asynchronous
-
Good for throughput, latency
-
Most RDBMSs do binary log based replication
-
AppEngine also follows this model.
-
AppEngine write is much slower than a relational db
-
But, it is geared for read more than write
-
Multi Master Replication
-
Support writes at multiple locations and then merge them
-
Asynchronous, eventual consistency (Amazon's shopping cart service does this)
-
You cannot rely on a global clock
-
Because of this, you cannot do global transactions
-
Another way to think about this: it is like multithreading without locks
-
Two Phase Commit
-
Heavyweight, synchronous, high latency
-
Semi distributed as there is a coordinator
-
Paxos
-
Fully distributed consensus protocol
-
No single master like 2PC
-
Still has longer latency
-
Gives a better throughput than 2PC
-
Paxos for the Datastore
-
Closer datacenter? not really because you are doing two round trips
-
Same datacenter? no
-
Opt In...
-
Paxos for AppEngine
-
They use that to coordinate when moving between datacenters
-
Use a lock server
-
Managing memcache
-
Conclusion
-
No silver bullet
-
Embracing tradeoffs
-
Consistency is app driven, the platform cannot make that choice.
-
AppEngine is going to support options in consistency models in future (Nice)
Distributed Transactions for Google AppEngine
Notes: This presentation is dry in its delivery. It is basically read directly off the slides, and the speaker shares no in-depth knowledge or experience. The text below is from the actual slides, word for word.
-
Thanks to
-
Daniel Shawcross Wilkerson - Unemployed (Interested?)
-
Simon Fredrick Vicente Goldsmith - Coverity
-
Robert Jonson - SUNY Stony Brook
-
Erick Armbrust - Google
-
Ryan Barrett - Google
-
Correctness & Performance
-
Correctness and performance are the heart of engineering.
-
Correctness - The output is what you want.
-
Performance - The output doesn't cost too much
-
Where cost is any resource: time, space, energy, money, people, machines
-
Invariants
-
reasoning about program correctness requires invariants
-
Invariant: a sentence that is always true; that which does not change when all else is changing.
-
Use invariants from which you can ensure correctness
-
Initialize invariants during construction;
-
Maintain them during operations
-
If you aren't thinking in terms of invariants, start now.
-
If you get nothing else out of my talk remember this.. invariants!
-
Example Invariant: Data-Structures
-
A doubly-linked list module/class maintains the invariant that (credit: Scott McPeak):
-
x->next == null OR x->next->prev == x
-
Many other data-structures are similar.
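Our own illustration (not from the slides): a minimal Python sketch of a doubly-linked list that initializes its invariant at construction and restores it before each operation returns. The invariant checked is the usual one for this structure: for every node x, x.next is None or x.next.prev is x. The names Node, DoublyLinkedList, append and check_invariant are hypothetical.

```python
class Node:
    def __init__(self, value):
        self.value = value
        self.prev = None
        self.next = None


class DoublyLinkedList:
    def __init__(self):
        # An empty list trivially satisfies the invariant.
        self.head = None
        self.tail = None

    def append(self, value):
        node = Node(value)
        if self.tail is None:
            self.head = self.tail = node
        else:
            # The invariant is briefly violated between these two assignments...
            self.tail.next = node
            node.prev = self.tail
            self.tail = node
        # ...and must hold again before the operation returns.
        assert self.check_invariant()
        return node

    def check_invariant(self):
        # For every node x: x.next is None or x.next.prev is x.
        node = self.head
        while node is not None:
            if node.next is not None and node.next.prev is not node:
                return False
            node = node.next
        return True
```

The temporary violation inside append is the single-machine version of what the talk later calls a transaction: a set of operations that moves from one good state to another.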
-
Scalability Requires Distributed Computing
-
Unbound performance scalability requires a large and therefore distributed computing machine
-
"small" machines give us the illusion of a single-point abstraction; this makes us lazy programmers.
-
however, large/distributed machines are
-
non-reliable: ongoing random local failures
-
non-serial: operate in massive parallel, and
-
non-synchronized: lack coordination of behavior
-
This is the future
-
Distributed computing makes maintaining invariants hard
-
Transactions maintain invariants
-
(Correctness) let a "Good" state be one where all invariants are satisfied
-
(Performance) To make something happen, invariants often must be temporarily violated.
-
Call a set of operations that take us from one good state to another "transaction".
-
ACID: The correctness Perspective
-
Correctness: program state stays within the subset of good machine states.
-
Performance: something has to happen
-
Transactions jump the machine from state to state:
-
Durable: states persist
-
Atomic and Isolated: There are no in-between states for yourself or others.
-
Consistent: Jump only from good state to good state.
-
Local transactions
-
If transaction data is localized: gathered onto one machine, one locality within the distributed system,
-
then the process is easier to control and one may more easily implement the ACID properties
-
Google AppEngine provides local transactions:
-
at object construction time, objects may be grouped;
-
a transaction may only operate on the data of one group.
-
but then only local invariants can be maintained!
-
(Note that GAE is also strongly consistent.) (Can't run a query within a transaction on GAE?)
-
Algorithm Overview (Basically a two-phase commit)
-
Run client:
-
Serve reads and record their version numbers;
-
Buffer writes in shadow object.
-
Get write locks on written objects in key-order.
-
Check version numbers of read objects;
-
Also check they are not write locked
-
Copy shadows to their user objects in local transactions;
-
Also update object version numbers and
-
Delete write locks and shadows.
-
What happens when a thread holding a lock dies? Is the state of the transaction stored in the database, with timestamps kept to determine whether the active txn is still alive?
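Our own sketch of the algorithm overview above, assuming a single-process, in-memory store (so it says nothing about timeouts or roll-forward, which is where the hard part lives). Store, Transaction and Conflict are hypothetical names, not the library's API. Reads record version numbers, writes go to shadow copies, and commit takes write locks in key order, validates the reads, then copies shadows and bumps versions.

```python
class Conflict(Exception):
    """Raised when a transaction cannot commit."""


class Store:
    def __init__(self):
        # key -> {"value": ..., "version": int, "lock": txn id or None}
        self.objects = {}

    def put_initial(self, key, value):
        self.objects[key] = {"value": value, "version": 0, "lock": None}


class Transaction:
    def __init__(self, store, txn_id):
        self.store = store
        self.txn_id = txn_id
        self.read_versions = {}  # key -> version seen at read time
        self.shadows = {}        # key -> buffered new value

    def read(self, key):
        # Writes are buffered, so a read after a write would see stale data.
        if self.shadows:
            raise RuntimeError("no reads after a write")
        obj = self.store.objects[key]
        self.read_versions[key] = obj["version"]
        return obj["value"]

    def write(self, key, value):
        self.shadows[key] = value

    def commit(self):
        locked = []
        try:
            # Take write locks in key order. This toy aborts instead of
            # waiting; a system that waits relies on the ordering to
            # avoid cycles in the waits-for graph.
            for key in sorted(self.shadows):
                obj = self.store.objects[key]
                if obj["lock"] is not None:
                    raise Conflict("write-locked: %s" % key)
                obj["lock"] = self.txn_id
                locked.append(key)
            # Validate reads: same version, and not locked by someone else.
            for key, version in self.read_versions.items():
                obj = self.store.objects[key]
                if obj["version"] != version:
                    raise Conflict("stale read: %s" % key)
                if obj["lock"] not in (None, self.txn_id):
                    raise Conflict("read object is write-locked: %s" % key)
            # Copy shadows to the user objects and bump version numbers.
            for key in locked:
                obj = self.store.objects[key]
                obj["value"] = self.shadows[key]
                obj["version"] += 1
        finally:
            # Delete write locks whether we committed or aborted.
            for key in locked:
                self.store.objects[key]["lock"] = None
```

With two transactions that both read the same object and then both try to write it, the first commit succeeds and the second raises Conflict because its recorded version number is stale.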
-
This is not as easy as it looks
-
Deadlock prevention: holding locks creates waits-for graph; a cycle means no progress will be made
-
Ongoing progress: a DT must not languish for lack of attention if its thread times out.
-
Concurrent roll-forward: once past the client stage, other threads may have to roll forward the DT in parallel; doing so must maintain correct operation.
-
Proof Isolation: Guarantee that some serialization of the transaction is possible.
-
Monotonic Locking
-
How to get and release locks in monotonic way?
-
Getting goes in one direction releasing goes in another. D..
-
Presenter flipped slides too fast.. I think he's feeling nervous and trying to keep the attention of the audience?!
-
Saving Users from Themselves
-
DT and LT don't mix: Erick and Ryan required an object to be either DT- or LT-flavored.
-
Local Transactions don't honor DT locks
-
Read then write then stop:
-
Client reads X=1, writes x=2, then reads x again; writes are buffered, so the client will read x=1!
-
Thus, neither reads nor writes are allowed after a write.
-
Did my DT finish? The client must query any timed-out DTs and first roll them forward
-
Failed DTs throw exceptions and may be reported to the user.
-
Queries are not handled
-
We provide no transaction semantics for queries,
-
only for reads and writes on sets of objects specified directly by the object key..
-
A distributed transaction library will be open-sourced and placed on http://code.google.com
Comment: This is exactly why distributed transactions should be avoided.
Transactions Across Datacenters (and other weekend projects)
-
Consistency
-
Weak
-
After a write, reads may or may not see it - Weak read/write
-
What you get from a cache
-
Examples:
-
message in a bottle
-
AppEngine: memcache
-
Eventual
-
After a write, reads will eventually see it - Strong write, weak read.
-
Examples:
-
AppEngine: Mail
-
Search Engine indexing
-
DNS, SMTP, snail mail
-
Amazon S3
-
Strong
-
After a write, reads will see it - Strong read/write
-
Examples:
-
AppEngine: datastore
-
File Systems
-
RDBMS
-
Azure tables
-
Use Case: read after write
-
Why across datacenters
-
Catastrophic failures
-
Expected failures
-
Routine maintenance
-
Geolocality
-
Why *not* to cross datacenters
-
Within a datacenter
-
High bandwidth
-
low latency
-
little to no cost
-
Between Datacenters
-
low bandwidth
-
high latency
-
costs money for fiber
-
Why Multihoming
-
Hard problem
-
Consistently? Harder
-
with real-time writes? hardest
-
Options:
-
Don't do it!
-
Primary with hot failover(s)
-
True Multihoming
-
Simultaneous writes in different datacenters
-
Two way commit: hard
-
N way commit: harder
-
Comes at the cost of latency
-
Things to look for:
-
Consistency
-
Transactions
-
Latency
-
Throughput
-
Data loss
-
Fail over (moving servers..)
-
Backups
-
Master/Slave replication
-
Usually asynchronous
-
Good for throughput, latency
-
Most RDBMS support it
-
Weak/eventual consistency
-
Datastore: current
-
Multi-master replication
-
Umbrella term for merging concurrent writes
-
Asynchronous, eventual consistency
-
Need serialization protocol - order of write requests
-
No global transactions
-
Datastore: No strong consistency
-
Two Phase Commit
-
Semi-distributed consensus protocol
-
Deterministic coordinator
-
1: Propose, 2: vote, 3: commit/abort
-
Heavyweight, synchronous, high latency
-
3PC buys async with extra round trip
-
Datastore: poor throughput
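Our own toy sketch of the propose / vote / commit-or-abort flow listed above, assuming in-process participants and ignoring logging, timeouts and coordinator failure. Participant and two_phase_commit are hypothetical names. It shows why the protocol is synchronous: the coordinator cannot decide until every vote is in.

```python
class Participant:
    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit
        self.state = "idle"

    def prepare(self, txn):
        # Phase 1: vote. A real participant would durably log its
        # decision before answering yes.
        self.state = "prepared" if self.can_commit else "aborted"
        return self.can_commit

    def commit(self, txn):
        self.state = "committed"

    def abort(self, txn):
        self.state = "aborted"


def two_phase_commit(participants, txn):
    """Single coordinator: collect every vote, then broadcast the outcome."""
    votes = [p.prepare(txn) for p in participants]  # round trip 1
    if all(votes):
        for p in participants:                      # round trip 2
            p.commit(txn)
        return True
    for p in participants:
        p.abort(txn)
    return False
```

Each phase is a full round trip to every participant, and across datacenters each round trip pays the cross-country latency noted earlier.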
-
Paxos
-
Fully distributed consensus protocol
-
"Either Paxos, or Paxos with cruft, or broken"
-
Majority writes; survives minority failure
-
Protocol similar to 2PC/3PC
-
Lighter, but still latency
Building Scalable Complex apps on AppEngine:
-
List Property
-
Property has multiple values
-
Maintains its order
-
Queried with an equals filter
-
Densely pack information instead of denormalizing it
-
Cut across all data and query on one of the values in the list property
-
select * from FavoriteColors where color = 'yellow' where color is a list property
-
Saves space to use list property
-
Uses more CPU to serialize and deserialize the list property
-
Never have composite index between two list properties because it creates a cartesian product index
-
Concrete Example: Microblogging
-
Fanout of messages can be inefficient in terms of space
-
Message sending by reference
-
You would use list properties instead of joins
-
select * from messages where receiver = 'user'
-
Problem with List Property
-
selects load all of the list properties
-
Relational Index Entity
-
Split the message into two entities (message index and message)
-
We put them into same entity group and make message index a child of the message
-
There is a keys-only query; it lets you fetch just the keys
-
Reads are 10 times faster and cheaper than with just plain list properties
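Our own in-memory sketch of the relational index entity idea, not the App Engine API. MessageStore and its methods are hypothetical. The receivers list lives in a separate index entity, an equality filter on the list is answered from per-value index rows, and the inbox read fetches message bodies by key without ever loading a receivers list.

```python
from collections import defaultdict


class MessageStore:
    def __init__(self):
        self.messages = {}        # message key -> body (the Message entity)
        self.receiver_lists = {}  # message key -> receivers (the MessageIndex child)
        # receiver -> message keys; stands in for the datastore's
        # property index, which holds one row per list value.
        self.index = defaultdict(list)

    def send(self, key, body, receivers):
        self.messages[key] = body
        self.receiver_lists[key] = list(receivers)
        for receiver in receivers:
            self.index[receiver].append(key)

    def keys_for(self, receiver):
        # Keys-only query: answered from index rows alone, so the
        # (possibly huge) receivers list is never deserialized.
        return list(self.index.get(receiver, []))

    def inbox(self, receiver):
        # Fetch only the message bodies, by key.
        return [self.messages[key] for key in self.keys_for(receiver)]
```

The message body is stored once no matter how many receivers it has, which is the "sending by reference" point made above.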
-
Merge Join
-
AppEngine supports self joins
-
Data mining like operations
-
Don't have to build indexes in advance before this query
-
Can be used to test set membership
-
How does Merge Join work?
-
Needed because the datastore doesn't have histograms (RDBMSs use histograms to make a query plan)
-
They store all property indexes in sorted order
-
Uses zigzag algorithm
-
If we are using 2 filters, it scans the first property to find a match, then moves the second one to find a match. Then, if the keys don't match, it moves the first one until both the property and key match
-
select * from animal where legs = 4 and type = 'cow'
-
Scales with number of filters
-
Can't apply sort orders - must sort in memory
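Our own sketch of the zigzag scan for two equality filters, assuming each property index is simply a sorted list of entity keys (a stand-in for the real index rows). zigzag_join is a hypothetical name. Whichever side is behind jumps forward to the other side's current key, and a key is emitted only when both sides land on it.

```python
from bisect import bisect_left


def zigzag_join(keys_a, keys_b):
    """Intersect two sorted key lists by leapfrogging between them."""
    i = j = 0
    matches = []
    while i < len(keys_a) and j < len(keys_b):
        if keys_a[i] == keys_b[j]:
            matches.append(keys_a[i])
            i += 1
            j += 1
        elif keys_a[i] < keys_b[j]:
            # First index is behind: seek it forward to the other's key.
            i = bisect_left(keys_a, keys_b[j], i)
        else:
            # Second index is behind: seek it forward instead.
            j = bisect_left(keys_b, keys_a[i], j)
    return matches


# Keys of entities with legs = 4, and with type = 'cow', both sorted.
legs_4 = ["cow1", "cow2", "dog1", "horse1"]
type_cow = ["cow1", "cow2"]
print(zigzag_join(legs_4, type_cow))
```

Results come out in key order, which fits the note above that sort orders cannot be applied and must be done in memory. Extending this to more than two filters is not shown here.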
Keynote 2
-
New Google Product Google Wave
-
Platform
-
Product
-
Protocol
-
A wave is a conversation between multiple people.
-
Wave can be viewed as an enhanced Twitter or a hybrid (e-mail, IM, and word document) (Note: We'll post real pictures of it later tonight! We intend to also do screen shots when we get our account and a screencast!)
-
Comment Support - Allows for inline comments or your typical comments at the end of the wave/posting.
-
Real-time updates from others - The document changes real-time while others are updating it.
-
Spell Check - Very sweet inline, real-time spell checking that takes word context into account. E.g., for "Can I have some been soup" it offered "Can I have some bean soup".
-
Play back changes - You can play back the changes in the conversation to see what was done and in what order. Very much like playing back a video.
-
Private replies - Supports private conversations between users, hidden from others on the wave.
-
Drag and Drop - Wave supports D&D from iPhoto.
-
Uses Google Contacts - Integrates with your Gmail and GTalk contacts
-
Wave cloning - Wave allows for cloning of an existing Wave to create a new wave. Reminds me of Git :)
-
When this occurs, all subscribers or people in the wave are notified.
-
Inline editing - Supports inline editing from Wave and external websites that are using the Wave plug-ins.
-
When changes are made, the document is marked up to reflect where the changes were done, ONLY from the last time *you* viewed it.
-
Collaboration Editing - Awesome support for changing content in the same document in the same area and changes are reflected in different colors.
-
Open Social Integration
-
Any open social app can live inside wave
-
Developer API -
-
A demo was given of a custom widget that allows a user to vote "Yes, Maybe, No"
-
Sudoku Widget that allows multiple players to play with one another.
-
Chess Widget that allows others to play one another and uses the playback feature. Nice integration.
-
Google Maps Widget that shows all other users where you're looking. Also, draw regions, add pins and more! Very sweet!!
-
Search - You can search your contacts or use their built in Google search to search the web and actually use the results in the document. i.e., search for an image and select it to embed it into the document.
-
Multiple languages - Supports multiple languages.
-
Real-time language translation using a program called "Rosy". This was very sweet!
-
Polls - A nice little extension allows for creating polls.
-
System Federation between Wave 'systems'
-
Wave systems can collaborate between one another.
-
Private Waves between people within the same 'wave system' are never sent to other wave servers in the federation.
-
Forms are native to Wave
-
They are going to open-source the lion's share of the code.
-
Written in GWT and HTML5
-
Demos
-
There was a demo of dragging and dropping a file into the browser to create an attachment. This is not supported in HTML5 yet, it is a prototype.
-
A nice demo of how the API is used to write an external application like a blog. Very cool demo.
-
Orkut demo with using their embedded API.
-
Nice twitter integration that signs in to twitter and actually will post tweets.
-
Very sweet code.google.com integration tool.
-
All attendees at Google I/O will obtain an account to use Google Wave before it's released. Hence, come see Google IO. Did I mention we also got a free Cell phone with a full month of unlimited service?
-
Comes with a developers API (We're talking about Google, it's expected! :)
-
A minor bug occurred during the demo. Hey, we're talking about live demos (it turned out to be a misconfigured browser proxy).
-
Google Wave is great for team collaboration: add inline comments, embed images, view changes live, remove the conversation noise, release a final product, and more.
-
Website URLs
-
http://wave.google.com - Main website for Wave
-
http://code.google.com/apis/wave - API website
-
http://waveprotocol.org - The protocol website
Offline Processing on App Engine: A Look Ahead
By: Brett
Live Notes by @dushyanth
-
Motivation
-
AppEngine is great for request based database backed applications
-
Cron is good for periodic jobs, but not good enough
-
Problems with Polling
-
Wasted work as it is not event driven
-
Workers stay resident when there is no work wasting resources
-
Fixed number of workers. Or admins must manually add workers
-
Limited amount of optimization possible
-
Long-lived hanging connections
-
Existing task queue like systems
-
MQ, Amazon SQS, Azure Queues, Starling (getting popular these days)
-
Task Queue API
-
Part of AppEngine Labs (API may change until it graduates from Labs)
-
Asynchronous execution for a first in first out queue.
-
If execution fails, work will be retried until successful
-
Tasks are light weight to store. They are 3 times faster than storing in the datastore.
-
Tasks are scalable. The tasks can be started across a lot of machines.
-
Implements queueing. NOT pub-sub
-
Goals: High throughput, maximizing data throughput
-
Pushes tasks to the app. No polling
-
Uses Web hooks (It is a RESTful push-based interface for doing work)
-
Task is submitted as a web hook. If you get a 200 back, it succeeds.
-
Essentially combines queuing over REST.
-
Integrated into admin console as normal requests
-
Supports config driven throttling
-
Can be used to Prevent web services (external) from getting overloaded
-
Stay inside budget per hour etc
-
How task Queue Works
-
Tasks enqueue in a queue
-
Queue Moderator pulls from the head of the queue
-
It submits the task to the workers. Queue Moderator has capability to create new workers (threads).
-
Max number of threads depends on throughput
-
When a task is submitted, it could be running even before the enqueue request API call returns :-)
-
Edge Cases
-
Tasks have to be idempotent
-
Possible for a task to spuriously run twice even without failures.
-
You could use memcache or database to avoid it running twice, but that responsibility is on the developer
-
Working with TaskQueues
-
Each task added to a single queue
-
You can create multiple queues per application
-
Working with ETA (Estimated time of Arrival)
-
How long until the task is executed
-
Different than "visibility timeouts" in other systems
-
Working with tasks: Names
-
Tasks can be named. If a task is not named, a name is auto-generated
-
Prevents tasks from accidentally being submitted multiple times
-
Concrete Example: Write behind cache
-
Minimizes writes with repeated cache flushing
-
Write new data to cache
-
Periodically read cache and persist to disk
-
To implement, user submits data to cache and a task to task queue
-
When the task queue is processed, task is dispatched and the task does a periodic read from the cache and writes to the datastore. Essentially using the TaskQueue as an executor
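Our own toy model of the write-behind cache described above, using plain dicts and a list in place of memcache, the datastore and the real Task Queue API (none of which is called here). WriteBehindCache and its members are hypothetical names. Many cache writes collapse into one queued flush task, and the flush is idempotent, so running it twice does no harm.

```python
class WriteBehindCache:
    FLUSH_TASK = "flush-cache"

    def __init__(self):
        self.cache = {}       # stands in for memcache
        self.datastore = {}   # stands in for the datastore
        self.task_queue = []  # stands in for a task queue
        self.flush_count = 0

    def submit(self, key, value):
        # Write the new data to the cache only...
        self.cache[key] = value
        # ...and make sure exactly one flush task is pending.
        if self.FLUSH_TASK not in self.task_queue:
            self.task_queue.append(self.FLUSH_TASK)

    def run_pending_tasks(self):
        # Plays the role of the queue dispatching tasks to the app.
        while self.task_queue:
            self.task_queue.pop(0)
            self.flush()

    def flush(self):
        # Idempotent: copying the same cache contents twice leaves the
        # datastore in the same state.
        self.datastore.update(self.cache)
        self.flush_count += 1
```

Three submits followed by one dispatch produce a single datastore write pass, which is the point of the pattern: repeated cache updates cost one persist, not one per update.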
-
Python only at first. Java comes next
-
Java support in the works
-
The Future
-
Batch Processing
-
Task Queue is good for small datasets (<100k rows)
-
More tools needed for parallelization
-
Map Reduce in future
-
Eventually
-
Want it to work with small and large (Terabyte scale) datasets
Offline Processing on App Engine: A Look Ahead
Live Notes by @rburton
-
Task Queue API
-
Task, Webhooks
-
Push vs. Pull Performance
-
Idempotence, Queues, Throttling
-
Names, ETA
-
Cron is good for periodic jobs, but not good enough
-
Having Offline processing really opens the doors to new applications to run on the App Engine
-
Task Queue
-
New API for batch processing.
-
Billing details have not been finalized
-
Expected to be released in a few weeks
-
What is it
-
Describe the work you want to do now
-
Save the description somewhere
-
Have something else execute the work later
-
Basically an HTTP adapter for queues, where the worker is modeled as an HTTP application: app -> Queue -> Queue mediator -push-> Request Handler (HTTP application).
-
Work executed in the order received (best-effort FIFO)
-
If execution fails, it will be retried until successful
-
Benefits
-
Asynchronous
-
Low-Latency
-
Tasks are lightweight; 3x faster than datastore
-
Reliable
-
Once written, a task will eventually complete
-
Scalable
-
Storage of new tasks has no contention
-
Parallelizable with multiple workers
-
Many features can extend this basic concept
-
Other queue systems exist
-
New API implementing queueing not pub-sub.
-
Polling has problems
-
Not event driven -
-
Workers stay resident when there's no work to do
-
Fixed number of workers
-
Admins must add more workers to keep up to the workload.
-
Limited optimization possible
-
Many systems fake a polling interface with something event-driven under the hood
-
Task Queue is different because...
-
Work is pushed to your app; no polling necessary
-
HTTP Web Hooks
-
RESTful, push-based interface for doing work
-
Concept used outside Google and App Engine
-
http://en.wikipedia.org/wiki/Web_hooks
-
Task as web Hooks
-
Task is just an http request
-
Enqueue and we send your app the request later
-
if the web hook returns HTTP 200, it's done.
-
Any other response causes back-off and a retry.
-
Worker threads added depending on work-load
-
max number of threads depends on throughput
-
high maximum rate limits for safety
-
Integrated into admin console as normal request
-
Application and request logs searchable
-
Dashboard statistics and error-rate monitoring
-
Graphs include offline work
-
Idempotence
-
Important for tasks to be idempotent
-
Running the same task repeatedly must cause no harmful effects, only acceptable ones (e.g., duplicate emails)
-
Necessary because failure may happen at anytime
-
Tasks will be retried until success
-
Possible for a task to spuriously run twice even without server failures!
-
It is your responsibility as the developer to ensure idempotence
-
Working with a Queue
-
Each task added to a single Queue for execution
-
Multiple queues allowed per application
-
Queues provide isolation and separation of tasks
-
Configure how each queue is throttled
-
Why throttle?
-
Handle execution in batches
-
Prioritization of work
-
Queues can be used to setup SLA
-
Schema migration
-
Define handler to: query for next N entities; modi.. slide changed DAMN!
-
ETA
-
Estimated Time of Arrival
-
How long until a task should be executed
-
Different than visibility timeouts in other systems.
-
Useful for doing work in the near future
-
More fine-grained and programmatic control than cron
-
e.g., clear caches periodically, flush buffers, etc.. basically avoid polling
-
Tasks Names:
-
A unique name by the app
-
Auto-generated if not given
-
Task names are used to ensure a task is only run once, since a duplicate will collide with an existing name. They 'tombstone' task names after they're used.
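Our own in-memory sketch of named tasks, tombstones and retry-until-200, assuming handlers are plain functions that return an HTTP-style status code. ToyTaskQueue and DuplicateTaskError are hypothetical names and this is not the App Engine API. A name that has ever been used stays reserved, and a task that fails is simply re-queued, which is why the handler must be idempotent.

```python
from collections import deque


class DuplicateTaskError(Exception):
    """Raised when a task name is live or tombstoned."""


class ToyTaskQueue:
    def __init__(self):
        self.pending = deque()
        self.names_seen = set()  # live names plus tombstones
        self.counter = 0

    def add(self, handler, payload, name=None):
        if name is None:
            # Auto-generate a name when the app does not supply one.
            self.counter += 1
            name = "auto-%d" % self.counter
        if name in self.names_seen:
            raise DuplicateTaskError(name)
        self.names_seen.add(name)
        self.pending.append((name, handler, payload))
        return name

    def run(self, max_attempts=10):
        attempts = 0
        while self.pending and attempts < max_attempts:
            attempts += 1
            name, handler, payload = self.pending.popleft()
            status = handler(payload)
            if status != 200:
                # Anything other than 200 means "try again later".
                self.pending.append((name, handler, payload))
        # Names are never removed from names_seen: that is the tombstone.
```

A handler that fails once is invoked twice with the same payload, so any side effect it has (sending an email, incrementing a counter) needs to tolerate repetition.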
-
Thoughts:
-
Even though Task Queues are useful, I see them being hard for developers to use properly. A good example: the presenter had a bug in his own code, which did a simple add to a list.
The Softer Side of Schemas - Mapping Java Persistence Standards to the Google App Engine Datastore
Live Notes by
@rburton
-
Datastore is
-
transactional
-
natively partitioned
-
hierarchical
-
Schema-less
-
Based on BigTable
-
Datastore is NOT
-
A relational database
-
A SQL engine
-
Designed to
-
Simplify development of Apps
-
Simplify management of Apps
-
Strength of Services on Google's platform
-
Easy to Scale
-
Request volume
-
Data volume
-
Storage Model
-
Basic Unit of storage is Entity consisting of
-
Kind - aka Table
-
Key - Primary Key
-
Entity Group - Partitioning of data
-
0..N typed properties - Columns
-
Datastore Transactions
-
Transactions can only be done on a single "entity group"
-
get()/put()/delete() are transactional
-
Soft Schema
-
Constraints are done in the application layer
-
Ease of development
-
Rapid typesafe prototyping
-
Implementing a Soft Schema on the Datastore using JDO & JPA
-
JDO or JPA meta-data defines the soft schema
-
Standards based APIs
-
Existing tooling
-
Easier to port
-
Specs closely map to the datastore and the datastore to the specs.
-
API is a good fit
-
implementation is tougher
-
Transactions
-
Global transactions vs. Entity Group Transactions
-
Two phase commit
-
Relationship Management
-
JDO & JPA are not just about object relationships
-
Transparent persistence - Pure POJO's
-
Object view of your data - ORM (Object-Relational Mapping)
-
Centralized mapping
-
Big maintainability win
-
Letting the framework manage relationship management
-
Transparent Entity Group Management
-
Entity Group Layout is important
-
Write throughput
-
Atomicity of updates
-
Object relationships can be described as "owned" and "unowned"
-
Owned - A child object doesn't make sense without the parent. Composite?
-
Unowned - The object doesn't require a parent.
-
We let Ownership imply co-location within an entity group.
-
Future JDO/JPA Work
-
Provide more control over physical layout
-
Required getNextId() to avoid multiple updates to the same entity.
-
Support unowned relationships
-
Tricky transaction issues (Because transactions can only be done on the same entity groups.)
-
Loosen our query restrictions
-
Migrating to the App Engine
-
It's important to note that the Datastore is not a drop-in replacement for an RDBMS
-
Analyze
-
Primary Keys
-
Single-column numeric and string keys are a nice fit
-
Composite keys can be an ancestor chain
-
Mapping tables can be represented using multi-value properties (many-to-many join tables)
-
Transactions
-
Queries
-
Views
-
Triggers
-
Data Migration
-
Porting
-
Transactions
-
Identify roots in your data model
-
Identify operations that transact on multiple roots
-
Analyze the impact of partial success and then either
-
Refactor
-
disable the transaction
-
disable the transaction and write compensating logic.
-
Queries
-
Shift processing from reads to writes
-
Identify joins
-
Denormalize or rewrite as multiple queries
-
Identify unsupported filter operations (distinct, toUpper)
-
Rewrite as multiple queries
-
Filter in-memory
-
Taking your code to someone else's party
-
App Engine persistence code is generally more restrictive
-
Queries
-
Transactions
-
Multiple updates
-
Decide what portability means and how important it is
-
To key or not to Key
-
Multi-value properties
-
Your domain model is already sharded.
-
Key Takeaways
-
The App Engine Datastore simplifies persistence
-
You can use JDO/JPA to implement a soft schema
-
Denormalize is not a dirty word
-
Think about porting if you leave.
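A rough sketch of the "soft schema" idea from these notes: the store accepts any properties, and the constraints live only in application code. This is plain Java with no App Engine, JDO, or JPA classes; Entity, SoftSchema, require, and isValid are hypothetical names made up for illustration, not part of any Google API.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a schema-less entity: a kind, a key, and untyped properties.
final class Entity {
    final String kind;
    final String key;
    final Map<String, Object> properties = new HashMap<>();

    Entity(String kind, String key) {
        this.kind = kind;
        this.key = key;
    }
}

// Hypothetical soft schema: required property names and their expected Java types.
final class SoftSchema {
    private final String kind;
    private final Map<String, Class<?>> required = new HashMap<>();

    SoftSchema(String kind) {
        this.kind = kind;
    }

    SoftSchema require(String property, Class<?> type) {
        required.put(property, type);
        return this;
    }

    // The store itself would accept anything; only this application-layer check enforces the schema.
    boolean isValid(Entity entity) {
        if (!kind.equals(entity.kind)) {
            return false;
        }
        for (Map.Entry<String, Class<?>> rule : required.entrySet()) {
            Object value = entity.properties.get(rule.getKey());
            // isInstance(null) is false, so a missing property also fails the check.
            if (!rule.getValue().isInstance(value)) {
                return false;
            }
        }
        return true;
    }
}
```

In the talk this role is played by JDO or JPA metadata on the model classes; the sketch only shows where the enforcement sits.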
The Softer Side of Schemas - Mapping Java Persistence Standards to the Google App Engine Datastore
Live Notes by @dushyanth
-
Datastore is
-
Transactional
-
Natively Partitioned - developer does not have to worry about scaling
-
Hierarchical - every entity can have a notion of parent
-
Schemaless - no restricted structure
-
Based on BigTable
-
Not a relational database
-
Not a SQL Engine
-
Simplifying Storage
-
Simplifies
-
Development
-
Management of applications
-
Scale always matters
-
Request volume
-
Data volume
-
Datastore Storage Model
-
Entity consists of
-
Kind
-
Key
-
Entity Group
-
0..n properties
-
If entity group == key, the entity is a parent
-
Heterogeneous property types. Properties can be of different types in different entities
-
Supports multi valued properties
-
Variable property - Having the same properties between entities is not needed
-
Soft Schema
-
It is a schema whose constraints are enforced only in the application layer
-
Simpler development process
-
Can be enforced by JDO or JPA metadata mappings
-
Transactions
-
Can only transact within a single entity group
-
Relationship Management
-
JDO and JPA are not just about object relationships
-
Transparent persistence
-
Object view of your data
-
Centralized mapping
-
Big maintainability win
-
AppEngine decides and manages which entity group the entity belongs to
-
Uses ownership to enforce entity group co-location
-
Future JDO/JPA work
-
Support unowned relationships
-
Bringing existing code to App Engine
-
Datastore is not a drop in replacement for RDBMS
-
Plan for data migration
-
Primary Keys
-
Single-property keys: straightforward to map
-
Composite keys: can map to an ancestor chain
-
Mapping table: can be represented using multi-value properties
-
And can be queried with set membership
-
Transactions
-
Identify roots in the data model
-
Identify operations that transact on multiple roots
-
Analyze impact of partial success
-
Refactor
-
Run compensating logic
-
Queries
-
Shift processing from reads to writes.
-
Denormalize
-
Expensive write and cheap reads
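A small sketch of "shift processing from reads to writes": rather than counting rows at read time, the write path maintains a denormalized count. This is an in-memory stand-in in plain Java; CommentStore and its methods are hypothetical names for illustration, not Datastore APIs.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory store; it only illustrates the write-time bookkeeping.
final class CommentStore {
    private final Map<String, List<String>> commentsByPost = new HashMap<>();
    // Denormalized copy: the comment count is kept per post and updated on every write.
    private final Map<String, Integer> commentCountByPost = new HashMap<>();

    // The write does the extra work (more expensive write).
    void addComment(String postKey, String text) {
        commentsByPost.computeIfAbsent(postKey, k -> new ArrayList<>()).add(text);
        commentCountByPost.merge(postKey, 1, Integer::sum);
    }

    // The read is a single lookup, with no scan over the comments (cheap read).
    int commentCount(String postKey) {
        return commentCountByPost.getOrDefault(postKey, 0);
    }
}
```

In a real Datastore application the count and the comments would need to share an entity group for the two updates to be transactional, as the notes on transactions point out.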
Google Wave Client: Powered by GWT
Live Notes by @dushyanth
-
It's got to be fast
-
Stunning
-
Optimistic UI
-
JSNI
-
Client Architecture
-
Bidirectional communication channel - keep alive http
-
Protocol Compiler
-
Generates interfaces, client + server implementations
-
GWT
-
Code Heavy
-
Can use UIBinder to plop GWT components into html
-
Most bugs are from CSS
-
Style Injector + CssResource
-
Looks like Minification + Image Spriting is done by GWT
-
Allows modularization of CSS
-
Different CSS for different browsers
-
Inefficient JSON handling
-
JSO - Javascript object structure
-
Didn't quite get it
-
Hosted mode isn't quite browser like
-
OOPHM (Out Of Process Hosted Mode) - Browser plugin to debug in eclipse
-
Download Size
-
runAsync (dynamic loading of code)
-
Download lazily
-
No transparency between javascript and java
-
SOYC (Story of your compile) reports
-
Java package to javascript breakdown report
-
JSOs cannot implement interfaces
-
SingleJsoImpl
-
In order to inline, JSOs cannot have polymorphic dispatch
-
At most one JSO class may implement any given interface
-
Improving Gears
-
Client side thumbnailing
-
They create a thumbnail using the workerpool before uploading the image to server.
-
Desktop drag n drop
-
Resumable uploading
-
Performance
-
Startup
-
runAsync
-
fast start
-
inline images + css
-
smaller download
-
stats collection
-
server-side script selection
-
Server sends down the correct javascript + css files based on http headers
-
Loaded Client
-
Optimistic UI (trying to guess what the user will click next)
-
Prefetching
-
Flyweight pattern
-
Rendering tricks
-
Mobile Client
-
Deferred binding saves the day
-
iPhone browser is always running
-
It loads faster than native apps
-
Testing
-
Use Model View Presenter design pattern - how is it different from MVC?
-
Prefer JUnit tests over GWTTestCase
-
Browser automation - WebDriver
-
WebDriver is a developer-focused tool for browser automation
-
Has native keyboard and mouse events, rather than synthesised via JS
-
iPhone Driver - automated testing on iPhone
-
Remote Web Driver - so web testing can be farmed out into a grid
-
Google Wave Client: Powered by GWT
By: Richard L. Burton III
-
Bi-directional communication channel
-
Protocol Compiler
-
Generates interfaces, client + server implementations
-
Concurrency Control stack
-
What was GWT missing in 2007
-
UI Code was cumbersome
-
Cross-browser code
-
Solution: StyleInjector + CssResource
-
Validation
-
Minification + Image Spriting
-
Image spriting combines images into one large image, with generated code to read the individual sprites.
-
Allowing Modularization of CSS
-
Different CSS for different browsers at compile time.
-
JSON handling heavy-handed
-
Solution: JavaScriptObject (JSO) used to create an overlay type.
-
Avoid using JSONObject: Use JSO/StringBuffer
-
Debugging environment not the browser
-
Solution: Out-of-Process Hosted Mode (OOPHM)
-
Browser plugin to debug in eclipse, but runs in the real browser.
-
Firebug only for Firefox
-
OOPHM allows JavaScript debugging in FF, Safari, IE (So far..)
-
See http://code.google.com/p/google-web-toolkit/wiki/DesignOOPHM
-
Monolithic compile -> everything downloaded at start
-
Solution: runAsync
-
Downloads what you need when you need it
-
Resources (CSS, images, messages) come with the code that uses them.
-
Automatically handled by GWT compiler.
-
See http://code.google.com/p/google-web-toolkit/wiki/CodeSplitting
-
Story of Your Compile (SOYC) reports.
-
Mapping from Java <-> JS unclear
-
Solution: Story of Your Compile (SOYC) reports.
-
How does it help?
-
Messages too large
-
Compiled class names
-
What's in the initial download of the client.
-
Compiler needed to be updated.
-
JSOs cannot implement interfaces
-
Issue: Needed interfaces for the messages so that server and client can hide their implementations
-
Solution: SingleJsoImpl
-
In order to inline, JSOs cannot have polymorphic dispatch
-
SingleJsoImpl: allow at most one JSO class to implement any interface.
-
Improving Gears
-
Client-side thumbnailing
-
Send thumbnails before image upload
-
Uses WorkerPool (i.e., background threads) to avoid blocking the UI.
-
Desktop Drag and Drop support
-
Resumable uploading
-
Performance
-
Startup
-
RunAsync
-
inline images + CSS
-
Server-side script selection: detects which scripts are required.
-
Loaded client
-
Optimistic UI
-
Prefetching
-
flyweight pattern
-
rendering tricks (prefer the DOM over GWT's widgets)
-
Mobile Client
-
GWT deferred binding helped a lot
-
Version 1 is AJAX only
-
HTML 5 / Gears caching - Uses AppCache manifest GWT linker
-
Special communication channel for mobile devices.
-
Testing
-
Model View Presenter
-
Prefer JUnit over GWTTestCase
-
Browser automation: WebDriver
-
Developer-focused tool for browser automation
-
Native keyboard and mouse events, rather than synthesised via JavaScript
-
Incomplete
-
Early adoption by Wave
-
Google Wave commitment
-
iPhoneDriver coming soon
-
RemoteWebDriver for grid computing
-
Tips
-
Avoid XPath
-
Use IDs, names, and sub-DOM navigation
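The flyweight pattern listed under "Loaded client" above was not explained in the session notes, so here is a generic sketch of the idea in plain Java: many rows share one immutable object instead of each holding its own copy. AvatarStyle and AvatarStyleFactory are hypothetical names chosen for illustration; how the Wave client actually applies the pattern is not covered in these notes.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical shared, immutable state (the flyweight): one instance per distinct image URL.
final class AvatarStyle {
    final String imageUrl;

    AvatarStyle(String imageUrl) {
        this.imageUrl = imageUrl;
    }
}

// The factory hands out cached instances so repeated requests do not allocate new objects.
final class AvatarStyleFactory {
    private final Map<String, AvatarStyle> cache = new HashMap<>();

    // Returns the same instance for the same URL, so many rows can share one object.
    AvatarStyle get(String imageUrl) {
        return cache.computeIfAbsent(imageUrl, AvatarStyle::new);
    }

    int size() {
        return cache.size();
    }
}
```

Per-row state (position, selection, and so on) would stay outside the shared object and be passed in when rendering.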