[TM Technical Report]
Improving Scalability and Performance
[Authors: James Dam Tuan Long]
Bottleneck: Inefficient reads/writes
Bottleneck: Overhead of managing entity relationships
Bottleneck: Iterating all entities
Bottleneck: Too many writes in a single request
This report describes a preliminary study that identified potential scalability bottlenecks in TEAMMATES and some attempts to remove them.
Currently, objects are read using GQL queries. For example, the function that retrieves a coordinator queries the datastore for the coordinator with a particular googleID. The result of the query is a list, but the function returns only the first element of that list.
There is nothing wrong with that function in a normal application using SQL. However, the GAE datastore is built on Bigtable, a proprietary NoSQL backend, which offers a much more efficient way to read an object: direct lookup [1]. Direct lookup uses the key of the object (in the above case, the googleID of the coordinator) to identify the object. Direct lookup can be 4-5 times faster than a query and consume half the resources. For frequently used entities, using memcache can boost performance further (up to 20 times faster than direct lookup) and save a lot of resources [TODO: cite source].
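The three read paths described above can be contrasted in a minimal sketch. It does not use the GAE API: the class CoordinatorStore, its two maps (standing in for the datastore and memcache), and the datastoreReads counter are all hypothetical names introduced for illustration, and the sketch says nothing about real timings.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory stand-in for a datastore kind keyed by googleID (hypothetical, not the GAE API). */
final class CoordinatorStore {
    /** Minimal entity for illustration; the real class has more fields. */
    static final class Coordinator {
        final String googleId;
        final String name;

        Coordinator(String googleId, String name) {
            this.googleId = googleId;
            this.name = name;
        }
    }

    private final Map<String, Coordinator> byKey = new HashMap<>(); // stands in for the datastore
    private final Map<String, Coordinator> cache = new HashMap<>(); // stands in for memcache
    int datastoreReads = 0; // counts simulated datastore calls

    void put(Coordinator c) {
        byKey.put(c.googleId, c);
        cache.remove(c.googleId); // invalidate so readers never see a stale copy
    }

    /** Query style: filter on a field, collect a list, return the first element. */
    Coordinator getByQuery(String googleId) {
        datastoreReads++;
        List<Coordinator> results = new ArrayList<>();
        for (Coordinator c : byKey.values()) {
            if (c.googleId.equals(googleId)) {
                results.add(c);
            }
        }
        return results.isEmpty() ? null : results.get(0);
    }

    /** Direct lookup: the googleID is the key, so no query is needed. */
    Coordinator getByKey(String googleId) {
        datastoreReads++;
        return byKey.get(googleId);
    }

    /** Cache-aside read: try the cache, fall back to direct lookup, then fill the cache. */
    Coordinator getCached(String googleId) {
        Coordinator hit = cache.get(googleId);
        if (hit != null) {
            return hit;
        }
        Coordinator c = getByKey(googleId);
        if (c != null) {
            cache.put(googleId, c);
        }
        return c;
    }
}
```

In this sketch, repeated calls to getCached for the same googleID reach the simulated datastore only once, which is the effect memcache is meant to have for frequently used entities.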
Writing data back to the datastore can also be improved. By default, GAE builds an index for every field except text and blob fields. However, only fields that are used for querying need to be indexed.
Proposed solution 1:
Rewrite the current datastore code to use direct lookup and memcache.
Pros: Increases performance and saves space.
Cons: Have to rewrite current datastore and add extra code.
Decision: Put on hold, as the change is drastic and the need for performance improvement is not pressing at the moment.
Proposed solution 2:
Disable unused built-in indexes.
Pros: Reduces write operations and the space needed to store indexes; increases write speed.
Cons: Removing the wrong indexes could result in serious problems.
Decision: Solution adopted. Some unused indexes were identified and removed (see Issue 346). Removing unused indexes did reduce storage space: for a data set of 50,000 users, the space used for indexes dropped from 547 MB to 335 MB (about 39%). Datastore read and write usage dropped by 10-20%.
Currently, objects are updated separately and relationships are maintained at code level (not database level) using string references as entity IDs. For example, the Student entity has a String field called courseId that identifies a Course entity, as opposed to having a field of type Course. Maintaining a relationship between two objects by keeping the ID of one object as a field in the other is common, especially for 1-to-many relationships. However, it is not very efficient for “1-to-a-few” and “parent-child” relationships (such as instructor-course, student-course, or course-evaluation, where the parent object is unlikely to own many child objects and the ownership is not expected to change).
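A minimal sketch of what maintaining relationships at code level implies is given below. Only the Student entity, the Course entity, and the courseId field come from the report; the class StringReferenceDemo, the email field, and the methods courseOf and deleteCourse are hypothetical, and in-memory collections stand in for the datastore.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Iterator;
import java.util.List;
import java.util.Map;

/** Illustrates relationships kept as String IDs and maintained by application code. */
final class StringReferenceDemo {
    static final class Course {
        final String id;

        Course(String id) {
            this.id = id;
        }
    }

    static final class Student {
        final String email;
        final String courseId; // plain String reference, not a field of type Course

        Student(String email, String courseId) {
            this.email = email;
            this.courseId = courseId;
        }
    }

    final Map<String, Course> courses = new HashMap<>();
    final List<Student> students = new ArrayList<>();

    /** Resolving the relationship is a second, separate lookup performed by the code. */
    Course courseOf(Student s) {
        return courses.get(s.courseId);
    }

    /** Nothing prevents a dangling courseId, so deletion has to cascade by hand. */
    int deleteCourse(String courseId) {
        courses.remove(courseId);
        int removed = 0;
        for (Iterator<Student> it = students.iterator(); it.hasNext();) {
            if (it.next().courseId.equals(courseId)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }
}
```

Every relationship handled this way needs its own lookup and its own cleanup code, which is the management overhead this section is concerned with.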
Proposed solution 1:
Do not use Bigtable (datastore). Google now offers Google Cloud SQL, a relational (SQL) database that can be used in GAE. For an application that is not very data-intensive, such as TEAMMATES, some web developers suggest using SQL because of its simplicity.
Pros: Simplifies database design while still using Google infrastructure.
Cons: Has the same scalability problems as a traditional web application because of the nature of SQL.
Decision: Changing to a new database is a drastic measure that is not needed at this point.
Proposed solution 2:
GAE provides better ways to handle this type of relationship.
Pros: Saves resources, increases speed, and manages object relationships better.
Cons: Requires deep knowledge of the GAE datastore and a lot of extra code.
Decision: To be considered in the future.
Proposed solution 3:
Use the Objectify framework.
Objectify is a very popular framework for GAE. According to their documentation,
“Objectify is a Java data access API specifically designed for the Google App Engine datastore. It occupies a "middle ground"; easier to use and more transparent than JDO or JPA, but significantly more convenient than the Low-Level API. Objectify is designed to make novices immediately productive yet also expose the full power of the GAE datastore.”
Pros: Objectify exposes all native datastore features, including batch operations, queries, transactions, asynchronous operations, and partial indexes. Its syntax is intuitive and easy to understand.
Cons: Objectify is not easy to master. It adds to the learning curve of developers.
Decision: Put on hold.
Experiments: After redesigning part of the app (classes converted: Account, Instructor) to use Objectify (version 4.0a), we compared its performance against that of the original app. The results were not encouraging. In some cases, Objectify was even slower than the original.
Given below are some data (averaged over 100 requests):
Operation | Using Objectify (ms) | Not using Objectify (ms) |
Delete coordinator | 874 | 603 |
Delete course | 924 | 622 |
Create coordinator | 801 | 634 |
Create course | 803 | 608 |
Get coordinator | 799 | 620 |
Get course | 969 | 565 |
Some functions in the current TEAMMATES are not very scalable. One such kind of function queries all the entities of a kind from the datastore and then processes the results in memory. This is never a good solution, even for non-web applications, since all machines have limited memory (the instance type TEAMMATES currently uses has only 128 MB of memory). With 100,000 students in the system, a single call to such a function would crash the system because of insufficient memory.
Even if the system does not crash, writing such functions is poor practice because of their memory consumption and slow speed.
Proposed solution 1:
Use query cursors. A query cursor breaks the query result into smaller chunks that are processed one at a time. Query cursors combined with a task queue can run long processing tasks in the background, working around the 60-second limit on a request.
Pros: Intuitive; similar to cursors in SQL.
Cons: Requires multiple queries, which means more resources and waiting time.
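The chunked processing that a query cursor enables can be sketched as follows. This is a simplified model, not the GAE cursor API: the class CursorPagingDemo and its methods are hypothetical, an in-memory list stands in for the datastore, and the cursor is modelled as an integer position, whereas a real datastore cursor is an opaque value returned by the query.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

/** Processes a large result set one page at a time instead of loading it all. */
final class CursorPagingDemo {
    /** One chunk of results plus the position to resume from (-1 when exhausted). */
    static final class Page {
        final List<String> items;
        final int nextCursor;

        Page(List<String> items, int nextCursor) {
            this.items = items;
            this.nextCursor = nextCursor;
        }
    }

    private final List<String> allStudents; // stands in for the datastore

    CursorPagingDemo(List<String> allStudents) {
        this.allStudents = new ArrayList<>(allStudents);
    }

    /** Fetches at most limit items starting at the given cursor position. */
    Page fetchPage(int cursor, int limit) {
        int end = Math.min(cursor + limit, allStudents.size());
        List<String> items = new ArrayList<>(allStudents.subList(cursor, end));
        int next = end < allStudents.size() ? end : -1;
        return new Page(items, next);
    }

    /** Applies action to every item, holding one page in memory at a time; returns pages fetched. */
    int forEachInChunks(int limit, Consumer<String> action) {
        if (limit <= 0) {
            throw new IllegalArgumentException("limit must be positive");
        }
        int pages = 0;
        int cursor = 0;
        while (cursor != -1) {
            Page page = fetchPage(cursor, limit);
            pages++;
            page.items.forEach(action);
            cursor = page.nextCursor;
        }
        return pages;
    }
}
```

Each page costs one more fetch, which is the trade-off listed under Cons: memory use stays bounded by the page size, at the price of more queries.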
Proposed solution 2:
Design queries to reduce in-memory processing.
Decision: Revisit queries in the current system and try to optimize them. Currently, there are methods such as getAllStudents which are not scalable but used only by the admin features.
Every request to GAE must complete within 60 seconds. After 60 seconds, GAE kills an unfinished request by throwing a DeadlineExceededException. Even if the request is not killed by GAE, a long request makes for an unpleasant user experience. For example, in the current implementation, creating an evaluation for a large class is extremely expensive because it also creates all the Submission entities required for the evaluation, costing up to thousands of read and write operations. The number of submissions in an evaluation depends on the size of the course: for a course of hundreds of students with a team size of about 3-5 students, an evaluation can have more than 1,000 submissions. Creating an evaluation therefore risks exceeding the 60-second limit. Note: A quick test with V4.35 indicates that creating an evaluation for a class of 300 students takes around 30 seconds.
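The dependence of the submission count on course size can be made concrete with a rough estimate. The report does not state how many Submission entities each team generates, so the sketch below rests on an assumption: one Submission per ordered (reviewer, reviewee) pair within a team, self-review included. The class and method names are hypothetical.

```java
/** Rough estimate of Submission entities created for one evaluation. */
final class SubmissionEstimate {
    /**
     * ASSUMPTION (not stated in the report): each team of size n produces
     * n * n submissions, one per ordered (reviewer, reviewee) pair,
     * including self-review. Students left over after forming full teams
     * are treated as one smaller team.
     */
    static long submissionsFor(int students, int teamSize) {
        if (students < 0 || teamSize <= 0) {
            throw new IllegalArgumentException("students must be >= 0 and teamSize > 0");
        }
        long fullTeams = students / teamSize;
        long leftover = students % teamSize;
        return fullTeams * teamSize * teamSize + leftover * leftover;
    }
}
```

Under this assumption, a class of 300 students in teams of 4 gives 75 teams of 16 submissions each, or 1,200 submissions, which is in line with the "more than 1,000" figure above.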
Proposed solution 1:
Persist objects in batches (i.e., persistAll method) instead of one at a time.
Decision: This is the current approach.
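The difference between persisting one object at a time and persisting in batches can be sketched as below. This is an in-memory model only: the class BatchWriteDemo, the roundTrips counter, and the persistInBatches helper are hypothetical, and the persistAll method here is a stand-in that merely shares its name with the method mentioned above.

```java
import java.util.ArrayList;
import java.util.List;

/** Counts simulated datastore round trips for single and batched writes. */
final class BatchWriteDemo {
    final List<String> stored = new ArrayList<>(); // stands in for the datastore
    int roundTrips = 0;

    /** One entity per call: one round trip each. */
    void persist(String entity) {
        roundTrips++;
        stored.add(entity);
    }

    /** Many entities per call: still a single round trip in this model. */
    void persistAll(List<String> entities) {
        roundTrips++;
        stored.addAll(entities);
    }

    /** Splits a large write into batches of at most batchSize entities. */
    void persistInBatches(List<String> entities, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        for (int start = 0; start < entities.size(); start += batchSize) {
            int end = Math.min(start + batchSize, entities.size());
            persistAll(entities.subList(start, end));
        }
    }
}
```

In this model, writing 1,200 entities in batches of 500 takes 3 round trips instead of 1,200. The sketch counts calls only; it does not model the per-entity write cost that the datastore still incurs inside each batch.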
Proposed solution 2:
Use entity groups and transactions.
Pros: Transactions ensure atomicity of data.
Cons: Requires redesigning the datastore. A transaction blocks the whole entity group from being modified by other requests.
Decision: To be considered in the future.
Proposed solution 3:
Use asynchronous writes to the datastore.
Pros: Saves the waiting time of write operations. Can save multiple entities at the same time.
Cons: An advanced and newly introduced technique. Can cause more problems than it solves if not used correctly.
Decision: To be considered in the future.
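The general shape of asynchronous writes (start all writes, continue with other work, then wait once for all of them) can be sketched with standard Java futures. This is not the GAE asynchronous datastore API: the class AsyncWriteDemo, its methods, and the thread pool are hypothetical stand-ins, and a concurrent queue stands in for the datastore.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Starts several writes without waiting on each one, then waits once for all. */
final class AsyncWriteDemo {
    // Thread-safe stand-in for the datastore, since writes run concurrently.
    final ConcurrentLinkedQueue<String> stored = new ConcurrentLinkedQueue<>();

    /** Starts one write and returns immediately with a handle to its completion. */
    CompletableFuture<Void> putAsync(String entity, ExecutorService pool) {
        return CompletableFuture.runAsync(() -> stored.add(entity), pool);
    }

    /** Fires all writes, then blocks until every one has finished; returns the stored count. */
    int saveAll(List<String> entities) {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        try {
            List<CompletableFuture<Void>> pending = new ArrayList<>();
            for (String entity : entities) {
                pending.add(putAsync(entity, pool));
            }
            // The request could do other work here while the writes are in flight.
            CompletableFuture.allOf(pending.toArray(new CompletableFuture[0])).join();
            return stored.size();
        } finally {
            pool.shutdown();
        }
    }
}
```

The final join is where the risk noted under Cons shows up: a request that returns without waiting for its pending writes, or that ignores a failed one, can leave the data in an inconsistent state.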
[1] Google I/O 2012 talk: "Optimizing Your Google App Engine App" (video and slides).
---end of report---