Thoughts on Improving ADS 01 --Reducing Space in Rows

ADS was very fast for high-volume workloads with lots of updates --especially when each write transaction did only a very few updates. However, there is always room for improvement and this section is for discussing that.

There are two significant overheads of the concurrency control model used ADS:

The row takes up more space than it would need just to represent the data. The extra space includes bit fields, a rowlock and the extra space required because every cell has to be at least 8 bytes, even if the data for the cell would fit in one. This space is costly because increasing the size of a row decreases the number of rows that can be kept in a disk page, which in turn means that you have to load more pages to see the same number of rows.
Updates require making a linked list of updates for the session, then traversing that list on a commit or rollback, then traversing it again on cleanup. Compare the dozens or hundreds of instructions required for each update to the less than 5 instructions that would be required to update the same value in a non-concurrent system. And not only is this a lot more work, it also has very poor cache behavior.

In database technology, it is a rule of thumb that you don’t worry a lot about the numbers of instructions used for an operation because all of the time is spent waiting for I/O. The extra space used for a row increases the amount of I/O that has to be done. But the numbers of instructions are important also because ADS was designed for the case where the working set is kept in memory and I/O is not the dominating factor. Cutting 90% of the instructions associated with updates could make a substantial performance difference.

Getting rid of the extra space

I’ve already discussed how to get rid of one set of flags by using range checks on the pointer value. There is still one flag per column needed and a rowlock used for deletes. After getting rid of the other flag, the remaining overhead could be represented with a 4 byte bit field plus a 4 byte rowlock plus the additional bit fields needed when there are more than 32 columns.

The remaining flags are needed because ADS has two different cell representations: one where the cell holds an immediate value, and one where a cell holds a pointer to an update block or history block --lets call this the history pointer because it is used to support historic reads. Since a cell value can represent either of two different things, ADS needs a flag to distinguish which one it is and it needs to make the cell large enough for the larger of the two possibilities.

It is this dual use of cells that causes many of our problems, so how can we change that? Instead of keeping the history pointer in each cell, we can put a single history pointer at the head of the row and have this one pointer point to a chain of all of the history for the row. For example, suppose we have a row for a table, t, with three integer columns, i, j, and k and that we update a row with a statement like

update t set i = i+1, k = k*2 where ...

The resulting structure would look like this:

With this representation the cell in the row always represents an immediate value so it can be sized for the value and there is no need for a flag to tell what kind of value it is. In addition, this pointer can be used to replace the rowlock by having a special “deleted” node for the update chain. So we haven’t completely eliminated the need for a row header; we have replaced a 4 byte bit field and a 4 byte rowlock plus additional bytes when there are more than 32 columns with a single 8 byte pointer. The only significant savings is really in the column sizes.

The disadvantage of this scheme is that if there are any updates at all on a row, then every reader has to traverse the list of cell updates to see if a given cell has been updated in addition to reading the cell. If there are updates on many different cells, then all readers will have to take extra time traversing the column update list to find the proper column. We could avoid this overhead by keeping the column list as an array instead of as a linked list. In other words, the column history pointer would point to an array of pointers, one for each column:

With this structure reading costs no more than it did in ADS. For updates where over half of the columns are modified, the array uses less memory and takes less work to initialize (because the linked list has two pointers for each history value and the array just one). For updates where less than half the columns are modified (which is probably the more usual case) the array representation takes more memory and more work to initialize because you have to zero out all elements that do not point to an update block. For a long row, that can be a significant cost if you only want to update one cell but you end up having to set hundreds of columns to 0. On the other hand, zeroing out adjacent cells is relatively cheap compared to following linked lists because linked lists have poor cache behavior.

We can even avoid the overhead of zeroing a large array by using a bit field to say what columns are set so we only have to initialize the bit field to 0, not all of the cells. An intermediate approach is to allocated column arrays in segments of, say 16 columns, and use a bit field to indicate which are fields are in use.

For a block of size 16 we need to reserve 2 bytes for the bit fields that say which columns have updates. If we want to align the pointers to an 8-byte boundary, that leaves us with 6 bytes to specify the block number. Block number 0 represents the first 16 cells of a row, block number 1 represents the next 16, etc.

There are obviously a lot of variations on this theme, all with different overheads in terms of space, the time it takes to do the first update on a cell, the time it takes to find a historic value, and the complexity of the structure. Complexity is a cost because complexity makes it more expensive in programmer time to write and maintain the code.