CS 186 Exam Prep 9
Recovery
Recovery
WAL, Undo Logging, Redo Logging, ARIES
Recovery Policies
Write-Ahead Logging
Undo Logging
Recovery with Undo Log
6
Undo Log Example
Operation | t | Mem A | Mem B | Disk A | Disk B | UNDO Log |
| | | | | | <START T> |
READ(A, t) | 8 | 8 | | 8 | 8 | |
t := t*2 | 16 | 8 | | 8 | 8 | |
WRITE(A, t) | 16 | 16 | | 8 | 8 | <T,A,8> |
READ(B, t) | 8 | 16 | 8 | 8 | 8 | |
t := t*2 | 16 | 16 | 8 | 8 | 8 | |
WRITE(B, t) | 16 | 16 | 16 | 8 | 8 | <T,B,8> |
FLUSH(A) | 16 | 16 | 16 | 16 | 8 | |
FLUSH(B) | 16 | 16 | 16 | 16 | 16 | |
COMMIT | | | | | | <COMMIT T> |
Redo Logging
Recovery with Redo Log
9
Redo Log Example
Operation | t | Mem A | Mem B | Disk A | Disk B | REDO Log |
| | | | | | <START T> |
READ(A, t) | 8 | 8 | | 8 | 8 | |
t := t*2 | 16 | 8 | | 8 | 8 | |
WRITE(A, t) | 16 | 16 | | 8 | 8 | <T,A,16> |
READ(B, t) | 8 | 16 | 8 | 8 | 8 | |
t := t*2 | 16 | 16 | 8 | 8 | 8 | |
WRITE(B, t) | 16 | 16 | 16 | 8 | 8 | <T,B,16> |
COMMIT | | | | | | <COMMIT T> |
FLUSH(A) | 16 | 16 | 16 | 16 | 8 | |
FLUSH(B) | 16 | 16 | 16 | 16 | 16 | |
Undo/Redo Logging Summary
Aries Recovery - LSNs
Recovery Structures
Record Types
Record Types (cont.)
ARIES: Analysis (Part 1)
ARIES: Analysis (Part 2)
ARIES: Redo
ARIES: Undo
ARIES: Overall
ARIES Recovery Summary
1. Analysis
Goal: Rebuild the Xact Table and DPT at time of crash
Start at LSN after begin_checkpoint
If record != END
- Add xact to Xact Table if not there
- Set lastLSN of xact to record.LSN
If record = COMMIT or record = ABORT
- Change status in Xact Table accordingly
If record = UPDATE and record.page NOT IN DPT
- Add to DPT & Set recLSN to record.LSN in DPT
If record = END
- Remove xact from Xact Table
At the end:
If transaction is Committing
- Write END record and remove from Xact Table
If transaction is Running
- Write ABORT record and change status to Aborting
2. Redo
Goal: Maintain durability by getting back unflushed changes from before the crash
Redo everything from the earliest recLSN unless:
Page not in DPT
- Page on disk must be up to date, since we have no changes!
recLSN of page > LSN
- No need to redo here: recLSN of page is first record that dirtied page, so this change must have been flushed
On-Disk pageLSN >= LSN
- On-Disk Page LSN is the authoritative source for determining which changes have been applied already
3. Undo
Goal: Maintain atomicity by undoing any operations done by transactions running at the time of the crash
For all xacts in xact table not committing, change status to abort
Start from largest lastLSN in transaction table and work your way up the log.
- Write the relevant CLRs for any actions done by aborting transactions
For CLRs:
- prevLSN = lastLSN of associated transaction
- undoNextLSN = current record’s prevLSN
Performance optimization:
- Make a set containing the last undoNextLSN of each transaction (or lastLSN if no CLRs written for transaction yet)
- Undo the largest LSN, repeat
Worksheet
Conceptual Recovery
Question 1
Consider a scenario where we update the recLSN in the dirty page table to reflect each update to a page, regardless of when the page was brought into the buffer pool. What bugs might you see after recovery? Select all that apply. Explain your reasoning:
Answer: a, b.
Question 2
Suppose that you are forced to flush pages in the DPT to disk upon making a checkpoint. Which of the following cases are now guaranteed? There is one correct answer. Explain your reasoning.
Answer: c.
In general, we redo everything from the earliest recLSN in the DPT to get back unflushed changes from before crash. Since we can guarantee that all changes up to a checkpoint have been flushed, all unflushed changes from before the crash happened after the checkpoint. Therefore, we can redo starting from the checkpoint.
Question 3
If the buffer pool is large enough that uncommitted data is never forced to disk, is UNDO still necessary? How about REDO? Explain your reasoning.
Answer: UNDO isn’t necessary in terms of undoing operations on disk.
Having a buffer pool large enough to hold all uncommitted data means we don’t have to STEAL (allow an uncommitted transaction to overwrite the most recent committed value of an object on disk). Since all the updates will be sitting in the buffer pool at the time of crash, no changes will be made to disk, so no operations need to be undone. REDO is still necessary. REDO is needed to get back unflushed changes from before the crash. If everything is held in the buffer, this must be redone
Question 4
If updates are always forced to disk when a transaction commits, is UNDO still necessary? Will ARIES perform any REDOs? Explain your reasoning.
Answer: Only UNDO is necessary.
UNDO is necessary because we still have to finish aborting transactions that were in progress and weren’t committing. The updates that these transactions have gotten through so far must be rolled back. As for REDOs, ARIES might perform some redoes because there may be transactions still in progress at the time of a crash, but these will be undone in the UNDO phase.
Recovery Practice
Question 1
What is the latest LSN that this checkpoint is guaranteed to be up-to-date to?
Answer: LSN 50 - the begin-checkpoint record.
Question 2
What do the transaction table and dirty page table look like at the end of analysis, and what log records do we write during analysis?
Question 3
The next phase of ARIES is redo. What LSN do we start the redo from?
Answer: LSN 10 - the oldest recLSN in the Dirty Page Table
Question 4
From that record, we will redo the effects of all the following records, except we will not redo certain records. What are the LSNs of the records we do NOT redo?
Answer:
Question 5
The last phase of ARIES is undo. What do we do for this phase? Answer this question by writing out the log records that will be recorded for each step. Stop after you write your first CLR record (make sure your CLR record specifies the nextLSN!).
Question 6
You load up the checkpoint. What does the transaction table and dirty page table look like?
Answer: Same as before, we never made another checkpoint!
Question 7
You run the analysis phase. What do the transaction table and dirty page look like at the end of analysis?
Question 8
You run the redo phase. In order, what are the LSNs that we redo?
Answer: 10, 30, 60, 90, 120.
Importantly, note that we redid T1 even though it committed, and that we ARE redoing CLRs.
Question 9
Now we run the undo phase. What do we do? (Answer again with the log records that you have to add.)
Attendance Link