HBase-2729

1. Symptom

The problem lies in the method internalFlushCache in HRegion. It writes the flushed data directly to its final location, whether or not the write succeeds.

1.1 Severity

Blocker.

1.2 Was there exception thrown?

IOException, but not for the failure itself (the flush). An exception may be thrown while the faulty file is being written, but that exception is not part of the bug.

1.2.1 Were there multiple exceptions?

No.

1.3 Was there a long propagation of the failure?

No.

1.4 What was the scope of the failure?

Large number of region files.

1.5 Catastrophic?

Yes.

2. How to reproduce this failure

2.0 Version

0.89.20100621

2.1 Configuration

One cluster with one Region Server and an HMaster with a faulty file system.

2.2 Reproduction procedure

1. Start writing something into the temporary file system (file write)

2. Throw any IOException and interrupt the procedure (e.g. disconnect the writing client), creating a corrupted file (disconnect)

3. Flush the data using the method internalFlushCache. (feature start)

2.2.1 Timing order

In this order.

2.2.2 Events order externally controllable?

Yes.

2.3 Can the logs tell how to reproduce the failure?

No. They show the corrupted file, but not that it was wrongly written to disk, which is the actual misbehavior.

2.4 How many machines needed?

One.

2.5 How hard is the reproduction?

Simple.

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

An internalFlushCache call ran and completed successfully for a file that was corrupted.

3.2 Backward inference

HRegion.internalFlushCache writes directly to the target spot of the flushed data. If an exception is thrown, the finally block still appends the metadata and closes the file as if nothing had gone wrong. This is really bad: an IOException in the middle of flushing the cache could easily leave a valid-looking file with only half the data, which would then prevent us from recovering those edits during log replay.
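
The failure mode described above can be sketched in a few lines of plain Java. This is not HBase code: the class UnsafeFlushSketch, the method unsafeFlush, the failAfter parameter, and the "#END" trailer (standing in for the appended metadata) are all hypothetical names chosen for illustration. The sketch only shows how a finally block that finalizes the file unconditionally turns a half-written file into one that looks complete.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of the faulty pattern; not the HBase implementation.
class UnsafeFlushSketch {
    // Stand-in for the metadata that marks a file as complete (assumption).
    static final String TRAILER = "#END";

    // Writes edits straight to the final target. failAfter simulates an
    // IOException before the edit at that index (use -1 for no failure).
    static void unsafeFlush(Path target, List<String> edits, int failAfter) throws IOException {
        BufferedWriter writer = null;
        try {
            writer = Files.newBufferedWriter(target, StandardCharsets.UTF_8);
            for (int i = 0; i < edits.size(); i++) {
                if (i == failAfter) {
                    throw new IOException("simulated write failure at edit " + i);
                }
                writer.write(edits.get(i));
                writer.newLine();
            }
        } finally {
            // Runs on failure too: the partial file is finalized as if complete.
            if (writer != null) {
                writer.write(TRAILER);
                writer.newLine();
                writer.close();
            }
        }
    }

    public static void main(String[] args) throws IOException {
        Path target = Files.createTempDirectory("unsafe-flush").resolve("storefile");
        try {
            unsafeFlush(target, Arrays.asList("edit-1", "edit-2", "edit-3"), 1);
        } catch (IOException e) {
            System.out.println("flush failed: " + e.getMessage());
        }
        // The target now holds only the first edit, followed by the trailer.
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

With a failure injected before the second edit, the target file ends up containing the first edit followed by the trailer, so a reader that trusts the trailer cannot tell that two edits are missing.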

3.3 Are the printed logs sufficient for diagnosis?

No.

3.4 Are logs misleading?

No.

3.5 Do we need to examine different components’ logs for diagnosis?

No.

3.6 Is it a multi-component failure?

No.

3.7 How hard is the diagnosis?

Simple.

4. Root cause

HRegion.internalFlushCache() writes directly to the final location of the flushed data. It does not check the status of the write, and it does not check for thrown exceptions before finalizing the file. The correct logic is to first write the data to a temporary directory and, only if the write succeeds and no exception is thrown, move the file to the actual directory.

4.1 Category:

Incorrect error handling: an IOException during the write exposes the failure. To test this, you need to throw an error during the write; statement coverage alone is not enough.

5. Fix

5.1 How?

The method internalFlushCache should write the data to a temporary directory first and check that the write was successful before moving the file to the permanent directory.

try {
-    writer = createWriter(this.regionCompactionDir, maxKeyCount);
+    writer = createWriterInTmp(maxKeyCount,
+             this.compactionCompression);
} finally {
   // Does not care whether an exception was thrown...
   if (writer != null) {
        writer.appendMetadata(maxId, majorCompaction);
        writer.close();
   }
}
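
The write-to-temporary-then-move idea behind the fix can be sketched with the standard java.nio.file API. This is a minimal sketch under stated assumptions, not the actual patch: SafeFlushSketch, safeFlush, failAfter, and the "#END" trailer are hypothetical names, and the temporary directory is assumed to be on the same file system as the target so that an atomic move is possible.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration of write-to-tmp-then-move; not the HBase patch.
class SafeFlushSketch {
    // Stand-in for the metadata that marks a file as complete (assumption).
    static final String TRAILER = "#END";

    // Writes edits to a temporary file and moves it to target only after the
    // whole write, including the trailer, has succeeded. failAfter simulates
    // an IOException before the edit at that index (use -1 for no failure).
    static void safeFlush(Path tmpDir, Path target, List<String> edits, int failAfter)
            throws IOException {
        Path tmp = Files.createTempFile(tmpDir, "flush-", ".tmp");
        try {
            try (BufferedWriter writer = Files.newBufferedWriter(tmp, StandardCharsets.UTF_8)) {
                for (int i = 0; i < edits.size(); i++) {
                    if (i == failAfter) {
                        throw new IOException("simulated write failure at edit " + i);
                    }
                    writer.write(edits.get(i));
                    writer.newLine();
                }
                // The trailer is appended only on the success path.
                writer.write(TRAILER);
                writer.newLine();
            }
            // Reached only if the write and the close both succeeded.
            Files.move(tmp, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (IOException e) {
            // Discard the partial file; the target location is never touched.
            Files.deleteIfExists(tmp);
            throw e;
        }
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("safe-flush");
        Path tmpDir = Files.createDirectory(base.resolve("tmp"));
        Path target = base.resolve("storefile");
        try {
            safeFlush(tmpDir, target, Arrays.asList("edit-1", "edit-2", "edit-3"), 1);
        } catch (IOException e) {
            System.out.println("flush failed, target exists: " + Files.exists(target));
        }
        safeFlush(tmpDir, target, Arrays.asList("edit-1", "edit-2", "edit-3"), -1);
        System.out.println(Files.readAllLines(target, StandardCharsets.UTF_8));
    }
}
```

The design choice is that a failed flush leaves nothing at the target location, so the edits remain recoverable from the log, while a successful flush makes the complete file visible in a single step.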