HBase-2312 Report

1. Symptom

Data loss in a corner case: some newly written data will be lost permenantly!

1.1 Severity

Critical

1.2 Was there exception thrown?

Yes.

1.2.1 Were there multiple exceptions?

HMaster cannot find RegionsServer:

2013-08-14 09:08:16,830 DEBUG org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: Looked up root region location, connection=org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@5f9299f5; serverName=localhost,60020,1376485390350

2013-08-14 09:08:18,012 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [localhost,60020,1376485390350]

2013-08-14 09:08:18,018 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for localhost,60020,1376485390350

1.3 Scope of the failure

Some newly written data is permanently lost!

2. How to reproduce this failure

2.0 Version

0.90.0

2.1 Configuration

Standard. Need 2 RS + HMaster (so that when one RS is down, the other RS can pick up the log splitting job and serve as the new server).

2.2 Reproduction procedure

1. Wait/force a roll hlog on an RS

2. Immediately disconnect this RS from HMaster

3. Notice master starts the log splitting

4. Before 3 finishes, put RS back online and do some write to the regions hosted on this RS

5. After 3 finishes, notice the newly written data is lost!

You can simulate this using the testcase:

mvn test -Dtest=TestHLogSplit#testLogRollAfterSplitStart

2.2.1 Timing order

The timing order of above is critically important!

2.2.2 Events order externally controllable?

Yes.

2.3 Can the logs tell how to reproduce the failure?

Yes.

2.4 How many machines needed?

1 (2RS + HM)

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

You notice the data loss. The key step is that you realise the lost data is the newly added data while the splitting. This can be a bit hard since there is no exception msgs.

3.2 Backward inference

If you realise that the lost data is the newly added data while splitting occurred, you will notice the following events from the log:

A). HMaster cannot find RS and started splitting:

2013-08-14 09:08:18,012 INFO org.apache.hadoop.hbase.zookeeper.RegionServerTracker: RegionServer ephemeral node deleted, processing expiration [localhost,60020,1376485390350]

2013-08-14 09:08:18,018 INFO org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Splitting logs for localhost,60020,1376485390350

B). HLog is rolled by the RS

LOG.info((oldFile != null?

"Roll " + FSUtils.getPath(oldFile) + ", entries=" +

this.numEntries.get() +

", filesize=" +

this.fs.getFileStatus(oldFile).getLen() + ". ": "") +

" for " + FSUtils.getPath(newPath));

--- This is from the RS.

4. Root cause

Not anticipating this corner case (RS temporarily cannot be connected while HLog rolling). This would trigger splitLog to throw an IOException. Unfortunately, it was ignored…

splitLog:

try {
try {
splitLogSize = splitLogManager.splitLogDistributed(logDirs);
} catch (OrphanHLogAfterSplitException e) {
LOG.warn("Retrying distributed splitting for " +
serverNames + "because of:", e);
splitLogManager.splitLogDistributed(logDirs);
}
} catch (IOException e) {
LOG.error("Failed distributed splitting " + serverNames, e);
← An IOException is thrown, but is simply swallowed!
}

4.1 Category:

Incorrect exception handling (empty handler)

5. Fix

5.1 How?

The fix:

1. Before HMaster splits the log, it will first rename the log directory by adding a “-splitting” suffix. So later if the rogue RS comes back online, it cannot find the original HLog directory, thus will not further write to HLog.

2. In RS’s roll log, if the writer cannot find the old HLog directory (because of the rename in step 1), it will throw “file not found” exception, instead of silently create a new directory.

After the fix, the splitLog will propagate the IOException to the caller, instead of ignoring it!

Here is what happens after the fix: