
HBASE-3722

A lot of data is lost when the NameNode crashes

(1)  Log information

(1.1) Roles in this case

Active NameNode (ANN) crashes; Backup NameNode (BNN) takes over

Active HMaster splits the log for the crashed RegionServer

(1.2) Symptoms

Cluster setup:

1. HDFS cluster with an HA NameNode pair: ANN (Active NameNode) and BNN (Backup NameNode)

2. HBase version 0.90.1: active HMaster, backup HMaster, several RegionServers

Operation:

1. The ANN crashed and the BNN became active (failover takes some time).

2. A RegionServer (the one holding the META table) crashed while an HBase client was putting data.

3. The HMaster failed to split the HLog of the crashed RegionServer and skipped it.

4. The BNN became active, but the HMaster had already finished processing the shutdown event.

5. A lot of data from the crashed RegionServer was lost (its HLog was never replayed).

HMaster log (showing the failed log splits):

2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs

java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)

at org.apache.hadoop.ipc.Client.call(Client.java:820)

at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)

at $Proxy5.getListing(Unknown Source)

at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)

at $Proxy5.getListing(Unknown Source)

at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)

at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)

at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)

at org.apache.hadoop.hbase.Chore.run(Chore.java:66)

at org.apache.hadoop.hbase.master.LogCleaner.run(LogCleaner.java:154)

Caused by: java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)

at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)

at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)

at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)

at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)

at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)

at org.apache.hadoop.ipc.Client.call(Client.java:788)

... 13 more

2011-03-22 13:21:56,056 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).

2011-03-22 13:21:57,057 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).

2011-03-22 13:22:05,060 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).

2011-03-22 13:22:05,060 ERROR org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398

java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)

at org.apache.hadoop.ipc.Client.call(Client.java:820)

at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)

at $Proxy5.getFileInfo(Unknown Source)

at sun.reflect.GeneratedMethodAccessor4.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)

at $Proxy5.getFileInfo(Unknown Source)

at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)

at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)

at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)

at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)

at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)

at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)

at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)

at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)

Caused by: java.net.ConnectException: Connection refused

at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)

at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)

at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:408)

at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:332)

at org.apache.hadoop.ipc.Client$Connection.access$2000(Client.java:202)

at org.apache.hadoop.ipc.Client.getConnection(Client.java:943)

at org.apache.hadoop.ipc.Client.call(Client.java:788)

... 18 more

2011-03-22 13:22:45,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 0 time(s).

2011-03-22 13:22:46,600 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 1 time(s).

2011-03-22 13:22:54,603 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: C4C1/157.5.100.1:9000. Already tried 9 time(s).

2011-03-22 13:22:54,603 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs

java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)

at org.apache.hadoop.ipc.Client.call(Client.java:820)

at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)

(2) How to figure out the root cause based on logs

(2.1) In the logs above, we see an ERROR on "Failed splitting":

at $Proxy5.getFileInfo(Unknown Source)

at org.apache.hadoop.hdfs.DFSClient.getFileInfo(DFSClient.java:623)

at org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:461)

at org.apache.hadoop.fs.FileSystem.exists(FileSystem.java:690)

at org.apache.hadoop.hbase.regionserver.wal.HLogSplitter.splitLog(HLogSplitter.java:177)

at org.apache.hadoop.hbase.master.MasterFileSystem.splitLog(MasterFileSystem.java:196)

at org.apache.hadoop.hbase.master.handler.ServerShutdownHandler.process(ServerShutdownHandler.java:95)

at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:151)

From the call stack contained in this ERROR, we know that when the RS carrying the META table crashes, HMaster's ServerShutdownHandler will split the log for this RS. HLogSplitter.splitLog() then calls FileSystem.exists() to check the HLog directory to be split, and the file status is obtained through an RPC call to HDFS's NameNode.
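Reading the stack trace bottom-up, the failing call chain is:

ServerShutdownHandler.process()
  -> MasterFileSystem.splitLog(serverName)
     -> HLogSplitter.splitLog()
        -> FileSystem.exists(srcDir)   // getFileInfo RPC to the NameNode at C4C1:9000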

org.apache.hadoop.hbase.master.MasterFileSystem: Failed splitting hdfs://C4C1:9000/hbase/.logs/C4C9.site,60020,1300767633398

java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

Since the active NN (C4C1/157.5.100.1:9000) had crashed and the backup NN had not yet become active, the HMaster got a ConnectException from this RPC call (it failed to connect to the first NN).
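To make this concrete, the standalone sketch below (not HBase code; it only reuses the host and HLog path that appear in the log above) issues the same kind of NameNode RPC that the master's split attempt makes. With the active NN down and the backup not yet active, the call fails with the same ConnectException once the IPC client's retries are exhausted.

import java.io.IOException;
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NameNodeProbe {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();
    // Point directly at the (crashed) active NameNode from the log.
    FileSystem fs = FileSystem.get(new URI("hdfs://C4C1:9000"), conf);
    try {
      // fs.exists() turns into a getFileInfo RPC to the NameNode,
      // the same call that appears in the "Failed splitting" stack trace.
      fs.exists(new Path("/hbase/.logs/C4C9.site,60020,1300767633398"));
    } catch (IOException e) {
      // The IPC client retries, then surfaces
      // java.net.ConnectException: Connection refused.
      System.out.println("NameNode unreachable: " + e);
    }
  }
}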

(2.2) When the RS carrying the META table crashes, HMaster's ServerShutdownHandler.process() will split the log for this RS:

public void process() throws IOException {
  final String serverName = this.hsi.getServerName();
  LOG.info("Splitting logs for " + serverName);
  // The log split happens here, before META and the user regions are reassigned.
  this.services.getMasterFileSystem().splitLog(serverName);
  ...
  // Assign root and meta if we were carrying them.
  if (isCarryingRoot()) { // -ROOT-
    ...
  }
  // Carrying meta?
  if (isCarryingMeta()) this.services.getAssignmentManager().assignMeta();
  // Wait on meta to come online; we need it to progress.
  ...
  // Skip regions that were in transition unless CLOSING or PENDING_CLOSE
  ...
  LOG.info("Reassigning " + hris.size() + " region(s) that " + serverName +
    " was carrying (skipping " + regionsInTransition.size() +
    " regions(s) that are already in transition)");

  // Iterate regions that were on this server and assign them
  for (Map.Entry<HRegionInfo, Result> e : hris.entrySet()) {
    if (processDeadRegion(e.getKey(), e.getValue(),
        this.services.getAssignmentManager(),
        this.server.getCatalogTracker())) {
      this.services.getAssignmentManager().assign(e.getKey(), true);
    }
  }
  this.deadServers.finish(serverName);
  LOG.info("Finished processing of shutdown of " + serverName);
}

When the ConnectException occurs in splitLog(), it is caught and an ERROR is logged, so the log split is effectively "skipped" because of the ConnectException.

As a result, the HLog entries belonging to every region on this crashed RS are lost. The relevant MasterFileSystem.splitLog() code is:

public void splitLog(final String serverName) {
  this.splitLogLock.lock();
  long splitTime = 0, splitLogSize = 0;
  Path logDir = new Path(this.rootdir, HLog.getHLogDirectoryName(serverName));
  try {
    HLogSplitter splitter = HLogSplitter.createLogSplitter(
      conf, rootdir, logDir, oldLogDir, this.fs);
    try {
      splitter.splitLog();
    } catch (OrphanHLogAfterSplitException e) {
      LOG.warn("Retrying splitting because of:", e);
      // An HLogSplitter instance can only be used once.  Get new instance.
      splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
        oldLogDir, this.fs);
      splitter.splitLog();
    }
    splitTime = splitter.getTime();
    splitLogSize = splitter.getSize();
  } catch (IOException e) {
    // The ConnectException from the NameNode ends up here: the failure is
    // only logged as an ERROR and processing continues without the split.
    LOG.error("Failed splitting " + logDir.toString(), e);
  } finally {
    this.splitLogLock.unlock();
  }
  if (this.metrics != null) {
    this.metrics.addSplit(splitTime, splitLogSize);
  }
}

(2.3) For the WARN in the above log:

2011-03-22 13:21:55,056 WARN org.apache.hadoop.hbase.master.LogCleaner: Error while cleaning the logs

java.net.ConnectException: Call to C4C1/157.5.100.1:9000 failed on connection exception: java.net.ConnectException: Connection refused

at org.apache.hadoop.ipc.Client.wrapException(Client.java:844)

at org.apache.hadoop.ipc.Client.call(Client.java:820)

at org.apache.hadoop.ipc.RPC$Invoker.invoke(RPC.java:221)

at $Proxy5.getListing(Unknown Source)

at sun.reflect.GeneratedMethodAccessor9.invoke(Unknown Source)

at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

at java.lang.reflect.Method.invoke(Method.java:597)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:82)

at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:59)

at $Proxy5.getListing(Unknown Source)

at org.apache.hadoop.hdfs.DFSClient.listPaths(DFSClient.java:614)

at org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:252)

at org.apache.hadoop.hbase.master.LogCleaner.chore(LogCleaner.java:121)

at org.apache.hadoop.hbase.Chore.run(Chore.java:66)

LogCleaner is a chore (daemon) thread which runs at a configured interval and attempts to delete the HLogs in the oldlogs directory. From the call stack, we see that LogCleaner.chore() calls DistributedFileSystem.listStatus(), which is also an RPC call to the NameNode. Since the active NN has crashed, this call also fails with ConnectException: Connection refused.
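As a simplified stand-in for this chore pattern (a sketch, not the actual HBase source), the cleaner behaves roughly like the daemon thread below: every period it lists the oldlogs directory (a NameNode RPC) and deletes HLogs that are no longer needed; when the NameNode is unreachable, the IOException is only logged as a WARN and the chore tries again next period.

import java.io.IOException;
import org.apache.commons.logging.Log;
import org.apache.commons.logging.LogFactory;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

class LogCleanerSketch extends Thread {
  private static final Log LOG = LogFactory.getLog(LogCleanerSketch.class);
  private final FileSystem fs;
  private final Path oldLogDir;
  private final long periodMs;

  LogCleanerSketch(FileSystem fs, Path oldLogDir, long periodMs) {
    this.fs = fs;
    this.oldLogDir = oldLogDir;
    this.periodMs = periodMs;
    setDaemon(true);
  }

  @Override
  public void run() {
    while (!isInterrupted()) {
      try {
        FileStatus[] logs = fs.listStatus(oldLogDir); // RPC to the NameNode
        if (logs != null) {
          for (FileStatus log : logs) {
            // decide whether this old HLog is still needed;
            // if not: fs.delete(log.getPath(), false);
          }
        }
      } catch (IOException e) {
        // With the active NN down, this is the WARN seen in the log above.
        LOG.warn("Error while cleaning the logs", e);
      }
      try {
        Thread.sleep(periodMs);
      } catch (InterruptedException ie) {
        return;
      }
    }
  }
}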

(3) Root Cause

When the HMaster uses its ServerShutdownHandler to process the crashed RS (which was carrying the META table), it fails to split the log for this RS because of the connection failure with the NameNode (the active NN has crashed and the backup NN has not yet become active). After the connection failure, the HMaster skips the log split and goes on to reassign the META table and the other user regions of the dead RS. As a result, the HLog on this RS is lost.

(4) Fixing Method

The HMaster should shut itself down when HDFS is unreachable, so that the backup HMaster can become active and process the crashed RS (splitting its HLog). In HBase, backup masters wait on the active master's ZooKeeper node, so aborting the active HMaster lets a backup take over. Once the BNN has become active, the log split succeeds and the HLog on this RS is not lost.

 

When an IOException occurs in splitter.splitLog(), call checkFileSystem() to check whether the file system is still accessible. If not, abort the current HMaster.

 

•       src/main/java/org/apache/hadoop/hbase/master/MasterFileSystem.java

public void splitLog(final String serverName) {
  this.splitLogLock.lock();
  long splitTime = 0, splitLogSize = 0;
  Path logDir = new Path(this.rootdir, HLog.getHLogDirectoryName(serverName));
  try {
    HLogSplitter splitter = HLogSplitter.createLogSplitter(
      conf, rootdir, logDir, oldLogDir, this.fs);
    try {
      splitter.splitLog();
    } catch (OrphanHLogAfterSplitException e) {
      LOG.warn("Retrying splitting because of:", e);
      // An HLogSplitter instance can only be used once.  Get new instance.
      splitter = HLogSplitter.createLogSplitter(conf, rootdir, logDir,
        oldLogDir, this.fs);
      splitter.splitLog();
    }
    splitTime = splitter.getTime();
    splitLogSize = splitter.getSize();
  } catch (IOException e) {
+   checkFileSystem();  // line added by the fix: abort the master if HDFS is gone
    LOG.error("Failed splitting " + logDir.toString(), e);
  } finally {
    this.splitLogLock.unlock();
  }
  if (this.metrics != null) {
    this.metrics.addSplit(splitTime, splitLogSize);
  }
}


public boolean checkFileSystem() {
  if (this.fsOk) {
    try {
      FSUtils.checkFileSystemAvailable(this.fs);
    } catch (IOException e) {
      master.abort("Shutting down HBase cluster: file system not available", e);
      this.fsOk = false;
    }
  }
  return this.fsOk;
}
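FSUtils.checkFileSystemAvailable() probes HDFS and throws an IOException if it is unreachable. The sketch below is only a rough reading of that behavior (an assumption, not a copy of the HBase source): issue any cheap NameNode RPC and rethrow the failure as an IOException so that checkFileSystem() can abort the master.

import java.io.IOException;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hdfs.DistributedFileSystem;

class FsAvailabilitySketch {
  // Hedged sketch of what the availability check amounts to.
  static void checkAvailable(FileSystem fs) throws IOException {
    if (!(fs instanceof DistributedFileSystem)) {
      return; // only HDFS needs the NameNode probe
    }
    try {
      fs.exists(new Path("/")); // any NameNode RPC will do
    } catch (IOException e) {
      throw new IOException("File system is not available", e);
    }
  }
}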

 

(5) How many nodes are involved in the patch?

HMaster(s)

(6) Code Snippets

HLogSplitter.splitLog(), the entry point invoked by MasterFileSystem.splitLog() in (2.2):

public List<Path> splitLog() throws IOException {
  Preconditions.checkState(!hasSplit,
    "An HLogSplitter instance may only be used once");
  hasSplit = true;

  long startTime = System.currentTimeMillis();
  List<Path> splits = null;
  // When the NameNode is unreachable, this exists() call throws the
  // ConnectException seen in (2.1); it propagates out of this method and is
  // swallowed by the catch (IOException e) block in MasterFileSystem.splitLog().
  if (!fs.exists(srcDir)) {
    // Nothing to do
    return splits;
  }
  FileStatus[] logfiles = fs.listStatus(srcDir);
  if (logfiles == null || logfiles.length == 0) {
    // Nothing to do
    return splits;
  }
  LOG.info("Splitting " + logfiles.length + " hlog(s) in "
    + srcDir.toString());
  splits = splitLog(logfiles);

  splitTime = System.currentTimeMillis() - startTime;
  LOG.info("hlog file splitting completed in " + splitTime +
    " ms for " + srcDir.toString());
  return splits;
}