If one or more region servers die after the cluster has enabled ZooKeeper's multi-update functionality (hbase.zookeeper.useMulti) and is set up as a replication source, all region servers abort within a few seconds.
Blocker
Yes
No
yes
All the Region Servers are aborted
0.94.5
set hbase.replication and hbase.zookeeper.useMulti to true
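The two properties above would typically be set in hbase-site.xml on the source cluster; the following is a sketch of that configuration (not taken from the report itself):

```
<!-- hbase-site.xml on the replication source cluster -->
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
<property>
  <name>hbase.zookeeper.useMulti</name>
  <value>true</value>
</property>
```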
2.2 Reproduction procedure
1 enable hbase.zookeeper.useMulti and the replication feature (config change)
2 restart the replication source cluster (restart)
3 abort one region server (shutdown)
In the above order
yes
no
2 (1RS + 1ZK)
Easy
2013-03-14 00:11:37,993 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Atomically moving dnds1-4,60020,1363219866063's hlogs to my queue
2013-03-14 00:11:37,997 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: Creating dnds1-4%2C60020%2C1363219866063.1363219868868 with data
2013-03-14 00:11:37,997 DEBUG org.apache.hadoop.hbase.replication.ReplicationZookeeper: The multi list size is: 5
2013-03-14 00:11:38,068 WARN org.apache.hadoop.hbase.replication.ReplicationZookeeper: Got exception in copyQueuesFromRSUsingMulti:
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.ZooKeeper.multiInternal(ZooKeeper.java:945)
at org.apache.zookeeper.ZooKeeper.multi(ZooKeeper.java:911)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.multi(RecoverableZooKeeper.java:531)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.multiOrSequential(ZKUtil.java:1436)
at org.apache.hadoop.hbase.replication.ReplicationZookeeper.copyQueuesFromRSUsingMulti(ReplicationZookeeper.java:705)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager$NodeFailoverWorker.run(ReplicationSourceManager.java:590)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)
2013-03-14 00:11:38,070 INFO org.apache.hadoop.hbase.metrics: new MBeanInfo
2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Getting 2 rs from peer cluster # 1
2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-6,60020,1359508078741
2013-03-14 00:11:38,077 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Choosing peer ist6-dnds1-3,60020,1359508079058
2013-03-14 00:11:39,080 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Replicating 4f3d5435-898c-47c2-8821-aeb01f9e87cc -> 74c750a5-4254-4a3b-ab12-063869759edd
2013-03-14 00:11:39,081 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication dnds1-4%2C60020%2C1363219866063.1363219868868 at 0
2013-03-14 00:11:39,090 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: currentNbOperations:0 and seenEntries:3 and size: 0
2013-03-14 00:11:39,090 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Going to report log #dnds1-4%2C60020%2C1363219866063.1363219868868
for position 1004 in hdfs://cluster/hbase/.oldlogs/dnds1-4%2C60020%2C1363219866063.1363219868868
2013-03-14 00:11:39,139 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: ABORTING region server dnds1-13,60020,1363219887385: Writing replication status
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/dnds1-13,60020,1363219887385/1-dnds1-4,60020,1363219866063/dnds1-4%2C60020%2C1363219866063.1363219868868
at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266)
at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:349)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:848)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:900)
at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:894)
at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:155)
at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:379)
2013-03-14 00:11:39,140 FATAL org.apache.hadoop.hbase.regionserver.HRegionServer: RegionServer abort: loaded coprocessors are: []
-------------------
yes
no
no
no
hard
If a region server shuts down unexpectedly, only one NodeFailoverWorker is supposed to claim the dead region server's znode and move the log znodes under it. When the remaining NodeFailoverWorkers try to copy the logs, the atomic multi fails with an exception and they must not move the logs. However, this exception is not handled correctly: copyQueuesFromRSUsingMulti still returns a non-empty queue map to the other NodeFailoverWorkers, so they also attempt to move the znodes and to clean up the logs later. Since the first NodeFailoverWorker has already performed these operations, the subsequent attempts fail and abort the region servers, crashing the cluster.
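The faulty control flow can be illustrated with a minimal, self-contained sketch (hypothetical names and a stand-in exception, not the actual HBase code): when the atomic multi fails because another worker already claimed the node, the buggy version still returns the partially built queue map, while the fixed version clears it first.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.SortedMap;
import java.util.TreeMap;

public class CopyQueuesSketch {

    /** Simulates the atomic ZooKeeper multi; fails when another worker already claimed the node. */
    static void multiOrThrow(boolean nodeAlreadyClaimed) throws Exception {
        if (nodeAlreadyClaimed) {
            throw new Exception("KeeperErrorCode = NoNode"); // stand-in for KeeperException
        }
    }

    /**
     * Mirrors the shape of copyQueuesFromRSUsingMulti: build the queue map,
     * then attempt the atomic move. If clearOnFailure is false (the bug),
     * callers see a non-empty map even though the move failed and another
     * worker already owns the logs.
     */
    static SortedMap<String, List<String>> copyQueues(boolean nodeAlreadyClaimed,
                                                      boolean clearOnFailure) {
        SortedMap<String, List<String>> queues = new TreeMap<>();
        List<String> logs = new ArrayList<>();
        logs.add("dnds1-4%2C60020%2C1363219866063.1363219868868");
        queues.put("1-dnds1-4,60020,1363219866063", logs);
        try {
            multiOrThrow(nodeAlreadyClaimed);
        } catch (Exception e) {
            // The fix: return an empty map so the caller does not start
            // replication sources for logs it does not own.
            if (clearOnFailure) {
                queues.clear();
            }
        }
        return queues;
    }

    public static void main(String[] args) {
        // Buggy behavior: the multi failed, yet the caller still sees one queue.
        System.out.println(copyQueues(true, false).size()); // prints 1
        // Fixed behavior: failure yields an empty map.
        System.out.println(copyQueues(true, true).size());  // prints 0
    }
}
```

A caller that receives an empty map simply has nothing to fail over, which is the correct outcome when another worker won the race.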
Incorrect error handling (statement coverage); see below for the patch.
4.2 Are there multiple faults?
NO
Yes
diff --git src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
index f23c31c..290741e 100644
--- src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
+++ src/main/java/org/apache/hadoop/hbase/replication/ReplicationZookeeper.java
@@ -672,7 +672,8 @@ public class ReplicationZookeeper {
     } catch (KeeperException e) {
       // Multi call failed; it looks like some other regionserver took away the logs.
       LOG.warn("Got exception in copyQueuesFromRSUsingMulti: ", e);
+      queues.clear();
     }
     return queues;
   }