HBASE-4729

1. Symptom

When a client runs an online alter operation while regions are splitting, the master dies and leaves the table half-altered.

1.1 Severity

Critical

1.2 Was there exception thrown?

Yes

1.2.1 Were there multiple exceptions?

No

1.3 Was there a long propagation of the failure?

No

1.4 Scope of the failure

Master

2. How to reproduce this failure

2.0 Version

0.92.0

2.1 Configuration

Basic configuration

2.2 Reproduction procedure

1. Run the load balancer and split the regions (load balancing occurs automatically, at a fixed period of time, when there are no regions in transition)

2. Alter a column family schema by running a shell command such as the following (a programmatic sketch of the whole procedure appears after the command):

alter 'TestTable', NAME => 'Name', METHOD => 'delete'
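The same procedure can be driven programmatically. The sketch below is illustrative only: it assumes a 0.92 client on the classpath and online schema updates enabled in the cluster configuration (hbase.online.schema.update.enable=true), and it forces step 1 with an explicit split rather than waiting for the periodic balancer; the class name is arbitrary.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HBaseAdmin;

public class AlterWhileSplitting {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    HBaseAdmin admin = new HBaseAdmin(conf);

    // Step 1: force the table's regions to split so they pass through the
    // SPLITTING/SPLIT ZooKeeper states (triggered explicitly here instead of
    // waiting for the balancer's fixed period).
    admin.split("TestTable");

    // Step 2: while the splits are still in flight, issue the online alter
    // (delete the 'Name' column family), which makes the master unassign and
    // re-open the table's regions.
    admin.deleteColumn("TestTable", "Name");
  }
}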

2.2.1 Timing order

Step 2 must happen while step 1 is running.

2.2.2 Events order externally controllable?

Partially: step 2 (the alter) can be issued at will, while step 1 (load balancing and splitting) happens automatically within a fixed period of time, so the alter must be timed to fall inside that window.

2.3 Can the logs tell how to reproduce the failure?

????

2.4 How many machines needed?

2 (Client + RegionServer)

2.5 How hard is the reproduction?

Easy

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

2011-11-02 17:06:40,704 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Starting split of region TestTable,0012469153,1320253135043.f7e1783e65ea8d621a4bc96ad310f101.

2011-11-02 17:06:40,704 DEBUG org.apache.hadoop.hbase.regionserver.SplitTransaction: regionserver:62023-0x132f043bbde0710 Creating ephemeral node for f7e1783e65ea8d621a4bc96ad310f101 in SPLITTING state

2011-11-02 17:06:40,751 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Attempting to transition node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLITTING

...

2011-11-02 17:06:44,061 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:62023-0x132f043bbde0710 Successfully transitioned node f7e1783e65ea8d621a4bc96ad310f101 from RS_ZK_REGION_SPLITTING to RS_ZK_REGION_SPLIT

2011-11-02 17:06:44,061 INFO org.apache.hadoop.hbase.regionserver.SplitTransaction: Still waiting on the master to process the split for f7e1783e65ea8d621a4bc96ad310f101

2011-11-02 17:06:44,428 FATAL org.apache.hadoop.hbase.master.HMaster: Unexpected ZK exception creating node CLOSING

org.apache.zookeeper.KeeperException$NodeExistsException: KeeperErrorCode = NodeExists for /hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101

at org.apache.zookeeper.KeeperException.create(KeeperException.java:110)

at org.apache.zookeeper.KeeperException.create(KeeperException.java:42)

at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:637)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.createNonSequential(RecoverableZooKeeper.java:459)

at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.create(RecoverableZooKeeper.java:441)

at org.apache.hadoop.hbase.zookeeper.ZKUtil.createAndWatch(ZKUtil.java:769)

at org.apache.hadoop.hbase.zookeeper.ZKAssign.createNodeClosing(ZKAssign.java:568)

at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1722)

at org.apache.hadoop.hbase.master.AssignmentManager.unassign(AssignmentManager.java:1661)

at org.apache.hadoop.hbase.master.BulkReOpen$1.run(BulkReOpen.java:69)

at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)

at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)

at java.lang.Thread.run(Thread.java:662)

3.2 Backward inference

The FATAL entry comes from the master's AssignmentManager.unassign(), invoked by BulkReOpen when the master re-opens the table's regions to apply the schema change. unassign() tries to create the region's znode under /hbase/unassigned in CLOSING state, but the region server's SplitTransaction has already created that same znode (f7e1783e65ea8d621a4bc96ad310f101) in RS_ZK_REGION_SPLITTING/RS_ZK_REGION_SPLIT state, as the region server log above shows, so ZooKeeper throws NodeExistsException and the master aborts.

3.3 Are the printed logs sufficient for diagnosis?

Yes

3.4 Are logs misleading?

No

3.5 Do we need to examine different component’s log for diagnosis?

???

3.6 Is it a multi-component failure?

Yes (Client+RS)

3.7 How hard is the diagnosis?

Easy

4. Root cause

The master's error handling did not account for the case where a client alters a column family schema while a region is splitting: the unassign path finds the region's znode already created by the split, treats the resulting exception as unexpected, and aborts the master. A minimal sketch of the race follows.
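A minimal, self-contained sketch of the underlying race, assuming only a reachable ZooKeeper at localhost:2181; the znode path is copied from the log above, while the class name and payload strings are illustrative stand-ins for HBase's RegionTransitionData encoding.

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

public class UnassignedZnodeRace {
  public static void main(String[] args) throws Exception {
    // Illustrative connection string; any reachable ZooKeeper works.
    ZooKeeper zk = new ZooKeeper("localhost:2181", 30000, event -> { });
    String path = "/hbase/unassigned/f7e1783e65ea8d621a4bc96ad310f101";

    // Make sure the parent znodes exist (they do on a running HBase cluster).
    for (String parent : new String[] {"/hbase", "/hbase/unassigned"}) {
      if (zk.exists(parent, false) == null) {
        zk.create(parent, new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
      }
    }

    // Region server side: SplitTransaction creates the unassigned znode for
    // the splitting region (RS_ZK_REGION_SPLITTING in the real system).
    zk.create(path, "RS_ZK_REGION_SPLITTING".getBytes(),
        Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);

    // Master side: unassign() then tries to create the same znode in CLOSING
    // state and gets the NodeExistsException seen in the FATAL log above.
    try {
      zk.create(path, "M_ZK_REGION_CLOSING".getBytes(),
          Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
    } catch (KeeperException.NodeExistsException e) {
      // Pre-fix, the master aborted here; post-fix it first checks whether
      // the existing node is SPLIT/SPLITTING and skips the unassign instead.
      System.out.println("NodeExists for " + e.getPath());
    }
    zk.close();
  }
}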

4.1 Category:

Incorrect error handling: the error was handled, but not a specific subtype of it (a NodeExistsException caused by a concurrent split). See the fix below.

4.2 Are there multiple faults?

No

5. Fix
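The patch below (against AssignmentManager.java) narrows the error handling: on NodeExistsException the master now checks, via a new isSplitOrSplitting() helper, whether the existing znode is in RS_ZK_REGION_SPLIT or RS_ZK_REGION_SPLITTING state; if so, the unassign is skipped because the region is being split, and only genuinely unexpected ZooKeeper exceptions still abort the master.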

@@ -1759,27 +1754,49 @@
         synchronized (regionsInTransition) {
           state = regionsInTransition.get(encodedName);
           if (state == null) {
-
              // Create the znode in CLOSING state
             try {
               ZKAssign.createNodeClosing(
                 master.getZooKeeper(), region, master.getServerName());
             } catch (KeeperException e) {
+              if (e instanceof NodeExistsException) {
+                // Handle race between master initiated close and regionserver
+                // orchestrated splitting. See if existing node is in a
+                // SPLITTING or SPLIT state.  If so, the regionserver started
+                // an op on node before we could get our CLOSING in.  Deal.
+                NodeExistsException nee = (NodeExistsException)e;
+                String path = nee.getPath();
+                try {
+                  if (isSplitOrSplitting(path)) {
+                    LOG.debug(path + " is SPLIT or SPLITTING; " +
+                      "skipping unassign because region no longer exists -- its split");
+                    return;
+                  }
+                } catch (KeeperException.NoNodeException ke) {
+                  LOG.warn("Failed getData on SPLITTING/SPLIT at " + path +
+                    "; presuming split and that the region to unassign, " +
+                    encodedName + ", no longer exists -- confirm", ke);
+                  return;
+                } catch (KeeperException ke) {
+                  LOG.error("Unexpected zk state", ke);
+                  ke = e;
+                }
+              }
+              // If we get here, don't understand whats going on -- abort.
               master.abort("Unexpected ZK exception creating node CLOSING", e);
               return;
             }
@@ -1837,6 +1854,28 @@
   }

   /**
+   * @param path
+   * @return True if znode is in SPLIT or SPLITTING state.
+   * @throws KeeperException Can happen if the znode went away in meantime.
+   */
+  private boolean isSplitOrSplitting(final String path) throws KeeperException {
+    boolean result = false;
+    // This may fail if the SPLIT or SPLITTING znode gets cleaned up before we
+    // can get data from it.
+    RegionTransitionData data = ZKAssign.getData(master.getZooKeeper(), path);
+    EventType evt = data.getEventType();
+    switch (evt) {
+    case RS_ZK_REGION_SPLIT:
+    case RS_ZK_REGION_SPLITTING:
+      result = true;
+      break;
+    default:
+      break;
+    }
+    return result;
+  }
+
+  /**