HDFS-3295 Report

https://issues.apache.org/jira/browse/HDFS-3295

1. Symptom

When using hdfs balancer, the “Bytes Already Moved” field of the hdfs balancer progress report always shows “0”.

Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved

14-Jul-2013 3:45:48 PM            0                 0 KB             2.14 GB           18.45 MB

14-Jul-2013 3:47:24 PM            1                 0 KB             2.08 GB           18.45 MB

14-Jul-2013 3:49:00 PM            2                 0 KB             2.02 GB           18.45 MB

14-Jul-2013 3:50:36 PM            3                 0 KB             1.95 GB           18.45 MB

14-Jul-2013 3:52:12 PM            4                 0 KB             1.89 GB           18.45 MB

14-Jul-2013 3:53:48 PM            5                 0 KB             1.83 GB           18.45 MB

14-Jul-2013 3:55:24 PM            6                 0 KB             1.76 GB           18.45 MB

14-Jul-2013 3:57:00 PM            7                 0 KB             1.69 GB           18.45 MB

1.1 Severity

Major

1.2 Was there exception thrown?

No

2. How to reproduce this failure

2.0 Version

0.23.0

2.1 Configuration

Normal config.

  <property>

    <name>dfs.replication</name>

    <value>1</value>

  </property>

At the beginning just set 1 DN to the “slaves” in NN. Later add another DN on the fly.

2.2 Reproduction procedure

1. start NN and DN1

2. $ hdfs dfs -put /home/ding/research/hadoop/data/random.txt /

3. start DN2

4. hdfs balancer -threshold 0.001

2.3 Can the logs tell how to reproduce the failure?

Yes. NN + client log is enough.

Starting DN2:

2013-07-14 15:45:38,549 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* NameSystem.registerDatanode: node registration from 128.100.241.109:50010 stora

2013-07-14 15:45:38,549 INFO org.apache.hadoop.net.NetworkTopology: Adding a new node: /default-rack/128.100.241.109:50010

Balancer log: see above.

2.4 How many machines needed?

2 nodes; 3 components (NN + 2 DN)

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

Time Stamp               Iteration#  Bytes Already Moved  Bytes Left To Move  Bytes Being Moved

14-Jul-2013 3:45:48 PM            0                 0 KB             2.14 GB           18.45 MB

14-Jul-2013 3:47:24 PM            1                 0 KB             2.08 GB           18.45 MB

14-Jul-2013 3:49:00 PM            2                 0 KB             2.02 GB           18.45 MB

14-Jul-2013 3:50:36 PM            3                 0 KB             1.95 GB           18.45 MB

14-Jul-2013 3:52:12 PM            4                 0 KB             1.89 GB           18.45 MB

14-Jul-2013 3:53:48 PM            5                 0 KB             1.83 GB           18.45 MB

14-Jul-2013 3:55:24 PM            6                 0 KB             1.76 GB           18.45 MB

14-Jul-2013 3:57:00 PM            7                 0 KB             1.69 GB           18.45 MB

3.2 Backward inference

Locate the code that prints “0 KB”:

Balancer.java,

      formatter.format("%-24s %10d  %19s  %18s  %17s\n",

          DateFormat.getDateTimeInstance().format(new Date()),

          iteration,

          StringUtils.byteDesc(bytesMoved.get()),

          StringUtils.byteDesc(bytesLeftToMove),

          StringUtils.byteDesc(bytesToMove)

          );

So the “Already moved” is from “bytesMoved”.

public class Balancer {

  .. ..

  private static class BytesMoved {

    private long bytesMoved = 0L;;

    private synchronized void inc( long bytes ) {

      bytesMoved += bytes;

    }

    private long get() {

      return bytesMoved;

    }

  };

  private BytesMoved bytesMoved = new BytesMoved();

   --- No “static” here!!!

}

So here is the bug: since “bytesMoved” is not declared as “static”, so every iteration, since a new Balancer instance will be created, a new bytesMoved will be created.

The original intention was to make this bytesMoved “static”.

However, there is another bug: even if this “bytesMoved” is static, it will accumulate for all the balancer operations. We need to further differentiate different “bytesMoved” across different balancer operations. 

4. Root cause

Not using “static” on bytesMoved.

4.1 Category:

Semantic.

5. Fix

+  private final static Map<NameNodeConnector, BalancerMetrics>
+   nncVsBalancerMetricsMap = new HashMap<NameNodeConnector, BalancerMetrics>();
  private final NameNodeConnector nnc;
  private final BalancingPolicy policy;
  private final double threshold;
@@ -316,7 +321,7 @@
        in = new DataInputStream( new BufferedInputStream(
            sock.getInputStream(), HdfsConstants.IO_FILE_BUFFER_SIZE));
        receiveResponse(in);
-        bytesMoved.inc(block.getNumBytes());
+        nncVsBalancerMetricsMap.get(nnc).bytesMoved.inc(block.getNumBytes());
        LOG.info( "Moving block " + block.getBlock().getBlockId() +
              " from "+ source.getDisplayName() + " to " +
              target.getDisplayName() + " through " +
@@ -1074,9 +1079,35 @@
      return bytesMoved;
    }
  };
-  private BytesMoved bytesMoved = new BytesMoved();
-  private int notChangedIterations = 0;

… ..

Replace bytesMoved with a static nncVsBalanceMetricsMap --- it is static, and also differentiate different connectors.