HDFS-3194 Report

https://issues.apache.org/jira/browse/HDFS-3194

1. Symptom

DataNode block scanner is running too frequently.

After files are copied to HDFS, the following log messages keep getting written to the datanode log.

2013-07-02 16:01:14,883 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 100.00%

2013-07-02 16:01:14,895 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002

2013-07-02 16:01:19,912 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:24,913 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:25,124 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002

2013-07-02 16:01:30,125 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:35,127 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:35,731 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002

The “Starting a new period” message appears every 5 seconds, and the “Verification succeeded for BP-” message appears every 10 seconds.

In particular, this message:

2013-07-02 16:01:25,124 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002

indicates a full verification pass over the block, which here happens every 10 seconds, far too frequently.

The correct behavior is to do a full scan of each block only once every 21 days (the default scan period).
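
For reference, 21 days is exactly the “interval 504 hours” that the scanner reports when it initializes (see the log in section 2.3). A quick arithmetic check (the variable names below are only for illustration, not actual HDFS identifiers):

// 21 days expressed in hours: 21 * 24 = 504, matching the
// "Periodic Block Verification Scanner initialized with interval 504 hours" log line.
long scanPeriodHours = 21L * 24;                       // 504
long scanPeriodMs = scanPeriodHours * 60 * 60 * 1000;  // 1,814,400,000 ms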

1.1 Severity

Major

1.2 Was there exception thrown?

No

2. How to reproduce this failure

2.0 Version

I used version 2.0.0-alpha, but version 0.23 should have a similar problem.

2.1 Configuration

One machine (one namenode and one datanode) is enough to reproduce this failure. I used one VM as master.

No extra settings are needed in the configuration files.

2.2 Reproduction procedure

1. Remove the contents of the temporary folder. In my case, I removed all the files in /app/hadoop/tmp.

2. hadoop namenode -format

Format the namenode.

3. start-dfs.sh

Start a namenode and a datanode.

4. hadoop dfs -copyFromLocal random.txt /

Copy a small random file into DFS.

5. Check the datanode log file; the symptom shown above appears.

2.3 Are input events logged?

Yes!

2013-07-02 16:00:04,974 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Namenode Block pool BP-640052684-127.0.1.1-1372795126053 (storage id DS-1998050303-127.0.1.1-50010-1372795204528) service to master/192.168.59.135:54310 trying to claim ACTIVE state with txid=1

2013-07-02 16:00:04,974 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Acknowledging ACTIVE Namenode Block pool BP-640052684-127.0.1.1-1372795126053 (storage id DS-1998050303-127.0.1.1-50010-1372795204528) service to master/192.168.59.135:54310

2013-07-02 16:00:04,997 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: BlockReport of 0 blocks took 1 msec to generate and 22 msecs for RPC and NN processing

2013-07-02 16:00:04,997 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: sent block report, processed command:org.apache.hadoop.hdfs.server.protocol.FinalizeCommand@451710be

2013-07-02 16:00:05,002 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Periodic Block Verification Scanner initialized with interval 504 hours for block pool BP-640052684-127.0.1.1-1372795126053.

2013-07-02 16:00:05,004 INFO org.apache.hadoop.hdfs.server.datanode.DataBlockScanner: Added bpid=BP-640052684-127.0.1.1-1372795126053 to blockPoolScannerMap, new size=1

2013-07-02 16:00:09,861 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

.

.

.

2013-07-02 16:00:59,879 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:04,881 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:09,882 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:14,274 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002 src: /192.168.59.135:55331 dest: /192.168.59.135:50010

2013-07-02 16:01:14,438 INFO org.apache.hadoop.hdfs.server.datanode.DataNode.clienttrace: src: /192.168.59.135:55331, dest: /192.168.59.135:50010, bytes: 940457, op: HDFS_WRITE, cliID: DFSClient_NONMAPREDUCE_1408428174_1, offset: 0, srvID: DS-1998050303-127.0.1.1-50010-1372795204528, blockid: BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002, duration: 68352824

2013-07-02 16:01:14,442 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002, type=LAST_IN_PIPELINE, downstreams=0:[] terminating

2013-07-02 16:01:14,883 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 100.00%

2013-07-02 16:01:14,895 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002

2013-07-02 16:01:19,912 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:24,913 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:01:25,124 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002

From the above log messages, we can see that at first there are no blocks in the block pool. But after the datanode receives a block,

2013-07-02 16:01:14,274 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Receiving block BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002 src: /192.168.59.135:55331 dest: /192.168.59.135:50010

the “Verification succeeded for BP-” log messages begin appearing in the log.

2.4 How many machines needed?

One (1 NN + 1 DN)

3. Diagnosis procedure

3.1 Detailed Symptom

Problem 1: The “Starting a new period” line is logged on the datanode every 5 seconds with the default interval. This keeps growing the datanode log.

Problem 2: The “Verification succeeded for BP-” message appears right after a block is scanned successfully. The block scanning interval defaults to 21 days (3 weeks), and each block should be scanned only once per interval. So the “Verification succeeded for BP-” message for a given block is supposed to appear only once every 21 days.

3.2 Backward inference

A. Locate the source locations where the two messages, “Verification succeeded for...” and “Starting a new period...”, are printed. The relevant call graph of the functions printing these two messages is sketched below.
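
A rough call chain, reconstructed from the code excerpts in part C below and the root cause in section 4 (indentation shows caller to callee):

DataBlockScanner#run                       (invokes the slice scanner roughly every 5 seconds)
  -> BlockPoolSliceScanner#scanBlockPoolSlice
       -> startNewPeriod()                 // logs “Starting a new period ...”
       -> scan()
            -> verifyFirstBlock()
                 -> verifyBlock()          // logs “Verification succeeded for ...”
            -> cleanUp()                   // in the finally block; rolls the verification log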

B. From the log we see that startNewPeriod is called every 5 seconds and verifyBlock is called every 10 seconds, which means the block is scanned every 10 seconds. Since the verification log is supposed to track each block’s previous scan time and ensure that a block is scanned again only after 3 weeks, something must be going wrong with how the verification log tracks the scan time.

2013-07-02 16:03:12,194 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002

2013-07-02 16:03:17,195 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:03:22,196 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Starting a new period : work left in prev period : 0.00%

2013-07-02 16:03:22,799 INFO org.apache.hadoop.hdfs.server.datanode.BlockPoolSliceScanner: Verification succeeded for BP-640052684-127.0.1.1-1372795126053:blk_4525799356559840979_1002

C. Now let’s read the code.

  /* scanBlockPoolSlice is called every 5 seconds to incrementally scan blocks. */
  void scanBlockPoolSlice() {
    startNewPeriod();  // prints the "Starting a new period ..." message every 5 seconds
    .. ..
    // Start scanning
    try {
      scan();
    } finally {
      totalBlocksScannedInLastRun.set(processedBlocks.size());
      lastScanTime.set(System.currentTimeMillis());
    }
  }

 

  private void scan() {
    .. ..
    try {
      .. ..
      while (datanode.shouldRun && !Thread.interrupted()
             && datanode.isBPServiceAlive(blockPoolId)) {
        long now = System.currentTimeMillis();
        .. ..
        // This condition is meant to ensure verifyFirstBlock is called only every 21 days.
        if (((now - getEarliestScanTime()) >= scanPeriod)
            || ((!blockInfoSet.isEmpty()) && !(this.isFirstBlockProcessed()))) {
          // verifyFirstBlock calls verifyBlock, which prints the
          // "Verification succeeded ..." log message.
          verifyFirstBlock();
        }
        .. ..
      }
    } catch (RuntimeException e) {
      .. ..
    } finally {
      // Note that under this logic, cleanUp is called every 5 seconds!
      cleanUp();
    }
  }

 

  // Picks one block and verifies it.
  /* This performs a full verification of that block by calling verifyBlock. */
  private void verifyFirstBlock() {
    Block block = null;
    synchronized (this) {
      if (!blockInfoSet.isEmpty()) {
        block = blockInfoSet.first().block;
      }
    }
    if (block != null) {
      verifyBlock(new ExtendedBlock(blockPoolId, block));
      processedBlocks.put(block.getBlockId(), 1);
    }
  }

 

  private void verifyBlock(ExtendedBlock block) {
    .. ..
    LOG.info((second ? "Second " : "") +
             "Verification succeeded for " + block);
    .. ..
  }

 

  /* cleanUp rolls the verification log, which stores the previous scan timestamps.
     Rolling the verification log repeatedly effectively resets those timestamps. */
  private synchronized void cleanUp() {
    if (verificationLog != null) {
      try {
        verificationLog.logs.roll();
      } catch (IOException ex) {
        LOG.warn("Received exception: ", ex);
        verificationLog.close();
      }
    }
  }

scanBlockPoolSlice is called every 5 seconds by default, and it calls startNewPeriod at the beginning. Therefore, “Starting a new period” appears in the log every 5 seconds.
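
The 5-second cadence comes from the driving loop in DataBlockScanner#run, which repeatedly calls scanBlockPoolSlice for each block pool and then sleeps for about 5 seconds. A simplified sketch of that loop (abridged and paraphrased; the helper name pickNextBlockPoolScanner is hypothetical, see the actual DataBlockScanner source for the exact code):

// Simplified sketch of the DataBlockScanner#run loop (not the exact source).
public void run() {
  while (datanode.shouldRun && !Thread.interrupted()) {
    BlockPoolSliceScanner bpScanner = pickNextBlockPoolScanner(); // hypothetical helper
    if (bpScanner != null) {
      bpScanner.scanBlockPoolSlice(); // logs "Starting a new period ..." on every call
    }
    try {
      Thread.sleep(5 * 1000L); // about 5 seconds between iterations, matching the log cadence
    } catch (InterruptedException e) {
      return;
    }
  }
}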

The cleanUp function rolls the verification log (which records the previous scan times). It is called every time scan is called, which means the verification log is rolled every 5 seconds. So it is very likely that the failure is triggered because cleanUp is called too frequently.
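
Connecting this to the condition shown in scan() above: once the recorded verification times are rolled away, the scanner no longer sees a recent scan time for the block, so the rescan condition is satisfied again. Restating that condition with explanatory comments (the interpretation of each clause is mine, based on the code above and the root cause in section 4):

// A block is verified again when either:
//  (a) the earliest recorded scan time is at least scanPeriod (21 days) old, or
//  (b) blockInfoSet is non-empty but its first block has not been processed
//      in the current period.
// With the verification log rolled every 5 seconds, the earlier scan times are
// soon lost, so this condition keeps becoming true and verifyFirstBlock (and
// hence verifyBlock) runs far more often than once per 21 days.
if (((now - getEarliestScanTime()) >= scanPeriod)
    || ((!blockInfoSet.isEmpty()) && !(this.isFirstBlockProcessed()))) {
  verifyFirstBlock();
}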

 

D. The actual root cause is described in section 4.

4. Root cause

BlockPoolSliceScanner#scan calls cleanUp every time it is invoked from DataBlockScanner#run via scanBlockPoolSlice. But cleanUp unconditionally roll()s the verification logs, so after two iterations we have lost the first iteration's block verification times (see Figure 1). As a result, a cluster with just one block repeatedly rescans it every 10 seconds.

Figure 1
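
To make the mechanism concrete, here is a minimal, self-contained sketch of the roll-over behavior (this models a two-slot rolling log; it is not the actual RollingLogs API): the verification record written in the first period survives the first roll but is discarded by the second, so the scanner treats the block as never verified and scans it again.

import java.util.ArrayList;
import java.util.List;

// Minimal model of a two-slot rolling log: roll() promotes "curr" to "prev"
// and discards the old "prev". This mirrors the behavior described above,
// not the real HDFS RollingLogs implementation.
class TwoSlotRollingLog {
  private List<String> curr = new ArrayList<String>();
  private List<String> prev = new ArrayList<String>();

  void append(String record) { curr.add(record); }

  void roll() {                 // the old "prev" contents are lost here
    prev = curr;
    curr = new ArrayList<String>();
  }

  boolean contains(String record) {
    return curr.contains(record) || prev.contains(record);
  }

  public static void main(String[] args) {
    TwoSlotRollingLog verificationLog = new TwoSlotRollingLog();
    String record = "blk_4525799356559840979 verified";
    verificationLog.append(record);

    verificationLog.roll();     // 1st scanBlockPoolSlice(): record moves to "prev"
    System.out.println(verificationLog.contains(record));  // true

    verificationLog.roll();     // 2nd scanBlockPoolSlice(): record is discarded
    System.out.println(verificationLog.contains(record));  // false
    // With the record gone, the scanner no longer knows the block was verified
    // recently, so it rescans it; this is why the rescan happens every
    // ~10 seconds (two 5-second iterations) rather than every 21 days.
  }
}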

4.1 Category

Semantic.

5. Fix

5.1 How?

Make sure scan() is not invoked again when there is no work remaining in the current period (all bytes already scanned and the period has not yet elapsed), so the verification log is not rolled every 5 seconds.

Also, rename “cleanUp” to “rollVerificationLogs” to make the name clearer.

Patch:

+  private synchronized boolean workRemainingInCurrentPeriod() {
+    if (bytesLeft <= 0 && Time.now() < currentPeriodStart + scanPeriod) {
+      if (LOG.isDebugEnabled()) {
+        LOG.debug("Skipping scan since bytesLeft=" + bytesLeft + ", Start=" +
+                  currentPeriodStart + ", period=" + scanPeriod + ", now=" +
+                  Time.now() + " " + blockPoolId);
+      }
+      return false;
+    } else {
+      return true;
+    }
+  }
+
   void scanBlockPoolSlice() {
+    if (!workRemainingInCurrentPeriod()) {
+      return;
+    }
+
     ..
     try {
       scan();
     }

@@ -608,14 +632,14 @@ class BlockPoolSliceScanner {

In scan():

       LOG.warn("RuntimeException during BlockPoolScanner.scan()", e);

           throw e;

         } finally {

-          cleanUp();

+      rollVerificationLogs();

           if (LOG.isDebugEnabled()) {

         LOG.debug("Done scanning block pool: " + blockPoolId);

           }

         }

   }

 

-  private synchronized void cleanUp() {
+  private synchronized void rollVerificationLogs() {
     if (verificationLog != null) {
       try {
         verificationLog.logs.roll();