HDFS-3436 Report

https://issues.apache.org/jira/browse/HDFS-3436

1. Symptom:

When appending to a file, the append failed (exception thrown and content not appended).

13/06/20 14:39:15 WARN hdfs.DFSClient: DataStreamer Exception

java.io.EOFException: Premature EOF: no length prefix available

        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)

1.1 Severity

Major

1.2 Was there an exception thrown?

Yes. There are multiple exceptions from DNs.

Exceptions on DN1:

2013-06-20 14:39:15,664 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013, type=HAS_DOWNSTREAM_IN_PIPELINE

java.io.EOFException: Premature EOF: no length prefix available

        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:993)

2013-06-20 14:39:15,673 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013

java.io.IOException: Connection reset by peer

        at sun.nio.ch.FileDispatcher.read0(Native Method)

        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)

        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)

        at sun.nio.ch.IOUtil.read(IOUtil.java:171)

2013-06-20 14:39:15,675 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013 received exception java.io.IOException: Connection reset by peer

2013-06-20 14:39:15,675 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: research-virtual-machine:50010:DataXceiver error processing WRITE_BLOCK operation  src: /172.16.124.129:56689 dest: /172.16.124.129:50010

java.io.IOException: Connection reset by peer

        at sun.nio.ch.FileDispatcher.read0(Native Method)

        at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)

        at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:198)

        at sun.nio.ch.IOUtil.read(IOUtil.java:171)

        at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:245)

        at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:54)

        at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:142)

Exceptions on DN2:

2013-06-20 14:39:15,640 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode{data=FSDataset{dirpath='[/home/ding/research/hadoop/hadoop-2.0.0-alpha-src/hadoop-dist/target/hadoop-2.0.0-alpha/dfs/data/current]'}, localName='research-virtual-machine:50010', storageID='DS-1891234353-127.0.1.1-50010-1371753278058', xmitsInProgress=0}:Exception transfering block BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013 to mirror 172.16.124.130:50010: java.net.ConnectException: Connection refused

2013-06-20 14:39:15,640 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013 received exception java.net.ConnectException: Connection refused

2013-06-20 14:39:15,642 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: research-virtual-machine:50010:DataXceiver error processing WRITE_BLOCK operation  src: /172.16.124.129:60109 dest: /172.16.124.128:50010

java.net.ConnectException: Connection refused

        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

        at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:599)

        .. ..

2013-06-20 14:39:15,663 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: research-virtual-machine:50010:DataXceiver error processing TRANSFER_BLOCK operation  src: /172.16.124.129:60110 dest: /172.16.124.128:50010

java.io.IOException: BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1012 is neither a RBW nor a Finalized, r=ReplicaBeingWritten, blk_8462409925548039750_1013, RBW

Exceptions on DN3:

2013-06-20 14:38:47,532 DEBUG org.apache.hadoop.hdfs.server.datanode.DataNode: Sending heartbeat from service actor: Block pool BP-1107653285-127.0.1.1-1371753149490 (

2013-06-20 14:38:48,609 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down DataNode at research-virtual-machine/127.0.1.1

************************************************************/

Clearly, there is a chain of exceptions, and we have:

DN3 shutdown → Connection refused exception on DN2 (when trying to connect to DN3) → “Premature EOF” exception on DN1 → “Premature EOF” exception on client

What is the correct behavior? In a correct run (shown below), the client still logs pipeline-recovery exceptions, but the append succeeds:

ding@research-virtual-machine:~/research/hadoop/hadoop-2.0.0-alpha-src/append_prog$ java -cp `hadoop classpath`:build/classes/ mypackage.HDFS_4022

Hello World

13/07/09 14:57:10 INFO hdfs.DFSClient: Exception in createBlockOutputStream

java.io.IOException: Bad connect ack with firstBadLink as 172.16.124.131:50010

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1053)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:943)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)

13/07/09 14:57:10 WARN hdfs.DFSClient: Error Recovery for block BP-1854916357-127.0.1.1-1373395657208:blk_-5676536617205446517_1011 in pipeline 172.16.124.129:50010, 172.16.124.130:50010, 172.16.124.131:50010: bad datanode 172.16.124.131:50010

But when you look at the tail of the file, you can see the content is appended!

1.2.1 Multiple exceptions?

Yes.

java.net.ConnectException

java.io.IOException

java.io.EOFException

1.4 Scope of the failure

Only one client’s request is affected.

2. How to reproduce:

2.1 Version:

2.0.0-alpha

2.2 Configuration & reproduction procedure:

A) set up: 4 nodes (straightforward)

 --- Question: will we really need all 4 nodes?

Use baker221 for this failure:

ding-hdp-2.0.0-alpha-amd64-1: 172.16.124.129 (NN + DN)

ding-hdp-2.0.0-alpha-amd64-2: 172.16.124.128

ding-hdp-2.0.0-alpha-amd64-3: 172.16.124.130

ding-hdp-2.0.0-alpha-amd64-4: 172.16.124.131

B) set up the master:

core-site.xml: set every node’s core-site.xml to point to the NN (129)

slaves: add all 4 nodes to the “slaves” file

hdfs-site.xml: set the replication factor to 3 (not 4; the reproduction needs a 3-node write pipeline)
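For reference, the corresponding hdfs-site.xml entry is the standard dfs.replication property (a minimal sketch):

  <property>

    <name>dfs.replication</name>

    <value>3</value>

  </property>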

C) set up all the slaves

modify: core-site.xml, masters, hdfs-site.xml (is it really necessary?)

core-site.xml:

  <property>

    <name>fs.default.name</name>

    <!-- <value>hdfs://localhost:8020</value> -->

    <!-- <value>hdfs://dlrz.eecg.utoronto.ca:54310</value> -->

    <!-- <value>hdfs://128.100.241.109:54310</value> -->

    <value>hdfs://172.16.124.129:54310</value>

    <final>true</final>

  </property>

D) format the NN:

hadoop namenode -format

Also remember to delete the data directory in each DN.

E) start hdfs: start-dfs.sh on NN

F) upload a file:

hdfs dfs -put ./random.txt /

Verify that the file is being replicated:

hadoop fsck / -files -locations -blocks

Note: you need to check the NN’s log (located in logs/hadoop-ding-namenode-MACHINENAME.log) to see the datanode pipeline sequence for the last block of the file. For example, in my case, I found this message:

2013-06-20 14:37:27,041 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: updatePipeline(block=BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1011, newGenerationStamp=1012, newLength=23183360, newNodes=[172.16.124.129:50010, 172.16.124.128:50010, 172.16.124.130:50010], clientName=DFSClient_NONMAPREDUCE_-1158938200_1)
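A quick way to locate this message (a sketch; the exact log file name depends on your user and host names):

 grep updatePipeline logs/hadoop-*-namenode-*.log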

G) Now stop DN3 in the write pipeline:

In this case, this DN is 172.16.124.130 (see above)

 hadoop-daemon.sh stop datanode

 --- Question: will the error still occur if we stop this DN3 for a while and then append?

    Answer: No, but you need to wait for quite a while (with default heartbeat/recheck settings, on the order of ten minutes before the NN declares the DN dead). At some point the NN will notice that DN3 is down and re-replicate all the blocks on that DN to a new DN:

2013-06-21 12:46:38,040 INFO org.apache.hadoop.hdfs.StateChange: BLOCK* ask 172.16.124.131:50010 to replicate blk_-5787218417578301815_1038 to datanode(s) 172.16.124.130:50010

H) append to the file:

 java -cp `hadoop classpath`:build/classes/ mypackage.HDFS_4022

Hello World

  --- this Java program (see below) does nothing more than append to a file.

 

ding@research-virtual-machine:~/research/hadoop/hadoop-2.0.0-alpha-src/hadoop-dist/target/hadoop-2.0.0-alpha/append_prog$ java -cp `hadoop classpath`:build/classes/ mypackage.HDFS_4022

Hello World

13/06/20 14:39:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream

java.io.IOException: Bad connect ack with firstBadLink as 172.16.124.130:50010

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1053)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:943)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)

13/06/20 14:39:15 WARN hdfs.DFSClient: Error Recovery for block BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1012 in pipeline 172.16.124.129:50010, 172.16.124.128:50010, 172.16.124.130:50010: bad datanode 172.16.124.130:50010

13/06/20 14:39:15 DEBUG datatransfer.DataTransferProtocol: lastAckedSeqno = -1

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Connecting to datanode 172.16.124.128:50010

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Send buf size 131071

13/06/20 14:39:15 WARN hdfs.DFSClient: DataStreamer Exception

java.io.EOFException: Premature EOF: no length prefix available

        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)

        .. ..

Exception in thread "main" java.io.EOFException: Premature EOF: no length prefix available

        .. ..

13/06/20 14:39:15 ERROR hdfs.DFSClient: Failed to close file /random.txt

java.io.EOFException: Premature EOF: no length prefix available

        .. ..

The complete logs are located at:

baker221: ~ding/research/failure_cases/hdfs-3436

Java program (same as the one used in HDFS-4022):

package mypackage;

import java.net.URI;

import org.apache.hadoop.conf.Configuration;

import org.apache.hadoop.fs.FSDataOutputStream;

import org.apache.hadoop.fs.Path;

import org.apache.hadoop.hdfs.DistributedFileSystem;

import org.apache.hadoop.io.IOUtils;

public class HDFS_4022 {

  public static void main (String[] args) throws Exception {

     System.out.println("Hello World");

     Configuration conf = new Configuration();

     DistributedFileSystem dfs = (DistributedFileSystem) DistributedFileSystem.get(

         URI.create("hdfs://192.168.103.137:54310/"), conf);

     Path f = new Path("/random.txt");

     FSDataOutputStream fsDataOutputStream = dfs.append(f);

     fsDataOutputStream.write("First Battle of Bull Run (First Manassas)".getBytes());

     fsDataOutputStream.close();

  }

}
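For completeness, this is roughly how the program can be compiled (a sketch; it assumes HDFS_4022.java sits under mypackage/ in the append_prog directory, so adjust paths to your layout):

 mkdir -p build/classes && javac -cp `hadoop classpath` -d build/classes mypackage/HDFS_4022.java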

2.2.1 Summary of the key events

 A) Shutdown DN3 in the write pipeline

 B) Append to the file

2.2.2 Timing order

 B) should occur relatively quickly after A).

2.2.3 Externally controllable?

 Yes. If A) and B) occur in this particular sequence, the failure will occur!

2.3 Can the logs tell how to reproduce the failure?

Probably.

- Last DN of the pipeline of the LAST block is down

- Then append!

2.4 How many machines are needed?

4 (or 3): we need a 3-node write pipeline.

3. Diagnosis:

OK, now, what is the problem?

3.1 Detailed symptom

The exception:

I do not understand the meaning of this exception at first. What is the expected behavior? What does “Premature EOF: no length prefix available” mean? (As the vintPrefixed() code below shows, it is thrown when the response stream hits EOF before the varint length prefix of a protobuf response can be read.)

But at least the stack trace tells clearly where in the code it is printed.

It also identifies the troublesome block: blk_8462409925548039750_1012.

Since there are too many log messages from the 5 components, we first look at the relevant ones:

Client log:

13/06/20 14:39:14 WARN util.KerberosName: Kerberos krb5 configuration not found, setting default realm to empty

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Short circuit read is false

13/06/20 14:39:15 DEBUG hdfs.DFSClient: computePacketChunkSize: src=/random.txt, chunkSize=475, chunksPerPacket=1, packetSize=506

13/06/20 14:39:15 DEBUG hdfs.LeaseRenewer: Lease renewer daemon for [DFSClient_NONMAPREDUCE_-1152429769_1] with renew id 1 started

13/06/20 14:39:15 DEBUG hdfs.DFSClient: DFSClient writeChunk allocating new packet seqno=0, src=/random.txt, packetSize=506, chunksPerPacket=1, bytesCurBlock=2318

13/06/20 14:39:15 DEBUG hdfs.DFSClient: DFSClient writeChunk packet full seqno=0, src=/random.txt, bytesCurBlock=23183442, blockSize=67108864, appendChunk=true

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Queued packet 0

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Queued packet 1

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Waiting for ack for: 1

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Append to block BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1012

13/06/20 14:39:15 DEBUG hdfs.DFSClient: pipeline = 172.16.124.129:50010

13/06/20 14:39:15 DEBUG hdfs.DFSClient: pipeline = 172.16.124.128:50010

13/06/20 14:39:15 DEBUG hdfs.DFSClient: pipeline = 172.16.124.130:50010

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Connecting to datanode 172.16.124.129:50010

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Send buf size 131071

13/06/20 14:39:15 INFO hdfs.DFSClient: Exception in createBlockOutputStream

java.io.IOException: Bad connect ack with firstBadLink as 172.16.124.130:50010

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.createBlockOutputStream(DFSOutputStream.java:1053)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:943)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)

13/06/20 14:39:15 WARN hdfs.DFSClient: Error Recovery for block BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1012 in pipeline 172.16.124.129:5001

13/06/20 14:39:15 DEBUG datatransfer.DataTransferProtocol: lastAckedSeqno = -1

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Connecting to datanode 172.16.124.128:50010

13/06/20 14:39:15 DEBUG hdfs.DFSClient: Send buf size 131071

13/06/20 14:39:15 WARN hdfs.DFSClient: DataStreamer Exception

java.io.EOFException: Premature EOF: no length prefix available

        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:865)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:843)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)

Exception in thread "main" java.io.EOFException: Premature EOF: no length prefix available

        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:865)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:843)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)

13/06/20 14:39:15 ERROR hdfs.DFSClient: Failed to close file /random.txt

java.io.EOFException: Premature EOF: no length prefix available

        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.transfer(DFSOutputStream.java:865)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.addDatanode2ExistingPipeline(DFSOutputStream.java:843)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.setupPipelineForAppendOrRecovery(DFSOutputStream.java:934)

        at org.apache.hadoop.hdfs.DFSOutputStream$DataStreamer.run(DFSOutputStream.java:461)

3.2 Backward Inference

NN logs:

2013-06-20 14:37:26,778 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.appendFile: file /random.txt for DFSClient_NONMAPREDUCE_-1158938200_1 at 172.16.124.129 block BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1011{blockUCState=UNDER_CONSTRUCTION, primaryNodeIndex=-1, replicas=[ReplicaUnderConstruction[172.16.124.129:50010|RBW], ReplicaUnderConstruction[172.16.124.128:50010|RBW], ReplicaUnderConstruction[172.16.124.130:50010|RBW]]} block size 23183360

2013-06-20 14:37:27,219 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_8462409925548039750_1012 on 172.16.124.130:50010 size 23183401 replicaState = FINALIZED

2013-06-20 14:37:27,223 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_8462409925548039750_1012 on 172.16.124.128:50010 size 23183401 replicaState = FINALIZED

2013-06-20 14:37:27,226 DEBUG org.apache.hadoop.hdfs.server.blockmanagement.BlockManager: Reported block blk_8462409925548039750_1012 on 172.16.124.129:50010 size 23183401 replicaState = FINALIZED

2013-06-20 14:39:15,663 DEBUG org.apache.hadoop.hdfs.StateChange: BLOCK* block RECEIVING_BLOCK: blk_8462409925548039750_1013 is received from DatanodeRegistration(172.16.124.128, storageID=DS-1891234353-127.0.1.1-50010-1371753278058, infoPort=50075, ipcPort=50020, storageInfo=lv=-40;cid=CID-2d21384f-d2ef-4922-893c-fe208aa332fb;nsid=1270822596;c=0)

Then there was an abnormal append: the genstamp became 1013 on DN1 and DN2. But there is nothing suspicious in the NN’s log.

DN1's log:

2013-06-20 14:39:15,646 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Appending to replica FinalizedReplica, blk_8462409925548039750_1012, FINALIZED

2013-06-20 14:39:15,663 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Datanode 2 got response for connect ack  from downstream datanode with firstbadlink as 172.16.124.130:50010

2013-06-20 14:39:15,663 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Datanode 2 forwarding connect ack to upstream firstbadlink is 172.16.124.130:50010

   --- These two messages seem OK: DN 172.16.124.130 cannot be reached..

2013-06-20 14:39:15,664 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: PacketResponder: BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013, type=HAS_DOWNSTREAM_IN_PIPELINE

java.io.EOFException: Premature EOF: no length prefix available

        at org.apache.hadoop.hdfs.protocol.HdfsProtoUtil.vintPrefixed(HdfsProtoUtil.java:162)

        at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:95)

        at org.apache.hadoop.hdfs.server.datanode.BlockReceiver$PacketResponder.run(BlockReceiver.java:993)

        at java.lang.Thread.run(Thread.java:662)

   --- This EOFException seems suspicious...

2013-06-20 14:39:15,673 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: Exception in receiveBlock for BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013

java.io.IOException: Connection reset by peer

        at sun.nio.ch.FileDispatcher.read0(Native Method)

        .. .. ..

2013-06-20 14:39:15,675 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: research-virtual-machine:50010:DataXceiver error processing WRITE_BLOCK operation  src: /172.16.124.129:56689 dest: /172.16.124.129:50010

java.io.IOException: Connection reset by peer

        at sun.nio.ch.FileDispatcher.read0(Native Method)

        .. .. ..

This “Premature EOF” exception is suspicious (and is also seen on the client). Let’s see where these exceptions are printed:

      /* Client code. */

   private boolean createBlockOutputStream(DatanodeInfo[] nodes, long newGS,

     boolean recoveryFlag) {

         .. .. ..

      try {

         .. .. ..

        out = new DataOutputStream(new BufferedOutputStream(

            NetUtils.getOutputStream(s, writeTimeout),

            HdfsConstants.SMALL_BUFFER_SIZE));

       

        .. .. ..

        // receive ack for connect

        BlockOpResponseProto resp = BlockOpResponseProto.parseFrom(

            HdfsProtoUtil.vintPrefixed(blockReplyStream));

        pipelineStatus = resp.getStatus();

        firstBadLink = resp.getFirstBadLink();

       

        if (pipelineStatus != SUCCESS) {

          if (pipelineStatus == Status.ERROR_ACCESS_TOKEN) {

            throw new InvalidBlockTokenException(

                "Got access token error for connect ack with firstBadLink as "

                    + firstBadLink);

          } else {

            throw new IOException("Bad connect ack with firstBadLink as "

                + firstBadLink);

               --- This was printed...

          }

        }

      } 

      .. .. ..

    }

After the first exception is received, the client tries to recover and calls setupPipelineForAppendOrRecovery again.

Locate the source:

DN1’s code:

  public static InputStream vintPrefixed(final InputStream input)

  throws IOException {

    final int firstByte = input.read();

      ← Did not read anything from the DN: the stream returned EOF (-1)..

    if (firstByte == -1) {

      throw new EOFException("Premature EOF: no length prefix available");

    }

    .. ..

  }

Caller:

private void transfer(final DatanodeInfo src, final DatanodeInfo[] targets,

        final Token<BlockTokenIdentifier> blockToken) throws IOException

{

      //transfer replica to the new datanode

      Socket sock = null;

      DataOutputStream out = null;

      DataInputStream in = null;

      try {

        sock = createSocketForPipeline(src, 2, dfsClient);

        .. ..

        //ack

        BlockOpResponseProto response =

          BlockOpResponseProto.parseFrom(HdfsProtoUtil.vintPrefixed(in));

}

This exception is never handled... Basically, this error occurs when the socket created by DN1 contains nothing to read. This socket goes to the next node, DN2, so let's examine DN2's log.

DN2's log:

2013-06-20 14:39:15,634 INFO org.apache.hadoop.hdfs.server.datanode.fsdataset.impl.FsDatasetImpl: Appending to replica FinalizedReplica, blk_8462409925548039750_1012, FINALIZED

2013-06-20 14:39:15,640 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: DataNode{data=FSDataset{dirpath='[/home/ding/research/hadoop/hadoop-2.0.0-alpha-src/hadoop-dist/target/hadoop-2.0.0-alpha/dfs/data/current]'}, localName='research-virtual-machine:50010', storageID='DS-1891234353-127.0.1.1-50010-1371753278058', xmitsInProgress=0}:Exception transfering block BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013 to mirror 172.16.124.130:50010: java.net.ConnectException: Connection refused

2013-06-20 14:39:15,640 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: opWriteBlock BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1013 received exception java.net.ConnectException: Connection refused

java.net.ConnectException: Connection refused

        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)

        .. .. ..

2013-06-20 14:39:15,663 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: research-virtual-machine:50010:DataXceiver error processing TRANSFER_BLOCK operation  src: /172.16.124.129:60110 dest: /172.16.124.128:50010

java.io.IOException: BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1012 is neither a RBW nor a Finalized, r=ReplicaBeingWritten, blk_8462409925548039750_1013, RBW

  getNumBytes()     = 23183401

  getBytesOnDisk()  = 23183401

  getVisibleLength()= 23183401

  getVolume()       = /home/ding/research/hadoop/hadoop-2.0.0-alpha-src/hadoop-dist/target/hadoop-2.0.0-alpha/dfs/data/current

  getBlockFile()    = /home/ding/research/hadoop/hadoop-2.0.0-alpha-src/hadoop-dist/target/hadoop-2.0.0-alpha/dfs/data/current/BP-1107653285-127.0.1.1-1371753149490/current/rbw/blk_8462409925548039750

  bytesAcked=23183401

  bytesOnDisk=23183401

        at org.apache.hadoop.hdfs.server.datanode.DataNode.transferReplicaForPipelineRecovery(DataNode.java:2048)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.transferBlock(DataXceiver.java:524)

        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opTransferBlock(Receiver.java:114)

        at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:78)

        at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:189)

        at java.lang.Thread.run(Thread.java:662)

DN3's log:

2013-06-20 14:38:48,609 INFO org.apache.hadoop.hdfs.server.datanode.DataNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down DataNode at research-virtual-machine/127.0.1.1

************************************************************/

  --- Nothing suspicious here..

Locate the source in DN2:

  /**

   * Transfer a replica to the datanode targets.

   * @param b the block to transfer.

   *          The corresponding replica must be an RBW or a Finalized.

   *          Its GS and numBytes will be set to

   *          the stored GS and the visible length.

   * @param targets

   * @param client

   */

  void transferReplicaForPipelineRecovery(final ExtendedBlock b,

      final DatanodeInfo[] targets, final String client) throws IOException {

    //get replica information

    synchronized(data) {

      if (data.isValidRbw(b)) {

        stage = BlockConstructionStage.TRANSFER_RBW;

      } else if (data.isValidBlock(b)) {

        stage = BlockConstructionStage.TRANSFER_FINALIZED;

      } else {

        final String r = data.getReplicaString(b.getBlockPoolId(), b.getBlockId());

        throw new IOException(b + " is neither a RBW nor a Finalized, r=" + r);

           --- This throw produced: “java.io.IOException: BP-1107653285-127.0.1.1-1371753149490:blk_8462409925548039750_1012 is neither a RBW nor a Finalized, r=ReplicaBeingWritten, blk_8462409925548039750_1013, RBW”

      }

      .. .. ..

  }

Here:

b: blk_8462409925548039750_1012

r: ReplicaBeingWritten, blk_8462409925548039750_1013, RBW

Now you can see the genstamp mismatch: 'b' (genstamp 1012) comes from the client, while 'r' (genstamp 1013) comes from the local replica map.

FsDatasetImpl.java: isValidBlock() returns true only if the block exists locally and is finalized:

  /**

   * Check whether the given block is a valid one.

   * valid means finalized

   */

  public boolean isValidBlock(ExtendedBlock b) {

    return isValid(b, ReplicaState.FINALIZED);  // →FsDatasetImpl.isValid (ExtendedBlock, ReplicaState)

  }

FsDatasetImpl.java:

  /** Does the block exist and have the given state? */

  private boolean isValid(final ExtendedBlock b, final ReplicaState state) {

    final ReplicaInfo replicaInfo = volumeMap.get(b.getBlockPoolId(),

        b.getLocalBlock());  // →ReplicaMap.get(String bpid, Block)

    return replicaInfo != null

        && replicaInfo.getState() == state

        && replicaInfo.getBlockFile().exists();

  }

 /**

   * Get the meta information of the replica that matches both block id

   * and generation stamp

   * @param bpid block pool id

   * @param block block with its id as the key

   * @return the replica's meta information

   * @throws IllegalArgumentException if the input block or block pool is null

   */

  ReplicaInfo get(String bpid, Block block) {

    checkBlockPool(bpid);

    checkBlock(block);

    ReplicaInfo replicaInfo = get(bpid, block.getBlockId());

    if (replicaInfo != null &&

        block.getGenerationStamp() == replicaInfo.getGenerationStamp()) {

          ← 1012 (from the client) != 1013 (local replica): this test fails, so get() returns null!

      return replicaInfo;

    }

    return null;

  }

The reason, similar to HDFS-4022, is a genstamp mismatch: the genstamp from the network (from the client) does not match the one in the local replica map!

In fact, if you look at the NN's or DN2's log, you can see that the genstamp had already become 1013.

The root cause: the failing append bumped the genstamp to 1013 on DN1 and DN2, but the client is not aware of this. So when the network error occurs, the client asks the DNs to transfer the block to another node, but hands them the stale genstamp (1012). This causes DN2 to reject the request, which in turn causes the “Premature EOF” exception on DN1 and on the client.

4. Root cause:

The client uses a stale generation stamp for the block, causing the DN to refuse to replicate it.

4.1 Category:

Incorrect exception handling. (DN3 is down, which happens to be the last DN in an append pipeline).

5. Fix

5.1 How?

Simple: in DN2's code, update the genstamp before verifying the validity of the block:

Index: hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java

===================================================================

--- hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java (revision 1341741)

+++ hadoop-hdfs-project/hadoop-hdfs/src/main/java/org/apache/hadoop/hdfs/server/datanode/DataNode.java (working copy)

@@ -2028,6 +2028,18 @@

     //get replica information

     synchronized(data) {

+      Block storedBlock = data.getStoredBlock(b.getBlockPoolId(),

+          b.getBlockId());

+      if (null == storedBlock) {

+        throw new IOException(b + " not found in datanode.");

+      }

+      storedGS = storedBlock.getGenerationStamp();

+      if (storedGS < b.getGenerationStamp()) {

+        throw new IOException(storedGS

+            + " = storedGS < b.getGenerationStamp(), b=" + b);

+      }

+      // Update the genstamp with storedGS

+      b.setGenerationStamp(storedGS);

       if (data.isValidRbw(b)) {

         stage = BlockConstructionStage.TRANSFER_RBW;

       } else if (data.isValidBlock(b)) {

@@ -2036,18 +2048,9 @@

         final String r = data.getReplicaString(b.getBlockPoolId(), b.getBlockId());

         throw new IOException(b + " is neither a RBW nor a Finalized, r=" + r);

       }

-

-      storedGS = data.getStoredBlock(b.getBlockPoolId(),

-          b.getBlockId()).getGenerationStamp();

-      if (storedGS < b.getGenerationStamp()) {

-        throw new IOException(

-            storedGS + " = storedGS < b.getGenerationStamp(), b=" + b);

-      }

       visible = data.getReplicaVisibleLength(b);

     }

-

-    //set storedGS and visible length

-    b.setGenerationStamp(storedGS);

+    //set visible length

     b.setNumBytes(visible);
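To see the resulting control flow in one piece, here is a minimal sketch of the patched check order (paraphrased from the diff above; b, data, stage, and visible are the method's existing state, and this is not the verbatim committed code):

    synchronized(data) {

      // Look the replica up by block id only, ignoring the (possibly stale) genstamp.

      Block storedBlock = data.getStoredBlock(b.getBlockPoolId(), b.getBlockId());

      if (null == storedBlock) {

        throw new IOException(b + " not found in datanode.");

      }

      // The locally stored genstamp must be at least as new as the client's.

      long storedGS = storedBlock.getGenerationStamp();

      if (storedGS < b.getGenerationStamp()) {

        throw new IOException(storedGS + " = storedGS < b.getGenerationStamp(), b=" + b);

      }

      // Key change: adopt the stored genstamp BEFORE the validity checks, so the

      // client's stale genstamp no longer makes isValidRbw()/isValidBlock() fail.

      b.setGenerationStamp(storedGS);

      if (data.isValidRbw(b)) {

        stage = BlockConstructionStage.TRANSFER_RBW;

      } else if (data.isValidBlock(b)) {

        stage = BlockConstructionStage.TRANSFER_FINALIZED;

      } else {

        final String r = data.getReplicaString(b.getBlockPoolId(), b.getBlockId());

        throw new IOException(b + " is neither a RBW nor a Finalized, r=" + r);

      }

      visible = data.getReplicaVisibleLength(b);

    }

    //set visible length

    b.setNumBytes(visible);

With this ordering, DN2 accepts the recovery-path TRANSFER_BLOCK request even though the client still sends genstamp 1012: the block is re-stamped to the stored 1013 before validation.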