HDFS-4807 Report

1. Symptom

When a client wants to create a socket for a write pipeline, it connects to a datanode with a specified timeout. However, this connection timeout grows as the number of datanodes in the pipeline increases. Since the client is only opening a connection to a single node, there is no reason for the connection timeout to depend on the number of datanodes.

1.1 Severity

Major

1.2 Was there exception thrown?

No

1.2.1 Were there multiple exceptions?

No

1.3 Scope of the failure

Only affects the performance of connections between the client and the datanodes.

2. How to reproduce this failure

2.0 Version

0.23.8

2.1 Configuration

No special configuration

2.2 Reproduction procedure

1. Start HDFS with one, two, or three datanodes.

2. Have a client write to the datanode(s).

3. Make the connection between the client and the datanode(s) fail.

4. Record the timeout observed for each number of datanodes.

2.2.1 Timing order

No

2.2.2 Events order externally controllable?

Yes

2.3 Can the logs tell how to reproduce the failure?

No

2.4 How many machines needed?

At least two (1NN+2DN)

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

The connection timeout increases as the number of datanodes in the pipeline increases.

3.2 Backward inference (how do you infer from the symptom to the root cause)

Comparing the runs, we can roughly tell that the connection timeout grows as more datanodes are added. We then inspect the connection code to see why the connection timeout depends on the number of datanodes.

4. Root cause

Original code:

The createSocketForPipeline() function in DFSOutputStream.java:

final int timeout = client.getDatanodeReadTimeout(length);
NetUtils.connect(sock, isa, timeout);
sock.setSoTimeout(timeout);

The getDatanodeReadTimeout() function in DFSClient.java:

int getDatanodeReadTimeout(int numNodes) {
    return dfsClientConf.socketTimeout > 0 ?
        (HdfsServerConstants.READ_TIMEOUT_EXTENSION * numNodes +
            dfsClientConf.socketTimeout) : 0;
}

Here, HdfsServerConstants.READ_TIMEOUT_EXTENSION is a fixed constant, so the returned timeout grows linearly with numNodes.
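To make the growth concrete, here is a small standalone Java sketch that re-implements the getDatanodeReadTimeout() arithmetic. It assumes READ_TIMEOUT_EXTENSION = 5,000 ms and a client socketTimeout of 60,000 ms (believed to be the defaults in this release line, but check your version); the class and method names are hypothetical, not part of HDFS.

```java
/**
 * Hypothetical re-implementation of the getDatanodeReadTimeout() arithmetic.
 * ASSUMPTION: READ_TIMEOUT_EXTENSION = 5000 ms and socketTimeout = 60000 ms.
 */
public class ReadTimeoutDemo {
    // Assumed value of HdfsServerConstants.READ_TIMEOUT_EXTENSION, in milliseconds.
    static final int READ_TIMEOUT_EXTENSION = 5 * 1000;

    // Same formula as getDatanodeReadTimeout(): a non-positive socketTimeout
    // means "no timeout" and yields 0; otherwise each node adds one extension.
    static int datanodeReadTimeout(int socketTimeout, int numNodes) {
        return socketTimeout > 0
                ? READ_TIMEOUT_EXTENSION * numNodes + socketTimeout
                : 0;
    }

    public static void main(String[] args) {
        int socketTimeout = 60 * 1000; // assumed client socket timeout, ms
        for (int numNodes = 1; numNodes <= 3; numNodes++) {
            System.out.println(numNodes + " datanode(s): "
                    + datanodeReadTimeout(socketTimeout, numNodes) + " ms");
        }
    }
}
```

Under these assumptions, each extra datanode in the pipeline adds 5 seconds to the timeout that createSocketForPipeline() passes to connect().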

The connect() function in NetUtils.java:

public static void connect(Socket socket,
                           SocketAddress endpoint,
                           int timeout) throws IOException {
    … …
    SocketChannel ch = socket.getChannel();

    if (ch == null) {
        socket.connect(endpoint, timeout);
    } else {
        SocketIOWithTimeout.connect(ch, endpoint, timeout);
    }

In createSocketForPipeline(), connect() is therefore called with a timeout of socketTimeout + READ_TIMEOUT_EXTENSION * numNodes, where numNodes is the number of datanodes in the pipeline. Because connect() only establishes a connection to a single datanode, there is no reason for the connection timeout to grow with the pipeline length.
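The point that a connect involves only one endpoint can be illustrated with plain java.net, using the same kind of call NetUtils.connect() makes in its ch == null branch: socket.connect(endpoint, timeout) bounds a single TCP handshake with a single peer. In this minimal sketch a local ServerSocket stands in for the datanode; the class and method names are hypothetical and not part of HDFS.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class SingleEndpointConnect {
    // Opens one TCP connection to one endpoint. The timeout bounds only this
    // handshake, so nothing about it depends on how many datanodes follow
    // in the pipeline.
    static boolean connectOnce(InetSocketAddress endpoint, int connectTimeoutMs) {
        try (Socket sock = new Socket()) {
            sock.connect(endpoint, connectTimeoutMs);
            return sock.isConnected();
        } catch (IOException e) {
            return false; // refused, unreachable, or timed out
        }
    }

    // Hypothetical setup: a loopback listener plays the role of the datanode.
    // The OS completes the handshake from the listen backlog even though
    // accept() is never called.
    static boolean connectToLocalListener(int connectTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket fakeDatanode = new ServerSocket(0, 50, loopback)) {
            InetSocketAddress addr =
                    new InetSocketAddress(loopback, fakeDatanode.getLocalPort());
            return connectOnce(addr, connectTimeoutMs);
        } catch (IOException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("connected: " + connectToLocalListener(60_000));
    }
}
```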

4.1 Category:

Semantic

5. Fix

         final int timeout = client.getDatanodeReadTimeout(length);
-        NetUtils.connect(sock, isa, timeout);
+        NetUtils.connect(sock, isa, client.getConf().socketTimeout);
         sock.setSoTimeout(timeout);

5.1 How?

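Pass only the configured socketTimeout to connect(). The read timeout set by sock.setSoTimeout() keeps the per-datanode extension, since reads on this socket wait for acknowledgements from every datanode in the pipeline.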
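The timeout split after the fix can be sketched as two hypothetical helper functions: the connect timeout is simply the configured socket timeout, while the read timeout still adds one extension per datanode. The 5,000 ms extension and 60,000 ms socket timeout are assumed values, not taken from the report.

```java
/**
 * Hypothetical sketch of how the timeouts are chosen after the fix.
 * ASSUMPTION: READ_TIMEOUT_EXTENSION = 5000 ms, socketTimeout = 60000 ms.
 */
public class PipelineTimeouts {
    static final int READ_TIMEOUT_EXTENSION = 5 * 1000; // assumed, ms

    // Timeout handed to NetUtils.connect() after the fix:
    // independent of the pipeline length.
    static int connectTimeout(int socketTimeout) {
        return socketTimeout;
    }

    // Timeout handed to sock.setSoTimeout(): still grows with the pipeline length.
    static int readTimeout(int socketTimeout, int pipelineLength) {
        return socketTimeout > 0
                ? READ_TIMEOUT_EXTENSION * pipelineLength + socketTimeout
                : 0;
    }

    public static void main(String[] args) {
        int socketTimeout = 60 * 1000; // assumed client socket timeout, ms
        for (int n = 1; n <= 3; n++) {
            System.out.printf("pipeline of %d: connect=%d ms, read=%d ms%n",
                    n, connectTimeout(socketTimeout), readTimeout(socketTimeout, n));
        }
    }
}
```

With these assumed values the connect timeout stays at 60,000 ms for every pipeline length, while the read timeout still rises from 65,000 ms to 75,000 ms as the pipeline grows from one to three datanodes.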