HDFS-4344 Report

1. Symptom

If an entry in the files named by “dfs.hosts” or “dfs.hosts.exclude” contains a port number, datanode commission/decommission operations fail. The web UI shows the following error message:

Problem accessing /dfshealth.jsp. Reason:

 

        For input string: ":9999"

 

Caused by:

 

java.lang.NumberFormatException: For input string: ":9999"

                    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)

                    at java.lang.Integer.parseInt(Integer.java:449)

               .. ..

1.1 Severity

Major

1.2 Was there an exception thrown?

Yes

1.2.1 Were there multiple exceptions?

No.

2. How to reproduce this failure

2.0 Version

2.0.2-alpha

2.1 Configuration

Namenode conf directory:

Add the following lines to hdfs-site.xml .

<property>

        <name>dfs.hosts</name>

              <value>/home/research/hadoop-2.0.2-alpha/etc/hadoop/include</value>

        <final>true</final>

</property>

 

Before starting HDFS, I leave the include file empty.

 

2.2 Reproduction procedure

1 Start hdfs: start-dfs.sh

2 Commission one datanode:

1) Add the datanode host name (or IP address) with a port number to the include file:

echo master:9999 >> include

2) Update the namenode with the new set of permitted datanodes using this command: hadoop dfsadmin -refreshNodes

3 Check whether the datanode is commissioned: hadoop dfsadmin -report, or check the web UI

2.3 Can the logs tell how to reproduce the failure?

Yes

2013-07-18 17:09:27,126 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting

2013-07-18 17:09:27,127 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 9999: starting

2013-07-18 17:09:27,131 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: NameNode up at: master/192.168.59.165:9999

2013-07-18 17:09:27,131 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Starting services required for active state

2013-07-18 17:09:53,853 INFO org.apache.hadoop.util.HostsFileReader: Setting the includes file to /home/research/hadoop-2.0.2-alpha/etc/hadoop/include

---- set the include file

2013-07-18 17:09:53,853 INFO org.apache.hadoop.util.HostsFileReader: Setting the excludes file to

2013-07-18 17:09:53,853 INFO org.apache.hadoop.util.HostsFileReader: Refreshing hosts (include/exclude) list

---- this is where I run the command: hadoop dfsadmin -refreshNodes

2013-07-18 17:09:53,853 INFO org.apache.hadoop.util.HostsFileReader: Adding master:9999 to the list of hosts from /home/research/hadoop-2.0.2-alpha/etc/hadoop/include

----  read the datanode I want to add to the cluster  

2013-07-18 17:09:58,222 ERROR org.mortbay.log: /dfshealth.jsp

---- error message

java.lang.NumberFormatException: For input string: ":9999"

                    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)

                    at java.lang.Integer.parseInt(Integer.java:449)

                    at java.lang.Integer.valueOf(Integer.java:554)

                     .. ..

2013-07-18 17:10:29,220 WARN org.apache.hadoop.ipc.Server: IPC Server handler 7 on 9999, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getDatanodeReport from 192.168.59.165:45069: error: java.lang.NumberFormatException: For input string: ":9999"

2.4 How many machines needed?

One virtual machine (running one NN and one DN)

3. Diagnosis procedure

3.1 Detailed symptom

2013-07-18 17:09:58,222 ERROR org.mortbay.log: /dfshealth.jsp

java.lang.NumberFormatException: For input string: ":9999"

                    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)

                    at java.lang.Integer.parseInt(Integer.java:449)

                    at java.lang.Integer.valueOf(Integer.java:554)

                    … ...

3.2 Backward inference

First, the ERROR log message “java.lang.NumberFormatException: For input string: ":9999"” tells us that a NumberFormatException was thrown while parsing the string “:9999” as an integer. The error is therefore probably caused by the extra “:” at the start of this string. Next, the stack frame “org.apache.hadoop.hdfs.server.blockmanagement.DatanodeManager.parseDNFromHostsEntry(DatanodeManager.java:832)” points to the parseDNFromHostsEntry function in DatanodeManager.java, which likely extracts the wrong substring when it converts the port part of the host entry to a number.

private DatanodeID parseDNFromHostsEntry(String hostLine) {

        DatanodeID dnId;

        String hostStr;

        int port;

        int idx = hostLine.indexOf(':');

 

        if (-1 == idx) {

          hostStr = hostLine;

          port = DFSConfigKeys.DFS_DATANODE_DEFAULT_PORT;

        } else {

          hostStr = hostLine.substring(0, idx);

          port = Integer.valueOf(hostLine.substring(idx)); // bug: substring(idx) still starts with ':'

        }

.........
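The inference above can be checked in isolation. The following is a minimal sketch, not Hadoop code: the class and method names are hypothetical, and it only mirrors the substring step from the quoted function to show which text reaches Integer.valueOf for the entry "master:9999".

```java
public class PortParseRepro {
    // Mirrors the port-extraction step of the quoted function.
    // Assumes hostLine contains a ':' (the else branch in the original).
    static String portTextAsParsed(String hostLine) {
        int idx = hostLine.indexOf(':');
        return hostLine.substring(idx); // starts at the ':' itself
    }

    // Returns the NumberFormatException message, or null if the text parses.
    static String tryParse(String text) {
        try {
            Integer.valueOf(text);
            return null;
        } catch (NumberFormatException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String portText = portTextAsParsed("master:9999");
        System.out.println("text passed to Integer.valueOf: " + portText);
        System.out.println("parse failure: " + tryParse(portText));
    }
}
```

Running main shows that the text handed to the integer parser still carries the leading colon, which matches the input string reported in the exception.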

4. Root cause

The root cause is that the port number is supposed to be the number after the “:”. However, the original code takes the substring starting at the “:” itself and tries to parse that as the port integer, so the string being parsed is “:(integer)” rather than “(integer)”. For example, in my case the string should be “9999” instead of “:9999”. The fix starts the substring one character later:

           port = DFSConfigKeys.DFS_DATANODE_DEFAULT_PORT;

         } else {

           hostStr = hostLine.substring(0, idx);

-          port = Integer.valueOf(hostLine.substring(idx));

+          port = Integer.valueOf(hostLine.substring(idx+1));

         }

 

         if (InetAddresses.isInetAddress(hostStr)) {
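To illustrate the corrected behavior outside of Hadoop, here is a small self-contained sketch of host-entry parsing that uses idx + 1. The class name, method names, and the DEFAULT_PORT constant are assumptions for illustration; DEFAULT_PORT merely stands in for DFSConfigKeys.DFS_DATANODE_DEFAULT_PORT and its value here is a placeholder.

```java
public class HostEntryParser {
    // Hypothetical stand-in for DFSConfigKeys.DFS_DATANODE_DEFAULT_PORT (placeholder value).
    static final int DEFAULT_PORT = 50010;

    // Returns the host part of "host" or "host:port".
    static String parseHost(String hostLine) {
        int idx = hostLine.indexOf(':');
        return idx == -1 ? hostLine : hostLine.substring(0, idx);
    }

    // Returns the port part, or DEFAULT_PORT when the entry has no ':'.
    static int parsePort(String hostLine) {
        int idx = hostLine.indexOf(':');
        if (idx == -1) {
            return DEFAULT_PORT;
        }
        // idx + 1 skips the ':' so only the digits are parsed.
        return Integer.parseInt(hostLine.substring(idx + 1));
    }

    public static void main(String[] args) {
        System.out.println(parseHost("master:9999") + " / " + parsePort("master:9999"));
        System.out.println(parseHost("master") + " / " + parsePort("master"));
    }
}
```

This sketch does not validate malformed entries such as "master:" or "master:abc"; those would still raise NumberFormatException, so a production version might want to report which line of the hosts file was invalid.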

4.1 Category:

Semantic