HDFS-2358 Report

1. Symptom

If the default filesystem's URI (fs.default.name) in core-site.xml is misconfigured, the namenode log shows a bare NullPointerException instead of a meaningful error message.

1.1 Severity

Major

1.2 Was there an exception thrown?

Yes

1.2.1 Were there multiple exceptions?

No

2. How to reproduce this failure

2.0 Version

0.20.204.0

2.1 Configuration

The default filesystem’s URI is misconfigured (left empty) in the core-site.xml file:

<property>
  <name>fs.default.name</name>
  <value></value>
  <description></description>
</property>

2.2 Reproduction procedure

1. Format the namenode.

2. Start DFS.

2.3 Can the logs tell how to reproduce the failure?

Yes

2.4 How many machines are needed?

One

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

Namenode log message:

2013-07-29 17:21:36,072 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2013-07-29 17:21:36,078 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.NullPointerException
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:136)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:176)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:206)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:240)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:434)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1153)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1162)
2013-07-29 17:21:36,079 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

3.2 Backward inference (how do you infer from the symptom to the root cause)

The namenode log shows that the NullPointerException is thrown in NetUtils.createSocketAddr, which means its String target argument is null. The stack trace shows that createSocketAddr is called by getAddress(String address), which in turn is called by getAddress(Configuration conf); the latter passes FileSystem.getDefaultUri(conf).getAuthority() as the address. Because fs.default.name has no value in the core-site.xml configuration file, the default URI has no authority component (the error message logged after the fix shows that it resolves to file:///), so getAuthority() returns null. That null is passed down unchanged and becomes the String target in createSocketAddr.
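The null authority can be reproduced with the JDK alone. The following is a minimal sketch, not Hadoop code: the class name AuthorityDemo and the example URI hdfs://localhost:9000 are assumptions chosen for illustration, and only java.net.URI is used.

```java
import java.net.URI;

class AuthorityDemo {
    // Returns the authority component ("host:port") of a URI string,
    // or null when the URI has no authority.
    static String authorityOf(String uri) {
        return URI.create(uri).getAuthority();
    }

    public static void main(String[] args) {
        // A well-formed HDFS URI (hypothetical value) has an authority.
        System.out.println(authorityOf("hdfs://localhost:9000")); // prints localhost:9000

        // file:/// has no authority, so getAuthority() returns null.
        System.out.println(authorityOf("file:///")); // prints null

        // The full URI string, by contrast, is never null.
        System.out.println(URI.create("file:///").toString()); // prints file:///
    }
}
```

Any code that dereferences the returned authority without a null check fails with a NullPointerException, which matches the symptom in the log.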

4. Root cause

4.1 Category:

Semantic Error

5. Fix

  public static InetSocketAddress getAddress(Configuration conf) {
-    return getAddress(FileSystem.getDefaultUri(conf).getAuthority());
+    return getAddress(FileSystem.getDefaultUri(conf).toString());
  }

The Namenode log message after the fix:

org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NameNode metrics system started
2013-07-29 17:26:03,241 ERROR org.apache.hadoop.hdfs.server.namenode.NameNode: java.lang.IllegalArgumentException: Does not contain a valid host:port authority: file:///
        at org.apache.hadoop.net.NetUtils.createSocketAddr(NetUtils.java:184)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:198)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.getAddress(NameNode.java:228)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:262)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:497)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1268)
        at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1277)
2013-07-29 17:26:03,242 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:

5.1 How?

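Pass the full URI string, instead of only its authority, to getAddress. The string is then never null, and an invalid value is rejected with an IllegalArgumentException that names the offending URI (file:/// in the log above) rather than with a NullPointerException.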
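The effect of the fix can be illustrated with a small stand-alone parser. This is a hedged sketch under stated assumptions, not the NetUtils.createSocketAddr implementation: the class AddressSketch, the method parse, the placeholder scheme "dummy", and the example address hdfs://localhost:9000 are all hypothetical. Only the error message text is taken from the log above.

```java
import java.net.InetSocketAddress;
import java.net.URI;
import java.net.URISyntaxException;

class AddressSketch {
    // Hypothetical stand-in for a host:port parser; not Hadoop's code.
    // Accepts "host:port" or a full URI such as "hdfs://host:port".
    static InetSocketAddress parse(String target) {
        if (target == null) {
            throw new IllegalArgumentException("Target address cannot be null.");
        }
        // Give bare "host:port" input a placeholder scheme so URI can parse it.
        String candidate = target.contains("://") ? target : "dummy://" + target;
        URI uri;
        try {
            uri = new URI(candidate);
        } catch (URISyntaxException e) {
            throw new IllegalArgumentException(
                "Does not contain a valid host:port authority: " + target, e);
        }
        String host = uri.getHost();
        int port = uri.getPort();
        if (host == null || port < 0) {
            // The message names the bad value, so the misconfiguration is visible.
            throw new IllegalArgumentException(
                "Does not contain a valid host:port authority: " + target);
        }
        // createUnresolved avoids a DNS lookup in this illustration.
        return InetSocketAddress.createUnresolved(host, port);
    }

    public static void main(String[] args) {
        System.out.println(parse("hdfs://localhost:9000").getPort()); // prints 9000
        try {
            parse("file:///");
        } catch (IllegalArgumentException e) {
            // prints Does not contain a valid host:port authority: file:///
            System.out.println(e.getMessage());
        }
    }
}
```

Because the caller hands over the whole URI string, the parser always has a concrete value to report when validation fails.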