HDFS-2460 Report

1. Symptom

webhdfs returns content type text/html when an error is thrown; it should return application/json.

1.1 Severity

Major

1.2 Was there an exception thrown?

No

1.2.1 Were there multiple exceptions?

No

1.3 Scope of the failure

Webhdfs (client side only). It does not affect other parts of hdfs or cause data loss.

2. How to reproduce this failure

Make a “create” call using the REST API (webhdfs) and send an invalid value for the overwrite parameter.

2.0 Version

0.20.205.0, 0.23.0

2.1 Configuration

Webhdfs needs to be enabled.

2.2 Reproduction procedure

1. Enable webhdfs by adding this property to the HDFS configuration file (typically hdfs-site.xml):

 <property>
   <name>dfs.webhdfs.enabled</name>
   <value>true</value>
 </property>

2. Start hdfs

3. Make a create call using the REST API and send an invalid value for the overwrite parameter.

4. Observe the returned content type: text/html; charset=utf-8

5. The correct result should instead be application/json (the official Internet media type for JavaScript Object Notation).
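
Steps 3 to 5 can be sketched in Python using only the standard library. This is a minimal sketch under stated assumptions, not the procedure the reporter used: the "/webhdfs/v1" path prefix follows the documented WebHDFS REST API and may differ in the affected versions, and the function names are hypothetical.

```python
import urllib.error
import urllib.parse
import urllib.request


def build_create_url(host, port, path, overwrite):
    """Build a WebHDFS CREATE URL; `overwrite` is passed through unvalidated."""
    query = urllib.parse.urlencode({"op": "CREATE", "overwrite": overwrite})
    # Assumption: the "/webhdfs/v1" prefix, as in the documented REST API.
    return f"http://{host}:{port}/webhdfs/v1{urllib.parse.quote(path)}?{query}"


def is_json_content_type(header_value):
    """Return True if a Content-Type header value names application/json."""
    if not header_value:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/json"


def observed_content_type(url):
    """Send the PUT and return the Content-Type header of the response."""
    request = urllib.request.Request(url, method="PUT")
    try:
        with urllib.request.urlopen(request) as response:
            return response.headers.get("Content-Type")
    except urllib.error.HTTPError as error:
        # Error responses are raised as HTTPError; their headers remain readable.
        return error.headers.get("Content-Type")
```

Against a running NameNode (host, port, and file path are placeholders to adjust), calling observed_content_type(build_create_url("localhost", 50070, "/tmp/test.txt", "notaboolean")) and passing the result to is_json_content_type would be expected to yield False on an affected version, since the header reported above is text/html; charset=utf-8.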

2.2.1 Timing order

Hdfs with webhdfs enabled must be started before the error can be reproduced, but this ordering is trivial.

2.2.2 Events order externally controllable?

Yes

2.3 Can the logs tell how to reproduce the failure?

Since this failure is observed on the webhdfs client side, it is very hard to diagnose. There are no logs on the webhdfs client side, and the logs on the server side are not helpful.

2.4 How many machines needed?

1

3. Diagnosis procedure

Domain knowledge is needed for this failure: developers need to know where the bug is. Even then, the developer took more than 20 increments to fix this “simple problem”, which is part of a much larger problem.

3.1 Detailed Symptom (where you start)

The wrong content type (text/html) is returned instead of application/json.

3.2 Backward inference (how do you infer from the symptom to the root cause)

Backward inference is virtually impossible. But a rough guideline is outlined below.

1. HDFS-2460 returns an unexpected content type.

2. Realize that it seems similar to an earlier bug (HDFS-2453), which had 2 content types being returned.

3. So we look at HDFS-2453 instead, which is about “tail using a webhdfs uri throws an error”. At first glance, its description is of no help; only by digging into the details would one learn that these two failures are related. In fact, HDFS-2460 is a special case of HDFS-2453.

4. However, HDFS-2453 is not intuitive either. It incorporates another failure (HDFS-2456) and is also affected by HDFS-2385, which in turn depends on MAPREDUCE-2764 and HADOOP-7510.

5. To summarize, backward inference for HDFS-2460 is only possible for someone who has worked on HDFS-2453, HDFS-2456, HDFS-2385, MAPREDUCE-2764, and HADOOP-7510.
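
The chain of related issues in the steps above can be written down as a small graph and walked mechanically. The sketch below only encodes the relationships stated in this report; the dictionary layout and the function name are illustrative choices, not part of any issue tracker API.

```python
from collections import deque

# Relationships between issues, as described in the steps above.
RELATED = {
    "HDFS-2460": ["HDFS-2453"],                      # special case of
    "HDFS-2453": ["HDFS-2456", "HDFS-2385"],         # incorporates / affected by
    "HDFS-2385": ["MAPREDUCE-2764", "HADOOP-7510"],  # depends on
}


def issues_to_understand(start, related=RELATED):
    """Return every issue reachable from `start`, in breadth-first order."""
    seen = []
    queue = deque(related.get(start, []))
    while queue:
        issue = queue.popleft()
        if issue in seen:
            continue
        seen.append(issue)
        queue.extend(related.get(issue, []))
    return seen
```

Starting from "HDFS-2460", the traversal visits the same five issues listed in step 5, which is why the diagnosis requires familiarity with all of them.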

4. Root cause

Semantic error: there is no error handling for the error case; only the “success” case is handled. See HDFS-2453.

4.1 Category:

Semantic error.

5. Fix

Apply the HDFS-2453 patch. The patch does not “fix” a bug in existing code; it adds essentially whole new functions to implement the missing “feature”. The code can be found in the latest HDFS-2453 patch file.

5.1 How?

Write new functions that allow webhdfs to return the correct content type, or apply the HDFS-2453 patch.
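
The general shape of such a fix can be illustrated with a toy model. This is not the HDFS-2453 patch (the actual code is Java and lives in that patch file); it is a minimal Python sketch of the pattern, in which every function name, status code, and JSON field is a hypothetical stand-in. A handler that sets application/json only on success leaves the error path to a default HTML page, and adding an explicit error mapper makes both paths return JSON.

```python
import html
import json


def handle_create(params):
    """Toy request handler: only the success path builds a JSON response."""
    overwrite = params.get("overwrite", "false")
    if overwrite not in ("true", "false"):
        raise ValueError(f"Invalid value for overwrite: {overwrite!r}")
    return 201, {"Content-Type": "application/json"}, json.dumps({"created": True})


def container_default_error(exc):
    """Stand-in for a web container's default HTML error page (the symptom)."""
    body = f"<html><body><h1>Error</h1><p>{html.escape(str(exc))}</p></body></html>"
    return 500, {"Content-Type": "text/html; charset=utf-8"}, body


def json_error(exc):
    """The missing piece: map an exception to a JSON error response."""
    body = json.dumps({"exception": type(exc).__name__, "message": str(exc)})
    return 400, {"Content-Type": "application/json"}, body


def dispatch(params, on_error):
    """Run the handler and route any exception through `on_error`."""
    try:
        return handle_create(params)
    except Exception as exc:
        return on_error(exc)
```

With on_error=container_default_error, an invalid overwrite value produces a text/html response, mirroring the symptom; with on_error=json_error, the same request produces application/json.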