When the connection becomes unusable, webhdfs hangs indefinitely.
Break the connection by any means (e.g. disconnect webhdfs’s network) and the problem becomes apparent.
Configuration needed: webhdfs must be enabled.
1. Start HDFS with webhdfs enabled.
2. Terminate webhdfs’s network connection.
3. Observe that the webhdfs request hangs.
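The reproduction steps above can be sketched locally in Java, the language Hadoop is written in. This is a hedged illustration, not the procedure used for the original report: instead of disconnecting a real network, it assumes that a server which accepts the TCP connection but never sends a response is a fair stand-in for a broken webhdfs link. The class WebHdfsHangRepro, the method requestStillBlocked, and the URL path are hypothetical. The client uses java.net.HttpURLConnection with no timeouts set, so the request is still blocked when the harness stops waiting.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.URI;
import java.net.URL;

/**
 * Reproduction sketch: a client with no timeouts talking to an endpoint
 * that never answers. Assumption: a server that accepts the TCP connection
 * but never writes a response stands in for a severed webhdfs network.
 */
public class WebHdfsHangRepro {

    /** Returns true if the no-timeout request is still blocked after waitMillis. */
    public static boolean requestStillBlocked(long waitMillis) throws Exception {
        try (ServerSocket silentServer = new ServerSocket(0)) {
            silentServer.setSoTimeout(5000); // keep the harness itself from hanging on accept()
            // Hypothetical webhdfs-style path; the silent server ignores it anyway.
            URL url = URI.create("http://127.0.0.1:" + silentServer.getLocalPort()
                    + "/webhdfs/v1/tmp?op=GETFILESTATUS").toURL();
            Thread client = new Thread(() -> {
                try {
                    HttpURLConnection conn = (HttpURLConnection) url.openConnection();
                    // No setConnectTimeout/setReadTimeout: both default to 0, i.e. wait forever.
                    try (InputStream in = conn.getInputStream()) {
                        in.read();
                    }
                } catch (IOException e) {
                    // Reached only after the harness closes the server side.
                }
            });
            client.setDaemon(true); // the stuck request must not keep the JVM alive
            client.start();
            try (Socket accepted = silentServer.accept()) {
                // Connection accepted, but no response is ever sent.
                client.join(waitMillis);
                return client.isAlive();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("still blocked after 2 s: " + requestStillBlocked(2000));
    }
}
```

Without a timeout, nothing in the client ever gives up, which matches the symptom: no error and no log entry, only a request that never returns.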
Timing is trivial: HDFS and webhdfs only need to be running before the failure is triggered.
There are no logs, because webhdfs implements no timeout: the request simply blocks and no error is ever raised.
The symptom points directly to the root cause: someone with domain knowledge would know that webhdfs lacks a timeout. In this case, the developer who fixed HDFS-3166 (the missing timeout in hftp) adapted that fix for webhdfs in HDFS-3180.
Trivial: start from a hung HDFS request.
1. Once we realized webhdfs was hung, we checked the network connection and the HDFS status.
2. From this, we found the root cause.
3. We then implemented a timeout in webhdfs.
No connection timeout handling in webhdfs
Implemented a timeout for webhdfs connections. The full source changes can be found in the HDFS-3180 patch.
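A minimal sketch of the kind of fix described, assuming the webhdfs client talks HTTP through java.net.HttpURLConnection: set both a connect timeout and a read timeout on every connection, so a dead link raises java.net.SocketTimeoutException instead of blocking forever. The class WebHdfsTimeoutSketch, the helper openWithTimeouts, and the timeout values are hypothetical and are not taken from the patch. The self-check again assumes a server that never answers is a fair stand-in for a broken network.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.ServerSocket;
import java.net.SocketTimeoutException;
import java.net.URI;
import java.net.URL;

/**
 * Fix sketch: bound every webhdfs-style HTTP connection with explicit
 * connect and read timeouts so a dead connection fails instead of hanging.
 * Names and values here are illustrative assumptions, not the patch's code.
 */
public class WebHdfsTimeoutSketch {

    /** Opens a connection whose connect and per-read waits are both bounded. */
    public static HttpURLConnection openWithTimeouts(URL url, int timeoutMillis) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setConnectTimeout(timeoutMillis); // fail if the TCP connect takes too long
        conn.setReadTimeout(timeoutMillis);    // fail if a blocking read gets no data in time
        return conn;
    }

    /** Returns true if a request to a server that never answers ends in SocketTimeoutException. */
    public static boolean timesOutAgainstSilentServer(int timeoutMillis) throws Exception {
        // The OS completes the TCP handshake from the listen backlog,
        // but nothing ever reads the request or writes a response.
        try (ServerSocket silentServer = new ServerSocket(0)) {
            URL url = URI.create("http://127.0.0.1:" + silentServer.getLocalPort()
                    + "/webhdfs/v1/tmp?op=GETFILESTATUS").toURL();
            HttpURLConnection conn = openWithTimeouts(url, timeoutMillis);
            try (InputStream in = conn.getInputStream()) {
                in.read();
                return false; // unexpectedly received a response
            } catch (SocketTimeoutException e) {
                return true;  // the read timeout turned the hang into an error
            } finally {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + timesOutAgainstSilentServer(1000));
    }
}
```

Setting only the connect timeout would not be enough in this scenario: the TCP connection is established, and it is the wait for the response that blocks, so the read timeout is what turns the hang into a reportable failure.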
Implemented a new feature to handle connection failures.