HDFS-4505 Report

1. Symptom

The hdfs balancer command failed with the following exception:

13/07/22 11:43:00 WARN hdfs.DFSClient: DataStreamer Exception
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /system/balancer.id File does not exist.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2317)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2308)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2104)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:469)
        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:294)
        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.ja
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:910)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1696)
        at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1692)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:396)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)
        at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1690)
        ...

13/07/22 11:43:00 ERROR hdfs.DFSClient: Failed to close file /system/balancer.id
org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.LeaseExpiredException): No lease on /system/balancer.id File does not exist.
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2317)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.checkLease(FSNamesystem.java:2308)
        at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2104)
        at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:469)
        ...

1.1 Severity

Major

1.2 Was there exception thrown?

Yes

1.3 Were there multiple exceptions?

No.

1.5 Scope of the failure

One client failure.

2. How to reproduce this failure

2.0 Version

2.0.2-alpha

2.1 Configuration

Set up federation (standard federation config):

http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-yarn/hadoop-yarn-site/Federation.html

To reproduce this particular production scenario, we need to change the DNS mapping so that both the first NN's and the second NN's hostnames map to the same IP (say, the first NN's IP).

However, another way to trigger the exact same failure is to use IP addresses instead of hostnames in core-site.xml and hdfs-site.xml. See the configurations I stored on baker221.
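For illustration, the sketch below expresses that kind of federation setting with Hadoop's Configuration API instead of the actual XML files; the nameservice IDs ns1/ns2 are made up, and the two addresses mirror the ones that show up in the balancer's client log later in this report. The key point is that both dfs.namenode.rpc-address entries resolve to the same machine.

import org.apache.hadoop.conf.Configuration;

public class FederationConfigSketch {
  public static void main(String[] args) {
    // Equivalent to entries normally placed in hdfs-site.xml (ns1/ns2 are hypothetical IDs).
    Configuration conf = new Configuration();
    conf.set("dfs.nameservices", "ns1,ns2");
    // One namenode given by IP, the other by a hostname that resolves to the same IP:
    conf.set("dfs.namenode.rpc-address.ns1", "128.100.23.4:54310");
    conf.set("dfs.namenode.rpc-address.ns2", "baker221.eecg.utoronto.ca:54310");
    // The balancer now sees two "namenodes" that are really the same machine.
    System.out.println("nameservices = " + conf.get("dfs.nameservices"));
  }
}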

2.2 Reproduction procedure

Just run "hdfs balancer" and you should see the symptom.

2.3 Can the logs tell how to reproduce the failure?

A bit hard. The failure requires a specific configuration: the namenode addresses are given as IPs instead of hostnames (or two hostnames resolve to the same IP), and the logs alone do not reveal this.

2.4 How many machines needed?

One. A single machine running 1 NN + 1 DN is enough.

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

The balancer (client) log contains the exceptions shown above. They indicate that the file /system/balancer.id can no longer be found on the NameNode while the balancer is writing to it.

3.2 Backward inference

1. The stack trace tells us exactly where the exception was thrown:

  private void checkLease(String src, String holder, INode file)
      throws LeaseExpiredException {
    assert hasReadOrWriteLock();
    if (file == null || file.isDirectory()) {
      Lease lease = leaseManager.getLease(holder);
      throw new LeaseExpiredException("No lease on " + src +
                                      " File does not exist. " +
                                      (lease != null ? lease.toString() :
                                       "Holder " + holder +
                                       " does not have any open files."));
    }
    ...
  }

So we can infer that the INode "file" passed to checkLease is null (note that src itself is not null: "/system/balancer.id" appears right in the message). In other words, at this moment the NameNode has no INode for /system/balancer.id.

Now the question is: why does the file not exist at this point?

        checkLease(FSNamesystem.java:2317)
        checkLease(FSNamesystem.java:2308)

  private INodeFileUnderConstruction checkLease(String src, String holder)
      throws LeaseExpiredException, UnresolvedLinkException {
    INodeFile file = dir.getFileINode(src);   // null if src no longer exists in the namespace
    checkLease(src, holder, file);
    return (INodeFileUnderConstruction) file;
  }

Here "file" is whatever the namespace resolves for src; if the path has been deleted, it is null and the three-argument checkLease above throws.

        at FSNamesystem.getAdditionalBlock(FSNamesystem.java:2104)

  LocatedBlock getAdditionalBlock(String src,
                                  String clientName,
                                  ExtendedBlock previous,
                                  HashMap<Node, Node> excludedNodes) {
    ...
    INodeFileUnderConstruction pendingFile = checkLease(src, clientName);
    ...
  }

        at NameNodeRpcServer.addBlock(NameNodeRpcServer.java:469)

  public LocatedBlock addBlock(String src, ..) throws IOException {
    ...
    namesystem.getAdditionalBlock(src, clientName, previous, excludedNodesSet);
    ...
  }

        at ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:294)

  public AddBlockResponseProto addBlock(RpcController controller,
      AddBlockRequestProto req) throws ServiceException {
    server.addBlock(req.getSrc(), req.getClientName(), .. ..);
  }

  --- So "src" comes from "req.getSrc()", i.e., straight out of the client's RPC request; it is simply the path the balancer is writing:

  public String getSrc() {
    java.lang.Object ref = src_;
    if (ref instanceof String) {
      return (String) ref;
    } else {
      com.google.protobuf.ByteString bs =
          (com.google.protobuf.ByteString) ref;
      String s = bs.toStringUtf8();
      if (com.google.protobuf.Internal.isValidUtf8(bs)) {
        src_ = s;
      }
      return s;
    }
  }

  --- So the path arrives over the RPC unchanged; nothing on this call path explains why the NameNode no longer has the file...

        at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:43138)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:454)
        at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:910)

So, without more experience, we cannot infer much more from the code alone about why the file is missing.

If we further examine the NN’s log, we can find this line not far away:

2013-07-22 11:43:00,168 INFO org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.completeFile: file /system/balancer.id is closed by DFSClient_NONMAPREDUCE_481114479_1

  private boolean completeFileInternal(String src,
      String holder, Block last) throws SafeModeException,
      UnresolvedLinkException, IOException {
    ...
    INodeFileUnderConstruction pendingFile;
    try {
      pendingFile = checkLease(src, holder);
    } ...
    // commit the last block and complete it if it has minimum replicas
    commitOrCompleteLastBlock(pendingFile, last);
    finalizeINodeFileUnderConstruction(src, pendingFile);
    NameNode.stateChangeLog.info("DIR* NameSystem.completeFile: file " + src
                                  + " is closed by " + holder);
    return true;
  }

So from here we can infer a few things:

1. At some point a client was able to create /system/balancer.id, write it, and close it successfully.

2. That close went through a successful checkLease (inside completeFileInternal above).

So why is checkLease called again afterwards, and why does it fail this time?

In fact, at this point an experienced developer would know that after the file is closed it is also deleted, as the DEBUG messages show:

2013-07-22 11:43:00,188 DEBUG org.apache.hadoop.hdfs.StateChange: *DIR* Namenode.delete: src=/system/balancer.id, recursive=true
2013-07-22 11:43:00,188 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* NameSystem.delete: /system/balancer.id
2013-07-22 11:43:00,188 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.delete: /system/balancer.id
2013-07-22 11:43:00,188 DEBUG org.apache.hadoop.hdfs.StateChange: DIR* FSDirectory.unprotectedDelete: /system/balancer.id is removed

Now the developers can infer that the balancer logic in fact ran twice against this NameNode. The first run closed and then deleted /system/balancer.id, causing the second run, which was still writing the same file, to fail.
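The effect of that delete on a writer that still has the file open can be shown in isolation. Below is a minimal sketch with hypothetical client code (not taken from the Balancer), assuming a running HDFS: once the path is deleted on the NameNode, the writer's next block allocation / file completion fails the lease check with exactly the LeaseExpiredException shown in the symptom.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class LeaseLossSketch {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());
    Path id = new Path("/system/balancer.id");

    // Writer (think: the second balancer run) creates the file and holds the lease.
    FSDataOutputStream out = fs.create(id, true);
    out.writeBytes("balancer host name");

    // Another caller (think: the first balancer run shutting down) deletes the path.
    fs.delete(id, true);

    // close() must allocate/complete the block on the NameNode; the INode is gone,
    // so checkLease() throws LeaseExpiredException: "No lease on ... File does not exist."
    out.close();
  }
}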

But why did it run twice?

See this client log:

13/07/22 11:42:58 INFO balancer.Balancer: namenodes = [hdfs://128.100.23.4:54310, hdfs://baker221.eecg.utoronto.ca:54310]

Experienced developers may suspect that the two namenode URIs actually point to the same NameNode (in this case they indeed raised that suspicion, which led them to the bug).
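One quick way to confirm that suspicion is to resolve both hosts from the balancer's "namenodes = [...]" line and compare the resulting addresses; this is a standalone diagnostic sketch, not Hadoop code:

import java.net.InetAddress;
import java.net.URI;

public class SameNameNodeCheck {
  public static void main(String[] args) throws Exception {
    // The two namenode URIs printed by the balancer above.
    URI a = URI.create("hdfs://128.100.23.4:54310");
    URI b = URI.create("hdfs://baker221.eecg.utoronto.ca:54310");
    String ipA = InetAddress.getByName(a.getHost()).getHostAddress();
    String ipB = InetAddress.getByName(b.getHost()).getHostAddress();
    // If both resolve to the same address, the "two namenodes" are one machine.
    System.out.println(ipA + " vs " + ipB + " -> same NN? " + ipA.equals(ipB));
  }
}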

4. Root cause

When DNS maps two namenode hostnames to the same IP, the balancer treats them as two distinct NameNodes and runs against the same node twice. The first run deletes /system/balancer.id, which the second run still needs, so the balancer fails.

4.1 Category:

Incorrect exception handling --- the case of two configured namenodes resolving to the same IP was not handled.

5. Fix

5.1 How?

The fix:

https://issues.apache.org/jira/secure/attachment/12569389/HADOOP-9172.patch

Before the client sends balancer requests to the configured NameNodes, it first resolves all the hostnames and maps them to unique IP addresses, so that each physical NameNode is contacted only once. The fix also avoids overwriting an existing /system/balancer.id file.
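Conceptually, the de-duplication part boils down to something like the sketch below; this is a simplified illustration of the idea (names and structure are mine, not the actual patch code):

import java.net.InetSocketAddress;
import java.net.URI;
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

public class DedupNameNodes {
  // Keep only one URI per resolved address, so the balancer never runs twice
  // against the same physical NameNode.
  static Collection<URI> dedupByAddress(Collection<URI> namenodes) {
    Map<String, URI> unique = new HashMap<String, URI>();
    for (URI nn : namenodes) {
      InetSocketAddress addr = new InetSocketAddress(nn.getHost(), nn.getPort());
      String key = addr.isUnresolved()
          ? nn.toString()                                  // could not resolve; keep as-is
          : addr.getAddress().getHostAddress() + ":" + addr.getPort();
      unique.put(key, nn);                                 // same IP:port collapses to one entry
    }
    return unique.values();
  }
}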

Note that the same failure can also be triggered by a different setting: using IP addresses instead of hostnames everywhere in the configuration.

See the report of HDFS-4458 for more detail.