HDFS-3083 Report

https://issues.apache.org/jira/browse/HDFS-3083

1. Symptom

Early termination with Exception thrown.

In the setting of HA and Security, the use of delegation tokens results in job failures (e.g., MapReduce job failure, HDFS access failure, etc.). No HDFS jobs can complete; an InvalidToken exception is received.

Exceptions from client (from the bug reporter):

12/03/13 06:19:47 INFO mapreduce.Job:  map 0% reduce 0%
12/03/13 06:19:47 INFO mapreduce.Job: Job job_1331619043691_0002 failed with state FAILED due to: Application application_1331619043691_0002 failed 1 times due to AM Container
for appattempt_1331619043691_0002_000001 exited with  exitCode: -1000 due to: RemoteTrace:
org.apache.hadoop.security.token.SecretManager$
InvalidToken: token (HDFS_DELEGATION_TOKEN token 40 for yarn) can't be found in cache
        at org.apache.hadoop.ipc.Client.call(Client.java:1159)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:188)
            .. .. ..

.Failing
this attempt.. Failing the application.
12/03/13 06:19:47 INFO mapreduce.Job: Counters: 0
Job ended: Tue Mar 13 06:19:47 UTC 2012
The job took 3 seconds.

1.1 Severity

Critical

1.2 Was there exception thrown?

 Yes. See above and Section 3.

1.3 Are there multiple exceptions?

Yes. The failover-related exception, and the InvalidToken exception thrown from the NN to the client.

2. How to reproduce this failure

Summary:

- Need 2 machines (M1 and M2).

- Set up HA (Hadoop High Availability)

- Set up Security

- On M1: start the NNs ($ start-dfs.sh)

- On M1: $ kinit -k -t key_M1/hdfs.keytab hdfs/test

- $ hdfs haadmin -failover nn1 nn2 (so M2 becomes the active NN and M1 becomes the standby)

- Start the DataNode (hadoop-daemon.sh start datanode)

- $ hdfs fetchdt --renewer hdfs/test /tmp/token

- $ kdestroy

- $ HADOOP_TOKEN_FILE_LOCATION=/tmp/token hdfs dfs -ls /

   ---- This command will fail. Any operations accessing hdfs will fail.

Clearly, there are multiple events, and they have to occur in the right sequence.
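The two token-related steps above (fetchdt and HADOOP_TOKEN_FILE_LOCATION) amount to obtaining an HDFS delegation token from the active NN and saving it to a credentials file that later commands read instead of a Kerberos TGT. Below is a minimal, hypothetical Java sketch of that flow using standard Hadoop client APIs (it illustrates what the command does, not the fetchdt implementation itself; the renewer and path come from the steps above):

// Hypothetical sketch of what "hdfs fetchdt --renewer hdfs/test /tmp/token" amounts to.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.Credentials;
import org.apache.hadoop.security.token.Token;

public class FetchDtSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();             // cluster config (HA nameservice, security) from the classpath
    FileSystem fs = FileSystem.get(conf);                 // client-side HA proxy over nn1/nn2
    Token<?> token = fs.getDelegationToken("hdfs/test");  // delegation token, issued by the active NN
    Credentials creds = new Credentials();
    creds.addToken(token.getService(), token);
    // This file is what HADOOP_TOKEN_FILE_LOCATION=/tmp/token points at later.
    creds.writeTokenStorageFile(new Path("/tmp/token"), conf);
  }
}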

http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/latest/CDH4-Security-Guide/CDH4-Security-Guide.html

This link is an OK reference, but many of its steps are wrong. Please follow the steps below.

2.0 Version

I used hadoop-2.0.2-alpha, and manually modified the code to reverse the patch. I tried other versions but they didn't work.

2.1 Environment setting and configuration

A. Need 2 machines:

M1: Kerberos server, NFS server, NN, and DN

M2: NFS client, the other NN (for Hadoop HA)

B. Set up NFS:

Hadoop High Availability (HA) requires 2 NNs on different nodes; it also requires these 2 NNs to have access to a shared dir. Therefore we need NFS. There is a gdoc titled "Setting up NFS" that describes this.

C. Set up HA:

http://hadoop.apache.org/docs/r2.0.2-alpha/hadoop-yarn/hadoop-yarn-site/HDFSHighAvailability.html

This is a good link to follow.

Simply use the configuration files I prepared or follow the link above.

No other software needed.

Once you set up the config files, first format:

hdfs namenode -format

Run this on one machine ONLY (M1), and manually copy the local namenode dir (specified in dfs.namenode.name.dir) to the other machine (M2).

Then use the command:

$ start-dfs.sh

then:

$ hdfs haadmin -failover nn1 nn2

You should see that the failover succeeded. (Note: you should always do a failover before actually using HDFS.)

I am using 2 NNs and 1 DN in this case.

D. Set up Kerberos:

http://bobthegnome.blogspot.ca/2012/11/testing-kerberos-in-ubuntu.html

Basically just follow this post. There are several important things in the Kerberos setup:

The realm (which is a domain): I just named the realm as "TEST".

The principal (user): you can create as many users as you want.

I used kadmin.local command on M1 to manage kerberos.

2.2 Reproduction procedure

All right, once you set up the above, you should be good to go!

A. Create a principal (hdfs/haha) in Kerberos (it doesn't have to be a user on your OS!):

$ sudo kadmin.local

kadmin:  addprinc -randkey hdfs/test@TEST

kadmin:  xst -norandkey -k etc/hadoop/key/hdfs.keytab hdfs/test@TEST

The above will generate the keytab file for hdfs/test@TEST. Copy this file to both M1 and M2, and set the configuration entries

"dfs.namenode.keytab.file"

"dfs.secondary.namenode.keytab.file"

"dfs.datanode.keytab.file"

to point to the keytab files.
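As a hedged illustration, the same entries expressed as key/value pairs (in a real deployment they are XML properties in hdfs-site.xml; the keytab path and principal are the ones used in these steps, and the *.kerberos.principal keys are the usual companions of the keytab keys):

// Illustration only; these normally live in hdfs-site.xml, not in code.
Configuration conf = new Configuration();
conf.set("dfs.namenode.keytab.file", "etc/hadoop/key/hdfs.keytab");
conf.set("dfs.namenode.kerberos.principal", "hdfs/test@TEST");
conf.set("dfs.secondary.namenode.keytab.file", "etc/hadoop/key/hdfs.keytab");
conf.set("dfs.datanode.keytab.file", "etc/hadoop/key/hdfs.keytab");
conf.set("dfs.datanode.kerberos.principal", "hdfs/test@TEST");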

B. Test that the combination of HA and Security works: all of the steps below should succeed:

M1$ kinit -k -t key_M1/hdfs.keytab hdfs/test

M2$ kinit -k -t key_M2/hdfs.keytab hdfs/test

Commands below are all to be executed on M1:

M1$ klist

   --- This should print some info about your tgt info.

M1$ start-dfs.sh

M1$ hdfs haadmin -failover nn1 nn2 (so M2 becomes the active NN and M1 becomes the standby)

M1$ sudo -s

M1$ hadoop-daemon.sh start datanode

   --- execute on M1:

M1$ hdfs dfs -ls /

$ hdfs dfs -put ./somefile /

$ hdfs dfs -ls /

  --- "somefile" should show up.

C. Reproducing the failure:

Commands below are to be executed on M1, unless otherwise stated.

$ start-dfs.sh (start the NNs)

$ kinit -k -t key_M1/hdfs.keytab hdfs/test

M2$ kinit -k -t key_M2/hdfs.keytab hdfs/test

$ hdfs haadmin -failover nn1 nn2 (so M2 becomes the active NN and M1 becomes the standby)

$ sudo -s

$ hadoop-daemon.sh start datanode

$ hdfs fetchdt --renewer hdfs/haha /tmp/token

$ kdestroy

$ HADOOP_TOKEN_FILE_LOCATION=/tmp/token hdfs dfs -ls /

 --- It should fail with:

13/07/04 11:52:32 ERROR security.UserGroupInformation: PriviledgedActionException as:ding (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache

13/07/04 11:52:32 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache

without showing the ls result. The correct behavior is to show the ls result (though the client log will also show some ERROR msgs).

To stop:

$ stop-dfs.sh

$ sudo -s

$ hadoop-daemon.sh stop datanode

2.2.1 Timing order

This timing order is externally visible.

2.3 Can the logs tell how to reproduce the failure?

Yes!

2.4 How many machines needed?

Two machines.

3. Diagnosis procedure

3.1 Detailed symptom

When the client executes:

M1:~/research/hadoop/hadoop-2.0.2-alpha-src/hadoop-dist/target/hadoop-2.0.2-alpha/etc/hadoop$ HADOOP_TOKEN_FILE_LOCATION=/tmp/token hdfs dfs -ls /

13/07/04 11:52:12 INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 3 for hdfs on ha-hdfs:mycluster

13/07/04 11:52:32 ERROR security.UserGroupInformation: PriviledgedActionException as:ding (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache

13/07/04 11:52:32 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache

13/07/04 11:52:32 ERROR security.UserGroupInformation: PriviledgedActionException as:ding (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache

It does not show the expected output (it should show the ls result).

Correct behavior:

HADOOP_TOKEN_FILE_LOCATION=/tmp/token hdfs dfs -ls /

13/07/04 08:43:43 ERROR security.UserGroupInformation: PriviledgedActionException as:ding (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby

13/07/04 08:43:43 WARN ipc.Client: Exception encountered while connecting to the server : org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby

13/07/04 08:43:43 ERROR security.UserGroupInformation: PriviledgedActionException as:ding (auth:KERBEROS) cause:org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.ipc.StandbyException): Operation category READ is not supported in state standby

3.2 Backward inference

In fact, the diagnosis of this failure is not too complicated.

A. From the client log, we can infer that it received a "RemoteException: SecretManager$InvalidToken".

B. By looking at the NN's log, we can find the corresponding exception (it was sent from the NN).

2013-07-04 11:52:32,221 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for 128.100.23.4:60160:null

2013-07-04 11:52:32,222 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8020: readAndProcess threw exception javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache] from client 128.100.23.4. Count of bytes read: 0

javax.security.sasl.SaslException: DIGEST-MD5: IO error acquiring password [Caused by org.apache.hadoop.security.token.SecretManager$InvalidToken: token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache]

        at com.sun.security.sasl.digest.DigestMD5Server.validateClientResponse(DigestMD5Server.java:577)

        at com.sun.security.sasl.digest.DigestMD5Server.evaluateResponse(DigestMD5Server.java:226)

        at org.apache.hadoop.ipc.Server$Connection.saslReadAndProcess(Server.java:1203)

        at org.apache.hadoop.ipc.Server$Connection.readAndProcess(Server.java:1397)

        at org.apache.hadoop.ipc.Server$Listener.doRead(Server.java:712)

        .. .. ..

Note this is from M1--- the standby NN.

If we look at the active NN (M2):

2013-07-04 17:10:36,612 DEBUG org.apache.hadoop.ipc.Server: Have read input token of size 75 for processing by saslServer.unwrap()

2013-07-04 17:10:36,613 INFO SecurityLogger.org.apache.hadoop.security.authorize.ServiceAuthorizationManager: Authorization successful for hdfs/haha@ (auth:KERBEROS) for protocol=interface org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol

2013-07-04 17:10:36,614 DEBUG org.apache.hadoop.ipc.Server: Successfully authorized userInfo {

  effectiveUser: "hdfs/haha@"

}

protocol: "org.apache.hadoop.hdfs.server.protocol.DatanodeProtocol"

  --- the login seems successful!

C. Now there are two paths for diagnosis: 1st, why did the NN throw this exception? 2nd, why did this exception result in client termination?

Let's start with the 2nd.

Why did this exception result in client termination?

By only looking at the client code, it is hard to know where the exception is received. The exception was thrown by the NN, so we cannot find any specific text in the client code. But a programmer familiar with the client code should be able to reason about how the client handles the exception.

If we enable DEBUG logging on the client, we can clearly see how the client handles the exception via a call stack:

13/07/04 11:52:32 DEBUG ipc.Client: closing ipc connection to baker221.eecg.utoronto.ca/128.100.23.4:8020: token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache

org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.security.token.SecretManager$InvalidToken): token (HDFS_DELEGATION_TOKEN token 3 for hdfs) can't be found in cache

        at org.apache.hadoop.security.SaslRpcClient.readStatus(SaslRpcClient.java:111)

        at org.apache.hadoop.security.SaslRpcClient.saslConnect(SaslRpcClient.java:173)

        at org.apache.hadoop.ipc.Client$Connection.setupSaslConnection(Client.java:425)

        at org.apache.hadoop.ipc.Client$Connection.access$1300(Client.java:220)

        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:592)

        at org.apache.hadoop.ipc.Client$Connection$2.run(Client.java:589)

        at java.security.AccessController.doPrivileged(Native Method)

        at javax.security.auth.Subject.doAs(Subject.java:396)

        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1367)

        at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:588)

        at org.apache.hadoop.ipc.Client$Connection.access$1700(Client.java:220)

        at org.apache.hadoop.ipc.Client.getConnection(Client.java:1217)

        at org.apache.hadoop.ipc.Client.call(Client.java:1144)

        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:202)

        at com.sun.proxy.$Proxy9.getFileInfo(Unknown Source)

        at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.getFileInfo(ClientNamenodeProtocolTranslatorPB.java:608)

        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)

        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)

        at java.lang.reflect.Method.invoke(Method.java:597)

        at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:164)

        at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:83)

        at com.sun.proxy.$Proxy10.getFileInfo(Unknown Source)

        .. .. ..

If we look at the source of invoke() in RetryInvocationHandler.java:

  public Object invoke(Object proxy, Method method, Object[] args)

    throws Throwable {

    while (true) {

      // The number of times this invocation handler has ever been failed over,

      // before this method invocation attempt. Used to prevent concurrent

      // failed method invocations from triggering multiple failover attempts.

      long invocationAttemptFailoverCount;

      synchronized (proxyProvider) {

        invocationAttemptFailoverCount = proxyProviderFailoverCount;

      }

      try {

        Object ret = invokeMethod(method, args);

        hasMadeASuccessfulCall = true;

        return ret;

      } catch (Exception e) {

        LOG.debug("Ding: REtryInvocationHandler.java: invoke got exception", e);

        boolean isMethodIdempotent = proxyProvider.getInterface()

            .getMethod(method.getName(), method.getParameterTypes())

            .isAnnotationPresent(Idempotent.class);

        RetryAction action = policy.shouldRetry(e, retries++, invocationFailoverCount,

            isMethodIdempotent);

          // <--- For this particular exception (InvalidToken), the client

          // decides it should NOT retry (the retry decision is FAIL), and the

          // reason is null, so nothing is logged before the exception is rethrown below.

        if (action.action == RetryAction.RetryDecision.FAIL) {

          if (action.reason != null) {

            LOG.warn("Exception while invoking " +

                currentProxy.getClass() + "." + method.getName() +

                ". Not retrying because " + action.reason, e);

          }

          throw e;

        } else { // retry or failover

          // avoid logging the failover if this is the first call on this

          // proxy object, and we successfully achieve the failover without

          // any flip-flopping

          boolean worthLogging =

            !(invocationFailoverCount == 0 && !hasMadeASuccessfulCall);

          worthLogging |= LOG.isDebugEnabled();

          if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY &&

              worthLogging) {

            String msg = "Exception while invoking " + method.getName()

              + " of class " + currentProxy.getClass().getSimpleName();

            if (invocationFailoverCount > 0) {

              msg += " after " + invocationFailoverCount + " fail over attempts";

            }

            msg += ". Trying to fail over " + formatSleepMessage(action.delayMillis);

            if (LOG.isDebugEnabled()) {

              LOG.debug(msg, e);

            } else {

              LOG.warn(msg);

            }

          } else {

            if(LOG.isDebugEnabled()) {

              LOG.debug("Exception while invoking " + method.getName()

                  + " of class " + currentProxy.getClass().getSimpleName() +

                  ". Retrying " + formatSleepMessage(action.delayMillis), e);

            }

          }

          if (action.delayMillis > 0) {

            LOG.debug("Ding: RetryInvocationHandler.java: sleepATLeastIgnoreInterrupt");

            ThreadUtil.sleepAtLeastIgnoreInterrupts(action.delayMillis);

          }

          if (action.action == RetryAction.RetryDecision.FAILOVER_AND_RETRY) {

            // Make sure that concurrent failed method invocations only cause a

            // single actual fail over.

            synchronized (proxyProvider) {

              if (invocationAttemptFailoverCount == proxyProviderFailoverCount) {

                proxyProvider.performFailover(currentProxy);

                proxyProviderFailoverCount++;

                currentProxy = proxyProvider.getProxy();

              } else {

                LOG.warn("A failover has occurred since the start of this method"

                    + " invocation attempt.");

              }

            }

            invocationFailoverCount++;

          }

        }

      }

    }

  }
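To make the decision above concrete, here is a simplified, hypothetical sketch of what the configured failover retry policy decides for the two exceptions we have seen (loosely modeled on RetryPolicies.failoverOnNetworkException; the class below is illustrative, not the actual Hadoop implementation): a StandbyException leads to FAILOVER_AND_RETRY, while anything else (including the RemoteException wrapping SecretManager$InvalidToken) ends up as FAIL with a null reason, so invoke() simply rethrows it.

// Simplified, hypothetical sketch of the failover retry policy's decision.
// RetryAction, StandbyException, RemoteException, and ConnectException are real
// Hadoop/JDK types; the class and method here are illustrative only.
import java.net.ConnectException;

import org.apache.hadoop.io.retry.RetryPolicy.RetryAction;
import org.apache.hadoop.ipc.RemoteException;
import org.apache.hadoop.ipc.StandbyException;

class FailoverDecisionSketch {
  RetryAction shouldRetry(Exception e, int retries, int failovers, boolean isIdempotent) {
    // Unwrap a RemoteException so we can see what the NN actually threw.
    Exception cause = (e instanceof RemoteException)
        ? ((RemoteException) e).unwrapRemoteException(StandbyException.class)
        : e;
    if (cause instanceof StandbyException || cause instanceof ConnectException) {
      // The NN we contacted is standby (or unreachable): fail over to the
      // other NN and retry the call. This is what happens in the "correct
      // behavior" log in Section 3.1.
      return RetryAction.FAILOVER_AND_RETRY;
    }
    // Anything else, e.g. SecretManager$InvalidToken wrapped in a
    // RemoteException, falls through to FAIL with a null reason, so
    // invoke() rethrows the exception and the command terminates.
    return RetryAction.FAIL;
  }
}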

Now, let's look at the NN's code to see why this InvalidToken is thrown:

    private void saslReadAndProcess(byte[] saslToken) throws IOException,

        InterruptedException {

      if (!saslContextEstablished) {

        byte[] replyToken = null;

        try {

          if (saslServer == null) {

            switch (authMethod) {

            case DIGEST:

              if (secretManager == null) {

                throw new AccessControlException(

                    "Server is not configured to do DIGEST authentication.");

              }

              /* secretManager.checkAvailableForRead();  */ // Ding

              saslServer = Sasl.createSaslServer(AuthMethod.DIGEST

                  .getMechanismName(), null, SaslRpcServer.SASL_DEFAULT_REALM,

                  SaslRpcServer.SASL_PROPS, new SaslDigestCallbackHandler(

                      secretManager, this));

              break;

            default:

              .. ..

        }

          if (LOG.isDebugEnabled())

            LOG.debug("Have read input token of size " + saslToken.length

                + " for processing by saslServer.evaluateResponse()");

          replyToken = saslServer.evaluateResponse(saslToken);

        } catch (IOException e) {

          IOException sendToClient = e;

          Throwable cause = e;

          while (cause != null) {

            if (cause instanceof InvalidToken) {

              sendToClient = (InvalidToken) cause;

              break;

            }

            cause = cause.getCause();

          }

          doSaslReply(SaslStatus.ERROR, null, sendToClient.getClass().getName(),

              sendToClient.getLocalizedMessage());

          rpcMetrics.incrAuthenticationFailures();

          String clientIP = this.toString();

          // attempting user could be null

          AUDITLOG.warn(AUTH_FAILED_FOR + clientIP + ":" + attemptingUser);

          throw e;

        }

      .. ..

So saslServer.evaluateResponse() (via the SaslDigestCallbackHandler looking up the token) throws an InvalidToken exception, and we fall into the "catch" block above, which sends the exception back to the client.

If we really think about it, the fact that the standby NN throws an InvalidToken exception is not surprising: this token was generated by the active NN (M2), and cannot be verified by the standby NN (M1).

4. Root cause

Here is the sequence of events that are relevant:

A). Client got a delegation token from the NNs (active NN generated this token)

B). Client used this token (and not a TGT, because we ran kdestroy) to run an hdfs command (-ls).

C). The active NN (M2) accepted this token; however, the standby NN (M1) rejected it (because it was generated by M2) and threw InvalidToken back to the client.

D). The client handles InvalidToken in a way that terminates the execution immediately, without retrying.

E). The fix: let M1 throw an exception other than InvalidToken, so the client will retry (fail over) on that exception!
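Concretely, the fix presumably amounts to restoring the availability check that was commented out (the line marked "// Ding") in the saslReadAndProcess() excerpt in Section 3.2. A hedged sketch of the patched DIGEST branch, mirroring that excerpt (not necessarily the exact committed patch):

            case DIGEST:
              if (secretManager == null) {
                throw new AccessControlException(
                    "Server is not configured to do DIGEST authentication.");
              }
              // Restored check: on a standby NN this throws StandbyException
              // (an IOException) *before* SASL tries to look up the token, so
              // the client receives a retriable StandbyException instead of
              // InvalidToken, and its retry policy fails over to the active NN.
              secretManager.checkAvailableForRead();
              saslServer = Sasl.createSaslServer(AuthMethod.DIGEST
                  .getMechanismName(), null, SaslRpcServer.SASL_DEFAULT_REALM,
                  SaslRpcServer.SASL_PROPS, new SaslDigestCallbackHandler(
                      secretManager, this));
              break;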

4.1 Category:

Incorrect exception handling. This is a tough one to test. Statement coverage won't work. You have to anticipate the error scenario to test it.