[hadoop] YARN-854 Report

1. Symptom

Application submission on a secure cluster fails with a SaslException.

1.1 Severity

Blocker

1.2 Was there an exception thrown?

Yes. SaslException.

1.2.1 Were there multiple exceptions?

No

1.3 Scope of the failure

Some users

2. How to reproduce this failure

2.0 Version

2.0.5-alpha

2.1 Configuration

Security must be enabled on the cluster.
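The report does not list the exact settings that were changed. As a rough sketch, assuming a Kerberos-secured deployment that runs containers through the Linux container executor, the relevant entries might look like the ones below. The property names are standard Hadoop/YARN keys; the class `SecureClusterSettingsSketch` and its methods are hypothetical helpers that only print `<property>` elements, and keytab and principal settings are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper: prints the settings assumed here to mean "security enabled". */
final class SecureClusterSettingsSketch {

    /** Assumed settings; the original report does not spell them out. */
    static Map<String, String> secureSettings() {
        Map<String, String> settings = new LinkedHashMap<>();
        // core-site.xml: switch authentication from "simple" to Kerberos.
        settings.put("hadoop.security.authentication", "kerberos");
        settings.put("hadoop.security.authorization", "true");
        // yarn-site.xml: run containers as the submitting user.
        settings.put("yarn.nodemanager.container-executor.class",
            "org.apache.hadoop.yarn.server.nodemanager.LinuxContainerExecutor");
        return settings;
    }

    /** Renders one entry in the Hadoop configuration file format. */
    static String toPropertyXml(String name, String value) {
        return "<property>\n"
            + "  <name>" + name + "</name>\n"
            + "  <value>" + value + "</value>\n"
            + "</property>";
    }

    public static void main(String[] args) {
        for (Map.Entry<String, String> e : secureSettings().entrySet()) {
            System.out.println(toPropertyXml(e.getKey(), e.getValue()));
        }
    }
}
```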

2.2 Reproduction procedure

1. Enable security for the user (config change)

2. Submit a job (feature start)

2.2.1 Timing order

In this order

2.2.2 Event order externally controllable?

Yes

2.3 Can the logs tell how to reproduce the failure?

Yes

2.4 How many machines are needed?

2 (client + NM)

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

INFO mapreduce.Job: Job jobID failed with state FAILED due to: Application applicationID failed 2 times due to AM Container for appattemptID exited with  exitCode: -1000 due to: App initialization failed (255) with output: main : command provided 0
main : user is qa_user
javax.security.sasl.SaslException: DIGEST-MD5: digest response format violation. Mismatched response. [Caused by org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.]
        at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
        at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
        at org.apache.hadoop.yarn.ipc.RPCUtil.instantiateException(RPCUtil.java:53)
        at org.apache.hadoop.yarn.ipc.RPCUtil.unwrapAndThrowException(RPCUtil.java:104)
        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:65)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.localizeFiles(ContainerLocalizer.java:235)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.runLocalization(ContainerLocalizer.java:169)
        at org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ContainerLocalizer.main(ContainerLocalizer.java:348)
Caused by: org.apache.hadoop.ipc.RemoteException(javax.security.sasl.SaslException): DIGEST-MD5: digest response format violation. Mismatched response.
        at org.apache.hadoop.ipc.Client.call(Client.java:1298)
        at org.apache.hadoop.ipc.Client.call(Client.java:1250)
        at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:204)
        at $Proxy7.heartbeat(Unknown Source)
        at org.apache.hadoop.yarn.server.nodemanager.api.impl.pb.client.LocalizationProtocolPBClientImpl.heartbeat(LocalizationProtocolPBClientImpl.java:62)
        ... 3 more

.Failing this attempt.. Failing the application.

3.2 Backward inference

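The SaslException is raised on the localizer's heartbeat to the NM (LocalizationProtocolPBClientImpl.heartbeat, called from ContainerLocalizer.localizeFiles), so the NM rejected the credentials that the ContainerLocalizer presented over DIGEST-MD5. The root cause is in the node manager's container manager: when writing the credentials file for the localizer, it did not check whether security is enabled, and so did not add the localizer token.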

4. Root cause

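The NM did not handle the security-enabled case correctly when writing the localizer's credentials file. The fix (lines marked +) adds a localizer token to the credentials when security is enabled: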

    private void writeCredentials(Path nmPrivateCTokensPath)
        throws IOException {
      DataOutputStream tokenOut = null;
      try {
        Credentials credentials = context.getCredentials();
        FileContext lfs = getLocalFileContext(getConfig());
        tokenOut =
            lfs.create(nmPrivateCTokensPath, EnumSet.of(CREATE, OVERWRITE));
        LOG.info("Writing credentials to the nmPrivate file "
            + nmPrivateCTokensPath.toString() + ". Credentials list: ");
        if (LOG.isDebugEnabled()) {
          for (Token<? extends TokenIdentifier> tk : credentials
              .getAllTokens()) {
            LOG.debug(tk.getService() + " : " + tk.encodeToUrlString());
          }
        }
+        if (UserGroupInformation.isSecurityEnabled()) {
+          LocalizerTokenIdentifier id = secretManager.createIdentifier();
+          Token<LocalizerTokenIdentifier> localizerToken =
+              new Token<LocalizerTokenIdentifier>(id, secretManager);
+          credentials.addToken(id.getKind(), localizerToken);
+        }
        credentials.writeTokenStorageToStream(tokenOut);
      } finally {
        if (tokenOut != null) {
          tokenOut.close();
        }
      }
    }
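The effect of the added lines can be illustrated with a toy model that does not use any Hadoop classes. This is only a sketch under stated assumptions: `LocalizerTokenSketch`, `ToySecretManager`, the `"Localizer"` key, and the HMAC-based password are all hypothetical stand-ins, not the NM's actual secret manager or the real DIGEST-MD5 exchange. It shows only the shape of the bug: if the credentials handed to the localizer lack the localizer token, the secure heartbeat cannot be authenticated.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.MessageDigest;
import java.security.SecureRandom;
import java.util.HashMap;
import java.util.Map;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

/** Toy model of the NM/localizer token check; not Hadoop code. */
final class LocalizerTokenSketch {

    /** Hypothetical key under which the localizer token is stored. */
    static final String LOCALIZER_KIND = "Localizer";

    /** Stand-in for the NM-side secret manager: derives a password per identifier. */
    static final class ToySecretManager {
        private final byte[] masterKey = new byte[32];

        ToySecretManager() {
            new SecureRandom().nextBytes(masterKey);
        }

        byte[] createPassword(String identifier) {
            try {
                Mac mac = Mac.getInstance("HmacSHA256");
                mac.init(new SecretKeySpec(masterKey, "HmacSHA256"));
                return mac.doFinal(identifier.getBytes(StandardCharsets.UTF_8));
            } catch (GeneralSecurityException e) {
                throw new IllegalStateException(e);
            }
        }

        boolean verify(String identifier, byte[] password) {
            return password != null
                && MessageDigest.isEqual(createPassword(identifier), password);
        }
    }

    /** Stand-in for writeCredentials: builds the token map given to the localizer. */
    static Map<String, byte[]> writeCredentials(Map<String, byte[]> appTokens,
            ToySecretManager secretManager, boolean securityEnabled, boolean patched) {
        Map<String, byte[]> out = new HashMap<>(appTokens);
        // The patched path adds the localizer token only when security is on.
        if (patched && securityEnabled) {
            out.put(LOCALIZER_KIND, secretManager.createPassword(LOCALIZER_KIND));
        }
        return out;
    }

    /** Stand-in for the NM accepting or rejecting the localizer's heartbeat. */
    static boolean heartbeatAccepted(Map<String, byte[]> credentials,
            ToySecretManager secretManager, boolean securityEnabled) {
        if (!securityEnabled) {
            return true; // no token check in the insecure case
        }
        return secretManager.verify(LOCALIZER_KIND, credentials.get(LOCALIZER_KIND));
    }

    public static void main(String[] args) {
        ToySecretManager sm = new ToySecretManager();
        Map<String, byte[]> appTokens = new HashMap<>();
        appTokens.put("SomeAppToken", new byte[] {1, 2, 3});

        Map<String, byte[]> before = writeCredentials(appTokens, sm, true, false);
        Map<String, byte[]> after = writeCredentials(appTokens, sm, true, true);
        System.out.println("unpatched, secure: " + heartbeatAccepted(before, sm, true));
        System.out.println("patched, secure:   " + heartbeatAccepted(after, sm, true));
    }
}
```

In this model the unpatched path still works on an insecure cluster, which is consistent with the failure affecting only some users.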

4.1 Category:

Semantic.