HDFS-3275 Report (Same as 3287)

1. Symptom

Format command overwrites contents of non-empty shared edits directory if name directories are empty without any prompting

1.1 Severity

Major

1.2 Was there exception thrown?

No

2. How to reproduce this failure

2.0 Version

2.0.0-alpha (however, I changed the code of version 2.0.2-alpha to reproduce this failure)

2.1 Environment setting and configuration

I used two physical machines, in both machine I enabled the HA feature.

In core-site.xml file

<configuration>

 

  <property>

        <name>fs.defaultFS</name>

        <value>hdfs://mycluster</value>

  </property>

 

</configuration>

 

In hdfs-site.xml file

<configuration>

  <property>

        <name>dfs.nameservices</name>

        <value>mycluster</value>

  </property>

 

 

  <property>

    <name>dfs.ha.namenodes.mycluster</name>

        <value>nn1,nn2</value>

  </property>

 

  <property>

        <name>dfs.namenode.rpc-address.mycluster.nn1</name>

    <value>dlrz.eecg.utoronto.ca:8020</value>

  </property>

 

 

  <property>

    <name>dfs.namenode.http-address.mycluster.nn1</name>

    <value>dlrz.eecg.utoronto.ca:50070</value>

  </property>

 

  <property>

        <name>dfs.namenode.rpc-address.mycluster.nn2</name>

    <value>baker221.eecg.utoronto.ca:8020</value>

  </property>

 

  <property>

    <name>dfs.namenode.http-address.mycluster.nn2</name>

    <value>baker221.eecg.utoronto.ca:50070</value>

  </property>

 

  <property>

    <name>dfs.namenode.shared.edits.dir</name>

    <value>file:///home/zhuang/shared</value>

  </property>

  <property>

    <name>dfs.client.failover.proxy.provider.mycluster</name>

    <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>

  </property>

 

  <property>

    <name>dfs.ha.fencing.methods</name>

        <value>shell(/bin/true)</value>

  </property>

 

  <property>

    <name>dfs.datanode.data.dir</name>

    <value>file:/home/zhuang/data</value>

        <final>true</final>

  </property>

 

  <property>

    <name>dfs.namenode.name.dir</name>

    <value>file:/home/zhuang/name</value>

        <final>true</final>

  </property>

 

   <property>

        <name>dfs.permissions</name>

        <value>false</value>

  </property>

 

</configuration>

 

2.2 Reproduction procedure

1 Configure a NameNode with namedirs and a shared edits dir, all of which are empty.

2 Run “hdfs namenode -format.” Namedirs and shared edits dir gets populated.

3 Delete the contents of the namedirs. Leave the shared edits dir as is. Check the timestamps of the shared edits dir contents.

4 Run format again. The namedirs as well as the shared edits dir get formatted. The shared edits dir's contents have been replaced without any prompting.

2.3 Can the logs tell how to reproduce the failure?

Yes, console log.

2.4 How many machines needed?

Two machines

3. Diagnosis procedure

3.1 Detailed symptom

After first format

After second format

Terminal

First format:

zhuang@dlrz:~$ hadoop namenode -format

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

 

13/07/23 14:59:26 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = dlrz/192.168.103.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 2.0.2-alpha

STARTUP_MSG:   classpath = /home/zhuang/hadoop-2.0.2-alpha-src/hadoop-

... ... ...

STARTUP_MSG:   build = file:///home/zhuang/hadoop-2.0.2-alpha-src/hadoop-common-project/hadoop-common -r Unknown; compiled by 'zhuang' on Tue Jul 23 12:22:03 EDT 2013

************************************************************/

Formatting using clusterid: CID-e6dd2afd-40ca-410c-99e6-9038b35e24a9

13/07/23 14:59:26 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list

13/07/23 14:59:26 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000

13/07/23 14:59:26 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false

13/07/23 14:59:26 INFO blockmanagement.BlockManager: defaultReplication             = 3

13/07/23 14:59:26 INFO blockmanagement.BlockManager: maxReplication                 = 512

13/07/23 14:59:26 INFO blockmanagement.BlockManager: minReplication                 = 1

13/07/23 14:59:26 INFO blockmanagement.BlockManager: maxReplicationStreams          = 2

13/07/23 14:59:26 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false

13/07/23 14:59:26 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000

13/07/23 14:59:26 INFO blockmanagement.BlockManager: encryptDataTransfer            = false

13/07/23 14:59:26 INFO namenode.FSNamesystem: fsOwner                 = zhuang (auth:SIMPLE)

13/07/23 14:59:26 INFO namenode.FSNamesystem: supergroup              = supergroup

13/07/23 14:59:26 INFO namenode.FSNamesystem: isPermissionEnabled = false

13/07/23 14:59:26 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster

13/07/23 14:59:26 INFO namenode.FSNamesystem: HA Enabled: true

13/07/23 14:59:26 INFO namenode.FSNamesystem: Append Enabled: true

13/07/23 14:59:26 INFO namenode.NameNode: Caching file names occuring more than 10 times

13/07/23 14:59:26 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033

13/07/23 14:59:26 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0

13/07/23 14:59:26 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension         = 30000

13/07/23 14:59:26 INFO common.Storage: Storage directory /home/zhuang/name has been successfully formatted.

13/07/23 14:59:26 INFO common.Storage: Storage directory /home/zhuang/shared/shared-edits-dir has been successfully formatted.

13/07/23 14:59:26 INFO namenode.FSImage: Saving image file /home/zhuang/name/current/fsimage.ckpt_0000000000000000000 using no compression

13/07/23 14:59:27 INFO namenode.FSImage: Image file of size 121 saved in 0 seconds.

13/07/23 14:59:27 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

13/07/23 14:59:27 INFO namenode.FileJournalManager: Purging logs older than 0

13/07/23 14:59:27 INFO util.ExitUtil: Exiting with status 0

13/07/23 14:59:27 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at dlrz/192.168.103.1

************************************************************/

 

Second format

zhuang@dlrz:~$ hadoop namenode -format

DEPRECATED: Use of this script to execute hdfs command is deprecated.

Instead use the hdfs command for it.

 

13/07/23 15:03:09 INFO namenode.NameNode: STARTUP_MSG:

/************************************************************

STARTUP_MSG: Starting NameNode

STARTUP_MSG:   host = dlrz/192.168.103.1

STARTUP_MSG:   args = [-format]

STARTUP_MSG:   version = 2.0.2-alpha

STARTUP_MSG:   classpath = /home/zhuang/hadoop-2.0.2-alpha-src/hadoop-

... ... ...

STARTUP_MSG:   build = file:///home/zhuang/hadoop-2.0.2-alpha-src/hadoop-common-project/hadoop-common -r Unknown; compiled by 'zhuang' on Tue Jul 23 12:22:03 EDT 2013

************************************************************/

Formatting using clusterid: CID-a7dbce60-5e85-47b6-93de-a950500a45bf

13/07/23 15:03:09 INFO util.HostsFileReader: Refreshing hosts (include/exclude) list

13/07/23 15:03:09 INFO blockmanagement.DatanodeManager: dfs.block.invalidate.limit=1000

13/07/23 15:03:09 INFO blockmanagement.BlockManager: dfs.block.access.token.enable=false

13/07/23 15:03:09 INFO blockmanagement.BlockManager: defaultReplication             = 3

13/07/23 15:03:09 INFO blockmanagement.BlockManager: maxReplication                 = 512

13/07/23 15:03:09 INFO blockmanagement.BlockManager: minReplication                 = 1

13/07/23 15:03:09 INFO blockmanagement.BlockManager: maxReplicationStreams          = 2

13/07/23 15:03:09 INFO blockmanagement.BlockManager: shouldCheckForEnoughRacks  = false

13/07/23 15:03:09 INFO blockmanagement.BlockManager: replicationRecheckInterval = 3000

13/07/23 15:03:09 INFO blockmanagement.BlockManager: encryptDataTransfer            = false

13/07/23 15:03:09 INFO namenode.FSNamesystem: fsOwner                 = zhuang (auth:SIMPLE)

13/07/23 15:03:09 INFO namenode.FSNamesystem: supergroup              = supergroup

13/07/23 15:03:09 INFO namenode.FSNamesystem: isPermissionEnabled = false

13/07/23 15:03:09 INFO namenode.FSNamesystem: Determined nameservice ID: mycluster

13/07/23 15:03:09 INFO namenode.FSNamesystem: HA Enabled: true

13/07/23 15:03:09 INFO namenode.FSNamesystem: Append Enabled: true

13/07/23 15:03:09 INFO namenode.NameNode: Caching file names occuring more than 10 times

13/07/23 15:03:09 INFO namenode.FSNamesystem: dfs.namenode.safemode.threshold-pct = 0.9990000128746033

13/07/23 15:03:09 INFO namenode.FSNamesystem: dfs.namenode.safemode.min.datanodes = 0

13/07/23 15:03:09 INFO namenode.FSNamesystem: dfs.namenode.safemode.extension         = 30000

13/07/23 15:03:09 INFO common.Storage: Storage directory /home/zhuang/name has been successfully formatted.

13/07/23 15:03:10 INFO common.Storage: Storage directory /home/zhuang/shared/shared-edits-dir has been successfully formatted.

13/07/23 15:03:10 INFO namenode.FSImage: Saving image file /home/zhuang/name/current/fsimage.ckpt_0000000000000000000 using no compression

13/07/23 15:03:10 INFO namenode.FSImage: Image file of size 121 saved in 0 seconds.

13/07/23 15:03:10 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0

13/07/23 15:03:10 INFO namenode.FileJournalManager: Purging logs older than 0

13/07/23 15:03:10 INFO util.ExitUtil: Exiting with status 0

13/07/23 15:03:10 INFO namenode.NameNode: SHUTDOWN_MSG:

/************************************************************

SHUTDOWN_MSG: Shutting down NameNode at dlrz/192.168.103.1

************************************************************/

 

 

3.2 Backward inference

Because after the first format, I deleted the name directory, but didn't delete the share-edits-dir, so the correct behavior is that the message

“Re-format filesystem in /share-edits-dir? (Y or N)”

should be prompted.  Since this message is printed by the confirmFormat function which also checks whether the given storage directories already exist, it is possible that a bug exists in this function.

4. Root cause

The confirmFormat function is implemented to check whether the given storage directories already exist.

The original code only checks the name directories, but doesn't check the shared edits directory. Therefore, no matter the shared edits directory exits or not, as long as the name directories are empty, the formatting command can be run properly without prompting. And the contents of the shared edits directory will be overwritten.

     initializeGenericKeys(conf, nsId, namenodeId);

     checkAllowFormat(conf);

         

-    Collection<URI> dirsToFormat = FSNamesystem.getNamespaceDirs(conf);

+    Collection<URI> nameDirsToFormat = FSNamesystem.getNamespaceDirs(conf);

+        List<URI> sharedDirs = FSNamesystem.getSharedEditsDirs(conf);

+        List<URI> dirsToPrompt = new ArrayList<URI>();

+        dirsToPrompt.addAll(nameDirsToFormat);

+        dirsToPrompt.addAll(sharedDirs);

 

----both name directory and shared edit directory will be checked whether they already exist

         List<URI> editDirsToFormat =

                  FSNamesystem.getNamespaceEditsDirs(conf);

-        if (!confirmFormat(dirsToFormat, force, isInteractive)) {

+        if (!confirmFormat(dirsToPrompt, force, isInteractive)) {

           return true; // aborted

         }

 

@@ -689,7 +694,7 @@

         }

     System.out.println("Formatting using clusterid: " + clusterId);

         

-        FSImage fsImage = new FSImage(conf, dirsToFormat, editDirsToFormat);

+        FSImage fsImage = new FSImage(conf, nameDirsToFormat, editDirsToFormat);

         FSNamesystem fsn = new FSNamesystem(conf, fsImage);

     fsImage.format(fsn, clusterId);

         return false;

@@ -711,7 +716,18 @@

           boolean force, boolean interactive)

           throws IOException {

     for(Iterator<URI> it = dirsToFormat.iterator(); it.hasNext();) {

-          File curDir = new File(it.next().getPath());

+          URI dirUri = it.next();

+          if (!dirUri.getScheme().equals(NNStorage.LOCAL_URI_SCHEME)) {

+        System.err.println("Skipping format for directory \"" + dirUri

+                + "\". Can only format local directories with scheme \""

+                + NNStorage.LOCAL_URI_SCHEME + "\".");

+            continue;

+          }

+          // To validate only file based schemes are formatted

+          assert dirUri.getScheme().equals(NNStorage.LOCAL_URI_SCHEME) :

+        "formatting is not supported for " + dirUri;

+

+          File curDir = new File(dirUri.getPath());

           // Its alright for a dir not to exist, or to exist (properly accessible)

           // and be completely empty.

4.1 Category:

Semantic