du -x

Version:

coreutils-6.6 (fixed in 6.7)

How is it diagnosed (reproduced or source analysis)?

Both: reproduced, and analyzed in the source.

How to reproduce?

$ mkdir d1
$ mkdir d2
$ touch d1/temp
$ coreutils-6.6/src/du -x ./d2 ./d1

Background:

What is du?

du (disk usage) estimates file space usage.

What is du -x?

Skip directories that are on a different file system from the one the argument being processed is on (long form: --one-file-system).

Symptom:

Incorrect results.

du handles only the first directory correctly and treats the second one (d1) as if it were an empty directory.

The incorrect output:

4       ./d2
4       ./d1

The correct output should be:

4       ./d2
8       ./d1

Root cause:

du (more precisely, the fts library code it uses) forgot to assign ‘p->fts_statp->st_dev’ to ‘sp->fts_dev’ when starting a new command-line argument. This incorrect data flow affects control flow: it eventually causes du to conclude that all of d1's sub-directories were visited right after it first enters d1.

Understanding the bug requires understanding FTS (file system traversal), the interface du uses to walk directory trees. The fts(3) man page provides useful background:

http://linux.die.net/man/3/fts
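To see the pre-order/post-order protocol from the man page in action, the sketch below replays the reproduction with a system fts such as glibc's <fts.h>. FTS_PHYSICAL | FTS_XDEV only roughly matches what du -x requests, and the helper names make_repro_tree and record_visits, as well as the /tmp template, are assumptions for illustration. With a correct fts, walking ./d2 ./d1 should yield five entries: FTS_D and FTS_DP for d2, then FTS_D for d1, FTS_F for d1/temp, and FTS_DP for d1.

```c
#define _DEFAULT_SOURCE   /* mkdtemp, fts under strict -std modes */
#include <errno.h>
#include <fts.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Hypothetical helper: recreate the layout from "How to reproduce"
   (d1/, d2/ and the empty file d1/temp) under a fresh directory made
   from the mkdtemp template `base`.  Returns 0 on success. */
static int make_repro_tree(char *base, char *d1, char *d2, size_t len)
{
    char temp[4096];
    FILE *f;

    if (mkdtemp(base) == NULL)
        return -1;
    snprintf(d1, len, "%s/d1", base);
    snprintf(d2, len, "%s/d2", base);
    snprintf(temp, sizeof temp, "%s/temp", d1);
    if (mkdir(d1, 0755) != 0 || mkdir(d2, 0755) != 0)
        return -1;
    if ((f = fopen(temp, "w")) == NULL)
        return -1;
    return fclose(f);
}

/* Walks the NULL-terminated root list (physical walk, FTS_XDEV), prints
   one line per entry fts_read returns and stores its fts_info in
   infos[] (up to max).  Returns the number of entries, or -1. */
static int record_visits(char *const roots[], int infos[], size_t max)
{
    FTS *fts = fts_open(roots, FTS_PHYSICAL | FTS_XDEV, NULL);
    FTSENT *ent;
    size_t n = 0;

    if (fts == NULL)
        return -1;
    while ((ent = fts_read(fts)) != NULL) {
        const char *kind = ent->fts_info == FTS_D  ? "FTS_D  (pre-order)"
                         : ent->fts_info == FTS_DP ? "FTS_DP (post-order)"
                         : ent->fts_info == FTS_F  ? "FTS_F"
                         : "other";
        printf("%-20s level %d  %s\n", kind, (int) ent->fts_level,
               ent->fts_path);
        if (n < max)
            infos[n] = ent->fts_info;
        n++;
    }
    /* fts_read sets errno to 0 at the normal end of the walk; this is
       how a caller tells end-of-traversal from an error. */
    int err = errno;
    fts_close(fts);
    return err == 0 ? (int) n : -1;
}
```

Calling record_visits with roots {d2, d1, NULL} after make_repro_tree prints the five entries in order. The buggy 6.6 build, by contrast, produced only four entries in the du_files loop, because d1 came back as FTS_DP without its child ever being read.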

Our own comments are mixed in with the original source comments in the code excerpts below. This bug is quite complicated.

/* du_files is the top-level function doing the job. */
static bool du_files (char **files, int bit_flags)
{
  .. ...
  FTS *fts = xfts_open (files, bit_flags, NULL);

  /* This loop visits the hierarchies named on the command line, one
     entry (FTSENT) per iteration.  In our input there were 4
     iterations; the bug occurred during the 3rd. */
  while (1) {
    FTSENT *ent;

    /* Each fts_read call returns the next entry of the traversal. */
    ent = fts_read (fts);
    if (ent == NULL) {
      if (errno != 0) {
        /* FIXME: try to give a better message  */
        error (0, errno, _("fts_read failed"));
        ok = false;
      }
      break;
    }

    FTS_CROSS_CHECK (fts);

    /* process_file actually collects the du info and prints it. */
    ok &= process_file (fts, ent);
  } // while (1)

  /* Ignore failure, since the only way it can do so is in failing to
     return to the original directory, and since we're about to exit,
     that doesn't matter.  */
  fts_close (fts);
}
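The effect on the printed sizes can be reproduced without a file system by replaying entry streams through a du_files-style loop. This is a hypothetical model (struct visit and replay_du are illustrative names), and its summing is only in the spirit of du's per-level accumulation. Sizes are in KiB, and the one entry inside d1 is assumed to occupy 4 KiB, which is what the expected "8 ./d1" implies:

```c
#include <assert.h>
#include <stddef.h>
#include <fts.h>

/* One entry as the du_files loop receives it from fts_read: its kind,
   depth and own size in KiB.  A hypothetical, simplified record. */
struct visit {
    int info;
    int level;
    long kib;
};

/* Replays a stream of visits.  A pre-order FTS_D visit starts a fresh
   sum for the directory's children, other entries add their size to the
   enclosing directory, and the post-order FTS_DP visit adds the
   children's sum to the directory's own size.  totals[] receives one
   value per level-0 root, in order; returns the number of roots. */
static size_t replay_du(const struct visit *v, size_t n, long totals[])
{
    long child_sum[64] = { 0 };   /* running sum for entries at depth k */
    size_t roots = 0;

    for (size_t i = 0; i < n; i++) {
        int lvl = v[i].level;
        long size = v[i].kib;

        assert(lvl >= 0 && lvl < 63);
        if (v[i].info == FTS_D) {
            child_sum[lvl + 1] = 0;      /* pre-order: nothing counted yet */
            continue;
        }
        if (v[i].info == FTS_DP)
            size += child_sum[lvl + 1];  /* post-order: include contents */
        if (lvl == 0)
            totals[roots++] = size;      /* the line du would print */
        else
            child_sum[lvl] += size;      /* report to the parent directory */
    }
    return roots;
}
```

Fed the five-entry stream a correct fts produces, replay_du yields 4 and 8. Fed the four-entry stream from the buggy build, where d1 arrives as FTS_DP straight after its FTS_D, nothing inside d1 is ever added and both totals are 4.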

/* fts_read is called once per iteration.  During the 3rd iteration of
   the while loop above, sp->fts_path == "d1", and in the buggy version
   sp->fts_dev is left at 0.  Only the code relevant to the 3rd
   iteration (the bug) is shown here. */
FTSENT *fts_read (register FTS *sp) {
  if (p->fts_info == FTS_D) {
    /* Here, sp->fts_dev is 0.  The patch sets it to
       p->fts_statp->st_dev, which is 19.  The logic: when the
       directory is a command-line root, sp->fts_dev must be set to
       p->fts_statp->st_dev, and this is the assignment the buggy
       version omits. */
+   if (p->fts_level == FTS_ROOTLEVEL)
+     sp->fts_dev = p->fts_statp->st_dev;
    .. ..
  }
  return p;
}
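The patched assignment can be modeled in isolation. A minimal sketch, assuming a stripped-down struct that keeps only fts_dev (fts_model and on_pre_order_dir are hypothetical names); 19 is the device number observed in this report:

```c
#include <stdbool.h>
#include <sys/types.h>
#include <fts.h>

/* Hypothetical stand-in for FTS: only the recorded device matters here. */
struct fts_model {
    dev_t fts_dev;
};

/* Models fts_read handing back a pre-order directory (FTS_D) at depth
   `level` whose st_dev is `dev`.  patched == true performs the
   assignment added by the fix; false leaves fts_dev untouched, as in
   the buggy version. */
static void on_pre_order_dir(struct fts_model *sp, int level, dev_t dev,
                             bool patched)
{
    if (patched && level == FTS_ROOTLEVEL)
        sp->fts_dev = dev;   /* remember the root's file system */
}
```

Only command-line roots update the recorded device; deeper directories are compared against it, never copied into it.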

/* Now the relevant code of fts_read during the 4th iteration of the
   while loop in du_files.  This iteration is the failure point (it
   leads to the wrong size being printed).  Recall that after the
   previous iteration sp->fts_dev is 0, whereas in the fixed version it
   is 19.  sp->fts_path == "d1" */
FTSENT *fts_read (register FTS *sp) {
  /* Directory in pre-order. */
  if (p->fts_info == FTS_D) {
    /* Here, in the buggy execution, p->fts_statp->st_dev == 19 and
       sp->fts_dev == 0, so the if condition below evaluates to true. */
    if (instr == FTS_SKIP ||
        (ISSET(FTS_XDEV) && p->fts_statp->st_dev != sp->fts_dev)) {
      ...
      /* Below, p->fts_info is set to FTS_DP (post-order directory).
       * This tells the caller that p, which corresponds to 'd1', is a
       * directory that has already been visited, so its entries are
       * never read.  Later, process_file prints "4 d1". */
      p->fts_info = FTS_DP;
      LEAVE_DIR (sp, p, "1");
      return (p);
    }
  }
}
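The decision in this excerpt can be written as a small pure function. This is a simplified model, not fts's actual code (pre_order_result is a hypothetical name): it returns FTS_DP when fts_read would turn the entry into a post-order visit, and FTS_D when fts would go on to read the directory's entries.

```c
#include <stdbool.h>
#include <sys/types.h>
#include <fts.h>

/* Simplified model of the pre-order check above.  skip_requested stands
   for instr == FTS_SKIP, xdev for ISSET(FTS_XDEV). */
static int pre_order_result(bool skip_requested, bool xdev,
                            dev_t entry_dev, dev_t recorded_dev)
{
    if (skip_requested || (xdev && entry_dev != recorded_dev))
        return FTS_DP;   /* reported as finished; entries never read */
    return FTS_D;        /* not skipped; fts reads the directory next */
}
```

With the stale sp->fts_dev of 0, pre_order_result(false, true, 19, 0) yields FTS_DP; with the fixed value 19 it yields FTS_D. Without -x (xdev false) the stale value is never consulted, which is consistent with the bug appearing only under du -x.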

/* During the 4th iteration, process_file prints "4 d1", which is
   incorrect.  This is because its ent->fts_info is set to FTS_DP, the
   post-order marker indicating the directory has already been visited. */
static bool process_file (FTS *fts, FTSENT *ent) {
  … …
  switch (ent->fts_info) {
    case FTS_NS:
      .. ..
    case FTS_ERR:
      .. ..
    .. ..
    default:
      /* Here, in the last iteration, ent->fts_info is FTS_DP, which
         falls into the default case.  This is key to diagnosing this
         failure! */
      ok = true;
      break;
  }

  /* If this is the first (pre-order) encounter with a directory,
     or if it's the second encounter for a skipped directory, then
     return right away.  */
  /* Since ent->fts_info was set to FTS_DP, the function does not
     return here.  FTS_D marks a pre-order directory, i.e. one
     encountered for the first time.  FTS_DP marks a directory that
     has already been visited, so du does not enter it again. */
  if (ent->fts_info == FTS_D)
    return ok;

  /* #define IS_DIR_TYPE(Type)  ((Type) == FTS_DP || (Type) == FTS_DNR)
   * So this condition evaluates to true, and size 4 is printed without
   * du ever entering d1. */
  if ((IS_DIR_TYPE (ent->fts_info) && level <= max_depth)
      || ((opt_all && level <= max_depth) || level == 0))
    /* print_size prints "4 d1". */
    print_size (&dui_to_print, file);
}
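The printing decision at the end of process_file can be condensed into a predicate. A sketch, assuming level and max_depth are plain size_t values (prints_size is a hypothetical name). It shows that process_file cannot tell a genuine post-order visit from one produced by the xdev skip, since both arrive as FTS_DP:

```c
#include <stdbool.h>
#include <stddef.h>
#include <fts.h>

/* The macro quoted in the excerpt above. */
#define IS_DIR_TYPE(Type) ((Type) == FTS_DP || (Type) == FTS_DNR)

/* Hypothetical condensation of the tail of process_file: true if the
   entry's size would be printed.  Only fts_info, depth and options are
   consulted, so a skipped directory prints like a traversed one. */
static bool prints_size(int info, size_t level, size_t max_depth,
                        bool opt_all)
{
    if (info == FTS_D)
        return false;   /* pre-order visit: return right away */
    return (IS_DIR_TYPE(info) && level <= max_depth)
           || (opt_all && level <= max_depth) || level == 0;
}
```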

Is there an error message?

No

Can Errlog/developers anticipate the error with an error message?

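Yes. The pattern is default-switch: the FTS_DP entry for d1 falls into the default case of the switch in process_file, where nothing is logged.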
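A sketch of the kind of message the default-switch pattern would add. The message format, the helper name check_entry, and the choice of which fts_info values count as expected are assumptions for illustration, not Errlog's actual output or du's actual case list:

```c
#include <stdio.h>
#include <fts.h>

/* Hypothetical sketch: returns 1 and writes a diagnostic into buf when
   `info` reaches the default branch, 0 otherwise (buf left empty).
   len must be at least 1. */
static int check_entry(int info, const char *path, char *buf, size_t len)
{
    buf[0] = '\0';
    switch (info) {
    case FTS_NS:
    case FTS_ERR:
    case FTS_DNR:
        return 0;   /* error kinds: reported by du's own messages */
    case FTS_D:
    case FTS_F:
        return 0;   /* treated as expected in this sketch */
    default:
        /* The added log line: name the value and the file, so a
           directory that arrives as FTS_DP leaves a trace. */
        snprintf(buf, len, "du: fts_info %d reached default for '%s'",
                 info, path);
        return 1;
    }
}
```

In this sketch every directory's post-order visit takes the default branch, including d2's normal one, so such a message would in practice need to be cheap or gated behind a verbosity level.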