du -x
Version:
coreutils-6.6 (fixed in 6.7)
How is it diagnosed (reproduced or source analysis)?
reproduced and source analysis
How to reproduce?
$ mkdir d1
$ mkdir d2
$ touch d1/temp
$ coreutils-6.6/src/du -x ./d2 ./d1
Background:
What is du?
Disk usage. Estimate file space usage.
What is du -x?
skip directories on different file systems
Symptom:
Incorrect results.
du fully processes only the first directory and treats the second one (d1) as if it were an empty directory.
The incorrect output:
4 ./d2
4 ./d1
The correct output should be:
4 ./d2
8 ./d1
Root cause:
du forgot to assign the value of ‘p->fts_statp->st_dev’ to ‘sp->fts_dev’ when it entered a root directory. This incorrect data flow affected the control flow and eventually caused du to think that d1’s sub-directories had all been visited right after the first time it entered d1.
Understanding the bug requires understanding FTS --- file system traversal. The man page provides a nice background:
http://linux.die.net/man/3/fts
Our own annotations are added as comments in the code below. This bug is quite complicated.
/* du_files is the top-level function doing the job. */
static bool du_files (char **files, int bit_flags)
{
.. ...
FTS *fts = xfts_open (files, bit_flags, NULL);
/* This loop is to iterate over all the directory hierarchies.
In our input, there were 4 iterations. The bug
occurred during the 3rd iteration. */
while (1) {
FTSENT *ent;
/* Each fts_read call returns the next entry of the directory hierarchy being traversed. */
ent = fts_read (fts);
if (ent == NULL) {
if (errno != 0) {
/* FIXME: try to give a better message */
error (0, errno, _("fts_read failed"));
ok = false;
}
break;
}
FTS_CROSS_CHECK (fts);
/* process_file is to actually collect the du info and print. */
ok &= process_file (fts, ent);
} // while (1)
/* Ignore failure, since the only way it can do so is in failing to
return to the original directory, and since we're about to exit,
that doesn't matter. */
fts_close (fts);
}
/* fts_read is called in each iteration. During the 3rd iteration of the
while loop above,
sp->fts_dev is 0 in the buggy version. sp->fts_path == “d1”.
Here we show only the code relevant to the 3rd iteration (the bug). */
fts_read (register FTS *sp) {
if (p->fts_info == FTS_D) {
/* Here, sp->fts_dev will be 0. The patch forces it to
be p->fts_statp->st_dev, which is 19. The logic is: if the
directory is a root, we need to set sp->fts_dev
to ‘p->fts_statp->st_dev’ --- an operation they forgot to do. */
+ if (p->fts_level == FTS_ROOTLEVEL)
+ sp->fts_dev = p->fts_statp->st_dev;
.. ..
}
return p;
}
/* Now we show the relevant code of ‘fts_read’ during
the 4th iteration of the while loop in ‘du_files’.
This iteration is the failure point (it prints the wrong size).
Recall that after the previous iteration, sp->fts_dev
is 0, whereas in the fixed version it is 19.
sp->fts_path == “d1” */
fts_read (register FTS *sp) {
/* Directory in pre-order. */
if (p->fts_info == FTS_D) {
/* Here, in the buggy execution, p->fts_statp->st_dev == 19,
sp->fts_dev == 0, so the if condition below is evaluated to true.
*/
if (instr == FTS_SKIP ||
(ISSET(FTS_XDEV) && p->fts_statp->st_dev != sp->fts_dev)) {
...
/* Below, p->fts_info is set to FTS_DP.
* FTS_DP --- postorder directory
* It means that p, which corresponds to ‘d1’, is a directory whose
* contents have already been visited. So later process_file prints
* “4 d1”. */
p->fts_info = FTS_DP;
LEAVE_DIR (sp, p, "1");
return (p);
}
}
}
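The interaction between the missing assignment and the FTS_XDEV test can be modeled in isolation. The following is a minimal sketch, not gnulib's code: struct walk_state, enter_root and treated_as_other_fs are hypothetical names, recorded_dev stands in for sp->fts_dev, and the device number 19 is the value observed in this report.

```c
#include <stdbool.h>
#include <sys/types.h>

/* Hypothetical stand-in for the traversal state; recorded_dev models
   sp->fts_dev and xdev models ISSET(FTS_XDEV). */
struct walk_state {
    dev_t recorded_dev;
    bool xdev;
};

/* Models entering a root directory. With PATCHED set, the root's device
   is recorded, as the two added lines in the fix do. Without it, the
   recorded device keeps whatever value it had before. */
static void enter_root(struct walk_state *st, dev_t root_dev, bool patched)
{
    if (patched)
        st->recorded_dev = root_dev;
}

/* Models the FTS_XDEV part of the condition in fts_read:
   ISSET(FTS_XDEV) && p->fts_statp->st_dev != sp->fts_dev */
static bool treated_as_other_fs(const struct walk_state *st, dev_t entry_dev)
{
    return st->xdev && entry_dev != st->recorded_dev;
}
```

With recorded_dev left at 0 and an entry device of 19, treated_as_other_fs returns true, which corresponds to the branch that marks d1 as FTS_DP. After enter_root with patched set, the same call returns false and the walk descends into d1.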
/* process_file, during the 4th iteration, prints “4 d1”, which is incorrect. This is because its ent->fts_info is set to FTS_DP --
a postorder dir, indicating it has already been visited. */
static bool process_file (FTS *fts, FTSENT *ent) {
… …
switch (ent->fts_info)
{
case FTS_NS:
.. ..
case FTS_ERR:
.. ..
.. ..
default:
/* Here, in the last iteration, ent->fts_info would be equal to FTS_DP,
which will fall into the default case. This is very important to
diagnose this failure!!! */
ok = true;
break;
}
/* If this is the first (pre-order) encounter with a directory,
or if it's the second encounter for a skipped directory, then
return right away. */
/* Since ent->fts_info is set to FTS_DP,
it does not return here. The logic is: FTS_D is a
preorder directory, which indicates the directory is
encountered for the first time. If it is FTS_DP, the directory
has already been visited, so du will not enter the dir.
*/
if (ent->fts_info == FTS_D)
return ok;
/* #define IS_DIR_TYPE(Type) ((Type) == FTS_DP || (Type) == FTS_DNR)
* So this condition evaluates to true. Size 4 is printed without
* ever entering d1. */
if ((IS_DIR_TYPE (ent->fts_info) && level <= max_depth)
|| ((opt_all && level <= max_depth) || level == 0))
/* print_size prints “4 d1”. */
print_size (&dui_to_print, file);
}
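The tail of process_file shown above boils down to a small predicate on fts_info and depth. The sketch below restates it so the FTS_D and FTS_DP cases can be compared side by side. would_print_size is a hypothetical helper, and the parameter types (size_t for level and max_depth, bool for opt_all) are assumptions. The macro and the condition are copied from the excerpt.

```c
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>

/* Same macro as quoted in the walkthrough. */
#define IS_DIR_TYPE(Type) ((Type) == FTS_DP || (Type) == FTS_DNR)

/* Hypothetical helper: returns true when the excerpt of process_file
   would reach print_size for an entry with this fts_info and depth. */
static bool would_print_size(int fts_info, size_t level, size_t max_depth,
                             bool opt_all)
{
    /* First (pre-order) encounter with a directory: return right away. */
    if (fts_info == FTS_D)
        return false;

    return (IS_DIR_TYPE(fts_info) && level <= max_depth)
           || ((opt_all && level <= max_depth) || level == 0);
}
```

For the root d1 (level 0), the predicate is false for FTS_D and true for FTS_DP. Because the buggy fts_read hands d1 back as FTS_DP before any of its contents were read, the size printed at that point covers only the directory itself.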
Is there an error message?
No
Can Errlog/developers anticipate the error with an error message?
Yes. The pattern is default-switch.
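As an illustration of the default-switch pattern, the sketch below logs whenever an fts_info value is handled by the default case of a switch shaped like the one in process_file. This is an assumption about what such logging could look like, not code from coreutils or from Errlog: note_default_case and the message text are hypothetical, and only the two cases named in the excerpt (FTS_NS, FTS_ERR) are listed explicitly.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <fts.h>
#include <stdio.h>

/* Hypothetical helper: returns 1 and writes a log line to LOG when
   FTS_INFO falls into the default case, 0 when it matches one of the
   cases named in the excerpt. */
static int note_default_case(int fts_info, FILE *log)
{
    switch (fts_info) {
    case FTS_NS:
    case FTS_ERR:
        /* Explicitly handled in process_file (bodies elided there). */
        return 0;
    default:
        /* The failing run reaches this point with FTS_DP. */
        fprintf(log, "process_file: fts_info=%d handled by default case\n",
                fts_info);
        return 1;
    }
}
```

A log line like this would not flag the bug by itself, since FTS_DP legitimately reaches the default case in correct runs too. It would, however, record that d1 was seen as FTS_DP without any of its entries being reported in between.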