squid-2224

Version:

Squid 3.0

Bug link:

http://bugs.squid-cache.org/show_bug.cgi?id=2224

How it is diagnosed (reproduced or source analysis)?

We did not reproduce the failure, only relied on source analysis.

Symptom:

Squid crashes (stack exhausted).

They discovered that it only occurs with specific debugging level, 9, and option -X. Developers recommended a reporter to get core file, but he couldn’t, and instead got  stack traces with gdb. It then showed a recursive call chain with a debug function and assert function.

Root cause:

A infinite recursive call cycle:

assert()->debugs()->assert()->debugs()

Detailed:

There is an error msg printed by ‘assert’:

receivedBodyTooLarge: -3107 >? -1

But this debugging line did not indicate where the problem actually is (and the message is actually nothing to do with this failure), and developers didn’t focus on this debugging msg.

Debug.h,

#define assert(EX)  ((EX)?((void)0):xassert("EX", __FILE__, __LINE__))

-- When assert( ) happens, the defined xassert( ) func is executed.

debug.cc,

void xassert(const char *msg, const char *file, int line) {

   debugs(0, 0, "assertion failed: " << file << ":" << line << ": \"" << msg << "\"");

   if (!shutting_down)

       abort();

}

Debug.h,

#define debugs(SECTION, LEVEL, CONTENT) \

  do { \

       if ((Debug::level = (LEVEL)) <= Debug::Levels[SECTION]) { \

               Debug::getDebugOut() << CONTENT; \

               Debug::finishDebug(); \

       } \

debug.cc,

std::ostream & Debug::getDebugOut() {

   assert (CurrentDebug == NULL);

   CurrentDebug = new std::ostringstream();

   // set default formatting flags

   CurrentDebug->setf(std::ios::fixed);

   CurrentDebug->precision(2);

   return *CurrentDebug;

}

-- The xassert is calling debugs()

void ClientSocketContext::pullData() {

 debugs(33, 5, "ClientSocketContext::pullData: FD " << fd() << " attempting to pull upstream data");

 ..

}

int cbdataReferenceValid(const void *p){

   cbdata *c;

   if (p == NULL)

       return 1;   /* A NULL pointer cannot become invalid */

   debugs(45, 9, "cbdataReferenceValid: " << p);

   return c->valid;

}

--------------------------

Failure sequence is like this :

1) pullData() -> debugs()

 inside debugs(), there is fd() call.

2) fd() -> debugs()

 fd() is not simply return a value, but sometimes print a debug message (with level 9 option)

3) debugs() -> getDebugOut() -> assert( )

 Since the original debugs( ) in 1) above doesn’t finish yet,  assert (CurrentDebug == NULL) is called. CurrentDebug can be NULL only after printing a debug msg completely.

4) assert()->debugs()->assert()->debugs()

 assert() calls debugs again.. recursive calls.

Fix:

They check the depth of the call chains, and if it is an inner call, they don’t construct ostringstream but just print the debugging msg.

std::ostream & Debug::getDebugOut() {

   assert(TheDepth >= 0);

   ++TheDepth;

   if (TheDepth > 1) {

      assert(CurrentDebug);

      *CurrentDebug << std::endl << "reentrant debuging " << TheDepth << "-{";

   } else {

       assert(!CurrentDebug);

       CurrentDebug = new std::ostringstream();

       // set default formatting flags

       CurrentDebug->setf(std::ios::fixed);

       CurrentDebug->precision(2);

   }

   return *CurrentDebug;

}

void Debug::finishDebug() {

   assert(TheDepth >= 0);

   assert(CurrentDebug);

   if (TheDepth > 1) {

       *CurrentDebug << "}-" << TheDepth << std::endl;

   } else {

       assert(TheDepth == 1);

       _db_print("%s\n", CurrentDebug->str().c_str());

       delete CurrentDebug;

       CurrentDebug = NULL;

   }

   --TheDepth;

}

Is there Message?

Yes.

Can Errlog anticipate with an error msg?

Yes. Printing the error msg in the signal handler and the backtrace, in particular, is very helpful to diagnose this.