squid-1972

Version:

2.6.STABLE14 (fixed in 2.6.STABLE15)

Bug link:

http://bugs.squid-cache.org/show_bug.cgi?id=1972 

How was it diagnosed (reproduced or source analysis)?

We reproduced the failure and triggered the error message.

How to reproduce?

Set up two Squid servers and configure one as the parent of the other. Then shut the parent server down for a while and observe the error messages on the child proxy server.

Symptom:

Squid in reverse proxy mode declares origin servers dead after they have been
down for some period. The only way to recover from this is to restart squid.

The following messages appear in cache.log:

2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:14:56| Detected DEAD Parent: www.here.net
2007/05/12 00:14:56| TCP connection to www.here.net/8080 failed
2007/05/12 00:15:00| Failed to select source for 'http://www.here.net/nsc/getior?lbprobe'
2007/05/12 00:15:00|   always_direct = 0
2007/05/12 00:15:00|    never_direct = 0
2007/05/12 00:15:00|        timedout = 0

Root cause:

Squid never calls the function ‘peerProbeConnect’, which simply re-probes the peer to set up the ‘peer’ structure again, so a peer that has been declared dead stays dead. The fix calls ‘peerProbeConnect’ from within the ‘peerDNSConfigure’ function.

Here is the fix in ‘peerDNSConfigure’:

static void peerDNSConfigure(const ipcache_addrs * ia, void *data) {
    ...
-   p->tcp_up = PEER_TCP_MAGIC_COUNT; // defined as 10
    ...
+   if (!p->tcp_up)
+        peerProbeConnect((peer *) p);
}
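
To make the effect of the patch easier to follow, here is a toy model of the countdown and the re-probe. It is only a sketch and not Squid's code: `toy_peer`, `toy_connect_failed`, `toy_probe_connect`, `toy_dns_configure`, the `probes_started` field, and the `peer_reachable` flag are all hypothetical names, and the assumption that a successful probe restores `tcp_up` to `PEER_TCP_MAGIC_COUNT` does not come from the report above.

```c
#include <assert.h>
#include <stdio.h>

#define PEER_TCP_MAGIC_COUNT 10 /* value quoted in the fix above */

/* Toy stand-in for Squid's peer structure (hypothetical, heavily reduced). */
typedef struct {
    const char *host;
    int tcp_up;         /* failures left before the peer counts as dead */
    int probes_started; /* illustration only: how many probes were issued */
} toy_peer;

/* One failed connection: count down; return 1 on the alive-to-dead transition. */
static int toy_connect_failed(toy_peer *p)
{
    assert(p != NULL);
    if (p->tcp_up == 0)
        return 0; /* already dead, nothing left to count down */
    p->tcp_up--;
    if (p->tcp_up == 0) {
        printf("Detected DEAD Parent: %s\n", p->host);
        return 1;
    }
    return 0;
}

/* Stand-in for a probe. ASSUMPTION: a successful probe marks the peer alive. */
static void toy_probe_connect(toy_peer *p, int peer_reachable)
{
    assert(p != NULL);
    p->probes_started++;
    if (peer_reachable)
        p->tcp_up = PEER_TCP_MAGIC_COUNT;
}

/* Models the patched DNS callback: re-probe only when the peer is marked dead. */
static void toy_dns_configure(toy_peer *p, int peer_reachable)
{
    assert(p != NULL);
    if (!p->tcp_up)
        toy_probe_connect(p, peer_reachable);
}
```

In this model, ten calls to `toy_connect_failed` take a fresh peer to `tcp_up == 0`. After that, `toy_dns_configure` keeps issuing probes while the peer is unreachable and stops as soon as one probe succeeds, which is the recovery path the unpatched code lacked.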

Now let’s see where the error message is printed:

void peerConnectFailed(peer * p) {
   /* This error message is printed on this failure. */
   debug(15, 1) ("TCP connection to %s/%d failed\n", p->host, p->http_port);
   peerConnectFailedSilent(p);
}

static void
fwdConnectDone(int server_fd, int status, void *data)
{
   ...
   /* This status is the return value from the ‘connect’ system call. */
   if (status == COMM_ERR_DNS) {
       ...
   } else if (status != COMM_OK) {
       ...
       peerConnectFailed(fs->peer);
       comm_close(server_fd);
   } else {
       ...
   }
}

Is there any error message?

Yes. It was printed because the ‘connect’ system call failed (system-call return-error pattern).

Can Errlog automatically anticipate the error?

Yes. The error message is printed because the ‘connect’ system call returned an error status, so it belongs to the system-call return-value pattern.
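
As a rough illustration of the system-call return-value pattern, the sketch below checks the result of `connect` and records a message when it fails. `connect_and_log` is a hypothetical helper and not part of Squid or Errlog. The message prefix mirrors the cache.log line above, while the appended `strerror` text is an addition made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/*
 * Hypothetical helper: connect fd to ip:port and, if the system call
 * reports an error, write a log message into msg.
 * Returns 0 on success and -1 on failure.
 */
static int connect_and_log(int fd, const char *ip, unsigned short port,
                           char *msg, size_t msg_len)
{
    struct sockaddr_in addr;

    assert(ip != NULL && msg != NULL && msg_len > 0);
    msg[0] = '\0';

    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &addr.sin_addr) != 1) {
        snprintf(msg, msg_len, "invalid address %s", ip);
        return -1;
    }

    /* The pattern itself: test the return value and log when it signals an error. */
    if (connect(fd, (struct sockaddr *) &addr, sizeof(addr)) < 0) {
        int saved_errno = errno; /* save before any call that may change errno */
        snprintf(msg, msg_len, "TCP connection to %s/%d failed: %s",
                 ip, (int) port, strerror(saved_errno));
        return -1;
    }
    return 0;
}
```

Passing an invalid descriptor such as -1 makes `connect` fail without touching the network, which is a convenient way to exercise the logging branch.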