httpd-40010

Version:

2.2.6

Bug Link:

https://issues.apache.org/bugzilla/show_bug.cgi?id=40010  

Symptom(Failure):

Apache becomes suddenly unresponsive, and starts filling my
httpd-error.log with thousands of lines like:

[Sat Jul 08 20:57:32 2006] [warn] (61)Connection refused: connect to listener on 0.0.0.0:80

How it is diagnosed:

We did not reproduce the failure. The diagnosis is based on the discussion in the bug report and on the source code.

Root Cause:

User’s configuration error.  

The production environment was running Apache on FreeBSD inside a jail. However, the user did not change the default configuration from "Listen 80" to "Listen your.jail.ip:80". Thus, Apache's connections to its own listener were attempted against 0.0.0.0:80.

In FreeBSD jails, although only a single IP address is visible in the output of ifconfig, users have to bind every server process to that specific IP, not to * or 0.0.0.0.
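A minimal sketch of the corrected directive, assuming the jail's address is 192.0.2.10. That address is a placeholder from the documentation range and is not taken from the bug report; substitute the real jail IP.

```apache
# httpd.conf inside the jail
# Default that leads to the failure described above:
#Listen 80
# Bind to the jail's own address instead (192.0.2.10 is a placeholder):
Listen 192.0.2.10:80
```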

The related source code that prints this log message is:

static apr_status_t dummy_connection(ap_pod_t *pod)

{

   ...

   /* create a temporary pool for the socket.  pconf stays around too long */

   rv = apr_pool_create(&p, pod->p);

   if (rv != APR_SUCCESS) {

       return rv;

   }

   rv = apr_socket_create(&sock, ap_listeners->bind_addr->family,

                          SOCK_STREAM, 0, p);

   ...

   rv = apr_socket_connect(sock, ap_listeners->bind_addr);

   if (rv != APR_SUCCESS) {

       int log_level = APLOG_WARNING;

       if (APR_STATUS_IS_TIMEUP(rv)) {

           /* probably some server processes bailed out already and there

            * is nobody around to call accept and clear out the kernel

            * connection queue; usually this is not worth logging

            */

           log_level = APLOG_DEBUG;

       }

       ap_log_error(APLOG_MARK, log_level, rv, ap_server_conf,

                    "connect to listener on %pI", ap_listeners->bind_addr);

       /* No return, fall through!!! */

   }

   ...

}

apr_status_t apr_socket_connect(apr_socket_t *sock, apr_sockaddr_t *sa)

{

   int rc;

   do {

       /* connect is actually non-blocking. Here, Apache uses its own time-out based

        * I/O implementation (as discussed in bug 22030). Later it uses poll (in

        * apr_wait_for_io_or_timeout) to install a timeout. */

       rc = connect(sock->socketdes,

                    (const struct sockaddr *)&sa->sa.sin,

                    sa->salen);

   } while (rc == -1 && errno == EINTR);

   /* we can see EINPROGRESS the first time connect is called on a non-blocking

    * socket; if called again, we can see EALREADY

    */

   if ((rc == -1) && (errno == EINPROGRESS || errno == EALREADY) && (sock->timeout > 0)) {

       rc = apr_wait_for_io_or_timeout(NULL, sock, 0);

       if (rc != APR_SUCCESS) {

           return rc;

       }

   }

   if (rc == -1 && errno != EISCONN) {

       return errno;

   }

   ...

}

apr_status_t apr_wait_for_io_or_timeout(... ...)

{

   do {    

       /* Apache hangs in ‘poll’. */

       rc = poll(&pfd, 1, timeout);

   } while (rc == -1 && errno == EINTR);

   if (rc == 0) {

       return APR_TIMEUP;

   }

   else if (rc > 0) {

       return APR_SUCCESS;

   }

   else {  

       return errno;

   }

}
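The pattern in the two APR excerpts above (a non-blocking connect followed by poll with a timeout) can be sketched as a standalone POSIX function. This is only an illustration under stated assumptions: connect_with_timeout is a hypothetical helper, not an APR or Apache function, it handles IPv4 only, and the SO_ERROR check after poll is the usual POSIX follow-up rather than something shown in the excerpts.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper (not an APR function): connect to ip:port, waiting at
 * most timeout_ms milliseconds. Returns 0 on success, ETIMEDOUT if poll()
 * timed out, or the errno value describing the failure. */
static int connect_with_timeout(const char *ip, unsigned short port,
                                int timeout_ms)
{
    struct sockaddr_in sa;
    struct pollfd pfd;
    socklen_t len;
    int fd, flags, rc, err = 0;

    memset(&sa, 0, sizeof(sa));
    sa.sin_family = AF_INET;
    sa.sin_port = htons(port);
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EINVAL;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd == -1)
        return errno;

    /* Put the socket in non-blocking mode so connect() returns at once. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1) {
        err = errno;
        close(fd);
        return err;
    }

    rc = connect(fd, (const struct sockaddr *)&sa, sizeof(sa));
    if (rc == -1 && errno == EINPROGRESS) {
        /* The handshake is still pending: wait until the socket becomes
         * writable or the timeout expires. */
        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        do {
            rc = poll(&pfd, 1, timeout_ms);
        } while (rc == -1 && errno == EINTR);

        if (rc == 0) {
            err = ETIMEDOUT;
        } else if (rc == -1) {
            err = errno;
        } else {
            /* Writable does not mean connected: read the real outcome. */
            len = sizeof(err);
            if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == -1)
                err = errno;
        }
    } else if (rc == -1) {
        err = errno;   /* immediate failure, e.g. ECONNREFUSED */
    }

    close(fd);
    return err;
}
```

Calling connect_with_timeout("127.0.0.1", port, 1000) against a port with no listener is expected to return ECONNREFUSED, the same error that appears in the log line in the symptom section.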

This connect call is operating-system specific, so the fact that the jail does not remap 0.0.0.0 to the jail's IP is not caused by Apache. The hang actually occurs in poll, as marked by the comment in apr_wait_for_io_or_timeout above.

Is there an error message?

Yes.

Can Errlog anticipate the error message?

Yes. Essentially, Apache printed the error message after checking the error return value of the system call ‘connect’.
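To show how the errno value from connect maps onto the logged text, here is a small sketch that renders an error in the "(errno)description: message" shape seen in the log excerpt. The function format_connect_error is a hypothetical stand-in and not Apache's ap_log_error; the address and port are passed in as plain values for illustration. FreeBSD defines ECONNREFUSED as 61, which matches the "(61)Connection refused" prefix in the reported log line; other systems use a different number.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Apache's ap_log_error): format a failed connect()
 * as "(errno)description: connect to listener on ip:port".
 * Returns the snprintf() result, i.e. the length the full text needs. */
static int format_connect_error(char *buf, size_t buflen, int err,
                                const char *ip, unsigned short port)
{
    return snprintf(buf, buflen, "(%d)%s: connect to listener on %s:%u",
                    err, strerror(err), ip, (unsigned)port);
}
```

A caller would pass the value returned by the failed connect attempt, for example format_connect_error(buf, sizeof(buf), ECONNREFUSED, "0.0.0.0", 80).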