httpd-45023

Version:

2.2.8

Bug report link:

https://issues.apache.org/bugzilla/show_bug.cgi?id=45023

Symptom:

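When the DEFLATE output filter is enabled and the client sends “Accept-Encoding: gzip,deflate”, the server never answers a conditional request with “304 Not Modified”; it always returns “200 OK” with the full content. Clients therefore re-download resources they already have cached, which wastes bandwidth and hurts performance.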

*Background: ETag field

For resource caching, HTTP needs a cache-validation mechanism, and the ETag header provides one. An ETag (entity tag) is an identifier that the web server assigns to a specific version of the resource at a URL. If the resource content changes, a new and different ETag is assigned, so it acts like a fingerprint of the resource. When a URL is retrieved, the web server returns the resource along with its ETag (e.g. ETag: “123981ds7f3”). The client may then cache the resource together with the ETag. Later, if the client wants to retrieve the same URL again, it sends its saved ETag value in an “If-None-Match” field (e.g. If-None-Match: “123981ds7f3”). The server compares the client’s ETag value with the current ETag value of the resource on the server side. If they match, the resource has not changed and the server returns a “304 Not Modified” status without the content. If not, a full response including the resource content is returned.
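The validation step described above can be sketched in a few lines of C. This is only an illustration, not httpd code: the function name validate_etag and the two status constants are made up for the example, and only a single entity tag is compared byte-for-byte.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical status constants for this sketch (values match HTTP). */
enum { STATUS_OK = 200, STATUS_NOT_MODIFIED = 304 };

/*
 * Decide the response status for a conditional GET.
 * current_etag:  ETag the server currently assigns to the resource.
 * if_none_match: value of the client's If-None-Match header, or NULL
 *                when the client sent no validator.
 * Simplification: a single entity tag, compared byte-for-byte.
 */
int validate_etag(const char *current_etag, const char *if_none_match)
{
    assert(current_etag != NULL);

    if (if_none_match != NULL && strcmp(current_etag, if_none_match) == 0) {
        return STATUS_NOT_MODIFIED; /* the cached copy is still valid */
    }
    return STATUS_OK; /* send the full resource together with its ETag */
}
```

Calling validate_etag with the same quoted tag on both sides yields 304; a missing or different If-None-Match value yields 200.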

*DEFLATE filter

You might want to use the DEFLATE filter (mod_deflate) to reduce the response size by compressing the content. In this report, cache validation does not work correctly for gzip-encoded responses.

How is it diagnosed?

We reproduced the failure.

How to reproduce?

1. Proof of concept of the normal case (“Not Modified” status) without DEFLATE

*Request packet (can be sent with telnet):

GET /test.js HTTP/1.1

Host: xxxxxxxxxxxxx

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)

Accept: */*

Accept-Language: en-us,en;q=0.5

Accept-Encoding: gzip,deflate

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

Keep-Alive: 300

Connection: keep-alive

If-Modified-Since: Mon, 18 Oct 2010 23:08:30 GMT

If-None-Match: "a488bc1-1d-492ec41293780"

Cache-Control: max-age=0

*Response packet from the web server:

HTTP/1.1 304 Not Modified

Date: Tue, 19 Oct 2010 00:43:49 GMT

Server: Apache/2.2.8 (Unix)

Connection: Keep-Alive

Keep-Alive: timeout=5, max=95

ETag: "a488bc1-1d-492ec41293780"

2. How to apply the DEFLATE filter by enabling the ‘mod_deflate’ module

Compile mod_deflate.c and install it as a run-time (shared) module:

./apxs -i -c -Wl,--rpath -Wl,-lz ../httpd-2.2.8/modules/filters/mod_deflate.c

3. Reproducing the normal case (“304 Not Modified”) vs. the error case (“200 OK”) when using the DEFLATE filter

* The normal case: the server responds with the “Not Modified” status to the following request

GET /test.js HTTP/1.1

Host: xxxxxxxxxxxxx

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)

Accept: */*

Accept-Language: en-us,en;q=0.5

Accept-Encoding: gzip,deflate

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

Keep-Alive: 300

Connection: keep-alive

If-Modified-Since: Mon, 18 Oct 2010 23:08:30 GMT

If-None-Match: "a488bc1-1d-492ec41293780"

Cache-Control: max-age=0

* The error case: the server responds with the “200 OK” status in the following exchange

3-1. [Client Request] Request ‘/test.js’ with gzip encoding.

GET /test.js HTTP/1.1

Host: xxxxxxxxxxxxx

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)

Accept: */*

Accept-Language: en-us,en;q=0.5

Accept-Encoding: gzip,deflate

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

Keep-Alive: 300

Connection: keep-alive

3-2. [Server Response] The whole content is returned with an ETag value

HTTP/1.1 200 OK

Date: Mon, 07 Feb 2011 02:49:27 GMT

Server: Apache/2.2.8 (Unix)

Last-Modified: Mon, 18 Oct 2010 23:08:30 GMT

ETag: "a488bc1-1d-492ec41293780"-gzip

Accept-Ranges: bytes

Vary: Accept-Encoding

Content-Encoding: gzip

Content-Length: 49

Keep-Alive: timeout=5, max=100

Connection: Keep-Alive

Content-Type: application/javascript

3-3. [Client Request] Later, the client requests the same resource again with the given ETag value

GET /test.js HTTP/1.1

Host: xxxxxxxxxxxxx

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.14)

Accept: */*

Accept-Language: en-us,en;q=0.5

Accept-Encoding: gzip,deflate

Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7

Keep-Alive: 300

Connection: keep-alive

If-Modified-Since: Mon, 18 Oct 2010 23:08:30 GMT

If-None-Match: "a488bc1-1d-492ec41293780"-gzip ( * the ‘-gzip’ suffix is the only difference from the non-DEFLATE request)

Cache-Control: max-age=0

3-4. [Server Response] We expect ‘Not Modified’, but the server still returns the whole content

HTTP/1.1 200 OK

Date: Mon, 07 Feb 2011 02:55:16 GMT

Server: Apache/2.2.8 (Unix)

Last-Modified: Mon, 18 Oct 2010 23:08:30 GMT

ETag: "a488bc1-1d-492ec41293780"-gzip

Accept-Ranges: bytes

Vary: Accept-Encoding

Content-Encoding: gzip

Content-Length: 49

Keep-Alive: timeout=5, max=100

Connection: Keep-Alive

Content-Type: application/javascript

* If the client sends the request with the ETag value "a488bc1-1d-492ec41293780" (without the “-gzip” suffix), the server returns the “Not Modified” status. That is, the “-gzip” suffix is what breaks the match.

Root cause:

The server returns a modified ETag value to the client. When the client sends that value back, it does not match the original ETag on the server, so the server never returns the “304 Not Modified” status.

For the first request with the encoding option ‘gzip,deflate’, the server returns the ETag value “5954c6-10f4-449d11713aac0”-gzip.

/* The ETag value of a gzip-encoded response is modified in the “deflate_check_etag” function. */

/* This function is called by ‘deflate_out_filter’. */

static void deflate_check_etag(request_rec *r, const char *transform)

{

/* Retrieve the ETag value from the response header table. It is “5954c6-10f4-449d11713aac0”. */

   const char *etag = apr_table_get(r->headers_out, "ETag");

   if (etag && (((etag[0] != 'W') && (etag[0] !='w')) || (etag[1] != '/'))) {

       apr_table_set(r->headers_out, "ETag",

                     apr_pstrcat(r->pool, etag, "-", transform, NULL));

/* The ETag value is changed to “5954c6-10f4-449d11713aac0”-gzip to indicate that this response is gzip-encoded. */

   }

}
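To see the effect of this function without an APR build environment, the string handling can be imitated with the C standard library. The sketch below is an assumption-laden stand-in, not httpd code: append_transform is a hypothetical name, and malloc replaces the request pool, so the caller has to free the result.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Stand-alone imitation of deflate_check_etag() using malloc instead of
 * APR pools. Returns a newly allocated string that the caller must
 * free(), or NULL if the allocation fails.
 * Weak ETags (prefix W/ or w/) are copied unchanged, matching the
 * condition in the original function.
 */
char *append_transform(const char *etag, const char *transform)
{
    assert(etag != NULL && transform != NULL);

    int is_weak = (etag[0] == 'W' || etag[0] == 'w') && etag[1] == '/';
    /* room for the etag, an optional "-<transform>", and the terminator */
    size_t len = strlen(etag) + (is_weak ? 0 : 1 + strlen(transform)) + 1;
    char *out = malloc(len);

    if (out == NULL) {
        return NULL;
    }
    if (is_weak) {
        snprintf(out, len, "%s", etag);
    } else {
        snprintf(out, len, "%s-%s", etag, transform);
    }
    return out;
}
```

With the strong ETag from this report and the transform "gzip", the result is the quoted tag followed by -gzip, i.e. the suffix lands outside the closing quote.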

In the client’s next request, the ETag value in the “If-None-Match” field is “5954c6-10f4-449d11713aac0”-gzip (not “5954c6-10f4-449d11713aac0”), because that is the value the server returned.

The only place where the status code is set to HTTP_NOT_MODIFIED (304) is the function “ap_meets_conditions”.

AP_DECLARE(int) ap_meets_conditions(request_rec *r)

{

   const char *etag;

   const char *if_match, *if_modified_since, *if_unmodified, *if_nonematch;

/* Retrieve the ETag value on the server side. It is “5954c6-10f4-449d11713aac0”, without the “-gzip” suffix. */

   etag = apr_table_get(r->headers_out, "ETag");

/* Retrieve the “If-None-Match” field, which contains the ETag value from the client side. It is “5954c6-10f4-449d11713aac0”-gzip, so it differs from the one on the server side. */

   if_nonematch = apr_table_get(r->headers_in, "If-None-Match");

   if (if_nonematch != NULL) {

      ...

       if (r->method_number == M_GET) {

           if (if_nonematch[0] == '*') {

               not_modified = 1;

           }

           else if (etag != NULL) {

               if (apr_table_get(r->headers_in, "Range")) {

                   not_modified = etag[0] != 'W'

                                  && ap_find_list_item(r->pool,

                                                       if_nonematch, etag);

               }

               else {

                   not_modified = ap_find_list_item(r->pool,

                                                    if_nonematch, etag);

               }

/* Compare ‘if_nonematch’ with ‘etag’. They always differ because of the “-gzip” suffix, so not_modified stays 0. */

           }

       }

     … ...

   }

   …    ….

   if (not_modified) {

       return HTTP_NOT_MODIFIED;

   }

   return OK;

}
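The failing comparison can be reproduced outside the server with a simplified matcher. The sketch below is not the real ap_find_list_item(): list_contains and conditional_status are hypothetical names, entries are assumed to contain no embedded commas, and they are compared byte-for-byte after trimming spaces and tabs.

```c
#include <assert.h>
#include <string.h>

/*
 * Simplified stand-in for ap_find_list_item(): returns 1 if `item`
 * equals one of the comma-separated entries in `list`, otherwise 0.
 */
int list_contains(const char *list, const char *item)
{
    assert(list != NULL && item != NULL);

    size_t item_len = strlen(item);
    const char *p = list;

    while (*p != '\0') {
        /* skip separators and leading whitespace */
        while (*p == ',' || *p == ' ' || *p == '\t') {
            p++;
        }
        const char *start = p;
        while (*p != '\0' && *p != ',') {
            p++;
        }
        const char *end = p;
        /* trim trailing whitespace of this entry */
        while (end > start && (end[-1] == ' ' || end[-1] == '\t')) {
            end--;
        }
        if (item_len > 0 && (size_t)(end - start) == item_len
            && memcmp(start, item, item_len) == 0) {
            return 1;
        }
    }
    return 0;
}

/* Mirrors the GET branch of the excerpt: 304 on a match, otherwise 200. */
int conditional_status(const char *server_etag, const char *if_none_match)
{
    if (if_none_match == NULL) {
        return 200;
    }
    if (if_none_match[0] == '*') {
        return 304;
    }
    if (server_etag != NULL && list_contains(if_none_match, server_etag)) {
        return 304;
    }
    return 200;
}
```

Passing the server-side tag together with the suffixed client value gives 200, while the unsuffixed client value gives 304, which matches the behavior observed in the reproduction.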

Patch:

Remove the function “deflate_check_etag” so that the ETag value is no longer modified.
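The effect of the patch on one cache round trip can be modeled with a small simulation. This is a sketch under stated assumptions, not httpd code: revalidation_status and its alter_etag flag are invented for the example, the 256-byte buffer is an arbitrary limit, and the server-side check is reduced to an exact string comparison.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Simulates one cache round trip and returns the status of the
 * revalidation request (304 or 200).
 * alter_etag != 0 models the unpatched filter, which sends the ETag
 * with a "-gzip" suffix; alter_etag == 0 models the patched server.
 */
int revalidation_status(const char *stored_etag, int alter_etag)
{
    char sent_etag[256]; /* arbitrary size, large enough for this sketch */

    assert(stored_etag != NULL);

    /* First response: the ETag header as it leaves the server. */
    int n = snprintf(sent_etag, sizeof sent_etag, "%s%s",
                     stored_etag, alter_etag ? "-gzip" : "");
    if (n < 0 || (size_t)n >= sizeof sent_etag) {
        return 200; /* tag did not fit; treat it as not validated */
    }

    /*
     * Second request: the client echoes sent_etag in If-None-Match and
     * the server compares it with the stored, unsuffixed ETag.
     */
    return strcmp(sent_etag, stored_etag) == 0 ? 304 : 200;
}
```

With alter_etag set to 1 the revalidation yields 200 (the reported bug); with 0 it yields 304, which is the behavior the patch restores.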

Is there an error message?

No.

Can Errlog anticipate the error and insert an error message?

No.