[redis]github-547-2 Report

In the bug report of github-547, the developers discovered two independent bugs. This report documents the 2nd bug.



1. Symptom

Client reply buffer limit (a new feature introduced in redis 2.6)’s computation is broken for some requests. This was discovered in the “INFO” logging in bug github-547.

1.1 Severity


1.2 Was there exception thrown?


1.2.1 Were there multiple exceptions?


1.3 Scope of the failure

a few clients

2. How to reproduce this failure

The same way as github-547.

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

The symptom is logged:

[10845] 09 Jun 04:42:08.293 # --- CURRENT CLIENT INFO

[10845] 09 Jun 04:42:08.293 # client: addr= fd=125 age=13903 idle=0 flags=N db=12 sub=0 psub=0 multi=-1 qbuf=0 qbuf-free=32768 obl=0 oll=0 omem=18446744073709398960 events=r cmd=eval

3.2 Backward inference

This log msg is printed by:

/* Turn a Redis client into an sds string representing its state. */

sds getClientInfoString(redisClient *client) {

  .. ..

   return sdscatprintf(sdsempty(),

        "addr=%s:%d fd=%d name=%s age=%ld idle=%ld flags=%s db=%d sub=%d psub=%d multi=%d qbuf=%lu qbuf-free=%lu obl=%lu oll=%lu omem=%lu events=%s cmd=%s",

        (client->flags & REDIS_UNIX_SOCKET) ? server.unixsocket : ip,



        client->name ? (char*)client->name->ptr : "",

        (long)(server.unixtime - client->ctime),

        (long)(server.unixtime - client->lastinteraction),



        (int) dictSize(client->pubsub_channels),

        (int) listLength(client->pubsub_patterns),

        (client->flags & REDIS_MULTI) ? client->mstate.count : -1,

        (unsigned long) sdslen(client->querybuf),

        (unsigned long) sdsavail(client->querybuf),

        (unsigned long) client->bufpos,

        (unsigned long) listLength(client->reply),



        client->lastcmd ? client->lastcmd->name : "NULL");


/* This function returns the number of bytes that Redis is virtually

 * using to store the reply still not read by the client.

 * It is "virtual" since the reply output list may contain objects that

 * are shared and are not really using additional memory.


 * The function returns the total sum of the length of all the objects

 * stored in the output list, plus the memory used to allocate every

 * list node. The static reply buffer is not taken into account since it

 * is allocated anyway.


 * Note: this function is very fast so can be called as many time as

 * the caller wishes. The main usage of this function currently is

 * enforcing the client output length limits. */

unsigned long getClientOutputBufferMemoryUsage(redisClient *c) {

    unsigned long list_item_size = sizeof(listNode)+sizeof(robj);

    return c->reply_bytes + (list_item_size*listLength(c->reply));


So we know the computation messed up with the calculation of c->reply_bytes. But it is hard to further infer where the computation messed up.

4. Root cause

Wrong computation on c->reply_bytes when the objects are glued. In this case, the computation will subtract more from c->reply_bytes than it should, causing c->reply_bytes to grow to negative (and was interpreted as a large positive).

4.1 Category:


5. Fix

5.1 How?

         /* Only glue when the next node is non-NULL (an sds in this case) */

         if (next->ptr != NULL) {

+            c->reply_bytes -= zmalloc_size_sds(len->ptr);

+            c->reply_bytes -= zmalloc_size_sds(next->ptr);

             len->ptr = sdscatlen(len->ptr,next->ptr,sdslen(next->ptr));

+            c->reply_bytes += zmalloc_size_sds(len->ptr);