[redis]github-327 Report

https://github.com/antirez/redis/issues/327

1. Symptom

The Redis server wiped out the entire DB when the “maxmemory” limit was reached with multiple slaves connected.

1.1 Severity

Critical

1.2 Was there exception thrown?

Yes. command not allowed when used memory > 'maxmemory'

1.2.1 Were there multiple exceptions?

No, but this error message appears many times.

1.3 Scope of the failure

All data erased! Quite significant.

2. How to reproduce this failure

2.0 Version

2.4.1

2.1 Configuration

Enable maxmemory configuration and multiple slaves

2.2 Reproduction procedure

1. Enable maxmemory configuration (config change)

2. Attach multiple slaves (add node)

2.2.1 Timing order

In this order

2.2.2 Events order externally controllable?

Yes

2.3 Can the logs tell how to reproduce the failure?

Yes --- the slave connections

2.4 How many machines needed?

2

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

Data loss + “command not allowed when used memory > ‘maxmemory’”...

3.2 Backward inference

From the code that prints the log:

    if (server.maxmemory) freeMemoryIfNeeded();
    if (server.maxmemory && (c->cmd->flags & REDIS_CMD_DENYOOM) &&
        zmalloc_used_memory() > server.maxmemory)
    {
        /* Once this is triggered, something must be wrong! */
        addReplyError(c,"command not allowed when used memory > 'maxmemory'");
        return REDIS_OK;
    }

The error is returned only when used memory is still above maxmemory after freeMemoryIfNeeded() has run, i.e. freeMemoryIfNeeded() did not manage to bring memory usage back under the limit.

4. Root cause

When the maxmemory limit is reached, Redis tries to free memory by evicting keys. However, the memory usage calculation also counts the slaves' output buffers, which keep growing as the “DEL” commands for the evicted keys are queued for the slaves. The more keys Redis evicts, the larger the measured memory consumption becomes, so in the end Redis deletes all the keys.
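The feedback loop described above can be illustrated with a toy model. This is only a sketch, not Redis code: the struct, the function names, and all sizes are hypothetical, and it assumes that one eviction frees fewer bytes than the DEL commands it queues across all slaves.

```c
#include <stddef.h>

/* Toy model only: names and sizes are hypothetical, not Redis internals. */
typedef struct {
    size_t keys;            /* number of keys still stored */
    size_t key_bytes;       /* bytes freed when one key is evicted */
    size_t slaves;          /* number of connected slaves */
    size_t del_bytes;       /* bytes one propagated DEL adds to each slave buffer */
    size_t slave_buf_bytes; /* total bytes queued for all slaves */
} toy_server;

/* Memory as the buggy accounting sees it: dataset plus slave output buffers. */
static size_t used_memory(const toy_server *s) {
    return s->keys * s->key_bytes + s->slave_buf_bytes;
}

/* Evict keys until usage drops to the limit or no keys remain.
 * Every eviction queues one DEL per slave, which grows the counted memory.
 * Returns the number of keys evicted. */
static size_t evict_until_under(toy_server *s, size_t maxmemory) {
    size_t evicted = 0;
    while (used_memory(s) > maxmemory && s->keys > 0) {
        s->keys--;
        s->slave_buf_bytes += s->slaves * s->del_bytes;
        evicted++;
    }
    return evicted;
}
```

In this model, with 1000 keys of 32 bytes, 3 slaves, 40 bytes per DEL and a limit of 31000 bytes, each eviction frees 32 bytes but adds 120, so the loop only stops once all 1000 keys are gone.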

The issue happens for the following reason:

4.1 Category:

Incorrect error handling (handled)

  --- the initial error is not handled correctly.

5. Fix

5.1 How?

Quite complicated. Basically, when freeing memory, make sure the slaves' output buffer size is not counted towards memory usage.
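The idea behind the fix can be sketched with a toy model. This is not the code from the commit: all names and sizes are hypothetical, and the sketch only shows the accounting change, where the slave output buffers are subtracted before comparing against the limit.

```c
#include <stddef.h>

/* Toy model only: names and sizes are hypothetical, not Redis internals. */
typedef struct {
    size_t keys;            /* number of keys still stored */
    size_t key_bytes;       /* bytes freed when one key is evicted */
    size_t allocated_bytes; /* everything the allocator reports, buffers included */
    size_t slave_buf_bytes; /* bytes queued in slave output buffers */
} fixed_server;

/* Memory that eviction can actually reclaim: allocator total minus slave
 * output buffers, clamped at zero to avoid unsigned underflow. */
static size_t memory_for_eviction(const fixed_server *s) {
    if (s->slave_buf_bytes > s->allocated_bytes) return 0;
    return s->allocated_bytes - s->slave_buf_bytes;
}

/* Evict keys until the buffer-free usage fits under maxmemory.
 * Each eviction still queues one DEL per slave, but those bytes no longer
 * count against the limit. Returns the number of keys evicted. */
static size_t evict_excluding_buffers(fixed_server *s, size_t slaves,
                                      size_t del_bytes, size_t maxmemory) {
    size_t evicted = 0;
    while (memory_for_eviction(s) > maxmemory && s->keys > 0) {
        size_t queued = slaves * del_bytes;
        s->keys--;
        s->allocated_bytes -= s->key_bytes; /* the key itself is freed */
        s->allocated_bytes += queued;       /* DELs queued for the slaves */
        s->slave_buf_bytes += queued;
        evicted++;
    }
    return evicted;
}
```

With 1000 keys of 32 bytes (32000 bytes allocated), 3 slaves, 40 bytes per DEL and a limit of 31000 bytes, this loop stops after 32 evictions and leaves 968 keys, even though the allocator total has grown because of the queued DEL commands.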

https://github.com/antirez/redis/commit/f6b32c14f4c8680d2a6b7a4d71758e76ca2c3554