[redis]github-90 Report

https://github.com/antirez/redis/issues/90

1. Symptom

When RDB (snapshot of the db for backup) directory doesn’t have write permission, redis keeps execution. Later when crashed, using RDB to recover, it will lose all the data.

1.1 Severity

Critical

1.2 Was there exception thrown?

Yes. Multiple.

1.2.1 Were there multiple exceptions?

Yes.

Write error saving DB on disk: Permission denied --- multiple times.

Background saving error -- multiple times

1.3 Scope of the failure

Massive dataloss!

2. How to reproduce this failure

2.0 Version

2.6.unstable

2.1 Configuration

save 10 1

  --- After 10 seconds, as long as there is 1 key change,  backup!

dir path-without-write-permission

2.2 Reproduction procedure

1. configure as above;

2. write to db

3. restart the server and observe dataloss

2.2.1 Timing order

In this order

2.2.2 Events order externally controllable?

Yes

2.3 Can the logs tell how to reproduce the failure?

Yes. The error log will log the rdb save failure & restart event

2.4 How many machines needed?

1

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

The error messages --- cannot save

3.2 Backward inference

It’s pretty straight-forward that the error handling of this RDB save failure is wrong. The system simply shouldn’t continue execution.

4. Root cause

When RDB file cannot be saved, the system shouldn’t have continued execution

4.1 Category:

Incorrect error handling (handled but wrong, simply not tested!)

5. Fix

5.1 How?

Make the system refuse any additional write when RDB cannot be saved.

@@ -656,6 +656,7 @@ int rdbSave(char *filename) {

     redisLog(REDIS_NOTICE,"DB saved on disk");

     server.dirty = 0;

     server.lastsave = time(NULL);

+    server.lastbgsave_status = REDIS_OK;

     return REDIS_OK;

 

 werr:

@@ -1061,12 +1062,15 @@ void backgroundSaveDoneHandler(int exitcode, int bysignal) {

             "Background saving terminated with success");

         server.dirty = server.dirty - server.dirty_before_bgsave;

         server.lastsave = time(NULL);

+        server.lastbgsave_status = REDIS_OK;

     } else if (!bysignal && exitcode != 0) {

         redisLog(REDIS_WARNING, "Background saving error");

+        server.lastbgsave_status = REDIS_ERR;

     } else {

         redisLog(REDIS_WARNING,

             "Background saving terminated by signal %d", bysignal);

         rdbRemoveTempFile(server.rdb_child_pid);

+        server.lastbgsave_status = REDIS_ERR;

     }

     server.rdb_child_pid = -1;

     /* Possibly there are slaves waiting for a BGSAVE in order to be served

+    /* Don't accept write commands if there are problems persisting on disk. */

+    if (server.saveparamslen > 0 && server.lastbgsave_status == REDIS_ERR &&

+        c->cmd->flags & REDIS_CMD_WRITE)

+    {

+        addReply(c, shared.bgsaveerr);

+        return REDIS_OK;

+    }

+