[redis]github-103 Report

https://github.com/antirez/redis/issues/103

1. Symptom

During server restart, when RDB (snapshot of the db for backup) file cannot be loaded because wrong signature, etc., the system continues execution. resulting in dataloss.  

1.1 Severity

Critical

1.2 Was there exception thrown?

Yes. Multiple.

1.2.1 Were there multiple exceptions?

Yes.

Server restart

Wrong signature trying to load DB from file”

“Can't handle RDB format version”

1.3 Scope of the failure

Massive dataloss!

2. How to reproduce this failure

2.0 Version

2.4.unstable

2.1 Configuration

After restart, change the RDB format version to an invalid one.

2.2 Reproduction procedure

1. shutdown server

2. change the RDB format version

3. start server

2.2.1 Timing order

In this order

2.2.2 Events order externally controllable?

Yes

2.3 Can the logs tell how to reproduce the failure?

Yes. The error log will log the rdb load failure & restart event

2.4 How many machines needed?

2 (need a client)

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

The error messages --- cannot save

3.2 Backward inference

It’s pretty straight-forward that the error handling of this RDB load failure is wrong. The system simply shouldn’t continue execution.

4. Root cause

When RDB file cannot be loaded, the system shouldn’t have continued execution.

4.1 Category:

Incorrect error handling (handled but wrong, simply not tested!)

This was not intentional, and is an error introduced later, but never found since it is not common to have mismatching RDB versions or file signature.”

5. Fix

5.1 How?

Make the system refuse any to start when RDB load error occurs

@@ -947,19 +947,24 @@ int rdbLoad(char *filename) {

     rio rdb;

 

     fp = fopen(filename,"r");

-    if (!fp) return REDIS_ERR;

+    if (!fp) {

+        errno = ENOENT;

+        return REDIS_ERR;

+    }

     rioInitWithFile(&rdb,fp);

     if (rioRead(&rdb,buf,9) == 0) goto eoferr;

     buf[9] = '\0';

     if (memcmp(buf,"REDIS",5) != 0) {

         fclose(fp);

         redisLog(REDIS_WARNING,"Wrong signature trying to load DB from file");

+        errno = EINVAL;

         return REDIS_ERR;

     }

     rdbver = atoi(buf+5);

     if (rdbver < 1 || rdbver > 2) {

         fclose(fp);

         redisLog(REDIS_WARNING,"Can't handle RDB format version %d",rdbver);

+        errno = EINVAL;

         return REDIS_ERR;

     }

@@ -1802,8 +1802,13 @@ int main(int argc, char **argv) {

         if (loadAppendOnlyFile(server.appendfilename) == REDIS_OK)

             redisLog(REDIS_NOTICE,"DB loaded from append only file: %.3f seconds",(float)(ustime()-start)/1000000);

     } else {

-        if (rdbLoad(server.dbfilename) == REDIS_OK)

-            redisLog(REDIS_NOTICE,"DB loaded from disk: %.3f seconds",(float)(ustime()-start)/1000000);

+        if (rdbLoad(server.dbfilename) == REDIS_OK) {

+            redisLog(REDIS_NOTICE,"DB loaded from disk: %.3f seconds",

+                (float)(ustime()-start)/1000000);

+        } else if (errno != ENOENT) {

+            redisLog(REDIS_WARNING,"Fatal error loading the DB. Exiting.");

+            exit(1);

+        }