[redis]github-607 Report
The entire cluster crashes.
https://github.com/antirez/redis/issues/607
Severe (catastrophic, since the entire cluster can crash).
Yes. Crash.
Yes. Before the crash, a command returned an error:
redis 127.0.0.1:6379> MGET foo foobar
(error) ERR Multi keys request invalid in cluster
This error is where things start to go wrong.
A single occurrence brings down one node in the cluster. However, because the client keeps retrying the same command until it gets a result, this sequence of commands eventually brings down the entire cluster. Catastrophic.
2.9.7
Standard
0. Start a cluster with three nodes.
0. SET foobar test
1. redis 127.0.0.1:6379> MGET foo foobar
(error) ERR Multi keys request invalid in cluster
2. redis 127.0.0.1:6379> GETRANGE foobar 0 1
(error) MOVED 1650 127.0.0.1:6380
3. redis 127.0.0.1:6379> GETRANGE foobar 0 1
(error) ERR unknown command ''
4. redis 127.0.0.1:6379> GETRANGE foobar 0 1
Could not connect to Redis at 127.0.0.1:6379: Connection refused
It is also possible that the first GETRANGE command fails immediately after the “MGET” command.
In this order.
Yes.
The client log shows it.
2 to start the cluster
The reporter provided the crash report:
=== REDIS BUG REPORT START: Cut & paste starting from here ===
[21541] 29 Jul 10:52:23.419 # Redis 2.9.7 crashed by signal: 11
[21541] 29 Jul 10:52:23.419 # Failed assertion: <no assertion failed> (<no file>:0)
[21541] 29 Jul 10:52:23.419 # --- STACK TRACE
./redis-server(logStackTrace+0x71)[0x8083ad1]
./redis-server(decrRefCount+0x8)[0x8067e38]
[0x52440c]
./redis-server(decrRefCount+0x8)[0x8067e38]
./redis-server[0x8064121]
./redis-server(resetClient+0xf)[0x8064cff]
./redis-server(processInputBuffer+0x53)[0x8066083]
./redis-server(readQueryFromClient+0x9d)[0x806618d]
./redis-server(aeProcessEvents+0x140)[0x8057ac0]
./redis-server(aeMain+0x2c)[0x8057dbc]
./redis-server(main+0x299)[0x8056c99]
/lib/i386-linux-gnu/libc.so.6(__libc_start_main+0xf3)[0x129113]
./redis-server[0x8056e09]
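The problem is that the first time the command failed, the error-handling path did not handle the failure correctly: it performed an extra “decrRefCount” on the first key object, which is still owned by the client's argument vector. When resetClient later frees the arguments, decrRefCount runs on an already-released object, which matches the stack trace above.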
/* If it is not the first key, make sure it is exactly
 * the same key as the first we saw. */
if (!equalStringObjects(firstkey,margv[keyindex[j]])) {
    /* Error-handling code; the fix deletes the line marked "-". */
-   decrRefCount(firstkey);
    getKeysFreeResult(keyindex);
    return NULL;
}
The first time MGET returns an error, the server incorrectly decrements the key's reference count.
Incorrect exception handling.