[redis]github-141-2 Report

https://github.com/antirez/redis/issues/141

This ticket covers two failures caused by two separate bugs. This report is about the second failure.

1. Symptom

When multiple slaves connect to the master at the same time, the socket buffers are not created correctly. The slaves end up sharing buffers, which results in data loss on the slaves.

1.1 Severity

critical

1.2 Was there exception thrown?

1.2.1 Were there multiple exceptions?

1.3 Scope of the failure

One of the slaves

2. How to reproduce this failure

2.0 Version

redis-2.4.0-rc7

2.1 Configuration

Standard

2.2 Reproduction procedure

1. connect slave (add node)

2. connect another slave (add node)
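
The two steps above can be scripted. The following is a minimal sketch, not taken from the ticket: it assumes a master on 127.0.0.1:6379 and two slave instances already running on ports 6380 and 6381 (all three addresses are assumptions), and it sends the SLAVEOF command to both slaves at roughly the same moment. The helper names encode_command, send_command and attach_slaves are hypothetical.

```python
import socket
import threading


def encode_command(*args):
    """Encode a command in the Redis multi-bulk request format."""
    parts = [b"*%d\r\n" % len(args)]
    for arg in args:
        data = str(arg).encode()
        parts.append(b"$%d\r\n%s\r\n" % (len(data), data))
    return b"".join(parts)


def send_command(host, port, *args):
    """Send one command and return the first chunk of the raw reply."""
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(encode_command(*args))
        return conn.recv(4096)


def attach_slaves(master, slaves):
    """Issue SLAVEOF on every slave from its own thread, close together in time."""
    master_host, master_port = master
    threads = [
        threading.Thread(
            target=send_command,
            args=(host, port, "SLAVEOF", master_host, master_port),
        )
        for host, port in slaves
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()


if __name__ == "__main__":
    # Assumed layout: one master and two slaves on the local machine.
    attach_slaves(("127.0.0.1", 6379), [("127.0.0.1", 6380), ("127.0.0.1", 6381)])
```

After both slaves finish syncing, comparing their key sets against the master should show whether one of them missed data. Whether this timing is tight enough to trigger the failure on a given machine is not guaranteed.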

2.2.1 Timing order

In this order

2.2.2 Events order externally controllable?

Yes

2.3 Can the logs tell how to reproduce the failure?

Yes.

2.4 How many machines needed?

3 (master + 2 slaves)

3. Diagnosis procedure

3.1 Detailed Symptom (where you start)

Data loss on a slave, with no clear evidence of the cause.

3.2 Backward inference

Backward inference is hard because there is not much evidence.

4. Root cause

The socket buffer wasn’t created correctly for the 2nd and later slaves. The buffers were all messed up...
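
To illustrate why shared buffers lose data, here is a toy model. It is not Redis code and does not claim to reproduce the actual bug: it only assumes that the master queues each command once per slave and that each slave connection drains whatever is in the buffer it points to. The function replicate and its share_buffer flag are hypothetical.

```python
from collections import deque


def replicate(commands, share_buffer):
    """Return what each of two simulated slaves receives from the master."""
    first = deque()
    # Hypothetical fault: the second slave aliases the first slave's buffer
    # instead of getting a buffer of its own.
    second = first if share_buffer else deque()
    buffers = [first, second]
    received = [[], []]

    for cmd in commands:
        # The master queues the command once per attached slave.
        for buf in buffers:
            buf.append(cmd)
        # Each slave connection then drains the buffer it points to.
        for index, buf in enumerate(buffers):
            while buf:
                received[index].append(buf.popleft())
    return received


if __name__ == "__main__":
    cmds = ["SET a 1", "SET b 2"]
    print("private buffers:", replicate(cmds, share_buffer=False))
    print("shared buffer:  ", replicate(cmds, share_buffer=True))
```

With private buffers both slaves receive every command once. With a shared buffer, in this model, the first slave receives every command twice and the second receives nothing, which is consistent with data loss confined to one of the slaves.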

4.1 Category:

Semantic