CASSANDRA-6638

Background knowledge:

http://www.datastax.com/dev/blog/virtual-nodes-in-cassandra-1-2

1. Symptom

SSTableScanner can skip rows when vnodes are enabled. SSTableScanner is used by SSTableReader, which in turn is used by any task that reads an SSTable, including user reads.

To a user, the skipped data cannot be read, so the failure effectively looks like data loss.

 

Category (in the spreadsheet):

Data loss

1.1 Severity

Blocker

1.2 Was there exception thrown? (Exception column in the spreadsheet)

yes

 

1.2.1 Were there multiple exceptions?

no

 

1.3 Was there a long propagation of the failure?

no

 

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

Single file. The failure affects the core of Cassandra, so apparent data loss is to be expected. Its severity depends on how many vnodes and nodes there are: the more vnodes and nodes, the higher the percentage of data loss.

 

Catastrophic? (spreadsheet column)

no

 

2. How to reproduce this failure

2.0 Version

2.0.1

2.1 Configuration

Must be configured using vnodes

 

# of Nodes?

1

2.2 Reproduction procedure

There are many ways to trigger it: any operation that invokes the Scanner will do.

For example:

1. insert some data

2. force a flush (so the data is written to an on-disk SSTable)

3. read the data

The read will then fail.

Num triggering events

2

 

2.2.1 Timing order (Order important column)

Yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

2

3. Diagnosis procedure

Error msg?

yes

3.1 Detailed Symptom (where you start)

SSTableScanner does not support vnodes properly: under certain conditions, keys in the ring space are skipped.

3.2 Backward inference

Looking through the SSTableScanner code, we noticed that the developer made a wrong assumption, namely that the scanner is already positioned at the start of the SSTable. Because of the way vnodes are designed, ifile.seek(0) and dfile.seek(0) must be called explicitly to be at the real beginning of the SSTable.

 

4. Root cause

4.1 Category:

Semantic

4.2 Are there multiple faults?

no

4.3 Can we automatically test it?

Yes, but only with a dedicated test case. It is hard to test for in a general way.

5. Fix

5.1 How?

The fix corrects the developer’s wrong assumption. When indexPosition == -1, ifile.seek(0) and dfile.seek(0) must additionally be performed to guarantee that we are at the beginning of the SSTable.
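
The effect of the missing seek can be illustrated with a minimal, self-contained sketch. This is not Cassandra code: ScannerSeekSketch, ToyScanner, Range, indexScanPosition and the seekToZero flag are hypothetical names, the "SSTable" is a sorted array of tokens, and a single integer cursor stands in for the ifile and dfile pointers. The sketch also assumes that the cursor leaves position 0 because the scanner consumes a row while detecting the end of the previous range; the real trigger conditions may differ.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: all names here are hypothetical, not Cassandra classes.
final class ScannerSeekSketch {

    /** Token range (left, right]: left exclusive, right inclusive. */
    static final class Range {
        final long left;
        final long right;

        Range(long left, long right) {
            this.left = left;
            this.right = right;
        }
    }

    /** Stand-in for a scanner over one SSTable holding sorted row tokens. */
    static final class ToyScanner {
        private final long[] rows;        // sorted tokens stored in the "SSTable"
        private final boolean seekToZero; // true models the behaviour after the fix
        private int position = 0;         // stand-in for the ifile/dfile pointers

        ToyScanner(long[] rows, boolean seekToZero) {
            this.rows = rows;
            this.seekToZero = seekToZero;
        }

        /** Returns -1 when 'left' precedes every row, else the index of the first row after 'left'. */
        private int indexScanPosition(long left) {
            if (rows.length == 0 || left < rows[0]) {
                return -1;
            }
            int i = 0;
            while (i < rows.length && rows[i] <= left) {
                i++;
            }
            return i;
        }

        private void seekToRangeStart(Range range) {
            int indexPosition = indexScanPosition(range.left);
            if (indexPosition == -1) {
                // Without the fix, the code assumes the cursor is already at 0.
                if (seekToZero) {
                    position = 0; // analogue of ifile.seek(0) and dfile.seek(0)
                }
                return;
            }
            position = indexPosition;
        }

        /** Scans the ranges in order and returns every row token that falls in one of them. */
        List<Long> scan(List<Range> ranges) {
            List<Long> result = new ArrayList<>();
            for (Range range : ranges) {
                seekToRangeStart(range);
                while (position < rows.length) {
                    long token = rows[position++]; // reading a row advances the cursor
                    if (token > range.right) {
                        break; // past the end of this range; the row was already consumed
                    }
                    if (token > range.left) {
                        result.add(token);
                    }
                }
            }
            return result;
        }
    }
}
```

With rows 50, 60, 70 and the two ranges (0, 10] and (20, 100], both range starts precede the first row, so indexScanPosition returns -1 twice. In this toy model the scanner without the seek has already consumed row 50 while finishing the first range and never returns it, whereas the scanner with the seek rewinds to 0 and returns all three rows.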