CASSANDRA-1235

1. Symptom

When running the test, inserting columns individually works with the generated values. However, a batch insert with the same values causes an encoding failure of the key, and some data may be corrupted.

 

Category (in the spreadsheet):

wrong computation

 

1.1 Severity

Critical

1.2 Was there exception thrown? (Exception column in the spreadsheet)

No

 

1.2.1 Were there multiple exceptions?

No

1.3 Was there a long propagation of the failure?

No. Very short propagation.

 

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

Single file. The failure affects only batch inserts, whose keys are stored with corrupted values.

 

Catastrophic? (spreadsheet column)

no

 

2. How to reproduce this failure

2.0 Version

0.6.5

2.1 Configuration

Standard default configuration

 

# of Nodes?

1

2.2 Reproduction procedure

  1. Define an item whose key is the raw bytes of a UUID. E.g.

t1 = TestItem({'test':'foo'})

t1.key = TestItemKey(uuid.UUID('936a87e2-a5fc-11df-82c1-000c29f73b23').bytes)

  2. Save the item. E.g. t1.save()
  3. Load the item by its key and print it. E.g. print TestItem().load(t1.key.clone())

Num triggering events

3

2.2.1 Timing order (Order important column)

NA

2.2.2 Events order externally controllable? (Order externally controllable? column)

Yes. With the right input, the failure can be reproduced deterministically.

2.3 Can the logs tell how to reproduce the failure?

Yes, the log shows exactly how to reproduce it.

2.4 How many machines needed?

1

2.5 How hard is the reproduction?

Reproduction is easy. Initialize, save, print.

 

3. Diagnosis procedure

Error msg?

No. There is no error message.

3.1 Detailed Symptom (where you start)

Keys saved are corrupted.

3.2 Backward inference

Not all keys are corrupted. Keys are corrupted only when they are batch inserted; the same values inserted individually are stored correctly. The batch path causes an encoding failure on the keys: bytes are dropped from the end of the byte array that represents the key value.
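The fix in section 5.1 points at key.trim as the cause. Assuming this refers to Java's String.trim(), which strips leading and trailing characters with code points at or below U+0020, the byte loss can be sketched in Python as follows. The helper java_style_trim, the sample key, and the one-character-per-byte (latin-1) mapping are illustrative assumptions, not taken from the Cassandra source or from the failing test.

```python
def java_style_trim(text):
    """Emulate Java's String.trim(): strip chars <= U+0020 from both ends."""
    start, end = 0, len(text)
    while start < end and ord(text[start]) <= 0x20:
        start += 1
    while end > start and ord(text[end - 1]) <= 0x20:
        end -= 1
    return text[start:end]


# Hypothetical 16-byte key whose last two bytes (0x0a, 0x00) are control bytes.
raw_key = bytes.fromhex("936a87e2a5fc11df82c1000c29f70a00")

# Assumption: treat each byte as one character (latin-1), for illustration only.
as_text = raw_key.decode("latin-1")
trimmed = java_style_trim(as_text).encode("latin-1")

print(len(raw_key), len(trimmed))  # 16 14
print(trimmed == raw_key[:14])     # True: only the trailing bytes are lost
```

Control bytes in the middle of the key (0x00 and 0x0c here) survive; only the ones at the edges are removed, which is consistent with some keys being affected while others are not.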

3.3 How hard is the diagnosis?

The diagnosis is easy. The corruption is easy to observe and diagnose because the key is not truly corrupted: bytes are simply dropped from the end of its byte array. From there, a solution is easily found.

 

4. Root cause

4.1 Category:

Semantic

4.2 Are there multiple faults?

No, only 1 fault

4.3 Can we automatically test it?

Yes

5. Fix

5.1 How?

The fix is simple. When performing the row mutation, pass key as the parameter instead of key.trim, so that the key is stored unmodified.
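The effect of the change can be sketched with an in-memory stand-in for the store. This is a minimal Python illustration, not the actual Cassandra code: the function names, the dict-based store, and the sample key are hypothetical, and the trim is assumed to behave like Java's String.trim() (stripping characters at or below U+0020 from both ends).

```python
# Characters that Java's String.trim() strips from either end (U+0000..U+0020).
TRIMMED_CHARS = "".join(chr(c) for c in range(0x21))


def batch_insert_before_fix(store, key, columns):
    # Hypothetical model of the faulty path: the row is stored under the trimmed key.
    store[key.strip(TRIMMED_CHARS)] = dict(columns)


def batch_insert_after_fix(store, key, columns):
    # Hypothetical model of the fixed path: the key is passed through unchanged.
    store[key] = dict(columns)


# Illustrative key ending in two control characters.
key = "row-key\x0c\x00"
before, after = {}, {}
batch_insert_before_fix(before, key, {"test": "foo"})
batch_insert_after_fix(after, key, {"test": "foo"})

print(key in before)  # False: the row was stored under "row-key"
print(key in after)   # True: a lookup with the original key succeeds
```

A round-trip check of this kind (insert with a key that has control bytes at its edges, then read back with the same key) is one way the failure could be tested automatically.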