CASSANDRA-4205

1. Symptom

On a system where users frequently overwrite existing rows/columns, read and write performance is noticeably worse than expected after upgrading Cassandra from 0.7.9 to 1.0.7.

 

Category (in the spreadsheet):

Performance

1.1 Severity

Critical

1.2 Was there exception thrown? (Exception column in the spreadsheet)

No

 

1.2.1 Were there multiple exceptions?

No        

 

1.3 Was there a long propagation of the failure?

No

 

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

Single file with single client

 

Catastrophic? (spreadsheet column)

no  

2. How to reproduce this failure

2.0 Version

1.1.1

2.1 Configuration

Replication factor must be set greater than 1.

 

# of Nodes?

1

2.2 Reproduction procedure

1. Update row on Cassandra 0.7.9 (file write)

2. Upgrade Cassandra 0.7.9 to 1.0.7 (feature start)

3. Read the updated row on Cassandra 1.0.7 (file read). A client-side sketch of steps 1 and 3 is shown below.
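A minimal sketch of the client side of steps 1 and 3, assuming the legacy Thrift interface on port 9160 with framed transport; the keyspace "Test", column family "Users", and row/column names are hypothetical and only illustrate the overwrite-then-read workload (flushes between the overwrites, e.g. via nodetool flush, are what spread the row across several SSTables):

```java
import java.nio.ByteBuffer;
import org.apache.cassandra.thrift.*;
import org.apache.thrift.protocol.TBinaryProtocol;
import org.apache.thrift.transport.TFramedTransport;
import org.apache.thrift.transport.TSocket;
import org.apache.thrift.transport.TTransport;

public class OverwriteThenRead {
    public static void main(String[] args) throws Exception {
        // Assumes a single node listening on the Thrift port with framed transport.
        TTransport transport = new TFramedTransport(new TSocket("127.0.0.1", 9160));
        transport.open();
        Cassandra.Client client = new Cassandra.Client(new TBinaryProtocol(transport));
        client.set_keyspace("Test");                        // hypothetical keyspace

        ByteBuffer key  = ByteBuffer.wrap("row1".getBytes("UTF-8"));
        ByteBuffer name = ByteBuffer.wrap("col1".getBytes("UTF-8"));

        // Step 1 (run against 0.7.9): overwrite the same column several times,
        // flushing in between, so the row ends up in more than one SSTable.
        for (int i = 0; i < 3; i++) {
            Column col = new Column()
                    .setName(name)
                    .setValue(ByteBuffer.wrap(("value-" + i).getBytes("UTF-8")))
                    .setTimestamp(System.currentTimeMillis() * 1000 + i);
            client.insert(key, new ColumnParent("Users"), col, ConsistencyLevel.ONE);
        }

        // Step 3 (run after the upgrade to 1.0.7): read the column back, then
        // check "SSTables per read" with nodetool cfhistograms.
        ColumnPath path = new ColumnPath("Users").setColumn(name);
        ColumnOrSuperColumn result = client.get(key, path, ConsistencyLevel.ONE);
        System.out.println(new String(result.getColumn().getValue(), "UTF-8"));

        transport.close();
    }
}
```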

 

Num triggering events

3

 

2.2.1 Timing order (Order important column)

yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

Yes. When reading the updated column, nodetool cfhistograms shows Cassandra touching two SSTables.

2.4 How many machines needed?

1

 

3. Diagnosis procedure

Error msg?

no

3.1 Detailed Symptom (where you start)

Read and write performance is lower than expected after the upgrade.

3.2 Backward inference

Because the drop in performance was noticed immediately after the upgrade, the read I/O statistics from nodetool cfhistograms are a useful starting point. They show that a read of an updated row touches multiple SSTables, i.e. several on-disk versions of the row are fetched and compared. This is unnecessary work: each SSTable normally records the max timestamp (newest write time) of its contents, and that metadata lets Cassandra stop at the SSTable holding the newest version of the row instead of also reading the older ones. Tracing the sequence of events shows that this max timestamp metadata is missing from SSTables written before the upgrade to the newer Cassandra version. As a result, every read of an updated row must fetch the row from each SSTable that contains it and compare timestamps before returning the newest copy.

Note: Cassandra keeps obsolete versions of overwritten columns, and tombstones for deleted data, on disk until compaction removes them. If the max timestamp metadata is missing, reads cannot skip these either, so Cassandra may have to read and compare tombstones in addition to the old column versions, which intensifies the existing problem.
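To make the inference concrete, below is a minimal, self-contained model of the read-path optimization described above. It is only an illustration of the idea, not Cassandra's actual read path; the class and field names (SSTableModel, ReadPathModel, etc.) are invented for the sketch. SSTables are visited newest-first by their recorded max timestamp, and the scan stops as soon as the value already found is at least as new as anything the remaining SSTables could contain.

```java
import java.util.Comparator;
import java.util.List;

// Simplified model of a per-SSTable summary; not Cassandra's actual classes.
class SSTableModel {
    final long maxTimestamp;       // newest timestamp recorded in this SSTable's metadata
    final Long columnTimestamp;    // timestamp of the requested column, or null if absent
    final String columnValue;

    SSTableModel(long maxTimestamp, Long columnTimestamp, String columnValue) {
        this.maxTimestamp = maxTimestamp;
        this.columnTimestamp = columnTimestamp;
        this.columnValue = columnValue;
    }
}

// Simplified model of a timestamp-based read short-circuit.
class ReadPathModel {
    static int sstablesTouched;

    static String read(List<SSTableModel> sstables) {
        // Visit SSTables newest-first by their recorded max timestamp.
        sstables.sort(Comparator.comparingLong((SSTableModel s) -> s.maxTimestamp).reversed());

        sstablesTouched = 0;
        long bestTimestamp = Long.MIN_VALUE;
        String bestValue = null;

        for (SSTableModel s : sstables) {
            // Short-circuit: nothing in the remaining SSTables can be newer
            // than the value we already hold.
            if (bestValue != null && bestTimestamp >= s.maxTimestamp) {
                break;
            }
            sstablesTouched++;
            if (s.columnTimestamp != null && s.columnTimestamp > bestTimestamp) {
                bestTimestamp = s.columnTimestamp;
                bestValue = s.columnValue;
            }
        }
        return bestValue;
    }
}
```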

 

4. Root cause

The max timestamp metadata is not carried over from SSTables written by the previous version of Cassandra. Without it, a read cannot identify the SSTable holding the newest version of a row up front, so it must fetch the row from every SSTable that contains it and compare timestamps before returning the newest copy. This extra SSTable I/O and comparison work on every read lowers performance.
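Using the model from the previous sketch, the effect of the missing metadata can be demonstrated directly. The timestamps and values below are made up; the point is only that a conservative "unknown" max timestamp (modelled here as Long.MAX_VALUE) disables the short-circuit and forces the read to touch every SSTable holding the row.

```java
import java.util.ArrayList;
import java.util.List;

class RootCauseDemo {
    public static void main(String[] args) {
        // Two SSTables holding versions of the same column: the newer one was
        // written after the upgrade, the older one before it.
        long oldTs = 1_000L, newTs = 2_000L;

        // Case 1: both SSTables carry an accurate max timestamp.
        List<SSTableModel> accurate = new ArrayList<>();
        accurate.add(new SSTableModel(newTs, newTs, "new value"));
        accurate.add(new SSTableModel(oldTs, oldTs, "old value"));
        ReadPathModel.read(accurate);
        System.out.println("accurate metadata: touched " + ReadPathModel.sstablesTouched); // 1

        // Case 2: the pre-upgrade SSTable has no recorded max timestamp, so the
        // reader must assume it could contain the newest data.
        List<SSTableModel> missing = new ArrayList<>();
        missing.add(new SSTableModel(newTs, newTs, "new value"));
        missing.add(new SSTableModel(Long.MAX_VALUE, oldTs, "old value"));
        ReadPathModel.read(missing);
        System.out.println("missing metadata:  touched " + ReadPathModel.sstablesTouched); // 2
    }
}
```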

4.1 Category:

semantic

4.2 Can we automatically test it?

yes

5. Fix

5.1 How?

The patch makes sure the max timestamp value is carried over to the upgraded Cassandra version for all SSTable content (including tombstones), so reads can again skip SSTables that hold only older data.
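As a rough illustration of the idea behind the fix (again against the model above, not the actual patch), the writer-side metadata collection can be thought of as tracking the max timestamp over every cell it serializes, tombstones included, so that the value persisted with the SSTable is accurate:

```java
// Sketch of write-time tracking of the max timestamp, including tombstones.
// Names are illustrative; this is not the actual Cassandra patch.
class MetadataCollectorModel {
    private long maxTimestamp = Long.MIN_VALUE;

    // Called for every live column written into the SSTable.
    void updateWithColumn(long columnTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, columnTimestamp);
    }

    // Tombstones (deletion markers) carry timestamps too; skipping them here
    // is exactly what would leave the recorded max timestamp inaccurate.
    void updateWithTombstone(long deletionTimestamp) {
        maxTimestamp = Math.max(maxTimestamp, deletionTimestamp);
    }

    long finish() {
        return maxTimestamp;   // persisted alongside the SSTable's metadata
    }
}
```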