CASSANDRA-3540

1. Symptom

After upgrading from a pre-1.0.0 release to 1.0.0+, starting a Cassandra node fails with a java.lang.RuntimeException.

 

Category (in the spreadsheet):

early termination

1.1 Severity

critical

1.2 Was there an exception thrown? (Exception column in the spreadsheet)

yes.

java.lang.RuntimeException: Cannot open /var/lib/cassandra/data/Index/AttractionLocationCategoryDateIdx.AttractionLocationCategoryDateIdx_09partition_idx-h-1 because partitioner does not match org.apache.cassandra.dht.LocalPartitioner

1.2.1 Were there multiple exceptions?

no

 

1.3 Was there a long propagation of the failure?

no

 

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

all the upgraded nodes

 

Catastrophic? (spreadsheet column)

no (although it affects the entire cluster, it happens during an upgrade, so the cluster is likely not providing production service)

 

2. How to reproduce this failure

2.0 Version

1.0.6

2.1 Configuration

Upgraded from a pre-1.0.0 release (data files older than version hc) to a 1.0.0+ release of Cassandra.

 

# of Nodes?

1

2.2 Reproduction procedure

1. upgrade version (config change)

2. start cassandra after upgrading (start node)

Num triggering events

2

 

2.2.1 Timing order (Order important column)

yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

1

3. Diagnosis procedure

Error msg?

yes

3.1 Detailed Symptom (where you start)

After upgrading to 1.0.0+ from pre-1.0.0, we get java.lang.RuntimeException: Cannot open /var/lib/cassandra/data/Index/AttractionLocationCategoryDateIdx.AttractionLocationCategoryDateIdx_09partition_idx-h-1 because partitioner does not match org.apache.cassandra.dht.LocalPartitioner

3.2 Backward inference

From background knowledge, we know that the Partitioner component changed in 1.0.0 and above: the new release introduced a partitioner for secondary indexes. Since there was no other major change, it is natural to suspect that the error comes from this new feature. Looking into the code, we found that Cassandra fails when it tries to retrieve the partitioner type recorded for the secondary index. Data written by previous Cassandra versions does not contain this value, so Cassandra does not know what to do and raises the exception.
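As a diagnosis aid, the version of the failing file can be read from the name in the exception. The sketch below is illustrative only and is not Cassandra's code: the class and method names are hypothetical, the "-<version>-<generation>" suffix layout is inferred from the file name in the error message, and treating "hc" as the first version that records the partitioner is an assumption drawn from the configuration note in 2.1.

```java
// Illustrative sketch only; names are hypothetical, not Cassandra's API.
class SstableVersionSketch {

    // Assumption: the descriptor name ends in "-<version>-<generation>",
    // as in "..._09partition_idx-h-1" from the exception message.
    static String versionOf(String descriptor) {
        String[] parts = descriptor.split("-");
        if (parts.length < 3) {
            throw new IllegalArgumentException("Unexpected descriptor name: " + descriptor);
        }
        return parts[parts.length - 2];
    }

    // Assumption: files at version "hc" or later record the partitioner;
    // versions are compared lexicographically, so "h" sorts before "hc".
    static boolean recordsPartitioner(String version) {
        return version.compareTo("hc") >= 0;
    }

    public static void main(String[] args) {
        String descriptor = "/var/lib/cassandra/data/Index/"
                + "AttractionLocationCategoryDateIdx."
                + "AttractionLocationCategoryDateIdx_09partition_idx-h-1";
        String version = versionOf(descriptor);
        System.out.println(version + " records partitioner: " + recordsPartitioner(version));
    }
}
```

Under these assumptions, the file in the error message is at version h, which predates hc, so it carries no recorded partitioner.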

 

4. Root cause

Cassandra tries to access a value that exists in data written by the new version but not in data written by older versions.

4.1 Category:

Incorrect Handling

4.2 Are there multiple faults?

no

4.3 Can we automatically test it?

yes

5. Fix

5.1 How?

When the partitioner type does not exist because the data was written by an older Cassandra release, return the value as NULL instead of failing.
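The idea of the fix can be sketched as a null-tolerant check. This is a minimal sketch, not the actual patch: the Metadata class and the validate method are hypothetical stand-ins, and only the exception message format is taken from the log above.

```java
// Minimal sketch of a null-tolerant partitioner check; not the actual patch.
final class PartitionerCheckSketch {

    // Hypothetical stand-in for per-file metadata. The partitioner is null
    // when the file was written by a release that did not record it.
    static final class Metadata {
        final String partitioner;

        Metadata(String partitioner) {
            this.partitioner = partitioner;
        }
    }

    // Rejects a file only when a partitioner was recorded and it differs
    // from the expected one; a missing (null) value is accepted.
    static void validate(String file, Metadata metadata, String expected) {
        if (metadata.partitioner != null && !expected.equals(metadata.partitioner)) {
            throw new RuntimeException(String.format(
                    "Cannot open %s because partitioner does not match %s", file, expected));
        }
    }

    public static void main(String[] args) {
        String expected = "org.apache.cassandra.dht.LocalPartitioner";

        // File from an older release: nothing recorded, so the check is skipped.
        validate("legacy_idx-h-1", new Metadata(null), expected);

        // File that recorded the expected partitioner: accepted.
        validate("new_idx-hc-1", new Metadata(expected), expected);

        // File that recorded a different partitioner: still rejected.
        try {
            validate("other_idx-hc-2",
                    new Metadata("org.apache.cassandra.dht.RandomPartitioner"), expected);
        } catch (RuntimeException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

With this shape, files from older releases open normally after the upgrade, while a genuine mismatch in newer files is still reported.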