CASSANDRA-2428

1. Symptom

Running the cleanup operation on a node configured with “join_ring=false” removes all data on that node. Cassandra started this operation automatically once it noticed that “join_ring” was set to false. This behavior is undocumented.

 

Category (in the spreadsheet):

data loss 

1.1 Severity

critical

1.2 Was there exception thrown? (Exception column in the spreadsheet)

no

1.2.1 Were there multiple exceptions?

no

 

1.3 Was there a long propagation of the failure?

no

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

entire fs (on the affected node)

 

Catastrophic? (spreadsheet column)

no (affects a subset of nodes whose data is replicated)

2. How to reproduce this failure

2.0 Version

0.7.5

2.1 Configuration

basic configuration but put the node under maintenance

 

# of Nodes?

1

2.2 Reproduction procedure

1. stop a node that has valid data (node stop)

2. set join_ring to false (config change)

3. start that node again with join_ring still set to false (node start)

 

Num triggering events

3

 

2.2.1 Timing order (Order important column)

yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

1

 

3. Diagnosis procedure

Error msg?

no

3.1 Detailed Symptom (where you start)

The problem started when the node ran out of disk space: too many SSTables had been created, so the node was unable to compact them. Before bringing the node back online, we set join_ring (the setting that controls whether the node joins the cluster) to false. This keeps live traffic away from the node while still allowing it to compact. When we brought the node back online (with “join_ring” still set to false) and ran cleanup, we noticed that the data on the node had been removed.

Note that this is not the intended use of “join_ring”. “join_ring” is supposed to always be true when there is valid data on the node. In this case, the admin used this hack to try to clean up the data. It’s an undocumented feature.

3.2 Backward inference

This is an unusual situation: join_ring is not normally switched to false on a node that has data on it, and the operation is not officially supported or documented. For now, all we can do is ask the developers for a temporary fix for this dangerous behavior. Until the official feature is implemented, there is no plan to fix the root cause.

 

4. Root cause

The admin was trying to use a feature that is not supported. The admin did not expect Cassandra to run clean-up automatically when “join_ring” is set to false.
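The following is a minimal sketch of why an empty range set is so destructive, assuming that cleanup keeps only the keys whose tokens fall inside the ranges the node owns. The names CleanupSketch, TokenRange and cleanup are hypothetical and are not taken from the Cassandra source.

```java
import java.util.ArrayList;
import java.util.List;

public class CleanupSketch {
    /** Hypothetical stand-in for a token range (start, end]; not Cassandra's own class. */
    static final class TokenRange {
        final long start;
        final long end;

        TokenRange(long start, long end) {
            this.start = start;
            this.end = end;
        }

        boolean contains(long token) {
            return token > start && token <= end;
        }
    }

    /** Keeps only the tokens covered by at least one owned range (assumed cleanup behavior). */
    static List<Long> cleanup(List<Long> storedTokens, List<TokenRange> ownedRanges) {
        List<Long> kept = new ArrayList<>();
        for (long token : storedTokens) {
            for (TokenRange range : ownedRanges) {
                if (range.contains(token)) {
                    kept.add(token);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> stored = List.of(10L, 20L, 30L);
        // A node that has joined the ring owns some range, so part of its data survives.
        System.out.println(cleanup(stored, List.of(new TokenRange(0, 25))));
        // A node that has not joined owns no ranges, so nothing survives.
        System.out.println(cleanup(stored, List.of()));
    }
}
```

With no owned ranges, no token can match, so every stored key is treated as not belonging to the node and is dropped.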

4.1 Category:

incorrect handling

4.2 Are there multiple faults?

no

4.3 Can we automatically test it?

yes

5. Fix

5.1 How?

The fix: if Cassandra detects that the node's set of ranges is empty (which is the case when join_ring=false), it does not run the cleanup procedure.

+        if (ranges.isEmpty())
+        {
+            logger.info("Cleanup cannot be ran before the node join the ring");
+            return;
+        }
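
The guard in the patch can be illustrated in isolation with the sketch below. It is not the actual Cassandra code: the class GuardedCleanup, the method cleanupIfJoined, and the representation of a range as a two-element long[] {start, end} are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class GuardedCleanup {
    /**
     * Hypothetical cleanup with the early return from the patch.
     * Each range is a {start, end} pair covering the tokens in (start, end].
     */
    static List<Long> cleanupIfJoined(List<Long> storedTokens, List<long[]> ranges) {
        if (ranges.isEmpty()) {
            // Mirrors the patch: a node with no ranges has not joined the ring,
            // so cleanup is skipped and the data is left untouched.
            System.out.println("Cleanup skipped: node has not joined the ring");
            return storedTokens;
        }
        List<Long> kept = new ArrayList<>();
        for (long token : storedTokens) {
            for (long[] range : ranges) {
                if (token > range[0] && token <= range[1]) {
                    kept.add(token);
                    break;
                }
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Long> stored = List.of(10L, 20L, 30L);
        // Empty range set: all three tokens are preserved.
        System.out.println(cleanupIfJoined(stored, List.of()));
        // Non-empty range set: normal cleanup applies.
        System.out.println(cleanupIfJoined(stored, List.of(new long[] {0, 25})));
    }
}
```

The design choice is to treat an empty range set as "ownership unknown" rather than "owns nothing", which is the safer reading for a node that holds data but has not joined the ring.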