CASSANDRA-2773

1. Symptom

When a client executes an unsupported operation (an insert and a delete in the same mutation), Cassandra blocks the connection and appears dead. Only removing the commit log and restarting restores the cluster.

 

Category (in the spreadsheet):

Hang

1.1 Severity

critical

1.2 Was there exception thrown? (Exception column in the spreadsheet)

yes, but only after restarting Cassandra.

UnsupportedOperationException: "Index manager cannot support deleting and inserting into a row in the same mutation"

1.2.1 Were there multiple exceptions?

no

 

1.3 Was there a long propagation of the failure?

no

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

all clients

 

Catastrophic? (spreadsheet column)

yes. Cassandra stops responding, and a restart alone does not fix the problem.

 

2. How to reproduce this failure

2.0 Version

0.7.7

2.1 Configuration

standard configuration, using Hector as the client

 

# of Nodes?

1

2.2 Reproduction procedure

1. Create a mutator using the Hector API (feature start)

2. Add insertions of a few columns to the mutator for key "key1", cf "standard". (file write)

3. Add a deletion to the mutator that deletes the row "key1", cf "standard". (file write)

4. Execute the mutator (feature start)

Num triggering events

4

 

2.2.1 Timing order (Order important column)

yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

1

3. Diagnosis procedure

Error msg?

yes

3.1 Detailed Symptom (where you start)

Through Hector, the user sends an unsupported command (both an insert and a delete in one mutation) to Cassandra. Cassandra stops responding. A restart does not fix the problem unless the commit log is deleted.
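The restart behaviour described above can be illustrated with a small model. This is only a sketch under a stated assumption: that the mutation is recorded in the commit log before the index check runs, which is inferred from the symptom that deleting the commit log is required. ReplayModel, Mutation and every method name are hypothetical and are not Cassandra classes. The model reproduces the repeated exception on restart, not the hang itself.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified model of the symptom; not Cassandra code.
final class ReplayModel {

    /** A mutation that may carry column inserts and a row-level delete at once. */
    static final class Mutation {
        final boolean hasInsert;
        final boolean hasRowDelete;

        Mutation(boolean hasInsert, boolean hasRowDelete) {
            this.hasInsert = hasInsert;
            this.hasRowDelete = hasRowDelete;
        }
    }

    // Stand-in for the commit log: its contents survive a restart.
    private final List<Mutation> commitLog = new ArrayList<Mutation>();

    /** Pre-fix index check, using the message quoted in the report. */
    static void applyToIndex(Mutation m) {
        if (m.hasInsert && m.hasRowDelete) {
            throw new UnsupportedOperationException(
                "Index manager cannot support deleting and inserting into a row in the same mutation");
        }
    }

    /** Write path. ASSUMPTION: the mutation is logged before it is applied. */
    void write(Mutation m) {
        commitLog.add(m);
        applyToIndex(m);
    }

    /** Restart: every logged mutation is replayed, including the bad one. */
    void restart() {
        for (Mutation m : commitLog) {
            applyToIndex(m);
        }
    }

    /** The manual recovery step named in the report. */
    void deleteCommitLog() {
        commitLog.clear();
    }

    /** Helper: runs an action and reports whether it hit the unsupported case. */
    static boolean fails(Runnable action) {
        try {
            action.run();
            return false;
        } catch (UnsupportedOperationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        final ReplayModel node = new ReplayModel();
        final Mutation insertAndDelete = new Mutation(true, true);

        System.out.println("write fails:   " + fails(() -> node.write(insertAndDelete)));
        System.out.println("restart fails: " + fails(() -> node.restart()));
        node.deleteCommitLog();
        System.out.println("restart after deleting the log fails: " + fails(() -> node.restart()));
    }
}
```

In this model the bad mutation stays in the log, so each restart replays it and hits the same exception until the log is removed.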

3.2 Backward inference

No backward inference is needed: the operation is unsupported, and the exception message says so directly. We also found a race condition between the delete and the insert; the developer's fix works around the problem.

 

4. Root cause

A race condition between the insert and the delete in the same mutation. The index manager does not handle this combination and throws an UnsupportedOperationException.

4.1 Category:

incorrect handling (you can detect the failure once you trigger the handler -- statement coverage is enough)

4.2 Are there multiple faults?

no

4.3 Can we automatically test it?

yes

5. Fix

5.1 How?

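Instead of throwing an exception, the fix handles the insert and the delete separately when updating the index. If the new column's timestamp is less than or equal to the row tombstone's timestamp, the column is treated as if it did not exist; otherwise the index update proceeds as usual.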

            if (newColumn != null && cf.isMarkedForDelete())
-                throw new UnsupportedOperationException("Index manager cannot support deleting and inserting into a row in the same mutation");
+            {
+                // row is marked for delete, but column was also updated.  if column is timestamped less than
+                // the row tombstone, treat it as if it didn't exist.  Otherwise we don't care about row
+                // tombstone for the purpose of the index update and we can proceed as usual.
+                if (newColumn.timestamp() <= cf.getMarkedForDeleteAt())
+                {
+                    // don't remove from the cf object; that can race w/ CommitLog write.  Leaving it is harmless.
+                    newColumn = null;
+                }
+            }
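The decision made by the patched branch can be restated as a small, self-contained sketch. IndexUpdateSketch and shouldIndex are hypothetical names introduced for illustration, not Cassandra's API; the three parameters stand in for newColumn.timestamp(), cf.isMarkedForDelete() and cf.getMarkedForDeleteAt() from the diff.

```java
// Hypothetical stand-in for the patched decision; not Cassandra's API.
final class IndexUpdateSketch {

    /**
     * Returns true when the updated column should still be applied to the index.
     *
     * @param columnTimestamp timestamp of the newly written column
     * @param rowDeleted      whether the row carries a tombstone
     * @param rowTombstoneAt  timestamp of the row tombstone (ignored if rowDeleted is false)
     */
    static boolean shouldIndex(long columnTimestamp, boolean rowDeleted, long rowTombstoneAt) {
        if (!rowDeleted) {
            return true; // no row tombstone: ordinary index update
        }
        // A column written at or before the tombstone is shadowed by it,
        // so it is treated as if it did not exist (newColumn = null in the patch).
        return columnTimestamp > rowTombstoneAt;
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex(5L, true, 10L));   // false: column older than tombstone
        System.out.println(shouldIndex(10L, true, 10L));  // false: equal timestamps, tombstone wins
        System.out.println(shouldIndex(11L, true, 10L));  // true: column newer than tombstone
        System.out.println(shouldIndex(5L, false, 10L));  // true: row not deleted
    }
}
```

As in the patch, the comparison is inclusive (<=), so a column with the same timestamp as the row tombstone is skipped.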

6. Any interesting findings?

Watch out for unsupported or undefined operations: an exception thrown for one can take down the whole program.

 

7. Scope of the failure

all clients