CASSANDRA-2613

1. Symptom

When a user tries to delete one or more rows using CQL (Cassandra Query Language), the deletion fails if any column in those rows has a NULL value.

Similarly, if a row contains a NULL value, the user cannot retrieve that row with a CQL query, nor can they use the Range() function on it.

 

Category (in the spreadsheet):

early termination

1.1 Severity

Critical (new feature in version 0.8.0)

1.2 Was there exception thrown? (Exception column in the spreadsheet)

Yes.

assert ['kd', None, None] == r, r - AssertionError: [u'kd'] and AssertionError: [u'kc']

assert ['Row Key', 'ca1', 'col', 'cd1'] == [col_dscptn[0] for col_dscptn in d], d - AssertionError: [('Row Key', 'org.apache.cassandra.db.marshal.UTF8Type', None, None, None, None, None, False), ('col', 'org.apache.cassandra.db.marshal.AsciiType', None, None, None, None, True), ('cd1', 'org.apache.cassandra.db.marshal.AsciiType', None, None, None, None, True)]

 

1.2.1 Were there multiple exceptions?

Yes. Multiple exceptions can come from different operations: deleting a column from a row, deleting columns from multiple rows, deleting entire rows, retrieving multiple rows, and range operations.

 

1.3 Was there a long propagation of the failure?

No.

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

Single file. The CQL API component is affected.

 

Catastrophic? (spreadsheet column)

no

 

2. How to reproduce this failure

2.0 Version

0.8.0

2.1 Configuration

Must use CQL (Cassandra Query Language).

 

# of Nodes?

1

2.2 Reproduction procedure

1. CREATE COLUMNFAMILY users (KEY varchar PRIMARY KEY, password varchar, gender varchar);

2. INSERT INTO users (KEY, password) VALUES ('user', 'password');

3. SELECT gender FROM users; (the problem shows up here because the gender column is null)

 

There are multiple ways to produce this failure; the procedure above triggers it by reading rows.

Deleting a column from a row, deleting columns from multiple rows, deleting entire rows, and running a range operation on the affected row all go through the same code path (the decode_row function).

 

 

2.2.1 Timing order (Order important column)

NA

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

1

2.5 How hard is the reproduction?

The reproduction is easy: insert a row without setting all of its columns, then delete the row.

 

3. Diagnosis procedure

Error msg?

yes

3.1 Detailed Symptom (where you start)

Deleting a column from a row, deleting columns from multiple rows, deleting entire rows, accessing multiple rows, and running range operations all fail on rows that contain null columns.

3.2 Backward inference

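The logs show failed assertions involving columns whose value is None: the expected result contains None entries, but the returned row omits them (for example, [u'kd'] is returned instead of ['kd', None, None]).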

 

4. Root cause

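The bug is in the decode_row function in CQL’s code, which CQL uses to decode rows. The function does not handle NULL values correctly: when a column's value is None, the loop executes "continue" before appending the column's description and value, so processing of that column terminates too early and the column is silently dropped from the result.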
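The effect of that "continue" can be illustrated with a minimal, simplified sketch of the pre-fix loop. It is not the actual driver code: Column, decode_row_before_fix, the two-element description tuples, and the UTF-8 decoding that stands in for the validator-based unmarshallers are all hypothetical simplifications.

```python
from collections import namedtuple
from typing import List, Optional, Tuple

# Hypothetical stand-in for a Thrift column: name and value are bytes; value may be None.
Column = namedtuple("Column", ["name", "value"])


def decode_row_before_fix(key: bytes, columns: List[Column]) -> Tuple[list, list]:
    """Simplified sketch of the buggy decode loop (not the real driver code)."""
    description = [("Row Key", "UTF8Type")]
    values: List[Optional[str]] = [key.decode("utf-8")]
    for column in columns:
        if column.value is None:
            # Bug: skipping here drops both the description entry and the value,
            # so the decoded row is shorter than the list of requested columns.
            continue
        description.append((column.name.decode("utf-8"), "AsciiType"))
        values.append(column.value.decode("utf-8"))
    return description, values


desc, vals = decode_row_before_fix(b"kd", [Column(b"ca1", None), Column(b"col", None)])
print(vals)  # ['kd']  (the caller expected ['kd', None, None])
```

This mirrors the logged failure "AssertionError: [u'kd']": both null columns vanish from the values and from the description.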

4.1 Category:

semantic

4.2 Are there multiple faults?

no

4.3 Can we automatically test it?

Yes. A test case exposed this problem.
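A regression test for this bug can check one simple invariant: a decoded row has exactly one description entry and one value per requested column (plus the row key), with None standing in for columns that have no value. The sketch below is hypothetical: the decoder is passed in as a parameter rather than assuming the real driver's function name or signature, and fixed_decoder and buggy_decoder are toy stand-ins for the behavior after and before the fix.

```python
from typing import Callable, Optional, Sequence, Tuple

# Hypothetical decoder signature: (row_key, [(name, value_or_None), ...]) -> (description, values)
Decoder = Callable[[str, Sequence[Tuple[str, Optional[str]]]], Tuple[list, list]]


def check_null_columns_preserved(decode: Decoder) -> None:
    """Fail if a decoder drops columns whose value is None."""
    columns = [("ca1", None), ("col", None)]
    description, values = decode("kd", columns)
    # One entry for the row key plus one per requested column.
    assert len(description) == len(columns) + 1, description
    assert values == ["kd", None, None], values


def fixed_decoder(key, columns):
    # Toy post-fix behavior: keep every column, using None for missing values.
    return [("Row Key",)] + [(n,) for n, _ in columns], [key] + [v for _, v in columns]


def buggy_decoder(key, columns):
    # Toy pre-fix behavior: skip columns whose value is None.
    kept = [(n, v) for n, v in columns if v is not None]
    return [("Row Key",)] + [(n,) for n, _ in kept], [key] + [v for _, v in kept]


check_null_columns_preserved(fixed_decoder)  # passes silently
try:
    check_null_columns_preserved(buggy_decoder)
except AssertionError as err:
    print("buggy decoder rejected:", err)
```

Checking the description length as well as the values matters here, since the logged failures show both lists losing the null columns.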

5. Fix

5.1 How?

In the CQL decoder function, when a column value is NULL, do not skip the column: still append its description, and append None as its value.

 

-            if column.value == None:
-                continue
-
            description.append((unmarshal(column.name), comparator, None, None, None, None, True))
            validator = self.__validator_for(keyspace, column_family, column.name)
-            values.append(unmarshallers.get(validator, unmarshal_noop)(column.value))
+            if column.value == None:
+                values.append(None)
+            else:
+                values.append(unmarshallers.get(validator, unmarshal_noop)(column.value))

        return description, values
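For comparison, here is a simplified sketch of the same loop with the fix applied: a column whose value is None still contributes its description entry, and None is appended in its value position. As before, Column, decode_row_after_fix, the two-element description tuples, and the UTF-8 decoding in place of the validator-based unmarshallers are hypothetical simplifications, not the driver's actual code.

```python
from collections import namedtuple
from typing import List, Optional, Tuple

# Hypothetical stand-in for a Thrift column: name and value are bytes; value may be None.
Column = namedtuple("Column", ["name", "value"])


def decode_row_after_fix(key: bytes, columns: List[Column]) -> Tuple[list, list]:
    """Simplified sketch of the decode loop after the fix (not the real driver code)."""
    description = [("Row Key", "UTF8Type")]
    values: List[Optional[str]] = [key.decode("utf-8")]
    for column in columns:
        # The description entry is always added, so column positions stay aligned.
        description.append((column.name.decode("utf-8"), "AsciiType"))
        if column.value is None:
            values.append(None)  # keep a placeholder instead of dropping the column
        else:
            values.append(column.value.decode("utf-8"))
    return description, values


desc, vals = decode_row_after_fix(b"kd", [Column(b"ca1", None), Column(b"col", None)])
print(vals)  # ['kd', None, None]
```

The sketch uses "is None", the idiomatic Python test for None; the patch's "== None" gives the same result for the plain byte-string values involved here.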