CASSANDRA-1299

1. Symptom

SSTableUtils writes garbage data (the unused contents of the DataOutputBuffer, in addition to the valid data) while bypassing the memtable to assemble its own SSTables. The garbage data can cause an EOFException to be thrown when the SSTable is later read.
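
To make the mechanism concrete, here is a minimal, self-contained Java sketch. RawBuffer is an illustrative stand-in, not Cassandra's DataOutputBuffer; its getData()/getLength() accessors mimic the pattern of a growable buffer that exposes its raw backing array. Persisting the whole array instead of only the first getLength() bytes appends a garbage tail that the reader later misparses until it hits end-of-file:

    import java.io.ByteArrayInputStream;
    import java.io.DataInputStream;
    import java.io.EOFException;
    import java.io.IOException;

    public class GarbageTailDemo {
        // Stand-in for a growable output buffer that exposes its backing array.
        static class RawBuffer {
            byte[] data = new byte[64]; // capacity; mostly unused slack
            int count = 0;              // number of valid bytes written

            void writeLong(long v) {
                for (int i = 7; i >= 0; i--) data[count++] = (byte) (v >>> (8 * i));
            }
            byte[] getData()   { return data; }  // full backing array, slack included
            int    getLength() { return count; } // valid bytes only
        }

        public static void main(String[] args) throws IOException {
            RawBuffer buf = new RawBuffer();
            buf.writeLong(42L); // 8 valid bytes; 56 bytes of slack remain

            // Buggy write path: persist the whole backing array instead of
            // only the first getLength() bytes -> the file gains a garbage tail.
            byte[] onDisk = buf.getData();

            DataInputStream in = new DataInputStream(new ByteArrayInputStream(onDisk));
            System.out.println("first value: " + in.readLong()); // 42, as written
            try {
                // A reader expecting back-to-back records parses the slack as
                // more records and eventually runs off the end of the file.
                while (true) in.readLong();
            } catch (EOFException e) {
                System.out.println("EOFException while parsing the garbage tail");
            }
        }
    }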

 

Category (in the spreadsheet):

wrong computation

1.1 Severity

Critical

1.2 Was there exception thrown? (Exception column in the spreadsheet)

Yes. java.io.EOFException.

java.io.IOError: java.io.EOFException
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:103)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:32)
    at org.apache.commons.collections.iterators.CollatingIterator.set(CollatingIterator.java:284)
    at org.apache.commons.collections.iterators.CollatingIterator.least(CollatingIterator.java:326)
    at org.apache.commons.collections.iterators.CollatingIterator.next(CollatingIterator.java:230)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:68)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at com.google.common.collect.Iterators$7.computeNext(Iterators.java:604)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.cassandra.db.ColumnIndexer.serializeInternal(ColumnIndexer.java:76)
    at org.apache.cassandra.db.ColumnIndexer.serialize(ColumnIndexer.java:50)
    at org.apache.cassandra.io.LazilyCompactedRow.<init>(LazilyCompactedRow.java:62)
    at org.apache.cassandra.io.CompactionIterator.getCompactedRow(CompactionIterator.java:135)
    at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:107)
    at org.apache.cassandra.io.CompactionIterator.getReduced(CompactionIterator.java:46)
    at org.apache.cassandra.utils.ReducingIterator.computeNext(ReducingIterator.java:73)
    at com.google.common.collect.AbstractIterator.tryToComputeNext(AbstractIterator.java:136)
    at com.google.common.collect.AbstractIterator.hasNext(AbstractIterator.java:131)
    at org.apache.commons.collections.iterators.FilterIterator.setNextObject(FilterIterator.java:183)
    at org.apache.commons.collections.iterators.FilterIterator.hasNext(FilterIterator.java:94)
    at org.apache.cassandra.db.CompactionManager.doCompaction(CompactionManager.java:334)
    at org.apache.cassandra.db.LongCompactionSpeedTest.testCompaction(LongCompactionSpeedTest.java:101)
    at org.apache.cassandra.db.LongCompactionSpeedTest.testCompactionWide(LongCompactionSpeedTest.java:49)
Caused by: java.io.EOFException
    at java.io.RandomAccessFile.readInt(RandomAccessFile.java:725)
    at java.io.RandomAccessFile.readLong(RandomAccessFile.java:758)
    at org.apache.cassandra.db.TimestampClockSerializer.deserialize(TimestampClock.java:128)
    at org.apache.cassandra.db.TimestampClockSerializer.deserialize(TimestampClock.java:119)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:90)
    at org.apache.cassandra.db.ColumnSerializer.deserialize(ColumnSerializer.java:31)
    at org.apache.cassandra.io.sstable.SSTableIdentityIterator.next(SSTableIdentityIterator.java:99)

1.2.1 Were there multiple exceptions?

No. EOFException is the only exception. This is not a complicated failure.

1.3 Was there a long propagation of the failure?

Yes. The garbage data is written silently; the failure surfaces only later, when the corrupted SSTable is read back (here, during compaction), well after SSTableUtils (specifically SSTableWriter) wrote the contents of the DataOutputBuffer to the SSTable on disk.

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

Single file. The scope of the failure is small. The only affected area is the corrupted SSTable that is written to disk.

Catastrophic? (spreadsheet column)

no

2. How to reproduce this failure

2.0 Version

0.7 beta 1

2.1 Configuration

Standard default configuration

 

# of Nodes?

1

2.2 Reproduction procedure

Invoke an SSTableUtils function that assembles an SSTable directly and wait for SSTableWriter to write the invalid trailing data in the DataOutputBuffer to disk.

Events

1. File write

2. Wait until compaction occurs (feature start)

 

Num triggering events

2

 

2.2.1 Timing order (Order important column)

Yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

Yes, with the right input, the failure can be deterministically reproduced

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

1

3. Diagnosis procedure

Error msg?

Yes, but only when the corrupted SSTable is read. No error or warning message is produced while the garbage data is being written; the only error message is the EOFException above.

3.1 Detailed Symptom (where you start)

We first see that the SSTable contains garbage data: reading it back throws the EOFException.

3.2 Backward inference

The problem is traced back to SSTableWriter, which wrote the garbage data. The root problem is that SSTableWriter writes everything in the DataOutputBuffer, including the unused capacity of its backing array. Instead, it should write only the valid data.
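
One hedged way to support this inference (illustrative only; the file path and the expected byte count must be supplied by whoever runs it, and nothing here is taken from the actual diagnosis): if the on-disk file is larger than the number of valid bytes handed to the writer, the buffer's slack capacity leaked into the file.

    import java.io.File;

    public class SlackCheck {
        // Compares the actual file size against the byte count the writer was
        // expected to produce; positive slack suggests unused buffer capacity
        // was written out along with the valid data.
        public static void main(String[] args) {
            File sstable = new File(args[0]);        // path to the suspect file
            long expected = Long.parseLong(args[1]); // bytes the writer should have produced
            long actual = sstable.length();
            System.out.printf("expected=%d actual=%d slack=%d%n",
                              expected, actual, actual - expected);
        }
    }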

3.3 Are the printed log sufficient for diagnosis?

yes

 

4. Root cause

4.1 Category:

Semantic

4.2 Are there multiple faults?

No. There is only 1 fault.

4.3 Can we automatically test it?

Yes, we can automatically test it.

5. Fix

5.1 How?

Previously, SSTableWriter wrote everything in the DataOutputBuffer into the SSTable. The fix copies only the valid portion of the DataOutputBuffer and passes that copy to SSTableWriter to write into the SSTable.
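
A minimal sketch of the shape of the fix, assuming the getData()/getLength() accessor pattern from the sketch in section 1; the helper class and the call sites below are hypothetical, not the actual patch:

    import java.util.Arrays;

    public final class ValidPortion {
        private ValidPortion() {}

        // Copy only the valid prefix of a growable buffer's backing array,
        // so the slack capacity never reaches the on-disk SSTable.
        public static byte[] of(byte[] backingArray, int validLength) {
            return Arrays.copyOf(backingArray, validLength);
        }
    }

With such a helper, a call site would change from passing buffer.getData() directly to passing ValidPortion.of(buffer.getData(), buffer.getLength()); again, both calls illustrate the pattern rather than quote the patch.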

5.2 Exception behavior?

After the fix, there is no more exception. This behavior is expected.

6. Any interesting findings?

Normally, when a memtable is full, it is flushed and written to disk as an SSTable. In this case, however, the memtable is bypassed: SSTableUtils assembles its own SSTables directly, taking an exceptional path for producing SSTables.

 

7. Scope of the failure

The scope of the failure is small. The only affected area is the corrupted SSTable that is written to disk.