CASSANDRA-3003

1. Symptom

Performing “trunk single-pass streaming” on a very large column family fails, and the node throws an out-of-memory exception.

 

Category (in the spreadsheet):

early termination

1.1 Severity

critical

1.2 Was there an exception thrown? (Exception column in the spreadsheet)

yes, an out-of-memory exception

 

1.2.1 Were there multiple exceptions?

no

 

1.3 Was there a long propagation of the failure?

no

 

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

single node

 

Catastrophic? (spreadsheet column)

no

 

2. How to reproduce this failure

2.0 Version

1.0.0

2.1 Configuration

basic configuration with streaming

 

# of Nodes?

1

2.2 Reproduction procedure

1. Write a large amount of data into a column family (file write).

2. Perform trunk single-pass streaming on a column family that is larger than “inMemoryLimit” (feature start).

 

Num triggering events

2

 

2.2.1 Timing order (Order important column)

Yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

1

 

3. Diagnosis procedure

Error msg?

yes

3.1 Detailed Symptom (where you start)

When performing a trunk single-pass streaming operation on a large column family, the node runs out of memory.

3.2 Backward inference

First, we check whether the column family being streamed is larger than the available in-memory budget (the inMemoryLimit value). It is, and the current code does not handle this case gracefully.
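The check performed in this step amounts to a size comparison. The following is a minimal Java sketch under stated assumptions: `StreamSizeCheck`, its methods, and the byte-count parameters are hypothetical names for illustration, not Cassandra APIs; only `Runtime.getRuntime().maxMemory()` is a real JDK call.

```java
// Hypothetical diagnostic helper; these names are illustrative, not Cassandra APIs.
final class StreamSizeCheck {
    private StreamSizeCheck() {}

    /** True when the streamed data is larger than the configured in-memory limit. */
    static boolean exceedsInMemoryLimit(long streamedBytes, long inMemoryLimitBytes) {
        if (streamedBytes < 0 || inMemoryLimitBytes <= 0) {
            throw new IllegalArgumentException(
                "streamedBytes must be >= 0 and inMemoryLimitBytes must be > 0");
        }
        return streamedBytes > inMemoryLimitBytes;
    }

    /** True when buffering the data on the heap could not succeed even with the whole max heap. */
    static boolean exceedsMaxHeap(long streamedBytes) {
        // maxMemory() returns Long.MAX_VALUE when the JVM has no explicit heap limit.
        return streamedBytes > Runtime.getRuntime().maxMemory();
    }
}
```

If the first check returns true, the data cannot be handled entirely in memory under the configured limit, which is the situation this report describes.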

 

4. Root cause

The streamed column family data cannot fit in memory, and the code has no graceful handling for this case.
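The failure mode can be modeled with a simplified receiver that buffers the entire incoming stream on the heap before processing it. This is a hypothetical illustration, not Cassandra's streaming code: `NaiveStreamReceiver` and `receiveAll` are invented names, and the model only assumes that the receiving side materializes all of the data in memory with no size cap.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical model of the root cause: the whole stream is held on the heap.
final class NaiveStreamReceiver {
    private NaiveStreamReceiver() {}

    /** Reads everything into one byte array; heap usage grows with the stream size. */
    static byte[] receiveAll(InputStream in) throws IOException {
        ByteArrayOutputStream heapBuffer = new ByteArrayOutputStream();
        byte[] chunk = new byte[8192];
        int n;
        while ((n = in.read(chunk)) != -1) {
            // No size check here: nothing stops the buffer from outgrowing the heap.
            heapBuffer.write(chunk, 0, n);
        }
        // toByteArray() makes a copy, so peak usage is briefly about twice the data size.
        return heapBuffer.toByteArray();
    }
}
```

With input larger than the available heap, a loop like this ends in an out-of-memory failure rather than a handled error.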

4.1 Category:

Incorrect handling

4.2 Are there multiple faults?

no

4.3 Can we automatically test it?

yes

5. Fix

5.1 How?

Better handling of oversized data: the fix writes part of the streamed data to disk and keeps only a portion of it in memory.
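Keeping part of the data in memory and writing the rest to disk is the general spill-to-disk buffering technique. The sketch below assumes a byte-oriented stream and a fixed `inMemoryLimit` in bytes; `SpillingBuffer` and its methods are hypothetical and are not meant to reflect how the actual Cassandra patch is structured.

```java
import java.io.BufferedOutputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical spill-to-disk buffer: up to inMemoryLimit bytes stay on the heap,
// and everything beyond that is appended to a temporary file.
final class SpillingBuffer implements AutoCloseable {
    private final long inMemoryLimit;
    private final ByteArrayOutputStream memory = new ByteArrayOutputStream();
    private Path spillFile;        // created lazily on the first overflow
    private OutputStream spillOut;
    private long spilledBytes;

    SpillingBuffer(long inMemoryLimit) {
        if (inMemoryLimit <= 0) throw new IllegalArgumentException("inMemoryLimit must be > 0");
        this.inMemoryLimit = inMemoryLimit;
    }

    void write(byte[] b, int off, int len) throws IOException {
        // Fill the in-memory portion first, never exceeding the limit.
        int toMemory = (int) Math.min(len, inMemoryLimit - memory.size());
        if (toMemory > 0) {
            memory.write(b, off, toMemory);
        }
        // Whatever does not fit goes to disk instead of growing the heap.
        int rest = len - toMemory;
        if (rest > 0) {
            if (spillOut == null) {
                spillFile = Files.createTempFile("stream-spill", ".tmp");
                spillOut = new BufferedOutputStream(Files.newOutputStream(spillFile));
            }
            spillOut.write(b, off + toMemory, rest);
            spilledBytes += rest;
        }
    }

    long inMemoryBytes() { return memory.size(); }

    long spilledBytes() { return spilledBytes; }

    /** Reassembles all data in order; intended only for verifying small inputs. */
    byte[] readBack() throws IOException {
        if (spillOut != null) spillOut.flush();
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        memory.writeTo(all);
        if (spillFile != null) all.write(Files.readAllBytes(spillFile));
        return all.toByteArray();
    }

    @Override
    public void close() throws IOException {
        if (spillOut != null) spillOut.close();
        if (spillFile != null) Files.deleteIfExists(spillFile);
    }
}
```

Heap usage is bounded by `inMemoryLimit` regardless of how much data arrives. A real implementation would read the spilled portion back incrementally from disk rather than reassembling everything into one array as `readBack` does for testing.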