CASSANDRA-1573

1. Symptom

Background: Streaming is used primarily by Cassandra itself to transfer data from one node to another. In this case, the failure occurs in the function where Cassandra transfers the matching portions of SSTables (the differential data) from one node to another. If the SSTables on both nodes are identical, no data needs to be transferred.

Failure: The transferSSTables function in StreamOut has no special handling for an empty stream (i.e., the case where the SSTables on the two nodes are identical).

User perception: Because this function is not directly visible to the user, the failure can only be observed indirectly. The user will have problems moving or restoring nodes in a multi-node system. Because both nodes may hold old data on disk, there is a chance that Cassandra will attempt a transfer involving SSTables that are duplicated between the two nodes. In that case, the failure described above is triggered.

 

Category (in the spreadsheet):

early termination

1.1 Severity

Blocker

1.2 Was there exception thrown? (Exception column in the spreadsheet)

yes, IOException

 

1.2.1 Were there multiple exceptions?

no

 

1.3 Was there a long propagation of the failure?

no

 

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

all clients

 

Catastrophic? (spreadsheet column)

no

 

2. How to reproduce this failure

2.0 Version

0.7 beta 3

2.1 Configuration

standard configuration

 

# of Nodes?

2

2.2 Reproduction procedure

1. Start empty stream (feature start)

2. transferSSTables (feature start)

 

Num triggering events

2

 

2.2.1 Timing order (Order important column)

yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

1

 

3. Diagnosis procedure

Error msg?

yes. IOException

3.1 Detailed Symptom (where you start)

StreamOut only starts a stream if there are actually files to transfer. This means callbacks will never get called for streams that don't actually have anything to transfer.
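
The missed callback can be illustrated with a toy model. The sketch below is not Cassandra code: ToySession, transferBuggy, and the Runnable callback are hypothetical stand-ins, assumed here only to mirror the control flow described above, in which a stream with nothing to transfer never reaches the point where its callback would run.

```java
import java.util.ArrayList;
import java.util.List;

public class EmptyStreamBug {
    /** Hypothetical stand-in for a stream session; not the real StreamOutSession. */
    static class ToySession {
        private final Runnable onComplete;
        private final List<String> files = new ArrayList<>();
        boolean callbackFired = false;

        ToySession(Runnable onComplete) { this.onComplete = onComplete; }

        void addFilesToStream(List<String> pending) { files.addAll(pending); }

        // Assumption of this toy model: beginning a stream "transfers" the
        // files immediately and then closes the session.
        void begin() { close(); }

        void close() {
            callbackFired = true;
            onComplete.run();
        }
    }

    /** Mirrors the pre-fix control flow: nothing happens when the list is empty. */
    static void transferBuggy(ToySession session, List<String> pending) {
        if (pending.size() > 0) {
            session.addFilesToStream(pending);
            session.begin();
        }
        // No else branch: an empty stream never reaches close().
    }

    public static void main(String[] args) {
        ToySession empty = new ToySession(() -> System.out.println("stream complete"));
        transferBuggy(empty, new ArrayList<String>());
        System.out.println("callback fired for empty stream: " + empty.callbackFired);
    }
}
```

In this model, any caller that waits on the completion callback of an empty stream would wait indefinitely, because nothing ever invokes it.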

3.2 Backward inference

The empty stream is an obvious clue to the failure. Checking how transferSSTables behaves when there is nothing to transfer readily identifies the problem.

y

 

4. Root cause

There is no code to handle an empty stream.

4.1 Category:

semantic

4.2 Are there multiple faults?

no

4.3 Can we automatically test it?

yes

5. Fix

5.1 How?

    public static void transferSSTables(StreamOutSession session, Collection<SSTableReader> sstables, Collection<Range> ranges) throws IOException
    {
        List<PendingFile> pending = createPendingFiles(sstables, ranges);
        if (pending.size() > 0)
        {
            session.addFilesToStream(pending);
            session.begin();
        }
   +    else
   +    {
   +        session.close();
   +    }
    }
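
The effect of the added else branch can be checked in isolation with a small stand-alone sketch. Everything here is an assumption made for illustration: the Session interface, RecordingSession, and the file name "data-1.db" are hypothetical, and only the branch structure is taken from the patch above. It may also serve as a starting point for an automated test of the empty-stream case.

```java
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class EmptyStreamFix {
    /** Hypothetical session interface exposing only the three calls used by the patch. */
    interface Session {
        void addFilesToStream(List<String> pending);
        void begin();
        void close();
    }

    /** Same branch structure as the patched method: close when nothing is pending. */
    static void transfer(Session session, List<String> pending) {
        if (!pending.isEmpty()) {
            session.addFilesToStream(pending);
            session.begin();
        } else {
            session.close();
        }
    }

    /** Counts begin() and close() calls so the two paths can be compared. */
    static class RecordingSession implements Session {
        final AtomicInteger begins = new AtomicInteger();
        final AtomicInteger closes = new AtomicInteger();

        public void addFilesToStream(List<String> pending) { }
        public void begin() { begins.incrementAndGet(); }
        public void close() { closes.incrementAndGet(); }
    }

    public static void main(String[] args) {
        RecordingSession empty = new RecordingSession();
        transfer(empty, Collections.<String>emptyList());
        System.out.println("empty: begins=" + empty.begins + " closes=" + empty.closes);

        RecordingSession nonEmpty = new RecordingSession();
        transfer(nonEmpty, Collections.singletonList("data-1.db"));
        System.out.println("non-empty: begins=" + nonEmpty.begins + " closes=" + nonEmpty.closes);
    }
}
```

With the else branch in place, the empty case closes the session once without beginning a stream, while the non-empty case begins a stream and leaves closing to the normal transfer path.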