CASSANDRA-1292

1. Symptom

Multiple migrations might run at once.

Background:

Migration is a feature in Cassandra that moves the database from one keyspace to another. One use case is growing a cluster, for example from a 3-node cluster to a 10-node cluster.

Category (in the spreadsheet):

wrong computation

1.1 Severity

Critical        

1.2 Was there exception thrown? (Exception column in the spreadsheet)

yes, InvalidRequestException

 

1.2.1 Were there multiple exceptions?

no 

1.3 Was there a long propagation of the failure?

no

 

1.4 Scope of the failure (e.g., single client, all clients, single file, entire fs, etc.)

single cluster

 

Catastrophic? (spreadsheet column)

no, no data loss

 

2. How to reproduce this failure

2.0 Version

0.7beta1

2.1 Configuration

Standard configuration before migration. Add one more node during migration.

 

# of Nodes?

2

2.2 Reproduction procedure

1) Add an empty node

2) Start a Cassandra migration (feature start)

Num triggering events

2

 

2.2.1 Timing order (Order important column)

yes

2.2.2 Events order externally controllable? (Order externally controllable? column)

yes

2.3 Can the logs tell how to reproduce the failure?

yes

2.4 How many machines needed?

2

2.5 How hard is the reproduction?

medium

3. Diagnosis procedure

Error msg?

yes

3.1 Detailed Symptom (where you start)

After a new keyspace is defined, a database migration must run. During the migration phase, multiple migrations ran at the same time.

3.2 Backward inference

Backwards inference requires lots of domain knowledge. The service.MigrationManager class manages a MIGRATION_STAGE where nodes should execute db.migration.Migration instances.

The problem is that the node a client connects to via Thrift or Avro initiates the migration in its client thread (it calls migration.apply). Instead, the Thrift and Avro clients should ensure that the migration occurs in MIGRATION_STAGE, and should block until the migration is applied by the stage.

To summarize, a migration requested through a client should not be applied directly on the client thread. It should be handed to MIGRATION_STAGE, and the client thread should block until the stage has applied it, so that multiple migrations do not run at the same time.
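The hand-off described above can be sketched in Java. This is only an illustration, not Cassandra's actual code: the class MigrationStageSketch, its methods, and the use of a single-threaded executor as a stand-in for MIGRATION_STAGE are all assumptions made for the example. Each simulated client submits its migration to the stage and blocks on the returned future, so the stage applies the migrations one at a time.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; class and method names are illustrative, not Cassandra's.
class MigrationStageSketch {
    // Assumed stand-in for MIGRATION_STAGE: one worker thread, so tasks run one at a time.
    private final ExecutorService migrationStage = Executors.newSingleThreadExecutor();
    private final AtomicInteger running = new AtomicInteger();
    private final AtomicInteger maxConcurrent = new AtomicInteger();
    private final AtomicInteger applied = new AtomicInteger();

    // Called on a client thread: submit the migration to the stage, then block.
    void applyOnStage(Runnable migration) {
        try {
            migrationStage.submit(() -> {
                // Track how many migrations are executing at the same moment.
                int now = running.incrementAndGet();
                maxConcurrent.accumulateAndGet(now, Math::max);
                try {
                    migration.run();
                    applied.incrementAndGet();
                } finally {
                    running.decrementAndGet();
                }
            }).get(); // block until the stage has applied the migration
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for migration", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("migration failed", e.getCause());
        }
    }

    // Simulate several clients requesting migrations concurrently.
    // Returns {migrations applied, maximum number observed running at once}.
    static int[] runClients(int clients) {
        MigrationStageSketch stage = new MigrationStageSketch();
        List<Thread> threads = new ArrayList<>();
        for (int i = 0; i < clients; i++) {
            Thread t = new Thread(() -> stage.applyOnStage(() -> sleepQuietly(20)));
            threads.add(t);
            t.start();
        }
        for (Thread t : threads) {
            try {
                t.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        stage.migrationStage.shutdown();
        return new int[] { stage.applied.get(), stage.maxConcurrent.get() };
    }

    // Placeholder for the work a migration would do.
    private static void sleepQuietly(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        int[] result = runClients(4);
        System.out.println("applied=" + result[0] + " maxConcurrent=" + result[1]);
    }
}
```

With four simulated clients, all four migrations are applied and the maximum observed concurrency is 1. If each client thread ran the migration itself instead of submitting it to the stage, nothing would prevent the migrations from overlapping.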

 

4. Root cause

The client-facing implementation of the migration feature is incorrect: the migration is applied on the client thread. Migrations need to be executed in MIGRATION_STAGE.

4.1 Category:

semantic

5. Fix

5.1 How?

The fix is to execute migrations in MIGRATION_STAGE and to have the requesting client block until the stage has applied the migration, so that concurrent migration requests do not run at once.

5.2 Exception behavior?

no more exceptions

5.3 # of discussion threads?

3