CASSANDRA-3551
User upgraded from 1.0.2 to 1.0.5. Some column families always get TimeoutException when doing RangeSlice with Quorum and a replication factor of 3.
early termination
critical
yes, java.util.concurrent.TimeoutException
no
no
single file
no
1.0.5
Quorum with replication factor of 3
3
1) Upgrade cluster from 1.0.2 to 1.0.5 (feature start)
2) Run a RangeSlice on a column family (file read)
2
yes
yes
yes
3
yes
User upgraded from 1.0.2 to 1.0.5. Some column families always get TimeoutException when doing RangeSlice with Quorum and a replication factor of 3. There are no errors in the node logs and no anomalies in system monitoring (such as a sudden increase in disk latency). Only Cassandra's StorageProxy latency goes way up (hundreds of milliseconds) before the failure.
A closer look at the code reveals that there are some changes in the RowStorageProxy algorithm between 1.0.2 and 1.0.5. The developer did not finish implementing the new algorithm.
When rewriting storage proxy, the developer did not finish the implementation.
Semantic
no
yes
Finished implementing features in StorageProxy
+ WriteResponse response = new WriteResponse(rm.getTable(), rm.key(), true);
+ Message responseMessage = WriteResponse.makeWriteResponseMessage(message, response);
+ MessagingService.instance().sendReply(responseMessage, id, message.getFrom());
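
The added lines send a WriteResponse back to the sender of the message. The sketch below illustrates why a missing reply of this kind could surface only as a TimeoutException, with nothing in the node logs. It is not Cassandra code: the class QuorumAckSketch and its methods are hypothetical, and it assumes the coordinator simply blocks until a quorum of replicas has acknowledged.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration only; none of these names exist in Cassandra.
public class QuorumAckSketch {

    // A quorum is a strict majority of the replicas.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Blocks until a quorum of acknowledgements has arrived or the timeout elapses.
    // repliesSent models how many replicas actually send a reply.
    static void awaitAcks(int replicationFactor, int repliesSent, long timeoutMillis)
            throws TimeoutException, InterruptedException {
        CountDownLatch pending = new CountDownLatch(quorum(replicationFactor));
        for (int i = 0; i < repliesSent; i++) {
            pending.countDown(); // one acknowledgement received
        }
        if (!pending.await(timeoutMillis, TimeUnit.MILLISECONDS)) {
            throw new TimeoutException("still waiting for " + pending.getCount() + " reply(ies)");
        }
    }

    // Convenience wrapper: true if the wait ends in a timeout.
    static boolean timesOut(int replicationFactor, int repliesSent, long timeoutMillis) {
        try {
            awaitAcks(replicationFactor, repliesSent, timeoutMillis);
            return false;
        } catch (TimeoutException e) {
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return true;
        }
    }

    public static void main(String[] args) {
        // Replication factor 3 needs 2 acknowledgements.
        System.out.println("quorum(3) = " + quorum(3));
        // Replicas did the work but only one replied: the caller sees a timeout.
        System.out.println("one reply times out: " + timesOut(3, 1, 50));
        // Once every replica replies, the wait completes.
        System.out.println("three replies time out: " + timesOut(3, 3, 50));
    }
}
```

In this model the replicas never report an error, because from their side nothing went wrong. The only symptom is the coordinator waiting out its timeout, which matches the reported rise in StorageProxy latency before the failure.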