User upgraded from 1.0.2 to 1.0.5. Some column families always get a TimeoutException when doing a RangeSlice with Quorum and a replication factor of 3.
Quorum with replication factor of 3
1) Upgrade cluster from 1.0.2 to 1.0.5 (feature start)
2) Run a RangeSlice on a column family (file read)
User upgraded from 1.0.2 to 1.0.5. Some column families always get a TimeoutException when doing a RangeSlice with Quorum and a replication factor of 3. There are no errors in the node logs and no anomalies in system monitoring (such as a sudden increase in disk latency). Only Cassandra's StorageProxy latency goes way up (hundreds of milliseconds) before the failure.
A closer look at the code reveals that the RowStorageProxy algorithm changed between 1.0.2 and 1.0.5. The developer did not finish implementing the new algorithm.
When rewriting StorageProxy, the developer did not finish the implementation.
Finished implementing features in StorageProxy
+ WriteResponse response = new WriteResponse(rm.getTable(), rm.key(), true);
+ Message responseMessage = WriteResponse.makeWriteResponseMessage(message, response);
+ MessagingService.instance().sendReply(responseMessage, id, message.getFrom());
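The added lines build a response and send it back to the node that issued the request, which suggests that on some path a reply was never sent. Assuming that reading is right, the sketch below shows why a missing reply surfaces only as a timeout at the coordinator, with nothing in the replica logs. It is a simplified model, not Cassandra code: QuorumTimeoutSketch, coordinate, and the replicasSendReply flag are hypothetical names introduced for illustration, and a CountDownLatch stands in for the coordinator's response handler. The quorum size is the usual majority, RF / 2 + 1.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical model of a coordinator waiting for a quorum of replica replies.
class QuorumTimeoutSketch {

    // Quorum is a majority of the replicas: 2 when the replication factor is 3.
    static int quorum(int replicationFactor) {
        return replicationFactor / 2 + 1;
    }

    // Sends one request to each replica, then waits until a quorum of replies
    // has arrived or the timeout expires.
    // Returns true if quorum was reached, false if the wait timed out.
    static boolean coordinate(int replicationFactor, boolean replicasSendReply, long timeoutMillis) {
        CountDownLatch replies = new CountDownLatch(quorum(replicationFactor));
        ExecutorService replicas = Executors.newFixedThreadPool(replicationFactor);
        try {
            for (int i = 0; i < replicationFactor; i++) {
                replicas.submit(() -> {
                    // The replica does its local work without any error here...
                    if (replicasSendReply) {
                        // ...and this stands in for the sendReply(...) call in the fix.
                        replies.countDown();
                    }
                    // Without the reply the replica still finishes cleanly,
                    // so nothing shows up in its log.
                });
            }
            return replies.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            replicas.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("quorum for RF=3: " + quorum(3));
        System.out.println("replicas reply:        " + coordinate(3, true, 200));
        System.out.println("replicas do not reply: " + coordinate(3, false, 200));
    }
}
```

In this model the second call blocks for the full timeout and then fails, which matches the reported symptom: latency climbs at the coordinator and the request ends in a timeout while every replica looks healthy.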