The new Cassandra release features Snappy compression, to save space in the datacenter, and SSL encryption for inter-datacenter communication.
The symptom is:
Incoming TCP data from inter-datacenter communication cannot be decompressed. The backup datacenter does not receive valid data; instead it receives junk as the replicated content. As a result, if the active datacenter goes down, the backup datacenter cannot serve requests.
javax.net.ssl.SSLException: bad record MAC
java.io.IOException: CRC unmatched
1.2.1 Were there multiple exceptions?
all affected interconnected datacenters
no, because there’s no data loss
3 nodes in each datacenter. All nodes are configured with Snappy compression. All inter-datacenter communication is encrypted with SSL. One datacenter must be using AWS and the other Rackspace hosting.
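For reference, the encryption side of this setup looks roughly like the following in a 1.2-era cassandra.yaml; paths and passwords are placeholders, and SSTable compression itself is configured per-table (SnappyCompressor is the default compressor in this era):

```yaml
# cassandra.yaml sketch: encrypt only traffic that crosses datacenter boundaries
server_encryption_options:
    internode_encryption: dc        # 'all', 'dc', 'rack', or 'none'
    keystore: conf/.keystore        # placeholder path
    keystore_password: cassandra    # placeholder
    truststore: conf/.truststore
    truststore_password: cassandra
```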
1) Switch SSTable compression on. (config change)
2) Start the nodetool rebuild command on AWS. (feature start)
no (data race)
2 is the minimum requirement
When performing the nodetool rebuild command on AWS against Rackspace, we get SSL problems and Snappy compression errors. The setup is simple: 3 nodes in AWS east, 3 nodes in Rackspace.
Packet-level inspection revealed malformed packets on both ends of the connection, which ruled out the network itself: the packets were already malformed on the machine that generated them. Further investigation found that the problem only occurs when the inter-datacenter bandwidth is throttled to 1 Mbps, which pointed to a race condition. After running debugging traces on Cassandra, we found the race condition in the code that handles decompression of SSTables as they are streamed from the remote datacenter. More detailed analysis showed that CompressedFileStreamTask does not send the right part of the SSTable when internode encryption is enabled, which causes the various IOExceptions observed in the symptoms.
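The failure mode can be reproduced in isolation: if the sender streams file bytes without first positioning the file at the section it advertised a checksum for, the receiver’s CRC check fails exactly as in the symptoms above. A minimal standalone sketch (not Cassandra code; the file layout and names are invented for illustration):

```java
import java.io.*;
import java.util.zip.CRC32;

public class SeekBugDemo {
    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data);
        return crc.getValue();
    }

    // Returns {matchesWithoutSeek, matchesWithSeek} for a section that does
    // not start at offset 0 of the file.
    public static boolean[] demo() throws IOException {
        File f = File.createTempFile("sstable", ".db");
        f.deleteOnExit();
        byte[] prefix  = "EARLIER-SECTIONS".getBytes("UTF-8"); // data before the section
        byte[] section = "SECTION-PAYLOAD".getBytes("UTF-8");  // chunk to stream
        try (FileOutputStream out = new FileOutputStream(f)) {
            out.write(prefix);
            out.write(section);
        }
        long expected = crcOf(section); // checksum the receiver will verify

        byte[] buf = new byte[section.length];
        boolean[] matches = new boolean[2];

        // Buggy path: read without seeking, so bytes come from offset 0.
        try (RandomAccessFile file = new RandomAccessFile(f, "r")) {
            file.readFully(buf);
            matches[0] = crcOf(buf) == expected;
        }
        // Fixed path: seek to the beginning of the section first.
        try (RandomAccessFile file = new RandomAccessFile(f, "r")) {
            file.seek(prefix.length);
            file.readFully(buf);
            matches[1] = crcOf(buf) == expected;
        }
        return matches;
    }

    public static void main(String[] args) throws IOException {
        boolean[] m = demo();
        System.out.println("without seek, CRC matches: " + m[0]);
        System.out.println("with seek, CRC matches: " + m[1]);
    }
}
```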
Fixed the race condition so that CompressedFileStreamTask sends the right part of the SSTable to the remote datacenter.
+ // seek to the beginning of the section when socket channel is not available
+ if (sc == null)
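The patch fragment above is truncated, but the comment suggests why the bug only bites with internode encryption: the unencrypted path presumably streams via FileChannel.transferTo, which takes the section offset as an argument, while the encrypted path writes through a plain stream and so must seek explicitly. A hedged standalone sketch of that two-path structure (method and variable names are illustrative, not the actual Cassandra source):

```java
import java.io.*;
import java.nio.channels.Channels;
import java.nio.channels.WritableByteChannel;

public class StreamPaths {
    // Send `len` bytes of `src` starting at offset `start`. When a raw socket
    // channel `sc` is available, transferTo carries the offset itself; over an
    // SSL stream (sc == null) the file position must be set explicitly.
    public static void send(File src, OutputStream out, WritableByteChannel sc,
                            long start, int len) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(src, "r")) {
            if (sc != null) {
                // unencrypted fast path: offset is an argument, no seek needed
                file.getChannel().transferTo(start, len, sc);
            } else {
                // encrypted path: seek to the beginning of the section first
                file.seek(start);
                byte[] buf = new byte[len];
                file.readFully(buf);
                out.write(buf);
            }
        }
    }

    public static void main(String[] args) throws IOException {
        File f = File.createTempFile("demo", ".db");
        f.deleteOnExit();
        try (FileOutputStream o = new FileOutputStream(f)) {
            o.write("abcdef".getBytes("UTF-8"));
        }
        ByteArrayOutputStream ssl = new ByteArrayOutputStream();
        send(f, ssl, null, 2, 3);                      // stream path
        ByteArrayOutputStream raw = new ByteArrayOutputStream();
        send(f, raw, Channels.newChannel(raw), 2, 3);  // channel path
        System.out.println(ssl.toString("UTF-8"));     // both paths yield "cde"
        System.out.println(raw.toString("UTF-8"));
    }
}
```

Dropping the seek in the sc == null branch is exactly the kind of omission that makes only the encrypted path ship bytes from the wrong offset.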