CASSANDRA-3156
java.lang.AssertionError exception when Row Repair is run.
Row Repair is a feature that detects and fixes inconsistencies in row data. It works as follows.
Optimistic phase: the coordinator reads the data from the closest replica and a digest of the data from the other replicas.
If there is a mismatch (the optimistic phase fails), we go to the repair phase:
The coordinator sends data reads to all replicas, then merges the responses and repairs the replicas.
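The two phases above can be sketched roughly as follows. This is a simplified model, not Cassandra's code: the class TwoPhaseReadSketch, the Versioned type, the use of MD5 and the "newest timestamp wins" merge rule are all assumptions made for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of the two-phase read; none of these names come from Cassandra.
class TwoPhaseReadSketch {
    // One replica's copy of a row: a value plus the timestamp of its last write.
    static final class Versioned {
        final String value;
        final long timestamp;
        Versioned(String value, long timestamp) {
            this.value = value;
            this.timestamp = timestamp;
        }
    }

    // Hash of a replica's copy (MD5 is an assumption of this sketch).
    static byte[] digest(Versioned v) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            md.update(v.value.getBytes(StandardCharsets.UTF_8));
            md.update((byte) 0); // separator between value and timestamp
            md.update(Long.toString(v.timestamp).getBytes(StandardCharsets.UTF_8));
            return md.digest();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }

    // Optimistic phase: replicas.get(0) plays the closest replica and returns data;
    // every other replica returns only a digest, which is compared with the data's digest.
    static boolean needsRepair(List<Versioned> replicas) {
        byte[] expected = digest(replicas.get(0));
        for (Versioned other : replicas.subList(1, replicas.size())) {
            if (!Arrays.equals(expected, digest(other))) {
                return true; // digest mismatch: the optimistic phase failed
            }
        }
        return false;
    }

    // Repair phase: read full data from every replica and keep the newest version.
    static Versioned resolve(List<Versioned> replicas) {
        Versioned newest = replicas.get(0);
        for (Versioned v : replicas) {
            if (v.timestamp > newest.timestamp) {
                newest = v;
            }
        }
        return newest;
    }

    static Versioned read(List<Versioned> replicas) {
        return needsRepair(replicas) ? resolve(replicas) : replicas.get(0);
    }
}
```

In this model, a digest that is computed from the wrong bytes makes needsRepair return true even though all replicas hold the same data, which is the kind of false mismatch this report is about.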
A "digest" query is like a read query except that instead of the receiving node actually returning the data, it only returns a digest (hash) of the would-be data.
The intent of submitting a digest query is to discover whether two or more nodes agree on what the current data is, without sending the data over the network. In particular for large amounts of data, this is a significant saving of bandwidth cost relative to sending the full data response.
Keep in mind that the cost of potentially going down to disk, and most or all of the CPU cost, associated with a query will still be taken on nodes that receive digest queries. The optimization is only for bandwidth.
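As a rough illustration of the bandwidth point, the sketch below models a replica answering either kind of query. The class DigestResponseSketch and the method respond are hypothetical, and MD5 is assumed as the hash. The replica needs the full row in both cases; only the size of what it sends back differs.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical replica-side handler; not Cassandra's API.
class DigestResponseSketch {
    // rowBytes stands for the row the replica has already read (disk and CPU cost paid).
    static byte[] respond(byte[] rowBytes, boolean digestOnly) {
        if (!digestOnly) {
            return rowBytes.clone(); // data query: the full row goes over the network
        }
        try {
            // Digest query: only the hash goes over the network (16 bytes for MD5).
            return MessageDigest.getInstance("MD5").digest(rowBytes);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For a 1 MB row, respond(row, false) returns 1 MB while respond(row, true) returns 16 bytes, yet both calls required the replica to have the whole row in hand.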
In this failure, this exception could occur even when there is no actual data inconsistency.
wrong computation
Blocker
yes. java.lang.AssertionError
no
no
single client
no
1.0.0
The number of nodes must be greater than or equal to 2. The node that is failing must not have a copy of the data that the local coordinator is trying to access.
2
1. start row repair (feature start)
yes
yes
yes
2
yes. java.lang.AssertionError exception in RowRepairResolver
Getting java.lang.AssertionError exception in RowRepairResolver.
After looking at the log, it appears that spurious (false/random) digest mismatches are occurring. They seem to happen when the coordinator does not have a copy of the data, so the data must come from a different node. However, when we sent a data request, we got a digest back (see the background information). While examining the code, we found that the buffer was not reset before each use.
yes
The buffer was not reset before each use, so it still held leftover data from its previous use.
Semantic
no
yes
Wiped the buffer before use.
DataOutputBuffer out = threadLocalOut.get();
+ out.reset();
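The effect of the missing reset can be reproduced with a small stand-alone sketch. It uses java.io.ByteArrayOutputStream as a stand-in for Cassandra's DataOutputBuffer, and the class ReusedBufferSketch with its two methods is hypothetical. Only the pattern is the same: a thread-local buffer that is reused across calls.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction of the bug pattern; not the actual Cassandra code.
class ReusedBufferSketch {
    // One reusable buffer per thread, as in the patched line above.
    private static final ThreadLocal<ByteArrayOutputStream> threadLocalOut =
            ThreadLocal.withInitial(ByteArrayOutputStream::new);

    // Buggy variant: appends to whatever the previous call left in the buffer.
    static byte[] serializeWithoutReset(String message) {
        ByteArrayOutputStream out = threadLocalOut.get();
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }

    // Fixed variant: clears the buffer first, mirroring the added out.reset() call.
    static byte[] serializeWithReset(String message) {
        ByteArrayOutputStream out = threadLocalOut.get();
        out.reset();
        byte[] bytes = message.getBytes(StandardCharsets.UTF_8);
        out.write(bytes, 0, bytes.length);
        return out.toByteArray();
    }
}
```

Calling serializeWithReset("abc") and then serializeWithoutReset("xy") on the same thread yields the bytes of "abcxy" rather than "xy". Any digest computed over such output would differ from one computed over clean output, even when the underlying data is identical.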