After a rolling upgrade of Cassandra from 1.1.7 to 1.2.0 begins, upgraded nodes can only see other 1.2.0 nodes. Nodes still running Cassandra 1.1.7 appear to disappear from the cluster.
yes, java.lang.RuntimeException: java.net.UnknownHostException
All nodes still running the older Cassandra version (1.1.7) after the upgrade begins
At least 2 nodes, with a rolling upgrade in progress: at least one node on 1.1.7 and at least one node on 1.2.0
1. Perform a rolling upgrade on one machine from 1.1.7 to 1.2.0 (feature start).
2. The failure requires a rolling upgrade, i.e. a cluster with a mix of 1.1.7 and 1.2.0 nodes.
Initially, the user has a fully functional Cassandra 1.1.7 cluster containing multiple nodes. During the rolling upgrade, some nodes have finished upgrading to 1.2.0 while others are queued for the upgrade and are still on 1.1.7. The problem surfaces when the user cannot see the nodes running the older version (1.1.7): the nodetool ring command shows only the 1.2.0 nodes as part of the cluster. The problem no longer affects the cluster once all nodes have moved to 1.2.0.
The problem is that nodes running Cassandra 1.2.0 cannot see nodes running Cassandra 1.1.7. According to the developers, the regression was caused by the interaction of changes made in separate release cycles. The cause of these exceptions is CASSANDRA-4576. There, we added checks against VERSION_11 to avoid using the compatibility mode with newer nodes that didn't need it. VERSION_11 has an actual value of 4. We closed the ticket on Sept 18, and that was that.
Fast forward to November, when we closed CASSANDRA-4880. That fix required a protocol version bump, so we created VERSION_117, which has an actual value of 5. Unfortunately, CASSANDRA-4576 used <= VERSION_11 comparisons, and we had now created a version higher than VERSION_11 that still needed the compatibility mode, so our original bug came back.
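To make the regression concrete, here is a minimal Java sketch of the original check. The class name OldVersionCheck and the method needsCompat are hypothetical stand-ins, not Cassandra's actual MessagingService code; only the constant values (VERSION_11 = 4, VERSION_117 = 5) come from the explanation above.

```java
// Hypothetical reproduction of the CASSANDRA-4576 comparison, NOT Cassandra's real code.
// Constant values are taken from the bug explanation: VERSION_11 = 4, VERSION_117 = 5.
public class OldVersionCheck {
    static final int VERSION_11 = 4;   // messaging version of 1.1 peers before the bump
    static final int VERSION_117 = 5;  // bump added for CASSANDRA-4880 (1.1.7)

    // Original check: only peers at VERSION_11 or below get the compatibility path.
    static boolean needsCompat(int peerVersion) {
        return peerVersion <= VERSION_11;
    }

    public static void main(String[] args) {
        // A VERSION_11 peer is handled correctly...
        System.out.println("VERSION_11 peer: compat = " + needsCompat(VERSION_11));
        // ...but a 1.1.7 peer is now treated as if it did not need compatibility.
        System.out.println("VERSION_117 peer: compat = " + needsCompat(VERSION_117));
    }
}
```

Running main prints compat = true for the VERSION_11 peer and compat = false for the VERSION_117 peer, which is the "original bug" returning for 1.1.7 nodes.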
The effect is that if you upgrade from 1.1.7 or later to 1.2.0, the 1.2.0 nodes can't gossip with the 1.1.7 nodes, and the 1.1.7 nodes won't be visible in ring output on a 1.2.0 node until they too are on 1.2.0. The 1.1.7 nodes still know about the 1.2.0 node, but they can't successfully gossip (communicate) with it, so they keep it marked down.
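When the developer added a new messaging version number (VERSION_117), they did not update the version compatibility check to match. The fix compares against the first version that does not need the compatibility mode: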
- if (version <= MessagingService.VERSION_11)
+ if (version < MessagingService.VERSION_12)
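The patched check can be sketched the same way. Again, FixedVersionCheck and needsCompat are hypothetical names, and VERSION_12 = 6 is an assumption (the next value after VERSION_117) that this report does not state.

```java
// Hypothetical sketch of the patched predicate, NOT Cassandra's real code.
// VERSION_11 and VERSION_117 come from the bug explanation; VERSION_12 = 6 is assumed.
public class FixedVersionCheck {
    static final int VERSION_11 = 4;
    static final int VERSION_117 = 5;
    static final int VERSION_12 = 6;   // assumed value

    // Fixed check: every peer older than 1.2 gets the compatibility path.
    static boolean needsCompat(int peerVersion) {
        return peerVersion < VERSION_12;
    }

    public static void main(String[] args) {
        int[] versions = {VERSION_11, VERSION_117, VERSION_12};
        for (int v : versions) {
            System.out.println("version " + v + ": compat = " + needsCompat(v));
        }
    }
}
```

Bounding the check by the first version that does not need compatibility (VERSION_12), rather than the last one that does, means a later 1.1.x bump such as VERSION_117 is covered without the check having to be updated again.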