Published using Google Docs
ZOOKEEPER-512 (release)
Updated automatically every 5 minutes

ZooKeeper-512

FLE election fails to elect leader

(1)  Log information

(1.1) Roles in this case

5 zk servers

(1.2) Symptoms

Environment:

There are 5 zk servers in the cluster.

“I was doing fault injection testing using aspectj and noticed that after some time (after 16:23:50,525) no quorum is formed.

The faults are injected into socketchannel read/write, I throw exceptions randomly at a 1/200 ratio (rand.nextFloat() <= .005 => throw IOException

You can see when a fault is injected in the log via:

2009-08-19 16:57:09,568 - INFO [Thread-74:ReadRequestFailsIntermittently@38] - READPACKET FORCED FAIL”

The jstack trace is for one of the servers. Notice that #RecvWorker != #SendWorker

2009-08-19 16:59:45  Full thread dump Java HotSpot(TM) Client VM (14.0-b16 mixed mode, sharing):

"Thread-182" prio=10 tid=0x08a81c00 nid=0x6fa9 runnable [0xb48ff000]

   java.lang.Thread.State: RUNNABLE

       ...

       at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:594)

"Thread-181" prio=10 tid=0x08b32800 nid=0x6fa7 waiting on condition [0xb4476000]

   java.lang.Thread.State: WAITING (parking)

       ...

       at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:533)

       ...

"Thread-180" prio=10 tid=0x08af4000 nid=0x6edd runnable [0xb4569000]

   java.lang.Thread.State: RUNNABLE

       ...

       at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:594)

"Thread-179" prio=10 tid=0x08a87800 nid=0x6edb waiting on condition [0xb4b87000]

   java.lang.Thread.State: WAITING (parking)

       ...

       at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:533)

"Thread-131" prio=10 tid=0x08a0d400 nid=0x5fc3 waiting on condition [0xb4bd8000]

   java.lang.Thread.State: WAITING (parking)

       ...

       at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:533)

       ...

"Thread-2" prio=10 tid=0x08aee400 nid=0x58e8 waiting on condition [0xb4c29000]

   java.lang.Thread.State: WAITING (parking)

       ...

       at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:533)

From the thread dump, 2 RecvWorker, 4 SendWorker (#RecvWorker != #SendWorker), which is abnormal, since the RecvWorker and sendWorker come in pairs.

(2) How to figure out the root cause based on logs

 (2.1) Based on the description, IOException is injected on each zk server. After some time, the ensemble fails to re-elect a leader.

Then we check the logs from each zk server after 16:23:50,525, and all the servers are in LOOKING state.

server5: the notifications are only from server 1,2,4,5, without server3; RecvWorker will catch the injected IOException.

2009-08-19 16:24:10,595 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2185:FastLeaderElection@618] - Notification: 5, 12884901889, 34, 5, LOOKING, LOOKING, 2

2009-08-19 16:24:12,024 - INFO  [Thread-59:ReadRequestFailsIntermittently@38] - READPACKET FORCED FAIL

2009-08-19 16:24:12,024 - WARN  [Thread-59:QuorumCnxManager$RecvWorker@621] - Connection broken:

java.io.IOException: aspect failed

                      at org.apache.zookeeper.server.quorum.ReadRequestFailsIntermittently.ajc$before$org_apache_zookeeper_server_quorum_ReadRequestFailsIntermittently$1$61870ae2(ReadRequestFailsIntermittently.aj:39)

                      at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:594)

 

2009-08-19 16:31:09,255 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2185:FastLeaderElection@618] - Notification: 5, 12884901889, 34, 5, LOOKING, LOOKING, 5

2009-08-19 16:31:09,261 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2185:FastLeaderElection@618] - Notification: 2, 12884901889, 34, 5, LOOKING, LOOKING, 1

 

2009-08-19 16:52:09,460 - INFO  [Thread-46:ReadRequestFailsIntermittently@38] - READPACKET FORCED FAIL

2009-08-19 16:52:09,461 - WARN  [Thread-46:QuorumCnxManager$SendWorker@550] - Exception when using channel: 4

java.io.IOException: aspect failed

                      at org.apache.zookeeper.server.quorum.ReadRequestFailsIntermittently.ajc$before$org_apache_zookeeper_server_quorum_ReadRequestFailsIntermittently$1$61870ae2(ReadRequestFailsIntermittently.aj:39)

                      at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.send(QuorumCnxManager.java:512)

                      at org.apache.zookeeper.server.quorum.QuorumCnxManager$SendWorker.run(QuorumCnxManager.java:548)

 

2009-08-19 16:53:09,528 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2185:FastLeaderElection@618] - Notification: 4, 12884901889, 35, 5, LOOKING, LOOKING, 4

server4: the notifications are only from server 2,4,5, without server 1,3; RecvWorker will catch the injected IOException.

2009-08-19 16:21:16,628 - WARN  [Thread-132:QuorumCnxManager$RecvWorker@621] - Connection broken:

java.io.IOException: aspect failed

                      at org.apache.zookeeper.server.quorum.ReadRequestFailsIntermittently.ajc$before$org_apache_zookeeper_server_quorum_ReadRequestFailsIntermittently$1$61870ae2(ReadRequestFailsIntermittently.aj:39)

                      at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:594)

 

2009-08-19 16:24:16,068 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2184:FastLeaderElection@618] - Notification: 4, 12884901889, 35, 4, LOOKING, LOOKING, 2

2009-08-19 16:24:16,069 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2184:FastLeaderElection@642] - Adding vote

2009-08-19 16:24:16,069 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2184:FastLeaderElection@618] - Notification: 4, 12884901889, 35, 4, LOOKING, LOOKING, 4

2009-08-19 16:24:16,840 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2184:FastLeaderElection@618] - Notification: 5, 12884901889, 34, 4, LOOKING, LOOKING, 5

server3: the notifications are only from server 3,4,5, without server 1,2; RecvWorker will catch the injected IOException.

2009-08-19 16:17:23,620 - WARN  [Thread-73:QuorumCnxManager$RecvWorker@621] - Connection broken:

java.io.IOException: aspect failed

                      at org.apache.zookeeper.server.quorum.ReadRequestFailsIntermittently.ajc$before$org_apache_zookeeper_server_quorum_ReadRequestFailsIntermittently$1$61870ae2(ReadRequestFailsIntermittently.aj:39)

                      at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:594)

2009-08-19 16:23:56,202 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:FastLeaderElection@618] - Notification: 3, 12884901889, 34, 3, LOOKING, LOOKING, 3

2009-08-19 16:24:10,608 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:FastLeaderElection@618] - Notification: 5, 12884901889, 34, 3, LOOKING, LOOKING, 4

2009-08-19 16:24:16,840 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2183:FastLeaderElection@618] - Notification: 5, 12884901889, 34, 3, LOOKING, LOOKING, 5

server2: the notifications are only from server 1,2,4,5, without server 3;  RecvWorker will catch the injected IOException.

2009-08-19 16:24:08,581 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2182:FastLeaderElection@618] - Notification: 2, 12884901889, 34, 2, LOOKING, LOOKING, 1

2009-08-19 16:24:08,793 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2182:FastLeaderElection@618] - Notification: 2, 12884901889, 34, 2, LOOKING, LOOKING, 2

2009-08-19 16:24:10,600 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2182:FastLeaderElection@618] - Notification: 4, 12884901889, 34, 2, LOOKING, LOOKING, 4

2009-08-19 16:24:16,839 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2182:FastLeaderElection@618] - Notification: 5, 12884901889, 34, 2, LOOKING, LOOKING, 5

2009-08-19 16:24:22,527 - WARN  [Thread-117:QuorumCnxManager$RecvWorker@621] - Connection broken:

java.io.IOException: aspect failed

                      at org.apache.zookeeper.server.quorum.ReadRequestFailsIntermittently.ajc$before$org_apache_zookeeper_server_quorum_ReadRequestFailsIntermittently$1$61870ae2(ReadRequestFailsIntermittently.aj:39)

                      at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:609)

server1:  the notifications are only from server 1,2, without server 3,4,5; RecvWorker will catch the injected IOException.

2009-08-19 16:21:09,313 - WARN  [Thread-27:QuorumCnxManager$RecvWorker@621] - Connection broken:

java.io.IOException: aspect failed

                      at org.apache.zookeeper.server.quorum.ReadRequestFailsIntermittently.ajc$before$org_apache_zookeeper_server_quorum_ReadRequestFailsIntermittently$1$61870ae2(ReadRequestFailsIntermittently.aj:39)

                      at org.apache.zookeeper.server.quorum.QuorumCnxManager$RecvWorker.run(QuorumCnxManager.java:609)

2009-08-19 16:24:08,585 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@618] - Notification: 2, 12884901889, 34, 1, LOOKING, LOOKING, 1

2009-08-19 16:24:08,783 - INFO  [QuorumPeer:/0:0:0:0:0:0:0:0:2181:FastLeaderElection@618] - Notification: 2, 12884901889, 34, 1, LOOKING, LOOKING, 2

Besides, from the thread dump (jstack trace), we know that #RecvWorker < #SendWorker. It is likely that the zk server cannot receive the notification from the specific peer due to the failed RecvWorker, then the exchanged notifications are not enough to elect the new leader.

(on each zk server, there will be a pair of SendWorker/RecvWorker for each peer: for example, in a 5 servers zk cluster, on each zk server, there are 4 pairs of SendWorker/RecvWorker threads for exchanging messages with the other peers)

(2.2) Then we check the source code of SendWorker and RecvWorker.

SendWorker.run()

When IOException is caught by SendWorker, finish() is called, and senderWorkerMap.remove(sid)will remove the sendWorker which is related to the specific server from the map.

RecvWorker.run()

When IOException is caught by RecvWorker, only a WARN message is printed.

 

SendWorker and RecvWorker is created by initiateConnection(), which is called by connectOne().

In connectOne(), before initiating the connection, it will checks whether the connection already exists by senderWorkerMap.get(sid) == null. connectOne() is called when the current zk server needs to send messages to the other peers.

Therefore, if the injected IOException is caught by SendWorker, a new connection (including SendWorker/RecvWorker) will be created for the target peer when exchanging messages.

If the injected IOException is caught by RecvWorker, the RecvWorker will just fail silently. When the current zk server wants to exchange messages with the affected peer, it can only send message to the peer, without receiving message from the peer.

(3) Root Cause

RecvWorker on each zk server does not handle the IOException correctly, which leads to the zk server fails to receive the notifications from the specific peers. As a result, the ensemble cannot elect the leader.

(4) Fixing Method

Add finally block that closes the channel in RecvWorker.run().

Once the channel is closed in RecvWorker.run(), the SendWorker.run() will be affected and finish() will be called. So a new channel and SendWorker/RecvWorder pair can be created next time.

•       src/java/main/org/apache/zookeeper/server/quorum/QuorumCnxManager.java

RecvWorker.run()

The following fixes are not related to this bug.

 

Replaces take() with poll() in SendWorker().

“This fixes a race condition that can easily manifest when the system goes haywire (see Pat's aspects for examples).”

check the validity of server id.

The other fixes are in Listener.run(). 

(5) How many nodes are involved in the patch

zk servers.

(6)    Code Snippets

recvWorker.finish()

SendWorker.run()