4 Dec 2016

Kubernetes: Simulating a network partition

A couple of weeks ago I wrote a post explaining how to create a Neo4j causal cluster using Kubernetes and ... the I wanted to work out how to simulate a network partition which would put the leader on the minority side and force an election.

We’ve done this on our internal tooling on AWS using the https://en.wikipedia.org/wiki/Iptables command but unfortunately that isn’t available in my container, which only has the utilities provided by BusyBox.

Luckily one of these is route command which will allow us to achieve the same thing.

To recap, I have 3 Neo4j pods up and running:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
neo4j-0   1/1       Running   0          6h
neo4j-1   1/1       Running   0          6h
neo4j-2   1/1       Running   0          6h

And we can check that the route command is available:

$ kubectl exec neo4j-0 -- ls -alh /sbin/route
lrwxrwxrwx    1 root     root          12 Oct 18 18:58 /sbin/route -> /bin/busybox

Let’s have a look what role each server is currently playing:

$ kubectl exec neo4j-0 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"

Bye!

$ kubectl exec neo4j-1 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"

Bye!

$ kubectl exec neo4j-2 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"LEADER"

Bye!

Slight aside: I’m able to call cypher-shell without a user and password because I’ve disable authorisation by putting the following in conf/neo4j.conf:

dbms.connector.bolt.enabled=true

Back to the network partitioning...we need to partition away neo4j-2 from the other two servers which we can do by running the following commands:

$ kubectl exec neo4j-2 -- route add -host neo4j-0.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-2 -- route add -host neo4j-1.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-0 -- route add -host neo4j-2.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-1 -- route add -host neo4j-2.neo4j.default.svc.cluster.local reject

If we look at the logs of neo4j-2 we can see that it’s stepped down after being disconnected from the other two servers:

$ kubectl exec neo4j-2 -- cat logs/debug.log
...
2016-12-04 11:30:10.186+0000 INFO  [o.n.c.c.c.RaftMachine] Moving to FOLLOWER state after not receiving heartbeat responses in this election timeout period. Heartbeats received: []
...

Who’s taken over as leader?

$ kubectl exec neo4j-0 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"LEADER"

Bye!

$ kubectl exec neo4j-1 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"

Bye!

$ kubectl exec neo4j-2 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"

Bye!

Looks like neo4j-0! Let’s put some data into the database:

$ kubectl exec neo4j-0 -- bin/cypher-shell "CREATE (:Person {name: 'Mark'})"
Added 1 nodes, Set 1 properties, Added 1 labels

Bye!

Let’s check if that node made it to the other two servers. We’d expect it to be on neo4j-1 but not on neo4j-2:

$ kubectl exec neo4j-1 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
p
(:Person {name: "Mark"})

Bye!

$ kubectl exec neo4j-2 -- bin/cypher-shell "MATCH (p:Person) RETURN p"


Bye!

On neo4j-2 we’ll repeatedly see these types of entries in the log as its election timeout triggers but fails to get any responses to the vote requests it sends out:

$ kubectl exec neo4j-2 -- cat logs/debug.log
...
2016-12-04 11:32:56.735+0000 INFO  [o.n.c.c.c.RaftMachine] Election timeout triggered
2016-12-04 11:32:56.736+0000 INFO  [o.n.c.c.c.RaftMachine] Election started with vote request: Vote.Request from MemberId{ca9b954c} {term=11521, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467} and members: [MemberId{484178c4}, MemberId{0acdb8dd}, MemberId{ca9b954c}]
...

We can see those vote requests by looking at the raft-messages.log which can be enabled by setting the following property in conf/neo4j.conf:

causal_clustering.raft_messages_log_enable=true

$ kubectl exec neo4j-2 -- cat logs/raft-messages.log
...
11:33:42.101 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11537, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:42.102 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11537, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}

11:33:45.432 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11538, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:45.433 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11538, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}

11:33:48.362 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11539, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:48.362 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11539, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
...

To 'heal' the network partition we just need to delete all the commands we ran earlier:

$ kubectl exec neo4j-2 -- route delete neo4j-0.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-2 -- route delete neo4j-1.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-0 -- route delete neo4j-2.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-1 -- route delete neo4j-2.neo4j.default.svc.cluster.local reject

Now let’s check that neo4j-2 now has the node that we created earlier:

$ kubectl exec neo4j-2 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
p
(:Person {name: "Mark"})

Bye!

That’s all for now!

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.