Kubernetes: Spinning up a Neo4j 3.1 Causal Cluster
A couple of weeks ago I wrote a blog post explaining how I’d created a Neo4j causal cluster using Docker containers directly. For my next pet project I wanted to use Kubernetes as an orchestration layer so that I could declaratively change the number of servers in my cluster.
I’d never used Kubernetes before but I saw a presentation showing how to use it to create an Elastic cluster at the GDG Cloud meetup a couple of months ago.
In that presentation I was introduced to the idea of a PetSet, a Kubernetes abstraction that lets us manage a set of pods (containers) with a fixed identity. The documentation explains it better:
A PetSet ensures that a specified number of “pets” with unique identities are running at any given time. The identity of a Pet is comprised of:
a stable hostname, available in DNS
an ordinal index
stable storage: linked to the ordinal & hostname
In my case I need to have a stable hostname because each member of a Neo4j cluster is given a list of other cluster members with which it can create a new cluster or join an already existing one. This is the first use case described in the documentation:
PetSet also helps with the 2 most common problems encountered managing such clustered applications:
discovery of peers for quorum
startup/teardown ordering
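Each pet’s stable hostname follows the pattern <petset-name>-<ordinal>.<service>.<namespace>.svc.cluster.local. A quick shell sketch of that naming scheme, assuming the `neo4j` PetSet/service names and `default` namespace used later in this post (`fqdn` is my own helper name):

```shell
# Build the stable DNS name for pet number $1 in PetSet/service "neo4j".
# Namespace and cluster domain are assumptions: default / cluster.local.
fqdn() {
  echo "neo4j-${1}.neo4j.default.svc.cluster.local"
}

fqdn 0   # -> neo4j-0.neo4j.default.svc.cluster.local
```

This is what makes peer discovery workable: the name for any ordinal is known before the pod exists.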
So the first thing we need to do is create some stable storage for our pods to use.
We’ll create a cluster of 3 members so we need to create one PersistentVolume for each of them. The following script does the job:
volumes.sh
for i in $(seq 0 2); do
  cat <<EOF | kubectl create -f -
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv${i}
  labels:
    type: local
    app: neo4j
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/${i}"
EOF

  cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: datadir-neo4j-${i}
  labels:
    app: neo4j
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
done;
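If you’d rather inspect a manifest before applying it, the heredoc can be factored into a function that only emits the YAML. A sketch (`pv_manifest` is my own name; the body mirrors the script above):

```shell
# Emit the PersistentVolume manifest for ordinal $1 without applying it;
# pipe the result to `kubectl create -f -` once you're happy with it.
pv_manifest() {
cat <<EOF
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv${1}
  labels:
    type: local
    app: neo4j
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/${1}"
EOF
}

pv_manifest 0
```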
If we run this script it’ll create 3 volumes and 3 matching claims, which we can see by running the following commands:
$ kubectl get pv
NAME      CAPACITY   ACCESSMODES   STATUS    CLAIM                     REASON    AGE
pv0       1Gi        RWO           Bound     default/datadir-neo4j-0             7s
pv1       1Gi        RWO           Bound     default/datadir-neo4j-1             7s
pv2       1Gi        RWO           Bound     default/datadir-neo4j-2             7s

$ kubectl get pvc
NAME              STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
datadir-neo4j-0   Bound     pv0       1Gi        RWO           26s
datadir-neo4j-1   Bound     pv1       1Gi        RWO           26s
datadir-neo4j-2   Bound     pv2       1Gi        RWO           25s
Next we need to create a PetSet template. After a lot of iterations I ended up with the following:
# Headless service to provide DNS lookup
apiVersion: v1
kind: Service
metadata:
  labels:
    app: neo4j
  name: neo4j
spec:
  clusterIP: None
  ports:
    - port: 7474
  selector:
    app: neo4j
---
# new API name
apiVersion: "apps/v1alpha1"
kind: PetSet
metadata:
  name: neo4j
spec:
  serviceName: neo4j
  replicas: 3
  template:
    metadata:
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
        pod.beta.kubernetes.io/init-containers: '[
          {
            "name": "install",
            "image": "gcr.io/google_containers/busybox:1.24",
            "command": ["/bin/sh", "-c", "echo \"
              unsupported.dbms.edition=enterprise\n
              dbms.mode=CORE\n
              dbms.connectors.default_advertised_address=$HOSTNAME.neo4j.default.svc.cluster.local\n
              dbms.connectors.default_listen_address=0.0.0.0\n
              dbms.connector.bolt.type=BOLT\n
              dbms.connector.bolt.enabled=true\n
              dbms.connector.bolt.listen_address=0.0.0.0:7687\n
              dbms.connector.http.type=HTTP\n
              dbms.connector.http.enabled=true\n
              dbms.connector.http.listen_address=0.0.0.0:7474\n
              causal_clustering.raft_messages_log_enable=true\n
              causal_clustering.initial_discovery_members=neo4j-0.neo4j.default.svc.cluster.local:5000,neo4j-1.neo4j.default.svc.cluster.local:5000,neo4j-2.neo4j.default.svc.cluster.local:5000\n
              causal_clustering.leader_election_timeout=2s\n
              \" > /work-dir/neo4j.conf" ],
            "volumeMounts": [
              {
                "name": "confdir",
                "mountPath": "/work-dir"
              }
            ]
          }
        ]'
      labels:
        app: neo4j
    spec:
      containers:
        - name: neo4j
          image: "neo4j/neo4j-experimental:3.1.0-M13-beta3-enterprise"
          imagePullPolicy: Always
          ports:
            - containerPort: 5000
              name: discovery
            - containerPort: 6000
              name: tx
            - containerPort: 7000
              name: raft
            - containerPort: 7474
              name: browser
            - containerPort: 7687
              name: bolt
          securityContext:
            privileged: true
          volumeMounts:
            - name: datadir
              mountPath: /data
            - name: confdir
              mountPath: /conf
      volumes:
        - name: confdir
          emptyDir: {}
  volumeClaimTemplates:
    - metadata:
        name: datadir
        annotations:
          volume.alpha.kubernetes.io/storage-class: anything
      spec:
        accessModes: [ "ReadWriteOnce" ]
        resources:
          requests:
            storage: 1Gi
The main thing I had trouble with was getting the members of the cluster to talk to each other. The default Docker config uses bare hostnames, but I found that pods were unable to contact each other unless I specified the FQDN in the config file. We can run the following command to create the PetSet:
$ kubectl create -f neo4j.yaml
service "neo4j" created
petset "neo4j" created
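The initial_discovery_members list is hard-coded for 3 members in the template above; it can be generated for any replica count. A sketch that reuses the naming scheme from the template (`discovery_members` is my own helper name):

```shell
# Build the comma-separated discovery list for $1 replicas, using the
# PetSet naming scheme and discovery port 5000 from the template above.
discovery_members() {
  members=""
  for i in $(seq 0 $(($1 - 1))); do
    members="${members}neo4j-${i}.neo4j.default.svc.cluster.local:5000,"
  done
  echo "${members%,}"   # strip the trailing comma
}

discovery_members 3
```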
We can check if the pods are up and running by executing the following command:
$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
neo4j-0   1/1       Running   0          2m
neo4j-1   1/1       Running   0          14s
neo4j-2   1/1       Running   0          10s
And we can tail Neo4j’s log files like this:
$ kubectl logs neo4j-0
Starting Neo4j.
2016-11-25 16:39:50.333+0000 INFO Starting...
2016-11-25 16:39:51.723+0000 INFO Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:51.733+0000 INFO Initiating metrics...
2016-11-25 16:39:51.911+0000 INFO Waiting for other members to join cluster before continuing...
2016-11-25 16:40:12.074+0000 INFO Started.
2016-11-25 16:40:12.428+0000 INFO Mounted REST API at: /db/manage
2016-11-25 16:40:13.350+0000 INFO Remote interface available at http://neo4j-0.neo4j.default.svc.cluster.local:7474/
$ kubectl logs neo4j-1
Starting Neo4j.
2016-11-25 16:39:53.846+0000 INFO Starting...
2016-11-25 16:39:56.212+0000 INFO Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:56.225+0000 INFO Initiating metrics...
2016-11-25 16:39:56.341+0000 INFO Waiting for other members to join cluster before continuing...
2016-11-25 16:40:16.623+0000 INFO Started.
2016-11-25 16:40:16.951+0000 INFO Mounted REST API at: /db/manage
2016-11-25 16:40:17.607+0000 INFO Remote interface available at http://neo4j-1.neo4j.default.svc.cluster.local:7474/
$ kubectl logs neo4j-2
Starting Neo4j.
2016-11-25 16:39:57.828+0000 INFO Starting...
2016-11-25 16:39:59.166+0000 INFO Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:59.176+0000 INFO Initiating metrics...
2016-11-25 16:39:59.329+0000 INFO Waiting for other members to join cluster before continuing...
2016-11-25 16:40:19.216+0000 INFO Started.
2016-11-25 16:40:19.675+0000 INFO Mounted REST API at: /db/manage
2016-11-25 16:40:21.029+0000 INFO Remote interface available at http://neo4j-2.neo4j.default.svc.cluster.local:7474/
I wanted to log into the servers from my host machine’s browser so I set up port forwarding for each of the servers:
$ kubectl port-forward neo4j-0 7474:7474 7687:7687
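To reach all three members at once, each pod needs distinct local ports. A sketch that prints one port-forward command per pod (`pf_commands` and the +1/+2 port offsets are my own convention, not anything Kubernetes mandates):

```shell
# Print a kubectl port-forward command per pod, offsetting the local
# ports so neo4j-0/1/2 map to 7474/7475/7476 and 7687/7688/7689.
pf_commands() {
  for i in 0 1 2; do
    echo "kubectl port-forward neo4j-${i} $((7474 + i)):7474 $((7687 + i)):7687"
  done
}

pf_commands
```

Each printed command needs to run in its own terminal (or be backgrounded), since kubectl port-forward blocks.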
We can then get an overview of the cluster by running the following procedure:
CALL dbms.cluster.overview()
╒════════════════════════════════════╤═════════════════════════════════════════════════════╤════════╕
│id │addresses │role │
╞════════════════════════════════════╪═════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687,│LEADER │
│ │ http://neo4j-0.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-1.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-2.neo4j.default.svc.cluster.local:7474]│ │
└────────────────────────────────────┴─────────────────────────────────────────────────────┴────────┘
So far so good. What if we want to have 5 servers in the cluster instead of 3? We can run the following command to increase our replica size:
$ kubectl patch petset neo4j -p '{"spec":{"replicas":5}}'
"neo4j" patched
Let’s run that procedure again:
CALL dbms.cluster.overview()
╒════════════════════════════════════╤═════════════════════════════════════════════════════╤════════╕
│id │addresses │role │
╞════════════════════════════════════╪═════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687,│LEADER │
│ │ http://neo4j-0.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-1.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-2.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│28613d06-d4c5-461c-b277-ddb3f05e5647│[bolt://neo4j-3.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-3.neo4j.default.svc.cluster.local:7474]│ │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│2eaa0058-a4f3-4f07-9f22-d310562ad1ec│[bolt://neo4j-4.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│ │ http://neo4j-4.neo4j.default.svc.cluster.local:7474]│ │
└────────────────────────────────────┴─────────────────────────────────────────────────────┴────────┘
Neat! And it’s as easy to go back down to 3 again:
$ kubectl patch petset neo4j -p '{"spec":{"replicas":3}}'
"neo4j" patched
CALL dbms.cluster.overview()
╒════════════════════════════════════╤══════════════════════════════════════════════════════╤════════╕
│id │addresses │role │
╞════════════════════════════════════╪══════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687, │LEADER │
│ │http://neo4j-0.neo4j.default.svc.cluster.local:7474] │ │
├────────────────────────────────────┼──────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687, │FOLLOWER│
│ │http://neo4j-1.neo4j.default.svc.cluster.local:7474] │ │
├────────────────────────────────────┼──────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687, │FOLLOWER│
│ │http://neo4j-2.neo4j.default.svc.cluster.local:7474] │ │
└────────────────────────────────────┴──────────────────────────────────────────────────────┴────────┘
Next I need to look at how we can add read replicas into the cluster. These don’t take part in the membership/quorum algorithm so I think I’ll be able to use the more common ReplicationController/Pod architecture for those.
If you want to play around with this, the code is available as a gist. I’m using minikube for all my experiments, but I’ll hopefully get around to trying this on GCE or AWS soon.
About the author
I'm currently working on real-time user-facing analytics with Apache Pinot at StarTree. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms book with Amy Hodler.