Mark Needham

Thoughts on Software Development

Archive for the ‘kubernetes’ tag

Kubernetes: Copy a dataset to a StatefulSet’s PersistentVolume

without comments

In this post we’ll learn how to copy an existing dataset to the PersistentVolumes used by a Neo4j cluster running on Kubernetes.

Neo4j Clusters on Kubernetes

This posts assumes that we’re familiar with deploying Neo4j on Kubernetes. I wrote an article on the Neo4j blog explaining this in more detail.

The StatefulSet we create for our core servers require persistent storage, achieved via the PersistentVolumeClaim (PVC) primitive. A Neo4j cluster containing 3 core servers would have the following PVCs:

$ kubectl get pvc
NAME                            STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-neo-helm-neo4j-core-0   Bound     pvc-043efa91-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            standard       45s
datadir-neo-helm-neo4j-core-1   Bound     pvc-1737755a-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            standard       13s
datadir-neo-helm-neo4j-core-2   Bound     pvc-18696bfd-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            standard       11s

Each of the PVCs has a corresponding PersistentVolume (PV) that satisifies it:

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                   STORAGECLASS   REASON    AGE
pvc-043efa91-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            Delete           Bound     default/datadir-neo-helm-neo4j-core-0   standard                 41m
pvc-1737755a-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            Delete           Bound     default/datadir-neo-helm-neo4j-core-1   standard                 40m
pvc-18696bfd-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            Delete           Bound     default/datadir-neo-helm-neo4j-core-2   standard                 40m

The PVCs and PVs are usually created at the same time that we deploy our StatefulSet. We need to intervene in that lifecycle so that our dataset is already in place before the StatefulSet is deployed.

Deploying an existing dataset

We can do this by following these steps:

  • Create PVCs with the above names manually
  • Attach pods to those PVCs
  • Copy our dataset onto those pods
  • Delete the pods
  • Deploy our Neo4j cluster

We can use the following script to create the PVCs and pods:

pvs.sh

#!/usr/bin/env bash
 
set -exuo pipefail
 
for i in $(seq 0 2); do
  cat <<EOF | kubectl apply -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: datadir-neo-helm-neo4j-core-${i}
  labels:
    app: neo4j
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
 
  cat <<EOF | kubectl apply -f -
kind: Pod
apiVersion: v1
metadata:
  name: neo4j-load-data-${i}
  labels:
    app: neo4j-loader
spec:
  volumes:
    - name: datadir-neo4j-core-${i}
      persistentVolumeClaim:
        claimName: datadir-neo-helm-neo4j-core-${i}
  containers:
    - name: neo4j-load-data-${i}
      image: ubuntu
      volumeMounts:
      - name: datadir-neo4j-core-${i}
        mountPath: /data
      command: ["/bin/bash", "-ecx", "while :; do printf '.'; sleep 5 ; done"]
EOF
 
done;

Let’s run that script to create our PVCs and pods:

$ ./pvs.sh 
++ seq 0 2
+ for i in $(seq 0 2)
+ cat
+ kubectl apply -f -
persistentvolumeclaim "datadir-neo-helm-neo4j-core-0" configured
+ cat
+ kubectl apply -f -
pod "neo4j-load-data-0" configured
+ for i in $(seq 0 2)
+ cat
+ kubectl apply -f -
persistentvolumeclaim "datadir-neo-helm-neo4j-core-1" configured
+ cat
+ kubectl apply -f -
pod "neo4j-load-data-1" configured
+ for i in $(seq 0 2)
+ cat
+ kubectl apply -f -
persistentvolumeclaim "datadir-neo-helm-neo4j-core-2" configured
+ cat
+ kubectl apply -f -
pod "neo4j-load-data-2" configured

Now we can copy our database onto the pods:

for i in $(seq 0 2); do
  kubectl cp graph.db.tar.gz neo4j-load-data-${i}:/data/
  kubectl exec neo4j-load-data-${i} -- bash -c "mkdir -p /data/databases && tar -xf  /data/graph.db.tar.gz -C /data/databases"
done

graph.db.tar.gz contains a backup from a local database I created:

$ tar -tvf graph.db.tar.gz 
 
drwxr-xr-x  0 markneedham staff       0 24 Jul 15:23 graph.db/
drwxr-xr-x  0 markneedham staff       0 24 Jul 15:23 graph.db/certificates/
drwxr-xr-x  0 markneedham staff       0 17 Feb  2017 graph.db/index/
drwxr-xr-x  0 markneedham staff       0 24 Jul 15:22 graph.db/logs/
-rw-r--r--  0 markneedham staff    8192 24 Jul 15:23 graph.db/neostore
-rw-r--r--  0 markneedham staff     896 24 Jul 15:23 graph.db/neostore.counts.db.a
-rw-r--r--  0 markneedham staff    1344 24 Jul 15:23 graph.db/neostore.counts.db.b
-rw-r--r--  0 markneedham staff       9 24 Jul 15:23 graph.db/neostore.id
-rw-r--r--  0 markneedham staff   65536 24 Jul 15:23 graph.db/neostore.labelscanstore.db
...
-rw-------  0 markneedham staff     1700 24 Jul 15:23 graph.db/certificates/neo4j.key

We’ll run the following command to check the databases are in place:

$ kubectl exec neo4j-load-data-0 -- ls -lh /data/databases/
total 4.0K
drwxr-xr-x 6 501 staff 4.0K Jul 24 14:23 graph.db
 
$ kubectl exec neo4j-load-data-1 -- ls -lh /data/databases/
total 4.0K
drwxr-xr-x 6 501 staff 4.0K Jul 24 14:23 graph.db
 
$ kubectl exec neo4j-load-data-2 -- ls -lh /data/databases/
total 4.0K
drwxr-xr-x 6 501 staff 4.0K Jul 24 14:23 graph.db

All good so far. The pods have done their job so we’ll tear those down:

$ kubectl delete pods -l app=neo4j-loader
pod "neo4j-load-data-0" deleted
pod "neo4j-load-data-1" deleted
pod "neo4j-load-data-2" deleted

We’re now ready to deploy our Neo4j cluster.

helm install incubator/neo4j --name neo-helm --wait --set authEnabled=false

Finally we’ll run a Cypher query to check that the Neo4j servers used the database that we uploaded:

$ kubectl exec neo-helm-neo4j-core-0 -- bin/cypher-shell "match (n) return count(*)"
count(*)
32314
 
$ kubectl exec neo-helm-neo4j-core-1 -- bin/cypher-shell "match (n) return count(*)"
count(*)
32314
 
$ kubectl exec neo-helm-neo4j-core-2 -- bin/cypher-shell "match (n) return count(*)"
count(*)
32314

Success!

We could achieve similar results by using an init container but I haven’t had a chance to try out that approach yet. If you give it a try let me know in the comments and I’ll add it to the post.

Written by Mark Needham

November 18th, 2017 at 12:44 pm

Posted in Kubernetes,neo4j

Tagged with ,

Kubernetes 1.8: Using Cronjobs to take Neo4j backups

without comments

With the release of Kubernetes 1.8 Cronjobs have graduated to beta, which means we can now more easily run Neo4j backup jobs against Kubernetes clusters.

Before we learn how to write a Cronjob let’s first create a local Kubernetes cluster and deploy Neo4j.

Spinup Kubernetes & Helm

minikube start --memory 8192
helm init && kubectl rollout status -w deployment/tiller-deploy --namespace=kube-system

Deploy a Neo4j cluster

helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/
helm install incubator/neo4j --name neo-helm --wait --set authEnabled=false,core.extraVars.NEO4J_dbms_backup_address=0.0.0.0:6362

Populate our cluster with data

We can run the following command to check our cluster is ready:

kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "CALL dbms.cluster.overview() YIELD id, role RETURN id, role"
 
+-----------------------------------------------------+
| id                                     | role       |
+-----------------------------------------------------+
| "0b3bfe6c-6a68-4af5-9dd2-e96b564df6e5" | "LEADER"   |
| "09e9bee8-bdc5-4e95-926c-16ea8213e6e7" | "FOLLOWER" |
| "859b9b56-9bfc-42ae-90c3-02cedacfe720" | "FOLLOWER" |
+-----------------------------------------------------+

Now let’s create some data:

kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "UNWIND range(0,1000) AS id CREATE (:Person {id: id})"
 
0 rows available after 653 ms, consumed after another 0 ms
Added 1001 nodes, Set 1001 properties, Added 1001 labels

Now that our Neo4j cluster is running and contains data we want to take regular backups.

Neo4j backups

The Neo4j backup tool supports full and incremental backups.

Full backup

A full backup streams a copy of the store files to the backup location and any transactions that happened during the backup. Those transactions are then applied to the backed up copy of the database.

Incremental backup

An incremental backup is triggered if there is already a Neo4j database in the backup location. In this case there will be no copying of store files. Instead the tool will copy any new transactions from Neo4j and apply them to the backup.

Backup Cronjob

We will use a Cronjob to execute a backup. In the example below we attach a PersistentVolumeClaim to our Cronjob so that we can see both of the backup scenarios in action.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: backupdir-neo4j
  labels:
    app: neo4j-backup
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: neo4j-backup
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: backupdir-neo4j
              persistentVolumeClaim:
                claimName: backupdir-neo4j
          containers:
            - name: neo4j-backup
              image: neo4j:3.3.0-enterprise
              env:
                - name: NEO4J_ACCEPT_LICENSE_AGREEMENT
                  value: "yes"
              volumeMounts:
              - name: backupdir-neo4j
                mountPath: /tmp
              command: ["bin/neo4j-admin",  "backup", "--backup-dir", "/tmp", "--name", "backup", "--from", "neo-helm-neo4j-core-2.neo-helm-neo4j.default.svc.cluster.local:6362"]
          restartPolicy: OnFailure

$ kubectl apply -f cronjob.yaml 
cronjob "neo4j-backup" created
kubectl get jobs -w
NAME                      DESIRED   SUCCESSFUL   AGE
neo4j-backup-1510940940   1         1            34s

Now let’s view the logs from the job:

kubectl logs $(kubectl get pods -a --selector=job-name=neo4j-backup-1510940940 --output=jsonpath={.items..metadata.name})
 
Doing full backup...
2017-11-17 17:49:05.920+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db.labels
...
2017-11-17 17:49:06.038+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.labelscanstore.db 48.00 kB
2017-11-17 17:49:06.038+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 18 files
2017-11-17 17:49:06.094+0000 INFO [o.n.b.BackupService] Start recovering store
2017-11-17 17:49:07.669+0000 INFO [o.n.b.BackupService] Finish recovering store
Doing consistency check...
2017-11-17 17:49:07.716+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /tmp/backup
2017-11-17 17:49:07.716+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2017-11-17 17:49:07.755+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
....................  10%
...
.................... 100%
Backup complete.

All good so far. Now let’s create more data:

kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "UNWIND range(0,1000) AS id CREATE (:Person {id: id})"
 
 
0 rows available after 114 ms, consumed after another 0 ms
Added 1001 nodes, Set 1001 properties, Added 1001 labels

And wait for another backup job to run:

kubectl get jobs -w
NAME                      DESIRED   SUCCESSFUL   AGE
neo4j-backup-1510941180   1         1            2m
neo4j-backup-1510941240   1         1            1m
neo4j-backup-1510941300   1         1            17s
kubectl logs $(kubectl get pods -a --selector=job-name=neo4j-backup-1510941300 --output=jsonpath={.items..metadata.name})
 
Destination is not empty, doing incremental backup...
Doing consistency check...
2017-11-17 17:55:07.958+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /tmp/backup
2017-11-17 17:55:07.959+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2017-11-17 17:55:07.995+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
....................  10%
...
.................... 100%
Backup complete.

If we were to extend this job further we could have it copy each backup it takes to an S3 bucket but I’ll leave that for another day.

Written by Mark Needham

November 17th, 2017 at 6:10 pm

Posted in Kubernetes,neo4j

Tagged with , , ,

Kubernetes: Simple example of pod running

without comments

I recently needed to create a Kubernetes pod that would ‘just sit there’ while I used kube cp to copy some files to a persistent volume to which it was bound.

I started out with this naive pod spec:

pod_no_while.yaml

kind: Pod
apiVersion: v1
metadata:
  name: marks-dummy-pod
spec:
  containers:
    - name: marks-dummy-pod
      image: ubuntu
  restartPolicy: Never

Let’s apply that template:

$ kubectl apply -f pod_no_while.yaml 
pod "marks-dummy-pod" created

And let’s check if we have any running pods:

$ kubectl get pods
No resources found, use --show-all to see completed objects.

We won’t see anything here because unsurprisingly the pod has run to completion as there’s nothing to keep it running! We can confirm that by running this command:

$ kubectl get pods --show-all
NAME              READY     STATUS      RESTARTS   AGE
marks-dummy-pod   0/1       Completed   0          1m

Now let’s create a pod that has an infinite bash while loop:

pod.yaml

kind: Pod
apiVersion: v1
metadata:
  name: marks-dummy-pod
spec:
  containers:
    - name: marks-dummy-pod
      image: ubuntu
      command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 5 ; done"]
  restartPolicy: Never

Let’s apply that one instead:

$ kubectl apply -f pod.yaml 
The Pod "marks-dummy-pod" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)

Oops, we need to delete it first so let’s do that:

$ kubectl delete pod marks-dummy-pod
pod "marks-dummy-pod" deleted

Attempt #2:

$ kubectl apply -f pod.yaml 
pod "marks-dummy-pod" created

And let’s checkup on our pod:

$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
marks-dummy-pod   1/1       Running   0          14s

Looks better already. Let’s check the logs

$ kubectl logs marks-dummy-pod 
.
.

Great! We can now kubectl cp to our heart’s content and then delete the pod aftewards.

Written by Mark Needham

October 21st, 2017 at 10:06 am

Posted in Kubernetes

Tagged with ,

Kubernetes: Which node is a pod on?

without comments

When running Kubernetes on a cloud provider, rather than locally using minikube, it’s useful to know which node a pod is running on.

The normal command to list pods doesn’t contain this information:

$ kubectl get pod
NAME           READY     STATUS    RESTARTS   AGE       
neo4j-core-0   1/1       Running   0          6m        
neo4j-core-1   1/1       Running   0          6m        
neo4j-core-2   1/1       Running   0          2m

I spent a while searching for a command that I could use before I came across Ta-Ching Chen’s blog post while looking for something else.

Ta-Ching points out that we just need to add the flag -o wide to our original command to get the information we require:

$ kubectl get pod -o wide
NAME           READY     STATUS    RESTARTS   AGE       IP           NODE
neo4j-core-0   1/1       Running   0          6m        10.32.3.6    gke-neo4j-cluster-default-pool-ded394fa-0kpw
neo4j-core-1   1/1       Running   0          6m        10.32.3.7    gke-neo4j-cluster-default-pool-ded394fa-0kpw
neo4j-core-2   1/1       Running   0          2m        10.32.0.10   gke-neo4j-cluster-default-pool-ded394fa-kp68

Easy!

Written by Mark Needham

June 14th, 2017 at 8:49 am

Posted in Kubernetes

Tagged with

Kubernetes: Simulating a network partition

without comments

A couple of weeks ago I wrote a post explaining how to create a Neo4j causal cluster using Kubernetes and … the I wanted to work out how to simulate a network partition which would put the leader on the minority side and force an election.

We’ve done this on our internal tooling on AWS using the iptables command but unfortunately that isn’t available in my container, which only has the utilities provided by BusyBox.

Luckily one of these is route command which will allow us to achieve the same thing.

To recap, I have 3 Neo4j pods up and running:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
neo4j-0   1/1       Running   0          6h
neo4j-1   1/1       Running   0          6h
neo4j-2   1/1       Running   0          6h

And we can check that the route command is available:

$ kubectl exec neo4j-0 -- ls -alh /sbin/route 
lrwxrwxrwx    1 root     root          12 Oct 18 18:58 /sbin/route -> /bin/busybox

Let’s have a look what role each server is currently playing:

$ kubectl exec neo4j-0 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"
 
Bye!
$ kubectl exec neo4j-1 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"
 
Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"LEADER"
 
Bye!

Slight aside: I’m able to call cypher-shell without a user and password because I’ve disable authorisation by putting the following in conf/neo4j.conf:

dbms.connector.bolt.enabled=true

Back to the network partitioning…we need to partition away neo4j-2 from the other two servers which we can do by running the following commands:

$ kubectl exec neo4j-2 -- route add -host neo4j-0.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-2 -- route add -host neo4j-1.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-0 -- route add -host neo4j-2.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-1 -- route add -host neo4j-2.neo4j.default.svc.cluster.local reject

If we look at the logs of neo4j-2 we can see that it’s stepped down after being disconnected from the other two servers:

$ kubectl exec neo4j-2 -- cat logs/debug.log
...
2016-12-04 11:30:10.186+0000 INFO  [o.n.c.c.c.RaftMachine] Moving to FOLLOWER state after not receiving heartbeat responses in this election timeout period. Heartbeats received: []
...

Who’s taken over as leader?

$ kubectl exec neo4j-0 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"LEADER"
 
Bye!
$ kubectl exec neo4j-1 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"
 
Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"
 
Bye!

Looks like neo4j-0! Let’s put some data into the database:

$ kubectl exec neo4j-0 -- bin/cypher-shell "CREATE (:Person {name: 'Mark'})"
Added 1 nodes, Set 1 properties, Added 1 labels
 
Bye!

Let’s check if that node made it to the other two servers. We’d expect it to be on neo4j-1 but not on neo4j-2:

$ kubectl exec neo4j-1 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
p
(:Person {name: "Mark"})
 
Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
 
 
Bye!

On neo4j-2 we’ll repeatedly see these types of entries in the log as its election timeout triggers but fails to get any responses to the vote requests it sends out:

$ kubectl exec neo4j-2 -- cat logs/debug.log
...
2016-12-04 11:32:56.735+0000 INFO  [o.n.c.c.c.RaftMachine] Election timeout triggered
2016-12-04 11:32:56.736+0000 INFO  [o.n.c.c.c.RaftMachine] Election started with vote request: Vote.Request from MemberId{ca9b954c} {term=11521, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467} and members: [MemberId{484178c4}, MemberId{0acdb8dd}, MemberId{ca9b954c}]
...

We can see those vote requests by looking at the raft-messages.log which can be enabled by setting the following property in conf/neo4j.conf:

causal_clustering.raft_messages_log_enable=true
$ kubectl exec neo4j-2 -- cat logs/raft-messages.log
...
11:33:42.101 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11537, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:42.102 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11537, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
 
11:33:45.432 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11538, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:45.433 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11538, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
 
11:33:48.362 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11539, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:48.362 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11539, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
...

To ‘heal’ the network partition we just need to delete all the commands we ran earlier:

$ kubectl exec neo4j-2 -- route delete neo4j-0.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-2 -- route delete neo4j-1.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-0 -- route delete neo4j-2.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-1 -- route delete neo4j-2.neo4j.default.svc.cluster.local reject

Now let’s check that neo4j-2 now has the node that we created earlier:

$ kubectl exec neo4j-2 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
p
(:Person {name: "Mark"})
 
Bye!

That’s all for now!

Written by Mark Needham

December 4th, 2016 at 12:37 pm

Posted in Kubernetes,neo4j

Tagged with ,

Kubernetes: Spinning up a Neo4j 3.1 Causal Cluster

with 2 comments

A couple of weeks ago I wrote a blog post explaining how I’d created a Neo4j causal cluster using docker containers directly and for my next pet project I wanted to use Kubernetes as an orchestration layer so that I could declaratively change the number of servers in my cluster.

I’d never used Kubernetes before but I saw a presentation showing how to use it to create an Elastic cluster at the GDG Cloud meetup a couple of months ago.

In that presentation I was introduced to the idea of a PetSet which is an abstraction exposed by Kubernetes which allows us to manage a set of pods (containers) which have a fixed identity. The documentation explains it better:

A PetSet ensures that a specified number of “pets” with unique identities are running at any given time. The identity of a Pet is comprised of:

  • a stable hostname, available in DNS
  • an ordinal index
  • stable storage: linked to the ordinal & hostname

In my case I need to have a stable hostname because each member of a Neo4j cluster is given a list of other cluster members with which it can create a new cluster or join an already existing one. This is the first use case described in the documentation:

PetSet also helps with the 2 most common problems encountered managing such clustered applications:

  • discovery of peers for quorum
  • startup/teardown ordering

So the first thing we need to do is create some stable storage for our pods to use.

We’ll create a cluster of 3 members so we need to create one PersistentVolume for each of them. The following script does the job:

volumes.sh

for i in $(seq 0 2); do
  cat <<EOF | kubectl create -f -
kind: PersistentVolume
apiVersion: v1
metadata:
  name: pv${i}
  labels:
    type: local
    app: neo4j
spec:
  capacity:
    storage: 1Gi
  accessModes:
    - ReadWriteOnce
  hostPath:
    path: "/tmp/${i}"
EOF
 
  cat <<EOF | kubectl create -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: datadir-neo4j-${i}
  labels:
    app: neo4j
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
done;

If we run this script it’ll create 3 volumes which we can see by running the following command:

$ kubectl get pv
NAME      CAPACITY   ACCESSMODES   STATUS    CLAIM                     REASON    AGE
pv0       1Gi        RWO           Bound     default/datadir-neo4j-0             7s
pv1       1Gi        RWO           Bound     default/datadir-neo4j-1             7s
pv2       1Gi        RWO           Bound     default/datadir-neo4j-2             7s
$ kubectl get pvc
NAME              STATUS    VOLUME    CAPACITY   ACCESSMODES   AGE
datadir-neo4j-0   Bound     pv0       1Gi        RWO           26s
datadir-neo4j-1   Bound     pv1       1Gi        RWO           26s
datadir-neo4j-2   Bound     pv2       1Gi        RWO           25s

Next we need to create a PetSet template. After a lot of iterations I ended up with the following:

# Headless service to provide DNS lookup
apiVersion: v1
kind: Service
metadata:
  labels:
    app: neo4j
  name: neo4j
spec:
  clusterIP: None
  ports:
    - port: 7474
  selector:
    app: neo4j
----
# new API name
apiVersion: "apps/v1alpha1"
kind: PetSet
metadata:
  name: neo4j
spec:
  serviceName: neo4j
  replicas: 3
  template:
    metadata:
      annotations:
        pod.alpha.kubernetes.io/initialized: "true"
        pod.beta.kubernetes.io/init-containers: '[
            {
                "name": "install",
                "image": "gcr.io/google_containers/busybox:1.24",
                "command": ["/bin/sh", "-c", "echo \"
                unsupported.dbms.edition=enterprise\n
                dbms.mode=CORE\n
                dbms.connectors.default_advertised_address=$HOSTNAME.neo4j.default.svc.cluster.local\n
                dbms.connectors.default_listen_address=0.0.0.0\n
                dbms.connector.bolt.type=BOLT\n
                dbms.connector.bolt.enabled=true\n
                dbms.connector.bolt.listen_address=0.0.0.0:7687\n
                dbms.connector.http.type=HTTP\n
                dbms.connector.http.enabled=true\n
                dbms.connector.http.listen_address=0.0.0.0:7474\n
                causal_clustering.raft_messages_log_enable=true\n
                causal_clustering.initial_discovery_members=neo4j-0.neo4j.default.svc.cluster.local:5000,neo4j-1.neo4j.default.svc.cluster.local:5000,neo4j-2.neo4j.default.svc.cluster.local:5000\n
                causal_clustering.leader_election_timeout=2s\n
                  \" > /work-dir/neo4j.conf" ],
                "volumeMounts": [
                    {
                        "name": "confdir",
                        "mountPath": "/work-dir"
                    }
                ]
            }
        ]'
      labels:
        app: neo4j
    spec:
      containers:
      - name: neo4j
        image: "neo4j/neo4j-experimental:3.1.0-M13-beta3-enterprise"
        imagePullPolicy: Always
        ports:
        - containerPort: 5000
          name: discovery
        - containerPort: 6000
          name: tx
        - containerPort: 7000
          name: raft
        - containerPort: 7474
          name: browser
        - containerPort: 7687
          name: bolt
        securityContext:
          privileged: true
        volumeMounts:
        - name: datadir
          mountPath: /data
        - name: confdir
          mountPath: /conf
      volumes:
      - name: confdir
  volumeClaimTemplates:
  - metadata:
      name: datadir
      annotations:
        volume.alpha.kubernetes.io/storage-class: anything
    spec:
      accessModes: [ "ReadWriteOnce" ]
      resources:
        requests:
          storage: 1Gi

The main thing I had trouble with was getting the members of the cluster to talk to each other. The default docker config uses hostnames but I found that pods were unable to contact each other unless I specified the FQDN in the config file. We can run the following command to create the PetSet:

$ kubectl create -f neo4j.yaml 
service "neo4j" created
petset "neo4j" created

We can check if the pods are up and running by executing the following command:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
neo4j-0   1/1       Running   0          2m
neo4j-1   1/1       Running   0          14s
neo4j-2   1/1       Running   0          10s

And we can tail neo4j’s log files like this:

$ kubectl logs neo4j-0
Starting Neo4j.
2016-11-25 16:39:50.333+0000 INFO  Starting...
2016-11-25 16:39:51.723+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:51.733+0000 INFO  Initiating metrics...
2016-11-25 16:39:51.911+0000 INFO  Waiting for other members to join cluster before continuing...
2016-11-25 16:40:12.074+0000 INFO  Started.
2016-11-25 16:40:12.428+0000 INFO  Mounted REST API at: /db/manage
2016-11-25 16:40:13.350+0000 INFO  Remote interface available at http://neo4j-0.neo4j.default.svc.cluster.local:7474/
$ kubectl logs neo4j-1
Starting Neo4j.
2016-11-25 16:39:53.846+0000 INFO  Starting...
2016-11-25 16:39:56.212+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:56.225+0000 INFO  Initiating metrics...
2016-11-25 16:39:56.341+0000 INFO  Waiting for other members to join cluster before continuing...
2016-11-25 16:40:16.623+0000 INFO  Started.
2016-11-25 16:40:16.951+0000 INFO  Mounted REST API at: /db/manage
2016-11-25 16:40:17.607+0000 INFO  Remote interface available at http://neo4j-1.neo4j.default.svc.cluster.local:7474/
$ kubectl logs neo4j-2
Starting Neo4j.
2016-11-25 16:39:57.828+0000 INFO  Starting...
2016-11-25 16:39:59.166+0000 INFO  Bolt enabled on 0.0.0.0:7687.
2016-11-25 16:39:59.176+0000 INFO  Initiating metrics...
2016-11-25 16:39:59.329+0000 INFO  Waiting for other members to join cluster before continuing...
2016-11-25 16:40:19.216+0000 INFO  Started.
2016-11-25 16:40:19.675+0000 INFO  Mounted REST API at: /db/manage
2016-11-25 16:40:21.029+0000 INFO  Remote interface available at http://neo4j-2.neo4j.default.svc.cluster.local:7474/

I wanted to log into the servers from my host machine’s browser so I setup port forwarding for each of the servers:

$ kubectl port-forward neo4j-0 7474:7474 7687:7687

We can then get an overview of the cluster by running the following procedure:

CALL dbms.cluster.overview()
 
╒════════════════════════════════════╤═════════════════════════════════════════════════════╤════════╕
│id                                  │addresses                                            │role    │
╞════════════════════════════════════╪═════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687,│LEADER  │
│                                    │ http://neo4j-0.neo4j.default.svc.cluster.local:7474]│        │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│                                    │ http://neo4j-1.neo4j.default.svc.cluster.local:7474]│        │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│                                    │ http://neo4j-2.neo4j.default.svc.cluster.local:7474]│        │
└────────────────────────────────────┴─────────────────────────────────────────────────────┴────────┘

So far so good. What if we want to have 5 servers in the cluster instead of 3? We can run the following command to increase our replica size:

$ kubectl patch petset neo4j -p '{"spec":{"replicas":5}}'
"neo4j" patched

Let’s run that procedure again:

CALL dbms.cluster.overview()
 
╒════════════════════════════════════╤═════════════════════════════════════════════════════╤════════╕
│id                                  │addresses                                            │role    │
╞════════════════════════════════════╪═════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687,│LEADER  │
│                                    │ http://neo4j-0.neo4j.default.svc.cluster.local:7474]│        │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│                                    │ http://neo4j-1.neo4j.default.svc.cluster.local:7474]│        │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│                                    │ http://neo4j-2.neo4j.default.svc.cluster.local:7474]│        │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│28613d06-d4c5-461c-b277-ddb3f05e5647│[bolt://neo4j-3.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│                                    │ http://neo4j-3.neo4j.default.svc.cluster.local:7474]│        │
├────────────────────────────────────┼─────────────────────────────────────────────────────┼────────┤
│2eaa0058-a4f3-4f07-9f22-d310562ad1ec│[bolt://neo4j-4.neo4j.default.svc.cluster.local:7687,│FOLLOWER│
│                                    │ http://neo4j-4.neo4j.default.svc.cluster.local:7474]│        │
└────────────────────────────────────┴─────────────────────────────────────────────────────┴────────┘

Neat! And it’s as easy to go back down to 3 again:

$ kubectl patch petset neo4j -p '{"spec":{"replicas":3}}'
"neo4j" patched
CALL dbms.cluster.overview()
 
╒════════════════════════════════════╤══════════════════════════════════════════════════════╤════════╕
│id                                  │addresses                                             │role    │
╞════════════════════════════════════╪══════════════════════════════════════════════════════╪════════╡
│81d8e5e2-02db-4414-85de-a7025b346e84│[bolt://neo4j-0.neo4j.default.svc.cluster.local:7687, │LEADER  │
│                                    │http://neo4j-0.neo4j.default.svc.cluster.local:7474]  │        │
├────────────────────────────────────┼──────────────────────────────────────────────────────┼────────┤
│347b7517-7ca0-4b92-b9f0-9249d46b2ad3│[bolt://neo4j-1.neo4j.default.svc.cluster.local:7687, │FOLLOWER│
│                                    │http://neo4j-1.neo4j.default.svc.cluster.local:7474]  │        │
├────────────────────────────────────┼──────────────────────────────────────────────────────┼────────┤
│a5ec1335-91ce-4358-910b-8af9086c2969│[bolt://neo4j-2.neo4j.default.svc.cluster.local:7687, │FOLLOWER│
│                                    │http://neo4j-2.neo4j.default.svc.cluster.local:7474]  │        │
└────────────────────────────────────┴──────────────────────────────────────────────────────┴────────┘

Next I need to look at how we can add read replicas into the cluster. These don’t take part in the membership/quorum algorithm so I think I’ll be able to use the more common ReplicationController/Pod architecture for those.

If you want to play around with this the code is available as a gist. I’m using the minikube library for all my experiments but I’ll hopefully get around to trying this on GCE or AWS soon.

Written by Mark Needham

November 25th, 2016 at 4:55 pm

Posted in neo4j

Tagged with ,

Kubernetes: Writing hostname to a file

without comments

Over the weekend I spent a bit of time playing around with Kubernetes and to get the hang of the technology I set myself the task of writing the hostname of the machine to a file.

I’m using the excellent minikube tool to create a local Kubernetes cluster for my experiments so the first step is to spin that up:

$ minikube start
Starting local Kubernetes cluster...
Kubectl is now configured to use the cluster.

The first thing I needed to work out how to get the hostname. I figured there was probably an environment variable that I could access. We can call the env command to see a list of all the environment variables in a container so I created a pod template that would show me that information:

hostname_super_simple.yaml

apiVersion: v1
kind: Pod
metadata:
  name: mark-super-simple-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox:1.24
      command: [ "/bin/sh", "-c", "env" ]      
  dnsPolicy: Default
  restartPolicy: Never

I then created a pod from that template and checked the logs of that pod:

$ kubectl create -f hostname_super_simple.yaml 
pod "mark-super-simple-test-pod" created
$ kubectl logs  mark-super-simple-test-pod
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.0.0.1:443
HOSTNAME=mark-super-simple-test-pod
SHLVL=1
HOME=/root
KUBERNETES_PORT_443_TCP_ADDR=10.0.0.1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.0.0.1:443
PWD=/
KUBERNETES_SERVICE_HOST=10.0.0.1

The information we need is in $HOSTNAME so the next thing we need to do is created a pod template which puts that into a file.

hostname_simple.yaml

apiVersion: v1
kind: Pod
metadata:
  name: mark-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox:1.24
      command: [ "/bin/sh", "-c", "echo $HOSTNAME > /tmp/bar; cat /tmp/bar" ]
  dnsPolicy: Default
  restartPolicy: Never

We can create a pod using this template by running the following command:

$ kubectl create -f hostname_simple.yaml
pod "mark-test-pod" created

Now let’s check the logs of the instance to see whether our script worked:

$ kubectl logs mark-test-pod
mark-test-pod

Indeed it did, good times!

Written by Mark Needham

November 22nd, 2016 at 7:56 pm

Posted in Software Development

Tagged with