Mark Needham

Thoughts on Software Development

Archive for the ‘Kubernetes’ Category

Kubernetes: Copy a dataset to a StatefulSet’s PersistentVolume

In this post we’ll learn how to copy an existing dataset to the PersistentVolumes used by a Neo4j cluster running on Kubernetes.

Neo4j Clusters on Kubernetes

This post assumes that we’re familiar with deploying Neo4j on Kubernetes. I wrote an article on the Neo4j blog explaining this in more detail.

The StatefulSet we create for our core servers requires persistent storage, achieved via the PersistentVolumeClaim (PVC) primitive. A Neo4j cluster containing 3 core servers would have the following PVCs:

$ kubectl get pvc
NAME                            STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS   AGE
datadir-neo-helm-neo4j-core-0   Bound     pvc-043efa91-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            standard       45s
datadir-neo-helm-neo4j-core-1   Bound     pvc-1737755a-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            standard       13s
datadir-neo-helm-neo4j-core-2   Bound     pvc-18696bfd-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            standard       11s

Each of the PVCs has a corresponding PersistentVolume (PV) that satisfies it:

$ kubectl get pv
NAME                                       CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS    CLAIM                                   STORAGECLASS   REASON    AGE
pvc-043efa91-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            Delete           Bound     default/datadir-neo-helm-neo4j-core-0   standard                 41m
pvc-1737755a-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            Delete           Bound     default/datadir-neo-helm-neo4j-core-1   standard                 40m
pvc-18696bfd-cc54-11e7-bfa5-080027ab9eac   10Gi       RWO            Delete           Bound     default/datadir-neo-helm-neo4j-core-2   standard                 40m

The PVCs and PVs are usually created at the same time that we deploy our StatefulSet. We need to intervene in that lifecycle so that our dataset is already in place before the StatefulSet is deployed.

Deploying an existing dataset

We can do this by following these steps:

  • Create PVCs with the above names manually
  • Attach pods to those PVCs
  • Copy our dataset onto those pods
  • Delete the pods
  • Deploy our Neo4j cluster

We can use the following script to create the PVCs and pods:

pvs.sh

#!/usr/bin/env bash
 
set -exuo pipefail
 
for i in $(seq 0 2); do
  cat <<EOF | kubectl apply -f -
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: datadir-neo-helm-neo4j-core-${i}
  labels:
    app: neo4j
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
EOF
 
  cat <<EOF | kubectl apply -f -
kind: Pod
apiVersion: v1
metadata:
  name: neo4j-load-data-${i}
  labels:
    app: neo4j-loader
spec:
  volumes:
    - name: datadir-neo4j-core-${i}
      persistentVolumeClaim:
        claimName: datadir-neo-helm-neo4j-core-${i}
  containers:
    - name: neo4j-load-data-${i}
      image: ubuntu
      volumeMounts:
      - name: datadir-neo4j-core-${i}
        mountPath: /data
      command: ["/bin/bash", "-ecx", "while :; do printf '.'; sleep 5 ; done"]
EOF
 
done;

Let’s run that script to create our PVCs and pods:

$ ./pvs.sh 
++ seq 0 2
+ for i in $(seq 0 2)
+ cat
+ kubectl apply -f -
persistentvolumeclaim "datadir-neo-helm-neo4j-core-0" configured
+ cat
+ kubectl apply -f -
pod "neo4j-load-data-0" configured
+ for i in $(seq 0 2)
+ cat
+ kubectl apply -f -
persistentvolumeclaim "datadir-neo-helm-neo4j-core-1" configured
+ cat
+ kubectl apply -f -
pod "neo4j-load-data-1" configured
+ for i in $(seq 0 2)
+ cat
+ kubectl apply -f -
persistentvolumeclaim "datadir-neo-helm-neo4j-core-2" configured
+ cat
+ kubectl apply -f -
pod "neo4j-load-data-2" configured

Now we can copy our database onto the pods:

for i in $(seq 0 2); do
  kubectl cp graph.db.tar.gz neo4j-load-data-${i}:/data/
  kubectl exec neo4j-load-data-${i} -- bash -c "mkdir -p /data/databases && tar -xf /data/graph.db.tar.gz -C /data/databases"
done

graph.db.tar.gz contains a backup from a local database I created:

$ tar -tvf graph.db.tar.gz 
 
drwxr-xr-x  0 markneedham staff       0 24 Jul 15:23 graph.db/
drwxr-xr-x  0 markneedham staff       0 24 Jul 15:23 graph.db/certificates/
drwxr-xr-x  0 markneedham staff       0 17 Feb  2017 graph.db/index/
drwxr-xr-x  0 markneedham staff       0 24 Jul 15:22 graph.db/logs/
-rw-r--r--  0 markneedham staff    8192 24 Jul 15:23 graph.db/neostore
-rw-r--r--  0 markneedham staff     896 24 Jul 15:23 graph.db/neostore.counts.db.a
-rw-r--r--  0 markneedham staff    1344 24 Jul 15:23 graph.db/neostore.counts.db.b
-rw-r--r--  0 markneedham staff       9 24 Jul 15:23 graph.db/neostore.id
-rw-r--r--  0 markneedham staff   65536 24 Jul 15:23 graph.db/neostore.labelscanstore.db
...
-rw-------  0 markneedham staff     1700 24 Jul 15:23 graph.db/certificates/neo4j.key

We’ll run the following commands to check the databases are in place:

$ kubectl exec neo4j-load-data-0 -- ls -lh /data/databases/
total 4.0K
drwxr-xr-x 6 501 staff 4.0K Jul 24 14:23 graph.db
 
$ kubectl exec neo4j-load-data-1 -- ls -lh /data/databases/
total 4.0K
drwxr-xr-x 6 501 staff 4.0K Jul 24 14:23 graph.db
 
$ kubectl exec neo4j-load-data-2 -- ls -lh /data/databases/
total 4.0K
drwxr-xr-x 6 501 staff 4.0K Jul 24 14:23 graph.db

All good so far. The pods have done their job so we’ll tear those down:

$ kubectl delete pods -l app=neo4j-loader
pod "neo4j-load-data-0" deleted
pod "neo4j-load-data-1" deleted
pod "neo4j-load-data-2" deleted

We’re now ready to deploy our Neo4j cluster.

helm install incubator/neo4j --name neo-helm --wait --set authEnabled=false

Finally we’ll run a Cypher query to check that the Neo4j servers used the database that we uploaded:

$ kubectl exec neo-helm-neo4j-core-0 -- bin/cypher-shell "match (n) return count(*)"
count(*)
32314
 
$ kubectl exec neo-helm-neo4j-core-1 -- bin/cypher-shell "match (n) return count(*)"
count(*)
32314
 
$ kubectl exec neo-helm-neo4j-core-2 -- bin/cypher-shell "match (n) return count(*)"
count(*)
32314

Success!

We could achieve similar results by using an init container but I haven’t had a chance to try out that approach yet. If you give it a try let me know in the comments and I’ll add it to the post.
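
For reference, here’s a rough, untested sketch of what that might look like: an initContainers entry added to the StatefulSet’s pod template which unpacks the dataset into the data volume before Neo4j starts. The volume name and the archive URL are assumptions rather than part of the Helm chart:

  initContainers:
    - name: load-dataset
      image: busybox
      volumeMounts:
        - name: datadir
          mountPath: /data
      # hypothetical: only load the dataset on first start, while the volume is still empty
      command: ["/bin/sh", "-c", "if [ ! -d /data/databases/graph.db ]; then mkdir -p /data/databases && wget -q -O - http://example.com/graph.db.tar.gz | tar -xz -C /data/databases; fi"]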

Written by Mark Needham

November 18th, 2017 at 12:44 pm

Posted in Kubernetes,neo4j

Kubernetes 1.8: Using Cronjobs to take Neo4j backups

With the release of Kubernetes 1.8, Cronjobs have graduated to beta, which means we can now more easily run Neo4j backup jobs against Kubernetes clusters.

Before we learn how to write a Cronjob let’s first create a local Kubernetes cluster and deploy Neo4j.

Spin up Kubernetes & Helm

minikube start --memory 8192
helm init && kubectl rollout status -w deployment/tiller-deploy --namespace=kube-system
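
Since Cronjobs only graduated to beta in 1.8, it’s worth a quick sanity check that the batch/v1beta1 API is available before going any further. We should see it listed in the output of this command:

kubectl api-versions | grep batch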

Deploy a Neo4j cluster

helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/
helm install incubator/neo4j --name neo-helm --wait --set authEnabled=false,core.extraVars.NEO4J_dbms_backup_address=0.0.0.0:6362

Populate our cluster with data

We can run the following command to check our cluster is ready:

kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "CALL dbms.cluster.overview() YIELD id, role RETURN id, role"
 
+-----------------------------------------------------+
| id                                     | role       |
+-----------------------------------------------------+
| "0b3bfe6c-6a68-4af5-9dd2-e96b564df6e5" | "LEADER"   |
| "09e9bee8-bdc5-4e95-926c-16ea8213e6e7" | "FOLLOWER" |
| "859b9b56-9bfc-42ae-90c3-02cedacfe720" | "FOLLOWER" |
+-----------------------------------------------------+

Now let’s create some data:

kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "UNWIND range(0,1000) AS id CREATE (:Person {id: id})"
 
0 rows available after 653 ms, consumed after another 0 ms
Added 1001 nodes, Set 1001 properties, Added 1001 labels

Now that our Neo4j cluster is running and contains data, we want to take regular backups.

Neo4j backups

The Neo4j backup tool supports full and incremental backups.

Full backup

A full backup streams a copy of the store files to the backup location, along with any transactions that happened during the backup. Those transactions are then applied to the backed-up copy of the database.

Incremental backup

An incremental backup is triggered if there is already a Neo4j database in the backup location. In this case there will be no copying of store files. Instead the tool will copy any new transactions from Neo4j and apply them to the backup.
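
Before wrapping the backup tool in a Cronjob in the next section, we can run a one-off backup by hand to check that the backup port is reachable. This is just a sketch: the pod name is arbitrary, and the backup written to /tmp disappears along with the throwaway pod, so it’s only useful as a sanity check:

kubectl run neo4j-backup-test --rm -it --restart=Never \
  --image=neo4j:3.3.0-enterprise \
  --env=NEO4J_ACCEPT_LICENSE_AGREEMENT=yes \
  --command -- bin/neo4j-admin backup --backup-dir /tmp --name backup \
  --from neo-helm-neo4j-core-2.neo-helm-neo4j.default.svc.cluster.local:6362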

Backup Cronjob

We will use a Cronjob to execute a backup. In the example below we attach a PersistentVolumeClaim to our Cronjob so that we can see both of the backup scenarios in action.

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: backupdir-neo4j
  labels:
    app: neo4j-backup
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
---
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: neo4j-backup
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          volumes:
            - name: backupdir-neo4j
              persistentVolumeClaim:
                claimName: backupdir-neo4j
          containers:
            - name: neo4j-backup
              image: neo4j:3.3.0-enterprise
              env:
                - name: NEO4J_ACCEPT_LICENSE_AGREEMENT
                  value: "yes"
              volumeMounts:
              - name: backupdir-neo4j
                mountPath: /tmp
              command: ["bin/neo4j-admin",  "backup", "--backup-dir", "/tmp", "--name", "backup", "--from", "neo-helm-neo4j-core-2.neo-helm-neo4j.default.svc.cluster.local:6362"]
          restartPolicy: OnFailure

$ kubectl apply -f cronjob.yaml 
cronjob "neo4j-backup" created
kubectl get jobs -w
NAME                      DESIRED   SUCCESSFUL   AGE
neo4j-backup-1510940940   1         1            34s

Now let’s view the logs from the job:

kubectl logs $(kubectl get pods -a --selector=job-name=neo4j-backup-1510940940 --output=jsonpath={.items..metadata.name})
 
Doing full backup...
2017-11-17 17:49:05.920+0000 INFO [o.n.c.s.StoreCopyClient] Copying neostore.nodestore.db.labels
...
2017-11-17 17:49:06.038+0000 INFO [o.n.c.s.StoreCopyClient] Copied neostore.labelscanstore.db 48.00 kB
2017-11-17 17:49:06.038+0000 INFO [o.n.c.s.StoreCopyClient] Done, copied 18 files
2017-11-17 17:49:06.094+0000 INFO [o.n.b.BackupService] Start recovering store
2017-11-17 17:49:07.669+0000 INFO [o.n.b.BackupService] Finish recovering store
Doing consistency check...
2017-11-17 17:49:07.716+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /tmp/backup
2017-11-17 17:49:07.716+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2017-11-17 17:49:07.755+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
....................  10%
...
.................... 100%
Backup complete.

All good so far. Now let’s create more data:

kubectl exec neo-helm-neo4j-core-0 \
  -- bin/cypher-shell --format verbose \
  "UNWIND range(0,1000) AS id CREATE (:Person {id: id})"
 
 
0 rows available after 114 ms, consumed after another 0 ms
Added 1001 nodes, Set 1001 properties, Added 1001 labels

And wait for another backup job to run:

kubectl get jobs -w
NAME                      DESIRED   SUCCESSFUL   AGE
neo4j-backup-1510941180   1         1            2m
neo4j-backup-1510941240   1         1            1m
neo4j-backup-1510941300   1         1            17s
kubectl logs $(kubectl get pods -a --selector=job-name=neo4j-backup-1510941300 --output=jsonpath={.items..metadata.name})
 
Destination is not empty, doing incremental backup...
Doing consistency check...
2017-11-17 17:55:07.958+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Selected RecordFormat:StandardV3_2[v0.A.8] record format from store /tmp/backup
2017-11-17 17:55:07.959+0000 INFO [o.n.k.i.s.f.RecordFormatSelector] Format not configured. Selected format from the store: RecordFormat:StandardV3_2[v0.A.8]
2017-11-17 17:55:07.995+0000 INFO [o.n.m.MetricsExtension] Initiating metrics...
....................  10%
...
.................... 100%
Backup complete.

If we were to extend this job further we could have it copy each backup it takes to an S3 bucket but I’ll leave that for another day.
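
For what it’s worth, here’s a sketch of how the container’s command could be extended to do that. The bucket name is made up, the stock neo4j image doesn’t include the AWS CLI so we’d need an image (or a second container) that has it, and the AWS credentials would have to be supplied via a Kubernetes secret:

              command: ["/bin/bash", "-c", "bin/neo4j-admin backup --backup-dir /tmp --name backup --from neo-helm-neo4j-core-2.neo-helm-neo4j.default.svc.cluster.local:6362 && tar -czf /tmp/backup.tar.gz -C /tmp backup && aws s3 cp /tmp/backup.tar.gz s3://my-neo4j-backups/backup-$(date +%s).tar.gz"]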

Written by Mark Needham

November 17th, 2017 at 6:10 pm

Posted in Kubernetes,neo4j

Kubernetes: Simple example of pod running

I recently needed to create a Kubernetes pod that would ‘just sit there’ while I used kubectl cp to copy some files to a persistent volume to which it was bound.

I started out with this naive pod spec:

pod_no_while.yaml

kind: Pod
apiVersion: v1
metadata:
  name: marks-dummy-pod
spec:
  containers:
    - name: marks-dummy-pod
      image: ubuntu
  restartPolicy: Never

Let’s apply that template:

$ kubectl apply -f pod_no_while.yaml 
pod "marks-dummy-pod" created

And let’s check if we have any running pods:

$ kubectl get pods
No resources found, use --show-all to see completed objects.

We won’t see anything here because, unsurprisingly, the pod has run to completion as there’s nothing to keep it running! We can confirm that by running this command:

$ kubectl get pods --show-all
NAME              READY     STATUS      RESTARTS   AGE
marks-dummy-pod   0/1       Completed   0          1m

Now let’s create a pod that has an infinite bash while loop:

pod.yaml

kind: Pod
apiVersion: v1
metadata:
  name: marks-dummy-pod
spec:
  containers:
    - name: marks-dummy-pod
      image: ubuntu
      command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 5 ; done"]
  restartPolicy: Never

Let’s apply that one instead:

$ kubectl apply -f pod.yaml 
The Pod "marks-dummy-pod" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)

Oops, we need to delete it first so let’s do that:

$ kubectl delete pod marks-dummy-pod
pod "marks-dummy-pod" deleted

Attempt #2:

$ kubectl apply -f pod.yaml 
pod "marks-dummy-pod" created

And let’s check up on our pod:

$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
marks-dummy-pod   1/1       Running   0          14s

Looks better already. Let’s check the logs:

$ kubectl logs marks-dummy-pod 
.
.

Great! We can now kubectl cp to our heart’s content and then delete the pod afterwards.
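
For completeness, the copy itself looks something like this, where the archive name and target path are just placeholders:

$ kubectl cp my-dataset.tar.gz marks-dummy-pod:/tmp/
$ kubectl delete pod marks-dummy-pod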

Written by Mark Needham

October 21st, 2017 at 10:06 am

Posted in Kubernetes

Kubernetes: Which node is a pod on?

When running Kubernetes on a cloud provider, rather than locally using minikube, it’s useful to know which node a pod is running on.

The normal command to list pods doesn’t contain this information:

$ kubectl get pod
NAME           READY     STATUS    RESTARTS   AGE       
neo4j-core-0   1/1       Running   0          6m        
neo4j-core-1   1/1       Running   0          6m        
neo4j-core-2   1/1       Running   0          2m

I spent a while searching for a command that would do this, before eventually coming across Ta-Ching Chen’s blog post while looking for something else.

Ta-Ching points out that we just need to add the flag -o wide to our original command to get the information we require:

$ kubectl get pod -o wide
NAME           READY     STATUS    RESTARTS   AGE       IP           NODE
neo4j-core-0   1/1       Running   0          6m        10.32.3.6    gke-neo4j-cluster-default-pool-ded394fa-0kpw
neo4j-core-1   1/1       Running   0          6m        10.32.3.7    gke-neo4j-cluster-default-pool-ded394fa-0kpw
neo4j-core-2   1/1       Running   0          2m        10.32.0.10   gke-neo4j-cluster-default-pool-ded394fa-kp68

Easy!
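
As an aside, if all we want is the pod-to-node mapping, custom-columns does the trick too (the column headings here are our own):

$ kubectl get pod -o custom-columns=NAME:.metadata.name,NODE:.spec.nodeName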

Written by Mark Needham

June 14th, 2017 at 8:49 am

Posted in Kubernetes

Kubernetes: Simulating a network partition

A couple of weeks ago I wrote a post explaining how to create a Neo4j causal cluster using Kubernetes, and the next thing I wanted to work out was how to simulate a network partition which would put the leader on the minority side and force an election.

We’ve done this on our internal tooling on AWS using the iptables command but unfortunately that isn’t available in my container, which only has the utilities provided by BusyBox.

Luckily one of these is the route command, which will allow us to achieve the same thing.

To recap, I have 3 Neo4j pods up and running:

$ kubectl get pods
NAME      READY     STATUS    RESTARTS   AGE
neo4j-0   1/1       Running   0          6h
neo4j-1   1/1       Running   0          6h
neo4j-2   1/1       Running   0          6h

And we can check that the route command is available:

$ kubectl exec neo4j-0 -- ls -alh /sbin/route 
lrwxrwxrwx    1 root     root          12 Oct 18 18:58 /sbin/route -> /bin/busybox

Let’s have a look at what role each server is currently playing:

$ kubectl exec neo4j-0 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"
 
Bye!
$ kubectl exec neo4j-1 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"
 
Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"LEADER"
 
Bye!

Slight aside: I’m able to call cypher-shell without a user and password because I’ve disabled authorisation by putting the following in conf/neo4j.conf:

dbms.security.auth_enabled=false

Back to the network partitioning… we need to partition away neo4j-2 from the other two servers, which we can do by running the following commands:

$ kubectl exec neo4j-2 -- route add -host neo4j-0.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-2 -- route add -host neo4j-1.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-0 -- route add -host neo4j-2.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-1 -- route add -host neo4j-2.neo4j.default.svc.cluster.local reject

If we look at the logs of neo4j-2 we can see that it’s stepped down after being disconnected from the other two servers:

$ kubectl exec neo4j-2 -- cat logs/debug.log
...
2016-12-04 11:30:10.186+0000 INFO  [o.n.c.c.c.RaftMachine] Moving to FOLLOWER state after not receiving heartbeat responses in this election timeout period. Heartbeats received: []
...

Who’s taken over as leader?

$ kubectl exec neo4j-0 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"LEADER"
 
Bye!
$ kubectl exec neo4j-1 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"
 
Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "CALL dbms.cluster.role()"
role
"FOLLOWER"
 
Bye!

Looks like neo4j-0! Let’s put some data into the database:

$ kubectl exec neo4j-0 -- bin/cypher-shell "CREATE (:Person {name: 'Mark'})"
Added 1 nodes, Set 1 properties, Added 1 labels
 
Bye!

Let’s check if that node made it to the other two servers. We’d expect it to be on neo4j-1 but not on neo4j-2:

$ kubectl exec neo4j-1 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
p
(:Person {name: "Mark"})
 
Bye!
$ kubectl exec neo4j-2 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
 
 
Bye!

On neo4j-2 we’ll repeatedly see these types of entries in the log as its election timeout triggers but it fails to get any responses to the vote requests it sends out:

$ kubectl exec neo4j-2 -- cat logs/debug.log
...
2016-12-04 11:32:56.735+0000 INFO  [o.n.c.c.c.RaftMachine] Election timeout triggered
2016-12-04 11:32:56.736+0000 INFO  [o.n.c.c.c.RaftMachine] Election started with vote request: Vote.Request from MemberId{ca9b954c} {term=11521, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467} and members: [MemberId{484178c4}, MemberId{0acdb8dd}, MemberId{ca9b954c}]
...

We can see those vote requests by looking at the raft-messages.log which can be enabled by setting the following property in conf/neo4j.conf:

causal_clustering.raft_messages_log_enable=true

$ kubectl exec neo4j-2 -- cat logs/raft-messages.log
...
11:33:42.101 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11537, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:42.102 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11537, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
 
11:33:45.432 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11538, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:45.433 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11538, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
 
11:33:48.362 -->MemberId{484178c4}: Request: Vote.Request from MemberId{ca9b954c} {term=11539, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
11:33:48.362 -->MemberId{0acdb8dd}: Request: Vote.Request from MemberId{ca9b954c} {term=11539, candidate=MemberId{ca9b954c}, lastAppended=68, lastLogTerm=11467}
...

To ‘heal’ the network partition we just need to delete the routes that we added earlier:

$ kubectl exec neo4j-2 -- route delete neo4j-0.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-2 -- route delete neo4j-1.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-0 -- route delete neo4j-2.neo4j.default.svc.cluster.local reject && \
  kubectl exec neo4j-1 -- route delete neo4j-2.neo4j.default.svc.cluster.local reject

Now let’s check that neo4j-2 has the node that we created earlier:

$ kubectl exec neo4j-2 -- bin/cypher-shell "MATCH (p:Person) RETURN p"
p
(:Person {name: "Mark"})
 
Bye!

That’s all for now!

Written by Mark Needham

December 4th, 2016 at 12:37 pm

Posted in Kubernetes,neo4j
