Mark Needham

Thoughts on Software Development

Neo4j Browser: Expected entity id to be an integral value

I came across an interesting error while writing a parameterised Cypher query in the Neo4j browser, which I thought I should document for future me.

We’ll start with a graph that has 1,001 people (range is inclusive at both ends):

unwind range(0,1000) AS id
create (:Person {id: id})

Now we’ll try and retrieve some of those people via a parameter lookup:

:param ids: [0]
match (p:Person) where p.id in {ids}
return p
 
╒════════╕
│"p"     │
╞════════╡
│{"id":0}│
└────────┘

All good so far. Now what if we try to look them up by their internal node id instead:

match (p:Person) where id(p) in {ids}
return p
 
Neo.ClientError.Statement.TypeError
Expected entity id to be an integral value

Hmm, that was unexpected. It turns out that to get this query to work we need to cast each of the values in the {ids} array to a 64-bit integer using the toInteger function:

match (p:Person) where id(p) in [id in {ids} | toInteger(id)]
return p
 
╒════════╕
│"p"     │
╞════════╡
│{"id":0}│
└────────┘

Success!
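
One extra note: depending on your browser version, you may also be able to define the parameter with the => syntax, which evaluates the right hand side as Cypher and should give us real 64-bit integers in the first place. I haven’t checked this on every version, so treat it as an alternative to try:

:param ids => [0]
match (p:Person) where id(p) in {ids}
return p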

Written by Mark Needham

November 6th, 2017 at 4:17 pm

Posted in neo4j

Neo4j: Traversal query timeout

I’ve been spending some of my spare time over the last few weeks creating an application that generates running routes from Open Roads data – transformed and imported into Neo4j of course!

I’ve created a user-defined procedure which combines several shortest path queries, but I wanted to exit any of these shortest path searches early if they were taking too long. My code without a timeout looks like this:

StandardExpander orderedExpander = new OrderedByTypeExpander()
    .add( RelationshipType.withName( "CONNECTS" ), Direction.BOTH );
 
PathFinder<Path> shortestPathFinder = GraphAlgoFactory.shortestPath( orderedExpander, 250 );
 
...

There are several places where we could check the time elapsed, but the expand method in the expander seemed like an obvious one to me. I wrote my own expander class, which looks like this:

public class TimeConstrainedExpander implements PathExpander
{
    private final StandardExpander expander;
    private final long startTime;
    private final Clock clock;
    private final long timeLimitInMillis;
 
    public TimeConstrainedExpander( StandardExpander expander, Clock clock, long timeLimitInMillis )
    {
        this.expander = expander;
        this.clock = clock;
        this.startTime = clock.instant().toEpochMilli();
        this.timeLimitInMillis = timeLimitInMillis;
    }
 
    @Override
    public Iterable<Relationship> expand( Path path, BranchState state )
    {
        // Once the time limit is exceeded we return no relationships, which
        // starves the traversal and makes the search terminate early
        long timeSoFar = clock.instant().toEpochMilli() - startTime;
        if ( timeSoFar > timeLimitInMillis )
        {
            return Collections.emptyList();
        }
 
        return expander.expand( path, state );
    }
 
    @Override
    public PathExpander reverse()
    {
        return expander.reverse();
    }
}

The code snippet from earlier now needs to be updated to use our new class, which isn’t too tricky:

StandardExpander orderedExpander = new OrderedByTypeExpander()
    .add( RelationshipType.withName( "CONNECTS" ), Direction.BOTH );
 
TimeConstrainedExpander expander = new TimeConstrainedExpander( orderedExpander,
    Clock.systemUTC(), 200 );
 
PathFinder<Path> shortestPathFinder = GraphAlgoFactory.shortestPath( expander, 250 );
...

I’m not sure if this is the best way to achieve what I want, but after failing with several other approaches, at least this one actually works!
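
For context, this is roughly how the finder then gets used inside the procedure. Note that startNode and endNode are stand-ins for whatever nodes the procedure has already looked up:

Path path = shortestPathFinder.findSinglePath( startNode, endNode );
 
if ( path == null )
{
    // either no path exists, or the time limit kicked in and the expander
    // starved the traversal of relationships
}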

Written by Mark Needham

October 31st, 2017 at 9:43 pm

Posted in neo4j

Kubernetes: Simple example of pod running

I recently needed to create a Kubernetes pod that would ‘just sit there’ while I used kubectl cp to copy some files to a persistent volume to which it was bound.

I started out with this naive pod spec:

pod_no_while.yaml

kind: Pod
apiVersion: v1
metadata:
  name: marks-dummy-pod
spec:
  containers:
    - name: marks-dummy-pod
      image: ubuntu
  restartPolicy: Never

Let’s apply that template:

$ kubectl apply -f pod_no_while.yaml 
pod "marks-dummy-pod" created

And let’s check if we have any running pods:

$ kubectl get pods
No resources found, use --show-all to see completed objects.

We won’t see anything here because, unsurprisingly, the pod has run to completion: there’s nothing to keep it running! We can confirm that by running this command:

$ kubectl get pods --show-all
NAME              READY     STATUS      RESTARTS   AGE
marks-dummy-pod   0/1       Completed   0          1m

Now let’s create a pod that has an infinite bash while loop:

pod.yaml

kind: Pod
apiVersion: v1
metadata:
  name: marks-dummy-pod
spec:
  containers:
    - name: marks-dummy-pod
      image: ubuntu
      command: ["/bin/bash", "-ec", "while :; do echo '.'; sleep 5 ; done"]
  restartPolicy: Never

Let’s apply that one instead:

$ kubectl apply -f pod.yaml 
The Pod "marks-dummy-pod" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)

Oops, we need to delete it first so let’s do that:

$ kubectl delete pod marks-dummy-pod
pod "marks-dummy-pod" deleted

Attempt #2:

$ kubectl apply -f pod.yaml 
pod "marks-dummy-pod" created

And let’s check up on our pod:

$ kubectl get pods
NAME              READY     STATUS    RESTARTS   AGE
marks-dummy-pod   1/1       Running   0          14s

Looks better already. Let’s check the logs:

$ kubectl logs marks-dummy-pod 
.
.

Great! We can now kubectl cp to our heart’s content and then delete the pod afterwards.
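
For reference, the copy itself looks something like this. The file name and the /data path are placeholders for your own files and wherever the persistent volume is mounted inside the pod:

$ kubectl cp ./dump.cypher marks-dummy-pod:/data/dump.cypher
$ kubectl exec marks-dummy-pod -- ls /data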

Written by Mark Needham

October 21st, 2017 at 10:06 am

Posted in Kubernetes

Neo4j: Cypher – Deleting duplicate nodes

I had a problem on a graph I was working on recently where I’d managed to create duplicate nodes because I hadn’t applied any unique constraints.

I wanted to remove the duplicates, and came across Jimmy Ruts’ excellent post which shows some ways to do this.

Let’s first create a graph with some duplicate nodes to play with:

UNWIND range(0, 100) AS id
CREATE (p1:Person {id: toInteger(rand() * id)})
MERGE (p2:Person {id: toInteger(rand() * id)})
MERGE (p3:Person {id: toInteger(rand() * id)})
MERGE (p4:Person {id: toInteger(rand() * id)})
CREATE (p1)-[:KNOWS]->(p2)
CREATE (p1)-[:KNOWS]->(p3)
CREATE (p1)-[:KNOWS]->(p4)
 
Added 173 labels, created 173 nodes, set 173 properties, created 5829 relationships, completed after 408 ms.

How do we find the duplicate nodes?

MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
 
╒══════════════════════╤═════════════╕
│"ids"                 │"size(nodes)"│
╞══════════════════════╪═════════════╡
│[1,1,1,1,1,1,1,1]     │8            │
├──────────────────────┼─────────────┤
│[0,0,0,0,0,0,0,0]     │8            │
├──────────────────────┼─────────────┤
│[17,17,17,17,17,17,17]│7            │
├──────────────────────┼─────────────┤
│[4,4,4,4,4,4,4]       │7            │
├──────────────────────┼─────────────┤
│[2,2,2,2,2,2]         │6            │
├──────────────────────┼─────────────┤
│[5,5,5,5,5,5]         │6            │
├──────────────────────┼─────────────┤
│[19,19,19,19,19,19]   │6            │
├──────────────────────┼─────────────┤
│[11,11,11,11,11]      │5            │
├──────────────────────┼─────────────┤
│[25,25,25,25,25]      │5            │
├──────────────────────┼─────────────┤
│[43,43,43,43,43]      │5            │
└──────────────────────┴─────────────┘

Let’s zoom in on all the people with id 1 and work out how many relationships they have. Our plan is to keep the node that has the most connections and get rid of the others.

MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
WITH nodes ORDER BY size(nodes) DESC
LIMIT 1
UNWIND nodes AS n 
RETURN n.id, id(n) AS internalId, size((n)--()) AS rels
ORDER BY rels DESC
 
╒══════╤════════════╤══════╕
│"n.id"│"internalId"│"rels"│
╞══════╪════════════╪══════╡
│1     │175         │1284  │
├──────┼────────────┼──────┤
│1     │184         │721   │
├──────┼────────────┼──────┤
│1     │180         │580   │
├──────┼────────────┼──────┤
│1     │2           │391   │
├──────┼────────────┼──────┤
│1     │195         │361   │
├──────┼────────────┼──────┤
│1     │199         │352   │
├──────┼────────────┼──────┤
│1     │302         │5     │
├──────┼────────────┼──────┤
│1     │306         │1     │
└──────┴────────────┴──────┘

So in this example we want to keep the node with internal id 175, which has 1,284 relationships, and delete the rest.

To make things easy we need the node with the most relationships to be first (or last) in our list. We can ensure that’s the case by ordering the nodes before we group them.

MATCH (p:Person)
WITH p 
ORDER BY p.id, size((p)--()) DESC
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
RETURN [ n in nodes | {id: n.id,rels: size((n)--()) } ] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
 
╒══════════════════════════════════════════════════════════════════════╤═════════════╕
│"ids"                                                                 │"size(nodes)"│
╞══════════════════════════════════════════════════════════════════════╪═════════════╡
│[{"id":1,"rels":1284},{"id":1,"rels":721},{"id":1,"rels":580},{"id":1,│8            │
│"rels":391},{"id":1,"rels":361},{"id":1,"rels":352},{"id":1,"rels":5},│             │
│{"id":1,"rels":1}]                                                    │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":0,"rels":2064},{"id":0,"rels":2059},{"id":0,"rels":1297},{"id":│8            │
│0,"rels":1124},{"id":0,"rels":995},{"id":0,"rels":928},{"id":0,"rels":│             │
│730},{"id":0,"rels":702}]                                             │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":17,"rels":153},{"id":17,"rels":105},{"id":17,"rels":81},{"id":1│7            │
│7,"rels":31},{"id":17,"rels":15},{"id":17,"rels":14},{"id":17,"rels":1│             │
│}]                                                                    │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":4,"rels":394},{"id":4,"rels":320},{"id":4,"rels":250},{"id":4,"│7            │
│rels":201},{"id":4,"rels":162},{"id":4,"rels":162},{"id":4,"rels":14}]│             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":2,"rels":514},{"id":2,"rels":329},{"id":2,"rels":318},{"id":2,"│6            │
│rels":241},{"id":2,"rels":240},{"id":2,"rels":2}]                     │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":5,"rels":487},{"id":5,"rels":378},{"id":5,"rels":242},{"id":5,"│6            │
│rels":181},{"id":5,"rels":158},{"id":5,"rels":8}]                     │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":19,"rels":153},{"id":19,"rels":120},{"id":19,"rels":84},{"id":1│6            │
│9,"rels":53},{"id":19,"rels":45},{"id":19,"rels":1}]                  │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":11,"rels":222},{"id":11,"rels":192},{"id":11,"rels":172},{"id":│5            │
│11,"rels":152},{"id":11,"rels":89}]                                   │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":25,"rels":133},{"id":25,"rels":107},{"id":25,"rels":98},{"id":2│5            │
│5,"rels":15},{"id":25,"rels":2}]                                      │             │
├──────────────────────────────────────────────────────────────────────┼─────────────┤
│[{"id":43,"rels":92},{"id":43,"rels":85},{"id":43,"rels":9},{"id":43,"│5            │
│rels":5},{"id":43,"rels":1}]                                          │             │
└──────────────────────────────────────────────────────────────────────┴─────────────┘

Now it’s time to delete the duplicates:

MATCH (p:Person)
WITH p 
ORDER BY p.id, size((p)--()) DESC
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
UNWIND nodes[1..] AS n
DETACH DELETE n
 
Deleted 143 nodes, deleted 13806 relationships, completed after 29 ms.

Now if we run our duplicate query:

MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes 
WHERE size(nodes) >  1
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
 
(no changes, no records)

What about if we remove the WHERE clause?

MATCH (p:Person)
WITH p.id as id, collect(p) AS nodes 
RETURN [ n in nodes | n.id] AS ids, size(nodes)
ORDER BY size(nodes) DESC
LIMIT 10
 
╒═════╤═════════════╕
│"ids"│"size(nodes)"│
╞═════╪═════════════╡
│[23] │1            │
├─────┼─────────────┤
│[86] │1            │
├─────┼─────────────┤
│[77] │1            │
├─────┼─────────────┤
│[59] │1            │
├─────┼─────────────┤
│[50] │1            │
├─────┼─────────────┤
│[32] │1            │
├─────┼─────────────┤
│[41] │1            │
├─────┼─────────────┤
│[53] │1            │
├─────┼─────────────┤
│[44] │1            │
├─────┼─────────────┤
│[8]  │1            │
└─────┴─────────────┘

Hoorah, no more duplicates! Finally, let’s check that we kept the node we expected to keep. We expect it to have an ‘internalId’ of 175:

MATCH (p:Person {id: 1})
RETURN size((p)--()), id(p) AS internalId
 
╒═══════════════╤════════════╕
│"size((p)--())"│"internalId"│
╞═══════════════╪════════════╡
│242            │175         │
└───────────────┴────────────┘

Which it does! There are many fewer relationships than we had before because a lot of those relationships were to duplicate nodes that we’ve now deleted.

If we want to go a step further we could ‘merge’ the duplicate nodes’ relationships onto the nodes that we did keep, but that’s for another post!
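
As a teaser, APOC has a refactoring procedure, apoc.refactor.mergeNodes, that can do exactly this: it merges every node in the list (relationships included) into the first one, which is why the ORDER BY matters here. Something along these lines ought to work, although I haven’t tested it as part of this post, so treat it as a sketch:

MATCH (p:Person)
WITH p
ORDER BY p.id, size((p)--()) DESC
WITH p.id AS id, collect(p) AS nodes
WHERE size(nodes) > 1
CALL apoc.refactor.mergeNodes(nodes) YIELD node
RETURN node.id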

Written by Mark Needham

October 6th, 2017 at 4:13 pm

Posted in neo4j

AWS: Spinning up a Neo4j instance with APOC installed

One of the first things I do after installing Neo4j is install the APOC library, but that’s a bit of a manual process when spinning up a server on AWS, so I wanted to simplify it.

There’s already a Neo4j AMI which installs Neo4j 3.2.0, and my colleague Michael pointed out that we could download APOC into the correct folder by writing a script and sending it as UserData.

I’ve been doing some work in JavaScript over the last two weeks so I thought I’d automate all the steps using the AWS library. You can find the full script on GitHub.

The UserData part of the script is actually very simple:

#!/bin/bash
curl -L https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.2.0.3/apoc-3.2.0.3-all.jar -O
sudo cp apoc-3.2.0.3-all.jar /var/lib/neo4j/plugins/

The rest of the script creates a key pair and a security group, opens that security group up on ports 22 (SSH), 7474 (HTTP), 7473 (HTTPS), and 7687 (Bolt), and launches the server. The instance type is m3.medium, but you can change that to something else if you prefer.
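
If you’re curious how the UserData gets sent without reading the full script, it boils down to something like this. This is a simplified sketch rather than the exact code, and the AMI id is a placeholder; the key detail is that UserData must be base64 encoded:

const AWS = require("aws-sdk");
const ec2 = new AWS.EC2({ region: "us-east-1" });
 
const userData = `#!/bin/bash
curl -L https://github.com/neo4j-contrib/neo4j-apoc-procedures/releases/download/3.2.0.3/apoc-3.2.0.3-all.jar -O
sudo cp apoc-3.2.0.3-all.jar /var/lib/neo4j/plugins/`;
 
ec2.runInstances({
  ImageId: "ami-xxxxxxxx",   // placeholder for the Neo4j 3.2.0 AMI id
  InstanceType: "m3.medium",
  MinCount: 1,
  MaxCount: 1,
  UserData: Buffer.from(userData).toString("base64")
}).promise()
  .then(data => console.log("Instance Id: " + data.Instances[0].InstanceId))
  .catch(console.error);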

We can run it like this:

$ node neo4j-with-apoc.js 
Creating a Neo4j server
Key pair created. Save this to a file - you'll need to use it if you want to ssh into the Neo4j server
-----BEGIN RSA PRIVATE KEY-----
<Private key details>
-----END RSA PRIVATE KEY-----
Created Group Id:<Group Id>
Opened Neo4j ports
Instance Id: <Instance Id>
Your Neo4j server is now ready!
You'll need to login to the server and change the default password:
https://ec2-ip-address.compute-1.amazonaws.com:7473 or http://ec2-ip-address.compute-1.amazonaws.com:7474
User:neo4j, Password:<Instance Id>

We’ll need to wait a few seconds for Neo4j to spin up, but it’ll be accessible at the URI specified.

Once it’s accessible we can log in with the username neo4j and, as the script’s output explains, the instance id as the password. We’ll then be instructed to choose a new password.

We can then run the following query to check that APOC has been installed:

call dbms.procedures() YIELD name
WHERE name starts with "apoc"
RETURN count(*)
 
╒══════════╕
│"count(*)"│
╞══════════╡
│214       │
└──────────┘

Cool, it worked, and we can now use Neo4j and APOC to our heart’s content! If we want to SSH into the server we can do that as well, by first saving the private key printed on the command line to a file and then executing the following commands:

$ cat aws-private-key.pem
-----BEGIN RSA PRIVATE KEY-----
<Private key details>
-----END RSA PRIVATE KEY-----
 
$ chmod 600 aws-private-key.pem
 
$ ssh -i aws-private-key.pem ubuntu@ec2-ip-address.compute-1.amazonaws.com
Welcome to Ubuntu 16.04.2 LTS (GNU/Linux 4.4.0-1013-aws x86_64)
 
 * Documentation:  https://help.ubuntu.com
 * Management:     https://landscape.canonical.com
 * Support:        https://ubuntu.com/advantage
 
  Get cloud support with Ubuntu Advantage Cloud Guest:
    http://www.ubuntu.com/business/services/cloud
 
106 packages can be updated.
1 update is a security update.
 
To run a command as administrator (user "root"), use "sudo <command>".
See "man sudo_root" for details.

You can start/stop neo4j by running the following command:

$ /etc/init.d/neo4j 
Usage: /etc/init.d/neo4j {start|stop|status|restart|force-reload}

The other commands you may be used to finding in the bin folder can be found here:

$ ls -lh /usr/share/neo4j/bin/
total 48K
-rwxr-xr-x 1 neo4j adm   15K May  9 09:22 neo4j
-rwxr-xr-x 1 neo4j adm  5.6K May  9 09:22 neo4j-admin
-rwxr-xr-x 1 root  root  612 May 12 00:03 neo4j-awspasswd
-rwxr-xr-x 1 neo4j adm  5.6K May  9 09:22 neo4j-import
-rwxr-xr-x 1 neo4j adm  5.6K May  9 09:22 neo4j-shell
drwxr-xr-x 2 neo4j adm  4.0K May 11 22:13 tools

Let me know if this is helpful and if you have any suggestions/improvements.

Written by Mark Needham

September 30th, 2017 at 9:23 pm

Posted in neo4j

Serverless: Building a mini producer/consumer data pipeline with AWS SNS

I wanted to create a little data pipeline with Serverless whose main job would be to run once a day, call an API, and load the data returned into a database.

It’s mostly used to pull in recent data from that API, but I also wanted to be able to invoke it manually and specify a date range.

I created the following pair of lambdas that communicate with each other via an SNS topic.

The code

serverless.yml

service: marks-blog

frameworkVersion: ">=1.2.0 <2.0.0"

provider:
  name: aws
  runtime: python3.6
  timeout: 180
  iamRoleStatements:
    - Effect: 'Allow'
      Action:
        - "sns:Publish"
      Resource:
        - ${self:custom.BlogTopic}

custom:
  BlogTopic:
    Fn::Join:
      - ":"
      - - arn
        - aws
        - sns
        - Ref: AWS::Region
        - Ref: AWS::AccountId
        - marks-blog-topic

functions:
  message-consumer:
    name: MessageConsumer
    handler: handler.consumer
    events:
      - sns:
          topicName: marks-blog-topic
          displayName: Topic to process events
  message-producer:
    name: MessageProducer
    handler: handler.producer
    events:
      - schedule: rate(1 day)

handler.py

import boto3
import json
import datetime
from datetime import timezone
 
def producer(event, context):
    sns = boto3.client('sns')
 
    context_parts = context.invoked_function_arn.split(':')
    topic_name = "marks-blog-topic"
    topic_arn = "arn:aws:sns:{region}:{account_id}:{topic}".format(
        region=context_parts[3], account_id=context_parts[4], topic=topic_name)
 
    now = datetime.datetime.now(timezone.utc)
    start_date = (now - datetime.timedelta(days=1)).strftime("%Y-%m-%d")
    end_date = now.strftime("%Y-%m-%d")
 
    params = {"startDate": start_date, "endDate": end_date, "tags": ["neo4j"]}
 
    sns.publish(TopicArn=topic_arn, Message=json.dumps(params))
 
 
def consumer(event, context):
    for record in event["Records"]:
        message = json.loads(record["Sns"]["Message"])
 
        start_date = message["startDate"]
        end_date = message["endDate"]
        tags = message["tags"]
 
        print("start_date: " + start_date)
        print("end_date: " + end_date)
        print("tags: " + str(tags))

Trying it out

We can simulate a message being received locally by executing the following command:

$ serverless invoke local \
    --function message-consumer \
    --data '{"Records":[{"Sns": {"Message":"{\"tags\": [\"neo4j\"], \"startDate\": \"2017-09-25\", \"endDate\": \"2017-09-29\"  }"}}]}'
 
start_date: 2017-09-25
end_date: 2017-09-29
tags: ['neo4j']
null

That seems to work fine. What about if we invoke the message-producer on AWS?

$ serverless invoke --function message-producer
 
null

Did the consumer receive the message?

$ serverless logs --function message-consumer
 
START RequestId: 0ef5be87-a5b1-11e7-a905-f1387e68c65f Version: $LATEST
start_date: 2017-09-29
end_date: 2017-09-30
tags: ['neo4j']
END RequestId: 0ef5be87-a5b1-11e7-a905-f1387e68c65f
REPORT RequestId: 0ef5be87-a5b1-11e7-a905-f1387e68c65f	Duration: 0.46 ms	Billed Duration: 100 ms 	Memory Size: 1024 MB	Max Memory Used: 32 MB

Looks like it! We can also invoke the consumer directly on AWS:

$ serverless invoke \
    --function message-consumer \
    --data '{"Records":[{"Sns": {"Message":"{\"tags\": [\"neo4j\"], \"startDate\": \"2017-09-25\", \"endDate\": \"2017-09-26\"  }"}}]}'
 
null

And now if we check the consumer’s logs we’ll see both messages:

$ serverless logs --function message-consumer
 
START RequestId: 0ef5be87-a5b1-11e7-a905-f1387e68c65f Version: $LATEST
start_date: 2017-09-29
end_date: 2017-09-30
tags: ['neo4j']
END RequestId: 0ef5be87-a5b1-11e7-a905-f1387e68c65f
REPORT RequestId: 0ef5be87-a5b1-11e7-a905-f1387e68c65f	Duration: 0.46 ms	Billed Duration: 100 ms 	Memory Size: 1024 MB	Max Memory Used: 32 MB	
 
START RequestId: 4cb42bc9-a5b1-11e7-affb-99fa6b4dc3ed Version: $LATEST
start_date: 2017-09-25
end_date: 2017-09-26
tags: ['neo4j']
END RequestId: 4cb42bc9-a5b1-11e7-affb-99fa6b4dc3ed
REPORT RequestId: 4cb42bc9-a5b1-11e7-affb-99fa6b4dc3ed	Duration: 16.46 ms	Billed Duration: 100 ms 	Memory Size: 1024 MB	Max Memory Used: 32 MB

Success!

Written by Mark Needham

September 30th, 2017 at 7:51 am

Serverless: S3 – S3BucketPermissions – Action does not apply to any resource(s) in statement

I’ve been playing around with S3 buckets with Serverless, and recently wrote the following code to create an S3 bucket and put a file into that bucket:

const AWS = require("aws-sdk");
 
let regionParams = { 'region': 'us-east-1' }
let s3 = new AWS.S3(regionParams);
 
let s3BucketName = "marks-blog-bucket";
 
let bucketParams = { Bucket: s3BucketName, ACL: "public-read" };
 
let putObjectParams = {
    Body: "<html><body><h1>Hello blog!</h1></body></html>",
    Bucket: s3BucketName,
    Key: "blog.html"
};
 
console.log("Creating bucket: " + s3BucketName);
 
// Wait for the bucket to exist before uploading the file into it
s3.createBucket(bucketParams).promise()
  .then(console.log)
  .then(() => s3.putObject(putObjectParams).promise())
  .then(console.log)
  .catch(console.error);

When I tried to cURL the file I got a 403 Forbidden response:

$ curl --head --silent https://s3.amazonaws.com/marks-blog-bucket/blog.html
HTTP/1.1 403 Forbidden
x-amz-request-id: 512FE36798C0BE4D
x-amz-id-2: O1ELGMJ0jjs11WCrNiVNF2z2ssRgtO4+M4H2QQB5025HjIpC54VId0eKZryYeBYN8Pvb8GsolTQ=
Content-Type: application/xml
Transfer-Encoding: chunked
Date: Fri, 29 Sep 2017 05:42:03 GMT
Server: AmazonS3

I wrote the following code to try and have Serverless create a bucket policy that would make all files in my bucket publicly accessible:

serverless.yml

service: marks-blog

frameworkVersion: ">=1.2.0 <2.0.0"

provider:
  name: aws
  runtime: python3.6
  timeout: 180

resources:
  Resources:
    S3BucketPermissions:
      Type: AWS::S3::BucketPolicy
      Properties:
        Bucket: marks-blog-bucket
        PolicyDocument:
          Statement:
            - Principal: "*"
              Action:
                - s3:GetObject
              Effect: Allow
              Sid: "AddPerm"
              Resource: arn:aws:s3:::marks-blog-bucket
 
...

Let’s try to deploy it:

./node_modules/serverless/bin/serverless deploy
Serverless: Packaging service...
Serverless: Excluding development dependencies...
Serverless: Uploading CloudFormation file to S3...
Serverless: Uploading artifacts...
Serverless: Uploading service .zip file to S3 (1.3 KB)...
Serverless: Validating template...
Serverless: Updating Stack...
Serverless: Checking Stack update progress...
........
Serverless: Operation failed!
 
  Serverless Error ---------------------------------------
 
  An error occurred: S3BucketPermissions - Action does not apply to any resource(s) in statement.

D’oh! That didn’t do what I expected.

I learnt that this message means:

Some services do not let you specify actions for individual resources; instead, any actions that you list in the Action or NotAction element apply to all resources in that service. In these cases, you use the wildcard * in the Resource element.

To fix it we need to use the wildcard * to indicate that the s3:GetObject permission should apply to all objects in the bucket rather than to the bucket itself.

Take 2!

serverless.yml

service: marks-blog

frameworkVersion: ">=1.2.0 <2.0.0"

provider:
  name: aws
  runtime: python3.6
  timeout: 180

resources:
  Resources:
    S3BucketPermissions:
      Type: AWS::S3::BucketPolicy
      Properties:
        Bucket: marks-blog-bucket
        PolicyDocument:
          Statement:
            - Principal: "*"
              Action:
                - s3:GetObject
              Effect: Allow
              Sid: "AddPerm"
              Resource: arn:aws:s3:::marks-blog-bucket/*
 
...

Let’s deploy again and try to access the file:

$ curl --head --silent https://s3.amazonaws.com/marks-blog-bucket/blog.html
HTTP/1.1 200 OK
x-amz-id-2: uGwsLLoFHf+slXADGYkqW0bLfQ7EPG/kqzV3l2k7SMex4NlMEpNsNN/cIC9INLPohDtVFwUAa90=
x-amz-request-id: 7869E21760CD50F1
Date: Fri, 29 Sep 2017 06:05:11 GMT
Last-Modified: Fri, 29 Sep 2017 06:01:33 GMT
ETag: "57bac87219812c2f9a581943da34cfde"
Accept-Ranges: bytes
Content-Type: application/octet-stream
Content-Length: 46
Server: AmazonS3

Success! And if we check in the AWS console we can see that the bucket policy has been applied to our bucket:

(Screenshot: the bucket policy shown in the AWS console)

Written by Mark Needham

September 29th, 2017 at 6:09 am

Posted in Software Development

Python 3: Create sparklines using matplotlib

I recently wanted to create sparklines to show how some values were changing over time. In addition, I wanted to generate them as images on the server rather than introducing a JavaScript library.

Chris Seymour’s excellent gist, which shows how to create sparklines inside a Pandas dataframe, got me most of the way there, but I had to tweak his code a bit to get it to play nicely with Python 3.6.

This is what I ended up with:

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import base64
 
from io import BytesIO
 
def sparkline(data, figsize=(4, 0.25), **kwargs):
    """
    Returns a base64 encoded PNG of a sparkline style plot
    """
    data = list(data)
 
    fig, ax = plt.subplots(1, 1, figsize=figsize, **kwargs)
    ax.plot(data)
 
    # Hide the axes and frame so only the line remains
    for k, v in ax.spines.items():
        v.set_visible(False)
    ax.set_xticks([])
    ax.set_yticks([])
 
    # Highlight the most recent value with a red dot
    plt.plot(len(data) - 1, data[len(data) - 1], 'r.')
 
    # Lightly shade the area under the line
    ax.fill_between(range(len(data)), data, len(data) * [min(data)], alpha=0.1)
 
    img = BytesIO()
    plt.savefig(img, transparent=True, bbox_inches='tight')
    img.seek(0)
    plt.close()
 
    return base64.b64encode(img.read()).decode("UTF-8")

I had to change the class used to buffer the image from StringIO to BytesIO, and I found I needed to decode the bytes produced if I wanted the image to display in an HTML page.
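
For anyone porting similar Python 2 code, the change boils down to this minimal sketch (plotting elided):

from io import BytesIO  # the Python 2 version used StringIO
import base64
 
img = BytesIO()  # savefig writes binary PNG data, so we need a bytes buffer
# plt.savefig(img, ...) would go here
img.seek(0)
encoded = base64.b64encode(img.read())  # returns bytes in Python 3
html_ready = encoded.decode("UTF-8")    # decode to str before embedding in HTML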

This is how you would call the above function:

if __name__ == "__main__":
    values = [
        [1,2,3,4,5,6,7,8,9,10],
        [7,10,12,18,2,8,10,6,7,12],
        [10,9,8,7,6,5,4,3,2,1]
    ]
 
    with open("/tmp/foo.html", "w") as file:
        for value in values:
            file.write('<div><img src="data:image/png;base64,{}"/></div>'.format(sparkline(value)))

And the HTML page looks like this:

(Screenshot: the generated sparklines rendered in the browser)

Written by Mark Needham

September 23rd, 2017 at 6:51 am

Neo4j: Cypher – Create Cypher map with dynamic keys

I was recently trying to create a map in a Cypher query but wanted to have dynamic keys in that map. I started off with this query:

WITH "a" as dynamicKey, "b" as dynamicValue
RETURN { dynamicKey: dynamicValue } AS map
 
 
╒══════════════════╕
│"map"             │
╞══════════════════╡
│{"dynamicKey":"b"}│
└──────────────────┘

Not quite what we want! We want dynamicKey to be evaluated rather than treated as a literal. As usual, APOC comes to the rescue!

In fact APOC has several functions that will help us out here. Let’s take a look at them:

CALL dbms.functions() yield name, description
WHERE name STARTS WITH "apoc.map.from"
RETURN name, description
 
╒═════════════════════╤═════════════════════════════════════════════════════╕
│"name"               │"description"                                        │
╞═════════════════════╪═════════════════════════════════════════════════════╡
│"apoc.map.fromLists" │"apoc.map.fromLists([keys],[values])"                │
├─────────────────────┼─────────────────────────────────────────────────────┤
│"apoc.map.fromNodes" │"apoc.map.fromNodes(label, property)"                │
├─────────────────────┼─────────────────────────────────────────────────────┤
│"apoc.map.fromPairs" │"apoc.map.fromPairs([[key,value],[key2,value2],...])"│
├─────────────────────┼─────────────────────────────────────────────────────┤
│"apoc.map.fromValues"│"apoc.map.fromValues([key1,value1,key2,value2,...])" │
└─────────────────────┴─────────────────────────────────────────────────────┘

So we can generate a map like this:

WITH "a" as dynamicKey, "b" as dynamicValue
RETURN apoc.map.fromValues([dynamicKey, dynamicValue]) AS map
 
╒═════════╕
│"map"    │
╞═════════╡
│{"a":"b"}│
└─────────┘

or like this:

WITH "a" as dynamicKey, "b" as dynamicValue
RETURN apoc.map.fromLists([dynamicKey], [dynamicValue]) AS map
 
╒═════════╕
│"map"    │
╞═════════╡
│{"a":"b"}│
└─────────┘

or even like this:

WITH "a" as dynamicKey, "b" as dynamicValue
RETURN apoc.map.fromPairs([[dynamicKey, dynamicValue]]) AS map
 
╒═════════╕
│"map"    │
╞═════════╡
│{"a":"b"}│
└─────────┘
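
Where this becomes really useful is when the keys come from the data itself. For example, assuming a hypothetical set of Person nodes with name and age properties, we could build a single map keyed by name:

MATCH (p:Person)
RETURN apoc.map.fromPairs(collect([p.name, p.age])) AS ages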

Written by Mark Needham

September 19th, 2017 at 7:30 pm

Posted in neo4j

Neo4j: Cypher – Rounding of floating point numbers/BigDecimals

I was doing some data cleaning a few days ago and wanted to multiply a value by 1 million. My Cypher code to do this looked like this:

with "8.37" as rawNumeric 
RETURN toFloat(rawNumeric) * 1000000 AS numeric
 
╒═════════════════╕
│"numeric"        │
╞═════════════════╡
│8369999.999999999│
└─────────────────┘

Unfortunately that suffers from the classic rounding error when working with floating point numbers. I couldn’t figure out a way to solve it using pure Cypher, but there tends to be an APOC function to solve every problem, and this was no exception.

I’m using Neo4j 3.2.3 so I downloaded the corresponding APOC jar and put it in a plugins directory:

$ ls -lh plugins/
total 3664
-rw-r--r--@ 1 markneedham  staff   1.8M  9 Aug 09:14 apoc-3.2.0.4-all.jar

I’m using Docker, so I needed to tell the container where my plugins folder lives:

$ docker run -v $PWD/plugins:/plugins \
    -p 7474:7474 \
    -p 7687:7687 \
    -e NEO4J_AUTH="none" \
    neo4j:3.2.3
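
Before trying it out we can sanity check that the functions made it in, by listing everything in the apoc.number.exact namespace:

CALL dbms.functions() YIELD name, description
WHERE name STARTS WITH "apoc.number.exact"
RETURN name, description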

Now we’re ready to try out our new function:

with "8.37" as rawNumeric 
RETURN apoc.number.exact.mul(rawNumeric,"1000000") AS apocConversion
 
╒════════════════╕
│"apocConversion"│
╞════════════════╡
│"8370000.00"    │
└────────────────┘

That almost does what we want, but the result is a string rather than a numeric value. It’s not too difficult to fix though:

with "8.37" as rawNumeric 
RETURN toFloat(apoc.number.exact.mul(rawNumeric,"1000000")) AS apocConversion
 
╒════════════════╕
│"apocConversion"│
╞════════════════╡
│8370000         │
└────────────────┘

That’s more like it!

Written by Mark Needham

August 13th, 2017 at 7:23 am

Posted in neo4j
