Mark Needham

Thoughts on Software Development

Archive for the ‘Software Development’ Category

AWS Lambda: Programmatically scheduling a CloudWatchEvent

I recently wrote a blog post showing how to create a Python ‘Hello World’ AWS lambda function and manually invoke it, but what I really wanted to do was have it run automatically every hour.

To achieve that in AWS Lambda land we need to create a CloudWatch Event. The documentation describes them as follows:

Using simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams.

This is actually really easy from the Amazon web console as you just need to click the ‘Triggers’ tab and then ‘Add trigger’. It’s not obvious that there are actually three steps involved, as they’re abstracted away from you.

So what are the steps?

  1. Create rule
  2. Give permission for that rule to execute
  3. Map the rule to the function

I forgot to do step 2 initially, and then you just end up with a rule that never triggers, which isn’t particularly useful.

The following code creates a ‘Hello World’ Lambda function and schedules it to run once an hour:

import boto3
 
lambda_client = boto3.client('lambda')
events_client = boto3.client('events')
 
fn_name = "HelloWorld"
fn_role = 'arn:aws:iam::[your-aws-id]:role/lambda_basic_execution'
 
# Create the Lambda function from the local HelloWorld.zip
fn_response = lambda_client.create_function(
    FunctionName=fn_name,
    Runtime='python2.7',
    Role=fn_role,
    Handler="{0}.lambda_handler".format(fn_name),
    Code={'ZipFile': open("{0}.zip".format(fn_name), 'rb').read(), },
)
 
fn_arn = fn_response['FunctionArn']
frequency = "rate(1 hour)"
name = "{0}-Trigger".format(fn_name)
 
# 1. Create a rule that fires once an hour
rule_response = events_client.put_rule(
    Name=name,
    ScheduleExpression=frequency,
    State='ENABLED',
)
 
# 2. Give that rule permission to invoke the function
lambda_client.add_permission(
    FunctionName=fn_name,
    StatementId="{0}-Event".format(name),
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule_response['RuleArn'],
)
 
# 3. Map the rule to the function
events_client.put_targets(
    Rule=name,
    Targets=[
        {
            'Id': "1",
            'Arn': fn_arn,
        },
    ]
)

We can now check if our trigger has been configured correctly:

$ aws events list-rules --query "Rules[?Name=='HelloWorld-Trigger']"
[
    {
        "State": "ENABLED", 
        "ScheduleExpression": "rate(1 hour)", 
        "Name": "HelloWorld-Trigger", 
        "Arn": "arn:aws:events:us-east-1:[your-aws-id]:rule/HelloWorld-Trigger"
    }
]
 
$ aws events list-targets-by-rule --rule HelloWorld-Trigger
{
    "Targets": [
        {
            "Id": "1", 
            "Arn": "arn:aws:lambda:us-east-1:[your-aws-id]:function:HelloWorld"
        }
    ]
}
 
$ aws lambda get-policy --function-name HelloWorld
{
    "Policy": "{\"Version\":\"2012-10-17\",\"Id\":\"default\",\"Statement\":[{\"Sid\":\"HelloWorld-Trigger-Event\",\"Effect\":\"Allow\",\"Principal\":{\"Service\":\"events.amazonaws.com\"},\"Action\":\"lambda:InvokeFunction\",\"Resource\":\"arn:aws:lambda:us-east-1:[your-aws-id]:function:HelloWorld\",\"Condition\":{\"ArnLike\":{\"AWS:SourceArn\":\"arn:aws:events:us-east-1:[your-aws-id]:rule/HelloWorld-Trigger\"}}}]}"
}

All looks good so we’re done!
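
If we want to tidy up after ourselves later, the following is a rough sketch of the teardown, assuming the same names as above:

import boto3
 
lambda_client = boto3.client('lambda')
events_client = boto3.client('events')
 
fn_name = "HelloWorld"
rule_name = "{0}-Trigger".format(fn_name)
 
# Detach the function from the rule before deleting the rule itself
events_client.remove_targets(Rule=rule_name, Ids=["1"])
events_client.delete_rule(Name=rule_name)
 
# Remove the permission we granted to CloudWatch Events and then the function
lambda_client.remove_permission(FunctionName=fn_name, StatementId="{0}-Event".format(rule_name))
lambda_client.delete_function(FunctionName=fn_name)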

Written by Mark Needham

April 5th, 2017 at 11:49 pm

AWS Lambda: Encrypted environment variables

Continuing on from my post showing how to create a ‘Hello World’ AWS lambda function I wanted to pass encrypted environment variables to my function.

The following function takes in both an encrypted and unencrypted variable and prints them out.

Don’t print out encrypted variables in a real function, this is just so we can see the example working!

import boto3
import os
 
from base64 import b64decode
 
def lambda_handler(event, context):
    encrypted = os.environ['ENCRYPTED_VALUE']
    decrypted = boto3.client('kms').decrypt(CiphertextBlob=b64decode(encrypted))['Plaintext']
 
    # Don't print out your decrypted value in a real function! This is just to show how it works.
    print("Decrypted value:", decrypted)
 
    plain_text = os.environ["PLAIN_TEXT_VALUE"]
    print("Plain text:", plain_text)

Now we’ll zip up our function into HelloWorldEncrypted.zip, ready to send to AWS.

$ zip HelloWorldEncrypted.zip HelloWorldEncrypted.py

Now it’s time to upload our function to AWS and create the associated environment variables.

If you’re using a Python editor then you’ll need to install boto3 locally to keep the editor happy but you don’t need to include boto3 in the code you send to AWS Lambda – it comes pre-installed.
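
Assuming you have pip available, that local install is just:

$ pip install boto3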

Now we write the following code to automate the creation of our Lambda function:

import boto3
from base64 import b64encode
 
fn_name = "HelloWorldEncrypted"
kms_key = "arn:aws:kms:[aws-zone]:[your-aws-id]:key/[your-kms-key-id]"
fn_role = 'arn:aws:iam::[your-aws-id]:role/lambda_basic_execution'
 
lambda_client = boto3.client('lambda')
kms_client = boto3.client('kms')
 
# Encrypt the secret with KMS and base64 encode the ciphertext so it can be
# stored as a string environment variable
encrypt_me = "abcdefg"
encrypted = b64encode(kms_client.encrypt(Plaintext=encrypt_me, KeyId=kms_key)["CiphertextBlob"])
 
plain_text = 'hijklmno'
 
# Create the function, passing both the encrypted and plain text values as
# environment variables
lambda_client.create_function(
        FunctionName=fn_name,
        Runtime='python2.7',
        Role=fn_role,
        Handler="{0}.lambda_handler".format(fn_name),
        Code={ 'ZipFile': open("{0}.zip".format(fn_name), 'rb').read(),},
        Environment={
            'Variables': {
                'ENCRYPTED_VALUE': encrypted,
                'PLAIN_TEXT_VALUE': plain_text,
            }
        },
        KMSKeyArn=kms_key
)

The tricky bit for me here was figuring out that the value I needed to pass was the base64 encoded version of the output encrypted by the KMS client. The KMS client relies on a KMS key that we need to set up. We can see a list of all our KMS keys by running the following command:

$ aws kms list-keys

The format of these keys is arn:aws:kms:[zone]:[account-id]:key/[key-id].
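
If no keys show up there yet, one can be created from the CLI as well – a minimal sketch, where the description is just an arbitrary label:

$ aws kms create-key --description "lambda environment variables"

The response includes the key’s Arn, which is the value to plug into kms_key above.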

Now let’s run that script to create our Lambda function:

$ python CreateHelloWorldEncrypted.py

Let’s check it got created:

$ aws lambda list-functions --query "Functions[*].FunctionName"
[
    "HelloWorldEncrypted", 
]

And now let’s execute the function:

$ aws lambda invoke --function-name HelloWorldEncrypted --invocation-type RequestResponse --log-type Tail /tmp/out | jq ".LogResult"
"U1RBUlQgUmVxdWVzdElkOiA5YmNlM2E1MC0xODMwLTExZTctYjFlNi1hZjQxZDYzMzYxZDkgVmVyc2lvbjogJExBVEVTVAooJ0RlY3J5cHRlZCB2YWx1ZTonLCAnYWJjZGVmZycpCignUGxhaW4gdGV4dDonLCAnaGlqa2xtbm8nKQpFTkQgUmVxdWVzdElkOiA5YmNlM2E1MC0xODMwLTExZTctYjFlNi1hZjQxZDYzMzYxZDkKUkVQT1JUIFJlcXVlc3RJZDogOWJjZTNhNTAtMTgzMC0xMWU3LWIxZTYtYWY0MWQ2MzM2MWQ5CUR1cmF0aW9uOiAzNjAuMDQgbXMJQmlsbGVkIER1cmF0aW9uOiA0MDAgbXMgCU1lbW9yeSBTaXplOiAxMjggTUIJTWF4IE1lbW9yeSBVc2VkOiAyNCBNQgkK"

That’s a bit hard to read, so some decoding is needed:

$ echo "U1RBUlQgUmVxdWVzdElkOiA5YmNlM2E1MC0xODMwLTExZTctYjFlNi1hZjQxZDYzMzYxZDkgVmVyc2lvbjogJExBVEVTVAooJ0RlY3J5cHRlZCB2YWx1ZTonLCAnYWJjZGVmZycpCignUGxhaW4gdGV4dDonLCAnaGlqa2xtbm8nKQpFTkQgUmVxdWVzdElkOiA5YmNlM2E1MC0xODMwLTExZTctYjFlNi1hZjQxZDYzMzYxZDkKUkVQT1JUIFJlcXVlc3RJZDogOWJjZTNhNTAtMTgzMC0xMWU3LWIxZTYtYWY0MWQ2MzM2MWQ5CUR1cmF0aW9uOiAzNjAuMDQgbXMJQmlsbGVkIER1cmF0aW9uOiA0MDAgbXMgCU1lbW9yeSBTaXplOiAxMjggTUIJTWF4IE1lbW9yeSBVc2VkOiAyNCBNQgkK" | base64 --decode
START RequestId: 9bce3a50-1830-11e7-b1e6-af41d63361d9 Version: $LATEST
('Decrypted value:', 'abcdefg')
('Plain text:', 'hijklmno')
END RequestId: 9bce3a50-1830-11e7-b1e6-af41d63361d9
REPORT RequestId: 9bce3a50-1830-11e7-b1e6-af41d63361d9	Duration: 360.04 ms	Billed Duration: 400 ms 	Memory Size: 128 MB	Max Memory Used: 24 MB

And it worked, hoorah!

Written by Mark Needham

April 3rd, 2017 at 5:49 am

AWS Lambda: Programmatically create a Python ‘Hello World’ function

I’ve been playing around with AWS Lambda over the last couple of weeks and I wanted to automate the creation of these functions and all their surrounding config.

Let’s say we have the following Hello World function:

def lambda_handler(event, context):
    print("Hello world")

To upload it to AWS we need to put it inside a zip file so let’s do that:

$ zip HelloWorld.zip HelloWorld.py
$ unzip -l HelloWorld.zip 
Archive:  HelloWorld.zip
  Length     Date   Time    Name
 --------    ----   ----    ----
       61  04-02-17 22:04   HelloWorld.py
 --------                   -------
       61                   1 file

Now we’re ready to write a script to create our AWS lambda function.

import boto3
 
lambda_client = boto3.client('lambda')
 
fn_name = "HelloWorld"
fn_role = 'arn:aws:iam::[your-aws-id]:role/lambda_basic_execution'
 
lambda_client.create_function(
    FunctionName=fn_name,
    Runtime='python2.7',
    Role=fn_role,
    Handler="{0}.lambda_handler".format(fn_name),
    Code={'ZipFile': open("{0}.zip".format(fn_name), 'rb').read(), },
)

[your-aws-id] needs to be replaced with the identifier of our AWS account. We can find that out by running the following command with the AWS CLI:

$ aws ec2 describe-security-groups --query 'SecurityGroups[0].OwnerId' --output text
123456789012
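
An alternative that I believe works just as well is to ask STS for the calling account directly:

$ aws sts get-caller-identity --query Account --output text
123456789012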

Now we can create our function:

$ python CreateHelloWorld.py

And if we test the function from the console we’ll get the expected ‘Hello world’ output.

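We can also invoke it from the command line rather than the console – a quick sketch that decodes the returned log so the printed message is visible:

$ aws lambda invoke --function-name HelloWorld --invocation-type RequestResponse --log-type Tail /tmp/out | jq -r ".LogResult" | base64 --decode

The decoded output should include the ‘Hello world’ line printed by the handler.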

Written by Mark Needham

April 2nd, 2017 at 10:11 pm

My top 10 technology podcasts

For the last six months I’ve been listening to 2 or 3 technology podcasts every day while out running and on my commute and I thought it’d be cool to share some of my favourites.

I listen to all of these on the Podbean android app which seems pretty good. It can’t read the RSS feeds of some podcasts but other than that it’s worked well.

Anyway, on with the podcasts:

Software Engineering Daily

This is the most reliable of all the podcasts I’ve listened to and a new episode is posted every weekday.

It sweeps across lots of different areas of technology – there’s a bit of software development, a bit of data engineering, and a bit of infrastructure.

Every now and then there’s a focus on a particular topic area or company which I find really interesting e.g. in 2015 there was a week of Bitcoin focused episodes and more recently there’s been a bunch of episodes about Stripe.

Partially Derivative

This one is more of a data science podcast and covers lots of different areas in that space, but thankfully keeps the conversation at a level that a non data scientist like me can understand.

I especially liked the post US election episode where they talked about the problems with polling and how most election predictions had ended up being wrong.

There’s roughly one new episode a week.

O’Reilly Bots podcast

I didn’t know anything about bots before I listened to this podcast and it was quite addictive – I powered through all the episodes in a few weeks.

They cover all sorts of topics that I’d have never thought of – why have developers got interested in bots? How do UIs differ to ones in apps? How do users find out about bots?

I really enjoy listening to this one but it’s been a bit quiet recently.

Datanauts

I found this one really useful for getting the hang of infrastructure topics. I wanted to learn a bit more about Kubernetes a few months ago and they had an episode which gives an overview as well as more detailed episodes.

One neat feature of this podcast is that after each part of an interview the hosts summarise what they picked up from that segment. I like that it gives you a few seconds to think about what you picked up and whether it matches the summary.

Some of the episodes go really deep into specific infrastructure topics and I struggle to follow along but there are enough other ones to keep me happy.

Becoming a Data Scientist

This one mirrors the journey of Renee Teate getting into data science and bringing everyone along on the journey.

Each episode is paired with a learning exercise for the listener to try, and although I haven’t tried any of the exercises yet, I like how some interviews are structured around them, e.g. Sebastien Rashka was interviewed about model accuracy in the week that topic was being explored in the learning club.

If you’re interested in data science topics but aren’t a data scientist yourself this is a good one to listen to.

This Week In Machine Learning and AI Podcast

This one mostly goes well over my head but it’s still interesting to listen to other people talk about stuff they’re working on.

There’s a lot of focus on Deep Learning so I think I need to learn a bit more about that and then the episodes will make more sense.

The last episode with Evan Wright was much more accessible. I need more like that one!

The Women in Tech Show

I came across Edaena Salinas on Software Engineering Daily and didn’t initially realise that Edaena had a podcast until a couple of weeks ago.

There’s lots of interesting content on this one. The episodes on data driven marketing and unconscious bias are my favourites of the ones I’ve listened to so far.

The Bitcoin Podcast

I listened to a few shows about bitcoin on Software Engineering Daily and found this podcast while trying to learn more.

Some of the episodes are general enough that I can follow along, but others use a lot of blockchain specific terminology that leaves me feeling a bit lost.

I especially liked the episode that featured Greg Walker of learnmeabitcoin fame. Greg uses Neo4j as part of the website and presented at the London Neo4j meetup earlier this week.

Go Time

This one has a chat based format that I really like. They have a cool section called ‘free software Friday’ at the end of each show where everybody calls out a piece of software or maintainer that they’re grateful for.

I was playing around with Go in November/December last year so it was really helpful in pointing me in the right direction. I haven’t done any Go stuff recently so it’s more of a general interest show for now.

Change Log

This one covers lots of different topics, mostly around different open source projects.

The really cool thing about this one is they get every guest to explain their ‘origin story’ i.e. how did they get into software and what was their path to the current job. The interview with Nathan Sobo about Atom was particularly good in this respect.

It’s always interesting to hear how other people got started and contrast it with my own experiences.

Another cool feature of this podcast is that they sometimes have episodes where they interview people at open source conferences.

That’s it folks

That’s all for now. Hopefully there’s one or more in there that you haven’t listened to before.

If you’ve got any suggestions for other ones I should listen to let me know in the comments or send me a message on twitter @markhneedham

Written by Mark Needham

March 30th, 2017 at 10:38 pm

Kubernetes: Writing hostname to a file

Over the weekend I spent a bit of time playing around with Kubernetes and to get the hang of the technology I set myself the task of writing the hostname of the machine to a file.

I’m using the excellent minikube tool to create a local Kubernetes cluster for my experiments so the first step is to spin that up:

$ minikube start
Starting local Kubernetes cluster...
Kubectl is now configured to use the cluster.

The first thing I needed to work out was how to get the hostname. I figured there was probably an environment variable that I could access. We can call the env command to see a list of all the environment variables in a container, so I created a pod template that would show me that information:

hostname_super_simple.yaml

apiVersion: v1
kind: Pod
metadata:
  name: mark-super-simple-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox:1.24
      command: [ "/bin/sh", "-c", "env" ]      
  dnsPolicy: Default
  restartPolicy: Never

I then created a pod from that template and checked the logs of that pod:

$ kubectl create -f hostname_super_simple.yaml 
pod "mark-super-simple-test-pod" created
$ kubectl logs  mark-super-simple-test-pod
KUBERNETES_SERVICE_PORT=443
KUBERNETES_PORT=tcp://10.0.0.1:443
HOSTNAME=mark-super-simple-test-pod
SHLVL=1
HOME=/root
KUBERNETES_PORT_443_TCP_ADDR=10.0.0.1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
KUBERNETES_SERVICE_PORT_HTTPS=443
KUBERNETES_PORT_443_TCP=tcp://10.0.0.1:443
PWD=/
KUBERNETES_SERVICE_HOST=10.0.0.1

The information we need is in $HOSTNAME, so the next thing we need to do is create a pod template which puts that into a file.

hostname_simple.yaml

apiVersion: v1
kind: Pod
metadata:
  name: mark-test-pod
spec:
  containers:
    - name: test-container
      image: gcr.io/google_containers/busybox:1.24
      command: [ "/bin/sh", "-c", "echo $HOSTNAME > /tmp/bar; cat /tmp/bar" ]
  dnsPolicy: Default
  restartPolicy: Never

We can create a pod using this template by running the following command:

$ kubectl create -f hostname_simple.yaml
pod "mark-test-pod" created

Now let’s check the logs of the instance to see whether our script worked:

$ kubectl logs mark-test-pod
mark-test-pod

Indeed it did, good times!
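
One small gotcha: since restartPolicy is Never the pod sticks around in a completed state, so if we want to run the experiment again we need to delete it first. Something along these lines should do it:

$ kubectl delete -f hostname_simple.yaml
pod "mark-test-pod" deleted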

Written by Mark Needham

November 22nd, 2016 at 7:56 pm

Study until your mind wanders

I’ve previously found it very difficult to read math heavy content, which has made it challenging to get through Distributed Computing, a book I bought last May.

After several false starts – where I gave up after getting frustrated that I couldn’t understand things the first time around, and forgot everything if I left it for a couple of days – I decided to try again with a different approach.

I’ve been trying a technique I learned from Mini Habits where every day I have a (very small) goal of reading one page of the book. Having such a small goal means that I can read the material as slowly as I like (repeating previous days if necessary).

So far I’ve read 4 chapters (~100 pages) over the last month – some days I read 6 or 7 pages, other days I only manage one. The key is to keep the rhythm of reading something.

I tried doing the reading at different times of the day – on the bus on the way to work, in the evening before going to sleep – and found that for me the best time is immediately when I wake up. To minimise my mind wandering I don’t read any emails, chat messages or social media accounts before I start.

Despite this, I’ve noticed that after a while my mind starts to wander while reading proofs and that’s a signal for me to stop for the day. When I pick up again the next day I’ve often found that I understand what I was having difficulty with.

I’ve read that meditation prior to studying is an effective way to quiet the mind and would be a more ‘on demand’ way of achieving the concentration required to read this type of material. I’ve never done any meditation so if anyone has any tips on where to start that’d be helpful.

Written by Mark Needham

December 31st, 2015 at 10:47 am

jq: Cannot iterate over number / string and number cannot be added

In my continued parsing of meetup.com’s JSON API I wanted to extract some information from the following JSON file:

$ head -n40 data/members/18313232.json
[
  {
    "status": "active",
    "city": "London",
    "name": ". .",
    "other_services": {},
    "country": "gb",
    "topics": [],
    "lon": -0.13,
    "joined": 1438866605000,
    "id": 92951932,
    "state": "17",
    "link": "http://www.meetup.com/members/92951932",
    "photo": {
      "thumb_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/thumb_250896203.jpeg",
      "photo_id": 250896203,
      "highres_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/highres_250896203.jpeg",
      "photo_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/member_250896203.jpeg"
    },
    "lat": 51.49,
    "visited": 1446745707000,
    "self": {
      "common": {}
    }
  },
  {
    "status": "active",
    "city": "London",
    "name": "Abdelkader Idryssy",
    "other_services": {},
    "country": "gb",
    "topics": [
      {
        "name": "Weekend Adventures",
        "urlkey": "weekend-adventures",
        "id": 16438
      },
      {
        "name": "Community Building",
        "urlkey": "community-building",

In particular I want to extract the member’s id, name, join date and the ids of topics they’re interested in. I started with the following jq query to try and extract those attributes:

$ jq -r '.[] | [.id, .name, .joined, (.topics[] | .id | join(";"))] | @csv' data/members/18313232.json
Cannot iterate over number (16438)

Annoyingly this treats topic ids on an individual basis rather than as an array as I wanted. I tweaked the query to the following with no luck:

$ jq -r '.[] | [.id, .name, .joined, (.topics[].id | join(";"))] | @csv' data/members/18313232.json
Cannot iterate over number (16438)

As a guess I decided to wrap ‘.topics[].id’ in an array literal to see if it had any impact:

$ jq -r '.[] | [.id, .name, .joined, ([.topics[].id] | join(";"))] | @csv' data/members/18313232.json
92951932,". .",1438866605000,""
jq: error (at data/members/18313232.json:60013): string ("") and number (16438) cannot be added

Woot! A different error message at least and this one seems to be due to a type mismatch between the string we want to end up with and the array of numbers that we currently have.

We can cast our way to victory with the ‘tostring’ function:

$ jq -r '.[] | [.id, .name, .joined, ([.topics[].id | tostring] | join(";"))] | @csv' data/members/18313232.json
...
92951932,". .",1438866605000,""
193866304,"Abdelkader Idryssy",1445195325000,"16438;20727;15401;9760;20246;20923;3336;2767;242;58259;4417;1789;10454;20274;10232;563;25375;16433;15187;17635;26273;21808;933;7789;23884;16212;144477;15322;21067;3833;108403;20221;1201;182;15083;9696;4377;15360;18296;15121;17703;10161;1322;3880;18333;3485;15585;44584;18692;21681"
28643052,"Abhishek Chanda",1439688955000,"646052;520302;15167;563;65735;537492;646072;537502;24959;1025832;8599;31197;24410;26118;10579;1064;189;48471;16216;18062;33089;107633;46831;20479;1423042;86258;21441;3833;21681;188;9696;58162;20398;113032;18060;29971;55324;30928;15261;58259;638;16475;27591;10107;242;109595;10470;26384;72514;1461192"
39523062,"Adam Kinder-Jones",1438677474000,"70576;21549;3833;42277;164111;21522;93380;48471;15167;189;563;25435;87614;9696;18062;58162;10579;21681;19882;108403;128595;15582;7029"
194119823,"Adam Lewis",1444867994000,"10209"
14847001,"Adam Rogers",1422917313000,""
87709042,"Adele Green",1436867381000,"15167;18062;102811;9696;30928;18060;78565;189;7029;48471;127567;10579;58162;563;3833;16216;21441;37708;209821;15401;59325;31792;21836;21900;984862;15720;17703;96823;4422;85951;87614;37428;2260;827;121802;19672;38660;84325;118991;135612;10464;1454542;17936;21549;21520;17628;148303;20398;66339;29661"
11497937,"Adrian Bridgett",1421067940000,"30372;15046;25375;638;498;933;374;27591;18062;18060;15167;10581;16438;15672;1998;1273;713;26333;15099;15117;4422;15892;242;142180;563;31197;20479;1502;131178;15018;43256;58259;1201;7319;15940;223;8652;66493;15029;18528;23274;9696;128595;21681;17558;50709;113737"
14151190,"adrian lawrence",1437142198000,"7029;78565;659;85951;15582;48471;9696;128595;563;10579;3833;101960;16137;1973;78566;206;223;21441;16216;108403;21681;186;1998;15731;17703;15043;16613;17885;53531;48375;16615;19646;62326;49954;933;22268;19243;37381;102811;30928;455;10358;73511;127567;106382;16573;36229;781;23981;1954"
183557824,"Adrien Pujol",1421882756000,"108403;563;9696;21681;188;24410;1064;32743;124668;15472;21123;1486432;1500742;87614;46831;1454542;46810;166000;126177;110474"
...
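
As an aside, I believe an equivalent way of writing that cast is to map over the array of ids with ‘tostring’ before joining:

$ jq -r '.[] | [.id, .name, .joined, (.topics | map(.id | tostring) | join(";"))] | @csv' data/members/18313232.json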

Written by Mark Needham

November 24th, 2015 at 12:12 am

jq: Filtering missing keys

I’ve been playing around with the meetup.com API again over the last few days and having saved a set of events to disk I wanted to extract the venues using jq.

This is what a single event record looks like:

$ jq -r ".[0]" data/events/0.json
{
  "status": "past",
  "rating": {
    "count": 1,
    "average": 1
  },
  "utc_offset": 3600000,
  "event_url": "http://www.meetup.com/londonweb/events/3261890/",
  "group": {
    "who": "Web Peeps",
    "name": "London Web",
    "group_lat": 51.52000045776367,
    "created": 1034097743000,
    "join_mode": "approval",
    "group_lon": -0.12999999523162842,
    "urlname": "londonweb",
    "id": 163876
  },
  "name": "London Web Design October Meetup",
  "created": 1094756756000,
  "venue": {
    "city": "London",
    "name": "Roadhouse Live Music Restaurant , Bar & Club",
    "country": "GB",
    "lon": -0.1,
    "phone": "44-020-7240-6001",
    "address_1": "The Piazza",
    "address_2": "Covent Garden",
    "repinned": false,
    "lat": 51.52,
    "id": 11725
  },
  "updated": 1273536337000,
  "visibility": "public",
  "yes_rsvp_count": 2,
  "time": 1097776800000,
  "waitlist_count": 0,
  "headcount": 0,
  "maybe_rsvp_count": 5,
  "id": "3261890"
}

We want to extract the keys underneath ‘venue’.
I started with the following:

$ jq -r ".[] | .venue" data/events/0.json
...
{
  "city": "London",
  "name": "Counting House Pub",
  "country": "gb",
  "lon": -0.085022,
  "phone": "020 7283 7123",
  "address_1": "50 Cornhill Rd",
  "address_2": "EC3V 3PD",
  "repinned": false,
  "lat": 51.513407,
  "id": 835790
}
null
{
  "city": "Paris",
  "name": "Mozilla Paris",
  "country": "fr",
  "lon": 2.341002,
  "address_1": "16 Bis Boulevard Montmartre",
  "repinned": false,
  "lat": 48.871834,
  "id": 23591845
}
...

This is close to what I want but it includes ‘null’ values which means when you extract the keys inside ‘venue’ they are all empty as well:

jq -r ".[] | .venue | [.id, .name, .city, .address_1, .address_2, .lat, .lon] | @csv" data/events/0.json
...
101958,"The Green Man and French Horn,  -","London","54, St. Martins Lane - Covent Garden","WC2N 4EA",51.52,-0.1
,,,,,,
107295,"The Yorkshire Grey Pub","London","46 Langham Street","W1W 7AX",51.52,-0.1
...
,,,,,,

In functional programming lingo we want to filter out any JSON documents which don’t have the ‘venue’ key.
‘filter’ has a different meaning in jq so it took me a while to realise that the ‘select’ function was what I needed to get rid of the null values:

$ jq -r ".[] | select(.venue != null) | .venue | [.id, .name, .city, .address_1, .address_2, .lat, .lon] | @csv" data/events/0.json | head
11725,"Roadhouse Live Music Restaurant , Bar & Club","London","The Piazza","Covent Garden",51.52,-0.1
11725,"Roadhouse Live Music Restaurant , Bar & Club","London","The Piazza","Covent Garden",51.52,-0.1
11725,"Roadhouse Live Music Restaurant , Bar & Club","London","The Piazza","Covent Garden",51.52,-0.1
11725,"Roadhouse Live Music Restaurant , Bar & Club","London","The Piazza","Covent Garden",51.52,-0.1
76192,"Pied Bull Court","London","Galen Place, London, WC1A 2JR",,51.516747,-0.12719
76192,"Pied Bull Court","London","Galen Place, London, WC1A 2JR",,51.516747,-0.12719
85217,"Earl's Court Exhibition Centre","London","Warwick Road","SW5 9TA",51.49233,-0.199735
96579,"Olympia 2","London","Near Olympia tube station",,51.52,-0.1
76192,"Pied Bull Court","London","Galen Place, London, WC1A 2JR",,51.516747,-0.12719
101958,"The Green Man and French Horn,  -","London","54, St. Martins Lane - Covent Garden","WC2N 4EA",51.52,-0.1

And we’re done.

Written by Mark Needham

November 14th, 2015 at 10:51 pm

Docker 1.9: Port forwarding on Mac OS X

Since the Neo4j 2.3.0 release there’s been an official docker image which I thought I’d give a try this afternoon.

The last time I used Docker, about a year ago, I had to install boot2docker, which has since been deprecated in favour of Docker Machine and the Docker Toolbox.

I created a container with the following command:

$ docker run --detach --publish=7474:7474 neo4j/neo4j

And then tried to access the Neo4j server locally:

$ curl http://localhost:7474
curl: (7) Failed to connect to localhost port 7474: Connection refused

I quickly checked that docker had started up Neo4j correctly:

$ docker ps
CONTAINER ID        IMAGE               COMMAND                  CREATED             STATUS              PORTS                              NAMES
1f7c48e267f0        neo4j/neo4j         "/docker-entrypoint.s"   10 minutes ago      Up 10 minutes       7473/tcp, 0.0.0.0:7474->7474/tcp   kickass_easley

Looks good. Amusingly I then came across my own blog post from a year ago where I’d run into the same problem – the problem being that we need to access the Neo4j server via the VM’s IP address rather than localhost.

Instead of using boot2docker we now need to use docker-machine to find the VM’s IP address:

$ docker-machine ls
NAME      ACTIVE   DRIVER       STATE     URL                         SWARM
default   *        virtualbox   Running   tcp://192.168.99.100:2376
$ curl http://192.168.99.100:7474
{
  "management" : "http://192.168.99.100:7474/db/manage/",
  "data" : "http://192.168.99.100:7474/db/data/"
}

And we’re back in business.
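
As an aside, docker-machine can print just the IP address, which saves scanning the table – this assumes the machine is called ‘default’ as in the listing above:

$ docker-machine ip default
192.168.99.100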

Written by Mark Needham

November 8th, 2015 at 8:58 pm

IntelliJ ‘java: cannot find JDK 1.8’

I upgraded to IntelliJ 15.0 a few days ago and was initially seeing the following exception when trying to compile:

module-name
 
java: cannot find JDK 1.8

I’ve been compiling against JDK 1.8 for a while now using IntelliJ 14 so I wasn’t sure what was going on.

I checked my project settings and they seemed fine.

The error message suggested I look in the logs to find more information but I wasn’t sure where those live! I eventually found out the answer via the comments of this support ticket although I later found a post describing it in more detail.

Looking into the logs revealed the following error message:

$ less /Users/markneedham/Library/Logs/IntelliJIdea15/idea.log
 
2015-11-05 16:31:28,429 [ 428129]   INFO - figurations.GeneralCommandLine - Cannot run program "/Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home/bin/java" (in directory "/Applications/IntelliJ IDEA 15.app/Contents/bin"): error=2, No such file or directory
java.io.IOException: Cannot run program "/Library/Java/JavaVirtualMachines/jdk1.7.0_71.jdk/Contents/Home/bin/java" (in directory "/Applications/IntelliJ IDEA 15.app/Contents/bin"): error=2, No such file or directory
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1048)
	at com.intellij.execution.configurations.GeneralCommandLine.startProcess(GeneralCommandLine.java:368)
	at com.intellij.execution.configurations.GeneralCommandLine.createProcess(GeneralCommandLine.java:354)
	at com.intellij.execution.process.OSProcessHandler.<init>(OSProcessHandler.java:38)
	at org.jetbrains.idea.maven.server.MavenServerManager$3.startProcess(MavenServerManager.java:359)
	at org.jetbrains.idea.maven.server.MavenServerManager$3.execute(MavenServerManager.java:345)
	at com.intellij.execution.rmi.RemoteProcessSupport.a(RemoteProcessSupport.java:206)
	at com.intellij.execution.rmi.RemoteProcessSupport.acquire(RemoteProcessSupport.java:139)
	at org.jetbrains.idea.maven.server.MavenServerManager.create(MavenServerManager.java:163)
	at org.jetbrains.idea.maven.server.MavenServerManager.create(MavenServerManager.java:71)
	at org.jetbrains.idea.maven.server.RemoteObjectWrapper.getOrCreateWrappee(RemoteObjectWrapper.java:41)
	at org.jetbrains.idea.maven.server.MavenServerManager$9.execute(MavenServerManager.java:525)
	at org.jetbrains.idea.maven.server.MavenServerManager$9.execute(MavenServerManager.java:522)
	at org.jetbrains.idea.maven.server.RemoteObjectWrapper.perform(RemoteObjectWrapper.java:76)
	at org.jetbrains.idea.maven.server.MavenServerManager.applyProfiles(MavenServerManager.java:522)
	at org.jetbrains.idea.maven.project.MavenProjectReader.applyProfiles(MavenProjectReader.java:369)
	at org.jetbrains.idea.maven.project.MavenProjectReader.doReadProjectModel(MavenProjectReader.java:98)
	at org.jetbrains.idea.maven.project.MavenProjectReader.access$300(MavenProjectReader.java:42)
	at org.jetbrains.idea.maven.project.MavenProjectReader$1.doProcessParent(MavenProjectReader.java:422)
	at org.jetbrains.idea.maven.project.MavenProjectReader$1.doProcessParent(MavenProjectReader.java:399)
	at org.jetbrains.idea.maven.project.MavenParentProjectFileProcessor.processRepositoryParent(MavenParentProjectFileProcessor.java:84)
	at org.jetbrains.idea.maven.project.MavenParentProjectFileProcessor.process(MavenParentProjectFileProcessor.java:62)
	at org.jetbrains.idea.maven.project.MavenProjectReader.resolveInheritance(MavenProjectReader.java:425)
	at org.jetbrains.idea.maven.project.MavenProjectReader.doReadProjectModel(MavenProjectReader.java:95)
	at org.jetbrains.idea.maven.project.MavenProjectReader.access$300(MavenProjectReader.java:42)
	at org.jetbrains.idea.maven.project.MavenProjectReader$1.doProcessParent(MavenProjectReader.java:422)
	at org.jetbrains.idea.maven.project.MavenProjectReader$1.doProcessParent(MavenProjectReader.java:399)
	at org.jetbrains.idea.maven.project.MavenParentProjectFileProcessor.processRepositoryParent(MavenParentProjectFileProcessor.java:84)
	at org.jetbrains.idea.maven.project.MavenParentProjectFileProcessor.process(MavenParentProjectFileProcessor.java:62)
	at org.jetbrains.idea.maven.project.MavenProjectReader.resolveInheritance(MavenProjectReader.java:425)
	at org.jetbrains.idea.maven.project.MavenProjectReader.doReadProjectModel(MavenProjectReader.java:95)
	at org.jetbrains.idea.maven.project.MavenProjectReader.access$300(MavenProjectReader.java:42)
	at org.jetbrains.idea.maven.project.MavenProjectReader$1.doProcessParent(MavenProjectReader.java:422)
	at org.jetbrains.idea.maven.project.MavenProjectReader$1.doProcessParent(MavenProjectReader.java:399)
	at org.jetbrains.idea.maven.project.MavenParentProjectFileProcessor.processRepositoryParent(MavenParentProjectFileProcessor.java:84)
	at org.jetbrains.idea.maven.project.MavenParentProjectFileProcessor.process(MavenParentProjectFileProcessor.java:62)
	at org.jetbrains.idea.maven.project.MavenProjectReader.resolveInheritance(MavenProjectReader.java:425)
	at org.jetbrains.idea.maven.project.MavenProjectReader.doReadProjectModel(MavenProjectReader.java:95)
	at org.jetbrains.idea.maven.project.MavenProjectReader.readProject(MavenProjectReader.java:53)
	at org.jetbrains.idea.maven.project.MavenProject.read(MavenProject.java:626)
	at org.jetbrains.idea.maven.project.MavenProjectsTree.doUpdate(MavenProjectsTree.java:564)
	at org.jetbrains.idea.maven.project.MavenProjectsTree.doAdd(MavenProjectsTree.java:509)
	at org.jetbrains.idea.maven.project.MavenProjectsTree.update(MavenProjectsTree.java:470)
	at org.jetbrains.idea.maven.project.MavenProjectsTree.updateAll(MavenProjectsTree.java:441)
	at org.jetbrains.idea.maven.project.MavenProjectsProcessorReadingTask.perform(MavenProjectsProcessorReadingTask.java:60)
	at org.jetbrains.idea.maven.project.MavenProjectsProcessor.doProcessPendingTasks(MavenProjectsProcessor.java:134)
	at org.jetbrains.idea.maven.project.MavenProjectsProcessor.access$100(MavenProjectsProcessor.java:30)
	at org.jetbrains.idea.maven.project.MavenProjectsProcessor$2.run(MavenProjectsProcessor.java:109)
	at org.jetbrains.idea.maven.utils.MavenUtil$7.run(MavenUtil.java:464)
	at com.intellij.openapi.application.impl.ApplicationImpl$8.run(ApplicationImpl.java:365)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)
	at org.jetbrains.ide.PooledThreadExecutor$1$1.run(PooledThreadExecutor.java:55)
Caused by: java.io.IOException: error=2, No such file or directory
	at java.lang.UNIXProcess.forkAndExec(Native Method)
	at java.lang.UNIXProcess.<init>(UNIXProcess.java:248)
	at java.lang.ProcessImpl.start(ProcessImpl.java:134)
	at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
	... 55 more

Somewhere I had a JDK 1.7 defined which no longer existed on my machine. I actually only have one JDK installed at the moment:

$ /usr/libexec/java_home -V
Matching Java Virtual Machines (1):
    1.8.0_51, x86_64:	"Java SE 8"	/Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home
 
/Library/Java/JavaVirtualMachines/jdk1.8.0_51.jdk/Contents/Home

A bit of exploring led me to ‘Platform Settings’, which is where the culprit was – an SDK entry pointing at a JDK 1.7 installation that no longer existed.

That setting actually lives in /Users/markneedham/Library/Preferences/IntelliJIdea15/options/jdk.table.xml and once I removed it IntelliJ resumed normal service.
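
If you want to confirm which file contains the stale reference before editing anything, a quick grep does the trick – here I’m assuming the dead entry mentions jdk1.7, as the log above suggests:

$ grep -n "jdk1.7" /Users/markneedham/Library/Preferences/IntelliJIdea15/options/jdk.table.xml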

Written by Mark Needham

November 8th, 2015 at 11:47 am