Mark Needham

Thoughts on Software Development

PHP vs Python: Generating an HMAC

I’ve been writing a bit of code to integrate with a ClassMarker webhook, and you’re required to verify that an incoming request actually came from ClassMarker by checking the value of a base64-encoded HMAC SHA256 hash.

The example in the documentation is written in PHP, which I haven’t written for about 10 years, so I had to figure out how to do the same thing in Python.

This is the PHP version:

$ php -a
php > echo base64_encode(hash_hmac("sha256", "my data", "my_secret", true));
vyniKpNSlxu4AfTgSJImt+j+pRx7v6m+YBobfKsoGhE=

The Python equivalent is a bit more code but it’s not too bad.

Import all the libraries

import hmac
import hashlib
import base64

Generate that hash

data = "my data".encode("utf-8")
digest = hmac.new(b"my_secret", data, digestmod=hashlib.sha256).digest()
 
print(base64.b64encode(digest).decode())
vyniKpNSlxu4AfTgSJImt+j+pRx7v6m+YBobfKsoGhE=

We’re getting the same value as the PHP version so it’s good times all round.
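
To actually verify an incoming webhook we’d compare the value we compute against the signature that ClassMarker sends with the request. A minimal sketch, assuming the incoming signature has already been pulled out of the request into provided_signature (check ClassMarker’s docs for the exact header), using hmac.compare_digest for a constant-time comparison:

def verify(payload, provided_signature, secret):
    digest = hmac.new(secret, payload, digestmod=hashlib.sha256).digest()
    expected = base64.b64encode(digest).decode()
    # compare in constant time to guard against timing attacks
    return hmac.compare_digest(expected, provided_signature)

print(verify(b"my data", "vyniKpNSlxu4AfTgSJImt+j+pRx7v6m+YBobfKsoGhE=", b"my_secret"))
True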

Written by Mark Needham

August 2nd, 2017 at 6:09 am

Posted in Python

Docker: Building custom Neo4j images on Mac OS X

I sometimes need to create custom Neo4j Docker images to try things out and wanted to share my workflow, mostly for future Mark but also in case it’s useful to someone else.

There’s already a docker-neo4j repository so we’ll just tweak the files in there to achieve what we want.

$ git clone git@github.com:neo4j/docker-neo4j.git
$ cd docker-neo4j

If we want to build a Docker image for Neo4j Enterprise Edition we can run the following build target:

$ make clean build-enterprise
Makefile:9: *** This Make does not support .RECIPEPREFIX. Please use GNU Make 4.0 or later.  Stop.

Denied at the first hurdle! What version of make have we got on this machine?

$ make --version
GNU Make 3.81
Copyright (C) 2006  Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
 
This program built for i386-apple-darwin11.3.0

We can sort that out by installing a newer version using brew:

$ brew install make
$ gmake --version
GNU Make 4.2.1
Built for x86_64-apple-darwin15.6.0
Copyright (C) 1988-2016 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.

That’s more like it! brew installs make with the ‘g’ prefix, and since I’m not sure whether anything else on my system relies on the older version of make, I won’t bother changing the symlink.

Let’s retry our original command:

$ gmake clean build-enterprise
Makefile:14: *** NEO4J_VERSION is not set.  Stop.

It’s still not happy with us! Let’s set that environment variable to the latest released version as of writing:

$ export NEO4J_VERSION="3.2.2"
$ gmake clean build-enterprise
...
Successfully built c16b6f2738de
Successfully tagged test/18334:latest
Neo4j 3.2.2-enterprise available as: test/18334

We can see that image in Docker land by running the following command:

$ docker images | head -n2
REPOSITORY                                     TAG                          IMAGE ID            CREATED             SIZE
test/18334                                     latest                       c16b6f2738de        4 minutes ago       303MB

If I wanted to deploy that image to my own Docker Hub I could run the following commands:

$ docker login --username=markhneedham
$ docker tag c16b6f2738de markhneedham/neo4j:3.2.2
$ docker push markhneedham/neo4j

Putting Neo4j Enterprise 3.2.2 on my Docker Hub isn’t very interesting though – that version is already on the official Neo4j Docker Hub.

I’ve actually been building versions of Neo4j against the HEAD of the Neo4j 3.2 branch (i.e. 3.2.3-SNAPSHOT), deploying those to S3, and then building a Docker image based on those archives.

To change the location that the Neo4j artifact is downloaded from, we need to tweak this line in the Makefile:

$ git diff Makefile
diff --git a/Makefile b/Makefile
index c77ed1f..98e05ca 100644
--- a/Makefile
+++ b/Makefile
@@ -15,7 +15,7 @@ ifndef NEO4J_VERSION
 endif
 
 tarball = neo4j-$(1)-$(2)-unix.tar.gz
-dist_site := http://dist.neo4j.org
+dist_site := https://s3-eu-west-1.amazonaws.com/core-edge.neotechnology.com/20170726
 series := $(shell echo "$(NEO4J_VERSION)" | sed -E 's/^([0-9]+\.[0-9]+)\..*/\1/')
 
 all: out/enterprise/.sentinel out/community/.sentinel

We can then update the Neo4j version environment variable:

$ export NEO4J_VERSION="3.2.3-SNAPSHOT"

And then repeat the Docker commands above. You’ll need to sub in your own Docker Hub user and repository names.

I’m using these custom images as part of Kubernetes deployments but you can use them anywhere that accepts a Docker container.

If anything in the post didn’t make sense or you want more clarification, let me know @markhneedham.

Written by Mark Needham

July 26th, 2017 at 10:20 pm

Posted in neo4j

Pandas: ValueError: The truth value of a Series is ambiguous.

I’ve been playing around with Kaggle in my spare time over the last few weeks and came across an unexpected behaviour when trying to add a column to a dataframe.

First let’s get pandas into our program scope:

Prerequisites

import pandas as pd

Now we’ll create a data frame to play with for the duration of this post:

>>> df = pd.DataFrame({"a": [1,2,3,4,5], "b": [2,3,4,5,6]})
>>> df
   a  b
0  1  2
1  2  3
2  3  4
3  4  5
4  5  6

Let’s say we want to create a new column which returns True if either of the numbers is odd, and False if not.

Since every row contains exactly one odd number, we’d expect to see a column full of True values, so let’s get started.

>>> divmod(df["a"], 2)[1] > 0
0     True
1    False
2     True
3    False
4     True
Name: a, dtype: bool
 
>>> divmod(df["b"], 2)[1] > 0
0    False
1     True
2    False
3     True
4    False
Name: b, dtype: bool

So far so good. Now let’s combine those two calculations together and create a new column in our data frame:

>>> df["anyOdd"] = (divmod(df["a"], 2)[1] > 0) or (divmod(df["b"], 2)[1] > 0)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/markneedham/projects/kaggle/house-prices/a/lib/python3.6/site-packages/pandas/core/generic.py", line 953, in __nonzero__
    .format(self.__class__.__name__))
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().

Hmmm, that was unexpected! Unfortunately Python’s or and and operators don’t work with pandas Series – as the error says, the truth value of a whole Series is ambiguous – so instead we need to use the bitwise or (|) and and (&) operators.

Let’s update our example:

>>> df["anyOdd"] = (divmod(df["a"], 2)[1] > 0) | (divmod(df["b"], 2)[1] > 0)
>>> df
   a  b  anyOdd
0  1  2    True
1  2  3    True
2  3  4    True
3  4  5    True
4  5  6    True

Much better. And what about if we wanted to check if both values are odd?

>>> df["bothOdd"] = (divmod(df["a"], 2)[1] > 0) & (divmod(df["b"], 2)[1] > 0)
>>> df
   a  b  anyOdd  bothOdd
0  1  2    True    False
1  2  3    True    False
2  3  4    True    False
3  4  5    True    False
4  5  6    True    False

Works exactly as expected, hoorah!
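
As an aside, we can get the same answers without divmod by chaining pandas’ built-in methods – a quick sketch equivalent to the expressions above:

>>> df[["a", "b"]].mod(2).gt(0).any(axis=1)
0    True
1    True
2    True
3    True
4    True
dtype: bool

Swapping any for all(axis=1) gives us the bothOdd column.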

Written by Mark Needham

July 26th, 2017 at 9:41 pm

Posted in Data Science

Pandas/scikit-learn: get_dummies test/train sets – ValueError: shapes not aligned

I’ve been using pandas’ get_dummies function to generate dummy columns for categorical variables to use with scikit-learn, but noticed that it sometimes doesn’t work as I expect.

Prerequisites

import pandas as pd
import numpy as np
from sklearn import linear_model

Let’s say we have the following training and test sets:

Training set

train = pd.DataFrame({"letter":["A", "B", "C", "D"], "value": [1, 2, 3, 4]})
X_train = train.drop(["value"], axis=1)
X_train = pd.get_dummies(X_train)
y_train = train["value"]

Test set

test = pd.DataFrame({"letter":["D", "D", "B", "E"], "value": [4, 5, 7, 19]})
X_test = test.drop(["value"], axis=1)
X_test = pd.get_dummies(X_test)
y_test = test["value"]

Now say we want to train a linear model on our training set and then use it to predict the values in our test set:

Train the model

lr = linear_model.LinearRegression()
model = lr.fit(X_train, y_train)

Test the model

model.score(X_test, y_test)
ValueError: shapes (4,3) and (4,) not aligned: 3 (dim 1) != 4 (dim 0)

Hmmm that didn’t go to plan. If we print X_train and X_test it might help shed some light:

Checking the train/test datasets

print(X_train)
   letter_A  letter_B  letter_C  letter_D
0         1         0         0         0
1         0         1         0         0
2         0         0         1         0
3         0         0         0         1
print(X_test)
   letter_B  letter_D  letter_E
0         0         1         0
1         0         1         0
2         1         0         0
3         0         0         1

They do indeed have different shapes and some different column names because the test set contained some values that weren’t present in the training set.
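
One quick workaround, as a sketch, is to align the test dummies with the training columns, filling any missing ones with 0:

X_test_aligned = X_test.reindex(columns=X_train.columns, fill_value=0)

That gets the shapes to match, but it silently drops test-only categories like letter_E, so a more robust fix is to tell pandas up front about the full set of values.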

We can fix this by making the ‘letter’ field categorical before we run the get_dummies method over the dataframe. At the moment the field is of type ‘object’:

Column types

train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4 entries, 0 to 3
Data columns (total 2 columns):
letter    4 non-null object
value     4 non-null int64
dtypes: int64(1), object(1)
memory usage: 144.0+ bytes

We can fix this by converting the ‘letter’ field to the type ‘category’ and setting the list of allowed values to be the unique set of values in the train/test sets.

All allowed values

all_data = pd.concat((train,test))
for column in all_data.select_dtypes(include=[np.object]).columns:
    print(column, all_data[column].unique())
letter ['A' 'B' 'C' 'D' 'E']

Now let’s update the type of our ‘letter’ field in the train and test dataframes.

Type: ‘category’

all_data = pd.concat((train,test))
 
for column in all_data.select_dtypes(include=[np.object]).columns:
    train[column] = train[column].astype('category', categories = all_data[column].unique())
    test[column] = test[column].astype('category', categories = all_data[column].unique())
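
Note: newer versions of pandas removed the categories argument from astype. If the code above complains, the equivalent – as a sketch – is to build a CategoricalDtype and pass that in:

from pandas.api.types import CategoricalDtype

for column in all_data.select_dtypes(include=[np.object]).columns:
    letter_type = CategoricalDtype(categories=all_data[column].unique())
    train[column] = train[column].astype(letter_type)
    test[column] = test[column].astype(letter_type)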

And now if we call get_dummies on either dataframe we’ll get the same set of columns:

get_dummies: Take 2

X_train = train.drop(["value"], axis=1)
X_train = pd.get_dummies(X_train)
print(X_train)
   letter_A  letter_B  letter_C  letter_D  letter_E
0         1         0         0         0         0
1         0         1         0         0         0
2         0         0         1         0         0
3         0         0         0         1         0
X_test = test.drop(["value"], axis=1)
X_test = pd.get_dummies(X_test)
print(X_test)
   letter_A  letter_B  letter_C  letter_D  letter_E
0         0         0         0         1         0
1         0         0         0         1         0
2         0         1         0         0         0
3         0         0         0         0         1

Great! Now we should be able to train our model and use it against the test set:

Train the model: Take 2

lr = linear_model.LinearRegression()
model = lr.fit(X_train, y_train)

Test the model: Take 2

model.score(X_test, y_test)
-1.0604490500863557

And we’re done!

Written by Mark Needham

July 5th, 2017 at 3:42 pm

Posted in Python

Pandas: Find rows where column/field is null

In my continued playing around with the Kaggle house prices dataset I wanted to find any columns/fields that contain null values.

If we want to get a count of the number of null fields by column we can use the following code, adapted from Poonam Ligade’s kernel:

Prerequisites

import pandas as pd

Count the null columns

train = pd.read_csv("train.csv")
null_columns=train.columns[train.isnull().any()]
train[null_columns].isnull().sum()
LotFrontage      259
Alley           1369
MasVnrType         8
MasVnrArea         8
BsmtQual          37
BsmtCond          37
BsmtExposure      38
BsmtFinType1      37
BsmtFinType2      38
Electrical         1
FireplaceQu      690
GarageType        81
GarageYrBlt       81
GarageFinish      81
GarageQual        81
GarageCond        81
PoolQC          1453
Fence           1179
MiscFeature     1406
dtype: int64
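
If percentages are easier to eyeball than raw counts, we can divide by the number of rows – a quick sketch:

print(train[null_columns].isnull().sum() / len(train) * 100)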

So there are lots of different columns containing null values. What if we want to find the solitary row which has ‘Electrical’ as null?

Single column is null

print(train[train["Electrical"].isnull()][null_columns])
      LotFrontage Alley MasVnrType  MasVnrArea BsmtQual BsmtCond BsmtExposure  \
1379         73.0   NaN       None         0.0       Gd       TA           No   
 
     BsmtFinType1 BsmtFinType2 Electrical FireplaceQu GarageType  GarageYrBlt  \
1379          Unf          Unf        NaN         NaN    BuiltIn       2007.0   
 
     GarageFinish GarageQual GarageCond PoolQC Fence MiscFeature  
1379          Fin         TA         TA    NaN   NaN         NaN

And what if we want to return every row that contains at least one null value? That’s not too difficult – it’s just a combination of the code in the previous two sections:

All null columns

print(train[train.isnull().any(axis=1)][null_columns].head())
   LotFrontage Alley MasVnrType  MasVnrArea BsmtQual BsmtCond BsmtExposure  \
0         65.0   NaN    BrkFace       196.0       Gd       TA           No   
1         80.0   NaN       None         0.0       Gd       TA           Gd   
2         68.0   NaN    BrkFace       162.0       Gd       TA           Mn   
3         60.0   NaN       None         0.0       TA       Gd           No   
4         84.0   NaN    BrkFace       350.0       Gd       TA           Av   
 
  BsmtFinType1 BsmtFinType2 Electrical FireplaceQu GarageType  GarageYrBlt  \
0          GLQ          Unf      SBrkr         NaN     Attchd       2003.0   
1          ALQ          Unf      SBrkr          TA     Attchd       1976.0   
2          GLQ          Unf      SBrkr          TA     Attchd       2001.0   
3          ALQ          Unf      SBrkr          Gd     Detchd       1998.0   
4          GLQ          Unf      SBrkr          TA     Attchd       2000.0   
 
  GarageFinish GarageQual GarageCond PoolQC Fence MiscFeature  
0          RFn         TA         TA    NaN   NaN         NaN  
1          RFn         TA         TA    NaN   NaN         NaN  
2          RFn         TA         TA    NaN   NaN         NaN  
3          Unf         TA         TA    NaN   NaN         NaN  
4          RFn         TA         TA    NaN   NaN         NaN
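
And if we just want to know how many rows contain at least one null value, we can sum the boolean mask – another quick sketch:

print(train.isnull().any(axis=1).sum())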

Hope that helps future Mark!

Written by Mark Needham

July 5th, 2017 at 2:31 pm

Posted in Python

Shell: Create a comma separated string

I recently needed to generate a string with comma separated values, based on iterating a range of numbers.

e.g. we should get the following output where n = 3

foo-0,foo-1,foo-2

I only had the shell available to me so I couldn’t shell out into Python or Ruby for example. That means it’s bash scripting time!

If we want to iterate a range of numbers and print them out on the screen we can write the following code:

n=3
for i in $(seq 0 $(($n > 0 ? $n-1: 0))); do
  echo "foo-$i"
done
 
foo-0
foo-1
foo-2

Combining them into a string is a bit trickier, but luckily I found a great blog post by Andreas Haupt which shows what to do. Andreas is solving a more complicated problem than mine, but these are the bits of code that we need from the post.

n=3
combined=""
 
for i in $(seq 0 $(($n > 0 ? $n-1: 0))); do
  token="foo-$i"
  combined="${combined}${combined:+,}$token"
done
echo $combined
 
foo-0,foo-1,foo-2

This won’t work if you set n<0 but that’s ok for me! I’ll let Andreas explain how it works:

  • ${combined:+,} will return either a comma (if combined exists and is set) or nothing at all.
  • In the first invocation of the loop combined is not yet set and nothing is put out.
  • In the next rounds combined is set and a comma will be put out.

We can see it in action by printing out the value of $combined after each iteration of the loop:

n=3
combined=""
 
for i in $(seq 0 $(($n > 0 ? $n-1: 0))); do 
  token="foo-$i"
  combined="${combined}${combined:+,}$token"
  echo $combined
done
 
foo-0
foo-0,foo-1
foo-0,foo-1,foo-2

Looks good to me!

Written by Mark Needham

June 23rd, 2017 at 12:26 pm

Posted in Shell Scripting

scikit-learn: Random forests – Feature Importance

As I mentioned in a blog post a couple of weeks ago, I’ve been playing around with the Kaggle House Prices competition and the most recent thing I tried was training a random forest regressor.

Unfortunately, although it gave me better results locally, it got a worse score on the unseen data, which I figured meant I’d overfitted the model.

I wasn’t really sure how to work out if that theory was true or not, but by chance I was reading Chris Albon’s blog and found a post where he explains how to inspect the importance of every feature in a random forest. Just what I needed!

Stealing from Chris’ post I wrote the following code to work out the feature importance for my dataset:

Prerequisites

import numpy as np
import pandas as pd
 
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
 
# We'll use this library to make the display pretty
from tabulate import tabulate

Load Data

train = pd.read_csv('train.csv')
 
# the model can only handle numeric values so filter out the rest
data = train.select_dtypes(include=[np.number]).interpolate().dropna()

Split train/test sets

y = train.SalePrice
X = data.drop(["SalePrice", "Id"], axis=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=.33)

Train model

clf = RandomForestRegressor(n_jobs=2, n_estimators=1000)
model = clf.fit(X_train, y_train)

Feature Importance

headers = ["name", "score"]
values = sorted(zip(X_train.columns, model.feature_importances_), key=lambda x: x[1] * -1)
print(tabulate(values, headers, tablefmt="plain"))
name           score
OverallQual    0.553829
GrLivArea      0.131
BsmtFinSF1     0.0374779
TotalBsmtSF    0.0372076
1stFlrSF       0.0321814
GarageCars     0.0226189
GarageArea     0.0215719
LotArea        0.0214979
YearBuilt      0.0184556
2ndFlrSF       0.0127248
YearRemodAdd   0.0126581
WoodDeckSF     0.0108077
OpenPorchSF    0.00945239
LotFrontage    0.00873811
TotRmsAbvGrd   0.00803121
GarageYrBlt    0.00760442
BsmtUnfSF      0.00715158
MasVnrArea     0.00680341
ScreenPorch    0.00618797
Fireplaces     0.00521741
OverallCond    0.00487722
MoSold         0.00461165
MSSubClass     0.00458496
BedroomAbvGr   0.00253031
FullBath       0.0024245
YrSold         0.00211638
HalfBath       0.0014954
KitchenAbvGr   0.00140786
BsmtFullBath   0.00137335
BsmtFinSF2     0.00107147
EnclosedPorch  0.000951266
3SsnPorch      0.000501238
PoolArea       0.000261668
LowQualFinSF   0.000241304
BsmtHalfBath   0.000179506
MiscVal        0.000154799

So OverallQual is quite a good predictor but then there’s a steep fall to GrLivArea before things really tail off after WoodDeckSF.

I think this is telling us that a lot of these features aren’t useful at all and can be removed from the model. There are also a bunch of categorical/factor variables that have been stripped out of the model but might be predictive of the house price.
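
As a rough sketch of that pruning, we could keep only the features whose importance clears some cutoff – the 0.01 here is an arbitrary value I’ve picked for illustration:

# keep only features whose importance exceeds an arbitrary cutoff
threshold = 0.01
keep = [name for name, score in zip(X_train.columns, model.feature_importances_)
        if score > threshold]
X_train_pruned = X_train[keep]
X_test_pruned = X_test[keep]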

These are the next things I’m going to explore:

  • Make the categorical variables numeric (perhaps by using one hot encoding for some of them)
  • Remove the most predictive features and build a model that only uses the other features

Written by Mark Needham

June 16th, 2017 at 5:55 am

Kubernetes: Which node is a pod on?

When running Kubernetes on a cloud provider, rather than locally using minikube, it’s useful to know which node a pod is running on.

The normal command to list pods doesn’t contain this information:

$ kubectl get pod
NAME           READY     STATUS    RESTARTS   AGE       
neo4j-core-0   1/1       Running   0          6m        
neo4j-core-1   1/1       Running   0          6m        
neo4j-core-2   1/1       Running   0          2m

I spent a while searching for a command that I could use before I came across Ta-Ching Chen’s blog post while looking for something else.

Ta-Ching points out that we just need to add the flag -o wide to our original command to get the information we require:

$ kubectl get pod -o wide
NAME           READY     STATUS    RESTARTS   AGE       IP           NODE
neo4j-core-0   1/1       Running   0          6m        10.32.3.6    gke-neo4j-cluster-default-pool-ded394fa-0kpw
neo4j-core-1   1/1       Running   0          6m        10.32.3.7    gke-neo4j-cluster-default-pool-ded394fa-0kpw
neo4j-core-2   1/1       Running   0          2m        10.32.0.10   gke-neo4j-cluster-default-pool-ded394fa-kp68

Easy!

Written by Mark Needham

June 14th, 2017 at 8:49 am

Posted in Kubernetes

Kaggle: House Prices: Advanced Regression Techniques – Trying to fill in missing values

I’ve been playing around with the data in Kaggle’s House Prices: Advanced Regression Techniques and while replicating Poonam Ligade’s exploratory analysis I wanted to see if I could create a model to fill in some of the missing values.

Poonam wrote the following code to identify which columns in the dataset had the most missing values:

import pandas as pd
train = pd.read_csv('train.csv')
null_columns=train.columns[train.isnull().any()]
 
>>> print(train[null_columns].isnull().sum())
LotFrontage      259
Alley           1369
MasVnrType         8
MasVnrArea         8
BsmtQual          37
BsmtCond          37
BsmtExposure      38
BsmtFinType1      37
BsmtFinType2      38
Electrical         1
FireplaceQu      690
GarageType        81
GarageYrBlt       81
GarageFinish      81
GarageQual        81
GarageCond        81
PoolQC          1453
Fence           1179
MiscFeature     1406
dtype: int64

The one that I’m most interested in is LotFrontage, which describes ‘Linear feet of street connected to property’. There are a few other columns related to lots so I thought I might be able to use them to fill in the missing LotFrontage values.

We can write the following code to find a selection of the rows missing a LotFrontage value:

cols = [col for col in train.columns if col.startswith("Lot")]
missing_frontage = train[cols][train["LotFrontage"].isnull()]
 
>>> print(missing_frontage.head())
    LotFrontage  LotArea LotShape LotConfig
7           NaN    10382      IR1    Corner
12          NaN    12968      IR2    Inside
14          NaN    10920      IR1    Corner
16          NaN    11241      IR1   CulDSac
24          NaN     8246      IR1    Inside

I want to use scikit-learn’s linear regression model, which only works with numeric values, so we need to convert our categorical variables into numeric equivalents. We can use pandas’ get_dummies function for this.

Let’s try it out on the LotShape column:

sub_train = train[train.LotFrontage.notnull()]
dummies = pd.get_dummies(sub_train[cols].LotShape)
 
>>> print(dummies.head())
   IR1  IR2  IR3  Reg
0    0    0    0    1
1    0    0    0    1
2    1    0    0    0
3    1    0    0    0
4    1    0    0    0

Cool, that looks good. We can do the same with LotConfig, and then we need to add these new columns onto the original DataFrame. We can use pandas’ concat function to do this.

import numpy as np
 
data = pd.concat([
        sub_train[cols],
        pd.get_dummies(sub_train[cols].LotShape),
        pd.get_dummies(sub_train[cols].LotConfig)
    ], axis=1).select_dtypes(include=[np.number])
 
>>> print(data.head())
   LotFrontage  LotArea  IR1  IR2  IR3  Reg  Corner  CulDSac  FR2  FR3  Inside
0         65.0     8450    0    0    0    1       0        0    0    0       1
1         80.0     9600    0    0    0    1       0        0    1    0       0
2         68.0    11250    1    0    0    0       0        0    0    0       1
3         60.0     9550    1    0    0    0       1        0    0    0       0
4         84.0    14260    1    0    0    0       0        0    1    0       0

We can now split data into train and test sets and create a model.

from sklearn import linear_model
from sklearn.model_selection import train_test_split
 
X = data.drop(["LotFrontage"], axis=1)
y = data.LotFrontage
 
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42, test_size=.33)
 
lr = linear_model.LinearRegression()
 
model = lr.fit(X_train, y_train)

Now it’s time to give it a try on the test set:

>>> print("R^2 is: \n", model.score(X_test, y_test))
R^2 is: 
 -0.84137438493

Hmm that didn’t work too well – an R^2 score of less than 0 suggests that we’d be better off just predicting the average LotFrontage regardless of any of the other features. We can confirm that with the following code:

from sklearn.metrics import r2_score
 
>>> print(r2_score(y_test, np.repeat(y_test.mean(), len(y_test))))
0.0

whereas if we predicted every value exactly we’d get a score of 1:

>>> print(r2_score(y_test, y_test))
1.0
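
For reference, r2_score is computing 1 minus the ratio of the residual sum of squares to the total sum of squares, which we can reproduce by hand:

y_pred = model.predict(X_test)
ss_res = ((y_test - y_pred) ** 2).sum()
ss_tot = ((y_test - y_test.mean()) ** 2).sum()

print(1 - ss_res / ss_tot)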

In summary, not a very successful experiment. Poonam derives a value for LotFrontage based on the square root of LotArea so perhaps that’s the best we can do here.

Written by Mark Needham

June 4th, 2017 at 9:22 am

Posted in Data Science, Python

GraphQL-Europe: A trip to Berlin

Last weekend my colleagues Will, Michael, Oskar, and I went to Berlin to spend Sunday at the GraphQL Europe conference.

Neo4j sponsored the conference as we’ve been experimenting with building a GraphQL to Neo4j integration and wanted to get some feedback from the community as well as learn what’s going on in GraphQL land.

Will and Michael have written about their experience where they talk more about the hackathon we hosted so I’ll cover it more from a personal perspective.

The first thing that stood out for me was how busy it was – I knew GraphQL was pretty hipster but I wasn’t expecting there to be ~300 attendees.

The venue was amazing – the nHow Hotel is located right next to the Spree River so there were great views to be had during the breaks. It also helped that it was really sunny for the whole day!

I spent most of the day hanging out at the Neo4j booth, which was good fun – several people pointed out that an integration between Neo4j and GraphQL made a lot of sense, given that GraphQL talks about the application graph and Neo4j about graphs in general.

I managed to attend a few of the talks, including one by Brooks Swinnerton from GitHub who announced that they’d be moving to GraphQL for v4 of their API.

The most interesting part of the talk for me was when Brooks said they’d directed requests for their REST API to the GraphQL one behind the scenes for a while now to check that it could handle the load.

GitHub is moving to GraphQL for v4 of our API because it offers significantly more flexibility for our integrators. The ability to define precisely the data you want—and only the data you want—is a powerful advantage over the REST API v3 endpoints.

I think Twitter may be doing something similar, based on a tweet by Tom Ashworth.

From what I could tell the early pick up of GraphQL seems to be from the front end of applications – several of the attendees had attended ReactEurope a couple of days earlier – but micro services were mentioned in a few of the talks and it was suggested that GraphQL works well in this world as well.

It was a fun day out so thanks to the folks at Graphcool for organising!

Written by Mark Needham

May 27th, 2017 at 11:31 am

Posted in Conferences
