· neo4j

Neo4j: What is a node?

One of the first things I needed to learn when I started using Neo4j was how to model my domain using nodes and relationships and it wasn’t initially obvious to me what things should be nodes.

Luckily Ian Robinson showed me a mini-algorithm which I found helpful for getting started. The steps are as follows:

  1. Write out the questions you want to ask

  2. Highlight/underline the nouns

  3. Those are your nodes!

This is reasonably similar to the way that we work out what our objects should be when we’re doing OO modelling and I thought I’d give it a try on some of the data sets that I’ve worked with recently:

  • Female friends of friends that somebody could go out with

  • Goals scored by Arsenal players in a particular season

  • Colleagues who have similar skills to me

  • Episodes of a TV program that a particular actor appeared in

  • Customers who would be affected if a piece of equipment went in for repair

If you’re like me and aren’t that great at English grammar we can always cheat and get NLTK to help us out:

>>> nltk.pos_tag(nltk.word_tokenize("Female friends of friends that somebody could go out with"))
[('Female', 'NNP'), ('friends', 'NNS'), ('of', 'IN'), ('friends', 'NNS'), ('that', 'WDT'), ('somebody', 'NN'), ('could', 'MD'), ('go', 'VB'), ('out', 'RP'), ('with', 'IN')]

That tells us the likely tag for each part of speech in the sentence and we can filter the resulting list so we only see nouns like this:

>>> nouns = ['NNS', 'NN', 'NP', 'NNP']
>>> [(word, grammar) for (word, grammar) in nltk.pos_tag(nltk.word_tokenize("Female friends of friends that somebody could go out with")) if grammar in nouns]
[('Female', 'NNP'), ('friends', 'NNS'), ('friends', 'NNS'), ('somebody', 'NN')]

We can ignore the 'Female' in this sentence (I think it’s been picked up as a proper noun because of the capitalisation) which leaves us with 'friends' and 'somebody' In both cases these nouns represent the concept of a person so we’d want to create nodes representing people in this domain.

Let’s see how NLTK gets on with our second question:

>>> sentence = "Goals scored by Arsenal players in a particular season"
>>> [(word, grammar) for (word, grammar) in nltk.pos_tag(nltk.word_tokenize(sentence)) if grammar in ['NNS', 'NN', 'NP']]
[('Goals', 'NNS'), ('Arsenal', 'NNP'), ('players', 'NNS'), ('season', 'NN')]

In this case we’d have goals, teams (e.g. Arsenal), players and seasons as our nodes.

Although this is a very rough algorithm for working out what things should be nodes in a graph I think it’s a good way to get started.

After that the other queries we want to write may lead us to change the model to solve our problem even better.

  • LinkedIn
  • Tumblr
  • Reddit
  • Google+
  • Pinterest
  • Pocket