22 Aug 2013

Model to answer your questions rather than modelling reality

On the recommendation of Ian Robinson I’ve been reading the 2nd edition of William Kent’s 'Data and Reality'. The author makes an observation at the end of the first chapter which resonated with me:

Once more: we are not modelling reality, but the way information about reality is processed, by people.

It reminds me of similar advice in Eric Evans' Domain-Driven Design, and I believe it’s helpful advice when designing a model for a graph database.

Last year I wrote a post explaining an approach of defining the questions that I wanted to ask before modelling my data. In Neo4j we can do this by writing Cypher queries up front.

We can then grow our data set in different ways to check that our queries still perform well, and tweak our model if necessary.

For example, one simple optimisation would be to run an offline query to make implicit relationships explicit, so that later queries can follow a single relationship rather than traversing a longer path.
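To make that optimisation concrete, here is a minimal sketch in Python rather than Cypher, using an in-memory stand-in for the graph. The data, the WORKED_ON and WORKED_WITH relationship names, and the function name are all hypothetical and only there for illustration. The idea is that two people who worked on the same project are implicitly connected, and an offline pass can store that connection directly.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical data: (person, project) pairs standing in for
# WORKED_ON relationships in a graph.
worked_on = [
    ("alice", "graph-import"),
    ("bob", "graph-import"),
    ("carol", "search"),
    ("alice", "search"),
]


def materialise_worked_with(worked_on):
    """Offline pass: derive explicit WORKED_WITH pairs from shared projects."""
    # Group people by the project they worked on.
    people_by_project = defaultdict(set)
    for person, project in worked_on:
        people_by_project[project].add(person)

    # Every pair of people on the same project gets an explicit,
    # symmetric WORKED_WITH entry.
    worked_with = defaultdict(set)
    for people in people_by_project.values():
        for a, b in combinations(sorted(people), 2):
            worked_with[a].add(b)
            worked_with[b].add(a)
    return dict(worked_with)


worked_with = materialise_worked_with(worked_on)

# The question "who has alice worked with?" is now a single lookup
# instead of a person -> project -> person traversal.
print(sorted(worked_with["alice"]))  # ['bob', 'carol']
```

The trade-off is that the derived relationships have to be refreshed when the underlying data changes, which is why this kind of step tends to run offline.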

Although graphs are very whiteboard friendly and it can be tempting to design our whole model before writing any queries, this often causes problems later on.

When we eventually get to asking questions of our data, we may find that we’ve modelled some things unnecessarily or have designed the model in a way that leads to inefficient queries.

I’ve found that an effective approach is to keep the feedback loop tight by minimising the time between drawing parts of our model on a whiteboard and writing queries against them.

If you’re interested in learning more, Ian has a slide deck from a talk he did at JAX 2013 which covers this idea, among others, for building graph database applications.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.