31 Jan 2013

Levels of automation

Over the last 18 months or so I’ve worked on a variety of different projects in different organisations and seen some patterns around the way that automation was done which I thought would be interesting to document.

The approaches tend to fall into roughly three categories:

Predominantly Manual

This tends to be less frequent these days as most developers have at some stage flicked through The Pragmatic Programmer and been persuaded that automating away boring and repetitive tasks is probably a good idea.

It is still possible to drift into this mode when time pressured and addressing a problem for the first time.

The general approach I’ve seen is to first address the crisis with a manual solution and then automate it so they don’t have to handle it manually the next time.

There can be a tendency to think that someone is a one off and that the effort it would take to automate it isn’t worthwhile. In my experience if something goes wrong once it’s probably going to happen again at some stage when you least want it to!

Partial Automation

This is the stage where you have a few automation scripts for common tasks but there’s still some human involvement.

It might be that a developer needs to remember which machine to run the script against or which arguments to pass to the script but we’re not quite at the 'click a button' stage.

Interestingly when we have partial automation it can still feel as if we’re solving problems manually because there’s still a degree of human interaction involved.

An example of this might be that we’ve fully automated the deployment of a service but manually take a server out of the load balancer, deploy to it, check that it passes a smoke test and then put it back into the load balancer.

The main task is automated but we still need to babysit the deployment rather than being able to fire it off and then leave it running in the background while we focus on something else.

Full Automation

At the fully automated stage we’ve abstracted away the complexity around the tasks we’re automating and may now be at the stage where it’s possible for a non developer to use the tools we’ve written.

We fully automated the earlier service example by making use of Fabric and Boto to get a collection of the appropriate AWS instances and then automated the removing/adding of them from the load balancer.

We also wrote a simple smoke test which used curl to check that the main route returned a 200 response code and aborted the deployment if that wasn’t the case.

Another example of full automation is creating a dashboard to glue all the different parts together.

For example we recently spent a couple of hours creating an interface on top of a bunch of Resque queues. We had previously got in a situation where we knew there were jobs stuck somewhere but didn’t know which node they were on because we have queues spread across 12 machines. This makes it easier.

I have a tendency to want to automate absolutely everything and it can be tricky to tell when you’re overdoing it although it’s equally hard to know when you’re undergoing it!

It’s nearly always initially faster manually if you know what you’re doing. but in the long run you may just be spending time doing something that a computer can help with.

One suggestion was to keep a tally chart of how often you end up doing something manually and then if it reaches a certain threshold put some automation in place.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.