16 Aug 2009

Impersonators: Why do we need them?

I wrote previously about an impersonator we are using on my project which Martin Fowler has dubbed the 'self initializing fake' and although I thought this was the only type of situation where we might use this approach, from discussing this with my colleague Julio Maia and from experiences on the project I’m working on I realise there are other advantages to this approach as well.

To deal with unstable/slow integration points

This is the main reason that we use the self initializing fake and provides perhaps the most obvious reason why we might create an impersonator because we will remain in pain if we don’t create one.

Before we used it all our acceptance tests went from the UI all the way through to the back end via a service layer which often led to tests failing due to timeouts (the service took longer than expected to respond) or problems somewhere in the back end.

These 'random build failures' were amazingly frustrating since it meant that maybe more than 50% of the time that our continuous integration build light was red it wasn’t actually related to anything that the person who checked in had done.

The confidence in the build light had drained and 'red builds' were pretty much being ignored.

To speed up the feedback cycle

A nice side effect that we got from putting in an impersonator is that our tests run about 5x more quickly than they did previously due to the fact that the impersonator is on a machine in Sydney whereas the actual end point is on a machine in Melbourne and because the impersonator just returns the same response for a given request each time instead of having to recalculate each time.

It is quite common on projects to use an in memory database instead of the real database for integration tests to allow these tests to run more quickly and I think this logic works well when applied to other systems we need to integrate against as well.

In general if we can have as much of the system as possible under our control in the early parts of our build pipeline then we can have a continuous integration build which gives us very quick feedback and only fails due to problems we created.

To handle not yet ready integration points early

Although we would rather do integration early on in a project so that we can discover any problems and then fix them, sometimes the way the project has been planned means that a system that we need to integrate won’t actually be ready until much later on.

An example of this which I’ve seen a couple of times is integrating applications with single sign on systems.

Even though we don’t know exactly how we will integrate against the real system we might be able to take a reasonably good guess of how it might work e.g. our application is able to detect whether a user is logged in based on some information set in the headers of each request by an ISAPI filter which each request passes through before reaching our application.

On this occasion we didn’t developer an impersonator for this integration point until after we already had the real end point but I think it would probably have made life easier if we had.

Impersonators and Continuous Integration

I’m sure there are more reasons why impersonators make sense but these are the ones that we’ve noticed so far.

The key with all these ways of impersonating an integration point is that they should only form part of our continuous integration pipeline -we do still need to test that our code works in a proper end to end test as well.

In our case we have another stage of our Cruise pipeline which runs all the tests against the real end points.

I’m curious as to what our pipeline should look like when we have built multiple impersonators for different end points - my current thinking is that we can have an early stage of the pipeline where we make use of all of the impersonators and then another stage straight after that where we use all the real end points.

However, it does seem like a bit of a jump to go from all impersonators to no impersonators and it increases the likelihood that the build is going to fail since we are going straight from minimal integration to having everything integrated.

I guess another approach would be to gradually introduce a real integration point on each part of the pipeline until we have everything integrated on the last one although this might actually result in our feedback being slower if its the last integration point which gives us problem.

Julio has written more about the impersonator pattern on his blog.

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.