Mark Needham

Thoughts on Software Development

Archive for the ‘Software Development’ tag

The Affect Heuristic

In my continued reading of Daniel Kahneman’s Thinking, Fast and Slow I’ve reached the section on the affect heuristic, which seems particularly applicable to the technical decisions that we make.

The dominance of conclusions over arguments is most pronounced where emotions are involved. The psychologist Paul Slovic has proposed an affect heuristic in which people let their likes and dislikes determine their beliefs about the world.

The way I’ve seen this heuristic come into play in the software world is when we do an ‘objective’ overview of the technical tools/options that we could use to solve a particular problem.

We may do this by coming up with a list of advantages/disadvantages for each technology but the way we come up with this will probably be influenced by which of the technologies we prefer.

We’ll therefore place strong emphasis on the advantages of a technology we like and not think too much about its disadvantages or the workarounds that we have to implement.

For example, if Clojure were the technology in question, then as an advocate of Clojure you might focus on the reduced lines of code and the benefits of the functional way of programming, and place less emphasis on the learning curve that new team members will have to overcome.

Equally if you weren’t a fan of Clojure then you’d do the opposite.

I covered similar ground in a post I wrote a few months ago about compatible opinions, where I suggested that people use confirmation bias to back up their own opinions.

I think the affect heuristic is slightly different though because it applies even when we think we’re being impartial in our judgement.

When I read things I like to try and think what action I should be taking as a result of learning new information. In this case I think the takeaway is to be more self-aware than usual when talking about things we’re passionate about.

One way to achieve that could be to run our opinions past someone who is knowledgeable in the subject area but is less emotionally involved.

It’d be interesting to see whether this resonates with others as well and how you handle it.

Written by Mark Needham

June 6th, 2013 at 10:36 pm

Ego Depletion

On the recommendation of Mike Jones I’ve been reading through Daniel Kahneman’s Thinking, Fast and Slow, the first part of which covers our two styles of thinking:

  • System 1 – operates automatically and quickly, with little or no effort and no sense of voluntary control.
  • System 2 – allocates attention to the effortful mental activities that demand it, including complex computations. The operations of System 2 are often associated with the subjective experience of agency, choice, and concentration.

He then uses a series of stories to explain this in more detail but I found an experiment run by Roy Baumeister the most interesting:

A series of surprising experiments [...] has shown conclusively that all variants of voluntary effort – cognitive, emotional, or physical – draw at least partly on a shared pool of mental energy. Their experiments involve successive rather than simultaneous tasks.

Baumeister’s group has repeatedly found that an effort of will or self control is tiring; if you have had to force yourself to do something, you are less willing or less able to exert self-control when the next challenge comes around.

The phenomenon has been named ego depletion.

This particularly resonates with me as I’ve frequently seen people (including myself) let a series of events involving another person go by before finally snapping over something seemingly innocuous.

Quite frequently the other person had no idea that the way they were behaving was irritating so the reaction comes as a surprise to them!

We can get around this problem to some extent by providing timely feedback, but for that to happen I think we need to admit to ourselves when we are frustrated rather than pretending that it doesn’t bother us.

It will probably feel silly to address these innocuous events so early on when we don’t think they’re bothering us that much but I think it’s better than the explosive alternative!

Written by Mark Needham

June 4th, 2013 at 11:16 pm

Viewing the contents of an archive

Every now and then I want to check the contents of an archive without unpacking it and I tend to use unzip to do so:

$ unzip -l batch-import-jar-with-dependencies.jar | tail -n 10 
     1645  02-17-13 01:03   org/neo4j/batchimport/StdOutReport.class
     3089  02-17-13 01:03   org/neo4j/batchimport/structs/NodeStruct.class
     1244  02-17-13 01:03   org/neo4j/batchimport/structs/Property.class
     1732  02-17-13 01:03   org/neo4j/batchimport/structs/PropertyHolder.class
     1635  02-17-13 01:03   org/neo4j/batchimport/structs/Relationship.class
      905  02-17-13 01:03   org/neo4j/batchimport/utils/Chunker.class
     1884  02-17-13 01:03   org/neo4j/batchimport/utils/Params.class
     4445  02-17-13 01:03   org/neo4j/batchimport/Utils.class
 --------                   -------
 49947859                   16447 files

It does the job, although it prints out some information that we’re not really interested in, so I was intrigued to see that Alistair used zipinfo when he wanted to achieve a similar thing:

$ zipinfo -1 batch-import-jar-with-dependencies.jar | tail -n 10
org/neo4j/batchimport/ParallelImporter.class
org/neo4j/batchimport/Report.class
org/neo4j/batchimport/StdOutReport.class
org/neo4j/batchimport/structs/NodeStruct.class
org/neo4j/batchimport/structs/Property.class
org/neo4j/batchimport/structs/PropertyHolder.class
org/neo4j/batchimport/structs/Relationship.class
org/neo4j/batchimport/utils/Chunker.class
org/neo4j/batchimport/utils/Params.class
org/neo4j/batchimport/Utils.class

From a bit of man page reading it sounds like zipinfo is unzip, but with different flags that give an output that’s a cross between unzip and ls:

The format is a cross between Unix “ls -l” and “unzip -v” output. See DETAILED DESCRIPTION below. Note that zipinfo is the same program as unzip (under Unix, a link to it); on some systems, however, zipinfo support may have been omitted when unzip was compiled.

As long as I remember I’ll be using zipinfo from now on!
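
On a machine where zipinfo support has been omitted, the same list of names could be pulled out of the ‘unzip -l’ output instead. The following is only a rough Ruby sketch of that idea: ENTRY_ROW and entry_names are hypothetical names, and the pattern assumes rows laid out like the listing above (size, date, time, name).

```ruby
# Matches one entry row of `unzip -l` output and captures the entry name.
# Assumed row layout: size, date, time, name. Separator and total rows do not match.
ENTRY_ROW = /\A\s*\d+\s+[\d-]+\s+\d{2}:\d{2}\s+(.+)\z/

# Returns only the entry names from the text of an `unzip -l` listing.
def entry_names(listing)
  listing.each_line.map { |line| line.chomp[ENTRY_ROW, 1] }.compact
end

# A few rows copied from the listing above, used as sample input.
sample = <<~LISTING
     1645  02-17-13 01:03   org/neo4j/batchimport/StdOutReport.class
     4445  02-17-13 01:03   org/neo4j/batchimport/Utils.class
   --------                   -------
   49947859                   16447 files
LISTING

puts entry_names(sample)
```

Where zipinfo is available it remains the simpler option; this is just a fallback for the case the man page warns about.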

Written by Mark Needham

May 29th, 2013 at 11:22 am

Polyglot Persistence: Embrace the ETL

Over the past few years I’ve seen the emergence of polyglot persistence, i.e. using different data storage technologies for different data, and in most situations we work out which store to use up front.

For example, we might use MongoDB to store data about a customer journey through our website but simultaneously write page view data through to something like Hadoop or Redshift.

This works reasonably well, but when we first start collecting data it might not be obvious how we’ll want to query it, and our storage choice might not be the best for writing those queries.

An interesting thing to think about at this stage is whether it makes sense to add a stage to our data processing pipeline where we write an ETL job to get the data into a more appropriate format.

My initial experience doing this was when I created the ThoughtWorks graph which involved transforming data into a graph so that I could find links between people.

Ashok and I followed a similar approach for a client we went on to work for and it allowed us to find the answers to questions that couldn’t be answered when the data was in its original format.
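
To make the ‘transform’ step a bit more concrete, here is a minimal Ruby sketch of turning document-style records into a graph. It is not the code behind the ThoughtWorks graph: the record shape (projects with a list of people), the to_graph helper and the WORKED_WITH label are all assumptions made up for illustration.

```ruby
require "set"

# Transform step of a hypothetical ETL job: turns project documents into
# person nodes and relationships between people who shared a project.
def to_graph(projects)
  nodes = Set.new
  relationships = Set.new

  projects.each do |project|
    people = project[:people]
    nodes.merge(people)
    # Sort each pair so that (Ann, Bob) and (Bob, Ann) count as one relationship.
    people.combination(2) { |a, b| relationships << [a, b].sort }
  end

  { nodes: nodes.to_a.sort, relationships: relationships.to_a.sort }
end

# Illustrative input only; real data would be extracted from the original store.
projects = [
  { name: "Project A", people: ["Ann", "Bob", "Cat"] },
  { name: "Project B", people: ["Bob", "Cat", "Dan"] }
]

graph = to_graph(projects)
graph[:relationships].each { |a, b| puts "#{a} -[WORKED_WITH]- #{b}" }
```

The output of a step like this would then be loaded into whichever store suits the queries, which is where the second copy of the data comes from.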

The main downside to this approach is that we now have to keep two data sources in sync, but it’s interesting to think about whether this trade-off is worthwhile if it helps us gain new insights or find the answers to questions more quickly.

I don’t have any experience with how this approach plays out over time so I’d be interested in hearing how people have got on with it and whether it does or doesn’t work.

Written by Mark Needham

May 27th, 2013 at 12:11 am

Polyglot Persistence: The ‘boring’ relational option

I was chatting with Brian Blignaut last week after the Equal Experts NoSQL event and he made an interesting observation that in this age of Polyglot Persistence we often rule out the relational database.

I think it’s definitely better that we now have many different options for where we store our data – be it as key/value pairs, documents or as a network/graph.

Having these options forces us to think more about how we’re going to read/write data in our application whereas previously our effort was focused around which tables we were going to pull out.

Having said that, I realised I’d fallen into the trap that Brian was referring to when thinking through how we could model the energy plans that users were selecting from a results table.

We wanted to run aggregate queries over the data to work out the most popular plans and suppliers on different days and then across different customer segments.

We represented each user selection as an event, stored as a document in MongoDB, which worked fine to start with, but queries started to become much slower as the number of documents approached 500,000 or so.

I started thinking about whether we’d be better off storing the data in a different store which was better optimised for the types of queries that we wanted to run.

My initial thought was to use one of the flashier ‘NoSQL’ databases but Ashok pointed out that the use case – slicing/dicing data at a scale (< 1 million rows) where everything would fit on disc – was perfect for something like PostgreSQL.

I realised that he was absolutely right and I think it’s important to remember that in future: choosing the appropriate data store for our problem doesn’t mean we have to rule out the relational database, even though it’s probably more fun to use another store.
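
For a feel of the slicing/dicing involved, here is a small Ruby sketch that does the aggregation in memory. It isn’t the query we ran: the event fields (:day, :plan) and both helper names are assumptions, and in a relational database the first helper would roughly correspond to a GROUP BY over day and plan with a COUNT.

```ruby
# Counts selections per (day, plan) pair, the in-memory counterpart of a
# GROUP BY day, plan with COUNT(*) in a relational database.
def selections_per_day_and_plan(events)
  events.each_with_object(Hash.new(0)) do |event, counts|
    counts[[event[:day], event[:plan]]] += 1
  end
end

# Picks the most selected plan for each day from those counts.
# On a tie, the plan that was counted first wins.
def most_popular_plan_per_day(events)
  selections_per_day_and_plan(events)
    .group_by { |(day, _plan), _count| day }
    .map { |day, rows| [day, rows.max_by { |_key, count| count }.first.last] }
    .to_h
end

# Made-up selection events, purely for illustration.
events = [
  { day: "2013-05-20", plan: "Fixed" },
  { day: "2013-05-20", plan: "Fixed" },
  { day: "2013-05-20", plan: "Variable" },
  { day: "2013-05-21", plan: "Variable" }
]

most_popular_plan_per_day(events).each { |day, plan| puts "#{day}: #{plan}" }
```

Slicing by supplier or customer segment would just mean grouping on a different field, which is exactly the sort of thing a relational database is good at.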

Written by Mark Needham

May 26th, 2013 at 11:29 pm

Mac OS X: A couple of neat tools

When I first started working at uSwitch, Sid installed a couple of ‘productivity applications’ on my Mac which I’ve found pretty useful, but from talking to others I realised that not everyone knows about or uses them.

Alfred

Alfred is a Quicksilver replacement which allows you to quickly open applications, find files, search Google and more. Even though we’re not using half of its features it’s still proved to be useful.

I quite like the calculator feature, which we’ve been using for ad hoc calculations like working out how much free memory there was on a server or the conversion rate on part of an A/B test.

Moom

The other application is Moom which allows you to move/resize windows.

I didn’t see the point when I first saw it but it’s actually really useful when you’re working on a big monitor and want to put say the terminal alongside the browser.

We have shortcuts set up that allow us to type ‘Ctrl + Space’ to make the window fill the left-hand side of the screen, ‘Alt + Space’ to make it fill the right-hand side of the screen and ‘Alt + Ctrl + Space’ to fill the whole screen.

You can also set up shortcuts to allow you to move a window between displays or to rearrange the windows based on certain events.

Highly recommended!

If anyone knows any other cool tools like this I’d love to hear about them.

Written by Mark Needham

April 30th, 2013 at 8:07 pm

Sublime: Getting Textmate’s Reveal/Select in Side Bar (Cmd + Ctrl + R)

After coming across this post about why you should use Sublime Text I decided to try using it a bit more and one of the things that I missed from Textmate was the way you can select the current file on the sidebar.

In Textmate the shortcut to do that is ‘Cmd + Ctrl + R’ so I wanted to be able to do something similar or configure Sublime so it responded to the same shortcut.

The option to reveal a file in the side bar is accessible from the context menu by right clicking on the contents of a file once it’s open and selecting ‘Reveal in Side Bar’, which is a good start.

To map that to a key binding we need to go to ‘Preferences > Key Bindings (User)’ and put the following into that file:

[
	{ "keys": ["ctrl+super+r"], "command": "reveal_in_side_bar" }
]

Of course if we already have other custom key bindings then we can just add it after those instead.

We can work out what the names of commands are by turning on command logging in the Sublime console.

We first need to open the console with ‘Ctrl + `’ and then paste the following:

sublime.log_commands(True)

Any commands that we run will now have their names printed in the console window, e.g.:

>>> sublime.log_commands(True)
command: context_menu {"event": {"button": 1, "x": 390.21484375, "y": 329.66796875}}
command: reveal_in_side_bar
command: rename_path {"paths": ["/Users/markhneedham/code/thinkingingraphs/public/js/bootstrap.js"]}
no command for selector: noop:
command: show_panel {"panel": "console", "toggle": true}

We can then set up appropriate key bindings for whichever commands we like.

Written by Mark Needham

April 7th, 2013 at 1:00 am

Editing config files on a server & Ctrl-Z

A couple of weeks ago Tim and I were spinning up a new service on a machine which wasn’t quite working so we were manually making changes to the /etc/nginx/nginx.conf file and restarting nginx to try and sort it out.

This process is generally not that interesting – you open the file in vi, make some changes, close it, then restart nginx and see if it works. If not then you open the file again and repeat.

Except Tim had a slight variation on this workflow which is an improvement that I don’t want to forget!

Once we’d finished making the changes to the file in vi Tim hit ‘Ctrl + Z’ which suspended the vi process and put us back at the shell prompt.

We could then restart nginx or do whatever else we needed to do and then type ‘fg’ to go back into vi again.

Not only is this workflow quicker, it also keeps vi’s undo history for the file, so if one of our changes really screws things up we can easily undo it. Previously we’d have had to remember what changes we’d made and reverse them manually.

In summary this workflow is a pretty simple idea but nevertheless one I had never thought about or seen anyone else do and I’ll be using it in future.

Written by Mark Needham

March 29th, 2013 at 10:51 am

When nokogiri fails with ‘Nokogiri::XML::SyntaxError: Element script embeds close tag’ Web Driver to the rescue

As I mentioned in my previous post I wanted to add televised games to my football graph and the Premier League website seemed like the best place to find out which games those were.

I initially tried to use Nokogiri to grab the data that I wanted…

> require 'nokogiri'
> require 'open-uri'
> tv_times = Nokogiri::HTML(URI.open('http://www.premierleague.com/en-gb/matchday/broadcast-schedules.tv.html?rangeType=.dateSeason&country=GB&clubId=ALL&season=2012-2013&isLive=true'))

…but when I tried to query by CSS selector for all the matches nothing came back:

> tv_times.css(".broadcastschedule table.contentTable tbody tr")
=> []

I was a bit surprised but read somewhere that I should check if there were any errors while parsing the document. In fact there were quite a few!

> tv_times.errors
=> [#<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, #<Nokogiri::XML::SyntaxError: Element script embeds close tag>, ...]

I ran the document through the W3C markup validation service and it didn’t seem to find any problem with it.

Next I tried stripping out all the script tags using Loofah, and then removing them manually, but neither of those approaches helped.

I’ve previously used Web Driver to scrape web pages but I’d found that Nokogiri was much faster so I stopped using it.

Since my new library wasn’t playing ball I thought I’d quickly see if Web Driver was up to the challenge and indeed it was:

require "selenium-webdriver"
 
driver = Selenium::WebDriver.for :chrome
driver.navigate.to "http://www.premierleague.com/en-gb/matchday/broadcast-schedules.tv.html?rangeType=.dateSeason&country=GB&clubId=ALL&season=2012-2013&isLive=true"
 
matches = driver.find_elements(:css, '.broadcastschedule table.contentTable tbody tr')
matches.each do |tr|
  match = tr.find_element(:css, "td.show a").text
  broadcaster = tr.find_element(:css, "td.broadcaster img").attribute("src")
  tv_channel = broadcaster.include?("sky-sports") ? "Sky" : "ESPN"
 
  puts "#{match},#{tv_channel}"
end
 
driver.quit

$ ruby tv_games.rb
Newcastle United vs Tottenham Hotspur,ESPN
Wigan Athletic vs Chelsea,Sky
Manchester City vs Southampton,Sky
Everton vs Manchester United,Sky
Swansea City vs West Ham United,Sky
Chelsea vs Newcastle United,ESPN
...

Ideally I’d like to use Nokogiri to do this job, but it’s decided that the document is invalid and can’t parse it properly, so Web Driver is a pretty decent replacement I reckon!

Written by Mark Needham

March 24th, 2013 at 9:20 pm

Wiring up an Amazon S3 bucket to a CNAME entry – The specified bucket does not exist

Jason and I were setting up an internal static website using an S3 bucket a couple of days ago and wanted to point a more friendly domain name at it.

We initially called our bucket ‘static-site’ and then created a CNAME entry using Zerigo to point our subdomain at the bucket.

The mapping was something like this:

our-subdomain.somedomain.com -> static-site.s3-website-eu-west-1.amazonaws.com

When we tried to access the site through our-subdomain.somedomain.com we got the following error:

<Error>
<Code>NoSuchBucket</Code>
<Message>The specified bucket does not exist</Message>
<BucketName></BucketName>
<RequestId></RequestId>
<HostId>

A bit of googling led us to this thread which suggested that we needed to ensure that our bucket was named after the subdomain that we wanted to serve the site from.

In this case we needed to rename our bucket to ‘our-subdomain.somedomain.com’ and then change our CNAME entry to:

our-subdomain.somedomain.com -> our-subdomain.somedomain.com.s3-website-eu-west-1.amazonaws.com

And then everything was happy.
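
The naming rule can be written down as a small check. This is a rough Ruby sketch rather than anything from the AWS tooling: the helper names are made up, and the pattern only covers endpoints in the ‘s3-website-<region>’ form shown above, so other regions may need a different pattern.

```ruby
# Website endpoints in the form used above: <bucket>.s3-website-<region>.amazonaws.com
WEBSITE_ENDPOINT = /\A(.+)\.s3-website-[a-z0-9-]+\.amazonaws\.com\z/

# Extracts the bucket name from a CNAME target, or nil if the target
# does not look like a website endpoint of that form.
def bucket_from_target(target)
  target[WEBSITE_ENDPOINT, 1]
end

# The mapping only works when the bucket is named after the host we serve from.
def cname_matches_bucket?(host, target)
  bucket_from_target(target) == host
end

# Builds the CNAME target for a bucket named after the host.
def website_cname_target(host, region)
  "#{host}.s3-website-#{region}.amazonaws.com"
end

host = "our-subdomain.somedomain.com"
puts cname_matches_bucket?(host, "static-site.s3-website-eu-west-1.amazonaws.com")
puts cname_matches_bucket?(host, website_cname_target(host, "eu-west-1"))
```

The first check fails for our original ‘static-site’ bucket and the second passes once the bucket is named after the subdomain.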

Written by Mark Needham

March 21st, 2013 at 10:39 pm