Mark Needham

Thoughts on Software Development

Archive for the ‘Software Development’ tag

Downloading the JDK 6 source code

Every now and then I want to get the JDK source code onto a new machine, and it always takes me longer than I expect, so this post is an attempt to help future me!

Googling for this takes me to this page and I always think I’ll just check out the SVN repository and hook that up, but it doesn’t seem to be available.

$ wget -S http://java.net/projects/jdk-jrl-sources/
--2012-02-11 09:51:34--  http://java.net/projects/jdk-jrl-sources/
Resolving java.net (java.net)... 192.9.164.103
Connecting to java.net (java.net)|192.9.164.103|:80... connected.
HTTP request sent, awaiting response... 
  HTTP/1.1 404 Not Found
  Date: Sat, 11 Feb 2012 09:51:34 GMT

The alternative is therefore to download the jar provided which we can do like this:

wget http://www.java.net/download/jdk6/6u23/promoted/b05/jdk-6u23-fcs-src-b05-jrl-12_nov_2010.jar

The next step is to execute the jar, which I somehow didn’t realise until I unpacked it and had a look at the README:

java -jar jdk-6u23-fcs-src-b05-jrl-12_nov_2010.jar

You get asked to choose a folder location for the sources and then the code is under ‘src/share/classes’. So for me I need to give IntelliJ a source path of:

/Users/mneedham/github/j2se/src/share/classes

You can browse the different versions of the source code by changing the version number at the end of URLs like this one. At the moment version 23 is the latest one available.

Written by Mark Needham

February 11th, 2012 at 10:02 am

Delivery approach and constraints

In my latest post I described an approach we’d been taking when analysing how to rewrite part of an existing system so that we could build the new version in an incremental way.

Towards the end I pointed out that we weren’t actually going to use an incremental approach as we’d initially thought, because of a couple of constraints that we have to work under.

Hardware provisioning

One of the main reasons that we favoured an incremental approach is that we’d be able to deploy to production early which would allow us to show a quicker return on investment.

Unfortunately we later learnt that it takes around 6-9 months to provision production hardware.

It therefore didn’t make a lot of sense to take an approach where we tried to integrate into the existing system since we wouldn’t be able to deploy that work.

We’re working under the assumption that in 6-9 months we’ll probably be able to rewrite the whole thing and can therefore avoid the need to write the code which would allow us to integrate into the existing version.

We couldn’t see any value in writing bridging code between the existing and new versions of the application if it never sees production – we’d put in all the effort for no reward.

Running two systems side by side

Even if we had been able to provision hardware in time to release incrementally we came to learn that the business would be uncomfortable with having two versions of the same application in production at the same time.

The application is used to do pricing and the worry was that the different versions might produce different results for the same inputs.

It’s arguably something we may have been able to overcome if we could prove that the new version worked exactly the same as the existing one by running both applications against a set of scenarios and checking that they returned the same results.
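A minimal sketch of that kind of side-by-side check, purely as an illustration: `price_existing`, `price_new` and the scenario fields are hypothetical stand-ins for calls into the two versions of the pricing application, not anything from the real system.

```python
from decimal import Decimal

# Hypothetical stand-ins for the existing and the rewritten pricing code.
def price_existing(scenario):
    return Decimal(scenario["base"]) * Decimal(scenario["quantity"])

def price_new(scenario):
    return Decimal(scenario["quantity"]) * Decimal(scenario["base"])

def find_mismatches(scenarios, old_pricer, new_pricer):
    """Run both pricers over every scenario and collect any differing results."""
    mismatches = []
    for scenario in scenarios:
        old_price = old_pricer(scenario)
        new_price = new_pricer(scenario)
        if old_price != new_price:
            mismatches.append((scenario, old_price, new_price))
    return mismatches

# Made-up scenarios; in practice these would come from recorded production inputs.
scenarios = [
    {"base": "19.99", "quantity": "3"},
    {"base": "0.10", "quantity": "7"},
]
mismatches = find_mismatches(scenarios, price_existing, price_new)
```

An empty `mismatches` list only shows that the two versions agree on the scenarios that were tried, so the confidence it gives depends on how representative those scenarios are.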

Theoretically I suppose the first problem could also be overcome but it’s a battle we’ve chosen to leave alone for the moment.

What I found interesting in the discussions about the way we should deliver our solution was that I’d worked under the assumption that an incremental approach was always a better approach but with these constraints it isn’t.

Written by Mark Needham

February 8th, 2012 at 10:34 pm

Looking for the seam

During December/early January we spent some time analysing an existing system which we were looking to rewrite and our approach was to look for how we could do this in an incremental way.

In order to do that we needed to look for what Michael Feathers refers to as a seam:

A seam is a place where you can alter behaviour in your program without editing in that place

Previously when I’ve thought about seams it’s been at the code level inside a single application, but this time there was more than one piece interacting.

[Diagram: Seam]

We knew that there was a web application where the user could request a quote which would be calculated offline and then an email sent to them when it was ready to view.

That led us to believe that there was probably some sort of queue being used to store the outstanding requests and there’d probably be some sort of application processing the requests.

As it turned out the design of the system actually looked like the diagram on the right with the database effectively as a queue.

We then needed to work out which tables we had to read from/write to so that we’d be able to just replace the ‘polling application’ and leave the ‘web application’ alone.

We were then able to come up with a design whereby we isolated any interaction with the database into a ‘bridging application’ which then farmed requests out to a new application which we could scale horizontally.

It could also take care of writing the quotes back into the database so the existing application could read them back onto the screen.
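As a rough sketch of what such a ‘bridging application’ might do, assuming a single hypothetical `quote_requests` table with a `status` column (the real table names and columns aren’t described here) and a placeholder `calculate_quote` standing in for the call out to the new application:

```python
import sqlite3

def calculate_quote(payload):
    # Hypothetical placeholder for farming the request out to the new application.
    return len(payload) * 10.0

def process_pending(conn):
    """Read outstanding requests, compute quotes, and write the results back."""
    pending = conn.execute(
        "SELECT id, payload FROM quote_requests "
        "WHERE status = 'PENDING' ORDER BY id"
    ).fetchall()
    for request_id, payload in pending:
        quote = calculate_quote(payload)
        # Writing back lets the existing web application read the quote as before.
        conn.execute(
            "UPDATE quote_requests SET status = 'DONE', quote = ? WHERE id = ?",
            (quote, request_id),
        )
    conn.commit()
    return len(pending)

# In-memory database standing in for the shared one used as a queue.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE quote_requests ("
    "id INTEGER PRIMARY KEY, payload TEXT, status TEXT, quote REAL)"
)
conn.executemany(
    "INSERT INTO quote_requests (payload, status) VALUES (?, 'PENDING')",
    [("car",), ("house",)],
)
processed = process_pending(conn)
```

The point of the sketch is that all database access sits in one place, so whatever does the calculation never needs to know the existing schema.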

Although we ended up not using this architecture for other reasons I think it’s a neat way of looking at systems to work out how we can change them with minimal impact.

Written by Mark Needham

February 6th, 2012 at 10:22 pm

Developer machine automation: Dependencies

As I mentioned in a post last week we’ve been automating the setup of our developer machines with puppet over the last week and one thing that we’ve learnt is that you need to be careful about how you define dependencies.

The aim is to get your scripts to the point where the outcome is reasonably deterministic so that we can have confidence they’re going to work the next time we run them.

We noticed two ways in which we haven’t quite achieved determinism yet:

Accidental Dependencies

The first few times that we ran the scripts on top of a vanilla image we were doing it on a virtual machine which had VMware tools installed on it.

We’d forgotten that VMware tools had been installed on those VMs and ran into a problem with Oracle dependencies not being satisfied when we ran puppet on some machines which had CentOS installed directly (i.e. not on a virtual machine).

Those dependencies had been satisfied by our VMware tools installation on the VMs so we didn’t realise that we hadn’t explicitly stated those dependencies, something which we have done now.

External Dependencies

We couldn’t find the Firefox version that we wanted to install in the default yum repositories so we created a puppet task which linked to a Firefox RPM on an external server and then installed it.

It worked originally but at some stage over the last couple of weeks the URI was changed as a minor version had been upgraded, breaking our script.

We also came across another way that external dependencies can fail today – if a corporate proxy blocks access to the URL!

We’re trying to get to the stage where we’re only relying on artifacts either coming from a yum repository or an internal repository where we can store any libraries which aren’t available through yum.
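One way to keep an eye on this is a small check that flags any artifact URL pointing outside the repositories we control. This is only a sketch: the host name `repo.internal.example` and the example URLs are made up, and how the URLs get extracted from the puppet scripts is left out.

```python
from urllib.parse import urlparse

# Hypothetical host name; substitute whatever internal repository is in use.
ALLOWED_HOSTS = {"repo.internal.example"}

def external_urls(urls, allowed_hosts=ALLOWED_HOSTS):
    """Return the URLs whose host is not one of the allowed repository hosts."""
    return [url for url in urls if urlparse(url).hostname not in allowed_hosts]

# Made-up examples of URLs that a setup script might reference.
package_urls = [
    "http://repo.internal.example/rpms/firefox.rpm",
    "http://downloads.example.org/firefox/firefox.rpm",
]
flagged = external_urls(package_urls)
```

Anything that ends up in `flagged` is a candidate for copying into the internal repository.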

Don’t assume determinism

While trying to solve these dependency problems in our puppet scripts I made the mistake of assuming that if the script runs through once and works, it’s always going to work in the future.

Since we had achieved that previously in my mind it was impossible for it to fail in future which stopped me from properly investigating why it had stopped working.

Written by Mark Needham

January 24th, 2012 at 11:16 pm

Installing Puppet on Oracle Linux

We’ve been spending some time trying to set up our developer environment on an Oracle Linux 5.7 build and one of the first steps was to install Puppet as we’ve already created scripts which automate the installation of most things.

Unfortunately Oracle Linux builds don’t come with any yum repos configured so when you run the following command…

ls -alh /etc/yum.repos.d/

…you don’t see anything :(

We eventually realised that there is a list of public yum repositories on the Oracle website, of which we needed to download the definition for Oracle Linux 5 like so:

cd /etc/yum.repos.d
wget http://public-yum.oracle.com/public-yum-el5.repo

We then need to edit that file to enable the appropriate repository. In this case we want to enable ol5_u7_base:

[ol5_u7_base]
name=Oracle Linux $releasever - U7 - $basearch - base
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL5/7/base/$basearch/
gpgkey=http://public-yum.oracle.com/RPM-GPG-KEY-oracle-el5
gpgcheck=1
enabled=1
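If you’d rather script that edit than do it by hand, something along these lines might work. It’s a sketch under assumptions: the sample text is an abbreviated, made-up excerpt rather than the real contents of public-yum-el5.repo, and `enable_repo` is a hypothetical helper built on Python’s standard configparser module.

```python
import configparser
import io

def enable_repo(repo_text, section):
    """Return the repo definition text with enabled=1 set for one section."""
    # interpolation=None leaves yum variables such as $basearch untouched.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(repo_text)
    parser.set(section, "enabled", "1")
    out = io.StringIO()
    parser.write(out, space_around_delimiters=False)
    return out.getvalue()

# Abbreviated, illustrative excerpt of a repo definition file.
sample = """[ol5_u5_base]
name=Oracle Linux $releasever - U5 - $basearch - base
enabled=0

[ol5_u7_base]
name=Oracle Linux $releasever - U7 - $basearch - base
baseurl=http://public-yum.oracle.com/repo/OracleLinux/OL5/7/base/$basearch/
enabled=0
"""
updated = enable_repo(sample, "ol5_u7_base")
```

Note that configparser drops comments when it writes the file back out, so this is only suitable if losing them is acceptable.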

I made the mistake of enabling ol5_u5_base which led to us getting some really weird problems whereby yum got confused as to which version of libselinux we had installed and was therefore unable to install libselinux-ruby as its dependencies weren’t being properly satisfied.

Calling ‘yum list installed’ suggested that we had libselinux 1.33.4.5-7 installed but if we ran ‘yum install libselinux’ then it suggested we already had 1.33.4.5-5 installed. Very confusing!

After trying to uninstall and downgrade libselinux and pretty much destroying the installation in the process, another colleague spotted my mistake.

We also found that we had to add the epel repo which gave us access to some other packages that we needed:

rpm -Uvh http://download.fedora.redhat.com/pub/epel/5/x86_64/epel-release-5-4.noarch.rpm

After all that was done we were able to run the command to install puppet:

yum install puppet

That installs puppet 2.6.12 as that’s the latest version in that repo. The latest stable version is 2.7.9 but I think we’ll need to hook up a puppet specific repo to get that working.

Written by Mark Needham

January 18th, 2012 at 12:30 am

Application footprint

I recently came across Carl Erickson’s ‘small teams are dramatically more efficient than large teams’ blog post which reminded me of something which my colleague Ashok suggested as a useful way for determining team size – the application footprint.

As I understand it the application footprint is applicable for an application at a given point in time and determines how many parallel tasks/streams of work we have.

In the case of the project that I’m currently working on there are 3 separate components which need to interact with each other via an API but otherwise are independent.

[Photo: Footprint]

We can therefore have 3 pairs working – one on each component – and won’t have to worry about them stepping on each other’s toes.

One interesting thing about the application footprint is that it doesn’t stay the same size all the time.

More often than not once a team has gained trust by getting a release out the product owner will start prioritising more independent features which don’t necessarily overlap.

At this stage it might not be such a bad idea to add people to the team if we want to try and finish more quickly.

If we’re already at the point where we have the same number of pairs as parallel pieces of work then adding people is going to be problematic because we’ll struggle to find work for everyone to do.

Stories in the same stream will have dependencies on each other and although it’s theoretically possible to start on something which has a dependency, the likelihood of having to rework it is higher.

One way to get around that problem if we decide that we don’t want to reduce our team size is to have a pair assigned to working on bugs, cross functional requirements such as performance testing/tuning or doing some technical analysis on upcoming stories.

It’s easy enough to remember all this when you’re starting out building an application but I think it’s something that we need to keep in mind so that if there’s pressure to add people to ‘go faster’ then we can determine if that will actually be the case.

As an aside

Obviously there are times when we decide that we’re happy to put more people on a team than its footprint might suggest in order to get an overall gain.

For example with 5 pairs we may finish 50 points in a week but if we increase to 10 pairs then perhaps we now get 60 points.

We’ve cut the efficiency of each pair by 40% (from 10 points to 6) but overall we’ve got a marginal gain, which sometimes makes sense. We also need to be aware of the collective unresponsibility that we might introduce by doing this.
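Working through the numbers in that example, as a quick sketch (`points_per_pair` is just an illustrative helper, and the figures are the made-up ones from above):

```python
def points_per_pair(points, pairs):
    """Average weekly points delivered by each pair."""
    return points / pairs

before = points_per_pair(50, 5)    # 5 pairs delivering 50 points
after = points_per_pair(60, 10)    # 10 pairs delivering 60 points

# Relative change per pair and for the team as a whole.
per_pair_drop = (before - after) / before
overall_gain = (60 - 50) / 50
```

Each pair goes from 10 points to 6, a 40% drop, while the team as a whole delivers 20% more.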

Photo courtesy of farlane

Written by Mark Needham

January 16th, 2012 at 1:40 am

Oracle: exp – EXP-00008: ORACLE error 904 encountered/ORA-00904: “POLTYP”: invalid identifier

I spent a bit of time this afternoon trying to export an Oracle test database so that we could use it locally using the exp tool.

I had to connect to exp like this:

exp user/password@remote_address

And then filled in the other parameters interactively.

Unfortunately when I tried to actually export the specified tables I got the following error message:

EXP-00008: ORACLE error 904 encountered
ORA-00904: "POLTYP": invalid identifier
EXP-00000: Export terminated unsuccessfully

I eventually came across Oyvind Isene’s blog post which pointed out that you’d get this problem if you tried to export a 10g database using an 11g client which is exactly what I was trying to do!

He explains it like so:

The export command runs a query against a table called EXU9RLS in the SYS schema. On 11g this table was expanded with the column POLTYP and the export command (exp) expects to find this column.

I needed to download the 10g client so that I could use that version of exp instead. I haven’t quite got it working yet but at least it’s a different error to deal with!

Written by Mark Needham

January 13th, 2012 at 9:46 pm

My Software Development journey: 2011

A couple of years ago I used to write a blog post reflecting on what I’d worked on in the preceding year and what I’d learned and having read 2011 reviews by a couple of other people I thought I’d have a go.

Am I actually learning anything?

A thought I had many times in 2011 was ‘am I actually learning anything?’ as, although I was working with languages that I hadn’t used professionally before, the applications that I worked on were very similar to ones that I’ve worked on previously.

Often I’d work on something and know exactly how it should be designed and where we could go wrong since I’d done the same thing several times before and the challenge of not knowing what to do had disappeared somewhat.

Now and then…

I certainly failed to learn one thing a day as I suggested in a blog post a couple of years ago although eventually I managed to learn a bit about node.js and clojure by building some toy applications with my colleague Uday.

We decided to rewrite part of our Scala application in clojure in our own time to see what it’d look like which provided us with an interesting insight into what it’d be like to build a system for the second time when you know exactly what to do.

I also completed ml-class which was fun as it was the type of programming that I’ve never done before. Obviously I’m still a novice at the whole machine learning thing but it’s given me an idea of the sorts of things you can do.

Learning is doing

From February until April I was in Bangalore working as a trainer/coach for one of the ThoughtWorks University batches where we tried as much as possible to reduce the amount of ‘teaching’ done.

Sumeet has previously written about the new style of ThoughtWorks University which is more focused on people working on a real project than sitting in workshops and we tried to take this even further.

Previous groups had spent about 2 weeks doing workshop style sessions and then 4 weeks working on a project but we got it to the point where we spent just over a week in workshops and the rest working on the project.

In general I think it worked reasonably well and the skill level of the group seemed reasonably high by the end. We were lucky that there were only 13 people in the group – it would be interesting to see how our approach would scale.

I’ve also noticed this last year that when I’m learning something new it’s not enough to just do toy exercises anymore, I actually have to build something to retain interest.

During the Christmas holidays I decided to try and build a Flipboard style application for my Android phone so I can (yet again) capture the links that people post on twitter.

Actually having a real problem to solve has made me much more engaged than following a tutorial or hello world demo would have done.

Remembering the value of blogging

My rate of posting on here has decreased a lot over the last year which I think is partly down to the fact that I’ve written about a lot of the stuff I see on projects before but also because I started filtering what I thought was interesting enough to write about.

In hindsight the latter approach doesn’t necessarily make sense – the most read posts on this blog are the ones which I thought were the most pointless when I wrote them.

I got stuck in the mindset that I wasn’t actually learning anything by writing blog posts, which has been proved wrong multiple times, both in terms of what I learn in writing the post and what I learn from people’s comments.

Expressing opinions in big groups/public

I spent 10 months in late 2010/early 2011 working in India and one of the most interesting things I remember observing was that people seemed very reluctant to express their opinion in big groups.

I thought that was something specific to India but on coming back to the UK I’ve noticed the same thing here as well which means we need to adjust our approach in retrospectives if we want everyone to participate.

I also learnt that expressing strong opinions in public isn’t necessarily the most effective way of making change happen. I probably should have learnt this already but it became increasingly evident how ineffective this approach was in 2011.

Going at my own pace

A couple of years ago I was advised by a couple of colleagues that the way to get to the ‘next level’ was to become more knowledgeable about the overall architectural design of systems but at the time I wasn’t that interested in that.

It’s only more recently that I’ve found it interesting to read about different architectures on High Scalability or Systems We Make.

Another interesting way for me to learn in this area is to try and understand the architectures used in other ThoughtWorks projects that I didn’t work on and see how they compare to the ones I’ve worked on.

I generally can’t force myself to be interested in something if I’m not but once I am interested then I want to learn every detail about it so it’s better to wait until I become interested naturally.

The next thing which I’m sure I’ll eventually become interested in is tech leading a team which several of my peers (in terms of years of experience) are doing now or have been doing for a year or two. Right now though I want to focus on coding!

Overall…

I’m not sure 2011 was a year where I learned as much as I did in previous years – the learning did seem to taper off a bit which in a way is inevitable unless you completely change your role/the types of things you’re building.

In 2012 I plan to keep learning about Android development and I’m going to be doing algo-class to try and get better at another aspect of programming which I’m not very good at right now.

Written by Mark Needham

January 3rd, 2012 at 1:48 am

The supposed black box

On a reasonable number of the systems that I’ve worked on over the past few years there’s been a ‘black box’ component which the team I’ve been on has needed to integrate with.

I’ve always found it a little strange that you wouldn’t need to/want to know how that part of the system worked or that you could actually believe that it was truly a black box.

If it doesn’t work then you have no way of diagnosing the problem – did you do something wrong, was there something wrong inside the black box or was it something else?

On a project I worked on a few years ago the reason for the black box thinking was that each layer was being developed by people from a different company.

The problem we had was that we were working on the top layer, the one that was visible to the end user and therefore our progress was very visible to the stakeholders who were paying for the product to be built.

We therefore had no choice but to go into the metaphorical black box and try and gather as much information as we could to pass on to the teams working on the other layers so that they would be able to help us better.

I recently watched a talk by Artur Bergman titled ‘Full Stack Awareness’ where he talks about the necessity of understanding exactly what is happening when our code gets executed rather than thinking of it as magic.

Although Artur is working in a different context to most application developers who maybe don’t need to know the stack as well as he does I think the advice about treating something as magic is useful.

If we think of something as a ‘black box’ then effectively we are saying that it’s somewhat magic.

If the integrated component is being custom written then I think the team who needs to integrate with it should at the very least have someone who knows how it works very well so they can diagnose any problems quickly.

That person then needs to spread their knowledge amongst the rest of the team so that they don’t end up being the bottleneck.

In summary I think the term ‘black box’ is frequently a misnomer and we’ll rarely be able to view said black box in such an opaque way.

Written by Mark Needham

December 20th, 2011 at 11:57 pm

The 5 Whys/Root cause analysis – Douglas Squirrel

At XP Day I was chatting to Benjamin Mitchell about the 5 whys exercises that we’d tried on my team and I suggested that beyond Eric Ries’ post on the subject I hadn’t come across an article/video which explained how to do it.

Benjamin mentioned that Douglas Squirrel had recently done a talk on this very subject at Skillsmatter and as with most Skillsmatter talks there’s a video of the presentation online.

Gojko wrote a post summarising the talk at the time but I was interested in seeing how a 5 whys facilitated by Douglas would compare to the ones that we’d done.

These were some of my observations/learnings:

  • Douglas started off with a similar approach to the one we tried in our last attempt whereby he listed all the initial problems across the board and then worked through them.

    One thing he did much better was ensuring that the 5 whys were covered for each problem before moving onto the next one. He described this as ‘move down, then across’ and made the interesting observation that when you get to the real root cause (in theory the 5th why) there will be a pause and it will hurt.

    I don’t remember noticing that in any of our 5 whys which means, Douglas suggests, that ‘you[/we] are not doing it right’. In terms of actually getting to the root cause he’s probably right but you can still learn some useful things even if you don’t dig down that far.

  • He also made the suggestion that we shouldn’t follow whys which we can immediately see are not going to go anywhere – we’d be better off going down one of the other nodes which might lead us to some useful learning.

    I think we made the mistake of following some nodes which we could tell were going to go nowhere the first time that we did the exercise and ended up reaching a 5th why which was so general that we couldn’t do anything with it.

    On the other hand I think it probably takes a couple of goes at the 5 whys before you can say with certainty that following a why is going to go nowhere.

  • Another suggestion was to ensure that everyone linked with the problem being discussed is in the room, partly so that they don’t end up being made the scapegoat in absentia.

    In the two exercises we’ve run we only included the people on our immediate team and we did reach a point where it was difficult to work out what the answer to some of the whys should be because the person who could answer that question wasn’t in the room.

    It does obviously make it more logistically difficult to organise the meeting, especially if you have people working in different countries.

  • Squirrel suggested that any actions that come out of the meeting should be completable in a week, which helps to ensure that they’re realistic and proportionate to the problem.

    If something goes wrong once then we don’t necessarily need to make massive changes to avoid it in future, it might be sufficient to just make some small changes and then observe if things have improved.

Overall I found the talk quite useful and it was especially helpful to be able to see how a more experienced facilitator, like Douglas, was able to guide the discussion back into the framework so that it didn’t drift off.

I’m not yet convinced that we would want to run a 5 whys exercise every week which is what I’ve heard suggested before – I think the format could quickly become dull to people as with any other meeting format when used repeatedly.

Written by Mark Needham

December 10th, 2011 at 2:11 pm