<?xml version="1.0" encoding="utf-8" standalone="yes"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Mark Needham</title>
    <link>https://www.markhneedham.com/blog/</link>
    <description>Recent content on Mark Needham</description>
    <generator>Hugo -- gohugo.io</generator>
    <language>en-us</language>
    <lastBuildDate>Sun, 09 Feb 2025 00:44:37 +0000</lastBuildDate><atom:link href="https://www.markhneedham.com/blog/index.xml" rel="self" type="application/rss+xml" />
    <item>
      <title>DuckDB 1.2: SQL gets even friendlier</title>
      <link>https://www.markhneedham.com/blog/2025/02/09/duckdb-1.2-sql-gets-even-friendlier/</link>
      <pubDate>Sun, 09 Feb 2025 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2025/02/09/duckdb-1.2-sql-gets-even-friendlier/</guid>
      <description>DuckDB 1.2 is here, packed with new features to make SQL even more user-friendly. We’re going to explore these features with help from Jeff Sackmann’s tennis dataset. Let’s launch DuckDB and then create a variable referring to one of the CSV files.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>ClickHouse: A hacky way to default parameters in a view</title>
      <link>https://www.markhneedham.com/blog/2024/11/25/clickhouse-view-hacky-default-parameters/</link>
      <pubDate>Mon, 25 Nov 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/11/25/clickhouse-view-hacky-default-parameters/</guid>
      <description>ClickHouse recently added support for runtime provided parameters in views, so I wanted to try it when querying the MidJourney messages dataset. It worked pretty well, but I ran into problems when trying to define default parameters, which is what we’re going to explore in this blog post.
Let’s launch ClickHouse Local:
clickhouse -m --max_http_get_redirects=10 --output_format_pretty_row_numbers=0 We need to set max_http_get_redirects so that it can handle redirects in the Hugging Face URL, and output_format_pretty_row_numbers is so that it won’t put numbers in front of each result row.</description>
    </item>
    
    <item>
      <title>PIVOTing data in ClickHouse and DuckDB</title>
      <link>https://www.markhneedham.com/blog/2024/11/15/pivot-clickhouse-duckdb/</link>
      <pubDate>Fri, 15 Nov 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/11/15/pivot-clickhouse-duckdb/</guid>
      <description>I really like DuckDB’s PIVOT clause and, along with others, wish that ClickHouse supported it too. Sadly it doesn’t, but we can get pretty close to this functionality using ClickHouse’s aggregate function combinators. In this blog post, I’m going to go through each of the examples in the DuckDB documentation and show how to do the equivalent in ClickHouse.
Set up First, we need to set up the sample data.</description>
    </item>
    
    <item>
      <title>LLMs on the command line</title>
      <link>https://www.markhneedham.com/blog/2024/10/25/llms-on-command-line/</link>
      <pubDate>Fri, 25 Oct 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/10/25/llms-on-command-line/</guid>
      <description>I’ve been playing around with Simon Willison’s llm library over the last week and I have to say I love it! If you want to use LLMs on the command line, this is the tool you need.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:
Installing llm Let’s have a look at how to use it, starting with installation.</description>
    </item>
    
    <item>
      <title>Ollama: Multiple prompts on vision models</title>
      <link>https://www.markhneedham.com/blog/2024/10/06/ollama-multi-prompts-vision-models/</link>
      <pubDate>Sun, 06 Oct 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/10/06/ollama-multi-prompts-vision-models/</guid>
      <description>In this blog post, we’re going to learn how to send multiple prompts to vision models when using Ollama. This isn’t super well documented, but it is possible!
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:
Let’s import Ollama:
import ollama We’re going to call the ollama.</description>
    </item>
    
    <item>
      <title>Running OpenAI Whisper Turbo on a Mac with insanely-fast-whisper</title>
      <link>https://www.markhneedham.com/blog/2024/10/02/insanely-fast-whisper-running-openai-whisper-turbo-mac/</link>
      <pubDate>Wed, 02 Oct 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/10/02/insanely-fast-whisper-running-openai-whisper-turbo-mac/</guid>
      <description>A couple of days ago OpenAI released a new version of Whisper, their audio to text model. It’s called Turbo and we can run it on a Mac using the insanely-fast-whisper library.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:
I like trying out these models on podcasts and a recent favourite is The AI Daily Brief, so we’re going to download an MP3 file from a recent episode about some executive departures at OpenAI.</description>
    </item>
    
    <item>
      <title>An intro to rerankers</title>
      <link>https://www.markhneedham.com/blog/2024/09/28/intro-to-rerankers/</link>
      <pubDate>Sat, 28 Sep 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/09/28/intro-to-rerankers/</guid>
      <description>rerankers provides a unified API for various reranking models, including any that use transformers, FlashRank, RankGPT, RankLLM, and more. In this blog, we’ll take it for a spin.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:
But first, let’s remind ourselves what reranking is. A basic RAG pipeline would look like this:</description>
    </item>
    
    <item>
      <title>DuckDB 1.1: Dynamic Column Selection gets even better</title>
      <link>https://www.markhneedham.com/blog/2024/09/22/duckdb-dynamic-column-selection/</link>
      <pubDate>Sun, 22 Sep 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/09/22/duckdb-dynamic-column-selection/</guid>
      <description>DuckDB 1.1 was released a couple of weeks ago and there are a couple of features that make dynamic column selection even better. We’re going to explore those features in this blog.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:
Kaggle’s FIFA 2022 Dataset To demonstrate dynamic column selection, we need a dataset that has a lot of columns, ideally one containing lots of numeric values as well.</description>
    </item>
    
    <item>
      <title>DuckDB: Chaining functions</title>
      <link>https://www.markhneedham.com/blog/2024/08/25/duckdb-chaining-functions/</link>
      <pubDate>Sun, 25 Aug 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/08/25/duckdb-chaining-functions/</guid>
      <description>One of my favourite things about DuckDB is the innovations it’s made in SQL. A recent discovery (for me at least) is that you can chain functions using the dot operator, in the same way you can in many general purpose programming languages. In this blog, we’re going to explore that functionality.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>Searching through AWS Icons</title>
      <link>https://www.markhneedham.com/blog/2024/08/23/searching-aws-icons/</link>
      <pubDate>Fri, 23 Aug 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/08/23/searching-aws-icons/</guid>
      <description>I recently needed to search for an icon in the AWS asset package (https://aws.amazon.com/architecture/icons/) and wanted to share a little script that I wrote. You wouldn’t think that searching for icons should be that hard, but they’re spread across so many folders and sub-folders that you can spend forever trying to find what you want.
First, let’s import some modules:
import base64 import sys import glob import os And then I’m using the following function to render images in the terminal:</description>
    </item>
    
    <item>
      <title>ClickHouse: Specifying config settings</title>
      <link>https://www.markhneedham.com/blog/2024/08/05/clickhouse-config-settings/</link>
      <pubDate>Mon, 05 Aug 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/08/05/clickhouse-config-settings/</guid>
      <description>We recently had a question on ClickHouse Community Slack about configuring the network_compression_method on an individual query basis and across all requests. Let’s see how to do just that in this blog post.
Set up Let’s start by downloading and running the ClickHouse Server:
curl https://clickhouse.com/ | sh ./clickhouse server Output 2024.08.05 12:01:54.701406 [ 85587882 ] {} &amp;lt;Information&amp;gt; Application: Listening for http://[::1]:8123 2024.08.05 12:01:54.701426 [ 85587882 ] {} &amp;lt;Information&amp;gt; Application: Listening for native protocol (tcp): [::1]:9000 2024.</description>
    </item>
    
    <item>
      <title>Hybrid Search in SQL with DuckDB</title>
      <link>https://www.markhneedham.com/blog/2024/07/28/hybrid-search-sql-duckdb/</link>
      <pubDate>Sun, 28 Jul 2024 01:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/07/28/hybrid-search-sql-duckdb/</guid>
      <description>I’ve been playing around with different approaches for Retrieval Augmented Generation (RAG) recently and came across a blog post describing Reciprocal Rank Fusion, a hybrid search technique. In this blog post, we’re going to explore how to apply this method in SQL using DuckDB.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>DuckDB: Create a function in SQL</title>
      <link>https://www.markhneedham.com/blog/2024/07/28/duckdb-create-function-sql/</link>
      <pubDate>Sun, 28 Jul 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/07/28/duckdb-create-function-sql/</guid>
      <description>I’ve been learning about Hybrid Search via this blog post, which describes the Reciprocal Rank Fusion algorithm, and I wanted to implement and use it in a DuckDB query.
The formula for the function is shown below:
RRF(d) = Σ(r ∈ R) 1 / (k + r(d))
Where:
d is a document
R is the set of rankers (retrievers)
k is a constant (typically 60)
r(d) is the rank of document d in ranker r</description>
    </item>
    
    <item>
      <title>ClickHouse: Unknown setting &#39;allow_nullable_key&#39;</title>
      <link>https://www.markhneedham.com/blog/2024/06/27/clickhouse-unknown-setting-allow_nullable_key/</link>
      <pubDate>Thu, 27 Jun 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/06/27/clickhouse-unknown-setting-allow_nullable_key/</guid>
      <description>I’ve been playing around with ClickHouse’s Amazon reviews dataset and ran into an interesting problem when trying to set the allow_nullable_key setting. In this blog post, we’ll learn how and why we might choose to set it.
I started off with the following SQL statement to create a table called reviews based on the structure of the Parquet file:
CREATE TABLE reviews ENGINE = MergeTree ORDER BY review_date EMPTY AS ( SELECT * FROM s3(concat( &amp;#39;https://datasets-documentation.</description>
    </item>
    
    <item>
      <title>Mistral 7B function calling with llama.cpp</title>
      <link>https://www.markhneedham.com/blog/2024/06/23/mistral-7b-function-calling-llama-cpp/</link>
      <pubDate>Sun, 23 Jun 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/06/23/mistral-7b-function-calling-llama-cpp/</guid>
      <description>Mistral AI recently released version 3 of their popular 7B model and this one is fine-tuned for function calling. Function calling is a confusing name because the LLM isn’t doing any function calling itself. Instead, it takes a prompt and can then tell you which function you should call in your code and with which parameters.
In this blog post, we’re going to learn how to use this functionality with llama.</description>
    </item>
    
    <item>
      <title>Side by side LLMs with Ollama and Streamlit</title>
      <link>https://www.markhneedham.com/blog/2024/05/11/side-by-side-local-llms-ollama-streamlit/</link>
      <pubDate>Sat, 11 May 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/05/11/side-by-side-local-llms-ollama-streamlit/</guid>
      <description>The recent 0.1.33 release of Ollama added experimental support for running multiple LLMs or the same LLM in parallel. But, to compare models on the same prompt we need a UI and that’s what we’re going to build in this blog post.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>Semantic Router: Stop LLM chatbots going rogue</title>
      <link>https://www.markhneedham.com/blog/2024/04/14/semantic-router-stop-llm-chatbot-going-rogue/</link>
      <pubDate>Sun, 14 Apr 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/04/14/semantic-router-stop-llm-chatbot-going-rogue/</guid>
      <description>A tricky problem when deploying LLM-based chatbots is working out how to stop them from talking about topics that you don’t want them to talk about. Even with the cleverest prompts, with enough effort and ingenuity, users will figure out a way around the guard rails.
However, I recently came across a library called Semantic Router, which amongst other things, seems to provide a solution to this problem. In this blog post, we’re going to explore Semantic Router and see if we can create a chatbot that only talks about a pre-defined set of topics.</description>
    </item>
    
    <item>
      <title>llama.cpp - ValueError: Failed to create llama_context - ggml-common.h file not found</title>
      <link>https://www.markhneedham.com/blog/2024/03/31/llama-cpp-value-error-llama-context-ggml-common-not-found/</link>
      <pubDate>Sun, 31 Mar 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/03/31/llama-cpp-value-error-llama-context-ggml-common-not-found/</guid>
      <description>I’ve been playing around with the outlines library and needed to install llama.cpp as a result. I ran into trouble when trying to offload model layers to the GPU and in this post, I’ll explain how to install llama.cpp so that you don’t have the same issues.
This was how I installed the library initially:
CMAKE_ARGS=&amp;#34;-DLLAMA_METAL=on&amp;#34; pip install llama-cpp-python And then let’s try to load a GGUF model with some layers offloaded to the GPU:</description>
    </item>
    
    <item>
      <title>DuckDB 0.10: Binder Error: No function matches the given name and argument types</title>
      <link>https://www.markhneedham.com/blog/2024/03/09/duckdb-strptime-binder-error-no-function-matches/</link>
      <pubDate>Sat, 09 Mar 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/03/09/duckdb-strptime-binder-error-no-function-matches/</guid>
      <description>In the 0.10 version of DuckDB, a breaking change was made that stops implicit casting to VARCHAR during function binding. In this blog post, we’re going to look at some ways to work around this change when fixing our DuckDB code from 0.9 or earlier.
I have a CSV file that looks like this:
from &amp;#39;people.csv&amp;#39; select *; Output ┌─────────┬─────────────┐ │ name │ dateOfBirth │ │ varchar │ int64 │ ├─────────┼─────────────┤ │ John │ 19950105 │ └─────────┴─────────────┘ The dateOfBirth column isn’t an int64, but that’s how DuckDB has inferred it.</description>
    </item>
    
    <item>
      <title>Clustering YouTube comments using Ollama Embeddings</title>
      <link>https://www.markhneedham.com/blog/2024/02/27/clustering-youtube-comments-ollama-embeddings-nomic/</link>
      <pubDate>Tue, 27 Feb 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/02/27/clustering-youtube-comments-ollama-embeddings-nomic/</guid>
      <description>One of my favourite tools in the LLM space is Ollama and if you want to learn how to use it, there’s no better place than Matt Williams&amp;#39; YouTube channel. His videos get a lot of comments and they tend to contain a treasure trove of the things that people are thinking about and the questions that they have. Matt recently did a video about embeddings in Ollama and I thought it’d be fun to try to get a high-level overview of what’s happening in the comments section.</description>
    </item>
    
    <item>
      <title>python-youtube: Retrieving multiple pages using page token</title>
      <link>https://www.markhneedham.com/blog/2024/02/26/python-youtube-data-page-token/</link>
      <pubDate>Mon, 26 Feb 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/02/26/python-youtube-data-page-token/</guid>
      <description>I’ve been playing around with the YouTube API to analyse comments on YouTube videos and needed to use pagination to get all the comments. In this blog post, we’ll learn how to do that.
But before we do anything, you’ll need to go to console.developers.google.com, create a project and enable YouTube Data API v3.
Figure 1. YouTube Data API Once you’ve done that, create an API key.
Figure 2. Creating an API key Create an environment variable that contains your API key:</description>
    </item>
    
    <item>
      <title>Using environment variables in ClickHouse queries</title>
      <link>https://www.markhneedham.com/blog/2024/02/23/clickhouse-environment-variables/</link>
      <pubDate>Fri, 23 Feb 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/02/23/clickhouse-environment-variables/</guid>
      <description>For quite some time I’ve been wondering how to get access to an environment variable in ClickHouse Local and finally today I have a solution, which we’ll explore in this blog post.
My reason for wanting to do this is so that I can pass through a ClickHouse Cloud password to use in a remoteSecure function call. I wanted to do this as part of a blog post I wrote showing how to do Hybrid Query Execution with ClickHouse.</description>
    </item>
    
    <item>
      <title>Render a CSV across multiple columns on the terminal/shell</title>
      <link>https://www.markhneedham.com/blog/2024/02/20/shell-render-csv-multiple-columns/</link>
      <pubDate>Tue, 20 Feb 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/02/20/shell-render-csv-multiple-columns/</guid>
      <description>I was recently working with a CSV file that contained a bunch of words and I wanted to render them on the console so that you could see all of them at once without any scrolling. i.e. I wanted the rendering of the CSV file to wrap across columns.
I learned that we can do exactly this using the paste command, so let’s see how to do it.
Imagine we have the CSV file shown below:</description>
    </item>
    
    <item>
      <title>Qdrant/FastEmbed: Content discovery for my blog posts</title>
      <link>https://www.markhneedham.com/blog/2024/02/11/qdrant-fast-embed-content-discovery/</link>
      <pubDate>Sun, 11 Feb 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/02/11/qdrant-fast-embed-content-discovery/</guid>
      <description>I was recently reading Simon Willison’s blog post about embedding algorithms in which he described how he’d used them to create a &amp;#39;related posts&amp;#39; section on his blog. So, of course, I wanted to see whether I could do the same for my blog as well.
Note I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>LLaVA 1.5 vs. 1.6</title>
      <link>https://www.markhneedham.com/blog/2024/02/04/llava-large-multi-modal-model-v1.5-v1.6/</link>
      <pubDate>Sun, 04 Feb 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/02/04/llava-large-multi-modal-model-v1.5-v1.6/</guid>
      <description>LLaVA (or Large Language and Vision Assistant), an open-source large multi-modal model, just released version 1.6. It claims to have improvements over version 1.5, which was released a few months ago:
Increasing the input image resolution to 4x more pixels. This allows it to grasp more visual details. It supports three aspect ratios, up to 672x672, 336x1344, 1344x336 resolution.
Better visual reasoning and OCR capability with an improved visual instruction tuning data mixture.</description>
    </item>
    
    <item>
      <title>Ollama is on PyPi</title>
      <link>https://www.markhneedham.com/blog/2024/01/28/ollama-now-on-pypi/</link>
      <pubDate>Sun, 28 Jan 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/01/28/ollama-now-on-pypi/</guid>
      <description>This week Ollama released a Python/PyPi library to go with their awesome tool for running LLMs on your own machine. You still need to download and run Ollama, but after that you can do almost everything from the library. In this blog post, we’re going to take it for a spin.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>ClickHouse: Configure default output format</title>
      <link>https://www.markhneedham.com/blog/2024/01/19/clickhouse-configure-output-format/</link>
      <pubDate>Fri, 19 Jan 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/01/19/clickhouse-configure-output-format/</guid>
      <description>When running queries with ClickHouse Local, the results are rendered back to the screen in a table format in blocks. This default format is called PrettyCompact and most of the time this works fine, but sometimes you can end up with multiple mini-tables. In this blog post, we’re going to learn how to change the default format so that all the results show in one table.
But first, let’s see how the problem manifests.</description>
    </item>
    
    <item>
      <title>An introduction to Retrieval Augmented Generation</title>
      <link>https://www.markhneedham.com/blog/2024/01/12/intro-to-retrieval-augmented-generation/</link>
      <pubDate>Fri, 12 Jan 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/01/12/intro-to-retrieval-augmented-generation/</guid>
      <description>Retrieval Augmented Generation (RAG) is a technique used with Large Language Models (LLM) where you augment the prompt with data retrieved from a data store so that the LLM can generate a better answer to the question that is being asked. In this blog post, we’re going to learn the basics of RAG by creating a Question and Answer system on top of the 2023 Wimbledon Championships Wikipedia page.</description>
    </item>
    
    <item>
      <title>Pandas: Exclude columns using regex</title>
      <link>https://www.markhneedham.com/blog/2024/01/05/pandas-exclude-columns-regex/</link>
      <pubDate>Fri, 05 Jan 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/01/05/pandas-exclude-columns-regex/</guid>
      <description>After a few months of using ClickHouse, I’ve got quite used to using the SELECT &amp;lt;expr&amp;gt; EXCEPT modifier, which lets you remove columns based on a regular expression. I wanted to do something similar when working with some data in Pandas and in this blog we’ll explore how to do that.
We’re gonna be working with a CSV file of UK energy and gas tariffs for one of the energy providers.</description>
    </item>
    
    <item>
      <title>ClickHouse: Float equality</title>
      <link>https://www.markhneedham.com/blog/2024/01/04/clickhouse-float-equality/</link>
      <pubDate>Thu, 04 Jan 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/01/04/clickhouse-float-equality/</guid>
      <description>I’ve been playing around with NumPy data in ClickHouse this week and wanted to share what I learnt when checking for equality of float values. Let’s get going!
Creating arrays We’re going to use Python’s NumPy library to create 5 arrays containing 10 values each:
import numpy as np rng = np.random.default_rng(seed=42) rng.random(size=(5, 5)) Output array([[0.28138389, 0.29359376, 0.66191651, 0.55703215, 0.78389821], [0.66431354, 0.40638686, 0.81402038, 0.16697292, 0.02271207], [0.09004786, 0.72235935, 0.46187723, 0.16127178, 0.</description>
    </item>
    
    <item>
      <title>nvim: Unable to create directory for swap file - recovery impossible: permission denied</title>
      <link>https://www.markhneedham.com/blog/2024/01/03/nvim-swap-file-permission-denied/</link>
      <pubDate>Wed, 03 Jan 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/01/03/nvim-swap-file-permission-denied/</guid>
      <description>I was playing around with neovim last week and despite installing it via Homebrew, ran into a weird permissions error. In this blog post, I’ll describe the problem I had and how to solve it.
I installed it like this:
brew install nvim And then tried to create a new file:
nvim foo.py Which resulted in the following error:
Output E303: Unable to create directory &amp;#34;/Users/markhneedham/.local/state/nvim&amp;#34; for swap file, recovery impossible: permission denied E303: Unable to open swap file for &amp;#34;foo.</description>
    </item>
    
    <item>
      <title>ClickHouse: How does a number have a set number of decimal places?</title>
      <link>https://www.markhneedham.com/blog/2024/01/02/clickhouse-set-number-decimal-places/</link>
      <pubDate>Tue, 02 Jan 2024 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2024/01/02/clickhouse-set-number-decimal-places/</guid>
      <description>I’ve been working with a dataset in ClickHouse where I compute currency values and I really struggled to figure out how to get numbers whose decimal part is divisible by 10 to have a fixed number of decimal places. If you want to do that too, hopefully, this blog post will help.
Let’s start by seeing what happens if we output the number 12.40
SELECT 12.40 AS number; Output ┌─number─┐ │ 12.</description>
    </item>
    
    <item>
      <title>Experimenting with insanely-fast-whisper</title>
      <link>https://www.markhneedham.com/blog/2023/12/23/insanely-fast-whisper-experiments/</link>
      <pubDate>Sat, 23 Dec 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/12/23/insanely-fast-whisper-experiments/</guid>
      <description>I recently came across insanely-fast-whisper, a CLI tool that you can use to transcribe audio files using OpenAI’s whisper-large-v3 model or other smaller models. In this blog post, I’ll summarise my experience using it to transcribe one of Scott Galloway’s podcast episodes.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>Generating sample JSON data in S3 with shadowtraffic.io</title>
      <link>https://www.markhneedham.com/blog/2023/12/22/sample-data-s3-shadowtraffic/</link>
      <pubDate>Fri, 22 Dec 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/12/22/sample-data-s3-shadowtraffic/</guid>
      <description>I needed to quickly generate some data to write to S3 for a recent video on the ClickHouse YouTube channel and it seemed like a good opportunity to try out ShadowTraffic.
ShadowTraffic is a tool being built by Michael Drogalis and it simulates production traffic based on a JSON file that you provide. Michael is documenting the process of building ShadowTraffic on his Substack newsletter.
Michael gave me a free license to use for a few months as a &amp;#39;thank you&amp;#39; for giving him some feedback on the product, but there is also a free version of the tool.</description>
    </item>
    
    <item>
      <title>litellm and llamafile -  APIError: OpenAIException - File Not Found</title>
      <link>https://www.markhneedham.com/blog/2023/12/14/litellm-apierror-openaiexception-file-not-found/</link>
      <pubDate>Thu, 14 Dec 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/12/14/litellm-apierror-openaiexception-file-not-found/</guid>
      <description>I wanted to get two of my favourite tools in the LLM world - litellm and llamafile - to play nicely and ran into an issue that I’ll explain in this blog post. This should be helpful if you’re trying to wire up other LLM servers to litellm; it’s not specific to llamafile.
Setting up llamafile In case you want to follow along, I downloaded llamafile and MistralAI 7B weights from TheBloke/Mistral-7B-v0.</description>
    </item>
    
    <item>
      <title>ClickHouse: S3Queue Table Engine -  DB::Exception: There is no Zookeeper configuration in server config</title>
      <link>https://www.markhneedham.com/blog/2023/12/13/clickhouse-s3queue-no-zookeeper-configuration/</link>
      <pubDate>Wed, 13 Dec 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/12/13/clickhouse-s3queue-no-zookeeper-configuration/</guid>
      <description>This week I’ve been making a video showing how to use ClickHouse’s S3Queue table engine, which allows streaming import of files in an S3 bucket. The S3Queue table engine was released in version 23.8, but only received &amp;#39;production-ready&amp;#39; status in version 23.11. In this blog post, we’ll walk through the steps to getting this to work locally and the mistakes that I made along the way.
I configured an S3 bucket, added 10 files containing 100,000 rows of JSON each, and made sure that I’d set the AWS_PROFILE environment variable so that ClickHouse Server could read from the bucket.</description>
    </item>
    
    <item>
      <title>Dask: Parallelising file downloads</title>
      <link>https://www.markhneedham.com/blog/2023/12/11/dash-parallelise-file-downloads/</link>
      <pubDate>Mon, 11 Dec 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/12/11/dash-parallelise-file-downloads/</guid>
      <description>Before a recent meetup talk that I did showing how to do analytics on your laptop with ClickHouse Local at Aiven’s Open Source Data Infrastructure Meetup, I needed to download a bunch of Parquet files from Hugging Face’s midjourney-messages dataset. I alternate between using wget/curl or a Python script to do this type of work.
This time I used Python’s requests library and I had the following script which downloads the Parquet files that I haven’t already downloaded.</description>
    </item>
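The skip-what-you-already-have download pattern described in the post above can be sketched with the standard library alone. This is a minimal illustration, not the post’s actual script: the URLs, file names, and the use of `ThreadPoolExecutor` in place of Dask are all assumptions made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path
from urllib.request import urlretrieve


def download(url: str, target_dir: Path) -> Path:
    """Fetch url into target_dir, skipping files that already exist locally."""
    dest = target_dir / url.rsplit("/", 1)[-1]
    if not dest.exists():
        urlretrieve(url, dest)
    return dest


def download_all(urls, target_dir, workers=8):
    """Download the given URLs concurrently and return the local paths."""
    target_dir = Path(target_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda url: download(url, target_dir), urls))
```

With Dask, the same fan-out would use `dask.delayed` around `download`; the thread pool is used here only to keep the sketch dependency-free.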
    
    <item>
      <title>ClickHouse: Tuples - Code: 47. DB::Exception: Missing columns: while processing query:</title>
      <link>https://www.markhneedham.com/blog/2023/12/04/clickhouse-tuples-missing-columns/</link>
      <pubDate>Mon, 04 Dec 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/12/04/clickhouse-tuples-missing-columns/</guid>
      <description>I’ve been playing around with the Midjourney Parquet metadata that I wrote about in my last blog post and struggled quite a bit to get the query to do what I wanted. Come along on a journey with me and we’ll figure it out together.
We’re querying the metadata of a Parquet file that contains the metadata (I know!) of images created by the Midjourney generative AI service.</description>
    </item>
    
    <item>
      <title>Summing columns in remote Parquet files using ClickHouse</title>
      <link>https://www.markhneedham.com/blog/2023/11/15/clickhouse-summing-columns-remote-files/</link>
      <pubDate>Wed, 15 Nov 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/11/15/clickhouse-summing-columns-remote-files/</guid>
      <description>I’m an avid reader of Simon Willison’s TIL blog and enjoyed a recent post showing how to sum the size of all the Midjourney images stored on Discord. He did this by querying a bunch of Parquet files stored on Hugging Face with DuckDB. I was curious whether I could do the same thing using ClickHouse and in this blog post, we’re going to find out.
The dataset that we’re going to use is available at vivym/midjourney-messages.</description>
    </item>
    
    <item>
      <title>ClickHouse - How to get the first &#39;n&#39; values from an array</title>
      <link>https://www.markhneedham.com/blog/2023/11/09/clickhouse-array-first-n-values/</link>
      <pubDate>Thu, 09 Nov 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/11/09/clickhouse-array-first-n-values/</guid>
      <description>I was recently working with some very long arrays in ClickHouse and I wanted to select just a few values so that they didn’t take up the entire screen. The way I thought would &amp;#39;just work&amp;#39; ™ didn’t, so this blog documents how to do it.
If you want to follow along, you’ll need to install ClickHouse. On a Mac, Brew is a pretty good option:
brew install clickhouse Once you’ve done that, launch ClickHouse Local:</description>
    </item>
    
    <item>
      <title>ClickHouse - AttributeError: &#39;NoneType&#39; object has no attribute &#39;array&#39;</title>
      <link>https://www.markhneedham.com/blog/2023/11/08/clickhouse-client-array-nonetype-no-attribute/</link>
      <pubDate>Wed, 08 Nov 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/11/08/clickhouse-client-array-nonetype-no-attribute/</guid>
      <description>I was querying a ClickHouse server from a Python script a couple of days ago and ran into an error message when trying to create a Pandas DataFrame. In this blog, we’ll see the error message and how to fix it.
I’m gonna assume that we have a ClickHouse Server running and we’re going to connect to it like this:
./clickhouse client Output ClickHouse client version 23.10.1.1709 (official build). Connecting to localhost:9000 as user default.</description>
    </item>
    
    <item>
      <title>ClickHouse - DB::Exception:: there is no writeable access storage in user directories (ACCESS_STORAGE_FOR_INSERTION_NOT_FOUND)</title>
      <link>https://www.markhneedham.com/blog/2023/11/07/clickhouse-no-writeable-access-storage/</link>
      <pubDate>Tue, 07 Nov 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/11/07/clickhouse-no-writeable-access-storage/</guid>
      <description>I’ve been working with ClickHouse’s access control/account management as part of a video that I created showing how to log in to a ClickHouse server with an SSH key, but getting it all set up locally was a bit fiddly. In this blog post, we’ll go through the mistakes I made and how to fix them.
I initially tried starting the ClickHouse server:
./clickhouse server Connecting to it with a client:</description>
    </item>
    
    <item>
      <title>ClickHouse: Convert date or datetime to epoch</title>
      <link>https://www.markhneedham.com/blog/2023/11/06/clickhouse-date-to-epoch/</link>
      <pubDate>Mon, 06 Nov 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/11/06/clickhouse-date-to-epoch/</guid>
      <description>I’ve been working with dates in ClickHouse today and I wanted to convert some values into epoch seconds/milliseconds to use with another tool. We’re going to document how to do that in this blog post, for future me if no one else.
Let’s start an instance of ClickHouse Local:
clickhouse local -m And now we’ll write a query that returns the current date/time:
SELECT now() AS time; Output ┌────────────────time─┐ │ 2023-11-06 14:58:19 │ └─────────────────────┘ If we want to convert this value to epoch seconds, we can use the toUnixTimestamp function.</description>
    </item>
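For comparison, the epoch conversion shown above has a direct standard-library analogue in Python. The timestamp is the one from the query output in the post; treating it as UTC is an assumption made for this sketch (ClickHouse displays the server’s time zone).

```python
from datetime import datetime, timezone

# The date/time from the query output above, treated as UTC
dt = datetime(2023, 11, 6, 14, 58, 19, tzinfo=timezone.utc)

epoch_seconds = int(dt.timestamp())  # analogue of ClickHouse's toUnixTimestamp
epoch_millis = epoch_seconds * 1000  # milliseconds variant

print(epoch_seconds)  # 1699282699
```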
    
    <item>
      <title>ClickHouse: Nested type Array(String) cannot be inside Nullable type (ILLEGAL_TYPE_OF_ARGUMENT)</title>
      <link>https://www.markhneedham.com/blog/2023/11/03/clickhouse-nested-type-cannot-be-inside-nullable-type/</link>
      <pubDate>Fri, 03 Nov 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/11/03/clickhouse-nested-type-cannot-be-inside-nullable-type/</guid>
      <description>I’ve been working with some data that’s in CSV format but has tab-separated values in some columns. In this blog post, we’re going to learn how to process that data in ClickHouse.
The CSV file that we’re working with looks like this:
Table 1. data.csv value foo	bar
We’ll launch ClickHouse Local (clickhouse local) and then run the following:
FROM file(&amp;#39;data.csv&amp;#39;, CSVWithNames) SELECT *; Output ┌─value─────┐ │ foo bar │ └───────────┘ Let’s try to split the value field on tab using the splitByString function:</description>
    </item>
    
    <item>
      <title>Poetry: OSError: Precompiled binaries are not available for the current platform. Please reinstall from source</title>
      <link>https://www.markhneedham.com/blog/2023/11/02/poetry-precompiled-binaries-not-available/</link>
      <pubDate>Thu, 02 Nov 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/11/02/poetry-precompiled-binaries-not-available/</guid>
      <description>I’ve been playing around with the CTransformers library recently and getting it installed via Poetry was a bit fiddly. In this post, we’ll run through what I’ve ended up doing.
If we try to add the library in the usual way:
poetry add ctransformers We’ll get the following error:
Output OSError: Precompiled binaries are not available for the current platform. Please reinstall from source using: pip uninstall ctransformers --yes CT_METAL=1 pip install ctransformers --no-binary ctransformers Instead, we need to call the following command to tell Poetry to install the library from source:</description>
    </item>
    
    <item>
      <title>iPython: How to disable autocomplete</title>
      <link>https://www.markhneedham.com/blog/2023/10/29/ipython-disable-autocomplete/</link>
      <pubDate>Sun, 29 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/29/ipython-disable-autocomplete/</guid>
      <description>I’ve been toying with the idea of using iPython as the Python REPL for videos on @LearnDataWithMark, but I wanted to disable the autocomplete functionality as I find it too distracting. In this blog post, I’ll show how to do it.
First, let’s install iPython:
poetry add ipython And now we’ll launch the iPython REPL:
poetry run ipython Output Python 3.11.4 (main, Jun 20 2023, 17:23:00) [Clang 14.0.3 (clang-1403.0.22.14.1)] Type &amp;#39;copyright&amp;#39;, &amp;#39;credits&amp;#39; or &amp;#39;license&amp;#39; for more information IPython 8.</description>
    </item>
    
    <item>
      <title>Poetry: Install does not contain any element</title>
      <link>https://www.markhneedham.com/blog/2023/10/26/poetry-install-does-not-contain-any-element/</link>
      <pubDate>Thu, 26 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/26/poetry-install-does-not-contain-any-element/</guid>
      <description>I’ve run into an interesting error a few times when using the Poetry package manager over the last few weeks and wanted to document it in case anyone else has the same problem. I’m still not sure how to avoid it in the first place, so if you know, please let me know!
Anyway, let’s get started. Imagine we’re creating a new project and we type the following:
$ poetry init It will pop up the following dialogue and we’ll select the defaults, won’t define anything interactively, and will then have it create the file:</description>
    </item>
    
    <item>
      <title>Ollama: Running GGUF Models from Hugging Face</title>
      <link>https://www.markhneedham.com/blog/2023/10/18/ollama-hugging-face-gguf-models/</link>
      <pubDate>Wed, 18 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/18/ollama-hugging-face-gguf-models/</guid>
      <description>GGUF (GPT-Generated Unified Format) has emerged as the de facto standard file format for storing large language models for inference. We are starting to see a lot of models in this format on Hugging Face, many of them uploaded by The Bloke.
One cool thing about GGUF models is that it’s super easy to get them running on your own machine using Ollama. In this blog post, we’re going to look at how to download a GGUF model from Hugging Face and run it locally.</description>
    </item>
    
    <item>
      <title>ClickHouse: Code: 60. DB::Exception: Table does not exist</title>
      <link>https://www.markhneedham.com/blog/2023/10/16/clickhouse-local-table-does-not-exist/</link>
      <pubDate>Mon, 16 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/16/clickhouse-local-table-does-not-exist/</guid>
      <description>I’ve been playing with clickhouse-local again this week and ran into an interesting issue when persisting a table that I thought I’d document for future Mark.
You can install ClickHouse on your machine by running the following command:
curl https://clickhouse.com/ | sh Or you could use Homebrew if you’re working on a Mac:
brew install clickhouse We can then launch clickhouse-local, which lets you run ClickHouse in what I think of as an embedded mode.</description>
    </item>
    
    <item>
      <title>Apache Superset: Refusing to start due to insecure SECRET_KEY</title>
      <link>https://www.markhneedham.com/blog/2023/10/13/apache-superset-refusing-start-insecure-key/</link>
      <pubDate>Fri, 13 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/13/apache-superset-refusing-start-insecure-key/</guid>
      <description>I’ve been trying to install Apache Superset so that I can use it for a demo and ran into an issue with a secret key along the way. In this blog post, I’ll explain how to work around it.
We’re going to be using Poetry and will follow the installing from scratch guide.
First up is installing the library:
poetry add apache-superset And then after initialising the database with poetry run superset db upgrade, we’ll try to create the admin user:</description>
    </item>
    
    <item>
      <title>Ollama: Experiments with few-shot prompting on Llama2 7B</title>
      <link>https://www.markhneedham.com/blog/2023/10/11/ollama-few-shot-prompting-experiments-llama2-7b/</link>
      <pubDate>Wed, 11 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/11/ollama-few-shot-prompting-experiments-llama2-7b/</guid>
      <description>A problem that I’m currently trying to solve is how to work out whether a given sentence is a question. If there’s a question mark on the end we can assume it is a question, but what if the question mark has been left off?
Few-shot prompting is a technique where we provide some examples in our prompt to try to guide the LLM to do what we want. And this seemed like a good opportunity to try it out on Meta’s Llama2 7B Large Language Model using Ollama.</description>
    </item>
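Few-shot prompting, as described above, just means packing labelled examples into the prompt before the real input. A minimal, model-agnostic sketch of building such a prompt; the instruction wording and the example sentences are made up for illustration, and the resulting string would be sent to Ollama (or any LLM) as-is:

```python
def few_shot_prompt(examples, sentence):
    """Build a prompt that shows the model labelled examples before the real input."""
    lines = ["Decide whether each sentence is a question. Answer Yes or No."]
    for text, label in examples:
        lines.append(f"Sentence: {text}")
        lines.append(f"Answer: {label}")
    # The unlabelled input goes last, so the model completes the final Answer
    lines.append(f"Sentence: {sentence}")
    lines.append("Answer:")
    return "\n".join(lines)


examples = [
    ("What time is the match", "Yes"),
    ("The match starts at 9pm", "No"),
]
prompt = few_shot_prompt(examples, "Is it raining in Paris")
```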
    
    <item>
      <title>Poetry/Dagster: ImportError: cannot import name &#39;appengine&#39; from &#39;requests.packages.urllib3.contrib&#39;</title>
      <link>https://www.markhneedham.com/blog/2023/10/09/dagster-poetry-importerror-cannot-import-appengine-requests/</link>
      <pubDate>Mon, 09 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/09/dagster-poetry-importerror-cannot-import-appengine-requests/</guid>
      <description>I’m taking some tentative steps into the world of batch data pipelines and was following Dagster’s DuckDB tutorial when I ran into a dependency issue that I had to work around. In this blog post, I’ll share the steps that I took in case you run into the same issue.
I’m using the Poetry dependency management tool, but I think you’d get the same issue even if you used pip directly.</description>
    </item>
    
    <item>
      <title>Poetry: The current project&#39;s Python requirement is not compatible</title>
      <link>https://www.markhneedham.com/blog/2023/10/05/poetry-project-python-not-compatible/</link>
      <pubDate>Thu, 05 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/05/poetry-project-python-not-compatible/</guid>
      <description>A few times this week I’ve run into an interesting problem with Python version requirements when trying to install various packages. In this blog post, we’ll learn what’s going on and how to fix it.
Our story begins with the innocent creation of a Poetry project:
poetry init Next, we’re going to add dlt, the data loading tool:
poetry add dlt Output Creating virtualenv incompatible-blog-Bp2VMsrx-py3.11 in /Users/markhneedham/Library/Caches/pypoetry/virtualenvs Using version ^0.</description>
    </item>
    
    <item>
      <title>Poetry: Updating a package to a new version</title>
      <link>https://www.markhneedham.com/blog/2023/10/04/poetry-package-update/</link>
      <pubDate>Wed, 04 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/04/poetry-package-update/</guid>
      <description>I’m using the Poetry package manager for all my Python projects these days and wanted to upgrade a library that I installed a few weeks ago. I got myself all tangled up and wanted to write down how to do it for future me.
Let’s create a simple project to demonstrate what to do:
poetry init pyproject.toml [tool.poetry] name = &amp;#34;update-blog&amp;#34; version = &amp;#34;0.1.0&amp;#34; description = &amp;#34;&amp;#34; authors = [&amp;#34;Mark Needham &amp;lt;m.</description>
    </item>
    
    <item>
      <title>Running Mistral AI on my machine with Ollama</title>
      <link>https://www.markhneedham.com/blog/2023/10/03/mistral-ai-own-machine-ollama/</link>
      <pubDate>Tue, 03 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/03/mistral-ai-own-machine-ollama/</guid>
      <description>Last week Mistral AI announced the release of their first Large Language Model (LLM), trained with 7 billion parameters, and better than Meta’s Llama 2 model with 13 billion parameters. For those keeping track, Mistral AI was founded in the summer of 2023 and raised $113m in their seed round.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>DuckDB: Show a list of views</title>
      <link>https://www.markhneedham.com/blog/2023/10/02/duckdb-list-show-views/</link>
      <pubDate>Mon, 02 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/02/duckdb-list-show-views/</guid>
      <description>I recently wanted to get a list of the views that I’d created in a DuckDB database and it took me a while to figure out how to do it. So this blog post is for future Mark more than anyone else!
We’re going to start with the following CSV file:
data/sales.csv date,product_id,quantity,sales_amount 2021-01-01,101,5,50 2021-01-02,102,3,30 2021-02-01,101,4,40 2021-02-02,103,6,60 And now we’ll create a table from the DuckDB CLI:
CREATE TABLE sales AS SELECT * from &amp;#39;data/sales.</description>
    </item>
    
    <item>
      <title>dbt-duckdb: KeyError: &#34;&#39;winner_seed&#39;&#34;</title>
      <link>https://www.markhneedham.com/blog/2023/10/01/dbt-duckdb-key-error/</link>
      <pubDate>Sun, 01 Oct 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/10/01/dbt-duckdb-key-error/</guid>
      <description>I’ve been building a little demo with dbt and DuckDB to transform CSV files from Jeff Sackmann’s tennis dataset and ran into an error that initially puzzled me. In this blog post, we’ll learn how to deal with it.
But first things first, we’re going to install dbt-duckdb as well as the latest version of DuckDB, which at the time of writing is 0.9.0.
pip install dbt-duckdb duckdb I then cloned Mehdi Ouazza’s demo project and adjusted it to work with my dataset.</description>
    </item>
    
    <item>
      <title>GPT 3.5 Turbo vs GPT 3.5 Turbo Instruct</title>
      <link>https://www.markhneedham.com/blog/2023/09/29/openai-gpt-chat-vs-instruct/</link>
      <pubDate>Fri, 29 Sep 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/09/29/openai-gpt-chat-vs-instruct/</guid>
      <description>Last week OpenAI sent out the following email introducing the gpt-3.5-turbo-instruct large language model:
Figure 1. OpenAI announces the gpt-3.5-turbo-instruct LLM I’ve never completely understood the difference between the chat and instruct models, so this seemed like a good time to figure it out. In this blog post, we’re going to give the models 5 tasks to do and then we’ll see how they get on.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>FAISS: Exploring Approximate Nearest Neighbours Cell Probe Methods</title>
      <link>https://www.markhneedham.com/blog/2023/09/14/faiss-approximate-nearest-neighbors-cell-probe/</link>
      <pubDate>Thu, 14 Sep 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/09/14/faiss-approximate-nearest-neighbors-cell-probe/</guid>
      <description>I’ve been learning about vector search in recent weeks and I came across FaceBook’s FAISS library. I wanted to learn the simplest way to do approximate nearest neighbours, and that’s what we’ll be exploring in this blog post.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>kcat: SASL - Java JAAS configuration is not supported</title>
      <link>https://www.markhneedham.com/blog/2023/09/12/kcat-sasl-java-jaas-not-supported/</link>
      <pubDate>Tue, 12 Sep 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/09/12/kcat-sasl-java-jaas-not-supported/</guid>
      <description>I’ve been updating the StarTree Kafka SASL recipe to use Pinot 0.12 and ran into an error while trying to have it use kcat to ingest data into Kafka. In this blog post, we’ll learn how I fixed it.
The initial recipe was ingesting data into Kafka using kafka-console-consumer.sh, which uses the Java Kafka client. I’m using this Kafka client config file:
kafka-config/kafka_client.conf security.protocol=SASL_PLAINTEXT sasl.mechanism=PLAIN sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required \ username=&amp;#34;alice&amp;#34; \ password=&amp;#34;alice-secret&amp;#34;; And, we use this script to ingest data from a data generator:</description>
    </item>
    
    <item>
      <title>How to run a Kotlin script</title>
      <link>https://www.markhneedham.com/blog/2023/09/07/how-to-run-kotlin-script/</link>
      <pubDate>Thu, 07 Sep 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/09/07/how-to-run-kotlin-script/</guid>
      <description>I was recently helping Tim get a Pinot data-loading Kotlin script working and it took me a while to figure out the best way to run it. In this blog post, I’ll share the solution we came up with.
But first things first, we need to install Kotlin if it’s not already installed. I use a tool called SDKMAN for all things JVM, so I’m gonna run the following command:</description>
    </item>
    
    <item>
      <title>Quix Streams: Process certain number of Kafka messages</title>
      <link>https://www.markhneedham.com/blog/2023/09/05/quix-streams-process-n-kafka-messages/</link>
      <pubDate>Tue, 05 Sep 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/09/05/quix-streams-process-n-kafka-messages/</guid>
      <description>In a recent demo, I wanted to use Quix Streams to process a specified number of messages from a Kafka topic, write a message to another stream, and then exit the Quix app. This is an unusual use of Quix Streams, so it took me a while to figure out how to do it.
Let’s assume we have a Kafka broker running. We’ll create a couple of topics using the rpk tool:</description>
    </item>
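Stripped of the Quix specifics, “process a certain number of messages and then exit” is just a bounded consumer loop. A stdlib sketch of that shape; the message source and `handle` callback here are placeholders, not Quix Streams APIs:

```python
import itertools


def process_first_n(messages, n, handle):
    """Consume at most n messages from an iterator, then stop."""
    processed = 0
    for message in itertools.islice(messages, n):
        handle(message)
        processed += 1
    return processed
```

With a real client, `messages` would be the consumer’s message iterator and `handle` would write to the output stream before the app shuts down.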
    
    <item>
      <title>JupyterLab 4.0.5: Scroll output with keyboard shortcut</title>
      <link>https://www.markhneedham.com/blog/2023/09/03/jupyterlab-scroll-output-keyboard-shortcut/</link>
      <pubDate>Sun, 03 Sep 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/09/03/jupyterlab-scroll-output-keyboard-shortcut/</guid>
      <description>In the latest version of Jupyter Notebook/Lab (at least), the output of each cell is shown in full, regardless of how long it is. I wanted to limit the height of the output and then scroll through it within that inner window, ideally by triggering a keyboard shortcut.
I learnt how to do this with the help of Stack Overflow. First, you need to open the settings editor by typing Cmd + , on a Mac or by clicking on that screen from the top menu:</description>
    </item>
    
    <item>
      <title>pyarrow: pyarrow.lib.ArrowNotImplementedError: Filter argument must be boolean type</title>
      <link>https://www.markhneedham.com/blog/2023/08/23/pyarrow-filter-argument-boolean-type/</link>
      <pubDate>Wed, 23 Aug 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/08/23/pyarrow-filter-argument-boolean-type/</guid>
      <description>I recently wanted to filter a pyarrow table and ran into troubles when trying to use the filter syntax that I’m used to from DuckDB. In this blog post, I’ll explain my mistake and how to fix it.
First, let’s install pyarrow:
pip install pyarrow And now we’re going to create a table that has a few countries and their corresponding continents:
import pyarrow as pa countries = pa.</description>
    </item>
    
    <item>
      <title>Python: TypeError: Instance and class checks can only be used with @runtime_checkable protocols</title>
      <link>https://www.markhneedham.com/blog/2023/08/21/python-typeerrr-instance-class-check-runtime-checkable/</link>
      <pubDate>Mon, 21 Aug 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/08/21/python-typeerrr-instance-class-check-runtime-checkable/</guid>
      <description>I’ve been playing around with ChromaDB and I wanted to programmatically get a list of the embedding functions, which was a little trickier than I expected. In this blog post, we’ll explore how I failed and then succeeded at this task.
But first, let’s install ChromaDB:
pip install chromadb The embedding functions live in the chromadb.utils.embedding_functions module. So my first thought was that I could list all the things defined in that module and then check which ones were a sub class of EmbeddingFunction:</description>
    </item>
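The error in the title above comes from running issubclass/isinstance checks against a typing.Protocol that isn’t marked @runtime_checkable. A self-contained reproduction; the protocol and embedder classes here are stand-ins invented for the sketch, not ChromaDB’s actual EmbeddingFunction:

```python
from typing import Protocol, runtime_checkable


class PlainEmbeddingFunction(Protocol):
    def __call__(self, texts): ...


@runtime_checkable
class CheckableEmbeddingFunction(Protocol):
    def __call__(self, texts): ...


class MyEmbedder:
    def __call__(self, texts):
        return [[0.0] for _ in texts]


# A class check against a plain Protocol raises TypeError...
try:
    issubclass(MyEmbedder, PlainEmbeddingFunction)
    raised = False
except TypeError:
    raised = True

# ...but succeeds once the Protocol is decorated with @runtime_checkable
is_embedder = issubclass(MyEmbedder, CheckableEmbeddingFunction)
```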
    
    <item>
      <title>JupyterLab 4.0.5: Adding execution time to cell</title>
      <link>https://www.markhneedham.com/blog/2023/08/20/jupyterlab-time-cell-execution/</link>
      <pubDate>Sun, 20 Aug 2023 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/08/20/jupyterlab-time-cell-execution/</guid>
      <description>I’ve been using Jupyter Lab notebooks in some of my recent videos on Learn Data with Mark and I wanted to show cell execution timings so that viewers would have an idea of how long things were taking. I thought I’d need to use a custom timer, but it turns out there’s quite a nice plug-in, which we’ll learn about in this blog post.
The plug-in is called jupyterlab-execute-time and it shows a live view of the time that a cell takes to execute, as well as showing the execution time afterward.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Experimenting with the StarTree Index</title>
      <link>https://www.markhneedham.com/blog/2023/07/28/apache-pinot-experimenting-with-startree-index/</link>
      <pubDate>Fri, 28 Jul 2023 11:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/28/apache-pinot-experimenting-with-startree-index/</guid>
      <description>My colleagues Sandeep Dabade and Kulbir Nijjer recently wrote a three-part blog post series about the StarTree index, an Apache Pinot indexing technique that dynamically builds a tree structure to maintain aggregates across a group of dimensions. I hadn’t used this index before and wanted to give it a try, and in this blog post I’ll share what I learned.
I’ve put all the code in the startreedata/pinot-recipes GitHub repository in case you want to try it out yourself.</description>
    </item>
    
    <item>
      <title>Python/Poetry: Library not loaded: no such file, not in dyld cache</title>
      <link>https://www.markhneedham.com/blog/2023/07/27/poetry-library-not-loaded-no-such-file-dyld-cache/</link>
      <pubDate>Thu, 27 Jul 2023 11:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/27/poetry-library-not-loaded-no-such-file-dyld-cache/</guid>
      <description>As I mentioned in a previous blog post, I’ve been using Python’s Poetry library, but today it stopped working! In this blog post, I’ll explain what happened and how I got it working again.
It started off innocent enough, with me trying to create a new project:
poetry init But instead of seeing the usual interactive wizard, I got the following error:
Output dyld[20269]: Library not loaded: /opt/homebrew/Cellar/python@3.11/3.11.3/Frameworks/Python.framework/Versions/3.11/Python Referenced from: &amp;lt;1B2377F9-2187-39A9-AA98-20E438024DE2&amp;gt; /Users/markhneedham/Library/Application Support/pypoetry/venv/bin/python Reason: tried: &amp;#39;/opt/homebrew/Cellar/python@3.</description>
    </item>
    
    <item>
      <title>OpenAI/GPT: Returning consistent/valid JSON from a prompt</title>
      <link>https://www.markhneedham.com/blog/2023/07/27/return-consistent-predictable-valid-json-openai-gpt/</link>
      <pubDate>Thu, 27 Jul 2023 01:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/27/return-consistent-predictable-valid-json-openai-gpt/</guid>
      <description>When using OpenAI it can be tricky to get it to return a consistent response for a prompt. In this blog post, we’re going to learn how to use functions to return a consistent JSON format for a basic sentiment analysis prompt.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>How to delete a Kafka topic</title>
      <link>https://www.markhneedham.com/blog/2023/07/26/how-to-delete-kafka-topic/</link>
      <pubDate>Wed, 26 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/26/how-to-delete-kafka-topic/</guid>
      <description>A few years ago I wrote a blog post showing how to delete a Kafka topic when running on Docker and while that approach still works, I think I’ve now got a better way. And that’s what we’re going to learn about in this blog post.
Spin up Kafka Cluster We’re going to spin up Kafka using the following Docker Compose file:
docker-compose.yml version: &amp;#34;3&amp;#34; services: zookeeper: image: zookeeper:3.8.0 hostname: zookeeper container_name: zookeeper-delete ports: - &amp;#34;2181:2181&amp;#34; environment: ZOOKEEPER_CLIENT_PORT: 2181 ZOOKEEPER_TICK_TIME: 2000 kafka: image: wurstmeister/kafka:latest restart: unless-stopped container_name: &amp;#34;kafka-delete&amp;#34; ports: - &amp;#34;9092:9092&amp;#34; expose: - &amp;#34;9093&amp;#34; depends_on: - zookeeper environment: KAFKA_ZOOKEEPER_CONNECT: zookeeper-delete:2181/kafka KAFKA_BROKER_ID: 0 KAFKA_ADVERTISED_HOST_NAME: kafka-delete KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://kafka-delete:9093,OUTSIDE://localhost:9092 KAFKA_LISTENERS: PLAINTEXT://0.</description>
    </item>
    
    <item>
      <title>Confluent Kafka: DeprecationWarning: AvroProducer has been deprecated. Use AvroSerializer instead.</title>
      <link>https://www.markhneedham.com/blog/2023/07/25/confluent-kafka-avroproducer-deprecated-use-avroserializer/</link>
      <pubDate>Tue, 25 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/25/confluent-kafka-avroproducer-deprecated-use-avroserializer/</guid>
      <description>I’ve been creating a demo showing how to ingest Avro-encoded data from Apache Kafka into Apache Pinot and ran into a deprecation warning. In this blog post, I’ll show how to update code using the Confluent Kafka Python client to get rid of that warning.
I started by installing the following libraries:
pip install confluent-kafka avro urllib3 requests And then my code to publish an Avro encoded event to Kafka looked like this:</description>
    </item>
    
    <item>
      <title>VSCode: Adding Poetry Python Interpreter</title>
      <link>https://www.markhneedham.com/blog/2023/07/24/vscode-poetry-python-interpreter/</link>
      <pubDate>Mon, 24 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/24/vscode-poetry-python-interpreter/</guid>
      <description>I’ve been trying out Python’s Poetry dependency management tool recently and I really like it, but couldn’t figure out how to get it set up as VSCode’s Python interpreter. In this blog post, we’ll learn how to do that.
One way to add the Python interpreter in VSCode is to press Cmd+Shift+p and then type Python Interpreter. If you select the first result, you’ll see something like the following:
Figure 1.</description>
    </item>
    
    <item>
      <title>Docker: Failed to create network: Error response from daemon: could not find an available, non-overlapping IPv4 address pool among the defaults to assign to the network</title>
      <link>https://www.markhneedham.com/blog/2023/07/20/docker-network-could-not-find-non-overlapping-address-pool/</link>
      <pubDate>Thu, 20 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/20/docker-network-could-not-find-non-overlapping-address-pool/</guid>
      <description>I use Docker for pretty much every demo I create and this sometimes results in me running out of IP addresses to serve all those networks. In this blog post, we’ll learn how to diagnose and solve this issue.
Our story starts with the following command on a new project:
docker compose up Usually this purrs along nicely and all our components spin up just fine, but today is not our lucky day and we get the following error:</description>
    </item>
    
    <item>
      <title>Plotly: Visualising a normal distribution given average and standard deviation</title>
      <link>https://www.markhneedham.com/blog/2023/07/19/plotly-normal-distribution-average-stdev/</link>
      <pubDate>Wed, 19 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/19/plotly-normal-distribution-average-stdev/</guid>
      <description>I’ve been playing around with Microsoft’s TrueSkill algorithm, which attempts to quantify the skill of a player using Bayesian inference. A rating in this system is a Gaussian distribution that starts with an average of 25 and a confidence of 8.333. I wanted to visualise various ratings using Plotly and that’s what we’ll be doing in this blog post.
To save you from having to install TrueSkill, we’re going to create a named tuple to simulate a TrueSkill Rating object:</description>
    </item>
    
    <item>
      <title>Redpanda: Configure pruning/retention of data</title>
      <link>https://www.markhneedham.com/blog/2023/07/18/redpanda-prune-retention/</link>
      <pubDate>Tue, 18 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/18/redpanda-prune-retention/</guid>
      <description>I wanted to test how Apache Pinot deals with data being truncated from the underlying stream from which it’s consuming, so I’ve been trying to work out how to prune data in Redpanda. In this blog post, I’ll share what I’ve learnt so far.
We’re going to spin up a Redpanda cluster using the following Docker Compose file:
docker-compose.yml version: &amp;#39;3.7&amp;#39; services: redpanda: container_name: &amp;#34;redpanda-pruning&amp;#34; image: docker.redpanda.com/vectorized/redpanda:v22.2.2 command: - redpanda start - --smp 1 - --overprovisioned - --node-id 0 - --kafka-addr PLAINTEXT://0.</description>
    </item>
    
    <item>
      <title>Puppeteer: Button click doesn&#39;t work when zoomed in</title>
      <link>https://www.markhneedham.com/blog/2023/07/17/puppeteer-button-click-not-working-after-zoom/</link>
      <pubDate>Mon, 17 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/17/puppeteer-button-click-not-working-after-zoom/</guid>
      <description>I’m still playing around with Puppeteer, a Node.js library that provides an API to control Chrome/Chromium. I want to load the Pinot UI zoomed to 250% and then write and run some queries.
We can install Puppeteer by running the following command:
npm i puppeteer-core I then created the file drive_pinot.mjs and added the following code, which opens the Pinot query console and then clicks on the &amp;#39;Run Query&amp;#39; button:</description>
    </item>
    
    <item>
      <title>Puppeteer: Unsupported command-line flag: --enabled-blink-features=IdleDetection.</title>
      <link>https://www.markhneedham.com/blog/2023/07/13/puppeteer-unsupported-flag-enabled-blink-features-idledetection/</link>
      <pubDate>Thu, 13 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/13/puppeteer-unsupported-flag-enabled-blink-features-idledetection/</guid>
      <description>In many of the StarTree recipe videos that I’ve worked on, I show how to write queries in the Pinot UI. If I wrote these queries manually there’d be way too many typos, so I drive the UI using a script. I’ve recently been exploring whether I can do this using a Node.js library called Puppeteer and wanted to share a warning message that I ran into early doors.</description>
    </item>
    
    <item>
      <title>Redpanda: Viewing consumer group offsets from __consumer_offsets</title>
      <link>https://www.markhneedham.com/blog/2023/07/12/redpanda-consumer-group-offsets/</link>
      <pubDate>Wed, 12 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/12/redpanda-consumer-group-offsets/</guid>
      <description>Redpanda supports consumer groups, which are sets of consumers that cooperate to consume data from topics. The consumers in a group are assigned a partition and they keep track of the last consumed offset in the __consumer_offsets topic. I wanted to see how many messages had been consumed by a consumer group and that’s what we’ll explore in this post.
My first thought was to query the __consumer_offsets topic using rpk topic consume.</description>
    </item>
    
    <item>
      <title>Quix Streams: Consuming and Producing JSON messages</title>
      <link>https://www.markhneedham.com/blog/2023/07/11/quix-streams-consume-produce-json-messages/</link>
      <pubDate>Tue, 11 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/11/quix-streams-consume-produce-json-messages/</guid>
      <description>I’ve been meaning to take Quix Streams for a spin for a while and got the chance while building a recent demo. Quix Streams is a library for building streaming applications on time-series data, but I wanted to use it to do some basic consuming and producing of JSON messages. That’s what we’re going to do in this blog post.
We’re going to use Redpanda to store our messages. We’ll launch a Redpanda instance using the following Docker Compose file:</description>
    </item>
    
    <item>
      <title>Python: Re-import module</title>
      <link>https://www.markhneedham.com/blog/2023/07/07/python-reimport-module/</link>
      <pubDate>Fri, 07 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/07/python-reimport-module/</guid>
      <description>I often write little Python scripts that import code from other local modules and a common problem I have when using the Python REPL is that I update the code in the other modules and then can’t use the new functionality without restarting the REPL and re-importing everything. At least so I thought! It turns out there is a way to refresh those modules and that’s what we’ll be exploring in this blog post.</description>
    </item>
    
    <item>
      <title>ClickHouse: How to unpack or unnest an array</title>
      <link>https://www.markhneedham.com/blog/2023/07/03/clickhouse-unpack-unnest-array/</link>
      <pubDate>Mon, 03 Jul 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/07/03/clickhouse-unpack-unnest-array/</guid>
      <description>I recently came across clickhouse-local via this article in the MotherDuck monthly newsletter and I wanted to give it a try on my expected goals dataset. One of the first things that I wanted to do was unpack an array and in this blog post, we’ll learn how to do that.
I installed ClickHouse by running the following command:
curl https://clickhouse.com/ | sh And then launched the clickhouse-local CLI like this:</description>
    </item>
    
    <item>
      <title>Detecting and splitting scenes in a video</title>
      <link>https://www.markhneedham.com/blog/2023/06/30/detecting-splitting-scenes-video/</link>
      <pubDate>Fri, 30 Jun 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/30/detecting-splitting-scenes-video/</guid>
      <description>When editing videos for my YouTube channel, Learn Data with Mark, I spend a bunch of time each week chopping up a screencast into scenes that I then line up with a separately recorded voice-over. I was curious whether I could automate the chopping-up process and that’s what we’re going to explore in this blog post.
I started out by asking ChatGPT the following question:
ChatGPT Prompt I want to chop up a demo for a YouTube video into smaller segments.</description>
    </item>
    
    <item>
      <title>Python: All about the next function</title>
      <link>https://www.markhneedham.com/blog/2023/06/28/python-next-function-iterator/</link>
      <pubDate>Wed, 28 Jun 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/28/python-next-function-iterator/</guid>
      <description>Yesterday I wrote a blog post about some different ways to take the first element from a Python list. Afterward I was chatting to my new rubber duck, ChatGPT, which suggested the next function on an iterator as an alternative approach. And so that’s what we’re going to explore in this blog post.
The next function gets the first value from an iterator and optionally returns a provided default value if the iterator is empty.</description>
    </item>
    
    <item>
      <title>Python: Get the first item from a collection, ignore the rest</title>
      <link>https://www.markhneedham.com/blog/2023/06/27/python-get-first-item-collection-ignore-rest/</link>
      <pubDate>Tue, 27 Jun 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/27/python-get-first-item-collection-ignore-rest/</guid>
      <description>When writing Python scripts, I often find myself wanting to take the first item from a collection and ignore the rest of the values. I usually use something like values[0] to take the first value from the list, but I was curious whether I could do better by using destructuring. That’s what we’re going to explore in this blog post.
We’ll start with a list that contains some names:</description>
    </item>
    
    <item>
      <title>Running a Hugging Face Large Language Model (LLM) locally on my laptop</title>
      <link>https://www.markhneedham.com/blog/2023/06/23/hugging-face-run-llm-model-locally-laptop/</link>
      <pubDate>Fri, 23 Jun 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/23/hugging-face-run-llm-model-locally-laptop/</guid>
      <description>I’ve been playing around with a bunch of Large Language Models (LLMs) on Hugging Face and while the free inference API is cool, it can sometimes be busy, so I wanted to learn how to run the models locally. That’s what we’ll be doing in this blog post.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>LangChain: 1 validation error for LLMChain - value is not a valid dict (type=type_error.dict)</title>
      <link>https://www.markhneedham.com/blog/2023/06/23/langchain-validation-error-llmchain-value-not-valid-dict/</link>
      <pubDate>Fri, 23 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/23/langchain-validation-error-llmchain-value-not-valid-dict/</guid>
      <description>I surely can’t be the first to make the mistake that I’m about to describe and I expect I won’t be the last! I’m still swimming in the LLM waters and I was trying to get GPT4All to play nicely with LangChain.
I wrote the following code to create an LLM chain in LangChain so that every question would use the same prompt template:
from langchain import PromptTemplate, LLMChain from gpt4all import GPT4All llm = GPT4All( model_name=&amp;#34;ggml-gpt4all-j-v1.</description>
    </item>
    
    <item>
      <title>GPT4All/LangChain: Model.__init__() got an unexpected keyword argument &#39;ggml_model&#39; (type=type_error)</title>
      <link>https://www.markhneedham.com/blog/2023/06/22/gpt4all-langchain-unexpected-keyword-ggml_model/</link>
      <pubDate>Thu, 22 Jun 2023 04:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/22/gpt4all-langchain-unexpected-keyword-ggml_model/</guid>
      <description>I’m starting to realise that things move insanely fast in the world of LLMs (Large Language Models) and you will run into issues because you aren’t using the latest version of libraries. I say this because I’ve been following Sami Maameri’s blog post which explains how to run an LLM on your own machine and ran into an error, which we’ll explore in this blog post.
Sami’s post is based around a library called GPT4All, but he also uses LangChain to glue things together.</description>
    </item>
    
    <item>
      <title>Chroma/LangChain: &#39;NoneType&#39; object has no attribute &#39;info&#39;</title>
      <link>https://www.markhneedham.com/blog/2023/06/22/chroma-nonetype-object-no-attribute-info/</link>
      <pubDate>Thu, 22 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/22/chroma-nonetype-object-no-attribute-info/</guid>
      <description>Following on from a blog post that I wrote yesterday about doing similarity search with ChromaDB, I noticed an odd error message being printed as the script was exiting. In this blog post, we’ll explore what was going on.
To recap, I have the following code to find chunks of YouTube transcripts that are most similar to an input query:
test_chroma.py from langchain.embeddings import HuggingFaceEmbeddings from langchain.vectorstores import Chroma hf_embeddings = HuggingFaceEmbeddings(model_name=&amp;#39;sentence-transformers/all-MiniLM-L6-v2&amp;#39;) store = Chroma(collection_name=&amp;#34;transcript&amp;#34;, persist_directory=&amp;#34;db&amp;#34;, embedding_function=hf_embeddings) result = store.</description>
    </item>
    
    <item>
      <title>Chroma/LangChain: Index not found, please create an instance before querying</title>
      <link>https://www.markhneedham.com/blog/2023/06/21/chroma-index-not-found-create-instance-querying/</link>
      <pubDate>Wed, 21 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/21/chroma-index-not-found-create-instance-querying/</guid>
      <description>Somewhat belatedly, I’ve been playing around with LangChain and HuggingFace to spike a tool that lets me ask questions about Tim Berglund’s Real-Time Analytics podcast.
I’m using the Chroma database to store vectors of chunks of the transcript so that I can find appropriate sections to feed to the Large Language Model to help with answering my questions. I ran into an initially perplexing error while building this out, which we’re going to explore in this blog post.</description>
    </item>
    
    <item>
      <title>DuckDB/SQL: Convert string in YYYYmmdd format to Date</title>
      <link>https://www.markhneedham.com/blog/2023/06/20/duckdb-sql-string-date/</link>
      <pubDate>Tue, 20 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/20/duckdb-sql-string-date/</guid>
      <description>I’ve been working with a data set that represents dates as strings in the format &amp;#39;YYYYmmdd&amp;#39; and I wanted to convert those values to Dates in DuckDB. In this blog post, we’ll learn how to do that.
Let’s create a small table with a single column that represents dates of birth:
create table players (dob VARCHAR); insert into players values(&amp;#39;20080203&amp;#39;), (&amp;#39;20230708&amp;#39;); We can write the following query to return the rows in the table:</description>
    </item>
    
    <item>
      <title>Hugging Face: Using `max_length`&#39;s default (20) to control the generation length. This behaviour is deprecated</title>
      <link>https://www.markhneedham.com/blog/2023/06/19/huggingface-max-length-generation-length-deprecated/</link>
      <pubDate>Mon, 19 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/19/huggingface-max-length-generation-length-deprecated/</guid>
      <description>I’ve been trying out some of the Hugging Face tutorials and came across an interesting warning message while playing around with the google/flan-t5-large model. In this blog post, we’ll learn how to get rid of that warning.
I was running a variation of the getting started example:
from transformers import T5Tokenizer, T5ForConditionalGeneration tokenizer = T5Tokenizer.from_pretrained(&amp;#34;google/flan-t5-large&amp;#34;) model = T5ForConditionalGeneration.from_pretrained(&amp;#34;google/flan-t5-large&amp;#34;) input_text = &amp;#34;Who is the UK Prime Minister? Explain step by step&amp;#34; input_ids = tokenizer(input_text, return_tensors=&amp;#34;pt&amp;#34;).</description>
    </item>
    
    <item>
      <title>DuckDB/SQL: Transpose columns to rows with UNPIVOT</title>
      <link>https://www.markhneedham.com/blog/2023/06/13/duckdb-sql-transpose-columns-to-rows-unpivot/</link>
      <pubDate>Tue, 13 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/13/duckdb-sql-transpose-columns-to-rows-unpivot/</guid>
      <description>I’ve been playing around with the Kaggle European Soccer dataset, which contains, amongst other things, players and their stats in the FIFA video game. I wanted to compare the stats of Ronaldo and Messi, which is where this story begins.
I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>GitHub: Get a CSV containing my pull requests (PRs)</title>
      <link>https://www.markhneedham.com/blog/2023/06/12/github-list-pull-requests-csv/</link>
      <pubDate>Mon, 12 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/12/github-list-pull-requests-csv/</guid>
      <description>I wanted to get a list of my GitHub pull requests (PRs) and commits, which was surprisingly difficult to do. I’m sure it must be possible to get this data from the API, but it was a lot easier to figure out how to do so with the GitHub CLI.
This blog post explains how to use the GitHub CLI in the macOS terminal. If you’re trying to do this on Windows, see Get a CSV of all my pull requests from Github using Github CLI and PowerShell.</description>
    </item>
    
    <item>
      <title>Creating LinkedIn Carousel/Slides</title>
      <link>https://www.markhneedham.com/blog/2023/06/08/linkedin-slides-carousel/</link>
      <pubDate>Thu, 08 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/08/linkedin-slides-carousel/</guid>
      <description>If you’ve been using LinkedIn recently, you’ve likely seen those posts where people share slides in a kind of carousel that you can horizontally scroll. I wanted to create one to explain Apache Pinot’s Upserts feature, but I wasn’t sure how.
Since it’s 2023, I started by asking ChatGPT:
Figure 1. ChatGPT doesn’t know about LinkedIn Carousel Unfortunately ChatGPT doesn’t know how to do it, but Harrison Avisto does and was happy to teach me.</description>
    </item>
    
    <item>
      <title>DuckDB/SQL: Pivot - 0 if null</title>
      <link>https://www.markhneedham.com/blog/2023/06/07/duckdb-sql-pivot-0-if-null/</link>
      <pubDate>Wed, 07 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/07/duckdb-sql-pivot-0-if-null/</guid>
      <description>I’ve been learning all about the PIVOT function that was recently added in DuckDB and I ran into an issue where lots of the cells in my post-PIVOT table were null values. In this blog post, we’ll learn how to replace those nulls with 0s (or indeed any other value).
Setup I’m working with Jeff Sackmann’s tennis dataset, which I loaded by running the following query:
CREATE OR REPLACE TABLE matches AS SELECT * FROM read_csv_auto( list_transform( range(1968, 2023), y -&amp;gt; &amp;#39;https://raw.</description>
    </item>
    
    <item>
      <title>Kafka/Kubernetes: Failed to resolve: nodename nor servname provided, or not known</title>
      <link>https://www.markhneedham.com/blog/2023/06/06/kafka-kubernetes-failed-resolve-nodename-servname-not-known/</link>
      <pubDate>Tue, 06 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/06/kafka-kubernetes-failed-resolve-nodename-servname-not-known/</guid>
      <description>I’ve been trying out the Running Pinot in Kubernetes tutorial and ran into a problem trying to write data to Kafka. In this blog we’ll explore how I got around that problem.
I’m using Helm with Kubernetes and started a Kafka service by running the following:
helm repo add kafka https://charts.bitnami.com/bitnami helm install -n pinot-quickstart kafka kafka/kafka --set replicas=1,zookeeper.image.tag=latest I waited until the service had started and then ran the following command to port forward the Kafka service’s port 9092 to port 9092 on my host OS:</description>
    </item>
    
    <item>
      <title>Python: Working with tuples in lambda expressions</title>
      <link>https://www.markhneedham.com/blog/2023/06/06/python-lambda-expression-tuple/</link>
      <pubDate>Tue, 06 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/06/python-lambda-expression-tuple/</guid>
      <description>I’m still playing around with data returned by Apache Pinot’s HTTP API and I wanted to sort a dictionary of segment names by partition id and index. In this blog post we’re going to look into how to do that.
We’ll start with the following dictionary:
segments = { &amp;#34;events3__4__1__20230605T1335Z&amp;#34;: &amp;#34;CONSUMED&amp;#34;, &amp;#34;events3__4__13__20230605T1335Z&amp;#34;: &amp;#34;CONSUMED&amp;#34;, &amp;#34;events3__4__20__20230605T1335Z&amp;#34;: &amp;#34;CONSUMING&amp;#34; } As I mentioned above, I want to sort the dictionary’s items by partition id and index, which are embedded inside the key name.</description>
    </item>
    
    <item>
      <title>Python: Padding a string</title>
      <link>https://www.markhneedham.com/blog/2023/06/05/python-pad-string/</link>
      <pubDate>Mon, 05 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/05/python-pad-string/</guid>
      <description>I’ve been writing some scripts to parse data from Apache Pinot’s HTTP API and I wanted to format the values stored in a map to make them more readable. In this blog post, we’ll look at some ways that I did that.
I started with a map that looked a bit like this:
segments = { &amp;#34;events3__4__1__20230605T1335Z&amp;#34;: &amp;#34;CONSUMED&amp;#34;, &amp;#34;events3__4__20__20230605T1335Z&amp;#34;: &amp;#34;CONSUMING&amp;#34; } And then I iterated over and printed each item like this:</description>
    </item>
    
    <item>
      <title>DuckDB: Generate dummy data with user defined functions (UDFs)</title>
      <link>https://www.markhneedham.com/blog/2023/06/02/duckdb-dummy-data-user-defined-functions/</link>
      <pubDate>Fri, 02 Jun 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/06/02/duckdb-dummy-data-user-defined-functions/</guid>
      <description>In the 0.8 release of DuckDB, they added functionality that lets you add your own functions when using the Python package. I wanted to see if I could use it to generate dummy data, so that’s what we’re going to do in this blog post.
Note I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>Debezium: Capture changes from MySQL</title>
      <link>https://www.markhneedham.com/blog/2023/05/31/debezium-capture-changes-mysql/</link>
      <pubDate>Wed, 31 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/31/debezium-capture-changes-mysql/</guid>
      <description>I’ve been working on a Real-Time Analytics workshop that I’m going to be presenting at the ODSC Europe conference in June 2023 and I wanted to have Debezium publish records from a MySQL database without including the schema.
I’m using the debezium/connect:2.3 Docker image to run Debezium locally and I have a MySQL database running with the hostname mysql on port 3306. Below is the way that I configured this:</description>
    </item>
    
    <item>
      <title>Node.js: Minifying JSON documents</title>
      <link>https://www.markhneedham.com/blog/2023/05/30/nodejs-minify-json-documents/</link>
      <pubDate>Tue, 30 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/30/nodejs-minify-json-documents/</guid>
      <description>I often need to minify the schema and table config files that you use to configure Apache Pinot so that they don’t take up so much space. After doing this manually for ages, I came across the json-stringify-pretty-compact library, which speeds up the process.
We can install it like this:
npm install json-stringify-pretty-compact And then I have the following script:
minify.mjs import pretty from &amp;#39;json-stringify-pretty-compact&amp;#39;; let inputData = &amp;#39;&amp;#39;; process.</description>
    </item>
    
    <item>
      <title>DuckDB: Ingest a bunch of CSV files from GitHub</title>
      <link>https://www.markhneedham.com/blog/2023/05/25/duckdb-ingest-csv-files-github/</link>
      <pubDate>Thu, 25 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/25/duckdb-ingest-csv-files-github/</guid>
      <description>Jeff Sackmann’s tennis_atp repository is one of the best collections of tennis data and I wanted to ingest the ATP Tour singles matches using the DuckDB CLI. In this blog post we’ll learn how to do that.
Usually when I’m ingesting data into DuckDB I’ll specify the files that I want to ingest using the wildcard syntax. In this case that would mean running a query like this:
CREATE OR REPLACE TABLE matches AS SELECT * FROM &amp;#34;https://raw.</description>
    </item>
    
    <item>
      <title>DuckDB/SQL: Create a list of numbers</title>
      <link>https://www.markhneedham.com/blog/2023/05/24/duckdb-sql-create-list-numbers/</link>
      <pubDate>Wed, 24 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/24/duckdb-sql-create-list-numbers/</guid>
      <description>While in DuckDB land, I wanted to create a list of numbers, just like you can with Cypher’s range function. After a bit of searching that resulted in very complex solutions, I came across the Postgres generate_series function, which does the trick.
We can use it in place of a table, like this:
SELECT * FROM generate_series(1, 10); Table 1. Output generate_series 1
2
3
4
5
6
7</description>
    </item>
    
    <item>
      <title>Arc Browser: Building a plugin (Boost) with help from ChatGPT</title>
      <link>https://www.markhneedham.com/blog/2023/05/23/arc-browser-creating-boost-plugin-with-chatgpt/</link>
      <pubDate>Tue, 23 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/23/arc-browser-creating-boost-plugin-with-chatgpt/</guid>
      <description>I’ve been using the Arc Browser for a couple of months now and one of my favourite things is the simplicity of the plugin (or as they call it, &amp;#39;Boost&amp;#39;) functionality.
I wanted to port over a Chrome bookmark that I use to capture the podcasts that I’ve listened to on Player.FM. In this blog post I’ll show how ChatGPT helped me convert the bookmark code to an Arc Boost.</description>
    </item>
    
    <item>
      <title>Venkat - An inline code snippet execution extension for VS Code (Made in GPT-4)</title>
      <link>https://www.markhneedham.com/blog/2023/05/17/venkat-inline-code-snippet-execution-vs-code-execution/</link>
      <pubDate>Wed, 17 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/17/venkat-inline-code-snippet-execution-vs-code-execution/</guid>
      <description>(Co-authored with Michael Hunger)
Venkat Subramaniam is a legendary speaker on the tech conference circuit whose presentations are famous for executing arbitrary code snippets and showing the results as a tooltip directly in the editor. This makes it really great for videos or talks as you don’t need a second output terminal to run your code and you can just continue explaining what you’re doing. The results go away afterwards, so you don’t need to worry about that.</description>
    </item>
    
    <item>
      <title>Cropping a video using FFMPEG</title>
      <link>https://www.markhneedham.com/blog/2023/05/15/cropping-video-ffmpeg/</link>
      <pubDate>Mon, 15 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/15/cropping-video-ffmpeg/</guid>
      <description>I needed to crop a video that I used as part of a video on my YouTube channel, Learn Data With Mark, and Camtasia kept rendering a black screen. So I had to call for FFMPEG!
Cropping the bottom of a video My initial video was 2160 x 3840 but I didn’t need the bottom 1920 pixels because I’m using that part of the screen for a video of me.</description>
    </item>
    
    <item>
      <title>Python: Naming slices</title>
      <link>https://www.markhneedham.com/blog/2023/05/13/python-naming-slices/</link>
      <pubDate>Sat, 13 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/13/python-naming-slices/</guid>
      <description>Another gem from Fluent Python is that you can name slices. How did I not know that?!
Let’s have a look at how it works using an example of a Vehicle Identification Number, which has 17 characters that act as a unique identifier for a vehicle. Different parts of that string mean different things.
So given the following VIN:
vin = &amp;#34;2B3HD46R02H210893&amp;#34; We can extract components like this:
print(f&amp;#34;&amp;#34;&amp;#34; World manufacturer identifier: {vin[0:3]} Vehicle Descriptor: {vin[3:9]} Vehicle Identifier: {vin[9:17]} &amp;#34;&amp;#34;&amp;#34;.</description>
    </item>
    
    <item>
      <title>Python 3.10: Pattern matching with match/case</title>
      <link>https://www.markhneedham.com/blog/2023/05/09/python-pattern-matching-match-case/</link>
      <pubDate>Tue, 09 May 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/05/09/python-pattern-matching-match-case/</guid>
      <description>I’ve been reading Fluent Python and learnt about pattern matching with the match/case statement, introduced in Python 3.10. You can use it in places where you’d otherwise use if, elif, else statements.
I created a small example to understand how it works. The following function takes in a list where the first argument should be foo, followed by a variable number of arguments, which we print to the console:</description>
    </item>
    
    <item>
      <title>DuckDB/SQL: Get decade from date</title>
      <link>https://www.markhneedham.com/blog/2023/04/20/duckdb-sql-decade-from-date/</link>
      <pubDate>Thu, 20 Apr 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/04/20/duckdb-sql-decade-from-date/</guid>
      <description>Working with dates in SQL can sometimes be a bit tricky, especially when you need to extract specific information, like the decade a date belongs to. In this blog post, we’ll explore how to easily obtain the decade from a date using DuckDB, a lightweight and efficient SQL database engine.
First, install DuckDB and launch it:
./duckdb Next, we’re going to create a movies table that has columns for title and releaseDate:</description>
    </item>
    
    <item>
      <title>DuckDB/SQL: Convert epoch to timestamp with timezone</title>
      <link>https://www.markhneedham.com/blog/2023/04/05/duckdb-sql-convert-epoch-timestamp-timezone/</link>
      <pubDate>Wed, 05 Apr 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/04/05/duckdb-sql-convert-epoch-timestamp-timezone/</guid>
      <description>I’ve been playing around with the Citi Bike Stations dataset on Kaggle with DuckDB and ran into trouble when trying to convert a column containing epoch timestamps to a timestamp with timezone support. In this blog we’ll learn how to do that, which will at least be helpful to future me, if no one else!
The dataset contains 4GB worth of CSV files, but I’ve just downloaded a few of them manually for now.</description>
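For comparison, here is how the same conversion looks in plain Python with the standard library (the post does it in DuckDB SQL; the epoch value below is an illustrative placeholder, not taken from the dataset):

```python
from datetime import datetime, timezone

# An epoch value in seconds; passing tz=... yields a timezone-aware
# datetime rather than a naive local one.
epoch_seconds = 1680700000
ts = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
print(ts.isoformat())  # 2023-04-05T13:06:40+00:00
```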
    </item>
    
    <item>
      <title>Tennis Head to Head with DuckDB and Streamlit</title>
      <link>https://www.markhneedham.com/blog/2023/03/31/tennis-head-to-head-duckdb-streamlit/</link>
      <pubDate>Fri, 31 Mar 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/03/31/tennis-head-to-head-duckdb-streamlit/</guid>
      <description>In this blog post we’re going to learn how to build an application to compare the matches between two ATP tennis players. DuckDB and Streamlit will be our partners in crime for this mission.
Set up To get started, let’s create a virtual environment:
python -m venv .venv
source .venv/bin/activate
And now install some libraries:
pip install duckdb streamlit streamlit-searchbox
And now let’s open a file, app.py, and import the packages:</description>
    </item>
    
    <item>
      <title>DuckDB/Python: Cannot combine LEFT and RIGHT relations of different connections!</title>
      <link>https://www.markhneedham.com/blog/2023/03/20/duckdb-cannot-combine-left-right-relations/</link>
      <pubDate>Mon, 20 Mar 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/03/20/duckdb-cannot-combine-left-right-relations/</guid>
      <description>I’ve been playing around with DuckDB over the weekend and ran into an interesting problem while using the Relational API in the Python package. We’re going to explore that in this blog post.
Set up To get started, let’s install DuckDB:
pip install duckdb
And now let’s open a Python shell and import the package:
import duckdb
Next, let’s create a DuckDB connection and import the httpfs module, which we’ll use in just a minute:</description>
    </item>
    
    <item>
      <title>Apache Pinot: Geospatial - java.nio.BufferUnderflowException: null</title>
      <link>https://www.markhneedham.com/blog/2023/03/10/apache-pinot-geospatial-buffer-underflow/</link>
      <pubDate>Fri, 10 Mar 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/03/10/apache-pinot-geospatial-buffer-underflow/</guid>
      <description>I’ve been working on a blog post showing how to use Geospatial indexes in Apache Pinot and ran into an interesting exception that I’ll explain in this blog post.
Set up But first, let’s take a look at the structure of the data that I’m ingesting from Apache Kafka. Below is an example of one of those events:
{
  &amp;#34;trainCompany&amp;#34;: &amp;#34;London Overground&amp;#34;,
  &amp;#34;atocCode&amp;#34;: &amp;#34;LO&amp;#34;,
  &amp;#34;lat&amp;#34;: 51.541615,
  &amp;#34;lon&amp;#34;: -0.122528896,
  &amp;#34;ts&amp;#34;: &amp;#34;2023-03-10 11:35:20&amp;#34;,
  &amp;#34;trainId&amp;#34;: &amp;#34;202303107145241&amp;#34;
}
As you’ve probably guessed, I’m importing the locations of trains in the UK.</description>
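An event like that can be sanity-checked with Python's standard json module before it is wired into an ingestion pipeline (a quick check of my own, not part of the post):

```python
import json

# The sample event from the Kafka topic, parsed into a Python dict.
event = json.loads("""
{
  "trainCompany": "London Overground",
  "atocCode": "LO",
  "lat": 51.541615,
  "lon": -0.122528896,
  "ts": "2023-03-10 11:35:20",
  "trainId": "202303107145241"
}
""")
print(event["lat"], event["lon"])  # 51.541615 -0.122528896
```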
    </item>
    
    <item>
      <title>DuckDB: Join based on maximum value in other table</title>
      <link>https://www.markhneedham.com/blog/2023/02/01/duckdb-join-max-value-other-table/</link>
      <pubDate>Wed, 01 Feb 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/02/01/duckdb-join-max-value-other-table/</guid>
      <description>In this blog post we’re going to learn how to write a SQL query to join two tables where one of the tables has multiple rows for each key. We want to select only the rows that contain the most recent (or maximum) value from that table.
Our story begins with a YouTube video that I created showing how to query the European Soccer SQLite database with DuckDB. This database contains lots of different tables, but we are only interested in Player and Player_Attributes.</description>
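The shape of the problem, independent of SQL, is "keep only the newest row per key". A small Python sketch of that idea (the rows below are invented for illustration; the post solves it with a DuckDB join):

```python
def latest_per_key(rows, key="id", order="date"):
    # For each key, keep the row with the maximum order value.
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        if current is None or row[order] > current[order]:
            latest[row[key]] = row
    return latest

rows = [
    {"id": 1, "date": "2015-01-01", "rating": 80},
    {"id": 1, "date": "2016-01-01", "rating": 83},
    {"id": 2, "date": "2015-06-01", "rating": 75},
]
print(latest_per_key(rows)[1]["rating"])  # 83
```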
    </item>
    
    <item>
      <title>Flink SQL: Could not execute SQL statement. Reason: java.io.IOException: Corrupt Debezium JSON message</title>
      <link>https://www.markhneedham.com/blog/2023/01/24/flink-sql-could-not-execute-sql-statement-corrupt-debezium-message/</link>
      <pubDate>Tue, 24 Jan 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/01/24/flink-sql-could-not-execute-sql-statement-corrupt-debezium-message/</guid>
      <description>As part of a JFokus workshop that I’m working on I wanted to create a Flink table around a Kafka stream that I’d populated from MySQL with help from Debezium. In this blog post I want to show how to do this and explain an error that I encountered along the way.
To start, we have a products table in MySQL that’s publishing events to Apache Kafka. We can see the fields in this event by running the following command:</description>
    </item>
    
    <item>
      <title>Flink SQL: Exporting nested JSON to a Kafka topic</title>
      <link>https://www.markhneedham.com/blog/2023/01/24/flink-sql-export-nested-json-kafka/</link>
      <pubDate>Tue, 24 Jan 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/01/24/flink-sql-export-nested-json-kafka/</guid>
      <description>I’ve been playing around with Flink as part of a workshop that I’m doing at JFokus in a couple of weeks and I wanted to export some data from Flink to Apache Kafka in a nested format. In this blog post we’ll learn how to do that.
Setup We’re going to be using the following Docker Compose config:
docker-compose.yml
version: &amp;#34;3&amp;#34;
services:
  zookeeper:
    image: zookeeper:latest
    container_name: zookeeper
    hostname: zookeeper
    ports:
      - &amp;#34;2181:2181&amp;#34;
    environment:
      ZOO_MY_ID: 1
      ZOO_PORT: 2181
      ZOO_SERVERS: server.</description>
    </item>
    
    <item>
      <title>Exporting CSV files to Parquet file format with Pandas, Polars, and DuckDB</title>
      <link>https://www.markhneedham.com/blog/2023/01/06/export-csv-parquet-pandas-polars-duckdb/</link>
      <pubDate>Fri, 06 Jan 2023 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2023/01/06/export-csv-parquet-pandas-polars-duckdb/</guid>
      <description>I was recently trying to convert a CSV file to Parquet format and came across a StackOverflow post that described a collection of different options. My CSV file was bigger than the amount of memory I had available, which ruled out some of the methods. In this blog post we’re going to walk through some options for exporting big CSV files to Parquet format.
Note I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>kcat/jq: Reached end of topic at offset: exiting</title>
      <link>https://www.markhneedham.com/blog/2022/12/06/kcat-jq-reached-end-of-topic-exiting/</link>
      <pubDate>Tue, 06 Dec 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/12/06/kcat-jq-reached-end-of-topic-exiting/</guid>
      <description>I’ve recently been working with Debezium to get the Pizza Shop product catalogue from MySQL into Apache Kafka and ran into an issue when querying the resulting stream using kcat and jq. In this blog I’ll show how I worked around that problem.
I configured Debezium to write any changes to the products table into the mysql.pizzashop.products topic. I then queried this topic to find the changes for just one of the products:</description>
    </item>
    
    <item>
      <title>Python: Sorting lists of dictionaries with sortedcontainers</title>
      <link>https://www.markhneedham.com/blog/2022/12/02/python-sorting-lists-dictionaries-sortedcontainers/</link>
      <pubDate>Fri, 02 Dec 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/12/02/python-sorting-lists-dictionaries-sortedcontainers/</guid>
      <description>I was recently working on a Kafka streams data generator, where I only wanted to publish events once the time on those events had been reached. To solve this problem I needed a sorted list and in this blog post we’re going to explore how I went about doing this.
Note I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
    </item>
    
    <item>
      <title>Blogging for Google: Why I write about error messages</title>
      <link>https://www.markhneedham.com/blog/2022/11/22/blogging-for-google-error-messages/</link>
      <pubDate>Tue, 22 Nov 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/11/22/blogging-for-google-error-messages/</guid>
      <description>Writing blog posts that aim to go viral on social media is a well-known content strategy, but in this post, I want to persuade you that you should blog for Google as well.
Blog for Google? What does blogging for Google even mean?
The easiest demonstration is to look at a screenshot from the Google Console Insights report I was sent last week. This section of the report shows the most used search terms that result in someone ending up on my blog.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Inserts from SQL - Unable to get tasks states map - ClassNotFoundException: &#39;org.apache.pinot.plugin.filesystem.S3PinotFS&#39;</title>
      <link>https://www.markhneedham.com/blog/2022/11/18/apache-pinot-inserts-sql-unable-get-tasks-states-map-classnotfoundexception-s3pinotfs/</link>
      <pubDate>Fri, 18 Nov 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/11/18/apache-pinot-inserts-sql-unable-get-tasks-states-map-classnotfoundexception-s3pinotfs/</guid>
      <description>I recently wrote a post on the StarTree blog describing the inserts from SQL feature that was added in Apache Pinot 0.11, and while writing it I came across some interesting exceptions due to configuration mistakes I’d made. In this post we’re going to describe one of those exceptions.
To recap, I was trying to ingest a bunch of JSON files from an S3 bucket using the following SQL query:</description>
    </item>
    
    <item>
      <title>Apache Pinot: Inserts from SQL - Unable to get tasks states map - No task is generated for table</title>
      <link>https://www.markhneedham.com/blog/2022/11/18/apache-pinot-inserts-sql-unable-get-tasks-states-map-no-task-generated-for-table/</link>
      <pubDate>Fri, 18 Nov 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/11/18/apache-pinot-inserts-sql-unable-get-tasks-states-map-no-task-generated-for-table/</guid>
      <description>I recently wrote a post on the StarTree blog describing the inserts from SQL feature that was added in Apache Pinot 0.11, and while writing it I came across some interesting exceptions due to configuration mistakes I’d made. In this post we’re going to describe one of those exceptions.
To recap, I was trying to ingest a bunch of JSON files from an S3 bucket using the following SQL query:</description>
    </item>
    
    <item>
      <title>Apache Pinot: Inserts from SQL - Unable to get tasks states map - NullPointerException</title>
      <link>https://www.markhneedham.com/blog/2022/11/18/apache-pinot-inserts-sql-unable-get-tasks-states-map-nullpointerexception/</link>
      <pubDate>Fri, 18 Nov 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/11/18/apache-pinot-inserts-sql-unable-get-tasks-states-map-nullpointerexception/</guid>
      <description>I recently wrote a post on the StarTree blog describing the inserts from SQL feature that was added in Apache Pinot 0.11, and while writing it I came across some interesting exceptions due to configuration mistakes I’d made. In this post we’re going to describe one of those exceptions.
To recap, I was trying to ingest a bunch of JSON files from an S3 bucket using the following SQL query:</description>
    </item>
    
    <item>
      <title>Diffing Apache Parquet schemas with DuckDB</title>
      <link>https://www.markhneedham.com/blog/2022/11/17/duckdb-diff-parquet-schema/</link>
      <pubDate>Thu, 17 Nov 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/11/17/duckdb-diff-parquet-schema/</guid>
      <description>I’ve been playing around with DuckDB, the new hotness in the analytics space, over the last month, and my friend Michael Hunger asked whether you could use it to compute a diff of Apache Parquet schemas.
Challenge accepted!
Note I’ve created a video showing how to do this on my YouTube channel, Learn Data with Mark, so if you prefer to consume content through that medium, I’ve embedded it below:</description>
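The post does the diff in SQL over DuckDB's Parquet metadata; reduced to its essence, diffing two column-name-to-type mappings looks like this (the schemas below are invented for illustration):

```python
def diff_schemas(a, b):
    # Compare two {column: type} mappings and report what changed.
    added = sorted(k for k in b if k not in a)
    removed = sorted(k for k in a if k not in b)
    changed = sorted(k for k in a if k in b and a[k] != b[k])
    return {"added": added, "removed": removed, "changed": changed}

old = {"id": "INT64", "name": "BYTE_ARRAY"}
new = {"id": "INT64", "name": "INT32", "age": "INT32"}
print(diff_schemas(old, new))
# {'added': ['age'], 'removed': [], 'changed': ['name']}
```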
    </item>
    
    <item>
      <title>Apache Pinot: Unable to render templates on ingestion job spec template file</title>
      <link>https://www.markhneedham.com/blog/2022/11/14/apache-pinot-unable-to-render-templates/</link>
      <pubDate>Mon, 14 Nov 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/11/14/apache-pinot-unable-to-render-templates/</guid>
      <description>I was recently trying to ingest some JSON files into Apache Pinot from an S3 bucket and came across an exception when trying to pass a variable to the LaunchDataIngestionJob command.
I was using the following ingestion job specification:
config/job-spec.yml
executionFrameworkSpec:
  name: &amp;#39;standalone&amp;#39;
  segmentGenerationJobRunnerClassName: &amp;#39;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentGenerationJobRunner&amp;#39;
  segmentTarPushJobRunnerClassName: &amp;#39;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentTarPushJobRunner&amp;#39;
  segmentUriPushJobRunnerClassName: &amp;#39;org.apache.pinot.plugin.ingestion.batch.standalone.SegmentUriPushJobRunner&amp;#39;
jobType: SegmentCreationAndTarPush
inputDirURI: &amp;#39;s3://marks-st-cloud-bucket/events/&amp;#39;
includeFileNamePattern: &amp;#39;glob:**/*.json&amp;#39;
outputDirURI: &amp;#39;/data&amp;#39;
overwriteOutput: true
pinotFSSpecs:
  - scheme: s3
    className: org.apache.pinot.plugin.filesystem.S3PinotFS
    configs:
      region: &amp;#39;eu-west-2&amp;#39;
  - scheme: file
    className: org.</description>
    </item>
    
    <item>
      <title>Java: FileSystems.getDefault().getPathMatcher: IllegalArgumentException</title>
      <link>https://www.markhneedham.com/blog/2022/11/11/java-file-systems-path-matcher/</link>
      <pubDate>Fri, 11 Nov 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/11/11/java-file-systems-path-matcher/</guid>
      <description>I was debugging something in the Apache Pinot code earlier this week and came across the FileSystems.getDefault().getPathMatcher function, which didn’t work quite how I expected.
The function creates a PathMatcher that you can use to match against Paths. I was passing through a value of *.json, which was then resulting in code similar to this:
import java.nio.file.FileSystems;
import java.nio.file.Path;
import java.nio.file.PathMatcher;

class Main {
    public static void main(String args[]) {
        PathMatcher matcher = FileSystems.</description>
    </item>
    
    <item>
      <title>Vercel: Redirect wildcard (nested) paths</title>
      <link>https://www.markhneedham.com/blog/2022/07/27/vercel-redirect-wildcards-nested-paths/</link>
      <pubDate>Wed, 27 Jul 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/07/27/vercel-redirect-wildcards-nested-paths/</guid>
      <description>We’re deploying the StarTree developer site, dev.startree.ai, to Vercel, and recently needed to do some redirects of a few pages. I initially added individual redirects for each page, but there were eventually too many pages and I wanted to automate it. In this post we’ll learn how to do that.
Figure 1. Vercel: Redirect wildcard (nested) paths We wanted to redirect everything under https://dev.startree.ai/docs/thirdeye to https://dev.startree.ai/docs/startree-enterprise-edition/startree-thirdeye/ and started off by using wildcard path matching, as seen in the vercel.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Import JSON data from a CSV file - Illegal Json Path: $[&#39;id&#39;] does not match document</title>
      <link>https://www.markhneedham.com/blog/2022/07/21/apache-pinot-import-json-data-csv-file-illegal-json-path-does-not-match-document/</link>
      <pubDate>Thu, 21 Jul 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/07/21/apache-pinot-import-json-data-csv-file-illegal-json-path-does-not-match-document/</guid>
      <description>I’ve been working on an Apache Pinot dataset where I ingested a JSON document stored in a CSV file. I made a mistake with the representation of the JSON and it took me a while to figure out what I’d done wrong.
We’ll go through it in this blog post.
Figure 1. Apache Pinot: Import JSON data from a CSV file - Illegal Json Path: $[&amp;#39;id&amp;#39;] does not match document Setup We’re going to spin up a local instance of Pinot and Kafka using the following Docker compose config:</description>
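The usual pitfall when embedding a JSON document in a CSV column is escaping: serialising with json.dumps and letting the csv module quote the field avoids hand-rolled quoting. This is a general sketch of that approach, not the exact fix from the post:

```python
import csv
import io
import json

payload = {"id": 7, "name": "Pinot"}

# Write a row whose second column is a JSON document; the csv module
# quotes the field and doubles embedded quotes automatically.
buffer = io.StringIO()
csv.writer(buffer).writerow([1, json.dumps(payload)])

# Reading it back recovers the original JSON string intact.
row = next(csv.reader(io.StringIO(buffer.getvalue())))
print(json.loads(row[1])["name"])  # Pinot
```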
    </item>
    
    <item>
      <title>Docusaurus: Side menu on custom page</title>
      <link>https://www.markhneedham.com/blog/2022/07/11/docusaurus-side-menu-custom-page/</link>
      <pubDate>Mon, 11 Jul 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/07/11/docusaurus-side-menu-custom-page/</guid>
      <description>I’ve been working with Docusaurus to build the dev.startree.ai website over the last few months and I wanted to add a custom page with a sidebar similar to the one that gets automatically generated on documentation pages.
All the examples I could find showed how to create a splash page, so it took me a while to figure out how to do what I wanted, but in this post we’ll learn how to do it.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Skipping periodic task: Task: PinotTaskManager</title>
      <link>https://www.markhneedham.com/blog/2022/06/23/apache-pinot-skipping-periodic-task-pinot-task-manager/</link>
      <pubDate>Thu, 23 Jun 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/06/23/apache-pinot-skipping-periodic-task-pinot-task-manager/</guid>
      <description>As I mentioned in my last blog post, I’ve been working on an Apache Pinot recipe showing how to ingest data from S3 and after I’d got that working I moved onto using the SegmentGenerationAndPushTask to poll S3 and ingest files automatically.
It took me longer than it should have to get this working and hopefully this blog post will help you avoid the problems that I had.
Figure 1.</description>
    </item>
    
    <item>
      <title>docker exec: Passing in environment variables</title>
      <link>https://www.markhneedham.com/blog/2022/06/16/docker-exec-environment-variables/</link>
      <pubDate>Thu, 16 Jun 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/06/16/docker-exec-environment-variables/</guid>
      <description>I’ve been working on an Apache Pinot recipe showing how to ingest data from S3 and I needed to pass in my AWS credentials to the docker exec command that I was running. It wasn’t difficult to do, but took me a little while to figure out.
Figure 1. docker exec: Passing in environment variables The command that I was running looked like this:
docker exec \
  -it pinot-controller bin/pinot-admin.</description>
    </item>
    
    <item>
      <title>Dash: Configurable dcc.Interval</title>
      <link>https://www.markhneedham.com/blog/2022/04/23/dash-configurable-dcc-interval/</link>
      <pubDate>Sat, 23 Apr 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/04/23/dash-configurable-dcc-interval/</guid>
      <description>As I mentioned in my blog post about building a Real-Time Crypto Dashboard, I’ve recently been working with the Dash low-code framework for building interactive data apps.
I was using the dcc.Interval component to automatically refresh components on the page and wanted to make the refresh interval configurable. In this blog post we’ll learn how to do that.
Figure 1. Dash: Configurable dcc.Interval Setup Let’s first setup our Python environment:</description>
    </item>
    
    <item>
      <title>Apache Pinot: Speeding up queries with IdSets</title>
      <link>https://www.markhneedham.com/blog/2022/04/08/apache-pinot-speeding-up-queries-id-set/</link>
      <pubDate>Fri, 08 Apr 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/04/08/apache-pinot-speeding-up-queries-id-set/</guid>
      <description>As I continue to build an Apache Pinot demo using CryptoWatch data, I found myself needing to optimise some queries so that the real-time dashboard would render more quickly. I did this using IdSets and in this blog post we’ll learn about those and how to use them.
Figure 1. Apache Pinot: Speeding up queries with IdSets Pinot Schema For the purpose of this blog post we don’t need to know how to configure the Pinot schema and tables, but we do need to know that we’re working with trades and pairs tables, whose schemas are shown below:</description>
    </item>
    
    <item>
      <title>Apache Pinot: Lookup Join - 500 Error - Unsupported function: lookup with 4 parameters</title>
      <link>https://www.markhneedham.com/blog/2022/04/05/apache-pinot-lookup-join-internal-error-unsupported-function/</link>
      <pubDate>Tue, 05 Apr 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/04/05/apache-pinot-lookup-join-internal-error-unsupported-function/</guid>
      <description>I’m currently working on an Apache Pinot demo using data from Crypto Watch, in which I was using the lookup function and had a bug in my query that didn’t return the clearest error message. In this blog post we’ll have a look at the query and how to fix it.
Figure 1. Apache Pinot: Lookup Join - 500 Error - Unsupported function: lookup with 4 parameters The query that I was writing was using the lookup function to return the name of the base asset in a transaction:</description>
    </item>
    
    <item>
      <title>Apache Pinot: Failed to generate segment - Input path {} does not exist</title>
      <link>https://www.markhneedham.com/blog/2022/03/17/apache-pinot-failed-to-generated-segment-input-path-does-not-exist/</link>
      <pubDate>Thu, 17 Mar 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/03/17/apache-pinot-failed-to-generated-segment-input-path-does-not-exist/</guid>
      <description>In this blog post we’re going to learn how to work around a bug when trying to ingest CSV files with the same name into Apache Pinot. I came across this issue while writing a recipe showing how to import data files from different directories.
Figure 1. Apache Pinot: Failed to generate segment - Input path {} does not exist Setup We’re going to spin up a local instance of Pinot and Kafka using the following Docker compose config:</description>
    </item>
    
    <item>
      <title>Apache Pinot: Deleting instances in a bad state</title>
      <link>https://www.markhneedham.com/blog/2022/02/21/apache-pinot-delete-instances-bad-state/</link>
      <pubDate>Mon, 21 Feb 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/02/21/apache-pinot-delete-instances-bad-state/</guid>
      <description>Sometimes when I start up a local Pinot cluster after a hard shutdown (caused by restarting my computer), I notice that the Pinot Data Explorer shows controllers, brokers, or servers in a bad state. In this blog post we’ll see how to get rid of those bad instances.
Figure 1. Apache Pinot: Deleting instances in a bad state The screenshot below shows several instances in the bad state.
Figure 2.</description>
    </item>
    
    <item>
      <title>Streamlit: Overwrite previous value in a loop</title>
      <link>https://www.markhneedham.com/blog/2022/02/19/streamlit-overwrite-previous-value-loop/</link>
      <pubDate>Sat, 19 Feb 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/02/19/streamlit-overwrite-previous-value-loop/</guid>
      <description>I was recently building a Streamlit app in which I was looping through a stream of values and wanted to only print out the most recent value. In this blog post we’ll learn how to do that.
Figure 1. Streamlit: Overwrite previous value in a loop Setup If you want to play along you’ll need to create a virtual environment and install Streamlit:
python -m venv venv
source venv/bin/activate
pip install streamlit
Streamlit App
Now, create a file app.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Resetting a segment after an invalid JSON Transformation</title>
      <link>https://www.markhneedham.com/blog/2022/01/31/pinot-resetting-segment-invalid-json-transformation/</link>
      <pubDate>Mon, 31 Jan 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/01/31/pinot-resetting-segment-invalid-json-transformation/</guid>
      <description>I recently had a typo in a Pinot ingestion transformation function and wanted to have Pinot re-process the Kafka stream without having to restart all the things. In this blog post we’ll learn how to do that.
Figure 1. Apache Pinot: Resetting a segment after an invalid JSON Transformation Setup We’re going to spin up a local instance of Pinot and Kafka using the following Docker compose config:
docker-compose.yml version: &amp;#39;3.</description>
    </item>
    
    <item>
      <title>Kafka: Writing data to a topic from the command line</title>
      <link>https://www.markhneedham.com/blog/2022/01/22/kafka-writing-data-topic-command-line/</link>
      <pubDate>Sat, 22 Jan 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/01/22/kafka-writing-data-topic-command-line/</guid>
      <description>I’ve been doing more Apache Pinot documentation - this time covering the JSON functions - and I needed to quickly write some data into Kafka to test things out. I’d normally do that using the Python Kafka client, but this time I wanted to do it using only command line tools. So that’s what we’ll be doing in this blog post and it’s more for future me than anyone else!</description>
    </item>
    
    <item>
      <title>Apache Pinot: Sorted indexes on real-time tables</title>
      <link>https://www.markhneedham.com/blog/2022/01/20/apache-pinot-sorted-indexes-realtime-tables/</link>
      <pubDate>Thu, 20 Jan 2022 02:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/01/20/apache-pinot-sorted-indexes-realtime-tables/</guid>
      <description>I’ve recently been learning all about Apache Pinot’s sorted forward indexes, and in my first blog post I explained how they work for offline tables. In this blog post we’ll learn how sorted indexes work with real-time tables.
Figure 1. Apache Pinot: Sorted indexes on real-time tables Launch Components We’re going to spin up a local instance of Pinot and Kafka using the following Docker compose config:
docker-compose.yml version: &amp;#39;3.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Sorted indexes on offline tables</title>
      <link>https://www.markhneedham.com/blog/2022/01/19/apache-pinot-sorted-indexes-offline-tables/</link>
      <pubDate>Wed, 19 Jan 2022 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/01/19/apache-pinot-sorted-indexes-offline-tables/</guid>
      <description>I’ve recently been learning all about Apache Pinot’s sorted forward indexes. I was initially going to explain how they work for offline and real-time tables, but the post got a bit long, so instead we’ll have two blog posts. In this one we’ll learn how sorted indexes are applied for offline tables.
Figure 1. Apache Pinot: Sorted indexes on offline tables Launch Components We’re going to spin up a local instance of Pinot using the following Docker compose config:</description>
    </item>
    
    <item>
      <title>Strava: Export and interpolate lat/long points for an activity</title>
      <link>https://www.markhneedham.com/blog/2022/01/18/strava-export-interpolate-lat-long-points-activity/</link>
      <pubDate>Tue, 18 Jan 2022 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/01/18/strava-export-interpolate-lat-long-points-activity/</guid>
      <description>I’ve been working with Strava data again recently and wanted to extract all the lat/long coordinates recorded for my runs. Having done this, I realised that my running watch hadn’t recorded as many points as I expected, so I needed to interpolate the missing points. In this blog post we’ll learn how to do that.
Figure 1. Strava: Export and interpolate lat/long points for an activity Setup Let’s first install a few libraries that we’ll be using:</description>
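Linear interpolation between consecutive recorded points is enough here: given two lat/long pairs and a number of missing points, space the new points evenly between them. This sketch ignores the Earth's curvature, which is fine over the short gaps between watch samples:

```python
def interpolate(p1, p2, n):
    # Return n evenly spaced points strictly between p1 and p2.
    (lat1, lon1), (lat2, lon2) = p1, p2
    step = 1 / (n + 1)
    return [
        (lat1 + (lat2 - lat1) * i * step, lon1 + (lon2 - lon1) * i * step)
        for i in range(1, n + 1)
    ]

print(interpolate((0.0, 0.0), (2.0, 2.0), 1))  # [(1.0, 1.0)]
```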
    </item>
    
    <item>
      <title>Python: Generate WKT from Lat Long Coordinates</title>
      <link>https://www.markhneedham.com/blog/2022/01/14/python-generate-wkt-lat-long-coordinates/</link>
      <pubDate>Fri, 14 Jan 2022 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/01/14/python-generate-wkt-lat-long-coordinates/</guid>
      <description>Recently I’ve been playing around with geometry objects in WKT format while documenting Apache Pinot’s Geospatial functions. I then wanted to figure out how to generate a WKT string from a list of lat long coordinates, which we’ll learn how to do in this blog post.
Figure 1. Python: Generate WKT from Lat Long Coordinates We’re going to do all this using Python’s Shapely library, so let’s first install that library:</description>
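Shapely does the heavy lifting in the post, but the string format itself is simple enough to sketch by hand. WKT lists coordinates as x y, i.e. longitude before latitude, which is a common trip-up:

```python
def linestring_wkt(points):
    # points is a list of (lat, lon) pairs; WKT wants "lon lat" order.
    coords = ", ".join(f"{lon} {lat}" for lat, lon in points)
    return f"LINESTRING ({coords})"

print(linestring_wkt([(51.5, -0.1), (51.6, -0.2)]))
# LINESTRING (-0.1 51.5, -0.2 51.6)
```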
    </item>
    
    <item>
      <title>Apache Pinot: Checking which indexes are defined</title>
      <link>https://www.markhneedham.com/blog/2022/01/13/apache-pinot-which-indexes-are-defined/</link>
      <pubDate>Thu, 13 Jan 2022 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2022/01/13/apache-pinot-which-indexes-are-defined/</guid>
      <description>One of the most common questions in the Apache Pinot community Slack is how to work out which indexes are defined on columns in Pinot segments. This blog post will attempt to answer that question.
Figure 1. Apache Pinot: Checking which indexes are defined Setup First, we’re going to spin up a local instance of Pinot using the following Docker compose config:
docker-compose.yml
version: &amp;#39;3.7&amp;#39;
services:
  zookeeper:
    image: zookeeper:3.5.6
    hostname: zookeeper
    container_name: zookeeper-indexes
    ports:
      - &amp;#34;2181:2181&amp;#34;
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
  pinot-controller:
    image: apachepinot/pinot:0.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Exploring range queries</title>
      <link>https://www.markhneedham.com/blog/2021/12/07/apache-pinot-exploring-range-queries/</link>
      <pubDate>Tue, 07 Dec 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/12/07/apache-pinot-exploring-range-queries/</guid>
      <description>In our last post about the Chicago Crimes dataset and Apache Pinot, we learnt how to use various indexes to filter columns by exact values. In this post we’re going to learn how to write range queries against the dataset.
Figure 1. Apache Pinot - Range Queries Recap To recap, the Chicago Crimes dataset contains more than 7 million crimes committed in Chicago from 2001 until today. For each crime we have various identifiers, a timestamp, location, and codes representing the type of crime that’s been committed.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Copying a segment to a new table</title>
      <link>https://www.markhneedham.com/blog/2021/12/06/apache-pinot-copy-segment-new-table/</link>
      <pubDate>Mon, 06 Dec 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/12/06/apache-pinot-copy-segment-new-table/</guid>
      <description>In this post we’ll learn how to use the same Pinot segment in multiple tables.
Figure 1. Apache Pinot - Copy segment to another table Setup First, we’re going to spin up a local instance of Pinot using the following Docker compose config:
docker-compose.yml version: &amp;#39;3.7&amp;#39; services: zookeeper: image: zookeeper:3.5.6 hostname: zookeeper container_name: manual-zookeeper ports: - &amp;#34;2181:2181&amp;#34; environment: ZOOKEEPER_CLIENT_PORT: 2181 ZOOKEEPER_TICK_TIME: 2000 pinot-controller: image: apachepinot/pinot:0.9.0 command: &amp;#34;StartController -zkAddress manual-zookeeper:2181&amp;#34; container_name: &amp;#34;manual-pinot-controller&amp;#34; volumes: - .</description>
    </item>
    
    <item>
      <title>Apache Pinot: Convert DateTime string to Timestamp - IllegalArgumentException: Invalid timestamp</title>
      <link>https://www.markhneedham.com/blog/2021/12/03/apache-pinot-convert-datetime-string-timestamp-invalid-timestamp/</link>
      <pubDate>Fri, 03 Dec 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/12/03/apache-pinot-convert-datetime-string-timestamp-invalid-timestamp/</guid>
      <description>In this post we’ll learn how to deal with a field that contains DateTime strings when importing a CSV file into Apache Pinot. We’ll also cover some of the error messages that you’ll see if you do it the wrong way.
Figure 1. Apache Pinot - Convert DateTime string to Timestamp Setup We’re going to spin up a local instance of Pinot using the following Docker compose config:
docker-compose.yml version: &amp;#39;3.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Exploring indexing techniques on Chicago Crimes</title>
      <link>https://www.markhneedham.com/blog/2021/11/30/apache-pinot-exploring-index-chicago-crimes/</link>
      <pubDate>Tue, 30 Nov 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/11/30/apache-pinot-exploring-index-chicago-crimes/</guid>
      <description>In Neha Pawar’s recent blog post, What Makes Apache Pinot fast?, she summarises it with the following sentence:
At the heart of the system, Pinot is a columnar store with several smart optimizations that can be applied at various stages of the query by the different Pinot components. Some of the most commonly used and impactful optimizations are data partitioning strategies, segment assignment strategies, smart query routing techniques, a rich set of indexes for filter optimizations, and aggregation optimization techniques.</description>
    </item>
    
    <item>
      <title>Apache Pinot: Importing CSV files with columns containing spaces</title>
      <link>https://www.markhneedham.com/blog/2021/11/25/apache-pinot-csv-columns-spaces/</link>
      <pubDate>Thu, 25 Nov 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/11/25/apache-pinot-csv-columns-spaces/</guid>
      <description>I’ve been playing around with one of my favourite datasets from the Chicago Data Portal and spent a while figuring out how to import columns that contain spaces into Apache Pinot. In this blog post we’ll learn how to do that using a subset of the data.
Setup We’re going to spin up a local instance of Pinot using the following Docker compose config:
docker-compose.yml version: &amp;#39;3.7&amp;#39; services: zookeeper: image: zookeeper:3.</description>
    </item>
    
    <item>
      <title>Apache Pinot: org.apache.helix.HelixException: Cluster structure is not set up for cluster: PinotCluster</title>
      <link>https://www.markhneedham.com/blog/2021/11/23/apache-pinot-helix-exception-cluster-structure-not-set-up/</link>
      <pubDate>Tue, 23 Nov 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/11/23/apache-pinot-helix-exception-cluster-structure-not-set-up/</guid>
      <description>In my continued exploration of Apache Pinot, I wanted to spin up all the components individually rather than relying on one of the QuickStarts that takes care of that for me. In doing so I came across an interesting error that we’ll explore in this post.
Setup We’re going to spin up a local instance of Pinot using the following Docker compose config:
version: &amp;#39;3.7&amp;#39; services: zookeeper: image: zookeeper:3.5.6 hostname: zookeeper container_name: manual-zookeeper ports: - &amp;#34;2181:2181&amp;#34; environment: ZOOKEEPER_CLIENT_PORT: 2181 ZOOKEEPER_TICK_TIME: 2000 pinot-controller: image: apachepinot/pinot:0.</description>
    </item>
    
    <item>
      <title>Apache Pinot: BadQueryRequestException - Cannot convert value to type: LONG</title>
      <link>https://www.markhneedham.com/blog/2021/07/16/pinot-bad-query-request-exception-cannot-convert-value-long/</link>
      <pubDate>Fri, 16 Jul 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/07/16/pinot-bad-query-request-exception-cannot-convert-value-long/</guid>
      <description>In my continued exploration of Apache Pinot I’ve been trying out the GitHub events recipe, which imports data from the GitHub events stream into Pinot. In this blog post I want to show how I worked around an exception I was getting when trying to filter the data by one of the timestamp columns.
Setup We’re going to spin up a local instance of Pinot using the following Docker compose config:</description>
    </item>
    
    <item>
      <title>Apache Pinot: Analysing England&#39;s Covid case data</title>
      <link>https://www.markhneedham.com/blog/2021/06/22/pinot-analysing-england-covid-cases/</link>
      <pubDate>Tue, 22 Jun 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/06/22/pinot-analysing-england-covid-cases/</guid>
      <description>As I mentioned in my last blog post, I’ve been playing around with Apache Pinot, a data store that’s optimised for user facing analytical workloads.
My understanding is that Pinot is a really good fit for datasets where:
The query patterns are of an analytical nature e.g. slicing and dicing on any columns.
We’re ingesting the data in real time from a stream of events. Kenny Bastani has some cool blog posts showing how to do this with Wikipedia and GitHub, and Jackie Jiang showed how to analyse Meetup’s RSVP stream in last week’s Pinot meetup.</description>
    </item>
    
    <item>
      <title>Apache Pinot: {&#39;errorCode&#39;: 410, &#39;message&#39;: &#39;BrokerResourceMissingError&#39;}</title>
      <link>https://www.markhneedham.com/blog/2021/06/21/pinot-broker-resource-missing/</link>
      <pubDate>Mon, 21 Jun 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/06/21/pinot-broker-resource-missing/</guid>
      <description>I’ve recently been playing around with Apache Pinot, a realtime analytical data store that’s used for user facing analytics use cases. In this blog post I want to walk through some challenges I had connecting to Pinot using the Python driver and how I got things working.
I’m running Pinot locally using the Docker image, which I setup in a Docker compose file:
docker-compose.yml version: &amp;#39;3.7&amp;#39; services: pinot: image: apachepinot/pinot:0.</description>
    </item>
    
    <item>
      <title>jq: Select multiple keys</title>
      <link>https://www.markhneedham.com/blog/2021/05/19/jq-select-multiple-keys/</link>
      <pubDate>Wed, 19 May 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/05/19/jq-select-multiple-keys/</guid>
      <description>I recently started a new job, working for a FinTech company called Finbourne, who build a data platform for investment data. It’s an API first product that publishes a Swagger API JSON file that I’ve been trying to parse to get a list of the end points and their operation ids. In this blog post I’ll show how I’ve been parsing that file using jq, my favourite tool for parsing JSON files.</description>
    </item>
    
    <item>
      <title>Pandas: Add row to DataFrame</title>
      <link>https://www.markhneedham.com/blog/2021/05/13/pandas-add-row-to-dataframe-with-index/</link>
      <pubDate>Thu, 13 May 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/05/13/pandas-add-row-to-dataframe-with-index/</guid>
      <description>Usually when I’m working with Pandas DataFrames I want to add new columns of data, but I recently wanted to add a row to an existing DataFrame. It turns out there is more than one way to do that, which we’ll explore in this blog post.
Let’s start by importing Pandas into our Python script:
import pandas as pd We’ll start from a DataFrame that has two rows and the columns name and age:</description>
    </item>
    
    <item>
      <title>Altair/Pandas: TypeError: Cannot interpret &#39;Float64Dtype()&#39; as a data type</title>
      <link>https://www.markhneedham.com/blog/2021/04/28/altair-pandas-cannot-interpret-float64dtype-as-data-type/</link>
      <pubDate>Wed, 28 Apr 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/04/28/altair-pandas-cannot-interpret-float64dtype-as-data-type/</guid>
      <description>I ran into an interesting problem when trying to use Altair to visualise a Pandas DataFrame containing vaccination rates of different parts of England. In this blog post we’ll look at how to work around this issue.
First, let’s install Pandas, numpy, and altair:
pip install pandas altair numpy And now we’ll import those modules into a Python script or Jupyter notebook:
import pandas as pd import altair as alt import numpy as np Next, we’ll create a DataFrame containing the vaccinations rates of a couple of regions:</description>
    </item>
    
    <item>
      <title>Pandas: Compare values in DataFrame to previous days</title>
      <link>https://www.markhneedham.com/blog/2021/04/21/pandas-compare-dataframe-to-previous-days/</link>
      <pubDate>Wed, 21 Apr 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/04/21/pandas-compare-dataframe-to-previous-days/</guid>
      <description>I’m still playing around with Covid vaccine data, this time exploring how the number of doses varies week by week. I want to know how many more (or fewer) vaccines have been done on a given day compared to that same day last week.
We’ll be using Pandas in this blog post, so let’s first install that library and import it:
Install Pandas pip install pandas Import module import pandas as pd And now let’s create a DataFrame containing a subset of the data that I’m working with:</description>
    </item>
    
    <item>
      <title>Vaccinating England: The Data (cleanup)</title>
      <link>https://www.markhneedham.com/blog/2021/04/17/england-covid-vaccination-rates-the-data/</link>
      <pubDate>Sat, 17 Apr 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/04/17/england-covid-vaccination-rates-the-data/</guid>
      <description>Over the last 13 months I’ve spent countless hours looking at dashboards that showed Coronavirus infection rates, death rates, and numbers of people vaccinated. The UK government host a dashboard at coronavirus.data.gov.uk, which contains charts and tables showing all of the above.
One thing I haven’t been able to find, however, is a drill down of vaccinations by local area and age group. So I’m going to try to build my own!</description>
    </item>
    
    <item>
      <title>Pandas - Format DataFrame numbers with commas and control decimal places</title>
      <link>https://www.markhneedham.com/blog/2021/04/11/pandas-format-dataframe-numbers-commas-decimals/</link>
      <pubDate>Sun, 11 Apr 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/04/11/pandas-format-dataframe-numbers-commas-decimals/</guid>
      <description>I’m still playing around with the UK’s COVID-19 vaccination data and in this blog post we’ll learn how to format a DataFrame that contains a mix of string and numeric values.
Note On 10th November 2022 I created a video that covers the same content as this blog post. Let me know if it’s helpful 😊
We’ll be using Pandas&amp;#39; styling functionality, which generates CSS and HTML, so if you want to follow along you’ll need to install Pandas and Jupyter:</description>
    </item>
    
    <item>
      <title>Pandas - Dividing two DataFrames (TypeError: unsupported operand type(s) for /: &#39;str&#39; and &#39;str&#39;)</title>
      <link>https://www.markhneedham.com/blog/2021/04/08/pandas-divide-dataframes-unsupported-operand-type-str/</link>
      <pubDate>Thu, 08 Apr 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/04/08/pandas-divide-dataframes-unsupported-operand-type-str/</guid>
      <description>I’ve been doing some more exploration of the UK Coronavirus vaccine data, this time looking at the number of people vaccinated by Lower Tier Local Authority. The government publish data showing the number of people vaccinated in each authority by age group, as well as population estimates for each cohort.
Having loaded that data into two Pandas DataFrames, I wanted to work out the % of people vaccinated per age group per local area.</description>
    </item>
    
    <item>
      <title>Altair - Remove margin/padding on discrete X axis</title>
      <link>https://www.markhneedham.com/blog/2021/04/02/altair-discrete-x-axis-margin-padding/</link>
      <pubDate>Fri, 02 Apr 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/04/02/altair-discrete-x-axis-margin-padding/</guid>
      <description>One of the Altair charts on my Covid Vaccine Dashboards Streamlit app shows the % of first doses, but when I first created it there was some padding on the X axis that I wanted to remove. In this blog post we’ll learn how to do that.
Pre requisites Let’s start by installing the following libraries:
pip install pandas altair altair_viewer Next let’s import them, as shown below:
import pandas as pd import altair as alt Visualising % of first doses Now we’re going to create a DataFrame that contains two columns - one contains the year and week number, the other the percentage of 1st doses administered.</description>
    </item>
    
    <item>
      <title>Pandas: Filter column value in array/list - ValueError: The truth value of a Series is ambiguous</title>
      <link>https://www.markhneedham.com/blog/2021/03/28/pandas-column-value-in-array-list-truth-value-ambiguous/</link>
      <pubDate>Sun, 28 Mar 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/03/28/pandas-column-value-in-array-list-truth-value-ambiguous/</guid>
      <description>The UK government publishes Coronavirus vaccinations data on coronavirus.data.gov.uk, but I wanted to create some different visualisations so I downloaded the data and have been playing with it in the mneedham/covid-vaccines GitHub repository.
I massaged the data so that I have rows in a Pandas DataFrame representing the numbers of first doses, second doses, and total doses done each day. I then wanted to filter this DataFrame based on the type of dose, but initially got a bit stuck.</description>
    </item>
    
    <item>
      <title>Neo4j Graph Data Science 1.5: Exploring the Speaker-Listener LPA Overlapping Community Detection Algorithm</title>
      <link>https://www.markhneedham.com/blog/2021/02/08/neo4j-gdsl-overlapping-community-detection-sllpa/</link>
      <pubDate>Mon, 08 Feb 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/02/08/neo4j-gdsl-overlapping-community-detection-sllpa/</guid>
      <description>The Neo4j Graph Data Science Library provides efficiently implemented, parallel versions of common graph algorithms for Neo4j, exposed as Cypher procedures. It recently published version 1.5, which introduces some fun new algorithms.
In this blog post, we’re going to explore the newly added Speaker-Listener Label Propagation algorithm with the help of a Twitter dataset.
Launching Neo4j We’re going to run Neo4j with the Graph Data Science Library using the following Docker Compose configuration:</description>
    </item>
    
    <item>
      <title>Neo4j Graph Data Science 1.5: Exploring the HITS Algorithm</title>
      <link>https://www.markhneedham.com/blog/2021/02/03/neo4j-gdsl-hits-algorithm/</link>
      <pubDate>Wed, 03 Feb 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/02/03/neo4j-gdsl-hits-algorithm/</guid>
      <description>The Neo4j Graph Data Science Library provides efficiently implemented, parallel versions of common graph algorithms for Neo4j, exposed as Cypher procedures. It recently published version 1.5, which has lots of goodies to play with.
In this blog post, we’re going to explore the newly added HITS algorithm with the help of a citations dataset.
Launching Neo4j We’re going to run Neo4j with the Graph Data Science Library using the following Docker Compose configuration:</description>
    </item>
    
    <item>
      <title>Materialize: Creating multiple views on one source</title>
      <link>https://www.markhneedham.com/blog/2021/01/28/materialize-multiple-views-one-source/</link>
      <pubDate>Thu, 28 Jan 2021 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2021/01/28/materialize-multiple-views-one-source/</guid>
      <description>This is another post describing my exploration of Materialize, a SQL streaming database. In this post we’re going to learn how to create multiple views on top of the same underlying source.
We’re still going to be using data extracted from Strava, an app that I use to record my runs, but this time we have more detailed information about each run. As in the previous blog posts, each run is represented as a JSON document and stored in the activities-detailed-all.</description>
    </item>
    
    <item>
      <title>Materialize: Unable to automatically determine a timestamp for your query; this can happen if your query depends on non-materialized sources</title>
      <link>https://www.markhneedham.com/blog/2020/12/31/materialize-unable-automatically-determine-timestamp-query/</link>
      <pubDate>Thu, 31 Dec 2020 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/12/31/materialize-unable-automatically-determine-timestamp-query/</guid>
      <description>This is another post describing my exploration of Materialize, a SQL streaming database. In this post I’m going to explain a confusing (to me at least) error message that you might come across when you’re getting started.
As I mentioned in my first post about Materialize, the general idea is that you create a source around a data resource and then a view on top of that. Those views can either be materialized or non-materialized.</description>
    </item>
    
    <item>
      <title>jq: How to change the value of keys in JSON documents</title>
      <link>https://www.markhneedham.com/blog/2020/12/30/jq-change-value-multiple-keys/</link>
      <pubDate>Tue, 29 Dec 2020 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/12/30/jq-change-value-multiple-keys/</guid>
      <description>jq, the command-line JSON processor, is my favourite tool for transforming JSON documents. In this post we’re going to learn how to use it to transform the values for specific keys in a document, while leaving everything else untouched.
We have the following file, which contains one JSON document:
/tmp/foo.json {&amp;#34;id&amp;#34;:1341735877953904600,&amp;#34;conversation_id&amp;#34;:&amp;#34;1341735877953904641&amp;#34;,&amp;#34;created_at&amp;#34;:&amp;#34;2020-12-23 13:22:16 GMT&amp;#34;,&amp;#34;date&amp;#34;:&amp;#34;2020-12-23&amp;#34;,&amp;#34;time&amp;#34;:&amp;#34;13:22:16&amp;#34;,&amp;#34;timezone&amp;#34;:&amp;#34;+0000&amp;#34;,&amp;#34;user_id&amp;#34;:&amp;#34;972709154329591800&amp;#34;,&amp;#34;username&amp;#34;:&amp;#34;dondaconceicao&amp;#34;,&amp;#34;name&amp;#34;:&amp;#34;T N Biscuits&amp;#34;,&amp;#34;place&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;tweet&amp;#34;:&amp;#34;Can’t imagine being sick with covid while living alone&amp;#34;,&amp;#34;language&amp;#34;:&amp;#34;en&amp;#34;,&amp;#34;mentions&amp;#34;:[],&amp;#34;urls&amp;#34;:[],&amp;#34;photos&amp;#34;:[],&amp;#34;replies_count&amp;#34;:0,&amp;#34;retweets_count&amp;#34;:0,&amp;#34;likes_count&amp;#34;:1,&amp;#34;hashtags&amp;#34;:[],&amp;#34;cashtags&amp;#34;:[],&amp;#34;link&amp;#34;:&amp;#34;https://twitter.com/dondaconceicao/status/1341735877953904641&amp;#34;,&amp;#34;retweet&amp;#34;:false,&amp;#34;quote_url&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;video&amp;#34;:0,&amp;#34;thumbnail&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;near&amp;#34;:&amp;#34;London&amp;#34;,&amp;#34;geo&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;source&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;user_rt_id&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;user_rt&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;retweet_id&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;reply_to&amp;#34;:[],&amp;#34;retweet_date&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;translate&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;trans_src&amp;#34;:&amp;#34;&amp;#34;,&amp;#34;trans_dest&amp;#34;:&amp;#34;&amp;#34;} We want to update the id field so that its value is a string rather than numeric value.</description>
    </item>
    
    <item>
      <title>Materialize: Querying JSON arrays</title>
      <link>https://www.markhneedham.com/blog/2020/12/29/materialize-json-arrays/</link>
      <pubDate>Tue, 29 Dec 2020 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/12/29/materialize-json-arrays/</guid>
      <description>In a blog post I wrote a couple of weeks ago, we learned how to analyse JSON files using the Materialize SQL streaming database.
In this post we’re going to build on that knowledge to analyse a JSON file of tweets that contain arrays of hashtags. It took me a while to figure out how to do this, so I wanted to share what I learnt along the way.
The JSON file that we’re going to analyse looks like this and we’ll save that file in a data directory locally.</description>
    </item>
    
    <item>
      <title>Strava: Export all activities to JSON file</title>
      <link>https://www.markhneedham.com/blog/2020/12/20/strava-export-all-activities-json/</link>
      <pubDate>Sun, 20 Dec 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/12/20/strava-export-all-activities-json/</guid>
      <description>In my continued playing around with the Strava API, I wanted to write a script to download all of my Strava activities to a JSON file.
As I mentioned in a previous blog post, the approach to authenticating requests has changed in the last two years, so we first need to generate an access token via the OAuth endpoint. Luckily Odd Eirik Igland shared a script showing how to solve most of the problem, and I’ve adapted it to do what I want.</description>
    </item>
    
    <item>
      <title>git: Ignore local changes on committed (env) file</title>
      <link>https://www.markhneedham.com/blog/2020/12/18/git-ignore-local-changes-committed-env-file/</link>
      <pubDate>Fri, 18 Dec 2020 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/12/18/git-ignore-local-changes-committed-env-file/</guid>
      <description>Whenever I’m writing scripts that rely on credentials defined as environment variables, I like to create a .env (or equivalent) file containing those variables. I then seed that file with placeholder values for each variable and make local changes that aren’t checked in.
I recently created the mneedham/materialize-sandbox/strava repository where I’m using this approach with a .envsettings file that has the following contents:
envsettings export CLIENT_ID=&amp;#34;client_id&amp;#34; export CLIENT_SECRET=&amp;#34;client_secret&amp;#34; I have that file checked in so that anybody else can clone the repository and update this file with their own credentials.</description>
    </item>
    
    <item>
      <title>Materialize: Querying JSON files</title>
      <link>https://www.markhneedham.com/blog/2020/12/17/materialize-querying-json-file/</link>
      <pubDate>Thu, 17 Dec 2020 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/12/17/materialize-querying-json-file/</guid>
      <description>I recently learnt about Materialize, a SQL streaming database, via their Series B fundraising announcement, and thought I’d take it for a spin.
My go-to dataset for new databases is Strava, an app that I use to record my runs. It has an API that returns a JSON representation of each run, containing information like the distance covered, elapsed time, heart rate metrics, and more.
I’ve extracted my latest 30 activities to a file in the JSON lines format and in this post we’re going to analyse that data using Materialize.</description>
    </item>
    
    <item>
      <title>Strava: Authorization Error - Missing activity:read_permission</title>
      <link>https://www.markhneedham.com/blog/2020/12/15/strava-authorization-error-missing-read-permission/</link>
      <pubDate>Tue, 15 Dec 2020 00:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/12/15/strava-authorization-error-missing-read-permission/</guid>
      <description>I’m revisiting the Strava API after a two year absence and the approach to authenticating requests has changed in that time. You now need to generate an access token via OAuth 2.0, as described in the &amp;#39;How to authenticate with OAuth 2.0&amp;#39; section of the Getting Started with the Strava API guide.
I want to generate a token that lets me retrieve all of my activities via the /athlete/activities end point.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - FOREACH vs CALL {} (subquery)</title>
      <link>https://www.markhneedham.com/blog/2020/10/29/neo4j-foreach-call-subquery/</link>
      <pubDate>Thu, 29 Oct 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/10/29/neo4j-foreach-call-subquery/</guid>
      <description>I recently wanted to create a graph based on an adjacency list, and in this post we’ll learn how to do that using the FOREACH clause and then with the new CALL {} subquery clause.
We’ll start with the following map of ids → arrays of ids:
:param list =&amp;gt; ({`0`: [7, 9], `1`: [2, 4, 5, 6, 8, 9], `2`: [0, 6, 8, 9], `3`: [1, 2, 6, 9], `4`: [1, 2, 3, 7], `5`: [8, 9], `6`: [2, 4, 5, 7, 8, 9], `7`: [0, 3, 4, 6, 8, 9], `8`: [1, 6, 9], `9`: [0, 1, 3, 5]}) We want to create one node per id and create a relationship from each node to the nodes in its array.</description>
    </item>
    
    <item>
      <title>Unix: Get file name without extension from file path</title>
      <link>https://www.markhneedham.com/blog/2020/08/24/unix-get-file-name-without-extension-from-file-path/</link>
      <pubDate>Mon, 24 Aug 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/08/24/unix-get-file-name-without-extension-from-file-path/</guid>
      <description>I recently found myself needing to extract the file name but not file extension from a bunch of file paths and wanted to share a neat technique that I learnt to do it.
I started with a bunch of Jupyter notebook files, which I listed using the following command:
$ find notebooks/ -maxdepth 1 -iname *ipynb notebooks/09_Predictions_sagemaker.ipynb notebooks/00_Environment.ipynb notebooks/05_Train_Evaluate_Model.ipynb notebooks/01_DataLoading.ipynb notebooks/05_SageMaker.ipynb notebooks/09_Predictions_sagemaker-Copy2.ipynb notebooks/09_Predictions_sagemaker-Copy1.ipynb notebooks/02_Co-Author_Graph.ipynb notebooks/04_Model_Feature_Engineering.ipynb notebooks/09_Predictions_scikit.ipynb notebooks/03_Train_Test_Split.ipynb If we pick one of those files:</description>
    </item>
    
    <item>
      <title>pipenv: ImportError: No module named &#39;virtualenv.seed.via_app_data&#39;</title>
      <link>https://www.markhneedham.com/blog/2020/08/07/pipenv-import-file-no-module-named-virtualenv/</link>
      <pubDate>Fri, 07 Aug 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/08/07/pipenv-import-file-no-module-named-virtualenv/</guid>
      <description>I’ve been trying to install pipenv on a new computer and ran into a frustrating issue. After installing pipenv using pip, I tried to run the command below:
$ /home/markhneedham/.local/bin/pipenv shell Creating a virtualenv for this project… Pipfile: /tmp/Pipfile Using /usr/bin/python3.8 (3.8.2) to create virtualenv… ⠙ Creating virtual environment...ModuleNotFoundError: No module named &amp;#39;virtualenv.seed.via_app_data&amp;#39; ✘ Failed creating virtual environment [pipenv.exceptions.VirtualenvCreationException]: Failed to create virtual environment. Hmmm, for some reason it’s unable to find one of the virtualenv modules.</description>
    </item>
    
    <item>
      <title>Google Docs: Find and replace script</title>
      <link>https://www.markhneedham.com/blog/2020/05/12/google-docs-find-and-replace-script/</link>
      <pubDate>Tue, 12 May 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/05/12/google-docs-find-and-replace-script/</guid>
      <description>I keep track of the podcasts that I’ve listened to in a Google Doc, having pasted the episode title and podcast name from Player.FM. The format isn’t exactly what I want so I’ve been running the Find and Replace command to update each entry. This is obviously a very boring task, so I wanted to see if I could automate it.
An example entry in the Google Doc reads like this:</description>
    </item>
    
    <item>
      <title>QuickGraph #7: An entity graph of TWIN4j using APOC NLP</title>
      <link>https://www.markhneedham.com/blog/2020/05/05/quick-graph-building-entity-graph-twin4j-apoc-nlp/</link>
      <pubDate>Tue, 05 May 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/05/05/quick-graph-building-entity-graph-twin4j-apoc-nlp/</guid>
      <description>One of the most popular use cases for Neo4j is knowledge graphs, and part of that process involves using NLP to create a graph structure from raw text. If we were doing a serious NLP project we’d want to use something like GraphAware Hume, but in this blog post we’re going to learn how to add basic NLP functionality to our graph applications.
Figure 1. Building an entity graph of TWIN4j using APOC NLP APOC NLP The big cloud providers (AWS, GCP, and Azure) all have Natural Language Processing APIs and, although their APIs aren’t identical, they all let us extract entities, key phrases, and sentiment from text documents.</description>
    </item>
    
    <item>
      <title>Python: Select keys from map/dictionary</title>
      <link>https://www.markhneedham.com/blog/2020/04/27/python-select-keys-from-map-dictionary/</link>
      <pubDate>Mon, 27 Apr 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/04/27/python-select-keys-from-map-dictionary/</guid>
      <description>In this post we’re going to learn how to filter a Python map/dictionary to return a subset of keys or values. I needed to do this recently while logging some maps that had a lot of keys that I wasn’t interested in.
We’ll start with the following map:
x = {&amp;#34;a&amp;#34;: 1, &amp;#34;b&amp;#34;: 2, &amp;#34;c&amp;#34;: 3, &amp;#34;d&amp;#34;: 4} {&amp;#39;a&amp;#39;: 1, &amp;#39;b&amp;#39;: 2, &amp;#39;c&amp;#39;: 3, &amp;#39;d&amp;#39;: 4} We want to filter this map so that we only have the keys a and c.</description>
    </item>
    
    <item>
      <title>QuickGraph #6: COVID-19 Taxonomy Graph</title>
      <link>https://www.markhneedham.com/blog/2020/04/21/quick-graph-covid-19-taxonomy/</link>
      <pubDate>Tue, 21 Apr 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/04/21/quick-graph-covid-19-taxonomy/</guid>
      <description>It’s been several months since our last QuickGraph and the world feels very different than it was back then. I’ve been reading a couple of books about viruses - Spillover and Pale Rider - and am now very curious to learn more about the medical terms referenced in the books.
With the Pre Release of neosemantics (n10s) for Neo4j 4.0, I thought it would be interesting to create a graph of the taxonomy of the virus that caused COVID-19, using data extracted from Wikidata’s SPARQL API.</description>
    </item>
    
    <item>
      <title>Python: Find the starting Sunday for all the weeks in a month</title>
      <link>https://www.markhneedham.com/blog/2020/04/18/python-starting-sundays-in-a-month/</link>
      <pubDate>Sat, 18 Apr 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/04/18/python-starting-sundays-in-a-month/</guid>
      <description>In this post we’re going to learn how to find the dates of all the Sundays in a given month, as well as the Sunday immediately preceding the 1st day in the month, assuming that day isn’t a Sunday.
Let’s start by importing some libraries that we’re going to use in this blog post:
from dateutil import parser import datetime import calendar Next we need to find the first day of the current month, which we can do with the following code:</description>
    </item>
    
    <item>
      <title>React Semantic-UI: Adding a custom icon to open link in a new window</title>
      <link>https://www.markhneedham.com/blog/2020/04/13/react-semantic-ui-custom-add-icon-open-new-window/</link>
      <pubDate>Mon, 13 Apr 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/04/13/react-semantic-ui-custom-add-icon-open-new-window/</guid>
      <description>I’ve been building a little React app that uses the Semantic UI library and found myself wanting to render a custom icon.
Semantic UI describes an icon as &amp;#34;a glyph used to represent something else&amp;#34;, and there is a big list of built-in icons. For example, the following code renders a thumbs up icon:
import {Icon} from &amp;#34;semantic-ui-react&amp;#34;; &amp;lt;Icon name=&amp;#34;thumbs up outline icon green large&amp;#34; style={{margin: 0}}/&amp;gt; Figure 1.</description>
    </item>
    
    <item>
      <title>Streamlit: multiselect - AttributeError: &#39;numpy.ndarray&#39; object has no attribute &#39;index&#39;</title>
      <link>https://www.markhneedham.com/blog/2020/03/31/streamlit-multiselect-numpy-no-attribute-index/</link>
      <pubDate>Tue, 31 Mar 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/03/31/streamlit-multiselect-numpy-no-attribute-index/</guid>
      <description>In this post we’ll learn how to overcome a problem I encountered while building a small Streamlit application to analyse Johns Hopkins’ data on the COVID-19 disease. The examples in this post use a CSV file that contains time series data of deaths in each country.
I started with the following code to create a multiselect widget that lists all countries and selected the United Kingdom by default:
import streamlit as st import pandas as pd default_countries = [&amp;#34;United Kingdom&amp;#34;] url=&amp;#34;https://github.</description>
    </item>
    
    <item>
      <title>SPARQL: OR conditions in a WHERE clause using the UNION clause</title>
      <link>https://www.markhneedham.com/blog/2020/02/07/sparql-or-conditions-where-union-query/</link>
      <pubDate>Fri, 07 Feb 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/02/07/sparql-or-conditions-where-union-query/</guid>
      <description>This is part 4 of my series of posts about querying the Wikidata API, in which I learn how to use SPARQL’s UNION clause to handle an OR condition in a WHERE clause.
Figure 1. Using SPARQL’s UNION clause But first, some context!
After running queries against the Wikidata SPARQL API to pull the date of birth and nationality of tennis players into the Australian Open Graph, I noticed that several players hadn’t actually been updated.</description>
    </item>
    
    <item>
      <title>Neo4j: Enriching an existing graph by querying the Wikidata SPARQL API</title>
      <link>https://www.markhneedham.com/blog/2020/02/04/neo4j-enriching-existing-graph-wikidata-sparql-api/</link>
      <pubDate>Tue, 04 Feb 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/02/04/neo4j-enriching-existing-graph-wikidata-sparql-api/</guid>
      <description>This is the third post in a series about querying Wikidata’s SPARQL API. In the first post we wrote some basic queries, in the second we learnt about the SELECT and CONSTRUCT clauses, and in this post we’re going to import query results into an existing Neo4j graph.
Figure 1. Enriching a Neo4j Graph with Wikidata Setting up Neo4j We’re going to use the following Docker Compose configuration in this blog post:</description>
    </item>
    
    <item>
      <title>Neo4j: Cross database querying with Neo4j Fabric</title>
      <link>https://www.markhneedham.com/blog/2020/02/03/neo4j-cross-database-querying-fabric/</link>
      <pubDate>Mon, 03 Feb 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/02/03/neo4j-cross-database-querying-fabric/</guid>
      <description>A couple of weeks ago I wrote a QuickGraph blog post about the Australian Open, in which I showed how to use Neo4j 4.0’s multi database feature.
In that post we focused on queries that could be run on one database, but the 4.0 release also contains another feature for doing cross database querying - Neo4j Fabric - and we’re going to learn how to use that in this post.</description>
    </item>
    
    <item>
      <title>Querying Wikidata: SELECT vs CONSTRUCT</title>
      <link>https://www.markhneedham.com/blog/2020/02/02/querying-wikidata-construct-select/</link>
      <pubDate>Sun, 02 Feb 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/02/02/querying-wikidata-construct-select/</guid>
      <description>In this blog post we’re going to build upon the newbie’s guide to querying Wikidata, and learn all about the CONSTRUCT clause.
Figure 1. SPARQL’s CONSTRUCT and SELECT clauses In the newbie’s guide, we wrote the following query to find a tennis player with the name &amp;#34;Nick Kyrgios&amp;#34; and return their date of birth:
SELECT * WHERE { ?person wdt:P106 wd:Q10833314 ; rdfs:label &amp;#39;Nick Kyrgios&amp;#39;@en ; wdt:P569 ?dateOfBirth } where:</description>
    </item>
    
    <item>
      <title>Neo4j: Finding the longest path</title>
      <link>https://www.markhneedham.com/blog/2020/01/29/neo4j-finding-longest-path/</link>
      <pubDate>Wed, 29 Jan 2020 15:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/29/neo4j-finding-longest-path/</guid>
      <description>One of my favourite things about storing data in a graph database is executing path-based queries against that data. I’ve been trying to find a way to write such queries against the Australian Open QuickGraph, and in this blog post we’re going to write what I think of as longest path queries against this graph.
Figure 1. Finding longest paths in Neo4j Setting up Neo4j We’re going to use the following Docker Compose configuration in this blog post:</description>
    </item>
    
    <item>
      <title>A newbie&#39;s guide to querying Wikidata</title>
      <link>https://www.markhneedham.com/blog/2020/01/29/newbie-guide-querying-wikidata/</link>
      <pubDate>Wed, 29 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/29/newbie-guide-querying-wikidata/</guid>
      <description>After reading one of Jesús Barrasa’s recent QuickGraph posts about enriching a knowledge graph with data from Wikidata, I wanted to learn how to query the Wikidata API so that I could pull in the data for my own QuickGraphs.
I want to look up information about tennis players, and one of my favourite players is Nick Kyrgios, so this blog post is going to be all about him.</description>
    </item>
    
    <item>
      <title>Neo4j: Performing a database dump within a Docker container</title>
      <link>https://www.markhneedham.com/blog/2020/01/28/neo4j-database-dump-docker-container/</link>
      <pubDate>Tue, 28 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/28/neo4j-database-dump-docker-container/</guid>
      <description>Before the release of Neo4j 4.0, taking a dump of a database running within a Docker container was a tricky affair.
We’d need to stop the container and remove it, run the container again in bash mode, and finally take a dump of the database. With 4.0 things are simpler.
Figure 1. Neo4j on Docker We’ll be using the following Docker Compose configuration in this blog post:
docker-compose.yml version: &amp;#39;3.</description>
    </item>
    
    <item>
      <title>Neo4j: Exporting a subset of data from one database to another</title>
      <link>https://www.markhneedham.com/blog/2020/01/27/neo4j-exporting-subset-database/</link>
      <pubDate>Mon, 27 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/27/neo4j-exporting-subset-database/</guid>
      <description>As part of the preparation for another blog post, I wanted to export a subset of data from one Neo4j database to another one, which seemed like a blog post in its own right.
Figure 1. Exporting data using APOC’s Export JSON Setting up Neo4j We’re going to use the following Docker Compose configuration in this blog post:
docker-compose.yml version: &amp;#39;3.7&amp;#39; services: neo4j: image: neo4j:4.0.0-enterprise container_name: &amp;#34;quickgraph-aus-open&amp;#34; volumes: - .</description>
    </item>
    
    <item>
      <title>QuickGraph #5: Australian Open</title>
      <link>https://www.markhneedham.com/blog/2020/01/23/quick-graph-australian-open/</link>
      <pubDate>Thu, 23 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/23/quick-graph-australian-open/</guid>
      <description>It’s time for another QuickGraph, this one based on data from the Australian Open tennis tournament. We’re going to use data curated by Jeff Sackmann in the tennis_wta and tennis_atp repositories.
Figure 1. Australian Open Graph (Background from https://www.freepik.com/free-photo/3d-network-background-with-connecting-lines-dots_3961382.htm) Setting up Neo4j We’re going to use the following Docker Compose configuration in this blog post:
docker-compose.yml version: &amp;#39;3.7&amp;#39; services: neo4j: image: neo4j:4.0.0-enterprise container_name: &amp;#34;quickgraph-aus-open&amp;#34; volumes: - ./plugins:/plugins - ./data:/data - .</description>
    </item>
    
    <item>
      <title>Creating an Interactive UK Official Charts Data App with Streamlit and Neo4j</title>
      <link>https://www.markhneedham.com/blog/2020/01/16/interactive-uk-charts-quickgraph-neo4j-streamlit/</link>
      <pubDate>Thu, 16 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/16/interactive-uk-charts-quickgraph-neo4j-streamlit/</guid>
      <description>I recently came across Streamlit, a tool that makes it easy to build data-based single-page web applications. I wanted to give it a try, and the UK Charts QuickGraph that I recently wrote about seemed like a good opportunity for that.
This blog post starts from where we left off. The data is loaded into Neo4j and we’ve written some queries to explore different aspects of the dataset.</description>
    </item>
    
    <item>
      <title>Python: Altair - Setting the range of Date values for an axis</title>
      <link>https://www.markhneedham.com/blog/2020/01/14/altair-range-values-dates-axis/</link>
      <pubDate>Tue, 14 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/14/altair-range-values-dates-axis/</guid>
      <description>In my continued experiments with the Altair visualisation library, I wanted to set a custom range of date values on the x-axis of a chart. In this blog post we’ll learn how to do that.
We’ll start where we left off in the last blog post, with the following code that renders a scatterplot containing the chart position of a song on a certain date:
import altair as alt import pandas as pd import datetime df = pd.</description>
    </item>
    
    <item>
      <title>Python: Altair - TypeError: Object of type date is not JSON serializable</title>
      <link>https://www.markhneedham.com/blog/2020/01/10/altair-typeerror-object-type-date-not-json-serializable/</link>
      <pubDate>Fri, 10 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/10/altair-typeerror-object-type-date-not-json-serializable/</guid>
      <description>I’ve been playing with the Altair statistical visualisation library and recently ran into an error while trying to render a DataFrame that contained dates.
I was trying to render a scatterplot containing the chart position of a song on a certain date, as seen in the code below:
# pip install altair pandas import altair as alt import pandas as pd import datetime df = pd.DataFrame( [ {&amp;#34;position&amp;#34;: 2, &amp;#34;date&amp;#34;: datetime.</description>
    </item>
    
    <item>
      <title>QuickGraph #4: UK Official Singles Chart 2019</title>
      <link>https://www.markhneedham.com/blog/2020/01/04/quick-graph-uk-official-charts/</link>
      <pubDate>Sat, 04 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/04/quick-graph-uk-official-charts/</guid>
      <description>For our first QuickGraph of the new decade we’re going to explore data from the Official UK Top 40 Chart. This chart ranks the top 100 songs of the week based on official sales of downloads, CDs, vinyl, audio streams and video streams. Every week BBC Radio 1 broadcasts the top 40 songs, which explains the name of the chart.
Figure 1. The Official UK Charts Scraping the Official Charts I couldn’t find a dump of the dataset, so we’re going to use our web scraping skills again.</description>
    </item>
    
    <item>
      <title>Spotify API: Making my first call</title>
      <link>https://www.markhneedham.com/blog/2020/01/02/spotify-api-making-my-first-call/</link>
      <pubDate>Thu, 02 Jan 2020 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2020/01/02/spotify-api-making-my-first-call/</guid>
      <description>I wanted to enrich the data for a little music application I’m working on and realised it would be a perfect opportunity to try out the Spotify API. I want to extract data about individual tracks (via the Tracks API), but before we do that we’ll need to create an app and have it approved for access to the Spotify API.
Registering an application After logging into the Spotify Dashboard using my usual Spotify credentials, I was prompted to create an application:</description>
    </item>
    
    <item>
      <title>QuickGraph #3: Itsu Allergens</title>
      <link>https://www.markhneedham.com/blog/2019/12/23/quick-graph-itsu-allergens/</link>
      <pubDate>Mon, 23 Dec 2019 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/12/23/quick-graph-itsu-allergens/</guid>
      <description>As someone who’s allergic to lots of different things, the introduction of allergen charts in restaurants over the last few years has been very helpful. These charts are often hidden away in PDF files, but the Asian-inspired Itsu restaurant has all this information available on its online menus. This therefore seemed like a great opportunity for another QuickGraph.
Scraping the Itsu website I wrote a couple of Python scripts to download each of the menu items and then extract the product name, description, and allergens.</description>
    </item>
    
    <item>
      <title>QuickGraph #2: Guardian Top 100 Male Footballers</title>
      <link>https://www.markhneedham.com/blog/2019/12/22/quick-graph-guardian-top-100-male-footballers/</link>
      <pubDate>Sun, 22 Dec 2019 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/12/22/quick-graph-guardian-top-100-male-footballers/</guid>
      <description>Over the last week the Guardian have been counting down their top 100 male footballers of 2019, and on Friday they also published a Google sheet containing all the votes, which seemed like a perfect candidate for a QuickGraph.
We can see a preview of the Google sheet in the screenshot below:
We can also download Google Sheets in CSV format based on the following URI template:
https://docs.google.com/spreadsheets/d/KEY/export?format=csv&amp;amp;id=KEY&amp;amp;gid=SHEET_ID where:</description>
    </item>
    
    <item>
      <title>React React Router: Setting parent component state based on route change event</title>
      <link>https://www.markhneedham.com/blog/2019/12/19/react-reach-parent-event-route-change/</link>
      <pubDate>Thu, 19 Dec 2019 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/12/19/react-reach-parent-event-route-change/</guid>
      <description>I’ve been working on a Reach Router-based React application that has several routes and wanted to show a search box in the header unless the user was on the search page. After a lot of trial and error I learnt that I could use a route change event listener to do this.
The CodeSandbox below shows all the code to do this:
Let’s walk through the code. We have a top level component called App that has the state showSearchBox, which defaults to true:</description>
    </item>
    
    <item>
      <title>Elasticsearch: Importing data into App Search</title>
      <link>https://www.markhneedham.com/blog/2019/11/24/elasticsearch-import-data-appsearch/</link>
      <pubDate>Sun, 24 Nov 2019 00:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/11/24/elasticsearch-import-data-appsearch/</guid>
      <description>For a side project that I’m working on I wanted to create a small React application that can query data stored in Elasticsearch, and most of the tutorials I found suggested using a tool called Elastic App Search.
I’d not heard of App Search before, and it took me a while to figure out that it’s the mid-level product in between Elasticsearch Service and Elastic Site Search Service, as described on elastic.</description>
    </item>
    
    <item>
      <title>Graphing Brexit: Did the threat work?</title>
      <link>https://www.markhneedham.com/blog/2019/09/27/graphing-brexit-did-the-threat-work/</link>
      <pubDate>Fri, 27 Sep 2019 00:47:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/09/27/graphing-brexit-did-the-threat-work/</guid>
      <description>Following on from the blog post where we compared how MPs and parties voted on Brexit indicative measures, in this post we’re going to explore how Conservative MPs have voted with respect to a no deal exit from the European Union. In particular we’d like to know whether the threat to have the party whip removed had an impact on how they voted in the recent motion to request an extension to work out a deal.</description>
    </item>
    
    <item>
      <title>Graphing Brexit: MPs vs Parties</title>
      <link>https://www.markhneedham.com/blog/2019/09/23/graphing-brexit-mps-vs-parties/</link>
      <pubDate>Mon, 23 Sep 2019 00:47:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/09/23/graphing-brexit-mps-vs-parties/</guid>
      <description>In the previous post of the Graphing Brexit series we computed the average vote by party. In this post we’re going to take those average party scores and compare them against the votes placed by individual MPs. The goal is to determine whether, Brexit-wise, MPs are representing the right party!
It won’t be perfect since we know that not everyone in a party voted the same way, but it should still give us some fun results.</description>
    </item>
    
    <item>
      <title>Graphing Brexit: Plotting how the parties voted</title>
      <link>https://www.markhneedham.com/blog/2019/09/20/graphing-brexit-charting-how-the-parties-voted/</link>
      <pubDate>Fri, 20 Sep 2019 00:47:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/09/20/graphing-brexit-charting-how-the-parties-voted/</guid>
      <description>Over the last week I’ve revisited the Brexit Graph that I created in March 2019, this time looking at how the parties voted on average on each of the indicative votes.
To recap, we have a graph that has the following schema:
Since the initial post I’ve slightly changed how the MEMBER_OF relationship works. As several MPs have switched parties in the intervening months, we’re now storing a start property to indicate when they started representing a party and an end property to indicate when they stopped representing a party.</description>
    </item>
    
    <item>
      <title>Neo4j: Approximate string matching/similarity</title>
      <link>https://www.markhneedham.com/blog/2019/09/18/neo4j-string-matching-similarity/</link>
      <pubDate>Wed, 18 Sep 2019 00:47:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/09/18/neo4j-string-matching-similarity/</guid>
      <description>I’ve been playing with the Brexit Graph over the last few days, and wanted to map the MPs that I got from CommonsVotes with data from the TheyWorkForYou API.
I already had voting records loaded into Neo4j, but to recap, this is how I did that:
UNWIND [655,656,657,658,659,660,661,662,711, 669, 668, 667, 666, 664] AS division LOAD CSV FROM &amp;#34;https://github.com/mneedham/graphing-brexit/raw/master/data/commonsvotes/Division&amp;#34; + division + &amp;#34;.csv&amp;#34; AS row // Create motion nodes WITH division, collect(row) AS rows MERGE (motion:Motion {division: trim(split(rows[0][0], &amp;#34;:&amp;#34;)[1]) }) SET motion.</description>
    </item>
    
    <item>
      <title>Neo4j: apoc.load.csv - Neo.ClientError.Statement.SyntaxError: Type mismatch: expected Float, Integer, Number or String but was Any </title>
      <link>https://www.markhneedham.com/blog/2019/09/05/neo4j-apoc-load-csv-type-mismatch-expected-float-integer-number-string/</link>
      <pubDate>Thu, 05 Sep 2019 00:47:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/09/05/neo4j-apoc-load-csv-type-mismatch-expected-float-integer-number-string/</guid>
      <description>The Neo4j APOC library&amp;#39;s Load CSV procedure is very useful if you want more control over the import process than the LOAD CSV clause allows. I found myself using it last week to import a CSV file of embeddings, because I wanted to know the line number of the row in the CSV file while importing the data.
I had a file that looked like this, which I put into the import directory:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Nested Path Comprehensions vs OPTIONAL MATCH</title>
      <link>https://www.markhneedham.com/blog/2019/08/23/neo4j-cypher-path-comprehensions-optional-match/</link>
      <pubDate>Fri, 23 Aug 2019 00:47:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/08/23/neo4j-cypher-path-comprehensions-optional-match/</guid>
      <description>While writing my previous post about Cypher nested path comprehensions, I realised that for this particular problem, the OPTIONAL MATCH clause is a better choice.
To recap, we have the following graph:
MERGE (club:Club {name: &amp;#34;Man Utd&amp;#34;}) MERGE (league:League {name: &amp;#34;Premier League&amp;#34;}) MERGE (country:Country {name: &amp;#34;England&amp;#34;}) MERGE (club)-[:IN_LEAGUE]-&amp;gt;(league) MERGE (league)-[:IN_COUNTRY]-&amp;gt;(country) MERGE (club2:Club {name: &amp;#34;Juventus&amp;#34;}) MERGE (league2:League {name: &amp;#34;Serie A&amp;#34;}) MERGE (club2)-[:IN_LEAGUE]-&amp;gt;(league2) We started the post with the following query that returns (club)-[:IN_LEAGUE]→(league)-[:IN_COUNTRY]→(country) paths:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Nested Path Comprehensions</title>
      <link>https://www.markhneedham.com/blog/2019/08/22/neo4j-cypher-nested-pattern-comprehensions/</link>
      <pubDate>Thu, 22 Aug 2019 11:08:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/08/22/neo4j-cypher-nested-pattern-comprehensions/</guid>
      <description>I’ve recently been building an application using the GRANDstack, which uses nested Cypher path comprehensions to translate GraphQL queries to Cypher ones. I’d not done this before, so I was quite curious how this feature worked. We’ll explore it using the following dataset:
MERGE (club:Club {name: &amp;#34;Man Utd&amp;#34;}) MERGE (league:League {name: &amp;#34;Premier League&amp;#34;}) MERGE (country:Country {name: &amp;#34;England&amp;#34;}) MERGE (club)-[:IN_LEAGUE]-&amp;gt;(league) MERGE (league)-[:IN_COUNTRY]-&amp;gt;(country) MERGE (club2:Club {name: &amp;#34;Juventus&amp;#34;}) MERGE (league2:League {name: &amp;#34;Serie A&amp;#34;}) MERGE (club2)-[:IN_LEAGUE]-&amp;gt;(league2) If we want to return a path containing a club, the league they play in, and the country that the league belongs to, we could write the following query:</description>
    </item>
    
    <item>
      <title>Neo4j: Conditional WHERE clause with APOC</title>
      <link>https://www.markhneedham.com/blog/2019/07/31/neo4j-conditional-where-query-apoc/</link>
      <pubDate>Wed, 31 Jul 2019 11:08:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/07/31/neo4j-conditional-where-query-apoc/</guid>
      <description>Sometimes we want to be able to vary our Cypher queries based on the value of a parameter. I came across such a situation today, and thought I’d share how I solved it using the APOC library.
Let’s first setup some sample data:
UNWIND range(0, 5) AS id CREATE (:Person {name: &amp;#34;person-&amp;#34; + id}) Now, if we want to get all pairs of people, we could write the following query:</description>
    </item>
    
    <item>
      <title>Python: Click - Handling Date Parameter</title>
      <link>https://www.markhneedham.com/blog/2019/07/29/python-click-date-parameter-type/</link>
      <pubDate>Mon, 29 Jul 2019 11:08:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/07/29/python-click-date-parameter-type/</guid>
      <description>I’ve been building a little CLI application using the Python Click Library, and I wanted to pass in a Date as a parameter. There’s more than one way to do this.
Let’s first install the Click library:
pip install click And now we’ll import our required libraries:
from datetime import date import click Now we’ll create a sub command that takes two parameters: date-start and date-end. These parameters have the type DateTime, and we can pass a string in the format yyyy-mm-dd from the command line:</description>
    </item>
    
    <item>
      <title>Kafka: Python Consumer - No messages with group id/consumer group</title>
      <link>https://www.markhneedham.com/blog/2019/06/03/kafka-python-consumer-no-messages-group-id-consumer-group/</link>
      <pubDate>Mon, 03 Jun 2019 11:08:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/06/03/kafka-python-consumer-no-messages-group-id-consumer-group/</guid>
      <description>When I’m learning a new technology, I often come across things that are incredibly confusing at first, but make complete sense afterwards. In this post I’ll explain my experience writing a Kafka consumer that wasn’t finding any messages when using consumer groups.</description>
Setting up Kafka infrastructure We’ll set up the Kafka infrastructure locally using the Docker Compose Template that I describe in my Kafka: A Basic Tutorial blog post.</description>
    </item>
    
    <item>
      <title>Twint: Loading tweets into Kafka and Neo4j</title>
      <link>https://www.markhneedham.com/blog/2019/05/29/loading-tweets-twint-kafka-neo4j/</link>
      <pubDate>Wed, 29 May 2019 06:50:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/29/loading-tweets-twint-kafka-neo4j/</guid>
      <description>In this post we’re going to load tweets via the twint library into Kafka, and once we’ve got them in there we’ll use the Kafka Connect Neo4j Sink Plugin to get them into Neo4j.
What is twint? Twitter data has always been some of the most fun to play with, but over the years the official API has become more and more restrictive, and it now takes a really long time to download enough data to do anything interesting.</description>
    </item>
    
    <item>
      <title>Docker: Find the network for a container</title>
      <link>https://www.markhneedham.com/blog/2019/05/24/docker-find-network-for-container/</link>
      <pubDate>Fri, 24 May 2019 06:10:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/24/docker-find-network-for-container/</guid>
      <description>If we want two Docker containers to communicate with each other they need to belong to the same network. In this post we’ll learn how to find out the network of existing containers so that we can attach new containers to that network.
All the containers mentioned in this post can be launched locally from Docker compose, using the following command:
git clone git@github.com:mneedham/ksql-kafka-neo4j-streams.git &amp;amp;&amp;amp; cd ksql-kafka-neo4j-streams docker-compose up Running this command will create four containers:</description>
    </item>
    
    <item>
      <title>Processing Neo4j Transaction Events with KSQL and Kafka Streams</title>
      <link>https://www.markhneedham.com/blog/2019/05/23/processing-neo4j-transaction-events-ksql-kafka-streams/</link>
      <pubDate>Thu, 23 May 2019 12:46:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/23/processing-neo4j-transaction-events-ksql-kafka-streams/</guid>
      <description>The Neo4j Streams Library lets users send transaction events to a Kafka topic, and in this post we’re going to learn how to explore these events using the KSQL streaming SQL Engine.
All the infrastructure used in this post can be launched locally from Docker compose, using the following command:
git clone git@github.com:mneedham/ksql-kafka-neo4j-streams.git &amp;amp;&amp;amp; cd ksql-kafka-neo4j-streams docker-compose up Running this command will create four containers:
Starting zookeeper-blog ... Starting broker-blog .</description>
    </item>
    
    <item>
      <title>Deleting Kafka Topics on Docker</title>
      <link>https://www.markhneedham.com/blog/2019/05/23/deleting-kafka-topics-on-docker/</link>
      <pubDate>Thu, 23 May 2019 07:58:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/23/deleting-kafka-topics-on-docker/</guid>
      <description>In this post we’re going to learn how to delete a Kafka Topic when running a Kafka Broker on Docker.
Note Update: 26th July 2023
While the approach described in this blog post still works, I think I’ve now got an even easier way using a command line tool called rpk. If you’re interested in seeing an alternative approach check out my other blog post. If not, as you were!</description>
    </item>
    
    <item>
      <title>KSQL: Create Stream - extraneous input &#39;properties&#39;</title>
      <link>https://www.markhneedham.com/blog/2019/05/20/kql-create-stream-extraneous-input/</link>
      <pubDate>Mon, 20 May 2019 11:43:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/20/kql-create-stream-extraneous-input/</guid>
      <description>In my continued playing with the KSQL streaming engine for Kafka, I came across another interesting error while trying to put a stream on top of a topic generated by the Neo4j Streams Library.
We’ll simplify the events being posted on the topic for this blog post, so this is what the events on the topic look like:
{ &amp;#34;id&amp;#34;:&amp;#34;ABCDEFGHI&amp;#34;, &amp;#34;properties&amp;#34;: { &amp;#34;name&amp;#34;:&amp;#34;Mark&amp;#34;, &amp;#34;location&amp;#34;:&amp;#34;London&amp;#34; } } We then create a stream on that topic:</description>
    </item>
    
    <item>
      <title>KSQL: Create Stream - Failed to prepare statement: name is null</title>
      <link>https://www.markhneedham.com/blog/2019/05/19/ksql-create-stream-failed-to-prepare-statement-name-is-null/</link>
      <pubDate>Sun, 19 May 2019 19:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/19/ksql-create-stream-failed-to-prepare-statement-name-is-null/</guid>
      <description>I’ve been playing with KSQL over the weekend and ran into a basic error message that took me a little while to solve.
I was trying to create a stream over a topic dummy1, which is the simplest possible thing you can do with KSQL. The events posted to dummy1 are JSON messages containing only an id key. Below is an example of a message posted to the topic:</description>
    </item>
    
    <item>
      <title>Kafka: A basic tutorial</title>
      <link>https://www.markhneedham.com/blog/2019/05/16/kafka-basic-tutorial/</link>
      <pubDate>Thu, 16 May 2019 10:02:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/16/kafka-basic-tutorial/</guid>
      <description>In this post we’re going to learn how to launch Kafka locally and write to and read from a topic using one of the Python drivers.
To make things easy for myself, I’ve created a Docker Compose template that launches 3 containers:
broker - our Kafka broker
zookeeper - used by Kafka for leader election
jupyter - notebooks for connecting to our Kafka broker
This template can be downloaded from the mneedham/basic-kafka-tutorial repository, and reads as follows:</description>
    </item>
    
    <item>
      <title>Neo4j: keep/filter keys in a map using APOC</title>
      <link>https://www.markhneedham.com/blog/2019/05/12/neo4j-keep-filter-keys-map-apoc/</link>
      <pubDate>Sun, 12 May 2019 17:58:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/12/neo4j-keep-filter-keys-map-apoc/</guid>
      <description>In this post we’ll learn how to write a Cypher query to create a node in Neo4j containing some of the keys from a map. This post assumes that the APOC library is installed.
We’ll start by creating a map that contains data from my twitter profile:
:param document =&amp;gt; { id: 14707949, name: &amp;#34;Mark Needham&amp;#34;, username: &amp;#34;markhneedham&amp;#34;, bio: &amp;#34;Developer Relations @neo4j&amp;#34;, location: &amp;#34;London, United Kingdom&amp;#34;, url: &amp;#34;http://www.markhneedham.com&amp;#34;, join_date: &amp;#34;8 May 2008&amp;#34;, join_time: &amp;#34;5:58 PM&amp;#34;, tweets: 24710, following: 2479, followers: 5054, likes: 1014 }; We want to create a User node based on this data, but we don’t want to use all of the keys in the map.</description>
    </item>
    
    <item>
      <title>Jupyter: RuntimeError: This event loop is already running</title>
      <link>https://www.markhneedham.com/blog/2019/05/10/jupyter-runtimeerror-this-event-loop-is-already-running/</link>
      <pubDate>Fri, 10 May 2019 23:00:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/05/10/jupyter-runtimeerror-this-event-loop-is-already-running/</guid>
      <description>I’ve been using the twint library to explore the Neo4j twitter community, and ran into an initially confusing error when I moved the code I’d written into a Jupyter notebook.
The first three cells of my notebook contain the following code:
Cell 1:
! pip install twint Cell 2:
import json import twint Cell 3:
users = [&amp;#34;vikatakavi11&amp;#34;, &amp;#34;tee_mars3&amp;#34;] for username in users[:10]: c = twint.Config() c.Username = username c.</description>
    </item>
    
    <item>
      <title>pyspark: Py4JJavaError: An error occurred while calling o138.loadClass.: java.lang.ClassNotFoundException: org.graphframes.GraphFramePythonAPI</title>
      <link>https://www.markhneedham.com/blog/2019/04/17/pyspark-class-not-found-exception-org-graphframes-graphframepythonapi/</link>
      <pubDate>Wed, 17 Apr 2019 09:00:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/04/17/pyspark-class-not-found-exception-org-graphframes-graphframepythonapi/</guid>
      <description>I’ve been building a Docker Container that has support for Jupyter, Spark, GraphFrames, and Neo4j, and ran into a problem that had me pulling my (metaphorical) hair out!
The pyspark-notebook container gets us most of the way there, but it doesn’t have GraphFrames or Neo4j support. Adding Neo4j is as simple as pulling in the Python Driver from Conda Forge, which leaves us with GraphFrames.
When I’m using GraphFrames with pyspark locally I would pull it in via the --packages config parameter, like this:</description>
    </item>
    
    <item>
      <title>Neo4j: Delete all nodes</title>
      <link>https://www.markhneedham.com/blog/2019/04/14/neo4j-delete-all-nodes/</link>
      <pubDate>Sun, 14 Apr 2019 12:52:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/04/14/neo4j-delete-all-nodes/</guid>
      <description>When experimenting with a new database, at some stage we’ll probably want to delete all our data and start again. I was trying to do this with Neo4j over the weekend and it didn’t work as I expected, so I thought I’d write the lessons I learned.
We’ll be using Neo4j via the Neo4j Desktop with the default settings. This means that we have a maximum heap size of 1GB.</description>
    </item>
    
    <item>
      <title>Python: Getting GitHub download count from the GraphQL API using requests</title>
      <link>https://www.markhneedham.com/blog/2019/04/07/python-github-download-count-graphql-requests/</link>
      <pubDate>Sun, 07 Apr 2019 05:03:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/04/07/python-github-download-count-graphql-requests/</guid>
      <description>I was recently trying to use some code I shared just over a year ago to compute GitHub Project download numbers from the GraphQL API, and wanted to automate this in a Python script.
It was more fiddly than I expected, so I thought I’d share the code for the benefit of future me more than anything else!
Prerequisites We’re going to use the popular requests library to query the API, so we need to import that.</description>
    </item>
    
    <item>
      <title>Finding famous MPs based on their Wikipedia Page Views</title>
      <link>https://www.markhneedham.com/blog/2019/04/01/famous-mps-wikipedia-pageviews/</link>
      <pubDate>Mon, 01 Apr 2019 05:03:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/04/01/famous-mps-wikipedia-pageviews/</guid>
      <description>As part of the Graphing Brexit series of blog posts, I wanted to work out who were the most important Members of the UK parliament, and after a bit of Googling I realised that views of their Wikipedia pages would do the trick.
I initially found my way to tools.wmflabs.org, which is great for exploring the popularity of an individual MP, but not so good if you want to extract the data for 600 of them.</description>
    </item>
    
    <item>
      <title>Neo4j: From Graph Model to Neo4j Import</title>
      <link>https://www.markhneedham.com/blog/2019/03/27/from-graph-model-to-neo4j-import/</link>
      <pubDate>Wed, 27 Mar 2019 06:42:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/03/27/from-graph-model-to-neo4j-import/</guid>
      <description>In this post we’re going to learn how to import the DBLP citation network into Neo4j using the Neo4j Import Tool.
In case you haven’t come across this dataset before, Tomaz Bratanic has a great blog post explaining it.
The tl;dr is that we have articles, authors, and venues. Authors can write articles, articles can reference other articles, and articles are presented at a venue. Below is the graph model for this dataset:</description>
    </item>
    
    <item>
      <title>Neo4j: Delete/Remove dynamic properties</title>
      <link>https://www.markhneedham.com/blog/2019/03/14/neo4j-delete-dynamic-properties/</link>
      <pubDate>Thu, 14 Mar 2019 06:42:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/03/14/neo4j-delete-dynamic-properties/</guid>
      <description>Irfan and I were playing with a dataset earlier today, and having run a bunch of graph algorithms, we had a lot of properties that we wanted to clear out.
The following Cypher query puts Neo4j into the state that we were dealing with.
CREATE (:Node {name: &amp;#34;Mark&amp;#34;, pagerank: 2.302, louvain: 1, lpa: 4 }) CREATE (:Node {name: &amp;#34;Michael&amp;#34;, degree: 23, triangles: 12, betweeness: 48.70 }) CREATE (:Node {name: &amp;#34;Ryan&amp;#34;, eigenvector: 2.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Date ranges</title>
      <link>https://www.markhneedham.com/blog/2019/01/13/neo4j-cypher-date-ranges/</link>
      <pubDate>Sun, 13 Jan 2019 06:42:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/01/13/neo4j-cypher-date-ranges/</guid>
      <description>As part of a dataset I’ve been working with this week, I wanted to generate a collection of a range of dates using the Cypher query language.
I’ve previously used the duration function, which lets you add (or subtract) from a specific date, so I thought I’d start from there. If we want to find the day after 1st January 2019, we could write the following query:
neo4j&amp;gt; WITH date(&amp;#34;2019-01-01&amp;#34;) AS startDate RETURN startDate + duration({days: 1}) AS date; +------------+ | date | +------------+ | 2019-01-02 | +------------+ We can extend this code sample to find the next 5 dates from 1st January 2019 by using the range function:</description>
    </item>
    
    <item>
      <title>Neo4j: APOC - Caused by: java.io.RuntimeException: Can&#39;t read url or key file (No such file or directory)</title>
      <link>https://www.markhneedham.com/blog/2019/01/12/neo4j-apoc-file-not-found-exception-no-such-file-directory/</link>
      <pubDate>Sat, 12 Jan 2019 19:05:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/01/12/neo4j-apoc-file-not-found-exception-no-such-file-directory/</guid>
      <description>I’ve been using Neo4j’s APOC library to load some local JSON files this week, and ran into an interesting problem.
The LOAD CSV tool assumes that any files you load locally are in the import directory, so I’ve got into the habit of putting my data there. Let’s check what I’m trying to import by opening the import directory:
What’s in there?
Just the one JSON file needs processing. If we want to import local files we need to add the following property to our Neo4j configuration file:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Remove consecutive duplicates from a list</title>
      <link>https://www.markhneedham.com/blog/2019/01/12/neo4j-cypher-remove-consecutive-duplicates/</link>
      <pubDate>Sat, 12 Jan 2019 04:32:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/01/12/neo4j-cypher-remove-consecutive-duplicates/</guid>
      <description>I was playing with a dataset this week and wanted to share how I removed duplicate consecutive elements from a list using the Cypher query language.
For simplicity’s sake, imagine that we have this list:
neo4j&amp;gt; return [1,2,3,3,4,4,4,5,3] AS values; +-----------------------------+ | values | +-----------------------------+ | [1, 2, 3, 3, 4, 4, 4, 5, 3] | +-----------------------------+ We want to remove the duplicate 3’s and 4’s, such that our end result should be:</description>
    </item>
    
    <item>
      <title>Python: Add query parameters to a URL</title>
      <link>https://www.markhneedham.com/blog/2019/01/11/python-add-query-parameters-url/</link>
      <pubDate>Fri, 11 Jan 2019 09:42:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2019/01/11/python-add-query-parameters-url/</guid>
      <description>I was recently trying to automate adding a query parameter to a bunch of URLS and came across a neat approach a long way down this StackOverflow answer, that uses the PreparedRequest class from the requests library.
Let’s first get the class imported:
from requests.models import PreparedRequest req = PreparedRequest() And now let’s use this class to add a query parameter to a URL. We can do this with the following code:</description>
    </item>
    
    <item>
      <title>Python: Pandas - DataFrame plotting ignoring figure</title>
      <link>https://www.markhneedham.com/blog/2018/12/25/python-pandas-dataframe-plot-figure/</link>
      <pubDate>Tue, 25 Dec 2018 21:09:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/12/25/python-pandas-dataframe-plot-figure/</guid>
      <description>In my continued use of matplotlib I wanted to change the size of the chart I was plotting and struggled a bit to start with. We’ll use the same DataFrame as before:
df = pd.DataFrame({ &amp;#34;name&amp;#34;: [&amp;#34;Mark&amp;#34;, &amp;#34;Arya&amp;#34;, &amp;#34;Praveena&amp;#34;], &amp;#34;age&amp;#34;: [34, 1, 31] }) df In my last blog post I showed how we can create a bar chart by executing the following code:
df.plot.bar(x=&amp;#34;name&amp;#34;) plt.tight_layout() plt.show() plt.close() But how do we make it bigger?</description>
    </item>
    
    <item>
      <title>Neo4j: Pruning transaction logs more aggressively</title>
      <link>https://www.markhneedham.com/blog/2018/12/24/neo4j-prune-transaction-logs-more-aggressively/</link>
      <pubDate>Mon, 24 Dec 2018 21:09:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/12/24/neo4j-prune-transaction-logs-more-aggressively/</guid>
      <description>One thing that surprises new users of Neo4j when playing around with it locally is how much space the transaction logs can take up, especially when we’re creating and deleting lots of data while we get started. We can see this by running the following query a few times:
UNWIND range(0, 1000) AS id CREATE (:Foo {id: id}); MATCH (f:Foo) DELETE f This query creates a bunch of data before immediately deleting it.</description>
    </item>
    
    <item>
      <title>Pandas: Create matplotlib plot with x-axis label not index</title>
      <link>https://www.markhneedham.com/blog/2018/12/21/pandas-plot-x-axis-index/</link>
      <pubDate>Fri, 21 Dec 2018 16:57:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/12/21/pandas-plot-x-axis-index/</guid>
      <description>I’ve been using matplotlib a bit recently, and wanted to share a lesson I learnt about choosing the label of the x-axis. Let’s first import the libraries we’ll use in this post:
import pandas as pd import matplotlib.pyplot as plt And now we’ll create a DataFrame of values that we want to chart:
df = pd.DataFrame({ &amp;#34;name&amp;#34;: [&amp;#34;Mark&amp;#34;, &amp;#34;Arya&amp;#34;, &amp;#34;Praveena&amp;#34;], &amp;#34;age&amp;#34;: [34, 1, 31] }) df This is what our DataFrame looks like:</description>
    </item>
    
    <item>
      <title>PySpark: Creating DataFrame with one column - TypeError: Can not infer schema for type: &lt;type &#39;int&#39;&gt;</title>
      <link>https://www.markhneedham.com/blog/2018/12/09/pyspark-creating-dataframe-one-column/</link>
      <pubDate>Sun, 09 Dec 2018 10:25:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/12/09/pyspark-creating-dataframe-one-column/</guid>
      <description>I’ve been playing with PySpark recently, and wanted to create a DataFrame containing only one column. I tried to do this by writing the following code:
spark.createDataFrame([(1)], [&amp;#34;count&amp;#34;]) If we run that code we’ll get the following error message:
Traceback (most recent call last): File &amp;#34;&amp;lt;stdin&amp;gt;&amp;#34;, line 1, in &amp;lt;module&amp;gt; File &amp;#34;/home/markhneedham/projects/graph-algorithms/spark-2.4.0-bin-hadoop2.7/python/pyspark/sql/session.py&amp;#34;, line 748, in createDataFrame rdd, schema = self._createFromLocal(map(prepare, data), schema) File &amp;#34;/home/markhneedham/projects/graph-algorithms/spark-2.4.0-bin-hadoop2.7/python/pyspark/sql/session.py&amp;#34;, line 416, in _createFromLocal struct = self.</description>
    </item>
    
    <item>
      <title>Neo4j: Storing inferred relationships with APOC triggers</title>
      <link>https://www.markhneedham.com/blog/2018/11/05/neo4j-inferred-relationships-apoc-triggers/</link>
      <pubDate>Mon, 05 Nov 2018 06:15:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/11/05/neo4j-inferred-relationships-apoc-triggers/</guid>
      <description>One of my favourite things about modelling data in graphs is how easy it makes it to infer relationships between pieces of data based on other relationships. In this post we’re going to learn how to compute and store those inferred relationships using the triggers feature from the APOC library.
Meetup Graph Before we get to that, let’s first understand what we mean when we say inferred relationship. We’ll create a small graph containing Person, Meetup, and Topic nodes with the following query:</description>
    </item>
    
    <item>
      <title>Neo4j Graph Algorithms: Visualising Projected Graphs</title>
      <link>https://www.markhneedham.com/blog/2018/10/31/neo4j-graph-algorithms-visualise-projected-graph/</link>
      <pubDate>Wed, 31 Oct 2018 18:12:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/10/31/neo4j-graph-algorithms-visualise-projected-graph/</guid>
      <description>A few weeks ago I wrote a blog post showing how to work out the best tennis player of all time using the Weighted PageRank algorithm, and in the process created a projected credibility graph which I want to explore in more detail in this post.
As I pointed out in that post, sometimes the graph model doesn’t fit well with what the algorithm expects, so we need to project the graph on which we run graph algorithms.</description>
    </item>
    
    <item>
      <title>Neo4j Graph Algorithms: Calculating the cosine similarity of Game of Thrones episodes</title>
      <link>https://www.markhneedham.com/blog/2018/09/28/neo4j-graph-algorithms-cosine-game-of-thrones/</link>
      <pubDate>Fri, 28 Sep 2018 07:55:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/09/28/neo4j-graph-algorithms-cosine-game-of-thrones/</guid>
      <description>A couple of years ago I wrote a blog post showing how to calculate cosine similarity on Game of Thrones episodes using scikit-learn, and with the release of Similarity Algorithms in the Neo4j Graph Algorithms library I thought it was a good time to revisit that post.
The dataset contains characters and episodes, and we want to calculate episode similarity based on the characters that appear in each episode. Before we run any algorithms we need to get the data into Neo4j.</description>
    </item>
    
    <item>
      <title>matplotlib - Create a histogram/bar chart for ratings/full numbers</title>
      <link>https://www.markhneedham.com/blog/2018/09/24/matplotlib-histogram-bar-chart-ratings-full-values/</link>
      <pubDate>Mon, 24 Sep 2018 07:55:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/09/24/matplotlib-histogram-bar-chart-ratings-full-values/</guid>
      <description>In my continued work with matplotlib I wanted to plot a histogram (or bar chart) for a bunch of star ratings to see how they were distributed.
Before we do anything let’s import matplotlib as well as pandas:
import random import pandas as pd import matplotlib matplotlib.use(&amp;#39;TkAgg&amp;#39;) import matplotlib.pyplot as plt plt.style.use(&amp;#39;fivethirtyeight&amp;#39;) Next we’ll create an array of randomly chosen star ratings between 1 and 5:
stars = pd.Series([random.randint(1, 5) for _ in range(0, 100)]) We want to plot a histogram showing the proportion for each rating.</description>
    </item>
    
    <item>
      <title>matplotlib - MatplotlibDeprecationWarning: Adding an axes using the same arguments as a previous axes currently reuses the earlier instance.  In a future version, a new instance will always be created and returned.</title>
      <link>https://www.markhneedham.com/blog/2018/09/18/matplotlib-matplotlib-deprecation-adding-axes/</link>
      <pubDate>Tue, 18 Sep 2018 07:56:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/09/18/matplotlib-matplotlib-deprecation-adding-axes/</guid>
      <description>In my last post I showed how to remove axes legends from a matplotlib chart, and while writing the post I actually had to change the code I used, as my initial approach is now deprecated.
As in the previous post, we’ll first import pandas and matplotlib:
import pandas as pd import matplotlib matplotlib.use(&amp;#39;TkAgg&amp;#39;) import matplotlib.pyplot as plt plt.style.use(&amp;#39;fivethirtyeight&amp;#39;) And we’ll still use this DataFrame:
df = pd.DataFrame({&amp;#34;label&amp;#34;: [&amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;C&amp;#34;, &amp;#34;D&amp;#34;], &amp;#34;count&amp;#34;: [12, 19, 5, 10]}) My initial approach to remove all legends was this:</description>
    </item>
    
    <item>
      <title>matplotlib - Remove axis legend</title>
      <link>https://www.markhneedham.com/blog/2018/09/18/matplotlib-remove-axis-legend/</link>
      <pubDate>Tue, 18 Sep 2018 07:55:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/09/18/matplotlib-remove-axis-legend/</guid>
      <description>I’ve been working with matplotlib a bit recently, and I wanted to remove all axis legends from my chart. It took me a bit longer than I expected to figure it out so I thought I’d write it up.
Before we do anything let’s import matplotlib as well as pandas, since we’re going to plot data from a pandas DataFrame.
import pandas as pd import matplotlib matplotlib.use(&amp;#39;TkAgg&amp;#39;) import matplotlib.pyplot as plt plt.</description>
    </item>
    
    <item>
      <title>Neo4j: Using LOAD CSV to process csv.gz files from S3</title>
      <link>https://www.markhneedham.com/blog/2018/09/05/neo4j-load-csv-gz-s3/</link>
      <pubDate>Wed, 05 Sep 2018 07:26:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/09/05/neo4j-load-csv-gz-s3/</guid>
      <description>I’ve been building some training material for the GraphConnect conference that happens in a couple of weeks’ time, and I wanted to load gzipped CSV files. I got this working using Cypher’s LOAD CSV command with the file stored locally, but when I uploaded it to S3 it didn’t work as I expected.
I uploaded the file to an S3 bucket and then tried to read it back like this:</description>
    </item>
    
    <item>
      <title>QuickGraph #1: Analysing Python Dependency Graph with PageRank, Closeness Centrality, and Betweenness Centrality</title>
      <link>https://www.markhneedham.com/blog/2018/07/16/quick-graph-python-dependency-graph/</link>
      <pubDate>Mon, 16 Jul 2018 05:25:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/07/16/quick-graph-python-dependency-graph/</guid>
      <description>I’ve always wanted to build a dependency graph of libraries in the Python ecosystem but I never quite got around to it…​until now! I thought I might be able to get a dump of all the libraries and their dependencies, but while searching I came across this article which does a good job of explaining why that’s not possible.
Finding Python Dependencies The best we can do is generate a dependency graph of our locally installed packages using the excellent pipdeptree tool.</description>
    </item>
    
    <item>
      <title>Python: Parallel download files using requests</title>
      <link>https://www.markhneedham.com/blog/2018/07/15/python-parallel-download-files-requests/</link>
      <pubDate>Sun, 15 Jul 2018 15:10:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/07/15/python-parallel-download-files-requests/</guid>
      <description>I often find myself downloading web pages with Python’s requests library to do some local scraping when building datasets, but I’ve never come up with a good way to download those pages in parallel.
Below is the code that I use. First we’ll import the required libraries:
import os import requests from time import time as timer And now a function that streams a response into a local file:</description>
    </item>
    
    <item>
      <title>Neo4j 3.4: Grouping Datetimes</title>
      <link>https://www.markhneedham.com/blog/2018/07/10/neo4j-grouping-datetimes/</link>
      <pubDate>Tue, 10 Jul 2018 04:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/07/10/neo4j-grouping-datetimes/</guid>
      <description>In my continued analysis of Strava runs I wanted to try and find my best runs grouped by different time components, which was actually much easier than I was expecting.
Importing the dataset If you want to try out the examples below you can execute the following LOAD CSV commands to load the data:
LOAD CSV WITH HEADERS FROM &amp;#34;https://github.com/mneedham/strava/raw/master/runs.csv&amp;#34; AS row MERGE (run:Run {id: toInteger(row.id)}) SET run.distance = toFloat(row.</description>
    </item>
    
    <item>
      <title>Neo4j 3.4: Syntax Error - Text cannot be parsed to a Duration (aka dealing with empty durations)</title>
      <link>https://www.markhneedham.com/blog/2018/07/09/neo4j-text-cannot-be-parsed-to-duration/</link>
      <pubDate>Mon, 09 Jul 2018 18:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/07/09/neo4j-text-cannot-be-parsed-to-duration/</guid>
      <description>As I continued with my travels with Neo4j 3.4’s temporal data type I came across some fun edge cases when dealing with empty durations while importing data.
Imagine we’re trying to create 3 nodes from the following array of input data. Two of the rows have invalid durations!
UNWIND [ {id: 12345, duration: &amp;#34;PT2M20S&amp;#34;}, {id: 12346, duration: &amp;#34;&amp;#34;}, {id: 12347, duration: null} ] AS row MERGE (run:Run {id: row.id}) SET run.</description>
    </item>
    
    <item>
      <title>Neo4j: Querying the Strava Graph using Py2neo</title>
      <link>https://www.markhneedham.com/blog/2018/06/15/neo4j-querying-strava-graph-py2neo/</link>
      <pubDate>Fri, 15 Jun 2018 13:45:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/06/15/neo4j-querying-strava-graph-py2neo/</guid>
      <description>Last week Nigel released v4 of Py2neo and given I was just getting ready to write some queries against my Strava activity graph I thought I’d give it a try.
If you want to learn how to create your own Strava graph you should read my previous post, but just to recap, this is the graph model that we created:
Let’s get to it!
tl;dr the code in this post is available as a Jupyter notebook so if you want the code and nothing but the code head over there!</description>
    </item>
    
    <item>
      <title>Neo4j: Building a graph of Strava activities</title>
      <link>https://www.markhneedham.com/blog/2018/06/12/neo4j-building-strava-graph/</link>
      <pubDate>Tue, 12 Jun 2018 05:30:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/06/12/neo4j-building-strava-graph/</guid>
      <description>In my last post I showed how to import activities from Strava’s API into Neo4j using only the APOC library, but that was only part of the graph so I thought I’d share the rest of what I’ve done.
The Graph Model In the previous post I showed how to import nodes with Run label, but there are some other pieces of data that I wanted to import as well.</description>
    </item>
    
    <item>
      <title>Neo4j APOC: Importing data from Strava&#39;s paginated JSON API</title>
      <link>https://www.markhneedham.com/blog/2018/06/05/neo4j-apoc-loading-data-strava-paginated-json-api/</link>
      <pubDate>Tue, 05 Jun 2018 05:30:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/06/05/neo4j-apoc-loading-data-strava-paginated-json-api/</guid>
      <description>Over the weekend I’ve been playing around with loading data from the Strava API into Neo4j and I started with the following Python script which creates a node with a Run label for each of my activities.
If you want to follow along on your own data you’ll need to get an API key via the &amp;#39;My API Application&amp;#39; section of the website. Once you’ve got that put it in the TOKEN environment variable and you should be good to go.</description>
    </item>
    
    <item>
      <title>Neo4j 3.4: Gotchas when working with Durations</title>
      <link>https://www.markhneedham.com/blog/2018/06/03/neo4j-3.4-gotchas-working-with-durations/</link>
      <pubDate>Sun, 03 Jun 2018 20:11:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/06/03/neo4j-3.4-gotchas-working-with-durations/</guid>
      <description>Continuing with my explorations of Strava data in Neo4j I wanted to share some things I learnt while trying to work out my pace for certain distances.
Before we get into the pace calculations let’s first understand how the duration function works. If we run the following query we might expect to get back the same value that we put in…​
RETURN duration({seconds: 413.77}).seconds AS seconds ╒═════════╕ │&amp;#34;seconds&amp;#34;│ ╞═════════╡ │413 │ └─────────┘ …​but as you can see the value gets rounded down to the nearest whole number, losing us some accuracy.</description>
    </item>
    
    <item>
      <title>Neo4j 3.4: Formatting instances of the Duration and Datetime date types</title>
      <link>https://www.markhneedham.com/blog/2018/06/03/neo4j-3.4-formatting-instances-durations-dates/</link>
      <pubDate>Sun, 03 Jun 2018 04:08:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/06/03/neo4j-3.4-formatting-instances-durations-dates/</guid>
      <description>In my last blog post I showed how to compare instances of Neo4j’s Duration data type, and in the middle of the post I realised that I needed to use the APOC library to return the value in the format I wanted. This was the solution I ended up with:
WITH duration({seconds: 100}) AS duration RETURN apoc.text.lpad(toString(duration.minutes), 2, &amp;#34;0&amp;#34;) + &amp;#34;:&amp;#34; + apoc.text.lpad(toString(duration.secondsOfMinute), 2, &amp;#34;0&amp;#34;) If we run that query this is the output:</description>
    </item>
    
    <item>
      <title>Neo4j 3.4: Comparing durations</title>
      <link>https://www.markhneedham.com/blog/2018/06/02/neo4j-3.4-comparing-durations/</link>
      <pubDate>Sat, 02 Jun 2018 03:24:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/06/02/neo4j-3.4-comparing-durations/</guid>
      <description>Neo4j 3.4 saw the introduction of the temporal date type, which my colleague Adam Cowley covered in his excellent blog post, and in this post I want to share my experience using durations from my Strava runs.
I’ll show how to load the whole Strava dataset in another blog post but for now we’ll just manually create some durations based on the elapsed time in seconds that Strava provides. We can run the following query to convert duration in seconds into the duration type:</description>
    </item>
    
    <item>
      <title>Interpreting Word2vec or GloVe embeddings using scikit-learn and Neo4j graph algorithms</title>
      <link>https://www.markhneedham.com/blog/2018/05/19/interpreting-word2vec-glove-embeddings-sklearn-neo4j-graph-algorithms/</link>
      <pubDate>Sat, 19 May 2018 09:47:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/05/19/interpreting-word2vec-glove-embeddings-sklearn-neo4j-graph-algorithms/</guid>
      <description>A couple of weeks ago I came across a paper titled Parameter Free Hierarchical Graph-Based Clustering for Analyzing Continuous Word Embeddings via Abigail See&amp;#39;s blog post about ACL 2017.
The paper explains an algorithm that helps to make sense of word embeddings generated by algorithms such as Word2vec and GloVe.
I’m fascinated by how graphs can be used to interpret seemingly black box data, so I was immediately intrigued and wanted to try and reproduce their findings using Neo4j.</description>
    </item>
    
    <item>
      <title>Predicting movie genres with node2Vec and Tensorflow</title>
      <link>https://www.markhneedham.com/blog/2018/05/11/node2vec-tensorflow/</link>
      <pubDate>Fri, 11 May 2018 08:12:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/05/11/node2vec-tensorflow/</guid>
      <description>In my previous post we looked at how to get up and running with the node2Vec algorithm, and in this post we’ll learn how we can feed graph embeddings into a simple Tensorflow model.
Recall that node2Vec takes in a list of edges (or relationships) and gives us back an embedding (array of numbers) for each node.
This time we’re going to run the algorithm over a movies recommendation dataset from the Neo4j Sandbox.</description>
    </item>
    
    <item>
      <title>Exploring node2vec - a graph embedding algorithm</title>
      <link>https://www.markhneedham.com/blog/2018/05/11/exploring-node2vec-graph-embedding-algorithm/</link>
      <pubDate>Fri, 11 May 2018 08:08:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/05/11/exploring-node2vec-graph-embedding-algorithm/</guid>
      <description>In my explorations of graph based machine learning, one algorithm I came across is called node2Vec. The paper describes it as &amp;#34;an algorithmic framework for learning continuous feature representations for nodes in networks&amp;#34;.
So what does the algorithm do? From the website:
The node2vec framework learns low-dimensional representations for nodes in a graph by optimizing a neighborhood preserving objective. The objective is flexible, and the algorithm accommodates for various definitions of network neighborhoods by simulating biased random walks.</description>
    </item>
    
    <item>
      <title>Tensorflow 1.8: Hello World using the Estimator API</title>
      <link>https://www.markhneedham.com/blog/2018/05/05/tensorflow-18-hello-world-using-estimator-api/</link>
      <pubDate>Sat, 05 May 2018 00:31:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/05/05/tensorflow-18-hello-world-using-estimator-api/</guid>
      <description>Over the last week I’ve been going over various Tensorflow tutorials and one of the best ones when getting started is Sidath Asiri’s Hello World in TensorFlow, which shows how to build a simple linear classifier on the Iris dataset.
I’ll use the same data as Sidath, so if you want to follow along you’ll need to download these files:
iris_training.csv
iris_test.csv
Loading data The way we load data will remain exactly the same - we’ll still be reading it into a Pandas dataframe:</description>
    </item>
    
    <item>
      <title>Python via virtualenv on Mac OS X: RuntimeError: Python is not installed as a framework.</title>
      <link>https://www.markhneedham.com/blog/2018/05/04/python-runtime-error-osx-matplotlib-not-installed-as-framework-mac/</link>
      <pubDate>Fri, 04 May 2018 22:03:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/05/04/python-runtime-error-osx-matplotlib-not-installed-as-framework-mac/</guid>
      <description>I’ve previously written a couple of blog posts about my troubles getting matplotlib to play nicely, and I ran into a slightly different variant today while following Sidath Asiri’s Hello World in TensorFlow tutorial.
When I ran the script using a version of Python installed via virtualenv I got the following exception:
Traceback (most recent call last): File &amp;#34;iris.py&amp;#34;, line 4, in &amp;lt;module&amp;gt; from matplotlib import pyplot as plt File &amp;#34;/Users/markneedham/projects/tensorflow-playground/a/lib/python3.</description>
    </item>
    
    <item>
      <title>PyData London 2018 Conference Experience Report</title>
      <link>https://www.markhneedham.com/blog/2018/04/29/pydata-london-2018/</link>
      <pubDate>Sun, 29 Apr 2018 11:54:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/04/29/pydata-london-2018/</guid>
      <description>Over the last few days I attended PyData London 2018 and wanted to share my experience. The PyData series of conferences aims to bring together users and developers of data analysis tools to share ideas and learn from each other. I presented a talk on building a recommendation engine with Python and Neo4j at the 2016 version but didn’t attend last year.
The organisers said there were ~550 attendees spread over 1 day of tutorials and 2 days of talks.</description>
    </item>
    
    <item>
      <title>Python: Serialize and Deserialize Numpy 2D arrays</title>
      <link>https://www.markhneedham.com/blog/2018/04/07/python-serialize-deserialize-numpy-2d-arrays/</link>
      <pubDate>Sat, 07 Apr 2018 19:38:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/04/07/python-serialize-deserialize-numpy-2d-arrays/</guid>
      <description>I’ve been playing around with saving and loading scikit-learn models and needed to serialize and deserialize Numpy arrays as part of the process.
I could use pickle but that seems a bit overkill so I decided instead to save the byte representation of the array. We can get that representation by calling the tobytes method on a Numpy array:
import numpy as np &amp;gt;&amp;gt;&amp;gt; np.array([ [1,2,3], [4,5,6], [7,8,9] ]) array([[1, 2, 3], [4, 5, 6], [7, 8, 9]]) &amp;gt;&amp;gt;&amp;gt; np.</description>
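The round trip the excerpt describes can be sketched as follows (a minimal sketch, assuming the shape and dtype are stored alongside the bytes, since tobytes discards both):

```python
import numpy as np

# tobytes() flattens the array to raw bytes and discards shape and dtype,
# so both must be kept alongside the bytes to rebuild the 2D array.
original = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
serialised = original.tobytes()
dtype, shape = original.dtype, original.shape

# frombuffer yields a flat 1D array; reshape restores the original layout.
restored = np.frombuffer(serialised, dtype=dtype).reshape(shape)

assert np.array_equal(restored, original)
```

The same idea extends to any number of dimensions, as long as shape and dtype travel with the byte payload.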
    </item>
    
    <item>
      <title>Python 3: Converting a list to a dictionary with dictionary comprehensions</title>
      <link>https://www.markhneedham.com/blog/2018/04/02/python-list-to-dictionary-comprehensions/</link>
      <pubDate>Mon, 02 Apr 2018 04:20:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/04/02/python-list-to-dictionary-comprehensions/</guid>
      <description>When coding in Python I often find myself with lists containing key/value pairs that I want to convert to a dictionary.
In a recent example I had the following code:
values = [{&amp;#39;key&amp;#39;: &amp;#39;name&amp;#39;, &amp;#39;value&amp;#39;: &amp;#39;Mark&amp;#39;}, {&amp;#39;key&amp;#39;: &amp;#39;age&amp;#39;, &amp;#39;value&amp;#39;: 34}] And I wanted to create a dictionary that had the keys name and age and their respective values. The easiest way to convert this list to a dictionary is to iterate over the list and construct the dictionary key by key:</description>
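The dictionary-comprehension version of the conversion described above can be sketched like this (variable names mirror the excerpt's example):

```python
# The list of key/value pair dicts from the example above.
values = [{'key': 'name', 'value': 'Mark'}, {'key': 'age', 'value': 34}]

# One pass over the list: each item contributes a single key/value pair.
result = {item['key']: item['value'] for item in values}

print(result)  # {'name': 'Mark', 'age': 34}
```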
    </item>
    
    <item>
      <title>GitHub: Getting the download count for a release</title>
      <link>https://www.markhneedham.com/blog/2018/03/23/github-release-download-count/</link>
      <pubDate>Fri, 23 Mar 2018 15:49:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/03/23/github-release-download-count/</guid>
      <description>At Neo4j we distribute several of our Developer Relations projects via GitHub Releases so I was curious whether there was a way to see how many people had downloaded them.
I found an article explaining how to do it on v3 of the GitHub API, but I’ve got used to the v4 GraphQL API and I’m not going back! Thankfully it’s not too difficult to figure out.
GitHub let you explore the API via the GitHub GraphQL Explorer and the following query gets us the information we require:</description>
    </item>
    
    <item>
      <title>Neo4j Desktop: undefined: Unable to extract host from undefined</title>
      <link>https://www.markhneedham.com/blog/2018/03/20/neo4j-undefined-unable-to-extract-host-from-undefined/</link>
      <pubDate>Tue, 20 Mar 2018 17:51:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/03/20/neo4j-undefined-unable-to-extract-host-from-undefined/</guid>
      <description>During a training session I facilitated today one of the attendees got the following error message while trying to execute a query inside the Neo4j Desktop.
This error message happens if we try to run a query when the database hasn’t been started, and would usually be accompanied by this screen:
On this occasion that wasn’t happening, but we can easily fix it by going back to the project screen and starting the database:</description>
    </item>
    
    <item>
      <title>Neo4j: Using the Neo4j Import Tool with the Neo4j Desktop</title>
      <link>https://www.markhneedham.com/blog/2018/03/19/neo4j-using-neo4j-import-tool-with-neo4j-desktop/</link>
      <pubDate>Mon, 19 Mar 2018 21:38:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/03/19/neo4j-using-neo4j-import-tool-with-neo4j-desktop/</guid>
      <description>Last week as part of a modelling and import webinar I showed how to use the Neo4j Import Tool to create a graph of the Yelp Open Dataset:
Afterwards I realised that I didn’t show how to use the tool if you already have an existing database in place so this post will show how to do that.
Imagine we have a Neo4j Desktop project that looks like this:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Neo.ClientError.Statement.TypeError: Don&#39;t know how to add Double and String</title>
      <link>https://www.markhneedham.com/blog/2018/03/14/neo4j-cypher-neo-clienterror-statement-typeerror-dont-know-add-double-string/</link>
      <pubDate>Wed, 14 Mar 2018 16:53:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/03/14/neo4j-cypher-neo-clienterror-statement-typeerror-dont-know-add-double-string/</guid>
      <description>I recently upgraded a Neo4j backed application from Neo4j 3.2 to Neo4j 3.3 and came across an interesting change in behaviour around type coercion which led to my application throwing a bunch of errors.
In Neo4j 3.2 and earlier if you added a String to a Double it would coerce the Double to a String and concatenate the values. The following would therefore be valid Cypher:
RETURN toFloat(&amp;#34;1.0&amp;#34;) + &amp;#34; Mark&amp;#34; ╒══════════╕ │&amp;#34;result&amp;#34; │ ╞══════════╡ │&amp;#34;1.</description>
    </item>
    
    <item>
      <title>Yelp: Reverse geocoding businesses to extract detailed location information</title>
      <link>https://www.markhneedham.com/blog/2018/03/14/yelp-reverse-geocoding-businesses-extract-detailed-location-information/</link>
      <pubDate>Wed, 14 Mar 2018 08:53:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/03/14/yelp-reverse-geocoding-businesses-extract-detailed-location-information/</guid>
      <description>I’ve been playing around with the Yelp Open Dataset and wanted to extract more detailed location information for each business.
This is an example of the JSON representation of one business:
$ cat dataset/business.json | head -n1 | jq { &amp;#34;business_id&amp;#34;: &amp;#34;FYWN1wneV18bWNgQjJ2GNg&amp;#34;, &amp;#34;name&amp;#34;: &amp;#34;Dental by Design&amp;#34;, &amp;#34;neighborhood&amp;#34;: &amp;#34;&amp;#34;, &amp;#34;address&amp;#34;: &amp;#34;4855 E Warner Rd, Ste B9&amp;#34;, &amp;#34;city&amp;#34;: &amp;#34;Ahwatukee&amp;#34;, &amp;#34;state&amp;#34;: &amp;#34;AZ&amp;#34;, &amp;#34;postal_code&amp;#34;: &amp;#34;85044&amp;#34;, &amp;#34;latitude&amp;#34;: 33.3306902, &amp;#34;longitude&amp;#34;: -111.9785992, &amp;#34;stars&amp;#34;: 4, &amp;#34;review_count&amp;#34;: 22, &amp;#34;is_open&amp;#34;: 1, &amp;#34;attributes&amp;#34;: { &amp;#34;AcceptsInsurance&amp;#34;: true, &amp;#34;ByAppointmentOnly&amp;#34;: true, &amp;#34;BusinessAcceptsCreditCards&amp;#34;: true }, &amp;#34;categories&amp;#34;: [ &amp;#34;Dentists&amp;#34;, &amp;#34;General Dentistry&amp;#34;, &amp;#34;Health &amp;amp; Medical&amp;#34;, &amp;#34;Oral Surgeons&amp;#34;, &amp;#34;Cosmetic Dentists&amp;#34;, &amp;#34;Orthodontists&amp;#34; ], &amp;#34;hours&amp;#34;: { &amp;#34;Friday&amp;#34;: &amp;#34;7:30-17:00&amp;#34;, &amp;#34;Tuesday&amp;#34;: &amp;#34;7:30-17:00&amp;#34;, &amp;#34;Thursday&amp;#34;: &amp;#34;7:30-17:00&amp;#34;, &amp;#34;Wednesday&amp;#34;: &amp;#34;7:30-17:00&amp;#34;, &amp;#34;Monday&amp;#34;: &amp;#34;7:30-17:00&amp;#34; } } The businesses reside in different countries so I wanted to extract the area/county/state and the country for each of them.</description>
    </item>
    
    <item>
      <title>Running asciidoctor-pdf on TeamCity</title>
      <link>https://www.markhneedham.com/blog/2018/03/13/running-asciidoctor-pdf-teamcity/</link>
      <pubDate>Tue, 13 Mar 2018 21:57:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/03/13/running-asciidoctor-pdf-teamcity/</guid>
      <description>I’ve been using asciidoctor-pdf to generate PDFs, and while I was initially running the tool locally I eventually decided to set up a build on TeamCity.
It was a bit trickier than I expected, mostly because I’m not that familiar with deploying Ruby applications, but I thought I’d capture what I’ve done for future me.
I have the following Gemfile that installs asciidoctor-pdf and its dependencies:
Gemfile
source &amp;#39;https://rubygems.org&amp;#39; gem &amp;#39;prawn&amp;#39; gem &amp;#39;addressable&amp;#39; gem &amp;#39;prawn-svg&amp;#39; gem &amp;#39;prawn-templates&amp;#39; gem &amp;#39;asciidoctor-pdf&amp;#39; I don’t have permissions to install gems globally on the build agents so I’m bundling those up into the vendor directory.</description>
    </item>
    
    <item>
      <title>Neo4j Import: java.lang.IllegalStateException: Mixing specified and unspecified group belongings in a single import isn&#39;t supported</title>
      <link>https://www.markhneedham.com/blog/2018/03/07/neo4j-import-java-lang-illegalstateexception-mixing-specified-unspecified-group-belongings-single-import-isnt-supported/</link>
      <pubDate>Wed, 07 Mar 2018 03:11:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/03/07/neo4j-import-java-lang-illegalstateexception-mixing-specified-unspecified-group-belongings-single-import-isnt-supported/</guid>
      <description>I’ve been working with the Neo4j Import Tool recently after a bit of a break and ran into an interesting error message that I initially didn’t understand.
I had some CSV files containing nodes that I wanted to import into Neo4j. Their contents look like this:
$ cat people_header.csv name:ID(Person) $ cat people.csv &amp;#34;Mark&amp;#34; &amp;#34;Michael&amp;#34; &amp;#34;Ryan&amp;#34; &amp;#34;Will&amp;#34; &amp;#34;Jennifer&amp;#34; &amp;#34;Karin&amp;#34; $ cat companies_header.csv name:ID(Company) $ cat companies.csv &amp;#34;Neo4j&amp;#34; I find it easier to use separate header files because I often make typos with my column names and it’s easier to update a single line file than to open a multi-million line file and change the first line.</description>
    </item>
    
    <item>
      <title>Asciidoctor: Creating a macro</title>
      <link>https://www.markhneedham.com/blog/2018/02/19/asciidoctor-creating-macro/</link>
      <pubDate>Mon, 19 Feb 2018 20:51:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/02/19/asciidoctor-creating-macro/</guid>
      <description>I’ve been writing the TWIN4j blog for almost a year now and during that time I’ve written a few different asciidoc macros to avoid repetition.
The most recent one I wrote does the formatting around the Featured Community Member of the Week. I call it like this from the asciidoc, passing in the name of the person and a link to an image:
featured::https://s3.amazonaws.com/dev.assets.neo4j.com/wp-content/uploads/20180202004247/this-week-in-neo4j-3-february-2018.jpg[name=&amp;#34;Suellen Stringer-Hye&amp;#34;] The code for the macro has two parts.</description>
    </item>
    
    <item>
      <title>Tensorflow: Kaggle Spooky Authors Bag of Words Model</title>
      <link>https://www.markhneedham.com/blog/2018/01/29/tensorflow-kaggle-spooky-authors-bag-words-model/</link>
      <pubDate>Mon, 29 Jan 2018 06:51:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/01/29/tensorflow-kaggle-spooky-authors-bag-words-model/</guid>
      <description>I’ve been playing around with some Tensorflow tutorials recently and wanted to see if I could create a submission for Kaggle’s Spooky Author Identification competition that I’ve written about recently.
My model is based on one from the text classification tutorial. The tutorial shows how to create custom Estimators which we can learn more about in a post on the Google Developers blog.
Imports Let’s get started. First, our imports:</description>
    </item>
    
    <item>
      <title>Asciidoc to Asciidoc: Exploding includes</title>
      <link>https://www.markhneedham.com/blog/2018/01/23/asciidoc-asciidoc-exploding-includes/</link>
      <pubDate>Tue, 23 Jan 2018 21:11:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/01/23/asciidoc-asciidoc-exploding-includes/</guid>
      <description>One of my favourite features in AsciiDoc is the ability to include other files, but when using lots of includes it becomes difficult to read the whole document unless you convert it to one of the supported backends.
$ asciidoctor --help Usage: asciidoctor [OPTION]... FILE... Translate the AsciiDoc source FILE or FILE(s) into the backend output format (e.g., HTML 5, DocBook 4.5, etc.) By default, the output is written to a file with the basename of the source file and the appropriate extension.</description>
    </item>
    
    <item>
      <title>Strava: Calculating the similarity of two runs</title>
      <link>https://www.markhneedham.com/blog/2018/01/18/strava-calculating-similarity-two-runs/</link>
      <pubDate>Thu, 18 Jan 2018 23:35:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2018/01/18/strava-calculating-similarity-two-runs/</guid>
      <description>I go running several times a week and wanted to compare my runs against each other to see how similar they are.
I record my runs with the Strava app and it has an API that returns lat/long coordinates for each run in the Google encoded polyline algorithm format.
We can use the polyline library to decode these values into a list of lat/long tuples. For example:
import polyline polyline.</description>
    </item>
    
    <item>
      <title>Leaflet: Fit polyline in view</title>
      <link>https://www.markhneedham.com/blog/2017/12/31/leaflet-fit-polyline-view/</link>
      <pubDate>Sun, 31 Dec 2017 17:35:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/12/31/leaflet-fit-polyline-view/</guid>
      <description>I’ve been playing with the Leaflet.js library over the Christmas holidays to visualise running routes drawn onto the map using a Polyline, and I wanted to zoom the map the right amount to see all the points.
Prerequisites We have the following HTML to define the div that will contain the map.
&amp;lt;div id=&amp;#34;container&amp;#34;&amp;gt; &amp;lt;div id=&amp;#34;map&amp;#34; style=&amp;#34;width: 100%; height: 100%&amp;#34;&amp;gt; &amp;lt;/div&amp;gt; &amp;lt;/div&amp;gt; We also need to import the following Javascript and CSS files:</description>
    </item>
    
    <item>
      <title>Ethereum Hello World Example using solc and web3</title>
      <link>https://www.markhneedham.com/blog/2017/12/28/ethereum-hello-world-example-using-solc-and-web3/</link>
      <pubDate>Thu, 28 Dec 2017 11:03:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/12/28/ethereum-hello-world-example-using-solc-and-web3/</guid>
      <description>I’ve been trying to find an Ethereum Hello World example and came across Thomas Conté’s excellent post that shows how to compile and deploy an Ethereum smart contract with solc and web3.
In the latest version of web3 the API has changed to be based on promises so I decided to translate Thomas&amp;#39; example.
Let’s get started.
Install npm libraries We need to install these libraries before we start:</description>
    </item>
    
    <item>
      <title>Morning Pages: What should I write about?</title>
      <link>https://www.markhneedham.com/blog/2017/12/27/morning-pages-write/</link>
      <pubDate>Wed, 27 Dec 2017 23:28:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/12/27/morning-pages-write/</guid>
      <description>I’ve been journalling for almost 2 years now, but some days I get stuck and can’t think of anything to write about.
I did a bit of searching to see if anybody had advice on solving this problem and found a few different articles:
The Productive Benefits of Journaling (plus 11 ideas for making the habit stick)
Read This If You Want To Keep A Journal But Don’t Know How</description>
    </item>
    
    <item>
      <title>scikit-learn: Using GridSearch to tune the hyper-parameters of VotingClassifier</title>
      <link>https://www.markhneedham.com/blog/2017/12/10/scikit-learn-using-gridsearch-tune-hyper-parameters-votingclassifier/</link>
      <pubDate>Sun, 10 Dec 2017 07:55:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/12/10/scikit-learn-using-gridsearch-tune-hyper-parameters-votingclassifier/</guid>
      <description>In my last blog post I showed how to create a multi class classification ensemble using scikit-learn’s VotingClassifier (http://scikit-learn.org/stable/modules/ensemble.html#voting-classifier) and finished by mentioning that I didn’t know which classifiers should be part of the ensemble.
Each classifier needs to improve the ensemble’s score, otherwise it can be excluded.
We have a TF/IDF based classifier as well as the classifiers I wrote about in the last post.</description>
    </item>
    
    <item>
      <title>scikit-learn: Building a multi class classification ensemble</title>
      <link>https://www.markhneedham.com/blog/2017/12/05/scikit-learn-building-multi-class-classification-ensemble/</link>
      <pubDate>Tue, 05 Dec 2017 22:19:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/12/05/scikit-learn-building-multi-class-classification-ensemble/</guid>
      <description>For the Kaggle Spooky Author Identification I wanted to combine multiple classifiers together into an ensemble and found the VotingClassifier that does exactly that.
We need to predict the probability that a sentence is written by one of three authors so the VotingClassifier needs to make a &amp;#39;soft&amp;#39; prediction. If we only needed to know the most likely author we could have it make a &amp;#39;hard&amp;#39; prediction instead.
We start with three classifiers which generate different n-gram based features.</description>
    </item>
    
    <item>
      <title>Python: Combinations of values on and off</title>
      <link>https://www.markhneedham.com/blog/2017/12/03/python-combinations-values-off/</link>
      <pubDate>Sun, 03 Dec 2017 17:23:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/12/03/python-combinations-values-off/</guid>
      <description>In my continued exploration of Kaggle’s Spooky Authors competition, I wanted to run a GridSearch turning on and off different classifiers to work out the best combination.
I therefore needed to generate combinations of 1s and 0s enabling different classifiers.
e.g. if we had 3 classifiers we’d generate these combinations
0 0 1 0 1 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 1 where.</description>
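The combinations listed above can be generated with itertools (a sketch: `product` over {0, 1}, dropping the all-zeros case since at least one classifier must be enabled):

```python
from itertools import product

# Every on/off assignment for 3 classifiers is a 3-tuple over {0, 1};
# filter out (0, 0, 0) because an ensemble needs at least one classifier.
combinations = [combo for combo in product([0, 1], repeat=3) if any(combo)]

for combo in combinations:
    print(combo)

# 7 combinations: the 8 binary 3-tuples minus the all-zeros one.
```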
    </item>
    
    <item>
      <title>Neo4j: Cypher - Property values can only be of primitive types or arrays thereof.</title>
      <link>https://www.markhneedham.com/blog/2017/12/01/neo4j-cypher-property-values-can-primitive-types-arrays-thereof/</link>
      <pubDate>Fri, 01 Dec 2017 22:09:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/12/01/neo4j-cypher-property-values-can-primitive-types-arrays-thereof/</guid>
      <description>I ran into an interesting Cypher error message earlier this week while trying to create an array property on a node which I thought I’d share.
This was the Cypher query I wrote:
CREATE (:Person {id: [1, &amp;#34;mark&amp;#34;, 2.0]}) which results in this error:
Neo.ClientError.Statement.TypeError Property values can only be of primitive types or arrays thereof. We actually are storing an array of primitives but we have a mix of different types which isn’t allowed.</description>
    </item>
    
    <item>
      <title>Python: Learning about defaultdict&#39;s handling of missing keys</title>
      <link>https://www.markhneedham.com/blog/2017/12/01/python-learning-defaultdicts-handling-missing-keys/</link>
      <pubDate>Fri, 01 Dec 2017 15:26:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/12/01/python-learning-defaultdicts-handling-missing-keys/</guid>
      <description>While reading the scikit-learn code I came across a bit of code that I didn’t understand for a while but in retrospect is quite neat.
This is the code snippet that intrigued me:
vocabulary = defaultdict() vocabulary.default_factory = vocabulary.__len__ Let’s quickly see how it works by adapting an example from scikit-learn:
&amp;gt;&amp;gt;&amp;gt; from collections import defaultdict &amp;gt;&amp;gt;&amp;gt; vocabulary = defaultdict() &amp;gt;&amp;gt;&amp;gt; vocabulary.default_factory = vocabulary.__len__ &amp;gt;&amp;gt;&amp;gt; vocabulary[&amp;#34;foo&amp;#34;] 0 &amp;gt;&amp;gt;&amp;gt; vocabulary.items() dict_items([(&amp;#39;foo&amp;#39;, 0)]) &amp;gt;&amp;gt;&amp;gt; vocabulary[&amp;#34;bar&amp;#34;] 1 &amp;gt;&amp;gt;&amp;gt; vocabulary.</description>
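The trick in the excerpt can be rerun as a small self-contained sketch: because default_factory is the dict’s own __len__, every unseen key is assigned the next sequential id.

```python
from collections import defaultdict

# default_factory is called with no arguments whenever a key is missing;
# using the dict's own __len__ means each new key gets the current size,
# i.e. the next sequential id: 0, 1, 2, ...
vocabulary = defaultdict()
vocabulary.default_factory = vocabulary.__len__

ids = [vocabulary[word] for word in ['foo', 'bar', 'foo', 'baz']]
print(ids)  # [0, 1, 0, 2]
```

Repeated keys reuse their existing id, which is exactly what a vocabulary builder wants.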
    </item>
    
    <item>
      <title>scikit-learn: Creating a matrix of named entity counts</title>
      <link>https://www.markhneedham.com/blog/2017/11/29/scikit-learn-creating-a-matrix-of-named-entity-counts/</link>
      <pubDate>Wed, 29 Nov 2017 23:01:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/11/29/scikit-learn-creating-a-matrix-of-named-entity-counts/</guid>
      <description>I’ve been trying to improve my score on Kaggle’s Spooky Author Identification competition, and my latest idea was building a model which used named entities extracted using the polyglot NLP library.
We’ll start by learning how to extract entities from a sentence using polyglot, which isn’t too tricky:
&amp;gt;&amp;gt;&amp;gt; from polyglot.text import Text &amp;gt;&amp;gt;&amp;gt; doc = &amp;#34;My name is David Beckham. Hello from London, England&amp;#34; &amp;gt;&amp;gt;&amp;gt; Text(doc, hint_language_code=&amp;#34;en&amp;#34;).entities [I-PER([&amp;#39;David&amp;#39;, &amp;#39;Beckham&amp;#39;]), I-LOC([&amp;#39;London&amp;#39;]), I-LOC([&amp;#39;England&amp;#39;])] This sentence contains three entities.</description>
    </item>
    
    <item>
      <title>Python: polyglot - ModuleNotFoundError: No module named &#39;icu&#39;</title>
      <link>https://www.markhneedham.com/blog/2017/11/28/python-polyglot-modulenotfounderror-no-module-named-icu/</link>
      <pubDate>Tue, 28 Nov 2017 19:52:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/11/28/python-polyglot-modulenotfounderror-no-module-named-icu/</guid>
      <description>I wanted to use the polyglot NLP library that my colleague Will Lyon mentioned in his analysis of Russian Twitter Trolls but had installation problems which I thought I’d share in case anyone else experiences the same issues.
I started by trying to install polyglot:
$ pip install polyglot ImportError: No module named &amp;#39;icu&amp;#39; Hmmm I’m not sure what icu is but luckily there’s a GitHub issue covering this problem.</description>
    </item>
    
    <item>
      <title>Python 3: TypeError: unsupported format string passed to numpy.ndarray.*format*</title>
      <link>https://www.markhneedham.com/blog/2017/11/19/python-3-typeerror-unsupported-format-string-passed-to-numpy-ndarray-__format__/</link>
      <pubDate>Sun, 19 Nov 2017 07:16:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/11/19/python-3-typeerror-unsupported-format-string-passed-to-numpy-ndarray-__format__/</guid>
      <description>This post explains how to work around a change in how Python string formatting works for numpy arrays between Python 2 and Python 3.
I’ve been going through Kevin Markham&amp;#39;s scikit-learn Jupyter notebooks and ran into a problem on the Cross Validation one, which was throwing this error when attempting to print the KFold example:
Iteration Training set observations Testing set observations --------------------------------------------------------------------------- TypeError Traceback (most recent call last) &amp;lt;ipython-input-28-007cbab507e3&amp;gt; in &amp;lt;module&amp;gt;() 6 print(&amp;#39;{} {:^61} {}&amp;#39;.</description>
    </item>
    
    <item>
      <title>Kubernetes: Copy a dataset to a StatefulSet&#39;s PersistentVolume</title>
      <link>https://www.markhneedham.com/blog/2017/11/18/kubernetes-copy-a-dataset-to-a-statefulsets-persistentvolume/</link>
      <pubDate>Sat, 18 Nov 2017 12:44:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/11/18/kubernetes-copy-a-dataset-to-a-statefulsets-persistentvolume/</guid>
      <description>In this post we’ll learn how to copy an existing dataset to the PersistentVolumes used by a Neo4j cluster running on Kubernetes.
Neo4j Clusters on Kubernetes This post assumes that we’re familiar with deploying Neo4j on Kubernetes. I wrote an article on the Neo4j blog explaining this in more detail.
The StatefulSet we create for our core servers requires persistent storage, achieved via the PersistentVolumeClaim (PVC) primitive. A Neo4j cluster containing 3 core servers would have the following PVCs:</description>
    </item>
    
    <item>
      <title>Kubernetes 1.8: Using Cronjobs to take Neo4j backups</title>
      <link>https://www.markhneedham.com/blog/2017/11/17/kubernetes-1-8-using-cronjobs-take-neo4j-backups/</link>
      <pubDate>Fri, 17 Nov 2017 18:10:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/11/17/kubernetes-1-8-using-cronjobs-take-neo4j-backups/</guid>
      <description>With the release of Kubernetes 1.8, Cronjobs have graduated to beta, which means we can now more easily run Neo4j backup jobs against Kubernetes clusters.
Before we learn how to write a Cronjob let’s first create a local Kubernetes cluster and deploy Neo4j.
Spinup Kubernetes &amp;amp; Helm minikube start --memory 8192 helm init &amp;amp;&amp;amp; kubectl rollout status -w deployment/tiller-deploy --namespace=kube-system Deploy a Neo4j cluster helm repo add incubator https://kubernetes-charts-incubator.storage.googleapis.com/ helm install incubator/neo4j --name neo-helm --wait --set authEnabled=false,core.</description>
    </item>
    
    <item>
      <title>Neo4j Browser: Expected entity id to be an integral value</title>
      <link>https://www.markhneedham.com/blog/2017/11/06/neo4j-browser-expected-entity-id-integral-value/</link>
      <pubDate>Mon, 06 Nov 2017 16:17:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/11/06/neo4j-browser-expected-entity-id-integral-value/</guid>
      <description>I came across an interesting error while writing a Cypher query that used parameters in the Neo4j browser which I thought I should document for future me.
We’ll start with a graph that has 1,000 people:
unwind range(0,1000) AS id create (:Person {id: id}) Now we’ll try and retrieve some of those people via a parameter lookup:
:param ids: [0] match (p:Person) where p.id in {ids} return p ╒════════╕ │&amp;#34;p&amp;#34; │ ╞════════╡ │{&amp;#34;id&amp;#34;:0}│ └────────┘ All good so far.</description>
    </item>
    
    <item>
      <title>Neo4j: Traversal query timeout</title>
      <link>https://www.markhneedham.com/blog/2017/10/31/neo4j-traversal-query-timeout/</link>
      <pubDate>Tue, 31 Oct 2017 21:43:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/10/31/neo4j-traversal-query-timeout/</guid>
      <description>I’ve been spending some of my spare time over the last few weeks creating an application that generates running routes from Open Roads data - transformed and imported into Neo4j of course!
I’ve created a user defined procedure which combines several shortest path queries, but I wanted to exit any of these shortest path searches if they were taking too long. My code without a timeout looks like this:</description>
    </item>
    
    <item>
      <title>Kubernetes: Simple example of pod running</title>
      <link>https://www.markhneedham.com/blog/2017/10/21/kubernetes-simple-example-pod-running/</link>
      <pubDate>Sat, 21 Oct 2017 10:06:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/10/21/kubernetes-simple-example-pod-running/</guid>
      <description>I recently needed to create a Kubernetes pod that would &amp;#39;just sit there&amp;#39; while I used kubectl cp to copy some files to a persistent volume to which it was bound.
I started out with this naive pod spec:
pod_no_while.yaml
kind: Pod apiVersion: v1 metadata: name: marks-dummy-pod spec: containers: - name: marks-dummy-pod image: ubuntu restartPolicy: Never Let’s apply that template:
$ kubectl apply -f pod_no_while.yaml pod &amp;#34;marks-dummy-pod&amp;#34; created And let’s check if we have any running pods:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Deleting duplicate nodes</title>
      <link>https://www.markhneedham.com/blog/2017/10/06/neo4j-cypher-deleting-duplicate-nodes/</link>
      <pubDate>Fri, 06 Oct 2017 16:13:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/10/06/neo4j-cypher-deleting-duplicate-nodes/</guid>
      <description>I had a problem on a graph I was working on recently where I’d managed to create duplicate nodes because I hadn’t applied any unique constraints.
I wanted to remove the duplicates, and came across Jimmy Ruts&amp;#39; excellent post which shows some ways to do this.
Let’s first create a graph with some duplicate nodes to play with:
UNWIND range(0, 100) AS id CREATE (p1:Person {id: toInteger(rand() * id)}) MERGE (p2:Person {id: toInteger(rand() * id)}) MERGE (p3:Person {id: toInteger(rand() * id)}) MERGE (p4:Person {id: toInteger(rand() * id)}) CREATE (p1)-[:KNOWS]-&amp;gt;(p2) CREATE (p1)-[:KNOWS]-&amp;gt;(p3) CREATE (p1)-[:KNOWS]-&amp;gt;(p4) Added 173 labels, created 173 nodes, set 173 properties, created 5829 relationships, completed after 408 ms.</description>
    </item>
    
    <item>
      <title>AWS: Spinning up a Neo4j instance with APOC installed</title>
      <link>https://www.markhneedham.com/blog/2017/09/30/aws-spinning-up-a-neo4j-instance-with-apoc-installed/</link>
      <pubDate>Sat, 30 Sep 2017 21:23:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/09/30/aws-spinning-up-a-neo4j-instance-with-apoc-installed/</guid>
      <description>One of the first things I do after installing Neo4j is install the APOC library, but I find it’s a bit of a manual process when spinning up a server on AWS so I wanted to simplify it a bit.
There’s already a Neo4j AMI which installs Neo4j 3.2.0 and my colleague Michael pointed out that we could download APOC into the correct folder by writing a script and sending it as UserData.</description>
    </item>
    
    <item>
      <title>Serverless: Building a mini producer/consumer data pipeline with AWS SNS</title>
      <link>https://www.markhneedham.com/blog/2017/09/30/serverless-building-mini-producerconsumer-data-pipeline-aws-sns/</link>
      <pubDate>Sat, 30 Sep 2017 07:51:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/09/30/serverless-building-mini-producerconsumer-data-pipeline-aws-sns/</guid>
      <description>I wanted to create a little data pipeline with Serverless whose main use would be to run once a day, call an API, and load that data into a database.
It’s mostly used to pull in recent data from that API, but I also wanted to be able to invoke it manually and specify a date range.
I created the following pair of lambdas that communicate with each other via an SNS topic.</description>
    </item>
    
    <item>
      <title>Serverless: S3 - S3BucketPermissions - Action does not apply to any resource(s) in statement</title>
      <link>https://www.markhneedham.com/blog/2017/09/29/serverless-s3-s3bucketpermissions-action-does-not-apply-to-any-resources-in-statement/</link>
      <pubDate>Fri, 29 Sep 2017 06:09:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/09/29/serverless-s3-s3bucketpermissions-action-does-not-apply-to-any-resources-in-statement/</guid>
      <description>I’ve been playing around with S3 buckets with Serverless, and recently wrote the following code to create an S3 bucket and put a file into that bucket:
const AWS = require(&amp;#34;aws-sdk&amp;#34;); let regionParams = { &amp;#39;region&amp;#39;: &amp;#39;us-east-1&amp;#39; } let s3 = new AWS.S3(regionParams); let s3BucketName = &amp;#34;marks-blog-bucket&amp;#34;; console.log(&amp;#34;Creating bucket: &amp;#34; + s3BucketName); let bucketParams = { Bucket: s3BucketName, ACL: &amp;#34;public-read&amp;#34; }; s3.createBucket(bucketParams).promise() .then(console.log) .catch(console.error); var putObjectParams = { Body: &amp;#34;&amp;lt;html&amp;gt;&amp;lt;body&amp;gt;&amp;lt;h1&amp;gt;Hello blog!</description>
    </item>
    
    <item>
      <title>Python 3: Create sparklines using matplotlib</title>
      <link>https://www.markhneedham.com/blog/2017/09/23/python-3-create-sparklines-using-matplotlib/</link>
      <pubDate>Sat, 23 Sep 2017 06:51:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/09/23/python-3-create-sparklines-using-matplotlib/</guid>
      <description>I recently wanted to create sparklines to show how some values were changing over time. In addition, I wanted to generate them as images on the server rather than introducing a JavaScript library.
Chris Seymour’s excellent gist which shows how to create sparklines inside a Pandas dataframe got me most of the way there, but I had to tweak his code a bit to get it to play nicely with Python 3.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Create Cypher map with dynamic keys</title>
      <link>https://www.markhneedham.com/blog/2017/09/19/neo4j-cypher-create-cypher-map-with-dynamic-keys/</link>
      <pubDate>Tue, 19 Sep 2017 19:30:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/09/19/neo4j-cypher-create-cypher-map-with-dynamic-keys/</guid>
      <description>I was recently trying to create a map in a Cypher query but wanted to have dynamic keys in that map. I started off with this query:
WITH &amp;#34;a&amp;#34; as dynamicKey, &amp;#34;b&amp;#34; as dynamicValue RETURN { dynamicKey: dynamicValue } AS map ╒══════════════════╕ │&amp;#34;map&amp;#34; │ ╞══════════════════╡ │{&amp;#34;dynamicKey&amp;#34;:&amp;#34;b&amp;#34;}│ └──────────────────┘ Not quite what we want! We want dynamicKey to be evaluated rather than treated as a literal. As usual, APOC comes to the rescue!</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Rounding of floating point numbers/BigDecimals</title>
      <link>https://www.markhneedham.com/blog/2017/08/13/neo4j-cypher-rounding-of-floating-point-numbersbigdecimals/</link>
      <pubDate>Sun, 13 Aug 2017 07:23:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/08/13/neo4j-cypher-rounding-of-floating-point-numbersbigdecimals/</guid>
      <description>I was doing some data cleaning a few days ago and wanting to multiply a value by 1 million. My Cypher code to do this looked like this:
with &amp;#34;8.37&amp;#34; as rawNumeric RETURN toFloat(rawNumeric) * 1000000 AS numeric ╒═════════════════╕ │&amp;#34;numeric&amp;#34; │ ╞═════════════════╡ │8369999.999999999│ └─────────────────┘ Unfortunately that suffers from the classic rounding error when working with floating point numbers. I couldn’t figure out a way to solve it using pure Cypher, but there tends to be an APOC function to solve every problem and this was no exception.</description>
    </item>
    
    <item>
      <title>Serverless: AWS HTTP Gateway - 502 Bad Gateway</title>
      <link>https://www.markhneedham.com/blog/2017/08/11/serverless-aws-http-gateway-502-bad-gateway/</link>
      <pubDate>Fri, 11 Aug 2017 16:01:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/08/11/serverless-aws-http-gateway-502-bad-gateway/</guid>
      <description>In my continued work with Serverless and AWS Lambda I ran into a problem when trying to call a HTTP gateway.
My project looked like this:
serverless.yaml
service: http-gateway frameworkVersion: &amp;#34;&amp;gt;=1.2.0 &amp;lt;2.0.0&amp;#34; provider: name: aws runtime: python3.6 timeout: 180 functions: no-op: name: NoOp handler: handler.noop events: - http: POST noOp handler.py
def noop(event, context): return &amp;#34;hello&amp;#34; Let’s deploy to AWS:
$ serverless deploy Serverless: Packaging service... Serverless: Excluding development dependencies.</description>
    </item>
    
    <item>
      <title>Serverless: Python - virtualenv - { &#34;errorMessage&#34;: &#34;Unable to import module &#39;handler&#39;&#34; }</title>
      <link>https://www.markhneedham.com/blog/2017/08/06/serverless-python-virtualenv-errormessage-unable-import-module-handler/</link>
      <pubDate>Sun, 06 Aug 2017 19:03:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/08/06/serverless-python-virtualenv-errormessage-unable-import-module-handler/</guid>
      <description>I’ve been using the Serverless library to deploy and run some Python functions on AWS lambda recently and was initially confused about how to handle my dependencies.
I tend to create a new virtualenv for each of my projects, so let’s get that set up first:
Prerequisites $ npm install serverless $ virtualenv -p python3 a $ . a/bin/activate Now let’s create our Serverless project. I’m going to install the requests library so that I can use it in my function.</description>
    </item>
    
    <item>
      <title>AWS Lambda: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory&#39;</title>
      <link>https://www.markhneedham.com/blog/2017/08/03/aws-lambda-libld-linux-2-bad-elf-interpreter-no-file-directory/</link>
      <pubDate>Thu, 03 Aug 2017 17:24:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/08/03/aws-lambda-libld-linux-2-bad-elf-interpreter-no-file-directory/</guid>
      <description>I’ve been working on an AWS lambda job to convert an HTML page to PDF using a Python wrapper around the wkhtmltopdf library but ended up with the following error when I tried to execute it:
b&amp;#39;/bin/sh: ./binary/wkhtmltopdf: /lib/ld-linux.so.2: bad ELF interpreter: No such file or directory\n&amp;#39;: Exception Traceback (most recent call last): File &amp;#34;/var/task/handler.py&amp;#34;, line 33, in generate_certificate wkhtmltopdf(local_html_file_name, local_pdf_file_name) File &amp;#34;/var/task/lib/wkhtmltopdf.py&amp;#34;, line 64, in wkhtmltopdf wkhp.render() File &amp;#34;/var/task/lib/wkhtmltopdf.</description>
    </item>
    
    <item>
      <title>PHP vs Python: Generating a HMAC</title>
      <link>https://www.markhneedham.com/blog/2017/08/02/php-vs-python-generating-a-hmac/</link>
      <pubDate>Wed, 02 Aug 2017 06:09:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/08/02/php-vs-python-generating-a-hmac/</guid>
      <description>I’ve been writing a bit of code to integrate with a ClassMarker webhook, and you’re required to check that an incoming request actually came from ClassMarker by checking the value of a base64-encoded HMAC SHA256 hash.
The example in the documentation is written in PHP which I haven’t done for about 10 years so I had to figure out how to do the same thing in Python.
This is the PHP version:</description>
    </item>
    
    <item>
      <title>Docker: Building custom Neo4j images on Mac OS X</title>
      <link>https://www.markhneedham.com/blog/2017/07/26/docker-building-custom-neo4j-images-on-mac-os-x/</link>
      <pubDate>Wed, 26 Jul 2017 22:20:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/07/26/docker-building-custom-neo4j-images-on-mac-os-x/</guid>
      <description>I sometimes need to create custom Neo4j Docker images to try things out and wanted to share my workflow, mostly for future Mark but also in case it’s useful to someone else.
There’s already a docker-neo4j repository so we’ll just tweak the files in there to achieve what we want.
$ git clone git@github.com:neo4j/docker-neo4j.git $ cd docker-neo4j If we want to build a Docker image for Neo4j Enterprise Edition we can run the following build target:</description>
    </item>
    
    <item>
      <title>Pandas: ValueError: The truth value of a Series is ambiguous.</title>
      <link>https://www.markhneedham.com/blog/2017/07/26/pandas-valueerror-the-truth-value-of-a-series-is-ambiguous/</link>
      <pubDate>Wed, 26 Jul 2017 21:41:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/07/26/pandas-valueerror-the-truth-value-of-a-series-is-ambiguous/</guid>
      <description>I’ve been playing around with Kaggle in my spare time over the last few weeks and came across an unexpected behaviour when trying to add a column to a dataframe.
First let’s get pandas into our program scope:
Prerequisites import pandas as pd Now we’ll create a data frame to play with for the duration of this post:
&amp;gt;&amp;gt;&amp;gt; df = pd.DataFrame({&amp;#34;a&amp;#34;: [1,2,3,4,5], &amp;#34;b&amp;#34;: [2,3,4,5,6]}) &amp;gt;&amp;gt;&amp;gt; df a b 0 1 2 1 2 3 2 3 4 3 4 5 4 5 6 Let’s say we want to create a new column which returns True if either of the numbers are odd.</description>
    </item>
    
    <item>
      <title>Pandas/scikit-learn: get_dummies test/train sets - ValueError: shapes not aligned</title>
      <link>https://www.markhneedham.com/blog/2017/07/05/pandasscikit-learn-get_dummies-testtrain-sets-valueerror-shapes-not-aligned/</link>
      <pubDate>Wed, 05 Jul 2017 15:42:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/07/05/pandasscikit-learn-get_dummies-testtrain-sets-valueerror-shapes-not-aligned/</guid>
      <description>I’ve been using pandas’ https://pandas.pydata.org/pandas-docs/stable/generated/pandas.get_dummies.html function to generate dummy columns for categorical variables to use with scikit-learn, but noticed that it sometimes doesn’t work as I expect.
Prerequisites import pandas as pd import numpy as np from sklearn import linear_model Let’s say we have the following training and test sets:
Training set train = pd.DataFrame({&amp;#34;letter&amp;#34;:[&amp;#34;A&amp;#34;, &amp;#34;B&amp;#34;, &amp;#34;C&amp;#34;, &amp;#34;D&amp;#34;], &amp;#34;value&amp;#34;: [1, 2, 3, 4]}) X_train = train.drop([&amp;#34;value&amp;#34;], axis=1) X_train = pd.</description>
    </item>
    
    <item>
      <title>Pandas: Find rows where column/field is null</title>
      <link>https://www.markhneedham.com/blog/2017/07/05/pandas-find-rows-where-columnfield-is-null/</link>
      <pubDate>Wed, 05 Jul 2017 14:31:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/07/05/pandas-find-rows-where-columnfield-is-null/</guid>
      <description>In my continued playing around with the Kaggle house prices dataset I wanted to find any columns/fields that have null values in.
If we want to get a count of the number of null fields by column we can use the following code, adapted from Poonam Ligade’s kernel:
Prerequisites import pandas as pd Count the null columns train = pd.read_csv(&amp;#34;train.csv&amp;#34;) null_columns=train.columns[train.isnull().any()] train[null_columns].isnull().sum() LotFrontage 259 Alley 1369 MasVnrType 8 MasVnrArea 8 BsmtQual 37 BsmtCond 37 BsmtExposure 38 BsmtFinType1 37 BsmtFinType2 38 Electrical 1 FireplaceQu 690 GarageType 81 GarageYrBlt 81 GarageFinish 81 GarageQual 81 GarageCond 81 PoolQC 1453 Fence 1179 MiscFeature 1406 dtype: int64 So there are lots of different columns containing null values.</description>
    </item>
    
    <item>
      <title>Shell: Create a comma separated string</title>
      <link>https://www.markhneedham.com/blog/2017/06/23/shell-create-comma-separated-string/</link>
      <pubDate>Fri, 23 Jun 2017 12:26:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/06/23/shell-create-comma-separated-string/</guid>
      <description>I recently needed to generate a string with comma separated values, based on iterating a range of numbers.
e.g. we should get the following output where n = 3
foo-0,foo-1,foo-2 I only had the shell available to me so I couldn’t shell out into Python or Ruby for example. That means it’s bash scripting time!
If we want to iterate a range of numbers and print them out on the screen we can write the following code:</description>
    </item>
    
    <item>
      <title>scikit-learn: Random forests - Feature Importance</title>
      <link>https://www.markhneedham.com/blog/2017/06/16/scikit-learn-random-forests-feature-importance/</link>
      <pubDate>Fri, 16 Jun 2017 05:55:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/06/16/scikit-learn-random-forests-feature-importance/</guid>
      <description>As I mentioned in a blog post a couple of weeks ago, I’ve been playing around with the Kaggle House Prices competition and the most recent thing I tried was training a random forest regressor.
Unfortunately, although it gave me better results locally it got a worse score on the unseen data, which I figured meant I’d overfitted the model.
I wasn’t really sure how to work out if that theory was true or not, but by chance I was reading Chris Albon’s blog and found a post where he explains how to inspect the importance of every feature in a random forest.</description>
    </item>
    
    <item>
      <title>Kubernetes: Which node is a pod on?</title>
      <link>https://www.markhneedham.com/blog/2017/06/14/kubernetes-node-pod/</link>
      <pubDate>Wed, 14 Jun 2017 08:49:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/06/14/kubernetes-node-pod/</guid>
      <description>When running Kubernetes on a cloud provider, rather than locally using minikube, it’s useful to know which node a pod is running on.
The normal command to list pods doesn’t contain this information:
$ kubectl get pod NAME READY STATUS RESTARTS AGE neo4j-core-0 1/1 Running 0 6m neo4j-core-1 1/1 Running 0 6m neo4j-core-2 1/1 Running 0 2m I spent a while searching for a command that I could use before I came across Ta-Ching Chen’s blog post while looking for something else.</description>
    </item>
    
    <item>
      <title>Kaggle: House Prices: Advanced Regression Techniques - Trying to fill in missing values</title>
      <link>https://www.markhneedham.com/blog/2017/06/04/kaggle-house-prices-advanced-regression-techniques-trying-fill-missing-values/</link>
      <pubDate>Sun, 04 Jun 2017 09:22:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/06/04/kaggle-house-prices-advanced-regression-techniques-trying-fill-missing-values/</guid>
      <description>I’ve been playing around with the data in Kaggle’s House Prices: Advanced Regression Techniques and while replicating Poonam Ligade’s exploratory analysis I wanted to see if I could create a model to fill in some of the missing values.
Poonam wrote the following code to identify which columns in the dataset had the most missing values:
import pandas as pd train = pd.read_csv(&amp;#39;train.csv&amp;#39;) null_columns=train.columns[train.isnull().any()] &amp;gt;&amp;gt;&amp;gt; print(train[null_columns].isnull().sum()) LotFrontage 259 Alley 1369 MasVnrType 8 MasVnrArea 8 BsmtQual 37 BsmtCond 37 BsmtExposure 38 BsmtFinType1 37 BsmtFinType2 38 Electrical 1 FireplaceQu 690 GarageType 81 GarageYrBlt 81 GarageFinish 81 GarageQual 81 GarageCond 81 PoolQC 1453 Fence 1179 MiscFeature 1406 dtype: int64 The one that I’m most interested in is LotFrontage, which describes &amp;#39;Linear feet of street connected to property&amp;#39;.</description>
    </item>
    
    <item>
      <title>GraphQL-Europe: A trip to Berlin</title>
      <link>https://www.markhneedham.com/blog/2017/05/27/graphql-europe-trip-berlin/</link>
      <pubDate>Sat, 27 May 2017 11:31:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/05/27/graphql-europe-trip-berlin/</guid>
      <description>Last weekend my colleagues Will, Michael, Oskar, and I went to Berlin to spend Sunday at the GraphQL Europe conference.
Neo4j sponsored the conference as we’ve been experimenting with building a GraphQL to Neo4j integration and wanted to get some feedback from the community as well as learn what’s going on in GraphQL land.
Will and Michael have written about their experience where they talk more about the hackathon we hosted so I’ll cover it more from a personal perspective.</description>
    </item>
    
    <item>
      <title>PostgreSQL: ERROR:  argument of WHERE must not return a set</title>
      <link>https://www.markhneedham.com/blog/2017/05/01/postgresql-error-argument-must-not-return-set/</link>
      <pubDate>Mon, 01 May 2017 20:42:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/05/01/postgresql-error-argument-must-not-return-set/</guid>
      <description>In my last post I showed how to load and query data from the Strava API in PostgreSQL and after executing some simple queries my next task was to query more complex part of the JSON structure.
Strava allows users to create segments, which are member-created portions of road or trail where athletes can compete for time.
I wanted to write a query to find all the times that I’d run a particular segment.</description>
    </item>
    
    <item>
      <title>Loading and analysing Strava runs using PostgreSQL JSON data type</title>
      <link>https://www.markhneedham.com/blog/2017/05/01/loading-and-analysing-strava-runs-using-postgresql-json-data-type/</link>
      <pubDate>Mon, 01 May 2017 19:11:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/05/01/loading-and-analysing-strava-runs-using-postgresql-json-data-type/</guid>
      <description>In my last post I showed how to map Strava runs using data that I’d extracted from their https://strava.github.io/api/v3/activities/ API, but the API returns a lot of other data that I discarded because I wasn’t sure what I should keep.
The API returns a nested JSON structure so the easiest solution would be to save each run as an individual file but I’ve always wanted to try out PostgreSQL’s JSON data type and this seemed like a good opportunity.</description>
    </item>
    
    <item>
      <title>Leaflet: Mapping Strava runs/polylines on Open Street Map</title>
      <link>https://www.markhneedham.com/blog/2017/04/29/leaflet-strava-polylines-osm/</link>
      <pubDate>Sat, 29 Apr 2017 15:36:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/04/29/leaflet-strava-polylines-osm/</guid>
      <description>I’m a big Strava user and spent a bit of time last weekend playing around with their API to work out how to map all my runs.
Strava API and polylines This is a two step process:
Call the /athlete/activities/ endpoint to get a list of all my activities
For each of those activities call the /activities/ endpoint to get more detailed information for each activity
That second API returns a &amp;#39;polyline&amp;#39; property which the documentation describes as follows:</description>
    </item>
    
    <item>
      <title>AWS Lambda: Programmatically scheduling a CloudWatchEvent</title>
      <link>https://www.markhneedham.com/blog/2017/04/05/aws-lambda-programatically-scheduling-a-cloudwatchevent/</link>
      <pubDate>Wed, 05 Apr 2017 23:49:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/04/05/aws-lambda-programatically-scheduling-a-cloudwatchevent/</guid>
      <description>I recently wrote a blog post showing how to create a Python &amp;#39;Hello World&amp;#39; AWS lambda function and manually invoke it, but what I really wanted to do was have it run automatically every hour.
To achieve that in AWS Lambda land we need to create a CloudWatch Event. The documentation describes them as follows:
Using simple rules that you can quickly set up, you can match events and route them to one or more target functions or streams.</description>
    </item>
    
    <item>
      <title>AWS Lambda: Encrypted environment variables</title>
      <link>https://www.markhneedham.com/blog/2017/04/03/aws-lambda-encrypted-environment-variables/</link>
      <pubDate>Mon, 03 Apr 2017 05:49:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/04/03/aws-lambda-encrypted-environment-variables/</guid>
      <description>Continuing on from my post showing how to create a &amp;#39;Hello World&amp;#39; AWS lambda function I wanted to pass encrypted environment variables to my function.
The following function takes in both an encrypted and unencrypted variable and prints them out.
Don’t print out encrypted variables in a real function, this is just so we can see the example working!
import boto3 import os from base64 import b64decode def lambda_handler(event, context): encrypted = os.</description>
    </item>
    
    <item>
      <title>AWS Lambda: Programatically create a Python &#39;Hello World&#39; function</title>
      <link>https://www.markhneedham.com/blog/2017/04/02/aws-lambda-programatically-create-a-python-hello-world-function/</link>
      <pubDate>Sun, 02 Apr 2017 22:11:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/04/02/aws-lambda-programatically-create-a-python-hello-world-function/</guid>
      <description>I’ve been playing around with AWS Lambda over the last couple of weeks and I wanted to automate the creation of these functions and all their surrounding config.
Let’s say we have the following Hello World function: def lambda_handler(event, context): print(&amp;#34;Hello world&amp;#34;)
To upload it to AWS we need to put it inside a zip file so let’s do that: $ zip HelloWorld.zip HelloWorld.py $ unzip -l HelloWorld.</description>
    </item>
    
    <item>
      <title>My top 10 technology podcasts</title>
      <link>https://www.markhneedham.com/blog/2017/03/30/top-10-technology-podcasts/</link>
      <pubDate>Thu, 30 Mar 2017 22:38:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/03/30/top-10-technology-podcasts/</guid>
      <description>For the last six months I’ve been listening to 2 or 3 technology podcasts every day while out running and on my commute and I thought it’d be cool to share some of my favourites.
I listen to all of these on the Podbean android app which seems pretty good. It can’t read the RSS feeds of some podcasts but other than that it’s worked well.
Anyway, on with the podcasts:</description>
    </item>
    
    <item>
      <title>Luigi: Defining dynamic requirements (on output files)</title>
      <link>https://www.markhneedham.com/blog/2017/03/28/luigi-defining-dynamic-requirements-on-output-files/</link>
      <pubDate>Tue, 28 Mar 2017 05:39:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/03/28/luigi-defining-dynamic-requirements-on-output-files/</guid>
      <description>In my last blog post I showed how to convert a JSON document containing meetup groups into a CSV file using Luigi, the Python library for building data pipelines. As well as creating that CSV file I wanted to go back to the meetup.com API and download all the members of those groups.
This was a rough flow of what I wanted to do:
Take JSON document containing all groups</description>
    </item>
    
    <item>
      <title>Luigi: An ExternalProgramTask example - Converting JSON to CSV</title>
      <link>https://www.markhneedham.com/blog/2017/03/25/luigi-externalprogramtask-example-converting-json-csv/</link>
      <pubDate>Sat, 25 Mar 2017 14:09:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/03/25/luigi-externalprogramtask-example-converting-json-csv/</guid>
      <description>I’ve been playing around with the Python library Luigi which is used to build pipelines of batch jobs and I struggled to find an example of an ExternalProgramTask so this is my attempt at filling that void.
I’m building a little data pipeline to get data from the meetup.com API and put it into CSV files that can be loaded into Neo4j using the LOAD CSV command.
The first task I created calls the /groups endpoint and saves the result into a JSON file:</description>
    </item>
    
    <item>
      <title>Python 3: TypeError: Object of type &#39;dict_values&#39; is not JSON serializable</title>
      <link>https://www.markhneedham.com/blog/2017/03/19/python-3-typeerror-object-type-dict_values-not-json-serializable/</link>
      <pubDate>Sun, 19 Mar 2017 16:40:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/03/19/python-3-typeerror-object-type-dict_values-not-json-serializable/</guid>
      <description>I’ve recently upgraded to Python 3 (I know, took me a while!) and realised that one of my scripts that writes JSON to a file no longer works!
This is a simplified version of what I’m doing:
&amp;gt;&amp;gt;&amp;gt; import json &amp;gt;&amp;gt;&amp;gt; x = {&amp;#34;mark&amp;#34;: {&amp;#34;name&amp;#34;: &amp;#34;Mark&amp;#34;}, &amp;#34;michael&amp;#34;: {&amp;#34;name&amp;#34;: &amp;#34;Michael&amp;#34;} } &amp;gt;&amp;gt;&amp;gt; json.dumps(x.values()) Traceback (most recent call last): File &amp;#34;&amp;lt;stdin&amp;gt;&amp;#34;, line 1, in &amp;lt;module&amp;gt; File &amp;#34;/usr/local/Cellar/python3/3.6.0/Frameworks/Python.framework/Versions/3.6/lib/python3.6/json/__init__.py&amp;#34;, line 231, in dumps return _default_encoder.</description>
    </item>
    
    <item>
      <title>Neo4j: apoc.date.parse - java.lang.IllegalArgumentException: Illegal pattern character &#39;T&#39; / java.text.ParseException: Unparseable date: &#34;2012-11-12T08:46:15Z&#34;</title>
      <link>https://www.markhneedham.com/blog/2017/03/06/neo4j-apoc-date-parse-java-lang-illegalargumentexception-illegal-pattern-character-t-java-text-parseexception-unparseable-date-2012-11-12t084615z/</link>
      <pubDate>Mon, 06 Mar 2017 20:52:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/03/06/neo4j-apoc-date-parse-java-lang-illegalargumentexception-illegal-pattern-character-t-java-text-parseexception-unparseable-date-2012-11-12t084615z/</guid>
      <description>I often find myself wanting to convert date strings into Unix timestamps using Neo4j’s APOC library and unfortunately some sources don’t use the format that apoc.date.parse expects.
e.g.
return apoc.date.parse(&amp;#34;2012-11-12T08:46:15Z&amp;#34;,&amp;#39;s&amp;#39;) AS ts Failed to invoke function `apoc.date.parse`: Caused by: java.lang.IllegalArgumentException: java.text.ParseException: Unparseable date: &amp;#34;2012-11-12T08:46:15Z&amp;#34; We need to define the format explicitly so the SimpleDateFormat documentation comes in handy. I tried the following:
return apoc.date.parse(&amp;#34;2012-11-12T08:46:15Z&amp;#34;,&amp;#39;s&amp;#39;,&amp;#34;yyyy-MM-ddTHH:mm:ssZ&amp;#34;) AS ts Failed to invoke function `apoc.</description>
    </item>
    
    <item>
      <title>Neo4j: Graphing the &#39;My name is...I work&#39; Twitter meme</title>
      <link>https://www.markhneedham.com/blog/2017/02/28/neo4j-graphing-name-work-twitter-meme/</link>
      <pubDate>Tue, 28 Feb 2017 15:50:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/02/28/neo4j-graphing-name-work-twitter-meme/</guid>
      <description>Over the last few days I’ve been watching the chain of &amp;#39;My name is...&amp;#39; tweets kicked off by DHH with interest. As I understand it, the idea is to show that coding interview riddles/hard tasks on a whiteboard are ridiculous.
Hello, my name is David. I would fail to write bubble sort on a whiteboard. I look code up on the internet all the time. I don&amp;#39;t do riddles.</description>
    </item>
    
    <item>
      <title>Neo4j: How do null values even work?</title>
      <link>https://www.markhneedham.com/blog/2017/02/22/neo4j-null-values-even-work/</link>
      <pubDate>Wed, 22 Feb 2017 23:28:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/02/22/neo4j-null-values-even-work/</guid>
      <description>Every now and then I find myself wanting to import a CSV file into Neo4j and I always get confused with how to handle the various null values that can lurk within.
Let’s start with an example that doesn’t have a CSV file in sight. Consider the following list and my attempt to only return null values:
WITH [null, &amp;#34;null&amp;#34;, &amp;#34;&amp;#34;, &amp;#34;Mark&amp;#34;] AS values UNWIND values AS value WITH value WHERE value = null RETURN value (no changes, no records) Hmm that’s weird.</description>
    </item>
    
    <item>
      <title>Neo4j: Analysing a CSV file using LOAD CSV and Cypher</title>
      <link>https://www.markhneedham.com/blog/2017/02/19/neo4j-analysing-csv-file-using-load-csv-cypher/</link>
      <pubDate>Sun, 19 Feb 2017 22:39:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/02/19/neo4j-analysing-csv-file-using-load-csv-cypher/</guid>
      <description>Last week we ran our first online meetup for several years and I wanted to analyse the stats that YouTube lets you download for an event.
The file I downloaded looked like this:
$ cat ~/Downloads/youtube_stats_pW9boJoUxO0.csv Video IDs:, pW9boJoUxO0, Start time:, Wed Feb 15 08:57:55 2017, End time:, Wed Feb 15 10:03:10 2017 Playbacks, Peak concurrent viewers, Total view time (hours), Average session length (minutes) 348, 112, 97.</description>
    </item>
    
    <item>
      <title>ReactJS/Material-UI: Cannot resolve module &#39;material-ui/lib/&#39;</title>
      <link>https://www.markhneedham.com/blog/2017/02/12/reactjsmaterial-ui-cannot-resolve-module-material-uilib/</link>
      <pubDate>Sun, 12 Feb 2017 22:43:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/02/12/reactjsmaterial-ui-cannot-resolve-module-material-uilib/</guid>
      <description>I’ve been playing around with ReactJS and the Material-UI library over the weekend and ran into this error while trying to follow one of the examples from the demo application:
ERROR in ./src/app/modules/Foo.js Module not found: Error: Cannot resolve module &amp;#39;material-ui/lib/Subheader&amp;#39; in /Users/markneedham/neo/reactjs-test/src/app/modules @ ./src/app/modules/Foo.js 13:17-53 webpack: Failed to compile. This was the component code:
import React from &amp;#39;react&amp;#39; import Subheader from &amp;#39;material-ui/lib/Subheader&amp;#39; export default React.createClass({ render() { return &amp;lt;div&amp;gt; &amp;lt;Subheader&amp;gt;Some Text&amp;lt;/Subheader&amp;gt; &amp;lt;/div&amp;gt; } }) which is then rendered like this:</description>
    </item>
    
    <item>
      <title>Go: Multi-threaded writing to a CSV file</title>
      <link>https://www.markhneedham.com/blog/2017/01/31/go-multi-threaded-writing-csv-file/</link>
      <pubDate>Tue, 31 Jan 2017 05:57:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/01/31/go-multi-threaded-writing-csv-file/</guid>
      <description>As part of a Go script I’ve been working on I wanted to write to a CSV file from multiple Go routines, but realised that the built in CSV Writer isn’t thread safe.
My first attempt at writing to the CSV file looked like this:
package main import ( &amp;#34;encoding/csv&amp;#34; &amp;#34;os&amp;#34; &amp;#34;log&amp;#34; &amp;#34;strconv&amp;#34; ) func main() { csvFile, err := os.Create(&amp;#34;/tmp/foo.csv&amp;#34;) if err != nil { log.Panic(err) } w := csv.</description>
    </item>
    
    <item>
      <title>Go vs Python: Parsing a JSON response from a HTTP API</title>
      <link>https://www.markhneedham.com/blog/2017/01/21/go-vs-python-parsing-a-json-response-from-a-http-api/</link>
      <pubDate>Sat, 21 Jan 2017 10:49:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/01/21/go-vs-python-parsing-a-json-response-from-a-http-api/</guid>
      <description>As part of a recommendations with Neo4j talk that I’ve presented a few times over the last year I have a set of scripts that download some data from the meetup.com API.
They’re all written in Python but I thought it’d be a fun exercise to see what they’d look like in Go. My eventual goal is to try and parallelise the API calls.
This is the Python version of the script:</description>
    </item>
    
    <item>
      <title>Go: First attempt at channels</title>
      <link>https://www.markhneedham.com/blog/2016/12/24/go-first-attempt-at-channels/</link>
      <pubDate>Sat, 24 Dec 2016 10:45:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/12/24/go-first-attempt-at-channels/</guid>
      <description>In a previous blog post I mentioned that I wanted to extract blips from The ThoughtWorks Radar into a CSV file and I thought this would be a good mini project for me to practice using Go.
In particular I wanted to try using channels and this seemed like a good chance to do that.
I watched a talk by Rob Pike on designing concurrent applications where he uses the following definition of concurrency:</description>
    </item>
    
    <item>
      <title>Go: cannot execute binary file: Exec format error</title>
      <link>https://www.markhneedham.com/blog/2016/12/23/go-cannot-execute-binary-file-exec-format-error/</link>
      <pubDate>Fri, 23 Dec 2016 18:24:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/12/23/go-cannot-execute-binary-file-exec-format-error/</guid>
      <description>In an earlier blog post I mentioned that I’d been building an internal application to learn a bit of Go and I wanted to deploy it to AWS.
Since the application was only going to live for a couple of days I didn’t want to spend a long time building anything fancy, so my plan was just to build the executable, SSH it to my AWS instance, and then run it.</description>
    </item>
    
    <item>
      <title>Neo4j: Graphing the ThoughtWorks Technology Radar</title>
      <link>https://www.markhneedham.com/blog/2016/12/23/neo4j-graphing-the-thoughtworks-technology-radar/</link>
      <pubDate>Fri, 23 Dec 2016 17:40:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/12/23/neo4j-graphing-the-thoughtworks-technology-radar/</guid>
      <description>For a bit of Christmas holiday fun I thought it’d be cool to create a graph of the different blips on the ThoughtWorks Technology Radar and how the recommendations have changed over time.
I wrote a script to extract each blip (e.g. .NET Core) and the recommendation made in each radar that it appeared in. I ended up with a CSV file:
|----------------------------------------------+----------+-------------| | technology | date | suggestion | |----------------------------------------------+----------+-------------| | AppHarbor | Mar 2012 | Trial | | Accumulate-only data | Nov 2015 | Assess | | Accumulate-only data | May 2015 | Assess | | Accumulate-only data | Jan 2015 | Assess | | Buying solutions you can only afford one of | Mar 2012 | Hold | |----------------------------------------------+----------+-------------| I then wrote a Cypher script to create the following graph model:</description>
    </item>
    
    <item>
      <title>Go: Templating with the Gin Web Framework</title>
      <link>https://www.markhneedham.com/blog/2016/12/23/go-templating-with-the-gin-web-framework/</link>
      <pubDate>Fri, 23 Dec 2016 14:30:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/12/23/go-templating-with-the-gin-web-framework/</guid>
      <description>I spent a bit of time over the last week building a little internal web application using Go and the Gin Web Framework and it took me a while to get the hang of the templating language so I thought I’d write up some examples.
Before we get started, I’ve got my GOPATH set to the following path:
$ echo $GOPATH /Users/markneedham/projects/gocode And the project containing the examples sits inside the src directory:</description>
    </item>
    
    <item>
      <title>Docker: Unknown - Unable to query docker version: x509: certificate is valid for</title>
      <link>https://www.markhneedham.com/blog/2016/12/21/docker-unknown-unable-to-query-docker-version-x509-certificate-is-valid-for/</link>
      <pubDate>Wed, 21 Dec 2016 07:11:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/12/21/docker-unknown-unable-to-query-docker-version-x509-certificate-is-valid-for/</guid>
      <description>I was playing around with Docker locally and somehow ended up with this error when I tried to list my docker machines:
$ docker-machine ls NAME ACTIVE DRIVER STATE URL SWARM DOCKER ERRORS default - virtualbox Running tcp://192.168.99.101:2376 Unknown Unable to query docker version: Get https://192.168.99.101:2376/v1.15/version: x509: certificate is valid for 192.168.99.100, not 192.168.99.101 My Google Fu was weak and I couldn’t find any suggestions for what this might mean, so I tried shutting it down and starting it again!</description>
    </item>
    
    <item>
      <title>Kubernetes: Simulating a network partition</title>
      <link>https://www.markhneedham.com/blog/2016/12/04/kubernetes-simulating-a-network-partition/</link>
      <pubDate>Sun, 04 Dec 2016 12:37:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/12/04/kubernetes-simulating-a-network-partition/</guid>
      <description>A couple of weeks ago I wrote a post explaining how to create a Neo4j causal cluster using Kubernetes and ... I wanted to work out how to simulate a network partition which would put the leader on the minority side and force an election.
We’ve done this on our internal tooling on AWS using the https://en.wikipedia.org/wiki/Iptables command but unfortunately that isn’t available in my container, which only has the utilities provided by BusyBox.</description>
    </item>
    
    <item>
      <title>Kubernetes: Spinning up a Neo4j 3.1 Causal Cluster</title>
      <link>https://www.markhneedham.com/blog/2016/11/25/kubernetes-spinning-up-a-neo4j-3-1-causal-cluster/</link>
      <pubDate>Fri, 25 Nov 2016 16:55:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/11/25/kubernetes-spinning-up-a-neo4j-3-1-causal-cluster/</guid>
      <description>A couple of weeks ago I wrote a blog post explaining how I’d created a Neo4j causal cluster using docker containers directly and for my next pet project I wanted to use Kubernetes as an orchestration layer so that I could declaratively change the number of servers in my cluster.
I’d never used Kubernetes before but I saw a presentation showing how to use it to create an Elastic cluster at the GDG Cloud meetup a couple of months ago.</description>
    </item>
    
    <item>
      <title>Kubernetes: Writing hostname to a file</title>
      <link>https://www.markhneedham.com/blog/2016/11/22/kubernetes-writing-hostname-to-a-file/</link>
      <pubDate>Tue, 22 Nov 2016 19:56:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/11/22/kubernetes-writing-hostname-to-a-file/</guid>
      <description>Over the weekend I spent a bit of time playing around with Kubernetes and to get the hang of the technology I set myself the task of writing the hostname of the machine to a file.
I’m using the excellent minikube tool to create a local Kubernetes cluster for my experiments so the first step is to spin that up:
$ minikube start Starting local Kubernetes cluster... Kubectl is now configured to use the cluster.</description>
    </item>
    
    <item>
      <title>Neo4j 3.1 beta3 &#43; docker: Creating a Causal Cluster</title>
      <link>https://www.markhneedham.com/blog/2016/11/13/neo4j-3-1-beta3-docker-creating-a-causal-cluster/</link>
      <pubDate>Sun, 13 Nov 2016 12:30:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/11/13/neo4j-3-1-beta3-docker-creating-a-causal-cluster/</guid>
      <description>Over the weekend I’ve been playing around with docker and learning how to spin up a Neo4j Causal Cluster.
Causal Clustering is Neo4j’s new clustering architecture which makes use of Diego Ongaro’s Raft consensus algorithm to ensure writes are committed on a majority of servers. It’ll be available in the 3.1 series of Neo4j which is currently in beta. I’ll be using BETA3 in this post.
I don’t know much about docker but luckily my colleague Kevin Van Gundy wrote a blog post a couple of weeks ago explaining how to spin up Neo4j inside a docker container which was very helpful for getting me started.</description>
    </item>
    
    <item>
      <title>Neo4j: Find the intermediate point between two lat/longs</title>
      <link>https://www.markhneedham.com/blog/2016/11/01/neo4j-find-the-intermediate-point-between-two-latlongs/</link>
      <pubDate>Tue, 01 Nov 2016 22:10:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/11/01/neo4j-find-the-intermediate-point-between-two-latlongs/</guid>
      <description>Yesterday I wrote a blog post showing how to find the midpoint between two lat/longs using Cypher which worked well as a first attempt at filling in missing locations, but I realised I could do better.
As I mentioned in the last post, when I find a stop that’s missing lat/long coordinates I can usually find two nearby stops that allow me to triangulate this stop’s location.
I also have train routes which indicate the number of seconds it takes to go from one stop to another, which allows me to indicate whether the location-less stop is closer to one stop than the other.</description>
    </item>
    
    <item>
      <title>Neo4j: Find the midpoint between two lat/longs</title>
      <link>https://www.markhneedham.com/blog/2016/10/31/neo4j-find-the-midpoint-between-two-latlongs/</link>
      <pubDate>Mon, 31 Oct 2016 19:31:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/10/31/neo4j-find-the-midpoint-between-two-latlongs/</guid>
      <description>Over the last couple of weekends I’ve been playing around with some transport data and I wanted to run the A* algorithm to find the quickest route between two stations.
The A* algorithm takes an estimateEvaluator as one of its parameters and the evaluator looks at lat/longs of nodes to work out whether a path is worth following or not. I therefore needed to add lat/longs for each station and I found it surprisingly hard to find this location data for all the points in the dataset.</description>
    </item>
    
    <item>
      <title>Neo4j: Create dynamic relationship type</title>
      <link>https://www.markhneedham.com/blog/2016/10/30/neo4j-create-dynamic-relationship-type/</link>
      <pubDate>Sun, 30 Oct 2016 22:12:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/10/30/neo4j-create-dynamic-relationship-type/</guid>
      <description>One of the things I’ve often found frustrating when importing data using Cypher, Neo4j’s query language, is that it’s quite difficult to create dynamic relationship types.
Say we have a CSV file structured like this:
load csv with headers from &amp;#34;file:///people.csv&amp;#34; AS row RETURN row ╒═══════════════════════════════════════════════════════╕ │row │ ╞═══════════════════════════════════════════════════════╡ │{node1: Mark, node2: Reshmee, relationship: MARRIED_TO}│ ├───────────────────────────────────────────────────────┤ │{node1: Mark, node2: Alistair, relationship: FRIENDS} │ └───────────────────────────────────────────────────────┘ We want to create nodes with the relationship type specified in the file.</description>
    </item>
    
    <item>
      <title>Neo4j: Dynamically add property/Set dynamic property</title>
      <link>https://www.markhneedham.com/blog/2016/10/27/neo4j-dynamically-add-property/</link>
      <pubDate>Thu, 27 Oct 2016 05:29:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/10/27/neo4j-dynamically-add-property/</guid>
      <description>I’ve been playing around with a dataset which has the timetable for the national rail in the UK and they give you departure and arrival times of each train in a textual format.
For example, the node to represent a stop could be created like this:
CREATE (stop:Stop {arrival: &amp;#34;0802&amp;#34;, departure: &amp;#34;0803H&amp;#34;}) That time format isn’t particularly amenable to querying so I wanted to add another property which indicated the number of seconds since the start of the day.</description>
    </item>
    
    <item>
      <title>Neo4j: Detecting rogue spaces in CSV headers with LOAD CSV</title>
      <link>https://www.markhneedham.com/blog/2016/10/19/neo4j-detecting-rogue-spaces-in-csv-headers-with-load-csv/</link>
      <pubDate>Wed, 19 Oct 2016 05:16:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/10/19/neo4j-detecting-rogue-spaces-in-csv-headers-with-load-csv/</guid>
      <description>Last week I was helping someone load the data from a CSV file into Neo4j and we were having trouble filtering out rows which contained a null value in one of the columns.
This is what the data looked like:
load csv with headers from &amp;#34;file:///foo.csv&amp;#34; as row RETURN row ╒══════════════════════════════════╕ │row │ ╞══════════════════════════════════╡ │{key1: a, key2: (null), key3: c}│ ├──────────────────────────────────┤ │{key1: d, key2: e, key3: f} │ └──────────────────────────────────┘ We’d like to filter out any rows which have &amp;#39;key2&amp;#39; as null, so let’s tweak our query to do that:</description>
    </item>
    
    <item>
      <title>Neo4j: requirement failed</title>
      <link>https://www.markhneedham.com/blog/2016/10/04/neo4j-requirement-failed/</link>
      <pubDate>Tue, 04 Oct 2016 22:33:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/10/04/neo4j-requirement-failed/</guid>
      <description>Last week during a hands on Cypher meetup, using Neo4j’s built in movie dataset, one of the attendees showed me the following query which wasn’t working as expected:
MATCH (p:Person)-[:ACTED_IN]-&amp;gt;(movie) RETURN p, COLLECT(movie.title) AS movies ORDER BY COUNT(movies) DESC LIMIT 10 requirement failed We can get a full stack trace in logs/debug.log if we run the same query from the cypher-shell, which was introduced during one of the Neo4j 3.</description>
    </item>
    
    <item>
      <title>Neo4j: Procedure call inside a query does not support passing arguments implicitly (pass explicitly after procedure name instead)</title>
      <link>https://www.markhneedham.com/blog/2016/10/02/neo4j-procedure-call-inside-a-query-does-not-support-passing-arguments-implicitly-pass-explicitly-after-procedure-name-instead/</link>
      <pubDate>Sun, 02 Oct 2016 10:13:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/10/02/neo4j-procedure-call-inside-a-query-does-not-support-passing-arguments-implicitly-pass-explicitly-after-procedure-name-instead/</guid>
      <description>A couple of days ago I was trying to write a Cypher query to filter the labels in my database.
I started with the following procedure call to get the list of all the labels:
CALL db.labels ╒══════════╕ │label │ ╞══════════╡ │Airport │ ├──────────┤ │Flight │ ├──────────┤ │Airline │ ├──────────┤ │Movie │ ├──────────┤ │AirportDay│ ├──────────┤ │Person │ ├──────────┤ │Engineer │ └──────────┘ I was only interested in labels that contained the letter &amp;#39;a&amp;#39; so I tweaked the query to filter the output of the procedure:</description>
    </item>
    
    <item>
      <title>scikit-learn: First steps with log_loss</title>
      <link>https://www.markhneedham.com/blog/2016/09/14/scikit-learn-first-steps-with-log_loss/</link>
      <pubDate>Wed, 14 Sep 2016 05:33:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/09/14/scikit-learn-first-steps-with-log_loss/</guid>
      <description>Over the last week I’ve spent a little bit of time playing around with the data in the Kaggle TalkingData Mobile User Demographics competition, and came across a notebook written by dune_dweller showing how to run a logistic regression algorithm on the dataset.
The metric used to evaluate the output in this competition is multi class logarithmic loss, which is implemented by the http://scikit-learn.org/stable/modules/generated/sklearn.metrics.log_loss.html function in the scikit-learn library.</description>
    </item>
    
    <item>
      <title>scikit-learn: Clustering and the curse of dimensionality</title>
      <link>https://www.markhneedham.com/blog/2016/08/27/scikit-learn-clustering-and-the-curse-of-dimensionality/</link>
      <pubDate>Sat, 27 Aug 2016 20:32:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/08/27/scikit-learn-clustering-and-the-curse-of-dimensionality/</guid>
      <description>In my last post I attempted to cluster Game of Thrones episodes based on character appearances without much success. After I wrote that post I was flicking through the scikit-learn clustering documentation and noticed the following section which describes some of the weaknesses of the K-means clustering algorithm:
Inertia is not a normalized metric: we just know that lower values are better and zero is optimal. But in very high-dimensional spaces, Euclidean distances tend to become inflated (this is an instance of the so-called “curse of dimensionality”).</description>
    </item>
    
    <item>
      <title>scikit-learn: Trying to find clusters of Game of Thrones episodes</title>
      <link>https://www.markhneedham.com/blog/2016/08/25/scikit-learn-trying-to-find-clusters-of-game-of-thrones-episodes/</link>
      <pubDate>Thu, 25 Aug 2016 22:07:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/08/25/scikit-learn-trying-to-find-clusters-of-game-of-thrones-episodes/</guid>
      <description>In my last post I showed how to find similar Game of Thrones episodes based on the characters that appear in different episodes. This allowed us to find similar episodes on an episode by episode basis, but I was curious whether there were groups of similar episodes that we could identify.
scikit-learn provides several clustering algorithms that can run over our episode vectors and hopefully find clusters of similar episodes.</description>
    </item>
    
    <item>
      <title>Neo4j/scikit-learn: Calculating the cosine similarity of Game of Thrones episodes</title>
      <link>https://www.markhneedham.com/blog/2016/08/22/neo4jscikit-learn-calculating-the-cosine-similarity-of-game-of-thrones-episodes/</link>
      <pubDate>Mon, 22 Aug 2016 21:12:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/08/22/neo4jscikit-learn-calculating-the-cosine-similarity-of-game-of-thrones-episodes/</guid>
      <description>A couple of months ago Praveena and I created a Game of Thrones dataset to use in a workshop and I thought it’d be fun to run it through some machine learning algorithms and hopefully find some interesting insights.
The dataset is available as CSV files but for this analysis I’m assuming that it’s already been imported into neo4j. If you want to import the data you can run the tutorial by typing the following into the query bar of the neo4j browser:</description>
    </item>
    
    <item>
      <title>Python: matplotlib, seaborn, virtualenv - Python is not installed as a framework</title>
      <link>https://www.markhneedham.com/blog/2016/08/14/python-matplotlibseabornvirtualenv-python-is-not-installed-as-a-framework/</link>
      <pubDate>Sun, 14 Aug 2016 18:56:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/08/14/python-matplotlibseabornvirtualenv-python-is-not-installed-as-a-framework/</guid>
      <description>Over the weekend I was following The Marketing Technologist’s content based recommender tutorial but ran into the following exception when trying to import the seaborn library:
$ python 5_content_based_recommender/run.py Traceback (most recent call last): File &amp;#34;5_content_based_recommender/run.py&amp;#34;, line 14, in &amp;lt;module&amp;gt; import seaborn as sns File &amp;#34;/Users/markneedham/projects/themarketingtechnologist/tmt/lib/python2.7/site-packages/seaborn/__init__.py&amp;#34;, line 6, in &amp;lt;module&amp;gt; from .rcmod import * File &amp;#34;/Users/markneedham/projects/themarketingtechnologist/tmt/lib/python2.7/site-packages/seaborn/rcmod.py&amp;#34;, line 8, in &amp;lt;module&amp;gt; from . import palettes, _orig_rc_params File &amp;#34;/Users/markneedham/projects/themarketingtechnologist/tmt/lib/python2.7/site-packages/seaborn/palettes.py&amp;#34;, line 12, in &amp;lt;module&amp;gt; from .</description>
    </item>
    
    <item>
      <title>scikit-learn: TF/IDF and cosine similarity for computer science papers</title>
      <link>https://www.markhneedham.com/blog/2016/07/27/scitkit-learn-tfidf-and-cosine-similarity-for-computer-science-papers/</link>
      <pubDate>Wed, 27 Jul 2016 02:45:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/07/27/scitkit-learn-tfidf-and-cosine-similarity-for-computer-science-papers/</guid>
      <description>A couple of months ago I downloaded the meta data for a few thousand computer science papers so that I could try and write a mini recommendation engine to tell me what paper I should read next.
Since I don’t have any data on which people read each paper a collaborative filtering approach is ruled out, so I thought I could try content based filtering instead.
Let’s quickly check the Wikipedia definition of content based filtering:</description>
    </item>
    
    <item>
      <title>Mahout/Hadoop: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4</title>
      <link>https://www.markhneedham.com/blog/2016/07/22/mahouthadoop-org-apache-hadoop-ipc-remoteexception-server-ipc-version-9-cannot-communicate-with-client-version-4/</link>
      <pubDate>Fri, 22 Jul 2016 13:55:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/07/22/mahouthadoop-org-apache-hadoop-ipc-remoteexception-server-ipc-version-9-cannot-communicate-with-client-version-4/</guid>
      <description>I’ve been working my way through Dragan Milcevski’s mini tutorial on using Mahout to do content based filtering on documents and reached the final step where I needed to read in the generated item-similarity files.
I got the example compiling by using the following Maven dependency:
&amp;lt;dependency&amp;gt; &amp;lt;groupId&amp;gt;org.apache.mahout&amp;lt;/groupId&amp;gt; &amp;lt;artifactId&amp;gt;mahout-core&amp;lt;/artifactId&amp;gt; &amp;lt;version&amp;gt;0.9&amp;lt;/version&amp;gt; &amp;lt;/dependency&amp;gt; Unfortunately when I ran the code I ran into a version incompatibility problem:
Exception in thread &amp;#34;main&amp;#34; org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4 at org.</description>
    </item>
    
    <item>
      <title>Hadoop: DataNode not starting</title>
      <link>https://www.markhneedham.com/blog/2016/07/22/hadoop-datanode-not-starting/</link>
      <pubDate>Fri, 22 Jul 2016 13:31:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/07/22/hadoop-datanode-not-starting/</guid>
      <description>In my continued playing with Mahout I eventually decided to give up using my local file system and use a local Hadoop instead since that seems to have much less friction when following any examples.
Unfortunately all my attempts to upload any files from my local file system to HDFS were being met with the following exception:
java.io.IOException: File /user/markneedham/book2.txt could only be replicated to 0 nodes, instead of 1 at org.</description>
    </item>
    
    <item>
      <title>Mahout: Exception in thread &#34;main&#34; java.lang.IllegalArgumentException: Wrong FS: file:/... expected: hdfs://</title>
      <link>https://www.markhneedham.com/blog/2016/07/21/mahout-exception-in-thread-main-java-lang-illegalargumentexception-wrong-fs-file-expected-hdfs/</link>
      <pubDate>Thu, 21 Jul 2016 17:57:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/07/21/mahout-exception-in-thread-main-java-lang-illegalargumentexception-wrong-fs-file-expected-hdfs/</guid>
      <description>I’ve been playing around with Mahout over the last couple of days to see how well it works for content based filtering.
I started following a mini tutorial from Stack Overflow but ran into trouble on the first step:
bin/mahout seqdirectory \ --input file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo \ --output file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo-out \ -c UTF-8 \ -chunk 64 \ -prefix mah 16/07/21 21:19:20 INFO AbstractJob: Command line arguments: {--charset=[UTF-8], --chunkSize=[64], --endPhase=[2147483647], --fileFilterClass=[org.apache.mahout.text.PrefixAdditionFilter], --input=[file:///Users/markneedham/Downloads/apache-mahout-distribution-0.12.2/foo], --keyPrefix=[mah], --method=[mapreduce], --output=[file:///Users/markneedham/Downloads/apache-mahout-distribution-0.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Detecting duplicates using relationships</title>
      <link>https://www.markhneedham.com/blog/2016/07/20/neo4j-cypher-detecting-duplicates-using-relationships/</link>
      <pubDate>Wed, 20 Jul 2016 17:32:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/07/20/neo4j-cypher-detecting-duplicates-using-relationships/</guid>
      <description>I’ve been building a graph of computer science papers on and off for a couple of months and now that I’ve got a few thousand loaded in I realised that there are quite a few duplicates.
They’re not duplicates in the sense of multiple entries sharing the same identifier; rather, entries with different identifiers appear to be the same paper!
e.g. there are a couple of papers titled &amp;#39;Authentication in the Taos operating system&amp;#39;:</description>
    </item>
    
    <item>
      <title>Python: Scraping elements relative to each other with BeautifulSoup</title>
      <link>https://www.markhneedham.com/blog/2016/07/11/python-scraping-elements-relative-to-each-other-with-beautifulsoup/</link>
      <pubDate>Mon, 11 Jul 2016 06:01:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/07/11/python-scraping-elements-relative-to-each-other-with-beautifulsoup/</guid>
      <description>Last week we hosted a Game of Thrones based intro to Cypher at the Women Who Code London meetup and in preparation had to scrape the wiki to build a dataset.
I’ve built lots of datasets this way and it’s a painless experience as long as the pages make liberal use of CSS classes and/or IDs.
Unfortunately the Game of Thrones wiki doesn’t really do that, so I had to find another way to extract the data I wanted: extracting elements based on their position relative to more prominent elements on the page.</description>
    </item>
    
    <item>
      <title>Neo4j 3.0 Drivers - Failed to save the server ID and the certificate received from the server</title>
      <link>https://www.markhneedham.com/blog/2016/07/11/neo4j-3-0-drivers-failed-to-save-the-server-id-and-the-certificate-received-from-the-server/</link>
      <pubDate>Mon, 11 Jul 2016 05:21:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/07/11/neo4j-3-0-drivers-failed-to-save-the-server-id-and-the-certificate-received-from-the-server/</guid>
      <description>I’ve been using the Neo4j Java Driver on various local databases over the past week and ran into the following certificate problem a few times:
org.neo4j.driver.v1.exceptions.ClientException: Unable to process request: General SSLEngine problem
	at org.neo4j.driver.internal.connector.socket.SocketClient.start(SocketClient.java:88)
	at org.neo4j.driver.internal.connector.socket.SocketConnection.&amp;lt;init&amp;gt;(SocketConnection.java:63)
	at org.neo4j.driver.internal.connector.socket.SocketConnector.connect(SocketConnector.java:52)
	at org.neo4j.driver.internal.pool.InternalConnectionPool.acquire(InternalConnectionPool.java:113)
	at org.neo4j.driver.internal.InternalDriver.session(InternalDriver.java:53)
Caused by: javax.net.ssl.SSLHandshakeException: General SSLEngine problem
	at sun.security.ssl.Handshaker.checkThrown(Handshaker.java:1431)
	at sun.security.ssl.SSLEngineImpl.checkTaskThrown(SSLEngineImpl.java:535)
	at sun.security.ssl.SSLEngineImpl.writeAppRecord(SSLEngineImpl.java:1214)
	at sun.security.ssl.SSLEngineImpl.wrap(SSLEngineImpl.java:1186)
	at javax.net.ssl.SSLEngine.wrap(SSLEngine.java:469)
	at org.neo4j.driver.internal.connector.socket.TLSSocketChannel.wrap(TLSSocketChannel.java:270)
	at org.neo4j.driver.internal.connector.socket.TLSSocketChannel.runHandshake(TLSSocketChannel.java:131)
	at org.neo4j.driver.internal.connector.socket.TLSSocketChannel.&amp;lt;init&amp;gt;(TLSSocketChannel.java:95)
	at org.neo4j.driver.internal.connector.socket.TLSSocketChannel.&amp;lt;init&amp;gt;(TLSSocketChannel.java:77)
	at org.neo4j.driver.internal.connector.socket.TLSSocketChannel.&amp;lt;init&amp;gt;(TLSSocketChannel.java:70)
	at org.</description>
    </item>
    
    <item>
      <title>R: Sentiment analysis of morning pages</title>
      <link>https://www.markhneedham.com/blog/2016/07/09/r-sentiment-analysis-of-morning-pages/</link>
      <pubDate>Sat, 09 Jul 2016 06:36:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/07/09/r-sentiment-analysis-of-morning-pages/</guid>
      <description>A couple of months ago I came across a cool blog post by Julia Silge where she runs a sentiment analysis algorithm over her tweet stream to see how her tweet sentiment has varied over time.
I wanted to give it a try but couldn’t figure out how to get a dump of my tweets so I decided to try it out on the text from my morning pages writing which I’ve been experimenting with for a few months.</description>
    </item>
    
    <item>
      <title>Python: BeautifulSoup - Insert tag</title>
      <link>https://www.markhneedham.com/blog/2016/06/30/python-beautifulsoup-insert-tag/</link>
      <pubDate>Thu, 30 Jun 2016 21:28:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/06/30/python-beautifulsoup-insert-tag/</guid>
      <description>I’ve been scraping the Game of Thrones wiki in preparation for a meetup at Women Who Code next week and while attempting to extract character allegiances I wanted to insert missing line breaks to separate different allegiances.
I initially tried creating a line break like this:
&amp;gt;&amp;gt;&amp;gt; from bs4 import BeautifulSoup
&amp;gt;&amp;gt;&amp;gt; tag = BeautifulSoup(&amp;#34;&amp;lt;br /&amp;gt;&amp;#34;, &amp;#34;html.parser&amp;#34;)
&amp;gt;&amp;gt;&amp;gt; tag
&amp;lt;br/&amp;gt;
It looks like it should work but later on in my script I check the &amp;#39;name&amp;#39; attribute to work out whether I’ve got a line break and it doesn’t return the value I expected it to:</description>
    </item>
    
    <item>
      <title>Unix: Find files greater than date</title>
      <link>https://www.markhneedham.com/blog/2016/06/24/unix-find-files-greater-than-date/</link>
      <pubDate>Fri, 24 Jun 2016 16:56:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/06/24/unix-find-files-greater-than-date/</guid>
      <description>For the latter part of the week I’ve been running some tests against Neo4j that generate a bunch of log files, and I wanted to filter those files based on the time they were created to do some further analysis.
This is an example of what the directory listing looks like:
$ ls -alh foo/database-agent-*
-rw-r--r-- 1 markneedham wheel 2.5K 23 Jun 14:00 foo/database-agent-mac17f73-1-logs-archive-201606231300176.tar.gz
-rw-r--r-- 1 markneedham wheel 8.6K 23 Jun 11:49 foo/database-agent-mac19b6b-1-logs-archive-201606231049507.</description>
    </item>
    
    <item>
      <title>Unix: Find all text below string in a file</title>
      <link>https://www.markhneedham.com/blog/2016/06/19/unix-find-all-text-below-string-in-a-file/</link>
      <pubDate>Sun, 19 Jun 2016 08:36:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/06/19/unix-find-all-text-below-string-in-a-file/</guid>
      <description>I recently wanted to parse some text out of a bunch of files so that I could do some sentiment analysis on it. Luckily the text I want is at the end of each file and doesn’t have anything after it, but there is text before it that I want to get rid of.
The files look like this:
# text I don&amp;#39;t care about
= Heading of the bit I care about
# text I care about
In other words I want to find the line that contains the Heading and then get all the text after that point.</description>
    </item>
    
    <item>
      <title>Unix: Split string using separator</title>
      <link>https://www.markhneedham.com/blog/2016/06/19/unix-split-string-using-separator/</link>
      <pubDate>Sun, 19 Jun 2016 07:22:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/06/19/unix-split-string-using-separator/</guid>
      <description>I recently found myself needing to iterate over a bunch of &amp;#39;/&amp;#39; separated strings on the command line and extract just the text after the last &amp;#39;/&amp;#39;.
e.g. an example of one of the strings
A/B/C
I wanted to write some code that could split on &amp;#39;/&amp;#39; and then pick the 3rd item in the resulting collection.
One way of doing this is to echo the string and then pipe it through cut:</description>
    </item>
    
    <item>
      <title>Python: Regex - matching foreign characters/unicode letters</title>
      <link>https://www.markhneedham.com/blog/2016/06/18/python-regex-matching-foreign-charactersunicode-letters/</link>
      <pubDate>Sat, 18 Jun 2016 07:38:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/06/18/python-regex-matching-foreign-charactersunicode-letters/</guid>
      <description>I’ve been back in the land of screen scraping this week, extracting data from the Game of Thrones wiki, and needed to write a regular expression to pull out characters and actors.
Here are some examples of the format of the data:
Peter Dinklage as Tyrion Lannister
Daniel Naprous as Oznak zo Pahl(credited as Stunt Performer)
Filip Lozić as Young Nobleman
Morgan C. Jones as a Braavosi captain
Adewale Akinnuoye-Agbaje as Malko</description>
    </item>
    
    <item>
      <title>Unix parallel: Populating all the USB sticks</title>
      <link>https://www.markhneedham.com/blog/2016/06/01/unix-parallel-populating-all-the-usb-sticks/</link>
      <pubDate>Wed, 01 Jun 2016 05:53:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/06/01/unix-parallel-populating-all-the-usb-sticks/</guid>
      <description>The day before Graph Connect Europe 2016 we needed to create a bunch of USB sticks containing Neo4j and the training materials, and we eventually iterated our way to a half-decent approach that made use of the GNU parallel command, which I’ve always wanted to use!
But first I needed to get a USB hub so I could do lots of them at the same time. I bought the EasyAcc USB 3.</description>
    </item>
    
    <item>
      <title>Neo4j vs Relational: Refactoring - Extracting node/table</title>
      <link>https://www.markhneedham.com/blog/2016/05/22/neo4j-vs-relational-refactoring-extracting-nodetable/</link>
      <pubDate>Sun, 22 May 2016 09:58:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/05/22/neo4j-vs-relational-refactoring-extracting-nodetable/</guid>
      <description>In my previous blog post I showed how to add a new property/field to a node with a label/record in a table for a football transfers dataset that I’ve been playing with.
After introducing this &amp;#39;nationality&amp;#39; property I realised that I now had some duplication in the model:
players.nationality and clubs.country refer to the same countries, but both store them as strings, so we can’t ensure the integrity of our countries or be sure that we’re referring to the same country.</description>
    </item>
    
    <item>
      <title>Neo4j vs Relational: Refactoring - Add a new field/property</title>
      <link>https://www.markhneedham.com/blog/2016/05/22/neo4j-vs-relational-refactoring-add-a-new-fieldproperty/</link>
      <pubDate>Sun, 22 May 2016 09:09:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/05/22/neo4j-vs-relational-refactoring-add-a-new-fieldproperty/</guid>
      <description>A couple of months ago I presented a webinar comparing how you’d model and evolve a data model using a Postgres SQL database and Neo4j.
This is what the two data models looked like after the initial data import and before any refactoring/migration had been done:
Relational
Graph
I wanted to add a &amp;#39;nationality&amp;#39; property to the players table in the SQL schema and to the nodes with the &amp;#39;Player&amp;#39; label in the graph.</description>
    </item>
    
    <item>
      <title>R: substr - Getting a vector of positions</title>
      <link>https://www.markhneedham.com/blog/2016/04/18/r-substr-getting-a-vector-of-positions/</link>
      <pubDate>Mon, 18 Apr 2016 19:49:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/04/18/r-substr-getting-a-vector-of-positions/</guid>
      <description>I recently found myself writing an R script to extract parts of a string based on a beginning and end index, which is reasonably easy using the substr function (https://stat.ethz.ch/R-manual/R-devel/library/base/html/substr.html):
&amp;gt; substr(&amp;#34;mark loves graphs&amp;#34;, 0, 4)
[1] &amp;#34;mark&amp;#34;
But what if we have a vector of start and end positions?
&amp;gt; substr(&amp;#34;mark loves graphs&amp;#34;, c(0, 6), c(4, 10))
[1] &amp;#34;mark&amp;#34;
Hmmm, that didn’t work as I expected! It turns out we actually need to use the https://stat.</description>
    </item>
    
    <item>
      <title>R: tm - Unique words/terms per document</title>
      <link>https://www.markhneedham.com/blog/2016/04/11/r-tm-unique-wordsterms-per-document/</link>
      <pubDate>Mon, 11 Apr 2016 05:40:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/04/11/r-tm-unique-wordsterms-per-document/</guid>
      <description>I’ve been doing a bit of text mining over the weekend using the R tm package and I wanted to only count a term once per document, which isn’t how it works out of the box.
For example let’s say we’re writing a bit of code to calculate the frequency of terms across some documents. We might write the following code:
library(tm)
text = c(&amp;#34;I am Mark I am Mark&amp;#34;, &amp;#34;Neo4j is cool Neo4j is cool&amp;#34;)
corpus = VCorpus(VectorSource(text))
tdm = as.</description>
    </item>
    
    <item>
      <title>Neo4j: A procedure for the SLM clustering algorithm</title>
      <link>https://www.markhneedham.com/blog/2016/02/28/neo4j-a-procedure-for-the-slm-clustering-algorithm/</link>
      <pubDate>Sun, 28 Feb 2016 20:40:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/02/28/neo4j-a-procedure-for-the-slm-clustering-algorithm/</guid>
      <description>In the middle of last year I blogged about the Smart Local Moving algorithm, which is used for community detection in networks, and with the upcoming introduction of procedures in Neo4j I thought it’d be fun to make that code accessible as one.
If you want to grab the code and follow along it’s sitting on the SLM repository on my github.
At the moment the procedure is hard coded to work with a KNOWS relationship between two nodes but that could easily be changed.</description>
    </item>
    
    <item>
      <title>Clojure: First steps with reducers</title>
      <link>https://www.markhneedham.com/blog/2016/01/24/clojure-first-steps-with-reducers/</link>
      <pubDate>Sun, 24 Jan 2016 22:01:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/01/24/clojure-first-steps-with-reducers/</guid>
      <description>I’ve been playing around with Clojure a bit today in preparation for a talk I’m giving next week and found myself writing the following code to apply the same function to three different scores:
(defn log2 [n]
  (/ (Math/log n) (Math/log 2)))

(defn score-item [n]
  (if (= n 0) 0 (log2 n)))

(+ (score-item 12) (score-item 13) (score-item 5))
9.60733031374961
I’d forgotten about folding over a collection but quickly remembered that I could achieve the same result with the following code:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - avoid duplicate calls to NOT patterns</title>
      <link>https://www.markhneedham.com/blog/2016/01/17/neo4j-cypher-avoid-duplicate-calls-to-not-patterns/</link>
      <pubDate>Sun, 17 Jan 2016 12:19:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2016/01/17/neo4j-cypher-avoid-duplicate-calls-to-not-patterns/</guid>
      <description>I’ve been reacquainting myself with the meetup.com dataset ahead of Wednesday’s meetup in London and wanted to write a collaborative filtering type query to work out which other groups the people in my groups were in.
This started simply enough:
MATCH (member:Member {name: &amp;#34;Mark Needham&amp;#34;})-[:MEMBER_OF]-&amp;gt;(group:Group)&amp;lt;-[:MEMBER_OF]-(other:Member)-[:MEMBER_OF]-&amp;gt;(otherGroup:Group)
RETURN otherGroup, COUNT(*) AS commonMembers
ORDER BY commonMembers DESC
LIMIT 5
And it doesn’t take too long to run:
Cypher version: CYPHER 2.3, planner: COST. 1084378 total db hits in 1103 ms.</description>
    </item>
    
    <item>
      <title>2015: A year in the life of the Neo4j London meetup group</title>
      <link>https://www.markhneedham.com/blog/2015/12/31/2015-a-year-in-the-life-of-the-neo4j-london-meetup-group/</link>
      <pubDate>Thu, 31 Dec 2015 13:58:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/12/31/2015-a-year-in-the-life-of-the-neo4j-london-meetup-group/</guid>
      <description>Given we’ve only got a few more hours left of 2015 I thought it’d be fun to do a quick overview of how things have been going in the London chapter of the Neo4j meetup using Neo4j with a bit of R mixed in.
We’re going to be using the RNeo4j library to interact with the database along with a few other libraries which will help us out with different tasks:</description>
    </item>
    
    <item>
      <title>Study until your mind wanders</title>
      <link>https://www.markhneedham.com/blog/2015/12/31/study-until-your-mind-wanders/</link>
      <pubDate>Thu, 31 Dec 2015 10:47:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/12/31/study-until-your-mind-wanders/</guid>
      <description>I’ve previously found it very difficult to read math-heavy content, which has made it challenging to get through Distributed Computing, a book I bought last May.
After several false starts where I gave up after getting frustrated that I couldn’t understand things the first time around and forgot everything if I left it a couple of days I decided to try again with a different approach.
I’ve been trying a technique I learned from Mini Habits where every day I have a (very small) goal of reading one page of the book.</description>
    </item>
    
    <item>
      <title>R: Error in approxfun(x.values.1, y.values.1, method = &#34;constant&#34;, f = 1, :  zero non-NA points</title>
      <link>https://www.markhneedham.com/blog/2015/12/27/r-error-in-approxfunx-values-1-y-values-1-method-constant-f-1-zero-non-na-points/</link>
      <pubDate>Sun, 27 Dec 2015 12:24:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/12/27/r-error-in-approxfunx-values-1-y-values-1-method-constant-f-1-zero-non-na-points/</guid>
      <description>I’ve been following Michy Alice’s logistic regression tutorial to create an attendance model for London dev meetups and ran into an interesting problem while doing so.
Our dataset has a class imbalance, i.e. most people RSVP &amp;#39;no&amp;#39; to events, which can lead to a misleading accuracy score: predicting &amp;#39;no&amp;#39; every time would give supposedly high accuracy.
Source: local data frame [2 x 2]
  attended     n
     (dbl) (int)
1        0  1541
2        1    53
I sampled the data using caret&amp;#39;s http://www.</description>
    </item>
    
    <item>
      <title>Python: Squashing &#39;duplicate&#39; pairs together</title>
      <link>https://www.markhneedham.com/blog/2015/12/20/python-squashing-duplicate-pairs-together/</link>
      <pubDate>Sun, 20 Dec 2015 12:12:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/12/20/python-squashing-duplicate-pairs-together/</guid>
      <description>As part of a data cleaning pipeline I had pairs of ids of duplicate addresses that I wanted to group together.
I couldn’t work out how to solve the problem immediately so I simplified the problem into pairs of letters i.e.
A	B	(A is the same as B)
B	C	(B is the same as C)
C	D	...
E	F	(E is the same as F)
F	G	.</description>
    </item>
    
    <item>
      <title>Neo4j: Specific relationship vs Generic relationship &#43; property</title>
      <link>https://www.markhneedham.com/blog/2015/12/13/neo4j-specific-relationship-vs-generic-relationship-property/</link>
      <pubDate>Sun, 13 Dec 2015 21:22:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/12/13/neo4j-specific-relationship-vs-generic-relationship-property/</guid>
      <description>For optimal traversal speed in Neo4j queries we should make our relationship types as specific as possible.
Let’s take a look at an example from the &amp;#39;modelling a recommendations engine&amp;#39; talk I presented at Skillsmatter a couple of weeks ago.
I needed to decide how to model the &amp;#39;RSVP&amp;#39; relationship between a Member and an Event. A person can RSVP &amp;#39;yes&amp;#39; or &amp;#39;no&amp;#39; to an event, and I’d like to capture both of these responses.</description>
    </item>
    
    <item>
      <title>Neo4j: Facts as nodes</title>
      <link>https://www.markhneedham.com/blog/2015/12/04/neo4j-facts-as-nodes/</link>
      <pubDate>Fri, 04 Dec 2015 07:52:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/12/04/neo4j-facts-as-nodes/</guid>
      <description>On Tuesday I spoke at the Neo4j London user group about incrementally building a recommendation engine and described the &amp;#39;facts as nodes&amp;#39; modeling pattern, defined as follows in the Graph Databases book:
When two or more domain entities interact for a period of time, a fact emerges. We represent a fact as a separate node with connections to each of the entities engaged in that fact. Modeling an action in terms of its product—​that is, in terms of the thing that results from the action—​produces a similar structure: an intermediate node that represents the outcome of an interaction between two or more entities.</description>
    </item>
    
    <item>
      <title>Python: Parsing a JSON HTTP chunking stream</title>
      <link>https://www.markhneedham.com/blog/2015/11/28/python-parsing-a-json-http-chunking-stream/</link>
      <pubDate>Sat, 28 Nov 2015 13:56:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/11/28/python-parsing-a-json-http-chunking-stream/</guid>
      <description>I’ve been playing around with meetup.com’s API again and this time wanted to consume the chunked HTTP RSVP stream and filter RSVPs for events I’m interested in.
I use Python for most of my hacking these days and if HTTP requests are required the requests library is my first port of call.
I started out with the following script
import requests
import json

def stream_meetup_initial():
    uri = &amp;#34;http://stream.meetup.com/2/rsvps&amp;#34;
    response = requests.</description>
    </item>
    
    <item>
      <title>jq: Cannot iterate over number / string and number cannot be added</title>
      <link>https://www.markhneedham.com/blog/2015/11/24/jq-cannot-iterate-over-number-string-and-number-cannot-be-added/</link>
      <pubDate>Tue, 24 Nov 2015 00:12:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/11/24/jq-cannot-iterate-over-number-string-and-number-cannot-be-added/</guid>
      <description>In my continued parsing of meetup.com’s JSON API I wanted to extract some information from the following JSON file:
$ head -n40 data/members/18313232.json
[
  {
    &amp;#34;status&amp;#34;: &amp;#34;active&amp;#34;,
    &amp;#34;city&amp;#34;: &amp;#34;London&amp;#34;,
    &amp;#34;name&amp;#34;: &amp;#34;. .&amp;#34;,
    &amp;#34;other_services&amp;#34;: {},
    &amp;#34;country&amp;#34;: &amp;#34;gb&amp;#34;,
    &amp;#34;topics&amp;#34;: [],
    &amp;#34;lon&amp;#34;: -0.13,
    &amp;#34;joined&amp;#34;: 1438866605000,
    &amp;#34;id&amp;#34;: 92951932,
    &amp;#34;state&amp;#34;: &amp;#34;17&amp;#34;,
    &amp;#34;link&amp;#34;: &amp;#34;http://www.meetup.com/members/92951932&amp;#34;,
    &amp;#34;photo&amp;#34;: {
      &amp;#34;thumb_link&amp;#34;: &amp;#34;http://photos1.meetupstatic.com/photos/member/8/d/6/b/thumb_250896203.jpeg&amp;#34;,
      &amp;#34;photo_id&amp;#34;: 250896203,
      &amp;#34;highres_link&amp;#34;: &amp;#34;http://photos1.meetupstatic.com/photos/member/8/d/6/b/highres_250896203.jpeg&amp;#34;,
      &amp;#34;photo_link&amp;#34;: &amp;#34;http://photos1.meetupstatic.com/photos/member/8/d/6/b/member_250896203.jpeg&amp;#34;
    },
    &amp;#34;lat&amp;#34;: 51.49,
    &amp;#34;visited&amp;#34;: 1446745707000,
    &amp;#34;self&amp;#34;: {
      &amp;#34;common&amp;#34;: {}
    }
  },
  {
    &amp;#34;status&amp;#34;: &amp;#34;active&amp;#34;,
    &amp;#34;city&amp;#34;: &amp;#34;London&amp;#34;,
    &amp;#34;name&amp;#34;: &amp;#34;Abdelkader Idryssy&amp;#34;,
    &amp;#34;other_services&amp;#34;: {},
    &amp;#34;country&amp;#34;: &amp;#34;gb&amp;#34;,
    &amp;#34;topics&amp;#34;: [
      {
        &amp;#34;name&amp;#34;: &amp;#34;Weekend Adventures&amp;#34;,
        &amp;#34;urlkey&amp;#34;: &amp;#34;weekend-adventures&amp;#34;,
        &amp;#34;id&amp;#34;: 16438
      },
      {
        &amp;#34;name&amp;#34;: &amp;#34;Community Building&amp;#34;,
        &amp;#34;urlkey&amp;#34;: &amp;#34;community-building&amp;#34;,
In particular I want to extract the member’s id, name, join date and the ids of topics they’re interested in.</description>
    </item>
    
    <item>
      <title>jq: Filtering missing keys</title>
      <link>https://www.markhneedham.com/blog/2015/11/14/jq-filtering-missing-keys/</link>
      <pubDate>Sat, 14 Nov 2015 22:51:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/11/14/jq-filtering-missing-keys/</guid>
      <description>I’ve been playing around with the meetup.com API again over the last few days and having saved a set of events to disk I wanted to extract the venues using jq.
This is what a single event record looks like:
$ jq -r &amp;#34;.[0]&amp;#34; data/events/0.json
{
  &amp;#34;status&amp;#34;: &amp;#34;past&amp;#34;,
  &amp;#34;rating&amp;#34;: {
    &amp;#34;count&amp;#34;: 1,
    &amp;#34;average&amp;#34;: 1
  },
  &amp;#34;utc_offset&amp;#34;: 3600000,
  &amp;#34;event_url&amp;#34;: &amp;#34;http://www.meetup.com/londonweb/events/3261890/&amp;#34;,
  &amp;#34;group&amp;#34;: {
    &amp;#34;who&amp;#34;: &amp;#34;Web Peeps&amp;#34;,
    &amp;#34;name&amp;#34;: &amp;#34;London Web&amp;#34;,
    &amp;#34;group_lat&amp;#34;: 51.52000045776367,
    &amp;#34;created&amp;#34;: 1034097743000,
    &amp;#34;join_mode&amp;#34;: &amp;#34;approval&amp;#34;,
    &amp;#34;group_lon&amp;#34;: -0.</description>
    </item>
    
    <item>
      <title>Docker 1.9: Port forwarding on Mac OS X</title>
      <link>https://www.markhneedham.com/blog/2015/11/08/docker-1-9-port-forwarding-on-mac-os-x/</link>
      <pubDate>Sun, 08 Nov 2015 20:58:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/11/08/docker-1-9-port-forwarding-on-mac-os-x/</guid>
      <description>Since the Neo4j 2.3.0 release there’s been an official docker image which I thought I’d give a try this afternoon.
The last time I used docker about a year ago I had to install boot2docker, which has now been deprecated in favour of Docker Machine and the Docker Toolbox.
I created a container with the following command:
docker run --detach --publish=7474:7474 neo4j/neo4j
And then tried to access the Neo4j server locally:</description>
    </item>
    
    <item>
      <title>IntelliJ &#39;java: cannot find JDK 1.8&#39;</title>
      <link>https://www.markhneedham.com/blog/2015/11/08/intellij-java-cannot-find-jdk-1-8/</link>
      <pubDate>Sun, 08 Nov 2015 11:47:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/11/08/intellij-java-cannot-find-jdk-1-8/</guid>
      <description>I upgraded to IntelliJ 15.0 a few days ago and was initially seeing the following exception when trying to compile:
module-name java: cannot find JDK 1.8
I’ve been compiling against JDK 1.8 for a while now using IntelliJ 14 so I wasn’t sure what was going on.
I checked my project settings and they seemed fine:
The error message suggested I look in the logs to find more information but I wasn’t sure where those live!</description>
    </item>
    
    <item>
      <title>Hadoop: HDFS - java.lang.NoSuchMethodError: org.apache.hadoop.fs.FSOutputSummer.&lt;init&gt;(Ljava/util/zip/Checksum;II)V</title>
      <link>https://www.markhneedham.com/blog/2015/10/31/hadoop-hdfs-ava-lang-nosuchmethoderror-org-apache-hadoop-fs-fsoutputsummer-ljavautilzipchecksumiiv/</link>
      <pubDate>Sat, 31 Oct 2015 23:58:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/10/31/hadoop-hdfs-ava-lang-nosuchmethoderror-org-apache-hadoop-fs-fsoutputsummer-ljavautilzipchecksumiiv/</guid>
      <description>I wanted to write a little program to check that one machine could communicate with an HDFS server running on the other, and adapted some code from the Hadoop wiki as follows:
package org.playground;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

import java.io.IOException;

public class HadoopDFSFileReadWrite {
    static void printAndExit(String str) {
        System.err.println( str );
        System.exit(1);
    }

    public static void main (String[] argv) throws IOException {
        Configuration conf = new Configuration();
        conf.</description>
    </item>
    
    <item>
      <title>Spark: MatchError (of class org.apache.spark.sql.catalyst.expressions.GenericRow) spark</title>
      <link>https://www.markhneedham.com/blog/2015/10/27/spark-matcherror-of-class-org-apache-spark-sql-catalyst-expressions-genericrow-spark/</link>
      <pubDate>Tue, 27 Oct 2015 23:10:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/10/27/spark-matcherror-of-class-org-apache-spark-sql-catalyst-expressions-genericrow-spark/</guid>
      <description>I’ve been using Spark again lately to do some pre-processing on the Land Registry data set and ran into an initially confusing problem when trying to parse the CSV file.
I’m using the Databricks CSV parsing library and wrote the following script to go over each row, collect up the address components and then derive a &amp;#39;fullAddress&amp;#39; field.
To refresh, this is what the CSV file looks like:
$ head -n5 pp-complete.</description>
    </item>
    
    <item>
      <title>Exploring (potential) data entry errors in the Land Registry data set</title>
      <link>https://www.markhneedham.com/blog/2015/10/18/exploring-potential-data-entry-errors-in-the-land-registry-data-set/</link>
      <pubDate>Sun, 18 Oct 2015 10:03:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/10/18/exploring-potential-data-entry-errors-in-the-land-registry-data-set/</guid>
      <description>I’ve previously written a couple of blog posts describing the mechanics of analysing the Land Registry data set, and I thought it was about time I described some of the queries I’ve been running and the discoveries I’ve made.
To recap, the land registry provides a 3GB, 20 million line CSV file containing all the property sales in the UK since 1995.
We’ll be loading and querying the data in R using the data.</description>
    </item>
    
    <item>
      <title>jq: error - Cannot iterate over null (null)</title>
      <link>https://www.markhneedham.com/blog/2015/10/09/jq-error-cannot-iterate-over-null-null/</link>
      <pubDate>Fri, 09 Oct 2015 06:34:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/10/09/jq-error-cannot-iterate-over-null-null/</guid>
      <description>I’ve been playing around with the jq library again over the past couple of days to convert the JSON from the Stack Overflow API into CSV and found myself needing to deal with an optional field.
I’ve downloaded 100 or so questions and stored them in a JSON array like so:
$ head -n 100 so.json
[
  {
    &amp;#34;has_more&amp;#34;: true,
    &amp;#34;items&amp;#34;: [
      {
        &amp;#34;is_answered&amp;#34;: false,
        &amp;#34;delete_vote_count&amp;#34;: 0,
        &amp;#34;body_markdown&amp;#34;: &amp;#34;.</description>
    </item>
    
    <item>
      <title>Mac OS X: Installing the PROJ.4 - Cartographic Projections Library</title>
      <link>https://www.markhneedham.com/blog/2015/10/05/mac-os-x-installing-the-proj-4-cartographic-projections-library/</link>
      <pubDate>Mon, 05 Oct 2015 22:41:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/10/05/mac-os-x-installing-the-proj-4-cartographic-projections-library/</guid>
      <description>I’ve been following Scott Barnham’s guide to transforming UK postcodes into (lat, long) coordinates and needed to install the PROJ.4 Cartographic Projections library which I initially struggled with.
The first step is to download a tar.gz version which is linked from the wiki page:
$ wget http://download.osgeo.org/proj/proj-4.9.1.tar.gz Next we’ll unpack the file and then build the binaries:
$ tar -xvf proj-4.9.1.tar.gz $ cd proj-4.9.1 $ ./configure --prefix ~/projects/land-registry/proj-4.9.1 $ make $ make install The files we need are in the bin directory.</description>
    </item>
    
    <item>
      <title>R: data.table - Finding the maximum row</title>
      <link>https://www.markhneedham.com/blog/2015/10/02/r-data-table-finding-the-maximum-row/</link>
      <pubDate>Fri, 02 Oct 2015 18:42:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/10/02/r-data-table-finding-the-maximum-row/</guid>
<description>In my continued playing around with the R data.table package I wanted to find the maximum row based on one of the columns, grouped by another column, and then return the whole row.
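For comparison, a minimal pure-Python sketch of the same idea (hypothetical rows mirroring the example): keep, per name, the whole row with the highest price.

```python
# hypothetical rows: per-property sale prices on different days
rows = [
    {"name": "Property 1", "price": 10000, "date": "Day 1"},
    {"name": "Property 1", "price": 18000, "date": "Day 10"},
    {"name": "Property 2", "price": 512000, "date": "Day 5"},
    {"name": "Property 2", "price": 1000000, "date": "Day 12"},
]

# keep, for each name, the whole row with the maximum price seen so far
best = {}
for row in rows:
    current = best.get(row["name"])
    if current is None or row["price"] > current["price"]:
        best[row["name"]] = row

print([best[name] for name in sorted(best)])
# [{'name': 'Property 1', 'price': 18000, 'date': 'Day 10'},
#  {'name': 'Property 2', 'price': 1000000, 'date': 'Day 12'}]
```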
We’ll use the following data table to illustrate:
&amp;gt; blogDT = data.table(name = c(&amp;#34;Property 1&amp;#34;,&amp;#34;Property 1&amp;#34;,&amp;#34;Property 1&amp;#34;,&amp;#34;Property 2&amp;#34;,&amp;#34;Property 2&amp;#34;,&amp;#34;Property 2&amp;#34;), price = c(10000, 12500, 18000, 245000, 512000, 1000000), date = c(&amp;#34;Day 1&amp;#34;, &amp;#34;Day 7&amp;#34;, &amp;#34;Day 10&amp;#34;, &amp;#34;Day 3&amp;#34;, &amp;#34;Day 5&amp;#34;, &amp;#34;Day 12&amp;#34;)) &amp;gt; blogDT[, lag.</description>
    </item>
    
    <item>
      <title>IntelliJ 14.1.5: Unable to import maven project</title>
      <link>https://www.markhneedham.com/blog/2015/09/30/intellij-14-1-5-unable-to-import-maven-project/</link>
      <pubDate>Wed, 30 Sep 2015 05:54:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/09/30/intellij-14-1-5-unable-to-import-maven-project/</guid>
      <description>After a recent IntelliJ upgrade I’ve been running into the following error when trying to attach the sources of any library being pulled in via Maven:
Unable to import maven project
It seems like this is a recent issue in the 14.x series and luckily is reasonably easy to fix by adding the following flag to the VM options passed to the Maven importer:
-Didea.maven3.use.compat.resolver And this is where you need to add it:</description>
    </item>
    
    <item>
      <title>R: data.table - Comparing adjacent rows</title>
      <link>https://www.markhneedham.com/blog/2015/09/27/r-data-table-comparing-adjacent-rows/</link>
      <pubDate>Sun, 27 Sep 2015 22:02:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/09/27/r-data-table-comparing-adjacent-rows/</guid>
      <description>As part of my exploration of the Land Registry price paid data set I wanted to compare the difference between consecutive sales of properties.
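The shape of that computation, sketched in plain Python for illustration (hypothetical sales, assumed already sorted by property and then date; the post itself uses R's data.table):

```python
from itertools import groupby

# hypothetical (property, price) sales, sorted by property and then sale date
sales = [
    ("Property 1", 10000),
    ("Property 1", 12500),
    ("Property 2", 245000),
    ("Property 2", 512000),
]

with_lag = []
for prop, group in groupby(sales, key=lambda sale: sale[0]):
    previous = None  # the first sale of each property has no previous price
    for _, price in group:
        with_lag.append((prop, price, previous))
        previous = price

print(with_lag)
# [('Property 1', 10000, None), ('Property 1', 12500, 10000),
#  ('Property 2', 245000, None), ('Property 2', 512000, 245000)]
```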
This means we need to group the sales by a property identifier and then get the previous sale price into a column on each row unless it’s the first sale, in which case we’ll have &amp;#39;NA&amp;#39;. We can do this by creating a lag variable (http://stackoverflow.com/questions/26291988/r-how-to-create-a-lag-variable-for-each-by-group).</description>
    </item>
    
    <item>
      <title>R: Querying a 20 million line CSV file - data.table vs data frame</title>
      <link>https://www.markhneedham.com/blog/2015/09/25/r-querying-a-20-million-line-csv-file-data-table-vs-data-frame/</link>
      <pubDate>Fri, 25 Sep 2015 06:28:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/09/25/r-querying-a-20-million-line-csv-file-data-table-vs-data-frame/</guid>
      <description>As I mentioned in a couple of blog posts already, I’ve been exploring the Land Registry price paid data set and although I’ve initially been using SparkR I was curious how easy it would be to explore the data set using plain R.
I thought I’d start out by loading the data into a data frame and running the same queries using dplyr.
I’ve come across Hadley Wickham’s readr library before but hadn’t used it and since I needed to load a 20 million line CSV file this seemed the perfect time to give it a try.</description>
    </item>
    
    <item>
      <title>SparkR: Add new column to data frame by concatenating other columns</title>
      <link>https://www.markhneedham.com/blog/2015/09/21/sparkr-add-new-column-to-data-frame-by-concatenating-other-columns/</link>
      <pubDate>Mon, 21 Sep 2015 22:30:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/09/21/sparkr-add-new-column-to-data-frame-by-concatenating-other-columns/</guid>
      <description>Continuing with my exploration of the Land Registry open data set using SparkR I wanted to see which road in the UK has had the most property sales over the last 20 years.
To recap, this is what the data frame looks like:
./spark-1.5.0-bin-hadoop2.6/bin/sparkR --packages com.databricks:spark-csv_2.11:1.2.0 &amp;gt; sales &amp;lt;- read.df(sqlContext, &amp;#34;pp-complete.csv&amp;#34;, &amp;#34;com.databricks.spark.csv&amp;#34;, header=&amp;#34;false&amp;#34;) &amp;gt; head(sales) C0 C1 C2 C3 C4 C5 1 {0C7ADEF5-878D-4066-B785-0000003ED74A} 163000 2003-02-21 00:00 UB5 4PJ T N 2 {35F67271-ABD4-40DA-AB09-00000085B9D3} 247500 2005-07-15 00:00 TA19 9DD D N 3 {B20B1C74-E8E1-4137-AB3E-0000011DF342} 320000 2010-09-10 00:00 W4 1DZ F N 4 {7D6B0915-C56B-4275-AF9B-00000156BCE7} 104000 1997-08-27 00:00 NE61 2BH D N 5 {47B60101-B64C-413D-8F60-000002F1692D} 147995 2003-05-02 00:00 PE33 0RU D N 6 {51F797CA-7BEB-4958-821F-000003E464AE} 110000 2013-03-22 00:00 NR35 2SF T N C6 C7 C8 C9 C10 C11 1 F 106 READING ROAD NORTHOLT NORTHOLT 2 F 58 ADAMS MEADOW ILMINSTER ILMINSTER 3 L 58 WHELLOCK ROAD LONDON 4 F 17 WESTGATE MORPETH MORPETH 5 F 4 MASON GARDENS WEST WINCH KING&amp;#39;S LYNN 6 F 5 WILD FLOWER WAY DITCHINGHAM BUNGAY C12 C13 C14 1 EALING GREATER LONDON A 2 SOUTH SOMERSET SOMERSET A 3 EALING GREATER LONDON A 4 CASTLE MORPETH NORTHUMBERLAND A 5 KING&amp;#39;S LYNN AND WEST NORFOLK NORFOLK A 6 SOUTH NORFOLK NORFOLK A This document explains the data stored in each field and for this particular query we’re interested in fields C9-C12.</description>
    </item>
    
    <item>
      <title>SparkR: Error in invokeJava(isStatic = TRUE, className, methodName, ...) :  java.lang.ClassNotFoundException: Failed to load class for data source: csv.</title>
      <link>https://www.markhneedham.com/blog/2015/09/21/sparkr-error-in-invokejavaisstatic-true-classname-methodname-java-lang-classnotfoundexception-failed-to-load-class-for-data-source-csv/</link>
      <pubDate>Mon, 21 Sep 2015 22:06:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/09/21/sparkr-error-in-invokejavaisstatic-true-classname-methodname-java-lang-classnotfoundexception-failed-to-load-class-for-data-source-csv/</guid>
<description>I’ve been wanting to play around with SparkR for a while, and over the weekend I decided to explore a large Land Registry CSV file containing all the sales of properties in the UK over the last 20 years.
First I started up the SparkR shell with the CSV package loaded in: ./spark-1.5.0-bin-hadoop2.6/bin/sparkR --packages com.databricks:spark-csv_2.11:1.2.0
Next I tried to read the CSV file into a Spark data frame by modifying one of the examples from the tutorial: &amp;gt; sales &amp;lt;- read.</description>
    </item>
    
    <item>
      <title>Neo4j: Summarising neo4j-shell output</title>
      <link>https://www.markhneedham.com/blog/2015/08/21/neo4j-summarising-neo4j-shell-output/</link>
      <pubDate>Fri, 21 Aug 2015 20:59:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/21/neo4j-summarising-neo4j-shell-output/</guid>
<description>I frequently find myself trying to optimise a set of cypher queries and I tend to group them together in a script that I feed to the Neo4j shell.
When tweaking the queries it’s easy to make a mistake and end up not creating the same data so I decided to write a script which will show me the aggregates of all the commands executed.
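As a rough illustration (the output format and metric names here are hypothetical), tallying that kind of shell output in Python could look like:

```python
import re
from collections import Counter

# hypothetical neo4j-shell output lines to summarise
output = """Nodes created: 10
Relationships created: 5
Nodes created: 3
Properties set: 21"""

# sum the value reported for each metric across all commands
totals = Counter()
for metric, value in re.findall(r"(\w[\w ]+): (\d+)", output):
    totals[metric] += int(value)

print(dict(totals))
# {'Nodes created': 13, 'Relationships created': 5, 'Properties set': 21}
```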
I want to see the number of constraints created, indexes added, nodes, relationships and properties created.</description>
    </item>
    
    <item>
      <title>Python: Extracting Excel spreadsheet into CSV files</title>
      <link>https://www.markhneedham.com/blog/2015/08/19/python-extracting-excel-spreadsheet-into-csv-files/</link>
      <pubDate>Wed, 19 Aug 2015 23:27:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/19/python-extracting-excel-spreadsheet-into-csv-files/</guid>
      <description>I’ve been playing around with the Road Safety open data set and the download comes with several CSV files and an excel spreadsheet containing the legend.
There are 45 sheets in total and each of them looks like this:
I wanted to create a CSV file for each sheet so that I can import the data set into Neo4j using the LOAD CSV command.
I came across the Python Excel website which pointed me at the xlrd library since I’m working with a pre 2010 Excel file.</description>
    </item>
    
    <item>
      <title>Unix: Stripping first n bytes in a file / Byte Order Mark (BOM)</title>
      <link>https://www.markhneedham.com/blog/2015/08/19/unix-stripping-first-n-bytes-in-a-file-byte-order-mark-bom/</link>
      <pubDate>Wed, 19 Aug 2015 23:27:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/19/unix-stripping-first-n-bytes-in-a-file-byte-order-mark-bom/</guid>
      <description>I’ve previously written a couple of blog posts showing how to strip out the byte order mark (BOM) from CSV files to make loading them into Neo4j easier and today I came across another way to clean up the file using tail.
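The post's approach uses tail; for comparison, a minimal sketch of the same 3-byte strip in Python:

```python
import codecs

def strip_bom(data):
    # the UTF-8 byte order mark is the 3-byte sequence EF BB BF
    if data.startswith(codecs.BOM_UTF8):
        return data[3:]
    return data

raw = codecs.BOM_UTF8 + b"id,price\n1,100\n"
print(strip_bom(raw))  # b'id,price\n1,100\n'
```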
The BOM is 3 bytes long at the beginning of the file, so if we know that a file contains it then we can strip out those first 3 bytes using tail like this:</description>
    </item>
    
    <item>
      <title>Unix: Redirecting stderr to stdout</title>
      <link>https://www.markhneedham.com/blog/2015/08/15/unix-redirecting-stderr-to-stdout/</link>
      <pubDate>Sat, 15 Aug 2015 15:55:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/15/unix-redirecting-stderr-to-stdout/</guid>
<description>I’ve been trying to optimise some Neo4j import queries over the last couple of days, and as part of the script I’ve been executing I wanted to redirect the output of a couple of commands into a file to parse afterwards.
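For comparison, the same stderr-into-stdout merge done from Python, a standard-library sketch (it assumes a python3 executable on the PATH):

```python
import subprocess

# merge the child's stderr into its stdout stream, the Python analogue of
# the shell redirect described in the post
result = subprocess.run(
    ["python3", "-c", "import sys; print('out'); print('err', file=sys.stderr)"],
    stdout=subprocess.PIPE,
    stderr=subprocess.STDOUT,
)
print(result.stdout.decode())  # contains both 'out' and 'err'
```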
I started with the following script which doesn’t do any explicit redirection of the output:
#!/bin/sh ./neo4j-community-2.2.3/bin/neo4j start Now let’s run that script and redirect the output to a file:</description>
    </item>
    
    <item>
      <title>Sed: Using environment variables</title>
      <link>https://www.markhneedham.com/blog/2015/08/13/sed-using-environment-variables/</link>
      <pubDate>Thu, 13 Aug 2015 19:30:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/13/sed-using-environment-variables/</guid>
      <description>I’ve been playing around with the BBC football data set that I wrote about a couple of months ago and I wanted to write some code that would take the import script and replace all instances of remote URIs with a file system path.
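The post does this with sed; as an illustrative sketch only, the same substitution driven by an environment variable in Python (DATA_DIR is hypothetical):

```python
import os

line = ('LOAD CSV WITH HEADERS FROM '
        '"https://raw.githubusercontent.com/mneedham/neo4j-bbc/master/data/matches.csv" AS row')

# DATA_DIR is a hypothetical environment variable pointing at a local copy of the data
local_root = os.environ.get("DATA_DIR", "file:///tmp/neo4j-bbc/data")
rewritten = line.replace(
    "https://raw.githubusercontent.com/mneedham/neo4j-bbc/master/data", local_root
)
print(rewritten)
```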
For example the import file contains several lines similar to this:
LOAD CSV WITH HEADERS FROM &amp;#34;https://raw.githubusercontent.com/mneedham/neo4j-bbc/master/data/matches.csv&amp;#34; AS row And I want that to read:</description>
    </item>
    
    <item>
      <title>Java: Jersey - java.lang.NoSuchMethodError: com.sun.jersey.core.reflection.ReflectionHelper. getContextClassLoaderPA()Ljava/security/PrivilegedAction;</title>
      <link>https://www.markhneedham.com/blog/2015/08/11/java-jersey-java-lang-nosuchmethoderror-com-sun-jersey-core-reflection-reflectionhelper-getcontextclassloaderpaljavasecurityprivilegedaction/</link>
      <pubDate>Tue, 11 Aug 2015 06:59:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/11/java-jersey-java-lang-nosuchmethoderror-com-sun-jersey-core-reflection-reflectionhelper-getcontextclassloaderpaljavasecurityprivilegedaction/</guid>
<description>I’ve been trying to put some tests around a Neo4j unmanaged extension I’ve been working on and ran into the following stack trace when launching the server using the Neo4j test harness:
public class ExampleResourceTest { @Rule public Neo4jRule neo4j = new Neo4jRule() .withFixture(&amp;#34;CREATE (:Person {name: &amp;#39;Mark&amp;#39;})&amp;#34;) .withFixture(&amp;#34;CREATE (:Person {name: &amp;#39;Nicole&amp;#39;})&amp;#34;) .withExtension( &amp;#34;/unmanaged&amp;#34;, ExampleResource.class ); @Test public void shouldReturnAllTheNodes() { // Given URI serverURI = neo4j.httpURI(); // When HTTP.Response response = HTTP.</description>
    </item>
    
    <item>
      <title>Neo4j 2.2.3: Unmanaged extensions - Creating gzipped streamed responses with Jetty</title>
      <link>https://www.markhneedham.com/blog/2015/08/10/neo4j-2-2-3-unmanaged-extensions-creating-gzipped-streamed-responses-with-jetty/</link>
      <pubDate>Mon, 10 Aug 2015 23:57:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/10/neo4j-2-2-3-unmanaged-extensions-creating-gzipped-streamed-responses-with-jetty/</guid>
      <description>Back in 2013 I wrote a couple of blog posts showing examples of an unmanaged extension which had a streamed and gzipped response but two years on I realised they were a bit out of date and deserved a refresh.
When writing unmanaged extensions in Neo4j a good rule of thumb is to try and reduce the amount of objects you keep hanging around. In this context this means that we should stream our response to the client as quickly as possible rather than building it up in memory and sending it in one go.</description>
    </item>
    
    <item>
      <title>Record Linkage: Playing around with Duke</title>
      <link>https://www.markhneedham.com/blog/2015/08/08/record-linkage-playing-around-with-duke/</link>
      <pubDate>Sat, 08 Aug 2015 22:50:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/08/record-linkage-playing-around-with-duke/</guid>
<description>I’ve become quite interested in record linkage recently and came across the Duke project, which provides some tools to help solve this problem. I thought I’d give it a try.
The typical problem when doing record linkage is that we have two records from different data sets which represent the same entity but don’t have a common key that we can use to merge them together. We therefore need to come up with a heuristic that will allow us to do so.</description>
    </item>
    
    <item>
      <title>Spark: Convert RDD to DataFrame</title>
      <link>https://www.markhneedham.com/blog/2015/08/06/spark-convert-rdd-to-dataframe/</link>
      <pubDate>Thu, 06 Aug 2015 21:11:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/06/spark-convert-rdd-to-dataframe/</guid>
      <description>As I mentioned in a previous blog post I’ve been playing around with the Databricks Spark CSV library and wanted to take a CSV file, clean it up and then write out a new CSV file containing some of the columns.
I started by processing the CSV file and writing it into a temporary table:
import org.apache.spark.sql.{SQLContext, Row, DataFrame} val sqlContext = new SQLContext(sc) val crimeFile = &amp;#34;Crimes_-_2001_to_present.csv&amp;#34; sqlContext.load(&amp;#34;com.databricks.spark.csv&amp;#34;, Map(&amp;#34;path&amp;#34; -&amp;gt; crimeFile, &amp;#34;header&amp;#34; -&amp;gt; &amp;#34;true&amp;#34;)).</description>
    </item>
    
    <item>
      <title>Spark: pyspark/Hadoop - py4j.protocol.Py4JJavaError: An error occurred while calling o23.load.: org.apache.hadoop.ipc.RemoteException: Server IPC version 9 cannot communicate with client version 4</title>
      <link>https://www.markhneedham.com/blog/2015/08/04/spark-pysparkhadoop-py4j-protocol-py4jjavaerror-an-error-occurred-while-calling-o23-load-org-apache-hadoop-ipc-remoteexception-server-ipc-version-9-cannot-communicate-with-client-version-4/</link>
      <pubDate>Tue, 04 Aug 2015 06:35:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/04/spark-pysparkhadoop-py4j-protocol-py4jjavaerror-an-error-occurred-while-calling-o23-load-org-apache-hadoop-ipc-remoteexception-server-ipc-version-9-cannot-communicate-with-client-version-4/</guid>
      <description>I’ve been playing around with pyspark - Spark’s Python library - and I wanted to execute the following job which takes a file from my local HDFS and then counts how many times each FBI code appears using Spark SQL:
from pyspark import SparkContext from pyspark.sql import SQLContext sc = SparkContext(&amp;#34;local&amp;#34;, &amp;#34;Simple App&amp;#34;) sqlContext = SQLContext(sc) file = &amp;#34;hdfs://localhost:9000/user/markneedham/Crimes_-_2001_to_present.csv&amp;#34; sqlContext.load(source=&amp;#34;com.databricks.spark.csv&amp;#34;, header=&amp;#34;true&amp;#34;, path = file).registerTempTable(&amp;#34;crimes&amp;#34;) rows = sqlContext.sql(&amp;#34;select `FBI Code` AS fbiCode, COUNT(*) AS times FROM crimes GROUP BY `FBI Code` ORDER BY times DESC&amp;#34;).</description>
    </item>
    
    <item>
      <title>Spark: Processing CSV files using Databricks Spark CSV Library</title>
      <link>https://www.markhneedham.com/blog/2015/08/02/spark-processing-csv-files-using-databricks-spark-csv-library/</link>
      <pubDate>Sun, 02 Aug 2015 18:08:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/08/02/spark-processing-csv-files-using-databricks-spark-csv-library/</guid>
      <description>Last year I wrote about exploring the Chicago crime data set using Spark and the OpenCSV parser and while this worked well, a few months ago I noticed that there’s now a spark-csv library which I should probably use instead.
I thought it’d be a fun exercise to translate my code to use it.
So to recap our goal: we want to count how many times each type of crime has been committed.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Removing consecutive duplicates</title>
      <link>https://www.markhneedham.com/blog/2015/07/30/neo4j-cypher-removing-consecutive-duplicates/</link>
      <pubDate>Thu, 30 Jul 2015 06:23:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/30/neo4j-cypher-removing-consecutive-duplicates/</guid>
      <description>When writing Cypher queries I sometimes find myself wanting to remove consecutive duplicates in collections that I’ve joined together.
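The list version of this problem has a neat Python solution, shown here only to illustrate the idea (the post itself solves it in Cypher):

```python
from itertools import groupby

values = [1, 1, 2, 3, 4, 5, 6, 7, 7, 8]

# groupby collapses runs of equal adjacent elements into a single key,
# which is exactly "remove consecutive duplicates"
deduped = [key for key, _ in groupby(values)]
print(deduped)  # [1, 2, 3, 4, 5, 6, 7, 8]
```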
e.g. we might start with the following query, where 1 and 7 each appear twice in a row:
RETURN [1,1,2,3,4,5,6,7,7,8] AS values ==&amp;gt; +-----------------------+ ==&amp;gt; | values | ==&amp;gt; +-----------------------+ ==&amp;gt; | [1,1,2,3,4,5,6,7,7,8] | ==&amp;gt; +-----------------------+ ==&amp;gt; 1 row We want to end up with [1,2,3,4,5,6,7,8]. We can start by exploding our array and putting consecutive elements next to each other:</description>
    </item>
    
    <item>
      <title>Neo4j: MERGE&#39;ing on super nodes</title>
      <link>https://www.markhneedham.com/blog/2015/07/28/neo4j-mergeing-on-super-nodes/</link>
      <pubDate>Tue, 28 Jul 2015 21:04:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/28/neo4j-mergeing-on-super-nodes/</guid>
      <description>In my continued playing with the Chicago crime data set I wanted to connect the crimes committed to their position in the FBI crime type hierarchy.
These are the subgraphs that I want to connect:
We have a &amp;#39;fbiCode&amp;#39; on each &amp;#39;Crime&amp;#39; node which indicates which &amp;#39;Crime Sub Category&amp;#39; the crime belongs to.
I started with the following query to connect the nodes together:
MATCH (crime:Crime) WITH crime SKIP {skip} LIMIT 10000 MATCH (subCat:SubCategory {code: crime.</description>
    </item>
    
    <item>
      <title>Python: Difference between two datetimes in milliseconds</title>
      <link>https://www.markhneedham.com/blog/2015/07/28/python-difference-between-two-datetimes-in-milliseconds/</link>
      <pubDate>Tue, 28 Jul 2015 20:05:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/28/python-difference-between-two-datetimes-in-milliseconds/</guid>
<description>I’ve been doing a bit of ad hoc measurement of some cypher queries executed via py2neo and wanted to work out how many milliseconds each query was taking end to end.
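A minimal sketch of one way to do that calculation, using timedelta.total_seconds (available from Python 2.7 onwards); the timestamps here are made up:

```python
from datetime import datetime

start = datetime(2015, 7, 28, 20, 5, 47, 0)
end = datetime(2015, 7, 28, 20, 5, 48, 250000)

delta = end - start  # subtracting datetimes gives a timedelta
millis = delta.total_seconds() * 1000
print(millis)  # 1250.0
```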
I thought there’d be an obvious way of doing this but if there is it’s evaded me so far, and I ended up calculating the difference between two datetime objects, which gave me the following timedelta object: &amp;gt;&amp;gt;&amp;gt; import datetime &amp;gt;&amp;gt;&amp;gt; start = datetime.</description>
    </item>
    
    <item>
      <title>Neo4j: From JSON to CSV to LOAD CSV via jq</title>
      <link>https://www.markhneedham.com/blog/2015/07/25/neo4j-from-json-to-csv-to-load-csv-via-jq/</link>
      <pubDate>Sat, 25 Jul 2015 23:05:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/25/neo4j-from-json-to-csv-to-load-csv-via-jq/</guid>
      <description>In my last blog post I showed how to import a Chicago crime categories &amp;amp; sub categories JSON document using Neo4j’s cypher query language via the py2neo driver. While this is a good approach for people with a developer background, many of the users I encounter aren’t developers and favour using Cypher via the Neo4j browser.
If we’re going to do this we’ll need to transform our JSON document into a CSV file so that we can use the LOAD CSV command on it.</description>
    </item>
    
    <item>
      <title>Neo4j: Loading JSON documents with Cypher</title>
      <link>https://www.markhneedham.com/blog/2015/07/23/neo4j-loading-json-documents-with-cypher/</link>
      <pubDate>Thu, 23 Jul 2015 06:15:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/23/neo4j-loading-json-documents-with-cypher/</guid>
<description>One of the questions I’m most commonly asked is how to load JSON documents into Neo4j, and although Cypher doesn’t have a &amp;#39;LOAD JSON&amp;#39; command we can still get JSON data into the graph.
Michael shows how to do this from various languages in this blog post and I recently wanted to load a JSON document that I generated from Chicago crime types.
This is a snippet of the JSON document:</description>
    </item>
    
    <item>
      <title>Neo4j 2.2.3: neo4j-import - Encoder StringEncoder[2] returned an illegal encoded value 0</title>
      <link>https://www.markhneedham.com/blog/2015/07/21/neo4j-2-2-3-neo4j-import-encoder-stringencoder2-returned-an-illegal-encoded-value-0/</link>
      <pubDate>Tue, 21 Jul 2015 06:11:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/21/neo4j-2-2-3-neo4j-import-encoder-stringencoder2-returned-an-illegal-encoded-value-0/</guid>
      <description>I’ve been playing around with the Chicago crime data set again while preparing for a Neo4j webinar next week and while running the import tool ran into the following exception:
Importing the contents of these files into tmp/crimes.db: Nodes: /Users/markneedham/projects/neo4j-spark-chicago/tmp/crimes.csv /Users/markneedham/projects/neo4j-spark-chicago/tmp/beats.csv /Users/markneedham/projects/neo4j-spark-chicago/tmp/primaryTypes.csv /Users/markneedham/projects/neo4j-spark-chicago/tmp/locations.csv Relationships: /Users/markneedham/projects/neo4j-spark-chicago/tmp/crimesBeats.csv /Users/markneedham/projects/neo4j-spark-chicago/tmp/crimesPrimaryTypes.csv /Users/markneedham/projects/neo4j-spark-chicago/tmp/crimesLocationsCleaned.csv Available memory: Free machine memory: 263.17 MB Max heap memory : 3.56 GB Nodes [*&amp;gt;:17.41 MB/s-------------------------|PROPERTIES(3)=|NODE:3|LABEL SCAN----|v:36.30 MB/s(2)===] 3MImport error: Panic called, so exiting java.</description>
    </item>
    
    <item>
      <title>R: Bootstrap confidence intervals</title>
      <link>https://www.markhneedham.com/blog/2015/07/19/r-bootstrap-confidence-intervals/</link>
      <pubDate>Sun, 19 Jul 2015 19:44:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/19/r-bootstrap-confidence-intervals/</guid>
      <description>I recently came across an interesting post on Julia Evans&amp;#39; blog showing how to generate a bigger set of data points by sampling the small set of data points that we actually have using bootstrapping. Julia’s examples are all in Python so I thought it’d be a fun exercise to translate them into R.
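As a sketch of the bootstrapping idea in Python (the post itself works in R, and these no-show counts are hypothetical):

```python
import random

random.seed(42)
# hypothetical no-show counts observed on past flights
no_shows = [0, 1, 1, 2, 3, 5, 8, 2, 1, 0]

# resample with replacement many times and record the mean of each resample
means = []
for _ in range(1000):
    resample = [random.choice(no_shows) for _ in no_shows]
    means.append(sum(resample) / len(no_shows))

means.sort()
# an approximate 95% confidence interval for the mean number of no-shows
print(means[25], means[975])
```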
We’re doing the bootstrapping to simulate the number of no-shows for a flight so we can work out how many seats we can overbook the plane by.</description>
    </item>
    
    <item>
      <title>R: Blog post frequency anomaly detection</title>
      <link>https://www.markhneedham.com/blog/2015/07/17/r-blog-post-frequency-anomaly-detection/</link>
      <pubDate>Fri, 17 Jul 2015 23:34:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/17/r-blog-post-frequency-anomaly-detection/</guid>
      <description>I came across Twitter’s anomaly detection library last year but haven’t yet had a reason to take it for a test run so having got my blog post frequency data into shape I thought it’d be fun to run it through the algorithm.
I wanted to see if it would detect any periods of time when the number of posts differed significantly - I don’t really have an action I’m going to take based on the results, it’s curiosity more than anything else!</description>
    </item>
    
    <item>
      <title>Neo4j: The football transfers graph</title>
      <link>https://www.markhneedham.com/blog/2015/07/16/neo4j-the-football-transfers-graph/</link>
      <pubDate>Thu, 16 Jul 2015 06:40:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/16/neo4j-the-football-transfers-graph/</guid>
<description>Given we’re still in pre-season transfer madness as far as European football is concerned, I thought it’d be interesting to put together a football transfers graph to see whether there are any interesting insights to be had.
It took me a while to find an appropriate source but I eventually came across transfermarkt.co.uk which contains transfers going back at least as far as the start of the Premier League in 1992.</description>
    </item>
    
    <item>
      <title>Python: UnicodeDecodeError: &#39;ascii&#39; codec can&#39;t decode byte 0xe2 in position 0: ordinal not in range(128)</title>
      <link>https://www.markhneedham.com/blog/2015/07/15/python-unicodedecodeerror-ascii-codec-cant-decode-byte-0xe2-in-position-0-ordinal-not-in-range128/</link>
      <pubDate>Wed, 15 Jul 2015 06:20:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/15/python-unicodedecodeerror-ascii-codec-cant-decode-byte-0xe2-in-position-0-ordinal-not-in-range128/</guid>
      <description>I was recently doing some text scrubbing and had difficulty working out how to remove the &amp;#39;†&amp;#39; character from strings.
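(For reference, on Python 3, where strings are unicode by default, the cleanup is a one-liner; the post describes the extra care Python 2 needs:)

```python
text = "foo \u2020"  # 'foo †', with the dagger character to remove
cleaned = text.replace("\u2020", "").strip()
print(repr(cleaned))  # 'foo'
```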
e.g. I had a string like this:
&amp;gt;&amp;gt;&amp;gt; u&amp;#39;foo †&amp;#39; u&amp;#39;foo \u2020&amp;#39; I wanted to get rid of the &amp;#39;†&amp;#39; character and then strip any trailing spaces so I’d end up with the string &amp;#39;foo&amp;#39;. I tried to do this in one call to &amp;#39;replace&amp;#39;:
&amp;gt;&amp;gt;&amp;gt; u&amp;#39;foo †&amp;#39;.replace(&amp;#34; †&amp;#34;, &amp;#34;&amp;#34;) Traceback (most recent call last): File &amp;#34;&amp;lt;stdin&amp;gt;&amp;#34;, line 1, in &amp;lt;module&amp;gt; UnicodeDecodeError: &amp;#39;ascii&amp;#39; codec can&amp;#39;t decode byte 0xe2 in position 1: ordinal not in range(128) It took me a while to work out that &amp;#34;† &amp;#34; was being treated as ASCII rather than UTF-8.</description>
    </item>
    
    <item>
      <title>R: I write more in the last week of the month, or do I?</title>
      <link>https://www.markhneedham.com/blog/2015/07/12/r-i-write-more-in-the-last-week-of-the-month-or-do-i/</link>
      <pubDate>Sun, 12 Jul 2015 09:53:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/12/r-i-write-more-in-the-last-week-of-the-month-or-do-i/</guid>
      <description>I’ve been writing on this blog for almost 7 years and have always believed that I write more frequently towards the end of a month. Now that I’ve got all the data I thought it’d be interesting to test that belief.
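That extra column boils down to a small date calculation; a sketch in Python (the post itself does this in R):

```python
import calendar
from datetime import date

def weeks_from_month_end(d):
    # number of whole weeks between the post date and the month's last day
    last_day = calendar.monthrange(d.year, d.month)[1]
    return (last_day - d.day) // 7

print(weeks_from_month_end(date(2015, 7, 12)))  # 2
print(weeks_from_month_end(date(2015, 7, 31)))  # 0
```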
I started with a data frame containing each post and its publication date and added an extra column which works out how many weeks from the end of the month that post was written:</description>
    </item>
    
    <item>
      <title>R: Filling in missing dates with 0s</title>
      <link>https://www.markhneedham.com/blog/2015/07/12/r-filling-in-missing-dates-with-0s/</link>
      <pubDate>Sun, 12 Jul 2015 08:30:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/12/r-filling-in-missing-dates-with-0s/</guid>
      <description>I wanted to plot a chart showing the number of blog posts published by month and started with the following code which makes use of zoo’s &amp;#39;as.yearmon&amp;#39; function to add the appropriate column and grouping:
&amp;gt; library(zoo) &amp;gt; library(dplyr) &amp;gt; df %&amp;gt;% sample_n(5) title date 888 R: Converting a named vector to a data frame 2014-10-31 23:47:26 144 Rails: Populating a dropdown list using &amp;#39;form_for&amp;#39; 2010-08-31 01:22:14 615 Onboarding: Sketch the landscape 2013-02-15 07:36:06 28 Javascript: The &amp;#39;new&amp;#39; keyword 2010-03-06 15:16:02 1290 Coding Dojo #16: Reading SUnit code 2009-05-28 23:23:19 &amp;gt; posts_by_date = df %&amp;gt;% mutate(year_mon = as.</description>
    </item>
    
    <item>
      <title>R: Date for given week/year</title>
      <link>https://www.markhneedham.com/blog/2015/07/10/r-date-for-given-weekyear/</link>
      <pubDate>Fri, 10 Jul 2015 22:01:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/10/r-date-for-given-weekyear/</guid>
      <description>As I mentioned in my last couple of blog posts I’ve been looking at the data behind this blog and I wanted to plot a chart showing the number of posts per week since the blog started.
I started out with a data frame with posts and publication date:
&amp;gt; library(dplyr) &amp;gt; df = read.csv(&amp;#34;posts.csv&amp;#34;) &amp;gt; df$date = ymd_hms(df$date) &amp;gt; df %&amp;gt;% sample_n(10) title date 538 Nygard Big Data Model: The Investigation Stage 2012-10-10 00:00:36 341 The read-only database 2011-08-29 23:32:26 1112 CSS in Internet Explorer - Some lessons learned 2008-10-31 15:24:51 143 Coding: Mutating parameters 2010-08-26 07:47:23 433 Scala: Counting number of inversions (via merge sort) for an unsorted collection 2012-03-20 06:53:18 618 neo4j/cypher: SQL style GROUP BY functionality 2013-02-17 21:05:27 1111 Testing Hibernate mappings: Setting up test data 2008-10-30 13:24:14 462 neo4j: What question do you want to answer?</description>
    </item>
    
    <item>
      <title>R: dplyr - Error: cannot modify grouping variable</title>
      <link>https://www.markhneedham.com/blog/2015/07/09/r-dplyr-error-cannot-modify-grouping-variable/</link>
      <pubDate>Thu, 09 Jul 2015 05:55:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/09/r-dplyr-error-cannot-modify-grouping-variable/</guid>
      <description>I’ve been doing some exploration of the posts made on this blog and I thought I’d start with answering a simple question - on which dates did I write the most posts?
I started with a data frame containing each post and the date it was published:
&amp;gt; library(dplyr) &amp;gt; df %&amp;gt;% sample_n(5) title date 1148 Taiichi Ohno&amp;#39;s Workplace Management: Book Review 2008-12-08 14:14:48 158 Rails: Faking a delete method with &amp;#39;form_for&amp;#39; 2010-09-20 18:52:15 331 Retrospectives: The 4 L&amp;#39;s Retrospective 2011-07-25 21:00:30 1035 msbuild - Use OutputPath instead of OutDir 2008-08-14 18:54:03 1181 The danger of commenting out code 2009-01-17 06:02:33 To find the most popular days for blog posts we can write the following aggregation function:</description>
    </item>
    
    <item>
      <title>Python: Converting WordPress posts in CSV format</title>
      <link>https://www.markhneedham.com/blog/2015/07/07/python-converting-wordpress-posts-in-csv-format/</link>
      <pubDate>Tue, 07 Jul 2015 06:28:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/07/python-converting-wordpress-posts-in-csv-format/</guid>
      <description>Over the weekend I wanted to look into the WordPress data behind this blog (very meta!) and get the data in CSV format so I could do some analysis in R.
I found a couple of WordPress CSV plugins but unfortunately I couldn’t get any of them to work and ended up working with the raw XML data that WordPress produces when you &amp;#39;export&amp;#39; a blog.
I had the problem of the export being incomplete which I &amp;#39;solved&amp;#39; by importing the posts in two parts of a few years each.</description>
    </item>
    
    <item>
      <title>R: Wimbledon - How do the seeds get on?</title>
      <link>https://www.markhneedham.com/blog/2015/07/05/r-wimbledon-how-do-the-seeds-get-on/</link>
      <pubDate>Sun, 05 Jul 2015 08:38:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/05/r-wimbledon-how-do-the-seeds-get-on/</guid>
      <description>Continuing on with the Wimbledon data set I’ve been playing with I wanted to do some exploration on how the seeded players have fared over the years.
Taking the last 10 years&amp;#39; worth of data, there have always been 32 seeds, and with the following function we can feed in a seeding and get back the round they would be expected to reach:
expected_round = function(seeding) { if(seeding == 1) { return(&amp;#34;Winner&amp;#34;) } else if(seeding == 2) { return(&amp;#34;Finals&amp;#34;) } else if(seeding &amp;lt;= 4) { return(&amp;#34;Semi-Finals&amp;#34;) } else if(seeding &amp;lt;= 8) { return(&amp;#34;Quarter-Finals&amp;#34;) } else if(seeding &amp;lt;= 16) { return(&amp;#34;Round of 16&amp;#34;) } else { return(&amp;#34;Round of 32&amp;#34;) } } &amp;gt; expected_round(1) [1] &amp;#34;Winner&amp;#34; &amp;gt; expected_round(4) [1] &amp;#34;Semi-Finals&amp;#34; We can then have a look at each of the Wimbledon tournaments and work out how far they actually got.</description>
    </item>
    
    <item>
      <title>R: Calculating the difference between ordered factor variables</title>
      <link>https://www.markhneedham.com/blog/2015/07/02/r-calculating-the-difference-between-ordered-factor-variables/</link>
      <pubDate>Thu, 02 Jul 2015 22:55:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/07/02/r-calculating-the-difference-between-ordered-factor-variables/</guid>
      <description>In my continued exploration of Wimbledon data I wanted to work out whether a player had done as well as their seeding suggested they should.
I therefore wanted to work out the difference between the round they reached and the round they were expected to reach. A &amp;#39;round&amp;#39; in the dataset is an ordered factor variable.
These are all the possible values:
rounds = c(&amp;#34;Did not enter&amp;#34;, &amp;#34;Round of 128&amp;#34;, &amp;#34;Round of 64&amp;#34;, &amp;#34;Round of 32&amp;#34;, &amp;#34;Round of 16&amp;#34;, &amp;#34;Quarter-Finals&amp;#34;, &amp;#34;Semi-Finals&amp;#34;, &amp;#34;Finals&amp;#34;, &amp;#34;Winner&amp;#34;) And if we want to factorise a couple of strings into this factor we would do it like this:</description>
    </item>
    
    <item>
      <title>R: write.csv - unimplemented type &#39;list&#39; in &#39;EncodeElement&#39;</title>
      <link>https://www.markhneedham.com/blog/2015/06/30/r-write-csv-unimplemented-type-list-in-encodeelement/</link>
      <pubDate>Tue, 30 Jun 2015 22:26:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/30/r-write-csv-unimplemented-type-list-in-encodeelement/</guid>
      <description>Every now and then I want to serialise an R data frame to a CSV file so I can easily load it up again if my R environment crashes, without having to recalculate everything, but recently I ran into the following error:
&amp;gt; write.csv(foo, &amp;#34;/tmp/foo.csv&amp;#34;, row.names = FALSE) Error in .External2(C_writetable, x, file, nrow(x), p, rnames, sep, eol, : unimplemented type &amp;#39;list&amp;#39; in &amp;#39;EncodeElement&amp;#39; If we take a closer look at the data frame in question it looks ok:</description>
    </item>
    
    <item>
      <title>R: Speeding up the Wimbledon scraping job</title>
      <link>https://www.markhneedham.com/blog/2015/06/29/r-speeding-up-the-wimbledon-scraping-job/</link>
      <pubDate>Mon, 29 Jun 2015 05:36:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/29/r-speeding-up-the-wimbledon-scraping-job/</guid>
      <description>Over the past few days I’ve written a few blog posts about a Wimbledon data set I’ve been building and after running the scripts a few times I noticed that it was taking much longer to run than I expected.
To recap, I started out with the following function which takes in a URI and returns a data frame containing a row for each match:
library(rvest) library(dplyr) scrape_matches1 = function(uri) { matches = data.</description>
    </item>
    
    <item>
      <title>R: dplyr - Update rows with earlier/previous rows values</title>
      <link>https://www.markhneedham.com/blog/2015/06/28/r-dplyr-update-rows-with-earlierprevious-rows-values/</link>
      <pubDate>Sun, 28 Jun 2015 22:30:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/28/r-dplyr-update-rows-with-earlierprevious-rows-values/</guid>
      <description>Recently I had a data frame which contained a column which had mostly empty values:
&amp;gt; data.frame(col1 = c(1,2,3,4,5), col2 = c(&amp;#34;a&amp;#34;, NA, NA , &amp;#34;b&amp;#34;, NA)) col1 col2 1 1 a 2 2 &amp;lt;NA&amp;gt; 3 3 &amp;lt;NA&amp;gt; 4 4 b 5 5 &amp;lt;NA&amp;gt; I wanted to fill in the NA values with the last non NA value from that column. So I want the data frame to look like this:</description>
    </item>
    
    <item>
      <title>R: Command line - Error in GenericTranslator$new : could not find function &#34;loadMethod&#34;</title>
      <link>https://www.markhneedham.com/blog/2015/06/27/r-command-line-error-in-generictranslatornew-could-not-find-function-loadmethod/</link>
      <pubDate>Sat, 27 Jun 2015 22:47:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/27/r-command-line-error-in-generictranslatornew-could-not-find-function-loadmethod/</guid>
      <description>I’ve been reading Text Processing with Ruby over the last week or so and one of the ideas the author describes is setting up your scripts so you can run them directly from the command line.
I wanted to do this with my Wimbledon R script and wrote the following script which uses the &amp;#39;Rscript&amp;#39; executable so that R doesn’t launch in interactive mode:
wimbledon
#!/usr/bin/env Rscript library(rvest) library(dplyr) library(stringr) library(readr) # stuff Then I tried to run it:</description>
    </item>
    
    <item>
      <title>R: dplyr - squashing multiple rows per group into one</title>
      <link>https://www.markhneedham.com/blog/2015/06/27/r-dplyr-squashing-multiple-rows-per-group-into-one/</link>
      <pubDate>Sat, 27 Jun 2015 22:36:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/27/r-dplyr-squashing-multiple-rows-per-group-into-one/</guid>
      <description>I spent a bit of the day working on my Wimbledon data set and the next thing I explored is all the people that have beaten Andy Murray in the tournament.
The following dplyr query gives us the names of those people and the year the match took place:
library(dplyr) &amp;gt; main_matches %&amp;gt;% filter(loser == &amp;#34;Andy Murray&amp;#34;) %&amp;gt;% select(winner, year) winner year 1 Grigor Dimitrov 2014 2 Roger Federer 2012 3 Rafael Nadal 2011 4 Rafael Nadal 2010 5 Andy Roddick 2009 6 Rafael Nadal 2008 7 Marcos Baghdatis 2006 8 David Nalbandian 2005 As you can see, Rafael Nadal shows up multiple times.</description>
    </item>
    
    <item>
      <title>R: ggplot - Show discrete scale even with no value</title>
      <link>https://www.markhneedham.com/blog/2015/06/26/r-ggplot-show-discrete-scale-even-with-no-value/</link>
      <pubDate>Fri, 26 Jun 2015 22:48:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/26/r-ggplot-show-discrete-scale-even-with-no-value/</guid>
      <description>As I mentioned in a previous blog post, I’ve been scraping data for the Wimbledon tennis tournament, and having got the data for the last ten years I wrote a query using dplyr to find out how players did each year over that period.
I ended up with the following functions to filter my data frame of all the matches:
round_reached = function(player, main_matches) { furthest_match = main_matches %&amp;gt;% filter(winner == player | loser == player) %&amp;gt;% arrange(desc(round)) %&amp;gt;% head(1) return(ifelse(furthest_match$winner == player, &amp;#34;Winner&amp;#34;, as.</description>
    </item>
    
    <item>
      <title>R: Scraping Wimbledon draw data</title>
      <link>https://www.markhneedham.com/blog/2015/06/25/r-scraping-wimbledon-draw-data/</link>
      <pubDate>Thu, 25 Jun 2015 23:14:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/25/r-scraping-wimbledon-draw-data/</guid>
      <description>Given Wimbledon starts next week I wanted to find a data set to explore before it gets underway. Having searched around and failed to find one I had to resort to scraping the ATP World Tour’s event page which displays the matches in an easy to access format.
We’ll be using the Wimbledon 2013 draw since Andy Murray won that year! This is what the page looks like:
Each match is in its own row of a table and each column has a class attribute which makes it really easy to scrape.</description>
    </item>
    
    <item>
      <title>R: Scraping the release dates of github projects</title>
      <link>https://www.markhneedham.com/blog/2015/06/23/r-scraping-the-release-dates-of-github-projects/</link>
      <pubDate>Tue, 23 Jun 2015 22:34:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/23/r-scraping-the-release-dates-of-github-projects/</guid>
      <description>Continuing on from my blog post about scraping Neo4j’s release dates I thought it’d be even more interesting to chart the release dates of some github projects.
In theory the release dates should be accessible through the github API but the few that I looked at weren’t returning any data so I scraped the data together.
We’ll be using rvest again and I first wrote the following function to extract the release versions and dates from a single page:</description>
    </item>
    
    <item>
      <title>R: Scraping Neo4j release dates with rvest</title>
      <link>https://www.markhneedham.com/blog/2015/06/21/r-scraping-neo4j-release-dates-with-rvest/</link>
      <pubDate>Sun, 21 Jun 2015 22:07:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/21/r-scraping-neo4j-release-dates-with-rvest/</guid>
      <description>As part of my log analysis I wanted to get the Neo4j release dates which are accessible from the release notes and decided to try out Hadley Wickham’s rvest scraping library which he released at the end of 2014.
rvest is based on Python’s beautifulsoup which has become my scraping library of choice so I didn’t find it too difficult to pick up.
To start with we need to download the release notes locally so we don’t have to go over the network when we’re doing our scraping:</description>
    </item>
    
    <item>
      <title>R: dplyr - segfault cause &#39;memory not mapped&#39;</title>
      <link>https://www.markhneedham.com/blog/2015/06/20/r-dplyr-segfault-cause-memory-not-mapped/</link>
      <pubDate>Sat, 20 Jun 2015 22:18:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/20/r-dplyr-segfault-cause-memory-not-mapped/</guid>
      <description>In my continued playing around with web logs in R I wanted to process the logs for a day and see what the most popular URIs were.
I first read in all the lines using the read_lines function in readr and put the vector it produced into a data frame so I could process it using dplyr.
library(readr) dlines = data.frame(column = read_lines(&amp;#34;~/projects/logs/2015-06-18-22-docs&amp;#34;)) In the previous post I showed some code to extract the URI from a log line.</description>
    </item>
    
    <item>
      <title>R: Regex - capturing multiple matches of the same group</title>
      <link>https://www.markhneedham.com/blog/2015/06/19/r-regex-capturing-multiple-matches-of-the-same-group/</link>
      <pubDate>Fri, 19 Jun 2015 21:38:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/19/r-regex-capturing-multiple-matches-of-the-same-group/</guid>
      <description>I’ve been playing around with some web logs using R and I wanted to extract everything that existed in double quotes within a logged entry.
This is an example of a log entry that I want to parse:
log = &amp;#39;2015-06-18-22:277:548311224723746831\t2015-06-18T22:00:11\t2015-06-18T22:00:05Z\t93317114\tip-127-0-0-1\t127.0.0.5\tUser\tNotice\tneo4j.com.access.log\t127.0.0.3 - - [18/Jun/2015:22:00:11 +0000] &amp;#34;GET /docs/stable/query-updating.html HTTP/1.1&amp;#34; 304 0 &amp;#34;http://neo4j.com/docs/stable/cypher-introduction.html&amp;#34; &amp;#34;Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/43.0.2357.124 Safari/537.36&amp;#34;&amp;#39; And I want to extract these 3 things:</description>
    </item>
    
    <item>
      <title>Coding: Explore and retreat</title>
      <link>https://www.markhneedham.com/blog/2015/06/17/coding-explore-and-retreat/</link>
      <pubDate>Wed, 17 Jun 2015 17:23:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/17/coding-explore-and-retreat/</guid>
      <description>When refactoring code or looking for the best way to integrate a new piece of functionality I generally favour a small steps/incremental approach but recent experiences have led me to believe that this isn’t always the quickest approach.
Sometimes it seems to make more sense to go on little discovery missions in the code, make some bigger steps and then if necessary retreat and revert our changes and apply the lessons learnt on our next discovery mission.</description>
    </item>
    
    <item>
      <title>Northwind: Finding direct/transitive Reports in SQL and Neo4j&#39;s Cypher</title>
      <link>https://www.markhneedham.com/blog/2015/06/15/northwind-finding-directtransitive-reports-in-sql-and-neo4js-cypher/</link>
      <pubDate>Mon, 15 Jun 2015 22:53:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/15/northwind-finding-directtransitive-reports-in-sql-and-neo4js-cypher/</guid>
      <description>Every few months we run a relational to graph meetup at the Neo London office where we go through how to take your data from a relational database and into the graph.
We use the Northwind dataset, which often comes as a demo dataset on relational databases, and come up with some queries which seem graph-like in nature.
My favourite query is one which finds out how employees are organised and who reports to whom.</description>
    </item>
    
    <item>
      <title>The Willpower Instinct: Reducing time spent mindlessly scrolling for things to read</title>
      <link>https://www.markhneedham.com/blog/2015/06/12/the-willpower-instinct-reducing-time-spent-mindlessly-scrolling-for-things-to-read/</link>
      <pubDate>Fri, 12 Jun 2015 23:12:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/12/the-willpower-instinct-reducing-time-spent-mindlessly-scrolling-for-things-to-read/</guid>
      <description>I recently finished reading Kelly McGonigal’s excellent book &amp;#39;The Willpower Instinct&amp;#39;, having previously watched her Google talk of the same title.
My main takeaway from the book is that there are things that we want to do (or not do) but doing them (or not as the case may be) isn’t necessarily instinctive and so we need to develop some strategies to help ourselves out.
In one of the early chapters she suggests picking a habit that you want to do less of and writing down on a piece of paper every time you want to do it and how you’re feeling at that point.</description>
    </item>
    
    <item>
      <title>Neo4j: Using LOAD CSV to help explore CSV files</title>
      <link>https://www.markhneedham.com/blog/2015/06/11/neo4j-using-load-csv-to-help-explore-csv-files/</link>
      <pubDate>Thu, 11 Jun 2015 23:15:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/11/neo4j-using-load-csv-to-help-explore-csv-files/</guid>
      <description>During the Neo4j How I met your mother hackathon that we ran last week one of the attendees noticed that one of the CSV files we were importing wasn’t creating as many records as they expected it to.
This is typically the case when there’s some odd quoting in the CSV file but we decided to look into it.
The file in question was one containing references made in HIMYM.</description>
    </item>
    
    <item>
      <title>Mac OS X: GNU sed -  Hex string replacement / replacing new line characters</title>
      <link>https://www.markhneedham.com/blog/2015/06/11/mac-os-x-gnu-sed-hex-string-replacement-replacing-new-line-characters/</link>
      <pubDate>Thu, 11 Jun 2015 21:38:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/11/mac-os-x-gnu-sed-hex-string-replacement-replacing-new-line-characters/</guid>
      <description>Recently I was working with a CSV file which contained both Windows and Unix line endings, which made it difficult to work with.
The actual line endings were HEX &amp;#39;0A0D&amp;#39; i.e. Windows line breaks but there were also HEX &amp;#39;0A&amp;#39; i.e. Unix line breaks within one of the columns.
I wanted to get rid of the Unix line breaks and discovered that you can do HEX sequence replacement using the GNU version of sed - unfortunately the Mac ships with the BSD version which doesn’t have this functionality.</description>
    </item>
    
    <item>
      <title>Unix: Converting a file of values into a comma separated list</title>
      <link>https://www.markhneedham.com/blog/2015/06/08/unix-converting-a-file-of-values-into-a-comma-separated-list/</link>
      <pubDate>Mon, 08 Jun 2015 22:23:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/08/unix-converting-a-file-of-values-into-a-comma-separated-list/</guid>
      <description>I recently had a bunch of values in a file that I wanted to paste into a Java program which required a comma separated list of strings.
This is what the file looked like:
$ cat foo2.txt | head -n 5 1.0 1.0 1.0 1.0 1.0 And the idea is that we would end up with something like this:
&amp;#34;1.0&amp;#34;,&amp;#34;1.0&amp;#34;,&amp;#34;1.0&amp;#34;,&amp;#34;1.0&amp;#34;,&amp;#34;1.0&amp;#34; The first thing we need to do is quote each of the values.</description>
    </item>
    
    <item>
      <title>Netty: Testing encoders/decoders</title>
      <link>https://www.markhneedham.com/blog/2015/06/05/netty-testing-encodersdecoders/</link>
      <pubDate>Fri, 05 Jun 2015 21:25:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/05/netty-testing-encodersdecoders/</guid>
      <description>I’ve been working with Netty a bit recently and, having built a pipeline of encoders/decoders as described in this excellent tutorial, I wanted to test that the encoders and decoders were working without having to send real messages around.
Luckily there is an EmbeddedChannel which makes our life very easy indeed.
Let’s say we’ve got a message &amp;#39;Foo&amp;#39; that we want to send across the wire. It only contains a single integer value so we’ll just send that and reconstruct &amp;#39;Foo&amp;#39; on the other side.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Step by step to creating a linked list of adjacent nodes using UNWIND</title>
      <link>https://www.markhneedham.com/blog/2015/06/04/neo4j-cypher-step-by-step-to-creating-a-linked-list-of-adjacent-nodes-using-unwind/</link>
      <pubDate>Thu, 04 Jun 2015 22:17:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/04/neo4j-cypher-step-by-step-to-creating-a-linked-list-of-adjacent-nodes-using-unwind/</guid>
      <description>In late 2013 I wrote a post showing how to create a linked list connecting different football seasons together using Neo4j’s Cypher query language, a post I’ve frequently copy &amp;amp; pasted from!
Now 18 months later, and using Neo4j 2.2 rather than 2.0, we can actually solve this problem in what I believe is a more intuitive way using the UNWIND clause (http://neo4j.com/docs/stable/query-unwind.html). Credit for the idea goes to Michael, I’m just the messenger.</description>
    </item>
    
    <item>
      <title>R: ggplot geom_density - Error in exists(name, envir = env, mode = mode) : argument &#34;env&#34; is missing, with no default</title>
      <link>https://www.markhneedham.com/blog/2015/06/03/r-ggplot-geom_density-error-in-existsname-envir-env-mode-mode-argument-env-is-missing-with-no-default/</link>
      <pubDate>Wed, 03 Jun 2015 05:52:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/03/r-ggplot-geom_density-error-in-existsname-envir-env-mode-mode-argument-env-is-missing-with-no-default/</guid>
      <description>Continuing on from yesterday’s blog post where I worked out how to clean up the Think Bayes Price is Right data set, the next task was to plot a distribution of the prices of show case items.
To recap, this is what the data frame we’re working with looks like:
library(dplyr) df2011 = read.csv(&amp;#34;~/projects/rLearning/showcases.2011.csv&amp;#34;, na.strings = c(&amp;#34;&amp;#34;, &amp;#34;NA&amp;#34;)) df2011 = df2011 %&amp;gt;% na.omit() &amp;gt; df2011 %&amp;gt;% head() X Sep..19 Sep.</description>
    </item>
    
    <item>
      <title>R: dplyr - removing empty rows</title>
      <link>https://www.markhneedham.com/blog/2015/06/02/r-dplyr-removing-empty-rows/</link>
      <pubDate>Tue, 02 Jun 2015 06:49:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/06/02/r-dplyr-removing-empty-rows/</guid>
      <description>I’m still working my way through the exercises in Think Bayes and in Chapter 6 needed to do some cleaning of the data in a CSV file containing information about the Price is Right.
I downloaded the file using wget:
wget http://www.greenteapress.com/thinkbayes/showcases.2011.csv And then loaded it into R and explored the first few rows using dplyr:
library(dplyr) df2011 = read.csv(&amp;#34;~/projects/rLearning/showcases.2011.csv&amp;#34;) &amp;gt; df2011 %&amp;gt;% head(10) X Sep..19 Sep..20 Sep..21 Sep..22 Sep.</description>
    </item>
    
    <item>
      <title>R: Think Bayes Euro Problem</title>
      <link>https://www.markhneedham.com/blog/2015/05/31/r-think-bayes-euro-problem/</link>
      <pubDate>Sun, 31 May 2015 23:11:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/31/r-think-bayes-euro-problem/</guid>
      <description>I’ve got back to working my way through Think Bayes after a month’s break and started out with the one euro coin problem in Chapter 4:
A statistical statement appeared in &amp;#34;The Guardian&amp;#34; on Friday January 4, 2002: When spun on edge 250 times, a Belgian one-euro coin came up heads 140 times and tails 110. ‘It looks very suspicious to me,’ said Barry Blight, a statistics lecturer at the London School of Economics.</description>
    </item>
    
    <item>
      <title>Python: CSV writing - TypeError: &#39;builtin_function_or_method&#39; object has no attribute &#39;*getitem*&#39;</title>
      <link>https://www.markhneedham.com/blog/2015/05/31/python-csv-writing-typeerror-builtin_function_or_method-object-has-no-attribute-__getitem__/</link>
      <pubDate>Sun, 31 May 2015 22:33:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/31/python-csv-writing-typeerror-builtin_function_or_method-object-has-no-attribute-__getitem__/</guid>
      <description>When I’m working in Python I often find myself writing to CSV files using the built-in library and every now and then make a mistake when calling writerow:
import csv writer = csv.writer(file, delimiter=&amp;#34;,&amp;#34;) writer.writerow[&amp;#34;player&amp;#34;, &amp;#34;team&amp;#34;] This results in the following error message:
TypeError: &amp;#39;builtin_function_or_method&amp;#39; object has no attribute &amp;#39;__getitem__&amp;#39; The error message is a bit weird at first but it’s basically saying that I’ve tried to do an associative lookup on an object which doesn’t support that operation.</description>
    </item>
    
    <item>
      <title>Neo4j: The BBC Champions League graph</title>
      <link>https://www.markhneedham.com/blog/2015/05/30/neo4j-the-bbc-champions-league-graph/</link>
      <pubDate>Sat, 30 May 2015 21:45:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/30/neo4j-the-bbc-champions-league-graph/</guid>
      <description>A couple of weekends ago I started scraping the BBC live text feed of the Bayern Munich/Barcelona match, initially starting out with just the fouls and building the foul graph.
I’ve spent a bit more time on it since then and have managed to model several other events as well including attempts, goals, cards and free kicks.
I started doing this just for the Bayern Munich/Barcelona match but realised it wasn’t particularly difficult to extend this out and graph the events for every match in the Champions League 2014/2015.</description>
    </item>
    
    <item>
      <title>Python: Look ahead multiple elements in an iterator/generator</title>
      <link>https://www.markhneedham.com/blog/2015/05/28/python-look-ahead-multiple-elements-in-an-iteratorgenerator/</link>
      <pubDate>Thu, 28 May 2015 20:56:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/28/python-look-ahead-multiple-elements-in-an-iteratorgenerator/</guid>
      <description>As part of the BBC live text scraping code I’ve been working on I needed to take an iterator of raw events created by a generator and transform this into an iterator of cards shown in a match.
The structure of the raw events I’m interested in is as follows:
Line 1: Player booked
Line 2: Player fouled
Line 3: Information about the foul
e.g.
events = [ {&amp;#39;event&amp;#39;: u&amp;#39;Booking Pedro (Barcelona) is shown the yellow card for a bad foul.</description>
    </item>
    
    <item>
      <title>Neo4j: The foul revenge graph</title>
      <link>https://www.markhneedham.com/blog/2015/05/26/neo4j-the-foul-revenge-graph/</link>
      <pubDate>Tue, 26 May 2015 07:03:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/26/neo4j-the-foul-revenge-graph/</guid>
      <description>Last week I was showing the foul graph to my colleague Alistair who came up with the idea of running a &amp;#39;foul revenge&amp;#39; query to find out which players gained revenge for a foul with one of their own later in the match.
Queries like this are very path centric and therefore work well in a graph. To recap, this is what the foul graph looks like:
The first thing that we need to do is connect the fouls in a linked list based on time so that we can query their order more easily.</description>
    </item>
    
    <item>
      <title>Python: Joining multiple generators/iterators</title>
      <link>https://www.markhneedham.com/blog/2015/05/24/python-joining-multiple-generatorsiterators/</link>
      <pubDate>Sun, 24 May 2015 23:51:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/24/python-joining-multiple-generatorsiterators/</guid>
      <description>In my previous blog post I described how I’d refactored some scraping code I’ve been working on to use iterators and ended up with a function which returned a generator containing all the events for one BBC live text match:
match_id = &amp;#34;32683310&amp;#34; events = extract_events(&amp;#34;data/raw/%s&amp;#34; % (match_id)) &amp;gt;&amp;gt;&amp;gt; print type(events) &amp;lt;type &amp;#39;generator&amp;#39;&amp;gt; The next thing I wanted to do is get the events for multiple matches which meant I needed to glue together multiple generators into one big generator.</description>
    </item>
    
    <item>
      <title>Python: Refactoring to iterator</title>
      <link>https://www.markhneedham.com/blog/2015/05/23/python-refactoring-to-iterator/</link>
      <pubDate>Sat, 23 May 2015 10:14:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/23/python-refactoring-to-iterator/</guid>
      <description>Over the last week I’ve been building a set of scripts to scrape the events from the Bayern Munich/Barcelona game and I’ve ended up with a few hundred lines of nested for statements, if statements and mutated lists. I thought it was about time I did a bit of refactoring.
The following is a function which takes in a match file and spits out a collection of maps containing times &amp;amp; events.</description>
    </item>
    
    <item>
      <title>Python: UnicodeEncodeError: &#39;ascii&#39; codec can&#39;t encode character u&#39;\xfc&#39; in position 11: ordinal not in range(128)</title>
      <link>https://www.markhneedham.com/blog/2015/05/21/python-unicodeencodeerror-ascii-codec-cant-encode-character-uxfc-in-position-11-ordinal-not-in-range128/</link>
      <pubDate>Thu, 21 May 2015 06:14:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/21/python-unicodeencodeerror-ascii-codec-cant-encode-character-uxfc-in-position-11-ordinal-not-in-range128/</guid>
      <description>I’ve been trying to write some Python code to extract the players and the team they represented in the Bayern Munich/Barcelona match into a CSV file and had much more difficulty than I expected.
I have some scraping code (which is beyond the scope of this article) which gives me a list of (player, team) pairs that I want to write to disk. The contents of the list is as follows:</description>
    </item>
    
    <item>
      <title>Neo4j: Finding all shortest paths</title>
      <link>https://www.markhneedham.com/blog/2015/05/19/neo4j-finding-all-shortest-paths/</link>
      <pubDate>Tue, 19 May 2015 22:45:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/19/neo4j-finding-all-shortest-paths/</guid>
      <description>One of the Cypher language features we show in Neo4j training courses is the shortest path function which allows you to find the shortest path in terms of number of relationships between two nodes.
Using the movie graph, which you can import via the &amp;#39;:play movies&amp;#39; command in the browser, we’ll first create a &amp;#39;KNOWS&amp;#39; relationship between any people that have appeared in the same movie:
MATCH (p1:Person)-[:ACTED_IN]-&amp;gt;()&amp;lt;-[:ACTED_IN]-(p2:Person) MERGE (p1)-[:KNOWS]-(p2) Now that we’ve got that relationship we can easily find the shortest path between two people, say Tom Cruise and Tom Hanks:</description>
    </item>
    
    <item>
      <title>Neo4j: Refactoring the BBC football live text fouls graph</title>
      <link>https://www.markhneedham.com/blog/2015/05/17/neo4j-refactoring-the-bbc-football-live-text-fouls-graph/</link>
      <pubDate>Sun, 17 May 2015 11:04:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/17/neo4j-refactoring-the-bbc-football-live-text-fouls-graph/</guid>
      <description>Yesterday I wrote about a Neo4j graph I’ve started building which contains all the fouls committed in the Champions League game between Barcelona &amp;amp; Bayern Munich and surrounding meta data.
While adding other events into the graph I realised that I’d added some duplication in the model and the model could do with some refactoring to make it easier to use.
To recap, this is the model that we designed in the previous blog post:</description>
    </item>
    
    <item>
      <title>Neo4j: BBC football live text fouls graph</title>
      <link>https://www.markhneedham.com/blog/2015/05/16/neo4j-bbc-football-live-text-fouls-graph/</link>
      <pubDate>Sat, 16 May 2015 21:13:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/16/neo4j-bbc-football-live-text-fouls-graph/</guid>
      <description>I recently came across the Partially Derivative podcast and in episode 17 they describe how Kirk Goldsberry scraped a bunch of data about shots in basketball matches then ran some analysis on that data.
It got me thinking that we might be able to do something similar for football matches and although event based data for football matches only comes from Opta, the BBC does expose some of them in live text feeds.</description>
    </item>
    
    <item>
      <title>R: ggplot - Displaying multiple charts with a for loop</title>
      <link>https://www.markhneedham.com/blog/2015/05/14/r-ggplot-displaying-multiple-charts-with-a-for-loop/</link>
      <pubDate>Thu, 14 May 2015 00:17:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/14/r-ggplot-displaying-multiple-charts-with-a-for-loop/</guid>
      <description>Continuing with my analysis of the Neo4j London user group I wanted to drill into some individual meetups and see the makeup of the people attending those meetups with respect to the cohort they belong to.
I started by writing a function which would take in an event ID and output a bar chart showing the number of people who attended that event from each cohort.
We can work out the cohort that a member belongs to by querying for the first event they attended.</description>
    </item>
    
    <item>
      <title>R: Cohort heatmap of Neo4j London meetup</title>
      <link>https://www.markhneedham.com/blog/2015/05/11/r-cohort-heatmap-of-neo4j-london-meetup/</link>
      <pubDate>Mon, 11 May 2015 23:16:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/11/r-cohort-heatmap-of-neo4j-london-meetup/</guid>
      <description>A few months ago I had a go at doing some cohort analysis of the Neo4j London meetup group which was an interesting experiment but unfortunately resulted in a chart that was completely illegible.
I wasn’t sure how to progress from there but a few days ago I came across the cohort heatmap which seemed like a better way of visualising things over time.
The underlying idea is still the same - we’re comparing different cohorts of users against each other to see whether a change or intervention we did at a certain time had any impact.</description>
    </item>
    
    <item>
      <title>R: Neo4j London meetup group - How many events do people come to?</title>
      <link>https://www.markhneedham.com/blog/2015/05/09/r-neo4j-london-meetup-group-how-many-events-do-people-come-to/</link>
      <pubDate>Sat, 09 May 2015 22:33:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/09/r-neo4j-london-meetup-group-how-many-events-do-people-come-to/</guid>
      <description>Earlier this week the number of members in the Neo4j London meetup group creeped over the 2,000 mark and I thought it’d be fun to re-explore the data that I previously imported into Neo4j.
How often do people come to meetups?
library(RNeo4j) library(dplyr) graph = startGraph(&amp;#34;http://localhost:7474/db/data/&amp;#34;) query = &amp;#34;MATCH (g:Group {name: &amp;#39;Neo4j - London User Group&amp;#39;})-[:HOSTED_EVENT]-&amp;gt;(event)&amp;lt;-[:TO]-({response: &amp;#39;yes&amp;#39;})&amp;lt;-[:RSVPD]-(profile)-[:HAS_MEMBERSHIP]-&amp;gt;(membership)-[:OF_GROUP]-&amp;gt;(g) WHERE (event.time + event.utc_offset) &amp;lt; timestamp() RETURN event.id, event.time + event.utc_offset AS eventTime, profile.</description>
    </item>
    
    <item>
      <title>Python: Selecting certain indexes in an array</title>
      <link>https://www.markhneedham.com/blog/2015/05/05/python-selecting-certain-indexes-in-an-array/</link>
      <pubDate>Tue, 05 May 2015 21:39:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/05/python-selecting-certain-indexes-in-an-array/</guid>
      <description>A couple of days ago I was scrapping the UK parliament constituencies from Wikipedia in preparation for the Graph Connect hackathon and had got to the point where I had an array with one entry per column in the table.
import requests from bs4 import BeautifulSoup from soupselect import select page = open(&amp;#34;constituencies.html&amp;#34;, &amp;#39;r&amp;#39;) soup = BeautifulSoup(page.read()) for row in select(soup, &amp;#34;table.wikitable tr&amp;#34;): if select(row, &amp;#34;th&amp;#34;): print [cell.text for cell in select(row, &amp;#34;th&amp;#34;)] if select(row, &amp;#34;td&amp;#34;): print [cell.</description>
    </item>
    
    <item>
      <title>Neo4j: LOAD CSV - java.io.InputStreamReader there&#39;s a field starting with a quote and whereas it ends that quote there seems  to be character in that field after that ending quote. That isn&#39;t supported.</title>
      <link>https://www.markhneedham.com/blog/2015/05/04/neo4j-load-csv-java-io-inputstreamreader-theres-a-field-starting-with-a-quote-and-whereas-it-ends-that-quote-there-seems-to-be-character-in-that-field-after-that-ending-quote-that-isnt-suppor/</link>
      <pubDate>Mon, 04 May 2015 09:56:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/04/neo4j-load-csv-java-io-inputstreamreader-theres-a-field-starting-with-a-quote-and-whereas-it-ends-that-quote-there-seems-to-be-character-in-that-field-after-that-ending-quote-that-isnt-suppor/</guid>
      <description>I recently came across the last.fm dataset via Ben Frederickson’s blog and thought it’d be an interesting one to load into Neo4j and explore.
I started with a simple query to parse the CSV file and count the number of rows:
LOAD CSV FROM &amp;#34;file:///Users/markneedham/projects/neo4j-recommendations/lastfm-dataset-360K/usersha1-artmbid-artname-plays.tsv&amp;#34; AS row FIELDTERMINATOR &amp;#34;\t&amp;#34; return COUNT(*) At java.io.InputStreamReader@4d307fda:6484 there&amp;#39;s a field starting with a quote and whereas it ends that quote there seems to be character in that field after that ending quote.</description>
    </item>
    
    <item>
      <title>Coding: Visualising a bitmap</title>
      <link>https://www.markhneedham.com/blog/2015/05/03/coding-visualising-a-bitmap/</link>
      <pubDate>Sun, 03 May 2015 00:19:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/05/03/coding-visualising-a-bitmap/</guid>
      <description>Over the last month or so I’ve spent some time each day reading a new part of the Neo4j code base to get more familiar with it, and one of my favourite classes is the Bits class which does all things low level on the wire and to disk.
In particular I like its toString method which returns a binary representation of the values that we’re storing in bytes, ints and longs.</description>
    </item>
    
    <item>
      <title>Deliberate Practice: Building confidence vs practicing</title>
      <link>https://www.markhneedham.com/blog/2015/04/30/deliberate-practice-building-confidence-vs-practicing/</link>
      <pubDate>Thu, 30 Apr 2015 07:48:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/30/deliberate-practice-building-confidence-vs-practicing/</guid>
      <description>A few weeks ago I wrote about the learning to cycle dependency graph which described some of the skills required to become proficient at riding a bike.
While we’ve been practicing various skills/sub skills I’ve often found myself saying the following:
if it’s not hard you’re not practicing me, April 2015
i.e. you should find the skill you’re currently practicing difficult otherwise you’re not stretching yourself and therefore aren’t getting better.</description>
    </item>
    
    <item>
      <title>R: dplyr - Error in (list: invalid subscript type &#39;double&#39;</title>
      <link>https://www.markhneedham.com/blog/2015/04/27/r-dplyr-error-in-list-invalid-subscript-type-double/</link>
      <pubDate>Mon, 27 Apr 2015 22:34:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/27/r-dplyr-error-in-list-invalid-subscript-type-double/</guid>
      <description>In my continued playing around with R I wanted to find the minimum value for a specified percentile given a data frame representing a cumulative distribution function (CDF).
e.g. imagine we have the following CDF represented in a data frame:
library(dplyr) df = data.frame(score = c(5,7,8,10,12,20), percentile = c(0.05,0.1,0.15,0.20,0.25,0.5)) and we want to find the minimum value for the 0.05 percentile. We can use the filter function to do so:</description>
    </item>
    
    <item>
      <title>Deliberate Practice: Watching yourself fail</title>
      <link>https://www.markhneedham.com/blog/2015/04/25/deliberate-practice-watching-yourself-fail/</link>
      <pubDate>Sat, 25 Apr 2015 22:26:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/25/deliberate-practice-watching-yourself-fail/</guid>
      <description>I’ve recently been reading the literature written by K. Anders Eriksson and co on Deliberate Practice and one of the suggestions for increasing our competence at a skill is to put ourselves in a situation where we can fail.
I’ve been reading Think Bayes - an introductory text on Bayesian statistics, something I know nothing about - and each chapter concludes with a set of exercises to practice, a potentially perfect exercise in failure!</description>
    </item>
    
    <item>
      <title>R: Think Bayes Locomotive Problem - Posterior probabilities for different priors</title>
      <link>https://www.markhneedham.com/blog/2015/04/24/r-think-bayes-locomotive-problem-posterior-probabilities-for-different-priors/</link>
      <pubDate>Fri, 24 Apr 2015 23:53:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/24/r-think-bayes-locomotive-problem-posterior-probabilities-for-different-priors/</guid>
      <description>In my continued reading of Think Bayes the next problem to tackle is the Locomotive problem which is defined thus:
A railroad numbers its locomotives in order 1..N. One day you see a locomotive with the number 60. Estimate how many locomotives the railroad has.
The interesting thing about this question is that it initially seems that we don’t have enough information to come up with any sort of answer.</description>
    </item>
    
    <item>
      <title>R: Replacing for loops with data frames</title>
      <link>https://www.markhneedham.com/blog/2015/04/22/r-replacing-for-loops-with-data-frames/</link>
      <pubDate>Wed, 22 Apr 2015 22:18:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/22/r-replacing-for-loops-with-data-frames/</guid>
      <description>In my last blog post I showed how to derive posterior probabilities for the Think Bayes dice problem:
Suppose I have a box of dice that contains a 4-sided die, a 6-sided die, an 8-sided die, a 12-sided die, and a 20-sided die. If you have ever played Dungeons &amp;amp; Dragons, you know what I am talking about. Suppose I select a die from the box at random, roll it, and get a 6.</description>
    </item>
    
    <item>
      <title>R: Numeric keys in the nested list/dictionary</title>
      <link>https://www.markhneedham.com/blog/2015/04/21/r-numeric-keys-in-the-nested-listdictionary/</link>
      <pubDate>Tue, 21 Apr 2015 05:59:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/21/r-numeric-keys-in-the-nested-listdictionary/</guid>
      <description>Last week I described how I’ve been creating fake dictionaries in R using lists and I found myself using the same structure while solving the dice problem in Think Bayes.
The dice problem is described as follows:
Suppose I have a box of dice that contains a 4-sided die, a 6-sided die, an 8-sided die, a 12-sided die, and a 20-sided die. If you have ever played Dungeons &amp;amp; Dragons, you know what I am talking about.</description>
    </item>
    
    <item>
      <title>R: non-numeric argument to binary operator</title>
      <link>https://www.markhneedham.com/blog/2015/04/19/r-non-numeric-argument-to-binary-operator/</link>
      <pubDate>Sun, 19 Apr 2015 23:08:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/19/r-non-numeric-argument-to-binary-operator/</guid>
      <description>When debugging R code, given my Java background, I often find myself trying to print out the state of variables along with an appropriate piece of text like this:
names = c(1,2,3,4,5,6) &amp;gt; print(&amp;#34;names: &amp;#34; + names) Error in &amp;#34;names: &amp;#34; + names : non-numeric argument to binary operator We might try this next:
&amp;gt; print(&amp;#34;names: &amp;#34;, names) [1] &amp;#34;names: &amp;#34; which doesn’t actually print the names variable - only the first argument to the print function is printed.</description>
    </item>
    
    <item>
      <title>R: Removing for loops</title>
      <link>https://www.markhneedham.com/blog/2015/04/18/r-removing-for-loops/</link>
      <pubDate>Sat, 18 Apr 2015 23:53:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/18/r-removing-for-loops/</guid>
      <description>In my last blog post I showed the translation of a likelihood function from Think Bayes into R and in my first attempt at this function I used a couple of nested for loops.
likelihoods = function(names, mixes, observations) { scores = rep(1, length(names)) names(scores) = names for(name in names) { for(observation in observations) { scores[name] = scores[name] * mixes[[name]][observation] } } return(scores) } Names = c(&amp;#34;Bowl 1&amp;#34;, &amp;#34;Bowl 2&amp;#34;) bowl1Mix = c(0.</description>
    </item>
    
    <item>
      <title>R: Think Bayes - More posterior probability calculations</title>
      <link>https://www.markhneedham.com/blog/2015/04/16/r-think-bayes-more-posterior-probability-calculations/</link>
      <pubDate>Thu, 16 Apr 2015 20:57:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/16/r-think-bayes-more-posterior-probability-calculations/</guid>
      <description>As I mentioned in a post last week I’ve been reading through Think Bayes and translating some of the examples form Python to R.
After my first post Antonios suggested a more idiomatic way of writing the function in R so I thought I’d give it a try to calculate the probability that combinations of cookies had come from each bowl.
In the simplest case we have this function which takes in the names of the bowls and the likelihood scores:</description>
    </item>
    
    <item>
      <title>Spark: Generating CSV files to import into Neo4j</title>
      <link>https://www.markhneedham.com/blog/2015/04/14/spark-generating-csv-files-to-import-into-neo4j/</link>
      <pubDate>Tue, 14 Apr 2015 22:56:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/14/spark-generating-csv-files-to-import-into-neo4j/</guid>
      <description>About a year ago Ian pointed me at a Chicago Crime data set which seemed like a good fit for Neo4j and after much procrastination I’ve finally got around to importing it.
The data set covers crimes committed from 2001 until now. It contains around 4 million crimes and meta data about those crimes, such as the location, type of crime and year, to name a few.
The contents of the file follow this structure:</description>
    </item>
    
    <item>
      <title>R: Creating an object with functions to calculate conditional probability</title>
      <link>https://www.markhneedham.com/blog/2015/04/12/r-creating-an-object-with-functions-to-calculate-conditional-probability/</link>
      <pubDate>Sun, 12 Apr 2015 07:55:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/12/r-creating-an-object-with-functions-to-calculate-conditional-probability/</guid>
      <description>I’ve been working through Alan Downey’s Thinking Bayes and I thought it’d be an interesting exercise to translate some of the code from Python to R.
The first example is a simple one about conditional probability and the author creates a class &amp;#39;PMF&amp;#39; (Probability Mass Function) to solve the following problem:
Suppose there are two bowls of cookies. Bowl 1 contains 30 vanilla cookies and 10 chocolate cookies. Bowl 2 contains 20 of each.</description>
    </item>
    
    <item>
      <title>R: Snakes and ladders markov chain</title>
      <link>https://www.markhneedham.com/blog/2015/04/09/r-snakes-and-ladders-markov-chain/</link>
      <pubDate>Thu, 09 Apr 2015 22:02:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/09/r-snakes-and-ladders-markov-chain/</guid>
      <description>A few days ago I read a really cool blog post explaining how Markov chains can be used to model the possible state transitions in a game of snakes and ladders, a use of Markov chains I hadn’t even thought of!
While the example is very helpful for understanding the concept, my understanding of the code is that it works off the assumption that any roll of the dice that puts you on a score &amp;gt; 100 is a winning roll.</description>
    </item>
    
    <item>
      <title>Neo4j: The learning to cycle dependency graph</title>
      <link>https://www.markhneedham.com/blog/2015/04/07/neo4j-the-learning-to-cycle-dependency-graph/</link>
      <pubDate>Tue, 07 Apr 2015 20:59:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/07/neo4j-the-learning-to-cycle-dependency-graph/</guid>
      <description>Over the past couple of weeks I’ve been reading about skill building and the break down of skills into more manageable chunks, and recently had a chance to break down the skills required to learn to cycle.
I initially sketched out the skill progression but quickly realised I had drawn a dependency graph and thought that putting it into Neo4j would simplify things.
I started out with the overall goal for cycling which was to &amp;#39;Be able to cycle through a public park&amp;#39;:</description>
    </item>
    
    <item>
      <title>R: Markov Chain Wikipedia Example</title>
      <link>https://www.markhneedham.com/blog/2015/04/05/r-markov-chain-wikipedia-example/</link>
      <pubDate>Sun, 05 Apr 2015 10:07:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/05/r-markov-chain-wikipedia-example/</guid>
      <description>Over the weekend I’ve been reading about Markov Chains and I thought it’d be an interesting exercise for me to translate Wikipedia’s example into R code.
But first a definition:
A Markov chain is a random process that undergoes transitions from one state to another on a state space. It is required to possess a property that is usually characterized as &amp;#34;memoryless&amp;#34;: the probability distribution of the next state depends only on the current state and not on the sequence of events that preceded it.</description>
    </item>
    
    <item>
      <title>How I met your mother: Story arcs</title>
      <link>https://www.markhneedham.com/blog/2015/04/03/how-i-met-your-mother-story-arcs/</link>
      <pubDate>Fri, 03 Apr 2015 23:31:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/03/how-i-met-your-mother-story-arcs/</guid>
      <description>After weeks of playing around with various algorithms to extract story arcs in How I met your mother I’ve come to the conclusion that I don’t yet have the skills to completely automate this process so I’m going to change my approach.
The new plan is to treat the outputs of the algorithms as suggestions for possible themes but then have a manual step where I extract what I think are interesting themes in the series.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Building the query for a movie&#39;s profile page</title>
      <link>https://www.markhneedham.com/blog/2015/04/01/neo4j-cypher-building-the-query-for-a-movies-profile-page/</link>
      <pubDate>Wed, 01 Apr 2015 11:54:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/04/01/neo4j-cypher-building-the-query-for-a-movies-profile-page/</guid>
      <description>Yesterday I spent the day in Berlin delivering a workshop as part of the Data Science Retreat and one of the exercises we did was write a query that would pull back all the information you’d need to create the IMDB page for a movie.
Scanning the page we can see that we need to get some basic meta data including the title. Next we’ll need to pull in the actors, directors, producers and finally a recommendation for some other movies the viewer might like to see.</description>
    </item>
    
    <item>
      <title>Python: Creating a skewed random discrete distribution</title>
      <link>https://www.markhneedham.com/blog/2015/03/30/python-creating-a-skewed-random-discrete-distribution/</link>
      <pubDate>Mon, 30 Mar 2015 22:28:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/30/python-creating-a-skewed-random-discrete-distribution/</guid>
      <description>I’m planning to write a variant of the TF/IDF algorithm over the HIMYM corpus which weights in favour of term that appear in a medium number of documents and as a prerequisite needed a function that when given a number of documents would return a weighting.
It should return a higher value when a term appears in a medium number of documents i.e. if I pass in 10 I should get back a higher value than 200 as a term that appears in 10 episodes is likely to be more interesting than one which appears in almost every episode.</description>
    </item>
    
    <item>
      <title>InetAddressImpl#lookupAllHostAddr slow/hangs</title>
      <link>https://www.markhneedham.com/blog/2015/03/29/inetaddressimpllookupallhostaddr-slowhangs/</link>
      <pubDate>Sun, 29 Mar 2015 00:31:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/29/inetaddressimpllookupallhostaddr-slowhangs/</guid>
      <description>Since I upgraded to Yosemite I’ve noticed that attempts to resolve localhost on my home network have been taking ages (sometimes over a minute) so I thought I’d try and work out why.
This is what my initial /etc/hosts file looked like based on the assumption that my machine’s hostname was teetotal:
$ cat /etc/hosts ## # Host Database # # localhost is used to configure the loopback interface # when the system is booting.</description>
    </item>
    
    <item>
      <title>Neo4j: Generating real time recommendations with Cypher</title>
      <link>https://www.markhneedham.com/blog/2015/03/27/neo4j-generating-real-time-recommendations-with-cypher/</link>
      <pubDate>Fri, 27 Mar 2015 06:59:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/27/neo4j-generating-real-time-recommendations-with-cypher/</guid>
      <description>One of the most common uses of Neo4j is for building real time recommendation engines and a common theme is that they make use of lots of different bits of data to come up with an interesting recommendation.
For example in this video Amanda shows how dating websites build real time recommendation engines by starting with social connections and then introducing passions, location and a few other things.
Graph Aware have a neat framework that helps you to build your own recommendation engine using Java and I was curious what a Cypher version would look like.</description>
    </item>
    
    <item>
      <title>Python: matplotlib hangs and shows nothing (Mac OS X)</title>
      <link>https://www.markhneedham.com/blog/2015/03/26/python-matplotlib-hangs-and-shows-nothing-mac-os-x/</link>
      <pubDate>Thu, 26 Mar 2015 00:02:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/26/python-matplotlib-hangs-and-shows-nothing-mac-os-x/</guid>
      <description>I’ve been playing around with some of the matplotlib demos recently and discovered that simply copying one of the examples didn’t actually work for me.
I was following the bar chart example and had the following code:
import numpy as np import matplotlib.pyplot as plt N = 5 ind = np.arange(N) fig, ax = plt.subplots() menMeans = (20, 35, 30, 35, 27) menStd = (2, 3, 4, 1, 2) width = 0.</description>
    </item>
    
    <item>
      <title>Topic Modelling: Working out the optimal number of topics</title>
      <link>https://www.markhneedham.com/blog/2015/03/24/topic-modelling-working-out-the-optimal-number-of-topics/</link>
      <pubDate>Tue, 24 Mar 2015 22:33:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/24/topic-modelling-working-out-the-optimal-number-of-topics/</guid>
      <description>In my continued exploration of topic modelling I came across The Programming Historian blog and a post showing how to derive topics from a corpus using the Java library mallet.
The instructions on the blog make it very easy to get up and running but as with other libraries I’ve used, you have to specify how many topics the corpus consists of. I’m never sure what value to select but the authors make the following suggestion:</description>
    </item>
    
    <item>
      <title>Python: Equivalent to flatMap for flattening an array of arrays</title>
      <link>https://www.markhneedham.com/blog/2015/03/23/python-equivalent-to-flatmap-for-flattening-an-array-of-arrays/</link>
      <pubDate>Mon, 23 Mar 2015 00:45:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/23/python-equivalent-to-flatmap-for-flattening-an-array-of-arrays/</guid>
      <description>I found myself wanting to flatten an array of arrays while writing some Python code earlier this afternoon and being lazy my first attempt involved building the flattened array manually:
episodes = [ {&amp;#34;id&amp;#34;: 1, &amp;#34;topics&amp;#34;: [1,2,3]}, {&amp;#34;id&amp;#34;: 2, &amp;#34;topics&amp;#34;: [4,5,6]} ] flattened_episodes = [] for episode in episodes: for topic in episode[&amp;#34;topics&amp;#34;]: flattened_episodes.append({&amp;#34;id&amp;#34;: episode[&amp;#34;id&amp;#34;], &amp;#34;topic&amp;#34;: topic}) for episode in flattened_episodes: print episode If we run that we’ll see this output:</description>
    </item>
    
    <item>
      <title>Python: Simplifying the creation of a stop word list with defaultdict</title>
      <link>https://www.markhneedham.com/blog/2015/03/22/python-simplifying-the-creation-of-a-stop-word-list-with-defaultdict/</link>
      <pubDate>Sun, 22 Mar 2015 01:51:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/22/python-simplifying-the-creation-of-a-stop-word-list-with-defaultdict/</guid>
      <description>I’ve been playing around with topics models again and recently read a paper by David Mimno which suggested the following heuristic for working out which words should go onto the stop list:
A good heuristic for identifying such words is to remove those that occur in more than 5-10% of documents (most common) and those that occur fewer than 5-10 times in the entire corpus (least common).
I decided to try this out on the HIMYM dataset that I’ve been working on over the last couple of months.</description>
    </item>
    
    <item>
      <title>Python: Forgetting to use enumerate</title>
      <link>https://www.markhneedham.com/blog/2015/03/22/python-forgetting-to-use-enumerate/</link>
      <pubDate>Sun, 22 Mar 2015 01:28:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/22/python-forgetting-to-use-enumerate/</guid>
      <description>Earlier this evening I found myself writing the equivalent of the following Python code while building a stop list for a topic model...
words = [&amp;#34;mark&amp;#34;, &amp;#34;neo4j&amp;#34;, &amp;#34;michael&amp;#34;] word_position = 0 for word in words: print word_position, word word_position +=1 ...which is very foolish given that there’s already a function that makes it really easy to grab the position of an item in a list:
for word_position, word in enumerate(words): print word_position, word Python does make things extremely easy at times - you’re welcome future Mark!</description>
    </item>
    
    <item>
      <title>Badass: Making users awesome - Kathy Sierra: Book Review</title>
      <link>https://www.markhneedham.com/blog/2015/03/20/badass-making-users-awesome-kathy-sierra-book-review/</link>
      <pubDate>Fri, 20 Mar 2015 07:30:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/20/badass-making-users-awesome-kathy-sierra-book-review/</guid>
      <description>I started reading Kathy Sierra’s new book &amp;#39;Badass: Making users awesome&amp;#39; a couple of weeks ago and with the gift of flights to/from Stockholm this week I’ve got through the rest of it.
I really enjoyed the book and have found myself returning to it almost every day to check up exactly what was said on a particular topic.
There were a few things that I’ve taken away and have been going on about to anyone who will listen.</description>
    </item>
    
    <item>
      <title>Neo4j: Detecting potential typos using EXPLAIN</title>
      <link>https://www.markhneedham.com/blog/2015/03/17/neo4j-detecting-potential-typos-using-explain/</link>
      <pubDate>Tue, 17 Mar 2015 22:46:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/17/neo4j-detecting-potential-typos-using-explain/</guid>
      <description>I’ve been running a few intro to Neo4j training sessions recently using Neo4j 2.2.0 RC1 and at some stage in every session somebody will make a typo when writing out of the example queries.
For example one of the queries that we do about half way finds the actors and directors who have worked together and aggregates the movies they were in.
This is the correct query:
MATCH (actor:Person)-[:ACTED_IN]-&amp;gt;(movie)&amp;lt;-[:DIRECTED]-(director) RETURN actor.</description>
    </item>
    
    <item>
      <title>One month of mini habits</title>
      <link>https://www.markhneedham.com/blog/2015/03/17/one-month-of-mini-habits/</link>
      <pubDate>Tue, 17 Mar 2015 01:32:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/17/one-month-of-mini-habits/</guid>
      <description>I recently read a book in the &amp;#39;getting things done&amp;#39; genre written by Stephen Guise titled &amp;#39;Mini Habits&amp;#39; and although I generally don’t like those types of books I quite enjoyed this one and decided to give his system a try.
The underlying idea is that there are two parts of actually doing stuff:
Planning what to do
Doing it
We often get stuck in between the first and second steps because what we’ve planned to do is too big and overwhelming.</description>
    </item>
    
    <item>
      <title>Python: Transforming Twitter datetime string to timestamp (&#39;z&#39; is a bad directive in format)</title>
      <link>https://www.markhneedham.com/blog/2015/03/15/python-transforming-twitter-datetime-string-to-timestamp-z-is-a-bad-directive-in-format/</link>
      <pubDate>Sun, 15 Mar 2015 22:43:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/15/python-transforming-twitter-datetime-string-to-timestamp-z-is-a-bad-directive-in-format/</guid>
      <description>I’ve been playing around with importing Twitter data into Neo4j, and since Neo4j can’t store dates natively just yet I needed to convert a date string to a timestamp.
I started with the following which unfortunately throws an exception:
from datetime import datetime date = &amp;#34;Sat Mar 14 18:43:19 +0000 2015&amp;#34; &amp;gt;&amp;gt;&amp;gt; datetime.strptime(date, &amp;#34;%a %b %d %H:%M:%S %z %Y&amp;#34;) Traceback (most recent call last): File &amp;#34;&amp;lt;stdin&amp;gt;&amp;#34;, line 1, in &amp;lt;module&amp;gt; File &amp;#34;/System/Library/Frameworks/Python.</description>
    </item>
    
    <item>
      <title>Python: Checking any value in a list exists in a line of text</title>
      <link>https://www.markhneedham.com/blog/2015/03/14/python-checking-any-value-in-a-list-exists-in-a-line-of-text/</link>
      <pubDate>Sat, 14 Mar 2015 02:52:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/14/python-checking-any-value-in-a-list-exists-in-a-line-of-text/</guid>
      <description>I’ve been doing some log file analysis to see what Cypher queries were being run on a Neo4j instance, and I wanted to narrow down the lines I looked at to only contain ones which had mutating operations, i.e. those containing the words MERGE, DELETE, SET, or CREATE.
Here’s an example of the text file I was parsing:
$ cat blog.txt MATCH n RETURN n MERGE (n:Person {name: &amp;#34;Mark&amp;#34;}) RETURN n MATCH (n:Person {name: &amp;#34;Mark&amp;#34;}) ON MATCH SET n.</description>
    </item>
    
    <item>
      <title>Python/Neo4j: Finding interesting computer sciency people to follow on Twitter</title>
      <link>https://www.markhneedham.com/blog/2015/03/11/pythonneo4j-finding-interesting-computer-sciency-people-to-follow-on-twitter/</link>
      <pubDate>Wed, 11 Mar 2015 21:13:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/11/pythonneo4j-finding-interesting-computer-sciency-people-to-follow-on-twitter/</guid>
      <description>At the beginning of this year I moved from Neo4j’s field team to the dev team, and since the code we write there is much lower level than I’m used to, I thought I should find some people to follow on Twitter whom I can learn from.
My technique for finding some of those people was to pick a person from the Neo4j kernel team who’s very good at systems programming and uses Twitter, which led me to Mr Chris Vest.</description>
    </item>
    
    <item>
      <title>Python: Streaming/Appending to a file</title>
      <link>https://www.markhneedham.com/blog/2015/03/09/python-streamingappending-to-a-file/</link>
      <pubDate>Mon, 09 Mar 2015 23:00:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/09/python-streamingappending-to-a-file/</guid>
      <description>I’ve been playing around with Twitter’s API (via the tweepy library) and due to the rate limiting it imposes I wanted to stream results to a CSV file rather than waiting until my whole program had finished.
I wrote the following program to simulate what I was trying to do:
import csv import time with open(&amp;#34;rows.csv&amp;#34;, &amp;#34;a&amp;#34;) as file: writer = csv.writer(file, delimiter = &amp;#34;,&amp;#34;) end = time.time() + 10 while True: if time.</description>
    </item>
    
    <item>
      <title>Neo4j: TF/IDF (and variants) with cypher</title>
      <link>https://www.markhneedham.com/blog/2015/03/08/neo4j-tfidf-and-variants-with-cypher/</link>
      <pubDate>Sun, 08 Mar 2015 13:24:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/08/neo4j-tfidf-and-variants-with-cypher/</guid>
      <description>A few weeks ago I wrote a blog post on running TF/IDF over HIMYM transcripts using scikit-learn to find the most important phrases by episode and afterwards I was curious how difficult it’d be to do in Neo4j.
I started by translating one of wikipedia’s TF/IDF examples to cypher to see what the algorithm would look like:
WITH 3 AS termFrequency, 2 AS numberOfDocuments, 1 AS numberOfDocumentsWithTerm WITH termFrequency, log10(numberOfDocuments / numberOfDocumentsWithTerm) AS inverseDocumentFrequency RETURN termFrequency * inverseDocumentFrequency 0.</description>
    </item>
    
    <item>
      <title>Python: scikit-learn/lda: Extracting topics from QCon talk abstracts</title>
      <link>https://www.markhneedham.com/blog/2015/03/05/python-scikit-learnlda-extracting-topics-from-qcon-talk-abstracts/</link>
      <pubDate>Thu, 05 Mar 2015 08:52:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/05/python-scikit-learnlda-extracting-topics-from-qcon-talk-abstracts/</guid>
      <description>Following on from Rik van Bruggen’s blog post on a QCon graph he’s created ahead of this week’s conference, I was curious whether we could extract any interesting relationships between talks based on their abstracts.
Talks are already grouped by their hosting track but there’s likely to be some overlap in topics even for talks on different tracks. I therefore wanted to extract topics and connect each talk to the topic that describes it best.</description>
    </item>
    
    <item>
      <title>Python: scikit-learn - Training a classifier with non numeric features</title>
      <link>https://www.markhneedham.com/blog/2015/03/02/python-scikit-learn-training-a-classifier-with-non-numeric-features/</link>
      <pubDate>Mon, 02 Mar 2015 07:48:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/02/python-scikit-learn-training-a-classifier-with-non-numeric-features/</guid>
      <description>Following on from my previous posts on training a classifier to pick out the speaker in sentences of HIMYM transcripts the next thing to do was train a random forest of decision trees to see how that fared.
I’ve used scikit-learn for this before so I decided to use that. However, before building a random forest I wanted to check that I could build an equivalent decision tree.
I initially thought that scikit-learn’s DecisionTree classifier would take in data in the same format as nltk’s so I started out with the following code:</description>
    </item>
    
    <item>
      <title>Python: Detecting the speaker in HIMYM using Parts of Speech (POS) tagging</title>
      <link>https://www.markhneedham.com/blog/2015/03/01/python-detecting-the-speaker-in-himym-using-parts-of-speech-pos-tagging/</link>
      <pubDate>Sun, 01 Mar 2015 02:36:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/03/01/python-detecting-the-speaker-in-himym-using-parts-of-speech-pos-tagging/</guid>
      <description>Over the last couple of weeks I’ve been experimenting with different classifiers to detect speakers in HIMYM transcripts and in all my attempts so far the only features I’ve used have been words.
This led to classifiers that were overfitted to the training data so I wanted to generalise them by introducing parts of speech of the words in sentences which are more generic.
First I changed the function which generates the features for each word to also contain the parts of speech of the previous and next words as well as the word itself:</description>
    </item>
    
    <item>
      <title>R/ggplot: Controlling X axis order</title>
      <link>https://www.markhneedham.com/blog/2015/02/27/rggplot-controlling-x-axis-order/</link>
      <pubDate>Fri, 27 Feb 2015 00:49:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/27/rggplot-controlling-x-axis-order/</guid>
      <description>As part of a talk I gave at the Neo4j London meetup earlier this week I wanted to show how you could build a simple chart showing the number of friends that different actors had using the ggplot library.
I started out with the following code:
df = read.csv(&amp;#34;/tmp/friends.csv&amp;#34;) top = df %&amp;gt;% head(20) ggplot(aes(x = p.name, y = colleagues), data = top) + geom_bar(fill = &amp;#34;dark blue&amp;#34;, stat = &amp;#34;identity&amp;#34;) The friends CSV file is available as a gist if you want to reproduce the chart.</description>
    </item>
    
    <item>
      <title>R: Conditionally updating rows of a data frame</title>
      <link>https://www.markhneedham.com/blog/2015/02/26/r-conditionally-updating-rows-of-a-data-frame/</link>
      <pubDate>Thu, 26 Feb 2015 00:45:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/26/r-conditionally-updating-rows-of-a-data-frame/</guid>
      <description>In a blog post I wrote a couple of days ago about cohort analysis I had to assign a monthNumber to each row in a data frame and started out with the following code:
library(zoo) library(dplyr) monthNumber = function(cohort, date) { cohortAsDate = as.yearmon(cohort) dateAsDate = as.yearmon(date) if(cohortAsDate &amp;gt; dateAsDate) { &amp;#34;NA&amp;#34; } else { paste(round((dateAsDate - cohortAsDate) * 12), sep=&amp;#34;&amp;#34;) } } cohortAttendance %&amp;gt;% group_by(row_number()) %&amp;gt;% mutate(monthNumber = monthNumber(cohort, date)) %&amp;gt;% filter(monthNumber !</description>
    </item>
    
    <item>
      <title>Python/nltk: Naive vs Naive Bayes vs Decision Tree</title>
      <link>https://www.markhneedham.com/blog/2015/02/24/pythonnltk-naive-vs-naive-bayes-vs-decision-tree/</link>
      <pubDate>Tue, 24 Feb 2015 22:39:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/24/pythonnltk-naive-vs-naive-bayes-vs-decision-tree/</guid>
      <description>Last week I wrote a blog post describing a decision tree I’d trained to detect the speakers in a How I met your mother transcript and after writing the post I wondered whether a simple classifier would do the job.
The simple classifier will work on the assumption that any word followed by a &amp;#34;:&amp;#34; is a speaker and anything else isn’t. Here’s the definition of a NaiveClassifier:
import nltk from nltk import ClassifierI class NaiveClassifier(ClassifierI): def classify(self, featureset): if featureset[&amp;#39;next-word&amp;#39;] == &amp;#34;:&amp;#34;: return True else: return False As you can see it only implements the classify method and executes a static check.</description>
    </item>
    
    <item>
      <title>R: Cohort analysis of Neo4j meetup members</title>
      <link>https://www.markhneedham.com/blog/2015/02/24/r-cohort-analysis-of-neo4j-meetup-members/</link>
      <pubDate>Tue, 24 Feb 2015 01:19:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/24/r-cohort-analysis-of-neo4j-meetup-members/</guid>
      <description>A few weeks ago I came across a blog post explaining how to apply cohort analysis to customer retention using R and I thought it’d be a fun exercise to calculate something similar for meetup attendees.
In the customer retention example we track customer purchases on a month by month basis and each customer is put into a cohort or bucket based on the first month they made a purchase in.</description>
    </item>
    
    <item>
      <title>R/dplyr: Extracting data frame column value for filtering with %in%</title>
      <link>https://www.markhneedham.com/blog/2015/02/22/rdplyr-extracting-data-frame-column-value-for-filtering-with-in/</link>
      <pubDate>Sun, 22 Feb 2015 08:58:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/22/rdplyr-extracting-data-frame-column-value-for-filtering-with-in/</guid>
      <description>I’ve been playing around with dplyr over the weekend and wanted to extract the values from a data frame column to use in a later filtering step.
I had a data frame:
library(dplyr) df = data.frame(userId = c(1,2,3,4,5), score = c(2,3,4,5,5)) And wanted to extract the userIds of those people who have a score greater than 3. I started with:
highScoringPeople = df %&amp;gt;% filter(score &amp;gt; 3) %&amp;gt;% select(userId) &amp;gt; highScoringPeople userId 1 3 2 4 3 5 And then filtered the data frame expecting to get back those 3 people:</description>
    </item>
    
    <item>
      <title>Python/scikit-learn: Detecting which sentences in a transcript contain a speaker</title>
      <link>https://www.markhneedham.com/blog/2015/02/20/pythonscikit-learn-detecting-which-sentences-in-a-transcript-contain-a-speaker/</link>
      <pubDate>Fri, 20 Feb 2015 22:42:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/20/pythonscikit-learn-detecting-which-sentences-in-a-transcript-contain-a-speaker/</guid>
      <description>Over the past couple of months I’ve been playing around with How I met your mother transcripts and the most recent thing I’ve been working on is how to extract the speaker for a particular sentence.
This initially seemed like a really simple problem as most of the initial sentences I looked at were structured like this:
&amp;lt;speaker&amp;gt;: &amp;lt;sentence&amp;gt; If they were all in that format then we could write a simple regular expression and then move on, but unfortunately they aren’t.</description>
    </item>
    
    <item>
      <title>Python&#39;s pandas vs Neo4j&#39;s cypher: Exploring popular phrases in How I met your mother transcripts</title>
      <link>https://www.markhneedham.com/blog/2015/02/19/pythons-pandas-vs-neo4js-cypher-exploring-popular-phrases-in-how-i-met-your-mother-transcripts/</link>
      <pubDate>Thu, 19 Feb 2015 00:52:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/19/pythons-pandas-vs-neo4js-cypher-exploring-popular-phrases-in-how-i-met-your-mother-transcripts/</guid>
      <description>I’ve previously written about extracting TF/IDF scores for phrases in documents using scikit-learn and the final step in that post involved writing the words into a CSV file for analysis later on.
I wasn’t sure what the most appropriate tool for that analysis was, so I decided to explore the data using Python’s pandas library, load it into Neo4j, and write some Cypher queries.
To do anything with Neo4j we need to first load the CSV file into the database.</description>
    </item>
    
    <item>
      <title>Python/pandas: Column value in list (ValueError: The truth value of a Series is ambiguous.)</title>
      <link>https://www.markhneedham.com/blog/2015/02/16/pythonpandas-column-value-in-list-valueerror-the-truth-value-of-a-series-is-ambiguous/</link>
      <pubDate>Mon, 16 Feb 2015 21:39:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/16/pythonpandas-column-value-in-list-valueerror-the-truth-value-of-a-series-is-ambiguous/</guid>
      <description>I’ve been using Python’s pandas library while exploring some CSV files and although for the most part I’ve found it intuitive to use, I had trouble filtering a data frame based on checking whether a column value was in a list.
A subset of one of the CSV files I’ve been working with looks like this:
$ cat foo.csv &amp;#34;Foo&amp;#34; 1 2 3 4 5 6 7 8 9 10 Loading it into a pandas data frame is reasonably simple:</description>
    </item>
    
    <item>
      <title>Python/scikit-learn: Calculating TF/IDF on How I met your mother transcripts</title>
      <link>https://www.markhneedham.com/blog/2015/02/15/pythonscikit-learn-calculating-tfidf-on-how-i-met-your-mother-transcripts/</link>
      <pubDate>Sun, 15 Feb 2015 15:56:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/15/pythonscikit-learn-calculating-tfidf-on-how-i-met-your-mother-transcripts/</guid>
      <description>Over the past few weeks I’ve been playing around with various NLP techniques to find interesting insights into How I met your mother from its transcripts and one technique that kept coming up is TF/IDF.
The Wikipedia definition reads like this:
tf—​idf, short for term frequency—​inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus.</description>
    </item>
    
    <item>
      <title>Neo4j: Building a topic graph with Prismatic Interest Graph API</title>
      <link>https://www.markhneedham.com/blog/2015/02/13/neo4j-building-a-topic-graph-with-prismatic-interest-graph-api/</link>
      <pubDate>Fri, 13 Feb 2015 23:38:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/13/neo4j-building-a-topic-graph-with-prismatic-interest-graph-api/</guid>
      <description>Over the last few weeks I’ve been using various NLP libraries to derive topics for my corpus of How I met your mother episodes without success, and was therefore enthused to see the release of Prismatic’s Interest Graph API.
The Interest Graph API exposes a web service to which you feed a block of text and get back a set of topics and associated scores.
It has been trained over the last few years with millions of articles that people share on their social media accounts and in my experience using Prismatic the topics have been very useful for finding new material to read.</description>
    </item>
    
    <item>
      <title>Python/gensim: Creating bigrams over How I met your mother transcripts</title>
      <link>https://www.markhneedham.com/blog/2015/02/12/pythongensim-creating-bigrams-over-how-i-met-your-mother-transcripts/</link>
      <pubDate>Thu, 12 Feb 2015 23:45:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/12/pythongensim-creating-bigrams-over-how-i-met-your-mother-transcripts/</guid>
      <description>As part of my continued playing around with How I met your mother transcripts I wanted to identify plot arcs and as a first step I wrote some code using the gensim and nltk libraries to identify bigrams (two word phrases).
There’s an easy to follow tutorial in the gensim docs showing how to go about this but I needed to do a couple of extra steps to get my text data from a CSV file into the structure gensim expects.</description>
    </item>
    
    <item>
      <title>R: Weather vs attendance at NoSQL meetups</title>
      <link>https://www.markhneedham.com/blog/2015/02/11/r-weather-vs-attendance-at-nosql-meetups/</link>
      <pubDate>Wed, 11 Feb 2015 07:09:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/02/11/r-weather-vs-attendance-at-nosql-meetups/</guid>
      <description>A few weeks ago I came across a tweet by Sean Taylor asking for a weather data set with a few years worth of recording and I was surprised to learn that R already has such a thing - the weatherData package.
Winner is: @UTVilla! library(weatherData); df &amp;lt;- getWeatherForYear(&amp;#34;SFO&amp;#34;, 2013); ggplot(df, aes(x = Date, y = Mean_TemperatureF)) + geom_line()
— Sean J. Taylor (@seanjtaylor) January 22, 2015 weatherData provides a thin veneer around the wunderground API and was exactly what I’d been looking for to compare attendance at London’s NoSQL meetups against the weather conditions that day.</description>
    </item>
    
    <item>
      <title>Python/matplotlib: Plotting occurrences of the main characters in How I Met Your Mother</title>
      <link>https://www.markhneedham.com/blog/2015/01/30/pythonmatpotlib-plotting-occurrences-of-the-main-characters-in-how-i-met-your-mother/</link>
      <pubDate>Fri, 30 Jan 2015 21:29:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/30/pythonmatpotlib-plotting-occurrences-of-the-main-characters-in-how-i-met-your-mother/</guid>
      <description>Normally when I’m playing around with data sets in R I get out ggplot2 to plot some charts to get a feel for the data, but having spent quite a bit of time with Python and How I met your mother transcripts I haven’t created a single plot. I thought I’d better change that.
After a bit of searching around it seems that matplotlib is the go to library for this job and I thought an interesting thing to plot would be how often each of the main characters appear in each episode across the show.</description>
    </item>
    
    <item>
      <title>R: ggplot2 - Each group consist of only one observation. Do you need to adjust the group aesthetic?</title>
      <link>https://www.markhneedham.com/blog/2015/01/30/r-ggplot2-each-group-consist-of-only-one-observation-do-you-need-to-adjust-the-group-aesthetic/</link>
      <pubDate>Fri, 30 Jan 2015 00:27:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/30/r-ggplot2-each-group-consist-of-only-one-observation-do-you-need-to-adjust-the-group-aesthetic/</guid>
      <description>I’ve been playing around with some weather data over the last couple of days which I aggregated down to the average temperature per month over the last 4 years and stored in a CSV file.
This is what the file looks like:
$ cat /tmp/averageTemperatureByMonth.csv &amp;#34;month&amp;#34;,&amp;#34;aveTemperature&amp;#34; &amp;#34;January&amp;#34;,6.02684563758389 &amp;#34;February&amp;#34;,5.89380530973451 &amp;#34;March&amp;#34;,7.54838709677419 &amp;#34;April&amp;#34;,10.875 &amp;#34;May&amp;#34;,13.3064516129032 &amp;#34;June&amp;#34;,15.9666666666667 &amp;#34;July&amp;#34;,18.8387096774194 &amp;#34;August&amp;#34;,18.3709677419355 &amp;#34;September&amp;#34;,16.2583333333333 &amp;#34;October&amp;#34;,13.4596774193548 &amp;#34;November&amp;#34;,9.19166666666667 &amp;#34;December&amp;#34;,7.01612903225806 I wanted to create a simple line chart which would show the months of the year in ascending order with the appropriate temperature.</description>
    </item>
    
    <item>
      <title>Python: Find the highest value in a group</title>
      <link>https://www.markhneedham.com/blog/2015/01/25/python-find-the-highest-value-in-a-group/</link>
      <pubDate>Sun, 25 Jan 2015 12:47:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/25/python-find-the-highest-value-in-a-group/</guid>
      <description>In my continued playing around with a How I met your mother data set I needed to find out the last episode that happened in a season so that I could use it in a chart I wanted to plot.
I had this CSV file containing each of the episodes:
$ head -n 10 data/import/episodes.csv NumberOverall,NumberInSeason,Episode,Season,DateAired,Timestamp 1,1,/wiki/Pilot,1,&amp;#34;September 19, 2005&amp;#34;,1127084400 2,2,/wiki/Purple_Giraffe,1,&amp;#34;September 26, 2005&amp;#34;,1127689200 3,3,/wiki/Sweet_Taste_of_Liberty,1,&amp;#34;October 3, 2005&amp;#34;,1128294000 4,4,/wiki/Return_of_the_Shirt,1,&amp;#34;October 10, 2005&amp;#34;,1128898800 5,5,/wiki/Okay_Awesome,1,&amp;#34;October 17, 2005&amp;#34;,1129503600 6,6,/wiki/Slutty_Pumpkin,1,&amp;#34;October 24, 2005&amp;#34;,1130108400 7,7,/wiki/Matchmaker,1,&amp;#34;November 7, 2005&amp;#34;,1131321600 8,8,/wiki/The_Duel,1,&amp;#34;November 14, 2005&amp;#34;,1131926400 9,9,/wiki/Belly_Full_of_Turkey,1,&amp;#34;November 21, 2005&amp;#34;,1132531200 I started out by parsing the CSV file into a dictionary of (seasons -&amp;gt; episode ids):</description>
    </item>
    
    <item>
      <title>Python/pdfquery: Scraping the FIFA World Player of the Year votes PDF into shape</title>
      <link>https://www.markhneedham.com/blog/2015/01/22/pythonpdfquery-scraping-the-fifa-world-player-of-the-year-votes-pdf-into-shape/</link>
      <pubDate>Thu, 22 Jan 2015 00:25:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/22/pythonpdfquery-scraping-the-fifa-world-player-of-the-year-votes-pdf-into-shape/</guid>
      <description>Last week the FIFA Ballon d’Or 2014 was announced and along with the announcement of the winner the individual votes were also made available.
Unfortunately they weren’t made open in a way that Ben Wellington (of IQuantNY fame) would approve of - the choice of format for the data is a PDF file!
I wanted to extract this data to play around with it but I wanted to automate the extraction as I’d done when working with Google Trends data.</description>
    </item>
    
    <item>
      <title>Python/NLTK: Finding the most common phrases in How I Met Your Mother</title>
      <link>https://www.markhneedham.com/blog/2015/01/19/pythonnltk-finding-the-most-common-phrases-in-how-i-met-your-mother/</link>
      <pubDate>Mon, 19 Jan 2015 00:24:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/19/pythonnltk-finding-the-most-common-phrases-in-how-i-met-your-mother/</guid>
      <description>Following on from last week’s blog post where I found the most popular words in How I met your mother transcripts, in this post we’ll have a look at how we can pull out sentences and then phrases from our corpus.
The first thing I did was tweak the scraping script to pull out the sentences spoken by characters in the transcripts.
Each dialogue is separated by two line breaks so we use that as our separator.</description>
    </item>
    
    <item>
      <title>Python: Counter - ValueError: too many values to unpack</title>
      <link>https://www.markhneedham.com/blog/2015/01/12/python-counter-valueerror-too-many-values-to-unpack/</link>
      <pubDate>Mon, 12 Jan 2015 23:16:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/12/python-counter-valueerror-too-many-values-to-unpack/</guid>
      <description>I recently came across Python’s Counter tool which makes it really easy to count the number of occurrences of items in a list.
In my case I was trying to work out how many times words occurred in a corpus so I had something like the following:
&amp;gt;&amp;gt; from collections import Counter &amp;gt;&amp;gt; counter = Counter([&amp;#34;word1&amp;#34;, &amp;#34;word2&amp;#34;, &amp;#34;word3&amp;#34;, &amp;#34;word1&amp;#34;]) &amp;gt;&amp;gt; print counter Counter({&amp;#39;word1&amp;#39;: 2, &amp;#39;word3&amp;#39;: 1, &amp;#39;word2&amp;#39;: 1}) I wanted to write a for loop to iterate over the counter and print the (key, value) pairs and started with the following:</description>
    </item>
    
    <item>
      <title>Python: scikit-learn: ImportError: cannot import name __check_build</title>
      <link>https://www.markhneedham.com/blog/2015/01/10/python-scikit-learn-importerror-cannot-import-name-__check_build/</link>
      <pubDate>Sat, 10 Jan 2015 08:48:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/10/python-scikit-learn-importerror-cannot-import-name-__check_build/</guid>
      <description>In part 3 of Kaggle’s series on text analytics I needed to install scikit-learn and having done so ran into the following error when trying to use one of its classes:
&amp;gt;&amp;gt;&amp;gt; from sklearn.feature_extraction.text import CountVectorizer Traceback (most recent call last): File &amp;#34;&amp;lt;stdin&amp;gt;&amp;#34;, line 1, in &amp;lt;module&amp;gt; File &amp;#34;/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/sklearn/__init__.py&amp;#34;, line 37, in &amp;lt;module&amp;gt; from . import __check_build ImportError: cannot import name __check_build This error doesn’t reveal very much but I found that when I exited the REPL and tried the same command again I got a different error which was a bit more useful:</description>
    </item>
    
    <item>
      <title>Python: gensim - clang: error: unknown argument: &#39;-mno-fused-madd&#39; [-Wunused-command-line-argument-hard-error-in-future]</title>
      <link>https://www.markhneedham.com/blog/2015/01/10/python-gensim-clang-error-unknown-argument-mno-fused-madd-wunused-command-line-argument-hard-error-in-future/</link>
      <pubDate>Sat, 10 Jan 2015 08:39:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/10/python-gensim-clang-error-unknown-argument-mno-fused-madd-wunused-command-line-argument-hard-error-in-future/</guid>
      <description>While working through part 2 of Kaggle’s bag of words tutorial I needed to install the gensim library and initially ran into the following error:
$ pip install gensim ... cc -fno-strict-aliasing -fno-common -dynamic -arch x86_64 -arch i386 -g -Os -pipe -fno-common -fno-strict-aliasing -fwrapv -mno-fused-madd -DENABLE_DTRACE -DMACOSX -DNDEBUG -Wall -Wstrict-prototypes -Wshorten-64-to-32 -DNDEBUG -g -fwrapv -Os -Wall -Wstrict-prototypes -DENABLE_DTRACE -arch x86_64 -arch i386 -pipe -I/Users/markneedham/projects/neo4j-himym/himym/build/gensim/gensim/models -I/System/Library/Frameworks/Python.framework/Versions/2.7/include/python2.7 -I/Users/markneedham/projects/neo4j-himym/himym/lib/python2.7/site-packages/numpy/core/include -c ./gensim/models/word2vec_inner.c -o build/temp.</description>
    </item>
    
    <item>
      <title>Python NLTK/Neo4j: Analysing the transcripts of How I Met Your Mother</title>
      <link>https://www.markhneedham.com/blog/2015/01/10/python-nltkneo4j-analysing-the-transcripts-of-how-i-met-your-mother/</link>
      <pubDate>Sat, 10 Jan 2015 01:22:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2015/01/10/python-nltkneo4j-analysing-the-transcripts-of-how-i-met-your-mother/</guid>
      <description>After reading Emil’s blog post about dark data a few weeks ago I became intrigued about trying to find some structure in free text data and I thought How I met your mother’s transcripts would be a good place to start.
I found a website which has the transcripts for all the episodes and then having manually downloaded the two pages which listed all the episodes, wrote a script to grab each of the transcripts so I could use them on my machine.</description>
    </item>
    
    <item>
      <title>R: Feature engineering for a linear model</title>
      <link>https://www.markhneedham.com/blog/2014/12/28/r-featuring-engineering-for-a-linear-model/</link>
      <pubDate>Sun, 28 Dec 2014 21:55:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/28/r-featuring-engineering-for-a-linear-model/</guid>
      <description>I previously wrote about a linear model I created to predict how many people would RSVP &amp;#39;yes&amp;#39; to a meetup event and, having not found much correlation between any of my independent variables and RSVPs, I was a bit stuck.
As luck would have it I bumped into Antonios at a meetup a month ago and he offered to take a look at what I’d tried so far and give me some tips on how to progress.</description>
    </item>
    
    <item>
      <title>Neo4j 2.1.6 - Cypher: FOREACH slowness</title>
      <link>https://www.markhneedham.com/blog/2014/12/28/neo4j-2-1-6-cypher-foreach-slowness/</link>
      <pubDate>Sun, 28 Dec 2014 04:28:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/28/neo4j-2-1-6-cypher-foreach-slowness/</guid>
      <description>A common problem that people have when using Neo4j for social network applications is updating a person with their newly imported friends.
We’ll have an array of friends that we want to connect to a single Person node. Assuming the following schema…​
$ schema Indexes ON :Person(id) ONLINE No constraints …​a simplified version would look like this:
WITH range(2,1002) AS friends MERGE (p:Person {id: 1}) FOREACH(f IN friends | MERGE (friend:Person {id: f}) MERGE (friend)-[:FRIENDS]-&amp;gt;(p)); If we execute that on an empty database we’ll see something like this:</description>
    </item>
    
    <item>
      <title>R: Vectorising all the things</title>
      <link>https://www.markhneedham.com/blog/2014/12/22/r-vectorising-all-the-things/</link>
      <pubDate>Mon, 22 Dec 2014 11:46:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/22/r-vectorising-all-the-things/</guid>
      <description>After my last post about finding the distance a date/time is from the weekend Hadley Wickham suggested I could improve the function by vectorising it...
@markhneedham vectorise with pmin(pmax(dateToLookup - before, 0), pmax(after - dateToLookup, 0)) / dhours(1)
— Hadley Wickham (@hadleywickham) December 14, 2014 …​so I thought I’d try and vectorise some of the other functions I’ve written recently and show the two versions.
I found the following articles useful for explaining vectorisation and why you might want to do it:</description>
    </item>
    
    <item>
      <title>R: Time to/from the weekend</title>
      <link>https://www.markhneedham.com/blog/2014/12/13/r-time-tofrom-the-weekend/</link>
      <pubDate>Sat, 13 Dec 2014 20:38:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/13/r-time-tofrom-the-weekend/</guid>
      <description>In my last post I showed some examples using R’s lubridate package, and another problem it made really easy to solve was working out how close a particular date/time was to the weekend.
I wanted to write a function which would return the previous Sunday or upcoming Saturday depending on which was closer.
lubridate’s floor_date and ceiling_date functions make this quite simple.
e.g. if we want to round the 18th December down to the beginning of the week and up to the beginning of the next week we could do the following:</description>
    </item>
    
    <item>
      <title>R: Numeric representation of date time</title>
      <link>https://www.markhneedham.com/blog/2014/12/13/r-numeric-representation-of-date-time/</link>
      <pubDate>Sat, 13 Dec 2014 19:58:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/13/r-numeric-representation-of-date-time/</guid>
      <description>I’ve been playing around with date times in R recently and I wanted to derive a numeric representation for a given value to make it easier to see the correlation between time and another variable.
e.g. December 13th 2014 17:30 should return 17.5 since it’s 17.5 hours since midnight.
Using the standard R libraries we would write the following code:
&amp;gt; december13 = as.POSIXlt(&amp;#34;2014-12-13 17:30:00&amp;#34;) &amp;gt; as.numeric(december13 - trunc(december13, &amp;#34;day&amp;#34;), units=&amp;#34;hours&amp;#34;) [1] 17.</description>
    </item>
    
    <item>
      <title>R: data.table/dplyr/lubridate - Error in wday(date, label = TRUE, abbr = FALSE) :  unused arguments (label = TRUE, abbr = FALSE)</title>
      <link>https://www.markhneedham.com/blog/2014/12/11/r-data-tabledplyrlubridate-error-in-wdaydate-label-true-abbr-false-unused-arguments-label-true-abbr-false/</link>
      <pubDate>Thu, 11 Dec 2014 19:03:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/11/r-data-tabledplyrlubridate-error-in-wdaydate-label-true-abbr-false-unused-arguments-label-true-abbr-false/</guid>
      <description>I spent a couple of hours playing around with data.table this evening and tried changing some code written using a data frame to use a data table instead.
I started off by building a data frame which contains all the weekends between 2010 and 2015...
&amp;gt; library(lubridate) &amp;gt; library(dplyr) &amp;gt; dates = data.frame(date = seq( dmy(&amp;#34;01-01-2010&amp;#34;), to=dmy(&amp;#34;01-01-2015&amp;#34;), by=&amp;#34;day&amp;#34; )) &amp;gt; dates = dates %&amp;gt;% filter(wday(date, label = TRUE, abbr = FALSE) %in% c(&amp;#34;Saturday&amp;#34;, &amp;#34;Sunday&amp;#34;)) .</description>
    </item>
    
    <item>
      <title>R: Cleaning up and plotting Google Trends data</title>
      <link>https://www.markhneedham.com/blog/2014/12/09/r-cleaning-up-plotting-google-trends-data/</link>
      <pubDate>Tue, 09 Dec 2014 18:14:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/09/r-cleaning-up-plotting-google-trends-data/</guid>
      <description>I recently came across an excellent article written by Stian Haklev in which he describes things he wishes he’d been told before starting out with R, one of them being to do all data clean-up in code, which I thought I’d give a try.
My goal is to leave the raw data completely unchanged, and do all the transformation in code, which can be rerun at any time. While I’m writing the scripts, I’m often jumping around, selectively executing individual lines or code blocks, running commands to inspect the data in the REPL (read-evaluate-print-loop, where each command is executed as soon as you type enter, in the picture above it’s the pane to the right), etc.</description>
    </item>
    
    <item>
      <title>R: dplyr - mutate with strptime (incompatible size/wrong result size)</title>
      <link>https://www.markhneedham.com/blog/2014/12/08/r-dplyr-mutate-with-strptime-incompatible-sizewrong-result-size/</link>
      <pubDate>Mon, 08 Dec 2014 19:02:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/08/r-dplyr-mutate-with-strptime-incompatible-sizewrong-result-size/</guid>
      <description>Having worked out how to translate a string into a date, or NA if it wasn’t in the appropriate format, the next thing I wanted to do was store the result of the transformation in my data frame.
I started off with this:
data = data.frame(x = c(&amp;#34;2014-01-01&amp;#34;, &amp;#34;2014-02-01&amp;#34;, &amp;#34;foo&amp;#34;)) &amp;gt; data x 1 2014-01-01 2 2014-02-01 3 foo And when I tried to do the date translation ran into the following error:</description>
    </item>
    
    <item>
      <title>R: String to Date or NA</title>
      <link>https://www.markhneedham.com/blog/2014/12/07/r-string-to-date-or-na/</link>
      <pubDate>Sun, 07 Dec 2014 19:29:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/07/r-string-to-date-or-na/</guid>
      <description>I’ve been cleaning up a CSV file which contains some rows with dates and some without - I only want to keep the cells which do have dates, so I’ve been trying to work out how to do that.
My first thought was that I’d try and find a function which would convert the contents of the cell into a date if it was in date format and NA if not.</description>
    </item>
    
    <item>
      <title>R: Applying a function to every row of a data frame</title>
      <link>https://www.markhneedham.com/blog/2014/12/04/r-applying-a-function-to-every-row-of-a-data-frame/</link>
      <pubDate>Thu, 04 Dec 2014 06:31:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/12/04/r-applying-a-function-to-every-row-of-a-data-frame/</guid>
      <description>In my continued exploration of London’s meetups I wanted to calculate the distance from meetup venues to a centre point in London.
I’ve created a gist containing the coordinates of some of the venues that host NoSQL meetups in London town if you want to follow along:
library(dplyr) # https://gist.github.com/mneedham/7e926a213bf76febf5ed venues = read.csv(&amp;#34;/tmp/venues.csv&amp;#34;) venues %&amp;gt;% head() ## venue lat lon ## 1 Skills Matter 51.52482 -0.099109 ## 2 Skinkers 51.</description>
    </item>
    
    <item>
      <title>Spark: Write to CSV file with header using saveAsFile</title>
      <link>https://www.markhneedham.com/blog/2014/11/30/spark-write-to-csv-file-with-header-using-saveasfile/</link>
      <pubDate>Sun, 30 Nov 2014 08:21:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/30/spark-write-to-csv-file-with-header-using-saveasfile/</guid>
      <description>In my last blog post I showed how to write to a single CSV file using Spark and Hadoop, and the next thing I wanted to do was add a header row to the resulting file.
Hadoop’s https://hadoop.apache.org/docs/current/api/org/apache/hadoop/fs/FileUtil.html#copyMerge(org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, org.apache.hadoop.fs.FileSystem, org.apache.hadoop.fs.Path, boolean, org.apache.hadoop.conf.Configuration, java.lang.String) function does take a String parameter but it adds this text to the end of each partition file which isn’t quite what we want.
However, if we copy that function into our own FileUtil class we can restructure it to do what we want:</description>
    </item>
    
    <item>
      <title>Spark: Write to CSV file</title>
      <link>https://www.markhneedham.com/blog/2014/11/30/spark-write-to-csv-file/</link>
      <pubDate>Sun, 30 Nov 2014 07:40:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/30/spark-write-to-csv-file/</guid>
      <description>A couple of weeks ago I wrote about how I’d been using Spark to explore a City of Chicago crime data set and, having worked out how many of each crime had been committed, I wanted to write that to a CSV file.
Spark provides a saveAsTextFile function which allows us to save RDDs, so I refactored my code into the following format to allow me to use that:
import au.</description>
    </item>
    
    <item>
      <title>Docker/Neo4j: Port forwarding on Mac OS X not working</title>
      <link>https://www.markhneedham.com/blog/2014/11/27/dockerneo4j-port-forwarding-on-mac-os-x-not-working/</link>
      <pubDate>Thu, 27 Nov 2014 12:28:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/27/dockerneo4j-port-forwarding-on-mac-os-x-not-working/</guid>
      <description>Prompted by Ognjen Bubalo’s excellent blog post I thought it was about time I tried running Neo4j in a Docker container on my MacBook Pro to make it easier to play around with different data sets.
I got the container up and running by following Ognjen’s instructions and had the following ports forwarded to my host machine:
$ docker ps CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES c62f8601e557 tpires/neo4j:latest &amp;#34;/bin/bash -c /launc About an hour ago Up About an hour 0.</description>
    </item>
    
    <item>
      <title>R: dplyr - Select &#39;random&#39; rows from a data frame</title>
      <link>https://www.markhneedham.com/blog/2014/11/26/r-dplyr-select-random-rows-from-a-data-frame/</link>
      <pubDate>Wed, 26 Nov 2014 00:01:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/26/r-dplyr-select-random-rows-from-a-data-frame/</guid>
      <description>Frequently I find myself wanting to take a sample of the rows in a data frame where just taking the head isn’t enough.
Let’s say we start with the following data frame:
data = data.frame( letter = sample(LETTERS, 50000, replace = TRUE), number = sample (1:10, 50000, replace = TRUE) ) And we’d like to sample 10 rows to see what it contains. We’ll start by generating 10 random numbers to represent row numbers using the runif function:</description>
    </item>
    
    <item>
      <title>R: dplyr - &#34;Variables not shown&#34;</title>
      <link>https://www.markhneedham.com/blog/2014/11/23/r-dplyr-variables-not-shown/</link>
      <pubDate>Sun, 23 Nov 2014 01:02:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/23/r-dplyr-variables-not-shown/</guid>
      <description>I recently ran into a problem where the result of applying some operations to a data frame wasn’t being output the way I wanted.
I started with this data frame:
words = function(numberOfWords, lengthOfWord) { w = c(1:numberOfWords) for(i in 1:numberOfWords) { w[i] = paste(sample(letters, lengthOfWord, replace=TRUE), collapse = &amp;#34;&amp;#34;) } w } numberOfRows = 100 df = data.frame(a = sample (1:numberOfRows, 10, replace = TRUE), b = sample (1:numberOfRows, 10, replace = TRUE), name = words(numberOfRows, 10)) I wanted to group the data frame by a and b and output a comma separated list of the associated names.</description>
    </item>
    
    <item>
      <title>R: ggmap - Overlay shapefile with filled polygon of regions</title>
      <link>https://www.markhneedham.com/blog/2014/11/17/r-ggmap-overlay-shapefile-with-filled-polygon-of-regions/</link>
      <pubDate>Mon, 17 Nov 2014 00:53:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/17/r-ggmap-overlay-shapefile-with-filled-polygon-of-regions/</guid>
      <description>I’ve been playing around with plotting maps in R over the last week and got to the point where I wanted to have a google map in the background with a filled polygon on a shapefile in the foreground.
The first bit is reasonably simple - we can just import the ggmap library and make a call to get_map:
&amp;gt; library(ggmap) &amp;gt; sfMap = map = get_map(location = &amp;#39;San Francisco&amp;#39;, zoom = 12) Next I wanted to show the outlines of the different San Francisco zip codes and came across a blog post by Paul Bidanset on Baltimore neighbourhoods which I was able to adapt.</description>
    </item>
    
    <item>
      <title>Spark: Parse CSV file and group by column value</title>
      <link>https://www.markhneedham.com/blog/2014/11/16/spark-parse-csv-file-and-group-by-column-value/</link>
      <pubDate>Sun, 16 Nov 2014 22:53:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/16/spark-parse-csv-file-and-group-by-column-value/</guid>
      <description>I’ve found myself working with large CSV files quite frequently and, realising that my existing toolset didn’t let me explore them quickly, I thought I’d spend a bit of time looking at Spark to see if it could help.
I’m working with a crime data set released by the City of Chicago: it’s 1GB in size and contains details of 4 million crimes:
$ ls -alh ~/Downloads/Crimes_-_2001_to_present.csv -rw-r--r--@ 1 markneedham staff 1.</description>
    </item>
    
    <item>
      <title>R: dplyr - Sum for group_by multiple columns</title>
      <link>https://www.markhneedham.com/blog/2014/11/11/r-dplyr-sum-for-group_by-multiple-columns/</link>
      <pubDate>Tue, 11 Nov 2014 00:17:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/11/r-dplyr-sum-for-group_by-multiple-columns/</guid>
      <description>Over the weekend I was playing around with dplyr and had the following data frame grouped by both columns:
&amp;gt; library(dplyr) &amp;gt; data = data.frame( letter = sample(LETTERS, 50000, replace = TRUE), number = sample (1:10, 50000, replace = TRUE) ) &amp;gt; data %&amp;gt;% count(letter, number) %&amp;gt;% head(20) Source: local data frame [20 x 3] Groups: letter letter number n 1 A 1 184 2 A 2 192 3 A 3 183 4 A 4 193 5 A 5 214 6 A 6 172 7 A 7 196 8 A 8 198 9 A 9 174 10 A 10 196 11 B 1 212 12 B 2 198 13 B 3 194 14 B 4 181 15 B 5 203 16 B 6 234 17 B 7 221 18 B 8 179 19 B 9 182 20 B 10 170 I wanted to add an extra column which would show what percentage of the values for that letter each number had.</description>
    </item>
    
    <item>
      <title>R: dplyr - Maximum value row in each group</title>
      <link>https://www.markhneedham.com/blog/2014/11/10/r-maximum-value-row-in-each-group/</link>
      <pubDate>Mon, 10 Nov 2014 22:06:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/10/r-maximum-value-row-in-each-group/</guid>
      <description>In my continued work with R’s dplyr I wanted to be able to group a data frame by some columns and then find the maximum value for each group.
We’ll continue with my favourite dummy data set:
&amp;gt; library(dplyr) &amp;gt; data = data.frame( letter = sample(LETTERS, 50000, replace = TRUE), number = sample (1:10, 50000, replace = TRUE) ) I started with the following code to count how many occurrences of each (letter, number) pair there were:</description>
    </item>
    
    <item>
      <title>R: dplyr - Ordering by count after multiple column group_by</title>
      <link>https://www.markhneedham.com/blog/2014/11/09/r-dplyr-ordering-by-count-after-multiple-column-group_by/</link>
      <pubDate>Sun, 09 Nov 2014 09:30:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/09/r-dplyr-ordering-by-count-after-multiple-column-group_by/</guid>
      <description>I was recently trying to group a data frame by two columns and then sort by the count using dplyr, but it wasn’t sorting in the way I was expecting, which was initially very confusing.
I started with this data frame:
library(dplyr) data = data.frame( letter = sample(LETTERS, 50000, replace = TRUE), number = sample (1:10, 50000, replace = TRUE) ) And I wanted to find out how many occurrences of each (letter, number) pair exist in the data set.</description>
    </item>
    
    <item>
      <title>R: Refactoring to dplyr</title>
      <link>https://www.markhneedham.com/blog/2014/11/09/r-refactoring-to-dplyr/</link>
      <pubDate>Sun, 09 Nov 2014 00:11:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/09/r-refactoring-to-dplyr/</guid>
      <description>I’ve been looking back over some of the early code I wrote using R before I knew about the dplyr library and thought it’d be an interesting exercise to refactor some of the snippets.
We’ll use the following data frame for each of the examples:
library(dplyr) data = data.frame( letter = sample(LETTERS, 50000, replace = TRUE), number = sample (1:10, 50000, replace = TRUE) ) Take {n} rows &amp;gt; data[1:5,] letter number 1 R 7 2 Q 3 3 B 8 4 R 3 5 U 2 becomes:</description>
    </item>
    
    <item>
      <title>R: dplyr - Group by field dynamically (&#39;regroup&#39; is deprecated / no applicable method for &#39;as.lazy&#39; applied to an object of class &#34;list&#34; )</title>
      <link>https://www.markhneedham.com/blog/2014/11/08/r-dplyr-group-by-field-dynamically-regroup-is-deprecated-no-applicable-method-for-as-lazy-applied-to-an-object-of-class-list/</link>
      <pubDate>Sat, 08 Nov 2014 22:29:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/08/r-dplyr-group-by-field-dynamically-regroup-is-deprecated-no-applicable-method-for-as-lazy-applied-to-an-object-of-class-list/</guid>
      <description>A few months ago I wrote a blog post explaining how to dynamically/programmatically group a data frame by a field using dplyr, but that approach has been deprecated in the latest version.
To recap, the original function looked like this:
library(dplyr) groupBy = function(df, field) { df %.% regroup(list(field)) %.% summarise(n = n()) } And if we execute that with a sample data frame we’ll see the following:
&amp;gt; data = data.</description>
    </item>
    
    <item>
      <title>R: Joining multiple data frames</title>
      <link>https://www.markhneedham.com/blog/2014/11/07/r-joining-multiple-data-frames/</link>
      <pubDate>Fri, 07 Nov 2014 01:29:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/11/07/r-joining-multiple-data-frames/</guid>
      <description>I’ve been looking through the code from Martin Eastwood’s excellent talk &amp;#39;Predicting Football Using R&amp;#39; and was intrigued by the code which reshaped the data into that expected by glm.
The original looks like this:
df &amp;lt;- read.csv(&amp;#39;http://www.football-data.co.uk/mmz4281/1314/E0.csv&amp;#39;) # munge data into format compatible with glm function df &amp;lt;- apply(df, 1, function(row){ data.frame(team=c(row[&amp;#39;HomeTeam&amp;#39;], row[&amp;#39;AwayTeam&amp;#39;]), opponent=c(row[&amp;#39;AwayTeam&amp;#39;], row[&amp;#39;HomeTeam&amp;#39;]), goals=c(row[&amp;#39;FTHG&amp;#39;], row[&amp;#39;FTAG&amp;#39;]), home=c(1, 0)) }) df &amp;lt;- do.call(rbind, df) The initial data frame looks like this:</description>
    </item>
    
    <item>
      <title>R: Converting a named vector to a data frame</title>
      <link>https://www.markhneedham.com/blog/2014/10/31/r-converting-a-named-vector-to-a-data-frame/</link>
      <pubDate>Fri, 31 Oct 2014 23:47:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/31/r-converting-a-named-vector-to-a-data-frame/</guid>
      <description>I’ve been playing around with igraph’s page rank function to see who the most central nodes in the London NoSQL scene are and I wanted to put the result in a data frame to make the data easier to work with.
I started off with a data frame containing pairs of people and the number of events that they’d both RSVP’d &amp;#39;yes&amp;#39; to:
&amp;gt; library(dplyr) &amp;gt; data %&amp;gt;% arrange(desc(times)) %&amp;gt;% head(10) p.</description>
    </item>
    
    <item>
      <title>hdiutil: could not access / create failed - Operation canceled</title>
      <link>https://www.markhneedham.com/blog/2014/10/31/hdiutil-could-not-access-create-failed-operation-canceled/</link>
      <pubDate>Fri, 31 Oct 2014 09:45:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/31/hdiutil-could-not-access-create-failed-operation-canceled/</guid>
      <description>Earlier in the year I wrote a blog post showing how to build a Mac OS X DMG file for a Java application and I recently revisited this script to update it to a new version and ran into a frustrating error message.
I tried to run the following command to create a new DMG file from a source folder...
$ hdiutil create -volname &amp;#34;DemoBench&amp;#34; -size 100m -srcfolder dmg/ -ov -format UDZO pack.</description>
    </item>
    
    <item>
      <title>Data Modelling: The Thin Model</title>
      <link>https://www.markhneedham.com/blog/2014/10/27/data-modelling-the-thin-model/</link>
      <pubDate>Mon, 27 Oct 2014 06:55:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/27/data-modelling-the-thin-model/</guid>
      <description>About a third of the way through Mastering Data Modeling the authors describe common data modelling mistakes and one in particular resonated with me - &amp;#39;Thin LDS, Lost Users&amp;#39;.
LDS stands for &amp;#39;Logical Data Structure&amp;#39; which is a diagram depicting what kinds of data some person or group wants to remember. In other words, a tool to help derive the conceptual model for our domain.
They describe the problem that a thin model can cause as follows:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Avoiding the Eager</title>
      <link>https://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/</link>
      <pubDate>Thu, 23 Oct 2014 05:56:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/23/neo4j-cypher-avoiding-the-eager/</guid>
      <description>Although I love how easy Cypher’s LOAD CSV command makes it to get data into Neo4j, it currently breaks the rule of least surprise in the way it eagerly loads in all rows for some queries, even those using periodic commit.
This is something that my colleague Michael noted in the second of his blog posts explaining how to use LOAD CSV successfully:
The biggest issue that people ran into, even when following the advice I gave earlier, was that for large imports of more than one million rows, Cypher ran into an out-of-memory situation.</description>
    </item>
    
    <item>
      <title>Neo4j: Modelling sub types</title>
      <link>https://www.markhneedham.com/blog/2014/10/20/neo4j-modelling-sub-types/</link>
      <pubDate>Mon, 20 Oct 2014 23:08:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/20/neo4j-modelling-sub-types/</guid>
      <description>A question which sometimes comes up when discussing graph data modelling is how you go about modelling sub/super types.
In my experience there are two reasons why we might want to do this:
To ensure that certain properties exist on bits of data
To write drill down queries based on those types
At the moment the former isn’t built into Neo4j and you’d only be able to achieve it by wiring up some code in a pre-commit hook of a transaction event handler, so we’ll focus on the latter.</description>
    </item>
    
    <item>
      <title>Python: Converting a date string to timestamp</title>
      <link>https://www.markhneedham.com/blog/2014/10/20/python-converting-a-date-string-to-timestamp/</link>
      <pubDate>Mon, 20 Oct 2014 15:53:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/20/python-converting-a-date-string-to-timestamp/</guid>
      <description>I’ve been playing around with Python over the last few days while cleaning up a data set and one thing I wanted to do was translate date strings into a timestamp.
I started with a date in this format:
date_text = &amp;#34;13SEP2014&amp;#34; So the first step is to translate that into a Python date - the strftime section of the documentation is useful for figuring out which format code is needed:</description>
    </item>
    
    <item>
      <title>Neo4j: LOAD CSV - The sneaky null character</title>
      <link>https://www.markhneedham.com/blog/2014/10/18/neo4j-load-csv-the-sneaky-null-character/</link>
      <pubDate>Sat, 18 Oct 2014 10:49:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/18/neo4j-load-csv-the-sneaky-null-character/</guid>
      <description>I spent some time earlier in the week trying to import a CSV file extracted from Hadoop into Neo4j using Cypher’s LOAD CSV command and initially struggled due to some rogue characters.
The CSV file looked like this:
$ cat foo.csv foo,bar,baz 1,2,3 I wrote the following LOAD CSV query to extract some of the fields and compare others:
load csv with headers from &amp;#34;file:/Users/markneedham/Downloads/foo.csv&amp;#34; AS line RETURN line.foo, line.</description>
    </item>
    
    <item>
      <title>R: Linear models with the lm function, NA values and Collinearity</title>
      <link>https://www.markhneedham.com/blog/2014/10/18/r-linear-models-with-the-lm-function-na-values-and-collinearity/</link>
      <pubDate>Sat, 18 Oct 2014 06:35:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/18/r-linear-models-with-the-lm-function-na-values-and-collinearity/</guid>
      <description>In my continued playing around with R I’ve sometimes noticed &amp;#39;NA&amp;#39; values in the linear regression models I created but hadn’t really thought about what that meant.
On the advice of Peter Huber I recently started working my way through Coursera’s Regression Models which has a whole slide explaining its meaning:
So in this case &amp;#39;z&amp;#39; doesn’t help us in predicting Fertility since it doesn’t give us any more information than we can already get from &amp;#39;Agriculture&amp;#39; and &amp;#39;Education&amp;#39;.</description>
    </item>
    
    <item>
      <title>The Hard Thing About Hard Things - Ben Horowitz: Book Review</title>
      <link>https://www.markhneedham.com/blog/2014/10/13/the-hard-thing-about-hard-things-ben-horowitz-book-review/</link>
      <pubDate>Mon, 13 Oct 2014 23:59:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/13/the-hard-thing-about-hard-things-ben-horowitz-book-review/</guid>
      <description>I came across &amp;#39;The Hard Thing About Hard Things&amp;#39; while reading an article about Ben Horowitz’s venture capital firm and it was intriguing enough that I bought it and then read through it over a couple of days.
Although the blurb suggests that it’s a book about building and running a startup, I think a lot of the lessons are applicable to any business.
These were some of the main points that stood out for me:</description>
    </item>
    
    <item>
      <title>Lessons from running Neo4j based &#39;hackathons&#39;</title>
      <link>https://www.markhneedham.com/blog/2014/10/11/lessons-from-running-neo4j-based-hackathons/</link>
      <pubDate>Sat, 11 Oct 2014 10:52:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/11/lessons-from-running-neo4j-based-hackathons/</guid>
      <description>Over the last 6 months my colleagues and I have been running hands-on Neo4j-based sessions every few weeks, and I was recently asked if I could write up the lessons we’ve learned.
So in no particular order here are some of the things that we’ve learnt:
Have a plan but don’t stick to it rigidly Something we learnt early on is that it’s helpful to have a rough plan of how you’re going to spend the session otherwise it can feel quite chaotic for attendees.</description>
    </item>
    
    <item>
      <title>Conceptual Model vs Graph Model</title>
      <link>https://www.markhneedham.com/blog/2014/10/06/conceptual-model-vs-graph-model/</link>
      <pubDate>Mon, 06 Oct 2014 07:11:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/10/06/conceptual-model-vs-graph-model/</guid>
      <description>We’ve started running some sessions on graph modelling in London and during the first session it was pointed out that the process I’d described was very similar to that when modelling for a relational database.
I thought I’d better do some reading on the way relational models are derived and I came across an excellent video by Joe Maguire titled &amp;#39;Data Modelers Still Have Jobs: Adjusting For the NoSQL Environment&amp;#39;</description>
    </item>
    
    <item>
      <title>R: A first attempt at linear regression</title>
      <link>https://www.markhneedham.com/blog/2014/09/30/r-a-first-attempt-at-linear-regression/</link>
      <pubDate>Tue, 30 Sep 2014 22:20:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/30/r-a-first-attempt-at-linear-regression/</guid>
      <description>I’ve been working through the videos that accompany the Introduction to Statistical Learning with Applications in R book and thought it’d be interesting to try out the linear regression algorithm against my meetup data set.
I wanted to see how well a linear regression algorithm could predict how many people were likely to RSVP to a particular event. I started with the following code to build a data frame containing some potential predictors:</description>
    </item>
    
    <item>
      <title>Neo4j: Generic/Vague relationship names</title>
      <link>https://www.markhneedham.com/blog/2014/09/30/neo4j-genericvague-relationship-names/</link>
      <pubDate>Tue, 30 Sep 2014 16:47:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/30/neo4j-genericvague-relationship-names/</guid>
      <description>An approach to modelling that I often see while working with Neo4j users is creating very generic relationships (e.g. HAS, CONTAINS, IS) and filtering on a relationship property or on a property/label at the end node.
Intuitively this doesn’t seem to make best use of the graph model as it means that you have to evaluate many relationships and nodes that you’re not interested in.
However, I’ve never actually tested the performance differences between the approaches so I thought I’d try it out.</description>
    </item>
    
    <item>
      <title>PostgreSQL: ERROR:  column does not exist</title>
      <link>https://www.markhneedham.com/blog/2014/09/29/postgresql-error-column-does-not-exist/</link>
      <pubDate>Mon, 29 Sep 2014 22:40:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/29/postgresql-error-column-does-not-exist/</guid>
      <description>I’ve been playing around with PostgreSQL recently and in particular the Northwind dataset typically used as an introductory data set for relational databases.
Having imported the data I wanted to take a quick look at the employees table:
postgres=# select * from employees limit 1; EmployeeID | LastName | FirstName | Title | TitleOfCourtesy | BirthDate | HireDate | Address | City | Region | PostalCode | Country | HomePhone | Extension | Photo | Notes | ReportsTo | PhotoPath ------------+----------+-----------+----------------------+-----------------+------------+------------+-----------------------------+---------+--------+------------+---------+----------------+-----------+-------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-----------+-------------------------------------- 1 | Davolio | Nancy | Sales Representative | Ms.</description>
    </item>
    
    <item>
      <title>R: Deriving a new data frame column based on containing string</title>
      <link>https://www.markhneedham.com/blog/2014/09/29/r-deriving-a-new-data-frame-column-based-on-containing-string/</link>
      <pubDate>Mon, 29 Sep 2014 21:37:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/29/r-deriving-a-new-data-frame-column-based-on-containing-string/</guid>
      <description>I’ve been playing around with R data frames a bit more and one thing I wanted to do was derive a new column based on the text contained in the existing column.
I started with something like this:
&amp;gt; x = data.frame(name = c(&amp;#34;Java Hackathon&amp;#34;, &amp;#34;Intro to Graphs&amp;#34;, &amp;#34;Hands on Cypher&amp;#34;)) &amp;gt; x name 1 Java Hackathon 2 Intro to Graphs 3 Hands on Cypher And I wanted to derive a new column based on whether or not the session was a practical one.</description>
    </item>
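The post above derives a new column from whether the event name contains certain text (in R this is typically done with grepl). A minimal Python sketch of the same idea, using the sample names from the excerpt; the keyword list and labels are assumptions for illustration:

```python
names = ["Java Hackathon", "Intro to Graphs", "Hands on Cypher"]

# flag a session as practical if its name contains a hands-on keyword,
# analogous to grepl("Hackathon|Hands on", name) in R
def session_type(name):
    return "practical" if ("Hackathon" in name or "Hands on" in name) else "talk"

types = [session_type(n) for n in names]
```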
    
    <item>
      <title>R: Filtering data frames by column type (&#39;x&#39; must be numeric)</title>
      <link>https://www.markhneedham.com/blog/2014/09/29/r-filtering-data-frames-by-column-type-x-must-be-numeric/</link>
      <pubDate>Mon, 29 Sep 2014 05:46:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/29/r-filtering-data-frames-by-column-type-x-must-be-numeric/</guid>
      <description>I’ve been working through the exercises from An Introduction to Statistical Learning and one of them required you to create a pairwise correlation matrix of variables in a data frame.
The exercise uses the &amp;#39;Carseats&amp;#39; data set which can be imported like so:
&amp;gt; install.packages(&amp;#34;ISLR&amp;#34;) &amp;gt; library(ISLR) &amp;gt; head(Carseats) Sales CompPrice Income Advertising Population Price ShelveLoc Age Education Urban US 1 9.50 138 73 11 276 120 Bad 42 17 Yes Yes 2 11.</description>
    </item>
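The error in the title arises because cor() only accepts numeric input, so non-numeric columns such as ShelveLoc must be filtered out first. A Python sketch of that filtering step over a column-oriented table (the toy data is an assumption, standing in for the ISLR Carseats set); in R the equivalent is `df[, sapply(df, is.numeric)]`:

```python
# keep only the columns whose values are all numeric, so that a
# pairwise correlation matrix can be computed over what remains
def numeric_columns(table):
    return {
        col: values
        for col, values in table.items()
        if all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values)
    }

carseats = {  # toy stand-in for the Carseats data frame
    "Sales": [9.5, 11.2],
    "CompPrice": [138, 111],
    "ShelveLoc": ["Bad", "Good"],
}
numeric = numeric_columns(carseats)
```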
    
    <item>
      <title>Neo4j: COLLECTing multiple values (Too many parameters for function &#39;collect&#39;)</title>
      <link>https://www.markhneedham.com/blog/2014/09/26/neo4j-collecting-multiple-values-too-many-parameters-for-function-collect/</link>
      <pubDate>Fri, 26 Sep 2014 20:46:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/26/neo4j-collecting-multiple-values-too-many-parameters-for-function-collect/</guid>
      <description>One of my favourite functions in Neo4j’s cypher query language is COLLECT which allows us to group items into an array for later consumption.
However, I’ve noticed that people sometimes have trouble working out how to collect multiple items with COLLECT and struggle to find a way to do so.
Consider the following data set:
create (p:Person {name: &amp;#34;Mark&amp;#34;}) create (e1:Event {name: &amp;#34;Event1&amp;#34;, timestamp: 1234}) create (e2:Event {name: &amp;#34;Event2&amp;#34;, timestamp: 4567}) create (p)-[:EVENT]-&amp;gt;(e1) create (p)-[:EVENT]-&amp;gt;(e2) If we wanted to return each person along with a collection of the event names they’d participated in we could write the following:</description>
    </item>
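The usual resolution to the error in the title is to collect a single list per event rather than passing two arguments, e.g. `COLLECT([e.name, e.timestamp])`. The same grouping, sketched in Python over the sample data from the excerpt (the flat-tuple input shape is an assumption):

```python
# one (person, event name, timestamp) row per EVENT relationship
events = [("Mark", "Event1", 1234), ("Mark", "Event2", 4567)]

# group the [name, timestamp] pairs per person, mimicking
# RETURN p.name, COLLECT([e.name, e.timestamp])
grouped = {}
for person, name, ts in events:
    grouped.setdefault(person, []).append([name, ts])
```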
    
    <item>
      <title>Neo4j: LOAD CSV - Column is null</title>
      <link>https://www.markhneedham.com/blog/2014/09/24/neo4j-load-csv-column-is-null/</link>
      <pubDate>Wed, 24 Sep 2014 20:21:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/24/neo4j-load-csv-column-is-null/</guid>
      <description>One problem I’ve seen a few people have recently when using Neo4j’s LOAD CSV function is dealing with CSV files that have dodgy hidden characters at the beginning of the header line.
For example, consider an import of this CSV file:
$ cat ~/Downloads/dodgy.csv userId,movieId 1,2 We might start by checking which columns it has:
$ load csv with headers from &amp;#34;file:/Users/markneedham/Downloads/dodgy.csv&amp;#34; as line return line; +----------------------------------+ | line | +----------------------------------+ | {userId -&amp;gt; &amp;#34;1&amp;#34;, movieId -&amp;gt; &amp;#34;2&amp;#34;} | +----------------------------------+ 1 row Looks good so far but what about if we try to return just &amp;#39;userId&amp;#39;?</description>
    </item>
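The hidden character in question is typically a UTF-8 byte-order mark in front of the first header, so the column is really `\ufeffuserId` rather than `userId`. One way to sanity-check a file outside Neo4j is to read it with an encoding that strips the BOM; a Python sketch (the file path is illustrative):

```python
import csv, os, tempfile

def read_rows(path):
    # "utf-8-sig" strips a leading byte-order mark, a common culprit
    # behind a header column that looks like "userId" but isn't
    with open(path, encoding="utf-8-sig", newline="") as f:
        return list(csv.DictReader(f))

# create a CSV whose header starts with a BOM, like the dodgy file above
path = os.path.join(tempfile.gettempdir(), "dodgy.csv")
with open(path, "w", encoding="utf-8") as f:
    f.write("\ufeffuserId,movieId\n1,2\n")

rows = read_rows(path)
```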
    
    <item>
      <title>R: ggplot - Plotting multiple variables on a line chart</title>
      <link>https://www.markhneedham.com/blog/2014/09/16/r-ggplot-plotting-multiple-variables-on-a-line-chart/</link>
      <pubDate>Tue, 16 Sep 2014 16:59:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/16/r-ggplot-plotting-multiple-variables-on-a-line-chart/</guid>
      <description>In my continued playing around with meetup data I wanted to plot the number of members who join the Neo4j group over time.
I started off with the variable &amp;#39;byWeek&amp;#39; which shows how many members joined the group each week:
&amp;gt; head(byWeek) Source: local data frame [6 x 2] week n 1 2011-06-02 8 2 2011-06-09 4 3 2011-06-30 2 4 2011-07-14 1 5 2011-07-21 1 6 2011-08-18 1 I wanted to plot the actual count alongside a rolling average for which I created the following data frame:</description>
    </item>
    
    <item>
      <title>R: ggplot - Plotting a single variable line chart (geom_line requires the following missing aesthetics: y)</title>
      <link>https://www.markhneedham.com/blog/2014/09/13/r-ggplot-plotting-a-single-variable-line-chart-geom_line-requires-the-following-missing-aesthetics-y/</link>
      <pubDate>Sat, 13 Sep 2014 11:41:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/13/r-ggplot-plotting-a-single-variable-line-chart-geom_line-requires-the-following-missing-aesthetics-y/</guid>
      <description>I’ve been learning how to do moving averages in R and having done that calculation I wanted to plot these variables on a line chart using ggplot.
The vector of rolling averages looked like this:
&amp;gt; rollmean(byWeek$n, 4) [1] 3.75 2.00 1.25 1.00 1.25 1.25 1.75 1.75 1.75 2.50 2.25 2.75 3.50 2.75 2.75 [16] 2.25 1.50 1.50 2.00 2.00 2.00 2.00 1.25 1.50 2.25 2.50 3.00 3.25 2.75 4.</description>
    </item>
    
    <item>
      <title>R: Calculating rolling or moving averages</title>
      <link>https://www.markhneedham.com/blog/2014/09/13/r-calculating-rolling-or-moving-averages/</link>
      <pubDate>Sat, 13 Sep 2014 08:15:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/09/13/r-calculating-rolling-or-moving-averages/</guid>
      <description>I’ve been playing around with some time series data in R and since there’s a bit of variation between consecutive points I wanted to smooth the data out by calculating the moving average.
I struggled to find an in built function to do this but came across Didier Ruedin’s blog post which described the following function to do the job:
mav &amp;lt;- function(x,n=5){filter(x,rep(1/n,n), sides=2)} I tried plugging in some numbers to understand how it works:</description>
    </item>
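The `mav` function above computes a centred moving average: each output value averages a window of n points around the input, with missing values at the edges. A Python sketch of the same behaviour for odd window sizes (R's filter handles even n slightly differently, which this sketch does not attempt):

```python
def mav(x, n=5):
    # centred moving average with None at the edges, mimicking
    # filter(x, rep(1/n, n), sides = 2) for odd n
    half = n // 2
    out = []
    for i in range(len(x)):
        lo, hi = i - half, i + half + 1
        out.append(sum(x[lo:hi]) / n if lo >= 0 and hi <= len(x) else None)
    return out

smoothed = mav([1, 2, 3, 4, 5], n=3)
```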
    
    <item>
      <title>R: ggplot - Cumulative frequency graphs</title>
      <link>https://www.markhneedham.com/blog/2014/08/31/r-ggplot-cumulative-frequency-graphs/</link>
      <pubDate>Sun, 31 Aug 2014 22:10:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/31/r-ggplot-cumulative-frequency-graphs/</guid>
      <description>In my continued playing around with ggplot I wanted to create a chart showing the cumulative growth of the number of members of the Neo4j London meetup group.
My initial data frame looked like this:
&amp;gt; head(meetupMembers) joinTimestamp joinDate monthYear quarterYear week dayMonthYear 1 1.376572e+12 2013-08-15 13:13:40 2013-08-01 2013-07-01 2013-08-15 2013-08-15 2 1.379491e+12 2013-09-18 07:55:11 2013-09-01 2013-07-01 2013-09-12 2013-09-18 3 1.349454e+12 2012-10-05 16:28:04 2012-10-01 2012-10-01 2012-10-04 2012-10-05 4 1.383127e+12 2013-10-30 09:59:03 2013-10-01 2013-10-01 2013-10-24 2013-10-30 5 1.</description>
    </item>
    
    <item>
      <title>R: dplyr - group_by dynamic or programmatic field / variable (Error: index out of bounds)</title>
      <link>https://www.markhneedham.com/blog/2014/08/29/r-dplyr-group_by-dynamic-or-programmatic-field-variable-error-index-out-of-bounds/</link>
      <pubDate>Fri, 29 Aug 2014 09:13:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/29/r-dplyr-group_by-dynamic-or-programmatic-field-variable-error-index-out-of-bounds/</guid>
      <description>In my last blog post I showed how to group timestamp based data by week, month and quarter and by the end we had the following code samples using dplyr and zoo:
library(RNeo4j) library(zoo) timestampToDate &amp;lt;- function(x) as.POSIXct(x / 1000, origin=&amp;#34;1970-01-01&amp;#34;, tz = &amp;#34;GMT&amp;#34;) query = &amp;#34;MATCH (:Person)-[:HAS_MEETUP_PROFILE]-&amp;gt;()-[:HAS_MEMBERSHIP]-&amp;gt;(membership)-[:OF_GROUP]-&amp;gt;(g:Group {name: \&amp;#34;Neo4j - London User Group\&amp;#34;}) RETURN membership.joined AS joinTimestamp&amp;#34; meetupMembers = cypher(graph, query) meetupMembers$joinDate &amp;lt;- timestampToDate(meetupMembers$joinTimestamp) meetupMembers$monthYear &amp;lt;- as.Date(as.yearmon(meetupMembers$joinDate)) meetupMembers$quarterYear &amp;lt;- as.</description>
    </item>
    
    <item>
      <title>R: Grouping by week, month, quarter</title>
      <link>https://www.markhneedham.com/blog/2014/08/29/r-grouping-by-week-month-quarter/</link>
      <pubDate>Fri, 29 Aug 2014 00:25:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/29/r-grouping-by-week-month-quarter/</guid>
      <description>In my continued playing around with R and meetup data I wanted to have a look at when people joined the London Neo4j group based on week, month or quarter of the year to see when they were most likely to do so.
I started with the following query to get back the join timestamps:
library(RNeo4j) query = &amp;#34;MATCH (:Person)-[:HAS_MEETUP_PROFILE]-&amp;gt;()-[:HAS_MEMBERSHIP]-&amp;gt;(membership)-[:OF_GROUP]-&amp;gt;(g:Group {name: \&amp;#34;Neo4j - London User Group\&amp;#34;}) RETURN membership.joined AS joinTimestamp&amp;#34; meetupMembers = cypher(graph, query) &amp;gt; head(meetupMembers) joinTimestamp 1 1.</description>
    </item>
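The meetup timestamps are milliseconds since the epoch, which is why the post divides by 1000 before converting. The same conversion sketched in Python, using one of the timestamps that appears later in these posts:

```python
from datetime import datetime, timezone

def timestamp_to_date(ms):
    # millisecond epoch -> datetime, the same conversion as the R helper
    # as.POSIXct(x / 1000, origin = "1970-01-01", tz = "GMT")
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

joined = timestamp_to_date(1_376_572_000_000)
```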
    
    <item>
      <title>Neo4j: LOAD CSV - Handling empty columns</title>
      <link>https://www.markhneedham.com/blog/2014/08/22/neo4j-load-csv-handling-empty-columns/</link>
      <pubDate>Fri, 22 Aug 2014 12:51:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/22/neo4j-load-csv-handling-empty-columns/</guid>
      <description>A common problem that people encounter when trying to import CSV files into Neo4j using Cypher’s LOAD CSV command is how to handle empty or &amp;#39;null&amp;#39; entries in said files.
For example let’s try and import the following file which has 3 columns, 1 populated, 2 empty:
$ cat /tmp/foo.csv a,b,c mark,, load csv with headers from &amp;#34;file:/tmp/foo.csv&amp;#34; as row MERGE (p:Person {a: row.a}) SET p.b = row.b, p.c = row.</description>
    </item>
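LOAD CSV reads an empty cell as the empty string rather than null, so `SET p.b = row.b` stores an empty property. One common Cypher-side fix is a CASE expression (e.g. `CASE row.b WHEN "" THEN null ELSE row.b END`); the equivalent preprocessing step sketched in Python, using the sample file from the excerpt:

```python
import csv, io

def nullify_empty(row):
    # map "" cells to None before import, mirroring the CASE trick
    return {k: (v if v != "" else None) for k, v in row.items()}

reader = csv.DictReader(io.StringIO("a,b,c\nmark,,\n"))
rows = [nullify_empty(r) for r in reader]
```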
    
    <item>
      <title>R: Rook - Hello world example - &#39;Cannot find a suitable app in file&#39;</title>
      <link>https://www.markhneedham.com/blog/2014/08/22/r-rook-hello-world-example-cannot-find-a-suitable-app-in-file/</link>
      <pubDate>Fri, 22 Aug 2014 11:05:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/22/r-rook-hello-world-example-cannot-find-a-suitable-app-in-file/</guid>
      <description>I’ve been playing around with the Rook library and struggled a bit getting a basic Hello World application up and running so I thought I should document it for future me.
I wanted to spin up a web server using Rook and serve a page with the text &amp;#39;Hello World&amp;#39;. I started with the following code:
library(Rook) s &amp;lt;- Rhttpd$new() s$add(name=&amp;#39;MyApp&amp;#39;,app=&amp;#39;helloworld.R&amp;#39;) s$start() s$browse(&amp;#34;MyApp&amp;#34;) where helloWorld.R contained the following code:</description>
    </item>
    
    <item>
      <title>Ruby: Create and share Google Drive Spreadsheet</title>
      <link>https://www.markhneedham.com/blog/2014/08/17/ruby-create-and-share-google-drive-spreadsheet/</link>
      <pubDate>Sun, 17 Aug 2014 21:42:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/17/ruby-create-and-share-google-drive-spreadsheet/</guid>
      <description>Over the weekend I’ve been trying to write some code to help me create and share a Google Drive spreadsheet and for the first bit I started out with the Google Drive gem (http://www.markhneedham.com/blog/2014/08/17/ruby-google-drive-errorbadauthentication-googledriveauthenticationerror-infoinvalidsecondfactor/).
      <description>Over the weekend I’ve been trying to write some code to help me create and share a Google Drive spreadsheet and for the first bit I started out with the Google Drive gem (http://www.markhneedham.com/blog/2014/08/17/ruby-google-drive-errorbadauthentication-googledriveauthenticationerror-infoinvalidsecondfactor/).
This worked reasonably well but that gem doesn’t have an API for changing the permissions on a document so I ended up using the google-api-client gem for that bit.
This tutorial provides a good quick start for getting up and running but it still has a manual step to copy/paste the &amp;#39;OAuth token&amp;#39; which I wanted to get rid of.</description>
    </item>
    
    <item>
      <title>Ruby: Receive JSON in request body</title>
      <link>https://www.markhneedham.com/blog/2014/08/17/ruby-receive-json-in-request-body/</link>
      <pubDate>Sun, 17 Aug 2014 12:21:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/17/ruby-receive-json-in-request-body/</guid>
      <description>I’ve been building a little Sinatra app to play around with the Google Drive API and one thing I struggled with was processing JSON posted in the request body.
I came across a few posts which suggested that the request body would be available as params or request but after trying several ways of sending a POST request that doesn’t seem to be the case.
I eventually came across this StackOverflow post which shows how to do it:</description>
    </item>
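The fix the post arrives at is reading the raw request body rather than relying on params. A minimal Ruby sketch, with the Sinatra route shown as a comment (only the parsing step runs here, since it needs no web server):

```ruby
require "json"

# Inside a Sinatra handler the raw POST body is read explicitly,
# e.g. (hypothetical route):
#
#   post "/payload" do
#     payload = JSON.parse(request.body.read)
#   end

body = '{"name": "Mark", "events": 2}'
payload = JSON.parse(body)
```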
    
    <item>
      <title>Ruby: Google Drive - Error=BadAuthentication (GoogleDrive::AuthenticationError) Info=InvalidSecondFactor</title>
      <link>https://www.markhneedham.com/blog/2014/08/17/ruby-google-drive-errorbadauthentication-googledriveauthenticationerror-infoinvalidsecondfactor/</link>
      <pubDate>Sun, 17 Aug 2014 01:49:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/17/ruby-google-drive-errorbadauthentication-googledriveauthenticationerror-infoinvalidsecondfactor/</guid>
      <description>I’ve been using the Google Drive gem to try and interact with my Google Drive account and almost immediately ran into problems trying to login.
I started out with the following code:
require &amp;#34;rubygems&amp;#34; require &amp;#34;google_drive&amp;#34; session = GoogleDrive.login(&amp;#34;me@mydomain.com&amp;#34;, &amp;#34;mypassword&amp;#34;) I’ll move it to use OAuth when I put it into my application but for spiking this approach works. Unfortunately I got the following error when running the script:</description>
    </item>
    
    <item>
      <title>Where does R Studio install packages/libraries?</title>
      <link>https://www.markhneedham.com/blog/2014/08/14/where-does-r-studio-install-packageslibraries/</link>
      <pubDate>Thu, 14 Aug 2014 10:24:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/14/where-does-r-studio-install-packageslibraries/</guid>
      <description>As a newbie to R I wanted to look at the source code of some of the libraries/packages that I’d installed via R Studio which I initially struggled to do as I wasn’t sure where the packages had been installed.
I eventually came across a StackOverflow post which described the .libPaths function (http://stat.ethz.ch/R-manual/R-devel/library/base/html/libPaths.html) which tells us where that is:
&amp;gt; .libPaths() [1] &amp;#34;/Library/Frameworks/R.framework/Versions/3.1/Resources/library&amp;#34; If we want to see which libraries are installed we can use the http://stat.</description>
    </item>
    
    <item>
      <title>R: Grouping by two variables</title>
      <link>https://www.markhneedham.com/blog/2014/08/11/r-grouping-by-two-variables/</link>
      <pubDate>Mon, 11 Aug 2014 16:47:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/08/11/r-grouping-by-two-variables/</guid>
      <description>In my continued playing around with R and meetup data I wanted to group a data table by two variables - day and event - so I could see the most popular day of the week for meetups and which events we’d held on those days.
I started off with the following data table:
&amp;gt; head(eventsOf2014, 20) eventTime event.name rsvps datetime day monthYear 16 1.393351e+12 Intro to Graphs 38 2014-02-25 18:00:00 Tuesday 02-2014 17 1.</description>
    </item>
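Grouping by two variables amounts to counting over (day, event) pairs. A Python sketch of the same idea with Counter; the sample rows are hypothetical, standing in for eventsOf2014:

```python
from collections import Counter

rows = [
    ("Tuesday", "Intro to Graphs"),
    ("Tuesday", "Intro to Graphs"),
    ("Wednesday", "Hands on Cypher"),
]

# count occurrences of each (day, event) combination
by_day_and_event = Counter(rows)
```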
    
    <item>
      <title>4 types of user</title>
      <link>https://www.markhneedham.com/blog/2014/07/29/4-types-of-user/</link>
      <pubDate>Tue, 29 Jul 2014 19:07:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/29/4-types-of-user/</guid>
      <description>I’ve been working with Neo4j full time for slightly more than a year now and from interacting with the community I’ve noticed that while using different features of the product people fall into 4 categories.
These are as follows:
On one axis we have &amp;#39;loudness&amp;#39; i.e. how vocal somebody is either on twitter, StackOverflow or by email and on the other we have &amp;#39;success&amp;#39; which is how well a product feature is working for them.</description>
    </item>
    
    <item>
      <title>R: ggplot  - Plotting back to back charts using facet_wrap</title>
      <link>https://www.markhneedham.com/blog/2014/07/25/r-ggplot-plotting-back-to-back-charts-using-facet_wrap/</link>
      <pubDate>Fri, 25 Jul 2014 21:57:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/25/r-ggplot-plotting-back-to-back-charts-using-facet_wrap/</guid>
      <description>Earlier in the week I showed a way to plot back to back charts using R’s ggplot library but looking back on the code it felt like it was a bit hacky to &amp;#39;glue&amp;#39; two charts together using a grid.
I wanted to find a better way.
To recap, I came up with the following charts showing the RSVPs to Neo4j London meetup events using this code:
The first thing we need to do to simplify chart generation is to return &amp;#39;yes&amp;#39; and &amp;#39;no&amp;#39; responses in the same cypher query, like so:</description>
    </item>
    
    <item>
      <title>Java: Determining the status of data import using kill signals</title>
      <link>https://www.markhneedham.com/blog/2014/07/23/java-determining-the-status-of-data-import-using-kill-signals/</link>
      <pubDate>Wed, 23 Jul 2014 22:20:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/23/java-determining-the-status-of-data-import-using-kill-signals/</guid>
      <description>A few weeks ago I was working on the initial import of ~ 60 million bits of data into Neo4j and we kept running into a problem where the import process just seemed to freeze and nothing else was imported.
It was very difficult to tell what was happening inside the process - taking a thread dump merely informed us that it was attempting to process one line of a CSV file and was somehow unable to do so.</description>
    </item>
    
    <item>
      <title>R: ggplot - Plotting back to back bar charts</title>
      <link>https://www.markhneedham.com/blog/2014/07/20/r-ggplot-plotting-back-to-back-bar-charts/</link>
      <pubDate>Sun, 20 Jul 2014 16:50:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/20/r-ggplot-plotting-back-to-back-bar-charts/</guid>
      <description>I’ve been playing around with R’s ggplot library to explore the Neo4j London meetup and the next thing I wanted to do was plot back to back bar charts showing &amp;#39;yes&amp;#39; and &amp;#39;no&amp;#39; RSVPs.
I’d already done the &amp;#39;yes&amp;#39; bar chart using the following code:
query = &amp;#34;MATCH (e:Event)&amp;lt;-[:TO]-(response {response: &amp;#39;yes&amp;#39;}) RETURN response.time AS time, e.time + e.utc_offset AS eventTime&amp;#34; allYesRSVPs = cypher(graph, query) allYesRSVPs$time = timestampToDate(allYesRSVPs$time) allYesRSVPs$eventTime = timestampToDate(allYesRSVPs$eventTime) allYesRSVPs$difference = as.</description>
    </item>
    
    <item>
      <title>Neo4j 2.1.2: Finding where I am in a linked list</title>
      <link>https://www.markhneedham.com/blog/2014/07/20/neo4j-2-1-2-finding-where-i-am-in-a-linked-list/</link>
      <pubDate>Sun, 20 Jul 2014 15:13:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/20/neo4j-2-1-2-finding-where-i-am-in-a-linked-list/</guid>
      <description>I was recently asked how to calculate the position of a node in a linked list and realised that as the list increases in size this is one of the occasions when we should write an unmanaged extension rather than using cypher.
I wrote a quick bit of code to create a linked list with 10,000 elements in it:
public class Chains { public static void main(String[] args) { String simpleChains = &amp;#34;/tmp/longchains&amp;#34;; populate( simpleChains, 10000 ); } private static void populate( String path, int chainSize ) { GraphDatabaseService db = new GraphDatabaseFactory().</description>
    </item>
    
    <item>
      <title>R: ggplot - Don&#39;t know how to automatically pick scale for object of type difftime - Discrete value supplied to continuous scale</title>
      <link>https://www.markhneedham.com/blog/2014/07/20/r-ggplot-dont-know-how-to-automatically-pick-scale-for-object-of-type-difftime-discrete-value-supplied-to-continuous-scale/</link>
      <pubDate>Sun, 20 Jul 2014 00:21:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/20/r-ggplot-dont-know-how-to-automatically-pick-scale-for-object-of-type-difftime-discrete-value-supplied-to-continuous-scale/</guid>
      <description>While reading &amp;#39;Why The R Programming Language Is Good For Business&amp;#39; I came across Udacity’s &amp;#39;Data Analysis with R&amp;#39; courses - part of which focuses exploring data sets using visualisations, something I haven’t done much of yet.
I thought it’d be interesting to create some visualisations around the times that people RSVP &amp;#39;yes&amp;#39; to the various Neo4j events that we run in London.
I started off with the following query which returns the date time that people replied &amp;#39;Yes&amp;#39; to an event and the date time of the event:</description>
    </item>
    
    <item>
      <title>R: Apply a custom function across multiple lists</title>
      <link>https://www.markhneedham.com/blog/2014/07/16/r-apply-a-custom-function-across-multiple-lists/</link>
      <pubDate>Wed, 16 Jul 2014 05:04:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/16/r-apply-a-custom-function-across-multiple-lists/</guid>
      <description>In my continued playing around with R I wanted to map a custom function over two lists comparing each item with its corresponding items.
If we just want to use a built in function such as subtraction between two lists it’s quite easy to do:
&amp;gt; c(10,9,8,7,6,5,4,3,2,1) - c(5,4,3,4,3,2,2,1,2,1) [1] 5 5 5 3 3 3 2 2 0 0 I wanted to do a slight variation on that where instead of returning the difference I wanted to return a text value representing the difference e.</description>
    </item>
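Mapping a custom function over corresponding elements of two lists (mapply in R) is a zip in Python. A sketch using the vectors from the excerpt; the text labels are made up for illustration, mirroring the post's idea of returning a description instead of the raw difference:

```python
a = [10, 9, 8, 7, 6, 5, 4, 3, 2, 1]
b = [5, 4, 3, 4, 3, 2, 2, 1, 2, 1]

# element-wise difference, as in c(...) - c(...) above
diffs = [x - y for x, y in zip(a, b)]

# a custom pairwise function applied across both lists
labels = ["ahead" if d > 0 else "level" for d in diffs]
```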
    
    <item>
      <title>Neo4j: LOAD CSV - Processing hidden arrays in your CSV documents</title>
      <link>https://www.markhneedham.com/blog/2014/07/10/neo4j-load-csv-processing-hidden-arrays-in-your-csv-documents/</link>
      <pubDate>Thu, 10 Jul 2014 14:54:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/10/neo4j-load-csv-processing-hidden-arrays-in-your-csv-documents/</guid>
      <description>I was recently asked how to process an &amp;#39;array&amp;#39; of values inside a column in a CSV file using Neo4j’s LOAD CSV tool and although I initially thought this wouldn’t be possible as every cell is treated as a String, Michael showed me a way of working around this which I thought was pretty neat.
Let’s say we have a CSV file representing people and their friends. It might look like this:</description>
    </item>
    
    <item>
      <title>R/plyr: ddply - Error in vector(type, length) : vector: cannot make a vector of mode &#39;closure&#39;.</title>
      <link>https://www.markhneedham.com/blog/2014/07/07/rplyr-ddply-error-in-vectortype-length-vector-cannot-make-a-vector-of-mode-closure/</link>
      <pubDate>Mon, 07 Jul 2014 06:07:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/07/rplyr-ddply-error-in-vectortype-length-vector-cannot-make-a-vector-of-mode-closure/</guid>
      <description>In my continued playing around with plyr’s ddply function I was trying to group a data frame by one of its columns and return a count of the number of rows with specific values and ran into a strange (to me) error message.
I had a data frame:
n = c(2, 3, 5) s = c(&amp;#34;aa&amp;#34;, &amp;#34;bb&amp;#34;, &amp;#34;cc&amp;#34;) b = c(TRUE, FALSE, TRUE) df = data.frame(n, s, b) And wanted to group and count on column &amp;#39;b&amp;#39; so I’d get back a count of 2 for TRUE and 1 for FALSE.</description>
    </item>
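The goal in the excerpt is a count of rows per value of column b. A Python sketch over the same toy data, using Counter on the column values:

```python
from collections import Counter

# the same toy data frame as above, held column-wise
n = [2, 3, 5]
s = ["aa", "bb", "cc"]
b = [True, False, True]

counts = Counter(b)  # expect 2 TRUE rows and 1 FALSE row
```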
    
    <item>
      <title>R/plyr: ddply - Renaming the grouping/generated column when grouping by date</title>
      <link>https://www.markhneedham.com/blog/2014/07/02/rplyr-ddply-renaming-the-groupinggenerate-column-when-grouping-by-date/</link>
      <pubDate>Wed, 02 Jul 2014 06:30:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/07/02/rplyr-ddply-renaming-the-groupinggenerate-column-when-grouping-by-date/</guid>
      <description>On Nicole’s recommendation I’ve been having a look at R’s plyr package to see if I could simplify my meetup analysis and I started by translating my code that grouped meetup join dates by day of the week.
To refresh, the code without plyr looked like this:
library(Rneo4j) timestampToDate &amp;lt;- function(x) as.POSIXct(x / 1000, origin=&amp;#34;1970-01-01&amp;#34;) query = &amp;#34;MATCH (:Person)-[:HAS_MEETUP_PROFILE]-&amp;gt;()-[:HAS_MEMBERSHIP]-&amp;gt;(membership)-[:OF_GROUP]-&amp;gt;(g:Group {name: \&amp;#34;Neo4j - London User Group\&amp;#34;}) RETURN membership.joined AS joinDate&amp;#34; meetupMembers = cypher(graph, query) meetupMembers$joined &amp;lt;- timestampToDate(meetupMembers$joinDate) dd = aggregate(meetupMembers$joined, by=list(format(meetupMembers$joined, &amp;#34;%A&amp;#34;)), function(x) length(x)) colnames(dd) = c(&amp;#34;dayOfWeek&amp;#34;, &amp;#34;count&amp;#34;) which returns the following:</description>
    </item>
    
    <item>
      <title>R: Aggregate by different functions and join results into one data frame</title>
      <link>https://www.markhneedham.com/blog/2014/06/30/r-aggregate-by-different-functions-and-join-results-into-one-data-frame/</link>
      <pubDate>Mon, 30 Jun 2014 22:47:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/30/r-aggregate-by-different-functions-and-join-results-into-one-data-frame/</guid>
      <description>In continuing my analysis of the London Neo4j meetup group using R I wanted to see which days of the week we organise meetups and how many people RSVP affirmatively by the day.
I started out with this query which returns each event and the number of &amp;#39;yes&amp;#39; RSVPS:
library(Rneo4j) timestampToDate &amp;lt;- function(x) as.POSIXct(x / 1000, origin=&amp;#34;1970-01-01&amp;#34;) query = &amp;#34;MATCH (g:Group {name: \&amp;#34;Neo4j - London User Group\&amp;#34;})-[:HOSTED_EVENT]-&amp;gt;(event)&amp;lt;-[:TO]-({response: &amp;#39;yes&amp;#39;})&amp;lt;-[:RSVPD]-() WHERE (event.</description>
    </item>
    
    <item>
      <title>R: Order by data frame column and take top 10 rows</title>
      <link>https://www.markhneedham.com/blog/2014/06/30/r-order-by-data-frame-column-and-take-top-10-rows/</link>
      <pubDate>Mon, 30 Jun 2014 21:44:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/30/r-order-by-data-frame-column-and-take-top-10-rows/</guid>
      <description>I’ve been doing some ad-hoc analysis of the Neo4j London meetup group using R and Neo4j and having worked out how to group by certain keys the next step was to order the rows of the data frame.
I wanted to drill into the days on which people join the group and see whether they join it at a specific time of day. My feeling was that most people would join on a Monday morning.</description>
    </item>
    
    <item>
      <title>Neo4j/R: Grouping meetup members by join timestamp</title>
      <link>https://www.markhneedham.com/blog/2014/06/30/neo4jr-grouping-meetup-members-by-join-timestamp/</link>
      <pubDate>Mon, 30 Jun 2014 00:06:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/30/neo4jr-grouping-meetup-members-by-join-timestamp/</guid>
      <description>I wanted to do some ad-hoc analysis on the join date of members of the Neo4j London meetup group and since cypher doesn’t yet have functions for dealing with dates I thought I’d give R a try.
I started off by executing a cypher query which returned the join timestamp of all the group members using Nicole White’s RNeo4j package:
&amp;gt; library(Rneo4j) &amp;gt; query = &amp;#34;match (:Person)-[:HAS_MEETUP_PROFILE]-&amp;gt;()-[:HAS_MEMBERSHIP]-&amp;gt;(membership)-[:OF_GROUP]-&amp;gt;(g:Group {name: \&amp;#34;Neo4j - London User Group\&amp;#34;}) RETURN membership.</description>
    </item>
    
    <item>
      <title>Neo4j: Set Based Operations with the experimental Cypher optimiser</title>
      <link>https://www.markhneedham.com/blog/2014/06/29/neo4j-set-based-operations-with-the-experimental-cypher-optimiser/</link>
      <pubDate>Sun, 29 Jun 2014 08:45:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/29/neo4j-set-based-operations-with-the-experimental-cypher-optimiser/</guid>
      <description>A few months ago I wrote about cypher queries which look for a missing relationship and showed how you could optimise them by re-working the query slightly.
To refresh, we wanted to find all the people in the London office that I hadn’t worked with given this model...
...and this initial query:
MATCH (p:Person {name: &amp;#34;me&amp;#34;})-[:MEMBER_OF]-&amp;gt;(office {name: &amp;#34;London Office&amp;#34;})&amp;lt;-[:MEMBER_OF]-(colleague) WHERE NOT (p-[:COLLEAGUES]-&amp;gt;(colleague)) RETURN COUNT(colleague) This took on average 7.</description>
    </item>
    
    <item>
      <title>Neo4j&#39;s Cypher vs Clojure - Group by and Sorting</title>
      <link>https://www.markhneedham.com/blog/2014/06/29/neo4j-cypher-vs-clojure-for-group-by-and-sorting/</link>
      <pubDate>Sun, 29 Jun 2014 02:56:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/29/neo4j-cypher-vs-clojure-for-group-by-and-sorting/</guid>
      <description>One of the points that I emphasised during my talk on building Neo4j backed applications using Clojure last week is understanding when to use Cypher to solve a problem and when to use the programming language.
A good example of this is in the meetup application I’ve been working on. I have a collection of events and want to display past events in descending order and future events in ascending order.</description>
    </item>
    
    <item>
      <title>Data Science: Mo&#39; Data Mo&#39; Problems</title>
      <link>https://www.markhneedham.com/blog/2014/06/28/data-science-mo-data-mo-problems/</link>
      <pubDate>Sat, 28 Jun 2014 23:35:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/28/data-science-mo-data-mo-problems/</guid>
      <description>Over the last couple of years I’ve worked on several proof of concept style Neo4j projects and on a lot of them people have wanted to work with their entire data set which I don’t think makes sense so early on.
In the early parts of a project we’re trying to prove out our approach rather than prove we can handle big data - something that Ashok taught me a couple of years ago on a project we worked on together.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Finding movies by decade</title>
      <link>https://www.markhneedham.com/blog/2014/06/28/neo4j-cypher-finding-movies-by-decade/</link>
      <pubDate>Sat, 28 Jun 2014 11:12:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/28/neo4j-cypher-finding-movies-by-decade/</guid>
      <description>I was recently asked how to find the number of movies produced per decade in the movie data set that comes with the Neo4j browser and can be imported with the following command:
:play movies We want to get one row per decade and have a count alongside so the easiest way is to start with one decade and build from there.
MATCH (movie:Movie) WHERE movie.released &amp;gt;= 1990 and movie.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Separation of concerns</title>
      <link>https://www.markhneedham.com/blog/2014/06/27/neo4j-cypher-separation-of-concerns/</link>
      <pubDate>Fri, 27 Jun 2014 10:51:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/27/neo4j-cypher-separation-of-concerns/</guid>
      <description>While preparing my talk on building Neo4j backed applications with Clojure I realised that some of the queries I’d written were incredibly complicated and went against anything I’d learnt about separating different concerns.
One example of this was the query I used to generate the data for the following page of the meetup application I’ve been working on:
Depending on the selected tab you can choose to see the people signed up for the meetup and the date that they signed up or the topics that those people are interested in.</description>
    </item>
    
    <item>
      <title>Neo4j: LOAD CSV - Handling conditionals</title>
      <link>https://www.markhneedham.com/blog/2014/06/17/neo4j-load-csv-handling-conditionals/</link>
      <pubDate>Tue, 17 Jun 2014 23:41:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/17/neo4j-load-csv-handling-conditionals/</guid>
      <description>While building up the Neo4j World Cup Graph I’ve been making use of the LOAD CSV function (http://neo4j.com/blog/neo4j-2-1-graph-etl) and I frequently found myself needing to do different things depending on the value in one of the columns.
For example I have one CSV file which contains the different events that can happen in a football match:
match_id,player,player_id,time,type &amp;#34;1012&amp;#34;,&amp;#34;Antonin Panenka&amp;#34;,&amp;#34;174835&amp;#34;,21,&amp;#34;penalty&amp;#34; &amp;#34;1012&amp;#34;,&amp;#34;Faisal Al Dakhil&amp;#34;,&amp;#34;2204&amp;#34;,57,&amp;#34;goal&amp;#34; &amp;#34;102&amp;#34;,&amp;#34;Roger Milla&amp;#34;,&amp;#34;79318&amp;#34;,106,&amp;#34;goal&amp;#34; &amp;#34;102&amp;#34;,&amp;#34;Roger Milla&amp;#34;,&amp;#34;79318&amp;#34;,108,&amp;#34;goal&amp;#34; &amp;#34;102&amp;#34;,&amp;#34;Bernardo Redin&amp;#34;,&amp;#34;44555&amp;#34;,115,&amp;#34;goal&amp;#34; &amp;#34;102&amp;#34;,&amp;#34;Andre Kana-biyik&amp;#34;,&amp;#34;174649&amp;#34;,44,&amp;#34;yellow&amp;#34; If the type is &amp;#39;penalty&amp;#39;, &amp;#39;owngoal&amp;#39; or &amp;#39;goal&amp;#39; then I want to create a SCORED_GOAL relationship whereas if it’s &amp;#39;yellow&amp;#39;, &amp;#39;yellowred&amp;#39; or &amp;#39;red&amp;#39; then I want to create a RECEIVED_CARD relationship instead.</description>
    </item>
    
    <item>
      <title>Ruby: Regex - Matching the Trademark ™ character</title>
      <link>https://www.markhneedham.com/blog/2014/06/08/ruby-regex-matching-the-trademark-character/</link>
      <pubDate>Sun, 08 Jun 2014 01:34:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/06/08/ruby-regex-matching-the-trademark-character/</guid>
      <description>I’ve been playing around with some World Cup data and while cleaning up the data I wanted to strip out the year and host country for a world cup.
I started with a string like this which I was reading from a file:
1930 FIFA World Cup Uruguay ™ And I wanted to be able to extract just the &amp;#39;Uruguay&amp;#39; bit without getting the trademark or the space preceding it.</description>
    </item>
    
    <item>
      <title>Neo4j Meetup Coding Dojo Style</title>
      <link>https://www.markhneedham.com/blog/2014/05/31/neo4j-meetup-coding-dojo-style/</link>
      <pubDate>Sat, 31 May 2014 22:55:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/31/neo4j-meetup-coding-dojo-style/</guid>
      <description>A few weeks ago we ran a coding dojo style meetup (http://www.meetup.com/graphdb-london/events/179211972/) in the Neo4j London office during which we worked with the meta data around 1 million images recently released into the public domain by the British Library.
Feedback from previous meetups had indicated that attendees wanted to practice modelling a domain from scratch and understand the options for importing said model into the database. This data set seemed perfect for this purpose.</description>
    </item>
    
    <item>
      <title>Neo4j/R: Analysing London NoSQL meetup membership</title>
      <link>https://www.markhneedham.com/blog/2014/05/31/neo4jr-analysing-london-nosql-meetup-membership/</link>
      <pubDate>Sat, 31 May 2014 21:32:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/31/neo4jr-analysing-london-nosql-meetup-membership/</guid>
      <description>In my spare time I’ve been working on a Neo4j application that runs on top of meetup.com’s API and Nicole recently showed me how I could wire up some of the queries to use her RNeo4j library:
@markhneedham pic.twitter.com/8014jckEUl
— Nicole White (@_nicolemargaret) May 31, 2014 The query used in that visualisation shows the number of members that overlap between each pair of groups but a more interesting query is the one which shows the % overlap between groups based on the unique members across the groups.</description>
    </item>
    
    <item>
      <title>Thoughts on meetups</title>
      <link>https://www.markhneedham.com/blog/2014/05/31/thoughts-on-meetups/</link>
      <pubDate>Sat, 31 May 2014 19:50:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/31/thoughts-on-meetups/</guid>
      <description>I recently came across an interesting blog post by Zach Tellman in which he explains a new approach that he’s been trialling at The Bay Area Clojure User Group.
Zach explains that a lecture based approach isn’t necessarily the most effective way for people to learn and that half of the people attending the meetup are likely to be novices and would struggle to follow more advanced content.
He then goes on to explain an alternative approach:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - UNWIND vs FOREACH</title>
      <link>https://www.markhneedham.com/blog/2014/05/31/neo4j-cypher-unwind-vs-foreach/</link>
      <pubDate>Sat, 31 May 2014 14:19:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/31/neo4j-cypher-unwind-vs-foreach/</guid>
      <description>I’ve written a couple of posts about the new UNWIND clause in Neo4j’s cypher query language but I forgot about my favourite use of UNWIND, which is to get rid of some uses of FOREACH from our queries.
Let’s say we’ve created a timetree up front and now have a series of events coming in that we want to create in the database and attach to the appropriate part of the timetree.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Neo.ClientError.Statement.ParameterMissing and neo4j-shell</title>
      <link>https://www.markhneedham.com/blog/2014/05/31/neo4j-cypher-neo-clienterror-statement-parametermissing-and-neo4j-shell/</link>
      <pubDate>Sat, 31 May 2014 12:44:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/31/neo4j-cypher-neo-clienterror-statement-parametermissing-and-neo4j-shell/</guid>
      <description>Every now and then I get sent Neo4j cypher queries to look at and more often than not they’re parameterised which means you can’t easily run them in the Neo4j browser.
For example let’s say we have a database which has a user called &amp;#39;Mark&amp;#39;:
CREATE (u:User {name: &amp;#34;Mark&amp;#34;}) Now we write a query to find &amp;#39;Mark&amp;#39; with the name parameterised so we can easily search for a different user in future:</description>
    </item>
    
    <item>
      <title>Clojure: Destructuring group-by&#39;s output</title>
      <link>https://www.markhneedham.com/blog/2014/05/31/clojure-destructuring-group-bys-output/</link>
      <pubDate>Sat, 31 May 2014 00:03:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/31/clojure-destructuring-group-bys-output/</guid>
      <description>One of my favourite features of Clojure is that it allows you to destructure a data structure into values that are a bit easier to work with.
I often find myself referring to Jay Fields&amp;#39; article which contains several examples showing the syntax and is a good starting point.
One recent use of destructuring I had was where I was working with a vector containing events like this:
user&amp;gt; (def events [{:name &amp;#34;e1&amp;#34; :timestamp 123} {:name &amp;#34;e2&amp;#34; :timestamp 456} {:name &amp;#34;e3&amp;#34; :timestamp 789}]) I wanted to split the events in two - those containing events with a timestamp greater than 123 and those less than or equal to 123.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Rounding a float value to decimal places</title>
      <link>https://www.markhneedham.com/blog/2014/05/25/neo4j-cypher-rounding-a-float-value-to-decimal-places/</link>
      <pubDate>Sun, 25 May 2014 22:17:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/25/neo4j-cypher-rounding-a-float-value-to-decimal-places/</guid>
      <description>About 6 months ago Jacqui Read created a github issue explaining how she wanted to round a float value to a number of decimal places but was unable to do so due to the round function (http://docs.neo4j.org/chunked/stable/query-functions-mathematical.html#functions-round) not taking the appropriate parameter.
I found myself wanting to do the same thing last week where I initially had the following value:
RETURN toFloat(&amp;#34;12.336666&amp;#34;) AS value I wanted to round that to 2 decimal places and Wes suggested multiplying the value before using ROUND and then dividing afterwards to achieve that.</description>
    </item>
    
    <item>
      <title>Neo4j 2.1:  Passing around node ids vs UNWIND</title>
      <link>https://www.markhneedham.com/blog/2014/05/25/neo4j-2-1-passing-around-node-ids-vs-unwind/</link>
      <pubDate>Sun, 25 May 2014 10:48:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/25/neo4j-2-1-passing-around-node-ids-vs-unwind/</guid>
      <description>When Neo4j 2.1 is released we’ll have the UNWIND clause which makes working with collections of things easier.
In my blog post about creating adjacency matrices we wanted to show how many people were members of the first 5 meetup groups ordered alphabetically and then check how many were members of each of the other groups.
Without the UNWIND clause we’d have to do this:
MATCH (g:Group) WITH g ORDER BY g.</description>
    </item>
    
    <item>
      <title>Clojure: Create a directory</title>
      <link>https://www.markhneedham.com/blog/2014/05/24/clojure-create-a-directory/</link>
      <pubDate>Sat, 24 May 2014 00:12:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/24/clojure-create-a-directory/</guid>
      <description>I spent much longer than I should have done trying to work out how to create a directory in Clojure as part of an import script I’m working on, so for my future self this is how you do it:
(.mkdir (java.io.File. &amp;#34;/path/to/dir/to/create&amp;#34;)) I’m creating a directory which contains today’s date so I’d want something like &amp;#39;members-2014-05-24&amp;#39; if I was running it today. The clj-time library is very good for working with dates.</description>
    </item>
    
    <item>
      <title>Neo4j 2.1: Creating adjacency matrices</title>
      <link>https://www.markhneedham.com/blog/2014/05/20/neo4j-2-0-creating-adjacency-matrices/</link>
      <pubDate>Tue, 20 May 2014 23:14:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/05/20/neo4j-2-0-creating-adjacency-matrices/</guid>
      <description>About 9 months ago I wrote a blog post showing how to export an adjacency matrix from a Neo4j 1.9 database using the cypher query language and I thought it deserves an update to use 2.0 syntax.
I’ve been spending some of my free time working on an application that runs on top of meetup.com’s API and one of the queries I wanted to write was to find the common members between 2 meetup groups.</description>
    </item>
    
    <item>
      <title>Jersey/Jax RS: Streaming JSON</title>
      <link>https://www.markhneedham.com/blog/2014/04/30/jerseyjax-rs-streaming-json/</link>
      <pubDate>Wed, 30 Apr 2014 01:24:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/30/jerseyjax-rs-streaming-json/</guid>
      <description>About a year ago I wrote a blog post showing how to stream a HTTP response using Jersey/Jax RS and I recently wanted to do the same thing but this time using JSON.
A common pattern is to take our Java object and get a JSON string representation of that but that isn’t the most efficient use of memory because we now have the Java object and a string representation.</description>
    </item>
    
    <item>
      <title>Clojure: Paging meetup data using lazy sequences</title>
      <link>https://www.markhneedham.com/blog/2014/04/30/clojure-paging-meetup-data-using-lazy-sequences/</link>
      <pubDate>Wed, 30 Apr 2014 00:20:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/30/clojure-paging-meetup-data-using-lazy-sequences/</guid>
      <description>I’ve been playing around with the meetup API to do some analysis on the Neo4j London meetup and one thing I wanted to do was download all the members of the group.
A feature of the meetup API is that each end point will only allow you to return a maximum of 200 records so I needed to make use of offsets and paging to retrieve everybody.
It seemed like a good chance to use some lazy sequences to keep track of the offsets and then stop making calls to the API once I wasn’t retrieving any more results.</description>
    </item>
    
    <item>
      <title>Clojure: clj-time - Formatting a date / timestamp with day suffixes e.g. 1st, 2nd, 3rd</title>
      <link>https://www.markhneedham.com/blog/2014/04/26/clojure-clj-time-formatting-a-date-timestamp-with-day-suffixes-e-g-1st-2nd-3rd/</link>
      <pubDate>Sat, 26 Apr 2014 07:50:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/26/clojure-clj-time-formatting-a-date-timestamp-with-day-suffixes-e-g-1st-2nd-3rd/</guid>
      <description>I’ve been using the clj-time library recently - a Clojure wrapper around Joda Time - and one thing I wanted to do is format a date with day suffixes e.g. 1st, 2nd, 3rd.
I started with the following timestamp:
1309368600000 The first step was to convert that into a DateTime object like so:
user&amp;gt; (require &amp;#39;[clj-time.coerce :as c]) user&amp;gt; (c/from-long 1309368600000) #&amp;lt;DateTime 2011-06-29T17:30:00.000Z&amp;gt; I wanted to output that date in the following format:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Flatten a collection</title>
      <link>https://www.markhneedham.com/blog/2014/04/23/neo4j-cypher-flatten-a-collection/</link>
      <pubDate>Wed, 23 Apr 2014 22:02:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/23/neo4j-cypher-flatten-a-collection/</guid>
      <description>Every now and then in Cypher land we’ll end up with a collection of arrays, often created via the COLLECT function, that we want to squash down into one array.
For example let’s say we have the following array of arrays...
$ RETURN [[1,2,3], [4,5,6], [7,8,9]] AS result; ==&amp;gt; +---------------------------+ ==&amp;gt; | result | ==&amp;gt; +---------------------------+ ==&amp;gt; | [[1,2,3],[4,5,6],[7,8,9]] | ==&amp;gt; +---------------------------+ ==&amp;gt; 1 row ...and we want to return the array .</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Creating a time tree down to the day</title>
      <link>https://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-a-time-tree-down-to-the-day/</link>
      <pubDate>Sat, 19 Apr 2014 21:15:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-a-time-tree-down-to-the-day/</guid>
      <description>Michael recently wrote a blog post showing how to create a time tree representing time down to the second using Neo4j’s Cypher query language, something I built on top of for a side project I’m working on.
The domain I want to model is RSVPs to meetup invites - I want to understand how much in advance people respond and how likely they are to drop out at a later stage.</description>
    </item>
    
    <item>
      <title>Neo4j 2.0.1: Cypher - Concatenating an empty collection / Type mismatch: expected Integer, Collection&lt;Integer&gt; or Collection&lt;Collection&lt;Integer&gt;&gt; but was Collection&lt;Any&gt;</title>
      <link>https://www.markhneedham.com/blog/2014/04/19/neo4j-2-0-1-cypher-concatenating-an-empty-collection-type-mismatch-expected-integer-collection-or-collection-but-was-collection/</link>
      <pubDate>Sat, 19 Apr 2014 19:51:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/19/neo4j-2-0-1-cypher-concatenating-an-empty-collection-type-mismatch-expected-integer-collection-or-collection-but-was-collection/</guid>
      <description>Last weekend I was playing around with some collections using Neo4j’s Cypher query language and I wanted to concatenate two collections.
This was easy enough when both collections contained values...
$ RETURN [1,2,3,4] + [5,6,7]; ==&amp;gt; +---------------------+ ==&amp;gt; | [1,2,3,4] + [5,6,7] | ==&amp;gt; +---------------------+ ==&amp;gt; | [1,2,3,4,5,6,7] | ==&amp;gt; +---------------------+ ==&amp;gt; 1 row ...but I ended up with the following exception when I tried to concatenate with an empty collection:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Creating relationships between a collection of nodes / Invalid input &#39;[&#39;:</title>
      <link>https://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-relationships-between-a-collection-of-nodes-invalid-input/</link>
      <pubDate>Sat, 19 Apr 2014 06:33:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/19/neo4j-cypher-creating-relationships-between-a-collection-of-nodes-invalid-input/</guid>
      <description>When working with graphs we’ll frequently find ourselves wanting to create relationships between collections of nodes.
A common example of this would be creating a linked list of days so that we can quickly traverse across a time tree. Let’s say we start with just 3 days:
MERGE (day1:Day {day:1 }) MERGE (day2:Day {day:2 }) MERGE (day3:Day {day:3 }) RETURN day1, day2, day3 And we want to create a &amp;#39;NEXT&amp;#39; relationship between adjacent days:</description>
    </item>
    
    <item>
      <title>Neo4j 2.0.0: Query not prepared correctly / Type mismatch: expected Map</title>
      <link>https://www.markhneedham.com/blog/2014/04/13/neo4j-2-0-0-query-not-prepared-correctly-type-mismatch-expected-map/</link>
      <pubDate>Sun, 13 Apr 2014 17:40:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/13/neo4j-2-0-0-query-not-prepared-correctly-type-mismatch-expected-map/</guid>
      <description>I was playing around with Neo4j’s Cypher last weekend and found myself accidentally running some queries against an earlier version of the Neo4j 2.0 series (2.0.0).
My first query started with a map and I wanted to create a person from an identifier inside the map:
WITH {person: {id: 1}} AS params MERGE (p:Person {id: params.person.id}) RETURN p When I ran the query I got this error:
==&amp;gt; SyntaxException: Type mismatch: expected Map but was Boolean, Number, String or Collection&amp;lt;Any&amp;gt; (line 1, column 62) ==&amp;gt; &amp;#34;WITH {person: {id: 1}} AS params MERGE (p:Person {id: params.</description>
    </item>
    
    <item>
      <title>install4j and AppleScript: Creating a Mac OS X Application Bundle for a Java application</title>
      <link>https://www.markhneedham.com/blog/2014/04/07/install4j-and-applescript-creating-a-mac-os-x-application-bundle-for-a-java-application/</link>
      <pubDate>Mon, 07 Apr 2014 00:04:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/07/install4j-and-applescript-creating-a-mac-os-x-application-bundle-for-a-java-application/</guid>
      <description>We have a few internal applications at Neo which can be launched using &amp;#39;java -jar &amp;#39; and I always forget where the jars are so I thought I’d wrap a Mac OS X application bundle around it to make life easier.
My favourite installation pattern is the one where when you double click the dmg it shows you a window where you can drag the application into the &amp;#39;Applications&amp;#39; folder, like this:</description>
    </item>
    
    <item>
      <title>Clojure: Not so lazy sequences a.k.a chunking behaviour</title>
      <link>https://www.markhneedham.com/blog/2014/04/06/clojure-not-so-lazy-sequences-a-k-a-chunking-behaviour/</link>
      <pubDate>Sun, 06 Apr 2014 22:07:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/04/06/clojure-not-so-lazy-sequences-a-k-a-chunking-behaviour/</guid>
      <description>I’ve been playing with Clojure over the weekend and got caught out by the behaviour of lazy sequences due to chunking - something which was obvious to experienced Clojurians although not me.
I had something similar to the following bit of code which I expected to only evaluate the first item of the infinite sequence that the range function generates:
&amp;gt; (take 1 (map (fn [x] (println (str &amp;#34;printing...&amp;#34; x))) (range))) (printing.</description>
    </item>
    
    <item>
      <title>Soulver: For all your random calculations</title>
      <link>https://www.markhneedham.com/blog/2014/03/30/soulver-for-all-your-random-calculations/</link>
      <pubDate>Sun, 30 Mar 2014 14:48:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/03/30/soulver-for-all-your-random-calculations/</guid>
      <description>I often find myself doing random calculations and I used to do so part manually and part using Alfred&amp;#39;s calculator until Alistair pointed me at Soulver, a desktop/iPhone/iPad app, which is even better.
I thought I’d write some examples of calculations I use it for, partly so I’ll remember the syntax in future!
Calculating how much memory Neo4j memory mapping will take up
800 mb + 2660mb + 6600mb + 9500mb + 40mb in GB = 19.</description>
    </item>
    
    <item>
      <title>Remote profiling Neo4j using yourkit</title>
      <link>https://www.markhneedham.com/blog/2014/03/24/remote-profiling-neo4j-using-yourkit/</link>
      <pubDate>Mon, 24 Mar 2014 23:44:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/03/24/remote-profiling-neo4j-using-yourkit/</guid>
      <description>yourkit is my favourite JVM profiling tool and whilst it’s really easy to profile a local JVM process, sometimes I need to profile a process on a remote machine.
In that case we need to first have the remote JVM started up with a yourkit agent parameter passed as one of the args to the Java program.
Since I’m mostly working with Neo4j this means we need to add the following to conf/neo4j-wrapper.</description>
    </item>
    
    <item>
      <title>Functional Programming in Java - Venkat Subramaniam: Book Review</title>
      <link>https://www.markhneedham.com/blog/2014/03/23/functional-programming-in-java-venkat-subramaniam-book-review/</link>
      <pubDate>Sun, 23 Mar 2014 21:18:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/03/23/functional-programming-in-java-venkat-subramaniam-book-review/</guid>
      <description>I picked up Venkat Subramaniam’s &amp;#39;Functional Programming in Java: Harnessing the Power of Java 8 Lambda Expressions&amp;#39; to learn a little bit more about Java 8 having struggled to find any online tutorials which did that.
A big chunk of the book focuses on lambdas, functional collection parameters and lazy evaluation which will be familiar to users of C#, Clojure, Scala, Haskell, Ruby, Python, F# or libraries like totallylazy and Guava.</description>
    </item>
    
    <item>
      <title>Neo4j 2.1.0-M01: LOAD CSV with Rik Van Bruggen&#39;s Tube Graph</title>
      <link>https://www.markhneedham.com/blog/2014/03/03/neo4j-2-1-0-m01-load-csv-with-rik-van-bruggens-tube-graph/</link>
      <pubDate>Mon, 03 Mar 2014 16:34:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/03/03/neo4j-2-1-0-m01-load-csv-with-rik-van-bruggens-tube-graph/</guid>
      <description>Last week we released the first milestone of Neo4j 2.1.0 and one of its features is a new function in cypher - LOAD CSV - which aims to make it easier to get data into Neo4j.
I thought I’d give it a try to import the London tube graph - something that my colleague Rik wrote about a few months ago.
I’m using the same data set as Rik but I had to tweak it a bit as there were naming differences when describing the connection from Kennington to Waterloo and Kennington to Oval.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Finding directors who acted in their own movie</title>
      <link>https://www.markhneedham.com/blog/2014/02/28/neo4j-cypher-finding-directors-who-acted-in-their-own-movie/</link>
      <pubDate>Fri, 28 Feb 2014 22:57:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/28/neo4j-cypher-finding-directors-who-acted-in-their-own-movie/</guid>
      <description>I’ve been doing quite a few Intro to Neo4j sessions recently and since it contains a lot of problems for the attendees to work on I get to see how first time users of Cypher actually use it.
A couple of hours in we want to write a query to find directors who acted in their own film based on the following model.
A common answer is the following:</description>
    </item>
    
    <item>
      <title>Java 8: Lambda Expressions vs Auto Closeable</title>
      <link>https://www.markhneedham.com/blog/2014/02/26/java-8-lambda-expressions-vs-auto-closeable/</link>
      <pubDate>Wed, 26 Feb 2014 07:32:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/26/java-8-lambda-expressions-vs-auto-closeable/</guid>
      <description>If you used earlier versions of Neo4j via its Java API with Java 6 you probably have code similar to the following to ensure write operations happen within a transaction:
public class StylesOfTx { public static void main( String[] args ) throws IOException { String path = &amp;#34;/tmp/tx-style-test&amp;#34;; FileUtils.deleteRecursively(new File(path)); GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase( path ); Transaction tx = db.beginTx(); try { db.createNode(); tx.success(); } finally { tx.close(); } } } In Neo4j 2.</description>
    </item>
    
    <item>
      <title>Jersey: Ignoring SSL certificate - javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException</title>
      <link>https://www.markhneedham.com/blog/2014/02/26/jersey-ignoring-ssl-certificate-javax-net-ssl-sslhandshakeexception-java-security-cert-certificateexception/</link>
      <pubDate>Wed, 26 Feb 2014 00:12:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/26/jersey-ignoring-ssl-certificate-javax-net-ssl-sslhandshakeexception-java-security-cert-certificateexception/</guid>
      <description>Last week Alistair and I were working on an internal application and we needed to make a HTTPS request directly to an AWS machine using a certificate signed to a different host.
We use jersey-client so our code looked something like this:
Client client = Client.create(); client.resource(&amp;#34;https://some-aws-host.compute-1.amazonaws.com&amp;#34;).post(); // and so on When we ran this we predictably ran into trouble:
com.sun.jersey.api.client.ClientHandlerException: javax.net.ssl.SSLHandshakeException: java.security.cert.CertificateException: No subject alternative DNS name matching some-aws-host.</description>
    </item>
    
    <item>
      <title>Java 8: Group by with collections</title>
      <link>https://www.markhneedham.com/blog/2014/02/23/java-8-group-by-with-collections/</link>
      <pubDate>Sun, 23 Feb 2014 19:16:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/23/java-8-group-by-with-collections/</guid>
      <description>In my continued reading of Venkat Subramaniam’s &amp;#39;Functional Programming in Java&amp;#39; I’ve reached the part of the book where the collect function (http://download.java.net/jdk8/docs/api/java/util/stream/Stream.html#collect-java.util.stream.Collector-) is introduced.
We want to take a collection of people, group them by age and return a map of (age -&amp;gt; people’s names) for which this comes in handy.
To refresh, this is what the Person class looks like:
static class Person { private String name; private int age; Person(String name, int age) { this.</description>
    </item>
    
    <item>
      <title>Java 8: Sorting values in collections</title>
      <link>https://www.markhneedham.com/blog/2014/02/23/java-8-sorting-values-in-collections/</link>
      <pubDate>Sun, 23 Feb 2014 14:43:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/23/java-8-sorting-values-in-collections/</guid>
      <description>Having realised that Java 8 is due for its GA release within the next few weeks I thought it was about time I had a look at it and over the last week have been reading Venkat Subramaniam’s book.
I’m up to chapter 3 which covers sorting a collection of people. The Person class is defined roughly like so:
static class Person { private String name; private int age; Person(String name, int age) { this.</description>
    </item>
    
    <item>
      <title>Automating Skype&#39;s &#39;This message has been removed&#39;</title>
      <link>https://www.markhneedham.com/blog/2014/02/20/automating-skypes-this-message-has-been-removed/</link>
      <pubDate>Thu, 20 Feb 2014 23:16:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/20/automating-skypes-this-message-has-been-removed/</guid>
      <description>One of the stranger features of Skype is that it allows you to delete the contents of a message that you’ve already sent to someone - something I haven’t seen on any other messaging system I’ve used.
For example if I wrote a message in Skype and wanted to edit it I would press the &amp;#39;up&amp;#39; arrow:
Once I’ve deleted the message I’d see this in the space where the message used to be:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Set Based Operations</title>
      <link>https://www.markhneedham.com/blog/2014/02/20/neo4j-cypher-set-based-operations/</link>
      <pubDate>Thu, 20 Feb 2014 18:22:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/20/neo4j-cypher-set-based-operations/</guid>
      <description>I was recently reminded of a Neo4j cypher query that I wrote a couple of years ago to find the colleagues that I hadn’t worked with in the ThoughtWorks London office.
The model looked like this:
And I created the following fake data set of the aforementioned model:
public class SetBasedOperations { private static final Label PERSON = DynamicLabel.label( &amp;#34;Person&amp;#34; ); private static final Label OFFICE = DynamicLabel.label( &amp;#34;Office&amp;#34; ); private static final DynamicRelationshipType COLLEAGUES = DynamicRelationshipType.</description>
    </item>
    
    <item>
      <title>Neo4j: Creating nodes and relationships from a list of maps</title>
      <link>https://www.markhneedham.com/blog/2014/02/17/neo4j-creating-nodes-and-relationships-from-a-list-of-maps/</link>
      <pubDate>Mon, 17 Feb 2014 14:11:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/17/neo4j-creating-nodes-and-relationships-from-a-list-of-maps/</guid>
      <description>Last week Alistair and I were porting some Neo4j cypher queries from 1.8 to 2.0 and one of the queries we had to change was an interesting one that created a bunch of relationships from a list/array of maps.
In the query we had a user &amp;#39;Mark&amp;#39; and wanted to create &amp;#39;FRIENDS_WITH&amp;#39; relationships to Peter and Michael.
The application passed in a list of maps representing Peter and Michael as a parameter but if we remove the parameters the query looked like this:</description>
    </item>
    
    <item>
      <title>Neo4j: Value in relationships, but value in nodes too!</title>
      <link>https://www.markhneedham.com/blog/2014/02/13/neo4j-value-in-relationships-but-value-in-nodes-too/</link>
      <pubDate>Thu, 13 Feb 2014 00:10:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/13/neo4j-value-in-relationships-but-value-in-nodes-too/</guid>
      <description>I’ve recently spent a bit of time working with people on their graph models and a common pattern I’ve come across is that although the models have lots of relationships there are often missing nodes.
Emails We’ll start with a model which represents the emails that people send between each other. A first cut might look like this:
The problem with this approach is that we haven’t modelled the concept of an email - that’s been implicitly modelled via a relationship.</description>
    </item>
    
    <item>
      <title>Jython/Neo4j: java.lang.ExceptionInInitializerError: java.lang.ExceptionInInitializerError</title>
      <link>https://www.markhneedham.com/blog/2014/02/05/jythonneo4j-java-lang-exceptionininitializererror-java-lang-exceptionininitializererror/</link>
      <pubDate>Wed, 05 Feb 2014 12:21:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/02/05/jythonneo4j-java-lang-exceptionininitializererror-java-lang-exceptionininitializererror/</guid>
      <description>I’ve been playing around with calling Neo4j’s Java API from Python via Jython and immediately ran into the following exception when trying to create an embedded instance:
$ jython -Dpython.path /path/to/neo4j.jar Jython 2.5.3 (2.5:c56500f08d34+, Aug 13 2012, 14:48:36) [Java HotSpot(TM) 64-Bit Server VM (Oracle Corporation)] on java1.7.0_45 Type &amp;#34;help&amp;#34;, &amp;#34;copyright&amp;#34;, &amp;#34;credits&amp;#34; or &amp;#34;license&amp;#34; for more information. &amp;gt;&amp;gt;&amp;gt; import org.neo4j.graphdb.factory &amp;gt;&amp;gt;&amp;gt; org.neo4j.graphdb.factory.GraphDatabaseFactory().newEmbeddedDatabase(&amp;#34;/tmp/foo&amp;#34;) Traceback (most recent call last): File &amp;#34;&amp;lt;stdin&amp;gt;&amp;#34;, line 1, in &amp;lt;module&amp;gt; at org.</description>
    </item>
    
    <item>
      <title>Java: Handling a RuntimeException in a Runnable</title>
      <link>https://www.markhneedham.com/blog/2014/01/31/java-handling-a-runtimeexception-in-a-runnable/</link>
      <pubDate>Fri, 31 Jan 2014 23:59:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/31/java-handling-a-runtimeexception-in-a-runnable/</guid>
      <description>At the end of last year I was playing around with running scheduled tasks to monitor a Neo4j cluster and one of the problems I ran into was that the monitoring would sometimes exit.
I eventually realised that this was because a RuntimeException was being thrown inside the Runnable method and I wasn’t handling it. The following code demonstrates the problem:
import java.util.ArrayList; import java.util.List; import java.util.concurrent.*; public class RunnableBlog { public static void main(String[] args) throws ExecutionException, InterruptedException { ScheduledExecutorService executor = Executors.</description>
    </item>
    
    <item>
      <title>Neo4j 2.0.0: Optimising a football query</title>
      <link>https://www.markhneedham.com/blog/2014/01/31/neo4j-2-0-0-optimising-a-football-query/</link>
      <pubDate>Fri, 31 Jan 2014 22:41:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/31/neo4j-2-0-0-optimising-a-football-query/</guid>
      <description>A couple of months ago I wrote a blog post explaining how I’d applied Wes Freeman’s Cypher optimisation patterns to a query - since then Neo4j 2.0.0 has been released and I’ve extended the model so I thought I’d try again.
The updated model looks like this:
The query is similar to before - I want to calculate the top away goal scorers in the 2012-2013 season. I started off with this:</description>
    </item>
    
    <item>
      <title>Neo4j 2.0.0: Cypher - Index Hints and Neo.ClientError.Schema.NoSuchIndex</title>
      <link>https://www.markhneedham.com/blog/2014/01/31/neo4j-2-0-0-cypher-index-hints-and-neo-clienterror-schema-nosuchindex/</link>
      <pubDate>Fri, 31 Jan 2014 07:14:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/31/neo4j-2-0-0-cypher-index-hints-and-neo-clienterror-schema-nosuchindex/</guid>
      <description>One of the features added into the more recent versions of Neo4j’s cypher query language is the ability to tell Cypher which index you’d like to use in your queries.
We’ll use the football dataset, so let’s start by creating an index on the &amp;#39;name&amp;#39; property of nodes labelled &amp;#39;Player&amp;#39;:
CREATE INDEX ON :Player(name) Let’s say we want to write a query to find &amp;#39;Wayne Rooney&amp;#39; while explicitly using this index.</description>
    </item>
    
    <item>
      <title>Java: Work out the serialVersionUID of a class</title>
      <link>https://www.markhneedham.com/blog/2014/01/31/java-work-out-the-serialversionuid-of-a-class/</link>
      <pubDate>Fri, 31 Jan 2014 06:51:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/31/java-work-out-the-serialversionuid-of-a-class/</guid>
      <description>Earlier in the week I wanted to work out the serialVersionUID of a serializable class so that I could override its toString method without breaking everything.
I came across Frank Kim’s blog post which suggested using the serialver tool which comes with the JDK.
I created a little Maven project to test this tool out on a very simple class:
import java.io.Serializable; public class SerialiseMe implements Serializable { } If we compile that class into a JAR and then run the serialver tool we see the following output:</description>
    </item>
    
    <item>
      <title>Neo4j: org.eclipse.jetty.io.EofException - Caused by: java.io.IOException: Broken pipe</title>
      <link>https://www.markhneedham.com/blog/2014/01/27/neo4j-org-eclipse-jetty-io-eofexception-caused-by-java-io-ioexception-broken-pipe/</link>
      <pubDate>Mon, 27 Jan 2014 11:32:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/27/neo4j-org-eclipse-jetty-io-eofexception-caused-by-java-io-ioexception-broken-pipe/</guid>
      <description>From scouring the Neo4j google group and Stack Overflow I’ve noticed that a few people have been hitting the following exception when executing queries against Neo4j server:
SEVERE: The response of the WebApplicationException cannot be utilized as the response is already committed. Re-throwing to the HTTP container javax.ws.rs.WebApplicationException: javax.ws.rs.WebApplicationException: org.eclipse.jetty.io.EofException at org.neo4j.server.rest.repr.OutputFormat$1.write(OutputFormat.java:174) at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:71) at com.sun.jersey.core.impl.provider.entity.StreamingOutputProvider.writeTo(StreamingOutputProvider.java:57) at com.sun.jersey.spi.container.ContainerResponse.write(ContainerResponse.java:306) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1437) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at com.</description>
    </item>
    
    <item>
      <title>Neo4j HA: org.neo4j.graphdb.TransactionFailureException: Timeout waiting for database to allow new transactions. Blocking components (1): []</title>
      <link>https://www.markhneedham.com/blog/2014/01/27/neo4j-ha-org-neo4j-graphdb-transactionfailureexception-timeout-waiting-for-database-to-allow-new-transactions-blocking-components-1/</link>
      <pubDate>Mon, 27 Jan 2014 09:42:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/27/neo4j-ha-org-neo4j-graphdb-transactionfailureexception-timeout-waiting-for-database-to-allow-new-transactions-blocking-components-1/</guid>
      <description>As I mentioned in my previous post, I’ve been spending quite a bit of time working with Neo4j HA and recently came across the following exception in data/graph.db/messages.log:
org.neo4j.graphdb.TransactionFailureException: Timeout waiting for database to allow new transactions. Blocking components (1): [] at org.neo4j.kernel.ha.HighlyAvailableGraphDatabase.beginTx(HighlyAvailableGraphDatabase.java:199) at org.neo4j.kernel.TransactionBuilderImpl.begin(TransactionBuilderImpl.java:43) at org.neo4j.kernel.InternalAbstractGraphDatabase.beginTx(InternalAbstractGraphDatabase.java:949) at org.neo4j.server.rest.transactional.TransactionalRequestDispatcher.dispatch(TransactionalRequestDispatcher.java:52) at com.sun.jersey.server.impl.uri.rules.HttpMethodRule.accept(HttpMethodRule.java:288) at com.sun.jersey.server.impl.uri.rules.ResourceClassRule.accept(ResourceClassRule.java:108) at com.sun.jersey.server.impl.uri.rules.RightHandPathRule.accept(RightHandPathRule.java:147) at com.sun.jersey.server.impl.uri.rules.RootResourceClassesRule.accept(RootResourceClassesRule.java:84) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1469) at com.sun.jersey.server.impl.application.WebApplicationImpl._handleRequest(WebApplicationImpl.java:1400) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1349) at com.sun.jersey.server.impl.application.WebApplicationImpl.handleRequest(WebApplicationImpl.java:1339) at com.sun.jersey.spi.container.servlet.WebComponent.service(WebComponent.java:416) at com.sun.jersey.spi.container.servlet.ServletContainer.service(ServletContainer.java:537) at com.</description>
    </item>
    
    <item>
      <title>Neo4j HA: Election could not pick a winner</title>
      <link>https://www.markhneedham.com/blog/2014/01/24/neo4j-ha-election-could-not-pick-a-winner/</link>
      <pubDate>Fri, 24 Jan 2014 10:30:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/24/neo4j-ha-election-could-not-pick-a-winner/</guid>
      <description>Recently I’ve been spending a reasonable chunk of my time helping people get up and running with their Neo4j High Availability cluster and there’s sometimes confusion around how it should be configured.
A Neo4j cluster typically consists of a master and two slaves and you’d usually have it configured so that any machine can be the master.
However, there is a configuration parameter &amp;#39;ha.slave_only&amp;#39; which can be set to &amp;#39;true&amp;#39; to ensure that a machine will never be elected as master when an election takes place.</description>
    </item>
    
    <item>
      <title>Neo4j Backup: Store copy and consistency check</title>
      <link>https://www.markhneedham.com/blog/2014/01/22/neo4j-backup-store-copy-and-consistency-check/</link>
      <pubDate>Wed, 22 Jan 2014 17:36:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/22/neo4j-backup-store-copy-and-consistency-check/</guid>
      <description>One of the lesser known things about the Neo4j online backup tool, which I wrote about last week, is that conceptually there are two parts to it:
Copying the store files to a location of your choice
Verifying that those store files are consistent.
By default both of these run when you run the &amp;#39;neo4j-backup&amp;#39; script but sometimes it’s useful to be able to run them separately.
If we want to just run the copying the store files part of the process we can tell the backup tool to skip the consistency check by using the &amp;#39;verify&amp;#39; flag:</description>
    </item>
    
    <item>
      <title>Neo4j Backup: java.lang.ClassCastException: org.jboss.netty.buffer.BigEndianHeapChannelBuffer cannot be cast to org.neo4j.cluster.com.message.Message</title>
      <link>https://www.markhneedham.com/blog/2014/01/19/neo4j-backup-java-lang-classcastexception-org-jboss-netty-buffer-bigendianheapchannelbuffer-cannot-be-cast-to-org-neo4j-cluster-com-message-message/</link>
      <pubDate>Sun, 19 Jan 2014 19:29:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/19/neo4j-backup-java-lang-classcastexception-org-jboss-netty-buffer-bigendianheapchannelbuffer-cannot-be-cast-to-org-neo4j-cluster-com-message-message/</guid>
      <description>(as Gabriel points out in the comments the ability to do a &amp;#39;HA backup&amp;#39; doesn’t exist in more recent versions of Neo4j. I’ll leave this post here for people still running on older versions who encounter the error.)
When using Neo4j’s online backup facility there are two ways of triggering it, either by using the &amp;#39;single://&amp;#39; or &amp;#39;ha://&amp;#39; syntax and these behave slightly differently.
If you’re using the &amp;#39;single://&amp;#39; syntax and don’t specify a port then it will connect to &amp;#39;6362&amp;#39; by default:</description>
    </item>
    
    <item>
      <title>Learning about bitmaps</title>
      <link>https://www.markhneedham.com/blog/2014/01/12/learning-about-bitmaps/</link>
      <pubDate>Sun, 12 Jan 2014 17:44:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2014/01/12/learning-about-bitmaps/</guid>
      <description>A few weeks ago Alistair and I were working on the code used to model the labels that a node has attached to it in a Neo4j database.
The way this works is that chunks of 32 node ids are represented as a 32 bit bitmap for each label where a 1 for a bit means that a node has the label and a 0 means that it doesn’t.</description>
    </item>
    
    <item>
      <title>RxJava: From Future to Observable</title>
      <link>https://www.markhneedham.com/blog/2013/12/28/rxjava-from-future-to-observable/</link>
      <pubDate>Sat, 28 Dec 2013 21:46:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/12/28/rxjava-from-future-to-observable/</guid>
      <description>I first came across Reactive Extensions about 4 years ago on Matthew Podwysocki’s blog but then hadn’t heard much about it until I saw Matthew give a talk at Code Mesh a few weeks ago.
It seems to have grown in popularity recently and I noticed that there’s now a Java version called RxJava written by Netflix.
I thought I’d give it a try by changing some code I wrote while exploring cypher’s MERGE function to expose an Observable instead of Futures.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Using MERGE with schema indexes/constraints</title>
      <link>https://www.markhneedham.com/blog/2013/12/23/neo4j-cypher-using-merge-with-schema-indexesconstraints/</link>
      <pubDate>Mon, 23 Dec 2013 13:30:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/12/23/neo4j-cypher-using-merge-with-schema-indexesconstraints/</guid>
      <description>A couple of weeks about I wrote about cypher’s MERGE function and over the last few days I’ve been exploring how it works when used with schema indexes and unique constraints.
A common use case with Neo4j is to model users and events where an event could be a tweet, Facebook post or Pinterest pin. The model might look like this:
We’d have a stream of (user, event) pairs and a cypher statement like the following to get the data into Neo4j:</description>
    </item>
    
    <item>
      <title>Supporting production code: Start with the simple things</title>
      <link>https://www.markhneedham.com/blog/2013/12/20/supporting-production-code-start-with-the-simple-things/</link>
      <pubDate>Fri, 20 Dec 2013 18:07:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/12/20/supporting-production-code-start-with-the-simple-things/</guid>
      <description>A few months ago I wrote about my experiences supporting production code while working at uSwitch.
Since then I’ve been working on support for Neo4j customers and I’ve realised that there are a couple of other things to keep in mind while debugging production problems that I missed from the initial list.
Keep a clear head / Hold back your assumptions The first is that it’s very helpful to completely clear your head of any assumptions when looking at a problem.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Getting the hang of MERGE</title>
      <link>https://www.markhneedham.com/blog/2013/12/10/neo4j-cypher-getting-the-hang-of-merge/</link>
      <pubDate>Tue, 10 Dec 2013 23:46:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/12/10/neo4j-cypher-getting-the-hang-of-merge/</guid>
      <description>I’ve been trying to get the hang of cypher’s MERGE function and started out by writing a small file to import some people with random properties using the java-faker library.
public class Merge { private static Label PERSON = DynamicLabel.label(&amp;#34;Person&amp;#34;); public static void main(String[] args) throws IOException { File dbFile = new File(&amp;#34;/tmp/test-db&amp;#34;); FileUtils.deleteRecursively(dbFile); Faker faker = new Faker(); Random random = new Random(); GraphDatabaseService db = new GraphDatabaseFactory().newEmbeddedDatabase(dbFile.getPath()); Transaction tx = db.</description>
    </item>
    
    <item>
      <title>Neo4j: What is a node?</title>
      <link>https://www.markhneedham.com/blog/2013/11/29/neo4j-what-is-a-node/</link>
      <pubDate>Fri, 29 Nov 2013 19:50:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/29/neo4j-what-is-a-node/</guid>
      <description>One of the first things I needed to learn when I started using Neo4j was how to model my domain using nodes and relationships and it wasn’t initially obvious to me what things should be nodes.
Luckily Ian Robinson showed me a mini-algorithm which I found helpful for getting started. The steps are as follows:
Write out the questions you want to ask
Highlight/underline the nouns
Those are your nodes!</description>
    </item>
    
    <item>
      <title>Neo4j: The case of neo4j-shell and the invisible text ft. Windows and the neo4j-desktop</title>
      <link>https://www.markhneedham.com/blog/2013/11/29/neo4j-the-case-of-windows-neo4j-desktop-and-the-invisible-text/</link>
      <pubDate>Fri, 29 Nov 2013 17:08:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/29/neo4j-the-case-of-windows-neo4j-desktop-and-the-invisible-text/</guid>
      <description>I’ve been playing around with Neo4j on a Windows VM recently and I wanted to launch neo4j-shell to run a few queries.
The neo4j-shell script isn’t shipped with Neo4j desktop, which I used to install Neo4j on my VM, but we can still launch it from the Windows Command Prompt with the following command: C:\Users\Mark&amp;gt; cd &amp;#34;C:\Program Files\Neo4j Community&amp;#34; C:\Program Files\Neo4j Community&amp;gt;jre\bin\java -cp bin\neo4j-desktop-2.0.0-RC1.jar org.neo4j.shell.StartClient Welcome to the Neo4j Shell!</description>
    </item>
    
    <item>
      <title>Neo4j: Modelling &#39;series&#39; of events</title>
      <link>https://www.markhneedham.com/blog/2013/11/29/neo4j-modelling-series-of-events/</link>
      <pubDate>Fri, 29 Nov 2013 00:51:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/29/neo4j-modelling-series-of-events/</guid>
      <description>One of the things I’ve never worked out how to model in my football graph is series of matches so that I could answer questions like the following:
How many goals has Robin Van Persie scored in his last 10 matches in the Barclays Premier League?
A brute force approach would be to get all the matches featuring Robin Van Persie in a certain competition, order them by date and take the top ten which would work but doesn’t feel very graph.</description>
    </item>
    
    <item>
      <title>Neo4j: The &#39;thinking in graphs&#39; curve</title>
      <link>https://www.markhneedham.com/blog/2013/11/27/neo4j-the-thinking-in-graphs-curve/</link>
      <pubDate>Wed, 27 Nov 2013 23:09:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/27/neo4j-the-thinking-in-graphs-curve/</guid>
      <description>In a couple of Neo4j talks I’ve done recently I’ve been asked how long it takes to get used to modelling data in graphs and whether I felt it’s simpler than alternative approaches.
My experience of &amp;#39;thinking in graphs&amp;#39;™ closely mirrors what I believe is a fairly common curve when learning technologies which change the way you think:
There is an initial stage where it seems really hard because it’s different to what we’re used to and at this stage we might want to go back to what we’re used to.</description>
    </item>
    
    <item>
      <title>Neo4j: Using aliases to handle messy data</title>
      <link>https://www.markhneedham.com/blog/2013/11/26/neo4j-using-aliases-to-handle-messy-data/</link>
      <pubDate>Tue, 26 Nov 2013 00:12:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/26/neo4j-using-aliases-to-handle-messy-data/</guid>
      <description>One of the common problems when building data heavy applications is that names of things in the domain are often named differently depending on which system you get the data from.
This means that we’ll typically end up running the data from different sources through a normalisation process to ensure that we have consistent naming in the database:
I’ve recently started linking the football stadium a match was played in to the match in my football graph but unfortunately different match compilers use different spellings or even names for the same stadium.</description>
    </item>
    
    <item>
      <title>Neo4j 2.0.0-M06 -&gt; 2.0.0-RC1: Optional relationships with OPTIONAL MATCH</title>
      <link>https://www.markhneedham.com/blog/2013/11/23/neo4j-2-0-0-m06-2-0-0-rc1-optional-relationships-with-optional-match/</link>
      <pubDate>Sat, 23 Nov 2013 22:54:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/23/neo4j-2-0-0-m06-2-0-0-rc1-optional-relationships-with-optional-match/</guid>
      <description>One of the breaking changes in Neo4j 2.0.0-RC1 compared to previous versions is that the -[?]-&amp;gt; syntax for matching optional relationships has been retired and replaced with the OPTIONAL MATCH construct (http://docs.neo4j.org/chunked/milestone/query-optional-match.html).
An example where we might want to match an optional relationship could be if we want to find colleagues that we haven’t worked with given the following model:
Suppose we have the following data set:
CREATE (steve:Person {name: &amp;#34;Steve&amp;#34;}) CREATE (john:Person {name: &amp;#34;John&amp;#34;}) CREATE (david:Person {name: &amp;#34;David&amp;#34;}) CREATE (paul:Person {name: &amp;#34;Paul&amp;#34;}) CREATE (sam:Person {name: &amp;#34;Sam&amp;#34;}) CREATE (londonOffice:Office {name: &amp;#34;London Office&amp;#34;}) CREATE UNIQUE (steve)-[:WORKS_IN]-&amp;gt;(londonOffice) CREATE UNIQUE (john)-[:WORKS_IN]-&amp;gt;(londonOffice) CREATE UNIQUE (david)-[:WORKS_IN]-&amp;gt;(londonOffice) CREATE UNIQUE (paul)-[:WORKS_IN]-&amp;gt;(londonOffice) CREATE UNIQUE (sam)-[:WORKS_IN]-&amp;gt;(londonOffice) CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]-&amp;gt;(john) CREATE UNIQUE (steve)-[:COLLEAGUES_WITH]-&amp;gt;(david) We might write the following query to find people from the same office as Steve but that he hasn’t worked with:</description>
    </item>
    
    <item>
      <title>Neo4j 2.0.0-M06 -&gt; 2.0.0-RC1: Working with path expressions</title>
      <link>https://www.markhneedham.com/blog/2013/11/23/neo4j-2-0-0-m06-2-0-0-rc1-working-with-path-expressions/</link>
      <pubDate>Sat, 23 Nov 2013 10:30:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/23/neo4j-2-0-0-m06-2-0-0-rc1-working-with-path-expressions/</guid>
      <description>We recently released Neo4j 2.0.0-RC1 and since there were some breaking changes from Neo4j 2.0.0-M06 I decided to check if I needed to update any of my football graph queries.
One query which no longer worked as I expected was the following one which calculated the top goal scorers for televised games:
MATCH (player:Player)-[:played|subbed_on]-&amp;gt;stats WITH stats.goals AS goals, player, stats-[:in]-&amp;gt;()-[:on_tv]-() as onTv RETURN player.name, SUM(CASE WHEN onTv = FALSE THEN goals ELSE 0 END) as nonTvGoals, SUM(CASE WHEN onTv = TRUE THEN goals ELSE 0 END) as tvGoals, SUM(goals) as allGoals ORDER BY tvGoals DESC LIMIT 10 This is what that section of the graph looks like visually:</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Creating relationships between nodes from adjacent rows in a query</title>
      <link>https://www.markhneedham.com/blog/2013/11/22/neo4j-cypher-creating-relationships-between-nodes-from-adjacent-rows-in-a-query/</link>
      <pubDate>Fri, 22 Nov 2013 22:45:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/22/neo4j-cypher-creating-relationships-between-nodes-from-adjacent-rows-in-a-query/</guid>
      <description>I want to introduce the concept of a season into my graph so I can import matches for multiple years and then vary the time period which queries take into account.
I started by creating season nodes like this:
CREATE (:Season {name: &amp;#34;2013/2014&amp;#34;, timestamp: 1375315200}) CREATE (:Season {name: &amp;#34;2012/2013&amp;#34;, timestamp: 1343779200}) CREATE (:Season {name: &amp;#34;2011/2012&amp;#34;, timestamp: 1312156800}) CREATE (:Season {name: &amp;#34;2010/2011&amp;#34;, timestamp: 1280620800}) CREATE (:Season {name: &amp;#34;2009/2010&amp;#34;, timestamp: 1249084800}) I wanted to add a &amp;#39;NEXT&amp;#39; relationship between the seasons so that I could have an in graph season index which would allow me to write queries like the following:</description>
    </item>
    
    <item>
      <title>Java: Schedule a job to run on a time interval</title>
      <link>https://www.markhneedham.com/blog/2013/11/17/java-schedule-a-job-to-run-on-a-time-interval/</link>
      <pubDate>Sun, 17 Nov 2013 22:58:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/17/java-schedule-a-job-to-run-on-a-time-interval/</guid>
      <description>Recently I’ve spent some time building a set of tests around rolling upgrades between Neo4j versions and as part of that I wanted to log the state of the cluster as the upgrade was happening.
The main thread of the test blocks waiting until the upgrade is done so I wanted to log on another thread every few seconds. Alistair pointed me at the ScheduledExecutorService (http://docs.oracle.com/javase/7/docs/api/java/util/concurrent/ScheduledExecutorService.html) which worked quite nicely.</description>
    </item>
    
    <item>
      <title>Git: Viewing the last commit on all the tags</title>
      <link>https://www.markhneedham.com/blog/2013/11/16/git-viewing-the-last-commit-on-all-the-tags/</link>
      <pubDate>Sat, 16 Nov 2013 21:58:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/16/git-viewing-the-last-commit-on-all-the-tags/</guid>
      <description>A couple of days ago I was curious when different versions of Neo4j had been released and although the release notes page was helpful I thought I’d find more detailed information if I looked up the git tags.
Assuming that we’ve already got a clone of the repository on our machine:
$ git clone git@github.com:neo4j/neo4j.git We can pull down the latest tags by calling git fetch --tags or git fetch -t</description>
    </item>
    
    <item>
      <title>Python: Making scikit-learn and pandas play nice</title>
      <link>https://www.markhneedham.com/blog/2013/11/09/python-making-scikit-learn-and-pandas-play-nice/</link>
      <pubDate>Sat, 09 Nov 2013 13:58:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/09/python-making-scikit-learn-and-pandas-play-nice/</guid>
      <description>In the last post (http://www.markhneedham.com/blog/2013/10/30/kaggle-titanic-python-pandas-attempt/) I wrote about Nathan and my attempts at the Kaggle Titanic Problem (http://www.kaggle.com/c/titanic-gettingStarted) and mentioned that our next step was to try out scikit-learn (http://scikit-learn.org/stable/tutorial/), so I thought I should summarise where we’ve got up to.
We needed to write a classification algorithm to work out whether a person onboard the Titanic survived and luckily scikit-learn has extensive documentation on each of the algorithms (http://scikit-learn.org/stable/supervised_learning.html#supervised-learning).
Unfortunately almost all those examples use http://www.</description>
    </item>
    
    <item>
      <title>Python: Scoping variables to use with timeit</title>
      <link>https://www.markhneedham.com/blog/2013/11/09/python-scoping-variables-to-use-with-timeit/</link>
      <pubDate>Sat, 09 Nov 2013 11:01:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/09/python-scoping-variables-to-use-with-timeit/</guid>
      <description>I’ve been playing around with Python’s timeit library to help benchmark some Neo4j cypher queries but I ran into some problems when trying to give it accessible to variables in my program.
I had the following python script which I would call from the terminal using python top-away-scorers.py:
import query_profiler as qp attempts = [ {&amp;#34;query&amp;#34;: &amp;#39;&amp;#39;&amp;#39;MATCH (player:Player)-[:played]-&amp;gt;stats-[:in]-&amp;gt;game, stats-[:for]-&amp;gt;team WHERE game&amp;lt;-[:away_team]-team RETURN player.name, SUM(stats.goals) AS goals ORDER BY goals DESC LIMIT 10&amp;#39;&amp;#39;&amp;#39;} ] qp.</description>
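The excerpt’s `query_profiler` module is the post’s own helper, so the sketch below uses a stand-in statement instead; it only demonstrates the general mechanism for making script-level variables visible to timeit, via the `globals` parameter (Python 3.5+), since timeit otherwise evaluates its statement in an isolated namespace:

```python
import timeit

data = list(range(1000))

# timeit evaluates its statement in its own namespace, so variables from the
# enclosing script aren't visible unless we hand them over explicitly --
# either via the `setup` string or the `globals` parameter.
elapsed = timeit.timeit("sum(data)", globals=globals(), number=100)
print(f"100 runs took {elapsed:.6f}s")
```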
    </item>
    
    <item>
      <title>Neo4j 2.0.0-M06: Applying Wes Freeman&#39;s Cypher Optimisation tricks</title>
      <link>https://www.markhneedham.com/blog/2013/11/08/neo4j-2-0-0-m06-applying-wes-freemans-cypher-optimisation-tricks/</link>
      <pubDate>Fri, 08 Nov 2013 09:40:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/08/neo4j-2-0-0-m06-applying-wes-freemans-cypher-optimisation-tricks/</guid>
      <description>Wes has been teaching me some of his tricks for tuning Neo4j cypher queries over the last few weeks so I thought I should write up a few examples of the master’s advice in action.
I’ve created a mini benchmarking tool using Python’s timeit and numpy to run different queries multiple times and return the mean, min, max and 95th percentile times.
I’ve made my football data set available in case you want to follow along and we’ll start with a query to find the top goal scorers away from home.</description>
    </item>
    
    <item>
      <title>Python: Generate all combinations of a list</title>
      <link>https://www.markhneedham.com/blog/2013/11/06/python-generate-all-combinations-of-a-list/</link>
      <pubDate>Wed, 06 Nov 2013 07:25:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/06/python-generate-all-combinations-of-a-list/</guid>
      <description>Nathan and I have been playing around with different scikit-learn machine learning classifiers and we wanted to run different combinations of features through each one and work out which gave the best result.
We started with a list of features:
all_columns = [&amp;#34;Fare&amp;#34;, &amp;#34;Sex&amp;#34;, &amp;#34;Pclass&amp;#34;, &amp;#39;Embarked&amp;#39;] itertools#combinations allows us to create combinations with a length of our choice:
&amp;gt;&amp;gt;&amp;gt; import itertools as it &amp;gt;&amp;gt;&amp;gt; list(it.combinations(all_columns, 3)) [(&amp;#39;Fare&amp;#39;, &amp;#39;Sex&amp;#39;, &amp;#39;Pclass&amp;#39;), (&amp;#39;Fare&amp;#39;, &amp;#39;Sex&amp;#39;, &amp;#39;Embarked&amp;#39;), (&amp;#39;Fare&amp;#39;, &amp;#39;Pclass&amp;#39;, &amp;#39;Embarked&amp;#39;), (&amp;#39;Sex&amp;#39;, &amp;#39;Pclass&amp;#39;, &amp;#39;Embarked&amp;#39;)] We wanted to create combinations of arbitrary length, so we wanted to combine a few invocations of that function like this:</description>
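One way to combine those invocations (a sketch, not necessarily the approach the full post settles on) is to chain one `combinations` call per length into a single flat list:

```python
import itertools as it

all_columns = ["Fare", "Sex", "Pclass", "Embarked"]

# One combinations() call per length, chained into a single flat list
all_combinations = list(it.chain.from_iterable(
    it.combinations(all_columns, n)
    for n in range(1, len(all_columns) + 1)
))

print(len(all_combinations))  # 4 + 6 + 4 + 1 = 15 combinations
```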
    </item>
    
    <item>
      <title>Python: matplotlib -  Import error ft2font Symbol not found: _FT_Attach_File (Mac OS X 10.8.3/Mountain Lion)</title>
      <link>https://www.markhneedham.com/blog/2013/11/03/python-matplotlib-import-error-ft2font-symbol-not-found-_ft_attach_file-mac-os-x-10-8-3mountain-lion/</link>
      <pubDate>Sun, 03 Nov 2013 11:14:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/03/python-matplotlib-import-error-ft2font-symbol-not-found-_ft_attach_file-mac-os-x-10-8-3mountain-lion/</guid>
      <description>As I mentioned at the end of my last post about the Titanic Kaggle problem our next step was to do some proper machine learning™ using scikit-learn so I started by looking at the Decision Tree example.
Unfortunately I ended up on the mother of all yak shaving missions while trying to execute the code which draws a chart using matplotlib.
I ran the following line from the tutorial:</description>
    </item>
    
    <item>
      <title>Neo4j: A first attempt at retail product substitution</title>
      <link>https://www.markhneedham.com/blog/2013/11/01/neo4j-a-first-attempt-at-retail-product-substitution/</link>
      <pubDate>Fri, 01 Nov 2013 20:41:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/11/01/neo4j-a-first-attempt-at-retail-product-substitution/</guid>
      <description>One of the interesting problems in the world of online shopping from the perspective of the retailer is working out whether there is a suitable substitute product if an ordered item isn’t currently in stock.
Since this problem brings together three types of data - order history, stock levels and products - it seems like it should be a nice fit for Neo4j so I &amp;#39;graphed up&amp;#39; a quick example.</description>
    </item>
    
    <item>
      <title>Kaggle Titanic: Python pandas attempt</title>
      <link>https://www.markhneedham.com/blog/2013/10/30/kaggle-titanic-python-pandas-attempt/</link>
      <pubDate>Wed, 30 Oct 2013 07:26:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/30/kaggle-titanic-python-pandas-attempt/</guid>
      <description>Nathan and I have been looking at Kaggle’s Titanic problem and while working through the Python tutorial Nathan pointed out that we could greatly simplify the code if we used pandas instead.
The problem we had with numpy is that you use integers to reference columns. We spent a lot of time being thoroughly confused as to why something wasn’t working only to realise we were using the wrong column.</description>
    </item>
    
    <item>
      <title>pandas: Adding a column to a DataFrame (based on another DataFrame)</title>
      <link>https://www.markhneedham.com/blog/2013/10/30/pandas-adding-a-column-to-a-dataframe-based-on-another-dataframe/</link>
      <pubDate>Wed, 30 Oct 2013 06:12:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/30/pandas-adding-a-column-to-a-dataframe-based-on-another-dataframe/</guid>
      <description>Nathan and I have been working on the Titanic Kaggle problem using the pandas data analysis library and one thing we wanted to do was add a column to a DataFrame indicating if someone survived.
We had the following (simplified) DataFrame containing some information about customers on board the Titanic:
def addrow(df, row): return df.append(pd.DataFrame(row), ignore_index=True) customers = pd.DataFrame(columns=[&amp;#39;PassengerId&amp;#39;,&amp;#39;Pclass&amp;#39;,&amp;#39;Name&amp;#39;,&amp;#39;Sex&amp;#39;,&amp;#39;Fare&amp;#39;]) customers = addrow(customers, [dict(PassengerId=892, Pclass=3, Name=&amp;#34;Kelly, Mr. James&amp;#34;, Sex=&amp;#34;male&amp;#34;, Fare=7.8292)]) customers = addrow(customers, [dict(PassengerId=893, Pclass=3, Name=&amp;#34;Wilkes, Mrs.</description>
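One common way to add such a column (a minimal sketch with hypothetical survival figures, not the post’s exact code) is to build a `PassengerId -> Survived` lookup from the second DataFrame and map it onto the first:

```python
import pandas as pd

customers = pd.DataFrame([
    {"PassengerId": 892, "Name": "Kelly, Mr. James"},
    {"PassengerId": 893, "Name": "Wilkes, Mrs. James"},
])

# Hypothetical survival figures keyed by PassengerId
survival = pd.DataFrame([
    {"PassengerId": 892, "Survived": 0},
    {"PassengerId": 893, "Survived": 1},
])

# Build a PassengerId -> Survived lookup Series and map it onto customers
lookup = survival.set_index("PassengerId")["Survived"]
customers["Survived"] = customers["PassengerId"].map(lookup)
```

A `merge` on `PassengerId` would work just as well; `map` simply avoids duplicating the join keys.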
    </item>
    
    <item>
      <title>Thinking Fast and Slow - Daniel Kahneman: Book Review</title>
      <link>https://www.markhneedham.com/blog/2013/10/27/thinking-fast-and-slow-daniel-kahneman-book-review/</link>
      <pubDate>Sun, 27 Oct 2013 22:53:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/27/thinking-fast-and-slow-daniel-kahneman-book-review/</guid>
      <description>I picked up Daniel Kahneman’s &amp;#39;Thinking Fast and Slow&amp;#39; after a recommendation by Mike Jones in early 2013 - it’s taken me quite a while to get through it.
The book starts by describing our two styles of thinking...
System 1 — operates automatically and quickly, with little or no effort and no sense of voluntary control.
System 2 — allocates attention to the effortful mental activities that demand it, including complex computations.</description>
    </item>
    
    <item>
      <title>Neo4j: Cypher - Profiling ORDER BY LIMIT vs LIMIT</title>
      <link>https://www.markhneedham.com/blog/2013/10/27/neo4j-cypher-profiling-order-by-limit-vs-limit/</link>
      <pubDate>Sun, 27 Oct 2013 00:33:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/27/neo4j-cypher-profiling-order-by-limit-vs-limit/</guid>
      <description>Something I’ve seen people get confused by when writing queries using Neo4j’s cypher query language is the sometimes significant difference in query execution time when using &amp;#39;LIMIT&amp;#39; on its own compared to using it in combination with &amp;#39;ORDER BY&amp;#39;.
The confusion is centred around the fact that at first glance it seems like the only thing different between these queries is the sorting of the rows but there’s actually more to it.</description>
    </item>
    
    <item>
      <title>Neo4j: Making implicit relationships explicit &amp; bidirectional relationships</title>
      <link>https://www.markhneedham.com/blog/2013/10/25/neo4j-making-implicit-relationships-explicit-bidirectional-relationships/</link>
      <pubDate>Fri, 25 Oct 2013 16:03:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/25/neo4j-making-implicit-relationships-explicit-bidirectional-relationships/</guid>
      <description>I recently read Michal Bachman’s post about bidirectional relationships in Neo4j in which he suggests that for some relationship types we’re not that interested in the relationship’s direction and can therefore ignore it when querying.
He uses the following example showing the partnership between Neo Technology and GraphAware:
Both companies are partners with each other but since we can just as quickly find incoming and outgoing relationships we may as well just have one relationship between the two companies/nodes.</description>
    </item>
    
    <item>
      <title>Neo4j: Modelling hyper edges in a property graph</title>
      <link>https://www.markhneedham.com/blog/2013/10/22/neo4j-modelling-hyper-edges-in-a-property-graph/</link>
      <pubDate>Tue, 22 Oct 2013 22:02:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/22/neo4j-modelling-hyper-edges-in-a-property-graph/</guid>
      <description>At the Graph Database meet up in Antwerp last week we discussed how you would model a hyper edge in a property graph like Neo4j and I realised that I’d done this in my football graph without realising.
A hyper edge is defined as follows:
A hyperedge is a connection between two or more vertices, or nodes, of a hypergraph. A hypergraph is a graph in which generalized edges (called hyperedges) may connect more than two nodes with discrete properties.</description>
    </item>
    
    <item>
      <title>Neo4j 2.0: Labels, indexes and the like</title>
      <link>https://www.markhneedham.com/blog/2013/10/22/neo4j-2-0-labels-indexes-and-the-like/</link>
      <pubDate>Tue, 22 Oct 2013 20:20:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/22/neo4j-2-0-labels-indexes-and-the-like/</guid>
      <description>Last week I did a couple of talks about modelling with Neo4j meet ups in Amsterdam and Antwerp and there were a few questions about how indexing works with labels that are being introduced in Neo4j 2.0
As well as defining properties on nodes we can also assign them a label which can be used to categorise different groups of nodes.
For example in the football graph we might choose to tag player nodes with the label &amp;#39;Player&amp;#39;:</description>
    </item>
    
    <item>
      <title>Neo4j: Testing an unmanaged extension using CommunityServerBuilder</title>
      <link>https://www.markhneedham.com/blog/2013/10/20/neo4j-testing-an-unmanaged-extension-using-communitserverbuilder/</link>
      <pubDate>Sun, 20 Oct 2013 21:46:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/20/neo4j-testing-an-unmanaged-extension-using-communitserverbuilder/</guid>
      <description>I’ve been playing around with Neo4j unmanaged extensions recently and I wanted to be able to check that it worked properly without having to deploy it to a real Neo4j server.
I’d previously used ImpermanentGraphDatabase (http://grepcode.com/file/repo1.maven.org/maven2/org.neo4j/neo4j-kernel/1.2-1.2/org/neo4j/kernel/ImpermanentGraphDatabase.java) when using Neo4j embedded, and Ian pointed me towards CommunityServerBuilder, which allows us to do a similar thing in the Neo4j server world.
I’ve created an example of a dummy unmanaged extension and test showing this approach but it’s reasonably simple.</description>
    </item>
    
    <item>
      <title>Neo4j: Accessing JMX beans via HTTP</title>
      <link>https://www.markhneedham.com/blog/2013/10/20/neo4j-accessing-jmx-beans-via-http/</link>
      <pubDate>Sun, 20 Oct 2013 11:13:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/20/neo4j-accessing-jmx-beans-via-http/</guid>
      <description>One of the additional features that Neo4j enterprise provides is access to various JMX properties which describe various aspects of the database.
These would typically be accessed using jConsole or similar, but some monitoring tools aren’t able to use the JMX hook and an HTTP interface would work better.
Luckily Neo4j server does expose the JMX beans and we can get a list of URIs to query by hitting the following URI:</description>
    </item>
    
    <item>
      <title>Neo4j: Exploring new data sets with help from Neo4j browser</title>
      <link>https://www.markhneedham.com/blog/2013/10/18/neo4j-exploring-new-data-sets-with-help-from-neo4j-browser/</link>
      <pubDate>Fri, 18 Oct 2013 11:43:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/18/neo4j-exploring-new-data-sets-with-help-from-neo4j-browser/</guid>
      <description>One of the things that I’ve found difficult when looking at a new Neo4j database is working out the structure of the data it contains.
I’m used to relational databases where you can easily get a list of the tables and the foreign keys that allow you to join them to each other.
This has traditionally been difficult when using Neo4j but with the release of the Neo4j browser we can now easily get this type of overview by clicking on the Neo4j icon at the top left of the browser.</description>
    </item>
    
    <item>
      <title>neo4j: Setting query timeout</title>
      <link>https://www.markhneedham.com/blog/2013/10/17/neo4j-setting-query-timeout/</link>
      <pubDate>Thu, 17 Oct 2013 06:47:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/17/neo4j-setting-query-timeout/</guid>
      <description>Updated December 2015 When I initially wrote this post in 2013 this was an experimental feature that worked using the Neo4j 1.9 series but no longer does in more recent Neo4j versions (2.2, 2.3). The terminating a running transaction page in the docs describes the supported way of terminating queries.
- - - - - - - - - -
When I was first learning cypher, neo4j’s query language, I frequently wrote queries which traversed the whole graph multiple times and &amp;#39;hung&amp;#39; for hours as they were evaluated.</description>
    </item>
    
    <item>
      <title>Java: Incrementally read/stream a CSV file</title>
      <link>https://www.markhneedham.com/blog/2013/10/14/java-incrementally-readstream-a-csv-file/</link>
      <pubDate>Mon, 14 Oct 2013 07:27:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/14/java-incrementally-readstream-a-csv-file/</guid>
      <description>I’ve been doing some work which involves reading in CSV files, for which I’ve been using OpenCSV, and my initial approach was to read through the file line by line, parse the contents and save it into a list of maps.
This works when the contents of the file fit into memory but is problematic for larger files where I needed to stream the file and process each line individually rather than all of them after the file was loaded.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Getting rid of an optional match</title>
      <link>https://www.markhneedham.com/blog/2013/10/13/neo4jcypher-getting-rid-of-an-optional-match/</link>
      <pubDate>Sun, 13 Oct 2013 21:59:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/13/neo4jcypher-getting-rid-of-an-optional-match/</guid>
      <description>I was looking back over some of the queries I wrote for my football data set and I came across one I’d written to work out how many goals players scored in matches that were televised.
The data model looks like this:
My initial query to work out the top 10 scorers in televised games was as follows:
MATCH (player:Player) WITH player MATCH player-[:played|subbed_on]-&amp;gt;stats-[:in]-&amp;gt;game-[t?:on_tv]-&amp;gt;channel WITH COLLECT({goals: stats.goals, type: TYPE(t)}) AS games, player RETURN player.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Converting queries from 1.9 to 2.0 -  &#39;Can&#39;t use optional patterns without explicit START clause&#39;</title>
      <link>https://www.markhneedham.com/blog/2013/10/03/neo4jcypher-converting-queries-from-1-9-to-2-0-cant-use-optional-patterns-without-explicit-start-clause/</link>
      <pubDate>Thu, 03 Oct 2013 16:16:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/10/03/neo4jcypher-converting-queries-from-1-9-to-2-0-cant-use-optional-patterns-without-explicit-start-clause/</guid>
      <description>I’ve been playing around with the most recent Neo4j 2.0 milestone release - 2.0.0-M05 - and one of the first things I did was translate the queries from my football data set which were written against Neo4j 1.9.
The following query calculates the number of goals scored by players in matches that were shown on television, not on television and in total.
START player=node:players(&amp;#39;name:*&amp;#39;) MATCH player-[:played|subbed_on]-&amp;gt;stats-[:in]-&amp;gt;game-[t?:on_tv]-&amp;gt;channel WITH COLLECT([stats.goals, TYPE(t)]) AS games, player RETURN player.</description>
    </item>
    
    <item>
      <title>On Writing Well - William Zinsser: Book Review</title>
      <link>https://www.markhneedham.com/blog/2013/09/30/on-writing-well-william-zinsser-book-review/</link>
      <pubDate>Mon, 30 Sep 2013 22:48:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/30/on-writing-well-william-zinsser-book-review/</guid>
      <description>I first came across William Zinsser’s &amp;#39;On Writing Well&amp;#39; about a year ago, but put it down having flicked through a couple of the chapters that I felt were relevant.
It came back onto my radar a month ago and this time I decided to read it cover to cover as I was sure there were some insights that I’d missed due to my haphazard approach the first time around.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Translating 1.9 FILTER queries to use 2.0 list comprehensions</title>
      <link>https://www.markhneedham.com/blog/2013/09/30/neo4jcypher-translating-1-9-filter-queries-to-use-2-0-list-comprehensions/</link>
      <pubDate>Mon, 30 Sep 2013 21:34:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/30/neo4jcypher-translating-1-9-filter-queries-to-use-2-0-list-comprehensions/</guid>
      <description>I was looking back over some cypher queries I’d written earlier in the year against my football data set to find some examples of where list comprehensions could be useful and I came across this query which is used to work out which teams were the most badly behaved in terms of accumulating red and yellow cards:
START team = node:teams(&amp;#39;name:*&amp;#39;) MATCH team&amp;lt;-[:for]-like_this&amp;lt;-[:started|as_sub]-player-[r?:sent_off_in|booked_in]-&amp;gt;game&amp;lt;-[:in]-like_this WITH team, COLLECT(r) AS cards WITH team, FILTER(x IN cards: TYPE(x) = &amp;#34;sent_off_in&amp;#34;) AS reds, FILTER(x IN cards: TYPE(x) = &amp;#34;booked_in&amp;#34;) AS yellows RETURN team.</description>
    </item>
    
    <item>
      <title>Elo Rating System: Ranking Champions League teams using Clojure Part 2</title>
      <link>https://www.markhneedham.com/blog/2013/09/30/elo-rating-system-ranking-champions-league-teams-using-clojure-part-2/</link>
      <pubDate>Mon, 30 Sep 2013 20:26:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/30/elo-rating-system-ranking-champions-league-teams-using-clojure-part-2/</guid>
      <description>A few weeks ago I wrote about ranking Champions League teams using the Elo Rating algorithm, and since I wrote that post I’ve collated data for 10 years worth of matches so I thought an update was in order.
After extracting the details of all those matches I saved them to a JSON file so that I wouldn’t have to parse the HTML pages every time I tweaked the algorithm.</description>
    </item>
    
    <item>
      <title>Clojure: Writing JSON to a file - &#34;Exception Don&#39;t know how to write JSON of class org.joda.time.DateTime&#34;</title>
      <link>https://www.markhneedham.com/blog/2013/09/26/clojure-writing-json-to-a-file-exception-dont-know-how-to-write-json-of-class-org-joda-time-datetime/</link>
      <pubDate>Thu, 26 Sep 2013 19:11:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/26/clojure-writing-json-to-a-file-exception-dont-know-how-to-write-json-of-class-org-joda-time-datetime/</guid>
      <description>As I mentioned in an earlier post I’ve been transforming Clojure hash’s into JSON strings using data.json but ran into trouble while trying to parse a hash which contained a Joda Time DateTime instance.
The date in question was constructed like this:
(ns json-date-example (:require [clj-time.format :as f]) (:require [clojure.data.json :as json])) (defn as-date [date-field] (f/parse (f/formatter &amp;#34;dd MMM YYYY&amp;#34;) date-field )) (def my-date (as-date &amp;#34;18 Mar 2012&amp;#34;)) And when I tried to convert a hash containing that object into a string I got the following exception:</description>
    </item>
    
    <item>
      <title>Clojure: Writing JSON to a file/reading JSON from a file</title>
      <link>https://www.markhneedham.com/blog/2013/09/26/clojure-writing-json-to-a-filereading-json-from-a-file/</link>
      <pubDate>Thu, 26 Sep 2013 07:47:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/26/clojure-writing-json-to-a-filereading-json-from-a-file/</guid>
      <description>A few weeks ago I described how I’d scraped football matches using Clojure’s Enlive, and the next step after translating the HTML representation into a Clojure map was to save it as a JSON document.
I decided to follow a two step process to achieve this:
Convert hash to JSON string
Write JSON string to file
I imagine there’s probably a way to convert the hash to a stream and pipe that into a file but my JSON document isn’t very large so I think this way is ok for now.</description>
    </item>
    
    <item>
      <title>cURL: POST/Upload multi part form</title>
      <link>https://www.markhneedham.com/blog/2013/09/23/curl-postupload-multi-part-form/</link>
      <pubDate>Mon, 23 Sep 2013 22:16:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/23/curl-postupload-multi-part-form/</guid>
      <description>I’ve been doing some work which involved uploading a couple of files from a HTML form and I wanted to check that the server side code was working by executing a cURL command rather than using the browser.
The form looks like this:
&amp;lt;form action=&amp;#34;http://foobar.com&amp;#34; method=&amp;#34;POST&amp;#34; enctype=&amp;#34;multipart/form-data&amp;#34;&amp;gt; &amp;lt;p&amp;gt; &amp;lt;label for=&amp;#34;nodes&amp;#34;&amp;gt;File 1:&amp;lt;/label&amp;gt; &amp;lt;input type=&amp;#34;file&amp;#34; name=&amp;#34;file1&amp;#34; id=&amp;#34;file1&amp;#34;&amp;gt; &amp;lt;/p&amp;gt; &amp;lt;p&amp;gt; &amp;lt;label for=&amp;#34;relationships&amp;#34;&amp;gt;File 2:&amp;lt;/label&amp;gt; &amp;lt;input type=&amp;#34;file&amp;#34; name=&amp;#34;file2&amp;#34; id=&amp;#34;file2&amp;#34;&amp;gt; &amp;lt;/p&amp;gt; &amp;lt;input type=&amp;#34;submit&amp;#34; name=&amp;#34;submit&amp;#34; value=&amp;#34;Submit&amp;#34;&amp;gt; &amp;lt;/form&amp;gt; If we convert the POST request from the browser into a cURL equivalent we end up with the following:</description>
    </item>
    
    <item>
      <title>Clojure: Anonymous functions using short notation and the &#39;ArityException Wrong number of args (0) passed to: PersistentVector&#39;</title>
      <link>https://www.markhneedham.com/blog/2013/09/23/clojure-anonymous-functions-using-short-notation-and-the-arityexception-wrong-number-of-args-0-passed-to-persistentvector/</link>
      <pubDate>Mon, 23 Sep 2013 21:42:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/23/clojure-anonymous-functions-using-short-notation-and-the-arityexception-wrong-number-of-args-0-passed-to-persistentvector/</guid>
      <description>In the time I’ve spent playing around with Clojure one thing I’ve always got confused by is the error message you get when trying to return a vector using the anonymous function shorthand.
For example, if we want a function which creates a vector with the values 1, 2, and the argument passed into the function, we could write the following:
&amp;gt; ((fn [x] [1 2 x]) 6) [1 2 6] However, when I tried to convert it to the shorthand &amp;#39;#()&amp;#39; syntax I got the following exception:</description>
    </item>
    
    <item>
      <title>Clojure/Emacs/nrepl: Stacktrace-less error messages</title>
      <link>https://www.markhneedham.com/blog/2013/09/22/clojureemacsnrepl-stacktrace-less-error-messages/</link>
      <pubDate>Sun, 22 Sep 2013 23:07:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/22/clojureemacsnrepl-stacktrace-less-error-messages/</guid>
      <description>Ever since I started using the Emacs + nrepl combination to play around with Clojure I’ve been getting fairly non descript error messages whenever I pass the wrong parameters to a function.
For example if I try to update a non existent key in a form I get a Null Pointer Exception:
&amp;gt; (update-in {} [:mark] inc) NullPointerException clojure.lang.Numbers.ops (Numbers.java:942) In this case it’s clear that the hash doesn’t have a key &amp;#39;:mark&amp;#39; so the function blows up.</description>
    </item>
    
    <item>
      <title>Clojure/Emacs/nrepl: Ctrl X &#43; Ctrl E leads to &#39;FileNotFoundException Could not locate [...] on classpath&#39;</title>
      <link>https://www.markhneedham.com/blog/2013/09/22/clojureemacsnrepl-ctrl-x-ctrl-e-leads-to-filenotfoundexception-could-not-locate-on-classpath/</link>
      <pubDate>Sun, 22 Sep 2013 21:23:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/22/clojureemacsnrepl-ctrl-x-ctrl-e-leads-to-filenotfoundexception-could-not-locate-on-classpath/</guid>
      <description>I’ve been playing around with Clojure using Emacs and nrepl recently and my normal work flow is to write some code in Emacs and then have it evaluated in nrepl by typing Ctrl X + Ctrl E at the end of the function.
I tried this once recently and got the following exception instead of a successful evaluation:
FileNotFoundException Could not locate ranking_algorithms/ranking__init.class or ranking_algorithms/ranking.clj on classpath: clojure.lang.RT.load (RT.java:432) I was a bit surprised because I had nrepl running already (via (Meta + X) + Enter + nrepl-jack-in) and I’d only ever seen that exception refer to dependencies which weren’t in my project.</description>
    </item>
    
    <item>
      <title>Clojure: Stripping all the whitespace</title>
      <link>https://www.markhneedham.com/blog/2013/09/22/clojure-stripping-all-the-whitespace/</link>
      <pubDate>Sun, 22 Sep 2013 18:54:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/22/clojure-stripping-all-the-whitespace/</guid>
      <description>When putting together data sets to play around with, one of the more boring tasks is stripping out characters that you’re not interested in and more often than not those characters are white spaces.
Since I’ve been building data sets using Clojure I wanted to write a function that would do this for me.
I started out with the following string:
(def word &amp;#34; with a little bit of space we can make it through the night &amp;#34;) which I wanted to format in such a way that there would be a maximum of one space between each word.</description>
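The post itself does this in Clojure; purely for comparison, the same normalisation is a one-liner in Python, since `str.split` with no arguments already splits on any run of whitespace and drops the empty pieces:

```python
word = "  with a     little bit of    space  "

# split() with no arguments collapses runs of whitespace and trims the ends
normalised = " ".join(word.split())
print(normalised)  # "with a little bit of space"
```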
    </item>
    
    <item>
      <title>Clojure: Converting an array/set into a hash map</title>
      <link>https://www.markhneedham.com/blog/2013/09/20/clojure-converting-an-arrayset-into-a-hash-map/</link>
      <pubDate>Fri, 20 Sep 2013 21:13:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/20/clojure-converting-an-arrayset-into-a-hash-map/</guid>
      <description>When I was implementing the Elo Rating algorithm a few weeks ago one thing I needed to do was come up with a base ranking for each team.
I started out with a set of teams that looked like this:
(def teams #{ &amp;#34;Man Utd&amp;#34; &amp;#34;Man City&amp;#34; &amp;#34;Arsenal&amp;#34; &amp;#34;Chelsea&amp;#34;}) and I wanted to transform that into a map from the team to their ranking e.g.
Man Utd -&amp;gt; {:points 1200} Man City -&amp;gt; {:points 1200} Arsenal -&amp;gt; {:points 1200} Chelsea -&amp;gt; {:points 1200} I had read the documentation of http://clojuredocs.</description>
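The post works through this in Clojure; the equivalent transformation in Python (shown only for comparison) is a dict comprehension that gives every team the same base ranking:

```python
teams = {"Man Utd", "Man City", "Arsenal", "Chelsea"}

# One ranking entry per team, all starting from the same base score
rankings = {team: {"points": 1200} for team in teams}
```

Note that each team gets its own fresh `{"points": 1200}` dict, so updating one team’s ranking later won’t affect the others.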
    </item>
    
    <item>
      <title>Clojure: Converting a string to a date</title>
      <link>https://www.markhneedham.com/blog/2013/09/20/clojure-converting-a-string-to-a-date/</link>
      <pubDate>Fri, 20 Sep 2013 07:00:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/20/clojure-converting-a-string-to-a-date/</guid>
      <description>I wanted to do some date manipulation in Clojure recently and figured that since clj-time is a wrapper around Joda Time it’d probably do the trick.
The first thing we need to do is add the dependency to our project file and then run lein reps to pull down the appropriate JARs. The project file should look something like this:
project.clj
(defproject ranking-algorithms &amp;#34;0.1.0-SNAPSHOT&amp;#34; :license {:name &amp;#34;Eclipse Public License&amp;#34; :url &amp;#34;http://www.</description>
    </item>
    
    <item>
      <title>Clojure: See every step of a reduce</title>
      <link>https://www.markhneedham.com/blog/2013/09/19/clojure-see-every-step-of-a-reduce/</link>
      <pubDate>Thu, 19 Sep 2013 23:57:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/19/clojure-see-every-step-of-a-reduce/</guid>
      <description>Last year I wrote about a Haskell function called scanl which returned the intermediate steps of a fold over a collection and last week I realised that I needed a similar function in Clojure to analyse a reduce I’d written.
A simple reduce which adds together the numbers 1-10 would look like this:
&amp;gt; (reduce + 0 (range 1 11)) 55 If we want to see the intermediate values of this function called then instead of using http://clojuredocs.</description>
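Python’s closest analogue to Haskell’s scanl, for comparison with the Clojure approach the post goes on to describe, is `itertools.accumulate`, which yields every intermediate value of the fold:

```python
from itertools import accumulate
import operator

# accumulate yields each running total of the fold over 1..10
steps = list(accumulate(range(1, 11), operator.add))
print(steps)  # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```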
    </item>
    
    <item>
      <title>Data Science: Don&#39;t build a crawler (if you can avoid it!)</title>
      <link>https://www.markhneedham.com/blog/2013/09/19/data-science-dont-build-a-crawler-if-you-can-avoid-it/</link>
      <pubDate>Thu, 19 Sep 2013 06:55:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/19/data-science-dont-build-a-crawler-if-you-can-avoid-it/</guid>
      <description>On Tuesday I spoke at the Data Science London meetup about football data and I started out by covering some lessons I’ve learnt about building data sets for personal use when open data isn’t available.
When that’s the case you often end up scraping HTML pages to extract the data that you’re interested in and then storing that in files or in a database if you want to be more fancy.</description>
    </item>
    
    <item>
      <title>Clojure: Merge two maps but only keep the keys of one of them</title>
      <link>https://www.markhneedham.com/blog/2013/09/17/clojure-merge-two-maps-but-only-keep-the-keys-of-one-of-them/</link>
      <pubDate>Tue, 17 Sep 2013 01:03:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/17/clojure-merge-two-maps-but-only-keep-the-keys-of-one-of-them/</guid>
      <description>I’ve been playing around with Clojure maps recently and I wanted to merge two maps of rankings where the rankings in the second map overrode those in the first while only keeping the teams from the first map.
The http://clojuredocs.org/clojure_core/clojure.core/merge function overrides keys in earlier maps but also adds keys that only appear in later maps. For example, if we merge the following maps:
&amp;gt; (merge {&amp;#34;Man. United&amp;#34; 1500 &amp;#34;Man.</description>
    </item>
    
    <item>
      <title>Clojure: Updating keys in a map</title>
      <link>https://www.markhneedham.com/blog/2013/09/17/clojure-updating-keys-in-a-map/</link>
      <pubDate>Tue, 17 Sep 2013 00:24:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/17/clojure-updating-keys-in-a-map/</guid>
      <description>I’ve been playing with Clojure over the last few weeks and as a result I’ve been using a lot of maps to represent the data.
For example if we have the following map of teams to Glicko ratings and ratings deviations:
(def teams { &amp;#34;Man. United&amp;#34; {:points 1500 :rd 350} &amp;#34;Man. City&amp;#34; {:points 1450 :rd 300} }) We might want to increase Man. United’s points score by one for which we could use the http://clojuredocs.</description>
    </item>
    
    <item>
      <title>Glicko Rating System: A simple example using Clojure</title>
      <link>https://www.markhneedham.com/blog/2013/09/14/glicko-rating-system-a-simple-example-using-clojure/</link>
      <pubDate>Sat, 14 Sep 2013 21:02:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/14/glicko-rating-system-a-simple-example-using-clojure/</guid>
      <description>A couple of weeks ago I wrote about the Elo Rating system and when reading more about it I learnt that one of its weaknesses is that it doesn’t take into account the reliability of a player&amp;#39;s rating.
For example, a player may not have played for a long time. When they next play a match we shouldn’t assume that the accuracy of that rating is the same as for another player with the same rating but who plays regularly.</description>
    </item>
    
    <item>
      <title>Clojure: All things regex</title>
      <link>https://www.markhneedham.com/blog/2013/09/14/clojure-all-things-regex/</link>
      <pubDate>Sat, 14 Sep 2013 01:24:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/14/clojure-all-things-regex/</guid>
      <description>I’ve been doing some scraping of web pages recently using Clojure and Enlive and as part of that I’ve had to write regular expressions to extract the data I’m interested in.
On my travels I’ve come across a few different functions and I’m never sure which is the right one to use so I thought I’d document what I’ve tried for future me.
Check if regex matches The first regex I wrote was while scraping the Champions League results from the Rec.</description>
    </item>
    
    <item>
      <title>jackson-core-asl - java.lang.AbstractMethodError: org.codehaus.jackson.JsonNode.getValueAsText()Ljava/lang/String;</title>
      <link>https://www.markhneedham.com/blog/2013/09/14/jackson-core-asl-java-lang-abstractmethoderror-org-codehaus-jackson-jsonnode-getvalueastextljavalangstring/</link>
      <pubDate>Sat, 14 Sep 2013 00:06:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/09/14/jackson-core-asl-java-lang-abstractmethoderror-org-codehaus-jackson-jsonnode-getvalueastextljavalangstring/</guid>
      <description>Ian and I were doing a bit of work on an internal application which processes JSON messages and interacts with AWS and we started seeing the following exception after doing an upgrade of http://mvnrepository.com/artifact/org.codehaus.jackson/jackson-mapper-asl from 1.8.9 to 1.9.13:
2013-09-13 11:01:50 +0000: Exception while handling {MessageId: 7e695fb3-549a-4b 40-b1cf-9dbc5e97a8df, ... } java.lang.AbstractMethodError: org.codehaus.jackson.JsonNode.getValueAsText()Lja va/lang/String; ... at com.amazonaws.services.sqs.AmazonSQSAsyncClient$20.call(AmazonSQSAsyn cClient.java:1200) at com.amazonaws.services.sqs.AmazonSQSAsyncClient$20.call(AmazonSQSAsyn cClient.java:1191) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334) at java.util.concurrent.FutureTask.run(FutureTask.java:166) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor. java:1145) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor .java:615) at java.</description>
    </item>
    
    <item>
      <title>Elo Rating System: Ranking Champions League teams using Clojure</title>
      <link>https://www.markhneedham.com/blog/2013/08/31/elo-rating-system-ranking-champions-league-teams-using-clojure/</link>
      <pubDate>Sat, 31 Aug 2013 13:01:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/31/elo-rating-system-ranking-champions-league-teams-using-clojure/</guid>
      <description>As I mentioned in an earlier blog post I’ve been learning about ranking systems and one of the first ones I came across was the Elo rating system which is most famously used to rank chess players.
The Elo rating system uses the following formula to work out a player/team’s ranking after they’ve participated in a match:
R&amp;#39; = R + K * (S - E)
R&amp;#39; is the new rating</description>
    </item>
    
    <item>
      <title>Neo4j&#39;s Graph Café London - 28th August 2013</title>
      <link>https://www.markhneedham.com/blog/2013/08/31/neo4js-graph-cafe-london-28th-august-2013/</link>
      <pubDate>Sat, 31 Aug 2013 10:52:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/31/neo4js-graph-cafe-london-28th-august-2013/</guid>
      <description>On Wednesday evening I attended an interesting spin on the monthly Neo4j meetup, where instead of the usual &amp;#39;talk then go to the pub afterwards&amp;#39; format my colleagues Rik and Arturas organised Graph Café in the Doggetts Coat and Badge pub in Blackfriars.
The format was changed as well - the evening consisted of ~10 lightning talks which were spread out over about 3 hours, an approach Rik has used at similar events in Belgium and Holland earlier in the year.</description>
    </item>
    
    <item>
      <title>Clojure: Handling state by updating a vector inside an atom</title>
      <link>https://www.markhneedham.com/blog/2013/08/30/clojure-handling-state-by-updating-a-vector-inside-an-atom/</link>
      <pubDate>Fri, 30 Aug 2013 12:23:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/30/clojure-handling-state-by-updating-a-vector-inside-an-atom/</guid>
      <description>As I mentioned in a previous blog post, I’ve been learning about ranking algorithms and I wanted to apply them to a series of football matches to see who the strongest team was.
Before that, however, I wanted to sketch out the functions that I’d need to do this and I started with the following collections of matches and team rankings:
(def m [{:home &amp;#34;Manchester United&amp;#34;, :away &amp;#34;Manchester City&amp;#34;, :home_score 1, :away_score 0} {:home &amp;#34;Manchester United&amp;#34;, :away &amp;#34;Manchester City&amp;#34;, :home_score 2, :away_score 0}]) (def teams [ {:name &amp;#34;Manchester United&amp;#34; :points 1200} {:name &amp;#34;Manchester City&amp;#34; :points 1200} ]) I wanted to iterate over the matches and make the appropriate updates to the teams&amp;#39; rankings depending on the result of the match.</description>
    </item>
    
    <item>
      <title>Clojure/Enlive: Screen scraping a HTML file from disk</title>
      <link>https://www.markhneedham.com/blog/2013/08/26/clojureenlive-screen-scraping-a-html-file-from-disk/</link>
      <pubDate>Mon, 26 Aug 2013 17:58:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/26/clojureenlive-screen-scraping-a-html-file-from-disk/</guid>
      <description>I wanted to play around with some Champions League data and I came across the Rec Sport Soccer Statistics Foundation which has collected results of all matches since the tournament started in 1955.
I wanted to get a list of all the matches for a specific season so I started out by downloading the file:
$ pwd /tmp/football $ wget http://www.rsssf.com/ec/ec200203det.html The next step was to load that page and then run a CSS selector over it to extract the matches.</description>
    </item>
    
    <item>
      <title>Ranking Systems: What I&#39;ve learnt so far</title>
      <link>https://www.markhneedham.com/blog/2013/08/24/ranking-systems-what-ive-learnt-so-far/</link>
      <pubDate>Sat, 24 Aug 2013 11:05:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/24/ranking-systems-what-ive-learnt-so-far/</guid>
      <description>I often go off on massive tangents reading all about a new topic but don’t record what I’ve read, so if I come back to the topic in the future I have to start from scratch, which is quite frustrating.
In this instance after playing around with calculating the eigenvector centrality of a sub graph I learnt that this algorithm can also be used in ranking systems.
I started off by reading a paper written by James Keener about the Perron-Frobenius Theorem and the ranking of American football teams.</description>
    </item>
    
    <item>
      <title>Unix: tar - Extracting, creating and viewing archives</title>
      <link>https://www.markhneedham.com/blog/2013/08/22/unix-tar-extracting-creating-and-viewing-archives/</link>
      <pubDate>Thu, 22 Aug 2013 22:56:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/22/unix-tar-extracting-creating-and-viewing-archives/</guid>
      <description>I’ve been playing around with the Unix tar command a bit this week and realised that I’d memorised some of the flag combinations but didn’t actually know what each of them meant.
For example, one of the most common things that I want to do is extract a gzipped neo4j archive:
$ wget http://dist.neo4j.org/neo4j-community-1.9.2-unix.tar.gz $ tar -xvf neo4j-community-1.9.2-unix.tar.gz where:
-x means extract
-v means produce verbose output i.e. print out the names of all the files as you unpack it</description>
    </item>
    
    <item>
      <title>Products &amp; Infinite configurability</title>
      <link>https://www.markhneedham.com/blog/2013/08/22/products-infinite-configurability/</link>
      <pubDate>Thu, 22 Aug 2013 22:11:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/22/products-infinite-configurability/</guid>
      <description>One of the common feature requests on the ThoughtWorks projects that I worked on was that the application we were working on should be almost infinitely configurable to cover potential future use cases.
My experience of attempting to do this was that you ended up with an extremely complicated code base and those future use cases often didn’t come to fruition.
It therefore made more sense to solve the problem at hand and then make the code more configurable if/when the need arose.</description>
    </item>
    
    <item>
      <title>Model to answer your questions rather than modelling reality</title>
      <link>https://www.markhneedham.com/blog/2013/08/22/model-to-answer-your-questions-rather-than-modelling-reality/</link>
      <pubDate>Thu, 22 Aug 2013 21:26:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/22/model-to-answer-your-questions-rather-than-modelling-reality/</guid>
      <description>On the recommendation of Ian Robinson I’ve been reading the 2nd edition of William Kent’s &amp;#39;Data and Reality&amp;#39; and the author makes an interesting observation at the end of the first chapter which resonated with me:
Once more: we are not modelling reality, but the way information about reality is processed, by people.
It reminds me of similar advice in Eric Evans&amp;#39; Domain Driven Design and it’s advice which I believe is helpful when designing a model in a graph database.</description>
    </item>
    
    <item>
      <title>Coding: Hack then revert</title>
      <link>https://www.markhneedham.com/blog/2013/08/19/coding-hack-then-revert/</link>
      <pubDate>Mon, 19 Aug 2013 23:13:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/19/coding-hack-then-revert/</guid>
      <description>For a long while my default approach when I came across a new code base that I wanted to change was to read all the code and try and understand how it all fitted together by sketching out flow of control diagrams.
Only after I’d done that would I start planning how I could make my changes.
This works reasonably well but it’s quite time consuming and a couple of years ago a former colleague (I can’t remember who!</description>
    </item>
    
    <item>
      <title>BT Internet: Non existent hosts mapping to 92.242.132.15</title>
      <link>https://www.markhneedham.com/blog/2013/08/17/bt-internet-non-existent-hosts-mapping-to-92-242-132-15/</link>
      <pubDate>Sat, 17 Aug 2013 21:13:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/17/bt-internet-non-existent-hosts-mapping-to-92-242-132-15/</guid>
      <description>We have a test in our code which checks for unresolvable hosts and it started failing for me because instead of throwing an UnknownHostException from the following call:
InetAddress.getByName( &amp;#34;host.that.is.invalid&amp;#34; ) I was getting back a valid although unreachable host. When I called ping it was easier to see what was going on:
$ ping host.that.is.invalid PING host.that.is.invalid (92.242.132.15): 56 data bytes Request timeout for icmp_seq 0 Request timeout for icmp_seq 1 Request timeout for icmp_seq 2 As you can see, that hostname is resolving to &amp;#39;92.</description>
    </item>
    
    <item>
      <title>Jersey Client: java.net.ProtocolException: Server redirected too many times/Setting cookies on request</title>
      <link>https://www.markhneedham.com/blog/2013/08/17/jersey-client-java-net-protocolexception-server-redirected-too-many-timessetting-cookies-on-request/</link>
      <pubDate>Sat, 17 Aug 2013 20:25:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/17/jersey-client-java-net-protocolexception-server-redirected-too-many-timessetting-cookies-on-request/</guid>
      <description>A couple of weeks ago I was trying to write a test around some OAuth code that we have on an internal application and I was using Jersey Client to send the various requests.
I initially started with the following code:
Client = Client.create(); ClientResponse response = client.resource( &amp;#34;http://localhost:59680&amp;#34; ).get( ClientResponse.class ); but when I ran the test I was getting the following exception:
com.sun.jersey.api.client.ClientHandlerException: java.net.ProtocolException: Server redirected too many times (20) at com.</description>
    </item>
    
    <item>
      <title>Python: for/list comprehensions and dictionaries</title>
      <link>https://www.markhneedham.com/blog/2013/08/13/python-forlist-comprehensions-and-dictionaries/</link>
      <pubDate>Tue, 13 Aug 2013 22:59:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/13/python-forlist-comprehensions-and-dictionaries/</guid>
      <description>I’ve been working through Coursera’s Linear Algebra course and since all of the exercises are in Python I’ve been playing around with it again.
One interesting thing I learnt is that you can construct dictionaries using a list comprehension type syntax.
For example, if we start with the following dictionaries:
&amp;gt;&amp;gt;&amp;gt; x = { &amp;#34;a&amp;#34;: 1, &amp;#34;b&amp;#34;:2 } &amp;gt;&amp;gt;&amp;gt; y = {1: &amp;#34;mark&amp;#34;, 2: &amp;#34;will&amp;#34;} &amp;gt;&amp;gt;&amp;gt; x {&amp;#39;a&amp;#39;: 1, &amp;#39;b&amp;#39;: 2} &amp;gt;&amp;gt;&amp;gt; y {1: &amp;#39;mark&amp;#39;, 2: &amp;#39;will&amp;#39;} We might want to create a new dictionary which links from the keys in x to the values in y.</description>
    </item>
    
    <item>
      <title>9 algorithms that changed the future - John MacCormick: Book Review</title>
      <link>https://www.markhneedham.com/blog/2013/08/13/9-algorithms-that-changed-the-future-john-maccormick-book-review/</link>
      <pubDate>Tue, 13 Aug 2013 20:00:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/13/9-algorithms-that-changed-the-future-john-maccormick-book-review/</guid>
      <description>The Book 9 algorithms that changed the future (the ingenious ideas that drive today’s computers) by John MacCormick
My Thoughts I came across this book while idly browsing a book store and since I’ve found most introduction to algorithms books very dry I thought it’d be interesting to see what one aimed at the general public would be like.
Overall it was an enjoyable read and I quite like the pattern that the author used for each algorithm, which was:</description>
    </item>
    
    <item>
      <title>Jersey Client: com.sun.jersey.api.client.UniformInterfaceException</title>
      <link>https://www.markhneedham.com/blog/2013/08/11/jersey-client-com-sun-jersey-api-client-uniforminterfaceexception/</link>
      <pubDate>Sun, 11 Aug 2013 08:07:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/11/jersey-client-com-sun-jersey-api-client-uniforminterfaceexception/</guid>
      <description>As I mentioned in a post a couple of weeks ago we’ve been doing some work which involved calling the neo4j server’s HA URI to determine whether a machine was slave or master.
We started off with the following code using jersey-client:
public class HaSpike { public static void main(String[] args) { String response = client() .resource(&amp;#34;http://localhost:7474/db/manage/server/ha/slave&amp;#34;) .accept(MediaType.TEXT_PLAIN) .get(String.class); System.out.println(&amp;#34;response = &amp;#34; + response); } private static Client client() { DefaultClientConfig defaultClientConfig = new DefaultClientConfig(); defaultClientConfig.</description>
    </item>
    
    <item>
      <title>neo4j: Extracting a subgraph as an adjacency matrix and calculating eigenvector centrality with JBLAS</title>
      <link>https://www.markhneedham.com/blog/2013/08/11/neo4j-extracting-a-subgraph-as-an-adjacency-matrix-and-calculating-eigenvector-centrality-with-jblas/</link>
      <pubDate>Sun, 11 Aug 2013 07:23:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/11/neo4j-extracting-a-subgraph-as-an-adjacency-matrix-and-calculating-eigenvector-centrality-with-jblas/</guid>
      <description>Earlier in the week I wrote a blog post showing how to calculate the eigenvector centrality of an adjacency matrix using JBLAS and the next step was to work out the eigenvector centrality of a neo4j sub graph.
There were 3 steps involved in doing this:
Export the neo4j sub graph as an adjacency matrix
Run JBLAS over it to get eigenvector centrality scores for each node
Write those scores back into neo4j</description>
    </item>
    
    <item>
      <title>Java/JBLAS: Calculating eigenvector centrality of an adjacency matrix</title>
      <link>https://www.markhneedham.com/blog/2013/08/05/javajblas-calculating-eigenvector-centrality-of-an-adjacency-matrix/</link>
      <pubDate>Mon, 05 Aug 2013 22:12:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/08/05/javajblas-calculating-eigenvector-centrality-of-an-adjacency-matrix/</guid>
      <description>I recently came across a very interesting post by Kieran Healy where he runs through a bunch of graph algorithms to see whether he can detect the most influential people behind the American Revolution based on their membership of various organisations.
The first algorithm he looked at was betweenness centrality which I’ve looked at previously and is used to determine the load and importance of a node in a graph.</description>
    </item>
    
    <item>
      <title>AWS: Attaching an EBS volume on an EC2 instance and making it available for use</title>
      <link>https://www.markhneedham.com/blog/2013/07/31/aws-attaching-an-ebs-volume-on-an-ec2-instance-and-making-it-available-for-use/</link>
      <pubDate>Wed, 31 Jul 2013 06:21:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/31/aws-attaching-an-ebs-volume-on-an-ec2-instance-and-making-it-available-for-use/</guid>
      <description>I recently wanted to attach an EBS volume to an existing EC2 instance that I had running and since it was for a one-off task (famous last words) I decided to configure it manually.
I created the EBS volume through the AWS console and one thing that initially caught me out is that the EC2 instance and EBS volume need to be in the same region and zone.</description>
    </item>
    
    <item>
      <title>Getting started with screen</title>
      <link>https://www.markhneedham.com/blog/2013/07/31/getting-started-with-screen/</link>
      <pubDate>Wed, 31 Jul 2013 05:41:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/31/getting-started-with-screen/</guid>
      <description>Last week I had a ~10GB file I wanted to download to my machine but Chrome’s initial estimate was that it would take 10+ hours to do so which meant I’d have probably shutdown my machine before it had completed.
It seemed to make more sense to spin up an EC2 instance and download it onto there instead but I didn’t want to have to keep an SSH session open to that machine either.</description>
    </item>
    
    <item>
      <title>s3cmd: put fails with &#34;`Connection reset by peer`&#34; for large files</title>
      <link>https://www.markhneedham.com/blog/2013/07/30/s3cmd-put-fails-with-connection-reset-by-peer-for-large-files/</link>
      <pubDate>Tue, 30 Jul 2013 16:20:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/30/s3cmd-put-fails-with-connection-reset-by-peer-for-large-files/</guid>
      <description>I recently wanted to copy some large files from an AWS instance into an S3 bucket using s3cmd but ended up with the following error when trying to use the &amp;#39;put&amp;#39; command:
$ s3cmd put /mnt/ebs/myfile.tar s3://mybucket.somewhere.com /mnt/ebs/myfile.tar -&amp;gt; s3://mybucket.somewhere.com/myfile.tar [1 of 1] 1077248 of 12185313280 0% in 1s 937.09 kB/s failed WARNING: Upload failed: /myfile.tar ([Errno 104] Connection reset by peer) WARNING: Retrying on lower speed (throttle=0.00) WARNING: Waiting 3 sec.</description>
    </item>
    
    <item>
      <title>netcat: Strange behaviour with UDP - only receives first packet sent</title>
      <link>https://www.markhneedham.com/blog/2013/07/30/netcat-strange-behaviour-with-udp-only-receives-first-packet-sent/</link>
      <pubDate>Tue, 30 Jul 2013 06:01:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/30/netcat-strange-behaviour-with-udp-only-receives-first-packet-sent/</guid>
      <description>I was playing around with netcat yesterday to create a client and server which would communicate via UDP packets and I rediscovered some &amp;#34;weird&amp;#34; behaviour which I’d previously encountered but not explained.
I started up a netcat server listening for UDP packets on port 9000 of my machine:
$ nc -kluv localhost 9000 We can check with lsof what running that command has done:
$ lsof -Pni :9000 COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME nc 63289 markhneedham 5u IPv6 0xc99222a54b3975b5 0t0 UDP [::1]:9000 We can see that the netcat process is listening on port 9000 so let’s send it a UDP packet, using another netcat process:</description>
    </item>
    
    <item>
      <title>Jersey Client: Testing external calls</title>
      <link>https://www.markhneedham.com/blog/2013/07/28/jersey-client-testing-external-calls/</link>
      <pubDate>Sun, 28 Jul 2013 20:43:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/28/jersey-client-testing-external-calls/</guid>
      <description>Jim and I have been doing a bit of work over the last week which involved calling neo4j’s HA status URI to check whether or not an instance was a master/slave and we’ve been using jersey-client.
The code looked roughly like this:
class Neo4jInstance { private Client httpClient; private URI hostname; public Neo4jInstance(Client httpClient, URI hostname) { this.httpClient = httpClient; this.hostname = hostname; } public Boolean isSlave() { String slaveURI = hostname.</description>
    </item>
    
    <item>
      <title>Product Documentation: The receiver decides if it&#39;s successful</title>
      <link>https://www.markhneedham.com/blog/2013/07/28/product-documentation-the-receiver-decides-if-its-successful/</link>
      <pubDate>Sun, 28 Jul 2013 16:18:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/28/product-documentation-the-receiver-decides-if-its-successful/</guid>
      <description>One of the things I remember being taught while growing up is that in an interaction where somebody is explaining something to someone else it’s their responsibility to do so in a way that the receiver can understand.
Even if you think you’ve done a good job of explaining something, the receiver of the communication decides whether or not that’s the case.
I’d always assumed that this advice made most sense in the context of a one to one conversation but recently I’ve realised that it also makes sense when thinking about product documentation.</description>
    </item>
    
    <item>
      <title>Graph Processing: Betweeness Centrality - neo4j&#39;s cypher vs graphstream</title>
      <link>https://www.markhneedham.com/blog/2013/07/27/graph-processing-betweeness-centrality-neo4js-cypher-vs-graphstream/</link>
      <pubDate>Sat, 27 Jul 2013 11:21:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/27/graph-processing-betweeness-centrality-neo4js-cypher-vs-graphstream/</guid>
      <description>Last week I wrote about the betweenness centrality algorithm and my attempts to understand it using graphstream and while reading the source I realised that I might be able to put something together using neo4j’s all shortest paths algorithm.
To recap, the betweenness centrality algorithm is used to determine the load and importance of a node in a graph.
While talking about this with Jen she pointed out that calculating the betweenness centrality of nodes across the whole graph often doesn’t make sense.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Getting the hang of query parameters</title>
      <link>https://www.markhneedham.com/blog/2013/07/27/neo4jcypher-getting-the-hang-of-query-parameters/</link>
      <pubDate>Sat, 27 Jul 2013 09:30:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/27/neo4jcypher-getting-the-hang-of-query-parameters/</guid>
      <description>For as long as I’ve been using neo4j&amp;#39;s cypher query language Michael has been telling me to use parameters in my queries but the performance of the queries was always acceptable so I didn’t feel the need.
However, recently I was playing around with a data set and I created ~500 nodes using code similar to this:
require &amp;#39;open-uri&amp;#39; open(&amp;#34;data/people.cyp&amp;#34;, &amp;#39;w&amp;#39;) { |f| (1..500).each do |value| f.puts(&amp;#34;CREATE (p:Person{name: \&amp;#34;#{value}\&amp;#34;})&amp;#34;) end } That creates a file of cypher statements that look like this:</description>
    </item>
    
    <item>
      <title>On &#34;The fear of blogging about technical topics&#34;</title>
      <link>https://www.markhneedham.com/blog/2013/07/22/on-the-fear-of-blogging-about-technical-topics/</link>
      <pubDate>Mon, 22 Jul 2013 23:47:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/22/on-the-fear-of-blogging-about-technical-topics/</guid>
      <description>My former colleague Anne Simmons recently wrote an interesting post in which she describes some of the reasons that she finds herself not wanting to write about technical topics.
I wrote a post at the end of 2012 in which I explained some of the reasons why I think writing about what you learn is a good idea but Anne brought up some things I hadn’t thought of which I think are worth addressing.</description>
    </item>
    
    <item>
      <title>Lessons from supporting production code</title>
      <link>https://www.markhneedham.com/blog/2013/07/22/lessons-from-supporting-production-code/</link>
      <pubDate>Mon, 22 Jul 2013 22:37:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/22/lessons-from-supporting-production-code/</guid>
      <description>Until I started working on the uSwitch energy website around 8 months ago I had not really done any support of a production system so I learnt some interesting lessons in my time there.
Look at the new code first We had our application wired up to Airbrake so whenever a user did anything which resulted in an exception being thrown we received a report with the stack trace, environment variables and which page they were on.</description>
    </item>
    
    <item>
      <title>Jersey: Listing all resources, paths, verbs to build an entry point/index for an API</title>
      <link>https://www.markhneedham.com/blog/2013/07/21/jersey-listing-all-resources-paths-verbs-to-build-an-entry-pointindex-for-an-api/</link>
      <pubDate>Sun, 21 Jul 2013 11:07:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/21/jersey-listing-all-resources-paths-verbs-to-build-an-entry-pointindex-for-an-api/</guid>
      <description>I’ve been playing around with Jersey over the past couple of days and one thing I wanted to do was create an entry point or index which listed all my resources, the available paths and the verbs they accepted.
Guido Simone explained a neat way of finding the paths and verbs for a specific resource using Jersey’s http://grepcode.com/file/repo1.maven.org/maven2/com.sun.jersey/jersey-server/1.0.3/com/sun/jersey/server/impl/modelapi/annotation/IntrospectionModeller.java:
AbstractResource resource = IntrospectionModeller.createResource(JacksonResource.class); System.out.println(&amp;#34;Path is &amp;#34; + resource.getPath().getValue()); String uriPrefix = resource.</description>
    </item>
    
    <item>
      <title>Jersey Server: com.sun.jersey.api.MessageException: A message body writer for Java class org.codehaus.jackson.node.ObjectNode and MIME media type application/json was not found</title>
      <link>https://www.markhneedham.com/blog/2013/07/21/jersey-server-com-sun-jersey-api-messageexception-a-message-body-writer-for-java-class-org-codehaus-jackson-node-objectnode-and-mime-media-type-applicationjson-was-not-found/</link>
      <pubDate>Sun, 21 Jul 2013 10:37:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/21/jersey-server-com-sun-jersey-api-messageexception-a-message-body-writer-for-java-class-org-codehaus-jackson-node-objectnode-and-mime-media-type-applicationjson-was-not-found/</guid>
      <description>I’ve been reacquainted with my good friend Jersey over the last couple of days and in getting up and running was reminded that things which seemed easy at the time aren’t as easy when starting from scratch.
I eventually settled on using Sunny Gleason&amp;#39;s j4-minimal repository which wires up Jersey with Jackson, Guice and Jetty which seemed like a good place to start.
I prefer building up JSON objects explicitly rather than setting up automatic mapping so the first thing I did was change the https://github.</description>
    </item>
    
    <item>
      <title>Graph Processing: Calculating betweenness centrality for an undirected graph using graphstream</title>
      <link>https://www.markhneedham.com/blog/2013/07/19/graph-processing-calculating-betweenness-centrality-for-an-undirected-graph-using-graphstream/</link>
      <pubDate>Fri, 19 Jul 2013 00:37:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/19/graph-processing-calculating-betweenness-centrality-for-an-undirected-graph-using-graphstream/</guid>
      <description>Since I now spend most of my time surrounded by graphs I thought it’d be interesting to learn a bit more about graph processing, a topic my colleague Jim wrote about a couple of years ago.
I like to think of the types of queries you’d do with a graph processing engine as being similar in style to graph global queries, where you take most of the nodes in a graph into account and do some sort of calculation.</description>
    </item>
    
    <item>
      <title>Git: Commit squashing made even easier using &#39;git branch --set-upstream&#39;</title>
      <link>https://www.markhneedham.com/blog/2013/07/16/git-commit-squashing-made-even-easier-using-git-branch-set-upstream/</link>
      <pubDate>Tue, 16 Jul 2013 08:13:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/16/git-commit-squashing-made-even-easier-using-git-branch-set-upstream/</guid>
      <description>A few days ago I wrote a blog post describing how I wanted to squash a series of commits into one bigger one before making a pull request and in the comments Rob Hunter showed me an even easier way to do so.
To recap, by the end of the post I had the following git config:
$ cat .git/config [remote &amp;#34;origin&amp;#34;] fetch = +refs/heads/*:refs/remotes/origin/* url = git@github.com:mneedham/neo4j-shell-tools.git [branch &amp;#34;master&amp;#34;] remote = origin merge = refs/heads/master [remote &amp;#34;base&amp;#34;] url = git@github.</description>
    </item>
    
    <item>
      <title>Java: Testing a socket is listening on all network interfaces/wildcard interface</title>
      <link>https://www.markhneedham.com/blog/2013/07/14/java-testing-a-socket-is-listening-on-all-network-interfaceswildcard-interface/</link>
      <pubDate>Sun, 14 Jul 2013 14:31:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/14/java-testing-a-socket-is-listening-on-all-network-interfaceswildcard-interface/</guid>
      <description>I previously wrote a blog post describing how I’ve been trying to learn more about network sockets in which I created some server sockets and connected to them using netcat.
The next step was to do the same thing in Java and I started out by writing a server socket which echoed any messages sent by the client:
public class EchoServer { public static void main(String[] args) throws IOException { int port = 4444; ServerSocket serverSocket = new ServerSocket(port, 50, InetAddress.</description>
    </item>
    
    <item>
      <title>Learning more about network sockets</title>
      <link>https://www.markhneedham.com/blog/2013/07/14/learning-more-about-network-sockets/</link>
      <pubDate>Sun, 14 Jul 2013 09:52:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/14/learning-more-about-network-sockets/</guid>
      <description>While reading through some of the neo4j code a few weeks ago I realised that I didn’t have a very good understanding about the mechanics behind network ports/sockets so I thought I’d try to learn more.
In particular I’d not considered what binding a socket to different network interfaces meant so I decided to setup a few examples using netcat to help me understand better.
To start with let’s list the network interfaces that I have on my machine using ifconfig:</description>
    </item>
    
    <item>
      <title>Git/GitHub: Squashing all commits before sending a pull request</title>
      <link>https://www.markhneedham.com/blog/2013/07/13/gitgithub-squashing-all-commits-before-sending-a-pull-request/</link>
      <pubDate>Sat, 13 Jul 2013 18:47:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/13/gitgithub-squashing-all-commits-before-sending-a-pull-request/</guid>
      <description>My colleague Michael has been doing some work to make it easier for people to import data into neo4j and his latest attempt is neo4j-shell-tools which adds some additional commands to the neo4j-shell.
I’ve spent a bit of time refactoring the readme which I’d done on a branch of my fork of the repository and consisted of 46 commits, most changing 2 or 3 lines.
I wanted to send Michael a pull request on Github but first I needed to squash all my commits down into a single one.</description>
    </item>
    
    <item>
      <title>neo4j Unmanaged Extension: Creating gzipped streamed responses with Jetty</title>
      <link>https://www.markhneedham.com/blog/2013/07/08/neo4j-unmanaged-extension-creating-gzipped-streamed-responses-with-jetty/</link>
      <pubDate>Mon, 08 Jul 2013 23:48:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/08/neo4j-unmanaged-extension-creating-gzipped-streamed-responses-with-jetty/</guid>
      <description>I recently wrote a blog post describing how we created a streamed response and the next thing we wanted to do was gzip the response to shrink its size a bit.
A bit of searching led to GZIPContentEncodingFilter popping up a lot of times but this is actually needed for a client processing a gzipped response rather than helping us to gzip a response from the server.
I noticed that there was a question about this on the mailing list from about a year ago although Michael pointed out that the repository has now moved and the example is available here instead.</description>
    </item>
    
    <item>
      <title>JAX RS: Streaming a Response using StreamingOutput</title>
      <link>https://www.markhneedham.com/blog/2013/07/08/jax-rs-streaming-a-response-using-streamingoutput/</link>
      <pubDate>Mon, 08 Jul 2013 23:19:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/08/jax-rs-streaming-a-response-using-streamingoutput/</guid>
      <description>A couple of weeks ago Jim and I were building out a neo4j unmanaged extension from which we wanted to return the results of a traversal which had a lot of paths.
Our code initially looked a bit like this:
package com.markandjim @Path(&amp;#34;/subgraph&amp;#34;) public class ExtractSubGraphResource { private final GraphDatabaseService database; public ExtractSubGraphResource(@Context GraphDatabaseService database) { this.database = database; } @GET @Produces(MediaType.TEXT_PLAIN) @Path(&amp;#34;/{nodeId}/{depth}&amp;#34;) public Response hello(@PathParam(&amp;#34;nodeId&amp;#34;) long nodeId, @PathParam(&amp;#34;depth&amp;#34;) int depth) { Node node = database.</description>
    </item>
    
    <item>
      <title>Survivorship Bias and Product Development</title>
      <link>https://www.markhneedham.com/blog/2013/07/08/survivorship-bias-and-product-development/</link>
      <pubDate>Mon, 08 Jul 2013 22:14:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/07/08/survivorship-bias-and-product-development/</guid>
      <description>A couple of months ago I came across an interesting article by the author of &amp;#39;You Are Not So Smart&amp;#39; about a fallacy known as &amp;#39;Survivorship Bias&amp;#39; which Wikipedia defines as:
The logical error of concentrating on the people or things that &amp;#34;survived&amp;#34; some process and inadvertently overlooking those that didn’t because of their lack of visibility.
I particularly liked the story describing how Abraham Wald helped the US military overcome an instance of this error when trying to work out where to place armour on their bomber planes:</description>
    </item>
    
    <item>
      <title>Ruby: Calculating the orthodromic distance using the Haversine formula</title>
      <link>https://www.markhneedham.com/blog/2013/06/30/ruby-calculating-the-orthodromic-distance-using-the-haversine-formula/</link>
      <pubDate>Sun, 30 Jun 2013 22:53:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/30/ruby-calculating-the-orthodromic-distance-using-the-haversine-formula/</guid>
      <description>As part of the UI I’m building around my football stadiums data set I wanted to calculate the distance from a football stadium to a point on the map in Ruby since cypher doesn’t currently return this value.
I had the following cypher query to return the football stadiums near Westminster along with their lat/long values:
lat, long, distance = [&amp;#34;51.55786291569685&amp;#34;, &amp;#34;0.144195556640625&amp;#34;, 10] query = &amp;#34; START node = node:geom(&amp;#39;withinDistance:[#{lat}, #{long}, #{distance}]&amp;#39;)&amp;#34; query &amp;lt;&amp;lt; &amp;#34; RETURN node.</description>
    </item>
    
    <item>
      <title>Leaflet JS: Resizing a map to keep a circle diameter inside it</title>
      <link>https://www.markhneedham.com/blog/2013/06/30/leaflet-js-resizing-a-map-to-keep-a-circle-diameter-inside-it/</link>
      <pubDate>Sun, 30 Jun 2013 22:23:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/30/leaflet-js-resizing-a-map-to-keep-a-circle-diameter-inside-it/</guid>
      <description>I’ve been working on creating a UI to make searching for the football stadiums that I wrote about last week a bit easier and I thought I’d give Leaflet JS a try.
Leaflet is a Javascript library which was recommended to me by Jason Neylon and can be used as a wrapper around Open Street Map.
I started by creating a simple form where you could fill in a lat/long and distance and it would centre the map on that lat/long and show you a list of the stadiums within that diameter next to the map.</description>
    </item>
    
    <item>
      <title>Vagrant: Multi (virtual) machine with Puppet roles</title>
      <link>https://www.markhneedham.com/blog/2013/06/30/vagrant-multi-virtual-machine-with-puppet-roles/</link>
      <pubDate>Sun, 30 Jun 2013 13:13:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/30/vagrant-multi-virtual-machine-with-puppet-roles/</guid>
      <description>I’ve been playing around with setting up a neo4j cluster using Vagrant and HAProxy and one thing I wanted to do was define two different roles for the HAProxy and neo4j machines.
When I was working at uSwitch Nathan had solved a similar problem, but with AWS VMs, by defining the role in an environment variable in the VM’s spin up script.
In retrospect I think I might have been able to do that by using the shell provisioner and calling that before the puppet provisioner but Nathan, Gareth Rushgrove and Gregor Russbuelt suggested that using facter might be better.</description>
    </item>
    
    <item>
      <title>Vagrant 1.2.2: `[]&#39;: can&#39;t convert Symbol into Integer (TypeError)/The following settings don&#39;t exist</title>
      <link>https://www.markhneedham.com/blog/2013/06/29/vagrant-1-2-2-cant-convert-symbol-into-integer-typeerrorthe-following-settings-dont-exist/</link>
      <pubDate>Sat, 29 Jun 2013 08:44:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/29/vagrant-1-2-2-cant-convert-symbol-into-integer-typeerrorthe-following-settings-dont-exist/</guid>
      <description>As I mentioned in my previous post I’ve been playing around with Vagrant for the past couple of days and I was trying to adapt a Vagrantfile that Nathan created a few months ago to do what I wanted.
I’m using Vagrant 1.2.2 and I started out with the following Vagrantfile:
Vagrant.configure(&amp;#34;2&amp;#34;) do |config| config.vm.box = &amp;#34;precise64&amp;#34; config.vm.box_url = &amp;#34;http://files.vagrantup.com/precise64.box&amp;#34; config.vm.define :neo01 do |neo| neo.vm.network :hostonly, &amp;#34;192.168.33.101&amp;#34; neo.vm.forward_port 8080, 4569 end end Unfortunately a &amp;#39;vagrant up&amp;#39; doesn’t quite work as expected:</description>
    </item>
    
    <item>
      <title>Vagrant/Virtual Box: There was an error executing the following command with VBoxManage - Progress object failure: NS_ERROR_CALL_FAILED</title>
      <link>https://www.markhneedham.com/blog/2013/06/29/vagrantvirtual-box-there-was-an-error-executing-the-following-command-with-vboxmanage-progress-object-failure-ns_error_call_failed/</link>
      <pubDate>Sat, 29 Jun 2013 07:38:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/29/vagrantvirtual-box-there-was-an-error-executing-the-following-command-with-vboxmanage-progress-object-failure-ns_error_call_failed/</guid>
      <description>I’ve been playing around with Vagrant a bit again lately and having installed it on a new machine was running into the following exception when I tried to run &amp;#39;vagrant up&amp;#39; on a new virtual machine:
ERROR vagrant: /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/plugins/providers/virtualbox/driver/base.rb:292:in `block in execute&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/util/retryable.rb:17:in `retryable&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/plugins/providers/virtualbox/driver/base.rb:282:in `execute&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/plugins/providers/virtualbox/driver/version_4_2.rb:165:in `import&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/plugins/providers/virtualbox/action/import.rb:15:in `call&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/action/warden.rb:34:in `call&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/action/builtin/handle_box_url.rb:38:in `call&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/action/warden.rb:34:in `call&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/plugins/providers/virtualbox/action/check_accessible.rb:18:in `call&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/action/warden.rb:34:in `call&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/action/runner.rb:61:in `block in run&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/util/busy.rb:19:in `busy&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/action/runner.rb:61:in `run&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.1.2/lib/vagrant/action/builtin/call.rb:51:in `call&amp;#39; /Applications/Vagrant/embedded/gems/gems/vagrant-1.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Aggregating relationships within a path</title>
      <link>https://www.markhneedham.com/blog/2013/06/27/neo4jcypher-aggregating-relationships-within-a-path/</link>
      <pubDate>Thu, 27 Jun 2013 10:32:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/27/neo4jcypher-aggregating-relationships-within-a-path/</guid>
      <description>I recently came across an interesting use case of paths in a graph where we wanted to calculate the frequency of communication between two people by showing how frequently each emailed the other.
The model looked like this:
which we can create with the following cypher statements:
CREATE (email1 { name: &amp;#39;Email 1&amp;#39;, title: &amp;#39;Some stuff&amp;#39; }) CREATE (email2 { name: &amp;#39;Email 2&amp;#39;, title: &amp;#34;Absolutely irrelevant&amp;#34; }) CREATE (email3 { name: &amp;#39;Email 3&amp;#39;, title: &amp;#34;Something else&amp;#34; }) CREATE (person1 { name: &amp;#39;Mark&amp;#39; }) CREATE (person2 { name: &amp;#39;Jim&amp;#39; }) CREATE (person3 { name: &amp;#39;Alistair&amp;#39; }) CREATE (person1)-[:SENT]-&amp;gt;(email1) CREATE (person2)-[:RECEIVED]-&amp;gt;(email1) CREATE (person3)-[:RECEIVED]-&amp;gt;(email1) CREATE (person1)-[:SENT]-&amp;gt;(email2) CREATE (person2)-[:RECEIVED]-&amp;gt;(email2) CREATE (person2)-[:SENT]-&amp;gt;(email3) CREATE (person1)-[:RECEIVED]-&amp;gt;(email3) We want to return a list containing pairs of people and how many times they emailed each other, so in this case we want to return a table showing the following:</description>
    </item>
    
    <item>
      <title>Unix/awk: Extracting substring using a regular expression with capture groups</title>
      <link>https://www.markhneedham.com/blog/2013/06/26/unixawk-extracting-substring-using-a-regular-expression-with-capture-groups/</link>
      <pubDate>Wed, 26 Jun 2013 15:23:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/26/unixawk-extracting-substring-using-a-regular-expression-with-capture-groups/</guid>
      <description>A couple of years ago I wrote a blog post explaining how I’d used GNU awk to extract story numbers from git commit messages and I wanted to do a similar thing today to extract some node ids from a file.
My eventual solution looked like this:
$ echo &amp;#34;mark #1000&amp;#34; | gawk &amp;#39;{ match($0, /#([0-9]+)/, arr); if(arr[1] != &amp;#34;&amp;#34;) print arr[1] }&amp;#39; 1000 But in the comments an alternative approach was suggested which used the Mac version of awk and the RSTART and RLENGTH global variables which get set when a match is found:</description>
    </item>
    
    <item>
      <title>neo4j Spatial: Indexing football stadiums using the REST API</title>
      <link>https://www.markhneedham.com/blog/2013/06/24/neo4j-spatial-indexing-football-stadiums-using-the-rest-api/</link>
      <pubDate>Mon, 24 Jun 2013 07:17:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/24/neo4j-spatial-indexing-football-stadiums-using-the-rest-api/</guid>
      <description>Late last week my colleague Peter wrote up some documentation about creating spatial indexes in neo4j via HTTP, something I hadn’t realised was possible until then.
I previously wrote about indexing football stadiums using neo4j spatial but the annoying thing about the approach I described was that I was using neo4j in embedded mode which restricts you to using a JVM language.
The rest of my code is in Ruby so I thought I’d translate that code.</description>
    </item>
    
    <item>
      <title>neo4j: A simple example using the JDBC driver</title>
      <link>https://www.markhneedham.com/blog/2013/06/20/neo4j-a-simple-example-using-the-jdbc-driver/</link>
      <pubDate>Thu, 20 Jun 2013 07:21:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/20/neo4j-a-simple-example-using-the-jdbc-driver/</guid>
      <description>Michael recently pointed me to the neo4j JDBC driver which he and Rickard have written so I thought I’d try and port the code from my previous post to use that instead of the console.
To start with I added the following dependencies to my POM file:
&amp;lt;dependencies&amp;gt; ... &amp;lt;dependency&amp;gt; &amp;lt;groupId&amp;gt;org.neo4j&amp;lt;/groupId&amp;gt; &amp;lt;artifactId&amp;gt;neo4j-jdbc&amp;lt;/artifactId&amp;gt; &amp;lt;version&amp;gt;1.9&amp;lt;/version&amp;gt; &amp;lt;/dependency&amp;gt; &amp;lt;/dependencies&amp;gt; &amp;lt;repositories&amp;gt; &amp;lt;repository&amp;gt; &amp;lt;id&amp;gt;neo4j-maven&amp;lt;/id&amp;gt; &amp;lt;name&amp;gt;neo4j maven&amp;lt;/name&amp;gt; &amp;lt;url&amp;gt;http://m2.neo4j.org&amp;lt;/url&amp;gt; &amp;lt;/repository&amp;gt; &amp;lt;/repositories&amp;gt; I then tried to create a connection to a local neo4j server instance that I had running on port 7474:</description>
    </item>
    
    <item>
      <title>neo4j/cypher: CREATE with optional properties</title>
      <link>https://www.markhneedham.com/blog/2013/06/20/neo4jcypher-create-with-optional-properties/</link>
      <pubDate>Thu, 20 Jun 2013 06:31:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/20/neo4jcypher-create-with-optional-properties/</guid>
      <description>I’ve written before about using the cypher CREATE statement to add inferred information to a neo4j graph and sometimes we want to do that but have to deal with optional properties while creating our new relationships.
For example let’s say we have the following people in our graph with the &amp;#39;started&amp;#39; and &amp;#39;left&amp;#39; properties representing their tenure at a company:
CREATE (person1 { personId: 1, started: 1361708546 }) CREATE (person2 { personId: 2, started: 1361708546, left: 1371708646 }) CREATE (company { companyId: 1 }) We want to create a &amp;#39;TENURE&amp;#39; link from them to the company including the &amp;#39;started&amp;#39; and &amp;#39;left&amp;#39; properties when applicable and might start with the following query:</description>
    </item>
    
    <item>
      <title>neo4j: WrappingNeoServerBootstrapper and the case of the /webadmin 404</title>
      <link>https://www.markhneedham.com/blog/2013/06/19/neo4j-wrappingneoserverbootstrapper-and-the-case-of-the-webadmin-404/</link>
      <pubDate>Wed, 19 Jun 2013 05:32:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/19/neo4j-wrappingneoserverbootstrapper-and-the-case-of-the-webadmin-404/</guid>
      <description>When people first use neo4j they frequently start out by embedding it in a Java application but eventually they want to explore the graph in a more visual way.
One simple way to do this is to start neo4j in server mode and use the web console.
Our initial code might read like this:
public class GraphMeUp { public static void main(String[] args) { GraphDatabaseService graphDb = new EmbeddedGraphDatabase(&amp;#34;/path/to/data/graph.db&amp;#34;); } } or:</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Finding single hop paths</title>
      <link>https://www.markhneedham.com/blog/2013/06/15/neo4jcypher-finding-single-hop-paths/</link>
      <pubDate>Sat, 15 Jun 2013 13:04:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/15/neo4jcypher-finding-single-hop-paths/</guid>
      <description>The neo4j docs have a few examples explaining how to write cypher queries dealing with path ranges but an interesting variation that I came across recently is where we want to find the individual hops in a path.
I thought the managers that Chelsea have had since Roman Abramovich took over would serve as a useful data set to show how this works.
So we create all the managers and a &amp;#39;SUCCEEDED_BY&amp;#39; relationship between them as follows:</description>
    </item>
    
    <item>
      <title>Java: Finding/Setting JDK/$JAVA_HOME on Mac OS X</title>
      <link>https://www.markhneedham.com/blog/2013/06/15/java-findingsetting-jdkjava_home-on-mac-os-x/</link>
      <pubDate>Sat, 15 Jun 2013 10:28:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/15/java-findingsetting-jdkjava_home-on-mac-os-x/</guid>
      <description>As long as I’ve been using a Mac I always understood that if you needed to set $JAVA_HOME for any program, it should be set to /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK.
On my machine this points to the 1.6 JDK:
$ ls -alh /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK /System/Library/Frameworks/JavaVM.framework/Versions/CurrentJDK -&amp;gt; /System/Library/Java/JavaVirtualMachines/1.6.0.jdk/Contents This was a bit surprising to me since I’ve actually got Java 7 installed on the machine as well so I’d assumed the symlink would have been changed:</description>
    </item>
    
    <item>
      <title>neo4j/cypher/Lucene: Dealing with special characters</title>
      <link>https://www.markhneedham.com/blog/2013/06/15/neo4jcypherlucene-dealing-with-special-characters/</link>
      <pubDate>Sat, 15 Jun 2013 09:53:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/15/neo4jcypherlucene-dealing-with-special-characters/</guid>
      <description>neo4j uses Lucene to handle indexing of nodes and relationships in the graph but something that can be a bit confusing at first is how to handle special characters in Lucene queries.
For example let’s say we set up a database with the following data:
CREATE ({name: &amp;#34;-one&amp;#34;}) CREATE ({name: &amp;#34;-two&amp;#34;}) CREATE ({name: &amp;#34;-three&amp;#34;}) CREATE ({name: &amp;#34;four&amp;#34;}) And for whatever reason we only wanted to return the nodes that begin with a hyphen.</description>
    </item>
    
    <item>
      <title>git: Having a branch/tag with the same name (error: dst refspec matches more than one.)</title>
      <link>https://www.markhneedham.com/blog/2013/06/13/git-having-a-branchtag-with-the-same-name-error-dst-refspec-matches-more-than-one/</link>
      <pubDate>Thu, 13 Jun 2013 22:18:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/13/git-having-a-branchtag-with-the-same-name-error-dst-refspec-matches-more-than-one/</guid>
      <description>Andres and I recently found ourselves wanting to delete a remote branch which had the same name as a tag, and therefore the normal way of doing that didn’t work out as well as we’d hoped.
I created a dummy repository to recreate the state we’d got ourselves into:
$ echo &amp;#34;mark&amp;#34; &amp;gt; README $ git commit -am &amp;#34;readme&amp;#34; $ echo &amp;#34;for the branch&amp;#34; &amp;gt;&amp;gt; README $ git commit -am &amp;#34;for the branch&amp;#34; $ git checkout -b same Switched to a new branch &amp;#39;same&amp;#39; $ git push origin same Counting objects: 5, done.</description>
    </item>
    
    <item>
      <title>Unix: find, xargs, zipinfo and the &#39;caution: filename not matched:&#39; error</title>
      <link>https://www.markhneedham.com/blog/2013/06/09/unix-find-xargs-zipinfo-and-the-caution-filename-not-matched-error/</link>
      <pubDate>Sun, 09 Jun 2013 23:10:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/09/unix-find-xargs-zipinfo-and-the-caution-filename-not-matched-error/</guid>
      <description>As I mentioned in my previous post last week I needed to scan all the jar files included with the neo4j-enterprise gem and I started out by finding out where it’s located on my machine:
$ bundle show neo4j-enterprise /Users/markhneedham/.rbenv/versions/jruby-1.7.1/lib/ruby/gems/shared/gems/neo4j-enterprise-1.8.2-java I then thought I could get a list of all the jar files using http://unixhelp.ed.ac.uk/CGI/man-cgi?find and pipe it into http://linux.about.com/library/cmd/blcmdl1_zipinfo.htm via xargs to get all the file names and then search for HighlyAvailableGraphDatabaseFactory:</description>
    </item>
    
    <item>
      <title>neo4j.rb HA: NameError: cannot load Java class org.neo4j.graphdb.factory.HighlyAvailableGraphDatabaseFactory</title>
      <link>https://www.markhneedham.com/blog/2013/06/09/neo4j-rb-ha-nameerror-cannot-load-java-class-org-neo4j-graphdb-factory-highlyavailablegraphdatabasefactory/</link>
      <pubDate>Sun, 09 Jun 2013 16:57:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/09/neo4j-rb-ha-nameerror-cannot-load-java-class-org-neo4j-graphdb-factory-highlyavailablegraphdatabasefactory/</guid>
      <description>neo4j.rb is a JRuby gem that allows you to create an embedded neo4j database and last week I was working out how to setup a neo4j 1.8.2 HA cluster using the gem.
There is an example showing how to create a HA cluster using neo4j.rb so I thought I could adapt that to do what I wanted.
I had the following Gemfile:
source &amp;#39;http://rubygems.org&amp;#39; gem &amp;#39;neo4j&amp;#39;, &amp;#39;2.2.4&amp;#39; gem &amp;#39;neo4j-community&amp;#39;, &amp;#39;1.</description>
    </item>
    
    <item>
      <title>neo4j/cypher 2.0: The CASE statement</title>
      <link>https://www.markhneedham.com/blog/2013/06/09/neo4jcypher-2-0-the-case-statement/</link>
      <pubDate>Sun, 09 Jun 2013 14:02:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/09/neo4jcypher-2-0-the-case-statement/</guid>
      <description>I’ve been playing around with how you might model Premier League managers’ tenures at different clubs in neo4j and eventually decided on the following model:
The date modelling is based on an approach I first came across in a shutl presentation and is described in more detail in the docs.
I created a dummy data set with some made up appointments and dismissals and then tried to write a query to show me who was the manager for a team on a specific date.</description>
    </item>
    
    <item>
      <title>The Affect Heuristic</title>
      <link>https://www.markhneedham.com/blog/2013/06/06/the-affect-heuristic/</link>
      <pubDate>Thu, 06 Jun 2013 22:36:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/06/the-affect-heuristic/</guid>
      <description>In my continued reading of Daniel Kahneman’s Thinking Fast and Slow I’ve reached the section which talks about the affect heuristic which seems particularly applicable to the technical decisions that we make.
The dominance of conclusions over arguments is most pronounced where emotions are involved. The psychologist Paul Slovic has proposed an affect heuristic in which people let their likes and dislikes determine their beliefs about the world.
The way I’ve seen this heuristic coming into play in the software world is when we do an &amp;#39;objective&amp;#39; overview of the technical tools/options that we could use to solve a particular problem.</description>
    </item>
    
    <item>
      <title>Ego Depletion</title>
      <link>https://www.markhneedham.com/blog/2013/06/04/ego-depletion/</link>
      <pubDate>Tue, 04 Jun 2013 23:16:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/06/04/ego-depletion/</guid>
      <description>On the recommendation of Mike Jones I’ve been reading through Daniel Kahneman’s Thinking Fast and Slow in which the first part of the book covers our two styles of thinking:
System 1 - operates automatically and quickly, with little or no effort and no sense of voluntary control.
System 2 - allocates attention to the effortful mental activities that demand it, including complex computations. The operations of System 2 are often associated with the subjective experience of agency, choice, and concentration.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: 400 response - Paths can&#39;t be created inside of foreach</title>
      <link>https://www.markhneedham.com/blog/2013/05/31/neo4jcypher-400-response-paths-cant-be-created-inside-of-foreach/</link>
      <pubDate>Fri, 31 May 2013 00:37:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/31/neo4jcypher-400-response-paths-cant-be-created-inside-of-foreach/</guid>
      <description>In the neo4j 1.9 milestone releases if we wanted to create multiple relationships from a node we could use the following cypher syntax:
require &amp;#39;neography&amp;#39; neo = Neography::Rest.new neo.execute_query(&amp;#34;create (me {name: &amp;#39;Mark&amp;#39;})&amp;#34;) query = &amp;#34; START n=node:node_auto_index(name={name})&amp;#34; query &amp;lt;&amp;lt; &amp;#34; FOREACH (friend in {friends} : CREATE f=friend, n-[:FRIEND]-&amp;gt;f)&amp;#34; neo.execute_query(query, {&amp;#34;name&amp;#34; =&amp;gt; &amp;#34;Mark&amp;#34;, &amp;#34;friends&amp;#34; =&amp;gt; [{ &amp;#34;name&amp;#34; =&amp;gt; &amp;#34;Will&amp;#34;}, {&amp;#34;name&amp;#34; =&amp;gt; &amp;#34;Paul&amp;#34;}]}) To check that the &amp;#39;FRIEND&amp;#39; relationships have been created we’d write the following query:</description>
    </item>
    
    <item>
      <title>Viewing the contents of an archive</title>
      <link>https://www.markhneedham.com/blog/2013/05/29/viewing-the-contents-of-an-archive/</link>
      <pubDate>Wed, 29 May 2013 11:22:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/29/viewing-the-contents-of-an-archive/</guid>
      <description>Every now and then I want to check the contents of an archive without unpacking it and I tend to use http://linux.about.com/od/commands/l/blcmdl1_unzip.htm to do so:
$ unzip -l batch-import-jar-with-dependencies.jar | tail -n 10 1645 02-17-13 01:03 org/neo4j/batchimport/StdOutReport.class 3089 02-17-13 01:03 org/neo4j/batchimport/structs/NodeStruct.class 1244 02-17-13 01:03 org/neo4j/batchimport/structs/Property.class 1732 02-17-13 01:03 org/neo4j/batchimport/structs/PropertyHolder.class 1635 02-17-13 01:03 org/neo4j/batchimport/structs/Relationship.class 905 02-17-13 01:03 org/neo4j/batchimport/utils/Chunker.class 1884 02-17-13 01:03 org/neo4j/batchimport/utils/Params.class 4445 02-17-13 01:03 org/neo4j/batchimport/Utils.class -------- ------- 49947859 16447 files It does the job although it does print out some information that we’re not really interested in so I was intrigued to see that Alistair used http://linux.</description>
    </item>
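The listing technique described in the post above can also be sketched without shelling out to `unzip -l`, using Python's `zipfile` module. This is a minimal illustration, not from the original post; the archive is a throwaway one created just so the example is self-contained:

```python
import tempfile
import zipfile

# Create a small throwaway archive so the example is self-contained
with tempfile.NamedTemporaryFile(suffix=".zip", delete=False) as f:
    archive_path = f.name

with zipfile.ZipFile(archive_path, "w") as zf:
    zf.writestr("org/neo4j/batchimport/Utils.class", b"dummy bytes")
    zf.writestr("README.txt", b"hello")

# List the contents without unpacking, much like `unzip -l`
with zipfile.ZipFile(archive_path) as zf:
    for info in zf.infolist():
        print(f"{info.file_size:>8} {info.filename}")
```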
    
    <item>
      <title>Pomodoros: Just start the timer</title>
      <link>https://www.markhneedham.com/blog/2013/05/27/pomodoros-just-start-the-timer/</link>
      <pubDate>Mon, 27 May 2013 13:23:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/27/pomodoros-just-start-the-timer/</guid>
      <description>I wrote earlier in the year about my use of pomodoros to track what I’m doing outside of work and having done this for 6 months I noticed that I’m now procrastinating over picking something off the list to work on.
(I know…​I am awesome!)
I’m not sure whether this is because I don’t have anything really appealing on the list of things to work on or whether having so many things listed (I have 8-10 items) is causing the paralysis.</description>
    </item>
    
    <item>
      <title>A/B Testing: Being pragmatic with statistical significance</title>
      <link>https://www.markhneedham.com/blog/2013/05/27/ab-testing-pragmatica-statistical-significance/</link>
      <pubDate>Mon, 27 May 2013 13:13:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/27/ab-testing-pragmatica-statistical-significance/</guid>
      <description>One of the first things that we did before starting any of the A/B tests that I’ve previously written about was to work out how many users we needed to go through before we could be sure that the results we saw were statistically significant.
We used the prop.test function from R to do this and based on our traffic at the time worked out that we’d need to run a test for 6 weeks to achieve statistical significance.</description>
    </item>
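The sample-size calculation the post above describes was done with R's `prop.test`, but the same kind of figure can be approximated with the standard two-proportion formula. A hedged Python sketch; the 10% baseline and 12% target conversion rates here are made up for illustration, not taken from the post:

```python
import math

def sample_size_per_variant(p1, p2):
    """Approximate users needed per variant to detect a change in
    conversion rate from p1 to p2 at 5% significance and 80% power."""
    z_alpha = 1.96   # two-sided 5% significance
    z_beta = 0.84    # 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# e.g. detecting an uplift from 10% to 12% conversion
n = sample_size_per_variant(0.10, 0.12)
print(n)
```

Dividing a figure like this by weekly traffic is what gives the "run the test for 6 weeks" style of answer mentioned in the post.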
    
    <item>
      <title>Polyglot Persistence: Embrace the ETL</title>
      <link>https://www.markhneedham.com/blog/2013/05/27/polyglot-persistence-embrace-the-etl/</link>
      <pubDate>Mon, 27 May 2013 00:11:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/27/polyglot-persistence-embrace-the-etl/</guid>
      <description>Over the past few years I’ve seen the emergence of polyglot persistence, i.e. using different data storage technologies for different types of data, and in most situations we work that out up front.
For example we might use MongoDB to store data about a customer journey through our website but we might simultaneously write page view data through to something like Hadoop or Redshift:
This works reasonably well but sometimes it might not be immediately obvious how we want to query our data when we first start collecting it and our storage choice might not be the best for writing these queries.</description>
    </item>
    
    <item>
      <title>Polyglot Persistence: The &#39;boring&#39; relational option</title>
      <link>https://www.markhneedham.com/blog/2013/05/26/polyglot-persistence-the-boring-relational-option/</link>
      <pubDate>Sun, 26 May 2013 23:29:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/26/polyglot-persistence-the-boring-relational-option/</guid>
      <description>I was chatting with Brian Blignaut last week after the Equal Experts NoSQL event and he made an interesting observation that in this age of Polyglot Persistence we often rule out the relational database.
I think it’s definitely better that we now have many different options for where we store our data - be it as key/value pairs, documents or as a network/graph.
Having these options forces us to think more about how we’re going to read/write data in our application whereas previously our effort was focused around which tables we were going to pull out.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Properties or relationships? It&#39;s easy to switch</title>
      <link>https://www.markhneedham.com/blog/2013/05/25/neo4jcypher-properties-or-relationships-its-easy-to-switch/</link>
      <pubDate>Sat, 25 May 2013 12:21:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/25/neo4jcypher-properties-or-relationships-its-easy-to-switch/</guid>
      <description>I’ve written previously about how I’ve converted properties on nodes into relationships and over the past week there was an interesting discussion on the neo4j mailing list about where each is appropriate.
Jim gives quite a neat summary of the difference between the two on the thread:
Properties are the data that an entity like a node [or relationship] holds. Relationships simply form the semantic glue (type, direction, cardinality) between nodes.</description>
    </item>
    
    <item>
      <title>Feedback: Reacting immediately</title>
      <link>https://www.markhneedham.com/blog/2013/05/23/feedback-reacting-immediately/</link>
      <pubDate>Thu, 23 May 2013 22:43:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/23/feedback-reacting-immediately/</guid>
      <description>I was recently reading an article written by Henry Winter in which he mentioned some of the ideas that Sir Alex Ferguson has been covering in some interviews he’s been doing at Harvard, and one bit stood out for me:
In a series of interviews in Harvard, Ferguson debated dealing with “fragile” egos in the dressing room, the power of the two simple words “well done” in motivating individuals and the importance of criticising players’ mistakes immediately after the match and then moving on.</description>
    </item>
    
    <item>
      <title>Ruby/Python: Constructing a taxonomy from an array using zip</title>
      <link>https://www.markhneedham.com/blog/2013/05/19/rubypython-constructing-a-taxonomy-from-an-array-using-zip/</link>
      <pubDate>Sun, 19 May 2013 22:44:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/19/rubypython-constructing-a-taxonomy-from-an-array-using-zip/</guid>
      <description>As I mentioned in my previous blog post I’ve been hacking on a product taxonomy and I wanted to create a &amp;#39;CHILD&amp;#39; relationship between a collection of categories.
For example, I had the following array and I wanted to transform it into an array of &amp;#39;SubCategory, Category&amp;#39; pairs:
taxonomy = [&amp;#34;Cat&amp;#34;, &amp;#34;SubCat&amp;#34;, &amp;#34;SubSubCat&amp;#34;] # I wanted this to become [(&amp;#34;Cat&amp;#34;, &amp;#34;SubCat&amp;#34;), (&amp;#34;SubCat&amp;#34;, &amp;#34;SubSubCat&amp;#34;)] In order to do this we need to zip the first 2 items with the last which I found reasonably easy to do using Python:</description>
    </item>
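The pairing the post above describes comes down to zipping the list against itself offset by one position. A minimal Python sketch using the same example data:

```python
taxonomy = ["Cat", "SubCat", "SubSubCat"]

# Pair each category with its child by zipping the list
# against itself shifted by one position
pairs = list(zip(taxonomy, taxonomy[1:]))
print(pairs)  # [('Cat', 'SubCat'), ('SubCat', 'SubSubCat')]
```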
    
    <item>
      <title>neo4j/cypher: Keep longest path when finding taxonomy</title>
      <link>https://www.markhneedham.com/blog/2013/05/19/neo4jcypher-keep-longest-path-when-finding-taxonomy/</link>
      <pubDate>Sun, 19 May 2013 22:15:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/19/neo4jcypher-keep-longest-path-when-finding-taxonomy/</guid>
      <description>I’ve been playing around with modelling a product taxonomy and one thing that I wanted to do was find out the full path where a product sits under the tree.
I created a simple data set to show the problem:
CREATE (cat { name: &amp;#34;Cat&amp;#34; }) CREATE (subcat1 { name: &amp;#34;SubCat1&amp;#34; }) CREATE (subcat2 { name: &amp;#34;SubCat2&amp;#34; }) CREATE (subsubcat1 { name: &amp;#34;SubSubCat1&amp;#34; }) CREATE (product1 { name: &amp;#34;Product1&amp;#34; }) CREATE (cat)-[:CHILD]-subcat1-[:CHILD]-subsubcat1 CREATE (product1)-[:HAS_CATEGORY]-(subsubcat1) I wanted to write a query which would return &amp;#39;product1&amp;#39; and the tree &amp;#39;Cat -&amp;gt; SubCat1 -&amp;gt; SubSubCat1&amp;#39; and initially wrote the following query:</description>
    </item>
    
    <item>
      <title>Unix: Working with parts of large files</title>
      <link>https://www.markhneedham.com/blog/2013/05/19/unix-working-with-parts-of-large-files/</link>
      <pubDate>Sun, 19 May 2013 21:44:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/19/unix-working-with-parts-of-large-files/</guid>
      <description>Chris and I were looking at the neo4j log files of a client earlier in the week and wanted to do some processing of the file so we could ask the client to send us some further information.
The log file was over 10,000 lines long but the bit of the file we were interested in was only a few hundred lines.
I usually use Vim and the &amp;#39;:set number&amp;#39; when I want to refer to line numbers in a file but Chris showed me that we can achieve the same thing with e.</description>
    </item>
    
    <item>
      <title>A/B Testing: User Experience vs Conversion</title>
      <link>https://www.markhneedham.com/blog/2013/05/18/ab-testing-user-experience-vs-conversion/</link>
      <pubDate>Sat, 18 May 2013 20:18:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/18/ab-testing-user-experience-vs-conversion/</guid>
      <description>I’ve written a couple of posts over the last few months about my experiences with A/B testing and one conversation we often used to have was around user experience vs conversion rate.
Once you start running an A/B test it encourages you to focus more on the conversion rate of users in different parts of the flow and your inclination is to make changes that increase that conversion rate.</description>
    </item>
    
    <item>
      <title>neo4j: When the web console returns nothing...use the data browser!</title>
      <link>https://www.markhneedham.com/blog/2013/05/17/neo4j-when-the-web-console-returns-nothinguse-the-data-browser/</link>
      <pubDate>Fri, 17 May 2013 00:00:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/17/neo4j-when-the-web-console-returns-nothinguse-the-data-browser/</guid>
      <description>In my time playing around with neo4j I’ve run into a problem a few times where I executed a query using the web console (usually accessible @ http://localhost:7474/webadmin/#/console/) and have got absolutely no response.
I noticed a similar thing today when Rickard and I were having a look at why a Lucene index query wasn’t behaving as we expected.
I set up some data in a neo4j database using neography with the following code:</description>
    </item>
    
    <item>
      <title>Book Review: The Signal and the Noise - Nate Silver</title>
      <link>https://www.markhneedham.com/blog/2013/05/14/book-review-the-signal-and-the-noise-nate-silver/</link>
      <pubDate>Tue, 14 May 2013 00:16:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/14/book-review-the-signal-and-the-noise-nate-silver/</guid>
      <description>Nate Silver is famous for having correctly predicted the winner of all 50 states in the 2012 United States elections and Sid recommended his book so I could learn more about statistics for the A/B tests that we were running.
I thought the book was a really good introduction to applied statistics and by using real life examples which most people would be able to relate to it makes a potentially dull subject interesting.</description>
    </item>
    
    <item>
      <title>Sublime: Overriding default file type/Assigning specific files to a file type</title>
      <link>https://www.markhneedham.com/blog/2013/05/05/sublime-overriding-default-file-typeassigning-specific-files-to-a-file-type/</link>
      <pubDate>Sun, 05 May 2013 00:03:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/05/05/sublime-overriding-default-file-typeassigning-specific-files-to-a-file-type/</guid>
      <description>I’ve been using Sublime a bit recently and one thing I wanted to do was put neo4j cypher queries into files with arbitrary extensions and have them recognised as cypher files every time I open them.
I’m using the cypher Sublime plugin to get the syntax highlighting but since I’ve got my cypher in a .haml file it only remembers that it should have cypher highlighting as long as the file is open.</description>
    </item>
    
    <item>
      <title>Ruby 1.9.3 p0: Investigating weirdness with HTTP POST request in net/http</title>
      <link>https://www.markhneedham.com/blog/2013/04/30/ruby-1-9-3-p0-investigating-weirdness-with-http-post-request-in-nethttp/</link>
      <pubDate>Tue, 30 Apr 2013 21:37:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/30/ruby-1-9-3-p0-investigating-weirdness-with-http-post-request-in-nethttp/</guid>
      <description>Thibaut and I spent the best part of the last couple of days trying to diagnose a problem we were having when making a POST request using rest-client to one of our services.
We have nginx fronting the application server so the request passes through there first:
The problem we were having was that the request was timing out on the client side before it had been processed and the request wasn’t reaching the application server.</description>
    </item>
    
    <item>
      <title>Mac OS X: A couple of neat tools</title>
      <link>https://www.markhneedham.com/blog/2013/04/30/mac-os-x-a-couple-of-neat-tools/</link>
      <pubDate>Tue, 30 Apr 2013 20:07:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/30/mac-os-x-a-couple-of-neat-tools/</guid>
      <description>When I first started working at uSwitch Sid installed a couple of &amp;#39;productivity applications&amp;#39; on my Mac which I’ve found pretty useful but from talking to others I realised they aren’t known/being used by everyone.
Alfred: Alfred is a Quicksilver replacement which allows you to quickly open applications, find files, search Google and more. Even though we’re not using half of its features it’s still proved to be useful.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Returning a row with zero count when no relationship exists</title>
      <link>https://www.markhneedham.com/blog/2013/04/30/neo4jcypher-returning-a-row-with-zero-count-when-no-relationship-exists/</link>
      <pubDate>Tue, 30 Apr 2013 07:02:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/30/neo4jcypher-returning-a-row-with-zero-count-when-no-relationship-exists/</guid>
      <description>I’ve been trying to see if I can match some of the football stats that OptaJoe posts on twitter and one that I was looking at yesterday was around the number of red cards different teams have received.
1 - Sunderland have picked up their first PL red card of the season. The only team without one now are Man Utd. Angels.
To refresh this is the sub graph that we’ll need to look at to work it out:</description>
    </item>
    
    <item>
      <title>A/B Testing: Reporting</title>
      <link>https://www.markhneedham.com/blog/2013/04/28/ab-testing-reporting/</link>
      <pubDate>Sun, 28 Apr 2013 22:32:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/28/ab-testing-reporting/</guid>
      <description>A few months ago I wrote about my initial experiences with A/B testing and since then we’ve been working on another one and learnt some things around reporting on these types of tests that I thought were interesting.
Reporting as a first class concern: One thing we changed from our previous test after a suggestion by Mike was to start treating the reporting of data related to the test as a first class citizen.</description>
    </item>
    
    <item>
      <title>Treat servers as cattle: Spin them up, tear them down</title>
      <link>https://www.markhneedham.com/blog/2013/04/27/treat-servers-as-cattle-spin-them-up-tear-them-down/</link>
      <pubDate>Sat, 27 Apr 2013 14:22:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/27/treat-servers-as-cattle-spin-them-up-tear-them-down/</guid>
      <description>A few weeks ago I wrote a post about treating servers as cattle, not as pets in which I described an approach to managing virtual machines at uSwitch whereby we frequently spin up new ones and delete the existing ones.
I’ve worked on teams previously where we’ve also talked about this mentality but ended up not doing it because it was difficult, usually for one of two reasons:
Slow spin up - this might be due to the cloud provider’s infrastructure, doing too much on spin up or I’m sure a variety of other reasons.</description>
    </item>
    
    <item>
      <title>Puppet: Package Versions - To pin or not to pin</title>
      <link>https://www.markhneedham.com/blog/2013/04/27/puppet-package-versions-to-pin-or-not-to-pin/</link>
      <pubDate>Sat, 27 Apr 2013 13:40:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/27/puppet-package-versions-to-pin-or-not-to-pin/</guid>
      <description>Over the last year or so I’ve spent quite a bit of time working with puppet and one of the things that we had to decide when installing packages was whether or not to specify a particular version.
On the first project I worked on we didn’t bother and just let the package manager choose the most recent version.
Therefore if we were installing nginx the puppet code would read like this:</description>
    </item>
    
    <item>
      <title>Unix: Checking for open sockets on nginx</title>
      <link>https://www.markhneedham.com/blog/2013/04/23/unix-checking-for-open-sockets-on-nginx/</link>
      <pubDate>Tue, 23 Apr 2013 23:59:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/23/unix-checking-for-open-sockets-on-nginx/</guid>
      <description>Tim and I were investigating a weird problem we were having with nginx, where it got into a state in which it had exceeded the number of open files allowed on the system and started rejecting requests.
We can find out the maximum number of open files that we’re allowed on a system with the following command:
$ ulimit -n 1024 Our hypothesis was that some socket connections were never being closed and therefore the number of open files was climbing slowly upwards until it exceeded the limit.</description>
    </item>
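The `ulimit -n` check shown above can also be done from inside a process; Python's `resource` module (Unix only) exposes the same per-process file-descriptor limit. A minimal sketch:

```python
import resource

# RLIMIT_NOFILE is the maximum number of open file descriptors
# for this process; the soft limit is the figure `ulimit -n` reports
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
print(f"soft limit: {soft}, hard limit: {hard}")
```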
    
    <item>
      <title>No downtime deploy with capistrano, Thin and nginx</title>
      <link>https://www.markhneedham.com/blog/2013/04/23/no-downtime-deploy-with-capistrano-thin-and-nginx/</link>
      <pubDate>Tue, 23 Apr 2013 23:25:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/23/no-downtime-deploy-with-capistrano-thin-and-nginx/</guid>
      <description>As I mentioned a couple of weeks ago I’ve been working on a tutorial about thinking through problems in graphs and since it’s a Sinatra application I thought thin would be a decent choice of web server.
In my initial setup I had the following nginx config file which was used to proxy requests on to thin:
/etc/nginx/sites-available/thinkingingraphs.conf
upstream thin { server 127.0.0.1:3000; } server { listen 80 default; server_name _; charset utf-8; rewrite ^\/status(.</description>
    </item>
    
    <item>
      <title>Puppet: Installing Oracle Java - oracle-license-v1-1 license could not be presented</title>
      <link>https://www.markhneedham.com/blog/2013/04/18/puppet-installing-oracle-java-oracle-license-v1-1-license-could-not-be-presented/</link>
      <pubDate>Thu, 18 Apr 2013 23:36:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/18/puppet-installing-oracle-java-oracle-license-v1-1-license-could-not-be-presented/</guid>
      <description>In order to run the neo4j server on my Ubuntu 12.04 Vagrant VM I needed to install the Oracle/Sun JDK which proved to be more difficult than I’d expected.
I initially tried to install it via the OAB-Java script but was running into some dependency problems and eventually came across a post which specified a PPA that had an installer I could use.
I wrote a little puppet Java module to wrap the commands in:</description>
    </item>
    
    <item>
      <title>dpkg/apt-cache: Useful commands</title>
      <link>https://www.markhneedham.com/blog/2013/04/18/dpkgapt-cache-useful-commands/</link>
      <pubDate>Thu, 18 Apr 2013 21:54:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/18/dpkgapt-cache-useful-commands/</guid>
      <description>As I’ve mentioned in a couple of previous posts I’ve been playing around with creating a Vagrant VM that I can use for my neo4j hacking which has involved a lot of messing around with installing apt packages.
There are loads of different ways of working out what’s going on when packages aren’t installing as you’d expect so I thought it’d be good to document the ones I’ve been using so I can find them more easily next time.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Redundant relationships</title>
      <link>https://www.markhneedham.com/blog/2013/04/16/neo4jcypher-redundant-relationships/</link>
      <pubDate>Tue, 16 Apr 2013 21:41:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/16/neo4jcypher-redundant-relationships/</guid>
      <description>Last week I was writing a query to find the top scorers in the Premier League so far this season alongside the number of games they’ve played in which initially read like this:
START player = node:players(&amp;#39;name:*&amp;#39;) MATCH player-[:started|as_sub]-playedLike-[:in]-game-[r?:scored_in]-player WITH player, COUNT(DISTINCT game) AS games, COLLECT(r) AS allGoals RETURN player.name, games, LENGTH(allGoals) AS goals ORDER BY goals DESC LIMIT 5 +------------------------------------+ | player.name | games | goals | +------------------------------------+ | &amp;#34;Luis Suárez&amp;#34; | 30 | 22 | | &amp;#34;Robin Van Persie&amp;#34; | 30 | 19 | | &amp;#34;Gareth Bale&amp;#34; | 27 | 17 | | &amp;#34;Michu&amp;#34; | 29 | 16 | | &amp;#34;Demba Ba&amp;#34; | 28 | 15 | +------------------------------------+ 5 rows 1 ms I modelled whether a player started a game or came on as a substitute with separate relationship types &amp;#39;started&amp;#39; and &amp;#39;as_sub&amp;#39; but in this query we’re not interested in that, we just want to know whether they played.</description>
    </item>
    
    <item>
      <title>Puppet Debt</title>
      <link>https://www.markhneedham.com/blog/2013/04/16/puppet-debt/</link>
      <pubDate>Tue, 16 Apr 2013 20:57:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/16/puppet-debt/</guid>
      <description>I’ve been playing around with a puppet configuration to run a neo4j server on an Ubuntu VM and one thing that has been quite tricky is getting the Sun/Oracle Java JDK to install repeatably.
I adapted Julian’s Java module which uses OAB-Java and although it was certainly working cleanly at one stage I somehow ended up with it not working because of failed dependencies:
[2013-04-12 07:03:10] Notice: /Stage[main]/Java/Exec[install OAB repo]/returns: [x] Installing Java build requirements Ofailed [2013-04-12 07:03:10] Notice: /Stage[main]/Java/Exec[install OAB repo]/returns: ^[[m^O [i] Showing the last 5 lines from the logfile (/root/oab-java.</description>
    </item>
    
    <item>
      <title>Capistrano: Host key verification failed. ** [err] fatal: The remote end hung up unexpectedly</title>
      <link>https://www.markhneedham.com/blog/2013/04/14/capistrano-host-key-verification-failed-err-fatal-the-remote-end-hung-up-unexpectedly/</link>
      <pubDate>Sun, 14 Apr 2013 18:18:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/14/capistrano-host-key-verification-failed-err-fatal-the-remote-end-hung-up-unexpectedly/</guid>
      <description>As I mentioned in my previous post I’ve been deploying a web application to a vagrant VM using Capistrano and my initial configuration was like so:
require &amp;#39;capistrano/ext/multistage&amp;#39; set :application, &amp;#34;thinkingingraphs&amp;#34; set :scm, :git set :repository, &amp;#34;git@bitbucket.org:markhneedham/thinkingingraphs.git&amp;#34; set :scm_passphrase, &amp;#34;&amp;#34; set :ssh_options, {:forward_agent =&amp;gt; true, :paranoid =&amp;gt; false, keys: [&amp;#39;~/.vagrant.d/insecure_private_key&amp;#39;]} set :stages, [&amp;#34;vagrant&amp;#34;] set :default_stage, &amp;#34;vagrant&amp;#34; set :user, &amp;#34;vagrant&amp;#34; server &amp;#34;192.168.33.101&amp;#34;, :app, :web, :db, :primary =&amp;gt; true set :deploy_to, &amp;#34;/var/www/thinkingingraphs&amp;#34; When I ran &amp;#39;cap deploy&amp;#39; I ended up with the following error:</description>
    </item>
    
    <item>
      <title>Capistrano: Deploying to a Vagrant VM</title>
      <link>https://www.markhneedham.com/blog/2013/04/13/capistrano-deploying-to-a-vagrant-vm/</link>
      <pubDate>Sat, 13 Apr 2013 11:17:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/13/capistrano-deploying-to-a-vagrant-vm/</guid>
      <description>I’ve been working on a tutorial around thinking through problems in graphs using my football graph and I wanted to deploy it on a local vagrant VM as a stepping stone to deploying it in a live environment.
My Vagrant file for the VM looks like this:
# -*- mode: ruby -*- # vi: set ft=ruby : Vagrant::Config.run do |config| config.vm.box = &amp;#34;precise64&amp;#34; config.vm.define :neo01 do |neo| neo.vm.network :hostonly, &amp;#34;192.</description>
    </item>
    
    <item>
      <title>awk: Parsing &#39;free -m&#39; output to get memory usage/consumption</title>
      <link>https://www.markhneedham.com/blog/2013/04/10/awk-parsing-free-m-output-to-get-memory-usageconsumption/</link>
      <pubDate>Wed, 10 Apr 2013 07:03:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/10/awk-parsing-free-m-output-to-get-memory-usageconsumption/</guid>
      <description>Although I know this problem is already solved by collectd and New Relic I wanted to write a little shell script that showed me the memory usage on a bunch of VMs by parsing the output of http://linux.about.com/library/cmd/blcmdl1_free.htm.
The output I was playing with looks like this:
$ free -m total used free shared buffers cached Mem: 365 360 5 0 59 97 -/+ buffers/cache: 203 161 Swap: 767 13 754 I wanted to find out what % of the memory on the machine was being used and as I understand it the numbers that we would use to calculate this are the &amp;#39;total&amp;#39; value on the &amp;#39;Mem&amp;#39; line and the &amp;#39;used&amp;#39; value on the &amp;#39;buffers/cache&amp;#39; line.</description>
    </item>
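The calculation the post above describes — the &#39;used&#39; value on the &#39;-/+ buffers/cache&#39; line as a percentage of the &#39;total&#39; on the &#39;Mem&#39; line — can be sketched in Python. The sample output below uses the figures quoted in the post; the parsing is a hedged illustration, not the awk script from the original:

```python
# Sample `free -m` output, with the figures quoted in the post
sample = """\
             total       used       free     shared    buffers     cached
Mem:           365        360          5          0         59         97
-/+ buffers/cache:        203        161
Swap:          767         13        754
"""

total = used = None
for line in sample.splitlines():
    fields = line.split()
    if line.startswith("Mem:"):
        total = int(fields[1])   # total memory on the Mem line
    elif line.startswith("-/+ buffers/cache:"):
        used = int(fields[2])    # memory used, excluding buffers/cache

pct_used = 100.0 * used / total
print(f"{pct_used:.1f}% used")  # roughly 55.6% for these figures
```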
    
    <item>
      <title>Python: Reading a JSON file</title>
      <link>https://www.markhneedham.com/blog/2013/04/09/python-reading-a-json-file/</link>
      <pubDate>Tue, 09 Apr 2013 07:23:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/09/python-reading-a-json-file/</guid>
      <description>I’ve been playing around with some code to spin up AWS instances using Fabric and Boto and one thing that I wanted to do was define a bunch of default properties in a JSON file and then load this into a script.
I found it harder to work out how to do this than I expected to so I thought I’d document it for future me!
My JSON file looks like this:</description>
    </item>
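A minimal sketch of the JSON-loading approach the post above describes; the file contents and keys here are hypothetical, invented purely so the example is self-contained:

```python
import json
import tempfile

# Write a small defaults file so the example is self-contained;
# these keys are hypothetical, not from the original post
defaults = {"instance_type": "m1.small", "region": "us-east-1"}
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(defaults, f)
    config_path = f.name

# Load the default properties back into a dict
with open(config_path) as f:
    properties = json.load(f)

print(properties["instance_type"])  # m1.small
```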
    
    <item>
      <title>Treating servers as cattle, not as pets</title>
      <link>https://www.markhneedham.com/blog/2013/04/07/treating-servers-as-cattle-not-as-pets/</link>
      <pubDate>Sun, 07 Apr 2013 11:41:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/07/treating-servers-as-cattle-not-as-pets/</guid>
      <description>Although I didn’t go to Dev Ops Days London earlier in the year I was following the hash tag on twitter and one of my favourite things that I read was the following:
“Treating servers as cattle, not as pets” #DevOpsDays
I think this is particularly applicable now that a lot of the time we’re using virtualised production environments via AWS, Rackspace or .
At uSwitch we use AWS and over the last week Sid and I spent some time investigating a memory leak by running our applications against two different versions of Ruby.</description>
    </item>
    
    <item>
      <title>Sublime: Getting Textmate&#39;s Reveal/Select in Side Bar (Cmd &#43; Ctrl &#43; R)</title>
      <link>https://www.markhneedham.com/blog/2013/04/07/sublime-getting-textmates-revealselect-in-side-bar-cmd-ctrl-r/</link>
      <pubDate>Sun, 07 Apr 2013 01:00:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/07/sublime-getting-textmates-revealselect-in-side-bar-cmd-ctrl-r/</guid>
      <description>After coming across this post about why you should use Sublime Text I decided to try using it a bit more and one of the things that I missed from Textmate was the way you can select the current file on the sidebar.
In Textmate the shortcut to do that is &amp;#39;Cmd + Ctrl + R&amp;#39; so I wanted to be able to do something similar or configure Sublime so it responded to the same shortcut.</description>
    </item>
    
    <item>
      <title>MySQL: Repairing broken tables/indices</title>
      <link>https://www.markhneedham.com/blog/2013/04/06/mysql-repairing-broken-tablesindices/</link>
      <pubDate>Sat, 06 Apr 2013 17:26:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/04/06/mysql-repairing-broken-tablesindices/</guid>
      <description>I part-time administer a football forum that I used to run when I was at university and one problem we had recently was that some of the tables/indices had got corrupted when MySQL crashed due to a lack of disc space.
We weren’t seeing any visible sign of a problem in any of the logs but whenever you tried to query one of the topics it wasn’t returning any posts.</description>
    </item>
    
    <item>
      <title>Embracing the logs</title>
      <link>https://www.markhneedham.com/blog/2013/03/31/embracing-the-logs/</link>
      <pubDate>Sun, 31 Mar 2013 21:44:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/31/embracing-the-logs/</guid>
      <description>Despite the fact that I’ve been working full time in software for almost 8 years now, every now and then I still need a reminder of how useful reading logs can be in helping solve problems.
I had a couple of such instances recently which I thought I’d document.
The first was a couple of weeks ago when Tim and I were pairing on moving some applications from Passenger to Unicorn and were testing whether or not we’d done so successfully.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Playing around with time</title>
      <link>https://www.markhneedham.com/blog/2013/03/31/neo4jcypher-playing-around-with-time/</link>
      <pubDate>Sun, 31 Mar 2013 21:08:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/31/neo4jcypher-playing-around-with-time/</guid>
      <description>I’ve done a bit of modelling with years and months in neo4j graphs that I’ve worked on previously but I haven’t ever done anything with time so I thought it’d be interesting to have a go with my football graph.
I came across this StackOverflow post on my travels which suggested that indexing nodes by time would be helpful and since I have a bunch of football matches with associated times I thought I’d try it out.</description>
    </item>
    
    <item>
      <title>Editing config files on a server &amp; Ctrl-Z</title>
      <link>https://www.markhneedham.com/blog/2013/03/29/editing-config-files-on-a-server-ctrl-z/</link>
      <pubDate>Fri, 29 Mar 2013 10:51:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/29/editing-config-files-on-a-server-ctrl-z/</guid>
      <description>A couple of weeks ago Tim and I were spinning up a new service on a machine which wasn’t quite working so we were manually making changes to the /etc/nginx/nginx.conf file and restarting nginx to try and sort it out.
This process is generally not that interesting - you open the file in vi, make some changes, close it, then restart nginx and see if it works. If not then you open the file again and repeat.</description>
    </item>
    
    <item>
      <title>Incrementally rolling out machines with a new puppet role</title>
      <link>https://www.markhneedham.com/blog/2013/03/24/incrementally-rolling-out-machines-with-a-new-puppet-role/</link>
      <pubDate>Sun, 24 Mar 2013 22:52:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/24/incrementally-rolling-out-machines-with-a-new-puppet-role/</guid>
      <description>Last week Jason and I, with (a lot of) help from Tim, worked on moving several of our applications from Passenger to Unicorn and decided that the easiest way to do this was to create a new set of nodes with this setup.
The architecture we’re working with looks like this at a VM level:
The &amp;#39;nginx LB&amp;#39; nodes are responsible for routing all the requests to their appropriate application servers and the &amp;#39;web&amp;#39; nodes serve the different applications initially using Passenger.</description>
    </item>
    
    <item>
      <title>Best tool for the job/Learning new ways to do things</title>
      <link>https://www.markhneedham.com/blog/2013/03/24/best-tool-for-the-joblearning-new-ways-to-do-things/</link>
      <pubDate>Sun, 24 Mar 2013 22:01:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/24/best-tool-for-the-joblearning-new-ways-to-do-things/</guid>
      <description>I recently came across an interesting post written by Randy Luecke titled &amp;#39;I’m done with the web&amp;#39; in which he expresses his surprise that people often aren’t willing to take the time out to learn something new.
In this context he’s referring to javascript libraries but I think his thinking is generally applicable.
Having worked for a few years now I’ve played around with a reasonable number of programming languages/text editors/databases/etc to the point that I have favourites when it comes to solving certain problems.</description>
    </item>
    
    <item>
      <title>When nokogiri fails with &#39;Nokogiri::XML::SyntaxError: Element script embeds close tag&#39; Web Driver to the rescue</title>
      <link>https://www.markhneedham.com/blog/2013/03/24/when-nokogiri-fails-with-nokogirixmlsyntaxerror-element-script-embeds-close-tag-web-driver-to-the-rescue/</link>
      <pubDate>Sun, 24 Mar 2013 21:20:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/24/when-nokogiri-fails-with-nokogirixmlsyntaxerror-element-script-embeds-close-tag-web-driver-to-the-rescue/</guid>
      <description>As I mentioned in my previous post I wanted to add televised games to my football graph and the Premier League website seemed like the best case to find out which games those were.
I initially tried to use Nokogiri to grab the data that I wanted...
&amp;gt; require &amp;#39;nokogiri&amp;#39; &amp;gt; require &amp;#39;open-uri&amp;#39; &amp;gt; tv_times = Nokogiri::HTML(open(&amp;#39;http://www.premierleague.com/en-gb/matchday/broadcast-schedules.tv.html?rangeType=.dateSeason&amp;amp;country=GB&amp;amp;clubId=ALL&amp;amp;season=2012-2013&amp;amp;isLive=true&amp;#39;)) ...but when I tried to query by CSS selector for all the matches nothing came back:</description>
    </item>
    
    <item>
      <title>neo4j/cypher: CypherTypeException: Failed merging Number with Relationship</title>
      <link>https://www.markhneedham.com/blog/2013/03/24/neo4jcypher-cyphertypeexception-failed-merging-number-with-relationship/</link>
      <pubDate>Sun, 24 Mar 2013 13:00:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/24/neo4jcypher-cyphertypeexception-failed-merging-number-with-relationship/</guid>
      <description>The latest thing that I added to my football graph was the matches that are shown on TV as I have the belief that players who score on televised games get more attention than players who score in other games.
I thought it’d be interesting to work out who the top scorers are on each of these game types.
I added the following relationship type to allow me to do this:</description>
    </item>
    
    <item>
      <title>beanstalkd: Getting the status of the queue</title>
      <link>https://www.markhneedham.com/blog/2013/03/21/beanstalkd-getting-the-status-of-the-queue/</link>
      <pubDate>Thu, 21 Mar 2013 23:25:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/21/beanstalkd-getting-the-status-of-the-queue/</guid>
      <description>For the last few days Jason and I have been porting a few of our applications across to a new puppet setup and one thing we needed to do was check that messages were passing through beanstalkd correctly.
We initially had the idea that it wasn’t configured correctly so Paul showed us a way of checking whether that was the case by connecting to the port it runs on like so:</description>
    </item>
    
    <item>
      <title>Wiring up an Amazon S3 bucket to a CNAME entry - The specified bucket does not exist</title>
      <link>https://www.markhneedham.com/blog/2013/03/21/wiring-up-an-amazon-s3-bucket-to-a-cname-entry-the-specified-bucket-does-not-exist/</link>
      <pubDate>Thu, 21 Mar 2013 22:39:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/21/wiring-up-an-amazon-s3-bucket-to-a-cname-entry-the-specified-bucket-does-not-exist/</guid>
      <description>Jason and I were setting up an internal static website using an S3 bucket a couple of days ago and wanted to point a more friendly domain name at it.
We initially called our bucket &amp;#39;static-site&amp;#39; and then created a CNAME entry using zerigo to point our sub domain at the bucket.
The mapping was something like this:
our-subdomain.somedomain.com -&amp;gt; static-site.s3-website-eu-west-1.amazonaws.com When we tried to access the site through our-subdomain.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: WITH, COLLECT &amp; EXTRACT</title>
      <link>https://www.markhneedham.com/blog/2013/03/20/neo4jcypher-with-collect-extract/</link>
      <pubDate>Wed, 20 Mar 2013 02:54:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/20/neo4jcypher-with-collect-extract/</guid>
      <description>As I mentioned in my last post I’m trying to get the hang of the WITH statement (http://docs.neo4j.org/chunked/milestone/query-with.html) in neo4j’s cypher query language and I found another application when trying to work out which opponents teams played on certain days.
I started out with a query which grouped the data set by day and showed the opponents that were played on that day:
START team = node:teams(&amp;#39;name:&amp;#34;Manchester United&amp;#34;&amp;#39;) MATCH team-[h:home_team|away_team]-game-[:on_day]-day RETURN DISTINCT day.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Getting the hang of the WITH statement</title>
      <link>https://www.markhneedham.com/blog/2013/03/20/neo4jcypher-getting-the-hang-of-the-with-statement/</link>
      <pubDate>Wed, 20 Mar 2013 00:25:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/20/neo4jcypher-getting-the-hang-of-the-with-statement/</guid>
      <description>I wrote a post a few weeks ago showing an example of a cypher query which made use of the WITH statement but I still don’t completely understand how it works so I thought I’d write some more queries that use it.
I wanted to find out whether Luis Suárez has a better scoring record depending on which day a match is played on.
We start out by finding all the matches that he’s played in and which days those matches were on:</description>
    </item>
    
    <item>
      <title>neo4j/cypher: SQL style GROUP BY WITH LIMIT query</title>
      <link>https://www.markhneedham.com/blog/2013/03/18/neo4jcypher-sql-style-group-by-with-limit-query/</link>
      <pubDate>Mon, 18 Mar 2013 23:19:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/18/neo4jcypher-sql-style-group-by-with-limit-query/</guid>
      <description>A few weeks ago I wrote a blog post where I described how we could construct a SQL GROUP BY style query in cypher and last week I wanted to write a similar query but with what I think would be a LIMIT clause in SQL.
I wanted to find the maximum number of goals that players had scored in a match for a specific team and started off with the following query to find all the matches that players had scored in:</description>
    </item>
    
    <item>
      <title>clojure/Java Interop: The doto macro</title>
      <link>https://www.markhneedham.com/blog/2013/03/17/clojurejava-interop-the-doto-macro/</link>
      <pubDate>Sun, 17 Mar 2013 20:21:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/17/clojurejava-interop-the-doto-macro/</guid>
      <description>I recently wrote about some code I’ve been playing with to import neo4j spatial data and while looking to simplify the code I came across the doto macro (http://clojure.org/java_interop).
The doto macro allows us to chain method calls on an initial object and then returns the resulting object. e.g.
(doto (new java.util.HashMap) (.put &amp;#34;a&amp;#34; 1) (.put &amp;#34;b&amp;#34; 2)) -&amp;gt; {a=1, b=2} In our case this comes in quite useful in the function used to create a stadium node which initially reads like this: (defn create-stadium-node [db line] (let [stadium-node (.</description>
    </item>
    
    <item>
      <title>clojure/Java Interop - Importing neo4j spatial data</title>
      <link>https://www.markhneedham.com/blog/2013/03/17/clojurejava-interop-importing-neo4j-spatial-data/</link>
      <pubDate>Sun, 17 Mar 2013 18:56:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/17/clojurejava-interop-importing-neo4j-spatial-data/</guid>
      <description>I wrote a post about a week ago where I described how I’d added football stadiums to my football graph using neo4j spatial and after I’d done that I wanted to put it into my import script along with the rest of the data.
I thought leiningen would probably work quite well for this as you can point it at a Java class and have it be executed.
To start with I had to change the import code slightly to link stadiums to teams which have already been added to the graph:</description>
    </item>
    
    <item>
      <title>Understanding what lsof socket/port aliases refer to</title>
      <link>https://www.markhneedham.com/blog/2013/03/17/understanding-what-lsof-socketport-aliases-refer-to/</link>
      <pubDate>Sun, 17 Mar 2013 14:00:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/17/understanding-what-lsof-socketport-aliases-refer-to/</guid>
      <description>Earlier in the week we wanted to check which ports were being listened on and by which processes, which we can do with the following command on Mac OS X:
$ lsof -ni | grep LISTEN idea 2398 markhneedham 58u IPv6 0xac8f13f77b903331 0t0 TCP *:49410 (LISTEN) idea 2398 markhneedham 65u IPv6 0xac8f13f7799a4af1 0t0 TCP *:58741 (LISTEN) idea 2398 markhneedham 122u IPv6 0xac8f13f7799a4711 0t0 TCP 127.0.0.1:6942 (LISTEN) idea 2398 markhneedham 249u IPv6 0xac8f13f777586711 0t0 TCP *:63342 (LISTEN) idea 2398 markhneedham 253u IPv6 0xac8f13f777586331 0t0 TCP 127.</description>
    </item>
    
    <item>
      <title>A quick and dirty way of testing the performance of a service</title>
      <link>https://www.markhneedham.com/blog/2013/03/16/a-quick-and-dirty-way-of-testing-the-performance-of-a-service/</link>
      <pubDate>Sat, 16 Mar 2013 11:58:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/16/a-quick-and-dirty-way-of-testing-the-performance-of-a-service/</guid>
      <description>We had a power outage in our data centre yesterday and once it had recovered Jason and I wanted to do a quick check that one of our backend services was still responding in an acceptable amount of time.
Since this particular service only serves HTTP GET requests it was reasonably easy to setup a cURL command to do this:
while true; do curl -k -s -w %{time_total} https://serviceurl/whatever/something -o /dev/null; printf &amp;#34;\n&amp;#34;; done &amp;gt; service.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Finding football stadiums near a city using spatial</title>
      <link>https://www.markhneedham.com/blog/2013/03/10/neo4jcypher-finding-football-stadiums-near-a-city-using-spatial/</link>
      <pubDate>Sun, 10 Mar 2013 22:13:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/10/neo4jcypher-finding-football-stadiums-near-a-city-using-spatial/</guid>
      <description>One of the things that I wanted to add to my football graph was something location related so I could try out neo4j spatial and I thought the easiest way to do that was to model the location of football stadiums.
To start with I needed to add spatial as an unmanaged extension to my neo4j plugins folder which involved doing the following:
$ git clone git://github.com/neo4j/spatial.git spatial $ cd spatial $ mvn clean package -Dmaven.</description>
    </item>
    
    <item>
      <title>neo4j: Make properties relationships</title>
      <link>https://www.markhneedham.com/blog/2013/03/06/neo4j-make-properties-relationships/</link>
      <pubDate>Wed, 06 Mar 2013 00:59:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/06/neo4j-make-properties-relationships/</guid>
      <description>I spent some of the weekend working my way through Jim, Ian &amp;amp; Emil&amp;#39;s book &amp;#39;Graph Databases&amp;#39; and one of the things that they emphasise is that graphs allow us to make relationships first class citizens in our model.
Looking back on a couple of the graphs that I modelled last year I realise that I didn’t quite get this and although the graphs I modelled had some relationships a lot of the time I was defining things as properties on nodes.</description>
    </item>
    
    <item>
      <title>Ruby/Haml: Conditionally/Optionally setting an attribute/class</title>
      <link>https://www.markhneedham.com/blog/2013/03/02/rubyhaml-conditionallyoptionally-setting-an-attributeclass/</link>
      <pubDate>Sat, 02 Mar 2013 23:22:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/02/rubyhaml-conditionallyoptionally-setting-an-attributeclass/</guid>
      <description>One of the things that we want to do reasonably frequently is set an attribute (most often a class) on an HTML element depending on the value of a variable.
I always forget how to do this in Haml so I thought I better write it down so I’ll remember next time!
Let’s say we want to add a success class to a paragraph if the variable correct is true and not have any value if it’s false.</description>
    </item>
    
    <item>
      <title>Ruby/Haml: Maintaining white space/indentation in a &lt;pre&gt; tag</title>
      <link>https://www.markhneedham.com/blog/2013/03/02/rubyhaml-maintaining-white-spaceindentation-in-a-pre-tag/</link>
      <pubDate>Sat, 02 Mar 2013 22:19:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/03/02/rubyhaml-maintaining-white-spaceindentation-in-a-pre-tag/</guid>
      <description>I’ve been writing a little web app in which I wanted to display cypher queries inside a &amp;lt;pre&amp;gt; tag which was then prettified using SyntaxHighlighter but I was having problems with how code on new lines was being displayed.
I had the following Haml code to display a query looking up Gareth Bale in a graph:
%pre{ :class =&amp;gt; &amp;#34;brush: cypher; gutter: false; toolbar: false;&amp;#34;} START player = node:players(&amp;#39;name:&amp;#34;Gareth Bale&amp;#34;&amp;#39;) RETURN player.</description>
    </item>
    
    <item>
      <title>neo4j: Loading data - REST API vs Batch Import</title>
      <link>https://www.markhneedham.com/blog/2013/02/28/neo4j-loading-data-rest-api-vs-batch-import/</link>
      <pubDate>Thu, 28 Feb 2013 23:36:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/28/neo4j-loading-data-rest-api-vs-batch-import/</guid>
      <description>A couple of weeks ago when I first started playing around with my football data set I was loading all the data into neo4j using the REST API via neography which was taking around 4 minutes to load.
The data set consisted of just over 250 matches which translated into 8,000 nodes &amp;amp; 30,000 relationships so it’s small by any measure.
Ashok and I were discussing how that could be quicker and the first thing we tried was to store inserted nodes in an in memory hash map and look them up from there rather than doing an index lookup each time.</description>
    </item>
    
    <item>
      <title>Vertical/Horizontal Slicing</title>
      <link>https://www.markhneedham.com/blog/2013/02/28/verticalhorizontal-slicing/</link>
      <pubDate>Thu, 28 Feb 2013 22:23:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/28/verticalhorizontal-slicing/</guid>
      <description>A few years ago I wrote a bunch of posts exploring my experiences of outside-in development eventually coming to the conclusion that it seemed to make sense to drive out functionality from the UI and work back from there.
i.e. we take a vertical slice of functionality and then drive it end to end.
On the team I’m working on there’s been success using an approach where the functionality is still split vertically but we work across a horizontal layer for all the cards before moving onto the next layer.</description>
    </item>
    
    <item>
      <title>Compatible Opinions &amp; Confirmation Bias</title>
      <link>https://www.markhneedham.com/blog/2013/02/28/compatible-opinions-confirmation-bias/</link>
      <pubDate>Thu, 28 Feb 2013 21:57:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/28/compatible-opinions-confirmation-bias/</guid>
      <description>In 2011 Jay Fields wrote a blog post in which he suggested that it’s better to build teams in which people have a similar opinion on the way software should be built at a high level rather than having people whose opinions are in conflict.
He referred to this as having &amp;#39;compatible opinions on software&amp;#39; and since I read the post I’ve become much more aware of this myself on the teams that I’ve worked on.</description>
    </item>
    
    <item>
      <title>Micro Services: Where does the complexity go?</title>
      <link>https://www.markhneedham.com/blog/2013/02/28/micro-services-where-does-the-complexity-go/</link>
      <pubDate>Thu, 28 Feb 2013 00:00:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/28/micro-services-where-does-the-complexity-go/</guid>
      <description>For the past year every system that I’ve worked on has been designed around a micro services architecture and while there are benefits with this approach there is an inherent complexity in software which has to go somewhere!
I thought it’d be interesting to run through some of the new complexities that I’ve noticed in what may well be an acknowledgement of the difficulty of designing distributed systems.
Interactions between components One of the advantages of having lots of small applications is that each one is conceptually easier to understand and we only need to keep the mental model of how that one application works when we’re working on it.</description>
    </item>
    
    <item>
      <title>Reading Code: Assume it doesn&#39;t work</title>
      <link>https://www.markhneedham.com/blog/2013/02/27/reading-code-assume-it-doesnt-work/</link>
      <pubDate>Wed, 27 Feb 2013 23:12:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/27/reading-code-assume-it-doesnt-work/</guid>
      <description>Jae and I have spent a reasonable chunk of the past few weeks pairing on code that neither of us are familiar with and at times we’ve found it quite difficult to work out exactly what it’s supposed to be doing.
My default stance in this situation is to assume that the code is probably correct and then try and work out how that’s the case.
After I’d vocalised this a few times, Jae pointed out that we couldn’t be sure that the code worked and it didn’t make sense to start with that as an assumption.</description>
    </item>
    
    <item>
      <title>Micro Services: Readme files</title>
      <link>https://www.markhneedham.com/blog/2013/02/25/micro-services-readme-files/</link>
      <pubDate>Mon, 25 Feb 2013 23:58:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/25/micro-services-readme-files/</guid>
      <description>By my latest count I have around 15 different micro services/applications checked out on my machine which comprise the system that I’m currently working on.
Most of these are Ruby related so it’s easy to figure out how to start up a local copy because it’s either bundle exec rails server if it’s a rails application or bundle exec rackup if it’s a sinatra/rack application.
The clojure applications follow a similar convention and we use rake to run any offline tasks.</description>
    </item>
    
    <item>
      <title>Pomodoros and the To-Do list</title>
      <link>https://www.markhneedham.com/blog/2013/02/25/pomodoros-and-the-to-do-list/</link>
      <pubDate>Mon, 25 Feb 2013 23:33:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/25/pomodoros-and-the-to-do-list/</guid>
      <description>Anna and I were recently discussing the way that we get things done outside of work and since December I’ve been fairly religiously working through various &amp;#39;to-do&amp;#39; lists with a pomodoro timer.
So far I’ve done 308 30 minute pomodoros in about 8 weeks which is just under 20 hours a week which is not bad but still leaves time for a ridiculous amount of procrastination.
These are some of the things that I’ve noticed from only doing things when it’s explicitly on a timer:</description>
    </item>
    
    <item>
      <title>Reading outside your area of interest</title>
      <link>https://www.markhneedham.com/blog/2013/02/25/reading-outside-your-area-of-interest/</link>
      <pubDate>Mon, 25 Feb 2013 22:56:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/25/reading-outside-your-area-of-interest/</guid>
      <description>A reasonable amount of the information that I consume comes either via scanning twitter or from my prismatic feed but I noticed that I’m quite biased to reading things in similar subject areas.
I tend to end up reading about data mining/science, functional programming and startups and while the articles are mostly interesting it does eventually start to feel like you’re in an echo chamber.
I have a subscription to the ACM mainly because I enjoy reading the &amp;#39;Communications of the ACM&amp;#39; magazine which gets sent out every month and until recently I only read articles which I thought would be interesting.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Combining COUNT and COLLECT in one query</title>
      <link>https://www.markhneedham.com/blog/2013/02/24/neo4jcypher-combining-count-and-collect-in-one-query/</link>
      <pubDate>Sun, 24 Feb 2013 19:19:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/24/neo4jcypher-combining-count-and-collect-in-one-query/</guid>
      <description>In my continued playing around with football data I wanted to write a cypher query against neo4j which would show me which teams had missed the most penalties this season and who missed them.
I started off with a query that returned all the penalties that have been missed this season and the games those misses happened in:
START player = node:players(&amp;#39;name:*&amp;#39;) MATCH player-[:missed_penalty_in]-game, player-[:played|subbed_on]-stats-[:in]-game, stats-[:for]-team, game-[:home_team]-home, game-[:away_team]-away RETURN player.</description>
    </item>
    
    <item>
      <title>Ruby: Stripping out a non breaking space character (&amp;nbsp;)</title>
      <link>https://www.markhneedham.com/blog/2013/02/23/ruby-stripping-out-a-non-breaking-space-character-nbsp/</link>
      <pubDate>Sat, 23 Feb 2013 15:04:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/23/ruby-stripping-out-a-non-breaking-space-character-nbsp/</guid>
      <description>A couple of days ago I was playing with some code to scrape data from a web page and I wanted to skip a row in a table if the row didn’t contain any text.
I initially had the following code to do that:
rows.each do |row| next if row.strip.empty? # other scraping code end Unfortunately that approach broke down fairly quickly because empty rows contained a non breaking space i.</description>
    </item>
    
    <item>
      <title>neo4j/cypher: Using a WHERE clause to filter paths</title>
      <link>https://www.markhneedham.com/blog/2013/02/19/neo4jcypher-using-a-where-clause-to-filter-paths/</link>
      <pubDate>Tue, 19 Feb 2013 00:03:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/19/neo4jcypher-using-a-where-clause-to-filter-paths/</guid>
      <description>One of the cypher queries that I wanted to write recently was one to find all the players that have started matches for Arsenal this season and the number of matches that they’ve played in.
The data model that I’m querying looks like this:
I started off with the following query which traverses from Arsenal to all the games that they’ve taken part in and finds all the players who’ve played in those games:</description>
    </item>
    
    <item>
      <title>Micro Services Style Data Work Flow</title>
      <link>https://www.markhneedham.com/blog/2013/02/18/micro-services-style-data-work-flow/</link>
      <pubDate>Mon, 18 Feb 2013 22:16:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/18/micro-services-style-data-work-flow/</guid>
      <description>Having worked on a few data related applications over the last ten months or so Ashok and I were recently discussing some of the things that we’ve learnt.
One of the things he pointed out is that it’s very helpful to separate the different stages of a data work flow into their own applications/scripts.
I decided to try out this idea with some football data that I’m currently trying to model and I ended up with the following stages:</description>
    </item>
    
    <item>
      <title>neo4j/cypher: SQL style GROUP BY functionality</title>
      <link>https://www.markhneedham.com/blog/2013/02/17/neo4jcypher-sql-style-group-by-functionality/</link>
      <pubDate>Sun, 17 Feb 2013 21:05:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/17/neo4jcypher-sql-style-group-by-functionality/</guid>
      <description>As I mentioned in a previous post I’ve been playing around with some football related data over the last few days and one query I ran (using cypher) was to find all the players who’ve been sent off this season in the Premiership.
The model in the graph around sending offs looks like this:
My initial query looked like this:
START player = node:players(&amp;#39;name:*&amp;#39;) MATCH player-[:sent_off_in]-game-[:in_month]-month RETURN player.name, month.name First we get the names of all the players which are stored in an index and then we follow relationships to the games they were sent off in and then find which months those games were played in.</description>
    </item>
    
    <item>
      <title>Data Science: Don&#39;t filter data prematurely</title>
      <link>https://www.markhneedham.com/blog/2013/02/17/data-science-dont-filter-data-prematurely/</link>
      <pubDate>Sun, 17 Feb 2013 20:02:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/17/data-science-dont-filter-data-prematurely/</guid>
      <description>Last year I wrote a post describing how I’d gone about getting data for my ThoughtWorks graph and in retrospect one mistake in my approach is that I filtered the data too early.
My workflow looked like this:
Scrape internal application using web driver and save useful data to JSON files
Parse JSON files and load nodes/relationships into neo4j
The problem with the first step is that I was trying to determine up front what data was useful and as a result I ended up running the scraping application multiple times when I realised I didn’t have all the data I wanted.</description>
    </item>
    
    <item>
      <title>Regular Expressions: Non greedy matching</title>
      <link>https://www.markhneedham.com/blog/2013/02/16/regular-expressions-non-greedy-matching/</link>
      <pubDate>Sat, 16 Feb 2013 12:17:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/16/regular-expressions-non-greedy-matching/</guid>
      <description>I was playing around with some football data earlier in the week and I wanted to try and extract just the name &amp;#39;Rooney&amp;#39; from the following bit of text:
Rooney 8′, 27′ My initial regular expression was the following which annoyingly captures the time of the first goal:
&amp;gt; &amp;#34;Rooney 8′, 27′&amp;#34;.match(/(.*)\s\d(.*)/)[1] =&amp;gt; &amp;#34;Rooney 8′,&amp;#34; It works fine if the player has only scored one goal…
&amp;gt; &amp;#34;Rooney 8′&amp;#34;.match(/(.*)\s\d(.*)/)[1] =&amp;gt; &amp;#34;Rooney&amp;#34; .</description>
    </item>
    
    <item>
      <title>Onboarding: Sketch the landscape</title>
      <link>https://www.markhneedham.com/blog/2013/02/15/onboarding-sketch-the-landscape/</link>
      <pubDate>Fri, 15 Feb 2013 07:36:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/15/onboarding-sketch-the-landscape/</guid>
      <description>For four months during 2012 I was working on the GDS infrastructure team and one of the first tasks that Gareth suggested I do was update a diagram showing how all the different applications and databases worked together.
I thought this was quite a strange thing to ask the &amp;#39;new guy&amp;#39; to do since I obviously knew nothing at all about how anything worked but he told me that was partly why he wanted me to do it.</description>
    </item>
    
    <item>
      <title>Feature Extraction/Selection - What I&#39;ve learnt so far</title>
      <link>https://www.markhneedham.com/blog/2013/02/10/feature-extractionselection-what-ive-learnt-so-far/</link>
      <pubDate>Sun, 10 Feb 2013 15:42:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/10/feature-extractionselection-what-ive-learnt-so-far/</guid>
      <description>A couple of weeks ago I wrote about some feature extraction work that I’d done on the Kaggle Digit Recognizer data set and having realised that I had no idea what I was doing I thought I should probably learn a bit more.
I came across Dunja Mladenic’s &amp;#39;Dimensionality Reduction by Feature Selection in Machine Learning&amp;#39; presentation in which she sweeps across the landscape of feature selection and explains how everything fits together.</description>
    </item>
    
    <item>
      <title>R: Building up a data frame row by row</title>
      <link>https://www.markhneedham.com/blog/2013/02/10/r-building-up-a-data-frame-row-by-row/</link>
      <pubDate>Sun, 10 Feb 2013 13:29:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/10/r-building-up-a-data-frame-row-by-row/</guid>
      <description>Jen and I recently started working on the Kaggle Titanic problem and we thought it’d probably be useful to start with some exploratory data analysis to get a feel for the data set.
For this problem you are given a selection of different features describing the passengers on board the Titanic and you have to predict whether they survived or died based on those features.</description>
    </item>
    
    <item>
      <title>R: Modelling a conversion rate with a binomial distribution</title>
      <link>https://www.markhneedham.com/blog/2013/02/07/r-modelling-a-conversion-rate-with-a-binomial-distribution/</link>
      <pubDate>Thu, 07 Feb 2013 01:26:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/07/r-modelling-a-conversion-rate-with-a-binomial-distribution/</guid>
      <description>As part of some work Sid and I were doing last week we wanted to simulate the conversion rate for an A/B testing we were planning.
We started with the following function which returns the simulated conversion rate for a given conversion rate of 12%:
generateConversionRates &amp;lt;- function(sampleSize) { sample_a &amp;lt;- rbinom(seq(0, sampleSize), 1, 0.12) conversion_a &amp;lt;- length(sample_a[sample_a == 1]) / sampleSize sample_b &amp;lt;- rbinom(seq(0, sampleSize), 1, 0.12) conversion_b &amp;lt;- length(sample_b[sample_b == 1]) / sampleSize c(conversion_a, conversion_b) } If we call it:</description>
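The R function in the excerpt is flattened; the same simulation can be sketched in Python with numpy (an illustrative translation, not the post's code — the function name and seed are my own):

```python
import numpy as np

def generate_conversion_rates(sample_size, rate=0.12, seed=42):
    # Draw one Bernoulli trial per visitor for two independent groups,
    # then report each group's observed conversion rate.
    rng = np.random.default_rng(seed)
    sample_a = rng.binomial(1, rate, sample_size)
    sample_b = rng.binomial(1, rate, sample_size)
    return sample_a.mean(), sample_b.mean()
```

For a large sample both simulated rates land close to the true 12% rate.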
    </item>
    
    <item>
      <title>R: Mapping over a list of lists</title>
      <link>https://www.markhneedham.com/blog/2013/02/03/r-mapping-over-a-list-of-lists/</link>
      <pubDate>Sun, 03 Feb 2013 10:40:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/02/03/r-mapping-over-a-list-of-lists/</guid>
      <description>As part of the coursera Data Analysis course I had the following code to download and then read in a file:
&amp;gt; file &amp;lt;- &amp;#34;https://dl.dropbox.com/u/7710864/data/csv_hid/ss06hid.csv&amp;#34; &amp;gt; download.file(file, destfile=&amp;#34;americancommunity.csv&amp;#34;, method=&amp;#34;curl&amp;#34;) &amp;gt; acomm &amp;lt;- read.csv(&amp;#34;americancommunity.csv&amp;#34;) We then had to filter the data based on the values in a couple of columns and work out how many rows were returned in each case:
&amp;gt; one &amp;lt;- acomm[acomm$RMS == 4 &amp;amp; !is.na(acomm$RMS) &amp;amp; acomm$BDS == 3 &amp;amp; !</description>
    </item>
    
    <item>
      <title>Kaggle Digit Recognizer: A feature extraction #fail</title>
      <link>https://www.markhneedham.com/blog/2013/01/31/kaggle-digit-recognizer-a-feature-extraction-fail/</link>
      <pubDate>Thu, 31 Jan 2013 23:24:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/31/kaggle-digit-recognizer-a-feature-extraction-fail/</guid>
      <description>I’ve written a few blog posts about our attempts at the Kaggle Digit Recogniser problem and one thing we haven’t yet tried is feature extraction.
Feature extraction in this context means that we’d generate some other features to train a classifier with rather than relying on just the pixel values we were provided.
Every week Jen would try and persuade me that we should try it out but it wasn’t until I was flicking through the notes from the Columbia Data Science class that it struck home:</description>
    </item>
    
    <item>
      <title>Levels of automation</title>
      <link>https://www.markhneedham.com/blog/2013/01/31/levels-of-automation/</link>
      <pubDate>Thu, 31 Jan 2013 22:36:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/31/levels-of-automation/</guid>
      <description>Over the last 18 months or so I’ve worked on a variety of different projects in different organisations and seen some patterns around the way that automation was done which I thought would be interesting to document.
The approaches tend to fall into roughly three categories:
Predominantly Manual This tends to be less frequent these days as most developers have at some stage flicked through The Pragmatic Programmer and been persuaded that automating away boring and repetitive tasks is probably a good idea.</description>
    </item>
    
    <item>
      <title>Ruby: invalid multibyte char (US-ASCII)</title>
      <link>https://www.markhneedham.com/blog/2013/01/27/ruby-invalid-multibyte-char-us-ascii/</link>
      <pubDate>Sun, 27 Jan 2013 15:14:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/27/ruby-invalid-multibyte-char-us-ascii/</guid>
      <description>I’ve used Ruby on and off for the last few years but somehow had never come across the following error which we got last week while attempting to print out a currency value:
blah.ruby
amount = &amp;#34;£10.00&amp;#34; puts amount $ ruby blah.ruby blah.ruby:1: invalid multibyte char (US-ASCII) blah.ruby:1: invalid multibyte char (US-ASCII) Luckily my pair Jae had come across this before and showed me a blog post which explains what’s going on and how to sort it out.</description>
    </item>
    
    <item>
      <title>A/B Testing: Thoughts so far</title>
      <link>https://www.markhneedham.com/blog/2013/01/27/ab-testing-thoughts-so-far/</link>
      <pubDate>Sun, 27 Jan 2013 13:27:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/27/ab-testing-thoughts-so-far/</guid>
      <description>I’ve been working at uSwitch for about two months now and for the majority of that time have been working on an A/B test we were running to try and make it easier for users to go through the energy comparison process.
I found the &amp;#39;Practical Guide to Controlled Experiments on the Web&amp;#39; paper useful for explaining how to go about doing an A/B test and there’s also an interesting presentation by Dan McKinley about how Etsy do A/B testing.</description>
    </item>
    
    <item>
      <title>Python: (Conceptually) removing an item from a tuple</title>
      <link>https://www.markhneedham.com/blog/2013/01/27/python-conceptually-removing-an-item-from-a-tuple/</link>
      <pubDate>Sun, 27 Jan 2013 02:30:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/27/python-conceptually-removing-an-item-from-a-tuple/</guid>
      <description>As part of some code I’ve been playing around I wanted to remove an item from a tuple which wasn’t particularly easy because Python’s tuple data structure is immutable.
I therefore needed to create a new tuple excluding the value which I wanted to remove.
I ended up writing the following function to do this but I imagine there might be an easier way because it’s quite verbose:
def tuple_without(original_tuple, element_to_remove): new_tuple = [] for s in list(original_tuple): if not s == element_to_remove: new_tuple.</description>
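The function in the excerpt is cut off; a complete version of the same idea might look like this (a sketch reconstructing the truncated code, using a generator expression rather than an explicit loop):

```python
def tuple_without(original_tuple, element_to_remove):
    # Tuples are immutable, so build a brand new tuple containing
    # every element except the one being removed.
    return tuple(s for s in original_tuple if s != element_to_remove)
```

Note that this removes every occurrence of the element, not just the first.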
    </item>
    
    <item>
      <title>Python/numpy: Selecting values by multiple indices</title>
      <link>https://www.markhneedham.com/blog/2013/01/27/pythonnumpy-selecting-values-by-multiple-indices/</link>
      <pubDate>Sun, 27 Jan 2013 02:21:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/27/pythonnumpy-selecting-values-by-multiple-indices/</guid>
      <description>As I mentioned in my previous post I’ve been playing around with numpy and I wanted to get the values of a collection of different indices in a 2D array.
If we had a 2D array that looked like this:
&amp;gt;&amp;gt;&amp;gt; x = arange(20).reshape(4,5) &amp;gt;&amp;gt;&amp;gt; x array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]) I knew that it was possible to retrieve the first 3 rows by using the following code:</description>
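Recreating that array, one way to pick values at several (row, column) positions at once is numpy's integer ("fancy") indexing — a sketch of the approach the post is heading towards:

```python
import numpy as np

# The 4x5 array from the post.
x = np.arange(20).reshape(4, 5)

# Slicing retrieves the first 3 rows.
first_three_rows = x[0:3]

# Integer indexing pairs up row and column indices element-wise,
# so this picks x[0, 1] and x[2, 3].
values = x[np.array([0, 2]), np.array([1, 3])]
```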
    </item>
    
    <item>
      <title>Python/numpy: Selecting specific column in 2D array</title>
      <link>https://www.markhneedham.com/blog/2013/01/27/pythonnumpy-selecting-specific-column-in-2d-array/</link>
      <pubDate>Sun, 27 Jan 2013 02:10:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/27/pythonnumpy-selecting-specific-column-in-2d-array/</guid>
      <description>I’ve been playing around with numpy this evening in an attempt to improve the performance of a Travelling Salesman Problem implementation and I wanted to get every value in a specific column of a 2D array.
The array looked something like this:
&amp;gt;&amp;gt;&amp;gt; x = arange(20).reshape(4,5) &amp;gt;&amp;gt;&amp;gt; x array([[ 0, 1, 2, 3, 4], [ 5, 6, 7, 8, 9], [10, 11, 12, 13, 14], [15, 16, 17, 18, 19]]) I wanted to get the values for the 2nd column of each row which would return an array containing 1, 6, 11 and 16.</description>
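One way to get that column is a slice over all rows with a fixed column index — a minimal sketch:

```python
import numpy as np

x = np.arange(20).reshape(4, 5)

# ':' selects every row; 1 fixes the second column.
second_column = x[:, 1]
```

This gives the expected values 1, 6, 11 and 16.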
    </item>
    
    <item>
      <title>R: Ordering rows in a data frame by multiple columns</title>
      <link>https://www.markhneedham.com/blog/2013/01/23/r-ordering-rows-in-a-data-frame-by-multiple-columns/</link>
      <pubDate>Wed, 23 Jan 2013 23:09:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/23/r-ordering-rows-in-a-data-frame-by-multiple-columns/</guid>
      <description>In one of the assignments of Computing for Data Analysis we needed to sort a data frame based on the values in two of the columns and then return the top value.
The initial data frame looked a bit like this:
&amp;gt; names &amp;lt;- c(&amp;#34;paul&amp;#34;, &amp;#34;mark&amp;#34;, &amp;#34;dave&amp;#34;, &amp;#34;will&amp;#34;, &amp;#34;john&amp;#34;) &amp;gt; values &amp;lt;- c(1,4,1,2,1) &amp;gt; smallData &amp;lt;- data.frame(name = names, value = values) &amp;gt; smallData name value 1 paul 1 2 mark 4 3 dave 1 4 will 2 5 john 1 I want to be able to sort the data frame by value and name both in ascending order so the final result should look like this:</description>
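For comparison, the same two-column ordering can be sketched in plain Python with a tuple sort key (not the post's R code):

```python
names = ["paul", "mark", "dave", "will", "john"]
values = [1, 4, 1, 2, 1]

# Sort by value first, then by name, both ascending.
ordered = sorted(zip(names, values), key=lambda row: (row[1], row[0]))
```

The top row after sorting is ("dave", 1).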
    </item>
    
    <item>
      <title>R: Filter a data frame based on values in two columns</title>
      <link>https://www.markhneedham.com/blog/2013/01/23/r-filter-a-data-frame-based-on-values-in-two-columns/</link>
      <pubDate>Wed, 23 Jan 2013 22:34:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/23/r-filter-a-data-frame-based-on-values-in-two-columns/</guid>
      <description>In the most recent assignment of the Computing for Data Analysis course we had to filter a data frame which contained N/A values in two columns to only return rows which had no N/A’s.
I started with a data frame that looked like this:
&amp;gt; data &amp;lt;- read.csv(&amp;#34;specdata/002.csv&amp;#34;) &amp;gt; # we&amp;#39;ll just use a few rows to make it easier to see what&amp;#39;s going on &amp;gt; data[2494:2500,] Date sulfate nitrate ID 2494 2007-10-30 3.</description>
    </item>
    
    <item>
      <title>Bellman-Ford algorithm in Python using vectorisation/numpy</title>
      <link>https://www.markhneedham.com/blog/2013/01/20/bellman-ford-algorithm-in-python-using-vectorisationnumpy/</link>
      <pubDate>Sun, 20 Jan 2013 19:14:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/20/bellman-ford-algorithm-in-python-using-vectorisationnumpy/</guid>
      <description>I recently wrote about an implementation of the Bellman Ford shortest path algorithm and concluded by saying that it took 27 seconds to calculate the shortest path in the graph for any node.
This seemed a bit slow and while browsing the Coursera forums I came across a suggestion that the algorithm would run much more quickly if we used vectorisation with numpy rather than nested for loops.
Vectorisation refers to a problem solving approach where we make use of matrix operations, which is what numpy allows us to do.</description>
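A minimal sketch of the idea, assuming a dense (V, V) weight matrix with np.inf for missing edges (this representation and the function name are my assumptions; the post's actual code may differ):

```python
import numpy as np

def bellman_ford_vectorised(weights, source):
    # weights[u, v] is the cost of edge u -> v, np.inf where absent.
    n = weights.shape[0]
    dist = np.full(n, np.inf)
    dist[source] = 0.0
    for _ in range(n - 1):
        # Relax every edge in one matrix operation: the candidate cost
        # to each v is the minimum over u of dist[u] + weights[u, v].
        dist = np.minimum(dist, (dist[:, None] + weights).min(axis=0))
    return dist
```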
    </item>
    
    <item>
      <title>telnet/netcat: Waiting for a port to be open</title>
      <link>https://www.markhneedham.com/blog/2013/01/20/waiting-for-a-port-to-be-open/</link>
      <pubDate>Sun, 20 Jan 2013 15:53:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/20/waiting-for-a-port-to-be-open/</guid>
      <description>On Friday Nathan and I were setting up a new virtual machine and we needed a firewall rule to be created to allow us to connect to another machine which had some JAR files we wanted to download.
We wanted to know when it had been done by one of our operations team and I initially thought we might be able to do that using telnet:
$ telnet 10.0.0.1 8081 Trying 10.</description>
    </item>
    
    <item>
      <title>Bellman-Ford algorithm in Python</title>
      <link>https://www.markhneedham.com/blog/2013/01/18/bellman-ford-algorithm-in-python/</link>
      <pubDate>Fri, 18 Jan 2013 00:40:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/18/bellman-ford-algorithm-in-python/</guid>
      <description>The latest problem of the Algorithms 2 class required us to write an algorithm to calculate the shortest path between two nodes on a graph and one algorithm which allows us to do this is Bellman-Ford.
Bellman-Ford computes the single source shortest path, which means that if we have a 5 vertex graph we’d need to run it 5 times to find the shortest paths from each vertex and then take the shortest of those shortest paths.</description>
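A minimal edge-list sketch of the algorithm (the edge-list representation and names here are my own, not the post's):

```python
def bellman_ford(num_vertices, edges, source):
    # edges is a list of (u, v, weight) tuples.
    INF = float("inf")
    dist = [INF] * num_vertices
    dist[source] = 0
    # Each of the num_vertices - 1 passes relaxes every edge once.
    for _ in range(num_vertices - 1):
        for u, v, w in edges:
            if dist[u] + w < dist[v]:
                dist[v] = dist[u] + w
    # A further improvement on an extra pass signals a negative cycle.
    for u, v, w in edges:
        if dist[u] + w < dist[v]:
            raise ValueError("negative-weight cycle detected")
    return dist
```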
    </item>
    
    <item>
      <title>Fabric/Boto: boto.exception.NoAuthHandlerFound: No handler was ready to authenticate. 1 handlers were checked. [&#39;QuerySignatureV2AuthHandler&#39;] Check your credentials</title>
      <link>https://www.markhneedham.com/blog/2013/01/15/fabricboto-boto-exception-noauthhandlerfound-no-handler-was-ready-to-authenticate-1-handlers-were-checked-querysignaturev2authhandler-check-your-credentials/</link>
      <pubDate>Tue, 15 Jan 2013 00:37:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/15/fabricboto-boto-exception-noauthhandlerfound-no-handler-was-ready-to-authenticate-1-handlers-were-checked-querysignaturev2authhandler-check-your-credentials/</guid>
      <description>In our Fabric code we make use of Boto to connect to the EC2 API and pull back various bits of information and the first time anyone tries to use it they end up with the following stack trace:
File &amp;#34;/Library/Python/2.7/site-packages/fabric/main.py&amp;#34;, line 717, in main *args, **kwargs File &amp;#34;/Library/Python/2.7/site-packages/fabric/tasks.py&amp;#34;, line 332, in execute results[&amp;#39;&amp;lt;local-only&amp;gt;&amp;#39;] = task.run(*args, **new_kwargs) File &amp;#34;/Library/Python/2.7/site-packages/fabric/tasks.py&amp;#34;, line 112, in run return self.wrapped(*args, **kwargs) File &amp;#34;/Users/mark/projects/forward-puppet/ec2.py&amp;#34;, line 131, in running instances = instances_by_zones(running_instances(region, role_name)) File &amp;#34;/Users/mark/projects/forward-puppet/ec2.</description>
    </item>
    
    <item>
      <title>Fabric: Tailing log files on multiple machines</title>
      <link>https://www.markhneedham.com/blog/2013/01/15/fabric-tailing-log-files-on-multiple-machines/</link>
      <pubDate>Tue, 15 Jan 2013 00:20:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/15/fabric-tailing-log-files-on-multiple-machines/</guid>
      <description>We wanted to tail one of the log files simultaneously on 12 servers this afternoon to try and see if a particular event was being logged and rather than opening 12 SSH sessions decided to get Fabric to help us out.
My initial attempt to do this was the following:
fab -H host1,host2,host3 -- tail -f /var/www/awesome/current/log/production.log It works but the problem is that by default Fabric runs the specified command one machine after the other so we’ve actually managed to block Fabric with the tail command on &amp;#39;host1&amp;#39;.</description>
    </item>
    
    <item>
      <title>Clojure: Reading and writing a reasonably sized file</title>
      <link>https://www.markhneedham.com/blog/2013/01/11/clojure-reading-and-writing-a-reasonably-sized-file/</link>
      <pubDate>Fri, 11 Jan 2013 00:40:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/11/clojure-reading-and-writing-a-reasonably-sized-file/</guid>
      <description>In a post a couple of days ago I described some code I’d written in R to find out all the features with zero variance in the Kaggle Digit Recognizer data set and yesterday I started working on some code to remove those features.
Jen and I had previously written some code to parse the training data in Clojure so I thought I’d try and adapt that to write out a new file without the unwanted pixels.</description>
    </item>
    
    <item>
      <title>Knapsack Problem in Haskell</title>
      <link>https://www.markhneedham.com/blog/2013/01/09/knapsack-problem-in-haskell/</link>
      <pubDate>Wed, 09 Jan 2013 00:12:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/09/knapsack-problem-in-haskell/</guid>
      <description>I recently described two versions of the Knapsack problem written in Ruby and Python and one common thing is that I used a global cache to store the results of previous calculations.
From my experience of coding in Haskell it’s not considered very idiomatic to write code like that and, although I haven’t actually tried it, it’s potentially more tricky to achieve.
I thought it’d be interesting to try and write the algorithm in Haskell with that constraint in mind and my first version looked like this:</description>
    </item>
    
    <item>
      <title>Kaggle Digit Recognizer: Finding pixels with no variance using R</title>
      <link>https://www.markhneedham.com/blog/2013/01/08/kaggle-digit-recognizer-finding-pixels-with-no-variance-using-r/</link>
      <pubDate>Tue, 08 Jan 2013 00:48:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/08/kaggle-digit-recognizer-finding-pixels-with-no-variance-using-r/</guid>
      <description>I’ve written previously about our attempts at the Kaggle Digit Recogniser problem and our approach so far has been to use the data provided and plug it into different algorithms and see what we end up with.
From browsing through the forums we saw others mentioning feature extraction - an approach where we transform the data into another format, the thinking being that we can train a better classifier with better data.</description>
    </item>
    
    <item>
      <title>Knapsack Problem: Python vs Ruby</title>
      <link>https://www.markhneedham.com/blog/2013/01/07/knapsack-problem-python-vs-ruby/</link>
      <pubDate>Mon, 07 Jan 2013 00:47:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/07/knapsack-problem-python-vs-ruby/</guid>
      <description>The latest algorithm that we had to code in Algorithms 2 was the Knapsack problem which is as follows:
The knapsack problem or rucksack problem is a problem in combinatorial optimization: Given a set of items, each with a weight and a value, determine the number of each item to include in a collection so that the total weight is less than or equal to a given limit and the total value is as large as possible.</description>
    </item>
    
    <item>
      <title>A new year&#39;s idea: Share what you learn</title>
      <link>https://www.markhneedham.com/blog/2013/01/05/a-new-years-idea-share-what-you-learn/</link>
      <pubDate>Sat, 05 Jan 2013 00:25:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/05/a-new-years-idea-share-what-you-learn/</guid>
      <description>Apologies in advance for how meta this post is.
About 4 1/2 years ago Jay Fields wrote a blog post where he encouraged people to write, present and contribute and outlined the advantages he’d seen in his career from doing so.
In hindsight the bit which stood out the most for me was the following paragraph:
Don’t know what to write about? The answers are all around you. Anything you do that’s interesting, there’s 100 people searching Google for how to do it.</description>
    </item>
    
    <item>
      <title>Haskell: Reading files</title>
      <link>https://www.markhneedham.com/blog/2013/01/02/haskell-reading-files/</link>
      <pubDate>Wed, 02 Jan 2013 00:16:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2013/01/02/haskell-reading-files/</guid>
      <description>In writing the clustering algorithm which I’ve mentioned way too many times already I needed to process a text file which contained all the points and my initial approach looked like this:
import System.IO main = do withFile &amp;#34;clustering2.txt&amp;#34; ReadMode (\handle -&amp;gt; do contents &amp;lt;- hGetContents handle putStrLn contents) It felt a bit clunky but I didn’t realise there was an easier way until I came across this thread. We can simplify reading a file to the following by using the http://zvon.</description>
    </item>
    
    <item>
      <title>TextMate Bundles location on Mountain Lion</title>
      <link>https://www.markhneedham.com/blog/2012/12/31/textmate-bundles-location-on-mountain-lion/</link>
      <pubDate>Mon, 31 Dec 2012 23:59:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/31/textmate-bundles-location-on-mountain-lion/</guid>
      <description>Something that I’ve noticed when trying to install various different bundles is that the installation instructions which worked flawlessly on Snow Leopard don’t seem to do the job on Mountain Lion.
For example, the Clojure bundle assumes that the installation directory is &amp;#39;~/Library/Application\ Support/TextMate/Bundles&amp;#39; but for some reason the &amp;#39;Bundles&amp;#39; folder doesn’t exist.
We therefore have two choices:
mkdir -p ~/Library/Application\ Support/TextMate/Bundles and then continue as normal
Install our bundle into &amp;#39;/Applications/TextMate.</description>
    </item>
    
    <item>
      <title>Haskell: Downloading the core library source code</title>
      <link>https://www.markhneedham.com/blog/2012/12/31/haskell-downloading-the-core-library-source-code/</link>
      <pubDate>Mon, 31 Dec 2012 22:39:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/31/haskell-downloading-the-core-library-source-code/</guid>
      <description>I’ve started playing around with Haskell again and since I’m doing so on a new machine I don’t have a copy of the language source code.
I wanted to rectify that situation but my Google fu was weak and it took me way too long to figure out how to get it so I thought I’d better document it for future me.
The easiest way is to clone the copy of the GHC repository on github: git clone https://github.</description>
    </item>
    
    <item>
      <title>Haskell: Strictness and the monadic bind</title>
      <link>https://www.markhneedham.com/blog/2012/12/31/haskell-strictness-and-the-monadic-bind/</link>
      <pubDate>Mon, 31 Dec 2012 22:27:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/31/haskell-strictness-and-the-monadic-bind/</guid>
      <description>As I mentioned towards the end of my post about implementing the union find data structure in Haskell I wrote another version using a mutable array and having not seen much of a performance improvement started commenting out code to try and find the problem.
I eventually narrowed it down to the union function which was defined like so:
union :: IO (IOArray Int Int) -&amp;gt; Int -&amp;gt; Int -&amp;gt; IO (IOArray Int Int) union arrayContainer x y = do actualArray &amp;lt;- arrayContainer ls &amp;lt;- getAssocs actualArray leader1 &amp;lt;- readArray actualArray x leader2 &amp;lt;- readArray actualArray y let newValues = (map (\(index, value) -&amp;gt; (index, leader1)) .</description>
    </item>
    
    <item>
      <title>Haskell: An impressively non performant union find</title>
      <link>https://www.markhneedham.com/blog/2012/12/31/haskell-an-impressively-non-performant-union-find/</link>
      <pubDate>Mon, 31 Dec 2012 20:44:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/31/haskell-an-impressively-non-performant-union-find/</guid>
      <description>I’ve spent the best part of the last day debugging a clustering algorithm I wrote as part of the Algorithms 2 course, eventually coming to the conclusion that the union find data structure I was using wasn’t working as expected.
In our algorithm we’re trying to group together points which are &amp;#39;close&amp;#39; to each other and the data structure is particularly useful for doing that.
To paraphrase from my previous post about how we use the union find data structure:</description>
    </item>
    
    <item>
      <title>Bitwise operations in Ruby and Haskell</title>
      <link>https://www.markhneedham.com/blog/2012/12/31/bitwise-operations-in-ruby-and-haskell/</link>
      <pubDate>Mon, 31 Dec 2012 13:14:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/31/bitwise-operations-in-ruby-and-haskell/</guid>
      <description>Part of one of the most recent problems in the Algorithms 2 course required us to find the &amp;#39;neighbours&amp;#39; of binary values.
In this case a neighbour is described as being any other binary value which has an equivalent value or differs in 1 or 2 bits.
e.g. the neighbours of &amp;#39;10000&amp;#39; would be &amp;#39;00000&amp;#39;, &amp;#39;00001&amp;#39;, &amp;#39;00010&amp;#39;, &amp;#39;00100&amp;#39;, &amp;#39;01000&amp;#39;, &amp;#39;10001&amp;#39;, &amp;#39;10010&amp;#39;, &amp;#39;10011&amp;#39;, &amp;#39;10100&amp;#39;, &amp;#39;10101&amp;#39;, &amp;#39;10110&amp;#39;, &amp;#39;11000&amp;#39;, &amp;#39;11001&amp;#39;, &amp;#39;11010&amp;#39; and &amp;#39;11100&amp;#39;.</description>
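Enumerating those neighbours can be sketched by XOR-ing the value with every mask that has at most two bits set (a sketch; the function name is my own):

```python
from itertools import combinations

def neighbours(value, num_bits):
    # Flip every choice of 0, 1 or 2 bit positions with XOR;
    # k = 0 keeps the equivalent value itself.
    result = set()
    for k in (0, 1, 2):
        for positions in combinations(range(num_bits), k):
            mask = 0
            for p in positions:
                mask |= 1 << p
            result.add(value ^ mask)
    return result
```

For a 5-bit value this yields 1 + 5 + 10 = 16 results: the value itself plus its 15 neighbours.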
    </item>
    
    <item>
      <title>Gamification and Software: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2012/12/31/gamification-and-software-some-thoughts/</link>
      <pubDate>Mon, 31 Dec 2012 10:57:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/31/gamification-and-software-some-thoughts/</guid>
      <description>On the recommendation of J.B. Rainsberger I’ve been reading &amp;#39;Reality is Broken&amp;#39; - a book which talks about how we can apply some of the things games designers have learned about getting people engaged to real life.
The author, Jane McGonigal, also has a TED talk on the topic which will help you get a flavour for the topic.
I was particularly interested in trying to see how her ideas could be applied in a software context and indeed how they are already being applied.</description>
    </item>
    
    <item>
      <title>Haskell: Using qualified imports to avoid polluting the namespace</title>
      <link>https://www.markhneedham.com/blog/2012/12/30/haskell-using-qualified-imports-to-avoid-polluting-the-namespace/</link>
      <pubDate>Sun, 30 Dec 2012 23:16:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/30/haskell-using-qualified-imports-to-avoid-polluting-the-namespace/</guid>
      <description>In most of the Haskell code I’ve read any functions from other modules have been imported directly into the namespace and I reached the stage where I had this list of imports in a file:
import System.IO import Data.List.Split import Data.Char import Data.Bits import Control.Monad import Data.Map import Data.Set import Data.List import Data.Maybe This becomes a problem when you want to use a function which is defined in multiple modules such as filter:</description>
    </item>
    
    <item>
      <title>Haskell: Pattern matching a list</title>
      <link>https://www.markhneedham.com/blog/2012/12/30/haskell-pattern-matching-a-list/</link>
      <pubDate>Sun, 30 Dec 2012 22:39:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/30/haskell-pattern-matching-a-list/</guid>
      <description>As I mentioned in a post yesterday I’ve been converting a clustering algorithm into Haskell and I wanted to get the value from doing a bit wise or on two values in a list.
I forgot it was possible to pattern match on lists until I came across a post I wrote about 8 months ago where I’d done this, so my initial code looked like this:
&amp;gt; import Data.</description>
    </item>
    
    <item>
      <title>Haskell: A cleaner way of initialising a map</title>
      <link>https://www.markhneedham.com/blog/2012/12/29/haskell-a-cleaner-way-of-initialising-a-map/</link>
      <pubDate>Sat, 29 Dec 2012 20:14:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/29/haskell-a-cleaner-way-of-initialising-a-map/</guid>
      <description>I recently wrote a blog post showing a way of initialising a Haskell map and towards the end of the post I realised how convoluted my approach was and wondered if there was an easier way and indeed there is!
To recap, this is the code I ended up with to populate a map with binary based values as the keys and node ids as the values:
import Data.Map toMap :: [Int] -&amp;gt; Map Int [Int] toMap nodes = fromList $ map asMapEntry $ (groupIgnoringIndex .</description>
    </item>
    
    <item>
      <title>Haskell: Initialising a map</title>
      <link>https://www.markhneedham.com/blog/2012/12/29/haskell-initialising-a-map/</link>
      <pubDate>Sat, 29 Dec 2012 19:27:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/29/haskell-initialising-a-map/</guid>
      <description>I’ve been converting a variation of Kruskal’s algorithm from Ruby into Haskell and one thing I needed to do was create a map of binary based values to node ids.
In Ruby I wrote the following code to do this:
nodes = [1,2,5,7,2,4] @magical_hash = {} nodes.each_with_index do |node, index| @magical_hash[node] ||= [] @magical_hash[node] &amp;lt;&amp;lt; index end =&amp;gt; {1=&amp;gt;[0], 2=&amp;gt;[1, 4], 5=&amp;gt;[2], 7=&amp;gt;[3], 4=&amp;gt;[5]} From looking at the documentation it seemed like the easiest way to do this in Haskell would be to convert the nodes into an appropriate list and then call the http://www.</description>
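The Ruby grouping above has a compact Python counterpart using defaultdict, shown here just to illustrate the target shape (not the post's Haskell solution):

```python
from collections import defaultdict

nodes = [1, 2, 5, 7, 2, 4]

# Map each node value to the list of positions where it occurs,
# mirroring the Ruby hash in the post.
by_value = defaultdict(list)
for index, node in enumerate(nodes):
    by_value[node].append(index)
```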
    </item>
    
    <item>
      <title>Sed: Replacing characters with a new line</title>
      <link>https://www.markhneedham.com/blog/2012/12/29/sed-replacing-characters-with-a-new-line/</link>
      <pubDate>Sat, 29 Dec 2012 17:49:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/29/sed-replacing-characters-with-a-new-line/</guid>
      <description>I’ve been playing around with writing some algorithms in both Ruby and Haskell and the latter wasn’t giving the correct result so I wanted to output an intermediate state of the two programs and compare them.
I didn’t do any fancy formatting of the output from either program so I had the raw data structures in text files which I needed to transform so that they were comparable.
The main thing I wanted to do was get each of the elements of the collection onto its own line.</description>
    </item>
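As a rough illustration of the transformation that post describes (the post itself uses sed), the same reshaping can be done in Ruby by replacing the element separator with a newline; the dumped collection and its separator here are assumptions, not the post's data:

```ruby
# Put each element of a dumped collection on its own line by
# replacing the "], [" separator with "]\n[".
raw = "[1, 10], [2, 20], [3, 30]"
lines = raw.gsub("], [", "]\n[")
puts lines
```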
    
    <item>
      <title>Restricting your own learning</title>
      <link>https://www.markhneedham.com/blog/2012/12/27/restricting-your-own-learning/</link>
      <pubDate>Thu, 27 Dec 2012 00:45:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/27/restricting-your-own-learning/</guid>
      <description>For the first few years that I worked professionally* every project that I worked on was different enough to the previous ones that I was always learning something new without having to put much effort in.
After a while this became less the case because I’d seen more things and if I saw something even remotely similar I would abstract it away as something that I’d done before.
A couple of months ago Martin Fowler wrote a blog post about priming and how research has shown that exposure to a stimulus influences a response to a later stimulus.</description>
    </item>
    
    <item>
      <title>Mahout: Parallelising the creation of DecisionTrees</title>
      <link>https://www.markhneedham.com/blog/2012/12/27/mahout-parallelising-the-creation-of-decisiontrees/</link>
      <pubDate>Thu, 27 Dec 2012 00:08:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/27/mahout-parallelising-the-creation-of-decisiontrees/</guid>
      <description>A couple of months ago I wrote a blog post describing our use of Mahout random forests for the Kaggle Digit Recogniser Problem and after seeing how long it took to create forests with 500+ trees I wanted to see if this could be sped up by parallelising the process.
From looking at the https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/classifier/df/DecisionForest.java it seemed like it should be possible to create lots of small forests and then combine them together.</description>
    </item>
    
    <item>
      <title>The Tracer Bullet Approach: An example</title>
      <link>https://www.markhneedham.com/blog/2012/12/24/the-tracer-bullet-approach-an-example/</link>
      <pubDate>Mon, 24 Dec 2012 09:09:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/24/the-tracer-bullet-approach-an-example/</guid>
      <description>A few weeks ago my former colleague Kief Morris wrote a blog post describing the tracer bullet approach he’s used to setup a continuous delivery pipeline on his current project.
The idea is to get the simplest implementation of a pipeline in place, prioritizing a fully working skeleton that stretches across the full path to production over fully featured, final-design functionality for each stage of the pipeline.
Kief goes on to explain in detail how we can go about executing this and it reminded me of a project I worked on almost 3 years ago where we took a similar approach.</description>
    </item>
    
    <item>
      <title>Kruskal&#39;s Algorithm using union find in Ruby</title>
      <link>https://www.markhneedham.com/blog/2012/12/23/kruskals-algorithm-using-union-find-in-ruby/</link>
      <pubDate>Sun, 23 Dec 2012 21:43:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/23/kruskals-algorithm-using-union-find-in-ruby/</guid>
      <description>I recently wrote a blog post describing my implementation of Kruskal’s algorithm - a greedy algorithm using to find a minimum spanning tree (MST) of a graph - and while it does the job it’s not particularly quick.
It takes 20 seconds to calculate the MST for a 500 node, ~2000 edge graph.
One way that we can improve the performance of the algorithm is by storing the MST in a union find/http://en.</description>
    </item>
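A minimal sketch of the union find structure that post builds on (path compression only, no union by rank; the class and method names are mine, not the post's):

```ruby
# Minimal union-find: each node points at a parent; find follows the
# chain to the root (compressing the path), union links two roots.
class UnionFind
  def initialize(n)
    @parent = (0...n).to_a
  end

  def find(x)
    @parent[x] = find(@parent[x]) unless @parent[x] == x  # path compression
    @parent[x]
  end

  def union(a, b)
    @parent[find(a)] = find(b)
  end

  def connected?(a, b)
    find(a) == find(b)
  end
end

uf = UnionFind.new(4)
uf.union(0, 1)
uf.union(1, 2)
puts uf.connected?(0, 2)  # true
puts uf.connected?(0, 3)  # false
```

In Kruskal's algorithm this replaces a scan over the partially built MST: an edge is only added when its two endpoints are not already connected.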
    
    <item>
      <title>Kruskal&#39;s Algorithm in Ruby</title>
      <link>https://www.markhneedham.com/blog/2012/12/23/kruskals-algorithm-in-ruby/</link>
      <pubDate>Sun, 23 Dec 2012 14:18:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/23/kruskals-algorithm-in-ruby/</guid>
      <description>Last week I wrote a couple of posts showing different implementations of Prim’s algorithm - an algorithm using to find a minimum spanning tree in a graph - and a similar algorithm is Kruskal’s algorithm.
Kruskal’s algorithm also finds a minimum spanning tree but it goes about it in a slightly different way.
Prim’s algorithm takes an approach whereby we select nodes and then find connecting edges until we’ve covered all the nodes.</description>
    </item>
    
    <item>
      <title>Prim&#39;s algorithm using a heap/priority queue in Ruby</title>
      <link>https://www.markhneedham.com/blog/2012/12/15/prims-algorithm-using-a-heappriority-queue-in-ruby/</link>
      <pubDate>Sat, 15 Dec 2012 16:31:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/15/prims-algorithm-using-a-heappriority-queue-in-ruby/</guid>
      <description>I recently wrote a blog post describing my implementation of Prim’s Algorithm for the Algorithms 2 class and while it comes up with the right answer for the supplied data set it takes almost 30 seconds to do so!
In one of the lectures Tim Roughgarden points out that we’re doing the same calculations multiple times to work out the next smallest edge to include in our minimal spanning tree and could use a heap to speed things up.</description>
    </item>
    
    <item>
      <title>Prim&#39;s Algorithm in Ruby</title>
      <link>https://www.markhneedham.com/blog/2012/12/15/prims-algorithm-in-ruby/</link>
      <pubDate>Sat, 15 Dec 2012 02:51:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/15/prims-algorithm-in-ruby/</guid>
      <description>One of the first programming assignments of the Algorithms 2 course was to code Prim’s algorithm - a greedy algorithm used to find the minimum spanning tree of a connected weighted undirected graph.
In simpler terms we need to find the path of least cost which connects all of the nodes together and there can’t be any cycles in that path.
Wikipedia has a neat diagram which shows this more clearly:</description>
    </item>
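A naive version of the idea in that excerpt, assuming edges are [node_a, node_b, weight] triples (the representation is my assumption, not the post's):

```ruby
# Naive Prim's: grow the tree one cheapest crossing edge at a time.
def prim(nodes, edges)
  in_tree = [nodes.first]
  mst = []
  until in_tree.size == nodes.size
    # cheapest edge with exactly one endpoint already in the tree
    edge = edges.select { |a, b, _| in_tree.include?(a) ^ in_tree.include?(b) }
                .min_by { |_, _, weight| weight }
    a, b, _ = edge
    mst.push(edge)
    in_tree.push(in_tree.include?(a) ? b : a)
  end
  mst
end

edges = [[1, 2, 1], [2, 3, 2], [1, 3, 4], [3, 4, 3]]
mst = prim([1, 2, 3, 4], edges)
puts mst.sum { |_, _, w| w }  # total MST weight: 6
```

Rescanning every edge on each iteration is what makes this slow on larger graphs, which is what the follow-up heap/priority queue post addresses.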
    
    <item>
      <title>Weka: Saving and loading classifiers</title>
      <link>https://www.markhneedham.com/blog/2012/12/12/weka-saving-and-loading-classifiers/</link>
      <pubDate>Wed, 12 Dec 2012 00:04:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/12/weka-saving-and-loading-classifiers/</guid>
      <description>In our continued machine learning travels Jen and I have been building some classifiers using Weka and one thing we wanted to do was save the classifier and then reuse it later.
There is documentation for how to do this from the command line but we’re doing everything programmatically and wanted to be able to save our classifiers from Java code.
As it turns out it’s not too tricky when you know which classes to call and saving a classifier to a file is as simple as this:</description>
    </item>
    
    <item>
      <title>rsyncing to an AWS instance</title>
      <link>https://www.markhneedham.com/blog/2012/12/11/rsyncing-to-an-aws-instance/</link>
      <pubDate>Tue, 11 Dec 2012 23:44:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/11/rsyncing-to-an-aws-instance/</guid>
      <description>I wanted to try running some of the machine learning algorithms that Jen and I have been playing around with on a beefier machine so I thought spinning up an AWS instance would be the best way to do that.
I built the JAR with the appropriate algorithms on my machine and then wanted to copy it up onto an AWS instance.
I could have used scp but I quite like the progress bar that you can get with rsync and since the JAR had somehow drifted to a size of 47MB the progress bar was useful.</description>
    </item>
    
    <item>
      <title>apt-get update: 416 Requested Range Not Satisfiable</title>
      <link>https://www.markhneedham.com/blog/2012/12/10/apt-get-update-416-requested-range-not-satisfiable/</link>
      <pubDate>Mon, 10 Dec 2012 00:39:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/10/apt-get-update-416-requested-range-not-satisfiable/</guid>
      <description>We were trying to run a puppet update on some machines last week and one of the first things it does is run &amp;#39;apt-get update&amp;#39; which was working on all but one node for which it was returning the following exception:
Err http://us-west-1.ec2.archive.ubuntu.com/ubuntu/ i386 Packages 416 Requested Range Not Satisfiable Fetched 5,079B in 2s (2,296B/s) W: Failed to fetch http://us-west-1.ec2.archive.ubuntu.com/ubuntu/dists/maverick-updates/main/binary-i386/Packages.gz 416 Requested Range Not Satisfiable It turns out one way that exception can manifest is if you’ve got a partial copy of the index files from the repository and in this case the solution was as simple as deleting those and trying again:</description>
    </item>
    
    <item>
      <title>Data Science: Discovery work</title>
      <link>https://www.markhneedham.com/blog/2012/12/09/data-science-discovery-work/</link>
      <pubDate>Sun, 09 Dec 2012 10:36:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/09/data-science-discovery-work/</guid>
      <description>Aaron Erickson recently wrote a blog post where he talks through some of the problems he’s seen with big data initiatives where organisations end up buying a product and expecting it to magically produce results.
[…​] corporate IT departments are suddenly looking at their long running “Business Intelligence” initiatives and wondering why they are not seeing the same kinds of return on investment. They are thinking…​ if only we tweaked that “BI” initiative and somehow mix in some “Big Data”, maybe we could become the next Amazon.</description>
    </item>
    
    <item>
      <title>Micro Services: Plugging in 3rd party components</title>
      <link>https://www.markhneedham.com/blog/2012/12/04/micro-services-plugging-in-3rd-party-components/</link>
      <pubDate>Tue, 04 Dec 2012 23:38:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/12/04/micro-services-plugging-in-3rd-party-components/</guid>
      <description>Over the past few weeks I’ve been involved in conversations with different clients around micro services and one thing about this architecture that seems quite popular is the ability to easily plug in 3rd party components.
In one case we were talking through the design of a system which would calculate and then apply price optimisations on products. The parts of the system we were discussing looked roughly like this:</description>
    </item>
    
    <item>
      <title>There&#39;s No such thing as a &#39;DevOps Team&#39;: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2012/11/30/theres-no-such-thing-as-a-devops-team-some-thoughts/</link>
      <pubDate>Fri, 30 Nov 2012 16:56:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/30/theres-no-such-thing-as-a-devops-team-some-thoughts/</guid>
      <description>A few weeks ago Jez Humble wrote a blog post titled &amp;#34;There’s no such thing as a &amp;#39;DevOps team&amp;#39;&amp;#34; where he explains what DevOps is actually supposed to be about and describes a model of how developers and operations folk can work together.
Jez’s suggestion is for developers to take responsibility for the systems they create but he notes that:
[...] they need support from operations to understand how to build reliable software that can be continuous deployed to an unreliable platform that scales horizontally.</description>
    </item>
    
    <item>
      <title>Kaggle Digit Recognizer: Weka AdaBoost attempt</title>
      <link>https://www.markhneedham.com/blog/2012/11/29/kaggle-digit-recognizer-weka-adaboost-attempt/</link>
      <pubDate>Thu, 29 Nov 2012 17:09:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/29/kaggle-digit-recognizer-weka-adaboost-attempt/</guid>
      <description>In our latest attempt at Kaggle’s Digit Recognizer Jen and I decided to try out boosting on our random forest algorithm, an approach that Jen had come across in a talk at the Clojure Conj.
We couldn’t find any documentation that it was possible to apply boosting to Mahout’s random forest algorithm but we knew it was possible with Weka so we decided to use that instead!
As I understand it the way that boosting works in the context of random forests is that each of the trees in the forest will be assigned a weight based on how accurately it’s able to classify the data set and these weights are then used in the voting stage.</description>
    </item>
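As a toy illustration of that weighted-voting idea (the labels and weights here are invented for the example, not taken from Weka):

```ruby
# Weighted vote: each tree votes for a label, and votes count in
# proportion to the tree's weight rather than one vote each.
votes = [["3", 0.9], ["8", 0.4], ["3", 0.2]]
tally = Hash.new(0.0)
votes.each { |label, weight| tally[label] += weight }
winner = tally.max_by { |_label, total| total }.first
puts winner  # "3" wins on total weight even though weights differ per tree
```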
    
    <item>
      <title>Micro Services: The curse of code &#39;duplication&#39;</title>
      <link>https://www.markhneedham.com/blog/2012/11/28/micro-services-the-curse-of-code-duplication/</link>
      <pubDate>Wed, 28 Nov 2012 08:11:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/28/micro-services-the-curse-of-code-duplication/</guid>
      <description>A common approach we’ve been taking on some of the applications I’ve worked on recently is to decompose the system we’re building into smaller micro services which are independently deployable and communicate with each other over HTTP.
An advantage of decomposing systems like that is that we could have separate teams working on each service and then make use of a consumer driven contract as a way of ensuring the contract between them is correct.</description>
    </item>
    
    <item>
      <title>Jersey: com.sun.jersey.api.client.ClientHandlerException: A message body reader for Java class [...] and MIME media type application/json was not found</title>
      <link>https://www.markhneedham.com/blog/2012/11/28/jersey-com-sun-jersey-api-client-clienthandlerexception-a-message-body-reader-for-java-class-and-mime-media-type-applicationjson-was-not-found/</link>
      <pubDate>Wed, 28 Nov 2012 06:03:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/28/jersey-com-sun-jersey-api-client-clienthandlerexception-a-message-body-reader-for-java-class-and-mime-media-type-applicationjson-was-not-found/</guid>
      <description>We’ve used the Jersey library on the last couple of Java based applications that I’ve worked on and one thing we’ve done on both of them is write services that communicate with each other using JSON.
On both occasions we didn’t quite setup the Jersey client correctly and ended up with an error along these lines when making a call to an end point:
com.sun.jersey.api.client.ClientHandlerException: A message body reader for Java class java.</description>
    </item>
    
    <item>
      <title>IntelliJ Debug Mode: Viewing beyond 100 frames/items in an array</title>
      <link>https://www.markhneedham.com/blog/2012/11/26/intellij-debug-mode-viewing-beyond-100-framesitems-in-an-array/</link>
      <pubDate>Mon, 26 Nov 2012 04:28:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/26/intellij-debug-mode-viewing-beyond-100-framesitems-in-an-array/</guid>
      <description>In my continued attempts at the Kaggle Digit Recognizer problem I’ve been playing around with the encog library to try and build a neural networks solution to the problem.
Unfortunately it’s not quite working at the moment so I wanted to debug the code and see whether the input parameters were being correctly translated from the CSV file.
Each input is an array containing 784 values but by default IntelliJ restricts you to seeing 100 elements which wasn’t helpful in my case since the early values tend to all be 0 and it’s not until you get half way through that you see different values:</description>
    </item>
    
    <item>
      <title>A first failed attempt at Natural Language Processing</title>
      <link>https://www.markhneedham.com/blog/2012/11/24/a-first-failed-attempt-at-natural-language-processing/</link>
      <pubDate>Sat, 24 Nov 2012 19:43:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/24/a-first-failed-attempt-at-natural-language-processing/</guid>
      <description>One of the things I find fascinating about dating websites is that the profiles of people are almost identical so I thought it would be an interesting exercise to grab some of the free text that people write about themselves and prove the similarity.
I’d been talking to Matt Biddulph about some Natural Language Processing (NLP) stuff he’d been working on and he wrote up a bunch of libraries, articles and books that he’d found useful.</description>
    </item>
    
    <item>
      <title>Core Competency</title>
      <link>https://www.markhneedham.com/blog/2012/11/24/core-competency/</link>
      <pubDate>Sat, 24 Nov 2012 12:44:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/24/core-competency/</guid>
      <description>For at least the last few years I’ve heard colleagues talk about working out the core competency of our clients businesses and I’d confused myself into thinking that the software we helped them build was the core competency.
I think Martin Fowler best explains how technology and business core competences work in his post about utility and strategic projects where he describes the difference between these like so:
So what is the distinguishing factor between utility and strategic projects?</description>
    </item>
    
    <item>
      <title>Windows line endings: Exception in thread &#39;main&#39; java.io.FileNotFoundException /opt/app/config.yml{caret}M (no such file or directory)</title>
      <link>https://www.markhneedham.com/blog/2012/11/24/windows-line-endings-exception-in-thread-main-java-io-filenotfoundexception-optappconfig-ymlm-no-such-file-or-directory/</link>
      <pubDate>Sat, 24 Nov 2012 09:04:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/24/windows-line-endings-exception-in-thread-main-java-io-filenotfoundexception-optappconfig-ymlm-no-such-file-or-directory/</guid>
      <description>As I mentioned in my previous post we’ve been making it possible to deploy our application to a new environment and as part of this we defined an upstart script which would run the JAR.
We tend to edit code on Windows and then test it out on the vagrant VM afterwards.
The end of our upstart script looked a bit like this:
script cd /opt/app java -jar /opt/app/app.jar /opt/app/config.</description>
    </item>
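The trailing ^M in that error is a Windows carriage return left at the end of the line. As an illustration of the underlying problem (the script content here is a stand-in, not the post's full upstart script), stripping CRLF endings in Ruby looks like:

```ruby
# A CRLF line ending leaves a stray "\r" on each line, so the shell sees
# "config.yml\r" rather than "config.yml" and the file lookup fails.
windows_text = "cd /opt/app\r\njava -jar app.jar config.yml\r\n"
unix_text = windows_text.gsub("\r\n", "\n")
puts unix_text.include?("\r")  # false
```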
    
    <item>
      <title>Java: java.lang.UnsupportedClassVersionError - Unsupported major.minor version 51.0</title>
      <link>https://www.markhneedham.com/blog/2012/11/24/java-java-lang-unsupportedclassversionerror-unsupported-major-minor-version-51-0/</link>
      <pubDate>Sat, 24 Nov 2012 08:49:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/24/java-java-lang-unsupportedclassversionerror-unsupported-major-minor-version-51-0/</guid>
      <description>On my current project we’ve spent the last day or so setting up an environment where we can deploy a couple of micro services to.
Although the machines are Windows based we’re deploying the application onto a vagrant managed VM since the production environment will be a flavour of Linux.
Initially I was getting quite confused about whether or not we were in the VM and ended up with this error when trying to run the compiled JAR:</description>
    </item>
    
    <item>
      <title>Looking inside the black box</title>
      <link>https://www.markhneedham.com/blog/2012/11/21/looking-inside-the-black-box/</link>
      <pubDate>Wed, 21 Nov 2012 19:42:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/21/looking-inside-the-black-box/</guid>
      <description>I recently came across a really interesting post about black box abstraction by Angeleah where she talks about developers desire to know how things work and the need to understand when and when not to follow that instinct.
Angeleah defines black box abstraction like so:
It is a technique for controlling complexity and abstracting detail. The point of doing this is to allow you to build bigger things. Hopefully bigger boxes.</description>
    </item>
    
    <item>
      <title>Learning: Switching between theory and practice</title>
      <link>https://www.markhneedham.com/blog/2012/11/19/learning-switching-between-theory-and-practice/</link>
      <pubDate>Mon, 19 Nov 2012 13:31:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/19/learning-switching-between-theory-and-practice/</guid>
      <description>In one of my first ever blog posts I wrote about the differences I’d experienced in learning the theory about a topic and then seeing it in practice.
The way I remember learning at school and university was that you learn all the theory first and then put it into practice but I typically don’t find myself doing this whenever I learn something new.
I spent a bit of time over the weekend learning more about neural networks as my colleague Jen Smith suggested this might be a more effective technique for getting a higher accuracy score on the Kaggle Digit Recogniser problem.</description>
    </item>
    
    <item>
      <title>Incremental/iterative development: Breaking down work</title>
      <link>https://www.markhneedham.com/blog/2012/11/19/incrementaliterative-development-breaking-down-work/</link>
      <pubDate>Mon, 19 Nov 2012 08:50:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/19/incrementaliterative-development-breaking-down-work/</guid>
      <description>Over the past couple of years I’ve worked on several different applications and one thing they had in common was that they had a huge feature which would take a few months to complete and initially seemed difficult to break down.
Since we favoured an incremental/iterative approach to building these features and wanted to add value in short feedback cycles we needed to find a way to break them down.</description>
    </item>
    
    <item>
      <title>Buy vs Build: Driving from the problem</title>
      <link>https://www.markhneedham.com/blog/2012/11/17/buy-vs-build-driving-from-the-problem/</link>
      <pubDate>Sat, 17 Nov 2012 16:56:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/17/buy-vs-build-driving-from-the-problem/</guid>
      <description>My colleague Erik Doernenburg has written a couple of articles recently discussing the reasons why people buy and build IT solutions and one part in particular resonated with me:
it is also possible, and not uncommon, that the software package does not do exactly what the business needs, leading to decreased productivity and lost opportunities.
I feel like there’s a mindset change once you start thinking which package you could buy to solve your problem whereby you stop solving the problem you actually have and focus instead on what features the package offers.</description>
    </item>
    
    <item>
      <title>Web Operations: Feature flags to turn off failing parts of infrastructure</title>
      <link>https://www.markhneedham.com/blog/2012/11/13/web-operations-feature-flags-to-turn-off-failing-parts-of-infrastructure/</link>
      <pubDate>Tue, 13 Nov 2012 12:19:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/13/web-operations-feature-flags-to-turn-off-failing-parts-of-infrastructure/</guid>
      <description>On most of the projects I’ve worked on over the last couple of years we’ve made use of feature toggles that we used to turn pending features on and off while they were still being built but while reading Web Operations I came across another usage.
In the chapter titled &amp;#39;Dev and Ops Collaboration and Cooperation&amp;#39; Paul Hammond suggests the following:
Eventually some of your infrastructure will fail in an unexpected way.</description>
    </item>
    
    <item>
      <title>Unix: Counting the number of commas on a line</title>
      <link>https://www.markhneedham.com/blog/2012/11/10/unix-counting-the-number-of-commas-on-a-line/</link>
      <pubDate>Sat, 10 Nov 2012 16:30:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/10/unix-counting-the-number-of-commas-on-a-line/</guid>
      <description>A few weeks ago I was playing around with some data stored in a CSV file and wanted to do a simple check on the quality of the data by making sure that each line had the same number of fields.
One way this can be done is with awk:
awk -F &amp;#34;,&amp;#34; &amp;#39; { print NF-1 } &amp;#39; file.csv Here we’re specifying the file separator -F as &amp;#39;,&amp;#39; and then using the NF (number of fields) variable to print how many commas there are on the line.</description>
    </item>
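The same check can be sketched in Ruby, applied here to in-memory lines rather than a file (the sample rows are made up):

```ruby
# Count commas per line to spot rows with a differing number of fields,
# mirroring awk's NF-1 with "," as the field separator.
lines = ["a,b,c", "d,e,f", "g,h"]
counts = lines.map { |line| line.count(",") }
puts counts.inspect         # [2, 2, 1]
puts counts.uniq.size == 1  # false: one row has a different field count
```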
    
    <item>
      <title>Clojure: Thread last (-&gt;&gt;) vs Thread first (\-&gt;)</title>
      <link>https://www.markhneedham.com/blog/2012/11/06/clojure-thread-last-vs-thread-first/</link>
      <pubDate>Tue, 06 Nov 2012 12:42:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/11/06/clojure-thread-last-vs-thread-first/</guid>
      <description>In many of the Clojure examples that I’ve come across the thread last (→&amp;gt;) macro is used to make it easier (for people from a non lispy background!) to see the transformations that the initial data structure is going through.
In one of my recent posts I showed how Jen &amp;amp; I had rewritten Mahout’s entropy function in Clojure:
(defn calculate-entropy [counts data-size] (-&amp;gt;&amp;gt; counts (remove #{0}) (map (partial individual-entropy data-size)) (reduce +))) Here we are using the thread last operator to first pass counts as the last argument of the remove function on the next line, then to pass the result of that to the map function on the next line and so on.</description>
    </item>
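The Clojure pipeline quoted in that excerpt (remove the zero counts, map each to its individual entropy, sum) translates naturally into a Ruby method chain; this sketch folds the individual-entropy step inline:

```ruby
# Entropy of a split: -sum(p * log2(p)) over the non-zero class counts,
# where p is a count's share of the whole data set.
def entropy(counts, data_size)
  counts.reject { |c| c.zero? }
        .map { |c| p = c.to_f / data_size; -p * Math.log2(p) }
        .reduce(0, :+)
end

puts entropy([5, 5], 10)   # 1.0 -- an even split is maximally uncertain
puts entropy([10, 0], 10)  # 0.0 -- a pure split has no uncertainty
```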
    
    <item>
      <title>Emacs/Clojure: Starting out with paredit</title>
      <link>https://www.markhneedham.com/blog/2012/10/31/emacsclojure-starting-out-with-paredit/</link>
      <pubDate>Wed, 31 Oct 2012 08:41:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/31/emacsclojure-starting-out-with-paredit/</guid>
      <description>I’ve been complaining recently to Jen and Bruce about the lack of a beginner’s guide to emacs paredit mode which seems to be the defacto approach for people working with Clojure and both pointed me to the paredit cheat sheet.
While it’s very comprehensive, I found that it’s a little overwhelming for a complete newbie like myself.
I therefore thought it’d be useful to write a bit about a couple of things that I’ve picked up from pairing with Jen on little bits of Clojure over the last couple of months.</description>
    </item>
    
    <item>
      <title>Clojure: Mahout&#39;s &#39;entropy&#39; function</title>
      <link>https://www.markhneedham.com/blog/2012/10/30/clojure-mahouts-entropy-function/</link>
      <pubDate>Tue, 30 Oct 2012 22:46:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/30/clojure-mahouts-entropy-function/</guid>
      <description>As I mentioned in a couple of previous posts Jen and I have been playing around with Mahout random forests and for a few hours last week we spent some time looking through the code to see how it worked.
In particular we came across an entropy function which is used to determine how good a particular &amp;#39;split&amp;#39; point in a decision tree is going to be.
I quite like the following definition:</description>
    </item>
    
    <item>
      <title>Mahout: Using a saved Random Forest/DecisionTree</title>
      <link>https://www.markhneedham.com/blog/2012/10/27/mahout-using-a-saved-random-forestdecisiontree/</link>
      <pubDate>Sat, 27 Oct 2012 22:03:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/27/mahout-using-a-saved-random-forestdecisiontree/</guid>
      <description>One of the things that I wanted to do while playing around with random forests using Mahout was to save the random forest and then use use it again which is something Mahout does cater for.
It was actually much easier to do this than I’d expected and assuming that we already have a https://github.com/apache/mahout/blob/trunk/core/src/main/java/org/apache/mahout/classifier/df/DecisionForest.java built we’d just need the following code to save it to disc:
int numberOfTrees = 1; Data data = loadData(.</description>
    </item>
    
    <item>
      <title>Kaggle Digit Recognizer: Mahout Random Forest attempt</title>
      <link>https://www.markhneedham.com/blog/2012/10/27/kaggle-digit-recognizer-mahout-random-forest-attempt/</link>
      <pubDate>Sat, 27 Oct 2012 20:24:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/27/kaggle-digit-recognizer-mahout-random-forest-attempt/</guid>
      <description>I’ve written previously about the K-means approach that Jen and I took when trying to solve Kaggle’s Digit Recognizer and having stalled at about 80% accuracy we decided to try one of the algorithms suggested in the tutorials section - the random forest!
We initially used a clojure random forests library but struggled to build the random forest from the training set data in a reasonable amount of time so we switched to Mahout’s version which is based on Leo Breiman’s random forests paper.</description>
    </item>
    
    <item>
      <title>Retrospectives: An alternative safety check</title>
      <link>https://www.markhneedham.com/blog/2012/10/27/retrospectives-an-alternative-safety-check/</link>
      <pubDate>Sat, 27 Oct 2012 18:21:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/27/retrospectives-an-alternative-safety-check/</guid>
      <description>At the start of most of the retrospectives I’ve been part of we’ve followed the safety check ritual whereby each person participating has to write a number from 1-5 on a sticky describing how they’ll be participating in the retrospective.
1 means you’ll probably keep quiet and not say much, 5 means you’re perfectly comfortable saying anything and the other numbers fall in between those two extremes.
In my experience it’s a bit of a fruitless exercise because it’s viewed that a higher number is &amp;#39;better&amp;#39; and therefore the minimum people will tend to write down is &amp;#39;3&amp;#39; because they don’t want to stand out or cause a problem.</description>
    </item>
    
    <item>
      <title>Kaggle Digit Recognizer: K-means optimisation attempt</title>
      <link>https://www.markhneedham.com/blog/2012/10/27/kaggle-digit-recognizer-k-means-optimisation-attempt/</link>
      <pubDate>Sat, 27 Oct 2012 12:27:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/27/kaggle-digit-recognizer-k-means-optimisation-attempt/</guid>
      <description>I recently wrote a blog post explaining how Jen and I used the K-means algorithm to classify digits in Kaggle’s Digit Recognizer problem and one of the things we’d read was that with this algorithm you often end up with situations where it’s difficult to classify a new item because if falls between two labels.
We decided to have a look at the output of our classifier function to see whether or not that was the case.</description>
    </item>
    
    <item>
      <title>Configuration in DNS</title>
      <link>https://www.markhneedham.com/blog/2012/10/24/configuration-in-dns/</link>
      <pubDate>Wed, 24 Oct 2012 17:40:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/24/configuration-in-dns/</guid>
      <description>In the latest version of the ThoughtWorks Technology Radar one of the areas covered is &amp;#39;configuration in DNS&amp;#39;, a term which I first came across earlier in the year from a mailing list post by my former colleague Daniel Worthington-Bodart.
The radar describes it like so:
Application deployments often suffer from an excess of environment-specific configuration settings, including the hostnames of dependent services. Configuration in DNS is a valuable technique to reduce this complexity by using standard hostnames like ‘mail’ or ‘db’ and have DNS resolve to the correct host for that environment.</description>
    </item>
    
    <item>
      <title>Kaggle Digit Recognizer: A K-means attempt</title>
      <link>https://www.markhneedham.com/blog/2012/10/23/kaggle-digit-recognizer-a-k-means-attempt/</link>
      <pubDate>Tue, 23 Oct 2012 19:04:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/23/kaggle-digit-recognizer-a-k-means-attempt/</guid>
      <description>Over the past couple of months Jen and I have been playing around with the Kaggle Digit Recognizer problem - a &amp;#39;competition&amp;#39; created to introduce people to Machine Learning.
The goal in this competition is to take an image of a handwritten single digit, and determine what that digit is.
You are given an input file which contains multiple rows each containing 784 pixel values representing a 28x28 pixel image as well as a label indicating which number that image actually represents.</description>
    </item>
    
    <item>
      <title>How we&#39;re using story points</title>
      <link>https://www.markhneedham.com/blog/2012/10/21/how-were-using-story-points/</link>
      <pubDate>Sun, 21 Oct 2012 23:08:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/21/how-were-using-story-points/</guid>
      <description>A couple of weeks ago Joshua Kerievsky wrote a post describing how he and his teams don’t use story points anymore because of the problems they’d had with them which included:
Story Point Inflation - inflating estimates of stories so that the velocity for an iteration is higher
Comparing teams by points - judging comparative performance of teams by how many points they’re able to complete
On the team I’m currently working on we still estimate the relative size of stories using points but we don’t use velocity per iteration to keep score - most of the time it’s barely even mentioned.</description>
    </item>
    
    <item>
      <title>Do the simple thing</title>
      <link>https://www.markhneedham.com/blog/2012/10/21/do-the-simple-thing/</link>
      <pubDate>Sun, 21 Oct 2012 21:35:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/21/do-the-simple-thing/</guid>
      <description>One of the most unexpected things that I picked up while pairing with Ashok for a few days in August/September is his ability to pick the simplest solution when confronted with a problem.
On numerous occasions we’d be trying to do something and I’d end up on a yak shaving mission trying to get a complicated approach to work while he watched on with bemusement.
I thought I’d actually learnt this lesson from working with Ashok but on a couple of occasions over the last week I’ve caught myself doing the same thing again!</description>
    </item>
    
    <item>
      <title>Environment agnostic machines and applications</title>
      <link>https://www.markhneedham.com/blog/2012/10/14/environment-agnostic-machines-and-applications/</link>
      <pubDate>Sun, 14 Oct 2012 18:49:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/14/environment-agnostic-machines-and-applications/</guid>
      <description>On my current project we’ve been setting up production and staging environments and Shodhan came up with the idea of making staging and production identical to the point that a machine wouldn’t even know what environment it was in.
Identical in this sense means:
Puppet doesn’t know which environment the machine is in. Our facter variables suggest the environment is production.
We set the RACK_ENV variable to production so applications don’t know what environment they’re in.</description>
    </item>
    
    <item>
      <title>Play Framework 2.0: Rendering JSON data in the view</title>
      <link>https://www.markhneedham.com/blog/2012/10/14/play-framework-2-0-rendering-json-data-in-the-view/</link>
      <pubDate>Sun, 14 Oct 2012 09:28:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/14/play-framework-2-0-rendering-json-data-in-the-view/</guid>
      <description>I’ve been playing around with the Play Framework which we’re using to front a bunch of visualisations and one thing I wanted to do is send a data structure to a view and then convert that into JSON.
I’ve got a simple controller which looks like this:
package controllers;

import play.mvc.Controller;
import play.mvc.Result;
import views.html.*;

public class SalesByCategory extends Controller {
  public static Result index() {
    ArrayList&amp;lt;Map&amp;lt;String, Object&amp;gt;&amp;gt; series = new ArrayList&amp;lt;Map&amp;lt;String, Object&amp;gt;&amp;gt;();
    Map&amp;lt;String, Object&amp;gt; oneSeries = new HashMap&amp;lt;String, Object&amp;gt;();
    oneSeries.</description>
    </item>
    
    <item>
      <title>Varnish: Purging the cache</title>
      <link>https://www.markhneedham.com/blog/2012/10/10/varnish-purging-the-cache/</link>
      <pubDate>Wed, 10 Oct 2012 23:28:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/10/varnish-purging-the-cache/</guid>
      <description>We’re using varnish to cache all the requests that come through our web servers and especially in our pre-production environments we deploy quite frequently and want to see the changes that we’ve made.
This means that we need to purge the pages we’re accessing from varnish so that it will actually pass the request through to the application server and serve up the latest version of the page.
For some reason my google-fu when trying to remember/work out how to do this has always been weak but my colleague Shodhan helped me understand how to do this today so I thought I better record it so I don’t forget!</description>
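The post doesn’t reproduce its configuration inline, but as a rough sketch of what the VCL side of purging can look like (Varnish 3-era syntax; the ACL name and allowed host here are illustrative assumptions, not taken from the post):

```
# A minimal sketch of allowing HTTP PURGE requests in VCL
acl purge_allowed {
    "localhost";
}

sub vcl_recv {
    if (req.request == "PURGE") {
        if (!client.ip ~ purge_allowed) {
            error 405 "Not allowed.";
        }
        return (lookup);
    }
}

sub vcl_hit {
    if (req.request == "PURGE") {
        purge;
        error 200 "Purged.";
    }
}
```

With something along these lines in place, a cached page can then be invalidated with a request like curl -X PURGE against the page’s URL.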
    </item>
    
    <item>
      <title>Nygard Big Data Model: The Investigation Stage</title>
      <link>https://www.markhneedham.com/blog/2012/10/10/nygard-big-data-model-the-investigation-stage/</link>
      <pubDate>Wed, 10 Oct 2012 00:00:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/10/nygard-big-data-model-the-investigation-stage/</guid>
      <description>Earlier this year Michael Nygard wrote an extremely detailed post about his experiences in the world of big data projects and included in the post was the following diagram which I’ve found very useful.
Nygard’s Big Data Model (shamelessly borrowed by me because it’s awesome)
Ashok and I have been doing some work in this area helping one of our clients make sense of and visualise some of their data and we realised retrospectively that we were acting very much in the investigation stage of the model.</description>
    </item>
    
    <item>
      <title>Mac OS X: Removing Byte Order Mark with an editor</title>
      <link>https://www.markhneedham.com/blog/2012/10/07/mac-os-x-removing-byte-order-mark-with-an-editor/</link>
      <pubDate>Sun, 07 Oct 2012 10:43:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/07/mac-os-x-removing-byte-order-mark-with-an-editor/</guid>
      <description>About a month ago I wrote about some problems I was having working with Windows generated CSV files which had a Byte Order Mark (BOM) at the beginning of the file and I described a way to get rid of it using awk.
It’s a bit of a long-winded process though and I always forget what parameters I need to pass to awk so I thought it would probably be quicker if I could just work out a way to get rid of the BOM using an editor.</description>
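The exact awk parameters from the original post aren’t reproduced here, but as a sketch of the command line approach (using sed rather than awk, and assuming GNU tools and bash’s ANSI-C quoting):

```shell
#!/bin/bash
# Create a file that starts with a UTF-8 byte order mark (EF BB BF)
printf '\xef\xbb\xbfid,name\n1,mark\n' > with-bom.csv

# Strip the BOM from the first line only; $'...' lets bash expand
# the \x escapes into raw bytes before sed sees them
sed $'1s/^\xef\xbb\xbf//' with-bom.csv > without-bom.csv

# The file now starts with the real data rather than the BOM
head -c 2 without-bom.csv
```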
    </item>
    
    <item>
      <title>Strata Conf London: Day 2 Wrap Up</title>
      <link>https://www.markhneedham.com/blog/2012/10/03/strata-conf-london-day-2-wrap-up/</link>
      <pubDate>Wed, 03 Oct 2012 06:46:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/03/strata-conf-london-day-2-wrap-up/</guid>
      <description>Yesterday I attended the second day of Strata Conf London and these are the some of the things I learned from the talks I attended:
John Graham Cunningham opened the series of keynotes with a talk describing the problems British Rail had in 1955 when trying to calculate the distances between all train stations and comparing them to the problems we have today. British Rail were trying to solve a graph problem at a time when people didn’t know about graphs and Dijkstra’s algorithm hadn’t yet been invented - the algorithm was effectively invented on this project but never publicised.</description>
    </item>
    
    <item>
      <title>Strata Conf London: Day 1 Wrap Up</title>
      <link>https://www.markhneedham.com/blog/2012/10/02/strata-conf-london-day-1-wrap-up/</link>
      <pubDate>Tue, 02 Oct 2012 23:42:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/10/02/strata-conf-london-day-1-wrap-up/</guid>
      <description>For the past couple of days I attended the first Strata Conf to be held in London - a conference which seems to bring together people from the data science and big data worlds to talk about the stuff they’re doing.
Since I’ve been playing around with a couple of different things in this area over the last 4/5 months I thought it’d be interesting to come along and see what people much more experienced in this area had to say!</description>
    </item>
    
    <item>
      <title>neo4j: Handling SUM&#39;s scientific notation</title>
      <link>https://www.markhneedham.com/blog/2012/09/30/neo4j-handling-sums-scientific-notation/</link>
      <pubDate>Sun, 30 Sep 2012 19:47:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/30/neo4j-handling-sums-scientific-notation/</guid>
      <description>In some of the recent work I’ve been doing with neo4j the queries I’ve written have been summing up the values from multiple nodes and after a certain size is reached the returned value uses scientific notation.
For example in a cypher query like this:
START category = node:categories(&amp;#39;category_id:1&amp;#39;)
MATCH p = category-[:has_child*1..5]-&amp;gt;subCategory-[:has_product]-&amp;gt;product-[:sold]-&amp;gt;sales
RETURN EXTRACT(n in NODES(p) : n.category_id?), subCategory.category_id, SUM(sales.sales)
I might get a result set like this:
+------------------------------------------------------------------------------------------------+ | EXTRACT(n in NODES(p) : n.</description>
    </item>
    
    <item>
      <title>Testing XML generation with vimdiff</title>
      <link>https://www.markhneedham.com/blog/2012/09/30/testing-xml-generation-with-vimdiff/</link>
      <pubDate>Sun, 30 Sep 2012 15:48:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/30/testing-xml-generation-with-vimdiff/</guid>
      <description>A couple of weeks ago I spent a bit of time writing a Ruby DSL to automate the setup of load balancers, firewall and NAT rules through the VCloud API.
The VCloud API deals primarily in XML so the DSL is just a thin layer which creates the appropriate mark up.
When we started out we configured everything manually through the web console and then exported the XML so the first thing that the DSL needed to do was create XML that matched what we already had.</description>
    </item>
    
    <item>
      <title>Data Science: Making sense of the data</title>
      <link>https://www.markhneedham.com/blog/2012/09/30/data-science-making-sense-of-the-data/</link>
      <pubDate>Sun, 30 Sep 2012 14:58:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/30/data-science-making-sense-of-the-data/</guid>
      <description>Over the past month or so Ashok and I have been helping one of our clients explore and visualise some of their data and one of the first things we needed to do was make sense of the data that was available.
Start small
Ashok suggested that we work with a subset of our eventual data set so that we could get a feel for the data and quickly see whether what we were planning to do made sense.</description>
    </item>
    
    <item>
      <title>Data Science: Scrapping the data together</title>
      <link>https://www.markhneedham.com/blog/2012/09/30/data-science-scrapping-the-data-together/</link>
      <pubDate>Sun, 30 Sep 2012 13:44:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/30/data-science-scrapping-the-data-together/</guid>
      <description>On Friday Martin, Darren and I were discussing the ThoughtWorks graph that I was working on earlier in the year and Martin pointed out that an interesting aspect of this type of work is that the data you want to work with isn’t easily available.
You therefore need to find a way to scrap the data together to make some headway and then maybe at a later stage once some progress has been made it will become easier to replace that with a cleaner solution.</description>
    </item>
    
    <item>
      <title>Upstart: Job getting stuck in the start/killed state</title>
      <link>https://www.markhneedham.com/blog/2012/09/29/upstart-job-getting-stuck-in-the-startkilled-state/</link>
      <pubDate>Sat, 29 Sep 2012 09:56:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/29/upstart-job-getting-stuck-in-the-startkilled-state/</guid>
      <description>We’re using upstart to handle the processes running on our machines and since the haproxy package only came packaged with an init.d script we wanted to upstartify it.
When defining an upstart script you need to specify an expect stanza in which you specify whether or not the process which you’re launching is going to fork.
If you do not specify the expect stanza, Upstart will track the life cycle of the first PID that it executes in the exec or script stanzas.</description>
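The expect stanza can be sketched in a hypothetical job file like the one below (the service name and other stanzas are illustrative, not the actual haproxy configuration from the post):

```
# /etc/init/mydaemon.conf -- a hypothetical upstart job
description "my forking service"

start on runlevel [2345]
stop on runlevel [016]

# 'expect fork' if the process calls fork() once,
# 'expect daemon' for a classic double-forking daemon;
# omit the stanza entirely for a foreground process
expect daemon

exec /usr/sbin/mydaemon
```

Picking the wrong variant is what leads to upstart tracking a dead PID and the job getting wedged in states like start/killed.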
    </item>
    
    <item>
      <title>Java: Parsing CSV files</title>
      <link>https://www.markhneedham.com/blog/2012/09/23/java-parsing-csv-files/</link>
      <pubDate>Sun, 23 Sep 2012 22:46:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/23/java-parsing-csv-files/</guid>
      <description>As I mentioned in a previous post I recently moved a bunch of neo4j data loading code from Ruby to Java and as part of that process I needed to parse some CSV files.
In Ruby I was using FasterCSV which became the standard CSV library from Ruby 1.9 but it’s been a while since I had to parse CSV files in Java so I wasn’t sure which library to use.</description>
    </item>
    
    <item>
      <title>Network Address Translation</title>
      <link>https://www.markhneedham.com/blog/2012/09/23/network-address-translation/</link>
      <pubDate>Sun, 23 Sep 2012 19:23:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/23/network-address-translation/</guid>
      <description>I’ve often heard people talking about Network Address Translation (NAT) but I never really understood exactly how it worked until we started configuring some virtual data centres on my current project.
This is an attempt at documenting my own current understanding so I won’t forget in future.
In our case we’ve been provisioning a bunch of machines into different private networks, and each machine therefore has an IP in the range of IPv4 addresses reserved for private networks:</description>
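As a quick illustration of those reserved ranges, here is a small bash function (hypothetical, not from the post) that classifies an IPv4 address as private or public:

```shell
#!/bin/bash
# Check whether an IPv4 address falls inside one of the RFC 1918
# ranges reserved for private networks:
#   10.0.0.0/8, 172.16.0.0/12, 192.168.0.0/16
is_private() {
  local ip=$1
  local a=${ip%%.*}      # first octet
  local rest=${ip#*.}
  local b=${rest%%.*}    # second octet
  [ "$a" -eq 10 ] && return 0
  [ "$a" -eq 172 ] && [ "$b" -ge 16 ] && [ "$b" -le 31 ] && return 0
  [ "$a" -eq 192 ] && [ "$b" -eq 168 ] && return 0
  return 1
}

is_private 10.0.0.2 && echo "10.0.0.2 is private"
is_private 8.8.8.8 || echo "8.8.8.8 is public"
```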
    </item>
    
    <item>
      <title>neo4j: The Batch Inserter and the sunk cost fallacy</title>
      <link>https://www.markhneedham.com/blog/2012/09/23/neo4j-the-batch-inserter-and-the-sunk-cost-fallacy/</link>
      <pubDate>Sun, 23 Sep 2012 10:29:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/23/neo4j-the-batch-inserter-and-the-sunk-cost-fallacy/</guid>
      <description>About a year and a half ago I wrote about the sunk cost fallacy which is defined like so:
The Misconception: You make rational decisions based on the future value of objects, investments and experiences. The Truth: Your decisions are tainted by the emotional investments you accumulate, and the more you invest in something the harder it becomes to abandon it.
Over the past few weeks Ashok and I have been doing some exploration of one of our client’s data by modelling it in a neo4j graph and seeing what interesting things the traversals reveal.</description>
    </item>
    
    <item>
      <title>Finding ways to use bash command line history shortcuts</title>
      <link>https://www.markhneedham.com/blog/2012/09/19/finding-ways-to-use-bash-command-line-history-shortcuts/</link>
      <pubDate>Wed, 19 Sep 2012 07:00:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/19/finding-ways-to-use-bash-command-line-history-shortcuts/</guid>
      <description>A couple of months ago I wrote about a bunch of command line history shortcuts that Phil had taught me and after recently coming across Peteris Krumins&amp;#39; bash history cheat sheet I thought it’d be interesting to find some real ways to use them.
A few weeks ago I wrote about a UTF-8 byte order mark (BOM) that I wanted to remove from a file I was working on and I realised this evening that there were some other files with the same problem.</description>
    </item>
    
    <item>
      <title>zsh: Don&#39;t verify substituted history expansion a.k.a.  disabling histverify</title>
      <link>https://www.markhneedham.com/blog/2012/09/16/zsh-dont-verify-substituted-history-expansion-a-k-a-disabling-histverify/</link>
      <pubDate>Sun, 16 Sep 2012 13:35:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/16/zsh-dont-verify-substituted-history-expansion-a-k-a-disabling-histverify/</guid>
      <description>I use zsh on my Mac terminal and in general I prefer it to bash but it has an annoying default setting whereby when you try to repeat a command via substituted history expansion it first asks you to verify the expanded command.
For example let’s say by mistake I try to vi into a directory rather than cd’ing into it:
vi ~/.oh-my-zsh
If I try to cd into the directory by using &amp;#39;!</description>
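The behaviour in question is zsh’s HIST_VERIFY option; assuming your setup (oh-my-zsh, for example) has turned it on, a one-line change disables it:

```
# ~/.zshrc -- execute history expansions (e.g. !$ or !!) immediately
# instead of printing the expanded line for confirmation first
unsetopt hist_verify
```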
    </item>
    
    <item>
      <title>cURL and the case of the carriage return</title>
      <link>https://www.markhneedham.com/blog/2012/09/15/curl-and-the-case-of-the-carriage-return/</link>
      <pubDate>Sat, 15 Sep 2012 09:06:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/15/curl-and-the-case-of-the-carriage-return/</guid>
      <description>We were doing some work this week where we needed to make a couple of calls to an API via a shell script and in the first call we wanted to capture one of the lines of the HTTP response headers and use that as an input to the second call.
The way we were doing this was something like the following:
#!/bin/bash
# We were actually grabbing a different header but for the sake
# of this post we&amp;#39;ll say it was &amp;#39;Set-Cookie&amp;#39;
AUTH_HEADER=`curl -I http://www.</description>
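A sketch of the carriage-return problem the title hints at, with printf standing in for the real curl call (the header name and value are illustrative):

```shell
#!/bin/bash
# HTTP header lines end in CRLF, so a value captured from
# 'curl -I' output carries a trailing carriage return that can
# quietly corrupt the follow-up request. Simulated headers stand
# in for a real 'curl -I http://...' call here:
headers=$(printf 'HTTP/1.1 200 OK\r\nSet-Cookie: auth=abc123\r\n')

raw=$(echo "$headers" | grep 'Set-Cookie' | cut -d' ' -f2)
clean=$(echo "$raw" | tr -d '\r')

echo "raw:   ${#raw} characters"    # one longer than it looks: the \r
echo "clean: ${#clean} characters"
```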
    </item>
    
    <item>
      <title>Bash: Piping data into a command using heredocs</title>
      <link>https://www.markhneedham.com/blog/2012/09/15/bash-piping-data-into-a-command-using-heredocs/</link>
      <pubDate>Sat, 15 Sep 2012 07:54:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/15/bash-piping-data-into-a-command-using-heredocs/</guid>
      <description>I’ve been playing around with some data modelled in neo4j recently and one thing I wanted to do is run an adhoc query in the neo4j-shell and grab the results and do some text manipulation on them.
For example I wrote a query which outputted the following to the screen and I wanted to sum together all the values in the 3rd column:
| [&amp;#34;1&amp;#34;,&amp;#34;2&amp;#34;,&amp;#34;3&amp;#34;] | &amp;#34;3&amp;#34; | 1234567 |
| [&amp;#34;4&amp;#34;,&amp;#34;5&amp;#34;,&amp;#34;6&amp;#34;] | &amp;#34;6&amp;#34; | 8910112 |
Initially I was pasting the output into a text file and then running the following sequence of commands to work it out:</description>
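A sketch of the heredoc approach under the same assumptions (pipe-delimited rows, summing the third data column), using awk in place of the post’s unspecified command sequence:

```shell
#!/bin/bash
# Sum the third data column of pipe-delimited shell output,
# feeding the rows in via a heredoc instead of a temporary file.
# With -F'|' the third data column is awk's $4, because $1 is the
# empty string before the leading '|'.
total=$(awk -F'|' '{ gsub(/ /, "", $4); sum += $4 } END { print sum }' << 'EOF'
| ["1","2","3"] | "3" | 1234567 |
| ["4","5","6"] | "6" | 8910112 |
EOF
)

echo "$total"
```

For the two rows above this prints 10144679.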
    </item>
    
    <item>
      <title>Unix: Caught out by shell significant characters</title>
      <link>https://www.markhneedham.com/blog/2012/09/13/unix-caught-out-by-shell-significant-characters/</link>
      <pubDate>Thu, 13 Sep 2012 00:17:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/13/unix-caught-out-by-shell-significant-characters/</guid>
      <description>One of the applications that Phil and I were deploying today needed a MySQL server and part of our puppet code to provision that node type runs a command to set up the privileges for a database user.
The unevaluated puppet code reads like this:
/usr/bin/mysql -h ${host} -uroot ${rootpassarg} -e &amp;#34;grant all on ${name}.* to ${user}@&amp;#39;${remote_host}&amp;#39; identified by &amp;#39;$password&amp;#39;; flush privileges;&amp;#34;
In the application we were deploying that expanded into something like this:</description>
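A minimal illustration of the underlying quoting issue, with a hypothetical password rather than the client’s:

```shell
#!/bin/bash
# Inside double quotes the shell still expands '$', so a password
# containing shell-significant characters gets silently mangled;
# single quotes keep every character literal.
mangled="pa$$word"     # $$ expands to the shell's process id
literal='pa$$word'     # stays exactly as written

echo "double quotes: $mangled"
echo "single quotes: $literal"
```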
    </item>
    
    <item>
      <title>While waiting for VMs to provision...</title>
      <link>https://www.markhneedham.com/blog/2012/09/12/while-waiting-for-vms-to-provision/</link>
      <pubDate>Wed, 12 Sep 2012 22:53:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/12/while-waiting-for-vms-to-provision/</guid>
      <description>Phil and I spent part of the day provisioning new virtual machines for some applications that we need to deploy which involves running a provisioning script and then opening another terminal and repeatedly trying to ssh into the box until it succeeds.
Eventually we got bored of doing that so we figured out a nice little one liner to use instead:
while :; do ssh 10.0.0.2; done
The &amp;#39;:&amp;#39; is a bash noop and is defined like so:</description>
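A sketch building on that one-liner; the retry helper and the flaky_connect stand-in below are illustrative, not from the post:

```shell
#!/bin/bash
# ':' is the bash no-op builtin (it always exits 0), which is why
# 'while :; do ssh 10.0.0.2; done' loops forever. A variant that
# stops retrying once the command finally succeeds:
retry() {
  until "$@"; do
    sleep 1
  done
}

# In the original context this would be: retry ssh 10.0.0.2
# Demonstrated here with a stand-in that fails twice, then succeeds:
attempts=0
flaky_connect() {
  attempts=$((attempts + 1))
  [ "$attempts" -ge 3 ]
}

retry flaky_connect
echo "connected after $attempts attempts"
```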
    </item>
    
    <item>
      <title>neo4j/cypher: CREATE UNIQUE - &#34;SyntaxException: string matching regex ``$&#39; expected but ``p&#39; found&#34;</title>
      <link>https://www.markhneedham.com/blog/2012/09/09/neo4jcypher-create-unique-syntaxexception-string-matching-regex-expected-but-p-found/</link>
      <pubDate>Sun, 09 Sep 2012 22:29:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/09/neo4jcypher-create-unique-syntaxexception-string-matching-regex-expected-but-p-found/</guid>
      <description>I’ve been playing around with the mutating cypher syntax of neo4j which allows you to make changes to the graph as well as query it, a feature introduced into cypher in May in release 1.8 M01.
I was trying to make use of the &amp;#39;CREATE UNIQUE&amp;#39; syntax which allows you to create nodes/relationships if they’re missing but won’t do anything if they already exist.
I had something like the following:</description>
    </item>
    
    <item>
      <title>logstash not picking up some files</title>
      <link>https://www.markhneedham.com/blog/2012/09/07/logstash-not-picking-up-some-files/</link>
      <pubDate>Fri, 07 Sep 2012 23:49:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/07/logstash-not-picking-up-some-files/</guid>
      <description>We’re using logstash to collect all the logs across the different machines that we use in various environments and had noticed that on some of the nodes log files which we’d told the logstash-client to track weren’t being collected.
We wanted to check what the open file descriptors of logstash-client were so we first had to grab its process id:
$ ps aux | grep logstash
logstash 19896 134 9.</description>
    </item>
    
    <item>
      <title>Apt-Cacher-Server: Extra junk at end of file</title>
      <link>https://www.markhneedham.com/blog/2012/09/07/apt-cacher-server-extra-junk-at-end-of-file/</link>
      <pubDate>Fri, 07 Sep 2012 15:45:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/07/apt-cacher-server-extra-junk-at-end-of-file/</guid>
      <description>We’ve been installing Apt-Cacher-Server so that we can cache some of the packages that we’re installing using apt-get on our own network.
(Almost) Following the instructions from the home page we added the following to /etc/apt/apt.conf.d/01proxy:
Acquire::http::Proxy &amp;#34;http://apt-cache-server:3142&amp;#34;
And when we ran &amp;#39;apt-get update&amp;#39; we were getting the following error:
E: Syntax error /etc/apt/apt.conf.d/01proxy:2: Extra junk at end of file
We initially thought it must be a problem with having an extra space or line ending but it turns out we had just left off the semicolon.</description>
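The fixed line, with the trailing semicolon that apt’s configuration syntax expects:

```
// /etc/apt/apt.conf.d/01proxy
Acquire::http::Proxy "http://apt-cache-server:3142";
```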
    </item>
    
    <item>
      <title>A rogue &#34;\357\273\277&#34; (UTF-8 byte order mark)</title>
      <link>https://www.markhneedham.com/blog/2012/09/03/a-rogue-357273277-utf-8-byte-order-mark/</link>
      <pubDate>Mon, 03 Sep 2012 06:31:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/09/03/a-rogue-357273277-utf-8-byte-order-mark/</guid>
      <description>We’ve been loading some data into neo4j from a CSV file - creating one node per row and using the value in the first column as the index lookup for the node.
Unfortunately the index lookup wasn’t working for the first row but was for every other row.
By coincidence we started saving each row into a hash map and were then able to see what was going wrong:</description>
    </item>
    
    <item>
      <title>Book Review: The Retrospective Handbook - Pat Kua</title>
      <link>https://www.markhneedham.com/blog/2012/08/31/book-review-the-retrospective-handbook-pat-kua/</link>
      <pubDate>Fri, 31 Aug 2012 21:18:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/08/31/book-review-the-retrospective-handbook-pat-kua/</guid>
      <description>My colleague Pat Kua recently published a book he’s been working on for the first half of the year titled &amp;#39;The Retrospective Handbook&amp;#39; - a book in which Pat shares his experiences with retrospectives and gives advice to budding facilitators.
I was intrigued what the book would be like because the skill gap between Pat and me with respect to facilitating retrospectives is huge and I’ve often found that experts in a subject can have a tendency to be a bit preachy when writing about their subject!</description>
    </item>
    
    <item>
      <title>The Curse Of Knowledge</title>
      <link>https://www.markhneedham.com/blog/2012/08/28/the-curse-of-knowledge/</link>
      <pubDate>Tue, 28 Aug 2012 21:22:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/08/28/the-curse-of-knowledge/</guid>
      <description>My colleague Anand Vishwanath recently recommended the book &amp;#39;Made To Stick&amp;#39; and one thing that has really stood out for me while reading it is the idea of the &amp;#39;The Curse Of Knowledge&amp;#39; which is described like so:
Once we know something, we find it hard to imagine what it was like not to know it. Our knowledge has &amp;#34;cursed&amp;#34; us. And it becomes difficult for us to share our knowledge with others, because we can’t readily re-create our listeners&amp;#39; state of mind.</description>
    </item>
    
    <item>
      <title>Ruby: Finding where gems are</title>
      <link>https://www.markhneedham.com/blog/2012/08/25/ruby-finding-where-gems-are/</link>
      <pubDate>Sat, 25 Aug 2012 10:00:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/08/25/ruby-finding-where-gems-are/</guid>
      <description>In my infrequent travels into Ruby land I always seem to forget where the gems that I’ve installed actually live on the file system but my colleague Nick recently showed me a neat way of figuring it out.
If I’m in the folder that contains all my ThoughtWorks graph code I’d just need to run the following command:
$ gem which rubygems
/Users/mneedham/.rbenv/versions/jruby-1.6.7/lib/ruby/site_ruby/1.8/rubygems.rb
I then loaded up irb and wrote a simple cypher query executed using neography:</description>
    </item>
    
    <item>
      <title>puppetdb: Failed to submit &#39;replace catalog&#39; command for client to PuppetDB at puppetmaster:8081: [500 Server Error]</title>
      <link>https://www.markhneedham.com/blog/2012/08/16/puppetdb-failed-to-submit-replace-catalog-command-for-client-to-puppetdb-at-puppetmaster8081-500-server-error/</link>
      <pubDate>Thu, 16 Aug 2012 23:31:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/08/16/puppetdb-failed-to-submit-replace-catalog-command-for-client-to-puppetdb-at-puppetmaster8081-500-server-error/</guid>
      <description>I’m still getting used to the idea of following the logs when working out what’s going wrong with distributed systems but it worked well when trying to work out why our puppet client was throwing this error when we ran &amp;#39;puppet agent -tdv&amp;#39;:
err: Could not retrieve catalog from remote server: Error 400 on SERVER: Failed to submit &amp;#39;replace catalog&amp;#39; command for client to PuppetDB at puppetmaster:8081: [500 Server Error] We were seeing the same error in /var/log/syslog on the puppet master and a quick look at the process list didn’t show that the puppet master or puppetdb services were under a particularly heavy load.</description>
    </item>
    
    <item>
      <title>Presentations; Tell a story</title>
      <link>https://www.markhneedham.com/blog/2012/08/14/presentations-tell-a-story/</link>
      <pubDate>Tue, 14 Aug 2012 22:16:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/08/14/presentations-tell-a-story/</guid>
      <description>A few years ago before an F# talk that I gave at the .NET user group in Sydney my colleague Erik Doernenburg gave me some advice about how I should structure the talk.
(paraphrasing)
He suggested that in a lot of talks he’d seen the presenter rattle off a bunch of information about a topic but hadn’t provided any insight into their own experience with the topic. If two people give a talk on the same topic they therefore end up giving fairly similar talks even though each person may have a totally different perspective.</description>
    </item>
    
    <item>
      <title>SSHing onto machines via a jumpbox</title>
      <link>https://www.markhneedham.com/blog/2012/08/10/sshing-onto-machines-via-a-jumpbox/</link>
      <pubDate>Fri, 10 Aug 2012 00:58:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/08/10/sshing-onto-machines-via-a-jumpbox/</guid>
      <description>We wanted to be able to ssh into some machines which were behind a firewall so we set up a jumpbox which our firewall directed any traffic on port 22 towards.
Initially if we wanted to SSH onto a machine inside the network we’d have to do a two step process:
$ ssh jumpbox
# now on the jumpbox
$ ssh internal-network-machine
That got a bit annoying after a while so Sam showed us a neat way of proxying the second ssh command through the first one by making use of netcat.</description>
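The netcat trick usually ends up in ~/.ssh/config as a ProxyCommand; a sketch using the host names from the example above (the exact flags may differ from what Sam showed):

```
# ~/.ssh/config -- hop through the jumpbox transparently, so that
# 'ssh internal-network-machine' works in one step; %h and %p are
# the target host and port
Host internal-network-machine
    ProxyCommand ssh jumpbox nc %h %p
```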
    </item>
    
    <item>
      <title>VCloud Guest Customization Script : [: postcustomization: unexpected operator</title>
      <link>https://www.markhneedham.com/blog/2012/08/06/vcloud-guest-customization-script-postcustomization-unexpected-operator/</link>
      <pubDate>Mon, 06 Aug 2012 21:50:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/08/06/vcloud-guest-customization-script-postcustomization-unexpected-operator/</guid>
      <description>We have been doing some work to automatically provision machines using the VCloud API via fog and one of the things we wanted to do was run a custom script the first time that a node powers on.
The following explains how customization scripts work:
In vCloud Director, when setting a customization script in a virtual machine, the script:
Is called only on initial customization and force recustomization.
Is called with the precustomization command line parameter before out-of-box customization begins.</description>
    </item>
    
    <item>
      <title>neo4j: Creating a custom index with neo4j.rb</title>
      <link>https://www.markhneedham.com/blog/2012/08/05/neo4j-creating-a-custom-index-with-neo4j-rb/</link>
      <pubDate>Sun, 05 Aug 2012 09:45:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/08/05/neo4j-creating-a-custom-index-with-neo4j-rb/</guid>
      <description>As I mentioned in my last post I’ve been playing around with the TFL Bus stop location and routes API and one thing I wanted to do was load all the bus stops into a neo4j database using the neo4j.rb gem.
I initially populated the database via neography but it was taking around 20 minutes each run and I figured it’d probably be much quicker to populate it directly rather than using the REST API.</description>
    </item>
    
    <item>
      <title>London Bus Stops API: Mapping northing/easting values to lat/long</title>
      <link>https://www.markhneedham.com/blog/2012/07/30/london-bus-stops-api-mapping-northingeasting-values-to-latlong/</link>
      <pubDate>Mon, 30 Jul 2012 22:28:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/30/london-bus-stops-api-mapping-northingeasting-values-to-latlong/</guid>
      <description>I started playing around with the TFL Bus stop location and routes API and one of the annoying things about the data is that it uses easting/northing values to describe the location of bus stops rather than lat/longs.
The first few lines of the CSV file look like this:
1000,91532,490000266G,WESTMINSTER STN &amp;lt;&amp;gt; / PARLIAMENT SQUARE,530171,179738,177,0K08,0 10001,72689,490013793E,TREVOR CLOSE,515781,174783,78,NB16,0 10002,48461,490000108F,HIGHBURY CORNER,531614,184603,5,C902,0 For each of the stops I wanted to convert from the easting/northing value to the equivalent lat/long value but I couldn’t find a simple way of doing it in code although I did come across an API that would do it for me.</description>
    </item>
    
    <item>
      <title>Puppet: Keeping the discipline</title>
      <link>https://www.markhneedham.com/blog/2012/07/29/puppet-keeping-the-discipline/</link>
      <pubDate>Sun, 29 Jul 2012 21:53:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/29/puppet-keeping-the-discipline/</guid>
      <description>For the last 5 weeks or so I’ve been working with puppet every day to automate the configuration of various nodes in our stack and my most interesting observation so far is that you really need to keep your discipline when doing this type of work.
We can keep that discipline in three main ways when developing modules.
Running from scratch Configuring various bits of software seems to follow the 80/20 rule and we get very close to having each thing working quite quickly but then end up spending a disproportionate amount of time tweaking the last little bits.</description>
    </item>
    
    <item>
      <title>Unix: tee</title>
      <link>https://www.markhneedham.com/blog/2012/07/29/unix-tee/</link>
      <pubDate>Sun, 29 Jul 2012 19:11:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/29/unix-tee/</guid>
      <description>I’ve read about the Unix &amp;#39;tee&amp;#39; command before but never found a reason to use it until the last few weeks.
One of the things I repeatedly do by mistake is open /etc/hosts without sudo and then try to make changes to it:
$ vi /etc/hosts # Editing it leads to the dreaded &amp;#39;W10: Changing a readonly file&amp;#39; I always used to close the file and then re-open it with sudo but I recently came across an approach which allows us to use &amp;#39;tee&amp;#39; to get around the problem.</description>
    </item>
    
    <item>
      <title>neo4j: Multiple starting nodes by index lookup</title>
      <link>https://www.markhneedham.com/blog/2012/07/28/neo4j-multiple-starting-nodes-by-index-lookup/</link>
      <pubDate>Sat, 28 Jul 2012 23:32:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/28/neo4j-multiple-starting-nodes-by-index-lookup/</guid>
      <description>I spent a bit of time this evening extracting some data from the ThoughtWorks graph for our marketing team who were interested in anything related to our three European offices in London, Manchester and Hamburg.
The most interesting things we can explore relate to the relationship between people and the offices.
The model around people and offices looks like this:
I added a &amp;#39;current_home_office&amp;#39; relationship to make it easier to quickly get to the nodes of people who are currently working in a specific office.</description>
    </item>
    
    <item>
      <title>R: Mapping a function over a collection of values</title>
      <link>https://www.markhneedham.com/blog/2012/07/23/r-mapping-a-function-over-a-collection-of-values/</link>
      <pubDate>Mon, 23 Jul 2012 23:25:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/23/r-mapping-a-function-over-a-collection-of-values/</guid>
      <description>I spent a bit of Sunday playing around with R and one thing I wanted to do was map a function over a collection of values and transform each value slightly.
I loaded my data set using the &amp;#39;Import Dataset&amp;#39; option in R Studio (suggested to me by Rob) which gets converted to the following function call:
&amp;gt; data &amp;lt;- read.csv(&amp;#34;~/data.csv&amp;#34;, header=T, encoding=&amp;#34;ISO-8859&amp;#34;) &amp;gt; data Column1 InterestingColumn 1 Mark 12.</description>
    </item>
    
    <item>
      <title>neo4j: Graph Global vs Graph Local queries</title>
      <link>https://www.markhneedham.com/blog/2012/07/23/neo4j-graph-global-vs-graph-local-queries/</link>
      <pubDate>Mon, 23 Jul 2012 22:23:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/23/neo4j-graph-global-vs-graph-local-queries/</guid>
      <description>A few weeks ago I did a presentation at the ThoughtWorks EU away day on the graph I’ve been developing using neo4j and I wanted to show who the most connected people in each of our European offices were.
I started with the following cypher query:
START n = node(*) MATCH n-[r:colleagues*1..2]-&amp;gt;c, n-[r2:member_of]-&amp;gt;office WHERE n.type? = &amp;#39;person&amp;#39; AND (NOT(HAS(r2.end_date))) AND office.name = &amp;#39;London - UK South&amp;#39; AND (NOT(HAS(c.thoughtquitter))) RETURN n.</description>
    </item>
    
    <item>
      <title>neo4j: Embracing the sub graph</title>
      <link>https://www.markhneedham.com/blog/2012/07/21/neo4j-embracing-the-sub-graph/</link>
      <pubDate>Sat, 21 Jul 2012 22:46:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/21/neo4j-embracing-the-sub-graph/</guid>
      <description>In May I wrote a blog post explaining how I’d been designing a neo4j graph by thinking about what questions I wanted to answer about the data.
In the comments Josh Adell gave me the following advice:
The neat things about graphs is that multiple subgraphs can live in the same data-space. ... Keep your data model rich! Don’t be afraid to have as many relationships as you need. The power of graph databases comes from finding surprising results when you have strongly interconnected data.</description>
    </item>
    
    <item>
      <title>neo4j: Shortest Path with and without cypher</title>
      <link>https://www.markhneedham.com/blog/2012/07/19/neo4j-shortest-path-with-and-without-cypher/</link>
      <pubDate>Thu, 19 Jul 2012 19:57:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/19/neo4j-shortest-path-with-and-without-cypher/</guid>
      <description>I was looking back at some code I wrote a few months ago to query a neo4j database to find the shortest path between two people via the colleagues relationships that exist.
The initial code, written using neography, looked like this:
neo = Neography::Rest.new start_node = neo.get_node(start_node_id) destination_node = neo.get_node(destination_node_id) neo.get_paths(start_node, destination_node, { &amp;#34;type&amp;#34; =&amp;gt; &amp;#34;colleagues&amp;#34; }, depth = 3, algorithm = &amp;#34;shortestPath&amp;#34;) The neography code eventually makes a POST request to /node/{start_id}/paths and provides a JSON payload containing the other information about the query.</description>
    </item>
    
    <item>
      <title>neo4j: java.security.NoSuchAlgorithmException: Algorithm [JKS] of type [KeyStore] from provider [org.bouncycastle.jce.provider.BouncyCastleProvider: name=BC version=1.4]</title>
      <link>https://www.markhneedham.com/blog/2012/07/17/neo4j-java-security-nosuchalgorithmexception-algorithm-jks-of-type-keystore-from-provider-org-bouncycastle-jce-provider-bouncycastleprovider-namebc-version1-4/</link>
      <pubDate>Tue, 17 Jul 2012 00:02:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/17/neo4j-java-security-nosuchalgorithmexception-algorithm-jks-of-type-keystore-from-provider-org-bouncycastle-jce-provider-bouncycastleprovider-namebc-version1-4/</guid>
<description>I’ve spent the last couple of hours moving my neo4j graph from my own machine onto a vanilla CentOS VM and initially tried to run neo using a non-Sun version of Java which I installed like so:
yum install java This is the version of Java that was installed:
$ java -version java version &amp;#34;1.5.0&amp;#34; gij (GNU libgcj) version 4.4.6 20120305 (Red Hat 4.4.6-4) When I tried to start neo4j:</description>
    </item>
    
    <item>
      <title>tcpdump: Learning how to read UDP packets</title>
      <link>https://www.markhneedham.com/blog/2012/07/15/tcpdump-learning-how-to-read-udp-packets/</link>
      <pubDate>Sun, 15 Jul 2012 13:29:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/15/tcpdump-learning-how-to-read-udp-packets/</guid>
      <description>Phil and I spent some of Friday afternoon configuring statsd:
A network daemon that runs on the Node.js platform and listens for statistics, like counters and timers, sent over UDP and sends aggregates to one or more pluggable backend services
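Sending one of those counter metrics by hand is just a one-line UDP datagram; here is a minimal Ruby sketch, where the metric name and value are illustrative and a local listener stands in for statsd:

```ruby
require "socket"

# Stand-in for statsd: a UDP listener on an ephemeral localhost port
listener = UDPSocket.new
listener.bind("127.0.0.1", 0)
port = listener.addr[1]

# Send a counter increment in statsd's "name:value|c" wire format
sender = UDPSocket.new
sender.send("blah:36|c", 0, "127.0.0.1", port)

message, _addr = listener.recvfrom(1024)
puts message # the raw datagram, e.g. blah:36|c
```

Because UDP is fire-and-forget, the sender gets no error if nothing is listening, which is exactly why we wanted a way to verify the packets were arriving.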
We configured it to listen on its default port 8125 and then used netcat to send UDP packets to see if it was working like so:
echo -n &amp;#34;blah:36|c&amp;#34; | nc -w 1 -u -4 localhost 8125 We used tcpdump to capture any UDP packets on port 8125 like so:</description>
    </item>
    
    <item>
      <title>netcat: localhost resolution not working when sending UDP packets</title>
      <link>https://www.markhneedham.com/blog/2012/07/15/netcat-localhost-resolution-not-working-when-sending-udp-packets/</link>
      <pubDate>Sun, 15 Jul 2012 08:14:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/15/netcat-localhost-resolution-not-working-when-sending-udp-packets/</guid>
      <description>As part of some work we were doing last week Phil and I needed to send UDP packets to a local port and check that they were being picked up.
We initially tried sending a UDP packet to localhost port 8125 using netcat like so:
echo -n &amp;#34;hello&amp;#34; | nc -w 1 -u localhost 8125 That message wasn’t being received by the application listening on the port so Phil decided to try and send the same packet from Ruby which worked fine:</description>
    </item>
    
    <item>
      <title>Racket: Wiring it up to a REPL ala SLIME/Swank</title>
      <link>https://www.markhneedham.com/blog/2012/07/11/racket-wiring-it-up-to-a-repl-ala-slimeswank/</link>
      <pubDate>Wed, 11 Jul 2012 19:34:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/11/racket-wiring-it-up-to-a-repl-ala-slimeswank/</guid>
<description>One of the awesome things about working with Clojure is that it’s possible to wire up Clojure files in Emacs to a REPL by making use of SLIME/Swank.
I’ve started using Racket to work through the examples in The Little Schemer and wanted to achieve a similar thing there.
Racket is a modern programming language in the Lisp/Scheme family, suitable for a wide range of applications
I don’t know much about configuring emacs so I made use of Phil Hagelberg’s emacs-starter-kit which is available on github.</description>
    </item>
    
    <item>
      <title>Data visualisation: Is &#39;interesting&#39; enough?</title>
      <link>https://www.markhneedham.com/blog/2012/07/08/data-visualisation-is-interesting-enough/</link>
      <pubDate>Sun, 08 Jul 2012 22:45:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/08/data-visualisation-is-interesting-enough/</guid>
      <description>I recently read a blog post by Julian Boot titled &amp;#39;visualisation without analysis is fine&amp;#39; where he suggests that we can learn things from visualising data in the right way - detailed statistical analysis isn’t always necessary.
I thought this was quite an interesting observation because over the past couple of months I’ve been playing around with ThoughtWorks data and looking at different ways to visualise aspects of the data.</description>
    </item>
    
    <item>
      <title>ganglia: Importing gmond Python modules</title>
      <link>https://www.markhneedham.com/blog/2012/07/08/ganglia-importing-gmond-python-modules/</link>
      <pubDate>Sun, 08 Jul 2012 21:55:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/08/ganglia-importing-gmond-python-modules/</guid>
      <description>My colleague Shohdan and I spent a couple of days last week wiring up various monitoring metrics into ganglia and while most of them come built in, we also found some python based modules that we wanted to use.
Unfortunately we couldn’t find any instructions on github explaining how to set them up but after a bit of trial and error we figured it out.
One of the modules that we wanted to use was diskstat which provides I/O wait time metrics which we couldn’t find in the built in modules.</description>
    </item>
    
    <item>
      <title>Bash Shell: Reusing parts of previous commands</title>
      <link>https://www.markhneedham.com/blog/2012/07/05/bash-shell-reusing-parts-of-previous-commands/</link>
      <pubDate>Thu, 05 Jul 2012 23:42:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/05/bash-shell-reusing-parts-of-previous-commands/</guid>
      <description>I’ve paired a few times with my colleague Phil Potter over the last couple of weeks and since he’s a bit of a ninja with bash shortcuts/commands I wanted to record some of the things he’s shown me so I won’t forget them!
Let’s say we’re in the &amp;#39;/tmp&amp;#39; directory and want to create a folder a few levels down but forget to pass the &amp;#39;-p&amp;#39; option to &amp;#39;mkdir&amp;#39;:</description>
    </item>
    
    <item>
      <title>sudo, sudo -i &amp; sudo su</title>
      <link>https://www.markhneedham.com/blog/2012/07/04/sudo-sudo-i-sudo-su/</link>
      <pubDate>Wed, 04 Jul 2012 19:34:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/04/sudo-sudo-i-sudo-su/</guid>
      <description>On the project I’m currently working on we’re doing quite a bit of puppet and although we’re using the puppet master approach in production &amp;amp; test environments it’s still useful to be able to run puppet headless to test changes locally.
Since several of the commands require having write access to &amp;#39;root&amp;#39; folders we need to run &amp;#39;puppet apply&amp;#39; as a super user using sudo. We also need to run it in the context of some environment variables which the root user has.</description>
    </item>
    
    <item>
      <title>Debugging: Google vs The Manual</title>
      <link>https://www.markhneedham.com/blog/2012/07/04/debugging-google-vs-the-manual/</link>
      <pubDate>Wed, 04 Jul 2012 00:00:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/07/04/debugging-google-vs-the-manual/</guid>
      <description>Over the last six months or so I’ve worked with a bunch of different people and one of the things that I’ve noticed is that when something isn’t working there tend to be two quite distinct ways that people go about trying to solve the problem.
The Manual The RTFM crowd will go straight for the official documentation or source code if needs be in an attempt to work through the problem from first principles.</description>
    </item>
    
    <item>
      <title>Powerpoint saving movies as images</title>
      <link>https://www.markhneedham.com/blog/2012/06/30/powerpoint-saving-movies-as-images/</link>
      <pubDate>Sat, 30 Jun 2012 10:05:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/30/powerpoint-saving-movies-as-images/</guid>
      <description>I’ve been working on a presentation for the ThoughtWorks Europe away day over the last few days and I created some screen casts using Camtasia which I wanted to include.
It’s reasonably easy to insert movies into Powerpoint but I was finding that when I saved the file and then reloaded it the movies had been converted into images which wasn’t what I wanted at all!
Eventually I came across a blog post which explained that I’d been saving the file as the wrong format.</description>
    </item>
    
    <item>
      <title>neo4j: Handling optional relationships</title>
      <link>https://www.markhneedham.com/blog/2012/06/24/neo4j-handling-optional-relationships/</link>
      <pubDate>Sun, 24 Jun 2012 23:32:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/24/neo4j-handling-optional-relationships/</guid>
<description>On my ThoughtWorks neo4j graph there are now two different types of relationships between people nodes - they can either be colleagues or one can be the sponsor of the other.
The graph looks like this:
I wanted to get a list of all the sponsor pairs but also have some indicator of whether the two people have worked together.
I started off by getting all of the sponsor pairs:</description>
    </item>
    
    <item>
      <title>Why you shouldn&#39;t use name as a key a.k.a. I am an idiot</title>
      <link>https://www.markhneedham.com/blog/2012/06/24/why-you-shouldnt-use-name-as-a-key-a-k-a-i-am-an-idiot/</link>
      <pubDate>Sun, 24 Jun 2012 22:55:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/24/why-you-shouldnt-use-name-as-a-key-a-k-a-i-am-an-idiot/</guid>
<description>I think one of the first things that I learnt about dealing with users in a data store is that you should never use name as a primary key because there might be two people with the same name.
Despite knowing that I foolishly chose to ignore this knowledge when building my neo4j graph and used name as the key for the Lucene index.
I thought I’d got away with it but NO!</description>
    </item>
    
    <item>
      <title>Brightbox Repository: GPG error: The following signatures couldn&#39;t be verified because the public key is not available</title>
      <link>https://www.markhneedham.com/blog/2012/06/24/brightbox-repository-gpg-error-the-following-signatures-couldnt-be-verified-because-the-public-key-is-not-available/</link>
      <pubDate>Sun, 24 Jun 2012 00:58:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/24/brightbox-repository-gpg-error-the-following-signatures-couldnt-be-verified-because-the-public-key-is-not-available/</guid>
      <description>We’re using the Brightbox Ruby repository to get the versions of Ruby which we install on our machines and although we eventually put the configuration for this repository into Puppet we initially tested it out on a local VM.
To start with you need to add the repository to /etc/apt/sources.list:
deb http://ppa.launchpad.net/brightbox/ruby-ng/ubuntu lucid main To get that picked up we run the following:
apt-get update Which initially threw this error because it’s a gpg signed repository and we hadn’t added the key:</description>
    </item>
    
    <item>
      <title>Creating a Samba share between Ubuntu and Mac OS X</title>
      <link>https://www.markhneedham.com/blog/2012/06/24/creating-a-samba-share-between-ubuntu-and-mac-os-x/</link>
      <pubDate>Sun, 24 Jun 2012 00:40:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/24/creating-a-samba-share-between-ubuntu-and-mac-os-x/</guid>
<description>On the project I’m currently working on we have our development environment set up on a bare bones Ubuntu instance which we run via VMware.
We wanted to be able to edit files on the VM from the host O/S so my colleague Phil suggested that we set up a Samba server on the VM and then connect to it from the Mac.
We first needed to install a couple of packages on the VM:</description>
    </item>
    
    <item>
      <title>Visualising a neo4j graph using gephi</title>
      <link>https://www.markhneedham.com/blog/2012/06/21/visualising-a-neo4j-graph-using-gephi/</link>
      <pubDate>Thu, 21 Jun 2012 05:02:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/21/visualising-a-neo4j-graph-using-gephi/</guid>
      <description>At ThoughtWorks we don’t have line managers but people can choose to have a sponsor - typically someone who has worked in the company for longer/has more experience in the industry than them - who can help them navigate the organisation better.
From hearing people talk about sponsors over the last 6 years it seemed like a few people sponsored the majority of others and there were probably some people who didn’t have a sponsor.</description>
    </item>
    
    <item>
      <title>Haskell: Mixed type lists</title>
      <link>https://www.markhneedham.com/blog/2012/06/19/haskell-mixed-type-lists/</link>
      <pubDate>Tue, 19 Jun 2012 23:09:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/19/haskell-mixed-type-lists/</guid>
      <description>I’ve been continuing to work through the exercises in The Little Schemer and came across a problem which needed me to write a function to take a mixed list of Integers and Strings and filter out the Integers.
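For contrast, in a dynamically typed language the mixed list is unremarkable; a Ruby sketch of the same filtering step, with made-up data rather than the book’s:

```ruby
# A heterogeneous list is no trouble in Ruby; grep with a class
# filters by type using case equality (Class#===)
mixed = [1, "two", 3, "four", 5]

integers     = mixed.grep(Integer)               # keep the Integers
non_integers = mixed.reject { |x| x.is_a?(Integer) } # or drop them

puts integers.inspect     # [1, 3, 5]
puts non_integers.inspect # ["two", "four"]
```

The interesting part of the Haskell version is precisely that this one-liner is not available when the list must be homogeneous.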
As I mentioned in my previous post I’ve been doing the exercises in Haskell but I thought I might struggle with that approach here because Haskell collections are homogeneous i.e. all the elements need to be of the same type.</description>
    </item>
    
    <item>
      <title>The Little Schemer: Attempt #2</title>
      <link>https://www.markhneedham.com/blog/2012/06/19/the-little-schemer-attempt-2/</link>
      <pubDate>Tue, 19 Jun 2012 00:21:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/19/the-little-schemer-attempt-2/</guid>
      <description>A few weeks ago I asked the twittersphere for some advice on how I could get better at writing recursive functions and one of the pieces of advice was to work through The Little Schemer.
I first heard about The Little Schemer a couple of years ago and after going through the first few pages I got bored and gave up.
I still found the first few pages a bit trivial this time around as well but my colleague Jen Smith encouraged me to keep going and once I’d got about 20 pages in it became clearer to me why the first few pages had been written the way they had.</description>
    </item>
    
    <item>
      <title>neo4j/Cypher: Finding the most connected node on the graph</title>
      <link>https://www.markhneedham.com/blog/2012/06/16/neo4jcypher-finding-the-most-connected-node-on-the-graph/</link>
      <pubDate>Sat, 16 Jun 2012 10:41:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/16/neo4jcypher-finding-the-most-connected-node-on-the-graph/</guid>
      <description>As I mentioned in another post about a month ago I’ve been playing around with a neo4j graph in which I have the following relationship between nodes:
One thing I wanted to do was work out which node is the most connected on the graph, which would tell me who’s worked with the most people.
I started off with the following cypher query:
query = &amp;#34; START n = node(*)&amp;#34; query &amp;lt;&amp;lt; &amp;#34; MATCH n-[r:colleagues]-&amp;gt;c&amp;#34; query &amp;lt;&amp;lt; &amp;#34; WHERE n.</description>
    </item>
    
    <item>
      <title>Functional Thinking: Separating concerns</title>
      <link>https://www.markhneedham.com/blog/2012/06/12/functional-thinking-separating-concerns/</link>
      <pubDate>Tue, 12 Jun 2012 23:50:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/12/functional-thinking-separating-concerns/</guid>
      <description>Over the weekend I was trying to port some of the neo4j import code for the ThoughtWorks graph I’ve been working on to make use of the REST Batch API and I came across an interesting example of imperative vs functional thinking.
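The flavour of that shift, independent of neography, is the usual move from an explicit loop mutating an accumulator to describing the transformation with map; a generic Ruby sketch with made-up data:

```ruby
people = ["Mark", "Phil", "Jen"]

# Imperative style: mutate an accumulator inside a loop
nodes = []
people.each do |name|
  nodes.push({ name: name })
end

# Functional style: describe the transformation and let map build the list
nodes_fn = people.map { |name| { name: name } }

puts(nodes == nodes_fn) # true
```

The functional version also composes more naturally with a batch API, since the whole collection of operations exists as a value before anything is sent.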
I’m using the neography gem to populate the graph and to start with I was just creating a person node and then creating an index entry for it:</description>
    </item>
    
    <item>
      <title>CSV parsing/UTF-8 encoding</title>
      <link>https://www.markhneedham.com/blog/2012/06/10/csv-parsingutf-8-encoding/</link>
      <pubDate>Sun, 10 Jun 2012 23:30:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/10/csv-parsingutf-8-encoding/</guid>
      <description>I was recently trying to parse a CSV file which I’d converted from an Excel spreadsheet but was having problems with characters beyond the standard character set.
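The usual shape of the fix, sketched here on a standalone string rather than the post’s spreadsheet, is to label the bytes with the encoding they actually are before transcoding to UTF-8 (assuming the export was Latin-1):

```ruby
# "\xF6" is the byte for ö in ISO-8859-1; read naively it is invalid UTF-8
raw = "D\xF6rnenburg".b

# Tell Ruby what the bytes really are, then transcode to UTF-8
fixed = raw.force_encoding("ISO-8859-1").encode("UTF-8")
puts fixed # Dörnenburg
```

force_encoding only relabels the bytes, while encode actually rewrites them, which is why both steps are needed.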
This is an example of what was going wrong:
&amp;gt; require &amp;#39;csv&amp;#39; &amp;gt; people = CSV.open(&amp;#34;sponsors.csv&amp;#34;, &amp;#39;r&amp;#39;, ?,, ?\r).to_a [&amp;#34;Erik D\366rnenburg&amp;#34;, &amp;#34;N/A&amp;#34;] &amp;gt; people.each { |sponsee, sponsor| puts &amp;#34;#{sponsee} #{sponsor}&amp;#34; } Erik D?rnenburg N/A I came across a Ruby gem called chardet which allowed me to work out the character set of Erik’s name like so:</description>
    </item>
    
    <item>
      <title>Haskell: Writing a function that can take Ints or Doubles</title>
      <link>https://www.markhneedham.com/blog/2012/06/05/haskell-writing-a-function-that-can-take-ints-or-doubles/</link>
      <pubDate>Tue, 05 Jun 2012 00:10:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/05/haskell-writing-a-function-that-can-take-ints-or-doubles/</guid>
      <description>In my continued reading of SICP I wanted to recreate a &amp;#39;sum&amp;#39; function used to demonstrate a function which could take another function as one of its parameters.
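The shape of that higher-order sum (a term function, a lower bound, a successor function, an upper bound) can also be sketched as a direct recursive translation into Ruby:

```ruby
# sum applies term to each value from a up to b, stepping with nxt
def sum(term, a, nxt, b)
  return 0 if a > b
  term.call(a) + sum(term, nxt.call(a), nxt, b)
end

identity = lambda { |x| x }
succ     = lambda { |x| x + 1 }
cube     = lambda { |x| x * x * x }

puts sum(identity, 1, succ, 10) # 55
puts sum(cube, 1, succ, 3)      # 36
```

Passing the successor function in as a parameter is what lets the same sum walk evens, odds or any other sequence.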
In Scheme the function is defined like this:
(define (sum term a next b) (if (&amp;gt; a b) 0 (+ (term a) (sum term (next a) next b)))) And can be used like this to sum the values between two numbers:</description>
    </item>
    
    <item>
      <title>Haskell: Building a range of numbers from command line arguments</title>
      <link>https://www.markhneedham.com/blog/2012/06/03/haskell-building-a-range-of-numbers-from-command-line-arguments/</link>
      <pubDate>Sun, 03 Jun 2012 20:13:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/06/03/haskell-building-a-range-of-numbers-from-command-line-arguments/</guid>
      <description>I’m working through some of the SICP problems in Haskell and for problem 1.22 you need to write a function which will indicate the first 3 prime numbers above a starting value.
It is also suggested to only consider odd numbers so to find the prime numbers above 1000 the function call would look like this:
&amp;gt; searchForPrimes [1001,1003..] [1009,1013,1019] I wanted to be able to feed in the range of numbers from the command line so that I’d be able to call the function with different values and see how long it took to work it out.</description>
    </item>
    
    <item>
      <title>Google Maps without any labels/country names</title>
      <link>https://www.markhneedham.com/blog/2012/05/31/google-maps-without-any-labelscountry-names/</link>
      <pubDate>Thu, 31 May 2012 21:52:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/31/google-maps-without-any-labelscountry-names/</guid>
      <description>I wanted to get a blank version of Google Maps without any of the country names on for a visualisation I’m working on but I’d been led to believe that this wasn’t actually possible.
In actual fact we do have control over whether the labels are shown via the &amp;#39;styles&amp;#39; option which we can call on the map.
In my case the code looks like this:
var map = new google.</description>
    </item>
    
    <item>
      <title>Haskell: Using type classes to generify Project Euler #31</title>
      <link>https://www.markhneedham.com/blog/2012/05/30/haskell-using-type-classes-to-generify-project-euler-31/</link>
      <pubDate>Wed, 30 May 2012 12:08:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/30/haskell-using-type-classes-to-generify-project-euler-31/</guid>
      <description>As I mentioned in my previous post I’ve been working on Project Euler #31 and initially wasn’t sure how to write the algorithm.
I came across a post on StackOverflow which explained it in more detail but unfortunately the example used US coins rather than UK ones like in the Project Euler problem.
To start with I created two versions of the function - one for US coins and one for UK coins:</description>
    </item>
    
    <item>
      <title>Haskell: Java Style Enums</title>
      <link>https://www.markhneedham.com/blog/2012/05/30/haskell-java-style-enums/</link>
      <pubDate>Wed, 30 May 2012 11:10:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/30/haskell-java-style-enums/</guid>
      <description>I’ve been playing around with problem 31 of Project Euler which is defined as follows:
In England the currency is made up of pound, £, and pence, p, and there are eight coins in general circulation: 1p, 2p, 5p, 10p, 20p, 50p, £1 (100p) and £2 (200p). It is possible to make £2 in the following way: 1×£1 + 1×50p + 2×20p + 1×5p + 1×2p + 3×1p How many different ways can £2 be made using any number of coins?</description>
    </item>
    
    <item>
      <title>Haskell: Finding the minimum &amp; maximum values of a Foldable in one pass</title>
      <link>https://www.markhneedham.com/blog/2012/05/28/haskell-finding-the-minimum-maximum-values-of-a-foldable-in-one-pass/</link>
      <pubDate>Mon, 28 May 2012 11:18:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/28/haskell-finding-the-minimum-maximum-values-of-a-foldable-in-one-pass/</guid>
      <description>I recently came across Dan Piponi’s blog post &amp;#39;Haskell Monoids &amp;amp; their Uses&amp;#39; and towards the end of the post he suggests creating monoids to work out the maximum and minimum values of a Foldable value in one pass.
The Foldable type class provides a generic approach to walking through a data structure, accumulating values as we go. The foldMap function applies a function to each element of our structure and then accumulates the return values of each of these applications.</description>
    </item>
    
    <item>
      <title>Haskell: Debugging code</title>
      <link>https://www.markhneedham.com/blog/2012/05/27/haskell-debugging-code/</link>
      <pubDate>Sun, 27 May 2012 22:16:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/27/haskell-debugging-code/</guid>
      <description>In my continued attempts to learn QuickCheck, one thing I’ve been doing is comparing the results of my brute force and divide &amp;amp; conquer versions of the closest pairs algorithm.
I started with this property:
let prop_dc_bf xs = (length xs &amp;gt; 2) ==&amp;gt; (fromJust $ bfClosest xs) == dcClosest xs
And then ran it from GHCI, which resulted in the following error:
&amp;gt; quickCheck (prop_dc_bf :: [(Double, Double)] -&amp;gt; Property)
*** Failed!</description>
    </item>
    
    <item>
      <title>Haskell: Using monoids when sorting by multiple parameters</title>
      <link>https://www.markhneedham.com/blog/2012/05/23/haskell-using-monoids-when-sorting-by-multiple-parameters/</link>
      <pubDate>Wed, 23 May 2012 06:44:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/23/haskell-using-monoids-when-sorting-by-multiple-parameters/</guid>
      <description>On the project I’ve been working on we had a requirement to sort a collection of rows by 4 different criteria, such that if two items matched on the first criterion we should consider the second criterion, and so on.
If we wrote that code in Haskell it would read a bit like this:
data Row = Row { shortListed :: Bool, cost :: Float, distance1 :: Int, distance2 :: Int } deriving (Show, Eq)
import Data.</description>
    </item>
    
    <item>
      <title>Scala/Haskell: A simple example of type classes</title>
      <link>https://www.markhneedham.com/blog/2012/05/22/scalahaskell-a-simple-example-of-type-classes/</link>
      <pubDate>Tue, 22 May 2012 10:26:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/22/scalahaskell-a-simple-example-of-type-classes/</guid>
      <description>I never really understood type classes when I was working with Scala but I recently came across a video where Dan Rosen explains them pretty well.
Since the last time I worked in Scala I’ve been playing around with Haskell where type classes are much more common - for example if we want to compare two values we need to make sure that their type extends the &amp;#39;Eq&amp;#39; type class.</description>
    </item>
    
    <item>
      <title>Haskell: My first attempt with QuickCheck and HUnit</title>
      <link>https://www.markhneedham.com/blog/2012/05/20/haskell-my-first-attempt-with-quickcheck-and-hunit/</link>
      <pubDate>Sun, 20 May 2012 19:09:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/20/haskell-my-first-attempt-with-quickcheck-and-hunit/</guid>
      <description>As I mentioned in a blog post a few days ago, I’ve started learning QuickCheck with the test-framework package as suggested by David Turner.
I first needed to install test-framework and some dependencies using cabal (http://www.haskell.org/cabal/):
&amp;gt; cabal install test-framework
&amp;gt; cabal install test-framework-quickcheck
&amp;gt; cabal install test-framework-hunit
I thought it’d be interesting to try and write some tests around the windowed function that I wrote a few months ago:
Windowed.hs</description>
    </item>
    
    <item>
      <title>Building an API: Test Harness UI</title>
      <link>https://www.markhneedham.com/blog/2012/05/19/building-an-api-test-harness-ui/</link>
      <pubDate>Sat, 19 May 2012 20:03:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/19/building-an-api-test-harness-ui/</guid>
      <description>On the project I’ve been working on we’re building an API to be used by other applications in the organisation but when we started none of those applications were ready to integrate with us and therefore drive the API design.
Initially we tried driving the API through integration style tests but we realised that taking this approach made it quite difficult for us to imagine how an application would use it.</description>
    </item>
    
    <item>
      <title>Haskell: Writing a custom equality operator</title>
      <link>https://www.markhneedham.com/blog/2012/05/16/haskell-writing-a-custom-equality-operator/</link>
      <pubDate>Wed, 16 May 2012 13:16:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/16/haskell-writing-a-custom-equality-operator/</guid>
      <description>In the comments on my post about generating random numbers to test a function, David Turner suggested that this was exactly the use case for which QuickCheck was intended, so I’ve been learning a bit more about it this week.
I started with a simple property to check that the brute force (bf) and divide and conquer (dc) versions of the algorithm returned the same result, assuming that there were enough values in the list to have a closest pair:</description>
    </item>
    
    <item>
      <title>Haskell: Removing if statements</title>
      <link>https://www.markhneedham.com/blog/2012/05/12/haskell-removing-if-statements/</link>
      <pubDate>Sat, 12 May 2012 15:46:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/12/haskell-removing-if-statements/</guid>
      <description>When I was looking over my solution to the closest pairs algorithm which I wrote last week, I realised that there were quite a few if statements, something I haven’t seen in other Haskell code I’ve read.
This is the initial version that I wrote:
dcClosest :: (Ord a, Floating a) =&amp;gt; [Point a] -&amp;gt; (Point a, Point a)
dcClosest pairs =
  if length pairs &amp;lt;= 3
  then fromJust $ bfClosest pairs
  else foldl (\closest (p1:p2:_) -&amp;gt; if distance (p1, p2) &amp;lt; distance closest then (p1, p2) else closest)
             closestPair
             (windowed 2 pairsWithinMinimumDelta)
  where sortedByX = sortBy compare pairs
        (leftByX:rightByX:_) = chunk (length sortedByX `div` 2) sortedByX
        closestPair = if distance closestLeftPair &amp;lt; distance closestRightPair then closestLeftPair else closestRightPair
          where closestLeftPair  = dcClosest leftByX
                closestRightPair = dcClosest rightByX
        pairsWithinMinimumDelta = sortBy (compare `on` snd) $ filter withinMinimumDelta sortedByX
          where withinMinimumDelta (x, _) = abs (xMidPoint - x) &amp;lt;= distance closestPair
                  where (xMidPoint, _) = last leftByX
We can remove the first if statement which checks the length of the list and replace it with pattern matching code like so:</description>
    </item>
    
    <item>
      <title>neo4j/Cypher: Finding the shortest path between two nodes while applying predicates</title>
      <link>https://www.markhneedham.com/blog/2012/05/12/neo4jcypher-finding-the-shortest-path-between-two-nodes-while-applying-predicates/</link>
      <pubDate>Sat, 12 May 2012 14:55:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/12/neo4jcypher-finding-the-shortest-path-between-two-nodes-while-applying-predicates/</guid>
      <description>As I mentioned in a blog post about a week ago I decided to restructure the ThoughtWorks graph I’ve modelled in neo4j so that I could explicitly model projects and clients.
As a result I had to update a traversal I’d written for finding the shortest path between two people in the graph.
The original traversal query I had was really simple because I had a direct connection between the people nodes:</description>
    </item>
    
    <item>
      <title>Haskell: Explicit type declarations in GHCI</title>
      <link>https://www.markhneedham.com/blog/2012/05/10/haskell-explicit-type-declarations-in-ghci/</link>
      <pubDate>Thu, 10 May 2012 07:11:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/10/haskell-explicit-type-declarations-in-ghci/</guid>
      <description>On a few occasions I’ve wanted to be able to explicitly define the type of something when trying things out in the Haskell REPL (GHCI) but I didn’t actually realise this was possible until a couple of days ago.
For example, say we want to use the read function (http://zvon.org/other/haskell/Outputprelude/read_f.html) to parse an input string into an integer.
We could do this:
&amp;gt; read &amp;#34;1&amp;#34; :: Int
1
But if we just evaluate the function alone and try to assign the result without casting to a type we get an exception:</description>
    </item>
    
    <item>
      <title>Haskell: Closest Pairs Algorithm</title>
      <link>https://www.markhneedham.com/blog/2012/05/09/haskell-closest-pairs-algorithm/</link>
      <pubDate>Wed, 09 May 2012 00:05:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/09/haskell-closest-pairs-algorithm/</guid>
      <description>As I mentioned in a post a couple of days ago I’ve been writing the closest pairs algorithm in Haskell and while the brute force version works for small numbers of pairs it starts to fall apart as the number of pairs increases:
time ./closest_pairs 100 bf
./closest_pairs 100 bf  0.01s user 0.00s system 87% cpu 0.016 total
time ./closest_pairs 1000 bf
./closest_pairs 1000 bf  3.59s user 0.01s system 99% cpu 3.</description>
    </item>
    
    <item>
      <title>Haskell: Generating random numbers</title>
      <link>https://www.markhneedham.com/blog/2012/05/08/haskell-generating-random-numbers/</link>
      <pubDate>Tue, 08 May 2012 22:09:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/08/haskell-generating-random-numbers/</guid>
      <description>As I mentioned in my last post I’ve been coding the closest pairs algorithm in Haskell and needed to create some pairs of coordinates to test it against.
I’ve tried to work out how to create lists of random numbers in Haskell before and always ended up giving up because it seemed way more difficult than it should be but this time I came across a really good explanation of how to do it by jrockway on Stack Overflow.</description>
    </item>
    
    <item>
      <title>Haskell: Maximum Int value</title>
      <link>https://www.markhneedham.com/blog/2012/05/07/haskell-maximum-int-value/</link>
      <pubDate>Mon, 07 May 2012 09:18:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/07/haskell-maximum-int-value/</guid>
      <description>One of the algorithms covered in Algo Class was the closest pairs algorithm - an algorithm used to determine which pair of points on a plane are closest to each other based on their Euclidean distance.
My real interest lies in writing the divide and conquer version of the algorithm but I started with the brute force version so that I’d be able to compare my answers.
This is the algorithm:</description>
    </item>
    
    <item>
      <title>neo4j: What question do you want to answer?</title>
      <link>https://www.markhneedham.com/blog/2012/05/05/neo4j-what-question-do-you-want-to-answer/</link>
      <pubDate>Sat, 05 May 2012 13:20:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/05/05/neo4j-what-question-do-you-want-to-answer/</guid>
      <description>Over the past few weeks I’ve been modelling ThoughtWorks project data in neo4j and I realised that the way that I’ve been doing this is by considering what question I want to answer and then building a graph to answer it.
When I first started doing this the main question I wanted to answer was &amp;#39;how connected are people to each other&amp;#39; which led to me modelling the data like this:</description>
    </item>
    
    <item>
      <title>gephi: Centring a graph around an individual node</title>
      <link>https://www.markhneedham.com/blog/2012/04/30/gephi-centring-a-graph-around-an-individual-node/</link>
      <pubDate>Mon, 30 Apr 2012 22:20:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/30/gephi-centring-a-graph-around-an-individual-node/</guid>
      <description>I spent some time recently playing around with gephi - an open source platform for creating visualisations of graphs - to get a bit more insight into the ThoughtWorks graph which I’ve created in neo4j.
I followed Max De Marzi’s blog post to create a GEXF (Graph Exchange XML Format) file to use in gephi, although I later learned that you can import directly from neo4j into gephi which I haven’t tried yet.</description>
    </item>
    
    <item>
      <title>Performance: Caching per request</title>
      <link>https://www.markhneedham.com/blog/2012/04/30/performance-caching-per-request/</link>
      <pubDate>Mon, 30 Apr 2012 21:45:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/30/performance-caching-per-request/</guid>
      <description>A couple of years ago I wrote a post describing an approach my then colleague Christian Blunden used to help improve the performance of an application where you try to do expensive things less or find another way to do them.
On the application I’m currently working on we load reference data from an Oracle database into memory based on configurations provided by the user.
There are multiple configurations and then multiple ways that those configurations can be priced so we have two nested for loops in which we load data and then perform calculations on it.</description>
    </item>
    
    <item>
      <title>Haskell: Colour highlighting when writing to the shell</title>
      <link>https://www.markhneedham.com/blog/2012/04/29/haskell-colour-highlighting-when-writing-to-the-shell/</link>
      <pubDate>Sun, 29 Apr 2012 00:01:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/29/haskell-colour-highlighting-when-writing-to-the-shell/</guid>
      <description>I spent a few hours writing a simple front end on top of the Rabin Karp algorithm so that I could show the line of the first occurrence of a pattern in a piece of text on the shell.
I thought it would be quite cool if I could highlight the appropriate text on the line like how grep does when the &amp;#39;--color=auto&amp;#39; flag is supplied.
We can make use of ANSI escape codes to do this.</description>
    </item>
    
    <item>
      <title>Haskell: Int and Integer</title>
      <link>https://www.markhneedham.com/blog/2012/04/28/haskell-int-and-integer/</link>
      <pubDate>Sat, 28 Apr 2012 17:39:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/28/haskell-int-and-integer/</guid>
      <description>In my last post about the Rabin Karp algorithm I mentioned that I was having some problems when trying to write a hash function which closely matched its English description.
(r^(m-1) * ascii char) + (r^(m-2) * ascii char) + … + (r^0 * ascii char) % q where r = 256, q = 1920475943
This is my current version of the hash function:
hash = hash&amp;#39; globalR globalQ
hash&amp;#39; r q string m = foldl (\acc x -&amp;gt; (r * acc + ord x) `mod` q) 0 $ take m string
And my initial attempt to write the alternate version was this:</description>
    </item>
    
    <item>
      <title>Algorithms: Rabin Karp in Haskell</title>
      <link>https://www.markhneedham.com/blog/2012/04/25/algorithms-rabin-karp-in-haskell/</link>
      <pubDate>Wed, 25 Apr 2012 21:28:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/25/algorithms-rabin-karp-in-haskell/</guid>
      <description>I recently came across a blog post describing the Rabin Karp algorithm - an algorithm that uses hashing to find a pattern string in some text - and thought it would be interesting to try and write a version of it in Haskell.
This algorithm is typically used when we want to search for multiple pattern strings in a text, e.g. when detecting plagiarism or as a primitive way of detecting code duplication, but my initial version only lets you search for one pattern.</description>
    </item>
    
    <item>
      <title>Algo Class: Start simple and build up</title>
      <link>https://www.markhneedham.com/blog/2012/04/24/algo-class-start-simple-and-build-up/</link>
      <pubDate>Tue, 24 Apr 2012 07:17:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/24/algo-class-start-simple-and-build-up/</guid>
      <description>Over the last six weeks I’ve been working through Stanford’s Design and Analysis of Algorithms I class and each week there’s been a programming assignment on a specific algorithm for which a huge data set is provided.
For the first couple of assignments I tried writing the code for the algorithm and then running it directly against the provided data set.
As you might imagine it never worked first time, and this approach led to me becoming very frustrated because there was no way of telling what went wrong.</description>
    </item>
    
    <item>
      <title>Coding: Is there a name for everything?</title>
      <link>https://www.markhneedham.com/blog/2012/04/23/coding-is-there-a-name-for-everything/</link>
      <pubDate>Mon, 23 Apr 2012 00:20:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/23/coding-is-there-a-name-for-everything/</guid>
      <description>A month ago I wrote a post describing an approach my team has been taking to avoid premature abstractions whereby we leave code inline until we know enough about the domain to pull out meaningful classes or methods.
Since I wrote that post we’ve come across a couple of examples where there doesn’t seem to be a name to describe a data structure.
We are building a pricing engine where the input is a set of configurations and the output is a set of pricing rows associated with each configuration.</description>
    </item>
    
    <item>
      <title>neo4j: Searching for nodes by name</title>
      <link>https://www.markhneedham.com/blog/2012/04/20/neo4j-searching-for-nodes-by-name/</link>
      <pubDate>Fri, 20 Apr 2012 07:10:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/20/neo4j-searching-for-nodes-by-name/</guid>
      <description>As I mentioned in a post a few days ago I’ve been graphing connections between ThoughtWorks people using neo4j and wanted to build auto complete functionality so I can search for the names of people in the graph.
The solution I came up with was to create a Lucene index with an entry for each node and a common property on each document in the index so that I’d be able to get all the index entries easily.</description>
    </item>
    
    <item>
      <title>Algorithms: Flood Fill in Haskell - Abstracting the common</title>
      <link>https://www.markhneedham.com/blog/2012/04/17/algorithms-flood-fill-in-haskell-abstracting-the-common/</link>
      <pubDate>Tue, 17 Apr 2012 07:22:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/17/algorithms-flood-fill-in-haskell-abstracting-the-common/</guid>
      <description>In the comments of my blog post describing the flood fill algorithm in Haskell David Turner pointed out that the way I was passing the grid around was quite error prone.
floodFill :: Array (Int, Int) Colour -&amp;gt; (Int, Int) -&amp;gt; Colour -&amp;gt; Colour -&amp;gt; Array (Int, Int) Colour
floodFill grid point@(x, y) target replacement =
  if (not $ inBounds grid point) || grid ! (x,y) /= target
  then grid
  else gridNorth
  where grid&amp;#39; = replace grid point replacement
        gridEast  = floodFill grid&amp;#39; (x+1, y) target replacement
        gridWest  = floodFill gridEast (x-1, y) target replacement
        gridSouth = floodFill gridWest (x, y+1) target replacement
        gridNorth = floodFill gridSouth (x, y-1) target replacement
I actually did pass the wrong grid variable around while I was writing it and ended up quite confused as to why it wasn’t working as I expected.</description>
    </item>
    
    <item>
      <title>neography/neo4j/Lucene: Getting a list of all the nodes indexed</title>
      <link>https://www.markhneedham.com/blog/2012/04/17/neographyneo4jlucene-getting-a-list-of-all-the-nodes-indexed/</link>
      <pubDate>Tue, 17 Apr 2012 06:54:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/17/neographyneo4jlucene-getting-a-list-of-all-the-nodes-indexed/</guid>
      <description>I’ve been playing around with neo4j using the neography gem to create a graph of all the people in ThoughtWorks and the connections between them based on working with each other.
I created a UI where you could type in the names of two people and see when they’ve worked together, or the shortest path between them if they haven’t.
I thought it would be cool to have auto complete functionality when typing in a name but I couldn’t figure out how to partially query the index of people’s names that I’d created.</description>
    </item>
    
    <item>
      <title>Haskell: A simple parsing example using pattern matching</title>
      <link>https://www.markhneedham.com/blog/2012/04/15/haskell-a-simple-parsing-example-using-pattern-matching/</link>
      <pubDate>Sun, 15 Apr 2012 14:22:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/15/haskell-a-simple-parsing-example-using-pattern-matching/</guid>
      <description>As part of the second question in the Google Code Jam I needed to be able to parse lines of data which looked like this:
3 1 5 15 13 11
where
The first integer will be N, the number of Googlers, and the second integer will be S, the number of surprising triplets of scores. The third integer will be p, as described above. Next will be N integers ti: the total points of the Googlers.</description>
    </item>
    
    <item>
      <title>Haskell: Reading in multiple lines of arguments</title>
      <link>https://www.markhneedham.com/blog/2012/04/15/haskell-reading-in-multiple-lines-of-arguments/</link>
      <pubDate>Sun, 15 Apr 2012 13:44:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/15/haskell-reading-in-multiple-lines-of-arguments/</guid>
      <description>I’ve mostly avoided doing any I/O in Haskell but as part of the Google Code Jam I needed to work out how to read a variable number of lines as specified by the user.
The input looks like this:
4
3 1 5 15 13 11
3 0 8 23 22 21
2 1 1 8 0
6 2 8 29 20 8 18 18 21
The first line indicates how many lines will follow.</description>
    </item>
    
    <item>
      <title>Ruby: neo4j gem - LoadError: no such file to load -- active_support/core_ext/class/inheritable_attributes</title>
      <link>https://www.markhneedham.com/blog/2012/04/14/ruby-neo4j-gem-loaderror-no-such-file-to-load-active_supportcore_extclassinheritable_attributes/</link>
      <pubDate>Sat, 14 Apr 2012 10:21:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/14/ruby-neo4j-gem-loaderror-no-such-file-to-load-active_supportcore_extclassinheritable_attributes/</guid>
      <description>I’ve been playing around with neo4j again over the past couple of days using the neo4j.rb gem to build up a graph.
I installed the gem but then ended up with the following error when I tried to &amp;#39;require neo4j&amp;#39; in &amp;#39;irb&amp;#39;:
LoadError: no such file to load -- active_support/core_ext/class/inheritable_attributes
  require at org/jruby/RubyKernel.java:1033
  require at /Users/mneedham/.rbenv/versions/jruby-1.6.7/lib/ruby/site_ruby/1.8/rubygems/custom_require.rb:36
  (root) at /Users/mneedham/.rbenv/versions/jruby-1.6.7/lib/ruby/gems/1.8/gems/neo4j-1.3.1-java/lib/neo4j.rb:9
  require at org/jruby/RubyKernel.java:1033
  require at /Users/mneedham/.rbenv/versions/jruby-1.6.7/lib/ruby/gems/1.8/gems/neo4j-1.3.1-java/lib/neo4j.rb:59
  (root) at src/main/ruby/neo_test.rb:2
It seems a few others have come across this problem as well and the problem seems to be that ActiveSupport 3.</description>
    </item>
    
    <item>
      <title>Just Observe</title>
      <link>https://www.markhneedham.com/blog/2012/04/09/just-observe/</link>
      <pubDate>Mon, 09 Apr 2012 22:45:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/09/just-observe/</guid>
      <description>One of the most common instincts of a developer when starting on a new team is to look at the way the application has been designed and find ways that it can be done differently.
Most often &amp;#39;differently&amp;#39; means that a pattern used in a previous project will be favoured and while I think it’s good to make use of experience that we’ve gained, we do miss out on some learning if we write every application the same way.</description>
    </item>
    
    <item>
      <title>Haskell: Processing program arguments</title>
      <link>https://www.markhneedham.com/blog/2012/04/08/haskell-processing-program-arguments/</link>
      <pubDate>Sun, 08 Apr 2012 20:11:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/08/haskell-processing-program-arguments/</guid>
      <description>My Prismatic news feed recently threw up an interesting tutorial titled &amp;#39;Haskell the Hard Way&amp;#39; which has an excellent and easy to understand section showing how to do IO in Haskell.
About half way down the page there’s an exercise to write a program which sums all its arguments which I thought I’d have a go at.
We need to use the getArgs function (http://zvon.org/other/haskell/Outputsystem/getArgs_f.html) to get the arguments passed to the program.</description>
    </item>
    
    <item>
      <title>Algorithms: Flood Fill in Haskell</title>
      <link>https://www.markhneedham.com/blog/2012/04/07/algorithms-flood-fill-in-haskell/</link>
      <pubDate>Sat, 07 Apr 2012 00:25:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/07/algorithms-flood-fill-in-haskell/</guid>
      <description>Flood fill is an algorithm used to work out which nodes are connected to a certain node in a multi dimensional array. In this case we’ll use a two dimensional array.
The idea is that we decide that we want to change the colour of one of the cells in the array and have its immediate neighbours who share its initial colour have their colour changed too i.e. the colour floods its way through the grid.</description>
    </item>
    
    <item>
      <title>Haskell: Print friendly representation of an Array</title>
      <link>https://www.markhneedham.com/blog/2012/04/03/haskell-print-friendly-representation-of-an-array/</link>
      <pubDate>Tue, 03 Apr 2012 21:52:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/04/03/haskell-print-friendly-representation-of-an-array/</guid>
      <description>Quite frequently I play around with 2D arrays in Haskell but I’ve never quite worked out how to print them in a way that makes it easy to see the contents.
I’m using the array from the &amp;#39;Data.Array&amp;#39; module because it seems to be easier to transform it into a new representation if I want to change a value in one of the cells.
The function to create one therefore looks like this:</description>
    </item>
    
    <item>
      <title>Haskell: Pattern matching data types with named fields</title>
      <link>https://www.markhneedham.com/blog/2012/03/31/haskell-pattern-matching-data-types-with-named-fields/</link>
      <pubDate>Sat, 31 Mar 2012 22:49:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/31/haskell-pattern-matching-data-types-with-named-fields/</guid>
      <description>One of my favourite things about coding in Haskell is that I often end up pattern matching against data types.
I’ve been playing around with modelling cars coming into and out from a car park and changing the state of the car park accordingly.
I started with these data type definitions:
data CarParkState = Available Bool Int Int | AlmostFull Bool Int Int | Full Bool Int deriving (Show)
data Action = Entering | Leaving deriving (Show)
data Sticker = Handicap | None deriving (Show)
which were used in the following function:</description>
    </item>
    
    <item>
      <title>Micro Services: A simple example</title>
      <link>https://www.markhneedham.com/blog/2012/03/31/micro-services-a-simple-example/</link>
      <pubDate>Sat, 31 Mar 2012 09:06:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/31/micro-services-a-simple-example/</guid>
      <description>In our code base we had the concept of a &amp;#39;ProductSpeed&amp;#39; with two different constructors which initialised the object in different ways:
public class ProductSpeed {
  public ProductSpeed(String name) { ... }
  public ProductSpeed(String name, int order) { ... }
}
In the cases where the first constructor was used the order of the product was irrelevant.
When the second constructor was used we did care about it because we wanted to be able to sort the products before showing them in a drop down list to the user.</description>
    </item>
    
    <item>
      <title>IntelliJ: Find/Replace using regular expressions with capture groups</title>
      <link>https://www.markhneedham.com/blog/2012/03/30/intellij-findreplace-using-regular-expressions-with-capture-groups/</link>
      <pubDate>Fri, 30 Mar 2012 06:21:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/30/intellij-findreplace-using-regular-expressions-with-capture-groups/</guid>
      <description>Every now and then we end up having to write a bunch of mapping code and I quite like using IntelliJ’s &amp;#39;Replace&amp;#39; option to do it, but I always end up spending about 5 minutes trying to remember how to do capture groups, so I thought I’d write it down this time.
Given the following text in our file:
val mark = 0
val dave = 0
val john = 0
val alex = 0
Let’s say we wanted to prefix each of those names with &amp;#39;cool&amp;#39; and had decided not to use Column mode for whatever reason.</description>
    </item>
    
    <item>
      <title>Readability/Performance</title>
      <link>https://www.markhneedham.com/blog/2012/03/29/readabilityperformance/</link>
      <pubDate>Thu, 29 Mar 2012 06:45:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/29/readabilityperformance/</guid>
      <description>I recently read the Graphite chapter of The Architecture of Open Source Applications book which mostly tells the story of how Chris Davis incrementally built out Graphite - a pretty cool tool that can be used to do real time graphing of metrics.
The whole chapter is a very good read but I found the design reflections especially interesting:
One of Graphite’s greatest strengths and greatest weaknesses is the fact that very little of it was actually &amp;#34;designed&amp;#34; in the traditional sense.</description>
    </item>
    
    <item>
      <title>Testing: Trying not to overdo it</title>
      <link>https://www.markhneedham.com/blog/2012/03/28/testing-trying-not-to-overdo-it/</link>
      <pubDate>Wed, 28 Mar 2012 00:10:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/28/testing-trying-not-to-overdo-it/</guid>
      <description>The design of the code which contains the main logic of the application that I’m currently working on looks a bit like the diagram on the right hand side:
We load a bunch of stuff from an Oracle database, construct some objects from the data and then invoke a sequence of methods on those objects in order to execute our domain logic.
Typically we might expect to see unit level tests against all the classes described in this diagram, but we’ve actually been trying out an approach where we don’t test the orchestration code directly but rather only test it via the resource which makes use of it.</description>
    </item>
    
    <item>
      <title>Haskell: Memoization using the power of laziness</title>
      <link>https://www.markhneedham.com/blog/2012/03/24/haskell-memoization-using-the-power-of-laziness/</link>
      <pubDate>Sat, 24 Mar 2012 12:28:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/24/haskell-memoization-using-the-power-of-laziness/</guid>
      <description>I’ve been trying to solve problem 15 of Project Euler which requires you to find the number of routes that can be taken to navigate from the top corner of a grid down to the bottom right corner.
For example there are six routes across a 2x2 grid:
My initial solution looked like this:
routes :: (Int, Int) -&amp;gt; Int -&amp;gt; Int
routes origin size = inner origin size
  where inner origin@(x, y) size
          | x == size &amp;amp;&amp;amp; y == size = 0
          | x == size || y == size = 1
          | otherwise = inner (x+1, y) size + inner (x, y+1) size
Which can be called like this:</description>
    </item>
    
    <item>
      <title>Saving the values of dynamically populated dropdown on back button</title>
      <link>https://www.markhneedham.com/blog/2012/03/24/saving-the-values-of-dynamically-populated-dropdown-on-back-button/</link>
      <pubDate>Sat, 24 Mar 2012 00:40:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/24/saving-the-values-of-dynamically-populated-dropdown-on-back-button/</guid>
      <description>We wanted to be able to retain the value of a drop down menu that was being dynamically populated (via an AJAX call) when the user hit the back button but the AJAX request re-runs when we go hit back therefore losing our selection.
Our initial thinking was that we might be able to store the value of the dropdown in a hidden field and then restore it into the dropdown using jQuery on page load but that approach didn’t work since hidden fields don’t seem to retain their values when you hit back.</description>
    </item>
    
    <item>
      <title>Oracle Spatial: Querying by a point/latitude/longitude</title>
      <link>https://www.markhneedham.com/blog/2012/03/23/oracle-spatial-querying-by-a-pointlatitudelongitude/</link>
      <pubDate>Fri, 23 Mar 2012 23:54:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/23/oracle-spatial-querying-by-a-pointlatitudelongitude/</guid>
      <description>We’re using Oracle Spatial on the application I’m working on and while most of the time any spatial queries we make are done from Java code we wanted to be able to run them directly from SQL as well to verify the code was working correctly.
We normally end up forgetting how to construct a query so I thought I’d document it.
Assuming we have a table table_with_shape which has a column shape which is a polygon, if we want to check whether a lat/long value interacts with that shape we can do that with the following query:</description>
    </item>
    
    <item>
      <title>Functional Programming: Handling the Options</title>
      <link>https://www.markhneedham.com/blog/2012/03/21/functional-programming-handling-the-options/</link>
      <pubDate>Wed, 21 Mar 2012 00:50:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/21/functional-programming-handling-the-options/</guid>
      <description>A couple of weeks ago Channing Walton tweeted the following:
Every time you call get on an Option a kitten dies.
As Channing points out in the comments he was referring to unguarded calls to &amp;#39;get&amp;#39; which would lead to an exception if the Option was empty, therefore pretty much defeating the point of using an Option in the first place!
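The guarded alternative is easy to sketch in Python (a hypothetical stand-in for illustration, not totallylazy’s API):

```python
class Option:
    # Minimal Option sketch: None represents the empty case.
    def __init__(self, value=None):
        self._value = value

    def get(self):
        # The unguarded call the tweet warns about: blows up when empty.
        if self._value is None:
            raise ValueError("get on empty Option")
        return self._value

    def get_or_else(self, default):
        # The safe alternative: no exception when the Option is empty.
        return self._value if self._value is not None else default

print(Option(42).get_or_else(0))  # 42
print(Option().get_or_else(0))    # 0
```

The point is that get_or_else (or map/fold style combinators) keeps the empty case in the type rather than deferring it to a runtime exception.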
We’re using Dan Bodart’s totallylazy library on the application I’m currently working on and in fact were calling &amp;#39;get&amp;#39; on an Option so I wanted to see if we could get rid of it.</description>
    </item>
    
    <item>
      <title>Haskell: Newbie currying mistake</title>
      <link>https://www.markhneedham.com/blog/2012/03/20/haskell-newbie-currying-mistake/</link>
      <pubDate>Tue, 20 Mar 2012 23:55:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/20/haskell-newbie-currying-mistake/</guid>
      <description>As I mentioned in my last post I’ve spent a bit of this evening writing a merge sort function and one of the mistakes I made a few times was incorrectly passing arguments to the recursive calls of &amp;#39;merge&amp;#39;.
For example, this is one of the earlier versions of the function:
middle :: [Int] -&amp;gt; Int
middle = floor . (\y -&amp;gt; y / 2) . fromIntegral . length

msort :: [Int] -&amp;gt; [Int]
msort unsorted =
  let n = middle unsorted
  in if n == 0
       then unsorted
       else let (left, right) = splitAt n unsorted
            in merge (msort left) (msort right)
  where merge [] right = right
        merge left [] = left
        merge left@(x:xs) right@(y:ys) =
          if x &amp;lt; y then x : merge(xs, right) else y : merge (left, ys)
Which doesn’t actually compile:</description>
    </item>
    
    <item>
      <title>Haskell: Chaining functions to find the middle value in a collection</title>
      <link>https://www.markhneedham.com/blog/2012/03/20/haskell-chaining-functions-to-find-the-middle-value-in-a-collection/</link>
      <pubDate>Tue, 20 Mar 2012 23:36:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/20/haskell-chaining-functions-to-find-the-middle-value-in-a-collection/</guid>
      <description>I’ve been playing around with writing merge sort in Haskell and eventually ended up with the following function:
msort :: [Int] -&amp;gt; [Int]
msort unsorted =
  let n = floor (fromIntegral(length unsorted) / 2)
  in if n == 0
       then unsorted
       else let (left, right) = splitAt n unsorted
            in merge (msort left) (msort right)
  where merge [] right = right
        merge left [] = left
        merge left@(x:xs) right@(y:ys) =
          if x &amp;lt; y then x : merge xs right else y : merge left ys
The 3rd line was annoying me as it has way too many brackets on it and I was fairly sure that it should be possible to just combine the functions like I learnt to do in F# a few years ago.</description>
    </item>
    
    <item>
      <title>Scala: Counting number of inversions (via merge sort) for an unsorted collection</title>
      <link>https://www.markhneedham.com/blog/2012/03/20/scala-counting-number-of-inversions-via-merge-sort-for-an-unsorted-collection/</link>
      <pubDate>Tue, 20 Mar 2012 06:53:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/20/scala-counting-number-of-inversions-via-merge-sort-for-an-unsorted-collection/</guid>
      <description>The first programming questions of algo-class requires you to calculate the number of inversions it would take using merge sort to sort a collection in ascending order.
I found quite a nice explanation here too:
Finding &amp;#34;similarity&amp;#34; between two rankings. Given a sequence of n numbers 1..n (assume all numbers are distinct). Define a measure that tells us how far this list is from being in ascending order. The value should be 0 if a_1 &amp;lt; a_2 &amp;lt; .</description>
    </item>
    
    <item>
      <title>Functional Programming: One function at a time</title>
      <link>https://www.markhneedham.com/blog/2012/03/19/functional-programming-one-function-at-a-time/</link>
      <pubDate>Mon, 19 Mar 2012 23:25:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/19/functional-programming-one-function-at-a-time/</guid>
      <description>As I mentioned in an earlier post I got a bit stuck working out all the diagonals in the 20x20 grid of Project Euler problem 11 and my colleague Uday ended up showing me how to do it.
I realised while watching him solve the problem that we’d been using quite different approaches to solving the problem and that his way worked way better than mine, at least in this context.</description>
    </item>
    
    <item>
      <title>Coding: Wait for the abstractions to emerge</title>
      <link>https://www.markhneedham.com/blog/2012/03/17/coding-wait-for-the-abstractions-to-emerge/</link>
      <pubDate>Sat, 17 Mar 2012 11:19:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/17/coding-wait-for-the-abstractions-to-emerge/</guid>
      <description>One of the things that I’ve learnt while developing code in an incremental way is that the way the code should be designed isn’t going to be obvious straight away so we need to be patience and wait for it to emerge.
There’s often a tendency to pull out classes or methods but more recently I’ve been trying to follow an approach where I leave the code in one class/method and play around with/study it until I see a good abstraction to make.</description>
    </item>
    
    <item>
      <title>Mercurial: hg push to Google Code</title>
      <link>https://www.markhneedham.com/blog/2012/03/14/mercurial-hg-push-to-google-code/</link>
      <pubDate>Wed, 14 Mar 2012 21:25:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/14/mercurial-hg-push-to-google-code/</guid>
      <description>I wanted to make a change to add flatMap to Option in totallylazy so I had to clone the repository and make the change.
I thought I’d then be able to just push the change using my Google user name and password but instead ended up with the following error:
➜  mhneedham-totally-lazy hg push
pushing to https://m.h.needham@code.google.com/r/mhneedham-totally-lazy/
searching for changes
1 changesets found
http authorization required
realm: Google Code hg Repository
user: m.</description>
    </item>
    
    <item>
      <title>Functional Programming: Shaping the data to fit a function</title>
      <link>https://www.markhneedham.com/blog/2012/03/13/functional-programming-shaping-the-data-to-fit-a-function/</link>
      <pubDate>Tue, 13 Mar 2012 22:55:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/13/functional-programming-shaping-the-data-to-fit-a-function/</guid>
      <description>As I mentioned in my last post I’ve been working on Project Euler problem 11 and one thing I noticed was that I was shaping the data around a http://www.markhneedham.com/blog/2012/02/28/haskell-creating-a-sliding-window-over-a-collection/ function since it seemed to fit the problem quite well.
Problem 11 is defined like so:
In the 20x20 grid below, four numbers along a diagonal line have been marked in red. The product of these numbers is 26 × 63 × 78 × 14 = 1788696.</description>
    </item>
    
    <item>
      <title>Haskell: Couldn&#39;t match expected type ``Int&#39; with actual type ``Integer&#39;</title>
      <link>https://www.markhneedham.com/blog/2012/03/13/haskell-couldnt-match-expected-type-int-with-actual-type-integer/</link>
      <pubDate>Tue, 13 Mar 2012 19:42:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/13/haskell-couldnt-match-expected-type-int-with-actual-type-integer/</guid>
      <description>One of the most frequent compilation error messages that I’ve been getting while working through the Project Euler problems in Haskell is the following:
Couldn&amp;#39;t match expected type `Int&amp;#39; with actual type `Integer&amp;#39; In problem 11, for example, I define the grid of numbers like so:
grid = [[08,02,22,97,38,15,00,40,00,75,04,05,07,78,52,12,50,77,91,08],
        [49,49,99,40,17,81,18,57,60,87,17,40,98,43,69,48,04,56,62,00],
        [81,49,31,73,55,79,14,29,93,71,40,67,53,88,30,03,49,13,36,65],
        [52,70,95,23,04,60,11,42,69,24,68,56,01,32,56,71,37,02,36,91],
        [22,31,16,71,51,67,63,89,41,92,36,54,22,40,40,28,66,33,13,80],
        [24,47,32,60,99,03,45,02,44,75,33,53,78,36,84,20,35,17,12,50],
        [32,98,81,28,64,23,67,10,26,38,40,67,59,54,70,66,18,38,64,70],
        [67,26,20,68,02,62,12,20,95,63,94,39,63,08,40,91,66,49,94,21],
        [24,55,58,05,66,73,99,26,97,17,78,78,96,83,14,88,34,89,63,72],
        [21,36,23,09,75,00,76,44,20,45,35,14,00,61,33,97,34,31,33,95],
        [78,17,53,28,22,75,31,67,15,94,03,80,04,62,16,14,09,53,56,92],
        [16,39,05,42,96,35,31,47,55,58,88,24,00,17,54,24,36,29,85,57],
        [86,56,00,48,35,71,89,07,05,44,44,37,44,60,21,58,51,54,17,58],
        [19,80,81,68,05,94,47,69,28,73,92,13,86,52,17,77,04,89,55,40],
        [04,52,08,83,97,35,99,16,07,97,57,32,16,26,26,79,33,27,98,66],
        [88,36,68,87,57,62,20,72,03,46,33,67,46,55,12,32,63,93,53,69],
        [04,42,16,73,38,25,39,11,24,94,72,18,08,46,29,32,40,62,76,36],
        [20,69,36,41,72,30,23,88,34,62,99,69,82,67,59,85,74,04,36,16],
        [20,73,35,29,78,31,90,01,74,31,49,71,48,86,81,16,23,57,05,54],
        [01,70,54,71,83,51,54,69,16,92,33,48,61,43,52,01,89,19,67,48]]
Which has the following type:</description>
    </item>
    
    <item>
      <title>Choosing where to put the complexity</title>
      <link>https://www.markhneedham.com/blog/2012/03/06/choosing-where-to-put-the-complexity/</link>
      <pubDate>Tue, 06 Mar 2012 01:17:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/03/06/choosing-where-to-put-the-complexity/</guid>
      <description>On the current application I’m working on we need to make use of some data which comes from another system so we’ve created an import script which creates a copy of that data so that we can use it in our application.
In general we’ve been trying not to do too much manipulation of the data and keeping it close to the initial structure so that if something goes wrong with the import we can more easily trace the problem back to the original data source.</description>
    </item>
    
    <item>
      <title>Haskell: Creating a sliding window over a collection</title>
      <link>https://www.markhneedham.com/blog/2012/02/28/haskell-creating-a-sliding-window-over-a-collection/</link>
      <pubDate>Tue, 28 Feb 2012 00:21:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/28/haskell-creating-a-sliding-window-over-a-collection/</guid>
      <description>A couple of years ago when I was playing around with F# I came across the http://msdn.microsoft.com/en-us/library/ee340420.aspx function which allows you to create a sliding window of a specific size over a collection.
Taking an example from the F# documentation page:
let seqNumbers = [ 1.0; 1.5; 2.0; 1.5; 1.0; 1.5 ] :&amp;gt; seq&amp;lt;float&amp;gt;
let seqWindows = Seq.windowed 3 seqNumbers
We end up with this:
Initial sequence: 1.0 1.5 2.</description>
    </item>
    
    <item>
      <title>Haskell: Getting the nth element in a list</title>
      <link>https://www.markhneedham.com/blog/2012/02/28/haskell-getting-the-nth-element-in-a-list/</link>
      <pubDate>Tue, 28 Feb 2012 00:02:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/28/haskell-getting-the-nth-element-in-a-list/</guid>
      <description>I started trying to solve some of the Project Euler problems as a way to learn a bit of Haskell and problem 7 is defined like so:
By listing the first six prime numbers: 2, 3, 5, 7, 11, and 13, we can see that the 6th prime is 13. What is the 10 001st prime number?
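As a sanity check of the problem statement (a simple Python trial-division sketch, not the post’s Haskell sieve):

```python
def nth_prime(n):
    # Trial division against the primes found so far, stopping at sqrt.
    primes = []
    candidate = 2
    while len(primes) < n:
        if all(candidate % p for p in primes if p * p <= candidate):
            primes.append(candidate)
        candidate += 1
    return primes[-1]

print(nth_prime(6))      # 13, as in the problem statement
print(nth_prime(10001))  # 104743, the answer to problem 7
```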
I read that the Sieve of Eratosthenes is a useful algorithm for working out all the prime numbers and there’s http://en.</description>
    </item>
    
    <item>
      <title>Java: Faking a closure with a factory to create a domain object</title>
      <link>https://www.markhneedham.com/blog/2012/02/26/java-faking-a-closure-with-a-factory-to-create-a-domain-object/</link>
      <pubDate>Sun, 26 Feb 2012 00:09:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/26/java-faking-a-closure-with-a-factory-to-create-a-domain-object/</guid>
      <description>Recently we wanted to create a domain object which needed to have an external dependency in order to do a calculation and we wanted to be able to stub out that dependency in our tests.
Originally we were just new’ing up the dependency inside the domain class but that makes it impossible to control its value in a test.
Equally it didn’t seem like we should be passing that dependency into the constructor of the domain object since it’s not a piece of state which defines the object, just something that it uses.</description>
    </item>
    
    <item>
      <title>Haskell: Viewing the steps of a reduce</title>
      <link>https://www.markhneedham.com/blog/2012/02/25/haskell-viewing-the-steps-of-a-reduce/</link>
      <pubDate>Sat, 25 Feb 2012 23:40:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/25/haskell-viewing-the-steps-of-a-reduce/</guid>
      <description>I’ve been playing around with Haskell a bit over the last week and in the bit of code I was working on I wanted to fold over a collection but see the state of the fold after each step.
I remembered Don Syme showing me how to do something similar during the F# Exchange last year while we were writing some code to score a tennis game by using http://msdn.</description>
    </item>
    
    <item>
      <title>Thou shalt storm</title>
      <link>https://www.markhneedham.com/blog/2012/02/24/thou-shalt-storm/</link>
      <pubDate>Fri, 24 Feb 2012 02:03:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/24/thou-shalt-storm/</guid>
      <description>On the majority of the teams that I’ve worked on there’s been a time where everyone seems to be disagreeing with each other about almost everything and the whole situation becomes pretty tense for all involved.
The first time I came across this it seemed quite dysfunctional but I was introduced to Bruce Tuckman’s model of group development which helps to explain what’s going on.
Tuckman outlines four stages which teams tend to go through - forming, storming, norming and performing.</description>
    </item>
    
    <item>
      <title>Optimising for typing</title>
      <link>https://www.markhneedham.com/blog/2012/02/21/optimising-for-typing/</link>
      <pubDate>Tue, 21 Feb 2012 22:21:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/21/optimising-for-typing/</guid>
      <description>My colleague Ola Bini recently wrote a post describing his thoughts on the syntax of programming languages and while the post in general is interesting the bit that most resonates with me at the moment is the following:
Typing fewer characters doesn’t actually optimize for writing either - the intuition behind that statement is quite easy: imagine you had to write a book. However, instead of writing it in English, you just wrote the gzipped version of the book directly.</description>
    </item>
    
    <item>
      <title>Coding: Packaging by vertical slice</title>
      <link>https://www.markhneedham.com/blog/2012/02/20/coding-packaging-by-vertical-slice/</link>
      <pubDate>Mon, 20 Feb 2012 21:54:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/20/coding-packaging-by-vertical-slice/</guid>
      <description>On most of the applications I’ve worked on we’ve tended to organise/package classes by the function that they have or the layer that they fit in.
A typical package structure might therefore end up looking like this:
com.awesome.project
  common
    StringUtils
  controllers
    LocationController
    PricingController
  domain
    Address
    Cost
    CostFactory
    Location
    Price
  repositories
    LocationRepository
    PriceRepository
  services
    LocationService
This works reasonably well and allows you to find code which is similar in function but I find that more often than not a lot of the code that lives immediately around where you currently are isn’t actually relevant at the time.</description>
    </item>
    
    <item>
      <title>Tech Leads &amp; The Progress Principle</title>
      <link>https://www.markhneedham.com/blog/2012/02/18/tech-leads-the-progress-principle/</link>
      <pubDate>Sat, 18 Feb 2012 01:31:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/18/tech-leads-the-progress-principle/</guid>
      <description>I’ve been reading The Progress Principle on and off for the last couple of months and one of my favourite quotes from the book is the following:
Truly effective video game designers know how to create a sense of progress for players within all stages of a game. Truly effective managers know how to do the same for their subordinates.
While a tech lead might not like to be referred to as a manager I think part of the role does involve helping developers to make progress and the best ones I’ve worked with seem to do that instinctively.</description>
    </item>
    
    <item>
      <title>Reading Code: boilerpipe</title>
      <link>https://www.markhneedham.com/blog/2012/02/13/reading-code-boilerpipe/</link>
      <pubDate>Mon, 13 Feb 2012 21:16:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/13/reading-code-boilerpipe/</guid>
      <description>I’m a big fan of the iPad application Flipboard, especially it’s ability to filter out the non important content on web pages and just show me the main content so I’ve been looking around at open source libraries which provide that facility.
I came across a quora page where someone had asked how this was done and the suggested libraries were readability, Goose and boilerpipe.
boilerpipe was written by Christian Kohlschütter and has a corresponding paper and video as well.</description>
    </item>
    
    <item>
      <title>Oracle Spatial: java.sql.SQLRecoverableException: No more data to read from socket</title>
      <link>https://www.markhneedham.com/blog/2012/02/11/oracle-spatial-java-sql-sqlrecoverableexception-no-more-data-to-read-from-socket/</link>
      <pubDate>Sat, 11 Feb 2012 10:55:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/11/oracle-spatial-java-sql-sqlrecoverableexception-no-more-data-to-read-from-socket/</guid>
      <description>We’re using Oracle Spatial on my current project so that we can locate points within geographical regions and decided earlier in the week to rename the table where we store the SDO_GEOMETRY objects for each region.
We did that by using a normal table alter statement but then started seeing the following error when we tried to insert test data in that column which takes an SDO_GEOMETRY object:
org.hibernate.exception.JDBCConnectionException: could not execute native bulk manipulation query at org.</description>
    </item>
    
    <item>
      <title>Java: Fooled by java.util.Arrays.asList</title>
      <link>https://www.markhneedham.com/blog/2012/02/11/java-fooled-by-java-util-arrays-aslist/</link>
      <pubDate>Sat, 11 Feb 2012 10:29:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/11/java-fooled-by-java-util-arrays-aslist/</guid>
      <description>I’ve been playing around with the boilerpipe code base by writing some tests around it to check my understanding but ran into an interesting problem using java.util.Arrays.asList to pass a list into one of the functions.
I was testing the https://github.com/mneedham/boilerpipe/blob/master/src/main/de/l3s/boilerpipe/filters/heuristics/BlockProximityFusion.java class which is used to merge together adjacent text blocks.
I started off calling that class like this:
import static java.util.Arrays.asList;

@Test
public void willCallBlockProximityFustion() throws Exception {
    TextDocument document = new TextDocument(asList(contentBlock(&amp;#34;some words&amp;#34;), contentBlock(&amp;#34;followed by more words&amp;#34;)));
    BlockProximityFusion.</description>
    </item>
    
    <item>
      <title>Downloading the JDK 6 source code</title>
      <link>https://www.markhneedham.com/blog/2012/02/11/downloading-the-jdk-6-source-code/</link>
      <pubDate>Sat, 11 Feb 2012 10:02:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/11/downloading-the-jdk-6-source-code/</guid>
      <description>Every now and then I want to get the JDK source code onto a new machine and it always seems to take me longer than I expect it to so this post is an attempt to help future me!
Googling for this takes me to this page and I always think I’ll just check out the SVN repository and hook that up, but it doesn’t seem to be available.
$ wget -S http://java.</description>
    </item>
    
    <item>
      <title>Delivery approach and constraints</title>
      <link>https://www.markhneedham.com/blog/2012/02/08/delivery-approach-and-constraints/</link>
      <pubDate>Wed, 08 Feb 2012 22:34:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/08/delivery-approach-and-constraints/</guid>
      <description>In my latest post I described an approach we’d been taking when analysing how to rewrite part of an existing system so that we could build the new version in an incremental way.
Towards the end I pointed out that we weren’t actually going to be using an incremental approach as we’d initially thought which was due to a couple of constraints that we have to work under.
Hardware provisioning One of the main reasons that we favoured an incremental approach is that we’d be able to deploy to production early which would allow us to show a quicker return on investment.</description>
    </item>
    
    <item>
      <title>Looking for the seam</title>
      <link>https://www.markhneedham.com/blog/2012/02/06/looking-for-the-seam/</link>
      <pubDate>Mon, 06 Feb 2012 22:22:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/06/looking-for-the-seam/</guid>
      <description>During December/early January we spent some time analysing an existing system which we were looking to rewrite and our approach was to look for how we could do this in an incremental way.
In order to do that we needed to look for what Michael Feathers refers to as a seam:
A seam is a place where you can alter behaviour in your program without editing in that place</description>
    </item>
    
    <item>
      <title>Scala: Converting a scala collection to java.util.List</title>
      <link>https://www.markhneedham.com/blog/2012/02/05/scala-converting-a-scala-collection-to-java-util-list/</link>
      <pubDate>Sun, 05 Feb 2012 21:40:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/02/05/scala-converting-a-scala-collection-to-java-util-list/</guid>
      <description>I’ve been playing around a little with Goose - a library for extracting the main body of text from web pages - and I thought I’d try converting some of the code to be more scala-esque in style.
The API of the various classes/methods is designed so it’s interoperable with Java code but in order to use functions like map/filter we need the collection to be a Scala one.</description>
    </item>
    
    <item>
      <title>Oracle: dbstart - ORACLE_HOME_LISTNER is not SET, unable to auto-start Oracle Net Listener</title>
      <link>https://www.markhneedham.com/blog/2012/01/26/oracle-dbstart-oracle_home_listner-is-not-set-unable-to-auto-start-oracle-net-listener/</link>
      <pubDate>Thu, 26 Jan 2012 21:58:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/26/oracle-dbstart-oracle_home_listner-is-not-set-unable-to-auto-start-oracle-net-listener/</guid>
      <description>We ran into an interesting problem when trying to start up an Oracle instance using dbstart whereby we were getting the following error:
-bash-3.2$ dbstart
ORACLE_HOME_LISTNER is not SET, unable to auto-start Oracle Net Listener
Usage: /u01/app/oracle/product/11.2.0/dbhome_1/bin/dbstart ORACLE_HOME
Processing Database instance &amp;#34;orcl&amp;#34;: log file /u01/app/oracle/product/11.2.0/dbhome_1/startup.log
Ignoring the usage message we thought that setting the environment variable was what we needed to do, but…​
-bash-3.2$ export ORACLE_HOME_LISTNER=$ORACLE_HOME
-bash-3.2$ dbstart
ORACLE_HOME_LISTNER is not SET, unable to auto-start Oracle Net Listener
Usage: /u01/app/oracle/product/11.</description>
    </item>
    
    <item>
      <title>Developer machine automation: Dependencies</title>
      <link>https://www.markhneedham.com/blog/2012/01/24/developer-machine-automation-dependencies/</link>
      <pubDate>Tue, 24 Jan 2012 23:16:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/24/developer-machine-automation-dependencies/</guid>
      <description>As I mentioned in a post last week we’ve been automating the setup of our developer machines with puppet over the last week and one thing that we’ve learnt is that you need to be careful about how you define dependencies.
The aim is to get your scripts to the point where the outcome is reasonably deterministic so that we can have confidence they’re going to work the next time we run them.</description>
    </item>
    
    <item>
      <title>Playing around with pomodoros</title>
      <link>https://www.markhneedham.com/blog/2012/01/22/playing-around-with-pomodoros/</link>
      <pubDate>Sun, 22 Jan 2012 21:25:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/22/playing-around-with-pomodoros/</guid>
      <description>Over the last 3/4 months I’ve been playing around with the idea of using pomodoros to track all coding/software related stuff that I do outside of work.
I originally started using this technique while I was doing the programming assignments for ml-class because I wanted to know how much time I was spending on it each week and make sure I didn’t run down rabbit holes too often.
One interesting observation that I noticed from keeping the data of these pomodoros was that while during the early programming assignments it would take me 7 or 8 pomodoros to finish, by the end it was down to around 4.</description>
    </item>
    
    <item>
      <title>Installing Puppet on Oracle Linux</title>
      <link>https://www.markhneedham.com/blog/2012/01/18/installing-puppet-on-oracle-linux/</link>
      <pubDate>Wed, 18 Jan 2012 00:30:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/18/installing-puppet-on-oracle-linux/</guid>
      <description>We’ve been spending some time trying to setup our developer environment on a Oracle Linux 5.7 build and one of the first steps was to install Puppet as we’ve already created scripts which automate the installation of most things.
Unfortunately Oracle Linux builds don’t come with any yum repos configured so when you run the following command…​
ls -alh /etc/yum.repos.d/ …​you don’t see anything :(
We eventually realised that there is a list of public yum repositories on the Oracle website, from which we needed to download the definition for Oracle Linux 5 like so:</description>
    </item>
    
    <item>
      <title>Application footprint</title>
      <link>https://www.markhneedham.com/blog/2012/01/16/application-footprint/</link>
      <pubDate>Mon, 16 Jan 2012 01:40:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/16/application-footprint/</guid>
      <description>I recently came across Carl Erickson’s &amp;#39;small teams are dramatically more efficient than large teams&amp;#39; blog post which reminded me of something which my colleague Ashok suggested as a useful way for determining team size - the application footprint.
As I understand it the application footprint is applicable for an application at a given point in time and determines how many parallel tasks/streams of work we have.
In the case of the project that I’m currently working on there are 3 separate components which need to interact with each other via an API but otherwise are independent.</description>
    </item>
    
    <item>
      <title>Focused Retrospectives: things to watch for</title>
      <link>https://www.markhneedham.com/blog/2012/01/16/focused-retrospectives-things-to-watch-for/</link>
      <pubDate>Mon, 16 Jan 2012 01:01:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/16/focused-retrospectives-things-to-watch-for/</guid>
      <description>A few weeks ago a slide deck from an Esther Derby presentation on retrospectives was doing the rounds on twitter and one thing that I found interesting in the deck was the suggestion that a retrospective needs to be focused in some way.
I’ve participated in a few focused retrospectives over the past 7/8 months and I think there are some things to be careful about when we decide to focus on something specific rather than just looking back at a time period in general.</description>
    </item>
    
    <item>
      <title>Wireshark: Following HTTP requests/responses</title>
      <link>https://www.markhneedham.com/blog/2012/01/14/wireshark-following-http-requestsresponses/</link>
      <pubDate>Sat, 14 Jan 2012 23:20:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/14/wireshark-following-http-requestsresponses/</guid>
      <description>I like using Wireshark to have a look at the traffic going across different interfaces, but because it shows what’s happening across the wire packet by packet it’s quite difficult to tell what a request/response looked like.
I’ve been playing around with restfulie/Vraptor (http://vraptor.caelum.com.br/) today so I wanted to be able to see the request/response pair when something wasn’t working.
I didn’t know it was actually possible but this post on StackOverflow describes how.</description>
    </item>
    
    <item>
      <title>Oracle: exp - EXP-00008: ORACLE error 904 encountered/ORA-00904: &#34;POLTYP&#34;: invalid identifier</title>
      <link>https://www.markhneedham.com/blog/2012/01/13/oracle-exp-exp-00008-oracle-error-904-encounteredora-00904-poltyp-invalid-identifier/</link>
      <pubDate>Fri, 13 Jan 2012 21:46:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/13/oracle-exp-exp-00008-oracle-error-904-encounteredora-00904-poltyp-invalid-identifier/</guid>
      <description>I spent a bit of time this afternoon trying to export an Oracle test database so that we could use it locally using the exp tool (http://www.orafaq.com/wiki/Import_Export_FAQ#How_does_one_use_the_import.2Fexport_utilities.3F).
I had to connect to exp like this:
exp user/password@remote_address And then filled in the other parameters interactively.
Unfortunately when I tried to actually export the specified tables I got the following error message:
EXP-00008: ORACLE error 904 encountered ORA-00904: &amp;#34;POLTYP&amp;#34;: invalid identifier EXP-00000: Export terminated unsuccessfully I eventually came across Oyvind Isene’s blog post which pointed out that you’d get this problem if you tried to export a 10g database using an 11g client which is exactly what I was trying to do!</description>
    </item>
    
    <item>
      <title>Learning Android: Roboguice - Injecting context into PreferenceManager</title>
      <link>https://www.markhneedham.com/blog/2012/01/12/learning-android-roboguice-injecting-context-into-preferencemanager/</link>
      <pubDate>Thu, 12 Jan 2012 17:24:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/12/learning-android-roboguice-injecting-context-into-preferencemanager/</guid>
      <description>In my last post I showed how I’d been able to write a test around saved preferences in my app by making use of a ShadowPreferenceManager but it seemed a bit hacky.
I didn’t want to have to do that for every test where I dealt with preferences - I thought it’d be better if I could wrap the preferences in an object of my own and then inject it where necessary.</description>
    </item>
    
    <item>
      <title>Learning Android: Robolectric - Testing details got saved to SharedPreferences</title>
      <link>https://www.markhneedham.com/blog/2012/01/10/learning-android-testing-details-got-saved-to-sharedpreferences/</link>
      <pubDate>Tue, 10 Jan 2012 09:53:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/10/learning-android-testing-details-got-saved-to-sharedpreferences/</guid>
      <description>I’ve been writing some tests around an app I’ve been working on using the Robolectric testing framework and one thing I wanted to do was check that an OAuth token/secret were being saved to the user’s preferences.
The code that saved the preferences looked like this:
public class AuthoriseWithTwitterActivity extends RoboActivity { @Override protected void onCreate(Bundle savedInstanceState) { super.onCreate(intent); ... save(&amp;#34;fakeToken&amp;#34;, &amp;#34;fakeSecret&amp;#34;); ... } private void save(String userKey, String userSecret) { SharedPreferences settings = PreferenceManager.</description>
    </item>
    
    <item>
      <title>Learning Android: Getting android-support jar/compatibility package as a Maven dependency</title>
      <link>https://www.markhneedham.com/blog/2012/01/08/learning-android-getting-android-support-jarcompatability-package-as-a-maven-dependency/</link>
      <pubDate>Sun, 08 Jan 2012 20:56:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/08/learning-android-getting-android-support-jarcompatability-package-as-a-maven-dependency/</guid>
      <description>In the app I’m working on I make use of the ViewPager class which is only available in the compatibility package from revisions 3 upwards.
Initially I followed the instructions on the developer guide to get hold of the jar but now that I’m trying to adapt my code to fit the RobolectricSample, as I mentioned in my previous post, I needed to hook it up as a Maven dependency.</description>
    </item>
    
    <item>
      <title>Learning Android: java.lang.OutOfMemoryError: Java heap space with android-maven-plugin</title>
      <link>https://www.markhneedham.com/blog/2012/01/07/learning-android-java-lang-outofmemoryerror-java-heap-space-with-android-maven-plugin/</link>
      <pubDate>Sat, 07 Jan 2012 17:14:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/07/learning-android-java-lang-outofmemoryerror-java-heap-space-with-android-maven-plugin/</guid>
      <description>I’ve been trying to adapt my Android application to fit into the structure of the RobolectricSample so that I can add some tests around my code but I was running into a problem when trying to deploy the application.
To deploy the application you need to run the following command:
mvn package android:deploy Which was resulting in the following error:
[INFO] UNEXPECTED TOP-LEVEL ERROR: [INFO] java.lang.OutOfMemoryError: Java heap space [INFO] at com.</description>
    </item>
    
    <item>
      <title>Learning Android: Freezing the UI with a BroadcastReceiver</title>
      <link>https://www.markhneedham.com/blog/2012/01/06/learning-android-freezing-the-ui-with-a-broadcastreceiver/</link>
      <pubDate>Fri, 06 Jan 2012 23:40:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/06/learning-android-freezing-the-ui-with-a-broadcastreceiver/</guid>
      <description>As I mentioned in a previous post I recently wrote some code in my Android app to inform a BroadcastReceiver whenever a service processed a tweet with a link in it but in implementing this I managed to freeze the UI every time that happened.
I made the stupid (in hindsight) mistake of not realising that I shouldn’t be doing a lot of logic in BroadcastReceiver.onReceive since that bit of code gets executed on the UI thread.</description>
    </item>
    
    <item>
      <title>Learning Android: Getting a service to communicate with an activity</title>
      <link>https://www.markhneedham.com/blog/2012/01/05/learning-android-getting-a-service-to-communicate-with-an-activity/</link>
      <pubDate>Thu, 05 Jan 2012 01:41:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/05/learning-android-getting-a-service-to-communicate-with-an-activity/</guid>
      <description>In the app I’m working on I created a service which runs in the background away from the main UI thread consuming the Twitter streaming API using twitter4j.
It looks like this:
public class TweetService extends IntentService { String consumerKey = &amp;#34;TwitterConsumerKey&amp;#34;; String consumerSecret = &amp;#34;TwitterConsumerSecret&amp;#34;; public TweetService() { super(&amp;#34;Tweet Service&amp;#34;); } @Override protected void onHandleIntent(Intent intent) { AccessToken accessToken = createAccessToken(); StatusListener listener = new UserStreamListener() { // override a whole load of methods - removed for brevity public void onStatus(Status status) { String theTweet = status.</description>
    </item>
    
    <item>
      <title>My Software Development journey: 2011</title>
      <link>https://www.markhneedham.com/blog/2012/01/03/my-software-development-journey-2011/</link>
      <pubDate>Tue, 03 Jan 2012 01:48:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/03/my-software-development-journey-2011/</guid>
      <description>A couple of years ago I used to write a blog post reflecting on what I’d worked on in the preceding year and what I’d learned and having read 2011 reviews by a couple of other people I thought I’d have a go.
Am I actually learning anything? A thought I had many times in 2011 was &amp;#39;am I actually learning anything?&amp;#39; as, although I was working with languages that I hadn’t used professionally before, the applications that I worked on were very similar to ones that I’ve worked on previously.</description>
    </item>
    
    <item>
      <title>Learning Android: Authenticating with Twitter using OAuth</title>
      <link>https://www.markhneedham.com/blog/2012/01/02/learning-android-authenticating-with-twitter-using-oauth/</link>
      <pubDate>Mon, 02 Jan 2012 02:39:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/02/learning-android-authenticating-with-twitter-using-oauth/</guid>
      <description>I want to be able to get the tweets from my timeline into my app which means I need to authorise the app with Twitter using OAuth.
The last time I tried to authenticate using OAuth a couple of years ago was a bit of a failure but luckily this time Honza Pokorny has written a blog post explaining what to do.
I had to adjust the code a little bit from what’s written on his post so I thought I’d document what I’ve done.</description>
    </item>
    
    <item>
      <title>Learning Android: &#39;Unable to start service Intent not found&#39;</title>
      <link>https://www.markhneedham.com/blog/2012/01/01/learning-android-unable-to-start-service-intent-not-found/</link>
      <pubDate>Sun, 01 Jan 2012 03:22:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2012/01/01/learning-android-unable-to-start-service-intent-not-found/</guid>
      <description>In the Android application that I’ve been playing around with I wrote a service which consumes the Twitter streaming API which I trigger from the app’s main activity like so:
public class MyActivity extends Activity { ... @Override public void onCreate(Bundle savedInstanceState) { super.onCreate(savedInstanceState); Intent intent = new Intent(this, TweetService.class); startService(intent); ... } } Where TweetService is defined roughly like this:
public class TweetService extends IntentService { @Override protected void onHandleIntent(Intent intent) { // Twitter streaming API stuff goes here } } Unfortunately when I tried to deploy the app the service wasn’t starting and I got this message in the log:</description>
    </item>
    
    <item>
      <title>Clojure: Casting to a Java class...or not!</title>
      <link>https://www.markhneedham.com/blog/2011/12/31/clojure-casting-to-a-java-class-or-not/</link>
      <pubDate>Sat, 31 Dec 2011 17:47:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/31/clojure-casting-to-a-java-class-or-not/</guid>
      <description>I have a bit of Java code for working out the final destination of a URL assuming that there might be one redirect which looks like this:
private String resolveUrl(String url) { try { HttpURLConnection con = (HttpURLConnection) (new URL(url).openConnection()); con.setInstanceFollowRedirects(false); con.connect(); int responseCode = con.getResponseCode(); if (String.valueOf(responseCode).startsWith(&amp;#34;3&amp;#34;)) { return con.getHeaderField(&amp;#34;Location&amp;#34;); } } catch (IOException e) { return url; } return url; } I need to cast to HttpURLConnection on the first line so that I can make the call to setInstanceFollowRedirects which isn’t available on URLConnection.</description>
    </item>
    
    <item>
      <title>Yak Shaving: Tracking the yak stack</title>
      <link>https://www.markhneedham.com/blog/2011/12/31/yak-shaving-tracking-the-yak-stack/</link>
      <pubDate>Sat, 31 Dec 2011 03:54:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/31/yak-shaving-tracking-the-yak-stack/</guid>
      <description>While I’ve been learning how to write an Android application there have been plenty of opportunities for me to go off shaving yaks, it’s pretty much Yakville Central.
Typically I’d end up spending hours trying to work out some obscure thing which I didn’t really need to know so I wanted to try and avoid that this time.
I started keeping a track of the &amp;#39;yak stack&amp;#39; which I was currently following and mentally noting exactly where I was up to.</description>
    </item>
    
    <item>
      <title>The Language of Risk</title>
      <link>https://www.markhneedham.com/blog/2011/12/30/the-language-of-risk/</link>
      <pubDate>Fri, 30 Dec 2011 03:38:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/30/the-language-of-risk/</guid>
      <description>A few weeks ago Chris Matts wrote an interesting blog post &amp;#39;the language of risk&amp;#39; in which he describes an approach he used to explain the processes his team uses to an auditor.
Why did the auditor like what I said? Because I explained everything we did in terms of risk. When they asked for a “process”, I explained the risk the process was meant to address. I then explained how our different process addressed the risk more effectively.</description>
    </item>
    
    <item>
      <title>Learning Android: Sharing with Twitter/the &#39;share via&#39; dialog</title>
      <link>https://www.markhneedham.com/blog/2011/12/29/learning-android-sharing-with-twitterthe-share-via-dialog/</link>
      <pubDate>Thu, 29 Dec 2011 22:40:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/29/learning-android-sharing-with-twitterthe-share-via-dialog/</guid>
      <description>One thing I wanted to do in the little application I’m working on was send data to other apps on my phone using the &amp;#39;share via&amp;#39; dialog which I’ve seen used on the Twitter app.
In this case I wanted to send a link and its title to twitter and came across a StackOverflow post which explained how to do so.
To keep it simple I added a button to the view and then shared the data via the on click event on that button:</description>
    </item>
    
    <item>
      <title>Reading Code: Know what you&#39;re looking for</title>
      <link>https://www.markhneedham.com/blog/2011/12/29/reading-code-know-what-youre-looking-for/</link>
      <pubDate>Thu, 29 Dec 2011 02:43:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/29/reading-code-know-what-youre-looking-for/</guid>
      <description>In the last week or so before Christmas I got the chance to spend some time pairing with my colleague Alex Harin while trying to understand how an existing application which we were investigating was written.
We knew from watching a demo of the application that the user was able to send some processing off to be done in the background and that they would be emailed once that had happened.</description>
    </item>
    
    <item>
      <title>Learning Android: WebView character encoding</title>
      <link>https://www.markhneedham.com/blog/2011/12/27/learning-android-webview-character-encoding/</link>
      <pubDate>Tue, 27 Dec 2011 23:53:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/27/learning-android-webview-character-encoding/</guid>
      <description>In my continued attempts to learn how to write an Android application I came across a problem with character encoding when trying to load some text into a WebView.
I was initially trying to write the text to the WebView like this:
WebView webview = new WebView(collection.getContext()); webview.loadData(textWithQuotesIn, &amp;#34;text/html&amp;#34;, &amp;#34;UTF-8&amp;#34;); But ended up with the output in the picture on the left hand side. I tried playing around with the encoding and debugged the application all the way through until it hit the WebView but there didn’t seem to be any problem with the text.</description>
    </item>
    
    <item>
      <title>Leiningen: Using goose via a local Maven repository</title>
      <link>https://www.markhneedham.com/blog/2011/12/27/leiningen-using-goose-via-a-local-maven-repository/</link>
      <pubDate>Tue, 27 Dec 2011 12:48:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/27/leiningen-using-goose-via-a-local-maven-repository/</guid>
      <description>I’ve been playing around a little bit with goose - an HTML content/article extractor - originally in Java but later in Clojure, where I needed to work out how to include goose and all its dependencies via Leiningen.
goose isn’t included in a Maven repository so I needed to create a local repository, something which I’ve got stuck on in the past.
Luckily Paul Gross has written a cool blog post explaining how his team got past this problem.</description>
    </item>
    
    <item>
      <title>Learning Android: Deploying application to phone from Mac OS X</title>
      <link>https://www.markhneedham.com/blog/2011/12/23/learning-android-deploying-application-to-phone-from-mac-os-x/</link>
      <pubDate>Fri, 23 Dec 2011 22:55:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/23/learning-android-deploying-application-to-phone-from-mac-os-x/</guid>
      <description>I’ve been playing around a little bit today with writing an Android application and while for the majority of the time I’ve been deploying to an emulator I wanted to see what it’d look like on my phone.
The developer guide contains all the instructions on how to do this but unfortunately I’m blessed with the ability to skim over instructions which meant that my phone wasn’t getting picked up by the Android Debug Bridge.</description>
    </item>
    
    <item>
      <title>The supposed black box</title>
      <link>https://www.markhneedham.com/blog/2011/12/20/the-supposed-black-box/</link>
      <pubDate>Tue, 20 Dec 2011 23:57:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/20/the-supposed-black-box/</guid>
      <description>On a reasonable number of the systems that I’ve worked on over the past few years there’s been a &amp;#39;black box&amp;#39; component which the team I’ve been on has needed to integrate with.
I’ve always found it a little strange that you wouldn’t need to/want to know how that part of the system worked or that you could actually believe that it was truly a black box.
If it doesn’t work then you have no way of diagnosing the problem - did you do something wrong, was there something wrong inside the black box or was it something else.</description>
    </item>
    
    <item>
      <title>The Lean Startup: Book Review</title>
      <link>https://www.markhneedham.com/blog/2011/12/18/the-lean-startup-book-review/</link>
      <pubDate>Sun, 18 Dec 2011 21:00:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/18/the-lean-startup-book-review/</guid>
      <description>I’d heard about The Lean Startup for a long time before I actually read it, mainly from following the &amp;#39;Startup Lessons Learned&amp;#39; blog, but I didn’t get the book until a colleague suggested a meetup to discuss how we might apply the ideas on our projects.
My general learning from the book is that we need to take the idea of creating tight feedback loops, which we’ve learnt in the agile/lean worlds, and apply it to product development.</description>
    </item>
    
    <item>
      <title>WebDriver: Getting it to play nicely with Xvfb</title>
      <link>https://www.markhneedham.com/blog/2011/12/15/webdriver-getting-it-to-play-nicely-with-xvfb/</link>
      <pubDate>Thu, 15 Dec 2011 23:19:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/15/webdriver-getting-it-to-play-nicely-with-xvfb/</guid>
      <description>Another thing we’ve been doing with WebDriver is having it run with the FirefoxDriver while redirecting the display output into the Xvfb framebuffer so that we can run it on our continuous integration agents which don’t have a display attached.
The first thing we needed to do was set the environment property &amp;#39;webdriver.firefox.bin&amp;#39; to our own script which would point the display to Xvfb before starting Firefox:
import java.lang.System._ lazy val firefoxDriver: FirefoxDriver = { setProperty(&amp;#34;webdriver.</description>
    </item>
    
    <item>
      <title>WebDriver: Getting it to play nicely with jQuery ColorBox</title>
      <link>https://www.markhneedham.com/blog/2011/12/13/webdriver-getting-it-to-play-nicely-with-jquery-colorbox/</link>
      <pubDate>Tue, 13 Dec 2011 23:31:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/13/webdriver-getting-it-to-play-nicely-with-jquery-colorbox/</guid>
      <description>As I mentioned in an earlier post about removing manual test scenarios we’ve been trying to automate some parts of our application where a user action leads to a jQuery ColorBox powered overlay appearing.
With this type of feature there tends to be some sort of animation which accompanies the overlay so we have to wait for an element inside the overlay to become visible on the screen before trying to do any assertions on the overlay.</description>
    </item>
    
    <item>
      <title>The 5 Whys/Root cause analysis - Douglas Squirrel</title>
      <link>https://www.markhneedham.com/blog/2011/12/10/the-5-whysroot-cause-analysis-douglas-squirrel/</link>
      <pubDate>Sat, 10 Dec 2011 14:11:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/10/the-5-whysroot-cause-analysis-douglas-squirrel/</guid>
      <description>At XP Day I was chatting to Benjamin Mitchell about the 5 whys exercises that we’d tried on my team and I suggested that beyond Eric Ries&amp;#39; post on the subject I hadn’t come across an article/video which explained how to do it.
Benjamin mentioned that Douglas Squirrel had recently done a talk on this very subject at Skillsmatter and as with most Skillsmatter talks there’s a video of the presentation online.</description>
    </item>
    
    <item>
      <title>Continuous Delivery: Removing manual scenarios</title>
      <link>https://www.markhneedham.com/blog/2011/12/05/continuous-delivery-removing-manual-scenarios/</link>
      <pubDate>Mon, 05 Dec 2011 23:13:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/12/05/continuous-delivery-removing-manual-scenarios/</guid>
      <description>On the project that I’m currently working on we’re trying to move to the stage where we’d be able to deploy multiple times a week while still having a reasonable degree of confidence that the application still works.
One of the (perhaps obvious) things that we’ve had to do as a result of wanting to do this is reduce the number of manual scenarios that our QAs need to run through.</description>
    </item>
    
    <item>
      <title>XP Day: Visualizing what&#39;s happening on our project</title>
      <link>https://www.markhneedham.com/blog/2011/11/30/xp-day-visualizing-whats-happening-on-our-project/</link>
      <pubDate>Wed, 30 Nov 2011 02:25:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/30/xp-day-visualizing-whats-happening-on-our-project/</guid>
      <description>Another presentation that I gave at XP Day was one covering some visualisations Liz, Uday and I have created from various data we have about our project, gathered from Git, Go and Mingle.
Visualisations
These were some of the things that I learned from doing the presentation:
The various graphs I presented in the talk have a resolution of 1680 x 1050 which is a much higher resolution than what was available on the projector.</description>
    </item>
    
    <item>
      <title>Scala: Our Retrospective of the benefits/drawbacks</title>
      <link>https://www.markhneedham.com/blog/2011/11/28/scala-our-retrospective-of-the-benefitsdrawbacks/</link>
      <pubDate>Mon, 28 Nov 2011 00:15:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/28/scala-our-retrospective-of-the-benefitsdrawbacks/</guid>
      <description>As the closing part of a Scala Experience Report Liz and I gave at XP Day we detailed a retrospective that we’d carried out on the project after 3 months where the team outlined the positives/negatives of working with Scala.
The team members who were there right at the beginning of the project 3 months earlier had come up with what they thought the proposed benefits/drawbacks would be so it was quite interesting to look at our thoughts at both times.</description>
    </item>
    
    <item>
      <title>XP Day: Scala: An Experience Report (Liz Douglass and me)</title>
      <link>https://www.markhneedham.com/blog/2011/11/24/xp-day-scala-an-experience-report-liz-douglass-and-me/</link>
      <pubDate>Thu, 24 Nov 2011 23:52:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/24/xp-day-scala-an-experience-report-liz-douglass-and-me/</guid>
      <description>At XP Day my colleague Liz Douglass and I presented the following experience report on our last 6 months working together on our project.
Scala: An experience report
We wanted to focus on answering the following questions with our talk:
Should the project have been done in Java?
Does it really speed up development as was hoped?
What features of the language and patterns of usage have been successes?</description>
    </item>
    
    <item>
      <title>XP Day: Cynefin &amp; Agile (Joseph Pelrine/Steve Freeman)</title>
      <link>https://www.markhneedham.com/blog/2011/11/24/xp-day-cynefin-agile-joseph-pelrinesteve-freeman/</link>
      <pubDate>Thu, 24 Nov 2011 22:25:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/24/xp-day-cynefin-agile-joseph-pelrinesteve-freeman/</guid>
      <description>Another session that I attended at XP Day was one facilitated by Steve Freeman (http://twitter.com/!/sf105) and Joseph Pelrine (http://twitter.com/!/josephpelrine) where we discussed the Cynefin model, something that I first came across earlier in the year at XP 2011.
We spent the first part of the session drawing out the model and coming up with some software examples which might fit into each domain.
Simple - when you’re going to checkin run the build</description>
    </item>
    
    <item>
      <title>XP Day: Refactoring to functional style (Julian Kelsey/Andrew Parker)</title>
      <link>https://www.markhneedham.com/blog/2011/11/22/xp-day-refactoring-to-functional-style-julian-kelseyandrew-parker/</link>
      <pubDate>Tue, 22 Nov 2011 00:13:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/22/xp-day-refactoring-to-functional-style-julian-kelseyandrew-parker/</guid>
      <description>I’m attending XP Day this year and the first talk I attended was one by Julian Kelsey and Andrew Parker titled &amp;#39;Refactoring to functional style&amp;#39;.
I’ve worked on a Scala project for the last 6 months and previously given a couple of talks about adopting a functional style of programming in C# so this is a subject area that I find quite interesting.
The talk focused on 5 refactorings that the presenters have identified to help move imperative code to a more functional style:</description>
    </item>
    
    <item>
      <title>Java/Scala: Runtime.exec hanging/in &#39;pipe_w&#39; state</title>
      <link>https://www.markhneedham.com/blog/2011/11/20/javascala-runtime-exec-hangingin-pipe_w-state/</link>
      <pubDate>Sun, 20 Nov 2011 20:20:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/20/javascala-runtime-exec-hangingin-pipe_w-state/</guid>
      <description>On the system that I’m currently working on we have a data ingestion process which needs to take zip files, unzip them and then import their contents into the database.
As a result we delegate from Scala code to the system unzip command like so:
def extract { var command = &amp;#34;unzip %s -d %s&amp;#34; format(&amp;#34;/file/to/unzip.zip&amp;#34;, &amp;#34;/place/to/unzip/to&amp;#34;) var process: Process = null try { process = Runtime.getRuntime.exec(command) val exitCode = process.</description>
    </item>
    
    <item>
      <title>Dr Nic&#39;s &#39;How to stop killing people with your public speeches&#39;</title>
      <link>https://www.markhneedham.com/blog/2011/11/16/dr-nics-how-to-stop-killing-people-with-your-public-speeches/</link>
      <pubDate>Wed, 16 Nov 2011 22:56:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/16/dr-nics-how-to-stop-killing-people-with-your-public-speeches/</guid>
      <description>I recently came across a really cool blog post by Dr Nic titled &amp;#39;How to stop killing people with your public speeches&amp;#39; where he talks about the importance of practicing our presentations so that they actually make an impact on our audience.
Towards the end of the post he suggests joining Toastmasters as a useful first step for getting used to speaking to a group of people and as an added bonus you get feedback after each speech you give.</description>
    </item>
    
    <item>
      <title>Scala: scala.xml.SpecialNode: StackOverFlowError</title>
      <link>https://www.markhneedham.com/blog/2011/11/15/scala-scala-xml-specialnode-stackoverflowerror/</link>
      <pubDate>Tue, 15 Nov 2011 00:26:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/15/scala-scala-xml-specialnode-stackoverflowerror/</guid>
      <description>We have some code in our application where we parse reasonably complex XML structures and then sometimes choose to get rid of certain elements from the structure.
When we wanted to get rid of an element we replaced it with a SpecialNode (http://www.scala-lang.org/api/current/scala/xml/SpecialNode.html):
val emptyNode = new scala.xml.SpecialNode() { def buildString(sb:StringBuilder) = new StringBuilder() def label = null } Unfortunately when you call #text on the node it results in the following exception which we only found out today:</description>
    </item>
    
    <item>
      <title>The 5 whys: Another attempt</title>
      <link>https://www.markhneedham.com/blog/2011/11/13/the-5-whys-another-attempt/</link>
      <pubDate>Sun, 13 Nov 2011 23:08:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/13/the-5-whys-another-attempt/</guid>
      <description>Towards the end of the week before last and the beginning of last week we’d been having quite a few problems with our QA environment to the point where we were unable to deploy anything to it for 3 days.
A few weeks ago I wrote about a 5 whys exercise that we did in a retrospective and in our weekly code review we decided to give it a go and see what we could learn.</description>
    </item>
    
    <item>
      <title>fgrep: Searching for a list of identifiers</title>
      <link>https://www.markhneedham.com/blog/2011/11/10/fgrep-searching-for-a-list-of-identifiers/</link>
      <pubDate>Thu, 10 Nov 2011 23:37:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/10/fgrep-searching-for-a-list-of-identifiers/</guid>
      <description>We had a problem to solve earlier in the week where we wanted to try and find out which files we had ingested into our database based on a unique identifier.
We had a few hundred thousand files to search through to try and find the ones where around 50,000 identifiers were mentioned so that we could re-ingest them.
Running a normal grep for each identifier individually took a ridiculously long time so we needed to find a way to search for all of the identifiers at the same time to speed up the process.</description>
    </item>
    
    <item>
      <title>Scala: Setting default argument for function parameter</title>
      <link>https://www.markhneedham.com/blog/2011/11/08/scala-setting-default-argument-for-function-parameter/</link>
      <pubDate>Tue, 08 Nov 2011 22:46:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/08/scala-setting-default-argument-for-function-parameter/</guid>
      <description>Yesterday I wrote about a problem we’ve been having with trying to work out how to default a function parameter that we have in one of our methods.
Our current version of the code defines the function parameter as implicit which means that if it isn’t passed in it defaults to http://www.scala-lang.org/api/current/index.html#scala.Predef$$$less$colon$less:
def foo[T](bar: String)(implicit blah:(String =&amp;gt; T)) = { println(blah(bar)); bar } It’s not entirely clear just from reading the code where the implicit value is coming from so we want to try and make the code a bit more expressive.</description>
    </item>
    
    <item>
      <title>Scala: Which implicit conversion is being used?</title>
      <link>https://www.markhneedham.com/blog/2011/11/06/scala-which-implicit-conversion-is-being-used/</link>
      <pubDate>Sun, 06 Nov 2011 21:25:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/06/scala-which-implicit-conversion-is-being-used/</guid>
      <description>Last week my colleague Pat created a method which had a parameter which he wanted to make optional so that consumers of the API wouldn’t have to provide it if they didn’t want to.
We ended up making the method take in an implicit value such that the method signature looked a bit like this:
def foo[T](implicit blah:(String =&amp;gt; T)) = { println(blah(&amp;#34;mark&amp;#34;)) &amp;#34;foo&amp;#34; } We can call foo with or without an argument:</description>
    </item>
    
    <item>
      <title>Scala: Option.isDefined as the new null check</title>
      <link>https://www.markhneedham.com/blog/2011/11/01/scala-option-isdefined-as-the-new-null-check/</link>
      <pubDate>Tue, 01 Nov 2011 00:58:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/11/01/scala-option-isdefined-as-the-new-null-check/</guid>
      <description>One cool thing about using Scala on my current project is that we don’t have nulls anywhere in our code, instead when something may or may not be there we make use of the Option type.
Unfortunately what we’ve ended up with in our code base (heavily contributed to by me) is repeated use of the http://www.scala-lang.org/api/rc/scala/Option.html method whenever we want to make a decision depending on whether or not the option is populated.</description>
    </item>
    
    <item>
      <title>Working with external identifiers</title>
      <link>https://www.markhneedham.com/blog/2011/10/31/working-with-external-identifiers/</link>
      <pubDate>Mon, 31 Oct 2011 22:58:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/31/working-with-external-identifiers/</guid>
      <description>As part of the ingestion process for our application we import XML documents and corresponding PDFs into a database and onto the file system respectively.
Since the user needs to be able to search for documents by the userFacingId we reference it by that identifier in the database and the web application.
Each document also has an external identifier and we use this to identify the PDFs on the file system.</description>
    </item>
    
    <item>
      <title>Canonical Identifiers</title>
      <link>https://www.markhneedham.com/blog/2011/10/30/canonical-identifiers/</link>
      <pubDate>Sun, 30 Oct 2011 22:32:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/30/canonical-identifiers/</guid>
      <description>Duncan and I had an interesting problem recently where we had to make it possible to search within an &amp;#39;item&amp;#39; to find possible sub items that exist inside it.
The URI for the item was something like this:
/items/234 Let’s say Item 234 contains the following sub items:
Mark
duncan
We have a search box on the page which allows us to type in the name of a sub item and go to the sub item’s page if it exists or see an error message if it doesn’t.</description>
    </item>
    
    <item>
      <title>Gaming the system: Some project examples</title>
      <link>https://www.markhneedham.com/blog/2011/10/26/gaming-the-system-some-project-examples/</link>
      <pubDate>Wed, 26 Oct 2011 23:55:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/26/gaming-the-system-some-project-examples/</guid>
      <description>Earlier this year Liz Keogh gave a talk at QCon London titled &amp;#39;Learning and Perverse Incentives: The Evil Hat&amp;#39; where she eventually encouraged people to try and game the systems that they take part in.
Over the last month or so we’ve had two different metrics visibly on show, which are therefore prime targets for being gamed.
The first metric is one we included on our build radiator which shows how many commits to the git repository each person has for that day.</description>
    </item>
    
    <item>
      <title>Scala: Adding logging around a repository</title>
      <link>https://www.markhneedham.com/blog/2011/10/25/scala-adding-logging-around-a-repository/</link>
      <pubDate>Tue, 25 Oct 2011 21:19:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/25/scala-adding-logging-around-a-repository/</guid>
      <description>We wanted to add some logging around one of our repositories to track how many times users were trying to do various things on the application and came across a cool blog post explaining how we might be able to do this.
We ended up with the following code:
class BarRepository { def all: Seq[Bar] = Seq() def find(barId:String) : Bar = Bar(&amp;#34;myBar&amp;#34;) } class TrackService(barRepository:BarRepository) { def all : Seq[Bar] = { var bars = barRepository.</description>
    </item>
    
    <item>
      <title>Scala: Creating an Xml element with an optional attribute</title>
      <link>https://www.markhneedham.com/blog/2011/10/25/scala-creating-an-xml-element-with-an-optional-attribute/</link>
      <pubDate>Tue, 25 Oct 2011 20:38:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/25/scala-creating-an-xml-element-with-an-optional-attribute/</guid>
      <description>We have a lot of Xml in our application and one of the things that we need to do reasonably frequently in our test code is create elements which have optional attributes on them.
Our simple first approach looked like this:
def createElement(attribute: Option[String]) = if(attribute.isDefined) &amp;lt;p bar={attribute.get} /&amp;gt; else &amp;lt;p /&amp;gt; That works but it always seemed like we should be able to do it in a simpler way.</description>
    </item>
    
    <item>
      <title>Retrospective: The 5 whys</title>
      <link>https://www.markhneedham.com/blog/2011/10/24/retrospective-the-5-whys/</link>
      <pubDate>Mon, 24 Oct 2011 22:53:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/24/retrospective-the-5-whys/</guid>
      <description>Last week my colleague Pat Fornasier ran our team’s fortnightly retrospective and one of the exercises we did was &amp;#39;the 5 whys&amp;#39;.
I’ve always wanted to see how the 5 whys would pan out but could never see how you could fit it into a normal retrospective.
Pat was able to do this by using the data gathered by an earlier timeline exercise where the team had to plot the main events that had happened over the last 6 months.</description>
    </item>
    
    <item>
      <title>Learning Unix find: Searching in/Excluding certain folders</title>
      <link>https://www.markhneedham.com/blog/2011/10/21/learning-unix-find-searching-inexcluding-certain-folders/</link>
      <pubDate>Fri, 21 Oct 2011 21:25:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/21/learning-unix-find-searching-inexcluding-certain-folders/</guid>
      <description>I love playing around with commands on the Unix shell but one of the ones that I’ve found the most difficult to learn beyond the very basics is http://unixhelp.ed.ac.uk/CGI/man-cgi?find.
I think this is partially because I find the find man page quite difficult to read and partially because it’s usually quicker to work out how to solve my problem with a command I already know than to learn another one.</description>
    </item>
    
    <item>
      <title>Getting stuck and agile software teams</title>
      <link>https://www.markhneedham.com/blog/2011/10/20/getting-stuck-and-agile-software-teams/</link>
      <pubDate>Thu, 20 Oct 2011 22:09:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/20/getting-stuck-and-agile-software-teams/</guid>
      <description>I came across an interesting set of posts by Jeff Wofford where he talks about programmers getting stuck and it made me think that, despite its faults, agile software development does have some useful practices for stopping us getting stuck for too long.
Many of the examples that Jeff describes sound like yak shaving to me which is part of what makes programming fun but doesn’t always correlate to adding value to the product that you’re building.</description>
    </item>
    
    <item>
      <title>git: Only pushing some changes from local repository</title>
      <link>https://www.markhneedham.com/blog/2011/10/20/git-only-pushing-some-changes-from-local-repository/</link>
      <pubDate>Thu, 20 Oct 2011 06:50:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/20/git-only-pushing-some-changes-from-local-repository/</guid>
      <description>Something that we want to do reasonably frequently on my current project is to push some, but not all, of the changes committed to our local repository to master.
For example we might end up with 3 changes we haven’t pushed:
&amp;gt;&amp;gt; ~/github/local$ git status # On branch master # Your branch is ahead of &amp;#39;origin/master&amp;#39; by 3 commits. # nothing to commit (working directory clean) &amp;gt;&amp;gt; ~/github/local$ git hist * bb7b139 Thu, 20 Oct 2011 07:37:11 +0100 | mark: one last time (HEAD, master) [Mark Needham] * 1cef99a Thu, 20 Oct 2011 07:36:35 +0100 | mark:another new line [Mark Needham] * 850e105 Thu, 20 Oct 2011 07:36:01 +0100 | mark: new line [Mark Needham] * 2b25622 Thu, 20 Oct 2011 07:32:43 +0100 | mark: adding file for first time (origin/master) [Mark Needham] And we only want to push the commit with hash 850e105 for example.</description>
    </item>
    
    <item>
      <title>Unix: Some useful tools</title>
      <link>https://www.markhneedham.com/blog/2011/10/17/unix-some-useful-tools/</link>
      <pubDate>Mon, 17 Oct 2011 22:58:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/17/unix-some-useful-tools/</guid>
      <description>On my current project we regularly use a few Unix tools which aren’t on the standard installation so I thought I’d collate them here so I don’t forget about them in the future.
ghex We recently suspected we’d ended up with some rogue characters in a file that we weren’t able to detect in our normal text editor and wanted to view the byte-by-byte representation of the file to check it out.</description>
    </item>
    
    <item>
      <title>Bash: Reusing previous commands</title>
      <link>https://www.markhneedham.com/blog/2011/10/13/bash-reusing-previous-commands/</link>
      <pubDate>Thu, 13 Oct 2011 19:46:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/13/bash-reusing-previous-commands/</guid>
      <description>A lot of the time when I’m using the bash shell I want to re-use commands that I’ve previously entered and I’ve recently learnt some neat ways to do this from my colleagues Tom and Kief.
If we want to list the history of all the commands we’ve entered in a shell session then the following command does the trick:
&amp;gt; history ... 761 sudo port search pdfinfo 762 to_ipad andersen-phd-thesis.</description>
    </item>
    
    <item>
      <title>Unix: Getting the page count of a linearized PDF</title>
      <link>https://www.markhneedham.com/blog/2011/10/09/unix-getting-the-page-count-of-a-linearized-pdf/</link>
      <pubDate>Sun, 09 Oct 2011 11:34:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/09/unix-getting-the-page-count-of-a-linearized-pdf/</guid>
      <description>We were doing some work last week to rasterize a PDF document into a sequence of images and wanted to get a rough idea of how many pages we’d be dealing with if we created an image per page.
The PDFs we’re dealing with are linearized since they’re available for viewing on the web:
A LINEARIZED PDF FILE is one that has been organized in a special way to enable efficient incremental access in a network environment.</description>
    </item>
    
    <item>
      <title>Git: Getting the history of a deleted file</title>
      <link>https://www.markhneedham.com/blog/2011/10/04/git-getting-the-history-of-a-deleted-file/</link>
      <pubDate>Tue, 04 Oct 2011 22:33:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/10/04/git-getting-the-history-of-a-deleted-file/</guid>
      <description>We recently wanted to get the Git history of a file which we knew existed but had now been deleted so we could find out what had happened to it.
Using a simple git log didn’t work:
git log deletedFile.txt fatal: ambiguous argument &amp;#39;deletedFile.txt&amp;#39;: unknown revision or path not in the working tree. We eventually came across Francois Marier’s blog post which points out that you need to use the following command instead:</description>
    </item>
    
    <item>
      <title>Scala: Replacing a trait with a fake one for testing</title>
      <link>https://www.markhneedham.com/blog/2011/09/25/scala-replacing-a-trait-with-a-fake-one-for-testing/</link>
      <pubDate>Sun, 25 Sep 2011 10:24:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/25/scala-replacing-a-trait-with-a-fake-one-for-testing/</guid>
      <description>We recently wanted to replace a trait mixed into one of our classes with a fake version to make it easier to test but forgot how exactly to do that!
The class is roughly like this:
trait Foo { def foo : String = &amp;#34;real foo&amp;#34; } class Mark extends Foo {} We originally tried to replace it like this:
trait BrokenFakeFoo { def foo : String = &amp;#34;broken fake foo&amp;#34; } val m = new Mark with BrokenFakeFoo error: overriding method foo in trait Foo of type =&amp;gt; String; method foo in trait BrokenFakeFoo of type =&amp;gt; String needs `override&amp;#39; modifier val m = new Mark with BrokenFakeFoo If m compiled it would have two versions of foo but it wouldn’t know which one to use, hence the error message.</description>
    </item>
    
    <item>
      <title>jQuery: Collecting the results from a collection of asynchronous requests</title>
      <link>https://www.markhneedham.com/blog/2011/09/25/jquery-collecting-the-results-from-a-collection-of-asynchronous-requests/</link>
      <pubDate>Sun, 25 Sep 2011 09:26:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/25/jquery-collecting-the-results-from-a-collection-of-asynchronous-requests/</guid>
      <description>Liz and I recently spent some time building a pair stair to show how long ago people had paired with each other and one of the things we had to do was make AJAX requests to get the pairing data for each person and then collate it all to build the stair.
The original attempt to do this looked a bit like this:
var people = [&amp;#34;Marc&amp;#34;, &amp;#34;Liz&amp;#34;, &amp;#34;Ken&amp;#34;, &amp;#34;Duncan&amp;#34;, &amp;#34;Uday&amp;#34;, &amp;#34;Mark&amp;#34;, &amp;#34;Charles&amp;#34;]; var grid = []; $.</description>
    </item>
    
    <item>
      <title>Retrospectives: Getting overly focused on actions</title>
      <link>https://www.markhneedham.com/blog/2011/09/24/retrospectives-getting-overly-focused-on-actions/</link>
      <pubDate>Sat, 24 Sep 2011 06:56:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/24/retrospectives-getting-overly-focused-on-actions/</guid>
      <description>I’ve attended a lot of different retrospectives over the last few years and one thing that seems to happen quite frequently is that a problem will be raised and there is suddenly a massive urgency to find an action to match it.
As a result of this we don’t tend to go very deeply into working out why that problem happened in the first place and how we can stop it happening again.</description>
    </item>
    
    <item>
      <title>node.js: child_process.exec not returning all results</title>
      <link>https://www.markhneedham.com/blog/2011/09/22/node-js-child_process-exec-not-returning-all-results/</link>
      <pubDate>Thu, 22 Sep 2011 19:55:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/22/node-js-child_process-exec-not-returning-all-results/</guid>
      <description>I’ve been playing around with some node.js code to get each of the commits from our git repository but noticed that it didn’t seem to be returning me all the results.
I had the following code:
var exec = require(&amp;#39;child_process&amp;#39;).exec; var gitRepository = &amp;#39;/some/local/path&amp;#39;; exec(&amp;#39;cd &amp;#39; + gitRepository + &amp;#39; &amp;amp;&amp;amp; git log --pretty=format:&amp;#34;%H | %ad | %s%d&amp;#34; --date=raw &amp;#39;, function(error, stdout, stderror) { var commits = stdout.split(&amp;#34;\n&amp;#34;); // do some stuff with commits }); We have around 2000 commits in the repository but I was only getting back 1600 of them when I checked the length of commits.</description>
    </item>
    
    <item>
      <title>The &#39;window fixing&#39; wall</title>
      <link>https://www.markhneedham.com/blog/2011/09/20/the-window-fixing-wall/</link>
      <pubDate>Tue, 20 Sep 2011 06:49:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/20/the-window-fixing-wall/</guid>
      <description>On my current project we have a wall where we keep track of &amp;#39;window fixing&amp;#39; tasks - things that people want to fix in the code base but chose to defer until a later date.
Every now and then we take what’s on the wall and prioritise it according to Fabio Pereira’s effort/pain matrix so that we know which clean up tasks will provide the greatest value to the team.</description>
    </item>
    
    <item>
      <title>Scala: for comprehensions with Options</title>
      <link>https://www.markhneedham.com/blog/2011/09/15/scala-for-comprehensions-with-options/</link>
      <pubDate>Thu, 15 Sep 2011 22:21:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/15/scala-for-comprehensions-with-options/</guid>
      <description>I’ve generally avoided using for expressions in Scala because the keyword reminds me of for loops in Java/C# and I want to learn to program in a less imperative way.
After working with my colleague Mushtaq I realised that in some cases using for comprehensions can lead to much more readable code.
One interesting use case is when we want to create an object from a bunch of parameters that may or may not be set.</description>
    </item>
    
    <item>
      <title>Javascript: Internet Explorer 8 - trim() leads to &#39;Object doesn&#39;t support this property or method&#39; error</title>
      <link>https://www.markhneedham.com/blog/2011/09/13/javascript-internet-explorer-8-trim-leads-to-object-doesnt-support-this-property-or-method-error/</link>
      <pubDate>Tue, 13 Sep 2011 13:33:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/13/javascript-internet-explorer-8-trim-leads-to-object-doesnt-support-this-property-or-method-error/</guid>
      <description>We make use of the Javascript https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/String/trim function in our application but didn’t realise that it isn’t implemented by Internet Explorer until version 9.
This led to the following error on IE8 when we used it:
Message: Object doesn’t support this property or method Line: 18 Char: 13 Code: 0 URI: http://our.app/file.js
There’s a stackoverflow thread suggesting some different ways of implementing your own &amp;#39;trim()&amp;#39; method but since we’re using jQuery already we decided to just use the &amp;#39;$.</description>
    </item>
    
    <item>
      <title>gawk: Getting story numbers from git commit messages</title>
      <link>https://www.markhneedham.com/blog/2011/09/12/gawk-getting-story-numbers-from-git-commit-messages/</link>
      <pubDate>Mon, 12 Sep 2011 07:05:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/12/gawk-getting-story-numbers-from-git-commit-messages/</guid>
      <description>As I mentioned in my previous post I’ve been writing a little application to create graphs based on our git repository history and in one of them we wanted to try and create a graph showing which people had been working on which stories.
I needed a way to extract a story number from the git commit message and then store them all in a text file.
A typical commit with a story number in might look like this:</description>
    </item>
    
    <item>
      <title>Learning node.js: Step</title>
      <link>https://www.markhneedham.com/blog/2011/09/11/learning-node-js-step/</link>
      <pubDate>Sun, 11 Sep 2011 22:37:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/11/learning-node-js-step/</guid>
      <description>I’ve been playing around with node.js to generate some graphs from our git repository which effectively meant chaining together a bunch of shell commands to give me the repository data in the format I wanted.
I was able to do this by making use of http://nodejs.org/docs/v0.4.8/api/all.html#child_process.exec which comes with the core library.
The first version looked like this:
var exec = require(&amp;#39;child_process&amp;#39;).exec, _ = require(&amp;#34;underscore&amp;#34;); ... function parseCommitsFromRepository(fn) { var gitRepository = &amp;#34;/tmp/core&amp;#34;; var gitPlayArea = &amp;#34;/tmp/&amp;#34; + new Date().</description>
    </item>
    
    <item>
      <title>Learning Regular Expressions: Non capturing match</title>
      <link>https://www.markhneedham.com/blog/2011/09/07/learning-regular-expressions-non-capturing-match/</link>
      <pubDate>Wed, 07 Sep 2011 20:47:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/07/learning-regular-expressions-non-capturing-match/</guid>
      <description>I’ve been working my way slowly through the O’Reilly &amp;#39;Mastering Regular Expressions&amp;#39; book and recently read about the non capturing match operator which came in useful for some Git log parsing I’ve been doing.
On the project I’m working on we all commit as the same user and then put our names at the beginning of the commit message.
We wanted to try and find out the statistics of who’d been pairing with each other and therefore needed to extract the pairs from commits.</description>
    </item>
    
    <item>
      <title>Pair Programming: The disadvantages of 100% pairing</title>
      <link>https://www.markhneedham.com/blog/2011/09/06/pair-programming-the-disadvantages-of-100-pairing/</link>
      <pubDate>Tue, 06 Sep 2011 23:34:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/06/pair-programming-the-disadvantages-of-100-pairing/</guid>
      <description>I’ve written a lot of blog posts in the past about pair programming and the advantages that I’ve seen from using this technique but lately I find myself increasingly frustrated at the need to pair 100% of the time which happens on most teams I work on.
From my experience it’s certainly useful as a coaching tool and, as I’ve mentioned before, I think it’s very useful for increasing the amount of collaboration between team members and an excellent way of ensuring that knowledge of the code base is spread across the team.</description>
    </item>
    
    <item>
      <title>Parsing XML from the unix terminal/shell</title>
      <link>https://www.markhneedham.com/blog/2011/09/03/parsing-xml-from-the-unix-terminalshell/</link>
      <pubDate>Sat, 03 Sep 2011 23:42:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/09/03/parsing-xml-from-the-unix-terminalshell/</guid>
      <description>I spent a bit of time today trying to put together a quick script which would allow me to grab story numbers from the commits in our Git repository and then work out which functional areas those stories were in by querying mingle.
Therefore I wanted to make a curl request to mingle, pipe that result somewhere, and run an XPath expression to get my element.</description>
    </item>
    
    <item>
      <title>Coding: The value in finding the generic abstraction</title>
      <link>https://www.markhneedham.com/blog/2011/08/31/coding-the-value-in-finding-the-generic-abstraction/</link>
      <pubDate>Wed, 31 Aug 2011 06:49:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/08/31/coding-the-value-in-finding-the-generic-abstraction/</guid>
      <description>I recently worked on adding the meta data section for each of the different document types that our application serves, which involved showing 15-20 pieces of data for each document type.
There are around 4-5 document types and although the meta data for each document type is similar it’s not exactly the same!
When we got to the second document type it wasn’t obvious where the abstraction was so we went for the copy/paste approach to see if it would be any easier to see the commonality if we put the two templates side by side.</description>
    </item>
    
    <item>
      <title>The read-only database</title>
      <link>https://www.markhneedham.com/blog/2011/08/29/the-read-only-database/</link>
      <pubDate>Mon, 29 Aug 2011 23:32:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/08/29/the-read-only-database/</guid>
      <description>The last couple of applications I’ve worked on have had almost completely read-only databases where we had to populate the database in an offline process and then provide various ways for users to access the data.
This creates an interesting situation with respect to how we should setup our development environment.
Our normal setup would probably have an individual version of that database on every development machine and we would populate and then truncate the database during various test scenarios.</description>
    </item>
    
    <item>
      <title>Pain Driven Development</title>
      <link>https://www.markhneedham.com/blog/2011/08/21/pain-driven-development/</link>
      <pubDate>Sun, 21 Aug 2011 17:33:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/08/21/pain-driven-development/</guid>
      <description>My colleague Pat Fornasier has been using an interesting spin on the idea of making decisions at the last responsible moment by encouraging our team to &amp;#39;feel the pain&amp;#39; before introducing any constraint in our application.
These are some of the decisions which we’ve been delaying/are still delaying:
Dependency Injection Everyone in our team comes from a Java/C# background and one of the first technical decisions that gets made on applications in those languages is which dependency injection container to use.</description>
    </item>
    
    <item>
      <title>node.js: Building a graph of build times using the Go API</title>
      <link>https://www.markhneedham.com/blog/2011/08/13/node-js-building-a-graph-of-build-times-using-the-go-api/</link>
      <pubDate>Sat, 13 Aug 2011 14:52:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/08/13/node-js-building-a-graph-of-build-times-using-the-go-api/</guid>
      <description>I’ve been playing around with node.js again and one thing that I wanted to do was take a CSV file generated by the Go API and extract the build times so that we could display it on a graph.
Since I don’t have a Go instance on my machine I created a URL in my node application which would mimic the API and return a CSV file.
I’m using the express web framework to take care of some of the plumbing:</description>
    </item>
    
    <item>
      <title>Scala: Do modifiers on functions really matter?</title>
      <link>https://www.markhneedham.com/blog/2011/08/13/scala-do-modifiers-on-functions-really-matter/</link>
      <pubDate>Sat, 13 Aug 2011 02:10:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/08/13/scala-do-modifiers-on-functions-really-matter/</guid>
      <description>A couple of colleagues and I were having an interesting discussion this afternoon about the visibility of functions which are mixed into an object from a trait.
The trait in question looks like this:
trait Formatting { def formatBytes(bytes: Long): Long = { math.round(bytes.toDouble / 1024) } } And is mixed into various objects which need to display the size of a file in kB like this:
class SomeObject extends Formatting { } By mixing that function into SomeObject any of the clients of SomeObject would now be able to call that function and transform a bytes value of their own!</description>
    </item>
    
    <item>
      <title>Scala, WebDriver and the Page Object Pattern</title>
      <link>https://www.markhneedham.com/blog/2011/08/09/scala-webdriver-and-the-page-object-pattern/</link>
      <pubDate>Tue, 09 Aug 2011 00:54:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/08/09/scala-webdriver-and-the-page-object-pattern/</guid>
      <description>We’re using WebDriver on my project to automate our functional tests and as a result are using the Page Object pattern to encapsulate each page of the application in our tests.
We’ve been trying to work out how to effectively reuse code since some of the pages have parts of them which work exactly the same as another page.
For example we had a test similar to this…​
class FooPageTests extends Spec with ShouldMatchers with FooPageSteps { it(&amp;#34;is my dummy test&amp;#34;) { .</description>
    </item>
    
    <item>
      <title>Clojure: Getting caught out by lazy collections</title>
      <link>https://www.markhneedham.com/blog/2011/07/31/clojure-getting-caught-out-by-lazy-collections/</link>
      <pubDate>Sun, 31 Jul 2011 21:40:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/31/clojure-getting-caught-out-by-lazy-collections/</guid>
      <description>Most of the work that I’ve done with Clojure has involved running a bunch of functions directly in the REPL or through Leiningen’s run target which led to me getting caught out when I created a JAR and tried to run that.
As I mentioned a few weeks ago I’ve been rewriting part of our system in Clojure to see how the design would differ and a couple of levels down the Clojure version consists of applying a map function over a collection of documents.</description>
    </item>
    
    <item>
      <title>Performance tuning our data import: Gather precise data</title>
      <link>https://www.markhneedham.com/blog/2011/07/29/performance-tuning-our-data-import-gather-precise-data/</link>
      <pubDate>Fri, 29 Jul 2011 01:34:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/29/performance-tuning-our-data-import-gather-precise-data/</guid>
      <description>One of the interesting problems that we have to solve on my current project is working out how to import a few million XML documents into our database in a reasonable amount of time.
The stages of the import process are as follows:
Extract a bunch of ZIP files to the disc
Process only the XML documents...
Load the XML document and determine whether the document is valid to import</description>
    </item>
    
    <item>
      <title>Unix: Summing the total time from a log file</title>
      <link>https://www.markhneedham.com/blog/2011/07/27/unix-summing-the-total-time-from-a-log-file/</link>
      <pubDate>Wed, 27 Jul 2011 23:02:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/27/unix-summing-the-total-time-from-a-log-file/</guid>
<description>As I mentioned in my last post we’ve been doing some profiling of a data ingestion job and as a result have been putting some logging into our code to try and work out where we need to focus our efforts.
We end up with a log file peppered with different statements which looks a bit like the following:
18:50:08.086 [akka:event-driven:dispatcher:global-5] DEBUG - Imported document. /Users/mneedham/foo.xml in: 1298 18:50:09.064 [akka:event-driven:dispatcher:global-1] DEBUG - Imported document.</description>
    </item>
    
    <item>
      <title>A crude way of telling if a remote machine is a VM</title>
      <link>https://www.markhneedham.com/blog/2011/07/27/a-crude-way-of-telling-if-a-remote-machine-is-a-vm/</link>
      <pubDate>Wed, 27 Jul 2011 22:31:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/27/a-crude-way-of-telling-if-a-remote-machine-is-a-vm/</guid>
      <description>We were doing a bit of profiling of a data importing process we’ve been running across various environments and wanted to check whether or not one of the environments was a physical machine or a VM.
A bit of googling first led me to the following site where you can fill in a MAC address and it will tell you which vendor it belongs to.
macvendorlookup.com is even better though because it’s more easily scriptable!</description>
    </item>
    
    <item>
      <title>Scala: Prettifying test builders with package object</title>
      <link>https://www.markhneedham.com/blog/2011/07/26/scala-prettifying-test-builders-with-package-object/</link>
      <pubDate>Tue, 26 Jul 2011 22:31:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/26/scala-prettifying-test-builders-with-package-object/</guid>
      <description>We have several different test builders in our code base which look roughly like this:
case class FooBuilder(bar : String, baz : String) { def build = new Foo(bar, baz) } In our tests we originally used them like this:
class FooPageTest extends Specs with ShouldMatchers { it(&amp;#34;should let us load a foo&amp;#34;) { when(databaseHas(FooBuilder(bar = &amp;#34;Bar&amp;#34;, baz = &amp;#34;Bazz&amp;#34;))) // and so on... } } This works well but we wanted our tests to only contain domain language and no implementation details.</description>
    </item>
    
    <item>
      <title>Retrospectives: The 4 L&#39;s Retrospective</title>
      <link>https://www.markhneedham.com/blog/2011/07/25/retrospectives-the-4-ls-retrospective/</link>
      <pubDate>Mon, 25 Jul 2011 21:00:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/25/retrospectives-the-4-ls-retrospective/</guid>
      <description>I facilitated the latest retrospective my team had last week and decided to try The 4 L’s technique which I’d come across while browsing the &amp;#39;retrospectives&amp;#39; tag on del.icio.us.
We had 4 posters around the room representing each of the L’s:
Liked
Learned
Lacked
Longed for
I’m not really a fan of the majority of a retrospective being dominated by a full group discussion as many people aren’t comfortable giving their opinions to that many people and therefore end up not participating at all.</description>
    </item>
    
    <item>
      <title>Scala: Making it easier to abstract code</title>
      <link>https://www.markhneedham.com/blog/2011/07/23/scala-making-it-easier-to-abstract-code/</link>
      <pubDate>Sat, 23 Jul 2011 12:05:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/23/scala-making-it-easier-to-abstract-code/</guid>
      <description>A couple of months ago I attended Michael Feathers&amp;#39; &amp;#39;Brutal Refactoring&amp;#39; workshop at XP 2011 where he opined that developers generally do the easiest thing when it comes to code bases.
More often than not this means adding to an existing method or existing class rather than finding the correct place to put the behaviour that they want to add.
Something interesting that I’ve noticed on the project I’m working on is that so far we haven’t been seeing the same trend.</description>
    </item>
    
    <item>
      <title>Scala: Companion Objects</title>
      <link>https://www.markhneedham.com/blog/2011/07/23/scala-companion-objects/</link>
      <pubDate>Sat, 23 Jul 2011 11:57:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/23/scala-companion-objects/</guid>
      <description>One of the language features available to us in Scala which I think is having a big impact in helping us to make our code base easier to follow is the companion object.
We’ve been using companion objects quite liberally in our code base to define factory methods for our classes.
As I mentioned in a previous post a lot of our objects are acting as wrappers around XML documents and we’ve been pushing some of the data extraction from the XML into companion objects so that our classes can take in non XML values.</description>
    </item>
    
    <item>
      <title>Clojure: Creating XML document with namespaces</title>
      <link>https://www.markhneedham.com/blog/2011/07/20/clojure-creating-xml-document-with-namespaces/</link>
      <pubDate>Wed, 20 Jul 2011 20:28:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/20/clojure-creating-xml-document-with-namespaces/</guid>
      <description>As I mentioned in an earlier post we’ve been parsing XML documents with the Clojure zip-filter API and the next thing we needed to do was create a new XML document containing elements which needed to be inside a namespace.
We wanted to end up with a document which looked something like this:
&amp;lt;root&amp;gt; &amp;lt;mynamespace:foo xmlns:mynamespace=&amp;#34;http://www.magicalurlfornamespace.com&amp;#34;&amp;gt; &amp;lt;mynamespace:bar&amp;gt;baz&amp;lt;/mynamespace:bar&amp;gt; &amp;lt;/mynamespace:foo&amp;gt; &amp;lt;/root&amp;gt; We can make use of lazy-xml/emit to output an XML string from some sort of input?</description>
    </item>
    
    <item>
      <title>Scala: Rolling with implicit</title>
      <link>https://www.markhneedham.com/blog/2011/07/19/scala-rolling-with-implicit/</link>
      <pubDate>Tue, 19 Jul 2011 06:39:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/19/scala-rolling-with-implicit/</guid>
<description>We’ve been coding in Scala on my project for around 6 weeks now and are getting to the stage where we’re probably becoming a bit dangerous with our desire to try out some of the language features.
One that we’re trying out at the moment is the implicit keyword which allows you to pass arguments to objects and methods without explicitly defining them in the parameter list.
The website we’re working on needs to be accessible in multiple languages and therefore we need to be able to translate some words before they get displayed on the page.</description>
    </item>
    
    <item>
      <title>Emacs: Re-mapping the Control and Meta Keys on Mac OS X</title>
      <link>https://www.markhneedham.com/blog/2011/07/17/emacs-re-mapping-the-control-and-meta-keys-on-mac-os-x/</link>
      <pubDate>Sun, 17 Jul 2011 10:24:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/17/emacs-re-mapping-the-control-and-meta-keys-on-mac-os-x/</guid>
<description>Since I’ve started playing around with Clojure again I thought it’d make sense to use emacs as my editor and therefore needed to work out how to remap the Ctrl and Meta keys to ones which are more accessible on the MBP’s keyboard.
I’ve found that I like using the Caps Lock for Ctrl and that’s reasonably easy to change by navigating to &amp;#39;System Preferences &amp;gt; Keyboard &amp;gt; Modifier Keys&amp;#39;:</description>
    </item>
    
    <item>
      <title>Clojure: Extracting child elements from an XML document with zip-filter</title>
      <link>https://www.markhneedham.com/blog/2011/07/16/clojure-extracting-child-elements-from-an-xml-document-with-zip-filter/</link>
      <pubDate>Sat, 16 Jul 2011 22:19:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/16/clojure-extracting-child-elements-from-an-xml-document-with-zip-filter/</guid>
      <description>I’ve been following Nurullah Akkaya’s blog post about navigating XML documents using the Clojure zip-filter API and I came across an interesting problem in a document I’m parsing which goes beyond what’s covered in his post.
Nurullah provides a neat zip-str function which we can use to convert an XML string into a zipper object:
(require &amp;#39;[clojure.zip :as zip] &amp;#39;[clojure.xml :as xml]) (use &amp;#39;[clojure.contrib.zip-filter.xml]) (defn zip-str [s] (zip/xml-zip (xml/parse (java.</description>
    </item>
    
    <item>
      <title>Scala: An attempt to eradicate the if</title>
      <link>https://www.markhneedham.com/blog/2011/07/12/scala-an-attempt-to-eradicate-the-if/</link>
      <pubDate>Tue, 12 Jul 2011 22:50:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/12/scala-an-attempt-to-eradicate-the-if/</guid>
      <description>In a previous post I included a code sample where we were formatting a page range differently depending on whether the start page and end pages were the same.
The code looked like this:
trait PageAware { def startPage:String def endPage:String def pageRange = if(startPage == endPage) &amp;#34;page %s&amp;#34;.format(startPage) else &amp;#34;pages %s-%s&amp;#34;.format(startPage, endPage) } Looking at the if statement on the last line we were curious whether it would be possible to get rid of it and replace it with something else.</description>
    </item>
    
    <item>
      <title>Scala: Pattern matching a pair inside map/filter</title>
      <link>https://www.markhneedham.com/blog/2011/07/12/scala-pattern-matching-a-pair-inside-mapfilter/</link>
      <pubDate>Tue, 12 Jul 2011 22:42:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/12/scala-pattern-matching-a-pair-inside-mapfilter/</guid>
      <description>More than a few times recently we’ve wanted to use pattern matching on a collection of pairs/tuples and have run into trouble doing so.
It’s easy enough if you don’t try and pattern match:
&amp;gt; List((&amp;#34;Mark&amp;#34;, 4), (&amp;#34;Charles&amp;#34;, 5)).filter(pair =&amp;gt; pair._2 == 4) res6: List[(java.lang.String, Int)] = List((Mark,4)) But if we try to use pattern matching:
List((&amp;#34;Mark&amp;#34;, 4), (&amp;#34;Charles&amp;#34;, 5)).filter(case(name, number) =&amp;gt; number == 4) We end up with this error:</description>
    </item>
    
    <item>
      <title>Clojure: Language as thought shaper</title>
      <link>https://www.markhneedham.com/blog/2011/07/10/clojure-language-as-thought-shaper/</link>
      <pubDate>Sun, 10 Jul 2011 22:21:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/10/clojure-language-as-thought-shaper/</guid>
      <description>I recently read an interesting article by Tom Van Cutsem where he describes some of the goals that influence the design of programming languages and one which stood out to me is that of viewing &amp;#39;language as a thought shaper&amp;#39;:
Language as thought shaper: to induce a paradigm shift in how one should structure software (changing the &amp;#34;path of least resistance&amp;#34;).
To quote Alan Perlis: &amp;#34;a language that doesn’t affect the way you think about programming, is not worth knowing.</description>
    </item>
    
    <item>
      <title>Scala: Traits galore</title>
      <link>https://www.markhneedham.com/blog/2011/07/09/scala-traits-galore/</link>
      <pubDate>Sat, 09 Jul 2011 19:54:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/09/scala-traits-galore/</guid>
      <description>We recently came across a problem where we had some logic that we wanted to be used by two classes.
Our original thought was to pull it up into an abstract class which ended up looking like this:
abstract class SomeArbitraryClass(root:xml.Node) { def unrelatedField1:String def unrelatedField2:String def startPage:String def endPage:String def pageRange = if(startPage == endPage) &amp;#34;page %s&amp;#34;.format(startPage) else &amp;#34;pages %s-%s&amp;#34;.format(startPage, endPage) } Writing a ScalaTest test for the page logic helped us to see more clearly that the design was a bit awkward:</description>
    </item>
    
    <item>
      <title>Scala: Martin Odersky&#39;s Object-oriented meets functional: An exploration of Scala</title>
      <link>https://www.markhneedham.com/blog/2011/07/05/scala-martin-oderskys-object-oriented-meets-functional-an-exploration-of-scala/</link>
      <pubDate>Tue, 05 Jul 2011 05:02:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/05/scala-martin-oderskys-object-oriented-meets-functional-an-exploration-of-scala/</guid>
      <description>My colleague Charles and I attended Martin Odersky’s &amp;#39;Object-oriented meets functional: An exploration of Scala&amp;#39; two day Scala workshop hosted by Skills Matter at the end of last week.
It was run by Iulian Dragos, who wrote his PhD thesis on how to improve the performance of the Scala compiler.
The course was adapted a bit from the original in that it came at Scala more from an application developer’s point of view than that of a language geek.</description>
    </item>
    
    <item>
      <title>Clojure: Equivalent to Scala&#39;s flatMap/C#&#39;s SelectMany</title>
      <link>https://www.markhneedham.com/blog/2011/07/03/clojure-equivalent-to-scalas-flatmapcs-selectmany/</link>
      <pubDate>Sun, 03 Jul 2011 22:50:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/07/03/clojure-equivalent-to-scalas-flatmapcs-selectmany/</guid>
      <description>I’ve been playing around with Clojure a bit over the weekend and one thing I got stuck with was working out how to achieve the functionality provided by Scala’s flatMap or C#&amp;#39;s SelectMany methods on collections.
I had a collection of zip files and wanted to transform that into a collection of all the file entries in those files.
If we just use map then we’ll end up with a collection of collections which is more difficult to deal with going forward.</description>
    </item>
    
    <item>
      <title>Git: Deleting a remote branch on a gitolite configured repository</title>
      <link>https://www.markhneedham.com/blog/2011/06/28/git-deleting-a-remote-branch-on-a-gitolite-configured-repository/</link>
      <pubDate>Tue, 28 Jun 2011 22:09:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/28/git-deleting-a-remote-branch-on-a-gitolite-configured-repository/</guid>
      <description>We’ve had an xsbt branch on our gitolite powered repository for the last couple of weeks while we worked out how to move our build from sbt 0.7 to sbt 0.10 but having finally done that we needed to delete it.
I originally tried running the following command from one of our developer workstations:
git push origin :xsbt But ended up with the following error:
remote: error: denying ref deletion for refs/heads/xsbt !</description>
    </item>
    
    <item>
      <title>Scala: Self type annotations and structured types</title>
      <link>https://www.markhneedham.com/blog/2011/06/27/scala-self-type-annotations-and-structured-types/</link>
      <pubDate>Mon, 27 Jun 2011 23:21:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/27/scala-self-type-annotations-and-structured-types/</guid>
<description>A few days ago I tweeted that I didn’t really see the point in structural types in Scala…​
Not sure I understand where you would use structural types in #scala instead of defining a method on a trait http://bit.ly/jgiW7b
…​but today my colleague Uday came up with a cool way of combining self type annotations with structural types inside a trait we defined.
We had some code duplicated across two classes which looked roughly like this:</description>
    </item>
    
    <item>
      <title>Bounded Rationality</title>
      <link>https://www.markhneedham.com/blog/2011/06/26/bounded-rationality/</link>
      <pubDate>Sun, 26 Jun 2011 17:05:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/26/bounded-rationality/</guid>
<description>In &amp;#39;Thinking In Systems: A Primer&amp;#39; one of the most interesting ideas that Donella Meadows describes is what Herbert Simon coined bounded rationality:
Bounded rationality means that people make quite reasonable decisions based on the information they have. But they don’t have perfect information, especially about more distant parts of the system
Later on in the chapter the following idea is suggested:
If you become a manager, you probably will stop seeing labour as a deserving partner in production, and start seeing it as a cost to be minimised.</description>
    </item>
    
    <item>
      <title>Coding: Light weight wrapper vs serialisation/deserialisation</title>
      <link>https://www.markhneedham.com/blog/2011/06/26/coding-light-weight-wrapper-vs-serialisationdeserialisation/</link>
      <pubDate>Sun, 26 Jun 2011 13:58:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/26/coding-light-weight-wrapper-vs-serialisationdeserialisation/</guid>
      <description>As I’ve mentioned before, we’re making use of a MarkLogic database on the project I’m working on which means that we’re getting quite big XML data structures coming into our application whenever we execute a query.
The normal way that I’ve seen for dealing with external systems would be to create an anti-corruption layer where we initialise objects in our system with the required data from the external system.</description>
    </item>
    
    <item>
      <title>Tech Leading: Keeping the passion</title>
      <link>https://www.markhneedham.com/blog/2011/06/22/tech-leading-keeping-the-passion/</link>
      <pubDate>Wed, 22 Jun 2011 23:48:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/22/tech-leading-keeping-the-passion/</guid>
      <description>As I mentioned a couple of months ago, while I was in India I was acting as the Tech Lead on the project the TWU grads were working on and one thing I learnt from doing that is the importance of trying to keep the passion of the developers on the team.
When we started off I was more focused on encouraging the team to develop as many of the stories as possible.</description>
    </item>
    
    <item>
      <title>Scala: val, lazy val and def</title>
      <link>https://www.markhneedham.com/blog/2011/06/22/scala-val-lazy-val-and-def/</link>
      <pubDate>Wed, 22 Jun 2011 23:04:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/22/scala-val-lazy-val-and-def/</guid>
<description>We have a variety of val, lazy val and def definitions across our code base but have been led to believe that idiomatic Scala would have us using lazy val as frequently as possible.
As far as I understand so far this is what the different things do:
val evaluates as soon as you initialise the object and stores the result.
lazy val evaluates the first time that it’s accessed and stores the result.</description>
    </item>
    
    <item>
      <title>Scala/Mustache: Creating a comma separated list</title>
      <link>https://www.markhneedham.com/blog/2011/06/22/scalamustache-creating-a-comma-separated-list/</link>
      <pubDate>Wed, 22 Jun 2011 21:24:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/22/scalamustache-creating-a-comma-separated-list/</guid>
      <description>We’re using the Mustache templating engine on my project at the moment and one thing that we wanted to do was build a comma separated list.
Mustache is designed so that you pretty much can’t do any logic in the template which made it really difficult to do what we wanted.
It’s easy enough to get a comma after each item in a list with something like the following code:</description>
    </item>
    
    <item>
      <title>MarkLogic: Customising a result set</title>
      <link>https://www.markhneedham.com/blog/2011/06/20/marklogic-customising-a-result-set/</link>
      <pubDate>Mon, 20 Jun 2011 22:36:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/20/marklogic-customising-a-result-set/</guid>
<description>One of the stories we worked on last week required us to customise the output of a MarkLogic search query to include some elements which aren’t part of the default view.
We started off with this:
search.xqy
xquery version &amp;#34;1.0-ml&amp;#34;; import module namespace search = &amp;#34;http://marklogic.com/appservices/search&amp;#34; at &amp;#34;/MarkLogic/appservices/search/search.xqy&amp;#34;; declare variable $term as xs:string := xdmp:get-request-field(&amp;#34;query&amp;#34;, &amp;#34;&amp;#34;); search:search($term) Which gives us back a list of results showing where in the documents the search term appeared.</description>
    </item>
    
    <item>
      <title>Chef, Fedora and &#39;ArgumentError: Attribute domain is not defined!&#39;</title>
      <link>https://www.markhneedham.com/blog/2011/06/18/chef-fedora-and-argumenterror-attribute-domain-is-not-defined/</link>
      <pubDate>Sat, 18 Jun 2011 18:45:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/18/chef-fedora-and-argumenterror-attribute-domain-is-not-defined/</guid>
      <description>I’ve been playing around with Chef Solo on Fedora and executing the following:
sudo chef-solo -c config/solo.rb -j config/node.json (where node.json just contains the example code from the resolver example on the Chef documentation page and the cookbooks folder contains all the opscode cookbooks.)
leads to the following error:
... ERROR: Running exception handlers ERROR: Exception handlers complete FATAL: Stacktrace dumped to /home/mark/chef-solo/chef-stacktrace.out FATAL: ArgumentError: Attribute domain is not defined!</description>
    </item>
    
    <item>
      <title>MarkLogic: Deleting all the documents in a database</title>
      <link>https://www.markhneedham.com/blog/2011/06/18/marklogic-deleting-all-the-documents-in-a-database/</link>
      <pubDate>Sat, 18 Jun 2011 16:08:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/18/marklogic-deleting-all-the-documents-in-a-database/</guid>
      <description>We’re using the MarkLogic database on my current project and something that we wanted to do recently was delete all the documents as part of a deployment script.
Getting all of the documents is reasonably easy - we just need to make a call to the doc() function.
We can then iterate through the documents like so:
for $doc in doc() return $doc We wanted to make use of the http://docs.</description>
    </item>
    
    <item>
      <title>Fedora: Recovering from the IntelliJ &#39;Ctrl-Alt-F7&#39;</title>
      <link>https://www.markhneedham.com/blog/2011/06/16/fedora-recovering-from-the-intellij-ctrl-alt-f7/</link>
      <pubDate>Thu, 16 Jun 2011 07:27:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/16/fedora-recovering-from-the-intellij-ctrl-alt-f7/</guid>
<description>We’re using Fedora on our local developer workstations and some of the default key bindings of the operating system seem to conflict with ones provided by IntelliJ IDEA.
One particularly amusing one is &amp;#39;Ctrl-Alt-F7&amp;#39;, which you use in IntelliJ to see the usages of a piece of code.
In Fedora that seems to switch into a different X Server session and you just see a blank screen with seemingly no way out!</description>
    </item>
    
    <item>
      <title>Parkinson&#39;s Law and Iteration Zero</title>
      <link>https://www.markhneedham.com/blog/2011/06/13/parkinsons-law-and-iteration-zero/</link>
      <pubDate>Mon, 13 Jun 2011 23:02:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/13/parkinsons-law-and-iteration-zero/</guid>
<description>I’ve been thinking a bit about Parkinson’s Law recently and its applicability in software development.
Parkinson’s law is defined as follows:
Parkinson’s Law is the adage first articulated by Cyril Northcote Parkinson as the first sentence of a humorous essay published in The Economist in 1955: “Work expands so as to fill the time available for its completion”
My colleagues quite frequently reference this law with respect to stories taking the amount of time that reflects the story point estimate assigned to them.</description>
    </item>
    
    <item>
      <title>Scala: Setting a default value</title>
      <link>https://www.markhneedham.com/blog/2011/06/12/scala-setting-a-default-value/</link>
      <pubDate>Sun, 12 Jun 2011 16:03:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/12/scala-setting-a-default-value/</guid>
<description>We wanted to generate a build label to use as the name of the artifacts archive created each time we run the build, defaulting to a hard coded value if the system property representing the build label wasn’t available.
In Ruby we would be able to do something like this:
buildLabel = ENV[&amp;#34;GO_PIPELINE_LABEL&amp;#34;] || &amp;#34;LOCAL&amp;#34; There isn’t a function in Scala that does that so we initially ended up with this:</description>
    </item>
    
    <item>
      <title>Sbt: Rolling with continuous/incremental compilation and Jetty</title>
      <link>https://www.markhneedham.com/blog/2011/06/10/sbt-rolling-with-continuousincremental-compilation-and-jetty/</link>
      <pubDate>Fri, 10 Jun 2011 00:16:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/10/sbt-rolling-with-continuousincremental-compilation-and-jetty/</guid>
<description>As I mentioned in an earlier post we’re using SBT on our project and one of its cool features is that it will listen to the source directory and then automatically recompile the code when it detects file changes.
We’ve also installed the sbt-jetty-embed plugin which allows us to create a war which has Jetty embedded so that we can keep our application containerless.
That plugin adds an action called &amp;#39;jetty&amp;#39; to sbt so we (foolishly in hindsight) thought that we would be able to launch the application in triggered execution mode by making use of a ~ in front of that:</description>
    </item>
    
    <item>
      <title>IntelliJ: Adding resources with unusual extensions onto the classpath</title>
      <link>https://www.markhneedham.com/blog/2011/06/09/intellij-adding-resources-with-unusual-extensions-onto-the-classpath/</link>
      <pubDate>Thu, 09 Jun 2011 23:10:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/09/intellij-adding-resources-with-unusual-extensions-onto-the-classpath/</guid>
<description>We’re making use of MarkLogic and therefore XQuery on the project I’m currently working on and recently wanted to add our XQuery setup files onto the classpath so they could be used in a test.
We added them into &amp;#39;src/main/resources&amp;#39; and set that as a source path in IntelliJ assuming that was all we needed to do.
Despite doing that our test kept failing because it couldn’t locate the files on the classpath.</description>
    </item>
    
    <item>
      <title>Sbt: Zipping files without their directory structure</title>
      <link>https://www.markhneedham.com/blog/2011/06/04/sbt-zipping-files-without-their-directory-structure/</link>
      <pubDate>Sat, 04 Jun 2011 17:24:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/06/04/sbt-zipping-files-without-their-directory-structure/</guid>
      <description>We’re using SBT on our project and Pat and I have been trying to work out how to zip together some artifacts so that they’re all available from the top level of the zip file i.e. we don’t want to copy the directory structure where the files come from.
I’ve been playing around with this in the Scala REPL which we can launch with our project’s dependencies loaded with the following command:</description>
    </item>
    
    <item>
      <title>Developer Experience (#devexp) and the 5 minute experience</title>
      <link>https://www.markhneedham.com/blog/2011/05/31/developer-experience-devexp-and-the-5-minute-experience/</link>
      <pubDate>Tue, 31 May 2011 21:29:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/31/developer-experience-devexp-and-the-5-minute-experience/</guid>
      <description>My former colleague Ade Oshineye recently linked me to a post he’s written about Developer Experience (#devexp) which is described as:
[...] an aspirational movement that seeks to apply the techniques of User Experience (UX) professionals to the tools and services that we offer to developers.&amp;#34;
I think it’s quite an interesting idea and I particularly like two of the ideas suggested:
2. Focus on the &amp;#39;5 minute Out Of Box experience&amp;#39; The idea here is that if you provide a library, developers should be able to go from downloading to &amp;#34;Hello World&amp;#34; in 5 minutes.</description>
    </item>
    
    <item>
      <title>XP 2011: How complex is software?</title>
      <link>https://www.markhneedham.com/blog/2011/05/19/xp-2011-how-complex-is-software/</link>
      <pubDate>Thu, 19 May 2011 09:44:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/19/xp-2011-how-complex-is-software/</guid>
      <description>The last session I attended at XP 2011 was a workshop run by John Mcfadyen where he introduced us to Dave Snowden’s Cynefin model, which is a model used to describe problems, situations and systems.
I’d come across the model previously and it had been all over my twitter stream a couple of weeks ago as a result of Dave Snowden giving a key note at the Lean Systems and Software conference.</description>
    </item>
    
    <item>
      <title>In what world does that make sense</title>
      <link>https://www.markhneedham.com/blog/2011/05/14/in-what-world-does-that-make-sense/</link>
      <pubDate>Sat, 14 May 2011 21:12:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/14/in-what-world-does-that-make-sense/</guid>
      <description>In her keynote at XP 2011 Esther Derby encouraged us to ask the question &amp;#34;in what world does that make sense?&amp;#34; whenever we encounter something which we consider to be stupid or ridiculous.
I didn’t think much of it at the time but my colleague Pat Kua has been asking me the question whenever I’ve been describing something that I find confusing to him.
After about the third time I noticed that it’s quite a nice tool for getting us to reflect on the systems and feedback loops that may be encouraging the behaviour witnessed.</description>
    </item>
    
    <item>
      <title>System Traps: Rule Beating</title>
      <link>https://www.markhneedham.com/blog/2011/05/14/system-traps-rule-beating/</link>
      <pubDate>Sat, 14 May 2011 21:02:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/14/system-traps-rule-beating/</guid>
      <description>In &amp;#39;Thinking In Systems&amp;#39; section five focuses on systems which produce &amp;#34;truly problematic behaviour&amp;#34;, and one of these so-called system traps is known as &amp;#39;rule beating&amp;#39;.
Rule beating occurs when the agents in a system take evasive action to get around the intent of rules in a system:
The letter of the law is met, the spirit of the law is not.
A common system where we see this in organisations is around training budgets.</description>
    </item>
    
    <item>
      <title>XP 2011: Esther Derby - Still no silver bullets</title>
      <link>https://www.markhneedham.com/blog/2011/05/13/xp-2011-esther-derby-still-no-silver-bullets/</link>
      <pubDate>Fri, 13 May 2011 12:26:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/13/xp-2011-esther-derby-still-no-silver-bullets/</guid>
      <description>The first keynote at XP 2011 was one given by Esther Derby titled &amp;#39;Still no silver bullets&amp;#39; where she talked about some of the reasons why agile adoption seems to work in the small but often fails in the large.
Esther quoted Donella Meadows, the author of &amp;#39;Thinking in Systems&amp;#39;, a few times which was an interesting coincidence for me as I’m currently reading her book.
One of the first quotes from that book was the following:</description>
    </item>
    
    <item>
      <title>XP 2011: Michael Feathers - Brutal Refactoring</title>
      <link>https://www.markhneedham.com/blog/2011/05/11/xp-2011-michael-feathers-brutal-refactoring/</link>
      <pubDate>Wed, 11 May 2011 13:35:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/11/xp-2011-michael-feathers-brutal-refactoring/</guid>
      <description>The second session that I attended at XP 2011 was Michael Feathers&amp;#39; tutorial &amp;#39;Brutal Refactoring&amp;#39; where he talked through some of the things that he’s learned since he finished writing &amp;#39;Working Effectively With Legacy Code&amp;#39;.
I’ve found some of Michael’s recent blog posts about analysing the data in our code repositories quite interesting to read and part of this tutorial was based on the research he’s done in that area.</description>
    </item>
    
    <item>
      <title>Feedback: In public</title>
      <link>https://www.markhneedham.com/blog/2011/05/11/feedback-in-public/</link>
      <pubDate>Wed, 11 May 2011 12:12:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/11/feedback-in-public/</guid>
      <description>One of the areas that I covered during a session I ran at XP 2011 on making feedback work in teams was the idea of giving feedback in public.
The general consensus seems to be that giving feedback in public isn’t a good idea and it’d be much more effective to give that feedback privately.
I think this is a good rule of thumb and my observations are that feedback given in public tends to not be given in a very constructive manner and therefore leads to a defensive response from the recipient.</description>
    </item>
    
    <item>
      <title>XP 2011: J.B. Rainsberger - A Simple Approach to Modular Design</title>
      <link>https://www.markhneedham.com/blog/2011/05/11/xp-2011-j-b-rainsberger-a-simple-approach-to-modular-design/</link>
      <pubDate>Wed, 11 May 2011 12:11:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/11/xp-2011-j-b-rainsberger-a-simple-approach-to-modular-design/</guid>
      <description>After finishing my own session at XP 2011 I attended the second half of J.B. Rainsberger’s tutorial on modular design.
For most of the time that I was there he drove out the design for a point of sale system in Java while showing how architectural patterns can emerge in the code just by focusing on improving names and removing duplication.
The second half of the session was much more interesting to watch as this was when J.</description>
    </item>
    
    <item>
      <title>Discussing the Undiscussable: Book Review</title>
      <link>https://www.markhneedham.com/blog/2011/05/07/discussing-the-undiscussable-book-review/</link>
      <pubDate>Sat, 07 May 2011 00:45:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/07/discussing-the-undiscussable-book-review/</guid>
      <description>I came across the work of Chris Argyris at the start of the year and in a twitter conversation with Benjamin Mitchell he suggested that Bill Noonan’s &amp;#39;Discussing the Undiscussable&amp;#39; was the most accessible text for someone new to the subject.
In the book Noonan runs through a series of different tools that Chris Argyris originally came up with for helping people to handle difficult conversational situations more effectively.</description>
    </item>
    
    <item>
      <title>Feedback Loops: Human Decisions</title>
      <link>https://www.markhneedham.com/blog/2011/05/05/feedback-loops-human-decisions/</link>
      <pubDate>Thu, 05 May 2011 18:04:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/05/feedback-loops-human-decisions/</guid>
      <description>I’ve been reading Donella Meadows&amp;#39; &amp;#39;Thinking In Systems: A Primer&amp;#39;, an introductory text on systems thinking, and after 30 pages or so the author poses the following challenge:
Sometimes I challenge my students to try to think of any human decision that occurs without a feedback loop - that is, a decision that is made without regard to any information about the level of stock that it influences
Meadows has quite a nice way of guiding us to thinking about systems by referring to &amp;#39;stocks&amp;#39; and &amp;#39;flows&amp;#39;.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: Retrospective Coherence</title>
      <link>https://www.markhneedham.com/blog/2011/05/01/thoughtworks-university-retrospective-coherence/</link>
      <pubDate>Sun, 01 May 2011 11:25:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/05/01/thoughtworks-university-retrospective-coherence/</guid>
      <description>I recently came across Joseph Pelrine’s blog post where he describes the way that you might go about organising a great party.
He describes a party that a friend of his hosted and all the things which contributed to it being great, such as the people you invite, the music that is played, the food and drink that are served and the conversations that are had.
If you then wanted to replicate a &amp;#39;great party&amp;#39; you might think that you could just replay his friend’s party, with the same guests, same music, a script of the conversations had and so on.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: v2.0 vs v1.0</title>
      <link>https://www.markhneedham.com/blog/2011/04/27/thoughtworks-university-v2-0-vs-v1-0/</link>
      <pubDate>Wed, 27 Apr 2011 12:33:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/27/thoughtworks-university-v2-0-vs-v1-0/</guid>
      <description>Since we finished the most recent ThoughtWorks University session last week a few people have been asking me how the experience was and I’ve found myself comparing this experience to my own as an attendee in August 2006.
Back then ThoughtWorks University was much different. We had 5 weeks of workshop style sessions and then spent the last week working on an internal application.
This time we spent 1 week doing the workshop style sessions, 1 week working together on a story and then 4 weeks working on the application.</description>
    </item>
    
    <item>
      <title>The ladder of inference</title>
      <link>https://www.markhneedham.com/blog/2011/04/24/the-ladder-of-inference/</link>
      <pubDate>Sun, 24 Apr 2011 14:11:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/24/the-ladder-of-inference/</guid>
      <description>In Discussing the Undiscussable William Noonan describes the ladder of inference, a tool which can be used to help us achieve double loop learning with respect to our interactions with other people.
Ladder of Inference helps people identify what information or facts are used as the basis for their reasoning process. It also helps people understand how they interpret that information and how they apply their interpretation to the issue or problem at hand.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: Things people found difficult</title>
      <link>https://www.markhneedham.com/blog/2011/04/23/thoughtworks-university-things-people-found-difficult/</link>
      <pubDate>Sat, 23 Apr 2011 19:47:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/23/thoughtworks-university-things-people-found-difficult/</guid>
      <description>After six weeks ThoughtWorks University #21 finished on Thursday so I thought it’d be interesting to summarise some of the things that people seemed to find difficult over the course of TWU.
The stack trace: We were using Java for the duration of TWU and as a result there were plenty of stack traces for people to debug.
These were most frequently related to incorrect wiring of Spring components but there were other reasons too.</description>
    </item>
    
    <item>
      <title>The sunk cost fallacy</title>
      <link>https://www.markhneedham.com/blog/2011/04/17/the-sunk-cost-fallacy/</link>
      <pubDate>Sun, 17 Apr 2011 12:05:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/17/the-sunk-cost-fallacy/</guid>
      <description>I recently came across David McRaney’s post about the sunk cost fallacy with reference to Farmville, a fallacy that is very applicable to software.
David starts off with the following statements which describe the fallacy pretty well:
The Misconception: You make rational decisions based on the future value of objects, investments and experiences. The Truth: Your decisions are tainted by the emotional investments you accumulate, and the more you invest in something the harder it becomes to abandon it.</description>
    </item>
    
    <item>
      <title>Tech Leading: Initial Thoughts</title>
      <link>https://www.markhneedham.com/blog/2011/04/17/tech-leading-initial-thoughts/</link>
      <pubDate>Sun, 17 Apr 2011 11:27:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/17/tech-leading-initial-thoughts/</guid>
      <description>As I mentioned in an earlier post I’ve been playing the role of tech lead on the project that we’ve been doing at ThoughtWorks University so I thought it’d be interesting to note down some of my observations so far.
Out of the tech leads that I’ve had I liked the style of Dave Cameron the best.
He viewed himself more as a technical facilitator rather than as a person who should make every single decision about how a system got built which meant that others also got a chance to take some responsibility.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: &#34;It&#39;s your project&#34;</title>
      <link>https://www.markhneedham.com/blog/2011/04/13/thoughtworks-university-its-your-project/</link>
      <pubDate>Wed, 13 Apr 2011 20:28:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/13/thoughtworks-university-its-your-project/</guid>
      <description>One of the things that we’ve struggled with at ThoughtWorks University is giving the attendees the opportunity to run the project that we’ve been working on.
The first few weeks were the most frustrating both for the trainers and for the attendees because we spent a lot of time telling the attendees that it was their project but then didn’t display behaviour consistent with that message.
From my observations this happened because the role of the trainers was defined as &amp;#39;senior team member&amp;#39; which meant that if a trainer saw something going wrong they’d try and fix it since that’s what they’d do in a normal team.</description>
    </item>
    
    <item>
      <title>Feedback: Easing in</title>
      <link>https://www.markhneedham.com/blog/2011/04/13/feedback-easing-in/</link>
      <pubDate>Wed, 13 Apr 2011 02:33:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/13/feedback-easing-in/</guid>
      <description>One of the most common techniques of feedback which I’ve come across is one that William Noonan describes in &amp;#39;Discussing the Undiscussable&amp;#39; as easing in.
From the book:
Easing in is a skilful strategy whereby I try to get the other person to come round to my point of view without my stating it directly.
From my experience we’ll try to do this because giving our point of view could lead to an awkward conversation so we’d rather they express our opinion for us instead.</description>
    </item>
    
    <item>
      <title>HTML encoding/escaping with StringTemplate and Spring MVC</title>
      <link>https://www.markhneedham.com/blog/2011/04/09/html-encodingescaping-with-stringtemplate-and-spring-mvc/</link>
      <pubDate>Sat, 09 Apr 2011 10:54:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/09/html-encodingescaping-with-stringtemplate-and-spring-mvc/</guid>
      <description>Last week my colleague T.C. and I had to work out how to HTML encode the values entered by the user when redisplaying them onto the page to prevent a cross-site scripting attack on the website.
I wrote a blog post a couple of years ago describing how to do this in ASP.NET MVC and the general idea is that we need to have a custom renderer which HTML encodes any strings that pass through it.</description>
    </item>
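The post above describes a custom renderer that HTML-encodes any string passing through it. The core of that idea can be sketched as a small escaping helper in Java; the class name `HtmlEncodingRenderer` is illustrative, not the actual StringTemplate renderer from the post:

```java
// Minimal sketch of the escaping step such a custom renderer performs.
// HtmlEncodingRenderer is a hypothetical name, not the class from the post.
public class HtmlEncodingRenderer {

    // Replace the characters with HTML significance by entity references
    // so user-supplied values can't inject markup when redisplayed.
    static String escape(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&':  out.append("&amp;");  break;
                case '<':  out.append("&lt;");   break;
                case '>':  out.append("&gt;");   break;
                case '"':  out.append("&quot;"); break;
                case '\'': out.append("&#39;");  break;
                default:   out.append(c);
            }
        }
        return out.toString();
    }
}
```

In a templating engine this method would be wired into whatever render hook the engine exposes (StringTemplate's attribute renderer mechanism, in the post's case) so that every string attribute is escaped by default.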
    
    <item>
      <title>ThoughtWorks University: Similarities with &#39;Discussing the Undiscussable&#39;</title>
      <link>https://www.markhneedham.com/blog/2011/04/09/thoughtworks-university-similarities-with-discussing-the-undiscussable/</link>
      <pubDate>Sat, 09 Apr 2011 10:38:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/09/thoughtworks-university-similarities-with-discussing-the-undiscussable/</guid>
      <description>I’m currently reading the final chapter of William Noonan’s Discussing the Undiscussable titled &amp;#39;Helping Those Who Teach, Learn&amp;#39; and a couple of the ideas that he describes seem quite applicable to what we’re doing at ThoughtWorks University.
Modelling the skills: When teaching the Mutual Learning Model Noonan suggests that the practitioner needs to be able to produce actions consistent with the model in real time situations rather than just being able to do convincing presentations on the subject.</description>
    </item>
    
    <item>
      <title>Unix: Getting the sound from &#39;say&#39; as a wav file</title>
      <link>https://www.markhneedham.com/blog/2011/04/07/unix-getting-the-sound-from-say-as-a-wav-file/</link>
      <pubDate>Thu, 07 Apr 2011 19:18:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/07/unix-getting-the-sound-from-say-as-a-wav-file/</guid>
      <description>I spent a bit of time yesterday afternoon working out how to get the output from the Unix command &amp;#39;say&amp;#39; to be played whenever our build breaks.
We’re using cctray on a Windows box for that purpose which means that we need to have the file in the &amp;#39;wav&amp;#39; format.
Unfortunately &amp;#39;say&amp;#39; doesn’t seem to be able to output a file in that format:
&amp;gt; say &amp;#34;WARNING! Drainage has occurred, please fix it.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: Letting people explore</title>
      <link>https://www.markhneedham.com/blog/2011/04/06/thoughtworks-university-letting-people-explore/</link>
      <pubDate>Wed, 06 Apr 2011 01:34:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/06/thoughtworks-university-letting-people-explore/</guid>
      <description>I’ve been acting as the tech lead on the project that we’re working on at ThoughtWorks University and as a result I sometimes find myself being dragged away from my pair to help someone else.
An interesting thing which I’ve noticed on more than one occasion is that when I’ve come back from helping - maybe 15 or 20 minutes later - my pair has actually got much further than I expected them to.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: The coaching/training conflict</title>
      <link>https://www.markhneedham.com/blog/2011/04/03/thoughtworks-university-the-coachingtraining-conflict/</link>
      <pubDate>Sun, 03 Apr 2011 17:11:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/04/03/thoughtworks-university-the-coachingtraining-conflict/</guid>
      <description>As I mentioned in an earlier post Sumeet has been encouraging us to act more as coaches rather than trainers during ThoughtWorks University but it’s not quite as easy as it seems.
I’ve noticed that there are a few things that contribute to this difficulty.
Assessment The biggest obstacle is that by the end of TWU the trainers are required to send a review about each of the grads to the respective Resource Managers describing each person’s current level of skill in various categories.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: The use of games</title>
      <link>https://www.markhneedham.com/blog/2011/03/30/thoughtworks-university-the-use-of-games/</link>
      <pubDate>Wed, 30 Mar 2011 20:34:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/30/thoughtworks-university-the-use-of-games/</guid>
      <description>When I attended ThoughtWorks University in August 2006 we spent quite a bit of time playing games which had been designed to help us to achieve various learning objectives.
At the time I didn’t think much about it but now being on the other side as a trainer I’ve started to doubt whether these types of sessions are as useful as I originally thought.
I recently came across a blog post Sumeet wrote last year where he talks about effective e-learning environments and I think his point still applies here:</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: A double loop learning example</title>
      <link>https://www.markhneedham.com/blog/2011/03/30/thoughtworks-university-a-double-loop-learning-example/</link>
      <pubDate>Wed, 30 Mar 2011 19:17:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/30/thoughtworks-university-a-double-loop-learning-example/</guid>
      <description>One of the most interesting things that I’ve been reading about recently is the idea of single and double loop learning which were defined by Chris Argyris and Donald Schon in their book &amp;#39;Organizational Learning: A theory of action perspective&amp;#39; in 1978.
I quite like the definitions that Mark Smith gives for these types of learning in his article about Chris Argyris:
Single Loop Learning
Single-loop learning seems to be present when goals, values, frameworks and, to a significant extent, strategies are taken for granted.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: Pulling the &#39;pearls&#39;</title>
      <link>https://www.markhneedham.com/blog/2011/03/29/thoughtworks-university-pulling-the-pearls/</link>
      <pubDate>Tue, 29 Mar 2011 18:32:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/29/thoughtworks-university-pulling-the-pearls/</guid>
      <description>I recently wrote about the coding dojo style week that we ran at ThoughtWorks University last week and I briefly mentioned that we used break out sessions to cover topics (&amp;#39;the pearls&amp;#39;) that people didn’t totally understand.
To describe that in more detail what we did to start with was write the name of each of the 90/180 minute sessions on a card and put it on the wall under a &amp;#39;To Do&amp;#39; heading:</description>
    </item>
    
    <item>
      <title>The working long hours culture</title>
      <link>https://www.markhneedham.com/blog/2011/03/29/the-working-long-hours-culture/</link>
      <pubDate>Tue, 29 Mar 2011 17:25:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/29/the-working-long-hours-culture/</guid>
      <description>One of the aspects of software development that I’ve thankfully seen relatively infrequently over the last few years is that of some people in teams working long hours on a consistent basis.
I have seen it happen on a few occasions and I think it can have a detrimental effect on a team rather than the good which is presumably intended.
The biggest disadvantage is that it makes other people in the team feel guilty that they aren’t working long hours and they may feel peer pressured into matching the hours of their colleagues.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: Coding Dojo Style</title>
      <link>https://www.markhneedham.com/blog/2011/03/29/thoughtworks-university-coding-dojo-style/</link>
      <pubDate>Tue, 29 Mar 2011 17:15:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/29/thoughtworks-university-coding-dojo-style/</guid>
      <description>One of the things that Sumeet has been encouraging at ThoughtWorks University is the idea that the &amp;#39;trainers&amp;#39; should be in a coaching role rather than a training one.
As a result of this suggestion one of the things we’ve done is to change the style of the second week so that it wasn’t full of sessions/workshops but instead involved working on code as a group.
Jim came up with the idea of the &amp;#39;exploded story&amp;#39; whereby we spent the whole of last week as a group working on one story for Sukrupa while spending quite a bit of time exploring the different activities that playing a story end to end would involve.</description>
    </item>
    
    <item>
      <title>Java: Faking System.in</title>
      <link>https://www.markhneedham.com/blog/2011/03/24/java-faking-system-in/</link>
      <pubDate>Thu, 24 Mar 2011 21:58:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/24/java-faking-system-in/</guid>
      <description>We ran a refactoring dojo a couple of days ago at ThoughtWorks University and in preparation I wrote some system level tests around the coding problem that we were going to use during the session.
It’s a command line application which is called through the main method of &amp;#39;Program&amp;#39; and since there’s no dependency injection we need to be able to set System.in and System.out in order to do any testing.</description>
    </item>
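The technique described above, swapping `System.in` and `System.out` to test a command line application through its main entry point, can be sketched as follows. The `run` method here is a stand-in for the kata's `Program`, which isn't shown in the post:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.InputStream;
import java.io.PrintStream;
import java.util.Scanner;

public class FakeSystemInExample {

    // Stand-in for the application under test; the real kata's Program differs.
    static void run() {
        Scanner scanner = new Scanner(System.in);
        System.out.println("You said: " + scanner.nextLine());
    }

    // Swap the standard streams, run the program, then restore them.
    static String runWithInput(String input) {
        InputStream originalIn = System.in;
        PrintStream originalOut = System.out;
        ByteArrayOutputStream captured = new ByteArrayOutputStream();
        try {
            System.setIn(new ByteArrayInputStream(input.getBytes()));
            System.setOut(new PrintStream(captured));
            run();
        } finally {
            System.setIn(originalIn);   // always restore, or later tests break
            System.setOut(originalOut);
        }
        return captured.toString();
    }
}
```

Restoring the original streams in a `finally` block matters: `System.setIn`/`System.setOut` mutate global state, so a test that forgets to undo them can silently break every test that runs afterwards.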
    
    <item>
      <title>ThoughtWorks University: Brain dumping</title>
      <link>https://www.markhneedham.com/blog/2011/03/23/twu-brain-dumping/</link>
      <pubDate>Wed, 23 Mar 2011 18:45:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/23/twu-brain-dumping/</guid>
      <description>One of the things that I’m learning while working at ThoughtWorks University is to bite my tongue a bit to allow people to learn in their own way.
I noticed this particularly yesterday in a refactoring session we were doing.
For about 10-15 minutes in the middle of the session we’d managed to get the code into a state where it didn’t compile and we couldn’t run the tests.</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: A refactoring dojo</title>
      <link>https://www.markhneedham.com/blog/2011/03/22/thoughtworks-university-a-refactoring-dojo/</link>
      <pubDate>Tue, 22 Mar 2011 19:10:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/22/thoughtworks-university-a-refactoring-dojo/</guid>
      <description>I facilitated a refactoring session today at ThoughtWorks University where we spent the morning refactoring our way through one of the problems the grads had to work on as part of the pre coursework.
The previous version of this session had been more structured, whereby one of the trainers worked solo at the keyboard and took suggestions from the group about which refactoring to cover next.
There are a certain number of refactorings that the session aims to introduce and the trainer would have practiced beforehand so they could perform these fairly flawlessly.</description>
    </item>
    
    <item>
      <title>Retrospectives: Mini Group Discussions</title>
      <link>https://www.markhneedham.com/blog/2011/03/20/retrospectives-mini-group-discussions/</link>
      <pubDate>Sun, 20 Mar 2011 18:36:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/20/retrospectives-mini-group-discussions/</guid>
      <description>One of the approaches that I like the best in retrospectives is when the facilitator splits the team into smaller groups during the brainstorming part of the retrospective.
I decided to try this out in a retrospective we ran after one week of ThoughtWorks University, using The Retrospective Starfish to provide a framework in which people could frame their thoughts.
Usually what I’ve seen happen in these mini groups is that everyone will write down their own ideas on stickies and then discuss them as a group but still put up all the stickies even if the group didn’t agree with everything.</description>
    </item>
    
    <item>
      <title>Confirmation Bias and Loss of Autonomy</title>
      <link>https://www.markhneedham.com/blog/2011/03/20/confirmation-bias-and-loss-of-autonomy/</link>
      <pubDate>Sun, 20 Mar 2011 18:08:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/20/confirmation-bias-and-loss-of-autonomy/</guid>
      <description>I’ve mentioned confirmation bias in a few of my previous blog posts but I hadn’t realised quite how widespread it can be in organisations until quite recently.
Confirmation bias
[A] tendency for people to favor information that confirms their preconceptions or hypotheses regardless of whether the information is true. As a result, people gather evidence and recall information from memory selectively, and interpret it in a biased way. The biases appear in particular for emotionally significant issues and for established beliefs.</description>
    </item>
    
    <item>
      <title>Pair Presenting</title>
      <link>https://www.markhneedham.com/blog/2011/03/17/pair-presenting/</link>
      <pubDate>Thu, 17 Mar 2011 06:52:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/17/pair-presenting/</guid>
      <description>Over the last year or so I’ve had some opportunities to pair with a few different people on sessions/presentations that we’ve been giving.
I much prefer doing this than presenting something by myself mainly because it’s much more fun and seems to encourage more participation than when I do something alone.
I feel that it’s probably easier to pair present if you both have similar opinions on the subject matter and are comfortable with a similar style of delivery.</description>
    </item>
    
    <item>
      <title>TWU: Fishbowl</title>
      <link>https://www.markhneedham.com/blog/2011/03/15/twu-fishbowl/</link>
      <pubDate>Tue, 15 Mar 2011 01:11:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/15/twu-fishbowl/</guid>
      <description>As part of a session on ThoughtWorks values at ThoughtWorks University we held a fishbowl to discuss the trade-offs we often have to make between the values when confronted with real life situations.
A fishbowl conversation is a form of dialog that can be used when discussing topics within large groups. Four to five chairs are arranged in an inner circle. This is the fishbowl. The remaining chairs are arranged outside the fishbowl.</description>
    </item>
    
    <item>
      <title>Use of language: Intuitive</title>
      <link>https://www.markhneedham.com/blog/2011/03/13/use-of-language-intuitive/</link>
      <pubDate>Sun, 13 Mar 2011 13:39:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/13/use-of-language-intuitive/</guid>
      <description>Sumeet and I were recently discussing the difference between the use of Google Groups for internal communication compared to the Jive platform which we’re now moving to and I suggested that I found the former more intuitive to use.
Sumeet suggested that the word &amp;#39;intuitive&amp;#39; is quite overloaded and later pointed me to an article on the Moodle website which advocates the same thing:
Intuitive is a word you should avoid in discussions of usability as its meaning is often confused.</description>
    </item>
    
    <item>
      <title>Everything I know everyone else knows</title>
      <link>https://www.markhneedham.com/blog/2011/03/13/everything-i-know-everyone-else-knows/</link>
      <pubDate>Sun, 13 Mar 2011 12:03:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/13/everything-i-know-everyone-else-knows/</guid>
      <description>For as long as I can remember I’ve had the belief that, at least as far as software is concerned, everything I know how to do everyone else also knows how to do.
I carried that assumption for quite a while and only realised relatively recently how harmful it can be.
The most observable outcome I noticed is that I either didn’t give my opinion in group situations or just didn’t take part in them because I assumed that what I wanted to say would eventually be contributed by someone else anyway.</description>
    </item>
    
    <item>
      <title>TWU: Session Preparation - Limited WIP</title>
      <link>https://www.markhneedham.com/blog/2011/03/09/twu-session-preparation-limited-wip/</link>
      <pubDate>Wed, 09 Mar 2011 14:42:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/09/twu-session-preparation-limited-wip/</guid>
      <description>I’ve spent a fair percentage of the last couple of weeks preparing sessions for ThoughtWorks University and one thing Frankie has been trying to encourage is only preparing one at a time, i.e. limited work in progress.
Normally I’d be completely in favour of that approach but it doesn’t seem to work at all for me with this type of work.
There seem to be a few parts to creating a session, including:</description>
    </item>
    
    <item>
      <title>Kano Model: Some ThoughtWorks examples</title>
      <link>https://www.markhneedham.com/blog/2011/03/06/kano-model-some-thoughtworks-examples/</link>
      <pubDate>Sun, 06 Mar 2011 08:37:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/06/kano-model-some-thoughtworks-examples/</guid>
      <description>My colleague Jason Yip recently linked to the Kano Model and although it’s a theory about product development, the definition of what counts as a product seems like it can be quite broad.
The best explanation of the model that I’ve come across so far is a post by Jean Claude Grosjean which Frankie linked me to.
Grosjean describes the three types of requirements like so:
Must Have (&amp;#34;Basic needs&amp;#34;) These are not always expressed but they are obvious to the customer and must be met…​[they] are not a source of satisfaction but can cause major disappointment.</description>
    </item>
    
    <item>
      <title>Coding: Reflection vs Action mode</title>
      <link>https://www.markhneedham.com/blog/2011/03/06/coding-reflection-vs-action-mode/</link>
      <pubDate>Sun, 06 Mar 2011 04:19:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/06/coding-reflection-vs-action-mode/</guid>
      <description>It recently struck me while preparing some ThoughtWorks University sessions that there appear to be two modes that I spend my time switching between while coding:
Action mode - we’re focused on getting things done, making things happen
Reflective mode - we’re a bit more detached and looking at things from a higher level
I spent the majority of 2008 and 2009 in reflective mode on the systems I was working on which can be seen by scanning through a lot of the blog posts that I wrote during that time.</description>
    </item>
    
    <item>
      <title>TWU: Session Design - Measurable goals</title>
      <link>https://www.markhneedham.com/blog/2011/03/04/twu-session-design-measurable-goals/</link>
      <pubDate>Fri, 04 Mar 2011 02:49:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/03/04/twu-session-design-measurable-goals/</guid>
      <description>We’ve been spending our time recently preparing the sessions for the next ThoughtWorks University batch and one thing Sumeet has encouraged us to do is ensure that we have a measurable goal for each session.
In our case that means that we need to design our sessions with the intention of the grads being able to do something rather than understand something after the session.
It’s very difficult to measure whether someone understands something and from what I’ve noticed having a goal of someone understanding something can encourage you to put in more than is strictly necessary.</description>
    </item>
    
    <item>
      <title>Ruby: Refactoring from hash to object</title>
      <link>https://www.markhneedham.com/blog/2011/02/27/ruby-refactoring-from-hash-to-object/</link>
      <pubDate>Sun, 27 Feb 2011 20:10:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/27/ruby-refactoring-from-hash-to-object/</guid>
      <description>Something I’ve noticed when I play around with Ruby in my own time is that I nearly always end up with the situation where I’m passing hashes all over my code and to start with it’s not a big deal.
Unfortunately I eventually get to the stage where I’m effectively modelling an object inside a hash and it all gets very difficult to understand.
I’ve written a few times before about incrementally refactoring code so this seemed like a pretty good chance for me to try that out.</description>
    </item>
    
    <item>
      <title>Pair Programming: Doodling</title>
      <link>https://www.markhneedham.com/blog/2011/02/26/pair-programming-doodling/</link>
      <pubDate>Sat, 26 Feb 2011 05:20:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/26/pair-programming-doodling/</guid>
      <description>Another interesting pair programming &amp;#39;technique&amp;#39; which I rediscovered while pairing with Priyank is that of doodling or drawing various parts of the solution while your pair is writing code. I find that this helps to stop my brain wandering off and lets me reflect on what we’re doing from a higher level.
As an added bonus it also seems to allow me to listen more effectively to my pair.</description>
    </item>
    
    <item>
      <title>Pecha Kucha: My first attempt</title>
      <link>https://www.markhneedham.com/blog/2011/02/26/pecha-kucha-my-first-attempt/</link>
      <pubDate>Sat, 26 Feb 2011 04:39:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/26/pecha-kucha-my-first-attempt/</guid>
      <description>The first time I came across the Pecha Kucha style of presenting was at the XP 2010 conference during the Agile Suitcase session where Pat Kua and some others talked about the practices, principles and values they most favoured.
I’ve never done one before but as part of the preparation work for ThoughtWorks University each of the trainers had to prepare one which we then presented to each other yesterday.</description>
    </item>
    
    <item>
      <title>Books: Know why you&#39;re reading it </title>
      <link>https://www.markhneedham.com/blog/2011/02/26/books-know-why-youre-reading-it/</link>
      <pubDate>Sat, 26 Feb 2011 03:06:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/26/books-know-why-youre-reading-it/</guid>
      <description>Something which I frequently forget while reading books is that it’s actually quite useful to know exactly why you’re reading it, i.e. what knowledge you’re trying to gain by doing so.
I noticed this again recently while reading The Agile Samurai - it’s one of the books we ask ThoughtWorks University participants to read before they come to India.
Implicitly I knew that I just wanted to get a rough idea of what sort of things it’s telling people, but somewhat foolishly I started reading it cover to cover.</description>
    </item>
    
    <item>
      <title>Pair Programming: &#34;What are you trying to learn?&#34;</title>
      <link>https://www.markhneedham.com/blog/2011/02/23/pair-programming-what-are-you-trying-to-learn/</link>
      <pubDate>Wed, 23 Feb 2011 02:58:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/23/pair-programming-what-are-you-trying-to-learn/</guid>
      <description>I’ve noticed recently that while pairing with various different people that I frequently ask my pair what they’re trying to learn through the approach that they’re about to take.
I tend to use it when I don’t really understand what my pair is doing and want to find out so that I can stay engaged.
It seems to be a more effective and less confrontational way of finding out than saying &amp;#34;What are you doing?</description>
    </item>
    
    <item>
      <title>Espoused theory, theory in action &amp; hypocrisy </title>
      <link>https://www.markhneedham.com/blog/2011/02/23/espoused-theory-theory-in-action-hypocrisy/</link>
      <pubDate>Wed, 23 Feb 2011 01:34:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/23/espoused-theory-theory-in-action-hypocrisy/</guid>
      <description>Earlier in the year I wrote about Chris Argyris&amp;#39; espoused theory and theory in action and one of the interesting aspects to it which I hadn’t previously considered is how we treat people when their espoused theory and theory in action don’t match.
My tendency is to think that these people are hypocrites but Benjamin Mitchell pointed out in a conversation on twitter that it’s not really helpful to think that way:</description>
    </item>
    
    <item>
      <title>Pomodoro: Observations from giving it a go</title>
      <link>https://www.markhneedham.com/blog/2011/02/20/pomodoro-observations-from-giving-it-a-go/</link>
      <pubDate>Sun, 20 Feb 2011 19:26:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/20/pomodoro-observations-from-giving-it-a-go/</guid>
      <description>I learnt about the pomodoro technique a couple of years ago and while I did try it out sporadically back then, it’s only recently that I thought I’d properly give it a try when managing my spare time.
My approach without the pomodoro technique is to have a long list of things that I could do and then not really do any of them because I feel bad about not doing one of the other things instead.</description>
    </item>
    
    <item>
      <title>Communication: Listening</title>
      <link>https://www.markhneedham.com/blog/2011/02/20/communication-listening/</link>
      <pubDate>Sun, 20 Feb 2011 18:43:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/20/communication-listening/</guid>
      <description>I realised a couple of weeks ago while pairing with a colleague that I’ve developed quite a bad habit of interrupting people while they’re speaking.
I did have an inkling that I’d let my ability to properly listen to someone drift a bit but I hadn’t seen any evidence until my colleague pointed it out.
Somewhat ironically I actually wrote a post about active listening when I first started working at ThoughtWorks in 2006 and reading back over the listening barriers that I listed I realise that there are a few that I tend to break:</description>
    </item>
    
    <item>
      <title>ThoughtWorks University: Balancing helping and learning</title>
      <link>https://www.markhneedham.com/blog/2011/02/19/thoughtworks-university-balancing-helping-and-learning/</link>
      <pubDate>Sat, 19 Feb 2011 15:15:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/19/thoughtworks-university-balancing-helping-and-learning/</guid>
      <description>Six months after my first attempt to train one of the ThoughtWorks University batches was cut short, I’m back in Bangalore again and spent the first few days of this week pairing with the grads.
It’s been interesting for me trying to balance how much I help and suggest ideas while still allowing them to learn at the same time.
At the moment I think I’m leaning too far towards helping, not realising until later on that my colleague hadn’t quite understood why I’d suggested what I did and therefore hadn’t learnt anything from my suggestion.</description>
    </item>
    
    <item>
      <title>Increasing team sizes: Collective unresponsibility</title>
      <link>https://www.markhneedham.com/blog/2011/02/16/increasing-team-sizes-collective-unresponsibility/</link>
      <pubDate>Wed, 16 Feb 2011 18:00:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/16/increasing-team-sizes-collective-unresponsibility/</guid>
      <description>After a few recent conversations with colleagues as well as my observations of several projects I’m coming to the conclusion that the way that people react in situations often differs significantly depending on whether they’re working in a large or small team.
One of the most obvious ways that this manifests itself is when there comes a need for someone to volunteer to take care of something - be it a particular functional area, communication with the onshore team or something else.</description>
    </item>
    
    <item>
      <title>Vim: Copying to and retrieving from the clipboard</title>
      <link>https://www.markhneedham.com/blog/2011/02/14/vim-copying-to-and-retrieving-from-the-clipboard/</link>
      <pubDate>Mon, 14 Feb 2011 14:13:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/14/vim-copying-to-and-retrieving-from-the-clipboard/</guid>
      <description>My memory when it comes to remembering how to get text to and from Vim via the clipboard is pretty bad so I thought I’d try summarising what I know and see if that works out any better.
We can access the system clipboard via the &amp;#39;+&amp;#39; buffer so the commands revolve around that.
Copying to the clipboard To copy the whole file to the clipboard we can use this command:</description>
    </item>
    
    <item>
      <title>CouchDB: Join like behaviour with link functions</title>
      <link>https://www.markhneedham.com/blog/2011/02/13/couchdb-join-like-behaviour-with-link-functions/</link>
      <pubDate>Sun, 13 Feb 2011 17:58:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/13/couchdb-join-like-behaviour-with-link-functions/</guid>
      <description>I’ve been playing around with the Twitter streaming API a bit lately to see which links are being posted most frequently by the people I follow and then storing the appropriate tweets in CouchDB.
I recently came across a problem which I struggled to solve for quite a while.
Based on the following map function:
{ &amp;#34;_id&amp;#34; : &amp;#34;_design/query&amp;#34;, &amp;#34;views&amp;#34; : { &amp;#34;by_link&amp;#34; : { &amp;#34;map&amp;#34; : &amp;#34;function(doc){ emit(doc.actual_link, { user : doc.</description>
    </item>
    
    <item>
      <title>CouchDB: &#39;badmatch&#39; when executing view</title>
      <link>https://www.markhneedham.com/blog/2011/02/12/couchdb-badmatch-when-executing-view/</link>
      <pubDate>Sat, 12 Feb 2011 18:03:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/12/couchdb-badmatch-when-executing-view/</guid>
      <description>I’ve been playing around with CouchDB again in my annual attempt to capture the links appearing on my twitter stream and I managed to create the following error for myself:
$ curl http://127.0.0.1:5984/twitter_links/_design/cleanup/_view/find_broken_links {&amp;#34;error&amp;#34;:&amp;#34;badmatch&amp;#34;,&amp;#34;reason&amp;#34;:&amp;#34;{\n \&amp;#34;find_broken_links\&amp;#34;: {\n \&amp;#34;map\&amp;#34;: \&amp;#34;function(doc) { \nvar prefix = doc.actual_link.match(/.*/); \n if(true) { emit(doc.actual_link, null); } }\&amp;#34;\n }\n}&amp;#34;} It turns out this error is because I’ve managed to create new line characters in the view while editing it inside CouchDBX.</description>
    </item>
    
    <item>
      <title>Sed: Extended regular expressions</title>
      <link>https://www.markhneedham.com/blog/2011/02/11/sed-extended-regular-expressions/</link>
      <pubDate>Fri, 11 Feb 2011 20:34:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/11/sed-extended-regular-expressions/</guid>
      <description>Irfan and I were looking at how to do some text substitution in a text file this afternoon and turned to sed to help us in our quest.
He had originally used grep to find what he wanted to replace on each line, using a grep regular expression to match one or more numbers:
cat the_file.txt | grep &amp;#34;[0-9]\+&amp;#34; That works pretty well but since I knew how to do the substitution in sed we needed to convert the regular expression to work with sed.</description>
    </item>
    
    <item>
      <title>University coding</title>
      <link>https://www.markhneedham.com/blog/2011/02/06/university-coding/</link>
      <pubDate>Sun, 06 Feb 2011 16:57:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/06/university-coding/</guid>
      <description>We went to do some university recruitment recently and pairing with some of the students reminded me of some things that I’ve started doing better since I started working professionally.
I wanted to note them down so that I’m more aware that these might be common areas to improve on for university graduates that I work with in the future.
Naming of things I don’t remember there being that much focus on naming variables/methods/classes in any of the programming courses that I studied.</description>
    </item>
    
    <item>
      <title>Feedback: Making the request specific</title>
      <link>https://www.markhneedham.com/blog/2011/02/06/feedback-making-the-request-specific/</link>
      <pubDate>Sun, 06 Feb 2011 15:47:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/06/feedback-making-the-request-specific/</guid>
      <description>My colleagues in Pune have been collecting feedback over the past week as part of the quarterly feedback cycle and it’s got me thinking about the way that people ask for the feedback.
The most popular way is to ask for general feedback which answers questions like this:
What are the things that the individual has done well?
What are the things that the individual has not done well and/or needs more focus/improvement?</description>
    </item>
    
    <item>
      <title>Ruby: Where to define the method?</title>
      <link>https://www.markhneedham.com/blog/2011/02/03/ruby-where-to-define-the-method/</link>
      <pubDate>Thu, 03 Feb 2011 19:37:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/03/ruby-where-to-define-the-method/</guid>
      <description>In our application we deal with items which can be put into a shopping cart.
An item is defined like so:
class Item &amp;lt; ActiveRecord::Base end One problem that we had to solve recently was working out how to display a message to the user if the item they wanted to buy was out of stock.
We can find out if items are out of stock by making a call to an external service:</description>
    </item>
    
    <item>
      <title>&#39;Why&#39; often unhelpful</title>
      <link>https://www.markhneedham.com/blog/2011/02/03/why-often-unhelpful/</link>
      <pubDate>Thu, 03 Feb 2011 18:51:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/02/03/why-often-unhelpful/</guid>
      <description>Something which I’ve noticed recently in particular when interviewing people but also in some other situations is that frequently posing a question which begins with &amp;#39;why&amp;#39; results in quite a defensive response.
While discussing this with Priyank he pointed out that asking a question in this way can often be construed as a criticism of the idea being questioned.
Admittedly it is often the case that I’m questioning something which has been done differently than what I might have done but I’m still curious as to the reasoning behind it.</description>
    </item>
    
    <item>
      <title>Increasing team sizes: Boredom</title>
      <link>https://www.markhneedham.com/blog/2011/01/27/increasing-team-sizes-boredom/</link>
      <pubDate>Thu, 27 Jan 2011 22:59:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/27/increasing-team-sizes-boredom/</guid>
      <description>Although the majority of the teams that I’ve worked on over the past few years have been relatively small in size I have worked on a few where the team size has been pretty big and perhaps inevitably the productivity has felt much lower.
I think this is somewhat inevitable: although the overall throughput of these teams may be higher than that of smaller teams, problems such as difficulty parallelising work mean that not every pair is working at maximum productivity.</description>
    </item>
    
    <item>
      <title>The Five Orders of Ignorance - Phillip G. Armour</title>
      <link>https://www.markhneedham.com/blog/2011/01/26/the-five-orders-of-ignorance-phillip-g-armour/</link>
      <pubDate>Wed, 26 Jan 2011 18:08:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/26/the-five-orders-of-ignorance-phillip-g-armour/</guid>
      <description>While trawling the comments of Dan North’s &amp;#39;Deliberate Discovery&amp;#39; post I came across an interesting article written by Phillip G. Armour titled &amp;#39;The Five Orders of Ignorance&amp;#39;.
The main thing I took from the article is that the author uses the metaphor of software as a &amp;#39;knowledge acquisition activity&amp;#39; for which he then defines five orders of ignorance that we can have in our attempts to acquire that knowledge.</description>
    </item>
    
    <item>
      <title>Deliberate Discovery: The stuff I don&#39;t know list</title>
      <link>https://www.markhneedham.com/blog/2011/01/26/deliberate-discovery-the-stuff-i-dont-know-list/</link>
      <pubDate>Wed, 26 Jan 2011 18:07:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/26/deliberate-discovery-the-stuff-i-dont-know-list/</guid>
      <description>Towards the end of Dan North’s post on Deliberate Discovery he makes the following suggestion:
There is much more to say about deliberate discovery. Think about applying the principle to learning a new language, or picking up a new technology, or a new domain. What could you do to identify and reduce your ignorance most rapidly?
This reminded me a lot of what I used to do when I came across things that I didn’t know how to do a few years ago.</description>
    </item>
    
    <item>
      <title>Distributed Agile: Stories -  Negotiable</title>
      <link>https://www.markhneedham.com/blog/2011/01/24/distributed-agile-stories-negotiable/</link>
      <pubDate>Mon, 24 Jan 2011 03:34:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/24/distributed-agile-stories-negotiable/</guid>
      <description>I was recently reading an article about how to write meaningful user stories and towards the end of it the author mentioned the INVEST acronym which suggests that stories should be:
Independent
Negotiable
Valuable
Estimable
Small
Testable
From what I’ve seen the most difficult one to achieve in a distributed context is that stories should be &amp;#39;negotiable&amp;#39;, in particular when it comes to negotiating the way that the UX of a bit of functionality should work.</description>
    </item>
    
    <item>
      <title>While in India: Osmotic communication</title>
      <link>https://www.markhneedham.com/blog/2011/01/24/while-in-india-osmotic-communication/</link>
      <pubDate>Mon, 24 Jan 2011 03:33:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/24/while-in-india-osmotic-communication/</guid>
      <description>One of the things that has been puzzling me during my time in India is the amount of time that is spent in meetings pushing information to people rather than them pulling it.
In previous projects that I’ve worked on a lot of the knowledge was moved around as a result of osmotic communication:
Osmotic communication means that information flows into the background hearing of members of the team, so that they pick up relevant information as though by osmosis.</description>
    </item>
    
    <item>
      <title>Listening to feedback mechanisms</title>
      <link>https://www.markhneedham.com/blog/2011/01/21/listening-to-feedback-mechanisms/</link>
      <pubDate>Fri, 21 Jan 2011 03:46:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/21/listening-to-feedback-mechanisms/</guid>
      <description>In Growing Object Oriented Software the authors talk about the value of listening to our tests to understand potential problems with our code and I’ve started to notice recently that there are implicit feedback mechanisms dotted around at a higher level which we can also listen to.
A couple of examples come to mind:
Nothing to show in the showcase I’ve worked on a couple of projects where we’ve got to the end of the iteration and realised that we don’t actually have anything tangible to show the product owner.</description>
    </item>
    
    <item>
      <title>Coding: Spike Driven Development</title>
      <link>https://www.markhneedham.com/blog/2011/01/19/coding-spike-driven-development/</link>
      <pubDate>Wed, 19 Jan 2011 17:46:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/19/coding-spike-driven-development/</guid>
      <description>While reading Dan North’s second post about software craftsmanship I was able to resonate quite a lot with a point he made in the &amp;#39;On value&amp;#39; section:
I’m not going to mandate test-driving anything (which is a huge about-face from what I was saying a year ago), unless it will help. Copy-and-paste is fine too. (Before you go all shouty at me again, hold off until I blog about the benefits of copy-and-paste, as it appears in a couple of patterns I’m calling Spike and Stabilize and Ginger Cake.</description>
    </item>
    
    <item>
      <title>MySQL: The used command is not allowed with this MySQL version</title>
      <link>https://www.markhneedham.com/blog/2011/01/18/mysql-the-used-command-is-not-allowed-with-this-mysql-version/</link>
      <pubDate>Tue, 18 Jan 2011 18:58:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/18/mysql-the-used-command-is-not-allowed-with-this-mysql-version/</guid>
      <description>For my own reference more than anything else, on my version of MySQL on Mac OS X, which is:
mysql5 Ver 14.14 Distrib 5.1.48, for apple-darwin10.4.0 (i386) using readline 6.1
When I try to use the &amp;#39;LOAD DATA LOCAL&amp;#39; option to load data into tables I get the following error message:
ERROR 1148 (42000) at line 4: The used command is not allowed with this MySQL version Which we can get around by using the following flag as described in the comments of the documentation:</description>
    </item>
    
    <item>
      <title>Installing git-svn on Mac OS X</title>
      <link>https://www.markhneedham.com/blog/2011/01/15/installing-git-svn-on-mac-os-x/</link>
      <pubDate>Sat, 15 Jan 2011 19:05:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/15/installing-git-svn-on-mac-os-x/</guid>
      <description>I somehow managed to uninstall git-svn on my machine and Emmanuel Bernard’s blog post suggested it could be installed using ports:
sudo port install git-core +svn I tried that and was ending up with the following error:
---&amp;gt; Computing dependencies for git-core ---&amp;gt; Dependencies to be installed: p5-svn-simple subversion-perlbindings apr-util db46 cyrus-sasl2 neon serf subversion p5-term-readkey ---&amp;gt; Verifying checksum(s) for db46 Error: Checksum (md5) mismatch for patch.4.6.21.1 Error: Checksum (md5) mismatch for patch.</description>
    </item>
    
    <item>
      <title>mount_smbfs: mount error..File exists</title>
      <link>https://www.markhneedham.com/blog/2011/01/15/mount_smbfs-mount-error-file-exists/</link>
      <pubDate>Sat, 15 Jan 2011 18:31:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/15/mount_smbfs-mount-error-file-exists/</guid>
      <description>I’ve been playing around with mounting a Windows file share onto my machine via the terminal because I’m getting bored of constantly having to go to Finder and manually mounting it each time!
After a couple of times of mounting and unmounting the drive I ended up with this error:
&amp;gt; mount_smbfs //mneedham@punedc02/shared punedc02_shared/ mount_smbfs: mount error: /Volumes/punedc02_shared: File exists I originally thought the &amp;#39;file exists&amp;#39; part of the message was suggesting that I’d already mounted a share on &amp;#39;punedc02_shared&amp;#39; but calling the &amp;#39;umount&amp;#39; command led to the following error:</description>
    </item>
    
    <item>
      <title>Sed: &#39;sed: 1: invalid command code R&#39; on Mac OS X</title>
      <link>https://www.markhneedham.com/blog/2011/01/14/sed-sed-1-invalid-command-code-r-on-mac-os-x/</link>
      <pubDate>Fri, 14 Jan 2011 14:15:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/14/sed-sed-1-invalid-command-code-r-on-mac-os-x/</guid>
      <description>A few days ago I wrote about how we’d been using Sed to edit multiple files and while those examples were derived from what we’d been using on Ubuntu I realised that they didn’t actually work on Mac OS X.
For example, the following command:
sed -i &amp;#39;s/require/include/&amp;#39; Rakefile Throws this error:
sed: 1: &amp;#34;Rakefile&amp;#34;: invalid command code R What I hadn’t realised is that on the Mac version of sed the &amp;#39;-i&amp;#39; flag has a mandatory suffix, as described in this post.</description>
    </item>
    
    <item>
      <title>Chris Argyris: Espoused Theory vs Theory in Action</title>
      <link>https://www.markhneedham.com/blog/2011/01/13/chris-argyris-espoused-theory-vs-theory-in-action/</link>
      <pubDate>Thu, 13 Jan 2011 20:02:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/13/chris-argyris-espoused-theory-vs-theory-in-action/</guid>
      <description>Via some combination of Christian Blunden, Pat Kua, David Joyce and Benjamin Mitchell I’ve been spending some time lately reading about the work of Chris Argyris.
I’ve previously come across his name while reading The Fifth Discipline but I didn’t realise how interesting his work actually is.
One of the interesting concepts I’ve come across so far is the difference between espoused theory and theory in use:
Espoused theory</description>
    </item>
    
    <item>
      <title>Rails: Using helpers inside a controller</title>
      <link>https://www.markhneedham.com/blog/2011/01/11/rails-using-helpers-inside-a-controller/</link>
      <pubDate>Tue, 11 Jan 2011 17:09:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/11/rails-using-helpers-inside-a-controller/</guid>
      <description>For about an hour or so this afternoon we were following the somewhat evil practice of using a method defined in a helper inside a controller.
The method was defined in the ApplicationHelper module:
module ApplicationHelper def foo # do something end end So we initially assumed that we’d just be able to reference that method inside any of our controllers since they all derive from ApplicationController.
That wasn’t the case so our next attempt was to try and add it as a helper:</description>
    </item>
    
    <item>
      <title>Sed across multiple files</title>
      <link>https://www.markhneedham.com/blog/2011/01/11/sed-across-multiple-files/</link>
      <pubDate>Tue, 11 Jan 2011 16:43:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/11/sed-across-multiple-files/</guid>
      <description>Pankhuri and I needed to rename a method and change all the places where it was used and decided to see if we could work out how to do it using sed.
We needed to change a method call roughly like this:
home_link(current_user) To instead read:
homepage_path For which we need the following sed expression:
sed -i &amp;#39;s/home_link([^)]*)/homepage_path/&amp;#39; [file_name] Which works pretty well if you know which file you want to change but we wanted to run it over the whole code base.</description>
    </item>
    
    <item>
      <title>Jet Airways: Lacking conceptual integrity and the power of twitter</title>
      <link>https://www.markhneedham.com/blog/2011/01/10/jet-airways-lacking-conceptual-integrity-and-the-power-of-twitter/</link>
      <pubDate>Mon, 10 Jan 2011 18:08:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/10/jet-airways-lacking-conceptual-integrity-and-the-power-of-twitter/</guid>
      <description>I recently travelled to London and back for Christmas using Jet Airways and the whole journey got off to an &amp;#39;interesting&amp;#39; start.
I originally booked two Jet Airways flights - one from Pune to Delhi and another from Delhi to London.
A couple of weeks later I was sent an email cancelling my Pune to Delhi flight and informing me that I should contact their customer support centre.
I quickly browsed their website to check what had happened to my flight and found out that it had actually changed from a Jet Airways flight to a Jet Lite flight - their sister airline.</description>
    </item>
    
    <item>
      <title>Failure of integration point doesn&#39;t have to stop the user: A real life example</title>
      <link>https://www.markhneedham.com/blog/2011/01/10/failure-of-integration-point-doesnt-have-to-stop-the-user-a-real-life-example/</link>
      <pubDate>Mon, 10 Jan 2011 15:28:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/10/failure-of-integration-point-doesnt-have-to-stop-the-user-a-real-life-example/</guid>
      <description>Ashwin and I were recently discussing integration points in software systems and in particular how many systems are designed in such a way that they will stop the user from going any further if one of those integration points is down.
The main point in favour of designing systems in this way is that it’s logically very simple - all operations are synchronous and we don’t have to worry about any offline processing.</description>
    </item>
    
    <item>
      <title>Ruby: Sorting by boolean fields</title>
      <link>https://www.markhneedham.com/blog/2011/01/08/ruby-sorting-by-boolean-fields/</link>
      <pubDate>Sat, 08 Jan 2011 13:15:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2011/01/08/ruby-sorting-by-boolean-fields/</guid>
      <description>We were doing a bit of work on RapidFTR in the ThoughtWorks Pune office today and one problem my pair and I were trying to solve was how to sort a collection of objects by a boolean field.
Therefore given the following array of values:
form_sections = [
  FormSection.new(:enabled =&amp;gt; false, :name =&amp;gt; &amp;#34;a&amp;#34;, :order =&amp;gt; 1),
  FormSection.new(:enabled =&amp;gt; true, :name =&amp;gt; &amp;#34;b&amp;#34;, :order =&amp;gt; 2)
]
We wanted to display those form sections which were disabled at the bottom of the page.</description>
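A minimal sketch of one way to get that ordering. FormSection here is a stand-in Struct rather than the real RapidFTR class; since booleans are not directly sortable, the trick is to map them to integers inside sort_by.

```ruby
# Hypothetical stand-in for the FormSection class in the excerpt above.
FormSection = Struct.new(:enabled, :name, :order, keyword_init: true)

form_sections = [
  FormSection.new(enabled: false, name: "a", order: 1),
  FormSection.new(enabled: true,  name: "b", order: 2)
]

# Enabled sections map to 0 and sort first; disabled sections map to 1
# and sink to the bottom. Ties are broken by the order field.
sorted = form_sections.sort_by { |s| [s.enabled ? 0 : 1, s.order] }
sorted.map { |s| s.name }
```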
    </item>
    
    <item>
      <title>Vim: Learnings so far</title>
      <link>https://www.markhneedham.com/blog/2010/12/27/vim-learnings-so-far/</link>
      <pubDate>Mon, 27 Dec 2010 19:15:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/27/vim-learnings-so-far/</guid>
      <description>I’ve been using Vim instead of RubyMine for the last month or so and it’s been interesting observing the way that I browse code as I add plugins to make my life easier.
Between files
I generally don’t know exactly where in the folder structure different files live since I’m used to being able to search by just the name i.e. RubyMine’s Ctrl-N
Yehuda Katz wrote a blog post earlier in the year where he listed some of the plugins he’s been using - one of which is called Command-T and allows exactly this functionality.</description>
    </item>
    
    <item>
      <title>India Cultural Differences: Hierarchy</title>
      <link>https://www.markhneedham.com/blog/2010/12/27/india-cultural-differences-hierarchy/</link>
      <pubDate>Mon, 27 Dec 2010 14:16:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/27/india-cultural-differences-hierarchy/</guid>
      <description>One of the more interesting differences between Indian culture and my own is that in India there appears to be more adherence to a hierarchy than I’ve experienced before.
ThoughtWorks tries to keep a reasonably flat hierarchy so I think the idea of hierarchy would be much more obvious if I was working at one of the big Indian services organisations.
Between peers conversations don’t seem to play out any differently but someone in a position of authority is more likely to be able to get their opinion across and accepted with less resistance than they might experience without that authority.</description>
    </item>
    
    <item>
      <title>Theory of Constraints: Blaming the bottleneck</title>
      <link>https://www.markhneedham.com/blog/2010/12/26/theory-of-constraints-blaming-the-bottleneck/</link>
      <pubDate>Sun, 26 Dec 2010 00:04:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/26/theory-of-constraints-blaming-the-bottleneck/</guid>
      <description>I’ve been reading The Goal over the last week or so, in which Eliyahu Goldratt describes the theory of constraints as a philosophy for allowing organisations to continually achieve their goal.
Goldratt goes on to describe bottlenecks - resources which have a capacity less than the capacity being demanded of the system.
The capacity of the system cannot be higher than that of the bottleneck which means that we need to find a way to optimise the bottlenecks in any system.</description>
    </item>
    
    <item>
      <title>India Cultural Differences: Language</title>
      <link>https://www.markhneedham.com/blog/2010/12/24/india-cultural-differences-language/</link>
      <pubDate>Fri, 24 Dec 2010 18:12:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/24/india-cultural-differences-language/</guid>
      <description>For the majority of the time that I’ve spent in Pune so far language hasn’t been a big deal at all but there are a couple of differences that I didn’t initially anticipate.
The local language
While the official office language is English, my colleagues seem more comfortable talking to each other in Hindi, so quite frequently the conversation will move into Hindi if someone isn’t directly speaking to me.</description>
    </item>
    
    <item>
      <title>Communication when it&#39;s not going your way</title>
      <link>https://www.markhneedham.com/blog/2010/12/22/communication-when-its-not-going-your-way/</link>
      <pubDate>Wed, 22 Dec 2010 23:32:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/22/communication-when-its-not-going-your-way/</guid>
      <description>I’ve been reading some of the articles written about the disruption caused by the snow across Europe and I found one quote in The Daily Telegraph by Phillip Hammond particularly interesting
&amp;#34;I think whilst people are obviously deeply upset about the inconvenience, particularly at this time of year, of having their travel plans disrupted, most of what I am hearing is a sense of outrage about the way they were then treated when they were stranded at Heathrow airport.</description>
    </item>
    
    <item>
      <title>India Cultural Differences: Stretched work day</title>
      <link>https://www.markhneedham.com/blog/2010/12/20/india-cultural-differences-stretched-work-day/</link>
      <pubDate>Mon, 20 Dec 2010 21:23:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/20/india-cultural-differences-stretched-work-day/</guid>
      <description>A couple of months ago I briefly touched on the very stretched days I’ve experienced while working in India.
This is in contrast to what I’ve experienced in the UK and Australia where the day was much more time boxed and tended to go from 9am to 6pm.
At the moment we also have a call with colleagues in Chicago at 9pm for about 30-45 minutes so the day has now stretched out until nearly 10pm.</description>
    </item>
    
    <item>
      <title>Distributed Agile: Bringing onshore people offshore</title>
      <link>https://www.markhneedham.com/blog/2010/12/20/distributed-agile-bringing-onshore-people-offshore/</link>
      <pubDate>Mon, 20 Dec 2010 08:58:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/20/distributed-agile-bringing-onshore-people-offshore/</guid>
      <description>For the last two weeks we’ve had a ThoughtWorks colleague from the onshore team working with us in Pune and it’s been really cool having someone who has been working on &amp;#39;the other side&amp;#39;.
In my time in India there seem to have been many more people going from offshore to onshore than the other way around but based on this experience I don’t think that should necessarily be the case.</description>
    </item>
    
    <item>
      <title>India Cultural Differences: Tolerance/Patience</title>
      <link>https://www.markhneedham.com/blog/2010/12/15/india-cultural-differences-tolerancepatience/</link>
      <pubDate>Wed, 15 Dec 2010 19:08:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/15/india-cultural-differences-tolerancepatience/</guid>
      <description>Some colleagues have been asking me recently what cultural differences I’ve noticed working in India compared to my experiences in the UK and Australia and one of the biggest differences by far is the amount of tolerance and patience people here have compared to me.
These attributes seem to show themselves in roughly two situations:
With respect to the environment
We’ve had some building work done in the Pune office recently which has meant that there’s been extremely high volume drilling being done to the extent that you can barely hear someone who’s sitting a couple of metres away.</description>
    </item>
    
    <item>
      <title>Ask someone vs work it out yourself</title>
      <link>https://www.markhneedham.com/blog/2010/12/14/ask-someone-vs-work-it-out-yourself/</link>
      <pubDate>Tue, 14 Dec 2010 18:04:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/14/ask-someone-vs-work-it-out-yourself/</guid>
      <description>Back in 2007/2008, when I worked on my first couple of projects at ThoughtWorks, I always found it strange how frequently my colleagues would try and figure something out themselves rather than asking someone else who already knew how to do it.
Fast forward to 2010 and I find myself being the one encouraging people to figure things out themselves.
There’s still merit in communicating with colleagues when we’ve tried to work out how to do something and haven’t managed to figure it out but it’s also useful to not have this as our default mode.</description>
    </item>
    
    <item>
      <title>Technical implementation heavy stories</title>
      <link>https://www.markhneedham.com/blog/2010/12/13/technical-implementation-heavy-stories/</link>
      <pubDate>Mon, 13 Dec 2010 21:29:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/13/technical-implementation-heavy-stories/</guid>
      <description>Earlier this year I wrote about some of the problems that we can run into when we have implicit assumptions in stories and another problematic approach I’ve seen around this area is where we end up with stories that are very heavily focused on technical implementation.
Initially this seems like it will work out pretty well since all the developer then needs to do is follow the steps that have been outlined for them but from my experience it seems to create more problems than it solves.</description>
    </item>
    
    <item>
      <title>Distributed Agile: Other observations</title>
      <link>https://www.markhneedham.com/blog/2010/12/12/distributed-agile-other-observations/</link>
      <pubDate>Sun, 12 Dec 2010 08:11:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/12/distributed-agile-other-observations/</guid>
      <description>Some of the difficulties of working in an offshore environment were clear to me before I even came to India but I’ve come across a few others lately which I either didn’t think about before or didn’t realise how annoying they were!
Getting data from the client’s network
For several of the stories that we’ve been working on lately we needed to make use of huge amounts of reference data residing on the client’s network.</description>
    </item>
    
    <item>
      <title>Bugs: Prioritising by bucket</title>
      <link>https://www.markhneedham.com/blog/2010/12/12/bugs-prioritising-by-bucket/</link>
      <pubDate>Sun, 12 Dec 2010 07:59:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/12/bugs-prioritising-by-bucket/</guid>
      <description>At a lot of organisations that I’ve worked there is a tendency to prioritise bugs by a priority bucket.
We might therefore have priority buckets 1-4 where the bucket number indicates how important the bug is to fix and then any buckets ranked below 4 would not be fixed but would be logged anyway.
From what I’ve noticed this isn’t a particularly effective way of managing bugs.
To start with there tend to be a lot of discussions around what the priority of each bug should be where a QA will argue that it should be a higher priority while a developer disagrees.</description>
    </item>
    
    <item>
      <title>Why am I working in India?</title>
      <link>https://www.markhneedham.com/blog/2010/12/10/why-am-i-working-in-india/</link>
      <pubDate>Fri, 10 Dec 2010 03:47:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/10/why-am-i-working-in-india/</guid>
      <description>A few colleagues have asked me why I chose to work in India so I thought it would be interesting to explore what it is that appealed to me about working here.
I’ve come to the conclusion that there were 2 main drivers for me:
The buzz of the ThoughtWorks office
I was in Bangalore in 2006 when I attended ThoughtWorks University and one of the things that stood out for me was the atmosphere in the Diamond District office.</description>
    </item>
    
    <item>
      <title>Ruby: One method, two parameter types</title>
      <link>https://www.markhneedham.com/blog/2010/12/07/ruby-one-method-two-parameter-types/</link>
      <pubDate>Tue, 07 Dec 2010 05:01:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/07/ruby-one-method-two-parameter-types/</guid>
      <description>One interesting thing that I’ve noticed while coding in Ruby is that, due to the dynamic nature of the language, it’s possible to pass values of different types into a given method as parameters.
For example, I’ve recently come across a few examples of methods designed like this:
def calculate_foo_prices(foos)
  ...
  [foos].flatten.each do |foo|
    # do something
  end
end
This allows us to use the method like this:
# foos would come in as an array from the UI foos = [Foo.</description>
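A self-contained sketch of the pattern the excerpt describes: wrapping the parameter in an array and flattening means one method accepts either a single object or an array. The price calculation body here is hypothetical, standing in for whatever the real method did.

```ruby
# [foos].flatten turns a single object into a one-element array and
# leaves an array of objects as a flat array, so the same method body
# handles both parameter types.
def calculate_foo_prices(foos)
  [foos].flatten.map { |foo| foo * 2 }  # hypothetical price calculation
end

single = calculate_foo_prices(5)        # a single value from the UI
multiple = calculate_foo_prices([1, 2, 3])  # an array from the UI
```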
    </item>
    
    <item>
      <title>Ruby: Exiting a &#39;loop&#39; early</title>
      <link>https://www.markhneedham.com/blog/2010/12/01/ruby-exiting-a-loop-early/</link>
      <pubDate>Wed, 01 Dec 2010 17:56:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/12/01/ruby-exiting-a-loop-early/</guid>
      <description>We recently had a problem to solve which at its core required us to iterate through a collection, look up a value for each key and then exit as soon as we’d found a value.
The original solution looped through the collection and then explicitly returned once a value had been found:
def iterative_version
  v = nil
  [1,2,3,4,5].each do |i|
    v = long_running_method i
    return v unless v.nil?
  end
  v
end

def long_running_method(value)
  puts &amp;#34;inside the long running method with #{value}&amp;#34;
  return nil if value &amp;gt; 3
  value
end
Which we run like so:</description>
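As a sketch of one alternative to the explicit return, a lazy enumerator stops calling long_running_method as soon as the first non-nil value appears. Note this relies on Enumerator::Lazy, which arrived in Ruby 2.0, well after this 2010 post.

```ruby
def long_running_method(value)
  return nil if value > 3
  value
end

# The lazy enumerator only evaluates elements on demand, so find
# stops the iteration at the first non-nil mapped value instead of
# calling long_running_method for every element.
def lazy_version
  [1, 2, 3, 4, 5].lazy.map { |i| long_running_method(i) }.find { |v| !v.nil? }
end
```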
    </item>
    
    <item>
      <title>Noone wants your stupid process - Jeff Patton</title>
      <link>https://www.markhneedham.com/blog/2010/11/30/noone-wants-your-stupid-process-jeff-patton/</link>
      <pubDate>Tue, 30 Nov 2010 20:35:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/30/noone-wants-your-stupid-process-jeff-patton/</guid>
      <description>My former colleague Alexandre Martins recently pointed me to a presentation given by Jeff Patton at Agile Roots titled &amp;#39;Noone wants your stupid process&amp;#39; and it’s one of the most interesting talks I’ve watched recently.
In the talk Jeff cites globo.com as a case study of a company which is using an agile approach to development of their website but are starting to doubt whether it’s the best way to go about things.</description>
    </item>
    
    <item>
      <title>Consulting is like inception</title>
      <link>https://www.markhneedham.com/blog/2010/11/30/consulting-is-like-inception/</link>
      <pubDate>Tue, 30 Nov 2010 19:25:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/30/consulting-is-like-inception/</guid>
      <description>My colleague Jason Yip recently tweeted the following…
Sometimes consulting reminds me of the movie Inception
…which reminded me of a conversation I was having with a colleague who’s been working on consulting engagements here for the last few months.
I was describing some of the things that I wanted to change on my team and she pointed out that I always described each change as something that I wanted to change rather than something which I wanted to see change.</description>
    </item>
    
    <item>
      <title>Local port forwarding</title>
      <link>https://www.markhneedham.com/blog/2010/11/29/local-port-forwarding/</link>
      <pubDate>Mon, 29 Nov 2010 19:42:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/29/local-port-forwarding/</guid>
      <description>A colleague and I ran into an interesting problem today which we wanted to use local port forwarding to solve.
In our environment.rb file we have a Solr instance url defined like so:
SOLR_CONFIG = {
  :service_url =&amp;gt; &amp;#34;http://some.internal.address:9983/solr/sco_slave_1&amp;#34;
}
It’s defined like that because our colleagues in Chicago have set up a Solr instance on a test environment and all the developers hit the same box.
In Pune everyone has Solr configured on their own box so we really wanted to configure that url to be &amp;#39;localhost&amp;#39; on port &amp;#39;8983&amp;#39;.</description>
    </item>
    
    <item>
      <title>Team Communication: Learning models</title>
      <link>https://www.markhneedham.com/blog/2010/11/27/team-communication-learning-models/</link>
      <pubDate>Sat, 27 Nov 2010 10:50:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/27/team-communication-learning-models/</guid>
      <description>One of the problems I’ve noticed in several of the &amp;#39;agile&amp;#39; communication mechanisms (such as the standup or dev huddle) that we typically use on teams is that they focus almost entirely on verbal communication which only covers one of our learning styles - the auditory learning style.
The Learning Models
The VAK learning style model describes the different learning styles that people have:
Visual - seeing and reading.</description>
    </item>
    
    <item>
      <title>Increasing team sizes: Parallelising work</title>
      <link>https://www.markhneedham.com/blog/2010/11/26/increasing-team-sizes-parallelising-work/</link>
      <pubDate>Fri, 26 Nov 2010 03:53:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/26/increasing-team-sizes-parallelising-work/</guid>
      <description>One of the trickiest things to do when working in bigger teams is ensuring that it is possible to parallelise the work we have across the number of pairs that we have available.
From my experience this problem happens much less frequently in smaller teams. Perhaps inevitably it’s much easier to find 2 or 3 things that can be worked on in parallel than it is to find 6 or 7 or more.</description>
    </item>
    
    <item>
      <title>Interviewing: Communication</title>
      <link>https://www.markhneedham.com/blog/2010/11/26/interviewing-communication/</link>
      <pubDate>Fri, 26 Nov 2010 03:50:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/26/interviewing-communication/</guid>
      <description>I’ve been in India for around 4 months and in that time I think I’ve probably interviewed more people than I have in the last 4 years.
Over this time I’ve come to realise that the two main things I’m looking for in candidates are passion and ability to communicate effectively.
It’s relatively easy to pick up on whether someone is passionate about what they do in a conversation or while pairing with them but I find the communication aspect a bit more tricky.</description>
    </item>
    
    <item>
      <title>A dirty hack to get around aliases not working in a shell script</title>
      <link>https://www.markhneedham.com/blog/2010/11/24/a-dirty-hack-to-get-around-aliases-not-working-in-a-shell-script/</link>
      <pubDate>Wed, 24 Nov 2010 18:48:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/24/a-dirty-hack-to-get-around-aliases-not-working-in-a-shell-script/</guid>
      <description>In another script I’ve been working on lately I wanted to call &amp;#39;mysql&amp;#39; but unfortunately on my machine it’s &amp;#39;mysql5&amp;#39; rather than &amp;#39;mysql&amp;#39;.
I have an alias defined in &amp;#39;~/.bash_profile&amp;#39; so I can call &amp;#39;mysql&amp;#39; from the terminal whenever I want to.
alias mysql=mysql5
Unfortunately shell scripts don’t seem to have access to this alias and the only suggestion I’ve come across while googling this is to source &amp;#39;~/.bash_profile&amp;#39; inside the script.</description>
    </item>
    
    <item>
      <title>Ruby: Checking for environment variables in a script</title>
      <link>https://www.markhneedham.com/blog/2010/11/24/ruby-checking-for-environment-variables-in-a-script/</link>
      <pubDate>Wed, 24 Nov 2010 18:34:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/24/ruby-checking-for-environment-variables-in-a-script/</guid>
      <description>I’ve been working on a Ruby script to allow us to automate part of our Solr data setup and part of the task was to check that some environment variables were set and throw an exception if not.
I got a bit stuck initially trying to work out how to return a message showing only the missing environment variables but it turned out to be pretty simple when I came back to it a couple of hours later.</description>
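A sketch of the kind of check described above: collect every missing variable first, then raise one exception that names them all, rather than failing on the first one. The variable names here are hypothetical; the real script's list would differ.

```ruby
# Hypothetical list of variables the Solr setup script might need.
REQUIRED_VARS = %w[SOLR_HOST SOLR_PORT]

def check_environment!(required = REQUIRED_VARS)
  # Reject the variables that are set, keeping only the missing ones,
  # so the error message shows everything that needs fixing at once.
  missing = required.reject { |var| ENV[var] }
  unless missing.empty?
    raise "Missing environment variables: #{missing.join(', ')}"
  end
end
```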
    </item>
    
    <item>
      <title>Systems Thinking: Individuals and the environment</title>
      <link>https://www.markhneedham.com/blog/2010/11/23/systems-thinking-individuals-and-the-environment/</link>
      <pubDate>Tue, 23 Nov 2010 20:20:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/23/systems-thinking-individuals-and-the-environment/</guid>
      <description>Something which I’ve become fairly convinced about recently is that the environment that someone works in has far more impact on their perceived performance than their own individual skills.
Given that belief, I’ve often struggled to explain why some people are better able to handle a difficult environment than others - in terms of accepting the situation and finding a way of being productive regardless.
Does this mean that they’re better than people who can’t work in that environment as effectively?</description>
    </item>
    
    <item>
      <title>Make it interesting for yourself</title>
      <link>https://www.markhneedham.com/blog/2010/11/22/make-it-interesting-for-yourself/</link>
      <pubDate>Mon, 22 Nov 2010 19:58:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/22/make-it-interesting-for-yourself/</guid>
      <description>Just over a year ago I wrote a post about learning one thing each day and since I’ve been struggling to do this lately I thought I’d come back to this topic again.
My general thinking at the time I wrote that post was that sometimes it would be really difficult to find a way to learn anything on the project I was working on and the only way to learn would be to play around with something outside work.</description>
    </item>
    
    <item>
      <title>The Adventures of Johnny Bunko -  The Last Career Guide You&#39;ll Ever Need: Book Review</title>
      <link>https://www.markhneedham.com/blog/2010/11/21/the-adventures-of-johnny-bunko-the-last-career-guide-youll-ever-need-book-review/</link>
      <pubDate>Sun, 21 Nov 2010 17:02:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/21/the-adventures-of-johnny-bunko-the-last-career-guide-youll-ever-need-book-review/</guid>
      <description>I read Dan Pink’s A Whole New Mind earlier in the year but I hadn’t heard of The Adventures of Johnny Bunko until my colleague Sumeet Moghe mentioned it in a conversation during ThoughtWorks India’s XConf, an internal conference run here.
The book is written in the Manga format so it’s incredibly quick to read and it gives 6 ideas around building a career.
I’m generally not a fan of the idea of &amp;#39;building a career&amp;#39; - generally when I hear that phrase it involves having a &amp;#39;five year&amp;#39; plan and other such concepts which I consider to be pointless.</description>
    </item>
    
    <item>
      <title>From unconsciously incompetent to consciously incompetent</title>
      <link>https://www.markhneedham.com/blog/2010/11/19/from-unconsciously-incompetent-to-consciously-incompetent/</link>
      <pubDate>Fri, 19 Nov 2010 20:20:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/19/from-unconsciously-incompetent-to-consciously-incompetent/</guid>
      <description>One of the cool things about software development is that despite writing code for 5 years professionally and just under 10 altogether, there are still a phenomenal number of things that I don’t know how to do.
The learning opportunities are vast!
One of the areas which I’ve known I don’t know that much about is Unix command line tools such as awk and sed.
Since the majority of projects that I’ve worked on have involved using Windows as the development environment I’ve never had extended exposure to the types of problems we get on a project which require their use.</description>
    </item>
    
    <item>
      <title>Capistrano, sed, escaping forward slashes and &#39;p&#39; is not &#39;puts&#39;!</title>
      <link>https://www.markhneedham.com/blog/2010/11/18/capistrano-sed-escaping-forward-slashes-and-p-is-not-puts/</link>
      <pubDate>Thu, 18 Nov 2010 18:40:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/18/capistrano-sed-escaping-forward-slashes-and-p-is-not-puts/</guid>
      <description>Priyank and I have been working on automating part of our deployment process and one task we needed to do as part of this is replace some variables used in one of our shell scripts.
All the variables in the script refer to production specific locations but we needed to change a couple of them in order to run the script in our QA environment.
We’ve therefore written a sed command, which we call from Capistrano, to allow us to do this.</description>
    </item>
    
    <item>
      <title>Rails: A slightly misleading error</title>
      <link>https://www.markhneedham.com/blog/2010/11/16/rails-a-slightly-misleading-error/</link>
      <pubDate>Tue, 16 Nov 2010 21:17:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/16/rails-a-slightly-misleading-error/</guid>
      <description>We recently created a new project to handle the reporting part of our application and as with all our projects we decided not to check in any configuration &amp;#39;.yml&amp;#39; files but rather &amp;#39;.yml.example&amp;#39; files which people can then customise for their own environments.
So our config directory would look something like this when you first check out the project:
config
database.yml.example
some.yml.example
And we’d need to copy those files to get &amp;#39;.</description>
    </item>
    
    <item>
      <title>Retrospectives: My first time facilitating</title>
      <link>https://www.markhneedham.com/blog/2010/11/15/retrospectives-my-first-time-facilitating/</link>
      <pubDate>Mon, 15 Nov 2010 19:52:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/15/retrospectives-my-first-time-facilitating/</guid>
      <description>Despite being part of numerous retrospectives over the past few years, I don’t remember actually facilitating one until my current team’s retrospective last week.
I’ve gradually come to appreciate the skill involved in facilitating this type of meeting having originally been of the opinion that there wasn’t much to it.
I recently read Agile Retrospectives which has loads of different ideas for activities beyond just creating &amp;#39;went well&amp;#39; and &amp;#39;could improve&amp;#39; columns and then filling those in as a group.</description>
    </item>
    
    <item>
      <title>Agile: Increasing team sizes</title>
      <link>https://www.markhneedham.com/blog/2010/11/14/agile-increasing-team-sizes/</link>
      <pubDate>Sun, 14 Nov 2010 11:51:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/14/agile-increasing-team-sizes/</guid>
      <description>A fairly common trend on nearly every project I’ve worked on is that at some stage the client will ask for more people to be added to the team in order to &amp;#39;improve&amp;#39; the velocity.
Some of the most common arguments against doing so are that it will initially slow down the team’s velocity as the new members learn the domain, code base and get to know the other members of the team.</description>
    </item>
    
    <item>
      <title>Experiments in not using the mouse</title>
      <link>https://www.markhneedham.com/blog/2010/11/12/experiments-in-not-using-the-mouse/</link>
      <pubDate>Fri, 12 Nov 2010 15:43:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/12/experiments-in-not-using-the-mouse/</guid>
      <description>Priyank and I have been pairing a bit lately and we thought it’d be interesting to try and not use the mouse for anything that we had to do while pairing.
Editor
Priyank uses GVim (Yehuda Katz recommends MacVim if you’re using Mac OS) so we already don’t need to use the mouse at all when we’re inside the editor.
One annoying thing we found is that sometimes we wanted to copy stuff from the terminal into GVim and couldn’t think of a good way to do that without selecting the text on the terminal with a mouse and then &amp;#39;Ctrl-C’ing.</description>
    </item>
    
    <item>
      <title>Distributed Agile: Communicating big design decisions</title>
      <link>https://www.markhneedham.com/blog/2010/11/10/distributed-agile-communicating-big-design-decisions/</link>
      <pubDate>Wed, 10 Nov 2010 19:58:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/10/distributed-agile-communicating-big-design-decisions/</guid>
      <description>Although we mostly split the work on my project so that there aren’t too many dependencies between the teams in Chicago and Pune, there have still been some times when we’ve designed major parts of the code base in Pune and have needed to communicate that to our Chicago colleagues.
I’d never been in this situation before, so it’s been interesting to see which approaches work for doing this effectively while still allowing the people in the other location to have input as well.</description>
    </item>
    
    <item>
      <title>Active Record: Nested attributes</title>
      <link>https://www.markhneedham.com/blog/2010/11/09/active-record-nested-attributes/</link>
      <pubDate>Tue, 09 Nov 2010 18:37:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/09/active-record-nested-attributes/</guid>
      <description>I recently learnt about quite a neat feature of Active Record called nested attributes which allows you to save attributes on associated records of a parent model.
It’s been quite useful for us as we have a few pages in our application where the user is able to update models like this.
We would typically end up with parameters coming into the controller like this:
class FoosController &amp;lt; ApplicationController
  def update
    # params = { :id =&amp;gt; &amp;#34;1&amp;#34;, :foo =&amp;gt; { :baz =&amp;gt; &amp;#34;new_baz&amp;#34;, :bar_attributes =&amp;gt; { :value =&amp;gt; &amp;#34;something&amp;#34; } } }
    Foo.</description>
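The Rails feature behind this is accepts_nested_attributes_for, which generates a bar_attributes= writer on the parent. Here is a plain-Ruby sketch of that mechanism (no Active Record; Foo and Bar are hypothetical stand-ins), showing why the :bar_attributes key in the params above can update the child in a single update call.

```ruby
# Plain-Ruby sketch of what accepts_nested_attributes_for effectively
# generates: a writer on the parent that applies a hash of attributes
# to the associated record.
class Bar
  attr_accessor :value
end

class Foo
  attr_accessor :baz
  attr_reader :bar

  def initialize
    @bar = Bar.new
  end

  # Rails would generate a setter along these lines for :bar_attributes.
  def bar_attributes=(attrs)
    attrs.each { |key, val| @bar.public_send("#{key}=", val) }
  end

  # Stand-in for Active Record's update: assign each param via its setter.
  def update(params)
    params.each { |key, val| public_send("#{key}=", val) }
    self
  end
end

foo = Foo.new.update(:baz => "new_baz", :bar_attributes => { :value => "something" })
```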
    </item>
    
    <item>
      <title>Distributed Agile: Communication - Reliance on one person</title>
      <link>https://www.markhneedham.com/blog/2010/11/08/distributed-agile-communication-reliance-on-one-person/</link>
      <pubDate>Mon, 08 Nov 2010 13:56:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/08/distributed-agile-communication-reliance-on-one-person/</guid>
      <description>Continuing with my series of observations on what it’s like working in a distributed agile team, another thing that I’ve noticed is that it’s useful to try and ensure that there is communication between as many people as possible in the two cities.
This means that we want to ensure that we don’t have an over reliance on one person to handle any communication.
We have a call once a day between developers in Pune and Chicago and the Chicago guys have been able to achieve this by rotating the person attending the call.</description>
    </item>
    
    <item>
      <title>Retrospectives: General observations</title>
      <link>https://www.markhneedham.com/blog/2010/11/06/retrospectives-general-observations/</link>
      <pubDate>Sat, 06 Nov 2010 17:17:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/06/retrospectives-general-observations/</guid>
      <description>Following on from my blog post about some observations about the actions that we create in retrospectives I’ve also noticed some general ways that retrospectives might not end up being as useful as we’d hope.
Having a manager facilitating
While having the manager of the team facilitating the retrospective isn’t a problem in itself I think it’s useful to remember that in this context they aren’t in that role anymore.</description>
    </item>
    
    <item>
      <title>Retrospectives: Actions</title>
      <link>https://www.markhneedham.com/blog/2010/11/06/retrospectives-actions/</link>
      <pubDate>Sat, 06 Nov 2010 11:59:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/11/06/retrospectives-actions/</guid>
      <description>My colleague Ashwin Raghav wrote a blog post earlier in the week in which he noted some patterns that he’s noticed in retrospectives in his time working in ThoughtWorks.
In it he talks quite generally about things he’s noticed but in my experience one of the areas in which teams typically struggle is when it comes to action items.
Too many action items
I think this is probably the number one mistake that we make in retrospectives and it’s really easy to make.</description>
    </item>
    
    <item>
      <title>Distributed Agile: Context</title>
      <link>https://www.markhneedham.com/blog/2010/10/31/distributed-agile-context/</link>
      <pubDate>Sun, 31 Oct 2010 18:27:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/31/distributed-agile-context/</guid>
      <description>From my last couple of months working for ThoughtWorks in Pune I think the most common subject that I’ve heard discussed is how to ensure that the team offshore is receiving all the context about the decisions and direction being taken onshore.
What I’ve found most interesting is that I think out of all the teams that I’ve worked on in the last four years my current team has by far the most context about what the client wants to do and the approaches they want to take over the next few months.</description>
    </item>
    
    <item>
      <title>Meetings: Guerilla Collaboration</title>
      <link>https://www.markhneedham.com/blog/2010/10/31/meetings-guerilla-collaboration/</link>
      <pubDate>Sun, 31 Oct 2010 14:53:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/31/meetings-guerilla-collaboration/</guid>
      <description>As I’ve mentioned on twitter a few times my current team has a lot of meetings and apart from using the passive aggressive approach that Toby Tripp’s meeting ticker provides I’ve also been flicking through Chapter 19, &amp;#39;Guerilla Collaboration&amp;#39;, of Jean Tabaka’s &amp;#39;Collaboration Explained: Facilitation skills for software project leaders&amp;#39; which gives other ideas.
I’ve also seen some useful ideas that my colleagues have used in meetings that I’ve been part of.</description>
    </item>
    
    <item>
      <title>Ruby: Getting Active Record validation errors twice</title>
      <link>https://www.markhneedham.com/blog/2010/10/29/ruby-getting-active-record-validation-errors-twice/</link>
      <pubDate>Fri, 29 Oct 2010 04:27:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/29/ruby-getting-active-record-validation-errors-twice/</guid>
      <description>I managed to create an interesting problem for myself while playing around with some code whereby I was ending up with validation errors appearing twice every time I called &amp;#39;valid?&amp;#39; on a specific model.
I figured I was probably doing something stupid and in fact a few replies by Aaron Baldwin on a mailing list thread on &amp;#39;rubyonrails-talk&amp;#39; helped explain exactly what I’d done:
Are you calling require &amp;#39;employee&amp;#39; anywhere?</description>
    </item>
    
    <item>
      <title>Ruby: Using a variable in a regex</title>
      <link>https://www.markhneedham.com/blog/2010/10/27/ruby-using-a-variable-in-a-regex/</link>
      <pubDate>Wed, 27 Oct 2010 13:55:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/27/ruby-using-a-variable-in-a-regex/</guid>
      <description>We’re using Web Mock on my current project to stub out some of the external web requests in some of our integration tests and I managed to get myself very confused while trying to use a variable inside a regular expression that I was trying to pass to the &amp;#39;stub_request&amp;#39; method.
The code was roughly like this:
some_url = &amp;#34;http://service.com/method&amp;#34; stub_request(:any, /some_url/). to_return(:body =&amp;gt; File.new(&amp;#39;/path/to/some.xml&amp;#39;), :headers =&amp;gt; {&amp;#39;Content-Length&amp;#39; =&amp;gt; 666, &amp;#39;Content-Type&amp;#39; =&amp;gt; &amp;#39;text/xml&amp;#39;}, :status =&amp;gt; 200) The request was being stubbed when I hard coded the url inside the regular expression but not being stubbed when I used the variable like in the example above.</description>
    </item>
    
    <item>
      <title>Distributed Agile: Communication</title>
      <link>https://www.markhneedham.com/blog/2010/10/27/distributed-agile-communication/</link>
      <pubDate>Wed, 27 Oct 2010 13:50:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/27/distributed-agile-communication/</guid>
      <description>I’d always heard that communication when you’re working offshore was much more difficult than in a co-located team but it’s quite difficult to imagine exactly what the difficulties are until you see them for yourself.
These are some of my latest observations in this area so far.
Learning models
I’m a very visual learner and the majority of the time any communication between people in two different locations will be done through words either via email or on a conference call.</description>
    </item>
    
    <item>
      <title>Communication: Logging levels</title>
      <link>https://www.markhneedham.com/blog/2010/10/25/communication-logging-levels/</link>
      <pubDate>Mon, 25 Oct 2010 18:49:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/25/communication-logging-levels/</guid>
      <description>I think one of the most important skills to perfect when communicating with other people is to understand the level of detail that we need to be speaking at, something my colleague Ashwin Raghav refers to as our logging level.
We log various things in our code at varying logging levels ranging from &amp;#39;debug&amp;#39; through &amp;#39;warn&amp;#39; to &amp;#39;error&amp;#39;, and each of these is useful for understanding what our code is doing.</description>
    </item>
    
    <item>
      <title>Ruby: Mocking or stubbing methods on the system under test</title>
      <link>https://www.markhneedham.com/blog/2010/10/24/ruby-mocking-or-stubbing-methods-on-the-system-under-test/</link>
      <pubDate>Sun, 24 Oct 2010 17:30:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/24/ruby-mocking-or-stubbing-methods-on-the-system-under-test/</guid>
      <description>An approach to testing which I haven’t seen before and am therefore assuming is more specific to Ruby is the idea of stubbing or mocking out functions on the system under test.
I’ve come across a couple of situations where this seems to be done:
When stubbing out calls to methods which are being mixed into the class via a module
When stubbing out calls to private methods within the class</description>
    </item>
    
    <item>
      <title>Feedback loops: Overcompensating</title>
      <link>https://www.markhneedham.com/blog/2010/10/24/feedback-loops-overcompensating/</link>
      <pubDate>Sun, 24 Oct 2010 08:39:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/24/feedback-loops-overcompensating/</guid>
      <description>One of the things that I’ve noticed while working with various colleagues over the last few years is that the more experienced ones are much more skilled at making slight adjustments to their approach based on feedback that they receive from the environment.
I’ve been reading a couple of books on systems thinking over the last few months and one of the takeaways for me has been that we need to be careful when reacting to feedback we get from a system to ensure that we don’t overcompensate and end up creating a new problem for ourselves instead.</description>
    </item>
    
    <item>
      <title>Agile: Story Wall - A couple of learnings</title>
      <link>https://www.markhneedham.com/blog/2010/10/22/agile-story-wall-a-couple-of-learnings/</link>
      <pubDate>Fri, 22 Oct 2010 17:13:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/22/agile-story-wall-a-couple-of-learnings/</guid>
      <description>I wrote earlier in the week about the benefits of having a physical story wall on a distributed team and in the process of getting one in place on the project we learnt a few things that I’d previously taken for granted.
All the work in one place
We initially started off by having stories on one part of the wall, bugs on another part and any technical tasks stored in Mingle somewhere.</description>
    </item>
    
    <item>
      <title>Learning: Writing about simple things</title>
      <link>https://www.markhneedham.com/blog/2010/10/20/learning-writing-about-simple-things/</link>
      <pubDate>Wed, 20 Oct 2010 20:51:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/20/learning-writing-about-simple-things/</guid>
      <description>My colleague Aman King is back in Pune for the time being and during one of our conversations he was asking me why I didn’t wait a bit longer and learn more about Ruby before writing about it.
In a way he is right and I didn’t write anything at all about C# or Java when I was first learning how to write code in those languages because I didn’t have the confidence to write about something that I knew nothing about.</description>
    </item>
    
    <item>
      <title>Distributed Agile: Physical story wall still useful</title>
      <link>https://www.markhneedham.com/blog/2010/10/20/distributed-agile-physical-story-wall-still-useful/</link>
      <pubDate>Wed, 20 Oct 2010 17:21:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/20/distributed-agile-physical-story-wall-still-useful/</guid>
      <description>When I started working on my current project there was no physical story wall, instead the whole project was being tracked on Mingle.
The current state of the Mingle story wall was sometimes visible on a shared monitor and sometimes wasn’t, depending on whether or not the monitor had been turned off.
There was also a small wall used to track which stories were in development but after that there was no physical visibility of the status of anything.</description>
    </item>
    
    <item>
      <title>Coding: Context independent code</title>
      <link>https://www.markhneedham.com/blog/2010/10/18/coding-context-independent-code/</link>
      <pubDate>Mon, 18 Oct 2010 15:52:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/18/coding-context-independent-code/</guid>
      <description>I’ve been flicking through Growing Object Oriented Software Guided By Tests again and in Chapter 6 on Object Oriented Style I came across the part of the chapter which talks about writing context independent code which reminded me of some code I’ve worked on recently.
The authors suggest the following:
A system is easier to change if its objects are context-independent; that is, if each object has no built-in knowledge about the system in which it executes</description>
    </item>
    
    <item>
      <title>Ruby: Using alias with &#39;indexers&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/10/18/ruby-using-alias-with-indexers/</link>
      <pubDate>Mon, 18 Oct 2010 04:24:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/18/ruby-using-alias-with-indexers/</guid>
      <description>I’ve been browsing through some of the Rails routing code while following Jamis Buck’s blog post and I came across something I hadn’t seen before while inside the &amp;#39;NamedRouteCollection&amp;#39; class.
The bit of code which initially confused me is in RouteSet.add_named_route:
module ActionController module Routing class RouteSet def initialize ... self.named_routes = NamedRouteCollection.new end def add_named_route(name, path, options = {}) # TODO - is options EVER used? name = options[:name_prefix] + name.</description>
    </item>
    
    <item>
      <title>Distributed Agile: Cultural Differences/Expectation disconnect</title>
      <link>https://www.markhneedham.com/blog/2010/10/17/distributed-agile-cultural-differencesexpectation-disconnect/</link>
      <pubDate>Sun, 17 Oct 2010 15:06:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/17/distributed-agile-cultural-differencesexpectation-disconnect/</guid>
      <description>I came across an article written a few months ago titled &amp;#39;Outsourcing doesn’t work&amp;#39; which discussed some of the problems the author has experienced while working with teams offshore.
The article is provocatively titled but has some interesting observations which I thought I could contrast with my own after working offshore in Pune, India for a couple of months now.
The team I’m working on is distributed between Pune and Chicago so it’s not exactly the same situation as the author’s but the majority of the team are in a different country to the client.</description>
    </item>
    
    <item>
      <title>Ruby: Hash default value</title>
      <link>https://www.markhneedham.com/blog/2010/10/16/ruby-hash-default-value/</link>
      <pubDate>Sat, 16 Oct 2010 14:02:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/16/ruby-hash-default-value/</guid>
      <description>I’ve been pairing a fair bit with Ashwin this week and one thing he showed me which I hadn’t previously seen is the ability to set a default value for a hash which gets returned if we search for a key that doesn’t exist.
This is an idea that I originally came across while playing around with Clojure but with Clojure the default value was defined in the calling code rather than in the hash definition.</description>
    </item>
    
    <item>
      <title>RSpec: Testing Rails routes</title>
      <link>https://www.markhneedham.com/blog/2010/10/13/rspec-testing-rails-routes/</link>
      <pubDate>Wed, 13 Oct 2010 18:25:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/13/rspec-testing-rails-routes/</guid>
      <description>Something which I keep forgetting is how to write controller tests where I want to check whether an action correctly redirected to another action.
With most of the routes in our application we’ve created a &amp;#39;resourceful route&amp;#39; where each action maps to a CRUD operation in the database.
We can do that with this type of code in routes.rb:
ActionController::Routing::Routes.draw do |map| map.resources :foos end Several helper methods based on named routes get created and included in our controllers when we do this and we have access to those inside our specs.</description>
    </item>
    
    <item>
      <title>Agile: Constraints</title>
      <link>https://www.markhneedham.com/blog/2010/10/13/agile-constraints/</link>
      <pubDate>Wed, 13 Oct 2010 14:03:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/13/agile-constraints/</guid>
      <description>I recently came across quite an interesting post written by Steve Garnett where he discusses the difference between constraints and impediments inside organisations.
He comes to the following conclusion:
For me, the difference between an impediment and a constraint is whether the individual, team, organisation, enterprise, or industry considers the obstacle as removable. If whoever is working with the obstacle believes it can be removed then it is considered an impediment; if the same person doesn’t believe it can be removed, or doesn’t wish to work towards its removal, it’s considered a constraint.</description>
    </item>
    
    <item>
      <title>Ruby: Active Record - Using &#39;exclusive_scope&#39; in IRB</title>
      <link>https://www.markhneedham.com/blog/2010/10/11/ruby-active-record-using-exclusive_scope-in-irb/</link>
      <pubDate>Mon, 11 Oct 2010 19:03:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/11/ruby-active-record-using-exclusive_scope-in-irb/</guid>
      <description>Ashwin and I have been working recently on a bit of code to make it possible to &amp;#39;soft delete&amp;#39; some objects in our system.
We’re doing this by creating an additional column in that table called &amp;#39;deleted_at_date&amp;#39; which we populate if a record is &amp;#39;deleted&amp;#39;.
As we wanted the rest of the application to ignore &amp;#39;deleted&amp;#39; records we added a default scope to it:
class Foo &amp;lt; ActiveRecord::Base default_scope :conditions =&amp;gt; &amp;#34;deleted_at_date is null&amp;#34; end This works fine but we wanted to be able to see the status of all the records in IRB and with the default scope &amp;#39;Foo.</description>
    </item>
    
    <item>
      <title>Agile: The curse of meetings</title>
      <link>https://www.markhneedham.com/blog/2010/10/09/agile-the-curse-of-meetings/</link>
      <pubDate>Sat, 09 Oct 2010 03:39:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/09/agile-the-curse-of-meetings/</guid>
      <description>Something which can often happen with agile software development teams is that in the desire to take everyone’s opinion into account for every decision we end up having a lot of meetings.
Toni wrote about this a while ago and described a situation where he’d managed to get rid of a meeting and just have a discussion after the stand up with the necessary people.
While this is a good idea I still think there are occasions where it’s not necessary to discuss every problem down to the minute details with the whole team.</description>
    </item>
    
    <item>
      <title>Ruby: Getting the caller method with Kernel.caller</title>
      <link>https://www.markhneedham.com/blog/2010/10/08/ruby-getting-the-caller-method-with-kernel-caller/</link>
      <pubDate>Fri, 08 Oct 2010 13:19:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/08/ruby-getting-the-caller-method-with-kernel-caller/</guid>
      <description>One of the things I’ve been finding when debugging Cucumber specs is that due to the number of levels of indirection present in those examples it becomes quite difficult to work out exactly how certain pieces of code got called.
In one cuke we were trying to work out how 4 objects of the same type were ending up in the database when it seemed like there should only be two.</description>
    </item>
    
    <item>
      <title>Rails: before_filter, rescue_from and so on</title>
      <link>https://www.markhneedham.com/blog/2010/10/05/rails-before_filter-rescue_from-and-so-on/</link>
      <pubDate>Tue, 05 Oct 2010 08:53:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/05/rails-before_filter-rescue_from-and-so-on/</guid>
      <description>One thing I’ve noticed while browsing our Rails code base is that the first entry point inside a controller is much less frequently the method corresponding to the action than it would be with a C# ASP.NET MVC application.
The concept of filters exists in ASP.NET MVC but on the projects I’ve worked on they’ve been used significantly less than before filters would be in a Rails application.
As a result I’m getting much more in the habit of checking for the before filters in the ApplicationController when an action isn’t working as expected to try and figure out what’s going on.</description>
    </item>
    
    <item>
      <title>Coding: Write the first one ugly</title>
      <link>https://www.markhneedham.com/blog/2010/10/03/coding-write-the-first-one-ugly/</link>
      <pubDate>Sun, 03 Oct 2010 05:03:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/10/03/coding-write-the-first-one-ugly/</guid>
      <description>I just came across a really cool blog post written a couple of months ago by Evan Light where he proposes that we &amp;#39;write the first one ugly&amp;#39;:
To overcome paralysis, for small chunks of code, it is often better to just write whatever comes to mind — no matter how awful it may seem at the time. Give yourself permission to let the first version suck.
I think this is a really good piece of advice and it seems along the same lines as a suggestion from Uncle Bob in Clean Code:</description>
    </item>
    
    <item>
      <title>RSpec: Another newbie mistake</title>
      <link>https://www.markhneedham.com/blog/2010/09/30/rspec-another-newbie-mistake/</link>
      <pubDate>Thu, 30 Sep 2010 07:03:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/30/rspec-another-newbie-mistake/</guid>
      <description>We recently had a spec which was checking that we didn’t receive a call to a specific method on an object…​
describe &amp;#34;Our Object&amp;#34; do it &amp;#34;should not update property if user is not an admin&amp;#34; do our_user = Factory(&amp;#34;user_with_role_x&amp;#34;) User.stub!(:find).and_return(our_user) our_user.stub!(:is_admin?).and_return(false) our_user.should_not_receive(:property) end end …where &amp;#39;property&amp;#39; refers to a field in the users table. In the code &amp;#39;property&amp;#39; would get set like this:
class ObjectUnderTest def method_under_test user = User.</description>
    </item>
    
    <item>
      <title>Ruby: ActiveRecord 2.3.5 object equality</title>
      <link>https://www.markhneedham.com/blog/2010/09/30/ruby-activerecord-2-3-5-object-equality/</link>
      <pubDate>Thu, 30 Sep 2010 07:00:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/30/ruby-activerecord-2-3-5-object-equality/</guid>
      <description>We learnt something interesting about the equality of ActiveRecord objects today while comparing two user objects - one which was being provided to our application by Warden and the other that we’d retrieved by a &amp;#39;User.find&amp;#39; call.
Both objects referred to the same user in the database but were different instances in memory.
We needed to check that we were referring to the same user for one piece of functionality and were therefore able to make use of the &amp;#39;==&amp;#39; method defined on ActiveRecord::Base which is defined in the documentation like so:</description>
    </item>
    
    <item>
      <title>Ruby: Intersection/Difference/Concatenation with collections</title>
      <link>https://www.markhneedham.com/blog/2010/09/29/ruby-intersectiondifferenceconcatenation-with-collections/</link>
      <pubDate>Wed, 29 Sep 2010 03:28:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/29/ruby-intersectiondifferenceconcatenation-with-collections/</guid>
      <description>We came across a couple of situations yesterday where we wanted to perform operations on two different arrays.
My immediate thought was that there should be some methods available similar to what we have in C# which Mike Wagg and I spoke about in our talk about using functional programming techniques in C#.
I was expecting to find methods with names indicating the operation they perform but in actual fact the methods are more like operators which makes for code that reads really well.</description>
    </item>
    
    <item>
      <title>FactoryGirl: &#39;has_and_belongs_to_many&#39; associations and the &#39;NoMethodError&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/09/27/factorygirl-has_and_belongs_to_many-associations-and-the-nomethoderror/</link>
      <pubDate>Mon, 27 Sep 2010 14:18:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/27/factorygirl-has_and_belongs_to_many-associations-and-the-nomethoderror/</guid>
      <description>We ran into a somewhat frustrating problem while using Factory Girl to create an object which had a &amp;#39;has_and_belongs_to_many&amp;#39; association with another object.
The relevant code in the two classes was like this…
class Bar &amp;lt; ActiveRecord::Base has_and_belongs_to_many :foos, :class_name =&amp;gt; &amp;#34;Foo&amp;#34;, :join_table =&amp;gt; &amp;#34;bar_foos&amp;#34; end class Foo &amp;lt; ActiveRecord::Base has_many :bars end …​and we originally defined our &amp;#39;Bar&amp;#39; factory like so:
Factory.define :bar do |f| f.association(:foos, :factory =&amp;gt; :foo) end Factory.</description>
    </item>
    
    <item>
      <title>RSpec: Fooled by stub!...with</title>
      <link>https://www.markhneedham.com/blog/2010/09/26/rspec-fooled-by-stub-with/</link>
      <pubDate>Sun, 26 Sep 2010 19:03:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/26/rspec-fooled-by-stub-with/</guid>
      <description>We had an RSpec spec setup roughly like this the other day…​
describe &amp;#34;my stub test&amp;#34; do it &amp;#34;should be amazin&amp;#34; do Mark.stub!(:random).with(&amp;#34;some_wrong_argument&amp;#34;).and_return(&amp;#34;something&amp;#34;) Another.new.a_method end end …​where &amp;#39;Mark&amp;#39; and &amp;#39;Another&amp;#39; were defined like so:
class Mark def self.random(params) &amp;#34;do some amazing stuff&amp;#34; end end class Another def a_method random = Mark.random(&amp;#34;foo&amp;#34;) # use random for something end end When we ran the spec we would get the following error message which was initially a little baffling:</description>
    </item>
    
    <item>
      <title>RSpec: Causing ourselves much pain through &#39;attr&#39; misuse</title>
      <link>https://www.markhneedham.com/blog/2010/09/26/rspec-causing-ourselves-much-pain-through-attr-misuse/</link>
      <pubDate>Sun, 26 Sep 2010 18:57:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/26/rspec-causing-ourselves-much-pain-through-attr-misuse/</guid>
      <description>While testing some code that we were mixing into one of our controllers we made what I thought was an interesting mistake.
The module we wanted to test had some code a bit like this…​
module OurModule def some_method @user = User.find(params[:id]) # in the test code this is always true if @user == user ... end end …and we had the spec setup like so: describe &amp;#39;OurController&amp;#39; do class TestController include OurModule attr_accessor :user end before(:each) do @controller = TestController.</description>
    </item>
    
    <item>
      <title>Ruby: Control flow using &#39;and&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/09/23/ruby-control-flow-using-and/</link>
      <pubDate>Thu, 23 Sep 2010 14:33:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/23/ruby-control-flow-using-and/</guid>
      <description>Something I’ve noticed while reading Ruby code is that quite frequently the flow of a program is controlled by the &amp;#39;chaining&amp;#39; of different operations through use of the &amp;#39;and&amp;#39; keyword.
I’ve noticed that this pattern is used in JavaScript code as well and it’s particularly prevalent when we want to get a status for those operations after they’ve all been executed.
For example we might have the following code…​</description>
    </item>
    
    <item>
      <title>Ruby: Returning hashes using merge! and merge</title>
      <link>https://www.markhneedham.com/blog/2010/09/21/ruby-returning-hashes-using-merge-and-merge/</link>
      <pubDate>Tue, 21 Sep 2010 20:24:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/21/ruby-returning-hashes-using-merge-and-merge/</guid>
      <description>We came across an interesting problem today with some code which was unexpectedly returning nil.
The code that we had looked like this…​
class SomeClass def our_method a_hash = { :a =&amp;gt; 2 } a_hash.merge!({:b =&amp;gt; 3}) unless some_condition.nil? end end …​and we didn’t notice the &amp;#39;unless&amp;#39; statement on the end which meant that if &amp;#39;some_condition&amp;#39; was nil then the return value of the method would be nil.
One way around it is to ensure that we explicitly return a_hash at the end of the method…​</description>
    </item>
    
    <item>
      <title>Learning cycles at an overall project level</title>
      <link>https://www.markhneedham.com/blog/2010/09/20/learning-cycles-at-an-overall-project-level/</link>
      <pubDate>Mon, 20 Sep 2010 18:56:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/20/learning-cycles-at-an-overall-project-level/</guid>
      <description>I was looking back over a post I wrote a couple of years ago where I described some learning cycles that I’d noticed myself going through with respect to code and although at the time I was thinking of those cycles in terms of code I think they are applicable at a project level as well.
The cycles I described were as follows:
Don’t know what is good and what’s bad</description>
    </item>
    
    <item>
      <title>Rails: Faking a delete method with &#39;form_for&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/09/20/rails-faking-a-delete-method-with-form_for/</link>
      <pubDate>Mon, 20 Sep 2010 18:52:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/20/rails-faking-a-delete-method-with-form_for/</guid>
      <description>We recently had a requirement to delete an item based on user input and wanting to adhere to the &amp;#39;RESTful&amp;#39; approach that Rails encourages we therefore needed to fake an HTTP DELETE request.
The documentation talks a little about this:
The Rails framework encourages RESTful design of your applications, which means you’ll be making a lot of “PUT” and “DELETE” requests (besides “GET” and “POST”). However, most browsers don’t support methods other than “GET” and “POST” when it comes to submitting forms.</description>
    </item>
    
    <item>
      <title>Ruby: Random Observations</title>
      <link>https://www.markhneedham.com/blog/2010/09/19/ruby-random-observations/</link>
      <pubDate>Sun, 19 Sep 2010 11:35:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/19/ruby-random-observations/</guid>
      <description>I thought it’d be interesting to write down some of my observations after working with Ruby and Rails for a couple more weeks so here are some more things I’ve come across and others that I’ve got confused with…​
The :: operator (apparently also known as the leading double colon operator)
I came across this while looking at some of the rails_warden code to try to understand how that gem opens the ActionController::Base class to add helper methods to it.</description>
    </item>
    
    <item>
      <title>Ruby: Testing declarative_authorization</title>
      <link>https://www.markhneedham.com/blog/2010/09/17/ruby-testing-declarative_authorization/</link>
      <pubDate>Fri, 17 Sep 2010 19:53:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/17/ruby-testing-declarative_authorization/</guid>
      <description>As I mentioned in a post earlier in the week we’re using the declarative_authorization gem to control access to various parts of our application and as we’ve been migrating parts of the code base over to use that framework one thing we’ve noticed is that there seems to be a diminishing return in how much value we get from writing specs to cover each rule that we create.
We found that while it is possible to write a spec to cover every single rule it sometimes seems like the spec is just duplicating what the rule already describes.</description>
    </item>
    
    <item>
      <title>SICP: Iterative process vs Recursive process functions</title>
      <link>https://www.markhneedham.com/blog/2010/09/16/sicp-iterative-process-vs-recursive-process-functions/</link>
      <pubDate>Thu, 16 Sep 2010 18:48:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/16/sicp-iterative-process-vs-recursive-process-functions/</guid>
      <description>I was working my way through some of the exercises in SICP over the weekend and one that I found particularly interesting was 1.11 where you have to write a function by means of a recursive process and then by means of an iterative process.
A function f is defined by the rule that f(n) = n if n&amp;lt;3 and f(n) = f(n - 1) + 2f(n - 2) + 3f(n - 3) if n&amp;gt;= 3.</description>
    </item>
    
    <item>
      <title>Ruby: Caught out by no type checking</title>
      <link>https://www.markhneedham.com/blog/2010/09/13/ruby-caught-out-by-no-type-checking/</link>
      <pubDate>Mon, 13 Sep 2010 17:44:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/13/ruby-caught-out-by-no-type-checking/</guid>
      <description>I got caught out for a little while today when comparing a value coming into a controller from &amp;#39;params&amp;#39; which we were then comparing with a collection of numbers.
The code was roughly like this…​
class SomeController
  def some_action
    some_collection = [1,2,3,4,5]
    selected_item = some_collection.find { |item| item == params[:id] }
  end
end
…​and since the &amp;#39;id&amp;#39; being passed in was &amp;#39;1&amp;#39; I was expecting that we would have a selected item but we didn’t.</description>
    </item>
    
    <item>
      <title>Ruby: FactoryGirl &amp; declarative_authorization - Random thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/09/12/ruby-factorygirl-declarative_authorization-random-thoughts/</link>
      <pubDate>Sun, 12 Sep 2010 14:25:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/12/ruby-factorygirl-declarative_authorization-random-thoughts/</guid>
      <description>Two other gems that we’re using on my current project are FactoryGirl and declarative_authorization.
We use declarative_authorization for controlling access to various parts of the application and FactoryGirl allows us to build objects for use in our tests.
We wanted to be able to deactivate the authorization when creating test objects because otherwise our tests wouldn’t have permission to create certain objects.
Our original approach was to create a &amp;#39;God&amp;#39; role which we could assign to the &amp;#39;current_user&amp;#39; in our tests therefore allowing us to create whatever objects we wanted.</description>
    </item>
    
    <item>
      <title>Learning: Study habits</title>
      <link>https://www.markhneedham.com/blog/2010/09/12/learning-study-habits/</link>
      <pubDate>Sun, 12 Sep 2010 13:27:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/12/learning-study-habits/</guid>
      <description>I came across an interesting article from the New York Times that Michael Feathers originally linked to on twitter which discusses some of the common ideas that we have about good study habits, pointing out the flaws in them and suggesting alternative approaches.
The author starts out by making some interesting observations about spacing out our learning:
An hour of study tonight, an hour on the weekend, another session a week from now: such so-called spacing improves later recall, without requiring students to put in more overall study effort or pay more attention, dozens of studies have found.</description>
    </item>
    
    <item>
      <title>Rails: Polymorphism through &#39;constantize&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/09/10/rails-polymorphism-through-constantize/</link>
      <pubDate>Fri, 10 Sep 2010 21:26:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/10/rails-polymorphism-through-constantize/</guid>
      <description>One interesting feature of Rails which Shishir pointed out the other day is the ability to take a user provided value and make use of Active Support’s &amp;#39;constantize&amp;#39; method to effectively achieve polymorphism directly from the user’s input.
As an example if we were creating different types of widgets from the same web page we might have several different forms that the user could submit.
We could have a hidden field representing the type of the widget like so:</description>
    </item>
    
    <item>
      <title>Ruby: Checking an array contains an item</title>
      <link>https://www.markhneedham.com/blog/2010/09/08/ruby-checking-an-array-contains-an-item/</link>
      <pubDate>Wed, 08 Sep 2010 18:54:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/08/ruby-checking-an-array-contains-an-item/</guid>
      <description>A couple of times in the past few days I’ve wanted to check if a particular item exists in an array and presumably influenced by working for too long with the .NET/Java APIs I keep expecting there to be a &amp;#39;contains&amp;#39; method that I can call on the array!
More as an attempt to help myself remember than anything else, the method we want is actually called &amp;#39;include?&amp;#39;.
Therefore…​</description>
    </item>
    
    <item>
      <title>jQuery UI Tabs: Changing selected tab</title>
      <link>https://www.markhneedham.com/blog/2010/09/08/jquery-ui-tabs-changing-selected-tab/</link>
      <pubDate>Wed, 08 Sep 2010 18:32:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/08/jquery-ui-tabs-changing-selected-tab/</guid>
      <description>We’re using the tabs part of the jQuery UI library on the project I’m currently working on and one thing we wanted to do was change the default tab that was being selected.
The documentation suggested that one way to do this was to give the index of the tab we wanted selected when calling the tabs function:
$( &amp;#34;.selector&amp;#34; ).tabs({ selected: 3 });
Since we wanted to select the tab by name based on a value from the query string we thought it would probably be simpler if we could just set the selected tab using a css class.</description>
    </item>
    
    <item>
      <title>Ruby: Hash ordering</title>
      <link>https://www.markhneedham.com/blog/2010/09/07/ruby-hash-ordering/</link>
      <pubDate>Tue, 07 Sep 2010 03:52:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/07/ruby-hash-ordering/</guid>
      <description>The application that I’m working on at the moment is deployed into production on JRuby but we also use the C Ruby 1.8.7 interpreter when developing locally since this allows us much quicker feedback.
As a result we sometimes come across interesting differences in the way that the two runtimes work.
One that we noticed yesterday is that if you create a hash, the order of the keys in the hash will be preserved when interpreted on JRuby but not with the C Ruby interpreter.</description>
    </item>
    
    <item>
      <title>Flow in software teams</title>
      <link>https://www.markhneedham.com/blog/2010/09/05/flow-in-software-teams/</link>
      <pubDate>Sun, 05 Sep 2010 17:34:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/05/flow-in-software-teams/</guid>
      <description>My former colleague Greg Gigon has written an interesting blog post where he talks about the pain that we cause ourselves by multi-tasking, a point which Kevin Fox also makes on the Theory of Constraints blog.
I think the overall point that he makes is very true:
We can switch our attention quickly from one task to another. But …​ is it good for our brain? Is it good for the work we are doing?</description>
    </item>
    
    <item>
      <title>Design Simplicity: Partially updating an object</title>
      <link>https://www.markhneedham.com/blog/2010/09/05/design-simplicity-partially-updating-an-object/</link>
      <pubDate>Sun, 05 Sep 2010 17:32:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/09/05/design-simplicity-partially-updating-an-object/</guid>
      <description>One of the most common discussions that I have with my colleagues is around designing bits of code in the simplest way possible.
I’ve never quite been able to put my finger on exactly what makes a design simple and there is frequently disagreement about what is even considered simple.
On the last project I worked on we had an interesting problem where we wanted to partially update different parts of an object from different pages of the application.</description>
    </item>
    
    <item>
      <title>Objective C: Observations</title>
      <link>https://www.markhneedham.com/blog/2010/08/31/objective-c-observations/</link>
      <pubDate>Tue, 31 Aug 2010 18:27:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/31/objective-c-observations/</guid>
      <description>I’ve been playing around with Objective C over the last month or so and although my knowledge of the language is still very much limited I thought it’d be interesting to describe some of the things about the language that I think are quite interesting and others that keep catching me out.
Protocols
I touched on protocols a bit in my first post but they seem like an interesting middle ground between interfaces and duck typing.</description>
    </item>
    
    <item>
      <title>Rails: Populating a dropdown list using &#39;form_for&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/08/31/rails-populating-a-dropdown-list-using-form_for/</link>
      <pubDate>Tue, 31 Aug 2010 01:22:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/31/rails-populating-a-dropdown-list-using-form_for/</guid>
      <description>Last week we were trying to make use of Rails&amp;#39; &amp;#39;form_for&amp;#39; helper to populate a dropdown list with the values of a collection that we’d set to an instance variable in our controller.
My colleague pointed out that we’d need to use &amp;#39;collection_select&amp;#39; in order to do this.
We want to put the values in the &amp;#39;foos&amp;#39; collection onto the page. &amp;#39;foos&amp;#39; is a hash which defines some display values and their corresponding values like so:</description>
    </item>
    
    <item>
      <title>Coding: Mutating parameters</title>
      <link>https://www.markhneedham.com/blog/2010/08/26/coding-mutating-parameters/</link>
      <pubDate>Thu, 26 Aug 2010 07:47:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/26/coding-mutating-parameters/</guid>
      <description>One of the earliest rules of thumb that I was taught by my colleagues is the idea that we should try and avoid mutating/changing values passed into a function as a parameter.
The underlying reason as I understand it is that if you’re just skimming through the code you wouldn’t necessarily expect the values of incoming parameters to be different depending where in the function they’re used.
I think the most dangerous example of this is when we completely change the value of a parameter, like so:</description>
    </item>
    
    <item>
      <title>Ruby: &#39;method_missing&#39; and slightly misled by RubyMine</title>
      <link>https://www.markhneedham.com/blog/2010/08/23/ruby-method_missing-and-slightly-misled-by-rubymine/</link>
      <pubDate>Mon, 23 Aug 2010 21:07:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/23/ruby-method_missing-and-slightly-misled-by-rubymine/</guid>
      <description>Another library that we’re using on my project is ActionMailer and before reading through the documentation I was confused for quite a while with respect to how it actually worked.
We have something similar to the following piece of code…​
Emailer.deliver_some_email …​which when you click its definition in RubyMine takes you to this class definition:
class Emailer &amp;lt; ActionMailer::Base
  def some_email
    recipients &amp;#34;some@email.com&amp;#34;
    from &amp;#34;some_other_email@whatever.com&amp;#34;
    # and so on
  end
end
I initially thought that method was called &amp;#39;deliver_some_email&amp;#39; but having realised that it wasn’t I was led to the &amp;#39;magic&amp;#39; that is &amp;#39;method_missing&amp;#39; on &amp;#39;ActionMailer::Base&amp;#39; which is defined as follows:</description>
    </item>
    
    <item>
      <title>Distributed Agile: Initial observations</title>
      <link>https://www.markhneedham.com/blog/2010/08/23/distributed-agile-initial-observations/</link>
      <pubDate>Mon, 23 Aug 2010 02:52:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/23/distributed-agile-initial-observations/</guid>
      <description>One of the reasons I wanted to come and work for ThoughtWorks in India is that I wanted to see how a distributed agile project is run and see the ways in which it differs to one which is done co-located.
I worked on a project which was distributed between Sydney and Melbourne in 2008/2009 and while some of the challenges seem to be quite similar to the ones we faced there, some are completely different.</description>
    </item>
    
    <item>
      <title>Ruby: Accessing fields</title>
      <link>https://www.markhneedham.com/blog/2010/08/22/ruby-accessing-fields/</link>
      <pubDate>Sun, 22 Aug 2010 18:26:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/22/ruby-accessing-fields/</guid>
      <description>I’ve spent a little time browsing through some of the libraries used by my project and one thing which I noticed in ActiveSupport is that fields don’t seem to be accessed directly but rather are accessed through a method which effectively encapsulates them inside the object.
For example the following function is defined in &amp;#39;inheritable_attributes.rb&amp;#39;
def write_inheritable_attribute(key, value)
  if inheritable_attributes.equal?(EMPTY_INHERITABLE_ATTRIBUTES)
    @inheritable_attributes = {}
  end
  inheritable_attributes[key] = value
end

def inheritable_attributes
  @inheritable_attributes ||= EMPTY_INHERITABLE_ATTRIBUTES
end

EMPTY_INHERITABLE_ATTRIBUTES = {}.</description>
    </item>
    
    <item>
      <title>Ultimate configurability</title>
      <link>https://www.markhneedham.com/blog/2010/08/21/ultimate-configurability/</link>
      <pubDate>Sat, 21 Aug 2010 11:04:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/21/ultimate-configurability/</guid>
      <description>In Continuous Delivery the authors talk about the danger of ultimate configurability…​
Configurable software is not always the cheaper solution it appears to be. It’s almost always better to focus on delivering the high-value functionality with little configuration and then add configuration options later when necessary
…​and from my experience when you take this over-configurability to its logical conclusion you end up developing a framework that can hopefully just be &amp;#39;configured&amp;#39; for any number of &amp;#39;front ends&amp;#39;.</description>
    </item>
    
    <item>
      <title>The fear tax</title>
      <link>https://www.markhneedham.com/blog/2010/08/20/the-fear-tax/</link>
      <pubDate>Fri, 20 Aug 2010 14:14:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/20/the-fear-tax/</guid>
      <description>Seth Godin recently wrote a post about &amp;#39;the fear tax&amp;#39; which he describes as a &amp;#39;tax&amp;#39; that we pay when we do something in order to try and calm our fear about something else but don’t necessarily end up calming those fears.
We pay the fear tax every time we spend time or money seeking reassurance. We pay it twice when the act of seeking that reassurance actually makes us more anxious, not less.</description>
    </item>
    
    <item>
      <title>Database configuration: Just like any other change</title>
      <link>https://www.markhneedham.com/blog/2010/08/18/database-configuration-just-like-any-other-change/</link>
      <pubDate>Wed, 18 Aug 2010 10:07:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/18/database-configuration-just-like-any-other-change/</guid>
      <description>I’ve been flicking through Continuous Delivery and one section early on about changing configuration information in our applications particularly caught my eye:
In our experience, it is an enduring myth that configuration information is somehow less risky to change than source code. Our bet is that, given access to both, we can stop your system at least as easily by changing the configuration as by changing the source code.</description>
    </item>
    
    <item>
      <title>iPad: Getting PragProg books onto the Kindle App</title>
      <link>https://www.markhneedham.com/blog/2010/08/16/ipad-getting-pragprog-books-onto-the-kindle-app/</link>
      <pubDate>Mon, 16 Aug 2010 07:18:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/16/ipad-getting-pragprog-books-onto-the-kindle-app/</guid>
      <description>As I’ve mentioned previously I think the Kindle application on the iPad is the best one for reading books and as a result I wanted to be able to read some books which I’d bought from the PragProg store on it.
The first step is to download the &amp;#39;.mobi&amp;#39; version of the book and use iPhoneExplorer to drag the file into the &amp;#39;Kindle/Documents/eBook&amp;#39; folder on the iPad.</description>
    </item>
    
    <item>
      <title>Creativity - John Cleese</title>
      <link>https://www.markhneedham.com/blog/2010/08/16/creativity-john-cleese/</link>
      <pubDate>Mon, 16 Aug 2010 05:42:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/16/creativity-john-cleese/</guid>
      <description>Jonas Boner recently linked to a really cool (and short) presentation by John Cleese about creativity which I think is very applicable to software development.
Cleese describes some observations he’s made about creativity from his experiences working in comedy. These were some of the key ideas:
Plan to throw one away? Cleese describes a situation where he wrote a script for Fawlty Towers and then lost it. He decided to rewrite it from memory and after he’d done that he found the original.</description>
    </item>
    
    <item>
      <title>Can we always release to production incrementally?</title>
      <link>https://www.markhneedham.com/blog/2010/08/16/can-we-always-release-to-production-incrementally/</link>
      <pubDate>Mon, 16 Aug 2010 04:22:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/16/can-we-always-release-to-production-incrementally/</guid>
      <description>Jez recently linked to a post written by Timothy Fitz about a year ago where he talks about the way his team use continuous deployment which means that every change made to the code base goes into production immediately as long as it passes their test suite.
I’ve become fairly convinced recently that it should always be possible to deploy to production frequently but we recently came across a situation where it seemed like doing that wouldn’t make much sense.</description>
    </item>
    
    <item>
      <title>Objective C: Expected &#39;(&#39; before &#39;Project&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/08/14/objective-c-expected-before-project/</link>
      <pubDate>Sat, 14 Aug 2010 10:33:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/14/objective-c-expected-before-project/</guid>
      <description>A mistake I’ve made more than a few times while declaring headers in Objective C is forgetting to explicitly import the classes used in the interface definition.
I’ve been refactoring some of the code I wrote earlier in the week and wanted to create a &amp;#39;LabelFactory&amp;#39;. I had the following code:
LabelFactory.h
#import &amp;lt;UIKit/UIKit.h&amp;gt;

@interface LabelFactory : NSObject {
}

+ (UILabel*)createLabelFrom:(Project *)project withXCoordinate:(NSInteger)x withYCoordinate:(NSInteger)y;
@end
Which gives this error on compilation:</description>
    </item>
    
    <item>
      <title>Rules of thumb vs Exercise your judgement</title>
      <link>https://www.markhneedham.com/blog/2010/08/13/rules-of-thumb-vs-exercise-your-judgement/</link>
      <pubDate>Fri, 13 Aug 2010 10:05:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/13/rules-of-thumb-vs-exercise-your-judgement/</guid>
      <description>I spent a bit of time working through the first Micro Testing album of the Industrial Logic eLearning suite a few weeks ago and there’s an interesting piece of advice towards the end of the album:
Microtesting is not a formula. It’s a technique. When microtesting rigorously, you will be called constantly to make judgments like these, between one set of names and another, and their corresponding approaches. Remember the judgment premise.</description>
    </item>
    
    <item>
      <title>One idea at a time</title>
      <link>https://www.markhneedham.com/blog/2010/08/12/one-idea-at-a-time/</link>
      <pubDate>Thu, 12 Aug 2010 18:59:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/12/one-idea-at-a-time/</guid>
      <description>One thing I noticed while pairing with some of the ThoughtWorks University guys a few weeks ago is that I had an almost overwhelming urge to show them all sorts of coding techniques that I’ve learned, probably to the point where it’d be more confusing than helpful.
JK pointed out that it’s more effective to bite your tongue and just focus on one idea at a time which is something that the authors of Agile Coaching touch on briefly at the beginning of the book:</description>
    </item>
    
    <item>
      <title>Coding: Using a library/rolling your own</title>
      <link>https://www.markhneedham.com/blog/2010/08/10/coding-using-a-libraryrolling-your-own/</link>
      <pubDate>Tue, 10 Aug 2010 17:25:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/10/coding-using-a-libraryrolling-your-own/</guid>
      <description>One of the things that I’ve noticed as we’ve started writing more client side code is that I’m much more likely to look for a library which solves a problem than I would be with server side code.
A requirement that we’ve had on at least the last 3 or 4 projects I’ve worked on is to do client side validation on the values entered into a form by the user.</description>
    </item>
    
    <item>
      <title>Learning and Situated cognition</title>
      <link>https://www.markhneedham.com/blog/2010/08/10/learning-and-situated-cognition/</link>
      <pubDate>Tue, 10 Aug 2010 03:26:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/10/learning-and-situated-cognition/</guid>
      <description>Sumeet recently blogged about the new style ThoughtWorks University that he and the other trainers have introduced and although I only got to see it in action for a few days it seemed clear to me that it was an improvement on the original version.
The questions being asked, discussions being had and situations that were coming up were pretty much the same as I’ve seen on any software project that I’ve worked on.</description>
    </item>
    
    <item>
      <title>iPad: Redrawing the screen</title>
      <link>https://www.markhneedham.com/blog/2010/08/09/ipad-redrawing-the-screen/</link>
      <pubDate>Mon, 09 Aug 2010 04:38:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/09/ipad-redrawing-the-screen/</guid>
      <description>As I mentioned in a post I wrote last week I’ve been writing a little iPad application to parse a cctray feed and then display the status of the various builds on the screen.
The way I’ve been doing this is by dynamically adding labels to the view and colouring the background of those labels red or green depending on the build status.
FirstViewController.h
@interface FirstViewController : UIViewController { .</description>
    </item>
    
    <item>
      <title>Coding: Tools/Techniques influence the way we work</title>
      <link>https://www.markhneedham.com/blog/2010/08/07/coding-toolstechniques-influence-the-way-we-work/</link>
      <pubDate>Sat, 07 Aug 2010 13:14:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/07/coding-toolstechniques-influence-the-way-we-work/</guid>
      <description>Dave Astels mentions in his BDD paper that the way we use language influences the way that we write code, quoting the Sapir-Whorf hypothesis
“there is a systematic relationship between the grammatical categories of the language a person speaks and how that person both understands the world and behaves in it.”
In a similar way, something which I didn’t fully appreciate until the last project I worked on is how much the tools and techniques that you use can influence the way that you work.</description>
    </item>
    
    <item>
      <title>Objective C: Back to being a novice</title>
      <link>https://www.markhneedham.com/blog/2010/08/06/objective-c-back-to-being-a-novice/</link>
      <pubDate>Fri, 06 Aug 2010 03:59:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/06/objective-c-back-to-being-a-novice/</guid>
      <description>As I mentioned in my previous post about parsing an XML file in Objective C I’m a novice on the Dreyfus Model when it comes to this type of development and I’ve found it interesting that I’ve dropped back into habits from my PHP days when I was first learning how to program.
The big picture
My first instinct after I’d created a project in XCode was to try and understand how an iPad application fits together.</description>
    </item>
    
    <item>
      <title>Objective C: Parsing an XML file</title>
      <link>https://www.markhneedham.com/blog/2010/08/04/objective-c-parsing-an-xml-file/</link>
      <pubDate>Wed, 04 Aug 2010 05:00:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/08/04/objective-c-parsing-an-xml-file/</guid>
      <description>I’ve been wanting to try out some iPad development for a while and as a hello worldish exercise for myself I thought I’d try and work out how to parse the cctray.xml file from Sam Newman’s bigvisiblewall.
Realising that I’m a novice on the Dreyfus Model when it comes to Objective C I started out by following a tutorial from iPhone SDK Articles which explained how to do this.</description>
    </item>
    
    <item>
      <title>The value of naming things</title>
      <link>https://www.markhneedham.com/blog/2010/07/31/the-value-of-naming-things/</link>
      <pubDate>Sat, 31 Jul 2010 07:05:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/31/the-value-of-naming-things/</guid>
      <description>Nikhil and I were discussing some of the ideas around Test Driven Development earlier in the week and at one stage I pointed out that I quite liked Bryan Liles&amp;#39; idea of &amp;#39;make it pass or change the message&amp;#39;.
Bryan suggests that when we have a failing test our next step should be to make that test pass or at least write some code which results in us getting a different error message and hopefully one step closer to making the test pass.</description>
    </item>
    
    <item>
      <title>Kent Beck&#39;s Test Driven Development Screencasts</title>
      <link>https://www.markhneedham.com/blog/2010/07/28/kent-becks-test-driven-development-screencasts/</link>
      <pubDate>Wed, 28 Jul 2010 10:44:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/28/kent-becks-test-driven-development-screencasts/</guid>
      <description>Following the recommendations of Corey Haines, Michael Guterl, James Martin and Michael Hunger I decided to get Kent Beck’s screencasts on Test Driven Development which have been published by the Pragmatic Programmers.
I read Kent’s &amp;#39;Test Driven Development By Example&amp;#39; book a couple of years ago and remember enjoying that so I was intrigued as to what it would be like to see some of those ideas put into practice in real time.</description>
    </item>
    
    <item>
      <title>TDD: Call your shots</title>
      <link>https://www.markhneedham.com/blog/2010/07/28/tdd-call-your-shots/</link>
      <pubDate>Wed, 28 Jul 2010 07:39:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/28/tdd-call-your-shots/</guid>
      <description>One of the other neat ideas I was reminded of when watching Kent Beck’s TDD screencasts is the value of &amp;#39;calling your shots&amp;#39; i.e. writing a test and then saying what’s going to happen when you run that test.
It reminds me of an exercise we used to do in tennis training when I was younger.
The coach would feed the ball to you and just before you hit it you had to say exactly where on the court you were going to place it - cross court/down the line and short/deep.</description>
    </item>
    
    <item>
      <title>TDD: Testing collections</title>
      <link>https://www.markhneedham.com/blog/2010/07/28/tdd-testing-collections/</link>
      <pubDate>Wed, 28 Jul 2010 06:05:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/28/tdd-testing-collections/</guid>
      <description>I’ve been watching Kent Beck’s TDD screencasts and in the 3rd episode he reminded me of a mistake I used to make when I was first learning how to test drive code.
The mistake happens when testing collections and I would write a test which would pass even if the collection had nothing in it.
The code would look something like this:
[Test]
public void SomeTestOfACollection()
{
    var someObject = new Object();
    var aCollection = someObject.</description>
    </item>
    
    <item>
      <title>Agile: Developer attendance at showcases</title>
      <link>https://www.markhneedham.com/blog/2010/07/27/agile-developer-attendance-at-showcases/</link>
      <pubDate>Tue, 27 Jul 2010 07:31:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/27/agile-developer-attendance-at-showcases/</guid>
      <description>On the majority of the projects that I’ve worked on at ThoughtWorks we’ve held a showcase at the end of each iteration to show our client what we’ve been working on and finished over the previous one or two weeks.
The format of these showcases has been fairly similar each time but the people who attended has tended to vary depending on the situation.
As part of the project being worked on at ThoughtWorks University we’ve run a showcase at the end of each week which the whole team have been attending.</description>
    </item>
    
    <item>
      <title>Technical Debt around release time</title>
      <link>https://www.markhneedham.com/blog/2010/07/25/technical-debt-around-release-time/</link>
      <pubDate>Sun, 25 Jul 2010 14:21:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/25/technical-debt-around-release-time/</guid>
      <description>One of the requirements that the ThoughtWorks University grads have been given on the internal project they’re working on is to ensure that they leave the code base in a good state so that the next batch can potentially continue from where they left off.
The application will be deployed on Thursday and this means that a lot of the time this week will be spent refactoring certain areas of the code base rather than only adding new functionality.</description>
    </item>
    
    <item>
      <title>Bundler: Don&#39;t forget to call &#39;source&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/07/25/bundler-dont-forget-to-call-source/</link>
      <pubDate>Sun, 25 Jul 2010 11:48:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/25/bundler-dont-forget-to-call-source/</guid>
      <description>Brian, Tejas and I (well mainly them) have been working on an application to give badges to people based on their GitHub activity at the Yahoo Open Hack Day in Bangalore and we’ve been making use of Bundler to pull in our dependencies.
Our Gemfile was originally like this:
gem &amp;#34;sinatra&amp;#34;, &amp;#34;1.0&amp;#34; gem &amp;#34;haml&amp;#34;, &amp;#34;3.0.13&amp;#34; gem &amp;#34;activesupport&amp;#34;, &amp;#34;3.0.0.beta4&amp;#34;, :require =&amp;gt; false gem &amp;#34;tzinfo&amp;#34;, &amp;#34;0.3.22&amp;#34; gem &amp;#34;nokogiri&amp;#34;, &amp;#34;1.4.2&amp;#34; ... For quite a while we were wondering why &amp;#39;bundle install&amp;#39; wasn’t actually resolving anything at all before we RTFM and realised that we needed to call &amp;#39;source&amp;#39; at the top so that bundler knows where to pull the dependencies from.</description>
    </item>
    
    <item>
      <title>TDD, small steps and no need for comments</title>
      <link>https://www.markhneedham.com/blog/2010/07/23/tdd-small-steps-and-no-need-for-comments/</link>
      <pubDate>Fri, 23 Jul 2010 02:52:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/23/tdd-small-steps-and-no-need-for-comments/</guid>
      <description>I recently came across a blog post written by Matt Ward describing some habits to make you a better coder and while he presented a lot of good ideas I found myself disagreeing with his 2nd tip:
Write Your Logic through Comments When it comes to coding, there are many tenets and ideas I stand by. One of these is that code is 95% logic. Another is that logic doesn&#39;t change when translated from human language into a programming language.</description>
    </item>
    
    <item>
      <title>The prepared mind vs having context when learning new ideas</title>
      <link>https://www.markhneedham.com/blog/2010/07/22/the-prepared-mind-vs-having-context-when-learning-new-ideas/</link>
      <pubDate>Thu, 22 Jul 2010 04:06:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/22/the-prepared-mind-vs-having-context-when-learning-new-ideas/</guid>
      <description>I’m currently working as a trainer for ThoughtWorks University (TWU) and the participants have some Industrial Logic e-learning material to work through before they take part in the 6 week training program.
I’ve been working through the refactoring/code smells courses (https://elearning.industriallogic.com/gh/submit?Action=AlbumContentsAction&amp;album=recognizingSmells&amp;devLanguage=Java) myself and while I’ve been finding it really useful, I think this was partly because I’ve been able to link the material to situations that I’ve seen in code bases that I’ve worked on over the past few years.</description>
    </item>
    
    <item>
      <title>Feedback, the environment and other people</title>
      <link>https://www.markhneedham.com/blog/2010/07/20/feedback-the-environment-and-other-people/</link>
      <pubDate>Tue, 20 Jul 2010 17:30:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/20/feedback-the-environment-and-other-people/</guid>
      <description>Something that I’ve noticed over the last few years is that when people give feedback to each other there is often an over emphasis on the individual and less attention paid to the environment in which they were working.
I covered this a bit in a blog post I wrote about a year ago titled &amp;#39;Challenging projects and the Kubler Ross Grief Cycle&amp;#39; which I converted into a presentation that I gave at XP2010 in June.</description>
    </item>
    
    <item>
      <title>Writing off a badly executed practice</title>
      <link>https://www.markhneedham.com/blog/2010/07/17/writing-off-a-badly-executed-practice/</link>
      <pubDate>Sat, 17 Jul 2010 11:13:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/17/writing-off-a-badly-executed-practice/</guid>
      <description>I recently came across an interesting post about pair programming by Paritosh Ranjan where he outlines some of the problems he’s experienced with this practice.
While some of the points that he raises are certainly valid I think they’re more evidence of pair programming not being done in an effective way rather than a problem with the idea in itself.
To take one example:
Generally people don’t think a lot while pair programming as the person who wants to think about the pros and cons will be considered inefficient (as he will slow down the coding speed).</description>
    </item>
    
    <item>
      <title>TDD: I hate deleting unit tests</title>
      <link>https://www.markhneedham.com/blog/2010/07/15/tdd-i-hate-deleting-unit-tests/</link>
      <pubDate>Thu, 15 Jul 2010 23:15:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/15/tdd-i-hate-deleting-unit-tests/</guid>
      <description>Following on from my post about the value we found in acceptance tests on our project when doing a large scale refactoring I had an interesting discussion with Jak Charlton and Ben Hall about deleting unit tests when they’re no longer needed.
The following is part of our discussion:
Ben:
@JakCharlton @markhneedham a lot (not all) of the unit tests created can be deleted once the acceptance tests are passing.</description>
    </item>
    
    <item>
      <title>Drive - Dan Pink</title>
      <link>https://www.markhneedham.com/blog/2010/07/15/drive-dan-pink/</link>
      <pubDate>Thu, 15 Jul 2010 00:21:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/15/drive-dan-pink/</guid>
      <description>One of the more interesting presentations doing the rounds on twitter and on our internal mailing lists is the following one by Dan Pink titled &#39;Drive - The surprising truth about what motivates us&#39;.
This topic generally interests me anyway but it’s quite intriguing that the research Dan has gathered supports what I imagine many people intrinsically knew.
Incentives The presentation dispels the myth that money always works as a motivator for getting people to do what we want them to do.</description>
    </item>
    
    <item>
      <title>J: Tacit Programming</title>
      <link>https://www.markhneedham.com/blog/2010/07/13/j-tacit-programming/</link>
      <pubDate>Tue, 13 Jul 2010 14:47:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/13/j-tacit-programming/</guid>
      <description>A couple of months ago I wrote about tacit programming with respect to F#, a term which I first came across while reading about the J programming language.
There’s a good introduction to tacit programming on the J website which shows the evolution of a function which originally has several local variables into a state where it has none at all.
I’ve been having a go at writing Roy Osherove’s TDD Kata in J and while I haven’t got very far yet I saw a good opportunity to move the code I’ve written so far into a more tacit style.</description>
    </item>
    
    <item>
      <title>Linchpin: Book Review</title>
      <link>https://www.markhneedham.com/blog/2010/07/12/linchpin-book-review/</link>
      <pubDate>Mon, 12 Jul 2010 16:07:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/12/linchpin-book-review/</guid>
      <description>I’ve read a couple of Seth Godin’s other books - Tribes and The Dip - and found them fairly readable so I figured his latest offering, Linchpin, would probably be worth a read too.
This is the first book that I’ve read on the iPad’s Kindle application and it was a reasonably good reading experience - I particularly like the fact that you can make notes and highlight certain parts of the text.</description>
    </item>
    
    <item>
      <title>The Internet Explorer 6 dilemma</title>
      <link>https://www.markhneedham.com/blog/2010/07/11/the-internet-explorer-6-dilemma/</link>
      <pubDate>Sun, 11 Jul 2010 19:31:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/11/the-internet-explorer-6-dilemma/</guid>
      <description>A couple of weeks ago Dermot and I showcased a piece of functionality that we’d been working on - notably hiding some options in a drop down list.
We showcased this piece of functionality to the rest of the team in Firefox and it all worked correctly.
Our business analyst, who was also acting as QA, then had a look at the story in Internet Explorer 6 and we promptly realised that the way we’d solved the problem didn’t actually work in IE6.</description>
    </item>
    
    <item>
      <title>A new found respect for acceptance tests</title>
      <link>https://www.markhneedham.com/blog/2010/07/11/a-new-found-respect-for-acceptance-tests/</link>
      <pubDate>Sun, 11 Jul 2010 17:08:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/11/a-new-found-respect-for-acceptance-tests/</guid>
      <description>On the project that I’ve been working on over the past few months one of the key benefits of the application was its ability to perform various calculations based on user input.
In order to check that these calculators are producing the correct outputs we created a series of acceptance tests that ran directly against one of the objects in the system.
We did this by defining the inputs and expected outputs for each scenario in an Excel spreadsheet which we converted into a CSV file before reading that into an NUnit test.</description>
    </item>
    
    <item>
      <title>Performance: Do it less or find another way</title>
      <link>https://www.markhneedham.com/blog/2010/07/10/performance-do-it-less-or-find-another-way/</link>
      <pubDate>Sat, 10 Jul 2010 22:49:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/10/performance-do-it-less-or-find-another-way/</guid>
      <description>One thing that we tried to avoid on the project that I’ve been working on is making use of C# expressions trees in production code.
We found that the areas of the code where we compiled these expressions trees frequently showed up as being the least performant areas of the code base when run through a performance profiler.
In a discussion about the ways to improve the performance of an application Christian pointed out that once we’ve identified the area for improvement there are two ways to do this:</description>
    </item>
    
    <item>
      <title>Installing Ruby 1.9.2 with RVM on Snow Leopard</title>
      <link>https://www.markhneedham.com/blog/2010/07/08/installing-ruby-1-9-2-with-rvm-on-snow-leopard/</link>
      <pubDate>Thu, 08 Jul 2010 13:10:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/08/installing-ruby-1-9-2-with-rvm-on-snow-leopard/</guid>
      <description>Yesterday evening I decided to try and upgrade the Ruby installation on my Mac from 1.8.7 to 1.9.2 and went on the yak shaving mission which is doing just that.
RVM seems to be the way to install Ruby these days so I started off by installing that with the following command from the terminal:
bash &amp;lt; &amp;lt;( curl http://rvm.beginrescueend.com/releases/rvm-install-head ) That bit worked fine for me but there are further instructions on the RVM website if that doesn’t work.</description>
    </item>
    
    <item>
      <title>Group feedback</title>
      <link>https://www.markhneedham.com/blog/2010/07/07/group-feedback/</link>
      <pubDate>Wed, 07 Jul 2010 00:17:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/07/group-feedback/</guid>
      <description>On an internal mailing list my colleague David Pattinson recently described a feedback approach he’d used on a project where everyone on the team went into a room and they took turns giving direct feedback to each person.
Since we were finishing the project that we’ve been working on for the past few months, Christian, Dermot and I decided to give it a try last week.
One thing to note is that this feedback wasn’t linked to any performance review, it was just between the 3 of us to allow us to find ways that we can be more effective on projects that we work on in the future.</description>
    </item>
    
    <item>
      <title>The Limited Red Society - Joshua Kerievsky</title>
      <link>https://www.markhneedham.com/blog/2010/07/05/the-limited-red-society-joshua-kerievsky/</link>
      <pubDate>Mon, 05 Jul 2010 15:02:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/05/the-limited-red-society-joshua-kerievsky/</guid>
      <description>I recently watched a presentation given by Joshua Kerievsky from the Lean Software &amp; Systems conference titled &#39;The Limited Red Society&#39; in which he describes an approach to refactoring where we try to minimise the amount of time that the code is in a &#39;red&#39; state.
This means that the code should be compiling and the tests green for as much of this time as possible.
I think it’s very important to follow these principles in order to successfully refactor code on a project team and it’s an approach that my colleague Dave Cameron first introduced me to when we worked together last year.</description>
    </item>
    
    <item>
      <title>Mikado-ish method for debugging</title>
      <link>https://www.markhneedham.com/blog/2010/07/04/mikado-ish-method-for-debugging/</link>
      <pubDate>Sun, 04 Jul 2010 01:20:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/04/mikado-ish-method-for-debugging/</guid>
      <description>I’ve written previously about the Mikado method and how I’ve made use of it for identifying ways in which I could refactor code but I think this approach is more generally applicable for any kind of code investigation.
Our application has a lot of calculations in it and we’ve been trying to refactor the code which wires all the calculators up to make use of a DSL which reveals the intention of the code more as well as making it easier to test.</description>
    </item>
    
    <item>
      <title>Coding: Having the design influenced by the ORM</title>
      <link>https://www.markhneedham.com/blog/2010/07/02/coding-having-the-design-influenced-by-the-orm/</link>
      <pubDate>Fri, 02 Jul 2010 16:56:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/07/02/coding-having-the-design-influenced-by-the-orm/</guid>
      <description>I wrote a few weeks ago about incremental refactoring using a static factory method where we ended up with the following code:
public class LookUpKey { private readonly string param1; private readonly string param2; private readonly string param3; public LookUpKey(string param1, string param2, string param3) { this.param1 = param1; this.param2 = param2; this.param3 = param3; } public static LookUpKey CreateFrom(UserData userData) { var param1 = GetParam1From(userData); var param2 = GetParam2From(userData); var param3 = GetParam3From(userData); return new LookUpKey(param1, param2, param3); } public string Param1Key { get { return param1; } } .</description>
    </item>
    
    <item>
      <title>jQuery: Dynamically updating a drop down list</title>
      <link>https://www.markhneedham.com/blog/2010/06/30/jquery-dynamically-updating-a-drop-down-list/</link>
      <pubDate>Wed, 30 Jun 2010 10:46:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/30/jquery-dynamically-updating-a-drop-down-list/</guid>
      <description>We recently had a requirement to dynamically update a drop down list based on how the user had filled in other parts of the page.
Our initial approach was to populate the drop down with all potential options on page load and then add CSS selectors to the options that we wanted to hide. That worked fine in Chrome and Firefox but Internet Explorer seems to ignore CSS selectors inside a drop down list so none of the options were being hidden.</description>
    </item>
    
    <item>
      <title>NHibernate 2nd level cache: Doing it wrong?</title>
      <link>https://www.markhneedham.com/blog/2010/06/29/nhibernate-2nd-level-cache-doing-it-wrong/</link>
      <pubDate>Tue, 29 Jun 2010 06:45:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/29/nhibernate-2nd-level-cache-doing-it-wrong/</guid>
      <description>I wrote a couple of weeks ago about how we’d been trying to make use of the NHibernate 2nd level cache and we were able to cache our data by following the various posts that I listed.
Unfortunately when we ran some performance tests we found that the performance of the application was significantly worse than when we just wrote our own &amp;#39;cache&amp;#39; - an object which had a dictionary containing the reference data items we’d previously tried to lookup and the appropriate values.</description>
    </item>
    
    <item>
      <title>Intuition and &#39;quit thinking and look&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/06/28/intuition-and-quit-thinking-and-look/</link>
      <pubDate>Mon, 28 Jun 2010 08:39:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/28/intuition-and-quit-thinking-and-look/</guid>
      <description>Something which Dermot, Christian and I noticed last week is that on our project we’ve reached the stage where we intuitively know what the underlying problem is for any given error message in the application we’re working on.
We’re pretty much at the stage where we’re effectively pattern matching what’s going on without needing to think that much anymore.
This is a good thing because it saves a lot of time analysing every single message to try and work out what’s going on - I think this means that we’ve reached a higher level of the Dreyfus model when it comes to this particular situation.</description>
    </item>
    
    <item>
      <title>Is &#39;be the worst&#39; ever limiting?</title>
      <link>https://www.markhneedham.com/blog/2010/06/26/is-be-the-worst-ever-limiting/</link>
      <pubDate>Sat, 26 Jun 2010 10:03:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/26/is-be-the-worst-ever-limiting/</guid>
      <description>One of my favourite patterns from Ade Oshineye and Dave Hoover’s &amp;#39;Apprenticeship Patterns&amp;#39; is &amp;#39;Be the worst&amp;#39; which is described as follows:
Surround yourself with developers who are better than you. Find a stronger team where you are the weakest member and have room to grow. Be the Worst was the seminal pattern of this pattern language. It was lifted from some advice that Pat Metheny offered to young musicians: “Be the worst guy in every band you’re in.</description>
    </item>
    
    <item>
      <title>Mercurial: Only pushing some local changes</title>
      <link>https://www.markhneedham.com/blog/2010/06/25/mercurial-only-pushing-some-local-changes/</link>
      <pubDate>Fri, 25 Jun 2010 23:32:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/25/mercurial-only-pushing-some-local-changes/</guid>
      <description>One problem we’ve come across a few times over the last couple of months while using Mercurial is the situation where we want to quickly commit a local change without committing other local changes that we’ve made.
The example we came across today was where we wanted to make a change to the build file as we’d made a mistake in the target that runs on our continuous integration server and hadn’t noticed for a while during which time we’d accumulated other local changes.</description>
    </item>
    
    <item>
      <title>Leadership and software teams: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/06/22/leadership-and-software-teams-some-thoughts/</link>
      <pubDate>Tue, 22 Jun 2010 22:51:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/22/leadership-and-software-teams-some-thoughts/</guid>
      <description>Roy Osherove wrote a post about a month ago describing the different maturity levels of software teams and the strategies that he uses when leading each of these which I found quite interesting.
He describes the following states of maturity for a team:
Chaotic Stage — the state where a team does not possess the skills, motives or ambition to become a mature self managing team.
Mid-Life stage — where a team possesses some skills for self management and decision making, and can make some of its own decisions without needing a team lead.</description>
    </item>
    
    <item>
      <title>C#: StackTrace</title>
      <link>https://www.markhneedham.com/blog/2010/06/22/c-stacktrace/</link>
      <pubDate>Tue, 22 Jun 2010 22:27:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/22/c-stacktrace/</guid>
      <description>Dermot and I were doing a bit of work on a mini testing DSL that we’ve been writing to try and make some of our interaction tests a bit more explicit and one of the things that we wanted to do was find out which method was being called on one of our collaborators.
We have a stub collaborator which gets injected into our system under test. It looks roughly like this:</description>
    </item>
    
    <item>
      <title>iPad: First thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/06/21/ipad-first-thoughts/</link>
      <pubDate>Mon, 21 Jun 2010 21:30:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/21/ipad-first-thoughts/</guid>
      <description>I’ve had the iPad for about a month now and since my colleagues Martin Fowler, Neal Ford and Chris Stevenson have already previously written about their experiences with it I thought I’d share the way I’m using it as well.
Twitter I follow a lot of people involved in software development on twitter and come across a lot of interesting articles/blogs that people link to or write. A lot of the time I don’t really want to read those posts when I come across them - it would be much better if I could just save them to read later on.</description>
    </item>
    
    <item>
      <title>Coding: Controlled Technical Debt</title>
      <link>https://www.markhneedham.com/blog/2010/06/20/coding-controlled-technical-debt/</link>
      <pubDate>Sun, 20 Jun 2010 22:37:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/20/coding-controlled-technical-debt/</guid>
      <description>A couple of months ago I wrote about an approach to stories that Christian has been encouraging on our project whereby we slim stories down to allow us to deliver the core functionality of the application as quickly as possible.
In our case we had a requirement to setup a range of different parameters used to lookup reference data used in the different calculations that we have in our application.</description>
    </item>
    
    <item>
      <title>Git/Mercurial: Pushing regularly</title>
      <link>https://www.markhneedham.com/blog/2010/06/19/gitmercurial-pushing-regularly/</link>
      <pubDate>Sat, 19 Jun 2010 22:14:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/19/gitmercurial-pushing-regularly/</guid>
      <description>I was reading a recent blog post by Gabriel Schenker where he discusses how his team is making use of Git (http://feedproxy.google.com/r/LosTechies/3/h-tL8ABnNkY/git-and-our-friction-points-and-beginners-mistakes.aspx) and about half way through he says the following:
When using Git as your SCM it is normal to work for quite a while — maybe for a couple of days — in a local branch and without ever pushing the changes to the origin. Usually we only push when a feature is done or a defect is completely resolved.</description>
    </item>
    
    <item>
      <title>Slack time</title>
      <link>https://www.markhneedham.com/blog/2010/06/18/slack-time/</link>
      <pubDate>Fri, 18 Jun 2010 17:36:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/18/slack-time/</guid>
      <description>Ken Schwaber recently wrote a blog post where he compared the differences between the kanban, lean and scrum approaches to software development and although I haven’t had the same experiences as he has with the first two, one interesting thing he implies is that with a scrum approach we have slack time built in.
God help us. People found ways to have slack in waterfall, to rest and be creative.</description>
    </item>
    
    <item>
      <title>Using real life metaphors</title>
      <link>https://www.markhneedham.com/blog/2010/06/17/using-real-life-metaphors/</link>
      <pubDate>Thu, 17 Jun 2010 07:00:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/17/using-real-life-metaphors/</guid>
      <description>My colleague Dermot Kilroy attended the DDD 2010 Exchange in London last week and one of the ideas that he’s been sharing with us from that is that of thinking how the user would solve a given problem without a technological solution i.e. how was something done before computers existed.
This encourages us to take a bigger picture view and can actually lead to a much simpler solution than we’d otherwise come up with.</description>
    </item>
    
    <item>
      <title>Incremental Refactoring: Create factory method</title>
      <link>https://www.markhneedham.com/blog/2010/06/17/incremental-refactoring-create-factory-method/</link>
      <pubDate>Thu, 17 Jun 2010 00:43:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/17/incremental-refactoring-create-factory-method/</guid>
      <description>Dermot and I spent a bit of time today refactoring some code where the logic had ended up in the wrong place.
The code originally looked a bit like this:
public class LookupService { public LookUp Find(UserData userData) { var param1 = GetParam1From(userData); var param2 = GetParam2From(userData); var param3 = GetParam3From(userData); var lookupKey = new LookUpKey(param1, param2, param3); return lookupRepository.Find(lookupKey); } } public class LookUpKey { private readonly string param1; private readonly string param2; private readonly string param3; public LookUpKey(string param1, string param2, string param3) { this.</description>
    </item>
    
    <item>
      <title>Fluent NHibernate and the 2nd level cache</title>
      <link>https://www.markhneedham.com/blog/2010/06/16/fluent-nhibernate-and-the-2nd-level-cache/</link>
      <pubDate>Wed, 16 Jun 2010 00:07:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/16/fluent-nhibernate-and-the-2nd-level-cache/</guid>
      <description>We’ve been trying to cache some objects using NHibernate’s second level cache which always proves to be a trickier task than I remember it being the previous time!
We’re storing some reference data in the database and then using LINQ to NHibernate to query for the specific row that we want based on some user entered criteria.
We can cache that query by calling &amp;#39;SetCacheable&amp;#39; on the &amp;#39;QueryOptions&amp;#39; property of our query:</description>
    </item>
    
    <item>
      <title>Fluent NHibernate: Seeing the mapping files generated</title>
      <link>https://www.markhneedham.com/blog/2010/06/15/fluent-nhibernate-seeing-the-mapping-files-generated/</link>
      <pubDate>Tue, 15 Jun 2010 23:15:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/15/fluent-nhibernate-seeing-the-mapping-files-generated/</guid>
      <description>We’ve been fiddling around with Fluent NHibernate a bit over the last couple of days and one of the things that we wanted to do was output the NHibernate mapping files being generated so we could see if they were as expected.
I couldn’t figure out how to do it but thanks to the help of James Gregory, Andrew Bullock and Matthew Erbs on twitter this is the code that you need in order to do that:</description>
    </item>
    
    <item>
      <title>TDD: Driving from the assertion up</title>
      <link>https://www.markhneedham.com/blog/2010/06/14/tdd-driving-from-the-assertion-up/</link>
      <pubDate>Mon, 14 Jun 2010 22:46:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/14/tdd-driving-from-the-assertion-up/</guid>
      <description>About a year ago I wrote a post about a book club we ran in Sydney covering &amp;#39;The readability of tests&amp;#39; from Steve Freeman and Nat Pryce’s book in which they suggest that their preferred way of writing tests is to drive them from the assertion up:
Write Tests Backwards Although we stick to a canonical format for test code, we don’t necessarily write tests from top to bottom. What we often do is: write the test name, which helps us decide what we want to achieve; write the call to the target code, which is the entry point for the feature; write the expectations and assertions, so we know what effects the feature should have; and, write the setup and teardown to define the context for the test.</description>
    </item>
    
    <item>
      <title>C#: A failed attempt at F#-ish pattern matching</title>
      <link>https://www.markhneedham.com/blog/2010/06/13/c-a-failed-attempt-at-f-ish-pattern-matching/</link>
      <pubDate>Sun, 13 Jun 2010 22:35:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/13/c-a-failed-attempt-at-f-ish-pattern-matching/</guid>
      <description>A few weeks ago we had some C# code around calculations which had got a bit too imperative in nature.
The code looked roughly like this:
public class ACalculator { public double CalculateFrom(UserData userData) { if(userData.Factor1 == Factor1.Option1) { return 1.0; } if(userData.Factor2 == Factor2.Option3) { return 2.0; } if(userData.Factor3 == Factor3.Option2) { return 3.0; } return 0.0; } } I think there should be a more object oriented way to write this code whereby we push some of the logic onto the &#39;UserData&#39; object but it struck me that it reads a little bit like pattern matching code you might see in F#.</description>
    </item>
    
    <item>
      <title>The Refactoring Dilemma</title>
      <link>https://www.markhneedham.com/blog/2010/06/13/the-refactoring-dilemma/</link>
      <pubDate>Sun, 13 Jun 2010 13:37:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/13/the-refactoring-dilemma/</guid>
      <description>On several of the projects that I’ve worked on over the last couple of years we’ve seen the following situation evolve:
The team starts coding the application.
At some stage there is a breakthrough in understanding and a chance to really improve the code.
However the deadline is tight and we wouldn’t see a return within the time left if we refactored the code now
The team keeps on going with the old approach</description>
    </item>
    
    <item>
      <title>Retrospectives: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/06/10/retrospectives-some-thoughts/</link>
      <pubDate>Thu, 10 Jun 2010 07:22:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/10/retrospectives-some-thoughts/</guid>
      <description>I’ve worked on two different teams this year which had quite different approaches to retrospectives.
In the first team we had a retrospective at the beginning of every iteration i.e. once every two weeks and in the second team we tried out the idea of having a rolling retrospective i.e. we put up potential retrospective items on the wall and when there were enough of those we discussed them in the standup.</description>
    </item>
    
    <item>
      <title>XP2010: General thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/06/09/xp2010-general-thoughts/</link>
      <pubDate>Wed, 09 Jun 2010 15:29:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/09/xp2010-general-thoughts/</guid>
<description>I had the chance to attend the XP2010 conference in Trondheim, Norway for a couple of days last week as I was presenting a lightning talk based on a blog post I wrote last year titled &amp;#39;Tough projects and the Kubler Ross Grief Cycle&amp;#39;.
It was interesting to see the way another conference was organised as the only other conference I’ve attended was QCon which is a much more technical conference.</description>
    </item>
    
    <item>
      <title>XP2010: Coding Dojo Open Space</title>
      <link>https://www.markhneedham.com/blog/2010/06/04/xp2010-coding-dojo-open-space/</link>
      <pubDate>Fri, 04 Jun 2010 21:05:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/04/xp2010-coding-dojo-open-space/</guid>
      <description>I attended an open space hosted by Emily Bache at the XP2010 conference in Trondheim, Norway with several other people who have been organising coding dojos around the world.
It was really interesting to hear about some of the different approaches that people have taken and how a lot of the issues we had with the one we used to run in Sydney were the same as what others had experienced.</description>
    </item>
    
    <item>
      <title>Ask for forgiveness, not for permission</title>
      <link>https://www.markhneedham.com/blog/2010/06/04/ask-for-forgiveness-not-for-permission/</link>
      <pubDate>Fri, 04 Jun 2010 21:03:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/06/04/ask-for-forgiveness-not-for-permission/</guid>
<description>I gave a presentation at our ThoughtWorks Brazil office in Porto Alegre last week on some of the things that I’ve learned while working at ThoughtWorks and the first point I made was that it was better to &amp;#39;ask for forgiveness, not for permission&amp;#39;.
This was something that was taught to me a few years ago and the idea behind this is that if there’s some idea we want to try out it makes much more sense to start trying it now and then we can always apologise later on if someone has a problem with us doing that.</description>
    </item>
    
    <item>
      <title>C#: Using a dictionary instead of if statements</title>
      <link>https://www.markhneedham.com/blog/2010/05/30/c-using-a-dictionary-instead-of-if-statements/</link>
      <pubDate>Sun, 30 May 2010 23:13:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/30/c-using-a-dictionary-instead-of-if-statements/</guid>
<description>A problem we had to solve on my current project is how to handle form submission where the user can click on a different button depending on whether they want to go to the previous page, save the form or go to the next page.
An imperative approach to this problem might yield code similar to the following:
public class SomeController { public ActionResult TheAction(string whichButton, UserData userData) { if(whichButton == &amp;#34;Back&amp;#34;) { // do the back action } else if(whichButton == &amp;#34;Next&amp;#34;) { // do the next action } else if(whichButton == &amp;#34;Save&amp;#34;) { // do the save action } throw new Exception(&amp;#34;&amp;#34;); } } A neat design idea which my colleague Dermot Kilroy introduced on our project is the idea of using a dictionary to map to the different actions instead of using if statements.</description>
    </item>
    
    <item>
      <title>Evolving a design: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/05/13/evolving-a-design-some-thoughts/</link>
      <pubDate>Thu, 13 May 2010 07:00:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/13/evolving-a-design-some-thoughts/</guid>
      <description>Phil wrote an interesting post recently about the Ubuntu decision making process with respect to design and suggested that we should look to follow something similar on agile software development teams.
The Ubuntu design process basically comes down to this:
This is not a democracy. Good feedback, good data, are welcome. But we are not voting on design decisions.
Phil suggests the following:
That doesn’t mean that there is an Architect (capital A, please), designing the system for the less-skilled developers to write.</description>
    </item>
    
    <item>
      <title>Agile: Chasing a points total</title>
      <link>https://www.markhneedham.com/blog/2010/05/11/agile-chasing-a-points-total/</link>
      <pubDate>Tue, 11 May 2010 22:28:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/11/agile-chasing-a-points-total/</guid>
      <description>I’ve previously written about the danger of using velocity as a goal but on almost every project I’ve worked on at some stage we do actually end up chasing a points total.
Something I find quite interesting towards the end of an iteration is that if there is a choice of two stories to pick up then the project manager will nearly always press for one which can be completed within the remaining time in order to get the points total for that iteration higher.</description>
    </item>
    
    <item>
      <title>F#: Tacit programming</title>
      <link>https://www.markhneedham.com/blog/2010/05/10/f-tacit-programming/</link>
      <pubDate>Mon, 10 May 2010 23:24:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/10/f-tacit-programming/</guid>
      <description>I recently came across the idea of tacit programming which is described as such:
Tacit programming is a programming paradigm in which a function definition does not include information regarding its arguments, using combinators and function composition (but not λ-abstraction) instead of variables. The simplicity behind this idea allows its use on several programming languages, such as J programming language and APL and especially in stack or concatenative languages, such as PostScript, Forth, Joy or Factor.</description>
    </item>
    
    <item>
      <title>Learnings from my first project of 2010</title>
      <link>https://www.markhneedham.com/blog/2010/05/09/learnings-from-my-first-project-of-2010/</link>
      <pubDate>Sun, 09 May 2010 22:17:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/09/learnings-from-my-first-project-of-2010/</guid>
      <description>Pat Kua recently wrote a retrospective of his time working at ThoughtWorks and since I recently finished the first project I’ve worked on in 2010 I thought it would be interesting to have a look at what I’d learned and observed while working on it.
&amp;#34;Perfect&amp;#34; code I’ve previously believed that driving for the cleanest code with the least duplication and best structured object oriented design was the way to go but on this project we favoured a simpler design which felt quite procedural in comparison to some of the code bases I’ve worked on.</description>
    </item>
    
    <item>
      <title>Coding: Paying attention</title>
      <link>https://www.markhneedham.com/blog/2010/05/09/coding-paying-attention/</link>
      <pubDate>Sun, 09 May 2010 13:04:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/09/coding-paying-attention/</guid>
      <description>Jeremy Miller tweeted earlier in the week about the dangers of using an auto mocking container and how it can encourage sloppy design:
That whole &amp;#34;Auto Mocking Containers encourage sloppy design&amp;#34; meme that I blew off last week? Seeing an example in our code.
I haven’t used an auto mocking container but it seems to me that although that type of tool might be useful for reducing the amount of code we have to write in our tests it also hides the actual problem that we have - an object has too many dependencies.</description>
    </item>
    
    <item>
      <title>F#: My current coding approach</title>
      <link>https://www.markhneedham.com/blog/2010/05/06/f-my-current-coding-approach/</link>
      <pubDate>Thu, 06 May 2010 23:36:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/06/f-my-current-coding-approach/</guid>
<description>I spent a bit of time over the weekend coding a simple generic builder for test objects in F# and I noticed that although there were similarities with the way I drive code in C# or Java, my approach didn’t seem to be exactly the same.
I’ve previously written about the importance of getting quick feedback when programming and how I believe that this can often be achieved faster by using the REPL rather than unit testing.</description>
    </item>
    
    <item>
      <title>Consistency in the code base and incremental refactoring</title>
      <link>https://www.markhneedham.com/blog/2010/05/05/consistency-in-the-code-base-and-incremental-refactoring/</link>
      <pubDate>Wed, 05 May 2010 22:34:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/05/consistency-in-the-code-base-and-incremental-refactoring/</guid>
<description>I wrote a post a while ago about keeping consistency in the code base where I covered some of the reasons that you might want to rewrite parts of a code base and the potential impact of those changes. An interesting side to this discussion which I didn’t cover much, but which seems to play a big part, is the role of incremental refactoring.
In our code base we recently realised that the naming of the fields in some parts of a form don’t really make sense and I wanted to start naming new fields with the new naming style and then go back and change the existing ones incrementally when it was a good time to do so.</description>
    </item>
    
    <item>
      <title>F#: The Kestrel Revisited</title>
      <link>https://www.markhneedham.com/blog/2010/05/04/f-the-kestrel-revisited/</link>
      <pubDate>Tue, 04 May 2010 18:36:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/04/f-the-kestrel-revisited/</guid>
<description>A couple of days ago I wrote about a &amp;#39;returning&amp;#39; function that I’d written to simplify a bit of F# code that I’ve been working on.
It’s defined like so:
let returning t f = f(t); t And can then be used like this:
let build (t:Type) = returning (Activator.CreateInstance(t)) (fun t -&amp;gt; t.GetType().GetProperties() |&amp;gt; Array.iter (fun p -&amp;gt; p.SetValue(t, createValueFor p, null))) While I quite like this function it didn’t quite feel like idiomatic F# to me.</description>
    </item>
    
    <item>
      <title>Coding: Make the mutation obvious</title>
      <link>https://www.markhneedham.com/blog/2010/05/04/coding-make-the-mutation-obvious/</link>
      <pubDate>Tue, 04 May 2010 18:32:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/04/coding-make-the-mutation-obvious/</guid>
      <description>Although I’m generally quite opposed to coding approaches whereby we mutate objects, sometimes the way a framework is designed seems to make this a preferable option.
We came across a situation like this last week when we wanted to hydrate an object with data coming back from the browser.
The signature of the action in question looked like this:
public class SomeController { public ActionResult SomeAction(string id, UserData userData) { } We were able to automatically bind most of the values onto &amp;#39;UserData&amp;#39; except for the &amp;#39;id&amp;#39; which was coming in from the URL.</description>
    </item>
    
    <item>
      <title>Coding: The Kestrel</title>
      <link>https://www.markhneedham.com/blog/2010/05/03/coding-the-kestrel/</link>
      <pubDate>Mon, 03 May 2010 00:28:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/05/03/coding-the-kestrel/</guid>
      <description>Reg Braithwaite has a cool series of posts where he covers the different combinators from Raymond Smullyan’s &amp;#39;To Mock a Mockingbird&amp;#39; book and one of my favourites is the &amp;#39;Kestrel&amp;#39; or &amp;#39;K Combinator&amp;#39; which describes a function that returns a constant function.
It’s described like so:
Kxy = x The Kestrel function would take in 2 arguments and return the value of the first one. The second argument would probably be a function that takes in the first argument and then performs some side effects with that value.</description>
    </item>
    
    <item>
      <title>Coding: Generalising too early</title>
      <link>https://www.markhneedham.com/blog/2010/04/30/coding-generalising-too-early/</link>
      <pubDate>Fri, 30 Apr 2010 07:12:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/30/coding-generalising-too-early/</guid>
      <description>I’ve previously written about the value of adding duplication to code before removing it and we had an interesting situation this week where we failed to do that and ended up generalising a piece of code too early to the point where it actually didn’t solve the problem anymore.
The problem we were trying to solve was around the validation of some dependent fields and to start with we had this requirement:</description>
    </item>
    
    <item>
      <title>QTB: thetrainline.com - &#39;Scale at speed&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/04/29/qtb-thetrainline-com-scale-at-speed/</link>
      <pubDate>Thu, 29 Apr 2010 23:51:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/29/qtb-thetrainline-com-scale-at-speed/</guid>
      <description>About 18 months on from the first ThoughtWorks QTB that I saw about offshoring, on Wednesday night I attended the latest QTB in Manchester titled &amp;#39;thetrainline.com - Scale at speed&amp;#39;.
The presenters were thetrainline.com’s CIO David Jack and the Managing Director of ThoughtWorks India, Mahesh Baxi.
They took us on the journey that thetrainline.com have taken while working with ThoughtWorks to re-architect part of their system to allow them to quickly deliver new functionality on the 2,500 websites that their portal technology powers.</description>
    </item>
    
    <item>
      <title>Listening to your tests: An example</title>
      <link>https://www.markhneedham.com/blog/2010/04/27/listening-to-your-tests-an-example/</link>
      <pubDate>Tue, 27 Apr 2010 22:34:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/27/listening-to-your-tests-an-example/</guid>
      <description>I was recently reading a blog post by Esko Luontola where he talks about the direct and indirect effects of TDD and one particularly interesting point he makes is that driving our code with a TDD approach helps to amplify the problems caused by writing bad code.
if the code is not maintainable, it will be hard to change. Also if the code is not testable, it will be hard to write tests for it.</description>
    </item>
    
    <item>
      <title>Small step refactoring: Overload constructor</title>
      <link>https://www.markhneedham.com/blog/2010/04/25/small-step-refactoring-overload-constructor/</link>
      <pubDate>Sun, 25 Apr 2010 22:48:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/25/small-step-refactoring-overload-constructor/</guid>
      <description>I’ve previously written about some approaches that I’ve been taught with respect to taking small steps when refactoring code and another approach which a couple of colleagues have been using recently is the idea of overloading the constructor when refactoring objects.
On a couple of occasions we’ve been trying to completely change the way an object was designed and changing the current constructor would mean that we’d have to change all the tests against that object before checking if the new design was actually going to work or not.</description>
    </item>
    
    <item>
      <title>Iron Ruby: &#39;unitialized constant...NameError&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/04/25/iron-ruby-unitialized-constant-nameerror/</link>
      <pubDate>Sun, 25 Apr 2010 17:27:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/25/iron-ruby-unitialized-constant-nameerror/</guid>
      <description>I’ve been playing around a bit with Iron Ruby and cucumber following Rupak Ganguly’s tutorial and I tried to change the .NET example provided in the 0.4.2 release of cucumber to call a class wrapping Castle’s WindsorContainer.
The feature file now looks like this:
# &amp;#39;MyAssembly.dll&amp;#39; is in the &amp;#39;C:/Ruby/lib/ruby/gems/1.8/gems/cucumber-0.6.4/examples/cs&amp;#39; folder require &amp;#39;MyAssembly&amp;#39; ... Before do @container = Our::Namespace::OurContainer.new.Container end The class is defined roughly like this:
public class OurContainer : IContainerAccessor { private WindsorContainer container = new WindsorContainer(); public OurContainer() { container.</description>
    </item>
    
    <item>
      <title>Haskell: parse error on input `=&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/04/22/haskell-parse-error-on-input/</link>
      <pubDate>Thu, 22 Apr 2010 23:35:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/22/haskell-parse-error-on-input/</guid>
      <description>I’ve been trying to follow the &amp;#39;Monads for Java/C++ programmers&amp;#39; post in ghci and getting the following type of error when trying out the code snippets:
Prelude&amp;gt; a = 3 &amp;lt;interactive&amp;gt;:1:2: parse error on input `=&amp;#39; I figured there must be something wrong with my installation of the compiler since I was copying and pasting the example across and having this problem. Having reinstalled that, however, I still had the same problem.</description>
    </item>
    
    <item>
      <title>Lured in by the complexity</title>
      <link>https://www.markhneedham.com/blog/2010/04/21/lured-in-by-the-complexity/</link>
      <pubDate>Wed, 21 Apr 2010 07:21:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/21/lured-in-by-the-complexity/</guid>
      <description>We recently ran into an interesting problem when running the website we’re building on our &amp;#39;user replica machine&amp;#39; where you can access the application via a web browser running on Citrix.
The problem we were having was that the result of a Post/Redirect/Get request that we were making via the jQuery Form plugin was failing to update the fragment of the page correctly. It looked like it was replacing it with the original HTML.</description>
    </item>
    
    <item>
      <title>Functional C#: An imperative to declarative example</title>
      <link>https://www.markhneedham.com/blog/2010/04/20/functional-c-an-imperative-to-declarative-example/</link>
      <pubDate>Tue, 20 Apr 2010 07:08:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/20/functional-c-an-imperative-to-declarative-example/</guid>
      <description>I wrote previously about how we’ve been working on some calculations on my current project and one thing we’ve been trying to do is write this code in a fairly declarative way.
Since we’ve been test driving the code it initially started off being quite imperative and looked a bit like this:
public class TheCalculator { ... public double CalculateFrom(UserData userData) { return Calculation1(userData) + Calculation2(userData) + Calculation3(userData); } public double Calculation1(UserData userData) { // do calculation stuff here } public double Calculation2(UserData userData) { // do calculation stuff here } .</description>
    </item>
    
    <item>
      <title>Coding: Another outside in example</title>
      <link>https://www.markhneedham.com/blog/2010/04/18/coding-another-outside-in-example/</link>
      <pubDate>Sun, 18 Apr 2010 22:46:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/18/coding-another-outside-in-example/</guid>
      <description>I’ve written before about my thoughts on outside in development and we came across another example last week where we made our life difficult by not initially following this approach.
The rough design of what we were working on looked like this:
My pair and I were working on the code to do the calculations and we deliberately chose not to drive the functionality from the UI because the other pair were reworking all our validation code and we didn’t want to step on each other’s toes.</description>
    </item>
    
    <item>
      <title>Late integration: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/04/18/late-integration-some-thoughts/</link>
      <pubDate>Sun, 18 Apr 2010 21:19:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/18/late-integration-some-thoughts/</guid>
      <description>John Daniels has an interesting post summarising GOOSgaggle, an event run a few weeks ago where people met up to talk about the ideas in &amp;#39;Growing Object Oriented Software, Guided by Tests&amp;#39;.
It’s an interesting post and towards the end he states the following:
Given these two compelling justifications for starting with end-to-end tests, why is it that many people apparently don’t start there? We came up with two possibilities, although there may be many others:</description>
    </item>
    
    <item>
      <title>Functional C#: Using custom delegates to encapsulate Funcs</title>
      <link>https://www.markhneedham.com/blog/2010/04/17/functional-c-using-custom-delegates-to-encapsulate-funcs/</link>
      <pubDate>Sat, 17 Apr 2010 12:16:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/17/functional-c-using-custom-delegates-to-encapsulate-funcs/</guid>
      <description>One of the problems that I’ve frequently run into when writing C# code in a more functional way is that we can often end up with &amp;#39;Funcs&amp;#39; all over the place which don’t really describe what concept they’re encapsulating.
We had some code similar to this where it wasn’t entirely obvious what the Func being stored in the dictionary was actually doing:
public class Calculator { private Dictionary&amp;lt;string, Func&amp;lt;double, double, double&amp;gt;&amp;gt; lookups = new Dictionary&amp;lt;string, Func&amp;lt;double, double, double&amp;gt;&amp;gt;(); public Calculator() { lookups.</description>
    </item>
    
    <item>
      <title>C#: Java-ish enums</title>
      <link>https://www.markhneedham.com/blog/2010/04/17/c-java-ish-enums/</link>
      <pubDate>Sat, 17 Apr 2010 10:33:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/17/c-java-ish-enums/</guid>
      <description>We’ve been writing quite a bit of code on my current project trying to encapsulate user selected values from drop down menus where we then want to go and look up something in another system based on the value that they select.
Essentially we have the need for some of the things that a Java Enum would give us but which a C# one doesn’t!
Right now we have several classes similar to the following in our code base to achieve this:</description>
    </item>
    
    <item>
      <title>hg: Reverting committed changes</title>
      <link>https://www.markhneedham.com/blog/2010/04/15/hg-reverting-committed-changes/</link>
      <pubDate>Thu, 15 Apr 2010 22:35:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/15/hg-reverting-committed-changes/</guid>
      <description>Continuing with our learning with Mercurial, yesterday we wanted to revert a couple of change sets that we had previously committed and go back to an old version of the code and continue working from there.
As an example, say we wanted to go back to Revision 1 and had the following changes committed:
Revision 3 Revision 2 Revision 1 Revision 0 My original thought was that we could merge revision 1 with the current tip:</description>
    </item>
    
    <item>
      <title>Agile: Slimming down stories</title>
      <link>https://www.markhneedham.com/blog/2010/04/14/agile-slimming-down-stories/</link>
      <pubDate>Wed, 14 Apr 2010 22:53:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/14/agile-slimming-down-stories/</guid>
      <description>On the project I’m currently working on we have several stories around writing the code that does various different calculations based on user input and then shows the results on the screen.
The original assumption on these stories was that we would be looking up the data of the business rules from a local database. The data would be copied across from a central database into that one for this project.</description>
    </item>
    
    <item>
      <title>Maverick: Book review</title>
      <link>https://www.markhneedham.com/blog/2010/04/14/maverick-book-review/</link>
      <pubDate>Wed, 14 Apr 2010 07:23:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/14/maverick-book-review/</guid>
      <description>My colleagues Frankie and Danilo have been recommending &amp;#39;Maverick&amp;#39; to me for a long time and I finally got around to reading it.
In this book Ricardo Semler, the CEO of Semco, tells the story of the company and how he helped evolve the organisation into one which is more employee led and embraces ideas such as open &amp;amp; self-set salaries while encouraging civil disobedience in the workforce as a necessity to alert the organisation to its problems.</description>
    </item>
    
    <item>
      <title>F#: The &#39;defaultArg&#39; function</title>
      <link>https://www.markhneedham.com/blog/2010/04/12/f-the-defaultarg-function/</link>
      <pubDate>Mon, 12 Apr 2010 18:21:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/12/f-the-defaultarg-function/</guid>
      <description>While reading through an old blog post by Matthew Podwysocki about writing F# code in a functional rather than imperative way I came across the &amp;#39;defaultArg&amp;#39; function which I haven’t seen previously.
It’s quite a simple function that we can use when we want to set a default value if an option type has a value of &amp;#39;None&amp;#39;:
The type signature is as follows:
&amp;gt; defaultArg;; val it : (&amp;#39;a option -&amp;gt; &amp;#39;a -&amp;gt; &amp;#39;a) = &amp;lt;fun:clo@0&amp;gt; And the definition is relatively simple:</description>
    </item>
    
    <item>
      <title>Mercurial: Early thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/04/10/mercurial-early-thoughts/</link>
      <pubDate>Sat, 10 Apr 2010 11:43:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/10/mercurial-early-thoughts/</guid>
      <description>We’re using Mercurial as our source control system on the project I’m working on at the moment and since I’ve not yet used a distributed source control system on a team I thought it’d be interesting to note some of my initial thoughts.
One of the neat things about having a local repository and a central one is that you can check in lots of times locally and then push those changes to the central repository when you want everyone else to get the changes that you’ve made.</description>
    </item>
    
    <item>
      <title>Coding: Maybe vs Null Object patterns</title>
      <link>https://www.markhneedham.com/blog/2010/04/10/coding-maybe-vs-null-object-patterns/</link>
      <pubDate>Sat, 10 Apr 2010 11:21:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/10/coding-maybe-vs-null-object-patterns/</guid>
      <description>On the project I’m currently working on my colleague Christian Blunden has introduced a version of the Maybe type into the code base, a concept that originally derives from the world of functional programming.
The code looks a bit like this:
public interface Maybe&amp;lt;T&amp;gt; { bool HasValue(); T Value(); } public class Some&amp;lt;T&amp;gt; : Maybe&amp;lt;T&amp;gt; { private readonly T t; public Some(T t) { this.t = t; } public bool HasValue() { return true; } public T Value() { return t; } } public class None&amp;lt;T&amp;gt; : Maybe&amp;lt;T&amp;gt; { public bool HasValue() { return false; } public T Value() { throw new NotImplementedException(); } } We would then use it in the code like this:</description>
    </item>
    
    <item>
      <title>Coding: FindOrCreateUser and similar methods</title>
      <link>https://www.markhneedham.com/blog/2010/04/09/coding-findorcreateuser-and-similar-methods/</link>
      <pubDate>Fri, 09 Apr 2010 07:09:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/09/coding-findorcreateuser-and-similar-methods/</guid>
      <description>One of the general guidelines that I like to follow when writing methods is trying to ensure that it’s only doing one thing but on several recent projects I’ve noticed us breaking this guideline and it feels like the right thing to do.
The method in question typically takes in some user details, looks up that user in some data store, returns the existing user if there is one and creates a new user if not.</description>
    </item>
    
    <item>
      <title>Velocity as a goal </title>
      <link>https://www.markhneedham.com/blog/2010/04/07/velocity-as-a-goal/</link>
      <pubDate>Wed, 07 Apr 2010 23:36:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/07/velocity-as-a-goal/</guid>
      <description>Grant Joung wrote a post a while ago about velocity goals and whether they’re a good or bad idea, a topic which seems to come up from time to time on agile teams.
My colleague Danilo Sato previously wrote about the dangers of using velocity as a performance measure because it’s something that’s directly within our control and can therefore be gamed:
Value should be measured at the highest level possible, so that it doesn’t fall into one team’s (or individual’s) span of control.</description>
    </item>
    
    <item>
      <title>LDNUG: Mixing functional and object oriented approaches to programming in C#</title>
      <link>https://www.markhneedham.com/blog/2010/04/02/ldnug-mixing-functional-and-object-oriented-approaches-to-programming-in-c/</link>
      <pubDate>Fri, 02 Apr 2010 23:11:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/02/ldnug-mixing-functional-and-object-oriented-approaches-to-programming-in-c/</guid>
      <description>On Wednesday evening my colleague Mike Wagg and I presented a variation of a talk I originally presented at Developer Developer Developer 8 titled &amp;#39;Mixing functional and object oriented approaches to programming in C#&amp;#39; to the London .NET User Group at Skillsmatter.
The slides from the talk are below and there is a video of the talk on the Skillsmatter website.
Mixing functional and object oriented approaches to programming in C#</description>
    </item>
    
    <item>
      <title>How I Learned to Let My Workers Lead</title>
      <link>https://www.markhneedham.com/blog/2010/04/01/how-i-learned-to-let-my-workers-lead/</link>
      <pubDate>Thu, 01 Apr 2010 09:38:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/04/01/how-i-learned-to-let-my-workers-lead/</guid>
      <description>I recently came across a really interesting article written by Ralph Stayer titled &amp;#39;How I Learned to Let My Workers Lead&amp;#39; about his experiences at Johnsonville Foods.
It describes the way that he was able to help change the company culture from one where he made all the decisions and took all responsibility to one where everyone in the company was involved in decision making, resulting in a more successful organisation.</description>
    </item>
    
    <item>
      <title>Saved from an episode of bear shaving</title>
      <link>https://www.markhneedham.com/blog/2010/03/30/saved-from-an-episode-of-bear-shaving/</link>
      <pubDate>Tue, 30 Mar 2010 06:57:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/30/saved-from-an-episode-of-bear-shaving/</guid>
      <description>As part of our continuous integration build we have a step in the build which tears down a Windows service, uninstalls it and then reinstalls it later on from the latest files checked into the repository.
One problem we’ve been having recently is that, despite the fact that it should already have been uninstalled, a lock has been kept on the log4net dll in our build directory, a directory that we tear down as one of the next steps.</description>
    </item>
    
    <item>
      <title>Reading Code: underscore.js</title>
      <link>https://www.markhneedham.com/blog/2010/03/28/reading-code-underscore-js/</link>
      <pubDate>Sun, 28 Mar 2010 20:02:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/28/reading-code-underscore-js/</guid>
      <description>I’ve been spending a bit of time reading through the source code of underscore.js, a JavaScript library that provides lots of functional programming support which my colleague Dave Yeung pointed out to me after reading my post about building a small application with node.js.
I’m still getting used to the way that JavaScript libraries are written but these were some of the interesting things that I got from reading the code:</description>
    </item>
    
    <item>
      <title>Finding the assumptions in stories</title>
      <link>https://www.markhneedham.com/blog/2010/03/26/finding-the-assumptions-in-stories/</link>
      <pubDate>Fri, 26 Mar 2010 01:14:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/26/finding-the-assumptions-in-stories/</guid>
      <description>My colleague J.K. has written an interesting blog post where he describes a slightly different approach that he’s been taking to writing stories to help move the business value in a story towards the beginning of the description and avoid detailing a solution in the &amp;#39;I want&amp;#39; section of the story.
To summarise, J.K.&amp;#39;s current approach involves moving from the traditional story format of:
As I... I want.. So that.</description>
    </item>
    
    <item>
      <title>Selenium, Firefox and HTTPS pages</title>
      <link>https://www.markhneedham.com/blog/2010/03/25/selenium-firefox-and-https-pages/</link>
      <pubDate>Thu, 25 Mar 2010 08:09:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/25/selenium-firefox-and-https-pages/</guid>
      <description>A fairly common scenario that we come across when building automated test suites using Selenium is the need to get past the security exception that Firefox pops up when you try to access a self signed HTTPS page.
Luckily there is quite a cool plugin for Firefox called &amp;#39;Remember Certificate Exception&amp;#39; which automatically clicks through the exception and allows the automated tests to keep running and not get stuck on the certificate exception page.</description>
    </item>
    
    <item>
      <title>TDD: Consistent test structure</title>
      <link>https://www.markhneedham.com/blog/2010/03/24/tdd-consistent-test-structure/</link>
      <pubDate>Wed, 24 Mar 2010 06:53:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/24/tdd-consistent-test-structure/</guid>
      <description>While pairing with Damian we came across the fairly common situation where we’d written two different tests - one to handle the positive case and one the negative case.
While tidying up the tests after we’d got them passing we noticed that the test structure wasn’t exactly the same. The two tests looked a bit like this:
[Test] public void ShouldSetSomethingIfWeHaveAFoo() { var aFoo = FooBuilder.Build.WithBar(&amp;#34;bar&amp;#34;).WithBaz(&amp;#34;baz&amp;#34;).AFoo(); // some random setup // some stubs/expectations var result = new Controller(.</description>
    </item>
    
    <item>
      <title>Defensive Programming and the UI</title>
      <link>https://www.markhneedham.com/blog/2010/03/22/defensive-programming-and-the-ui/</link>
      <pubDate>Mon, 22 Mar 2010 23:42:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/22/defensive-programming-and-the-ui/</guid>
      <description>A few weeks ago I was looking at quite an interesting bug in our system which initially didn’t seem possible.
On one of our screens we have some questions that the user fills in which read a bit like this:
Do you have a foo?
Is your foo an approved foo?
Is your foo special?
i.e. you would only see the 2nd and 3rd questions on the screen if you answered yes to the first question.</description>
    </item>
    
    <item>
      <title>node.js: A little application with Twitter &amp; CouchDB</title>
      <link>https://www.markhneedham.com/blog/2010/03/21/node-js-a-little-application-with-twitter-couchdb/</link>
      <pubDate>Sun, 21 Mar 2010 22:13:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/21/node-js-a-little-application-with-twitter-couchdb/</guid>
      <description>I’ve been continuing to play around with node.js and I thought it would be interesting to write a little application to poll Twitter every minute and save any new Tweets into a CouchDB database.
I first played around with CouchDB in May last year and initially spent a lot of time trying to work out how to install it before coming across CouchDBX which gives you one click installation for Mac OS X.</description>
    </item>
    
    <item>
      <title>TDD: Expressive test names</title>
      <link>https://www.markhneedham.com/blog/2010/03/19/tdd-expressive-test-names/</link>
      <pubDate>Fri, 19 Mar 2010 18:06:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/19/tdd-expressive-test-names/</guid>
      <description>Towards the end of a post I wrote just over a year ago I suggested that I wasn’t really bothered about test names anymore because I could learn what I wanted from reading the test body.
Recently, however, I’ve come across several tests that I wrote previously which were testing the wrong thing and had such generic test names that it wasn’t obvious that it was happening.
The tests in question were around code which partially clones an object but doesn’t copy some fields for various reasons.</description>
    </item>
    
    <item>
      <title>Functional C#: Continuation Passing Style</title>
      <link>https://www.markhneedham.com/blog/2010/03/19/functional-c-continuation-passing-style/</link>
      <pubDate>Fri, 19 Mar 2010 07:48:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/19/functional-c-continuation-passing-style/</guid>
      <description>Partly inspired by my colleague Alex Scordellis&amp;#39; recent post about lambda passing style, I spent some time trying out a continuation passing style on some of the code in one of our controllers to see how different the code would look compared to its current top to bottom imperative style.
We had code similar to the following:
public ActionResult Submit(string id, FormCollection form) { var shoppingBasket = CreateShoppingBasketFrom(id, form); if (!</description>
    </item>
    
    <item>
      <title>Essential and accidental complexity</title>
      <link>https://www.markhneedham.com/blog/2010/03/18/essential-and-accidental-complexity/</link>
      <pubDate>Thu, 18 Mar 2010 23:21:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/18/essential-and-accidental-complexity/</guid>
      <description>I’ve been reading Neal Ford’s series of articles on Evolutionary architecture and emergent design and in the one about &amp;#39;Investigating architecture and design&amp;#39; he discusses Essential and accidental complexity which I’ve previously read about in Neal’s book, &amp;#39;The Productive Programmer&amp;#39;.
Neal defines these terms like so:
Essential complexity is the core of the problem we have to solve, and it consists of the parts of the software that are legitimately difficult problems.</description>
    </item>
    
    <item>
      <title>Parallel Pair Programming</title>
      <link>https://www.markhneedham.com/blog/2010/03/16/parallel-pair-programming/</link>
      <pubDate>Tue, 16 Mar 2010 23:56:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/16/parallel-pair-programming/</guid>
      <description>I’ve spent a bit of time working with Les recently and it’s been quite interesting working out the best way for us to pair together as he’s working as a front end developer on the team which means he’s best utilised working on the CSS/JavaScript/HTML side of things.
Having said that there are often features which require both front end and backend collaboration and we’ve been trying to drive these features from the front end through to the backend rather than working on the backend code separately and then working with Les later on to hook it all up to the frontend.</description>
    </item>
    
    <item>
      <title>node.js: First thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/03/15/node-js-first-thoughts/</link>
      <pubDate>Mon, 15 Mar 2010 00:09:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/15/node-js-first-thoughts/</guid>
      <description>I recently came across node.js via a blog post by Paul Gross and I’ve been playing around with it a bit over the weekend trying to hook up some code to call through to the Twitter API and then return the tweets on my friend timeline.
node.js gives us event driven I/O using JavaScript running server side on top of Google’s V8 JavaScript engine.
Simon Willison has a talk, part of a presentation on slideshare (http://www.slideshare.net/simon/evented-io-based-web-servers-explained-using-bunnies), where he describes the difference between the typical thread per request approach and the event based approach to dealing with web requests using the metaphor of bunnies.</description>
    </item>
    
    <item>
      <title>A reminder of the usefulness of Git</title>
      <link>https://www.markhneedham.com/blog/2010/03/14/a-reminder-of-the-usefulness-of-git/</link>
      <pubDate>Sun, 14 Mar 2010 00:45:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/14/a-reminder-of-the-usefulness-of-git/</guid>
      <description>Despite the fact that none of the projects that I’ve worked on have used Git or Mercurial as the team’s main repository I keep forgetting how useful those tools can be even if they’re just being used locally.
I ran into a problem when trying to work out why a Rhino Mocks expectation wasn’t working as I expected last week having refactored a bit of code to include a constructor.</description>
    </item>
    
    <item>
      <title>Preventing systematic errors: An example</title>
      <link>https://www.markhneedham.com/blog/2010/03/13/preventing-systematic-errors-an-example/</link>
      <pubDate>Sat, 13 Mar 2010 23:26:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/13/preventing-systematic-errors-an-example/</guid>
      <description>James Shore has an interesting recent blog post where he describes some alternatives to over reliance on acceptance testing and one of the ideas that he describes is fixing the process whenever a bug is found in exploratory testing.
He describes two ways of preventing bugs from making it through to exploratory testing:
Make the bug impossible
Catch the bug automatically
Sometimes we can prevent defects by changing the design of our system so that type of defect is impossible.</description>
    </item>
    
    <item>
      <title>Does an organisation need to be fully committed to agile/lean/scrum?</title>
      <link>https://www.markhneedham.com/blog/2010/03/11/does-an-organisation-need-to-be-fully-committed-to-agileleanscrum/</link>
      <pubDate>Thu, 11 Mar 2010 08:05:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/11/does-an-organisation-need-to-be-fully-committed-to-agileleanscrum/</guid>
      <description>Alan Atlas has a recent blog post where he discusses agile, lean and scrum and suggests that you can’t truly achieve agility unless your company is fully committed to it which differs slightly from my experiences.
Alan makes a valid point that we’re not really following an approach just because we use all the practices:
Many people make the mistake of viewing Scrum and Agile and Lean as sets of practices.</description>
    </item>
    
    <item>
      <title>Javascript: Function scoping</title>
      <link>https://www.markhneedham.com/blog/2010/03/10/javascript-function-scoping/</link>
      <pubDate>Wed, 10 Mar 2010 23:06:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/10/javascript-function-scoping/</guid>
      <description>My colleague John Hume wrote an interesting post about his experience with the &amp;#39;const&amp;#39; keyword in ActionScript where he describes the problems with trying to capture a loop variable in a closure and then evaluating it later on in the code.
Since ActionScript and JavaScript are both dialects of ECMAscript, this is a problem in JavaScript as well, and is due to the fact that variables in JavaScript have function scope rather than block scope which is the case in many other languages.</description>
    </item>
    
    <item>
      <title>Pair Programming: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/03/09/pair-programming-some-thoughts/</link>
      <pubDate>Tue, 09 Mar 2010 23:04:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/09/pair-programming-some-thoughts/</guid>
      <description>Mark Wilden pointed me to a post he’s written about his experience pair programming at Pivotal Labs where he makes some interesting although not uncommon observations.
When you pair program, you’re effectively joined at the hip with your pair. You can’t pair if only one of you is there.
I’ve previously written wondering what we should do if our pair isn’t around where I was leaning more towards the opinion that we should try to continue along the same path that we were on when working with our pair if they’re gone for a short amount of time and to find a new pair or work alone if they’re gone for longer.</description>
    </item>
    
    <item>
      <title>Getting real: Book review</title>
      <link>https://www.markhneedham.com/blog/2010/03/08/getting-real-book-review/</link>
      <pubDate>Mon, 08 Mar 2010 21:56:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/08/getting-real-book-review/</guid>
      <description>I recently came across 37 Signals &amp;#39;Getting Real&amp;#39; book where they go through their approach to building web applications and there have certainly been some good reminders and ideas on the best way to do this.
These are some of my favourite parts:
Ship it!
If there are minor bugs, ship it as soon as you have the core scenarios nailed and ship the bug fixes to the web gradually after that.</description>
    </item>
    
    <item>
      <title>Javascript: The &#39;new&#39; keyword</title>
      <link>https://www.markhneedham.com/blog/2010/03/06/javascript-the-new-keyword/</link>
      <pubDate>Sat, 06 Mar 2010 15:16:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/06/javascript-the-new-keyword/</guid>
      <description>I came across an interesting post by John Resig where he describes a &amp;#39;makeClass&amp;#39; function that he uses in his code to create functions which can instantiate objects regardless of whether the user calls that function with or without the new keyword.
The main reason that the new keyword seems to be considered harmful is because we might make assumptions in our function that it will be called with the new keyword which changes the meaning of &amp;#39;this&amp;#39; inside that function.</description>
    </item>
    
    <item>
      <title>Functional C#: Using Join and GroupJoin</title>
      <link>https://www.markhneedham.com/blog/2010/03/04/functional-c-using-join-and-groupjoin/</link>
      <pubDate>Thu, 04 Mar 2010 18:55:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/04/functional-c-using-join-and-groupjoin/</guid>
      <description>An interesting problem which I’ve come across a few times recently is where we have two collections which we want to use together in some way and get a result which could either be another collection or some other value.
In one which Chris and I were playing around with we had a collection of years and a collection of cars with corresponding years and the requirement was to show all the years on the page with the first car we found for that year or an empty value if there was no car for that year.</description>
    </item>
    
    <item>
      <title>Riskiest thing first vs Outside in development</title>
      <link>https://www.markhneedham.com/blog/2010/03/02/riskiest-thing-first-vs-outside-in-development/</link>
      <pubDate>Tue, 02 Mar 2010 22:49:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/02/riskiest-thing-first-vs-outside-in-development/</guid>
      <description>I had an interesting conversation with my colleague David Santoro last week where I described the way that I often pick out the riskiest parts of a story or task and do those first and David pointed out that this approach didn’t seem to fit in with the idea of outside in development.
The idea with outside in development as I understand it is that we would look to drive any new functionality from the UI i.</description>
    </item>
    
    <item>
      <title>A reminder about context switching</title>
      <link>https://www.markhneedham.com/blog/2010/03/01/a-reminder-about-context-switching/</link>
      <pubDate>Mon, 01 Mar 2010 23:12:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/03/01/a-reminder-about-context-switching/</guid>
      <description>I’ve spent most of my time working on agile software development teams over the last few years so for the most part each pair is only working on one story, keeping the work in progress low and allowing them to focus on that piece of work until it’s completed.
My pair and I therefore ended up in a somewhat unusual situation last week where we were attempting to work on three things at the same time and weren’t doing a particularly great job on any of them.</description>
    </item>
    
    <item>
      <title>Javascript: Confusing &#39;call&#39; and &#39;apply&#39;</title>
      <link>https://www.markhneedham.com/blog/2010/02/28/javascript-confusing-call-and-apply/</link>
      <pubDate>Sun, 28 Feb 2010 01:45:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/28/javascript-confusing-call-and-apply/</guid>
      <description>I wrote a couple of weeks ago about using the &amp;#39;call&amp;#39; and &amp;#39;apply&amp;#39; functions in Javascript when passing functions around and while working on our IE6 specific code I realised that I’d got them mixed up.
We were writing some code to override one of our functions so that we could call the original function and then do something else after that.
The code was roughly like this:
Foo = { bar : function(duck) { console.</description>
    </item>
    
    <item>
      <title>Javascript: Isolating browser specific code</title>
      <link>https://www.markhneedham.com/blog/2010/02/28/javascript-isolating-browser-specific-code/</link>
      <pubDate>Sun, 28 Feb 2010 00:11:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/28/javascript-isolating-browser-specific-code/</guid>
      <description>One thing we’ve found on my current project is that despite our best efforts we’ve still ended up with some javascript code which we only want to run if the user is using Internet Explorer 6 and the question then becomes how to write that code so that it doesn’t end up being spread all over the application.
jQuery has some functions which allow you to work out which browser’s being used but I’ve noticed that when we use those you tend to end up with if statements dotted all around the code which isn’t so good.</description>
    </item>
    
    <item>
      <title>Shu Ha Ri harmful?</title>
      <link>https://www.markhneedham.com/blog/2010/02/26/shu-ha-ri-harmful/</link>
      <pubDate>Fri, 26 Feb 2010 23:53:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/26/shu-ha-ri-harmful/</guid>
      <description>I came across a blog post by Rachel Davies where she wonders whether the Shu-Ha-Ri approach to learning/teaching is actually harmful and I found Rachel’s thoughts around the teaching of principles and practices quite interesting.
Quoting Jeff Sutherland:
Only when you have mastered the basic practices are you allowed to improvise. And the last and most important — Before you have gained discipline, centering, and flexibility, you are a hazard to yourself and others.</description>
    </item>
    
    <item>
      <title>Coding: Shared libraries</title>
      <link>https://www.markhneedham.com/blog/2010/02/26/coding-shared-libraries/</link>
      <pubDate>Fri, 26 Feb 2010 00:36:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/26/coding-shared-libraries/</guid>
      <description>On a few projects that I’ve worked on one of the things that we’ve done is create a shared library of objects which can be used across several different projects and while at the time it seemed like a good idea, in hindsight I’m not sure if it’s an entirely successful strategy.
I’m quite a fan of not recreating effort which is generally the goal when trying to pull out common code and within one team this seems to be a good approach the majority of the time.</description>
    </item>
    
    <item>
      <title>Pair Programming: In interviews</title>
      <link>https://www.markhneedham.com/blog/2010/02/25/pair-programming-in-interviews/</link>
      <pubDate>Thu, 25 Feb 2010 08:03:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/25/pair-programming-in-interviews/</guid>
      <description>I came across a couple of quite interesting blog posts recently which suggest a more empirical approach to interviewing, whereby the interview is treated more like an audition for the person being interviewed.
I like this idea and it’s something that we do when recruiting developers in a pair programming interview.
The general idea is that we pair with the candidate as they go through a coding problem.</description>
    </item>
    
    <item>
      <title>Refactoring: Small steps to pull out responsibilities</title>
      <link>https://www.markhneedham.com/blog/2010/02/24/refactoring-small-steps-to-pull-out-responsibilities/</link>
      <pubDate>Wed, 24 Feb 2010 00:45:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/24/refactoring-small-steps-to-pull-out-responsibilities/</guid>
      <description>I wrote previously about how I’ve been using effect sketches to identify responsibilities in objects so that I can pull them out into other objects and once I’ve done this I often find that I can’t see a small next step to take.
At this stage in the past I’ve often then stopped and left the refactoring until I have more time to complete it but this hasn’t really worked and a lot of the time I end up only seeing the code change in my mind and not in the actual code.</description>
    </item>
    
    <item>
      <title>Coding: Effect sketches and the Mikado method</title>
      <link>https://www.markhneedham.com/blog/2010/02/23/coding-effect-sketches-and-the-mikado-method/</link>
      <pubDate>Tue, 23 Feb 2010 00:29:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/23/coding-effect-sketches-and-the-mikado-method/</guid>
      <description>I’ve written previously about how useful I find effect sketches for helping me to understand how an object’s methods and fields fit together and while drawing one a couple of weeks ago I noticed that it’s actually quite useful for seeing which parts of the code will be the easiest to change.
I was fairly sure one of the objects in our code base was doing too many things due to the fact that it had a lot of dependencies.</description>
    </item>
    
    <item>
      <title>Javascript: Bowling Game Kata</title>
      <link>https://www.markhneedham.com/blog/2010/02/22/javascript-bowling-game-kata/</link>
      <pubDate>Mon, 22 Feb 2010 23:14:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/22/javascript-bowling-game-kata/</guid>
      <description>I spent some time over the weekend playing with the bowling game kata in Javascript.
I thought I knew the language well enough to be able to do this kata quite easily so I was quite surprised at how much I struggled initially.
These are some of my observations from this exercise:
I was using screw-unit as my unit testing framework - I originally tried to setup JSTestDriver but I was having problems getting that to work so in the interests of not shaving the yak I decided to go with something I already know how to use.</description>
    </item>
    
    <item>
      <title>C#: Overcomplicating with LINQ</title>
      <link>https://www.markhneedham.com/blog/2010/02/21/c-overcomplicating-with-linq/</link>
      <pubDate>Sun, 21 Feb 2010 12:01:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/21/c-overcomplicating-with-linq/</guid>
      <description>I recently came across an interesting bit of code which was going through a collection of strings and then only taking the first &amp;#39;x&amp;#39; number of characters and discarding the rest.
The code looked roughly like this:
var words = new[] {&amp;#34;hello&amp;#34;, &amp;#34;to&amp;#34;, &amp;#34;the&amp;#34;, &amp;#34;world&amp;#34;}; var newWords = new List&amp;lt;string&amp;gt;(); foreach (string word in words) { if (word.Length &amp;gt; 3) { newWords.Add(word.Substring(0, 3)); continue; } newWords.Add(word); } For this initial collection of words we would expect &amp;#39;newWords&amp;#39; to contain [&amp;#34;hel&amp;#34;, &amp;#34;to&amp;#34;, &amp;#34;the&amp;#34;, &amp;#34;wor&amp;#34;]</description>
    </item>
    
    <item>
      <title>C#: A lack of covariance with generics example</title>
      <link>https://www.markhneedham.com/blog/2010/02/20/c-a-lack-of-covariance-with-generics-example/</link>
      <pubDate>Sat, 20 Feb 2010 12:17:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/20/c-a-lack-of-covariance-with-generics-example/</guid>
      <description>One of the things I find most confusing when reading about programming languages is the idea of covariance and contravariance and while I’ve previously read that covariance is not possible when using generics in C# I recently came across an example where I saw that this was true.
I came across this problem while looking at how to refactor some code which has been written in an imperative style:</description>
    </item>
    
    <item>
      <title>C#: Causing myself pain with LINQ&#39;s delayed evaluation</title>
      <link>https://www.markhneedham.com/blog/2010/02/18/c-causing-myself-pain-with-linqs-delayed-evaluation/</link>
      <pubDate>Thu, 18 Feb 2010 22:28:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/18/c-causing-myself-pain-with-linqs-delayed-evaluation/</guid>
      <description>I recently came across some code which was imperatively looping through a collection and then mapping each value to something else using an injected dependency.
I thought I’d try to make use of functional collection parameters to try and simplify the code a bit but actually ended up breaking one of the tests.
About a month ago I wrote about how I’d written a hand rolled stub to simplify a test and this was actually where I caused myself the problem!</description>
    </item>
    
    <item>
      <title>Rules of Thumb: Don&#39;t use the session</title>
      <link>https://www.markhneedham.com/blog/2010/02/16/rules-of-thumb-dont-use-the-session/</link>
      <pubDate>Tue, 16 Feb 2010 23:19:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/16/rules-of-thumb-dont-use-the-session/</guid>
      <description>A while ago I wrote about some rules of thumb that I’d been taught by my colleagues with respect to software development and I was reminded of one of them - don’t put anything in the session - during a presentation my colleague Luca Grulla gave at our client on scaling applications by making use of the infrastructure of the web.
The problem with putting state in the session is that it means that requests from a specific user have to be tied to a specific server i.</description>
    </item>
    
    <item>
      <title>F#: Passing an argument to a member constraint</title>
      <link>https://www.markhneedham.com/blog/2010/02/15/f-passing-an-argument-to-a-member-constraint/</link>
      <pubDate>Mon, 15 Feb 2010 00:05:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/15/f-passing-an-argument-to-a-member-constraint/</guid>
      <description>I’ve written previously about function overloading in F# and my struggles working out how to do it and last week I came across the concept of inline functions and statically resolved parameters as a potential way to solve that problem.
I came across a problem where I thought I would be able to make use of this while playing around with some code parsing Xml today.
I had a &amp;#39;descendants&amp;#39; function which I wanted to be applicable against &amp;#39;XDocument&amp;#39; and &amp;#39;XElement&amp;#39; so I originally just defined the functions separately forgetting that the compiler wouldn’t allow me to do so as we would have a duplicate definition of the function:</description>
    </item>
    
    <item>
      <title>F#: Unexpected identifier in implementation file</title>
      <link>https://www.markhneedham.com/blog/2010/02/14/f-unexpected-identifier-in-implementation-file/</link>
      <pubDate>Sun, 14 Feb 2010 01:03:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/14/f-unexpected-identifier-in-implementation-file/</guid>
      <description>I’ve been playing around with some F# code this evening and one of the bits of code needs to make a HTTP call and return the result.
I wrote this code and then tried to make use of the &amp;#39;Async.RunSynchronously&amp;#39; function to execute the call.
The code I had looked roughly like this:
namespace Twitter module RetrieveLinks open System.Net open System.IO open System.Web open Microsoft.FSharp.Control let AsyncHttp (url:string) = async { let request = HttpWebRequest.</description>
    </item>
    
    <item>
      <title>Javascript: Some stuff I learnt this week</title>
      <link>https://www.markhneedham.com/blog/2010/02/12/javascript-some-stuff-i-learnt-this-week/</link>
      <pubDate>Fri, 12 Feb 2010 21:11:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/12/javascript-some-stuff-i-learnt-this-week/</guid>
      <description>I already wrote about how I’ve learnt a bit about the &amp;#39;call&amp;#39; and &amp;#39;apply&amp;#39; functions in Javascript this week but as I’ve spent the majority of my time doing front end stuff this week I’ve also learnt and noticed some other things which I thought were quite interesting.
Finding character codes We were doing some testing early in the week where we needed to restrict the characters that could be entered into a text box.</description>
    </item>
    
    <item>
      <title>Javascript: Passing functions around with call and apply</title>
      <link>https://www.markhneedham.com/blog/2010/02/12/javascript-passing-functions-around-with-call-and-apply/</link>
      <pubDate>Fri, 12 Feb 2010 20:18:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/12/javascript-passing-functions-around-with-call-and-apply/</guid>
      <description>Having read Douglas Crockford’s &amp;#39;Javascript: The Good Parts&amp;#39; I was already aware that making use of the &amp;#39;this&amp;#39; keyword in Javascript is quite dangerous but we came across what must be a fairly common situation this week where we wanted to pass around a function which made use of &amp;#39;this&amp;#39; internally.
We were writing some JSTestDriver tests around a piece of code which looked roughly like this:
function Common() { this.</description>
    </item>
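The post above describes passing around a function which uses `this` internally. A minimal TypeScript sketch of that pitfall (the `Common` class and its members here are illustrative, not the post's actual code) shows how `call` and `apply` supply `this` explicitly once a method has been detached from its object:

```typescript
// `describe` relies on `this` being a Common instance; once detached,
// the binding is lost unless we rebind it with call/apply.
class Common {
  prefix = "common";
  describe(suffix: string): string {
    return `${this.prefix}-${suffix}`;
  }
}

const common = new Common();
const detached = common.describe; // `this` is no longer bound here

// call takes arguments individually; apply takes them as an array
const viaCall = detached.call(common, "call");      // -> "common-call"
const viaApply = detached.apply(common, ["apply"]); // -> "common-apply"
```

`Function.prototype.bind` is the other common fix, producing a permanently rebound copy of the function.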
    
    <item>
      <title>F#: Inline functions and statically resolved type parameters</title>
      <link>https://www.markhneedham.com/blog/2010/02/10/f-inline-functions-and-statically-resolved-type-parameters/</link>
      <pubDate>Wed, 10 Feb 2010 23:06:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/10/f-inline-functions-and-statically-resolved-type-parameters/</guid>
      <description>One thing which has often puzzled me when playing around with F# is that when writing the following function, the type is inferred to be &amp;#39;int -&amp;gt; int -&amp;gt; int&amp;#39; rather than allowing any values which can be added together:
let add x y = x + y &amp;gt; val add : int -&amp;gt; int -&amp;gt; int It turns out if you use the &amp;#39;inline&amp;#39; keyword then the compiler does exactly what we want:</description>
    </item>
    
    <item>
      <title>Javascript: File encoding when using string.replace</title>
      <link>https://www.markhneedham.com/blog/2010/02/10/javascript-file-encoding-when-using-string-replace/</link>
      <pubDate>Wed, 10 Feb 2010 00:02:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/10/javascript-file-encoding-when-using-string-replace/</guid>
      <description>We ran into an interesting problem today when moving some Javascript code which was making use of the &amp;#39;string.replace&amp;#39; function to strip out the £ sign from some text boxes on a form.
The code we had written was just doing this:
var textboxValue = $(&amp;#34;#fieldId&amp;#34;).val().replace(/£/, &amp;#39;&amp;#39;); So having realised that we had this code all over the place we decided it would make sense to create a common function that strips the pound sign out.</description>
    </item>
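The excerpt above shows the inline `replace(/£/, '')` call being extracted into a shared helper. A hedged sketch of such a helper (the name `stripPoundSign` is illustrative, not from the post): note the `g` flag, since without it `String.prototype.replace` with a regex removes only the first occurrence of £.

```typescript
// Illustrative shared helper; the global flag removes every £, not just
// the first one, which a bare /£/ would do.
function stripPoundSign(value: string): string {
  return value.replace(/£/g, "");
}

stripPoundSign("£1,000£"); // -> "1,000"
```

The file-encoding issue the post title refers to bites here because the £ literal inside the regex must survive the file's character encoding intact.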
    
    <item>
      <title>Functional C#: Extracting a higher order function with generics</title>
      <link>https://www.markhneedham.com/blog/2010/02/08/functional-c-extracting-a-higher-order-function-with-generics/</link>
      <pubDate>Mon, 08 Feb 2010 23:17:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/08/functional-c-extracting-a-higher-order-function-with-generics/</guid>
      <description>While working on some code with Toni we realised that we’d managed to create two functions that were almost exactly the same except they made different service calls and returned collections of a different type.
The similar functions were like this:
private IEnumerable&amp;lt;Foo&amp;gt; GetFoos(Guid id) { IEnumerable&amp;lt;Foo&amp;gt; foos = new List&amp;lt;Foo&amp;gt;(); try { foos = fooService.GetFoosFor(id); } catch (Exception e) { // do some logging of the exception } return foos; } private IEnumerable&amp;lt;Bar&amp;gt; GetBars(Guid id) { IEnumerable&amp;lt;Bar&amp;gt; bars = new List&amp;lt;Bar&amp;gt;(); try { bars = barService.</description>
    </item>
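The extraction described above (two near-identical C# methods differing only in the service call and element type) translates naturally to a single generic higher-order function. A sketch in TypeScript, under the assumption that the shared shape is "call a service, return an empty collection and log on failure":

```typescript
// One generic helper replaces the duplicated try/catch bodies: the caller
// supplies the service call, the helper supplies the error handling.
function getSafely<T>(serviceCall: () => T[]): T[] {
  try {
    return serviceCall();
  } catch (e) {
    console.error("service call failed", e); // stand-in for the post's logging
    return [];
  }
}

// The two near-duplicate methods collapse into one-liners at the call site:
const foos = getSafely(() => {
  throw new Error("service down");
});
// foos -> []
```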
    
    <item>
      <title>Willed vs Forced designs</title>
      <link>https://www.markhneedham.com/blog/2010/02/08/willed-vs-forced-designs/</link>
      <pubDate>Mon, 08 Feb 2010 22:48:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/08/willed-vs-forced-designs/</guid>
      <description>I came across an interesting post that Roy Osherove wrote a few months ago where he talks about &amp;#39;Willed vs Forced Designs&amp;#39; and some common arguments that people give for not using TypeMock on their projects.
I’m not really a fan of the TypeMock approach to dealing with dependencies in tests because it seems to avoid the fact that the code is probably bad in the first place if we have to resort to using some of the approaches it encourages.</description>
    </item>
    
    <item>
      <title>F#: function keyword</title>
      <link>https://www.markhneedham.com/blog/2010/02/07/f-function-keyword/</link>
      <pubDate>Sun, 07 Feb 2010 02:54:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/07/f-function-keyword/</guid>
      <description>I’ve been browsing through Chris Smith’s Programming F# book and in the chapter on pattern matching he describes the &amp;#39;function&amp;#39; key word which I haven’t used before.
It’s used in pattern matching expressions when we want to match against one of the parameters passed into the function which contains the pattern match.
For example if we have this somewhat contrived example:
let isEven value = match value with | x when (x % 2) = 0 -&amp;gt; true | _ -&amp;gt; false That could be rewritten using the function keyword to the following:</description>
    </item>
    
    <item>
      <title>Functional C#: LINQ vs Method chaining</title>
      <link>https://www.markhneedham.com/blog/2010/02/05/functional-c-linq-vs-method-chaining/</link>
      <pubDate>Fri, 05 Feb 2010 18:06:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/05/functional-c-linq-vs-method-chaining/</guid>
      <description>One of the common discussions that I’ve had with several colleagues when we’re making use of some of the higher order functions that can be applied on collections is whether to use the LINQ style syntax or to chain the different methods together.
I tend to prefer the latter approach although when asked the question after my talk at Developer Developer Developer I didn’t really have a good answer other than to suggest that it seemed to just be a personal preference thing.</description>
    </item>
    
    <item>
      <title>Coding: Wrapping/not wrapping 3rd party libraries and DSLs</title>
      <link>https://www.markhneedham.com/blog/2010/02/02/coding-wrappingnot-wrapping-3rd-party-libraries-and-dsls/</link>
      <pubDate>Tue, 02 Feb 2010 23:54:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/02/coding-wrappingnot-wrapping-3rd-party-libraries-and-dsls/</guid>
      <description>One of the things which Nat Pryce and Steve Freeman suggest in their book Growing Object Oriented Software guided by tests is the idea of wrapping any third party libraries that we use in our own code.
We came across a situation where we did this and then later on I made the mistake of not following this advice.
To start with my colleague David had created a DSL which kept all the calls to Selenium nicely wrapped inside one class.</description>
    </item>
    
    <item>
      <title>Functional C#: Writing a &#39;partition&#39; function</title>
      <link>https://www.markhneedham.com/blog/2010/02/01/functional-c-writing-a-partition-function/</link>
      <pubDate>Mon, 01 Feb 2010 23:34:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/02/01/functional-c-writing-a-partition-function/</guid>
      <description>One of the more interesting higher order functions that I’ve come across while playing with F# is the partition function which is similar to the filter function except it returns the values which meet the predicate passed in as well as the ones which don’t.
I came across an interesting problem recently where we needed to do exactly this and had ended up taking a more imperative &amp;#39;for each&amp;#39; style approach to solve the problem because this function doesn’t exist in C#, as far as I know.</description>
    </item>
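The `partition` function described above (like `filter`, but it also returns the elements that fail the predicate) is short enough to sketch. A minimal TypeScript version, since the post says no built-in exists in C#:

```typescript
// Returns a pair: elements matching the predicate, then the rest.
function partition<T>(items: T[], predicate: (item: T) => boolean): [T[], T[]] {
  const matches: T[] = [];
  const rest: T[] = [];
  for (const item of items) {
    (predicate(item) ? matches : rest).push(item);
  }
  return [matches, rest];
}

const [evens, odds] = partition([1, 2, 3, 4, 5], n => n % 2 === 0);
// evens -> [2, 4], odds -> [1, 3, 5]
```

This is the same shape as F#'s `List.partition`, which the post takes as its starting point.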
    
    <item>
      <title>DDD8: Mixing functional and object oriented approaches to programming in C#</title>
      <link>https://www.markhneedham.com/blog/2010/01/31/ddd8-mixing-functional-and-object-oriented-approaches-to-programming-in-c/</link>
      <pubDate>Sun, 31 Jan 2010 14:05:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/31/ddd8-mixing-functional-and-object-oriented-approaches-to-programming-in-c/</guid>
      <description>I did a presentation titled &amp;#39;Mixing functional and object oriented approaches to programming in C#&amp;#39; at the Developer Developer Developer conference in Reading.
The slides from the talk are below:
Mixing functional and object oriented approaches to programming in C#
I’ve not done many technical talks so far. My only previous attempt was a talk on F# one at the Sydney Alt.NET user group last year so I’m still learning how to do this effectively.</description>
    </item>
    
    <item>
      <title>Book Club: Growing Object Oriented Software - Chapter 7 (Steve Freeman &amp; Nat Pryce)</title>
      <link>https://www.markhneedham.com/blog/2010/01/28/book-club-growing-object-oriented-software-chapter-7-steve-freeman-nat-pryce/</link>
      <pubDate>Thu, 28 Jan 2010 19:13:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/28/book-club-growing-object-oriented-software-chapter-7-steve-freeman-nat-pryce/</guid>
      <description>My colleague David Santoro has started up a technical book club at the client we’re working at in Wales and the book choice for the first session was Chapter 7 - Achieving Object Oriented Design - of Growing Object Oriented Software, guided by tests written by Steve Freeman and Nat Pryce.
In this chapter they cover various approaches for driving towards object oriented code including techniques to find new objects and a detailed description of TDD and how we can approach this in a way that allows us to drive out new behaviour effectively.</description>
    </item>
    
    <item>
      <title>Automapper: Don&#39;t forget Mapper.Reset() at the start</title>
      <link>https://www.markhneedham.com/blog/2010/01/27/automapper-dont-forget-mapper-reset-at-the-start/</link>
      <pubDate>Wed, 27 Jan 2010 07:57:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/27/automapper-dont-forget-mapper-reset-at-the-start/</guid>
      <description>I wrote about my first thoughts using Automapper last week and although I realised that it makes use of the static gateway pattern we ran into a problem where two consecutive calls to a method using AutoMapper always returned the same value for one of the mappings.
The code was roughly like this:
public Bar CreateNewBar(Bar originalBar, string someNewValue) { Mapper.CreateMap&amp;lt;Baz, Baz&amp;gt;() .ForMember(x =&amp;gt; x.Id, opts =&amp;gt; opts.Ignore()) .ForMember(x =&amp;gt; x.</description>
    </item>
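The static gateway problem the post runs into can be illustrated without AutoMapper itself. This is not AutoMapper's real API, just a toy static registry showing why configuration from one call leaks into the next unless something resets the shared state:

```typescript
// Toy static gateway: all state lives on the class, not on instances,
// so it persists across otherwise unrelated calls.
class Mapper {
  private static overrides = new Map<string, unknown>();
  static set(key: string, value: unknown): void {
    Mapper.overrides.set(key, value);
  }
  static get(key: string): unknown {
    return Mapper.overrides.get(key);
  }
  static reset(): void {
    Mapper.overrides.clear();
  }
}

Mapper.set("someNewValue", "first call");
// A second, unrelated call still sees the first call's configuration:
Mapper.get("someNewValue"); // -> "first call"
Mapper.reset();             // clearing the static state avoids the leak
Mapper.get("someNewValue"); // -> undefined
```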
    
    <item>
      <title>TDD: Rewriting/refactoring tests</title>
      <link>https://www.markhneedham.com/blog/2010/01/25/tdd-rewritingrefactoring-tests/</link>
      <pubDate>Mon, 25 Jan 2010 22:06:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/25/tdd-rewritingrefactoring-tests/</guid>
      <description>I’ve read several times about the dangers of the big rewrite when it comes to production code but I’ve recently been wondering whether or not we should apply the same rules when it comes to test code.
I worked with Raphael Speyer for a few weeks last year and on the code base we were working on he often spent some time rewriting tests originally written using rMock to use mockito which was the framework we were driving towards.</description>
    </item>
    
    <item>
      <title>TDD: Simplifying a test with a hand rolled stub</title>
      <link>https://www.markhneedham.com/blog/2010/01/25/tdd-simplifying-a-test-with-a-hand-rolled-stub/</link>
      <pubDate>Mon, 25 Jan 2010 21:23:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/25/tdd-simplifying-a-test-with-a-hand-rolled-stub/</guid>
      <description>I wrote a couple of weeks ago about my thoughts on hand written stubs vs framework generated stubs and I noticed an interesting situation where it helped me out while trying to simplify some test code.
The code in question was making use of several framework generated stubs/mocks and one in particular was trying to return different values depending on the value passed as a parameter.
The test was failing and I spent about half an hour unsuccessfully trying to work out why it wasn’t working as expected before I decided to replace it with a hand rolled stub that did exactly what I wanted.</description>
    </item>
    
    <item>
      <title>TDD: Removing the clutter</title>
      <link>https://www.markhneedham.com/blog/2010/01/24/tdd-removing-the-clutter/</link>
      <pubDate>Sun, 24 Jan 2010 01:13:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/24/tdd-removing-the-clutter/</guid>
      <description>I got the chance to work with Phil for a couple of weeks last year and one of the most interesting things that he started teaching me was the importance of reducing the clutter in our tests and ensuring that we take some time to refactor them as well as the code as part of the &amp;#39;red-green-refactor&amp;#39; cycle.
I’m still trying to work out the best way to do this but I came across a really interesting post by J.</description>
    </item>
    
    <item>
      <title>Coding: The collecting parameter pattern</title>
      <link>https://www.markhneedham.com/blog/2010/01/23/coding-the-collecting-parameter-pattern/</link>
      <pubDate>Sat, 23 Jan 2010 14:45:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/23/coding-the-collecting-parameter-pattern/</guid>
      <description>The collecting parameter pattern is one of my favourite ones when used well but I’ve noticed recently that it can lead to quite misleading APIs as well.
One way that we used it quite effectively was when getting objects to render themselves to a ViewData container which was then used to populate the view.
public class Micro { private string micro; public Micro(string micro) { this.micro = micro; } public void renderTo(ViewData viewData) { viewData.</description>
    </item>
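The `renderTo` fragment in the excerpt above shows the shape of the collecting parameter pattern: the caller owns a container, and each object renders itself into it. A TypeScript sketch of the same idea (the original is C#; `ViewData` here is just a `Map` stand-in):

```typescript
// The caller passes in the container and keeps it; each object adds its
// own contribution rather than returning a value to be merged.
type ViewData = Map<string, string>;

class Micro {
  constructor(private micro: string) {}

  renderTo(viewData: ViewData): void {
    viewData.set("micro", this.micro); // collects into the shared parameter
  }
}

const viewData: ViewData = new Map();
new Micro("hello").renderTo(viewData);
// viewData.get("micro") -> "hello"
```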
    
    <item>
      <title>Automapper: First thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/01/22/automapper-first-thoughts/</link>
      <pubDate>Fri, 22 Jan 2010 23:21:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/22/automapper-first-thoughts/</guid>
      <description>I came across Jimmy Bogard’s Automapper library a while ago but hadn’t had the opportunity to try it out on a project until this week.
The problem we wanted to solve was relatively simple.
We had a domain object and we wanted to create a copy of that with one of the fields changed and all of the ids cleared from the object and any objects contained within it so that we could persist the new web of objects to the database.</description>
    </item>
    
    <item>
      <title>Functional collectional parameters: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/01/20/functional-collectional-parameters-some-thoughts/</link>
      <pubDate>Wed, 20 Jan 2010 22:45:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/20/functional-collectional-parameters-some-thoughts/</guid>
      <description>I’ve been reading through a bit of Steve Freeman and Nat Pryce’s &amp;#39;Growing Object Oriented Software guided by tests&amp;#39; book and I found the following observation in chapter 7 quite interesting:
When starting a new area of code, we might temporarily suspend our design judgment and just write code without attempting to impose much structure.
It’s interesting that they don’t try and write perfect code the first time around which is actually something I thought experienced developers did until I came across Uncle Bob’s Clean Code book where he suggested something similar.</description>
    </item>
    
    <item>
      <title>Strategic Design (Responsibility Traps) - Eric Evans</title>
      <link>https://www.markhneedham.com/blog/2010/01/18/strategic-design-responsibility-traps-eric-evans/</link>
      <pubDate>Mon, 18 Jan 2010 22:52:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/18/strategic-design-responsibility-traps-eric-evans/</guid>
      <description>Reading through some of Simon Harris&amp;#39; blog entries I came across his thoughts on a presentation Eric Evans did at QCon titled &amp;#39;Strategic Design - Responsibility Traps&amp;#39; which seems to cover a lot of the ground from the second half of Domain Driven Design and more.
In the presentation Evans makes some really insightful comments and points out a lot of mistakes that I’ve made on projects. It certainly serves as a reminder to go back and read part 4 of the book again and really understand the material from that section.</description>
    </item>
    
    <item>
      <title>Coding: Missing abstractions and LINQ</title>
      <link>https://www.markhneedham.com/blog/2010/01/17/coding-missing-abstractions-and-linq/</link>
      <pubDate>Sun, 17 Jan 2010 19:09:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/17/coding-missing-abstractions-and-linq/</guid>
      <description>Something which I’ve noticed quite a lot on the projects that I’ve worked on since C# 3.0 was released is that lists seem to be passed around code much more and have LINQ style filters and transformations performed on them while failing to describe the underlying abstraction explicitly in the code.
As a result of this we quite frequently end up with this code being in multiple places, and since it’s usually not very much code the repetition goes unnoticed more than other types of duplication might.</description>
    </item>
    
    <item>
      <title>Nant: Populating templates</title>
      <link>https://www.markhneedham.com/blog/2010/01/16/nant-populating-templates/</link>
      <pubDate>Sat, 16 Jan 2010 00:13:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/16/nant-populating-templates/</guid>
      <description>One of the common tasks that we need to do on every project I’ve worked on is ensure that we can create a web.config file for the different environments that we need to deploy our application to.
Nant has quite a neat task called &amp;#39;expandproperties&amp;#39; which allows us to do this quite easily.
In our build file we would have the following:
build-file.build
&amp;lt;property name =&amp;#34;configFile&amp;#34; value=&amp;#34;${environment}.properties&amp;#34; readonly=&amp;#34;true&amp;#34;/&amp;gt; &amp;lt;if test=&amp;#34;${not file::exists(configFile)}&amp;#34;&amp;gt; &amp;lt;fail message=&amp;#34;Configuration file &amp;#39;${configFile}&amp;#39; could not be found.</description>
    </item>
    
    <item>
      <title>C#: A functional solution to a modeling problem</title>
      <link>https://www.markhneedham.com/blog/2010/01/15/c-a-functional-solutional-to-a-modeling-problem/</link>
      <pubDate>Fri, 15 Jan 2010 23:23:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/15/c-a-functional-solutional-to-a-modeling-problem/</guid>
      <description>We were working on some refactoring today where we pushed some logic back from a service and onto a domain object and I noticed that we were able to use functions quite effectively to reduce the amount of code we had to write while still describing differences in behaviour.
The class we want to write needs to take in two integers which represent two different situations related to Foo. Depending upon whether we have &amp;#39;Situation 1&amp;#39;, &amp;#39;Situation 2&amp;#39; or both situations we will display the results slightly differently.</description>
    </item>
    
    <item>
      <title>TDD: Thoughts on using a clock in tests</title>
      <link>https://www.markhneedham.com/blog/2010/01/15/tdd-thoughts-on-using-a-clock-in-tests/</link>
      <pubDate>Fri, 15 Jan 2010 21:56:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/15/tdd-thoughts-on-using-a-clock-in-tests/</guid>
      <description>A few months ago Uncle Bob wrote a post about TDD where he suggested that he preferred to use hand created stubs in his tests wherever possible and only resorted to using a Mockito created stub as a last resort.
I wrote previously about my thoughts of where to use each of the two approaches and one example of where hand written stubs seems to make sense is the clock.</description>
    </item>
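The clock is the canonical case for a hand-rolled stub: production code depends on a small clock abstraction, and tests substitute one frozen at a known instant. A TypeScript sketch of that idea (the interface and names are illustrative):

```typescript
// Production code asks the Clock for the time instead of reading it directly,
// so tests can substitute a deterministic implementation.
interface Clock {
  now(): Date;
}

const systemClock: Clock = { now: () => new Date() };

class StubClock implements Clock {
  constructor(private fixed: Date) {}
  now(): Date {
    return this.fixed;
  }
}

function isExpired(deadline: Date, clock: Clock): boolean {
  return clock.now().getTime() > deadline.getTime();
}

const frozen = new StubClock(new Date("2010-01-15T00:00:00Z"));
isExpired(new Date("2010-01-01T00:00:00Z"), frozen); // -> true, deterministically
```

The stub is simple enough that a mocking framework adds little here, which is the point the post draws from Uncle Bob's argument.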
    
    <item>
      <title>TDD: Hand written stubs vs Framework generated stubs</title>
      <link>https://www.markhneedham.com/blog/2010/01/15/tdd-hand-written-stubs-vs-framework-generated-stubs/</link>
      <pubDate>Fri, 15 Jan 2010 21:44:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/15/tdd-hand-written-stubs-vs-framework-generated-stubs/</guid>
      <description>A few months ago Uncle Bob wrote a post about TDD where he suggested that he preferred to use hand created stubs in his tests wherever possible and only resorted to using a Mockito created stub as a last resort.
I’ve tended to use framework created ones but my colleague Matt Dunn and I noticed that it didn’t seem to work out too well for us writing some tests around a controller where the majority of our tests were making exactly the same call to that repository and expected to receive the same return value but a few select edge cases expected something different.</description>
    </item>
    
    <item>
      <title>F#: Refactoring to sequence/for expressions</title>
      <link>https://www.markhneedham.com/blog/2010/01/14/f-refactoring-to-sequencefor-expressions/</link>
      <pubDate>Thu, 14 Jan 2010 08:01:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/14/f-refactoring-to-sequencefor-expressions/</guid>
      <description>Since I started playing around with F# one of the things I’ve been trying to do is not use the &amp;#39;for&amp;#39; keyword because I was trying to avoid writing code in an imperative way and for loops are a big part of this for me.
Having read Jon Harrop’s solution to the word count problem where he made use of both sequence and for expressions I thought it’d be interesting to see what some of the code I’ve written would look like using that approach.</description>
    </item>
    
    <item>
      <title>C# Test Builder Pattern: My current thinking</title>
      <link>https://www.markhneedham.com/blog/2010/01/13/c-test-builder-pattern-my-current-thinking/</link>
      <pubDate>Wed, 13 Jan 2010 01:37:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/13/c-test-builder-pattern-my-current-thinking/</guid>
      <description>I’ve written previously about the test builder pattern in C# and having noticed some different implementations of this pattern I thought it’d be interesting to post my current thinking on how to use it.
One thing I’ve noticed is that we often end up just creating methods which effectively act as setters rather than easing the construction of an object.
This seems to happen most commonly when the value we want to set is a boolean value.</description>
    </item>
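The post above contrasts builder methods that merely act as setters with ones that ease construction and reveal intent. A hedged TypeScript sketch of the test data builder pattern (`Foo` and its fields are made up for illustration):

```typescript
// Builder methods are named for the scenario they set up, not the field
// they poke; build() produces the finished object.
class FooBuilder {
  private amount = 0;
  private active = false;

  withAmount(amount: number): FooBuilder {
    this.amount = amount;
    return this; // returning `this` enables chaining
  }

  thatIsActive(): FooBuilder {
    // intention-revealing, rather than a bare setActive(true)
    this.active = true;
    return this;
  }

  build(): { amount: number; active: boolean } {
    return { amount: this.amount, active: this.active };
  }
}

const foo = new FooBuilder().withAmount(100).thatIsActive().build();
// foo -> { amount: 100, active: true }
```

The boolean case the post mentions is exactly where `thatIsActive()` beats a setter-style `withActive(true)`.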
    
    <item>
      <title>F#: Refactoring to pattern matching</title>
      <link>https://www.markhneedham.com/blog/2010/01/12/f-refactoring-to-pattern-matching/</link>
      <pubDate>Tue, 12 Jan 2010 01:33:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/12/f-refactoring-to-pattern-matching/</guid>
      <description>I was looking through some of the F# code I’ve written recently and I realised that I was very much writing C# in F# with respect to the number of if statements I’ve been using.
I thought it would be interesting to see what the code would look like if I was able to refactor some of that code to make use of pattern matching instead which would be a more idiomatic way of solving the problem in F#.</description>
    </item>
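The refactoring described above replaces chains of if statements with pattern matching. TypeScript has no F#-style `match`, but a discriminated union plus an exhaustive `switch` gives a rough analogue of the same move (the `Shape` type here is an invented example, not the post's code):

```typescript
// One exhaustive case analysis over a tagged union, instead of a chain of
// if statements inspecting the value's shape.
type Shape =
  | { kind: "circle"; radius: number }
  | { kind: "square"; side: number };

function area(shape: Shape): number {
  switch (shape.kind) {
    case "circle":
      return Math.PI * shape.radius ** 2;
    case "square":
      return shape.side ** 2;
  }
}

area({ kind: "square", side: 3 }); // -> 9
```

The compiler checks the `switch` covers every `kind`, which is the exhaustiveness guarantee F# pattern matching provides.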
    
    <item>
      <title>C# Object Initializer: More thoughts</title>
      <link>https://www.markhneedham.com/blog/2010/01/10/c-object-initializer-more-thoughts/</link>
      <pubDate>Sun, 10 Jan 2010 18:52:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/10/c-object-initializer-more-thoughts/</guid>
      <description>I wrote previously about my dislike of C#&amp;#39;s object initializer syntax and while I still think those arguments hold I came across an interesting argument for why it is a useful feature in Jeremy Miller’s MSDN article on creating internal DSLs in C#.
In the article Jeremy works through an example where he builds up a &amp;#39;SendMessageRequest&amp;#39; first by using a fluent interface and then by making use of object initializer syntax.</description>
    </item>
    
    <item>
      <title>Roy Osherove&#39;s TDD Kata: An F# attempt</title>
      <link>https://www.markhneedham.com/blog/2010/01/10/roy-osheroves-tdd-kata-an-f-attempt/</link>
      <pubDate>Sun, 10 Jan 2010 01:46:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/10/roy-osheroves-tdd-kata-an-f-attempt/</guid>
      <description>As I’ve mentioned in a few of my recent posts I’ve been having another go at Roy Osherove’s TDD Kata but this time in F#.
One thing I’ve been struggling with when coding in F# is working out how many intermediate variables we actually need. They can be useful for expressing intent better but they’re clutter in a way.
I’ve included my solution at the end, and in the active pattern which determines whether or not we have a custom delimiter defined in our input string, I can’t decide whether to create a value to represent the expressions that determine that.</description>
    </item>
    
    <item>
      <title>F#: Refactoring to active patterns</title>
      <link>https://www.markhneedham.com/blog/2010/01/07/f-refactoring-to-active-patterns/</link>
      <pubDate>Thu, 07 Jan 2010 23:31:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/07/f-refactoring-to-active-patterns/</guid>
      <description>I’ve been playing around with more F# code and after realising that I’d peppered the code with if statements I thought it would be interesting to try and refactor it to make use of active patterns.
The code is part of my F# solution to Roy Osherove’s TDD Kata and is used to parse the input string and find which delimiters are being used.
This is the original code:</description>
    </item>
    
    <item>
      <title>TDD: Hungarian notation for mocks/stubs</title>
      <link>https://www.markhneedham.com/blog/2010/01/06/tdd-hungarian-notation-for-mocksstubs/</link>
      <pubDate>Wed, 06 Jan 2010 00:08:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/06/tdd-hungarian-notation-for-mocksstubs/</guid>
      <description>A fairly common discussion that I’ve had with several of my colleagues is around the way that we name the variables used for mocks and stubs in our tests.
There seems to be about a 50/50 split between including &amp;#39;Stub&amp;#39; or &amp;#39;Mock&amp;#39; on the end of those variable names and not doing so.
In a simple example test using Rhino Mocks as the testing framework this would be the contrast between the two approaches:</description>
    </item>
    
    <item>
      <title>F#: String.Split with a multi character delimeter</title>
      <link>https://www.markhneedham.com/blog/2010/01/05/f-string-split-with-a-multi-character-delimeter/</link>
      <pubDate>Tue, 05 Jan 2010 23:10:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/05/f-string-split-with-a-multi-character-delimeter/</guid>
      <description>In my continued efforts at Roy Osherove’s TDD Kata I’ve been trying to work out how to split a string based on a delimiter which contains more than one character.
My original thinking was that it should be possible to do so like this:
&amp;#34;1***2&amp;#34;.Split(&amp;#34;***&amp;#34;.ToCharArray());; I didn’t realise that splitting the string like that splits on each of the stars individually which means that we end up getting 2 empty values in the result:</description>
    </item>
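The gotcha described above can be sketched in TypeScript: splitting on the individual characters of `"***"` (the analogue of .NET's char-array `Split`) leaves empty entries between adjacent stars, whereas splitting on the whole substring does what was intended.

```typescript
const input = "1***2";

// Analogue of Split("***".ToCharArray()): splits at every single '*',
// so the gaps between adjacent stars become empty strings.
const byChars = input.split("*"); // -> ["1", "", "", "2"]

// Splitting on the whole delimiter string avoids the empty values.
const bySubstring = input.split("***"); // -> ["1", "2"]
```

In .NET the equivalent fix is the `String.Split` overload that takes a string array plus `StringSplitOptions`, rather than a char array.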
    
    <item>
      <title>F#: Expressing intent and the forward/application operators</title>
      <link>https://www.markhneedham.com/blog/2010/01/04/f-expressing-intent-and-the-forwardapplication-operators/</link>
      <pubDate>Mon, 04 Jan 2010 11:11:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/04/f-expressing-intent-and-the-forwardapplication-operators/</guid>
      <description>A while ago I wrote about F#&amp;#39;s forward and application operators where I’d looked at how these could be used to simplify code and while trying out Roy Osherove’s TDD Kata I realised that perhaps the choice of which of these to use or whether to use them at all depends on what intent we’re expressing.
The specific bit of code I was writing was for raising an exception if negative values were provided and I originally thought I’d use the forward operator to express this code:</description>
    </item>
    
    <item>
      <title>The Last Lecture - Randy Pausch</title>
      <link>https://www.markhneedham.com/blog/2010/01/01/the-last-lecture-randy-pausch/</link>
      <pubDate>Fri, 01 Jan 2010 14:32:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2010/01/01/the-last-lecture-randy-pausch/</guid>
      <description>I recently watched Randy Pausch’s &amp;#39;Last Lecture: Achieving Your Childhood Dreams&amp;#39; and read the corresponding book and although it’s not directly related to software development I think that some of the points that he makes are really intriguing.
These were some of the parts that particularly stood out for me:
Introduce the elephant in the room - whatever it is that people are really thinking about, put it out in the open.</description>
    </item>
    
    <item>
      <title>OOP: Behavioural and Structural constraints</title>
      <link>https://www.markhneedham.com/blog/2009/12/31/oop-behavioural-and-structural-constraints/</link>
      <pubDate>Thu, 31 Dec 2009 16:08:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/31/oop-behavioural-and-structural-constraints/</guid>
      <description>A few months ago I wrote a post describing how we should test the behaviour of code rather than the implementation whereby we would write tests against the public API of an object rather than exposing other internal data of the object and testing against that directly.
While I still think this is a useful way of testing code I didn’t really have a good definition for what makes that a test of an object’s behaviour.</description>
    </item>
    
    <item>
      <title>Roy Osherove&#39;s TDD Kata: My first attempt</title>
      <link>https://www.markhneedham.com/blog/2009/12/25/roy-osheroves-tdd-kata-my-first-attempt/</link>
      <pubDate>Fri, 25 Dec 2009 22:25:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/25/roy-osheroves-tdd-kata-my-first-attempt/</guid>
      <description>I recently came across Roy Osherove’s commentary on Corey Haines&amp;#39; attempt at Roy’s TDD Kata so I thought I’d try it out in C#.
Andrew Woodward has recorded his version of the kata where he avoids using the mouse for the whole exercise so I tried to avoid using the mouse as well and it was surprisingly difficult!
I’ve only done the first part of the exercise so far which is as follows:</description>
    </item>
    
    <item>
      <title>Debug It: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/12/24/debug-it-book-review/</link>
      <pubDate>Thu, 24 Dec 2009 05:26:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/24/debug-it-book-review/</guid>
      <description>David Agans&amp;#39; &amp;#39;Debugging&amp;#39; is the best debugging book that I’ve read so I was intrigued to see that there was another book being written on the subject.
Paul Butcher offered me a copy of the book to review so I was keen to see whether it was more like &amp;#39;Debugging&amp;#39; or &amp;#39;Release It&amp;#39; as Ted Neward suggests.
The Book Debug It by Paul Butcher
The Review Much like Krzysztof Kozmic I found that a lot of the ideas early on in the book were similar to what I’ve been taught by my ThoughtWorks colleagues over the last 3 1/2 years.</description>
    </item>
    
    <item>
      <title>Duke Nukem Forever &amp; Reworking code</title>
      <link>https://www.markhneedham.com/blog/2009/12/23/duke-nukem-forever-reworking-code/</link>
      <pubDate>Wed, 23 Dec 2009 07:27:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/23/duke-nukem-forever-reworking-code/</guid>
      <description>Cosmin Stejerean linked to a really interesting article on wired.com which tells the story of how the team behind Duke Nukem Forever failed over 12 years to ship the game, eventually giving up.
Phil has written a post about the article from the angle of his experience working with these types of companies and working out how to get something into production but as I read this article it seemed to have some relation to reworking code and why/how we approach this.</description>
    </item>
    
    <item>
      <title>One change at a time</title>
      <link>https://www.markhneedham.com/blog/2009/12/22/one-change-at-a-time/</link>
      <pubDate>Tue, 22 Dec 2009 06:01:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/22/one-change-at-a-time/</guid>
      <description>I’m reading through Paul Butcher’s &amp;#39;Debug It&amp;#39; book and one of his suggestions when trying to diagnose a problem in our code is to only change one thing at a time.
In a way this might seem fairly obvious but I’ve certainly fallen into the trap of making multiple changes at the same time in the mistaken belief that it’ll lead to the problem being solved more quickly.
When making changes to code Butcher has the following piece of advice which I quite like:</description>
    </item>
    
    <item>
      <title>F#: Word Count using a Dictionary</title>
      <link>https://www.markhneedham.com/blog/2009/12/20/f-word-count-using-a-dictionary/</link>
      <pubDate>Sun, 20 Dec 2009 10:09:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/20/f-word-count-using-a-dictionary/</guid>
      <description>Having spent some time unsuccessfully trying to make my F# attempt at the word count problem work I decided to follow the lead of the other examples I’ve read and make use of a Dictionary to keep count of the words.
I originally thought that I might be having a problem with the downloading of the files and storing of those strings in memory so I tried to change that bit of code to be lazily evaluated:</description>
    </item>
    
    <item>
      <title>Book Club: Working Effectively With Legacy Code - Chapters 12 &amp; 13 (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/12/20/book-club-working-effectively-with-legacy-code-chapters-12-13-michael-feathers/</link>
      <pubDate>Sun, 20 Dec 2009 03:52:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/20/book-club-working-effectively-with-legacy-code-chapters-12-13-michael-feathers/</guid>
      <description>In the last Sydney book club that I attended before I moved back to the UK we discussed Chapters 12 and 13 of Michael Feathers&amp;#39; &amp;#39;Working Effectively With Legacy Code&amp;#39;
Liz has taken over the summarising of the book club now that I’m not there so if you want to keep on reading about the book club Liz’s blog is the place to go!
Chapter 12 - I Need to Make Many Changes in One Area.</description>
    </item>
    
    <item>
      <title>F#: The use keyword and using function</title>
      <link>https://www.markhneedham.com/blog/2009/12/19/f-the-use-keyword-and-using-function/</link>
      <pubDate>Sat, 19 Dec 2009 10:33:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/19/f-the-use-keyword-and-using-function/</guid>
      <description>While I was playing around with the little F# script that I wrote to try and solve the word count problem I noticed that in a couple of places I had used the &amp;#39;use&amp;#39; keyword when dealing with resources that needed to be released when they’d been used.
Using the &amp;#39;use&amp;#39; keyword means that the &amp;#39;Dispose&amp;#39; method will be called on the resource when it goes out of scope.</description>
    </item>
    
    <item>
      <title>You and Your Research - Richard Hamming</title>
      <link>https://www.markhneedham.com/blog/2009/12/19/you-and-your-research-richard-hamming/</link>
      <pubDate>Sat, 19 Dec 2009 02:52:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/19/you-and-your-research-richard-hamming/</guid>
      <description>Another paper that I read on my Sydney to London flight was one titled &amp;#39;You and Your Research&amp;#39; by Richard Hamming.
It’s a transcript of a talk that Richard Hamming gave to Bellcore employees at the Morris Research and Engineering Centre in 1986.
The talk is aimed at computer science researchers and Hamming describes ways for them to do the best research that they can. I think several of the ideas in the talk relate to software development as well.</description>
    </item>
    
    <item>
      <title>Coding: An outside in observation</title>
      <link>https://www.markhneedham.com/blog/2009/12/19/coding-an-outside-in-observation/</link>
      <pubDate>Sat, 19 Dec 2009 00:55:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/19/coding-an-outside-in-observation/</guid>
      <description>I’ve been reading Michi Henning’s post on API design and one thing which he points out is that it’s important to drive the design of an API based on the way that it will be used by its clients:
A great way to get usable APIs is to let the customer (namely, the caller) write the function signature, and to give that signature to a programmer to implement. This step alone eliminates at least half of poor APIs: too often, the implementers of APIs never use their own creations, with disastrous consequences for usability</description>
    </item>
    
    <item>
      <title>F#: Word Count - A somewhat failed attempt</title>
      <link>https://www.markhneedham.com/blog/2009/12/18/f-word-count-a-somewhat-failed-attempt/</link>
      <pubDate>Fri, 18 Dec 2009 02:58:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/18/f-word-count-a-somewhat-failed-attempt/</guid>
      <description>I came across Zach Cox’s word count problem via Sam Aaron and Ola Bini’s twitter streams and I thought it’d be interesting to try it out in F# to see what the solution would be like.
The solution needs to count word frequencies from a selection of newsgroup articles.
I wanted to see if it was possible to write it in F# without using a map to keep track of how many of each word had been found.</description>
    </item>
    
    <item>
      <title>Coding: Naming</title>
      <link>https://www.markhneedham.com/blog/2009/12/16/coding-naming/</link>
      <pubDate>Wed, 16 Dec 2009 22:08:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/16/coding-naming/</guid>
      <description>Sarah Taraporewalla recently wrote an interesting post about the importance of words with respect to the way that we use them in our code and it reminded me of some conversations I’ve had with Dave Cameron about the importance of creating a shared understanding of the different types/objects in the systems that we build.
On a few projects that I’ve worked on where we didn’t have a common understanding of what different concepts in the domain should be I noticed that there was a reluctance to make changes to class names.</description>
    </item>
    
    <item>
      <title>The Computer Scientist as Toolsmith - Fred Brooks</title>
      <link>https://www.markhneedham.com/blog/2009/12/16/the-computer-scientist-as-toolsmith-fred-brooks/</link>
      <pubDate>Wed, 16 Dec 2009 06:15:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/16/the-computer-scientist-as-toolsmith-fred-brooks/</guid>
      <description>I’ve come across a couple of posts recently talking about the gender-specificity of the term &amp;#39;Software Craftsman&amp;#39; and Victoria suggests that the term &amp;#39;Codesmith&amp;#39; would be a more appropriate name to use.
I’m not that bothered what the name is but I was reading the transcript of Fred Brooks&amp;#39; acceptance speech for winning the ACM Allen Newell Award in 1994 titled &amp;#39;The Computer Scientist as Toolsmith&amp;#39; which has some interesting ideas about what our role should be.</description>
    </item>
    
    <item>
      <title>Coding: The little details all add to our understanding</title>
      <link>https://www.markhneedham.com/blog/2009/12/15/coding-the-little-details-all-add-to-our-understanding/</link>
      <pubDate>Tue, 15 Dec 2009 08:09:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/15/coding-the-little-details-all-add-to-our-understanding/</guid>
      <description>I’ve been watching an interesting presentation by Scott Hanselman titled &amp;#39;Information Overload and Managing the Flow&amp;#39; from OreDev where he covers various strategies to allow us to be more productive in the face of the huge amounts of information constantly threatening to overwhelm us.
One interesting suggestion he has around 37 minutes in is that when learning a new language it might be a good idea to contact someone who’s an expert in that language and get some framing knowledge on the type of stuff that’s worth learning and what we might not bother with.</description>
    </item>
    
    <item>
      <title>TDD: Only mock types you own</title>
      <link>https://www.markhneedham.com/blog/2009/12/13/tdd-only-mock-types-you-own/</link>
      <pubDate>Sun, 13 Dec 2009 21:47:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/13/tdd-only-mock-types-you-own/</guid>
      <description>Liz recently posted about mock objects and the original &amp;#39;mock roles, not objects&amp;#39; paper and one thing that stood out for me is the idea that we should only mock types that we own.
I think this is quite an important guideline to follow otherwise we can end up in a world of pain.
One area which seems particularly vulnerable to this type of thing is when it comes to testing code which interacts with Hibernate.</description>
    </item>
    
    <item>
      <title>Clojure: My first attempt at a macro</title>
      <link>https://www.markhneedham.com/blog/2009/12/12/clojure-my-first-attempt-at-a-macro/</link>
      <pubDate>Sat, 12 Dec 2009 03:53:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/12/clojure-my-first-attempt-at-a-macro/</guid>
      <description>I’m up to the chapter on using macros in Stuart Halloway’s &amp;#39;Programming Clojure&amp;#39; book and since I’ve never used a language with macros before I thought it’d be cool to write one.
In reality there’s no reason to create a macro to do what I want to do but I wanted to keep the example simple so I could try and understand exactly how macros work.
I want to create a macro which takes in one argument and then prints hello and the person’s name.</description>
    </item>
    
    <item>
      <title>Clojure: Forgetting the brackets</title>
      <link>https://www.markhneedham.com/blog/2009/12/12/clojure-forgetting-the-brackets/</link>
      <pubDate>Sat, 12 Dec 2009 03:51:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/12/clojure-forgetting-the-brackets/</guid>
      <description>I’ve been playing around with macros over the last few days and while writing a simple one forgot to include the brackets to make it evaluate correctly:
(defmacro say-hello [person] println &amp;#34;Hello&amp;#34; person) This macro doesn’t even expand like I thought it would:
user=&amp;gt; (macroexpand-1 &amp;#39;(say-hello blah)) blah That seemed a bit strange to me but I eventually realised that I’d missed off the brackets around &amp;#39;println&amp;#39; and the arguments following it which would have resulted in &amp;#39;println&amp;#39; being evaluated with those arguments.</description>
    </item>
    
    <item>
      <title>TDD: Big leaps and small steps</title>
      <link>https://www.markhneedham.com/blog/2009/12/10/tdd-big-leaps-and-small-steps/</link>
      <pubDate>Thu, 10 Dec 2009 22:14:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/10/tdd-big-leaps-and-small-steps/</guid>
      <description>About a month ago or so Gary Bernhardt wrote a post showing how to get started with TDD and while the post is quite interesting, several comments on the post pointed out that he had jumped from iteratively solving the problem straight to the solution with his final step.
Something which I’ve noticed while solving algorithmic problems in a couple of different functional programming languages is that the test driven approach doesn’t work so well for these types of problems.</description>
    </item>
    
    <item>
      <title>Haskell vs F#: Function composition</title>
      <link>https://www.markhneedham.com/blog/2009/12/09/haskell-vs-f-function-composition/</link>
      <pubDate>Wed, 09 Dec 2009 22:10:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/09/haskell-vs-f-function-composition/</guid>
      <description>I’m reading through John Hughes&amp;#39; &amp;#39;Why functional programming matters&amp;#39; paper and one thing I’ve come across which is a bit counter intuitive to me is the Haskell function composition operator.
I’ve written previously about F#&amp;#39;s function composition operator which is defined as follows:
let inline (&amp;gt;&amp;gt;) f g x = g(f x) To write a function which doubled all the values in a list and then returned the odd values we’d do this:</description>
    </item>
    
    <item>
      <title>Clojure: when-let macro</title>
      <link>https://www.markhneedham.com/blog/2009/12/09/clojure-when-let-macro/</link>
      <pubDate>Wed, 09 Dec 2009 02:41:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/09/clojure-when-let-macro/</guid>
      <description>In my continued playing around with Clojure I came across the &amp;#39;when-let&amp;#39; macro.
&amp;#39;when-let&amp;#39; is used when we want to bind an expression to a symbol and only execute the body provided as the second argument to the macro if that symbol evaluates to true.
As I wrote previously, a value of &amp;#39;false&amp;#39; or &amp;#39;nil&amp;#39; would result in the second argument not being evaluated.
A simple example of using &amp;#39;when-let&amp;#39; would be:</description>
    </item>
    
    <item>
      <title>Our obsession with efficiency - Dan North</title>
      <link>https://www.markhneedham.com/blog/2009/12/07/our-obsession-with-efficiency-dan-north/</link>
      <pubDate>Mon, 07 Dec 2009 17:05:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/07/our-obsession-with-efficiency-dan-north/</guid>
      <description>Oredev have put some of the videos from the conference on Vimeo and one of my favourites is &amp;#39;Our obsession with efficiency&amp;#39; by my colleague Dan North.
The slides for the talk are available on SlideShare.
In this talk Dan leads with the following statement about efficiency:
So here’s the thing, I don’t believe in efficiency. It’s our obsession with efficiency that has got us into the current technology mess, and which has led almost directly to heavy waterfall processes.</description>
    </item>
    
    <item>
      <title>Clojure: Unit testing in the REPL</title>
      <link>https://www.markhneedham.com/blog/2009/12/06/clojure-unit-testing-in-the-repl/</link>
      <pubDate>Sun, 06 Dec 2009 03:28:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/06/clojure-unit-testing-in-the-repl/</guid>
      <description>One thing which I think is great about coding with F# is the quick feedback that we can get by defining and then testing out functions in the REPL.
We can do the same thing in Clojure but it’s even better because we can also define and run unit tests which I think is pretty neat.
Nurullah Akkaya has a good post which describes how to use clojure.test, a testing framework written by Stuart Sierra so I’ve been using that to define some test cases for the little RSS feed parser that I’m writing.</description>
    </item>
    
    <item>
      <title>Book Club: Working Effectively With Legacy Code - Chapter 11 (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/12/03/book-club-working-effectively-with-legacy-code-chapter-11-michael-feathers/</link>
      <pubDate>Thu, 03 Dec 2009 16:27:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/03/book-club-working-effectively-with-legacy-code-chapter-11-michael-feathers/</guid>
      <description>In our latest technical book club we discussed chapter 11 - &amp;#39;I Need to Make a Change. What Methods Should I Test?&amp;#39; - of Michael Feathers&amp;#39; &amp;#39;Working Effectively With Legacy Code&amp;#39;.
In this chapter Feathers covers some techniques which allow us to work out which parts of the code we need to write tests around when we make changes.
These are some of my thoughts and our discussion of the chapter:</description>
    </item>
    
    <item>
      <title>Fundamentals of Object-Oriented Design in UML: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/12/01/fundamentals-of-object-oriented-design-in-uml-book-review/</link>
      <pubDate>Tue, 01 Dec 2009 23:26:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/12/01/fundamentals-of-object-oriented-design-in-uml-book-review/</guid>
      <description>One of my favourite recent blog posts is one written by Sammy Larbi on coupling and cohesion and while discussing it with Phil he suggested that I would probably like this book and in particular the chapter on connascence which I’ve previously written about.
The Book Fundamentals of Object-Oriented Design in UML by Meilir Page-Jones
The Review I really enjoyed reading this book and I think it’s one that I could come back and read again to gain something else from in the future.</description>
    </item>
    
    <item>
      <title>Clojure: Parsing an RSS feed</title>
      <link>https://www.markhneedham.com/blog/2009/11/30/clojure-parsing-an-rss-feed/</link>
      <pubDate>Mon, 30 Nov 2009 18:33:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/30/clojure-parsing-an-rss-feed/</guid>
      <description>I’ve been playing around with a little script in Clojure to parse the ThoughtWorks Blogs RSS feed and then create a tweet for each of them which contains a link to the blog post and the person’s Twitter ID if they have one.
It’s not finished yet but I’m finding the way that we parse documents like this in Clojure quite intriguing.
The xml to parse looks roughly like this:</description>
    </item>
    
    <item>
      <title>TDD: Testing delegation</title>
      <link>https://www.markhneedham.com/blog/2009/11/27/tdd-testing-delegation/</link>
      <pubDate>Fri, 27 Nov 2009 14:43:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/27/tdd-testing-delegation/</guid>
      <description>I recently came across an interesting blog post by Rod Hilton on unit testing and it reminded me of a couple of conversations Phil, Raph and I were having about the best way to test classes which delegate some responsibility to another class.
An example that we ran into recently was where we wrote some code which required one controller to delegate to another.
public class ControllerOne extends Controller { public ModelAndView handleRequest(HttpServletRequest request, HttpServletResponse response) throws Exception { return null; } } public class ControllerTwo extends Controller { private final ControllerOne controllerOne; public ControllerTwo(ControllerOne controllerOne) { this.</description>
    </item>
    
    <item>
      <title>Clojure: The &#39;apply&#39; function</title>
      <link>https://www.markhneedham.com/blog/2009/11/25/clojure-the-apply-function/</link>
      <pubDate>Wed, 25 Nov 2009 11:59:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/25/clojure-the-apply-function/</guid>
      <description>In my continued playing around with Clojure I came across the &amp;#39;apply&amp;#39; function which is used when we want to call another function with a number of arguments but have actually been given a single argument which contains the argument list.
The example that I’ve been trying to understand is applying &amp;#39;str&amp;#39; to a collection of values.
I started off with the following:
(str [1 2 3]) =&amp;gt; &amp;#34;[1 2 3]&amp;#34; This just returns the string representation of the vector that we passed it, but what we actually want is to get an output of &amp;#34;123&amp;#34;.</description>
    </item>
    
    <item>
      <title>Book Club: Working Effectively With Legacy Code - Chapter 10 (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/11/24/book-club-working-effectively-with-legacy-code-chapter-10-michael-feathers/</link>
      <pubDate>Tue, 24 Nov 2009 23:31:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/24/book-club-working-effectively-with-legacy-code-chapter-10-michael-feathers/</guid>
      <description>In our latest technical book club we discussed chapter 10 - &amp;#39;I Can’t Run This Method in a Test Harness&amp;#39; - of Michael Feathers’ &amp;#39;Working Effectively With Legacy Code&amp;#39;.
In this chapter Feathers outlines some of the problems we might have getting methods under test and then suggests some ways to get around those problems.
These are some of my thoughts and our discussion of the chapter:
I quite like the idea of pragmatic refactoring that Feathers suggests early on in the chapter:</description>
    </item>
    
    <item>
      <title>Writing a Java function in Clojure</title>
      <link>https://www.markhneedham.com/blog/2009/11/23/writing-a-java-function-in-clojure/</link>
      <pubDate>Mon, 23 Nov 2009 20:08:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/23/writing-a-java-function-in-clojure/</guid>
      <description>A function that we had to write in Java on a project that I worked on recently needed to indicate whether there was a gap in a series of data points or not.
If there were gaps at the beginning or end of the sequence then that was fine but gaps in the middle of the sequence were not.
null, 1, 2, 3 =&amp;gt; no gaps 1, 2, 3, null =&amp;gt; no gaps 1, null, 2, 3 =&amp;gt; gaps The Java version looked a bit like this:</description>
    </item>
    
    <item>
      <title>Requirements: The story points focus</title>
      <link>https://www.markhneedham.com/blog/2009/11/23/requirements-the-story-points-focus/</link>
      <pubDate>Mon, 23 Nov 2009 11:46:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/23/requirements-the-story-points-focus/</guid>
      <description>Something which an agile approach on a project typically gives us is the ability to change requirements rapidly based on the different types of feedback we typically get over the course of the project.
One way that we can lose this advantage is by getting caught up by the number of story points being completed and using this as the measure of success.
The flexibility to change has an impact on the number of story points that may be completed in a given iteration - if we start doing some work on a story and then get feedback from the business while it is still in progress it’s possible that we will end up with more work to do than we had previously.</description>
    </item>
    
    <item>
      <title>Pair Programming/Helping/Working Collaboratively</title>
      <link>https://www.markhneedham.com/blog/2009/11/22/pair-programminghelpingworking-collaboratively/</link>
      <pubDate>Sun, 22 Nov 2009 16:43:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/22/pair-programminghelpingworking-collaboratively/</guid>
      <description>Dan North has been presenting his &amp;#39;Pimp my architecture&amp;#39; talk again at QCon San Francisco this week and after reading the hugely positive feedback on Twitter I decided to watch some of it again.
The idea of getting people to help each other rather than pair program is what stood out for me this time, something which Brian Guthrie also pointed out:
&amp;#34;We didn’t do pairing, we did &amp;#39;helping&amp;#39;. You can’t get alpha progs to &amp;#39;pair&amp;#39; but they’ll tell you what they know.</description>
    </item>
    
    <item>
      <title>Clojure: Checking for a nil value in a collection</title>
      <link>https://www.markhneedham.com/blog/2009/11/21/clojure-checking-for-a-nil-value-in-a-collection/</link>
      <pubDate>Sat, 21 Nov 2009 22:11:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/21/clojure-checking-for-a-nil-value-in-a-collection/</guid>
      <description>Something which I wanted to do recently was write a function that would indicate whether a collection contained a nil value.
I initially incorrectly thought the &amp;#39;contains?&amp;#39; function was the one that I wanted:
(contains? &amp;#39;(1 nil 2 3) nil) =&amp;gt; false I thought it would work the same as the Java equivalent but that function actually checks whether a key exists in a collection rather than a value. It’s more useful when dealing with maps.</description>
    </item>
    
    <item>
      <title>Clojure: A few things I&#39;ve been tripping up on</title>
      <link>https://www.markhneedham.com/blog/2009/11/20/clojure-a-few-things-ive-been-tripping-up-on/</link>
      <pubDate>Fri, 20 Nov 2009 13:11:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/20/clojure-a-few-things-ive-been-tripping-up-on/</guid>
      <description>In my continued playing with Clojure I’m noticing a few things that I keep getting confused about.
The meaning of parentheses Much like Keith Bennett I’m not used to parentheses playing such an important role in the way that an expression gets evaluated.
As I understand it if an expression is enclosed in parentheses then that means it will be evaluated as a function.
For example I spent quite a while trying to work out why the following code kept throwing a class cast exception:</description>
    </item>
    
    <item>
      <title>Two controllers, type conformance and the Liskov Substitution Principle </title>
      <link>https://www.markhneedham.com/blog/2009/11/19/two-controllers-type-conformance-and-the-liskov-substitution-principle/</link>
      <pubDate>Thu, 19 Nov 2009 00:08:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/19/two-controllers-type-conformance-and-the-liskov-substitution-principle/</guid>
      <description>An interesting object orientation related problem that Raph and I were looking at recently revolved around the design of two controllers in the application we’ve been working on.
The two controllers in question look roughly like this:
public class GenericController extends Controller { private final SomeFactory someFactory; public GenericController(SomeFactory someFactory) { this.someFactory = someFactory; } public ModelAndView handleRequest(HttpServletRequest request, HttpServletResponse response) throws Exception { // do some stuff but never use &amp;#39;request&amp;#39; or &amp;#39;response&amp;#39; } } public class MoreSpecificController extends GenericController { public MoreSpecificController(SomeFactory someFactory) { super(someFactory); } public ModelAndView handleRequest(HttpServletRequest request, HttpServletResponse response) throws Exception { .</description>
    </item>
    
    <item>
      <title>Book Club: Working Effectively With Legacy Code - Chapter 9 (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/11/18/book-club-working-effectively-with-legacy-code-chapter-9-michael-feathers/</link>
      <pubDate>Wed, 18 Nov 2009 17:25:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/18/book-club-working-effectively-with-legacy-code-chapter-9-michael-feathers/</guid>
      <description>In our latest technical book club we discussed chapter 9 - &amp;#39;I Can’t Get This Class Into A Test Harness&amp;#39; - of Michael Feathers’ &amp;#39;Working Effectively With Legacy Code&amp;#39;.
This chapter goes through various problems that we might have getting a class under test and then suggests different techniques to get around those problems.
These are some of my thoughts and our discussion of the chapter:
One approach that Feathers describes when dealing with constructors which take in a lot of values is to just pass in nulls for the parameters that we don’t care about.</description>
    </item>
    
    <item>
      <title>The &#39;should&#39; word</title>
      <link>https://www.markhneedham.com/blog/2009/11/17/the-should-word/</link>
      <pubDate>Tue, 17 Nov 2009 23:52:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/17/the-should-word/</guid>
      <description>I’ve been reading Coders at Work recently and one of my favourite answers from the first chapter interview with Jamie Zawinski is the following:
I think one thing that’s really important is not to be afraid of your ignorance. If you don’t understand how something works, ask someone who does. A lot of people are skittish about that. And that doesn’t help anybody. Not knowing something doesn’t mean you’re dumb - it just means you don’t know it yet.</description>
    </item>
    
    <item>
      <title>Clojure: A first look at recursive functions</title>
      <link>https://www.markhneedham.com/blog/2009/11/17/clojure-a-first-look-at-recursive-functions/</link>
      <pubDate>Tue, 17 Nov 2009 11:10:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/17/clojure-a-first-look-at-recursive-functions/</guid>
      <description>I’m working through Stuart Halloway’s &amp;#39;Programming Clojure&amp;#39; book and I just got to the section where it first mentions recursive functions.
It’s a simple function to countdown from a given number to zero and then return that sequence.
This was one of the examples from the book:
(defn countdown [result x] (if (zero? x) result (recur (conj result x) (dec x)))) That function could then be called like this:</description>
    </item>
    
    <item>
      <title>A reminder to talk to the rubber duck</title>
      <link>https://www.markhneedham.com/blog/2009/11/15/a-reminder-to-talk-to-the-rubber-duck/</link>
      <pubDate>Sun, 15 Nov 2009 21:06:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/15/a-reminder-to-talk-to-the-rubber-duck/</guid>
      <description>Alongside taking a break from it, perhaps one of the most effective ways to solve a tricky problem is to describe it to someone else.
When pairing: this typically isn’t a problem, although it can still happen if a pair stays together too long and both start making the same possibly incorrect assumptions when trying to solve a problem.
In this case it makes sense to call someone else over who can lend a fresh perspective to the problem.</description>
    </item>
    
    <item>
      <title>Mercurial: hg bisect</title>
      <link>https://www.markhneedham.com/blog/2009/11/14/mercurial-hg-bisec/</link>
      <pubDate>Sat, 14 Nov 2009 11:20:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/14/mercurial-hg-bisec/</guid>
      <description>We’ve been using Mercurial locally on the project I’ve been working on, and a couple of weeks ago Phil showed me a cool feature called &amp;#39;bisect&amp;#39; which can be helpful for working out in which revision we managed to break our code.
It’s been ported across from Git and is included in Mercurial from version 1.0.0 rather than just being an extension.
From the bisect extension page:
Its behaviour is fairly simple: it takes a first revision known to be correct (i.</description>
    </item>
    
    <item>
      <title>TDD: Combining the when and then steps</title>
      <link>https://www.markhneedham.com/blog/2009/11/14/tdd-combining-the-when-and-then-steps/</link>
      <pubDate>Sat, 14 Nov 2009 00:17:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/14/tdd-combining-the-when-and-then-steps/</guid>
      <description>I’ve written before about my favoured approach of writing tests in such a way that they have clear &amp;#39;Given/When/Then&amp;#39; sections, and something which I come across quite frequently is tests where the latter two steps have been combined into one method call which takes care of both of them.
An example of this which I came across recently was roughly like this:
@Test public void shouldCalculatePercentageDifferences() { verifyPercentage(50, 100, 100); verifyPercentage(100, 100, 0); verifyPercentage(100, 50, -50); } private void verifyPercentage(int originalValue, int newValue, int expectedValue) { assertEquals(expectedValue, new PercentageCalculator().</description>
    </item>
    
    <item>
      <title>Adapting our approach for the context</title>
      <link>https://www.markhneedham.com/blog/2009/11/13/adapting-our-approach-for-the-context/</link>
      <pubDate>Fri, 13 Nov 2009 06:34:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/13/adapting-our-approach-for-the-context/</guid>
      <description>Amongst the many posts written recently about unit testing, one which I quite liked was written by fallenrogue, where he describes how different contexts/cultures favour different approaches, which means a technique like TDD might not work so well.
cashto, the guy who wrote the original post, agrees with this in the comments on that post:
Absolutely right. I write apps on mobile devices in C++. What works for me may not work well for someone who designs websites with RoR, and vice versa.</description>
    </item>
    
    <item>
      <title>Coding: Pushing the logic back</title>
      <link>https://www.markhneedham.com/blog/2009/11/11/coding-pushing-the-logic-back/</link>
      <pubDate>Wed, 11 Nov 2009 20:30:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/11/coding-pushing-the-logic-back/</guid>
      <description>I was reading a post on the law of demeter by Richard Hart recently and it reminded me that a lot of the refactorings that we typically do on code bases are about pushing the logic back into objects instead of exposing data and performing calculations elsewhere.
An example that I spotted where we did this recently was while building a &amp;#39;BusinessSummary&amp;#39; object whose state was based on the state of a collection of other objects.</description>
    </item>
    
    <item>
      <title>Legacy Code: Sensing</title>
      <link>https://www.markhneedham.com/blog/2009/11/10/legacy-code-sensing/</link>
      <pubDate>Tue, 10 Nov 2009 06:33:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/10/legacy-code-sensing/</guid>
      <description>In &amp;#39;Working Effectively With Legacy Code&amp;#39; Michael Feathers describes two reasons for wanting to break dependencies in our code - to allow separation and sensing.
The former describes the need to get a piece of code into a test harness while the latter describes the need to assert whether that piece of code is doing what we want it to.
On the projects I’ve worked on we’ve tended to run into problems with the latter more frequently and Matt and I actually ran into this problem when we were refactoring some code into a role based interface approach.</description>
    </item>
    
    <item>
      <title>Coding: The agent noun class</title>
      <link>https://www.markhneedham.com/blog/2009/11/08/coding-the-agent-noun-class/</link>
      <pubDate>Sun, 08 Nov 2009 20:44:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/08/coding-the-agent-noun-class/</guid>
      <description>I refer quite frequently to a post written by my colleague Peter Gillard-Moss where he describes the agent noun code smell for class names.
An agent noun is defined by Wikipedia as:
In linguistics, an agent noun (or nomen agentis) is a word that is derived from another word denoting an action, and that identifies an entity that does that action.
Some typical examples of this are classes which end in the name &amp;#39;Manager&amp;#39;, &amp;#39;Retriever&amp;#39;, &amp;#39;Helper&amp;#39; or even &amp;#39;Controller&amp;#39; as Carlos points out.</description>
    </item>
    
    <item>
      <title>Knowing when to persevere and when to change approach</title>
      <link>https://www.markhneedham.com/blog/2009/11/08/knowing-when-to-persevere-and-when-to-change-approach/</link>
      <pubDate>Sun, 08 Nov 2009 09:57:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/08/knowing-when-to-persevere-and-when-to-change-approach/</guid>
      <description>It strikes me that one of the most important skills to develop in software development is knowing when to keep going with an approach to a problem and when we should stop and try something else.
This situation doesn’t always happen, because if we have two people available and realise before we start on the task that there is some doubt as to which solution is the most appropriate, then we can adopt a set based approach whereby we try out multiple potential solutions in parallel.</description>
    </item>
    
    <item>
      <title>TDD: Useful when new on a project</title>
      <link>https://www.markhneedham.com/blog/2009/11/06/tdd-useful-when-new-on-a-project/</link>
      <pubDate>Fri, 06 Nov 2009 21:57:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/06/tdd-useful-when-new-on-a-project/</guid>
      <description>Something I’ve noticed over the last few projects I’ve worked on is that at the beginning, when I don’t know very much at all about the code base, domain and so on, pairing with someone to TDD something seems to make it significantly easier for me to follow what’s going on than other approaches I’ve seen.
I thought that it was probably because I’m more used to that approach than any other but in Michael Feathers&amp;#39; description of TDD in &amp;#39;Working Effectively With Legacy Code&amp;#39; he points out the following:</description>
    </item>
    
    <item>
      <title>Consistency in the code base</title>
      <link>https://www.markhneedham.com/blog/2009/11/04/consistency-in-the-code-base/</link>
      <pubDate>Wed, 04 Nov 2009 21:39:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/04/consistency-in-the-code-base/</guid>
      <description>I’ve had quite a few discussions with various colleagues about coding consistency over the last year or so, and Pat Kua and Frank Trindade have both written posts suggesting that we should look to have coding standards on projects in order to avoid the type of pain that having an inconsistent approach can lead to.
From what I’ve noticed there seem to be two reasons that we end up with inconsistent code on projects:</description>
    </item>
    
    <item>
      <title>Reading Code: Unity</title>
      <link>https://www.markhneedham.com/blog/2009/11/04/reading-code-unity/</link>
      <pubDate>Wed, 04 Nov 2009 01:22:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/04/reading-code-unity/</guid>
      <description>I spent a bit of time reading some of the Unity code base recently and I decided to try out a variation of Michael Feathers&amp;#39; &amp;#39;Effect Sketching&amp;#39; which my colleague Dave Cameron showed me.
&amp;#39;Effect Sketching&amp;#39; is a technique Feathers describes in &amp;#39;Working Effectively With Legacy Code&amp;#39; and the idea is that we sketch a diagram showing the interactions between the fields and methods in a specific class while browsing through the code.</description>
    </item>
    
    <item>
      <title>Book Club: Working Effectively With Legacy Code - Chapter 8 (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/11/03/book-club-working-effectively-with-legacy-code-chapter-8-michael-feathers/</link>
      <pubDate>Tue, 03 Nov 2009 00:16:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/11/03/book-club-working-effectively-with-legacy-code-chapter-8-michael-feathers/</guid>
      <description>In our latest technical book club we discussed chapter 8 - &amp;#39;How do I add a feature?&amp;#39; - of Michael Feathers&amp;#39; &amp;#39;Working Effectively With Legacy Code&amp;#39;.
This chapter covers Test Driven Development and a technique I hadn’t come across before called Programming By Difference.
These are some of my thoughts from our discussion of the chapter:
In the section on TDD Feathers mentions the copy/paste/refactor pattern which I wrote about a few days ago.</description>
    </item>
    
    <item>
      <title>Coding: Copy/Paste then refactor</title>
      <link>https://www.markhneedham.com/blog/2009/10/31/coding-copypaste-then-refactor/</link>
      <pubDate>Sat, 31 Oct 2009 17:54:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/31/coding-copypaste-then-refactor/</guid>
      <description>We’re currently reading Michael Feathers&amp;#39; &amp;#39;Working Effectively With Legacy Code&amp;#39; in our technical book club, and one interesting technique he describes in the Test Driven Development section is copying and pasting some existing code, changing the appropriate part to make the test pass, and then refactoring to remove the duplication we just created.
I can’t remember coming across this approach previously but I found myself using it to solve a Scala problem last week.</description>
    </item>
    
    <item>
      <title>Coding: Invariant checking on dependency injected components</title>
      <link>https://www.markhneedham.com/blog/2009/10/31/coding-invariant-checking-on-dependency-injected-components/</link>
      <pubDate>Sat, 31 Oct 2009 03:00:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/31/coding-invariant-checking-on-dependency-injected-components/</guid>
      <description>I’ve written a couple of times previously about invariant checking in constructors and I had an interesting discussion with some colleagues recently around doing this type of defensive programming when the object in question has its dependencies injected by a container.
Quite often we would see code similar to this in a controller:
public class SomeController { public SomeController(Dependency1 valueOne, Dependency2 valueTwo) { AssertThat.isNotNull(valueOne); AssertThat.isNotNull(valueTwo); // and so on } } Where &amp;#39;SomeController&amp;#39; would have &amp;#39;Dependency1&amp;#39; and &amp;#39;Dependency2&amp;#39; set up in a Spring configuration file in this example.</description>
    </item>
    
    <item>
      <title>Coding: Consistency when invariant checking</title>
      <link>https://www.markhneedham.com/blog/2009/10/29/coding-consistency-when-invariant-checking/</link>
      <pubDate>Thu, 29 Oct 2009 23:06:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/29/coding-consistency-when-invariant-checking/</guid>
      <description>I wrote a while ago about reading the ASP.NET MVC source code and noticing that it makes use of code inside its constructors to ensure that null values can’t be passed in. While I’m still not convinced this is the way to go, I think that if we do take this approach then we need to ensure we do so consistently.
Something which happens quite often is that you’ll come across code which makes use of defensive programming in one of its constructors like so:</description>
    </item>
    
    <item>
      <title>Coding: Connascence - Some examples</title>
      <link>https://www.markhneedham.com/blog/2009/10/28/coding-connascence-some-examples/</link>
      <pubDate>Wed, 28 Oct 2009 22:43:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/28/coding-connascence-some-examples/</guid>
      <description>I’ve been reading Meilir Page-Jones&amp;#39; &amp;#39;Fundamentals of Object Oriented Design in UML&amp;#39; recently and one of the chapters that I found the most interesting is the one where he talks about &amp;#39;connascence&amp;#39;.
Connascence describes the relationship between two bits of code: they are said to be connascent if a change to one would require a change to the other, or if some change to another piece of code would require both of them to change for our program to still be correct.</description>
    </item>
    
    <item>
      <title>Book Club: Working Effectively With Legacy Code - Chapters 6 &amp; 7 (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/10/26/book-club-working-effectively-with-legacy-code-chapters-6-7-michael-feathers/</link>
      <pubDate>Mon, 26 Oct 2009 23:10:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/26/book-club-working-effectively-with-legacy-code-chapters-6-7-michael-feathers/</guid>
      <description>In our latest technical book club we covered chapters 6 &amp;amp; 7 - &amp;#39;I Don’t Have Much Time And I Have To Change It&amp;#39; and &amp;#39;It Takes Forever To Make A Change&amp;#39; - of Michael Feathers&amp;#39; &amp;#39;Working Effectively With Legacy Code&amp;#39;.
The first chapter discusses various different techniques that we can use to add in new code to a legacy code base. These include:
Sprout method - create a new method for our new functionality and make a call to it from existing code.</description>
    </item>
    
    <item>
      <title>Scala: Converting an input stream to a string</title>
      <link>https://www.markhneedham.com/blog/2009/10/26/scala-converting-an-input-stream-to-a-string/</link>
      <pubDate>Mon, 26 Oct 2009 06:32:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/26/scala-converting-an-input-stream-to-a-string/</guid>
      <description>I was playing around with Scala over the weekend and one thing that I wanted to do was get the data from an HTTP response as a string so that I could parse the XML returned.
The data source is fairly small so loading the stream into memory wasn’t a problem.
Carlos pointed me to a bit of Java code that did this and I converted it as literally as possible into Scala.</description>
    </item>
    
    <item>
      <title>Testing End Points: Integration tests vs Contract tests</title>
      <link>https://www.markhneedham.com/blog/2009/10/25/testing-integration-points-integration-tests-vs-contract-tests/</link>
      <pubDate>Sun, 25 Oct 2009 00:04:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/25/testing-integration-points-integration-tests-vs-contract-tests/</guid>
      <description>We recently changed the way that we test against our main integration point on the project I’ve been working on so that in our tests we retrieve the service object from our dependency injection container instead of &amp;#39;newing&amp;#39; one up.
Our tests therefore went from looking like this:
[Test] public void ShouldTestSomeService() { var someService = new SomeService(); // and so on } To something more like this:
[Test] public void ShouldTestSomeService() { var someService = UnityFactory.</description>
    </item>
    
    <item>
      <title>Value objects: Immutability and Equality</title>
      <link>https://www.markhneedham.com/blog/2009/10/23/value-objects-immutability-and-equality/</link>
      <pubDate>Fri, 23 Oct 2009 23:39:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/23/value-objects-immutability-and-equality/</guid>
      <description>A couple of weeks ago I was working on some code where I wanted to create an object composed of the attributes of several other objects.
The object that I wanted to construct was a read only object so it seemed to make sense to make it a value object. The object would be immutable and once created none of the attributes of the object would change.
This was my first attempt at writing the code for this object:</description>
    </item>
    
    <item>
      <title>Coding: The primitive obsession</title>
      <link>https://www.markhneedham.com/blog/2009/10/23/coding-the-primitive-obsession/</link>
      <pubDate>Fri, 23 Oct 2009 00:08:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/23/coding-the-primitive-obsession/</guid>
      <description>I recently came across an interesting post by Naresh Jain where he details a discussion at SDTConf 2009 about the code smells that hurt people the most.
Naresh describes the &amp;#39;primitive obsession&amp;#39; anti-pattern as being the crux of poor design:
I would argue that I’ve seen code which does not have much duplication but its very difficult to understand what’s going on. Hence I claim, “only if the code had better abstractions it would be a lot easier to understand and evolve the code”.</description>
    </item>
    
    <item>
      <title>The effect of adding new people to project teams</title>
      <link>https://www.markhneedham.com/blog/2009/10/21/the-effect-of-adding-new-people-to-project-teams/</link>
      <pubDate>Wed, 21 Oct 2009 18:06:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/21/the-effect-of-adding-new-people-to-project-teams/</guid>
      <description>I’ve read quite frequently - including in Fred Brooks&amp;#39; &amp;#39;The Mythical Man Month&amp;#39; - about the challenges we will experience when adding new people to teams, but having seen quite a few new people join the project that I’ve been working on over the last few months I think there are actually some significant benefits they can provide.
I think the impact new people provide is particularly useful on a challenging project where they may be able to have a much more immediate impact.</description>
    </item>
    
    <item>
      <title>Book Club: Working Effectively With Legacy Code - Chapters 3,4 &amp; 5 (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/10/20/book-club-working-effectively-with-legacy-code-chapters-34-5-michael-feathers/</link>
      <pubDate>Tue, 20 Oct 2009 07:01:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/20/book-club-working-effectively-with-legacy-code-chapters-34-5-michael-feathers/</guid>
      <description>In our latest technical book club we discussed chapters 3, 4 and 5 of Michael Feathers&amp;#39; &amp;#39;Working Effectively With Legacy Code&amp;#39; - &amp;#39;Sensing and Separation&amp;#39;, &amp;#39;The Seam Model&amp;#39; and &amp;#39;Tools&amp;#39;.
These are some of my thoughts from our discussion of these chapters:
Feathers suggests two reasons why we break dependencies when trying to get tests in place - sensing and separation. The former involves the breaking of dependencies in order to get access to the values computed in our code and the latter is necessary so that we can get our code into a test harness to start with.</description>
    </item>
    
    <item>
      <title>Coding: Role based interfaces</title>
      <link>https://www.markhneedham.com/blog/2009/10/18/coding-role-based-interfaces/</link>
      <pubDate>Sun, 18 Oct 2009 20:33:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/18/coding-role-based-interfaces/</guid>
      <description>I’ve read a bit about role based interfaces but I’ve never really quite understood how the idea could be applied in our code - this week my colleague Matt Dunn has been teaching me.
We had a requirement to show some content on every page of the website we’re working on. The content would be slightly different depending on which business process you’re doing.
Our first solution made use of an already defined &amp;#39;BusinessType&amp;#39; property which allowed us to work out which content we needed to create.</description>
    </item>
    
    <item>
      <title>Treating Javascript as an integration point</title>
      <link>https://www.markhneedham.com/blog/2009/10/17/treating-javascript-as-an-integration-point/</link>
      <pubDate>Sat, 17 Oct 2009 09:16:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/17/treating-javascript-as-an-integration-point/</guid>
      <description>A couple of weeks ago I wrote a post about my software development journey over the last year and towards the end I described the difficulties we were having in making changes to some C# code while being sure that we hadn’t broken javascript functionality that also relied on that code.
We typically have code which looks like this:
public class SomeController { public ActionResult SomeControllerAction() { var someModel = new SomeModel { Property1 = &amp;#34;my Property&amp;#34; }; return new JsonResult { Data = someModel }; } } public class SomeModel { public string Property1 { get; set; } } We would make use of this type of object in javascript code like so:</description>
    </item>
    
    <item>
      <title>Book Club: Working Effectively With Legacy Code - Chapters 1 &amp; 2 (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/10/14/book-club-working-effectively-with-legacy-code-chapters-1-2-michael-feathers/</link>
      <pubDate>Wed, 14 Oct 2009 23:21:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/14/book-club-working-effectively-with-legacy-code-chapters-1-2-michael-feathers/</guid>
      <description>We’ve decided to go back to reading a book in our technical book club after a few months of discussing different papers and the chosen book is Michael Feathers&amp;#39; &amp;#39;Working Effectively With Legacy Code&amp;#39;.
We started off by reading the first two chapters titled &amp;#39;Changing Software&amp;#39; and &amp;#39;Working with Feedback&amp;#39; and these are some of my thoughts from our discussion of the chapters:
Early on Feathers talks about the need to change software in order to add features and fix bugs, and while it is certainly necessary to make some changes to code to do this, we discussed whether there is ever a time when we might look to keep the number of changes we’re making to a minimum.</description>
    </item>
    
    <item>
      <title>Scala: Code Kata #2 - Karate Chop - Array Slicing Attempt</title>
      <link>https://www.markhneedham.com/blog/2009/10/13/scala-code-kata-2-karate-chop-array-slicing-attempt/</link>
      <pubDate>Tue, 13 Oct 2009 07:00:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/13/scala-code-kata-2-karate-chop-array-slicing-attempt/</guid>
      <description>In my continued attempts to learn a bit of Scala I’ve been trying out the 2nd of Dave Thomas&amp;#39; code katas - Karate Chop - while using an array slicing approach.
I tried out the iterative approach to this problem in Java about a year ago and it ended up being quite verbose, so I thought the array slicing one would be much more concise.
I didn’t drive any of the solutions I worked on from the tests - in fact I only got all the tests provided by Dave Thomas running right at the end which was probably a mistake in retrospect.</description>
    </item>
    
    <item>
      <title>DSLs: Violating the builder pattern</title>
      <link>https://www.markhneedham.com/blog/2009/10/12/dsls-violating-the-builder-pattern/</link>
      <pubDate>Mon, 12 Oct 2009 22:20:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/12/dsls-violating-the-builder-pattern/</guid>
      <description>I recently came across an interesting post by Dave Thomas where he discusses several domain specific languages (DSLs) he’s come across and suggests that a lot of them seem to be trying too hard to read like the English language instead of focusing on describing a vocabulary for their specific domain.
Reading this post reminded me that I fell into this trap earlier in the year while doing some work to create a builder pattern in our code which didn’t need to make use of a &amp;#39;Build&amp;#39; method but instead would make use of C#&amp;#39;s implicit operator to automatically convert the builder to an object at the appropriate moment.</description>
    </item>
    
    <item>
      <title>Pair Programming: API exploration</title>
      <link>https://www.markhneedham.com/blog/2009/10/11/pair-programming-api-exploration/</link>
      <pubDate>Sun, 11 Oct 2009 14:49:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/11/pair-programming-api-exploration/</guid>
      <description>A colleague and I were working on some code a couple of weeks ago which mostly revolved around investigating the C# reflection API to work out which methods we needed to use.
My colleague was driving while we were doing this and our progress seemed very much based on intuition about the API rather than being gradual.
In fact it was quite similar to one of the situations in which Uncle Bob suggests TDD doesn’t work so well:</description>
    </item>
    
    <item>
      <title>TDD: Keeping assertions clear</title>
      <link>https://www.markhneedham.com/blog/2009/10/10/tdd-keeping-assertions-clear/</link>
      <pubDate>Sat, 10 Oct 2009 11:07:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/10/tdd-keeping-assertions-clear/</guid>
      <description>Something I noticed was a problem with the first example test I provided in my post about API readability and testability: the assertion we are making is not that great.
[Test] public void ShouldConstructModelForSomeSituation() { Assert.AreEqual(DateTime.Today.ToDisplayFormat(), model.SomeDate()); } It’s not really obvious what the expected result is supposed to be except that it should be the &amp;#39;DisplayFormat&amp;#39;. If that fails then we’ll need to navigate to the &amp;#39;ToDisplayFormat&amp;#39; method to work out what that method does.</description>
    </item>
    
    <item>
      <title>Coding: API readability/testability</title>
      <link>https://www.markhneedham.com/blog/2009/10/10/coding-api-readabilitytestability/</link>
      <pubDate>Sat, 10 Oct 2009 00:21:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/10/coding-api-readabilitytestability/</guid>
      <description>About a month or so ago I described how we did some work to ensure that we were calling a class the same way in our tests as in our production code. While I think that was a good choice in that situation, we came across a similar problem this week where we weren’t so sure.
The piece of code in question was being used to create the view model for a page and one of the pieces of data that we wanted to show on this page was the date on which something would be valid which is currently today’s date.</description>
    </item>
    
    <item>
      <title>Software Development Apprenticeship: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2009/10/07/software-development-apprenticeship-some-thoughts/</link>
      <pubDate>Wed, 07 Oct 2009 20:32:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/07/software-development-apprenticeship-some-thoughts/</guid>
      <description>I recently came across an interview with Dave Hoover where he talks through the idea of working as an apprentice software developer and suggests some ways to do this more effectively.
I think the easiest thing to get wrong in software development is to overestimate our ability and there is even a study that proves that theory. Hoover refers to this as &amp;#39;having an accurate self assessment&amp;#39;.
If we work on the same project for a while then we’re going to get pretty good at navigating that code base and we’ll probably be able to solve any problem and add any piece of functionality fairly easily which only helps fuel the belief.</description>
    </item>
    
    <item>
      <title>Book Club: Integration tests are a scam (J.B. Rainsberger)</title>
      <link>https://www.markhneedham.com/blog/2009/10/06/book-club-integration-tests-are-a-scam-j-b-rainsberger/</link>
      <pubDate>Tue, 06 Oct 2009 23:37:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/06/book-club-integration-tests-are-a-scam-j-b-rainsberger/</guid>
      <description>In our latest book club we discussed J.B. Rainsberger’s presentation from Agile 2009 titled &amp;#39;Integration tests are a scam&amp;#39;.
These are some of my thoughts from our discussion of the video:
While talking about how to write interaction tests he suggests that we should only be looking to create interfaces for Domain Driven Design services. If we find ourselves wanting to create interfaces for entities or value objects then we probably have a service wanting to get out.</description>
    </item>
    
    <item>
      <title>My Software Development journey: Year 3-4</title>
      <link>https://www.markhneedham.com/blog/2009/10/05/my-software-development-journey-year-3-4/</link>
      <pubDate>Mon, 05 Oct 2009 18:52:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/05/my-software-development-journey-year-3-4/</guid>
      <description>Just over a year ago I wrote a blog post about my software development journey up to that point and I thought it’d be interesting to write a new version for the 13 months or so since then to see what the main things I’ve learned are.
Functional programming I started playing around with F# about 11 months ago after becoming intrigued about this approach to programming following some conversations with my colleague Phil Calcado.</description>
    </item>
    
    <item>
      <title>Coding: Rules of thumb</title>
      <link>https://www.markhneedham.com/blog/2009/10/04/coding-rules-of-thumb/</link>
      <pubDate>Sun, 04 Oct 2009 16:59:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/04/coding-rules-of-thumb/</guid>
      <description>I recently came across a post by Ayende where he talks about the need for tests to justify themselves and describes his approach to testing, which doesn’t involve TDDing all the code he writes.
While this approach clearly works well for Ayende I really like the following comment by Alex Simkin:
Anyway, this post should be marked MA (Mature Audience Only), so younger programmers won’t use it as an excuse to not write unit tests because Ayende doesn’t do it.</description>
    </item>
    
    <item>
      <title>Learn one thing a day</title>
      <link>https://www.markhneedham.com/blog/2009/10/03/learn-one-thing-a-day/</link>
      <pubDate>Sat, 03 Oct 2009 13:58:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/03/learn-one-thing-a-day/</guid>
      <description>I came across an interesting post written about a month or so ago by Chad Fowler on Tim Ferriss&amp;#39; blog where he suggested that a useful way of ensuring that we are always improving is to ask the question &amp;#39;Am I better than yesterday?&amp;#39; at the end of each day.
I really like this idea and I think it fits in quite nicely with the approach that I take which is to try and ensure that I learn one new thing each day.</description>
    </item>
    
    <item>
      <title>QTB: Agile Governance - Managing the Enterprise Issues</title>
      <link>https://www.markhneedham.com/blog/2009/10/01/qtb-agile-governance-managing-the-enterprise-issues/</link>
      <pubDate>Thu, 01 Oct 2009 23:10:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/10/01/qtb-agile-governance-managing-the-enterprise-issues/</guid>
      <description>I went to watch the latest ThoughtWorks Australia Quarterly Technology Briefing in Sydney on Wednesday where my colleague Lindy Stephens, Suncorp’s Josh Melville and Lonely Planet’s Nigel Dalton presented on &amp;#39;Agile Governance - Managing the Enterprise Issues&amp;#39;.
I was actually unsure of how interesting it would be to me as the title seemed a bit dull but it was actually quite entertaining and not at all what I expected.</description>
    </item>
    
    <item>
      <title>Scala: 99 problems</title>
      <link>https://www.markhneedham.com/blog/2009/09/30/scala-99-problems/</link>
      <pubDate>Wed, 30 Sep 2009 23:39:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/30/scala-99-problems/</guid>
      <description>My colleague Liz Douglass and I have been playing around with Scala and Liz recently pointed out Phil Gold’s &amp;#39;Ninety Nine Scala Problems&amp;#39; which we’ve been working through.
One in particular which is quite interesting is number 7 where we need to flatten a nested list structure.
Therefore given this input:
flatten(List(List(1, 1), 2, List(3, List(5, 8))))
We would expect this output:
res0: List[Any] = List(1, 1, 2, 3, 5, 8)
I tried this out on my own using recursion but kept creating a stack overflow by writing code that never terminated!</description>
    </item>
    
    <item>
      <title>Book Club: Design Sense (Michael Feathers)</title>
      <link>https://www.markhneedham.com/blog/2009/09/30/book-club-design-sense-michael-feathers/</link>
      <pubDate>Wed, 30 Sep 2009 00:42:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/30/book-club-design-sense-michael-feathers/</guid>
      <description>In our latest technical book club we discussed a presentation given at the Norwegian Developers Conference by Michael Feathers titled &amp;#39;Design Sense&amp;#39;.
In this presentation he presents quite a number of different ideas that he has learned from his experiences in software development over the years.
These are some of my thoughts and our discussion:
The first part of the presentation talks about method size and Feathers observes that there seems to be a power law with relation to the size of methods in code bases - i.</description>
    </item>
    
    <item>
      <title>Learning from others/Learning yourself</title>
      <link>https://www.markhneedham.com/blog/2009/09/28/learning-from-otherslearning-yourself/</link>
      <pubDate>Mon, 28 Sep 2009 00:02:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/28/learning-from-otherslearning-yourself/</guid>
      <description>Something which has become quite apparent to me recently is that I learn things far more quickly if I try them out myself and make mistakes than if I just rely on someone else’s word for it, but some more experienced colleagues seem able to use information explained to them far more effectively and don’t necessarily need to go through this process.
While reading through the Dreyfus Model one of the ideas that is suggested is that once people reach the level of &amp;#39;Proficient&amp;#39; at any given skill then they are able to learn from the experiences of others without needing to experience something themselves.</description>
    </item>
    
    <item>
      <title>The Duct Tape Programmer: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2009/09/26/the-duct-tape-programmer-some-thoughts/</link>
      <pubDate>Sat, 26 Sep 2009 17:16:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/26/the-duct-tape-programmer-some-thoughts/</guid>
      <description>I just came across quite an insightful post by Jak Charlton titled &amp;#39;Ship it or Ship out&amp;#39; in which he talks about the importance of shipping the software we work on, referring to Joel’s recent post &amp;#39;The Duct Tape Programmer&amp;#39;.
Unit testing When I first read Joel’s post I didn’t really like it because it seems to downplay the role of unit testing when coding, something which I believe is quite important from my experience of software development so far.</description>
    </item>
    
    <item>
      <title>TDD: It makes you question what you&#39;re doing</title>
      <link>https://www.markhneedham.com/blog/2009/09/25/tdd-it-makes-you-question-what-youre-doing/</link>
      <pubDate>Fri, 25 Sep 2009 23:48:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/25/tdd-it-makes-you-question-what-youre-doing/</guid>
      <description>My colleague Matt Dunn and I have been putting a lot of tests around some code over the last few days so that we can safely make some changes in that area, and having finally created our safety net we’ve moved on to adding in the new functionality.
We’re test driving the new bit of functionality, whereas the previous code had been written with no unit tests, and it’s been quite interesting seeing the contrast in the style of code which seems to come out of these differing styles.</description>
    </item>
    
    <item>
      <title>Book Club: Versioning your database (K. Scott Allen)</title>
      <link>https://www.markhneedham.com/blog/2009/09/24/book-club-versioning-your-database-k-scott-allen/</link>
      <pubDate>Thu, 24 Sep 2009 07:35:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/24/book-club-versioning-your-database-k-scott-allen/</guid>
      <description>In our latest technical book club we discussed a series of posts written by K. Scott Allen about getting your database under version control.
Three rules for database work
The baseline
Change scripts
Views, Stored Procedures and the like
Branching and Merging
These are some of my thoughts and our discussion:
We had an interesting discussion around when it’s OK to go and change checked-in change scripts - on previous projects I’ve worked on we’ve actually had the rule that once you’ve checked a change script into source control you can no longer change it, but instead need to add another change script that does what you want.</description>
    </item>
    
    <item>
      <title>TDD: Copying and pasting tests</title>
      <link>https://www.markhneedham.com/blog/2009/09/22/tdd-copying-and-pasting-tests/</link>
      <pubDate>Tue, 22 Sep 2009 23:39:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/22/tdd-copying-and-pasting-tests/</guid>
      <description>I’ve been re-reading a post my colleague Ian Cartwright wrote earlier this year about treating test code the same way as production code, and one thing which stands out as something I’m certainly guilty of is copying and pasting tests.
Ian lists the following problems with doing this:
The first one is cut &amp;amp; paste, for some reason when it comes to unit tests people suddenly start cutting and pasting all over the place.</description>
    </item>
    
    <item>
      <title>TDD: Tests that give us a false confidence of coverage</title>
      <link>https://www.markhneedham.com/blog/2009/09/21/tdd-tests-that-give-us-a-false-confidence-of-coverage/</link>
      <pubDate>Mon, 21 Sep 2009 22:49:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/21/tdd-tests-that-give-us-a-false-confidence-of-coverage/</guid>
      <description>During J.B. Rainsberger’s presentation at Agile 2009 titled &amp;#39;Integration tests are a scam&amp;#39; he suggests that having lots of integration tests covering our code can give us a false sense of confidence that we are testing our code, and I think the same can happen with unit tests as well if we’re not careful how we write them.
It’s important to ensure that our unit tests are actually testing something useful otherwise the cost of writing and maintaining them will outweigh the benefits that we derive from doing so.</description>
    </item>
    
    <item>
      <title>TDD: Keeping test intent when using test builders</title>
      <link>https://www.markhneedham.com/blog/2009/09/20/tdd-keeping-test-intent-when-using-test-builders/</link>
      <pubDate>Sun, 20 Sep 2009 12:06:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/20/tdd-keeping-test-intent-when-using-test-builders/</guid>
      <description>While the test data builder pattern is quite a useful one for simplifying the creation of test data in our tests I think we need to be quite careful when using it that we don’t lose the intent of the test that we’re writing.
The main advantage that I see with this pattern is that by using it we can provide default values for properties of our objects which aren’t important for the bit of functionality that we’re currently testing but which need to be provided otherwise the test can’t actually be run.</description>
    </item>
    
    <item>
      <title>Set Based Concurrent Engineering: A simple example</title>
      <link>https://www.markhneedham.com/blog/2009/09/19/set-based-concurrent-engineering-a-simple-example/</link>
      <pubDate>Sat, 19 Sep 2009 02:24:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/19/set-based-concurrent-engineering-a-simple-example/</guid>
      <description>One of my favourite ideas that I came across while reading the Poppendiecks’ Lean Software Development is set based concurrent engineering, which encourages us to keep our options open with regards to the solution to a problem until we absolutely need to decide on an approach, after which we probably can’t easily change that decision so we will most likely stick with it.
I like the idea but on the projects I’ve worked on we often seem to take a more point based approach - there will be some discussion up front on the potential solutions to a problem and eventually one of them will be considered to be the best solution and we go and implement that one.</description>
    </item>
    
    <item>
      <title>TDD: Testing with generic abstract classes</title>
      <link>https://www.markhneedham.com/blog/2009/09/18/tdd-testing-with-generic-abstract-classes/</link>
      <pubDate>Fri, 18 Sep 2009 00:40:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/18/tdd-testing-with-generic-abstract-classes/</guid>
      <description>In a post I wrote earlier in the week I described a dilemma we were having testing some code which made use of abstract classes and Perryn Fowler, Liz Keogh and Pat Maddox pointed out that a useful approach for this problem would be to make use of an abstract test class.
The idea here is that we create an equivalent hierarchy to our production code for our tests which in the example that I provided would mean that we have roughly the following setup:</description>
    </item>
    
    <item>
      <title>Coding: Watch out for mutable code</title>
      <link>https://www.markhneedham.com/blog/2009/09/16/coding-watch-out-for-mutable-code/</link>
      <pubDate>Wed, 16 Sep 2009 23:31:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/16/coding-watch-out-for-mutable-code/</guid>
      <description>I’ve been doing some more work recently on trying to reduce the number of fields in some of our classes and moving any logic related to calculations into the methods that use it, but I managed to break part of our application by doing this a bit too casually, not realising that the code I’d inlined was actually being mutated later on.
The code I’d refactored originally looked like this:</description>
    </item>
    
    <item>
      <title>Book Club: SOLID Principles (Uncle Bob Martin)</title>
      <link>https://www.markhneedham.com/blog/2009/09/16/book-club-solid-principles-uncle-bob-martin/</link>
      <pubDate>Wed, 16 Sep 2009 01:11:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/16/book-club-solid-principles-uncle-bob-martin/</guid>
      <description>In our latest technical book club we discussed Uncle Bob Martin’s presentation to the Norwegian Developers Conference on &amp;#39;SOLID Design&amp;#39;.
These principles of object oriented design are also written up on Uncle Bob’s website and are also in his book &amp;#39;Agile Principles, Patterns and Practices&amp;#39;.
I read most of the book a couple of years ago but I don’t always remember all of the principles when I’m coding so it was good to revisit them again.</description>
    </item>
    
    <item>
      <title>Scala: The &#39;_=&#39; mixed identifier</title>
      <link>https://www.markhneedham.com/blog/2009/09/14/scala-the-_-mixed-identifier/</link>
      <pubDate>Mon, 14 Sep 2009 23:49:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/14/scala-the-_-mixed-identifier/</guid>
      <description>I’ve been playing around with Scala a bit and in particular following some of the code examples from Daniel Spiewak’s &amp;#39;Scala for Java Refugees&amp;#39; article on Traits and Types.
One thing that I got a bit confused about in one of the examples was the use of the &amp;#39;_&amp;#39; at the end of one of the function definitions:
class MyContainer[T] {
  private var obj:T = _
  def value = obj
  def value_=(v:T) = obj = v
}
val cont = new MyContainer[String]
cont.</description>
    </item>
    
    <item>
      <title>TDD: Testing sub classes</title>
      <link>https://www.markhneedham.com/blog/2009/09/13/tdd-testing-sub-classes/</link>
      <pubDate>Sun, 13 Sep 2009 22:21:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/13/tdd-testing-sub-classes/</guid>
      <description>We ran into another interesting testing dilemma while refactoring the view model code which I described in an earlier post, to the point where we have an abstract class and three sub classes, which means that we now have three classes which do the same thing 80% of the time.
As I mentioned in a post a couple of weeks ago one of the main refactorings that we did was to move some calls to dependency methods from the constructor and into properties so that those calls would only be made if necessary.</description>
    </item>
    
    <item>
      <title>Coding: An abstract class/ASP.NET MVC dilemma</title>
      <link>https://www.markhneedham.com/blog/2009/09/13/coding-an-abstract-classasp-net-mvc-dilemma/</link>
      <pubDate>Sun, 13 Sep 2009 00:19:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/13/coding-an-abstract-classasp-net-mvc-dilemma/</guid>
      <description>I previously described a refactoring that we have been working on to reduce the number of fields and delay calculations and the actual goal behind this refactoring was to get the code into shape so that we could add in the logic for a new business process that our application needed to handle.
The code in question defines view models being used by different partial views which are rendered depending on the business process that the user is currently executing.</description>
    </item>
    
    <item>
      <title>TDD: Test only constructors</title>
      <link>https://www.markhneedham.com/blog/2009/09/12/tdd-test-only-constructors/</link>
      <pubDate>Sat, 12 Sep 2009 00:35:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/12/tdd-test-only-constructors/</guid>
      <description>I wrote previously about how we’d been doing some work to change the way that we get a &amp;#39;User&amp;#39; object into our system, and one mistake that we made initially was to have another constructor on the &amp;#39;User&amp;#39; object which was being used in all our unit tests which involved the user in some way.
The original reason that this &amp;#39;test constructor&amp;#39; was created was to make it easier to construct a &amp;#39;fake user&amp;#39; which we were using in some of our functional tests but had ended up being used in unit tests as well.</description>
    </item>
    
    <item>
      <title>Impersonators: Using them in showcases</title>
      <link>https://www.markhneedham.com/blog/2009/09/10/impersonators-using-them-in-showcases/</link>
      <pubDate>Thu, 10 Sep 2009 00:23:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/10/impersonators-using-them-in-showcases/</guid>
      <description>Towards the end of my colleague Julio Maia’s blog post about the impersonator pattern he suggests that the standalone environment that we can create through the use of impersonators can be quite useful for showcases and we actually had a recent occasion where we had to switch mid-showcase from using the integration environment to make use of an impersonator.
In this case part of the environment went down in the middle of the showcase so if we wanted to keep on going then that was our only option but in general the expectation of the business is that our showcases show them the functionality of the application end to end.</description>
    </item>
    
    <item>
      <title>A reminder that sometimes it&#39;s best just to ask</title>
      <link>https://www.markhneedham.com/blog/2009/09/07/a-reminder-that-sometimes-its-best-just-to-ask/</link>
      <pubDate>Mon, 07 Sep 2009 22:27:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/07/a-reminder-that-sometimes-its-best-just-to-ask/</guid>
      <description>Recently my pair and I were trying to merge some changes into our code that we had just picked up from updating from the trunk and realised that we weren’t actually sure how to resolve that merge since it seemed to conflict with what we’d been working on.
We hadn’t checked in for longer than we would have liked to due to a bit of a checkin pile up which had happened because the build on the CI server had been failing for a few hours due to a temporary problem we were having with an external dependency.</description>
    </item>
    
    <item>
      <title>Fiddler: Trying to work out how it all hooks together</title>
      <link>https://www.markhneedham.com/blog/2009/09/06/fiddler-trying-to-work-out-how-it-all-hooks-together/</link>
      <pubDate>Sun, 06 Sep 2009 23:25:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/06/fiddler-trying-to-work-out-how-it-all-hooks-together/</guid>
      <description>I mentioned previously that we’re making use of Fiddler quite a lot on my current project, mainly to check the traffic going to and from the service layer, and I’m quite curious how it actually works.
In particular I wanted to know:
How we’re able to route requests through Fiddler and then through the corporate proxy
How proxy settings work differently for Firefox and Internet Explorer
As far as I’m aware the source code for Fiddler isn’t available so a colleague and I tracked the various proxy settings when Fiddler was turned on and off and also had a look at some registry settings.</description>
    </item>
    
    <item>
      <title>Coding: Checking invariants in a factory method</title>
      <link>https://www.markhneedham.com/blog/2009/09/06/coding-checking-invariants-in-a-factory-method/</link>
      <pubDate>Sun, 06 Sep 2009 00:46:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/06/coding-checking-invariants-in-a-factory-method/</guid>
      <description>Something which we discussed quite frequently when studying Domain Driven Design in our technical book club earlier this year was where the code which checked whether we had setup an object correctly should reside.
Shortly after that I suggested that I didn’t think it should go in the constructor of an object but that we should rely on objects to be good citizens and not pass in null values or the like to other objects.</description>
    </item>
    
    <item>
      <title>Book Club: Promiscuous Pairing &amp; Beginner&#39;s Mind (Arlo Belshee)</title>
      <link>https://www.markhneedham.com/blog/2009/09/05/book-club-promiscuous-pairing-beginners-mind-arlo-belkshee/</link>
      <pubDate>Sat, 05 Sep 2009 16:12:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/05/book-club-promiscuous-pairing-beginners-mind-arlo-belkshee/</guid>
      <description>In this week’s book club we discussed Arlo Belshee’s paper &amp;#39;Promiscuous Pairing and Beginner’s Mind&amp;#39; where he presents the idea of rotating pairs more frequently than we usually might, suggesting that the optimal rotation time is 90 minutes.
I remember coming across the idea of promiscuous pairing a couple of years ago but I hadn’t read the paper all the way through and so far haven’t worked on a team where we’ve really tried out his ideas.</description>
    </item>
    
    <item>
      <title>Coding Dojo #22: Scala, lamdaj, Project Euler</title>
      <link>https://www.markhneedham.com/blog/2009/09/04/coding-dojo-22-scala-lamdaj-project-euler/</link>
      <pubDate>Fri, 04 Sep 2009 00:26:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/04/coding-dojo-22-scala-lamdaj-project-euler/</guid>
      <description>In our latest coding dojo we played around with Scala and lambdaj while attempting to solve some of the problems on the Project Euler website.
The Format We started off on two different machines with two of us having a look at solving the first Project Euler problem in Scala and the other two trying to solve it in Java while using the lambdaj library.
What did we learn? Fabio and I worked on the Scala solution to the problem and we were pretty much playing around with different ways to aggregate all the values in the list:</description>
    </item>
    
    <item>
      <title>Coding: Reduce fields, delay calculations</title>
      <link>https://www.markhneedham.com/blog/2009/09/02/coding-reduce-fields-delay-calculations/</link>
      <pubDate>Wed, 02 Sep 2009 23:52:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/02/coding-reduce-fields-delay-calculations/</guid>
      <description>A pattern in code which I’ve noticed quite frequently lately is that of executing calculations in the constructor of an object and then storing the result in a field on the object.
From the small amount of experience I have playing around with functional languages I have come across the idea of lazy evaluation of functions quite frequently and I think it’s something that we can apply in object oriented languages as well.</description>
    </item>
    
    <item>
      <title>TDD: Test the behaviour rather than implementation</title>
      <link>https://www.markhneedham.com/blog/2009/09/02/tdd-test-the-behaviour-rather-than-implementation/</link>
      <pubDate>Wed, 02 Sep 2009 00:42:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/09/02/tdd-test-the-behaviour-rather-than-implementation/</guid>
      <description>I previously wrote about some duplicated code we’d taken the time to remove from our code base, and something else that we found when working with this code is that a lot of the tests around it were testing the implementation/internal state of the object rather than testing the behaviour that they expected to see.
I find it makes more sense to test the behaviour since this is the way that the object will most likely be used in our production code.</description>
    </item>
    
    <item>
      <title>Coding: The guilty bystander</title>
      <link>https://www.markhneedham.com/blog/2009/08/30/coding-the-guilty-bystander/</link>
      <pubDate>Sun, 30 Aug 2009 20:07:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/30/coding-the-guilty-bystander/</guid>
      <description>While discussing the duplication in our code base which I described in an earlier post with some other colleagues earlier this week, I realised that I had actually gone past this code a couple of times previously, seen that there was a problem with it, but hadn’t taken any steps to fix it other than making a mental note that I would fix it when I got the chance.</description>
    </item>
    
    <item>
      <title>Coding: Group the duplication, then remove it</title>
      <link>https://www.markhneedham.com/blog/2009/08/30/coding-group-the-duplication-then-remove-it/</link>
      <pubDate>Sun, 30 Aug 2009 13:13:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/30/coding-group-the-duplication-then-remove-it/</guid>
      <description>One of the most common activities for software developers is removing duplication from code and Dave recently showed me a technique which I hadn’t seen before for doing this more effectively - first group all the code into one place without removing any of the duplication and then remove the duplication when everything is in one place.
The code where we tried out this technique was being used to construct the model for the navigation at the top of the pages on the website we’re working on and before we grouped the duplication the code looked a bit like this:</description>
    </item>
    
    <item>
      <title>Book Club: Unshackle your domain (Greg Young)</title>
      <link>https://www.markhneedham.com/blog/2009/08/29/book-club-unshackle-your-domain-greg-young/</link>
      <pubDate>Sat, 29 Aug 2009 09:54:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/29/book-club-unshackle-your-domain-greg-young/</guid>
      <description>In this week’s book club we continued with the idea of discussing videos, this week’s selection being Greg Young’s &amp;#39;Unshackle your Domain&amp;#39; presentation from QCon San Francisco in November 2008. He also did a version of this talk in the February European Alt.NET meeting.
In this presentation Greg talks about Command Query Separation at the architecture level and explicit state transitions amongst other things.
Jonathan Oliver has created a useful resource page of the material that’s been written about some of these ideas as well.</description>
    </item>
    
    <item>
      <title>jQuery: $.post, &#39;jsonp&#39; and cross-domain requests</title>
      <link>https://www.markhneedham.com/blog/2009/08/27/jquery-post-jsonp-and-cross-domain-requests/</link>
      <pubDate>Thu, 27 Aug 2009 22:39:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/27/jquery-post-jsonp-and-cross-domain-requests/</guid>
      <description>We spent a bit of time yesterday looking through the jQuery code trying to work out why a cross domain request we were making using jQuery’s &amp;#39;$.post&amp;#39; function wasn’t working.
In hindsight perhaps it should have been obvious that you wouldn’t be able to do that, but I didn’t completely understand how cross domain requests were possible at all, even though we had some &amp;#39;$.getJSON&amp;#39; &amp;#39;jsonp&amp;#39; function calls around our code base which were doing just that.</description>
    </item>
    
    <item>
      <title>Pair Programming: Observations on anti-patterns</title>
      <link>https://www.markhneedham.com/blog/2009/08/27/pair-programming-observations-on-anti-patterns/</link>
      <pubDate>Thu, 27 Aug 2009 00:02:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/27/pair-programming-observations-on-anti-patterns/</guid>
      <description>I’ve been pairing a bit more regularly recently after more sporadic pairing sessions over the last 9 or 10 months and I’ve noticed that I’ve picked up some habits which aren’t really that effective when pairing so I’m on a mission to sort that out.
Moving around the code too quickly One thing that I often forget is that when you’re driving you know exactly where you’re going with the mouse or keyboard just before you do it whereas the other person doesn’t know until you’ve done it.</description>
    </item>
    
    <item>
      <title>Coding: Coupling and Expressiveness</title>
      <link>https://www.markhneedham.com/blog/2009/08/25/coding-coupling-and-expressiveness/</link>
      <pubDate>Tue, 25 Aug 2009 22:42:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/25/coding-coupling-and-expressiveness/</guid>
      <description>We came across an interesting situation in our code base recently whereby two coding approaches which I consider important for writing maintainable code seemed to come into conflict with each other.
The code we were working on needed to retrieve some customer details from a backend system by making use of the current user’s &amp;#39;customerId&amp;#39; which we can retrieve from the &amp;#39;LoggedInUser&amp;#39;.
My initial thought was that since we only needed one property of the &amp;#39;LoggedInUser&amp;#39; we could just pass in the &amp;#39;customerId&amp;#39; instead of the &amp;#39;LoggedInUser&amp;#39;:</description>
    </item>
    
    <item>
      <title>Rock Scissors Paper: TDD as if you meant it</title>
      <link>https://www.markhneedham.com/blog/2009/08/24/rock-scissors-paper-tdd-as-if-you-meant-it/</link>
      <pubDate>Mon, 24 Aug 2009 22:11:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/24/rock-scissors-paper-tdd-as-if-you-meant-it/</guid>
      <description>I decided to spend a bit of time on Saturday having another go at writing Rock Scissors Paper while following Keith Braithwaite’s TDD as if you meant it exercise.
We previously did this exercise at a coding dojo but I wanted to see what happens when you code for a longer period of time with this exercise since we typically only code for maybe a couple of hours at a dojo.</description>
    </item>
    
    <item>
      <title>Book Club: What I&#39;ve learned about DDD since the book (Eric Evans)</title>
      <link>https://www.markhneedham.com/blog/2009/08/24/book-club-what-ive-learned-about-ddd-since-the-book-eric-evans/</link>
      <pubDate>Mon, 24 Aug 2009 18:20:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/24/book-club-what-ive-learned-about-ddd-since-the-book-eric-evans/</guid>
      <description>This week’s book club became video club as we discussed Eric Evans&amp;#39; QCon London presentation &amp;#39;What I’ve learned about DDD since the book&amp;#39;.
I was lucky enough to be able to attend this presentation live and we previously ran a book club where I briefly summarised what I’d learnt but this gave everyone else an opportunity to see it first hand.
These are some of my thoughts and our discussion of the presentation:</description>
    </item>
    
    <item>
      <title>Pair Programming: Keeping both people engaged</title>
      <link>https://www.markhneedham.com/blog/2009/08/24/pair-programming-keeping-both-people-engaged/</link>
      <pubDate>Mon, 24 Aug 2009 18:18:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/24/pair-programming-keeping-both-people-engaged/</guid>
      <description>I’ve written a few times previously about pair programming and how I think it’s one of the best practices I’ve seen used on agile teams but in order to ensure that we’re making the best use of this practice it’s important to ensure that both people are engaged.
It is often quite difficult to persuade people who aren’t used to extreme programming that having two people working at the same machine is actually beneficial and this task can be made even more difficult if one person is losing focus or interest and therefore isn’t actually adding much value in that pairing session.</description>
    </item>
    
    <item>
      <title>Learning: Thoughts on doing so more effectively</title>
      <link>https://www.markhneedham.com/blog/2009/08/24/learning-thoughts-on-doing-so-more-effectively/</link>
      <pubDate>Mon, 24 Aug 2009 18:15:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/24/learning-thoughts-on-doing-so-more-effectively/</guid>
      <description>One of the quite common sayings that I’ve come across when discussing student/teacher type situations is that it’s the teacher’s responsibility to present the material to the student in a way that they can understand and that if the student still doesn’t understand then the teacher hasn’t done their job properly.
I believe that this approach is also followed in the UK education system nowadays and while it makes sense in a way I don’t think it’s a particularly useful belief to have as a student since it seems to encourage you to be quite passive in the learning process.</description>
    </item>
    
    <item>
      <title>Coding: Unused code</title>
      <link>https://www.markhneedham.com/blog/2009/08/21/coding-unused-code/</link>
      <pubDate>Fri, 21 Aug 2009 08:56:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/21/coding-unused-code/</guid>
      <description>An interesting problem that we have come across a few times over the past 6 months is the dilemma about what to do when we start work on a feature and get part way through it before it gets descoped from the current iteration, maybe to be picked up later on but maybe not.
The easiest, and therefore most common, approach is to just leave the code in the code base half complete and then hopefully return to it at some later stage.</description>
    </item>
    
    <item>
      <title>TDD: Asserting on test dependency code</title>
      <link>https://www.markhneedham.com/blog/2009/08/19/tdd-asserting-on-test-dependency-code/</link>
      <pubDate>Wed, 19 Aug 2009 23:19:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/19/tdd-asserting-on-test-dependency-code/</guid>
      <description>Something I’ve noticed a bit lately is tests which set up a load of dependencies for a test and then do assertions on that setup before getting on to calling the system under test.
The code tends to be similar to this:
public void ShouldHopefullyDoSomeAwesomeStuff()
{
    // setup via expectations for dependency1 and dependency2
    Assert.IsNotNull(dependency1.DependedOnMethod);
    new SystemUnderTest(dependency1, dependency2).DoThatStuff();
    // test assertions
}
I’ve done this a fair few times myself and I used to believe that it actually made the test more valuable since we were ensuring that the dependencies were in a good state before we executed the test.</description>
    </item>
    
    <item>
      <title>Impersonators: Finding the enabling point</title>
      <link>https://www.markhneedham.com/blog/2009/08/19/impersonators-finding-the-enabling-point/</link>
      <pubDate>Wed, 19 Aug 2009 00:43:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/19/impersonators-finding-the-enabling-point/</guid>
      <description>One of the other interesting problems that we’ve come across while making use of different impersonators in our build process, and which Julio mentions at the end of his comment on Gil Zilberfeld’s blog post, is trying to work out where the correct place for the impersonator is.
Ideally we want to put the impersonator in a place where we can easily turn it on or off depending on whether we want to use the impersonator or the real end point.</description>
    </item>
    
    <item>
      <title>Pulling from github on Windows</title>
      <link>https://www.markhneedham.com/blog/2009/08/18/pulling-from-github-on-windows/</link>
      <pubDate>Tue, 18 Aug 2009 00:33:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/18/pulling-from-github-on-windows/</guid>
      <description>My colleague Dave Cameron has been telling me about his adventures playing around with Git Sharp (a C# port of the Java Git implementation jGit) so I thought I’d get a copy of the code and have a look as well.
I tend to check out all code bases from my host machine instead of virtual machine so I got the code all checked out on the Mac and accessed it via a shared folder on my VM.</description>
    </item>
    
    <item>
      <title>Law of Demeter: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2009/08/17/law-of-demeter-some-thoughts/</link>
      <pubDate>Mon, 17 Aug 2009 21:12:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/17/law-of-demeter-some-thoughts/</guid>
      <description>Phil Haack wrote a post a few weeks ago about the law of demeter and how it’s not just about reducing the number of dots that appear on one line.
This is a nice side effect of following the law of demeter but I often feel that the main benefit we get from following it is that code becomes easier to change since we haven’t exposed the state of an object all over the place.</description>
    </item>
    
    <item>
      <title>Impersonators: Why do we need them?</title>
      <link>https://www.markhneedham.com/blog/2009/08/16/impersonators-why-do-we-need-them/</link>
      <pubDate>Sun, 16 Aug 2009 22:11:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/16/impersonators-why-do-we-need-them/</guid>
      <description>I wrote previously about an impersonator we are using on my project which Martin Fowler has dubbed the &amp;#39;self initializing fake&amp;#39; and although I thought this was the only type of situation where we might use this approach, from discussing this with my colleague Julio Maia and from experiences on the project I’m working on I realise there are other advantages to this approach as well.
To deal with unstable/slow integration points
This is the main reason that we use the self initializing fake and perhaps the most obvious reason why we might create an impersonator: we will remain in pain if we don’t create one.</description>
    </item>
    
    <item>
      <title>Builders hanging off class vs Builders in same namespace</title>
      <link>https://www.markhneedham.com/blog/2009/08/15/builders-hanging-off-class-vs-builders-in-same-namespace/</link>
      <pubDate>Sat, 15 Aug 2009 10:53:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/15/builders-hanging-off-class-vs-builders-in-same-namespace/</guid>
      <description>I wrote a couple of months ago about an approach we’re using to help people find test data builders in our code base by hanging those builders off a class called &amp;#39;GetBuilderFor&amp;#39; and I think it’s worked reasonably well.
However, a couple of weeks ago my colleague Lu Ning suggested another way to achieve our goal of allowing people to find the builders easily.
The approach he suggested is to put all of the builders in the same namespace, for example &amp;#39;Builders&amp;#39;, so that if someone wants to find out if a builder already exists they can just type &amp;#39;Builders.</description>
    </item>
    
    <item>
      <title>Challenging projects and the five stages of grief</title>
      <link>https://www.markhneedham.com/blog/2009/08/13/challenging-projects-and-the-five-stages-of-grief/</link>
      <pubDate>Thu, 13 Aug 2009 17:20:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/13/challenging-projects-and-the-five-stages-of-grief/</guid>
      <description>One of the things that I’ve noticed over the past few years of working on software delivery projects is that the most challenging projects, the ones that most people hate working on, tend to last the longest yet teach you the most although maybe not immediately.
The problem is that a lot of the time we are in a state of frustration with all the things that are wrong about the project and therefore don’t focus on the things that we can do to make our situation better and improve the project’s chances of delivering.</description>
    </item>
    
    <item>
      <title>Zen Mind, Beginners Mind: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/08/12/zen-mind-beginners-mind-book-review/</link>
      <pubDate>Wed, 12 Aug 2009 09:06:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/12/zen-mind-beginners-mind-book-review/</guid>
      <description>The Book
Zen Mind, Beginner’s Mind by Shunryu Suzuki
The Review
I first came across the actual term beginner’s mind when reading through the &amp;#39;Wear The White Belt&amp;#39; chapter of Apprenticeship Patterns although it was often mentioned to me on one of the first projects I did at ThoughtWorks a couple of years ago that people liked teaching me things because I just took the information in pretty much without questioning.</description>
    </item>
    
    <item>
      <title>Dreyfus Model: More thoughts</title>
      <link>https://www.markhneedham.com/blog/2009/08/10/dreyfus-model-more-thoughts/</link>
      <pubDate>Mon, 10 Aug 2009 20:36:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/10/dreyfus-model-more-thoughts/</guid>
      <description>Since we discussed the Dreyfus Model in book club a few weeks ago I’ve noticed that I’m more aware of my own level of skill at different tasks and references to the model seem to appear more frequently, at least amongst my colleagues.
These are some of the things I’ve been thinking about:
How do we use the model?
Alan Skorks has an interesting post where he discusses the role of the Dreyfus Model in helping to build software development expertise, concluding that it doesn’t help very much in developing expertise within a team.</description>
    </item>
    
    <item>
      <title>Coding Dojo #21: TDD as if you meant it revisited</title>
      <link>https://www.markhneedham.com/blog/2009/08/08/coding-dojo-21-tdd-as-if-you-meant-it-revisited/</link>
      <pubDate>Sat, 08 Aug 2009 23:50:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/08/coding-dojo-21-tdd-as-if-you-meant-it-revisited/</guid>
      <description>In this week’s dojo we decided to revisit the &amp;#39;TDD as if you meant it&amp;#39; exercise originally invented by Keith Braithwaite for the Software Craftsmanship Conference but recently tried out at the Alt.NET UK Conference in London.
The idea was to write code for &amp;#39;tic tac toe&amp;#39; or &amp;#39;noughts and crosses&amp;#39; and we were following these requirements:
a game is over when all fields are taken
a game is over when all fields in a column are taken by a player</description>
    </item>
    
    <item>
      <title>Book Club: Object Role Stereotypes (Jeremy Miller)</title>
      <link>https://www.markhneedham.com/blog/2009/08/08/book-club-object-role-stereotypes-jeremy-miller/</link>
      <pubDate>Sat, 08 Aug 2009 00:49:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/08/book-club-object-role-stereotypes-jeremy-miller/</guid>
      <description>In last week’s book club we discussed an article written by Jeremy Miller for MSDN Magazine titled &amp;#39;Object Role Stereotypes&amp;#39; which discusses part of Rebecca Wirfs Brock’s book &amp;#39;Object Design&amp;#39;.
I’ve been trying to read Object Design for about a year since coming across the book while reading through the slides from JAOO Sydney 2008 but I’ve often found the reading to be quite abstract and have struggled to work out how to apply the ideas to the coding I do day to day.</description>
    </item>
    
    <item>
      <title>Bear Shaving</title>
      <link>https://www.markhneedham.com/blog/2009/08/06/bear-shaving/</link>
      <pubDate>Thu, 06 Aug 2009 18:58:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/06/bear-shaving/</guid>
      <description>I recently came across a blog post by Seth Godin where he coins the term &amp;#39;bear shaving&amp;#39; which is where we address the symptoms of a problem instead of addressing the problem.
The main example he gives is the idea of shaving a bear so that it can deal with the increased temperature caused by global warming instead of addressing the underlying problem which has led to this happening in the first place.</description>
    </item>
    
    <item>
      <title>Think a little, code a little</title>
      <link>https://www.markhneedham.com/blog/2009/08/05/think-a-little-code-a-little/</link>
      <pubDate>Wed, 05 Aug 2009 00:13:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/05/think-a-little-code-a-little/</guid>
      <description>I recently came across an interesting post by Frans Bauma entitled &amp;#39;Think first, doing is for later&amp;#39; which was linked to from Jeremy Miller’s blog entry about incremental delivery and continuous design.
Right now I find myself in favour of Jeremy’s approach which is more about writing some code and then getting some feedback on it and then writing some more code instead of spending a lot of time thinking before we write any code.</description>
    </item>
    
    <item>
      <title>Strong opinions, weakly held</title>
      <link>https://www.markhneedham.com/blog/2009/08/03/strong-opinions-weakly-held/</link>
      <pubDate>Mon, 03 Aug 2009 00:46:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/08/03/strong-opinions-weakly-held/</guid>
      <description>I find one of the most applicable mantras in software development is Bob Sutton’s idea that we should have strong opinions weakly held.
The idea as I understand it is that we shouldn’t sit on the fence but instead have an opinion that we research thoroughly and are prepared to back up. However, we shouldn’t become too attached to those opinions but instead be prepared to listen to alternative points of view and take those on where they prove more useful than our previous opinions.</description>
    </item>
    
    <item>
      <title>Coding Dojo #20: Groovy Sales Tax Problem</title>
      <link>https://www.markhneedham.com/blog/2009/07/31/coding-dojo-20-groovy-sales-tax-problem/</link>
      <pubDate>Fri, 31 Jul 2009 09:07:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/31/coding-dojo-20-groovy-sales-tax-problem/</guid>
      <description>Continuing with the Groovy theme, this week we worked on the ThoughtWorks code review tax problem which involved modeling different items that a customer could buy and the associated tax rules that different types of goods had.
The Format
We had 3 people this week so most of the time we had all 3 of us involved in driving the code, which was projected onto the television screen again, while rotating every 10 minutes or so.</description>
    </item>
    
    <item>
      <title>Book Club: Hexagonal Architecture (Alistair Cockburn)</title>
      <link>https://www.markhneedham.com/blog/2009/07/30/book-club-hexagonal-architecture-alistair-cockburn/</link>
      <pubDate>Thu, 30 Jul 2009 00:59:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/30/book-club-hexagonal-architecture-alistair-cockburn/</guid>
      <description>In our latest book club we discussed Alistair Cockburn’s Hexagonal Architecture which I first heard about around a year ago and was another of Dave Cameron&amp;#39;s recommendations.
As I understand it, the article describes an architecture for our systems where the domain sits in the centre and other parts of the system depend on the domain while the domain doesn’t depend on anything concrete but is interacted with by various adapters.</description>
    </item>
    
    <item>
      <title>Reading Code: Rhino Mocks</title>
      <link>https://www.markhneedham.com/blog/2009/07/28/reading-code-rhino-mocks/</link>
      <pubDate>Tue, 28 Jul 2009 00:05:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/28/reading-code-rhino-mocks/</guid>
      <description>I spent a bit of time recently reading through some of the Rhino Mocks code to get a basic understanding of how some features work under the hood.
As well as just getting some practice at reading unfamiliar code I also wanted to know the following:
How does the &amp;#39;VerifyAllExpectations&amp;#39; extension method work?
What’s the difference between the &amp;#39;GenerateMock&amp;#39; and &amp;#39;GenerateStub&amp;#39; methods on MockRepository?
How does the &amp;#39;AssertWasNotCalled&amp;#39; extension method actually work?</description>
    </item>
    
    <item>
      <title>F#: Playing around with asynchronous workflows</title>
      <link>https://www.markhneedham.com/blog/2009/07/26/f-playing-around-with-asynchronous-workflows/</link>
      <pubDate>Sun, 26 Jul 2009 23:45:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/26/f-playing-around-with-asynchronous-workflows/</guid>
      <description>I spent a bit of time over the weekend playing around with F# asynchronous workflows and seeing how they could be used to launch Firefox windows asynchronously for my FeedBurner graph creator.
Initially I decided to try out the &amp;#39;Async.RunWithContinuations&amp;#39; function which I recently read about on Matthew Podwysocki’s blog.
Matthew describes this as being a function which is useful for executing a single operation asynchronously and this worked out quite well for me as my application only has the ability to get one feed and then create a graph from its data.</description>
    </item>
    
    <item>
      <title>F#: Values, functions and DateTime</title>
      <link>https://www.markhneedham.com/blog/2009/07/25/f-values-functions-and-datetime/</link>
      <pubDate>Sat, 25 Jul 2009 14:10:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/25/f-values-functions-and-datetime/</guid>
      <description>One of the things I’ve noticed recently in my playing around with F# is that when we decide to wrap calls to the .NET DateTime methods there is a need to be quite careful that we are wrapping those calls with an F# function and not an F# value.
If we don’t do this then the DateTime method will only be evaluated once and then return the same value for every call which is probably not the behaviour we’re looking for.</description>
    </item>
    
    <item>
      <title>Cruise Agents: Reducing &#39;random&#39; build failures</title>
      <link>https://www.markhneedham.com/blog/2009/07/25/cruise-agents-reducing-random-build-failures/</link>
      <pubDate>Sat, 25 Jul 2009 11:28:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/25/cruise-agents-reducing-random-build-failures/</guid>
      <description>As I mentioned previously we’re making use of multiple cruise agents in our build to allow us to run our acceptance tests in parallel, therefore allowing a build which would be nearly 2 hours if run in sequence to be completed in around 10 minutes.
Early on with this approach we were getting a lot of failures in our builds which weren’t directly related to the code being changed and were more to do with the various dependencies we were making use of.</description>
    </item>
    
    <item>
      <title>Wrapping collections: Inheritance vs Composition</title>
      <link>https://www.markhneedham.com/blog/2009/07/24/wrapping-collections-inheritance-vs-composition/</link>
      <pubDate>Fri, 24 Jul 2009 01:07:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/24/wrapping-collections-inheritance-vs-composition/</guid>
      <description>I wrote previously about the differences between wrapping collections and just creating extension methods to make our use of collections in the code base more descriptive but I’ve noticed in code I’ve been reading recently that there appear to be two ways of wrapping the collection - using composition as I described previously but also extending the collection by using inheritance.
I was discussing this with Lu Ning recently and he pointed out that if what we have is actually a collection then it might make more sense to extend the collection with a custom class whereas if the collection is just an implementation detail of some other domain concept then it would be better to use composition.</description>
    </item>
    
    <item>
      <title>Good Lazy and Bad Lazy</title>
      <link>https://www.markhneedham.com/blog/2009/07/21/good-lazy-and-bad-lazy/</link>
      <pubDate>Tue, 21 Jul 2009 23:10:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/21/good-lazy-and-bad-lazy/</guid>
      <description>One of the things I remember picking up from reading The Pragmatic Programmer is that developers need to be lazy in order to find better ways to solve problems and I came across a post by Philipp Lensson from a few years ago where he also suggests good developers are lazy and dumb.
Something which I’ve come to realise more recently is that it’s not necessarily true that being lazy as a developer is always a good thing - it depends in what way you are being lazy because there are certainly good and bad ways in which you can express your laziness!</description>
    </item>
    
    <item>
      <title>Coding: Quick feedback</title>
      <link>https://www.markhneedham.com/blog/2009/07/20/coding-quick-feedback/</link>
      <pubDate>Mon, 20 Jul 2009 21:10:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/20/coding-quick-feedback/</guid>
      <description>One of the most important things to achieve if we are to get any sort of productivity when writing code is to find ways to get the quickest feedback possible.
My general default stance with respect to this has always been to TDD code although I’ve found when coding in F# that I’m not actually sure what the overall best way to get quick feedback is.
This is partly because I haven’t been able to find a way to run tests easily from inside Visual Studio but also partly because even when you do this the code for the whole project needs to be recompiled before the tests can be run which takes time.</description>
    </item>
    
    <item>
      <title>F#: Active patterns for parsing xml</title>
      <link>https://www.markhneedham.com/blog/2009/07/19/f-active-patterns-for-parsing-xml/</link>
      <pubDate>Sun, 19 Jul 2009 12:12:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/19/f-active-patterns-for-parsing-xml/</guid>
      <description>I decided to spend some time doing some refactoring on the FeedBurner application that I started working on last week and the first area I worked on was cleaning up the way that the xml we get from FeedBurner is parsed.
While playing around with the application from the command line I realised that it didn’t actually cover error conditions - such as passing in an invalid feed name - very well and I thought this would be a good opportunity to make use of an active pattern to handle this.</description>
    </item>
    
    <item>
      <title>Book Club: The Dreyfus Model (Stuart and Hubert Dreyfus)</title>
      <link>https://www.markhneedham.com/blog/2009/07/18/book-club-the-dreyfus-model-stuart-and-hubert-dreyfus/</link>
      <pubDate>Sat, 18 Jul 2009 10:40:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/18/book-club-the-dreyfus-model-stuart-and-hubert-dreyfus/</guid>
      <description>In our latest book club we discussed the Dreyfus Model, a paper written in 1980 by Stuart and Hubert Dreyfus.
I’ve become quite intrigued by the Dreyfus Model particularly since reading about its applicability to software development in Andy Hunt’s Pragmatic Learning and Thinking and after looking through Pat Kua’s presentation on &amp;#39;Climbing the Dreyfus Ladder of Agile Practices&amp;#39; I thought it’d be interesting to study the original paper.</description>
    </item>
    
    <item>
      <title>F#: Passing command line arguments to a script</title>
      <link>https://www.markhneedham.com/blog/2009/07/16/f-passing-command-line-arguments-to-a-script/</link>
      <pubDate>Thu, 16 Jul 2009 07:40:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/16/f-passing-command-line-arguments-to-a-script/</guid>
      <description>I’ve been doing a bit of refactoring of my FeedBurner application so that I can call it from the command line with the appropriate arguments and one of the problems I came across is working out how to pass arguments from the command line into an F# script.
With a compiled application we are able to make use of the &amp;#39;EntryPointAttribute&amp;#39; to get access to the arguments passed in:</description>
    </item>
    
    <item>
      <title>Book Club: An agile approach to a legacy system (Chris Stevenson and Andy Pols)</title>
      <link>https://www.markhneedham.com/blog/2009/07/15/book-club-an-agile-approach-to-a-legacy-system-chris-stevenson-and-andy-pols/</link>
      <pubDate>Wed, 15 Jul 2009 00:53:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/15/book-club-an-agile-approach-to-a-legacy-system-chris-stevenson-and-andy-pols/</guid>
      <description>Our latest book club session was a discussion on a paper written by my colleague Chris Stevenson and Andy Pols titled &amp;#39;An Agile Approach to a Legacy System&amp;#39; which I think was written in 2004. This paper was suggested by Dave Cameron.
These are some of my thoughts and our discussion of the paper:
The first thing that was quite interesting was that the authors pointed out that if you just try and rewrite a part of a legacy system you are actually just writing legacy code yourself - we weren’t sure exactly what was meant by this since for me at least the definition of legacy code is &amp;#39;code which we are scared to change [because it has no tests]&amp;#39; but presumably the new code did have tests so it wasn’t legacy in this sense.</description>
    </item>
    
    <item>
      <title>Test Doubles: My current approach</title>
      <link>https://www.markhneedham.com/blog/2009/07/14/test-doubles-my-current-approach/</link>
      <pubDate>Tue, 14 Jul 2009 13:23:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/14/test-doubles-my-current-approach/</guid>
      <description>My colleague Sarah Taraporewalla recently wrote about her thoughts on test doubles (to use Gerard Meszaros&amp;#39; language) and it got me thinking about the approach I generally take in this area.
Stub objects
I use stubs mostly to control the output of depended-on components of the system under test where we don’t want to verify those outputs.
Most of the time I make use of the mocking library’s ability to stub out method calls on these dependencies.</description>
    </item>
    
    <item>
      <title>F#: A day writing a Feedburner graph creator</title>
      <link>https://www.markhneedham.com/blog/2009/07/12/f-a-day-writing-a-feedburner-graph-creator/</link>
      <pubDate>Sun, 12 Jul 2009 17:14:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/12/f-a-day-writing-a-feedburner-graph-creator/</guid>
      <description>I’ve spent a bit of the day writing a little application to take the xml from my Feedburner RSS feed and create a graph showing the daily &amp;amp; weekly average subscribers.
What did I learn?
I decided that I wanted to parameterise the feedburner url so that I would be able to run the code for different time periods and against different feeds. In C# we’d probably make use of &amp;#39;string.</description>
    </item>
    
    <item>
      <title>F#: Wrapping .NET library calls</title>
      <link>https://www.markhneedham.com/blog/2009/07/12/f-wrapping-net-library-calls/</link>
      <pubDate>Sun, 12 Jul 2009 12:11:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/12/f-wrapping-net-library-calls/</guid>
      <description>I’ve been spending a bit of time writing some code to parse the xml of my Feedburner RSS feed and create a graph to show both the daily and weekly average subscribers which you can’t currently get from the Feedburner dashboard.
One thing which I found while doing this is that calls to the .NET base class library don’t seem to fit in that well with the way that you would typically compose functions together in F#.</description>
    </item>
    
    <item>
      <title>Continuous Integration: Community College Discussion</title>
      <link>https://www.markhneedham.com/blog/2009/07/11/continuous-integration-community-college-discussion/</link>
      <pubDate>Sat, 11 Jul 2009 14:13:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/11/continuous-integration-community-college-discussion/</guid>
      <description>We ran a session on Continuous Integration at the most recent Community College in the ThoughtWorks Sydney office.
It was roughly based around a CI Maturity Model which I recently came across although the intention was to find out what other teams were doing CI wise.
I became a bit more aware of how little I know about CI after listening to a Software Engineering Radio interview with my colleague Chris Read so I was keen to see how other teams are approaching this problem.</description>
    </item>
    
    <item>
      <title>F#: Downloading a file from behind a proxy</title>
      <link>https://www.markhneedham.com/blog/2009/07/11/f-downloading-a-file-from-behind-a-proxy/</link>
      <pubDate>Sat, 11 Jul 2009 03:20:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/11/f-downloading-a-file-from-behind-a-proxy/</guid>
      <description>I’ve been continuing working on a little script to parse Cruise build data and the latest task was to work out how to download my Google Graph API created image onto the local disk.
I’m using the WebClient class to do this and the code looks like this:
let DownloadGraph (fileLocation:string) (uri:System.Uri) =
    async {
        let webClient = new WebClient()
        webClient.DownloadFileAsync(uri, fileLocation) }
Sadly this doesn’t work when I run it from the client site where I have access to the build metrics as there is a corporate proxy sitting in the way.</description>
    </item>
    
    <item>
      <title>F#: Convert sequence to comma separated string</title>
      <link>https://www.markhneedham.com/blog/2009/07/09/f-convert-sequence-to-comma-separated-string/</link>
      <pubDate>Thu, 09 Jul 2009 22:32:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/09/f-convert-sequence-to-comma-separated-string/</guid>
      <description>I’ve been continuing playing around with parsing Cruise data as I mentioned yesterday with the goal today being to create a graph from the build data.
After recommendations from Dean Cornish and Sam Newman on Twitter I decided to give the Google Graph API a try to do this and realised that I would need to create a comma separated string listing all the build times to pass to the Google API.</description>
    </item>
    
    <item>
      <title>F#: Parsing Cruise build data</title>
      <link>https://www.markhneedham.com/blog/2009/07/08/f-parsing-cruise-build-data/</link>
      <pubDate>Wed, 08 Jul 2009 22:46:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/08/f-parsing-cruise-build-data/</guid>
      <description>I’ve been playing around a bit with the properties REST API that Cruise exposes to try and get together some build metrics and I decided it might be an interesting task to try and use F# for.
I’m making use of the &amp;#39;search&amp;#39; part of the API to return the metrics of all the builds run on a certain part of the pipeline and I then want to parse those results so that I can extract just the name of the agent that ran that build and the duration of that build.</description>
    </item>
    
    <item>
      <title>Book Club: Why noone uses functional languages (Philip Wadler)</title>
      <link>https://www.markhneedham.com/blog/2009/07/08/book-club-why-noone-uses-functional-languages-philip-wadler/</link>
      <pubDate>Wed, 08 Jul 2009 00:29:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/08/book-club-why-noone-uses-functional-languages-philip-wadler/</guid>
      <description>Our latest technical book club discussion was based around Philip Wadler’s paper &amp;#39;Why no one uses functional languages&amp;#39;, which he wrote in 1998. I came across this paper when reading some of the F# goals in the FAQs on the Microsoft website.
These are some of my thoughts and our discussion of the paper:
One of the points suggested in the paper is that functional languages aren’t used because of their lack of availability on machines, but as Dave pointed out this doesn’t really seem to be such a big problem these days - certainly for F# I’ve found it relatively painless to get it set up and running, and even for a language like Ruby people are happy to download and install it on their machines, which is also pretty much painless to do.</description>
    </item>
    
    <item>
      <title>C#: Removing duplication in mapping code with partial classes</title>
      <link>https://www.markhneedham.com/blog/2009/07/07/c-removing-duplication-in-mapping-code-with-partial-classes/</link>
      <pubDate>Tue, 07 Jul 2009 18:11:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/07/c-removing-duplication-in-mapping-code-with-partial-classes/</guid>
      <description>One of the problems that we’ve come across while writing the mapping code for our anti corruption layer is that there is quite a lot of duplication of mapping similar types due to the fact that each service has different auto generated classes representing the same data structure.
We are making SOAP web service calls and generating classes to represent the requests and responses to those end points using SvcUtil.</description>
    </item>
    
    <item>
      <title>Domain Driven Design: Anti Corruption Layer</title>
      <link>https://www.markhneedham.com/blog/2009/07/07/domain-driven-design-anti-corruption-layer/</link>
      <pubDate>Tue, 07 Jul 2009 09:05:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/07/domain-driven-design-anti-corruption-layer/</guid>
      <description>I previously wrote about some of the Domain Driven Design patterns we have noticed on my project and I think the pattern which ties all these together is the anti corruption layer.
The reason why you might use an anti corruption layer is to create a little padding between subsystems so that they do not leak into each other too much.
Remember, an ANTICORRUPTION LAYER is a means of linking two BOUNDED CONTEXTS.</description>
    </item>
    
    <item>
      <title>Brownfield Application Development in .NET: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/07/06/brownfield-application-development-in-net-book-review/</link>
      <pubDate>Mon, 06 Jul 2009 00:43:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/06/brownfield-application-development-in-net-book-review/</guid>
      <description>The Book Brownfield Application Development in .NET by Kyle Baley and Donald Belcham
The Review I asked Manning to send me this book to review as I was quite intrigued to see how well it would complement Michael Feathers’ Working Effectively with Legacy Code, the other book I’m aware of which covers approaches to dealing with non-greenfield applications.
What did I learn? The authors provide a brief description of the two different approaches to unit testing - state based and behaviour based - I’m currently in favour of the latter approach and Martin Fowler has a well known article which covers pretty much anything you’d want to know about this topic area.</description>
    </item>
    
    <item>
      <title>Domain Driven Design: Conformist</title>
      <link>https://www.markhneedham.com/blog/2009/07/04/domain-driven-design-conformist/</link>
      <pubDate>Sat, 04 Jul 2009 10:17:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/04/domain-driven-design-conformist/</guid>
      <description>Something which constantly surprises me about Domain Driven Design is how there is a pattern described in the book for just about every possible situation you find yourself in when coding on projects.
A lot of these patterns appear in the &amp;#39;Strategic Design&amp;#39; section of the book and one which is very relevant for the project I’m currently working on is the &amp;#39;Conformist&amp;#39; pattern which is described like so:</description>
    </item>
    
    <item>
      <title>Coding Dojo #19: Groovy Traveling salesman variation </title>
      <link>https://www.markhneedham.com/blog/2009/07/04/coding-dojo-19-groovy-traveling-salesman-variation/</link>
      <pubDate>Sat, 04 Jul 2009 09:36:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/04/coding-dojo-19-groovy-traveling-salesman-variation/</guid>
      <description>Our latest coding dojo involved working on a variation of the traveling salesman problem in Groovy again.
The Format We had 8 people participating this week so we returned to the Randori format, rotating the pair at the keyboard every 7 minutes.
Given the number of people it might actually have been better to have a couple of machines and use the UberDojo format.
What We Learnt The importance of just getting started stood out a lot for me in this dojo - there have been quite a few times when we’ve met intending to do some coding and spent so long talking about coding that we didn’t end up writing anything.</description>
    </item>
    
    <item>
      <title>F#: Pattern matching with the &#39;:?&#39; operator</title>
      <link>https://www.markhneedham.com/blog/2009/07/02/f-pattern-matching-with-the-operator/</link>
      <pubDate>Thu, 02 Jul 2009 23:10:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/02/f-pattern-matching-with-the-operator/</guid>
      <description>I’ve been doing a bit more reading of the Fake source code and one interesting thing which I came across which I hadn’t seen was an active pattern which was making use of the &amp;#39;:?&amp;#39; operator to match the input type against .NET types.
let (|File|Directory|) (fileSysInfo : FileSystemInfo) =
    match fileSysInfo with
    | :? FileInfo as file -&amp;gt; File (file.Name)
    | :? DirectoryInfo as dir -&amp;gt; Directory (dir.Name, seq { for x in dir.</description>
    </item>
    
    <item>
      <title>Book Club: Logging - Release It (Michael Nygard)</title>
      <link>https://www.markhneedham.com/blog/2009/07/02/book-club-logging-release-it-michael-nygaard/</link>
      <pubDate>Thu, 02 Jul 2009 12:04:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/07/02/book-club-logging-release-it-michael-nygaard/</guid>
      <description>Our latest technical book club session was a discussion of the logging section in Michael Nygard’s Release It.
I recently listened to an interview with Michael Nygard on Software Engineering Radio so I was interested in reading more of his stuff and Cam suggested that the logging chapter would be an interesting one to look at as it’s often something which we don’t spend a lot of time thinking about on software development teams.</description>
    </item>
    
    <item>
      <title>F#: What I&#39;ve learnt so far</title>
      <link>https://www.markhneedham.com/blog/2009/06/30/f-what-ive-learnt-so-far/</link>
      <pubDate>Tue, 30 Jun 2009 23:09:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/30/f-what-ive-learnt-so-far/</guid>
      <description>I did a presentation of some of the stuff that I’ve learnt from playing around with F# over the last six months or so at the most recent Alt.NET Sydney meeting.
I’ve included the slides below but there was also some interesting discussion as well.
One of the questions asked was around how you would deal with code on a real project with regards to structuring it and ensuring that it was maintainable.</description>
    </item>
    
    <item>
      <title>F#: Setting properties like named parameters</title>
      <link>https://www.markhneedham.com/blog/2009/06/29/f-setting-properties-like-named-parameters/</link>
      <pubDate>Mon, 29 Jun 2009 00:28:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/29/f-setting-properties-like-named-parameters/</guid>
      <description>One of the most frustrating things for me lately about interacting with C# libraries from F# has been setting up objects through the use of properties.
While I am against the use of setters to construct objects in the first place, that’s the way that a lot of libraries work so it’s a bit of a necessary evil!
In C# we would typically make use of the object initializer syntax to do this, but in F# I’ve been writing code like this to do the same thing:</description>
    </item>
    
    <item>
      <title>F#: More thoughts on the forward &amp; application operators</title>
      <link>https://www.markhneedham.com/blog/2009/06/27/f-more-thoughts-on-the-forward-application-operators/</link>
      <pubDate>Sat, 27 Jun 2009 22:55:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/27/f-more-thoughts-on-the-forward-application-operators/</guid>
      <description>I’ve been spending a bit of time reading through the Fake source code to try and understand how it works and one of the things which I quite like about it is the way the authors have made use of different F# operators to make expressions easier to read by reducing the number of brackets that need to be written and reordering the functions/values depending on the particular context.</description>
    </item>
    
    <item>
      <title>Coding Dojo #18: Groovy Bowling Game</title>
      <link>https://www.markhneedham.com/blog/2009/06/26/coding-dojo-18-groovy-bowling-game/</link>
      <pubDate>Fri, 26 Jun 2009 18:15:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/26/coding-dojo-18-groovy-bowling-game/</guid>
      <description>This week’s dojo involved coding a familiar problem - the bowling game - in a different language, Groovy.
The code we wrote is available on bitbucket.
The Format Cam, Dean and I took turns pairing with each other with the code projected onto a TV. As there were only a few of us, the discussion on where we were taking the code tended to include everyone rather than just the two at the keyboard.</description>
    </item>
    
    <item>
      <title>Safe refactoring: Removing object initializer, introducing builder</title>
      <link>https://www.markhneedham.com/blog/2009/06/26/safe-refactoring-removing-object-initializer-introducing-builder/</link>
      <pubDate>Fri, 26 Jun 2009 00:02:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/26/safe-refactoring-removing-object-initializer-introducing-builder/</guid>
      <description>I previously wrote about an approach we took to safely remove some duplication and I recently followed a similar mantra to replace an object initializer call which had around 40 properties being setup with a builder to try and make the code a bit easier to understand.
We did have tests checking the values being set up by the object initializer, so I was already able to refactor with some degree of safety - it would probably have been possible to just create the builder, build the object from that, and then delete the old code and replace it with the new, but I’ve caused myself too many problems doing that before, so I decided to try a more incremental approach.</description>
    </item>
    
    <item>
      <title>QTB: Agile Adoption - How to stuff it up</title>
      <link>https://www.markhneedham.com/blog/2009/06/24/qtb-agile-adoption-how-to-stuff-it-up/</link>
      <pubDate>Wed, 24 Jun 2009 23:58:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/24/qtb-agile-adoption-how-to-stuff-it-up/</guid>
      <description>I attended the most recent ThoughtWorks Quarterly Technology briefing on Tuesday which was titled &amp;#39;Agile Adoption - How to stuff it up&amp;#39; and presented by my colleagues Andy Marks and Martin Fowler.
There seem to be quite a few books out at the moment about how to introduce a more agile approach into your organisation - I’ve been reading Lean-Agile Software Development and Becoming Agile and there is also a book called Scaling Lean and Agile Development - so I was intrigued to see whether the messages from this talk would be similar to those in these books.</description>
    </item>
    
    <item>
      <title>Using Fiddler with IIS</title>
      <link>https://www.markhneedham.com/blog/2009/06/24/using-fiddler-with-iis/</link>
      <pubDate>Wed, 24 Jun 2009 17:46:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/24/using-fiddler-with-iis/</guid>
      <description>We’ve been using Fiddler to debug the requests and responses sent via web services to a service layer our application interacts with and it works pretty well when you run the application using Cassini but by default won’t work when you run the website through IIS.
The key to this as one of my colleagues (who gives credit to Erik) showed me today is to ensure that IIS is running under the same user that Fiddler is running under which in our case is the &amp;#39;Administrator&amp;#39; account.</description>
    </item>
    
    <item>
      <title>Visual Studio/Resharper: Changing the order of arguments</title>
      <link>https://www.markhneedham.com/blog/2009/06/23/visual-studioresharper-changing-the-order-of-arguments/</link>
      <pubDate>Tue, 23 Jun 2009 19:31:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/23/visual-studioresharper-changing-the-order-of-arguments/</guid>
      <description>We’ve recently run into some places in our tests where the expectation and actual values passed into NUnit&amp;#39;s &amp;#39;Assert.AreEqual&amp;#39; are the wrong way round, therefore meaning that the error messages we get when tests fail are somewhat confusing!
Assert.AreEqual(theActualValue, &amp;#34;the expectation&amp;#34;); We can change the arguments around using Resharper by using the key combination &amp;#39;Ctrl-Alt-Shift-ArrowKey&amp;#39; but you can only do this one line at a time which was a bit annoying as there were about 20 to change.</description>
    </item>
    
    <item>
      <title>F#: Continuation Passing Style</title>
      <link>https://www.markhneedham.com/blog/2009/06/22/f-continuation-passing-style/</link>
      <pubDate>Mon, 22 Jun 2009 23:39:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/22/f-continuation-passing-style/</guid>
      <description>I recently came across the idea of continuations while reading Real World Functional Programming and Wes Dyer has a blog post where he explains continuations in more detail and also talks about the idea of using a continuation passing style in languages which don’t support Call/CC (Call with Current continuation).
As I understand it we can achieve a continuation passing style of programming by passing in the bit of code that we want executed next (i.</description>
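The style itself carries over to any language with first-class functions; a tiny Python sketch (mine, not the book's) of passing in the code to run next:

```python
def add_cps(x, y, cont):
    # Rather than returning x + y, hand the result to the continuation.
    return cont(x + y)

def square_cps(n, cont):
    return cont(n * n)

# Compute (2 + 3) squared by threading each result through
# the continuation that should run next.
result = add_cps(2, 3, lambda s: square_cps(s, lambda sq: sq))
print(result)  # → 25
```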
    </item>
    
    <item>
      <title>Seams: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2009/06/21/seams-some-thoughts/</link>
      <pubDate>Sun, 21 Jun 2009 17:21:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/21/seams-some-thoughts/</guid>
      <description>I pick up Michael Feathers&amp;#39; Working Effectively with Legacy Code book from time to time and one of my favourite parts of the book is the chapter where he talks about &amp;#39;Seams&amp;#39;.
To quote the book:
A seam is a place where you can alter behaviour in your program without editing in that place
Seams in the book are generally discussed in terms of how we can get tests around legacy code which was written without easy testability in mind but I’ve noticed that the ideas behind seams seem to be more widely applicable than this.</description>
    </item>
    
    <item>
      <title>Book Club: The Readability of Tests - Growing Object Oriented Software (Steve Freeman/Nat Pryce)</title>
      <link>https://www.markhneedham.com/blog/2009/06/20/book-club-the-readability-of-tests-growing-object-oriented-software-steve-freemannat-pryce/</link>
      <pubDate>Sat, 20 Jun 2009 11:26:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/20/book-club-the-readability-of-tests-growing-object-oriented-software-steve-freemannat-pryce/</guid>
      <description>Our technical book club this week focused on &amp;#39;The Readability of Tests&amp;#39; chapter from Steve Freeman &amp;amp; Nat Pryce’s upcoming book &amp;#39;Growing Object Oriented Software, Guided by Tests&amp;#39;.
I’ve been reading through some of the other chapters online and I thought this would be an interesting chapter to talk about as people seem to have different opinions on how DRY tests should be, how we build test data, how we name tests and so on.</description>
    </item>
    
    <item>
      <title>Functional Collection Parameters: A different way of thinking about collections</title>
      <link>https://www.markhneedham.com/blog/2009/06/18/functional-collection-parameters-a-different-way-of-thinking-about-collections/</link>
      <pubDate>Thu, 18 Jun 2009 18:31:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/18/functional-collection-parameters-a-different-way-of-thinking-about-collections/</guid>
      <description>One of the changes that I’ve noticed in my coding now compared to around 7 or 8 months ago is that whenever there’s some operations to be performed on a collection I am far more inclined to think of how to do those operations using a functional approach.
I’ve written previously about the ways I’ve been making use of functional collection parameters in my code but what I hadn’t really considered was that the way of thinking about the problem we want to solve is slightly different.</description>
    </item>
    
    <item>
      <title>Book Club: Arguments and Results (James Noble)</title>
      <link>https://www.markhneedham.com/blog/2009/06/16/book-club-arguments-and-results-james-noble/</link>
      <pubDate>Tue, 16 Jun 2009 23:37:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/16/book-club-arguments-and-results-james-noble/</guid>
      <description>We restarted our book club again last week by reading James Noble’s Arguments and Results paper, a paper I came across from a Michael Feathers blog post a few months ago detailing 10 papers that every programmer should read.
We decided to try out the idea of reading papers/individual chapters from books as it allows us to vary the type of stuff we’re reading more frequently and is an approach which Obie seems to be having some success with.</description>
    </item>
    
    <item>
      <title>Functional Collection Parameters: Handling the null collection</title>
      <link>https://www.markhneedham.com/blog/2009/06/16/functional-collection-parameters-handling-the-null-collection/</link>
      <pubDate>Tue, 16 Jun 2009 20:29:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/16/functional-collection-parameters-handling-the-null-collection/</guid>
      <description>One of the interesting cases where I’ve noticed we tend to avoid functional collection parameters in our code base is when there’s the possibility of the collection being null.
The code is on the boundary of our application’s interaction with another service so it is actually a valid scenario that we could receive a null collection.
When using extension methods, although we wouldn’t get a null pointer exception by calling one on a null collection, we would get a &amp;#39;source is null&amp;#39; exception when the expression is evaluated, so we need to protect ourselves against this.</description>
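The guard being described, treating a null collection as empty before applying a sequence operation, can be sketched like so (in Python, with None standing in for .NET's null; safe_select is a hypothetical helper name, not the post's code):

```python
def safe_select(items, selector):
    # Treat a missing (None) collection as empty rather than blowing up
    # when the sequence is eventually evaluated.
    return [selector(x) for x in (items or [])]

print(safe_select(None, str))    # → []
print(safe_select([1, 2], str))  # → ['1', '2']
```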
    </item>
    
    <item>
      <title>C#/F#: Using .NET framework classes</title>
      <link>https://www.markhneedham.com/blog/2009/06/16/cf-using-net-framework-classes/</link>
      <pubDate>Tue, 16 Jun 2009 18:55:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/16/cf-using-net-framework-classes/</guid>
      <description>I was recently discussing F# with a couple of colleagues and one thing that came up is the slightly different ways that we might choose to interact with certain .NET framework classes compared to how we use those same classes in C# code.
One of those where I see potential for different use is the Dictionary class.
In C# code when we’re querying a dictionary to check that a value exists before we try to extract it we might typically do this:</description>
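The check-then-extract shape the excerpt describes, a containment test followed by an indexer lookup, and the combined lookup-with-default that can replace it, look like this in Python (illustrative data only):

```python
rankings = {"federer": 1, "nadal": 2}

# Check-then-extract: the shape described in the post for C#.
if "murray" in rankings:
    rank = rankings["murray"]
else:
    rank = 0

# The combined lookup-with-default, closer in spirit to
# .NET's Dictionary.TryGetValue.
rank_default = rankings.get("murray", 0)

print(rank, rank_default)  # → 0 0
```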
    </item>
    
    <item>
      <title>F#: Using C# extension methods</title>
      <link>https://www.markhneedham.com/blog/2009/06/15/f-using-c-extension-methods/</link>
      <pubDate>Mon, 15 Jun 2009 20:03:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/15/f-using-c-extension-methods/</guid>
      <description>An interesting thing I noticed about referencing C# libraries from F# is that you can’t access C# extension methods on generic open types in the same way that you would be able to if you were using the library from C# code.
I came across this problem when playing around with the Rhino Mocks framework in some F# code.
I wrote a simple test to see whether I could get an expectation to work correctly, without paying any regard for the fact that you can’t use all C# extension methods in the same way as you can from C# code!</description>
    </item>
    
    <item>
      <title>F#: Overlapping fields in record types</title>
      <link>https://www.markhneedham.com/blog/2009/06/14/f-overlapping-fields-in-record-types/</link>
      <pubDate>Sun, 14 Jun 2009 00:37:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/14/f-overlapping-fields-in-record-types/</guid>
      <description>A problem which has confused me for a while is how to create instances of record types whose fields overlap with another record defined further down in an F# file.
The most recently defined record seems to take precedence even if it has more fields than a record defined earlier and you don’t specify all of those fields in your record creation attempt.
For example, if I have the following two record types:</description>
    </item>
    
    <item>
      <title>Coding: Single Level of Abstraction Principle</title>
      <link>https://www.markhneedham.com/blog/2009/06/12/coding-single-level-of-abstraction-principle/</link>
      <pubDate>Fri, 12 Jun 2009 17:35:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/12/coding-single-level-of-abstraction-principle/</guid>
      <description>One of the other useful principles for writing readable code that I’ve come across in the last year or so is the Single Level of Abstraction Principle.
I first came across the idea of writing code at the same level of abstraction in Uncle Bob’s Clean Code although I only learnt about the actual term in Neal Ford’s The Productive Programmer.
As the name suggests the idea is that within a certain method we look to keep all the code at the same level of abstraction to help us read it more easily.</description>
    </item>
    
    <item>
      <title>Coding Dojo #17: Refactoring Cruise Control .NET</title>
      <link>https://www.markhneedham.com/blog/2009/06/12/coding-dojo-17-refactoring-cruise-control-net/</link>
      <pubDate>Fri, 12 Jun 2009 17:07:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/12/coding-dojo-17-refactoring-cruise-control-net/</guid>
      <description>After a couple of weeks of more experimental coding dojos this week we decided to get back to some pure coding with the session being focused around doing some refactoring of the continuous integration server Cruise Control .NET.
The overall intention of the refactoring we worked on is to try and introduce the concept of a &amp;#39;ChangeSet&amp;#39; into the code base to represent the revisions that come in from source control systems that CC.</description>
    </item>
    
    <item>
      <title>Coding: Keep method/variable names positive</title>
      <link>https://www.markhneedham.com/blog/2009/06/11/coding-keep-methodvariable-names-positive/</link>
      <pubDate>Thu, 11 Jun 2009 07:44:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/11/coding-keep-methodvariable-names-positive/</guid>
      <description>Something which I’ve come across a few times recently in code is method names which describe the negative aspect of something. For me at least these are very difficult to understand, since I need to keep remembering that we are dealing with the negative rather than the positive, which I think is significantly easier to reason about.
A recent example of this which I came across was in some acceptance test code which among other things was asserting whether or not the policy number that had been created was in a valid format and returning the result of that assertion back to our Fitnesse fixture.</description>
    </item>
    
    <item>
      <title>F#: Useful for scripting</title>
      <link>https://www.markhneedham.com/blog/2009/06/09/f-useful-for-scripting/</link>
      <pubDate>Tue, 09 Jun 2009 23:29:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/09/f-useful-for-scripting/</guid>
      <description>We had the need to do a bit of scripting recently to change the names of the folders where we store our artifacts to signify which artifacts were created from our build’s production branch and which were generated from the main branch.
The problem we had was that we were ending up overwriting old artifacts from the main branch with the production branch’s artifacts so we wanted to fix this.</description>
    </item>
    
    <item>
      <title>Pair Programming: So you don&#39;t want to do it...</title>
      <link>https://www.markhneedham.com/blog/2009/06/08/pair-programming-so-you-dont-want-to-do-it/</link>
      <pubDate>Mon, 08 Jun 2009 17:05:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/08/pair-programming-so-you-dont-want-to-do-it/</guid>
      <description>I’ve worked on several software development teams over the last few years - some that pair programmed all the time and some that didn’t - and one of the key things that I’ve noticed is that the level of collaboration on these teams was significantly higher when pair programming was being done on a regular basis.
The following are some of the observations I have noticed in teams which don’t pair program frequently.</description>
    </item>
    
    <item>
      <title>Javascript: Using &#39;replace&#39; to make a link clickable</title>
      <link>https://www.markhneedham.com/blog/2009/06/08/javascript-using-replace-to-make-a-link-clickable/</link>
      <pubDate>Mon, 08 Jun 2009 11:57:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/08/javascript-using-replace-to-make-a-link-clickable/</guid>
      <description>I’ve been doing a bit more work on my twitter application over the weekend - this time taking the tweets that I’ve stored in CouchDB and displaying them on a web page.
One of the problems I had is that the text of the tweets is just plain text, so if there is a link in a tweet then when I display it on a web page it isn’t clickable since it isn’t enclosed by the &amp;#39;&amp;lt;a href=&amp;#34;…&amp;#34;&amp;gt;&amp;lt;/a&amp;gt;&amp;#39; tag.</description>
    </item>
    
    <item>
      <title>F#: Explicit interface implementation</title>
      <link>https://www.markhneedham.com/blog/2009/06/07/f-explicit-interface-implementation/</link>
      <pubDate>Sun, 07 Jun 2009 08:19:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/07/f-explicit-interface-implementation/</guid>
      <description>I’ve been writing some code to map between CouchDB documents and F# objects and something which I re-learned while doing this is the way that interfaces work in F#.
In F#, when you have a class which implements an interface, that class makes use of explicit interface implementation.
This means that in order to access any members of the interface that the class implements you need to specifically refer to the interface by upcasting the value using the &amp;#39;:&amp;gt;&amp;#39; operator.</description>
    </item>
    
    <item>
      <title>Coding: Why do we extract method?</title>
      <link>https://www.markhneedham.com/blog/2009/06/04/coding-why-do-we-extract-method/</link>
      <pubDate>Thu, 04 Jun 2009 20:30:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/04/coding-why-do-we-extract-method/</guid>
      <description>Ever since I’ve read Uncle Bob’s Clean Code book my approach to coding has been all about the &amp;#39;extract method&amp;#39; refactoring - I pretty much look to extract method as much as I can until I get to the point where extracting another method would result in me just describing the language semantics in the method name.
One of the approaches that I’ve come across with regards to doing this refactoring is that it’s only used when there is duplication of code and we want to reduce that duplication so that it’s all in one place and then call that method from two places.</description>
    </item>
    
    <item>
      <title>Coding: Putting code where people can find it</title>
      <link>https://www.markhneedham.com/blog/2009/06/02/coding-putting-code-where-people-can-find-it/</link>
      <pubDate>Tue, 02 Jun 2009 23:35:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/02/coding-putting-code-where-people-can-find-it/</guid>
      <description>I’ve previously written about the builder pattern which I think is a very useful pattern for helping to setup data.
It allows us to setup custom data when we care about a specific piece of data in a test or just use default values if we’re not bothered about a piece of data but need it to be present for our test to execute successfully.
One problem that I noticed was that despite the fact we had builders for quite a number of the classes we were using in our tests, when new tests were being added test data was still being setup by directly using the classes instead of making use of the builders which had already done the hard work for you.</description>
    </item>
    
    <item>
      <title>F#: Tuples don&#39;t seem to express intent well</title>
      <link>https://www.markhneedham.com/blog/2009/06/02/f-tuples-dont-seem-to-express-intent-well/</link>
      <pubDate>Tue, 02 Jun 2009 22:01:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/02/f-tuples-dont-seem-to-express-intent-well/</guid>
      <description>Tuples are one of the data types that I learnt about at university but never actually got to use for anything until I started playing around with F# which has this type in the language.
A tuple describes an ordered group of values and in that sense is similar to a C# anonymous type except an anonymous type’s values are named whereas a tuple’s are not.
In F# we can create one by separating a sequence of values with a comma in a value assignment:</description>
    </item>
    
    <item>
      <title>VMware: Accessing host server</title>
      <link>https://www.markhneedham.com/blog/2009/06/02/vmware-accessing-host-server/</link>
      <pubDate>Tue, 02 Jun 2009 21:36:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/06/02/vmware-accessing-host-server/</guid>
      <description>I’ve been doing all my spare time .NET development from within VMware for about the last year or so, and now and then it’s quite useful to be able to access the host machine, either to get some files from there or to access a server that’s running on the host.
The former problem is solved by going to &amp;#39;Virtual Machines -&amp;gt; Shared Folders&amp;#39; and clicking on the + button on the bottom left of the menu to add a folder that you want to share.</description>
    </item>
    
    <item>
      <title>CouchDB/Futon: &#39;_all_dbs&#39; call returns databases with leading &#39;c/&#39;</title>
      <link>https://www.markhneedham.com/blog/2009/05/31/couchdbfuton-_all_dbs-call-returns-databases-with-leading-c/</link>
      <pubDate>Sun, 31 May 2009 23:28:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/31/couchdbfuton-_all_dbs-call-returns-databases-with-leading-c/</guid>
      <description>As I mentioned in my previous post I’ve been playing around with CouchDB and one of the problems that I’ve been having is that although I can access my database through the REST API perfectly fine, whenever I went to the Futon page (&amp;#39;http://localhost:5984/_utils/&amp;#39; in my case) to view my list of databases I was getting the following JavaScript error:
&amp;#39;Database information could not be retrieved: missing&amp;#39; I thought I’d have a quick look with FireBug to see if I could work out what was going on and saw several requests being made to the following urls and resulting in 404s:</description>
    </item>
    
    <item>
      <title>SharpCouch: Use anonymous type to create JSON objects</title>
      <link>https://www.markhneedham.com/blog/2009/05/31/sharpcouch-use-anonymous-type-to-create-json-objects/</link>
      <pubDate>Sun, 31 May 2009 20:59:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/31/sharpcouch-use-anonymous-type-to-create-json-objects/</guid>
      <description>I’ve been playing around with CouchDB a bit today and in particular making use of SharpCouch, a library which acts as a wrapper around CouchDB calls. It is included in the CouchBrowse library which is recommended as a good starting point for interacting with CouchDB from C# code.
I decided to work out how the API worked by writing an integration test to save a document to the database.</description>
    </item>
    
    <item>
      <title>F#: Testing asynchronous calls to MailBoxProcessor</title>
      <link>https://www.markhneedham.com/blog/2009/05/30/f-testing-asynchronous-calls-to-mailboxprocessor/</link>
      <pubDate>Sat, 30 May 2009 20:38:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/30/f-testing-asynchronous-calls-to-mailboxprocessor/</guid>
      <description>Continuing with my attempts to test some of the code in my twitter application I’ve been trying to work out how to test the Erlang style messaging which I set up to process tweets when I had captured them using the TweetSharp API.
The problem I had is that this processing is done asynchronously, so we can’t test it in our normal sequential way.
Chatting with Dave about this he suggested that what I really needed was a latch which could be triggered when the asynchronous behaviour had completed, thus informing the test that it could proceed.</description>
    </item>
    
    <item>
      <title>xUnit.NET: Running tests written in Visual Studio 2010</title>
      <link>https://www.markhneedham.com/blog/2009/05/30/xunitnet-running-tests-written-in-visual-studio-2010/</link>
      <pubDate>Sat, 30 May 2009 11:51:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/30/xunitnet-running-tests-written-in-visual-studio-2010/</guid>
      <description>I’ve been playing around with F# in Visual Studio 2010 after the Beta 1 release last Wednesday and in particular I’ve been writing some xUnit.NET tests around the twitter application I’ve been working on.
A problem I ran into when attempting to run my tests against &amp;#39;xunit.console.exe&amp;#39; is that xUnit.NET is linked to run against version 2.0 of the CLR and right now you can’t actually change the &amp;#39;targetframework&amp;#39; for a project compiled in Visual Studio 2010.</description>
    </item>
    
    <item>
      <title>Coding Dojo #16: Reading SUnit code</title>
      <link>https://www.markhneedham.com/blog/2009/05/29/coding-dojo-16-reading-sunit-code/</link>
      <pubDate>Fri, 29 May 2009 09:23:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/29/coding-dojo-16-reading-sunit-code/</guid>
      <description>Continuing on from last week’s look at Smalltalk, in our latest coding dojo we spent some time investigating the SUnit testing framework, how we would use it to write some tests, and how it actually works.
The Format We had 3 people for the dojo this week and the majority of it was spent looking at the code on a big screen and trying to understand between us what was going on.</description>
    </item>
    
    <item>
      <title>The 5 dysfunctions of teams in code</title>
      <link>https://www.markhneedham.com/blog/2009/05/28/the-5-dysfunctions-of-teams-in-code/</link>
      <pubDate>Thu, 28 May 2009 05:44:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/28/the-5-dysfunctions-of-teams-in-code/</guid>
      <description>I recently came across an interesting post by my colleague Pat Kua where he talks about how some patterns he’s noticed in code can be linked to Conway’s law which suggests that the structure of systems designed in organisations will mirror the communication structure of that organisation.
I recently read a book called &amp;#39;The Five Dysfunctions of a Team&amp;#39; which describes some behaviours in teams which aren’t working in an effective way.</description>
    </item>
    
    <item>
      <title>Pair Programming: Refactoring</title>
      <link>https://www.markhneedham.com/blog/2009/05/26/pair-programming-refactoring/</link>
      <pubDate>Tue, 26 May 2009 23:44:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/26/pair-programming-refactoring/</guid>
      <description>One area of development where I have sometimes wondered about the value that we can get from pair programming is when we’re spending time doing some major refactoring of our code base.
The reason I felt that pairing on big refactoring tasks might be more difficult than working on a story together is that a story tends to have a more defined goal, and the business has defined that goal, whereas with a refactoring task the goal is often less clear and people have much wider-ranging opinions about the approach that should be taken.</description>
    </item>
    
    <item>
      <title>Refactoring: Removing duplication more safely</title>
      <link>https://www.markhneedham.com/blog/2009/05/26/refactoring-removing-duplication-more-safely/</link>
      <pubDate>Tue, 26 May 2009 13:20:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/26/refactoring-removing-duplication-more-safely/</guid>
      <description>One of the most important things that I’ve learnt from the coding dojo sessions that we’ve been running over the last six months is the importance of small step refactorings.
Granted, we have been trying to take some of the practices to the extreme, but the basic idea of trying to keep the tests green for as much time as possible, as well as keeping our code in a state where it still compiles (in a static language), is very useful no matter what code we’re working on.</description>
    </item>
    
    <item>
      <title>The value of a fresh mind</title>
      <link>https://www.markhneedham.com/blog/2009/05/26/the-value-of-a-fresh-mind/</link>
      <pubDate>Tue, 26 May 2009 00:51:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/26/the-value-of-a-fresh-mind/</guid>
      <description>I recently read a post by my colleague Sai Venkatakrishnan where he talks about some of the disadvantages of over working on a project, and it reminded me of something I’ve noticed a lot recently - notably that after taking a break from solving a problem, whether by looking at it again the next day, after lunch, or after any kind of break, I end up solving it significantly more quickly than if I’d kept on trying to solve it without a break.</description>
    </item>
    
    <item>
      <title>TDD: Making the test green quickly</title>
      <link>https://www.markhneedham.com/blog/2009/05/24/tdd-making-the-test-green-quickly/</link>
      <pubDate>Sun, 24 May 2009 23:43:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/24/tdd-making-the-test-green-quickly/</guid>
      <description>Although I pointed out some things that I disagreed with in Nick’s post about pair programming, one thing that I really liked in that post was that he emphasised the importance of getting tests from red to green as quickly as possible.
The best programming sessions I remember were with Stacy Curl, now an ex-ThoughtWorker and, I believe, also a chess player. He would always look to make my tests pass quickly, even if it was just to echo the output that my tests expected.</description>
    </item>
    
    <item>
      <title>Real World Functional Programming: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/05/24/real-world-functional-programming-book-review/</link>
      <pubDate>Sun, 24 May 2009 19:25:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/24/real-world-functional-programming-book-review/</guid>
      <description>The Book Real World Functional Programming by Tomas Petricek with Jon Skeet (corresponding website)
The Review I decided to read this book after being somewhat inspired to learn more about functional programming after talking with Phil about his experiences learning Clojure. I’m currently working on a .NET project so it seemed to make sense that F# was the language I picked to learn.
What did I learn? I’ve worked with C# 3.</description>
    </item>
    
    <item>
      <title>Pair Programming: It&#39;s not about equal keyboard time</title>
      <link>https://www.markhneedham.com/blog/2009/05/23/its-not-about-equal-keyboard-time/</link>
      <pubDate>Sat, 23 May 2009 16:35:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/23/its-not-about-equal-keyboard-time/</guid>
      <description>My colleague Nick Carroll recently blogged some ideas about what to do if your pair is hogging the keyboard, suggesting using a timer which keeps track of how long each person has had at the keyboard as a useful way of ensuring that both people in the pair stay more engaged.
While I can see the thinking behind this, I think it is addressing the wrong problem.
From my experience we don’t always want to be moving the keyboard between the two people quickly - I have certainly seen times where it makes sense for one person to spend more time at the keyboard than the other.</description>
    </item>
    
    <item>
      <title>Coding: Setters reduce trust</title>
      <link>https://www.markhneedham.com/blog/2009/05/23/coding-setters-reduce-trust/</link>
      <pubDate>Sat, 23 May 2009 15:37:34 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/23/coding-setters-reduce-trust/</guid>
      <description>I’ve written previously about my dislike of the way the object initialiser is misused in C# 3.0, and although I’ve also written about my preference for explicit modeling and the need for objects to act as good citizens, I’ve never quite been able to articulate what it is I dislike so much about having setter methods on objects.
I’ve learnt from experience that being able to set up an object after construction using setters leads to a world of pain in our code, and in a conversation with a colleague about this last week he suggested that the reason it’s such a bad practice to follow is that it makes us lose our trust not only in that object but in all the other objects in the application.</description>
    </item>
    
    <item>
      <title>Coding Dojo #15: Smalltalk</title>
      <link>https://www.markhneedham.com/blog/2009/05/21/coding-dojo-15-smalltalk/</link>
      <pubDate>Thu, 21 May 2009 19:05:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/21/coding-dojo-15-smalltalk/</guid>
      <description>We decided to play around with Smalltalk a bit in our latest coding dojo.
A lot of the ideas that I value the most in terms of writing software effectively seem to have originally come from the Smalltalk community, and a colleague of mine, who has been reading Kent Beck’s TDD by Example book, was really keen to try out the language to see where Kent’s original ideas came from.</description>
    </item>
    
    <item>
      <title>Build: Using virtual machines to run it in parallel</title>
      <link>https://www.markhneedham.com/blog/2009/05/21/build-using-virtual-machines-to-run-it-in-parallel/</link>
      <pubDate>Thu, 21 May 2009 18:02:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/21/build-using-virtual-machines-to-run-it-in-parallel/</guid>
      <description>One of the things that we’ve been working on lately to improve the overall time that our full build takes to run is to split the acceptance tests into several small groups of tests so that we can run them in parallel.
We are using Cruise as our build server so the ability to have multiple agents running against different parts of the build at the same time comes built in.</description>
    </item>
    
    <item>
      <title>F#: Object expressions</title>
      <link>https://www.markhneedham.com/blog/2009/05/19/f-object-expressions/</link>
      <pubDate>Tue, 19 May 2009 01:38:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/19/f-object-expressions/</guid>
      <description>One of the things I miss a bit from the Java world is the ability to create anonymous inner classes which implement a certain interface.
We can’t do this in C# - you always need to define a named class - but in my latest playing around with F# I was quite pleased to learn that we do have this ability using a feature called object expressions.
These come in particularly useful when you are only making use of the implementation of an interface in one place in the code and therefore don’t want to expose this type to any other code.</description>
    </item>
    
    <item>
      <title>97 Things Every Software Architect Should Know: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/05/18/97-things-every-software-architect-should-know-book-review/</link>
      <pubDate>Mon, 18 May 2009 01:03:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/18/97-things-every-software-architect-should-know-book-review/</guid>
      <description>The Book 97 Things Every Software Architect Should Know by Richard Monson-Haefel
The Review My colleague Erik Doernenburg mentioned that he had written a couple of chapters in this book a while ago and there was a copy of the book in the ThoughtWorks office so I thought I’d take a look.
I’m far from being an architect but since their decisions affect what I do I was intrigued to see what they should be doing.</description>
    </item>
    
    <item>
      <title>Coding Dojo #14: Rock, Scissors, Paper - TDD as if you meant it </title>
      <link>https://www.markhneedham.com/blog/2009/05/15/coding-dojo-14-rock-scissors-paper-tdd-as-if-you-meant-it/</link>
      <pubDate>Fri, 15 May 2009 07:39:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/15/coding-dojo-14-rock-scissors-paper-tdd-as-if-you-meant-it/</guid>
      <description>We decided to have a second week of following Keith Braithwaite’s &amp;#39;TDD as if you meant it&amp;#39; exercise which he led at the Software Craftsmanship Conference.
Our attempt a fortnight ago was around implementing a Flash Message interceptor, to hook into the Spring framework, but this week was focused more around modeling, the goal being to model a game of Rock, Paper, Scissors.
The code is available on our bitbucket repository.</description>
    </item>
    
    <item>
      <title>Mercurial: Pulling from behind a proxy</title>
      <link>https://www.markhneedham.com/blog/2009/05/13/mercurial-pulling-from-behind-a-proxy/</link>
      <pubDate>Wed, 13 May 2009 07:49:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/13/mercurial-pulling-from-behind-a-proxy/</guid>
      <description>I’ve been playing around with Mercurial and the mercurial hosting website bitbucket a bit this year and recently wanted to pull from a repository from behind a proxy server.
With a bit of help from the mercurial mailing list and the documentation this is how I was able to pull the repository for the Hambread project I’ve been doing a bit of work on:
hg --config http_proxy.host=ipOfYourProxyServer:portOfYourProxyServer --config http_proxy.</description>
    </item>
    
    <item>
      <title>Debugging: Get to a stage where it works</title>
      <link>https://www.markhneedham.com/blog/2009/05/12/debugging-get-to-a-stage-where-it-works/</link>
      <pubDate>Tue, 12 May 2009 09:21:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/12/debugging-get-to-a-stage-where-it-works/</guid>
      <description>When debugging a problem I’ve learnt far too many times that, where possible, the most effective approach is to try and get the application back into a state where it does work and then analyse the changes that have resulted in it no longer working as expected.
About 7 or 8 years ago, when I used to code PHP at school and university, that was pretty much my default approach - I didn’t really know how to program well enough to work out how to fix something that was broken, so I would always just revert all the steps I’d done until it worked.</description>
    </item>
    
    <item>
      <title>Tackling the risk early on at a task level</title>
      <link>https://www.markhneedham.com/blog/2009/05/11/tackling-the-risk-early-on-at-a-task-level/</link>
      <pubDate>Mon, 11 May 2009 23:54:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/11/tackling-the-risk-early-on-at-a-task-level/</guid>
      <description>I wrote previously about the idea of tackling the risky tasks in a project early on - an idea that I learnt about when reading Alistair Cockburn’s Crystal Clear.
Towards the end of the post I wondered whether we could apply this idea at a story level whereby we would identify the potentially risky parts of a story and make sure that we addressed those risks before they became problematic to us.</description>
    </item>
    
    <item>
      <title>F#: Regular expressions/active patterns</title>
      <link>https://www.markhneedham.com/blog/2009/05/10/f-regular-expressionsactive-patterns/</link>
      <pubDate>Sun, 10 May 2009 08:58:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/10/f-regular-expressionsactive-patterns/</guid>
      <description>Josh has been teaching me how to do regular expressions in Javascript this week and, intrigued as to how you would do this in F#, I came across a couple of blog posts by Chris Smith talking about active patterns and regular expressions via active patterns.
As I understand them, active patterns are not that much different to normal functions, but we can make use of them as part of a let or match statement, which we can’t do with a normal function.</description>
    </item>
    
    <item>
      <title>C#: Using virtual leads to confusion?</title>
      <link>https://www.markhneedham.com/blog/2009/05/06/c-using-virtual-leads-to-confusion/</link>
      <pubDate>Wed, 06 May 2009 19:30:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/06/c-using-virtual-leads-to-confusion/</guid>
      <description>A colleague and I were looking through some code that I worked on a couple of months ago where I had created a one level hierarchy using inheritance to represent the response status that we get back from a service call.
The code was along these lines:
public class ResponseStatus { public static readonly ResponseStatus TransactionSuccessful = new TransactionSuccessful(); public static readonly ResponseStatus UnrecoverableError = new UnrecoverableError(); public virtual bool RedirectToErrorPage { get { return true; } } } public class UnrecoverableError : ResponseStatus { } public class TransactionSuccessful : ResponseStatus { public override bool RedirectToErrorPage { get { return false; } } } Looking at it now it does seem a bit over-engineered, but the main confusion with this code is that when you click through to the definition of &amp;#39;RedirectToErrorPage&amp;#39; it goes to the ResponseStatus version of that property and it’s not obvious that it is being overridden in a subclass, this being possible due to my use of the virtual keyword.</description>
    </item>
    
    <item>
      <title>Adding humour to Tester/Developer collaboration</title>
      <link>https://www.markhneedham.com/blog/2009/05/04/adding-humour-to-testerdeveloper-collaboration/</link>
      <pubDate>Mon, 04 May 2009 23:43:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/04/adding-humour-to-testerdeveloper-collaboration/</guid>
      <description>Pat Kua has a recent post where he talks about the language used between testers and developers when discussing defects that testers come across when testing some functionality, and while I would agree with him that the language used is important, I’ve always found that injecting some humour into the situation takes the edge off.
As Dahlia points out I think this is probably only possible if there is good rapport between the developers and testers on the team so perhaps this has been the case for the teams I’ve worked on.</description>
    </item>
    
    <item>
      <title>Pair Programming: When your pair steps away</title>
      <link>https://www.markhneedham.com/blog/2009/05/03/pair-programming-when-your-pair-steps-away/</link>
      <pubDate>Sun, 03 May 2009 19:08:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/03/pair-programming-when-your-pair-steps-away/</guid>
      <description>I’ve been having a bit of a discussion recently with some of my colleagues about what we should do when pair programming and one of the people in the pair has to step away to go and help someone else or to take part in an estimation session or whatever it happens to be.
If we’re pairing in an effective way then it should be possible for the person still at the computer to keep going alone on the story/task that the pair were working on.</description>
    </item>
    
    <item>
      <title>F#: Stuff I get confused about</title>
      <link>https://www.markhneedham.com/blog/2009/05/02/f-stuff-i-get-confused-about/</link>
      <pubDate>Sat, 02 May 2009 14:38:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/02/f-stuff-i-get-confused-about/</guid>
      <description>Coming from the world of C# I’ve noticed that there are a couple of things that I sometimes get confused about when playing around with stuff in F# land.
Passing arguments to functions The way that we pass arguments to functions seems to be a fairly constant cause of confusion at the moment especially when doing that as part of a chain of other expressions where the use of brackets starts to become necessary.</description>
    </item>
    
    <item>
      <title>F#: Entry point of an application</title>
      <link>https://www.markhneedham.com/blog/2009/05/02/f-entry-point-of-an-application/</link>
      <pubDate>Sat, 02 May 2009 01:56:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/02/f-entry-point-of-an-application/</guid>
      <description>In an attempt to see whether or not the mailboxes I’ve been working on for my twitter application were actually processing messages on different threads I ran into the problem of defining the entry point of an F# application.
I thought it would be as simple as defining a function called &amp;#39;main&amp;#39;, but I put this function into my code, ran the executable, and nothing happened!
Googling the problem a bit led me to believe that it is possible to do but that the function needs to be the last thing that happens in the compilation sequence of the project.</description>
    </item>
    
    <item>
      <title>F#: Erlang style messaging passing</title>
      <link>https://www.markhneedham.com/blog/2009/05/02/f-erlang-style-messaging-passing/</link>
      <pubDate>Sat, 02 May 2009 01:53:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/05/02/f-erlang-style-messaging-passing/</guid>
      <description>As I mentioned in my previous post about overloading methods in F#, I’ve been trying to refactor my twitter application into a state where it can concurrently process twitter statuses while continuing to retrieve more of them from the twitter website.
I played around a bit with Erlang last year and one thing that I quite liked is the message passing between processes to allow operations to be performed concurrently.</description>
    </item>
    
    <item>
      <title>Coding Dojo #13: TDD as if you meant it</title>
      <link>https://www.markhneedham.com/blog/2009/04/30/coding-dojo-13-tdd-as-if-you-meant-it/</link>
      <pubDate>Thu, 30 Apr 2009 06:12:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/30/coding-dojo-13-tdd-as-if-you-meant-it/</guid>
      <description>We decided to follow Keith Braithwaite’s &amp;#39;TDD as if you meant it&amp;#39; exercise which he led at the Software Craftsmanship Conference and which I originally read about on Gojko Adzic’s blog.
We worked on implementing a Flash Message interceptor, to hook into the Spring framework, that one of my colleagues has been working on - the idea is to show a flash message to the user, that message being stored in the session on a Post and then removed on a Get in the &amp;#39;Post-Redirect-Get&amp;#39; cycle.</description>
    </item>
    
    <item>
      <title>F#: Overloading functions/pattern matching</title>
      <link>https://www.markhneedham.com/blog/2009/04/28/f-overloading-functionspattern-matching/</link>
      <pubDate>Tue, 28 Apr 2009 23:43:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/28/f-overloading-functionspattern-matching/</guid>
      <description>While trying to refactor my twitter application into a state where I could use Erlang style message passing to process some requests asynchronously while still hitting twitter to get more messages I came across the problem of wanting to overload a method.
By default it seems that you can’t do method overloading in F# unless you make use of the OverloadID attribute which I learnt about from reading Scott Seely’s blog post:</description>
    </item>
    
    <item>
      <title>Coding: Weak/Strong APIs</title>
      <link>https://www.markhneedham.com/blog/2009/04/27/coding-weakstrong-apis/</link>
      <pubDate>Mon, 27 Apr 2009 20:30:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/27/coding-weakstrong-apis/</guid>
      <description>An interesting problem that I’ve come across a few times in the last couple of weeks centres around how strongly typed we should make the arguments to public methods on our objects.
There seem to be benefits and drawbacks with each approach so I’m not sure which approach is better - it possibly depends on the context.
When we have a strong API the idea is that we pass an object as the argument to a method on another object.</description>
    </item>
    
    <item>
      <title>F#: Not equal/Not operator</title>
      <link>https://www.markhneedham.com/blog/2009/04/25/f-not-equalnot-operator/</link>
      <pubDate>Sat, 25 Apr 2009 22:12:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/25/f-not-equalnot-operator/</guid>
      <description>While continuing to play with my F# twitter application, I was trying to work out how to exclude the tweets that I posted from the list that gets displayed.
I actually originally had the logic the wrong way round so that it was only showing my tweets!
let excludeSelf (statuses:seq&amp;lt;TwitterStatus&amp;gt;) = statuses |&amp;gt; Seq.filter (fun eachStatus -&amp;gt; eachStatus.User.ScreenName.Equals(&amp;#34;markhneedham&amp;#34;)) Coming from the world of Java and C# &amp;#39;!&amp;#39; would be the operator to find the screen names that don’t match my own name.</description>
    </item>
    
    <item>
      <title>Writing unit tests can be fun</title>
      <link>https://www.markhneedham.com/blog/2009/04/25/writing-unit-tests-can-be-fun/</link>
      <pubDate>Sat, 25 Apr 2009 19:51:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/25/writing-unit-tests-can-be-fun/</guid>
      <description>I recently came across Pavel Brodzinski’s blog and while browsing through some of his most recent posts I came across one discussing when unit testing doesn’t work.
The majority of what Pavel says I’ve seen happen before on projects I’ve worked on but I disagree with his suggestion that writing unit tests is boring:
Writing unit tests is boring. That’s not amusing or challenging algorithmic problem. That’s not cool hacking trick which you can show off with in front of your geeky friends.</description>
    </item>
    
    <item>
      <title>OO with a bit of functional mixed in</title>
      <link>https://www.markhneedham.com/blog/2009/04/25/oo-with-a-bit-of-functional-mixed-in/</link>
      <pubDate>Sat, 25 Apr 2009 11:14:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/25/oo-with-a-bit-of-functional-mixed-in/</guid>
      <description>From my experiences playing around with F# and doing a bit of functional C#, I’m beginning to think that the combination of functional and object oriented programming actually results in code which I think is more expressive and easier to work with than code written only with an object oriented approach in mind.
I’m also finding it much more fun to write code this way!
In a recent post Dean Wampler questions whether the supremacy of object oriented programming is over before going on to suggest that the future is probably going to be a mix of functional programming and object oriented programming.</description>
    </item>
    
    <item>
      <title>Pimp my architecture - Dan North</title>
      <link>https://www.markhneedham.com/blog/2009/04/25/pimp-my-architecture-dan-north/</link>
      <pubDate>Sat, 25 Apr 2009 01:26:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/25/pimp-my-architecture-dan-north/</guid>
      <description>My colleague Dan North presented a version of a talk he first did at QCon London titled &amp;#39;Pimp my architecture&amp;#39; at the ThoughtWorks Sydney community college on Wednesday night. He’ll also be presenting it at JAOO in Sydney and Brisbane in a couple of weeks time.
The slides for the talk are here and it’s also available on InfoQ.
What did I learn? I quite liked the way the talk was structured - Dan laid out a series of problems that he’s seen on some projects he’s worked on and then showed on the next slide where he planned to take the architecture.</description>
    </item>
    
    <item>
      <title>DDD: Making implicit concepts explicit</title>
      <link>https://www.markhneedham.com/blog/2009/04/23/ddd-making-implicit-concepts-explicit/</link>
      <pubDate>Thu, 23 Apr 2009 12:36:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/23/ddd-making-implicit-concepts-explicit/</guid>
      <description>One of my favourite parts of the Domain Driven Design book is where Eric Evans talks about making implicit concepts in our domain model explicit.
The book describes this process like so:
Many transformations of domain models and the corresponding code happen when developers recognize a concept that has been hinted at in discussion or present implicitly in the design, and they then represent it explicitly in the model with one or more objects or relationships.</description>
    </item>
    
    <item>
      <title>The Five Dysfunctions of a Team: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/04/22/the-five-dysfunctions-of-a-team-book-review/</link>
      <pubDate>Wed, 22 Apr 2009 06:50:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/22/the-five-dysfunctions-of-a-team-book-review/</guid>
      <description>The Book: The Five Dysfunctions of a Team by Patrick Lencioni
The Review: I heard about this book a while ago but I was intrigued to actually get a copy by Darren Cotterill, the Iteration Manager on the project I’m working on at the moment.
I was particularly interested in learning whether the ideas of agile and/or lean help to solve any of these dysfunctions.
What did I learn? The book is split into two sections.</description>
    </item>
    
    <item>
      <title>Learning through teaching</title>
      <link>https://www.markhneedham.com/blog/2009/04/21/learning-through-teaching/</link>
      <pubDate>Tue, 21 Apr 2009 07:38:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/21/learning-through-teaching/</guid>
      <description>I’ve been watching one of the podcasts recorded from the Alt.NET Houston conference titled &amp;#39;Why blog and open source&amp;#39; and one of the interesting ideas that stood out amongst the opinions expressed is that people write about their experience in order to understand the topics better themselves.
I’ve found this to be a very valuable way of learning - in fact it’s probably more beneficial to the teacher than the student, somewhat ironically.</description>
    </item>
    
    <item>
      <title>Coding: Applying levels of abstraction</title>
      <link>https://www.markhneedham.com/blog/2009/04/19/coding-applying-levels-of-abstraction/</link>
      <pubDate>Sun, 19 Apr 2009 23:03:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/19/coding-applying-levels-of-abstraction/</guid>
      <description>One interesting situation that we often arrive at when writing code is working out when the best time to apply a level of abstraction is.
I think there is always a trade off to be made when it comes to creating abstractions - creating the abstraction adds to the complexity of the code we’re writing but it is often the case that creating it makes it easier for us to navigate the code base.</description>
    </item>
    
    <item>
      <title>I don&#39;t have time not to test!</title>
      <link>https://www.markhneedham.com/blog/2009/04/18/i-dont-have-time-not-to-test/</link>
      <pubDate>Sat, 18 Apr 2009 09:25:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/18/i-dont-have-time-not-to-test/</guid>
      <description>I recently read a blog post by Joshua Lockwood where he spoke of some people who claim they don’t have time to test.
Learning the TDD approach to writing code has been one of the best things that I’ve learnt over the last few years - before I worked at ThoughtWorks I didn’t know how to do it and the only way I could verify whether something worked was to load up the application and manually check it.</description>
    </item>
    
    <item>
      <title>F#: Refactoring that little twitter application into objects</title>
      <link>https://www.markhneedham.com/blog/2009/04/18/f-refactoring-that-little-twitter-application-into-objects/</link>
      <pubDate>Sat, 18 Apr 2009 08:47:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/18/f-refactoring-that-little-twitter-application-into-objects/</guid>
      <description>I previously wrote about a little twitter application I’ve been writing to go through my twitter feed and find only the tweets with links in them, and while it works I realised that I was finding it quite difficult to add any additional functionality to it.
I’ve been following the examples in Real World Functional Programming which has encouraged an approach of creating functions to do everything that you want to do and then mixing them together.</description>
    </item>
    
    <item>
      <title>Coding Dojo #12: F#</title>
      <link>https://www.markhneedham.com/blog/2009/04/16/coding-dojo-12-f/</link>
      <pubDate>Thu, 16 Apr 2009 18:20:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/16/coding-dojo-12-f/</guid>
      <description>In our latest coding dojo we worked on trying to port some of the functionality of some C# 1.0 brain models that Dave worked on at university - in particular one simulating chaos behaviour.
The Format: This was more of an experimental dojo since everyone was fairly new to F#, so we didn’t rotate the pair at the keyboard as frequently as usual.
What We Learnt: The aim of the session was to try and put some unit tests around the C# code and then replace that code with an F# version of it piece by piece.</description>
    </item>
    
    <item>
      <title>Lean: Big Picture over Local Optimisations</title>
      <link>https://www.markhneedham.com/blog/2009/04/14/lean-big-picture-over-local-optimisations/</link>
      <pubDate>Tue, 14 Apr 2009 22:10:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/14/lean-big-picture-over-local-optimisations/</guid>
      <description>I recently finished reading Lean Thinking and one of the things that was repeatedly emphasised is the need to look at the process as a whole rather than trying to optimise each part individually.
If we phrased this in a similar way to the Agile Manifesto it would probably read &amp;#39;Big Picture over Local Optimisations&amp;#39;.
The examples in Lean Thinking tend to be more manufacturing focused but I think this idea can certainly be applied in thinking about software projects too.</description>
    </item>
    
    <item>
      <title>F#: A day of writing a little twitter application</title>
      <link>https://www.markhneedham.com/blog/2009/04/13/f-a-day-of-writing-a-little-twitter-application/</link>
      <pubDate>Mon, 13 Apr 2009 22:09:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/13/f-a-day-of-writing-a-little-twitter-application/</guid>
      <description>I spent most of the bank holiday Monday here in Sydney writing a little application to scan through my twitter feed and find me just the tweets which have links in them since for me that’s where a lot of the value of twitter lies.
I’m sure someone has done this already but it seemed like a good opportunity to try and put a little of the F# that I’ve learned from reading Real World Functional Programming to use.</description>
    </item>
    
    <item>
      <title>TDD: Balancing DRYness and Readability</title>
      <link>https://www.markhneedham.com/blog/2009/04/13/tdd-balancing-dryness-and-readability/</link>
      <pubDate>Mon, 13 Apr 2009 00:47:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/13/tdd-balancing-dryness-and-readability/</guid>
      <description>I wrote previously about creating DRY tests and after some conversations with my colleagues recently about the balance between reducing duplication but maintaining readability I think I’ve found the compromise between the two that works best for me.
The underlying idea is that in any unit test I want to be aiming for 3 distinct sections in the test - Given/When/Then, Arrange/Act/Assert or whatever your favourite description for those is.</description>
    </item>
    
    <item>
      <title>Pair Programming: The Code Fairy</title>
      <link>https://www.markhneedham.com/blog/2009/04/10/pair-programming-the-code-fairy/</link>
      <pubDate>Fri, 10 Apr 2009 19:28:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/10/pair-programming-the-code-fairy/</guid>
      <description>One of the hardest situations that comes up when pair programming is when you want to solve a problem in a certain way but you can’t persuade your pair that it’s the approach you should take.
The temptation in these situations is to wait until your pair isn’t around, maybe by staying late at the end of the day or coming in early the next day and then making the changes to the code that you wanted to make but didn’t when you were pairing with them.</description>
    </item>
    
    <item>
      <title>Coding: Passing booleans into methods</title>
      <link>https://www.markhneedham.com/blog/2009/04/08/coding-passing-booleans-into-methods/</link>
      <pubDate>Wed, 08 Apr 2009 05:43:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/08/coding-passing-booleans-into-methods/</guid>
      <description>In a post I wrote a couple of days ago about understanding the context of a piece of code before criticising it, one of the examples that I used of a time when it seems fine to break a rule was passing a boolean into a method to determine whether or not to show an editable version of a control on the page.
Chatting with Nick about this yesterday it became clear to me that I’ve missed one important reason why you’d not want to pass a boolean into a method.</description>
    </item>
    
    <item>
      <title>DDD: Only for complex projects?</title>
      <link>https://www.markhneedham.com/blog/2009/04/06/ddd-only-for-complex-projects/</link>
      <pubDate>Mon, 06 Apr 2009 19:21:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/06/ddd-only-for-complex-projects/</guid>
      <description>One of the things I find a bit confusing when it comes to Domain Driven Design is that some of the higher profile speakers/user group contributors on the subject have expressed the opinion that DDD is more suitable when we are dealing with complex projects.
I think this means complex in terms of the domain, but I’ve certainly worked on some projects where we’ve followed some of the ideas of DDD and got some value out of doing so in domains which I wouldn’t say were particularly complex.</description>
    </item>
    
    <item>
      <title>Coding: It&#39;s all about the context</title>
      <link>https://www.markhneedham.com/blog/2009/04/05/coding-criticising-without-context/</link>
      <pubDate>Sun, 05 Apr 2009 19:45:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/05/coding-criticising-without-context/</guid>
      <description>I think one of the easiest things to do as a developer is to look at some code that you didn’t write and then start trashing it for all the supposed mistakes that the author has made that you wouldn’t have.
It’s certainly something I’ve been guilty of doing and probably will be again in the future.
Sometimes it’s justified but most of the time we lack the context for understanding why the code was written the way it was and therefore our criticism is not very useful to anyone.</description>
    </item>
    
    <item>
      <title>Functional C#: The hole in the middle pattern</title>
      <link>https://www.markhneedham.com/blog/2009/04/04/functional-c-the-hole-in-the-middle-pattern/</link>
      <pubDate>Sat, 04 Apr 2009 11:41:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/04/functional-c-the-hole-in-the-middle-pattern/</guid>
      <description>While reading Real World Functional Programming I came across an interesting pattern that I have noticed in some code bases recently which I liked but didn’t know had been given a name!
The hole in the middle pattern, coined by Brian Hurt, shows a cool way of using higher order functions in order to reuse code in cases where the code typically looks something like this:
public void SomeServiceCall()
{
    var serviceClient = CreateServiceClient();
    try
    {
        serviceClient.</description>
    </item>
    
    <item>
      <title>TDD: Testing mapping code</title>
      <link>https://www.markhneedham.com/blog/2009/04/02/tdd-testing-mapping-code/</link>
      <pubDate>Thu, 02 Apr 2009 23:11:12 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/02/tdd-testing-mapping-code/</guid>
      <description>I’ve previously written about some aspects of the mapping efforts we’ve done on recent projects, and what we’ve found from our testing (or lack thereof) around this type of code is that somewhere along the line you are going to have to check that you’re mapping these values correctly, be it in an automated test or just by manually checking that the correct values are being sent across our integration points and into other systems.</description>
    </item>
    
    <item>
      <title>Pair Programming: Slowly but surely</title>
      <link>https://www.markhneedham.com/blog/2009/03/31/pair-programming-slowly-but-surely/</link>
      <pubDate>Tue, 31 Mar 2009 23:15:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/31/pair-programming-slowly-but-surely/</guid>
      <description>I recently watched a video recorded by Uncle Bob at the Chicago Alt.NET meeting where amongst other things he talked about the importance of going slowly but surely when we’re developing code i.e. spending the time to get it right first time instead of rushing through and having to go back and fix our mistakes.
While pairing with a colleague recently it became clear to me that pair programming, when done well, drives you towards a state where you are being much more careful about the work being produced.</description>
    </item>
    
    <item>
      <title>DDD: Recognising relationships between bounded contexts</title>
      <link>https://www.markhneedham.com/blog/2009/03/30/ddd-recognising-relationships-between-bounded-contexts/</link>
      <pubDate>Mon, 30 Mar 2009 22:52:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/30/ddd-recognising-relationships-between-bounded-contexts/</guid>
      <description>One of the big takeaways for me from the Domain Driven Design track at the recent QCon London conference was that the organisational patterns in the second half of the book are probably more important than the actual patterns themselves.
There are various patterns used to describe the relationships between different bounded contexts:
Shared Kernel - This is where two teams share some subset of the domain model. This shouldn’t be changed without the other team being consulted.</description>
    </item>
    
    <item>
      <title>Pair Programming: From a Lean angle</title>
      <link>https://www.markhneedham.com/blog/2009/03/29/pair-programming-from-a-lean-angle/</link>
      <pubDate>Sun, 29 Mar 2009 16:54:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/29/pair-programming-from-a-lean-angle/</guid>
      <description>I recently watched a presentation about lean thinking and I started seeing parallels in a lot of what they were saying with the benefits that I believe we see in projects when the team pair programs.
Big Picture vs Local Optimisations One of the biggest arguments used against pair programming is that we get half as much work done because we have two people working on one computer.
Even if we ignore the immediate flaws in that argument I think this is a case of looking at individual productivity when in fact what we really care about is the team’s productivity i.</description>
    </item>
    
    <item>
      <title>F#: Forcing type to unit for Assert.ShouldThrow in XUnit.NET</title>
      <link>https://www.markhneedham.com/blog/2009/03/28/f-forcing-type-to-unit-for-assertshouldthrow-in-xunitnet/</link>
      <pubDate>Sat, 28 Mar 2009 02:35:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/28/f-forcing-type-to-unit-for-assertshouldthrow-in-xunitnet/</guid>
      <description>I’ve started playing around with F# again and decided to try and create some unit tests around the examples I’m following from Real World Functional Programming. After reading Matt Podwysocki’s blog post about XUnit.NET I decided that would probably be the best framework for me to use.
The example I’m writing tests around is:
let convertDataRow(str:string) =
    let cells = List.of_seq(str.Split([|&amp;#39;,&amp;#39;|]))
    match cells with
    | label::value::_ -&amp;gt;
        let numericValue = Int32.</description>
    </item>
    
    <item>
      <title>Coding: Isolate the data not just the endpoint</title>
      <link>https://www.markhneedham.com/blog/2009/03/25/coding-isolate-the-data-not-just-the-endpoint/</link>
      <pubDate>Wed, 25 Mar 2009 23:28:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/25/coding-isolate-the-data-not-just-the-endpoint/</guid>
      <description>One of the fairly standard ways of shielding our applications when integrating with another system is to create a wrapper around it so that all interaction with that system is in one place.
As I mentioned in a previous post we have been using the repository pattern to achieve this in our code.
One service which we needed to integrate lately provided the data for populating drop downs on our UI, so it returned two pieces of data - a Value (which needed to be sent to another service when a certain option was selected) and a Label (the value for us to display on the screen).</description>
    </item>
    
    <item>
      <title>QTB: Lean Times Require Lean Thinking</title>
      <link>https://www.markhneedham.com/blog/2009/03/25/qtb-lean-times-require-lean-thinking/</link>
      <pubDate>Wed, 25 Mar 2009 00:36:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/25/qtb-lean-times-require-lean-thinking/</guid>
      <description>I went to watch the latest ThoughtWorks Quarterly Technology Briefing on Tuesday, which was presented by my colleague Jason Yip and Paul Heaton, titled &amp;#39;Lean Times Require Lean Thinking&amp;#39;.
I’ve been reading quite a bit of lean related material lately but I thought it would be interesting to hear about it directly from the perspective of two people who have been involved with applying the concepts in organisations.
What did I learn?</description>
    </item>
    
    <item>
      <title>ASP.NET MVC: Pre-compiling views when using SafeEncodingCSharpCodeProvider</title>
      <link>https://www.markhneedham.com/blog/2009/03/24/aspnet-mvc-pre-compiling-views-when-using-safeencodingcsharpcodeprovider/</link>
      <pubDate>Tue, 24 Mar 2009 22:55:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/24/aspnet-mvc-pre-compiling-views-when-using-safeencodingcsharpcodeprovider/</guid>
      <description>We’ve been doing some work to get our views in ASP.NET MVC to be pre-compiled, which allows us to see any errors in them at compile time rather than at run time.
It’s relatively simple to do. You just need to add the following code into your .csproj file anywhere below the element:
&amp;lt;Target Name=&amp;#34;AfterBuild&amp;#34;&amp;gt;
  &amp;lt;AspNetCompiler VirtualPath=&amp;#34;/&amp;#34; PhysicalPath=&amp;#34;$(ProjectDir)\..\$(ProjectName)&amp;#34;/&amp;gt;
&amp;lt;/Target&amp;gt;
where VirtualPath refers to the virtual path defined inside your project file and PhysicalPath is the path to the folder which contains the project with the views in it.</description>
    </item>
    
    <item>
      <title>Coding: Making the debugger redundant</title>
      <link>https://www.markhneedham.com/blog/2009/03/22/coding-making-the-debugger-redundant/</link>
      <pubDate>Sun, 22 Mar 2009 19:52:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/22/coding-making-the-debugger-redundant/</guid>
      <description>I recently wrote about my dislike of the debugger and, related to this, I spent some time last year watching some videos from JAOO 2007 on MSDN’s Channel 9. One of my favourites is an interview featuring Joe Armstrong and Erik Meijer where Joe Armstrong points out that when coding Erlang he never has to use a debugger because state is immutable.
In Erlang, once you set the value of a variable &amp;#39;x&amp;#39; it cannot be changed.</description>
    </item>
    
    <item>
      <title>Lean Thinking: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/03/21/lean-thinking-book-review/</link>
      <pubDate>Sat, 21 Mar 2009 10:36:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/21/lean-thinking-book-review/</guid>
      <description>The Book: Lean Thinking by James P. Womack and Daniel T. Jones
The Review: This is the latest book in my lean learning after The Toyota Way, Taiichi Ohno’s Workplace Management and Lean Software Development and seemed like the most logical one to read next as it came at lean from a slightly different angle.
I found this the most hard-going of the books I’ve read on the subject so far.</description>
    </item>
    
    <item>
      <title>Coding: Reassessing what the debugger is for</title>
      <link>https://www.markhneedham.com/blog/2009/03/20/coding-reassessing-what-the-debugger-is-for/</link>
      <pubDate>Fri, 20 Mar 2009 21:39:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/20/coding-reassessing-what-the-debugger-is-for/</guid>
      <description>When I first started programming in a &amp;#39;proper&amp;#39; IDE one of the things that I thought was really cool was the ability to debug through my code whenever something wasn’t working quite the way I expected it to.
Now the debugger is not a completely pointless tool - indeed there is sometimes no other easy way to work out what’s going wrong - but I think it now becomes the default problem solver whenever a bit of code is not working as we expect it to.</description>
    </item>
    
    <item>
      <title>Re-reading books</title>
      <link>https://www.markhneedham.com/blog/2009/03/19/re-reading-books/</link>
      <pubDate>Thu, 19 Mar 2009 10:49:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/19/re-reading-books/</guid>
      <description>An interesting thing that I’ve started to notice recently with regards to software development books is that I get a lot more from reading the book the second time compared to what I did reading the book the first time.
I’ve noticed this for several books, including The Pragmatic Programmer, Code Complete and Domain Driven Design, so my first thought was that perhaps I had read these books too early, when I didn’t have the necessary context or experience to gain value from reading them.</description>
    </item>
    
    <item>
      <title>Coding: Make it obvious</title>
      <link>https://www.markhneedham.com/blog/2009/03/18/coding-make-it-obvious/</link>
      <pubDate>Wed, 18 Mar 2009 10:44:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/18/coding-make-it-obvious/</guid>
      <description>One of the lessons that I’ve learned the more projects I work on is that the most important thing to do when coding is to do so in a way that you make life easier for the next person who has to come across that code, be it yourself or one of your team mates.
I think the underlying idea is that we need to make things as obvious as possible.</description>
    </item>
    
    <item>
      <title>QCon London 2009: The Power of Value - Power Use of Value Objects in Domain Driven Design - Dan Bergh Johnsson</title>
      <link>https://www.markhneedham.com/blog/2009/03/15/qcon-london-2009-the-power-of-value-power-use-of-value-objects-in-domain-driven-design-dan-bergh-johnsson/</link>
      <pubDate>Sun, 15 Mar 2009 09:45:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/15/qcon-london-2009-the-power-of-value-power-use-of-value-objects-in-domain-driven-design-dan-bergh-johnsson/</guid>
      <description>The final Domain Driven Design talk I attended at QCon was by Dan Bergh Johnsson about the importance of value objects in our code.
I thought this session fitted in really well as a couple of the previous speakers had spoken of the under-utilisation of value objects.
The slides for the presentation are here.
What did I learn? Dan started the talk by outlining the goal for the presentation which was to &amp;#39;show how power use of value objects can radically change design and code, hopefully for the better&amp;#39;.</description>
    </item>
    
    <item>
      <title>QCon London 2009: Rebuilding guardian.co.uk with DDD - Phil Wills</title>
      <link>https://www.markhneedham.com/blog/2009/03/14/qcon-london-2009-rebuilding-guardiancouk-with-ddd-phil-wills/</link>
      <pubDate>Sat, 14 Mar 2009 14:23:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/14/qcon-london-2009-rebuilding-guardiancouk-with-ddd-phil-wills/</guid>
      <description>Talk #3 on the Domain Driven Design track at QCon was by Phil Wills about how the Guardian rebuilt their website using Domain Driven Design.
I’d heard a little bit about this beforehand from colleagues who had the chance to work on that project but it seemed like a good opportunity to hear a practical example and the lessons learned along the way.
There are no slides available for this one on the QCon website at the moment.</description>
    </item>
    
    <item>
      <title>QCon London 2009: DDD &amp; BDD - Dan North</title>
      <link>https://www.markhneedham.com/blog/2009/03/14/qcon-london-2009-ddd-bdd-dan-north/</link>
      <pubDate>Sat, 14 Mar 2009 01:28:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/14/qcon-london-2009-ddd-bdd-dan-north/</guid>
      <description>The second presentation in the Domain Driven Design track at QCon was titled &amp;#39;DDD &amp;amp; BDD&amp;#39; and was presented by my colleague Dan North - a late stand in for Greg Young who apparently injured himself playing ice hockey.
Eric did an interview with Greg at QCon San Francisco 2007 where Greg talks about some of his ideas and apparently there is an InfoQ video kicking around of Greg’s &amp;#39;Unshackle Your Domain&amp;#39; talk from QCon San Francisco 2008 which we were told to pester InfoQ to post on their website!</description>
    </item>
    
    <item>
      <title>QCon London 2009: What I&#39;ve learned about DDD since the book - Eric Evans</title>
      <link>https://www.markhneedham.com/blog/2009/03/13/qcon-london-2009-what-ive-learned-about-ddd-since-the-book-eric-evans/</link>
      <pubDate>Fri, 13 Mar 2009 20:56:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/13/qcon-london-2009-what-ive-learned-about-ddd-since-the-book-eric-evans/</guid>
      <description>I went to the QCon conference in London on Thursday, spending the majority of the day on Eric Evans&amp;#39; Domain Driven Design track.
The opening presentation was by Eric Evans himself and was titled &amp;#39;What I’ve learned about DDD since the book&amp;#39;.
In the 5 years since the book was published, I’ve practiced DDD on various client projects, and I’ve continued to learn about what works, what doesn’t work, and how to conceptualize and describe it all.</description>
    </item>
    
    <item>
      <title>OO: Reducing the cost of...lots of stuff!</title>
      <link>https://www.markhneedham.com/blog/2009/03/12/oo-reducing-the-cost-oflots-of-stuff/</link>
      <pubDate>Thu, 12 Mar 2009 04:04:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/12/oo-reducing-the-cost-oflots-of-stuff/</guid>
      <description>I’ve been working in the world of professional software development for a few years now and pretty much take it as a given that the best way to write code which is easy for other people to understand and work with is to write that code in an object oriented way.
Not everyone agrees with this approach of course and I’ve been told on occasions that I’m &amp;#39;over object orienting&amp;#39; (is that even a word?</description>
    </item>
    
    <item>
      <title>OO: Micro Types</title>
      <link>https://www.markhneedham.com/blog/2009/03/10/oo-micro-types/</link>
      <pubDate>Tue, 10 Mar 2009 22:40:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/10/oo-micro-types/</guid>
      <description>Micro or Tiny types present an approach to coding which seems to divide opinion in my experience, from those who think it’s a brilliant idea to those who believe it’s static typing gone mad.
I fall into the former group.
So what is it?
The idea is fairly simple - all primitives and strings in our code are wrapped by a class, meaning that we never pass primitives around.</description>
    </item>
    
    <item>
      <title>DDD: Repository pattern</title>
      <link>https://www.markhneedham.com/blog/2009/03/10/ddd-repository-not-only-for-databases/</link>
      <pubDate>Tue, 10 Mar 2009 10:31:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/10/ddd-repository-not-only-for-databases/</guid>
      <description>The Repository pattern from Domain Driven Design is one of the cleanest ways I have come across for separating our domain objects from their persistence mechanism.
Until recently every single implementation I had seen of this pattern involved directly using a database as the persistence mechanism with the repository acting as a wrapper around the Object Relational Mapper (Hibernate/NHibernate).
Now I consider there to be two parts to the repository pattern:</description>
    </item>
    
    <item>
      <title>DDD: Bounded Contexts</title>
      <link>https://www.markhneedham.com/blog/2009/03/07/ddd-bounded-contexts/</link>
      <pubDate>Sat, 07 Mar 2009 10:03:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/07/ddd-bounded-contexts/</guid>
      <description>I’ve been reading Casey Charlton’s excellent series of posts on Domain Driven Design recently and today came across his thoughts about which types of applications Domain Driven Design is suited to.
Towards the end of the post he talks about the fact that there is a lot of excellent ideas in Domain Driven Design even if you don’t have the chance to use all of them.
...there is a wealth of wisdom and experience encapsulated in Domain Driven Design — use what you think applies to your situation, and you will find your software becoming more flexible, more reactive to your audience, and easier to understand — just don’t expect miracles, and beware of over complicating your code for the sake of it — sometimes simpler really is better.</description>
    </item>
    
    <item>
      <title>Coding Dojo #11: Javascript Isola</title>
      <link>https://www.markhneedham.com/blog/2009/03/06/coding-dojo-11-javascript-isola/</link>
      <pubDate>Fri, 06 Mar 2009 06:38:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/06/coding-dojo-11-javascript-isola/</guid>
      <description>In our latest coding dojo we attempted to code Isola in Javascript but instead of coding from the board inwards we decided to try and take the approach of coding from the cells outwards to keep it interesting.
My colleague brought in his copy of the game, and having it in front of us made it much easier to imagine how we should be modelling it.</description>
    </item>
    
    <item>
      <title>Coding: Good Citizens</title>
      <link>https://www.markhneedham.com/blog/2009/03/04/coding-good-citizens/</link>
      <pubDate>Wed, 04 Mar 2009 23:58:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/04/coding-good-citizens/</guid>
<description>I was recently reading Brad Cross&amp;#39; post about creating objects which are Good Citizens in code and he certainly nails one aspect of this with regard to ensuring that our objects are in a usable state post construction.
In OO design, an object is considered to be a good citizen if it is in a fully composed and usable state post-construction. This means that once the constructor exits, the class is ready to use - without the need to call additional setters or init() methods.</description>
    </item>
    
    <item>
      <title>ASP.NET MVC: Reducing duplication for partial models</title>
      <link>https://www.markhneedham.com/blog/2009/03/03/aspnet-mvc-using-adaptors-for-partial-models/</link>
      <pubDate>Tue, 03 Mar 2009 23:55:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/03/aspnet-mvc-using-adaptors-for-partial-models/</guid>
      <description>One of the problems we can encounter when using partials throughout our views is how we should create the model needed for those partials.
The approach that we have been following is to have the partial/child model on the parent model and then just call the appropriate method where we create the partial.
e.g.
public class ParentModel
{
    public string Property1 { get; set; }
    public ChildModel ChildModel { get; set; }
}

public class ChildModel
{
    public string Property1 { get; set; }
}

We have sometimes run into the problem where the data in the ChildModel is being populated from the ParentModel (due to it also being needed there), leading to data duplication.</description>
    </item>
    
    <item>
      <title>Trade Offs: Some Thoughts</title>
      <link>https://www.markhneedham.com/blog/2009/03/02/trade-offs-some-thoughts/</link>
      <pubDate>Mon, 02 Mar 2009 23:01:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/02/trade-offs-some-thoughts/</guid>
<description>As we know, in software development pretty much every decision we make or technology we choose carries a trade off compared with the alternatives.
I first learnt this when working with Ade a couple of years ago and while I know it to be true, I had come to believe that some practices are just non-negotiable and we should look to apply them judiciously wherever possible.</description>
    </item>
    
    <item>
      <title>NUnit: Tests with Context/Spec style assertions</title>
      <link>https://www.markhneedham.com/blog/2009/03/01/nunit-tests-with-contextspec-style-assertions/</link>
      <pubDate>Sun, 01 Mar 2009 16:43:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/03/01/nunit-tests-with-contextspec-style-assertions/</guid>
<description>I recently started playing around with Scott Bellware’s Spec-Unit and Aaron Jensen’s MSpec, two frameworks which both provide a way of writing Context/Spec style tests/specifications.
What I particularly like about this approach to writing tests is that we can divide assertions into specific blocks and have them all evaluated even if an earlier one fails.
NUnit is our testing tool of choice at the moment and we wanted to try and find a way to test the mapping between the domain and service layers of the application.</description>
    </item>
    
    <item>
      <title>Coding: Implicit vs Explicit modeling </title>
      <link>https://www.markhneedham.com/blog/2009/02/28/coding-implicit-vs-explicit-modeling/</link>
      <pubDate>Sat, 28 Feb 2009 09:50:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/28/coding-implicit-vs-explicit-modeling/</guid>
      <description>When it comes to object modeling there seem to be two distinct approaches that I have come across.
Implicit modeling
The first approach is where we do what I like to think of as implicit modeling.
With this approach we would probably use fewer objects than in the explicit approach and we would have objects being populated as we moved through the work flow of our application.
I call it implicit modeling because we need to infer where we are based on the internal state of our objects - we can typically work this out by seeing what is and is not set to null.</description>
    </item>
    
    <item>
      <title>Coding: Using &#39;ToString&#39;</title>
      <link>https://www.markhneedham.com/blog/2009/02/26/coding-using-tostring/</link>
      <pubDate>Thu, 26 Feb 2009 23:43:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/26/coding-using-tostring/</guid>
      <description>An interesting conversation I’ve had recently with some of my colleagues is around the use of the ToString method available on all objects created in Java or C#. It was also pointed out in the comments on my recent post about wrapping DateTimes in our code.
I think the original intention of this method was to create a string representation of an object, but its use has been overloaded by developers to the point where its expected use is as a mechanism for creating nice output when debugging the code or viewing unit test failures.</description>
    </item>
    
    <item>
      <title>C#: Wrapping DateTime</title>
      <link>https://www.markhneedham.com/blog/2009/02/25/c-wrapping-datetime/</link>
      <pubDate>Wed, 25 Feb 2009 23:12:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/25/c-wrapping-datetime/</guid>
      <description>I think it was Darren Hobbs who first introduced me to the idea of wrapping dates in our system to describe what that date actually means in our context, and after suffering the pain of passing some unwrapped dates around our code I think I can safely say that wrapping them is the way to go.
The culprit was a date of birth which was sometimes being created from user input and sometimes being retrieved from another system.</description>
    </item>
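A minimal sketch of the idea in this post, using a hypothetical DateOfBirth type (the class name and its AgeOn method are illustrative, not from the original article):

```csharp
using System;

// Wrap the raw DateTime in a type that names what the date means in our
// domain. DateOfBirth and AgeOn are hypothetical names for illustration.
public class DateOfBirth
{
    private readonly DateTime value;

    public DateOfBirth(DateTime value)
    {
        this.value = value;
    }

    // Behaviour that belongs with the concept lives on the wrapper.
    public int AgeOn(DateTime date)
    {
        var age = date.Year - value.Year;
        // Haven't had the birthday yet this year? Knock a year off.
        if (value.AddYears(age) > date) age--;
        return age;
    }
}
```

A method that takes a DateOfBirth can no longer be handed an arbitrary DateTime (a start date, a timestamp) by mistake, which is the point the post makes about unwrapped dates.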
    
    <item>
      <title>C#: Wrapping collections vs Extension methods</title>
      <link>https://www.markhneedham.com/blog/2009/02/23/c-wrapping-collections-vs-extension-methods/</link>
      <pubDate>Mon, 23 Feb 2009 20:24:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/23/c-wrapping-collections-vs-extension-methods/</guid>
      <description>Another interesting thing I’ve noticed in C# world is that there seems to be a trend towards using extension methods as much as possible. One area where this is particularly prevalent is when working with collections.
From reading Object Calisthenics and working with Nick I have got used to wrapping collections and defining methods on the wrapped class for interacting with the underlying collection.
For example, given that we have a collection of Foos that we need to use in our system we might wrap that in an object Foos.</description>
    </item>
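As a rough sketch of the two styles being contrasted (Foo, Foos and WithValueOver are hypothetical names, and arrays stand in for whatever collection the real code used):

```csharp
using System;

public class Foo
{
    public int Value { get; set; }
}

// Object Calisthenics style: wrap the collection and put behaviour on the wrapper.
public class Foos
{
    private readonly Foo[] foos;

    public Foos(Foo[] foos)
    {
        this.foos = foos;
    }

    public Foos WithValueOver(int threshold)
    {
        return new Foos(Array.FindAll(foos, f => f.Value > threshold));
    }

    public int Count()
    {
        return foos.Length;
    }
}

// Extension method style: the same behaviour hangs off any Foo array.
public static class FooExtensions
{
    public static Foo[] WithValueOver(this Foo[] foos, int threshold)
    {
        return Array.FindAll(foos, f => f.Value > threshold);
    }
}
```

The wrapper gives the collection a home for domain behaviour; the extension method leaves it a bare collection that any caller can operate on - which is the trade off the post discusses.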
    
    <item>
      <title>C#: Implicit Operator</title>
      <link>https://www.markhneedham.com/blog/2009/02/22/c-implicit-operator/</link>
      <pubDate>Sun, 22 Feb 2009 22:20:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/22/c-implicit-operator/</guid>
<description>Since it was pointed out in the comments on an earlier post I wrote about the builder pattern that the implicit operator could be useful in this context, we’ve been using it wherever it makes sense.
The main benefit that using this approach provides is that our test code becomes more expressive since we don’t need to explicitly call a method to complete the building of our object.</description>
    </item>
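A small sketch of the technique, with hypothetical Foo/FooBuilder names - the implicit conversion operator is what removes the explicit Build() call:

```csharp
using System;

public class Foo
{
    public string Bar { get; private set; }

    public Foo(string bar)
    {
        Bar = bar;
    }
}

public class FooBuilder
{
    private string bar = "default bar";

    public FooBuilder WithBar(string bar)
    {
        this.bar = bar;
        return this;
    }

    // The conversion operator completes the build when the builder is
    // assigned to a Foo, so test code never calls a Build() method.
    public static implicit operator Foo(FooBuilder builder)
    {
        return new Foo(builder.bar);
    }
}
```

Usage is simply Foo foo = new FooBuilder().WithBar("my bar"); - the assignment itself triggers the conversion, which is what makes test setup read so cleanly.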
    
    <item>
      <title>ASP.NET MVC: Driving partials by convention</title>
      <link>https://www.markhneedham.com/blog/2009/02/21/aspnet-mvc-driving-partials-by-convention/</link>
      <pubDate>Sat, 21 Feb 2009 10:39:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/21/aspnet-mvc-driving-partials-by-convention/</guid>
<description>I like to have conventions in the code I write - I find it makes the code much cleaner while still providing flexibility.
One of the conventions that Jeremy Miller coined for working with ASP.NET MVC applications is that of using one model per controller method aka &amp;#34;The Thunderdome principle&amp;#34;. I think we can take this further by having one model per partial that we use inside our views.</description>
    </item>
    
    <item>
      <title>Coding Dojo #10: Isola III</title>
      <link>https://www.markhneedham.com/blog/2009/02/19/coding-dojo-10-isola-iii/</link>
      <pubDate>Thu, 19 Feb 2009 23:09:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/19/coding-dojo-10-isola-iii/</guid>
      <description>In our latest coding dojo we continued working on Isola with a focus on adding functionality following on from last week’s refactoring effort.
The Format
We used the Randori approach with four people participating for the whole session.
What We Learnt
Our real aim for this session was to try and get the code into a state where we could reject an invalid move i.e. a move to a square that wasn’t adjacent to the one the player was currently on.</description>
    </item>
    
    <item>
      <title>C#: Extension methods != Open classes</title>
      <link>https://www.markhneedham.com/blog/2009/02/19/c-extensions-methods-open-classes/</link>
      <pubDate>Thu, 19 Feb 2009 06:22:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/19/c-extensions-methods-open-classes/</guid>
<description>When I first heard about extension methods in C# it sounded like a pretty cool idea but I wasn’t sure how they differed from the idea of open classes that I had seen when doing a bit of Ruby.
After a bit of a struggle recently to try and override some extension methods on HtmlHelper in ASP.NET MVC it’s clear to me that we don’t quite have the same power that open classes would provide.</description>
    </item>
    
    <item>
      <title>Collective Code Ownership: Some Thoughts</title>
      <link>https://www.markhneedham.com/blog/2009/02/17/collective-code-ownership-some-thoughts/</link>
      <pubDate>Tue, 17 Feb 2009 22:32:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/17/collective-code-ownership-some-thoughts/</guid>
      <description>Collective code ownership is one of the things we practice on projects using extreme programming and Mike Bria’s post on the subject makes me wonder if code ownership exists on more than one level.
Kent Beck’s definition of collective code ownership is that
Anyone can change anything at anytime
Mike also gives an alternative definition which goes beyond that:
From a more measurable POV, CoCO states that everyone on the team (developer-wise) must be able to describe the design of anything the team is working on in no more than 5 minutes.</description>
    </item>
    
    <item>
      <title>C#: Object Initializer and The Horse Shoe</title>
      <link>https://www.markhneedham.com/blog/2009/02/16/c-object-initializer-and-the-horse-shoe/</link>
      <pubDate>Mon, 16 Feb 2009 22:04:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/16/c-object-initializer-and-the-horse-shoe/</guid>
      <description>The object initializer syntax introduced in C# 3.0 makes it easier for us to initialise our objects in one statement but I think we need to remember that they are not named parameters and that there is still a place (a very good one actually) for creating objects from constructors or factory methods.
Unfortunately what I think the cleaner syntax does is encourage us to create objects with half the fields populated and half of them null by default.</description>
    </item>
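A minimal sketch of the contrast (Customer and its fields are hypothetical): if the class had a parameterless constructor and settable properties, the initializer syntax would happily leave unset properties null, while a constructor makes the required fields explicit and fails fast.

```csharp
using System;

public class Customer
{
    public string Name { get; private set; }
    public string Address { get; private set; }

    // Had Customer exposed settable properties instead, an initializer such
    // as new Customer { Name = "Mark" } would silently leave Address null.
    // This constructor forces both fields to be supplied up front.
    public Customer(string name, string address)
    {
        if (name == null) throw new ArgumentNullException("name");
        if (address == null) throw new ArgumentNullException("address");
        Name = name;
        Address = address;
    }
}
```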
    
    <item>
      <title>Encoding user entered data</title>
      <link>https://www.markhneedham.com/blog/2009/02/15/encoding-user-entered-data/</link>
      <pubDate>Sun, 15 Feb 2009 01:46:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/15/encoding-user-entered-data/</guid>
      <description>I previously wrote about protecting websites from cross site scripting in the ASP.NET MVC framework by encoding user input when we are going to display it in the browser.
We can either choose to encode data like this or we can encode it straight away when we get it.
There did not seem to be a consensus on the best approach in a discussion on the ASP.NET forums but we believe it is far better to encode the data when it is outgoing rather than incoming.</description>
    </item>
    
    <item>
      <title>Coding: Assertions in constructors</title>
      <link>https://www.markhneedham.com/blog/2009/02/14/coding-assertions-in-constructors/</link>
      <pubDate>Sat, 14 Feb 2009 01:32:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/14/coding-assertions-in-constructors/</guid>
      <description>While browsing through the ASP.NET MVC source I noticed that they use an interesting pattern on the constructors to ensure that an exception will be thrown if an object is not instantiated correctly.
public ControllerContext(HttpContextBase httpContext, RouteData routeData, ControllerBase controller) : base(httpContext, routeData)
{
    if (controller == null)
    {
        throw new ArgumentNullException(&amp;#34;controller&amp;#34;);
    }
    Controller = controller;
}

If you pass in a null Controller you shall go no further!</description>
    </item>
    
    <item>
      <title>Ferengi Programmer and the Dreyfus Model</title>
      <link>https://www.markhneedham.com/blog/2009/02/13/ferengi-programmer-and-the-dreyfus-model/</link>
      <pubDate>Fri, 13 Feb 2009 00:01:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/13/ferengi-programmer-and-the-dreyfus-model/</guid>
<description>I’ve been reading Jeff Atwood’s post regarding Joel’s comments on the podcast about Uncle Bob’s SOLID principles, and what struck me as I read through his dislike of having too many rules and guidelines is that there is a misunderstanding of how we should use these rules - I think the Dreyfus Model might help clear this up.
To briefly recap the different levels of the Dreyfus Model (you can read more about this in Pragmatic Thinking and Learning)</description>
    </item>
    
    <item>
      <title>ASP.NET MVC: Preventing XSS attacks</title>
      <link>https://www.markhneedham.com/blog/2009/02/12/aspnet-mvc-preventing-xss-attacks/</link>
      <pubDate>Thu, 12 Feb 2009 22:47:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/12/aspnet-mvc-preventing-xss-attacks/</guid>
<description>XSS (cross site scripting) attacks on websites seem to be quite popular these days but luckily if you’re working with the ASP.NET MVC framework Steve Sanderson has written a great post on how to protect yourself from them.
The solution Steve details works the opposite way to other solutions I have heard for this problem - we assume that everything that goes to the browser needs to be HTML encoded unless otherwise stated.</description>
    </item>
    
    <item>
      <title>Coding Dojo #9: Refactoring Isola</title>
      <link>https://www.markhneedham.com/blog/2009/02/12/coding-dojo-9-refactoring-isola/</link>
      <pubDate>Thu, 12 Feb 2009 21:46:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/12/coding-dojo-9-refactoring-isola/</guid>
      <description>Our latest coding dojo involved refactoring the code we wrote a couple of weeks ago for the board game Isola.
We started a repository on Bit Bucket to store our code from these sessions.
The Format
We used the Randori approach again with four people participating for the whole session.
What We Learnt
Last time we had spent most of our time purely making the code functional so all the objects were completely mutable.</description>
    </item>
    
    <item>
      <title>C#: Properties vs Methods</title>
      <link>https://www.markhneedham.com/blog/2009/02/11/c-properties-vs-methods/</link>
      <pubDate>Wed, 11 Feb 2009 11:20:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/11/c-properties-vs-methods/</guid>
      <description>I was browsing through our tests today and noticed a test along these lines (simplified for example purposes):
[Test, ExpectedException(typeof(Exception))]
public void ShouldThrowExceptionIfNoBarSet()
{
    var bar = new Foo(null).Bar;
}

public class Foo
{
    private readonly string bar;

    public Foo(string bar)
    {
        this.bar = bar;
    }

    public string Bar
    {
        get
        {
            if (bar == null)
            {
                throw new Exception(&amp;#34;No bar&amp;#34;);
            }
            return bar;
        }
    }
}

What I found strange here is that &amp;#39;bar&amp;#39; is never used and Resharper points out as much.</description>
    </item>
    
    <item>
      <title>Agile: Re-estimating cards</title>
      <link>https://www.markhneedham.com/blog/2009/02/11/agile-re-estimating-cards/</link>
      <pubDate>Wed, 11 Feb 2009 07:25:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/11/agile-re-estimating-cards/</guid>
      <description>Chris Johnston has another interesting post in which he writes about the practice of re-estimating cards after they have been completed.
I think this somewhat misses the point that the estimate is indeed supposed to be an estimate. It might turn out to be too optimistic or too pessimistic, the theory being that overall we will end up with a reasonable balance that will allow us to make a prediction on how much work we believe we can complete in a certain time period.</description>
    </item>
    
    <item>
      <title>Agile: What is it?</title>
      <link>https://www.markhneedham.com/blog/2009/02/09/agile-what-is-it/</link>
      <pubDate>Mon, 09 Feb 2009 17:06:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/09/agile-what-is-it/</guid>
      <description>My colleague Chris Johnston wrote recently about his experiences in agile software development, posing some questions that he has.
Specifically:
Why are comments evil?
Why is design evil?
Why must you pair all the time?
Why do Agile principles become Agile rules?
Now I’m assuming that most (if not all) of Chris&amp;#39; experiences with agile have been at ThoughtWorks, in which case the mix of agile we use on our projects tends to be a combination of Scrum and Extreme Programming.</description>
    </item>
    
    <item>
      <title>OOP: What does an object&#39;s responsibility entail?</title>
      <link>https://www.markhneedham.com/blog/2009/02/09/oop-what-does-an-objects-responsibility-entail/</link>
      <pubDate>Mon, 09 Feb 2009 16:52:10 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/09/oop-what-does-an-objects-responsibility-entail/</guid>
      <description>One of the interesting discussions I’ve been having recently with some colleagues is around where the responsibility lies for describing the representation of an object when it is to be used in another bounded context - e.g. on the user interface or in a call to another system.
I believe that an object should be responsible for deciding how its data is used rather than having another object reach into it, retrieve its data and then decide what to do with it.</description>
    </item>
    
    <item>
      <title>Quality is what I work for</title>
      <link>https://www.markhneedham.com/blog/2009/02/09/quality-is-what-i-work-for/</link>
      <pubDate>Mon, 09 Feb 2009 16:51:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/09/quality-is-what-i-work-for/</guid>
      <description>I’ve been reading the transcript of Joel Spolsky/Jeff Atwood’s podcast discussion on TDD/Quality and related posts on the subject by Uncle Bob and Ron Jeffries and while I guess it’s fairly inevitable that I’m likely to side with the latter two, what I’ve realised is that I get the greatest enjoyment from my job when we are writing high quality software.
Certainly delivering value to customers in a timely manner is important but if we’re not producing something that we’re proud to have written then I think we’re doing ourselves and our customer a disservice.</description>
    </item>
    
    <item>
      <title>Refactoring: Comment it out vs small steps removal</title>
      <link>https://www.markhneedham.com/blog/2009/02/08/refactoring-comment-it-out-vs-small-steps-removal/</link>
      <pubDate>Sun, 08 Feb 2009 09:10:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/08/refactoring-comment-it-out-vs-small-steps-removal/</guid>
      <description>One refactoring I was doing last week was to try and remove the use of some getters/setters on one of our objects so that it was better encapsulated and all the behaviour related to it happened in one place.
The change involved introducing a constructor to initialise the object, rather than creating it with the new object initialiser syntax and initialising it via the properties.
My initial approach was to find all the usages of these properties and then remove each usage one by one, running our suite of tests against the code after each change to ensure that nothing had broken as a result of the change.</description>
    </item>
    
    <item>
      <title>Agile: Why do we integrate early?</title>
      <link>https://www.markhneedham.com/blog/2009/02/06/agile-why-do-we-integrate-early/</link>
      <pubDate>Fri, 06 Feb 2009 16:47:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/06/agile-why-do-we-integrate-early/</guid>
<description>One of the inevitabilities of most projects is that at some stage there is going to need to be some sort of integration.
The likes of Alistair Cockburn in Crystal Clear and Andy Hunt/Dave Thomas in The Pragmatic Programmer talk of the need to do integration early rather than letting it wait until later, but why?
Get the pain out of the way
To some degree every time we try to integrate there is going to be some level of pain - for me it therefore makes sense to take this pain early on when we have the chance to do something about it, rather than leaving it until later and being surprised at the problems it causes.</description>
    </item>
    
    <item>
      <title>C#: Public fields vs automatic properties </title>
      <link>https://www.markhneedham.com/blog/2009/02/04/c-public-fields-vs-automatic-properties/</link>
      <pubDate>Wed, 04 Feb 2009 17:52:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/04/c-public-fields-vs-automatic-properties/</guid>
<description>An interesting new feature in C# 3.0 is that of automatic properties on objects - this allows us to define a get/set property and the creation of the underlying field is taken care of for us.
We can therefore create a class like this:
public class Foo
{
    public string Bar { get; set; }
}

Now ignoring the fact that it’s terrible OO to write a class like that, one thing we’ve been wondering is what the difference is between doing the above and just creating a public field on Foo called Bar, like so:</description>
    </item>
    
    <item>
      <title>Nant include task - namespace matters</title>
      <link>https://www.markhneedham.com/blog/2009/02/03/nant-include-task-namespace-matters/</link>
      <pubDate>Tue, 03 Feb 2009 10:43:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/03/nant-include-task-namespace-matters/</guid>
      <description>We’ve been trying to include some properties into our build file from a properties file today but no matter what we tried the properties were not being set.
We eventually realised that the build file has an XML Namespace set on the project element.
&amp;lt;project name=&amp;#34;...&amp;#34; xmlns=&amp;#34;http://nant.sf.net/schemas/nant.xsd&amp;#34;&amp;gt;

It turns out that if you want to include a properties file in your build file, like so:

&amp;lt;include buildfile=&amp;#34;properties.xml&amp;#34; /&amp;gt;

…you need to put the namespace on the project element of that file as well, otherwise its properties don’t get picked up.</description>
    </item>
    
    <item>
      <title>C#: Refactoring to functional collection parameters</title>
      <link>https://www.markhneedham.com/blog/2009/02/03/c-refactoring-to-functional-collection-parameters/</link>
      <pubDate>Tue, 03 Feb 2009 07:18:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/02/03/c-refactoring-to-functional-collection-parameters/</guid>
      <description>I wrote about a month or so ago about the functional collection parameters now available in C# and certainly one of the most fun refactorings for me is trying to get code written using a for loop into a state where it is using one of these.
With a bit of help from my colleague James Crisp, these are some of the most common refactorings that I have come across so far.</description>
    </item>
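A sketch of the kind of refactoring being described, using hypothetical Person data and Array.FindAll/Array.ConvertAll as stand-ins for the functional collection parameters (the original post works with the C# 3.0 LINQ operators):

```csharp
using System;
using System.Collections;

public class Person
{
    public string Name { get; set; }
    public int Age { get; set; }
}

public static class Refactoring
{
    // Before: accumulate the results by hand in a loop.
    public static string[] AdultNamesImperative(Person[] people)
    {
        var names = new ArrayList();
        foreach (var person in people)
        {
            if (person.Age >= 18)
            {
                names.Add(person.Name);
            }
        }
        return (string[])names.ToArray(typeof(string));
    }

    // After: the same logic as a filter followed by a projection.
    public static string[] AdultNamesFunctional(Person[] people)
    {
        var adults = Array.FindAll(people, p => p.Age >= 18);
        return Array.ConvertAll(adults, p => p.Name);
    }
}
```

Both versions return the same names; the functional one states the intent (filter, then project) rather than the mechanics of the loop.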
    
    <item>
      <title>Coding Dojo #8: Isola</title>
      <link>https://www.markhneedham.com/blog/2009/01/30/coding-dojo-8-isola/</link>
      <pubDate>Fri, 30 Jan 2009 11:17:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/30/coding-dojo-8-isola/</guid>
      <description>Our latest coding dojo involved writing the board game Isola in Java.
The Format
We used the Randori approach again with around 8 or 9 people participating for the majority of the session, our biggest turnout yet. I think the majority of people had the opportunity to drive a couple of times over the evening.
We had the pair driving at the front of the room and everyone else further back to stop the tendency of observers to whiteboard stuff.</description>
    </item>
    
    <item>
      <title>TDD: Test DRYness</title>
      <link>https://www.markhneedham.com/blog/2009/01/30/tdd-test-dryness/</link>
      <pubDate>Fri, 30 Jan 2009 11:16:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/30/tdd-test-dryness/</guid>
<description>I had a discussion recently with Fabio about DRYness in our tests and how we don’t tend to adhere to this principle as often in test code as in production code.
I think certainly some of the reason for this is that we don’t take as much care of our test code as we do production code but for me at least some of it is down to the fact that if we make our tests too DRY then they become very difficult to read and perhaps more importantly, very difficult to debug when there is a failure.</description>
    </item>
    
    <item>
      <title>TDD: Design tests for failure</title>
      <link>https://www.markhneedham.com/blog/2009/01/28/tdd-design-tests-for-failure/</link>
      <pubDate>Wed, 28 Jan 2009 00:48:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/28/tdd-design-tests-for-failure/</guid>
<description>As with most code, tests are read many more times than they are written, and since the majority of the time the reason for reading them is to identify a test failure, I think it makes sense that we should design our tests with failure in mind.
Several ideas come to mind when thinking about ways to write/design our tests so that when we do have to read them our task is made easier.</description>
    </item>
    
    <item>
      <title>Learning alone or Learning together</title>
      <link>https://www.markhneedham.com/blog/2009/01/25/learning-alone-or-learning-together/</link>
      <pubDate>Sun, 25 Jan 2009 23:00:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/25/learning-alone-or-learning-together/</guid>
      <description>One of the things that I have been curious about since we started running coding dojos is whether people learn more effectively alone or when learning as part of a group.
Not that I think they are mutually exclusive, I think a combination of both is probably the way to go depending on what it is we are trying to learn and the way that we’re trying to learn it.</description>
    </item>
    
    <item>
      <title>jQuery: Approaches to testing</title>
      <link>https://www.markhneedham.com/blog/2009/01/24/jquery-approaches-to-testing/</link>
      <pubDate>Sat, 24 Jan 2009 09:36:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/24/jquery-approaches-to-testing/</guid>
      <description>We’ve been doing a bit of work with jQuery and true to our TDD roots we’ve been trying to work out the best way to test drive our coding in this area.
There seem to be 3 main ways that you can go about doing this, regardless of the testing framework you choose to use. We are using screw-unit for our javascript testing.
Mock everything out
The idea here is that we mock out all calls made to jQuery functions and then we assert that the expected calls were made in our test.</description>
    </item>
    
    <item>
      <title>Coding Dojo #7: Retlang/Hamcrest .NET attempt</title>
      <link>https://www.markhneedham.com/blog/2009/01/22/coding-dojo-7-retlanghamcrest-net-attempt/</link>
      <pubDate>Thu, 22 Jan 2009 23:02:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/22/coding-dojo-7-retlanghamcrest-net-attempt/</guid>
      <description>We ran a sort of coding dojo/playing around session which started with us looking at the .NET concurrency library, Retlang, and ended with an attempt to write Hamcrest style assertions in C#.
The Format
We had the same setup as for our normal coding dojos with two people at the keyboard, although we didn’t rotate as aggressively as normal.
What We Learnt
We started off having a look at a concurrency problem in Cruise Control .</description>
    </item>
    
    <item>
      <title>C#: Builder pattern still useful for test data</title>
      <link>https://www.markhneedham.com/blog/2009/01/21/c-builder-pattern-still-useful-for-test-data/</link>
      <pubDate>Wed, 21 Jan 2009 23:49:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/21/c-builder-pattern-still-useful-for-test-data/</guid>
<description>I had thought that the new object initializer syntax in C# 3.0 meant the builder pattern was no longer necessary, but some recent refactoring efforts have made me believe otherwise.
My original thought was that the builder pattern was really useful for providing a nicely chained way of creating objects, but after a bit of discussion with some colleagues I have come across three different reasons why we might want to use the builder pattern to create test data:</description>
    </item>
    
    <item>
      <title>Coding: Contextual learning</title>
      <link>https://www.markhneedham.com/blog/2009/01/21/coding-contextual-learning/</link>
      <pubDate>Wed, 21 Jan 2009 06:42:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/21/coding-contextual-learning/</guid>
      <description>While reading my colleague’s notes on a brown bag session on pair programming she gave I was reminded of my belief that we learn much more effectively when we are learning in a practical environment.
The bit that interested me was this bit regarding onboarding:
On board new team members to bring them up to speed on the overall goal and design, so you do not need to repeat basic details when you work with them on a story.</description>
    </item>
    
    <item>
      <title>Cruise: Pipelining for fast visual feedback</title>
      <link>https://www.markhneedham.com/blog/2009/01/19/cruise-pipelining-for-fast-visual-feedback/</link>
      <pubDate>Mon, 19 Jan 2009 21:38:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/19/cruise-pipelining-for-fast-visual-feedback/</guid>
      <description>One of the cool features in build servers like Cruise and Team City is the ability to create build pipelines.
I have done a bit of work using this feature in previous projects but the key driver for doing so there was to create a chain of producers/consumers (producing and consuming artifacts) eventually resulting in a manual step to put the application into a testing environment.
While this is certainly a good reason to create a build pipeline, a colleague pointed out an equally useful way of using this feature to split the build into separate steps pipelined together.</description>
    </item>
    
    <item>
      <title>F# vs C# vs Java: Functional Collection Parameters</title>
      <link>https://www.markhneedham.com/blog/2009/01/19/f-vs-c-vs-java-functional-collection-parameters/</link>
      <pubDate>Mon, 19 Jan 2009 19:24:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/19/f-vs-c-vs-java-functional-collection-parameters/</guid>
      <description>I wrote a post about a month ago on using functional collection parameters in C# and over the weekend Fabio and I decided to try and contrast the way you would do this in Java, C# and then F# just for fun.
Map Map evaluates a higher-order function on all the elements in a collection and then returns a new collection containing the results of the function evaluation.</description>
    </item>
    
    <item>
      <title>YAGNI: Some thoughts</title>
      <link>https://www.markhneedham.com/blog/2009/01/17/yagni-some-thoughts/</link>
      <pubDate>Sat, 17 Jan 2009 21:01:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/17/yagni-some-thoughts/</guid>
      <description>If you hang around a team practicing XP for long enough, one of the phrases you are bound to hear is YAGNI (You Ain’t Gonna Need It).
Although it can sometimes be used to ignore things we don’t want to focus on as Ian points out, in general the aim is to stop people from working on code that isn’t currently required.
So assuming our team isn’t being lazy and trying to avoid decisions that they don’t want to think about, why do we hear the YAGNI call and, perhaps more importantly, what happens when we don’t heed that call?</description>
    </item>
    
    <item>
      <title>The danger of commenting out code</title>
      <link>https://www.markhneedham.com/blog/2009/01/17/the-danger-of-commenting-out-code/</link>
      <pubDate>Sat, 17 Jan 2009 16:02:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/17/the-danger-of-commenting-out-code/</guid>
      <description>An idea which is considered common sense by most developers but which is not always adhered to is that of not commenting out code.
Code is nearly always under source control anyway so commenting out code which is not being used doesn’t really serve any positive purpose and it can have quite a few negative effects.
Clutter Ideally we should be able to read through the code without too much confusion - each method’s name being descriptive enough that we can work out what is going on.</description>
    </item>
    
    <item>
      <title>Coding Dojo #6: Web Driver</title>
      <link>https://www.markhneedham.com/blog/2009/01/15/coding-dojo-6-web-driver/</link>
      <pubDate>Thu, 15 Jan 2009 00:37:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/15/coding-dojo-6-web-driver/</guid>
      <description>We ran a sort of coding dojo/more playing around with web driver learning session this evening, coding some tests in Java driving Planet TW from the code.
The Format We had the same setup as for our normal coding dojos but only one person was driving at a time and the others were watching from around them offering tips on different approaches. I think only a couple of us drove during the session.</description>
    </item>
    
    <item>
      <title>F#: Partial Function Application with the Function Composition Operator</title>
      <link>https://www.markhneedham.com/blog/2009/01/12/f-partial-function-application-with-the-function-composition-operator/</link>
      <pubDate>Mon, 12 Jan 2009 22:22:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/12/f-partial-function-application-with-the-function-composition-operator/</guid>
      <description>In my continued reading of F# one of the ideas I’ve come across recently is that of partial function application.
This is a way of allowing us to combine different functions together and allows some quite powerful syntax to be written.
The term &amp;#39;currying&amp;#39; is perhaps a better known term for describing this although, as I understand it, they are not exactly the same.
Currying is where we return a function that has been partially applied, in such a way that we can chain together a group of functions with a single argument.</description>
    </item>
    
    <item>
      <title>How does the user language fit in with the ubiquitous language?</title>
      <link>https://www.markhneedham.com/blog/2009/01/10/how-does-the-user-language-fit-in-with-the-ubiquitous-language/</link>
      <pubDate>Sat, 10 Jan 2009 15:38:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/10/how-does-the-user-language-fit-in-with-the-ubiquitous-language/</guid>
      <description>We’ve been doing some work this week around trying to ensure that we have a ubiquitous language to describe aspects of the domain across the various different systems on my project.
It’s not easy as there are several different teams involved but one thing we realised while working on the language is that the language of the business is not the same as the language of the user.
Although this is the first time that I recall working on a project where the language of the user is different to the language of the domain I’m sure there must be other domains where this is the case as well.</description>
    </item>
    
    <item>
      <title>Finding the value in fixing technical debt</title>
      <link>https://www.markhneedham.com/blog/2009/01/10/finding-the-value-in-fixing-technical-debt/</link>
      <pubDate>Sat, 10 Jan 2009 14:04:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/10/finding-the-value-in-fixing-technical-debt/</guid>
      <description>Technical debt is a term coined by Martin Fowler which we tend to use on our projects to describe a number of different situations on projects as Ian Cartwright points out in his post on the subject.
Ian covers it in more detail, but to summarise my understanding of what technical debt actually is:
Technical debt is where we know that something we choose not to take care of now is going to affect us in the future.</description>
    </item>
    
    <item>
      <title>Coding Dojo #5: Uno</title>
      <link>https://www.markhneedham.com/blog/2009/01/08/coding-dojo-5-uno/</link>
      <pubDate>Thu, 08 Jan 2009 23:41:57 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/08/coding-dojo-5-uno/</guid>
      <description>We ran our 5th coding dojo on Thursday night, writing the card game Uno in Java. We didn’t all know the rules so this video explained it - surely a parody but you never know!
The Format We used the Randori approach again with 6 people participating for the majority of the session. Everyone paired with everyone else at least once and sometimes a couple of times.
We had the pair driving at the front of the room and everyone else further back to stop the tendency of observers to whiteboard stuff.</description>
    </item>
    
    <item>
      <title>Javascript Dates - Be aware of mutability</title>
      <link>https://www.markhneedham.com/blog/2009/01/07/javascript-dates-be-aware-of-mutability/</link>
      <pubDate>Wed, 07 Jan 2009 23:17:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/07/javascript-dates-be-aware-of-mutability/</guid>
      <description>It seems that much like in Java, dates in Javascript are mutable, meaning that it is possible to change a date after it has been created.
We had this painfully shown to us when using the datejs library to manipulate some dates.
The erroneous code was similar to this:
var jan312009 = new Date(2008, 1-1, 31); var oneMonthFromJan312009 = new Date(jan312009.add(1).month()); See the subtle error? Outputting these two values gives the following:</description>
    </item>
    
    <item>
      <title>Javascript: Add a month to a date</title>
      <link>https://www.markhneedham.com/blog/2009/01/07/javascript-add-a-month-to-a-date/</link>
      <pubDate>Wed, 07 Jan 2009 23:00:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/07/javascript-add-a-month-to-a-date/</guid>
      <description>We’ve been doing a bit of date manipulation in Javascript on my current project and one of the things that we wanted to do is add 1 month to a given date.
We can kind of achieve this using the standard date libraries but it doesn’t work for edge cases.
For example, say we want to add one month to January 31st 2009. We would expect one month from this date to be February 28th 2009:</description>
    </item>
    
    <item>
      <title>Outliers: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/01/06/outliers-book-review/</link>
      <pubDate>Tue, 06 Jan 2009 23:23:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/06/outliers-book-review/</guid>
      <description>The Book Outliers by Malcolm Gladwell
The Review I came across this book following recommendations by Jason Yip and Steven &amp;#39;Doc&amp;#39; List on Twitter.
I’ve previously read The Tipping Point and Blink and I like his easy going style so it was a no brainer that I was going to read this one.
I found that this book complemented Talent is Overrated quite nicely. Outliers covers the story of how people became the best at what they do whereas Talent is Overrated focuses more on what you need to do if you want to become one of these people.</description>
    </item>
    
    <item>
      <title>jQuery datepicker IE6 positioning bug</title>
      <link>https://www.markhneedham.com/blog/2009/01/06/jquery-datepicker-ie6-positioning-bug/</link>
      <pubDate>Tue, 06 Jan 2009 21:57:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/06/jquery-datepicker-ie6-positioning-bug/</guid>
      <description>We’ve been using the jQuery datepicker on my current project and came across some strange behaviour with regards to the positioning of the calendar in IE6.
The calendar was always positioning itself right at the top of the screen instead of just below the textbox it was hooked up to but in Firefox it was working fine.
After a bit of exploration in the jQuery code (ui.datepicker.js) we worked out that the &amp;#39;document.</description>
    </item>
    
    <item>
      <title>F#: Forward Operator</title>
      <link>https://www.markhneedham.com/blog/2009/01/06/f-forward-operator/</link>
      <pubDate>Tue, 06 Jan 2009 00:19:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/06/f-forward-operator/</guid>
      <description>Continuing on my F# journey I came across a post by Ben Hall describing the approach he takes when learning a new programming language.
One of the approaches he describes is that of writing unit tests to help keep your learning on track. I’ve only been using the F# interactive console so far so I thought I’d give it a try.
After reading about the somewhat convoluted approach required to use NUnit or MBUnit to write F# unit tests I came across XUnit.</description>
    </item>
    
    <item>
      <title>Agile: When is a story done?</title>
      <link>https://www.markhneedham.com/blog/2009/01/04/agile-when-is-a-story-done/</link>
      <pubDate>Sun, 04 Jan 2009 22:17:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/04/agile-when-is-a-story-done/</guid>
      <description>I’ve worked on a few different agile projects and one of the things that hasn’t been completely consistent is when we consider a story to be &amp;#39;done&amp;#39;.
There seem to be a few different approaches, each of which has its benefits and drawbacks.
Why do we care? We care about &amp;#39;done&amp;#39; for tracking the points we have achieved in an iteration and for knowing when we have added the value the story provides.</description>
    </item>
    
    <item>
      <title>F# Option Types</title>
      <link>https://www.markhneedham.com/blog/2009/01/02/f-option-types/</link>
      <pubDate>Fri, 02 Jan 2009 22:35:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/02/f-option-types/</guid>
      <description>I’ve been spending a bit of time working through the Real World Functional Programming book to learn a bit about F# and one of the cool features I came across today (while reading Chris Smith’s post on F# lists) is the Option type.
I first came across this idea a few months ago when discussing null handling strategies with a colleague who pointed out that you could get around this problem in Scala by using the Option class.</description>
    </item>
    
    <item>
      <title>2008: My Technical Review</title>
      <link>https://www.markhneedham.com/blog/2009/01/01/2008-my-technical-review/</link>
      <pubDate>Thu, 01 Jan 2009 09:28:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/01/01/2008-my-technical-review/</guid>
      <description>Others in the blogosphere seem to be doing 2008 round ups around about now so I thought I’d jump in on the action.
Project Overview I worked on 5 projects this year writing code in C# 2.0/3.0, Java and Ruby.
2 of the projects involved writing client-side code, 2 were web applications, and 1 involved writing services.
The domains I worked in were investment banking, insurance and an industrial automation system</description>
    </item>
    
    <item>
      <title>Agile: Some misconceptions</title>
      <link>https://www.markhneedham.com/blog/2008/12/31/agile-some-misconceptions/</link>
      <pubDate>Wed, 31 Dec 2008 09:04:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/31/agile-some-misconceptions/</guid>
      <description>I came across an interesting article written for Visual Studio Magazine about agile methodologies where the author makes what I consider to be some misconceptions.
The first is around the level of experience of people working on an agile team:
For example, agile teams have a tendency to require a high level of experience and professionalism just to join the team.
I wouldn’t say I have a high level of experience and I’ve been working on agile teams for the past two years - just one data point suggesting that this statement is not actually accurate.</description>
    </item>
    
    <item>
      <title>Oxite: Some Thoughts</title>
      <link>https://www.markhneedham.com/blog/2008/12/31/oxite-some-thoughts/</link>
      <pubDate>Wed, 31 Dec 2008 01:26:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/31/oxite-some-thoughts/</guid>
      <description>The recently released Oxite code base has taken a bit of a hammering in the blogosphere for a variety of reasons - the general feeling being that it doesn’t really serve as a particularly good example of an ASP.NET MVC application.
I was intrigued to read the code though - you can always learn something by doing so and reading code is one of the areas that I want to improve in.</description>
    </item>
    
    <item>
      <title>Talent is Overrated: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/12/29/talent-is-overrated-book-review/</link>
      <pubDate>Mon, 29 Dec 2008 20:52:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/29/talent-is-overrated-book-review/</guid>
      <description>The Book Talent is Overrated by Geoff Colvin
The Review I came across this book on Jason Yip’s Twitter feed while the idea of 10,000 hours to become an expert at any given skill was being discussed. I’m reading Outliers as well and the two books seem to complement each other quite well.
I’m interested in how we can apply deliberate practice in software development, perhaps using the medium of coding dojos, to become better developers in a more effective manner than just normal practice.</description>
    </item>
    
    <item>
      <title>Internal/External Domain Models</title>
      <link>https://www.markhneedham.com/blog/2008/12/28/internalexternal-domain-models/</link>
      <pubDate>Sun, 28 Dec 2008 00:19:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/28/internalexternal-domain-models/</guid>
      <description>One of the underlying characteristics of most of the projects I have worked on is that we have defined our own domain model.
On my current project due to the fact that most of the logic in the system is being handled through other services we decided to use WCF messages as the domain model, meaning that our domain model is being defined externally by the team defining the message contracts.</description>
    </item>
    
    <item>
      <title>C# lambdas: How much context should you need?</title>
      <link>https://www.markhneedham.com/blog/2008/12/27/c-lambdas-how-much-context-should-you-need/</link>
      <pubDate>Sat, 27 Dec 2008 23:15:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/27/c-lambdas-how-much-context-should-you-need/</guid>
      <description>I had an interesting discussion with a colleague last week about the names that we give to variables inside lambda expressions which got me thinking about the context that we should need to hold when reading code like this.
The particular discussion was around an example like this:
public class Foo { private String bar; private String baz; public Foo(String bar, String baz) { this.bar = bar; this.baz = baz; } public override string ToString() { return string.</description>
    </item>
    
    <item>
      <title>TDD: Does it make you slower?</title>
      <link>https://www.markhneedham.com/blog/2008/12/25/tdd-does-it-make-you-slower/</link>
      <pubDate>Thu, 25 Dec 2008 09:41:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/25/tdd-does-it-make-you-slower/</guid>
      <description>There have been several times where we have been writing code in a test driven way and it has been suggested that we would be able to go much quicker if we stopped writing the tests and just wrote the code.
I feel this is a very short term way of looking at the problem and it does eventually come back to haunt you.
One of the problems seems to be that in many organisations only the first release of a piece of software is considered, and in this case then yes maybe it would be quicker to develop code in a non TDD fashion.</description>
    </item>
    
    <item>
      <title>Testing First vs Testing Last</title>
      <link>https://www.markhneedham.com/blog/2008/12/22/testing-first-vs-testing-last/</link>
      <pubDate>Mon, 22 Dec 2008 21:39:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/22/testing-first-vs-testing-last/</guid>
      <description>I recently posted about my experiences of testing last where it became clear to me how important writing the test before the code is.
If we view the tests purely as a way of determining whether or not our code works correctly for a given set of examples then it doesn’t make much difference whether we test before or after we have written the code.
If on the other hand we want to get more value out of our tests, such as having the tests act as documentation, drive the design of our APIs and generally prove useful reading to ourselves and others in future, then a test first approach is the way to go.</description>
    </item>
    
    <item>
      <title>Try it and see what happens</title>
      <link>https://www.markhneedham.com/blog/2008/12/21/try-it-and-see-what-happens/</link>
      <pubDate>Sun, 21 Dec 2008 17:43:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/21/try-it-and-see-what-happens/</guid>
      <description>Another of the ideas I have picked up from my lean reading is that of trying things out without understanding exactly what is happening.
Or as The Toyota Way puts it…​
There are many things one doesn’t understand and therefore, we ask them why don’t you just go ahead and take action; try to do something?
This is an approach which several colleagues I have worked with recently have been encouraging me to follow.</description>
    </item>
    
    <item>
      <title>Lean Software Development: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/12/20/lean-software-development-book-review/</link>
      <pubDate>Sat, 20 Dec 2008 17:29:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/20/lean-software-development-book-review/</guid>
      <description>The Book Lean Software Development by Mary and Tom Poppendieck
The Review I’m keen to learn how the ideas from The Toyota Way can be applied to software development and as far as I know this is the first book which addressed this, hence the reason for me reading it.
What did I learn? I found the idea of financial based decisions particularly interesting - I’ve often had situations when developing software where there are trade offs to make and it would have been much easier to make them if we had a dollar value associated with each potential solution.</description>
    </item>
    
    <item>
      <title>TDD: Mock expectations in Setup</title>
      <link>https://www.markhneedham.com/blog/2008/12/19/tdd-mock-expectations-in-setup/</link>
      <pubDate>Fri, 19 Dec 2008 20:57:23 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/19/tdd-mock-expectations-in-setup/</guid>
      <description>One of the ideas that I mentioned in a recent post about what I consider to be a good unit test was the idea that we shouldn’t necessarily consider the DRY (Don’t Repeat Yourself) principle to be our number one driver.
I consider putting mock expectations in the setup methods of our tests to be one of those occasions where we shouldn’t obey this principle and I thought this would be fairly unanimously agreed upon but putting the question to the Twittersphere led to mixed opinions.</description>
    </item>
    
    <item>
      <title>Testing: What is a defect?</title>
      <link>https://www.markhneedham.com/blog/2008/12/18/testing-what-is-a-defect/</link>
      <pubDate>Thu, 18 Dec 2008 22:34:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/18/testing-what-is-a-defect/</guid>
      <description>One of the key ideas that I have learnt from my readings of The Toyota Way and Taiichi Ohno’s Workplace Management is that we should strive not to pass defects through the system to the next process, which you should consider to be your customer.
As a developer the next process for each story is the testing phase where the testers will (amongst other things) run through the acceptance criteria and then do some exploratory testing for scenarios which weren’t explicitly part of the acceptance criteria.</description>
    </item>
    
    <item>
      <title>Functional Collection Parameters in C#</title>
      <link>https://www.markhneedham.com/blog/2008/12/17/functional-collection-parameters-in-c/</link>
      <pubDate>Wed, 17 Dec 2008 22:13:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/17/functional-collection-parameters-in-c/</guid>
      <description>While talking through my understanding of the Select method which can be applied to collections in C# with a colleague, it became clear that C# doesn’t seem to use the same names for these types of operations as are used in the world of functional programming.
Coincidentally on the same day I came across Bill Six’s post about using functional collection parameters in Ruby, so I thought I’d see what the equivalent operations are in C#.</description>
    </item>
    
    <item>
      <title>Pair Programming: What works for me</title>
      <link>https://www.markhneedham.com/blog/2008/12/17/pair-programming-what-works-for-me/</link>
      <pubDate>Wed, 17 Dec 2008 22:09:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/17/pair-programming-what-works-for-me/</guid>
      <description>My colleague Chris Johnston recently posted about his experiences with pair programming, eventually ending up asking for other people’s experiences in doing so.
Several of my colleagues have replied citing some of their best practices and I have previously posted about what I think makes pair programming more effective so for this post I thought I’d try and also identify the approaches that make pair programming work for me.</description>
    </item>
    
    <item>
      <title>C#&#39;s Lambda ForEach: Only on Lists?</title>
      <link>https://www.markhneedham.com/blog/2008/12/15/cs-lamba-foreach-only-on-lists/</link>
      <pubDate>Mon, 15 Dec 2008 23:52:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/15/cs-lamba-foreach-only-on-lists/</guid>
      <description>One of my favourite things introduced into C# recently is the new ForEach method which can be applied to (apparently only!) lists.
Last week we had a situation where we wanted to make use of the ForEach method on an IDictionary which we were using to store a collection of Selenium clients.
IDictionary&amp;lt;string, ISelenium&amp;gt; seleniumClients = new Dictionary&amp;lt;string, ISelenium&amp;gt;(); We wanted to write a piece of code to exit all of the clients when our tests had completed.</description>
    </item>
    
    <item>
      <title>Environment matters a lot</title>
      <link>https://www.markhneedham.com/blog/2008/12/15/environment-matters-a-lot/</link>
      <pubDate>Mon, 15 Dec 2008 22:02:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/15/environment-matters-a-lot/</guid>
      <description>One of the discussions we had at the Alt.NET conference back in September was around how important the environment that you work in is to your self improvement as a software developer and it came up again in a discussion with some colleagues.
I posted previously about my software development journey so far but to add to that one of the most important things for me about working at ThoughtWorks is the environment that it has provided me to improve myself as a software developer.</description>
    </item>
    
    <item>
      <title>JUnit Theories: First Thoughts</title>
      <link>https://www.markhneedham.com/blog/2008/12/12/junit-theories-first-thoughts/</link>
      <pubDate>Fri, 12 Dec 2008 00:34:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/12/junit-theories-first-thoughts/</guid>
      <description>One of my favourite additions to JUnit 4.4 was the @Theory annotation which allows us to write parameterised tests rather than having to recreate the same test multiple times with different data values or creating one test and iterating through our own collection of data values.
Previously, as far as I’m aware, it was only possible to parameterise tests by using the TestNG library which has some nice ideas around grouping tests but had horrible reporting the last time I used it.</description>
    </item>
    
    <item>
      <title>Code for positive data values not negative</title>
      <link>https://www.markhneedham.com/blog/2008/12/11/code-for-positive-data-values-not-negative/</link>
      <pubDate>Thu, 11 Dec 2008 06:48:42 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/11/code-for-positive-data-values-not-negative/</guid>
      <description>While reading Pat Kua’s latest post about how coding a certain way can help you avoid certain classes of bugs I was reminded of a technique taught to me by a colleague with regards to writing functions/methods.
The idea is that it is more effective to code for positive data values rather than trying to work out all the possible negative combinations, since there are likely to be cases which we hadn’t considered if we do the latter.</description>
    </item>
    
    <item>
      <title>TDD: One test at a time</title>
      <link>https://www.markhneedham.com/blog/2008/12/09/tdd-one-test-at-a-time/</link>
      <pubDate>Tue, 09 Dec 2008 22:07:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/09/tdd-one-test-at-a-time/</guid>
      <description>My colleague Sarah Taraporewalla has written a series of posts recently about her experiences with TDD and introducing it at her current client.
While I agreed with the majority of the posts, one thing I found interesting was that in the conversation with a TDDer there were two tests being worked on at the same time (at least as far as I understand from the example).
This means that there will be two tests failing if we run our test suite, something which I try to avoid wherever possible.</description>
    </item>
    
    <item>
      <title>Javascript: Creating quick feedback loops</title>
      <link>https://www.markhneedham.com/blog/2008/12/09/javascript-creating-quick-feedback-loops/</link>
      <pubDate>Tue, 09 Dec 2008 21:13:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/09/javascript-creating-quick-feedback-loops/</guid>
      <description>I’ve been working quite a lot with Javascript and in particular jQuery recently and since I haven’t done much in this area before all the tips and tricks are new to me.
One thing which is always useful no matter the programming language is to use it in a way that you can get rapid feedback on what you are doing.
Fortunately there are quite a few tools that allow us to do this with Javascript:</description>
    </item>
    
    <item>
      <title>Taiichi Ohno&#39;s Workplace Management: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/12/09/taiichi-ohnos-workplace-management-book-review/</link>
      <pubDate>Tue, 09 Dec 2008 00:14:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/09/taiichi-ohnos-workplace-management-book-review/</guid>
      <description>The Book Taiichi Ohno’s Workplace Management by Taiichi Ohno
The Review Having completed The Toyota Way a few weeks ago I was speaking with Jason about what books were good to read next - he recommended this one and The Toyota Way Fieldbook.
I struggled to see a connection to software development with a lot of what I read, but there were certainly words of wisdom that we can apply to continuously improve our ability to deliver projects.</description>
    </item>
    
    <item>
      <title>Twitter as a learning tool</title>
      <link>https://www.markhneedham.com/blog/2008/12/07/twitter-as-a-learning-tool/</link>
      <pubDate>Sun, 07 Dec 2008 22:30:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/07/twitter-as-a-learning-tool/</guid>
      <description>About 8 or 9 months ago I remember having a conversation with a colleague where I asked him where he had got his almost encyclopedic knowledge of all things software development.
His reply at the time was that he read a lot of blogs and that this was where he had picked up a lot of the information.
While subscribing to different blogs remains a useful way of learning about different aspects of software development, I think Twitter is now becoming a very useful complementary tool to use alongside the RSS reader.</description>
    </item>
    
    <item>
      <title>Learning cycles</title>
      <link>https://www.markhneedham.com/blog/2008/12/07/learning-cycles/</link>
      <pubDate>Sun, 07 Dec 2008 11:40:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/07/learning-cycles/</guid>
      <description>I’ve noticed a recurring trend in the way that I learn new concepts which doesn’t seem to fit exactly into any of the models of learning that I have come across so far.
It seems to me to be a learning cycle which goes something like this:
Don’t know what is good and what’s bad
Learn what’s good and what’s bad but don’t know how to fix something that’s bad</description>
    </item>
    
    <item>
      <title>Dave Thomas on Managing Lean and Agile In Large Software Development</title>
      <link>https://www.markhneedham.com/blog/2008/12/05/dave-thomas-on-managing-lean-and-agile-in-large-software-development/</link>
      <pubDate>Fri, 05 Dec 2008 00:00:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/05/dave-thomas-on-managing-lean-and-agile-in-large-software-development/</guid>
      <description>No coding dojo update this week as Dave Thomas was in the ThoughtWorks Sydney office to talk about Managing Lean and Agile in Large Software Development.
It was actually a talk to the Geek Girls Sydney group but I sneaked in to hear his other talk after listening to the cloud computing one last week.
It was a much toned down presentation compared to the cloud computing one although still amusing in places.</description>
    </item>
    
    <item>
      <title>What makes a good unit test?</title>
      <link>https://www.markhneedham.com/blog/2008/12/04/what-make-a-good-unit-test/</link>
      <pubDate>Thu, 04 Dec 2008 00:31:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/04/what-make-a-good-unit-test/</guid>
      <description>Following on from my post around the definition of a unit test, a recent discussion on the Test Driven Development mailing list led me to question what my own approach is for writing unit tests.
To self quote from my previous post:
A well written unit test in my book should be simple to understand and run quickly.
Quite simple in theory but as I have learnt (and am still learning) the hard way, much harder to do in practice.</description>
    </item>
    
    <item>
      <title>jQuery Validation &amp; Firefox Refresh Behaviour</title>
      <link>https://www.markhneedham.com/blog/2008/12/02/jquery-validation-firefox-refresh-behaviour/</link>
      <pubDate>Tue, 02 Dec 2008 22:54:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/02/jquery-validation-firefox-refresh-behaviour/</guid>
      <description>We’ve been working quite a bit with jQuery and cross browser compatibility and one of the interesting differences we came across today was the behaviour of Firefox and Internet Explorer when it comes to refreshing a page.
When you press refresh in Internet Explorer the page is restored to the state it was in when you first loaded the URL, meaning that any data entered into forms is returned to its original state.</description>
    </item>
    
    <item>
      <title>What are your personal practices?</title>
      <link>https://www.markhneedham.com/blog/2008/12/02/what-are-your-personal-practices/</link>
      <pubDate>Tue, 02 Dec 2008 21:18:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/12/02/what-are-your-personal-practices/</guid>
      <description>I’ve been reviewing Apprenticeship Patterns over the last week or so and one of the cool ideas I came across is that of creating a Personal Practices Map.
The idea is that you draw up a list of your 10 most important practices for coding and design and draw out any relationships between them.
This is mine as of now:
I wouldn’t say I follow all of these all the time, but they are the practices that I try to follow whenever possible.</description>
    </item>
    
    <item>
      <title>TDD: If it&#39;s hard to test reflect on your approach</title>
      <link>https://www.markhneedham.com/blog/2008/11/30/tdd-if-its-hard-to-test-reflect-on-your-approach/</link>
      <pubDate>Sun, 30 Nov 2008 18:42:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/30/tdd-if-its-hard-to-test-reflect-on-your-approach/</guid>
      <description>Chad Myers gets it spot on in his recent post about not testing private methods - private methods are private because they should be inaccessible from outside the class and their functionality should be tested via one of the public methods that calls them.
I’ve found that when a piece of code seems really difficult to test without exposing a private method then we’re probably trying to test that functionality from the wrong place.</description>
    </item>
    
    <item>
      <title>Coding Dojo #4: Roman Numerals</title>
      <link>https://www.markhneedham.com/blog/2008/11/30/coding-dojo-4-roman-numerals/</link>
      <pubDate>Sun, 30 Nov 2008 17:58:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/30/coding-dojo-4-roman-numerals/</guid>
      <description>We ran our 4th coding dojo on Thursday night, attempting to solve the Roman Numerals problem from the TDD Problems website.
The Format We ran with the Randori approach again with between 4-6 participants taking part. We coded for about an hour and a half.
The coding pair sat at the front of the room this time in an attempt to keep the focus on the code, addressing a problem identified last week.</description>
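The dojo's problem can be solved with the classic greedy conversion; a minimal sketch of one possible solution (mine, not the code the group produced at the dojo):

```java
public class RomanNumerals {
    private static final int[] VALUES =
            {1000, 900, 500, 400, 100, 90, 50, 40, 10, 9, 5, 4, 1};
    private static final String[] SYMBOLS =
            {"M", "CM", "D", "CD", "C", "XC", "L", "XL", "X", "IX", "V", "IV", "I"};

    // Greedily append the largest symbol whose value still fits, then subtract it.
    public static String toRoman(int number) {
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < VALUES.length; i++) {
            while (number >= VALUES[i]) {
                result.append(SYMBOLS[i]);
                number -= VALUES[i];
            }
        }
        return result.toString();
    }

    public static void main(String[] args) {
        System.out.println(toRoman(1998)); // prints "MCMXCVIII"
    }
}
```

The subtractive pairs (CM, XC, IV, …) are included as entries in their own right, which keeps the loop free of special cases.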
    </item>
    
    <item>
      <title>Html.RadioButton setting all values to selected value workaround</title>
      <link>https://www.markhneedham.com/blog/2008/11/28/htmlradiobutton-setting-all-values-to-selected-value-workaround/</link>
      <pubDate>Fri, 28 Nov 2008 21:32:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/28/htmlradiobutton-setting-all-values-to-selected-value-workaround/</guid>
<description>While working with the Html.RadioButton() UI helper for ASP.NET MVC we came across an interesting problem whereby when you submitted the form, all the values for that particular group of radio buttons were set to the value of the one that was selected.
For example, given a form like this:
&amp;lt;%= Html.RadioButton(&amp;#34;option1&amp;#34;, true) %&amp;gt;Yes &amp;lt;%= Html.RadioButton(&amp;#34;option2&amp;#34;, false)%&amp;gt;No When we first load the page, this is the HTML it generated:</description>
    </item>
    
    <item>
      <title>TDD: Suffering from testing last</title>
      <link>https://www.markhneedham.com/blog/2008/11/28/tdd-suffering-from-testing-last/</link>
      <pubDate>Fri, 28 Nov 2008 00:34:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/28/tdd-suffering-from-testing-last/</guid>
      <description>I’ve always been a big proponent of writing tests before writing code, and I roll off the standard reasons to people who question this approach:
They help to drive the design
They provide a safety net when making future changes
They provide a way of communicating the intent of the code to the rest of the team
And so on. Despite knowing all this I recently took a non test driven approach to writing some bits of code - we were keen to get the system working end to end so it seemed a trade off worth making to prove that it was doable.</description>
    </item>
    
    <item>
      <title>Dave Thomas on Cloud Computing</title>
      <link>https://www.markhneedham.com/blog/2008/11/26/dave-thomas-on-cloud-computing/</link>
      <pubDate>Wed, 26 Nov 2008 20:46:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/26/dave-thomas-on-cloud-computing/</guid>
      <description>I went to see Object Mentor’s Dave Thomas give a talk about cloud computing on Tuesday evening in a combined meeting of the Sydney Alt.NET user group and several others.
I’d not seen him speak before but several colleagues had seen him at JAOO earlier this year so he came highly recommended.
We started off with a plug for the JAOO Australia 2009 conference which will again be in Brisbane and Sydney at the beginning of May.</description>
    </item>
    
    <item>
      <title>Agile/Lean: All or Nothing?</title>
      <link>https://www.markhneedham.com/blog/2008/11/26/agilelean-all-or-nothing/</link>
      <pubDate>Wed, 26 Nov 2008 06:29:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/26/agilelean-all-or-nothing/</guid>
<description>While reading The Toyota Way one of the ideas which stood out for me was the constant mention of organisations which picked bits of The Toyota Way, implemented them, achieved some short term gains, but then eventually lost these improvements and went back to the way they were before.
I noticed a similar theme coming out in the series of posts in the last week or so about the decline of agile.</description>
    </item>
    
    <item>
      <title>Lambda in C#: Conciseness v Readability</title>
      <link>https://www.markhneedham.com/blog/2008/11/24/c-new-language-features-conciseness-v-readability/</link>
      <pubDate>Mon, 24 Nov 2008 23:41:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/24/c-new-language-features-conciseness-v-readability/</guid>
      <description>One of the things I really disliked when I first came across C# 3.0 code was lambda functions.
At the time I remember speaking to my Tech Lead and expressing the opinion that they made the code harder to understand and valued conciseness over readability.
After a week of reading about the new C# features and understanding how they worked, the code became more readable to me, and a lot of the boilerplate code that I had come to expect was no longer necessary.</description>
    </item>
    
    <item>
      <title>Testing Test Code</title>
      <link>https://www.markhneedham.com/blog/2008/11/23/testing-test-code/</link>
      <pubDate>Sun, 23 Nov 2008 23:21:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/23/testing-test-code/</guid>
      <description>One of the interesting discussions that has come up on several projects I’ve worked on is whether or not we should test code that was written purely to help us test production code.
One of the main arguments used against testing test utility code is that it is not production code and therefore perhaps doesn’t need to be held to the same standards because it lacks the complexity of production code.</description>
    </item>
    
    <item>
      <title>Agile: A reminder of the benefits of colocation </title>
      <link>https://www.markhneedham.com/blog/2008/11/22/agile-a-reminder-of-the-benefits-of-colocation/</link>
      <pubDate>Sat, 22 Nov 2008 12:46:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/22/agile-a-reminder-of-the-benefits-of-colocation/</guid>
      <description>Sometimes it’s the seemingly small details of the agile/XP approach to software development that make it so much more effective than the traditional approach.
I was reminded of this last week with regards to having co-located teams with the developers, BAs, QAs and the business people all sitting in close proximity.
I was working on the auto completion function for one of our screens and the QA on the team, who was sitting next to me, asked me if I could look through the acceptance criteria that he was working on.</description>
    </item>
    
    <item>
      <title>Coding Dojo #3: Krypton Factor</title>
      <link>https://www.markhneedham.com/blog/2008/11/22/coding-dojo-3-krypton-factor/</link>
      <pubDate>Sat, 22 Nov 2008 11:00:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/22/coding-dojo-3-krypton-factor/</guid>
      <description>We ran our 3rd coding dojo on Thursday night, attempting to solve the Krypton Factor problem from the Online Judge website.
The Format We ran with the Randori approach again, exactly the same as last week but this time we only had 4 participants for the majority of the coding session.
What We Learnt We still ended up spending a large percentage of the time drawing out the problem on the whiteboard and not coding.</description>
    </item>
    
    <item>
      <title>Saff Squeeze: First Thoughts</title>
      <link>https://www.markhneedham.com/blog/2008/11/21/saff-squeeze-first-thoughts/</link>
      <pubDate>Fri, 21 Nov 2008 00:58:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/21/saff-squeeze-first-thoughts/</guid>
      <description>While practicing some coding by doing the Roman number conversion last weekend I came across an article by Kent Beck which talked of a method he uses to remove the need to use the debugger to narrow down problems.
He calls the method the &amp;#39;Saff Squeeze&amp;#39; and the basic idea as I understand it is to write the original failing test and then inline the pieces of code that it calls, adding assertions earlier on in the code until the actual point of failure is found.</description>
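As a rough illustration of the technique as described (the pricing example and names here are mine, not Kent Beck's or the post's): inline the calls the failing test makes, then assert on each intermediate value, moving the assertion earlier until the first wrong value is found.

```java
public class SaffSqueezeDemo {
    static int subtotal(int unitPrice, int quantity) { return unitPrice * quantity; }
    static int applyDiscount(int amount) { return amount - (amount / 10); } // 10% off

    public static void main(String[] args) {
        // Original test style: one opaque end-to-end assertion.
        assert applyDiscount(subtotal(50, 3)) == 135;

        // Squeezed version: the pipeline is inlined and each intermediate
        // value gets its own assertion, so a failure pinpoints the step.
        int sub = subtotal(50, 3);
        assert sub == 150 : "subtotal wrong: " + sub;
        int discounted = applyDiscount(sub);
        assert discounted == 135 : "discount wrong: " + discounted;
        System.out.println(discounted);
    }
}
```

Run with `java -ea` so the assertions are enabled; once the defect is found, the temporary inlined test is usually shrunk back down to a small test against the faulty step.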
    </item>
    
    <item>
      <title>Debugging ASP.NET MVC source code</title>
      <link>https://www.markhneedham.com/blog/2008/11/19/debugging-aspnet-mvc-source-code/</link>
      <pubDate>Wed, 19 Nov 2008 21:30:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/19/debugging-aspnet-mvc-source-code/</guid>
      <description>We’ve been doing some work with the ASP.NET MVC framework this week and one of the things we wanted to be able to do is to debug through the source code to see how it works.
Our initial idea was to bin deploy the ASP.NET MVC assemblies with the corresponding pdbs. Unfortunately this didn’t work and we got a conflict with the assemblies deployed in the GAC:
Compiler Error Message: CS0433: The type &amp;#39;System.</description>
    </item>
    
    <item>
      <title>The Toyota Way: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/11/19/the-toyota-way-book-review/</link>
      <pubDate>Wed, 19 Nov 2008 06:53:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/19/the-toyota-way-book-review/</guid>
      <description>The Book The Toyota Way by Jeffrey Liker
The Review I was initially very skeptical about the value of lean in software development but became intrigued as to its potential value after listening to Jason championing it. Since The Toyota Way is the book where many of the ideas originated from I thought it only made sense for this to be my first port of call to learn about lean.</description>
    </item>
    
    <item>
      <title>Standups: Pair stand together</title>
      <link>https://www.markhneedham.com/blog/2008/11/17/standups-pair-stand-together/</link>
      <pubDate>Mon, 17 Nov 2008 22:16:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/17/standups-pair-stand-together/</guid>
      <description>One of the common trends I have noticed in the stand ups of teams which practice pair programming is that very often the first person in the pair describes what they have been working on and what they will be doing today and then when it comes to the other person they say &amp;#39;ditto&amp;#39;.
After I dittoed one too many times on a project earlier this year it was pointed out to me that this was not a valuable way of contributing to the stand-up and that I should describe my own view of our progress as it may differ from my pair’s.</description>
    </item>
    
    <item>
      <title>Agile - Should everyone have to learn all the roles?</title>
      <link>https://www.markhneedham.com/blog/2008/11/17/agile-should-everyone-have-to-learn-all-the-roles/</link>
      <pubDate>Mon, 17 Nov 2008 00:14:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/17/agile-should-everyone-have-to-learn-all-the-roles/</guid>
      <description>In my final year of university a few years ago when I was applying for jobs I was really keen to join the (then) Reuters Graduate Technology program.
The thing that appealed to me the most was that over the 2 years you were on the graduate program you would have the opportunity to be placed in 4 different roles within the business. The website gives some examples:
Technical architect, Project manager, Infrastructure service manager, Business analyst, Product &amp;amp; development manager, Software engineer, Implementation engineer, Desktop design consultant, Technical specialist, Deployment project manager, Training</description>
    </item>
    
    <item>
      <title>Build: Red/Green for local build</title>
      <link>https://www.markhneedham.com/blog/2008/11/15/build-redgreen-for-local-build/</link>
      <pubDate>Sat, 15 Nov 2008 08:26:21 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/15/build-redgreen-for-local-build/</guid>
      <description>One thing I’m learning from reading The Toyota Way is that visual indicators are a very important part of the Toyota Production System, and certainly my experience working in agile software development is that the same is true there.
We have certainly learnt this lesson with regards to continuous integration - the build is either red or green and it’s a very obvious visual indicator of the code base at any moment in time.</description>
    </item>
    
    <item>
      <title>Coding Dojo #2: Bowling Game &amp; Object Calisthenics Continued</title>
      <link>https://www.markhneedham.com/blog/2008/11/13/coding-dojo-2-bowling-game-object-calisthenics-continued/</link>
      <pubDate>Thu, 13 Nov 2008 22:39:07 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/13/coding-dojo-2-bowling-game-object-calisthenics-continued/</guid>
      <description>We ran another Coding Dojo on Wednesday night as part of ThoughtWorks Geek Night where we continued working on the Bowling Game problem from last week, keeping the Object Calisthenics approach broadly in mind but not sticking to it as strictly.
The Format This time we followed the Randori approach, with a projector beaming the code onto the wall, 2 people pairing on the problem and everyone else watching.</description>
    </item>
    
    <item>
      <title>Technical/Code Base Retrospective</title>
      <link>https://www.markhneedham.com/blog/2008/11/12/technicalcode-base-retrospective/</link>
      <pubDate>Wed, 12 Nov 2008 23:50:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/12/technicalcode-base-retrospective/</guid>
      <description>We decided to run a technical retrospective on our code base yesterday afternoon but apart from one blog post on the subject and a brief mention on Pat Kua’s blog I couldn’t find much information with regards to how to run one.
We therefore decided to take a fairly similar approach to our weekly retrospectives in terms of having one column for &amp;#39;Like&amp;#39; and one for &amp;#39;Dislike&amp;#39;. In addition we had columns for &amp;#39;Want To Know More About&amp;#39; and &amp;#39;Patterns&amp;#39;.</description>
    </item>
    
    <item>
      <title>Agile: The Client/User dilemma</title>
      <link>https://www.markhneedham.com/blog/2008/11/12/agile-the-clientuser-dilemma/</link>
      <pubDate>Wed, 12 Nov 2008 07:22:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/12/agile-the-clientuser-dilemma/</guid>
      <description>While reading Marc’s post about the Customer or Client naming dilemma I was reminded of another situation I have noticed in software development - the Client/User dilemma.
From my experience of agile projects it tends to be much more likely that we can get easy access to our client than to the users of the system we are writing.
Alistair Cockburn mentions in Crystal Clear that having an expert user sit with the team can be very useful, but it is not something that I have experienced on all the projects that I have worked on.</description>
    </item>
    
    <item>
      <title>Logging with Pico Container</title>
      <link>https://www.markhneedham.com/blog/2008/11/11/logging-with-pico-container/</link>
      <pubDate>Tue, 11 Nov 2008 00:08:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/11/logging-with-pico-container/</guid>
      <description>One thing that we’ve been working on recently is the logging for our current code base.
Nearly all the objects in our system are being created by Pico Container so we decided that writing an interceptor that hooked into Pico Container would be the easiest way to intercept and log any exceptions thrown from our code.
Our initial Googling led us to the AOP Style Interception page on the Pico website which detailed how we could create a static proxy for a class that we put in the container.</description>
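The underlying idea — proxy a component so every call is observed and exceptions are logged before being rethrown — can be sketched with nothing but the JDK's dynamic proxies (Pico Container itself isn't shown here, and `Greeter`/`withExceptionLogging` are my own illustrative names):

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;

interface Greeter {
    String greet(String name);
}

public class LoggingProxyDemo {
    // Wraps any interface implementation in a proxy that logs and rethrows
    // exceptions raised by the underlying component.
    @SuppressWarnings("unchecked")
    static <T> T withExceptionLogging(Class<T> iface, T target) {
        InvocationHandler handler = (proxy, method, args) -> {
            try {
                return method.invoke(target, args);
            } catch (InvocationTargetException e) {
                System.err.println("Exception in " + method.getName() + ": " + e.getCause());
                throw e.getCause(); // unwrap so callers see the original exception
            }
        };
        return (T) Proxy.newProxyInstance(iface.getClassLoader(),
                new Class<?>[] { iface }, handler);
    }

    public static void main(String[] args) {
        Greeter raw = name -> "Hello, " + name;
        Greeter logged = withExceptionLogging(Greeter.class, raw);
        System.out.println(logged.greet("Mark")); // prints "Hello, Mark"
    }
}
```

A container-level interceptor does essentially this at component-registration time, so the wrapping happens once rather than at every call site.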
    </item>
    
    <item>
      <title>Agile: Putting the risk up front</title>
      <link>https://www.markhneedham.com/blog/2008/11/10/agile-putting-the-risk-up-front/</link>
      <pubDate>Mon, 10 Nov 2008 22:44:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/10/agile-putting-the-risk-up-front/</guid>
      <description>The last two projects that I’ve worked on I’ve been on the project from right near the start, and one thing that’s been consistent in both projects is that we’ve spent time early on in the project trying to reduce technical risk.
In my most recent project this has involved getting infrastructure in place early on, and in the previous one it involved working on technical spikes for several weeks to prove that what the client was asking for was actually technically possible.</description>
    </item>
    
    <item>
      <title>Debugging 3rd party libraries more effectively</title>
      <link>https://www.markhneedham.com/blog/2008/11/09/debugging-3rd-party-libraries-more-effectively/</link>
      <pubDate>Sun, 09 Nov 2008 21:55:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/09/debugging-3rd-party-libraries-more-effectively/</guid>
<description>Debugging 3rd party library code quickly and effectively is, in my experience, one of the skills which most obviously separates senior developers from junior ones.
From observation over the last couple of years there are some patterns in the approaches which the best debuggers take.
Get more information Sometimes it’s difficult to understand exactly how to solve a problem without getting more information.
Verbose logging mode is available in the majority of libraries and shows how everything fits together, which is normally enough to work out how to solve the problem.</description>
    </item>
    
    <item>
      <title>Hamcrest Matchers - Make the error message clear</title>
      <link>https://www.markhneedham.com/blog/2008/11/08/hamcrest-matchers-make-the-error-message-clear/</link>
      <pubDate>Sat, 08 Nov 2008 02:46:59 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/08/hamcrest-matchers-make-the-error-message-clear/</guid>
      <description>We have been making good use of Hamcrest matchers on my current project for making assertions, and have moved almost entirely away from the more traditional JUnit assertEquals approach.
There are several reasons why I find the Hamcrest matcher approach to be more productive - it’s more flexible, more expressive and when an assertion fails we have a much better idea about why it has failed than if we use a JUnit assertion for example.</description>
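The readability gain comes from matchers carrying their own description of what was expected. A hand-rolled stand-in (deliberately not the real Hamcrest API) shows the shape of the failure message:

```java
public class MatcherDemo {
    // Minimal matcher abstraction: a predicate plus a self-description.
    interface Matcher<T> {
        boolean matches(T actual);
        String describe();
    }

    static Matcher<String> startsWith(String prefix) {
        return new Matcher<String>() {
            public boolean matches(String actual) {
                return actual != null && actual.startsWith(prefix);
            }
            public String describe() {
                return "a string starting with \"" + prefix + "\"";
            }
        };
    }

    // On failure the matcher's description and the actual value are both shown,
    // which is the property that makes the failure easy to diagnose.
    static <T> void assertThat(T actual, Matcher<T> matcher) {
        if (!matcher.matches(actual)) {
            throw new AssertionError("Expected: " + matcher.describe()
                    + "\n     but: was \"" + actual + "\"");
        }
    }

    public static void main(String[] args) {
        assertThat("Hamcrest makes failures readable", startsWith("Hamcrest"));
        System.out.println("ok");
    }
}
```

With a bare assertEquals you only get two values that differ; a matcher failure tells you what property was being checked as well.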
    </item>
    
    <item>
      <title>File system equivalent of commenting code</title>
      <link>https://www.markhneedham.com/blog/2008/11/06/file-system-equivalent-of-commenting-code/</link>
      <pubDate>Thu, 06 Nov 2008 21:51:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/06/file-system-equivalent-of-commenting-code/</guid>
      <description>Last week I came across what I have decided is the file system equivalent of commenting out code - not deleting directories when we are no longer using them.
The specific situation we ran into was while trying to make some Tomcat configuration changes but everything we changed was having no effect on what we were seeing on the web site.
Eventually we realised that we were changing the configuration in the wrong place - we actually had two Tomcat folders lying around.</description>
    </item>
    
    <item>
      <title>Object Calisthenics: First thoughts</title>
      <link>https://www.markhneedham.com/blog/2008/11/06/object-calisthenics-first-thoughts/</link>
      <pubDate>Thu, 06 Nov 2008 21:30:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/06/object-calisthenics-first-thoughts/</guid>
      <description>We ran an Object Calisthenics variation of Coding Dojo on Wednesday night as part of ThoughtWorks Geek Night in Sydney.
Object Calisthenics is an idea suggested by Jeff Bay in The ThoughtWorks Anthology, and lists 9 rules for writing better Object Oriented code. For those who haven’t seen the book, the 9 rules are:
Use only one level of indentation per method
Don’t use the else keyword
Wrap all primitives and strings</description>
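The third rule above can be sketched briefly; this bowling-flavoured `Frame` value object is my own illustration (not code from the dojo) of wrapping a bare int so that validation and behaviour live with the value:

```java
public class WrapPrimitivesDemo {
    // Instead of passing a raw int for a frame's pin count, wrap it in a
    // small value type that enforces its own invariants.
    static final class Frame {
        private final int pins;

        Frame(int pins) {
            if (pins < 0 || pins > 10) {
                throw new IllegalArgumentException("pins must be between 0 and 10");
            }
            this.pins = pins;
        }

        boolean isStrike() { return pins == 10; }
        int pins() { return pins; }
    }

    public static void main(String[] args) {
        Frame frame = new Frame(10);
        System.out.println(frame.isStrike()); // prints "true"
    }
}
```

The payoff is that no other part of the code can ever hold an invalid pin count, and questions like "is this a strike?" have an obvious home.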
    </item>
    
    <item>
      <title>Pair Programming: The Over Eager Driver</title>
      <link>https://www.markhneedham.com/blog/2008/11/05/pair-programming-the-over-eager-driver/</link>
      <pubDate>Wed, 05 Nov 2008 23:48:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/05/pair-programming-the-over-eager-driver/</guid>
      <description>One of the interesting situations that can arise when pair programming is that one person dominates the driving and their pair can hardly get a look in.
This is not necessarily because they are hogging the keyboard - it is often just that they are the technically stronger of the pair and the other person isn’t willing to ask for the keyboard.
A big part of the value in pair programming comes from having both people taking turns at driving and navigating from my experience and there are several ideas that I have come across for trying to encourage a more collaborative approach to pair programming.</description>
    </item>
    
    <item>
      <title>Crystal Clear: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/11/05/crystal-clear-book-review/</link>
      <pubDate>Wed, 05 Nov 2008 08:01:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/05/crystal-clear-book-review/</guid>
      <description>The Book Crystal Clear by Alistair Cockburn
The Review This was a book which had been recommended to me by a colleague a few months ago as one of the best software development books to read, and after hearing Ian Cooper describe how his team was implementing some of the ideas at the Alt.NET conference I decided I’d give it a read.
I have been working in an Agile/XP environment at ThoughtWorks for the last two years, so my context coming into the book was around understanding where the overlap with Crystal Clear was, what differences there were, and how I could apply these ideas on my projects.</description>
    </item>
    
    <item>
      <title>Pair Programming: Benefits of the pair switch mid story</title>
      <link>https://www.markhneedham.com/blog/2008/11/04/pair-programming-benefits-of-the-pair-switch-mid-story/</link>
      <pubDate>Tue, 04 Nov 2008 00:00:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/04/pair-programming-benefits-of-the-pair-switch-mid-story/</guid>
      <description>On my current project we’ve been having some discussions around the frequency with which we rotate pairs, the feeling being that we probably keep the same pairs for a bit too long.
We discussed using techniques such as promiscuous pairing, which takes the idea of pair rotation to an extreme, but have settled on making our rotations more or less daily.
One interesting thing I noticed from some recent pair switching was the immediate benefit we can realise from the pair rotation.</description>
    </item>
    
    <item>
      <title>Pair Programming: Driving quickly</title>
      <link>https://www.markhneedham.com/blog/2008/11/02/pair-programming-driving-quickly/</link>
      <pubDate>Sun, 02 Nov 2008 22:13:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/02/pair-programming-driving-quickly/</guid>
      <description>In order to experience the full benefits of pair programming it is important to try and reduce the chance of the navigator getting bored and losing focus.
One of the main ways that we can do this is by having a quick turnaround between the driver and navigator, which means that when we are driving we should do so as quickly as possible.</description>
    </item>
    
    <item>
      <title>CSS in Internet Explorer - Some lessons learned</title>
      <link>https://www.markhneedham.com/blog/2008/11/01/css-in-internet-explorer-some-lessons-learned/</link>
      <pubDate>Sat, 01 Nov 2008 01:24:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/11/01/css-in-internet-explorer-some-lessons-learned/</guid>
      <description>I’ve spent the last few days working with CSS, and in particular trying to make a layout which works perfectly fine in Firefox work properly in Internet Explorer 6.
I’m far from an expert when it comes to this but I’ve picked up a few lessons from our attempts to get identical layouts in both browsers.
Internet Explorer seems to do some crazy stuff when it comes to padding and margins - we were often ending up with huge margins where we hadn’t even specified any.</description>
    </item>
    
    <item>
      <title>Testing Hibernate mappings: Setting up test data</title>
      <link>https://www.markhneedham.com/blog/2008/10/30/testing-hibernate-mappings-setting-up-test-data/</link>
      <pubDate>Thu, 30 Oct 2008 23:24:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/30/testing-hibernate-mappings-setting-up-test-data/</guid>
      <description>Continuing with my mini Hibernate mappings series, this post talks about the different ways of setting up the test data for our Hibernate tests.
Where to test the mappings from?
How to test for equality?
How to set up the test data?
There are a couple of ways that we can set up data for Hibernate tests.
Insert Hibernate Object This approach involves creating a new object and saving it to the database using the save method on the Hibernate session.</description>
    </item>
    
    <item>
      <title>Testing Hibernate mappings: Testing Equality</title>
      <link>https://www.markhneedham.com/blog/2008/10/29/testing-hibernate-mappings-testing-equality/</link>
      <pubDate>Wed, 29 Oct 2008 18:03:36 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/29/testing-hibernate-mappings-testing-equality/</guid>
      <description>I started a mini Hibernate series with my last post where I spoke of there being three main areas to think about when it comes to testing:
Where to test the mappings from?
How to test for equality?
How to set up the test data?
Once we have worked out where to test the mappings from, and assuming we have decided to test them either through our repository tests or directly from the Hibernate session, we have some choices to make around how to test for equality.</description>
    </item>
    
    <item>
      <title>Testing Hibernate mappings: Where to test from?</title>
      <link>https://www.markhneedham.com/blog/2008/10/27/testing-hibernate-mappings-where-to-test-from/</link>
      <pubDate>Mon, 27 Oct 2008 22:55:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/27/testing-hibernate-mappings-where-to-test-from/</guid>
      <description>I’ve had the opportunity to work with Hibernate and it’s .NET twin NHibernate on several of my projects and one of the more interesting decisions around its use is working out the best way to test the hibernate mappings that hook together our domain model and the database.
There are three decisions to make around how best to do this:
Where to test the mappings from?
How to test for equality?</description>
    </item>
    
    <item>
      <title>buildr - using another project&#39;s dependencies</title>
      <link>https://www.markhneedham.com/blog/2008/10/26/buildr-using-another-projects-dependencies/</link>
      <pubDate>Sun, 26 Oct 2008 20:54:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/26/buildr-using-another-projects-dependencies/</guid>
      <description>Through my continued use of buildr on my current project one thing we wanted to do last week was to run our production code tests using some code from the test-utilities project along with its dependencies.
I thought this would be the default behaviour but it wasn’t. Looking at the documentation suggested we could achieve this by calling &amp;#39;compile.dependencies&amp;#39; on the project, but from what I can tell you still need to explicitly state that you want to use the main test utilities code as well.</description>
    </item>
    
    <item>
      <title>Selenium - Selecting the original window</title>
      <link>https://www.markhneedham.com/blog/2008/10/25/selenium-selecting-the-original-window/</link>
      <pubDate>Sat, 25 Oct 2008 01:55:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/25/selenium-selecting-the-original-window/</guid>
      <description>I’ve not used Selenium much in my time - all of my previous projects have been client side applications or service layers - but I’ve spent a bit of time getting acquainted with it this week.
While activating some acceptance tests this week I noticed quite a strange error happening if the tests ran in a certain order:
com.thoughtworks.selenium.SeleniumException: ERROR: Current window or frame is closed! at com.thoughtworks.selenium.HttpCommandProcessor.doCommand(HttpCommandProcessor.java:73) at com.</description>
    </item>
    
    <item>
      <title>Don&#39;t shave the yak, ask &#39;Why are we doing this?&#39;</title>
      <link>https://www.markhneedham.com/blog/2008/10/25/dont-shave-the-yak-ask-why-are-we-doing-this/</link>
      <pubDate>Sat, 25 Oct 2008 01:34:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/25/dont-shave-the-yak-ask-why-are-we-doing-this/</guid>
      <description>One of the very common pitfalls I make when working on things is to get so engrossed in the technical details of the problem that I completely forget the reason for doing it in the first place.
Over the last week or so I have noticed myself trying to solve some ridiculous problems without considering whether I am solving the right problem in the first place.
To give an example, I was working with Hibernate earlier in the week trying to set up a new mapping between two entities. This involved creating a composite key on one of the entities, which led to us working out how to do that on the database, then editing our migration script, then trawling Google to work out why our mapping wasn’t working, before a colleague overheard our pain and pointed out that we had overcomplicated matters.</description>
    </item>
    
    <item>
      <title>Keep Java checked exceptions in a bounded context</title>
      <link>https://www.markhneedham.com/blog/2008/10/23/keep-java-checked-exceptions-in-a-bounded-context/</link>
      <pubDate>Thu, 23 Oct 2008 21:22:26 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/23/keep-java-checked-exceptions-in-a-bounded-context/</guid>
      <description>One of the features that I dislike in Java compared to C# is checked exceptions.
For me an exception is about a situation which is exceptional; if we know that there is a possibility of it happening, and even have that possibility defined in our code, then it doesn’t seem all that exceptional.
Having said that, they do at least provide information, which you can’t help but notice, about what can go wrong when you make a call to a particular method.</description>
    </item>
    
    <item>
      <title>Making experience matter</title>
      <link>https://www.markhneedham.com/blog/2008/10/23/making-experience-matter/</link>
      <pubDate>Thu, 23 Oct 2008 00:12:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/23/making-experience-matter/</guid>
      <description>I recently came across this post which speaks about the desire of recruiters to put candidates into technology specific boxes when it comes to describing their experience.
I guess this desire is backed by humans&amp;#39; need to see patterns and similarities in data; someone who doesn’t quite fit into a generalised box makes that more difficult.
I have worked on projects in Java, C# and a bit of Ruby, so I do agree with most of the points with regard to language specialisation, and as Jay Fields points out it is actually beneficial to diversify your experience to improve yourself.</description>
    </item>
    
    <item>
      <title>Tomcat - No caching of RESTlet resources for Firefox</title>
      <link>https://www.markhneedham.com/blog/2008/10/22/tomcat-no-caching-of-pages-for-firefox/</link>
      <pubDate>Wed, 22 Oct 2008 22:00:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/22/tomcat-no-caching-of-pages-for-firefox/</guid>
      <description>One problem that we’ve been trying to solve today is how to make a RESTlet resource non cacheable.
The reason for this is that when a user logs out of the system and then hits the back button they shouldn’t be able to see that page, but instead should see the login form.
After several hours of trawling Google and trying out various different suggestions we came across the idea of setting &amp;#39;cache-control&amp;#39; with the value &amp;#39;no-store&amp;#39; in the response headers.</description>
    </item>
    
    <item>
      <title>Fearless Change: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/10/21/fearless-change-book-review/</link>
      <pubDate>Tue, 21 Oct 2008 23:34:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/21/fearless-change-book-review/</guid>
      <description>The Book Fearless Change by Mary Lyan Manns and Linda Rising
The Review I came across this book while watching an interview with Linda Rising on InfoQ. She mentioned some ideas from Malcolm Gladwell’s The Tipping Point which intrigued me and a strong recommendation from a colleague ensured this book made it onto my reading list.
I am not currently working on a project where I need to instigate a lot of change so I was going slightly against my own principle of only reading books when I need to, but I recalled several times previously when I have tried to introduce what I thought were good ideas and didn’t really get anywhere.</description>
    </item>
    
    <item>
      <title>If you use an &#39;if&#39; you deserve to suffer</title>
      <link>https://www.markhneedham.com/blog/2008/10/21/if-you-use-an-if-you-deserve-to-suffer/</link>
      <pubDate>Tue, 21 Oct 2008 07:19:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/21/if-you-use-an-if-you-deserve-to-suffer/</guid>
      <description>One of the things I dislike the most when coding is writing if statements. and while I don’t believe that if should be completely abolished from our toolkit, I think the anti if campaign started about a year ago is going along the right lines.
While there is certainly value in using an if statement as a guard block it usually feels that we have missed an abstraction if we are using it elsewhere.</description>
    </item>
    
    <item>
      <title>Build: Checkout and Go </title>
      <link>https://www.markhneedham.com/blog/2008/10/19/build-checkout-and-go/</link>
      <pubDate>Sun, 19 Oct 2008 22:49:14 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/19/build-checkout-and-go/</guid>
      <description>On the previous project I was working on one of the pain points we were having was around setting up developer environments such that you could get the code up and running on a machine as quickly as possible.
I would go to a newly formatted machine ready to set it up for development and run into a cascading list of dependencies I hadn’t considered.
SVN wasn’t installed, then Ruby, then we had the wrong version of Java and all the while we were wasting time when this process could have been automated.</description>
    </item>
    
    <item>
      <title>Learnings from Code Kata #1</title>
      <link>https://www.markhneedham.com/blog/2008/10/18/learnings-from-code-kata-1/</link>
      <pubDate>Sat, 18 Oct 2008 19:47:31 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/18/learnings-from-code-kata-1/</guid>
      <description>I’ve been reading My Job Went To India and one of the chapters midway through the second section talks about the value of practicing coding using code katas.
I’ve not tried doing these before but I thought it would be an interesting activity to try out.
The Kata: Code Kata One - Supermarket Pricing
What I learnt: As this kata is not supposed to be a coding exercise, I started out just modelling ideas in my head about how I would do it, before realising that this wasn’t an effective way for me to learn.</description>
    </item>
    
    <item>
      <title>Pair Programming: Pair Flow</title>
      <link>https://www.markhneedham.com/blog/2008/10/17/pair-programming-pair-flow/</link>
      <pubDate>Fri, 17 Oct 2008 00:18:39 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/17/pair-programming-pair-flow/</guid>
      <description>In an earlier post about Team Productivity I stumbled upon the idea that while pair programming there could be such a concept as pair flow.
The term &amp;#39;flow&amp;#39; is used to describe a situation where you are totally immersed in the work you’re doing and where time seems to go by without you even noticing.
This can also happen when pair programming and I think there are some factors which can make it more likely.</description>
    </item>
    
    <item>
      <title>Browsing around the Unix shell more easily</title>
      <link>https://www.markhneedham.com/blog/2008/10/15/browsing-around-the-unix-shell-more-easily/</link>
      <pubDate>Wed, 15 Oct 2008 22:31:16 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/15/browsing-around-the-unix-shell-more-easily/</guid>
      <description>Following on from my post about getting the pwd to display on the bash prompt all the time I have learnt a couple of other tricks to make the shell experience more productive.
Aliases are the first new concept I came across, and several members of my current team and I now have these set up.
We are primarily using them to provide a shortcut command to get to various locations in the file system.</description>
    </item>
    
    <item>
      <title>Java vs .NET: An Overview</title>
      <link>https://www.markhneedham.com/blog/2008/10/15/java-vs-net-an-overview/</link>
      <pubDate>Wed, 15 Oct 2008 00:09:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/15/java-vs-net-an-overview/</guid>
      <description>A couple of months ago my colleague Mark Thomas posted about working on a C# project after 10 years working in Java, and being someone who has worked on projects in both languages fairly consistently (3 Java projects, 2 .NET projects) over the last two years I thought it would be interesting to do a comparison between the two.
The standard ThoughtWorks joke is that you just need to remember to capitalise the first letter of method names in C# and then you’re good to go, but I think there’s more to it than that.</description>
    </item>
    
    <item>
      <title>Context Driven Learning</title>
      <link>https://www.markhneedham.com/blog/2008/10/13/context-driven-learning/</link>
      <pubDate>Mon, 13 Oct 2008 20:44:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/13/context-driven-learning/</guid>
      <description>One pattern I’ve noticed over the last couple of years with regards to my own learning is that I find it very difficult to learn new things unless I can directly apply what I have learnt to a real life situation.
I feel this was part of the reason I found the way material was taught at universities so difficult to understand - nearly every course I studied was taught on its own without any reference to the others, and rarely did I get to use the ideas I learnt in a practical context.</description>
    </item>
    
    <item>
      <title>Using test guided techniques for spiking</title>
      <link>https://www.markhneedham.com/blog/2008/10/12/using-test-guided-techniques-for-spiking/</link>
      <pubDate>Sun, 12 Oct 2008 13:49:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/12/using-test-guided-techniques-for-spiking/</guid>
      <description>I think that out of all the Extreme Programming practices Test Driven Development is the one which I like the best. I feel it provides a structure for development work and helps me to remain focused on what I am trying to achieve rather than writing code which may not necessarily be needed.
However, there are times when it’s difficult to use a TDD approach, and Pat Kua suggested earlier this year that if you’re using a TDD approach all the time you’re doing something wrong.</description>
    </item>
    
    <item>
      <title>What is a unit test?</title>
      <link>https://www.markhneedham.com/blog/2008/10/10/what-is-a-unit-test/</link>
      <pubDate>Fri, 10 Oct 2008 23:21:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/10/what-is-a-unit-test/</guid>
      <description>One of the questions which came up during the Sydney Alt.NET User Group meeting at the start of October was around what a unit test actually is.
I suppose the somewhat naive or simplistic definition is that it is just any test written using an xUnit framework such as NUnit or JUnit. However, integration or acceptance tests are often written using these frameworks so this definition doesn’t hold.
While discussing this last week a colleague came up with what I considered to be a very clear yet precise definition.</description>
    </item>
    
    <item>
      <title>Pair Programming: Why would I pair on this?</title>
      <link>https://www.markhneedham.com/blog/2008/10/09/pair-programming-why-would-i-pair-on-this/</link>
      <pubDate>Thu, 09 Oct 2008 00:38:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/09/pair-programming-why-would-i-pair-on-this/</guid>
      <description>In the comments of my previous post on pairing Vivek made the following comment about when we should pair:
The simplest principle I have is to use &amp;#34;conscious&amp;#34; pairing vs. &amp;#34;unconscious&amp;#34; pairing. A pair should always know why they are pairing.
On previous projects I have worked on there have been several tasks where it has been suggested that there is little value in pairing. I decided to try and apply Vivek’s principle of knowing why we might pair on these tasks to see if there is actually any value in doing so.</description>
    </item>
    
    <item>
      <title>Test Driven Development By Example: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/10/07/test-driven-development-by-example-book-review/</link>
      <pubDate>Tue, 07 Oct 2008 23:17:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/07/test-driven-development-by-example-book-review/</guid>
      <description>The Book Test Driven Development by Example by Kent Beck
The Review I know this book is quite old but I haven’t read it before - it’s been recommended to me several times but I never got round to reading it, possibly because of my somewhat misguided opinion that seeing as I do TDD nearly every day I shouldn’t need to read it.
More by chance than anything else, I was browsing through a friend’s copy of the book and came across several gems of information which persuaded me that I should take the time to read the rest of it.</description>
    </item>
    
    <item>
      <title>rspec - Invalid character &#39;\240&#39; in expression</title>
      <link>https://www.markhneedham.com/blog/2008/10/06/rspec-invalid-character-240-in-expression/</link>
      <pubDate>Mon, 06 Oct 2008 20:48:48 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/06/rspec-invalid-character-240-in-expression/</guid>
      <description>We have been using rspec on my project for the unit testing of our Ruby code and while running one of the specs last week I ended up getting this somewhat en-cryptic error message:
Invalid character &amp;#39;\240&amp;#39; in expression ... After convincing myself that this error wasn’t actually possible it turned out that I had somehow entered an &amp;#39;invisible to TextMate&amp;#39; character after one of the method definitions - on the editor it just looked like a space.</description>
    </item>
    
    <item>
      <title>Calling shell script from ruby script</title>
      <link>https://www.markhneedham.com/blog/2008/10/06/calling-shell-script-from-ruby-script/</link>
      <pubDate>Mon, 06 Oct 2008 20:12:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/06/calling-shell-script-from-ruby-script/</guid>
      <description>Damana and I previously posted about our experiences with different Ruby LDAP solutions.
Having settled on Ruby-LDAP (although having read Ola and Steven’s comments we will now look at ruby-net-ldap) we then needed to put together the setup, installation and teardown into a ruby script file.
A quick bit of Googling revealed that we could use the Kernel.exec method to do this.
For example, you could put the following in a ruby script file and it would execute and show you the current directory listing:</description>
    </item>
    
    <item>
      <title>Pragmatic Learning and Thinking: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/10/06/pragmatic-learning-and-thinking-book-review/</link>
      <pubDate>Mon, 06 Oct 2008 02:20:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/06/pragmatic-learning-and-thinking-book-review/</guid>
      <description>The Book Pragmatic Learning and Thinking by Andy Hunt
The Review I came across this book when reading a post linking lean to the Dreyfus Model on Dan North’s blog.
I have a keen interest in theories of learning and have completed an NLP Practitioner’s course so the ideas described in the book summary immediately appealed to me.
After coming across the concept of Reading Deliberately in Chapter 6 of the book I decided I should give the SQ3R approach to reading books its first run out.</description>
    </item>
    
    <item>
      <title>Ruby LDAP Options</title>
      <link>https://www.markhneedham.com/blog/2008/10/05/ruby-ldap-options/</link>
      <pubDate>Sun, 05 Oct 2008 16:29:32 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/05/ruby-ldap-options/</guid>
      <description>As I mentioned in an earlier post a colleague and I spent a few days looking at how to connect to an OpenDS LDAP server using Ruby.
We ended up analysing four different solutions for solving the problem.
Active LDAP: This approach involved using the Active LDAP Ruby library, which &amp;#34;provides an object oriented interface to LDAP. It maps LDAP entries to Ruby objects with LDAP attribute accessors based on your LDAP server’s schema and each object’s objectClasses&amp;#34;.</description>
    </item>
    
    <item>
      <title>Ruby: Ignore header line when parsing CSV file</title>
      <link>https://www.markhneedham.com/blog/2008/10/04/ruby-ignore-header-line-when-parsing-csv-file/</link>
      <pubDate>Sat, 04 Oct 2008 01:32:08 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/04/ruby-ignore-header-line-when-parsing-csv-file/</guid>
      <description>As my Ruby journey continues one of the things I wanted to do today was parse a CSV file.
This article proved to be very useful for teaching the basics but it didn’t say how to ignore the header line that the CSV file contained.
The CSV file I was parsing was similar to this:
name, surname, location
Mark, Needham, Sydney
David, Smith, London
I originally wanted to get the names of the people to use in my code.</description>
    </item>
    
    <item>
      <title>It&#39;s not all about the acceptance tests</title>
      <link>https://www.markhneedham.com/blog/2008/10/03/its-not-all-about-the-acceptance-tests/</link>
      <pubDate>Fri, 03 Oct 2008 01:26:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/03/its-not-all-about-the-acceptance-tests/</guid>
      <description>A few of my colleagues recently posted their opinions about acceptance tests which tied in nicely with a discussion about acceptance testing that was had at the Alt.NET conference in London.
For the sake of argument I will assume that when we refer to acceptance tests we are talking about tests at the GUI level which are being automatically driven by a tool, usually Selenium but maybe something like White if it is a client side application.</description>
    </item>
    
    <item>
      <title>Ignore file in Svn</title>
      <link>https://www.markhneedham.com/blog/2008/10/02/ignore-file-in-svn/</link>
      <pubDate>Thu, 02 Oct 2008 21:10:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/02/ignore-file-in-svn/</guid>
      <description>I spent a bit of time this afternoon marveling at the non intuitiveness of working out how to ignore files in Svn.
Normally I’d just use Tortoise SVN as it makes it so easy for you but I really wanted to know how to do it from the shell!
After a bit of Googling and conversation with a colleague I think I have it figured out to some extent.</description>
    </item>
    
    <item>
      <title>Ruby: Unzipping a file using rubyzip</title>
      <link>https://www.markhneedham.com/blog/2008/10/02/ruby-unzipping-a-file-using-rubyzip/</link>
      <pubDate>Thu, 02 Oct 2008 00:04:22 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/02/ruby-unzipping-a-file-using-rubyzip/</guid>
      <description>In the world of Ruby I’ve been working on a script which needs to unzip a file and then run an installer which is only available after unpacking it.
We’ve been using the rubyzip gem to do so but so far it hasn’t felt intuitive to me coming from the Java/C# world.
ZipFile is the class we need to use and at first glance I had thought that it would be possible to just pass the zip file name to the &amp;#39;extract&amp;#39; method and have it do all the work for me!</description>
    </item>
    
    <item>
      <title>Alt.NET Sydney User Group Meeting #1</title>
      <link>https://www.markhneedham.com/blog/2008/10/01/altnet-sydney-user-group-meeting-1/</link>
      <pubDate>Wed, 01 Oct 2008 22:09:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/01/altnet-sydney-user-group-meeting-1/</guid>
      <description>James Crisp and Richard Banks arranged the first Alt.NET Sydney User Group meeting held on Tuesday night at the ThoughtWorks office.
The first thing to say is thanks to James and Richard for getting this setup so quickly - it was less than a month ago that Richard suggested the idea of creating a group on the Alt.NET mailing list.
Richard and James have already written summaries of what went on but I thought I’d give some of my thoughts as well.</description>
    </item>
    
    <item>
      <title>TDD without the design</title>
      <link>https://www.markhneedham.com/blog/2008/10/01/tdd-without-the-design/</link>
      <pubDate>Wed, 01 Oct 2008 00:32:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/10/01/tdd-without-the-design/</guid>
      <description>Roy Osherove and several others have posted recently about introducing TDD to the &amp;#39;masses&amp;#39;
As I understand it Roy’s idea is to separate the learning of TDD from the learning of good design principles - good design principles in this case being the OOP principles described in Uncle Bob’s Agile Software Development Principles, Practices and Practices or on the Object Mentor website.
I am usually in favour of an approach that breaks a skill down into chunks so that it is easier to learn but in this case I feel that some of the big gains in coding in a TDD way is the decoupled design it encourages, which in my experience is more likely to follow good design principles.</description>
    </item>
    
    <item>
      <title>Connecting to LDAP server using OpenDS in Java</title>
      <link>https://www.markhneedham.com/blog/2008/09/29/connecting-to-ldap-server-using-opends-in-java/</link>
      <pubDate>Mon, 29 Sep 2008 23:27:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/29/connecting-to-ldap-server-using-opends-in-java/</guid>
      <description>A colleague and I have spent the past couple of days spiking solutions for connecting to LDAP servers from Ruby.
We decided that the easiest way to do this is by using OpenDS, an open source directory service based on LDAP.
One option we came up with for doing this was to make use of the Java libraries for connecting to the LDAP server and then calling through to these from our Ruby code using the Ruby Java Bridge.</description>
    </item>
    
    <item>
      <title>Show pwd all the time</title>
      <link>https://www.markhneedham.com/blog/2008/09/28/show-pwd-all-the-time/</link>
      <pubDate>Sun, 28 Sep 2008 22:50:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/28/show-pwd-all-the-time/</guid>
      <description>Finally back in the world of the shell last week I was constantly typing &amp;#39;pwd&amp;#39; to work out where exactly I was in the file system until my colleague pointed out that you can adjust your settings to get this to show up automatically for you on the left hand side of the prompt.
To do this you need to create or edit your .bash_profile file by entering the following command:</description>
    </item>
    
    <item>
      <title>Pair Programming: What do we gain from it?</title>
      <link>https://www.markhneedham.com/blog/2008/09/28/pair-programming-what-do-we-gain-from-it/</link>
      <pubDate>Sun, 28 Sep 2008 22:19:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/28/pair-programming-what-do-we-gain-from-it/</guid>
      <description>My former colleague Vivek Vaid has an interesting post about parallel-paired programming where he talks about introducing lean concepts into deciding when we should pair to get maximum productivity.
Midway through the post he mentions that the original reason we started pairing was for &amp;#39;collaborative design&amp;#39;, which got me thinking about whether there are reasons beyond this why we would want to pair.
I have often worked with clients where the value of pair programming has been questioned and it has been suggested that we should only adhere to this practice for tasks where it adds the most value.</description>
    </item>
    
    <item>
      <title>Easily misused language features</title>
      <link>https://www.markhneedham.com/blog/2008/09/25/easily-misused-language-features/</link>
      <pubDate>Thu, 25 Sep 2008 23:18:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/25/easily-misused-language-features/</guid>
      <description>In the comments of my previous post about my bad experiences with Java’s import static my colleague Carlos and several others pointed out that it is actually a useful feature when used properly.
The code base where I initially came across the feature misused it quite severely but it got me thinking about other language features I have come across which can add great value when used effectively but lead to horrific problems when misused.</description>
    </item>
    
    <item>
      <title>My dislike of Java&#39;s static import</title>
      <link>https://www.markhneedham.com/blog/2008/09/24/my-dislike-of-javas-static-import/</link>
      <pubDate>Wed, 24 Sep 2008 23:59:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/24/my-dislike-of-javas-static-import/</guid>
      <description>While playing around with JBehave I was reminded of my dislike of the import static feature which was introduced in Java 1.5.
Using import static allows us to access static members defined in another class without referencing the class name. For example, suppose we want to use the following method in our code:
Math.max(1,2);
Normally we would need to include the class name (Math) that the static function (max) belongs to.</description>
    </item>
    
    <item>
      <title>Onshore or Offshore - The concepts are the same?</title>
      <link>https://www.markhneedham.com/blog/2008/09/24/onshore-or-offshore-the-concepts-are-the-same/</link>
      <pubDate>Wed, 24 Sep 2008 07:08:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/24/onshore-or-offshore-the-concepts-are-the-same/</guid>
      <description>I’ve never worked on a distributed or offshore project before, but intrigued having read about Jay Fields&amp;#39; experiences I attended the &amp;#39;OffShoring: The Current State of Play&amp;#39; Quarterly Technology Briefing held this morning in Sydney to hear the other side of the argument.
The underlying message for me was that a lot of the concepts we apply for onshore projects are equally important for offshore projects.
Forrester’s Tim Sheedy started off by providing some research data on the state of IT offshoring and some reasons he had identified around which types of work should be offshored, before closing with some reasons it might fail if not done correctly.</description>
    </item>
    
    <item>
      <title>Testing with Joda Time</title>
      <link>https://www.markhneedham.com/blog/2008/09/24/testing-with-joda-time/</link>
      <pubDate>Wed, 24 Sep 2008 05:11:20 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/24/testing-with-joda-time/</guid>
      <description>The alternative to dealing with java.util.Date which I wrote about in a previous post is to make use of the Joda Time library. I’m led to believe that a lot of the ideas from Joda Time will in fact be in Java 7.
Nevertheless when testing with Joda Time there are times when it would be useful for us to have control over the time our code is using.</description>
    </item>
    
    <item>
      <title>Where are we now? Where do we want to be?</title>
      <link>https://www.markhneedham.com/blog/2008/09/20/where-are-we-now-where-do-we-want-to-be/</link>
      <pubDate>Sat, 20 Sep 2008 17:32:01 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/20/where-are-we-now-where-do-we-want-to-be/</guid>
      <description>Listening to Dan North speaking last week I was reminded of one of my favourite NLP[*] techniques for making improvements on projects.
The technique is TOTE (Test, Operate, Test, Exit; see http://en.wikipedia.org/wiki/T.O.T.E.), a technique designed to help us get from where we are now to where we want to be via short feedback loops.
On my previous project we had a situation where we needed to build and deploy our application in order to show it to the client in a show case.</description>
    </item>
    
    <item>
      <title>Similarities between Domain Driven Design &amp; Object Oriented Programming</title>
      <link>https://www.markhneedham.com/blog/2008/09/20/similarities-between-domain-driven-design-object-oriented-programming/</link>
      <pubDate>Sat, 20 Sep 2008 13:12:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/20/similarities-between-domain-driven-design-object-oriented-programming/</guid>
      <description>At the Alt.NET UK Conference which I attended over the weekend it occurred to me while listening to some of the discussions on Domain Driven Design that a lot of the ideas in DDD are actually very similar to those being practiced in Object Oriented Programming and related best practices.
The similarities Anaemic Domain Model/Law of Demeter There was quite a bit of discussion in the session about anaemic domain models.</description>
    </item>
    
    <item>
      <title>Should we always use Domain Model?</title>
      <link>https://www.markhneedham.com/blog/2008/09/19/should-we-always-use-domain-mode/</link>
      <pubDate>Fri, 19 Sep 2008 08:34:35 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/19/should-we-always-use-domain-mode/</guid>
      <description>During the discussion about Domain Driven Design at the Alt.NET conference I felt like the idea of the Rich Domain Model was being represented as the only way to design software but I don’t feel that this is the case.
As always in software we never have a silver bullet and there are times when Domain Model is not necessarily the best choice, just as there are times when OOP is not necessarily the best choice.</description>
    </item>
    
    <item>
      <title>Using java.util.Date safely</title>
      <link>https://www.markhneedham.com/blog/2008/09/18/using-javautildate-safely/</link>
      <pubDate>Thu, 18 Sep 2008 11:01:54 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/18/using-javautildate-safely/</guid>
      <description>Assuming that you are unable to use Joda Time on your project, there are some simple ways that I have come across that allow you to not suffer at the hands of java.util.Date.
What’s wrong with java.util.Date in the first place? First of all, java.util.Date is mutable, which means that if you create a java.util.Date object its state can be modified after creation.
As a result, you can do an operation like the following, for example:</description>
    </item>
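The mutability problem, and the defensive-copy defence against it, can be sketched like this (the `DateSafety` name is illustrative, not from the post):

```java
import java.util.Date;

class DateSafety {
    // java.util.Date is mutable: setTime rewrites the object after creation,
    // so anyone holding a reference can silently change our state.
    static void mutate(Date date) {
        date.setTime(0L);
    }

    // Defensive copy: hand out a fresh Date so callers cannot mutate ours.
    static Date defensiveCopy(Date date) {
        return new Date(date.getTime());
    }
}
```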
    
    <item>
      <title>Testing file system operations</title>
      <link>https://www.markhneedham.com/blog/2008/09/17/testing-file-system-operations/</link>
      <pubDate>Wed, 17 Sep 2008 15:48:37 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/17/testing-file-system-operations/</guid>
      <description>On my previous project one of the areas that we needed to work out how to test was around interaction with the file system.
The decision that we needed to make was whether we should unit test this type of functionality or whether it could just be covered by a functional test.
To Unit Test One of the patterns to use when unit testing things like this is the Gateway pattern.</description>
    </item>
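A minimal sketch of the Gateway pattern the post mentions: hide file system access behind an interface so unit tests can substitute an in-memory stub (all names here are illustrative, not from the post):

```java
// The code under test talks to the gateway, never to java.io directly.
interface FileSystemGateway {
    void write(String path, String contents);
}

// Test double: records writes in a map instead of touching the disk.
class InMemoryFileSystem implements FileSystemGateway {
    final java.util.Map<String, String> files = new java.util.HashMap<>();
    public void write(String path, String contents) { files.put(path, contents); }
}

class ReportWriter {
    private final FileSystemGateway fileSystem;
    ReportWriter(FileSystemGateway fileSystem) { this.fileSystem = fileSystem; }

    // Unit-testable without a real file system.
    void saveReport(String path, String body) {
        fileSystem.write(path, body);
    }
}
```

A real implementation of the gateway backed by `java.io` would then be exercised once by a functional test.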
    
    <item>
      <title>Team Productivity vs Individual Productivity</title>
      <link>https://www.markhneedham.com/blog/2008/09/16/team-productivity-vs-individual-productivity/</link>
      <pubDate>Tue, 16 Sep 2008 16:41:43 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/16/team-productivity-vs-individual-productivity/</guid>
      <description>I’ve been reading Neal Ford’s The Productive Programmer (my review) which is a book all about improving your productivity as an individual developer.
It got me thinking that there are also ways that we can make teams more productive so that they are actually teams and not just a group of individuals who happen to work with each other.
I’ve had the opportunity of working under some great Tech Leads who have helped create an environment where teams can perform to their maximum.</description>
    </item>
    
    <item>
      <title>What makes a good developer?</title>
      <link>https://www.markhneedham.com/blog/2008/09/16/what-make-a-good-developer/</link>
      <pubDate>Tue, 16 Sep 2008 10:07:28 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/16/what-make-a-good-developer/</guid>
      <description>Early last year I became very curious about what it was that made the best developers in the industry so good at what they do.
At the end of this post Jay Fields points out some things that he believes indicate a developer is good, but a former colleague and I tried to come up with a list of areas in which any developer needs to be skilled to justifiably consider themselves good.</description>
    </item>
    
    <item>
      <title>Clean Code: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/09/15/clean-code-book-review/</link>
      <pubDate>Mon, 15 Sep 2008 10:52:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/15/clean-code-book-review/</guid>
      <description>The Book Clean Code by Robert &amp;#39;Uncle Bob&amp;#39; Martin
The Review I first heard of Uncle Bob a couple of years ago in a conversation with Obie Fernandez, and I had previously read his Agile Principles, Patterns and Practices in C# book. So when my colleague Alexandre Martins came back from JAOO Sydney raving about a &amp;#39;Clean Code&amp;#39; talk he’d seen, I knew I had to buy this book when it came out.</description>
    </item>
    
    <item>
      <title>Alt.NET UK Conference 2.0</title>
      <link>https://www.markhneedham.com/blog/2008/09/14/altnet-uk-conference-20/</link>
      <pubDate>Sun, 14 Sep 2008 16:28:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/14/altnet-uk-conference-20/</guid>
      <description>I spent most of yesterday at the 2nd Alt.NET UK conference at Conway Hall in London.
First of all kudos to Ian Cooper, Alan Dean and Ben Hall for arranging it. There seemed to be a lot more people around than for the one in February which no doubt took a lot of arranging.
It was again run using the open spaces format and we started with an interesting discussion on what Alt.</description>
    </item>
    
    <item>
      <title>Configurable Builds: One configuration file per machine</title>
      <link>https://www.markhneedham.com/blog/2008/09/13/configurable-builds-one-configuration-file-per-machine/</link>
      <pubDate>Sat, 13 Sep 2008 03:54:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/13/configurable-builds-one-configuration-file-per-machine/</guid>
      <description>I’ve covered some of the ways that I’ve seen for making builds configurable in previous posts:
One configuration file per environment
One configuration file per user
Overriding properties
One which I haven’t covered, which my colleagues Gil Peeters and Jim Barritt have pointed out, is having a build with one configuration file for each machine.
Again the setup is fairly similar to one configuration per user or environment. Using Nant we would have the following near the top of the build file:</description>
    </item>
    
    <item>
      <title>The Productive Programmer: Book Review</title>
      <link>https://www.markhneedham.com/blog/2008/09/05/the-productive-programmer-book-review/</link>
      <pubDate>Fri, 05 Sep 2008 00:05:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/05/the-productive-programmer-book-review/</guid>
      <description>The Book The Productive Programmer by Neal Ford
The Review I first came across this book when I was browsing Andy Hunt’s Pragmatic Thinking and Learning: Refactor Your Wetware on Amazon. It showed up as one of the related books.
I had expected it to be a more theoretical book than it actually is. It is full of really useful command line tips and ways to use system tools and IDEs more effectively.</description>
    </item>
    
    <item>
      <title>BDD style unit test names</title>
      <link>https://www.markhneedham.com/blog/2008/09/04/bdd-style-unit-test-names/</link>
      <pubDate>Thu, 04 Sep 2008 00:05:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/04/bdd-style-unit-test-names/</guid>
      <description>A couple of my colleagues have been posting about how to name your unit tests based on this original post by Jay Fields.
I think that test names are useful, especially when written in a BDD style expressing what a test is supposed to be doing.
For example, in a C# NUnit test we might see the following as a test name:
[Test] public void ShouldDoSomething() { // Code testing that we&amp;#39;re doing something } I write all my tests like this and I’m often asked what the point of the &amp;#39;Should&amp;#39; is, why not just name it &amp;#39;DoSomething&amp;#39;.</description>
    </item>
    
    <item>
      <title>The Wisdom of Crowds and groupthink in Agile Software Development</title>
      <link>https://www.markhneedham.com/blog/2008/09/03/the-wisdom-of-crowds-and-groupthink-in-agile-software-development/</link>
      <pubDate>Wed, 03 Sep 2008 15:17:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/03/the-wisdom-of-crowds-and-groupthink-in-agile-software-development/</guid>
      <description>Gojko Adzic posted a summary of a talk James Surowiecki gave at Agile 2008 and it got me thinking how we use the Wisdom of Crowds in Agile projects.
One of the most interesting things I learnt from the book is that when you bring together a diverse group of people, their output will probably be better than any one expert. Gojko points out this example that was used at Agile 2008:</description>
    </item>
    
    <item>
      <title>Configurable Builds: Overriding properties</title>
      <link>https://www.markhneedham.com/blog/2008/09/02/configurable-builds-overriding-properties/</link>
      <pubDate>Tue, 02 Sep 2008 14:49:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/02/configurable-builds-overriding-properties/</guid>
      <description>Sometimes when configuring our build for flexibility we don’t need to spend the time required to create one build configuration per user or one build configuration per environment.
In these cases we can just override properties when we call Nant from the command line.
One recent example where I made use of this was where we had one configuration file with properties in but wanted to override a couple of them when we ran the continuous integration build.</description>
    </item>
    
    <item>
      <title>Configurable Builds: One configuration file per user</title>
      <link>https://www.markhneedham.com/blog/2008/09/02/configurable-builds-one-configuration-file-per-user/</link>
      <pubDate>Tue, 02 Sep 2008 13:53:53 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/02/configurable-builds-one-configuration-file-per-user/</guid>
      <description>Following on from my first post about making builds configurable, the second way of doing this that I have seen is to have one configuration build file per user.
This approach is more useful where there are different configurations needed on each developer machine. For example, if the databases being used for development are on a remote server then each developer machine would be assigned a database with a different name.</description>
    </item>
    
    <item>
      <title>Configurable Builds: One configuration file per environment</title>
      <link>https://www.markhneedham.com/blog/2008/09/02/configurable-builds-one-configuration-file-per-environment/</link>
      <pubDate>Tue, 02 Sep 2008 01:50:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/02/configurable-builds-one-configuration-file-per-environment/</guid>
      <description>One of the most important things when coding build files is to try and make them as configurable as possible.
At the very least on an agile project there will be a need for two different configurations - one for developer machines and one for continuous integration.
On my last two .NET projects we have setup our Nant build to take in a parameter which indicates which build configuration should be used.</description>
    </item>
    
    <item>
      <title>My Software Development journey so far</title>
      <link>https://www.markhneedham.com/blog/2008/09/01/my-software-development-journey-so-far/</link>
      <pubDate>Mon, 01 Sep 2008 01:01:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/09/01/my-software-development-journey-so-far/</guid>
      <description>While reading some of the rough drafts of Apprenticeship Patterns online I started thinking about the stages I have gone through on my Software Development journey so far.
I have worked in the industry for just over 3 years; 1 year at Reed Business and 2 years at ThoughtWorks. Over that time my thoughts, opinions and ways of doing things have changed, and no doubt these will continue to evolve as I learn more and more.</description>
    </item>
    
    <item>
      <title>Perils of estimation</title>
      <link>https://www.markhneedham.com/blog/2008/08/31/perils-of-estimation/</link>
      <pubDate>Sun, 31 Aug 2008 00:24:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/31/perils-of-estimation/</guid>
      <description>I had my first opportunity to participate in release plan estimation over the last couple of weeks. I’ve done estimation before but never at such a high level.
When doing this it appeared clear that there were two situations that we were trying to avoid:
Underestimating Underestimating is where we predict that the amount of time taken to complete a piece of work will be less than it actually is.</description>
    </item>
    
    <item>
      <title>scp Nant Task - &#39;scp&#39; failed to start. The system cannot find the file specified</title>
      <link>https://www.markhneedham.com/blog/2008/08/30/scp-nant-task-scp-failed-to-start-the-system-cannot-find-the-file-specified/</link>
      <pubDate>Sat, 30 Aug 2008 16:30:41 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/30/scp-nant-task-scp-failed-to-start-the-system-cannot-find-the-file-specified/</guid>
      <description>I was trying to make use of the Nant Contrib scp task earlier and was getting an error message which at the time seemed a bit strange (now, of course, having solved the problem, it is obvious!)
This was the task I was running:
&amp;lt;scp file=&amp;#34;someFile.txt&amp;#34; server=&amp;#34;some.secure-server.com&amp;#34; /&amp;gt; This was the error:
&amp;#39;scp&amp;#39; failed to start. The system cannot find the file specified I ran it in debug mode to try and see what was going on and got this stack trace:</description>
    </item>
    
    <item>
      <title>Getting a strongly typed collection using LINQ to Xml</title>
      <link>https://www.markhneedham.com/blog/2008/08/30/getting-a-strongly-typed-collection-using-linq-to-xml/</link>
      <pubDate>Sat, 30 Aug 2008 03:03:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/30/getting-a-strongly-typed-collection-using-linq-to-xml/</guid>
      <description>I mentioned earlier that I have been playing around with LINQ to Xml for parsing a Visual Studio csproj file.
While having namespace issues I decided to try and parse a simpler Xml file to try and work out what I was doing wrong.
Given this fragment of Xml:
&amp;lt;Node&amp;gt; &amp;lt;InnerNode&amp;gt;mark&amp;lt;/InnerNode&amp;gt; &amp;lt;InnerNode&amp;gt;needham&amp;lt;/InnerNode&amp;gt; &amp;lt;/Node&amp;gt; I wanted to get a collection(IEnumerable) of InnerNode values.
Unfortunately my over enthusiasm to use anonymous types meant that I caused myself more problems than I needed to.</description>
    </item>
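The post’s example uses LINQ to Xml in C#; as a rough analogue, the same strongly typed collection of `InnerNode` values can be pulled out with Java’s built-in DOM parser (the `InnerNodeValues` name is illustrative, not from the post):

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class InnerNodeValues {
    // Parses the fragment and returns the InnerNode text values as a typed
    // List<String>, the analogue of getting an IEnumerable of strings.
    static List<String> innerNodeValues(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = doc.getElementsByTagName("InnerNode");
            List<String> values = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                values.add(nodes.item(i).getTextContent());
            }
            return values;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```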
    
    <item>
      <title>C# Thrift Examples</title>
      <link>https://www.markhneedham.com/blog/2008/08/29/c-thrift-examples/</link>
      <pubDate>Fri, 29 Aug 2008 01:39:52 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/29/c-thrift-examples/</guid>
      <description>As I mentioned in my earlier post I have been working with Facebook’s Thrift messaging project.
Unfortunately there are not currently any C# examples of how to use the Data Transfer Objects the Thrift compiler generates for us on the official wiki.
We managed to figure out how to do it by following the Java instructions and converting them into C# code. Before writing any code we need to import Thrift.</description>
    </item>
    
    <item>
      <title>Thrift as a message definition layer</title>
      <link>https://www.markhneedham.com/blog/2008/08/29/thrift-as-a-message-definition-layer/</link>
      <pubDate>Fri, 29 Aug 2008 00:42:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/29/thrift-as-a-message-definition-layer/</guid>
      <description>Thrift is a Facebook released open source project for cross language serialisation and RPC communication.
We made use of it for our message definition layer - when it comes to messaging I’m a fan of the event based approach so we left the RPC stuff well alone.
Why Thrift? The reason we used Thrift in the first place was because we had a requirement to get interoperability between a Java and .</description>
    </item>
    
    <item>
      <title>Querying Xml with LINQ - Don&#39;t forget the namespace</title>
      <link>https://www.markhneedham.com/blog/2008/08/28/querying-xml-with-linq-dont-forget-the-namespace/</link>
      <pubDate>Thu, 28 Aug 2008 10:15:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/28/querying-xml-with-linq-dont-forget-the-namespace/</guid>
      <description>I’ve been working with a colleague on parsing a Visual Studio project file using LINQ to effectively create a DOM of the file.
The first thing we tried to do was get a list of all the references from the file. It seemed like a fairly easy problem to solve but for some reason nothing was getting returned:
XDocument projectFile = XDocument.Load(projectFilePath.Path); var references = from itemGroupElement in projectFile.Descendants(&amp;#34;ItemGroup&amp;#34;).First().Elements() select itemGroupElement.</description>
    </item>
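The post’s lesson applies beyond LINQ to Xml: an element living in an XML namespace is only found when the lookup is namespace-qualified. A rough Java sketch of the same idea (the namespace URI below is a made-up example, not the real csproj one):

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class NamespaceLookup {
    // Counts elements with the given local name using a namespace-qualified
    // lookup; with the wrong (or missing) namespace the element is invisible.
    static int countQualified(String xml, String ns, String localName) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true); // without this, namespaces are ignored
            Document doc = factory.newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getElementsByTagNameNS(ns, localName).getLength();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
```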
    
    <item>
      <title>Handling balances in systems</title>
      <link>https://www.markhneedham.com/blog/2008/08/27/handling-balances-in-systems/</link>
      <pubDate>Wed, 27 Aug 2008 21:47:15 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/27/handling-balances-in-systems/</guid>
      <description>On one of my previous projects one of the problems that we had to solve was how to handle balances - we were working on a cash service for a financial services company.
The main discussion often centres around how often the balance should be updated. From my experience there are two main ways that we can go about this:
Real time update after every transaction This is perhaps the most obvious approach and the implementation is fairly simple.</description>
    </item>
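The “real time update after every transaction” approach can be sketched in a few lines (the `Account` name and cents-based representation are illustrative, not from the post):

```java
// Real-time approach: the stored balance is adjusted as each transaction
// is applied, so reading the balance never requires replaying history.
class Account {
    private long balanceInCents;

    Account(long openingBalanceInCents) {
        this.balanceInCents = openingBalanceInCents;
    }

    // Every transaction immediately updates the stored balance.
    void apply(long amountInCents) {
        balanceInCents += amountInCents;
    }

    long balanceInCents() {
        return balanceInCents;
    }
}
```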
    
    <item>
      <title>Resharper templates</title>
      <link>https://www.markhneedham.com/blog/2008/08/27/resharper-templates/</link>
      <pubDate>Wed, 27 Aug 2008 11:58:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/27/resharper-templates/</guid>
      <description>One of the first things that I do when I go onto a project is to set up a ReSharper template for writing tests.
I generally set this up so that when I type &amp;#39;should&amp;#39; I can press tab and it will automatically create an outline of a test method for me.
Creating a template is as simple as going to &amp;#39;ReSharper &amp;gt; Live Templates&amp;#39; from Visual Studio.
I have attached several templates that I seem to end up writing over and over again.</description>
    </item>
    
    <item>
      <title>Agile - Should we track more than just development?</title>
      <link>https://www.markhneedham.com/blog/2008/08/26/agile-should-we-track-more-than-just-development/</link>
      <pubDate>Tue, 26 Aug 2008 23:57:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/26/agile-should-we-track-more-than-just-development/</guid>
      <description>I touched earlier on the transparency of agile and I’ve been thinking about some of the ways that we track/report information in agile projects.
In all the projects I’ve been involved in the data being tracked almost exclusively referred to development time. One of the advantages of having continuous analysis/development/testing is that we are able to reduce the time spent on the System Integration and User Acceptance Testing phases of the project.</description>
    </item>
    
    <item>
      <title>The transparency of Agile</title>
      <link>https://www.markhneedham.com/blog/2008/08/26/the-transparency-of-agile/</link>
      <pubDate>Tue, 26 Aug 2008 11:46:46 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/26/the-transparency-of-agile/</guid>
      <description>One of the key ideas behind agile software development is providing information as early as possible to allow the business to best make decisions.
There are a variety of ways that this is done including the use of burn up charts, estimates of scope and velocity for example. This data is compiled to try and give an accurate idea of how long a project is likely to take so that the business can work out early on whether the value it adds is worth the expected cost.</description>
    </item>
    
    <item>
      <title>NCover Nant Team City Integration</title>
      <link>https://www.markhneedham.com/blog/2008/08/25/ncover-nant-team-city-integration/</link>
      <pubDate>Mon, 25 Aug 2008 21:29:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/25/ncover-nant-team-city-integration/</guid>
      <description>I’ve been spending quite a bit of time setting up NCover and then integrating it into Team City.
I’ve read some posts which cover parts of this process but nothing which covers the end to end process so hopefully my experience can help to fill that void.
Step 1 Download NCover 1.5.8, NCover Explorer 1.4.0.7, NCover Explorer Extras 1.4.0.5 from Kiwidude’s website and the NCover website.
Step 2 Put the following into your Nant build file:</description>
    </item>
    
    <item>
      <title>Encapsulation in build scripts using nant</title>
      <link>https://www.markhneedham.com/blog/2008/08/21/encapsulation-in-build-scripts-using-nant/</link>
      <pubDate>Thu, 21 Aug 2008 00:40:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/21/encapsulation-in-build-scripts-using-nant/</guid>
      <description>When writing build scripts it’s very easy for it to descend into complete Xml hell when you’re using a tool like nant.
I wondered previously whether it was possible to TDD build files, and while this is difficult given the dependency model most build tools follow, that doesn’t mean we can’t apply other good design principles from the coding world.
Encapsulation is one of the key principles of OOP and it can be applied in build files too.</description>
    </item>
    
    <item>
      <title>Building in release mode with no pdbs with msbuild</title>
      <link>https://www.markhneedham.com/blog/2008/08/20/building-in-release-mode-with-no-pdbs-with-msbuild/</link>
      <pubDate>Wed, 20 Aug 2008 18:50:18 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/20/building-in-release-mode-with-no-pdbs-with-msbuild/</guid>
      <description>I’ve been having trouble trying to work out how to build our projects in msbuild in release mode without creating the customary pdb files that seem to be created by default.
I tried calling msbuild.exe with the &amp;#39;Release&amp;#39; configuration:
&amp;#39;C:\WINDOWS\Microsoft.NET\Framework\v3.5\MSBuild.Exe ( Proj.csproj /p:OutputPath=\output\path\ /p:Configuration=Release)&amp;#39; To no avail. It still created the pdb file. Next I tried setting the &amp;#39;DebugSymbols&amp;#39; property to false:
&amp;#39;C:\WINDOWS\Microsoft.NET\Framework\v3.5\MSBuild.Exe ( Proj.csproj /p:OutputPath=\output\path\ /p:Configuration=Release /p:DebugSymbols=false)&amp;#39; Still it created the file.</description>
    </item>
    
    <item>
      <title>The Information Wall</title>
      <link>https://www.markhneedham.com/blog/2008/08/20/the-information-wall/</link>
      <pubDate>Wed, 20 Aug 2008 00:22:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/20/the-information-wall/</guid>
      <description>Sometimes the simplest things can provide the greatest value to project teams. We often look for a technical solution to problems where something simpler would achieve the same aim.
The Information Wall is, as its name may suggest, a place where you can put information that people in the team need to know but which they have not (yet) committed to memory.
Examples of things that you could put on an information wall could be:</description>
    </item>
    
    <item>
      <title>NCover - Requested value &#39;/r&#39; was not found</title>
      <link>https://www.markhneedham.com/blog/2008/08/19/ncover-requested-value-r-was-not-found/</link>
      <pubDate>Tue, 19 Aug 2008 21:18:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/19/ncover-requested-value-r-was-not-found/</guid>
      <description>I’ve been trying to integrate NCover into our build and probably making life harder for myself than it needs to be.
The title refers to the error message that I was getting when trying to run the ncover nant task on version 1.0.1 of NCover earlier today.
[ncover] Starting &amp;#39;C:\Program Files\NCover\ncover-console.exe (//r &amp;#34;\long\path\to\tmp392.tmp.ncoversettings&amp;#34; )&amp;#39; in &amp;#39;C:\my-project\trunk\src&amp;#39; [ncover] Unhandled Exception: System.ArgumentException: Requested value &amp;#39;/r&amp;#39; was not found. [ncover] at System.Enum.Parse(Type enumType, String value, Boolean ignoreCase) [ncover] at NCover.</description>
    </item>
    
    <item>
      <title>From Prototype to Delivery</title>
      <link>https://www.markhneedham.com/blog/2008/08/18/from-prototype-to-delivery/</link>
      <pubDate>Mon, 18 Aug 2008 22:39:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/18/from-prototype-to-delivery/</guid>
      <description>Projects often reach the interesting point where the prototyping and development phases intersect and there are some interesting decisions to make.
From a development point of view the biggest decision is what to do with the code that has been developed.
When developing prototypes the focus tends to be on getting something to work quick and dirty. Not a lot of time is put into considering edge cases or error conditions or any of the other niceties that are needed for software to be usable in an enterprise environment.</description>
    </item>
    
    <item>
      <title>Returning from methods</title>
      <link>https://www.markhneedham.com/blog/2008/08/17/returning-from-methods/</link>
      <pubDate>Sun, 17 Aug 2008 23:05:33 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/17/returning-from-methods/</guid>
      <description>When pair programming there are obviously times when you have different opinions about how things should be done.
One of these is the way that we should return from methods. There seem to be two approaches when it comes to this:
Exit as quickly as possible The goal with this approach is as the title suggests, to get out of the method at the earliest possible moment.
The Guard Block is the best example of this.</description>
    </item>
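The Guard Block style the post refers to can be sketched like this (the `Discounts` example is illustrative, not from the post):

```java
class Discounts {
    // "Exit as quickly as possible": the guard block handles the edge case
    // up front and returns, keeping the main logic unindented.
    static double discountWithGuard(double total) {
        if (total <= 0) {
            return 0; // guard block: nothing to discount
        }
        return total * 0.1;
    }
}
```

The alternative style wraps the main logic in an `if (total > 0) { … }` and returns once at the end; both compute the same result, and the disagreement is purely about readability.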
    
    <item>
      <title>Naming the patterns we use in code</title>
      <link>https://www.markhneedham.com/blog/2008/08/16/naming-the-patterns-we-use-in-code/</link>
      <pubDate>Sat, 16 Aug 2008 23:58:17 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/16/naming-the-patterns-we-use-in-code/</guid>
      <description>I’ve been playing around with C#&amp;#39;s Xml libraries today and in particular the XmlWriter class.
I wanted to use it to create an Xml document so I called the XmlWriter.Create() method. One of the overloads for this method takes in a StringBuilder, which I initially thought the XmlWriter used to create the Xml document.
In fact it actually writes the Xml document into this StringBuilder. This is possible to deduce from the documentation provided on the Create method, but I only glanced at the type needed initially and misunderstood how it worked.</description>
    </item>
    
    <item>
      <title>Null Handling Strategies</title>
      <link>https://www.markhneedham.com/blog/2008/08/16/null-handling-strategies/</link>
      <pubDate>Sat, 16 Aug 2008 01:03:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/16/null-handling-strategies/</guid>
      <description>I mentioned in an earlier post my dislike of the passing of null around code, and since then there have been a couple of posts on the subject on the ThoughtWorks blogs.
I had always thought that was a silver bullet for the way that we can handle null objects in our code but it seems from reading other people’s opinions and from my own experience that this is not the case (surprise, surprise!</description>
    </item>
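One of the strategies for avoiding passing null around is the Null Object pattern, which can be sketched as follows (the `Customer` and `UNKNOWN` names are illustrative, not from the post):

```java
// Null Object pattern: a stand-in for "no customer" with safe default
// behaviour, so callers never receive null.
interface Customer {
    String name();
    Customer UNKNOWN = () -> "unknown"; // the null object
}

class CustomerRepository {
    private final java.util.Map<String, Customer> byId = new java.util.HashMap<>();

    void add(String id, Customer customer) { byId.put(id, customer); }

    // Returns the null object instead of null when the id is missing,
    // so callers can use the result without a null check.
    Customer find(String id) {
        return byId.getOrDefault(id, Customer.UNKNOWN);
    }
}
```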
    
    <item>
      <title>Getting latest tagged revision in SVN from DOS/Batch script</title>
      <link>https://www.markhneedham.com/blog/2008/08/16/getting-latest-tagged-revision-in-svn-from-dosbatch-script/</link>
      <pubDate>Sat, 16 Aug 2008 00:10:51 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/16/getting-latest-tagged-revision-in-svn-from-dosbatch-script/</guid>
      <description>The way we have set up the build on our continuous integration server, Team City is configured to create a new tag every time the functional tests pass successfully on that machine.
We then have a QA and Showcase build that we can run to deploy all the artifacts necessary to launch the application on that machine.
Originally I had just written the batch script to take in the tag of the build which the user could find by looking through repo-browser for the last tag created.</description>
    </item>
    
    <item>
      <title>First thoughts on using var in C# 3.0 with Resharper</title>
      <link>https://www.markhneedham.com/blog/2008/08/15/first-thoughts-on-using-var-in-c-30-with-resharper/</link>
      <pubDate>Fri, 15 Aug 2008 08:03:09 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/15/first-thoughts-on-using-var-in-c-30-with-resharper/</guid>
      <description>One of the first things I noticed when coming into the world of C# 3.0 was the use of the key word &amp;#39;var&amp;#39; all over our code base.
I had read about it previously and was under the impression that its main use would be when writing code around LINQ or when creating anonymous types.
On getting Resharper to tidy up my code I noticed that just about every variable type declaration had been removed and replaced with var.</description>
    </item>
    
    <item>
      <title>Macros in nant</title>
      <link>https://www.markhneedham.com/blog/2008/08/14/macros-in-nant/</link>
      <pubDate>Thu, 14 Aug 2008 21:49:04 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/14/macros-in-nant/</guid>
      <description>One of my favourite features of ant is the ability to create macros where you can define common behaviour and then call it from the rest of your build script.
Unfortunately that task doesn’t come with nant and it’s not available on nant-contrib either.
We were using a very roundabout way to build the various projects in our solution:
&amp;lt;target name=&amp;#34;compile&amp;#34;&amp;gt;
  &amp;lt;foreach item=&amp;#34;Folder&amp;#34; property=&amp;#34;folderName&amp;#34;&amp;gt;
    &amp;lt;include name=&amp;#34;${project::get-base-directory()}\Project1&amp;#34; /&amp;gt;
    &amp;lt;include name=&amp;#34;${project::get-base-directory()}\Project2&amp;#34; /&amp;gt;
    &amp;lt;property name=&amp;#34;project.</description>
    </item>
    
    <item>
      <title>msbuild - Use OutputPath instead of OutDir</title>
      <link>https://www.markhneedham.com/blog/2008/08/14/msbuild-use-outputpath-instead-of-outdir/</link>
      <pubDate>Thu, 14 Aug 2008 19:54:03 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/14/msbuild-use-outputpath-instead-of-outdir/</guid>
      <description>We’ve been using msbuild to build our project files on my current project and a colleague and I noticed some strange behaviour when trying to set the directory that the output should be built to.
The problem was that whenever we tried to set the output directory (using OutDir) to a path with a space in the directory name, it would just fail catastrophically. We spent ages searching for the command line documentation before finding it here.</description>
    </item>
    
    <item>
      <title>Auto complete with tab in DOS</title>
      <link>https://www.markhneedham.com/blog/2008/08/13/auto-completion-with-tab-in-dos/</link>
      <pubDate>Wed, 13 Aug 2008 23:55:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/13/auto-completion-with-tab-in-dos/</guid>
      <description>It’s been quite a couple of weeks of learning around DOS for me, and I have another tip that I just learnt today.
I always found it really frustrating when using the windows command prompt that I couldn’t get Unix style tab auto completion. To navigate my way to a directory I would do the following:
C:\&amp;gt;cd Downloads
C:\Downloads&amp;gt;cd nant-0.85
C:\Downloads\nant-0.85&amp;gt;cd bin
C:\Downloads\nant-0.85\bin&amp;gt;
It is very tedious as you might imagine.</description>
    </item>
    
    <item>
      <title>Pair Programming: Junior/Junior pair</title>
      <link>https://www.markhneedham.com/blog/2008/08/13/pair-programming-juniorjunior-pair/</link>
      <pubDate>Wed, 13 Aug 2008 23:18:24 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/13/pair-programming-juniorjunior-pair/</guid>
      <description>When it comes to Pair Programming the Junior/Junior pairing is considered the most likely to lead to disaster. The old joke being that if you pair two Junior Developers together then you’d better hope you have a revert function on your repository. But is this fair?
Certainly pairing two Junior Developers together means that you automatically lose the value of the experience and mentoring skills that you would get if there was a Senior Developer as part of the pair.</description>
    </item>
    
    <item>
      <title>If Else statements in batch files</title>
      <link>https://www.markhneedham.com/blog/2008/08/13/if-else-statements-in-batch-files/</link>
      <pubDate>Wed, 13 Aug 2008 22:27:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/13/if-else-statements-in-batch-files/</guid>
      <description>I mentioned in a couple of earlier posts that I’ve been doing quite a bit of work with batch files and the windows command line, and today I wanted to do an If Else statement in one of my scripts.
I thought it would be relatively simple, but after various searches and having read articles that suggested that there wasn’t an ELSE construct in batch land I finally found a forum post which explained how to do it.</description>
    </item>
    
    <item>
      <title>Dependency Tasks</title>
      <link>https://www.markhneedham.com/blog/2008/08/12/dependency-tasks/</link>
      <pubDate>Tue, 12 Aug 2008 23:48:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/12/dependency-tasks/</guid>
      <description>I am a big fan of Pat Kua’s tiny tasks and to see my desk would be to believe that there had been an invasion of yellow stickies on the planet.
Pat explains the idea on his website but to summarise: the idea is that given a story, you break it down into the individual tasks that need to be done in order for it to be complete, write each task on a sticky and then when that task is finished throw the sticky away.</description>
    </item>
    
    <item>
      <title>Getting the current working directory from DOS or Batch file</title>
      <link>https://www.markhneedham.com/blog/2008/08/12/getting-the-current-working-directory-from-dos-or-batch-file/</link>
      <pubDate>Tue, 12 Aug 2008 22:37:27 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/12/getting-the-current-working-directory-from-dos-or-batch-file/</guid>
      <description>In the world of batch files I’ve been trying for ages to work out how to get the current/present working directory to make the batch script I’m working on a bit more flexible.
In Unix it’s easy, just call &amp;#39;pwd&amp;#39; and you have it. I wasn’t expecting something that simple in Windows but it is! A call to &amp;#39;cd&amp;#39; is all that’s needed. If you need to set it in a batch script the following line does the trick:
set WORKING_DIRECTORY=%cd%
I was surprised that something so simple (I do now feel like an idiot) wasn’t easier to find on Google.</description>
    </item>
    
    <item>
      <title>Pair Programming: Pairing with a QA</title>
      <link>https://www.markhneedham.com/blog/2008/08/11/pair-programming-pairing-with-a-qa/</link>
      <pubDate>Mon, 11 Aug 2008 22:10:19 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/11/pair-programming-pairing-with-a-qa/</guid>
      <description>I’ve talked about pair programming in some of my previous posts as I find the dynamic it creates quite intriguing.
One idea suggested by my project manager around the time I wrote those posts was developers pairing with the QA or BA on certain tasks. I didn’t get to experience it on that particular project but over the last week I’ve been doing quite a bit of build work and for some of that I was pairing with our QA.</description>
    </item>
    
    <item>
      <title>Does generalising specialist mean you can&#39;t be the best?</title>
      <link>https://www.markhneedham.com/blog/2008/08/11/does-generalising-specialist-mean-you-cant-be-the-best/</link>
      <pubDate>Mon, 11 Aug 2008 05:31:55 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/11/does-generalising-specialist-mean-you-cant-be-the-best/</guid>
      <description>It’s often said that people who are really good at what they do are so good at it because they have narrowed their focus in their area of specialty until they are only doing the thing that they are good at.
To use a football analogy, Manchester United’s Cristiano Ronaldo - widely acknowledged as the best footballer in the world at the moment - is absolutely brilliant when he has the ball at his feet, taking on defenders, getting in shots around the opposition penalty area.</description>
    </item>
    
    <item>
      <title>Controlling window position with the win32 API</title>
      <link>https://www.markhneedham.com/blog/2008/08/10/controlling-window-position-with-the-win32-api/</link>
      <pubDate>Sun, 10 Aug 2008 03:02:47 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/10/controlling-window-position-with-the-win32-api/</guid>
      <description>We’ve been doing a bit of work around controlling the state of the windows of applications launched programmatically.
The problem we were trying to solve is to launch an arbitrary application, move it around the screen and then save its window position on the screen so that next time it’s launched it loads in the same position.
There are some win32 APIs designed to do just this, although it took a fair bit of searching and trial and error to work out exactly how to use them.</description>
    </item>
    
    <item>
      <title>Hiring Developers - not just about the code</title>
      <link>https://www.markhneedham.com/blog/2008/08/10/hiring-developers-not-just-about-the-code/</link>
      <pubDate>Sun, 10 Aug 2008 01:23:38 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/10/hiring-developers-not-just-about-the-code/</guid>
      <description>It seems programmers are taking a bit of a hammering this week!
Kris Kemper talks about the Net Negative Producing Programmer referring to a paper linked to by Jay Fields, concluding that the code submission is very important in helping to distinguish between good and bad candidates.
Now I probably haven’t done as many interviews at ThoughtWorks as Kris has but from what I’ve seen of the recruitment process it seems to be more focused on ensuring that potential hires culturally fit into the organisation rather than that they write the best code that anyone has ever seen.</description>
    </item>
    
    <item>
      <title>IntelliJ style item tracking in Visual Studio</title>
      <link>https://www.markhneedham.com/blog/2008/08/09/intellij-style-item-tracking-in-visual-studio/</link>
      <pubDate>Sat, 09 Aug 2008 14:51:29 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/09/intellij-style-item-tracking-in-visual-studio/</guid>
      <description>One of my favourite features of IntelliJ is that it tracks the item that you currently have open on your Solution Explorer.
I thought this wasn’t possible in Visual Studio and had resigned myself to trying to remember which project each file was in. Luckily for me a colleague pointed out that it is in fact possible but is just turned off by default.
Tools &amp;gt; Options &amp;gt; Projects and Solutions &amp;gt; Check &amp;#39;Track Active Item in Solution Explorer&amp;#39;</description>
    </item>
    
    <item>
      <title>Ruby: Parameterising with ActiveResource</title>
      <link>https://www.markhneedham.com/blog/2008/08/08/ruby-parameterising-with-activeresource/</link>
      <pubDate>Fri, 08 Aug 2008 22:16:02 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/08/ruby-parameterising-with-activeresource/</guid>
      <description>We’ve been using Ruby/Rails on my current project to create a RESTful web service. One of the problems we wanted to solve was making the data queried by this web service configurable from our build.
We started off with the following bit of code (which makes use of the recently added ActiveResource class):
class MyClass &amp;lt; ActiveResource::Base
  self.site = &amp;#34;http://localhost:3000/&amp;#34;
end
And then called this class as follows: MyClass.</description>
    </item>
    
    <item>
      <title>Spaces in batch scripts</title>
      <link>https://www.markhneedham.com/blog/2008/08/08/spaces-in-batch-scripts/</link>
      <pubDate>Fri, 08 Aug 2008 20:10:49 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/08/spaces-in-batch-scripts/</guid>
      <description>Since reading The Pragmatic Programmer I’ve become a bit of an automation junkie and writing batch scripts falls right under that category.
Unfortunately, nearly every single time I write one I forget that Windows really hates it when you have spaces in variable assignments, and I forget how to print out a usage message if the right number of parameters are not passed in.
So as much for me as for everyone else, this is how you do it:
@ECHO off
IF [%1]==[] GOTO usage
IF [%2]==[] GOTO usage
set VAR1=%1
set VAR2=%2
rem important client stuff
goto end
:usage
echo Usage: script.</description>
    </item>
    
    <item>
      <title>TeamCity&#39;s strange default build location</title>
      <link>https://www.markhneedham.com/blog/2008/08/08/teamcitys-strange-default-build-location/</link>
      <pubDate>Fri, 08 Aug 2008 19:52:50 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/08/teamcitys-strange-default-build-location/</guid>
      <description>We’ve been using TeamCity on my current project and it’s proven to be fairly impressive in general.
We’re running quite a few different builds which have dependencies on each other and it’s been pretty much one click on the web admin tool to get that set up.
One thing that had me really confused is the default location it chooses to build from. The problem is that it seems to change arbitrarily, with the folder name it builds in being calculated from a VCS hash (not sure quite how that’s worked out but there we go).</description>
    </item>
    
    <item>
      <title>Keyboard shortcut for running tests with Resharper</title>
      <link>https://www.markhneedham.com/blog/2008/08/08/keyboard-shortcut-for-running-tests-with-resharper/</link>
      <pubDate>Fri, 08 Aug 2008 19:23:13 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/08/keyboard-shortcut-for-running-tests-with-resharper/</guid>
      <description>Having moved back into the world of C#/.NET development after a few months in the Java world I have had the joy of getting to use Resharper again.
One annoyance that my team and I have been having over the past few weeks is running unit tests. We always end up going to the Solution Explorer, right-clicking the project and then clicking &amp;#39;Run Unit Tests&amp;#39;. There is another way…​</description>
    </item>
    
    <item>
      <title>If they were that rubbish...</title>
      <link>https://www.markhneedham.com/blog/2008/08/08/if-they-were-that-rubbish/</link>
      <pubDate>Fri, 08 Aug 2008 19:15:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/08/08/if-they-were-that-rubbish/</guid>
      <description>Jay Fields certainly seemed to make some waves in the blogosphere with his recent post about 50% of the people in business software development needing to find a new profession.
As a consultant I often go onto projects where a significant amount of difficult to understand and often untested code is in place. At times it feels like the people who have written it really don’t care about the quality of their work which can be very disheartening.</description>
    </item>
    
    <item>
      <title>Do IDEs encourage bad code?</title>
      <link>https://www.markhneedham.com/blog/2008/07/27/do-ides-encourage-bad-code/</link>
      <pubDate>Sun, 27 Jul 2008 11:43:30 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/07/27/do-ides-encourage-bad-code/</guid>
      <description>Although modern day IDEs (Eclipse, IntelliJ, Resharper etc) undoubtedly provide a lot of benefits when writing code, I am starting to wonder if the ease with which they make things possible actually encourages bad habits.
Useful features such as creating and initialising member variables from the definition of a constructor are quickly nullified by the ease with which one is able to create getters/setters/properties for these same member variables. All hopes of encapsulation gone with a few clicks of the mouse.</description>
    </item>
    
    <item>
      <title>Null checks everywhere and airport security</title>
      <link>https://www.markhneedham.com/blog/2008/07/18/null-checks-everywhere-and-airport-security/</link>
      <pubDate>Fri, 18 Jul 2008 08:32:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/07/18/null-checks-everywhere-and-airport-security/</guid>
      <description>Having just flown half way across the world from Sydney to London I’ve been thinking about how airport security is done and noticed a somewhat interesting link to the use of null checks in code.
In Sydney and Dubai airports my baggage and I were scanned only once before I was able to get onto the plane. I wasn’t scanned again when I went to the departure gate nor when I got onto the aircraft.</description>
    </item>
    
    <item>
      <title>Pair Programming: The Non Driving Pair</title>
      <link>https://www.markhneedham.com/blog/2008/02/14/pair-programming-the-non-driving-pair/</link>
      <pubDate>Thu, 14 Feb 2008 01:27:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/02/14/pair-programming-the-non-driving-pair/</guid>
      <description>One of the intriguing aspects of pair programming for me is that of the non driving person in the pair — what are they supposed to do?!
Obviously there are fairly well known strategies for more interactive pairing, such as Ping Pong and Ball and Board (which is where one person controls the mouse and the other the keyboard), but neither of these strategies suggests what to do when you are not driving.</description>
    </item>
    
    <item>
      <title>Feedback: Positive Reinforcement/Change yourself first</title>
      <link>https://www.markhneedham.com/blog/2008/02/12/feedback-positive-reinforcementchange-yourself-first/</link>
      <pubDate>Tue, 12 Feb 2008 00:01:58 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/02/12/feedback-positive-reinforcementchange-yourself-first/</guid>
      <description>One of the more interesting concepts used on the NLP course that I did last year was the idea of only giving positive feedback to people.
The thinking behind the theory (which I think comes from Robert Dilts, one of the early thinkers behind NLP) is that people know what they are doing wrong and already beat themselves up about it; therefore there is no point you mentioning it as well.</description>
    </item>
    
    <item>
      <title>Pair Programming: Introduction</title>
      <link>https://www.markhneedham.com/blog/2008/02/10/pair-programming-introduction/</link>
      <pubDate>Sun, 10 Feb 2008 01:47:25 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/02/10/pair-programming-introduction/</guid>
      <description>I’ve had the opportunity to work with many different people pair programming wise over the last year or so, and having seen it done in several different ways thought it would be interesting to work through some of my thoughts about this Extreme Programming (XP) originated practice.
First of all it seems to me that pair programming is a technique that is used with a lot more frequency at ThoughtWorks than at any other IT organisation.</description>
    </item>
    
    <item>
      <title>Learning theory first</title>
      <link>https://www.markhneedham.com/blog/2008/02/09/learning-theory-first/</link>
      <pubDate>Sat, 09 Feb 2008 13:15:11 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2008/02/09/learning-theory-first/</guid>
      <description>I’ve always been the type of person who only gets the motivation to do something if there is some useful practical reason for doing so. It therefore probably doesn’t come as much of a surprise that I hated the majority of my mostly theoretical computer science degree.
I was talking to one of my colleagues last week and came out of the conversation convinced that the desire to know the theory behind concepts is amplified when you actually get to see it in action in a real life system.</description>
    </item>
    
    <item>
      <title>Active listening</title>
      <link>https://www.markhneedham.com/blog/2006/09/03/active-listening/</link>
      <pubDate>Sun, 03 Sep 2006 15:39:06 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2006/09/03/active-listening/</guid>
      <description>One of the first unusual (to me) things that I noticed from the trainers at ThoughtWorks University was that when they were listening to participants they would often ask questions and re-frame the participants&amp;#39; comments. Intrigued and impressed by this I spoke to one of the trainers and was told that they were engaging in &amp;#39;active listening&amp;#39;. Wikipedia defines the term as follows:
Active listening is an intent &amp;#34;listening for meaning&amp;#34; in which the listener checks with the speaker to see that a statement has been correctly heard and understood.</description>
    </item>
    
    <item>
      <title>Giving effective feedback</title>
      <link>https://www.markhneedham.com/blog/2006/09/02/giving-effective-feedback/</link>
      <pubDate>Sat, 02 Sep 2006 03:07:45 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2006/09/02/giving-effective-feedback/</guid>
      <description>One of the most interesting things I have discovered since starting at ThoughtWorks earlier this month is the emphasis that is placed on giving feedback.
The first lesson we were taught about giving feedback was that it could be one of two types. Either it should Strengthen Confidence or Increase Effectiveness.
In layman’s terms that means that if you want to make a positive comment about somebody’s contribution then you should make reference to something specific that you believe they have done well so that they can continue doing it.</description>
    </item>
    
    <item>
      <title>Inheritance and Delegation</title>
      <link>https://www.markhneedham.com/blog/2006/09/02/inheritance-and-delegation/</link>
      <pubDate>Sat, 02 Sep 2006 01:31:40 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2006/09/02/inheritance-and-delegation/</guid>
      <description>One of the major learning points this week at TWU has been understanding when it is appropriate to use inheritance and when delegation is the better choice.
I had heard stories about how inheritance could be misused but I didn’t think I would be stupid enough to fall straight into that trap! We were taught the concept using &amp;#39;Measurement&amp;#39; as the problem domain. So to translate the previous sentence into English: The aim was to design classes which could handle old school measurement types such as Inches, Feet, Yards, and so on.</description>
    </item>
    
    <item>
      <title>Watching a master at work</title>
      <link>https://www.markhneedham.com/blog/2006/09/02/watching-a-master-at-work/</link>
      <pubDate>Sat, 02 Sep 2006 01:01:44 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2006/09/02/watching-a-master-at-work/</guid>
      <description>I’ve always found it fascinating watching people who really excel in their field going about their business, be it footballers, tennis players, actors, whoever.
This week at TWU I’ve been playing around with some Ruby on Rails as I mentioned in the previous post, and yesterday I had the opportunity to watch one of the leading figures in the Ruby on Rails field at work. Take a bow Obie Fernandez, who gave several of the TWU attendees a demonstration of how to develop applications using Ruby on Rails.</description>
    </item>
    
    <item>
      <title>First thoughts on Ruby...</title>
      <link>https://www.markhneedham.com/blog/2006/08/29/first-thoughts-on-ruby/</link>
      <pubDate>Tue, 29 Aug 2006 20:01:05 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2006/08/29/first-thoughts-on-ruby/</guid>
      <description>I’ve heard a lot about Ruby on Rails over the last couple of years but I’d never really been intrigued to get it set up on my machine and &amp;#39;have a play&amp;#39; with it so to speak.
It turned out to be a relatively painless process and after following the instructions on the official site I had it all setup within about half an hour which was a record for me for getting a development environment setup.</description>
    </item>
    
    <item>
      <title>The Mythical Man Month: Book Review</title>
      <link>https://www.markhneedham.com/blog/2009/04/11/the-mythical-man-month-book-review/</link>
      <pubDate>Sat, 11 Apr 2009 00:00:00 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2009/04/11/the-mythical-man-month-book-review/</guid>
      <description></description>
    </item>
    
    <item>
      <title>Python: Flask - Generating a static HTML page</title>
      <link>https://www.markhneedham.com/blog/2017/04/27/python-flask-generating-a-static-html-page/</link>
      <pubDate>Thu, 27 Apr 2017 20:59:56 +0000</pubDate>
      
      <guid>https://www.markhneedham.com/blog/2017/04/27/python-flask-generating-a-static-html-page/</guid>
      <description>Whenever I need to quickly spin up a web application Python’s Flask library is my go-to tool, but I recently found myself wanting to generate a static HTML page to upload to S3 and wondered if I could use it for that as well.</description>
    </item>
    
  </channel>
</rss>
