Mark Needham

Thoughts on Software Development

Archive for the ‘golang’ tag

Go: Multi-threaded writing to a CSV file

with one comment

As part of a Go script I’ve been working on I wanted to write to a CSV file from multiple Go routines, but realised that the built in CSV Writer isn’t thread safe.

My first attempt at writing to the CSV file looked like this:

package main
 
 
import (
	"encoding/csv"
	"os"
	"log"
	"strconv"
)
 
func main() {
 
	csvFile, err := os.Create("/tmp/foo.csv")
	if err != nil {
		log.Panic(err)
	}
 
	w := csv.NewWriter(csvFile)
	w.Write([]string{"id1","id2","id3"})
 
	count := 100
	done := make(chan bool, count)
 
	for i := 0; i < count; i++ {
		go func(i int) {
			w.Write([]string {strconv.Itoa(i), strconv.Itoa(i), strconv.Itoa(i)})
			done <- true
		}(i)
	}
 
	for i:=0; i < count; i++ {
		<- done
	}
	w.Flush()
}

This script should output the numbers from 0-99 three times on each line. Some rows in the file are written correctly, but as we can see below, some aren’t:

40,40,40
37,37,37
38,38,38
18,18,39
^@,39,39
...
67,67,70,^@70,70
65,65,65
73,73,73
66,66,66
72,72,72
75,74,75,74,75
74
7779^@,79,77
...

One way that we can make our script safe is to use a mutex whenever we’re calling any methods on the CSV writer. I wrote the following code to do this:

type CsvWriter struct {
	mutex *sync.Mutex
	csvWriter *csv.Writer
}
 
func NewCsvWriter(fileName string) (*CsvWriter, error) {
	csvFile, err := os.Create(fileName)
	if err != nil {
		return nil, err
	}
	w := csv.NewWriter(csvFile)
	return &CsvWriter{csvWriter:w, mutex: &sync.Mutex{}}, nil
}
 
func (w *CsvWriter) Write(row []string) {
	w.mutex.Lock()
	w.csvWriter.Write(row)
	w.mutex.Unlock()
}
 
func (w *CsvWriter) Flush() {
	w.mutex.Lock()
	w.csvWriter.Flush()
	w.mutex.Unlock()
}

We create a mutex when NewCsvWriter instantiates CsvWriter and then use it in the Write and Flush functions so that only one go routine at a time can access the underlying CsvWriter. We then tweak the initial script to call this class instead of calling CsvWriter directly:

func main() {
	w, err := NewCsvWriter("/tmp/foo-safe.csv")
	if err != nil {
		log.Panic(err)
	}
 
	w.Write([]string{"id1","id2","id3"})
 
	count := 100
	done := make(chan bool, count)
 
	for i := 0; i < count; i++ {
		go func(i int) {
			w.Write([]string {strconv.Itoa(i), strconv.Itoa(i), strconv.Itoa(i)})
			done <- true
		}(i)
	}
 
	for i:=0; i < count; i++ {
		<- done
	}
	w.Flush()
}

And now if we inspect the CSV file all lines have been written successfully:

...
25,25,25
13,13,13
29,29,29
32,32,32
26,26,26
30,30,30
27,27,27
31,31,31
28,28,28
34,34,34
35,35,35
33,33,33
37,37,37
36,36,36
...

That’s all for now. If you have any suggestions for a better way to do this do let me know in the comments or on twitter – I’m @markhneedham

Written by Mark Needham

January 31st, 2017 at 5:57 am

Posted in Go

Tagged with ,

Go vs Python: Parsing a JSON response from a HTTP API

without comments

As part of a recommendations with Neo4j talk that I’ve presented a few times over the last year I have a set of scripts that download some data from the meetup.com API.

They’re all written in Python but I thought it’d be a fun exercise to see what they’d look like in Go. My eventual goal is to try and parallelise the API calls.

This is the Python version of the script:

import requests
import os
import json
 
key =  os.environ['MEETUP_API_KEY']
lat = "51.5072"
lon = "0.1275"
 
seed_topic = "nosql"
uri = "https://api.meetup.com/2/groups?&amp;topic={0}&amp;lat={1}&amp;lon={2}&amp;key={3}".format(seed_topic, lat, lon, key)
 
r = requests.get(uri)
all_topics = [topic["urlkey"]  for result in r.json()["results"] for topic in result["topics"]]
 
for topic in all_topics:
    print topic

We’re using the requests library to send a request to the meetup API to get the groups which have the topic ‘nosql’ in the London area. We then parse the response and print out the topics.

Now to do the same thing in Go! The first bit of the script is almost identical:

import (
	"fmt"
	"os"
	"net/http"
	"log"
	"time"
)
 
func handleError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}
 
func main() {
	var httpClient = &amp;http.Client{Timeout: 10 * time.Second}
 
	seedTopic := "nosql"
	lat := "51.5072"
	lon := "0.1275"
	key := os.Getenv("MEETUP_API_KEY")
 
	uri := fmt.Sprintf("https://api.meetup.com/2/groups?&amp;topic=%s&amp;lat=%s&amp;lon=%s&amp;key=%s", seedTopic, lat, lon, key)
 
	response, err := httpClient.Get(uri)
	handleError(err)
	defer response.Body.Close()
	fmt.Println(response)
}

If we run that this is the output we see:

$ go cmd/blog/main.go
 
&amp;{200 OK 200 HTTP/2.0 2 0 map[X-Meetup-Request-Id:[2d3be3c7-a393-4127-b7aa-076f150499e6] X-Ratelimit-Reset:[10] Cf-Ray:[324093a73f1135d2-LHR] X-Oauth-Scopes:[basic] Etag:["35a941c5ea3df9df4204d8a4a2d60150"] Server:[cloudflare-nginx] Set-Cookie:[__cfduid=d54db475299a62af4bb963039787e2e3d1484894864; expires=Sat, 20-Jan-18 06:47:44 GMT; path=/; domain=.meetup.com; HttpOnly] X-Meetup-Server:[api7] X-Ratelimit-Limit:[30] X-Ratelimit-Remaining:[29] X-Accepted-Oauth-Scopes:[basic] Vary:[Accept-Encoding,User-Agent,Accept-Language] Date:[Fri, 20 Jan 2017 06:47:45 GMT] Content-Type:[application/json;charset=utf-8]] 0xc420442260 -1 [] false true map[] 0xc4200d01e0 0xc4202b2420}

So far so good. Now we need to parse the response that comes back.

Most of the examples that I came across suggest creating a struct with all the fields that you want to extract from the JSON document but that feels a bit over kill for such a simple script.

Instead we can just create maps of (string -> interface{}) and then apply type conversions where appropriate. I ended up with the following code to extract the topics:

import "encoding/json"
 
var target map[string]interface{}
decoder := json.NewDecoder(response.Body)
decoder.Decode(&amp;target)
 
for _, rawGroup := range target["results"].([]interface{}) {
    group := rawGroup.(map[string]interface{})
    for _, rawTopic := range group["topics"].([]interface{}) {
        topic := rawTopic.(map[string]interface{})
        fmt.Println(topic["urlkey"])
    }
}

It’s more verbose that the Python version because we have to explicitly type each thing we take out of the map at every stage, but it’s not too bad. This is the full script:

package main
 
import (
	"fmt"
	"os"
	"net/http"
	"log"
	"time"
	"encoding/json"
)
 
func handleError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}
 
func main() {
	var httpClient = &amp;http.Client{Timeout: 10 * time.Second}
 
	seedTopic := "nosql"
	lat := "51.5072"
	lon := "0.1275"
	key := os.Getenv("MEETUP_API_KEY")
 
	uri := fmt.Sprintf("https://api.meetup.com/2/groups?&amp;topic=%s&amp;lat=%s&amp;lon=%s&amp;key=%s", seedTopic, lat, lon, key)
 
	response, err := httpClient.Get(uri)
	handleError(err)
	defer response.Body.Close()
 
	var target map[string]interface{}
	decoder := json.NewDecoder(response.Body)
	decoder.Decode(&amp;target)
 
	for _, rawGroup := range target["results"].([]interface{}) {
		group := rawGroup.(map[string]interface{})
		for _, rawTopic := range group["topics"].([]interface{}) {
			topic := rawTopic.(map[string]interface{})
			fmt.Println(topic["urlkey"])
		}
	}
}

Once I’ve got these topics the next step is to make more API calls to get the groups for those topics.

I want to make those API calls in parallel while making sure I don’t exceed the rate limit restrictions on the API and I think I can make use of go routines, channels, and timers to do that. But that’s for another post!

Written by Mark Needham

January 21st, 2017 at 10:49 am

Posted in Python

Tagged with ,

Go: First attempt at channels

without comments

In a previous blog post I mentioned that I wanted to extract blips from The ThoughtWorks Radar into a CSV file and I thought this would be a good mini project for me to practice using Go.

In particular I wanted to try using channels and this seemed like a good chance to do that.

I watched a talk by Rob Pike on designing concurrent applications where he uses the following definition of concurrency:

Concurrency is a way to structure a program by breaking it into pieces that can be executed independently.

He then demonstrates this with the following diagram:

2016 12 23 19 52 30

I broke the scraping application down into four parts:

  1. Find the links of blips to download ->
  2. Download the blips ->
  3. Scrape the data from each page ->
  4. Write the data into a CSV file

I don’t think we gain much by parallelising steps 1) or 4) but steps 2) and 3) seem easily parallelisable. Therefore we’ll use a single goroutine for steps 1) and 4) and multiple goroutines for steps 2) and 3).

We’ll create two channels:

  • filesToScrape
  • filesScraped

And they will interact with our components like this:

  • 2) will write the path of the downloaded files into filesToScape
  • 3) will read from filesToScrape and write the scraped content into filesScraped
  • 4) will read from filesScraped and put that information into a CSV file.


I decided to write a completely serial version of the scraping application first so that I could compare it to the parallel version. I had the following common code:

scrape/scrape.go

package scrape
 
import (
	"github.com/PuerkitoBio/goquery"
	"os"
	"bufio"
	"fmt"
	"log"
	"strings"
	"net/http"
	"io"
)
 
func checkError(err error) {
	if err != nil {
		fmt.Println(err)
		log.Fatal(err)
	}
}
 
type Blip struct {
	Link  string
	Title string
}
 
func (blip Blip) Download() File {
	parts := strings.Split(blip.Link, "/")
	fileName := "rawData/items/" + parts[len(parts) - 1]
 
	if _, err := os.Stat(fileName); os.IsNotExist(err) {
		resp, err := http.Get("http://www.thoughtworks.com" + blip.Link)
		checkError(err)
		body := resp.Body
 
		file, err := os.Create(fileName)
		checkError(err)
 
		io.Copy(bufio.NewWriter(file), body)
		file.Close()
		body.Close()
	}
 
	return File{Title: blip.Title, Path: fileName }
}
 
type File struct {
	Title string
	Path  string
}
 
func (fileToScrape File ) Scrape() ScrapedFile {
	file, err := os.Open(fileToScrape.Path)
	checkError(err)
 
	doc, err := goquery.NewDocumentFromReader(bufio.NewReader(file))
	checkError(err)
	file.Close()
 
	var entries []map[string]string
	doc.Find("div.blip-timeline-item").Each(func(i int, s *goquery.Selection) {
		entry := make(map[string]string, 0)
		entry["time"] = s.Find("div.blip-timeline-item__time").First().Text()
		entry["outcome"] = strings.Trim(s.Find("div.blip-timeline-item__ring span").First().Text(), " ")
		entry["description"] = s.Find("div.blip-timeline-item__lead").First().Text()
		entries = append(entries, entry)
	})
 
	return ScrapedFile{File:fileToScrape, Entries:entries}
}
 
type ScrapedFile struct {
	File    File
	Entries []map[string]string
}
 
func FindBlips(pathToRadar string) []Blip {
	blips := make([]Blip, 0)
 
	file, err := os.Open(pathToRadar)
	checkError(err)
 
	doc, err := goquery.NewDocumentFromReader(bufio.NewReader(file))
	checkError(err)
 
	doc.Find(".blip").Each(func(i int, s *goquery.Selection) {
		item := s.Find("a")
		title := item.Text()
		link, _ := item.Attr("href")
		blips = append(blips, Blip{Title: title, Link: link })
	})
 
	return blips
}

Note that we’re using the goquery library to scrape the HTML files that we download.

A Blip is used to represent an item that appears on the radar e.g. .NET Core. A File is a representation of that blip on my local file system and a ScrapedFile contains the local representation of a blip and has an array containing every appearance the blip has made in radars over time.

Let’s have a look at the single threaded version of the scraper:

cmd/single/main.go

package main
 
import (
	"fmt"
	"encoding/csv"
	"os"
	"github.com/mneedham/neo4j-thoughtworks-radar/scrape"
)
 
 
func main() {
	var filesCompleted chan scrape.ScrapedFile = make(chan scrape.ScrapedFile)
	defer close(filesCompleted)
 
	blips := scrape.FindBlips("rawData/twRadar.html")
 
	var filesToScrape []scrape.File
	for _, blip := range blips {
		filesToScrape = append(filesToScrape, blip.Download())
	}
 
	var filesScraped []scrape.ScrapedFile
	for _, file := range filesToScrape {
		filesScraped = append(filesScraped, file.Scrape())
	}
 
	blipsCsvFile, _ := os.Create("import/blipsSingle.csv")
	writer := csv.NewWriter(blipsCsvFile)
	defer blipsCsvFile.Close()
 
	writer.Write([]string{"technology", "date", "suggestion" })
	for _, scrapedFile := range filesScraped {
		fmt.Println(scrapedFile.File.Title)
		for _, blip := range scrapedFile.Entries {
			writer.Write([]string{scrapedFile.File.Title, blip["time"], blip["outcome"] })
		}
	}
	writer.Flush()
}

rawData/twRadar.html is a local copy of the A-Z page which contains all the blips. This version is reasonably simple: we create an array containing all the blips, scrape them into another array, and then that array into a CSV file. And if we run it:

$ time go run cmd/single/main.go 
 
real	3m10.354s
user	0m1.140s
sys	0m0.586s
 
$ head -n10 import/blipsSingle.csv 
technology,date,suggestion
.NET Core,Nov 2016,Assess
.NET Core,Nov 2015,Assess
.NET Core,May 2015,Assess
A single CI instance for all teams,Nov 2016,Hold
A single CI instance for all teams,Apr 2016,Hold
Acceptance test of journeys,Mar 2012,Trial
Acceptance test of journeys,Jul 2011,Trial
Acceptance test of journeys,Jan 2011,Trial
Accumulate-only data,Nov 2015,Assess

It takes a few minutes and most of the time will be taken in the blip.Download() function – work which is easily parallelisable. Let’s have a look at the parallel version where goroutines use channels to communicate with each other:

cmd/parallel/main.go

package main
 
import (
	"os"
	"encoding/csv"
	"github.com/mneedham/neo4j-thoughtworks-radar/scrape"
)
 
func main() {
	var filesToScrape chan scrape.File = make(chan scrape.File)
	var filesScraped chan scrape.ScrapedFile = make(chan scrape.ScrapedFile)
	defer close(filesToScrape)
	defer close(filesScraped)
 
	blips := scrape.FindBlips("rawData/twRadar.html")
 
	for _, blip := range blips {
		go func(blip scrape.Blip) { filesToScrape <- blip.Download() }(blip)
	}
 
	for i := 0; i < len(blips); i++ {
		select {
		case file := <-filesToScrape:
			go func(file scrape.File) { filesScraped <- file.Scrape() }(file)
		}
	}
 
	blipsCsvFile, _ := os.Create("import/blips.csv")
	writer := csv.NewWriter(blipsCsvFile)
	defer blipsCsvFile.Close()
 
	writer.Write([]string{"technology", "date", "suggestion" })
	for i := 0; i < len(blips); i++ {
		select {
		case scrapedFile := <-filesScraped:
			for _, blip := range scrapedFile.Entries {
				writer.Write([]string{scrapedFile.File.Title, blip["time"], blip["outcome"] })
			}
		}
	}
	writer.Flush()
}

Let’s remove the files we just downloaded and give this version a try.

$ rm rawData/items/*
 
$ time go run cmd/parallel/main.go 
 
real	0m6.689s
user	0m2.544s
sys	0m0.904s
 
$ head -n10 import/blips.csv 
technology,date,suggestion
Zucchini,Oct 2012,Assess
Reactive Extensions for .Net,May 2013,Assess
Manual infrastructure management,Mar 2012,Hold
Manual infrastructure management,Jul 2011,Hold
JavaScript micro frameworks,Oct 2012,Trial
JavaScript micro frameworks,Mar 2012,Trial
NPM for all the things,Apr 2016,Trial
NPM for all the things,Nov 2015,Trial
PowerShell,Mar 2012,Trial

So we’re down from 190 seconds to 7 seconds, pretty cool! One interesting thing is that the order of the values in the CSV file will be different since the goroutines won’t necessarily come back in the same order that they were launched. We do end up with the same number of values:

$ wc -l import/blips.csv 
    1361 import/blips.csv
 
$ wc -l import/blipsSingle.csv 
    1361 import/blipsSingle.csv

And we can check that the contents are identical:

$ cat import/blipsSingle.csv  | sort > /tmp/blipsSingle.csv
 
$ cat import/blips.csv  | sort > /tmp/blips.csv
 
$ diff /tmp/blips.csv /tmp/blipsSingle.csv


The code in this post is all on github. I’m sure I’ve made some mistakes/there are ways that this could be done better so do let me know in the comments or I’m @markhneedham on twitter.

Written by Mark Needham

December 24th, 2016 at 10:45 am

Posted in Go

Tagged with ,

Go: cannot execute binary file: Exec format error

without comments

In an earlier blog post I mentioned that I’d been building an internal application to learn a bit of Go and I wanted to deploy it to AWS.

Since the application was only going to live for a couple of days I didn’t want to spend a long time build up anything fancy so my plan was just to build the executable, SSH it to my AWS instance, and then run it.

My initial (somewhat naive) approach was to just build the project on my Mac and upload and run it:

$ go build
 
$ scp myapp ubuntu@aws...
 
$ ssh ubuntu@aws...
 
$ ./myapp
-bash: ./myapp: cannot execute binary file: Exec format error

That didn’t go so well! By reading Ask Ubuntu and Dave Cheney’s blog post on cross compilation I realised that I just needed to set the appropriate environment variables before running go build.

The following did the trick:

env GOOS=linux GOARCH=amd64 GOARM=7 go build

And that’s it! I’m sure there’s more sophisticated ways of doing this that I’ll come to learn about but for now this worked for me.

Written by Mark Needham

December 23rd, 2016 at 6:24 pm

Posted in Go

Tagged with ,

Go: Templating with the Gin Web Framework

without comments

I spent a bit of time over the last week building a little internal web application using Go and the Gin Web Framework and it took me a while to get the hang of the templating language so I thought I’d write up some examples.

Before we get started, I’ve got my GOPATH set to the following path:

$ echo $GOPATH
/Users/markneedham/projects/gocode

And the project containing the examples sits inside the src directory:

$ pwd
/Users/markneedham/projects/gocode/src/github.com/mneedham/golang-gin-templating-demo

Let’s first install Gin:

$ go get gopkg.in/gin-gonic/gin.v1

It gets installed here:

$ ls -lh $GOPATH/src/gopkg.in
total 0
drwxr-xr-x   3 markneedham  staff   102B 23 Dec 10:55 gin-gonic

Now let’s create a main function to launch our web application:

demo.go

package main
 
import (
	"github.com/gin-gonic/gin"
	"net/http"
)
 
func main() {
	router := gin.Default()
	router.LoadHTMLGlob("templates/*")
 
	// our handlers will go here
 
	router.Run("0.0.0.0:9090")
}

We’re launching our application on port 9090 and the templates live in the templates directory which is located relative to the file containing the main function:

$ ls -lh
total 8
-rw-r--r--  1 markneedham  staff   570B 23 Dec 13:34 demo.go
drwxr-xr-x  4 markneedham  staff   136B 23 Dec 13:34 templates

Arrays

Let’s create a route which will display the values of an array in an unordered list:

	router.GET("/array", func(c *gin.Context) {
		var values []int
		for i := 0; i < 5; i++ {
			values = append(values, i)
		}
 
		c.HTML(http.StatusOK, "array.tmpl", gin.H{"values": values})
	})
<ul>
  {{ range .values }}
  <li>{{ . }}</li>
  {{ end }}
</ul>

And now we’ll cURL our application to see what we get back:

$ curl http://localhost:9090/array
<ul>
  <li>0</li>
  <li>1</li>
  <li>2</li>
  <li>3</li>
  <li>4</li>
</ul>

What about if we have an array of structs instead of just strings?

import "strconv"
 
type Foo struct {
	value1 int
	value2 string
}
 
	router.GET("/arrayStruct", func(c *gin.Context) {
		var values []Foo
		for i := 0; i < 5; i++ {
			values = append(values, Foo{Value1: i, Value2: "value " + strconv.Itoa(i)})
		}
 
		c.HTML(http.StatusOK, "arrayStruct.tmpl", gin.H{"values": values})
	})
<ul>
  {{ range .values }}
  <li>{{ .Value1 }} -> {{ .Value2 }}</li>
  {{ end }}
</ul>

cURL time:

$ curl http://localhost:9090/arrayStruct
<ul>
  <li>0 -> value 0</li>
  <li>1 -> value 1</li>
  <li>2 -> value 2</li>
  <li>3 -> value 3</li>
  <li>4 -> value 4</li>  
</ul>

Maps

Now let’s do the same for maps.

	router.GET("/map", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"
 
		c.HTML(http.StatusOK, "map.tmpl", gin.H{"myMap": values})
	})
<ul>
  {{ range .myMap }}
  <li>{{ . }}</li>
  {{ end }}
</ul>

And cURL it:

$ curl http://localhost:9090/map
<ul>
  <li>Go</li>
  <li>1.7.4</li>
</ul>

What if we want to see the keys as well?

	router.GET("/mapKeys", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"
 
		c.HTML(http.StatusOK, "mapKeys.tmpl", gin.H{"myMap": values})
	})
<ul>
  {{ range $key, $value := .myMap }}
  <li>{{ $key }} -> {{ $value }}</li>
  {{ end }}
</ul>
$ curl http://localhost:9090/mapKeys
<ul>  
  <li>language -> Go</li>  
  <li>version -> 1.7.4</li>  
</ul>

And finally, what if we want to select specific values from the map?

	router.GET("/mapSelectKeys", func(c *gin.Context) {
		values := make(map[string]string)
		values["language"] = "Go"
		values["version"] = "1.7.4"
 
		c.HTML(http.StatusOK, "mapSelectKeys.tmpl", gin.H{"myMap": values})
	})
<ul>
  <li>Language: {{ .myMap.language }}</li>
  <li>Version: {{ .myMap.version }}</li>
</ul>
$ curl http://localhost:9090/mapSelectKeys
<ul>
  <li>Language: Go</li>
  <li>Version: 1.7.4</li>
</ul>

I’ve found the Hugo Go Template Primer helpful for figuring this out so that’s a good reference if you get stuck. You can find a go file containing all the examples on github if you want to use that as a starting point.

Written by Mark Needham

December 23rd, 2016 at 2:30 pm

Posted in Go

Tagged with ,