# Mark Needham

Thoughts on Software Development

## Python: Combinations of values on and off

In my continued exploration of Kaggle’s Spooky Authors competition, I wanted to run a GridSearch turning on and off different classifiers to work out the best combination.

I therefore needed to generate combinations of 1s and 0s enabling different classifiers.

For example, if we had 3 classifiers we’d generate these combinations:

```
0 0 1
0 1 0
1 0 0
1 1 0
1 0 1
0 1 1
1 1 1
```

where…

• ‘0 0 1’ means: classifier1 is disabled, classifier2 is disabled, classifier3 is enabled
• ‘0 1 0’ means: classifier1 is disabled, classifier2 is enabled, classifier3 is disabled
• ‘1 1 0’ means: classifier1 is enabled, classifier2 is enabled, classifier3 is disabled
• ‘1 1 1’ means: classifier1 is enabled, classifier2 is enabled, classifier3 is enabled

…and so on. In other words, we need to generate the binary representation for all the values from 1 to 2^(number of classifiers) - 1.

We can write the following code fragments to calculate a 3 bit representation of different numbers:

```
>>> "{0:0b}".format(1).zfill(3)
'001'
>>> "{0:0b}".format(5).zfill(3)
'101'
>>> "{0:0b}".format(6).zfill(3)
'110'
```

We need an array of 0s and 1s rather than a string, so let’s use the list function to create our array and then cast each value to an integer:

```
>>> [int(x) for x in list("{0:0b}".format(1).zfill(3))]
[0, 0, 1]
```

Finally we can wrap that code inside a list comprehension:

```
def combinations_on_off(num_classifiers):
    return [[int(x) for x in list("{0:0b}".format(i).zfill(num_classifiers))]
            for i in range(1, 2 ** num_classifiers)]
```

And let’s check it works:

```
>>> for combination in combinations_on_off(3):
       print(combination)

[0, 0, 1]
[0, 1, 0]
[0, 1, 1]
[1, 0, 0]
[1, 0, 1]
[1, 1, 0]
[1, 1, 1]
```

What about if we have 4 classifiers?

```
>>> for combination in combinations_on_off(4):
       print(combination)

[0, 0, 0, 1]
[0, 0, 1, 0]
[0, 0, 1, 1]
[0, 1, 0, 0]
[0, 1, 0, 1]
[0, 1, 1, 0]
[0, 1, 1, 1]
[1, 0, 0, 0]
[1, 0, 0, 1]
[1, 0, 1, 0]
[1, 0, 1, 1]
[1, 1, 0, 0]
[1, 1, 0, 1]
[1, 1, 1, 0]
[1, 1, 1, 1]
```

Perfect! We can now use this function to help work out which combinations of classifiers are needed.
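As an aside, the same list can probably be built without any string formatting. The sketch below is an alternative, not the approach used above: it leans on `itertools.product` to enumerate the flags and `itertools.compress` to pick out the enabled classifiers. The names `combinations_on_off_product` and `classifier_names` are made up for illustration.

```python
from itertools import compress, product


def combinations_on_off_product(num_classifiers):
    # product([0, 1], repeat=n) yields every n-tuple of 0s and 1s in
    # counting order; any(flags) skips the all-zeros tuple, where no
    # classifier would be enabled.
    all_flags = product([0, 1], repeat=num_classifiers)
    return [list(flags) for flags in all_flags if any(flags)]


# Hypothetical classifier names, only here to show how the flags might be used
classifier_names = ["classifier1", "classifier2", "classifier3"]

# compress() keeps the names whose matching flag is 1
enabled = [list(compress(classifier_names, flags))
           for flags in combinations_on_off_product(3)]

for names in enabled:
    print(names)
```

Each printed line is the subset of classifiers that one GridSearch run would switch on.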

Written by Mark Needham

December 3rd, 2017 at 5:23 pm

Posted in Python


## Python 3: TypeError: unsupported format string passed to numpy.ndarray.__format__

This post explains how to work around a change in how Python string formatting works for numpy arrays between Python 2 and Python 3.

I’ve been going through Kevin Markham’s scikit-learn Jupyter notebooks and ran into a problem on the Cross Validation one, which was throwing this error when attempting to print the KFold example:

```
Iteration                   Training set observations                   Testing set observations
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-007cbab507e3> in <module>()
      6 print('{} {:^61} {}'.format('Iteration', 'Training set observations', 'Testing set observations'))
      7 for iteration, data in enumerate(kf, start=1):
----> 8     print('{0:^9} {1} {2:^25}'.format(iteration, data[0], data[1]))

TypeError: unsupported format string passed to numpy.ndarray.__format__
```

We can reproduce this easily:

```
>>> import numpy as np
>>> "{:9}".format(np.array([1,2,3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to numpy.ndarray.__format__
```

What about if we use Python 2?

```
>>> "{:9}".format(np.array([1,2,3]))
'[1 2 3]  '
```

Hmmm, must be a change between the Python versions.

We can work around it by coercing our numpy array to a string:

```
>>> "{:9}".format(str(np.array([1,2,3])))
'[1 2 3]  '
```
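Another option that should behave the same way is the `!s` conversion flag, which applies `str()` to the value before the format spec is used. The sketch below is dependency-free, so it uses a hypothetical stand-in class called `Grid` rather than a real numpy array; the assumption is that any object falling back to Python 3’s default `object.__format__` rejects a non-empty format spec in the same way.

```python
class Grid:
    """Hypothetical stand-in for an array-like object (not part of numpy)."""

    def __init__(self, values):
        self.values = values

    def __str__(self):
        return "[" + " ".join(str(v) for v in self.values) + "]"


grid = Grid([1, 2, 3])

# Python 3's default object.__format__ refuses a non-empty format spec
error = None
try:
    "{:9}".format(grid)
except TypeError as e:
    error = str(e)
print(error)

# !s converts to a string first, so width and alignment work again
padded = "{!s:9}".format(grid)
centred = "{!s:^25}".format(grid)
print(repr(padded))
print(repr(centred))
```

For the notebook line that failed, that would presumably mean writing `{2!s:^25}` in place of `{2:^25}`.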

Written by Mark Needham

November 19th, 2017 at 7:16 am

Posted in Python


## Python 3: Create sparklines using matplotlib

I recently wanted to create sparklines to show how some values were changing over time. In addition, I wanted to generate them as images on the server rather than introducing a JavaScript library.

Chris Seymour’s excellent gist which shows how to create sparklines inside a Pandas dataframe got me most of the way there, but I had to tweak his code a bit to get it to play nicely with Python 3.6.

This is what I ended up with:

```
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so this works on a server
import matplotlib.pyplot as plt
import base64

from io import BytesIO

def sparkline(data, figsize=(4, 0.25), **kwargs):
    """
    Returns a base64 encoded sparkline style plot, for use in a HTML image tag
    """
    data = list(data)

    fig, ax = plt.subplots(1, 1, figsize=figsize, **kwargs)
    ax.plot(data)
    for k,v in ax.spines.items():
        v.set_visible(False)
    ax.set_xticks([])
    ax.set_yticks([])

    # Mark the final value with a red dot
    plt.plot(len(data) - 1, data[len(data) - 1], 'r.')

    ax.fill_between(range(len(data)), data, len(data)*[min(data)], alpha=0.1)

    # Write the PNG into an in-memory buffer rather than a file
    img = BytesIO()
    plt.savefig(img, transparent=True, bbox_inches='tight')
    img.seek(0)
    plt.close()

    return base64.b64encode(img.read()).decode("UTF-8")
```

I had to change the class used to write the image from StringIO to BytesIO, and I found I needed to decode the bytes produced if I wanted the image to display in an HTML page.
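To see why the decode step matters, here is a small sketch that skips matplotlib entirely. It writes placeholder bytes (just the 8-byte PNG signature, not a real image) into a `BytesIO` and compares the `img` tag you get with and without `.decode("UTF-8")`. The variable names are made up for illustration.

```python
import base64
from io import BytesIO

img = BytesIO()
# Placeholder content: the PNG file signature only, not a full image
img.write(b"\x89PNG\r\n\x1a\n")
img.seek(0)

# In Python 3, b64encode returns bytes rather than str
encoded = base64.b64encode(img.read())

template = '<img src="data:image/png;base64,{}"/>'

# Formatting bytes directly embeds the b'...' repr in the tag
without_decode = template.format(encoded)

# Decoding first gives a clean data URI
with_decode = template.format(encoded.decode("UTF-8"))

print(without_decode)
print(with_decode)
```

The first tag contains a stray `b'…'` wrapper around the base64 text, which would explain why the image doesn’t render without the decode.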

This is how you would call the above function:

```
if __name__ == "__main__":
    values = [
        [1,2,3,4,5,6,7,8,9,10],
        [7,10,12,18,2,8,10,6,7,12],
        [10,9,8,7,6,5,4,3,2,1]
    ]

    with open("/tmp/foo.html", "w") as file:
        for value in values:
            file.write('<div><img src="data:image/png;base64,{}"/></div>'.format(sparkline(value)))
```

And the HTML page looks like this:

Written by Mark Needham

September 23rd, 2017 at 6:51 am