Mark Needham

Thoughts on Software Development

Archive for the ‘python3’ tag

Python: Combinations of values on and off


In my continued exploration of Kaggle’s Spooky Authors competition, I wanted to run a GridSearch turning on and off different classifiers to work out the best combination.

I therefore needed to generate combinations of 1s and 0s enabling different classifiers.

For example, if we had 3 classifiers we’d generate these combinations:

0 0 1
0 1 0
1 0 0
1 1 0
1 0 1
0 1 1
1 1 1

where…

  • ‘0 0 1’ means: classifier1 is disabled, classifier2 is disabled, classifier3 is enabled
  • ‘0 1 0’ means: classifier1 is disabled, classifier2 is enabled, classifier3 is disabled
  • ‘1 1 0’ means: classifier1 is enabled, classifier2 is enabled, classifier3 is disabled
  • ‘1 1 1’ means: classifier1 is enabled, classifier2 is enabled, classifier3 is enabled

…and so on. In other words, we need to generate the binary representation of every value from 1 to 2^(number of classifiers) - 1.

We can write the following code fragments to calculate a 3 bit representation of different numbers:

>>> "{0:0b}".format(1).zfill(3)
'001'
>>> "{0:0b}".format(5).zfill(3)
'101'
>>> "{0:0b}".format(6).zfill(3)
'110'
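As a side note, the zero padding can also be folded into the format spec itself, which avoids the separate zfill call. This is only a sketch of that alternative; the helper name to_bits is mine and isn't used anywhere else in this post:

```python
def to_bits(value, width):
    """Return `value` as a binary string, zero-padded to `width` digits."""
    # The nested {width} field fills in the padding width,
    # so width=3 produces the format spec "03b".
    return "{:0{width}b}".format(value, width=width)


print(to_bits(1, 3))  # 001
print(to_bits(5, 3))  # 101
print(to_bits(6, 3))  # 110
```

Both approaches produce the same strings, so which one to use is mostly a matter of taste.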

We need an array of 0s and 1s rather than a string, so let’s use the list function to create our array and then cast each value to an integer:

>>> [int(x) for x in list("{0:0b}".format(1).zfill(3))]
[0, 0, 1]

Finally we can wrap that code inside a list comprehension:

def combinations_on_off(num_classifiers):
    return [[int(x) for x in list("{0:0b}".format(i).zfill(num_classifiers))]
            for i in range(1, 2 ** num_classifiers)]

And let’s check it works:

>>> for combination in combinations_on_off(3):
...     print(combination)
...
[0, 0, 1]
[0, 1, 0]
[0, 1, 1]
[1, 0, 0]
[1, 0, 1]
[1, 1, 0]
[1, 1, 1]
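The same rows can also be produced without any string formatting by using itertools.product from the standard library. The sketch below is an alternative to the function above rather than a replacement for it, and the name combinations_on_off_product is made up for this example:

```python
from itertools import product


def combinations_on_off_product(num_classifiers):
    """All on/off rows except the all-zeros row, in ascending binary order."""
    return [list(bits)
            # product yields tuples such as (0, 0, 1) in lexicographic order
            for bits in product([0, 1], repeat=num_classifiers)
            # skip (0, 0, ..., 0), where every classifier is disabled
            if any(bits)]


for combination in combinations_on_off_product(3):
    print(combination)
```

Because product walks through the tuples in lexicographic order, the rows come out in the same order as the binary counting approach.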

What about if we have 4 classifiers?

>>> for combination in combinations_on_off(4):
...     print(combination)
...
[0, 0, 0, 1]
[0, 0, 1, 0]
[0, 0, 1, 1]
[0, 1, 0, 0]
[0, 1, 0, 1]
[0, 1, 1, 0]
[0, 1, 1, 1]
[1, 0, 0, 0]
[1, 0, 0, 1]
[1, 0, 1, 0]
[1, 0, 1, 1]
[1, 1, 0, 0]
[1, 1, 0, 1]
[1, 1, 1, 0]
[1, 1, 1, 1]

Perfect! We can now use this function to help work out which combinations of classifiers are needed.
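To give an idea of how a row of 1s and 0s might be turned back into a set of classifiers, here is a minimal sketch using itertools.compress. The classifier names and the enabled_classifiers helper are hypothetical and only there for illustration; in a real grid search the list would hold the actual estimators:

```python
from itertools import compress


def combinations_on_off(num_classifiers):
    return [[int(x) for x in "{0:0b}".format(i).zfill(num_classifiers)]
            for i in range(1, 2 ** num_classifiers)]


# Hypothetical classifier names, used only for illustration.
classifier_names = ["naive_bayes", "logistic_regression", "random_forest"]


def enabled_classifiers(names, combination):
    """Keep the names whose matching flag in `combination` is 1."""
    return list(compress(names, combination))


for combination in combinations_on_off(len(classifier_names)):
    print(combination, enabled_classifiers(classifier_names, combination))
```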

Written by Mark Needham

December 3rd, 2017 at 5:23 pm

Posted in Python


Python 3: TypeError: unsupported format string passed to numpy.ndarray.__format__


This post explains how to work around a change in how Python string formatting works for numpy arrays between Python 2 and Python 3.

I’ve been going through Kevin Markham’s scikit-learn Jupyter notebooks and ran into a problem on the Cross Validation one, which was throwing this error when attempting to print the KFold example:

Iteration                   Training set observations                   Testing set observations
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-28-007cbab507e3> in <module>()
      6 print('{} {:^61} {}'.format('Iteration', 'Training set observations', 'Testing set observations'))
      7 for iteration, data in enumerate(kf, start=1):
----> 8     print('{0:^9} {1} {2:^25}'.format(iteration, data[0], data[1]))
 
TypeError: unsupported format string passed to numpy.ndarray.__format__

We can reproduce this easily:

>>> import numpy as np
>>> "{:9}".format(np.array([1,2,3]))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unsupported format string passed to numpy.ndarray.__format__

What about if we use Python 2?

>>> "{:9}".format(np.array([1,2,3]))
'[1 2 3]  '

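Hmmm, must be a change between the Python versions. As far as I can tell, the formatting call falls through to object.__format__, which from Python 3.4 onwards raises a TypeError when it is given a non-empty format string such as '9', whereas Python 2 quietly formatted str(obj) instead.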

We can work around it by coercing our numpy array to a string:

>>> "{:9}".format(str(np.array([1,2,3])))
'[1 2 3]  '
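The behaviour can be reproduced without numpy at all, which suggests it comes from Python’s default formatting rather than from anything array specific. The sketch below uses a made-up stand-in class called Plain (it is not part of numpy) and also shows the !s conversion, which applies str() inside the format string instead of around the argument:

```python
class Plain:
    """Stand-in for an object that defines __str__ but no __format__."""

    def __str__(self):
        return "[1 2 3]"


value = Plain()

# A non-empty format spec ends up in object.__format__, which rejects it.
try:
    "{:9}".format(value)
except TypeError as error:
    message = str(error)
    print(message)

# !s converts the value with str() first, so the width applies to a string.
padded = "{!s:9}".format(value)
print(repr(padded))
```

An empty format spec, as in "{}".format(value), is still accepted, so only fields with a width or alignment need the conversion.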

Written by Mark Needham

November 19th, 2017 at 7:16 am

Posted in Python


Python 3: Create sparklines using matplotlib


I recently wanted to create sparklines to show how some values were changing over time. In addition, I wanted to generate them as images on the server rather than introducing a JavaScript library.

Chris Seymour’s excellent gist which shows how to create sparklines inside a Pandas dataframe got me most of the way there, but I had to tweak his code a bit to get it to play nicely with Python 3.6.

This is what I ended up with:

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt
import base64
 
from io import BytesIO
 
def sparkline(data, figsize=(4, 0.25), **kwargs):
    """
    Returns a base64 encoded PNG of a sparkline style plot,
    ready to embed in the src attribute of an HTML image tag
    """
    data = list(data)

    fig, ax = plt.subplots(1, 1, figsize=figsize, **kwargs)
    ax.plot(data)
    # Hide the box around the plot and the tick marks
    for spine in ax.spines.values():
        spine.set_visible(False)
    ax.set_xticks([])
    ax.set_yticks([])

    # Mark the final value with a red dot
    ax.plot(len(data) - 1, data[-1], 'r.')

    ax.fill_between(range(len(data)), data, len(data)*[min(data)], alpha=0.1)

    # Write the image into an in-memory buffer rather than a file
    img = BytesIO()
    fig.savefig(img, format="png", transparent=True, bbox_inches='tight')
    img.seek(0)
    plt.close(fig)

    return base64.b64encode(img.read()).decode("UTF-8")

I had to change the class used to write the image from StringIO to BytesIO, and I found I needed to decode the bytes returned by b64encode if I wanted the image to display in an HTML page.
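The decode step matters because b64encode returns bytes in Python 3, and formatting bytes straight into a string leaves a b'...' wrapper in the output. Here is a small standard library only sketch of the difference. The img_tag helper is hypothetical, and the placeholder bytes are just the PNG file signature rather than a real image:

```python
import base64


def img_tag(png_bytes):
    """Build an HTML image tag with the PNG embedded as a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("UTF-8")
    return '<img src="data:image/png;base64,{}"/>'.format(encoded)


# Placeholder payload: the 8 byte PNG signature only, not a viewable image.
placeholder = b"\x89PNG\r\n\x1a\n"

raw = base64.b64encode(placeholder)
print(type(raw).__name__)           # bytes
print("{}".format(raw)[:2])         # the unwanted b' prefix
print(img_tag(placeholder))
```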

This is how you would call the sparkline function:

if __name__ == "__main__":
    values = [
        [1,2,3,4,5,6,7,8,9,10],
        [7,10,12,18,2,8,10,6,7,12],
        [10,9,8,7,6,5,4,3,2,1]
    ]
 
    with open("/tmp/foo.html", "w") as file:
        for value in values:
            file.write('<div><img src="data:image/png;base64,{}"/></div>'.format(sparkline(value)))

And the HTML page looks like this:

[Screenshot of the generated HTML page]

Written by Mark Needham

September 23rd, 2017 at 6:51 am