# ClickHouse: Float equality

I’ve been playing around with NumPy data in ClickHouse this week and wanted to share what I learnt when checking for equality of float values. Let’s get going!

## Creating arrays

We’re going to use Python’s NumPy library to create 5 arrays containing 10 values each:

``````import numpy as np
rng = np.random.default_rng(seed=42)
rng.random(size=(5, 5))``````
Output
``````array([[0.28138389, 0.29359376, 0.66191651, 0.55703215, 0.78389821],
[0.66431354, 0.40638686, 0.81402038, 0.16697292, 0.02271207],
[0.09004786, 0.72235935, 0.46187723, 0.16127178, 0.50104478],
[0.1523121 , 0.69632038, 0.44615628, 0.38102123, 0.30151209],
[0.63028259, 0.36181261, 0.08764992, 0.1180059 , 0.96189766]])``````

By default, the type of the values is `float64`:

``rng.random(size=(5, 10)).dtype``
Output
``dtype('float64')``

But we’re going to cast that to `float32` since that’s the format of the data in the file that I’ve been working with. We’ll then save the arrays to a file:

``````values = rng.random(size=(5, 10))
np.save('arrays.npy', values.astype(np.float32))``````

## Querying in ClickHouse

``````curl https://clickhouse.com/ | sh
./clickhouse local -m``````

We can then read the `arrays.npy` file:

``````FROM file('arrays.npy', Npy)
SELECT array, toTypeName(array);``````
Output
``````┌─array────────────────────────────────────────────────────┬─toTypeName(array)─┐
│ [0.2813839,0.29359376,0.6619165,0.55703217,0.78389823]   │ Array(Float32)    │
│ [0.66431355,0.40638685,0.8140204,0.16697292,0.022712072] │ Array(Float32)    │
│ [0.09004786,0.72235936,0.46187723,0.16127178,0.50104475] │ Array(Float32)    │
│ [0.1523121,0.69632035,0.44615626,0.38102123,0.3015121]   │ Array(Float32)    │
│ [0.6302826,0.36181262,0.08764992,0.1180059,0.9618977]    │ Array(Float32)    │
└──────────────────────────────────────────────────────────┴───────────────────┘``````

Next, let’s see if we can return only the rows where the first item of the array is `0.99237555`. My first attempt looked like this:

``````FROM file('arrays.npy', Npy)
SELECT array, toTypeName(array)
WHERE array[1] = 0.66431355
FORMAT Vertical;``````
Output
``0 rows in set. Elapsed: 0.002 sec.``

Computer says no! Let’s write a query that returns the type of our float literal:

``SELECT 0.66431355 AS literal, toTypeName(literal);``
Output
``````┌────literal─┬─toTypeName(0.66431355)─┐
│ 0.66431355 │ Float64                │
└────────────┴────────────────────────┘``````

Aha, it’s a `Float64`, which explains why it didn’t match our array of `Float32` values. There are at least two ways that I know of to rectify this.

We can use the `toFloat32` function to cast the float literal to the right type:

``````FROM file('arrays.npy', Npy)
SELECT array, toTypeName(array)
WHERE array[1] = toFloat32(0.66431355);``````

Or, we can cast the value to `Float32`

``````FROM file('arrays.npy', Npy)
SELECT array, toTypeName(array)
WHERE array[1] = CAST(0.66431355 AS Float32);``````

Either way, we get the following result.

Output
``````Row 1:
──────
┌─array────────────────────────────────────────────────────┬─toTypeName(array)─┐
│ [0.66431355,0.40638685,0.8140204,0.16697292,0.022712072] │ Array(Float32)    │
└──────────────────────────────────────────────────────────┴───────────────────┘``````

My takeaway when working with floats (or even decimals) is to make sure that you know the underlying data types that you’re working with.