Pandas: Find rows where column/field is null
In my continued playing around with the Kaggle house prices dataset I wanted to find any columns/fields that have null values in.
If we want to get a count of the number of null fields by column we can use the following code, adapted from Poonam Ligade’s kernel:
import pandas as pd
Count the null columns
train = pd.read_csv("train.csv") null_columns=train.columns[train.isnull().any()] train[null_columns].isnull().sum()
LotFrontage 259 Alley 1369 MasVnrType 8 MasVnrArea 8 BsmtQual 37 BsmtCond 37 BsmtExposure 38 BsmtFinType1 37 BsmtFinType2 38 Electrical 1 FireplaceQu 690 GarageType 81 GarageYrBlt 81 GarageFinish 81 GarageQual 81 GarageCond 81 PoolQC 1453 Fence 1179 MiscFeature 1406 dtype: int64
So there are lots of different columns containing null values. What if we want to find the solitary row which has 'Electrical' as null?
Single column is null
LotFrontage Alley MasVnrType MasVnrArea BsmtQual BsmtCond BsmtExposure \ 1379 73.0 NaN None 0.0 Gd TA No BsmtFinType1 BsmtFinType2 Electrical FireplaceQu GarageType GarageYrBlt \ 1379 Unf Unf NaN NaN BuiltIn 2007.0 GarageFinish GarageQual GarageCond PoolQC Fence MiscFeature 1379 Fin TA TA NaN NaN NaN
And what if we want to return every row that contains at least one null value? That’s not too difficult - it’s just a combination of the code in the previous two sections:
All null columns
LotFrontage Alley MasVnrType MasVnrArea BsmtQual BsmtCond BsmtExposure \ 0 65.0 NaN BrkFace 196.0 Gd TA No 1 80.0 NaN None 0.0 Gd TA Gd 2 68.0 NaN BrkFace 162.0 Gd TA Mn 3 60.0 NaN None 0.0 TA Gd No 4 84.0 NaN BrkFace 350.0 Gd TA Av BsmtFinType1 BsmtFinType2 Electrical FireplaceQu GarageType GarageYrBlt \ 0 GLQ Unf SBrkr NaN Attchd 2003.0 1 ALQ Unf SBrkr TA Attchd 1976.0 2 GLQ Unf SBrkr TA Attchd 2001.0 3 ALQ Unf SBrkr Gd Detchd 1998.0 4 GLQ Unf SBrkr TA Attchd 2000.0 GarageFinish GarageQual GarageCond PoolQC Fence MiscFeature 0 RFn TA TA NaN NaN NaN 1 RFn TA TA NaN NaN NaN 2 RFn TA TA NaN NaN NaN 3 Unf TA TA NaN NaN NaN 4 RFn TA TA NaN NaN NaN
Hope that helps future Mark!
About the author
I'm currently working on real-time user-facing analytics with Apache Pinot at StarTree. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.