Mark Needham

Thoughts on Software Development

R: Error in approxfun(x.values.1, y.values.1, method = “constant”, f = 1, : zero non-NA points

without comments

I’ve been following Michy Alice’s logistic regression tutorial to create an attendance model for London dev meetups and ran into an interesting problem while doing so.

Our dataset has a class imbalance i.e. most people RSVP ‘no’ to events which can lead to misleading accuracy score where predicting ‘no’ every time would lead to supposed high accuracy.

Source: local data frame [2 x 2]
 
  attended     n
     (dbl) (int)
1        0  1541
2        1    53

I sampled the data using caret‘s upSample function to avoid this:

attended = as.factor((df %>% dplyr::select(attended))$attended)
upSampledDf = upSample(df %>% dplyr::select(-attended), attended)
upSampledDf$attended = as.numeric(as.character(upSampledDf$Class))

I then trained a logistic regression model but when I tried to plot the area under the curve I ran into trouble:

p <- predict(model, newdata=test, type="response")
pr <- prediction(p, test$attended)
prf <- performance(pr, measure = "tpr", x.measure = "fpr")
 
Error in approxfun(x.values.1, y.values.1, method = "constant", f = 1,  : 
  zero non-NA points

I don’t have any NA values in my data frame so this message was a bit confusing to start with. As usual Stack Overflow came to the rescue with the suggestion that I was probably missing positive/negative values for the independent variable i.e. ‘approved’.

A quick count on the test data frame using dplyr confirmed my mistake:

> test %>% count(attended)
Source: local data frame [1 x 2]
 
  attended     n
     (dbl) (int)
1        1   582

I’ll have to randomly sort the data frame and then reassign my training and test data frames to work around it.

Be Sociable, Share!

Written by Mark Needham

December 27th, 2015 at 12:24 pm

Posted in R

Tagged with