Mark Needham

Thoughts on Software Development

R: Mapping a function over a collection of values

with 2 comments

I spent a bit of Sunday playing around with R and one thing I wanted to do was map a function over a collection of values and transform each value slightly.

I loaded my data set using the ‘Import Dataset’ option in R Studio (suggested to me by Rob), which gets converted to the following function call:

> data <-  read.csv("~/data.csv", header=T, encoding="ISO-8859")
> data
  Column1 InterestingColumn
1    Mark             12.50
2    Dave            100.00
3    John          1,231.00

data.csv

Column1, InterestingColumn
Mark, 12.50
Dave, 100.00 
John, 1,231.00

data is a data frame, which R Studio describes as ‘3 obs. of 2 variables’.

I was only interested in the values in the 2nd column so I selected those like this:

> data$InterestingColumn
[1]  12.50     100.00    1,231.00
Levels:  1,231.00  100.00  12.50

I wanted to apply a function to each of the numbers and return a new list containing the transformed values.

I initially tried doing this with a for loop, but it didn’t turn out to be as easy as I’d expected. I eventually came across the lapply function, which applies a function to each element of a list or vector and returns the results as a list:

> values <- data$InterestingColumn
> lapply(values, function(x) 5000 - as.numeric(gsub("\\s|,","", x)))
[[1]]
[1] 4987.5
 
[[2]]
[1] 4900
 
[[3]]
[1] 3769

The function we define subtracts each value in the column from 5000: the CSV file contained derived values and I was interested in the original ones.

To do that subtraction I needed to convert each value from the CSV file to a number, which meant first stripping out any spaces or commas with gsub and then converting the resulting string with as.numeric.
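To make the two steps concrete, here is a minimal sketch on a stand-alone character vector. The name `raw` and the intermediate variables are hypothetical; they simply mirror the three values from the CSV file and aren't part of the original session.

```r
# Hypothetical stand-in for the values in InterestingColumn, kept as strings.
raw <- c(" 12.50", " 100.00 ", " 1,231.00")

# Step 1: strip whitespace and thousands separators so the strings parse as numbers.
cleaned <- gsub("\\s|,", "", raw)

# Step 2: convert the cleaned strings to numeric values.
numbers <- as.numeric(cleaned)

# Step 3: recover the original values by subtracting from 5000.
original <- 5000 - numbers
print(original)
```

Skipping the first step matters for the last value: as.numeric("1,231.00") returns NA with a coercion warning rather than 1231.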

If we want to have a table structure then we can use the ‘by’ function to do a similar thing:

> as.table(by(data$InterestingColumn, data$Column1, function(x) 5000 - as.numeric(gsub("\\s|,","", x))))
data$Column1
  Dave   John   Mark 
4900.0 3769.0 4987.5

I don’t know enough R to know how to keep the data in exactly the same structure as we got it, so if anyone can point me in the right direction that’d be cool.
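One possible direction, offered as a sketch rather than a definitive answer: gsub and as.numeric both work on whole vectors, so the converted values can be stored as an extra column on the data frame itself. The column name `OriginalValue` is made up for illustration, and the data is rebuilt inline with the last value quoted, which is an assumption about how the real file kept the comma in 1,231.00 from being read as a separator.

```r
# Rebuild the sample data inline; the quotes around "1,231.00" are an assumption
# so that the embedded comma is not treated as a field separator.
csv_text <- 'Column1,InterestingColumn
Mark,12.50
Dave,100.00
John,"1,231.00"'

# stringsAsFactors = FALSE keeps InterestingColumn as plain strings.
data <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

# gsub and as.numeric are vectorised, so the whole column is converted in one go.
# OriginalValue is a hypothetical column name.
data$OriginalValue <- 5000 - as.numeric(gsub("\\s|,", "", data$InterestingColumn))
print(data)
```

The result is still a data frame with one row per person, with the original columns left untouched alongside the new one.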


Written by Mark Needham

July 23rd, 2012 at 11:25 pm

Posted in R


  • http://twitter.com/kinow kinow

     Hmmm, the original structure that you have after you read the CSV would be a data.frame. It’s quite hard to work with data frames; I think it may be easier to stick with a list or vector, but I’m no R specialist :-) Will keep an eye here to see the other comments.

    A few days ago Jenkins got an R Plug-in too, so now you can run your R script there – https://wiki.jenkins-ci.org/display/JENKINS/R+Plugin

  • Sergei Petrov

    5000 - as.numeric(gsub("\\s|,", "", data$InterestingColumn))