I spent a bit of Sunday playing around with R and one thing I wanted to do was map a function over a collection of values and transform each value slightly.
    > data <- read.csv("~/data.csv", header=T, encoding="ISO-8859")
    > data
      Column1 InterestingColumn
    1    Mark             12.50
    2    Dave            100.00
    3    John          1,231.00
The CSV file itself looks like this:

    Column1, InterestingColumn
    Mark, 12.50
    Dave, 100.00
    John, 1,231.00
data is a data frame, which RStudio describes as ‘3 obs. of 2 variables’.
I was only interested in the values in the 2nd column so I selected those like this:
    > data$InterestingColumn
    [1] 12.50    100.00   1,231.00
    Levels: 1,231.00 100.00 12.50
I wanted to apply a function over each of the numbers and return a new list containing those transformations.
I initially had a look at doing this with a for loop, but it didn’t turn out to be as easy as I’d expected. I eventually came across the lapply function, which applies a function to each element of a list or vector and returns the results as a list.
    > values <- data$InterestingColumn
    > lapply(values, function(x) 5000 - as.numeric(gsub("\\s|,","", x)))
    [[1]]
    [1] 4987.5

    [[2]]
    [1] 4900

    [[3]]
    [1] 3769
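Since lapply always returns a list, a plain numeric vector may be handier if the results are going to be used in further arithmetic. The following is a minimal sketch using vapply, assuming the values are available as strings; the `values` vector here is a stand-in for data$InterestingColumn and `to_original` is a name made up for illustration.

```r
# Stand-in for data$InterestingColumn (assumed to hold the same three values).
values <- c("12.50", "100.00", "1,231.00")

# Hypothetical helper name: the same transformation as in the lapply call.
to_original <- function(x) 5000 - as.numeric(gsub("\\s|,", "", x))

# lapply returns a list with one element per input value.
as_list <- lapply(values, to_original)

# vapply returns an atomic vector and checks that each result
# is a single number, as declared by numeric(1).
as_vector <- vapply(values, to_original, numeric(1), USE.NAMES = FALSE)

print(as_vector)
# [1] 4987.5 4900.0 3769.0
```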
We define a function which subtracts the value in the column from 5000 since the CSV file contained derived values and I was interested in the original value.
In order to do that subtraction I needed to cast the value from the CSV file to be numeric which first involved stripping out any spaces or commas using gsub and then casting the string using as.numeric.
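To make that step concrete, here is a small sketch that walks one value through the clean-up. The variable names are made up for illustration. It also shows why the values need to go through a string first when the column is a factor (as the ‘Levels’ line in the output above suggests): calling as.numeric directly on a factor returns the internal level codes rather than the numbers.

```r
raw <- " 1,231.00"

# Remove whitespace (\\s) and commas so the string parses as a number.
cleaned <- gsub("\\s|,", "", raw)   # "1231.00"
number <- as.numeric(cleaned)       # 1231
original <- 5000 - number           # 3769

# With a factor, convert to character before converting to numeric.
f <- factor(c("12.50", "100.00"))
codes <- as.numeric(f)                   # level codes, not the values
parsed <- as.numeric(as.character(f))    # 12.5 and 100
```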
If we want to have a table structure then we can use the ‘by’ function to do a similar thing:
    > as.table(by(data$InterestingColumn, data$Column1, function(x) 5000 - as.numeric(gsub("\\s|,","", x))))
    data$Column1
      Dave   John   Mark
    4900.0 3769.0 4987.5
I don’t know enough R to know how to keep the data in exactly the same structure as we got it so if anyone can point me in the right direction that’d be cool.
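One possible way to keep the original structure, offered as a sketch rather than the definitive answer: gsub and as.numeric both work on whole vectors, so the result can be assigned straight back onto the data frame as a new column. The data frame below is built by hand as a stand-in for the CSV file, and the column name `Original` is an assumption chosen for illustration.

```r
# Hand-built stand-in for the data frame read from the CSV file.
data <- data.frame(
  Column1 = c("Mark", "Dave", "John"),
  InterestingColumn = c("12.50", "100.00", "1,231.00"),
  stringsAsFactors = TRUE
)

# as.character() covers the case where the column is a factor.
# gsub and as.numeric are vectorised, so no explicit loop or lapply is needed.
cleaned <- gsub("\\s|,", "", as.character(data$InterestingColumn))

# 'Original' is a hypothetical column name; rows and row order are unchanged.
data$Original <- 5000 - as.numeric(cleaned)

print(data)
```

The existing columns are left as they were, so each original value still sits next to the name it belongs to.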