Mark Needham

Thoughts on Software Development

R: substr – Getting a vector of positions

with 2 comments

I recently found myself writing an R script to extract parts of a string based on a beginning and end index, which is reasonably easy using the substr function:

> substr("mark loves graphs", 0, 4)
[1] "mark"

But what if we have a vector of start and end positions?

> substr("mark loves graphs", c(0, 6), c(4, 10))
[1] "mark"

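Hmmm, that didn’t work as I expected! substr returns one result per element of its first argument, so with a single string only the first start/end pair is used. It turns out we actually need to use the substring function instead, which recycles its arguments to the length of the longest one. That wasn’t initially obvious to me on reading the documentation: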

> substring("mark loves graphs", c(0, 6, 12), c(4, 10, 17))
[1] "mark"   "loves"  "graphs"
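To make the difference concrete, here is a minimal sketch that runs both functions on the same inputs. The variable names (text, starts, ends, words, same_words) are my own for illustration, and I've used 1 rather than 0 as the first start position since R strings are 1-indexed; the 0 used above gives the same result here. The last step, repeating the string so that substr can be used, is just one possible alternative rather than anything the documentation recommends.

```r
text   <- "mark loves graphs"
starts <- c(1, 6, 12)   # 1-based start positions of each word
ends   <- c(4, 10, 17)  # inclusive end positions of each word

# substr() returns one result per element of its first argument,
# so with a single string only the first start/end pair is used.
substr(text, starts, ends)
# [1] "mark"

# substring() recycles its arguments to the length of the longest one,
# so we get one piece per start/end pair.
words <- substring(text, starts, ends)
words
# [1] "mark"   "loves"  "graphs"

# Alternative sketch: repeat the string once per pair and stay with substr().
same_words <- substr(rep(text, length(starts)), starts, ends)
identical(words, same_words)
# [1] TRUE
```

Either way we end up with one substring per pair of positions; substring just saves us from repeating the input string ourselves.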

Easy when you know how!


Written by Mark Needham

April 18th, 2016 at 7:49 pm

Posted in R


  • Antonios K.

    You can get the same results using the “strsplit” command, like:
    unlist(strsplit("mark loves graphs", " "))

    This helps when it’s not easy to get all the beginning and end indices in a (very) big string, but you know that a space or another symbol separates the words.

  • Yeah, for this case that makes more sense. What I was actually doing was getting the indexes from the tm library and then selecting the parts of a string.

    Will have to write that up as well!