18 Apr 2016

R: substr - Getting a vector of positions

I recently found myself writing an R script to extract parts of a string based on a beginning and end index, which is reasonably easy using the substr function (https://stat.ethz.ch/R-manual/R-devel/library/base/html/substr.html):

R > substr("mark loves graphs", 0, 4)
[1] "mark"

But what if we have a vector of start and end positions?

R > substr("mark loves graphs", c(0, 6), c(4, 10))
[1] "mark"

Hmmm, that didn’t work as I expected! It turns out we actually need to use the substring function instead. It's described on the same help page (https://stat.ethz.ch/R-manual/R-devel/library/base/html/substr.html), but that wasn’t initially obvious to me on reading the documentation:

R > substring("mark loves graphs", c(0, 6, 12), c(4, 10, 17))
[1] "mark"   "loves"  "graphs"
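
As far as I can tell, the difference comes down to what decides the length of the result: substr returns one value per element of the string vector you pass in, while substring recycles the string to match the longest of its arguments. Here's a small sketch of that. The variable names (x, starts, ends, parts, same_parts) are just mine for illustration, and I've used 1 rather than 0 as the first start position since R counts characters from 1 (a start of 0 behaves the same way).

```r
x <- "mark loves graphs"
starts <- c(1, 6, 12)
ends <- c(4, 10, 17)

# substr() returns one result per element of x, so a single string
# gives a single result no matter how many positions are supplied
length(substr(x, starts, ends))

# substring() recycles x to match the longest argument
parts <- substring(x, starts, ends)

# Repeating x by hand gives substr() one string per start/end pair
same_parts <- substr(rep(x, length(starts)), starts, ends)

# Both approaches should produce the same character vector
identical(parts, same_parts)
```

So substr can be made to work by repeating the string yourself, but substring saves you the trouble.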

Easy when you know how!

About the author

I'm currently working on short form content at ClickHouse. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.