Sed: Replacing characters with a new line
I’ve been playing around with writing some algorithms in both Ruby and Haskell and the latter wasn’t giving the correct result so I wanted to output an intermediate state of the two programs and compare them.
I didn’t do any fancy formatting of the output from either program so I had the raw data structures in text files which I needed to transform so that they were comparable.
The main thing I wanted to do was get each of the elements of the collection onto their own line. The output of one of the programs looked like this:
To get each of the elements onto a new line my first step was to replace every occurrence of ', (' with '\n('. I initially tried using sed to do that:
sed -E -e 's/, \(/\\n(/g' ruby_union.txt
All that did was insert the string value '\n' rather than the new line character.
I’ve come across similar problems before and I usually just use tr but in this case it doesn’t work very well because we’re replacing more than just a single character.
I came across this thread on Linux Questions which gives a couple of ways that we can get see to do what we want.
The first suggestion is that we should use a back slash followed by the enter key while writing our sed expression where we want the new line to be and then continue writing the rest of the expression.
We therefore end up with the following:
sed -E -e "s/,\(/\ /g" ruby_union.txt
This approach works but it’s a bit annoying as you need to delete the rest of the expression so that the enter works correctly.
An alternative is to make use of echo with the '-e' flag which allows us to output a new line. Usually backslashed characters aren’t interpreted and so you end up with a literal representation. e.g.
$ echo "mark\r\nneedham" mark\r\nneedham $ echo -e "mark\r\nneedham" mark needham
We therefore end up with this:~ ~text sed -E -e "s/, \(/\`echo -e '\n\r'`/g" ruby_union.txt ~
It was pointed out in the comments that this final version of the sed statement doesn’t actually lead to a very nice output which is because I left out the other commands I passed to it which get rid of extra brackets.
The following gives a cleaner output: ~text $ echo "[(1,2), (3,4), (5,6)]" | sed -E -e "s/, \(/\`echo -e '\n\r'`/g" -e 's/\[|]|\)|\(//g' 1,2 3,4 5,6 ~
About the author
I'm currently working on real-time user-facing analytics with Apache Pinot at StarTree. I publish short 5 minute videos showing how to solve data problems on YouTube @LearnDataWithMark. I previously worked on graph analytics at Neo4j, where I also co-authored the O'Reilly Graph Algorithms Book with Amy Hodler.