Mark Needham

Thoughts on Software Development

Unix: Find all text below string in a file

without comments

I recently wanted to parse some text out of a bunch of files so that I could do some sentiment analysis on it. Luckily the text I want is at the end of the file and doesn’t have anything after it but there is text before it that I want to get rid.

The files look like this:

# text I don't care about
 
= Heading of the bit I care about
 
# text I care about

In other words I want to find the line that contains the Heading and then get all the text after that point.

I figured sed was the tool for the job but my knowledge of the syntax was a bit rusty. Luckily this post served as a refresher.

Effectively what we want to do is delete from the beginning of the file up until the line after the heading. We can do this with the following command:

$ cat /tmp/foo.txt 
# text I don't care about
 
= Heading of the bit I care about
 
# text I care about
$ cat /tmp/foo.txt | sed '1,/Heading of the bit I care about/d'
 
# text I care about

That still leaves an extra empty line after the heading which is a bit annoying but easy enough to get rid of by passing another command to sed that strips empty lines:

$ cat /tmp/foo.txt | sed -e '1,/Heading of the bit I care about/d' -e '/^\s*$/d'
# text I care about

The only difference here is that we’re now passing the ‘-e’ flag to allow us to specify multiple commands. If we just pass them sequentially then the 2nd one will be interpreted as the name of a file.

Be Sociable, Share!

Written by Mark Needham

June 19th, 2016 at 8:36 am

Posted in Shell Scripting

Tagged with