Recently I was working with a CSV file which contained both Windows and Unix line endings, which made it difficult to work with.
The actual line endings were HEX ‘0A0D’, i.e. the two bytes of a Windows line break (although Windows itself normally writes them in the order ‘0D0A’), but there were also HEX ‘0A’, i.e. Unix line breaks, within one of the columns.
I wanted to get rid of the Unix line breaks and discovered that you can do HEX sequence replacement using the GNU version of sed – unfortunately the Mac ships with the BSD version, which doesn’t have this functionality.
The first step was therefore to install the GNU version of sed.
brew install coreutils
brew install gnu-sed --with-default-names
I wanted to replace my system sed, which is why I went with the ‘--with-default-names’ flag – without that flag the sed installation would be accessible as ‘gsed’.
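A quick way to see which sed ends up first on the PATH is to ask it for its version. This is only a sketch: it relies on the fact that GNU sed accepts `--version` while the BSD sed on a Mac rejects it, and the `sed_flavour` function name is made up for illustration.

```shell
# Hypothetical helper: reports whether the sed on the PATH looks like GNU sed.
# GNU sed accepts --version; BSD sed rejects it as an illegal option.
sed_flavour() {
  if sed --version >/dev/null 2>&1; then
    echo "GNU"
  else
    echo "not GNU"
  fi
}

sed_flavour
```

If this prints "not GNU" after the install, the BSD version is probably still ahead of the Homebrew one on the PATH.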
The following is an example of what the lines in the file look like:
$ echo -e "Hello\x0AMark\x0A\x0D"
Hello
Mark
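Since the two kinds of line break look identical in a terminal, it can help to dump the bytes in hex and check them directly. A small sketch, assuming `od` is available; `printf` is used here instead of `echo -e` only because its escape handling is more portable, and the `sample_bytes` variable name is just for illustration.

```shell
# Dump the sample line as hex, one pair of digits per byte, with spacing removed.
sample_bytes=$(printf 'Hello\nMark\n\r' | od -An -tx1 | tr -d ' \n')
echo "$sample_bytes"
# 48656c6c6f0a4d61726b0a0d  (0a = line feed, 0d = carriage return)
```

The first `0a` sits between ‘Hello’ and ‘Mark’ and is the one to remove; the trailing `0a0d` pair is the real line ending.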
We want to get rid of the new line in between ‘Hello’ and ‘Mark’ but leave the other one be. I adapted one of the commands from this tutorial to look for an ‘0A’ which isn’t followed by a ‘0D’:
$ echo -e "Hello\x0AMark\x0A\x0D" | \
sed 'N;/\x0A[^\x0D]/s/\n/ /'
Hello Mark
Let’s go through the parts of the sed command:
- N – this creates a multiline pattern space by reading a new line of input and appending it to the contents of the pattern space. The two lines are separated by a new line.
- /\x0A[^\x0D]/ – this matches a pattern space which contains ‘0A’ not followed by ‘0D’
- s/\n/ / – this substitutes the first new line character with a space, but only in the pattern spaces matched by the preceding address.
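The three parts above can also be written as a commented, multi-line sed script, which some people find easier to read. This is a sketch of the same command rather than a different technique; it assumes GNU sed (for the `\x0A` and `\x0D` escapes), and the `join_embedded_break` function name is hypothetical.

```shell
# Hypothetical helper name; requires GNU sed for the \x0A and \x0D escapes.
join_embedded_break() {
  sed '
    # Append the next input line to the pattern space, separated by a newline.
    N
    # Address: a newline (0A) followed by any character other than a carriage return (0D).
    # Command: on a match, replace the first newline in the pattern space with a space.
    /\x0A[^\x0D]/ s/\n/ /
  '
}

printf 'Hello\nMark\n\r\n' | join_embedded_break
```

The trailing `\n` in the `printf` format stands in for the new line that `echo` adds on its own.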
Now let’s check it works if we have multiple lines that we want to squash:
$ echo -e "Hello\x0AMark\x0A\x0DHello\x0AMichael\x0A\x0D"
Hello
Mark
Hello
Michael
$ echo -e "Hello\x0AMark\x0A\x0DHello\x0AMichael\x0A\x0D" | \
sed 'N;/\x0A[^\x0D]/s/\n/ /'
Hello Mark
Hello Michael
Looks good! The actual file is a bit more nuanced so I’ve still got a bit more work to do but this is a good start.
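One case the command above does not handle is a column with more than one embedded ‘0A’, because N only ever joins lines in pairs. A possible variant, offered as a sketch and not tested against the real file, is to read the whole input into the pattern space first and then replace every ‘0A’ which isn’t followed by ‘0D’. It assumes GNU sed and a file small enough to hold in memory; the `join_all_embedded_breaks` name is hypothetical.

```shell
# Hypothetical helper; assumes GNU sed and an input small enough to hold in memory.
join_all_embedded_breaks() {
  # :a;N;$!ba  loops until the whole input is in the pattern space,
  # then every 0A not followed by 0D is replaced with a space.
  sed ':a;N;$!ba;s/\x0A\([^\x0D]\)/ \1/g'
}

printf 'One\nTwo\nThree\n\r\n' | join_all_embedded_breaks
```

With the pairwise command the same input comes out as ‘One Two’ followed by ‘Three’ on its own line, whereas this version should keep all three words on one line.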