
Unix: Counting the number of commas on a line

A few weeks ago I was playing around with some data stored in a CSV file and wanted to do a simple check on the quality of the data by making sure that each line had the same number of fields.
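
To make the examples concrete, let’s assume file.csv contains something like this made up data, where the second line is deliberately missing a field:

mark,needham,london
david,smith
jane,jones,leeds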

One way this can be done is with awk:

awk -F "," ' { print NF-1 } ' file.csv

Here we’re specifying the field separator -F as ‘,’ and then printing NF-1: the NF variable holds the number of fields on the line, so one less than that gives us the number of commas.
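
Run against the made up file above that prints one count per line, which makes the short second line easy to spot:

2
1
2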

Another slightly more complicated way is to combine tr and awk like so:

tr -d -c ",\n" < file.csv | awk ' { print length } '

Here we’re telling tr to delete any characters except for a comma or new line.
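
On the made up file the tr part of that pipeline leaves nothing but the commas and new lines:

,,
,
,,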

If we pass just a comma to the ‘-d’ option like so…

tr -d "," < file.csv

…that would delete all the commas from the file, but we can use the ‘-c’ option to complement the comma i.e. delete everything except for the comma.

tr -d -c "," < file.csv

Unfortunately that puts all the commas onto the same line so we need to complement the new line character as well:

tr -d -c ",\n" < file.csv

We can then use awk’s length built-in to print out the number of characters, i.e. commas, left on each line.

We can achieve the same thing by making use of sed instead of tr like so:

sed 's/[^,]//g' file.csv | awk ' { print length } '

Since sed operates on a line by line basis we just need to tell it to substitute anything which isn’t a comma with nothing, and then pipe the output into awk and use length again.
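
Unlike tr, sed keeps the new lines for us, so running just the sed part against the made up file again gives one line of commas per input line:

,,
,
,,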

I thought it might be possible to solve this problem using cut as well but I can’t see any way to get it to output the total number of fields.

If anyone knows any other cool ways to do the same thing let me know in the comments – it’s always interesting to see how different people wield the unix tools!

Written by Mark Needham

November 10th, 2012 at 4:30 pm

Posted in Shell Scripting

  • pDaleC

    Using Perl (not necessarily in the most efficient way):
    perl -ne 'print( tr/,// . qq(\n) );' file.csv

    Also, to check that all the lines are the same, you might want to add the following to the end of all your commands, to get a list of unique counts (should give only 1 value):
    | sort -u
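
    For example, tacked onto the end of the first awk command from the post, that would look like:
    awk -F "," ' { print NF-1 } ' file.csv | sort -u
    If it prints more than one number then at least one line has a different number of fields.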

  • Mark Needham (http://www.markhneedham.com/blog)

    @pDaleC nice idea on the ‘sort -u’. I think I’d actually used ‘sort | uniq’; didn’t realise you could do the unique part with sort, so that’s one less pipe I need in the future!