
Unix: Counting the number of commas on a line

A few weeks ago I was playing around with some data stored in a CSV file and wanted to do a simple check on the quality of the data by making sure that each line had the same number of fields.
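
To make the examples concrete, let’s assume file.csv contains something like this made up data, where the second line is deliberately missing a field:

mark,needham,london
david,smith
jane,jones,leeds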

One way this can be done is with awk:

awk -F "," ' { print NF-1 } ' file.csv

Here we’re specifying the field separator -F as ‘,’ and then printing NF-1: the NF variable holds the number of fields on the line, so one less than that gives us the number of commas.
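
Run against the made up file above that prints one count per line, which makes the short second line easy to spot:

2
1
2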

Another slightly more complicated way is to combine tr and awk like so:

tr -d -c ",\n" < file.csv | awk ' { print length } '

Here we’re telling tr to delete any characters except for a comma or new line.
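
On the made up file the tr part of that pipeline leaves nothing but the commas and new lines:

,,
,
,,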

If we pass just a comma to the ‘-d’ option like so…

tr -d "," < file.csv

…that would delete all the commas from the file, but we can use the ‘-c’ option to complement the comma i.e. delete everything except for the comma.

tr -d -c "," < file.csv

Unfortunately that puts all the commas onto the same line so we need to complement the new line character as well:

tr -d -c ",\n" < file.csv

We can then use awk’s length built-in to print out the number of characters, i.e. commas, left on each line.

We can achieve the same thing by making use of sed instead of tr like so:

sed 's/[^,]//g' file.csv | awk ' { print length } '

Since sed operates on a line by line basis we just need to tell it to substitute anything which isn’t a comma with nothing, and then pipe the output into awk and use length again.
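
Unlike tr, sed keeps the new lines for us, so running just the sed part against the made up file again gives one line of commas per input line:

,,
,
,,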

I thought it might be possible to solve this problem using cut as well but I can’t see any way to get it to output the total number of fields.

If anyone knows any other cool ways to do the same thing let me know in the comments – it’s always interesting to see how different people wield the unix tools!

Written by Mark Needham

November 10th, 2012 at 4:30 pm

Posted in Shell Scripting

  • pDaleC

    Using Perl (not necessarily in the most efficient way):
    perl -ne 'print( tr/,// . qq(\n) );' file.csv

    Also, to check that all the lines are the same, you might want to add the following to the end of all your commands, to get a list of unique counts (should give only 1 value):
    | sort -u
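
    For example, tacked onto the end of the first awk command from the post, that would look like:
    awk -F "," ' { print NF-1 } ' file.csv | sort -u
    If it prints more than one number then at least one line has a different number of fields.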

  • Mark Needham (http://www.markhneedham.com/blog)

    @pDaleC nice idea on the ‘sort -u’. I think I’d actually used ‘sort | uniq’; didn’t realise you could do the unique part with sort, so that’s one less pipe I need in the future!