Ruby: Ignore header line when parsing CSV file
As my Ruby journey continues, one of the things I wanted to do today was parse a CSV file.
This article proved to be very useful for teaching the basics but it didn't say how to ignore the header line that the CSV file contained.
The CSV file I was parsing was similar to this:
name, surname, location
Mark, Needham, Sydney
David, Smith, London
I originally just wanted to get the names of the people so I could use them in my code. This was the first attempt:
require 'csv'

def parse_csv_file_for_names(path_to_csv)
  names = []
  csv_contents = CSV.read(path_to_csv)
  csv_contents.each do |row|
    names << row[0]
  end
  return names
end
I then printed out the names to see what was going on:
names = parse_csv_file_for_names("csv_file.csv")
names.each do |name|
  puts name
end
This is what was printed:
name
Mark
David
It turns out that the 'shift' method is what I was looking for to help me ignore the first line of the file. The new and improved method now looks like this:
require 'csv'

def parse_csv_file_for_names(path_to_csv)
  names = []
  csv_contents = CSV.read(path_to_csv)
  # shift removes the first element of the array, i.e. the header row
  csv_contents.shift
  csv_contents.each do |row|
    names << row[0]
  end
  return names
end
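The same idea can be written more compactly. The following is only a sketch: the method name names_without_header and the temporary sample file are made up for illustration. It uses drop(1), which skips the header row without modifying the array, and map to collect the first column.

```ruby
require 'csv'
require 'tempfile'

# Hypothetical variant of the method above: drop(1) returns the rows
# after the header, and map collects the first field of each row.
def names_without_header(path_to_csv)
  CSV.read(path_to_csv).drop(1).map { |row| row[0] }
end

# Write the sample data to a temporary file so the sketch runs on its own.
sample = Tempfile.new(['people', '.csv'])
sample.write("name, surname, location\nMark, Needham, Sydney\nDavid, Smith, London\n")
sample.close

names = names_without_header(sample.path)
puts names

sample.unlink
```

Unlike shift, drop leaves the array returned by CSV.read untouched, which can matter if the header row is needed later.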
Not a particularly complicated thing to do in the end although I had been expecting to find a method on CSV that would allow me to ignore the header line automatically. As far as I could tell there isn't one!
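For anyone on Ruby 1.9 or later, where FasterCSV became the standard CSV library, a headers option is available on CSV itself. Here is a minimal sketch, assuming a file without spaces after the commas (the inline data string is made up so the example runs on its own):

```ruby
require 'csv'

# Sample data inlined for illustration; no spaces after the commas,
# so the header names are exactly "name", "surname" and "location".
data = "name,surname,location\nMark,Needham,Sydney\nDavid,Smith,London\n"

names = []
# With headers: true the first line is treated as the header row and is
# not yielded; each row is a CSV::Row that can be indexed by column name.
CSV.parse(data, headers: true) do |row|
  names << row['name']
end

puts names
```

With the spaces that my original file had after each comma, the later header names would start with a space (for example " surname"), so they would need to be looked up that way or cleaned up first.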
I wholeheartedly recommend FasterCSV:
http://fastercsv.rubyforge.org/
Comes with a headers option:
FasterCSV.foreach("/path/to/file.csv", :headers => true) do |row|
So the headers row will be skipped in the iteration, but you also get some nice convenience methods to work with it if you so wish.
Luigi Montanez
4 Oct 08 at 2:29 pm
First off, let me thank you for this blog. It has been very beneficial to me. I'm attempting to build a keyword-driven automation framework with Ruby/Watir and I needed to learn how to parse a CSV file. I'm still a n00b with Ruby. I understand this blog post; however, the question I have is about "row[0]". Can you explain what this is for? It looks like you are pushing the contents of row[0] into the names variable. However, I don't see anything being read into row[].
Thanks,
Bobby
Bobby Washington
9 May 09 at 4:01 am
@Bobby – I'm not sure exactly what the internals of the 'each' method are, but basically 'row' is a block variable that refers to each line of the file in turn.
CSV.read returns the lines of the file as an array, and the 'each' method calls the block once per line, passing that line in as 'row'. So we never put anything into 'row' ourselves; the 'each' method does.
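A small sketch might make this easier to see. It uses CSV.parse on an inline string (made up for illustration) rather than CSV.read on a file, but the shape of the result is the same: an array of rows, where each row is an array of fields.

```ruby
require 'csv'

# Inline sample data so the sketch does not depend on a file.
data = "name,surname,location\nMark,Needham,Sydney\n"

# rows is an array of arrays: one inner array per line of CSV.
rows = CSV.parse(data)

first_fields = []
rows.each do |row|
  # On every pass, 'each' assigns the current inner array to 'row',
  # so row[0] is the first field of that line.
  first_fields << row[0]
end

p rows
p first_fields
```

So row[0] is just ordinary array indexing on whichever line 'each' has handed to the block.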
Mark Needham
9 May 09 at 12:34 pm
Thank you for the explanation Mark.
Thanks,
Bobby
Bobby Washington
12 May 09 at 2:16 am