Mark Needham

Thoughts on Software Development

Java: Parsing CSV files

As I mentioned in a previous post, I recently moved a bunch of neo4j data loading code from Ruby to Java, and as part of that process I needed to parse some CSV files.

In Ruby I was using FasterCSV, which became the standard CSV library from Ruby 1.9, but it’s been a while since I had to parse CSV files in Java, so I wasn’t sure which library to use.

I needed a library which could parse a comma separated file where there might be commas in the values of one of the fields. I think that’s fairly standard behaviour in any CSV library but my googling led me to OpenCSV.

It can be downloaded from here and so far seems to do the job!

This is an example of how I’m using it:

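import java.io.FileReader;
import java.util.List;
import au.com.bytecode.opencsv.CSVReader; // OpenCSV 2.x package; newer releases use com.opencsv

String filePath = "/Users/mneedham/data/awesome-csv-file.csv";
// ',' is the field separator; the default quote character is '"'
CSVReader reader = new CSVReader(new FileReader(filePath), ',');

// readAll() loads the whole file into memory, one String[] per row
List<String[]> csvEntries = reader.readAll();
reader.close();

for (String[] row : csvEntries) {
    System.out.println("field 1: " + row[0]);
}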

There are more use cases described on the home page.
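
To make the "commas inside a field" case a bit more concrete, here is a minimal hand-rolled sketch of how a parser can tell a separating comma from one inside a quoted value. This is not OpenCSV's implementation; the class name QuotedCsvSketch, the method splitLine and the sample line are made up for illustration, and it assumes a single line with double quotes as the quote character and a doubled quote ("") as an escaped quote.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not part of OpenCSV.
class QuotedCsvSketch {

    // Splits one CSV line into fields. A comma only ends a field when we are
    // not inside double quotes; a doubled quote ("") inside a quoted field is
    // read as one literal quote character.
    static List<String> splitLine(String line) {
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inQuotes = false;

        for (int i = 0; i < line.length(); i++) {
            char c = line.charAt(i);
            if (inQuotes) {
                if (c == '"') {
                    if (i + 1 < line.length() && line.charAt(i + 1) == '"') {
                        current.append('"'); // escaped quote
                        i++;                 // skip the second quote
                    } else {
                        inQuotes = false;    // closing quote
                    }
                } else {
                    current.append(c);       // commas land here untouched
                }
            } else if (c == '"') {
                inQuotes = true;             // opening quote
            } else if (c == ',') {
                fields.add(current.toString()); // separator ends the field
                current.setLength(0);
            } else {
                current.append(c);
            }
        }
        fields.add(current.toString());      // last field has no trailing comma
        return fields;
    }

    public static void main(String[] args) {
        // Made-up sample row with a comma inside the second field
        String line = "1,\"Needham, Mark\",London";
        for (String field : splitLine(line)) {
            System.out.println(field);
        }
    }
}
```

The only state needed is whether we are currently inside quotes, which is why the quoted comma never gets treated as a separator. A real library also has to cope with quoted fields that span several lines, which this sketch ignores.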

Written by Mark Needham

September 23rd, 2012 at 10:46 pm

Posted in Java

  • Abraham Marín Pérez

    I needed a library which could parse a comma separated file where there might be commas in the values of one of the fields

    To be honest, I didn’t even know this was possible! If you encounter more commas than you should on a given line (because one of the fields includes commas itself), how do you know which ones are the ones that separate fields?

  • Mark Needham (http://www.markhneedham.com/blog)

    @Abraham Marín Pérez if the field containing the commas is enclosed in double quotes (“”) then it all works as you’d imagine it would. The files I’m working with either have the data like that or I manipulate them so they do!

  • Abraham Marín Pérez

    Right! Now that makes sense, so we’re then talking about two different characters delimiting the fields, the comma as “primary” and the quotation mark as “secondary”.

    Thanks.