Mark Needham

Thoughts on Software Development

Archive for the ‘Shell Scripting’ Category

Neo4j: Using LOAD CSV to help explore CSV files

without comments

During the Neo4j How I met your mother hackathon that we ran last week one of the attendees noticed that one of the CSV files we were importing wasn’t creating as many records as they expected it to.

This is typically the case when there’s some odd quoting in the CSV file but we decided to look into it.

The file in question was one containing references made in HIMYM. The first 5 lines look like this:

$ head -n 5 data/import/references.csv
ReferencedEpisodeId,ReferencingEpisodeId,ReferenceText
168,184,"Marshall will eventually hear back from the New York State Judicatory Committee in Something New, which will become a main plot point of Season 9."
168,169,Barney proclaiming to be done with Robin will be the focal point of Lobster Crawl.
58,57,"Barney finally confronts his saboteur (Abby, whom he slept with in Ten Sessions) in Everything Must Go."
58,63,"Barney finally confronts his saboteur (Abby, whom he slept with in Ten Sessions) in Everything Must Go."

And this is how many lines the Unix ‘wc’ command sees:

$ wc -l data/import/references.csv
     782 data/import/references.csv

So we might expect that there are going to be 782 records created if we import that file into Neo4j. Let’s run a quick query in Neo4j to see what it thinks:

LOAD CSV WITH HEADERS 
FROM "file:///Users/markneedham/projects/neo4j-himym/data/import/references.csv" 
AS row
return count(*)
 
==> +----------+
==> | count(*) |
==> +----------+
==> | 636      |
==> +----------+
==> 1 row

So we have 146 less records than we expected which means Neo4j is treating multiple lines as one CSV line in some cases.

Let’s go back to the Unix command line to try and work out which lines those are. There must be some lines which start with part of the ‘ReferenceText’ rather than a ‘ReferenceEpisodeId’ so let’s extract the first column and see what’s going on there:

$ cat data/import/references.csv | cut -d"," -f1 | grep -v  '[0-9]\+$'| head -n 10
ReferencedEpisodeId
This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny
This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny
This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny
This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny
This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny
This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny
This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny
This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny
Also

We’ve extracted the first column and then filter the output to only keep rows which don’t contain all numbers which will be our rogue rows.

Let’s switch back to Neo4j land to see which rows it thinks contains these fragments of text:

LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-himym/data/import/references.csv" AS row
WITH row WHERE row.ReferenceText =~ ".*This is the Mother's first.*"
RETURN row.ReferencedEpisodeId, row.ReferencingEpisodeId, row.ReferenceText
 
==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | row.ReferencedEpisodeId | row.ReferencingEpisodeId | row.ReferenceText                                                                                                                                                                                                                                                                                                                                                                     |
==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | "45"                    | "37"                     | "This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "184"                    | "This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> +--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Interestingly it only returns two rows containing that phrase whereas we see it at least 8 times. Initially I thought this was an issue with the LOAD CSV command but if we filter the rows to only return ones that have a ‘ReferencedEpisodeId’ of ’45’ then we do see them returned:

==> +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | row                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
==> +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "53", ReferenceText -> "The website counting down to the next slap (slapcountdown.com) that Marshall sends Barney reaches zero in Slapsgiving, when the third slap is delivered."}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "55", ReferenceText -> "Ted gets rid of his butterfly tramp stamp through ten weekly sessions of laser tattoo removal between The Platinum Rule and Ten Sessions, over the course of which he meets, asks out, and eventually starts dating his dermatologist, Stella Zinman."}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "57", ReferenceText -> "Ted gets rid of his butterfly tramp stamp through ten weekly sessions of laser tattoo removal between The Platinum Rule and Ten Sessions, over the course of which he meets, asks out, and eventually starts dating his dermatologist, Stella Zinman."}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "56", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}  |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "200", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "100", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "86", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}  |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "113", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "161", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "37", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}  |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "184", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "37", ReferenceText -> "This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}                                                                                                                                                                                                                                                                                                                                                                         |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "184", ReferenceText -> "This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}                                                                                                                                                                                                                                                                                                                                                                        |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "63", ReferenceText -> "Marshall makes other home-made websites in Everything Must Go (lilyandmarshallselltheirstuff.com) and The Sexless Innkeeper (itwasthebestnightever.com), where Lily and Future Ted mention it being a problem."}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "92", ReferenceText -> "Marshall makes other home-made websites in Everything Must Go (lilyandmarshallselltheirstuff.com) and The Sexless Innkeeper (itwasthebestnightever.com), where Lily and Future Ted mention it being a problem."}                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              |
==> +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

So the actual problem is that the regex matcher doesn’t deal with the new line in the string.

Our next step is therefore to get rid of new lines within strings. I spent ages trying to find the appropriate command before coming across the following use of awk which does the job:

$ cat data/import/references.csv | awk '(NR-1)%2{$1=$1} {print $0}' RS=\" ORS=\" | wc -l
637
 
$ cat data/import/references.csv | awk '(NR-1)%2{$1=$1} {print $0}' RS=\" ORS=\" > data/import/refs.csv

Let’s try the LOAD CSV command again:

LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-himym/data/import/refs.csv" AS row
WITH row WHERE row.ReferenceText =~ ".*This is the Mother's first.*"
RETURN row.ReferencedEpisodeId, row.ReferencingEpisodeId, row.ReferenceText
 
==> +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | row.ReferencedEpisodeId | row.ReferencingEpisodeId | row.ReferenceText                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            |
==> +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | "45"                    | "56"                     | "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton. This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "200"                    | "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton. This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "100"                    | "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton. This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "86"                     | "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton. This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "113"                    | "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton. This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "161"                    | "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton. This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "37"                     | "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton. This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "184"                    | "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton. This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9." |
==> | "45"                    | "37"                     | "This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."                                                                                                                                                                                                                                                                                                                                                                        |
==> | "45"                    | "184"                    | "This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."                                                                                                                                                                                                                                                                                                                                                                        |
==> +-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

And there we go!

Update

Michael pointed out that I could have used the dotall regex flag at the beginning of the regular expression in order to search across new lines without having to remove them! In that case the query would read like this:

LOAD CSV WITH HEADERS FROM "file:///Users/markneedham/projects/neo4j-himym/data/import/references.csv" AS row
WITH row WHERE row.ReferenceText =~ "(?s).*This is the Mother.*"
RETURN row
 
==> +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | row                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
==> +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "56", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}  |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "200", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "100", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "86", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}  |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "113", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "161", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "37", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}  |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "184", ReferenceText -> "Ted eventually acquires the yellow umbrella in No Tomorrow (after the Mother leaves it behind at the St. Patrick's Day party, as seen in How Your Mother Met Me), and leaves it in Cindy's and the Mother's apartment in Girls Versus Suits. The umbrella is also seen/referenced in many other episodes, including Right Place, Right Time, Big Days, and Farhampton.
==> This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."} |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "37", ReferenceText -> "This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}                                                                                                                                                                                                                                                                                                                                                                         |
==> | {ReferencedEpisodeId -> "45", ReferencingEpisodeId -> "184", ReferenceText -> "This is the Mother's first on-screen appearance with the yellow umbrella. Previously she appeared in Lucky Penny, with her head obscured by a bridal veil. She is seen again in No Tomorrow, again hidden by the umbrella, her ankle is seen briefly in Girls Versus Suits, and she gets her first proper appearance in Something New, after which she appears throughout Season 9."}                                                                                                                                                                                                                                                                                                                                                                        |
==> +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+

Written by Mark Needham

June 11th, 2015 at 11:15 pm

Posted in neo4j,Shell Scripting

Tagged with

Mac OS X: GNU sed – Hex string replacement / replacing new line characters

without comments

Recently I was working with a CSV file which contained both Windows and Unix line endings which was making it difficult to work with.

The actual line endings were HEX ‘0A0D’ i.e. Windows line breaks but there were also HEX ‘OA’ i.e. Unix line breaks within one of the columns.

I wanted to get rid of the Unix line breaks and discovered that you can do HEX sequence replacement using the GNU version of sed – unfortunately the Mac ships with the BSD version which doesn’t have this functionaltiy.

The first step was therefore to install the GNU version of sed.

brew install coreutils
brew install gnu-sed --with-default-names

I wanted to replace my system sed so that’s why I went with the ‘–with-default-names’ flag – without that flag I believe the sed installation would be accessible as ‘gs-sed’.

The following is an example of what the lines in the file look like:

$ echo -e "Hello\x0AMark\x0A\x0D"
Hello
Mark

We want to get rid of the new line in between ‘Hello’ and ‘Mark’ but leave the other one be. I adapted one of the commands from this tutorial to look for lines which end in ‘0A’ where that isn’t followed by a ‘0D':

$ echo -e "Hello\x0AMark\x0A\x0D" | \
  sed 'N;/\x0A[^\x0D]/s/\n/ /'
Hello Mark

Let’s go through the parts of the sed command:

  • N – this creates a multiline pattern space by reading a new line of input and appending it to the contents of the pattern space. The two lines are separated by a new line.
  • /\x0A[^\x0D]/ – this matches any lines which contain ‘OA’ not followed by ‘OD’
  • /s/\n/ / – this substitutes the new line character with a space for those matching lines from the previous command.

Now let’s check it works if we have multiple lines that we want to squash:

$ echo -e "Hello\x0AMark\x0A\x0DHello\x0AMichael\x0A\x0D"
Hello
Mark
Hello
Michael
 
$ echo -e "Hello\x0AMark\x0A\x0DHello\x0AMichael\x0A\x0D" | \
  sed 'N;/\x0A[^\x0D]/s/\n/ /'
Hello Mark
Hello Michael

Looks good! The actual file is a bit more nuanced so I’ve still got a bit more work to do but this is a good start.

Written by Mark Needham

June 11th, 2015 at 9:38 pm

Posted in Shell Scripting

Tagged with

Unix: Converting a file of values into a comma separated list

without comments

I recently had a bunch of values in a file that I wanted to paste into a Java program which required a comma separated list of strings.

This is what the file looked like:

$ cat foo2.txt | head -n 5
1.0
1.0
1.0
1.0
1.0

And the idea is that we would end up with something like this:

"1.0","1.0","1.0","1.0","1.0"


The first thing we need to do is quote each of the values. I found a nice way to do this using sed:

$ sed 's/.*/"&"/g' foo2.txt | head -n 5
"1.0"
"1.0"
"1.0"
"1.0"
"1.0"

Now that we’ve got all the values quoted we need to get rid of the new lines and replace them with commas. The way I’d normally do this is using ‘tr’ and then just not copy the final comma…

$ sed 's/.*/"&"/g' foo2.txt | tr '\n' ','
"1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0",

…but I learnt that we can actually do one better than this using ‘paste’ which allows you to replace new lines excluding the last one.

The only annoying thing about paste is that you can’t pipe to it so we need to use process substitution instead:

$ paste -s -d ',' <(sed 's/.*/"&"/g' foo2.txt)
"1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0","1.0"

If we’re only a Mac we could even automate the copy/paste step too by piping to ‘pbcopy':

$ paste -s -d ',' <(sed 's/.*/"&"/g' foo2.txt) | pbcopy

Written by Mark Needham

June 8th, 2015 at 10:23 pm

Posted in Shell Scripting

Tagged with

cURL: POST/Upload multi part form

without comments

I’ve been doing some work which involved uploading a couple of files from a HTML form and I wanted to check that the server side code was working by executing a cURL command rather than using the browser.

The form looks like this:

<form action="http://foobar.com" method="POST" enctype="multipart/form-data">
    <p>
        <label for="nodes">File 1:</label>
        <input type="file" name="file1" id="file1">
    </p>
 
    <p>
        <label for="relationships">File 2:</label>
        <input type="file" name="file2" id="file2">
    </p>
 
    <input type="submit" name="submit" value="Submit">
</form>

If we convert the POST request from the browser into a cURL equivalent we end up with the following:

curl 'http://foobar.com' -H 'Origin: null' -H 'Accept-Encoding: gzip,deflate,sdch' -H 'Host: foobar.com:7474' -H 'Accept-Language: en-US,en;q=0.8' -H 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_3) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.95 Safari/537.36' -H 'Content-Type: multipart/form-data; boundary=----WebKitFormBoundaryMxYFIg6GFEIPAe6V' -H 'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8' -H 'Cache-Control: max-age=0' -H 'Cookie: splashShown1.6=1; undefined=0; _mkto_trk=id:773-GON-065&token:_mch-localhost-1373821432078-37666; JSESSIONID=123cbkxby1rtcj3dwipqzs7yu' -H 'Connection: keep-alive' --data-binary $'------WebKitFormBoundaryMxYFIg6GFEIPAe6V\r\nContent-Disposition: form-data; name="file1"; filename="file1.csv"\r\nContent-Type: text/csv\r\n\r\n\r\n------WebKitFormBoundaryMxYFIg6GFEIPAe6V\r\nContent-Disposition: form-data; name="file2"; filename="file2.csv"\r\nContent-Type: text/csv\r\n\r\n\r\n------WebKitFormBoundaryMxYFIg6GFEIPAe6V\r\nContent-Disposition: form-data; name="submit"\r\n\r\nSubmit\r\n------WebKitFormBoundaryMxYFIg6GFEIPAe6V--\r\n' --compressed

I tried executing this command but I couldn’t quite get it to work, and in any case it seemed extremely complicated. Thankfully, I came across a cURL tutorial which described a much simpler alternative which does work:

curl --form file1=@file1.csv --form file2=@file2.csv --form press=submit http://foobar.com

I knew it couldn’t be that complicated!

Written by Mark Needham

September 23rd, 2013 at 10:16 pm

Posted in Shell Scripting

Tagged with ,

Unix: tar – Extracting, creating and viewing archives

with one comment

I’ve been playing around with the Unix tar command a bit this week and realised that I’d memorised some of the flag combinations but didn’t actually know what each of them meant.

For example, one of the most common things that I want to do is extract a gripped neo4j archive:

$ wget http://dist.neo4j.org/neo4j-community-1.9.2-unix.tar.gz
$ tar -xvf neo4j-community-1.9.2-unix.tar.gz

where:

  • -x means extract
  • -v means produce verbose output i.e. print out the names of all the files as you unpack it
  • -f means unpack the file which follows this flag i.e. neo4j-community-1.9.2-unix.tar.gz in this example

I didn’t realise that by default tar runs against standard input so we could actually achieve the above in one go with the following combination:

$ wget http://dist.neo4j.org/neo4j-community-1.9.2-unix.tar.gz -o - | tar -xv

The other thing I wanted to do was create a gripped archive from the contents of a folder, something which I do much less frequently and am therefore much more rusty at! The following does the trick:

$ tar -cvzpf neo4j-football.tar.gz neo4j-football/
$ ls -alh neo4j-football.tar.gz 
-rw-r--r--  1 markhneedham  staff   526M 22 Aug 23:38 neo4j-football.tar.gz

where:

  • -c means create a new archive
  • -z means gzip that archive
  • -p means preserve file permissions

Sometimes we’ll want to exclude some things from our archive which is where the ‘–exclude’ flag comes in handy.

For example I want to exclude the data, git and neo4j folders which sit inside ‘neo4j-football’ which I can do with the following:

$ tar --exclude "data*" --exclude "neo4j-community*" --exclude ".git" -cvzpf neo4j-football.tar.gz neo4j-football/
$ ls -alh neo4j-football.tar.gz 
-rw-r--r--  1 markhneedham  staff   138M 22 Aug 23:36 neo4j-football.tar.gz

If we want to quickly check that our file has been created correctly we can run the following:

$ tar -tvf neo4j-football.tar.gz

where:

  • -t means list the contents of the archive to standard out

And that pretty much covers my main use cases for the moment!

Written by Mark Needham

August 22nd, 2013 at 10:56 pm

Posted in Shell Scripting

Tagged with ,

Unix/awk: Extracting substring using a regular expression with capture groups

with 4 comments

A couple of years ago I wrote a blog post explaining how I’d used GNU awk to extract story numbers from git commit messages and I wanted to do a similar thing today to extract some node ids from a file.

My eventual solution looked like this:

$ echo "mark #1000" | gawk '{ match($0, /#([0-9]+)/, arr); if(arr[1] != "") print arr[1] }'
1000

But in the comments an alternative approach was suggested which used the Mac version of awk and the RSTART and RLENGTH global variables which get set when a match is found:

$ echo "mark #1000" | awk 'match($0, /#[0-9]+/) { print substr( $0, RSTART, RLENGTH )}'
#1000

Unfortunately Mac awk doesn’t seem to capture groups so as you can see it includes the # character which we don’t actually want.

In this instance it wasn’t such a big deal but it was more annoying for the node id extraction that I was trying to do:

$ head -n 5 log.txt
Command[27716, Node[7825340,used=true,rel=14547348,prop=31734662]]
Command[27716, Node[7825341,used=true,rel=14547349,prop=31734665]]
Command[27716, Node[7825342,used=true,rel=14547350,prop=31734668]]
Command[27716, Node[7825343,used=true,rel=14547351,prop=31734671]]
$ head -n 5 log.txt | awk 'match($0, /Node\[([^,]+)/) { print substr( $0, RSTART, RLENGTH )}'
Node[7825340
Node[7825341
Node[7825342
Node[7825343
Node[7825336

I ended up having to brew install gawk and using a variation of the gawk command I mentioned at the beginning of this post:

$ head -n 5 log.txt | gawk 'match($0, /Node\[([^,]+)/, arr) { print arr[1]}'
7825340
7825341
7825342
7825343
7825336

Written by Mark Needham

June 26th, 2013 at 3:23 pm

Posted in Shell Scripting

Tagged with

Unix: find, xargs, zipinfo and the ‘caution: filename not matched:’ error

with 4 comments

As I mentioned in my previous post last week I needed to scan all the jar files included with the neo4j-enterprise gem and I started out by finding out where it’s located on my machine:

$ bundle show neo4j-enterprise
/Users/markhneedham/.rbenv/versions/jruby-1.7.1/lib/ruby/gems/shared/gems/neo4j-enterprise-1.8.2-java

I then thought I could get a list of all the jar files using find and pipe it into zipinfo via xargs to get all the file names and then search for HighlyAvailableGraphDatabaseFactory:

Unfortunately when I tried that it didn’t quite work:

$ cd /Users/markhneedham/.rbenv/versions/jruby-1.7.1/lib/ruby/gems/shared/gems/neo4j-enterprise-1.8.2-java/lib/neo4j-enterprise/jars/
$ find . -iname "*.jar" | xargs zipinfo
caution: filename not matched:  ./lib/neo4j-enterprise/jars/logback-classic-0.9.30.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/logback-core-0.9.30.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/neo4j-backup-1.8.2.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/neo4j-com-1.8.2.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/neo4j-consistency-check-1.8.2.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/neo4j-ha-1.8.2.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/neo4j-udc-1.8.2.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/server-api-1.8.2.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/slf4j-api-1.6.2.jar
caution: filename not matched:  ./lib/neo4j-enterprise/jars/zookeeper-3.3.2.jar

I switched ‘zipinfo’ to ‘echo’ to see what was going on which resulted in the following output:

$ find . -iname "*.jar" | xargs echo
./log4j-1.2.16.jar ./logback-classic-0.9.30.jar ./logback-core-0.9.30.jar ./neo4j-backup-1.8.2.jar ./neo4j-com-1.8.2.jar ./neo4j-consistency-check-1.8.2.jar ./neo4j-ha-1.8.2.jar ./neo4j-udc-1.8.2.jar ./org.apache.servicemix.bundles.netty-3.2.5.Final_1.jar ./server-api-1.8.2.jar ./slf4j-api-1.6.2.jar ./zookeeper-3.3.2.jar

As I understand it, xargs expects arguments to be separated by a space and I thought it would apply the command to each argument individually but it seemed to be including the space as part of the file name.

I’ve previously used the ‘-n’ flag to xargs to explicitly tell it to call the corresponding command with one argument at a time and that seemed to do the trick:

$ find . -iname "*.jar" | xargs -n1 zipinfo
Archive:  ./log4j-1.2.16.jar   481535 bytes   346 files
-rw----     2.0 fat     3186 bX defN 30-Mar-10 23:25 META-INF/MANIFEST.MF
-rw----     2.0 fat        0 bl defN 30-Mar-10 23:25 META-INF/
-rw----     2.0 fat    11366 bl defN 30-Mar-10 23:14 META-INF/LICENSE
-rw----     2.0 fat      160 bl defN 30-Mar-10 23:14 META-INF/NOTICE
-rw----     2.0 fat        0 bl defN 30-Mar-10 23:25 META-INF/maven/
-rw----     2.0 fat        0 bl defN 30-Mar-10 23:25 META-INF/maven/log4j/
...

Of course to solve this particular we don’t actually need to use find and xargs since we can just call zipinfo with the wildcard match:

$ zipinfo \*.jar
Archive:  log4j-1.2.16.jar   481535 bytes   346 files
-rw----     2.0 fat     3186 bX defN 30-Mar-10 23:25 META-INF/MANIFEST.MF
-rw----     2.0 fat        0 bl defN 30-Mar-10 23:25 META-INF/
-rw----     2.0 fat    11366 bl defN 30-Mar-10 23:14 META-INF/LICENSE
-rw----     2.0 fat      160 bl defN 30-Mar-10 23:14 META-INF/NOTICE
-rw----     2.0 fat        0 bl defN 30-Mar-10 23:25 META-INF/maven/
-rw----     2.0 fat        0 bl defN 30-Mar-10 23:25 META-INF/maven/log4j/
-rw----    
...

I’m curious why xargs didn’t work as I expected it to though – have I just misremembered its default behaviour or is something weird going on?

Written by Mark Needham

June 9th, 2013 at 11:10 pm

Posted in Shell Scripting

Tagged with

Unix: Working with parts of large files

with one comment

Chris and I were looking at the neo4j log files of a client earlier in the week and wanted to do some processing of the file so we could ask the client to send us some further information.

The log file was over 10,000 lines long but the bit of the file we were interesting in was only a few hundred lines.

I usually use Vim and the ‘:set number’ when I want to refer to line numbers in a file but Chris showed me that we can achieve the same thing with e.g. ‘less -N data/log/neo4j.0.0.log’.

We can then operate on say lines 10-100 by passing the ‘-n’ flag to sed:

-n By default, each line of input is echoed to the standard output after all of the commands have been applied to it. The -n option suppresses this behavior.

$ sed -n '10,15p' data/log/neo4j.0.0.log
INFO: Enabling HTTPS on port [7473]
May 19, 2013 11:11:52 AM org.neo4j.server.logging.Logger log
INFO: No SSL certificate found, generating a self-signed certificate..
May 19, 2013 11:11:53 AM org.neo4j.server.logging.Logger log
INFO: Mounted discovery module at [/]
May 19, 2013 11:11:53 AM org.neo4j.server.logging.Logger log

We then used a combination of grep, awk and sort to work out which log files we needed.

Written by Mark Needham

May 19th, 2013 at 9:44 pm

Posted in Shell Scripting

Tagged with

Unix: Checking for open sockets on nginx

without comments

Tim and I were investigating a weird problem we were having with nginx where it was getting in a state where it had exceeded the number of open files allowed on the system and started rejecting requests.

We can find out the maximum number of open files that we’re allowed on a system with the following command:

$ ulimit -n
1024

Our hypothesis was that some socket connections were never being closed and therefore the number of open files was climbing slowly upwards until it exceeded the limit.

We wanted to check how many sockets nginx had open so to start with we needed to know the process IDs it was running under:

$ ps aux | grep nginx | grep -v grep
root      1089  0.0  0.7 105152  2736 ?        Ss   17:34   0:00 nginx: master process /usr/sbin/nginx
www-data 17474  0.0  0.6 105300  2296 ?        S    21:49   0:04 nginx: worker process
www-data 17475  0.0  0.7 105300  2856 ?        S    21:49   0:04 nginx: worker process
www-data 17476  0.0  0.7 105300  2792 ?        S    21:49   0:03 nginx: worker process
www-data 17477  0.0  0.7 105300  2668 ?        S    21:49   0:04 nginx: worker process

So the process IDs we’re interested in are 1089, 17474, 17475, 17476 and 17477.

We can check which file descriptors they have open with the following command:

$ sudo ls -alh /proc/{1089,17{474,475,476,477}}/fd
/proc/17476/fd:
total 0
dr-x------ 2 www-data www-data  0 Apr 23 23:40 .
...
l-wx------ 1 www-data www-data 64 Apr 23 23:40 6 -> /var/log/nginx/error.log
l-wx------ 1 www-data www-data 64 Apr 23 23:40 7 -> /var/www/thinkingingraphs/shared/log/nginx_access.log
l-wx------ 1 www-data www-data 64 Apr 23 23:40 8 -> /var/www/thinkingingraphs/shared/log/nginx_error.log
lrwx------ 1 www-data www-data 64 Apr 23 23:40 9 -> socket:[8910]
 
/proc/17477/fd:
total 0
...
lrwx------ 1 www-data www-data 64 Apr 23 23:40 56 -> socket:[52213]
lrwx------ 1 www-data www-data 64 Apr 23 23:40 57 -> anon_inode:[eventpoll]
l-wx------ 1 www-data www-data 64 Apr 23 23:40 6 -> /var/log/nginx/error.log
l-wx------ 1 www-data www-data 64 Apr 23 23:40 7 -> /var/www/thinkingingraphs/shared/log/nginx_access.log
l-wx------ 1 www-data www-data 64 Apr 23 23:40 8 -> /var/www/thinkingingraphs/shared/log/nginx_error.log
lrwx------ 1 www-data www-data 64 Apr 23 23:40 9 -> socket:[8910]

We can narrow that down to just show us how many sockets are open:

$ sudo ls -alh /proc/{1089,17{474,475,476,477}}/fd | grep socket  | wc -l
189

We could also use lsof although for some reason that returns a slightly different number:

$ sudo lsof -p 1089,17474,17475,17476,17477 | grep socket | wc -l
184

If we want to use brace expansion to do that it becomes a bit more tricky:

$ sudo lsof -p `echo {1089,174{74,75,76,77}} | sed 's/ /,/g'` | grep socket | wc -l
184

Annoyingly we couldn’t actually replicate the error but think that it’s been solved in nginx 1.2.0 (we were using 1.1.19) by this change:

Bugfix: a segmentation fault might occur in a worker process if the
       "try_files" directive was used; the bug had appeared in 1.1.19.

Written by Mark Needham

April 23rd, 2013 at 11:59 pm

Posted in Shell Scripting

Tagged with

awk: Parsing ‘free -m’ output to get memory usage/consumption

with 3 comments

Although I know this problem is already solved by collectd and New Relic I wanted to write a little shell script that showed me the memory usage on a bunch of VMs by parsing the output of free.

The output I was playing with looks like this:

$ free -m
             total       used       free     shared    buffers     cached
Mem:           365        360          5          0         59         97
-/+ buffers/cache:        203        161
Swap:          767         13        754

I wanted to find out what % of the memory on the machine was being used and as I understand it the numbers that we would use to calculate this are the ‘total’ value on the ‘Mem’ line and the ‘used’ value on the ‘buffers/cache’ line.

I initially thought that the ‘used’ value I was interested in should be the one on the ‘Mem’ line but this number includes memory that Linux has borrowed for disk caching so it isn’t the true number.

There’s another quite interesting article showing some experiments you can do to prove this.

So what I wanted to do was get the result of the calculation ‘203/365′ which I wasn’t sure how to do until I realised you can match multiple regular expressions with awk like so:

$ free -m | awk '/Mem:/ { print $2 } /buffers\/cache/ { print $3 }'                                                        
365
203

We’ve now filtered the output down to just our two numbers but another neat thing you can do with awk is change what it uses as its field and record separator.

In this case we want to change the field separator to be the new line character and we’ll set the record separator to be nothing because otherwise it defaults to the new line character which will mess with our field separator.

Those two values are set by using the ‘RS’ and ‘FS’ variables:

$ free -m | 
  awk '/Mem:/ { print $2 } /buffers\/cache/ { print $3 }' | 
  awk 'BEGIN { RS = "" ; FS = "\n" } { print $2 / $1 }'
0.556164

This is still sub optimal because we’re using two awk commands rather than one! We can get around that by storing the two memory values in variables and printing them out in an END block:

$ free -m | 
  awk '/Mem:/ { total=$2 } /buffers\/cache/ { used=$3 } END { print used/total}'
0.556164

Written by Mark Needham

April 10th, 2013 at 7:03 am

Posted in Shell Scripting

Tagged with