Mark Needham

Thoughts on Software Development

Archive for the ‘jq’ tag

jq: Cannot iterate over number / string and number cannot be added

without comments

In my continued parsing of meetup.com’s JSON API I wanted to extract some information from the following JSON file:

$ head -n40 data/members/18313232.json
[
  {
    "status": "active",
    "city": "London",
    "name": ". .",
    "other_services": {},
    "country": "gb",
    "topics": [],
    "lon": -0.13,
    "joined": 1438866605000,
    "id": 92951932,
    "state": "17",
    "link": "http://www.meetup.com/members/92951932",
    "photo": {
      "thumb_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/thumb_250896203.jpeg",
      "photo_id": 250896203,
      "highres_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/highres_250896203.jpeg",
      "photo_link": "http://photos1.meetupstatic.com/photos/member/8/d/6/b/member_250896203.jpeg"
    },
    "lat": 51.49,
    "visited": 1446745707000,
    "self": {
      "common": {}
    }
  },
  {
    "status": "active",
    "city": "London",
    "name": "Abdelkader Idryssy",
    "other_services": {},
    "country": "gb",
    "topics": [
      {
        "name": "Weekend Adventures",
        "urlkey": "weekend-adventures",
        "id": 16438
      },
      {
        "name": "Community Building",
        "urlkey": "community-building",

In particular I want to extract the member’s id, name, join date and the ids of topics they’re interested in. I started with the following jq query to try and extract those attributes:

$ jq -r '.[] | [.id, .name, .joined, (.topics[] | .id | join(";"))] | @csv' data/members/18313232.json
Cannot iterate over number (16438)

Annoyingly this treats topic ids on an individual basis rather than as an array as I wanted. I tweaked the query to the following with no luck:

$ jq -r '.[] | [.id, .name, .joined, (.topics[].id | join(";"))] | @csv' data/members/18313232.json
Cannot iterate over number (16438)

As a guess I decided to wrap ‘.topics[].id’ in an array literal to see if it had any impact:

$ jq -r '.[] | [.id, .name, .joined, ([.topics[].id] | join(";"))] | @csv' data/members/18313232.json
92951932,". .",1438866605000,""
jq: error (at data/members/18313232.json:60013): string ("") and number (16438) cannot be added

Woot! A different error message at least and this one seems to be due to a type mismatch between the string we want to end up with and the array of numbers that we currently have.

We can cast our way to victory with the ‘tostring’ function:

$ jq -r '.[] | [.id, .name, .joined, ([.topics[].id | tostring] | join(";"))] | @csv' data/members/18313232.json
...
92951932,". .",1438866605000,""
193866304,"Abdelkader Idryssy",1445195325000,"16438;20727;15401;9760;20246;20923;3336;2767;242;58259;4417;1789;10454;20274;10232;563;25375;16433;15187;17635;26273;21808;933;7789;23884;16212;144477;15322;21067;3833;108403;20221;1201;182;15083;9696;4377;15360;18296;15121;17703;10161;1322;3880;18333;3485;15585;44584;18692;21681"
28643052,"Abhishek Chanda",1439688955000,"646052;520302;15167;563;65735;537492;646072;537502;24959;1025832;8599;31197;24410;26118;10579;1064;189;48471;16216;18062;33089;107633;46831;20479;1423042;86258;21441;3833;21681;188;9696;58162;20398;113032;18060;29971;55324;30928;15261;58259;638;16475;27591;10107;242;109595;10470;26384;72514;1461192"
39523062,"Adam Kinder-Jones",1438677474000,"70576;21549;3833;42277;164111;21522;93380;48471;15167;189;563;25435;87614;9696;18062;58162;10579;21681;19882;108403;128595;15582;7029"
194119823,"Adam Lewis",1444867994000,"10209"
14847001,"Adam Rogers",1422917313000,""
87709042,"Adele Green",1436867381000,"15167;18062;102811;9696;30928;18060;78565;189;7029;48471;127567;10579;58162;563;3833;16216;21441;37708;209821;15401;59325;31792;21836;21900;984862;15720;17703;96823;4422;85951;87614;37428;2260;827;121802;19672;38660;84325;118991;135612;10464;1454542;17936;21549;21520;17628;148303;20398;66339;29661"
11497937,"Adrian Bridgett",1421067940000,"30372;15046;25375;638;498;933;374;27591;18062;18060;15167;10581;16438;15672;1998;1273;713;26333;15099;15117;4422;15892;242;142180;563;31197;20479;1502;131178;15018;43256;58259;1201;7319;15940;223;8652;66493;15029;18528;23274;9696;128595;21681;17558;50709;113737"
14151190,"adrian lawrence",1437142198000,"7029;78565;659;85951;15582;48471;9696;128595;563;10579;3833;101960;16137;1973;78566;206;223;21441;16216;108403;21681;186;1998;15731;17703;15043;16613;17885;53531;48375;16615;19646;62326;49954;933;22268;19243;37381;102811;30928;455;10358;73511;127567;106382;16573;36229;781;23981;1954"
183557824,"Adrien Pujol",1421882756000,"108403;563;9696;21681;188;24410;1064;32743;124668;15472;21123;1486432;1500742;87614;46831;1454542;46810;166000;126177;110474"
...

Written by Mark Needham

November 24th, 2015 at 12:12 am

Posted in Software Development

Tagged with

jq: Filtering missing keys

without comments

I’ve been playing around with the meetup.com API again over the last few days and having saved a set of events to disk I wanted to extract the venues using jq.

This is what a single event record looks like:

$ jq -r ".[0]" data/events/0.json
{
  "status": "past",
  "rating": {
    "count": 1,
    "average": 1
  },
  "utc_offset": 3600000,
  "event_url": "http://www.meetup.com/londonweb/events/3261890/",
  "group": {
    "who": "Web Peeps",
    "name": "London Web",
    "group_lat": 51.52000045776367,
    "created": 1034097743000,
    "join_mode": "approval",
    "group_lon": -0.12999999523162842,
    "urlname": "londonweb",
    "id": 163876
  },
  "name": "London Web Design October Meetup",
  "created": 1094756756000,
  "venue": {
    "city": "London",
    "name": "Roadhouse Live Music Restaurant , Bar & Club",
    "country": "GB",
    "lon": -0.1,
    "phone": "44-020-7240-6001",
    "address_1": "The Piazza",
    "address_2": "Covent Garden",
    "repinned": false,
    "lat": 51.52,
    "id": 11725
  },
  "updated": 1273536337000,
  "visibility": "public",
  "yes_rsvp_count": 2,
  "time": 1097776800000,
  "waitlist_count": 0,
  "headcount": 0,
  "maybe_rsvp_count": 5,
  "id": "3261890"
}

We want to extract the keys underneath ‘venue’.
I started with the following:

$ jq -r ".[] | .venue" data/events/0.json
...
{
  "city": "London",
  "name": "Counting House Pub",
  "country": "gb",
  "lon": -0.085022,
  "phone": "020 7283 7123",
  "address_1": "50 Cornhill Rd",
  "address_2": "EC3V 3PD",
  "repinned": false,
  "lat": 51.513407,
  "id": 835790
}
null
{
  "city": "Paris",
  "name": "Mozilla Paris",
  "country": "fr",
  "lon": 2.341002,
  "address_1": "16 Bis Boulevard Montmartre",
  "repinned": false,
  "lat": 48.871834,
  "id": 23591845
}
...

This is close to what I want but it includes ‘null’ values which means when you extract the keys inside ‘venue’ they are all empty as well:

jq -r ".[] | .venue | [.id, .name, .city, .address_1, .address_2, .lat, .lon] | @csv" data/events/0.json
...
101958,"The Green Man and French Horn,  -","London","54, St. Martins Lane - Covent Garden","WC2N 4EA",51.52,-0.1
,,,,,,
107295,"The Yorkshire Grey Pub","London","46 Langham Street","W1W 7AX",51.52,-0.1
...
,,,,,,

If functional programming lingo we want to filter out any JSON documents which don’t have the ‘venue’ key.
‘filter’ has a different meaning in jq so it took me a while to realise that the ‘select’ function was what I needed to get rid of the null values:

$ jq -r ".[] | select(.venue != null) | .venue | [.id, .name, .city, .address_1, .address_2, .lat, .lon] | @csv" data/events/0.json | head
11725,"Roadhouse Live Music Restaurant , Bar & Club","London","The Piazza","Covent Garden",51.52,-0.1
11725,"Roadhouse Live Music Restaurant , Bar & Club","London","The Piazza","Covent Garden",51.52,-0.1
11725,"Roadhouse Live Music Restaurant , Bar & Club","London","The Piazza","Covent Garden",51.52,-0.1
11725,"Roadhouse Live Music Restaurant , Bar & Club","London","The Piazza","Covent Garden",51.52,-0.1
76192,"Pied Bull Court","London","Galen Place, London, WC1A 2JR",,51.516747,-0.12719
76192,"Pied Bull Court","London","Galen Place, London, WC1A 2JR",,51.516747,-0.12719
85217,"Earl's Court Exhibition Centre","London","Warwick Road","SW5 9TA",51.49233,-0.199735
96579,"Olympia 2","London","Near Olympia tube station",,51.52,-0.1
76192,"Pied Bull Court","London","Galen Place, London, WC1A 2JR",,51.516747,-0.12719
101958,"The Green Man and French Horn,  -","London","54, St. Martins Lane - Covent Garden","WC2N 4EA",51.52,-0.1

And we’re done.

Written by Mark Needham

November 14th, 2015 at 10:51 pm

Posted in Software Development

Tagged with

jq: error – Cannot iterate over null (null)

without comments

I’ve been playing around with the jq library again over the past couple of days to convert the JSON from the Stack Overflow API into CSV and found myself needing to deal with an optional field.

I’ve downloaded 100 or so questions and stored them as an array in a JSON array like so:

$ head -n 100 so.json
[
    {
        "has_more": true,
        "items": [
            {
                "is_answered": false,
                "delete_vote_count": 0,
                "body_markdown": "...",
                "tags": [
                    "jdbc",
                    "neo4j",
                    "cypher",
                    "spring-data-neo4j"
                ],
                "question_id": 33023306,
                "title": "How to delete multiple nodes by specific ID using Cypher",
                "down_vote_count": 0,
                "view_count": 8,
                "answers": [
                    {
...
]

I wrote the following command to try and extract the answer meta data and the corresponding question_id:

$ jq -r \
 '.[] | .items[] |
 { question_id: .question_id, answer: .answers[] } |
 [.question_id, .answer.answer_id, .answer.title] |
 @csv' so.json
 
33023306,33024189,"How to delete multiple nodes by specific ID using Cypher"
33020796,33021958,"How do a general search across string properties in my nodes?"
33018818,33020068,"Neo4j match nodes related to all nodes in collection"
33018818,33024273,"Neo4j match nodes related to all nodes in collection"
jq: error (at so.json:134903): Cannot iterate over null (null)

Unfortunately this results in an error since some questions haven’t been answered yet and therefore don’t have the ‘answers’ property.

While reading the docs I came across the alternative operation ‘//’ which can be used to provide defaults – in this case I thought I could plugin an empty array of answers if a question hadn’t been answered yet:

$ jq -r \
 '.[] | .items[] |
 { question_id: .question_id, answer: (.answers[] // []) } |
 [.question_id, .answer.answer_id, .answer.title] |
 @csv' so.json
 
33023306,33024189,"How to delete multiple nodes by specific ID using Cypher"
33020796,33021958,"How do a general search across string properties in my nodes?"
33018818,33020068,"Neo4j match nodes related to all nodes in collection"
33018818,33024273,"Neo4j match nodes related to all nodes in collection"
jq: error (at so.json:134903): Cannot iterate over null (null)

Still the same error! Reading down the page I noticed the ? operator which provides syntactic sugar for handling/catching errors. I gave it a try:

$ jq -r  '.[] | .items[] |
 { question_id: .question_id, answer: .answers[]? } |
 [.question_id, .answer.answer_id, .answer.title] |
 @csv' so.json | head -n10
 
33023306,33024189,"How to delete multiple nodes by specific ID using Cypher"
33020796,33021958,"How do a general search across string properties in my nodes?"
33018818,33020068,"Neo4j match nodes related to all nodes in collection"
33018818,33024273,"Neo4j match nodes related to all nodes in collection"
33015714,33021482,"Upgrade of spring data neo4j 3.x to 4.x Relationship Operations"
33011477,33011721,"Why does Neo4j OGM delete method return void?"
33011102,33011565,"Neo4j and algorithms"
33011102,33013260,"Neo4j and algorithms"
33010859,33011505,"Importing data into an existing database in neo4j"
33009673,33010942,"How do I use Spring Data Neo4j to persist a Map (java.util.Map) object inside an NodeEntity?"

As far as I can tell we are just skipping any records that don’t contain ‘answers’ which is exactly the behaviour I’m after so that’s great – just what we need!

Written by Mark Needham

October 9th, 2015 at 6:34 am

Posted in Software Development

Tagged with