Querying Wikidata: SELECT vs CONSTRUCT

In this blog post we’re going to build upon the newbie’s guide to querying Wikidata, and learn all about the CONSTRUCT clause.

Figure 1. SPARQL’s CONSTRUCT and SELECT clauses

In the newbie’s guide, we wrote the following query to find a tennis player with the name "Nick Kyrgios" and return their date of birth:

SELECT *
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          wdt:P569 ?dateOfBirth
}

where:

  • wdt:P106 is occupation

  • wd:Q10833314 is tennis player

  • wdt:P569 is date of birth

If we run that query, we’ll see the following output:

Table 1. Results
person                                  | dateOfBirth
http://www.wikidata.org/entity/Q3720084 | 1995-04-27T00:00:00Z
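
For readers who would rather run this query from a script than from the Wikidata Query Service web page, here is a minimal Python sketch that prepares the HTTP request. It assumes the public endpoint at https://query.wikidata.org/sparql, which predefines the wd:, wdt: and rdfs: prefixes used in the query. The function names build_request and run_query and the User-Agent string are my own placeholders, not anything from the original guide.

```python
import json
import urllib.parse
import urllib.request

# Public Wikidata SPARQL endpoint (assumed; check the WDQS docs if it moves).
ENDPOINT = "https://query.wikidata.org/sparql"

QUERY = """
SELECT *
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          wdt:P569 ?dateOfBirth
}
"""


def build_request(query: str) -> urllib.request.Request:
    """Prepare (but do not send) a GET request carrying the query."""
    url = ENDPOINT + "?" + urllib.parse.urlencode({"query": query})
    headers = {
        "Accept": "application/sparql-results+json",
        # Hypothetical identifier; use one that describes your own script.
        "User-Agent": "sparql-blog-example/0.1",
    }
    return urllib.request.Request(url, headers=headers)


def run_query(query: str) -> list:
    """Send the query and return the result bindings (needs network access)."""
    with urllib.request.urlopen(build_request(query)) as response:
        return json.load(response)["results"]["bindings"]
```

Calling run_query(QUERY) should give one binding per result row, although I have only sketched it here rather than tuned it for production use.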

But what if we want to return the results as a list of triples instead?

CONSTRUCT WHERE

We can use the CONSTRUCT WHERE clause instead of SELECT. The SPARQL 1.1 Query Language specification describes this short form as follows:

A short form for the CONSTRUCT query form is provided for the case where the template and the pattern are the same and the pattern is just a basic graph pattern (no FILTERs and no complex graph patterns are allowed in the short form). The keyword WHERE is required in the short form.

I found a good article explaining the CONSTRUCT clause as part of FutureLearn’s Introduction to Linked Data and the Semantic Web course.

Our updated query looks like this:

CONSTRUCT
WHERE { ?person wdt:P106 wd:Q10833314 ;
                rdfs:label 'Nick Kyrgios'@en ;
                wdt:P569 ?dateOfBirth
}

And if we run that we’ll get the following output:

Table 2. Results
subject                                 | predicate                                  | object
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P106   | http://www.wikidata.org/entity/Q10833314
http://www.wikidata.org/entity/Q3720084 | http://www.w3.org/2000/01/rdf-schema#label | Nick Kyrgios
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P569   | 1995-04-27T00:00:00Z

where:

  • Q3720084 is Nick Kyrgios

  • P106 is occupation

  • Q10833314 is tennis player

  • P569 is date of birth

So if we translate the three triples returned, what we have is:

Table 3. Translated results
Nick Kyrgios | occupation    | tennis player
Nick Kyrgios | label         | Nick Kyrgios
Nick Kyrgios | date of birth | 1995-04-27T00:00:00Z
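
One way to picture what CONSTRUCT WHERE does: every solution that SELECT would have returned as a row is substituted back into the pattern, and the resulting triples are emitted. The Python sketch below imitates that substitution for the single solution from Table 1. It is a toy model built around a hypothetical instantiate helper, not a description of how a SPARQL engine is implemented.

```python
# The basic graph pattern from the query, one (subject, predicate, object)
# tuple per triple. Terms starting with "?" are variables.
PATTERN = [
    ("?person", "wdt:P106", "wd:Q10833314"),
    ("?person", "rdfs:label", "'Nick Kyrgios'@en"),
    ("?person", "wdt:P569", "?dateOfBirth"),
]

# The single solution (row) shown in Table 1.
SOLUTION = {
    "?person": "wd:Q3720084",
    "?dateOfBirth": "1995-04-27T00:00:00Z",
}


def instantiate(template, solution):
    """Replace bound variables with their values; constants pass through."""
    return [
        tuple(solution.get(term, term) for term in triple)
        for triple in template
    ]


# In the short form, the template is the pattern itself.
for triple in instantiate(PATTERN, SOLUTION):
    print(triple)
```

The three printed tuples line up with the three rows of Table 2, just written with prefixed names instead of full IRIs.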

So far, so good. Let’s extend our SELECT query to also return the person’s nationality, using wdt:P27 (country of citizenship):

SELECT *
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          wdt:P569 ?dateOfBirth ;
          wdt:P27 [ rdfs:label ?countryName ] .
  filter(lang(?countryName) = "en")
}

If we run that query, we’ll see the following output:

Table 4. Results
person                                  | dateOfBirth          | countryName
http://www.wikidata.org/entity/Q3720084 | 1995-04-27T00:00:00Z | Australia
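
The filter(lang(?countryName) = "en") line is what trims the country labels down to the English one. As a rough analogy, the Python sketch below does the same thing to a short list of labels. The three labels appear elsewhere in this post, but the language tags attached to them and the filter_by_lang helper are my own illustrative assumptions.

```python
# (text, language tag) pairs standing in for rdfs:label literals of a country.
# The tags are illustrative assumptions; the real data has many more labels.
LABELS = [
    ("Australia", "en"),
    ("Australië", "nl"),
    ("Австралия", "ru"),
]


def filter_by_lang(labels, wanted):
    """Mimic FILTER(lang(?label) = wanted): keep exact tag matches only."""
    return [text for text, tag in labels if tag == wanted]


print(filter_by_lang(LABELS, "en"))  # ['Australia']
```

Note that this is an exact comparison, so a label tagged en-gb would not pass. SPARQL also has a langMatches function for matching a whole language range.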

Now we want to do the same thing with our CONSTRUCT query:

CONSTRUCT
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          wdt:P569 ?dateOfBirth ;
          wdt:P27 [ rdfs:label ?countryName ] .
  filter(lang(?countryName) = "en")
}

If we run that query, we’ll get the following error:

SPARQL-QUERY: queryStr=CONSTRUCT
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          wdt:P569 ?dateOfBirth ;
          wdt:P27 [ rdfs:label ?countryName ] .
  filter(lang(?countryName) = "en")
}
java.util.concurrent.ExecutionException: org.openrdf.query.MalformedQueryException: CONSTRUCT WHERE only permits statement patterns in the WHERE clause.

As the error message indicates, we can only use statement patterns in the WHERE clause of a CONSTRUCT WHERE query. A filter is not a statement pattern, so let’s remove it:

CONSTRUCT
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          wdt:P569 ?dateOfBirth ;
          wdt:P27 [ rdfs:label ?countryName ]
}

If we run that query, we’ll get the following output:

Table 5. Results
subject                                 | predicate                                  | object
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P106   | http://www.wikidata.org/entity/Q10833314
http://www.wikidata.org/entity/Q3720084 | http://www.w3.org/2000/01/rdf-schema#label | Nick Kyrgios
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P569   | 1995-04-27T00:00:00Z
b0                                      | http://www.w3.org/2000/01/rdf-schema#label | Australia
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P27    | b0
b1                                      | http://www.w3.org/2000/01/rdf-schema#label | Awıstralya
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P27    | b1
…
b5                                      | http://www.w3.org/2000/01/rdf-schema#label | ཨས་ཊེཡེ་ལི་ཡ
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P27    | b5

Hmm, the output isn’t exactly what we wanted. We have two issues to figure out:

  • What are those values prefixed with b all about?

  • We’ve got the label for "Australia" in every language instead of just the English one.
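
On the first question: the b values are blank nodes. The [ rdfs:label ?countryName ] syntax stands for an anonymous node, and when it appears in a CONSTRUCT template a new blank node is created for every solution. The toy Python model below shows how three label solutions turn into three separate b nodes. The instantiate helper and the "_:country" marker are my own conventions for illustration, and only three of the many labels are included.

```python
import itertools

# Template fragment for: ?person wdt:P27 [ rdfs:label ?countryName ]
# "_:country" marks the anonymous node written with square brackets.
TEMPLATE = [
    ("?person", "wdt:P27", "_:country"),
    ("_:country", "rdfs:label", "?countryName"),
]

# One solution per label of the country (only three shown here).
SOLUTIONS = [
    {"?person": "wd:Q3720084", "?countryName": "Australia"},
    {"?person": "wd:Q3720084", "?countryName": "Australië"},
    {"?person": "wd:Q3720084", "?countryName": "Австралия"},
]


def instantiate(template, solutions):
    """Substitute each solution, minting fresh blank nodes per solution."""
    counter = itertools.count()
    triples = []
    for solution in solutions:
        fresh = {}  # blank-node labels are scoped to a single solution
        for triple in template:
            out = []
            for term in triple:
                if term.startswith("_:"):
                    if term not in fresh:
                        fresh[term] = f"b{next(counter)}"
                    out.append(fresh[term])
                else:
                    out.append(solution.get(term, term))
            triples.append(tuple(out))
    return triples


for triple in instantiate(TEMPLATE, SOLUTIONS):
    print(triple)
```

Each solution gets its own b node, and the IRI of the country itself never appears in the output.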

We can fix the first problem by binding the country to a variable of its own, ?country, rather than using the anonymous [ … ] syntax, so that the country’s IRI is returned instead of a generated b identifier. This means that:

?person wdt:P27 [ rdfs:label ?countryName ]

becomes:

?person wdt:P27 ?country .
?country rdfs:label ?countryName

If we do that, we’ll have the following query:

CONSTRUCT
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          wdt:P569 ?dateOfBirth ;
          wdt:P27 ?country  .
  ?country rdfs:label ?countryName
}

And now let’s run that query:

Table 6. Results
subject                                 | predicate                                  | object
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P106   | http://www.wikidata.org/entity/Q10833314
http://www.wikidata.org/entity/Q3720084 | http://www.w3.org/2000/01/rdf-schema#label | Nick Kyrgios
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P569   | 1995-04-27T00:00:00Z
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P27    | http://www.wikidata.org/entity/Q408
http://www.wikidata.org/entity/Q408     | http://www.w3.org/2000/01/rdf-schema#label | Australia
http://www.wikidata.org/entity/Q408     | http://www.w3.org/2000/01/rdf-schema#label | Australië
…
http://www.wikidata.org/entity/Q408     | http://www.w3.org/2000/01/rdf-schema#label | Австралия

That’s better, but we still have all versions of Australia instead of just the English version.

Plain old CONSTRUCT

As far as I understand, to fix that we’ll need to use the normal CONSTRUCT syntax, which requires us to specify all the triples that we’d like to return.

Let’s update our query to do that:

CONSTRUCT {
  ?person wdt:P569 ?dateOfBirth;
          rdfs:label ?playerName;
          wdt:P27 ?country .
  ?country rdfs:label ?countryName
}
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          wdt:P569 ?dateOfBirth ;
          wdt:P27 ?country  .
  ?country rdfs:label ?countryName .
  filter(lang(?countryName) = "en")
}

And if we run that query, we’ll see the following output:

Table 7. Results
subject                                 | predicate                                  | object
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P569   | 1995-04-27T00:00:00Z
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P27    | http://www.wikidata.org/entity/Q408
http://www.wikidata.org/entity/Q408     | http://www.w3.org/2000/01/rdf-schema#label | Australia

That’s better, but we’re missing the statement that returns the player’s name.

We do have that statement in the CONSTRUCT clause, but ?playerName is never bound in the WHERE clause, and a template triple with an unbound variable is left out of the output. So we need to add rdfs:label ?playerName to the WHERE clause, along with a language filter so that we only return the English version of the name. Our query now looks like this:

CONSTRUCT {
  ?person wdt:P569 ?dateOfBirth;
          rdfs:label ?playerName;
          wdt:P27 ?country .
  ?country rdfs:label ?countryName
}
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          rdfs:label ?playerName;
          wdt:P569 ?dateOfBirth ;
          wdt:P27 ?country  .
  ?country rdfs:label ?countryName .
  filter(lang(?countryName) = "en")
  filter(lang(?playerName) = "en")
}

Now let’s run that query:

Table 8. Results
subject                                 | predicate                                  | object
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P569   | 1995-04-27T00:00:00Z
http://www.wikidata.org/entity/Q3720084 | http://www.w3.org/2000/01/rdf-schema#label | Nick Kyrgios
http://www.wikidata.org/entity/Q3720084 | http://www.wikidata.org/prop/direct/P27    | http://www.wikidata.org/entity/Q408
http://www.wikidata.org/entity/Q408     | http://www.w3.org/2000/01/rdf-schema#label | Australia

Much better.
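
To make the role of the WHERE clause concrete, the sketch below models the difference between Table 7 and Table 8 in Python: a template triple is emitted only when every variable in it has a binding. As before, instantiate is a hypothetical helper for illustration, and the solutions are written out by hand rather than fetched from Wikidata.

```python
TEMPLATE = [
    ("?person", "wdt:P569", "?dateOfBirth"),
    ("?person", "rdfs:label", "?playerName"),
    ("?person", "wdt:P27", "?country"),
]

# Solution from the first attempt: ?playerName is never bound.
WITHOUT_NAME = {
    "?person": "wd:Q3720084",
    "?dateOfBirth": "1995-04-27T00:00:00Z",
    "?country": "wd:Q408",
}

# Solution after adding "rdfs:label ?playerName" to the WHERE clause.
WITH_NAME = {**WITHOUT_NAME, "?playerName": "Nick Kyrgios"}


def instantiate(template, solution):
    """Emit a triple only if every variable in it has a binding."""
    triples = []
    for triple in template:
        variables = [term for term in triple if term.startswith("?")]
        if all(var in solution for var in variables):
            triples.append(tuple(solution.get(term, term) for term in triple))
    return triples


print(len(instantiate(TEMPLATE, WITHOUT_NAME)))  # 2
print(len(instantiate(TEMPLATE, WITH_NAME)))     # 3
```

The country label triple is left out of this template to keep the sketch short, which is why the counts are one lower than the row counts in the tables.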

Returning a custom RDF graph

One other neat thing about the CONSTRUCT clause is that we can change the RDF graph that our query returns. The following query uses vocabulary from schema.org in place of Wikidata predicates:

PREFIX sch: <http://schema.org/>

CONSTRUCT {
  ?person sch:birthDate ?dateOfBirth;
          sch:name ?playerName;
          sch:nationality ?country .
  ?country sch:name ?countryName
}
WHERE {
  ?person wdt:P106 wd:Q10833314 ;
          rdfs:label 'Nick Kyrgios'@en ;
          rdfs:label ?playerName;
          wdt:P569 ?dateOfBirth ;
          wdt:P27 ?country  .
  ?country rdfs:label ?countryName .
  filter(lang(?countryName) = "en")
  filter(lang(?playerName) = "en")
}

If we run this query, we get the following, much friendlier-looking output:

Table 9. Results
subject                                 | predicate                     | object
http://www.wikidata.org/entity/Q3720084 | http://schema.org/birthDate   | 1995-04-27T00:00:00Z
http://www.wikidata.org/entity/Q3720084 | http://schema.org/name        | Nick Kyrgios
http://www.wikidata.org/entity/Q3720084 | http://schema.org/nationality | http://www.wikidata.org/entity/Q408
http://www.wikidata.org/entity/Q408     | http://schema.org/name        | Australia
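
Conceptually, the schema.org query keeps the same subjects and objects and swaps each Wikidata predicate for a schema.org one. The Python sketch below applies that swap to the triples from Table 8. The PREDICATE_MAP dictionary simply restates the choices made in the CONSTRUCT template, and the rewrite helper is hypothetical. A SPARQL engine does not literally rewrite triples like this (it instantiates the template directly), but the effect on this small result is the same.

```python
# Triples from Table 8, written with prefixed names for brevity.
WIKIDATA_TRIPLES = [
    ("wd:Q3720084", "wdt:P569", "1995-04-27T00:00:00Z"),
    ("wd:Q3720084", "rdfs:label", "Nick Kyrgios"),
    ("wd:Q3720084", "wdt:P27", "wd:Q408"),
    ("wd:Q408", "rdfs:label", "Australia"),
]

# Predicate choices taken from the CONSTRUCT template in the query.
PREDICATE_MAP = {
    "wdt:P569": "sch:birthDate",
    "rdfs:label": "sch:name",
    "wdt:P27": "sch:nationality",
}


def rewrite(triples, mapping):
    """Swap predicates per the mapping; leave unknown predicates untouched."""
    return [(s, mapping.get(p, p), o) for s, p, o in triples]


for triple in rewrite(WIKIDATA_TRIPLES, PREDICATE_MAP):
    print(triple)
```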

And that’s all for now. If there’s a better way to do anything that I described, do let me know in the comments. I’m still a SPARQL newbie.
