SPARQL/Property paths

Property paths

Each statement in a triplestore is a triple whose predicate is a single property. In a SPARQL query, you can also write a property path in place of that single property.

Property paths are a shorthand to write down a path of properties between two items. The simplest path is just a single property, which forms an ordinary triple:

?item wdt:P31 ?class.

You can add path elements with a forward slash (/).

?item wdt:P31/wdt:P279/wdt:P279 ?class.

This is equivalent to either of the following:

?item wdt:P31 ?temp1.
?temp1 wdt:P279 ?temp2.
?temp2 wdt:P279 ?class.

?item wdt:P31 [ wdt:P279 [ wdt:P279 ?class ] ].

Exercise: (re)write the “grandchildren of Bach” query to use this syntax.

An asterisk (*) after a path element means “zero or more of this element”.

?item wdt:P31/wdt:P279* ?class.
# means:
?item wdt:P31 ?class
# or
?item wdt:P31/wdt:P279 ?class
# or
?item wdt:P31/wdt:P279/wdt:P279 ?class
# or
?item wdt:P31/wdt:P279/wdt:P279/wdt:P279 ?class
# or ...

If there are no other elements in the path, ?a something* ?b means that ?b might also just be ?a directly, with no path elements between them at all.
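
As a minimal sketch of this zero-length case (not part of the original tutorial), the following query lists everything reachable from “work of art” (wd:Q838948, the class used further down this page) through zero or more “subclass of” steps. It assumes the query is run on WDQS, where the wd:, wdt:, wikibase: and bd: prefixes are predefined. Because the path may be empty, the results should include “work of art” itself as well as its superclasses.

```sparql
SELECT ?class ?classLabel
WHERE
{
  # zero steps: ?class is wd:Q838948 itself; one or more steps: its superclasses
  wd:Q838948 wdt:P279* ?class.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```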

A plus (+) is similar to an asterisk, but means “one or more of this element”. The following query finds all descendants of Bach:

SELECT ?descendant ?descendantLabel
WHERE
{
  wd:Q1339 wdt:P40+ ?descendant.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

If we used an asterisk instead of a plus here, the query results would include Bach himself.

A question mark (?) is similar to an asterisk or a plus, but means “zero or one of this element”.
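
A small illustration of the question mark, reusing the Bach item (wd:Q1339) and the “child” property (wdt:P40) from the query above. This is a sketch that assumes the predefined WDQS prefixes. The path matches either no step at all or exactly one “child” step, so the results should be Bach himself plus his children, but no grandchildren.

```sparql
SELECT ?person ?personLabel
WHERE
{
  # zero steps: Bach himself; one step: a child of Bach
  wd:Q1339 wdt:P40? ?person.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```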

You can separate path elements with a vertical bar (|) instead of a forward slash; this means “either-or”: the path might use either of those properties. (But not both – an either-or path segment always matches a path of length one.)
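
For instance, a sketch of an either-or path that looks up both parents of Bach in one triple, using the “father” (wdt:P22) and “mother” (wdt:P25) properties and assuming the predefined WDQS prefixes. Each result is reached by exactly one step, through one property or the other.

```sparql
SELECT ?parent ?parentLabel
WHERE
{
  # one step, via either "father" or "mother"
  wd:Q1339 wdt:P22|wdt:P25 ?parent.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```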

You can also group path elements with parentheses (()), and freely combine all these syntax elements (/|*+?). This means that another way to find all descendants of Bach is:

SELECT ?descendant ?descendantLabel
WHERE
{
  ?descendant (wdt:P22|wdt:P25)+ wd:Q1339.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Instead of using the “child” property to go from Bach to his descendants, we use the “father” and “mother” properties to go from the descendants to Bach. The path might include two mothers and one father, or four fathers, or father-mother-mother-father, or any other combination. (Though, of course, Bach can’t be the mother of someone, so the last element will always be father.)
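
Grouping also combines with the forward slash. As a hedged sketch (assuming the predefined WDQS prefixes), the following query walks exactly two parent steps from Bach, each of which may be a father or a mother, and so should return whichever grandparents are recorded in Wikidata.

```sparql
SELECT ?grandparent ?grandparentLabel
WHERE
{
  # two steps, each via either "father" or "mother"
  wd:Q1339 (wdt:P22|wdt:P25)/(wdt:P22|wdt:P25) ?grandparent.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```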

Summary of the codes after a path element:

Code                 Meaning
? (Question mark)    zero or one of this element
* (Asterisk)         zero or more of this element
+ (Plus)             one or more of this element

Instead of the normal triple order “subject, predicate, object”, a triple can also be written as an inverse link, “object, predicate, subject”, by adding ^ in front of the predicate. For ordinary triples this is not very useful, but in property paths it avoids dummy variables.

For example, this query finds the siblings of Johann Sebastian Bach by looking for people who have the same father.

SELECT ?sibling ?siblingLabel
WHERE
{
  # Bach   father/has father sibling
  wd:Q1339 wdt:P22/^wdt:P22 ?sibling. # ^ = Inverse link
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

With a dummy variable, this can be written as:

SELECT ?sibling ?siblingLabel
WHERE
{
  # Bach   father/has father sibling
  wd:Q1339 wdt:P22 ?dummy.
  ?dummy ^wdt:P22 ?sibling. # ^ = Inverse link
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Or without inverse link:

SELECT ?sibling ?siblingLabel
WHERE
{
  # Bach and ?sibling have the same father (?dummy)
  wd:Q1339 wdt:P22 ?dummy.
  ?sibling wdt:P22 ?dummy.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

Code              Meaning
^ (Circumflex)    Inverse link
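
One caveat about the sibling queries in this section: Bach is himself a child of his father, so he also matches ?sibling. A possible way to leave him out, sketched here with a standard SPARQL FILTER (an addition, not part of the original queries, and assuming the predefined WDQS prefixes):

```sparql
SELECT ?sibling ?siblingLabel
WHERE
{
  wd:Q1339 wdt:P22/^wdt:P22 ?sibling. # all children of Bach's father
  FILTER(?sibling != wd:Q1339)        # drop Bach himself from the results
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
```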

Instances and classes

Most Wikidata properties are “has” relations: has child, has father, has occupation. But sometimes (in fact, frequently), you also need to talk about what something is, and there are two kinds of relations there:

  • Gone with the Wind is a film.
  • A film is a work of art.

Gone with the Wind is one particular film. It has a particular director (Victor Fleming), a specific duration (238 minutes), a list of cast members (Clark Gable, Vivien Leigh, …), and so on.

Film is a general concept. Films can have directors, durations, and cast members, but the concept “film” as such does not have any particular director, duration, or cast members. And although a film is a work of art, and a work of art usually has a creator, the concept of “film” itself does not have a creator – only particular instances of this concept do.

This difference is why there are two properties for “is” in Wikidata: “instance of” (P31) and “subclass of” (P279). Gone with the Wind is a particular instance of the class “film”; the class “film” is a subclass (more specific class; specialization) of the more general class “work of art”.

So what does this mean for us when we’re writing SPARQL queries? When we want to search for “all works of art”, it’s not enough to search for all items that are directly instances of “work of art”:

SELECT ?work ?workLabel
WHERE
{
  ?work wdt:P31 wd:Q838948. # instance of work of art
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}

As I’m writing this, that query only returns 2815 results – obviously, there are more works of art than that! The problem is that this misses items like Gone with the Wind, which is only an instance of “film”, not of “work of art”. “film” is a subclass of “work of art”, but we need to tell SPARQL to take that into account when searching.

One possible solution to this is the [] syntax we talked about: Gone with the Wind is an instance of some subclass of “work of art”. (As an exercise, try writing that query!) But that still has problems:

  1. We’re no longer including items that are directly instances of work of art.
  2. We’re still missing items that are instances of some subclass of some other subclass of “work of art” – for example, Snow White and the Seven Dwarfs is an animated film, which is a film, which is a work of art. In this case, we need to follow two “subclass of” statements – but it might also be three, four, five, any number really.

The solution: ?item wdt:P31/wdt:P279* ?class. This means that there’s one “instance of” and then any number of “subclass of” statements between the item and the class.

SELECT ?work ?workLabel
WHERE
{
  ?work wdt:P31/wdt:P279* wd:Q838948. # instance of work of art or of any of its direct or indirect subclasses
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 1000

I don’t recommend running that query without a limit. WDQS can handle it (just barely), but your browser might crash when trying to display the results because there are so many of them. That is why the query above includes LIMIT 1000.

Now you know how to search for all works of art, or all buildings, or all human settlements: the magic incantation wdt:P31/wdt:P279*, along with the appropriate class. It combines the / and * path operators described above, and quite honestly, this is almost the only use of them you need to remember in order to use WDQS effectively.
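
To apply the same pattern to another class, only the class item changes. The sketch below searches for films instead of works of art. The item ID wd:Q11424 for “film” is an assumption that does not come from the text above, so check it on Wikidata before relying on it. The LIMIT is again there to keep the result set manageable.

```sparql
SELECT ?film ?filmLabel
WHERE
{
  # wd:Q11424 is assumed to be the item for "film"; verify before use
  ?film wdt:P31/wdt:P279* wd:Q11424.
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 100
```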
