XQuery/SPARQLing Country Calling Codes

Motivation

Stimulated by Henry Story's blog entry, the following script tackles the same problem. It uses the functions defined in the previous module to execute a SPARQL query on the dbpedia server and to convert the SPARQL query results to tuples.

First attempt

import module namespace fr="http://www.cems.uwe.ac.uk/wiki/fr"   at  "fr.xqm";
 
declare variable $query := "
PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
SELECT * WHERE { 
     ?resource  p:callingCode ?callingCode.
     }
 ";
 
 declare option exist:serialize "method=xhtml media-type=text/html";
 
 <html>
<head>
  <title>Country Calling codes</title>
</head>
<body>
  <h1>Country Calling codes</h1>
  <table border="1">
   {  for $country in  fr:sparql-to-tuples(fr:execute-sparql($query))
      let $name := fr:clean($country/resource)
      order by $name
      return      
        <tr>
           <td><a href="{$country/resource}">{$name}</a></td>
           <td>{$country/callingCode}</td>
        </tr>
   }
   </table>
 </body>
 </html>



In this script, the fr:clean() function parses the resource URI to extract its local name, which is then used as the country name.
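
The actual fr:clean() is defined in the fr module and is not shown here. As a minimal sketch of what such a function might do, assuming the local name is everything after the last "/" and that underscores stand for spaces, a hypothetical local:clean-uri() could look like this:

```xquery
xquery version "1.0";

(: Hypothetical stand-in for fr:clean(); the real function may differ. :)
declare function local:clean-uri($uri as xs:string) as xs:string {
    (: take the last path segment, e.g. "United_Kingdom" :)
    let $local := tokenize($uri, "/")[last()]
    (: assumption: underscores in the local name stand for spaces :)
    return replace($local, "_", " ")
};

local:clean-uri("http://dbpedia.org/resource/United_Kingdom")
```

For the URI shown, this returns the string "United Kingdom". The sketch does not decode percent-encoded characters, which a more careful version would need to handle.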


A sounder alternative is to take the name from the rdfs:label property, which is multilingual and so must be filtered by language:

 SELECT * WHERE { 
    ?resource  p:callingCode ?callingCode.
     ?resource rdfs:label ?name.
     FILTER (lang(?name) = 'en')
    }


but this query is naturally much slower.

Discussion

This query returns the set of dbpedia resources which have a callingCode property. However, it includes resources which are not countries, and it proves quite difficult to identify which resources are countries. It might be expected that either the skos:subject or rdf:type predicates would identify countries, but this is not the case.

Of course, which entities are classified as countries is a debatable issue, as is currently illustrated by Kosova and by the documentation on ISO 3166. Perhaps countries are better identified by their properties. There is a property countryCode which looks promising.

The SPARQL query becomes:

PREFIX : <http://dbpedia.org/resource/>
PREFIX p: <http://dbpedia.org/property/> 
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#> 
SELECT * WHERE { 
    ?resource p:callingCode ?callingCode.
    ?resource p:countryCode ?countryCode.
    }



However, this shows that many countries have incomplete data in dbpedia, or that the coding of this property is inconsistent. This is not surprising, because there are a number of types of country code, each resulting in a different definition of country.

Wikipedia scraping

In fact, international calling codes are listed in a Wikipedia entry. A more direct approach is therefore to generate the table by scraping Wikipedia directly. However, this errs in the opposite direction: the list includes calling codes for telecom services as well as countries, and the format of numbers and names is inconsistent (some entries have multiple numbers, some numbers have a leading +, some countries have appended synonyms, and so on).
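
One way to cope with the inconsistent number formats is to normalise each scraped value before using it. The following is only a sketch: the function name local:normalize-codes() is hypothetical, and it assumes that multiple numbers in one cell are separated by commas or slashes, which may not match the page as it stands.

```xquery
xquery version "1.0";

(: Hypothetical helper: split a scraped cell into separate codes
   and reduce each one to its digits. :)
declare function local:normalize-codes($raw as xs:string) as xs:string* {
    (: assumption: multiple numbers are separated by "," or "/" :)
    for $part in tokenize($raw, "[,/]")
    (: drop the leading "+", spaces, asterisks and any other non-digits :)
    let $digits := replace($part, "[^0-9]", "")
    where $digits ne ""
    return $digits
};

local:normalize-codes("+1 809, +1 829")
```

For the sample input this yields the two strings "1809" and "1829". Appended synonyms in country names would need separate handling, since they cannot be told apart from the name by a simple pattern.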


In this script, the path expression selects the second table on the page whose class is "wikitable sortable".

declare namespace h= "http://www.w3.org/1999/xhtml" ;
 
let $url := "http://en.wikipedia.org/wiki/International_calling_codes"
let $wikipage := doc($url)
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
return 
    $section

Jan 2010: the page layout changed, so the previous path to this table, which found the anchor "Alphabetical_Listing" and then the table following it:

let $section := $wikipage//h:a[@name="Alphabetical_Listing"]/../following-sibling::h:table[1]

had to be replaced by the current one:

let $section := $wikipage//h:table[@class="wikitable sortable"][2]


Export as RDF

An alternative is to export this table as RDF. Here the resource is the dbpedia resource and the property is defined in the dbpedia property namespace.

declare namespace h= "http://www.w3.org/1999/xhtml" ;
declare namespace rdf = "http://www.w3.org/1999/02/22-rdf-syntax-ns#";
declare namespace p = "http://dbpedia.org/property/";
 
let $url := "http://en.wikipedia.org/wiki/International_calling_codes"
let $wikipage := doc($url)
let $section := $wikipage//h:table[@class="wikitable sortable"][2]
return 
  <rdf:RDF xmlns:p = "http://dbpedia.org/property/">
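    {(: one rdf:Description per data row; rows without h:td cells are skipped :)
     for $row in $section/h:tr[h:td]
     (: normalize-space trims stray whitespace before the name is used in the URI :)
     let $country := normalize-space($row/h:td[1])
     let $code := string($row/h:td[2]/h:a[1])
     (: strip any asterisks from the code :)
     let $code := replace($code,"\*","")
     (: dbpedia resource names use underscores in place of spaces :)
     let $resource := concat("http://dbpedia.org/resource/", replace($country," ","_"))
     return 
      <rdf:Description rdf:about="{$resource}">
         <p:internationalcallingCode>{$code}</p:internationalcallingCode>
      </rdf:Description>
    }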
  </rdf:RDF>

Similarly, the structure of this table changed, so this code also needed to be updated.



Last modified on 12 January 2010, at 19:48