XQuery/Wiki weapons page

On his blog, Matt Turner uses MarkLogic to extract a list of medieval weapons from the wiki page as the first step in enriching the texts of Shakespeare's plays.

Here's another attempt at this task, using only standard XQuery functions. Again, we are fortunate that wiki pages are well-formed XML.

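declare namespace h = "http://www.w3.org/1999/xhtml";

(: Fetch the wiki page; doc() requires it to be well-formed XML :)
let $url := "http://en.wikipedia.org/wiki/List_of_medieval_weapons"
let $wikipage := doc($url)
return
    (: Join the link text of every linked li in the body that has no nested ul :)
    string-join($wikipage//h:div[@id="bodyContent"]//h:li[h:a/@title][empty(h:ul)]/h:a, ",")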

The complex path here is to ensure that only the relevant li elements are included, and that only the terminals in a hierarchy of terms are selected; hence the check that the li has no ul child.
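
To see what the two predicates do, the same path can be run against a small inline fragment instead of the live page. This is only a sketch: the $sample document and its list items are invented for illustration and are not taken from the actual wiki page.

```xquery
declare namespace h = "http://www.w3.org/1999/xhtml";

(: Hypothetical stand-in for the wiki page: a two-level list of terms :)
let $sample :=
  <html xmlns="http://www.w3.org/1999/xhtml">
    <body>
      <div id="bodyContent">
        <ul>
          <li><a href="/wiki/Sword" title="Sword">Sword</a>
            <ul>
              <li><a href="/wiki/Longsword" title="Longsword">Longsword</a></li>
              <li><a href="/wiki/Falchion" title="Falchion">Falchion</a></li>
            </ul>
          </li>
          <li><a href="/wiki/Mace" title="Mace">Mace</a></li>
          <li>An item with no link</li>
        </ul>
      </div>
    </body>
  </html>
return
  (: Same path as above: linked li elements that have no nested ul :)
  string-join($sample//h:div[@id="bodyContent"]//h:li[h:a/@title][empty(h:ul)]/h:a, ",")
```

With this sample the result should be Longsword,Falchion,Mace. The "Sword" item is dropped because it has a ul child, so it is a category, not a terminal. The unlinked item is dropped because it has no a child with a title attribute.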

Last modified on 20 August 2013, at 12:05