XQuery/XQuery and Python
< XQuery
Over at [1] Cameron Laird has an example of Python code to extract and list the a tags on an XHTML page:
import elementtree.ElementTree for element in elementtree.ElementTree.parse("draft2.xml").findall("//a"): if element.tag == "a": attributes = element.attrib if "href" in attributes: print "'%s' is at URL '%s'." % (element.text, attributes ['href']) if "name" in attributes: print "'%s' anchors '%s'." % (element.text, attributes['name'])
The equivalent to this Python code in XQuery is:
for $a in doc("http://en.wikipedia.org/wiki/XQuery")//*:a return if ($a/@href) then concat("'", $a,"' is at URL '",$a/@href,"' ") else if ($a/@name) then concat("'", $a,"' anchors '",$a/@name,"' ") else ()
tags in XQuery on Wikipedia (view source)
Here the namespace prefix is a wild-card since we don't know what the html namespace might be.
More succinctly but less readably (and with quotes omitted from the output for clarity), this could be expressed as:
string-join( doc("http://en.wikipedia.org/wiki/XQuery")//*:a/ (if (@href) then concat(.," is at URL ",@href) else if (@name) then concat(.," anchors ", @name) else () ) ,' ' )
tags in XQuery on Wikipedia (view source)
More usefully, we might supply the url of any XHTML page as a parameter and generate an HTML page of external links:
declare option exist:serialize "method=xhtml media-type=text/html"; let $url :=request:get-parameter("url",()) return <html> <h1>External links in {$url}</h1> { for $a in doc($url)//*:a[text()][starts-with(@href,'http://')] return <div><b>{string($a)}</b> is at <a href="{$a/@href}"><i>{string($a/@href)}</i> </a></div> } </html>