XQuery/All Paths
Motivation
You want to generate a list of all unique path expressions in a document.
This process is very useful for quickly becoming familiar with a new data set. It also helps you check that your document-style transforms handle every element. The resulting list can also serve as a basis for generating index files for a new data set.
Example Output
Listing the unique paths for a sample file from the Shakespeare demos on the eXist demo system, at /db/shakespeare/plays/hamlet.xml, generates the following results.
PLAY
PLAY/TITLE
PLAY/FM
PLAY/FM/P
PLAY/PERSONAE
PLAY/PERSONAE/TITLE
PLAY/PERSONAE/PERSONA
PLAY/PERSONAE/PGROUP
PLAY/PERSONAE/PGROUP/PERSONA
PLAY/PERSONAE/PGROUP/GRPDESCR
PLAY/SCNDESCR
PLAY/PLAYSUBT
PLAY/ACT
PLAY/ACT/TITLE
PLAY/ACT/SCENE
PLAY/ACT/SCENE/TITLE
PLAY/ACT/SCENE/STAGEDIR
PLAY/ACT/SCENE/SPEECH
PLAY/ACT/SCENE/SPEECH/SPEAKER
PLAY/ACT/SCENE/SPEECH/LINE
PLAY/ACT/SCENE/SPEECH/STAGEDIR
PLAY/ACT/SCENE/SPEECH/LINE/STAGEDIR
Note that these path expressions are sorted in document order, that is, the order in which each path first appears in the document. So you can see that the cast list in PERSONAE appears before the ACT/SCENE elements. The output can also be sorted in alphabetical order.
Method
We will use the FunctX library.
In particular, the function
functx:distinct-element-paths($nodes)
takes a sequence of nodes as its input and returns a sequence of strings, one for each distinct path expression.
distinct-element-paths function
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

(: Return the path of each node as a string, e.g. PLAY/ACT/SCENE :)
declare function functx:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

(: Return each distinct element path found at or below the given nodes :)
declare function functx:distinct-element-paths($nodes as node()*) as xs:string* {
    distinct-values(functx:path-to-node($nodes/descendant-or-self::*))
};

(: Sort a sequence in ascending order :)
declare function functx:sort($seq as item()*) as item()* {
    for $item in $seq
    order by $item
    return $item
};

(: Replace NAMEOFCOLLECTION with the path of your collection :)
let $in-xml := collection("NAMEOFCOLLECTION")
return functx:sort(functx:distinct-element-paths($in-xml))
The heart of this query is the single expression:
ancestor-or-self::*/name(.)
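This says, in effect, "get me the names of the current element and of all its ancestors, starting from the root". The string-join() call in functx:path-to-node() then joins these names with "/" to form the path of that element. The next step is to turn the paths of all elements into a list of distinct element paths. This is done by the function functx:distinct-element-paths(), which collects the path of every element at or below the input nodes and removes the duplicates with distinct-values().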
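The effect of these steps can be seen on a small inline document. The following is a minimal sketch: the $sample element is a hypothetical stand-in invented for illustration, not part of the Shakespeare data, and the path logic is written inline rather than through the functx functions.

```xquery
xquery version "1.0";

(: Hypothetical sample data, invented for this illustration :)
let $sample :=
    <PLAY>
        <TITLE>Sample</TITLE>
        <ACT>
            <TITLE>Act I</TITLE>
            <SCENE><TITLE>Scene I</TITLE></SCENE>
            <SCENE><TITLE>Scene II</TITLE></SCENE>
        </ACT>
    </PLAY>

(: One path string per element: the names of the element and its
   ancestors, joined with "/" :)
let $all-paths :=
    for $e in $sample/descendant-or-self::*
    return string-join($e/ancestor-or-self::*/name(.), '/')

(: Eight elements yield eight path strings; the two SCENE elements
   and their TITLE children produce duplicates, which are removed here :)
return distinct-values($all-paths)
```

The result should contain six distinct paths: PLAY, PLAY/TITLE, PLAY/ACT, PLAY/ACT/TITLE, PLAY/ACT/SCENE and PLAY/ACT/SCENE/TITLE. The XQuery specification leaves the order of the values returned by distinct-values() up to the implementation, so add an explicit order by clause if a particular order matters.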
Working with a single test document
Use the doc() function to load a single document.
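A sketch for the single-document case, assuming the Hamlet file from the example above is stored at /db/shakespeare/plays/hamlet.xml in your database. It uses the standard XQuery doc() function and repeats the two function declarations so that the query is self-contained.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

declare function functx:distinct-element-paths($nodes as node()*) as xs:string* {
    distinct-values(functx:path-to-node($nodes/descendant-or-self::*))
};

(: Assumption: the document exists at this path; adjust to your own data :)
let $in-xml := doc("/db/shakespeare/plays/hamlet.xml")
return functx:distinct-element-paths($in-xml)
```

Because functx:sort() is not applied here, the paths are returned in whatever order distinct-values() produces them.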
Working with a document collection
Use the collection() function to process all the documents in a collection.
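A sketch for the collection case. The collection path /db/shakespeare/plays is an assumption derived from the Hamlet example above; substitute your own collection. The path logic is written inline here, which should be equivalent to calling functx:distinct-element-paths() followed by functx:sort().

```xquery
xquery version "1.0";

(: Assumption: this collection exists; adjust to your own data :)
let $docs := collection("/db/shakespeare/plays")

(: One path string per element across every document in the collection :)
let $paths :=
    for $e in $docs/descendant-or-self::*
    return string-join($e/ancestor-or-self::*/name(.), '/')

(: Remove duplicates and return the paths in alphabetical order :)
for $p in distinct-values($paths)
order by $p
return $p
```

The query visits every element in every document, so it may take a while on a large collection.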
Creating a Web Service
Acknowledgments
David Elwell posted this suggestion on the open-exist list on July 22, 2010.