Motivation

You want to generate a list of all the unique path expressions in a document.

This listing is a quick way to become familiar with a new data set. It also helps you check that your document-style transforms handle every element, and it can serve as a basis for generating index files for a new data set.

Example Output

Generating the list of unique paths for a sample file from the Shakespeare demos on the eXist demo system, at /db/shakespeare/plays/hamlet.xml, would produce the following results.

PLAY
PLAY/TITLE
PLAY/FM
PLAY/FM/P
PLAY/PERSONAE
PLAY/PERSONAE/TITLE
PLAY/PERSONAE/PERSONA
PLAY/PERSONAE/PGROUP
PLAY/PERSONAE/PGROUP/PERSONA
PLAY/PERSONAE/PGROUP/GRPDESCR
PLAY/SCNDESCR
PLAY/PLAYSUBT
PLAY/ACT
PLAY/ACT/TITLE
PLAY/ACT/SCENE
PLAY/ACT/SCENE/TITLE
PLAY/ACT/SCENE/STAGEDIR
PLAY/ACT/SCENE/SPEECH
PLAY/ACT/SCENE/SPEECH/SPEAKER
PLAY/ACT/SCENE/SPEECH/LINE
PLAY/ACT/SCENE/SPEECH/STAGEDIR
PLAY/ACT/SCENE/SPEECH/LINE/STAGEDIR

Note that these path expressions are listed in document order, that is, the order in which each path first appears in the document. You can see that the cast list in PERSONAE appears before the ACT/SCENE elements. The output can also be sorted alphabetically.

Method

We will use the FunctX function library.

In particular the function:

 functx:distinct-element-paths($nodes)

takes a sequence of nodes as its input and returns a sequence of strings, one for each distinct element path found in those nodes and their descendants.

See Documentation on xqueryfunctions.com

distinct-element-paths function

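xquery version "1.0";
declare namespace functx = "http://www.functx.com";

(: Build the slash-separated element path for each node :)
declare function functx:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

(: Collect the paths of all elements and remove duplicates :)
declare function functx:distinct-element-paths($nodes as node()*) as xs:string* {
    distinct-values(functx:path-to-node($nodes/descendant-or-self::*))
};

(: Sort a sequence in ascending order :)
declare function functx:sort($seq as item()*) as item()* {
  for $item in $seq
  order by $item
  return $item
};

(: Replace NAMEOFCOLLECTION with the path to your collection :)
let $in-xml := collection("NAMEOFCOLLECTION")

(: functx:sort() returns the paths in alphabetical order :)
return functx:sort(functx:distinct-element-paths($in-xml))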

The heart of this query is the single expression:

  ancestor-or-self::*/name(.)

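This collects the name of the context element and of each of its ancestor elements, from the root down. The string-join() call in functx:path-to-node() then joins those names with '/' to form the path of a single node. Finally, functx:distinct-element-paths() applies this to every element in the input and uses distinct-values() to remove the duplicates.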
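To see what the path expression does for one node, here is a minimal sketch that runs it against a small inline fragment. The fragment and the variable names $doc and $node are made up for illustration; only the element names are borrowed from the Shakespeare sample.

```xquery
xquery version "1.0";

(: Hypothetical inline fragment, used only to illustrate the expression :)
let $doc :=
    <PLAY>
        <TITLE>Hamlet</TITLE>
        <ACT>
            <TITLE>Act I</TITLE>
            <SCENE>
                <TITLE>Scene I</TITLE>
            </SCENE>
        </ACT>
    </PLAY>

(: Pick one element deep in the tree :)
let $node := $doc/ACT/SCENE/TITLE

(: Names of the element and its ancestors, root first, joined with '/' :)
(: This evaluates to "PLAY/ACT/SCENE/TITLE" :)
return string-join($node/ancestor-or-self::*/name(.), '/')
```

The ancestor-or-self axis is returned in document order, which is why the root element name comes first in the joined string.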

Working with a single test document

Use the doc() function to load a single document, for example doc("/db/shakespeare/plays/hamlet.xml").
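A minimal sketch for a single document follows. It assumes the Shakespeare demo data is installed at the path used in the example output above, and it loads the file with the standard doc() function. The two function declarations are repeated so the query is self-contained.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

declare function functx:distinct-element-paths($nodes as node()*) as xs:string* {
    distinct-values(functx:path-to-node($nodes/descendant-or-self::*))
};

(: Assumption: the demo file exists at this database path :)
let $in-xml := doc("/db/shakespeare/plays/hamlet.xml")

(: No sort is applied here, so the order is whatever distinct-values() returns :)
return functx:distinct-element-paths($in-xml)
```

The XQuery specification leaves the order of distinct-values() results implementation-dependent. Processors commonly keep the order of first occurrence, which would give the document-order listing, but wrap the call in a sort if you need a guaranteed order.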

Working with a document collection

Use the collection() function to process every document in a collection.
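As one possible variation, the sketch below reports the distinct paths separately for each document in a collection instead of merging them into one list. The collection path /db/shakespeare/plays and the wrapper element names paths, document and path are assumptions chosen for illustration.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:path-to-node($nodes as node()*) as xs:string* {
    $nodes/string-join(ancestor-or-self::*/name(.), '/')
};

declare function functx:distinct-element-paths($nodes as node()*) as xs:string* {
    distinct-values(functx:path-to-node($nodes/descendant-or-self::*))
};

(: Assumption: the documents of interest are stored in this collection :)
let $collection := "/db/shakespeare/plays"
return
    <paths collection="{$collection}">{
        (: One <document> element per document, identified by its URI :)
        for $doc in collection($collection)
        return
            <document uri="{document-uri($doc)}">{
                for $path in functx:distinct-element-paths($doc)
                return <path>{$path}</path>
            }</document>
    }</paths>
```

Grouping by document makes it easier to spot a file whose structure differs from the rest of the collection.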

Creating a Web Service

Acknowledgments

David Elwell posted this suggestion on the open-exist list on July 22, 2010.