XQuery/All Leaf Paths

Motivation

You want to generate a list of all leaf paths in a document or document collection.

This process is very useful for getting to know a new data set. In data-style markup, the leaf elements of an XML file carry much of the data, and they frequently carry the most semantics or meaning within the document. They form the basis for a semantic inventory of the document: each leaf element should be associated with a data definition.

Leaf elements are also good targets for indexing within your index configuration file.

Example

Method

We will use the functx leaf-elements() function:

  functx:leaf-elements($root as node()?) as element()*

This function takes a single node (or the empty sequence) as input and returns a sequence of the leaf elements found at or below that node.

Example Output

For the demo play Hamlet, which is included in the eXist demo set as the file /db/shakespeare/plays/hamlet.xml, the query will generate the following output:

PLAY
TITLE
FM
P
PERSONAE
PERSONA
PGROUP
GRPDESCR
SCNDESCR
PLAYSUBT
ACT
SCENE
STAGEDIR
SPEECH
SPEAKER
LINE

Source Code to leaf-elements

declare namespace functx = "http://www.functx.com"; 
declare function functx:leaf-elements ($root as node()?) as element()* {
   $root/descendant-or-self::*[not(*)]
};

This query uses the descendant-or-self::* axis step with the predicate [not(*)] to select only elements that have no child elements. A leaf element may still contain text.
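To see the effect of the [not(*)] predicate, it may help to run the function on a small inline document. The following is a minimal sketch; the sample document is hypothetical and is not part of the eXist demo set.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:leaf-elements ($root as node()?) as element()* {
   $root/descendant-or-self::*[not(*)]
};

(: Hypothetical sample document, used only for illustration :)
let $sample :=
  <PLAY>
    <TITLE>A Sample Play</TITLE>
    <ACT>
      <SCENE>
        <SPEECH>
          <SPEAKER>FIRST</SPEAKER>
          <LINE>Who is there</LINE>
        </SPEECH>
      </SCENE>
    </ACT>
  </PLAY>

(: PLAY, ACT, SCENE and SPEECH have child elements, so they are filtered out :)
return
  for $leaf in functx:leaf-elements($sample)
  return local-name($leaf)
```

For this sample the query should return the names TITLE, SPEAKER and LINE, in document order.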

Example XQuery

xquery version "1.0";
declare namespace functx = "http://www.functx.com";
declare function functx:distinct-element-names($nodes as node()*) as xs:string* {
   distinct-values($nodes/descendant-or-self::*/local-name(.))
};

let $doc := doc('/db/shakespeare/plays/hamlet.xml')

let $distinct-element-names := functx:distinct-element-names($doc)

let $distinct-element-names-count := count($distinct-element-names)

return
<ol>{
  for $distinct-element-name in $distinct-element-names
  order by $distinct-element-name
  return
      <li>{$distinct-element-name}</li>
}</ol>
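The query above lists every distinct element name, not only the leaves. One possible variant, sketched below, lists the distinct paths from the root element to each leaf element instead. The helper local:path-to is a hypothetical function introduced here for illustration; it is not part of functx as shown on this page.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:leaf-elements ($root as node()?) as element()* {
   $root/descendant-or-self::*[not(*)]
};

(: Hypothetical helper: joins the element names from the root down to $node with '/' :)
declare function local:path-to($node as element()) as xs:string {
   string-join($node/ancestor-or-self::*/local-name(.), '/')
};

let $doc := doc('/db/shakespeare/plays/hamlet.xml')

(: One path per leaf element, with duplicates removed :)
let $leaf-paths := distinct-values(
   for $leaf in functx:leaf-elements($doc)
   return local:path-to($leaf)
)

return
<ol>{
  for $leaf-path in $leaf-paths
  order by $leaf-path
  return
      <li>{$leaf-path}</li>
}</ol>
```

Because the path step ancestor-or-self::* returns its nodes in document order, each path reads from the root element down to the leaf.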

Adding Attributes

You can also run a query that will get all the distinct attributes. Attributes are all considered leaf data types since they can never have child elements.

declare function functx:distinct-attribute-names($nodes as node()*)  as xs:string* {
   distinct-values($nodes//@*/name(.))
};

This query says, in effect, "get all the distinct attribute names in the input nodes".

For the MODS demo file: doc('/db/mods/01c73f2b05650de2e6124d9d113f40be.xml')

You will get the following attributes:

  1. type
  2. encoding
  3. authority
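The attribute function can also be tried without the MODS demo file. The sketch below runs it on a small inline record; the element and attribute names in the sample are hypothetical and chosen only for illustration.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:distinct-attribute-names($nodes as node()*) as xs:string* {
   distinct-values($nodes//@*/name(.))
};

(: Hypothetical sample record; "type" occurs twice but is reported once :)
let $sample :=
  <record type="text">
    <name type="personal" authority="local">
      <namePart>Example Author</namePart>
    </name>
    <language encoding="utf-8">en</language>
  </record>

return
<ol>{
  for $attribute-name in functx:distinct-attribute-names($sample)
  order by $attribute-name
  return
      <li>{$attribute-name}</li>
}</ol>
```

For this sample the list should contain authority, encoding and type, each exactly once.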


References

Documentation on xqueryfunctions.com web site.