XQuery/All Leaf Paths
Motivation
You want to generate a list of all leaf paths in a document or document collection.
This process is very useful for getting to know a new data set. In data-style markup, the leaf elements of an XML file carry much of the data, and they frequently carry most of the semantics, or meaning, of the document. They form the basis for a semantic inventory of the document: each leaf element should be associated with a data definition.
Leaf elements are also good targets for indexing within your index configuration file.
Example
Method
We will use the functx leaf-elements() function:
functx:leaf-elements($root as node()?) as element()*
This function takes a single node (or the empty sequence) as input and returns a sequence containing the leaf elements found at or below that node.
Example Output
For the demo play Hamlet, which is included in the eXist demo set as the file /db/shakespeare/plays/hamlet.xml, the query will generate the following output:
PLAY
TITLE
FM
P
PERSONAE
PERSONA
PGROUP
GRPDESCR
SCNDESCR
PLAYSUBT
ACT
SCENE
STAGEDIR
SPEECH
SPEAKER
LINE
Source Code to leaf-elements
declare namespace functx = "http://www.functx.com";
declare function functx:leaf-elements ($root as node()?) as element()* {
    $root/descendant-or-self::*[not(*)]
};
This query uses the descendant-or-self::* axis step with the predicate [not(*)] to select only elements that have no child elements. An element that contains only text still counts as a leaf.
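To go from leaf elements to leaf paths, one option is to join the names of each leaf's ancestors into a path string and keep only the distinct values. The following is a minimal sketch under that assumption. The helper local:path-to is hypothetical and introduced here only for illustration; it is not part of functx.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:leaf-elements ($root as node()?) as element()* {
    $root/descendant-or-self::*[not(*)]
};

(: Hypothetical helper, not part of functx: builds a slash-separated
   path of element names from the document root down to $e. :)
declare function local:path-to ($e as element()) as xs:string {
    concat('/', string-join($e/ancestor-or-self::*/name(.), '/'))
};

let $doc := doc('/db/shakespeare/plays/hamlet.xml')
for $path in distinct-values(
    for $leaf in functx:leaf-elements($doc)
    return local:path-to($leaf)
)
order by $path
return $path
```

Each result is one distinct path to a leaf element, so an element name that appears as a leaf in several contexts is listed once per context.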
Example XQuery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";
declare function functx:distinct-element-names($nodes as node()*) as xs:string* {
    distinct-values($nodes/descendant-or-self::*/local-name(.))
};
let $doc := doc('/db/shakespeare/plays/hamlet.xml')
let $distinct-element-names := functx:distinct-element-names($doc)
let $distinct-element-names-count := count($distinct-element-names)
return
    <ol>{
        for $distinct-element-name in $distinct-element-names
        order by $distinct-element-name
        return
            <li>{$distinct-element-name}</li>
    }</ol>
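The query above lists every distinct element name, including container elements. If only the names of leaf elements are wanted, a variant along the following lines may work. It is a sketch that reuses the leaf-elements() source shown earlier; the variable names are illustrative.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:leaf-elements ($root as node()?) as element()* {
    $root/descendant-or-self::*[not(*)]
};

let $doc := doc('/db/shakespeare/plays/hamlet.xml')
(: distinct local names of elements that have no child elements :)
let $leaf-names := distinct-values(functx:leaf-elements($doc)/local-name(.))
return
    <ol>{
        for $leaf-name in $leaf-names
        order by $leaf-name
        return
            <li>{$leaf-name}</li>
    }</ol>
```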
Adding Attributes
You can also run a query that gets all the distinct attribute names. Attributes are always considered leaf data, since they can never have child elements.
declare function functx:distinct-attribute-names($nodes as node()*) as xs:string* {
    distinct-values($nodes//@*/name(.))
};
This query says, in effect, "get all the distinct attribute names in the input nodes".
For the MODS demo file: doc('/db/mods/01c73f2b05650de2e6124d9d113f40be.xml')
You will get the following attributes:
- type
- encoding
- authority
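A complete query that calls this function on the MODS demo file might look like the following. This is a sketch that assumes the file exists at the path given above; the list markup simply mirrors the element-name example.

```xquery
xquery version "1.0";
declare namespace functx = "http://www.functx.com";

declare function functx:distinct-attribute-names($nodes as node()*) as xs:string* {
    distinct-values($nodes//@*/name(.))
};

let $doc := doc('/db/mods/01c73f2b05650de2e6124d9d113f40be.xml')
return
    <ul>{
        (: one list item per distinct attribute name, sorted alphabetically :)
        for $attribute-name in functx:distinct-attribute-names($doc)
        order by $attribute-name
        return
            <li>{$attribute-name}</li>
    }</ul>
```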
References
Documentation on the xqueryfunctions.com web site.