XQuery/Filtering Nodes

Motivation

You want to create filters that remove or replace specific nodes in an XML stream. The stream may consist of in-memory XML documents rather than documents stored on disk.


Method

To process all nodes in a tree, we start with a recursive function called the identity transform. This function copies the source tree into the output tree without change. We begin with this process and then add special-case processing for each filter.

(: return a deep copy of the element and all sub-elements :)
declare function local:copy($element as element()) as element() {
   element {node-name($element)}
      {$element/@*,
          for $child in $element/node()
              return
               if ($child instance of element())
                 then local:copy($child)
                 else $child
      }
};

This function uses an XQuery construct called a computed element constructor to construct an element. The format of the element constructor is the following:

  element {ELEMENT-NAME} {ELEMENT-VALUE}

In the above case ELEMENT-VALUE is another expression that returns the attributes of the current element followed by its child nodes. The for loop iterates over all child nodes of the current element and does the following, in pseudo-code:

  if the child is another element (this uses the "instance of" expression)
      then copy the child (recursively)
      else return the child (we have a leaf node of the tree, such as text or a comment)

If you understand the basic structure of this algorithm, you can modify it to filter out only the elements you want. Start with this template and modify the relevant sections.
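
As a quick check of the template, the following sketch compares a small input with its copy. The sample element and the use of deep-equal() and the is operator as the test are assumptions made for illustration, not part of the original recipe.

```xquery
xquery version "1.0";

(: identity transform, as declared above :)
declare function local:copy($element as element()) as element() {
   element {node-name($element)}
      {$element/@*,
          for $child in $element/node()
              return
               if ($child instance of element())
                 then local:copy($child)
                 else $child
      }
};

(: hypothetical sample input :)
let $input := <data><a q="joe">a</a><c><d>dd</d></c></data>
let $copy := local:copy($input)
return
   <check>
      <same-content>{deep-equal($input, $copy)}</same-content>
      <same-node>{$input is $copy}</same-node>
   </check>
```

The copy should have the same content as the input, so deep-equal() returns true, but it is a newly constructed node, so the is comparison returns false.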

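Note that you can also write this function using the typeswitch expression (declare only one of the two versions in a query, since both are named local:copy):

declare function local:copy($n as node()) as node() {
   typeswitch($n)
      case $e as element()
         return
            element {node-name($e)}
                    {$e/@*,
                     (: node() rather than (* | text()) so comments and processing instructions are kept :)
                     for $c in $e/node()
                         return local:copy($c) }
      default return $n
};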

Removing all attributes

The following function removes all attributes from every element, simply because the attributes are never copied.

declare function local:copy-no-attributes($element as element()) as element() {
   element {node-name($element)}
      {
      for $child in $element/node()
         return
            if ($child instance of element())
               then local:copy-no-attributes($child)
               else $child
      }
};

This function can also be written using the typeswitch expression (an alternative definition; declare only one of the two in a query):

declare function local:copy-no-attributes($n as node()) as node() {
   typeswitch($n)
      case $e as element()
         return
            element {node-name($e)}
                    {for $c in $e/node()
                         return local:copy-no-attributes($c) }
      default return $n
};

The function can be parameterized by adding a second argument that indicates which attributes should be removed.


Change all the attribute names for a given element

declare function local:change-attribute-name-for-element(
   $node as node(),
   $element as xs:string,
   $old-attribute as xs:string,
   $new-attribute as xs:string
   ) as element() {
       element
         {node-name($node)}
         {if (string(node-name($node))=$element)
           then
              for $att in $node/@*
              return
                if (name($att)=$old-attribute)
                  then
                     attribute {$new-attribute} {$att}
                   else
                      attribute {name($att)} {$att}
           else
              $node/@*
           ,
               for $child in $node/node()
                 return if ($child instance of element())
                    then local:change-attribute-name-for-element($child, $element, $old-attribute, $new-attribute)
                    else $child 
         }
};

Replacing all attribute values

For all elements with a given name, replace the old value of a specific attribute with a new value.

declare function local:change-attribute-values(
   $node as node(),
   $element as xs:string,
   $attribute as xs:string,
   $old-value as xs:string,
   $new-value as xs:string) as element() {
       element
         {node-name($node)}
         {if (string(node-name($node))=$element)
           then
              for $att in $node/@*
              return
                (: replace only the named attribute, and only when it holds the old value :)
                if (name($att)=$attribute and string($att)=$old-value)
                  then
                     attribute {node-name($att)} {$new-value}
                   else
                      (: every other attribute is copied unchanged :)
                      $att
           else
              $node/@*
           ,
               for $child in $node/node()
                 return if ($child instance of element())
                    then local:change-attribute-values($child, $element, $attribute, $old-value, $new-value)
                    else $child 
         }
};

Removing named attributes

Attributes are filtered in the predicate expression not(name()=$attribute-name) so that named attributes are omitted.

declare function local:copy-filter-attributes(
       $element as element(),
       $attribute-name as xs:string*) as element() {
    element {node-name($element)}
            {$element/@*[not(name()=$attribute-name)],
                for $child in $element/node()
                   return if ($child instance of element())
                      then local:copy-filter-attributes($child, $attribute-name)
                      else $child
            }
  };
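
The predicate relies on the general comparison operator =, which is true if any item on the left equals any item on the right, so a whole sequence of names can be passed. A small sketch of this behavior, using a hypothetical sample element and name list:

```xquery
xquery version "1.0";

let $names := ("p", "q")                     (: attribute names to drop :)
let $e := <b p="5" q="fred" r="x">bb</b>     (: hypothetical sample element :)
return
   <result>
      <kept>{ for $a in $e/@*[not(name() = $names)] return name($a) }</kept>
      <wrong>{ for $a in $e/@*[name() != $names] return name($a) }</wrong>
   </result>
```

Note that not(name() = $names) is not the same as name() != $names. The second form is true whenever any name in the sequence differs from the attribute name, so with two or more names it keeps every attribute. In this sketch the kept element lists only r, while the wrong element lists all three attribute names.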

Removing named elements

Likewise, elements can be filtered in a predicate:

declare function local:remove-elements($input as element(), $remove-names as xs:string*) as element() {
   element {node-name($input) }
      {$input/@*,
       for $child in $input/node()[not(name(.)=$remove-names)]
          return
             if ($child instance of element())
                then local:remove-elements($child, $remove-names)
                else $child
      }
};

The path step uses node() to select all child nodes, and the predicate tests the name of each node:

/node()[not(name(.)=$remove-names)]

To use this function, pass the input element as the first parameter and a sequence of element names (as strings) as the second parameter. Because the function expects an element, select the root element of a document with /*. For example:

  let $input := doc('my-input.xml')/*
  let $remove-list := ('xxx', 'yyy', 'zzz')
  return local:remove-elements($input, $remove-list)

Example illustrating the above filters

The following script demonstrates these functions:

let $x :=
<data>
   <a q="joe">a</a>
   <b p="5" q="fred" >bb</b>
   <c>
        <d>dd</d>
         <a q="dave">aa</a>
   </c>
</data>
return
 <output>
    <original>{$x}</original>
    <fullcopy>{local:copy($x)}</fullcopy>
    <noattributes>{local:copy-no-attributes($x)}</noattributes>
    <filterattributes>{local:copy-filter-attributes($x,"q")}</filterattributes>
    <filterelements>{local:remove-elements($x,"a")}</filterelements>
    <filterelements2>{local:remove-elements($x,("a","d"))}</filterelements2>
 </output>


Removing unwanted namespaces

Some systems do not give you precise control over the namespaces that remain after an update, despite the use of copy-namespaces declarations.

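The following XQuery function is an example that rebuilds each element in the TEI namespace using only that namespace, so that other namespace declarations are not carried along. Elements in other namespaces are returned unchanged.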

declare function local:clean-namespaces($node as node()) {
    typeswitch ($node)
        case element() return
            if (namespace-uri($node) eq "http://www.tei-c.org/ns/1.0") then
                element { QName("http://www.tei-c.org/ns/1.0", local-name($node)) } {
                    $node/@*, for $child in $node/node() return local:clean-namespaces($child)
                }
            else
                $node
        default return
            $node
};

The two functions below remove any namespace from an element; nnsc stands for no-namespace-copy. The first one reportedly performs much faster, possibly because it handles attributes more directly. The second one is kept for reference, but note that it iterates over (@*, *) only, so text nodes, comments, and processing instructions are not copied.

(: return a deep copy of the element without namespaces :)
declare function local:nnsc1($element as element()) as element() {
     element { local-name($element) } {
         $element/@*,
         for $child in $element/node()
         return
             if ($child instance of element())
             then local:nnsc1($child)
             else $child
         }
};
(: return a copy of the element without namespaces; only attributes and child elements are copied :)
declare function local:nnsc2($element as element()) as element() {
     element { QName((), local-name($element)) } {
         for $child in $element/(@*,*)
         return
             if ($child instance of element())
             then local:nnsc2($child)
             else $child
     }
};

Conversely, if you want to add a namespace to an element, a starting point is this blog post: http://fgeorges.blogspot.com/2006/08/add-namespace-node-to-element-in.html
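
As a rough sketch of a related operation, the copy template can be adapted to move every element into a given namespace. The function name local:add-namespace and the sample namespace URI are hypothetical. This changes the element names rather than adding a namespace node, so it is not the technique described in the blog post.

```xquery
xquery version "1.0";

(: hypothetical helper: rebuild every element in the namespace $ns :)
declare function local:add-namespace($element as element(), $ns as xs:string) as element() {
    element { QName($ns, local-name($element)) } {
        $element/@*,
        for $child in $element/node()
        return
            if ($child instance of element())
            then local:add-namespace($child, $ns)
            else $child
    }
};

(: hypothetical sample input and namespace URI :)
local:add-namespace(<doc><p n="1">text</p></doc>, "http://example.com/ns")
```

Attributes are copied as they are, so they stay in whatever namespace they already had.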


Removing elements with no string value

Elements which contain no string value or which contain whitespace only can be removed:

declare function local:remove-empty-elements($nodes as node()*)  as node()* {
   for $node in $nodes
   return
     if ($node instance of element())
     then if (normalize-space($node) = '')
          then ()
          else element { node-name($node)}
                { $node/@*,
                  local:remove-empty-elements($node/node())}
     else if ($node instance of document-node())
     then local:remove-empty-elements($node/node())
     else $node
 } ;
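
The following sketch applies the function to a small, hypothetical record to show which elements survive. The sample data is an assumption made for illustration.

```xquery
xquery version "1.0";

declare function local:remove-empty-elements($nodes as node()*) as node()* {
   for $node in $nodes
   return
     if ($node instance of element())
     then if (normalize-space($node) = '')
          then ()
          else element { node-name($node) }
                { $node/@*,
                  local:remove-empty-elements($node/node()) }
     else if ($node instance of document-node())
     then local:remove-empty-elements($node/node())
     else $node
};

(: hypothetical sample input :)
let $input :=
   <record>
      <name>Ann</name>
      <phone/>
      <note>   </note>
      <address><street/><city>Oslo</city></address>
   </record>
return local:remove-empty-elements($input)
```

Here phone, note, and street are dropped, while address is kept because one of its descendants contains text. Because the test looks only at the string value, an element that carries attributes but no text would be removed as well.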

Removing empty attributes

Attributes which contain no text can be stripped:

declare function local:remove-empty-attributes($element as element()) as element() {
   element { node-name($element) }
      { $element/@*[string-length(.) ne 0],
        for $child in $element/node()
        return
           if ($child instance of element())
           then local:remove-empty-attributes($child)
           else $child
      }
};

Adding elements

When adding an element, one has to know what to add and where to add it.

Here $node is the document or XML fragment to work on, $new-node is the new element to insert, $element-name-to-check is which other element to use as a reference for inserting $new-node, and $location gives the option of inserting $new-node before, after, or as the first or last child of $element-name-to-check. $location accepts four values: 'before', 'after', 'first-child', and 'last-child' (if another value is given, $element-name-to-check is removed).

declare function local:insert-element($node as node()?, $new-node as node(), 
    $element-name-to-check as xs:string, $location as xs:string) { 
        if (local-name($node) eq $element-name-to-check)
        then
            if ($location eq 'before')
            then ($new-node, $node) 
            else 
                if ($location eq 'after')
                then ($node, $new-node)
                else
                    if ($location eq 'first-child')
                    then element { node-name($node) } { 
                        $node/@*
                        ,
                        $new-node
                        ,
                        for $child in $node/node()
                            return 
                                local:insert-element($child, $new-node, $element-name-to-check, $location) 
                    }
                    else
                    if ($location eq 'last-child')
                    then element { node-name($node) } { 
                        $node/@*
                        ,
                        for $child in $node/node()
                            return 
                                local:insert-element($child, $new-node, $element-name-to-check, $location) 
                        ,
                        $new-node
                    }
                    else () (: the matched element is removed if none of the four options is used :)
        else
            if ($node instance of element()) 
            then
                element { node-name($node) } { 
                    $node/@*
                    , 
                    for $child in $node/node()
                        return 
                            local:insert-element($child, $new-node, $element-name-to-check, $location) 
                }
            else $node
};

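A typeswitch is not used because its case clauses test against types that are fixed in the query, whereas the element name to match is passed in here as a run-time string.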

Given the following main query,

let $doc := 
<html>
    <head n="1">1</head>
    <body>
        <p n="2">2</p>
        <p n="3">3</p>
    </body>
</html>
 
let $insert := <p n="4">0</p>
 
return 
    local:insert-element($doc, $insert, 'head', 'first-child')

the result will be

<html>
   <head n="1">
      <p n="4">0</p>1</head>
   <body>
      <p n="2">2</p>
      <p n="3">3</p>
   </body>
</html>

Note that if $element-name-to-check is 'p', the $new-node will be inserted in relation to every element named 'p'.

Last modified on 16 November 2012, at 09:22