XQuery/Compare with XQuery

Motivation

edit

You want to use XQuery to compare two lists of items and find out how the lists are different

Methods

edit

We will use a variety of functions that iterate through the lists. For each example we will perform some comparison with the second list.

Simple iteration and test for missing elements

edit

In this first example, we will use a simple for loop to go through each item on a linear list. We then will check to see if that item is anywhere on the second list (regardless of order). If it is on the second list we will display the item from the first list. If not we will output "missing". This is useful if you want to find out if a local collection is missing files from some remote collection.

xquery version "1.0";

(: Compare of two linear lists :)

let $list1 :=
<list1>
  <item>a</item>
  <item>b</item>
  <item>c</item>
  <item>d</item>
</list1>


let $list2 :=
<list1>
  <item>a</item>
  <item>c</item>
  <item>e</item>
</list1>

return
<missing>{
 for $item1 in $list1/item
 let $item-text := $item1/text()
 return
  <test item="{$item-text}">
  {if ($list2/item/text()=$item-text)
    then ($item1)
    else <missing>{$item-text}</missing>
  }
   </test>
}</missing>

Note that the conditional expression:

if ($list2/item/text() = $item-text)

Tests to see if the $item-text is anywhere in list2. If it occurs anywhere this expression will return true().

Sample Results

edit
<missing>
  <test item="a">
    <item>a</item>
  </test>
  <test item="b">
    <missing>b</missing>
  </test>
  <test item="c">
    <item>c</item>
  </test>
  <test item="d">
    <missing>d</missing>
  </test>
</missing>

Note that this will not report any items on the second list that are missing from the first list.

Using Quantified Expressions

edit

This can be rewritten using XQuery quantified expressions. There are two reasons for this. First the XQuery optimizer can frequently run quantified expressions much faster and some people feel they are easier to read. See XQuery/Quantified Expressions for more details.

In this second example the list assignments are the same but we will only display the items from list 1 that are missing from list 2.

<missing>{
 for $item1 in $list1/item
 return
  if (some $item2 in $list2/item satisfies $item2/text() = $item1/text())
    then ()
    else $item1
}</missing>

This returns:

<missing>
  <item>b</item>
  <item>d</item>
</missing>

We are now ready to modularize this missing function so that we can pass any two lists to find missing elements.

Creating a Missing XQuery Function

edit

Our next step is to create an XQuery function that compare any two lists and returns the items in the second list that are not in the first list.

declare function local:missing($list1 as node()*, $list2 as node()*) as node()* {
 for $item1 in $list1/item
 let $item-text := $item1/text()
 return
  if (some $item2 in $list2/item satisfies $item2/text() = $item1/text())
    then ()
    else $item1
};

We can rewrite the output function to use this function:

<results>
  <missing-from-2>{local:missing($list1, $list2)}</missing-from-2>
  <missing-from-1>{local:missing($list2, $list1)}</missing-from-1>
</results>

Note that the order of the lists has been reversed in the second call to the missing() function. The second pass looks for items on list2 that are not on list1.

Running this query generates the following output:

<results>
  <missing-from-2>
    <item>b</item>
    <item>d</item>
  </missing-from-2>
  <missing-from-1>
    <item>e</item>
  </missing-from-1>
</results>

Creating HTML Difference Lists

edit

We can use CSS to style the output of these reports.

Screen Image

edit
 
HTML Diff Report using CSS

Sample Data

edit

This example uses full words of items to show text highlighting:

let $list1 :=
<list>
  <item>apples</item>
  <item>bananas</item>
  <item>carrots</item>
  <item>kiwi</item>
</list>


let $list2 :=
<list>
  <item>apples</item>
  <item>carrots</item>
  <item>grapes</item>
</list>

The following function uses HTML div and span elements and adds class="missing" to each div that is missing. The CSS file will highlight this background.

declare function local:missing($list1 as node()*, $list2 as node()*) as node()* {
 for $item1 in $list1/item
 return
  if (some $item2 in $list2/item satisfies $item2/text() = $item1/text())
    then <div>{$item1/text()}</div>
    else
    <div>
    {attribute {'class'} {'missing'}}
    {$item1/text()}     
    </div>
};

We then use the following CSS file to highlight the differences. Each missing element must have class="missing" attribute for the missing element to be highlighted in this report.

body {font-family: Ariel,Helvetica,sans-serif; font-size: large;}
h2 {padding: 3px; margin: 0px; text-align: center; font-size: large; background-color: silver;}
.left, .right {border: solid black 1px; padding: 5px;}
.missing {background-color: pink;}
.left {float: left; width: 190px}
.right {margin-left: 210px; width: 190px}
<body>
   <h1>Missing Items Report</h1>
   <div class="left">
     <h2>List 1</h2>
     {for $item in $list1/item return <div>{$item/text()}</div>}
   </div>
   <div class="right">
     <h2>List 2</h2>
     {for $item in $list2/item return <div>{$item/text()}</div>}
   </div>
   <br/>
   <div class="left">
     <h2>List 1 Missing from 2</h2>
     {local:missing($list1, $list2)}
   </div>
   <div class="right">
     <h2>List 2 Missing from 1</h2>
     {local:missing($list2, $list1)}
   </div>
  </body>

Collation

edit

If the lists are in sorted order, or can be sorted into order, an alternative approach is to recursively collate the two lists. The core algorithm looks like:

declare function local:merge($a, $b as item()* ) as item()* {
  if (empty($a) and empty($b))
  then ()
  else if (empty ($b) or $a[1] lt $b[1])
  then ($a[1], local:merge(subsequence($a, 2), $b))
  else if (empty($a) or $a[1] gt $b[1])
  then ($b[1],local:merge($a, subsequence($b,2)))      
  else (: a and b matched :)
     ($a[1], $b[1], local:merge(subsequence($a,2), subsequence($b,2)))
  };

With the example above, we can merge two lists.

 
let $list1 := 
<list>
  <item>apples</item>
  <item>bananas</item>
  <item>carrots</item>
  <item>kiwi</item>
</list>
 
 
let $list2 := 
<list>
  <item>apples</item>
  <item>carrots</item>
  <item>grapes</item>
</list>

return 
<result>
{local:merge($list1/item,$list2/item) }
</result>

Execute

The actions on merge will depend on the application and the algorithm can be modified to output only mismatched items on one or other list, and handle matching items appropriately. For example, to display the merged list as HTML, we might modify the algorithm to:

declare function local:merge($a, $b as item()* ) as item()* {
  if (empty($a) and empty ($b)) 
  then ()
  else if (empty ($b) or $a[1] lt $b[1])
  then (<div class="left">{$a[1]/text()}</div>, local:merge(subsequence($a, 2), $b))
  else if (empty ($a) or $a[1] gt $b[1])
  then (<div class="right">{$b[1]/text()}</div>,local:merge($a, subsequence($b,2))) 
  else (<div class="match">{$a[1]/text()}</div>, local:merge(subsequence($a,2), subsequence($b,2)))
 };

Execute