XQuery/Grouping Items

Motivation edit

You have many items in a set of data that have a category associated with them. You want to create a report that sorts the items by a category.

Method edit

We will perform the query in three steps.

  1. use a FLWOR statement create a sequence of the distinct categories using the distinct-values() function
  2. for each item in the category sequence, select all items that belong to that category. This will be done by adding a predicate (where clause) to the end of our XPath selector. This takes the form of data/item[x=y] where if x=y returns true the item will be added to the sequence
  3. for each result set in the FLWOR statement return the category name and then all the items in that category

Sample Data edit

<items>
    <item>
        <name>item #1</name>
        <category>red</category>
    </item>
 <item>
        <name>item #1</name>
        <category>red</category>
    </item>
    <item>
        <name>item #2</name>
        <category>green</category>
    </item>
    <item>
        <name>item #3</name>
        <category>red</category>
    </item>
    <item>
        <name>item #4</name>
        <category>blue</category>
    </item>
    <item>
        <name>item #5</name>
        <category>red</category>
    </item>
    <item>
        <name>item #6</name>
        <category>blue</category>
    </item>
    <item>
        <name>item #7</name>
        <category>green</category>
    </item>
    <item>
        <name>item #8</name>
        <category>red</category>
    </item>
</items>

Sample Query edit

The following XQuery will demonstrate this technique. Note that the distinct values for all the categories are stored in the $distinct-categories variable.

xquery version "1.0";
declare option exist:serialize "method=xhtml media-type=text/html";

(: load the items :)
let $data := doc('/db/mdr/apps/training/labs/04-group-by/data.xml')/items

let $distinct-categories := distinct-values($data/item/category/text())

return
<html>
  <body>
     <table border="1">
        <thead>
           <tr>
              <th>Category</th>
              <th>Items</th>
           </tr>
        </thead>
        <tbody>
        {
        for $category in $distinct-categories
        return
           <tr>
              <td>{$category}</td>
              <td>{string-join($data/item[category=$category]/name/text(), ', ')}</td>
           </tr>
        }
        </tbody>
     </table>
  </body>
</html>

In the query above the statement:

  $data/item[category=$category]

reads as "get all the items from the data set that have the category element equal to the current category.

The string-join() function just puts a comma and a space string between the items in the output stream for readability.

Sample Output edit

Category Items
red item #1, item #3, item #5, item #8
green item #2, item #7
blue item #4, item #6

Discussion edit

Note that you are not restricted to having an item be in a single category. Adding multiple categories to an item will not require any changes to the script.

You can also add new categories to this list at any time without changing the program above. As long as there are range indexes for the category element the list of all categories will be created very quickly, even for millions of records.