XQuery/Queries on Tables
Motivation
editWe are frequently given data in a tabular structure and we are required to extract data relative to a position within the table. For example we might need to fill the data in a cell of a table that has the sum of the other values in that column.
Method
editWe will build a library of XQuery functions that uses XPath expressions to get values from the table assuming you are at some position in the table.
Sample Input
editHere is a sample of a table where each cell has the row and column number within the table data element:
let $table :=
<table>
<tr>
<td>r1.c1</td><td>r1.c2</td><td>r1.c3</td><td>r1.c4</td>
</tr>
<tr>
<td>r2.c1</td><td>r2.c2</td><td>r2.c3</td><td>r2.c4</td>
</tr>
<tr>
<td>r3.c1</td><td>r3.c2</td><td>r3.c3</td><td>r3.c4</td>
</tr>
<tr>
<td>r4.c1</td><td>r4.c2</td><td>r4.c3</td><td>r4.c4</td>
</tr>
</table>
Sample functions
editHere are three functions that extract a cell, a row and a column
declare function local:cell($table as node(), $row-num as xs:integer, $col-num as xs:integer) {
$table/tr[$row-num]/td[$col-num]
};
This function takes the input table and removes all but a single row and single column using predicates within the XPath expression. So to get the second row and third column the executed expression is: $table/tr[2]/td[3]
(: returns all the cells of the current row :)
declare function local:current-row($current-td as node()) {
$current-td/..
};
This function simply gets the <tr> element that contains the current cell we are in. So if we run
$row := local:current-row( local:cell($table, 2, 3) )
we would get back
<tr>
<td>r2.c1</td>
<td>r2.c2</td>
<td>r2.c3</td>
<td>r2.c4</td>
</tr>
Our last utility function will find all the cells of a column of a table. Here is the code to do this:
(: returns all the cells of the current column :)
declare function local:current-col($current-td as node()) as node()* {
(: figure out what column we are on by counting prior cells :)
let $col-num := count($current-td/preceding-sibling::td) + 1
return
<col-cells>
{$current-td/../../tr/td[$col-num]}
</col-cells>
};
This function is a bit more complex. We need to first figure out what column we are within the table. To do this we will use the preceding-sibling XPath axis expression to count the number of prior cells within the table. We then add one so that if there are no prior columns we will be on the first column. We could have also used preceding-sibling::* if we were not sure that some cells used other element names such as <th> for table headers. Once we know what column we are in we can just return the table which we get by adding ../.. and then get all rows and only our current column with /tr/td[$col-num] .
Sample Driver Query
edit(: put table here :)
let $cell-r2-c3 := local:cell($table, 2, 3)
return
<results>
<current-cell>{$cell-r2-c3}</current-cell>
<current-row>{local:current-row($cell-r2-c3)}</current-row>
<current-column>{local:current-col($cell-r2-c3)}</current-column>
</results>
which returns
<results>
<current-cell>
<td>r2.c3</td>
</current-cell>
<current-row>
<tr>
<td>r2.c1</td>
<td>r2.c2</td>
<td>r2.c3</td>
<td>r2.c4</td>
</tr>
</current-row>
<current-column>
<col-cells>
<td>r1.c3</td>
<td>r2.c3</td>
<td>r3.c3</td>
<td>r4.c3</td>
</col-cells>
</current-column>
</results>
Adding Calculations to a Table
editNow that we have a strategy for getting the rows and columns of a cell, lets add two calculation types to our table. We will modify our table so that it contains only digits or the expressions {rowsum} or {colsum}. The table will look like the following:
Table with data and operations
edit<table>
<tr>
<td>1.1</td><td>1.2</td><td>1.3</td><td>1.4</td><td>rowsum</td>
</tr>
<tr>
<td>2.1</td><td>2.2</td><td>2.3</td><td>2.4</td><td>rowsum</td>
</tr>
<tr>
<td>3.1</td><td>3.2</td><td>3.3</td><td>3.4</td><td>rowsum</td>
</tr>
<tr>
<td>4.1</td><td>4.2</td><td>4.3</td><td>4.4</td><td>rowsum</td>
</tr>
<tr>
<td>colsum</td><td>colsum</td><td>colsum</td><td>colsum</td><td>colsum</td>
</tr>
</table>
Now we will need a function that replaces each calculation with the values. We will also replace the functions for each row and column with functions that sum the values that are castable as digits.
full source code
editxquery version "1.0";
(: returns the cell of a table at the specified row and column number :)
declare function local:cell($table as node(), $row-num as xs:integer, $col-num as xs:integer) {
$table/tr[$row-num]/td[$col-num]
};
(: returns the sum of all items in the current row that are castable to a decimal:)
declare function local:sum-current-row($current-td as node()) as xs:decimal {
sum(
for $td in $current-td/..//td
return
if ($td castable as xs:double)
then xs:double($td/text())
else ()
)
};
(: returns the sum of all items in the current column that are castable to a decimal :)
declare function local:sum-current-col($current-td as node()) as xs:decimal {
(: figure out what column we are on by counting prior cells :)
let $col-num := count($current-td/preceding-sibling::td) + 1
return
sum(
for $td in $current-td/../../tr/td[$col-num]
return
if ($td castable as xs:double)
then xs:decimal($td)
else ()
)
};
declare function local:transform-table($table as node()) as node() {
<table>
{for $row in $table/tr
return
<tr>
{for $td in $row/td
return
if ($td castable as xs:decimal)
then $td
else
<td>
{if ($td = 'rowsum')
then local:sum-current-row($td)
else
if ($td = 'colsum')
then local:sum-current-col($td)
else 'unknown-function'
}
</td>
}
</tr>
}
</table>
};
let $title := 'table queries'
let $table :=
<table>
<tr>
<td>1.1</td><td>1.2</td><td>1.3</td><td>1.4</td><td>rowsum</td>
</tr>
<tr>
<td>2.1</td><td>2.2</td><td>2.3</td><td>2.4</td><td>rowsum</td>
</tr>
<tr>
<td>3.1</td><td>3.2</td><td>3.3</td><td>3.4</td><td>rowsum</td>
</tr>
<tr>
<td>4.1</td><td>4.2</td><td>4.3</td><td>4.4</td><td>rowsum</td>
</tr>
<tr>
<td>colsum</td><td>colsum</td><td>colsum</td><td>colsum</td><td>colsum</td>
</tr>
</table>
(: get the nth row and nth column :)
let $cell-r2-c3 := local:cell($table, 2, 3)
return
<html>
<body>
{local:transform-table($table)}
</body>
</html>
Table with row and column totals
editWhich returns the following results (need screen image here):
1.1 | 1.2 | 1.3 | 1.4 | 5 |
2.1 | 2.2 | 2.3 | 2.4 | 9 |
3.1 | 3.2 | 3.3 | 3.4 | 13 |
4.1 | 4.2 | 4.3 | 4.4 | 17 |
10.4 | 10.8 | 11.2 | 11.6 | 0 |
Note that the final colsum of rowsums, the result of calculations is zero. This is due to the fact that the totals of row-specific sub-totals are not yet calculated and placed in the original table before the colsum is done. To fix this we could do one or more of the following:
- Use updates (XQuery updates) to update the table after each operation
- a new tablesum function
- generalized the functions to work more like a spreadsheet.
To do this the table would need to be updated with the sums as they are calculated. But the order of the totals may not be correct unless we use a dependency graph to understand the order that calculations should occur. This can easily be done by using an XForms framework.