XQuery/Database Utilities

Motivation

edit

You would like a standardized set of tools for getting a list of all collections and resources and for resources that match specific patterns or have specific mime-types.

There are several functions that allow you to do a fast scan utility functions for each resource (file or collection) or just collections. Each of these functions have two arguments, one for the starting collection and the other is a callback function that can be used to filter out unwanted matches.

  dbutil:scan(xs:anyURI($collection as xs:anyURI, $func as function($collection-path as xs:anyURI, $full-file-path as xs:anyURI?)) as item()*)
  dbutil:scan-resources($collection as xs:anyURI, $func as function(xs:anyURI) as item()*)
  dbutil:scan-collections($collection as xs:anyURI), $func as function(xs:anyURI) as item()*)
  dbutil:find-by-mimetype($collection as xs:anyURI, $mimeType as xs:string)
  dbutil:find-by-mimetype($collection as xs:anyURI, $mimeType as xs:string, $func as function(xs:anyURI)

These are all small XQuery functions and can be found at:

$EXIST_HOME/webapp/WEB-INF/data/expathrepo/shared-0.3.1/content/dbutils.xql

Example of Scan

edit

The following example uses the scan() function to get a list of all the resources in the collection that end with the suffix 'xunit.xq' - which is a script that returns the results in XUnit test result format.

Sample Code

edit
xquery version "1.0";
import module namespace dbutil="http://exist-db.org/xquery/dbutil";

let $root-collection := '/db'
let $file-suffix := 'xunit.xq'

let $find-file-path :=
  (: This function MUST have two parameters, the first is the collection path, the second is the FULL path to a resource.
     Note that the second parameter must be able to be optional.  :)
  function($collection as xs:anyURI, $full-path-to-resource as xs:anyURI?) {
    if (ends-with($full-path-to-resource, $file-suffix))
        then
          $full-path-to-resource
        else ()
     }
     
  return
  <results>
      {for $resource in dbutil:scan(xs:anyURI($root-collection), $find-file-path)
      return
         <resource>{$resource}</resource>
       }
  </results>

Sample Results

edit
<results>
   <resource>/db/apps/myapp/unit-tests/scan-xunit.xq</resource>
   <resource>/db/apps/myapp/unit-tests/scan-collection-xunit.xq</resource>
</results>

Example of Scan Collection

edit

The following example uses the scan-collection() function to get a list of all collection paths that end with the string "/unit-tests".

Source Code

edit
xquery version "1.0";

import module namespace dbutil="http://exist-db.org/xquery/dbutil";

let $root-collection := '/db'
let $leaf-collection-to-find := '/unit-tests'

let $find-leaf-function :=
  function($root-collection as xs:anyURI) {
  if (ends-with($root-collection, $leaf-collection-to-find))
   then
     $root-collection
   else ()
}
  return
  <results>
      {for $collection in dbutil:scan-collections(xs:anyURI($root-collection), $find-leaf-function)
      return
         <collection>{$collection}</collection>
       }
  </results>

Results

edit
<results>
   <collection>/db/apps/skos/unit-tests</collection>
   <collection>/db/apps/graph2svg/unit-tests</collection>
   <collection>/db/apps/eco-meta/unit-tests</collection>
   <collection>/db/nosql/unit-tests</collection>
   <collection>/db/nosql/apps/make-docx/unit-tests</collection>
   <collection>/db/nosql/apps/docbook-to-docx/unit-tests</collection>
</results>

Other Followup Notes

edit

Again it illustrates how to obtain information from Freebase via it's MQL language (it predated Sparql).

The previous query was taken from

https://developers.google.com/freebase/v1/mql-overview

and it limits the data that results from the call to the Freebase API. You can see a limit parameter being set to 3 below in the API call.

https://www.googleapis.com/freebase/v1/mqlread?query=[{"type":"/music/album","name":null,"artist":{"id":"/en/bob_dylan"},"limit":3}]&cursor

If you do not specify a limit with your API call, Freebase will impose a limit of 100 records on your query. This message addresses the question of how to get everything.

The key to doing this is dangling at the end of the above API call, it's the cursor parameter and it's usage is discussed with an example here

  https://developers.google.com/freebase/v1/mql-overview#querying-with-cursor-paging-results

To summarise you ask for a cursor (see the example API call above for the form of the initial request) to be returned with your query results which acts as a link to the next set of query results. You obtain that next set by supplying the value of the cursor returned from the previous invocation. Along with that next set you get another cursor that points to the set after that. When the final set of results are retrieved the cursor returns a string value of false (the Freebase overview has this in upper case but my code used lower case 'false' and that works).

The overview has sample Python code which I have not tried or parsed in anger but which I believe invokes libraries that take care of all the cursor handling for you.

  https://developers.google.com/freebase/v1/mql-overview#looping-through-cursor-results

However the same thing can easily be achieved from XQuery with a little bit of tail recursion.

We will use as an example a MQL query that returns all films with their netflix_id's.

[{
  "type": "/film/film",
  "name": null,
  "netflix_id": []
}]

A few brief comments about MQL. You ask for something by giving the field name and a value null. Null gets replaced by the actual value. However if the field can have multiple values MQL will return an array and cause your null query to error. This may happen even when you are expecting a singular value so you can avoid this problem by using the symbol for an empty array instead of null as in the query above.

You can paste the query above into

  http://www.freebase.com/query

to see the results (we will take care of the cursor in the code example).

Now to the code, which assumes XQuery 3.0

xquery version "3.0";

import module namespace xqjson="http://xqilla.sourceforge.net/lib/xqjson";

Freebase returns JSON but we want to store this in an xml db so we use the above package for json to XML conversion. From eXist you can install the package by just clicking it on the eXist Package Manager which you can get to from the eXist Dashboard.

We declare a variable for our query.

declare variable $mqlQuery {'[{
   "type": "/film/film",
  "name": null,
  "netflix_id": []
}]'};

declare variable $freebase {'https://www.googleapis.com/freebase/v1/mqlread'};
declare variable $key {obtain an API key from freebase and puts it's
value here'};

Reference

edit

Posted by Wolfgang in April of 2014