XQuery/Synchronizing Remote Collections

Motivation

You want to update only those items in a collection that are new or newer than the corresponding items in another collection.

Method

Many databases store creation dates and last-modified dates along with resources. These dates can be used to see whether a local collection is out of sync with a remote collection. An XQuery script can then list only the files that are new or that are newer than the corresponding files in your local collection.

For the eXist database, here are the two functions that are used to access the timestamps.

  xmldb:last-modified($collection, $resource)
  xmldb:created($collection, $resource)
  

Where:

  $collection is the path to the collection (xs:string)
  $resource is the name of the resource (xs:string)

For example:

   let $my-file-last-modified := xmldb:last-modified('/db/test', 'myfile.xml')

This returns the date and time that the file myfile.xml in the collection /db/test was last modified. The timestamp is in the XML Schema dateTime format:

  "2009-06-04T07:50:04.828-05:00"

This value indicates 7:50 am on June 4, 2009, in a time zone that is 5 hours behind Coordinated Universal Time (UTC), such as US Central Daylight Time.
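Because local and remote servers may sit in different time zones, it can help to see how two such timestamps compare. The following is a minimal sketch in standard XQuery; the second timestamp is a made-up value used only for illustration.

```xquery
xquery version "1.0";

(: the timestamp from the example above :)
let $local  := xs:dateTime("2009-06-04T07:50:04.828-05:00")
(: hypothetical timestamp reported by another server, expressed in UTC :)
let $remote := xs:dateTime("2009-06-04T13:10:00Z")
return
  <comparison>
    <!-- the local timestamp shifted to UTC, without changing the instant it denotes -->
    <local-in-utc>{adjust-dateTime-to-timezone($local, xs:dayTimeDuration("PT0H"))}</local-in-utc>
    <!-- dateTime comparison takes the time zone offsets into account -->
    <remote-is-newer>{$remote gt $local}</remote-is-newer>
  </comparison>
```

The local value corresponds to 12:50:04.828 UTC, so the comparison should report that the remote timestamp is newer. Values of type xs:dateTime that carry a time zone can be compared directly with gt and lt; no manual conversion is needed.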

Sample Recursive Collection Last Modified Function

You can combine the xmldb:last-modified() function with xmldb:get-child-collections($collection), which returns the names of all the child collections of a collection. A function that calls itself recursively for each child collection can find the last-modified dates of every resource in a collection and all its subcollections.

Here is a sample XQuery function that returns a list of all the last-modified date-times of the resources in a collection and all of the subcollections under it.

declare function local:collection-last-modified($collection as xs:string) as node()* {
<collection>
   {attribute {'cid'} {$collection} }
   {
      (: one <resource> element for each resource stored directly in this collection :)
      for $resource in xmldb:get-child-resources($collection)
      return
      <resource>
        {attribute {'id'} {$resource}}
        {attribute {'last-modified'} {xmldb:last-modified($collection, $resource)}}
      </resource>,
      (: followed by one nested <collection> element for each child collection :)
      if (exists(xmldb:get-child-collections($collection)))
        then (
           for $child in xmldb:get-child-collections($collection)
           order by $child
           return
              (: note the recursion here :)
              local:collection-last-modified(concat($collection, '/', $child))
           )
         else ()
  }
</collection>
};

Note that two attributes are added to each resource. One is the resource id, which must be unique within each collection, and the other is the date and time the resource was last modified.

Sample Driver

You can call this function by simply passing the collection root you wish to start at.

xquery version "1.0";
(: the declaration of local:collection-last-modified() goes here, in the prolog :)
let $collection := '/db/test'

return
<last-modified-report>
  {local:collection-last-modified($collection)}
</last-modified-report>

This returns the following report:

<last-modified-report>
   <collection cid="/db/test">
      <resource id="get-remote-collection.xq" last-modified="2009-04-29T08:16:06.104-05:00"/>
      <collection cid="/db/test/views">
         <resource id="get-site-mod-dates.xq" last-modified="2009-04-30T09:01:58.599-05:00"/>
         <resource id="site-last-modified.xq" last-modified="2009-04-30T09:07:10.016-05:00"/>
      </collection>
   </collection>
</last-modified-report>
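A full report is not always needed. If you already know when the last sync took place, the same recursion can be used to list only the resources modified after that time. The following is a minimal sketch for eXist; the function name local:modified-since, the path attribute, and the cutoff date are assumptions made for illustration.

```xquery
xquery version "1.0";

(: Hypothetical helper: lists every resource under $collection whose
   last-modified timestamp is later than $since. :)
declare function local:modified-since($collection as xs:string, $since as xs:dateTime) as element(resource)* {
  (
    (: resources stored directly in this collection :)
    for $resource in xmldb:get-child-resources($collection)
    let $modified := xmldb:last-modified($collection, $resource)
    where $modified gt $since
    order by $resource
    return
      <resource path="{concat($collection, '/', $resource)}" last-modified="{$modified}"/>
  ),
  (
    (: recurse into each child collection :)
    for $child in xmldb:get-child-collections($collection)
    order by $child
    return local:modified-since(concat($collection, '/', $child), $since)
  )
};

<modified-since-report>
  {local:modified-since('/db/test', xs:dateTime('2009-04-30T00:00:00-05:00'))}
</modified-since-report>
```

Run against the sample collection in the report above, this cutoff should select the two resources in /db/test/views and skip get-remote-collection.xq, which was last modified the day before.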

Driving Syncs with Apache Ant

You can now use these reports to create batch files that transfer only the files that are new or have changed.
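One way to decide what to transfer is to compare a report from the local server with a report from the remote server. The sketch below uses only standard XQuery and assumes both reports use the collection and resource structure produced by local:collection-last-modified() and that both servers use the same collection paths. The function name local:resources-to-push, the push element, and the inline sample data are hypothetical.

```xquery
xquery version "1.0";

(: Hypothetical helper: returns one <push> element for each local resource
   that is missing from the remote report or newer than its remote copy. :)
declare function local:resources-to-push($local as element(collection), $remote as element(collection)?) as element(push)* {
  (
    for $res in $local/resource
    let $remote-res := $remote/resource[@id = $res/@id]
    where empty($remote-res)
       or xs:dateTime($res/@last-modified) gt xs:dateTime($remote-res/@last-modified)
    return <push collection="{$local/@cid}" resource="{$res/@id}"/>
  ),
  (
    (: compare each child collection with the remote collection of the same path :)
    for $child in $local/collection
    return local:resources-to-push($child, $remote/collection[@cid = $child/@cid])
  )
};

(: made-up sample reports; in practice these would come from the two servers :)
let $local :=
  <collection cid="/db/test">
    <resource id="a.xml" last-modified="2009-04-29T08:16:06.104-05:00"/>
    <resource id="b.xml" last-modified="2009-04-30T09:01:58.599-05:00"/>
    <resource id="c.xml" last-modified="2009-04-30T09:07:10.016-05:00"/>
  </collection>
let $remote :=
  <collection cid="/db/test">
    <resource id="a.xml" last-modified="2009-04-29T08:16:06.104-05:00"/>
    <resource id="b.xml" last-modified="2009-04-28T10:00:00.000-05:00"/>
  </collection>
return
  <to-push>{local:resources-to-push($local, $remote)}</to-push>
```

With this sample data the result should contain push elements for b.xml, which is newer locally, and c.xml, which does not exist remotely. A child collection that is missing on the remote side is compared against an empty sequence, so all of its resources are selected.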

Many databases provide Apache Ant tasks that perform extract and store operations.

Here is a sample Apache Ant target that extracts a resource from the local database to a file and then stores that file in the remote database.

<target name="push-bananas">
   <!-- extract the resource from the local database to a file on disk -->
   <xdb:extract
      xmlns:xdb="http://exist-db.org/ant"
      uri="xmldb:exist://${local-host}/exist/xmlrpc/db/test/sync"
      resource="bananas.xml"
      destfile="C:/backup/db/test/sync/bananas.xml"
      user="${local-user}"
      password="${local-pass}"
   />
   <!-- store that file in the same collection on the remote database -->
   <xdb:store
      xmlns:xdb="http://exist-db.org/ant"
      uri="xmldb:exist://${remote-host}/exist/xmlrpc/db/test/sync"
      srcfile="C:/backup/db/test/sync/bananas.xml"
      createcollection="true"
      user="${remote-user}"
      password="${remote-pass}"
   />
</target>

Note that the following properties must be set in this Ant file.

   <property name="local-host" value="localhost"/>
   <property name="local-user" value="admin"/>
   <property name="local-pass" value="put-local-pw-here"/>
   
   <property name="remote-host" value="example.com"/>
   <property name="remote-user" value="admin"/>
   <property name="remote-pass" value="put-remote-pw-here"/>
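Writing a target by hand for every changed resource does not scale, so one option is to let XQuery generate the targets. The following is a rough sketch that mirrors the push-bananas target above. The function name local:push-target and the C:/backup staging directory are assumptions, and the doubled braces are how XQuery escapes the literal braces that Ant needs for its property references.

```xquery
xquery version "1.0";

declare namespace xdb = "http://exist-db.org/ant";

(: Hypothetical helper: builds an Ant target that extracts one resource from
   the local database and stores it in the same collection on the remote one. :)
declare function local:push-target($collection as xs:string, $resource as xs:string) as element(target) {
  (: staging file on disk; the C:/backup prefix is an assumption :)
  let $file := concat('C:/backup', $collection, '/', $resource)
  return
    <target name="push-{$resource}">
      <xdb:extract
         uri="xmldb:exist://${{local-host}}/exist/xmlrpc{$collection}"
         resource="{$resource}"
         destfile="{$file}"
         user="${{local-user}}"
         password="${{local-pass}}"/>
      <xdb:store
         uri="xmldb:exist://${{remote-host}}/exist/xmlrpc{$collection}"
         srcfile="{$file}"
         createcollection="true"
         user="${{remote-user}}"
         password="${{remote-pass}}"/>
    </target>
};

local:push-target('/db/test/sync', 'bananas.xml')
```

The generated target still relies on the properties listed above, and the targets would need to be wrapped in a project element that loads the eXist Ant tasks before Ant can run them.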