XQuery/XQuery Batch Jobs

Motivation

You want to run an XQuery job at regular intervals.

Method

We will use the eXist job scheduler, which is built on the Quartz scheduling system. eXist provides an XQuery API to this scheduler for adding and removing jobs.

Method 1: Modify the conf.xml file

If you have a job that needs to run on a regular basis, you can just add a single line to your $EXIST_HOME/conf.xml file. For example, if you have a simple XQuery script that writes a dateTime stamp to the log file, you could add the following line:

Sample Addition to conf.xml

The line below tells the scheduler to run the job whenever the seconds field is 0, that is, once at the start of every minute of every hour, on every day of the month, in every month, on any day of the week.

<!-- run hello world to a log file every minute
     Fields are: Sec, Min, Hrs, Day-of-Month, Month, Day-of-Week
-->
<job xquery="/db/test/sched/datetime-logger.xq"  cron-trigger="0 * * * * ?"/>

Sample datetime-logger.xq

xquery version "1.0";

(: append the current date and time to the log file :)

let $datetime := current-dateTime()
let $message := concat('Current date-time:  ', $datetime)
let $log := util:log-system-out($message)

return
<results>
   <log>{$message}</log>
</results>

Sample Output in log file

  (Line: 7) Current date-time:  2011-07-18T15:58:00-05:00
  (Line: 7) Current date-time:  2011-07-18T15:59:00-05:00

Sample Weekly Lucene Optimize

Add the following line to your $EXIST_HOME/conf.xml file.

<!-- optimize the Lucene Indexes at 10:15AM Monday -->
<job xquery="/db/system/jobs/optimize-lucene-indexes.xq" cron-trigger="0 15 10 ? * MON"/>

Sample XQuery to Optimize Lucene Indexes

Contents of /db/system/jobs/optimize-lucene-indexes.xq

xquery version "1.0";

(: run the Lucene Optimize function after new content has been loaded :)

let $login := xmldb:login('/db', 'admin', 'YOUR-ADMIN-PASSWORD')
let $log-start := util:log-system-out(concat('Starting Lucene Optimize at :', current-dateTime()))
let $start-time := util:system-time()
let $optimize := ft:optimize()
let $end-time := util:system-time()
let $runtimems := (($end-time - $start-time) div xs:dayTimeDuration('PT1S'))  * 1000  
let $log-end := util:log-system-out(concat('Finished Lucene Optimize at :', current-dateTime()))

return
<results>
   <message>Finished ft:optimize() in {$runtimems} ms</message>
</results>

Method 2: Use the XQuery API

In this method we will use the XQuery API to add, view and remove jobs from the job scheduler.

To enable the XQuery scheduler you may have to set the following line in the build properties under $EXIST_HOME/extensions (typically the build.properties file):

  include.module.scheduler = true

Then type "build" to recompile the code.

Also make sure that the following line in $EXIST_HOME/conf.xml is uncommented:

 <module uri="http://exist-db.org/xquery/scheduler" class="org.exist.xquery.modules.scheduler.SchedulerModule" />

Here are the two functions to add and delete jobs.

  scheduler:schedule-xquery-cron-job($xquery-path, $cron-string, $job-id)
  scheduler:delete-scheduled-job($job-id)

Note: You must make sure that the XQuery job scheduler module is enabled in your system. You can verify this by the following XQuery:
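
One way to run this check, as a minimal sketch that assumes the util:registered-modules() function is available in your eXist version, is to test whether the scheduler namespace appears among the registered modules:

```xquery
xquery version "1.0";

(: assumption: util:registered-modules() returns the namespace URIs
   of all modules currently registered with the query engine :)
let $scheduler-ns := 'http://exist-db.org/xquery/scheduler'
let $enabled := util:registered-modules() = $scheduler-ns

return
<results>
   <module>{$scheduler-ns}</module>
   <enabled>{$enabled}</enabled>
</results>
```

If the enabled element contains false, revisit the build property and the conf.xml module line described above.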

The cron string uses the Quartz cron expression format, which is documented at [1].
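
As an illustration of the six-field layout (Sec, Min, Hrs, Day-of-Month, Month, Day-of-Week), the sketch below lists a few cron strings with their intended meaning. The element names are made up for this example, and the expressions assume standard Quartz behaviour, where one of the two day fields is normally given as "?":

```xquery
xquery version "1.0";

(: hypothetical lookup table of Quartz-style cron strings;
   the cron-examples and cron elements are illustrative only :)
let $examples :=
    <cron-examples>
        <cron expr="0 * * * * ?">at second 0 of every minute</cron>
        <cron expr="0 0/5 * * * ?">every five minutes, on the minute</cron>
        <cron expr="0 0 2 * * ?">at 2:00 AM every day</cron>
        <cron expr="0 15 10 ? * MON">at 10:15 AM every Monday</cron>
    </cron-examples>

return $examples
```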

Listing Scheduled Jobs

You can get a list of all scheduled jobs by using the scheduler:get-scheduled-jobs() XQuery function. This returns a document that has the following format:

<scheduler:jobs count="5" xmlns:scheduler="http://exist-db.org/xquery/scheduler">
    <scheduler:group name="eXist.System">
        <scheduler:job name="Sync">
            <scheduler:trigger name="Sync Trigger">
                <expression>2500</expression>
                <state>1</state>
                <start>2012-09-14T15:48:24.724Z</start>
                <end/>
                <previous>2012-09-25T17:31:12.224Z</previous>
                <next>2012-09-25T17:31:13.57Z</next>
                <final/>
            </scheduler:trigger>
        </scheduler:job>
    </scheduler:group>
    <scheduler:group name="eXist.User">
        <scheduler:job name="REST_TimeoutCheck">
            <scheduler:trigger name="REST_TimeoutCheck Trigger">
                <expression>2000</expression>
                <state>1</state>
                <start>2012-09-14T15:48:25.337Z</start>
                <end/>
                <previous>2012-09-25T17:31:13.337Z</previous>
                <next>2012-09-25T17:31:13.57Z</next>
                <final/>
            </scheduler:trigger>
        </scheduler:job>
    </scheduler:group>
    <!-- remaining jobs omitted -->
</scheduler:jobs>
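
The listing can be condensed into a short report. The following is a minimal sketch that assumes the result has the structure shown above, with scheduler-namespaced group, job and trigger elements and un-namespaced children such as next. The jobs and job wrapper elements are invented for this example:

```xquery
xquery version "1.0";

import module namespace scheduler = "http://exist-db.org/xquery/scheduler";

(: one summary row per scheduled job: its group, name and next fire time :)
<jobs>{
    for $group in scheduler:get-scheduled-jobs()//scheduler:group
    for $job in $group/scheduler:job
    return
        <job group="{$group/@name}" name="{$job/@name}">
            <next>{string($job/scheduler:trigger/next)}</next>
        </job>
}</jobs>
```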

Adding and Removing Jobs with XQuery

The following is a sample of the system calls to add and remove jobs:

xquery version "1.0";

(: unit test to add a datetime logger job to the job scheduler.
   To monitor this, run: tail -f $EXIST_HOME/webapp/WEB-INF/logs/exist.log :)

let $xquery-path := '/db/dma/apps/job-scheduler/scripts/log-datetime.xq'

(: run the logger every minute :)
let $cron := '0 * * * * ?'

(: http://en.wikibooks.org/wiki/XQuery/XQuery_Batch_Jobs :)
let $add := scheduler:schedule-xquery-cron-job($xquery-path, $cron, 'Test of Schedule XQuery Cron Job')

return
<results>
   <xquery-path>{$xquery-path}</xquery-path>
   <cron>{$cron}</cron>
   <result>{$add}</result>
</results>
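
Removing the job again uses scheduler:delete-scheduled-job with the same identifier that was passed when the job was added. A minimal sketch, assuming the function reports success as a boolean and that the job was registered under the name used in the example above:

```xquery
xquery version "1.0";

import module namespace scheduler = "http://exist-db.org/xquery/scheduler";

(: must match the third argument given to scheduler:schedule-xquery-cron-job :)
let $job-id := 'Test of Schedule XQuery Cron Job'

(: assumption: the return value indicates whether the job was removed :)
let $deleted := scheduler:delete-scheduled-job($job-id)

return
<results>
   <job-id>{$job-id}</job-id>
   <deleted>{$deleted}</deleted>
</results>
```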

Avoiding Concurrent Jobs

Sometimes you want a job to poll a remote site frequently, for example once every five minutes, and transfer any files it finds. If a transfer takes longer than the polling interval, the scheduler starts the job again while the previous run is still in progress.

There are two ways around this. One is to configure eXist not to run jobs concurrently, but this is not available in 2.1. The other is to set a flag that each run tests to see whether the prior run has finished. You can use the cache module to keep one flag per job.

The following shows how you can use the cache:put, cache:get and cache:remove functions to manage global state across queries:

xquery version "1.0";

(: keep one flag per job in the 'running-jobs' cache :)

let $job-id := request:get-parameter('job-id', 'poll-customer-123')
let $delete := xs:boolean(request:get-parameter('delete', 'false'))

return
    if ($delete) then
        (: the job has finished, so clear its flag :)
        let $remove := cache:remove('running-jobs', $job-id)
        return <result><message>Job {$job-id} has finished.</message></result>
    else if (cache:get('running-jobs', $job-id) = 'true') then
        (: if the job is running then exit :)
        <result><message>Job {$job-id} is already running.</message></result>
    else
        (: no prior run is in progress, so set the flag and continue :)
        let $set-running-flag := cache:put('running-jobs', $job-id, 'true')
        return
            <result>
                <message>Job {$job-id} has been started.</message>
            </result>
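
The same flag can also be set and cleared from inside the scheduled job itself. The following is a minimal sketch, assuming the cache functions have the signatures used above. The job id is hard-coded for illustration and a log call stands in for the real polling and transfer work:

```xquery
xquery version "1.0";

(: hypothetical job id; use one distinct id per scheduled job :)
let $job-id := 'poll-customer-123'

return
    if (cache:get('running-jobs', $job-id) = 'true') then
        (: a prior run still holds the flag, so skip this run :)
        util:log-system-out(concat('Skipping ', $job-id, ': prior run still active'))
    else
        let $set-flag := cache:put('running-jobs', $job-id, 'true')
        (: placeholder for the real polling and file transfer :)
        let $work := util:log-system-out(concat('Polling for ', $job-id))
        let $clear-flag := cache:remove('running-jobs', $job-id)
        return $work
```

Note that if the work step raises an error the flag is never cleared, so a production job would need some way to reset stale flags.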