XQuery/XQuery Batch Jobs
Motivation
You want to run an XQuery job at regular intervals.
Method
We will use the eXist job scheduler, which is built around the Quartz scheduler. eXist provides an XQuery API to this system for adding and removing jobs.
Method 1: Modify the conf.xml file
If you have a job that needs to run on a regular basis, you can just add a single line to your $EXIST_HOME/conf.xml file. For example, if you have a simple XQuery script that writes a dateTime stamp to the log file, you could add the following line:
Sample Addition to conf.xml
This line says to run the job when seconds=0 of every minute of every hour of every day-of-month of every month. The ? in the day-of-week field means "no specific value".
<!-- run hello world to a log file every minute
Fields are: Sec, Min, Hrs, Day-of-Month, Month, Day-of-Week
-->
<job xquery="/db/test/sched/datetime-logger.xq" cron-trigger="0 * * * * ?"/>
Sample datetime-logger.xq
xquery version "1.0";
(: append the current date and time to the log file :)
let $datetime := current-dateTime()
let $message := concat('Current date-time: ', $datetime)
let $log := util:log-system-out($message)
return
<results>
<log>{$message}</log>
</results>
Sample Output in log file
(Line: 7) Current date-time: 2011-07-18T15:58:00-05:00
(Line: 7) Current date-time: 2011-07-18T15:59:00-05:00
Sample Weekly Lucene Optimize
Add the following line to your $EXIST_HOME/conf.xml file.
<!-- optimize the Lucene Indexes at 10:15AM Monday -->
<job xquery="/db/system/jobs/optimize-lucene-indexes.xq" cron-trigger="0 15 10 ? * MON"/>
Sample XQuery to Optimize Lucene Indexes
Contents of /db/system/jobs/optimize-lucene-indexes.xq
xquery version "1.0";
(: run the Lucene Optimize function after new content has been loaded :)
let $login := xmldb:login('/db', 'admin', 'YOUR-ADMIN-PASSWORD')
let $log-start := util:log-system-out(concat('Starting Lucene Optimize at: ', current-dateTime()))
let $start-time := util:system-time()
let $optimize := ft:optimize()
let $end-time := util:system-time()
let $runtimems := (($end-time - $start-time) div xs:dayTimeDuration('PT1S')) * 1000
let $log-end := util:log-system-out(concat('Finished Lucene Optimize at: ', current-dateTime()))
return
<results>
<message>Finished ft:optimize() in {$runtimems} ms</message>
</results>
Method 2: Use the XQuery API
In this method we will use the XQuery API to add, view and remove jobs from the job scheduler.
To enable the XQuery scheduler you may have to set a line in the $EXIST_HOME/extensions
include.module.scheduler = true
Then type "build" to recompile the code.
Also make sure that the following line in $EXIST_HOME/conf.xml is uncommented:
<module uri="http://exist-db.org/xquery/scheduler" class="org.exist.xquery.modules.scheduler.SchedulerModule" />
Here are the two functions to add and delete jobs:
scheduler:schedule-xquery-cron-job($xquery-path, $cron-string, $job-id)
scheduler:delete-scheduled-job($job-id)
Note: You must make sure that the XQuery job scheduler module is enabled in your system. You can verify this by the following XQuery:
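A minimal sketch of such a check, assuming the util:registered-modules() function is available in your eXist version. It tests whether the scheduler module namespace appears among the modules known to the system.

```xquery
xquery version "1.0";
(: assumption: util:registered-modules() returns the namespace URIs of all registered modules :)
let $scheduler-ns := 'http://exist-db.org/xquery/scheduler'
return
<results>
    <scheduler-enabled>{util:registered-modules() = $scheduler-ns}</scheduler-enabled>
</results>
```

If the result is false, revisit the build setting and the conf.xml module entry described above.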
The format of the cron string is documented in [1].
Listing Scheduled Jobs
You can get a list of all scheduled jobs by using the scheduler:get-scheduled-jobs() XQuery function. This returns a document that has the following format:
<scheduler:jobs count="5" xmlns:scheduler="http://exist-db.org/xquery/scheduler">
<scheduler:group name="eXist.System">
<scheduler:job name="Sync">
<scheduler:trigger name="Sync Trigger">
<expression>2500</expression>
<state>1</state>
<start>2012-09-14T15:48:24.724Z</start>
<end/>
<previous>2012-09-25T17:31:12.224Z</previous>
<next>2012-09-25T17:31:13.57Z</next>
<final/>
</scheduler:trigger>
</scheduler:job>
</scheduler:group>
<scheduler:group name="eXist.User">
<scheduler:job name="REST_TimeoutCheck">
<scheduler:trigger name="REST_TimeoutCheck Trigger">
<expression>2000</expression>
<state>1</state>
<start>2012-09-14T15:48:25.337Z</start>
<end/>
<previous>2012-09-25T17:31:13.337Z</previous>
<next>2012-09-25T17:31:13.57Z</next>
<final/>
</scheduler:trigger>
</scheduler:job>
</scheduler:group>
<!-- remaining jobs omitted -->
</scheduler:jobs>
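The listing can be reduced to a short summary. The following is a sketch only, assuming the result has the structure shown above (scheduler:job elements containing scheduler:trigger elements with un-prefixed expression and next children) and that the scheduler prefix is bound by the enabled module.

```xquery
xquery version "1.0";
(: list each trigger with its group, job name, expression and next run time :)
let $jobs := scheduler:get-scheduled-jobs()
return
<results>
{
    for $trigger in $jobs//scheduler:trigger
    let $job := $trigger/..
    return
        <job group="{$job/../@name}" name="{$job/@name}"
             expression="{$trigger/expression}" next="{$trigger/next}"/>
}
</results>
```

Iterating over triggers rather than jobs keeps the query valid even if a job has more than one trigger.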
Adding and Removing Jobs with XQuery
The following sample adds a job to the scheduler. To remove the job again, pass the same job ID to scheduler:delete-scheduled-job().
xquery version "1.0";
(: unit test to add a datetime logger job to the job scheduler
to monitor this you can do $tail -f $EXIST_HOME/webapp/WEB-INF/logs/exist.log :)
let $xquery-path := '/db/dma/apps/job-scheduler/scripts/log-datetime.xq'
(: run the logger every minute :)
let $cron := '0 * * * * ?'
(: http://en.wikibooks.org/wiki/XQuery/XQuery_Batch_Jobs :)
let $add := scheduler:schedule-xquery-cron-job($xquery-path, $cron, 'Test of Schedule XQuery Cron Job')
return
<results>
<xquery-path>{$xquery-path}</xquery-path>
<cron>{$cron}</cron>
<result>{$add}</result>
</results>
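Removing the job follows the same pattern. This is a sketch, assuming the job was added with the ID 'Test of Schedule XQuery Cron Job' as in the sample above; the element names in the result are illustrative only.

```xquery
xquery version "1.0";
(: remove the datetime logger job from the job scheduler :)
let $job-id := 'Test of Schedule XQuery Cron Job'
let $delete := scheduler:delete-scheduled-job($job-id)
return
<results>
    <job-id>{$job-id}</job-id>
    <result>{$delete}</result>
</results>
```

After running it, scheduler:get-scheduled-jobs() should no longer list the job.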
Avoiding Concurrent Jobs
Sometimes you want to run a job frequently, for example one that polls a remote site every five minutes and transfers any files it finds. If a transfer takes longer than the polling interval, the scheduler starts the job again while the previous run is still in progress.
There are two ways around this. One is to configure eXist not to run concurrent jobs, but this is not available in 2.1. The other is to set a flag that each run tests to see whether the prior run has finished. You can use the cache module to keep one flag for each job.
The following shows how you can use the put/get and remove functions to manage global state across queries:
let $job-id := request:get-parameter('job-id', 'poll-customer-123')
let $delete := xs:boolean(request:get-parameter('delete', 'false'))
return
if ($delete)
then
let $remove := cache:remove('running-jobs', $job-id)
return <result><message>Job {$job-id} has finished.</message></result>
else
if (cache:get('running-jobs', $job-id) = 'true')
then <result><message>Job {$job-id} is already running.</message></result> (: if the job is running then exit :)
else
(: continue :)
let $set-running-flag := cache:put('running-jobs', $job-id, 'true')
return
<result>
<message>Job {$job-id} has been started</message>
</result>
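The same flag can guard a scheduled job from inside the job itself. The following is a minimal sketch, assuming the cache functions behave as in the example above. The function local:poll-and-transfer() is a hypothetical placeholder for the real work, and the job ID is illustrative.

```xquery
xquery version "1.0";
(: hypothetical placeholder for the real polling and file transfer :)
declare function local:poll-and-transfer($job-id as xs:string) as element(message) {
    <message>Polled remote site for job {$job-id}</message>
};

let $job-id := 'poll-customer-123'
return
    if (cache:get('running-jobs', $job-id) = 'true')
    then
        (: a prior run is still in progress, so skip this run :)
        <result><message>Job {$job-id} is already running.</message></result>
    else
        let $set-running-flag := cache:put('running-jobs', $job-id, 'true')
        let $work := local:poll-and-transfer($job-id)
        let $clear-running-flag := cache:remove('running-jobs', $job-id)
        return
            <result>
                {$work}
                <message>Job {$job-id} has finished.</message>
            </result>
```

Note that if the work raises an error, the flag is never cleared and later runs will be skipped until it is removed by hand, so a production job would need some form of error handling or a timeout on the flag.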