XQuery/Publishing Overview

Motivation

You have a workflow process that allows an internal team to review web content before it is transferred to a public web site. Once documents have been marked "approved for publication", they must be transferred to a public web server in a controlled way.

Methods

There are many ways to transfer XML documents from one server to another. This document describes a set of basic methods that may vary based on your local configuration. The configuration assumed throughout is shown in the following figure.

Figure: Configuration of Internal Content Management Systems and Publishing Server in DMZ

Simple Publication Workflows

Many organizations have strict policy guidelines on who has permission to publish content to a public web site. Before content is transferred to a public web site, documents intended for publication typically go through a series of stages:

  • draft - documents that are in very early stages of a quality control process
  • under-review - documents that are being reviewed by an editorial team for quality, such as content quality, spelling and typographical errors
  • approved-for-publication - documents that have been approved for publication on a public web site.

All documents that have been marked approved-for-publication can then be transferred from an internal content site to the public web server. In general only specified users with specified roles are allowed to mark documents as approved-for-publication.
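
As a rough illustration of the rule above, the following sketch selects only the approved documents from a sequence. The document and status element names are hypothetical: the text does not say how the workflow stage is recorded, so the path would need to be adapted to your own metadata.

```xquery
xquery version "3.0";

(: Assumption: each document carries its workflow stage in a <status>
   child element. This shape is illustrative, not taken from the text. :)
declare function local:approved($docs as element(document)*) as element(document)* {
    $docs[status = 'approved-for-publication']
};

(: Small in-memory sample so the query can be run on its own :)
let $docs := (
    <document id="a"><status>draft</status></document>,
    <document id="b"><status>under-review</status></document>,
    <document id="c"><status>approved-for-publication</status></document>
)
return local:approved($docs)/@id/string()
```

Only the document whose status is approved-for-publication passes the filter. Checking that the user who set the status holds the right role is a separate step that this sketch does not cover.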

Simple Publishing Scripts

There are several options for creating publishing scripts. We will begin with a very simple script and then add features.

Publishing with HTTP PUTs and DELETEs

The simplest way to publish documents is to use the eXist (or EXQuery) httpclient library [1]. This library has PUT and DELETE operations that can be used to programmatically add and delete web content on your publication server.

Here is the signature of the httpclient:put() function:

  httpclient:put($url as xs:anyURI,
                 $content as node(),
                 $persist as xs:boolean,
                 $request-headers as element()?) as item()

Pros: very simple to use

Cons: no central audit trail of who published what and when
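
A minimal sketch of a PUT-based publish, following the signature above. The target URL and the sample document are placeholders, and the sketch assumes that the httpclient extension module is enabled on your eXist instance and that the public server accepts the request without further credentials. In practice you would probably pass authentication headers in the last argument.

```xquery
xquery version "3.0";

import module namespace httpclient = "http://exist-db.org/xquery/httpclient";

(: Hypothetical URL of the resource on the public server :)
let $target := xs:anyURI('http://public.example.com/exist/rest/db/site/page.xml')

(: Placeholder content; in practice this would be the approved document :)
let $doc-to-publish := <page><title>Example</title></page>

(: Publish: send the document with an HTTP PUT, no extra request headers :)
let $put-response := httpclient:put($target, $doc-to-publish, false(), ())

return
    <publish-result>
        <url>{$target}</url>
        <status>{string($put-response/@statusCode)}</status>
    </publish-result>

(: To remove the resource instead, the matching call would be:
   httpclient:delete($target, false(), ()) :)
```

The status element reports the HTTP status code returned by the public server, which is the only record of the event in this approach.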

Publishing with a POST service

An alternative is to create a central publishing service on your public web site that will coordinate all publishing events. This can be done by using the HTTP POST client function and writing a single publishing service that catches and logs all publication events.

  httpclient:post($url as xs:anyURI,
                  $content as node(),
                  $persist as xs:boolean,
                  $request-headers as element()?) as item()

Remember to cast the URL to xs:anyURI. For example:

  let $post-status :=
     httpclient:post(xs:anyURI('http://example.com/db/get-doc.xq'), $doc-to-publish, true(), ())
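
On the receiving side, the publishing service might look roughly like the sketch below. It is only an outline under several assumptions: the target collection /db/public and the name query parameter are invented for illustration, the posted body is taken to be the XML document itself, and no authentication or validation of the resource name is shown.

```xquery
xquery version "3.0";

import module namespace request = "http://exist-db.org/xquery/request";
import module namespace xmldb = "http://exist-db.org/xquery/xmldb";

(: Hypothetical collection on the public server that holds published content :)
let $collection := '/db/public'

(: Hypothetical query parameter carrying the resource name, e.g. ?name=page.xml :)
let $name := request:get-parameter('name', '')

(: The body of the POST request, expected to be the document to publish :)
let $body := request:get-data()

return
    if ($name eq '' or empty($body))
    then <result status="error">missing name or document body</result>
    else
        (: xmldb:store returns the path of the stored resource :)
        let $path := xmldb:store($collection, $name, $body)
        return <result status="ok">{$path}</result>
```

Because every publish request passes through this one script, it is also the natural place to write an audit record. A production version should check the caller's credentials and reject resource names that contain path separators.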

Publishing with a Callback

It is frequently easier to notify a web service on the public web server that a new resource is ready to be published, rather than pushing the entire file to the public web site in the request. Only a URL is sent to the public web server, and it carries five parameters:

  • the user publishing the document
  • any authentication credentials
  • the type (publish or delete)
  • a comment on the reason for publication or deletion
  • the identifier of the document to be pulled from the central content management system by the public web server or the id of the document to be deleted

The public web server then calls a function to load that resource from the internal content management system. This can be done with standard URL parameters. Note that in this case the passwords are part of the request URL and will therefore appear in the web server log files.

Example of getting the URL parameters in publish-with-callback.xq:

(: The user that will execute the login :)
let $user := request:get-parameter('user', '')

(: The password used for the login :)
let $pass := request:get-parameter('pass', '')

(: The full URL of the document we are going to bring over :)
let $url := request:get-parameter('url', '')

(: the /db location we are going to put the new document into :)
let $db-loc := request:get-parameter('db-loc', '')

(: This is the document fetched from the internal CMS server :)
let $get-doc := doc($url)

Note that this style is more secure since only documents that exist on the internal content management system are candidates for publishing.
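
The internal side of this exchange only has to build the callback URL. The sketch below shows one way to assemble it with proper escaping of each parameter value. The service address, the param elements and the sample values are all hypothetical. The password parameter is left out on purpose, since anything placed in the URL ends up in the web log files.

```xquery
xquery version "3.0";

(: Build a query string from name/value pairs, escaping both parts so that
   characters such as '/', ':' and spaces survive the trip in the URL. :)
declare function local:callback-url(
    $service as xs:string,
    $params as element(param)*
) as xs:string {
    concat(
        $service, '?',
        string-join(
            for $p in $params
            return concat(encode-for-uri($p/@name), '=', encode-for-uri(string($p))),
            '&amp;'
        )
    )
};

(: Hypothetical service address and parameter values :)
local:callback-url(
    'http://public.example.com/publish-with-callback.xq',
    (
        <param name="user">editor1</param>,
        <param name="url">http://cms.example.com/docs/page 1.xml</param>,
        <param name="db-loc">/db/site</param>
    )
)
```

The resulting string can be cast to xs:anyURI and requested from the internal server. The public server then reads the values back with request:get-parameter().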

Publishing Audit Logs

If you use a central web service for publishing you can now log all publishing events in a single centralized log file. This file can then be used to report and audit who changed what content on the public web site and when.

The following example shows how all publishing events can be added to a log file that shows what users published what files and when they were published or deleted. In the example below the type code should be set to be publish or delete.

let $audit-log :=
<publish-event>
   <type-code>publish|delete</type-code>
   <user>{$user}</user>
   <dateTime>{current-dateTime()}</dateTime>
   <db-loc>{$db-loc}</db-loc>
</publish-event>

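(: check that the log file exists and if not then create it;
   $log-file is assumed to hold the full database path of the log document,
   and the functx module must be imported in the query prolog :)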
let $check-log-exists :=
  if (doc-available($log-file))
     then ()
     else
        xmldb:store(
           functx:substring-before-last($log-file, '/'), 
           functx:substring-after-last($log-file, '/'), 
           <publish-events/> )

(: this inserts the audit record at the end of the log file :)
let $update := update insert $audit-log into doc($log-file)/publish-events
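
Once events accumulate in the log, reporting on them is an ordinary query. The sketch below summarizes publish and delete counts per user. It runs against an in-memory sample that follows the publish-event structure above. The user names, timestamps and paths are invented sample data. Against a real log you would replace the sample with doc($log-file)/publish-events.

```xquery
xquery version "3.0";

(: Invented sample data in the same shape as the audit records above :)
let $log :=
    <publish-events>
        <publish-event>
            <type-code>publish</type-code>
            <user>alice</user>
            <dateTime>2024-01-05T10:00:00Z</dateTime>
            <db-loc>/db/site/a.xml</db-loc>
        </publish-event>
        <publish-event>
            <type-code>delete</type-code>
            <user>alice</user>
            <dateTime>2024-01-06T09:30:00Z</dateTime>
            <db-loc>/db/site/old.xml</db-loc>
        </publish-event>
        <publish-event>
            <type-code>publish</type-code>
            <user>bob</user>
            <dateTime>2024-01-07T14:15:00Z</dateTime>
            <db-loc>/db/site/b.xml</db-loc>
        </publish-event>
    </publish-events>

(: One summary element per user, ordered by user name :)
for $user in distinct-values($log/publish-event/user)
let $events := $log/publish-event[user = $user]
order by $user
return
    <user-summary name="{$user}"
                  published="{count($events[type-code = 'publish'])}"
                  deleted="{count($events[type-code = 'delete'])}"/>
```

The same pattern extends to filtering by date range or by db-loc when auditing changes to a particular part of the site.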

Using Certificates

It is sometimes not possible to create a secure connection between an internal CMS and the publishing web site. An alternative is to issue certificates to each system that is authorized to publish documents to the publishing server.