XQuery/DocBook to HTML

Motivation

You would like to convert DocBook documents to HTML format.

Method

We will use an XQuery typeswitch transform that converts a sample instance document into an XQuery typeswitch module. To begin, use any tool that generates an instance document from the XML Schema. Edit this document so that it includes only the elements that you want to transform, then run the file through the tool to generate the typeswitch XQuery module.

Dispatch Function

Our main dispatch function will have the following structure:

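(: the db2html prefix is assumed to be bound elsewhere, for example by the module's own namespace declaration :)
declare namespace db="http://docbook.org/ns/docbook";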

declare function db2html:transform($nodes as node()*) as item()* {
    for $node in $nodes
    return 
        typeswitch($node)
            case text() return $node
            case element(db:article) return db2html:article($node)
            case element(db:book) return db2html:book($node)
            case element(db:info) return db2html:info($node)
            case element(db:para) return db2html:para($node)
            case element(db:sect1) return db2html:sect1($node)
            case element(db:title) return () (: all titles will be transformed by their parent node :)
            case element(db:emphasis) return db2html:emphasis($node)
            default return <u>{$node}</u>
};

Note the following:

  1. the transform function takes a sequence of nodes as a parameter, and the typeswitch expression is evaluated for each of the nodes.
  2. the transform returns a sequence of items.
  3. the default branch handles every node that is not matched by a case clause. In this version it wraps a copy of the node in a <u> element (see item 6). You can change this behavior by changing the expression returned by default, for example to return $node so that unmatched nodes pass through without modification.
  4. for each of the elements, a separate "element function" is called. If the elements are leaf elements, they can just return the content of the leaf element. If they are not leaf elements, they MUST call the transform for each subbranch they contain. By changing each function you can change what output is created by that element. This structure keeps your transform modular and easy to maintain.
  5. the namespace for each element is the DocBook namespace (db is the prefix associated with the DocBook 5 namespace URI http://docbook.org/ns/docbook).
  6. all elements not in the case statement will be returned inside a "u" (for unknown element) wrapper. This makes them easy to spot when debugging.
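Item 3 mentions that the default behavior can be changed. The following is a minimal sketch of one such variant, not the module used in this chapter: unmatched elements are dropped but their children are still processed, and any other node kinds are copied as-is. The db2html namespace URI and the inline test document are assumptions made only for illustration.

```xquery
xquery version "1.0";

declare namespace db = "http://docbook.org/ns/docbook";
(: hypothetical URI, chosen only so the example is self-contained :)
declare namespace db2html = "http://example.com/ns/db2html";

declare function db2html:transform($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case element(db:para) return <p>{db2html:transform($node/node())}</p>
            (: any other element: discard the wrapper, keep processing its children :)
            case element() return db2html:transform($node/node())
            (: comments, processing instructions, etc. are copied unchanged :)
            default return $node
};

(: hypothetical input: db:sect1 and db:phrase have no case clause of their own :)
db2html:transform(
    <db:sect1><db:para>Hello <db:phrase>world</db:phrase></db:para></db:sect1>
)
```

With this variant the query should return a single p element containing the text "Hello world". The trade-off is that unknown elements no longer stand out in the output, so the <u> wrapper may be preferable while you are still debugging.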

Structure of the element functions

Each element function has a simple structure. For example, the function for the <para> element might have the following structure:

declare function db2html:para($node as node()) as element() {
<p>
  {db2html:transform($node/node())}
</p>
};

This has the effect of converting all the <db:para> nodes into <p> nodes. Note that the input type is node() and the return type is element(). Also note that since <para> elements contain other sub nodes (elements and text), they are all processed by the recursive call to db2html:transform($node/node()). Here is a full list of the functions for the above module.

declare function db2html:article($node as node()) as element() {
<div class="article">
  {db2html:transform($node/node())}
</div>
};

declare function db2html:book($node as node()) as element() {
<div class="book">
  {db2html:transform($node/node())}
</div>
};

declare function db2html:info($node as node()) as element() {
<div class="info">
  <h1>{$node/db:title/text()}</h1>
</div>
};


declare function db2html:sect1($node as node()) as element() {
<div class="sect1">
  <h2>{$node/db:title/text()}</h2>
  {db2html:transform($node/node())}
</div>
};

declare function db2html:title($node as node()) as element() {
<div class="title">
  {db2html:transform($node/node())}
</div>
};

declare function db2html:para($node as node()) as element() {
<p>
  {db2html:transform($node/node())}
</p>
};

declare function db2html:emphasis($node as node()) as element() {
<b>
  {db2html:transform($node/node())}
</b>
};
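As an illustration of how a single element function can be adapted without touching the dispatch function, here is a hedged sketch of an emphasis function that inspects the role attribute (the sample input below uses role="bold"). Mapping emphasis without that role to <i> is an assumption of this sketch, as are the db2html namespace URI and the cut-down dispatch function included to make it self-contained.

```xquery
xquery version "1.0";

declare namespace db = "http://docbook.org/ns/docbook";
(: hypothetical URI, chosen only so the example is self-contained :)
declare namespace db2html = "http://example.com/ns/db2html";

(: cut-down dispatch function with just enough cases for this example :)
declare function db2html:transform($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case text() return $node
            case element(db:emphasis) return db2html:emphasis($node)
            default return <u>{$node}</u>
};

(: bold when role="bold", italic otherwise (an assumed convention) :)
declare function db2html:emphasis($node as node()) as element() {
    if ($node/@role = 'bold')
    then <b>{db2html:transform($node/node())}</b>
    else <i>{db2html:transform($node/node())}</i>
};

db2html:transform((
    <db:emphasis role="bold">very</db:emphasis>,
    <db:emphasis>quite</db:emphasis>
))
```

The query should return a b element followed by an i element. Because $node/node() selects child nodes only, the role attribute itself is never passed on to the transform.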


Sample Conversion

Input

<article xmlns="http://docbook.org/ns/docbook" version="5.0">
    <info>
        <title>Article Template Title</title>
    </info>
    <sect1>
        <title>Section1 Title</title>
        <para>Typewsitch transforms are <emphasis role="bold">very</emphasis> powerful.</para>
    </sect1>
</article>

To invoke the transform, you simply pass the root element of the DocBook document to the function:

let $input := doc('/db/my-docbook-5-document.xml')/db:article
let $output := db2html:transform($input)
return $output

Note that you MUST select the root element that you want to transform after the doc() call. On its own, doc() returns a document node, which does not match any of the element case clauses and therefore falls through to the default branch.
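If you would rather pass the result of doc() directly, one possible approach is to add a document-node() case that simply recurses into the document's children. This is a sketch under assumptions, not part of the original module: the db2html namespace URI is hypothetical, the dispatch function is cut down to three element-independent cases plus two element cases, and the input is built inline with a document constructor instead of being read from the database.

```xquery
xquery version "1.0";

declare namespace db = "http://docbook.org/ns/docbook";
(: hypothetical URI, chosen only so the example is self-contained :)
declare namespace db2html = "http://example.com/ns/db2html";

declare function db2html:transform($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            (: added case: step from the document node down to its children :)
            case document-node() return db2html:transform($node/node())
            case text() return $node
            case element(db:article) return
                <div class="article">{db2html:transform($node/node())}</div>
            case element(db:para) return
                <p>{db2html:transform($node/node())}</p>
            default return <u>{$node}</u>
};

(: stand-in for doc('...'): a document node wrapping a small article :)
let $input := document { <db:article><db:para>Hello</db:para></db:article> }
return db2html:transform($input)
```

Here the document node is matched by the new case, so the query should return the div for the article rather than a <u> wrapper around the whole document.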

Output

<div class="article">
   <div class="info">
     <h1>Article Template Title</h1>
   </div>
   <div class="sect1">
      <h2>Section1 Title</h2>
      <p>Typewsitch transforms are <b>very</b> powerful.</p>
   </div>
</div>

References

Source code for this example on Google Code: http://code.google.com/p/xrx/source/browse/#svn%2Ftrunk%2F21-docbook-2-html

Chris Wallace has provided a tool that converts the XML DocBook into a typeswitch here: Generating Skeleton Typeswitch Transformation Modules

DocBook to HTML Typeswitch Transform