XQuery/Generating PDF from XSL-FO files

Motivation

You want to generate documents with precise page layout, such as PDF files, from XML documents.

Approach

Typically, the steps required to generate a PDF document are:

  • retrieve or compute the base XML document
  • transform the XML document to XSL-FO markup, perhaps using XQuery typeswitch or XSLT
  • transform the XSL-FO to PDF using the free Apache FOP or a commercial XSL-FO rendering engine such as RenderX (http://www.renderx.com/) or Antenna House
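
The three steps above can be sketched in a single XQuery. This is only an illustration: the input vocabulary (doc, title, para), the function local:to-fo and the page master name are assumptions made for the example, not part of eXist or of any particular schema.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";

(: Hypothetical typeswitch transform: maps an assumed <title>/<para> vocabulary to XSL-FO blocks :)
declare function local:to-fo($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            case element(title) return
                <fo:block font-size="18pt" font-weight="bold">{local:to-fo($node/node())}</fo:block>
            case element(para) return
                <fo:block space-after="6pt">{local:to-fo($node/node())}</fo:block>
            case element() return local:to-fo($node/node())
            case text() return $node
            default return ()
};

(: Step 1: retrieve or compute the base XML document (inline sample data here) :)
let $doc :=
    <doc>
        <title>Sample Report</title>
        <para>First paragraph.</para>
    </doc>

(: Step 2: transform the XML to XSL-FO markup :)
let $fo :=
    <fo:root>
        <fo:layout-master-set>
            <fo:simple-page-master master-name="page">
                <fo:region-body margin="1in"/>
            </fo:simple-page-master>
        </fo:layout-master-set>
        <fo:page-sequence master-reference="page">
            <fo:flow flow-name="xsl-region-body">{local:to-fo($doc/node())}</fo:flow>
        </fo:page-sequence>
    </fo:root>

(: Step 3: render the XSL-FO to PDF and send it to the browser :)
let $pdf := xslfo:render($fo, "application/pdf", ())
return response:stream-binary($pdf, "application/pdf", "report.pdf")
```

Keeping the transform in its own function means the same typeswitch can be reused for other documents that share the vocabulary.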

Method

We will use a built-in eXist function to convert an XSL-FO document into PDF. (See Installing the XSL-FO module if this module is not installed and configured.)

Using the xslfo:render() function

The function is xslfo:render(). It has the following structure:

  let $pdf-binary := xslfo:render($input-xml-fo-document, 'application/pdf', $parameters)

or, if you use an XSL-FO configuration file:

  let $pdf-binary := xslfo:render($input-xml-fo-document, 'application/pdf', $parameters, $fo-config-file)

The resulting binary can be saved directly to the eXist database. It will be stored as a non-searchable binary document.
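
A minimal sketch of saving the rendered PDF, assuming eXist's xmldb:store() function is available and the current user may write to the target collection. The collection /db/test/pdf and the input document /db/test/xslfo/hello.fo are hypothetical paths chosen for the example.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";

(: Assumed locations; adjust them to your own database layout :)
let $fo := doc('/db/test/xslfo/hello.fo')
let $pdf := xslfo:render($fo, 'application/pdf', ())

(: Store the binary with an explicit MIME type; the call returns the path of the stored resource :)
return xmldb:store('/db/test/pdf', 'hello.pdf', $pdf, 'application/pdf')
```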

You can then view the PDF by providing a link to the stored file, or you can send it directly to the browser using the response:stream-binary() function as follows:

  return response:stream-binary($pdf-binary, 'application/pdf', 'output.pdf')

Example XQuery to Generate PDF

The following program will generate a PDF document with the text "Hello World".

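xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";

(: Minimal XSL-FO document: one page master and one block of text :)
let $fo :=
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="my-page"><fo:region-body margin="1in"/></fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
        <fo:flow flow-name="xsl-region-body"><fo:block>Hello World</fo:block></fo:flow>
    </fo:page-sequence>
</fo:root>
let $pdf := xslfo:render($fo, "application/pdf", ())

return response:stream-binary($pdf, "application/pdf", "output.pdf")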

Notes on Installing Apache FOP Processor

Enabling the XSL-FO Module

You will need a module that converts XSL-FO to PDF. Examples of these are:

  1. The Apache FOP processor (free open source)
  2. The Antenna House XSL-FO processor (commercial) http://www.antennahouse.com/
  3. The RenderX XSL-FO processor (commercial) http://www.renderx.com/

Make sure that the module extension is loaded. You can do this by going to the $EXIST_HOME/conf.xml file and un-commenting the following line (around line 769):

<module class="org.exist.xquery.modules.xslfo.XSLFOModule"
        uri="http://exist-db.org/xquery/xslfo">
        <parameter name="processorAdapter" value="org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter"/>
</module>

Where the possible values for the processorAdapter parameter include:

  org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter for Apache's FOP
  org.exist.xquery.modules.xslfo.RenderXXepProcessorAdapter for RenderX XEP

If the module is correctly loaded then you should see it in the function documentation.
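
As an alternative to browsing the function documentation, a quick check can be run as a query. This sketch assumes the util:registered-modules() function from eXist's util module, which lists the namespace URIs of the loaded modules.

```xquery
xquery version "1.0";

(: Evaluates to true if the XSL-FO module's namespace is among the loaded modules :)
util:registered-modules() = "http://exist-db.org/xquery/xslfo"
```

If the query returns false, re-check the module entry in conf.xml and restart eXist.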

Make sure that you have correctly edited $EXIST_HOME/extensions/build.properties to set the XSL-FO module property to true:

Change:

  # XSL FO transformations (Uses Apache FOP)
  include.module.xslfo = false

To be:

  include.module.xslfo = true

After you change BOTH these files you will need to run the "build.sh" or "build.bat" program in your $EXIST_HOME to get the new FOP binaries in the jar files.

Make sure that the build file can get access to the correct fop.jar file from the Apache web site.

Automatically Downloading The Apache XSL-FO Jar Files

eXist comes with a sample Ant task that can automatically download the FOP distribution zip file, extract the three jar files we need and remove the rest. Here is the Ant target from the eXist 1.4 $EXIST_HOME/modules/build.xml:

<target name="prepare-libs-xslfo" unless="libs.available.xslfo" if="include.module.xslfo.config">
    <echo message="Load: ${include.module.xslfo}"/>
    <echo message="------------------------------------------------------"/>
    <echo message="Downloading libraries required by the xsl-fo module"/>
    <echo message="------------------------------------------------------"/>
    <!-- Apache FOP .95 -->
    <get src="${include.module.xslfo.url}" dest="fop-0.95-bin.zip" verbose="true" usetimestamp="true" />
    <unzip src="fop-0.95-bin.zip" dest="${top.dir}/${lib.user}">
        <patternset>
            <include name="fop-0.95/build/fop.jar"/>
            <include name="fop-0.95/lib/batik-all-1.7.jar"/>
            <include name="fop-0.95/lib/xmlgraphics-commons-1.3.1.jar"/>
        </patternset>
        <mapper type="flatten"/>
    </unzip>
    <delete file="fop-0.95-bin.zip"/>
</target>

Note that FOP 1.0 is now available, so you can change this task to the following:

<target name="prepare-libs-xslfo" unless="libs.available.xslfo" if="include.module.xslfo.config">
    <echo message="Load: ${include.module.xslfo}"/>
    <echo message="------------------------------------------------------"/>
    <echo message="Downloading libraries required by the xsl-fo module"/>
    <echo message="------------------------------------------------------"/>

    <!-- Download the Apache FOP Processor from the Apache Web Site -->
    <get src="${include.module.xslfo.url}" dest="fop-1.0-bin.zip" verbose="true" usetimestamp="true" />
    <unzip src="fop-1.0-bin.zip" dest="${top.dir}/${lib.user}">
        <patternset>
            <include name="fop-1.0/build/fop.jar"/>
            <include name="fop-1.0/lib/batik-all-1.7.jar"/>
            <!-- FOP 1.0 ships xmlgraphics-commons 1.4, matching the directory listing shown further down -->
            <include name="fop-1.0/lib/xmlgraphics-commons-1.4.jar"/>
        </patternset>
        <mapper type="flatten"/>
    </unzip>
    <delete file="fop-1.0-bin.zip"/>
</target>

Sample Transcript

The following is a sample transcript:

prepare-xslfo:
     [echo] Load: true
     [echo] ------------------------------------------------------
     [echo] Downloading libraries required by the xsl-fo module
     [echo] ------------------------------------------------------
    [fetch] Getting: http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-1.0-bin.zip
    [fetch] To: C:\DOCUME~1\DANMCC~1\LOCALS~1\Temp\FetchTask8407348433221748527tmp
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................................................
    [fetch] ....................
    [fetch] Expanding: C:\DOCUME~1\DANMCC~1\LOCALS~1\Temp\FetchTask8407348433221748527tmp into C:\ws\exist-trunk\lib\us

At the end of this process you should see the following three jar files in your $EXIST_HOME/lib/extensions folder:

  $ cd $EXIST_HOME/lib/extensions
  $ ls -l
  -rwxrwxrwx+ 1 Dan McCreary None 3318083 2010-12-10 09:23 batik-all-1.7.jar
  -rwxrwxrwx+ 1 Dan McCreary None 3079811 2010-12-10 09:23 fop.jar
  -rwxrwxrwx+ 1 Dan McCreary None  569113 2010-12-10 09:23 xmlgraphics-commons-1.4.jar

If you do not see these files you can manually copy them from a download of the Apache FOP binaries.

Now go to the $EXIST_HOME directory and type "build". You should not see any error messages. If you do, go to the build file and fix the errors.

After you restart eXist, the XSL-FO module should be able to convert an XSL-FO file into a PDF file.

Notes on installing RenderX XSL-FO Processors

RenderX is a commercial XSL-FO processor that can be used in place of the Apache FOP processor.

Edit Config files

On eXist 1.4 you must set include.module.xslfo = true in extensions/build.properties and run "build.sh" or "build.bat". This step is not necessary if you run the 2.0 release.

Edit conf.xml and comment out the reference to the default Apache xslfo module. Change the module to use RenderX as follows:

<module uri="http://exist-db.org/xquery/xslfo" class="org.exist.xquery.modules.xslfo.XSLFOModule">
  <parameter name="processorAdapter" value="org.exist.xquery.modules.xslfo.RenderXXepProcessorAdapter"/>
</module>

Copy RenderX jar files

Copy all .jar files from XEP/lib into $EXIST_HOME/lib/user.

Restart eXist-db

Restart the eXist database.

Test

Change your XQuery to include the XEP configuration as an XML element and pass it to the render function:

let $pdf := xslfo:render(fo:main($id), "application/pdf", (), $config)
return
    response:stream-binary($pdf, "media-type=application/pdf", $id || ".pdf")

In $config you need to make sure the path to the license and fonts points to the correct location on your disk.

Using Config File for External References

When you reference an image you must either use an absolute reference and make sure that the server has read access or you must use a relative path reference. The root of relative path references can be set in the xslfo config file.

xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";

let $fop-config :=
<fop version="1.0">
   <!-- Base URL for resolving relative URLs -->
   <base>http://localhost:8080/exist/rest/db/nosql/pdf/images</base>
</fop>

let $fo := doc('/db/test/xslfo/fo-templates/sample-fo-file-with-external-references.fo')
let $pdf := xslfo:render($fo, "application/pdf", (), $fop-config)
 
return response:stream-binary($pdf, "application/pdf", "output.pdf")

You may not want to hardcode your hostname, port and context. To make this work on any host, port and context you can use the following code to build your FOP base:

let $get-server-name := request:get-server-name()
let $port := xs:string(request:get-server-port())
let $conditional-port :=
   if ($port = '80') then () else concat(':', $port)
let $get-context-path := request:get-context-path()

let $fop-config :=
<fop version="1.0">
   <!-- Base URL for resolving relative URLs -->
   <base>http://{$get-server-name}{$conditional-port}{$get-context-path}/rest/db/nosql/resources/images</base>
</fop>

Now you can use the following FOP template to generate your PDF.

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="my-page">
            <fo:region-body margin="0.5in"/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
        <fo:flow flow-name="xsl-region-body">
            <fo:block>Test of external SVG reference
            </fo:block>
            <fo:block>
                SVG Chart Test
                <fo:external-graphic content-width="7.5in" scaling="uniform" src="url(my-test-image.png)"/>
            </fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

Including SVG Images in your PDF files

When you create PDF documents you can include "line art" in SVG format directly in the PDF files.

There are some translation issues from SVG to PDF but much of the line-art converts very well.

To get SVG rendering to work within eXist, the Java AWT libraries must be usable on the server if you reference SVG images. See:

http://xmlgraphics.apache.org/fop/0.95/graphics.html#batik

This explains that on a server without a graphical display you must tell Java to run AWT in headless mode when the JVM starts up:

  -Djava.awt.headless=true

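In your $EXIST_HOME/startup.bat or $EXIST_HOME/startup.sh you will need to add this option to JAVA_OPTS. For example, in startup.bat (the set syntax shown is for Windows; adapt the assignment for startup.sh):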

  set JAVA_OPTS="-Xms128m -Xmx512m -Dfile.encoding=UTF-8 -Djava.endorsed.dirs=%JAVA_ENDORSED_DIRS% -Djava.awt.headless=true"

If you are using the "wrapper" tool to start your server you will need to add the following lines to $EXIST_HOME/tools/wrapper/conf/wrapper.conf:

  # make AWT load the fonts for SVG rendering inside of XSLFO
  wrapper.java.additional.6=-Djava.awt.headless=true

Using Inline SVG

One easy way to test your configuration is to use inline SVG. You can do this with the fo:instream-foreign-object element, as in the following example.

<fo:block>
     Test of inline SVG reference.
     <fo:block>
        <fo:instream-foreign-object content-width="7.5in" scaling="uniform">
          <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" height="200" width="200">
               <circle cx="100" cy="100" r="40" stroke="black" stroke-width="2" fill="blue"/>
           </svg>
        </fo:instream-foreign-object>
   </fo:block>
</fo:block>
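
To try the fragment above end to end, it can be wrapped in a complete XSL-FO document and rendered from a query. The following is a sketch that reuses the page master layout from the templates on this page; the output file name is arbitrary.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";

(: Complete FO document containing the inline SVG circle from the fragment above :)
let $fo :=
    <fo:root>
        <fo:layout-master-set>
            <fo:simple-page-master master-name="my-page">
                <fo:region-body margin="0.5in"/>
            </fo:simple-page-master>
        </fo:layout-master-set>
        <fo:page-sequence master-reference="my-page">
            <fo:flow flow-name="xsl-region-body">
                <fo:block>Test of inline SVG reference.</fo:block>
                <fo:block>
                    <fo:instream-foreign-object content-width="7.5in" scaling="uniform">
                        <svg xmlns="http://www.w3.org/2000/svg" height="200" width="200">
                            <circle cx="100" cy="100" r="40" stroke="black" stroke-width="2" fill="blue"/>
                        </svg>
                    </fo:instream-foreign-object>
                </fo:block>
            </fo:flow>
        </fo:page-sequence>
    </fo:root>

(: No FOP configuration is passed because inline SVG needs no base URL :)
let $pdf := xslfo:render($fo, "application/pdf", ())
return response:stream-binary($pdf, "application/pdf", "inline-svg-test.pdf")
```

If the circle appears in the PDF, the Batik and AWT setup is working and you can move on to external SVG references.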

Sample External SVG Reference

Note this assumes you have configured your <base> URL in the FOP configuration file.

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="my-page">
            <fo:region-body margin="0.5in"/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
        <fo:flow flow-name="xsl-region-body">
            <fo:block>Test of external SVG reference</fo:block>
            <fo:block>
                SVG Chart Test
                <fo:external-graphic content-width="7.5in" scaling="uniform" src="url(chart.svg)"/>
            </fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

Adding Support for Equations

Formatting LaTeX Equations in XSL-FO

LaTeX is a non-XML language used for typesetting documents that contain mathematical equations. Despite its non-markup syntax, LaTeX is still popular in many mathematics and physics publications. The JLaTeXMath project provides an extension package for Apache FOP that allows LaTeX equations to be added to XSL-FO documents. To use the package you must add two jar files to $EXIST_HOME/lib/extensions, restart eXist and then add the appropriate syntax to your XSL-FO document.

Installation Steps

Download the following files from http://forge.scilab.org/index.php/p/jlatexmath/downloads/ and copy them into your $EXIST_HOME/lib/extensions:

  • jlatexmath-1.0.3.jar
  • jlatexmath-fop-1.0.3.jar

Then restart your eXist server so the jar files are loaded.

Then add the following code to your FO document:

<fo:block>To pass, you should see the symbols for 2/4 = 1/2</fo:block>
<fo:block>
   <fo:instream-foreign-object>
      <latex xmlns="http://forge.scilab.org/p/jlatexmath">\frac{2}{4}=\frac{1}{2}</latex>
   </fo:instream-foreign-object>
</fo:block>
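
When the FO is generated from XQuery, the wrapper markup can be produced by a small helper. This is a sketch: the function name local:equation is hypothetical, while the latex element and its namespace are the ones used in the sample above.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";

(: Hypothetical helper: wraps a LaTeX expression in the markup expected by the JLaTeXMath FOP plugin :)
declare function local:equation($tex as xs:string) as element(fo:block) {
    <fo:block>
        <fo:instream-foreign-object>
            <latex xmlns="http://forge.scilab.org/p/jlatexmath">{$tex}</latex>
        </fo:instream-foreign-object>
    </fo:block>
};

(: Backslashes and braces need no escaping inside an XQuery string literal :)
local:equation("\frac{2}{4}=\frac{1}{2}")
```

The returned fo:block can be placed anywhere inside an fo:flow of the document passed to xslfo:render().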

Note that FOP does not load fonts automatically. To force fonts to be loaded you will need to add the following auto-detect element to your FOP configuration file:

<fop version="1.0">
    <renderers>
        <renderer mime="application/pdf">
            <filterList>
                <value>flate</value>
            </filterList>
            <fonts>
                <auto-detect/>
            </fonts>
        </renderer>
    </renderers>
</fop>
 
MathML Equation Support

Note: this item is not complete yet.

Although LaTeX is a common way to represent equations, the Mathematical Markup Language (MathML) can also be used.

There are hints that JEuclid (http://jeuclid.sourceforge.net/) works with FOP, but this has not yet been tested.

Notes

See XSL-FO Tables and XSL-FO Images on how to add print quality tables and charts to your document.

When you follow trunk, conf.xml is sometimes reset to the defaults, and you have to re-enable XSL-FO processing in conf.xml. If you miss this, the error printed reads: "cannot compile xquery: err:xpst0017 call to undeclared function: xslfo:render".

Instructions for RenderX

Updated steps from Wolfgang on Jan 6th 2014:

  • copy license.xml to EXIST_HOME
  • copy x4u.jar, xep.jar and xt.jar from xep into EXIST_HOME/lib/user
  • edit conf.xml to change the XSL-FO driver:

<module uri="http://exist-db.org/xquery/xslfo" class="org.exist.xquery.modules.xslfo.XSLFOModule">
    <parameter name="processorAdapter" value="org.exist.xquery.modules.xslfo.RenderXXepProcessorAdapter"/>
</module>

  • restart eXist for the jars to be loaded
  • upload xep.xml from the xep directory into a collection in eXist (e.g. /db)
  • edit xep.xml and change the base directory for xep fonts. It should point to the xep install directory:

    <fonts xml:base="/Users/wolf/Source/renderx/fonts/" default-family="Arial">

  • If you are on a Mac, you may have to change the directory for the Arial font further down in the file:

    <font-group xml:base="file:/Library/Fonts/" label="Windows TrueType" embed="true" subset="true">

  • call xep in your XQuery as follows:

let $id := request:get-parameter("id", ())
let $config := util:expand(doc("/db/xep.xml")/*)
let $pdf := xslfo:render(fo:main($id), "application/pdf", (), $config)
return
    response:stream-binary($pdf, "media-type=application/pdf", $id || ".pdf")

The util:expand trick is required because xslfo:render expects an in-memory DOM element for $config (this should probably be fixed).

Note: xep prints error messages to stdout, so you usually don't see them. I was running eXist via the launcher, so I opened the "Tool Window" via the system tray menu and clicked on "Show console messages".

Updated steps from Kevin Brown (RenderX) on May 17th 2017:

Instead of pulling apart the installation of RenderX, you can edit the master configuration file of RenderX to resolve all other files you may need, including the license file. So, an installation for RenderX is easy if you follow these steps:

  • Install RenderX in any directory you wish or if RenderX is already installed, make note of the directory. Example: an installation on Windows may be in "C:\Program Files\RenderX\XEP"
  • Copy "xep.jar" from the RenderX installation to the "/lib/user" directory of the exist-db installation. In contrast to the installation notes above, you do not need "x4u.jar"; only "xep.jar" is required. If you want validation of the XSL-FO to be reported, then you also need "xt.jar". The files "xep.jar" and "xt.jar" are located in the "/lib" directory of the RenderX installation. In the above example this would be "C:\Program Files\RenderX\XEP\lib"
  • Insert "xep.xml" into your database. "xep.xml" is the RenderX configuration file located in the root of the RenderX installation.
  • Edit "xep.xml" in the database and change the "config" element to add an "xml:base" attribute that points to the installation of RenderX on disk. This one step allows all the other files (like "license.xml", "rolemap.xml" and other things like hyphenation and fonts) to be found, as they are all relative to the "xml:base" of the configuration. Given the above example install, I would have the following root element in the "xep.xml" that is inside the database:

<config xmlns="http://www.renderx.com/XEP/config" xml:base="file:/C:/Program Files/RenderX/XEP/">

  • Optionally, if you did not copy "xt.jar" or do not want validation, add the following to "xep.xml":

<option name="VALIDATE" value="false"/>

  • Edit "conf.xml" as covered above, restart the database and format away.

An installation such as this means that an external installation of RenderX and the installation within exist-db start out essentially the same, sharing the same fonts, hyphenation files, license and other files. Of course, if you update "xep.xml" on disk, you need to update it in the database as well, and if you update RenderX, you need to copy the updated "xep.jar" into the exist-db installation.

Acknowledgments

The user Dmitriy has been helpful in creating the installation procedure for systems that do not have source code. Wolfgang has also added feedback on the RenderX instructions for eXist 2.1. Josef Karthauser helped with getting LaTeX equations to render correctly within PDF documents.

Discussion

The steps to enable the FOP module should be listed somewhere in the eXist administrative site and removed from this Wikibook.

The RenderX instructions can be greatly simplified. The root element of the RenderX configuration file ("xep.xml") takes "xml:base" as an argument. If you have RenderX installed in some directory -- let's say for example in "C:\Program Files\RenderX\XEP" on your system, then after importing "xep.xml", you can just edit the root element in the imported version like this:

<config xmlns="http://www.renderx.com/XEP/config" xml:base="file:/C:/Program Files/RenderX/XEP/">

All other references in "xep.xml" are relative to this, so you do not need to move anything at all (such as putting license.xml in EXIST_HOME, or copying image directories or fonts).

Also note that "X4U.jar" should not be required as it only is for the GUI and "xt.jar" is not required unless you set validate to true in "xep.xml". Only "xep.jar" is required.