Last modified on 23 June 2014, at 12:00

XQuery/Generating PDF from XSL-FO files

Motivation

You want to generate documents with a precise page layout, such as PDF files, from XML documents.

Approach

Typically, the steps required to generate a PDF document are:

  • retrieve or compute the base XML document
  • transform the XML to XSL-FO markup, for example with an XQuery typeswitch or XSLT
  • transform the XSL-FO to PDF using the free Apache FOP or a commercial XSL-FO rendering engine such as RenderX (http://www.renderx.com/) or Antenna House
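
As an illustration of the second step, here is a minimal sketch of a typeswitch transformation. The input vocabulary (doc, title and para) and the function name local:to-fo are assumptions made for this example; they are not part of eXist or of XSL-FO.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";

(: Hypothetical input vocabulary: doc, title and para are assumed for this sketch. :)
declare function local:to-fo($nodes as node()*) as item()* {
    for $node in $nodes
    return
        typeswitch ($node)
            (: copy text through unchanged :)
            case text() return $node
            case element(title) return
                <fo:block font-size="18pt" font-weight="bold">{local:to-fo($node/node())}</fo:block>
            case element(para) return
                <fo:block space-after="6pt">{local:to-fo($node/node())}</fo:block>
            (: any other element: drop the wrapper, keep processing its children :)
            case element() return local:to-fo($node/node())
            default return ()
};

let $doc :=
    <doc>
        <title>Hello</title>
        <para>This paragraph becomes an fo:block.</para>
    </doc>
return
    <fo:root>
        <fo:layout-master-set>
            <fo:simple-page-master master-name="my-page">
                <fo:region-body margin="1in"/>
            </fo:simple-page-master>
        </fo:layout-master-set>
        <fo:page-sequence master-reference="my-page">
            <fo:flow flow-name="xsl-region-body">{local:to-fo($doc)}</fo:flow>
        </fo:page-sequence>
    </fo:root>
```

The result is an fo:root element that can be passed to the rendering step. Each new source element only needs one more case clause.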

Method

We will use a built-in eXist function to convert an XSL-FO document into PDF. (See Installing the XSL-FO module if this module is not installed and configured.)

Using the xslfo:render() function

The function is xslfo:render(). It has the following structure:

  let $pdf-binary := xslfo:render($input-xml-fo-document, 'application/pdf', $parameters)

or, if you use an XSL-FO configuration file:

  let $pdf-binary := xslfo:render($input-xml-fo-document, 'application/pdf', $parameters, $fo-config-file)

The resulting PDF binary can be saved directly in the eXist database, where it is stored as a non-searchable binary document.
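
A minimal sketch of saving the result, assuming the xmldb module is enabled and that the collection /db/pdf already exists and is writable. The collection, the resource name and the source path /db/test/xslfo/hello.fo are hypothetical.

```xquery
xquery version "1.0";

(: Hypothetical paths: adjust the FO source and the target collection to your database. :)
let $fo := doc('/db/test/xslfo/hello.fo')
let $pdf := xslfo:render($fo, 'application/pdf', ())

(: xmldb:store returns the path of the stored resource, or the empty sequence on failure :)
let $stored-path := xmldb:store('/db/pdf', 'output.pdf', $pdf, 'application/pdf')
return
    <result stored="{$stored-path}"/>
```

Passing the MIME type explicitly is meant to make eXist store the resource as a binary PDF rather than trying to parse it as XML.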

You can then view this directly by providing a link to the file or you can send it directly to the browser by using the response:stream-binary() function as follows:

  return response:stream-binary($pdf-binary, 'application/pdf', 'output.pdf')

Example XQuery to Generate PDF

The following program will generate a PDF document with the text "Hello World".

xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";
 
let $fo :=
<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="my-page">
            <fo:region-body margin="1in"/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
        <fo:flow flow-name="xsl-region-body">
            <fo:block>Hello World!</fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>
 
let $pdf := xslfo:render($fo, "application/pdf", ())
 
return response:stream-binary($pdf, "application/pdf", "output.pdf")


Notes on Installing Apache FOP Processor

Enabling the XSL-FO Module

You will need a processor that converts XSL-FO to PDF. Examples are:

  1. The Apache FOP processor (free, open source)
  2. The Antenna House XSL-FO processor (commercial) http://www.antennahouse.com/
  3. The RenderX XSL-FO processor (commercial) http://www.renderx.com/

Make sure that the module extension is loaded. You can do this by opening the $EXIST_HOME/conf.xml file and uncommenting the following element (around line 769):

<module class="org.exist.xquery.modules.xslfo.XSLFOModule"
        uri="http://exist-db.org/xquery/xslfo">
    <parameter name="processorAdapter" value="org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter"/>
</module>

The possible values for the processorAdapter parameter include:

  org.exist.xquery.modules.xslfo.ApacheFopProcessorAdapter for Apache FOP
  org.exist.xquery.modules.xslfo.RenderXXepProcessorAdapter for RenderX XEP


If the module is correctly loaded then you should see it in the function documentation.
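
You can also check from a query. The following is a minimal sketch, assuming that the util module's registered-modules() function is available in your eXist version; the status element is made up for this example.

```xquery
xquery version "1.0";

let $xslfo-ns := "http://exist-db.org/xquery/xslfo"
return
    (: the general comparison is true if any registered namespace URI matches :)
    if (util:registered-modules() = $xslfo-ns) then
        <status module="xslfo">loaded</status>
    else
        <status module="xslfo">not loaded: check conf.xml</status>
```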

Make sure that you have edited $EXIST_HOME/extensions/build.properties to set include.module.xslfo to true:

Change:

  # XSL FO transformations (Uses Apache FOP)
  include.module.xslfo = false

To be:

  include.module.xslfo = true

After you change BOTH these files you will need to run the "build.sh" or "build.bat" program in your $EXIST_HOME to get the new FOP binaries in the jar files.

Make sure that the build file can get access to the correct fop.jar file from the Apache web site.

Automatically Downloading The Apache XSL-FO Jar Files

eXist comes with a sample Ant task that can automatically download the FOP distribution zip file, extract the three jar files we need and remove the rest. Here is the Ant target from the eXist 1.4 $EXIST_HOME/modules/build.xml:

<target name="prepare-libs-xslfo" unless="libs.available.xslfo" if="include.module.xslfo.config">
   <echo message="Load: ${include.module.xslfo}"/>
   <echo message="------------------------------------------------------"/>
   <echo message="Downloading libraries required by the xsl-fo module"/>
   <echo message="------------------------------------------------------"/>
   <!-- Apache FOP 0.95 -->
   <get src="${include.module.xslfo.url}" dest="fop-0.95-bin.zip" verbose="true" usetimestamp="true"/>
   <unzip src="fop-0.95-bin.zip" dest="${top.dir}/${lib.user}">
      <patternset>
         <include name="fop-0.95/build/fop.jar"/>
         <include name="fop-0.95/lib/batik-all-1.7.jar"/>
         <include name="fop-0.95/lib/xmlgraphics-commons-1.3.1.jar"/>
      </patternset>
      <mapper type="flatten"/>
   </unzip>
   <delete file="fop-0.95-bin.zip"/>
</target>

Note that FOP 1.0 is now available, so you can change this task to the following. FOP 1.0 bundles xmlgraphics-commons-1.4.jar rather than 1.3.1:

<target name="prepare-libs-xslfo" unless="libs.available.xslfo" if="include.module.xslfo.config">
   <echo message="Load: ${include.module.xslfo}"/>
   <echo message="------------------------------------------------------"/>
   <echo message="Downloading libraries required by the xsl-fo module"/>
   <echo message="------------------------------------------------------"/>

   <!-- Download the Apache FOP Processor from the Apache Web Site -->
   <get src="${include.module.xslfo.url}" dest="fop-1.0-bin.zip" verbose="true" usetimestamp="true"/>
   <unzip src="fop-1.0-bin.zip" dest="${top.dir}/${lib.user}">
      <patternset>
         <include name="fop-1.0/build/fop.jar"/>
         <include name="fop-1.0/lib/batik-all-1.7.jar"/>
         <include name="fop-1.0/lib/xmlgraphics-commons-1.4.jar"/>
      </patternset>
      <mapper type="flatten"/>
   </unzip>
   <delete file="fop-1.0-bin.zip"/>
</target>

Sample Transcript

The following is a sample transcript:

prepare-xslfo:

    [echo] Load: true
    [echo] ------------------------------------------------------
    [echo] Downloading libraries required by the xsl-fo module
    [echo] ------------------------------------------------------
   [fetch] Getting: http://apache.cs.uu.nl/dist/xmlgraphics/fop/binaries/fop-1.0-bin.zip
   [fetch] To: C:\DOCUME~1\DANMCC~1\LOCALS~1\Temp\FetchTask8407348433221748527tmp
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................................................
   [fetch] ....................
   [fetch] Expanding: C:\DOCUME~1\DANMCC~1\LOCALS~1\Temp\FetchTask8407348433221748527tmp into C:\ws\exist-trunk\lib\us

At the end of this process you should see the following three jar files in your $EXIST_HOME/lib/extensions folder:

  $ cd $EXIST_HOME/lib/extensions
  $ ls -l
  -rwxrwxrwx+ 1 Dan McCreary None 3318083 2010-12-10 09:23 batik-all-1.7.jar
  -rwxrwxrwx+ 1 Dan McCreary None 3079811 2010-12-10 09:23 fop.jar
  -rwxrwxrwx+ 1 Dan McCreary None  569113 2010-12-10 09:23 xmlgraphics-commons-1.4.jar

If you do not see these files, you can copy them manually from a download of the Apache FOP binaries.

Now go to the $EXIST_HOME directory and type "build". You should not see any error messages. If you do, go to the build file and fix the errors.

After you restart eXist, the xslfo:render() function should be able to convert an XSL-FO document into a PDF file.

Notes on installing RenderX XSL-FO Processors

RenderX XEP is a commercial XSL-FO processor that can be used in place of the Apache FOP processor.

Edit Config files

On eXist 1.4 you must set include.module.xslfo = true in extensions/build.properties and run "build.sh" or "build.bat". This step is not necessary if you run the 2.0 release.

Edit conf.xml and comment out the reference to the default Apache xslfo module. Change the module to use RenderX as follows:

<module uri="http://exist-db.org/xquery/xslfo" class="org.exist.xquery.modules.xslfo.XSLFOModule">
  <parameter name="processorAdapter" value="org.exist.xquery.modules.xslfo.RenderXXepProcessorAdapter"/>
</module>

Copy RenderX jar files

Copy all .jar files from XEP/lib into $EXIST_HOME/lib/user.

Restart eXist-db

Restart the eXist database.

Test

Change your XQuery to include the XEP configuration as an XML element and pass it to the render function:

let $pdf := xslfo:render(fo:main($id), "application/pdf", (), $config)
return
    response:stream-binary($pdf, "application/pdf", $id || ".pdf")

In $config you need to make sure the path to the license and fonts points to the correct location on your disk.

Using Config File for External References

When you reference an image you must either use an absolute reference and make sure that the server has read access or you must use a relative path reference. The root of relative path references can be set in the xslfo config file.

xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";
declare namespace xslfo="http://exist-db.org/xquery/xslfo";
 
let $fop-config :=
<fop version="1.0">
   <!-- Base URL for resolving relative URLs -->
   <base>http://localhost:8080/exist/rest/db/nosql/pdf/images</base>
</fop>
 
let $fo := doc('/db/test/xslfo/fo-templates/sample-fo-file-with-external-references.fo')
let $pdf := xslfo:render($fo, "application/pdf", (), $fop-config)
 
return response:stream-binary($pdf, "application/pdf", "output.pdf")

You may not want to hard-code your hostname, port and context path. To make this work on any host, port and context, you can use the following code to build your FOP base URL:

let $get-server-name := request:get-server-name()
let $port := xs:string(request:get-server-port())
let $conditional-port :=
   if ($port = '80') then () else concat(':', $port)
let $get-context-path := request:get-context-path()
 
let $fop-config :=
<fop version="1.0">
   <!-- Base URL for resolving relative URLs -->
   <base>http://{$get-server-name}{$conditional-port}{$get-context-path}/rest/db/nosql/resources/images</base>
</fop>
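
The same idea can be wrapped in a small helper so that the base URL is computed in one place. This is only a sketch: the function name local:fop-config and the image collection path are assumptions, and it only handles plain http.

```xquery
xquery version "1.0";

(: Hypothetical helper: builds a FOP config whose base element points at
   a REST path on the server that is handling the current request. :)
declare function local:fop-config($db-path as xs:string) as element(fop) {
    let $port := xs:string(request:get-server-port())
    (: leave the port out of the URL when it is the HTTP default :)
    let $port-part := if ($port = '80') then '' else concat(':', $port)
    let $base := concat('http://', request:get-server-name(), $port-part,
                        request:get-context-path(), '/rest', $db-path)
    return
        <fop version="1.0">
            <base>{$base}</base>
        </fop>
};

let $fo := doc('/db/test/xslfo/fo-templates/sample-fo-file-with-external-references.fo')
let $pdf := xslfo:render($fo, "application/pdf", (), local:fop-config('/db/nosql/resources/images'))
return response:stream-binary($pdf, "application/pdf", "output.pdf")
```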

Now you can use the following XSL-FO template to generate your PDF.

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="my-page">
            <fo:region-body margin="0.5in"/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
        <fo:flow flow-name="xsl-region-body">
            <fo:block>Test of external SVG reference</fo:block>
            <fo:block>
                SVG Chart Test
                <fo:external-graphic content-width="7.5in" scaling="uniform" src="url(my-test-image.png)"/>
            </fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

Including SVG Images in your PDF files

When you create PDF documents you can include "line art" in SVG format directly in the PDF files.

There are some translation issues from SVG to PDF but much of the line-art converts very well.

To get SVG rendering to work within eXist, the Java AWT libraries must be usable on the server, because FOP renders SVG with Batik, which depends on AWT. See:

http://xmlgraphics.apache.org/fop/0.95/graphics.html#batik

On a server without a graphical display, this means starting the JVM in headless mode:

  -Djava.awt.headless=true

In your $EXIST_HOME/startup.bat or $EXIST_HOME/startup.sh you will need to add the following:

  set JAVA_OPTS="-Xms128m -Xmx512m -Dfile.encoding=UTF-8 -Djava.endorsed.dirs=%JAVA_ENDORSED_DIRS% -Djava.awt.headless=true"

If you are using the "wrapper" tool to start your server, you will need to add the following lines to $EXIST_HOME/tools/wrapper/conf/wrapper.conf:

  # make AWT load the fonts for SVG rendering inside of XSLFO
  wrapper.java.additional.6=-Djava.awt.headless=true

Using Inline SVG

One easy way to test your configuration is to embed an SVG image inline. You can do this with the fo:instream-foreign-object element, as in the following example.

<fo:block>
    Test of inline SVG reference.
    <fo:block>
        <fo:instream-foreign-object content-width="7.5in" scaling="uniform">
            <svg xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" height="200" width="200">
                <circle cx="100" cy="100" r="40" stroke="black" stroke-width="2" fill="blue"/>
            </svg>
        </fo:instream-foreign-object>
    </fo:block>
</fo:block>

Sample External SVG Reference

Note that this assumes you have configured your <base> URL in the FOP configuration file.

<fo:root xmlns:fo="http://www.w3.org/1999/XSL/Format">
    <fo:layout-master-set>
        <fo:simple-page-master master-name="my-page">
            <fo:region-body margin="0.5in"/>
        </fo:simple-page-master>
    </fo:layout-master-set>
    <fo:page-sequence master-reference="my-page">
        <fo:flow flow-name="xsl-region-body">
            <fo:block>Test of external SVG reference</fo:block>
            <fo:block>
                SVG Chart Test
                <fo:external-graphic content-width="7.5in" scaling="uniform" src="url(chart.svg)"/>
            </fo:block>
        </fo:flow>
    </fo:page-sequence>
</fo:root>

Adding Support for Equations

Formatting LaTeX Equations in XSL-FO

LaTeX is a non-XML language used for typesetting documents that contain mathematical equations. Despite its non-markup syntax, LaTeX is still popular in many mathematics and physics publications. The JLaTeXMath extension package for Apache FOP allows LaTeX equations to be added to XSL-FO documents. To use the package you must add two jar files to $EXIST_HOME/lib/extensions, restart eXist and then add the appropriate syntax to your XSL-FO document.

Installation Steps

Download the following files from http://forge.scilab.org/index.php/p/jlatexmath/downloads/ and copy them into your $EXIST_HOME/lib/extensions folder:

  • jlatexmath-1.0.3.jar
  • jlatexmath-fop-1.0.3.jar

Then restart your eXist server so the jar files are loaded.

Then add the following code to your FO document:

<fo:block>To pass, you should see the fractions 2/4 = 1/2 typeset as an equation.</fo:block>
<fo:block>
   <fo:instream-foreign-object>
      <latex xmlns="http://forge.scilab.org/p/jlatexmath">\frac{2}{4}=\frac{1}{2}</latex>
   </fo:instream-foreign-object>
</fo:block>

Note that FOP does not pick up fonts automatically. To make it detect and load the available fonts, add the following auto-detect element to your FOP configuration file.

<fop version="1.0">
    <renderers>
        <renderer mime="application/pdf">
            <filterList>
                <value>flate</value>
            </filterList>
            <fonts>
                <auto-detect/>
            </fonts>
        </renderer>
    </renderers>
</fop>
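
Putting the pieces together, the following is a minimal sketch of a query that renders the equation with this configuration. It assumes the JLaTeXMath jars are installed as described; the page master name and output file name are arbitrary. Note that when the FO is written inline in XQuery rather than loaded from a file, the LaTeX curly braces have to be doubled.

```xquery
xquery version "1.0";
declare namespace fo="http://www.w3.org/1999/XSL/Format";

(: FOP configuration with font auto-detection, as described above. :)
let $fop-config :=
    <fop version="1.0">
        <renderers>
            <renderer mime="application/pdf">
                <fonts>
                    <auto-detect/>
                </fonts>
            </renderer>
        </renderers>
    </fop>

(: Curly braces are doubled because they delimit enclosed expressions
   inside XQuery element constructors; the result is \frac{2}{4}=\frac{1}{2}. :)
let $fo :=
    <fo:root>
        <fo:layout-master-set>
            <fo:simple-page-master master-name="my-page">
                <fo:region-body margin="1in"/>
            </fo:simple-page-master>
        </fo:layout-master-set>
        <fo:page-sequence master-reference="my-page">
            <fo:flow flow-name="xsl-region-body">
                <fo:block>
                    <fo:instream-foreign-object>
                        <latex xmlns="http://forge.scilab.org/p/jlatexmath">\frac{{2}}{{4}}=\frac{{1}}{{2}}</latex>
                    </fo:instream-foreign-object>
                </fo:block>
            </fo:flow>
        </fo:page-sequence>
    </fo:root>

let $pdf := xslfo:render($fo, "application/pdf", (), $fop-config)
return response:stream-binary($pdf, "application/pdf", "equation.pdf")
```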

Math ML Equation Support

Note: this section is not yet complete.

Although LaTeX is a common way to represent equations, the Mathematical Markup Language (MathML) may also work. There are hints that JEuclid (http://jeuclid.sourceforge.net/) can be used for this, but it has not yet been tested.

Notes

See XSL-FO Tables and XSL-FO Images for how to add print-quality tables and charts to your document.

When you follow trunk, conf.xml is sometimes reset to the defaults, and you have to re-enable XSL-FO processing in conf.xml. If you miss this, the error reads: "cannot compile xquery: err:xpst0017 call to undeclared function: xslfo:render".

Instructions for RenderX

Updated steps from Wolfgang on Jan 6th 2014:

  • copy license.xml to EXIST_HOME
  • copy x4u.jar, xep.jar and xt.jar from xep into EXIST_HOME/lib/user
  • edit conf.xml to change the XSL FO driver:
<module uri="http://exist-db.org/xquery/xslfo" class="org.exist.xquery.modules.xslfo.XSLFOModule">
    <parameter name="processorAdapter" value="org.exist.xquery.modules.xslfo.RenderXXepProcessorAdapter"/>
</module>
  • restart eXist for the jars to be loaded
  • upload xep.xml from the xep directory into a collection in eXist (e.g. /db)
  • edit xep.xml and change the base directory for the XEP fonts. It should point to the XEP install directory:
    <fonts xml:base="/Users/wolf/Source/renderx/fonts/" default-family="Arial">
  • If you are on a Mac, you may have to change the directory for the Arial font further down in the file:
    <font-group xml:base="file:/Library/Fonts/" label="Windows TrueType" embed="true" subset="true">
  • call xep in your XQuery as follows:
let $id := request:get-parameter("id", ())
let $config := util:expand(doc("/db/xep.xml")/*)
let $pdf := xslfo:render(fo:main($id), "application/pdf", (), $config)
return
    response:stream-binary($pdf, "application/pdf", $id || ".pdf")

The util:expand trick is required because xslfo:render expects an in-memory DOM element for $config (this should probably be fixed).

Note: XEP prints error messages to stdout, so you usually don’t see them. I was running eXist via the launcher, so I opened the "Tool Window" via the system tray menu and clicked on "Show console messages".

Acknowledgments

The user Dmitriy has been helpful in the creation of the procedure for installation on systems that do not have source code. Wolfgang has also added feedback on the RenderX instructions for eXist 2.1. Josef Karthauser helped with getting LaTeX equations to render correctly within PDF documents.

Discussion

The steps to enable the FOP module should be listed somewhere in the eXist administrative site and removed from this Wikibook.