When documents are converted into SGML/XML, the original file must be written using a structuring style sheet (template). Students, however, often attach great importance to the layout of the final document and personalize or adapt the original structuring style sheet extensively. It is our duty to convert this aspect of their work as well. Conversion tools devoted to the document content must therefore be complemented by conversion tools that generate rendering style sheets: typically XSL files for XML documents.
Because users cannot read SGML or XML directly in today's browsers (Opera, Netscape, Internet Explorer), it is necessary either to provide layout information in the form of a stylesheet or to transform these highly structured documents into a display or printable version in HTML, PDF or PostScript, which readers can use more easily.
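One way to provide layout information in the form of a stylesheet is the xml-stylesheet processing instruction defined in the W3C recommendation "Associating Style Sheets with XML Documents"; a browser that supports client-side XSLT (or CSS for XML) applies the referenced stylesheet when it opens the file. The following minimal Python sketch uses only the standard library's xml.dom.minidom to insert that instruction before the root element. The function name attach_stylesheet and the file name thesis.xsl are hypothetical.

```python
from xml.dom import minidom


def attach_stylesheet(xml_text: str, href: str, media_type: str = "text/xsl") -> str:
    """Return xml_text with an <?xml-stylesheet?> instruction placed before the root element.

    attach_stylesheet is a hypothetical helper name used for illustration only.
    """
    doc = minidom.parseString(xml_text)
    pi = doc.createProcessingInstruction(
        "xml-stylesheet", f'type="{media_type}" href="{href}"'
    )
    # The instruction belongs in the prolog, i.e. before the document (root) element.
    doc.insertBefore(pi, doc.documentElement)
    return doc.toxml()


if __name__ == "__main__":
    # "thesis.xsl" is a hypothetical stylesheet file name.
    print(attach_stylesheet("<thesis><title>On Structure</title></thesis>", "thesis.xsl"))
```

Passing media_type="text/css" would link a CSS file instead; the same instruction mechanism covers both stylesheet languages.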
Generally, universities have two ways to meet the demand for layout:
- Try to preserve all the layout information the author (student) has applied to the MS Word or other source document. This strategy involves the author directly, through the personalization of a standard style sheet for his/her work.
- Develop one single style sheet, or a small set of style sheets, usable for all ETDs. This would support a corporate design and settle the layout question at a more general level, leaving it to the ETD production department.
Style Sheets for SGML or XML documents
Such tools are being developed in France (a collaboration between Lyon 2 and Marne-la-Vallée) for documents produced in MS Word and other RTF-compatible authoring tools. They are based on analysing and extracting the typographic characteristics associated with each style used.
Equivalent tools should be easy to develop for documents produced with LaTeX, since LaTeX natively uses the notion of rendering style sheets (document classes and style files).
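The internals of the Lyon 2 / Marne-la-Vallée tools are not described here, so the following is only a minimal Python sketch of the general idea of generating a rendering style sheet from extracted style characteristics. It assumes the typographic properties of each Word style have already been extracted into a dictionary (the style names and values below are invented for illustration) and emits an XSLT stylesheet with one xsl:attribute-set per style, using XSL-FO property names. The helper names generate_attribute_sets and style_to_set_name are hypothetical.

```python
from xml.sax.saxutils import escape, quoteattr

# Hypothetical result of a style-analysis step: Word style name -> FO properties.
EXTRACTED_STYLES: dict[str, dict[str, str]] = {
    "Heading 1": {"font-family": "Arial", "font-size": "16pt", "font-weight": "bold"},
    "Body Text": {"font-family": "Times New Roman", "font-size": "12pt", "text-align": "justify"},
}


def style_to_set_name(style_name: str) -> str:
    """Turn a Word style name into an XML name, e.g. 'Heading 1' -> 'heading-1'."""
    return "-".join(style_name.lower().split())


def generate_attribute_sets(styles: dict[str, dict[str, str]]) -> str:
    """Emit an XSLT stylesheet containing one xsl:attribute-set per extracted style."""
    lines = [
        '<xsl:stylesheet version="1.0" '
        'xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
    ]
    for style_name, props in styles.items():
        set_name = quoteattr(style_to_set_name(style_name))
        lines.append(f"  <xsl:attribute-set name={set_name}>")
        for prop, value in props.items():
            # Each typographic property becomes a literal xsl:attribute value.
            lines.append(
                f"    <xsl:attribute name={quoteattr(prop)}>{escape(value)}</xsl:attribute>"
            )
        lines.append("  </xsl:attribute-set>")
    lines.append("</xsl:stylesheet>")
    return "\n".join(lines)


if __name__ == "__main__":
    print(generate_attribute_sets(EXTRACTED_STYLES))
```

A formatting template could then apply a set on a literal result element, for example `<fo:block xsl:use-attribute-sets="heading-1">`. Style names that start with a digit or contain characters not allowed in XML names would need further cleaning before use.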
Style sheet languages for XML are:
- Cascading Style Sheets (CSS)
- Extensible Stylesheet Language (XSL)
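Because CSS is not powerful enough to handle the complexity of large XML documents such as theses and dissertations (it attaches presentation to elements in the order they appear, but unlike XSLT it cannot reorder, filter or restructure the document), its use is not advisable.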
Sub-standards within the XSL Standard
Within the XSL standard, one distinguishes between several sub-standards:
- XSL Transformations (XSLT). This part lets the user write stylesheets that act like small programs. They transform the original document, which in this context is valid against a specified DTD, either into a document that follows another DTD (one that browsers can render more easily, such as the HTML DTD) or into other document description languages such as Rich Text Format (RTF), LaTeX or PDF, from which a printed version can be produced.
- XPath lets authors build expressions that address parts of an XML document rather than only the whole document. Combined with XML linking (XPointer, XLink), this would make it possible to cite a particular subsection, or e.g. sections 3 to 5, once common browsers support this linking technology.
- XSL-FO (XSL Formatting Objects): a formatting vocabulary that can be applied to the nodes of an XML document, for instance to produce paginated output such as PDF.
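Python's standard library has no XSLT processor, so the sketch below does not run XSLT itself. It imitates XSLT's processing model in plain Python to illustrate how a transformation stylesheet "acts like a small program": one template rule per element name, plus a fallback that simply processes an element's children and copies its text, as XSLT's built-in template rules do. The thesis element names (thesis, title, chapter, heading, para) and their HTML mapping are hypothetical; a production system would use a real XSLT engine such as the one inside Cocoon.

```python
import xml.etree.ElementTree as ET
from html import escape
from typing import Callable

# Hypothetical thesis markup; a real DTD would define many more elements.
SAMPLE = """<thesis>
  <title>Structured Theses</title>
  <chapter n="1"><heading>Introduction</heading><para>Why structure matters.</para></chapter>
</thesis>"""

Rule = Callable[[ET.Element], str]


def apply_templates(elem: ET.Element) -> str:
    """Process text and children in document order, like xsl:apply-templates."""
    parts = [escape(elem.text or "")]
    for child in elem:
        parts.append(process(child))
        parts.append(escape(child.tail or ""))
    return "".join(parts)


# One "template rule" per element name, mapping thesis markup onto HTML.
RULES: dict[str, Rule] = {
    "thesis": lambda e: f"<html><body>{apply_templates(e)}</body></html>",
    "title": lambda e: f"<h1>{apply_templates(e)}</h1>",
    "chapter": lambda e: f'<div class="chapter" id="ch{e.get("n")}">{apply_templates(e)}</div>',
    "heading": lambda e: f"<h2>{apply_templates(e)}</h2>",
    "para": lambda e: f"<p>{apply_templates(e)}</p>",
}


def process(elem: ET.Element) -> str:
    """Dispatch to the matching rule; otherwise fall back to the built-in behaviour."""
    rule = RULES.get(elem.tag)
    return rule(elem) if rule else apply_templates(elem)


def transform(xml_text: str) -> str:
    return process(ET.fromstring(xml_text))


if __name__ == "__main__":
    print(transform(SAMPLE))
```

Swapping the RULES table for one that emits LaTeX commands instead of HTML tags would target another output language, which is the same idea as writing a different XSLT stylesheet for the same source document.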
Example of a Printing on Demand (POD) Service Based on Stylesheets
Digital libraries that use their document servers as long-term electronic archives will not make printed information dispensable. On the contrary, users of these information systems increasingly want printed documents. In most cases, however, they want not the whole document but particular parts of it, such as chapters or citations. For that reason, the printing on demand project of the Humboldt-University Berlin aims to develop a technology that allows users to print only the desired part of a given document.
Printing on Demand with XML
Apache Cocoon was chosen for the XML printing on demand component. It uses an XSLT engine to produce an HTML or PDF version on the fly. "The Cocoon Project is an Open Source volunteer project under the auspices of the Apache Software Foundation (ASF), and, in harmony with the Apache webserver itself, it is released under a very open license. Even if the most common use of Cocoon is the automatic creation of HTML through the processing of statically or dynamically generated XML files, Cocoon is also able to perform more sophisticated formatting, such as XSL:FO rendering to PDF files, client-dependent transformations such as WML formatting for WAP-enabled devices, or direct XML serving to XML and XSL aware clients."
As Cocoon does not include a printing on demand component (in particular, a selection feature), a small workaround using different XSLT stylesheets had to be created. The user's view, an HTML view of the actual document, includes check boxes with which the user can select parts of a specific document. This view is produced by the XSLT-Broker stylesheet, which calls a default stylesheet that produces HTML (XSLT stylesheet with the option document.xml?format=html). When the user has selected parts of the document with the check boxes and clicks the "OK" button, a Perl script (PHP-Choise) is called. This script selects the desired chapters and sections of the document by means of XPath expressions, requested through URLs such as http://dochost.rz.huberlin.de/proprint/bsp/slides.xml?CHAPTER=3 and http://dochost.rz.huberlin.de/proprint/bsp/slides.xml?CHAPTER=4, cuts those parts out of the document and holds them in main memory. This step is carried out by the XSLT-Broker stylesheet, now called with the XML option (document.xml?format=xml). The selected parts are combined into one single XML document (all in main memory!) and processed by the XSLT-Broker stylesheet with either the print option or the HTML option (document.xml?format=pdf or document.xml?format=html).
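The selection step described above can be sketched in Python under stated assumptions. The structure of slides.xml is not given, so the sketch assumes a flat list of chapter elements numbered with an n attribute. It reads the CHAPTER and format parameters from a query string, copies the matching chapters with an ElementTree path expression (ElementTree supports only a small subset of XPath), and merges them into one in-memory document. The root element name selection and the function names are hypothetical, and rendering to HTML or PDF is deliberately left out because it would require the broker's XSLT and XSL-FO stylesheets.

```python
import copy
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs

# Hypothetical stand-in for slides.xml; the real document structure is not specified.
DOCUMENT = """<document>
  <chapter n="1"><title>One</title></chapter>
  <chapter n="2"><title>Two</title></chapter>
  <chapter n="3"><title>Three</title></chapter>
  <chapter n="4"><title>Four</title></chapter>
</document>"""


def select_chapters(doc: ET.Element, numbers: list[str]) -> ET.Element:
    """Copy the requested chapters, in request order, into one new in-memory document."""
    selection = ET.Element("selection")  # hypothetical root element for the merged parts
    for n in numbers:
        # ElementTree's XPath subset: child step plus attribute-value predicate.
        for chapter in doc.findall(f"chapter[@n='{n}']"):
            clone = copy.deepcopy(chapter)
            clone.tail = None  # drop the source document's inter-element whitespace
            selection.append(clone)
    return selection


def handle_request(query: str) -> str:
    """Read CHAPTER/format parameters and return the merged selection as XML."""
    params = parse_qs(query)
    # Accept only numeric chapter ids so the value is safe to embed in the path expression.
    numbers = [n for n in params.get("CHAPTER", []) if n.isdigit()]
    fmt = params.get("format", ["xml"])[0]
    merged = select_chapters(ET.fromstring(DOCUMENT), numbers)
    if fmt == "xml":
        return ET.tostring(merged, encoding="unicode")
    # format=html or format=pdf would pass `merged` to a rendering stylesheet (not sketched).
    raise NotImplementedError(f"format={fmt!r} requires an XSLT or XSL-FO rendering step")


if __name__ == "__main__":
    print(handle_request("CHAPTER=3&CHAPTER=4&format=xml"))
```

Requesting CHAPTER=3&CHAPTER=4&format=xml returns a single selection element containing chapters 3 and 4, which a real system would then hand to the stylesheet that produces the HTML or print version.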