XML - Managing Data Exchange/A single entity







Learning objectives


  • introduce XML documents, schemas, and stylesheets
  • describe and create an XML document
  • describe and create an XML schema
  • describe and create an XML stylesheet


Introduction


In this chapter, we start to practice working with XML using XML documents, schemas, and stylesheets. An XML document organizes data and information in a structured, hierarchical format. An XML schema provides standards and rules for the structure of a given XML document; because sender and receiver can both check a document against the same schema, a schema also supports reliable data transfer. An XSL stylesheet (XSL stands for Extensible Stylesheet Language) allows different presentations of the material found within an XML document.

In the first chapter, Introduction to XML, you learned what XML is, why it is useful, and how it is used. So, now you want to create your very own XML documents. In this chapter, we will show you the basic components used to create an XML document. This chapter is the foundation for all subsequent chapters--it is a little lengthy, but don't be intimidated. We will take you through the fundamentals of XML documents.


This chapter is divided into three parts:

  • XML Document
  • XML Schema
  • XML Stylesheets (XSL)


As you learned in the previous chapter, the XML Schema and Stylesheet are essentially specialized XML Documents. Within each of these three parts we will examine the layout and components required to create the document. There are links at the end of the XML document, schema, and stylesheet sections that show you how to create the documents using an XML editor. At the bottom of the page there is a link to Exercises for this chapter and a link to the Answers.

The first thing you will need before starting to create XML documents is a problem--something you want to solve by using XML to store and share data or information. You need some entity you can collect information about and then access in a variety of formats. So, we created one for you.

To develop an XML document and schema, start with a data model depicting the reality of the actual data that is exchanged. Once a high fidelity model has been created, the data model can be readily converted to an XML document and schema. In this chapter, we start with a very simple situation and in successive chapters extend the complexity to teach you more features of XML.

Our starting point is a single entity, CITY, which is shown in the following figure. While our focus is on this single entity, to map CITY to an XML schema, we need to have an entity that contains CITY. In this case, we have created TOURGUIDE. Think of a TOURGUIDE as containing many cities, and in this case TOURGUIDE has no attributes nor an identifier. It is just a container for data about cities.


Exhibit 1: Data model - Tourguide

 


XML document


An XML document is a file containing XML code and syntax. XML documents have an .xml file extension.

We will examine the features & components of the XML document.


  • Prologue (XML Declaration)
  • Elements
  • Attributes
  • Rules to follow
  • Well-formed & Valid XML documents


Below is a sample XML document using our TourGuide model. We will refer to it as we describe the parts of an XML document.

Exhibit 2: XML document for city entity

  <?xml version="1.0" encoding="UTF-8"?>
  <tourGuide xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
    xsi:noNamespaceSchemaLocation='city.xsd'>
    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>
        <population>11100</population>
        <area>5</area>
        <elevation>130</elevation>
        <longitude>88.44</longitude>
        <latitude>17.27</latitude>
        <description>Belmopan is the capital of Belize</description>
        <history>Belmopan was established following the devastation of the
           former capital, Belize City, by Hurricane Hattie in 1961. High 
           ground and open space influenced the choice and ground-breaking 
           began in 1966.  By 1970 most government offices and operations had 
           already moved to the new location.
        </history>
    </city>
    <city>
        <cityName>Kuala Lumpur</cityName>
        <adminUnit>Selangor</adminUnit>
        <country>Malaysia</country>
        <population>1448600</population>
        <area>243</area>
        <elevation>111</elevation>
        <longitude>101.71</longitude>
        <latitude>3.16</latitude>
        <description>Kuala Lumpur is the capital of Malaysia and the largest 
            city in the nation</description>
        <history>The city was founded in 1857 by Chinese tin miners and  
            preceded Klang.  In 1880 the British government transferred their 
            headquarters from Klang to Kuala Lumpur, and in 1896 it became the 
            capital of Malaysia. 
        </history>
    </city>
    <city>
        <cityName>Winnipeg</cityName>
        <adminUnit>St. Boniface</adminUnit>
        <country>Canada</country>
        <population>618512</population>
        <area>124</area>
        <elevation>40</elevation>
        <longitude>97.14</longitude>
        <latitude>49.54</latitude>
        <description>Winnipeg has two seasons. Winter and Construction.</description>
        <history>The city was founded by people at the forks (Fort Garry)
         trading in pelts with the Hudson Bay Company. Ironically, 
         The Bay was bought by America.
        </history>
    </city>
  </tourGuide>

Prologue (XML declaration)

The XML document starts off with the prologue. The prologue informs both a reader and the computer of certain specifications that make the document XML compliant. In this basic XML document the prologue consists of a single line, the XML declaration.

Exhibit 3: XML document - prologue

     <?xml version="1.0" encoding="UTF-8"?>

xml   =   this is an XML document
version="1.0"   =   the XML version (XML 1.0 is the W3C-recommended version)
encoding="UTF-8"   =   the character encoding used in the document - UTF-8 is a Unicode encoding that stores each character in one to four 8-bit bytes (it is the standard way to encode international documents) - Unicode provides a unique number for every character.
Another potential attribute of the XML declaration:
standalone="yes"   =   the dependency of the document ('yes' indicates that the document does not rely on external markup declarations, such as an external DTD, to complete its content)
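
As a small illustration of the optional attribute, a declaration that states all three settings could be written as shown below. The <note> element is a hypothetical placeholder, included only so that the example is a complete document.

```xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<!-- 'note' is a made-up root element; a document needs at least one element -->
<note>This document declares no external dependencies.</note>
```

When standalone is present, it comes after version and encoding, in that order.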

Elements


The majority of what you see in the XML document consists of XML elements. Elements are identified by their tags that open with < or </ and close with > or />. The start tag looks like this: <element attribute="value">, with a left angle bracket (<) followed by the element type name, optional attributes, and finally a right angle bracket (>). The end tag looks like this: </element>, similar to the start tag, but with a slash (/) between the left angle bracket and the element type name, and no attributes.

When there's nothing between a start tag and an end tag, XML allows you to combine them into an empty element tag, which can include everything a start tag can: <img src="Belize.gif" />. This one tag must be closed with a slash and right angle bracket (/>), so that it can be distinguished from a start tag.

The XML document is designed around a major theme, an umbrella concept covering all other items and subjects; this theme is analyzed to determine its component parts, creating categories and subcategories. The major theme and its component parts are described by elements. In our sample XML document, 'tourGuide' is the major theme; 'city' is a category; 'population' is a subcategory of 'city'; and the hierarchy may be carried even further: 'males' and 'females' could be subcategories of 'population'. Elements follow several rules of syntax that will be described in the Rules to Follow section.


We left out the attributes within the <tourGuide> start tag — that part will be explained in the XML Schema section.

Exhibit 4: Elements of the city entity XML document

  <tourGuide>
    <city>
        <cityName>Belmopan</cityName>
        <adminUnit>Cayo</adminUnit>
        <country>Belize</country>
        <population>11100</population>
        <area>5</area>
        <elevation>130</elevation>
        <longitude>88.44</longitude>
        <latitude>17.27</latitude>
        <description>Belmopan is the capital of Belize</description>
        <history>Belmopan was established following the devastation of the
           former capital, Belize City, by Hurricane Hattie in 1961. High 
           ground and open space influenced the choice and ground-breaking 
           began in 1966.  By 1970 most government offices and operations had 
           already moved to the new location.
        </history>
    </city>
  </tourGuide>


Element hierarchy

  • root element  -   This is the XML document's major theme element. Every document must have exactly one root element. All other elements are contained within this one root element. The root element follows the XML declaration. In our example, <tourGuide> is the root element.
  • parent element  -   This is any element that contains other elements, the child elements. In our example, <city> is a parent element.
  • child element  -   This is any element that is contained within another element, the parent element. In our example, <population> is a child element of <city>.
  • sibling element  -   These are elements that share the same parent element. In our example, <cityName>, <adminUnit>, <country>, <population>, <area>, <elevation>, <longitude>, <latitude>, <description>, and <history> are all sibling elements.


Attributes

Attributes aid in modifying the content of a given element by providing additional or required information. They are contained within the element's opening tag. In our sample XML document code we could have taken advantage of attributes to specify the unit of measure used to determine the area and the elevation (it could be feet, yards, meters, kilometers, etc.); in this case, we could have called the attribute 'measureUnit' and defined it within the opening tag of 'area' and 'elevation'. The lines below show another possibility: a 'class' attribute that records what kind of administrative unit each <adminUnit> is.


       <adminUnit class="state">Cayo</adminUnit>
       <adminUnit class="region">Selangor</adminUnit>


The above attribute example can also be written as:


1. using child elements

     <adminUnit>
          <class>state</class>
          <name>Cayo</name>
     </adminUnit>
     <adminUnit>
          <class>region</class>
          <name>Selangor</name>
     </adminUnit>

2. using an empty element

    <adminUnit class="state" name="Cayo" />
    <adminUnit class="region" name="Selangor" />


Attributes can be used to:

  • provide more information that is not defined in the data
  • define a characteristic of the element (size, color, style)
  • ensure the inclusion of information about an element in all instances

Attributes can, however, be a bit more difficult to manipulate and they have some constraints. Consider using a child element if you need more freedom.
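
Returning to the unit-of-measure idea from the start of this section, a <city> fragment might carry the units as attributes. This is only a sketch: the 'measureUnit' attribute and the values 'squareKilometers' and 'meters' are assumptions made for illustration, and they are not part of the city.xsd schema used in this chapter.

```xml
<city>
    <cityName>Belmopan</cityName>
    <!-- measureUnit is a hypothetical attribute; its values are illustrative only -->
    <area measureUnit="squareKilometers">5</area>
    <elevation measureUnit="meters">130</elevation>
</city>
```

The element content stays a plain number, which keeps it easy to type-check, while the attribute describes how to read that number.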


Rules to follow


These rules are designed to aid the computer reading your XML document.

  • If an XML declaration (the prologue) is present, it must be the first line of the XML document, with nothing before it.
  • The main theme of the XML document is established in the root element and all other elements must be contained within the opening and closing tags of this root element.
  • Every element must have an opening tag and a closing tag; the only shortcut is the empty element tag described above, which combines the two

(e.g. <element>data stuff</element> or <element />).

  • Tags must be nested in a particular order

=> the parent element's opening and closing tags must contain all of its child elements' tags; in this way, you close first the tag that was opened last:

<parentElement>
      <childElement1>data</childElement1>
      <childElement2>
              <subChildElementA>data</subChildElementA>
              <subChildElementB>data</subChildElementB>
      </childElement2>
      <childElement3>data</childElement3>
</parentElement>

  • Attribute values must have quotation marks around them (single or double); attribute names may not contain spaces.
  • Empty tags or empty elements must have a slash (/) at the end of the tag, just before the right angle bracket; a space before the slash is optional.
  • Comments in the XML language begin with "<!--" and end with "-->".
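
The rules above can be seen together in one small document. It reuses element names from this chapter; the 'class' attribute on <adminUnit> is the one discussed in the Attributes section, and the <flag> element is hypothetical (modelled on the <img> example given earlier).

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- A comment: the declaration above is the first line of the file -->
<tourGuide>
    <city>
        <cityName>Belmopan</cityName>
        <!-- attribute value enclosed in quotation marks -->
        <adminUnit class="state">Cayo</adminUnit>
        <!-- empty element: start and end tag combined, closed with a slash -->
        <flag src="Belize.gif" />
    </city>
</tourGuide>
```

Each tag that is opened is closed before its parent is closed, so the nesting rule is satisfied as well.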


XML Element Naming Convention


Any name can be used but the idea is to make names meaningful to those who might read the document.

  • XML element names may only start with either a letter or an underscore character.
  • The name must not start with the string "xml" (in any combination of upper and lower case), which is reserved for the XML specification.
  • The name may not contain spaces.
  • The ":" should not be used in element names because it is reserved to be used for namespaces (This will be covered in more detail in a later chapter).
  • After the first character, the name may contain a mixture of letters, digits, hyphens, underscores, and periods.


XML documents often have a corresponding database. The database will contain fields which correspond to elements in the XML document. A good practice is to use the naming rules of your database for the elements in the XML documents.

DTD (Document Type Definition) Validation - Simple Example

Simple Internal DTD
 <?xml version="1.0"?>
 <!DOCTYPE cdCollection [
    <!ELEMENT cdCollection (cd)>
    <!ELEMENT cd (title, artist, year)>
    <!ELEMENT title (#PCDATA)>
    <!ELEMENT artist (#PCDATA)>
    <!ELEMENT year (#PCDATA)>
 ]>
 <cdCollection>
  <cd>
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>
 </cdCollection>

Every element that will be used MUST be included in the DTD. Don’t forget to include the root element, even though you have already specified it at the beginning of the DTD. You must specify it again, in an <!ELEMENT> tag.

 <!ELEMENT cdCollection (cd)>

The root element, <cdCollection>, contains all the other elements of the document, but only one direct child element: <cd>. Therefore, you need to specify the child element (only direct child elements need to be specified) in the parentheses.

 <!ELEMENT cd (title, artist, year)>

With this line, we define the <cd> element. Note that this element contains the child elements <title>, <artist>, and <year>. These are spelled out in a particular order. This order must be followed when creating the XML document. If you change the order of the elements (with this particular DTD), the document won’t validate.

 <!ELEMENT title (#PCDATA)>

The remaining three tags, <title>, <artist>, and <year> don’t actually contain other tags. They do however contain some text that needs to be parsed. You may remember from an earlier lecture that this data is called Parsed Character Data, or #PCDATA. Therefore, #PCDATA is specified in the parentheses.

So this simple DTD outlines exactly what you see here in the XML file. Nothing can be added or taken away, as long as we stick to this DTD. The only thing you can change is the #PCDATA text part between the tags.

Adding complexity

There may be times when you will want to put more than just character data, or more than just child elements into a particular element. This is referred to as mixed content. For example, let’s say you want to be able to put character data OR a child element, such as the <b> tag into a <description> element:

 <!ELEMENT description (#PCDATA | b | i )*>

This particular arrangement allows us to use PCDATA, the <b> tag, and the <i> tag in any combination and any order. One particular caveat, though, is that if you are going to mix PCDATA and other elements, #PCDATA must come first in the group and the grouping must be followed by the asterisk (*) suffix. This declaration allows us to now add the following to the XML document (after defining the individual elements, of course):

  <cd>
    <title>Love. Angel. Music. Baby</title>
    <artist>Gwen Stefani</artist>
    <year>2004</year>
    <genre>pop</genre>
    <description>
      This is a great album from former  
      <i>No Doubt</i> singer <b>Gwen Stefani</b>.
    </description>
  </cd>
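
For the <cd> above to validate, every element it uses has to be declared. A complete document might look like the following sketch; the content model chosen for <cd> (title, artist, year, genre, description) is an assumption that extends the earlier DTD.

```xml
<?xml version="1.0"?>
<!DOCTYPE cdCollection [
  <!ELEMENT cdCollection (cd)>
  <!-- assumed content model: the earlier three children plus genre and description -->
  <!ELEMENT cd (title, artist, year, genre, description)>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
  <!ELEMENT genre (#PCDATA)>
  <!-- mixed content: text interleaved with b and i elements in any order -->
  <!ELEMENT description (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<cdCollection>
  <cd>
    <title>Love. Angel. Music. Baby</title>
    <artist>Gwen Stefani</artist>
    <year>2004</year>
    <genre>pop</genre>
    <description>
      This is a great album from former
      <i>No Doubt</i> singer <b>Gwen Stefani</b>.
    </description>
  </cd>
</cdCollection>
```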

Attributes are declared a little differently than elements. Suppose we add a remaster_date attribute to the <cd> element:

  <cd remaster_date="1992">
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>

In order for this to validate, it must be specified in the DTD. Attribute content models are specified with:

 <!ATTLIST element_name attribute_name attribute_type default_value>

Let’s use this to validate our CD example:

 <!ATTLIST cd remaster_date CDATA #IMPLIED>
Choices
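
An attribute can be restricted to a fixed list of choices, with a default value given in quotation marks:

 <!ATTLIST person gender (male|female) "male">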
Grouping Attributes for an Element

If a particular element is to have many different attributes, group them together like so:

<!ATTLIST car horn CDATA #REQUIRED
             seats CDATA #REQUIRED
     steeringwheel CDATA #REQUIRED
             price CDATA #IMPLIED>
Adding STATIC validation, for items that must have a certain value
<!ATTLIST classList   classNumber CDATA #IMPLIED
                      building (UWINNIPEG_DCE|UWINNIPEG_MAIN) "UWINNIPEG_MAIN"
                      originalDeveloper CDATA #FIXED "Khal Shariff">

Suffixes

So what happens with our last example with the CD collection, when we want to add more CDs? With the current DTD, we cannot add any more CDs without getting an error. Try it and see. When you specify a child element (or elements) the way we did, only one of each child element can be used. Not very suitable for a CD collection is it? We can use something called suffixes to add functionality to the <!ELEMENT> tag. Suffixes are added to the end of the specified child element(s). There are 3 main suffixes that can be used:

  • ( No suffix ): Only 1 child can be used.
  • ( + ): One or more elements can be used.
  • ( * ): Zero or more elements can be used.
  • ( ? ): Zero or one element may be used.
Validating for multiple children with a DTD

So in the case of our CD collection XML file, we can add more CDs to the list by adding a + suffix:

<!ELEMENT cdCollection (cd+)>
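
Putting the + suffix together with the attribute declaration from earlier, a collection holding two CDs could be sketched as follows. The choice of the second CD and the combination of declarations are assumptions made for illustration.

```xml
<?xml version="1.0"?>
<!DOCTYPE cdCollection [
  <!-- the + suffix: one or more cd children are allowed -->
  <!ELEMENT cdCollection (cd+)>
  <!ELEMENT cd (title, artist, year)>
  <!-- #IMPLIED: the attribute may be left out, as on the second cd -->
  <!ATTLIST cd remaster_date CDATA #IMPLIED>
  <!ELEMENT title (#PCDATA)>
  <!ELEMENT artist (#PCDATA)>
  <!ELEMENT year (#PCDATA)>
]>
<cdCollection>
  <cd remaster_date="1992">
    <title>Dark Side of the Moon</title>
    <artist>Pink Floyd</artist>
    <year>1973</year>
  </cd>
  <cd>
    <title>Love. Angel. Music. Baby</title>
    <artist>Gwen Stefani</artist>
    <year>2004</year>
  </cd>
</cdCollection>
```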
Using more internal formatting tags

Inline formatting tags, such as <b> for bold, are also declared in the DTD as elements. Because they appear inside a mixed-content group, they are optional:

<!ELEMENT notes (#PCDATA | b | i)*>
   <!ELEMENT b (#PCDATA)*>
   <!ELEMENT i (#PCDATA)*>
]>

_______________

<classList classNumber="303" building="UWINNIPEG_DCE" originalDeveloper="Khal Shariff">
 <student>
   <firstName>Kenneth
   </firstName>
   <lastName>Branaugh
   </lastName>
   <studentNumber>
   </studentNumber>
   <notes><b>Excellent </b>, Kenneth is doing well.
   </notes>
etc
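
The fragments above can be assembled into one complete document. This is a sketch: the content models for <classList> and <student> are assumptions, since only part of the DTD is shown above, and the <student> record is closed here where the fragment trails off.

```xml
<?xml version="1.0"?>
<!DOCTYPE classList [
  <!-- assumed: a class list holds one or more students -->
  <!ELEMENT classList (student+)>
  <!ATTLIST classList   classNumber CDATA #IMPLIED
                        building (UWINNIPEG_DCE|UWINNIPEG_MAIN) "UWINNIPEG_MAIN"
                        originalDeveloper CDATA #FIXED "Khal Shariff">
  <!-- assumed: the four children appear in this order -->
  <!ELEMENT student (firstName, lastName, studentNumber, notes)>
  <!ELEMENT firstName (#PCDATA)>
  <!ELEMENT lastName (#PCDATA)>
  <!ELEMENT studentNumber (#PCDATA)>
  <!ELEMENT notes (#PCDATA | b | i)*>
  <!ELEMENT b (#PCDATA)>
  <!ELEMENT i (#PCDATA)>
]>
<classList classNumber="303" building="UWINNIPEG_DCE" originalDeveloper="Khal Shariff">
  <student>
    <firstName>Kenneth</firstName>
    <lastName>Branaugh</lastName>
    <studentNumber></studentNumber>
    <notes><b>Excellent</b>, Kenneth is doing well.</notes>
  </student>
</classList>
```

Because originalDeveloper is declared #FIXED, any value other than "Khal Shariff" would make the document invalid; leaving the attribute out entirely is still allowed.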

Case Study on BMEcat


One of the first major national projects for the use of XML as a B2B exchange format was initiated by the German federal association for materials management, purchasing and logistics (BME) in cooperation with leading German companies, e.g. Bayer, BMW, SAP and Siemens. Together they created a standard for the exchange of product catalogues. This project was named BMEcat. The result of this initiative is a DTD collection for the description of product catalogues and related transactions (new catalogue, updating of product data and updating of prices).

Companies operating in electronic commerce (suppliers, purchasing companies and market places) exchange increasingly large amounts of data. The variety of data exchange formats quickly pushes them to their limits. The BMEcat solution creates a basis for a straightforward transfer of catalogue data from various data formats. This lays the foundation for advancing the trade of goods over the Internet in Germany. The use of BMEcat reduces the costs for all parties, as standard interfaces can be used.

The XML-based standard BMEcat has been successfully implemented in many projects. Today a variety of companies use BMEcat to exchange their product catalogues in this established standard.


A BMEcat catalogue (Version 1.2) consists of the following main elements:

CATALOG  -  This element contains the essential information of a shopping catalog, e.g. language version and validity. BMEcat expects exactly one language per catalog.

SUPPLIER  -  This element includes the identification and address of the catalog supplier. BMEcat expects exactly one supplier per catalog.

BUYER  -  This element contains the name and address of the catalogue recipient. BMEcat expects no more than one recipient per catalog.

AGREEMENT  -  This element contains one or more framework agreement IDs associated with the appropriate validity period. BMEcat expects all prices in a catalogue to belong to the framework agreements named here.

CLASSIFICATION SYSTEM  -  This element allows the full transfer of one or more classification systems, including feature definitions and key words.

CATALOG GROUP SYSTEM  -  This element originates from version 1.0. It is mainly used for the transfer of tree structures that help a user navigate in the target system (browser).

ARTICLE (since 2005 PRODUCT)  -  This element represents a product. It contains a set of standard attributes.

ARTICLE PRICE (since 2005 PRODUCT PRICE)  -  This element represents a price. The support for different pricing models is very powerful in comparison with other exchange formats: seasonal prices, country prices, different currencies, different validity periods, etc. are supported.

ARTICLE FEATURE (since 2005 PRODUCT FEATURE)  -  This element allows the transfer of characteristic values. You can either record predefined group characteristics or individual product characteristics.

VARIANT  -  This element allows product variants to be listed without duplicating the product. However, variants in BMEcat apply only to individual changes in value that lead to a change of article ID. Beyond that, a variant cannot depend on other attributes (prices in particular).

MIME  -  This element includes any number of additional documents such as product images, data sheets, or websites.

ARTICLE REFERENCE (since 2005 REFERENCE PRODUCT)  -  This element allows cross-referencing between articles within a catalogue as well as between catalogues. These references can be used, to a limited extent, for mapping product bundles.

USER DEFINED EXTENSION  -  This element enables the transport of data that lies outside the BMEcat standard. The sender and receiver have to coordinate its use.

You can find a typical BMEcat file here.

ONLINE Validator

To validate a document against a DTD online, search the web for an online XML validator.

Well-formed and valid XML


Well-formed XML  -  An XML document that correctly abides by the rules of XML syntax.

Valid XML  -  An XML document that adheres to the rules of an XML schema (which we will discuss shortly). To be valid an XML document must first be well-formed.


A Valid XML Document must be Well-formed. But, a Well-formed XML Document might not be valid - in other words, a well-formed XML document, that meets the criteria for XML syntax, might not meet the criteria for the XML schema, and will therefore be invalid.

For example, think of the situation where your XML document contains the following (for this schema):

  <city>
    <cityName>Boston</cityName>
    <country>United States</country>
    <adminUnit>Massachusetts</adminUnit>
  :
  :
  :
  </city>

Notice that the elements do not appear in the correct sequence according to the schema (cityName, adminUnit, country). The XML document can be validated (using validation software) against its declared schema – the validation software would then catch the out of sequence error.


Using an XML Editor


Check chapter XML Editor for instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML document and paste it into the XML editor. Then check your results. Is the XML document well-formed? Is the XML document valid? (you will need to have copied and pasted the schema in order to validate - we will look at schemas next)


XML schema


An XML schema is an XML document. XML schemas have an .xsd file extension.

An XML schema is used to govern the structure and content of an XML document by providing a template for XML documents to follow in order to be valid. It is a guide for how to structure your XML document as well as indicating your XML document's components (elements and attributes - and their relationships). An XML editor will examine an XML document to ensure that it conforms to the specifications of the XML schema it is written against - to ensure it is valid.

XML schemas engender confidence in data transfer. With schemas, the receiver of data can feel confident that the data conforms to expectations. The sender and the receiver have a mutual understanding of what the data represent.

Because an XML schema is an XML document, you use the same language - standard XML markup syntax - with elements and attributes specific to schemas.


A schema defines:

  • the structure of the document
  • the elements
  • the attributes
  • the child elements
  • the number of child elements
  • the order of elements
  • the names and contents of all elements
  • the data type for each element

For more detailed information on XML schemas and reference lists of: Common XML Schema Primitive Data Types, Summary of XML Schema Elements, Schema Restrictions and Facets for data types, and Instance Document Attributes, click on this wikibook link => http://en.wikibooks.org/wiki/XML_Schema


Schema reference


This is the part of the XML Document that references an XML Schema:

Exhibit 5: XML document's schema reference

  <tourGuide
      xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'
      xsi:noNamespaceSchemaLocation='city.xsd'>

This is the part we left out when we described the root element in the basic XML document from the previous section. The additional attributes of the root element <tourGuide> reference the XML schema (through the noNamespaceSchemaLocation attribute).

xmlns:xsi='http://www.w3.org/2001/XMLSchema-instance'  -  references the W3C Schema-instance namespace
xsi:noNamespaceSchemaLocation='city.xsd'  -  references the XML schema document (city.xsd)

Schema document


Below is a sample XML schema using our TourGuide model. We will refer to it as we describe the parts of an XML schema.

Exhibit 6: XML schema document for city entity

  <?xml version="1.0" encoding="UTF-8"?>
  <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"
   elementFormDefault="unqualified">  
    <xsd:element name="tourGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="city" type="cityDetails" minOccurs = "1" maxOccurs="unbounded" />
            </xsd:sequence>
        </xsd:complexType>
     </xsd:element>
     <xsd:complexType name="cityDetails">
        <xsd:sequence> 
             <xsd:element name="cityName" type="xsd:string"/>
             <xsd:element name="adminUnit" type="xsd:string"/>
             <xsd:element name="country" type="xsd:string"/>
             <xsd:element name="population" type="xsd:integer"/>
             <xsd:element name="area" type="xsd:integer"/>
             <xsd:element name="elevation" type="xsd:integer"/>
             <xsd:element name="longitude" type="xsd:decimal"/>
             <xsd:element name="latitude" type="xsd:decimal"/>
             <xsd:element name="description" type="xsd:string"/>
             <xsd:element name="history" type="xsd:string"/>
         </xsd:sequence>
     </xsd:complexType>
  </xsd:schema>
  <!--
    Note: Latitude and Longitude are decimal data types.
    The conversion is from the usual form (e.g., 50° 17' 35")
    to a decimal by using the formula degrees+min/60+secs/3600.
  -->
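
As a worked instance of the formula in the comment above, using the comment's own example value of 50 degrees, 17 minutes and 35 seconds:

```latex
50 + \frac{17}{60} + \frac{35}{3600} = 50 + 0.28333\ldots + 0.00972\ldots \approx 50.2931
```

The result is a single decimal number, which is why the schema can declare latitude and longitude as xsd:decimal.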


Prolog


Remember that the XML schema is essentially an XML document and therefore must begin with the prolog, which in the case of a schema includes:

  • the XML declaration
  • the schema element declaration


The XML declaration:

  <?xml version="1.0" encoding="UTF-8"?>

The schema element declaration:

<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema" elementFormDefault="unqualified">

The schema element is similar to a root element - it contains all other elements in the schema.

Attributes of the schema element include:

xmlns  -  XML NameSpace - a URI that identifies the vocabulary (the XML elements and data types) used in the schema. It serves as a unique name and does not have to point to an actual web page.

You can find more about namespaces here => Namespace.

xmlns:xsd  -  All the elements and attributes with the 'xsd' prefix adhere to the vocabulary designated in the given namespace.

elementFormDefault  -  indicates whether locally declared elements from the schema's target namespace must be qualified with the namespace in the XML document ('qualified') or not ('unqualified', the default). This is mostly useful when more than one namespace is referenced: in that case, 'elementFormDefault' should be 'qualified', so that each element indicates which namespace it belongs to. If you are referencing only one namespace, then 'elementFormDefault' can be 'unqualified'. Using 'qualified' as the default is perhaps most prudent; this way you do not accidentally forget to indicate which namespace you are referencing.

Element declarations


Define the elements in the schema.

Include:

  • the element name
  • the element data type (optional)

Basic element declaration format: <xsd:element name="name" type="type" />

Simple type

declares elements that:

  • do NOT have Child Elements
  • do NOT have Attributes

example: <xsd:element name="cityName" type="xsd:string" />

Default Value

If an element is not assigned a value then the default value is assigned.

example: <xsd:element name="description" type="xsd:string" default="really cool place to visit!" />

Fixed Value

An element that is defined as fixed must be empty or contain the specified fixed value. No other values are allowed.

example: <xsd:element name="description" type="xsd:string" fixed="you must visit this place - it is awesome!" />
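
To see the two options side by side, here is a minimal schema sketch. Declaring <description> and <country> as top-level elements, and fixing <country> to "Belize", are assumptions made only for this illustration; the comments describe how a validating parser would be expected to treat matching elements.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
    <!-- default: an empty description element is treated as if it held the default text -->
    <xsd:element name="description" type="xsd:string"
                 default="really cool place to visit!" />
    <!-- fixed: country may be empty or contain exactly Belize; other text is rejected -->
    <xsd:element name="country" type="xsd:string" fixed="Belize" />
</xsd:schema>
```

A declaration can carry either default or fixed, but not both.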

Complex type

declares elements that:

  • can have Child Elements
  • can have Attributes

examples:

1. The root element 'tourGuide' contains a child element 'city'. This is shown here:

Nameless complex type

     <xsd:element name="tourGuide">
        <xsd:complexType>
            <xsd:sequence>
                <xsd:element name="city" type="cityDetails" minOccurs = "1" maxOccurs="unbounded" />
            </xsd:sequence>
        </xsd:complexType>
     </xsd:element>

Occurrence Indicators:

  • minOccurs = the minimum number of times an element can occur (here it is 1 time)
  • maxOccurs = the maximum number of times an element can occur (here it is an unlimited number of times, 'unbounded')


2. The parent element 'city' contains many child elements: 'cityName', 'adminUnit', 'country', 'population', etc. Why does this complex element set not start with the line: <xsd:element name="city" type="cityDetails">? The element 'city' was already defined above within the complex element 'tourGuide' and it was given the type, 'cityDetails'. This data type, 'cityDetails', is utilized here in identifying the sequence of child elements for the parent element 'city'.

Named Complex Type - and therefore can be reused in other parts of the schema

   <xsd:complexType name="cityDetails">
        <xsd:sequence>
             <xsd:element name="cityName" type="xsd:string"/>
             <xsd:element name="adminUnit" type="xsd:string"/>
             <xsd:element name="country" type="xsd:string"/>
             <xsd:element name="population" type="xsd:integer"/>
             <xsd:element name="area" type="xsd:integer"/>
             <xsd:element name="elevation" type="xsd:integer"/>
             <xsd:element name="longitude" type="xsd:decimal"/>
             <xsd:element name="latitude" type="xsd:decimal"/>
             <xsd:element name="description" type="xsd:string"/>
             <xsd:element name="history" type="xsd:string"/>
         </xsd:sequence>
   </xsd:complexType>

The <xsd:sequence> tag indicates that the child elements must appear in the order, the sequence, specified here.

Compare the sample XML Schema and the sample XML Document - try to observe patterns in the code and how the XML Schema sets up the XML Document.


3. Elements that have attributes are also designated as complex type.

a. this XML Document line: <adminUnit class="state" name="Cayo" /> would be defined in the XML Schema as:

     <xsd:element name="adminUnit">
          <xsd:complexType>
               <xsd:attribute name="class" type="xsd:string" />
               <xsd:attribute name="name" type="xsd:string" />
          </xsd:complexType>
     </xsd:element>

b. this XML Document line: <adminUnit class="state">Cayo</adminUnit> would be defined in the XML Schema as:

     <xsd:element name="adminUnit">
          <xsd:complexType>
               <xsd:simpleContent>
                    <xsd:extension base="xsd:string">
                         <xsd:attribute name="class" type="xsd:string" />
                    </xsd:extension>
               </xsd:simpleContent>
          </xsd:complexType>
     </xsd:element>

Attribute declarations

Attribute declarations are used in complex type definitions. We saw some attribute declarations in the third example of the Complex Type Element.

<xsd:attribute name="class" type="xsd:string" />
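
An attribute declaration can also say whether the attribute is required and what value it takes when it is omitted. The following is a sketch only: the 'use' and 'default' attributes are standard XML Schema, but the 'lang' attribute and its default value are hypothetical and do not appear in the sample schema.

```xml
<xsd:element name="adminUnit">
     <xsd:complexType>
          <xsd:simpleContent>
               <xsd:extension base="xsd:string">
                    <!-- use="required": every adminUnit element must carry a class attribute -->
                    <xsd:attribute name="class" type="xsd:string" use="required" />
                    <!-- hypothetical attribute: treated as "en" when it is left out -->
                    <xsd:attribute name="lang" type="xsd:string" default="en" />
               </xsd:extension>
          </xsd:simpleContent>
     </xsd:complexType>
</xsd:element>
```

Attributes are optional unless use="required" is given.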


Data type declarations

Data types are declared within element and attribute declarations through the type attribute: type=" " .

Common XML Schema Data Types

XML schema has a lot of built-in data types. The most common types are:

string      a string of characters
decimal     a decimal number
integer     an integer
boolean     the values true or false (also written 1 or 0)
date        a date in the format YYYY-MM-DD
time        a time of day in the format hh:mm:ss
dateTime    a date and time combination in the format YYYY-MM-DDThh:mm:ss
anyURI      a URI, for example if the element will contain a URL


For an entire list of built-in simple data types see http://www.w3.org/TR/xmlschema-2/#built-in-datatypes
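
To show how some of the less familiar types look in use, here is a small sketch. The element names 'lastUpdated', 'isCapital' and 'website' and the values shown are illustrative assumptions only; they are not part of the sample schema or document.

```xml
<!-- schema: hypothetical element declarations using built-in types -->
<xsd:element name="lastUpdated" type="xsd:date"/>
<xsd:element name="isCapital" type="xsd:boolean"/>
<xsd:element name="website" type="xsd:anyURI"/>

<!-- document: illustrative values that match those types -->
<lastUpdated>2001-12-31</lastUpdated>
<isCapital>true</isCapital>
<website>http://www.example.com/</website>
```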



Using an XML Editor => XML Editor

This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML schema document and paste it into the XML editor. Then check your results. Is the XML schema well-formed? Is the XML schema valid?


XML stylesheet (XSL)

An XML Stylesheet is an XML Document. XML Stylesheets have an .xsl file extension.

The eXtensible Stylesheet Language (XSL) provides a means to transform and format the contents of an XML document for display. Since an XML document does not contain tags a browser understands, such as HTML tags, browsers cannot present the data without a stylesheet that contains the presentation information. By separating the data and the presentation logic, XSL allows people to view the data according to their different needs and preferences.

The XSL Transformation Language (XSLT) is used to transform an XML document from one form to another, such as creating an HTML document to be viewed in a browser. An XSLT stylesheet consists of a set of formatting instructions that dictate how the contents of an XML document will be displayed in a browser, with much the same effect that Cascading Style Sheets (CSS) have for HTML. Multiple views of the same data can be created using different stylesheets. The output of a stylesheet is not restricted to a browser.


During the transformation process, XSLT analyzes the XML document and converts it into a node tree – a hierarchical representation of the entire XML document. Each node represents a piece of the XML document, such as an element, attribute or some text content. The XSL stylesheet contains predefined “templates” that contain instructions on what to do with the nodes. XSLT will use the match attribute to relate XML element nodes to the templates, and transform them into the resulting document.

Exhibit 7: XML stylesheet document for city entity

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/> 
    <xsl:template match="/">
        <html>
            <head>
                <title>Tour Guide</title>
            </head>
            <body>
                <h2>Cities</h2>
                <xsl:apply-templates select="tourGuide"/>
            </body>
        </html>
    </xsl:template>
    <xsl:template match="tourGuide">
        <xsl:for-each select="city">
            <br/><xsl:value-of select="continentName"/><br/>
            <xsl:value-of select="cityName"/><br/>
            <xsl:text>Population: </xsl:text>
            <xsl:value-of select='format-number(population, "##,###,###")'/><br/>
            <xsl:value-of select="country"/>
            <br/>
        </xsl:for-each>     
    </xsl:template>
</xsl:stylesheet>


The output of the city.xsl stylesheet in Exhibit 7 will look like the following:

Cities

Europe
Madrid
Population: 3,128,600
Spain

Asia
Shanghai
Population: 18,880,000
China


You will notice that the stylesheet consists of HTML to inform the media tool (a web browser) of the presentation design. If you do not already know HTML this may seem a little confusing. Online resources such as the W3Schools tutorials can help with the basic understanding you will need =>(http://www.w3schools.com/html/default.asp).

Incorporated within the HTML is the XML that supplies the data, the information, contained within our XML document. The XML of the stylesheet indicates what information will be displayed and how. So, the HTML constructs a display and the XML plugs in values within that display. XSL is the tool that transforms the information into presentational form, but at the same time keeps the meaning of the data.

XML at Bertelsmann - a case study

The German Bertelsmann Inc. is a privately owned media conglomerate operating in 56 countries. It has interests in businesses such as TV broadcasting (RTL), magazines (Gruner & Jahr) and books (Random House). In 2005 its 89,000 employees generated €18 billion of revenue.

A major concern of such a diversified business is utilizing synergies. Management needs to make sure that Random House employees don't spend time and money figuring out what RTL TV journalists have already come up with.

IT-based knowledge management therefore promises huge time savings. Consequently, in 2002 Bertelsmann started a project called BeCom. BeCom's purpose was to enable the different Bertelsmann businesses to use the same data for their different media applications. XML is crucial in this project, because it allows for separating data (document) from presentation (style sheet). Thus data can both be examined statistically and be modified to fit different media like TV and newspapers.

Statistical XML data management, for example, enables employees to benefit from CBR (Case Based Reasoning). CBR allows a Bertelsmann employee who searches for specific content to profit from the previous search findings of other Bertelsmann employees, and so to gain information that is much more contextual than isolated research results. Besides XML data management, Bertelsmann TV and book units can apply this optimized data in their specific media using a variety of layout applications such as 3B2 or QuarkXPress.


Prolog

The prolog of the stylesheet consists of:

  • the XML declaration;
  • the stylesheet declaration;
  • the namespace declaration;
  • the output document format.

 <?xml version="1.0" encoding="UTF-8"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
   <xsl:output method="html"/>


The XML declaration

 <?xml version="1.0" encoding="UTF-8"?>


The stylesheet & namespace declarations

     <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  • identifies the document as an XSL style sheet;
  • identifies the version number;
  • refers to the W3C XSL namespace, the URI that identifies the XSL elements used in the stylesheet. You can find more about namespaces here => Namespace. Every element written with the xsl: prefix belongs to this namespace.


The output document format

      <xsl:output method="html"/>

This element designates the format of the output document and must be a child element of <xsl:stylesheet>
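
In XSLT 1.0 the 'method' attribute can be xml, html or text, and <xsl:output> accepts further attributes. The sketch below shows two of them; the particular values are a suggestion, not something the sample stylesheet requires.

```xml
<!-- method:   format of the result (xml, html or text)
     encoding: character encoding used when the result is written out
     indent:   "yes" lets the processor add whitespace to make the output easier to read -->
<xsl:output method="html" encoding="UTF-8" indent="yes"/>
```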

Templates

The <xsl:template> element is used to create templates that describe how to display elements and their content. Above, in the XSL introduction, we mentioned that XSLT converts the XML document into a node tree: a hierarchical representation of the entire XML document in which each node represents a piece of the document, such as an element, an attribute or some text content. Wherever there is branching in the node tree, there is a node. Templates work on the individual nodes of this tree. <xsl:template> defines the start of a template and contains the rules to apply when a specified node is matched. To identify which nodes a given template handles, use the 'match' attribute. The value given to the 'match' attribute is called a pattern.


the match attribute

   <xsl:template match="/">

This template match attribute associates the XML document root (/), the whole branch of the XML source document, with the HTML document root. Contained within this template element is the typical HTML markup found at the beginning of any HTML document. This HTML is written to the output. The XSL looks for the root match and then outputs the HTML, which the browser understands.

   <xsl:template match="tourGuide">

This template match attribute associates the element 'tourGuide' with the display rules described within this element.


Elements

Elements specific to XSL:

XSL Element (from our sample XSL)    Meaning
<xsl:text>               Writes the literal text found between this element's tags to the output.
<xsl:value-of>           Used with a 'select' attribute to look up the value of the selected node and plug it into the output.
<xsl:for-each>           Used with a 'select' attribute to handle elements that repeat, by looping through all the nodes in the selected node set.
<xsl:apply-templates>    Applies templates to a node or nodes. If it uses a 'select' attribute, templates are applied only to the selected node(s), and the order in which they are processed can be specified. If no 'select' attribute is used, templates are applied to all the child nodes of the current node, including text nodes.

For more XSL elements => http://www.w3schools.com/xsl/xsl_w3celementref.asp .
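
As an illustration of <xsl:apply-templates> working together with the 'match' attribute, the sketch below reworks the sample stylesheet so that each 'city' element is handled by its own template rather than by <xsl:for-each>. It assumes the same tourGuide/city structure as the sample document; the choice of a <p> element for each city is only a suggestion.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="html"/>
    <xsl:template match="/">
        <html>
            <head>
                <title>Tour Guide</title>
            </head>
            <body>
                <h2>Cities</h2>
                <!-- hand every city element to the template that matches it -->
                <xsl:apply-templates select="tourGuide/city"/>
            </body>
        </html>
    </xsl:template>
    <!-- applied once for each city element selected above -->
    <xsl:template match="city">
        <p>
            <xsl:value-of select="cityName"/>
            <xsl:text>, </xsl:text>
            <xsl:value-of select="country"/>
        </p>
    </xsl:template>
</xsl:stylesheet>
```

Splitting the work into one template per element keeps each template short, which can help as a stylesheet grows.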

Language-Specific Validation and Transformation Methods

PHP Methods of XML Dom Validation

PHP running on a server can use the DOM (Document Object Model) to validate an XML document against a DTD (Document Type Definition). For this and more, see http://wiki.cc/php/Dom_validation

Browser Methods

Place this line of code in your .xml document after the XML declaration (prologue).

 <?xml-stylesheet type="text/xsl" href="tourGuide.xsl"?>
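
For orientation, this is roughly where the line sits in a document. The sketch is trimmed to the elements that the sample stylesheet reads, with values taken from the sample output above, so it is not a complete document and is not meant to validate against the sample schema.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="tourGuide.xsl"?>
<tourGuide>
    <city>
        <continentName>Europe</continentName>
        <cityName>Madrid</cityName>
        <population>3128600</population>
        <country>Spain</country>
    </city>
</tourGuide>
```

The href value is resolved relative to the XML document, so in this form tourGuide.xsl is expected in the same folder.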

PHP XML Production

 <?php
 // Connect to the DBMS and select the database
 // (mysqli replaces the old mysql_* functions, which were removed in PHP 7)
 $db = mysqli_connect('localhost', 'root', '', 'issd')
     or die('Failed to connect to the DBMS');
 $result = mysqli_query($db, 'SELECT * FROM students')
     or die('Query to get the records failed');
 if (mysqli_num_rows($result) < 1) {
     die('');
 }
 // Build the XML: one <student> element per row
 $xmlString = "<classlist>\n";
 while ($row = mysqli_fetch_assoc($result)) {
     $xmlString .= "\t<student>\n";
     // Escape &, < and > so the data cannot break the markup
     $xmlString .= "\t\t<firstName>" . htmlspecialchars($row['firstName'], ENT_XML1) . "</firstName>\n";
     $xmlString .= "\t\t<lastName>" . htmlspecialchars($row['lastName'], ENT_XML1) . "</lastName>\n";
     $xmlString .= "\t</student>\n";
 }
 $xmlString .= "</classlist>";
 echo $xmlString;
 $myFile = "classList.xml"; // any file
 $fh = fopen($myFile, 'w') or die("can't open file"); // create file handle
 fwrite($fh, $xmlString); // write the data into the file
 fclose($fh); // all done
 ?>

PHP Methods of XSLT Transformation

This example works with PHP 5 and a recent WampServer. Please ensure that the xsl extension is NOT commented out in the php.ini file.

 <?php
 // Load the XML source
 $xml = new DOMDocument;
 $xml->load('tourguide.xml');
 $xsl = new DOMDocument;
 $xsl->load('tourguide.xsl');
 // Configure the transformer
 $proc = new XSLTProcessor;
 $proc->importStyleSheet($xsl); // attach the xsl rules
 echo $proc->transformToXML($xml);
 ?>


Example 1: using the older xslt_* functions within PHP itself (use the phpinfo() function to check for the XSLT extension; enable it if needed). This example might produce XHTML, but please note it could produce anything defined by the XSL.

 <?php
 $xhtmlOutput = xslt_create();
 $args = array();
 $params = array('foo' => 'bar');
 $theResult = xslt_process(
                         $xhtmlOutput,
                         'theContentSource.xml',
                         'theTransformationSource.xsl',
                         null,
                         $args,
                         $params
                        );
 xslt_free($xhtmlOutput); // free that memory
 // echo theResult or save it to a file or continue processing (perhaps instructions)
 ?>

Example 2:

 <?php
 if (version_compare(PHP_VERSION, '5', '>=')) {
   // Emulate the old xslt library functions
   function xslt_create() {
       return new XsltProcessor();
   }
   function xslt_process($xsltproc,
                         $xml_arg,
                         $xsl_arg,
                         $xslcontainer = null,
                         $args = null,
                         $params = null) {
       // Start with preparing the arguments
       $xml_arg = str_replace('arg:', '', $xml_arg);
       $xsl_arg = str_replace('arg:', '', $xsl_arg);
       // Create instances of the DomDocument class
       $xml = new DomDocument;
       $xsl = new DomDocument;
       // Load the xml document and the xsl template
       $xml->loadXML($args[$xml_arg]);
       $xsl->loadXML($args[$xsl_arg]);
       // Load the xsl template
       $xsltproc->importStyleSheet($xsl);
       // Set parameters when defined
       if ($params) {
           foreach ($params as $param => $value) {
               $xsltproc->setParameter("", $param, $value);
           }
       }
       // Start the transformation
       $processed = $xsltproc->transformToXML($xml);
       // Put the result in a file when specified
       if ($xslcontainer) {
           return @file_put_contents($xslcontainer, $processed);
       } else {
           return $processed;
       }
   }
   function xslt_free($xsltproc) {
       unset($xsltproc);
   }
 }
 $arguments = array(
   '/_xml' => file_get_contents("xml_files/201945.xml"),
   '/_xsl' => file_get_contents("xml_files/convertToSql_new2.xsl")
 );
 $xsltproc = xslt_create();
 $html = xslt_process(
   $xsltproc,
   'arg:/_xml',
   'arg:/_xsl',
   null,
   $arguments
 );
 xslt_free($xsltproc);
 print $html;
 ?>

PHP file writing code

 <?php
 $myFile = "testFile.xml"; // any file
 $fh = fopen($myFile, 'w') or die("can't open file"); // create file handle
 $stringData = "<foo>\n\t<bar>\n\thello\n"; // get a string ready to write
 fwrite($fh, $stringData); // write the data into the file
 $stringData2 = "\t</bar>\n</foo>";
 fwrite($fh, $stringData2); // write more data into the file
 fclose($fh); // all done
 ?>

XML Colors

For use in your stylesheet: these colors can be used for both background and font

http://www.w3schools.com/html/html_colors.asp

http://www.w3schools.com/html/html_colorsfull.asp

http://www.w3schools.com/html/html_colornames.asp


Using an XML Editor => XML Editor

This link will take you to instructions on how to start an XML editor. Once you have followed the steps to get started you can copy the code in the sample XML stylesheet document and paste it into the XML editor. Then check your results. Is the XML stylesheet well-formed?


XML at Thomas Cook - a case study

As one of the leading travel companies and most widely recognized brands in the world, Thomas Cook works across the travel value chain (airlines, hotels, tour operators, travel and incoming agencies), providing its customers with the right product in all market segments across the globe. Employing over 11,000 staff, the Group has 33 tour operators, around 3,600 travel agencies, a fleet of 80 aircraft and a workforce numbering some 26,000. Thomas Cook operates through a network of 616 locations in Europe and overseas. The company is now the second largest travel group in Europe and the third largest in the world.

As Thomas Cook sells other companies' products, ranging from packaged holidays to car hire, it needs to change its online brochure regularly. Before Thomas Cook started using XML, it put information into HTML format, and it would take up to six weeks to get an online brochure up and running. XML helps do this job in about three days. This helps provide all of Thomas Cook's current and potential customers and its various agencies in different geographical locations with updated information, instead of having to wait six weeks for new information to be released.

XML allows Thomas Cook to put content information into a single database, which can be re-used as many times as required. "We did not want to keep having to re-do the same content, we wanted the ability to switch it on immediately," said Gwyn Williams, who is content manager at Thomascook.com. "This has brought internal benefits such as being able to re-deploy staff into more value added areas." Thomascook.com currently holds 65,000 pages of brochure and travel guide information and an online magazine in XML format.

Thomas Cook started using XML at a relatively early stage. As Thomas Cook has a large database, the early use of XML will stand it in good stead. At some point, the databases will have to be incorporated into XML, and it is reported that XML databases are quicker than conventional databases, giving Thomas Cook a slight competitive advantage against those who do not use XML.

Thomas Cook has found that this can lead to substantial cost reductions as well as consistency of information across all channels. By implementing a central content management system to facilitate brochure production and web publications, they have centralized the production, maintenance and distribution of content across their brands and channels.

Summary

From the previous chapter Introduction to XML, you have learned the need for data exchange and the usefulness of XML in data exchange. In this chapter, you have learned more about the three major XML files: the XML document, the XML schema, and the XML stylesheet. You learned the correct documentation required for each type of file. You learned basic rules of syntax applicable for all XML documents. You learned how to integrate the three types of XML documents. And you learned the definition and distinction between a well-formed document and a valid document. By following the XML Editor links, you were able to see the results of the sample code and learn how to use an XML Editor.

Below are Exercises and Answers for further practice. Good Luck!


XML
SGML
Dan Connelly
RSS
XML Declaration
parent
child
sibling
element
attribute
Well-formed XML
PCDATA

Exercise 1.

a) Using "tourguide" above as a good example, create an XML document whose root is "classlist". This CLASSLIST is created from a starting point of a single entity, STUDENT. The document may hold any number of students, and each student contains the elements: firstname, lastname, emailaddress.