Ruby Programming/XML Processing/REXML

REXML is an XML processing API for Ruby. It has been part of the standard library since Ruby 1.8; since Ruby 3.0 it ships as a bundled gem.

REXML can read and write XML documents. Validation against a DTD or a schema is not yet fully implemented.

Other XML parsing gems exist as well, such as Nokogiri and Hpricot. They are typically faster than REXML.

Basics

Definitions

Using the DOM API, REXML parses a document and builds a tree containing its elements, attributes, and text.

For example, a wikibook might be stored as the following XML document:

 <wikibook title="Programming:Ruby">
   <section title="Getting started">
     <chapter title="Overview">Ruby is a programming language of the Perl and Python ilk; [...]
     </chapter>
   </section>
 </wikibook>

In this case, the chapter is an element. It has an attribute title with the value Overview and a text with the value "Ruby is a programming language [...]".

The section is also an element. It has an attribute, too, but no text. Instead, it contains a child element, the chapter.

In short, elements can have attributes, text and child elements.

Representation in REXML

When an XML document is parsed, an instance of the REXML::Document class is created. REXML::Document.new accepts a String, an IO, or another REXML::Document as its source. The resulting object represents the whole document, including the <?xml...?> declaration. REXML::Document is itself a subclass of REXML::Element, the central class of the tree API.
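As a minimal sketch of the parsing step, the following builds a REXML::Document from a String. The sample XML is a shortened version of the wikibook example used on this page; the variable names are illustrative only.

```ruby
require "rexml/document"

# Sample input: a shortened version of the wikibook document shown earlier.
xml = <<~XML
  <?xml version="1.0" encoding="UTF-8"?>
  <wikibook title="Programming:Ruby">
    <section title="Getting started">
      <chapter title="Overview">Ruby is a programming language [...]</chapter>
    </section>
  </wikibook>
XML

# Parse from a String. An IO object (for example an open File) can be
# passed in the same position.
doc = REXML::Document.new(xml)

puts doc.class                  # the parsed document object
puts doc.is_a?(REXML::Element)  # Document is a subclass of Element
puts doc.xml_decl.version       # version taken from the <?xml ...?> declaration
puts doc.root.name              # name of the single root element
```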

In the DOM API, instances of the Element class represent the elements of the XML document. An Element can have attributes (accessible through the attributes method), text, and child elements.

The Document is itself an Element, but you will usually be more interested in the root element of the XML document. As defined in the XML specification, every document has exactly one root element; you can obtain it by calling root on the Document (REXML::Document#root).

Once you have obtained the root Element, you can walk down the tree using the elements method defined in Element, which returns a collection of all child elements, or access attributes and text as needed.
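The traversal described above might look like the following sketch. It assumes the wikibook structure from the earlier example; the path "section/chapter" is simply the nesting used in that sample.

```ruby
require "rexml/document"

doc = REXML::Document.new(<<~XML)
  <wikibook title="Programming:Ruby">
    <section title="Getting started">
      <chapter title="Overview">Ruby is a programming language [...]</chapter>
    </section>
  </wikibook>
XML

root = doc.root
puts root.attributes["title"]   # attribute of the root element

# Iterate over child elements, filtering by element name.
root.elements.each("section") do |section|
  puts section.attributes["title"]
  section.elements.each("chapter") do |chapter|
    puts "#{chapter.attributes['title']}: #{chapter.text}"
  end
end

# A single descendant can also be looked up directly with a path.
chapter = root.elements["section/chapter"]
puts chapter.text
```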

The tree can be modified, too. In addition, the to_s methods have been overridden to return the XML code of elements, attributes, and text. Element#to_s returns the XML code of the whole element, including its attributes, text, and the XML code of its child elements. You can call it on the Document, too.
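A short sketch of modifying the tree and serializing it again follows. The element and attribute names come from the wikibook example; the new title value is made up for illustration. The exact formatting of the generated XML (for instance the quote style used for attributes) is up to REXML, so no expected output is shown.

```ruby
require "rexml/document"

# Start from a document that contains only the root element.
doc = REXML::Document.new('<wikibook title="Programming:Ruby"/>')
root = doc.root

# add_element creates a child element and returns it; the optional
# hash sets the new element's attributes.
section = root.add_element("section", "title" => "Getting started")
chapter = section.add_element("chapter", "title" => "Overview")
chapter.text = "Ruby is a programming language [...]"

# Existing attributes can be changed through the attributes collection.
root.attributes["title"] = "Ruby Programming"   # illustrative value

puts chapter.to_s   # XML code of the chapter element only
puts doc.to_s       # XML code of the whole document
```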

Standard API Documentation at ruby-doc.org, including the rexml package