XQuery/Lorum Ipsum text
Motivation
You want to create realistically sized example XML for testing or demonstration. Lorem ipsum text is often used to fill out the contents, so it would be useful to be able to add this text wherever it is needed in an XML file.
We explore two approaches: one modifies the serialised text, the other modifies the XML tree.
Approach 1: string replacement
The places in the incomplete XML file where lorem ipsum text is to be placed are marked with an ellipsis, "...". The XML file is read, serialised to a string and split into parts at each ellipsis; the parts are then re-assembled with a randomly chosen section of the lorem ipsum text in place of each ellipsis. The resulting string is turned back into XML for output. The base lorem ipsum text is stored as an XML file:
http://www.cems.uwe.ac.uk/xmlwiki/apps/lorumipsum/words.xml
Concepts used
- XML <> string conversion: the script uses a pair of functions from the eXist util module (util:serialize and util:parse) to convert between XML and a string. This allows the XML to be operated on as a simple string before being converted back to XML.
- recursion: interpolating the random text into the original string is done with a recursive function.
- regular expressions: regular expressions are used to tokenise both the lorem ipsum text and the serialised incomplete XML containing the ellipses.
XQuery
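declare function local:join-random($parts, $words) {
  (: Join the first fragment, a random run of words, and the recursively
     joined remaining fragments. util:random(100) gives a random integer
     below 100; it is used for both the start and the length of the run. :)
  if (count($parts) > 1)
  then
    let $randomtext := string-join(subsequence($words, util:random(100), util:random(100)), " ")
    return string-join(($parts[1], $randomtext, local:join-random(subsequence($parts, 2), $words)), "")
  else $parts
};
(: The stored lorem ipsum text, split into words on whitespace :)
let $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum
let $words := tokenize($lorumipsum, "\s+")
(: The incomplete XML document, named by the "file" request parameter :)
let $file := request:get-parameter("file", ())
let $doc := doc($file)/*
(: Serialise to a string and split on the literal ellipsis :)
let $docText := util:serialize($doc, "media-type=text/xml method=xml")
let $parts := tokenize($docText, "\.\.\.")
let $completedText := local:join-random($parts, $words)
(: Parse the completed string back into XML :)
return util:parse($completedText)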
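The script above relies on eXist-specific modules (util and request). As a rough sketch of the same string-replacement technique in standard XQuery 3.1, the version below swaps in fn:serialize, fn:parse-xml and fn:random-number-generator. It is an illustration under assumptions, not the original script: the inline word list and the inline book element stand in for words.xml and the "file" parameter, and each run is limited to 0 to 9 words.

```xquery
xquery version "3.1";

(: Portable sketch of Approach 1 in standard XQuery 3.1.
   Assumption: the inline word list and the inline <book> element
   stand in for words.xml and the document named by "file". :)
declare variable $words as xs:string* :=
  tokenize("lorem ipsum dolor sit amet consectetur adipiscing elit sed do", "\s+");

(: Join the first fragment, a random run of 0 to 9 words, and the recursively
   joined remaining fragments. $rng is a map from fn:random-number-generator(),
   advanced with ?next() so that every gap gets fresh random numbers. :)
declare function local:join-random($parts as xs:string*, $rng as map(*)) as xs:string {
  if (count($parts) > 1)
  then
    let $start := 1 + xs:integer(floor($rng?number * count($words)))
    let $rng2 := $rng?next()
    let $length := xs:integer(floor($rng2?number * 10))
    let $randomtext := string-join(subsequence($words, $start, $length), " ")
    return $parts[1] || $randomtext
           || local:join-random(subsequence($parts, 2), $rng2?next())
  else string($parts[1])
};

let $doc := <book><title>Search</title><p>...</p><p>...</p></book>
(: fn:serialize and fn:parse-xml take the place of util:serialize and util:parse :)
let $parts := tokenize(serialize($doc), "\.\.\.")
return parse-xml(local:join-random($parts, random-number-generator()))
```

Because the standard generator is a map that is advanced with ?next(), it has to be threaded through the recursion explicitly, whereas util:random can simply be called again at each step.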
Explanation
- the lorem ipsum text is split into words by tokenising on whitespace.
- the incomplete XML is fetched and its root element accessed.
- this element is converted to a string using the util:serialize function, then tokenised with the pattern "\.\.\." (not "..." since . matches any single character in regular expressions).
- the recursive function join-random() joins the first string in the sequence, a random stretch of the lorem ipsum text, and the result of recursively joining the remaining strings in the same way.
- the expanded text is converted back to an XML element using util:parse()
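A small check of the escaping point made above, assuming any XQuery 3.1 processor: the two escaped patterns split on the literal ellipsis, while the unescaped pattern "..." matches any three characters, so its first match is "a.." itself.

```xquery
xquery version "3.1";

(: "." in a regular expression matches any single character, so the dots
   of the ellipsis marker must be escaped to be matched literally. :)
let $text := "a...b"
return (
  string-join(tokenize($text, "\.\.\."), "|"),  (: gives "a|b" :)
  string-join(tokenize($text, "\.{3}"), "|"),   (: equivalent escaped form, also "a|b" :)
  string-join(tokenize($text, "..."), "|")      (: gives "|.b" :)
)
```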
Improvements
- the lorem ipsum text itself could be generated rather than stored.
- the script could be parameterised for the lorem ipsum file, allowing different, perhaps more realistic, text to be used.
- the lorem ipsum words are passed as a parameter to the recursive function. They could be defined in a global variable instead.
- it would be better to fetch the files with the httpclient module, so that caching can be controlled via HTTP headers; in the script as written, the file is cached.
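For the first suggested improvement, one possible sketch generates pseudo-words from a list of syllables instead of reading a stored file. The syllable list, the limit of three syllables per word and the helper names local:randoms and local:generate are hypothetical, chosen only for illustration; the output is lorem-ipsum-like filler, not the classical text.

```xquery
xquery version "3.1";

(: Hypothetical syllable list; any set of short strings would do. :)
declare variable $syllables := ("lo", "rem", "ip", "sum", "do", "lor", "sit", "a", "met");

(: Return $n random numbers in [0, 1) from a fn:random-number-generator() map. :)
declare function local:randoms($n as xs:integer, $rng as map(*)) as xs:double* {
  if ($n le 0) then ()
  else ($rng?number, local:randoms($n - 1, $rng?next()))
};

(: Build $n pseudo-words: for each word, one number picks its length
   (1 to 3 syllables) and up to three more pick the syllables. :)
declare function local:generate($n as xs:integer) as xs:string* {
  let $r := local:randoms(4 * $n, random-number-generator())
  for $i in 1 to $n
  let $base := 4 * ($i - 1)
  let $len := 1 + xs:integer(floor($r[$base + 1] * 3))
  return string-join(
    (for $j in 1 to $len
     return $syllables[1 + xs:integer(floor($r[$base + 1 + $j] * count($syllables)))]),
    "")
};

string-join(local:generate(20), " ")
```

Generating the text this way removes the dependency on words.xml, at the cost of less natural-looking words.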
Approach 2: XML replacement
Using an ellipsis as the marker is problematic if an ellipsis is also meant to appear in the text itself. The conversion into text and back to XML is also an overhead.
An alternative approach is to use an XML element, for example <ipsum/>, to mark the places where lorem ipsum text is to appear, and to replace every occurrence with a random run of words. Replacing a specific element anywhere in the XML tree can be done by modifying the identity transformation discussed in XQuery/Filtering_Nodes.
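For reference, a minimal identity transformation can be written as below. This is one common way to write it, not necessarily the exact code on XQuery/Filtering_Nodes, and the function name local:copy is hypothetical. Approach 2 adds one extra test to a function of this shape: an element whose name is the marker is replaced rather than copied.

```xquery
xquery version "3.1";

(: Minimal identity transformation: rebuild each element with the same
   name, attributes and recursively copied children; other nodes such as
   text, comments and processing instructions are returned unchanged. :)
declare function local:copy($node as node()) as node() {
  typeswitch ($node)
    case element() return
      element { node-name($node) } {
        $node/@*,
        for $child in $node/node() return local:copy($child)
      }
    default return $node
};

local:copy(<book><title>Search</title><p>Text <b>bold</b></p></book>)
```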
Concepts
- recursion: to copy an arbitrary XML tree, replacing a given element with random text.
XQuery
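(: The lorem ipsum words and the marker element name are global variables,
   so they need not be passed to the recursive function :)
declare variable $lorumipsum := doc("/db/Wiki/apps/lorumipsum/words.xml")/lorumipsum;
declare variable $words := tokenize($lorumipsum, "\s+");
declare variable $marker := "ipsum";

(: Identity copy of $element, except that each child element named $marker
   is replaced by a random run of words; adjacent atomic values in element
   content are joined with single spaces :)
declare function local:copy-with-random($element as element()) as element() {
  element {node-name($element)} {
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element())
      then
        if (name($child) = $marker)
        then subsequence($words, util:random(100), util:random(100))
        else local:copy-with-random($child)
      else $child
  }
};

let $file := request:get-parameter("file", ())
let $root := doc($file)/*
return local:copy-with-random($root)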
Explanation
- the sequence of lorem ipsum words is held in a global variable, so it need not be passed as a parameter to the recursive function.
- the copy-with-random() function recursively copies the elements and other nodes of a tree into a new tree.
- when an element named "ipsum" is encountered, a random run of lorem ipsum words is returned in place of the element.
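As with Approach 1, this script depends on eXist's util and request modules. The following is a sketch of the same tree copy in standard XQuery 3.1, under these assumptions: an inline word list replaces words.xml, each marker becomes 1 to 10 words, and local:random-text and $run-seed are hypothetical names. Because fn:random-number-generator returns the same sequence for the same seed, each marker gets its own generator, seeded from a number drawn once per query (from an unseeded generator, whose seed is implementation-dependent) combined with generate-id() of the marker element.

```xquery
xquery version "3.1";

(: Portable sketch of Approach 2. Assumption: an inline word list
   stands in for words.xml. :)
declare variable $words as xs:string* :=
  tokenize("lorem ipsum dolor sit amet consectetur adipiscing elit sed do", "\s+");
declare variable $marker := "ipsum";

(: Drawn once per query; combined with generate-id() below so that
   different marker elements get different generators. :)
declare variable $run-seed := string(random-number-generator()?number);

(: A random run of 1 to 10 words for one marker element. :)
declare function local:random-text($node as element()) as xs:string {
  let $rng := random-number-generator($run-seed || generate-id($node))
  let $start := 1 + xs:integer(floor($rng?number * count($words)))
  let $length := 1 + xs:integer(floor($rng?next()?number * 10))
  return string-join(subsequence($words, $start, $length), " ")
};

(: Identity copy that replaces each marker element with random text. :)
declare function local:copy-with-random($element as element()) as element() {
  element { node-name($element) } {
    $element/@*,
    for $child in $element/node()
    return
      if ($child instance of element() and name($child) = $marker)
      then text { local:random-text($child) }
      else if ($child instance of element())
      then local:copy-with-random($child)
      else $child
  }
};

local:copy-with-random(<p>Intro: <ipsum/> and <ipsum/>.</p>)
```

Wrapping the words in a text node constructor yields a single text node, which merges with the adjacent text in the copied element.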
Discussion
The second approach is simpler, and its performance is about the same as that of the first.
Acknowledgements
- the sample XML is an extract from "Search: The Graphics Web Guide", Ken Coupland, Laurence King Publishing (2002)