XQuery/Parsing CSV
Motivation
You want to parse a file of comma-separated values (CSV) text into an xml structure or you have a flat file with very similar line/field structure that you want to convert into XML format.
Method
We will use the tokenize($input, '\n') function to break the input file into separate lines. We will then use the tokenize($input, ',') function to parse each line into separate fields.
Basic Example
xquery version "1.0"; let $title := 'XQuery CSV Parser' let $csv-input-sample := 'John,Smith,x123 Peg,Jones,x456 Sue,Adams,x789 Dan,McCoy,x321' let $lines := tokenize($csv-input-sample, '\n') return <results>{ for $line in $lines let $fields := tokenize($line, ',') return <row>{ for $field in $fields return <field>{$field}</field> }</row> }</results>
Sample Output
<results> <row> <field>John</field> <field>Smith</field> <field>x123</field> </row> <row> <field>Peg</field> <field>Jones</field> <field>x456</field> </row> <row> <field>Sue</field> <field>Adams</field> <field>x789</field> </row> <row> <field>Dan</field> <field>McCoy</field> <field>x321</field> </row> </results>
Example With Data Dictionary in Row 1
This second example will use the first row of the CSV file as a data dictionary of the element names for each column of each row.
let $csv :=
'name,faculty
alice,anthropology
bob,biology'
let $lines := tokenize($csv, '\n')
let $head := tokenize($lines[1], ',')
let $body := remove($lines, 1)
return
<people>
{
for $line in $body
let $fields := tokenize($line, ',')
return
<person>
{
for $key at $pos in $head
let $value := $fields[$pos]
return
element { $key } { $value }
}
</person>
}
</people>
We will receive below result (tested with eXide in eXist 2.0):
<people> <person> <name>alice</name> <faculty>anthropology</faculty> </person> <person> <name>bob</name> <faculty>biology</faculty> </person> </people>
Beware though, that this requires the very strict input format proposed. In practice CSV can look quite different.
Adding Configuration File Options
Many times you have a family of CSV files that all may have very similar options for import. In this case it is useful to be able to pass a series of configuration parameter to a single XQuery function. These configuration parameters include:
- What the field delimiter is (comma is the defult"
- The element name of the root node
- The element name of each line or row
- If each field could be wrapped in quotes
<file-import-config> <field-delimiter>,<field-delimiter> <root-element-name>people</root-element-name> <line-element-name>person</line-element-name> </file-import-config>
You can then pass this configuration file to your XQuery function:
import-lib:flat-file-import($input-file, $config)