OpenClinica User Manual/ImportingData

Importing data edit

It is sometimes useful to import data into OpenClinica. The data can be amended, discrepancy notes added, it is checked on import and can be referenced using rules.

Reasons why you shouldn't do it (mostly convenience) edit

  1. It is sometimes less work to consider non-OpenClinica data as a separate data stream, that is merged with data from OpenClinica at the end of the study (especially if it doesn't need to be amended in the OpenClinica interface).
  2. It is sometimes quicker to manually enter data than import data
  3. Depending on your regulatory regime, you may need to validate any manipulation of the data prior to it being imported
  4. Depending on the import method, data goes into the system as ‘Completed’, but a data manager user can still upload the same data over the top of the data. Altering the records to be ‘locked’ doesn't stop the data being overwritten. May need sophisticated logic to avoid overwrites

Difficulties in using the standard import edit

  1. Import requires you to set up subjects and schedule visits before importing a file
  2. Import uses CDISC ODM XML files that can be challenging to create from Excel, CSV or whatever source you have
  3. Need to obtain OIDs for event and CRF when creating the CDISC ODM XML
  4. Import is dependent upon the subject's SubjectKey (StudySubjectOID), which varies from their standard Study Subject ID in a consistent way. However, if their Study Subject ID has changed, then SubjectKey will be dependent upon their initial incorrect Study Subject ID. So you may have to have some mappings for subjects whose ID was entered wrongly

How to do it edit

OpenClinica.com has documentation on how you can import CDISC ODM format XML files.

Methods edit

The OpenClinica wiki includes a number of import tools that can be used to import data

Converting CSV to CDISC ODM XML using XSLT edit

One method of generating a CDISC ODM XML from a CSV file (Microsoft Excel can save in this format) is to use an XSLT parser such as Kernow to parse the CSV using a customised XSLT transform.

An example CSV file (with SubjectKeys: SS_S001 and SS_S002, ITEM_NAMES: I_APFIN_LBISOPROSTANES and I_APFIN_LBOXLDL):

,I_APFIN_LBISOPROSTANES,I_APFIN_LBOXLDL
SS_S001,10,20
SS_S002,30,40

The XSLT transform for import:

<?xml version="1.0"?>
<xsl:stylesheet version="2.0"
	xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
	xmlns:xs="http://www.w3.org/2001/XMLSchema"
	xmlns:fn="fn"
	exclude-result-prefixes="xs fn">

  <xsl:output indent="yes" encoding="UTF-8"/>
  <!-- update this with the path to your csv file -->
  <xsl:param name="pathToCSV" select="'file:///C:/Test/FinalBlood.csv'"/>

  <xsl:function name="fn:getTokens" as="xs:string+">
    <xsl:param name="str" as="xs:string"/>
    <xsl:analyze-string select="concat($str, ',')" regex='(("[^"]*")+|[^,]*),'>
      <xsl:matching-substring>
        <xsl:sequence select='replace(regex-group(1), "^""|""$|("")""", "$1")'/>
      </xsl:matching-substring>
    </xsl:analyze-string>
  </xsl:function>

  <xsl:template match="/" name="main">
    <xsl:choose>
      <xsl:when test="unparsed-text-available($pathToCSV)">
        <xsl:variable name="csv" select="unparsed-text($pathToCSV)"/>
        <!--UNIX line endings are &#xa; Windows are &#xd;&#xa;-->
        <xsl:variable name="lines" select="tokenize($csv, '&#xd;&#xa;')" as="xs:string+"/>
        <xsl:variable name="elemNames" select="fn:getTokens($lines[1])" as="xs:string+"/>
        <ODM xmlns="http://www.cdisc.org/ns/odm/v1.3" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.cdisc.org/ns/odm/v1.3 ODM1-3.xsd" ODMVersion="1.3" FileOID="1D20080412202420" FileType="Snapshot" Description="First dataset for testing of ODM" CreationDateTime="2008-04-12T20:24:20" >
          <!-- update this with your StudyOID -->
          <ClinicalData StudyOID="S_2009CV16" MetaDataVersionOID="v1.0.0">
            <xsl:for-each select="$lines[position() > 1]">
              <xsl:variable name="lineItems" select="fn:getTokens(.)" as="xs:string+"/>
              <xsl:if test="$lineItems[1] != string()">
              <SubjectData><xsl:attribute name="SubjectKey"><xsl:value-of select="$lineItems[1]"/></xsl:attribute>
                <!-- update these OIDs with your StudyEventOID, FormOID (a specific version) and ItemGroupOID-->
                <StudyEventData StudyEventOID="SE_APFINALBLOODS" >
                  <FormData FormOID="F_APFINALBLOOD_V01">
                    <ItemGroupData ItemGroupOID="IG_APFIN_UNGROUPED" TransactionType="Insert" >
                      <xsl:for-each select="$elemNames[position() > 1]">
                        <xsl:variable name="pos" select="position()"/>
                        <ItemData><xsl:attribute name="ItemOID"><xsl:value-of select="$elemNames[$pos+1]"/></xsl:attribute><xsl:attribute name="Value"><xsl:value-of select="$lineItems[$pos+1]"/></xsl:attribute></ItemData>
                      </xsl:for-each>
                    </ItemGroupData>
                  </FormData>
                </StudyEventData>
              </SubjectData>
                </xsl:if>
            </xsl:for-each>
          </ClinicalData>
        </ODM>
      </xsl:when>
      <xsl:otherwise>
        <xsl:text>Cannot locate : </xsl:text>
        <xsl:value-of select="$pathToCSV"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>

</xsl:stylesheet>

This should create the following CDISC ODM XML file:

<?xml version="1.0" encoding="UTF-8"?>
<ODM xmlns="http://www.cdisc.org/ns/odm/v1.3"
     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
     xsi:schemaLocation="http://www.cdisc.org/ns/odm/v1.3 ODM1-3.xsd"
     ODMVersion="1.3"
     FileOID="1D20080412202420"
     FileType="Snapshot"
     Description="First dataset for testing of ODM"
     CreationDateTime="2008-04-12T20:24:20">
   <ClinicalData StudyOID="S_2009CV16" MetaDataVersionOID="v1.0.0">
      <SubjectData SubjectKey="SS_S001">
         <StudyEventData StudyEventOID="SE_APFINALBLOODS">
            <FormData FormOID="F_APFINALBLOOD_V01">
               <ItemGroupData ItemGroupOID="IG_APFIN_UNGROUPED" TransactionType="Insert">
                  <ItemData ItemOID="I_APFIN_LBISOPROSTANES" Value="40"/>
                  <ItemData ItemOID="I_APFIN_LBOXLDL" Value="30"/>
               </ItemGroupData>
            </FormData>
         </StudyEventData>
      </SubjectData>
      <SubjectData SubjectKey="SS_S002">
         <StudyEventData StudyEventOID="SE_APFINALBLOODS">
            <FormData FormOID="F_APFINALBLOOD_V01">
               <ItemGroupData ItemGroupOID="IG_APFIN_UNGROUPED" TransactionType="Insert">
                  <ItemData ItemOID="I_APFIN_LBISOPROSTANES" Value="20"/>
                  <ItemData ItemOID="I_APFIN_LBOXLDL" Value="10"/>
               </ItemGroupData>
            </FormData>
         </StudyEventData>
      </SubjectData>
   </ClinicalData>
</ODM>

Then import the file as per OpenClinica's Import Data documentation.

Limitations of this method edit

  • The transform to XML step only works correctly if there are no line breaks in the data (so no multi-line text fields).
  • XML characters in the input like < and > will cause problems.
  • Changes to the template or a different transformation tool (such as http://servingxml.sourceforge.net/) may avoid these issues.
  • The XML is easiest to construct if all the data is for the same CRF in the same Event
  • Need to reorganise data to have correct fieldnames for column headers and subject names for first items in each row