OpenClinica User Manual/DiscrepancyNotes
OpenClinica can export Discrepancy Notes as a CSV file, however as of version 3.x, the export causes a few problems if opened in Excel. The problems are caused by double quotes, newlines and carriage returns in the description and detailed notes fields. Additionally, an extra newline character causes excel to have a blank row every other row.
Aims
editTo simply transform the content of discrepancy notes csv exports into something usable by Excel. To do this, newlines and carriage returns in the description and detailed notes fields are removed. Double quotes are kept, but correctly escaped, and the double newline character at the end of the line is replaced with a single newline character.
Limitations
edit- The code below is specific to Windows, and has been tested in Windows 7.
- Windows doesn’t run unsigned PowerShell scripts, so until a signed version of the script below can be created, you have to run the commands manually.
- Run this on the unaltered csv file downloaded from OpenClinica - don't alter or open and save the file in Excel
- Neither the output nor the input files can be open when the script is started.
- The regexps could be reimplemented in Perl under Linux.
Instructions
edit- Alter the input and output values below to point to your desired inputfile and outputfile names (paths can be obtained from Explorer's 'location box')
- Type cmd into the Windows Start Orb search box (which opens a terminal)
- Then copy and paste the following section into the terminal (right-clicking on the terminal will open context menu from whence you can paste)
- Below the script are instructions on how to open the updated file and save it in Excel format
powershell
#########################
#PowerShell script begin#
#########################
#Filename - in single quotes, the inputfile and outputfile
#The outputfile will be overwritten. The inputfile will not be modified.
$inputfile = 'C:\Users\<user>\Documents\<study>\Discrepancies\dnotes_studyid.csv'
$outputfile = 'C:\Users\user>\Documents\<study>\Discrepancies\dnotes_studyid_working.csv'
#Get the file
$text = [System.IO.File]::ReadAllText($inputfile)
#Replacements
$text= $text -replace '\n','<newline/>'
$text= $text -replace '\r','<return/>'
#Give the line after the header a double line
$text= [regex]::Replace($text, "(id,Subject name,CRF name,Description,Discrepancy type,Event name,Parent note ID,Resolution status,Detailed notes,Entity name,Entity value,Date created,Date updated,Study id,Thread Number)",'$1<newline/>')
#Mark end of lines
#Start of lines look like this: <return/><newline/>16883,12047X,FF
#OR: <newline/><newline/>16883,12047X,FF
#Assume ids are up to 8 chars length
$text= [regex]::Replace($text, "(<return/><newline/>)(\d{1,8},[^.]*?,[^.*?])", '<endofline/><startofline/>$2');
$text= [regex]::Replace($text, "(<newline/><newline/>)(\d{1,8},[^.]*?,[^.*?])", '<endofline/><startofline/>$2');
#Mark the end of file as an end of line:
$text=$text+'<endofline/>'
#Simplify by removing unusual extra lines
$text= $text -replace ',,,,,,<return/><newline/>,,,,,,,,,,,,,,<return/><newline/>',''
$text= $text -replace ',,,,,,,,<return/><newline/>,,,,,,,,,,,,,,<endofline/>','<endofline/>'
$text= $text -replace '<return/><newline/>,,,,,,,,,,,,,,<endofline/>','<endofline/>'
#Simplify by removing newlines and returns
$text= $text -replace '<newline/>',''
$text= $text -replace '<return/>',''
#Lines look like this: id,Subject name,CRF name,Description,Discrepancy type,Event name,Parent note ID,Resolution status,Detailed notes,Entity name,Entity value,Date created,Date updated,Study id,Thread Number
#find beginning of description
$text= [regex]::Replace($text, "(<startofline/>[^,]*?,[^,]*?,[^,]*?,)",'$1<begindescription/>"')
#Find end of notes
$text= [regex]::Replace($text, "(,[^,]*?,[^,]*?,[^,]*?,[^,]*?,[^,]*?,[^,]*?<endofline/>)",'"<endnote/>$1')
#Find end of description and beginning of notes
$text= [regex]::Replace($text, "(<begindescription/>.*?)(,[^,]*?,[^,]*?,)(\d{4,6}|)(,[^,]*?,)(.*?<endnote/>)",'$1"<enddescription/>$2$3$4<beginnote/>"$5')
#Fix header quirk:
$text= [regex]::Replace($text, "(id,Subject name,CRF name,Description,Discrepancy type,Event name,Parent note ID,Resolution status,Detailed notes)""(<endnote/>)",'$1')
#Fix double quotes:
$text= $text -replace '<beginnote/>""<endnote/>','<beginnote/><endnote/>'
$text= $text -replace '<beginnote/>""','<beginnote/>"'
$text= $text -replace '""<endnote/>','"<endnote/>'
$text= $text -replace '<begindescription/>""<enddescription/>','<begindescription/><enddescription/>'
$text= $text -replace '<begindescription/>""','<begindescription/>"'
$text= $text -replace '""<enddescription/>','"<enddescription/>'
#Carriage return [char]13
#Line feed [char]10
#Back replaces:
$text= $text -replace '<endofline/>',''
$text= $text -replace '<startofline/>',[char]10
$text= $text -replace '<beginnote/>',''
$text= $text -replace '<endnote/>',''
$text= $text -replace '<begindescription/>',''
$text= $text -replace '<enddescription/>',''
#Write new file
$text | Out-File $outputfile
#########################
#Powershell script end #
#########################
exit
Opening the file in Excel
edit- Open Excel
- From the file tab, select 'Open'
- Open the file by browsing to the location of the output file - broaden the search to 'All Files' if necessary.
- Select 'Delimited', press next
- Alter the delimiters section, uncheck 'Tab' and check 'Comma' (leave the Text qualifier as ")
- Press finish
- You can now save the file in a native xlsx or other format