R Programming/Importing and exporting data
Data can be stored in a large variety of formats. Each statistical package has its own format for data (xls for Microsoft Excel, dta for Stata, sas7bdat for SAS, ...). R can read almost all file formats. We present a method for each kind of file. If none of the following methods work, you can use a specific software for data conversion such as the free software OpenRefine or the commercial software Stat Transfer[1]. In any case, most statistical software can export data in a CSV (comma separated values) format and all of them can read CSV data. This is often the best solution to make data available to everyone.
Graphical user interfaces
Some IDE or GUI provides some press button solution to import data.
You may also have a look at speedR, a graphical user interface which helps at importing data from Excel, OpenOfficeCalc, CSV and other text files[2].
library(speedR) speedR()
CSV (csv,txt,dat)
You can import data from a text file (often CSV) using read.table(), read.csv() or read.csv2(). The option header = TRUE indicates that the first line of the CSV file should be interpreted as variables names and the option sep = gives the separator (generally "," or ";").
csv.get() (Hmisc) is another possibility.
mydata <- read.table("data.txt",header=TRUE) mydata <- read.table("data.csv", header = TRUE, sep=",") # import from a CSV mydata <- read.csv("data.csv", header=T) mydata <- read.table("data.csv", header = TRUE, sep=";") mydata <- read.csv2("data.csv", header=T)
Note that there is no problem if your data are stored on the internet.
df <- read.table("http://www.mywebsite.com/.../data.csv", header = TRUE, sep = ",")
By default, strings are converted to factors. If you want to avoid this conversion, you can specify the option stringsAsFactors = FALSE.
You can export data to a text file using write.table().
write.table(mydat,file="mydat.csv",quote=T,append=F,sep=",",eol = "\n", na = "NA", dec = ".", row.names = T,col.names = T)
For large CSV files, it is possible to use the ff package[3].
library("ff") df <- read.csv.ffdf(file="large_csv_file.csv", header=TRUE, VERBOSE=TRUE, first.rows=10000, next.rows=50000)
Fixed width text files
read.fwf() and write.fwf().
Some fixed width text files are provided with a SAS script to import them. Anthony Damico has created SAScii package to easily import those data[4].
Unstructured text files
- See
scan()andreadLines()in the Reading and writing text files section.
Stata (dta)
- We can read Stata data using
read.dta()in the foreign package and export to Stata data format usingwrite.dta(). - Note that string variables in Stata are limited to 244 characters. This can be an issue during the exportation process.
- See also
Stata.file()in the memisc package andstata.getin the Hmisc package.
> library("foreign") > mydata <- read.dta("mydata.dta",convert.dates = TRUE, convert.factors = TRUE, convert.underscore = TRUE) > names(mydata) > write.dta(mydata, file = "mydata.dta")
SAS (sas7bdat)
Experimental support for SAS databases having the sas7bdat extension is provided by the sas7bdat[5] package. However, sas7bdat files generated by 64 bit versions of SAS, and SAS running on non-Microsoft Windows platforms are not yet supported.
SAS (xpt)
- See also
sasexport.get()andsas.get()in the Hmisc - See also the SASxport package.
library("foreign") mydata<-read.xport("SASData.xpt") names(mydata)
SPSS (sav)
read.spss()(foreign) andspss.get()(Hmisc)
> library("foreign") > mydata<-read.spss("SPSSData.sav") > names(mydata)
Excel (xls,xlsx)
Importing data from Excel is not easy. The solution depends on your operating system. If none of the methods below works, you can always export each Excel spreadsheets to CSV format and read the CSV in R. This is often the simplest and quickest solution.
The RODBC solution (works fine on Windows):
library("RODBC") channel <- odbcConnectExcel("Graphiques pourcent croissance.xls") # creates a connection sqlTables(channel) # List all the tables effec <- sqlFetch(channel, "effec") # Read one spreadsheet as an R table odbcClose(channel) # close the connection (don't forget)
The xlsReadWrite package.
> library(xlsReadWrite) mydat <- read.xls("myfile.xls", colNames = T, sheet = "mysheet", type = "data.frame", from = 1, checkNames = TRUE)
- "sheet" specifies the name or the number of the sheet you want to import.
- "from" specifies the first row of the spreadsheet.
The gnumeric package[6]. This package use an external software called ssconvert which is usually installed with gnumeric, the Gnome office spreadsheet. The read.gnumeric.sheet() function reads xls and xlsx files.
library("gnumeric") df1 <- read.gnumeric.sheet(file = "df.xls", head = TRUE, sheet.name = "Feuille1") df2 <- read.gnumeric.sheet(file = "df.xlsx", head = TRUE, sheet.name = "Feuille1")
See also xlsx for Excel 2007 documents and read.xls() (gdata).
The XLConnect package.
require("XLConnect") wb <- loadWorkbook("myfile.xls", create = FALSE) # Show a summary of the workbook (shows worksheets, # defined names, hidden sheets, active sheet name, ...) summary(wb) # Read data from a worksheet interpreting the first row as column names df1 <- readWorksheet(wb, sheet = "mysheet") # Read data from a named region/range interpreting the first row as column # names df2 <- readNamedRegion(wb, name = "myname", header = TRUE)
XLConnect supports reading and writing both xls and xlsx file formats. Since it is based on Apache POI it only requires a Java installation and as such works on many platforms including Windows, UNIX/Linux and Mac. Besides reading & writing data it provides a number of additional features such as adding plots, cell styling & style actions and many more.
Google Spread Sheets
You should make the spreadsheet public, publish it as a CSV file. Then you can read it in R using read.csv(). See more on the Revolution's computing blog (link). See also RGoogleDocs (link).
# Read from a Google SpreadSheet. require(RCurl) myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv") read.csv(textConnection(myCsv))
gnumeric spreadsheets
The gnumeric package[6]. read.gnumeric.sheet() reads one sheet and read.gnumeric.sheets() reads all sheets and store them in a list.
library("gnumeric") df <- read.gnumeric.sheet(file = "df.gnumeric", head = TRUE, sheet.name = "df.csv") View(df) df <- read.gnumeric.sheets(file = "df.gnumeric", head = TRUE) View(df$df.csv)
OpenOffice and LibreOffice (ods)
The gnumeric package does a good job for ODS files[6].
library("gnumeric") df <- read.gnumeric.sheet(file = "df.ods", head = TRUE, sheet.name = "Feuille1")
speedR is another alternative.
library("speedR") df <- speedR.importany(file = "df.ods")
Note that you can also use the speedR graphical user interface (speedR()) which will return the command line for replication.
library("speedR") speedR()
Or you may use ROpenOffice[7].
library("ROpenOffice") df <- read.ods(file = "df.ods")
JSON
JSON (JavaScript Object Notation) is a very common format on the internet. The rjson library makes it easy to import data from a json format[8].
# json.txt : a text file including data in the JSON format library("rjson") df <- fromJSON(paste(readLines("json.txt"), collapse=""))
Is is easy to export a list or a dataframe to a JSON format using the toJSON() function :
# df : a data frame library("rjson") json <- toJSON(df)
dBase (dbf)
read.dbf() in the foreign package.
library("foreign") df <- read.dbf("salaries07.dbf") str(df)
Hierarchical Data Format (hdf5)
↑Jump back a sectionDICOM and NIfTI
- See "Working with the {DICOM} and {NIfTI} Data Standards in R" in the Journal of Statistical Software[10]
Resources
- R Data Manual[11].
- Paul Murrell's Introduction to Data Technologies[12].
References
- ↑ Stat Transfer http://www.stattransfer.com/
- ↑ speedR http://speedr.r-forge.r-project.org/
- ↑ "Opening Large CSV Files in R". http://stathack.wordpress.com/2012/12/27/opening-large-csv-files-in-r/. Retrieved March 7, 2013.
- ↑ David Smith. "Importing public data with SAS instructions into R". http://blog.revolutionanalytics.com/2012/07/importing-public-data-with-sas-instructions-into-r.html. Retrieved February 1, 2013.
- ↑ sas7bdat http://cran.r-project.org/web/packages/sas7bdat/index.html
- ↑ abc This command has been tested using Ubuntu 10.10 and R 2.11.1
- ↑ The Omega Project for Statistical Computing
- ↑ http://cran.r-project.org/web/packages/rjson/index.html
- ↑ http://cran.r-project.org/web/packages/hdf5/index.html
- ↑ Brandon Whitcher, Volker J. Schmid, Andrew Thorton "Working with the {DICOM} and {NIfTI} Data Standards in R", Journal of Statistical Software Vol. 44, Issue 6, Oct 2011, link
- ↑ R Data Manual http://cran.r-project.org/doc/manuals/R-data.html
- ↑ Paul Murrell introduction to Data Technologies http://www.stat.auckland.ac.nz/~paul/ItDT/