R Programming/Importing and exporting data

Data can be stored in a large variety of formats. Each statistical package has its own format for data (xls for Microsoft Excel, dta for Stata, sas7bdat for SAS, ...). R can read almost all file formats. We present a method for each kind of file. If none of the following methods work, you can use a specific software for data conversion such as the free software OpenRefine or the commercial software Stat Transfer[1]. In any case, most statistical software can export data in a CSV (comma separated values) format and all of them can read CSV data. This is often the best solution to make data available to everyone.

Graphical user interfaces

Some IDE or GUI provides some press button solution to import data.

You may also have a look at speedR, a graphical user interface which helps at importing data from Excel, OpenOfficeCalc, CSV and other text files[2].

library(speedR)
speedR()
↑Jump back a section

CSV (csv,txt,dat)

You can import data from a text file (often CSV) using read.table(), read.csv() or read.csv2(). The option header = TRUE indicates that the first line of the CSV file should be interpreted as variables names and the option sep = gives the separator (generally "," or ";").

csv.get() (Hmisc) is another possibility.

mydata <- read.table("data.txt",header=TRUE)
mydata <- read.table("data.csv", header = TRUE, sep=",")  # import from a CSV
mydata <- read.csv("data.csv", header=T)
mydata <- read.table("data.csv", header = TRUE, sep=";") 
mydata <- read.csv2("data.csv", header=T)

Note that there is no problem if your data are stored on the internet.

df <- read.table("http://www.mywebsite.com/.../data.csv", header = TRUE, sep = ",")

By default, strings are converted to factors. If you want to avoid this conversion, you can specify the option stringsAsFactors = FALSE.

You can export data to a text file using write.table().

write.table(mydat,file="mydat.csv",quote=T,append=F,sep=",",eol = "\n", na = "NA", dec = ".", row.names = T,col.names = T)

For large CSV files, it is possible to use the ff package[3].

library("ff")
df <- read.csv.ffdf(file="large_csv_file.csv", header=TRUE, VERBOSE=TRUE, first.rows=10000, next.rows=50000)
↑Jump back a section

Fixed width text files

read.fwf() and write.fwf().

Some fixed width text files are provided with a SAS script to import them. Anthony Damico has created SAScii package to easily import those data[4].

↑Jump back a section

Unstructured text files

↑Jump back a section

Stata (dta)

  • We can read Stata data using read.dta() in the foreign package and export to Stata data format using write.dta().
  • Note that string variables in Stata are limited to 244 characters. This can be an issue during the exportation process.
  • See also Stata.file() in the memisc package and stata.get in the Hmisc package.
> library("foreign")
> mydata <- read.dta("mydata.dta",convert.dates = TRUE, convert.factors = TRUE, convert.underscore = TRUE)
> names(mydata)
> write.dta(mydata, file = "mydata.dta")
↑Jump back a section

SAS (sas7bdat)

Experimental support for SAS databases having the sas7bdat extension is provided by the sas7bdat[5] package. However, sas7bdat files generated by 64 bit versions of SAS, and SAS running on non-Microsoft Windows platforms are not yet supported.

SAS (xpt)

  • See also sasexport.get() and sas.get() in the Hmisc
  • See also the SASxport package.
library("foreign")
mydata<-read.xport("SASData.xpt")
names(mydata)
↑Jump back a section

SPSS (sav)

  • read.spss() (foreign) and spss.get() (Hmisc)
> library("foreign")
> mydata<-read.spss("SPSSData.sav")
> names(mydata)
↑Jump back a section

EViews

readEViews() in the hexView package for EViews files.

↑Jump back a section

Excel (xls,xlsx)

Importing data from Excel is not easy. The solution depends on your operating system. If none of the methods below works, you can always export each Excel spreadsheets to CSV format and read the CSV in R. This is often the simplest and quickest solution.

The RODBC solution (works fine on Windows):

library("RODBC")
channel <- odbcConnectExcel("Graphiques pourcent croissance.xls") # creates a connection
sqlTables(channel) # List all the tables
effec <- sqlFetch(channel, "effec") # Read one spreadsheet as an R table
odbcClose(channel) # close the connection (don't forget)

The xlsReadWrite package.

> library(xlsReadWrite)
mydat <- read.xls("myfile.xls", colNames = T, sheet = "mysheet", type = "data.frame", from = 1, checkNames = TRUE)
  • "sheet" specifies the name or the number of the sheet you want to import.
  • "from" specifies the first row of the spreadsheet.

The gnumeric package[6]. This package use an external software called ssconvert which is usually installed with gnumeric, the Gnome office spreadsheet. The read.gnumeric.sheet() function reads xls and xlsx files.

library("gnumeric")
df1 <- read.gnumeric.sheet(file = "df.xls", head = TRUE, sheet.name = "Feuille1")
df2 <- read.gnumeric.sheet(file = "df.xlsx", head = TRUE, sheet.name = "Feuille1")

See also xlsx for Excel 2007 documents and read.xls() (gdata).

The XLConnect package.

require("XLConnect")
wb <- loadWorkbook("myfile.xls", create = FALSE)
# Show a summary of the workbook (shows worksheets,
# defined names, hidden sheets, active sheet name, ...)
summary(wb)
# Read data from a worksheet interpreting the first row as column names
df1 <- readWorksheet(wb, sheet = "mysheet")
# Read data from a named region/range interpreting the first row as column
# names
df2 <- readNamedRegion(wb, name = "myname", header = TRUE)

XLConnect supports reading and writing both xls and xlsx file formats. Since it is based on Apache POI it only requires a Java installation and as such works on many platforms including Windows, UNIX/Linux and Mac. Besides reading & writing data it provides a number of additional features such as adding plots, cell styling & style actions and many more.

↑Jump back a section

Google Spread Sheets

You should make the spreadsheet public, publish it as a CSV file. Then you can read it in R using read.csv(). See more on the Revolution's computing blog (link). See also RGoogleDocs (link).

# Read from a Google SpreadSheet.
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))
↑Jump back a section

gnumeric spreadsheets

The gnumeric package[6]. read.gnumeric.sheet() reads one sheet and read.gnumeric.sheets() reads all sheets and store them in a list.

library("gnumeric")
df <- read.gnumeric.sheet(file = "df.gnumeric", head = TRUE, sheet.name = "df.csv")
View(df)
df <- read.gnumeric.sheets(file = "df.gnumeric", head = TRUE)
View(df$df.csv)
↑Jump back a section

OpenOffice and LibreOffice (ods)

The gnumeric package does a good job for ODS files[6].

library("gnumeric")
df <- read.gnumeric.sheet(file = "df.ods", head = TRUE, sheet.name = "Feuille1")

speedR is another alternative.

library("speedR")
df <- speedR.importany(file = "df.ods")

Note that you can also use the speedR graphical user interface (speedR()) which will return the command line for replication.

library("speedR")
speedR()

Or you may use ROpenOffice[7].

library("ROpenOffice")
df <- read.ods(file = "df.ods")
↑Jump back a section

JSON

JSON (JavaScript Object Notation) is a very common format on the internet. The rjson library makes it easy to import data from a json format[8].

# json.txt : a text file including data in the JSON format
library("rjson")
df <- fromJSON(paste(readLines("json.txt"), collapse=""))

Is is easy to export a list or a dataframe to a JSON format using the toJSON() function :

# df : a data frame
library("rjson")
json <- toJSON(df)
↑Jump back a section

dBase (dbf)

read.dbf() in the foreign package.

library("foreign")
df  <- read.dbf("salaries07.dbf")
str(df)
↑Jump back a section

Hierarchical Data Format (hdf5)

hdf5 data can be read using the hdf5 package[9].

↑Jump back a section

DICOM and NIfTI

  • See "Working with the {DICOM} and {NIfTI} Data Standards in R" in the Journal of Statistical Software[10]
↑Jump back a section

Resources

↑Jump back a section
Last modified on 7 March 2013, at 08:09