Last modified on 15 October 2014, at 12:43

R Programming/Importing and exporting data

Data can be stored in a large variety of formats. Each statistical package has its own format for data (xls for Microsoft Excel, dta for Stata, sas7bdat for SAS, ...). R can read almost all file formats. We present a method for each kind of file. If none of the following methods work, you can use a specific software for data conversion such as the free software OpenRefine or the commercial software Stat Transfer[1]. In any case, most statistical software can export data in a CSV (comma separated values) format and all of them can read CSV data. This is often the best solution to make data available to everyone.

Graphical user interfacesEdit

Some IDE or GUI provides some press button solution to import data.

You may also have a look at speedR, a graphical user interface which helps at importing data from Excel, OpenOfficeCalc, CSV and other text files[2].

library(speedR)
speedR()

CSV (csv,txt,dat)Edit

You can import data from a text file (often CSV) using read.table(), read.csv() or read.csv2(). The option header = TRUE indicates that the first line of the CSV file should be interpreted as variables names and the option sep = gives the separator (generally "," or ";").

csv.get() (Hmisc) is another possibility.

mydata <- read.table("data.txt",header=TRUE)
mydata <- read.table("data.csv", header = TRUE, sep=",")  # import from a CSV
mydata <- read.csv("data.csv", header=T)
mydata <- read.table("data.csv", header = TRUE, sep=";") 
mydata <- read.csv2("data.csv", header=T)

Note that there is no problem if your data are stored on the internet.

df <- read.table("http://www.mywebsite.com/.../data.csv", header = TRUE, sep = ",")

By default, strings are converted to factors. If you want to avoid this conversion, you can specify the option stringsAsFactors = FALSE.

You can export data to a text file using write.table().

write.table(mydat,file="mydat.csv",quote=T,append=F,sep=",",eol = "\n", na = "NA", dec = ".", row.names = T,col.names = T)

For large CSV files, it is possible to use the ff package[3].

library("ff")
df <- read.csv.ffdf(file="large_csv_file.csv", header=TRUE, VERBOSE=TRUE, first.rows=10000, next.rows=50000)

Fixed width text filesEdit

read.fwf() and write.fwf().

Some fixed width text files are provided with a SAS script to import them. Anthony Damico has created SAScii package to easily import those data[4].

Unstructured text filesEdit

Stata (dta)Edit

  • We can read Stata data using read.dta() in the foreign package and export to Stata data format using write.dta().
  • Note that string variables in Stata are limited to 244 characters. This can be an issue during the exportation process.
  • See also Stata.file() in the memisc package and stata.get in the Hmisc package.
> library("foreign")
> mydata <- read.dta("mydata.dta",convert.dates = TRUE, convert.factors = TRUE, convert.underscore = TRUE)
> names(mydata)
> write.dta(mydata, file = "mydata.dta")

SAS (sas7bdat)Edit

Experimental support for SAS databases having the sas7bdat extension is provided by the sas7bdat[5] package. However, sas7bdat files generated by 64 bit versions of SAS, and SAS running on non-Microsoft Windows platforms are not yet supported.

SAS (xpt)Edit

  • See also sasexport.get() and sas.get() in the Hmisc
  • See also the SASxport package.
library("foreign")
mydata<-read.xport("SASData.xpt")
names(mydata)

SPSS (sav)Edit

  • read.spss() (foreign) and spss.get() (Hmisc)
> library("foreign")
> mydata<-read.spss("SPSSData.sav")
> names(mydata)

EViewsEdit

readEViews() in the hexView package for EViews files.

Excel (xls,xlsx)Edit

Importing data from Excel is not easy. The solution depends on your operating system. If none of the methods below works, you can always export each Excel spreadsheets to CSV format and read the CSV in R. This is often the simplest and quickest solution.

XLConnect supports reading and writing both xls and xlsx file formats. Since it is based on Apache POI it only requires a Java installation and as such works on many platforms including Windows, UNIX/Linux and Mac. Besides reading & writing data it provides a number of additional features such as adding plots, cell styling & style actions and many more.

require("XLConnect")
wb <- loadWorkbook("myfile.xls", create = FALSE)
# Show a summary of the workbook (shows worksheets,
# defined names, hidden sheets, active sheet name, ...)
summary(wb)
# Read data from a worksheet interpreting the first row as column names
df1 <- readWorksheet(wb, sheet = "mysheet")
# Read data from a named region/range interpreting the first row as column
# names
df2 <- readNamedRegion(wb, name = "myname", header = TRUE)

The RODBC solution (works fine on Windows):

library("RODBC")
channel <- odbcConnectExcel("Graphiques pourcent croissance.xls") # creates a connection
sqlTables(channel) # List all the tables
effec <- sqlFetch(channel, "effec") # Read one spreadsheet as an R table
odbcClose(channel) # close the connection (don't forget)

The xlsReadWrite package.

> library(xlsReadWrite)
mydat <- read.xls("myfile.xls", colNames = T, sheet = "mysheet", type = "data.frame", from = 1, checkNames = TRUE)
  • "sheet" specifies the name or the number of the sheet you want to import.
  • "from" specifies the first row of the spreadsheet.

The gnumeric package[6]. This package use an external software called ssconvert which is usually installed with gnumeric, the Gnome office spreadsheet. The read.gnumeric.sheet() function reads xls and xlsx files.

library("gnumeric")
df1 <- read.gnumeric.sheet(file = "df.xls", head = TRUE, sheet.name = "Feuille1")
df2 <- read.gnumeric.sheet(file = "df.xlsx", head = TRUE, sheet.name = "Feuille1")

See also xlsx for Excel 2007 documents and read.xls() (gdata).

Google Spread SheetsEdit

You should make the spreadsheet public, publish it as a CSV file. Then you can read it in R using read.csv(). See more on the Revolution's computing blog (link). See also RGoogleDocs (link).

# Read from a Google SpreadSheet.
require(RCurl)
myCsv <- getURL("https://docs.google.com/spreadsheet/pub?hl=en_US&hl=en_US&key=0AkuuKBh0jM2TdGppUFFxcEdoUklCQlJhM2kweGpoUUE&single=true&gid=0&output=csv")
read.csv(textConnection(myCsv))

gnumeric spreadsheetsEdit

The gnumeric package[6]. read.gnumeric.sheet() reads one sheet and read.gnumeric.sheets() reads all sheets and store them in a list.

library("gnumeric")
df <- read.gnumeric.sheet(file = "df.gnumeric", head = TRUE, sheet.name = "df.csv")
View(df)
df <- read.gnumeric.sheets(file = "df.gnumeric", head = TRUE)
View(df$df.csv)

OpenOffice and LibreOffice (ods)Edit

The gnumeric package does a good job for ODS files[6].

library("gnumeric")
df <- read.gnumeric.sheet(file = "df.ods", head = TRUE, sheet.name = "Feuille1")

readODS does not require external dependencies, making it crossplatform.

library("readODS")
df=read.ods("df.ods")

Or you may use ROpenOffice[7].

library("ROpenOffice")
df <- read.ods(file = "df.ods")

no longer available in CRAN: speedR is another alternative.

library("speedR")
df <- speedR.importany(file = "df.ods")

Note that you can also use the speedR graphical user interface (speedR()) which will return the command line for replication.

library("speedR")
speedR()

JSONEdit

JSON (JavaScript Object Notation) is a very common format on the internet. The rjson library makes it easy to import data from a json format[8].

# json.txt : a text file including data in the JSON format
library("rjson")
df <- fromJSON(paste(readLines("json.txt"), collapse=""))

Is is easy to export a list or a dataframe to a JSON format using the toJSON() function :

# df : a data frame
library("rjson")
json <- toJSON(df)

dBase (dbf)Edit

read.dbf() in the foreign package.

library("foreign")
df  <- read.dbf("file.dbf")
str(df)

Hierarchical Data Format (hdf5)Edit

hdf5 data can be read using the hdf5 package[9].

SQLEdit

DICOM and NIfTIEdit

  • See "Working with the {DICOM} and {NIfTI} Data Standards in R" in the Journal of Statistical Software[10]

ResourcesEdit

ReferencesEdit

Previous: Data Management Index Next: Graphics