R Programming/Tobit And Selection Models

Tobit (type 1 Tobit)

In this section, we look at a simple Tobit model, in which the outcome variable is observed only if it lies above (or below) a given threshold. In the simulation below, the outcome is censored from below at 0.

  • tobit() in the AER package[1]. This is a wrapper for survreg().
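# Simulate a latent outcome and censor it from below at 0
N <- 1000
u <- rnorm(N)
x <- -1 + rnorm(N)
ystar <- 1 + x + u        # latent outcome, true slope = 1
y <- ystar * (ystar > 0)  # observed outcome: 0 whenever ystar <= 0
hist(y)

# OLS ignores the censoring; kept for comparison with the Tobit fit
ols <- lm(y ~ x)
summary(ols)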
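# Plot a correlation matrix and a scatter-plot matrix
library(GGally)
library(ggplot2)
library(ggfortify)
# DATA was not defined in the original; assumed here to hold the simulated y and x
DATA <- data.frame(y = y, x = x)
ggcorr(DATA)
ggpairs(DATA)
# Diagnostic plots for a linear model of y on all other columns of DATA
M <- lm(y ~ ., data = DATA)
autoplot(M, label.size = 3)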

library(AER)
# Store the fit under a name that does not mask the tobit() function
tobit.fit <- tobit(y ~ x, left = 0, right = Inf, dist = "gaussian")
summary(tobit.fit)
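As a rough illustration of what tobit() estimates, the censored-normal log-likelihood can be written out and maximised with optim() from base R. This is only a sketch under the simulation set-up above (censoring from below at 0, normal errors). The names negll, start and tobit.hand are made up for this example, and the seed is fixed only to make the run reproducible.

```r
set.seed(1)
N <- 1000
x <- -1 + rnorm(N)
ystar <- 1 + x + rnorm(N)
y <- ystar * (ystar > 0)

ols <- lm(y ~ x)

# Negative log-likelihood of a Tobit model censored from below at 0.
# par = (intercept, slope, log(sigma)); the log keeps sigma positive.
negll <- function(par) {
  mu <- par[1] + par[2] * x
  sigma <- exp(par[3])
  censored <- y <= 0
  ll <- ifelse(censored,
               pnorm(-mu / sigma, log.p = TRUE),           # P(ystar <= 0)
               dnorm(y, mean = mu, sd = sigma, log = TRUE)) # density of observed y
  -sum(ll)
}

# Start from the OLS estimates (an arbitrary but convenient choice)
start <- c(coef(ols), log(sd(residuals(ols))))
fit <- optim(start, negll, method = "BFGS")

tobit.hand <- c(intercept = fit$par[[1]],
                slope     = fit$par[[2]],
                sigma     = exp(fit$par[[3]]))
tobit.hand
coef(ols)
```

The hand-rolled estimates should be close to the true values used in the simulation (intercept 1, slope 1, sigma 1), whereas the OLS slope is pulled towards zero by the censoring.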

Selection models (type 2 tobit or heckit)

In this section we look at an endogenous selection process. The outcome y is observed only if d is equal to one, where d is a binary variable that is correlated with the error term of y.

  • heckit() and selection() in sampleSelection[2]. The command is called heckit() in honor of James Heckman[3].
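N <- 1000
u <- rnorm(N)
v <- rnorm(N)
x <- -1 + rnorm(N)
z <- 1 + rnorm(N)              # enters the selection equation only
d <- (1 + x + z + u + v > 0)   # selection indicator; shares u with the outcome
ystar <- 1 + x + u
y <- ystar * (d == 1)          # y is set to 0 whenever it is not selected
hist(y)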

ols <- lm(y ~ x)
summary(ols)

library(sampleSelection)
heckit.ml <- heckit(selection = d ~ x + z, outcome = y ~ x, method = "ml")
summary(heckit.ml)

heckit.2step <- heckit(selection = d ~ x + z, outcome = y ~ x, method = "2step")
summary(heckit.2step)
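The idea behind the two-step method can be sketched with base R alone: a probit for the selection equation, followed by OLS on the selected sample with the inverse Mills ratio added as a regressor. This is an illustrative sketch under the simulation set-up above, not a replacement for heckit(). The names probit, index, imr and step2 are made up for this example, and the standard errors reported by lm() in the second step are not corrected for the estimated regressor.

```r
set.seed(1)
N <- 1000
u <- rnorm(N)
v <- rnorm(N)
x <- -1 + rnorm(N)
z <- 1 + rnorm(N)
d <- (1 + x + z + u + v > 0)
y <- (1 + x + u) * d

# Step 1: probit for the selection indicator
probit <- glm(d ~ x + z, family = binomial(link = "probit"))

# Inverse Mills ratio evaluated at the fitted probit index
index <- predict(probit, type = "link")
imr <- dnorm(index) / pnorm(index)

# Step 2: OLS on the selected observations, controlling for the IMR
step2 <- lm(y ~ x + imr, subset = d)
coef(step2)
```

The coefficient on x should be close to the true value of 1, and the positive coefficient on imr reflects the positive correlation between the selection and outcome errors in this simulation.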

Multi-index selection models

In this section we look at endogenous selection processes in matching markets. Matching is concerned with who transacts with whom, and how: for example, which students attend which college. The outcome y is observed only for equilibrium student-college pairs (or matches). These matches are indicated by d equal to one, where d is a binary variable that is correlated with the error term of y.

  • stabit() and stabit2() in matchingMarkets.[4][5] The command is called stabit() in reference to the application in stable matching markets.

Simulate two-sided matching data for 20 markets (m=20), each with 100 students (nStudents=100) and 20 colleges with a quota of 5 students each (nSlots=rep(5,20)). The true parameters in the selection and outcome equations are all equal to 1.

library(matchingMarkets)
xdata <- stabsim2(m=20, nStudents=100, nSlots=rep(5,20),
  colleges = "c1",
  students = "s1",
  outcome = ~ c1:s1 + eta + nu,
  selection = ~ -1 + c1:s1 + eta
)

Observe the bias from sorting between students and colleges.

lm1 <- lm(y ~ c1:s1, data=xdata$OUT)
summary(lm1)

Correct for sorting bias by running the Gibbs sampler in Sorensen (2007).[6]

fit2 <- stabit2(OUT = xdata$OUT,
           colleges = "c1",
           students = "s1",
           outcome = y ~ c1:s1, 
           selection = ~ -1 + c1:s1,
           niter=1000
)
summary(fit2)

Truncation

  • truncreg package
  • DTDA package: "An R package for analyzing truncated data".
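To show what a truncated regression corrects for, the sketch below fits a left-truncated normal regression by maximum likelihood using only base R. It does not use either package listed above. The data-generating process is borrowed from the Tobit section, except that observations with a non-positive latent outcome are dropped entirely rather than recorded as 0. The names keep, yt, xt, negll and trunc.hand are made up for this example, and the sample size and seed are arbitrary choices.

```r
set.seed(1)
N <- 5000   # larger than above, chosen only to keep this sketch's estimates stable
x <- -1 + rnorm(N)
ystar <- 1 + x + rnorm(N)

# Truncation: observations with ystar <= 0 are not in the sample at all
keep <- ystar > 0
yt <- ystar[keep]
xt <- x[keep]

ols.trunc <- lm(yt ~ xt)

# Negative log-likelihood of a normal regression truncated from below at 0.
# Each density is rescaled by P(ystar > 0 | x) = pnorm(mu / sigma).
negll <- function(par) {
  mu <- par[1] + par[2] * xt
  sigma <- exp(par[3])
  -sum(dnorm(yt, mean = mu, sd = sigma, log = TRUE) -
         pnorm(mu / sigma, log.p = TRUE))
}

start <- c(coef(ols.trunc), log(sd(residuals(ols.trunc))))
fit <- optim(start, negll, method = "BFGS")

trunc.hand <- c(intercept = fit$par[[1]],
                slope     = fit$par[[2]],
                sigma     = exp(fit$par[[3]]))
trunc.hand
coef(ols.trunc)
```

As in the censored case, OLS on the truncated sample understates the slope, while the likelihood that accounts for truncation should recover a value near 1.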

References

  1. Christian Kleiber and Achim Zeileis (2008). Applied Econometrics with R. New York: Springer-Verlag. ISBN 978-0-387-77316-2. URL http://CRAN.R-project.org/package=AER
  2. Toomet, O. and Henningsen, A. (2008). "Sample Selection Models in R: Package sampleSelection". Journal of Statistical Software. 27 (7). URL http://www.jstatsoft.org/v27/i07
  3. Heckman, J. (1979). "Sample Selection Bias as a Specification Error". Econometrica. 47 (1): 153–161.
  4. Klein, T. (2015). "Analysis of Stable Matchings in R: Package matchingMarkets" (PDF). Vignette to R Package matchingMarkets.
  5. "matchingMarkets: Analysis of Stable Matchings". R Project.
  6. Sorensen, M. (2007). "How Smart is Smart Money? A Two-Sided Matching Model of Venture Capital". Journal of Finance. 62 (6): 2725–2762.