# R Programming/Binomial Models

In this section, we look at the binomial model. We have one outcome which is binary and a set of explanatory variables.

This kind of model can be analyzed using a linear probability model. However a drawback of this model for the parameter of the Bernoulli distribution is that, unless restrictions are placed on $\beta$ , the estimated coefficients can imply probabilities outside the unit interval $[0,1]$ . For this reason, models such as the logit model or the probit model are more commonly used. If you want to estimate a linear probability model, have a look at the linear models page.

## Logit model

The model takes the form : $y_{i}\sim Bernoulli(\pi _{i})$  with the inverse link function : $\pi _{i}={\frac {exp(x_{i}'\beta )}{(1+exp(x_{i}'\beta ))}}$ . It can be estimated using maximum likelihood or using bayesian methods.

### Fake data simulations

> x <- 1 + rnorm(1000,1)
> xbeta <- -1  + (x* 1)
> proba <- exp(xbeta)/(1 + exp(xbeta))
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> table(y)
> df <- data.frame(y,x)


### Maximum likelihood estimation

• The standard way to estimate a logit model is glm() function with family binomial and link logit.
• lrm() (Design) is another implementation of the logistic regression model.
• There is an implementation in the Zelig package.

In this example, we simulate a model with one continuous predictor and estimate this model using the glm() function.

> res <- glm(y ~ x , family  = binomial(link=logit))
> summary(res) # results
> confint(res) # confindence intervals
> names(res)
> exp(res$coefficients) # odds ratio > exp(confint(res)) # Confidence intervals for odds ratio (delta method) > predict(res) # prediction on a linear scale > predict(res, type = "response") # predicted probabilities > plot(x, predict(res, type = "response")) # plot the predicted probabilities  #### Zelig The Zelig' package makes it easy to compute all the quantities of interest. We develop a new example. First we simulate a new dataset with two continuous explanatory variables and we estimate the model using zelig() with the model = "logit" option. • We the look at the predicted values of y at the mean of x1 and x2 • Then we look at the predicted values when x1 = 0 and x2 = 0 • We also look at what happens when x1 changes from the 3rd to the 1st quartile. > x1 <- 1 + rnorm(1000) > x2 <- -1 + x1 + rnorm(1000) > xbeta <- -1 + x1 + x2 > proba <- exp(xbeta)/(1 + exp(xbeta)) > y <- ifelse(runif(1000,0,1) < proba,1,0) > mydat <- data.frame(y,x1,x2) > table(y) > > z.out <- zelig(y ~ x1 + x2, model = "logit", data = mydat) # estimating the model > summary(z.out) > x.out <- setx(z.out, x1 = mean(x1), x2 = mean(x2)) # setting values for the explanatory variables > s.out <- sim(z.out, x = x.out) # simulating the quantities of interest > summary(s.out) > plot(s.out) # plot the quantities of interest > # the same with other values > x.out <- setx(z.out, x1 = 0, x2 = 0) > s.out <- sim(z.out, x = x.out) > summary(s.out) > # What happens if x1 change from the 3rd quartile to the 1st quartile ? > x.high <- setx(z.out, x1 = quantile(mydat$x1,.75), x2 = mean(mydat$x2)) > x.low <- setx(z.out, x1 = quantile(mydat$x1,.25), x2 = mean(x2))
> s.out2<-sim(z.out, x=x.high, x1=x.low)
> plot(s.out2)

• ROC Curve in the verification package.
• Zelig has a rocplot() function.

### Bayesian estimation

• bayesglm() in the arm package
• MCMClogit() in the MCMCpack for a bayesian estimation of the logit model.
> # Data generating process
> x <- 1 + rnorm(1000,1)
> xbeta <- -1  + (x* 1)
> proba <- exp(xbeta)/(1 + exp(xbeta))
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> table(y)
>
> library(MCMCpack)
> res <- MCMClogit(y ~ x)
> summary(res)

> library("arm")
> res <- bayesglm(y ~ x, family = binomial(link=logit))
> summary(res)


## Probit model

The probit model is a binary model in which we assume that the link function is the cumulative density function of a normal distribution.

We simulate fake data. First, we draw two random variables x1 and x2 in any distributions (this does not matter). Then we create the vector xbeta as a linear combination of x1 and x2. We apply the link function to that vector and we draw the binary variable y as Bernouilli random variable.

> x1 <- 1 + rnorm(1000)
> x2 <- -1 + x1 + rnorm(1000)
> xbeta <- -1  + x1 + x2
> proba <- pnorm(xbeta)
> y <- ifelse(runif(1000,0,1) < proba,1,0)
> mydat <- data.frame(y,x1,x2)
> table(y)


### Maximum likelihood

We can use the glm() function with family=binomial(link=probit) option or the probit() function in the sampleSelection package which is a wrapper of the former one.

> res <- glm(y ~ x1 + x2 , family = binomial(link=probit), data = mydat)
> summary(res)
>
> library("sampleSelection")
> probit(y ~ x1 + x2, data = mydat)
> summary(res)


### Bayesian estimation

• MCMCprobit() (MCMCpack)
> library("MCMCpack")
> post <- MCMCprobit(y ~ x1 + x2 , data = mydat)
> summary(post)
> plot(post)