# Stata/Linear Models

## Simple Linear Model

We generate a simple fake data set and run an OLS regression:

clear
set obs 1000
gen u = invnorm(uniform())
gen x = invnorm(uniform())
gen y = 1 + x + u
reg y x
eret list /*gives the list of all stored results */
predict yhat /*gives the predicted value of y*/
predict res, res /*gives the residuals*/
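To see what reg computes, the sketch below recomputes the OLS coefficients from the formula b = (X'X)^(-1) X'y in Mata. It regenerates the same kind of data (the seed is our addition, for reproducibility) and uses the hypothetical matrix name b_hand; both listings at the end should show the same numbers.

```stata
* Sketch: reproduce the -reg y x- coefficients with b = (X'X)^(-1) X'y in Mata.
clear
set obs 1000
set seed 12345                              // added here for reproducibility
gen u = invnorm(uniform())
gen x = invnorm(uniform())
gen y = 1 + x + u
quietly reg y x
mata:
y = st_data(., "y")
X = (st_data(., "x"), J(rows(y), 1, 1))     // regressor first, constant last, as in e(b)
b = invsym(cross(X, X)) * cross(X, y)       // (X'X)^(-1) X'y, a 2 x 1 column vector
st_matrix("b_hand", b')                     // copy the coefficients to a Stata matrix
end
matrix list b_hand
matrix list e(b)
```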

leanout is a prefix that simplifies the regression output[1]. It does not display the ancillary statistics its author considers useless, and it focuses on confidence intervals rather than null hypothesis tests.

ssc install leanout
leanout : reg y x

## Performing multiple regressions on the same subsample

Sometimes you want to run several regressions on the same subsample. This does not happen automatically: Stata drops every observation in which one of the model's variables is missing, so models with different variables can end up using different observations. One way to guarantee a common sample is the 'e(sample)' function, which after an estimation command returns 1 for the observations used in the estimation and 0 otherwise. In the example below we store the result of 'e(sample)' in the variables 'samp1' and 'samp2' and then estimate both models conditional on 'samp1 == 1 & samp2 == 1'. This ensures that both estimations use exactly the same observations. The eststo and esttab commands come from the user-written estout package (ssc install estout).

. clear
. set obs 1000
. gen u = invnorm(uniform())
. gen x = invnorm(uniform())
. gen y1 = 1 + x + u if uniform() < .8
. gen y2 = 1 + x + u if uniform() < .9
. qui reg y1 x
. gen samp1 = e(sample)
. ta samp1
. qui reg y2 x
. gen samp2 = e(sample)
. ta samp2
. eststo clear
. eststo : qui : reg y1 x if samp1 & samp2
. eststo : qui : reg y2 x if samp1 & samp2
. esttab ,  star(* 0.1 ** 0.05 *** 0.01) se
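An alternative, swapped in here rather than taken from the original page, is to build the common-sample indicator directly with the official mark and markout commands: mark creates a variable equal to 1 for every observation, and markout resets it to 0 wherever any listed variable is missing. The indicator name common is our own. Because x is never missing in this simulation, common defines the same sample as samp1 & samp2 in the example above.

```stata
* Sketch: a common estimation sample built with -mark- and -markout-.
clear
set obs 1000
set seed 12345                              // added here for reproducibility
gen u = invnorm(uniform())
gen x = invnorm(uniform())
gen y1 = 1 + x + u if uniform() < .8
gen y2 = 1 + x + u if uniform() < .9
mark common                                 // 1 for every observation
markout common y1 y2 x                      // 0 if y1, y2 or x is missing
tab common
quietly reg y1 x if common
scalar n1 = e(N)
quietly reg y2 x if common
scalar n2 = e(N)
display "N for y1: " n1 "   N for y2: " n2
```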

## Instrumental Variables

Here is a data generating process for an instrumental variables setting. Because u enters the equation for x, x is correlated with the error term u, which makes x endogenous. z is independent of u and correlated with x, which makes it a valid instrument for x.

clear
set obs 1000
gen u = invnorm(uniform())
gen z = invnorm(uniform())
gen x = invnorm(uniform()) + z + u
gen y = 1 + 2*x + u
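Under this data generating process the large-sample behaviour of both estimators can be worked out exactly. Write $e$ (our label) for the first standard normal draw in the equation for x, so that $x = e + z + u$ with $e$, $z$, $u$ independent standard normals:

$$
\operatorname{Var}(x) = 3,\qquad \operatorname{Cov}(x,u) = 1,\qquad \operatorname{Cov}(z,x) = 1,\qquad \operatorname{Cov}(z,u) = 0,
$$
$$
\operatorname{plim}\hat\beta_{OLS} = 2 + \frac{\operatorname{Cov}(x,u)}{\operatorname{Var}(x)} = 2 + \tfrac{1}{3} \approx 2.33,
\qquad
\operatorname{plim}\hat\beta_{IV} = \frac{\operatorname{Cov}(z,y)}{\operatorname{Cov}(z,x)} = 2 + \frac{\operatorname{Cov}(z,u)}{\operatorname{Cov}(z,x)} = 2.
$$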

It is easy to see that the ordinary least squares estimate is biased (and inconsistent) because x is correlated with u, while the IV estimate is consistent:

eststo clear
eststo : reg y x
eststo : ivreg y (x=z)
esttab , se
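In the just-identified case the IV slope is the ratio Cov(z,y)/Cov(z,x), while the OLS slope is Cov(x,y)/Var(x). The sketch below recomputes both from the sample covariance matrix returned by correlate (the seed and the scalar names b_iv and b_ols are our additions). The results should match the slopes reported by regress and by ivregress 2sls, the official command that superseded ivreg in Stata 10.

```stata
* Sketch: OLS and just-identified IV slopes from sample covariances.
clear
set obs 1000
set seed 12345                              // added here for reproducibility
gen u = invnorm(uniform())
gen z = invnorm(uniform())
gen x = invnorm(uniform()) + z + u
gen y = 1 + 2*x + u
quietly correlate y x z, covariance
matrix C = r(C)                             // rows and columns ordered y, x, z
scalar b_iv  = C[3,1] / C[3,2]              // Cov(z,y) / Cov(z,x)
scalar b_ols = C[2,1] / C[2,2]              // Cov(x,y) / Var(x)
display "OLS slope: " b_ols "   IV slope: " b_iv
```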

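You can perform an overidentification test with the user-written commands overid or ivreg2 (both available with ssc install). In the example below there are two instruments, z1 and z2, for the single endogenous regressor x: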

clear
set obs 1000
gen u = invnorm(uniform())
gen z1 = invnorm(uniform())
gen z2 = invnorm(uniform())
gen x = invnorm(uniform()) + z1 - 2*z2 + u
gen y = 2*x + u

ivreg y (x=z1 z2)
overid
ivreg2 y (x=z1 z2)
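The Sargan statistic reported by these tests can be reproduced by hand as N times the R-squared from regressing the 2SLS residuals on all instruments. The sketch below swaps in Stata's official ivregress 2sls and estat overid for the user-written ivreg and overid; the seed and the names e_iv, sargan and sargan_stata are our own. Under the null of valid instruments the statistic is chi-squared with 2 - 1 = 1 degree of freedom.

```stata
* Sketch: Sargan overidentification test computed as N * R-squared.
clear
set obs 1000
set seed 12345                              // added here for reproducibility
gen u = invnorm(uniform())
gen z1 = invnorm(uniform())
gen z2 = invnorm(uniform())
gen x = invnorm(uniform()) + z1 - 2*z2 + u
gen y = 2*x + u
quietly ivregress 2sls y (x = z1 z2)
estat overid                                // official Sargan and Basmann tests
scalar sargan_stata = r(sargan)
predict double e_iv, residuals              // structural residuals, computed with x itself
quietly regress e_iv z1 z2                  // auxiliary regression on all exogenous variables
scalar sargan = e(N) * e(r2)                // N * R-squared
scalar p_sargan = chi2tail(1, sargan)       // df = 2 instruments - 1 endogenous regressor
display "Sargan (by hand) = " sargan "   p-value = " p_sargan
```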

## Seemingly Unrelated Equations

. clear
. set obs 1000
. local s11 = 1
. local s12 = .5
. local s22 = 1
. local s13 = .5
. local s23 = .5
. local s33 = 1
. forvalues k = 1/3 {
    tempvar u`k'
    gen `u`k'' = invnorm(uniform())
  }
. gen eta1 = `s11' * `u1'
. gen eta2 = `s12' * `u1' + `s22' * `u2'
. gen eta3 = `s13' * `u1' + `s23' * `u2' + `s33' * `u3'
. gen x = invnorm(uniform())
. forvalues k = 1/3 {
    gen z`k' = invnorm(uniform())
  }
. gen y1 = 1 + 2*x + z1 + eta1
. gen y2 = - 1 + x + z2 + eta2
. gen y3 = 4 + z3 + eta3
. global eq1 =  "y1 x z1"
. global eq2 =  "y2 x z2"
. global eq3 =  "y3 x z3"
. reg $eq1
. reg $eq2
. reg $eq3
. sureg (toto1 : $eq1) (toto2 : $eq2) (toto3 : $eq3)
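The locals s11 to s33 are the entries of a lower-triangular matrix L, so the errors (eta1, eta2, eta3) have covariance matrix Sigma = LL', which here works out to rows (1, .5, .5), (.5, 1.25, .75) and (.5, .75, 1.5). The sketch below rebuilds the same data with plain variables e1-e3 instead of tempvars (a simplification of ours, as are the seed and the matrix names), computes LL' and compares it with the residual covariance matrix e(Sigma) that sureg stores; the two should be close, up to sampling error.

```stata
* Sketch: compare the true error covariance LL' with sureg's e(Sigma).
clear
set obs 1000
set seed 12345                              // added here for reproducibility
forvalues k = 1/3 {
    gen e`k' = invnorm(uniform())
}
gen eta1 = e1
gen eta2 = .5*e1 + e2
gen eta3 = .5*e1 + .5*e2 + e3
gen x = invnorm(uniform())
forvalues k = 1/3 {
    gen z`k' = invnorm(uniform())
}
gen y1 = 1 + 2*x + z1 + eta1
gen y2 = -1 + x + z2 + eta2
gen y3 = 4 + z3 + eta3
matrix L = (1, 0, 0 \ .5, 1, 0 \ .5, .5, 1) // s11..s33 arranged as a lower-triangular matrix
matrix Sigma_true = L * L'
quietly sureg (y1 x z1) (y2 x z2) (y3 x z3)
matrix list Sigma_true
matrix list e(Sigma)
```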

## Linear Panel Data

• xtset
• xtreg
• xtabond
• xtabond2
• ivreg2
• xtivreg2
• ivendog
• ivhettest
• overid : overidentification test
• xtoverid : overidentification test
• xttest2
• ivgmm0
• xtarsim
• xtdpd
• xtdpdsys

### Random effects estimator

We assume $y_{it} = 1 + x_{it} + z_{i} + f_{i} + u_{it}$, where the individual effect $f_i$ and the idiosyncratic error $u_{it}$ are both independent of $x$ and $z$.
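Under these assumptions the composite error $v_{it} = f_i + u_{it}$ (our notation) is correlated across periods for the same individual, which is what random effects GLS exploits. For a balanced panel with $T$ periods, GLS is equivalent to OLS on quasi-demeaned data:

$$
\operatorname{Var}(v_{it}) = \sigma_f^2 + \sigma_u^2,\qquad
\operatorname{Cov}(v_{it}, v_{is}) = \sigma_f^2 \quad (t \neq s),\qquad
\theta = 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_f^2}},
$$
$$
y_{it} - \theta\bar y_i = (1-\theta) + (x_{it} - \theta\bar x_i) + (1-\theta)\,z_i + (v_{it} - \theta\bar v_i).
$$

In the simulation that follows, $\sigma_f^2 = \sigma_u^2 = 1$ and $T = 10$, so $\theta = 1 - \sqrt{1/11} \approx 0.698$.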

. clear
. set obs 1000
. gen id = _n
. gen f = invnorm(uniform())
. gen z = uniform()
. expand 10
. gen u = invnorm(uniform())
. gen x = uniform()
. gen y = 1 + x + z + f + u
. eststo clear
. eststo : qui : reg y x z
. eststo : qui : reg y x z, robust
. eststo : qui : reg y x z, cluster(id)
. xtset id
. eststo : qui : xtreg y x z, re
. eststo : qui : xtreg y x z, mle
. eststo : qui : xtmixed y x z || id : , mle
. esttab * , se
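As a check on what xtreg, re computes, the sketch below reruns the simulation (seed added), reads the estimated quasi-demeaning weight from e(theta), and reproduces the random effects coefficient on x by OLS on quasi-demeaned data, with the transformed constant 1 - theta entered as a regressor. The variable names m_* and qd_* are introduced here for illustration.

```stata
* Sketch: random effects GLS as OLS on quasi-demeaned data (balanced panel).
clear
set obs 1000
set seed 12345                              // added here for reproducibility
gen id = _n
gen f = invnorm(uniform())
gen z = uniform()
expand 10
gen u = invnorm(uniform())
gen x = uniform()
gen y = 1 + x + z + f + u
xtset id
quietly xtreg y x z, re
scalar theta = e(theta)                     // estimated quasi-demeaning weight
scalar b_re = _b[x]
foreach v in y x z {
    egen double m_`v' = mean(`v'), by(id)   // individual means
    gen double qd_`v' = `v' - theta * m_`v' // quasi-demeaned variable
}
gen double qd_cons = 1 - theta              // transformed constant
quietly regress qd_y qd_x qd_z qd_cons, noconstant
display "theta: " theta "   xtreg, re: " b_re "   quasi-demeaned OLS: " _b[qd_x]
```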

### Dynamic Linear Panel Data

You can use the Layard and Nickell unemployment dataset, which can be loaded directly from the web:

. use http://fmwww.bc.edu/ec-p/data/macro/abdata.dta, clear
(Layard & Nickell, Unemployment in Britain, Economica 53, 1986 from Ox dist)

You can also generate fake data:

clear
set obs 10000
set seed 123456
gen id = _n
gen f= invnorm(uniform())
forvalues t=1/5{
gen u`t' = invnorm(uniform())
}
gen y1 = f/.3 + u1
forvalues t=2/5{
local z=`t'-1
gen y`t' =  .7 * y`z' +  f +  u`t'
}
save wide, replace
reshape long y, i(id) j(year)
drop u* f
tsset id year
save long, replace
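The initial condition y1 = f/.3 + u1 starts each individual at the long-run mean of the autoregressive process. Writing $\rho = 0.7$ and $\mu_i$ for that mean (our notation):

$$
\mu_i = \rho\,\mu_i + f_i \;\Rightarrow\; \mu_i = \frac{f_i}{1-\rho} = \frac{f_i}{0.3},
\qquad
y_{it} - \mu_i = \rho\,(y_{i,t-1} - \mu_i) + u_{it}.
$$

In particular every $y_{i,t-1}$ contains $f_i/0.3$, so the lagged dependent variable is correlated with the individual effect.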

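With a lagged dependent variable, the standard random effects and fixed effects estimators are biased. The lag l.y contains the individual effect f, which violates the random effects assumptions, and the within transformation makes the demeaned lag correlated with the demeaned error (the Nickell bias, which is large when T is small, as here with T = 5). The instrumented estimators below use deeper lags of y as instruments to address this; compare their estimates with the true coefficient of 0.7: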

eststo clear
eststo : qui : xtreg y l.y, re
eststo : qui : xtreg y l.y, fe
eststo : qui : xtivreg y (l.y= l2.d.y) , re
eststo : qui : xtivreg y (l.y= l2.y) , fd
esttab  ,se
eststo clear
eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level))  nomata  robust
eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level))  ivstyle( , e(diff)) nomata  robust
eststo : qui : xi : xtabond2 y l.y, iv(l.y l2.y l3.y, equation(diff))   nomata  robust
esttab , se
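The idea behind the fd estimator above is the Anderson-Hsiao approach: first-differencing removes f, and because u is serially uncorrelated, the level of y two periods back is a valid instrument for the differenced lag. The sketch below writes one variant of it with the official ivregress instead of xtivreg (a substitution of ours; it uses the level L2.y, not its difference, as the instrument) and clusters standard errors by id because the differenced error is serially correlated. The data follow the generating process above; the estimate should be reasonably close to the true 0.7.

```stata
* Sketch: Anderson-Hsiao IV on first differences, same DGP as above.
clear
set obs 10000
set seed 123456
gen id = _n
gen f = invnorm(uniform())
gen y1 = f/.3 + invnorm(uniform())
forvalues t = 2/5 {
    local s = `t' - 1
    gen y`t' = .7*y`s' + f + invnorm(uniform())
}
reshape long y, i(id) j(year)
xtset id year
* D.y removes f; LD.y is instrumented by the level L2.y
ivregress 2sls D.y (LD.y = L2.y), vce(cluster id)
```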

## References

1. Nathaniel Beck "leanout: A prefix to regress (and similar commands) to produce less output that is more useful" Stata Journal, forthcoming http://politics.as.nyu.edu/docs/IO/2576/sj_driver.pdf
Last modified on 18 October 2011, at 16:00