Stata/Linear Models

References

UCLA Computing STATA Regression page

Simple Linear Model

We generate a simple fake data set from the model y = 1 + x + u and fit it by OLS:

clear
set obs 1000
gen u = invnorm(uniform())
gen x = invnorm(uniform()) 
gen y = 1 + x + u

reg y x
eret list /*gives the list of all stored results */
predict yhat /*gives the predicted value of y*/
predict res, res /*gives the residuals*/
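The sketch below shows what the stored results and `predict` return. It regenerates the same design using `rnormal()`, the current equivalent of `invnorm(uniform())`, with an arbitrary seed. It reads the slope and its standard error from `_b[]` and `_se[]`, then checks two OLS identities: the residuals equal y - yhat, and they average zero because the model includes a constant.

```stata
clear
set obs 1000
set seed 12345
gen u = rnormal()
gen x = rnormal()
gen y = 1 + x + u

quietly reg y x
* coefficient and standard error of the last estimation
display "slope on x = " _b[x] ", standard error = " _se[x]

predict yhat            /* fitted values (xb is the default after regress) */
predict res, residuals  /* residuals */

* residuals are y - yhat, up to float storage rounding
gen double check = y - yhat - res
summarize check
* with a constant in the model, OLS residuals average zero
summarize res
scalar mean_res = r(mean)
display "mean residual = " mean_res
```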

leanout is a prefix that simplifies regression output[1]. It omits ancillary statistics that are rarely useful and focuses on confidence intervals rather than null hypothesis testing.

ssc install leanout
leanout : reg y x
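Because leanout emphasizes confidence intervals, here is how to rebuild the 95% interval for the slope by hand. This sketch uses simulated data with an arbitrary seed. It combines the stored coefficient and standard error with the t quantile from `invttail()` at `e(df_r)` residual degrees of freedom. The bounds should match the ones printed by `regress`.

```stata
clear
set obs 1000
set seed 12345
gen x = rnormal()
gen y = 1 + x + rnormal()

quietly reg y x
* two-sided 95% critical value of the t distribution
scalar tcrit = invttail(e(df_r), .025)
scalar lo = _b[x] - tcrit * _se[x]
scalar hi = _b[x] + tcrit * _se[x]
display "95% CI for the slope on x: [" lo ", " hi "]"
```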

Performing multiple regressions on the same subsample

Sometimes you want to run several regressions on the same subsample. This does not happen automatically: Stata drops an observation from an estimation whenever one of the model's variables is missing, so models with different variables can end up using different observations. One way to make sure you use the same subsample is the 'e(sample)' function, which marks the observations used by the last estimation. In the example below we store the result of 'e(sample)' in the variables 'samp1' and 'samp2' and estimate both models conditioning on 'samp1 == 1 & samp2 == 1'. Both estimations are therefore done on the same observations.

 
* eststo and esttab come from the user-written estout package
ssc install estout, replace
clear
set obs 1000
gen u = invnorm(uniform())
gen x = invnorm(uniform())
gen y1 = 1 + x + u if uniform() < .8
gen y2 = 1 + x + u if uniform() < .9
qui reg y1 x
gen samp1 = e(sample)
ta samp1
qui reg y2 x
gen samp2 = e(sample)
ta samp2
eststo clear
eststo : qui : reg y1 x if samp1 & samp2
eststo : qui : reg y2 x if samp1 & samp2
esttab , star(* 0.1 ** 0.05 *** 0.01) se
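You can avoid running each model twice by marking the common sample directly. The built-in `missing()` function returns 1 if any of its arguments is missing. The indicator below, given the illustrative name `touse`, is therefore 1 exactly when y1, y2 and x are all present. This sketch uses the same simulated design with an arbitrary seed:

```stata
clear
set obs 1000
set seed 2024
gen u = rnormal()
gen x = rnormal()
gen y1 = 1 + x + u if runiform() < .8
gen y2 = 1 + x + u if runiform() < .9

* 1 if none of the model variables is missing, 0 otherwise
gen byte touse = !missing(y1, y2, x)
count if touse

reg y1 x if touse
reg y2 x if touse
```

The official `mark` and `markout` commands do the same job (`mark touse` followed by `markout touse y1 y2 x`).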

Instrumental Variables

Here is a data generating process for an instrumental variable setting. u enters both x and y, so x is correlated with the error term (endogeneity). z is independent of u and correlated with x, which makes it a valid instrument for x.

clear
set obs 1000
gen u = invnorm(uniform())
gen z = invnorm(uniform())
gen x = invnorm(uniform()) + z + u
gen y = 1 + 2*x + u
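Every variable in this design is a standard normal draw, so the probability limits of both estimators can be computed directly. Write e for the extra normal draw inside x, so that x = e + z + u with e, z and u independent (e is notation introduced here, not a variable in the code):

```latex
\begin{aligned}
\operatorname{Var}(x) &= \operatorname{Var}(e) + \operatorname{Var}(z) + \operatorname{Var}(u) = 3,
\qquad \operatorname{Cov}(x,u) = \operatorname{Var}(u) = 1,\\
\operatorname{plim}\,\hat\beta_{OLS} &= 2 + \frac{\operatorname{Cov}(x,u)}{\operatorname{Var}(x)} = 2 + \frac{1}{3} \approx 2.33,\\
\operatorname{plim}\,\hat\beta_{IV} &= \frac{\operatorname{Cov}(z,y)}{\operatorname{Cov}(z,x)}
 = \frac{2\operatorname{Cov}(z,x) + \operatorname{Cov}(z,u)}{\operatorname{Cov}(z,x)} = 2
 \quad \text{since } \operatorname{Cov}(z,u) = 0.
\end{aligned}
```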

It is easy to see that the ordinary least squares estimate is biased, while the IV estimate is consistent for the true coefficient 2:

eststo clear
eststo : reg y x 
eststo : ivreg y (x=z)
esttab , se
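With a single instrument and a single endogenous regressor, the IV slope equals the reduced-form coefficient (y on z) divided by the first-stage coefficient (x on z). This ratio is known as indirect least squares. The sketch below regenerates the design above with an arbitrary seed and compares the ratio with the official `ivregress 2sls`:

```stata
clear
set obs 1000
set seed 4321
gen u = rnormal()
gen z = rnormal()
gen x = rnormal() + z + u
gen y = 1 + 2*x + u

* reduced form: outcome on the instrument
quietly reg y z
scalar b_rf = _b[z]
* first stage: endogenous regressor on the instrument
quietly reg x z
scalar b_fs = _b[z]
* indirect least squares: ratio of the two slopes
scalar b_ils = b_rf / b_fs
display "indirect least squares slope = " b_ils

* two-stage least squares gives the same point estimate
ivregress 2sls y (x = z)
```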

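With more instruments than endogenous regressors, you can test the overidentifying restrictions using the user-written commands overid (run after ivreg) or ivreg2, both available from SSC: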

clear
set obs 1000
gen u = invnorm(uniform())
gen z1 = invnorm(uniform())
gen z2 = invnorm(uniform())
gen x = invnorm(uniform()) + z1 - 2*z2 + u
gen y = 2*x + u

ivreg y (x=z1 z2)
overid
ivreg2 y (x=z1 z2)
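The Sargan statistic can also be computed by hand. Regress the 2SLS residuals on all the exogenous variables and multiply the R-squared by the sample size. Under the null that the instruments are valid, the statistic is chi-squared with degrees of freedom equal to the number of excluded instruments minus the number of endogenous regressors, here 2 - 1 = 1. This sketch uses an arbitrary seed and the official `ivregress 2sls`, whose `estat overid` reports a Sargan statistic for comparison:

```stata
clear
set obs 1000
set seed 777
gen u = rnormal()
gen z1 = rnormal()
gen z2 = rnormal()
gen x = rnormal() + z1 - 2*z2 + u
gen y = 2*x + u

ivregress 2sls y (x = z1 z2)
* official overidentification tests for comparison
estat overid
predict double ehat, residuals

* auxiliary regression of the 2SLS residuals on all exogenous variables
quietly reg ehat z1 z2
scalar sargan = e(N) * e(r2)
display "Sargan = " sargan ", p-value = " chi2tail(1, sargan)
```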

Seemingly Unrelated Equations

clear
set obs 1000
local s11 = 1
local s12 = .5
local s22 = 1
local s13 = .5
local s23 = .5
local s33 = 1
forvalues k = 1/3 {
    tempvar u`k'
    gen `u`k'' = invnorm(uniform())
}
gen eta1 = `s11' * `u1'
gen eta2 = `s12' * `u1' + `s22' * `u2'
gen eta3 = `s13' * `u1' + `s23' * `u2' + `s33' * `u3'
gen x = invnorm(uniform())
forvalues k = 1/3 {
    gen z`k' = invnorm(uniform())
}
gen y1 = 1 + 2*x + z1 + eta1
gen y2 = -1 + x + z2 + eta2
gen y3 = 4 + z3 + eta3
global eq1 "y1 x z1"
global eq2 "y2 x z2"
global eq3 "y3 x z3"
reg $eq1
reg $eq2
reg $eq3
sureg (toto1 : $eq1) (toto2 : $eq2) (toto3 : $eq3)
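The locals s11 to s33 are lower-triangular weights that build the correlated errors from the independent standard normals u1, u2 and u3, so that eta = L u. The implied error covariance, which is what sureg exploits, is shown below. sureg stores its estimate in e(Sigma), and its corr option should display the residual correlation matrix together with a Breusch-Pagan test of independence.

```latex
L = \begin{pmatrix} 1 & 0 & 0 \\ 0.5 & 1 & 0 \\ 0.5 & 0.5 & 1 \end{pmatrix},
\qquad
\Sigma = \operatorname{Var}(\eta) = L L^{\top} =
\begin{pmatrix} 1 & 0.5 & 0.5 \\ 0.5 & 1.25 & 0.75 \\ 0.5 & 0.75 & 1.5 \end{pmatrix}
```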

Linear Panel Data

xtset
xtreg
xtabond
xtabond2
ivreg2
xtivreg2
ivendog
ivhettest
overid : overidentification test
xtoverid : overidentification test
xttest2
ivgmm0
xtarsim
xtdpd
xtdpdsys

Random effect estimator

We assume $y_{it}=1+x_{it}+z_{i}+f_{i}+u_{it}$, where the individual effect f is independent of x and z, and the idiosyncratic error u is independent of x and z.
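In the simulation below, both error components have unit variance and each individual is observed T = 10 times (the expand 10 line). The random-effects GLS estimator quasi-demeans the data with a weight theta. The share of the composite error variance due to f, which xtreg reports as rho, follows directly:

```latex
\begin{aligned}
\operatorname{Var}(f_i + u_{it}) &= \sigma_f^2 + \sigma_u^2 = 1 + 1 = 2,
\qquad \rho = \frac{\sigma_f^2}{\sigma_f^2 + \sigma_u^2} = 0.5,\\
y_{it} - \theta\,\bar y_i &= (1-\theta) + (x_{it} - \theta\,\bar x_i) + (1-\theta)\,z_i
 + \big[(1-\theta)\,f_i + u_{it} - \theta\,\bar u_i\big],\\
\theta &= 1 - \sqrt{\frac{\sigma_u^2}{\sigma_u^2 + T\sigma_f^2}} = 1 - \sqrt{\frac{1}{11}} \approx 0.70 .
\end{aligned}
```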

clear
set obs 1000
gen id = _n
gen f = invnorm(uniform())
gen z = uniform()
expand 10
gen u = invnorm(uniform())
gen x = uniform()
gen y = 1 + x + z + f + u
eststo clear
eststo : qui : reg y x z
eststo : qui : reg y x z, robust
eststo : qui : reg y x z, cluster(id)
eststo : qui : xtreg y x z, i(id) re
eststo : qui : xtreg y x z, i(id) mle
eststo : qui : xtmixed y x z || id : , mle
esttab * , se
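After xtreg, re, the estimated standard deviations of the two error components are stored in e(sigma_u) and e(sigma_e), along with e(rho). Note that Stata's sigma_u refers to the individual effect (f here) and sigma_e to the idiosyncratic error (u here). This sketch regenerates the design with an arbitrary seed and declares the panel once with xtset instead of the older i() option. It then computes theta by hand from the stored results. With this design, rho should be near 0.5 and theta near 0.70.

```stata
clear
set obs 1000
set seed 98765
gen id = _n
gen f = rnormal()
gen z = runiform()
expand 10
gen u = rnormal()
gen x = runiform()
gen y = 1 + x + z + f + u

* declare the panel identifier once instead of passing i(id) to each command
xtset id
xtreg y x z, re

* Stata's sigma_u is the individual effect (f), sigma_e the idiosyncratic error (u)
display "sd(f) = " e(sigma_u) ", sd(u) = " e(sigma_e) ", rho = " e(rho)
* quasi-demeaning weight for a balanced panel with T = 10
scalar theta = 1 - sqrt(e(sigma_e)^2 / (e(sigma_e)^2 + 10 * e(sigma_u)^2))
display "theta = " theta
```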

Dynamic Linear Panel Data

Layard and Nickell unemployment dataset:

use http://fmwww.bc.edu/ec-p/data/macro/abdata.dta, clear
(Layard & Nickell, Unemployment in Britain, Economica 53, 1986 from Ox dist)

You can also generate fake data from the dynamic model $y_{it} = 0.7\,y_{i,t-1} + f_i + u_{it}$:

clear
set obs 10000
set seed 123456
gen id = _n
gen f = invnorm(uniform())
forvalues t = 1/5 {
    gen u`t' = invnorm(uniform())
}
* initial condition centred on the long-run mean f/(1 - .7)
gen y1 = f/.3 + u1
forvalues t = 2/5 {
    local z = `t' - 1
    gen y`t' = .7 * y`z' + f + u`t'
}
save wide, replace
reshape long y, i(id) j(year)
drop u* f
* declare the panel: individuals id observed over year
xtset id year
save long, replace

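The standard random-effects and fixed-effects estimators are biased here. The lagged dependent variable l.y depends on the individual effect f, which biases random effects. When T is small, l.y is also correlated with the time-demeaned error, which biases fixed effects. The instrumented random-effects and first-difference estimators below use lags of y as instruments to address this: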

eststo clear
eststo : qui : xtreg y l.y, re 
eststo : qui : xtreg y l.y, fe 
eststo : qui : xtivreg y (l.y= l2.d.y) , re 
eststo : qui : xtivreg y (l.y= l2.y) , fd 
esttab  ,se
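The first-difference IV estimator works because differencing the model $y_{it} = 0.7\,y_{i,t-1} + f_i + u_{it}$ removes f_i (for t of at least 3). The differenced lag is then correlated with the differenced error, but the second lag is not:

```latex
\begin{aligned}
\Delta y_{it} &= 0.7\,\Delta y_{i,t-1} + \Delta u_{it}, \qquad \Delta u_{it} = u_{it} - u_{i,t-1},\\
\operatorname{Cov}(\Delta y_{i,t-1}, \Delta u_{it}) &= -\operatorname{Cov}(y_{i,t-1}, u_{i,t-1}) = -\sigma_u^2 \neq 0,\\
\operatorname{Cov}(y_{i,t-2}, \Delta u_{it}) &= 0 .
\end{aligned}
```

So y_{i,t-2} (or its difference) is a valid instrument for the differenced lag. This is the Anderson-Hsiao idea, and using all available lags as instruments leads to the Arellano-Bond GMM estimator fitted with xtabond2 below.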

eststo clear
eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level))  nomata  robust
eststo : qui : xi : xtabond2 y l.y, gmmstyle(l.y, lag(2 .) equation(level))  ivstyle( , e(diff)) nomata  robust
eststo : qui : xi : xtabond2 y l.y, iv(l.y l2.y l3.y, equation(diff))   nomata  robust
esttab , se
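The official Arellano-Bond (xtabond) and Blundell-Bond system (xtdpdsys) estimators in the command list above fit the same model without user-written commands. This sketch regenerates the simulated panel with `rnormal()` draws (so the data differ from the file saved above) and fits both with robust standard errors. The coefficient on L.y should be near 0.7.

```stata
clear
set obs 10000
set seed 123456
gen id = _n
gen f = rnormal()
gen y1 = f/.3 + rnormal()
forvalues t = 2/5 {
    local s = `t' - 1
    gen y`t' = .7 * y`s' + f + rnormal()
}
reshape long y, i(id) j(year)
xtset id year

* Arellano-Bond difference GMM with one lag of the dependent variable
xtabond y, lags(1) vce(robust)
scalar rho_ab = _b[L.y]
* Arellano-Bond test for serial correlation in the differenced residuals
estat abond

* Blundell-Bond system GMM
xtdpdsys y, lags(1) vce(robust)
scalar rho_bb = _b[L.y]
```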

References

[1] Nathaniel Beck, "leanout: A prefix to regress (and similar commands) to produce less output that is more useful", Stata Journal, forthcoming. http://politics.as.nyu.edu/docs/IO/2576/sj_driver.pdf
