Econometric Theory/Heteroskedasticity

One of our classical linear regression (CLR) assumptions is that the disturbance terms are homoscedastic, meaning they have equal scatter ( $Var(\epsilon _{i})=\sigma ^{2}$ for every observation $i$ ). However, there are times when regressions end up with heteroscedastic disturbance terms, meaning the scatters are unequal ( $Var(\epsilon _{i})=\sigma _{i}^{2}$ ).

Causes of Heteroscedasticity

Heteroscedasticity is more common in cross-sectional data than in time series. It is usually due to a scale or size factor.

Example: In basic Keynesian economics, we assume that savings are determined by wealth and income. Agents that have more wealth and income have more scope to save, so the spread of savings widens as income rises. This produces a heteroscedastic relationship.

The main causes include:

1. Presence of outliers
2. Omission of a relevant variable
3. Skewness in the distribution of regressors
4. Incorrect functional form
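As a rough illustration of the scale effect in the savings example, the following Python sketch simulates savings whose error spread grows with income and compares the residual variance of low-income and high-income observations. The coefficients, the income range, and the assumption that the error standard deviation is proportional to income are illustrative choices, not taken from the text.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000

# Hypothetical incomes, in arbitrary units
income = rng.uniform(10.0, 100.0, size=n)

# Assumption: the error standard deviation is proportional to income
eps = rng.normal(0.0, 0.05 * income)
savings = -2.0 + 0.2 * income + eps

# Fit savings = b0 + b1 * income by OLS
X = np.column_stack([np.ones(n), income])
beta_hat, *_ = np.linalg.lstsq(X, savings, rcond=None)
resid = savings - X @ beta_hat

# Compare the residual scatter below and above the median income
low = income < np.median(income)
var_low = resid[low].var()
var_high = resid[~low].var()

print("residual variance, low income: ", var_low)
print("residual variance, high income:", var_high)
```

Under this assumed data-generating process the high-income residuals should show a visibly larger variance, which is the unequal scatter that defines heteroscedasticity.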

Consequences of Heteroscedasticity

1) OLS coefficients are still unbiased for the true value: $E({\hat {\beta }})=\beta$

Unbiasedness depends only on $E(\epsilon _{i})=0$ and $cov(x_{i},\epsilon _{i})=0$, neither of which involves the variance of the disturbances.

So the regression is safe from heteroscedasticity on this count.

2) OLS coefficients are not efficient. There exists an alternative unbiased estimator that has a smaller variance than the OLS one: $\exists \,{\tilde {\beta }}\ {\text{s.t.}}\ var({\tilde {\beta }})<var({\hat {\beta }}^{OLS})$ where $E({\tilde {\beta }})=\beta$, and ${\tilde {\beta }}$ is therefore more efficient.

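3) In the presence of heteroscedasticity, hypothesis tests on an OLS coefficient that are based on its conventional standard error are invalid.

The conventional estimator of the variance of the OLS estimators is biased, so the reported standard errors, and the test statistics and confidence intervals built from them, are unreliable.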
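The three consequences can be checked with a small Monte Carlo experiment. The sketch below is only an illustration under stated assumptions: the regressor grid, the true coefficients, and the error standard deviations $\sigma _{i}=0.1x_{i}^{2}$ are made up, weighted least squares with known weights $1/\sigma _{i}^{2}$ stands in for the "alternative" estimator ${\tilde {\beta }}$ (the text does not name one), and 1.98 is used as an approximate 5% two-sided critical value for 98 degrees of freedom.

```python
import numpy as np

rng = np.random.default_rng(42)
n, reps = 100, 5000
beta0, beta1 = 1.0, 2.0              # assumed true coefficients

x = np.linspace(1.0, 10.0, n)        # fixed regressor values
sigma = 0.1 * x**2                   # assumed error standard deviations

X = np.column_stack([np.ones(n), x])
XtX_inv = np.linalg.inv(X.T @ X)

# WLS: rescale each observation by 1/sigma_i (weights treated as known)
sqrt_w = 1.0 / sigma
Xw = X * sqrt_w[:, None]

ols_slopes = np.empty(reps)
wls_slopes = np.empty(reps)
rejections = 0

for r in range(reps):
    y = beta0 + beta1 * x + rng.normal(0.0, sigma)

    # OLS estimate and its conventional (homoscedastic) standard error
    b_ols = XtX_inv @ X.T @ y
    resid = y - X @ b_ols
    s2 = resid @ resid / (n - 2)
    se_naive = np.sqrt(s2 * XtX_inv[1, 1])

    # t-test of the true null H0: slope = beta1
    t_stat = (b_ols[1] - beta1) / se_naive
    rejections += int(abs(t_stat) > 1.98)

    # WLS estimate on the rescaled data
    b_wls, *_ = np.linalg.lstsq(Xw, y * sqrt_w, rcond=None)

    ols_slopes[r] = b_ols[1]
    wls_slopes[r] = b_wls[1]

mean_ols = ols_slopes.mean()
var_ols = ols_slopes.var()
var_wls = wls_slopes.var()
reject_rate = rejections / reps

print("mean OLS slope:", mean_ols, "(true value", beta1, ")")
print("variance of OLS slope:", var_ols)
print("variance of WLS slope:", var_wls)
print("rejection rate of a true null at nominal 5%:", reject_rate)
```

In this setup the average OLS slope should sit close to the true value (consequence 1), the WLS slope should vary less across replications than the OLS slope (consequence 2), and the conventional t-test should reject a true null more often than its nominal 5% level (consequence 3).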

Recall: ${\hat {\beta }}^{OLS}=\beta +{\frac {\sum (x_{i}-{\bar {x}})\epsilon _{i}}{\sum (x_{i}-{\bar {x}})^{2}}}$

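The second term has expectation zero whether or not the disturbances are homoscedastic, which is why ${\hat {\beta }}$ remains unbiased. What heteroscedasticity changes is the variance of this term, and therefore the variance of ${\hat {\beta }}^{OLS}$.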

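Derive $var({\hat {\beta }}^{OLS})$ when $var(\epsilon _{i})=\sigma _{i}^{2}$, treating the $x_{i}$ as fixed and the disturbances as uncorrelated across observations:

$var({\hat {\beta }}^{OLS})=var\left({\frac {\sum (x_{i}-{\bar {x}})\epsilon _{i}}{\sum (x_{i}-{\bar {x}})^{2}}}\right)$

$=\left({\frac {1}{\sum (x_{i}-{\bar {x}})^{2}}}\right)^{2}var\left(\sum (x_{i}-{\bar {x}})\epsilon _{i}\right)$

$=\left({\frac {1}{\sum (x_{i}-{\bar {x}})^{2}}}\right)^{2}\sum var\left((x_{i}-{\bar {x}})\epsilon _{i}\right)$

$=\left({\frac {1}{\sum (x_{i}-{\bar {x}})^{2}}}\right)^{2}\sum (x_{i}-{\bar {x}})^{2}var(\epsilon _{i})$

$=\left({\frac {1}{\sum (x_{i}-{\bar {x}})^{2}}}\right)^{2}\sum (x_{i}-{\bar {x}})^{2}\sigma _{i}^{2}$

Under homoscedasticity ( $\sigma _{i}^{2}=\sigma ^{2}$ ) this collapses to the familiar $\sigma ^{2}/\sum (x_{i}-{\bar {x}})^{2}$.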
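The variance formula for ${\hat {\beta }}^{OLS}$ under heteroscedasticity can be checked numerically. The sketch below evaluates $\sum (x_{i}-{\bar {x}})^{2}\sigma _{i}^{2}/\left(\sum (x_{i}-{\bar {x}})^{2}\right)^{2}$ directly, compares it with the simulated variance of the term $\sum (x_{i}-{\bar {x}})\epsilon _{i}/\sum (x_{i}-{\bar {x}})^{2}$, and also with the homoscedastic formula evaluated at the average error variance. The regressor values and $\sigma _{i}=0.3x_{i}$ are assumptions chosen only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 50, 20000

x = np.linspace(1.0, 10.0, n)   # fixed regressor values
sigma = 0.3 * x                 # assumed sigma_i, increasing in x
d = x - x.mean()                # deviations x_i - x_bar
sxx = np.sum(d**2)

# Variance of the OLS slope with observation-specific sigma_i^2
var_formula = np.sum(d**2 * sigma**2) / sxx**2

# Homoscedastic formula, evaluated at the average error variance
var_homo = np.mean(sigma**2) / sxx

# Monte Carlo: simulate the random part of beta-hat, sum(d * eps) / sxx
eps = rng.normal(0.0, sigma, size=(reps, n))
slope_errors = eps @ d / sxx
var_mc = slope_errors.var()

print("formula with sigma_i^2:  ", var_formula)
print("Monte Carlo variance:    ", var_mc)
print("homoscedastic formula:   ", var_homo)
```

The first two numbers should agree closely, while the homoscedastic formula gives a different value. In this particular setup it is smaller, because the largest error variances coincide with some of the largest deviations of $x_{i}$ from ${\bar {x}}$.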

We can represent the differences in the variances of the error terms by the matrix $Z$:

$\sigma _{i}^{2}=Z\sigma ^{2}$