Below is the proof of the Normal Equations for OLS.
The goal of OLS is to find the best-fitting line by minimizing the sum of squared residuals, called the Residual Sum of Squares (RSS). This is denoted by {\displaystyle \sum {\hat {\epsilon _{i}}}^{2}}.
Known:
{\displaystyle {\hat {\epsilon _{i}}}=Y_{i}-{\hat {Y_{i}}}=Y_{i}-{\hat {\alpha }}-{\hat {\beta }}X_{i}}
{\displaystyle \mathrm {RSS} =\sum {\hat {\epsilon _{i}}}^{2}=\sum (Y_{i}-{\hat {Y_{i}}})^{2}=\sum (Y_{i}-{\hat {\alpha }}-{\hat {\beta }}X_{i})^{2}}
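Before taking derivatives, it can help to see the RSS as an ordinary function of the two coefficients. The sketch below is a minimal illustration with invented data; the names `X`, `Y`, and `rss` and the candidate values are assumptions for the example, not part of the derivation above.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example regressor values
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # assumed example response values

def rss(alpha, beta):
    residuals = Y - alpha - beta * X       # epsilon_i-hat for each observation
    return (residuals ** 2).sum()          # sum of squared residuals

print(rss(0.0, 2.0))   # RSS for one candidate (alpha, beta) pair
```

OLS chooses the pair (alpha, beta) that makes this function as small as possible, which is what the derivative conditions below work out.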
To find the minimum {\displaystyle \min _{\hat {\alpha }}\sum {\hat {\epsilon _{i}}}^{2}}, take the partial derivative of the RSS with respect to {\displaystyle {\hat {\alpha }}} and set it equal to zero:
{\displaystyle {\frac {\partial \sum {\hat {\epsilon _{i}}}^{2}}{\partial {\hat {\alpha }}}}=\sum 2{\hat {\epsilon _{i}}}{\frac {\partial {\hat {\epsilon _{i}}}}{\partial {\hat {\alpha }}}}=2\sum {\hat {\epsilon _{i}}}(-1)=2\sum (Y_{i}-{\hat {\alpha }}-{\hat {\beta }}X_{i})(-1)=0}
Similarly, for {\displaystyle \min _{\hat {\beta }}\sum {\hat {\epsilon _{i}}}^{2}}, take the partial derivative with respect to {\displaystyle {\hat {\beta }}} and set it equal to zero:
{\displaystyle {\frac {\partial \sum {\hat {\epsilon _{i}}}^{2}}{\partial {\hat {\beta }}}}=\sum 2{\hat {\epsilon _{i}}}{\frac {\partial {\hat {\epsilon _{i}}}}{\partial {\hat {\beta }}}}=2\sum {\hat {\epsilon _{i}}}(-X_{i})=2\sum (Y_{i}-{\hat {\alpha }}-{\hat {\beta }}X_{i})(-X_{i})=0}
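The per-observation derivatives used in both chains can be checked symbolically. The following is a small sketch using SymPy; the symbols a and b stand in for {\displaystyle {\hat {\alpha }}} and {\displaystyle {\hat {\beta }}} and are assumptions for the example only.

```python
import sympy as sp

Y, X, a, b = sp.symbols('Y X a b')    # a, b stand in for alpha-hat and beta-hat
e = Y - a - b * X                     # a single residual
sq = e**2                             # its contribution to the RSS

# d(e^2)/da should equal 2*e*(-1), and d(e^2)/db should equal 2*e*(-X)
print(sp.simplify(sp.diff(sq, a) - 2 * e * (-1)))   # prints 0
print(sp.simplify(sp.diff(sq, b) - 2 * e * (-X)))   # prints 0
```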
So we have two equations:
{\displaystyle \sum (Y_{i}-{\hat {\alpha }}-{\hat {\beta }}X_{i})(-1)=0}
and
{\displaystyle \sum (Y_{i}-{\hat {\alpha }}-{\hat {\beta }}X_{i})(-X_{i})=0}
(Both sides of each equation have been divided by 2 to drop the constant factor.)
Expanding the sums and rearranging each equation, we get
{\displaystyle \sum Y_{i}=n{\hat {\alpha }}+{\hat {\beta }}\sum X_{i}}
(This is the first OLS Normal Equation)
and
{\displaystyle \sum Y_{i}X_{i}={\hat {\alpha }}\sum X_{i}+{\hat {\beta }}\sum X_{i}^{2}}
(This is the second OLS Normal Equation)
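Viewed together, the two Normal Equations are just a 2×2 linear system in {\displaystyle {\hat {\alpha }}} and {\displaystyle {\hat {\beta }}}. The sketch below is a minimal numerical illustration; the data and variable names are invented for the example and are not from the derivation itself.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example regressor values
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # assumed example response values
n = len(X)

# First Normal Equation:  sum(Y)   = n*alpha      + beta*sum(X)
# Second Normal Equation: sum(X*Y) = alpha*sum(X) + beta*sum(X**2)
A = np.array([[n,       X.sum()],
              [X.sum(), (X**2).sum()]])
b = np.array([Y.sum(), (X * Y).sum()])

alpha_hat, beta_hat = np.linalg.solve(A, b)
print(alpha_hat, beta_hat)
```

The algebra that follows solves this same system by hand.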
Solve the Normal Equations
Divide the first equation by n
{\displaystyle {\frac {1}{n}}\sum Y_{i}={\hat {\alpha }}+{\frac {1}{n}}{\hat {\beta }}\sum X_{i}}
Using the definition of the sample mean, {\displaystyle \left(\sum W_{i}{\frac {1}{n}}={\bar {W}}\right)}, this leaves us with
{\displaystyle {\bar {Y}}={\hat {\alpha }}+{\hat {\beta }}{\bar {X}}\Leftrightarrow {\hat {\alpha }}={\bar {Y}}-{\hat {\beta }}{\bar {X}}}
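This says the fitted line always passes through the point of means {\displaystyle ({\bar {X}},{\bar {Y}})}. A small sketch with invented data illustrates it; numpy's polyfit is used here only as an independent way to obtain the OLS slope and intercept, it is not part of the proof.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example regressor values
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # assumed example response values

beta_hat, alpha_hat = np.polyfit(X, Y, deg=1)   # slope first, then intercept

# The intercept satisfies alpha-hat = Y-bar - beta-hat * X-bar
print(np.isclose(alpha_hat, Y.mean() - beta_hat * X.mean()))   # True
```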
Now that we know how to get {\displaystyle {\hat {\alpha }}}, we can work on {\displaystyle {\hat {\beta }}}. Substituting {\displaystyle {\hat {\alpha }}={\bar {Y}}-{\hat {\beta }}{\bar {X}}} into the second Normal Equation gives
{\displaystyle \sum Y_{i}X_{i}={\hat {\alpha }}\sum X_{i}+{\hat {\beta }}\sum X_{i}^{2}=[{\bar {Y}}-{\hat {\beta }}{\bar {X}}]\sum X_{i}+{\hat {\beta }}\sum X_{i}^{2}=[{\frac {(\sum X_{i})(\sum Y_{i})}{n}}]+{\hat {\beta }}[\sum X_{i}^{2}-{\frac {(\sum X_{i})^{2}}{n}}]}
We can now isolate {\displaystyle {\hat {\beta }}} on one side:
{\displaystyle {\hat {\beta }}={\frac {\sum Y_{i}X_{i}-{\frac {(\sum X_{i})(\sum Y_{i})}{n}}}{\sum X_{i}^{2}-{\frac {(\sum X_{i})^{2}}{n}}}}={\frac {\sum (X_{i}-{\bar {X}})(Y_{i}-{\bar {Y}})}{\sum (X_{i}-{\bar {X}})^{2}}}}
(Equivalently, the numerator can be written as {\displaystyle \sum X_{i}Y_{i}-n{\bar {X}}{\bar {Y}}}, since {\displaystyle {\frac {(\sum X_{i})(\sum Y_{i})}{n}}=n{\bar {X}}{\bar {Y}}}.)
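The algebraic equivalence of the "raw sums" and "deviations from the mean" forms of {\displaystyle {\hat {\beta }}} is easy to check numerically. The sketch below uses the same invented data as the earlier sketches; np.polyfit again serves only as an independent reference for the OLS slope.

```python
import numpy as np

X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # assumed example regressor values
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])   # assumed example response values
n = len(X)

# "Raw sums" form of beta-hat
beta_raw = ((X * Y).sum() - X.sum() * Y.sum() / n) / ((X**2).sum() - X.sum()**2 / n)
# "Deviations from the mean" form of beta-hat
beta_dev = ((X - X.mean()) * (Y - Y.mean())).sum() / ((X - X.mean())**2).sum()
# Independent reference: slope from numpy's least-squares polynomial fit
beta_ref = np.polyfit(X, Y, deg=1)[0]

print(np.isclose(beta_raw, beta_dev), np.isclose(beta_raw, beta_ref))   # True True
```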
And now we have solved the Normal Equations for OLS: since there are two equations in two unknowns, we can solve for both {\displaystyle {\hat {\alpha }}} and {\displaystyle {\hat {\beta }}}.