Define the concept of steepest descent
d
k
{\displaystyle d_{k}\!\,}
and show how to compute the optimal stepsize
α
k
{\displaystyle \alpha _{k}\!\,}
∇
Q
(
x
k
)
⋅
d
k
<
0
{\displaystyle \nabla Q(x_{k})\cdot d_{k}<0\!\,}
Choose
α
k
{\displaystyle \alpha _{k}\!\,}
such that
Q
(
x
k
+
α
k
d
k
)
{\displaystyle Q(x_{k}+\alpha _{k}d_{k})\!\,}
is minimized i.e.
α
k
:
min
α
k
Q
(
x
k
+
α
k
d
k
)
{\displaystyle \alpha _{k}:\min _{\alpha _{k}}Q(x_{k}+\alpha _{k}d_{k})\!\,}
Q
(
x
k
+
α
k
d
k
)
=
⟨
x
k
+
α
k
d
k
,
A
x
k
+
α
k
A
d
k
⟩
−
⟨
x
k
+
α
k
d
k
,
b
⟩
{\displaystyle Q(x_{k}+\alpha _{k}d_{k})=\langle x_{k}+\alpha _{k}d_{k},Ax_{k}+\alpha _{k}Ad_{k}\rangle -\langle x_{k}+\alpha _{k}d_{k},b\rangle \!\,}
d
d
α
Q
(
x
k
+
α
k
d
k
)
=
⟨
A
d
k
,
x
k
⟩
+
α
k
⟨
A
d
k
,
d
k
⟩
−
⟨
d
k
,
b
⟩
{\displaystyle {\frac {d}{d\alpha }}Q(x_{k}+\alpha _{k}d_{k})=\langle Ad_{k},x_{k}\rangle +\alpha _{k}\langle Ad_{k},d_{k}\rangle -\langle d_{k},b\rangle \!\,}
Setting the above expression equal to zero gives the optimal
α
k
{\displaystyle \alpha _{k}\!\,}
:
α
k
=
⟨
d
k
,
b
−
A
x
k
⟩
⟨
A
d
k
,
d
k
⟩
{\displaystyle \alpha _{k}={\frac {\langle d_{k},b-Ax_{k}\rangle }{\langle Ad_{k},d_{k}\rangle }}\!\,}
Note that since
A
{\displaystyle A\!\,}
is symmetric
⟨
d
k
,
A
x
k
⟩
=
⟨
A
d
k
,
x
k
⟩
{\displaystyle \langle d_{k},Ax_{k}\rangle =\langle Ad_{k},x_{k}\rangle \!\,}
Formulate the steepest descent (or gradient method) method and write a pseudocode which implements it.
Note that
d
k
=
−
∇
Q
(
x
k
)
=
b
−
A
x
k
=
r
k
{\displaystyle d_{k}=-\nabla Q(x_{k})=b-Ax_{k}=r_{k}\!\,}
. Then the minimal
α
k
{\displaystyle \alpha _{k}\!\,}
is given by
⟨
r
k
,
r
k
⟩
⟨
r
k
,
r
k
⟩
A
{\displaystyle {\frac {\langle r_{k},r_{k}\rangle }{\langle r_{k},r_{k}\rangle _{A}}}\!\,}
Given
x
0
∈
R
n
{\displaystyle x_{0}\in R^{n}\!\,}
For
k
=
0
,
1
,
2
,
…
{\displaystyle k=0,1,2,\ldots \!\,}
r
k
=
b
−
A
x
k
If
r
k
=
0
,
then quit
d
k
=
r
k
α
k
=
⟨
r
k
,
r
k
⟩
⟨
r
k
,
r
k
⟩
A
x
k
+
1
=
x
k
+
α
k
d
k
{\displaystyle {\begin{aligned}r_{k}&=b-Ax_{k}\\&{\mbox{If }}r_{k}=0,{\mbox{ then quit }}\\d_{k}&=r_{k}\\\alpha _{k}&={\frac {\langle r_{k},r_{k}\rangle }{\langle r_{k},r_{k}\rangle _{A}}}\\x_{k+1}&=x_{k}+\alpha _{k}d_{k}\end{aligned}}}
Since
B
{\displaystyle B\!\,}
is symmetric, positive definite,
B
=
E
T
E
{\displaystyle B=E^{T}E\!\,}
where
E
{\displaystyle E\!\,}
is upper triangular (Cholesky Factorization).
Then
B
−
1
=
E
−
1
E
−
T
{\displaystyle B^{-1}=E^{-1}E^{-T}\!\,}
Hence,
B
−
1
A
x
=
B
−
1
b
E
−
1
E
−
T
A
x
=
E
−
1
E
−
T
b
E
−
T
A
x
=
E
−
T
b
E
−
T
A
E
−
1
⏟
A
~
E
x
⏟
x
~
=
E
−
T
b
⏟
b
~
{\displaystyle {\begin{aligned}B^{-1}Ax&=B^{-1}b\\E^{-1}E^{-T}Ax&=E^{-1}E^{-T}b\\E^{-T}Ax&=E^{-T}b\\\underbrace {E^{-T}AE^{-1}} _{\tilde {A}}\underbrace {Ex} _{\tilde {x}}&=\underbrace {E^{-T}b} _{\tilde {b}}\end{aligned}}\!\,}
A
~
{\displaystyle {\tilde {A}}\!\,}
is symmetric:
(
E
−
T
A
E
−
1
)
T
=
E
−
T
A
E
−
1
{\displaystyle (E^{-T}AE^{-1})^{T}=E^{-T}AE^{-1}\!\,}
since
A
{\displaystyle A\!\,}
symmetric
A
~
{\displaystyle {\tilde {A}}\!\,}
is positive definite:
x
T
E
−
T
A
E
−
1
x
=
(
E
−
1
x
)
T
A
(
E
−
1
x
)
>
0
{\displaystyle x^{T}E^{-T}AE^{-1}x=(E^{-1}x)^{T}A(E^{-1}x)>0\!\,}
since
A
{\displaystyle A\!\,}
positive definite
Given
x
0
∈
R
n
{\displaystyle x_{0}\in R^{n}\!\,}
For
k
=
0
,
1
,
2
,
…
{\displaystyle k=0,1,2,\ldots \!\,}
x
k
~
=
E
x
k
r
k
=
b
~
−
A
~
x
k
~
If
r
k
=
0
,
then quit
d
k
=
r
k
α
k
=
⟨
r
k
,
r
k
⟩
⟨
r
k
,
r
k
⟩
A
~
x
k
+
1
=
x
k
+
α
k
d
k
{\displaystyle {\begin{aligned}{\tilde {x_{k}}}&=Ex_{k}\\r_{k}&={\tilde {b}}-{\tilde {A}}{\tilde {x_{k}}}\\&{\mbox{If }}r_{k}=0,{\mbox{ then quit }}\\d_{k}&=r_{k}\\\alpha _{k}&={\frac {\langle r_{k},r_{k}\rangle }{\langle r_{k},r_{k}\rangle _{\tilde {A}}}}\\x_{k+1}&=x_{k}+\alpha _{k}d_{k}\end{aligned}}}