Let $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^{m}$, and $p \in \mathbb{R}^{n}$, with $m > n$, and let $\delta$ be a scalar. Show how the constrained least squares problem,
$$\text{minimize } \|b - Ax\|_2 \quad\quad \text{subject to } p^T x = \delta$$
can be reduced to solving a related unconstrained least squares problem. The algorithm should start by finding a Householder transformation
$H$ such that $Hp = \|p\|_2 e_1$ and setting $y = Hx$.
Since Householder matrices are orthogonal and Hermitian (i.e., symmetric when the matrix is real), we have
$$y = Hx \;\Rightarrow\; x = H^T y = Hy.$$
Then the constraint can be written
$$\begin{aligned}
p^T x &= \delta \\
p^T (H^T y) &= \delta \\
(Hp)^T y &= \delta \\
(\|p\|_2 e_1)^T y &= \delta \\
\|p\|_2 \, e_1^T y &= \delta \\
y_1 &= \frac{\delta}{\|p\|_2}
\end{aligned}$$
Hence the first entry of the column vector $y$, the scalar $y_1$, is completely determined by the constraint.
Now we want to show that we can write $\|Ax - b\|_2$ as a related unconstrained problem. Substituting for $x$ in $Ax$, we get
$$Ax = AHy.$$
Let
$$AH = B = \begin{bmatrix} B_1 & \hat{B} \end{bmatrix}$$
where $B_1$ is the first column of $B$ and $\hat{B}$ consists of the remaining columns.
Since
$$y = \begin{bmatrix} y_1 \\ \hat{y} \end{bmatrix},$$
block matrix multiplication yields,
$$\begin{aligned}
AHy &= By \\
&= \begin{bmatrix} B_1 & \hat{B} \end{bmatrix} \begin{bmatrix} y_1 \\ \hat{y} \end{bmatrix} \\
&= y_1 B_1 + \hat{B}\hat{y}.
\end{aligned}$$
So
$$\min_y \|b - AHy\|_2 = \min_{\hat{y}} \|b - y_1 B_1 - \hat{B}\hat{y}\|_2,$$
which is unconstrained: $y_1$ is fixed by the constraint, so the minimization runs only over $\hat{y}$. Once $\hat{y}$ is found, the solution of the original problem is recovered as $x = Hy$.
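The reduction translates directly into code. Below is a minimal NumPy sketch; the function name `constrained_lstsq` and the dense construction of $H$ are illustrative choices, not part of the problem statement.

```python
import numpy as np

def constrained_lstsq(A, b, p, delta):
    """Sketch of the reduction above: min ||b - Ax||_2  s.t.  p^T x = delta."""
    n = A.shape[1]
    # Householder reflector H = I - 2 v v^T / (v^T v) with Hp = ||p||_2 e_1.
    # (Assumes v != 0, i.e. p is not already a positive multiple of e_1;
    # a robust code would choose the sign of ||p||_2 to avoid cancellation.)
    v = p.astype(float).copy()
    v[0] -= np.linalg.norm(p)
    H = np.eye(n) - 2.0 * np.outer(v, v) / (v @ v)

    B = A @ H                         # B = [B_1  Bhat]
    y1 = delta / np.linalg.norm(p)    # the constraint fixes y_1
    # Unconstrained problem: min over yhat of ||(b - y1*B_1) - Bhat*yhat||_2
    yhat, *_ = np.linalg.lstsq(B[:, 1:], b - y1 * B[:, 0], rcond=None)
    y = np.concatenate(([y1], yhat))
    return H @ y                      # x = Hy, since H is symmetric orthogonal
```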
Prove or disprove: the following interpolants exist for all values of $y_1, y_2, y_3$ and all distinct values of $x_1, x_2, x_3$.
a. $\quad y_i = c_1 + c_2 x_i + c_3 x_i^2$
Substituting the pairs $(x_i, y_i)$, $i = 1, 2, 3$, into this equation gives rise to the following matrix equation:
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & x_1 & x_1^2 \\ 1 & x_2 & x_2^2 \\ 1 & x_3 & x_3^2 \end{bmatrix}}_{A} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$
where $A$ is the Vandermonde matrix, which is known to be nonsingular for distinct nodes $x_1, x_2, x_3$. This proves existence and uniqueness of the coefficients $c_1, c_2, c_3$.
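A quick numerical confirmation of part (a), using NumPy's built-in Vandermonde constructor; the node and data values below are arbitrary examples, not from the exam.

```python
import numpy as np

# Distinct nodes => nonsingular Vandermonde matrix => unique interpolant.
x = np.array([0.3, 1.7, -2.0])
y = np.array([1.0, -4.0, 2.5])
V = np.vander(x, 3, increasing=True)   # columns: 1, x, x^2
c = np.linalg.solve(V, y)              # coefficients c_1, c_2, c_3
assert np.allclose(V @ c, y)           # the quadratic interpolates the data
```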
b. $\quad y_i = c_1 + c_2 x_i^2 + c_3 x_i^4$
Now, substituting the pairs $(x_i, y_i)$, $i = 1, 2, 3$, gives:
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} 1 & x_1^2 & x_1^4 \\ 1 & x_2^2 & x_2^4 \\ 1 & x_3^2 & x_3^4 \end{bmatrix} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$
Consider the distinct points $x_1 = 1,\ x_2 = -1,\ x_3 = 0$.
The system reduces to
$$\begin{bmatrix} y_1 \\ y_2 \\ y_3 \end{bmatrix} = \underbrace{\begin{bmatrix} 1 & 1 & 1 \\ 1 & 1 & 1 \\ 1 & 0 & 0 \end{bmatrix}}_{A} \begin{bmatrix} c_1 \\ c_2 \\ c_3 \end{bmatrix}$$
Here the matrix $A$ is clearly singular (its first two rows coincide), so whenever $y_1 \neq y_2$ no interpolant exists. This disproves the claim.
More generally, the determinant of the matrix $A$ is given by
$$\left|\begin{array}{ccc} 1 & x_1^2 & x_1^4 \\ 1 & x_2^2 & x_2^4 \\ 1 & x_3^2 & x_3^4 \end{array}\right| = (x_2^2 - x_1^2)(x_3^2 - x_1^2)(x_3^2 - x_2^2),$$
which vanishes if $x_j = -x_i$ for any $i \neq j$.
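The counterexample is easy to reproduce numerically; this is a small illustrative check, not part of the exam solution.

```python
import numpy as np

# With x = (1, -1, 0) the matrix [1, x_i^2, x_i^4] is singular,
# since squaring makes the rows for x = 1 and x = -1 identical.
x = np.array([1.0, -1.0, 0.0])
M = np.column_stack([np.ones(3), x**2, x**4])
print(np.linalg.det(M))   # 0.0
```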
Consider the linear system of equations $Ax = b$ where $A$ is a symmetric positive definite matrix of order $n$. The conjugate gradient method (CG) for solving this system is:
Choose $x_0$, compute $r_0 = b - Ax_0$
Set $p_0 = r_0$
for $k = 0$ until convergence
    $\alpha_k = (r_k, r_k)/(p_k, Ap_k)$
    $x_{k+1} = x_k + \alpha_k p_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    <Test for Convergence>
    $\beta_k = (r_{k+1}, r_{k+1})/(r_k, r_k)$
    $p_{k+1} = r_{k+1} + \beta_k p_k$
end
where $(v, w) = \sum_{i=1}^{n} v_i w_i$ is the Euclidean inner product.
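For concreteness, here is a minimal NumPy transcription of the algorithm above. The names `cg`, `tol`, and `maxiter` are illustrative, and the residual-norm test stands in for `<Test for Convergence>`.

```python
import numpy as np

def cg(A, b, x0, tol=1e-10, maxiter=1000):
    """Plain conjugate gradient for symmetric positive definite A.

    np.dot (the @ operator) plays the role of the Euclidean inner
    product (v, w) in the algorithm above.
    """
    x = x0.astype(float).copy()
    r = b - A @ x
    p = r.copy()
    rr = r @ r
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rr / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rr_new = r @ r
        if np.sqrt(rr_new) < tol:   # <Test for Convergence>
            break
        beta = rr_new / rr
        p = r + beta * p
        rr = rr_new
    return x
```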
Let $Q$ be some other symmetric[1] positive-definite matrix of order $n$. We know that the forms
$$\langle v, w\rangle_A \equiv (Av, w) \quad\quad\quad \langle v, w\rangle_Q \equiv (Qv, w)$$
are inner products on $\mathbb{R}^n$.
In addition, a matrix $M$ is symmetric with respect to the $Q$-inner product $\langle\cdot,\cdot\rangle_Q$ if $\langle Mv, w\rangle_Q = \langle v, Mw\rangle_Q$ for all $v, w$ in $\mathbb{R}^n$, and $M$ is positive definite with respect to $\langle\cdot,\cdot\rangle_Q$ if $\langle Mv, v\rangle_Q > 0$ for all nonzero $v$.
Show that $Q^{-1}A$ is symmetric and positive-definite with respect to the $Q$-inner product.
$$\begin{aligned}
\langle Q^{-1}Av, w\rangle_Q &= (QQ^{-1}Av, w) = (Av, w)\\
\langle v, Q^{-1}Aw\rangle_Q &= (Qv, Q^{-1}Aw)\\
&= (Q^{-T}Qv, Aw)\\
&= (Q^{-1}Qv, Aw) \qquad \text{(since $Q$ is symmetric)}\\
&= (v, Aw)\\
&= (A^T v, w)\\
&= (Av, w) \qquad \text{(since $A$ is symmetric)}\\
\langle Q^{-1}Av, v\rangle_Q &= (QQ^{-1}Av, v) = (Av, v) > 0 \qquad \text{(since $A$ is positive definite)}
\end{aligned}$$
The first two computations agree, so $Q^{-1}A$ is symmetric with respect to $\langle\cdot,\cdot\rangle_Q$; the last line shows positive definiteness for all nonzero $v$.
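As a sanity check (not part of the proof), the two identities can be verified numerically for random symmetric positive-definite matrices; all names and sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5

def random_spd():
    # M M^T is positive semidefinite; the shift makes it safely definite.
    M = rng.standard_normal((n, n))
    return M @ M.T + n * np.eye(n)

A, Q = random_spd(), random_spd()
M = np.linalg.solve(Q, A)        # Q^{-1} A, without forming Q^{-1} explicitly
v, w = rng.standard_normal(n), rng.standard_normal(n)

lhs = (Q @ (M @ v)) @ w          # <Mv, w>_Q = (Q M v, w)
rhs = (Q @ v) @ (M @ w)          # <v, Mw>_Q = (Q v, M w)
assert np.isclose(lhs, rhs)      # symmetry in the Q-inner product
assert (Q @ (M @ v)) @ v > 0     # positive definiteness in the Q-inner product
```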
In light of this fact, CG can be used to solve the system $Q^{-1}Ax = Q^{-1}b$ in an appropriate manner. Specify this algorithm and identify any extra costs required that would not be present with $Q = I$.
Choose $x_0$, compute $r_0 = b - Ax_0$
Solve $Q\tilde{r}_0 = r_0$ for $\tilde{r}_0$
Set $p_0 = \tilde{r}_0$
for $k = 0, 1, 2, \ldots$ until convergence
    $\alpha_k = (r_k, \tilde{r}_k)/(p_k, Ap_k)$
    $x_{k+1} = x_k + \alpha_k p_k$
    $r_{k+1} = r_k - \alpha_k A p_k$
    <Test for Convergence>
    Solve $Q\tilde{r}_{k+1} = r_{k+1}$ for $\tilde{r}_{k+1}$
    $\beta_k = (r_{k+1}, \tilde{r}_{k+1})/(r_k, \tilde{r}_k)$
    $p_{k+1} = \tilde{r}_{k+1} + \beta_k p_k$
end
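A minimal NumPy sketch of this preconditioned iteration follows; the dense solve with $Q$ mirrors the "Solve $Q\tilde{r} = r$" steps above, and the names `pcg`, `tol`, and `maxiter` are illustrative.

```python
import numpy as np

def pcg(A, b, Q, x0, tol=1e-10, maxiter=1000):
    """Preconditioned CG as specified above.

    The only structural change from plain CG is the solve with Q for
    each new residual; here it is a dense solve, but in practice Q is
    chosen so that this step is cheap.
    """
    x = x0.astype(float).copy()
    r = b - A @ x
    rt = np.linalg.solve(Q, r)       # solve Q r~_0 = r_0
    p = rt.copy()
    rho = r @ rt
    for _ in range(maxiter):
        Ap = A @ p
        alpha = rho / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        if np.linalg.norm(r) < tol:  # <Test for Convergence>
            break
        rt = np.linalg.solve(Q, r)   # solve Q r~_{k+1} = r_{k+1}
        rho_new = r @ rt
        beta = rho_new / rho
        p = rt + beta * p
        rho = rho_new
    return x
```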
The extra cost not present with $Q = I$ is the solve with coefficient matrix $Q$ at every iteration, needed to obtain $\tilde{r}_{k+1}$ from $r_{k+1}$. (Alternatively, one could form $Q^{-1}$ and precompute $Q^{-1}A$ at a one-time cost of $O(n^3)$ and $Q^{-1}b$ at $O(n^2)$, but the algorithm above avoids forming $Q^{-1}$ by solving $Q\tilde{r} = r$ directly.)
Use any facts you know about the conjugate gradient method to identify properties of $Q$ that would be desirable for computing the solution $x$ efficiently.

Since the convergence rate of CG depends on the condition number (and eigenvalue clustering) of the system matrix, $Q$ should be chosen so that $Q^{-1}A$ is much better conditioned than $A$, ideally with clustered eigenvalues. At the same time, systems with coefficient matrix $Q$ should be cheap to solve (e.g., $Q$ diagonal, banded, or given by a sparse factorization), since one such solve is required at every iteration.
↑ Omitted on the actual exam. This made the problem ambiguous, and the word "symmetric" should have been included. (Howard Elman, 12/16/08)