Inverse matrices play a key role in linear algebra, particularly in computations. However, only square matrices can possibly be invertible. This leads us to introduce the Moore–Penrose inverse of a potentially non-square real- or complex-valued matrix, which satisfies some, but not necessarily all, of the properties of an inverse matrix.
Definition. Let $A$ be an $m$-by-$n$ matrix over a field $\mathbb{K}$ and let $A^{+}$ be an $n$-by-$m$ matrix over $\mathbb{K}$, where $\mathbb{K}$ is either $\mathbb{R}$, the real numbers, or $\mathbb{C}$, the complex numbers. Recall that $A^{*}$ refers to the conjugate transpose of $A$. Then the following four criteria are called the Moore–Penrose conditions for $A$:
1. $AA^{+}A = A$,
2. $A^{+}AA^{+} = A^{+}$,
3. $\left(AA^{+}\right)^{*} = AA^{+}$,
4. $\left(A^{+}A\right)^{*} = A^{+}A$.
We will see below that given a matrix $A$, there exists a unique matrix $A^{+}$ that satisfies all four of the Moore–Penrose conditions. They generalise the properties of the usual inverse.
Remark. If $A$ is an invertible square matrix, then the ordinary inverse $A^{-1}$ satisfies the Moore–Penrose conditions for $A$. Observe also that if $A^{+}$ satisfies the Moore–Penrose conditions for $A$, then $A$ satisfies the Moore–Penrose conditions for $A^{+}$.
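As a concrete numerical illustration of the four conditions, one can check them with NumPy, whose np.linalg.pinv computes the Moore–Penrose inverse via the singular value decomposition. The matrix below is an arbitrary example chosen for this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))  # an arbitrary 4-by-3 complex matrix
A_plus = np.linalg.pinv(A)  # Moore–Penrose inverse, computed via the SVD

# The four Moore–Penrose conditions:
assert np.allclose(A @ A_plus @ A, A)                    # (1)  A A+ A  = A
assert np.allclose(A_plus @ A @ A_plus, A_plus)          # (2)  A+ A A+ = A+
assert np.allclose((A @ A_plus).conj().T, A @ A_plus)    # (3)  (A A+)* = A A+
assert np.allclose((A_plus @ A).conj().T, A_plus @ A)    # (4)  (A+ A)* = A+ A
```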
Basic properties of the Hermitian conjugate
We assemble some basic properties of the conjugate transpose for later use. In the following lemmas, $A$ is a matrix with complex elements and $n$ columns, and $B$ is a matrix with complex elements and $n$ rows.
Lemma (1). For any $\mathbb{K}$-matrix $A$,
$$A^{*}A = 0 \Rightarrow A = 0.$$
Proof. The assumption says that all elements of $A^{*}A$ are zero; in particular, its diagonal elements are zero, so its trace is zero. Therefore,
$$0 = \operatorname{Tr}\left(A^{*}A\right) = \sum_{j=1}^{n}\left(A^{*}A\right)_{jj} = \sum_{j=1}^{n}\sum_{i=1}^{m}\left(A^{*}\right)_{ji}A_{ij} = \sum_{i=1}^{m}\sum_{j=1}^{n}\left|A_{ij}\right|^{2}.$$
Therefore, all $A_{ij}$ equal $0$, i.e. $A = 0$. $\square$
Lemma (2). For any $\mathbb{K}$-matrix $A$,
$$A^{*}AB = 0 \Rightarrow AB = 0.$$
Proof.
$$\begin{aligned}0 &= A^{*}AB \\ \Rightarrow\quad 0 &= B^{*}A^{*}AB \\ \Rightarrow\quad 0 &= (AB)^{*}(AB) \\ \Rightarrow\quad 0 &= AB \qquad (\text{by Lemma 1})\end{aligned}$$
$\square$
Lemma (3). For any $\mathbb{K}$-matrix $A$,
$$ABB^{*} = 0 \Rightarrow AB = 0.$$
Proof. This is proved in a manner similar to the argument of Lemma 2 (or by simply taking the Hermitian conjugate). $\square$
Existence and uniqueness
We establish the existence and uniqueness of the Moore–Penrose inverse for every matrix.
Theorem. If $A$ is a $\mathbb{K}$-matrix and $A_{1}^{+}$ and $A_{2}^{+}$ satisfy the Moore–Penrose conditions for $A$, then $A_{1}^{+} = A_{2}^{+}$.
Proof. Let $A$ be a matrix over $\mathbb{R}$ or $\mathbb{C}$. Suppose that $A_{1}^{+}$ and $A_{2}^{+}$ are Moore–Penrose inverses of $A$. Observe then that
$$A A_{1}^{+} \overset{(1)}{=} \left(A A_{2}^{+} A\right) A_{1}^{+} = \left(A A_{2}^{+}\right)\left(A A_{1}^{+}\right) \overset{(3)}{=} \left(A A_{2}^{+}\right)^{*}\left(A A_{1}^{+}\right)^{*} = {A_{2}^{+}}^{*}\left(A A_{1}^{+} A\right)^{*} \overset{(1)}{=} {A_{2}^{+}}^{*}A^{*} = \left(A A_{2}^{+}\right)^{*} \overset{(3)}{=} A A_{2}^{+}.$$
Analogously, we conclude that $A_{1}^{+}A = A_{2}^{+}A$. The proof is completed by observing that then
$$A_{1}^{+} \overset{(2)}{=} A_{1}^{+} A A_{1}^{+} = A_{1}^{+} A A_{2}^{+} = A_{2}^{+} A A_{2}^{+} \overset{(2)}{=} A_{2}^{+}.$$
$\square$
Theorem. For every $\mathbb{K}$-matrix $A$ there is a matrix $A^{+}$ satisfying the Moore–Penrose conditions for $A$.
Proof. The proof proceeds in stages.
$A$ is a 1-by-1 matrix
For any $x \in \mathbb{K}$, we define:
$$x^{+} := {\begin{cases}x^{-1}, & {\text{if }}x \neq 0\\ 0, & {\text{if }}x = 0\end{cases}}$$
It is easy to see that $x^{+}$ is a pseudoinverse of $x$ (interpreted as a 1-by-1 matrix).
$A$ is a square diagonal matrix
Let $D$ be an $n$-by-$n$ matrix over $\mathbb{K}$ with zeros off the diagonal. We define $D^{+}$ as an $n$-by-$n$ matrix over $\mathbb{K}$ with $\left(D^{+}\right)_{ij} := \left(D_{ij}\right)^{+}$ as defined above. We write simply $D_{ij}^{+}$ for $\left(D^{+}\right)_{ij} = \left(D_{ij}\right)^{+}$.
Notice that $D^{+}$ is also a matrix with zeros off the diagonal.
We now show that $D^{+}$ is a pseudoinverse of $D$:
$$\left(DD^{+}D\right)_{ij} = D_{ij}D_{ij}^{+}D_{ij} = D_{ij} \Rightarrow DD^{+}D = D$$
$$\left(D^{+}DD^{+}\right)_{ij} = D_{ij}^{+}D_{ij}D_{ij}^{+} = D_{ij}^{+} \Rightarrow D^{+}DD^{+} = D^{+}$$
$$\left(DD^{+}\right)_{ij}^{*} = {\overline{\left(DD^{+}\right)_{ji}}} = {\overline{D_{ji}D_{ji}^{+}}} = \left(D_{ji}D_{ji}^{+}\right)^{*} = D_{ji}D_{ji}^{+} = D_{ij}D_{ij}^{+} \Rightarrow \left(DD^{+}\right)^{*} = DD^{+}$$
$$\left(D^{+}D\right)_{ij}^{*} = {\overline{\left(D^{+}D\right)_{ji}}} = {\overline{D_{ji}^{+}D_{ji}}} = \left(D_{ji}^{+}D_{ji}\right)^{*} = D_{ji}^{+}D_{ji} = D_{ij}^{+}D_{ij} \Rightarrow \left(D^{+}D\right)^{*} = D^{+}D$$
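In computational terms, this construction amounts to inverting the nonzero diagonal entries and leaving the zeros in place. A minimal sketch assuming NumPy (diag_pinv is our own illustrative helper, not a library function):

```python
import numpy as np

def diag_pinv(D):
    """Pseudoinverse of a square diagonal matrix: invert nonzero diagonal entries, keep zeros."""
    d = np.diag(D)
    nonzero = d != 0
    d_plus = np.zeros_like(d)
    d_plus[nonzero] = 1 / d[nonzero]
    return np.diag(d_plus)

D = np.diag([2.0, 0.0, -0.5])
assert np.allclose(diag_pinv(D), np.linalg.pinv(D))  # agrees with NumPy's pseudoinverse
```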
$A$ is a general diagonal matrix
Let $D$ be an $m$-by-$n$ matrix over $\mathbb{K}$ with zeros off the main diagonal, where $m$ and $n$ are unequal. That is, $D_{ij} = d_{i}$ for some $d_{i} \in \mathbb{K}$ when $i = j$, and $D_{ij} = 0$ otherwise.
Consider the case where $n > m$. Then we can write $D = \left[D_{0}\,\,\mathbf{0}_{m\times(n-m)}\right]$ in block form, where $D_{0}$ is a square diagonal $m$-by-$m$ matrix and $\mathbf{0}_{m\times(n-m)}$ is the $m$-by-$(n-m)$ zero matrix. We define
$$D^{+} \equiv {\begin{bmatrix}D_{0}^{+}\\\mathbf{0}_{(n-m)\times m}\end{bmatrix}}$$
as an $n$-by-$m$ matrix over $\mathbb{K}$, with $D_{0}^{+}$ the pseudoinverse of $D_{0}$ defined above and $\mathbf{0}_{(n-m)\times m}$ the $(n-m)$-by-$m$ zero matrix. We now show that $D^{+}$ is a pseudoinverse of $D$:
By multiplication of block matrices,
$$DD^{+} = D_{0}D_{0}^{+} + \mathbf{0}_{m\times(n-m)}\mathbf{0}_{(n-m)\times m} = D_{0}D_{0}^{+},$$
so by property 1 for square diagonal matrices $D_{0}$ proven in the previous section,
$$DD^{+}D = D_{0}D_{0}^{+}\left[D_{0}\,\,\mathbf{0}_{m\times(n-m)}\right] = \left[D_{0}D_{0}^{+}D_{0}\,\,\mathbf{0}_{m\times(n-m)}\right] = \left[D_{0}\,\,\mathbf{0}_{m\times(n-m)}\right] = D.$$
Similarly,
$$D^{+}D = {\begin{bmatrix}D_{0}^{+}D_{0} & \mathbf{0}_{m\times(n-m)}\\\mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)}\end{bmatrix}},$$
so
$$D^{+}DD^{+} = {\begin{bmatrix}D_{0}^{+}D_{0} & \mathbf{0}_{m\times(n-m)}\\\mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)}\end{bmatrix}}{\begin{bmatrix}D_{0}^{+}\\\mathbf{0}_{(n-m)\times m}\end{bmatrix}} = {\begin{bmatrix}D_{0}^{+}D_{0}D_{0}^{+}\\\mathbf{0}_{(n-m)\times m}\end{bmatrix}} = D^{+}.$$
By 1 and property 3 for square diagonal matrices,
$$\left(DD^{+}\right)^{*} = \left(D_{0}D_{0}^{+}\right)^{*} = D_{0}D_{0}^{+} = DD^{+}.$$
By 2 and property 4 for square diagonal matrices,
$$\left(D^{+}D\right)^{*} = {\begin{bmatrix}\left(D_{0}^{+}D_{0}\right)^{*} & \mathbf{0}_{m\times(n-m)}\\\mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)}\end{bmatrix}} = {\begin{bmatrix}D_{0}^{+}D_{0} & \mathbf{0}_{m\times(n-m)}\\\mathbf{0}_{(n-m)\times m} & \mathbf{0}_{(n-m)\times(n-m)}\end{bmatrix}} = D^{+}D.$$
Existence for $D$ such that $m > n$ follows by swapping the roles of $D$ and $D^{+}$ in the $n > m$ case and using the fact that $\left(D^{+}\right)^{+} = D$.
$A$ is an arbitrary matrix
The singular value decomposition theorem states that there exists a factorization of the form
$$A = U\Sigma V^{*}$$
where:
$U$ is an $m$-by-$m$ unitary matrix over $\mathbb{K}$,
$\Sigma$ is an $m$-by-$n$ matrix over $\mathbb{K}$ with nonnegative real numbers on the diagonal and zeros off the diagonal, and
$V$ is an $n$-by-$n$ unitary matrix over $\mathbb{K}$.[1]
Define $A^{+}$ as $V\Sigma^{+}U^{*}$. We now show that $A^{+}$ is a pseudoinverse of $A$:
$$AA^{+}A = U\Sigma V^{*}V\Sigma^{+}U^{*}U\Sigma V^{*} = U\Sigma\Sigma^{+}\Sigma V^{*} = U\Sigma V^{*} = A$$
$$A^{+}AA^{+} = V\Sigma^{+}U^{*}U\Sigma V^{*}V\Sigma^{+}U^{*} = V\Sigma^{+}\Sigma\Sigma^{+}U^{*} = V\Sigma^{+}U^{*} = A^{+}$$
$$\left(AA^{+}\right)^{*} = \left(U\Sigma V^{*}V\Sigma^{+}U^{*}\right)^{*} = \left(U\Sigma\Sigma^{+}U^{*}\right)^{*} = U\left(\Sigma\Sigma^{+}\right)^{*}U^{*} = U\left(\Sigma\Sigma^{+}\right)U^{*} = U\Sigma V^{*}V\Sigma^{+}U^{*} = AA^{+}$$
$$\left(A^{+}A\right)^{*} = \left(V\Sigma^{+}U^{*}U\Sigma V^{*}\right)^{*} = \left(V\Sigma^{+}\Sigma V^{*}\right)^{*} = V\left(\Sigma^{+}\Sigma\right)^{*}V^{*} = V\left(\Sigma^{+}\Sigma\right)V^{*} = V\Sigma^{+}U^{*}U\Sigma V^{*} = A^{+}A$$
(Here we used the properties already established for the diagonal matrix $\Sigma$: property 1 gives $\Sigma\Sigma^{+}\Sigma = \Sigma$, and properties 3 and 4 give that $\Sigma\Sigma^{+}$ and $\Sigma^{+}\Sigma$ are Hermitian.) $\square$
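This last construction can be carried out directly in code: compute an SVD, pseudo-invert the diagonal factor entrywise as in the 1-by-1 case, and reassemble. A sketch assuming NumPy (the threshold 1e-12 for treating a singular value as zero is an arbitrary choice for this example):

```python
import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))

# Compact SVD: A = U diag(s) V*; the omitted blocks of Sigma are zero and do not contribute.
U, s, Vh = np.linalg.svd(A, full_matrices=False)
s_plus = np.zeros_like(s)
s_plus[s > 1e-12] = 1 / s[s > 1e-12]                   # x+ from the 1-by-1 case, entrywise
A_plus = Vh.conj().T @ np.diag(s_plus) @ U.conj().T    # A+ = V Sigma+ U*

assert np.allclose(A_plus, np.linalg.pinv(A))
```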
This leads us to the natural definition:
Definition (Moore–Penrose inverse). Let $A$ be a $\mathbb{K}$-matrix. Then the unique $\mathbb{K}$-matrix satisfying the Moore–Penrose conditions for $A$ is called the Moore–Penrose inverse $A^{+}$ of $A$.
We have already seen above that the Moore–Penrose inverse generalises the classical inverse to potentially non-square matrices.
We will now list some basic properties of its interaction with the Hermitian conjugate, leaving most of the proofs as exercises to the reader.
Exercise. For any $\mathbb{K}$-matrix $A$,
$${A^{*}}^{+} = {A^{+}}^{*}.$$
The following identities hold:
1. $A^{+} = A^{+}A^{+*}A^{*}$
2. $A^{+} = A^{*}A^{+*}A^{+}$
3. $A = A^{+*}A^{*}A$
4. $A = AA^{*}A^{+*}$
5. $A^{*} = A^{*}AA^{+}$
6. $A^{*} = A^{+}AA^{*}$
Proof of the first one: $A^{+} = A^{+}AA^{+}$ and $AA^{+} = \left(AA^{+}\right)^{*}$ imply that
$$A^{+} = A^{+}\left(AA^{+}\right)^{*} = A^{+}A^{+*}A^{*}.$$
$\square$
The remaining identities are left as exercises.
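All of these identities are easy to confirm numerically; the sketch below, assuming NumPy and an arbitrary random matrix, checks the exercise and the six identities in the order listed:

```python
import numpy as np

rng = np.random.default_rng(8)
A = rng.standard_normal((4, 3)) + 1j * rng.standard_normal((4, 3))
Ap = np.linalg.pinv(A)
As, Aps = A.conj().T, Ap.conj().T      # A* and A+*

assert np.allclose(np.linalg.pinv(As), Aps)   # (A*)+ = (A+)*
assert np.allclose(Ap, Ap @ Aps @ As)         # 1. A+ = A+ A+* A*
assert np.allclose(Ap, As @ Aps @ Ap)         # 2. A+ = A* A+* A+
assert np.allclose(A, Aps @ As @ A)           # 3. A  = A+* A* A
assert np.allclose(A, A @ As @ Aps)           # 4. A  = A A* A+*
assert np.allclose(As, As @ A @ Ap)           # 5. A* = A* A A+
assert np.allclose(As, Ap @ A @ As)           # 6. A* = A+ A A*
```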
Reduction to the Hermitian case
The results of this section show that the computation of the pseudoinverse is reducible to its construction in the Hermitian case. It suffices to show that the putative constructions satisfy the defining criteria.
Proposition. For every $\mathbb{K}$-matrix $A$,
$$A^{+} = A^{*}\left(AA^{*}\right)^{+}.$$
Proof. Write $D = A^{*}\left(AA^{*}\right)^{+}$; we must show that $D$ satisfies the Moore–Penrose conditions for $A$. Observe that
$$\begin{aligned}AA^{*} &= AA^{*}\left(AA^{*}\right)^{+}AA^{*} \\ \Leftrightarrow\quad AA^{*} &= ADAA^{*} \\ \Leftrightarrow\quad 0 &= (AD-I)AA^{*} \\ \Leftrightarrow\quad 0 &= ADA - A \qquad (\text{by Lemma 3}) \\ \Leftrightarrow\quad A &= ADA\end{aligned}$$
Similarly, $\left(AA^{*}\right)^{+}AA^{*}\left(AA^{*}\right)^{+} = \left(AA^{*}\right)^{+}$ implies that $A^{*}\left(AA^{*}\right)^{+}AA^{*}\left(AA^{*}\right)^{+} = A^{*}\left(AA^{*}\right)^{+}$, i.e. $DAD = D$.
Additionally, $AD = AA^{*}\left(AA^{*}\right)^{+}$ is Hermitian by condition 3 applied to $AA^{*}$, so $AD = (AD)^{*}$.
Finally, $DA = A^{*}\left(AA^{*}\right)^{+}A$ implies that
$$(DA)^{*} = A^{*}\left(\left(AA^{*}\right)^{+}\right)^{*}A = A^{*}\left(\left(AA^{*}\right)^{+}\right)A = DA,$$
where we used that $\left(AA^{*}\right)^{+}$ is Hermitian: by the exercise above, $\left(M^{+}\right)^{*} = \left(M^{*}\right)^{+} = M^{+}$ for any Hermitian matrix $M$.
Therefore, $D = A^{+}$. $\square$
Exercise. For every $\mathbb{K}$-matrix $A$,
$$A^{+} = \left(A^{*}A\right)^{+}A^{*}.$$
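Both reduction formulas can be checked numerically; here is a sketch assuming NumPy and an arbitrary random matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((3, 5))
A_plus = np.linalg.pinv(A)

# A+ = A* (A A*)+   (proposition above)
assert np.allclose(A_plus, A.conj().T @ np.linalg.pinv(A @ A.conj().T))
# A+ = (A* A)+ A*   (exercise above)
assert np.allclose(A_plus, np.linalg.pinv(A.conj().T @ A) @ A.conj().T)
```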
We now turn to calculating the Moore–Penrose inverse for a product of two matrices, $C = AB$.
Proposition. If $A$ has orthonormal columns, i.e. $A^{*}A = I$, then for any $\mathbb{K}$-matrix $B$ of the right dimensions, $(AB)^{+} = B^{+}A^{+}$.
Proof. Since $A^{*}A = I$, we have $A^{+} = A^{*}$ (one checks directly that $A^{*}$ satisfies the four Moore–Penrose conditions for $A$). Write $C = AB$ and $D = B^{+}A^{+} = B^{+}A^{*}$. We show that $D$ satisfies the Moore–Penrose criteria for $C$.
$$\begin{aligned}CDC &= ABB^{+}A^{*}AB = ABB^{+}B = AB = C,\\ DCD &= B^{+}A^{*}ABB^{+}A^{*} = B^{+}BB^{+}A^{*} = B^{+}A^{*} = D,\\ (CD)^{*} &= D^{*}B^{*}A^{*} = A\left(B^{+}\right)^{*}B^{*}A^{*} = A\left(BB^{+}\right)^{*}A^{*} = ABB^{+}A^{*} = CD,\\ (DC)^{*} &= B^{*}A^{*}D^{*} = B^{*}A^{*}A\left(B^{+}\right)^{*} = \left(B^{+}B\right)^{*} = B^{+}B = B^{+}A^{*}AB = DC.\end{aligned}$$
Therefore, $D = C^{+}$. $\square$
Exercise. If $B$ has orthonormal rows, then for any $\mathbb{K}$-matrix $A$ of the right dimensions, $(AB)^{+} = B^{+}A^{+}$.
Another important special case, which closely approximates that of invertible matrices, is when $A$ has full column rank and $B$ has full row rank.
Proposition. If $A$ has full column rank and $B$ has full row rank, then $(AB)^{+} = B^{+}A^{+}$.
Proof. Since $A$ has full column rank, $A^{*}A$ is invertible, so $\left(A^{*}A\right)^{+} = \left(A^{*}A\right)^{-1}$. Similarly, since $B$ has full row rank, $BB^{*}$ is invertible, so $\left(BB^{*}\right)^{+} = \left(BB^{*}\right)^{-1}$.
Write $C = AB$ and $D = B^{+}A^{+} = B^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}$ (using reduction to the Hermitian case). We show that $D$ satisfies the Moore–Penrose criteria.
$$\begin{aligned}CDC &= ABB^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}AB = AB = C,\\ DCD &= B^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}ABB^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*} = B^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*} = D,\\ CD &= ABB^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*} = A\left(A^{*}A\right)^{-1}A^{*} = \left(A\left(A^{*}A\right)^{-1}A^{*}\right)^{*},\\ \Rightarrow (CD)^{*} &= CD,\\ DC &= B^{*}\left(BB^{*}\right)^{-1}\left(A^{*}A\right)^{-1}A^{*}AB = B^{*}\left(BB^{*}\right)^{-1}B = \left(B^{*}\left(BB^{*}\right)^{-1}B\right)^{*},\\ \Rightarrow (DC)^{*} &= DC.\end{aligned}$$
Therefore, $D = C^{+}$. $\square$
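Note that the identity $(AB)^{+} = B^{+}A^{+}$ does not hold for arbitrary matrices; the rank hypotheses matter. A numerical illustration assuming NumPy (random Gaussian matrices have full rank with probability 1, so the hypotheses hold here):

```python
import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))   # full column rank
B = rng.standard_normal((3, 4))   # full row rank

lhs = np.linalg.pinv(A @ B)
rhs = np.linalg.pinv(B) @ np.linalg.pinv(A)
assert np.allclose(lhs, rhs)      # (AB)+ = B+ A+ under these hypotheses
```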
We finally derive a formula for calculating the Moore–Penrose inverse of $AA^{*}$.
Proposition. If $B = A^{*}$, then $(AB)^{+} = A^{+*}A^{+}$.
Proof. Here, $B = A^{*}$, and thus $C = AA^{*}$ and $D = A^{+*}A^{+}$. We show that $D$ indeed satisfies the four Moore–Penrose criteria.
$$\begin{aligned}CDC &= AA^{*}A^{+*}A^{+}AA^{*} = A\left(A^{+}A\right)^{*}A^{+}AA^{*} = AA^{+}AA^{+}AA^{*} = AA^{+}AA^{*} = AA^{*} = C\\ DCD &= A^{+*}A^{+}AA^{*}A^{+*}A^{+} = A^{+*}A^{+}A\left(A^{+}A\right)^{*}A^{+} = A^{+*}A^{+}AA^{+}AA^{+} = A^{+*}A^{+}AA^{+} = A^{+*}A^{+} = D\\ (CD)^{*} &= \left(AA^{*}A^{+*}A^{+}\right)^{*} = A^{+*}A^{+}AA^{*} = A^{+*}\left(A^{+}A\right)^{*}A^{*} = A^{+*}A^{*}A^{+*}A^{*}\\ &= \left(AA^{+}\right)^{*}\left(AA^{+}\right)^{*} = AA^{+}AA^{+} = A\left(A^{+}A\right)^{*}A^{+} = AA^{*}A^{+*}A^{+} = CD\\ (DC)^{*} &= \left(A^{+*}A^{+}AA^{*}\right)^{*} = AA^{*}A^{+*}A^{+} = A\left(A^{+}A\right)^{*}A^{+} = AA^{+}AA^{+}\\ &= \left(AA^{+}\right)^{*}\left(AA^{+}\right)^{*} = A^{+*}A^{*}A^{+*}A^{*} = A^{+*}\left(A^{+}A\right)^{*}A^{*} = A^{+*}A^{+}AA^{*} = DC\end{aligned}$$
Therefore, $D = C^{+}$. In other words,
$$\left(AA^{*}\right)^{+} = A^{+*}A^{+}$$
and, since $\left(A^{*}\right)^{*} = A$,
$$\left(A^{*}A\right)^{+} = A^{+}A^{+*}.$$
$\square$
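Numerically, assuming NumPy, both formulas read as follows:

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.standard_normal((4, 6)) + 1j * rng.standard_normal((4, 6))
A_plus = np.linalg.pinv(A)

assert np.allclose(np.linalg.pinv(A @ A.conj().T), A_plus.conj().T @ A_plus)  # (A A*)+ = A+* A+
assert np.allclose(np.linalg.pinv(A.conj().T @ A), A_plus @ A_plus.conj().T)  # (A* A)+ = A+ A+*
```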
Projectors and subspaces
The defining feature of classical inverses is that $AA^{-1} = A^{-1}A = I$. What can we say about $AA^{+}$ and $A^{+}A$?
We can derive some properties easily from the more basic properties above:
Exercise. Let $A$ be a $\mathbb{K}$-matrix. Then
$$\left(AA^{+}\right)^{2} = AA^{+},\quad \left(A^{+}A\right)^{2} = A^{+}A,\quad \left(AA^{+}\right)^{*} = AA^{+}\quad\text{and}\quad \left(A^{+}A\right)^{*} = A^{+}A.$$
We can conclude that $P = AA^{+}$ and $Q = A^{+}A$ are orthogonal projections.
Proposition. Let $A$ be a $\mathbb{K}$-matrix. Then $P = AA^{+}$ and $Q = A^{+}A$ are orthogonal projections.
Proof. Indeed, consider the operator $P$: any vector decomposes as
$$x = Px + (I-P)x$$
and for all vectors $x$ and $y$ satisfying $Px = x$ and $(I-P)y = y$, we have
$$x^{*}y = (Px)^{*}(I-P)y = x^{*}P^{*}(I-P)y = x^{*}P(I-P)y = 0,$$
since $P(I-P) = P - P^{2} = 0$ by idempotence.
It follows that $PA = AA^{+}A = A$ and $A^{+}P = A^{+}AA^{+} = A^{+}$. Similarly, $QA^{+} = A^{+}$ and $AQ = A$. The orthogonal components are now readily identified. $\square$
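The projector properties can likewise be observed numerically; a sketch assuming NumPy:

```python
import numpy as np

rng = np.random.default_rng(5)
A = rng.standard_normal((5, 3))
A_plus = np.linalg.pinv(A)
P, Q = A @ A_plus, A_plus @ A

for M in (P, Q):
    assert np.allclose(M @ M, M)        # idempotent
    assert np.allclose(M.conj().T, M)   # Hermitian, hence an orthogonal projector

assert np.allclose(P @ A, A)            # P acts as the identity on the range of A
assert np.allclose(A_plus @ P, A_plus)
```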
We finish our analysis by determining the image and kernel of the mappings encoded by the Moore–Penrose inverse.
Proposition. Let $A$ be a $\mathbb{K}$-matrix. Then $\operatorname{Ker}\left(A^{+}\right) = \operatorname{Ker}\left(A^{*}\right)$ and $\operatorname{Im}\left(A^{+}\right) = \operatorname{Im}\left(A^{*}\right)$.
Proof. If $y$ belongs to the range of $A$, then for some $x$, $y = Ax$, and $Py = PAx = Ax = y$. Conversely, if $Py = y$, then $y = AA^{+}y$, so that $y$ belongs to the range of $A$. It follows that $P$ is the orthogonal projector onto the range of $A$. $I-P$ is then the orthogonal projector onto the orthogonal complement of the range of $A$, which equals the kernel of $A^{*}$.
A similar argument using the relation $QA^{*} = A^{*}$ establishes that $Q$ is the orthogonal projector onto the range of $A^{*}$ and $(I-Q)$ is the orthogonal projector onto the kernel of $A$.
Using the relations $P\left(A^{+}\right)^{*} = P^{*}\left(A^{+}\right)^{*} = \left(A^{+}P\right)^{*} = \left(A^{+}\right)^{*}$ and $P = P^{*} = \left(A^{+}\right)^{*}A^{*}$, it follows that the range of $P$ equals the range of $\left(A^{+}\right)^{*}$, which in turn implies that the range of $I-P$ equals the kernel of $A^{+}$. Similarly, $QA^{+} = A^{+}$ implies that the range of $Q$ equals the range of $A^{+}$. Therefore, we find
$$\begin{aligned}\operatorname{Ker}\left(A^{+}\right) &= \operatorname{Ker}\left(A^{*}\right),\\ \operatorname{Im}\left(A^{+}\right) &= \operatorname{Im}\left(A^{*}\right).\end{aligned}$$
$\square$
We present two applications of the Moore–Penrose inverse in solving linear systems of equations.
Least-squares minimization
Moore–Penrose inverses can be used for least-squares minimization of a system of equations that need not have an exact solution.
Proposition. For any $m \times n$ matrix $A$ and any vectors $x$ and $b$ of the appropriate dimensions,
$$\|Ax - b\|_{2} \geq \|Az - b\|_{2},$$
where $z = A^{+}b$.
Proof. We first note (stating the complex case) that, using the fact that $P = AA^{+}$ satisfies $PA = A$ and $P = P^{*}$, we have
$$\begin{aligned}A^{*}(Az-b) &= A^{*}\left(AA^{+}b - b\right)\\ &= A^{*}(Pb - b)\\ &= A^{*}P^{*}b - A^{*}b\\ &= (PA)^{*}b - A^{*}b\\ &= 0\end{aligned}$$
so that ($\text{c.c.}$ stands for the Hermitian conjugate of the previous term in the following)
$$\begin{aligned}\|Ax-b\|_{2}^{2} &= \|Az-b\|_{2}^{2} + (A(x-z))^{*}(Az-b) + \text{c.c.} + \|A(x-z)\|_{2}^{2}\\ &= \|Az-b\|_{2}^{2} + (x-z)^{*}A^{*}(Az-b) + \text{c.c.} + \|A(x-z)\|_{2}^{2}\\ &= \|Az-b\|_{2}^{2} + \|A(x-z)\|_{2}^{2}\\ &\geq \|Az-b\|_{2}^{2}\end{aligned}$$
as claimed. $\square$
Remark. This lower bound need not be zero, as the system $Ax = b$ may not have a solution (e.g. when the matrix $A$ does not have full rank or the system is overdetermined). If $A$ is injective, i.e. one-to-one (which implies $m \geq n$), then the bound is attained uniquely at $z$.
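A short numerical illustration, assuming NumPy: for an overdetermined system, $z = A^{+}b$ agrees with NumPy's dedicated least-squares solver, and a random competitor does no better:

```python
import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((8, 3))   # overdetermined: more equations than unknowns
b = rng.standard_normal(8)

z = np.linalg.pinv(A) @ b                        # z = A+ b
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)     # NumPy's least-squares solver

assert np.allclose(z, x_ls)
x = rng.standard_normal(3)        # an arbitrary competitor
assert np.linalg.norm(A @ x - b) >= np.linalg.norm(A @ z - b)
```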
Minimum-norm solution to a linear system
The proof above also shows that if the system $Ax = b$ is satisfiable, i.e. has a solution, then necessarily $z = A^{+}b$ is a solution (not necessarily unique). We can say more:
Proposition. If the system $Ax = b$ is satisfiable, then $z = A^{+}b$ is the unique solution with smallest Euclidean norm.
Proof. Note first, with $Q = A^{+}A$, that $Qz = A^{+}AA^{+}b = A^{+}b = z$ and that $Q^{*} = Q$. Therefore, assuming that $Ax = b$, we have
$$\begin{aligned}z^{*}(x-z) &= (Qz)^{*}(x-z)\\ &= z^{*}Q(x-z)\\ &= z^{*}\left(A^{+}Ax - z\right)\\ &= z^{*}\left(A^{+}b - z\right)\\ &= 0.\end{aligned}$$
Thus
$$\begin{aligned}\|x\|_{2}^{2} &= \|z + (x-z)\|_{2}^{2}\\ &= \|z\|_{2}^{2} + z^{*}(x-z) + {\overline{z^{*}(x-z)}} + \|x-z\|_{2}^{2}\\ &= \|z\|_{2}^{2} + \|x-z\|_{2}^{2}\\ &\geq \|z\|_{2}^{2}\end{aligned}$$
with equality if and only if $x = z$, as was to be shown. $\square$
An immediate consequence of this result is that $z$ is also the unique smallest solution to the least-squares minimization problem for all $Ax = b$, including when $A$ is neither injective nor surjective. It can be shown that the least-squares approximation $Az = y \approx b$ is unique. Thus it is necessary and sufficient for all $x$ that solve the least-squares minimization to satisfy $Ax = y = Az = AA^{+}b$. This system always has a solution (not necessarily unique), as $Az$ lies in the column space of $A$. From the above result, the smallest $x$ which solves this system is $A^{+}\left(AA^{+}b\right) = A^{+}b = z$.
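Finally, a sketch (assuming NumPy) of the minimum-norm property for an underdetermined system: perturbing $z$ by any kernel vector preserves the solution but can only increase the norm:

```python
import numpy as np

rng = np.random.default_rng(7)
A = rng.standard_normal((3, 6))   # underdetermined: many exact solutions
b = rng.standard_normal(3)

z = np.linalg.pinv(A) @ b
assert np.allclose(A @ z, b)      # z solves the system exactly

n = rng.standard_normal(6)
n -= np.linalg.pinv(A) @ (A @ n)  # project n onto Ker(A), using A A+ A = A
x = z + n                         # another exact solution
assert np.allclose(A @ x, b)
assert np.linalg.norm(x) >= np.linalg.norm(z)
```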