Topics in Abstract Algebra/Linear algebra

The Moore-Penrose inverseEdit

Inverse matrices play a key role in linear algebra, and particularly in computations. However, only square matrices can possibly be invertible. This leads us to introduce the Moore-Penrose inverse of a potentially non-square real- or complex-valued matrix, which satisfies some but not necessarily all of the properties of an inverse matrix.

Definition. Let   be an m-by-n matrix over a field   and   be an n-by-m matrix over  , where   is either  , the real numbers, or  , the complex numbers. Recall that   refers to the conjugate transpose of  . Then the following four criteria are called the Moore–Penrose conditions for  :

  1.  ,
  2.  ,
  3.  ,
  4.  .


We will see below that given a matrix  , there exists a unique matrix   that satisfies all four of the Moore–Penrose conditions. They generalise the properties of the usual inverse.

Remark. If   is an invertible square matrix, then the ordinary inverse   satisfies the Moore-Penrose conditions for  . Observe also that if   satisfies the Moore-Penrose conditions for  , then   satisfies the Moore-Penrose conditions for  .


Basic properties of the Hermitian conjugateEdit

We assemble some basic properties of the conjugate transpose for later use. In the following lemmas,   is a matrix with complex elements and n columns,   is a matrix with complex elements and n rows.

Lemma (1). For any  -matrix  ,  
Proof. The assumption says that all elements of A*A are zero. Therefore,

 

Therefore, all   equal 0 i.e.  .  

Lemma (2). For any  -matrix  ,  
Proof. :    

Lemma (3). For any  -matrix  ,  
Proof. This is proved in a manner similar to the argument of Lemma 2 (or by simply taking the Hermitian conjugate).  

Existence and uniquenessEdit

We establish existence and uniqueness of the Moore-Penrose inverse for every matrix.

Theorem. If   is a  -matrix and   and   satisfy the Moore-Penrose conditions for  , then  .
Proof. Let   be a matrix over   or  . Suppose that   and   are Moore–Penrose inverses of  . Observe then that

 

Analogously we conclude that  . The proof is completed by observing that then

   


Theorem. For every  -matrix   there is a matrix   satisfying the Moore-Penrose conditions for  
Proof. The proof proceeds in stages.

  is a 1-by-1 matrix

For any  , we define:

 

It is easy to see that   is a pseudoinverse of   (interpreted as a 1-by-1 matrix).

  is a square diagonal matrix

Let   be an n-by-n matrix over   with zeros off the diagonal. We define   as an n-by-n matrix over   with   as defined above. We write simply   for  .

Notice that   is also a matrix with zeros off the diagonal.

We now show that   is a pseudoinverse of  :

  1.  
  2.  
  3.  
  4.  

  is a general diagonal matrix

Let   be an m-by-n matrix over   with zeros off the main diagonal, where m and n are unequal. That is,   for some   when   and   otherwise.

Consider the case where  . Then we can rewrite   by stacking where   is a square diagonal m-by-m matrix, and   is the m-by-(nm) zero matrix. We define   as an n-by-m matrix over  , with   the pseudoinverse of   defined above, and   the (nm)-by-m zero matrix. We now show that   is a pseudoinverse of  :

  1. By multiplication of block matrices,   so by property 1 for square diagonal matrices   proven in the previous section, .
  2. Similarly,  , so  
  3. By 1 and property 3 for square diagonal matrices,  .
  4. By 2 and property 4 for square diagonal matrices,  

Existence for   such that   follows by swapping the roles of   and   in the   case and using the fact that  .

  is an arbitrary matrix

The singular value decomposition theorem states that there exists a factorization of the form

 

where:

  is an m-by-m unitary matrix over  .
  is an m-by-n matrix over   with nonnegative real numbers on the diagonal and zeros off the diagonal.
  is an n-by-n unitary matrix over  .[1]

Define   as  .

We now show that   is a pseudoinverse of  :

  1.  
  2.  
  3.  
  4.    

This leads us to the natural definition:

Definition (Moore-Penrose inverse). Let   be a  -matrix. Then the unique  -matrix satisfying the Moore-Penrose conditions for   is called the Moore-Penrose inverse   of  .


Basic propertiesEdit

We have already seen above that the Moore-Penrose inverse generalises the classical inverse to potentially non-square matrices. We will now list some basic properties of its interaction with the Hermitian conjugate, leaving most of the proofs as exercises to the reader.

Exercise. For any  -matrix  ,  


The following identities hold:

  1. A+ = A+ A+* A*
  2. A+ = A* A+* A+
  3. A = A+* A* A
  4. A = A A* A+*
  5. A* = A* A A+
  6. A* = A+ A A*

Proof of the first one:   and   imply that  . □

The remaining identities are left as exercises.

Reduction to the Hermitian caseEdit

The results of this section show that the computation of the pseudoinverse is reducible to its construction in the Hermitian case. It suffices to show that the putative constructions satisfy the defining criteria.

Proposition. For every  -matrix  ,  
Proof. Observe that

 

Similarly,   implies that   i.e.  .

Additionally,   so  .

Finally,   implies that  .

Therefore,  .  

Exercise. For every  -matrix  ,  


ProductsEdit

We now turn to calculating the Moore-Penrose inverse for a product of two matrices,  

Proposition. If   has orthonormal columns i.e.  , then for any  -matrix   of the right dimensions,  .
Proof. Since  ,  . Write   and  . We show that   satisfies the Moore–Penrose criteria for  .

 

Therefore,  .  

Exercise. If   has orthonormal rows, then for any  -matrix   of the right dimensions,  .


Another important special case which approximates closely that of invertible matrices is when   has full column rank and   has full row rank.

Proposition. If   has full column rank and   has full row rank, then  .
Proof. Since   has full column rank,   is invertible so  . Similarly, since   has full row rank,   is invertible so  .

Write  (using reduction to the Hermitian case). We show that   satisfies the Moore–Penrose criteria.

 

Therefore,  .  

We finally derive a formula for calculating the Moore-Penrose inverse of  .

Proposition. If  , then  .
Proof. Here,  , and thus   and  . We show that indeed   satisfies the four Moore–Penrose criteria.

 

Therefore,  . In other words:

 

and, since  

   

Projectors and subspacesEdit

The defining feature of classical inverses is that   What can we say about   and  ?

We can derive some properties easily from the more basic properties above:

Exercise. Let   be a  -matrix. Then   and  


We can conclude that   and   are orthogonal projections.

Proposition. Let   be a  -matrix. Then   and   are orthogonal projections
Proof. Indeed, consider the operator  : any vector decomposes as

 

and for all vectors   and   satisfying   and  , we have

 .

It follows that   and  . Similarly,   and  . The orthogonal components are now readily identified.  

We finish our analysis by determining image and kernel of the mappings encoded by the Moore-Penrose inverse.

Proposition. Let   be a  -matrix. Then   and  .
Proof. If   belongs to the range of   then for some  ,   and  . Conversely, if   then   so that   belongs to the range of  . It follows that   is the orthogonal projector onto the range of  .   is then the orthogonal projector onto the orthogonal complement of the range of  , which equals the kernel of  .

A similar argument using the relation   establishes that   is the orthogonal projector onto the range of   and   is the orthogonal projector onto the kernel of  .

Using the relations   and   it follows that the range of P equals the range of  , which in turn implies that the range of   equals the kernel of  . Similarly   implies that the range of   equals the range of  . Therefore, we find,

   

ApplicationsEdit

We present two applications of the Moore-Penrose inverse in solving linear systems of equations.

Least-squares minimizationEdit

Moore-Penrose inverses can be used for least-squares minimisation of a system of equations that might not necessarily have an exact solution.

Proposition. For any   matrix  ,   where  .
Proof. We first note that (stating the complex case), using the fact that   satisfies   and  , we have

 

so that (  stands for the Hermitian conjugate of the previous term in the following)

 

as claimed.  

Remark. This lower bound need not be zero as the system   may not have a solution (e.g. when the matrix A does not have full rank or the system is overdetermined). If   is injective i.e. one-to-one (which implies  ), then the bound is attained uniquely at  .


Minimum-norm solution to a linear systemEdit

The proof above also shows that if the system   is satisfiable i.e. has a solution, then necessarily   is a solution (not necessarily unique). We can say more:

Proposition. If the system   is satisfiable, then   is the unique solution with smallest Euclidean norm.
Proof. Note first, with  , that   and that  . Therefore, assuming that  , we have

 

Thus

 

with equality if and only if  , as was to be shown.  

An immediate consequence of this result is that   is also the uniquely smallest solution to the least-squares minimization problem for all  , including when   is neither injective nor surjective. It can be shown that the least-squares approximation   is unique. Thus it is necessary and sufficient for all   that solve the least-squares minimization to satisfy  . This system always has a solution (not necessarily unique) as   lies in the column space of  . From the above result the smallest   which solves this system is  .

NotesEdit

  1. Some authors use slightly different dimensions for the factors. The two definitions are equivalent.

ReferencesEdit