Linear Algebra/Singular Value Decomposition

Singular Value Decomposition

Given any $m \times n$ matrix $A$, the singular value decomposition (SVD) is $A = U \Sigma V^*$ where $U$ is an $m \times m$ unitary matrix, $V$ is an $n \times n$ unitary matrix, and $\Sigma$ is an $m \times n$ diagonal matrix where all off-diagonal entries are 0 and the diagonal entries are all non-negative real values. The diagonal entries of $\Sigma$ are referred to as "singular values".

As an example, consider the shear transformation $A = \begin{bmatrix} 1 & 1 \\ 0 & 1 \end{bmatrix}$. The singular value decomposition of $A$ is:

$$A = \frac{1}{\sqrt{1+\varphi^2}}\begin{bmatrix} \varphi & -1 \\ 1 & \varphi \end{bmatrix} \begin{bmatrix} \varphi & 0 \\ 0 & 1/\varphi \end{bmatrix} \frac{1}{\sqrt{1+\varphi^2}}\begin{bmatrix} 1 & \varphi \\ -\varphi & 1 \end{bmatrix}$$

where $\varphi = \frac{1+\sqrt{5}}{2}$ is the golden ratio, so the singular values are $\varphi$ and $1/\varphi$.

The set of all unit length vectors $v$ such that $\|v\| = 1$ forms a sphere of radius 1 around the origin. When $A$ is applied to this sphere, it becomes an ellipsoid. The principal radii of this ellipsoid are the singular values, and their directions form the columns of $U$.
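The decomposition and singular values from the example above can be checked numerically. The following sketch uses NumPy, which is an assumption on my part; the article itself does not reference any library:

```python
import numpy as np

# Shear matrix from the example above.
A = np.array([[1.0, 1.0],
              [0.0, 1.0]])

# numpy returns A = U @ np.diag(s) @ Vh, with s sorted descending.
U, s, Vh = np.linalg.svd(A)

phi = (1.0 + np.sqrt(5.0)) / 2.0    # golden ratio

print(s)                                      # ~ [1.618, 0.618], i.e. [phi, 1/phi]
print(np.allclose(U @ np.diag(s) @ Vh, A))    # True
```

Both $U$ and $V^*$ are orthogonal, and the diagonal of $\Sigma$ gives the principal radii of the ellipse that the unit circle is mapped to.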


Existence of the singular value decomposition

One fact that is not immediately obvious is that the singular value decomposition always exists:

Theorem (existence of the singular value decomposition)

Given any $m \times n$ matrix $A$, there exists an $m \times m$ unitary matrix $U$, an $n \times n$ unitary matrix $V$, and an $m \times n$ diagonal matrix $\Sigma$ where all off-diagonal entries are 0 and the diagonal entries are all non-negative real values such that $A = U \Sigma V^*$.

In essence, any linear transformation is a rotation, followed by stretching or shrinking parallel to each axis (with some dimensions added or zeroed out of existence), followed by another rotation.

The following proof will demonstrate that the singular value decomposition always exists. An outline of the proof will be given first:

Proof outline

We need to prove that an arbitrary linear transform $A$ is a rotation: $V^*$, followed by scaling parallel to each axis: $\Sigma$, and lastly followed by another rotation: $U$, giving $A = U \Sigma V^*$.

If the columns of $A$ are already mutually orthogonal, then the first rotation is not necessary: $A = U \Sigma$. The entries of $\Sigma$ are the lengths of the vectors formed by the columns of $A$, and $U$ is a rotation that rotates the elementary basis vectors of $\mathbb{C}^m$ to be parallel with the columns of $A$.

In most cases however, the columns of $A$ are not mutually orthogonal. In this case, the rotation $V^*$ is non-trivial. $A = U \Sigma V^* \iff AV = U\Sigma$, so $V$ must be chosen so that the columns of $AV$ are mutually orthogonal. Let $V = \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix}$. We need to choose orthonormal vectors $v_1, v_2, \dots, v_n$ so that $Av_1, Av_2, \dots, Av_n$ are all mutually orthogonal. This can be done iteratively. Imagine that we have chosen $v_1$ so that when given any vector $w$ that is orthogonal to $v_1$, that $Aw$ is orthogonal to $Av_1$. The effort now switches to finding an orthonormal set of vectors $v_2, \dots, v_n$ confined to the space of vectors that are perpendicular to $v_1$ such that $Av_2, \dots, Av_n$ are mutually orthogonal.

Let $Q$ be a unitary matrix with $v_1$ as the first column. Factoring $Q$ from the left side of $V$ to get $V = Q(Q^*V)$ results in a new set of orthonormal vectors that are the columns of $Q^*V$. The goal of having the columns of $AV = (AQ)(Q^*V)$ be mutually orthogonal is converted to having the columns of $(AQ)(Q^*V)$ be mutually orthogonal with $AQ$ effectively replacing $A$. $v_1$ transforms to $Q^*v_1 = e_1$, and the space of vectors orthogonal to $v_1$ transforms to the space spanned by the standard basis vectors $e_2, \dots, e_n$. The first column of $AQ$ is $Av_1$ and so is orthogonal to all other columns of $AQ$.

If $P$ is a unitary matrix where the first column of $P$ is $Av_1$ normalized to unit length, then factoring $P$ from the left side of $AQ$ to get $AQ = P(P^*AQ)$ results in a matrix $P^*AQ$ in which the first column is parallel to the standard basis vector $e_1$. The first column of $AQ$ is orthogonal to all other columns, so the first column of $P^*AQ$ is orthogonal to all other columns, and hence the first row of $P^*AQ$ contains all 0s except for the first column.

$V$ can now be determined recursively with the dimension reduced to $(m-1) \times (n-1)$, and $A$ is replaced with $P^*AQ$ with the first row and column removed. This forms the inductive component of the coming proof.

Lastly, how do we know that there exists $v_1$ so that when given any vector $w$ that is orthogonal to $v_1$, that $Aw$ is orthogonal to $Av_1$? The answer will be that the unit length $v$ that maximizes $\|Av\|$ is a valid $v_1$.
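This key claim of the outline — that the norm-maximizing $v_1$ makes $Aw$ orthogonal to $Av_1$ whenever $w$ is orthogonal to $v_1$ — can be sanity-checked numerically. A small sketch using NumPy (an assumption), which obtains the maximizer as the top right-singular vector:

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((4, 3))

# The unit vector maximizing ||A v|| is the first row of Vh
# (the top right-singular vector).
_, _, Vh = np.linalg.svd(A)
v1 = Vh[0]

# Any w orthogonal to v1: project the v1 component out of a random vector.
w = rng.standard_normal(3)
w -= (w @ v1) * v1

# A w should be orthogonal to A v1, up to floating-point error.
print(abs((A @ w) @ (A @ v1)))   # ~ 0
```

The reason this works is visible in the check itself: for the maximizer, $A^*Av_1 = \sigma_1^2 v_1$, so $(Aw)^*(Av_1) = \sigma_1^2\, w^*v_1 = 0$.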

We are now ready to give the proof in full detail:

Proof of the existence of the singular value decomposition

This proof will proceed using induction on both $m$ and $n$.

Base Case $m = 1$

$A$ has a single row, and therefore has the form $A = \sigma v^*$ where $v$ is an arbitrary unit length vector, and $\sigma$ is an arbitrary non-negative real number. Note that $\sigma$ and $v$ exist for any single row matrix $A$: take $\sigma = \|A\|$, and when $\sigma > 0$, $v^* = A/\sigma$.

Let $U = \begin{bmatrix} 1 \end{bmatrix}$, $\Sigma = \begin{bmatrix} \sigma & 0 & \dots & 0 \end{bmatrix}$, and $V = \begin{bmatrix} v & v_2 & \dots & v_n \end{bmatrix}$ where $v, v_2, \dots, v_n$ together form a mutually orthogonal set of unit length vectors. $v_2, \dots, v_n$ can be determined via Gram-Schmidt orthogonalization. It is clear that: $U \Sigma V^* = \sigma v^* = A$.

Base Case $n = 1$

$A$ has a single column, and therefore has the form $A = \sigma u$ where $u$ is an arbitrary unit length vector, and $\sigma$ is an arbitrary non-negative real number. Note that $\sigma$ and $u$ exist for any single column matrix $A$: take $\sigma = \|A\|$, and when $\sigma > 0$, $u = A/\sigma$.

Let $U = \begin{bmatrix} u & u_2 & \dots & u_m \end{bmatrix}$, $\Sigma = \begin{bmatrix} \sigma & 0 & \dots & 0 \end{bmatrix}^T$, and $V = \begin{bmatrix} 1 \end{bmatrix}$ where $u, u_2, \dots, u_m$ together form a mutually orthogonal set of unit length vectors. $u_2, \dots, u_m$ can be determined via Gram-Schmidt orthogonalization. It is clear that: $U \Sigma V^* = \sigma u = A$.
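The single-row base case can be carried out concretely. In this sketch (NumPy assumed), the orthonormal completion $v_2, \dots, v_n$ is produced with a QR factorization rather than an explicit Gram-Schmidt loop, which amounts to the same construction:

```python
import numpy as np

# Base case m = 1: a single-row matrix A = sigma * v^T.
A = np.array([[3.0, 0.0, 4.0]])

sigma = np.linalg.norm(A)          # sigma = 5.0
v = (A / sigma).ravel()            # unit vector with A = sigma * v^T

# Complete v to an orthonormal basis of R^3: QR of a full-rank matrix
# whose first column is v (QR performs Gram-Schmidt implicitly).
M = np.column_stack([v, np.eye(3)[:, :2]])
V, _ = np.linalg.qr(M)
V *= np.sign(V[:, 0] @ v)          # fix QR's sign so that V[:, 0] == v

U = np.array([[1.0]])
S = np.zeros((1, 3))
S[0, 0] = sigma

print(np.allclose(U @ S @ V.T, A))   # True
```

Here $U\Sigma V^*$ picks out $\sigma$ times the first row of $V^*$, which is $\sigma v^*$, exactly as in the proof.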

Inductive Case $m > 1$ and $n > 1$

Let $e_i$ denote the $i$th standard basis vector of $\mathbb{C}^m$ or $\mathbb{C}^n$. Let $\mathbf{0}_{p \times q}$ denote a $p \times q$ matrix of 0s.

Maximize $\|Av\|$ subject to the constraint $\|v\| = 1$. Let $v_1$ be a unit length vector that maximizes $\|Av\|$, and let $\sigma_1 = \|Av_1\|$. Let $u_1 = Av_1/\sigma_1$ (if $\sigma_1 = 0$, then $u_1$ is an arbitrary unit length vector).

Using Gram-Schmidt orthogonalization, unitary matrices $P$ and $Q$ can be determined such that the first columns of $P$ and $Q$ respectively are $u_1$ and $v_1$: $P = \begin{bmatrix} u_1 & u_2 & \dots & u_m \end{bmatrix}$ and $Q = \begin{bmatrix} v_1 & v_2 & \dots & v_n \end{bmatrix}$. $A = PA'Q^*$ where $A' = P^*AQ$. It will now be proven that the first column and row of $A'$ contain all 0s except for the $(1,1)$ entry which contains $\sigma_1$: $A' = \begin{bmatrix} \sigma_1 & \mathbf{0}_{1 \times (n-1)} \\ \mathbf{0}_{(m-1) \times 1} & A'' \end{bmatrix}$.

$A'e_1 = P^*AQe_1 = P^*Av_1 = \sigma_1 P^*u_1 = \sigma_1 e_1$. This means that the first column of $A'$ is $\sigma_1 e_1$.

To show that the first row of $A'$ is $\sigma_1 e_1^T$, we will show that the first column of $AQ$ is orthogonal to all of the other columns of $AQ$. This will require exploiting the fact that $v_1$ maximizes $\|Av\|$ subject to the constraint $\|v\| = 1$.

Let $v(t)$ be a parameterized unit length vector. Let $f(t) = \|Av(t)\|^2 = v(t)^*A^*Av(t)$.

Taking the derivative of the constraint $v(t)^*v(t) = 1$ gives $v'(t)^*v(t) + v(t)^*v'(t) = 2\operatorname{Re}(v(t)^*v'(t)) = 0$.

$f$ being maximized at $v(0) = v_1$ gives:

$$0 = f'(0) = v'(0)^*A^*Av(0) + v(0)^*A^*Av'(0)$$

$$\implies \operatorname{Re}(v_1^*A^*Av'(0)) = 0$$

($\operatorname{Re}$ and $\operatorname{Im}$ denote the real and imaginary components of a complex number respectively.)

Let $j \in \{2, 3, \dots, n\}$ be arbitrary. Let $v(t) = \cos(t)v_1 + \sin(t)v_j$. $v_1$ and $v_j$ are orthogonal, so $v(t)$ is unit length with $v(0) = v_1$ and $v'(0) = v_j$.

This gives: $\operatorname{Re}(v_1^*A^*Av_j) = 0$.

Now let $v(t) = \cos(t)v_1 + \sin(t)(iv_j)$ where the $i$ outside of the subscript is the imaginary constant. $v_1$ and $iv_j$ are orthogonal, so $v'(0) = iv_j$.

This gives: $\operatorname{Re}(v_1^*A^*A(iv_j)) = \operatorname{Re}(i\,v_1^*A^*Av_j) = -\operatorname{Im}(v_1^*A^*Av_j) = 0$.

Hence: $v_1^*A^*Av_j = (Av_1)^*(Av_j) = 0$.

Therefore, the first column of $AQ$ is orthogonal to all of the other columns of $AQ$. Since $(A')_{1j} = u_1^*Av_j = (Av_1)^*(Av_j)/\sigma_1$ for $j \geq 2$, $A'$ has the form: $A' = \begin{bmatrix} \sigma_1 & \mathbf{0}_{1 \times (n-1)} \\ \mathbf{0}_{(m-1) \times 1} & A'' \end{bmatrix}$.

$A''$ is an $(m-1) \times (n-1)$ matrix, and therefore by an inductive argument, $A'' = U''\Sigma''(V'')^*$. Finally,

$$A = PA'Q^* = P\begin{bmatrix} \sigma_1 & \mathbf{0} \\ \mathbf{0} & U''\Sigma''(V'')^* \end{bmatrix}Q^* = U \Sigma V^*$$ where $U = P\begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & U'' \end{bmatrix}$, $\Sigma = \begin{bmatrix} \sigma_1 & \mathbf{0} \\ \mathbf{0} & \Sigma'' \end{bmatrix}$, and $V = Q\begin{bmatrix} 1 & \mathbf{0} \\ \mathbf{0} & V'' \end{bmatrix}$.
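The proof is constructive and can be transcribed almost directly into code. The sketch below (NumPy assumed; real matrices only, to keep the Householder step simple) finds $v_1$ as the top eigenvector of $A^TA$ rather than by explicit maximization, builds $P$ and $Q$ with first columns $u_1$ and $v_1$, and recurses on the trailing block of $P^TAQ$. It is an illustration of the induction, not a production SVD algorithm:

```python
import numpy as np

def unitary_with_first_column(x):
    """Real orthogonal matrix whose first column is the unit vector x,
    built from the Householder reflection that sends e1 to x."""
    m = x.shape[0]
    e1 = np.zeros(m)
    e1[0] = 1.0
    w = x - e1
    norm_w = np.linalg.norm(w)
    if norm_w < 1e-12:       # x is already e1
        return np.eye(m)
    w /= norm_w
    return np.eye(m) - 2.0 * np.outer(w, w)

def svd_by_induction(A):
    """Return U, S, V with A = U @ S @ V.T, following the proof (real A)."""
    m, n = A.shape
    # v1 maximizes ||A v|| over unit vectors: top eigenvector of A^T A.
    _, vecs = np.linalg.eigh(A.T @ A)     # eigenvalues ascending
    v1 = vecs[:, -1]
    sigma1 = np.linalg.norm(A @ v1)
    u1 = (A @ v1) / sigma1 if sigma1 > 1e-12 else np.eye(m)[:, 0]

    P = unitary_with_first_column(u1)     # first column u1
    Q = unitary_with_first_column(v1)     # first column v1
    Ap = P.T @ A @ Q                      # A': first row/column = sigma1 * e1

    S = np.zeros((m, n))
    S[0, 0] = sigma1
    if m == 1 or n == 1:                  # base cases of the induction
        return P, S, Q

    # Inductive step on the (m-1) x (n-1) trailing block A''.
    U2, S2, V2 = svd_by_induction(Ap[1:, 1:])
    S[1:, 1:] = S2
    U, V = np.eye(m), np.eye(n)
    U[1:, 1:] = U2
    V[1:, 1:] = V2
    return P @ U, S, Q @ V

# Usage:
rng = np.random.default_rng(1)
A = rng.standard_normal((4, 3))
U, S, V = svd_by_induction(A)
print(np.allclose(U @ S @ V.T, A))   # True
```

The Householder construction stands in for the proof's Gram-Schmidt completion of $u_1$ and $v_1$ to unitary matrices; any orthonormal completion works, since only the first columns of $P$ and $Q$ matter to the argument.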