Statistics/Numerical Methods/Basic Linear Algebra and Gram-Schmidt Orthogonalization

Introduction

Basically, all the sections found here can also be found in a linear algebra book. However, the Gram-Schmidt orthogonalization is used in statistical algorithms and in the solution of statistical problems. Therefore, we briefly review the linear algebra theory which is necessary to understand the Gram-Schmidt orthogonalization.

The following subsections also contain examples. It is very important for further understanding that the concepts presented here are valid not only for typical vectors, i.e. tuples of real numbers, but also for functions, which can likewise be considered vectors.

Fields

Definition

A set $K$ with two operations $+$ and $\cdot$ on its elements is called a field (or, for short, $(K, +, \cdot)$), if the following conditions hold:

  1. For all $a, b \in K$ holds $a + b \in K$
  2. For all $a, b \in K$ holds $a + b = b + a$ (commutativity)
  3. For all $a, b, c \in K$ holds $a + (b + c) = (a + b) + c$ (associativity)
  4. There exists a unique element $0 \in K$, called zero, such that for all $a \in K$ holds $a + 0 = a$
  5. For all $a \in K$ exists a unique element $-a \in K$, such that holds $a + (-a) = 0$
  6. For all $a, b \in K$ holds $a \cdot b \in K$
  7. For all $a, b \in K$ holds $a \cdot b = b \cdot a$ (commutativity)
  8. For all $a, b, c \in K$ holds $a \cdot (b \cdot c) = (a \cdot b) \cdot c$ (associativity)
  9. There exists a unique element $1 \in K$, called one, such that for all $a \in K$ holds $a \cdot 1 = a$
  10. For all non-zero $a \in K$ exists a unique element $a^{-1} \in K$, such that holds $a \cdot a^{-1} = 1$
  11. For all $a, b, c \in K$ holds $a \cdot (b + c) = a \cdot b + a \cdot c$ (distributivity)

The elements of $K$ are also called scalars.

Examples

It can easily be proven that the real numbers with the well-known addition and multiplication, $(\mathbb{R}, +, \cdot)$, form a field. The same holds for the complex numbers with their addition and multiplication. Actually, there are not many other sets with two operations which fulfill all of these conditions.

For statistics, only the real and complex numbers with the usual addition and multiplication are important.

Vector spaces

Definition

A set $V$ with two operations $+$ and $\cdot$ on its elements is called a vector space over $\mathbb{R}$, if the following conditions hold:

  1. For all $u, v \in V$ holds $u + v \in V$
  2. For all $u, v \in V$ holds $u + v = v + u$ (commutativity)
  3. For all $u, v, w \in V$ holds $u + (v + w) = (u + v) + w$ (associativity)
  4. There exists a unique element $\vec{0} \in V$, called origin, such that for all $v \in V$ holds $v + \vec{0} = v$
  5. For all $v \in V$ exists a unique element $-v \in V$, such that holds $v + (-v) = \vec{0}$
  6. For all $\alpha \in \mathbb{R}$ and $v \in V$ holds $\alpha \cdot v \in V$
  7. For all $\alpha, \beta \in \mathbb{R}$ and $v \in V$ holds $\alpha \cdot (\beta \cdot v) = (\alpha \beta) \cdot v$ (associativity)
  8. For all $v \in V$ and $1 \in \mathbb{R}$ holds $1 \cdot v = v$
  9. For all $\alpha \in \mathbb{R}$ and for all $u, v \in V$ holds $\alpha \cdot (u + v) = \alpha \cdot u + \alpha \cdot v$ (distributivity wrt. vector addition)
  10. For all $\alpha, \beta \in \mathbb{R}$ and for all $v \in V$ holds $(\alpha + \beta) \cdot v = \alpha \cdot v + \beta \cdot v$ (distributivity wrt. scalar addition)

Note that we used the same symbols $+$ and $\cdot$ for different operations in $\mathbb{R}$ and $V$. The elements of $V$ are also called vectors.

Examples:

  1. The set $\mathbb{R}^n$ of real-valued vectors $(v_1, \dots, v_n)^\top$ with elementwise addition $u + v = (u_1 + v_1, \dots, u_n + v_n)^\top$ and elementwise scalar multiplication $\alpha \cdot v = (\alpha v_1, \dots, \alpha v_n)^\top$ is a vector space over $\mathbb{R}$.
  2. The set of polynomials of degree $p$, $P(x) = b_0 + b_1 x + b_2 x^2 + \dots + b_p x^p$, with the usual addition and scalar multiplication is a vector space over $\mathbb{R}$.

Linear combinations

A vector $v \in V$ can be written as a linear combination of the vectors $v_1, \dots, v_p$, if

$v = \sum_{i=1}^{p} \alpha_i v_i$

with $\alpha_i \in \mathbb{R}$.

Examples:

  • $(2, 3)^\top$ is a linear combination of $(1, 0)^\top$ and $(0, 1)^\top$ since $(2, 3)^\top = 2 \cdot (1, 0)^\top + 3 \cdot (0, 1)^\top$
  • $1 + 2x + 3x^2$ is a linear combination of the polynomials $1, x, x^2$ since $1 + 2x + 3x^2 = 1 \cdot 1 + 2 \cdot x + 3 \cdot x^2$

Basis of a vector space

A set of vectors $v_1, \dots, v_p$ is called a basis of the vector space $V$, if

1. for each vector $v \in V$ exist scalars $\alpha_1, \dots, \alpha_p \in \mathbb{R}$ such that $v = \sum_{i=1}^{p} \alpha_i v_i$, and
2. there is no proper subset of $\{v_1, \dots, v_p\}$ such that 1. is fulfilled.

Note that a vector space can have several bases.

Examples:

  • Each vector $v \in \mathbb{R}^3$ can be written as $v = v_1 \cdot (1, 0, 0)^\top + v_2 \cdot (0, 1, 0)^\top + v_3 \cdot (0, 0, 1)^\top$. Therefore $(1, 0, 0)^\top$, $(0, 1, 0)^\top$ and $(0, 0, 1)^\top$ form a basis of $\mathbb{R}^3$.
  • Each polynomial of degree $p$ can be written as a linear combination of $1, x, x^2, \dots, x^p$, which therefore form a basis for this vector space.

Actually, for both examples we would have to prove condition 2., but it is clear that it holds.
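As a small numerical aside (our own illustration, assuming NumPy is available): finding the scalars in condition 1. amounts to solving a linear system whose coefficient matrix has the basis vectors as columns.

```python
import numpy as np

# Columns form a (non-standard) basis of R^3; any three linearly
# independent vectors would do.
B = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

v = np.array([2.0, 3.0, 1.0])

alpha = np.linalg.solve(B, v)   # coefficients alpha_1, alpha_2, alpha_3
print(alpha)                    # [0. 2. 1.]
print(B @ alpha)                # reproduces v
```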

Dimension of a vector space

The dimension of a vector space is the number of vectors which are necessary for a basis. A vector space has infinitely many bases, but the dimension is uniquely determined. Note that a vector space may have dimension infinity, e.g. consider the space of continuous functions.

Examples:

  • The dimension of $\mathbb{R}^3$ is three, the dimension of $\mathbb{R}^n$ is $n$.
  • The dimension of the polynomials of degree $p$ is $p + 1$.

Scalar products

A mapping $\langle \cdot, \cdot \rangle : V \times V \rightarrow \mathbb{R}$ is called a scalar product if the following holds for all $u, u_1, u_2, v \in V$ and all $\alpha \in \mathbb{R}$:

  1. $\langle u, v \rangle = \langle v, u \rangle$
  2. $\langle u_1 + u_2, v \rangle = \langle u_1, v \rangle + \langle u_2, v \rangle$
  3. $\langle \alpha u, v \rangle = \alpha \langle u, v \rangle$ with $\alpha \in \mathbb{R}$
  4. $\langle v, v \rangle \geq 0$ with $\langle v, v \rangle = 0 \Leftrightarrow v = \vec{0}$

Examples:

  • The typical scalar product in $\mathbb{R}^n$ is $\langle u, v \rangle = \sum_{i=1}^{n} u_i v_i = u^\top v$.
  • $\langle f, g \rangle = \int_{-1}^{1} f(x) g(x) \, dx$ is a scalar product on the vector space of polynomials of degree $p$.
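For the second example the integral can be evaluated numerically; here is a minimal sketch assuming NumPy (Gauss-Legendre quadrature with $n$ nodes is exact for polynomial integrands of degree $\leq 2n - 1$):

```python
import numpy as np

def poly_inner(f, g, n=16):
    """Scalar product <f, g> = integral_{-1}^{1} f(x) g(x) dx via
    n-point Gauss-Legendre quadrature."""
    x, w = np.polynomial.legendre.leggauss(n)
    return np.sum(w * f(x) * g(x))

f = lambda x: 1 + 2 * x      # two example polynomials
g = lambda x: x**2
print(poly_inner(f, g))      # integral of x^2 + 2x^3 over [-1, 1] = 2/3
```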

Norm

A mapping $\| \cdot \| : V \rightarrow \mathbb{R}$ is called a norm if the following holds:

  1. $\|v\| \geq 0$ for all $v \in V$, and $\|v\| = 0 \Leftrightarrow v = \vec{0}$ (positive definiteness)
  2. $\|\alpha v\| = |\alpha| \, \|v\|$ for all $v \in V$ and all $\alpha \in \mathbb{R}$
  3. $\|u + v\| \leq \|u\| + \|v\|$ for all $u, v \in V$ (triangle inequality)

Examples:

  • The $L_2$ norm of a vector in $\mathbb{R}^n$ is defined as $\|v\| = \sqrt{\sum_{i=1}^{n} v_i^2}$.
  • Each scalar product generates a norm by $\|v\| = \sqrt{\langle v, v \rangle}$; therefore $\|f\| = \sqrt{\int_{-1}^{1} f(x)^2 \, dx}$ is a norm for the polynomials of degree $p$.

Orthogonality

Two vectors $u$ and $v$ are orthogonal to each other if $\langle u, v \rangle = 0$. In $\mathbb{R}^n$ the cosine of the angle between two vectors can be expressed as

$\cos(\angle(u, v)) = \frac{\langle u, v \rangle}{\|u\| \, \|v\|}$.

If the angle between $u$ and $v$ is ninety degrees (orthogonal), then the cosine is zero, and it follows that $\langle u, v \rangle = 0$.

A set of vectors $v_1, \dots, v_p$ is called orthonormal, if

$\langle v_i, v_j \rangle = \begin{cases} 0 & \text{if } i \neq j \\ 1 & \text{if } i = j \end{cases}$.

If we consider a basis $e_1, \dots, e_p$ of a vector space, then we would like to have an orthonormal basis. Why?

Since we have a basis, each pair of vectors $u$ and $v$ can be expressed as $u = \sum_{i=1}^{p} \alpha_i e_i$ and $v = \sum_{i=1}^{p} \beta_i e_i$. Therefore the scalar product of $u$ and $v$ reduces to

$\langle u, v \rangle = \left\langle \sum_{i=1}^{p} \alpha_i e_i, \sum_{j=1}^{p} \beta_j e_j \right\rangle = \sum_{i=1}^{p} \sum_{j=1}^{p} \alpha_i \beta_j \langle e_i, e_j \rangle = \sum_{i=1}^{p} \alpha_i \beta_i \langle e_i, e_i \rangle = \sum_{i=1}^{p} \alpha_i \beta_i$

Consequently, the computation of a scalar product is reduced to simple multiplications and additions if the coefficients are known. Remember that for our polynomials we would otherwise have to solve an integral!
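The following NumPy snippet (our own illustration, not from the original text) demonstrates this reduction in $\mathbb{R}^3$: the columns of an orthogonal matrix $Q$ form an orthonormal basis, and the scalar product of two vectors equals the sum of products of their coefficients with respect to that basis.

```python
import numpy as np

# Any orthonormal basis will do: take Q from the QR decomposition
# of a random matrix; its columns e_1, e_2, e_3 are orthonormal.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.normal(size=(3, 3)))

u = rng.normal(size=3)
v = rng.normal(size=3)

alpha = Q.T @ u    # coefficients of u w.r.t. the basis
beta = Q.T @ v     # coefficients of v w.r.t. the basis

# <u, v> equals the sum of coefficient products.
print(u @ v, alpha @ beta)   # identical up to rounding error
```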

Gram-Schmidt orthogonalization

Algorithm

The aim of the Gram-Schmidt orthogonalization is to find, for a set of vectors $x_1, \dots, x_p$, an equivalent set of orthonormal vectors $y_1, \dots, y_p$, such that any vector which can be expressed as a linear combination of $x_1, \dots, x_p$ can also be expressed as a linear combination of $y_1, \dots, y_p$:

1. Set $b_1 = x_1$ and $y_1 = b_1 / \|b_1\|$

2. For each $i > 1$ set $b_i = x_i - \sum_{j=1}^{i-1} \frac{\langle x_i, b_j \rangle}{\langle b_j, b_j \rangle} b_j$ and $y_i = b_i / \|b_i\|$. In each step the vector $x_i$ is projected on $b_j$ and the result is subtracted from $x_i$.
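A minimal NumPy sketch of this algorithm (the function name and the convention that the $x_i$ are the columns of a matrix are our own choices):

```python
import numpy as np

def gram_schmidt(X):
    """Classical Gram-Schmidt: orthonormalize the columns x_1, ..., x_p of X."""
    X = X.astype(float)
    p = X.shape[1]
    B = np.empty_like(X)   # the intermediate vectors b_i
    Y = np.empty_like(X)   # the orthonormal vectors y_i
    for i in range(p):
        b = X[:, i].copy()
        for j in range(i):
            # project x_i on b_j and subtract the result
            b -= (X[:, i] @ B[:, j]) / (B[:, j] @ B[:, j]) * B[:, j]
        B[:, i] = b
        Y[:, i] = b / np.linalg.norm(b)
    return Y

X = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
Y = gram_schmidt(X)
print(np.round(Y.T @ Y, 10))   # identity matrix: the columns are orthonormal
```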

Example

Consider the polynomials of degree two in the interval $[-1, 1]$ with the scalar product $\langle f, g \rangle = \int_{-1}^{1} f(x) g(x) \, dx$ and the norm $\|f\| = \sqrt{\langle f, f \rangle}$. We know that $x_1 = 1$, $x_2 = x$ and $x_3 = x^2$ are a basis for this vector space. Let us now construct an orthonormal basis:

Step 1a: $b_1 = x_1 = 1$

Step 1b: $y_1 = \frac{b_1}{\|b_1\|} = \frac{1}{\sqrt{\int_{-1}^{1} 1 \, dx}} = \frac{1}{\sqrt{2}}$

Step 2a: $b_2 = x_2 - \frac{\langle x_2, b_1 \rangle}{\langle b_1, b_1 \rangle} b_1 = x - \frac{\int_{-1}^{1} x \, dx}{2} \cdot 1 = x$

Step 2b: $y_2 = \frac{b_2}{\|b_2\|} = \frac{x}{\sqrt{\int_{-1}^{1} x^2 \, dx}} = \frac{x}{\sqrt{2/3}} = \sqrt{\frac{3}{2}} \, x$

Step 3a: $b_3 = x_3 - \frac{\langle x_3, b_1 \rangle}{\langle b_1, b_1 \rangle} b_1 - \frac{\langle x_3, b_2 \rangle}{\langle b_2, b_2 \rangle} b_2 = x^2 - \frac{\int_{-1}^{1} x^2 \, dx}{2} \cdot 1 - \frac{\int_{-1}^{1} x^3 \, dx}{2/3} \, x = x^2 - \frac{1}{3}$

Step 3b: $y_3 = \frac{b_3}{\|b_3\|} = \frac{x^2 - 1/3}{\sqrt{\int_{-1}^{1} (x^2 - 1/3)^2 \, dx}} = \frac{x^2 - 1/3}{\sqrt{8/45}} = \sqrt{\frac{5}{8}} \, (3x^2 - 1)$

It can be proven that $\frac{1}{\sqrt{2}}$, $\sqrt{\frac{3}{2}} \, x$ and $\sqrt{\frac{5}{8}} \, (3x^2 - 1)$ form an orthonormal basis with the above scalar product and norm.
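We can check the result numerically, a sketch assuming NumPy (the quadrature order is an arbitrary choice, exact here for polynomial products of degree $\leq 15$):

```python
import numpy as np

# The three polynomials constructed above.
ys = [
    lambda x: np.full_like(x, 1 / np.sqrt(2)),     # y_1
    lambda x: np.sqrt(3 / 2) * x,                  # y_2
    lambda x: np.sqrt(5 / 8) * (3 * x**2 - 1),     # y_3
]

# Gauss-Legendre quadrature for <f, g> = integral over [-1, 1].
nodes, weights = np.polynomial.legendre.leggauss(8)
G = np.array([[np.sum(weights * yi(nodes) * yj(nodes)) for yj in ys] for yi in ys])
print(np.round(G, 10))   # the 3x3 identity matrix: the basis is orthonormal
```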

Numerical instability

Consider the vectors $x_1 = (1, \epsilon, 0, 0)^\top$, $x_2 = (1, 0, \epsilon, 0)^\top$ and $x_3 = (1, 0, 0, \epsilon)^\top$. Assume that $\epsilon$ is so small that computing $1 + \epsilon^2 = 1$ holds on a computer (see http://en.wikipedia.org/wiki/Machine_epsilon). Let us compute an orthonormal basis for these vectors in $\mathbb{R}^4$ with the standard scalar product $\langle u, v \rangle = u^\top v$ and the norm $\|u\| = \sqrt{u^\top u}$.

Step 1a. $b_1 = x_1 = (1, \epsilon, 0, 0)^\top$

Step 1b. $y_1 = \frac{b_1}{\|b_1\|} = b_1$ with $\|b_1\| = \sqrt{1 + \epsilon^2} = 1$

Step 2a. $b_2 = x_2 - \frac{\langle x_2, b_1 \rangle}{\langle b_1, b_1 \rangle} b_1 = (1, 0, \epsilon, 0)^\top - (1, \epsilon, 0, 0)^\top = (0, -\epsilon, \epsilon, 0)^\top$

Step 2b. $y_2 = \frac{b_2}{\|b_2\|} = \frac{1}{\sqrt{2} \, \epsilon} (0, -\epsilon, \epsilon, 0)^\top = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\right)^\top$

Step 3a. $b_3 = x_3 - \frac{\langle x_3, b_1 \rangle}{\langle b_1, b_1 \rangle} b_1 - \frac{\langle x_3, b_2 \rangle}{\langle b_2, b_2 \rangle} b_2 = (1, 0, 0, \epsilon)^\top - (1, \epsilon, 0, 0)^\top - 0 \cdot b_2 = (0, -\epsilon, 0, \epsilon)^\top$

Step 3b. $y_3 = \frac{b_3}{\|b_3\|} = \frac{1}{\sqrt{2} \, \epsilon} (0, -\epsilon, 0, \epsilon)^\top = \left(0, -\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)^\top$

It is obvious that for the vectors

- $y_1 = (1, \epsilon, 0, 0)^\top$

- $y_2 = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\right)^\top$

- $y_3 = \left(0, -\frac{1}{\sqrt{2}}, 0, \frac{1}{\sqrt{2}}\right)^\top$

the scalar product $\langle y_2, y_3 \rangle = \frac{1}{2} \neq 0$. All other pairs are also not zero, but they are multiplied by $\epsilon$ such that we get a result near zero.
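We can reproduce this breakdown in double precision (our own demonstration; $\epsilon = 10^{-10}$ is small enough that $1 + \epsilon^2$ rounds to $1$):

```python
import numpy as np

eps = 1e-10               # eps**2 = 1e-20 vanishes next to 1.0 in double precision
X = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])

# Classical Gram-Schmidt on the columns of X.
Y = np.empty_like(X)
for i in range(3):
    b = X[:, i].copy()
    for j in range(i):
        b -= (X[:, i] @ Y[:, j]) * Y[:, j]   # project x_i on the computed y_j
    Y[:, i] = b / np.linalg.norm(b)

print(np.round(Y.T @ Y, 4))   # the entry <y_2, y_3> is about 0.5 instead of 0
```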

Modified Gram-Schmidt

To solve the problem a modified Gram-Schmidt algorithm is used:

  1. Set $b_i = x_i$ for all $i$
  2. For each $i$ from $1$ to $p$ compute
    1. $y_i = \frac{b_i}{\|b_i\|}$
    2. For each $j$ from $i + 1$ to $p$ compute $b_j = b_j - \langle b_j, y_i \rangle y_i$

The difference is that we compute our new $y_i$ first and immediately subtract its component from all remaining vectors $b_j$, instead of computing each $b_i$ separately from the original $x_i$. The inaccurately computed vector is thus applied to all remaining vectors.
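A NumPy sketch of the modified algorithm (again, the function name and the column convention are our own choices); rerunning the unstable example from above now yields a Gram matrix close to the identity:

```python
import numpy as np

def modified_gram_schmidt(X):
    """Modified Gram-Schmidt: orthonormalize the columns of X."""
    B = X.astype(float)
    Y = np.empty_like(B)
    p = B.shape[1]
    for i in range(p):
        Y[:, i] = B[:, i] / np.linalg.norm(B[:, i])   # y_i = b_i / ||b_i||
        for j in range(i + 1, p):
            B[:, j] -= (B[:, j] @ Y[:, i]) * Y[:, i]  # b_j = b_j - <b_j, y_i> y_i
    return Y

eps = 1e-10
X = np.array([[1.0, 1.0, 1.0],
              [eps, 0.0, 0.0],
              [0.0, eps, 0.0],
              [0.0, 0.0, eps]])
Y = modified_gram_schmidt(X)
print(np.round(Y.T @ Y, 4))   # close to the 3x3 identity matrix
```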

Example (recomputed)

Step 1. $b_1 = x_1 = (1, \epsilon, 0, 0)^\top$, $b_2 = x_2 = (1, 0, \epsilon, 0)^\top$, $b_3 = x_3 = (1, 0, 0, \epsilon)^\top$

Step 2a. $y_1 = \frac{b_1}{\|b_1\|} = (1, \epsilon, 0, 0)^\top$ with $\|b_1\| = \sqrt{1 + \epsilon^2} = 1$

Step 2b. $b_2 = b_2 - \langle b_2, y_1 \rangle y_1 = (0, -\epsilon, \epsilon, 0)^\top$

Step 2c. $b_3 = b_3 - \langle b_3, y_1 \rangle y_1 = (0, -\epsilon, 0, \epsilon)^\top$

Step 3a. $y_2 = \frac{b_2}{\|b_2\|} = \left(0, -\frac{1}{\sqrt{2}}, \frac{1}{\sqrt{2}}, 0\right)^\top$

Step 3b. $b_3 = b_3 - \langle b_3, y_2 \rangle y_2 = \left(0, -\frac{\epsilon}{2}, -\frac{\epsilon}{2}, \epsilon\right)^\top$

Step 4a. $y_3 = \frac{b_3}{\|b_3\|} = \frac{1}{\sqrt{3/2} \, \epsilon} \left(0, -\frac{\epsilon}{2}, -\frac{\epsilon}{2}, \epsilon\right)^\top = \left(0, -\frac{1}{\sqrt{6}}, -\frac{1}{\sqrt{6}}, \frac{2}{\sqrt{6}}\right)^\top$

We can easily verify that $\langle y_2, y_3 \rangle = 0$.


Application

Exploratory Projection Pursuit

In the analysis of high-dimensional data we usually analyze projections of the data. This approach is justified by the Cramér-Wold theorem, which states that a multidimensional distribution is fixed if we know all of its one-dimensional projections. Another theorem states that most (one-dimensional) projections of multivariate data look normal, even if the multivariate distribution of the data is highly non-normal.

Therefore, in Exploratory Projection Pursuit we judge the interestingness of a projection by comparison with a (standard) normal distribution. If we assume that the one-dimensional projected data $y$ are standard normally distributed, then after the transformation $z = 2\Phi(y) - 1$, with $\Phi$ the cumulative distribution function of the standard normal distribution, $z$ is uniformly distributed in the interval $[-1, 1]$.

Thus the interestingness can be measured by $\int_{-1}^{1} \left( f(z) - \frac{1}{2} \right)^2 dz$, with $f$ a density estimated from the data. If the density $f$ is equal to $\frac{1}{2}$ in the interval $[-1, 1]$, then the integral becomes zero and we have found that our projected data are normally distributed. A value larger than zero indicates a deviation from the normal distribution of the projected data and, hopefully, an interesting distribution.
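The transformation itself is easy to check empirically; a sketch assuming NumPy and SciPy are available:

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
y = rng.standard_normal(10_000)    # one projection of the data (here: truly normal)

z = 2 * norm.cdf(y) - 1            # uniform on [-1, 1] if y is standard normal

# A crude check: the density histogram of z should be flat at height 1/2.
hist, _ = np.histogram(z, bins=10, range=(-1, 1), density=True)
print(np.round(hist, 2))           # all entries close to 0.5
```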

Expansion with orthonormal polynomials

Let $L_j(z)$, $j = 0, \dots, J$, be a set of orthonormal polynomials with the scalar product $\langle f, g \rangle = \int_{-1}^{1} f(z) g(z) \, dz$ and the norm $\|f\| = \sqrt{\langle f, f \rangle}$. What can we derive about a density $f(z)$ in the interval $[-1, 1]$?

If $f(z) = \sum_{j=0}^{J} a_j L_j(z)$ for some maximal degree $J$, then it holds that

$\int_{-1}^{1} f(z) L_j(z) \, dz = \int_{-1}^{1} \left( \sum_{k=0}^{J} a_k L_k(z) \right) L_j(z) \, dz = a_j$

Since $f$ is a density, we can also write $a_j = E\left[ L_j(z) \right]$, or empirically we get the estimator $\hat{a}_j = \frac{1}{n} \sum_{i=1}^{n} L_j(z_i)$.

We expand the term $\frac{1}{2} = \sum_{j=0}^{J} b_j L_j(z)$ as well and get for our integral

$\int_{-1}^{1} \left( f(z) - \frac{1}{2} \right)^2 dz = \int_{-1}^{1} \left( \sum_{j=0}^{J} (a_j - b_j) L_j(z) \right)^2 dz = \sum_{j=0}^{J} (a_j - b_j)^2$

So using an orthonormal function set allows us to reduce the integral to a summation of coefficients which can be estimated from the data by plugging $\hat{a}_j$ into the formula above. The coefficients $b_j$ can be precomputed in advance.
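A sketch of the resulting estimator, assuming NumPy and SciPy (since $L_0(z) = 1/\sqrt{2}$ is constant, the only nonzero coefficient of the constant $1/2$ is $b_0 = 1/\sqrt{2}$; the function name and the cutoff $J = 4$ are our own choices):

```python
import numpy as np
from scipy.stats import norm

def projection_index(y, J=4):
    """Estimate sum_j (a_hat_j - b_j)^2 for projected data y, using the
    normalized Legendre polynomials L_j(z) = sqrt((2j+1)/2) P_j(z)."""
    z = 2 * norm.cdf(y) - 1                    # map to [-1, 1]
    index = 0.0
    for j in range(J + 1):
        c = np.zeros(j + 1)
        c[j] = 1.0                             # coefficient vector selecting P_j
        L_j = np.sqrt((2 * j + 1) / 2) * np.polynomial.legendre.legval(z, c)
        a_hat = L_j.mean()                     # a_hat_j = mean of L_j(z_i)
        b_j = 1 / np.sqrt(2) if j == 0 else 0.0
        index += (a_hat - b_j) ** 2
    return index

rng = np.random.default_rng(2)
print(projection_index(rng.standard_normal(10_000)))   # near 0: normal-looking
print(projection_index(rng.uniform(-3, 3, 10_000)))    # clearly larger: interesting
```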

Normalized Legendre polynomials

The only problem left is to find the set of orthonormal polynomials $L_j(z)$ up to degree $J$. We know that $1, z, z^2, \dots, z^J$ form a basis for this space, and we apply the Gram-Schmidt orthogonalization to find the orthonormal polynomials. This was started in the first example.

The resulting polynomials are called normalized Legendre polynomials. Up to a scaling factor the normalized Legendre polynomials are identical to the Legendre polynomials $P_j$, which have a recursive expression of the form

$P_{j+1}(z) = \frac{(2j + 1) \, z \, P_j(z) - j \, P_{j-1}(z)}{j + 1}, \quad P_0(z) = 1, \quad P_1(z) = z$

So computing our integral reduces to computing $L_0(z_i)$ and $L_1(z_i)$ and using the recursive relationship to compute the $L_j(z_i)$'s. Please note that the recursion can be numerically unstable!
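A direct implementation of the recurrence (our own sketch; for the normalized polynomials multiply $P_j$ by $\sqrt{(2j+1)/2}$):

```python
import numpy as np

def legendre_values(z, J):
    """Evaluate P_0(z), ..., P_J(z) via the three-term recurrence
    (j+1) P_{j+1}(z) = (2j+1) z P_j(z) - j P_{j-1}(z)."""
    z = np.asarray(z, dtype=float)
    P = np.empty((J + 1,) + z.shape)
    P[0] = 1.0
    if J >= 1:
        P[1] = z
    for j in range(1, J):
        P[j + 1] = ((2 * j + 1) * z * P[j] - j * P[j - 1]) / (j + 1)
    return P

print(legendre_values(0.5, 3))   # [1.0, 0.5, -0.125, -0.4375]
```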
