Statistics/Multivariate Data Analysis

Distributions

Multivariate Normal

The multivariate normal distribution is the extension of the univariate normal distribution to random vectors. Its simplest definition is as follows:

Definition (Multivariate Normal Distribution):

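A random vector $X$ of dimension $p$ is said to follow a multivariate normal distribution with mean $\mu$ and covariance matrix $\Sigma$ if its characteristic function is $\varphi_X(t) = E\left[e^{i t^\top X}\right] = \exp\left(i t^\top \mu - \tfrac{1}{2} t^\top \Sigma t\right)$ for all $t \in \mathbb{R}^p$. It is denoted by $X \sim N_p(\mu, \Sigma)$.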
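The sketch below is a rough numerical illustration of this definition. It compares the closed-form characteristic function with a Monte Carlo estimate of $E[e^{i t^\top X}]$ computed from simulated draws. It assumes NumPy is available. The helper names `mvn_char_fn` and `empirical_char_fn` and the particular values of $\mu$, $\Sigma$ and $t$ are illustrative choices, not part of the text.

```python
import numpy as np

def mvn_char_fn(t, mu, sigma):
    """Closed-form characteristic function exp(i t'mu - t' sigma t / 2) of N_p(mu, sigma)."""
    return np.exp(1j * (t @ mu) - 0.5 * (t @ sigma @ t))

def empirical_char_fn(t, samples):
    """Monte Carlo estimate of E[exp(i t'X)], one draw of X per row of `samples`."""
    return np.mean(np.exp(1j * (samples @ t)))

rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
samples = rng.multivariate_normal(mu, sigma, size=200_000)
t = np.array([0.3, -0.5])
print(mvn_char_fn(t, mu, sigma))       # exact value
print(empirical_char_fn(t, samples))   # Monte Carlo estimate, should agree to about two decimals
```

The closed form involves $\Sigma$ only through the quadratic form $t^\top \Sigma t$. It is therefore well defined whether or not $\Sigma$ is invertible.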

At first glance, the definition seems rather abstract. After all, the univariate normal distribution is usually introduced through its density, and a density (when it exists) characterises a distribution just as completely as a characteristic function does. However, a definition in terms of the characteristic function is needed to handle the case where $\Sigma$ is positive semi-definite but not positive definite. When $\Sigma$ is positive definite, inverting the characteristic function shows that $X$ has density $f(x) = (2\pi)^{-p/2} \, |\Sigma|^{-1/2} \exp\left(-\tfrac{1}{2}(x - \mu)^\top \Sigma^{-1} (x - \mu)\right)$ for $x \in \mathbb{R}^p$.

This is no longer true when $\Sigma$ is singular. $X$ then lies, with probability one, in the affine subspace through $\mu$ spanned by the eigenvectors of $\Sigma$ with non-zero eigenvalues, so it has no density on $\mathbb{R}^p$. The characteristic-function definition still works in this case. A density can still be written down on that lower-dimensional subspace using the non-zero eigenvalues of $\Sigma$, but it is not a density with respect to Lebesgue measure on $\mathbb{R}^p$.
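The following sketch also assumes NumPy. It contrasts the two cases:

* **Density:** it evaluates the density formula through a Cholesky factorisation, which only succeeds when $\Sigma$ is positive definite.
* **Sampling:** it draws samples through the eigendecomposition $\Sigma = U \Lambda U^\top$, which also works when $\Sigma$ is singular.

The names `mvn_pdf` and `mvn_sample` and the tolerance `tol` are hypothetical helpers for illustration.

```python
import numpy as np

def mvn_pdf(x, mu, sigma):
    """Density of N_p(mu, sigma) at x; requires sigma to be positive definite."""
    p = len(mu)
    L = np.linalg.cholesky(sigma)                  # raises LinAlgError if sigma is not positive definite
    z = np.linalg.solve(L, x - mu)                 # z'z = (x - mu)' sigma^{-1} (x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(L)))     # log |sigma|
    return np.exp(-0.5 * p * np.log(2.0 * np.pi) - 0.5 * log_det - 0.5 * z @ z)

def mvn_sample(mu, sigma, n, rng, tol=1e-12):
    """Draw n samples mu + U diag(sqrt(lam)) Z; also valid for singular sigma."""
    lam, U = np.linalg.eigh(sigma)                 # sigma = U diag(lam) U'
    lam = np.where(lam > tol * lam.max(), lam, 0.0)   # treat round-off eigenvalues as exact zeros
    Z = rng.standard_normal((n, len(mu)))
    return mu + (Z * np.sqrt(lam)) @ U.T

rng = np.random.default_rng(1)
mu = np.zeros(2)
sigma_pd = np.array([[1.0, 0.5],
                     [0.5, 1.0]])
print(mvn_pdf(np.zeros(2), mu, sigma_pd))          # equals 1 / (2 * pi * sqrt(0.75))

sigma_sing = np.array([[1.0, 1.0],
                       [1.0, 1.0]])                # rank 1: X_1 = X_2 almost surely
xs = mvn_sample(mu, sigma_sing, 5, rng)
print(np.allclose(xs[:, 0], xs[:, 1]))             # all draws lie on the line x1 = x2
```

With the rank-one matrix used here, every draw lands on a line in $\mathbb{R}^2$. This illustrates why no density on the whole plane exists in the singular case.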

Matrix-variate Normal

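We will first need to develop some notation. Let $X$ be an $n \times p$ matrix with columns $x_1, \dots, x_p$. Then we define the column vector $\operatorname{vec}(X) = (x_1^\top, x_2^\top, \dots, x_p^\top)^\top \in \mathbb{R}^{np}$, obtained by stacking the columns of $X$ on top of each other, and we call it the vectorisation of $X$.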
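In NumPy, column stacking is a column-major ("Fortran order") flattening. The sketch below defines a hypothetical `vec` helper that way. It then checks the standard identity $\operatorname{vec}(AXB) = (B^\top \otimes A)\operatorname{vec}(X)$, which is one reason the vectorisation is convenient. The random matrices $A$ and $B$ are illustrative only.

```python
import numpy as np

def vec(X):
    """Stack the columns of X into a single vector (column-major order)."""
    return np.asarray(X).reshape(-1, order="F")

X = np.array([[1, 2, 3],
              [4, 5, 6]])          # columns x1 = (1, 4), x2 = (2, 5), x3 = (3, 6)
print(vec(X))                      # [1 4 2 5 3 6]

# Identity for the column-stacking vec: vec(A X B) = (B^T kron A) vec(X)
rng = np.random.default_rng(0)
A = rng.standard_normal((4, 2))
B = rng.standard_normal((3, 5))
lhs = vec(A @ X @ B)
rhs = np.kron(B.T, A) @ vec(X)
print(np.allclose(lhs, rhs))       # True
```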

Definition (Matrix-variate Normal):

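We say that an $n \times p$ random matrix $X$ follows a matrix-variate normal distribution with $n \times p$ mean matrix $M$ and $np \times np$ covariance matrix $\Omega$ if $\operatorname{vec}(X) \sim N_{np}(\operatorname{vec}(M), \Omega)$.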

The reader should notice that this simply imposes a multivariate normal distribution on the vectorisation of $X$. Thus, many of the results that hold for multivariate normal random vectors also hold for the vectorisation of a matrix-variate normal random matrix.
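Because the definition only concerns $\operatorname{vec}(X)$, sampling is straightforward. Draw $\operatorname{vec}(X)$ from a multivariate normal distribution, then refill the $n \times p$ matrix column by column. The sketch below assumes NumPy and a hypothetical helper `sample_matrix_normal`.

For the example, $\Omega$ is built as a Kronecker product $V \otimes U$ of a $p \times p$ matrix $V$ and an $n \times n$ matrix $U$. This is a common special case, but it is not required by the definition above, and the particular $M$, $U$ and $V$ are illustrative values.

```python
import numpy as np

def vec(X):
    """Column-stacking vectorisation of a matrix."""
    return np.asarray(X).reshape(-1, order="F")

def sample_matrix_normal(M, omega, size, rng):
    """Draw `size` random n x p matrices X with vec(X) ~ N_{np}(vec(M), omega)."""
    n, p = M.shape
    draws = rng.multivariate_normal(vec(M), omega, size=size)   # shape (size, n * p)
    # Undo the vectorisation: refill each n x p matrix column by column.
    return np.stack([d.reshape(n, p, order="F") for d in draws])

rng = np.random.default_rng(2)
M = np.array([[0.0, 1.0],
              [2.0, 3.0],
              [4.0, 5.0]])              # 3 x 2 mean matrix
U = np.array([[1.0, 0.5, 0.0],
              [0.5, 1.0, 0.3],
              [0.0, 0.3, 1.0]])         # illustrative 3 x 3 positive definite matrix
V = np.array([[2.0, 0.4],
              [0.4, 1.0]])              # illustrative 2 x 2 positive definite matrix
omega = np.kron(V, U)                   # 6 x 6 covariance matrix of vec(X)

Xs = sample_matrix_normal(M, omega, 50_000, rng)
print(Xs.shape)                         # (50000, 3, 2)
print(np.round(Xs.mean(axis=0), 2))     # close to M
```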

Now that we have defined the multivariate and matrix-variate normal distributions, our next aim is to find analogues of the univariate $\chi^2$ distribution with $n$ degrees of freedom and of Student's $t$ distribution, both of which are closely related to the univariate normal distribution. We know that if $X_1, \dots, X_n$ are independent $N(0, \sigma^2)$ random variables, then $\sum_{i=1}^n X_i^2 \sim \sigma^2 \chi^2_n$. What would be an analogue of this for the multivariate case?
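A quick simulation of the univariate fact is shown below as a sketch that assumes NumPy; the values of $n$ and $\sigma^2$ are arbitrary. Since $\chi^2_n$ has mean $n$ and variance $2n$, the sum of squares should have mean $n\sigma^2$ and variance $2n\sigma^4$.

```python
import numpy as np

rng = np.random.default_rng(4)
n, sigma2, reps = 6, 2.5, 200_000
X = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))   # each row: X_1, ..., X_n iid N(0, sigma^2)
S = (X ** 2).sum(axis=1)                               # one sum of squares per row

# sigma^2 * chi^2_n has mean n * sigma^2 and variance 2 * n * sigma^4
print(S.mean(), n * sigma2)
print(S.var(), 2 * n * sigma2 ** 2)
```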

Wishart Distribution

Definition (Wishart Distribution):

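If $X_1, \dots, X_n$ are independent with $X_i \sim N_p(0, \Sigma)$ for $i = 1, \dots, n$, then $S = \sum_{i=1}^n X_i X_i^\top$ is said to have a Wishart distribution with $n$ degrees of freedom and associated matrix $\Sigma$. It is denoted by $S \sim W_p(\Sigma, n)$.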
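The definition translates directly into a sampler. Stack $X_1^\top, \dots, X_n^\top$ as the rows of an $n \times p$ matrix $\mathbf{X}$; then $S = \mathbf{X}^\top \mathbf{X}$.

The sketch below assumes NumPy and a hypothetical helper `sample_wishart`. It checks that the average of many draws is close to $n\Sigma$, since $E[X_i X_i^\top] = \Sigma$. SciPy also provides `scipy.stats.wishart(df, scale)`, whose mean is `df * scale`; it could be used instead, but it is not needed here.

```python
import numpy as np

def sample_wishart(sigma, n, rng):
    """One draw of S = sum_{i=1}^n X_i X_i^T with X_1, ..., X_n iid N_p(0, sigma)."""
    p = sigma.shape[0]
    X = rng.multivariate_normal(np.zeros(p), sigma, size=n)   # row i is X_i^T; shape (n, p)
    return X.T @ X                                             # equals sum_i X_i X_i^T

rng = np.random.default_rng(5)
sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
n = 5
draws = np.stack([sample_wishart(sigma, n, rng) for _ in range(10_000)])
print(draws.mean(axis=0))   # should be close to n * sigma, since E[S] = n * Sigma
```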

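Although the Wishart distribution does have a density (when $n \ge p$ and $\Sigma$ is positive definite), it is not needed to prove most of the results we will require. An important thing to note, however, is the following. Let $S \sim W_p(\Sigma, n)$ be constructed as above, and let $a \in \mathbb{R}^p$ be a fixed vector with $a^\top \Sigma a > 0$. Then $a^\top S a / a^\top \Sigma a \sim \chi^2_n$.

This is easily proved by multiplying $S$ on the left by $a^\top$ and on the right by $a$, which gives $a^\top S a = \sum_{i=1}^n (a^\top X_i)^2$. The result then follows from the fact that $a^\top X_1, \dots, a^\top X_n$ are independent $N(0, a^\top \Sigma a)$ random variables.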
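The proof can be mirrored numerically, as a sketch assuming NumPy with an arbitrary illustrative choice of $\Sigma$, $a$ and $n$. The first check confirms the exact identity $a^\top S a = \sum_i (a^\top X_i)^2$. The second compares the simulated ratio $a^\top S a / a^\top \Sigma a$ with the mean $n$ and variance $2n$ of $\chi^2_n$.

```python
import numpy as np

rng = np.random.default_rng(3)
sigma = np.array([[1.0, 0.3],
                  [0.3, 2.0]])
a = np.array([1.0, -1.0])
n, reps = 4, 50_000

X = rng.multivariate_normal(np.zeros(2), sigma, size=(reps, n))  # shape (reps, n, p)
S = np.einsum("rni,rnj->rij", X, X)                               # one Wishart draw per replicate
quad = np.einsum("i,rij,j->r", a, S, a)                           # a' S a for each replicate

# Exact identity behind the proof: a' S a = sum_i (a' X_i)^2
print(np.allclose(quad, ((X @ a) ** 2).sum(axis=1)))

ratio = quad / (a @ sigma @ a)      # should follow chi-square with n degrees of freedom
print(ratio.mean(), ratio.var())    # compare with n and 2n
```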

Methodology

  1. Principal Component Analysis
  2. Canonical Correlation Analysis