Statistics/Multivariate Data Analysis

Distributions

Multivariate Normal

The multivariate normal distribution extends the univariate normal distribution to random vectors. Its simplest definition is as follows:

Definition (Multivariate Normal Distribution):

A random vector $\mathbf{X}$ of dimension $p$ is said to follow a multivariate normal distribution with mean vector $\mu \in \mathbb{R}^{p}$ and symmetric positive semidefinite $p \times p$ covariance matrix $\Sigma$ if $\forall \mathbf{a} \in \mathbb{R}^{p},\ \mathbf{a}^{T}\mathbf{X} \sim \mathcal{N}(\mathbf{a}^{T}\mu, \mathbf{a}^{T}\Sigma\mathbf{a})$, where $\mathcal{N}(m, 0)$ is understood as the distribution of the constant $m$. It is denoted by $\mathbf{X} \sim \mathcal{N}_{p}(\mu, \Sigma)$.
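The definition can be checked numerically. The following is a minimal sketch in Python with NumPy (an assumption; the text prescribes no software). It draws samples as $\mathbf{X} = \mu + A\mathbf{Z}$ with $AA^{T} = \Sigma$ and $\mathbf{Z} \sim \mathcal{N}_{p}(\mathbf{0}, I_{p})$, which satisfies the definition because $\mathbf{a}^{T}\mathbf{X} = \mathbf{a}^{T}\mu + (A^{T}\mathbf{a})^{T}\mathbf{Z}$. It then compares the sample mean and variance of $\mathbf{a}^{T}\mathbf{X}$ with $\mathbf{a}^{T}\mu$ and $\mathbf{a}^{T}\Sigma\mathbf{a}$. The helper `sample_mvn` and the particular $\mu$, $\Sigma$ and $\mathbf{a}$ are illustrative choices, and $\Sigma$ is deliberately singular.

```python
import numpy as np


def sample_mvn(mu, sigma, size, rng):
    """Draw `size` samples of X ~ N_p(mu, sigma) as X = mu + A Z with A A^T = sigma.

    A is built from the eigendecomposition of sigma, so sigma only needs to be
    positive semidefinite (it may be singular).
    """
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    eigvals, eigvecs = np.linalg.eigh(sigma)
    # Clip tiny negative eigenvalues caused by rounding, then scale each eigenvector
    # by the square root of its eigenvalue: A = V diag(sqrt(lambda)).
    a_mat = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    z = rng.standard_normal((size, mu.shape[0]))  # rows are independent N_p(0, I)
    return mu + z @ a_mat.T                        # rows are samples of X


rng = np.random.default_rng(0)
mu = np.array([1.0, -2.0, 0.5])
# Singular covariance: the third row equals the sum of the first two.
sigma = np.array([[1.0, 0.5, 1.5],
                  [0.5, 2.0, 2.5],
                  [1.5, 2.5, 4.0]])
x = sample_mvn(mu, sigma, 200_000, rng)

a = np.array([2.0, -1.0, 0.5])
proj = x @ a
print(proj.mean(), a @ mu)          # sample mean of a^T X vs a^T mu (= 4.25)
print(proj.var(), a @ sigma @ a)    # sample variance of a^T X vs a^T Sigma a (= 5.5)

# For a vector in the null space of sigma, a^T X is degenerate: N(a^T mu, 0).
null_dir = np.array([1.0, 1.0, -1.0])
print(np.abs(x @ null_dir - null_dir @ mu).max())  # essentially 0
```

The last line illustrates why the definition allows a zero variance: along the null space of $\Sigma$ the projection is constant.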

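At first glance, this definition seems rather abstract. After all, the univariate normal distribution is usually introduced through its density or its characteristic function, either of which determines the distribution uniquely. However, a definition of this kind is needed to handle the case where $\Sigma$ is positive semidefinite but not positive definite. When $\Sigma$ is positive definite, it can be shown (for instance by writing $\mathbf{X} = \mu + \Sigma^{1/2}\mathbf{Z}$ with $\mathbf{Z} \sim \mathcal{N}_{p}(\mathbf{0}, I_{p})$ and applying the change-of-variables formula) that $\mathbf{X}$ has density $f_{\mathbf{X}}(\mathbf{x}) = \frac{1}{(2\pi)^{p/2}|\Sigma|^{1/2}} e^{-\frac{1}{2}(\mathbf{x}-\mu)^{T}\Sigma^{-1}(\mathbf{x}-\mu)}$. When $\Sigma$ is singular, this formula breaks down: $|\Sigma| = 0$, $\Sigma^{-1}$ does not exist, and $\mathbf{X}$ lies with probability one in the affine subspace $\mu + \operatorname{col}(\Sigma)$, so it has no density with respect to Lebesgue measure on $\mathbb{R}^{p}$. The definition above still works, because by the Cramér-Wold theorem it is equivalent to requiring that $\mathbf{X}$ have characteristic function $\varphi_{\mathbf{X}}(\mathbf{t}) = e^{i\mathbf{t}^{T}\mu - \frac{1}{2}\mathbf{t}^{T}\Sigma\mathbf{t}}$, which is well defined for every positive semidefinite $\Sigma$. A density can still be written on that subspace, using the nonzero eigenvalues of $\Sigma$ and its pseudo-inverse, but it is not a density on $\mathbb{R}^{p}$.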
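For positive definite $\Sigma$, the density can be evaluated on the log scale through a Cholesky factorisation $\Sigma = LL^{T}$, which gives both $\log|\Sigma| = 2\sum_{j}\log L_{jj}$ and the quadratic form $(\mathbf{x}-\mu)^{T}\Sigma^{-1}(\mathbf{x}-\mu) = \|L^{-1}(\mathbf{x}-\mu)\|^{2}$ without inverting $\Sigma$. This is a minimal sketch assuming NumPy; `mvn_logpdf` is a hypothetical helper name. NumPy's `np.linalg.cholesky` raises `LinAlgError` when its argument is not positive definite, which is a convenient guard here because no density exists when $\Sigma$ is singular.

```python
import numpy as np


def mvn_logpdf(x, mu, sigma):
    """Log-density of N_p(mu, sigma) at x, for positive definite sigma."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu, dtype=float)
    sigma = np.asarray(sigma, dtype=float)
    p = mu.shape[0]
    # sigma = L L^T; raises np.linalg.LinAlgError if sigma is not positive definite.
    chol = np.linalg.cholesky(sigma)
    # y = L^{-1} (x - mu), so y @ y = (x - mu)^T sigma^{-1} (x - mu).
    y = np.linalg.solve(chol, x - mu)
    log_det = 2.0 * np.sum(np.log(np.diag(chol)))  # log |sigma|
    return -0.5 * (p * np.log(2.0 * np.pi) + log_det + y @ y)


# With a diagonal sigma the components are independent, so the joint
# log-density should equal the sum of the univariate normal log-densities.
mu = np.array([0.0, 1.0])
sd = np.array([2.0, 3.0])
x = np.array([1.0, -2.0])
joint = mvn_logpdf(x, mu, np.diag(sd ** 2))
marginals = np.sum(-0.5 * np.log(2.0 * np.pi * sd ** 2) - (x - mu) ** 2 / (2.0 * sd ** 2))
print(joint, marginals)  # equal up to rounding
```

The comparison with the product of univariate densities only works because the normalising constant carries $(2\pi)^{p/2}$ rather than $\sqrt{2\pi}$.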

Matrix-variate Normal

We will first need to develop some notation. Let $X_{m\times n}$ be a matrix with columns $c_{(1)},c_{(2)},\ldots ,c_{(n)}$ . Then we define the column vector ${\textstyle vec(X):={\begin{bmatrix}c_{(1)}\\c_{(2)}\\\vdots \\c_{(n)}\end{bmatrix}}}$ , and we call it the vectorisation of $X$ .
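In NumPy (assumed here for illustration), $vec$ corresponds to flattening an array in column-major (Fortran) order. The helpers `vec` and `unvec` below are hypothetical names in a minimal sketch.

```python
import numpy as np


def vec(x):
    """Stack the columns of a 2-D array into a single vector (column-major order)."""
    return np.asarray(x).reshape(-1, order="F")


def unvec(v, m, n):
    """Inverse of vec: refill an m x n matrix column by column from v."""
    return np.asarray(v).reshape((m, n), order="F")


x = np.array([[1, 2, 3],
              [4, 5, 6]])   # a 2 x 3 matrix with columns (1, 4), (2, 5), (3, 6)
print(vec(x))               # [1 4 2 5 3 6]
print(unvec(vec(x), 2, 3))  # recovers x
```

Flattening with NumPy's default row-major order would instead stack the rows, giving $vec(X^{T})$.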

Definition (Matrix-variate Normal):

We say $X_{m\times n}$ follows a matrix-variate normal distribution with mean matrix $\mu_{m\times n}$ and covariance matrix $\Sigma_{mn\times mn}$ if $vec(X) \sim \mathcal{N}_{mn}(vec(\mu), \Sigma)$.

Note that this definition simply imposes a multivariate normal distribution on the vectorisation of $X$. Consequently, many results that hold for multivariate normal random vectors also hold for the vectorisation of a matrix-variate normal random matrix.
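Because the definition is stated through $vec(X)$, a matrix-variate normal sample can be produced by drawing $vec(X)$ from a multivariate normal and undoing the vectorisation. As an example of a result that carries over, for a fixed $m\times n$ matrix $A$ the scalar $\operatorname{tr}(A^{T}X) = vec(A)^{T}vec(X)$ is a linear combination of $vec(X)$, so by the multivariate definition it is $\mathcal{N}(vec(A)^{T}vec(\mu), vec(A)^{T}\Sigma\, vec(A))$. The sketch below assumes NumPy and uses its `Generator.multivariate_normal`; the helper name `sample_matrix_normal` and all numbers are illustrative.

```python
import numpy as np


def sample_matrix_normal(mu, sigma, size, rng):
    """Draw `size` m x n matrices X such that vec(X) ~ N_mn(vec(mu), sigma)."""
    mu = np.asarray(mu, dtype=float)
    m, n = mu.shape
    vec_mu = mu.reshape(-1, order="F")                        # vec(mu)
    draws = rng.multivariate_normal(vec_mu, sigma, size=size)  # each row is a vec(X)
    # Undo the vectorisation: refill each m x n matrix column by column.
    return np.stack([row.reshape((m, n), order="F") for row in draws])


rng = np.random.default_rng(1)
mu = np.array([[1.0, 0.0],
               [2.0, -1.0]])              # vec(mu) = (1, 2, 0, -1)
sigma = np.diag([1.0, 0.5, 2.0, 1.0])     # covariance of vec(X), 4 x 4
xs = sample_matrix_normal(mu, sigma, 100_000, rng)

a = np.array([[1.0, 2.0],
              [0.0, -1.0]])               # vec(a) = (1, 0, 2, -1)
vec_a = a.reshape(-1, order="F")
t = np.einsum("ij,kij->k", a, xs)         # tr(a^T X_k) for every sample X_k
print(t.mean(), vec_a @ mu.reshape(-1, order="F"))  # both close to 2
print(t.var(), vec_a @ sigma @ vec_a)               # both close to 10
```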

Now that we have defined the multivariate and matrix-variate normal distributions, our next aim is to find analogues of the univariate $\chi_{(p)}^{2}$ distribution with $p$ degrees of freedom and of Student's $t$ distribution, both of which are closely related to the univariate normal distribution. We know that if $X_{1}, \ldots, X_{n}$ are independent with $X_{i} \sim \mathcal{N}(\mu_{i}, \sigma_{i}^{2})$ for each $i \in \{1, 2, \ldots, n\}$, then $\sum_{i=1}^{n} \frac{(X_{i}-\mu_{i})^{2}}{\sigma_{i}^{2}} \sim \chi_{(n)}^{2}$. What is the analogue of this result in the multivariate case?
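The univariate result can be checked by simulation. This sketch (NumPy assumed; the means, standard deviations, seed and number of replications are arbitrary) standardises independent normal draws with different parameters, sums their squares, and compares the sample mean and variance of the sum with the $\chi_{(n)}^{2}$ values $n$ and $2n$. It also checks the empirical probability below 11.0705, the 95% point of $\chi_{(5)}^{2}$.

```python
import numpy as np

rng = np.random.default_rng(2)
mu = np.array([0.0, 3.0, -1.0, 10.0, 2.5])   # means mu_i (arbitrary)
sd = np.array([1.0, 0.5, 2.0, 4.0, 0.1])     # standard deviations sigma_i (arbitrary)
n = mu.size
reps = 200_000

# Each row holds one realisation of independent X_1, ..., X_n.
x = mu + sd * rng.standard_normal((reps, n))
q = np.sum(((x - mu) / sd) ** 2, axis=1)     # sum of (X_i - mu_i)^2 / sigma_i^2

print(q.mean(), q.var())        # close to n = 5 and 2n = 10
print(np.mean(q <= 11.0705))    # close to 0.95
```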

Wishart Distribution

Definition (Wishart Distribution):

If $\mathbf{X}_{i} \overset{iid}{\sim} \mathcal{N}_{p}(\mu, \Sigma)$ for $i \in \{1, 2, \ldots, n\}$, then $S = \sum_{i=1}^{n} (\mathbf{X}_{i} - \mu)(\mathbf{X}_{i} - \mu)^{T}$ is said to have a Wishart distribution with $n$ degrees of freedom and associated matrix $\Sigma$. It is denoted by $S \sim W_{p}(n, \Sigma)$.
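The definition translates directly into a sampler: draw $n$ independent vectors, centre them at $\mu$, and add up the outer products. In matrix form, if the rows of $D$ are $(\mathbf{X}_{i} - \mu)^{T}$, then $S = D^{T}D$. Since each outer product has expectation $\Sigma$, $E[S] = n\Sigma$, which the sketch checks by Monte Carlo. NumPy is assumed, and `sample_wishart` is a hypothetical helper name.

```python
import numpy as np


def sample_wishart(n, mu, sigma, rng):
    """One draw of S = sum_i (X_i - mu)(X_i - mu)^T with X_i iid N_p(mu, sigma)."""
    mu = np.asarray(mu, dtype=float)
    x = rng.multivariate_normal(mu, sigma, size=n)  # rows are X_1, ..., X_n
    d = x - mu                                      # rows are (X_i - mu)^T
    return d.T @ d                                  # sum of the n outer products


rng = np.random.default_rng(3)
n = 4
mu = np.array([1.0, -1.0])
sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])

s = sample_wishart(n, mu, sigma, rng)
print(s)                        # a symmetric 2 x 2 matrix

draws = [sample_wishart(n, mu, sigma, rng) for _ in range(5_000)]
print(np.mean(draws, axis=0))   # close to n * sigma = [[8, 2.4], [2.4, 4]]
```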

Although the Wishart distribution has a density (when $n \geq p$ and $\Sigma$ is positive definite), it is not needed to prove most of the results we will require. An important fact, however, is that if $S \sim W_{p}(n, \Sigma)$, then for every $\mathbf{a} \in \mathbb{R}^{p}$ with $\mathbf{a}^{T}\Sigma\mathbf{a} > 0$, $\frac{\mathbf{a}^{T}S\mathbf{a}}{\mathbf{a}^{T}\Sigma\mathbf{a}} \sim \chi_{(n)}^{2}$. To prove this, multiply $S$ on the left by $\mathbf{a}^{T}$ and on the right by $\mathbf{a}$ to obtain $\mathbf{a}^{T}S\mathbf{a} = \sum_{i=1}^{n} (\mathbf{a}^{T}\mathbf{X}_{i} - \mathbf{a}^{T}\mu)^{2}$. Since the $\mathbf{a}^{T}\mathbf{X}_{i}$ are iid $\mathcal{N}(\mathbf{a}^{T}\mu, \mathbf{a}^{T}\Sigma\mathbf{a})$, the result follows from the univariate $\chi^{2}$ result above.
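Both the identity $\mathbf{a}^{T}S\mathbf{a} = \sum_{i=1}^{n} (\mathbf{a}^{T}\mathbf{X}_{i} - \mathbf{a}^{T}\mu)^{2}$ used in the proof and the resulting $\chi_{(n)}^{2}$ distribution can be checked numerically. The sketch assumes NumPy; $n$, $\mu$, $\Sigma$, $\mathbf{a}$ and the seed are arbitrary, and many Wishart matrices are generated at once by drawing an array of shape `(reps, n, p)`.

```python
import numpy as np

rng = np.random.default_rng(4)
n, reps = 6, 50_000
mu = np.array([0.0, 2.0, -1.0])
sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.5],
                  [0.0, 0.5, 1.5]])
a = np.array([1.0, -2.0, 0.5])

# reps independent samples of size n; d[r, i] = X_i - mu for replication r.
d = rng.multivariate_normal(mu, sigma, size=(reps, n)) - mu
s = np.einsum("rki,rkj->rij", d, d)        # one W_3(n, sigma) matrix per replication

# The step used in the proof: a^T S a equals the sum of (a^T X_i - a^T mu)^2.
lhs = np.einsum("i,rij,j->r", a, s, a)
rhs = np.sum((d @ a) ** 2, axis=1)
print(np.allclose(lhs, rhs))               # True

ratio = lhs / (a @ sigma @ a)
print(ratio.mean(), ratio.var())           # close to n = 6 and 2n = 12
```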
