Probability/Important Distributions


Distributions of a discrete random variable

Distributions of a continuous random variable

Uniform distribution (continuous)

The continuous uniform distribution is a model for 'no preference': all intervals of the same length within its support are equally likely ^[9] (this can be seen from the pdf of the continuous uniform distribution). There is also a discrete uniform distribution, but it is less important than the continuous one. So, from now on, 'uniform distribution' refers to the continuous one.

Definition. (Uniform distribution)

Figure: pdf's of ${\color {dodgerblue}{\mathcal {U}}[a,b]}$.

A random variable $X$ follows the uniform distribution, denoted by $X\sim {\mathcal {U}}[a,b]$ , if its pdf is $f(x)=1/(b-a),\quad x\in \operatorname {supp} (X)=[a,b],{\text{ and }}a<b.$

Remark.

The support of ${\mathcal {U}}[a,b]$ can alternatively be taken as $[a,b),(a,b]$ or $(a,b)$ , without affecting the probability of any event, since the probability of a single point, calculated using the pdf, is zero anyway.
The distribution ${\mathcal {U}}[0,1]$ is the standard uniform distribution.

Proposition. (Cdf of uniform distribution) The cdf of ${\mathcal {U}}[a,b]$ is $F(x)={\begin{cases}0,&x<a;\\(x-a)/(b-a),&a\leq x\leq b;\\1,&x>b.\end{cases}}$

Figure: cdf's of ${\color {dodgerblue}{\mathcal {U}}[a,b]}$.

Proof. $F(x)=\int _{-\infty }^{x}{\frac {\mathbf {1} \{a\leq y\leq b\}}{b-a}}\,dy={\begin{cases}0,&x<a;\\{\frac {1}{b-a}}\int _{a}^{x}\,dy={[y]}_{a}^{x}/(b-a),&a\leq x\leq b;\\{\frac {1}{b-a}}\int _{a}^{b}\,dy={[y]}_{a}^{b}/(b-a),&x>b.\end{cases}}$ Evaluating the brackets gives $(x-a)/(b-a)$ and $(b-a)/(b-a)=1$ respectively, so the result follows.

$\Box$
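The relation between the pdf and the cdf of the uniform distribution can be checked numerically. The following is a minimal Python sketch, not part of the original text: the function names, the endpoints $a=2$, $b=5$ and the midpoint-rule integrator are illustrative assumptions.

```python
def uniform_pdf(x, a, b):
    """Pdf of U[a, b]; assumes a < b."""
    return 1.0 / (b - a) if a <= x <= b else 0.0


def uniform_cdf(x, a, b):
    """Cdf of U[a, b], written case by case as in the proposition."""
    if x < a:
        return 0.0
    if x > b:
        return 1.0
    return (x - a) / (b - a)


def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


a, b = 2.0, 5.0  # illustrative endpoints (assumption)
x = 3.5
# The pdf vanishes below a, so integrating from a to x gives F(x).
area = integrate(lambda y: uniform_pdf(y, a, b), a, x)
print(area, uniform_cdf(x, a, b))  # the two values should agree closely
```

Both printed values approximate $F(3.5)=(3.5-2)/(5-2)$.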

Exponential distribution

The exponential distribution with rate parameter $\lambda$ is often used to describe the interarrival time of rare events with rate $\lambda$ .

In comparison, the Poisson distribution describes the number of occurrences of rare events within a fixed time interval, while the exponential distribution describes the time between consecutive occurrences.

By definition of rate, when the rate $\lambda \uparrow$ , the interarrival time tends to $\downarrow$ (i.e. the rare event occurs more frequently).

So, we would like the pdf to be more concentrated near zero when $\lambda \uparrow$ (i.e. the pdf takes higher values for small $x$ when $\lambda \uparrow$ ), so that the area under the pdf over intervals of small $x$ increases when $\lambda \uparrow$ .

Also, with a fixed rate $\lambda$ , longer interarrival times should be less likely. So, intuitively, we would also like the pdf to be a strictly decreasing function, so that the probability of an interval of fixed length (the area under the pdf over that interval) decreases as the interval moves towards larger $x$ .

As we will see, the pdf of the exponential distribution satisfies both of these properties.

Definition. (Exponential distribution)

Figure: pdf's of ${\color {darkorange}\operatorname {Exp} (0.5)}$, ${\color {purple}\operatorname {Exp} (1)}$ and ${\color {royalblue}\operatorname {Exp} (1.5)}$.

A random variable $X$ follows the exponential distribution with positive rate parameter $\lambda$ , denoted by $X\sim \operatorname {Exp} (\lambda )$ , if its pdf is $f(x)=\lambda e^{-\lambda x},\quad x\in \operatorname {supp} (X)=[0,\infty ).$

Proposition. (Cdf of exponential distribution)

Figure: cdf's of ${\color {darkorange}\operatorname {Exp} (0.5)}$, ${\color {purple}\operatorname {Exp} (1)}$ and ${\color {royalblue}\operatorname {Exp} (1.5)}$.

The cdf of $\operatorname {Exp} (\lambda )$ is $F(x)=1-e^{-\lambda x},\quad x\geq 0.$

Proof. Suppose $X\sim \operatorname {Exp} (\lambda )$ . The cdf of $X$ is ${\begin{aligned}F(x)&=\int _{-\infty }^{x}\lambda e^{-\lambda y}\mathbf {1} \{y\geq 0\}\,dy\\&={\begin{cases}\int _{0}^{x}\lambda e^{-\lambda y}\,dy,&x\geq 0;\\0,&x<0\\\end{cases}}&\left({\text{When }}x<0,x\notin \operatorname {supp} (X),{\text{ so }}F(x)=\mathbb {P} (X\leq x)=0\right)\\&=\mathbf {1} \{x\geq 0\}\lambda \int _{0}^{x}e^{-\lambda y}\,dy\\&=\mathbf {1} \{x\geq 0\}{\frac {\lambda }{-\lambda }}[e^{-\lambda y}]_{0}^{x}\\&=-\mathbf {1} \{x\geq 0\}(e^{-\lambda x}-1)\\&=(1-e^{-\lambda x})\mathbf {1} \{x\geq 0\}.\\\end{aligned}}$

$\Box$
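As a numerical sanity check of this cdf, one can integrate the pdf and compare the result with $1-e^{-\lambda x}$ . The Python sketch below is an illustration only: the function names, the evaluation point $x=2$ and the midpoint-rule integrator are assumptions, and the rates are the ones shown in the figures.

```python
import math


def exp_pdf(x, lam):
    """Pdf of Exp(lam): lam * exp(-lam * x) on [0, inf), zero elsewhere."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0


def exp_cdf(x, lam):
    """Cdf of Exp(lam): 1 - exp(-lam * x) for x >= 0, zero for x < 0."""
    return 1.0 - math.exp(-lam * x) if x >= 0 else 0.0


def integrate(f, lo, hi, steps=10_000):
    """Midpoint-rule approximation of the integral of f over [lo, hi]."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))


x = 2.0  # illustrative evaluation point (assumption)
for lam in (0.5, 1.0, 1.5):
    area = integrate(lambda y: exp_pdf(y, lam), 0.0, x)
    # The numerical area and the closed-form cdf should agree closely.
    print(lam, area, exp_cdf(x, lam))
```

The printed cdf values also increase with $\lambda$ , in line with the intuition that a higher rate puts more probability on short interarrival times.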

Proposition. (Memorylessness of exponential distribution) If $X\sim \operatorname {Exp} (\lambda )$ , then $\mathbb {P} (X>s+t|X>s)=\mathbb {P} (X>t)$ for all nonnegative numbers $s$ and $t$ .

Proof. $\mathbb {P} (X>s+t|X>s){\overset {\text{ def }}{=}}{\frac {\mathbb {P} (X>s+t\cap X>s)}{\mathbb {P} (X>s)}}={\frac {\mathbb {P} (X>s+t)}{\mathbb {P} (X>s)}}={\frac {1-(1-e^{-\lambda (s+t)})}{1-(1-e^{-\lambda s})}}={\frac {e^{-\lambda (s+t)}}{e^{-\lambda s}}}=e^{-\lambda t}=\mathbb {P} (X>t).$

$\Box$

Remark.

Given $X>s$ , the event $X>s+t$ can be interpreted as 'the rare event will not occur within the next $t$ units of time';
$X>s$ can be interpreted as 'the rare event has not occurred during the past $s$ units of time'.
The proposition implies that the condition $X>s$ does not affect the distribution of the remaining waiting time for the rare event (it still follows the exponential distribution with the same parameter).
So, we can assume that the arrival process of the event starts afresh at an arbitrary time point of observation.
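The memoryless property can be illustrated with a few lines of Python. This is a sketch under assumed values ( $\lambda =0.7$ , $s=2$ , $t=3$ are arbitrary choices, and `survival` is a hypothetical helper name), using $\mathbb {P} (X>x)=e^{-\lambda x}$ from the cdf above.

```python
import math


def survival(x, lam):
    """P(X > x) = exp(-lam * x) for X ~ Exp(lam) and x >= 0."""
    return math.exp(-lam * x)


lam, s, t = 0.7, 2.0, 3.0  # arbitrary illustrative values (assumption)

# P(X > s + t | X > s) = P(X > s + t) / P(X > s)
conditional = survival(s + t, lam) / survival(s, lam)
# P(X > t)
unconditional = survival(t, lam)

print(conditional, unconditional)  # the two values should agree closely
```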

Gamma distribution

The gamma distribution generalizes the exponential distribution, in the sense that an additional shape parameter lets us change the shape of the pdf.

Definition. (Gamma distribution)

Figure: pdf's of ${\color {red}\operatorname {Gamma} (1,1)}$, ${\color {green}\operatorname {Gamma} (2,1)}$, ${\color {blue}\operatorname {Gamma} (3,1)}$ and ${\color {magenta}\operatorname {Gamma} (3,0.5)}$.

A random variable $X$ follows the gamma distribution with positive shape parameter $\alpha$ and positive rate parameter $\lambda$ , denoted by $X\sim \operatorname {Gamma} (\alpha ,\lambda )$ , if its pdf is $f(x)={\frac {\lambda ^{\alpha }x^{\alpha -1}e^{-\lambda x}}{\Gamma (\alpha )}},\quad x\in \operatorname {supp} (X)=[0,\infty ).$

Figure: cdf's of ${\color {red}\operatorname {Gamma} (1,1)}$, ${\color {green}\operatorname {Gamma} (2,1)}$, ${\color {blue}\operatorname {Gamma} (3,1)}$ and ${\color {magenta}\operatorname {Gamma} (3,0.5)}$.

Remark.

$\operatorname {Gamma} (1,\lambda )\equiv \operatorname {Exp} (\lambda )$ , since the pdf of $\operatorname {Gamma} (1,\lambda )$ is

$f(x)={\frac {\lambda x^{1-1}e^{-\lambda x}}{\underbrace {\Gamma (1)} _{=0!=1}}}\mathbf {1} \{x\geq 0\}=\lambda e^{-\lambda x}\mathbf {1} \{x\geq 0\},$

which is the pdf of $\operatorname {Exp} (\lambda )$ .
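The identity $\operatorname {Gamma} (1,\lambda )\equiv \operatorname {Exp} (\lambda )$ can also be checked numerically. The Python sketch below is illustrative only: the function names and the test points are assumptions, and the pdf is evaluated only at $x>0$ (the single boundary point $x=0$ has probability zero and is ignored here).

```python
import math


def gamma_pdf(x, alpha, lam):
    """Pdf of Gamma(alpha, lam) for x > 0; the point x = 0 is ignored."""
    if x <= 0:
        return 0.0
    return lam**alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)


def exp_pdf(x, lam):
    """Pdf of Exp(lam) for x > 0; the point x = 0 is ignored."""
    return lam * math.exp(-lam * x) if x > 0 else 0.0


lam = 1.5  # illustrative rate (assumption)
for x in (0.1, 1.0, 2.5):
    # With shape parameter 1, the gamma pdf reduces to the exponential pdf.
    print(x, gamma_pdf(x, 1.0, lam), exp_pdf(x, lam))
```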

Beta distribution

The beta distribution generalizes ${\mathcal {U}}[0,1]$ , in the sense that two shape parameters let us change the shape of the pdf.

Definition. (Beta distribution)

Figure: pdf's of ${\color {red}\operatorname {Beta} (0.5,0.5)}$, ${\color {royalblue}\operatorname {Beta} (5,1)}$, ${\color {green}\operatorname {Beta} (1,3)}$, ${\color {purple}\operatorname {Beta} (2,2)}$ and ${\color {darkorange}\operatorname {Beta} (2,5)}$.

A random variable $X$ follows the beta distribution with positive shape parameters $\alpha$ and $\beta$ , denoted by $X\sim \operatorname {Beta} (\alpha ,\beta )$ , if its pdf is $f(x)={\frac {\Gamma (\alpha +\beta )}{\Gamma (\alpha )\Gamma (\beta )}}x^{\alpha -1}(1-x)^{\beta -1},\quad x\in \operatorname {supp} (X)=[0,1].$

Figure: cdf's of ${\color {red}\operatorname {Beta} (0.5,0.5)}$, ${\color {royalblue}\operatorname {Beta} (5,1)}$, ${\color {green}\operatorname {Beta} (1,3)}$, ${\color {purple}\operatorname {Beta} (2,2)}$ and ${\color {darkorange}\operatorname {Beta} (2,5)}$.

Remark.

$\operatorname {Beta} (1,1)\equiv {\mathcal {U}}[0,1]$ , since the pdf of $\operatorname {Beta} (1,1)$ is

$f(x)={\frac {\overbrace {\Gamma (2)} ^{=1!=1}}{\underbrace {\Gamma (1)} _{=0!=1}\Gamma (1)}}x^{1-1}(1-x)^{1-1}\mathbf {1} \{0\leq x\leq 1\}=\mathbf {1} \{0\leq x\leq 1\},$

which is the pdf of ${\mathcal {U}}[0,1]$ .
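A short Python sketch of the beta pdf makes the special case $\operatorname {Beta} (1,1)$ concrete. The function name and the evaluation points are assumptions; the pdf is evaluated on the open interval $(0,1)$ only, to avoid the endpoints where it may be unbounded for shape parameters below one.

```python
import math


def beta_pdf(x, alpha, beta):
    """Pdf of Beta(alpha, beta) on the open interval (0, 1), zero elsewhere."""
    if not 0.0 < x < 1.0:
        return 0.0
    coef = math.gamma(alpha + beta) / (math.gamma(alpha) * math.gamma(beta))
    return coef * x ** (alpha - 1) * (1.0 - x) ** (beta - 1)


for x in (0.1, 0.3, 0.9):
    # Beta(1, 1) has the constant pdf 1 on (0, 1), like U[0, 1].
    print(x, beta_pdf(x, 1.0, 1.0))

# A non-uniform case: Gamma(4) / (Gamma(2) * Gamma(2)) = 6, so the
# Beta(2, 2) pdf at 0.5 is 6 * 0.5 * 0.5.
print(beta_pdf(0.5, 2.0, 2.0))
```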

Cauchy distribution

The Cauchy distribution is a heavy-tailed distribution ^[10]. As a result, it is a 'pathological' distribution, in the sense that it has some counter-intuitive properties, e.g. its mean and variance are undefined, even though they seem to be defined when we look at its graph directly.

Definition. (Cauchy distribution)

Figure: pdf and cdf of $\operatorname {Cauchy} (0)$.

A random variable $X$ follows the Cauchy distribution with location parameter $\theta$ , denoted by $X\sim \operatorname {Cauchy} (\theta )$ , if its pdf is $f(x)={\frac {1}{\pi (1+(x-\theta )^{2})}},\quad x\in \operatorname {supp} (X)=\mathbb {R} .$

Remark.

This definition is referring to a special case of Cauchy distribution. To be more precise, there is also the scale parameter in the complete definition of Cauchy distribution, and it is set to be one in the pdf here.

This definition is used here for simplicity.

The pdf is symmetric about $\theta$ , since $f(\theta +x)=f(\theta -x)$ .

Normal distribution (very important)

The normal or Gaussian distribution is a thing of beauty, appearing in many places in nature. This is probably because sample means or sample sums often follow normal distributions approximately, by the central limit theorem. As a result, the normal distribution is important in statistics.

Definition. (Normal distribution)

Figure: pdf's of ${\color {blue}{\mathcal {N}}(0,0.2)}$, ${\color {red}{\mathcal {N}}(0,1)}$, ${\color {darkorange}{\mathcal {N}}(0,5)}$ and ${\color {darkgreen}{\mathcal {N}}(-2,0.5)}$.

A random variable $X$ follows the normal distribution with mean $\mu$ and variance $\sigma ^{2}$ , denoted by $X\sim {\mathcal {N}}(\mu ,\sigma ^{2})$ , if its pdf is $f(x)={\frac {1}{\sqrt {2\pi \sigma ^{2}}}}\exp \left(-{\frac {(x-\mu )^{2}}{2\sigma ^{2}}}\right),\quad x\in \operatorname {supp} (X)=\mathbb {R} .$

Figure: cdf's of ${\color {blue}{\mathcal {N}}(0,0.2)}$, ${\color {red}{\mathcal {N}}(0,1)}$, ${\color {darkorange}{\mathcal {N}}(0,5)}$ and ${\color {darkgreen}{\mathcal {N}}(-2,0.5)}$.

Remark.

The distribution ${\mathcal {N}}(0,1)$ is the standard normal distribution.

For ${\mathcal {N}}(0,1)$ , its pdf is often denoted by $\varphi (\cdot )$ , and its cdf is often denoted by $\Phi (\cdot )$ .
The pdf of ${\mathcal {N}}(0,1)$ is $\varphi (x)={\frac {1}{\sqrt {2\pi }}}e^{-x^{2}/2}$ .
It follows that the pdf of ${\mathcal {N}}(\mu ,\sigma ^{2})$ is $(1/\sigma )\varphi {\big (}(x-\mu )/\sigma {\big )}$ .

It will be proved that $\mu$ is actually the mean, and $\sigma ^{2}$ is actually the variance.
The pdf is symmetric about $\mu$ , since $f(\mu +x)=f(\mu -x)$ .

Proposition. (Distributions for linear transformation of normally distributed random variables) If $X\sim {\mathcal {N}}(\mu ,\sigma ^{2})$ , and ${\color {blue}a}$ and ${\color {red}b}$ are constants with ${\color {blue}a}\neq 0$ , then $Y={\color {blue}a}X+{\color {red}b}\sim {\mathcal {N}}({\color {blue}a}\mu +{\color {red}b},{\color {blue}a^{2}}\sigma ^{2})$ .

Proof. Assume $a>0$ ^[11]. Let $F_{X}$ and $F_{Y}$ be cdf of $X$ and $Y$ respectively. Since $F_{Y}(y)=\mathbb {P} (Y\leq y)=\mathbb {P} ({\color {blue}a}X+{\color {red}b}\leq y)=\mathbb {P} (X\leq (y-{\color {red}b})/{\color {blue}a})=F_{X}{\big (}(y-{\color {red}b})/{\color {blue}a}{\big )},$ by differentiation, ${\begin{aligned}f_{Y}(y)&={\frac {1}{\color {blue}a}}f_{X}{\big (}(y-{\color {red}b})/{\color {blue}a}{\big )}\\&={\frac {1}{{\color {blue}a}{\sqrt {2\pi \sigma ^{2}}}}}\exp \left(-{\big (}(y-{\color {red}b})/{\color {blue}a}-\mu {\big )}^{2}/2\sigma ^{2}\right)\\&={\frac {1}{\sqrt {2\pi {\color {blue}a^{2}}\sigma ^{2}}}}\exp \left(-{\big (}y-({\color {blue}a}\mu +{\color {red}b}){\big )}^{2}/2{\color {blue}a^{2}}\sigma ^{2}\right)&\quad {\text{since }}a>0,\\\end{aligned}}$ which is the pdf of ${\mathcal {N}}({\color {blue}a}\mu +{\color {red}b},{\color {blue}a^{2}}\sigma ^{2})$ .

$\Box$
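The key step of the proof, $f_{Y}(y)={\frac {1}{a}}f_{X}{\big (}(y-b)/a{\big )}$ for $a>0$ , can be compared numerically with the claimed pdf of ${\mathcal {N}}(a\mu +b,a^{2}\sigma ^{2})$ . The Python sketch below uses assumed illustrative values ( $\mu =1$ , $\sigma ^{2}=4$ , $a=3$ , $b=-2$ ) and hypothetical function names.

```python
import math


def normal_pdf(x, mu, var):
    """Pdf of N(mu, var), parametrized by the variance as in the definition."""
    return math.exp(-((x - mu) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


mu, var = 1.0, 4.0  # illustrative parameters of X (assumption)
a, b = 3.0, -2.0    # illustrative constants with a > 0 (assumption)


def pdf_change_of_variable(y):
    """f_Y(y) = (1/a) * f_X((y - b) / a), as obtained by differentiation."""
    return normal_pdf((y - b) / a, mu, var) / a


def pdf_claimed(y):
    """Pdf of N(a*mu + b, a^2 * var), as stated in the proposition."""
    return normal_pdf(y, a * mu + b, a * a * var)


for y in (-5.0, 0.0, 1.0, 7.5):
    print(y, pdf_change_of_variable(y), pdf_claimed(y))  # should agree closely
```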

Remark.

A special case is when $a=1/\sigma$ and $b=-\mu /\sigma$ , $Y=aX+b=(X-\mu )/\sigma \sim {\mathcal {N}}(0,1)$ since
$a\mu +b=(1/\sigma )\mu -\mu /\sigma =0$ ;
$a^{2}\sigma ^{2}=\sigma ^{2}/\sigma ^{2}=1$ .
This shows that we can transform each normally distributed r.v. to an r.v. following the standard normal distribution.
This can ease the calculation of probabilities involving normally distributed r.v.'s, since we have the standard normal table, in which values of $\Phi (x)$ at different $x$ are given.
For some types of standard normal table, only the values of $\Phi (x)$ at different nonnegative $x$ are given.
Then, we can calculate its values at negative $x$ using

$\Phi (-x)=1-\Phi (x).$

This formula holds since ${\begin{aligned}&&\varphi (-y)&=\varphi (y)&{\text{for each }}y\\&\Rightarrow &\int _{-\infty }^{x}\varphi (-y)\,dy&=\int _{-\infty }^{x}\varphi (y)\,dy\\&\Leftrightarrow &-\int _{\infty }^{-x}\varphi (u)\,du&=\Phi (x)&{\text{let }}u=-y\Rightarrow du=-dy\\&\Leftrightarrow &[\Phi (u)]_{-x}^{\infty }&=\Phi (x)\\&\Leftrightarrow &\underbrace {\Phi (\infty )} _{=\mathbb {P} (\Omega )=1}-\Phi (-x)&=\Phi (x).\end{aligned}}$
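In place of a printed table, $\Phi$ can be evaluated in Python. The sketch below relies on the standard identity $\Phi (x)={\frac {1}{2}}{\big (}1+\operatorname {erf} (x/{\sqrt {2}}){\big )}$ , which is not stated in this chapter and is taken here as an assumption; the function names and the numerical values are illustrative.

```python
import math


def std_normal_cdf(x):
    """Phi(x), computed via the error function (identity assumed, see lead-in)."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), by standardizing to (x - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)


# Symmetry: Phi(-x) = 1 - Phi(x)
x = 1.3
print(std_normal_cdf(-x), 1.0 - std_normal_cdf(x))

# Standardization: for X ~ N(5, 2^2), P(X <= 7) = Phi((7 - 5) / 2) = Phi(1)
print(normal_cdf(7.0, 5.0, 2.0), std_normal_cdf(1.0))
```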

Important distributions for statistics especially

The following distributions are important in statistics especially, and they are all related to normal distribution. We will introduce them briefly.

Chi-squared distribution

The chi-squared distribution is a special case of the gamma distribution, and is also related to the standard normal distribution.

Definition. (Chi-squared distribution)

Figure: pdf's of ${\color {darkorange}\chi _{1}^{2}}$, ${\color {green}\chi _{2}^{2}}$, ${\color {royalblue}\chi _{3}^{2}}$, ${\color {blue}\chi _{4}^{2}}$, ${\color {purple}\chi _{6}^{2}}$ and ${\color {red}\chi _{9}^{2}}$.

The chi-squared distribution with ${\color {blue}\nu }$ degrees of freedom, where ${\color {blue}\nu }$ is a positive integer, denoted by $\chi _{\color {blue}\nu }^{2}$ , is the distribution of $Z_{1}^{2}+\dotsb +Z_{\color {blue}\nu }^{2}$ , in which $Z_{1},\dotsc ,Z_{\color {blue}\nu }$ are i.i.d. and follow ${\mathcal {N}}(0,1)$ .

Figure: cdf's of ${\color {darkorange}\chi _{1}^{2}}$, ${\color {green}\chi _{2}^{2}}$, ${\color {royalblue}\chi _{3}^{2}}$, ${\color {blue}\chi _{4}^{2}}$, ${\color {purple}\chi _{6}^{2}}$ and ${\color {red}\chi _{9}^{2}}$.

Remark.

It can be proved that $\chi _{\color {blue}\nu }^{2}\equiv \operatorname {Gamma} ({\color {blue}\nu }/2,1/2)$ and thus $\operatorname {Gamma} (\alpha ,\lambda )\equiv {\frac {1}{2\lambda }}\chi _{2\alpha }^{2}$ . (Then, we can deduce the pdf of $\chi _{\nu }^{2}$ through this.)
This implies for the random variable $X\sim \chi _{2\alpha }^{2}$ , ${\frac {X}{2\lambda }}\sim \operatorname {Gamma} (\alpha ,\lambda )$ .
The statement that a random variable $X$ follows the chi-squared distribution with ${\color {blue}\nu }$ degrees of freedom is denoted by $X\sim \chi _{\color {blue}\nu }^{2}$ .
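The relation $\chi _{\nu }^{2}\equiv \operatorname {Gamma} (\nu /2,1/2)$ can be explored by simulation. The Python sketch below draws sums of squared standard normal variables for $\nu =2$ and compares the empirical value of $\mathbb {P} (X\leq 1)$ with the cdf of $\operatorname {Gamma} (1,1/2)\equiv \operatorname {Exp} (1/2)$ . The seed, the sample size and the function names are assumptions made for illustration.

```python
import math
import random


def gamma_pdf(x, alpha, lam):
    """Pdf of Gamma(alpha, lam) for x > 0; the point x = 0 is ignored."""
    if x <= 0:
        return 0.0
    return lam**alpha * x ** (alpha - 1) * math.exp(-lam * x) / math.gamma(alpha)


def chi2_pdf(x, nu):
    """Pdf of chi-squared with nu degrees of freedom, via Gamma(nu/2, 1/2)."""
    return gamma_pdf(x, nu / 2, 0.5)


def simulate_chi2(nu, rng):
    """One draw of Z_1^2 + ... + Z_nu^2 with i.i.d. standard normal Z_i."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(nu))


rng = random.Random(0)  # fixed seed, only for reproducibility (assumption)
n = 100_000
draws = [simulate_chi2(2, rng) for _ in range(n)]
empirical = sum(d <= 1.0 for d in draws) / n

# Gamma(1, 1/2) is Exp(1/2), whose cdf at 1 is 1 - exp(-1/2).
exact = 1.0 - math.exp(-0.5)
print(empirical, exact)  # should be close, up to simulation error
```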

Student's t-distribution

The Student's $t$ -distribution is related to chi-squared distribution and normal distribution.

Definition. (Student's $t$ -distribution)

Figure: pdf's of ${\color {darkorange}t_{1}}$, ${\color {purple}t_{2}}$, ${\color {royalblue}t_{5}}$ and $t_{\infty }$.

The Student's $t$ -distribution with ${\color {blue}\nu }$ degrees of freedom, denoted by $t_{\color {blue}\nu }$ , is the distribution of ${\frac {Z}{\sqrt {Y/{\color {blue}\nu }}}}$ in which $Y\sim \chi _{\color {blue}\nu }^{2}$ and $Z\sim {\mathcal {N}}(0,1)$ are independent.

Figure: cdf's of ${\color {darkorange}t_{1}}$, ${\color {purple}t_{2}}$, ${\color {royalblue}t_{5}}$ and $t_{\infty }$.

Remark.

$t_{1}=\operatorname {Cauchy} (0)$ and $t_{\infty }={\mathcal {N}}(0,1)$ (the $\infty$ is extended real number).
The tails of the pdf are heavier as ${\color {blue}\nu }\downarrow$ .
The statement that a random variable $X$ follows the (Student's) $t$ -distribution with ${\color {blue}\nu }$ degrees of freedom is denoted by $X\sim t_{\color {blue}\nu }$ .
It can be proved that the pdf of $t_{\color {blue}\nu }$ is

$f(x;{\color {blue}\nu })={\frac {\Gamma {\big (}({\color {blue}\nu }+1)/2{\big )}}{{\sqrt {{\color {blue}\nu }\pi }}\Gamma ({\color {blue}\nu }/2)}}\left({\frac {\color {blue}\nu }{x^{2}+{\color {blue}\nu }}}\right)^{({\color {blue}\nu }+1)/2}.$
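The pdf above can be evaluated directly to illustrate the remarks $t_{1}=\operatorname {Cauchy} (0)$ and $t_{\infty }={\mathcal {N}}(0,1)$ . The following Python sketch is an illustration with hypothetical function names; it uses the logarithm of the gamma function so that large degrees of freedom do not overflow, and takes $\nu =1000$ as an assumed stand-in for 'large $\nu$ '.

```python
import math


def t_pdf(x, nu):
    """Pdf of t_nu; lgamma avoids overflow of the gamma function for large nu."""
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_coef) / math.sqrt(nu * math.pi)
    return coef * (nu / (x * x + nu)) ** ((nu + 1) / 2)


def cauchy_pdf(x, theta=0.0):
    """Pdf of Cauchy(theta) as defined in this chapter (scale parameter 1)."""
    return 1.0 / (math.pi * (1.0 + (x - theta) ** 2))


def std_normal_pdf(x):
    """Pdf of N(0, 1)."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)


for x in (-2.0, 0.0, 0.5, 3.0):
    # Columns: x, t_1 vs Cauchy(0), then t_1000 vs N(0, 1).
    print(x, t_pdf(x, 1), cauchy_pdf(x), t_pdf(x, 1000), std_normal_pdf(x))
```

Evaluating the three pdf's far from zero (e.g. at $x=4$ ) also shows the heavier tails for smaller $\nu$ .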

F-distribution

The $F$ -distribution is sort of a generalized Student's $t$ -distribution, in the sense that it has one more changeable parameter for another degrees of freedom.

Definition. ( $F$ -distribution) The $F$ -distribution with ${\color {red}\nu _{1}}$ and ${\color {blue}\nu _{2}}$ degrees of freedom, denoted by $F_{{\color {red}\nu _{1}},{\color {blue}\nu _{2}}}$ , is the distribution of ${\frac {X_{1}/{\color {red}\nu _{1}}}{X_{2}/{\color {blue}\nu _{2}}}}$ in which $X_{1}\sim \chi _{\color {red}\nu _{1}}^{2}$ and $X_{2}\sim \chi _{\color {blue}\nu _{2}}^{2}$ are independent.

Figure: pdf's of ${\color {red}F_{1,1}}$, $F_{2,1}$, ${\color {blue}F_{5,2}}$, ${\color {green}F_{10,1}}$ and ${\color {dimgray}F_{100,100}}$.

Figure: cdf's of ${\color {red}F_{1,1}}$, $F_{2,1}$, ${\color {blue}F_{5,2}}$, ${\color {green}F_{10,1}}$ and ${\color {dimgray}F_{100,100}}$.

Remark.

$F_{1,\nu }=t_{\nu }^{\color {purple}2}$ .
A random variable $X$ following the $F$ -distribution with ${\color {red}\nu _{1}}$ and ${\color {blue}\nu _{2}}$ degrees of freedom is denoted by $X\sim F_{{\color {red}\nu _{1}},{\color {blue}\nu _{2}}}$ .
It can be proved that the pdf of $F_{{\color {red}\nu _{1}},{\color {blue}\nu _{2}}}$ is

$f(x;{\color {red}\nu _{1}},{\color {blue}\nu _{2}})={\frac {\Gamma {\big (}({\color {red}\nu _{1}}+{\color {blue}\nu _{2}})/2{\big )}{\color {red}\nu _{1}}^{{\color {red}\nu _{1}}/2}{\color {blue}\nu _{2}}^{{\color {blue}\nu _{2}}/2}}{\Gamma ({\color {red}\nu _{1}}/2)\Gamma ({\color {blue}\nu _{2}}/2)}}\cdot {\frac {x^{{\color {red}\nu _{1}}/2-1}}{({\color {blue}\nu _{2}}+{\color {red}\nu _{1}}x)^{({\color {red}\nu _{1}}+{\color {blue}\nu _{2}})/2}}}.$
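The remark $F_{1,\nu }=t_{\nu }^{2}$ can be checked against the two pdf formulas. The Python sketch below is illustrative and makes one assumption that is not derived in this chapter: for $T\sim t_{\nu }$ , whose pdf is symmetric about zero, the pdf of $T^{2}$ at $x>0$ is $f_{T}({\sqrt {x}})/{\sqrt {x}}$ (a standard change of variables). All function names are hypothetical.

```python
import math


def t_pdf(x, nu):
    """Pdf of t_nu, using lgamma to avoid overflow for large nu."""
    log_coef = math.lgamma((nu + 1) / 2) - math.lgamma(nu / 2)
    coef = math.exp(log_coef) / math.sqrt(nu * math.pi)
    return coef * (nu / (x * x + nu)) ** ((nu + 1) / 2)


def f_pdf(x, nu1, nu2):
    """Pdf of F_{nu1, nu2} for x > 0, following the displayed formula."""
    if x <= 0:
        return 0.0
    log_coef = (math.lgamma((nu1 + nu2) / 2)
                - math.lgamma(nu1 / 2) - math.lgamma(nu2 / 2))
    coef = math.exp(log_coef) * nu1 ** (nu1 / 2) * nu2 ** (nu2 / 2)
    return coef * x ** (nu1 / 2 - 1) / (nu2 + nu1 * x) ** ((nu1 + nu2) / 2)


def t_squared_pdf(x, nu):
    """Pdf of T^2 for T ~ t_nu at x > 0 (change of variables, see lead-in)."""
    root = math.sqrt(x)
    return t_pdf(root, nu) / root


for x, nu in ((0.3, 4), (1.0, 1), (2.0, 7)):
    print(x, nu, f_pdf(x, 1, nu), t_squared_pdf(x, nu))  # should agree closely
```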

If you are interested in knowing how chi-squared distribution, Student's $t$ -distribution, and $F$ -distribution are useful in statistics, then you may briefly look at, for instance, Statistics/Interval Estimation (applications in confidence interval construction) and Statistics/Hypothesis Testing (applications in hypothesis testing).

Joint distributions

Multinomial distribution

Motivation

The multinomial distribution generalizes the binomial distribution, in the sense that each trial may have more than two outcomes.

Suppose $n$ objects are to be allocated to $k$ cells independently, for which each object is allocated to one and only one cell, with probability $p_{i}$ to be allocated to the $i$ th cell ( $i=1,2,\dotsc ,k$ ) ^[12]. Let $X_{i}$ be the number of objects allocated to cell $i$ . We would like to calculate the probability $\mathbb {P} {\big (}\mathbf {X} {\overset {\text{ def }}{=}}(X_{1},\dotsc ,X_{k})^{T}=\mathbf {x} {\overset {\text{ def }}{=}}(x_{1},\dotsc ,x_{k})^{T}{\big )}$ , i.e. the probability that $i$ th cell has $x_{i}$ objects.

We can regard each allocation as an independent trial with $k$ outcomes (since each object is allocated to one and only one of the $k$ cells). The allocation of $n$ objects is a partition of the $n$ objects into $k$ groups, so there are ${\binom {n}{x_{1},\dotsc ,x_{k}}}$ ways of allocation.

By independence, the probability of allocating $x_{i}$ particular objects to the $i$ th cell is $p_{i}^{x_{i}}$ , and so the probability of one particular allocation of the $n$ objects to the $k$ cells is $p_{1}^{x_{1}}\dotsb p_{k}^{x_{k}}$ . So, $\mathbb {P} (\mathbf {X} =\mathbf {x} )={\binom {n}{x_{1},\dotsc ,x_{k}}}p_{1}^{x_{1}}\dotsb p_{k}^{x_{k}}.$

Definition

Definition. (Multinomial distribution) A random vector $\mathbf {X} =(X_{1},\dotsc ,X_{k})^{T}$ follows the multinomial distribution with $n$ trials and probability vector $\mathbf {p} =(p_{1},\dotsc ,p_{k})^{T}$ , denoted by $\mathbf {X} \sim \operatorname {Multinom} (n,\mathbf {p} )$ , if its joint pmf is $f_{\mathbf {X} }(x_{1},\dotsc ,x_{k};n,\mathbf {p} )={\binom {n}{x_{1},\dotsc ,x_{k}}}p_{1}^{x_{1}}\dotsb p_{k}^{x_{k}},\quad x_{1},\dotsc ,x_{k}\geq 0,{\text{ and }}x_{1}+\dotsb +x_{k}=n.$

Remark.

$\operatorname {Multinom} (n,\mathbf {p} )\equiv \operatorname {Binom} (n,p)$ if $\mathbf {p} =(p,1-p)^{T}$ .

In this case, if $(X_{1},X_{2})^{T}\sim \operatorname {Multinom} (n,\mathbf {p} )$ , $X_{1}$ is the number of successes for the binomial distribution (and $X_{2}(=n-X_{1})$ is the number of failures).

Also, $X_{i}\sim \operatorname {Binom} (n,p_{i})$ . It can be seen by regarding allocating the object into $i$ th cell as 'success' for each allocation of single object ^[13]. Then, the success probability is $p_{i}$ .
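The joint pmf and its reduction to the binomial pmf for $k=2$ can be written out in a few lines of Python. This is a minimal sketch with hypothetical function names and arbitrary example probabilities; `math.comb` requires Python 3.8 or later.

```python
import math


def multinomial_pmf(x, p):
    """Joint pmf of Multinom(n, p) at the counts x, where n = sum(x)."""
    n = sum(x)
    # Multinomial coefficient n! / (x_1! ... x_k!), kept as an exact integer.
    coef = math.factorial(n)
    for xi in x:
        coef //= math.factorial(xi)
    prob = float(coef)
    for xi, pi in zip(x, p):
        prob *= pi**xi
    return prob


def binomial_pmf(k, n, p):
    """Pmf of Binom(n, p) at k successes."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


# Three cells, three objects, one object in each cell: 3! * 0.2 * 0.3 * 0.5
print(multinomial_pmf((1, 1, 1), (0.2, 0.3, 0.5)))

# With two cells, the multinomial pmf reduces to the binomial pmf.
print(multinomial_pmf((3, 7), (0.2, 0.8)), binomial_pmf(3, 10, 0.2))
```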

Multivariate normal distribution

Multivariate normal distribution is, as suggested by its name, a multivariate (and also generalized) version of the normal distribution (univariate).

Definition. (Multivariate normal distribution) A random vector $\mathbf {X} =(X_{1},\dotsc ,X_{k})^{T}$ follows the $k$ -dimensional normal distribution with mean vector ${\boldsymbol {\mu }}$ and covariance matrix ${\boldsymbol {\Sigma }}$ , denoted by $\mathbf {X} \sim {\mathcal {N}}_{k}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ ^[14] if its joint pdf is $f_{\mathbf {X} }(x_{1},\dotsc ,x_{k};{\boldsymbol {\mu }},{\boldsymbol {\Sigma }})={\frac {\exp \left(-(\mathbf {x} -{\boldsymbol {\mu }})^{T}{\boldsymbol {\Sigma }}^{-1}(\mathbf {x} -{\boldsymbol {\mu }})/2\right)}{\sqrt {(2\pi )^{k}\det {\boldsymbol {\Sigma }}}}},\quad \mathbf {x} =(x_{1},\dotsc ,x_{k})^{T}\in \mathbb {R} ^{k}$ in which ${\boldsymbol {\mu }}=(\mu _{1},\dotsc ,\mu _{k})^{T}=(\mathbb {E} [X_{1}],\dotsc ,\mathbb {E} [X_{k}])^{T}$ is the mean vector, and ${\boldsymbol {\Sigma }}={\begin{pmatrix}\operatorname {Cov} (X_{1},X_{1})&\cdots &\operatorname {Cov} (X_{1},X_{k})\\\vdots &\ddots &\vdots \\\operatorname {Cov} (X_{k},X_{1})&\cdots &\operatorname {Cov} (X_{k},X_{k})\end{pmatrix}}={\begin{pmatrix}\sigma _{1}^{2}&\cdots &\operatorname {Cov} (X_{1},X_{k})\\\vdots &\ddots &\vdots \\\operatorname {Cov} (X_{k},X_{1})&\cdots &\sigma _{k}^{2}\end{pmatrix}}$ is the covariance matrix (with size $k\times k$ ).

Remark.

The case $k=2$ is used particularly often, and it is called the bivariate normal distribution.
An alternative and equivalent definition is that $\mathbf {X} =(X_{1},\dotsc ,X_{k})^{T}\sim {\mathcal {N}}_{k}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ if

${\begin{aligned}X_{1}&=a_{11}Z_{1}+\dotsb +a_{1n}Z_{n}+\mu _{1};\\\vdots \\X_{k}&=a_{k1}Z_{1}+\dotsb +a_{kn}Z_{n}+\mu _{k},\\\end{aligned}}$

for some constants $a_{11},\dotsc ,a_{1n},\dotsc ,a_{k1},\dotsc ,a_{kn},\mu _{1},\dotsc ,\mu _{k}$ , in which $Z_{1},\dotsc ,Z_{n}$ are $n$ i.i.d. standard normal random variables.

Using the above result, the marginal distribution followed by $X_{i}$ is ${\mathcal {N}}(\mu _{i},\sigma _{i}^{2}),\quad i=1,2,\dotsc ,{\text{ or }}k$ , as one will expect.

By proposition about the sum of independent normal random variables and distribution of linear transformation of normal random variables (see Probability/Transformation of Random Variables chapter), the mean is $0+\dotsb +0+\mu _{i}=\mu _{i}$ , and the variance is $a_{i1}^{2}+\dotsb +a_{in}^{2}$ (this equals $\sigma _{i}^{2}$ by definition).

Proposition. (Joint pdf of the bivariate normal distribution) The joint pdf of ${\mathcal {N}}_{2}({\boldsymbol {\mu }},{\boldsymbol {\Sigma }})$ is $f(x,y)={\frac {1}{2\pi \sigma _{X}\sigma _{Y}{\sqrt {1-\rho ^{2}}}}}\exp \left(-{\frac {1}{2(1-\rho ^{2})}}\left(\left({\frac {x-\mu _{X}}{\sigma _{X}}}\right)^{2}-2\rho \left({\frac {x-\mu _{X}}{\sigma _{X}}}\right)\left({\frac {y-\mu _{Y}}{\sigma _{Y}}}\right)+\left({\frac {y-\mu _{Y}}{\sigma _{Y}}}\right)^{2}\right)\right),\quad (x,y)^{T}\in \mathbb {R} ^{2}$

in which $\rho =\rho (X,Y)$ is the correlation coefficient between $X$ and $Y$ (with $|\rho |<1$ ), and $\sigma _{X},\sigma _{Y}$ are positive.

Figure: graph of an example of a bivariate normal distribution.

Proof. For the bivariate normal distribution,

the mean vector is ${\boldsymbol {\mu }}=(\mu _{X},\mu _{Y})^{T}$ ;
the covariance matrix is ${\boldsymbol {\Sigma }}={\begin{pmatrix}\operatorname {Cov} (X,X)&\operatorname {Cov} (X,Y)\\\operatorname {Cov} (Y,X)&\operatorname {Cov} (Y,Y)\end{pmatrix}}={\begin{pmatrix}\operatorname {Var} (X)&\operatorname {Cov} (X,Y)\\\operatorname {Cov} (X,Y)&\operatorname {Var} (Y)\\\end{pmatrix}}={\begin{pmatrix}\sigma _{X}^{2}&\rho \sigma _{X}\sigma _{Y}\\\rho \sigma _{X}\sigma _{Y}&\sigma _{Y}^{2}\\\end{pmatrix}}.$
Hence,

${\begin{aligned}(\mathbf {x} -{\boldsymbol {\mu }})^{T}{\boldsymbol {\Sigma }}^{-1}(\mathbf {x} -{\boldsymbol {\mu }})&={\frac {1}{\det {\boldsymbol {\Sigma }}}}\left((x-\mu _{X},y-\mu _{Y})^{T}\right)^{T}{\begin{pmatrix}\sigma _{Y}^{2}&-\rho \sigma _{X}\sigma _{Y}\\-\rho \sigma _{X}\sigma _{Y}&\sigma _{X}^{2}\\\end{pmatrix}}(x-\mu _{X},y-\mu _{Y})^{T}\\&={\frac {1}{\det {\boldsymbol {\Sigma }}}}{\begin{pmatrix}{\color {blue}x-\mu _{X}}&{\color {red}y-\mu _{Y}}\end{pmatrix}}{\begin{pmatrix}{\color {darkgreen}\sigma _{Y}^{2}}&{\color {darkorange}-\rho \sigma _{X}\sigma _{Y}}\\{\color {purple}-\rho \sigma _{X}\sigma _{Y}}&{\color {maroon}\sigma _{X}^{2}}\\\end{pmatrix}}{\begin{pmatrix}x-\mu _{X}\\y-\mu _{Y}\end{pmatrix}}\\&={\frac {1}{\det {\boldsymbol {\Sigma }}}}{\begin{pmatrix}{\color {blue}(x-\mu _{X})}{\color {darkgreen}\sigma _{Y}^{2}}{\color {purple}-}{\color {red}(y-\mu _{Y})}{\color {purple}\rho \sigma _{X}\sigma _{Y}}&{\color {darkorange}-}{\color {blue}(x-\mu _{X})}{\color {darkorange}\rho \sigma _{X}\sigma _{Y}}+{\color {red}(y-\mu _{Y})}{\color {maroon}\sigma _{X}^{2}}\end{pmatrix}}{\begin{pmatrix}{\color {deeppink}x-\mu _{X}}\\{\color {deeppink}y-\mu _{Y}}\end{pmatrix}}\\&={\frac {1}{\underbrace {\det {\boldsymbol {\Sigma }}} _{\sigma _{X}^{2}\sigma _{Y}^{2}-(\rho \sigma _{X}\sigma _{Y})^{2}}}}{\big (}(x-\mu _{X})^{\color {deeppink}2}\sigma _{Y}^{2}\underbrace {-{\color {deeppink}(x-\mu _{X})}(y-\mu _{Y})\rho \sigma _{X}\sigma _{Y}-(x-\mu _{X}){\color {deeppink}(y-\mu _{Y})}\rho \sigma _{X}\sigma _{Y}} _{=-2\rho (x-\mu _{X})(y-\mu _{Y})\sigma _{X}\sigma _{Y}}+(y-\mu _{Y})^{\color {deeppink}2}\sigma _{X}^{2}{\big )}\\&={\frac {(x-\mu _{X})^{2}\sigma _{Y}^{2}-2\rho (x-\mu _{X})(y-\mu _{Y})\sigma _{X}\sigma _{Y}+(y-\mu _{Y})^{2}\sigma _{X}^{2}}{\sigma _{X}^{2}\sigma _{Y}^{2}(1-\rho ^{2})}}\\&={\frac {1}{1-\rho ^{2}}}\left(\left({\frac {x-\mu _{X}}{\sigma _{X}}}\right)^{2}-2\rho \left({\frac {(x-\mu _{X})(y-\mu _{Y})}{\sigma _{X}\sigma _{Y}}}\right)+\left({\frac {y-\mu _{Y}}{\sigma _{Y}}}\right)^{2}\right).\end{aligned}}$

It follows that the joint pdf is

${\begin{aligned}f(x,y)&={\frac {1}{\sqrt {(2\pi )^{2}\det {\boldsymbol {\Sigma }}}}}\exp \left(-{\frac {1}{2}}\cdot {\frac {1}{1-\rho ^{2}}}\left(\left({\frac {x-\mu _{X}}{\sigma _{X}}}\right)^{2}-2\rho \left({\frac {(x-\mu _{X})(y-\mu _{Y})}{\sigma _{X}\sigma _{Y}}}\right)+\left({\frac {y-\mu _{Y}}{\sigma _{Y}}}\right)^{2}\right)\right)\\&={\frac {1}{2\pi {\sqrt {\sigma _{X}^{2}\sigma _{Y}^{2}(1-\rho ^{2})}}}}\exp \left({\frac {-1}{2(1-\rho ^{2})}}\left(\left({\frac {x-\mu _{X}}{\sigma _{X}}}\right)^{2}-2\rho \left({\frac {(x-\mu _{X})(y-\mu _{Y})}{\sigma _{X}\sigma _{Y}}}\right)+\left({\frac {y-\mu _{Y}}{\sigma _{Y}}}\right)^{2}\right)\right)\\&={\frac {1}{2\pi \sigma _{X}\sigma _{Y}{\sqrt {1-\rho ^{2}}}}}\exp \left({\frac {-1}{2(1-\rho ^{2})}}\left(\left({\frac {x-\mu _{X}}{\sigma _{X}}}\right)^{2}-2\rho \left({\frac {x-\mu _{X}}{\sigma _{X}}}\right)\left({\frac {y-\mu _{Y}}{\sigma _{Y}}}\right)+\left({\frac {y-\mu _{Y}}{\sigma _{Y}}}\right)^{2}\right)\right).\\\end{aligned}}$

$\Box$
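The agreement between the general matrix form of the joint pdf (with $k=2$ ) and the closed form in the proposition can be checked numerically. The Python sketch below is illustrative: the function names and parameter values are assumptions, and the $2\times 2$ inverse is written out by hand as the adjugate divided by the determinant, as in the proof.

```python
import math


def bvn_pdf_closed(x, y, mu_x, mu_y, sx, sy, rho):
    """Closed-form joint pdf from the proposition (sx, sy > 0, |rho| < 1)."""
    zx = (x - mu_x) / sx
    zy = (y - mu_y) / sy
    q = (zx * zx - 2 * rho * zx * zy + zy * zy) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * sx * sy * math.sqrt(1 - rho * rho))


def bvn_pdf_matrix(x, y, mu_x, mu_y, sx, sy, rho):
    """General multivariate normal pdf with k = 2, via the covariance matrix."""
    # Entries of the covariance matrix Sigma.
    s11, s12, s22 = sx * sx, rho * sx * sy, sy * sy
    det = s11 * s22 - s12 * s12
    dx, dy = x - mu_x, y - mu_y
    # Quadratic form (x - mu)^T Sigma^{-1} (x - mu), with Sigma^{-1} = adj / det.
    q = (dx * dx * s22 - 2 * dx * dy * s12 + dy * dy * s11) / det
    return math.exp(-q / 2) / math.sqrt((2 * math.pi) ** 2 * det)


params = (1.0, -0.5, 2.0, 0.7, 0.6)  # mu_x, mu_y, sx, sy, rho (assumption)
for point in ((0.0, 0.0), (1.0, -0.5), (3.2, 0.4)):
    print(point, bvn_pdf_closed(*point, *params), bvn_pdf_matrix(*point, *params))
```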



[1] Alternatively, we can define the events as $\{i{\text{th Bernoulli trial is a failure}}\}.$

[2] 'indpt.' stands for independence.

[3] This is because there is unordered selection of (distinguishable and ordered) ${\color {darkgreen}r}$ trials for 'success' without replacement from ${\color {blue}n}$ trials (then the remaining position is for 'failure').

[4] Occurrence of the rare event is viewed as 'success' and non-occurrence of the rare event is viewed as 'failure'.

[5] Unlike the outcomes for the binomial distribution, there is only one possible sequence for each ${\color {red}x}$ .

[6] There is unordered selection of ${\color {red}x}$ trials for 'failures' (or ${\color {darkgreen}k}-1$ trials for 'successes') from ${\color {red}x}+{\color {darkgreen}k}-1$ trials without replacement

[7] The restriction on $k$ is imposed so that the binomial coefficients are defined, i.e. the expression 'makes sense'. In practice, we rarely use this condition directly. Instead, we usually directly determine whether a specific value of $x$ 'makes sense'.

[8] It is out of scope for this book.

[9] The probability is 'distributed uniformly over an interval'.

[10] A random variable following the Cauchy distribution has a relatively high probability of taking extreme values, compared with light-tailed distributions (e.g. the normal distribution). Graphically, the 'tails' (i.e. left end and right end) of the pdf are thicker.

[11] The case for $a<0$ holds similarly (The inequality sign is in opposite direction, and eventually we will have two negative signs cancelling each other). Also when $a=0$ , the r.v. becomes a non-random constant, and so we are not interested in this case.

[12] Then, $p_{1}+p_{2}+\dotsb +p_{k}=1$ .

[13] If the object is allocated to a cell other than $i$ th cell, then it is 'failure'

[14] The subscript $k$ for ${\mathcal {N}}$ is to emphasize that the distribution is $k$ -dimensional, and is optional.

