# Probability/Properties of Distributions

## Introduction

Recall that the pdf (or cdf) completely describes the random behaviour of a random variable. However, the pdf (or cdf) may be too complicated for our purposes when we only want some partial information about the random variable. In view of this, we study some properties of distributions in this chapter, which provide partial descriptions of the random behaviour of a random variable.

Some examples of such partial descriptions include

• location (e.g. is the pdf 'located' at the left or right?),
• dispersion (e.g. is the pdf 'sharp' or 'flat'?),
• skewness (e.g. is the pdf symmetric, skewed to the left, or skewed to the right?), and
• tail behaviour (e.g. does the pdf have 'light' or 'heavy' tails?).

We can describe them qualitatively, but such descriptions are quite subjective and imprecise. To be more objective and precise, we evaluate them using quantitative measures derived from the pdf (or cdf) of the random variable.

We will discuss some of these quantitative measures in this chapter. Among them, the expectation is the most important, since many of the other properties are based upon the concept of expectation.

## Expectation

Expectation has several alternative names, e.g. expected value and mean.

Definition. (Expectation) The expectation of a random variable ${\displaystyle X}$  is

(i) (if ${\displaystyle X}$  is discrete)

${\displaystyle \mathbb {E} [X]=\sum _{x\in \operatorname {supp} (X)}^{}x\mathbb {P} (X=x)=\sum _{x\in \operatorname {supp} (X)}^{}x\underbrace {f(x)} _{\text{pmf}}}$

in which ${\displaystyle f}$  is pmf of ${\displaystyle X}$ ;

(ii) (if ${\displaystyle X}$  is continuous)

${\displaystyle \mathbb {E} [X]=\int _{-\infty }^{\infty }x\underbrace {f(x)} _{\text{pdf}}\,dx}$

in which ${\displaystyle f}$  is pdf of ${\displaystyle X}$ ;

(iii) (if ${\displaystyle X}$  is mixed) If

${\displaystyle X={\begin{cases}{\text{discrete random variable}}\;V\;{\text{with probability}}\;\alpha ,\\{\text{continuous random variable}}\;W\;{\text{with probability}}\;1-\alpha ,\end{cases}}}$

${\displaystyle \mathbb {E} [X]=\alpha \mathbb {E} [V]+(1-\alpha )\mathbb {E} [W]=\alpha \sum _{v\in \operatorname {supp} (V)}^{}v\underbrace {f_{V}(v)} _{\text{pmf}}+(1-\alpha )\int _{-\infty }^{\infty }w\underbrace {f_{W}(w)} _{\text{pdf}}\,dw,}$

in which ${\displaystyle f_{V}}$  is pmf of ${\displaystyle V}$  and ${\displaystyle f_{W}}$  is pdf of ${\displaystyle W}$ .

Remark.

• The expectation of ${\displaystyle X}$  is what we would expect of its value if we are to take an observation of ${\displaystyle X}$ .
• It is a weighted average of all possible values attainable by ${\displaystyle X}$  (i.e. ${\displaystyle \operatorname {supp} (X)}$ ), with heavier weighting to points with higher value of pmf or pdf.
• Expectation tells us the 'centre' of distribution of ${\displaystyle X}$ , and 'average' position of ${\displaystyle X}$  when generated in the long run.
• Actually the '${\displaystyle \in \operatorname {supp} (X)}$ ' is not needed, since the pmf or pdf will equal zero for input outside its support.

Example. Let ${\displaystyle X}$  be the number facing up after throwing a fair six-faced die once. Then, the expectation of ${\displaystyle X}$  is

${\displaystyle \mathbb {E} [X]=1[\underbrace {\mathbb {P} (X=1)} _{1/6}]+2[\underbrace {\mathbb {P} (X=2)} _{1/6}]+\cdots +6[\underbrace {\mathbb {P} (X=6)} _{1/6}]=3.5.}$

If the die is loaded so that the probability that the number '6' faces up becomes 0.5, with equal probabilities for the other five numbers facing up, the expectation of ${\displaystyle X}$  becomes
${\displaystyle \mathbb {E} [X]=1[\underbrace {\mathbb {P} (X=1)} _{0.1}]+2[\underbrace {\mathbb {P} (X=2)} _{0.1}]+\cdots +5[\underbrace {\mathbb {P} (X=5)} _{0.1}]+6[\underbrace {\mathbb {P} (X=6)} _{0.5}]=4.5.}$
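The two computations above can be checked with a minimal sketch that applies the definition ${\displaystyle \mathbb {E} [X]=\sum _{x}x\mathbb {P} (X=x)}$  directly; the `expectation` helper is purely illustrative.

```python
# Applying the definition E[X] = sum of x * P(X = x) to the fair and the
# loaded die; the `expectation` helper is an illustrative name.
def expectation(pmf):
    """Expectation of a discrete r.v. given its pmf as a dict {x: P(X = x)}."""
    return sum(x * p for x, p in pmf.items())

fair = {x: 1 / 6 for x in range(1, 7)}
loaded = {x: 0.1 for x in range(1, 6)}
loaded[6] = 0.5

print(expectation(fair))    # close to 3.5
print(expectation(loaded))  # close to 4.5
```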

Example. (Expectation of uniform distribution) Let ${\displaystyle X\sim {\mathcal {U}}[a,b]}$ , the uniform distribution with parameters ${\displaystyle a}$  and ${\displaystyle b}$ . Then, the pdf of ${\displaystyle X}$  is

${\displaystyle f(x)={\frac {\mathbf {1} \{x\in [a,b]\}}{b-a}},}$

and the expectation of ${\displaystyle X}$  is
${\displaystyle \mathbb {E} [X]=\int _{-\infty }^{\infty }x{\frac {\mathbf {1} \{x\in [a,b]\}}{b-a}}\,dx={\frac {1}{b-a}}\underbrace {\int _{a}^{b}x\,dx} _{b^{2}/2-a^{2}/2}={\frac {b^{2}-a^{2}}{2(b-a)}}={\frac {{\cancel {(b-a)}}(b+a)}{2{\cancel {(b-a)}}}}={\frac {a+b}{2}}.}$
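As a quick numerical sanity check of this integral, we can approximate ${\displaystyle \int x f(x)\,dx}$  with a midpoint Riemann sum; the `uniform_mean` helper and the interval ${\displaystyle [2,10]}$  are illustrative choices.

```python
# Midpoint-rule check of E[X] = (a + b)/2 for X ~ U[a, b]; the helper
# name and the interval [2, 10] are illustrative.
def uniform_mean(a, b, n=100_000):
    width = (b - a) / n
    total = 0.0
    for i in range(n):
        x = a + (i + 0.5) * width           # midpoint of subinterval i
        total += x * (1 / (b - a)) * width  # x * pdf(x) * dx
    return total

print(uniform_mean(2, 10))  # close to (2 + 10)/2 = 6.0
```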

Exercise. In a process, we first toss an unfair coin once, with probability ${\displaystyle p}$  for a head to come up. If a head comes up in the first toss, we toss another unfair coin once, with probability ${\displaystyle q}$  for a head to come up. If a tail comes up in the first toss instead, we throw an arrow to the ground once. Let ${\displaystyle X}$  be the number of heads coming up in all tosses, ${\displaystyle Y}$  be the angle from the north direction to the direction pointed by the arrow, measured clockwise and in radians, and ${\displaystyle Z}$  be the number we obtain from the process in the end. Suppose that ${\displaystyle Y\sim {\mathcal {U}}[0,2\pi )}$ .

1 Choose correct expression(s) of ${\displaystyle \mathbb {E} [X]}$ .

• ${\displaystyle p}$
• ${\displaystyle q}$
• ${\displaystyle p+q}$
• ${\displaystyle (1-p)(1-q)+p(1-q)+q(1-p)+2pq}$
• ${\displaystyle 2p(1-q)+2pq}$

2 Choose correct expression(s) of ${\displaystyle \mathbb {E} [Y]}$ .

• ${\displaystyle \pi }$
• ${\displaystyle p\pi }$
• ${\displaystyle q\pi }$
• ${\displaystyle (1-p)\pi }$
• ${\displaystyle (1-q)\pi }$

3 Choose correct expression(s) of ${\displaystyle \mathbb {E} [Z]}$ .

• ${\displaystyle pq+(1-p)\pi }$
• ${\displaystyle p(p+q)+(1-p)\pi }$
• ${\displaystyle pq+(1-p)q\pi }$
• ${\displaystyle p(p+q)+(1-p)p\pi }$

4 If the two coins are fair, choose correct statement(s).

• ${\displaystyle \mathbb {E} [Z]}$  increases.
• ${\displaystyle \mathbb {E} [Z]}$  decreases.
• Change in ${\displaystyle \mathbb {E} [Z]}$  depends on values of ${\displaystyle p}$  and ${\displaystyle q}$ .
• ${\displaystyle \mathbb {E} [Y]}$  remains unchanged.
• ${\displaystyle \mathbb {E} [Z]}$  increases if ${\displaystyle p=q=1/3}$ .

In the following, we introduce a useful result that relates expectation and probability; with it, we can use expectation to ease the computation of probabilities.

Proposition. (Fundamental bridge between probability and expectation) For each event ${\displaystyle E\subseteq \Omega }$ ,

${\displaystyle \mathbb {E} [\mathbf {1} \{E\}]=\mathbb {P} (E).}$

Proof. Let ${\displaystyle X=\mathbf {1} \{E\}}$ . Since ${\displaystyle X=\mathbf {1} \{E\}\sim \operatorname {Ber} (\mathbb {P} (E))}$  (which is a discrete random variable),

${\displaystyle \mathbb {E} [X]=0[\mathbb {P} (X=0)]+1[\mathbb {P} (X=1)]=\mathbb {P} (\mathbf {1} \{E\}=1)=\mathbb {P} (E).}$

${\displaystyle \Box }$
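A simulation sketch of the bridge: averaging many observed values of ${\displaystyle \mathbf {1} \{E\}}$  should approach ${\displaystyle \mathbb {P} (E)}$ . The event (a fair die shows an even number) and the trial count are arbitrary illustrative choices.

```python
# Simulation sketch of E[1{E}] = P(E). Event E: a fair die shows an even
# number, so P(E) = 1/2. The seed and trial count are arbitrary choices.
import random

random.seed(0)
n = 200_000
indicator_sum = 0
for _ in range(n):
    roll = random.randint(1, 6)                 # one fair-die throw
    indicator_sum += 1 if roll % 2 == 0 else 0  # the indicator 1{E}

estimate = indicator_sum / n  # sample mean of the indicator
print(estimate)               # close to P(E) = 0.5
```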

When there are multiple random variables involved, we may derive the joint pmf or pdf first to compute the expectation, but it can be quite difficult and complicated to do so. Practically, we use the following theorem more often.

Theorem. (Law of the unconscious statistician (LOTUS)) Let ${\displaystyle X_{1},\ldots ,X_{n}}$  be random variables. Define ${\displaystyle Y=g(X_{1},\ldots ,X_{n})}$  for a function ${\displaystyle g}$ . Then,

(i) (if ${\displaystyle Y}$  is discrete)

${\displaystyle \mathbb {E} [Y]=\sum _{x_{1}}^{}\cdots \sum _{x_{n}}^{}g(x_{1},\ldots ,x_{n})\underbrace {f(x_{1},\ldots ,x_{n})} _{\text{joint pmf}},}$

in which ${\displaystyle f}$  is joint pmf of ${\displaystyle (X_{1},\ldots ,X_{n})}$ ;

(ii) (if ${\displaystyle Y}$  is continuous)

${\displaystyle \mathbb {E} [Y]=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }g(x_{1},\ldots ,x_{n})\underbrace {f(x_{1},\ldots ,x_{n})} _{\text{joint pdf}}\,dx_{1}\cdots \,dx_{n},}$

in which ${\displaystyle f}$  is joint pdf of ${\displaystyle (X_{1},\ldots ,X_{n})}$ .

Remark.

• If ${\displaystyle Y}$  is mixed, we can apply the definition of expectation and use the above two results for the expectations of the discrete and continuous random variables.
• This theorem is known as the law of the unconscious statistician, because we often tend to use this identity without realizing that it is a result of a theorem, instead of a definition.
• This theorem also holds when there is only one random variable involved (the joint pmf and pdf become the ordinary pmf and pdf), e.g.

${\displaystyle \mathbb {E} [g(X)]=\int _{-\infty }^{\infty }g(x)f(x)\,dx}$

The proof is quite complicated, and hence we skip it. In the following, we will introduce several properties of expectation that can help us to simplify computations of the expectation.
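A minimal sketch of LOTUS for a single discrete random variable: computing ${\displaystyle \mathbb {E} [g(X)]}$  directly from the pmf of ${\displaystyle X}$  agrees with first deriving the pmf of ${\displaystyle Y=g(X)}$ . The pmf and function below are arbitrary illustrative choices.

```python
# LOTUS for one discrete r.v.: E[g(X)] = sum of g(x) * f(x), with no need
# to find the pmf of Y = g(X) first. The pmf and g below are illustrative.
pmf_X = {-1: 0.25, 0: 0.5, 1: 0.25}

def g(x):
    return x * x

# LOTUS: sum g(x) f(x) over the support of X.
lotus = sum(g(x) * p for x, p in pmf_X.items())

# The long way: derive the pmf of Y = g(X), then apply the definition.
pmf_Y = {}
for x, p in pmf_X.items():
    pmf_Y[g(x)] = pmf_Y.get(g(x), 0.0) + p
direct = sum(y * p for y, p in pmf_Y.items())

print(lotus, direct)  # 0.5 0.5
```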

Proposition. (Properties of expectation) For all constants ${\displaystyle \alpha ,\beta ,\gamma }$  and random variables ${\displaystyle X,Y}$ ,

• (Linearity) ${\displaystyle \mathbb {E} [\alpha X+\beta Y+\gamma ]=\alpha \mathbb {E} [X]+\beta \mathbb {E} [Y]+\gamma }$ ;
• (Nonnegativity) if ${\displaystyle X\geq 0}$ , ${\displaystyle \mathbb {E} [X]\geq 0}$ ;
• (Monotonicity) if ${\displaystyle X\geq Y}$ , ${\displaystyle \mathbb {E} [X]\geq \mathbb {E} [Y]}$ ;
• (Triangle inequality) ${\displaystyle |\mathbb {E} [X]|\leq \mathbb {E} [|X|]}$ ;
• (Multiplicativity under independence) if ${\displaystyle X,Y}$  are independent, ${\displaystyle \mathbb {E} [XY]=\mathbb {E} [X]\mathbb {E} [Y]}$ .

Proof.

Linearity:

for continuous random variables ${\displaystyle X,Y}$ ,

{\displaystyle {\begin{aligned}\mathbb {E} [\alpha X+\beta Y+\gamma ]=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }(\alpha x+\beta y+\gamma )\underbrace {f(x,y)} _{\text{joint pdf}}\,dx\,dy&=\alpha \int _{-\infty }^{\infty }x\underbrace {\int _{-\infty }^{\infty }f(x,y)\,dy} _{{\text{marginal pdf of }}X}\,dx+\beta \int _{-\infty }^{\infty }y\underbrace {\int _{-\infty }^{\infty }f(x,y)\,dx} _{{\text{marginal pdf of }}Y}\,dy+\gamma \underbrace {\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }f(x,y)\,dx\,dy} _{1}\\&=\alpha \underbrace {\int _{-\infty }^{\infty }xf_{X}(x)\,dx} _{\mathbb {E} [X]}+\beta \underbrace {\int _{-\infty }^{\infty }yf_{Y}(y)\,dy} _{\mathbb {E} [Y]}+\gamma \\&=\alpha \mathbb {E} [X]+\beta \mathbb {E} [Y]+\gamma .\end{aligned}}}

Similarly, for discrete random variables ${\displaystyle X,Y}$ ,
{\displaystyle {\begin{aligned}\mathbb {E} [\alpha X+\beta Y+\gamma ]&=\sum _{x}^{}\sum _{y}^{}(\alpha x+\beta y+\gamma )f(x,y)\\&=\alpha \sum _{x}^{}x\sum _{y}^{}f(x,y)+\beta \sum _{y}^{}y\sum _{x}^{}f(x,y)+\gamma \sum _{x}^{}\sum _{y}^{}f(x,y)\\&=\alpha \sum _{x}^{}xf_{X}(x)+\beta \sum _{y}^{}yf_{Y}(y)+\gamma (1)\\&=\alpha \mathbb {E} [X]+\beta \mathbb {E} [Y]+\gamma .\end{aligned}}}

Nonnegativity:

For continuous random variable ${\displaystyle X}$ ,

${\displaystyle \underbrace {\int _{0}^{\infty }} _{\because X\geq 0}xf_{X}(x)\,dx\geq 0.}$

Similarly, for discrete random variable ${\displaystyle X}$ ,
${\displaystyle \underbrace {\sum _{x\geq 0}^{}} _{\because X\geq 0}xf_{X}(x)\geq 0.}$

Monotonicity:

For random variables ${\displaystyle X,Y}$  that are either both discrete or both continuous,

${\displaystyle X\geq Y\Rightarrow X-Y\geq 0\Rightarrow \mathbb {E} [X]-\mathbb {E} [Y]{\overset {\text{linearity}}{=}}\mathbb {E} [X-Y]{\overset {\text{nonnegativity}}{\geq }}0.}$

Triangle inequality:

${\displaystyle -|X|\leq X\leq |X|{\overset {\text{monotonicity}}{\Rightarrow }}-{\color {green}\mathbb {E} [}|X|{\color {green}]}\leq {\color {green}\mathbb {E} [}X{\color {green}]}\leq {\color {green}\mathbb {E} [}|X|{\color {green}]}}$

Multiplicativity under independence:

For continuous random variables ${\displaystyle X,Y}$ ,

${\displaystyle \mathbb {E} [XY]=\int _{-\infty }^{\infty }\int _{-\infty }^{\infty }xy\underbrace {f(x,y)} _{\text{joint pdf}}\,dx\,dy={\color {red}\int _{-\infty }^{\infty }}{\color {blue}\int _{-\infty }^{\infty }x}{\color {red}y}\underbrace {{\color {blue}f_{X}(x)}{\color {red}f_{Y}(y)}} _{\text{marginal pdf's}}{\color {blue}\,dx}{\color {red}\,dy}={\color {red}\int _{-\infty }^{\infty }yf_{Y}(y)}\underbrace {\color {blue}\int _{-\infty }^{\infty }xf_{X}(x)\,dx} _{{\text{independent from }}y}{\color {red}\,dy}={\color {blue}\int _{-\infty }^{\infty }xf_{X}(x)\,dx}{\color {red}\int _{-\infty }^{\infty }yf_{Y}(y)\,dy}=\mathbb {E} [X]\mathbb {E} [Y].}$

Similarly, for discrete random variables ${\displaystyle X,Y}$ ,
${\displaystyle \mathbb {E} [XY]=\sum _{x}^{}\sum _{y}^{}xy\underbrace {f(x,y)} _{\text{joint pmf}}={\color {red}\sum _{y}^{}}{\color {blue}\sum _{x}^{}x}{\color {red}y}\underbrace {{\color {blue}f_{X}(x)}{\color {red}f_{Y}(y)}} _{\text{marginal pmf's}}=\left({\color {blue}\sum _{x}^{}xf_{X}(x)}\right)\left({\color {red}\sum _{y}^{}yf_{Y}(y)}\right)=\mathbb {E} [X]\mathbb {E} [Y].}$

${\displaystyle \Box }$

Remark.

• (Nonmultiplicativity) ${\displaystyle \mathbb {E} [XY]\neq \mathbb {E} [X]\mathbb {E} [Y]}$  in general.
• We cannot apply linearity property similarly when the function inside the expectation is nonlinear. E.g., ${\displaystyle \mathbb {E} [2^{X}]\neq 2^{\mathbb {E} [X]}}$  in general.
• From linearity, we can see that the expectation of a constant is simply the constant itself. This is intuitive, since the value we expect for a constant is simply the constant.
• The converse of multiplicativity under independence does not hold in general: there exist dependent random variables ${\displaystyle X,Y}$  for which ${\displaystyle \mathbb {E} [XY]=\mathbb {E} [X]\mathbb {E} [Y]}$  still holds.
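A classic counterexample illustrating the last remark (the specific choice of ${\displaystyle X}$  is ours, not from the text): take ${\displaystyle X}$  uniform on ${\displaystyle \{-1,0,1\}}$  and ${\displaystyle Y=X^{2}}$ ; ${\displaystyle Y}$  is a function of ${\displaystyle X}$ , hence dependent on it, yet multiplicativity holds.

```python
# Dependent X, Y with E[XY] = E[X]E[Y]: X uniform on {-1, 0, 1}, Y = X^2.
pmf_X = {-1: 1/3, 0: 1/3, 1: 1/3}

E_X = sum(x * p for x, p in pmf_X.items())          # E[X] = 0 by symmetry
E_Y = sum(x**2 * p for x, p in pmf_X.items())       # E[Y] = E[X^2] = 2/3 (LOTUS)
E_XY = sum(x * x**2 * p for x, p in pmf_X.items())  # E[XY] = E[X^3] = 0

print(E_XY, E_X * E_Y)  # both 0.0: multiplicativity without independence
```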

### Mean of some distributions of a discrete random variable

Proposition. (Mean of Bernoulli and binomial r.v.'s) Let ${\displaystyle X\sim \operatorname {Ber} (p)}$  and ${\displaystyle Y\sim \operatorname {Binom} (n,p)}$ . Then, ${\displaystyle \mathbb {E} [X]=p}$ , and ${\displaystyle \mathbb {E} [Y]=np}$ .

Proof.

• ${\displaystyle \mathbb {E} [X]=\underbrace {0\cdot \mathbb {P} (X=0)} _{=0}+1\cdot \underbrace {\mathbb {P} (X=1)} _{=p}=p}$ .
• Since ${\displaystyle Y=X_{1}+\dotsb +X_{n}}$ , in which ${\displaystyle X_{1},\dotsc ,X_{n}}$  are i.i.d. and follow ${\displaystyle \operatorname {Ber} (p)}$  [1],
• ${\displaystyle \mathbb {E} [Y]=\mathbb {E} [X_{1}+\dotsb +X_{n}]{\overset {\text{ linearity }}{=}}\mathbb {E} [X_{1}]+\dotsb +\mathbb {E} [X_{n}]=\underbrace {p+\dotsb +p} _{n{\text{ times}}}=np}$ .

${\displaystyle \Box }$
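As a cross-check of the proposition, independent of the sum-of-Bernoullis argument above, we can sum ${\displaystyle k\,\mathbb {P} (Y=k)}$  directly over the binomial pmf; the parameters ${\displaystyle n=10,p=0.3}$  are arbitrary illustrative choices.

```python
# Cross-check of E[Y] = np for Y ~ Binom(n, p), summing k * P(Y = k)
# directly over the pmf; n = 10, p = 0.3 are illustrative values.
from math import comb

def binom_mean(n, p):
    return sum(k * comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1))

print(binom_mean(10, 0.3))  # close to 10 * 0.3 = 3.0
```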

Proposition. (Mean of Poisson r.v.'s) Let ${\displaystyle X\sim \operatorname {Pois} (\lambda )}$ . Then, ${\displaystyle \mathbb {E} [X]=\lambda .}$

Proof.

${\displaystyle \mathbb {E} [X]=\sum _{k=0}^{\infty }k\underbrace {\left({\frac {\lambda ^{k}e^{-\lambda }}{k!}}\right)} _{\mathbb {P} (X=k)}=\lambda \left(0+\sum _{\underbrace {\color {blue}k=1} _{k-1=0}}^{\infty }\underbrace {{\cancel {k}}\left({\frac {\lambda ^{k-1}e^{-\lambda }}{{\cancel {k}}(k-1)!}}\right)} _{\mathbb {P} (X=k-1)}\right)=\lambda (0+1)=\lambda .}$

${\displaystyle \Box }$
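A truncated numerical check of the series above, assuming ${\displaystyle \lambda =4}$  for illustration; the pmf is built with the recurrence ${\displaystyle \mathbb {P} (X=k)=\mathbb {P} (X=k-1)\cdot \lambda /k}$ , and the tail beyond ${\displaystyle k=100}$  is negligible.

```python
# Truncated numerical check of E[X] = λ for X ~ Pois(λ); λ = 4 is an
# illustrative choice. The recurrence P(X=k) = P(X=k-1) * λ/k avoids
# computing large factorials.
from math import exp

lam = 4.0
p = exp(-lam)      # P(X = 0)
mean = 0.0
for k in range(1, 100):
    p *= lam / k   # now p = P(X = k)
    mean += k * p
print(mean)        # close to λ = 4.0
```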

Proposition. (Mean of geometric and negative binomial r.v.'s) Let ${\displaystyle X\sim \operatorname {Geo} (p)}$  and ${\displaystyle Y\sim \operatorname {NB} (k,p)}$  . Then, ${\displaystyle \mathbb {E} [X]={\frac {1-p}{p}}}$ , and ${\displaystyle \mathbb {E} [Y]={\frac {k(1-p)}{p}}}$ .

Proof.

• Since

{\displaystyle {\begin{aligned}\mathbb {E} [X]&=\sum _{k=0}^{\infty }k\underbrace {(1-p)^{k}p} _{\mathbb {P} (X=k)}\\&=\sum _{k=0}^{\infty }(k-1)(1-p)^{k}p+\overbrace {\sum _{k=0}^{\infty }\underbrace {(1-p)^{k}p} _{\mathbb {P} (X=k)}} ^{=1}\\&=\underbrace {\color {blue}(0-1)(1-p)^{0}p} _{=-p}+\left((1-p)\sum _{\color {blue}k-1=0}^{\infty }(k-1)(1-p)^{k-1}p\right)+1\\&=-p+(1-p)\mathbb {E} [X]+1,\\\end{aligned}}}

• it follows that ${\displaystyle \;p\mathbb {E} [X]=1-p\Rightarrow \mathbb {E} [X]={\frac {1-p}{p}}.}$
• Since ${\displaystyle Y=X_{1}+\dotsb +X_{k}}$  in which ${\displaystyle X_{1},\dotsc ,X_{k}}$  are i.i.d., and follow ${\displaystyle \operatorname {Geo} (p)}$  [2],
• ${\displaystyle \mathbb {E} [Y]=\mathbb {E} [X_{1}]+\dotsb +\mathbb {E} [X_{k}]=\underbrace {{\frac {1-p}{p}}+\dotsb +{\frac {1-p}{p}}} _{k{\text{ times}}}={\frac {k(1-p)}{p}}.}$

${\displaystyle \Box }$

Proposition. (Mean of hypergeometric r.v.'s) Let ${\displaystyle X\sim \operatorname {HypGeo} (N,K,n)}$ . Then, ${\displaystyle \mathbb {E} [X]=nK/N}$ .

Proof.

• Since ${\displaystyle X=X_{1}+\dotsb +X_{n}}$  in which ${\displaystyle X_{1},\dotsc ,X_{n}\sim \operatorname {Ber} (K/N)}$  (each of the Bernoulli r.v.'s indicates whether the corresponding draw of ball is of type 1, with probability ${\displaystyle K/N}$  without knowing the results of other draws [3], since each draw is equally likely to be any of the ${\displaystyle N}$  balls) [4] ,
• it follows that ${\displaystyle \mathbb {E} [X]=\mathbb {E} [X_{1}]+\dotsb +\mathbb {E} [X_{n}]=\underbrace {{\frac {K}{N}}+\dotsb +{\frac {K}{N}}} _{n{\text{ times}}}={\frac {nK}{N}}.}$

${\displaystyle \Box }$
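A simulation sketch of this mean, drawing ${\displaystyle n}$  balls without replacement; the parameters ${\displaystyle N=20,K=8,n=5}$  (so ${\displaystyle nK/N=2}$ ) are arbitrary illustrative choices.

```python
# Simulating the hypergeometric mean nK/N by drawing n balls without
# replacement; N = 20, K = 8, n = 5 are illustrative parameters.
import random

random.seed(1)
N, K, n = 20, 8, 5
balls = [1] * K + [0] * (N - K)   # 1 marks a "type 1" ball

trials = 100_000
total = sum(sum(random.sample(balls, n)) for _ in range(trials))
print(total / trials)             # close to nK/N = 2.0
```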

### Mean of some distributions of a continuous random variable

We now introduce formulas for the mean of some distributions of a continuous random variable; these formulas are relatively simple.

Proposition. (Mean of uniform r.v.'s) Let ${\displaystyle X\sim {\mathcal {U}}[a,b]}$  (${\displaystyle a<b}$ ). Then, ${\displaystyle \mathbb {E} [X]={\frac {a+b}{2}}}$ .

Proof.

${\displaystyle \mathbb {E} [X]=\int _{a}^{b}{\frac {x}{b-a}}\,dx={\frac {1}{2(b-a)}}(b^{2}-a^{2})={\frac {{\cancel {(b-a)}}(b+a)}{2{\cancel {(b-a)}}}}={\frac {a+b}{2}}.}$

${\displaystyle \Box }$

Proposition. (Mean of gamma, exponential, and chi-squared r.v.'s) Let ${\displaystyle X\sim \operatorname {Gamma} (\alpha ,\lambda )}$ , ${\displaystyle Y\sim \operatorname {Exp} (\lambda )}$ , and ${\displaystyle Z\sim \chi _{\nu }^{2}}$ . Then, ${\displaystyle \mathbb {E} [X]=\alpha /\lambda }$ , ${\displaystyle \mathbb {E} [Y]=1/\lambda }$ , and ${\displaystyle \mathbb {E} [Z]=\nu }$ .

Proof.

• It suffices to prove the formula for mean of gamma r.v.'s, since exponential and chi-squared r.v.'s are essentially special cases of gamma r.v.'s, and thus we can simply substitute some values into the formula for mean of gamma r.v.'s to obtain the formulas for them.
• {\displaystyle {\begin{aligned}\mathbb {E} [X]&=\int _{0}^{\infty }{\color {red}x}\cdot {\frac {\lambda ^{\alpha }x^{\alpha -1}e^{-\lambda x}}{\Gamma (\alpha )}}\,dx\\&={\frac {\color {purple}\alpha }{\color {blue}\lambda }}\underbrace {\int _{0}^{\infty }{\frac {\lambda ^{\alpha {\color {blue}+1}}x^{\alpha {\color {red}+1}-1}e^{-\lambda x}}{\Gamma (\alpha {\color {purple}+1})}}\,dx} _{=F(\infty )=1},&F{\text{ is the cdf of }}\operatorname {Gamma} (\alpha +1,\lambda ),\\&={\frac {\alpha }{\lambda }}.\\\end{aligned}}}

• Since ${\displaystyle \operatorname {Exp} (\lambda )\equiv \operatorname {Gamma} (1,\lambda )}$ , ${\displaystyle \mathbb {E} [Y]=1/\lambda }$  by substituting ${\displaystyle \alpha =1}$ .
• Since ${\displaystyle \chi _{\nu }^{2}\equiv \operatorname {Gamma} (\nu /2,1/2)}$ , ${\displaystyle \mathbb {E} [Z]=(\nu {\cancel {/2}})/{\cancel {(1/2)}}=\nu }$  by substituting ${\displaystyle \alpha =\nu /2}$  and ${\displaystyle \lambda =1/2}$ .

${\displaystyle \Box }$

Proposition. (Mean of beta r.v.'s) Let ${\displaystyle X\sim \operatorname {Beta} (\alpha ,\beta )}$ . Then, ${\displaystyle \mathbb {E} [X]={\frac {\alpha }{\alpha +\beta }}}$ .

Proof.

• We use similar approach from the previous proof.

{\displaystyle {\begin{aligned}\mathbb {E} [X]&=\int _{0}^{1}{\color {red}x}\cdot {\frac {\Gamma (\alpha +\beta )}{\Gamma (\alpha )\Gamma (\beta )}}x^{\alpha -1}(1-x)^{\beta -1}\,dx\\&={\frac {\color {purple}\alpha }{\color {blue}\alpha +\beta }}\underbrace {\int _{0}^{1}{\frac {\Gamma (\alpha +\beta {\color {blue}+1})}{\Gamma (\alpha {\color {purple}+1})\Gamma (\beta )}}x^{\alpha {\color {red}+1}-1}(1-x)^{\beta -1}\,dx} _{F(1)=1},&F{\text{ is the cdf of }}\operatorname {Beta} (\alpha +1,\beta ),\\&={\frac {\alpha }{\alpha +\beta }}.\end{aligned}}}

${\displaystyle \Box }$

Proposition. (Undefined mean of Cauchy r.v.'s) Let ${\displaystyle X\sim \operatorname {Cauchy} (\theta )}$ . Then, ${\displaystyle \mathbb {E} [X]}$  is undefined.

Proof.

{\displaystyle {\begin{aligned}\mathbb {E} [X]&=\mathbb {E} [X{\color {blue}-\theta }]{\color {blue}+\theta }&{\text{by linearity}},\\&=\theta +{\frac {1}{\pi }}\int _{-\infty }^{\infty }(x-\theta )\cdot {\frac {1}{1+(x-\theta )^{2}}}\,dx\\&=\theta +{\frac {1}{\pi }}\int _{-\infty }^{\infty }{\frac {u}{1+u^{2}}}\,du,&{\text{let }}u=x-\theta \Rightarrow du=dx,\\&=\theta +{\frac {1}{\pi }}\left(\int _{-\infty }^{0}{\frac {u}{1+u^{2}}}\,du+\int _{0}^{\infty }{\frac {u}{1+u^{2}}}\,du\right)\\&=\theta +{\frac {1}{\pi }}\left({\frac {1}{2}}[\ln(1+u^{2})]_{u=-\infty }^{u=0}+{\frac {1}{2}}[\ln(1+u^{2})]_{u=0}^{u=\infty }\right)\\&=\theta +{\frac {1}{\pi }}(\underbrace {-\infty +\infty } _{\text{undefined}}).\end{aligned}}}

${\displaystyle \Box }$
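The divergence in the proof can be made concrete: the truncated integral ${\displaystyle \int _{0}^{M}{\frac {u}{1+u^{2}}}\,du={\frac {\ln(1+M^{2})}{2}}}$  grows without bound as ${\displaystyle M\to \infty }$ , so neither tail of the defining integral is finite.

```python
# The truncated integral from the proof in closed form:
# ∫_0^M u/(1 + u^2) du = ln(1 + M^2)/2, which grows without bound,
# so the positive and negative halves cannot combine into a finite value.
from math import log

def half_integral(M):
    return 0.5 * log(1 + M * M)

for M in (10, 1_000, 100_000):
    print(M, half_integral(M))  # keeps growing with M; no finite limit
```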

Proposition. (Mean of normal r.v.'s) Let ${\displaystyle X\sim {\mathcal {N}}(\mu ,\sigma ^{2})}$ . Then, ${\displaystyle \mathbb {E} [X]=\mu }$ .

Proof.

• Let ${\displaystyle Z={\frac {X-\mu }{\sigma }}\sim {\mathcal {N}}(0,1)}$ .
• {\displaystyle {\begin{aligned}\mathbb {E} [Z]&=\int _{-\infty }^{\infty }x\varphi (x)\,dx\\&={\frac {1}{\sqrt {2\pi }}}\int _{-\infty }^{\infty }xe^{-x^{2}/2}\,dx\\&=-{\frac {1}{\sqrt {2\pi }}}\int _{-\infty }^{-\infty }e^{u}\,du,&{\text{let }}u=-{\frac {x^{2}}{2}}\Rightarrow du=-x\,dx\\&=-{\frac {1}{\sqrt {2\pi }}}(\underbrace {e^{-\infty }} _{=0}-\underbrace {e^{-\infty }} _{=0})\\&=0.\end{aligned}}}

• It follows that ${\displaystyle \mathbb {E} [X]=\mathbb {E} [\sigma Z+\mu ]=\sigma \underbrace {\mathbb {E} [Z]} _{=0}+\mu =\mu }$ .

${\displaystyle \Box }$

### Examples

Example. (St. Petersburg Paradox) Consider a game in which the player tosses a fair coin ${\displaystyle X}$  times, until a head comes up. Since ${\displaystyle X-1\sim \operatorname {Geo} (1/2)}$ , the expected value of ${\displaystyle X-1}$  is

${\displaystyle \underbrace {\mathbb {E} [X-1]} _{\mathbb {E} [X]-1}={\frac {1-1/2}{1/2}}\Rightarrow \mathbb {E} [X]=1+1={\color {green}2}.}$

That is, on average the player needs two tosses to get a head.

The game rewards the player ${\displaystyle \$8}$  to play the game, but the player must pay back ${\displaystyle \$2^{X}}$  after getting a head.

Some may think that the expected net gain of the player is

${\displaystyle \$8-\$2^{\color {green}2}=\$4,}$

so the player has an advantage in this game.

However, this is wrong since the correct expected net gain is instead

${\displaystyle \mathbb {E} \left[8-2^{X}\right]{\overset {\text{linearity}}{=}}8-\mathbb {E} \left[2^{X}\right]=8-\sum _{x=1}^{\infty }[2^{x}\cdot \underbrace {(1/2)^{x-1}(1/2)} _{{\text{pmf of }}X}]=8-\sum _{x=1}^{\infty }1=-\infty ,}$

i.e., on average the player has infinite loss!
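The divergence of the series can be seen numerically: every term ${\displaystyle 2^{x}(1/2)^{x-1}(1/2)}$  equals exactly 1, so truncating after ${\displaystyle N}$  terms gives an expected net gain of ${\displaystyle 8-N}$ , which tends to ${\displaystyle -\infty }$ .

```python
# Truncating the series for the expected net gain after N terms: each term
# 2^x * (1/2)^(x-1) * (1/2) equals exactly 1, so the truncated value is 8 - N.
def truncated_net_gain(N):
    payout = sum(2**x * 0.5**(x - 1) * 0.5 for x in range(1, N + 1))
    return 8 - payout

for N in (10, 100, 1000):
    print(N, truncated_net_gain(N))  # 8 - N: -2.0, -92.0, -992.0
```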

Exercise.

1 Choose correct statement(s).

• ${\displaystyle \mathbb {E} [X]\geq 0}$  for each random variable ${\displaystyle X}$ .
• ${\displaystyle \mathbb {E} [|X|]\geq 0}$  for each random variable ${\displaystyle X}$ .
• ${\displaystyle |\mathbb {E} [X]|\geq 0}$  for each random variable ${\displaystyle X}$ .
• ${\displaystyle \mathbb {E} [XYZ]=\mathbb {E} [X]\mathbb {E} [Y]\mathbb {E} [Z]}$  if random variables ${\displaystyle X,Y}$  and ${\displaystyle Z}$  are pairwise independent.

2 Given that ${\displaystyle \mathbb {E} [X]=-k=-\mathbb {E} [Y]}$ , choose correct expression(s) for ${\displaystyle \mathbb {E} [aX+bY+c]}$ .

• ${\displaystyle a\mathbb {E} [X]+\mathbb {E} [bY+c]}$
• ${\displaystyle c\mathbb {E} [(a/c)X+(b/c)Y]+c}$
• ${\displaystyle (b-a)k+c}$
• ${\displaystyle \mathbb {E} [-ak+bk+c]}$

Let us illustrate the usefulness of the fundamental bridge between probability and expectation by using it to prove the inclusion-exclusion formula.

Example. (Proof of inclusion-exclusion formula) Recall that the inclusion-exclusion formula is

For each event ${\displaystyle A_{1},A_{2},\ldots ,A_{n}}$ ,

${\displaystyle \mathbb {P} (A_{1}\cup A_{2}\cup \cdots \cup A_{n})=\sum _{j=1}^{n}(-1)^{j-1}\sum _{i_{1}<\cdots <i_{j}}\mathbb {P} (A_{i_{1}}\cap \cdots \cap A_{i_{j}}).}$

The proof is as follows:

{\displaystyle {\begin{aligned}X&=\mathbf {1} \{A_{1}\cup \cdots \cup A_{n}\}\\&=1-\mathbf {1} \{A_{1}^{c}\cap \cdots \cap A_{n}^{c}\}\\&=1-\mathbf {1} \{A_{1}^{c}\}\cdots \mathbf {1} \{A_{n}^{c}\}\\&=1-(1-\mathbf {1} \{A_{1}\})\cdots (1-\mathbf {1} \{A_{n}\})\\&=\mathbf {1} \{A_{1}\}+\cdots +\mathbf {1} \{A_{n}\}-(\underbrace {\mathbf {1} \{A_{1}\}\mathbf {1} \{A_{2}\}} _{\mathbf {1} \{A_{1}\cap A_{2}\}}+\cdots +\underbrace {\mathbf {1} \{A_{n-1}\}\mathbf {1} \{A_{n}\}} _{\mathbf {1} \{A_{n-1}\cap A_{n}\}})+(\cdots )-(\cdots )\cdots +(-1)^{n-1}\underbrace {\mathbf {1} \{A_{1}\}\cdots \mathbf {1} \{A_{n}\}} _{\mathbf {1} \{A_{1}\cap \cdots \cap A_{n}\}}\\&=\sum _{j=1}^{n}(-1)^{j-1}\sum _{i_{1}<\cdots <i_{j}}\mathbf {1} \{A_{i_{1}}\cap \cdots \cap A_{i_{j}}\}.\end{aligned}}}

Taking the expectation of both sides, then applying linearity and the fundamental bridge on the right-hand side, yields the inclusion-exclusion formula.

${\displaystyle \Box }$
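A sketch verifying the formula for ${\displaystyle n=3}$  on a small equally likely sample space; the three events below are arbitrary illustrative sets.

```python
# Checking inclusion-exclusion for n = 3 on a small equally likely sample
# space; the three events are arbitrary illustrative sets.
from itertools import combinations

omega = set(range(12))
A = [{0, 1, 2, 3, 4}, {3, 4, 5, 6}, {0, 4, 6, 7, 8}]

def P(event):
    return len(event) / len(omega)

lhs = P(A[0] | A[1] | A[2])  # P(A_1 ∪ A_2 ∪ A_3) computed directly

rhs = 0.0
for j in range(1, len(A) + 1):
    for idx in combinations(range(len(A)), j):
        inter = set(omega)
        for i in idx:
            inter &= A[i]                # A_{i_1} ∩ ... ∩ A_{i_j}
        rhs += (-1) ** (j - 1) * P(inter)

print(lhs, rhs)  # equal up to rounding, both ≈ 0.75 for these sets
```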

### Probability generating functions

An application of expectation is probability generating functions. As suggested by its name, it can generate probabilities in some sense.

Definition. (Probability generating function) Let ${\displaystyle X}$  be a discrete r.v. with support ${\displaystyle \{0,1,2,\dotsc \}}$ . The probability generating function of ${\displaystyle X}$  is

${\displaystyle G(y)=\mathbb {E} [y^{X}]=\sum _{x=0}^{\infty }y^{x}\mathbb {P} (X=x).}$

Remark.

• There is also the moment generating function, which can generate moments (see the next section for the definition) in some sense. We will discuss it in the chapter on transformations of random variables.
• By taking derivatives of the probability generating function, we can generate probabilities:

${\displaystyle {\frac {1}{k!}}\cdot \left.{\frac {d^{k}}{dy^{k}}}G(y)\right|_{y=0}=\mathbb {P} (X=k).}$

• This can be seen directly by evaluating the derivatives.
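A sketch of this generating property for a finite-support r.v.: the pgf is then a polynomial, and ${\displaystyle {\tfrac {1}{k!}}G^{(k)}(0)}$  is its ${\displaystyle k}$ th coefficient, i.e. ${\displaystyle \mathbb {P} (X=k)}$ . The ${\displaystyle \operatorname {Binom} (3,1/2)}$  example and the helper name are illustrative assumptions.

```python
# Recovering probabilities from a pgf with finite support: G(y) is a
# polynomial whose coefficients are P(X = x), so (1/k!) * G^(k)(0) is
# the k-th coefficient. Binom(3, 1/2) is an illustrative choice.
from math import comb, factorial

n, p = 3, 0.5
coeffs = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

def kth_derivative_at_zero(coeffs, k):
    # d^k/dy^k of sum_x c_x y^x at y = 0 is k! * c_k.
    return factorial(k) * coeffs[k] if k < len(coeffs) else 0.0

for k in range(n + 1):
    print(k, kth_derivative_at_zero(coeffs, k) / factorial(k))  # pmf: 0.125, 0.375, 0.375, 0.125
```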

## Variance (and standard deviation)

Variance is in fact a special case of a central moment, so we first define the ${\displaystyle r}$ th moment and the ${\displaystyle r}$ th central moment.

Definition. (${\displaystyle r}$ th moment) The ${\displaystyle r}$ th moment of a random variable ${\displaystyle X}$  is ${\displaystyle \mathbb {E} [X^{r}]}$ .

Definition. (${\displaystyle r}$ th central moment) The ${\displaystyle r}$ th central moment of a random variable ${\displaystyle X}$  is ${\displaystyle \mathbb {E} {\bigg [}(X-\underbrace {\mathbb {E} [X]} _{\text{constant}})^{r}{\bigg ]}}$ .

Definition. (Variance) The variance of a random variable ${\displaystyle X}$ , denoted by ${\displaystyle \operatorname {Var} (X)}$ , is its 2nd central moment, i.e. ${\displaystyle \mathbb {E} {\bigg [}(X-\underbrace {\mathbb {E} [X]} _{\text{constant}})^{2}{\bigg ]}}$ .

Since ${\displaystyle (X-\mathbb {E} [X])^{2}}$  is the squared deviation of the value of ${\displaystyle X}$  from its mean, the variance measures the dispersion (or spread) of the distribution: it is what we would expect the squared deviation to be if we were to take an observation of the random variable.

Another closely related term is standard deviation.

Definition. (Standard deviation) The standard deviation of random variable ${\displaystyle X}$ , usually denoted as ${\displaystyle \sigma }$ , is ${\displaystyle {\sqrt {\operatorname {Var} (X)}}}$  .

Remark.

• The interpretation of standard deviation is similar to that of variance.
• Standard deviation is also sometimes abbreviated as 's.d.'.
• The standard deviation of a random variable ${\displaystyle X}$  has the same unit as ${\displaystyle X}$ , which is one of its advantages, and one of the reasons to use standard deviation instead of variance to measure dispersion.
• Since standard deviation is usually denoted by ${\displaystyle \sigma }$ , we can denote variance by ${\displaystyle \sigma ^{2}}$ , although this is not as common as the ${\displaystyle \operatorname {Var} (\cdot )}$  notation.

Proposition. (Properties of variance)

• (alternative expression for variance)

${\displaystyle \operatorname {Var} (X)=\mathbb {E} \left[X^{2}\right]-\left(\mathbb {E} [X]\right)^{2}}$

• (invariance under change in location parameter)

${\displaystyle \operatorname {Var} (X+a)=\operatorname {Var} (X)}$

for each constant ${\displaystyle a}$
• (homogeneity of degree two)

${\displaystyle \operatorname {Var} (bX)=b^{2}\operatorname {Var} (X)}$

for each constant ${\displaystyle b}$
• (nonnegativity)

${\displaystyle \operatorname {Var} (X)\geq 0}$

• (zero variance implies non-randomness)

${\displaystyle \operatorname {Var} (X)=0\Rightarrow X={\text{non-random constant}}\Leftrightarrow {\text{there exists a constant }}c{\text{ such that }}\mathbb {P} (X=c)=1}$

• (additivity under independence)

${\displaystyle X_{1},\ldots ,X_{n}{\text{ are independent}}\Rightarrow \operatorname {Var} (X_{1}+\cdots +X_{n})=\operatorname {Var} (X_{1})+\cdots +\operatorname {Var} (X_{n})}$

Proof.

• alternative expression for variance:
Let ${\displaystyle \mu =\mathbb {E} [X]}$  for clearer expression.

${\displaystyle \mathbb {E} \left[(X-\mu )^{2}\right]=\mathbb {E} \left[X^{2}-2X\mu +\mu ^{2}\right]=\mathbb {E} \left[X^{2}\right]-2\mu \underbrace {\mathbb {E} [X]} _{\mu }+\mu ^{2}=\mathbb {E} \left[X^{2}\right]-\mu ^{2},}$

and the result follows.
• invariance under change in location parameter:

${\displaystyle \operatorname {Var} (X+a)=\mathbb {E} {\bigg [}(X{\cancel {+a}}-\underbrace {\mathbb {E} [X+a]} _{\mathbb {E} [X]{\cancel {+a}}})^{2}{\bigg ]}=\mathbb {E} \left[(X-\mathbb {E} [X])^{2}\right]=\operatorname {Var} (X).}$

• homogeneity of degree two:

${\displaystyle \operatorname {Var} (bX)=\mathbb {E} \left[(bX-\underbrace {\mathbb {E} [bX]} _{b\mathbb {E} [X]})^{2}\right]=\mathbb {E} \left[b^{2}(X-\mathbb {E} [X])^{2}\right]=b^{2}\operatorname {Var} (X).}$

• nonnegativity: it follows from ${\displaystyle (X-\mathbb {E} [X])^{2}\geq 0}$ .
• zero variance implies non-randomness:
Let ${\displaystyle \mu =\mathbb {E} [X]}$  for clearer expression. Consider the event ${\displaystyle E_{n}=\{|X-\mu |\geq n^{-1}\}}$ , in which ${\displaystyle n}$  is a positive integer.
Since
${\displaystyle 0=\operatorname {Var} (X)=\mathbb {E} \left[(X-\mu )^{2}\right]\geq \mathbb {E} [(X-\mu )^{2}\underbrace {\mathbf {1} \{E_{n}\}} _{\leq 1}]=\mathbb {E} \left[|X-\mu |^{2}\mathbf {1} \{E_{n}\}\right]\geq \mathbb {E} [\underbrace {n^{-2}} _{\text{constant}}\mathbf {1} \{E_{n}\}]=\underbrace {n^{-2}} _{\geq 0}\underbrace {\mathbb {P} (E_{n})} _{\geq 0}\geq 0,}$

we have ${\displaystyle 0\geq n^{-2}\mathbb {P} (E_{n})\geq 0\Rightarrow 0\geq \mathbb {P} (E_{n})\geq 0\Rightarrow \mathbb {P} (E_{n})=0}$ .
Thus,

${\displaystyle \mathbb {P} (\underbrace {|X-\mu |>0} _{X\neq \mu })=\mathbb {P} \left(\bigcup _{n=1}^{\infty }E_{n}\right){\overset {\text{a lemma}}{=}}\lim _{n\to \infty }\underbrace {\mathbb {P} (E_{n})} _{0}=0\Rightarrow \mathbb {P} (X=\mu )=1-\underbrace {\mathbb {P} (X\neq \mu )} _{0}=1}$

• additivity under independence:
For random variables ${\displaystyle X}$  and ${\displaystyle Y}$  that are independent with means ${\displaystyle \mu ,\nu }$  respectively,

{\displaystyle {\begin{aligned}\operatorname {Var} (X+Y)&=\mathbb {E} \left[(X+Y-\mathbb {E} [X+Y])^{2}\right]\\&=\mathbb {E} \left[(X+Y-\mu -\nu )^{2}\right]&{\text{by linearity}}\\&=\underbrace {\mathbb {E} \left[(X-\mu )^{2}\right]} _{\operatorname {Var} (X)}+\underbrace {\mathbb {E} \left[(Y-\nu )^{2}\right]} _{\operatorname {Var} (Y)}+2\mathbb {E} [(X-\mu )(Y-\nu )]&{\text{by linearity}}\\&=\operatorname {Var} (X)+\operatorname {Var} (Y)+2\mathbb {E} [XY]-2\nu \mathbb {E} [X]-2\mu \mathbb {E} [Y]+2\mu \nu &{\text{by linearity}}\\&=\operatorname {Var} (X)+\operatorname {Var} (Y)+\underbrace {2\mu \nu -2\nu \mu -2\mu \nu +2\mu \nu } _{=0}&{\text{by independence of }}X,Y\\&=\operatorname {Var} (X)+\operatorname {Var} (Y).\end{aligned}}}

Thus, inductively,
${\displaystyle \operatorname {Var} (X_{1}+\cdots +X_{n})=\operatorname {Var} (X_{1})+\operatorname {Var} (X_{2}+\cdots +X_{n})=\cdots =\operatorname {Var} (X_{1})+\cdots +\operatorname {Var} (X_{n})}$

if ${\displaystyle X_{1},\ldots ,X_{n}}$  are independent.

${\displaystyle \Box }$

### Variance of some distributions of a discrete random variable

Proposition. (Variance of Bernoulli and binomial r.v.'s) Let ${\displaystyle X\sim \operatorname {Ber} (p)}$  and ${\displaystyle Y\sim \operatorname {Binom} (n,p)}$ . Then, ${\displaystyle \operatorname {Var} (X)=p(1-p)}$  and ${\displaystyle \operatorname {Var} (Y)=np(1-p)}$ .

Proof.

• ${\displaystyle \mathbb {E} [X^{2}]=0\cdot \mathbb {P} (X=0)+1\cdot \mathbb {P} (\underbrace {X^{2}=1} _{\Leftrightarrow X=1})=p}$ , in which ${\displaystyle X^{2}=1\Leftrightarrow X=1}$  since ${\displaystyle X}$  is nonnegative.
• It follows that ${\displaystyle \operatorname {Var} (X)=\mathbb {E} [X^{2}]-(\mathbb {E} [X])^{2}=p-p^{2}=p(1-p)}$ .
• Similar to the proof for the mean of Bernoulli and binomial r.v.'s, ${\displaystyle Y=X_{1}+\dotsb +X_{n}}$  in which ${\displaystyle X_{1},\dotsc ,X_{n}}$  are i.i.d. and follow ${\displaystyle \operatorname {Ber} (p)}$ .
• Because of the independence (from i.i.d. property), ${\displaystyle \operatorname {Var} (Y)=\underbrace {\operatorname {Var} (X_{1})+\dotsb +\operatorname {Var} (X_{n})} _{n{\text{ times}}}=np(1-p).}$

${\displaystyle \Box }$
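As a quick numerical sanity check (a Python sketch; the helper name `binom_var` and the parameters below are ours), we can compute the variance of a binomial r.v. directly from its pmf and compare with the closed form ${\displaystyle np(1-p)}$ :

```python
from math import comb

def binom_var(n, p):
    # Var(Y) = E[Y^2] - (E[Y])^2, computed directly from the binomial pmf
    pmf = [comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(n + 1)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second = sum(k * k * pk for k, pk in enumerate(pmf))
    return second - mean ** 2

# agrees with np(1-p)
assert abs(binom_var(10, 0.3) - 10 * 0.3 * 0.7) < 1e-12
```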

Proposition. (Variance of Poisson r.v.'s) Let ${\displaystyle X\sim \operatorname {Pois} (\lambda )}$ . Then, ${\displaystyle \operatorname {Var} (X)=\lambda }$ .

Proof.

• ${\displaystyle \mathbb {E} [X^{2}]=\sum _{k=0}^{\infty }k^{2}\underbrace {\left({\frac {\lambda ^{k}e^{-\lambda }}{k!}}\right)} _{\mathbb {P} (X=k)}=\lambda \left(0+\sum _{\underbrace {\color {blue}k=1} _{k-1=0}}^{\infty }{\cancel {k}}\left({\frac {k\lambda ^{k-1}e^{-\lambda }}{{\cancel {k}}(k-1)!}}\right)\right)=\lambda \left(\underbrace {\sum _{k-1=0}^{\infty }{\frac {(k{\color {red}-1})e^{-\lambda }\lambda ^{k-1}}{(k-1)!}}} _{\mathbb {E} [X]}+{\color {red}\overbrace {\sum _{k-1=0}^{\infty }\underbrace {\frac {e^{-\lambda }\lambda ^{k-1}}{(k-1)!}} _{\mathbb {P} (X=k-1)}} ^{=1}}\right)=\lambda (\lambda +1).}$

• Hence, ${\displaystyle \operatorname {Var} (X)=\mathbb {E} [X^{2}]-(\mathbb {E} [X])^{2}=\lambda (\lambda +1)-\lambda ^{2}=\lambda .}$

${\displaystyle \Box }$
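The identity ${\displaystyle \operatorname {Var} (X)=\lambda }$  can likewise be verified numerically by truncating the infinite sum (a Python sketch; the helper name `pois_var` and the truncation length are ours):

```python
from math import exp

def pois_var(lam, terms=100):
    # build the pmf recursively via P(X=k) = P(X=k-1) * lam / k
    # to avoid computing huge factorials directly
    pk = exp(-lam)  # P(X=0)
    mean = second = 0.0
    for k in range(terms):
        mean += k * pk
        second += k * k * pk
        pk *= lam / (k + 1)
    return second - mean ** 2

# the truncated tail is negligible for moderate lam; matches Var(X) = lam
assert abs(pois_var(4.5) - 4.5) < 1e-9
```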

Proposition. (Variance of geometric and negative binomial r.v.'s) Let ${\displaystyle X\sim \operatorname {Geo} (p)}$  and ${\displaystyle Y\sim \operatorname {NB} (k,p)}$ . Then, ${\displaystyle \operatorname {Var} (X)={\frac {1-p}{p^{2}}}}$ , and ${\displaystyle \operatorname {Var} (Y)={\frac {k(1-p)}{p^{2}}}}$ .

Proof.

• Since

{\displaystyle {\begin{aligned}\mathbb {E} [X^{2}]&=\sum _{k=0}^{\infty }k^{2}\underbrace {(1-p)^{k}p} _{\mathbb {P} (X=k)}\\&=\sum _{k=0}^{\infty }(k-1+1)^{2}\underbrace {(1-p)^{k}p} _{\mathbb {P} (X=k)}\\&=\sum _{k=0}^{\infty }(k-1)^{2}(1-p)^{k}p+\sum _{k=0}^{\infty }2(k-1)(1-p)^{k}p+\overbrace {\sum _{k=0}^{\infty }\underbrace {(1-p)^{k}p} _{\mathbb {P} (X=k)}} ^{=1}\\&=\underbrace {\color {blue}(0-1)^{2}(1-p)^{0}p} _{=p}+(1-p)\sum _{\color {blue}k-1=0}^{\infty }(k-1)^{2}(1-p)^{k-1}p+\underbrace {\color {red}2(0-1)(1-p)^{0}p} _{=-2p}+2(1-p)\sum _{\color {red}k-1=0}^{\infty }(k-1)(1-p)^{k-1}p+1\\&=p+(1-p)\mathbb {E} [X^{2}]-2p+2(1-p)\underbrace {\mathbb {E} [X]} _{(1-p)/p}+1\\&=(1-p)\mathbb {E} [X^{2}]+{\frac {2(1-p)^{2}}{p}}+1-p,\\\end{aligned}}}

• it follows that ${\displaystyle \;p\mathbb {E} [X^{2}]={\frac {2(1-p)^{2}}{p}}+1-p\Rightarrow \mathbb {E} [X^{2}]={\frac {2(1-p)^{2}+p(1-p)}{p^{2}}}}$ .
• Hence, ${\displaystyle \operatorname {Var} (X)=\mathbb {E} [X^{2}]-(\mathbb {E} [X])^{2}={\frac {2(1-p)^{2}+p(1-p)}{p^{2}}}-{\frac {(1-p)^{2}}{p^{2}}}={\frac {(1-p)^{2}+p(1-p)}{p^{2}}}={\frac {(1-p)(1{\cancel {-p+p}})}{p^{2}}}={\frac {1-p}{p^{2}}}}$ .
• Similarly, ${\displaystyle Y=X_{1}+\dotsb +X_{k}}$  in which ${\displaystyle X_{1},\dotsc ,X_{k}}$  are i.i.d., and follow ${\displaystyle \operatorname {Geo} (p)}$  [5].
• Because of the independence, ${\displaystyle \operatorname {Var} (Y)=\operatorname {Var} (X_{1})+\dotsb +\operatorname {Var} (X_{k})=\underbrace {{\frac {1-p}{p^{2}}}+\dotsb +{\frac {1-p}{p^{2}}}} _{k{\text{ times}}}={\frac {k(1-p)}{p^{2}}}.}$

${\displaystyle \Box }$
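A numerical sanity check of ${\displaystyle \operatorname {Var} (X)={\frac {1-p}{p^{2}}}}$  (a Python sketch; the helper name `geo_var` and the truncation length are ours):

```python
def geo_var(p, terms=1000):
    # pmf of Geo(p) on {0, 1, 2, ...}: P(X = k) = (1-p)^k p
    pmf = [(1 - p) ** k * p for k in range(terms)]
    mean = sum(k * pk for k, pk in enumerate(pmf))
    second = sum(k * k * pk for k, pk in enumerate(pmf))
    return second - mean ** 2

# the truncated tail is negligible; matches (1-p)/p^2
p = 0.4
assert abs(geo_var(p) - (1 - p) / p ** 2) < 1e-9
```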

### Variance of some distributions of a continuous random variable

Proposition. (Variance of uniform r.v.'s) Let ${\displaystyle X\sim {\mathcal {U}}[a,b]}$  (${\displaystyle a<b}$ ). Then, ${\displaystyle \operatorname {Var} (X)={\frac {(b-a)^{2}}{12}}}$ .

Proof.

{\displaystyle {\begin{aligned}\operatorname {Var} (X)&=\mathbb {E} \left[X^{2}\right]-(\mathbb {E} [X])^{2}\\&=\int _{a}^{b}{\frac {x^{2}}{b-a}}\,dx-\left({\frac {b+a}{2}}\right)^{2}\\&={\frac {1}{b-a}}\left(b^{3}/3-a^{3}/3\right)-\left({\frac {a+b}{2}}\right)^{2}\\&={\frac {1}{3(b-a)}}\left(b^{3}-a^{3}\right)-\left({\frac {a+b}{2}}\right)^{2}\\&={\frac {1}{3{\cancel {(b-a)}}}}{\cancel {(b-a)}}(b^{2}+ba+a^{2})-{\frac {a^{2}+2ab+b^{2}}{4}}\\&={\frac {{\color {blue}{\cancel {4}}}b^{2}{\color {purple}{\cancel {+4ab}}}+{\color {red}{\cancel {4}}}a^{2}{\color {blue}{\cancel {-3b^{2}}}}-{\color {purple}{\overset {2}{\cancel {6}}}}ab{\color {red}{\cancel {-3a^{2}}}}}{12}}\\&={\frac {b^{2}-2ab+a^{2}}{12}}\\&={\frac {(b-a)^{2}}{12}}.\\\end{aligned}}}

${\displaystyle \Box }$
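The integral for ${\displaystyle \mathbb {E} [X^{2}]}$  can also be evaluated numerically (a Python sketch; the helper name `uniform_var` and the midpoint-rule step count are ours):

```python
def uniform_var(a, b, steps=100_000):
    # midpoint rule for E[X^2] with pdf 1/(b-a) on [a, b]
    h = (b - a) / steps
    second = sum((a + (i + 0.5) * h) ** 2 / (b - a) * h for i in range(steps))
    mean = (a + b) / 2
    return second - mean ** 2

# agrees with (b-a)^2 / 12 up to quadrature error
assert abs(uniform_var(1.0, 4.0) - 9 / 12) < 1e-6
```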

Proposition. (Variance of gamma, exponential and chi-squared r.v.'s) Let ${\displaystyle X\sim \operatorname {Gamma} (\alpha ,\lambda )}$ , ${\displaystyle Y\sim \operatorname {Exp} (\lambda )}$ , and ${\displaystyle Z\sim \chi _{\nu }^{2}}$ . Then, ${\displaystyle \operatorname {Var} (X)=\alpha /\lambda ^{2}}$ , ${\displaystyle \operatorname {Var} (Y)=1/\lambda ^{2}}$ , and ${\displaystyle \operatorname {Var} (Z)=2\nu }$ .

Proof.

• Similarly, it suffices to prove the formula for variance of gamma r.v.'s.
• {\displaystyle {\begin{aligned}\mathbb {E} [X^{2}]&=\int _{0}^{\infty }{\color {red}x^{2}}\cdot {\frac {\lambda ^{\alpha }x^{\alpha -1}e^{-\lambda x}}{\Gamma (\alpha )}}\,dx\\&={\frac {\color {purple}(\alpha +1)\alpha }{\color {blue}\lambda ^{2}}}\underbrace {\int _{0}^{\infty }{\frac {\lambda ^{\alpha {\color {blue}+2}}x^{\alpha {\color {red}+2}-1}e^{-\lambda x}}{\Gamma (\alpha {\color {purple}+2})}}\,dx} _{=F(\infty )=1},&F{\text{ is the cdf of }}\operatorname {Gamma} (\alpha +2,\lambda ),\\&={\frac {(\alpha +1)\alpha }{\lambda ^{2}}}.\\\end{aligned}}}

• It follows that ${\displaystyle \operatorname {Var} (X)=\mathbb {E} [X^{2}]-(\mathbb {E} [X])^{2}={\frac {(\alpha +1)\alpha }{\lambda ^{2}}}-{\frac {\alpha ^{2}}{\lambda ^{2}}}={\frac {\alpha }{\lambda ^{2}}}.}$
• Since ${\displaystyle \operatorname {Exp} (\lambda )\equiv \operatorname {Gamma} (1,\lambda )}$ , ${\displaystyle \operatorname {Var} (Y)=1/\lambda ^{2}}$  by substituting ${\displaystyle \alpha =1}$ .
• Since ${\displaystyle \chi _{\nu }^{2}\equiv \operatorname {Gamma} (\nu /2,1/2)}$ , ${\displaystyle \operatorname {Var} (Z)=(\nu /2)/(1/2)^{2}=2\nu }$  by substituting ${\displaystyle \alpha =\nu /2}$  and ${\displaystyle \lambda =1/2}$ .

${\displaystyle \Box }$
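A numerical check of ${\displaystyle \operatorname {Var} (X)=\alpha /\lambda ^{2}}$  by quadrature against the gamma pdf (a Python sketch; the helper name `gamma_var`, the integration cutoff, and the step count are ours):

```python
from math import exp, gamma as gamma_fn

def gamma_var(alpha, lam, upper=30.0, steps=150_000):
    # midpoint quadrature on [0, upper]; for the parameters used below,
    # the tail beyond upper is negligible
    h = upper / steps
    mean = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        pdf = lam ** alpha * x ** (alpha - 1) * exp(-lam * x) / gamma_fn(alpha)
        mean += x * pdf * h
        second += x * x * pdf * h
    return second - mean ** 2

# agrees with alpha / lam^2 up to quadrature error
assert abs(gamma_var(3.0, 2.0) - 3.0 / 2.0 ** 2) < 1e-5
```

Setting ${\displaystyle \alpha =1}$  recovers the exponential case, and ${\displaystyle \alpha =\nu /2,\lambda =1/2}$  the chi-squared case, as in the proof.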

Proposition. (Variance of beta r.v.'s) Let ${\displaystyle X\sim \operatorname {Beta} (\alpha ,\beta )}$ . Then, ${\displaystyle \operatorname {Var} (X)={\frac {\alpha \beta }{(\alpha +\beta )^{2}(\alpha +\beta +1)}}}$ .

Proof.

• {\displaystyle {\begin{aligned}\mathbb {E} [X^{2}]&=\int _{0}^{1}{\color {red}x^{2}}\cdot {\frac {\Gamma (\alpha +\beta )}{\Gamma (\alpha )\Gamma (\beta )}}x^{\alpha -1}(1-x)^{\beta -1}\,dx\\&={\frac {\color {purple}(\alpha +1)\alpha }{\color {blue}(\alpha +\beta +1)(\alpha +\beta )}}\underbrace {\int _{0}^{1}{\frac {\Gamma (\alpha +\beta {\color {blue}+2})}{\Gamma (\alpha {\color {purple}+2})\Gamma (\beta )}}x^{\alpha {\color {red}+2}-1}(1-x)^{\beta -1}\,dx} _{F(1)=1},&F{\text{ is the cdf of }}\operatorname {Beta} (\alpha +2,\beta ),\\&={\frac {(\alpha +1)\alpha }{(\alpha +\beta +1)(\alpha +\beta )}}.\end{aligned}}}

• It follows that
{\displaystyle {\begin{aligned}\operatorname {Var} (X)&=\mathbb {E} [X^{2}]-(\mathbb {E} [X])^{2}={\frac {(\alpha +1)\alpha }{(\alpha +\beta +1)(\alpha +\beta )}}-{\frac {\alpha ^{2}}{(\alpha +\beta )^{2}}}\\&={\frac {(\alpha +1)(\alpha )(\alpha +\beta )-\alpha ^{2}(\alpha +\beta +1)}{(\alpha +\beta )^{2}(\alpha +\beta +1)}}\\&={\frac {\alpha ({\cancel {\alpha ^{2}+\alpha \beta +\alpha }}+\beta {\cancel {-\alpha ^{2}-\alpha \beta -\alpha }})}{(\alpha +\beta )^{2}(\alpha +\beta +1)}}\\&={\frac {\alpha \beta }{(\alpha +\beta )^{2}(\alpha +\beta +1)}}.\\\end{aligned}}}

${\displaystyle \Box }$
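The same kind of quadrature check works for the beta variance (a Python sketch; the helper name `beta_var` and the step count are ours; the midpoint rule is fine here since ${\displaystyle \alpha ,\beta \geq 1}$  avoids endpoint blow-up):

```python
from math import gamma as gamma_fn

def beta_var(a, b, steps=200_000):
    # midpoint quadrature of the beta pdf on (0, 1)
    c = gamma_fn(a + b) / (gamma_fn(a) * gamma_fn(b))
    h = 1.0 / steps
    mean = second = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        pdf = c * x ** (a - 1) * (1 - x) ** (b - 1)
        mean += x * pdf * h
        second += x * x * pdf * h
    return second - mean ** 2

# agrees with ab / ((a+b)^2 (a+b+1)): here 6 / (25 * 6) = 0.04
assert abs(beta_var(2.0, 3.0) - 0.04) < 1e-6
```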

Proposition. (Undefined variance of Cauchy r.v.'s) Let ${\displaystyle X\sim \operatorname {Cauchy} (\theta )}$ . Then, ${\displaystyle \operatorname {Var} (X)}$  is undefined.

Proof. It follows from the proposition about undefined mean of Cauchy r.v.'s and the formula ${\displaystyle \operatorname {Var} (X)=\mathbb {E} [X^{2}]-(\mathbb {E} [X])^{2}}$  (arbitrary term minus undefined term is undefined).

${\displaystyle \Box }$

Proposition. (Variance of normal r.v.'s) Let ${\displaystyle X\sim {\mathcal {N}}(\mu ,\sigma ^{2})}$ . Then, ${\displaystyle \operatorname {Var} (X)=\sigma ^{2}}$ .

Proof.

• Let ${\displaystyle Z={\frac {X-\mu }{\sigma }}\sim {\mathcal {N}}(0,1)}$ .
• {\displaystyle {\begin{aligned}\mathbb {E} [Z^{2}]&=\int _{-\infty }^{\infty }x^{2}\varphi (x)\,dx\\&={\frac {1}{\sqrt {2\pi }}}\int _{-\infty }^{\infty }x^{2}e^{-x^{2}/2}\,dx\\&=-{\frac {1}{\sqrt {2\pi }}}\int _{-\infty }^{\infty }xd(e^{-x^{2}/2})\\&=-{\frac {1}{\sqrt {2\pi }}}\left([xe^{-x^{2}/2}]_{-\infty }^{\infty }-\int _{-\infty }^{\infty }e^{-x^{2}/2}\,dx\right)&{\text{by integration by parts}},\\&=-{\frac {1}{\sqrt {2\pi }}}\left(0-0-\int _{-\infty }^{\infty }e^{-x^{2}/2}\,dx\right)&{\text{since exponential function }}\downarrow {\text{ much faster than linear function, or by L'Hospital rule}},\\&=\underbrace {\int _{-\infty }^{\infty }\varphi (x)\,dx} _{=\Phi (\infty )=1}\\&=1.\end{aligned}}}

• It follows that ${\displaystyle \operatorname {Var} (Z)=\mathbb {E} [Z^{2}]-(\mathbb {E} [Z])^{2}=1-0=1}$ .
• Hence, ${\displaystyle \operatorname {Var} (X)=\operatorname {Var} (\sigma Z+\mu )=\sigma ^{2}\operatorname {Var} (Z)=\sigma ^{2}}$ .

${\displaystyle \Box }$
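The key step ${\displaystyle \mathbb {E} [Z^{2}]=1}$  can be checked by quadrature (a Python sketch; the helper name `std_normal_second_moment`, the truncation limit, and the step count are ours):

```python
from math import exp, pi, sqrt

def std_normal_second_moment(limit=10.0, steps=200_000):
    # midpoint quadrature of x^2 * phi(x) on [-limit, limit];
    # the tails beyond +/- 10 are negligible
    h = 2 * limit / steps
    total = 0.0
    for i in range(steps):
        x = -limit + (i + 0.5) * h
        total += x * x * exp(-x * x / 2) / sqrt(2 * pi) * h
    return total

# E[Z^2] = 1, hence Var(Z) = 1 - 0^2 = 1 and Var(sigma*Z + mu) = sigma^2
assert abs(std_normal_second_moment() - 1.0) < 1e-6
```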

Exercise.

Choose correct statement(s).

• ${\displaystyle \operatorname {Var} (aX)=0\Rightarrow X={\text{non-random constant}}}$  for each constant ${\displaystyle a}$ .
• ${\displaystyle \operatorname {Var} (aX+b)=a^{2}\operatorname {Var} (X)}$  for each random variable ${\displaystyle X}$ , and for each constant ${\displaystyle a,b}$ .
• ${\displaystyle \operatorname {Var} (X)=\mathbb {E} [X]\Rightarrow X={\text{non-random constant}}}$
• ${\displaystyle \operatorname {Var} (X)\leq 0}$  if ${\displaystyle X\leq 0}$
• Standard deviation of random variable ${\displaystyle X}$ , ${\displaystyle \sigma <\operatorname {Var} (X)}$

## Coefficient of variation

Definition. (Coefficient of variation) The coefficient of variation is the ratio of the standard deviation to the mean, i.e. ${\displaystyle \sigma /\mu }$ .

Remark.

• It is also known as the relative standard deviation, since it measures dispersion relative to the mean, i.e. it shows the extent of dispersion in relation to the mean.
• Thus, it describes dispersion more informatively than the standard deviation alone.
• Also, the coefficient of variation has no unit, so it is useful for comparing dispersion between different data sets.
• However, if the mean is zero, then the coefficient of variation is undefined, which is a limitation.

Example. If ${\displaystyle \mathbb {E} [X]=10}$  and ${\displaystyle \sigma _{X}=2}$ , then for each ${\displaystyle a\neq 0}$ , the coefficient of variation of ${\displaystyle aX}$  is

${\displaystyle {\frac {\sqrt {\operatorname {Var} (aX)}}{\mathbb {E} [aX]}}={\frac {\sqrt {a^{2}\operatorname {Var} (X)}}{a\mathbb {E} [X]}}={\frac {|a|{\sqrt {\operatorname {Var} (X)}}}{a\mathbb {E} [X]}}={\begin{cases}1/5,\quad a>0;\\-1/5,\quad a<0,\end{cases}}}$

while the coefficient of variation of ${\displaystyle X}$  is 1/5, which equals that of ${\displaystyle aX}$  if ${\displaystyle a>0}$ , and equals the negative of that of ${\displaystyle aX}$  if ${\displaystyle a<0}$  (they have the same magnitude, i.e. absolute value). This is expected, since scaling a random variable should not affect the extent of its dispersion relative to its mean.
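The calculation above can be mirrored in a couple of lines (a Python sketch; the helper name `coeff_of_variation` is ours):

```python
def coeff_of_variation(mean, sd):
    # ratio of standard deviation to mean (undefined for mean == 0)
    return sd / mean

# E[X] = 10 and sd = 2, as in the example
assert coeff_of_variation(10, 2) == 0.2
# scaling X by a: the sd becomes |a| * 2 and the mean becomes 10 * a,
# so the sign flips for a < 0 while the magnitude is unchanged
a = -3
assert coeff_of_variation(a * 10, abs(a) * 2) == -0.2
```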

Exercise.

1 Assume ${\displaystyle \mathbb {E} [X]}$  is increased to 20. Calculate ${\displaystyle \sigma _{X}}$  such that the coefficient of variation of ${\displaystyle X}$  remains unchanged.

 1 2 4 5 8

2 Calculate ${\displaystyle \sigma _{X}}$  such that the standard deviation of ${\displaystyle X}$  remains unchanged.

 1 2 4 5 8

Remark.

• In general, when the mean is negative, the coefficient of variation is nonpositive, since the standard deviation is always nonnegative.

## Quantile

Next, we discuss quantiles. In particular, the median and interquartile range are closely related to quantiles.

Definition. (Quantile) Quantile of order ${\displaystyle \alpha }$  (${\displaystyle \alpha }$ th quantile) of random variable ${\displaystyle X}$  is

${\displaystyle F^{-1}(\alpha )=\inf\{x\in \mathbb {R} :F(x)>\alpha \}.}$

Remark.

• Definition of quantile is not unique. There are several alternative definitions, namely

${\displaystyle \inf\{x\in \mathbb {R} :F(x)\geq \alpha \},\sup\{x\in \mathbb {R} :F(x)\leq \alpha \}{\text{ and }}\sup\{x\in \mathbb {R} :F(x)<\alpha \}.}$

• If ${\displaystyle F}$  is strictly increasing, all alternative definitions become equivalent and equal the inverse of the cdf at ${\displaystyle \alpha }$ , namely ${\displaystyle F^{-1}(\alpha )}$ , and thus we can calculate the ${\displaystyle \alpha }$ th quantile by solving the equation ${\displaystyle F(x)=\alpha }$ .
• Practical applications focus only on ${\displaystyle \alpha \in (0,1)}$ .

The following are some terminologies related to quantiles.

Definition. (Percentile) The ${\displaystyle (100\alpha )}$ th percentile is ${\displaystyle \alpha }$ th quantile.

Example. 70th percentile is 0.7th quantile.

Definition. (Median) The median is 0.5th quantile.

Definition. (Quartile) The ${\displaystyle j}$ th quartile is ${\displaystyle (j/4)}$ th quantile in which ${\displaystyle j\in \{1,2,3\}}$  .

Example. 2nd quartile is 0.5th quantile, which is also median.

Definition. (Interquartile range) The interquartile range is 3rd quartile minus 1st quartile.

Median and interquartile range measure centrality and dispersion respectively. Recall that mean and variance measure the same things respectively. One advantage of the median and interquartile range is robustness: they are always defined, while the mean and variance can be infinite, in which case the latter fail to measure centrality and dispersion. However, the median and interquartile range also have some disadvantages, e.g. they may be more difficult to compute, and may not be very accurate.

Example. (Quantile of uniform distribution) The ${\displaystyle \alpha }$ th quantile of uniform distribution with parameters ${\displaystyle a}$  and ${\displaystyle b>a}$  is

${\displaystyle a+\alpha (b-a),}$

since
${\displaystyle F(x)={\frac {x-a}{b-a}}\mathbf {1} \{a\leq x<b\}+\mathbf {1} \{x\geq b\}\Rightarrow F(x)=\alpha \Leftrightarrow x=a+\alpha (b-a),}$

and we can see that ${\displaystyle x\in (a,b)}$  if ${\displaystyle \alpha \in (0,1)}$ .

Then, median of uniform distribution is

${\displaystyle {\frac {a+b}{2}}}$

which is the same as its mean, and the interquartile range of uniform distribution is
${\displaystyle {\cancel {a}}+(3/4)(b-a){\cancel {-a}}-(1/4)(b-a)={\frac {b-a}{2}},}$

which is different from its variance, namely ${\displaystyle {\frac {(b-a)^{2}}{12}}}$ .
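The quantile formula for the uniform distribution makes these two quantities one-liners (a Python sketch; the helper name `uniform_quantile` and the endpoints below are ours):

```python
def uniform_quantile(a, b, alpha):
    # inverse of F(x) = (x - a) / (b - a) on (a, b), for alpha in (0, 1)
    return a + alpha * (b - a)

a, b = 2.0, 10.0
median = uniform_quantile(a, b, 0.5)
iqr = uniform_quantile(a, b, 0.75) - uniform_quantile(a, b, 0.25)
assert median == (a + b) / 2   # median and mean coincide
assert iqr == (b - a) / 2      # interquartile range
```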

Exercise.

Choose correct statement(s).

• 20th quantile is 0.2th percentile
• 4th quartile is 1st quantile
• 2nd quantile is undefined.
• 0th quantile = 0th percentile = 0th quartile.
• Interquartile range must be nonnegative.
• Median must be nonnegative.

## Mode

Mode is another measure of centrality.

Definition. (Mode)

• The mode of a pmf (pdf) is the value of ${\displaystyle x}$  at which the pmf (pdf) takes its maximum value (has its local maximum).

Remark.

• The mode is the value that is most likely to be sampled (for pmf).
• Mode is less frequently used than mean.

Example. The modes of the pmf of the number coming up from throwing a fair six-faced die are 1, 2, 3, 4, 5 and 6, since the probability for each of these numbers coming up is 1/6, so the pmf takes its maximum value (1/6) at each of these numbers.

Exercise.

Suppose the die is loaded such that the probability for the number six coming up is 1/2, while the other numbers are still equally likely to come up. Which of the following is (are) the mode(s) of the pmf now?

 1 2 3 4 5 6

Remark.

• From this example, we can see that the mode is not necessarily unique.
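Finding all modes of a finite pmf is a simple maximum search (a Python sketch; the helper name `modes` and the second, made-up pmf are ours):

```python
def modes(pmf):
    # every outcome attaining the maximum pmf value is a mode
    m = max(pmf.values())
    return sorted(x for x, p in pmf.items() if p == m)

fair_die = {k: 1 / 6 for k in range(1, 7)}
assert modes(fair_die) == [1, 2, 3, 4, 5, 6]   # all six outcomes are modes

skewed = {1: 0.1, 2: 0.4, 3: 0.4, 4: 0.1}      # an illustrative pmf with two modes
assert modes(skewed) == [2, 3]
```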

## Covariance and correlation coefficients

In this section, we will discuss two important properties of joint distributions, namely covariance and correlation coefficients. As we will see, covariance is related to variance in some sense, and the correlation coefficient is closely related to correlation.

Definition. (Covariance) For each random variable ${\displaystyle X,Y}$ , the covariance of ${\displaystyle X,Y}$  is

${\displaystyle \operatorname {Cov} (X,Y)=\mathbb {E} [(X-\mathbb {E} [X])(Y-\mathbb {E} [Y])].}$

Definition. (Correlation coefficient) For each random variable ${\displaystyle X,Y}$  such that ${\displaystyle \operatorname {Var} (X),\operatorname {Var} (Y)>0}$ , the correlation coefficient is

${\displaystyle \rho (X,Y)={\frac {\operatorname {Cov} (X,Y)}{\sqrt {\operatorname {Var} (X)\operatorname {Var} (Y)}}}.}$

Both covariance and correlation coefficient measure linear relationship between ${\displaystyle X}$  and ${\displaystyle Y}$ . As we will see, ${\displaystyle \rho (X,Y)\in [-1,1]}$ , ${\displaystyle X,Y}$  are more highly correlated as ${\displaystyle |\rho (X,Y)|}$  increases, and ${\displaystyle X}$  has a linear relationship with ${\displaystyle Y}$  if ${\displaystyle |\rho (X,Y)|=1}$ .

Proposition. (Properties of covariance)

(i) (symmetry) for each random variable ${\displaystyle X,Y}$ ,

${\displaystyle \operatorname {Cov} (X,Y)=\operatorname {Cov} (Y,X)}$

(ii) for each random variable ${\displaystyle X}$ ,
${\displaystyle \operatorname {Cov} (X,X)=\operatorname {Var} (X)}$

(iii) (alternative formula of covariance)
${\displaystyle \operatorname {Cov} (X,Y)=\mathbb {E} [XY]-\mathbb {E} [X]\mathbb {E} [Y]}$

(iv) for each constant ${\displaystyle a_{1},\ldots ,a_{n},b_{1},\ldots ,b_{m},c,d}$ , and for all random variables ${\displaystyle X_{1},\ldots ,X_{n},Y_{1},\ldots ,Y_{m}}$ ,
${\displaystyle \operatorname {Cov} \left(\sum _{i=1}^{n}(a_{i}X_{i}+c),\sum _{j=1}^{m}(b_{j}Y_{j}+d)\right)=\sum _{i=1}^{n}\sum _{j=1}^{m}a_{i}b_{j}\operatorname {Cov} (X_{i},Y_{j})}$

(v) for each random variable ${\displaystyle X_{1},\ldots ,X_{n}}$ ,
${\displaystyle \operatorname {Var} (X_{1}+\cdots +X_{n})=\sum _{i=1}^{n}\operatorname {Var} (X_{i})+2\sum _{1\leq i<j\leq n}^{}\operatorname {Cov} (X_{i},X_{j})}$

Proof.

(i)

${\displaystyle \operatorname {Cov} (X,Y)=\mathbb {E} [(X-\mathbb {E} [X])(Y-\mathbb {E} [Y])]=\mathbb {E} [(Y-\mathbb {E} [Y])(X-\mathbb {E} [X])]=\operatorname {Cov} (Y,X)}$

(ii)
${\displaystyle \operatorname {Cov} (X,X)=\mathbb {E} [(X-\mathbb {E} [X])(X-\mathbb {E} [X])]=\mathbb {E} [(X-\mathbb {E} [X])^{2}]=\operatorname {Var} (X)}$

(iii)
{\displaystyle {\begin{aligned}\operatorname {Cov} (X,Y)&=\mathbb {E} [(X-\mathbb {E} [X])(Y-\mathbb {E} [Y])]\\&=\mathbb {E} [XY-X\mathbb {E} [Y]-Y\mathbb {E} [X]+\mathbb {E} [X]\mathbb {E} [Y]]\\&=\mathbb {E} [XY]-\mathbb {E} [Y]\mathbb {E} [X]{\cancel {-\mathbb {E} [X]\mathbb {E} [Y]+\mathbb {E} [X]\mathbb {E} [Y]}}\qquad {\text{by linearity}}\\&=\mathbb {E} [XY]-\mathbb {E} [X]\mathbb {E} [Y]\end{aligned}}}

(iv)
{\displaystyle {\begin{aligned}\operatorname {Cov} \left(\sum _{i=1}^{n}(a_{i}X_{i}+c),\sum _{j=1}^{m}(b_{j}Y_{j}+d)\right)&=\mathbb {E} \left[\left(\sum _{i=1}^{n}(a_{i}X_{i}+c)-\sum _{i=1}^{n}\mathbb {E} [a_{i}X_{i}+c]\right)\left(\sum _{j=1}^{m}(b_{j}Y_{j}+d)-\sum _{j=1}^{m}\mathbb {E} [b_{j}Y_{j}+d]\right)\right]\\&=\mathbb {E} \left[\sum _{i=1}^{n}(a_{i}X_{i}-\mathbb {E} [a_{i}X_{i}])\sum _{j=1}^{m}(b_{j}Y_{j}-\mathbb {E} [b_{j}Y_{j}])\right]\\&=\mathbb {E} \left[\sum _{i=1}^{n}\sum _{j=1}^{m}(a_{i}X_{i}-\mathbb {E} [a_{i}X_{i}])(b_{j}Y_{j}-\mathbb {E} [b_{j}Y_{j}])\right]\\&=\sum _{i=1}^{n}\sum _{j=1}^{m}\mathbb {E} [(a_{i}X_{i}-a_{i}\mathbb {E} [X_{i}])(b_{j}Y_{j}-b_{j}\mathbb {E} [Y_{j}])]&{\text{by linearity}}\\&=\sum _{i=1}^{n}\sum _{j=1}^{m}a_{i}b_{j}\mathbb {E} [(X_{i}-\mathbb {E} [X_{i}])(Y_{j}-\mathbb {E} [Y_{j}])]\\&=\sum _{i=1}^{n}\sum _{j=1}^{m}a_{i}b_{j}\operatorname {Cov} (X_{i},Y_{j})\end{aligned}}}

(v)
{\displaystyle {\begin{aligned}\operatorname {Var} \left(\sum _{i=1}^{n}X_{i}\right)&{\overset {\text{(ii)}}{=}}\operatorname {Cov} \left(\sum _{i=1}^{n}X_{i},\sum _{j=1}^{n}X_{j}\right)\\&{\overset {\text{(iv)}}{=}}\sum _{i=1}^{n}\sum _{j=1}^{n}\operatorname {Cov} (X_{i},X_{j})\\&=\sum _{1\leq i=j\leq n}^{}\operatorname {Cov} (X_{i},X_{j})+\sum _{1\leq i\neq j\leq n}^{}\operatorname {Cov} (X_{i},X_{j})\\&{\overset {\text{(ii)}}{=}}\sum _{i=1}^{n}\operatorname {Var} (X_{i})+2\sum _{1\leq i<j\leq n}^{}\operatorname {Cov} (X_{i},X_{j})&{\text{by symmetry (i)}}\end{aligned}}}

${\displaystyle \Box }$
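Property (iii), the alternative formula for covariance, is easy to verify numerically on a small joint pmf (a Python sketch; the helper name `cov` and the joint pmf below are our own illustrative choices):

```python
def cov(joint):
    # covariance from a joint pmf {(x, y): probability},
    # via the alternative formula E[XY] - E[X]E[Y]
    ex = sum(x * p for (x, y), p in joint.items())
    ey = sum(y * p for (x, y), p in joint.items())
    exy = sum(x * y * p for (x, y), p in joint.items())
    return exy - ex * ey

# an illustrative joint pmf on {0,1} x {0,1}
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}
ex, ey = 0.5, 0.6  # marginal means, computed by hand for this joint pmf

# the defining formula E[(X - E[X])(Y - E[Y])] gives the same value
direct = sum((x - ex) * (y - ey) * p for (x, y), p in joint.items())
assert abs(cov(joint) - direct) < 1e-12
```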

Next, we will discuss correlation coefficients. The following is the definition of correlation between two random variables.

Definition. (Correlation between two random variables) Random variables ${\displaystyle X,Y}$  are uncorrelated if ${\displaystyle \operatorname {Cov} (X,Y)=0}$ , and are correlated if ${\displaystyle \operatorname {Cov} (X,Y)\neq 0}$ .

Remark.

• ${\displaystyle \operatorname {Cov} (X,Y)=0\Leftrightarrow \rho (X,Y)=0}$ , and ${\displaystyle \operatorname {Cov} (X,Y)\neq 0\Leftrightarrow \rho (X,Y)\neq 0}$  if ${\displaystyle \operatorname {Var} (X)\neq 0}$  and