# Probability/Joint Distributions and Independence

## Motivation

Suppose we are given a pmf of a discrete random variable ${\displaystyle X}$  and a pmf of a discrete random variable ${\displaystyle Y}$ . For example,

${\displaystyle f_{X}(x)=(\mathbf {1} \{x=0\}+\mathbf {1} \{x=1\})/2\quad {\text{and}}\quad f_{Y}(y)=(\mathbf {1} \{y=0\}+\mathbf {1} \{y=2\})/2}$

With only such information, we cannot tell how ${\displaystyle X}$  and ${\displaystyle Y}$  are related: they may be related or unrelated.

For example, the random variable ${\displaystyle X}$  may be defined by ${\displaystyle X=1}$  if a head comes up and ${\displaystyle X=0}$  otherwise when tossing a fair coin, and the random variable ${\displaystyle Y}$  may be defined by ${\displaystyle Y=2}$  if a head comes up and ${\displaystyle Y=0}$  otherwise when tossing the coin a second time. In this case, ${\displaystyle X}$  and ${\displaystyle Y}$  are unrelated.

Another possibility is that the random variable ${\displaystyle Y}$  is defined as ${\displaystyle Y=2X}$ , i.e. ${\displaystyle Y=2}$  if a head comes up in the first toss and ${\displaystyle Y=0}$  otherwise. In this case, ${\displaystyle X}$  and ${\displaystyle Y}$  are related.

Yet, in both of the above examples, the pmfs of ${\displaystyle X}$  and ${\displaystyle Y}$  are exactly the ones given above.

Therefore, to tell the relationship between ${\displaystyle X}$  and ${\displaystyle Y}$ , we define the joint cumulative distribution function, or joint cdf.
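The two coin-tossing setups above can be checked empirically. The following is a minimal simulation sketch (in Python, using only the standard library; all names are illustrative) showing that the marginal behaviour of ${\displaystyle X}$  is the same in both setups, while the joint behaviour differs:

```python
import random

random.seed(0)
N = 100_000

# Setup 1 ("unrelated"): X and Y come from two separate fair-coin tosses.
# Setup 2 ("related"):   Y = 2X, both from a single toss.
indep, linked = [], []
for _ in range(N):
    indep.append((random.randint(0, 1), 2 * random.randint(0, 1)))
    x = random.randint(0, 1)
    linked.append((x, 2 * x))

def frac_x1(pairs):
    # empirical P(X = 1) -- the marginal behaviour of X
    return sum(x for x, _ in pairs) / len(pairs)

def frac_02(pairs):
    # empirical P(X = 0 and Y = 2) -- a joint probability
    return sum(1 for x, y in pairs if x == 0 and y == 2) / len(pairs)

print(frac_x1(indep), frac_x1(linked))  # both near 0.5: same marginals
print(frac_02(indep), frac_02(linked))  # near 0.25 versus exactly 0.0
```

In the two-toss setup, ${\displaystyle \{X=0,Y=2\}}$  occurs about a quarter of the time; when ${\displaystyle Y=2X}$ , it never occurs, even though the marginals agree.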

## Joint distributions

Definition. (Joint cumulative distribution function) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be random variables defined on a sample space ${\displaystyle \Omega }$ . The joint cumulative distribution function (cdf) of random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  is

${\displaystyle F(x_{1},\dotsc ,x_{n})=\mathbb {P} (X_{1}\leq x_{1}\cap \cdots \cap X_{n}\leq x_{n})=\mathbb {P} \left(\bigcap _{i=1}^{n}\{\omega \in \Omega :X_{i}(\omega )\leq x_{i}\}\right).}$

Sometimes, we may want to know the random behaviour of just one of the random variables involved in a joint cdf. We can do this by computing its marginal cdf from the joint cdf. The definition of the marginal cdf is as follows:

Definition. (Marginal cumulative distribution function) The cumulative distribution function ${\displaystyle F_{X_{i}}}$  of each random variable ${\displaystyle X_{i}}$  among the ${\displaystyle n}$  random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  is called the marginal cumulative distribution function (cdf) of ${\displaystyle X_{i}}$ .

Remark. Actually, the marginal cdf of ${\displaystyle X_{i}}$  is simply the cdf of ${\displaystyle X_{i}}$  (which is in one variable). We have already discussed this kind of cdf in previous chapters.

Proposition. (Obtaining marginal cdf from joint cdf) Given a joint cdf ${\displaystyle F(x_{1},\dotsc ,x_{n})}$ , the marginal cdf of ${\displaystyle X_{i}}$  is

${\displaystyle F_{X_{i}}(x)=F(\infty ,\dotsc ,\infty ,\underbrace {x} _{i{\text{-th position}}},\infty ,\dotsc ,\infty ).}$

Proof. When we set every argument other than the ${\displaystyle i}$ -th to ${\displaystyle \infty }$  (more precisely, take the limit as each such argument tends to ${\displaystyle \infty }$ ), each event ${\displaystyle \{X_{j}\leq \infty \}}$  is the whole sample space ${\displaystyle \Omega }$ , so intersecting with it changes nothing; no independence assumption is needed. Hence the joint cdf becomes

{\displaystyle {\begin{aligned}\mathbb {P} (X_{1}\leq \infty \cap \cdots \cap X_{i-1}\leq \infty \cap X_{i}\leq x\cap X_{i+1}\leq \infty \cap \cdots \cap X_{n}\leq \infty )&=\mathbb {P} (\Omega \cap \cdots \cap \Omega \cap \{X_{i}\leq x\}\cap \Omega \cap \cdots \cap \Omega )\\&=\mathbb {P} (X_{i}\leq x)\\&=F_{X_{i}}(x).\end{aligned}}}

${\displaystyle \Box }$

Remark. In general, we cannot deduce the joint cdf from a given set of marginal cdf's.

Example. Consider the joint cdf of random variables ${\displaystyle X}$  and ${\displaystyle Y}$ :

${\displaystyle F(x,y)=1-e^{-x}-e^{-y}+e^{-x-y},\quad x,y\geq 0.}$

The marginal cdf of ${\displaystyle Y}$  is
${\displaystyle F_{Y}(y)=\lim _{x\to \infty }(1-e^{-x}-e^{-y}+e^{-x-y})=1-e^{-y},\quad y\geq 0.}$

Similar to the one-variable case, we have joint pmf and joint pdf. Also, analogously, we have marginal pmf and marginal pdf.

Definition. (Joint probability mass function) The joint probability mass function (joint pmf) of ${\displaystyle X_{1},\dotsc ,X_{n}}$  is

${\displaystyle f(x_{1},\dotsc ,x_{n})=\mathbb {P} {\big (}(X_{1},\dotsc ,X_{n})=(x_{1},\dotsc ,x_{n}){\big )},\quad (x_{1},\dotsc ,x_{n})\in \mathbb {R} ^{n}.}$

Definition. (Marginal probability mass function) The marginal probability mass function (marginal pmf) of each random variable ${\displaystyle X_{i}}$  among the ${\displaystyle n}$  random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  is

${\displaystyle f_{X_{i}}(x)=\mathbb {P} (X_{i}=x),\quad x\in \mathbb {R} .}$

Proposition. (Obtaining marginal pmf from joint pmf) For discrete random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  with joint pmf ${\displaystyle f}$ , the marginal pmf of ${\displaystyle X_{i}}$  is

${\displaystyle f_{X_{i}}({\color {red}x})=\underbrace {\sum _{u_{1}}\cdots \sum _{u_{i-1}}\sum _{u_{i+1}}\cdots \sum _{u_{n}}} _{n-1\;{\text{summations}}}f(u_{1},\dotsc ,u_{i-1},{\color {red}x},u_{i+1},\dotsc ,u_{n}).}$

Proof. Consider the case in which there are only two random variables, say ${\displaystyle X}$  and ${\displaystyle Y}$ . Then, we have

${\displaystyle \sum _{\color {green}y}f({\color {red}x},{\color {green}y})=\sum _{\color {green}y}\mathbb {P} (X={\color {red}x}\cap {\color {green}Y=y})=\mathbb {P} (X={\color {red}x})\qquad {\text{by law of total probability}}.}$

Similarly, in the general case, we have
{\displaystyle {\begin{aligned}\sum _{\color {green}u_{n}}f(u_{1},\dotsc ,u_{i-1},{\color {red}x},u_{i+1},\dotsc ,{\color {green}u_{n}})&=\sum _{\color {green}u_{n}}\mathbb {P} (X_{1}=u_{1}\cap \cdots \cap X_{i-1}=u_{i-1}\cap X_{i}={\color {red}x}\cap X_{i+1}=u_{i+1}\cap \cdots \cap X_{n-1}=u_{n-1}\cap {\color {green}X_{n}=u_{n}})\\&=\mathbb {P} (X_{1}=u_{1}\cap \cdots \cap X_{i-1}=u_{i-1}\cap X_{i}={\color {red}x}\cap X_{i+1}=u_{i+1}\cap \cdots \cap X_{n-1}=u_{n-1})\qquad {\text{by law of total probability}}.\end{aligned}}}

Then, we perform a similar process on each of the other variables (${\displaystyle n-2}$  remaining), with one extra summation sign added for each. Thus, in total we have ${\displaystyle n-1}$  summation signs, and we finally obtain the desired result. ${\displaystyle \Box }$

Remark. This process is sometimes called 'summing over all possible values of the other variables'.

Example. Suppose we throw a fair six-sided die twice. Let ${\displaystyle X}$  be the number facing up in the first throw, and ${\displaystyle Y}$  be the number facing up in the second throw. Then, since the two throws are independent, the joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)=\mathbb {P} (X=x\cap Y=y)={\frac {1}{6}}\cdot {\frac {1}{6}}={\frac {1}{36}}.}$

in which ${\displaystyle x,y\in \{1,2,3,4,5,6\}}$ , and ${\displaystyle f(x,y)=0}$  otherwise. Also, the marginal pmf of ${\displaystyle X}$  is
${\displaystyle f_{X}(x)=\sum _{y}f(x,y)=f(x,1)+f(x,2)+\cdots +f(x,6)=6(1/36)={\frac {1}{6}}}$

in which ${\displaystyle x\in \{1,2,3,4,5,6\}}$ , and ${\displaystyle f_{X}(x)=0}$  otherwise.

By symmetry (replace all ${\displaystyle X}$  with ${\displaystyle Y}$  and replace all ${\displaystyle x}$  with ${\displaystyle y}$ ), the marginal pmf of ${\displaystyle Y}$  is

${\displaystyle f_{Y}(y)={\frac {1}{6}}}$

in which ${\displaystyle y\in \{1,2,3,4,5,6\}}$ , and ${\displaystyle f_{Y}(y)=0}$  otherwise.
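We can sanity-check the dice example with a quick simulation (a Python sketch using only the standard library; names are illustrative):

```python
import random
from collections import Counter

random.seed(1)
N = 360_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(N)]

joint = Counter(rolls)
p11 = joint[(1, 1)] / N                       # empirical f(1, 1), near 1/36
px3 = sum(1 for x, _ in rolls if x == 3) / N  # empirical f_X(3), near 1/6
print(round(p11, 4), round(px3, 4))
```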

Exercise. Suppose there are two red balls and one blue ball in a box, and we draw two balls one by one from the box with replacement. Let ${\displaystyle X=1}$  if the ball from the first draw is red, and ${\displaystyle X=0}$  otherwise. Let ${\displaystyle Y=1}$  if the ball from the second draw is red, and ${\displaystyle Y=0}$  otherwise.

1 Calculate the marginal pmf of ${\displaystyle X}$ .

• ${\displaystyle f_{X}(x)=(\mathbf {1} \{x=0\}+2\cdot \mathbf {1} \{x=1\})/3}$
• ${\displaystyle f_{X}(x)=(\mathbf {1} \{x=1\}+2\cdot \mathbf {1} \{x=0\})/3}$
• ${\displaystyle f_{X}(x)=2/3}$
• ${\displaystyle f_{X}(x)=(\mathbf {1} \{x=1\}+\mathbf {1} \{x=0\})/2}$

2 Calculate the joint pmf of ${\displaystyle (X,Y)}$ .

• ${\displaystyle f(x,y)=(1/9)(\mathbf {1} \{(x,y)=(0,0)\}+2\cdot \mathbf {1} \{(x,y)=(0,1)\}+2\cdot \mathbf {1} \{(x,y)=(1,0)\}+4\cdot \mathbf {1} \{(x,y)=(1,1)\})}$
• ${\displaystyle f(x,y)=(1/9)(4\cdot \mathbf {1} \{(x,y)=(0,0)\}+2\cdot \mathbf {1} \{(x,y)=(0,1)\}+2\cdot \mathbf {1} \{(x,y)=(1,0)\}+\mathbf {1} \{(x,y)=(1,1)\})}$
• ${\displaystyle f(x,y)=(2/9)(\mathbf {1} \{(x,y)=(0,0)\}+\mathbf {1} \{(x,y)=(0,1)\}+\mathbf {1} \{(x,y)=(1,0)\}+\mathbf {1} \{(x,y)=(1,1)\})}$

Exercise. Recall the example in the motivation section.

(a) Suppose we toss a fair coin twice. Let ${\displaystyle X=\mathbf {1} \{{\text{head comes up}}\}}$  and ${\displaystyle Y=2\cdot \mathbf {1} \{{\text{head comes up}}\}}$ . Show that joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)={\frac {\mathbf {1} \{x\in \{0,1\}\cap y\in \{0,2\}\}}{4}}.}$

(b) Suppose we toss a fair coin once. Let ${\displaystyle X=\mathbf {1} \{{\text{head comes up}}\}}$  and ${\displaystyle Y=2X}$ . Show that joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)={\frac {\mathbf {1} \{x\in \{0,1\}\cap y=2x\}}{2}}.}$

(c) Show that marginal pmf of ${\displaystyle X}$  and ${\displaystyle Y}$  are

${\displaystyle f_{X}(x)={\frac {\mathbf {1} \{x\in \{0,1\}\}}{2}}\quad {\text{and}}\quad f_{Y}(y)={\frac {\mathbf {1} \{y\in \{0,2\}\}}{2}}}$

in each of the situations in (a) and (b). (Hint: for part (b), substitute the relevant values into the variables inside the indicator.)

Proof.

(a) The support of ${\displaystyle (X,Y)}$  is ${\displaystyle \{(0,0),(0,2),(1,0),(1,2)\}}$ , and each of these four outcomes has probability ${\displaystyle (1/2)(1/2)=1/4}$  since the two tosses are independent. Hence the joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)=\mathbb {P} (X=x\cap Y=y)={\frac {\mathbf {1} \{x\in \{0,1\}\cap y\in \{0,2\}\}}{4}}.}$

(b) The support of ${\displaystyle (X,Y)}$  is ${\displaystyle \{(x,y):x\in \{0,1\}{\text{ and }}y=2x\}}$ , and each of the two outcomes ${\displaystyle (0,0)}$  and ${\displaystyle (1,2)}$  has probability ${\displaystyle 1/2}$ . Hence the joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)={\frac {\mathbf {1} \{x\in \{0,1\}\cap y=2x\}}{2}}.}$

(c) Part (a): marginal pmf of ${\displaystyle X}$  is

${\displaystyle f_{X}(x)=f(x,0)+f(x,2)={\frac {2\mathbf {1} \{x\in \{0,1\}\}}{4}}={\frac {\mathbf {1} \{x\in \{0,1\}\}}{2}},}$

and marginal pmf of ${\displaystyle Y}$  is
${\displaystyle f_{Y}(y)=f(0,y)+f(1,y)={\frac {2\mathbf {1} \{y\in \{0,2\}\}}{4}}={\frac {\mathbf {1} \{y\in \{0,2\}\}}{2}}.}$

Part (b): The marginal pmf of ${\displaystyle X}$  is

${\displaystyle f_{X}(x)=f(x,0)+f(x,2)={\frac {\mathbf {1} \{\overbrace {x\in \{0,1\}\cap 0=2x} ^{x=0}\}}{2}}+{\frac {\mathbf {1} \{\overbrace {x\in \{0,1\}\cap 2=2x} ^{x=1}\}}{2}}={\frac {\overbrace {\mathbf {1} \{x=0\}+\mathbf {1} \{x=1\}} ^{\mathbf {1} \{x=0\cup x=1\}}}{2}}={\frac {\mathbf {1} \{x\in \{0,1\}\}}{2}}.}$

Similarly, the marginal pmf of ${\displaystyle Y}$  is
${\displaystyle f_{Y}(y)=f(0,y)+f(1,y)={\frac {\mathbf {1} \{0\in \{0,1\}\cap y=0\}}{2}}+{\frac {\mathbf {1} \{1\in \{0,1\}\cap y=2\}}{2}}={\frac {\overbrace {\mathbf {1} \{0\in \{0,1\}\}} ^{1}\mathbf {1} \{y=0\}+\overbrace {\mathbf {1} \{1\in \{0,1\}\}} ^{1}\mathbf {1} \{y=2\}}{2}}={\frac {\mathbf {1} \{y\in \{0,2\}\}}{2}}.}$

For jointly continuous random variables, the definition is a generalized version of the one for a single continuous random variable (the univariate case).

Definition. (Jointly continuous random variable) Random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  are jointly continuous if

${\displaystyle \mathbb {P} {\big (}(X_{1},\dotsc ,X_{n})\in S{\big )}=\int \dotsi \int _{S}f(x_{1},\dotsc ,x_{n})\,dx_{1}\cdots \,dx_{n},\quad S\subseteq \mathbb {R} ^{n},}$

for some nonnegative function ${\displaystyle f}$ .

Remark.

• The function ${\displaystyle f}$  is the joint probability density function (joint pdf) of ${\displaystyle X_{1},\dotsc ,X_{n}}$ .
• Similarly, ${\displaystyle f(x_{1},\dotsc ,x_{n})\,dx_{1}\cdots \,dx_{n}}$  can be interpreted as the probability over the 'infinitesimal' region ${\displaystyle [x_{1},x_{1}+dx_{1}]\times \dotsb \times [x_{n},x_{n}+dx_{n}]}$ , and ${\displaystyle f(x_{1},\dotsc ,x_{n})}$  can be interpreted as the density of the probability over that 'infinitesimal' region, i.e. ${\displaystyle {\frac {\mathbb {P} {\big (}X\in [x_{1},x_{1}+dx_{1}]\times \dotsb \times [x_{n},x_{n}+dx_{n}]{\big )}}{dx_{1}\dotsb dx_{n}}}}$ , intuitively and non-rigorously.
• By setting ${\displaystyle S=(-\infty ,x_{1}]\times \dotsb \times (-\infty ,x_{n}]}$ , the cdf is

${\displaystyle F(x_{1},\dotsc ,x_{n})=\underbrace {\int _{-\infty }^{x_{1}}\cdots \int _{-\infty }^{x_{n}}} _{n\;{\text{integrations}}}f(u_{1},\dotsc ,u_{n})\,du_{n}\cdots \,du_{1},}$

which is similar to the univariate case.

Definition. (Marginal probability density function) The pdf ${\displaystyle f_{X_{i}}}$  of each random variable ${\displaystyle X_{i}}$  among the ${\displaystyle n}$  random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  is the marginal probability density function (marginal pdf) of ${\displaystyle X_{i}}$ .

Proposition. (Obtaining marginal pdf from joint pdf) For continuous random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  with joint pdf ${\displaystyle f}$ , the marginal pdf of ${\displaystyle X_{i}}$  is

${\displaystyle f_{X_{i}}({\color {red}x})=\underbrace {\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }} _{n-1\;{\text{integrations}}}f(u_{1},\dotsc ,u_{i-1},{\color {red}x},u_{i+1},\dotsc ,u_{n})\,du_{1}\cdots \,du_{i-1}\,du_{i+1}\cdots \,du_{n}.}$

Proof. Recall the proposition about obtaining marginal cdf from joint cdf. We have

{\displaystyle {\begin{aligned}&&F_{X_{i}}({\color {red}x})&=F(\infty ,\dotsc ,\infty ,\overbrace {\color {red}x} ^{i{\text{-th position}}},\infty ,\dotsc ,\infty )\\&\Rightarrow &\int _{-\infty }^{\color {red}x}f_{X_{i}}(u)\,du&=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\color {red}x}\cdots \int _{-\infty }^{\infty }f(u_{1},\dotsc ,u_{n})\,du_{n}\cdots \,du_{i}\cdots \,du_{1}\qquad {\text{by definitions}}\\&\Rightarrow &{\frac {d}{dx}}\int _{-\infty }^{\color {red}x}f_{X_{i}}(u)\,du&={\frac {d}{dx}}\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\color {red}x}\cdots \int _{-\infty }^{\infty }f(u_{1},\dotsc ,u_{n})\,du_{n}\cdots \,du_{i}\cdots \,du_{1}\\&\Rightarrow &f_{X_{i}}({\color {red}x})&=\underbrace {\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }} _{n-1\;{\text{integrations}}}f(u_{1},\dotsc ,u_{i-1},{\color {red}x},u_{i+1},\dotsc ,u_{n})\,du_{1}\cdots \,du_{i-1}\,du_{i+1}\cdots \,du_{n}\qquad {\text{by fundamental theorem of calculus}}\end{aligned}}}

${\displaystyle \Box }$

Proposition. (Obtaining joint pdf from joint cdf) If the joint cdf ${\displaystyle F}$  of jointly continuous random variables has the ${\displaystyle n}$ -th order mixed partial derivative at ${\displaystyle (x_{1},\dotsc ,x_{n})}$ , then the joint pdf is

${\displaystyle f(x_{1},\dotsc ,x_{n})={\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n}}}F(x_{1},\cdots ,x_{n}).}$

Proof. It follows from using fundamental theorem of calculus ${\displaystyle n}$  times.

${\displaystyle \Box }$

Example. If the joint pdf of the jointly continuous random variables ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)=4xy(\mathbf {1} \{x,y\in [0,1]\})}$

(the constant ${\displaystyle 4}$  is exactly what makes ${\displaystyle f}$  integrate to one over the unit square), the marginal pdf of ${\displaystyle X}$  is
${\displaystyle f_{X}(x)=\int _{-\infty }^{\infty }4xy(\mathbf {1} \{x\in [0,1]\}\mathbf {1} \{y\in [0,1]\})\,dy=4x(\mathbf {1} \{x\in [0,1]\})\underbrace {\int _{0}^{1}y\,dy} _{1^{2}/2-0^{2}/2}=2x(\mathbf {1} \{x\in [0,1]\}).}$

Also,
${\displaystyle \mathbb {P} ((X,Y)\leq (1/2,1/2))=\int _{-\infty }^{1/2}\int _{-\infty }^{1/2}4xy\,\mathbf {1} \{x,y\in [0,1]\}\,dx\,dy=4\int _{0}^{1/2}y\int _{0}^{1/2}x\,dx\,dy=4\cdot {\frac {(1/2)^{2}}{2}}\int _{0}^{1/2}y\,dy=4\cdot {\frac {1}{8}}\cdot {\frac {1}{8}}={\frac {1}{16}}.}$
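As a numeric cross-check (a Python sketch; the midpoint rule and grid size are arbitrary choices), the constant ${\displaystyle c}$  that makes ${\displaystyle c\,xy}$  a density on ${\displaystyle [0,1]^{2}}$  is ${\displaystyle 4}$ :

```python
# Midpoint-rule integral of x*y over the unit square; for a bilinear
# integrand the midpoint rule is exact, so the integral is 1/4 and the
# normalizing constant is c = 1 / (1/4) = 4.
n = 400
h = 1.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        total += x * y * h * h
c = 1.0 / total
print(c)  # → 4.0 (up to floating-point rounding)
```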

Exercise. Let ${\displaystyle X}$  and ${\displaystyle Y}$  be jointly continuous random variables. Consider the joint cdf of ${\displaystyle (X,Y)}$ :

${\displaystyle F(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {x^{2}y^{3}}{32}}.}$

1 Calculate the joint pdf of ${\displaystyle (X,Y)}$ .

• ${\displaystyle f(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {3xy^{2}}{8}}}$
• ${\displaystyle f(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {3xy^{2}}{16}}}$
• ${\displaystyle f(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {xy^{2}}{16}}}$
• ${\displaystyle f(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {xy^{2}}{8}}}$

2 Calculate the marginal pdf of ${\displaystyle X}$ .

• ${\displaystyle \mathbf {1} \{0\leq y\leq 2\}{\frac {3}{8}}y^{2}}$
• ${\displaystyle \mathbf {1} \{0\leq x\leq 2\}{\frac {1}{2}}x}$
• ${\displaystyle \mathbf {1} \{0\leq y\leq 2\}{\frac {1}{12}}y^{3}}$
• ${\displaystyle \mathbf {1} \{0\leq x\leq 2\}{\frac {1}{8}}x^{2}}$

## Independence

Recall that multiple events are independent if the probability for the intersection of them equals the product of probabilities of each event, by definition. Since ${\displaystyle \{X\in A\}}$  is also an event, we have a natural definition of independence for random variables as follows:

Definition. (Independence of random variables) Random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  are independent if

${\displaystyle \mathbb {P} (X_{1}\in A_{1}\cap \cdots \cap X_{n}\in A_{n})=\mathbb {P} (X_{1}\in A_{1})\cdots \mathbb {P} (X_{n}\in A_{n})}$

for every choice of subsets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\subseteq \mathbb {R} }$ .

Remark. Under this condition, the events ${\displaystyle \{X_{1}\in A_{1}\},\dotsc ,\{X_{n}\in A_{n}\}}$  are independent.

Theorem. (Alternative condition for independence of random variables) Random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  are independent if and only if the joint cdf of ${\displaystyle (X_{1},\dotsc ,X_{n})}$  factorizes as

${\displaystyle F(x_{1},\dotsc ,x_{n})=F_{X_{1}}(x_{1})\cdots F_{X_{n}}(x_{n})}$

or, equivalently, the joint pdf or pmf of ${\displaystyle (X_{1},\dotsc ,X_{n})}$  factorizes as
${\displaystyle f(x_{1},\dotsc ,x_{n})=f_{X_{1}}(x_{1})\cdots f_{X_{n}}(x_{n})}$

for each ${\displaystyle x_{1},\dotsc ,x_{n}\in \mathbb {R} }$ .

Proof. Partial:

Only if part: If random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  are independent,

${\displaystyle \mathbb {P} (X_{1}\in A_{1}\cap \cdots \cap X_{n}\in A_{n})=\mathbb {P} (X_{1}\in A_{1})\cdots \mathbb {P} (X_{n}\in A_{n})}$

for every choice of subsets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\subseteq \mathbb {R} }$ . Setting ${\displaystyle A_{1}=(-\infty ,x_{1}],\dotsc ,A_{n}=(-\infty ,x_{n}]}$ , we have
${\displaystyle \mathbb {P} (X_{1}\leq x_{1}\cap \cdots \cap X_{n}\leq x_{n})=\mathbb {P} (X_{1}\leq x_{1})\cdots \mathbb {P} (X_{n}\leq x_{n})\implies F(x_{1},\dotsc ,x_{n})=F_{X_{1}}(x_{1})\cdots F_{X_{n}}(x_{n}).}$

Thus, we obtain the result for the joint cdf part.

For the joint pdf part,

{\displaystyle {\begin{aligned}&&F(x_{1},\dotsc ,x_{n})&=F_{X_{1}}(x_{1})\cdots F_{X_{n}}(x_{n})\\&\Rightarrow &{\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n}}}F(x_{1},\dotsc ,x_{n})&={\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n}}}\left(F_{X_{1}}(x_{1})\cdots F_{X_{n}}(x_{n})\right)\\&\Rightarrow &f(x_{1},\dotsc ,x_{n})&=f_{X_{n}}(x_{n}){\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n-1}}}\left(F_{X_{1}}(x_{1})\cdots F_{X_{n-1}}(x_{n-1})\right)\\&&&=f_{X_{n}}(x_{n})f_{X_{n-1}}(x_{n-1}){\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n-2}}}\left(F_{X_{1}}(x_{1})\cdots F_{X_{n-2}}(x_{n-2})\right)\\&&&=\cdots =f_{X_{1}}(x_{1})\cdots f_{X_{n}}(x_{n})\end{aligned}}}

${\displaystyle \Box }$

Remark.

• That is, the random variables are independent exactly when the joint cdf (or joint pdf or pmf) can be factorized as the product of the marginal cdf's (or marginal pdf's or pmf's).
• Actually, if we can factorize the joint cdf, joint pdf, or joint pmf as a product of some functions, one in each of the variables, then the condition is also satisfied (the factors need not be the marginals themselves).

Example. The joint pdf of two independent exponential random variables ${\displaystyle X}$  and ${\displaystyle Y}$ , each with rate ${\displaystyle \lambda }$ , is

${\displaystyle f(x,y)=(\mathbf {1} \{x\geq 0\}\lambda e^{-\lambda x})(\mathbf {1} \{y\geq 0\}\lambda e^{-\lambda y})=\mathbf {1} \{x,y\geq 0\}\lambda ^{2}e^{-\lambda (x+y)}.}$

(Random variables ${\displaystyle X}$  and ${\displaystyle Y}$  are said to be independent and identically distributed (i.i.d.) in this case)

In general, the joint pdf of ${\displaystyle n}$  independent exponential random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$ , each with rate ${\displaystyle \lambda }$ , is

${\displaystyle f(x_{1},\dotsc ,x_{n})=\mathbf {1} \{x_{1},\dotsc ,x_{n}\geq 0\}\lambda ^{n}e^{-\lambda (x_{1}+\cdots +x_{n})}.}$

(Random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  are also i.i.d. in this case)

On the other hand, if the joint pdf of two random variables ${\displaystyle V}$  and ${\displaystyle W}$  is

${\displaystyle f(v,w)=\mathbf {1} \{v,w\geq 0\}\mathbf {1} \{w\leq 2-2v\}}$

(a uniform density on the triangle with vertices ${\displaystyle (0,0)}$ , ${\displaystyle (1,0)}$  and ${\displaystyle (0,2)}$ , which has area one), then the random variables ${\displaystyle V}$  and ${\displaystyle W}$  are dependent, since the joint pdf cannot be factorized as a product of marginal pdf's.
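To see the dependence numerically, we can sample uniformly from the triangle with vertices ${\displaystyle (0,0),(1,0),(0,2)}$  (i.e. assuming ${\displaystyle v,w\geq 0}$  in addition to ${\displaystyle w\leq 2-2v}$ ) by rejection sampling. This is a Python sketch using only the standard library:

```python
import random

random.seed(2)
N = 200_000
pts = []
while len(pts) < N:
    v, w = random.uniform(0, 1), random.uniform(0, 2)  # bounding box of the triangle
    if w <= 2 - 2 * v:                                 # keep points below the line
        pts.append((v, w))

pA  = sum(1 for v, w in pts if v > 0.5) / N             # P(V > 1/2), about 1/4
pB  = sum(1 for v, w in pts if w > 1.0) / N             # P(W > 1),   about 1/4
pAB = sum(1 for v, w in pts if v > 0.5 and w > 1.0) / N # exactly 0 on the triangle
print(pAB, pA * pB)  # 0.0 versus a strictly positive product: V, W dependent
```

Here ${\displaystyle V>1/2}$  forces ${\displaystyle W<1}$ , so the joint probability is zero while the product of the marginals is about ${\displaystyle 1/16}$ , confirming dependence.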

Exercise. Let ${\displaystyle X,Y,Z}$  be jointly continuous random variables. Consider a joint pdf of ${\displaystyle (X,Y,Z)}$ :

${\displaystyle f(x,y,z)=\mathbf {1} \{x,y,z\geq 0\}\mathbf {1} \{x+y+z/k\leq 1\}.}$

1 Calculate ${\displaystyle k}$ .

 1 2 3 4

2 Are ${\displaystyle X,Y,Z}$  independent?

 yes no

Consider another joint pdf of ${\displaystyle (X,Y,Z)}$ :

${\displaystyle f(x,y,z)=\mathbf {1} \{x,y,z\geq 0\}\mathbf {1} \{y\leq 1-x\}\mathbf {1} \{z\leq k\}}$

1 Calculate ${\displaystyle k}$ .

 1 2 3 4

2 Are ${\displaystyle X,Y,Z}$  independent?

 yes no

Consider another joint pdf of ${\displaystyle (X,Y,Z)}$ :

${\displaystyle f(x,y,z)=kxyz\mathbf {1} \{x,y\in [0,1]\}\mathbf {1} \{z\in [0,2]\}.}$

1 Calculate ${\displaystyle k}$ .

 1 2 3 4

2 Are ${\displaystyle X,Y,Z}$  independent?

 yes no

Proposition. (Independence of events concerning disjoint sets of independent random variables) Suppose random variables ${\displaystyle X_{1},X_{2},\dotsc }$  are independent. Then, for each choice of indices ${\displaystyle r<s<t<\dotsb }$  and fixed functions ${\displaystyle f_{1},f_{2},f_{3},\dotsc }$ , the random variables

${\displaystyle Y_{1}=f_{1}(X_{1},\dotsc ,X_{\color {red}r}),\quad Y_{2}=f_{2}(X_{{\color {red}r}+1},\dotsc ,X_{\color {blue}s}),\quad Y_{3}=f_{3}(X_{{\color {blue}s}+1},\dotsc ,X_{t}),\dotsc }$

are independent.

Example. Suppose ${\displaystyle X_{1},X_{2},X_{3},X_{4}}$  are independent Bernoulli random variables with success probability ${\displaystyle p}$ . Then, ${\displaystyle Y_{1}=X_{1}+X_{2}}$  and ${\displaystyle Y_{2}=X_{3}-X_{4}}$  are also independent.

On the other hand, ${\displaystyle Y_{1}=X_{1}+X_{2}}$  and ${\displaystyle Y_{2}=2-X_{3}-X_{2}}$  are not independent. A counter-example to the independence is

${\displaystyle \underbrace {\mathbb {P} (Y_{1}=2\cap Y_{2}=2)} _{0}\neq \underbrace {\mathbb {P} (Y_{1}=2)\mathbb {P} (Y_{2}=2)} _{{\text{may}}\;\neq 0}.}$

The left-hand side equals zero since ${\displaystyle Y_{1}=2\implies X_{2}=1}$ , but ${\displaystyle Y_{2}=2\implies X_{2}=0}$ .

The right-hand side may not equal zero, since ${\displaystyle \mathbb {P} (Y_{1}=2)=\mathbb {P} (X_{1}=1\cap X_{2}=1)=p^{2}}$  and ${\displaystyle \mathbb {P} (Y_{2}=2)=\mathbb {P} (X_{2}=0\cap X_{3}=0)=(1-p)^{2}}$ . We can see that ${\displaystyle p^{2}(1-p)^{2}}$  is nonzero whenever ${\displaystyle 0<p<1}$ .
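A short simulation sketch (Python, standard library only; the choice ${\displaystyle p=0.5}$  is arbitrary) confirms the counter-example:

```python
import random

random.seed(3)
p, N = 0.5, 200_000
n_y1, n_y2, n_both = 0, 0, 0
for _ in range(N):
    x1, x2, x3, x4 = (int(random.random() < p) for _ in range(4))
    y1 = x1 + x2
    y2 = 2 - x3 - x2   # shares X_2 with Y_1, so not independent of it
    n_y1 += (y1 == 2)
    n_y2 += (y2 == 2)
    n_both += (y1 == 2 and y2 == 2)

# P(Y1 = 2 and Y2 = 2) is exactly 0, but P(Y1 = 2) P(Y2 = 2) = p^2 (1-p)^2 > 0
print(n_both / N, (n_y1 / N) * (n_y2 / N))
```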

Exercise.

Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be i.i.d. random variables, and ${\displaystyle Y_{1},\dotsc ,Y_{n}}$  also be i.i.d. random variables. Which of the following is (are) true?

• ${\displaystyle \sum _{i=1}^{n-1}X_{i}}$  and ${\displaystyle X_{n}}$  are independent.
• ${\displaystyle X_{1}^{X_{2}}}$  and ${\displaystyle X_{3}^{X_{4}}}$  are independent.
• ${\displaystyle \prod _{i=1}^{n}X_{i}}$  and ${\displaystyle \prod _{i=1}^{n}Y_{i}}$  are independent.
• ${\displaystyle X_{1}+X_{2}+X_{3}}$  and ${\displaystyle Y_{1}+Y_{2}+Y_{3}}$  are independent if ${\displaystyle X_{1},\dotsc ,X_{n},Y_{1}}$  are independent.

### Sum of independent random variables (optional)

In general, we determine the distribution of a sum of independent random variables from the joint cdf, pmf, or pdf by first principles. In particular, there are some interesting results related to the distribution of such sums.

### Order statistics

Definition. (Order statistics) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be ${\displaystyle n}$  i.i.d. r.v.'s (each with cdf ${\displaystyle F(x)}$ ). Define ${\displaystyle X_{(1)},X_{(2)},\dotsc ,X_{(n)}}$  to be the smallest, second smallest, ..., largest of ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$ . Then, the ordered values ${\displaystyle X_{(1)}\leq X_{(2)}\leq \dotsb \leq X_{(n)}}$  are the order statistics.

Proposition. (Cdf of order statistics) The cdf of ${\displaystyle X_{(k)}}$  (${\displaystyle k}$  is an integer such that ${\displaystyle 1\leq k\leq n}$ ) is

${\displaystyle F_{X_{(k)}}({\color {blue}x})=\sum _{j=k}^{n}{\binom {n}{j}}(F({\color {blue}x}))^{j}{\big (}1-F({\color {blue}x}){\big )}^{n-j}.}$

Proof.

• Consider the event ${\displaystyle \{X_{(k)}\leq {\color {blue}x}\}}$ .
                          Possible positions of x
|<--------------------->
*---*----...------*----*------...--------*
X  (1)  (2)          (k)  (k+1)             (n)
|----------------------> when x moves RHS like this, >=k X_i are at the LHS of x

• We can see from the above figure that ${\displaystyle \{X_{(k)}\leq {\color {blue}x}\}=\{{\text{at least }}k{\text{ of the }}X_{i}{\text{'s are }}\leq {\color {blue}x}\}}$ .
• Let no. of ${\displaystyle X_{i}}$ 's that are less than or equal to ${\displaystyle {\color {blue}x}}$  be ${\displaystyle N}$ .
• Since ${\displaystyle N\sim \operatorname {Binom} (n,\mathbb {P} (X_{i}\leq {\color {blue}x})){\overset {\text{ def }}{=}}\operatorname {Binom} (n,F({\color {blue}x}))}$  (because for each ${\displaystyle X_{i}}$ , we can treat ${\displaystyle X_{i}\leq x}$  and ${\displaystyle X_{i}>x}$  as the two outcomes of a Bernoulli trial),
• The cdf is

${\displaystyle \mathbb {P} (X_{(k)}\leq {\color {blue}x})=\mathbb {P} (N\geq k)=\sum _{j=k}^{n}{\binom {n}{j}}(F({\color {blue}x}))^{j}{\big (}1-F({\color {blue}x}){\big )}^{n-j}.}$

${\displaystyle \Box }$

Example. Let ${\displaystyle X_{1},X_{2},X_{3}}$  be i.i.d. r.v.'s following ${\displaystyle \operatorname {Exp} (2)}$ . Then, the cdf of ${\displaystyle X_{(2)}}$  is

${\displaystyle \sum _{j=2}^{3}{\binom {3}{j}}(F(x))^{j}(1-F(x))^{3-j}=\mathbf {1} \{x\geq 0\}\left({\binom {3}{2}}(1-e^{-2x})^{2}(e^{-2x})+{\binom {3}{3}}(1-e^{-2x})^{3}\right)=\mathbf {1} \{x\geq 0\}\left(3(1-e^{-2x})^{2}e^{-2x}+(1-e^{-2x})^{3}\right).}$
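We can verify this cdf by simulation (a Python sketch using only the standard library; the evaluation point ${\displaystyle x=0.5}$  is an arbitrary choice):

```python
import math
import random

random.seed(4)
lam, N, x = 2.0, 100_000, 0.5
count = 0
for _ in range(N):
    sample = sorted(random.expovariate(lam) for _ in range(3))
    count += (sample[1] <= x)   # sample[1] is the second smallest, X_(2)

F = 1 - math.exp(-lam * x)            # cdf of Exp(2) at x
exact = 3 * F**2 * (1 - F) + F**3     # sum_{j=2}^{3} C(3,j) F^j (1-F)^(3-j)
print(round(count / N, 4), round(exact, 4))
```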

Exercise.

Calculate ${\displaystyle \mathbb {P} (X_{(2)}\geq 2)}$ .

• 0.000665
• 0.000994
• 0.036296
• 0.963704
• 0.999335

## Poisson process

Definition.

If successive interarrival times of unpredictable events are independent random variables, with each following an exponential distribution with a common rate ${\displaystyle \lambda }$ , then the process of arrivals is a Poisson process with rate ${\displaystyle \lambda }$ .

There are several important properties of the Poisson process.

Proposition. (Time to the ${\displaystyle n}$ -th event in a Poisson process) The time to the ${\displaystyle n}$ -th event in a Poisson process with rate ${\displaystyle \lambda }$  follows the ${\displaystyle \operatorname {Gamma} (n,\lambda )}$  distribution.

Proof.

• The time to ${\displaystyle n}$ -th event is ${\displaystyle X_{1}+\dotsb +X_{n}}$ , with each following ${\displaystyle \operatorname {Exp} (\lambda )}$ .
• It suffices to prove that ${\displaystyle X_{1}+X_{2}\sim \operatorname {Gamma} (2,\lambda )}$ , and then the desired result follows by induction.
• {\displaystyle {\begin{aligned}f_{X_{1}+X_{2}}(z)&=\lambda ^{2}\int _{-\infty }^{\infty }\mathbf {1} \{\underbrace {z-x\geq 0} _{x\leq z}\}\mathbf {1} \{x\geq 0\}e^{-\lambda (z-x)}e^{-\lambda x}\,dx&{\text{by proposition about convolution of pdf's}}\\&=\lambda ^{2}\int _{0}^{z}e^{-\lambda (z{\cancel {-x}}){\cancel {-\lambda x}}}\,dx\\&=\lambda ^{2}\int _{0}^{z}e^{-\lambda z}\,dx\\&=\lambda ^{2}ze^{-\lambda z}\\&={\frac {\lambda ^{2}ze^{-\lambda z}}{\Gamma (2)}}&{\text{since }}\Gamma (2)=1!=1,\end{aligned}}}

which is the pdf of ${\displaystyle \operatorname {Gamma} (2,\lambda )}$ , as desired.

${\displaystyle \Box }$
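A quick moment check by simulation (a Python sketch; the choice ${\displaystyle \lambda =2}$  is arbitrary): the sum of two independent ${\displaystyle \operatorname {Exp} (\lambda )}$  variables should have the ${\displaystyle \operatorname {Gamma} (2,\lambda )}$  mean ${\displaystyle 2/\lambda }$  and variance ${\displaystyle 2/\lambda ^{2}}$ :

```python
import random

random.seed(5)
lam, N = 2.0, 200_000
sums = [random.expovariate(lam) + random.expovariate(lam) for _ in range(N)]

mean = sum(sums) / N
var = sum((s - mean) ** 2 for s in sums) / N
print(round(mean, 3), round(var, 3))  # near 2/lam = 1.0 and 2/lam^2 = 0.5
```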

Remark. The time to ${\displaystyle n}$ -th event is also the sum of the ${\displaystyle n}$  successive interarrival times before the ${\displaystyle n}$ -th event.

Proposition. (Number of arrivals within a fixed time interval) The number of arrivals within a fixed time interval of length ${\displaystyle t}$  follows the ${\displaystyle \operatorname {Pois} (\lambda t)}$  distribution.

Proof. For each nonnegative integer ${\displaystyle n}$ , let ${\displaystyle V}$  be the interarrival time between the ${\displaystyle n}$ -th and ${\displaystyle (n+1)}$ -th arrivals, and ${\displaystyle W}$  be the time to the ${\displaystyle n}$ -th arrival, both measured from the beginning of the fixed time interval (we can treat this start as time zero because of the memoryless property). The joint pdf of ${\displaystyle (V,W)}$  is

{\displaystyle {\begin{aligned}f(v,w)&=f_{V}(v)f_{W}(w)&{\text{by independence}}\\&=\underbrace {(\lambda e^{-\lambda v})} _{{\text{pdf of}}\;\operatorname {Exp} (\lambda )}\underbrace {\left({\frac {\lambda ^{n}w^{n-1}e^{-\lambda w}}{(n-1)!}}\right)} _{{\text{pdf of}}\operatorname {Gamma} (n,\lambda )}.\end{aligned}}}

Let ${\displaystyle N}$  be the number of arrivals within the fixed time interval. The pmf of ${\displaystyle N}$  is
{\displaystyle {\begin{aligned}\mathbb {P} (N=n)&=\mathbb {P} (W\leq t\cap \underbrace {V+W>t} _{V>t-W})\\&=\int _{0}^{t}\int _{t-w}^{\infty }\underbrace {f(v,w)} _{{\text{joint pdf of}}\;(V,W)}\,dv\,dw\\&=\int _{0}^{t}\int _{t-w}^{\infty }(\lambda e^{-\lambda v})\left({\frac {\lambda ^{n}w^{n-1}e^{-\lambda w}}{(n-1)!}}\right)\,dv\,dw\\&=\int _{0}^{t}{\frac {\lambda ^{n}w^{n-1}e^{-\lambda w}}{(n-1)!}}\int _{t-w}^{\infty }\lambda e^{-\lambda v}\,dv\,dw\\&={\frac {\lambda ^{n}}{(n-1)!}}\int _{0}^{t}w^{n-1}{\cancel {e^{-\lambda w}}}(0-(-e^{-\lambda (t{\cancel {-w}})}))\,dw\\&={\frac {\lambda ^{n}{\color {green}e^{-\lambda t}}}{(n-1)!}}\int _{0}^{t}w^{n-1}\,dw\\&={\frac {\lambda ^{n}e^{-\lambda t}}{(n-1)!}}\cdot \left({\frac {t^{n}}{n}}-0\right)\\&={\frac {e^{-\lambda t}(\lambda t)^{n}}{n!}}\end{aligned}}}

which is the pmf of ${\displaystyle \operatorname {Pois} (\lambda t)}$ . The result follows.

${\displaystyle \Box }$
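The proposition can be checked by simulating the process directly from its exponential interarrival times (a Python sketch; the rate and interval length are arbitrary choices). For ${\displaystyle \operatorname {Pois} (\lambda t)}$ , both the mean and the variance of the count should be ${\displaystyle \lambda t}$ :

```python
import random

random.seed(6)
lam, t, N = 1.5, 2.0, 100_000

counts = []
for _ in range(N):
    clock, n = 0.0, 0
    while True:
        clock += random.expovariate(lam)  # next exponential interarrival time
        if clock > t:
            break
        n += 1
    counts.append(n)

mean = sum(counts) / N
var = sum((c - mean) ** 2 for c in counts) / N
print(round(mean, 3), round(var, 3))  # both near lam * t = 3.0
```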

Proposition. (Time to the first arrival among ${\displaystyle n}$  independent Poisson processes) Let ${\displaystyle T_{1},T_{2},\dotsc ,T_{n}}$  be independent random variables with ${\displaystyle T_{i}\sim \operatorname {Exp} (\lambda _{i})}$ , in which ${\displaystyle i=1,2,\dotsc ,n}$ . If we define ${\displaystyle T=\min\{T_{1},\dotsc ,T_{n}\}}$  (which is the time to the first arrival among ${\displaystyle n}$  independent Poisson processes), then ${\displaystyle T\sim \operatorname {Exp} (\lambda _{1}+\lambda _{2}+\cdots +\lambda _{n})}$ .

Proof. For each ${\displaystyle t>0}$ ,

{\displaystyle {\begin{aligned}&&\mathbb {P} (T>t)&=\mathbb {P} (T_{1}>t\cap \cdots \cap T_{n}>t)\\&&&=\mathbb {P} (T_{1}>t)\cdots \mathbb {P} (T_{n}>t)&{\text{by independence}}\\&&&=[1-(\underbrace {1-e^{-\lambda _{1}t}} _{{\text{cdf of}}\;\operatorname {Exp} (\lambda _{1})})]\cdots [1-(\underbrace {1-e^{-\lambda _{n}t}} _{{\text{cdf of}}\;\operatorname {Exp} (\lambda _{n})})]\\&&&=e^{-t(\lambda _{1}+\cdots +\lambda _{n})}\\&\Rightarrow &\mathbb {P} (T\leq t)&=1-e^{-t(\lambda _{1}+\cdots +\lambda _{n})}\\&\Rightarrow &T&\sim \operatorname {Exp} (\lambda _{1}+\lambda _{2}+\cdots +\lambda _{n})\end{aligned}}}

${\displaystyle \Box }$
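A simulation sketch of the minimum of independent exponentials (Python, standard library only; the rates and the evaluation point are arbitrary choices) agrees with the stated survival function ${\displaystyle \mathbb {P} (T>t)=e^{-t(\lambda _{1}+\cdots +\lambda _{n})}}$ :

```python
import math
import random

random.seed(7)
rates = [0.5, 1.0, 2.5]   # lambda_1, lambda_2, lambda_3 (arbitrary)
N, t = 200_000, 0.3
mins = [min(random.expovariate(r) for r in rates) for _ in range(N)]

emp = sum(1 for m in mins if m > t) / N
exact = math.exp(-sum(rates) * t)   # survival function of Exp(0.5 + 1.0 + 2.5)
print(round(emp, 4), round(exact, 4))
```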

Example. Suppose there are two service counters, counter A and counter B, with independent service times following the exponential distribution with rate ${\displaystyle \lambda }$ . John and Peter have been served at counters A and B respectively for the past 10 minutes.

First, the time you need to wait to be served (i.e. the time until one of John and Peter leaves his counter) is the minimum of their remaining service times which, by the memoryless property, are independent and follow the exponential distribution with rate ${\displaystyle \lambda }$ . Thus, your waiting time follows the exponential distribution with rate ${\displaystyle \lambda +\lambda =2\lambda }$ .

Suppose now John leaves counter A, and you are being served at counter A. Then, the probability that you leave your counter before Peter is ${\displaystyle 1/2}$ : by the memoryless property and symmetry, the chances that you and Peter leave first are governed by the same chance mechanism, even though Peter has been served for longer. This may seem counterintuitive.

Exercise. Suppose the process of arrivals of car accidents is a Poisson process with unit rate. Let ${\displaystyle T_{i}}$  be the time to the ${\displaystyle i}$ -th car accident, and ${\displaystyle X_{i}}$  be the interarrival time between the ${\displaystyle (i-1)}$ -th and ${\displaystyle i}$ -th accidents.

1 Which of the following is (are) true?

• ${\displaystyle T_{3}\sim \operatorname {Gamma} (3,1)}$
• ${\displaystyle T_{3}\sim \operatorname {Exp} (1)}$
• ${\displaystyle T_{3}\sim \operatorname {Exp} (3)}$
• ${\displaystyle T_{3}\sim \operatorname {Pois} (1)}$
• ${\displaystyle T_{3}\sim \operatorname {Pois} (3)}$

2 Which of the following is (are) true?

• ${\displaystyle X_{i}\sim \operatorname {Exp} (i)}$
• ${\displaystyle X_{i}\sim \operatorname {Exp} (1)}$
• ${\displaystyle X_{i}\sim \operatorname {Pois} (1)}$
• ${\displaystyle X_{i}-X_{i-1}\sim \operatorname {Exp} (1)}$

3 Which of the following is (are) true?

• ${\displaystyle T_{i}-T_{i-1}\sim \operatorname {Exp} (1)}$
• ${\displaystyle T_{i}-T_{i-1}\sim \operatorname {Gamma} (1,1)}$
• ${\displaystyle T_{i}-T_{i-1}\sim \operatorname {Pois} (1)}$
• The pmf of the number of arrivals within a fixed time interval of length ${\displaystyle t}$  is ${\displaystyle f(x)={\frac {e^{-t}t^{x}}{x!}}}$ .