# Probability/Joint Distributions and Independence

## Motivation

Suppose we are given a pmf of a discrete random variable ${\displaystyle X}$  and a pmf of a discrete random variable ${\displaystyle Y}$ . For example,

${\displaystyle f_{X}(x)=(\mathbf {1} \{x=0\}+\mathbf {1} \{x=1\})/2\quad {\text{and}}\quad f_{Y}(y)=(\mathbf {1} \{y=0\}+\mathbf {1} \{y=2\})/2}$

With only such information, we cannot tell how ${\displaystyle X}$  and ${\displaystyle Y}$  are related: they may be related or unrelated.

For example, the random variable ${\displaystyle X}$  may be defined by ${\displaystyle X=1}$  if a head comes up and ${\displaystyle X=0}$  otherwise when tossing a fair coin, and the random variable ${\displaystyle Y}$  may be defined by ${\displaystyle Y=2}$  if a head comes up and ${\displaystyle Y=0}$  otherwise when tossing the coin a second time. In this case, ${\displaystyle X}$  and ${\displaystyle Y}$  are unrelated.

Another possibility is that the random variable ${\displaystyle Y}$  is defined as ${\displaystyle Y=2X}$ , i.e. ${\displaystyle Y=2}$  if a head comes up in the first toss and ${\displaystyle Y=0}$  otherwise. In this case, ${\displaystyle X}$  and ${\displaystyle Y}$  are related.

Yet, in both of the above examples, the pmfs of ${\displaystyle X}$  and ${\displaystyle Y}$  are exactly the ones given above.

Therefore, to tell the relationship between ${\displaystyle X}$  and ${\displaystyle Y}$ , we define the joint cumulative distribution function, or joint cdf.
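The two coin-tossing setups above can be checked empirically. The following is a minimal simulation sketch (in Python, using only the standard library; all names are illustrative) showing that the marginal behaviour of ${\displaystyle X}$  is the same in both setups, while the joint behaviour differs:

```python
import random

random.seed(0)
N = 100_000

# Setup 1 ("unrelated"): X and Y come from two separate fair-coin tosses.
# Setup 2 ("related"):   Y = 2X, both from a single toss.
indep, linked = [], []
for _ in range(N):
    indep.append((random.randint(0, 1), 2 * random.randint(0, 1)))
    x = random.randint(0, 1)
    linked.append((x, 2 * x))

def frac_x1(pairs):
    # empirical P(X = 1) -- the marginal behaviour of X
    return sum(x for x, _ in pairs) / len(pairs)

def frac_02(pairs):
    # empirical P(X = 0 and Y = 2) -- a joint probability
    return sum(1 for x, y in pairs if x == 0 and y == 2) / len(pairs)

print(frac_x1(indep), frac_x1(linked))  # both near 0.5: same marginals
print(frac_02(indep), frac_02(linked))  # near 0.25 versus exactly 0.0
```

In the two-toss setup, ${\displaystyle \{X=0,Y=2\}}$  occurs about a quarter of the time; when ${\displaystyle Y=2X}$ , it never occurs, even though the marginals agree.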

## Joint distributions

Definition. (Joint cumulative distribution function) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be random variables defined on a sample space ${\displaystyle \Omega }$ . The joint cumulative distribution function (cdf) of random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  is

${\displaystyle F(x_{1},\dotsc ,x_{n})=\mathbb {P} (X_{1}\leq x_{1}\cap \cdots \cap X_{n}\leq x_{n})=\mathbb {P} \left(\bigcap _{i=1}^{n}\{\omega \in \Omega :X_{i}(\omega )\leq x_{i}\}\right).}$

Sometimes, we may want to know the random behaviour of just one of the random variables involved in a joint cdf. We can do this by computing its marginal cdf from the joint cdf. The definition of the marginal cdf is as follows:

Definition. (Marginal cumulative distribution function) The cumulative distribution function ${\displaystyle F_{X_{i}}}$  of each random variable ${\displaystyle X_{i}}$  among the ${\displaystyle n}$  random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  is called the marginal cumulative distribution function (cdf) of ${\displaystyle X_{i}}$ .

Remark. Actually, the marginal cdf of ${\displaystyle X_{i}}$  is simply the cdf of ${\displaystyle X_{i}}$  (which is in one variable). We have already discussed this kind of cdf in previous chapters.

Proposition. (Obtaining marginal cdf from joint cdf) Given a joint cdf ${\displaystyle F(x_{1},\dotsc ,x_{n})}$ , the marginal cdf of ${\displaystyle X_{i}}$  is

${\displaystyle F_{X_{i}}(x)=F(\infty ,\dotsc ,\infty ,\underbrace {x} _{i{\text{-th position}}},\infty ,\dotsc ,\infty ).}$

Proof. When we set every argument other than the ${\displaystyle i}$ -th to ${\displaystyle \infty }$  (more precisely, take the limit as each such argument tends to ${\displaystyle \infty }$ ), each event ${\displaystyle \{X_{j}\leq \infty \}}$  is the whole sample space ${\displaystyle \Omega }$ , so intersecting with it changes nothing; no independence assumption is needed. Hence the joint cdf becomes

{\displaystyle {\begin{aligned}\mathbb {P} (X_{1}\leq \infty \cap \cdots \cap X_{i-1}\leq \infty \cap X_{i}\leq x\cap X_{i+1}\leq \infty \cap \cdots \cap X_{n}\leq \infty )&=\mathbb {P} (\Omega \cap \cdots \cap \Omega \cap \{X_{i}\leq x\}\cap \Omega \cap \cdots \cap \Omega )\\&=\mathbb {P} (X_{i}\leq x)\\&=F_{X_{i}}(x).\end{aligned}}}

${\displaystyle \Box }$

Remark. In general, we cannot deduce the joint cdf from a given set of marginal cdf's.

Example. Consider the joint cdf of random variables ${\displaystyle X}$  and ${\displaystyle Y}$ :

${\displaystyle F(x,y)=1-e^{-x}-e^{-y}+e^{-x-y},\quad x,y\geq 0.}$

The marginal cdf of ${\displaystyle Y}$  is
${\displaystyle F_{Y}(y)=\lim _{x\to \infty }(1-e^{-x}-e^{-y}+e^{-x-y})=1-e^{-y},\quad y\geq 0.}$

Similar to the one-variable case, we have joint pmf and joint pdf. Also, analogously, we have marginal pmf and marginal pdf.

Definition. (Joint probability mass function) The joint probability mass function (joint pmf) of ${\displaystyle X_{1},\dotsc ,X_{n}}$  is

${\displaystyle f(x_{1},\dotsc ,x_{n})=\mathbb {P} {\big (}(X_{1},\dotsc ,X_{n})=(x_{1},\dotsc ,x_{n}){\big )},\quad (x_{1},\dotsc ,x_{n})\in \mathbb {R} ^{n}.}$

Definition. (Marginal probability mass function) The marginal probability mass function (marginal pmf) of each random variable ${\displaystyle X_{i}}$  among the ${\displaystyle n}$  random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  is

${\displaystyle f_{X_{i}}(x)=\mathbb {P} (X_{i}=x),\quad x\in \mathbb {R} .}$

Proposition. (Obtaining marginal pmf from joint pmf) For discrete random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  with joint pmf ${\displaystyle f}$ , the marginal pmf of ${\displaystyle X_{i}}$  is

${\displaystyle f_{X_{i}}({\color {red}x})=\underbrace {\sum _{u_{1}}\cdots \sum _{u_{i-1}}\sum _{u_{i+1}}\cdots \sum _{u_{n}}} _{n-1\;{\text{summations}}}f(u_{1},\dotsc ,u_{i-1},{\color {red}x},u_{i+1},\dotsc ,u_{n}).}$

Proof. Consider the case in which there are only two random variables, say ${\displaystyle X}$  and ${\displaystyle Y}$ . Then, we have

${\displaystyle \sum _{\color {green}y}f({\color {red}x},{\color {green}y})=\sum _{\color {green}y}\mathbb {P} (X={\color {red}x}\cap {\color {green}Y=y})=\mathbb {P} (X={\color {red}x})\qquad {\text{by law of total probability}}.}$

Similarly, in the general case, we have
{\displaystyle {\begin{aligned}\sum _{\color {green}u_{n}}f(u_{1},\dotsc ,u_{i-1},{\color {red}x},u_{i+1},\dotsc ,{\color {green}u_{n}})&=\sum _{\color {green}u_{n}}\mathbb {P} (X_{1}=u_{1}\cap \cdots \cap X_{i-1}=u_{i-1}\cap X_{i}={\color {red}x}\cap X_{i+1}=u_{i+1}\cap \cdots \cap X_{n-1}=u_{n-1}\cap {\color {green}X_{n}=u_{n}})\\&=\mathbb {P} (X_{1}=u_{1}\cap \cdots \cap X_{i-1}=u_{i-1}\cap X_{i}={\color {red}x}\cap X_{i+1}=u_{i+1}\cap \cdots \cap X_{n-1}=u_{n-1})\qquad {\text{by law of total probability}}.\end{aligned}}}

Then, we perform a similar process on each of the other variables (${\displaystyle n-2}$  remaining), with one extra summation sign added for each. Thus, in total we have ${\displaystyle n-1}$  summation signs, and we finally obtain the desired result. ${\displaystyle \Box }$

Remark. This process is sometimes called 'summing over all possible values of the other variables'.

Example. Suppose we throw a fair six-sided die twice. Let ${\displaystyle X}$  be the number facing up in the first throw, and ${\displaystyle Y}$  be the number facing up in the second throw. Then, since the two throws are independent, the joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)=\mathbb {P} (X=x\cap Y=y)={\frac {1}{6}}\cdot {\frac {1}{6}}={\frac {1}{36}}.}$

in which ${\displaystyle x,y\in \{1,2,3,4,5,6\}}$ , and ${\displaystyle f(x,y)=0}$  otherwise. Also, the marginal pmf of ${\displaystyle X}$  is
${\displaystyle f_{X}(x)=\sum _{y}f(x,y)=f(x,1)+f(x,2)+\cdots +f(x,6)=6(1/36)={\frac {1}{6}}}$

in which ${\displaystyle x\in \{1,2,3,4,5,6\}}$ , and ${\displaystyle f_{X}(x)=0}$  otherwise.

By symmetry (replace all ${\displaystyle X}$  with ${\displaystyle Y}$  and replace all ${\displaystyle x}$  with ${\displaystyle y}$ ), the marginal pmf of ${\displaystyle Y}$  is

${\displaystyle f_{Y}(y)={\frac {1}{6}}}$

in which ${\displaystyle y\in \{1,2,3,4,5,6\}}$ , and ${\displaystyle f_{Y}(y)=0}$  otherwise.
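We can sanity-check the dice example with a quick simulation (a Python sketch using only the standard library; names are illustrative):

```python
import random
from collections import Counter

random.seed(1)
N = 360_000
rolls = [(random.randint(1, 6), random.randint(1, 6)) for _ in range(N)]

joint = Counter(rolls)
p11 = joint[(1, 1)] / N                       # empirical f(1, 1), near 1/36
px3 = sum(1 for x, _ in rolls if x == 3) / N  # empirical f_X(3), near 1/6
print(round(p11, 4), round(px3, 4))
```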

Exercise. Suppose there are two red balls and one blue ball in a box, and we draw two balls one by one from the box with replacement. Let ${\displaystyle X=1}$  if the ball from the first draw is red, and ${\displaystyle X=0}$  otherwise. Let ${\displaystyle Y=1}$  if the ball from the second draw is red, and ${\displaystyle Y=0}$  otherwise.

1 Calculate the marginal pmf of ${\displaystyle X}$ .

• ${\displaystyle f_{X}(x)=(\mathbf {1} \{x=0\}+2\cdot \mathbf {1} \{x=1\})/3}$
• ${\displaystyle f_{X}(x)=(\mathbf {1} \{x=1\}+2\cdot \mathbf {1} \{x=0\})/3}$
• ${\displaystyle f_{X}(x)=2/3}$
• ${\displaystyle f_{X}(x)=(\mathbf {1} \{x=1\}+\mathbf {1} \{x=0\})/2}$

2 Calculate the joint pmf of ${\displaystyle (X,Y)}$ .

• ${\displaystyle f(x,y)=(1/9)(\mathbf {1} \{(x,y)=(0,0)\}+2\cdot \mathbf {1} \{(x,y)=(0,1)\}+2\cdot \mathbf {1} \{(x,y)=(1,0)\}+4\cdot \mathbf {1} \{(x,y)=(1,1)\})}$
• ${\displaystyle f(x,y)=(1/9)(4\cdot \mathbf {1} \{(x,y)=(0,0)\}+2\cdot \mathbf {1} \{(x,y)=(0,1)\}+2\cdot \mathbf {1} \{(x,y)=(1,0)\}+\mathbf {1} \{(x,y)=(1,1)\})}$
• ${\displaystyle f(x,y)=(2/9)(\mathbf {1} \{(x,y)=(0,0)\}+\mathbf {1} \{(x,y)=(0,1)\}+\mathbf {1} \{(x,y)=(1,0)\}+\mathbf {1} \{(x,y)=(1,1)\})}$

Exercise. Recall the example in the motivation section.

(a) Suppose we toss a fair coin twice. Let ${\displaystyle X=\mathbf {1} \{{\text{head comes up}}\}}$  and ${\displaystyle Y=2\cdot \mathbf {1} \{{\text{head comes up}}\}}$ . Show that joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)={\frac {\mathbf {1} \{x\in \{0,1\}\cap y\in \{0,2\}\}}{4}}.}$

(b) Suppose we toss a fair coin once. Let ${\displaystyle X=\mathbf {1} \{{\text{head comes up}}\}}$  and ${\displaystyle Y=2X}$ . Show that joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)={\frac {\mathbf {1} \{x\in \{0,1\}\cap y=2x\}}{2}}.}$

(c) Show that marginal pmf of ${\displaystyle X}$  and ${\displaystyle Y}$  are

${\displaystyle f_{X}(x)={\frac {\mathbf {1} \{x\in \{0,1\}\}}{2}}\quad {\text{and}}\quad f_{Y}(y)={\frac {\mathbf {1} \{y\in \{0,2\}\}}{2}}}$

in each of the situations in (a) and (b). (Hint: for part (b), substitute the relevant values into the variables inside the indicator.)

Proof.

(a) The support of ${\displaystyle (X,Y)}$  is ${\displaystyle \{(0,0),(0,2),(1,0),(1,2)\}}$ , and each of these four outcomes has probability ${\displaystyle (1/2)(1/2)=1/4}$  since the two tosses are independent. Hence the joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)=\mathbb {P} (X=x\cap Y=y)={\frac {\mathbf {1} \{x\in \{0,1\}\cap y\in \{0,2\}\}}{4}}.}$

(b) The support of ${\displaystyle (X,Y)}$  is ${\displaystyle \{(x,y):x\in \{0,1\}{\text{ and }}y=2x\}}$ , and each of the two outcomes ${\displaystyle (0,0)}$  and ${\displaystyle (1,2)}$  has probability ${\displaystyle 1/2}$ . Hence the joint pmf of ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)={\frac {\mathbf {1} \{x\in \{0,1\}\cap y=2x\}}{2}}.}$

(c) Part (a): marginal pmf of ${\displaystyle X}$  is

${\displaystyle f_{X}(x)=f(x,0)+f(x,2)={\frac {2\mathbf {1} \{x\in \{0,1\}\}}{4}}={\frac {\mathbf {1} \{x\in \{0,1\}\}}{2}},}$

and marginal pmf of ${\displaystyle Y}$  is
${\displaystyle f_{Y}(y)=f(0,y)+f(1,y)={\frac {2\mathbf {1} \{y\in \{0,2\}\}}{4}}={\frac {\mathbf {1} \{y\in \{0,2\}\}}{2}}.}$

Part (b): The marginal pmf of ${\displaystyle X}$  is

${\displaystyle f_{X}(x)=f(x,0)+f(x,2)={\frac {\mathbf {1} \{\overbrace {x\in \{0,1\}\cap 0=2x} ^{x=0}\}}{2}}+{\frac {\mathbf {1} \{\overbrace {x\in \{0,1\}\cap 2=2x} ^{x=1}\}}{2}}={\frac {\overbrace {\mathbf {1} \{x=0\}+\mathbf {1} \{x=1\}} ^{\mathbf {1} \{x=0\cup x=1\}}}{2}}={\frac {\mathbf {1} \{x\in \{0,1\}\}}{2}}.}$

Similarly, the marginal pmf of ${\displaystyle Y}$  is
${\displaystyle f_{Y}(y)=f(0,y)+f(1,y)={\frac {\mathbf {1} \{0\in \{0,1\}\cap y=0\}}{2}}+{\frac {\mathbf {1} \{1\in \{0,1\}\cap y=2\}}{2}}={\frac {\overbrace {\mathbf {1} \{0\in \{0,1\}\}} ^{1}\mathbf {1} \{y=0\}+\overbrace {\mathbf {1} \{1\in \{0,1\}\}} ^{1}\mathbf {1} \{y=2\}}{2}}={\frac {\mathbf {1} \{y\in \{0,2\}\}}{2}}.}$

For jointly continuous random variables, the definition is a generalized version of the one for a single continuous random variable (the univariate case).

Definition. (Jointly continuous random variable) Random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  are jointly continuous if

${\displaystyle \mathbb {P} {\big (}(X_{1},\dotsc ,X_{n})\in S{\big )}=\int \dotsi \int _{S}f(x_{1},\dotsc ,x_{n})\,dx_{1}\cdots \,dx_{n},\quad S\subseteq \mathbb {R} ^{n},}$

for some nonnegative function ${\displaystyle f}$ .

Remark.

• The function ${\displaystyle f}$  is the joint probability density function (joint pdf) of ${\displaystyle X_{1},\dotsc ,X_{n}}$ .
• Similarly, ${\displaystyle f(x_{1},\dotsc ,x_{n})\,dx_{1}\cdots \,dx_{n}}$  can be interpreted as the probability over the 'infinitesimal' region ${\displaystyle [x_{1},x_{1}+dx_{1}]\times \dotsb \times [x_{n},x_{n}+dx_{n}]}$ , and ${\displaystyle f(x_{1},\dotsc ,x_{n})}$  can be interpreted as the density of the probability over that 'infinitesimal' region, i.e. ${\displaystyle {\frac {\mathbb {P} {\big (}X\in [x_{1},x_{1}+dx_{1}]\times \dotsb \times [x_{n},x_{n}+dx_{n}]{\big )}}{dx_{1}\dotsb dx_{n}}}}$ , intuitively and non-rigorously.
• By setting ${\displaystyle S=(-\infty ,x_{1}]\times \dotsb \times (-\infty ,x_{n}]}$ , the cdf is

${\displaystyle F(x_{1},\dotsc ,x_{n})=\underbrace {\int _{-\infty }^{x_{1}}\cdots \int _{-\infty }^{x_{n}}} _{n\;{\text{integrations}}}f(u_{1},\dotsc ,u_{n})\,du_{n}\cdots \,du_{1},}$

which is similar to the univariate case.

Definition. (Marginal probability density function) The pdf ${\displaystyle f_{X_{i}}}$  of each random variable ${\displaystyle X_{i}}$  among the ${\displaystyle n}$  random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  is the marginal probability density function (marginal pdf) of ${\displaystyle X_{i}}$ .

Proposition. (Obtaining marginal pdf from joint pdf) For continuous random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  with joint pdf ${\displaystyle f}$ , the marginal pdf of ${\displaystyle X_{i}}$  is

${\displaystyle f_{X_{i}}({\color {red}x})=\underbrace {\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }} _{n-1\;{\text{integrations}}}f(u_{1},\dotsc ,u_{i-1},{\color {red}x},u_{i+1},\dotsc ,u_{n})\,du_{1}\cdots \,du_{i-1}\,du_{i+1}\cdots \,du_{n}.}$

Proof. Recall the proposition about obtaining marginal cdf from joint cdf. We have

{\displaystyle {\begin{aligned}&&F_{X_{i}}({\color {red}x})&=F(\infty ,\dotsc ,\infty ,\overbrace {\color {red}x} ^{i{\text{-th position}}},\infty ,\dotsc ,\infty )\\&\Rightarrow &\int _{-\infty }^{\color {red}x}f_{X_{i}}(u)\,du&=\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\color {red}x}\cdots \int _{-\infty }^{\infty }f(u_{1},\dotsc ,u_{n})\,du_{n}\cdots \,du_{i}\cdots \,du_{1}\qquad {\text{by definitions}}\\&\Rightarrow &{\frac {d}{dx}}\int _{-\infty }^{\color {red}x}f_{X_{i}}(u)\,du&={\frac {d}{dx}}\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\color {red}x}\cdots \int _{-\infty }^{\infty }f(u_{1},\dotsc ,u_{n})\,du_{n}\cdots \,du_{i}\cdots \,du_{1}\\&\Rightarrow &f_{X_{i}}({\color {red}x})&=\underbrace {\int _{-\infty }^{\infty }\cdots \int _{-\infty }^{\infty }} _{n-1\;{\text{integrations}}}f(u_{1},\dotsc ,u_{i-1},{\color {red}x},u_{i+1},\dotsc ,u_{n})\,du_{1}\cdots \,du_{i-1}\,du_{i+1}\cdots \,du_{n}\qquad {\text{by fundamental theorem of calculus}}\end{aligned}}}

${\displaystyle \Box }$

Proposition. (Obtaining joint pdf from joint cdf) If the joint cdf ${\displaystyle F}$  of jointly continuous random variables has the ${\displaystyle n}$ -th order mixed partial derivative at ${\displaystyle (x_{1},\dotsc ,x_{n})}$ , then the joint pdf is

${\displaystyle f(x_{1},\dotsc ,x_{n})={\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n}}}F(x_{1},\cdots ,x_{n}).}$

Proof. It follows from using fundamental theorem of calculus ${\displaystyle n}$  times.

${\displaystyle \Box }$

Example. If the joint pdf of the jointly continuous random variables ${\displaystyle (X,Y)}$  is

${\displaystyle f(x,y)=4xy(\mathbf {1} \{x,y\in [0,1]\})}$

(the constant ${\displaystyle 4}$  is exactly what makes ${\displaystyle f}$  integrate to one over the unit square), the marginal pdf of ${\displaystyle X}$  is
${\displaystyle f_{X}(x)=\int _{-\infty }^{\infty }4xy(\mathbf {1} \{x\in [0,1]\}\mathbf {1} \{y\in [0,1]\})\,dy=4x(\mathbf {1} \{x\in [0,1]\})\underbrace {\int _{0}^{1}y\,dy} _{1^{2}/2-0^{2}/2}=2x(\mathbf {1} \{x\in [0,1]\}).}$

Also,
${\displaystyle \mathbb {P} ((X,Y)\leq (1/2,1/2))=\int _{-\infty }^{1/2}\int _{-\infty }^{1/2}4xy\,\mathbf {1} \{x,y\in [0,1]\}\,dx\,dy=4\int _{0}^{1/2}y\int _{0}^{1/2}x\,dx\,dy=4\cdot {\frac {(1/2)^{2}}{2}}\int _{0}^{1/2}y\,dy=4\cdot {\frac {1}{8}}\cdot {\frac {1}{8}}={\frac {1}{16}}.}$
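As a numeric cross-check (a Python sketch; the midpoint rule and grid size are arbitrary choices), the constant ${\displaystyle c}$  that makes ${\displaystyle c\,xy}$  a density on ${\displaystyle [0,1]^{2}}$  is ${\displaystyle 4}$ :

```python
# Midpoint-rule integral of x*y over the unit square; for a bilinear
# integrand the midpoint rule is exact, so the integral is 1/4 and the
# normalizing constant is c = 1 / (1/4) = 4.
n = 400
h = 1.0 / n
total = 0.0
for i in range(n):
    x = (i + 0.5) * h
    for j in range(n):
        y = (j + 0.5) * h
        total += x * y * h * h
c = 1.0 / total
print(c)  # → 4.0 (up to floating-point rounding)
```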

Exercise. Let ${\displaystyle X}$  and ${\displaystyle Y}$  be jointly continuous random variables. Consider the joint cdf of ${\displaystyle (X,Y)}$ :

${\displaystyle F(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {x^{2}y^{3}}{32}}.}$

1 Calculate the joint pdf of ${\displaystyle (X,Y)}$ .

• ${\displaystyle f(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {3xy^{2}}{8}}}$
• ${\displaystyle f(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {3xy^{2}}{16}}}$
• ${\displaystyle f(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {xy^{2}}{16}}}$
• ${\displaystyle f(x,y)=\mathbf {1} \{x,y\in [0,2]\}{\frac {xy^{2}}{8}}}$

2 Calculate the marginal pdf of ${\displaystyle X}$ .

• ${\displaystyle \mathbf {1} \{0\leq y\leq 2\}{\frac {3}{8}}y^{2}}$
• ${\displaystyle \mathbf {1} \{0\leq x\leq 2\}{\frac {1}{2}}x}$
• ${\displaystyle \mathbf {1} \{0\leq y\leq 2\}{\frac {1}{12}}y^{3}}$
• ${\displaystyle \mathbf {1} \{0\leq x\leq 2\}{\frac {1}{8}}x^{2}}$

## Independence

Recall that multiple events are independent if the probability for the intersection of them equals the product of probabilities of each event, by definition. Since ${\displaystyle \{X\in A\}}$  is also an event, we have a natural definition of independence for random variables as follows:

Definition. (Independence of random variables) Random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  are independent if

${\displaystyle \mathbb {P} (X_{1}\in A_{1}\cap \cdots \cap X_{n}\in A_{n})=\mathbb {P} (X_{1}\in A_{1})\cdots \mathbb {P} (X_{n}\in A_{n})}$

for every choice of subsets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\subseteq \mathbb {R} }$ .

Remark. Under this condition, the events ${\displaystyle \{X_{1}\in A_{1}\},\dotsc ,\{X_{n}\in A_{n}\}}$  are independent.

Theorem. (Alternative condition for independence of random variables) Random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  are independent if and only if the joint cdf of ${\displaystyle (X_{1},\dotsc ,X_{n})}$  factorizes as

${\displaystyle F(x_{1},\dotsc ,x_{n})=F_{X_{1}}(x_{1})\cdots F_{X_{n}}(x_{n})}$

or, equivalently, the joint pdf or pmf of ${\displaystyle (X_{1},\dotsc ,X_{n})}$  factorizes as
${\displaystyle f(x_{1},\dotsc ,x_{n})=f_{X_{1}}(x_{1})\cdots f_{X_{n}}(x_{n})}$

for each ${\displaystyle x_{1},\dotsc ,x_{n}\in \mathbb {R} }$ .

Proof. Partial:

Only if part: If random variables ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$  are independent,

${\displaystyle \mathbb {P} (X_{1}\in A_{1}\cap \cdots \cap X_{n}\in A_{n})=\mathbb {P} (X_{1}\in A_{1})\cdots \mathbb {P} (X_{n}\in A_{n})}$

for every choice of subsets ${\displaystyle A_{1},A_{2},\dotsc ,A_{n}\subseteq \mathbb {R} }$ . Setting ${\displaystyle A_{1}=(-\infty ,x_{1}],\dotsc ,A_{n}=(-\infty ,x_{n}]}$ , we have
${\displaystyle \mathbb {P} (X_{1}\leq x_{1}\cap \cdots \cap X_{n}\leq x_{n})=\mathbb {P} (X_{1}\leq x_{1})\cdots \mathbb {P} (X_{n}\leq x_{n})\implies F(x_{1},\dotsc ,x_{n})=F_{X_{1}}(x_{1})\cdots F_{X_{n}}(x_{n}).}$

Thus, we obtain the result for the joint cdf part.

For the joint pdf part,

{\displaystyle {\begin{aligned}&&F(x_{1},\dotsc ,x_{n})&=F_{X_{1}}(x_{1})\cdots F_{X_{n}}(x_{n})\\&\Rightarrow &{\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n}}}F(x_{1},\dotsc ,x_{n})&={\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n}}}\left(F_{X_{1}}(x_{1})\cdots F_{X_{n}}(x_{n})\right)\\&\Rightarrow &f(x_{1},\dotsc ,x_{n})&=f_{X_{n}}(x_{n}){\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n-1}}}\left(F_{X_{1}}(x_{1})\cdots F_{X_{n-1}}(x_{n-1})\right)\\&&&=f_{X_{n}}(x_{n})f_{X_{n-1}}(x_{n-1}){\frac {\partial ^{n}}{\partial x_{1}\cdots \partial x_{n-2}}}\left(F_{X_{1}}(x_{1})\cdots F_{X_{n-2}}(x_{n-2})\right)\\&&&=\cdots =f_{X_{1}}(x_{1})\cdots f_{X_{n}}(x_{n})\end{aligned}}}

${\displaystyle \Box }$

Remark.

• That is, the random variables are independent exactly when the joint cdf (or joint pdf or pmf) can be factorized as the product of the marginal cdf's (or marginal pdf's or pmf's).
• Actually, if we can factorize the joint cdf, joint pdf, or joint pmf as a product of some functions, one in each of the variables, then the condition is also satisfied (the factors need not be the marginals themselves).

Example. The joint pdf of two independent exponential random variables ${\displaystyle X}$  and ${\displaystyle Y}$ , each with rate ${\displaystyle \lambda }$ , is

${\displaystyle f(x,y)=(\mathbf {1} \{x\geq 0\}\lambda e^{-\lambda x})(\mathbf {1} \{y\geq 0\}\lambda e^{-\lambda y})=\mathbf {1} \{x,y\geq 0\}\lambda ^{2}e^{-\lambda (x+y)}.}$

(Random variables ${\displaystyle X}$  and ${\displaystyle Y}$  are said to be independent and identically distributed (i.i.d.) in this case)

In general, the joint pdf of ${\displaystyle n}$  independent exponential random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$ , each with rate ${\displaystyle \lambda }$ , is

${\displaystyle f(x_{1},\dotsc ,x_{n})=\mathbf {1} \{x_{1},\dotsc ,x_{n}\geq 0\}\lambda ^{n}e^{-\lambda (x_{1}+\cdots +x_{n})}.}$

(Random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  are also i.i.d. in this case)

On the other hand, if the joint pdf of two random variables ${\displaystyle V}$  and ${\displaystyle W}$  is

${\displaystyle f(v,w)=\mathbf {1} \{v,w\geq 0\}\mathbf {1} \{w\leq 2-2v\}}$

(a uniform density on the triangle with vertices ${\displaystyle (0,0)}$ , ${\displaystyle (1,0)}$  and ${\displaystyle (0,2)}$ , which has area one), then the random variables ${\displaystyle V}$  and ${\displaystyle W}$  are dependent, since the joint pdf cannot be factorized as a product of marginal pdf's.
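To see the dependence numerically, we can sample uniformly from the triangle with vertices ${\displaystyle (0,0),(1,0),(0,2)}$  (i.e. assuming ${\displaystyle v,w\geq 0}$  in addition to ${\displaystyle w\leq 2-2v}$ ) by rejection sampling. This is a Python sketch using only the standard library:

```python
import random

random.seed(2)
N = 200_000
pts = []
while len(pts) < N:
    v, w = random.uniform(0, 1), random.uniform(0, 2)  # bounding box of the triangle
    if w <= 2 - 2 * v:                                 # keep points below the line
        pts.append((v, w))

pA  = sum(1 for v, w in pts if v > 0.5) / N             # P(V > 1/2), about 1/4
pB  = sum(1 for v, w in pts if w > 1.0) / N             # P(W > 1),   about 1/4
pAB = sum(1 for v, w in pts if v > 0.5 and w > 1.0) / N # exactly 0 on the triangle
print(pAB, pA * pB)  # 0.0 versus a strictly positive product: V, W dependent
```

Here ${\displaystyle V>1/2}$  forces ${\displaystyle W<1}$ , so the joint probability is zero while the product of the marginals is about ${\displaystyle 1/16}$ , confirming dependence.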

Exercise. Let ${\displaystyle X,Y,Z}$  be jointly continuous random variables. Consider a joint pdf of ${\displaystyle (X,Y,Z)}$ :

${\displaystyle f(x,y,z)=\mathbf {1} \{x,y,z\geq 0\}\mathbf {1} \{x+y+z/k\leq 1\}.}$

1 Calculate ${\displaystyle k}$ .

 1 2 3 4

2 Are ${\displaystyle X,Y,Z}$  independent?

 yes no

Consider another joint pdf of ${\displaystyle (X,Y,Z)}$ :

${\displaystyle f(x,y,z)=\mathbf {1} \{x,y,z\geq 0\}\mathbf {1} \{y\leq 1-x\}\mathbf {1} \{z\leq k\}}$

1 Calculate ${\displaystyle k}$ .

 1 2 3 4

2 Are ${\displaystyle X,Y,Z}$  independent?

 yes no

Consider another joint pdf of ${\displaystyle (X,Y,Z)}$ :

${\displaystyle f(x,y,z)=kxyz\mathbf {1} \{x,y\in [0,1]\}\mathbf {1} \{z\in [0,2]\}.}$

1 Calculate ${\displaystyle k}$ .

 1 2 3 4

2 Are ${\displaystyle X,Y,Z}$  independent?

 yes no

Proposition. (Independence of events concerning disjoint sets of independent random variables) Suppose random variables ${\displaystyle X_{1},X_{2},\dotsc }$  are independent. Then, for each choice of indices ${\displaystyle r<s<t<\dotsb }$  and fixed functions ${\displaystyle f_{1},f_{2},f_{3},\dotsc }$ , the random variables

${\displaystyle Y_{1}=f_{1}(X_{1},\dotsc ,X_{\color {red}r}),\quad Y_{2}=f_{2}(X_{{\color {red}r}+1},\dotsc ,X_{\color {blue}s}),\quad Y_{3}=f_{3}(X_{{\color {blue}s}+1},\dotsc ,X_{t}),\dotsc }$

are independent.

Example. Suppose ${\displaystyle X_{1},X_{2},X_{3},X_{4}}$  are independent Bernoulli random variables with success probability ${\displaystyle p}$ . Then, ${\displaystyle Y_{1}=X_{1}+X_{2}}$  and ${\displaystyle Y_{2}=X_{3}-X_{4}}$  are also independent.

On the other hand, ${\displaystyle Y_{1}=X_{1}+X_{2}}$  and ${\displaystyle Y_{2}=2-X_{3}-X_{2}}$  are not independent. A counter-example to the independence is

${\displaystyle \underbrace {\mathbb {P} (Y_{1}=2\cap Y_{2}=2)} _{0}\neq \underbrace {\mathbb {P} (Y_{1}=2)\mathbb {P} (Y_{2}=2)} _{{\text{may}}\;\neq 0}.}$

The left-hand side equals zero since ${\displaystyle Y_{1}=2\implies X_{2}=1}$ , but ${\displaystyle Y_{2}=2\implies X_{2}=0}$ .

The right-hand side may not equal zero, since ${\displaystyle \mathbb {P} (Y_{1}=2)=\mathbb {P} (X_{1}=1\cap X_{2}=1)=p^{2}}$  and ${\displaystyle \mathbb {P} (Y_{2}=2)=\mathbb {P} (X_{2}=0\cap X_{3}=0)=(1-p)^{2}}$ . We can see that ${\displaystyle p^{2}(1-p)^{2}}$  is nonzero whenever ${\displaystyle 0<p<1}$ .
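A short simulation sketch (Python, standard library only; the choice ${\displaystyle p=0.5}$  is arbitrary) confirms the counter-example:

```python
import random

random.seed(3)
p, N = 0.5, 200_000
n_y1, n_y2, n_both = 0, 0, 0
for _ in range(N):
    x1, x2, x3, x4 = (int(random.random() < p) for _ in range(4))
    y1 = x1 + x2
    y2 = 2 - x3 - x2   # shares X_2 with Y_1, so not independent of it
    n_y1 += (y1 == 2)
    n_y2 += (y2 == 2)
    n_both += (y1 == 2 and y2 == 2)

# P(Y1 = 2 and Y2 = 2) is exactly 0, but P(Y1 = 2) P(Y2 = 2) = p^2 (1-p)^2 > 0
print(n_both / N, (n_y1 / N) * (n_y2 / N))
```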

Exercise.

Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be i.i.d. random variables, and ${\displaystyle Y_{1},\dotsc ,Y_{n}}$  also be i.i.d. random variables. Which of the following is (are) true?

• ${\displaystyle \sum _{i=1}^{n-1}X_{i}}$  and ${\displaystyle X_{n}}$  are independent.
• ${\displaystyle X_{1}^{X_{2}}}$  and ${\displaystyle X_{3}^{X_{4}}}$  are independent.
• ${\displaystyle \prod _{i=1}^{n}X_{i}}$  and ${\displaystyle \prod _{i=1}^{n}Y_{i}}$  are independent.
• ${\displaystyle X_{1}+X_{2}+X_{3}}$  and ${\displaystyle Y_{1}+Y_{2}+Y_{3}}$  are independent if ${\displaystyle X_{1},\dotsc ,X_{n},Y_{1}}$  are independent.

### Sum of independent random variables (optional)

In general, we determine the distribution of a sum of independent random variables from the joint cdf, pmf, or pdf by first principles. In particular, there are some interesting results related to the distribution of such sums.

### Order statistics

Definition. (Order statistics) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be ${\displaystyle n}$  i.i.d. r.v.'s (each with cdf ${\displaystyle F(x)}$ ). Define ${\displaystyle X_{(1)},X_{(2)},\dotsc ,X_{(n)}}$  to be the smallest, second smallest, ..., largest of ${\displaystyle X_{1},X_{2},\dotsc ,X_{n}}$ . Then, the ordered values ${\displaystyle X_{(1)}\leq X_{(2)}\leq \dotsb \leq X_{(n)}}$  are the order statistics.

Proposition. (Cdf of order statistics) The cdf of ${\displaystyle X_{(k)}}$  (${\displaystyle k}$  is an integer such that ${\displaystyle 1\leq k\leq n}$ ) is

${\displaystyle F_{X_{(k)}}({\color {blue}x})=\sum _{j=k}^{n}{\binom {n}{j}}(F({\color {blue}x}))^{j}{\big (}1-F({\color {blue}x}){\big )}^{n-j}.}$

Proof.

• Consider the event ${\displaystyle \{X_{(k)}\leq {\color {blue}x}\}}$ .
                          Possible positions of x
|<--------------------->
*---*----...------*----*------...--------*
X  (1)  (2)          (k)  (k+1)             (n)
|----------------------> when x moves RHS like this, >=k X_i are at the LHS of x

• We can see from the above figure that ${\displaystyle \{X_{(k)}\leq {\color {blue}x}\}=\{{\text{at least }}k{\text{ of the }}X_{i}{\text{'s are }}\leq {\color {blue}x}\}}$ .
• Let no. of ${\displaystyle X_{i}}$ 's that are less than or equal to ${\displaystyle {\color {blue}x}}$  be ${\displaystyle N}$ .
• Since ${\displaystyle N\sim \operatorname {Binom} (n,\mathbb {P} (X_{i}\leq {\color {blue}x})){\overset {\text{ def }}{=}}\operatorname {Binom} (n,F({\color {blue}x}))}$  (because for each ${\displaystyle X_{i}}$ , we can treat ${\displaystyle X_{i}\leq x}$  and ${\displaystyle X_{i}>x}$  as the two outcomes of a Bernoulli trial),
• The cdf is

${\displaystyle \mathbb {P} (X_{(k)}\leq {\color {blue}x})=\mathbb {P} (N\geq k)=\sum _{j=k}^{n}{\binom {n}{j}}(F({\color {blue}x}))^{j}{\big (}1-F({\color {blue}x}){\big )}^{n-j}.}$

${\displaystyle \Box }$

Example. Let ${\displaystyle X_{1},X_{2},X_{3}}$  be i.i.d. r.v.'s following ${\displaystyle \operatorname {Exp} (2)}$ . Then, the cdf of ${\displaystyle X_{(2)}}$  is

${\displaystyle \sum _{j=2}^{3}{\binom {3}{j}}(F(x))^{j}(1-F(x))^{3-j}=\mathbf {1} \{x\geq 0\}\left({\binom {3}{2}}(1-e^{-2x})^{2}(e^{-2x})+{\binom {3}{3}}(1-e^{-2x})^{3}\right)=\mathbf {1} \{x\geq 0\}\left(3(1-e^{-2x})^{2}e^{-2x}+(1-e^{-2x})^{3}\right).}$
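We can verify this cdf by simulation (a Python sketch using only the standard library; the evaluation point ${\displaystyle x=0.5}$  is an arbitrary choice):

```python
import math
import random

random.seed(4)
lam, N, x = 2.0, 100_000, 0.5
count = 0
for _ in range(N):
    sample = sorted(random.expovariate(lam) for _ in range(3))
    count += (sample[1] <= x)   # sample[1] is the second smallest, X_(2)

F = 1 - math.exp(-lam * x)            # cdf of Exp(2) at x
exact = 3 * F**2 * (1 - F) + F**3     # sum_{j=2}^{3} C(3,j) F^j (1-F)^(3-j)
print(round(count / N, 4), round(exact, 4))
```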

Exercise.

Calculate ${\displaystyle \mathbb {P} (X_{(2)}\geq 2)}$ .

• 0.000665
• 0.000994
• 0.036296
• 0.963704
• 0.999335

## Poisson process

Definition.

If successive interarrival times of unpredictable events are independent random variables, with each following an exponential distribution with a common rate ${\displaystyle \lambda }$ , then the process of arrivals is a Poisson process with rate ${\displaystyle \lambda }$ .

There are several important properties of the Poisson process.

Proposition. (Time to the ${\displaystyle n}$ -th event in a Poisson process) The time to the ${\displaystyle n}$ -th event in a Poisson process with rate ${\displaystyle \lambda }$  follows the ${\displaystyle \operatorname {Gamma} (n,\lambda )}$  distribution.

Proof.

• The time to ${\displaystyle n}$ -th event is ${\displaystyle X_{1}+\dotsb +X_{n}}$ , with each following ${\displaystyle \operatorname {Exp} (\lambda )}$ .
• It suffices to prove that ${\displaystyle X_{1}+X_{2}\sim \operatorname {Gamma} (2,\lambda )}$ , and then the desired result follows by induction.
• {\displaystyle {\begin{aligned}f_{X_{1}+X_{2}}(z)&=\lambda ^{2}\int _{-\infty }^{\infty }\mathbf {1} \{\underbrace {z-x\geq 0} _{x\leq z}\}\mathbf {1} \{x\geq 0\}e^{-\lambda (z-x)}e^{-\lambda x}\,dx&{\text{by proposition about convolution of pdf's}}\\&=\lambda ^{2}\int _{0}^{z}e^{-\lambda (z{\cancel {-x}}){\cancel {-\lambda x}}}\,dx\\&=\lambda ^{2}\int _{0}^{z}e^{-\lambda z}\,dx\\&=\lambda ^{2}ze^{-\lambda z}\\&={\frac {\lambda ^{2}ze^{-\lambda z}}{\Gamma (2)}}&{\text{since }}\Gamma (2)=1!=1,\end{aligned}}}

which is the pdf of ${\displaystyle \operatorname {Gamma} (2,\lambda )}$ , as desired.

${\displaystyle \Box }$
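A quick moment check by simulation (a Python sketch; the choice ${\displaystyle \lambda =2}$  is arbitrary): the sum of two independent ${\displaystyle \operatorname {Exp} (\lambda )}$  variables should have the ${\displaystyle \operatorname {Gamma} (2,\lambda )}$  mean ${\displaystyle 2/\lambda }$  and variance ${\displaystyle 2/\lambda ^{2}}$ :

```python
import random

random.seed(5)
lam, N = 2.0, 200_000
sums = [random.expovariate(lam) + random.expovariate(lam) for _ in range(N)]

mean = sum(sums) / N
var = sum((s - mean) ** 2 for s in sums) / N
print(round(mean, 3), round(var, 3))  # near 2/lam = 1.0 and 2/lam^2 = 0.5
```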

Remark. The time to ${\displaystyle n}$ -th event is also the sum of the ${\displaystyle n}$  successive interarrival times before the ${\displaystyle n}$ -th event.

Proposition. (Number of arrivals within a fixed time interval) The number of arrivals within a fixed time interval of length ${\displaystyle t}$  follows the ${\displaystyle \operatorname {Pois} (\lambda t)}$  distribution.

Proof. For each nonnegative integer ${\displaystyle n}$ , let ${\displaystyle V}$  be the interarrival time between the ${\displaystyle n}$ -th and ${\displaystyle (n+1)}$ -th arrivals, and ${\displaystyle W}$  be the time to the ${\displaystyle n}$ -th arrival, both measured from the beginning of the fixed time interval (we can treat this start as time zero because of the memoryless property). The joint pdf of ${\displaystyle (V,W)}$  is

{\displaystyle {\begin{aligned}f(v,w)&=f_{V}(v)f_{W}(w)&{\text{by independence}}\\&=\underbrace {(\lambda e^{-\lambda v})} _{{\text{pdf of}}\;\operatorname {Exp} (\lambda )}\underbrace {\left({\frac {\lambda ^{n}w^{n-1}e^{-\lambda w}}{(n-1)!}}\right)} _{{\text{pdf of}}\operatorname {Gamma} (n,\lambda )}.\end{aligned}}}

Let ${\displaystyle N}$  be the number of arrivals within the fixed time interval. The pmf of ${\displaystyle N}$  is
{\displaystyle {\begin{aligned}\mathbb {P} (N=n)&=\mathbb {P} (W\leq t\cap \underbrace {V+W>t} _{V>t-W})\\&=\int _{0}^{t}\int _{t-w}^{\infty }\underbrace {f(v,w)} _{{\text{joint pdf of}}\;(V,W)}\,dv\,dw\\&=\int _{0}^{t}\int _{t-w}^{\infty }(\lambda e^{-\lambda v})\left({\frac {\lambda ^{n}w^{n-1}e^{-\lambda w}}{(n-1)!}}\right)\,dv\,dw\\&=\int _{0}^{t}{\frac {\lambda ^{n}w^{n-1}e^{-\lambda w}}{(n-1)!}}\int _{t-w}^{\infty }\lambda e^{-\lambda v}\,dv\,dw\\&={\frac {\lambda ^{n}}{(n-1)!}}\int _{0}^{t}w^{n-1}{\cancel {e^{-\lambda w}}}(0-(-e^{-\lambda (t{\cancel {-w}})}))\,dw\\&={\frac {\lambda ^{n}{\color {green}e^{-\lambda t}}}{(n-1)!}}\int _{0}^{t}w^{n-1}\,dw\\&={\frac {\lambda ^{n}e^{-\lambda t}}{(n-1)!}}\cdot \left({\frac {t^{n}}{n}}-0\right)\\&={\frac {e^{-\lambda t}(\lambda t)^{n}}{n!}}\end{aligned}}}

which is the pmf of ${\displaystyle \operatorname {Pois} (\lambda t)}$ . The result follows.

${\displaystyle \Box }$
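The proposition can be checked by simulating the process directly from its exponential interarrival times (a Python sketch; the rate and interval length are arbitrary choices). For ${\displaystyle \operatorname {Pois} (\lambda t)}$ , both the mean and the variance of the count should be ${\displaystyle \lambda t}$ :

```python
import random

random.seed(6)
lam, t, N = 1.5, 2.0, 100_000

counts = []
for _ in range(N):
    clock, n = 0.0, 0
    while True:
        clock += random.expovariate(lam)  # next exponential interarrival time
        if clock > t:
            break
        n += 1
    counts.append(n)

mean = sum(counts) / N
var = sum((c - mean) ** 2 for c in counts) / N
print(round(mean, 3), round(var, 3))  # both near lam * t = 3.0
```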

Proposition. (Time to the first arrival among ${\displaystyle n}$  independent Poisson processes) Let ${\displaystyle T_{1},T_{2},\dotsc ,T_{n}}$  be independent random variables with ${\displaystyle T_{i}\sim \operatorname {Exp} (\lambda _{i})}$ , in which ${\displaystyle i=1,2,\dotsc ,n}$ . If we define ${\displaystyle T=\min\{T_{1},\dotsc ,T_{n}\}}$  (which is the time to the first arrival among ${\displaystyle n}$  independent Poisson processes), then ${\displaystyle T\sim \operatorname {Exp} (\lambda _{1}+\lambda _{2}+\cdots +\lambda _{n})}$ .

Proof. For each ${\displaystyle t>0}$ ,

{\displaystyle {\begin{aligned}&&\mathbb {P} (T>t)&=\mathbb {P} (T_{1}>t\cap \cdots \cap T_{n}>t)\\&&&=\mathbb {P} (T_{1}>t)\cdots \mathbb {P} (T_{n}>t)&{\text{by independence}}\\&&&=[1-(\underbrace {1-e^{-\lambda _{1}t}} _{{\text{cdf of}}\;\operatorname {Exp} (\lambda _{1})})]\cdots [1-(\underbrace {1-e^{-\lambda _{n}t}} _{{\text{cdf of}}\;\operatorname {Exp} (\lambda _{n})})]\\&&&=e^{-t(\lambda _{1}+\cdots +\lambda _{n})}\\&\Rightarrow &\mathbb {P} (T\leq t)&=1-e^{-t(\lambda _{1}+\cdots +\lambda _{n})}\\&\Rightarrow &T&\sim \operatorname {Exp} (\lambda _{1}+\lambda _{2}+\cdots +\lambda _{n})\end{aligned}}}

${\displaystyle \Box }$
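A simulation sketch of the minimum of independent exponentials (Python, standard library only; the rates and the evaluation point are arbitrary choices) agrees with the stated survival function ${\displaystyle \mathbb {P} (T>t)=e^{-t(\lambda _{1}+\cdots +\lambda _{n})}}$ :

```python
import math
import random

random.seed(7)
rates = [0.5, 1.0, 2.5]   # lambda_1, lambda_2, lambda_3 (arbitrary)
N, t = 200_000, 0.3
mins = [min(random.expovariate(r) for r in rates) for _ in range(N)]

emp = sum(1 for m in mins if m > t) / N
exact = math.exp(-sum(rates) * t)   # survival function of Exp(0.5 + 1.0 + 2.5)
print(round(emp, 4), round(exact, 4))
```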

Example. Suppose there are two service counters, counter A and counter B, with independent service times following the exponential distribution with rate ${\displaystyle \lambda }$ . John and Peter have been served at counters A and B respectively for the past 10 minutes.

First, the time you need to wait to be served (i.e. the time until one of John and Peter leaves his counter) is the minimum of their remaining service times which, by the memoryless property, are independent and follow the exponential distribution with rate ${\displaystyle \lambda }$ . Thus, your waiting time follows the exponential distribution with rate ${\displaystyle \lambda +\lambda =2\lambda }$ .

Suppose now John leaves counter A, and you are being served at counter A. Then, the probability that you leave your counter before Peter is ${\displaystyle 1/2}$ : by the memoryless property and symmetry, the chances that you and Peter leave first are governed by the same chance mechanism, even though Peter has been served for longer. This may seem counterintuitive.

Exercise. Suppose the process of arrivals of car accidents is a Poisson process with unit rate. Let ${\displaystyle T_{i}}$  be the time to the ${\displaystyle i}$ -th car accident, and ${\displaystyle X_{i}}$  be the interarrival time between the ${\displaystyle (i-1)}$ -th and ${\displaystyle i}$ -th accidents.

1 Which of the following is (are) true?

• ${\displaystyle T_{3}\sim \operatorname {Gamma} (3,1)}$
• ${\displaystyle T_{3}\sim \operatorname {Exp} (1)}$
• ${\displaystyle T_{3}\sim \operatorname {Exp} (3)}$
• ${\displaystyle T_{3}\sim \operatorname {Pois} (1)}$
• ${\displaystyle T_{3}\sim \operatorname {Pois} (3)}$

2 Which of the following is (are) true?

• ${\displaystyle X_{i}\sim \operatorname {Exp} (i)}$
• ${\displaystyle X_{i}\sim \operatorname {Exp} (1)}$
• ${\displaystyle X_{i}\sim \operatorname {Pois} (1)}$
• ${\displaystyle X_{i}-X_{i-1}\sim \operatorname {Exp} (1)}$

3 Which of the following is (are) true?

• ${\displaystyle T_{i}-T_{i-1}\sim \operatorname {Exp} (1)}$
• ${\displaystyle T_{i}-T_{i-1}\sim \operatorname {Gamma} (1,1)}$
• ${\displaystyle T_{i}-T_{i-1}\sim \operatorname {Pois} (1)}$
• The pmf of the number of arrivals within a fixed time interval of length ${\displaystyle t}$  is ${\displaystyle f(x)={\frac {e^{-t}t^{x}}{x!}}}$ .