# Probability/Conditional Probability

## Motivation

In some situations we are told that a certain event has occurred, and we need a probability that takes this information into account.

Consider the Monty Hall problem:

Suppose you're on a game show, and you're given the choice of three doors: Behind one door is a car; behind the others, goats. You pick a door, say No. 1, and the host, who knows what's behind the doors, opens another door, say No. 3, which has a goat. He then says to you, "Do you want to pick door No. 2?" Is it to your advantage to switch your choice? (from Wikipedia)

[1]

Illustration of the situation

There are some (implicit) assumptions:

• The host must open a door that is not picked by us.
• The host must open a door with a goat, not the car, behind it.

To determine whether switching is to our advantage, we need the probability that the car is behind the door we would switch to (door No. 2), given what we have observed.

This probability is a conditional probability (the conditions are that the host opens door No. 3 and that we pick door No. 1), and we will discuss its value later in this chapter.

## Definition

Let's motivate the definition of conditional probability by considering the following Venn diagram.

*-------------------------*
|        *---------*      |
|        |   B\A   |      |              *---------*
|   *----*----*    |<-- B |              |   B\A   |
|   |    |    |    |      |    ---->     *----*    |
|   |    |AnB |    |      |              |    |    | <--- B=Omega'
|   |    *----*----*      |              |AnB |    |
|   | A\B     | <-- A     | <--- Omega   *----*----*
|   *---------*           |
*-------------------------*


Without any condition, the probability of ${\textstyle A}$  is illustrated by the rectangular region consisting of both ${\textstyle A\setminus B}$  and ${\textstyle A\cap B}$ . In the Venn diagram, the ratio of the area of the region ${\textstyle A}$  to the area of the whole sample space ${\textstyle \Omega }$  is the ratio of ${\textstyle \mathbb {P} (A)}$  to ${\textstyle \mathbb {P} (\Omega ){\overset {\text{ P2 }}{=}}1}$  (or simply ${\textstyle \mathbb {P} (A)}$ ). So,

${\displaystyle {\frac {{\text{area of }}A}{{\text{area of }}\Omega }}={\frac {\mathbb {P} (A)}{\mathbb {P} (\Omega )}}=\mathbb {P} (A).}$

If we are given ${\textstyle B}$  (implying that ${\textstyle \mathbb {P} (B)>0}$ ), then we can regard ${\textstyle B}$  as the new sample space (RHS), say ${\textstyle B{\overset {\text{ def }}{=}}\Omega '}$ . Then, intuitively, the probability of ${\textstyle A}$  given ${\textstyle B}$  should be the ratio of area occupied by ${\textstyle A}$  in the region for ${\textstyle B=\Omega '}$  (i.e. area of ${\textstyle A\cap B}$ ) to the area of ${\textstyle B{\overset {\text{ def }}{=}}\Omega '}$ . So, the probability of ${\textstyle A}$  given ${\textstyle B}$  should be

${\displaystyle {\frac {{\text{area of }}A\cap B}{{\text{area of }}B}}={\frac {{\text{area of }}A\cap B/{\text{area of }}\Omega }{{\text{area of }}B/{\text{area of }}\Omega }}={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}},}$

which is exactly the definition of conditional probability.

Definition. (Conditional probability) Conditional probability of event ${\textstyle A}$  given event ${\textstyle B}$  is

${\displaystyle \mathbb {P} (A|B){\overset {\text{ def }}{=}}{\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}},}$

assuming ${\textstyle \mathbb {P} (B)>0}$ .

Remark.

• The assumption ${\textstyle \mathbb {P} (B)>0}$  prevents the above formula from giving an undefined value.
• Also, it does not make sense to consider the probability of an event conditional on an impossible event: since an impossible event can never happen, how can it be given to have happened?
• It follows that ${\textstyle \mathbb {P} (A\cap B)=\mathbb {P} (A|B)\mathbb {P} (B)}$  for each event ${\textstyle A}$  and ${\textstyle B}$  with ${\textstyle \mathbb {P} (B)>0}$  (simplified multiplication rule of probability).
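The definition can be sketched in code for a finite sample space with equally likely outcomes. This is a minimal illustration, not part of the original text: the helper names `prob` and `cond_prob` and the events "at most 2" and "even" on a six-faced die are assumptions made for the example.

```python
from fractions import Fraction

def prob(event, omega):
    """P(event) on a finite sample space omega with equally likely outcomes."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(a, b, omega):
    """P(A|B) = P(A n B) / P(B); only defined when P(B) > 0."""
    p_b = prob(b, omega)
    if p_b == 0:
        raise ValueError("P(B) must be positive")
    return prob(a & b, omega) / p_b

# Hypothetical example: one roll of a fair six-faced die.
omega = frozenset(range(1, 7))
a = frozenset({1, 2})      # "at most 2 comes up"
b = frozenset({2, 4, 6})   # "an even number comes up"

print(cond_prob(a, b, omega))                                   # 1/3
# Simplified multiplication rule: P(A n B) = P(A|B) P(B)
print(prob(a & b, omega) == cond_prob(a, b, omega) * prob(b, omega))  # True
```

Exact fractions are used instead of floating-point numbers so that equalities such as the multiplication rule can be checked without rounding error.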

Example. (Conditional probability is a probability) Conditional probability is a probability, since it satisfies all 3 probability axioms.

Proof.

(P1) Since the numerator and denominator in the formula are both probabilities (i.e. they satisfy the 3 probability axioms), both are nonnegative. In particular, the denominator is positive, by the assumption. It follows that the fraction is nonnegative.
(P2) it suffices to prove that ${\textstyle \mathbb {P} (\Omega |B)=1}$  for each event ${\textstyle B}$  with ${\textstyle \mathbb {P} (B)>0}$ , which is true since ${\textstyle \mathbb {P} (\Omega |B){\overset {\text{ def }}{=}}{\frac {\mathbb {P} (\Omega \cap B)}{\mathbb {P} (B)}}={\frac {\mathbb {P} (B)}{\mathbb {P} (B)}}=1}$  (${\textstyle B\subseteq \Omega }$  by definition of event).
(P3) for each infinite sequence of disjoint events ${\textstyle A_{1},A_{2},\dotsc }$ ,
${\displaystyle \mathbb {P} \left(\bigcup _{i=1}^{\infty }A_{i}{\bigg |}B\right)={\frac {\mathbb {P} {\big (}(A_{1}\cup A_{2}\cup \dotsb )\cap B{\big )}}{\mathbb {P} (B)}}={\frac {\mathbb {P} {\big (}(A_{1}\cap B)\cup (A_{2}\cap B)\cup \dotsb {\big )}}{\mathbb {P} (B)}}{\overset {\text{ P3 }}{=}}{\frac {\sum _{i=1}^{\infty }\mathbb {P} (A_{i}\cap B)}{\mathbb {P} (B)}}=\sum _{i=1}^{\infty }{\big (}\mathbb {P} (A_{i}\cap B)/\underbrace {\mathbb {P} (B)} _{{\text{not involving }}i}{\big )}{\overset {\text{ def }}{=}}\sum _{i=1}^{\infty }\mathbb {P} (A_{i}|B).}$

${\displaystyle \Box }$

Example. (Special cases for conditional probability) If ${\textstyle B\subseteq A}$  (${\textstyle B}$  implies ${\textstyle A}$ ), then ${\textstyle \mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}={\frac {\mathbb {P} (B)}{\mathbb {P} (B)}}=1}$ , as expected (since given ${\textstyle B}$ , which implies ${\textstyle A}$ , ${\textstyle A}$  is certain).

If ${\textstyle A}$  and ${\textstyle B}$  are disjoint, ${\textstyle \mathbb {P} (A|B)={\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}={\frac {\mathbb {P} (\varnothing )}{\mathbb {P} (B)}}={\frac {0}{\mathbb {P} (B)}}=0}$ .

Example. (Even and prime numbers) We roll a fair five-faced dice (with faces numbered 1 to 5) one time. Let ${\textstyle E}$  and ${\textstyle P}$  be the events that even number comes up and prime number comes up respectively. Then, ${\textstyle \mathbb {P} (E|P)={\frac {\mathbb {P} (E\cap P)}{\mathbb {P} (P)}}={\frac {1/5}{3/5}}={\frac {1}{3}}}$ , and ${\textstyle \mathbb {P} (P|E)={\frac {\mathbb {P} (P\cap E)}{\mathbb {P} (E)}}={\frac {1/5}{2/5}}={\frac {1}{2}}}$ .

Proof. The result follows from observing that among 1,2,3,4 and 5,

• there are 3 prime numbers, namely 2,3 and 5;
• there are 2 even numbers, namely 2 and 4;
• there is 1 number that is both prime and even, namely 2.

${\displaystyle \Box }$
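The counting argument above can be checked by enumeration. The sketch below assumes the five faces are numbered 1 to 5; the helper name `cond` is introduced only for this illustration and uses the fact that, for equally likely outcomes, the conditional probability reduces to a ratio of counts.

```python
from fractions import Fraction

faces = set(range(1, 6))                   # five equally likely faces: 1..5
even = {n for n in faces if n % 2 == 0}    # {2, 4}
prime = {2, 3, 5}                          # primes among 1..5

def cond(a, b):
    """P(A|B) for equally likely outcomes: |A n B| / |B|."""
    return Fraction(len(a & b), len(b))

print(cond(even, prime))   # 1/3
print(cond(prime, even))   # 1/2
```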

Exercise. Suppose the dice is now six-faced.

1 Calculate ${\textstyle \mathbb {P} (P|E)}$ .

 ${\textstyle {\frac {1}{6}}}$ ${\textstyle {\frac {1}{3}}}$ ${\textstyle {\frac {1}{2}}}$ ${\textstyle 1}$ None of the above.

2 Calculate ${\textstyle \mathbb {P} (E|P)}$ .

 ${\textstyle {\frac {1}{6}}}$ ${\textstyle {\frac {1}{3}}}$ ${\textstyle {\frac {1}{2}}}$ ${\textstyle 1}$ None of the above.

3 Calculate ${\textstyle \mathbb {P} (P|P\cap E)}$ .

 ${\textstyle {\frac {1}{6}}}$ ${\textstyle {\frac {1}{3}}}$ ${\textstyle {\frac {1}{2}}}$ ${\textstyle 1}$ None of the above.

4 Calculate ${\textstyle \mathbb {P} (P|P\cup E)}$ .

 ${\textstyle {\frac {1}{6}}}$ ${\textstyle {\frac {1}{3}}}$ ${\textstyle {\frac {1}{2}}}$ ${\textstyle 1}$ None of the above.

Example. Amy rolls two fair six-faced dice, with one colored red and another colored blue (so that they are distinguishable), without looking at the dice. After Amy rolls the two dice, Bob tells Amy that there is at least one 6 coming up (assume Bob tells the truth). Then, the probability that 6 comes up for both dice is ${\textstyle {\frac {1/36}{1/6+1/6-1/36}}={\frac {1}{11}}}$  after hearing the information from Bob.

Proof. The condition is there is at least one 6 coming up, and the probability of this condition can be calculated by inclusion-exclusion principle:

${\displaystyle {\begin{aligned}\mathbb {P} (\{{\text{at least one 6 comes up}}\})&=\mathbb {P} (\{{\text{6 comes up for the red dice}}\}\cup \{{\text{6 comes up for the blue dice}}\})\\&=\mathbb {P} (\{{\text{6 comes up for the red dice}}\})+\mathbb {P} (\{{\text{6 comes up for the blue dice}}\})\\&\;-\mathbb {P} (\underbrace {\{{\text{6 comes up for the red dice}}\}\cap \{{\text{6 comes up for the blue dice}}} _{=\{{\text{6 comes up for both dice}}\}}\})\\&={\frac {1}{6}}+{\frac {1}{6}}-{\frac {1}{36}}\\&={\frac {11}{36}}.\end{aligned}}}$

Also,
${\displaystyle \mathbb {P} (\{{\text{at least one 6 comes up}}\}\cap \{{\text{ 6 comes up for both dice}}\})=\mathbb {P} (\{{\text{6 comes up for both dice}}\})={\frac {1}{36}}.}$

The result follows.

${\displaystyle \Box }$
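The same value can be obtained by listing all 36 equally likely ordered pairs (red, blue) and counting. This is only a sketch of the counting argument; the variable names are chosen for this illustration.

```python
from fractions import Fraction
from itertools import product

# All ordered pairs (red, blue) for two distinguishable fair dice.
omega = list(product(range(1, 7), repeat=2))

at_least_one_six = [o for o in omega if 6 in o]
both_six = [o for o in at_least_one_six if o == (6, 6)]

p_condition = Fraction(len(at_least_one_six), len(omega))
p_both_given_condition = Fraction(len(both_six), len(at_least_one_six))

print(p_condition)             # 11/36
print(p_both_given_condition)  # 1/11
```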

Exercise.

Calculate the probability again if the blue dice is colored red, so that the two dice are not distinguishable.

 ${\textstyle {\frac {1}{12}}}$ ${\textstyle {\frac {1}{11}}}$ ${\textstyle {\frac {1}{6}}}$ ${\textstyle {\frac {7}{12}}}$ None of the above.

Chris claims that the desired probability in the example should be ${\textstyle 1/6}$ : given that at least one 6 comes up, we know that 6 comes up for one of the dice. The other dice has six equally likely possible outcomes for the number coming up, namely 1,2,3,4,5 and 6, and we can regard these as the new sample space. The desired event is that 6 comes up for both dice, and thus the desired outcome for the other dice is 6. It follows that the probability is ${\textstyle 1/6}$ , since the number of outcomes in the desired event is 1, while that in the sample space is 6.

We know that the correct answer is ${\textstyle 1/11}$ , and not ${\textstyle 1/6}$ , but why is this claim wrong? (Credit: the idea of this question comes from this discussion)

Remark.

• denoting the numbers coming up in the form of ordered pair ${\textstyle ({\color {red}a},{\color {blue}b})}$ , in which ${\textstyle {\color {red}a}}$  is the number coming up for the red dice, and ${\textstyle {\color {blue}b}}$  is the number coming up for the blue dice, then

${\displaystyle \{{\text{at least one 6 comes up}}\}=\{({\color {red}1},{\color {blue}6}),({\color {red}2},{\color {blue}6}),({\color {red}3},{\color {blue}6}),({\color {red}4},{\color {blue}6}),({\color {red}5},{\color {blue}6}),({\color {red}6},{\color {blue}6}),({\color {red}6},{\color {blue}1}),({\color {red}6},{\color {blue}2}),({\color {red}6},{\color {blue}3}),({\color {red}6},{\color {blue}4}),({\color {red}6},{\color {blue}5})\},}$

consisting of 11 equally likely outcomes, and among these, only ${\textstyle ({\color {red}6},{\color {blue}6})}$  is the desired outcome, and so the probability is ${\textstyle 1/11}$ , regarding the above set as the new sample space
• this matches with the motivation for the definition of conditional probability
• if Bob tells Amy that 6 comes up for the red dice, then the sample space is ${\textstyle \{({\color {red}6},{\color {blue}1}),({\color {red}6},{\color {blue}2}),({\color {red}6},{\color {blue}3}),({\color {red}6},{\color {blue}4}),({\color {red}6},{\color {blue}5}),({\color {red}6},{\color {blue}6})\}}$ , consisting of 6 (equally likely) outcomes

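Proposition. (Multiplication rule of probability) For any events ${\textstyle {\color {red}E_{1}},{\color {blue}E_{2}},\dotsc ,{\color {darkgreen}E_{n}}}$  with ${\textstyle \mathbb {P} ({\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}})>0}$  (so that every conditional probability below is defined),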

${\displaystyle \mathbb {P} ({\color {red}E_{1}}\cap {\color {blue}E_{2}}\cap \dotsb \cap {\color {darkgreen}E_{n}})=\mathbb {P} ({\color {red}E_{1}})\mathbb {P} ({\color {blue}E_{2}}|{\color {red}E_{1}})\mathbb {P} ({\color {purple}E_{3}}|{\color {red}E_{1}}\cap {\color {blue}E_{2}})\dotsb \mathbb {P} ({\color {darkgreen}E_{n}}|{\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}})}$

Proof.

${\displaystyle \mathbb {P} ({\color {red}E_{1}})\mathbb {P} ({\color {blue}E_{2}}|{\color {red}E_{1}})\mathbb {P} ({\color {purple}E_{3}}|{\color {red}E_{1}}\cap {\color {blue}E_{2}})\dotsb \mathbb {P} ({\color {darkgreen}E_{n}}|{\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}}){\overset {\text{ def }}{=}}{\cancel {\mathbb {P} ({\color {red}E_{1}})}}\cdot {\frac {\cancel {\mathbb {P} ({\color {blue}E_{2}}\cap {\color {red}E_{1}})}}{\cancel {\mathbb {P} ({\color {red}E_{1}})}}}\cdot {\frac {\cancel {\mathbb {P} ({\color {purple}E_{3}}\cap {\color {red}E_{1}}\cap {\color {blue}E_{2}})}}{\cancel {\mathbb {P} ({\color {red}E_{1}}\cap {\color {blue}E_{2}})}}}\cdot \dotsb \cdot {\frac {\mathbb {P} ({\color {darkgreen}E_{n}}\cap {\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}})}{\cancel {\mathbb {P} ({\color {red}E_{1}}\cap \dotsb \cap {\color {brown}E_{n-1}})}}}=\mathbb {P} ({\color {red}E_{1}}\cap {\color {blue}E_{2}}\cap \dotsb \cap {\color {darkgreen}E_{n}}).}$

${\displaystyle \Box }$

Remark.

• It is also known as chain rule of probability.
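As a numerical sanity check of the multiplication rule for three events, the sketch below compares the probability of an intersection computed directly with the product of conditional probabilities. The sample space (two distinguishable fair dice) and the three events are hypothetical choices made only for this illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # (red, blue), equally likely

def prob(event):
    """P(event), where event is a predicate on outcomes."""
    return Fraction(sum(1 for o in omega if event(o)), len(omega))

def cond(event, given):
    """P(event | given), assuming P(given) > 0."""
    return prob(lambda o: event(o) and given(o)) / prob(given)

# Hypothetical events chosen for the illustration.
e1 = lambda o: o[0] % 2 == 0        # red dice shows an even number
e2 = lambda o: o[0] + o[1] >= 8     # the sum is at least 8
e3 = lambda o: o[1] == 6            # blue dice shows 6

direct = prob(lambda o: e1(o) and e2(o) and e3(o))
chained = prob(e1) * cond(e2, e1) * cond(e3, lambda o: e1(o) and e2(o))

print(direct, chained)  # 1/12 1/12
```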

Two important theorems related to conditional probability, namely law of total probability and Bayes' theorem, will be discussed in the following sections.

## Law of total probability and Bayes' theorem

Theorem. (Law of total probability) Assume that ${\textstyle A\subseteq B_{1}\cup B_{2}\cup \dotsb }$  in which events ${\textstyle B_{1},B_{2},\dotsc }$  are disjoint and have nonzero probabilities. Then,

${\displaystyle \mathbb {P} (A)=\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb }$

Proof. Illustration (finite case):

*-----------------------------------------*
|      B_1    B_2     B_3                 |
|       |      |       |                  |
|       v      v       v                  |
|   *-------*-----*----------*            |
|   |       |     |          |            |
|   |       |     |          |            |
|   |       |     |          | <---- B    |
|   | *-----*-----*--------* |            |
|   | |AnB_1|AnB_2|AnB_3   |<----- A      | <---- Omega
|   | *-----*-----*--------* |            |
|   |       |     |          |            |
|   |       |     |          |            |
|   *-------*-----*----------*            |
|                                         |
|                                         |
|                                         |
*-----------------------------------------*


Since ${\textstyle B_{1},B_{2},\dotsc }$  are disjoint, ${\textstyle A\cap B_{1},A\cap B_{2},\dotsc }$  are also disjoint (by observing that ${\textstyle (A\cap B_{1})\cap (A\cap B_{2})=A\cap (B_{1}\cap B_{2})=A\cap \varnothing =\varnothing }$ , and other intersections have similar results). It follows that

${\displaystyle \mathbb {P} (A)=\mathbb {P} {\bigg (}\underbrace {\bigcup _{i}(A\cap B_{i})} _{\text{union of disjoint events}}{\bigg )}{\overset {\text{ (ext.) P3 }}{=}}\sum _{i}^{}\mathbb {P} (A\cap B_{i}){\overset {\text{ def }}{=}}\sum _{i}^{}\mathbb {P} (A|B_{i})\mathbb {P} (B_{i})}$

in which ${\textstyle A=\bigcup _{i}(A\cap B_{i})}$  since ${\textstyle A\subseteq B_{1}\cup B_{2}\cup \dotsb }$

${\displaystyle \Box }$

Remark.

• It follows from the definition of conditional probability that ${\textstyle \mathbb {P} (A)=\mathbb {P} (A\cap B_{1})+\mathbb {P} (A\cap B_{2})+\dotsb }$  also, but the form in the theorem is more commonly used.
• The number of ${\textstyle B_{i}}$ 's may be infinite or finite.
• The assumption is equivalent to '${\textstyle A}$  occurs implies one and only one of ${\textstyle B_{i}}$ 's occurs'.
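The law of total probability can be verified on a small example. In the sketch below, the sample space (two distinguishable fair dice), the event "the sum is at least 10" and the partition by the value of the red dice are assumptions chosen for illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # (red, blue), equally likely

def prob(event):
    return Fraction(sum(1 for o in omega if event(o)), len(omega))

a = lambda o: o[0] + o[1] >= 10   # hypothetical event A

# B_i = "red dice shows i", i = 1..6: disjoint events whose union contains A.
total = Fraction(0)
for i in range(1, 7):
    b_i = lambda o, i=i: o[0] == i
    p_b = prob(b_i)
    p_a_given_b = prob(lambda o: a(o) and b_i(o)) / p_b
    total += p_a_given_b * p_b     # P(A|B_i) P(B_i)

print(prob(a), total)  # 1/6 1/6
```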

Theorem. (Bayes' theorem) Assume that ${\textstyle A\subseteq B_{1}\cup B_{2}\cup \dotsb }$  in which events ${\textstyle B_{1},B_{2},\dotsc }$  are disjoint and have nonzero probabilities, and that ${\textstyle \mathbb {P} (A)>0}$ . Then, for each ${\textstyle j}$ ,

${\displaystyle \mathbb {P} (B_{j}|A)={\frac {\mathbb {P} (A|B_{j})\mathbb {P} (B_{j})}{\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb }}}$

Proof. It follows from the definition of conditional probability (for numerator) and law of total probability (for denominator). To be more precise,

${\displaystyle \mathbb {P} (B_{j}|A)={\frac {\overbrace {\mathbb {P} (A\cap B_{j})} ^{{\overset {\text{ def }}{=}}\mathbb {P} (A|B_{j})\mathbb {P} (B_{j})}}{\underbrace {\mathbb {P} (A)} _{=\mathbb {P} (A|B_{1})\mathbb {P} (B_{1})+\mathbb {P} (A|B_{2})\mathbb {P} (B_{2})+\dotsb }}}.}$

${\displaystyle \Box }$

Illustration (finite case):

*-----------------------------------------*
|      B_1    B_2     B_3                 |
|       |      |       |                  |                     Pr(B_3|A)=
|       v      v       v                  |
|   *-------*-----*----------*            |                     *--------*
|   |       |     |          |            |   ------>           |AnB_3   |    <----- Pr(AnB_3)
|   |       |     |          |            |                     *--------*
|   |       |     |          | <---- B    |            -------------------------------
|   | *-----*-----*--------* |            |                 *-----*-----*--------*
|   | |AnB_1|AnB_2|AnB_3   |<----- A      | <---- Omega     |AnB_1|AnB_2|AnB_3   | <---- Pr(AnB_1)+Pr(AnB_2)+Pr(AnB_3)
|   | *-----*-----*--------* |            |                 *-----*-----*--------*
|   |       |     |          |            |
|   |       |     |          |            |
|   *-------*-----*----------*            |
|                                         |
|                                         |
|                                         |
*-----------------------------------------*
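Bayes' theorem in the finite case amounts to normalizing the products of the conditional probabilities and the probabilities of the disjoint events. The function name `bayes_posterior` and the numbers below are hypothetical, chosen only to show the computation.

```python
from fractions import Fraction

def bayes_posterior(priors, likelihoods):
    """Return [P(B_j|A)] from priors [P(B_j)] and likelihoods [P(A|B_j)]."""
    joint = [l * p for l, p in zip(likelihoods, priors)]  # P(A n B_j)
    p_a = sum(joint)                                      # law of total probability
    if p_a == 0:
        raise ValueError("P(A) must be positive")
    return [j / p_a for j in joint]

# Hypothetical numbers for three disjoint events B_1, B_2, B_3.
priors = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
likelihoods = [Fraction(1, 10), Fraction(1, 5), Fraction(1, 2)]

posterior = bayes_posterior(priors, likelihoods)
print(posterior)        # [Fraction(1, 4), Fraction(1, 3), Fraction(5, 12)]
print(sum(posterior))   # 1
```

The posterior probabilities sum to 1, as expected, because conditional probability given ${\textstyle A}$ is itself a probability.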


Example. Assume that the weather at a certain day can either be sunny or rainy, with equal probability. Amy has a probability of ${\textstyle {\color {darkgreen}0.8}}$  (${\textstyle {\color {blue}0.3}}$ ) to bring an umbrella at that day if the weather of that day is rainy (sunny).

Let ${\textstyle R,S,U}$  be the events that the weather at that day is rainy, sunny and Amy brings an umbrella at that day respectively. Then, the probability that Amy brings an umbrella at that day is

${\displaystyle \mathbb {P} (U)={\color {darkgreen}\mathbb {P} (U|R)}{\color {orangered}\mathbb {P} (R)}+{\color {blue}\mathbb {P} (U|S)}{\color {orangered}\mathbb {P} (S)}={\color {darkgreen}0.8}\cdot {\color {orangered}0.5}+{\color {blue}0.3}\cdot {\color {orangered}0.5}=0.55}$

by the law of total probability.

Given that Amy brings an umbrella at that day, the probability for that day to be rainy is

${\displaystyle \mathbb {P} (R|U)={\frac {{\color {darkgreen}\mathbb {P} (U|R)}{\color {orangered}\mathbb {P} (R)}}{\mathbb {P} (U)}}={\frac {{\color {darkgreen}0.8}\cdot {\color {orangered}0.5}}{0.55}}={\frac {8}{11}}}$

(by Bayes' theorem).
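The two values in this example can also be estimated by simulation. The sketch below is a Monte Carlo illustration under the stated assumptions of the example (rainy or sunny with probability 0.5 each, umbrella probability 0.8 when rainy and 0.3 when sunny); the function name, the number of simulated days and the seed are arbitrary choices.

```python
import random

def simulate(n_days, seed=0):
    """Estimate P(U) and P(R|U) by simulating n_days independent days."""
    rng = random.Random(seed)
    umbrella_days = 0
    rainy_umbrella_days = 0
    for _ in range(n_days):
        rainy = rng.random() < 0.5             # weather of the day
        p_umbrella = 0.8 if rainy else 0.3     # P(U|R) or P(U|S)
        if rng.random() < p_umbrella:
            umbrella_days += 1
            rainy_umbrella_days += rainy       # counts only rainy umbrella days
    return umbrella_days / n_days, rainy_umbrella_days / umbrella_days

p_u, p_r_given_u = simulate(200_000)
print(p_u, p_r_given_u)  # estimates should be close to 0.55 and 8/11
```

The estimate of ${\textstyle \mathbb {P} (R|U)}$ is the fraction of rainy days among the umbrella days only, which mirrors the idea of treating ${\textstyle U}$ as the new sample space.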

Exercise.

1 Assume that the weather can also be cloudy, such that the weather is twice as likely to be cloudy compared with sunny and rainy, and that Amy has a probability ${\textstyle p}$  to bring an umbrella at that day if the weather is cloudy. Calculate ${\textstyle p}$  such that the probability for that day to be rainy given Amy brings an umbrella at that day is ${\textstyle 4/11}$  instead.

 ${\textstyle 0}$ ${\textstyle {\frac {11}{15}}}$ ${\textstyle {\frac {11}{20}}}$ ${\textstyle {\frac {77}{120}}}$ None of the above.

2 Continue from previous question. Calculate ${\textstyle p}$  such that ${\textstyle \mathbb {P} (R|S)=\mathbb {P} (S|R)}$ .

 ${\textstyle {\frac {1}{20}}}$ ${\textstyle {\frac {7}{60}}}$ ${\textstyle {\frac {9}{20}}}$ ${\textstyle {\frac {47}{60}}}$ None of the above.

## Independence

### Motivation

Intuitively, if events are independent, then we expect that occurrence or non-occurrence of some events does not affect the occurrence or non-occurrence of the others. How do we express this meaning by probability expressions?

If there are only two events involved, it is quite simple: using the notion of conditional probability, we can define events ${\textstyle A}$  and ${\textstyle B}$  to be independent if ${\textstyle \mathbb {P} (A|B)=\mathbb {P} (A)}$  and ${\textstyle \mathbb {P} (B|A)=\mathbb {P} (B)}$ , or using just one equation, ${\textstyle \mathbb {P} (A\cap B)=\mathbb {P} (A)\mathbb {P} (B)}$  (by observing that ${\textstyle \mathbb {P} (A\cap B)=\mathbb {P} (A)\mathbb {P} (B)\Leftrightarrow \mathbb {P} (A|B)=\mathbb {P} (A)\Leftrightarrow \mathbb {P} (B|A)=\mathbb {P} (B)}$  when ${\textstyle \mathbb {P} (A)>0}$  and ${\textstyle \mathbb {P} (B)>0}$ ).

We can also define independence for more events: e.g. for three events ${\textstyle E_{1},E_{2},E_{3}}$ , we would like to say that they are independent if all of the following hold:

• ${\textstyle \mathbb {P} (E_{1}|E_{2}\cap E_{3})=\mathbb {P} (E_{1})}$ ;
• ${\textstyle \mathbb {P} (E_{1}\cap E_{2}|E_{3})=\mathbb {P} (E_{1}\cap E_{2})}$ ;
• ${\textstyle \mathbb {P} (E_{1}\cap E_{3}|E_{2})=\mathbb {P} (E_{1}\cap E_{3})}$ ;
• ${\textstyle \mathbb {P} (E_{2}|E_{1}\cap E_{3})=\mathbb {P} (E_{2})}$ ;
• ${\textstyle \mathbb {P} (E_{2}\cap E_{3}|E_{1})=\mathbb {P} (E_{2}\cap E_{3})}$ ;
• ${\textstyle \mathbb {P} (E_{3}|E_{1}\cap E_{2})=\mathbb {P} (E_{3})}$ .

We can see that when more events are involved, the requirement becomes more clumsy, if we use the conditional probabilities as the definition.

Having all of the above hold is actually equivalent to having only the following requirement hold:

• For each finite subset ${\textstyle S\subseteq \{1,2,3\}}$ , ${\textstyle \mathbb {P} \left(\bigcap _{i\in S}E_{i}\right)=\prod _{i\in S}\mathbb {P} (E_{i})}$ .

So, we can use this more compact expression for the definition.

Indeed, we have similar results when more events are involved, and so we have the following definition for independence.

### Definition

Definition. (Independence) The events ${\textstyle E_{1},E_{2},\dotsc }$  are independent if for each finite subset ${\textstyle S\subseteq \{1,2,\dotsc \}}$ ,

${\displaystyle \mathbb {P} \left(\bigcap _{i\in S}E_{i}\right)=\prod _{i\in S}\mathbb {P} (E_{i}).}$

Remark.

• Pairwise independence does not imply independence (but converse is true, and thus independence is 'stronger' than pairwise independence).
• We can use ${\textstyle A\perp \!\!\!\perp B}$  to denote the independence of ${\textstyle A}$  and ${\textstyle B}$ .
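For finitely many events on a finite sample space with equally likely outcomes, the definition can be checked mechanically by testing the product formula for every subset of at least two events (subsets with fewer events satisfy it trivially). The helper names and the dice events below are assumptions made for this sketch.

```python
from fractions import Fraction
from itertools import combinations, product
from math import prod

def prob(event, omega):
    """P(event), where event is a predicate on equally likely outcomes."""
    return Fraction(sum(1 for o in omega if event(o)), len(omega))

def are_independent(events, omega):
    """Check the product formula for every subset of at least two events."""
    for k in range(2, len(events) + 1):
        for subset in combinations(events, k):
            p_all = prob(lambda o: all(e(o) for e in subset), omega)
            if p_all != prod(prob(e, omega) for e in subset):
                return False
    return True

# Hypothetical example: two distinguishable fair dice, outcomes (red, blue).
omega = list(product(range(1, 7), repeat=2))
red_even = lambda o: o[0] % 2 == 0
blue_high = lambda o: o[1] >= 5
red_high = lambda o: o[0] >= 4

print(are_independent([red_even, blue_high], omega))  # True
print(are_independent([red_even, red_high], omega))   # False
```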

Example. (Events that are pairwise independent but not independent) Consider two balls, one bigger than the other. Each ball is either red or blue, with equal chance, independently of the other ball. Let

• ${\textstyle A}$  be the event that the bigger ball is red;
• ${\textstyle B}$  be the event that the smaller ball is red;
• ${\textstyle C}$  be the event that both balls have the same color.

Then, ${\textstyle A,B}$  and ${\textstyle C}$  are pairwise independent but not independent.

Proof. Consider the following tables containing relevant probabilities:

${\displaystyle {\begin{array}{cccc}{\text{bigger ball}}\backslash {\text{smaller ball}}&{\color {red}{\text{red}}}&{\color {blue}{\text{blue}}}\\\hline {\color {red}{\text{red}}}&1/4({\color {darkgreen}\checkmark _{A}}{\color {brown}\checkmark _{B}}{\color {purple}\checkmark _{C}})&1/4({\color {darkgreen}\checkmark _{A}})\\{\color {blue}{\text{blue}}}&1/4({\color {brown}\checkmark _{B}})&1/4({\color {purple}\checkmark _{C}})\\\end{array}}}$

• Then, ${\textstyle A,B}$  and ${\textstyle C}$  are pairwise independent since ${\textstyle \mathbb {P} (A\cap B)=\mathbb {P} (A\cap C)=\mathbb {P} (B\cap C)=\mathbb {P} (\{{\text{both balls are red}}\})=1/4}$ , and ${\textstyle \mathbb {P} (A)\mathbb {P} (B)=\mathbb {P} (A)\mathbb {P} (C)=\mathbb {P} (B)\mathbb {P} (C)=(1/4+1/4)^{2}=1/4}$ .
• However, ${\textstyle \mathbb {P} (A\cap B\cap C)=\mathbb {P} (\{{\text{both balls are red}}\})=1/4\neq \mathbb {P} (A)\mathbb {P} (B)\mathbb {P} (C)=1/8}$ .
• Thus, ${\textstyle A,B}$  and ${\textstyle C}$  are not independent.

${\displaystyle \Box }$
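The table can be reproduced by enumerating the four equally likely color combinations. This sketch assumes, as in the table, that each combination has probability 1/4; the variable names are chosen for the illustration.

```python
from fractions import Fraction
from itertools import combinations, product

# Outcomes are (color of bigger ball, color of smaller ball).
outcomes = list(product(("red", "blue"), repeat=2))

def prob(event):
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

A = lambda o: o[0] == "red"    # bigger ball is red
B = lambda o: o[1] == "red"    # smaller ball is red
C = lambda o: o[0] == o[1]     # both balls have the same color

pairwise = all(
    prob(lambda o: x(o) and y(o)) == prob(x) * prob(y)
    for x, y in combinations([A, B, C], 2)
)
triple_lhs = prob(lambda o: A(o) and B(o) and C(o))
triple_rhs = prob(A) * prob(B) * prob(C)

print(pairwise)                # True
print(triple_lhs, triple_rhs)  # 1/4 1/8
```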

Exercise.

1 Let ${\textstyle D}$  be the event that the two balls have different colors. Are ${\textstyle A,B}$  and ${\textstyle D}$  (a) pairwise independent; (b) independent?

 (a) Yes; (b) Yes. (a) Yes; (b) No. (a) No; (b) Yes. (a) No; (b) No.

2 Assume that both balls have a probability ${\textstyle p}$  to be red and ${\textstyle 1-p}$  to be blue. ${\textstyle A,B}$  and ${\textstyle C}$  are pairwise independent for which of the following value(s) of ${\textstyle p}$ ?

 0 0.25 0.5 0.75 1 None of the above.

Remark.

• If we know the occurrence or non-occurrence of any two of ${\textstyle A,B}$  and ${\textstyle C}$ , then we know the color of the two balls.
• So, the remaining unknown event becomes either certain or impossible.
• E.g., if we know ${\textstyle A}$  occurs and ${\textstyle C}$  does not occur, then we know that
• the bigger ball is red,
• the smaller ball is blue (since the two balls have different color, while the bigger ball is red).
• So, ${\textstyle B}$  becomes impossible.
• Thus, intuitively, ${\textstyle A,B}$  and ${\textstyle C}$  should not be independent.

Example. (Monty Hall problem) Recall the Monty Hall problem in the motivation section. Let ${\textstyle P_{1}}$ , ${\textstyle C_{2}}$  and ${\textstyle H_{3}}$  be the events that door No. 1 is picked, the car is behind door No. 2, and the host opens door No. 3 respectively. The probability that the car is behind door No. 2, given that we pick door No. 1 and the host opens door No. 3, is

${\displaystyle \mathbb {P} (C_{2}|P_{1}\cap H_{3})={\frac {\mathbb {P} (C_{2}\cap P_{1}\cap H_{3})}{\mathbb {P} (P_{1}\cap H_{3})}}={\frac {\overbrace {\mathbb {P} (P_{1})} ^{=1}\overbrace {\mathbb {P} (C_{2}|P_{1})} ^{=\mathbb {P} (C_{2})=1/3}\overbrace {\mathbb {P} (H_{3}|C_{2}\cap P_{1})} ^{=1}}{\underbrace {\mathbb {P} (H_{3}|P_{1})} _{=1/2}\underbrace {\mathbb {P} (P_{1})} _{=1}}}={\frac {1/3}{1/2}}={\frac {2}{3}}.}$

Proof.

• ${\textstyle \mathbb {P} (P_{1})=1}$  since ${\textstyle P_{1}}$  is given, and so is certain.
• ${\textstyle \mathbb {P} (C_{2}|P_{1})=\mathbb {P} (C_{2})}$  since the probability of ${\textstyle C_{2}}$  is the same regardless of the door picked, i.e. ${\textstyle P_{1}\perp \!\!\!\perp C_{2}}$ .
• ${\textstyle \mathbb {P} (C_{2})={\frac {1}{3}}}$ , since the car is equally likely to be put behind each door, by principle of insufficient reason.
• ${\textstyle \mathbb {P} (H_{3}|C_{2}\cap P_{1})=1}$ , by the assumptions, since the host cannot open door 2, which has a car behind it (condition), or door 1, which is picked by us (condition).
• ${\displaystyle \mathbb {P} (H_{3}|P_{1})=\mathbb {P} (H_{3}|P_{1}\cap C_{1})\underbrace {\mathbb {P} (C_{1})} _{1/3}+\mathbb {P} (H_{3}|P_{1}\cap C_{2})\underbrace {\mathbb {P} (C_{2})} _{1/3}+\mathbb {P} (H_{3}|P_{1}\cap C_{3})\underbrace {\mathbb {P} (C_{3})} _{1/3}={\frac {1}{3}}\left({\frac {1}{2}}+1+0\right)={\frac {1}{2}}}$

• ${\textstyle \mathbb {P} (H_{3}|P_{1}\cap C_{1})={\frac {1}{2}}}$ , since the host cannot open door 1 (picked), and is equally likely to open door 2 and door 3 by principle of insufficient reason.
• ${\textstyle \mathbb {P} (H_{3}|P_{1}\cap C_{2})=1}$ , since the host cannot open door 1 (picked) or door 2 (with car behind it), and so the host certainly opens door 3.
• ${\textstyle \mathbb {P} (H_{3}|P_{1}\cap C_{3})=0}$ , since the host cannot open door 3 (with car behind it).
• ${\displaystyle {\begin{aligned}\mathbb {P} (H_{3}|P_{1})&{\overset {\text{ def }}{=}}{\frac {\mathbb {P} (H_{3}\cap P_{1})}{\mathbb {P} (P_{1})}}\\&={\frac {\mathbb {P} (H_{3}\cap P_{1}\cap C_{1})+\mathbb {P} (H_{3}\cap P_{1}\cap C_{2})+\mathbb {P} (H_{3}\cap P_{1}\cap C_{3})}{\mathbb {P} (P_{1})}}&{\text{by law of total probability}}\\&={\frac {\mathbb {P} (H_{3}\cap P_{1}\cap C_{1})\mathbb {P} (C_{1})}{\mathbb {P} (P_{1})\mathbb {P} (C_{1})}}+{\frac {\mathbb {P} (H_{3}\cap P_{1}\cap C_{2})\mathbb {P} (C_{2})}{\mathbb {P} (P_{1})\mathbb {P} (C_{2})}}+{\frac {\mathbb {P} (H_{3}\cap P_{1}\cap C_{3})\mathbb {P} (C_{3})}{\mathbb {P} (P_{1})\mathbb {P} (C_{3})}}\\&={\frac {\mathbb {P} (H_{3}\cap P_{1}\cap C_{1})\mathbb {P} (C_{1})}{\mathbb {P} (P_{1}\cap C_{1})}}+{\frac {\mathbb {P} (H_{3}\cap P_{1}\cap C_{2})\mathbb {P} (C_{2})}{\mathbb {P} (P_{1}\cap C_{2})}}+{\frac {\mathbb {P} (H_{3}\cap P_{1}\cap C_{3})\mathbb {P} (C_{3})}{\mathbb {P} (P_{1}\cap C_{3})}}&{\text{by independence}}\\&{\overset {\text{ def }}{=}}\mathbb {P} (H_{3}|P_{1}\cap C_{1})\mathbb {P} (C_{1})+\mathbb {P} (H_{3}|P_{1}\cap C_{2})\mathbb {P} (C_{2})+\mathbb {P} (H_{3}|P_{1}\cap C_{3})\mathbb {P} (C_{3})\end{aligned}}}$

• Having these probabilities, the result follows by applying the definition of conditional probability and the multiplication rule of probability, as above.

${\displaystyle \Box }$
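The calculation can be checked by building the joint distribution of the car's position and the host's choice, given that we pick door 1. The sketch below assumes the host behaves as in the proof: he never opens the picked door or the door with the car, and chooses uniformly when two doors are allowed. The variable names are chosen for this illustration.

```python
from fractions import Fraction

PICK = 1
DOORS = (1, 2, 3)

# joint[(car, host)] = P(car is behind `car` and host opens `host`), given our pick.
joint = {}
for car in DOORS:
    allowed = [d for d in DOORS if d != PICK and d != car]
    for host in allowed:
        joint[(car, host)] = Fraction(1, 3) * Fraction(1, len(allowed))

p_h3 = sum(p for (car, host), p in joint.items() if host == 3)
p_c2_and_h3 = joint.get((2, 3), Fraction(0))
p_c1_and_h3 = joint.get((1, 3), Fraction(0))

print(p_h3)                 # 1/2
print(p_c2_and_h3 / p_h3)   # 2/3  (win by switching)
print(p_c1_and_h3 / p_h3)   # 1/3  (win by staying)
```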

Tree diagram.

Exercise.

1 Suppose the host opens the door randomly, such that the host is equally likely to open each door. Calculate the probability again.

 ${\textstyle {\frac {1}{6}}}$ ${\textstyle {\frac {1}{3}}}$ ${\textstyle {\frac {1}{2}}}$ ${\textstyle {\frac {2}{3}}}$ None of the above.

2 Suppose there are ${\textstyle N\geq 3}$  doors instead of 3 doors. Without changing other given information, calculate the probability again.

 ${\textstyle {\frac {N-1}{N(N-2)}}}$ ${\textstyle {\frac {N}{(N-1)(N-2)}}}$ ${\textstyle {\frac {1}{N-2}}}$ ${\textstyle {\frac {1}{N-1}}}$ None of the above.

Remark.

• For other cases in which another door is picked, the same result holds by symmetry (notations can be changed in the expression).

### Related results

Proposition. Some events are independent if and only if they are still independent when some of them are replaced by their complements.

Proof. We can prove it inductively. E.g., assume ${\textstyle E_{1},E_{2},\dotsc }$  are independent. Then,

${\displaystyle \mathbb {P} (E_{1}\cap E_{2})=\mathbb {P} (E_{1})\mathbb {P} (E_{2})\Leftrightarrow {\color {darkgreen}\mathbb {P} (E_{1})-}\mathbb {P} (E_{1}\cap E_{2})={\color {darkgreen}\mathbb {P} (E_{1})-}\mathbb {P} (E_{1})\mathbb {P} (E_{2})\Leftrightarrow \mathbb {P} (E_{1}\cap E_{2}^{c})=\mathbb {P} (E_{1})(\underbrace {1-\mathbb {P} (E_{2})} _{\mathbb {P} (E_{2}^{c})}),}$

and similar results hold for other events.

${\displaystyle \Box }$
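The two-event case of this proposition can be checked numerically. In the sketch below, the sample space (two distinguishable fair dice) and the events are hypothetical choices; `complement` is a helper introduced only for this illustration.

```python
from fractions import Fraction
from itertools import product

omega = list(product(range(1, 7), repeat=2))  # (red, blue), equally likely

def prob(event):
    return Fraction(sum(1 for o in omega if event(o)), len(omega))

def independent(x, y):
    """Check P(X n Y) = P(X) P(Y)."""
    return prob(lambda o: x(o) and y(o)) == prob(x) * prob(y)

def complement(event):
    return lambda o: not event(o)

a = lambda o: o[0] % 2 == 0   # red dice shows an even number
b = lambda o: o[1] >= 5       # blue dice shows 5 or 6

checks = [
    independent(a, b),
    independent(a, complement(b)),
    independent(complement(a), b),
    independent(complement(a), complement(b)),
]
print(checks)  # [True, True, True, True]
```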

Example. (Events that are pairwise independent but not independent (cont'd)) Recall the three events in a previous example.

• ${\textstyle A}$  be the event that the bigger ball is red;
• ${\textstyle B}$  be the event that the smaller ball is red;
• ${\textstyle C}$  be the event that both balls have the same color.

They are not independent under the conditions of that example. It follows from the proposition above that ${\textstyle A,B}$  and ${\textstyle C^{c}}$  (namely the event that the two balls have different colors, which is ${\textstyle D}$  in an exercise for that example) are not independent.

Example. (Special cases for independence) A certain event is independent of an arbitrary event. This also holds for an impossible event.

Proof.

• The empty set ${\textstyle \varnothing }$  is the impossible event, since ${\textstyle \mathbb {P} (\varnothing )=0}$ .
• For each event ${\textstyle E}$ , ${\textstyle \mathbb {P} (E\cap \varnothing )=\mathbb {P} (\varnothing )=0}$ .
• Also, ${\textstyle \underbrace {\mathbb {P} (\varnothing )} _{0}\mathbb {P} (E)=0}$ .
• So, ${\textstyle E\perp \!\!\!\perp \varnothing }$ .
• The sample space ${\textstyle \Omega }$  is the certain event, since ${\textstyle \mathbb {P} (\Omega )=1}$ .
• Since ${\textstyle \Omega =\varnothing ^{c}}$ , and ${\textstyle E\perp \!\!\!\perp \varnothing }$  for each event ${\textstyle E}$ , it follows from the proposition about independence of complement events that ${\textstyle E\perp \!\!\!\perp \Omega }$ .

${\displaystyle \Box }$

Remark.

• The meaning of this result is that knowledge of an arbitrary event does not make a certain event less certain, and does not make an impossible event possible, which is intuitive.

### Conditional independence

Conditional independence is a conditional version of independence, and has the following definition which is similar to that of independence.

Definition. (Conditional independence) The events ${\textstyle E_{1},E_{2},\dotsc }$  are conditionally independent given an event ${\textstyle C}$  with ${\textstyle \mathbb {P} (C)>0}$  if

${\displaystyle \mathbb {P} \left(\bigcap _{i\in S}E_{i}{\color {darkgreen}{\bigg |}C}\right)=\prod _{i\in S}\mathbb {P} (E_{i}{\color {darkgreen}|C})}$

for each finite subset ${\textstyle S\subseteq \{1,2,\dotsc \}}$ .

Remark.

• In particular, if events ${\textstyle A}$  and ${\textstyle B}$  are conditionally independent given ${\textstyle C}$  (assuming ${\textstyle \mathbb {P} (B{\color {darkgreen}{\color {darkgreen}|C}})>0}$  and ${\textstyle \mathbb {P} ({\color {darkgreen}C})>0}$ ),
${\displaystyle \mathbb {P} (A\cap B{\color {darkgreen}{\color {darkgreen}|C}})=\mathbb {P} (A{\color {darkgreen}{\color {darkgreen}|C}})\mathbb {P} (B{\color {darkgreen}{\color {darkgreen}|C}})\Leftrightarrow {\frac {\mathbb {P} (A\cap B{\color {darkgreen}|C})\mathbb {P} ({\color {darkgreen}C})}{\mathbb {P} (B{\color {darkgreen}|C})\mathbb {P} ({\color {darkgreen}C})}}=\mathbb {P} (A{\color {darkgreen}|C})\Leftrightarrow \mathbb {P} (A|B{\color {darkgreen}\cap C})=\mathbb {P} (A{\color {darkgreen}|C}).}$

• This means that, once ${\textstyle C}$  is given, additionally knowing that ${\textstyle B}$  happens does not change the probability of ${\textstyle A}$ .
• In general, conditional independence of some events given event ${\textstyle C}$  neither implies nor is implied by their conditional independence given event ${\textstyle C^{c}}$ .
• Conditional independence of some events neither implies nor is implied by independence of them. In this sense, neither concept is stronger than the other.

Example. Define

• ${\textstyle A}$  be the event that the birthday of Amy is June 1st;
• ${\textstyle B}$  be the event that the birthday of Bob is July 1st;
• ${\textstyle C}$  be the event that Amy and Bob are twins.

Events ${\textstyle A}$  and ${\textstyle B}$  are conditionally independent given ${\textstyle C^{c}}$ , but not conditionally independent given ${\textstyle C}$ . Moreover, when ${\textstyle 0<\mathbb {P} (C)<1}$ , events ${\textstyle A}$  and ${\textstyle B}$  are not exactly independent (unconditionally) either, although they are nearly so when ${\textstyle \mathbb {P} (C)}$  is small. (Assume, for simplicity, that the birthday of Amy and Bob is equally likely to be one of the 365 dates in a year (not including February 29th), whether or not they are twins.)

Proof.

• ${\textstyle A}$  and ${\textstyle B}$  are conditionally independent given ${\textstyle C^{c}}$  since
• ${\textstyle \mathbb {P} (A|C^{c})=\mathbb {P} (B|C^{c})=1/365}$ ;
• ${\textstyle \mathbb {P} (A\cap B|C^{c})=(1/365)^{2}=\mathbb {P} (A|C^{c})\mathbb {P} (B|C^{c})}$  (there are ${\textstyle 365^{2}}$  equally likely (by principle of insufficient reason) distinct pairs of the birthdays).
• ${\textstyle A}$  and ${\textstyle B}$  are not conditionally independent given ${\textstyle C}$  since
• ${\textstyle \mathbb {P} (A|C)=\mathbb {P} (B|C)=1/365}$ ;
• ${\textstyle \mathbb {P} (A\cap B|C)=0\neq \mathbb {P} (A|C)\mathbb {P} (B|C)}$  (twins must have the same birthday).
• ${\textstyle A}$  and ${\textstyle B}$  are not independent (unconditionally) when ${\textstyle 0<\mathbb {P} (C)<1}$  since
• ${\textstyle \mathbb {P} (A)=\mathbb {P} (B)=1/365}$  (by the law of total probability, since the conditional probabilities given ${\textstyle C}$  and given ${\textstyle C^{c}}$  are both ${\textstyle 1/365}$ );
• ${\textstyle \mathbb {P} (A\cap B)=\mathbb {P} (A\cap B|C)\mathbb {P} (C)+\mathbb {P} (A\cap B|C^{c})\mathbb {P} (C^{c})=(1/365)^{2}{\big (}1-\mathbb {P} (C){\big )}<\mathbb {P} (A)\mathbb {P} (B)}$  (by the law of total probability).

${\displaystyle \Box }$
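The two conditional statements can be checked by enumerating birthdays. The sketch below is an illustration under assumptions: days are numbered 0 to 364, the indices 151 and 181 stand for June 1st and July 1st in a non-leap year, non-twins have independent uniformly distributed birthdays, and twins share one uniformly distributed birthday. The helper names are hypothetical.

```python
from fractions import Fraction
from itertools import product

DAYS = range(365)
JUNE_1, JULY_1 = 151, 181   # 0-based day indices in a non-leap year

# Equally likely (Amy, Bob) birthday pairs under each condition.
pairs_not_twins = list(product(DAYS, repeat=2))   # given C^c
pairs_twins = [(d, d) for d in DAYS]              # given C

def prob(event, pairs):
    return Fraction(sum(1 for amy, bob in pairs if event(amy, bob)), len(pairs))

A = lambda amy, bob: amy == JUNE_1
B = lambda amy, bob: bob == JULY_1
A_and_B = lambda amy, bob: A(amy, bob) and B(amy, bob)

# Given C^c: the product formula holds.
indep_given_not_c = (
    prob(A_and_B, pairs_not_twins)
    == prob(A, pairs_not_twins) * prob(B, pairs_not_twins)
)
# Given C: P(A n B | C) = 0, while the product is positive.
indep_given_c = (
    prob(A_and_B, pairs_twins) == prob(A, pairs_twins) * prob(B, pairs_twins)
)
print(indep_given_not_c, indep_given_c)  # True False
```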

Exercise.

Are ${\textstyle A}$  and ${\textstyle C}$  conditionally independent given (a) ${\textstyle B}$ ; (b) ${\textstyle B^{c}}$ ?

 (a) Yes; (b) Yes. (a) Yes; (b) No. (a) No; (b) Yes. (a) No; (b) No.

## References and footnotes

1. If we pick the door with a car behind it, then we win the car. Otherwise, we win nothing.