# Probability Theory/Conditional probability

## Basics and multiplication formula

Definition 3.1 (Conditional probability):

Let $(\Omega ,{\mathcal {F}},P)$  be a probability space, and let $A\in {\mathcal {F}}$  be fixed, such that $P(A)>0$ . If $B\in {\mathcal {F}}$  is another set, then the conditional probability of $B$  given that $A$  has occurred (or occurs with certainty) is defined as

$P_{A}(B):={\frac {P(B\cap A)}{P(A)}}$ .

Using multiplicative notation, we could have written

$P_{A}(B):={\frac {P(BA)}{P(A)}}$ .
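The definition can be checked on a small finite example. The following sketch (a fair six-sided die; the events and all names are our own illustrative choices, not part of the text) computes $P_{A}(B)$  with exact rational arithmetic:

```python
from fractions import Fraction

# One roll of a fair six-sided die; every outcome is equally likely.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under the uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

def cond_prob(b, a):
    """P_A(B) = P(B ∩ A) / P(A); only defined when P(A) > 0."""
    assert prob(a) > 0, "conditioning event must have positive probability"
    return prob(b & a) / prob(a)

A = {2, 4, 6}  # "the roll is even"
B = {4, 5, 6}  # "the roll is at least 4"

# B ∩ A = {4, 6}, so P_A(B) = (2/6) / (3/6) = 2/3.
print(cond_prob(B, A))  # 2/3
```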

This definition is intuitive, since the following lemmata are satisfied:

Lemma 3.2:

$A\subseteq B\Rightarrow P_{A}(B)=1$

Lemma 3.3:

$P_{A}(B+C)=P_{A}(B)+P_{A}(C)$  (here the $+$ -notation indicates that the union of $B$  and $C$  is disjoint)

Each lemma follows directly from the definition and the axioms holding for $P$  (definition 2.1).

From these lemmata, we obtain that for each $A\in {\mathcal {F}}$  with $P(A)>0$ , $(\Omega ,{\mathcal {F}},P_{A})$  satisfies the defining axioms of a probability space (definition 2.1).

With this definition, we have the following theorem:

Theorem 3.4 (Multiplication formula):

$P(A_{1}A_{2}\cdots A_{n})=P_{A_{1}\cdots A_{n-1}}(A_{n})P_{A_{1}\cdots A_{n-2}}(A_{n-1})\cdots P_{A_{1}}(A_{2})P(A_{1})$ ,

where $(\Omega ,{\mathcal {F}},P)$  is a probability space and $A_{1},\ldots ,A_{n}$  are all in ${\mathcal {F}}$ , with $P(A_{1}\cdots A_{n-1})>0$  so that every conditional probability on the right-hand side is defined.

Proof:

From the definition, we have

$P_{A}(B)P(A)=P(AB)$

for all $A,B\in {\mathcal {F}}$  with $P(A)>0$ . Thus, as ${\mathcal {F}}$  is an algebra, we obtain by induction:

${\begin{aligned}P(A_{1}A_{2}\cdots A_{n})&=P((A_{1}A_{2}\cdots A_{n-1})A_{n})\\&=P_{A_{1}\cdots A_{n-1}}(A_{n})P(A_{1}\cdots A_{n-1})\\&=P_{A_{1}\cdots A_{n-1}}(A_{n})P_{A_{1}\cdots A_{n-2}}(A_{n-1})\cdots P_{A_{1}}(A_{2})P(A_{1}).\end{aligned}}$ $\Box$
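The multiplication formula can be verified numerically on a finite sample space. The sketch below (with arbitrarily chosen events, used purely for illustration) checks the case $n=3$ :

```python
from fractions import Fraction

omega = set(range(1, 13))  # twelve equally likely outcomes

def prob(e):
    """P(e) under the uniform distribution on omega."""
    return Fraction(len(e & omega), len(omega))

def cond_prob(b, a):
    """P_A(B) = P(B ∩ A) / P(A)."""
    return prob(b & a) / prob(a)

A1 = {1, 2, 3, 4, 5, 6, 7, 8}
A2 = {2, 4, 6, 8, 10, 12}
A3 = {4, 8, 12}

# Theorem 3.4 for n = 3:
# P(A1 A2 A3) = P_{A1 A2}(A3) * P_{A1}(A2) * P(A1)
lhs = prob(A1 & A2 & A3)
rhs = cond_prob(A3, A1 & A2) * cond_prob(A2, A1) * prob(A1)
assert lhs == rhs  # both sides equal 1/6
```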

## Bayes' theorem

Theorem 3.5 (Law of total probability):

Let $(\Omega ,{\mathcal {F}},P)$  be a probability space, and assume

$\Omega =A_{1}+\cdots +A_{n}$

(note that by using the $+$ -notation, we assume that the union is disjoint), where $A_{1},\ldots ,A_{n}$  are all contained within ${\mathcal {F}}$  and satisfy $P(A_{j})>0$  for each $j$ . Then

$\forall B\in {\mathcal {F}}:P(B)=\sum _{j=1}^{n}P(A_{j})P_{A_{j}}(B)$ .

Proof:

${\begin{aligned}\sum _{j=1}^{n}P(A_{j})P_{A_{j}}(B)&=\sum _{j=1}^{n}P(A_{j}){\frac {P(A_{j}\cap B)}{P(A_{j})}}\\&=\sum _{j=1}^{n}P(A_{j}B)\\&=P\left(\sum _{j=1}^{n}A_{j}B\right)\\&=P\left(\left(\sum _{j=1}^{n}A_{j}\right)B\right)\\&=P(\Omega B)\\&=P(B),\end{aligned}}$

where we used that the sets $A_{1}B,\ldots ,A_{n}B$  are pairwise disjoint, the distributive law of the algebra ${\mathcal {F}}$ , and $\Omega \cap B=B$ . $\Box$
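The law of total probability can likewise be checked on a finite sample space; in the sketch below the partition and the event $B$  are arbitrary illustrative choices:

```python
from fractions import Fraction

omega = set(range(12))  # twelve equally likely outcomes

def prob(e):
    """P(e) under the uniform distribution on omega."""
    return Fraction(len(e & omega), len(omega))

def cond_prob(b, a):
    """P_A(B) = P(B ∩ A) / P(A)."""
    return prob(b & a) / prob(a)

# A disjoint partition Omega = A1 + A2 + A3.
A1, A2, A3 = set(range(0, 4)), set(range(4, 9)), set(range(9, 12))
B = {2, 3, 4, 10}

# Theorem 3.5: P(B) = sum_j P(A_j) * P_{A_j}(B).
total = sum(prob(A) * cond_prob(B, A) for A in (A1, A2, A3))
assert total == prob(B)  # both sides equal 1/3
```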

Theorem 3.6 (Bayes' theorem, basic version):

Let $(\Omega ,{\mathcal {F}},P)$  be a probability space and $A,B\in {\mathcal {F}}$  with $P(A)>0$  and $P(B)>0$ . Then

$P_{B}(A)={\frac {P(A)P_{A}(B)}{P(B)}}$ .

Proof:

${\frac {P(A)P_{A}(B)}{P(B)}}={\frac {P(A){\frac {P(A\cap B)}{P(A)}}}{P(B)}}={\frac {P(A\cap B)}{P(B)}}=P_{B}(A)$ . $\Box$

This formula may look somewhat abstract, but it actually has a nice geometrical meaning. Suppose we are given two sets $A,B\in {\mathcal {F}}$ , already know $P(A)$ , $P(B)$  and $P_{A}(B)$ , and want to compute $P_{B}(A)$ .

We know the ratio of the size of $A\cap B$  to that of $A$ , but what we actually want to know is how $A\cap B$  compares to $B$ . Hence, we change the reference set: multiplying by $P(A)$ , the old reference magnitude, recovers $P(A\cap B)$ , and dividing by $P(B)$ , the new reference magnitude, compares it to $B$ .
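The basic version of Bayes' theorem can also be confirmed on a small set model (all sets below are arbitrary illustrative choices):

```python
from fractions import Fraction

omega = set(range(1, 11))  # ten equally likely outcomes

def prob(e):
    """P(e) under the uniform distribution on omega."""
    return Fraction(len(e & omega), len(omega))

def cond_prob(b, a):
    """P_A(B) = P(B ∩ A) / P(A)."""
    return prob(b & a) / prob(a)

A = {1, 2, 3, 4}
B = {3, 4, 5, 6, 7}

# Theorem 3.6: P_B(A) = P(A) * P_A(B) / P(B).
assert cond_prob(A, B) == prob(A) * cond_prob(B, A) / prob(B)
```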

Theorem 3.7 (Bayes' theorem):

Let $(\Omega ,{\mathcal {F}},P)$  be a probability space, and assume

$\Omega =A_{1}+\cdots +A_{n}$ ,

where $A_{1},\ldots ,A_{n}$  are all in ${\mathcal {F}}$  with $P(A_{k})>0$  for each $k$ . Then for all $B\in {\mathcal {F}}$  with $P(B)>0$

$\forall j\in \{1,\ldots ,n\}:P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{\sum _{k=1}^{n}P(A_{k})P_{A_{k}}(B)}}$ .

Proof:

From the basic version of the theorem, we obtain

$P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{P(B)}}$ .

Inserting the law of total probability (theorem 3.5) into the denominator, we obtain

$P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{\sum _{k=1}^{n}P(A_{k})P_{A_{k}}(B)}}$ .$\Box$
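A classical application of theorem 3.7 is a two-set partition $\Omega = D + H$  (diseased/healthy) with $B$  the event "the test is positive". The numbers in the following sketch are invented purely for illustration:

```python
from fractions import Fraction

# Hypothetical screening test; all numbers are made up for illustration.
p_D = Fraction(1, 100)             # P(D): prevalence of the disease
p_H = 1 - p_D                      # P(H), where Omega = D + H
p_pos_given_D = Fraction(99, 100)  # P_D(B): sensitivity
p_pos_given_H = Fraction(5, 100)   # P_H(B): false-positive rate

# Theorem 3.7 with n = 2; the denominator is the law of total probability.
posterior = (p_pos_given_D * p_D) / (p_pos_given_D * p_D + p_pos_given_H * p_H)
print(posterior)  # 1/6: even after a positive test, P_B(D) is only about 17%
```

The result illustrates why the denominator matters: although the test is 99% sensitive, the large healthy population supplies most of the positive results.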