# Probability Theory/Conditional probability

## Basics and multiplication formula

Definition 3.1 (Conditional probability):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$  be a probability space, and let ${\displaystyle A\in {\mathcal {F}}}$  be a fixed set such that ${\displaystyle P(A)>0}$ . If ${\displaystyle B\in {\mathcal {F}}}$  is another set, then the conditional probability of ${\displaystyle B}$  given that ${\displaystyle A}$  has already occurred (or occurs with certainty) is defined as

${\displaystyle P_{A}(B):={\frac {P(B\cap A)}{P(A)}}}$ .

Using multiplicative notation, we could have written

${\displaystyle P_{A}(B):={\frac {P(BA)}{P(A)}}}$ .
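The definition can be checked directly on a small finite sample space. The following Python sketch uses an illustrative two-dice example (not from the text) and computes ${\displaystyle P_{A}(B)}$  exactly from the definition:

```python
from fractions import Fraction

# Finite sample space: all ordered outcomes of rolling two fair dice;
# each of the 36 outcomes has probability 1/36.
omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event):
    """Probability of an event (a subset of omega) under the uniform measure."""
    return Fraction(len(event), len(omega))

def P_cond(A, B):
    """Conditional probability P_A(B) = P(B ∩ A) / P(A), assuming P(A) > 0."""
    A, B = set(A), set(B)
    return P(A & B) / P(A)

A = [w for w in omega if w[0] + w[1] >= 10]   # the sum is at least 10
B = [w for w in omega if w[0] == 6]           # the first die shows 6

print(P_cond(A, B))   # → 1/2
```

Of the six outcomes with sum at least 10, exactly three have a 6 on the first die, so the conditional probability is ${\displaystyle 1/2}$ .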

This definition is intuitive, since the following lemmata are satisfied:

Lemma 3.2:

${\displaystyle A\subseteq B\Rightarrow P_{A}(B)=1}$

Lemma 3.3:

${\displaystyle P_{A}(B+C)=P_{A}(B)+P_{A}(C)}$  for disjoint ${\displaystyle B,C\in {\mathcal {F}}}$  (the ${\displaystyle +}$  denotes a disjoint union)

Each lemma follows directly from the definition and the axioms holding for ${\displaystyle P}$  (definition 2.1).

From these lemmata, we obtain that for each ${\displaystyle A\in {\mathcal {F}}}$ , ${\displaystyle (\Omega ,{\mathcal {F}},P_{A})}$  satisfies the defining axioms of a probability space (definition 2.1).
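That ${\displaystyle (\Omega ,{\mathcal {F}},P_{A})}$  is again a probability space can be spot-checked numerically. The following Python sketch (the two-dice space and the particular sets are illustrative assumptions) verifies total mass ${\displaystyle 1}$  and additivity on disjoint events for one fixed ${\displaystyle A}$ :

```python
from fractions import Fraction

omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event):
    return Fraction(len(set(event)), len(omega))

A = {w for w in omega if w[0] % 2 == 0}   # first die even, so P(A) = 1/2 > 0

def P_A(B):
    """The conditional measure B ↦ P(B ∩ A) / P(A)."""
    return P(set(B) & A) / P(A)

# Axiom checks: total mass 1, and additivity on disjoint events.
assert P_A(omega) == 1
B = {w for w in omega if w[1] <= 2}
C = {w for w in omega if w[1] >= 5}       # disjoint from B
assert P_A(B | C) == P_A(B) + P_A(C)
print("P_A passes the checked axioms")
```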

With this definition, we have the following theorem:

Theorem 3.4 (Multiplication formula):

${\displaystyle P(A_{1}A_{2}\cdots A_{n})=P_{A_{1}\cdots A_{n-1}}(A_{n})P_{A_{1}\cdots A_{n-2}}(A_{n-1})\cdots P_{A_{1}}(A_{2})P(A_{1})}$ ,

where ${\displaystyle (\Omega ,{\mathcal {F}},P)}$  is a probability space, ${\displaystyle A_{1},\ldots ,A_{n}}$  are all in ${\displaystyle {\mathcal {F}}}$  and ${\displaystyle P(A_{1}\cdots A_{n-1})>0}$  (so that every conditional probability on the right-hand side is defined).

Proof:

From the definition, we have

${\displaystyle P_{A}(B)P(A)=P(AB)}$

for all ${\displaystyle A,B\in {\mathcal {F}}}$ . Thus, as ${\displaystyle {\mathcal {F}}}$  is an algebra, we obtain by induction:

${\displaystyle {\begin{aligned}P(A_{1}A_{2}\cdots A_{n})&=P((A_{1}A_{2}\cdots A_{n-1})A_{n})\\&=P_{A_{1}\cdots A_{n-1}}(A_{n})P(A_{1}\cdots A_{n-1})\\&=P_{A_{1}\cdots A_{n-1}}(A_{n})P_{A_{1}\cdots A_{n-2}}(A_{n-1})\cdots P_{A_{1}}(A_{2})P(A_{1}).\end{aligned}}}$ ${\displaystyle \Box }$
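The multiplication formula is what underlies the familiar "chain" computation for drawing without replacement. The following Python sketch (the deck and the event "three aces" are an assumed illustration, not from the text) checks the chain product against a direct count:

```python
from fractions import Fraction

# Probability that the first three cards drawn from a 52-card deck
# are all aces, counted directly over ordered draws ...
direct = Fraction(4 * 3 * 2, 52 * 51 * 50)

# ... versus the multiplication formula
# P(A1 A2 A3) = P_{A1 A2}(A3) * P_{A1}(A2) * P(A1),
# where A_k = "the k-th card drawn is an ace".
chain = Fraction(4, 52) * Fraction(3, 51) * Fraction(2, 50)

assert direct == chain
print(direct)   # → 1/5525
```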

## Bayes' theorem

Theorem 3.5 (Theorem of the total probability):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$  be a probability space, and assume

${\displaystyle \Omega =A_{1}+\cdots +A_{n}}$

(note that by using the ${\displaystyle +}$ -notation, we assume that the union is disjoint), where ${\displaystyle A_{1},\ldots ,A_{n}}$  are all contained in ${\displaystyle {\mathcal {F}}}$  and ${\displaystyle P(A_{j})>0}$  for each ${\displaystyle j}$ . Then

${\displaystyle \forall B\in {\mathcal {F}}:P(B)=\sum _{j=1}^{n}P(A_{j})P_{A_{j}}(B)}$ .

Proof:

${\displaystyle {\begin{aligned}\sum _{j=1}^{n}P(A_{j})P_{A_{j}}(B)&=\sum _{j=1}^{n}P(A_{j}){\frac {P(A_{j}\cap B)}{P(A_{j})}}\\&=\sum _{j=1}^{n}P(A_{j}B)\\&=P\left(\sum _{j=1}^{n}A_{j}B\right)\\&=P\left(\left(\sum _{j=1}^{n}A_{j}\right)B\right)\\&=P(\Omega B)\\&=P(B),\end{aligned}}}$

where we used that the sets ${\displaystyle A_{1}B,\ldots ,A_{n}B}$  are all disjoint, the distributive law of the algebra ${\displaystyle {\mathcal {F}}}$  and ${\displaystyle \Omega \cap B=B}$ .${\displaystyle \Box }$
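As a numerical illustration of the theorem (the two-urn model and its numbers are assumptions for the example): pick one of two urns with equal probability, then draw a ball, and compute the total probability of drawing red.

```python
from fractions import Fraction

# Partition: A_1 = "urn 1 chosen", A_2 = "urn 2 chosen".
# Urn 1 holds 3 red and 1 blue ball; urn 2 holds 1 red and 3 blue.
# B = "a red ball is drawn".
P_A = [Fraction(1, 2), Fraction(1, 2)]           # P(A_j)
P_B_given_A = [Fraction(3, 4), Fraction(1, 4)]   # P_{A_j}(B)

# Theorem 3.5: P(B) = sum_j P(A_j) * P_{A_j}(B)
P_B = sum(p * q for p, q in zip(P_A, P_B_given_A))
print(P_B)   # → 1/2
```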

Theorem 3.6 (Bayes' theorem, basic version):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$  be a probability space and ${\displaystyle A,B\in {\mathcal {F}}}$  with ${\displaystyle P(A)>0}$  and ${\displaystyle P(B)>0}$ . Then

${\displaystyle P_{B}(A)={\frac {P(A)P_{A}(B)}{P(B)}}}$ .

Proof:

${\displaystyle {\frac {P(A)P_{A}(B)}{P(B)}}={\frac {P(A){\frac {P(A\cap B)}{P(A)}}}{P(B)}}={\frac {P(A\cap B)}{P(B)}}=P_{B}(A)}$ .${\displaystyle \Box }$

This formula may look somewhat abstract, but it actually has a nice geometrical meaning. Suppose we are given two sets ${\displaystyle A,B\in {\mathcal {F}}}$ , already know ${\displaystyle P(A)}$ , ${\displaystyle P(B)}$  and ${\displaystyle P_{A}(B)}$ , and want to compute ${\displaystyle P_{B}(A)}$ . The situation is depicted in the following picture:

We know the ratio of the size of ${\displaystyle A\cap B}$  to that of ${\displaystyle A}$ , but what we actually want to know is how ${\displaystyle A\cap B}$  compares to ${\displaystyle B}$ . Hence, we change the reference set: we multiply by ${\displaystyle P(A)}$ , the old reference magnitude, and divide by ${\displaystyle P(B)}$ , the new reference magnitude.
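This reversal of conditioning can be checked exactly on a finite space. The following Python sketch (again using an illustrative two-dice example) computes ${\displaystyle P_{B}(A)}$  both via theorem 3.6 and directly from the definition:

```python
from fractions import Fraction

omega = [(i, j) for i in range(1, 7) for j in range(1, 7)]

def P(event):
    return Fraction(len(set(event)), len(omega))

A = {w for w in omega if w[0] + w[1] >= 10}   # the sum is at least 10
B = {w for w in omega if w[0] == 6}           # the first die shows 6

P_A_of_B = P(A & B) / P(A)          # the "known" quantity P_A(B)
bayes = P(A) * P_A_of_B / P(B)      # theorem 3.6
direct = P(A & B) / P(B)            # P_B(A) straight from the definition

assert bayes == direct
print(bayes)   # → 1/2
```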

Theorem 3.7 (Bayes' theorem):

Let ${\displaystyle (\Omega ,{\mathcal {F}},P)}$  be a probability space, and assume

${\displaystyle \Omega =A_{1}+\cdots +A_{n}}$ ,

where ${\displaystyle A_{1},\ldots ,A_{n}}$  are all in ${\displaystyle {\mathcal {F}}}$  with ${\displaystyle P(A_{k})>0}$  for each ${\displaystyle k}$ . Then for all ${\displaystyle B\in {\mathcal {F}}}$  with ${\displaystyle P(B)>0}$

${\displaystyle \forall j\in \{1,\ldots ,n\}:P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{\sum _{k=1}^{n}P(A_{k})P_{A_{k}}(B)}}}$ .

Proof:

From the basic version of the theorem, we obtain

${\displaystyle P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{P(B)}}}$ .

Using the theorem of the total probability (theorem 3.5), we obtain

${\displaystyle P_{B}(A_{j})={\frac {P_{A_{j}}(B)P(A_{j})}{\sum _{k=1}^{n}P(A_{k})P_{A_{k}}(B)}}}$ .${\displaystyle \Box }$
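As an illustration of the full theorem, consider the classical diagnostic-test computation (the partition and all numbers below are assumptions chosen for the example): ${\displaystyle A_{1}}$  = "has the disease", ${\displaystyle A_{2}}$  = "healthy", ${\displaystyle B}$  = "test is positive".

```python
from fractions import Fraction

# Prior probabilities P(A_j) over the partition A_1 + A_2 = Omega,
# and the conditional probabilities P_{A_j}(B) of a positive test.
P_A = [Fraction(1, 100), Fraction(99, 100)]
P_B_given_A = [Fraction(99, 100), Fraction(5, 100)]

# Denominator: total probability P(B) = sum_k P(A_k) * P_{A_k}(B).
denom = sum(p * q for p, q in zip(P_A, P_B_given_A))

# Theorem 3.7: P_B(A_j) = P_{A_j}(B) * P(A_j) / P(B).
posterior = [p * q / denom for p, q in zip(P_A, P_B_given_A)]
print(posterior[0])   # → 1/6
```

Despite the accurate test, the posterior probability of disease given a positive result is only ${\displaystyle 1/6}$ , because the prior ${\displaystyle P(A_{1})}$  is small — exactly the effect the denominator of theorem 3.7 captures.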