# Probability/Probability Spaces

## Concept

We will now proceed to develop a more axiomatic theory of probability, allowing for a simpler mathematical formalism. We shall proceed by developing the concept of a probability space, which will allow us to harness many theorems in mathematical analysis.

Recall that an experiment is any action or process with an outcome that is subject to uncertainty or randomness. A probability space or a probability triple is a mathematical construct that models an experiment and its set of possible outcomes.

## Probability space

Before defining probability space, we define several terms used in its definition.

Definition. (Sample space) The sample space, denoted by $\Omega$ , is the non-empty set whose elements are all possible outcomes of an experiment.

Remark.

• The sample space is often not unique, since the possible outcomes of an experiment can usually be described in more than one way.
• An outcome from the experiment is commonly denoted by $\omega$  (small letter of $\Omega$ , omega).

Example. A sample space of the numbers coming up from rolling a six-faced dice is $\Omega =\{1,2,3,4,5,6\}$ .

Definition. (Event) An event is a subset of the sample space.

Remark.

• It follows that the event space ${\mathcal {F}}$ , which is the set (or family) of all events, is the power set of the sample space, i.e. ${\mathcal {F}}={\mathcal {P}}(\Omega )$ .
• An event consisting of a single outcome (a singleton) is sometimes referred to as a simple event, and an event consisting of more than one outcome is sometimes referred to as a compound event.
• An event is said to have happened or occurred if the outcome of the experiment is an element of the event.

Example. Sets $\varnothing ,\{1,2,3\}$  and $\Omega$  are events from rolling a six-faced dice, while the set $\{0\}$  is not.
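The definitions above can be illustrated with a minimal Python sketch, assuming the sample space of a six-faced dice is represented as a set. The helper names `is_event` and `power_set` are hypothetical and chosen only for this illustration.

```python
from itertools import chain, combinations

# Sample space for one roll of a six-faced dice.
omega = frozenset({1, 2, 3, 4, 5, 6})

def is_event(candidate, sample_space):
    """An event is any subset of the sample space."""
    return set(candidate) <= sample_space

def power_set(sample_space):
    """All subsets of a finite sample space, i.e. the event space."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return [frozenset(s) for s in subsets]

print(is_event(set(), omega))      # True
print(is_event({1, 2, 3}, omega))  # True
print(is_event({0}, omega))        # False
print(len(power_set(omega)))       # 64, i.e. 2**6 events
```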

Definition. (Probability space) A probability space is a mathematical triplet $(\Omega ,{\mathcal {F}},\mathbb {P} )$  consisting of the sample space $\Omega$ , event space ${\mathcal {F}}$ , and a probability function $\mathbb {P}$ .

Remark.

• There are multiple ways to define the probability function, as we will see in the following sections. Among those definitions, the axiomatic definition is the most widely used and the most general.
• The probability function is sometimes denoted by $\Pr$ , $P$  or $p$  instead.
• The notation $\mathbb {P}$  is mainly used in this book to distinguish the probability function from other functions named $P$  or $p$ .
• A probability space is arbitrary, in the sense that its author ultimately defines which elements $\Omega$ , ${\mathcal {F}}$ , and $\mathbb {P}$  will contain.
• The probability function $\mathbb {P}$  may present a model for a particular class of real-world situations.

## Terminologies

Terminology for sets from set theory also applies to events, since an event is essentially a set. Apart from that terminology, we also have the following extra terms for events.

Definition. (Exhaustive) Events $E_{1},\dotsc ,E_{n}$  are exhaustive if $E_{1}\cup \dotsb \cup E_{n}=\Omega$ .

Example. When we are rolling a six-faced dice, and we are considering the number coming up as the outcome, the events $\{1,2,3,4\}$  and $\{3,4,5,6\}$  are exhaustive, while the events $\varnothing$  and $\{1,2,3,4,5\}$  are not exhaustive.

Definition. (Partition) A group of events $E_{1},\dotsc ,E_{n}$  is a partition of $\Omega$  if the events are both disjoint and exhaustive.

Example. When we are rolling a six-faced dice, and we are considering the number coming up as the outcome, the group of events $\varnothing$  and $\Omega$  is a partition, while the group of events $\{1,2,{\color {maroon}3,4}\}$  and $\{{\color {maroon}3,4},5,6\}$  is not a partition, since these events are not disjoint.
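The two definitions can be checked mechanically for the dice examples above. The following is a minimal sketch, assuming events are represented as Python sets; the function names are hypothetical.

```python
omega = {1, 2, 3, 4, 5, 6}

def is_exhaustive(events, sample_space):
    """Events are exhaustive if their union is the whole sample space."""
    return set().union(*events) == set(sample_space)

def is_pairwise_disjoint(events):
    """Events are disjoint if no two of them share an outcome."""
    events = [set(e) for e in events]
    return all(
        events[i].isdisjoint(events[j])
        for i in range(len(events))
        for j in range(i + 1, len(events))
    )

def is_partition(events, sample_space):
    """A partition is both disjoint and exhaustive."""
    return is_pairwise_disjoint(events) and is_exhaustive(events, sample_space)

print(is_exhaustive([{1, 2, 3, 4}, {3, 4, 5, 6}], omega))  # True
print(is_exhaustive([set(), {1, 2, 3, 4, 5}], omega))      # False
print(is_partition([set(), omega], omega))                 # True
print(is_partition([{1, 2, 3, 4}, {3, 4, 5, 6}], omega))   # False
```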

## Probability definition

The remaining undefined item in the probability space is the probability function $\mathbb {P}$ , and we will give various definitions of it, in which the combinatorial (or classical), and axiomatic definitions are important.

Definition. (Subjective probability) The probability of an event is a measure of the chance with which we can expect the event to occur. We assign a number between 0 and 1 inclusively to the probability of an event. A probability of 1 means that we are certain the event will occur, and a probability of 0 means that we are certain the event will not occur.

Example. Amy and Bob assess their probabilities of winning the top prize in a lucky draw using the subjective probability approach.

• Amy thinks that she is lucky, and thus assigns 0.7 to the probability of winning the top prize.
• Bob thinks that he is unlucky, and thus assigns 0.1 to the probability of winning the top prize.

Remark.

• This illustrates a major problem of subjective probability, namely the probability assigned to an event is often not unique, due to different opinions from different people.

Definition. (Combinatorial probability) Assume all outcomes in the sample space $\Omega$  are equally likely. Then, the (combinatorial) probability of an event (say $E$ ) in the sample space is $\mathbb {P} (E)=\#(E)/\#(\Omega )$ .

Remark.

• It is also called classical probability.
• If the outcomes are not equally likely, we cannot apply this definition.
• By the principle of indifference (or insufficient reason), unless there is evidence showing that the outcomes are not equally likely, we should assume that the outcomes are equally likely.
• When the sample space contains infinitely many outcomes, the combinatorial probability is undefined.

Example. The probability that the number 1 comes up on both dice when rolling a fair red six-faced dice and a fair blue six-faced dice is $1/(6\cdot 6)=1/36$ .

Proof. The number of pairs of numbers coming up for the two dice is $\underbrace {6} _{\text{red}}\times \underbrace {6} _{\text{blue}}=36$ . Since the dice are fair, the 36 outcomes are equally likely, and so we can apply combinatorial probability here. Exactly one of these outcomes, namely both dice showing 1, lies in the event, so the probability is $1/36$ .

$\Box$
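The counting in this example can be reproduced by enumeration. The sketch below assumes the example asks for both dice showing 1 (the reading that gives $1/36$), and represents each outcome as a (red, blue) pair; the function name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Each outcome is an ordered pair (red, blue), so the dice are distinguishable.
omega = list(product(faces, faces))

def prob(event, sample_space):
    """Combinatorial probability: #(E) / #(Omega), valid for equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

both_ones = [w for w in omega if w == (1, 1)]

print(len(omega))              # 36
print(prob(both_ones, omega))  # 1/36
```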

Exercise.

Suppose the blue dice is colored red. Calculate the probability again.

• 1/36
• 1/21
• 1/18
• 1/15
• 1/6

Example. (Capture-mark-recapture) We are fishing in a lake containing $N$  fishes. First, we catch $k\leq N$  fishes from the lake (capture), give each of them a marker (mark), and release them back into the lake. Then, we catch fishes from the lake again (recapture), and catch $n\geq r$  (and also $\leq N$ ) fishes this time. The probability that there are exactly $r\leq k$  marked fishes among the $n$  fishes is ${\binom {k}{r}}\times {\binom {N-k}{n-r}}{\bigg /}{\binom {N}{n}}$ .

Proof. We order the $N$  fishes in the lake notionally (e.g. by assigning them different number one by one), so that they are now distinguishable (notionally), then, we have:

• ${\binom {N}{n}}$ : the number of outcomes of catching $n$  fishes from $N$  fishes;
• ${\binom {k}{r}}$ : the number of outcomes of catching $r$  marked fishes from $k$  marked fishes in the recapture process;
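• ${\binom {N-k}{n-r}}$ : the number of outcomes of catching $n-r$  unmarked fishes from $N-k$  unmarked fishes in the recapture process (this ensures that we only catch $r$  marked fishes, by ensuring that the remaining caught fishes do not contain any marked fish).

The ${\binom {N}{n}}$  outcomes are equally likely, and by the multiplication principle, ${\binom {k}{r}}\times {\binom {N-k}{n-r}}$  of them contain exactly $r$  marked fishes. The result then follows from combinatorial probability.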

$\Box$
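The formula can be evaluated exactly with integer arithmetic. The following is a minimal sketch; the function name `recapture_probability` and the sample values of $N$, $k$ and $n$ are assumptions made for illustration only.

```python
from fractions import Fraction
from math import comb

def recapture_probability(N, k, n, r):
    """P(exactly r marked fishes among n caught), with k marked out of N."""
    return Fraction(comb(k, r) * comb(N - k, n - r), comb(N, n))

# Hypothetical lake: 10 fishes, 4 of them marked, 3 caught in the recapture.
N, k, n = 10, 4, 3
for r in range(0, min(k, n) + 1):
    print(r, recapture_probability(N, k, n, r))

# The probabilities over all possible values of r add up to 1.
total = sum(recapture_probability(N, k, n, r) for r in range(0, min(k, n) + 1))
print(total)  # 1
```

For these values, the probability of exactly one marked fish is $4\cdot 15/120=1/2$.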

Exercise. There are 9 balls in a box, consisting of 3 red balls, 2 blue balls and 4 green balls.

1 Calculate the probability that a red ball is drawn from the box if 1 ball is drawn from the box.

• 1/28
• 3/28
• 1/9
• 1/3
• None of the above.

2 Calculate the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box.

• 2/7
• 5/9
• 5/7
• 5/6
• None of the above.

3 $n$  orange balls are added to the box such that the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box is now $1/3$ . Calculate $n$ .

• 2
• 4
• 8
• 16
• None of the above.

4 Select the correct (in numerical value sense) expression(s) of the probability that $r$  red balls are drawn and $b$  blue balls are drawn if $k$  balls are drawn from the box ($r$ , $b$  and $k$  are of values such that all terms in the following are defined).

• ${\binom {3}{r}}{\binom {2}{b}}{\bigg /}{\binom {9}{k}}$
• ${\binom {3}{r}}{\binom {2}{b}}{\bigg /}{\binom {b+r+k}{k}}$
• ${\binom {3}{r}}{\binom {2}{b}}{\bigg /}{\binom {9}{9-b-r}}$
• ${\binom {9-b-k}{r}}{\binom {2}{b}}{\bigg /}{\binom {9}{k}}$
• ${\binom {3}{r}}{\binom {9-r-k}{b}}{\bigg /}{\binom {9}{k}}$

Definition. (Frequentist probability) The probability of an event or outcome is the long-term proportion of times the event would occur if the experiment were repeated independently many times. That is, letting $n(E)$  be the number of times that event $E$  occurs in $n$  repetitions of the experiment, the probability of $E$  is $\mathbb {P} (E)=\lim _{n\to \infty }{\frac {n(E)}{n}}.$

Remark.

• When the no. of repetitions is large enough, the ratio of the no. of times that event $E$  occurs from these repetitions to the no. of repetitions can be used to approximate $\mathbb {P} (E)$ .

Example. Suppose we throw a coin 1 million times (i.e. 1000000 times). The number of heads coming up is 700102, the number of tails coming up is 299896, and the number of times that the coin lands on its edge is 2.

Then, the probability that a head comes up is close to $700102/1000000=0.700102$ .

After that, we may think that the coin is unfair.
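The frequentist idea can be imitated with a small simulation. This is only a sketch under stated assumptions: the coin is hypothetical, it comes up heads with probability 0.7 on each throw, it never lands on its edge, and the function name `relative_frequency` is made up for this illustration.

```python
import random

def relative_frequency(p_head, repetitions, seed=0):
    """Simulate independent throws and return n(E)/n for E = 'head comes up'."""
    rng = random.Random(seed)  # fixed seed so that the run is reproducible
    heads = sum(rng.random() < p_head for _ in range(repetitions))
    return heads / repetitions

# The ratio tends to settle near the assumed value 0.7 as n grows.
for n in (100, 10_000, 100_000):
    print(n, relative_frequency(0.7, n))
```

The simulation can only approximate the limit in the definition, since we can never carry out infinitely many repetitions.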

Definition. (Axiomatic probability) A probability is a set function defined on the event space ${\mathcal {F}}$ . It assigns a real value $\mathbb {P} (E)$  to each event $E$ , with the following probability axioms satisfied:

(P1) for each event $E\in {\mathcal {F}}$ , $\mathbb {P} (E)\geq 0$  (nonnegativity);
(P2) $\mathbb {P} (\Omega )=1$  (unitarity);
(P3) for each (countable) infinite sequence of mutually exclusive (or disjoint) events $E_{1},E_{2},\dotsc$ , $\mathbb {P} \left(\bigcup _{i=1}^{\infty }E_{i}\right)=\sum _{i=1}^{\infty }\mathbb {P} (E_{i})$  (countable additivity).

Example. Based on the probability axioms, the probability of an event cannot be -0.1, since P1 requires every probability to be nonnegative.

Example. (Combinatorial probability is probability) Combinatorial probability is a probability since it satisfies all three probability axioms.

Proof.

(P1) It follows from observing that the number of outcomes in an event is nonnegative, and the number of outcomes in the sample space is positive;
(P2) It follows from observing that $\mathbb {P} (\Omega )=\#(\Omega )/\#(\Omega )=1$ ;
(P3) It follows from observing that the number of outcomes in a union of (infinitely many) disjoint sets is the same as the sum of the numbers of outcomes in each of the disjoint sets (possibly through the Venn diagram, non-rigorously).

$\Box$
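For a small sample space, the axioms can also be checked exhaustively. The sketch below does this for one roll of a fair six-faced dice. Since the sample space is finite, it checks additivity for pairs of disjoint events only, which is a weaker, finite form of P3; the variable names are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))

def prob(event):
    """Combinatorial probability on a fair six-faced dice."""
    return Fraction(len(event), len(omega))

# The event space: every subset of the sample space.
events = [
    frozenset(s)
    for s in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1)
    )
]

p1 = all(prob(e) >= 0 for e in events)  # nonnegativity
p2 = prob(omega) == 1                   # unitarity
p3_finite = all(                        # additivity for disjoint pairs
    prob(a | b) == prob(a) + prob(b)
    for a in events
    for b in events
    if a.isdisjoint(b)
)
print(p1, p2, p3_finite)  # True True True
```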

With these three axioms only, we can prove many well-known properties of probability.

## Properties of probability

### Basic properties of probability

Proposition. (Probability of empty set) $\mathbb {P} (\varnothing )=0$ .

Proof. Let $E_{i}=\varnothing$  for each positive integer $i$ . $E_{1},E_{2},\dotsc$  are mutually exclusive, since they are all empty sets, and the intersection of any two of them is also the empty set. Also, $E_{1}\cup E_{2}\cup \dotsb =\varnothing \cup \varnothing \cup \dotsb =\varnothing$ . So,

{\begin{aligned}&&\mathbb {P} (\varnothing )&=\mathbb {P} (E_{1}\cup E_{2}\cup \dotsb )\\&&&{\overset {\text{ P3 }}{=}}\mathbb {P} (E_{1})+\mathbb {P} (E_{2})+\dotsb \\&\Rightarrow &\underbrace {\mathbb {P} (\varnothing )-\mathbb {P} (E_{1})} _{0}&=\mathbb {P} (E_{2})+\mathbb {P} (E_{3})+\dotsb \\&\Rightarrow &\mathbb {P} (E_{2})+\mathbb {P} (E_{3})+\dotsb &=0\\&\Rightarrow &\mathbb {P} (E_{2})&\leq \mathbb {P} (E_{2})+\mathbb {P} (E_{3})+\dotsb =0.\end{aligned}}

By P1, $\mathbb {P} (E_{2})\geq 0$ . It follows from these two inequalities that $\mathbb {P} (\varnothing )=\mathbb {P} (E_{2})=0$ .

$\Box$

Proposition. (Extended P3) The property of probability in the third axiom of probability (P3) is also valid for a finite sequence of events.

Proof. For each positive integer $k$ , suppose that $A_{1},\dotsc ,A_{k}$  are disjoint events, and append to these the infinite sequence of events $A_{k+1}=\varnothing ,A_{k+2}=\varnothing ,\dotsc$ . By P3,

$\mathbb {P} \left(\bigcup _{i=1}^{k}A_{i}\right)=\mathbb {P} \left(\bigcup _{i=1}^{\infty }A_{i}\right)=\sum _{i=1}^{\infty }\mathbb {P} (A_{i})=\sum _{i=1}^{k}\mathbb {P} (A_{i})$

since $\sum _{i=k+1}^{\infty }\mathbb {P} (A_{i})=\mathbb {P} (\varnothing )+\dotsb =0$ .

$\Box$

Proposition. (Simplified law of total probability) For each event $A$  and $B$ , $\mathbb {P} (B)=\mathbb {P} (B\cap A)+\mathbb {P} (B\setminus A)$ .

Proof.

$\mathbb {P} (B)=\mathbb {P} (B\cap (\underbrace {A\cup A^{c}} _{\Omega }))=\mathbb {P} {\big (}(B\cap A)\cup (\underbrace {B\cap A^{c}} _{:=B\setminus A}){\big )}{\overset {\text{ ext. P3 }}{=}}\mathbb {P} (B\cap A)+\mathbb {P} (B\setminus A)$



$\Box$

Illustration of simplified law of total probability:

|---------|
|  B\A    | <----- B
|    |----|-----|
|    |BnA |     |
|----|----|     | <---- A
|----------|


Proposition. (Simplified inclusion-exclusion principle) For each event $A$  and $B$ , $\mathbb {P} (A\cup B)=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)$ .

Proof. Since $A\cup B=A\cup (B\setminus A)$  and the events $A$  and $B\setminus A$  are disjoint, by extended P3,

$\mathbb {P} (A\cup B)=\mathbb {P} (A\cup (B\setminus A))=\mathbb {P} (A)+\mathbb {P} (B\setminus A)=\mathbb {P} (A)+(\mathbb {P} (B)-\mathbb {P} (B\cap A))=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B)$

since $\mathbb {P} (B)=\mathbb {P} (B\cap A)+\mathbb {P} (B\setminus A)\Rightarrow \mathbb {P} (B\setminus A)=\mathbb {P} (B)-\mathbb {P} (B\cap A)$ .

$\Box$

Illustration of simplified inclusion-exclusion principle:

|---------|
|         | <----- B
| II |----|-----|
|    |AnB |     |
|----|----| I   | <---- A
|----------|


$\mathbb {P} (A\cup B)=\mathbb {P} ({\text{I}})+\mathbb {P} ({\text{II}})+\mathbb {P} (A\cap B)=\underbrace {\mathbb {P} ({\text{I}})+\mathbb {P} (A\cap B)} _{\mathbb {P} (A)}+\underbrace {\mathbb {P} ({\text{II}})+\mathbb {P} (A\cap B)} _{\mathbb {P} (B)}-\mathbb {P} (A\cap B)$
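Both the simplified law of total probability and the simplified inclusion-exclusion principle can be checked numerically on a concrete pair of events. The sketch below assumes a fair six-faced dice and two arbitrarily chosen overlapping events $A$ and $B$.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    """Combinatorial probability on a fair six-faced dice."""
    return Fraction(len(event), len(omega))

# Two overlapping events, chosen only for illustration.
A = frozenset({1, 2, 3, 4})
B = frozenset({3, 4, 5})

# Simplified law of total probability: P(B) = P(B n A) + P(B \ A).
print(prob(B) == prob(B & A) + prob(B - A))  # True

# Simplified inclusion-exclusion: P(A u B) = P(A) + P(B) - P(A n B).
print(prob(A | B))                        # 5/6
print(prob(A) + prob(B) - prob(A & B))    # 5/6
```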

Proposition. (Complement rule) For each event $E$ , $\mathbb {P} (E)=1-\mathbb {P} (E^{c})$ .

Proof.

$\mathbb {P} (E)=\mathbb {P} (E)+\mathbb {P} (E^{c})-\mathbb {P} (E^{c}){\overset {\text{ ext. P3 }}{=}}\mathbb {P} (\underbrace {E\cup E^{c}} _{\Omega })-\mathbb {P} (E^{c}){\overset {\text{ P2 }}{=}}1-\mathbb {P} (E^{c})$

$\Box$

Illustration of complement rule:

|---------------|
|               |
|      E^c      | <--- Omega (Pr(Omega)=1)
|    |---|      |
|    | E |      |
|    |---|      |
|---------------|


Proposition. (Numeric bound for probability) For each event $E$ , $0\leq \mathbb {P} (E)\leq 1$ .

Proof. By P1, $\mathbb {P} (E)\geq 0$ , and $\mathbb {P} (E^{c})\geq 0$ . So, by the complement rule, $\mathbb {P} (E)\leq \mathbb {P} (E)+\mathbb {P} (E^{c})=\mathbb {P} (E)+(1-\mathbb {P} (E))=1$ .

$\Box$

Proposition. (Monotonicity) If $A\subseteq B$ , then $\mathbb {P} (A)\leq \mathbb {P} (B)$ .

Proof. By simplified law of total probability,

$\mathbb {P} (B)=\mathbb {P} (\underbrace {B\cap A} _{A})+\mathbb {P} (B\setminus A){\overset {\text{ P1 }}{\geq }}\mathbb {P} (A){\cancel {+0}}.$

$\Box$

Example. The probability of winning the championship in a competition is less than or equal to that of entering the final of the competition, by monotonicity.

Proof. Let $C$  and $F$  be the events of winning the championship in the competition and entering the final of the competition respectively. Then, $C\subseteq F$ , since $C\Rightarrow ({\text{implies}})\;F$  (if we win the championship, then we must have entered the final), and so $\mathbb {P} (C)\leq \mathbb {P} (F)$ .

$\Box$
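The complement rule, the numeric bound and monotonicity hold for every probability function satisfying the axioms, not only for equally likely outcomes. As a sketch, we can check them exhaustively for a hypothetical loaded dice whose outcome weights are assumed for illustration only.

```python
from fractions import Fraction
from itertools import chain, combinations

# Hypothetical loaded dice: face 6 has probability 1/2, the other faces 1/10 each.
weights = {face: Fraction(1, 10) for face in range(1, 6)}
weights[6] = Fraction(1, 2)
omega = frozenset(weights)

def prob(event):
    """Probability of an event as the sum of the weights of its outcomes."""
    return sum((weights[w] for w in event), Fraction(0))

events = [
    frozenset(s)
    for s in chain.from_iterable(
        combinations(sorted(omega), r) for r in range(len(omega) + 1)
    )
]

complement_ok = all(prob(e) == 1 - prob(omega - e) for e in events)
bounds_ok = all(0 <= prob(e) <= 1 for e in events)
monotone_ok = all(
    prob(a) <= prob(b) for a in events for b in events if a <= b
)
print(complement_ok, bounds_ok, monotone_ok)  # True True True
```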

Exercise.

Select all correct statement(s). All following capital letters are events.

• If $A=B$ , then $\mathbb {P} (A)=\mathbb {P} (B)$ .
• $\mathbb {P} {\big (}A\setminus (B\cup C){\big )}=\mathbb {P} (A)+\mathbb {P} (A\cap B)+\mathbb {P} (A\cap C)-\mathbb {P} (A\cap B\cap C)$ .
• ${\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}=1$ if $A\subseteq B$ and $\mathbb {P} (B)>0$ .
• $0\leq {\frac {\mathbb {P} (A\cap B)}{\mathbb {P} (B)}}\leq 1$ if $\mathbb {P} (B)>0$ .

### More advanced properties of probability

Theorem. (Inclusion-exclusion principle (probability))

Illustration of inclusion-exclusion principle when $n=3$

For each event $E_{1},\dotsc ,E_{n}$ ,

{\begin{aligned}\mathbb {P} (E_{1}\cup \dotsb \cup E_{n})&=\mathbb {P} (E_{1})+\dotsb +\mathbb {P} (E_{n})\\&\;-{\big (}\mathbb {P} (E_{1}\cap E_{2})+\mathbb {P} (E_{1}\cap E_{3})+\dotsb +\mathbb {P} (E_{n-1}\cap E_{n}){\big )}\\&\;+{\big (}\mathbb {P} (E_{1}\cap E_{2}\cap E_{3})+\mathbb {P} (E_{1}\cap E_{2}\cap E_{4})+\dotsb +\mathbb {P} (E_{n-2}\cap E_{n-1}\cap E_{n}){\big )}\\&\;-\dotsb \\&\;+(-1)^{n+1}\mathbb {P} (E_{1}\cap \dotsb \cap E_{n}).\end{aligned}}

Proof. We can prove this by induction.

Recall the simplified inclusion-exclusion principle, which is essentially the inclusion-exclusion principle when $n=2$ . So, we know that the inclusion-exclusion principle is true for $n=2$ , and it remains to prove the case with larger $n$ .

The idea of the induction is illustrated as follows: by simplified inclusion-exclusion principle,

{\begin{aligned}\mathbb {P} ((E_{1}\cup \dotsb \cup E_{n-1})\cup {\color {darkgreen}E_{n}})&=\mathbb {P} (E_{1}\cup \dotsb \cup E_{n-1})+\mathbb {P} ({\color {darkgreen}E_{n}})-\mathbb {P} {\big (}(E_{1}\cup \dotsb \cup E_{n-1})\cap {\color {darkgreen}E_{n}}{\big )}\\&=\mathbb {P} (E_{1}\cup \dotsb \cup E_{n-1})+\mathbb {P} ({\color {darkgreen}E_{n}})-\mathbb {P} {\big (}(E_{1}\cap {\color {darkgreen}E_{n}})\cup \dotsb \cup (E_{n-1}\cap {\color {darkgreen}E_{n}}){\big )}\\&=\dotsb \end{aligned}}

$\Box$

Remark.

• We can write the inclusion-exclusion principle more compactly as follows:

$\mathbb {P} (E_{1}\cup \dotsb \cup E_{n})=\sum _{j=1}^{n}(-1)^{j+1}\sum _{i_{1}<\dotsb <i_{j}}\mathbb {P} (E_{i_{1}}\cap \dotsb \cap E_{i_{j}}).$

• An alternative and more elegant proof is provided in the chapter about properties of distributions.
• For the intersections of events, every possible distinct combination of the events is involved exactly once.
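The alternating sum over all intersections can be sketched in code by looping over every combination of $j$ events for $j=1,\dotsc ,n$ . The example below is an illustration only: it assumes 12 equally likely outcomes, and the events "multiple of 2", "multiple of 3" and "multiple of 4" are chosen arbitrarily.

```python
from fractions import Fraction
from itertools import combinations

omega = frozenset(range(1, 13))

def prob(event):
    """Combinatorial probability on 12 equally likely outcomes."""
    return Fraction(len(event), len(omega))

def inclusion_exclusion(events):
    """Probability of the union, computed from intersections only."""
    total = Fraction(0)
    for j in range(1, len(events) + 1):
        sign = (-1) ** (j + 1)
        # Every distinct combination of j events contributes one term.
        for subset in combinations(events, j):
            total += sign * prob(frozenset.intersection(*subset))
    return total

events = [frozenset(x for x in omega if x % d == 0) for d in (2, 3, 4)]
direct = prob(frozenset.union(*events))

print(direct)                       # 2/3
print(inclusion_exclusion(events))  # 2/3
```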

Example. When $n=3$ , for each event $A$ , $B$  and $C$ ,

$\mathbb {P} (A\cup B\cup C)=\mathbb {P} (A)+\mathbb {P} (B)+\mathbb {P} (C)-\mathbb {P} (A\cap B)-\mathbb {P} (A\cap C)-\mathbb {P} (B\cap C)+\mathbb {P} (A\cap B\cap C).$

Example. We select a student from some university students. It is given that

• the selected student has a major in mathematics with a probability 0.4;
• the selected student has a major in statistics with a probability 0.55;
• the selected student has a major in accounting with a probability 0.3;
• the selected student has a major in statistics and accounting with a probability 0.2;
• the selected student has a major in accounting and mathematics with a probability 0.15;
• the selected student has a major in mathematics and statistics with a probability 0.2;
• the selected student has a major in mathematics, statistics and accounting with a probability 0.1.

Then, the probability that the selected student does not have any of these majors is $1-(0.4+0.55+0.3-0.2-0.15-0.2+0.1)=0.2$ .

Proof. Let $M$ , $S$ , $A$  be the events that the selected student has a major in mathematics, statistics and accounting respectively. Then,

{\begin{aligned}\mathbb {P} (M^{c}\cap S^{c}\cap A^{c})&=\mathbb {P} {\big (}(M\cup S\cup A)^{c}{\big )}=1-\mathbb {P} (M\cup S\cup A)\\&=1-(\mathbb {P} (M)+\mathbb {P} (S)+\mathbb {P} (A)-\mathbb {P} (M\cap S)-\mathbb {P} (M\cap A)-\mathbb {P} (S\cap A)+\mathbb {P} (M\cap S\cap A))\\&=1-(0.4+0.55+0.3-0.2-0.15-0.2+0.1)\\&=1-0.8=0.2.\end{aligned}}

Alternatively, we can consider the following Venn diagram:
|-------------| <--------- A
|             |
|        |----|----|
|        |    |    |
| 0.05   |0.05|0.15| <---- M
|        |    |    |
|--------|----|----|------|
|        |0.1 |0.1 |      |
| 0.1    |    |    | 0.25 | <---- S
|        |----|----|      |
|-------------|-----------|


We can see from this diagram that $\mathbb {P} (M\cup S\cup A)=0.05+0.05+0.15+0.1+0.1+0.1+0.25=0.8$ , and thus the desired probability is $1-0.8=0.2$ .

$\Box$
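The arithmetic in this example can be reproduced with exact fractions, which avoids floating-point rounding. The sketch below uses only the probabilities given in the example and the region values read from the Venn diagram; the dictionary keys are hypothetical labels.

```python
from fractions import Fraction

# Given probabilities; "MS" stands for the intersection of M and S, and so on.
p = {
    "M": Fraction("0.4"), "S": Fraction("0.55"), "A": Fraction("0.3"),
    "MS": Fraction("0.2"), "MA": Fraction("0.15"), "SA": Fraction("0.2"),
    "MSA": Fraction("0.1"),
}

# Inclusion-exclusion principle with n = 3.
p_union = (p["M"] + p["S"] + p["A"]
           - p["MS"] - p["MA"] - p["SA"]
           + p["MSA"])
p_none = 1 - p_union
print(p_union, p_none)  # 4/5 1/5

# The seven regions of the Venn diagram add up to the same value.
regions = ["0.05", "0.05", "0.15", "0.1", "0.1", "0.1", "0.25"]
print(sum(Fraction(x) for x in regions) == p_union)  # True
```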

Exercise.

1 Calculate the probability that the selected student has at least two of those three majors.

• 0.1
• 0.15
• 0.2
• 0.25
• 0.4

2 Calculate the probability that the selected student has one and only one major.

• 0.3
• 0.35
• 0.4
• 0.45
• 0.5

Lemma. For each event $E_{1},E_{2},\dotsc$ ,

$\mathbb {P} \left(\bigcup _{i=1}^{\infty }E_{i}\right){\overset {\text{ def }}{=}}\mathbb {P} \left(\lim _{n\to \infty }\bigcup _{i=1}^{n}E_{i}\right)=\lim _{n\to \infty }\mathbb {P} \left(\bigcup _{i=1}^{n}E_{i}\right).$

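Proof. First suppose that the events are disjoint. Then,

$\lim _{n\to \infty }\mathbb {P} \left(\bigcup _{i=1}^{n}E_{i}\right){\overset {\text{ext. P3}}{=}}\lim _{n\to \infty }\sum _{i=1}^{n}\mathbb {P} (E_{i}){\overset {\text{ def }}{=}}\sum _{i=1}^{\infty }\mathbb {P} (E_{i}){\overset {\text{ P3 }}{=}}\mathbb {P} \left(\bigcup _{i=1}^{\infty }E_{i}\right).$

For general events, let $F_{1}=E_{1}$  and $F_{i}=E_{i}\setminus (E_{1}\cup \dotsb \cup E_{i-1})$  for each $i\geq 2$ . The events $F_{1},F_{2},\dotsc$  are disjoint, and $\bigcup _{i=1}^{n}F_{i}=\bigcup _{i=1}^{n}E_{i}$  for each $n$ , as well as $\bigcup _{i=1}^{\infty }F_{i}=\bigcup _{i=1}^{\infty }E_{i}$ . So, applying the disjoint case to $F_{1},F_{2},\dotsc$  gives the result.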

$\Box$

Proposition. (Boole's inequality) For each event $E_{1},E_{2},\dotsc$ ,

$\mathbb {P} \left(\bigcup _{i=1}^{\infty }E_{i}\right)\leq \sum _{i=1}^{\infty }\mathbb {P} (E_{i}).$

Proof. First, by the simplified inclusion-exclusion principle, for each event $A$  and $B$ , $\mathbb {P} (A\cup B)=\mathbb {P} (A)+\mathbb {P} (B)-\mathbb {P} (A\cap B){\overset {\text{ P1 }}{\leq }}\mathbb {P} (A)+\mathbb {P} (B)$ .

So,

$\mathbb {P} \left(\bigcup _{i=1}^{n}E_{i}\right)\leq \mathbb {P} (E_{1})+\mathbb {P} \left(\bigcup _{i=2}^{n}E_{i}\right)\leq \mathbb {P} (E_{1})+\mathbb {P} (E_{2})+\mathbb {P} \left(\bigcup _{i=3}^{n}E_{i}\right)\leq \dotsb \leq \mathbb {P} (E_{1})+\mathbb {P} (E_{2})+\dotsb +\mathbb {P} (E_{n})=\sum _{i=1}^{n}\mathbb {P} (E_{i}).$

Using the lemma,

$\mathbb {P} \left(\bigcup _{i=1}^{\infty }E_{i}\right)=\lim _{n\to \infty }\mathbb {P} \left(\bigcup _{i=1}^{n}E_{i}\right){\overset {\text{from above}}{\leq }}\lim _{n\to \infty }\sum _{i=1}^{n}\mathbb {P} (E_{i}){\overset {\text{ def }}{=}}\sum _{i=1}^{\infty }\mathbb {P} (E_{i}).$

$\Box$
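Boole's inequality can be illustrated for finitely many events. The sketch below assumes a fair six-faced dice and two arbitrarily chosen families of events, and computes the gap between the sum of the probabilities and the probability of the union. The function name `union_bound_gap` is hypothetical.

```python
from fractions import Fraction

omega = frozenset(range(1, 7))

def prob(event):
    """Combinatorial probability on a fair six-faced dice."""
    return Fraction(len(event), len(omega))

def union_bound_gap(events):
    """Sum of the probabilities minus the probability of the union (never negative)."""
    union = frozenset().union(*events)
    return sum(prob(e) for e in events) - prob(union)

overlapping = [frozenset({1, 2, 3}), frozenset({3, 4}), frozenset({4, 5})]
disjoint = [frozenset({1}), frozenset({2, 3}), frozenset({6})]

print(union_bound_gap(overlapping))  # 1/3, the inequality is strict
print(union_bound_gap(disjoint))     # 0, equality holds for disjoint events
```

The gap is zero for the disjoint family, in agreement with extended P3, and positive for the overlapping family, since shared outcomes are counted more than once in the sum.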

1. e.g. the sample space of throwing a dice may include the six numbers, or may only include two outcomes: odd number and even number
2. e.g. it is given that a coin is biased, such that it is more likely that head comes up
3. However, it is still possible that the coin is fair.
4. ext. stands for 'extended'