# Probability/Probability Spaces

## Concept

We will now proceed to develop a more axiomatic theory of probability, allowing for a simpler mathematical formalism. We shall proceed by developing the concept of a probability space, which will allow us to harness many theorems in mathematical analysis.

Recall that an experiment is any action or process with an outcome that is subject to uncertainty or randomness. A probability space or a probability triple is a mathematical construct that models an experiment and its set of possible outcomes.

## Probability space

Before defining *probability space*, we define several terms used in its definition.

**Definition.**
(Sample space)
The *sample space*, denoted by $\Omega$, is the non-empty set whose elements are all possible outcomes of an experiment.

**Remark.**

- The sample space is often *not* unique, since there are often multiple ways to define the possible outcomes of an experiment, possibly because of differences in expression^{[1]}.
- An outcome of the experiment is commonly denoted by $\omega$ (the small letter of $\Omega$, omega).

**Example.**
A sample space of the numbers coming up from rolling a six-faced dice is $\{1, 2, 3, 4, 5, 6\}$.

**Definition.**
(Event)
An *event* is a subset of the sample space.

**Remark.**

- It follows that the event space $\mathcal{F}$, which is the set consisting of all events (or *family* of events), is the power set of the sample space, i.e. $\mathcal{F} = \mathcal{P}(\Omega)$.
- An event consisting of a single outcome (which is a singleton) is sometimes referred to as a *simple* event, and an event consisting of more than one outcome is sometimes referred to as a *compound* event.
- An event is said to have *happened* or *occurred* if the outcome of the experiment is an element of the event.

**Example.**
Sets such as $\{1\}$ and $\{2, 4, 6\}$ are *events* from rolling a six-faced dice, while a set such as $\{7\}$ is not, since $7$ is not a possible outcome.

**Definition.**
(Probability space)
A *probability space* is a mathematical triplet $(\Omega, \mathcal{F}, \Pr)$ consisting of the sample space $\Omega$, the event space $\mathcal{F}$, and a probability function $\Pr$.

**Remark.**

- There are multiple ways to define the probability function, as we will see in the following sections; among those definitions, the *axiomatic* definition is the most used, and the most general.
- The probability function is sometimes denoted by $P$ or $\mathbb{P}$ instead.

- The notation $\Pr$ is mainly used in this book to distinguish the probability function from other functions named $P$ or $p$.

- A probability space is arbitrary, in the sense that its author ultimately defines which elements , , and will contain.
- The probability function may present a model for a particular class of real-world situations.
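As an illustrative sketch of these definitions (the names and the fair-die model are assumptions, not from the text), a finite probability space can be represented in code, with the probability function summing outcome probabilities over an event:

```python
from fractions import Fraction

# sketch of a finite probability space for a fair six-faced die
omega = {1, 2, 3, 4, 5, 6}                       # sample space
outcome_pr = {w: Fraction(1, 6) for w in omega}  # probability of each outcome

def pr(event):
    """Probability function: sums outcome probabilities over the event."""
    assert event <= omega, "an event must be a subset of the sample space"
    return sum(outcome_pr[w] for w in event)

print(pr({2, 4, 6}))  # 1/2
```

The event space here is implicitly the power set of `omega`, matching the remark above that $\mathcal{F} = \mathcal{P}(\Omega)$.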

## Terminologies

Terminology for sets from set theory also applies to events, since an event is essentially a set. Apart from that terminology, we also have the following extra terminology for events.

**Definition.**
(Exhaustive)
Events $A_1, A_2, \dots, A_n$ are *exhaustive* if $A_1 \cup A_2 \cup \dots \cup A_n = \Omega$.

**Example.**
When we are rolling a six-faced dice, and we are considering the number coming up as the outcome,
events such as $\{1, 2, 3\}$ and $\{3, 4, 5, 6\}$ are exhaustive,
while events such as $\{1, 2\}$ and $\{3, 4, 5\}$ are not exhaustive.

**Definition.**
(Partition)
A group of events is a *partition* of $\Omega$ if the events are *both* disjoint and exhaustive.

**Example.**
When we are rolling a six-faced dice, and we are considering the number coming up as the outcome,
the group of events $\{1, 2, 3\}$ and $\{4, 5, 6\}$ is a partition, while
the group of events $\{1, 2, 3\}$ and $\{3, 4, 5, 6\}$ is not a partition, since these events are not disjoint.
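These two definitions can be checked mechanically. Below is a minimal sketch (function names and example sets are illustrative) that tests whether a group of events is exhaustive and whether it is a partition:

```python
from itertools import combinations

def is_exhaustive(events, omega):
    """Exhaustive: the union of the events covers the whole sample space."""
    return set().union(*events) == omega

def is_partition(events, omega):
    """Partition: the events are pairwise disjoint AND exhaustive."""
    pairwise_disjoint = all(a.isdisjoint(b) for a, b in combinations(events, 2))
    return pairwise_disjoint and is_exhaustive(events, omega)

omega = {1, 2, 3, 4, 5, 6}
print(is_partition([{1, 2, 3}, {4, 5, 6}], omega))     # True
print(is_partition([{1, 2, 3}, {3, 4, 5, 6}], omega))  # False: not disjoint
```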

## Probability definition

The remaining undefined item in the probability space is the *probability function* $\Pr$, and we will give various definitions of it,
of which the *combinatorial* (or classical) and *axiomatic* definitions are the most important.

**Definition.**
(Subjective probability)
The *probability* of an event is a measure of the *chance*
with which we can expect the event to occur.
We assign a number between 0 and 1 inclusive to the probability of an event.
A probability of 1 means that we are certain the event will occur,
and a probability of 0 means that we are certain the event will *not* occur.

**Example.**
Amy and Bob assess their probabilities of winning the top prize in a lucky draw using the subjective probability approach.

- Amy thinks that she is lucky, and thus assigns 0.7 to the probability of winning the top prize.
- Bob thinks that he is unlucky, and thus assigns 0.1 to the probability of winning the top prize.

**Remark.**

- This illustrates a major problem of subjective probability: the probability assigned to an event is often not unique, since different people hold different opinions.

**Definition.**
(Combinatorial probability)
Assume all outcomes in the sample space are *equally likely*.
Then, the (combinatorial) probability of an event $E$ in the sample space $\Omega$ is $\Pr(E) = \dfrac{|E|}{|\Omega|}$.

**Remark.**

- It is also called *classical* probability.
- If the outcomes are *not* equally likely, we cannot apply this definition.
- By the *principle of indifference* (or insufficient reason), unless there exists evidence showing that the outcomes are *not* equally likely^{[2]}, we should *assume* that the outcomes are equally likely.
- When the sample space contains infinitely many outcomes, the combinatorial probability is undefined.
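A minimal sketch of this definition (the helper name is our own choice); `Fraction` keeps the ratio $|E|/|\Omega|$ exact:

```python
from fractions import Fraction

def combinatorial_probability(event, omega):
    """Pr(E) = |E| / |Omega|, valid only when all outcomes are equally likely."""
    if not event <= omega:
        raise ValueError("event must be a subset of the sample space")
    return Fraction(len(event), len(omega))

omega = {1, 2, 3, 4, 5, 6}
print(combinatorial_probability({2, 4, 6}, omega))  # 1/2
```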

**Calculation of the probability for not equally likely outcomes.**
When the outcomes are not all equally likely, let $\omega_1, \omega_2, \dots, \omega_n$ be the outcomes of a sample space $\Omega$, and let $p_i = \Pr(\{\omega_i\})$. In this case, the probability of each outcome, $p_i$, is assumed to be known. Then, list all the outcomes in the event of interest $E$, say $\omega_{i_1}, \dots, \omega_{i_k}$, and we have $\Pr(E) = p_{i_1} + p_{i_2} + \dots + p_{i_k}$. However, the probability of each of the not equally likely outcomes is often unknown. If this is the case, the above method does not work, and we cannot apply combinatorial probability in this context. So, the combinatorial probability definition does not work well when we are encountering outcomes that are not equally likely.
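The method above can be sketched as follows, assuming a hypothetical loaded die whose outcome probabilities $p_i$ are known:

```python
def event_probability(pmf, event):
    """Pr(E): sum the known outcome probabilities p_i over the outcomes in E."""
    return sum(pmf[w] for w in event)

# hypothetical loaded die: 6 is three times as likely as each other face
pmf = {1: 1/8, 2: 1/8, 3: 1/8, 4: 1/8, 5: 1/8, 6: 3/8}
print(event_probability(pmf, {5, 6}))  # 1/8 + 3/8 = 0.5
```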

**Example.**
The probability of getting the number 1 coming up (on at least one die) from rolling a fair red six-faced dice and a fair blue six-faced dice is $11/36$.

**Proof.**
The number of pairs of numbers coming up for the two dice is $6 \times 6 = 36$.
Since the dice are fair, the 36 outcomes are equally likely, and so we can apply combinatorial probability here. Among the 36 pairs, $6 + 6 - 1 = 11$ contain the number 1 (six pairs where the red dice shows 1, six where the blue dice shows 1, minus the pair $(1, 1)$ counted twice), so the probability is $11/36$.
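We can confirm the count by brute force, assuming the event of interest is that the number 1 appears on at least one of the two dice:

```python
from itertools import product

# all 36 equally likely (red, blue) pairs
omega = list(product(range(1, 7), repeat=2))
# event: the number 1 comes up on at least one die (assumed interpretation)
event = [pair for pair in omega if 1 in pair]
print(len(event), "/", len(omega))  # 11 / 36
```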

**Example.**
(Capture-mark-recapture)
We are fishing in a lake containing $N$ fishes.
First, we catch $M$ fishes from the lake (capture), and give them each a marker (mark).
Then, we catch fishes from the lake again (recapture), and catch $n$ fishes (of which $m$ are marked and $n - m$ are unmarked) this time.
The probability that there are $m$ marked fishes among the $n$ fishes is
$$\Pr(\text{$m$ marked fishes}) = \frac{\binom{M}{m}\binom{N - M}{n - m}}{\binom{N}{n}}.$$

**Proof.**
We order the fishes in the lake notionally (e.g. by assigning them different number one by one), so that they are now distinguishable (notionally), then, we have:

- : the number of outcomes of catching fishes from fishes;
- : the number of outcomes of catching marked fishes from marked fishes in the recapture process;
- : the number of outcomes of catching unmarked fishes from unmarked fishes in the recapture process (this ensure that we
*only*catch marked fishes, by ensuring that the remaining caught fishes do not contain any marked fish).
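This capture-mark-recapture probability can be computed directly with `math.comb`; the numbers below ($N = 50$, $M = 10$, $n = 5$, $m = 1$) are illustrative, not from the text:

```python
from math import comb

def recapture_probability(N, M, n, m):
    """Pr(exactly m marked fish among the n recaptured), by the formula above."""
    return comb(M, m) * comb(N - M, n - m) / comb(N, n)

p = recapture_probability(50, 10, 5, 1)  # illustrative numbers
print(round(p, 4))  # 0.4313

# sanity check: the probabilities over all possible m sum to 1
total = sum(recapture_probability(50, 10, 5, m) for m in range(6))
print(round(total, 10))  # 1.0
```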

**Definition.**
(Frequentist probability)
The *probability* of an event or outcome is the long-term proportion of times the event would occur if the experiment was repeated independently many times.
That is, letting $n_E$ be the number of times that event $E$ occurs in $n$ repetitions of the experiment, the probability of $E$ is
$$\Pr(E) = \lim_{n \to \infty} \frac{n_E}{n}.$$

**Remark.**

- When the number of repetitions $n$ is large enough, the ratio $n_E / n$ of the number of times that event $E$ occurs in these repetitions to the number of repetitions can be used to *approximate* $\Pr(E)$.
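A simulation sketch of this approximation for a fair coin (the seed and trial counts are arbitrary choices):

```python
import random

random.seed(0)  # fixed seed so the illustration is reproducible

def relative_frequency(trials):
    """Approximate Pr(heads) for a fair coin by the proportion of heads."""
    heads = sum(random.random() < 0.5 for _ in range(trials))
    return heads / trials

print(relative_frequency(100))     # a rough estimate
print(relative_frequency(100000))  # typically much closer to 0.5
```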

**Example.**
Suppose we throw a coin 1 million times (i.e. 1000000 times).
The number of times heads comes up is 700102, the number of times tails comes up is 299896,
and the number of times that the coin lands on its edge is 2.

Then, the probability that heads comes up is close to $700102 / 1000000 \approx 0.7$.

After that, we may think that the coin is unfair ^{[3]}.

**Definition.**
(Axiomatic probability)
A *probability* is a *set function* $\Pr$ defined on the event space $\mathcal{F}$. It assigns a real value $\Pr(E)$ to each event $E$, with the following *probability axioms* satisfied:

- (P1) for each event $E$, $\Pr(E) \ge 0$ (nonnegativity);
- (P2) $\Pr(\Omega) = 1$ (unitarity);
- (P3) for each (countably) infinite sequence of mutually exclusive (or disjoint) events $A_1, A_2, \dots$, $\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i)$ (countable additivity).

**Example.**
Based on the probability axioms, the probability of an event cannot be $-0.1$, since that would violate nonnegativity (P1).

**Example.**
(Combinatorial probability is probability)
Combinatorial probability is a probability since it satisfies all three probability axioms.

**Proof.**

- (P1) It follows from observing that the number of outcomes in an event is nonnegative, so $\Pr(E) = |E| / |\Omega| \ge 0$;
- (P2) It follows from $\Pr(\Omega) = |\Omega| / |\Omega| = 1$;
- (P3) It follows from observing that the number of outcomes in a union of (infinitely many) disjoint sets is the same as the sum of the numbers of outcomes in each of the disjoint sets (this can be seen, non-rigorously, through a Venn diagram).
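For a finite sample space, the axioms can be verified exhaustively for the combinatorial probability $\Pr(E) = |E|/|\Omega|$; this sketch (function names are ours) checks P1, P2 and additivity over disjoint pairs, a finite stand-in for P3:

```python
from fractions import Fraction
from itertools import combinations

def powerset(omega):
    """All events: the power set of the sample space."""
    s = list(omega)
    return [frozenset(c) for r in range(len(s) + 1)
            for c in combinations(s, r)]

def satisfies_axioms(omega):
    """Check P1, P2 and pairwise additivity for Pr(E) = |E| / |Omega|."""
    def pr(event):
        return Fraction(len(event), len(omega))
    events = powerset(omega)
    p1 = all(pr(e) >= 0 for e in events)    # nonnegativity
    p2 = pr(frozenset(omega)) == 1          # unitarity
    p3 = all(pr(a | b) == pr(a) + pr(b)     # additivity on disjoint pairs
             for a, b in combinations(events, 2) if not a & b)
    return p1 and p2 and p3

print(satisfies_axioms({1, 2, 3, 4}))  # True
```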

With these three axioms only, we can prove many well-known properties of probability.

## Properties of probability

### Basic properties of probability

**Proposition.**
(Probability of empty set)
$\Pr(\varnothing) = 0$.

**Proof.**
Let $A_i = \varnothing$ for each positive integer $i$.
The events $A_1, A_2, \dots$ are mutually exclusive, since they are all empty sets, and so the intersection of each two of them is also the empty set. Also, $\varnothing = \bigcup_{i=1}^{\infty} A_i$. So, by P3, $\Pr(\varnothing) = \sum_{i=1}^{\infty} \Pr(\varnothing)$, which can hold only if $\Pr(\varnothing) = 0$ (a sum of infinitely many copies of any positive number diverges).

**Proposition.**
(Extended P3)
The property of probability in the third axiom of probability (P3) is also valid for a *finite* sequence of events.

**Proof.**
For each positive integer $n$,
suppose that $A_1, A_2, \dots, A_n$ are disjoint events, and append to these the infinite sequence of events $A_{n+1} = A_{n+2} = \dots = \varnothing$. By P3,
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(A_i) = \sum_{i=1}^{n} \Pr(A_i),$$
since $\Pr(\varnothing) = 0$.

**Proposition.**
(Simplified law of total probability)
For each pair of events $A$ and $B$, $\Pr(B) = \Pr(B \cap A) + \Pr(B \setminus A)$.

**Proof.**
Since $B = (B \cap A) \cup (B \setminus A)$ and the events $B \cap A$ and $B \setminus A$ are disjoint, by ext. P3 ^{[4]},
$$\Pr(B) = \Pr(B \cap A) + \Pr(B \setminus A).$$

Illustration of simplified law of total probability:

|---------| | B\A | <----- B | |----|-----| | |BnA | | |----|----| | <---- A |----------|

**Proposition.**
(Simplified inclusion-exclusion principle)
For each pair of events $A$ and $B$,
$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.

**Proof.**
Since the events $A \setminus B$ and $B$ are disjoint and $A \cup B = (A \setminus B) \cup B$, by extended P3 and the simplified law of total probability,
$$\Pr(A \cup B) = \Pr(A \setminus B) + \Pr(B) = \Pr(A) - \Pr(A \cap B) + \Pr(B).$$

Illustration of simplified inclusion-exclusion principle:

|---------| | | <----- B | II |----|-----| | |AnB | | |----|----| I | <---- A |----------|

**Proposition.**
(Complement rule)
For each event $E$, $\Pr(E^c) = 1 - \Pr(E)$.

**Proof.**
Since $E$ and $E^c$ are disjoint and $E \cup E^c = \Omega$, by extended P3 and P2, $\Pr(E) + \Pr(E^c) = \Pr(\Omega) = 1$, and so $\Pr(E^c) = 1 - \Pr(E)$.
Illustration of complement rule:

|---------------| | | | E^c | <--- Omega (Pr(Omega)=1) | |---| | | | E | | | |---| | |---------------|

**Proposition.**
(Numeric bound for probability)
For each event $E$, $0 \le \Pr(E) \le 1$.

**Proof.**
By P1, $\Pr(E) \ge 0$ and $\Pr(E^c) \ge 0$.
So, by the complement rule, $\Pr(E) = 1 - \Pr(E^c) \le 1$.

**Proposition.**
(Monotonicity)
If $A \subseteq B$, then $\Pr(A) \le \Pr(B)$.

**Proof.**
By the simplified law of total probability, $\Pr(B) = \Pr(B \cap A) + \Pr(B \setminus A) = \Pr(A) + \Pr(B \setminus A) \ge \Pr(A)$, since $B \cap A = A$ when $A \subseteq B$, and $\Pr(B \setminus A) \ge 0$ by P1.

**Example.**
The probability of winning the championship in a competition is less than or equal to that of entering the final of the competition, by monotonicity.

**Proof.**
Let $W$ and $F$ be the events of winning the championship in the competition and entering the final of the competition respectively.
Then $W \subseteq F$, since when we win the championship we must have entered the final,
and so $\Pr(W) \le \Pr(F)$ by monotonicity.

### More advanced properties of probability

**Proposition.**
(Inclusion-exclusion principle)
For each positive integer $n$ and events $A_1, A_2, \dots, A_n$,
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{i=1}^{n} \Pr(A_i) - \sum_{1 \le i < j \le n} \Pr(A_i \cap A_j) + \sum_{1 \le i < j < k \le n} \Pr(A_i \cap A_j \cap A_k) - \dots + (-1)^{n+1} \Pr(A_1 \cap A_2 \cap \dots \cap A_n).$$

**Proof.**
We can prove this by induction on $n$.

Recall the simplified inclusion-exclusion principle, which is essentially the inclusion-exclusion principle when $n = 2$. So, we know that the inclusion-exclusion principle is true for $n = 2$, and it remains to prove the case with larger $n$.

The idea of the induction is illustrated as follows: by the simplified inclusion-exclusion principle,
$$\Pr\left(\bigcup_{i=1}^{n+1} A_i\right) = \Pr\left(\bigcup_{i=1}^{n} A_i\right) + \Pr(A_{n+1}) - \Pr\left(\left(\bigcup_{i=1}^{n} A_i\right) \cap A_{n+1}\right),$$
and applying the induction hypothesis to $\Pr\left(\bigcup_{i=1}^{n} A_i\right)$ and to $\Pr\left(\bigcup_{i=1}^{n} (A_i \cap A_{n+1})\right)$ yields the formula for $n + 1$ events.

**Remark.**

- We can write the inclusion-exclusion principle more compactly as follows:
$$\Pr\left(\bigcup_{i=1}^{n} A_i\right) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \dots < i_k \le n} \Pr(A_{i_1} \cap \dots \cap A_{i_k}).$$

- An alternative and more elegant proof is provided in the chapter about properties of distributions.
- For the intersections of events, each possible distinct combination of events is involved.

**Example.**
When $n = 3$, for each event $A_1$, $A_2$ and $A_3$,
$$\Pr(A_1 \cup A_2 \cup A_3) = \Pr(A_1) + \Pr(A_2) + \Pr(A_3) - \Pr(A_1 \cap A_2) - \Pr(A_1 \cap A_3) - \Pr(A_2 \cap A_3) + \Pr(A_1 \cap A_2 \cap A_3).$$

**Example.**
We select a student from some university students. It is given that

- the selected student has a major in mathematics with a probability 0.4;
- the selected student has a major in statistics with a probability 0.55;
- the selected student has a major in accounting with a probability 0.3;
- the selected student has a major in statistics and accounting with a probability 0.2;
- the selected student has a major in accounting and mathematics with a probability 0.15;
- the selected student has a major in mathematics and statistics with a probability 0.2;
- the selected student has a major in mathematics, statistics and accounting with a probability 0.1.

Then, the probability that the selected student does not have any of these majors is $0.2$.

**Proof.**
Let $M$, $S$ and $A$ be the events that the selected student has a major in mathematics, statistics and accounting respectively.
Then, by the inclusion-exclusion principle,
$$\Pr(M \cup S \cup A) = 0.4 + 0.55 + 0.3 - 0.2 - 0.15 - 0.2 + 0.1 = 0.8.$$

|-------------| <--------- A | | | |----|----| | | | | | 0.05 |0.05|0.15| <---- M | | | | |--------|----|----|------| | |0.1 |0.1 | | | 0.1 | | | 0.25 | <---- S | |----|----| | |-------------|-----------|

We can see from this diagram that $\Pr(M \cup S \cup A) = 0.05 + 0.05 + 0.15 + 0.1 + 0.1 + 0.1 + 0.25 = 0.8$, and thus the desired probability is $1 - 0.8 = 0.2$.

**The steps for constructing the above Venn diagram.**

1. Given: $\Pr(M \cap S \cap A) = 0.1$, which fills the central region.
2. Calculate $\Pr(A \cap M) - \Pr(M \cap S \cap A) = 0.15 - 0.1 = 0.05$ for the region in $M \cap A$ but outside $S$.
3. Calculate $\Pr(M \cap S) - \Pr(M \cap S \cap A) = 0.2 - 0.1 = 0.1$ for the region in $M \cap S$ but outside $A$.
4. Calculate $\Pr(S \cap A) - \Pr(M \cap S \cap A) = 0.2 - 0.1 = 0.1$ for the region in $S \cap A$ but outside $M$.
5. Calculate $\Pr(M) - 0.05 - 0.1 - 0.1 = 0.4 - 0.25 = 0.15$ for the region in $M$ only.
6. Calculate $\Pr(A) - 0.05 - 0.1 - 0.1 = 0.3 - 0.25 = 0.05$ for the region in $A$ only.
7. Calculate $\Pr(S) - 0.1 - 0.1 - 0.1 = 0.55 - 0.3 = 0.25$ for the region in $S$ only.
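The inclusion-exclusion calculation in this example can be checked numerically (the variable names are ours):

```python
# probabilities given in the example
p_m, p_s, p_a = 0.4, 0.55, 0.3     # single majors
p_sa, p_am, p_ms = 0.2, 0.15, 0.2  # pairwise intersections
p_msa = 0.1                        # triple intersection

# inclusion-exclusion for three events
p_union = p_m + p_s + p_a - p_sa - p_am - p_ms + p_msa
p_none = 1 - p_union               # complement rule
print(round(p_union, 2), round(p_none, 2))  # 0.8 0.2
```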

**Lemma.**
For each (countably) infinite sequence of events $A_1, A_2, \dots$,
$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} \Pr\left(\bigcup_{i=1}^{n} A_i\right).$$

**Proof.**
Define the disjoint events $B_1 = A_1$ and $B_i = A_i \setminus (A_1 \cup \dots \cup A_{i-1})$ for each $i \ge 2$. Then $\bigcup_{i=1}^{n} A_i = \bigcup_{i=1}^{n} B_i$ for each positive integer $n$, and also $\bigcup_{i=1}^{\infty} A_i = \bigcup_{i=1}^{\infty} B_i$. So, by P3 and extended P3,
$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \sum_{i=1}^{\infty} \Pr(B_i) = \lim_{n \to \infty} \sum_{i=1}^{n} \Pr(B_i) = \lim_{n \to \infty} \Pr\left(\bigcup_{i=1}^{n} B_i\right) = \lim_{n \to \infty} \Pr\left(\bigcup_{i=1}^{n} A_i\right).$$
**Proposition.**
(Boole's inequality)
For each (countably) infinite sequence of events $A_1, A_2, \dots$,
$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) \le \sum_{i=1}^{\infty} \Pr(A_i).$$

**Proof.**
First, by the inclusion-exclusion principle, for each pair of events $A$ and $B$,
$$\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \le \Pr(A) + \Pr(B).$$

So, by induction, for each positive integer $n$, $\Pr\left(\bigcup_{i=1}^{n} A_i\right) \le \sum_{i=1}^{n} \Pr(A_i)$.

Using the lemma,
$$\Pr\left(\bigcup_{i=1}^{\infty} A_i\right) = \lim_{n \to \infty} \Pr\left(\bigcup_{i=1}^{n} A_i\right) \le \lim_{n \to \infty} \sum_{i=1}^{n} \Pr(A_i) = \sum_{i=1}^{\infty} \Pr(A_i).$$
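A finite illustration of Boole's inequality under the combinatorial probability of a die (the events chosen are arbitrary illustrative sets):

```python
from fractions import Fraction

omega = {1, 2, 3, 4, 5, 6}

def pr(event):
    """Combinatorial probability |E| / |Omega|."""
    return Fraction(len(event), len(omega))

# overlapping events, so the union's probability is strictly below the sum
events = [{1, 2}, {2, 3}, {3, 4}]
union = set().union(*events)
print(pr(union), "<=", sum(pr(e) for e in events))  # 2/3 <= 1
```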

- ↑ e.g. the sample space of throwing a dice may include the six numbers, or may only include two outcomes: odd number and even number
- ↑ e.g. it is given that a coin is biased, such that it is more likely that head comes up
- ↑ However, it is still *possible* that the coin is fair.
- ↑ ext. stands for 'extended'