Probability/Probability Spaces


Concept

We now develop a more axiomatic theory of probability, which allows for a simpler mathematical formalism. We do so through the concept of a probability space, which lets us draw on many theorems from mathematical analysis.

Recall that an experiment is any action or process with an outcome that is subject to uncertainty or randomness. A probability space or a probability triple is a mathematical construct that models an experiment and its set of possible outcomes.

Probability space

Before defining probability space, we define several terms used in its definition.

Definition. (Sample space) The sample space, denoted by $\Omega$, is the non-empty set whose elements are all possible outcomes of an experiment.

Remark.

  • The sample space is often not unique, since there are often multiple ways to define the possible outcomes of an experiment, possibly because of the difference in expression [1].
  • An outcome from the experiment is commonly denoted by $\omega$ (the lowercase form of $\Omega$, omega).

Example. A sample space of the numbers coming up from rolling a six-faced dice is $\Omega = \{1, 2, 3, 4, 5, 6\}$.

Definition. (Event) An event is a subset of the sample space.

Remark.

  • It follows that the event space $\mathcal{F}$, which is the set (or family) of all events, is the power set of the sample space, i.e. $\mathcal{F} = \mathcal{P}(\Omega)$.
  • An event consisting of a single outcome (a singleton) is sometimes referred to as a simple event, and an event consisting of more than one outcome is sometimes referred to as a compound event.
  • An event is said to have happened or occurred if the outcome of the experiment is an element of the event.
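
The definitions above can be sketched in a few lines of Python. This is only an illustration under assumed names: `omega`, `power_set`, `occurred` and the event `even` are hypothetical choices, not notation from this book. It builds the event space of a six-faced dice as the power set of the sample space and checks whether an event occurred.

```python
from itertools import chain, combinations

# Sample space for one roll of a six-faced dice (an illustrative choice).
omega = frozenset(range(1, 7))

def power_set(sample_space):
    """Return every subset of sample_space, i.e. every possible event."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return {frozenset(subset) for subset in subsets}

def occurred(event, outcome):
    """An event occurs when the observed outcome is one of its elements."""
    return outcome in event

event_space = power_set(omega)
even = frozenset({2, 4, 6})   # a compound event (hypothetical example)

print(len(event_space))       # number of events, 2**6
print(even in event_space)    # a subset of omega is an event
print(occurred(even, 4))      # outcome 4 is an element of the event
```

Representing events as `frozenset` objects makes them hashable, so the event space itself can be stored as a set of sets.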

Example. Sets   and   are events from rolling a six-faced dice, while the set   is not.

Definition. (Probability space) A probability space is a mathematical triplet $(\Omega, \mathcal{F}, \Pr)$ consisting of the sample space $\Omega$, event space $\mathcal{F}$, and a probability function $\Pr$.

Remark.

  • There are multiple ways to define the probability function, as we will see in the following sections. Among those definitions, the axiomatic definition is the most general and the most widely used.
  • The probability function is sometimes denoted by $\mathbb{P}$, $P$ or $p$ instead.
  • The notation $\Pr$ is mainly used in this book to distinguish the probability function from other functions named $P$ or $p$.
  • A probability space is arbitrary, in the sense that its author ultimately defines which elements $\Omega$, $\mathcal{F}$, and $\Pr$ will contain.
  • The probability function $\Pr$ may present a model for a particular class of real-world situations.

Terminologies

The terminology of sets from set theory also applies to events, since an event is essentially a set. In addition, we use the following terms for events.

Definition. (Exhaustive) Events $E_1, E_2, \dots$ are exhaustive if $E_1 \cup E_2 \cup \dots = \Omega$.

Example. When we are rolling a six-faced dice, and we are considering the number coming up as the outcome, the events   and   are exhaustive, while the events   and   are not exhaustive.

Definition. (Partition) A group of events $E_1, E_2, \dots$ is a partition of $\Omega$ if the events are both (pairwise) disjoint and exhaustive.

Example. When we are rolling a six-faced dice, and we are considering the number coming up as the outcome, the group of events   and   is a partition, while the group of events   and   is not a partition, since these events are not disjoint.
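
As a rough sketch of these two definitions, the following Python helpers test whether a group of events is exhaustive or forms a partition. The events `odd`, `even`, `low` and `high` are hypothetical examples chosen here for illustration; they are not necessarily the events used in the examples above.

```python
from itertools import combinations

# Sample space for one roll of a six-faced dice.
omega = frozenset(range(1, 7))

def is_exhaustive(events, sample_space):
    """Events are exhaustive if their union is the whole sample space."""
    return frozenset().union(*events) == sample_space

def is_pairwise_disjoint(events):
    """No two distinct events in the group share an outcome."""
    return all(a.isdisjoint(b) for a, b in combinations(events, 2))

def is_partition(events, sample_space):
    """A partition is a group of events that is disjoint and exhaustive."""
    return is_pairwise_disjoint(events) and is_exhaustive(events, sample_space)

# Hypothetical events, chosen only for illustration.
odd, even = frozenset({1, 3, 5}), frozenset({2, 4, 6})
low, high = frozenset({1, 2, 3, 4}), frozenset({3, 4, 5, 6})

print(is_partition([odd, even], omega))   # disjoint and exhaustive
print(is_exhaustive([low, high], omega))  # exhaustive ...
print(is_partition([low, high], omega))   # ... but the events overlap
```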

Probability definition

The remaining undefined item in the probability space is the probability function $\Pr$. We will give several definitions of it, of which the combinatorial (or classical) and axiomatic definitions are the most important.

Definition. (Subjective probability) The probability of an event is a measure of the chance with which we can expect the event to occur. We assign a number between 0 and 1 inclusive to the probability of an event. A probability of 1 means that we are certain the event will occur, and a probability of 0 means that we are certain the event will not occur.

Example. Amy and Bob assess their probabilities of winning the top prize from a lucky draw using the subjective probability approach.

  • Amy thinks that she is lucky, and thus assigns 0.7 to the probability of winning the top prize.
  • Bob thinks that he is unlucky, and thus assigns 0.1 to the probability of winning the top prize.

Remark.

  • This illustrates a major problem of subjective probability, namely that the probability assigned to an event is often not unique, because different people hold different opinions.

Definition. (Combinatorial probability) Assume all outcomes in the sample space $\Omega$ are equally likely. Then, the (combinatorial) probability of an event (say $E$) in the sample space is $\Pr(E) = \frac{\text{number of outcomes in } E}{\text{number of outcomes in } \Omega}$.

Remark.

  • It is also called classical probability.
  • If the outcomes are not equally likely, we cannot apply this definition.
  • By the principle of indifference (or insufficient reason), unless there is evidence showing that the outcomes are not equally likely [2], we should assume that the outcomes are equally likely.
  • When the sample space contains infinitely many outcomes, the combinatorial probability is undefined.

Example. The probability that the number 1 comes up on both dice when rolling a fair red six-faced dice and a fair blue six-faced dice is $\frac{1}{36}$.

Proof. The number of pairs of numbers that can come up on the two dice is $6 \times 6 = 36$. Since the dice are fair, the 36 outcomes are equally likely, and so we can apply combinatorial probability here. Exactly one of these outcomes, namely $(1, 1)$, belongs to the event, so its probability is $\frac{1}{36}$.
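
The counting argument can be checked by enumerating the sample space. The sketch below assumes that the event of interest is "both dice show 1" and represents each outcome as a (red, blue) pair; the function name `combinatorial_probability` is a hypothetical helper, not part of any library.

```python
from fractions import Fraction
from itertools import product

faces = range(1, 7)
# Each outcome is an ordered (red, blue) pair, so there are 6 * 6 outcomes.
sample_space = set(product(faces, faces))

def combinatorial_probability(event, sample_space):
    """Ratio of outcomes in the event to outcomes in the sample space.

    Only valid when all outcomes are equally likely and the sample
    space is finite.
    """
    return Fraction(len(event & sample_space), len(sample_space))

both_ones = {(1, 1)}  # assumed event: red shows 1 and blue shows 1
print(len(sample_space))                                   # 36
print(combinatorial_probability(both_ones, sample_space))  # 1/36
```

Using `Fraction` keeps the result exact instead of introducing floating-point rounding.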

 

Exercise.

Suppose the blue dice is colored red. Calculate the probability again.

1/36
1/21
1/18
1/15
1/6



Example. (Capture-mark-recapture) We are fishing in a lake containing $N$ fish. First, we catch $M$ fish from the lake (capture) and give each of them a marker (mark). Then, we catch fish from the lake again (recapture), and catch $n$ fish this time (with $n \le N$). The probability that there are $k$ marked fish among the $n$ fish is $\binom{M}{k}\binom{N-M}{n-k} \big/ \binom{N}{n}$.

Proof. We order the $N$ fish in the lake notionally (e.g. by assigning them different numbers one by one), so that they are now (notionally) distinguishable. Then, we have:

  • $\binom{N}{n}$: the number of outcomes of catching $n$ fish from $N$ fish;
  • $\binom{M}{k}$: the number of outcomes of catching $k$ marked fish from $M$ marked fish in the recapture process;
  • $\binom{N-M}{n-k}$: the number of outcomes of catching $n-k$ unmarked fish from $N-M$ unmarked fish in the recapture process (this ensures that we catch only $k$ marked fish, since the remaining caught fish do not include any marked fish).
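
The three counts above combine into a single ratio that can be evaluated directly. The following is a minimal sketch using `math.comb` from the Python standard library; the parameter names (`total`, `marked`, `caught`, `marked_caught`) and the numbers in the usage lines are assumptions made for illustration only.

```python
from fractions import Fraction
from math import comb

def recapture_probability(total, marked, caught, marked_caught):
    """Probability of finding `marked_caught` marked fish in the recapture.

    total         -- number of fish in the lake
    marked        -- number of fish marked after the first catch
    caught        -- number of fish caught in the recapture
    marked_caught -- number of marked fish among the recaptured fish
    """
    unmarked_caught = caught - marked_caught
    if marked_caught < 0 or unmarked_caught < 0:
        return Fraction(0)  # impossible counts
    # comb(n, k) is 0 when k > n, which covers the other impossible cases.
    favourable = comb(marked, marked_caught) * comb(total - marked, unmarked_caught)
    return Fraction(favourable, comb(total, caught))

# Hypothetical lake: 20 fish, 5 marked, 4 recaptured.
for k in range(5):
    print(k, recapture_probability(20, 5, 4, k))
```

Summing the printed probabilities over all possible values of `k` gives 1, which is a quick sanity check on the formula.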

 


Exercise. There are 9 balls in a box, consisting of 3 red balls, 2 blue balls and 4 green balls.

1 Calculate the probability that a red ball is drawn from the box if 1 ball is drawn from the box.

1/28
3/28
1/9
1/3
None of the above.

2 Calculate the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box.

2/7
5/9
5/7
5/6
None of the above.

3   orange balls are added to the box such that the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box is now  . Calculate  .

2
4
8
16
None of the above.

4 Select the correct (in numerical value sense) expression(s) of the probability that   red balls are drawn and   blue balls are drawn if   balls are drawn from the box ( ,   and   are of values such that all terms in the following are defined).

 
 
 
 
 


Definition. (Frequentist probability) The probability of an event or outcome is the long-term proportion of times the event would occur if the experiment were repeated independently many times. That is, letting $n(E)$ be the number of times that event $E$ occurs in $n$ repetitions of the experiment, the probability of $E$ is $\Pr(E) = \lim_{n \to \infty} \frac{n(E)}{n}$.

Remark.

  • When the number of repetitions is large enough, the ratio of the number of times that event $E$ occurs in these repetitions to the number of repetitions can be used to approximate $\Pr(E)$.

Example. Suppose we throw a coin 1 million times (i.e. 1000000 times). Heads come up 700102 times, tails come up 299896 times, and the coin lands on its edge 2 times.

Then, the probability that heads comes up is close to $\frac{700102}{1000000} = 0.700102$.

After that, we may think that the coin is unfair [3].
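
The frequentist idea can be imitated with a small simulation. The sketch below is only an illustration: it assumes a coin whose true probability of heads is 0.7 (a value picked to resemble the example, not a fact about any real coin) and uses a fixed seed so that the run is repeatable.

```python
import random

def relative_frequency(event_probability, repetitions, seed=0):
    """Simulate independent repetitions and return the observed proportion.

    event_probability -- assumed true probability of the event
    repetitions       -- number of independent repetitions
    seed              -- seed for the pseudo-random generator
    """
    rng = random.Random(seed)
    occurrences = sum(rng.random() < event_probability for _ in range(repetitions))
    return occurrences / repetitions

# The observed proportion tends to settle near 0.7 as repetitions grow.
for repetitions in (100, 10_000, 100_000):
    print(repetitions, relative_frequency(0.7, repetitions))

# The proportion from the counts quoted in the example above.
print(700102 / 1000000)
```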

Definition. (Axiomatic probability) A probability is a set function $\Pr$ defined on the event space $\mathcal{F}$. It assigns a real value $\Pr(E)$ to each event $E$, with the following probability axioms satisfied:

(P1) for each event $E$, $\Pr(E) \ge 0$ (nonnegativity);
(P2) $\Pr(\Omega) = 1$ (unitarity);
(P3) for each (countably) infinite sequence of mutually exclusive (or disjoint) events $E_1, E_2, \dots$, $\Pr\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \Pr(E_i)$ (countable additivity).

Example. Based on the probability axioms, the probability of an event cannot be $-0.1$, since this would violate nonnegativity (P1).

Example. (Combinatorial probability is probability) Combinatorial probability is a probability since it satisfies all three probability axioms.

Proof.

(P1) It follows from observing that the number of outcomes is nonnegative;
(P2) It follows from observing that the event $\Omega$ contains exactly the outcomes in the sample space, so the ratio of the two numbers of outcomes is 1;
(P3) It follows from observing that the number of outcomes in a union of (infinitely many) disjoint sets is the same as the sum of the numbers of outcomes in each of the disjoint sets (possibly through the Venn diagram, non-rigorously).
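
For a small sample space, the axioms can also be checked by brute force. The sketch below does this for one roll of a six-faced dice, under the assumption that all six outcomes are equally likely. It tests additivity only for pairs of disjoint events; in a finite sample space, all but finitely many events of a disjoint sequence must be empty, so this pairwise check is used here as a stand-in for P3 rather than a proof of it.

```python
from fractions import Fraction
from itertools import chain, combinations

omega = frozenset(range(1, 7))  # one roll of a six-faced dice

def pr(event):
    """Combinatorial probability, assuming equally likely outcomes."""
    return Fraction(len(event), len(omega))

def all_events(sample_space):
    """List every subset of the sample space."""
    items = sorted(sample_space)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]

events = all_events(omega)

p1_holds = all(pr(e) >= 0 for e in events)   # nonnegativity
p2_holds = pr(omega) == 1                    # unitarity
additivity_holds = all(                      # additivity for disjoint pairs
    pr(a | b) == pr(a) + pr(b)
    for a in events for b in events if a.isdisjoint(b)
)

print(p1_holds, p2_holds, additivity_holds)
```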

 


With these three axioms only, we can prove many well-known properties of probability.

Properties of probability

Basic properties of probability

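Proposition. (Probability of empty set) $\Pr(\varnothing) = 0$.

Proof. Let $E_i = \varnothing$ for each positive integer $i$. The events $E_1, E_2, \dots$ are mutually exclusive, since they are all empty sets, and the intersection of any two of them is also the empty set. Also, $\bigcup_{i=1}^{\infty} E_i = \varnothing$. So, by P3,

$$\Pr(\varnothing) = \Pr\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \Pr(E_i) = \sum_{i=1}^{\infty} \Pr(\varnothing) \ge \Pr(\varnothing) + \Pr(\varnothing),$$
where the last step keeps only the first two terms of the sum, all of whose terms are nonnegative by P1. Hence $\Pr(\varnothing) \le 0$.
By P1, $\Pr(\varnothing) \ge 0$. It follows from these two inequalities that $\Pr(\varnothing) = 0$.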

 

Proposition. (Extended P3) The property of probability in the third axiom of probability (P3) is also valid for a finite sequence of events.

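Proof. For each positive integer $n$, suppose that $E_1, \dots, E_n$ are disjoint events, and append to these the infinite sequence of events $E_{n+1} = E_{n+2} = \dots = \varnothing$. By P3,

$$\Pr\left(\bigcup_{i=1}^{n} E_i\right) = \Pr\left(\bigcup_{i=1}^{\infty} E_i\right) = \sum_{i=1}^{\infty} \Pr(E_i) = \sum_{i=1}^{n} \Pr(E_i),$$
since $\Pr(\varnothing) = 0$.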

 

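Proposition. (Simplified law of total probability) For each event $A$ and $B$, $\Pr(B) = \Pr(B \cap A) + \Pr(B \setminus A)$.

Proof. The events $B \cap A$ and $B \setminus A$ are disjoint, and their union is $B$. So, by ext. P3 [4],

$$\Pr(B) = \Pr\big((B \cap A) \cup (B \setminus A)\big) = \Pr(B \cap A) + \Pr(B \setminus A).$$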

 

Illustration of simplified law of total probability:

|---------|        
|  B\A    | <----- B
|    |----|-----|
|    |BnA |     |
|----|----|     | <---- A
     |----------|

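Proposition. (Simplified inclusion-exclusion principle) For each event $A$ and $B$, $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B)$.

Proof. Since events $A$ and $B \setminus A$ are disjoint, by extended P3,

$$\Pr(A \cup B) = \Pr\big(A \cup (B \setminus A)\big) = \Pr(A) + \Pr(B \setminus A) = \Pr(A) + \Pr(B) - \Pr(A \cap B),$$
since $\Pr(B \setminus A) = \Pr(B) - \Pr(B \cap A)$ by the simplified law of total probability.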

 

Illustration of simplified inclusion-exclusion principle:

|---------|        
|         | <----- B
| II |----|-----|
|    |AnB |     |
|----|----| I   | <---- A
     |----------|

 

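Proposition. (Complement rule) For each event $E$, $\Pr(E^c) = 1 - \Pr(E)$.

Proof. Since $E$ and $E^c$ are disjoint and $E \cup E^c = \Omega$, by P2 and extended P3,

$$1 = \Pr(\Omega) = \Pr(E \cup E^c) = \Pr(E) + \Pr(E^c), \quad \text{so} \quad \Pr(E^c) = 1 - \Pr(E).$$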

 

Illustration of complement rule:

|---------------|
|               |
|      E^c      | <--- Omega (Pr(Omega)=1)
|    |---|      |
|    | E |      |
|    |---|      |
|---------------|

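Proposition. (Numeric bound for probability) For each event $E$, $0 \le \Pr(E) \le 1$.

Proof. By P1, $\Pr(E) \ge 0$, and $\Pr(E^c) \ge 0$. So, by the complement rule, $\Pr(E) = 1 - \Pr(E^c) \le 1$.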

 

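Proposition. (Monotonicity) If $A \subseteq B$, then $\Pr(A) \le \Pr(B)$.

Proof. By simplified law of total probability,

$$\Pr(B) = \Pr(B \cap A) + \Pr(B \setminus A) = \Pr(A) + \Pr(B \setminus A) \ge \Pr(A),$$
since $B \cap A = A$ when $A \subseteq B$, and $\Pr(B \setminus A) \ge 0$ by P1.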

 

Example. The probability of winning the championship in a competition is less than or equal to that of entering the final of the competition, by monotonicity.

Proof. Let $A$ and $B$ be the events of winning the championship in the competition and of entering the final of the competition respectively. Then, $A \subseteq B$, since every outcome in $A$ is also in $B$ (when we win the championship, we must have entered the final), and so $\Pr(A) \le \Pr(B)$.
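
The basic properties in this section can be spot-checked numerically on a small example. The sketch below assumes one roll of a fair six-faced dice with combinatorial probability, and verifies each property over every event (or pair of events) in the power set. It is a sanity check on one probability space, not a substitute for the proofs; the names `OMEGA`, `pr` and `check_basic_properties` are hypothetical.

```python
from fractions import Fraction
from itertools import chain, combinations

OMEGA = frozenset(range(1, 7))  # one roll of a six-faced dice (illustrative)

def pr(event):
    """Combinatorial probability, assuming equally likely outcomes."""
    return Fraction(len(event), len(OMEGA))

def all_events():
    """List every subset of OMEGA."""
    items = sorted(OMEGA)
    subsets = chain.from_iterable(
        combinations(items, size) for size in range(len(items) + 1)
    )
    return [frozenset(subset) for subset in subsets]

def check_basic_properties():
    """Return, for each property, whether it holds for all events."""
    events = all_events()
    pairs = [(a, b) for a in events for b in events]
    return {
        "empty set": pr(frozenset()) == 0,
        "complement rule": all(pr(OMEGA - e) == 1 - pr(e) for e in events),
        "numeric bound": all(0 <= pr(e) <= 1 for e in events),
        "total probability": all(
            pr(b) == pr(b & a) + pr(b - a) for a, b in pairs
        ),
        "inclusion-exclusion": all(
            pr(a | b) == pr(a) + pr(b) - pr(a & b) for a, b in pairs
        ),
        # a <= b tests whether a is a subset of b
        "monotonicity": all(pr(a) <= pr(b) for a, b in pairs if a <= b),
    }

for name, holds in check_basic_properties().items():
    print(f"{name}: {holds}")
```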

 


Exercise.

Select all correct statement(s). All following capital letters are events.

If  , then   .
 .
  if   and  .
  if   .


More advanced properties of probability

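Theorem. (Inclusion-exclusion principle (probability))

[Figure: illustration of the inclusion-exclusion principle]

For each event $E_1, E_2, \dots, E_n$,

$$\Pr\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{i} \Pr(E_i) - \sum_{i<j} \Pr(E_i \cap E_j) + \sum_{i<j<k} \Pr(E_i \cap E_j \cap E_k) - \dots + (-1)^{n+1} \Pr(E_1 \cap E_2 \cap \dots \cap E_n).$$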

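Proof. We can prove this by induction.

Recall the simplified inclusion-exclusion principle, which is essentially the inclusion-exclusion principle when $n = 2$. So, we know that the inclusion-exclusion principle is true for $n = 2$, and it remains to prove the case with larger $n$.

The idea of the induction is illustrated as follows: by simplified inclusion-exclusion principle,

$$\Pr\left(\bigcup_{i=1}^{n+1} E_i\right) = \Pr\left(\bigcup_{i=1}^{n} E_i\right) + \Pr(E_{n+1}) - \Pr\left(\bigcup_{i=1}^{n} (E_i \cap E_{n+1})\right),$$
and the induction hypothesis can then be applied to each of the two unions of $n$ events on the right-hand side.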

 

Remark.

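  • We can write the inclusion-exclusion principle more compactly as follows:

$$\Pr\left(\bigcup_{i=1}^{n} E_i\right) = \sum_{k=1}^{n} (-1)^{k+1} \sum_{1 \le i_1 < \dots < i_k \le n} \Pr(E_{i_1} \cap \dots \cap E_{i_k}).$$
  • An alternative and more elegant proof is provided in the chapter about properties of distributions.
  • Every possible distinct combination of the events appears among the intersections.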
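
The compact form of the principle (a signed sum over every non-empty combination of the events) translates almost directly into code. The sketch below assumes a finite sample space with equally likely outcomes and compares the signed sum with the probability of the union computed directly. The function names and the three events are hypothetical and serve only as an illustration.

```python
from fractions import Fraction
from itertools import combinations

def pr(event, sample_space):
    """Combinatorial probability, assuming equally likely outcomes."""
    return Fraction(len(event), len(sample_space))

def union_probability(events, sample_space):
    """Probability of the union via the inclusion-exclusion principle."""
    total = Fraction(0)
    for size in range(1, len(events) + 1):
        sign = 1 if size % 2 == 1 else -1        # (-1)**(size + 1)
        for group in combinations(events, size):  # each distinct combination
            intersection = frozenset.intersection(*group)
            total += sign * pr(intersection, sample_space)
    return total

omega = frozenset(range(1, 7))
# Hypothetical events for one roll of a six-faced dice.
a, b, c = frozenset({1, 2, 3}), frozenset({2, 4, 6}), frozenset({3, 4})

print(union_probability([a, b, c], omega))  # via inclusion-exclusion
print(pr(a | b | c, omega))                 # direct computation
```

The number of terms doubles with each additional event, so this direct enumeration is practical only for a small number of events.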

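Example. When $n = 3$, for each event $A$, $B$ and $C$,

$$\Pr(A \cup B \cup C) = \Pr(A) + \Pr(B) + \Pr(C) - \Pr(A \cap B) - \Pr(A \cap C) - \Pr(B \cap C) + \Pr(A \cap B \cap C).$$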

Example. We select a student from some university students. It is given that

  • the selected student has a major in mathematics with a probability 0.4;
  • the selected student has a major in statistics with a probability 0.55;
  • the selected student has a major in accounting with a probability 0.3;
  • the selected student has a major in statistics and accounting with a probability 0.2;
  • the selected student has a major in accounting and mathematics with a probability 0.15;
  • the selected student has a major in mathematics and statistics with a probability 0.2;
  • the selected student has a major in mathematics, statistics and accounting with a probability 0.1.

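Then, the probability that the selected student does not have any of these majors is $0.2$.

Proof. Let $M$, $S$, $A$ be the events that the selected student has a major in mathematics, statistics and accounting respectively. Then, by the inclusion-exclusion principle and the complement rule,

$$\Pr\big((M \cup S \cup A)^c\big) = 1 - \Pr(M \cup S \cup A) = 1 - (0.4 + 0.55 + 0.3 - 0.2 - 0.15 - 0.2 + 0.1) = 1 - 0.8 = 0.2.$$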
Alternatively, we can consider the following Venn diagram:
|-------------| <--------- A
|             |
|        |----|----|
|        |    |    |
| 0.05   |0.05|0.15| <---- M
|        |    |    |
|--------|----|----|------|
|        |0.1 |0.1 |      | 
| 0.1    |    |    | 0.25 | <---- S
|        |----|----|      |
|-------------|-----------|

We can see from this diagram that $\Pr(M \cup S \cup A) = 0.05 + 0.05 + 0.15 + 0.1 + 0.1 + 0.1 + 0.25 = 0.8$, and thus the desired probability is $1 - 0.8 = 0.2$.
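
The arithmetic in this example can be reproduced exactly with fractions, which avoids floating-point rounding. The sketch below uses only the probabilities given in the example; the variable names are hypothetical.

```python
from fractions import Fraction

# Given probabilities (M: mathematics, S: statistics, A: accounting).
pr_m, pr_s, pr_a = Fraction("0.4"), Fraction("0.55"), Fraction("0.3")
pr_s_and_a = Fraction("0.2")
pr_a_and_m = Fraction("0.15")
pr_m_and_s = Fraction("0.2")
pr_all_three = Fraction("0.1")

# Inclusion-exclusion principle for three events.
pr_union = (
    pr_m + pr_s + pr_a
    - pr_s_and_a - pr_a_and_m - pr_m_and_s
    + pr_all_three
)

# Complement rule: probability of having none of the three majors.
pr_none = 1 - pr_union

print(pr_union, pr_none)  # 4/5 1/5
```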

 

Exercise.

1 Calculate the probability that the selected student has at least two of those three majors.

0.1
0.15
0.2
0.25
0.4

2 Calculate the probability that the selected student has one and only one major.

0.3
0.35
0.4
0.45
0.5



Lemma. For each event  ,

 

Proof.

 

 

Proposition. (Boole's inequality) For each event  ,

 

Proof. First, by inclusion-exclusion principle, for each event $A$ and $B$, $\Pr(A \cup B) = \Pr(A) + \Pr(B) - \Pr(A \cap B) \le \Pr(A) + \Pr(B)$.

So,

 

Using the lemma,

 

 


  1. e.g. the sample space of throwing a dice may include the six numbers, or may only include two outcomes: odd number and even number
  2. e.g. it is given that a coin is biased, such that it is more likely that head comes up
  3. However, it is still possible that the coin is fair.
  4. ext. stands for 'extended'