Probability/Probability Spaces

Terminologies edit

The name of this chapter, probability space, is a mathematical construct that models a random experiment. To be more precise:

Definition. (Probability space) A probability space is a mathematical triplet   consisting of three elements: a sample space  , an event space  , and a probability measure  .

Let us first give the definitions related to sample space. For the definitions of event space and probability, we will discuss it in later sections.

Definition. (Sample space and sample point) The sample space of a random experiment, denoted by  , is a nonempty set consisting of all possible and indecomposable outcomes of the random experiment, called sample points.

Remark.

  • A sample point is often denoted by   (lowercase omega), in contrast with the uppercase omega   used for the notation of sample space.
  • By an indecomposable outcome, we mean we are unable to split the outcome into multiple more specific outcomes ("smaller pieces"). So, in other words, indecomposable outcomes mean outcomes that are expressed in a way that is as specific as possible. Furthermore, the random experiment should only result in exactly one of the indecomposable outcomes.
  • For instance, when rolling a dice and consider the number facing up, we may say the outcomes are "odd number" and "even number". But such outcomes are decomposable: we can split "odd number" to three more specific outcomes: 1,3,5, and "even number" to three more specific outcomes: 2,4,6.
  • On the other hand, we may regard the outcomes as the numbers 1,2,3,4,5,6, which are indecomposable, and are expressed in the most specific way.

Example. A sample space of tossing a coin is   ( : heads come up,  : tails come up), and the sample points are  .

Definition. (Event) An event is a subset of the sample space.

Remark.

  • So, an event is an aggregate of sample points.
  • An event is said to be simple event if it consists of exactly one sample point.
  • An event is said to be compound event if it consists of more than one sample point.

Definition. (Occurrence of events) An event occurs if it contains the outcome from the random experiment.

Example. We take the sample space of rolling a dice to be  . Then, the sets   are events, while the set   is not.

Suppose we get the number 3 from rolling the dice. Then,   and   occur, but   does not occur.

Probability interpretations edit

In this chapter, we will discuss probability mathematically, and we will give an axiomatic and abstract definition to probability (function). By axiomatic definition, we mean defining probability to be a function that satisfying some axioms, called probability axioms. But such axiomatic definition does not tell us how should we interpret the term "probability", so the definition is said to be independent from the interpretation of probability. Such independence make the formal definition always applicable, no matter how you interpret probability.

However, the axiomatic definition does not suggest a way to construct a probability measure (i.e., assigning probabilities to events): it just states that probability is a function satisfying certain axioms, but how can we construct such function in the first place? In this section, we will discuss two main types of probability interpretations: subjectivism and frequentism, where the method of assigning probabilities to events is mentioned in each of them.

Subjectivism edit

Intuitively and naturally, probability of an event is often regarded as a numerical measure of the "chance" of the occurrence of the event (that is, how likely the event will occur). So, it is natural for us to assign probability to an event based on our own assessment on the "chance". (In order for the probability to be valid according to the axiomatic definition, the assignment needs to satisfy the probability axioms.) But different people may have different assessment on the "chance", depending on their personal opinions. So, we can see that such interpretation of probability is somewhat subjective, since different people may assign different probabilities to the same event. Hence, we call such probability interpretation as subjectivism (also known as Bayesian probability).

Example. Amy and Bob assign the probabilities of winning the top prize from a lucky draw according to subjectivism:

  • Amy thinks that she is lucky, and thus assign 0.7 to the probability of winning the top prize.
  • Bob thinks that he is unlucky, and thus assign 0.1 to the probability of winning the top prize.

The main issue of the subjectivism is the lack of objectivity, since different probabilities can be assigned to the same event based on personal opinion. Then, we may have difficulties in choosing which of the probabilities should be used for that event. To mitigate the issue of the lack of objectivity, we may adjust our degrees of belief on an event from time to time when there are more observed data through Bayes' theorem, which will be discussed in later chapter, so that the value is assigned in a more objective way. However, even after the adjustment, the assignment of value is still not in an entirely objective way, since the adjusted value (known as posterior probability) still depends on the initial value (known as prior probability), which is assigned subjectively.

Frequentism edit

Another probability interpretation, which is objective, is called frequentism. We denote by   the number of occurrences of an event   in   repetitions of experiment. (An experiment is any action or process with an outcome that is subject to uncertainty or randomness.) Then, we call   as the relative frequency of the event  . Intuitively, we will expect that the relative frequency fluctuates less and less as   gets larger and larger, and approach to a constant limiting value (we call this as limiting relative frequency) as   tends to infinity, i.e., the limiting relative frequency is  . It is thus natural to take the limiting relative frequency as the probability of the event  . This is exactly what the definition of probability in the frequentism. In particular, the existence of such limiting relative frequency is an assumption or axiom in frequentism. (As a side result, when   is large enough, the relative frequency of the event   may be used to approximate the probability of the event  .)

However, an issue of frequentism is that it may be infeasible to conduct experiments many times for some events. Hence, for those events, no probability can be assigned to them, and this is clearly a limitation for frequentism.

Example. Suppose John is taking a course in probability theory, and the course can only be taken once. Consider the event "John passes the course". For this event, it is infeasible to repeat the experiment many times, since John can only take the course once, so there can be only one experiment. (Notice that we cannot use the experiences for other students taking the course as the experiments for this event, since we specify that the person taking the course must be John, but not others.)

Example. Consider a couple, Amy and Bob, and the event "this couple divorce eventually". For this event, it is again infeasible to repeat the experiment many times, since there is only one such pair of couples, and thus there can only be one experiment.

Because of these issues, we will instead use a modern axiomatic and abstract approach to define probability, which is suggested by a Russian mathematician named Andrey Nikolaevich Kolmogorov in 1933. By axiomatic approach, we mean defining probability quite broadly and abstractly as something that satisfy certain axioms (called probability axioms). Such probability axioms are the mathematical foundation and the basis of modern probability theory.

Probability axioms edit

Since we want use the probability measure   to assign probability   to every event   in the sample space, it seems natural for us to set domain of the probability measure   to be the set containing all subsets of  , i.e., the power set of  ,  . Unfortunately, this situation is not that simple, and there are some technical difficulties if we set the domain like this, when the sample space   is uncountable.

Remark.

  • A set is countable, if, intuitively and informally, we can "count" the elements of the set. E.g., every finite set is countable since we can "count" the elements in every finite set one by one. Also,   are countable since we can still "count" the elements in these sets one by one. (A countable infinite set is also called a countably infinite set.)
  • A set is uncountable if it is not countable.
  • Some examples of uncountable sets include   and real intervals   ( ).

This is because the power set of such uncountable sample space includes some "badly behaved" sets, which causes problems when assigning probabilities to them. (Here, we will not discuss those sets and these technical difficulties in details.) Thus, instead of setting the domain of the probability measure to be  , we set the domain to be a  -algebra (sigma-algebra) containing some "sufficiently well-behaved" events:

Definition. ( -algebra) Let   be a set (considered to be the universal set), and   be its power set. A set   is a  -algebra on the set  , if the following three properties are satisfied:

  1.  .
  2. For every set  , if the set  , then its complement  . (closure under complementation)
  3. For every infinite sequence of sets  , if the sets  , then  . (closure under countable unions)

Remark.

  • Elements of the  -algebra are sometimes called measurable sets. Hence, whether a set is measurable or not actually depends on how we construct the  -algebra. However, we usually construct  -algebra to include "well-behaved" sets only, and exclude "badly behaved" sets. So, in this case, only "well-behaved" sets are measurable.
  • A  -algebra on the sample space  , which consists of a family (or set) of events, is also called an event space, denoted by  .
  • We call   as a measurable space. In the context of probability, the measurable space is  .
  • When the property 3 is weakened to the closure under finite unions, that is, "For every finite sequence  , if the sets  , then  .", then we call   as an algebra over the set  . As we will see, the property 3 above implies this weakened property. Thus, a  -algebra on the set   is necessarily an algebra over the set   (but not vice versa).
  • In terms of probabilities, an intuitive meaning of each of the above three properties is
  • property 1: we should be able assign the probability to the entire sample space (this comes from the probability axiom actually);
  • property 2: if we are able to assign probability to an event ("chance" of the occurrence of the event), then we should also be able to assign a probability to its complement ("chance" of the non-occurrence of the event);
  • property 3: if we are able to assign probabilities to the events  , then we should be able to assign a probability to their union ("chance" of the occurrence of one of the events).
  • Actually, from these three properties, we can deduce some similar results for finitely many events and intersections.

Proposition. (Further properties of  -algebra) Let   be a  -algebra of a set  . Then, we have

  1.  .
  2. For every set  , if the sets  , then  . (closure under finite unions)
  3. For every infinite sequence of sets  , if the sets  , then  . (closure under countable intersections)
  4. For every set  , if the sets  , then  . (closure under finite intersections)

Proof.

Property 1: By the closure under complementation, since  , it follows that  .

Property 2: By the closure under countable unions, we have for every infinite sequence of sets  , if the sets  , then  . So, in particular, we can choose the sequence to be   ( ) where   is an arbitrary sequence such that  . Then,

 
Thus, we have the desired result.

Property 3: For every infinite sequence of sets  , by the closure under complementation, we have

 
Then, by the closure under countable unions, we have  . After that, we use the De Morgan's law:
 
Using the closure under complementation property again, we have
 
as desired.

Property 4: The proof is similar to that of property 2, and hence left as an exercise.

 

Remark.

  • From these properties, it follows that (finite/countably infinite) (unions/intersections) of (complements of) (sets in  -algebra) is also in  -algebra. So, even after many operations on sets in  -algebra, the resulting set is also in  -algebra.
 

Exercise. Prove the property 4 in the proposition above.

Proof

Proof. By the closure under countable intersections (property 3 in proposition above), we have for every infinite sequence of sets  , if the sets  , then  . So, in particular, we can choose the sequence to be   ( ) where   is an arbitrary sequence such that  . Then,

 
Thus, we have the desired result.

 


Example. (Smallest and largest  -algebra) Let   be a sample space.

(a) Prove that the event space   (called trivial  -algebra) is a  -algebra on  . This is the smallest  -algebra on  . (To be more precise,   for every  -algebra   on  .)

(b) Prove that the event space   is a  -algebra on  . This is the largest  -algebra on  . (To be more precise,   for every  -algebra   on  .)


Solution.

(a)

Proof. To prove that   is a  -algebra, we need to show that   satisfies the three defining properties.

Property 1: Since  , the property 1 is satisfied.

Property 2 (closure under complementation): Since   contains only   and  , it suffices to show that the complement of each of them also belongs to  :   and  . So, the property 2 is satisfied.

Property 3 (closure under countable unions): Since   contains only   and  , the union of countably infinitely many events (" ") in   (each of them is either   or  ) is either   or  . In either case, the union belongs to  . Thus, the property 3 is satisfied.

 

(b)

Proof.

Property 1: Since  , it follows that  . Thus, the property is satisfied.

Property 2: Assume that  . Then,  . By the definition of complement, we have  . Hence,  , and thus the property is satisfied.

Property 3: For every infinite sequence of sets  , we have  . It follows by the property of union that  , and hence  . Thus, the property is satisfied.

 


 

Exercise.

1 Let the sample space be  . Which of the following is/are  -algebra of  ?

 
 
 
 
 

2 Let the sample space be  . Which of the following is/are  -algebra of  ?

 
 
  (where   and   are sets of all rational numbers and irrational numbers respectively)
 


We have seen two examples of  -algebra in the example above. Often, the "smallest"  -algebra is not chosen to be the domain of the probability measure, since we usually are interested in events other than   and  .

For the "largest"  -algebra, on the other hand, it contains every event, but we may not be interested in some of them. Particularly, we are usually interested in events that are "well-behaved", instead of those "badly behaved" events (indeed, it may be even impossible to assign probabilities to them properly (those events are called nonmeasurable)).

Fortunately, when the sample space   is countable, every set in   is "well-behaved", so we can take this power set to be a  -algebra for the domain of probability measure.

However, when the sample space   is uncountable, even if the power set   is a  -algebra, it contains "too many" events, particularly, it even includes some "badly behaved" events. Therefore, we will not choose such power set to the domain of the probability measure. Instead, we just choose a  -algebra that includes the "well-behaved" events to be the domain, so that we are able to assign probability properly to every event in the  -algebra of the domain. Particularly, those "well-behaved" events are often the events of interest, so all events of interest are contained in that  -algebra, that is, the domain of the probability measure.

To motivate the probability axioms, we consider some properties that the "probability" in frequentism (as a limiting relative frequency) possess:

  1. The limiting relative frequency must be nonnegative. (We call this property as nonnegativity.)
  2. The limiting relative frequency of the whole sample space   (  is also an event) must be 1 (since by definition   contains all sample points, this event must occur in every repetition). (We call this property as unitarity.)
  3. If the events   are pairwise disjoint (i.e.,   for every   with  ), then the limiting relative frequency of the event   (union of subsets of   is a subset of  , so it can be called an event) is

 
which is the sum of the limiting relative frequency of each of the events  . (We call this property as countable additivity.)

It is thus very natural to set the probability axioms to be the three properties mentioned above:

Definition. (Probability measure)

 
An illustration of probability measure.

Let   be the sample space of a random experiment, and   be an associated event space (which is a  -algebra on  ). A probability measure is a function  , with the following probability axioms satisified:

(P1) for every event  , the probability of the event  ,  ; (nonnegativity)
(P2)  ; (unitarity)
(P3) for every infinite sequence of pairwise disjoint events   (each of which is an element of  ),  . (countable additivity)

Remark.

  • For the axiom P3, we consider infinite sequences instead of finite sequences to add generality for infinite sample spaces. As we will see, we have result similar to the axiom P3 for finitely many pairwise disjoint events.
  • When   is countable, we just set the associated event space to be  .
  • When   is uncountable, the event space is usually not set to be  . Instead, it is common to set the event space to be the Borel  -algebra on   (which is defined when   is a topological space). The details are omitted here, but the event space contains all "well-behaved" sets and most sets we can think of in the power set   anyway.
  • The probability axiom P3 relates the events   to their union  . This matches with the closure under countable unions in the definition of  -algebra.

Using the probability axioms alone, we can prove many well-known properties of probability.

Basic properties of probability edit

Let us start the discussion with some simple properties of probability.

Theorem. Let   be a probability space and   be a set in the event space  . Then,  .

Proof. Consider the infinite sequence of events   (recall that   and   must be in the  -algebra  ). We can see that the events are pairwise disjoint. Also, the union of these events is  . Hence, by the countable additivity of probability, we have

 
It can then be shown that  . [1]

 

Using this result, we can obtain finite additivity from the countable additivity of probability:

Theorem. (Finite additivity) Let   be a probability space. Then, for every event  , if the events   are pairwise disjoint, then

 

Proof. Consider the infinite sequence of events   (recall that   always). Then,

 
(The last equality follows since  , and it can be shown that   using some concepts in limit (to be mathematically rigorous).)

 

Finite additivity makes the proofs of some of the following results simpler.

Theorem. Let   be a probability space, and   and   be sets in the event space  . Then,

  1. (complementary event)  . (The complementary event is taken with respect to the universal set.)
  2. (numeric bound)  .
  3.  .
  4.  .
  5. (monotonicity) If  , then  .

Proof.

Property 1:

First, notice that by definition  . Furthermore, since  , we have   by the closure under complementation of  -algebra. Also, the sets   and   are disjoint. Thus, by the finite additivity, we have

 
On the other hand,
 
Thus, we have the desired result.

Property 2: By property 1, we have  . We then have the desired numeric bound on   since   also by the nonnegativity of probability.

Property 3:

 

Property 4: By property 3, we have

 

Property 5: Assume that  . Then,  . Hence, by property 3,

 

 

Remark.

  • The numeric bound property ensures that the probability measure   is indeed a well-defined function (there is not a probability of event that is out of the codomain  ).

Example. Let   and   the event of winning the champion in the competition, and entering the final of the competition respectively. Then,   (when we win the champion, then we must enter the final), and so   by monotonicity. That is, the probability of winning the champion in a competition is less than or equal to that of entering the final of the competition.

Example. Let   be a probability space, and  . Show that if  , then  .

Proof. Assume  . Then, we have   since   by the nonnegativity of probability. On the other hand, we have  . So, by monotonicity,  , and thus  . Combining the inequalities we get yields the desired result.

 

 

Exercise. Let   be a probability space, and  . Show that if   and  , then  .

Proof

Proof. Assume   and  . Then,  . Hence,

 

 



 

Exercise. Let   be a probability space, and  . Show that  . (Hint: consider the property  .)

Proof

Proof. By the hint, we have

 

 


Example. Suppose you roll a loaded dice, and the corresponding probability space is  . It is given that  , and  , and  . Calculate the probability of getting a 1 from the dice, i.e.,  .

Solution. First, notice that   and  . Furthermore,  . Also,   and   are disjoint. Hence,

 

Also,   are disjoint, and hence

 
 

Exercise. Calculate the probability of getting a 2 from the dice.

Solution

From the example above, we have  . Since   are disjoint, it follows that

 


Remark.

  • For simplicity, we may sometimes omit the curly braces   inside the probability measure. For example, we may write   instead of  .

Example. Let   be a probability space. Suppose we have   and   for events  . Can the events   and   be disjoint?

Solution. They cannot be disjoint. We may prove it by contradiction.

Proof. Assume to the contrary that   and   are disjoint. First,  . Then, by the finite additivity of probability, we have

 
causing a contradiction (violating the numeric bound property, and hence violating the probability axioms logically (by contrapositive)).

 

 

Exercise. What is the minimum probability of   such that   can possibly be a valid probability measure? (Hint: it suffices to consider the bound on   if   is a valid probability measure. The bound is a necessary condition (but may not be sufficient) for   to be a valid probability measure in logic terms.)

Solution

When   is a valid probability measure, we have

 
Since   is a valid probability measure, it follows that  , meaning that the minimum probability is 0.05.



 

Exercise.

 
A graphical illustration of symmetric difference of two sets.

Define   be the symmetric difference of sets   and  , that is,  .

Propose a formula for   in terms of  , and prove it. (Note: the occurrence of the event   corresponds to the occurrence of exactly one of the events   and  .) (Hint: You may use without proof the fact that  . Also, you may use the property that  .)

Solution

Proposition:  .

Proof. We have

 

 


Example. Let   be a probability space, and   be events such that  . It is given that

 
Calculate

(a)  ;

(b)   (without using inclusion-exclusion principle for three events directly).

Solution.

(a)

  •  .
  •  .
  •  .

(b) First,  . Then, we write

 
It follows that  .

Note: to keep track of the probabilities of different events more easily, we may draw a Venn diagram:

|-------------| <--------- A
|             |
|        |----|----|
|  0.2   |    |    |
|        | 0.1| 0.2| <---- C
|        |    |    |
|--------|----|----|------|
|        |0.1 | 0.2|      | 
|   0.1  |    |    |  0.1 | <---- B
|        |----|----|      |
|-------------|-----------|

It also allows us to calculate the probabilities in a more systematic manner. (E.g., we know that  . Also,  . So, the "remaining piece" in   has 0.1 probability (i.e.,  .)

 

Exercise. Calculate

(a)  ;

(b)  .

Solution

(a)

  •  .
  •  .
  •  .
  •  .
  •  .
  •  .

(b)

  •  .
  • We have  . Thus,  .
  •  .
  •  .


Constructing a probability measure edit

As we have said, the axiomatic definition does not suggest us a way to construct a probability measure. Actually, even for the same experiment, there can be many ways to construct a probability measure that satisfies the above probability axioms if there are not sufficient information provided:

Example. (Coin tossing) Consider a random experiment where we toss a coin (fair or unfair). Suppose we want to define a probability space for this random experiment. Three elements are needed: sample space, event space, and probability measure.

Here, we define the sample space as   (  and   stands for "heads comes up" and "tails comes up" respectively). Since   is finite, the event space is chosen to be its power set, that is,  . It then remains to construct the probability measure  , subject to the probability axioms. Since the domain of the probability measure is  , we need to define the "output" of   (that is, assigning probability) for every event in  .

For the empty set, by a property of probability, the probability assigned to it has to be zero. Similarly, for the set  , the probability assigned to it has to be one by the probability axiom P2. But for the remaining two sets  , there are no properties/axioms of probability that tell us what probability must we assign to each of them. Despite this, the sum of the two probabilities assigned has to be one, by the finite additivity property:

 
Also, by the numeric bound of probability, the two probabilities assigned has to be between zero and one.

So, without further information, we can set   where   (and thus  ). Hence, we can see that there are infinitely many ways to construct a probability measure for this random experiment!

However, we have previously mentioned that we may assign probabilities to events subjectively (as in subjectivism), or according to its limiting relative frequency (as in frequentism). Through these two probability interpretations, we may provide some background information for a random experiment, by assigning probabilities to some of the events before constructing the probability measure, to the extent that there is exactly one way to construct a probability measure. Consider the coin tossing example again:

Example. (Coin tossing, continued) Suppose we are told that the coin is fair, so based on this message, we assign the probability of "head comes up" and "tail comes up" to be   respectively. In this case, the way of constructing the probability measure is fixed:

 
This is the only way to construct a probability measure (satisfying the probability axioms) with this further information.

In general, it is not necessary to assign probability to every event in the event space in the background information for us to able to construct the probability measure in exactly one way. Consider the following example.

Example. (Rolling dice) Consider a random experiment where we roll a dice (loaded or unloaded). Then, we define the sample space as   where each number represents the number coming up. It follows that the event space is   since   is finite. (Since there are   sets in the event space, we will not list out them.) Now, we proceed to construct a probability measure   (satisfying the probability axioms). Similarly, we need to consider all 64 events in the event space  .

First, for the empty set and the sample space, we have   and  . Now, let us consider the singletons (i.e., sets consisting one element) in the event space:  . Similarly, without further information, there are infinitely many possible ways to assign probability to each of them, subject to the constraint that their sum has to be one by the probability axiom P2. For now, let us denote

 

If we think more carefully, then we will find out that once these six probabilities are fixed, the probabilities for all other   events are fixed, based on the property of finite additivity (which must be satisfied when the probability measure satisfies the probability axioms), since all other 56 events are just finite unions of some of these six events, and thus their probabilities can be obtained by adding up some of the above probabilities.

We can see from this example that to provide sufficient background information to the extent that the probability measure can be constructed in exactly one way, we just need the probability of each of the singleton events (which should be nonnegative and sum to one to satisfy the probability axioms). After that, we can calculate the probability for each of the other events in the event space, and hence construct the only possible probability measure.

This is true when the sample space is countable, in general:

Theorem. Let   be a probability space, where the sample space   is countable. Assume that the probability of each of the singleton events (i.e.,   for every  ) is given such that they are nonnegative and their sum is one (to satisfy the probability axioms). Then, the probability measure   is given by

 
for every event  .

Proof.

Case 1:   is finite. Then, we can write  . It follows that every event   can be expressed as   (which  s are taken over for the union depends on the event  ). Notice also that the sets " "s are disjoint. (Every set contains a different sample point, and so the intersection of any pair of them is an empty set.) Then, by the finite additivity of probability, we have for every event  ,

 

Case 2:   is countably infinite. Then, we can write  . It follows that every event   can be expressed as   (which  s are taken over for the union depends on the event  ). Notice also that the sets " "s are disjoint. Then, by the countable additivity/finite additivity of probability, we have for every event  ,

 

 

Example. Consider a random experiment where we draw an integer from all positive integers. Then, the probability space is   where  . Suppose it is given that

 
for every  . Since the probabilities are nonnegative, and also their sum is
 
it follows that the probability measure is given by
 
for every event  .
 

Exercise. Calculate the probability  .

Solution

We have by above formula

 


The following is an important special case for the above theorem.

Corollary. (Formula for combinatorial probability) Let   be a probability space, where the sample space is finite. Assume that the probability of each of the singleton events (  of them) is the same:  . (we say that all outcomes are equally likely in this case). Then, the probability measure  , called combinatorial probability (or classical probability) is given by

 
for every event  .

Proof. Under the assumptions, the probability of every singleton event is nonnegative. Also, the sum of the probabilities is

 
Thus, for every event  , we have by the previous theorem
 

 

Remark.

  • By principle of indifference, unless there exists evidence showing that all outcomes are not equally likely, e.g. it is given that a coin is biased, such that it is more likely that head comes up, we should assume that the outcomes are equally likely.

Example. Suppose we roll a fair red dice and a fair blue dice. Then, we can define the sample space be  , where   means   comes up for red dice, and   comes up for blue dice.

Calculate the probability  .

Solution. The number of sample points is  . Since the outcomes should be equally likely, the desired probability is   (there is only one sample point for this event).

 

Exercise. Suppose the blue dice is colored red, such that the two dice become indistinguishable.

Then, student A claims that the probability  , and he reasons as follows:

Recall that when the two dices become indistinguishable, the number of possible distinct pairs of numbers facing up becomes   (shown in previous chapter about combinatorics). So, there are only 21 sample points in the sample space, and hence the probability is  .

On the other hand, student B claims that the probability should still be   and he reasons as follows:

Notice that in this case, the number 1 comes up for both dice correspond to exactly one outcome in the distinguishable case:  . Also, changing the color of the dice should not affect the randomness of the experiment. So, the probability in this case should be the same as the probability  .

Which of the students is correct? Also, explain why the other student is wrong.


Solution

Student B is correct. Student A is wrong, since in this case, not all outcomes in the experiment are equally likely. Particularly, some of the sample points in this case actually correspond to two original sample points, e.g. "1 comes up for one dice, 2 comes up for another" corresponds to   and  .


Example. Consider a random experiment where a fair coin is tossed twice. Calculate the probability of getting heads in a row.

Solution. There are   outcomes in this experiment. They should be equally likely, and hence the probability of getting two heads is  .

 

Exercise. Suppose we toss a fair coin five times.

(a) Calculate the probability   (  means 5 heads in a row).

(b) Calculate the probability   (  means 4 heads in a row, then 1 tails).

(c) Calculate the probability  . (Answer:  )

(d) Calculate the probability  . (Answer:  )

Solution

There should be   equally likely outcomes.

(a) The probability is  . (The event contains one sample point only.)

(b) The probability is  . (The event contains one sample point only.)

(c) The event  , containing five sample points. Thus, the probability is  .

(d) We can count the number of sample points in the event by regarding the five outcomes from tossing five times as five distinguishable cells, and 2 tails as 2 indistinguishable balls. Then, we place 2 indistinguishable balls into 5 distinguishable cells (the remaining empty cells are for heads). So, the number is  . Hence, the probability is  .

Example. (Capture-mark-recapture) Suppose we are fishing in a lake, containing   distinct fishes. First, we catch   fishes from the lake (capture), and give them each a marker (mark), and after that place them back into the lake.

Then, we catch fishes from the lake again (recapture), and catch   fishes this time. Show that the probability that there is   marked fishes in the   fishes caught is

 
assuming that the outcomes in the sample space are equally likely.

Proof. We regard the   fishes as   distinguishable cells, and   catches (in the second time) as   indistinguishable balls (the order of the catch is not important, we are considering what   fishes are caught only). Then, there are   equally likely sample points in the sample space.

It now remains to consider the number of sample points in the event. We now count the number of ways to have   marked fishes in the   fishes. We regard the   marked distinct fishes as   distinguishable cells, and   non-marked fishes as   distinguishable cells. (These two groups of cells are put separately.)

  • Step 1: placing   indistinguishable balls (catches) into   distinguishable cells. (  ways)
  • Step 2: placing   indistinguishable balls (catches) into   distinguishable cells. (  ways)

Hence, the number of sample points in the event is  . The result follows.

 


Example. There are 9 distinguishable balls in a box, consisting of 3 red balls, 2 blue balls and 4 green balls.

(a) Calculate the probability that a red ball is drawn from the box if 1 ball is drawn from the box.

(b) Calculate the probability that exactly 2 red balls and exactly 3 green balls are drawn from the box if 6 balls are drawn from the box.

Solution.

(a) Since one ball is drawn from the box, there are 9 outcomes in the sample space, corresponding to the 9 different draws. We shall assume that the outcomes are equally likely. Then, the probability that a red ball is drawn is   since there are 3 different draws (3 sample points) in the event.

(b) Although the question does not state whether the drawing of ball is unordered or ordered, the probability obtained with either assumption is actually the same, when the 9 balls are distinguishable. This is because every unordered draw of 6 balls corresponds to   ordered draw of 6 balls. So if the ordered draws are equally likely, then the unordered draws must also be equally likely, vice versa. [2] Here, let us regard the draws as unordered for simplicity.

Then, the number of sample points in the sample space is  . (9 balls: 9 distinguishable cells with capacity 1; 6 draws: 6 indistinguishable "balls"). For the event, there are

 
sample points. It follows that the probability is  .
 

Exercise. Three (distinguishable) orange balls are added to the box. Calculate the probability that 2 red balls and 3 green balls are drawn from the box if 6 balls are drawn from the box again. (Answer:  )

Solution

With 3 additional orange balls, the number of sample points in the sample space becomes  . The number of sample points in the event becomes

 
sample points. Thus, the probability is  .


Example. Suppose there are 20 couples in a room, and we pick two people randomly from the room (such that all outcomes in the sample space are equally likely). Calculate the probability that the two people picked are a couple.

Solution.

We regard the two picks to be indistinguishable (or unordered). Then, there are   equally likely sample points in the sample space (place 2 indistinguishable picks into 20 people with capacity 1). Also, the event that the two people picked are a couple has 20 sample points since there are 20 couples.

It follows that the probability is  .

 

Exercise. Regard the two picks as ordered and calculate the probability again (which should also be  ).

Solution

In this case, there are   equally likely sample points in the sample space. Also, the event has   sample points (20 couples, two possible orderings for each couple). Hence, the probability is  .


Example. Amy and Bob are playing a game where each of them roll the same fair dice once, and the player who gets a greater number from the roll wins. If both players get the same number from the roll, then they draw.

(a) Calculate the probability that Amy and Bob draw.

(b) Hence or otherwise, calculate the probability that Amy wins.

Solution.

(a) There are   outcomes in total. There are 6 outcomes where Amy and Bob draw (both get 1,2,3,4,5, or 6). It follows that the probability is  .

(b) To calculate the probability, clearly one can count the number of outcomes where Amy wins. But here we offer an alternative and more convenient approach, which makes use the symmetry of the game. Notice that the situation for Amy and Bob is basically the same in the game (for each of them, there is another player doing the same thing as him/her, and the winning conditions are the same). So, the game is kind of symmetric. Thus, by the natural symmetry of the game, the probability that Amy wins and the probability that Bob wins are equal.

But we can notice that the events that Amy wins, Bob wins, and they draw are disjoint. Also, their union is the whole sample space. It follows that the sum of their probabilities is 1. Hence, letting   be the probability that Amy wins, we have

 
 

Exercise. Consider a modified version of the above game, where if both players get the same number from the roll, then each of them roll the same fair dice once more. (We can see that the game must end after a finite number of dice rolling. It is impossible to keep having draw infinitely.)

Calculate the probability that Amy wins (eventually).


Solution

In this case, the probability of ending up with a draw is 0, since the game mechanism does not allow a draw. By the symmetry of the game, the desired probability is thus  .


Example. Consider a simple game for children. Initially, a child is given a paper with 4 dots on it, with a rectangular shape:

*     *
1     2


3     4
*     *

Then, the child is allowed to draw (exactly) 3 different line segments between two dots on the paper. The aim for the child is to draw a triangle, formed by the 3 line segments, where each vertex of the triangle has to be a dot on the paper. For instance, in this case:

 *    *       
  \  /
   \/
   /\
  /  \
 *----*         

the lower triangle is not counted as a valid one, since one of its vertices is not a dot on the paper (instead, it is made by intersection of two line segments).

Suppose we draw the 3 different line segments randomly. Calculate the probability that there is a (valid) triangle in the resulting diagram.

Solution.

Let us regard the 3 draws of line segment as unordered. Notice that there are   places to draw a line segment. (To draw a line segment, we choose two dots from the four dots (without considering the order).) Then, there are   ways to draw 3 different line segments (which should be equally likely).

Among the 20 ways, there are   ways where a valid triangle is formed. (joining 123, 124, 134, or 234. We choose 3 dots from the 4 dots to form a triangle (without considering order), and there are four ways.) Hence, the probability is  .

 

Exercise. Suppose there are 5 dots on the paper, forming a shape like a regular pentagon.

        * 1    
               
  5 *        * 2
               
      *    *     
     4      3  

Suppose we draw 3 different line segments randomly. Calculate the probability that there is a (valid) triangle in the resulting diagram. (Answer:  )

Solution

Similarly, we regard the 3 draws as unordered. In this case, there are   places to draw a line segment. Thus, there are   ways to draw 3 different line segments.

Among the 120 ways, there are   ways where a valid triangle is formed. Hence, the probability is  .


Example. Suppose we are drawing 5 cards from a poker deck consisting of 52 cards.

(a) Calculate the probability for getting a flush, that is, all 5 cards drawn are of the same suit, and not all are of sequential rank. (Note: sequential rank means A2345,23456,..., or 10JQKA. JQKA2, etc. are not counted as sequential rank.)

(b) Calculate the probability that all 5 cards drawn are of the same color.

(c) Calculate the probability for getting a four of a kind, that is, 4 cards are of the same kind among the 5 cards drawn.

Solution. We regard the 5 draws to be unordered, so there are in total   outcomes, which should be equally likely.

(a) To form a flush, we first consider the 2-step process:

  1. Choose a suit from the 4 suits. (4 ways)
  2. Placing the 5 indistinguishable draws into the 13 cards with the suit chosen in step 1. (  ways)

But among these ways, for each of the 4 ways in step 1, there are 10 ways in step 2 that are of sequential rank. So, we need to subtract   from the above ways. Hence, the probability is  .

(b) For the 5 cards to be of the same color, we consider this as a 2-step process:

  1. Choose a color from the 2 colors. (2 ways)
  2. Placing the 5 indistinguishable draws into the 26 cards with the color chosen in step 1. (  ways)

Hence, the probability is  .

(c) To form a four of a kind, we consider this as a 2-step process:

  1. Choose a kind from 13 kinds for the four cards with the same kind (13 ways)
  2. Choose a card from the remaining 48 cards in the deck (48 ways)

Then, the probability is  .

 

Exercise.

(a) Calculate the probability for getting a straight, that is, the 5 cards drawn consists of five cards of sequential rank, and not all are of the same suit. (Note: A2345, 23456, ..., 10JQKA are counted as straight, but e.g., JQKA2 is not counted as straight. This also applies to straight flush below.) (Answer: approximately 0.00392)

(b) Calculate the probability for getting a straight flush, that is, the 5 cards drawn consists of five cards of sequential rank, and all are of the same suit. (Answer: approximately 0.0000154)

(c) Calculate the probability for getting a full house, that is, the 5 cards drawn contains 3 cards of a rank, and 2 cards of another rank. (Answer: approximately 0.00144)


Solution

(a) To form a straight, we first consider the following 6-step process:

  1. Choose a sequential rank from 10 possible sequential ranks (A2345,23456, ..., 10JQKA) for the straight. (10 ways)
  2. Choose a suit from the 4 suits for the first card in the sequence. (4 ways)
  3. Choose a suit from the 4 suits for the second card in the sequence. (4 ways)
  4. Choose a suit from the 4 suits for the third card in the sequence. (4 ways)
  5. Choose a suit from the 4 suits for the fourth card in the sequence. (4 ways)
  6. Choose a suit from the 4 suits for the fifth card in the sequence. (4 ways)

But in these steps, it is possible that all cards drawn are also of the same suit. So, we need to exclude those cases, and there are   of them, since for each of the 10 ways in step 1, we may choose the same suit from 4 suits throughout the steps 2-6 (4 ways). Hence, the probability is

 
(b) To form a straight flush, the number of ways is the same as the number of cases excluded in (a) (i.e., 40). So, the probability is
 
(c) To form a full house, we consider this as a 4-step process:
  1. Choose a rank from the 13 ranks for the 3 cards with the same rank. (13 ways)
  2. Place 3 indistinguishable draws into the 4 cards with that rank, for the rank chosen in step 1. (  ways)
  3. Choose a rank from the remaining 12 ranks for the 2 cards with the same rank. (12 ways)
  4. Place 2 indistinguishable draws into the 4 cards with that rank, for the rank chosen in step 1. (  ways)

Thus, the probability is

 


Example. (Raffle) Suppose there are 10000 distinct raffle tickets for a raffle. After all 10000 raffle tickets are sold out, 3 different tickets from the 10000 raffle tickets will be chosen randomly as 3 winning tickets, and the holder of each of these 3 winning tickets will receive a prize (it is possible that one person receives multiple prizes, if the person owns multiple winning tickets).

(a) If Amy purchases 10 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(b) If Amy purchases 100 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(c) If Amy purchases 1000 of the 10000 raffle tickets, then what is the probability for Amy to win at least one prize?

(Hint: Consider complementary events.)

Solution.

All outcomes for the raffle should be equally likely. Let us regard the winning tickets choices as indistinguishable.

  • The number of outcomes in total is  . (Placing 3 indistinguishable balls (winning ticket choices) into 10000 distinguishable cells (raffle tickets) with capacity one.)
  • For Amy to not win any prize, we are placing 3 indistinguishable balls (winning ticket choices) into   distinguishable cells (raffle tickets that are not purchased by Amy) with capacity one, assuming Amy purchases   tickets.
  • This is the complementary event of "Amy wins at least one prize".

(a) The probability is  .

(b) The probability is  .

(c) The probability is  .

 

Exercise. Suppose a single raffle ticket may be chosen to be a winning ticket multiple times. That is, some of the 3 winning tickets chosen may be the same.

Calculate the probability for Amy to win at least one prize if Amy purchases 1000 of 10000 raffle tickets. (Answer: 0.271 exactly)

Solution

In this case, let us regard the winning tickets choices as distinguishable and ordered for convenience. Then, the number of outcomes in total is  . (Placing 3 distinguishable balls (winning ticket choices) into 10000 distinguishable cells (raffle tickets) with unlimited capacity.) The number of sample points in the event for Amy not winning any prize is   (Placing 3 distinguishable balls (winning ticket choices) into 9000 distinguishable cells (raffle tickets not purchased by Amy) with unlimited capacity.) So, the probability is

 
(Notice that the answer is exactly 0.271 this time.)

Remark.

  • The answer is expected to be quite close to the one in (c), since even a single raffle ticket may be chosen to be a winning ticket more than once, it is quite unlikely for that to happen. So, we will expect that the effect on the randomness should be quite small.



Example. Consider a game which involves repeated tossing of a fair coin. There are two players in the game: player A and B. The fair coin is tossed repeatedly until one of the player wins. Player A wins if the outcome pattern "THH" appears first, and player B wins if the outcome pattern "HHH" appear first. (H: heads come up; T: tails come up) For example, if the outcome sequence is "HHTHTHTTHTTHH", then player A wins after the last toss. On the other hand, if the outcome sequence is "HHH", then player B wins after the last toss.

(a) Can the game continue forever?

(b) Calculate the probability that player A wins.

Solution.

(a) No. Either outcome pattern must appear eventually.

(b) Notice that the player B wins if and only if a head comes up for each of the first three tosses.

Proof. "if" part: It is easy to see that player B wins if the first three tosses result in "HHH". This is just the winning condition for player B.

"only if" part: We prove it by contrapositive. So, we start with assuming that a tail comes up for some of the first three tosses. Then, we focus on the toss with outcome "T". If "HH" appears afterward, then player A wins (so player B does not win). Otherwise, if the next toss is with outcome "T", then we focus on the outcome "T". If "HT" appears afterward, we focus on the toss with outcome "T". Through this argument, we can see that the only way to terminate the game is that player A wins. So, player B never win in this case.

 

It follows that the probability that player B wins is the same as the probability that a head comes up for each of the first three tosses, which is  . (2 outcomes in each toss)

From (a), we know that the game must terminate eventually, that is, either player must win eventually. Hence, the probability that player A wins is  .

Example. Consider a experiment where we draw two integers from the integers 1,2,3 with replacement. Here, we define the sample space be  , where the ordered pair   indicates that the integer   and   is drawn in first and second draw respectively.

It is given that the probability

 
for every  .

(a) Verify that the sum of the probabilities of all singleton events is 1.

(b) Calculate the probability for the sum of the two integers drawn to be even.

Solution.

(a) The sum is

 
(b) Notice that the event for the sum of the two integers drawn to be even is  . So, the probability of this event is
 
 

Exercise. Suppose we disregard the order of the draws, and hence define the sample space to be  , where   means the integers   and   are drawn in the two draws.

(a) Suppose the probability

 
for every   and for some constant  . Determine the constant   such that the sum of the probabilities of all singleton events is 1.

(b) Calculate the probability for the sum of the two integers drawn to be even with the constant   determined in (a). (Answer:  )


Solution

(a) Since

 
the constant   is 24.

(b) The event for the sum of the two integers drawn to be even is  . So, the probability of this event is

 



More advanced properties of probability edit

Recall the inclusion-exclusion principle in combinatorics. We have similar results for probability:

Theorem. (Inclusion-exclusion principle) Let   be a probability space, and   be sets in the event space   (  is an arbitrary positive integer). Then,

 

Proof. We can prove this by mathematical induction.

Let   be the statement

 
We wish to prove that   is true for every positive integer  .

Basis Step: When  ,   is clearly true since it merely states that  .

Inductive Hypothesis: Assume that   is true for an arbitrary positive integer  .

Inductive Step:

Case 1:  . Then,   is true by a property of probability (recall that we have " ").

Case 2:  . We wish to prove that   is true. The main idea of the steps is to regard   as  , and then we apply the above property of probability, and eventually we will apply the inductive hypothesis twice, on two probabilities involving union of   events. Ultimately, through some (somewhat complicated) algebraic manipulations, we finally get the desired result. The details are as follows (may be omitted):

 
So,   is true.

Hence, by the principle of mathematical induction,   is true for every positive integer  .

 

Remark.

  • " " means summing over every vector   such that  . It follows that every term in the sum is chosen from   where for every sum, we place indistinguishable choices (we sort them ascendingly)into the   events. Hence, the number of terms in the " " is   (when the inclusion-exclusion principle applies to   events).

Example. Let   be a probability space. When  , the inclusion-exclusion principle becomes

 
where  .
 

Exercise. Write out the formula for inclusion-exclusion principle explicitly for 4 events  . (Suggestion: check that the number of terms in each sum is correct (see the remark above).)

Solution

 


Example. We select a student from some university students. It is given that

  • the selected student has a major in mathematics with a probability 0.4;
  • the selected student has a major in statistics with a probability 0.55;
  • the selected student has a major in accounting with a probability 0.3;
  • the selected student has a major in statistics and accounting with a probability 0.2;
  • the selected student has a major in accounting and mathematics with a probability 0.15;
  • the selected student has a major in mathematics and statistics with a probability 0.2;
  • the selected student has a major in mathematics, statistics and accounting with a probability 0.1.

Then, the probability that the selected student does not have any of these majors is  .

Proof. Let  ,  ,   be the event that the selected student among them has a major in mathematics, statistics and accounting respectively. Then,

 
Alternatively, we can consider the following Venn diagram:  
Construction of this Venn diagram.

We can see from this diagram that  , and thus the desired probability is  .

 

 

Exercise.

(a) Calculate the probability that the selected student has at least two of those three majors. (Answer: 0.25)

(b) Calculate the probability that the selected student has exactly one major. (Answer: 0.45)

(Hint: you may consider the Venn diagram above.)

Solution

(a) From Venn diagram, we can observe that the probability is  .

(b) From Venn digram, the probability is  .


The following is a classical example for demonstrating the application of inclusion-exclusion principle.

Example. (Letters and envelopes) Suppose we randomly place   letters to   different people into   pre-addressed envelopes (with addresses for the   people).

(a) Calculate the probability that none of the   letters are placed into the correct envelopes.

(b) Calculate the probability that exactly   letters are placed into the correct envelopes (of course,  ).

Solution.

(a) Let   be the event that 1st,2nd,..., -th letter are placed into the correct envelope respectively. Then, the desired probability is given by  .

Now, notice that the intersection of   of the   events means that (at least)   letters are placed into the correct envelopes ( ). Then, we can regard the situation as   cells with capacity one (correct envelopes) are occupied with one corresponding ball (correct letter) already (one way to do this), and we place the remaining   distinguishable balls (letters) into the remaining   distinguishable cells (envelopes) with capacity one. The number of ways of such placement (number of sample points in the intersection of the   events) is thus  .

On the other hand, to get the number of sample points in the sample space can be obtained by considering the situation as placing   distinguishable balls (letters) into   distinguishable cells (envelopes) with capacity one. Thus, the number of sample points in the sample space is  .

It follows that the probability of the intersection of   of the   events is   (assuming that all placements are equally likely).

Hence, by the inclusion-exclusion principle, the probability

 
It follows that the desired probability is
 

Remark.

  • Recall that  . So,  . From this, we can see that the answer in (a) is a partial sum of this infinite series. Hence, when   is large, the answer in (a) is approximately  .


(b) Notice that "exactly   letters are placed into the correct envelopes" is equivalent to "(at least)   letters are placed into the correct envelopes and none of the other   letters are placed into the correct envelopes". Thus, to get the probability, it suffices to calculate the number of sample points in the latter event, and divide it by the number of sample points in the sample space (which is   by (a)).

We can get the number of sample points in the latter event by considering it as a three-step process:

  1. Fix which   letters are to be placed into the correct envelopes. (  ways)
  2. Place the   letters into the correct envelopes. (1 way)
  3. Place the other   letters such that none of them are placed into the correct envelopes. (  ways. Here, we consider the situation in (a) but with   letters. By (a), the probability is  . To get the number of ways for such placement (number of sample points in the event), we multiply it by the number of sample points in the sample space in this case ( ).)

It follows that the number of sample points in the event is

 
Hence, the desired probability is
 

Remark.

  • It turns out that this is a special case for the Poisson limit theorem, which acts as an approximation to the binomial distribution, where the rate parameter  . (The general formula is  .) We will discuss these in later chapter.


Theorem. (Continuity from below) Let   be a probability space, and   be events in the event space  , such that  , that is,   and  . (The events   are said to be increasing in this case.) Then,

 

Proof. Assume that  . Now, define

 
[3] Claim:   are pairwise disjoint.

Proof. We wish to prove that   for every   such that  .

 
Case 1:  . Then,   ( ) and   ( ). Hence,  . (Since  ,   (  can be at most  ).)

Case 2:  . Then,   ( ) and  . Hence,   similarly.

 

Also, we have  , and  . Then, we have

 

 

Corollary. (Continuity from above) Let   be a probability space, and   be events in the event space  , such that  , that is,   and  . (The events   are said to be decreasing in this case.) Then,

 

Proof. Assume that   as  . Then, by De Morgan's law and a property of subset,  . So,

 

 

Example. Suppose Bob is given a task to work on. Let   be the time (which is positive, and in days) taken for Bob to finish the task. Assume that the probability for   is   for every  . Here, we define the sample space to contain all the possible time taken:   (in extended real number sense). (Time   indicates that the task is never finished.) We also denote by   the event   for every  . Then, we have   for every  .

(a) Calculate the probability that Bob finishes the task eventually.

(b) Calculate the probability that Bob never finishes the task.

Solution.

(a) First, the event that Bob finishes the task eventually can be expressed by  . (Informally, we may assert this "by inspection" (looking at the pattern for   when   gets larger and larger). We may also prove it more rigorously, but this is out of scope for here.) Also, notice that  . It follows that the desired probability is given by

 

(b) First, the event that Bob never finishes the task is the complementary event of the event in (a), that is, it can be expressed by  . It follows that

 
 

Exercise. Consider a random experiment where the sample space is  . Let   be the event   for every  . It is given that

 
for every  .

(a) Calculate the probability  . (Hint: Define event   for every  .) (Answer:  ) (We cannot define the event   as  . Why?)

(b) Calculate the probability  . (Hint: Define event   for every  .) (Answer:  )

(c) Calculate the probability  .


Solution

(a) Define event   for every  . Then,   ("by inspection"). So, the probability is given by

 
(We cannot define   as   since when  , this gives  , which is undefined.)

(b) Define event   for every  . Then,   ("by inspection"). So, the probability is given by

 

(c) Notice that   is the complementary event of  . Hence,

 

Example. Suppose John will take the final examination of the course probability theory tomorrow. Here, we define the sample space to contain all his possible scores (expressed in percentage):  . (Suppose his possible score (in percentage) is any real number in  .) (  means 0 mark, and   means full mark.) Let   be the event that John gets at least   in the final exam. In other words, the event   is given by  . It is given that the probability for   is

 
for every  .

(a) Calculate the probability  .

(b) Calculate the probability that John gets zero mark in the exam, that is,  .

Solution.

(a) Since  , the probability is  .

(b) Define event   for every  . Then, we can see that  . Thus, the probability

 
It follows that
 
 

Exercise. Calculate the probability that John gets full mark in the exam, that is,  . (Hint: Define event   for every  .) (Answer:  )

Solution

Define event   for every  . Then, we can see that  . Hence,

 


Theorem. (Boole's inequality/Subadditivity) Let   be a probability space, and   be events in the event space  . Then,

 

Proof. Define

 
Claim:   are pairwise disjoint.

Proof. We wish to prove that   for every   such that  .

 
Case 1:  . Then,  . So, it follows that
 
which means   (since the only subset of   is  ).

Case 2:  . Then,  . So, it follows that

 
which means   (since the only subset of   is  ).

 

Furthermore, we have   for every  , and  . Hence,

 

 

Remark.

  • The Boole's inequality also holds for finitely many events  . This can be seen by setting  . Then, we have

 

Example. Let   be a probability space, and   be events. Given that  ,   and  , what is the maximum value of  ?

Solution. First, we have  . It follows that

 
Hence, the maximum value is 0.45.


  1. One may prove this by contrapositive: Assume that  . Then, by the nonegativity of probability, this means  . Then,   (that is, the sum diverges). So,  .
  2. This is true in general, for selection of distinguishable objects. This is because every unordered selection corresponds to a fixed number of ordered selection. However, this is not true if the objects are indistinguishable, since the unordered selections may correspond to different numbers of ordered selections. This is the same reason for showing that we cannot use the division counting principle to derive the formula for placing   indistinguishable balls into   distinguishable cells with unlimited capacity, based on that for   distinguishable balls.
  3. Graphically,
          .           .
          .         .
          .       .
    *-----------*
    |###########|
    *--------*##|
    |////////|##|
    *----*///|##|  ...
    |....|///|##|
    |....|///|##|
    *----*---*--*
      E_1
        E_2
          E_3 ...
    ..
    .. : F_1  
    
    //
    // : F_2
    
    ##
    ## : F_3