Probability/Conditional Probability

Motivation edit

In the previous chapter, we have actually only dealt with unconditional probabilities. To be more precise, in the problems encountered in the previous chapter, the sample space is defined initially, and all probabilities are assigned with respect to that initial sample space. However, in many situations, after defining the initial sample space for a random experiment, we may get some new information about the random experiment. Hence we may need to update the sample space based on those information. The probability based on this updated sample space is known as a conditional probability.

To illustrate how we get the new information and update the sample space correspondingly, consider the following example:

Example. Suppose we randomly draw a card from the poker deck (such that every card is equally likely to be drawn). We define the sample space to be the set containing all 52 cards.

(a) Calculate the probability that an ace is drawn.

(b) Suppose an ace is drawn out from the poker deck beforehand. Calculate the probability that an ace is drawn. (This is a conditional probability.)

(c) Suppose two aces are drawn out from the poker deck beforehand. Calculate the probability that an ace is drawn. (This is a conditional probability.)

(d) Suppose three aces are drawn out from the poker deck beforehand. Calculate the probability that an ace is drawn. (This is a conditional probability.)

(e) Suppose we instead randomly draw four cards from the deck. Calculate the probability that four aces are drawn.

Solution.

(a) The probability is   (since there are four aces in the deck).

(b) With the given condition (an ace is removed from the deck), we know that there are only 51 cards, and 3 aces in the deck. So, the sample space is updated to contain these 51 cards. Although we cannot precisely describe the sample space since we do not know which ace is removed, we know that the sample space has 51 sample points and only 3 aces. It is also reasonable to assume that the sample points in the updated sample space are equally likely. Hence, the probability should be  . (Notice that we are still able to calculate the probability, despite we do not know exactly what the updated sample space is.)

(c) Similar to (b), the sample space is updated to contain 50 cards and two aces. Hence, the probability is  .

(d) Similar to (b), the sample space is updated to contain 49 cards and one ace. Hence, the probability is  .

(e) Clearly, we can use the concept of combinatorial probability to calculate this probability:  .

Another argument (which may be quite intuitive to many people) is to consider the four draws "one by one".

  1. For the first draw, the probability is  .
  2. For the second draw, we know that an ace is drawn for the first draw, so the probability is   (similar to (b)).
  3. For the third draw, we know that two aces are drawn for the first two draws, so the probability is   (similar to (c)).
  4. For the fourth draw, we know that three aces are drawn for the first three draws, so the probability is   (similar to (d)).

Then, we are somehow told that multiplying all the probabilities gives the desired one ("multiplication rule of probability"):

 
which turns out to be the same as the answer above.

It turns out that this argument is valid, but actually we have not discussed any results that justify this argument in the previous chapter. Indeed, we have implicitly used the concept of conditional probability (in 2,3,4 above), where the sample space is updated as we have new information/knowledge. (In some sense, the (conditional) probability reflect our state of knowledge about the random experiment (about the deck).)

From the example above, we are able to calculate the (conditional) probability "reasonably" through some arguments (see (b)) when the sample points in the initial sample space are equally likely. Furthermore, we can notice that the condition should be an occurrence of an event, which involves the sample points in the sample space. When the condition does not involves the sample points at all, it is irrelevant to the random experiment. For example, if the condition is "the poker deck costs $10", then this is clearly not an event in the sample space and does not involve the sample points. Also, it is irrelevant from this experiment.

To motivate the definition of conditional probability, let us consider more precisely how we obtain the (conditional) probability in (b). In (b), we are given that an ace is drawn out from the poker deck beforehand. This means that ace can never be drawn in our draw. This corresponds to the occurrence of the event (with respect to original sample space)   (denoted by  ) which consists of 51 sample points, resulting from excluding that ace from the original 52 sample points. Thus, we can regard the condition as the occurrence of event  . Now, under this condition, the sample space is updated to be the set  , that is, only points in   are regarded as legit sample points now.

Consider part (b) again. Let us denote by   the event   (with respect to the original sample space).

Now, only part of the points (whose also lie in the set  ) in the set   are regarded as legit sample points. All other points in set   are no longer legit sample points anymore under this condition. In other words, only the points in both sets   and   (i.e., in set  ) are legit sample points in event   under this condition.

In the part (b) above, only the three aces remaining in the deck (in both sets   and  , and hence in set  ) are considered to be legit sample points. The other ace in set   (the ace that is drawn out in the condition) is not considered to be a legit sample point, since that ace is not in the deck at all!

To summarize, when we want to calculate the conditional probability of event   given the occurrence of event  , we do the following:

  1. We update the sample space to the set  .
  2. We only regard the sample points in set   to be the (valid) sample points of event  .

In the above example, we encounter a special case where the sample points in the initial sample space (assumed to be finite) are equally likely (and hence the sample points in the updated sample space   should also be equally likely). In this case, using the result about combinatorial probability (in previous chapter), the conditional probability, denoted by  , is given by

 
(Notice that " " is just a notation (  is a function,   is the "input"). Merely " " means nothing. Particularly, " " is not an event/set.)

When the sample points are not equally likely, we can apply a theorem in the previous chapter for constructing probability measure on the updated sample space  . (Here, we assume that   is countable.) Particularly, since we are only regarding the sample points in set   as the (valid) sample points of event  , it seems that the (naive) "conditional probability" of   given the occurrence of event   should be given by

 
according to that theorem (where   is the probability measure in the initial probability space  ).

However, when we apply the original probability measure   (in the original probability space) to every singleton event in the new sample space  , we face an issue: the sum of those probabilities are just   which is not 1 in general! But that theorem requires this probability to be 1! A natural remedy to this problem is to define a new probability measure  , based on the original probability measure and the above (naive) "conditional probability"  , such that the sum must be 1. After noticing that  , a natural choice of such probability measure is given by

 
for every  . The probability   can be interpreted as the normalizing constant, and every (naive) "conditional probability" (as suggested previously) is scaled by a factor of  . (It turns out that the probability measure   defined in this way also satisfies all the probability axioms. We will prove this later.)

When comparing this formula with the formula for the equally likely sample points case, the two formulas look quite similar actually. In fact, we can express the formula for the equally likely sample points case in the same form as this formula (since the equally likely sample points case is actually just a special case for the theorem we are considering):

 
A graphical illustration for the above motivation.

So, now we have developed a reasonable and natural formula to calculate the conditional probability   when the outcomes are equally likely (applicable to finite sample space), and the outcomes are not equally likely (only for countable sample space). It is thus natural to also use the same formula when the sample space is uncountable. This motivates the following definition of conditional probability:

Definition edit

Definition. (Conditional probability) Let   be a probability space, and   be events. Assume that  . Then, The conditional probability of event   given the occurrence of event  , denoted by  , is

 

Remark.

  • The assumption that   prevents the above formula gives an undefined value.
  • It follows that  .
  • This formula can be illustrated by the following tree diagram:

                *-------> A    P(A n B)
              / P(A|B)
  P(B)       /
   *------> B 
  /          \
 /            \
/              *-------> A^c
\               
 \
  \
   *------> B^c
  • Sometimes, the probability   is unknown, and we are given   and  . In this case, we can apply this formula to get  .
  • Also, the value of   is often not stated in the question explicitly. Instead, we may have to use our intuition to understand the situation to get the conditional probability  .
  • Besides, in a more complicated situation, it may not be clear that what the condition " " in   is, and therefore we may have to decide what should we condition at, depending on the context. (When we condition on some appropriate events, we may be able to determine the conditional probability readily.)
  • Through this definition, we induces (in some sense) another probability space where the conditional probability is taken to be the probability measure:  , where   is defined by

 
for every  .
(Technically speaking, since we have not proved that   satisfies all probability axioms, we cannot say that   is a probability space yet. But it turns out to be the case.)

Example. (  is a valid probability measure) Let   be a probability space, and   be events. Assume that  . Define the function   by

 
for every  .

Prove that   is a valid probability measure, that is, this function satisfies all three probability axioms.

Proof.

P1 (Nonnegativity): Since   and   (  satisfies all the probability axioms, particularly, the nonnegativity), it follows that

 
for every event  .

P2 (Unitarity): We have

 

P3 (Countable additivity): For every infinite sequence of pairwise disjoint events  , we have

 

 


Example. (Special cases for conditional probability)

  • Suppose  . Then, when we update the sample space to set  , the set   contains all (legit) sample points in the updated sample space. Thus, the conditional probability   should be 1. Alternatively and more intuitively, given that   occurs, this means the realized outcome lies in set  , and hence must lie in set   (since  ). So, the probability should be 1.
  • Formally, we can see this readily:   (  when  .)
  • Suppose   and   are disjoint. Then, when we update the sample space to set  , the set contains zero (legit) sample point in the updated sample space. Thus, the conditional probability   should be 1. Alternatively and more intuitively, given that   occurs, this means the realized outcome lies in set  . So, it must not lie in set  . Hence, the probability should be 0.
  • Formally, we can see this readily:  .

Example. Suppose we roll a fair dice once. Let   and   be the events that even number comes up and prime number comes up respectively.

(a) Calculate  .

(b) Calculate  .

Solution.

(a)

 
(b)
 

(Alternatively, we may use the formula obtained in the motivation section since all sample points are equally likely: (a)  ; (b)  .)

 

Exercise.

(a) Calculate  .

(b) Calculate  .


Solution

(a)

 

(b)

 
(We have   since  .)


Example. Suppose Amy tosses a fair coin twice, and she tells Bob that she obtains at least one heads in the two tosses.

(a) Calculate the probability that the outcome of the two tosses is "HH" (two heads).

(b) Calculate the probability that the outcome of the two tosses is "TT" (two tails).

Solution.

(a) The probability is

 

(b) Notice that  . Thus, the probability is 0.

 

Exercise. A student claims that the probability that the outcome of the two tosses is "HH" should be  . He reasons as follows:

When Bob is told that there is at least one heads in the two tosses, we can update the sample space to be  . The probability is thus

 

Point out the mistakes made by the student.

Solution

The mistake is that the two outcomes in   are not equally likely. Indeed, the outcome "exactly one head" in the set   can actually be decomposed into two more specific outcomes:   and  . So, strictly speaking   is not even a sample space!


Example. (Boy or Girl paradox) Amy is a mother with two children, where each of them is equally likely to be a boy or girl.

(a) Suppose Amy tells you that she has at least one son. Calculate the probability that Amy has two sons .

(b) Suppose Amy tells you that her older child is a son. Calculate the probability that Amy has two sons.

Solution.

Define the sample space to be   where   represents the older child is boy and the younger child is a girl, etc. Then, all four sample points in the sample space are equally likely.

(a) The probability is

 

(b) The probability is

 

Remark.

  • This example shows that even with a small change in the given information, the conditional probability obtained can be quite different. So, we should be careful about what information is actually given in order to have a correct calculation.

Example. Consider a course in probability theory. The following summarizes the probability for a student to pass the course when the student has different scores in the mid-term exam:

 

(a) Suppose the probability for getting at least 50 marks in the mid-term exam (for every student) is 0.6. Calculate the probability that a student gets less than 50 marks in the mid-term exam but pass the course.

(b) Assuming that it is equally likely for the score of mid-term exam of a student to lie in one of the above four intervals, calculate the probability that a student (with unknown score) passes the course.

Solution.

(a) Let   be the event that a student gets at least 50 marks in the mid-term exam,   be the event that a student passes the course. Then, the desired probability is  . From the question, we know that  , and  . Hence, the desired probability is

 

(b) Let   be the event that the score of mid-term exam of a student lies in   respectively. From the assumption, we have  . Notice that the events   are pairwise disjoint. Also,

 
(possibly seen by Venn diagram informally) Hence, by the finite additivity, the desired probability is
 
 

Exercise. Suppose the assumption for part (b) above still applies.

(a) Calculate the probability that a student gets at least 90 marks in the mid-term exam if the student passes the course. (Answer: approximately 0.3585)

(b) Calculate the probability that a student gets less than 50 marks in the mid-term exam if the student fails the course. (Answer: approximately 0.5185)


Solution

Let us continue using the notations defined in part (b) above.

(a) The desired probability is

 

(b) The desired probability is

 


Example. Amy rolls two fair six-faced dice, with one colored red and another colored blue (so that they are distinguishable), without looking at the dice. After Amy rolls the two dice, Bob tells Amy that there is at least one 6 coming up (assume Bob tells the truth). Calculate the probability that 6 comes up for both dice after hearing the information from Bob.

Solution.

The condition is there is at least one 6 coming up, and the probability of this condition can be calculated by inclusion-exclusion principle:

 
Also,
 
Thus, the probability that 6 comes up for both dice after hearing the information from Bob is
 
 

Exercise.

1. Calculate the probability again if the blue dice is colored red, such that the two dice is not distinguishable.

Solution

Changing the color of dice does not affect the probability. Thus, the probability is still  .

Remark.

  • After changing the color, although the number of sample points in the sample space changes, the sample points are not equally likely.


2. Chris claims that the desired probability in the example should be  , and he reasons as follows:

Given that there is at least one 6 coming up, we know that 6 comes up in a dice. Now, we consider another dice, having six equally likely possible outcomes for the number coming up, namely 1,2,3,4,5 and 6. Thus, we can update sample space to   where the second coordinate of each ordered pair represents the number coming up for another dice.

The desired event is that 6 comes up for both dice, that is,  . Clearly,  . It follows that the probability is  .

We know that the correct answer is  , and not  , but why is this claim wrong? (cf. this discussion)

Solution

The six outcomes considered include the outcome in which 6 comes up in either red or blue dice. But the sample points for which 6 comes up for the blue or red dice only (resp.) are missed. There are five of them, namely  . So, the updated sample space suggested is actually not complete, and thus a wrong answer is obtained.

The complete updated sample space should be

 
where first and second coordinate in each ordered pair represents the number coming up for red and blue dice respectively. The set   containing 11 sample points. The desired event is  . Hence, the correct answer is  .

Remark.

  • Chris' claim is correct if Bob tells Amy that 6 comes up for red (or blue) dice. There is difference between '6 comes up for red (or blue) dice' and 'there is at least one 6 coming up (which does not specify color)'. E.g., if Bob tells Amy that 6 comes up for the red dice, then the updated sample space does consist of 6 outcomes, as suggested by Chris:

 
where first and second coordinate in each ordered pair represents the number coming up for red and blue dice respectively.



Example. Suppose a poker deck is to be distributed to four players: player A,B,C,D. Calculate the probability that player C gets exactly 3 cards with diamond suit, with the condition

(a) Exactly 8 cards with diamond suit are distributed to players A and B, and then 13 of the remaining 44 cards are to be distributed randomly to player C.

(b) 26 cards are distributed to players A and B, where exactly 8 of them are with diamond suit. Then, the remaining 26 cards are to be distributed randomly and equally to players C and D.

Solution.

(a) Let us update the sample space to contain the possible distributions of the 13 of the 44 cards. The number of sample points in the updated sample space is   (placing 13 card places for player C into 44 cards). On the other hand, the number of sample points in the event is   (placing 3 card places for player C into 5 cards with diamond suit, then placing 10 card places for player C into 39 cards with non-diamond suit). Then, the probability is

 

(b) Let us update the sample space to contain the possible distributions of the remaining 26 cards. The number of sample points in the updated sample space is   (placing 13 card places for player C into 26 cards. Then, the remaining 13 cards are for player D). On the other hand, the number of sample points in the event is  . (placing 3 card places for player C into 5 distinguishable cards with diamond suit, then placing 10 card places for player C into 39 cards with non-diamond suit. The remaining 13 cards are for player D.) Then, the probability is

 
 

Exercise. Suppose 13 of the 52 cards are to be distributed to player C. Calculate the probability that player C gets exactly 3 cards with diamond suit. (Answer: approximately 0.056)

Solution

The probability is

 
(The number of sample points in the sample space is   (placing 13 card places into 52 cards). The number of sample points in the event is   (placing 3 card places into 13 cards with diamond suit, and then placing 10 card places into 39 cards with non-diamond suit.)


Example. (Simpson's paradox) Suppose there are two treatments for a disease: treatment A and B. To test the effectiveness of the treatments, treatment A and B are applied to two different groups of patients with that disease. The following summarizes the number of patients with different results for the two treatments:

 
(Assume that there are only two outcomes for the treatment:   or  .)

(a) Calculate the probability of recovery for men receiving treatment A.

(b) Calculate the probability of recovery for men receiving treatment B.

(c) Calculate the probability of recovery for women receiving treatment A.

(d) Calculate the probability of recovery for women receiving treatment B.

(e) Calculate the probability of recovery for all patients receiving treatment A.

(f) Calculate the probability of recovery for all patients receiving treatment B.


(To calculate the probability of recovery, we consider the probability of picking a recovered patient when we select a patient from the pool of patients randomly such that every patient is equally likely to be picked, where the pool of patients depends on our conditions.)

Solution.

(a) The probability is  .

(b) The probability is  . (This means treatment B is better than A when applied to men only.)

(c) The probability is  .

(d) The probability is  . (This means treatment B is better than A when applied to women only.)

(e) The probability is  .

(f) The probability is  . (This means treatment B is worse than A when applied to all patients.)

Remark.

  • This example shows the Simpson's paradox where there is a reverse in direction of a comparison when data from several groups are combined into a single group.

The following is a generalization to the formula  . It is useful when we calculate the probability of the occurrence of multiple events together, by considering the events "one by one".

Proposition. (Multiplication rule of probability) Let   be a probability space, and   be events. Then,

 

Proof. With the assumptions, we have by definition

 

 

Remark.

  • It is also known as chain rule of probability.

Example. Consider an urn that contains 20 distinguishable balls, where 5 of them is red, 6 of them is blue, 7 of them is green, and 2 of them is purple. Suppose we draw 5 balls from the urn.

(a) Calculate the probability of drawing 3 red balls and 2 blue balls from the urn, if the draw is done without replacement.

(b) Calculate the probability of drawing 3 red balls and 2 blue balls from the urn, if the draw is done with replacement.

Solution.

Let   be the event that red ball is drawn in  -th draw, and blue ball is drawn in  -th draw respectively.

(a) The desired probability can be expressed as   (or  , etc., but simply changing the order does not affect the probability in this case).

Method 1: Combinatorial probability. Let   be the sample space that contains all possible outcomes of the 5 ordered draws. The probability is

 


Method 2: Multiplication rule. Then, the probability is given by

 
(For example,   since the sample space is updated from the set containing 20 balls, to the set containing 19 balls, excluding the ball drawn out in the first draw, and also there are only four (valid) sample points (four red balls remaining) in   in this updated sample space. Here, the initial sample space contains the 20 possible outcomes from drawing a ball from the urn.)

(b) By the multiplication rule, the probability is given by

 
(The conditional probabilities are the same as the unconditional probabilities, since even the sample space is updated according to the conditions, there are not any changes at all! (since the draws are done with replacement) In fact, the outcome of a draw is not affected by the outcomes of other draws. Actually, it turns out that these events ( ) are (statistically) independent. We will discuss the formal definition of independence in more details in a later section. )

Two important theorems related to conditional probability, namely law of total probability and Bayes' theorem, will be discussed in the following section.

Law of total probability and Bayes' theorem edit

Sometimes, it is not an easy task to assign a suitable unconditional probability to an event. For instance, suppose Amy will perform a COVID-19 test, and the result is either positive or negative (impossible to have invalid results). Let  . What should be  ? It is actually quite difficult to answer directly, since this probability is without any condition. In particular, it is unknown that whether Amy gets infected by COVID-19 or not, and clearly the infection will affect the probability assignment quite significantly.

On the other hand, it may be easier to assign/calculate related conditional/unconditional probabilities. Now, let  . The conditional probability  , called sensitivity, may be known based on the research on the COVID-19 test. Also, the conditional probability  , called specificity, may also be known based on the research. Besides, the probability   may be obtained according to studies on COVID-19 infection for Amy's place of living. Since

 
by the definition of conditional probability, we have
 
Since the conditional probability satisfies the probability axioms, we have the relation  , and thus the value of   can be obtained. The remaining terms in the expression can also be obtained, as suggested by above. Thus, we can finally obtain the value of  .

This shows that conditional probabilities can be quite helpful for calculating unconditional probabilities, especially when we condition appropriately so that the conditional probabilities, and the probability of the condition are known in some ways.

The following theorem is an important theorem that relates unconditional probabilities and conditional probabilities, as in above discussion.

Theorem. (Law of total probability)

An illustration of law of total probability.

Let   be a probability space.

  • (finite case) Assume that   are pairwise disjoint events such that   and  . Then,

 
  • (countably infinite case) Assume that   are pairwise disjoint events such that   and  . Then,

 

Proof. Here we only prove the finite case. The proof for countably infinite case is similar and thus left as an exercise.

Under the assumptions,   are pairwise disjoint, and thus   are also pairwise disjoint (by observing that  , and other intersections have similar results). Also, since  , the conditional probabilities   are defined. Moreover, since  , we can observe that   (through Venn diagram, informally). It follows that

 

 

 

Exercise. Prove the countably infinite case for law of total probability.

Proof

Proof. Under the assumptions,   are pairwise disjoint, and thus   are also pairwise disjoint (by observing that  , and other intersections have similar results). Also, since  , the conditional probabilities   are defined. Moreover, since  , we can observe that   (through Venn diagram, informally). It follows that

 

 


Now, suppose Amy has performed a COVID-19 test, and the result is positive! So now Amy is worrying about whether she really gets infected by COVID-19, or it is just a false positive. Therefore, she would like to know the conditional probability   (the conditional probability for getting infected given testing positive). Notice that the conditional probability   may be known (based on some research). However, it does not equal the conditional probability   generally. (These two probabilities are referring to two different things.) So, now we are interested in knowing that whether there is formula that relates these two probabilities, which have somewhat "similar" expressions. See the following exercise for deriving the relationship between   and  :

 

Exercise. Let   be a probability space, and   be events. Assume   and  . Propose a relationship between   and  , and prove it. (Hint: You may apply the definition of conditional probability on each of these two conditional probabilities. Do you notice any similarity in the expressions?)

Solution

Proposition:  .

Proof. Applying the definition of conditional probability, we have

 
and
 
Notice that these two expressions have the same numerator. Thus, we have
 
Rearranging the equation yields the desired result (the rearrangement is valid since   and   are nonzero).

 

Remark.

  •   is known as the prior probability of event  . It is called prior since this probability does not take into account the knowledge of the occurrence of event  . It reflects our initial degrees of belief of event   (according to the subjectivism interpretation of probability).
  •   is known as the posterior probability of event. It is called posterior since this probability takes into account the knowledge of the occurrence of event  . It reflects our adjusted degrees of belief of event   as there are more new information (the occurrence of event   in this case) (according to the subjectivism interpretation of probability).
  • This formula suggests us the way of adjusting/updating the prior probability   to the posterior probability  .
  • Of course, after obtaining the posterior probability, there may be another round of new information available, and we may need to treat this "posterior probability" as the "prior probability", and update it to a "new version" of posterior probability.


The following theorem is the generalization of the above result.

Theorem. (Bayes' theorem)

An illustration of Bayes' theorem.

Let   be a probability space.

  • (finite case) Assume that   are pairwise disjoint events such that   and  . Then,

 
for every  .
  • (countably infinite case) Assume that   are pairwise disjoint events such that   and  . Then,

 
for every  .

Proof.

Finite case: Under the assumptions, we have by law of total probability

 
On the other hand, by the definition of conditional probability,
 
Since   by the definition of conditional probability, the result follows.

Countably infinite case: Under the assumptions, we have by law of total probability

 
On the other hand, by the definition of conditional probability,
 
Since   by the definition of conditional probability, the result follows.

 

Example. Suppose the weather at a certain day can either be sunny or rainy, with equal probability. Amy has a probability of   ( ) to bring an umbrella at that day if the weather of that day is rainy (sunny). At a day, we see that Amy brings an umbrella. Calculate the probability for that day to be rainy.

Solution. Let   be the events that the weather at that day is rainy, sunny and Amy brings an umbrella at that day respectively. Then, the probability that Amy brings an umbrella at that day is

 
by law of total probability.

Given that Amy brings an umbrella at that day, the probability for that day to be rainy is

 
(by Bayes' theorem).
 

Exercise. Assume that the weather can also be cloudy, such that the weather is twice as likely to be cloudy compared with sunny (or rainy). (That is, the probability for the weather to be cloudy is twice of the probability for the weather to be sunny (which equals the probability for the weather to be rainy).) Also, Amy has a probability   to bring an umbrella at a day if the weather of that day is cloudy. Calculate   such that the probability for a day to be rainy if Amy brings an umbrella at that day is   instead of  . (Answer: )

Solution

Let   be the event that the weather is cloudy. Since the weather is twice as likely to be cloudy compared with sunny (or rainy) and also the weather can either be sunny, rainy, or cloudy, it follows that

 
So,  . We want to set the value of   such that the probability  . Hence,
 


Example. Consider Morse code where dots and dashes are used to encode messages. Suppose Amy and Bob want to communicate through Morse code. However, when Amy sends a Morse code to Bob, there is interference during the transmission, and hence there is a   probability for a dot (or dash) to be mistakenly received as a dash (or dot resp.). It is known that the dots and dashes are used in proportion 3:4 in the Morse code. From this, we may assume that the probability for a dot and dash to be sent is   and   respectively. Calculate the probability that a dot is really sent by Amy if Bob receives a dot.

Solution.

Let   and   be the event that a dot is sent and a dot is received respectively.

Then, we have

 
Furthermore, we have
 
Then, we have
 
It follows that the desired probability is given by
 

Example. Suppose a diagnosis for cancer is conducted for Bob. Let   be the event that Bob has cancer, and   be the event that Bob is diagnosed with cancer. Based on research, it is known that the probability for a correct diagnosis is 0.99. That is,

 
Based on the research on the incidence rate of cancer for the Bob's place of living, we may assume that  . Calculate the probability that Bob has cancer if he is diagnosed with cancer.

Solution. The desired probability is given by  . First,

 
We have
 

Remark.

  • This result is quite counter-intuitive, since it appears that the probability for a correct diagnosis is quite high, but it turns out that this probability is quite low!
  • The main reason for this result is that the incidence rate of cancer is quite low.
 

Exercise. Calculate the probability that Bob does not have cancer if he is not diagnosed with cancer. (Answer: approximately 0.99999)

Solution

The desired probability is given by  . We have

 
So, if Bob is not diagnosed with cancer, then he can be quite sure that he actually does not have cancer.


The following is a famous problem.

Example. (Monty Hall problem)

 
A graphical illustration of the situation

Suppose you are on a game show. There are three doors in front of you, labelled door 1,2,3, and you are allowed to open one of them. Among these three doors, a new car is behind one of the doors, and a goat is behind the other doors. If you open the door with a car behind it, then you will get the car. Otherwise, you get nothing.

Now, suppose you pick the door 1. Then the host Monty Hall, who knows what are behind the three doors, opens a door with a goat behind it (which is not chosen by you, of course). It turns out that Monty Hall opens door 3. After that, you are given an offer where you can keep choosing door 1, or switch your choice to door 2.

With these information, calculate the probability that the new car is behind door 2.

Solution.

Let   be the event that a car is behind door 1,2,3 respectively. Also, let   be the event that Monty Hall opens door 3. It should be reasonable to assume that a car is equally likely to be behind one of the doors. So, we have

 
Now, we consider different conditional probabilities related to  , by considering three different cases:
  • Case 1: The car is behind door 1. Then, Monty Hall can open door 2 and 3, and it is reasonable to assume that it is equally likely for him to open one of them. Hence, the conditional probability  .
  • Case 2: The car is behind door 2. Then, Monty Hall must open door 3 (since this is the only choice). Hence, the conditional probability  .
  • Case 3: The car is behind door 3. Then, Monty Hall can never open door 3 (where a car is behind it). Hence, the conditional probability  .

The desired probability is given by

 

Remark.

  • Clearly,   since it is impossible for the car to be behind door 3 if Monty Hall opens door 3. It follows that  . Thus, it is favourable for you to switch your choice to door 2.
 

Exercise. 1. A student thinks that the probability should be   with the following argument:

After Monty Hall opens door 3, we know that a goat is behind it. This means a car is not behind door 3. Then, it is equally likely for the car to be behind one of door 1 and door 2. Hence, the probability is  .

What is the mistake in this argument?

Solution

The mistake is that a wrong event is conditioned on for the above argument. To be more precise, through this reasoning the probability calculated is actually

 
But  . Particularly, when the car is not behind door 3, this does not mean Monty Hall must open door 3. It is possible that the car is behind door 1, where Monty Hall can open door 2 as well.

2. Suppose there are   more doors in addition to the doors 1,2,3, labelled door 4,5,..., , such that there are   doors in total. In this case, Monty Hall opens   doors where a goat is behind each of them. It turns out that Monty Hall opens doors 3,4,..., . Now, you are given an offer where you can keep choosing door 1, or switch your choice to door 2. Calculate the probability that the car is behind door 2. (Answer:  )

Solution

Let   be the event that a car is behind door 1,2,3,...,  respectively. Also, let   be the event that Monty Hall opens doors 3,4,..., . It should be reasonable to assume that a car is equally likely to be behind one of the doors. So, we have

 
In this situation, we have   cases.
  • Case 1: The car is behind door 1. Monty can then open   of the doors 2,3,...,  (or equivalently, choose one of the doors 2,3,...,  to not open). We know that the event   means Monty Hall chooses door 2 to not open. Thus, the probability is  .
  • Case 2: The car is behind door 2. Monty Hall must then open of the doors 3,4,..., . Thus, the probability is  .
  • Case 3: The car is behind door 3. Then, Monty Hall must not open door 3, and hence must not open doors 3,4,..., . Thus, the probability is  

With similar arguments, we have

 
It follows that the desired probability is
 


Remark.

  • For other cases in which another door is picked, the same result holds by symmetry (notations can be changed).

Example. (Gambler's ruin) Consider a gambling game as follows. The player starts with capital   (  is a nonnegative integer and is specified by the player), and tosses a fair coin repeatedly until he wants to stop. If head comes up, then he gains  . If tail comes up, then he loses  .

Suppose Bob plays this game and decides to stop tossing when one of the following conditions are satisfied:

  1. He runs out of money. (That is, his capital is  .)
  2. His capital is  , where   is a certain nonnegative integer specified by Bob.

Calculate the probability that Bob will eventually run out of all money if he starts with capital  .

Solution. In this case, we should calculate the probability using recursive approach. First, let   be the event that Bob will eventually run out of all money if he starts with capital  . Also, let   be the event that head comes up in the first toss.

Then, a crucial observation is that if the first toss gives a head, then the situation is exactly the same as that Bob starts playing the game with capital   after the toss where he gains  . Hence, we have  . Likewise, we have  . (This is often the most important step for solving a problem recursively: identifying the recursive relation given (usually implicitly) in the question.)

Then, we can establish the equation

 
which holds for every   (we do not include   and   since these are boundary values). Notice that we have   (since Bob immediately runs out of money if he starts with zero capital). On the other hand, we have   (since Bob immediately stops playing the game if he starts with capital  , so it is impossible for him to run out of money).

Notice that the above equation implies that the decrement of the probability   when   increases by one is the same for every  . Since there are in total   decrements, it follows that every decrement is of value  . Hence,

 
 

Exercise. Suppose Bob is so greedy that  . Under this case, what does the probability tend to (that is, what is the limit of the above probability as  )? Suggest an intuitive meaning of this result.

Solution

Since  , it follows that the probability ( ) tends to 1 as  . This means Bob will inevitably run out of all money if he is extremely greedy and never wants to stop at any fixed value of  . This result is known as the Gambler's ruin.


Example. Player A and B take turn to roll a fair dice, where player A rolls the dice first. A player wins if he is the first player that rolls a six. Calculate the probability that player A wins.

Solution. We divide the situation into three cases:

  1. Player A rolls a six in the first roll. (probability  ) In this case, player A wins.
  2. Player A does not roll a six in the first roll, and then player B rolls a six in the second roll. (probability   . This can be obtained by considering multiplication rule of probability. We can see that the probability for player B to roll a six in the second roll given that player A does not roll a six in the first roll is  .) But in this case, the probability for player A to win is zero anyway.
  3. Player A and B does not roll a six in the first and second roll respectively. (probability  )

Notice that in case 3, the situation is exactly the same as the original situation at the beginning where it is player A's turn to roll the dice. Hence, under the condition for case 3, the probability for player A to win is the same as the unconditional one.

Thus, we can write

 

Remark.

  • Notice that eventually, one of the players must roll a six. Hence, it is impossible for the game to end up with a draw. This means the probability that player B wins is  .
 

Exercise. Suppose the dice is changed such that the probability for getting a six from rolling the dice is   (which is not necessarily  ), while the player A still rolls the dice first. Show that the game is always favourable to player A for every  .

Solution

Proof. When the probability is  , we can write the above equation as

 
Thus, the probability that player B wins is  . We have
 
Notice that   and   for every  . Hence, this difference is always positive, and hence   always for every  .

 



Example. (Three Prisoners problem) There are three prisoners, A, B and C, which will originally be executed tomorrow. However, the governor chooses one of them randomly to be pardoned (and hence not executed). Only the warden knows who is to be pardoned, but is not allowed to tell the prisoners.

The three prisoners have also heard about this. So, prisoner A asks the warden

Which of the prisoners B and C will be executed? If both are to be executed, then just tell me randomly one of them.

The warden thinks a while, and tells the prisoner A that prisoner B is to be executed. The warden thinks that he does not give any information to prisoner A about whether he is to be pardoned or not, with the following reasoning:

Let   be the events that A, B, or C is pardoned respectively. Also, let   be the event that the warden says B will die. Since the prisoner to be pardoned is chosen randomly, we have

 
After prisoner A hears the answer from warden, the conditional probability for him to be pardoned is
 
So, this probability is the same as the unconditional one.

However, prisoner A thinks that his probability of being pardoned is increased to   with the following reasoning:

Given that prisoner B will be executed, either prisoner A or C will be pardoned. Hence, my probability of being pardoned is  .

Explain why the prisoner A's reasoning is incorrect.

Solution. Prisoner A falsely interprets the warden's saying as the event  , and he calculates

 
But   is not the same as  ! Particularly, if B is not to be pardoned (that is, to be executed), then the warden will not necessarily tell that prisoner B is to be executed, since it is possible that C is also to be pardoned, and the warden tells C is to be executed instead.
 

Exercise. Suppose prisoner A tells the other two prisoners about the answer from the warden to his question.

(a) What is the probability that prisoner B is to be pardoned?

(b) What is the probability that prisoner C is to be pardoned?

Solution

Let us use the notations as defined by the warden's answer.

(a) The probability is

 
That is, prisoner B knows that he must be executed!

(b) The probability is

 
That is, prisoner C knows that he is twice as likely as before to be pardoned!

Remark.

  • From this, we actually know that the warden can give information to the prisoners about whether they are pardoned or not through his answer, if any of the prisoners B and C knows about this.



Independence edit

From the previous discussion, we know that the conditional probability of event   given the occurrence of event   can be interpreted as the probability of   where the sample space is updated to event  . In general, through this update, the probability of   should be affected. But what if the probability is somehow the same as the one before the update?

If this is the case, then the occurrence of a particular event   does not affect the probability of event   actually. Symbolically, it means  . If this holds, then we have  . This means the occurrence of event   also does not affect the probability of event  . This result matches with our intuitive interpretation of the independence of two events, so it seems that it is quite reasonable to define the independence of events   and   as follows:

Two events   and   are independent if   or  .

However, this definition has some slight issues. If  , then some of the conditional probabilities involved may be undefined. So, for some events, we may not be able to tell whether they are independent or not using this "definition". To deal with this, we consider an alternative definition, that is equivalent to the above when   and  . To motivate that definition, we can see that  , when both   and   are nonzero. This results in the following definition:

Definition. (Independence of two events) Let   be a probability space, and   be two events. Then, the events   and   are independent if  .

Remark.

  • We sometimes write   when   and   are independent.

But what if there are more than two events involved? Intuitively, we may consider the following as the general "definition" of independence:

The events   are independent if  . (  is an integer that is at least 2.)

But we will get some strange results by using this as the "definition":

Example. Suppose we roll a dice twice. Let   be the event that both rolls result in the same number,   be the event that sum of the numbers obtained is between 7 and 10,   be the event that the sum is 7 or 8 or 12. Show that   (that is, the events   are "independent" according to the above "definition") but   (that is, the events   and   are not independent).

Proof. First, we have  ,  , and  . Hence, we have

 
But,
 

 


From this example, merely requiring   cannot ensure all pairs of events involved are independent. So, this suggests us another definition:

The events   are independent if all the pairs of events involved are independent.

However, we will again get some strange results by using this as the "definition":

Example. Consider two balls, where one is bigger than another. Both balls are equally likely to be red or blue. Let   be the event that the bigger ball is red,   be the event that the smaller ball is red, and   be the event that both balls have the same color.

Show that all the pairs of events involved are independent, but   (that is, the events   and   are not independent.)

Proof. Consider the following tables containing relevant probabilities:

 
From this, we can see that  . On the other hand,
 
So, this means all the pairs of events involved are independent.

However,  . (Actually, we have  . This means given the occurrence of both events   and  , we know for sure that the event   occurs. So, the knowledge of the occurrence of both events   and   affect the probability of event  , although the probability of event   is not affected by merely the occurrence of   or merely the occurrence of  . )

 


The above examples suggest us that we actually need both requirements

  1.  . (This ensures  . That is, if we condition on the intersection of two events, the probability of another one is not affected.)
  2. All the pairs of events   are independent. (That is,  .

in the definition of independence of events   for the definition to "make sense".

Similarly, the independence of four events   should require

  1.  .
  2. All the triples of events are independent.
  • In other words, we need the probability of intersection of any three and any two events to be able to "split as product of probabilities of single events" as in above.

This leads us to the following general definition:

Definition. (Independence) Let   be a probability space, and   be events. Then, the events   are (mutually) independent if

 
for every finite index set  .

Remark.

  • This definition also applies to finitely many events  , where the index set   is a subset of   instead.
  • If all the pairs of events involved are independent, then we say that   are pairwise independent. By definition, the mutual independence of   implies their pairwise independence (we just consider the index set with two elements in this case). But, the converse does not hold (discussed in a previous example).
  • Sometimes, based on the situation, two events may be assumed to be independent.

Example. Suppose Amy and Bob take turn to toss a fair coin, where Amy toss the coin first. Let   and   be the events that head comes up in the first and second toss respectively. Calculate the probability  .

Solution. It is reasonable to assume that   and   are independent. Hence,

 
(Alternatively, one may also reasonably assume that the four outcomes arising from these two tosses are equally likely, and hence conclude that  .)

Example. Suppose a fair coin is tossed 10 times independently. Which of the following outcomes is more likely? (H represents head and T represents tail)

  1. HHHHHHHHHT
  2. HTTHTHHTHT

Solution. The probability for each of these outcomes is  . So, they are equally likely.

Remark.

  • One may falsely think that the second outcome is more likely, since it seems the heads and tails are distributed more "equally". The reasoning behind this may be one falsely interpret the second outcome to be the same as "5 heads and 5 tails" (without specifying the order). But this is not the case: the second one specifies the outcome required in every toss.
 

Exercise. Calculate the probability that 5 heads and 5 tails are obtained in these 10 tosses. (Answer: approximately 0.246) (Hint: consider the number of different arrangements of 5 heads and 5 tails in these 10 tosses.)

Solution

We can regard the 10 tosses as 10 distinguishable cells with capacity one, and 5 "heads" to be 5 indistinguishable balls. Then, the process of arranging the outcomes is equivalent to placing the 5 balls into the 10 cells (the 5 empty cells are for "tails"). Hence, the number of arrangements is given by  . Notice that every such arrangement is a sample point in the sample space (for the experiment of tossing a coin 10 times), and the probability of every singleton event is  . It follows that the desired probability is

 


Example. Let   be a probability space, and   be events. Show that if the events   and   are independent, then so is the events   and  .

Proof. Assume that the events   and   are independent. We wish to show that  .

We have

 
So, the events   and   are independent.

 

 

Exercise. Show that if the events   and   are independent,

(a) then so is the events   and  .

(b) then so is the events   and  .

Solution
(a)

Proof. Assume that the events   and   are independent. Then,

 
So, the events   and   are independent.

 

Remark.

  • Actually, this result is symmetric to the one in the example above.
(b)

Proof. Assume that the events   and   are independent. Then, by (a) in this exercise,   and   are independent. It follows by the result in the example above that   and   are independent. (Take " " in the example above to be  .)

 



In general, we have the following result.

Proposition. Let   be a probability space, and   be events (where   is an arbitrary integer such that  ). If   are independent, then after replacing some of the events by their respective complements, the independence still holds.

Proof. Assume that the events   are independent. Then, we have

 
Now, suppose we replace   by  , and we want to prove that the independence still holds:
 
Thus,   are still independent. By symmetry, we can instead replace an event other than   by its respective complement, and the independence still holds.

Notice that we can split the process of replacing some of the events into multiple steps, where we replace one event in every step (that is, we replace the events one by one). Now, we can apply the above argument in each step to ensure that the resulting events are still independent. Thus, after all steps, and we finish the replacement, the independence still holds.

 

Example. Let   be a probability space, and   be events such that   and  .

(a) Show that for every event  ,   and   are independent.

(b) Show that for every event  ,   and   are independent.

Solution.

(a)

Proof. For every event  , we have  . So,   by monotonicity. On the other hand,   by nonnegativity. These two inequalities imply that  . Hence, the events   and   are independent.

 

Remark.

  • Notice that one cannot say that if  , then   (but the converse is true). It is possible to have nonempty event with probability zero. (We can have such event for continuous distribution (to be discussed later).)

(b)

Proof. Notice that  . By (a), the events   and   are independent. It follows that   and   are independent (see a previous exercise).

 

 

Exercise. Are the events   and   independent?

Solution

They are independent. We can just apply (a) where the "event  " is set to be the event  .


Remark.

  • The meaning of this result is that knowledge of arbitrary event does not make a certain event less certain, and also does not make an impossible event possible, which is intuitive.

Example. Suppose a machine has three components: A, B, and C. Let   be the event that the component A, B, and C to fail respectively. Based on the studies on the machine, it is known that  , and we can also assume that the events   are (mutually) independent. Suppose the machine fails if one of the components fails, and operates normally otherwise.

(a) Calculate the probability that the machine operates normally.

(b) Given that the machine fails, calculate the probability that all three component fail.

Solution.

(a) The desired probability is given by  . We know that the events   are independent. So, we have

 

(b) The desired probability is given by  . We have by definition

 
 

Exercise. Suppose instead the machine fails if at least two of the component fail, and operates normally otherwise. Repeat parts (a) and (b) above. (Answer: (a) 0.9774; (b) approximately 0.00885)

Solution

(a) The probability is

 
(b) The desired probability is
 


 

Exercise.

(a) Show that the events   and   are independent if and only if  .

(b) Show that the event   is independent of itself if and only if  .

(Hint: prove the "if" and "only if" parts simultaneously)

Solution
(a)

Proof. We have

 

 

(b)

Proof. We have

 

 


 

Exercise. A student claims that for every event  , if   and   are independent, and   and   are independent, then   and   are also independent.

Is the student's claim correct? If yes, prove it. If no, give a counterexample.

Solution

The student's claim is not correct. For example, consider an experiment where we toss a fair coin twice. Let   be the event that head comes up in the first toss,   be the event that head comes up in the second toss,   be the same event as  . Since all four outcomes should be equally likely, it follows that

 
Thus,   and   to be independent, then so is   and   (since   is the same as  ). However,   and   are clearly not independent:
  •  .
  • But,  .

Example. Consider the following procedure to make a fair toss from a coin with unknown probability   of getting a head (suggested by John von Neumann):

  1. Toss the coin twice (independently).
  2. If both outcomes are the same, ignore them and return to step 1. Otherwise, go to next step.
  3. Regard the first outcome as the outcome obtained.

Show that this procedure works.

Proof. Let   be the event that a head is obtained at the end of this procedure. We then want to show that   (this also means the probability for obtaining a tail at the end is  ). Now, we consider the following tree diagram:


 *---- HH ----> repeat
/
------ HT -----> E occurs
------ TH -----> E^c occurs
\
 *---- TT ----> repeat

Notice that if both outcomes are the same, and we return to step 1, then the situation is exactly the same as the initial situation. Thus,   and  . Also, we can calculate

  •  .
  •  .
  •  .
  •  .

Hence, by law of total probability, we can conclude that

 

 


Conditional independence edit

Conditional independence is a conditional version of independence, and has the following definition which is similar to that of independence.

Definition. (Conditional independence) Let   be a probability space, and   be events. Then, the events   are conditionally independent given the occurrence of event   if

 
for every finite index set  .

Remark.

  • This definition also applies to finitely many events  , where the index set   is a subset of   instead.
  • In particular, if events   and   are conditionally independent given   (assuming   and  ),
     
  • This means given the occurrence of  , the occurrence of   does not affect the probability of  .
  • Conditional independence of some events neither implies nor is implied by independence of them. These two concepts are not related.

Example. Suppose we select two people from a large population randomly, and we label them as person A and B. Let   be the event that the birthday of person A is June 1st,   be the event that the birthday of person B is July 1st, and   be the event that person A and B are twins. (Assume that it is equally likely for the birthday of a person to be one of the 365 days.) Then, it should be reasonable to assume that events   and   are conditionally independent given  . Show that the events   and   are not conditionally independent given  .

Proof. Since   and  , (  and   are independent. Also,   and   are independent.) it follows that  .

 

 

Exercise. Show that the events   and   are independent (unconditionally) if and only if  .

Solution

Proof. By law of total probability,

 
On the other hand,  . This means   if and only if  .