Probability/Conditional Distributions



Motivation

Suppose there is an earthquake. Let $X$ be the number of casualties and $Y$ be the Richter magnitude of the earthquake.

(a) Given no further information, what is the distribution of $X$?

(b) Given that $Y=1$, what is the distribution of $X$?

(c) Given that $Y=9$, what is the distribution of $X$?

Remark.

  • $Y=1$ means the earthquake is micro, and $Y=9$ means the earthquake is great.

Are your answers to (a), (b) and (c) different?

Indeed, we condition on $Y=1$ and $Y=9$ in (b) and (c), and the distributions we are finding in (b) and (c) should, for clarity, be denoted by $X|Y=1$ and $X|Y=9$ respectively. They are instances of the conditional distribution of $X|Y$, meaning $X$ given $Y$ (before observing the value of $Y$), or $X|Y=y$, meaning $X$ given $Y=y$ (after observing the value of $Y$ to be $y$).

Conditional distributions

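Recall the definition of conditional probability:

$$\mathbb{P}(A|B)=\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)},$$
in which $A,B$ are events, with $\mathbb{P}(B)>0$.

We have similar definitions for conditional distributions.

Definition. (Conditional probability function) Let $X,Y$ be random variables. The conditional probability (mass or density) function of $X$ given $Y=y$, in which $y$ is a real number, is

$$f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)},\quad f_Y(y)>0,$$
in which $f$ is the joint probability function of $(X,Y)$ and $f_Y$ is the marginal probability function of $Y$.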

Remark.

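  • For discrete random variables $X,Y$, the conditional pmf is

$$f_{X|Y}(x|y)=\mathbb{P}(X=x|Y=y)=\frac{\mathbb{P}(X=x\cap Y=y)}{\mathbb{P}(Y=y)}.$$
  • For continuous random variables, the marginal pdf $f_Y(y)$ can be interpreted as a normalizing constant, which makes the integral $\int_{-\infty}^{\infty}f_{X|Y}(x|y)\,dx=1$, since $f_Y(y)=\int_{-\infty}^{\infty}f(x,y)\,dx$. Here we integrate over the region in which $y$ is fixed (the region in which the condition is satisfied), so we only integrate over the corresponding interval of $x$ ($x$ is still a variable).
  • This is similar to the denominator in the definition of conditional probability, which makes the conditional probability of the whole sample space equal to one, so that the probability axioms are satisfied.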
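
As a minimal sketch of the definition in the discrete case, the following Python snippet divides a joint pmf by the marginal pmf of $Y$ and checks that the result sums to one. The joint pmf values and the helper names `marginal_y` and `conditional_pmf` are made up for illustration.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y) of a discrete pair (X, Y); values are illustrative only.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8),
}

def marginal_y(joint, y):
    """f_Y(y): sum the joint pmf over x with y held fixed."""
    return sum(p for (_, y0), p in joint.items() if y0 == y)

def conditional_pmf(joint, y):
    """f_{X|Y}(x|y) = f(x, y) / f_Y(y) for every x with f(x, y) > 0."""
    fy = marginal_y(joint, y)
    if fy == 0:
        raise ValueError("cannot condition on a value of Y with probability zero")
    return {x: p / fy for (x, y0), p in joint.items() if y0 == y}

cond = conditional_pmf(joint, 0)
total = sum(cond.values())  # the normalizing constant f_Y(0) makes this equal to 1
print(cond, total)
```

Exact fractions are used so that the normalization can be checked without rounding error.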

Graphical illustration of the definition:

Top view:
     
        |
        |
        *---------------* 
        |               |
        |               |
fixed y *===============* <--- corresponding interval
        |               |
        |               |
        *---------------*
        |
        *---------------- x

Side view:

          *  
         / \ 
        *\  *  /                                           
       /|#\   \
   |  / |##\ / *---------*
   | *  |###\            /\
   | |\ |##/#\----------/--\     
   | | \|#/###*--------*   /                             
   | |  \/############/#\ /                              
   | |y *\===========/===*                               
   | | /  *---------*   /                                
   | |/              \ /                                 
   | *----------------*                                  
   |/                                                    
   *------------------------- x                          


Front view:
             
    |
    |
    |               
    *\     
    |#\    
    |##\   
    |###\             
    |####\   <------ Area: f_Y(y)
    |#####*--------*  
    |###############\ 
    *================*-------------- x

*---*
|###| : corresponding cross section from joint pdf
*---*   

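Definition. (Conditional cumulative distribution function) Let $X,Y$ be random variables. The conditional cumulative distribution function (cdf) of $X$ given $Y=y$, in which $y$ is a real number, is

$$F_{X|Y}(x|y)=\mathbb{P}(X\le x|Y=y)=\begin{cases}\sum_{u\le x}f_{X|Y}(u|y),&X\text{ is discrete};\\ \int_{-\infty}^{x}f_{X|Y}(u|y)\,du,&X\text{ is continuous}.\end{cases}$$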

Graphical illustration of the definition (continuous random variables):

Top view:
     
        |
        |
        *---------------* 
        |               |
        |               |
fixed y *=========@=====* <--- corresponding interval
        |         x     |
        |               |
        *---------------*
        |
        *---------------- 

Side view:

          *  
         / \ 
        *\  *  /                                           
       /|#\   \
   |  / |##\ / *---------*
   | *  |###\            /\
   | |\ |##/#\----------/--\     
   | | \|#/###*--------*   /                             
   | |  \/#########   / \ /                              
   | |y *\========@==/===*                               
   | | /  *-------x-*   /                                
   | |/              \ /                                 
   | *----------------*                                  
   |/                                                    
   *------------------------- x                          


Front view:

    |
    |
    |
    *\      
    |#\    
    |##\              
    |###\             
    |####\   <------------- Area: f_Y(y)         
    |#####*--------*  
    |###########    \ 
    *==========@=====*--------------  
               x
*---*
|###| : the desired region from the cross section from joint pdf, whose area is the probability from the cdf
*---*   

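If $Y=\mathbf{1}\{A\}$ for some event $A$, we have some special notations for simplicity:

  • the conditional probability function of $X$ given $A$ (i.e. given $\mathbf{1}\{A\}=1$) becomes

$$f_X(x|A);$$
  • the conditional cdf of $X$ given $A$ becomes

$$F_X(x|A)=\mathbb{P}(X\le x|A).$$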

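Proposition. (Determining independence of two random variables) Random variables $X,Y$ are independent if and only if $f_{X|Y}(x|y)=f_X(x)$ for each $x$ and each $y$ with $f_Y(y)>0$.

Proof. Recall the definition of independence between two random variables:

$X,Y$ are independent if

$$f(x,y)=f_X(x)f_Y(y)$$
for each $x,y$.

Since

$$f_{X|Y}(x|y)=\frac{f(x,y)}{f_Y(y)}=f_X(x)\iff f(x,y)=f_X(x)f_Y(y)$$
for each $x$ and each $y$ with $f_Y(y)>0$, we have the desired result.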

 

Remark.

  • This is expected, since conditioning on an event should not affect the probability of another event that is independent of it.
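
The criterion can be checked mechanically for small discrete examples. The sketch below, with made-up joint pmfs and a hypothetical helper `is_independent`, compares $f(x,y)/f_Y(y)$ with $f_X(x)$ for every pair $(x,y)$ with $f_Y(y)>0$.

```python
from fractions import Fraction as F
from itertools import product

def marginals(joint):
    """Return the marginal pmfs (f_X, f_Y) of a joint pmf given as {(x, y): prob}."""
    fx, fy = {}, {}
    for (x, y), p in joint.items():
        fx[x] = fx.get(x, 0) + p
        fy[y] = fy.get(y, 0) + p
    return fx, fy

def is_independent(joint):
    """True iff f_{X|Y}(x|y) == f_X(x) for all x and all y with f_Y(y) > 0."""
    fx, fy = marginals(joint)
    return all(
        joint.get((x, y), 0) / fy[y] == fx[x]
        for x, y in product(fx, fy)
        if fy[y] > 0
    )

# Product-form joint pmf: X and Y are independent by construction.
indep = {(x, y): px * py
         for x, px in {0: F(1, 3), 1: F(2, 3)}.items()
         for y, py in {0: F(1, 4), 1: F(3, 4)}.items()}
# Perfectly dependent pair: Y always equals X.
dep = {(0, 0): F(1, 2), (1, 1): F(1, 2)}

result_indep = is_independent(indep)
result_dep = is_independent(dep)
print(result_indep, result_dep)
```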


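We can extend the definitions of the conditional probability function and cdf to groups of random variables, using joint cdf's and joint probability functions, as follows:

Definition. (Conditional joint probability function) Let $\mathbf{X}=(X_1,\dotsc,X_n)^T$ and $\mathbf{Y}=(Y_1,\dotsc,Y_m)^T$ be two random vectors. The conditional joint probability function of $\mathbf{X}$ given $\mathbf{Y}=(y_1,\dotsc,y_m)^T$ is

$$f_{\mathbf{X}|\mathbf{Y}}(x_1,\dotsc,x_n|y_1,\dotsc,y_m)=\frac{f_{\mathbf{X},\mathbf{Y}}(x_1,\dotsc,x_n,y_1,\dotsc,y_m)}{f_{\mathbf{Y}}(y_1,\dotsc,y_m)}.$$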

Then, we also have a similar proposition for determining independence of two random vectors.

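Proposition. (Determining independence of two random vectors) Random vectors $\mathbf{X},\mathbf{Y}$ are independent if and only if $f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})$ for each $\mathbf{x}=(x_1,\dotsc,x_n)^T$ and each $\mathbf{y}=(y_1,\dotsc,y_m)^T$ with $f_{\mathbf{Y}}(\mathbf{y})>0$.

Proof. The definition of independence between two random vectors is

  • $\mathbf{X},\mathbf{Y}$ are independent if

$$f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})f_{\mathbf{Y}}(\mathbf{y})$$
for each $\mathbf{x},\mathbf{y}$.

Since

$$f_{\mathbf{X}|\mathbf{Y}}(\mathbf{x}|\mathbf{y})=\frac{f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}=f_{\mathbf{X}}(\mathbf{x})\iff f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})f_{\mathbf{Y}}(\mathbf{y})$$
for each $\mathbf{x}$ and each $\mathbf{y}$ with $f_{\mathbf{Y}}(\mathbf{y})>0$, we have the desired result.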

 

Conditional distributions of bivariate normal distribution

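Recall from the Probability/Important Distributions chapter that the joint pdf of $(X,Y)^T\sim\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$ is

$$f(x,y)=\frac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\left(\frac{x-\mu_X}{\sigma_X}\right)^2-2\rho\left(\frac{x-\mu_X}{\sigma_X}\right)\left(\frac{y-\mu_Y}{\sigma_Y}\right)+\left(\frac{y-\mu_Y}{\sigma_Y}\right)^2\right]\right),$$
and $X\sim\mathcal{N}(\mu_X,\sigma_X^2)$ and $Y\sim\mathcal{N}(\mu_Y,\sigma_Y^2)$ in this case, in which $\sigma_X$ and $\sigma_Y$ are positive and $\rho\in(-1,1)$ is the correlation coefficient of $X$ and $Y$.

Proposition. (Conditional distributions of bivariate normal distribution) Let $(X,Y)^T\sim\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$. Then,

$$X|Y=y\sim\mathcal{N}\left(\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\;\sigma_X^2(1-\rho^2)\right)\quad\text{and}\quad Y|X=x\sim\mathcal{N}\left(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\;\sigma_Y^2(1-\rho^2)\right)$$
(abuse of notations).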

Proof.

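  • First, writing $a=(x-\mu_X)/\sigma_X$ and $b=(y-\mu_Y)/\sigma_Y$, the conditional pdf is

$$\begin{aligned}f_{X|Y}(x|y)&=\frac{f(x,y)}{f_Y(y)}\\&=\frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}}\exp\left(-\frac{a^2-2\rho ab+b^2}{2(1-\rho^2)}+\frac{b^2}{2}\right)\\&=\frac{1}{\sqrt{2\pi\sigma_X^2(1-\rho^2)}}\exp\left(-\frac{(a-\rho b)^2}{2(1-\rho^2)}\right)\\&=\frac{1}{\sqrt{2\pi\sigma_X^2(1-\rho^2)}}\exp\left(-\frac{\left(x-\mu_X-\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right)^2}{2\sigma_X^2(1-\rho^2)}\right).\end{aligned}$$
  • Then, we can see that $X|Y=y\sim\mathcal{N}\left(\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\;\sigma_X^2(1-\rho^2)\right)$,
  • and by symmetry (interchanging $X$ and $Y$, and also interchanging $x$ and $y$), $Y|X=x\sim\mathcal{N}\left(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\;\sigma_Y^2(1-\rho^2)\right)$.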
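
The algebra in this proof can be spot-checked numerically. The sketch below is not part of the proof: it picks arbitrary parameter values (all hypothetical), evaluates the ratio of the bivariate normal joint pdf to the marginal pdf of $Y$ at a few points, and compares it with the pdf of the normal distribution with mean $\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)$ and variance $\sigma_X^2(1-\rho^2)$.

```python
import math

def normal_pdf(x, mean, var):
    """pdf of the univariate normal distribution N(mean, var)."""
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def bivariate_normal_pdf(x, y, mu_x, mu_y, s_x, s_y, rho):
    """Joint pdf of the bivariate normal distribution with correlation rho."""
    a = (x - mu_x) / s_x
    b = (y - mu_y) / s_y
    q = (a * a - 2 * rho * a * b + b * b) / (1 - rho ** 2)
    return math.exp(-q / 2) / (2 * math.pi * s_x * s_y * math.sqrt(1 - rho ** 2))

# Hypothetical parameter values, chosen only for illustration.
mu_x, mu_y, s_x, s_y, rho = 1.0, -2.0, 1.5, 0.5, 0.6
y = -1.7

cond_mean = mu_x + rho * s_x / s_y * (y - mu_y)
cond_var = s_x ** 2 * (1 - rho ** 2)

max_gap = 0.0
for x in [-3.0, -1.0, 0.0, 1.0, 2.5, 4.0]:
    # Definition of the conditional pdf: joint pdf divided by the marginal pdf of Y.
    ratio = bivariate_normal_pdf(x, y, mu_x, mu_y, s_x, s_y, rho) / normal_pdf(y, mu_y, s_y ** 2)
    claimed = normal_pdf(x, cond_mean, cond_var)
    max_gap = max(max_gap, abs(ratio - claimed))

print(max_gap)  # should be at the level of floating-point rounding error
```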

 

Conditional version of concepts

We can obtain conditional versions of concepts previously established for 'unconditional' distributions by replacing the 'unconditional' cdf, pdf or pmf, i.e. $F(\cdot)$ or $f(\cdot)$, with their conditional counterparts, i.e. $F(\cdot|\cdot)$ or $f(\cdot|\cdot)$.

Conditional independence

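Definition. Random variables $X_1,X_2,\dotsc$ are conditionally independent given $Y$ if and only if

$$F_{X_1,\dotsc,X_n|Y}(x_1,\dotsc,x_n|y)=F_{X_1|Y}(x_1|y)\dotsm F_{X_n|Y}(x_n|y)$$
or
$$f_{X_1,\dotsc,X_n|Y}(x_1,\dotsc,x_n|y)=f_{X_1|Y}(x_1|y)\dotsm f_{X_n|Y}(x_n|y)$$
for all real numbers $x_1,\dotsc,x_n,y$ and for each positive integer $n$, in which $F_{X_1,\dotsc,X_n|Y}$ and $f_{X_1,\dotsc,X_n|Y}$ denote the joint cdf and probability function of $X_1,\dotsc,X_n$ conditional on $Y$ respectively.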

Remark.

  • For random variables, conditional independence and independence are not related, i.e. neither of them implies the other.

Example. (Conditional independence does not imply independence) TODO

Example. (Independence does not imply conditional independence) TODO
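
One possible pair of constructions, not taken from the text, can be verified by exact enumeration in Python. In the first, $X,Y$ are independent fair bits and $Z$ is their exclusive-or, so $X,Y$ are independent but not conditionally independent given $Z$. In the second, $Z$ is a fair bit and, given $Z=z$, $X$ and $Y$ are independent Bernoulli variables whose success probability depends on $z$, so $X,Y$ are conditionally independent given $Z$ but not independent. The helper names and the probabilities $1/4$ and $3/4$ are assumptions made for illustration.

```python
from fractions import Fraction as F
from itertools import product

def prob(joint, pred):
    """Probability of the event {pred(X, Y, Z) is true} under a joint pmf {(x, y, z): p}."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))

def independent(joint):
    """X, Y independent: P(X=x, Y=y) == P(X=x) P(Y=y) for all x, y in {0, 1}."""
    return all(
        prob(joint, lambda a, b, c: a == x and b == y)
        == prob(joint, lambda a, b, c: a == x) * prob(joint, lambda a, b, c: b == y)
        for x, y in product((0, 1), repeat=2)
    )

def conditionally_independent(joint):
    """X, Y conditionally independent given Z: the product rule holds given each Z=z."""
    for x, y, z in product((0, 1), repeat=3):
        pz = prob(joint, lambda a, b, c: c == z)
        if pz == 0:
            continue
        pxy = prob(joint, lambda a, b, c: (a, b, c) == (x, y, z)) / pz
        px = prob(joint, lambda a, b, c: a == x and c == z) / pz
        py = prob(joint, lambda a, b, c: b == y and c == z) / pz
        if pxy != px * py:
            return False
    return True

# Construction 1: X, Y independent fair bits, Z = X xor Y.
joint_a = {(x, y, x ^ y): F(1, 4) for x, y in product((0, 1), repeat=2)}

# Construction 2: Z fair bit; given Z=z, X and Y are independent Bernoulli(p_z).
p_given_z = {0: F(1, 4), 1: F(3, 4)}
def bern(k, p):
    return p if k == 1 else 1 - p
joint_b = {(x, y, z): F(1, 2) * bern(x, p_given_z[z]) * bern(y, p_given_z[z])
           for x, y, z in product((0, 1), repeat=3)}

res_a = (independent(joint_a), conditionally_independent(joint_a))
res_b = (independent(joint_b), conditionally_independent(joint_b))
print(res_a, res_b)
```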

Conditional expectation

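Definition. (Conditional expectation) Let $f_{X|Y}(x|y)$ be the conditional probability function of $X$ given $Y=y$. Then,

$$\mathbb{E}[X|Y=y]=\begin{cases}\sum_x x f_{X|Y}(x|y),&X\text{ is discrete};\\ \int_{-\infty}^{\infty}x f_{X|Y}(x|y)\,dx,&X\text{ is continuous}.\end{cases}$$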

Remark.

  • $\mathbb{E}[X|Y=y]$ is a function of $y$.
  • The random variable obtained by substituting $Y$ for $y$ in this function, which is a function of $Y$ after computing the expectation, is written as $\mathbb{E}[X|Y]$ for brevity.
  • $\mathbb{E}[X|Y=y]$ is the realization of $\mathbb{E}[X|Y]$ when $Y$ is observed to be $y$.
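
To make the distinction between the number $\mathbb{E}[X|Y=y]$ and the random variable $\mathbb{E}[X|Y]$ concrete, here is a small sketch for a discrete pair. The joint pmf and the helper name `cond_expectation` are hypothetical; the dictionary `h` tabulates the function $y\mapsto\mathbb{E}[X|Y=y]$, and $\mathbb{E}[X|Y]$ is that function evaluated at the random $Y$.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y); values are illustrative only.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8),
}

def cond_expectation(joint, y):
    """E[X | Y = y] = sum over x of x * f(x, y) / f_Y(y)."""
    fy = sum(p for (_, y0), p in joint.items() if y0 == y)
    return sum(x * p for (x, y0), p in joint.items() if y0 == y) / fy

# The function y -> E[X | Y = y]; E[X | Y] takes the value h[y] when Y = y is observed.
h = {y: cond_expectation(joint, y) for y in sorted({y for _, y in joint})}
print(h)
```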

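Similarly, we have a conditional version of the law of the unconscious statistician.

Proposition. (Law of the unconscious statistician (conditional version)) Let $f_{X|Y}(x|y)$ be the conditional probability function of $X$ given $Y=y$. Then, for each function $g$,

$$\mathbb{E}[g(X)|Y=y]=\begin{cases}\sum_x g(x)f_{X|Y}(x|y),&X\text{ is discrete};\\ \int_{-\infty}^{\infty}g(x)f_{X|Y}(x|y)\,dx,&X\text{ is continuous}.\end{cases}$$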

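Proposition. (Conditional expectation under independence) If random variables $X,Y$ are independent,

$$\mathbb{E}[g(X)|Y]=\mathbb{E}[g(X)]$$
for each function $g$.

Proof. For continuous $X$ (the discrete case is similar, with sums in place of integrals), independence gives $f_{X|Y}(x|y)=f_X(x)$, so

$$\mathbb{E}[g(X)|Y=y]=\int_{-\infty}^{\infty}g(x)f_{X|Y}(x|y)\,dx=\int_{-\infty}^{\infty}g(x)f_X(x)\,dx=\mathbb{E}[g(X)]\quad\text{for each }y.$$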

 

Remark.

  • This equality may not hold if $X,Y$ are not independent.

Example. Suppose random vector   in which   are independent random variables, and  . Then,

 
(  is treated as constant, because of the conditioning: it is constant after realization of  ) but
 

The properties of $\mathbb{E}[\cdot]$ still hold for conditional expectations $\mathbb{E}[\cdot|Y]$, with every 'unconditional' expectation replaced by a conditional expectation and some suitable modifications, as follows:

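Proposition. (Properties of conditional expectation) For each random variable $X$,

  • (linearity) $\mathbb{E}[a(Y)X+b(Y)Z|Y]=a(Y)\mathbb{E}[X|Y]+b(Y)\mathbb{E}[Z|Y]$
for all functions $a(Y),b(Y)$ of $Y$ and for each random variable $Z$
  • (nonnegativity) if $X\ge 0$, $\mathbb{E}[X|Y]\ge 0$
  • (monotonicity) if $X\ge Z$, $\mathbb{E}[X|Y]\ge\mathbb{E}[Z|Y]$ for each random variable $Z$
  • (triangle inequality)

$$\big|\mathbb{E}[X|Y]\big|\le\mathbb{E}\big[|X|\,\big|\,Y\big]$$

  • (multiplicativity under independence) if $X,Z$ are conditionally independent given $Y$,

$$\mathbb{E}[XZ|Y]=\mathbb{E}[X|Y]\,\mathbb{E}[Z|Y]$$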

Proof. The proof is similar to the one for 'unconditional' expectations.

 

Remark.

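  • Functions of $Y$, such as the coefficients $a(Y),b(Y)$ in the linearity property, are treated as constants given $Y$, since after observing the value of $Y$, they cannot change.
  • Each result also holds with $Y$ replaced by a random vector $\mathbf{Y}$.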

The following theorem about conditional expectation is quite important.

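Theorem. (Law of total expectation) For each function $g$ and for each random variable $X$,

$$\mathbb{E}\big[\mathbb{E}[g(X)|Y]\big]=\mathbb{E}[g(X)].$$

Proof. For continuous $X,Y$ (the discrete case is similar, with sums in place of integrals),

$$\begin{aligned}\mathbb{E}\big[\mathbb{E}[g(X)|Y]\big]&=\int_{-\infty}^{\infty}\mathbb{E}[g(X)|Y=y]f_Y(y)\,dy\\&=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x)f_{X|Y}(x|y)f_Y(y)\,dx\,dy\\&=\int_{-\infty}^{\infty}g(x)\left(\int_{-\infty}^{\infty}f(x,y)\,dy\right)dx\\&=\int_{-\infty}^{\infty}g(x)f_X(x)\,dx=\mathbb{E}[g(X)].\end{aligned}$$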

 

Remark.

  • We can replace $g(X)$ by $X$ and get

$$\mathbb{E}\big[\mathbb{E}[X|Y]\big]=\mathbb{E}[X].$$

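Corollary. (Generalized law of total probability) For each event $A$,

$$\mathbb{P}(A)=\mathbb{E}_Y[\mathbb{P}(A|Y)].$$

Proof.

  • First, by the fundamental bridge between probability and expectation,

$$\mathbb{E}[\mathbf{1}\{A\}]=\mathbb{P}(A)\quad\text{and}\quad\mathbb{E}[\mathbf{1}\{A\}|Y]=\mathbb{P}(A|Y).$$
  • Then, using law of total expectation,

$$\mathbb{P}(A)=\mathbb{E}[\mathbf{1}\{A\}]=\mathbb{E}_Y\big[\mathbb{E}[\mathbf{1}\{A\}|Y]\big]=\mathbb{E}_Y[\mathbb{P}(A|Y)].$$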

 

Remark.

  • The expectation is taken with respect to $Y$, so we use the $\mathbb{E}_Y$ notation. Where needed, we will use similar subscripts to indicate the random variable with respect to which an expectation is taken.
  • We can replace $Y$ by $\mathbf{Y}$, which is a random vector.
  • If $Y$ is discrete, then the expanded form of the result is $\mathbb{P}(A)=\sum_y\mathbb{P}(A|Y=y)\mathbb{P}(Y=y)$ (discrete case for law of total probability).
  • If $Y$ is continuous, then the expanded form of the result is $\mathbb{P}(A)=\int_{-\infty}^{\infty}\mathbb{P}(A|Y=y)f_Y(y)\,dy$ (continuous case for law of total probability).
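
Both the law of total expectation and the discrete form of the law of total probability can be checked by direct enumeration. The following sketch uses a made-up joint pmf and the hypothetical event $A=\{X\ge 1\}$; the helper names are illustrative.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y); values are illustrative only.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8),
}
ys = sorted({y for _, y in joint})

def f_y(y):
    """Marginal pmf of Y."""
    return sum(p for (_, y0), p in joint.items() if y0 == y)

def cond_mean(g, y):
    """E[g(X) | Y = y] for the discrete joint pmf above."""
    return sum(g(x) * p for (x, y0), p in joint.items() if y0 == y) / f_y(y)

# Law of total expectation: E[E[X | Y]] = E[X].
total_exp = sum(cond_mean(lambda x: x, y) * f_y(y) for y in ys)
direct_exp = sum(x * p for (x, _), p in joint.items())

# Law of total probability for A = {X >= 1}, using the indicator of A as g.
indicator = lambda x: 1 if x >= 1 else 0
total_prob = sum(cond_mean(indicator, y) * f_y(y) for y in ys)
direct_prob = sum(p for (x, _), p in joint.items() if x >= 1)

print(total_exp, direct_exp, total_prob, direct_prob)
```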

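Corollary. (Expectation version of law of total probability) Suppose the sample space $\Omega=B_1\cup B_2\cup\dotsb$, in which the $B_i$'s are mutually exclusive with $\mathbb{P}(B_i)>0$. Then,

$$\mathbb{E}[X]=\sum_i\mathbb{E}[X|B_i]\,\mathbb{P}(B_i).$$

Proof. Define $Y=i$ if $B_i$ occurs, in which $i$ is a positive integer. Then,

$$\mathbb{E}[X]=\mathbb{E}\big[\mathbb{E}[X|Y]\big]=\sum_i\mathbb{E}[X|Y=i]\,\mathbb{P}(Y=i)=\sum_i\mathbb{E}[X|B_i]\,\mathbb{P}(B_i).$$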

 

Remark.

  • The number of events can be finite, as long as they are mutually exclusive and their union is the whole sample space.
  • If $X=\mathbf{1}\{A\}$ for some event $A$, it reduces to the law of total probability.

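Example. Let $X$ be the human height in m. A person is randomly selected from a population consisting of the same number of men and women. Given that the mean height of a man is 1.8 m, and that of a woman is 1.7 m, the mean height of the entire population is

$$\mathbb{E}[X]=\mathbb{E}[X|\text{man}]\,\mathbb{P}(\text{man})+\mathbb{E}[X|\text{woman}]\,\mathbb{P}(\text{woman})=1.8(0.5)+1.7(0.5)=1.75\text{ m}.$$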

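Corollary. (formula of expectation conditional on event) For each random variable $X$ and event $A$ with $\mathbb{P}(A)>0$,

$$\mathbb{E}[X|A]=\frac{\mathbb{E}[X\mathbf{1}\{A\}]}{\mathbb{P}(A)}.$$

Proof. By the formula of expectation computed by weighted average of conditional expectations,

$$\mathbb{E}[X\mathbf{1}\{A\}]=\mathbb{E}[X\mathbf{1}\{A\}|A]\,\mathbb{P}(A)+\mathbb{E}[X\mathbf{1}\{A\}|A^c]\,\mathbb{P}(A^c)=\mathbb{E}[X|A]\,\mathbb{P}(A)+0,$$
since $\mathbf{1}\{A\}=1$ on $A$ and $\mathbf{1}\{A\}=0$ on $A^c$, and the result follows if $\mathbb{P}(A)>0$.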

 

Remark.

  • If $X=\mathbf{1}\{B\}$ for some event $B$, it reduces to the definition of the conditional probability $\mathbb{P}(B|A)=\mathbb{P}(A\cap B)/\mathbb{P}(A)$ by the fundamental bridge between probability and expectation.
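
A quick sanity check of the formula $\mathbb{E}[X|A]=\mathbb{E}[X\mathbf{1}\{A\}]/\mathbb{P}(A)$, assuming for illustration that $X$ is the outcome of a fair six-sided die and $A$ is the event that the outcome is even (both choices are hypothetical):

```python
from fractions import Fraction as F

pmf = {k: F(1, 6) for k in range(1, 7)}  # fair six-sided die (illustrative)

def in_A(k):
    """Event A: the roll is even."""
    return k % 2 == 0

p_A = sum(p for k, p in pmf.items() if in_A(k))                # P(A)
e_x_indicator = sum(k * p for k, p in pmf.items() if in_A(k))  # E[X 1{A}]
via_formula = e_x_indicator / p_A

# Direct route: renormalize the pmf on A, then take the mean.
cond_pmf = {k: p / p_A for k, p in pmf.items() if in_A(k)}
direct = sum(k * p for k, p in cond_pmf.items())

print(via_formula, direct)
```

Both routes give the mean of the even faces $2,4,6$.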

After defining conditional expectation, we can also have conditional variance, covariance and correlation coefficient, since variance, covariance, and correlation coefficient are built upon expectation.

Conditional expectations of bivariate normal distribution

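Proposition. (Conditional expectations of bivariate normal distribution) Let $(X,Y)^T\sim\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$. Then,

$$\mathbb{E}[X|Y]=\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(Y-\mu_Y)\quad\text{and}\quad\mathbb{E}[Y|X]=\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(X-\mu_X).$$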

Proof.

  • The result follows from the proposition about conditional distributions of bivariate normal distribution readily.

 

Conditional variance

Definition. (Conditional variance) The conditional variance of random variable $X$ given $Y$ is

$$\operatorname{Var}(X|Y)=\mathbb{E}\big[(X-\mathbb{E}[X|Y])^2\,\big|\,Y\big].$$

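Similarly, conditional variance has properties analogous to those of variance.

Proposition. (Properties of conditional variance) For each random variable $X$,

  • (alternative formula of conditional variance) $\operatorname{Var}(X|Y)=\mathbb{E}[X^2|Y]-(\mathbb{E}[X|Y])^2$
  • (invariance under change in location parameter) $\operatorname{Var}(X+a(Y)|Y)=\operatorname{Var}(X|Y)$ for each function $a(Y)$ of $Y$
  • (homogeneity of degree two) $\operatorname{Var}(a(Y)X|Y)=a(Y)^2\operatorname{Var}(X|Y)$ for each function $a(Y)$ of $Y$
  • (nonnegativity) $\operatorname{Var}(X|Y)\ge 0$
  • (zero variance implies non-randomness) $\operatorname{Var}(X|Y)=0\implies\mathbb{P}(X=a(Y)|Y)=1$ for some function $a(Y)$ of $Y$
  • (additivity under independence) if $X_1,\dotsc,X_n$ are conditionally independent given $Y$, $\operatorname{Var}(X_1+\dotsb+X_n|Y)=\operatorname{Var}(X_1|Y)+\dotsb+\operatorname{Var}(X_n|Y)$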

Proof. The proof is similar to the one for properties of variance.

 

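Besides the law of total expectation, we also have the law of total variance, as follows:

Proposition. (Law of total variance) For each random variable $X$,

$$\operatorname{Var}(X)=\mathbb{E}[\operatorname{Var}(X|Y)]+\operatorname{Var}(\mathbb{E}[X|Y]).$$

Proof. Using the law of total expectation and the alternative formula of conditional variance,

$$\begin{aligned}\operatorname{Var}(X)&=\mathbb{E}[X^2]-(\mathbb{E}[X])^2\\&=\mathbb{E}\big[\mathbb{E}[X^2|Y]\big]-\big(\mathbb{E}\big[\mathbb{E}[X|Y]\big]\big)^2\\&=\mathbb{E}\big[\operatorname{Var}(X|Y)+(\mathbb{E}[X|Y])^2\big]-\big(\mathbb{E}\big[\mathbb{E}[X|Y]\big]\big)^2\\&=\mathbb{E}[\operatorname{Var}(X|Y)]+\Big(\mathbb{E}\big[(\mathbb{E}[X|Y])^2\big]-\big(\mathbb{E}\big[\mathbb{E}[X|Y]\big]\big)^2\Big)\\&=\mathbb{E}[\operatorname{Var}(X|Y)]+\operatorname{Var}(\mathbb{E}[X|Y]).\end{aligned}$$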

 

Remark.

  • We can replace $Y$ by $\mathbf{Y}$, a random vector.
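
The decomposition $\operatorname{Var}(X)=\mathbb{E}[\operatorname{Var}(X|Y)]+\operatorname{Var}(\mathbb{E}[X|Y])$ can be verified exactly for a small discrete example. The joint pmf and helper names below are hypothetical and serve only as an illustration.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y); values are illustrative only.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8),
}
ys = sorted({y for _, y in joint})

def f_y(y):
    """Marginal pmf of Y."""
    return sum(p for (_, y0), p in joint.items() if y0 == y)

def cond_mean(g, y):
    """E[g(X) | Y = y]."""
    return sum(g(x) * p for (x, y0), p in joint.items() if y0 == y) / f_y(y)

def cond_var(y):
    """Var(X | Y = y) = E[X^2 | Y = y] - (E[X | Y = y])^2."""
    return cond_mean(lambda x: x * x, y) - cond_mean(lambda x: x, y) ** 2

# Left-hand side: the unconditional variance of X.
mean_x = sum(x * p for (x, _), p in joint.items())
var_x = sum(x * x * p for (x, _), p in joint.items()) - mean_x ** 2

# Right-hand side: E[Var(X|Y)] + Var(E[X|Y]), both computed over the pmf of Y.
expected_cond_var = sum(cond_var(y) * f_y(y) for y in ys)
mean_of_cond_mean = sum(cond_mean(lambda x: x, y) * f_y(y) for y in ys)
var_of_cond_mean = (
    sum(cond_mean(lambda x: x, y) ** 2 * f_y(y) for y in ys) - mean_of_cond_mean ** 2
)

print(var_x, expected_cond_var, var_of_cond_mean)
```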

Conditional variances of bivariate normal distribution

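Proposition. (Conditional variances of bivariate normal distribution) Let $(X,Y)^T\sim\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$. Then,

$$\operatorname{Var}(X|Y)=(1-\rho^2)\sigma_X^2\quad\text{and}\quad\operatorname{Var}(Y|X)=(1-\rho^2)\sigma_Y^2.$$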

Proof.

  • The result follows from the proposition about conditional distributions of bivariate normal distribution readily.

 

Remark.

  • It can be observed that the exact values of $Y$ and $X$ in the conditions do not matter: the result is the same for every value of them.


Conditional covariance

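Definition. (Conditional covariance) The conditional covariance of $X$ and $Y$ given $Z$ is

$$\operatorname{Cov}(X,Y|Z)=\mathbb{E}\big[(X-\mathbb{E}[X|Z])(Y-\mathbb{E}[Y|Z])\,\big|\,Z\big].$$

Proposition. (Properties of conditional covariance)

(i) (symmetry) for all random variables $X,Y$,

$$\operatorname{Cov}(X,Y|Z)=\operatorname{Cov}(Y,X|Z)$$
(ii) for each random variable $X$,
$$\operatorname{Cov}(X,X|Z)=\operatorname{Var}(X|Z)$$
(iii) (alternative formula of covariance)
$$\operatorname{Cov}(X,Y|Z)=\mathbb{E}[XY|Z]-\mathbb{E}[X|Z]\,\mathbb{E}[Y|Z]$$
(iv) for all constants $a,b,c,d$, and for all random variables $X,Y$,
$$\operatorname{Cov}(aX+b,cY+d|Z)=ac\operatorname{Cov}(X,Y|Z)$$
(v) for all random variables $X,Y,W$,
$$\operatorname{Cov}(X+Y,W|Z)=\operatorname{Cov}(X,W|Z)+\operatorname{Cov}(Y,W|Z)$$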


Conditional correlation coefficient

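Definition. (Conditional correlation coefficient) The conditional correlation coefficient of random variables $X$ and $Y$ given $Z$ is

$$\rho(X,Y|Z)=\frac{\operatorname{Cov}(X,Y|Z)}{\sqrt{\operatorname{Var}(X|Z)\operatorname{Var}(Y|Z)}}.$$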

Remark.

  • Like the 'unconditional' correlation coefficient, the conditional correlation coefficient lies between $-1$ and $1$ inclusive. The proof is similar, with every unconditional term replaced by its conditional counterpart.


Conditional quantile

Definition. (Conditional quantile) The conditional $p$th quantile of $X$ given $Y=y$, in which $0<p<1$, is

$$\inf\{x:F_{X|Y}(x|y)\ge p\}.$$

Remark.

  • Then, we can have the conditional median, interquartile range, etc., which are defined using conditional quantiles in the same way as the unconditional ones.
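
For a discrete example, a conditional quantile can be computed by accumulating the conditional pmf until the conditional cdf first reaches the target level. The sketch below assumes the common convention that the $p$th quantile is the smallest $x$ with $F_{X|Y}(x|y)\ge p$; the joint pmf and the helper name `cond_quantile` are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical joint pmf f(x, y); values are illustrative only.
joint = {
    (0, 0): F(1, 8), (1, 0): F(1, 8), (2, 0): F(1, 4),
    (0, 1): F(1, 4), (1, 1): F(1, 8), (2, 1): F(1, 8),
}

def cond_quantile(joint, y, p):
    """Smallest x with F_{X|Y}(x|y) >= p (assumed convention), for 0 < p <= 1."""
    fy = sum(q for (_, y0), q in joint.items() if y0 == y)
    cdf = 0
    for x in sorted(x for (x, y0) in joint if y0 == y):
        cdf += joint[(x, y)] / fy  # conditional cdf evaluated at x
        if cdf >= p:
            return x
    raise ValueError("no quantile found: check that 0 < p <= 1 and that f_Y(y) > 0")

median_given_0 = cond_quantile(joint, 0, F(1, 2))  # conditional median of X given Y = 0
median_given_1 = cond_quantile(joint, 1, F(1, 2))  # conditional median of X given Y = 1
upper_quartile_given_0 = cond_quantile(joint, 0, F(3, 4))
print(median_given_0, median_given_1, upper_quartile_given_0)
```

The conditional medians differ between $Y=0$ and $Y=1$ here because the two conditional pmfs place their mass differently.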