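Recall the definition of conditional probability:
$$\mathbb{P}(A\mid B)=\frac{\mathbb{P}(A\cap B)}{\mathbb{P}(B)},$$
in which $A,B$ are events, with $\mathbb{P}(B)>0$.
Applying this definition to discrete random variables $X,Y$, we have
$$\mathbb{P}(X=x\mid Y=y)=\frac{\mathbb{P}(X=x\cap Y=y)}{\mathbb{P}(Y=y)}=\frac{f(x,y)}{f_Y(y)},$$
where $f$ is the joint pmf of $X$ and $Y$, and $f_Y$ is the marginal pmf of $Y$.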
It is natural to call such a conditional probability a conditional pmf.
We will denote it by $f_{X\mid Y}(x\mid y)$.
This is basically the definition of the conditional pmf: the conditional pmf of $X$ given $Y=y$ is the conditional probability $\mathbb{P}(X=x\mid Y=y)$.
Naturally, we expect the conditional pdf to be defined similarly. This is indeed the case:
Definition.
(Conditional probability function)
Let $X,Y$ be random variables that are both discrete or both continuous.
The conditional probability (mass or density) function of $X$ given $Y=y$, in which $y$ is a real number with $f_Y(y)>0$, is
$$f_{X\mid Y}(x\mid y)=\frac{f(x,y)}{f_Y(y)}.$$
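For the discrete case, the definition $f_{X\mid Y}(x\mid y)=f(x,y)/f_Y(y)$ can be sketched in a few lines. The joint pmf table and the helper names `marginal_y` and `conditional_pmf` below are made up for illustration and are not part of the text.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y) of two discrete random variables X and Y,
# stored as {(x, y): probability}.
joint = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(3, 8),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 4),
}

def marginal_y(joint, y):
    """Marginal pmf f_Y(y): sum the joint pmf over all x."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def conditional_pmf(joint, y):
    """Conditional pmf x -> f_{X|Y}(x|y) = f(x, y) / f_Y(y)."""
    fy = marginal_y(joint, y)
    if fy == 0:
        raise ValueError("f_Y(y) = 0, so the conditional pmf is undefined")
    return {x: p / fy for (x, yy), p in joint.items() if yy == y}

cond = conditional_pmf(joint, 0)
print(cond)
```

Dividing by the marginal is exactly what makes the values of `cond` sum to one.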
Remark.
The marginal pdf $f_Y(y)$ can be interpreted as a normalizing constant, which makes the integral $\int_{-\infty}^{\infty}f_{X\mid Y}(x\mid y)\,dx=1$, since $\int_{-\infty}^{\infty}f(x,y)\,dx=f_Y(y)$. Here we integrate over the region in which $y$ is fixed (the region in which the condition $Y=y$ is satisfied), so we only integrate over the corresponding interval of $x$ ($x$ is still a variable).
This is similar to the denominator in the definition of conditional probability, which makes the conditional probability of the whole sample space equal to one, as the probability axioms require.
To understand the definition more intuitively for the continuous case, picture the surface of the joint pdf $f(x,y)$ above the $xy$-plane.
When we condition on $Y=y$, we take a "slice" out of the region under the joint pdf,
and the area of the whole "slice" is the area
between the curve $x\mapsto f(x,y)$ (the joint pdf with $y$ fixed and $x$ variable)
and the $x$-axis.
This area is given by $\int_{-\infty}^{\infty}f(x,y)\,dx=f_Y(y)$,
while according to the probability axioms, the area under a pdf should equal 1.
Hence, we rescale the area of the "slice" by a factor of $1/f_Y(y)$, by dividing $f(x,y)$ by $f_Y(y)$.
After that, the curve at the top of the scaled "slice" is the graph of the conditional pdf $f_{X\mid Y}(x\mid y)$.
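The "slice" picture can be checked numerically. The sketch below assumes a hypothetical joint pdf $f(x,y)=x+y$ on the unit square (chosen only because its marginal $f_Y(y)=y+\tfrac12$ is easy to verify by hand) and uses a simple midpoint Riemann sum. The names `joint_pdf` and `integrate` are illustrative.

```python
def joint_pdf(x, y):
    """Hypothetical joint pdf f(x, y) = x + y on the unit square, 0 elsewhere."""
    return x + y if 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0 else 0.0

def integrate(g, a, b, n=10_000):
    """Midpoint Riemann sum of g over [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(g(a + (i + 0.5) * h) for i in range(n))

y = 0.3
# Area of the slice at height y: this is the marginal f_Y(y) = y + 1/2.
slice_area = integrate(lambda x: joint_pdf(x, y), 0.0, 1.0)
# Area under the rescaled slice, i.e. under the conditional pdf f_{X|Y}(x|y).
scaled_area = integrate(lambda x: joint_pdf(x, y) / slice_area, 0.0, 1.0)

print(slice_area, scaled_area)
```

The first number approximates $f_Y(0.3)=0.8$ and the second approximates 1, which is the normalization described above.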
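So far, we have discussed the case where both random variables are discrete or both are continuous.
How about the case where one of them is discrete and the other is continuous?
In this case, there is no "joint probability function" of these two random variables, since one is discrete and the other is continuous!
But we can still define the conditional probability function in another way.
To motivate the following definition, let $F_{X\mid Y}(x\mid y)$ be the conditional probability $\mathbb{P}(X\le x\mid Y=y)$.
Then, differentiating $F_{X\mid Y}(x\mid y)$ with respect to $x$ should yield the conditional pdf $f_{X\mid Y}(x\mid y)$.
So, we have
$$\begin{aligned}
f_{X\mid Y}(x\mid y)&=\frac{\partial}{\partial x}F_{X\mid Y}(x\mid y)
=\lim_{\Delta x\to 0}\frac{\mathbb{P}(x<X\le x+\Delta x\mid Y=y)}{\Delta x}\\
&=\lim_{\Delta x\to 0}\frac{\mathbb{P}(Y=y\mid x<X\le x+\Delta x)\,\mathbb{P}(x<X\le x+\Delta x)}{\mathbb{P}(Y=y)\,\Delta x}
=\frac{\mathbb{P}(Y=y\mid X=x)\,f_X(x)}{\mathbb{P}(Y=y)}.
\end{aligned}$$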
Thus, it is natural to have the following definition.
Definition.
(Conditional probability density function when $X$ is continuous and $Y$ is discrete)
Let $X$ be a continuous random variable and $Y$ be a discrete random variable.
The conditional probability density function of $X$ given $Y=y$, where $y$ is a real number, is
$$f_{X\mid Y}(x\mid y)=\frac{\mathbb{P}(Y=y\mid X=x)\,f_X(x)}{\mathbb{P}(Y=y)}.$$
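Now, how about the case where $X$ is discrete and $Y$ is continuous?
In this case, let us use the above definition as motivation. However, we should interchange $X$ and $Y$ so that its assumptions are still satisfied.
Then, we get
$$f_{Y\mid X}(y\mid x)=\frac{\mathbb{P}(X=x\mid Y=y)\,f_Y(y)}{\mathbb{P}(X=x)}.$$
In this case, $X$ is discrete, so it is natural to define the conditional pmf of $X$ given $Y=y$ as $\mathbb{P}(X=x\mid Y=y)$ in the expression.
Now, after rearranging the terms, we get
$$\mathbb{P}(X=x\mid Y=y)=\frac{f_{Y\mid X}(y\mid x)\,\mathbb{P}(X=x)}{f_Y(y)}.$$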
Thus, we have the following definition.
Definition.
(Conditional probability mass function when $X$ is discrete and $Y$ is continuous)
Let $X$ be a discrete random variable and $Y$ be a continuous random variable.
The conditional probability mass function of $X$ given $Y=y$, where $y$ is a real number, is
$$f_{X\mid Y}(x\mid y)=\frac{f_{Y\mid X}(y\mid x)\,\mathbb{P}(X=x)}{f_Y(y)}.$$
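To see the mixed case at work, here is a minimal sketch under an assumed model that is not from the text: $X\sim\operatorname{Bernoulli}(1/2)$ is discrete and, given $X=x$, $Y$ is normal with mean $x$ and unit variance. The marginal pdf of $Y$ is obtained by summing $f_{Y\mid X}(y\mid x)\,\mathbb{P}(X=x)$ over $x$. All function names are illustrative.

```python
import math

def normal_pdf(y, mean, sd):
    """Density of the normal distribution with the given mean and standard deviation."""
    return math.exp(-0.5 * ((y - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

# Hypothetical model: P(X = 0) = P(X = 1) = 1/2 and, given X = x, Y ~ N(x, 1).
p_x = {0: 0.5, 1: 0.5}

def f_y_given_x(y, x):
    """Conditional pdf f_{Y|X}(y|x) of the continuous variable given the discrete one."""
    return normal_pdf(y, float(x), 1.0)

def f_y(y):
    """Marginal pdf f_Y(y) = sum over x of f_{Y|X}(y|x) P(X = x)."""
    return sum(f_y_given_x(y, x) * p for x, p in p_x.items())

def pmf_x_given_y(x, y):
    """Conditional pmf f_{X|Y}(x|y) = f_{Y|X}(y|x) P(X = x) / f_Y(y)."""
    return f_y_given_x(y, x) * p_x[x] / f_y(y)

print(pmf_x_given_y(0, 0.5), pmf_x_given_y(1, 2.0))
```

Observing $y=0.5$, which is equally far from both means, leaves the two values of $X$ equally likely, while a larger $y$ shifts the conditional pmf toward $x=1$.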
Based on the definitions of conditional probability functions, it is natural to define the conditional cdf as follows.
Definition.
(Conditional cumulative distribution function)
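Let $X,Y$ be discrete or continuous random variables.
The conditional cumulative distribution function (cdf) of $X$ given $Y=y$, in which $y$ is a real number, is
$$F_{X\mid Y}(x\mid y)=\mathbb{P}(X\le x\mid Y=y)=\begin{cases}\displaystyle\sum_{u\le x}f_{X\mid Y}(u\mid y),&X\text{ is discrete};\\[2ex]\displaystyle\int_{-\infty}^{x}f_{X\mid Y}(u\mid y)\,du,&X\text{ is continuous}.\end{cases}$$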
Remark.
We should be aware that when $Y$ is continuous, the event $\{Y=y\}$ has probability zero. So, according to the definition of conditional probability, the conditional cdf in this case should be undefined. However, in this context, we still define the conditional probability $\mathbb{P}(X\le x\mid Y=y)$ by the expression in the definition, which makes sense and is well defined.
Graphical illustration of the definition (continuous random variables):
Top view:
|
|
*---------------*
| |
| |
fixed y *=========@=====* <--- corresponding interval
| x |
| |
*---------------*
|
*----------------
Side view:
*
/ \
*\ * /
/|#\ \
| / |##\ / *---------*
| * |###\ /\
| |\ |##/#\----------/--\
| | \|#/###*--------* /
| | \/######### / \ /
| |y *\========@==/===*
| | / *-------x-* /
| |/ \ /
| *----------------*
|/
*------------------------- x
Front view:
|
|
|
*\
|#\
|##\
|###\
|####\ <------------- Area: f_Y(y)
|#####*--------*
|########### \
*==========@=====*--------------
x
*---*
|###| : the desired region from the cross section from joint pdf, whose area is the probability from the cdf
*---*
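If $Y=\mathbf{1}\{A\}$ for some event $A$,
we have some special notations for simplicity:
the conditional probability function of $X$ given $A$ becomes $f_X(x\mid A)=f_{X\mid Y}(x\mid 1)$;
the conditional cdf of $X$ given $A$ becomes $F_X(x\mid A)=F_{X\mid Y}(x\mid 1)$.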
Proposition.
(Determining independence of two random variables)
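Random variables $X,Y$ are independent if and only if
$$f_{X\mid Y}(x\mid y)=f_X(x)$$
for each $x,y$ (with $f_Y(y)>0$, so that the conditional probability function is defined).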
Proof.
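Recall the definition of independence between two random variables:
$X,Y$ are independent if
$$f(x,y)=f_X(x)f_Y(y)$$
for each $x,y$.
Since
$$f(x,y)=f_X(x)f_Y(y)\iff\frac{f(x,y)}{f_Y(y)}=f_X(x)\iff f_{X\mid Y}(x\mid y)=f_X(x)$$
for each $x,y$ with $f_Y(y)>0$,
we have the desired result.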
Remark.
This is expected, since conditioning on an event should not affect the probability of another event that is independent of it.
We can extend the definition of conditional probability function and cdf to groups of random variables, for joint cdf's and joint probability functions, as follows:
Definition.
(Conditional joint probability function)
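Let $\mathbf{X}=(X_1,\dotsc,X_n)^T$ and $\mathbf{Y}=(Y_1,\dotsc,Y_m)^T$ be two random vectors.
The conditional joint probability function of $\mathbf{X}$ given $\mathbf{Y}=\mathbf{y}$ is
$$f_{\mathbf{X}\mid\mathbf{Y}}(\mathbf{x}\mid\mathbf{y})=\frac{f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}.$$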
Then, we also have a similar proposition for determining independence of two random vectors.
Proposition.
(Determining independence of two random vectors)
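Random vectors $\mathbf{X},\mathbf{Y}$ are independent if and only if
$$f_{\mathbf{X}\mid\mathbf{Y}}(\mathbf{x}\mid\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})$$
for each $\mathbf{x},\mathbf{y}$ (with $f_{\mathbf{Y}}(\mathbf{y})>0$).
Proof.
The definition of independence between two random vectors is
$\mathbf{X},\mathbf{Y}$ are independent if
$$f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})f_{\mathbf{Y}}(\mathbf{y})$$
for each $\mathbf{x},\mathbf{y}$.
Since
$$f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})f_{\mathbf{Y}}(\mathbf{y})\iff\frac{f_{\mathbf{X},\mathbf{Y}}(\mathbf{x},\mathbf{y})}{f_{\mathbf{Y}}(\mathbf{y})}=f_{\mathbf{X}}(\mathbf{x})\iff f_{\mathbf{X}\mid\mathbf{Y}}(\mathbf{x}\mid\mathbf{y})=f_{\mathbf{X}}(\mathbf{x})$$
for each $\mathbf{x},\mathbf{y}$ with $f_{\mathbf{Y}}(\mathbf{y})>0$,
we have the desired result.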
Conditional distributions of bivariate normal distribution
Proposition.
(Conditional distributions of bivariate normal distribution)
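Let $(X,Y)^T\sim\mathcal{N}_2(\boldsymbol{\mu},\boldsymbol{\Sigma})$ with $\boldsymbol{\mu}=(\mu_X,\mu_Y)^T$ and $\boldsymbol{\Sigma}=\begin{pmatrix}\sigma_X^2&\rho\sigma_X\sigma_Y\\\rho\sigma_X\sigma_Y&\sigma_Y^2\end{pmatrix}$, in which $\sigma_X,\sigma_Y>0$ and $-1<\rho<1$.
Then,
$$X\mid Y=y\sim\mathcal{N}\left(\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\;\sigma_X^2(1-\rho^2)\right)\quad\text{and}\quad Y\mid X=x\sim\mathcal{N}\left(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\;\sigma_Y^2(1-\rho^2)\right)$$
(abuse of notations: when we say the distribution of "$X\mid Y=y$", we mean the conditional distribution of $X$ given $Y=y$).
Proof.
First, the conditional pdf
$$\begin{aligned}
f_{X\mid Y}(x\mid y)&=\frac{f(x,y)}{f_Y(y)}
=\frac{\dfrac{1}{2\pi\sigma_X\sigma_Y\sqrt{1-\rho^2}}\exp\left(-\dfrac{1}{2(1-\rho^2)}\left[\dfrac{(x-\mu_X)^2}{\sigma_X^2}-\dfrac{2\rho(x-\mu_X)(y-\mu_Y)}{\sigma_X\sigma_Y}+\dfrac{(y-\mu_Y)^2}{\sigma_Y^2}\right]\right)}{\dfrac{1}{\sqrt{2\pi}\,\sigma_Y}\exp\left(-\dfrac{(y-\mu_Y)^2}{2\sigma_Y^2}\right)}\\
&=\frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}}\exp\left(-\frac{1}{2(1-\rho^2)}\left[\frac{x-\mu_X}{\sigma_X}-\rho\,\frac{y-\mu_Y}{\sigma_Y}\right]^2\right)\\
&=\frac{1}{\sqrt{2\pi}\,\sigma_X\sqrt{1-\rho^2}}\exp\left(-\frac{\left(x-\mu_X-\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)\right)^2}{2\sigma_X^2(1-\rho^2)}\right).
\end{aligned}$$
Then, we can see that $X\mid Y=y\sim\mathcal{N}\left(\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y),\;\sigma_X^2(1-\rho^2)\right)$,
and by symmetry (interchanging $X$ and $Y$, and also interchanging $x$ and $y$), $Y\mid X=x\sim\mathcal{N}\left(\mu_Y+\rho\frac{\sigma_Y}{\sigma_X}(x-\mu_X),\;\sigma_Y^2(1-\rho^2)\right)$.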
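As a numerical sanity check of the conditional distribution of $X$ given $Y=y$ for a bivariate normal pair, the sketch below compares the ratio of the joint pdf to the marginal pdf of $Y$ with the pdf of a normal distribution having mean $\mu_X+\rho\frac{\sigma_X}{\sigma_Y}(y-\mu_Y)$ and variance $\sigma_X^2(1-\rho^2)$. The parameter values and the evaluation point are arbitrary choices for illustration.

```python
import math

def normal_pdf(t, mean, sd):
    """Density of the univariate normal distribution."""
    return math.exp(-0.5 * ((t - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))

def bvn_pdf(x, y, mx, my, sx, sy, rho):
    """Joint pdf of a bivariate normal with means mx, my, standard deviations
    sx, sy and correlation coefficient rho."""
    a = (x - mx) / sx
    b = (y - my) / sy
    q = (a * a - 2.0 * rho * a * b + b * b) / (1.0 - rho * rho)
    return math.exp(-0.5 * q) / (2.0 * math.pi * sx * sy * math.sqrt(1.0 - rho * rho))

# Arbitrary illustrative parameters and evaluation point.
mx, my, sx, sy, rho = 1.0, -2.0, 2.0, 0.5, 0.6
x, y = 0.7, -1.5

# Left side: definition of the conditional pdf, f(x, y) / f_Y(y).
lhs = bvn_pdf(x, y, mx, my, sx, sy, rho) / normal_pdf(y, my, sy)

# Right side: normal pdf with the conditional mean and standard deviation.
cond_mean = mx + rho * sx / sy * (y - my)
cond_sd = sx * math.sqrt(1.0 - rho * rho)
rhs = normal_pdf(x, cond_mean, cond_sd)

print(lhs, rhs)
```

The two printed values agree up to floating-point rounding, as the algebra in the proof predicts.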
We can obtain conditional versions of concepts previously established for 'unconditional' distributions
by substituting the 'unconditional' cdf, pdf or pmf, i.e. $F(\cdot)$ or $f(\cdot)$,
with their conditional counterparts, i.e. $F(\cdot\mid\cdot)$ or $f(\cdot\mid\cdot)$.
Definition.
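Random variables $X_1,\dotsc,X_n$ are conditionally independent given $Y$ if and only if
$$F_{X_1,\dotsc,X_n\mid Y}(x_1,\dotsc,x_n\mid y)=F_{X_1\mid Y}(x_1\mid y)\dotsm F_{X_n\mid Y}(x_n\mid y)$$
or
$$f_{X_1,\dotsc,X_n\mid Y}(x_1,\dotsc,x_n\mid y)=f_{X_1\mid Y}(x_1\mid y)\dotsm f_{X_n\mid Y}(x_n\mid y)$$
for each real number $x_1,\dotsc,x_n,y$ and for each positive integer $n$, in which
$F_{X_1,\dotsc,X_n\mid Y}$ and $f_{X_1,\dotsc,X_n\mid Y}$
denote the joint cdf and joint probability function of $X_1,\dotsc,X_n$ conditional on $Y$ respectively.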
Remark.
For random variables, conditional independence and independence are not related, i.e. neither of them implies the other.
Example.
(Conditional independence does not imply independence)
TODO
Example.
(Independence does not imply conditional independence)
TODO
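One possible pair of illustrations, not taken from the text, can be checked by brute force on small joint pmfs. In the first hypothetical model, $Y$ is a fair coin and, given $Y$, $X_1,X_2$ are independent Bernoulli variables whose success probability depends on $Y$. In the second, $X_1,X_2$ are independent fair coins and $Y=X_1+X_2$. Outcomes are stored as tuples $(x_1,x_2,y)$, and all helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

def prob(pmf, event):
    """Probability of an event, given as a predicate on outcomes."""
    return sum(p for outcome, p in pmf.items() if event(outcome))

def independent(pmf, i, j):
    """Check P(Xi = a, Xj = b) = P(Xi = a) P(Xj = b) for all values a, b."""
    vals_i = {o[i] for o in pmf}
    vals_j = {o[j] for o in pmf}
    return all(
        prob(pmf, lambda o: o[i] == a and o[j] == b)
        == prob(pmf, lambda o: o[i] == a) * prob(pmf, lambda o: o[j] == b)
        for a, b in product(vals_i, vals_j)
    )

def cond_independent(pmf, i, j, k):
    """Check independence of Xi and Xj under each conditional pmf given Xk = c."""
    for c in {o[k] for o in pmf}:
        pc = prob(pmf, lambda o: o[k] == c)
        cond = {o: p / pc for o, p in pmf.items() if o[k] == c}
        if not independent(cond, i, j):
            return False
    return True

# Model 1: Y is a fair coin; given Y = 0 (resp. 1), X1 and X2 are independent
# Bernoulli(1/4) (resp. Bernoulli(3/4)) random variables.
mixture = {}
for y, q in ((0, Fraction(1, 4)), (1, Fraction(3, 4))):
    for x1, x2 in product((0, 1), repeat=2):
        p1 = q if x1 == 1 else 1 - q
        p2 = q if x2 == 1 else 1 - q
        mixture[(x1, x2, y)] = Fraction(1, 2) * p1 * p2

# Model 2: X1, X2 are independent fair coins and Y = X1 + X2.
coins = {(x1, x2, x1 + x2): Fraction(1, 4) for x1, x2 in product((0, 1), repeat=2)}

print(cond_independent(mixture, 0, 1, 2), independent(mixture, 0, 1))
print(independent(coins, 0, 1), cond_independent(coins, 0, 1, 2))
```

In the first model, $X_1,X_2$ are conditionally independent given $Y$ but not independent, since both carry information about $Y$. In the second, they are independent, but knowing $Y=1$ forces $X_2=1-X_1$.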
Definition.
(Conditional expectation)
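Let $f_{X\mid Y}(x\mid y)$ be the conditional probability function of $X$ given $Y=y$. Then,
$$\mathbb{E}[X\mid Y=y]=\begin{cases}\displaystyle\sum_{x}x\,f_{X\mid Y}(x\mid y),&X\text{ is discrete};\\[2ex]\displaystyle\int_{-\infty}^{\infty}x\,f_{X\mid Y}(x\mid y)\,dx,&X\text{ is continuous}.\end{cases}$$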
Remark.
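$\mathbb{E}[X\mid Y=y]$ is a function of $y$, say $h(y)$.
The random variable $h(Y)$, which is a function of $Y$ after computing the expectation, is written as $\mathbb{E}[X\mid Y]$ for brevity, in which the $h$'s are the same function.
$\mathbb{E}[X\mid Y=y]=h(y)$ is a realization of $\mathbb{E}[X\mid Y]$ when $Y$ is observed to be $y$.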
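Similarly, we have a conditional version of the law of the unconscious statistician.
Proposition.
(Law of the unconscious statistician (conditional version))
Let $f_{X\mid Y}(x\mid y)$ be the conditional probability function of $X$ given $Y=y$. Then, for each function $g$,
$$\mathbb{E}[g(X)\mid Y=y]=\begin{cases}\displaystyle\sum_{x}g(x)\,f_{X\mid Y}(x\mid y),&X\text{ is discrete};\\[2ex]\displaystyle\int_{-\infty}^{\infty}g(x)\,f_{X\mid Y}(x\mid y)\,dx,&X\text{ is continuous}.\end{cases}$$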
Proposition.
(Conditional expectation under independence)
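If random variables $X,Y$ are independent,
$$\mathbb{E}[g(X)\mid Y]=\mathbb{E}[g(X)]$$
for each function $g$.
Proof.
For the continuous case (the discrete case is similar, with a sum in place of the integral), independence gives $f_{X\mid Y}(x\mid y)=f_X(x)$, so
$$\mathbb{E}[g(X)\mid Y=y]=\int_{-\infty}^{\infty}g(x)\,f_{X\mid Y}(x\mid y)\,dx=\int_{-\infty}^{\infty}g(x)\,f_X(x)\,dx=\mathbb{E}[g(X)]$$
for each $y$.
Remark.
This equality may not hold if $X,Y$ are not independent.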
Example.
Suppose random vector in which are independent random variables,
and .
Then,
( is treated as constant, because of the conditioning: it is constant after realization of )
but
The properties of $\mathbb{E}[\cdot]$ still hold for conditional expectations $\mathbb{E}[\cdot\mid Y]$, with every 'unconditional' expectation replaced by a conditional expectation and some suitable modifications, as follows:
Proposition.
(Properties of conditional expectation)
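For each random variable $Y$,
(linearity) $\mathbb{E}[a(Y)X+b(Y)Z\mid Y]=a(Y)\,\mathbb{E}[X\mid Y]+b(Y)\,\mathbb{E}[Z\mid Y]$
for all functions $a,b$ of $Y$ and for all random variables $X,Z$;
(nonnegativity) if $X\ge 0$, then $\mathbb{E}[X\mid Y]\ge 0$;
(monotonicity) if $X\ge Z$, then $\mathbb{E}[X\mid Y]\ge\mathbb{E}[Z\mid Y]$, for all random variables $X,Z$;
(triangle inequality) $\big|\mathbb{E}[X\mid Y]\big|\le\mathbb{E}\big[|X|\,\big|\,Y\big]$;
(multiplicativity under independence) if $X,Z$ are conditionally independent given $Y$, then $\mathbb{E}[XZ\mid Y]=\mathbb{E}[X\mid Y]\,\mathbb{E}[Z\mid Y]$.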
Proof.
The proof is similar to the one for 'unconditional' expectations.
Remark.
The functions of $Y$ in the linearity property, $a(Y)$ and $b(Y)$, are treated as constants given $Y$, since after observing the value of $Y$, they cannot be changed.
Each result also holds with $Y$ replaced by a random vector $\mathbf{Y}$.
The following theorem about conditional expectation is quite important.
Theorem.
(Law of total expectation)
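For each function $g$ and for all random variables $X,Y$,
$$\mathbb{E}[g(X)]=\mathbb{E}\big[\mathbb{E}[g(X)\mid Y]\big].$$
Proof.
For the continuous case (the discrete case is similar, with sums in place of integrals),
$$\begin{aligned}
\mathbb{E}\big[\mathbb{E}[g(X)\mid Y]\big]&=\int_{-\infty}^{\infty}\mathbb{E}[g(X)\mid Y=y]\,f_Y(y)\,dy
=\int_{-\infty}^{\infty}\int_{-\infty}^{\infty}g(x)\,f_{X\mid Y}(x\mid y)\,f_Y(y)\,dx\,dy\\
&=\int_{-\infty}^{\infty}g(x)\int_{-\infty}^{\infty}f(x,y)\,dy\,dx
=\int_{-\infty}^{\infty}g(x)\,f_X(x)\,dx=\mathbb{E}[g(X)].
\end{aligned}$$
Remark.
We can replace $g(X)$ by $X$ and get $\mathbb{E}[X]=\mathbb{E}\big[\mathbb{E}[X\mid Y]\big]$.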
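The law of total expectation can be verified exactly on a small discrete example. The joint pmf below and the choice $g(x)=x^2$ are hypothetical, and the helper names are illustrative: the iterated expectation $\sum_y\mathbb{E}[g(X)\mid Y=y]\,f_Y(y)$ is compared with the direct computation of $\mathbb{E}[g(X)]$.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y), stored as {(x, y): probability}.
joint = {
    (1, 0): Fraction(1, 10), (2, 0): Fraction(3, 10),
    (1, 1): Fraction(2, 5), (3, 1): Fraction(1, 5),
}

def marginal_y(joint, y):
    """Marginal pmf f_Y(y)."""
    return sum(p for (_, yy), p in joint.items() if yy == y)

def cond_expectation(joint, g, y):
    """E[g(X) | Y = y] = sum over x of g(x) f(x, y) / f_Y(y)."""
    fy = marginal_y(joint, y)
    return sum(g(x) * p for (x, yy), p in joint.items() if yy == y) / fy

def iterated_expectation(joint, g):
    """E[E[g(X) | Y]] = sum over y of E[g(X) | Y = y] f_Y(y)."""
    ys = {y for (_, y) in joint}
    return sum(cond_expectation(joint, g, y) * marginal_y(joint, y) for y in ys)

def g(x):
    return x * x

direct = sum(g(x) * p for (x, _), p in joint.items())  # E[g(X)] from the joint pmf
iterated = iterated_expectation(joint, g)

print(direct, iterated)
```

Both computations give $7/2$, and exact rational arithmetic avoids any rounding question.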
Corollary.
(Generalized law of total probability)
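For each event $A$,
$$\mathbb{P}(A)=\mathbb{E}_Y\big[\mathbb{P}(A\mid Y)\big].$$
Proof.
First, by the fundamental bridge between probability and expectation,
$$\mathbb{P}(A)=\mathbb{E}[\mathbf{1}\{A\}].$$
Then, using the law of total expectation,
$$\mathbb{E}[\mathbf{1}\{A\}]=\mathbb{E}_Y\big[\mathbb{E}[\mathbf{1}\{A\}\mid Y]\big]=\mathbb{E}_Y\big[\mathbb{P}(A\mid Y)\big].$$
Remark.
The expectation is taken with respect to $Y$, so we use the $\mathbb{E}_Y$ notation. We will use similar notations to denote the random variable with respect to which an expectation is taken, if needed.
We can replace $Y$ by $\mathbf{Y}$, which is a random vector.
If $Y$ is discrete, then the expanded form of the result is $\mathbb{P}(A)=\sum_{y}\mathbb{P}(A\mid Y=y)\,\mathbb{P}(Y=y)$ (discrete case of the law of total probability).
If $Y$ is continuous, then the expanded form of the result is $\mathbb{P}(A)=\int_{-\infty}^{\infty}\mathbb{P}(A\mid Y=y)\,f_Y(y)\,dy$ (continuous case of the law of total probability).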
Corollary.
(Expectation version of law of total probability)
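Suppose the sample space $\Omega=B_1\cup B_2\cup\dotsb$, in which the $B_i$'s are mutually exclusive.
Then,
$$\mathbb{E}[X]=\sum_{i}\mathbb{E}[X\mid B_i]\,\mathbb{P}(B_i).$$
Proof.
Define $Y=i$ if $B_i$ occurs, in which $i$ is a positive integer. Then,
$$\mathbb{E}[X]=\mathbb{E}\big[\mathbb{E}[X\mid Y]\big]=\sum_{i}\mathbb{E}[X\mid Y=i]\,\mathbb{P}(Y=i)=\sum_{i}\mathbb{E}[X\mid B_i]\,\mathbb{P}(B_i).$$
Remark.
The number of events can be finite, as long as they are mutually exclusive and their union is the whole sample space.
If $X=\mathbf{1}\{A\}$, it reduces to the law of total probability.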
Example.
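Let $X$ be the height (in m) of a person.
A person is randomly selected from a population consisting of the same number of men and women. Given that the mean height of a man is 1.8 m, and that of a woman is 1.7 m,
the mean height of the entire population is
$$\mathbb{E}[X]=\mathbb{E}[X\mid M]\,\mathbb{P}(M)+\mathbb{E}[X\mid W]\,\mathbb{P}(W)=1.8(0.5)+1.7(0.5)=1.75\text{ m},$$
in which $M$ and $W$ are the events that the selected person is a man and a woman respectively.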
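The weighted-average computation in this example can be written as a small helper. The function name `total_expectation` and the dictionary layout are illustrative choices; the numbers are the ones given in the example.

```python
import math

def total_expectation(cond_means, probs):
    """Weighted average: sum over groups of E[X | group] * P(group).

    The groups must be mutually exclusive and exhaustive, so their
    probabilities are required to sum to one.
    """
    if not math.isclose(sum(probs.values()), 1.0):
        raise ValueError("group probabilities must sum to 1")
    return sum(cond_means[group] * probs[group] for group in probs)

population_mean = total_expectation(
    cond_means={"man": 1.8, "woman": 1.7},  # mean heights in m
    probs={"man": 0.5, "woman": 0.5},       # equal numbers of men and women
)
print(population_mean)
```

With unequal group sizes, only the `probs` argument changes; the conditional means stay the same.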
Corollary.
(formula of expectation conditional on event)
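For each random variable $X$ and event $A$ with $\mathbb{P}(A)>0$,
$$\mathbb{E}[X\mid A]=\frac{\mathbb{E}[X\mathbf{1}\{A\}]}{\mathbb{P}(A)}.$$
Proof.
By the formula of expectation computed by weighted average of conditional expectations,
$$\mathbb{E}[X\mathbf{1}\{A\}]=\mathbb{E}[X\mathbf{1}\{A\}\mid A]\,\mathbb{P}(A)+\mathbb{E}[X\mathbf{1}\{A\}\mid A^c]\,\mathbb{P}(A^c)=\mathbb{E}[X\mid A]\,\mathbb{P}(A)+0\cdot\mathbb{P}(A^c)=\mathbb{E}[X\mid A]\,\mathbb{P}(A),$$
and the result follows by dividing both sides by $\mathbb{P}(A)$, which is possible since $\mathbb{P}(A)>0$.
Remark.
If $X=\mathbf{1}\{B\}$, it reduces to the definition of the conditional probability $\mathbb{P}(B\mid A)=\mathbb{P}(A\cap B)/\mathbb{P}(A)$, by the fundamental bridge between probability and expectation.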
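A quick check of the formula $\mathbb{E}[X\mid A]=\mathbb{E}[X\mathbf{1}\{A\}]/\mathbb{P}(A)$, using a hypothetical example that is not from the text: $X$ is the outcome of a fair six-sided die and $A$ is the event that the outcome is even.

```python
from fractions import Fraction

# Hypothetical example: X is a fair die roll, A = {X is even}.
pmf = {k: Fraction(1, 6) for k in range(1, 7)}

def in_A(k):
    return k % 2 == 0

p_A = sum(p for k, p in pmf.items() if in_A(k))              # P(A)
e_x_ind = sum(k * p for k, p in pmf.items() if in_A(k))      # E[X 1{A}]
via_formula = e_x_ind / p_A                                  # E[X 1{A}] / P(A)

# Direct computation from the conditional pmf of X given A.
cond_pmf = {k: p / p_A for k, p in pmf.items() if in_A(k)}
direct = sum(k * p for k, p in cond_pmf.items())

print(via_formula, direct)
```

Both routes give 4, the average of the even faces 2, 4 and 6.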
After defining conditional expectation, we can also have conditional variance, covariance and correlation coefficient, since variance, covariance, and correlation coefficient are built upon expectation.
Conditional expectations of bivariate normal distribution
Definition.
(Conditional covariance)
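The conditional covariance of $X$ and $Y$ given $Z$ is
$$\operatorname{Cov}(X,Y\mid Z)=\mathbb{E}\big[(X-\mathbb{E}[X\mid Z])(Y-\mathbb{E}[Y\mid Z])\,\big|\,Z\big].$$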
Proposition.
(Properties of conditional covariance)
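(i) (symmetry) for all random variables $X,Y,Z$, $\operatorname{Cov}(X,Y\mid Z)=\operatorname{Cov}(Y,X\mid Z)$;
(ii) for all random variables $X,Z$, $\operatorname{Var}(X\mid Z)=\operatorname{Cov}(X,X\mid Z)$;
(iii) (alternative formula of covariance) $\operatorname{Cov}(X,Y\mid Z)=\mathbb{E}[XY\mid Z]-\mathbb{E}[X\mid Z]\,\mathbb{E}[Y\mid Z]$;
(iv) $\operatorname{Cov}\left(\sum_{i=1}^{m}a_iX_i+c,\;\sum_{j=1}^{n}b_jY_j+d\;\middle|\;Z\right)=\sum_{i=1}^{m}\sum_{j=1}^{n}a_ib_j\operatorname{Cov}(X_i,Y_j\mid Z)$
for all constants $a_1,\dotsc,a_m,b_1,\dotsc,b_n,c,d$,
and for all random variables $X_1,\dotsc,X_m,Y_1,\dotsc,Y_n,Z$;
(v) for all random variables $X_1,\dotsc,X_n,Z$, $\operatorname{Var}\left(\sum_{i=1}^{n}X_i\;\middle|\;Z\right)=\sum_{i=1}^{n}\operatorname{Var}(X_i\mid Z)+2\sum_{i<j}\operatorname{Cov}(X_i,X_j\mid Z)$.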
Definition.
(Conditional correlation coefficient)
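The conditional correlation coefficient of random variables $X$ and $Y$ given $Z$ is
$$\rho(X,Y\mid Z)=\frac{\operatorname{Cov}(X,Y\mid Z)}{\sqrt{\operatorname{Var}(X\mid Z)\operatorname{Var}(Y\mid Z)}}.$$
Remark.
Similar to the 'unconditional' correlation coefficient, the conditional correlation coefficient also lies between $-1$ and $1$ inclusive. The proof is similar, with every unconditional term replaced by its conditional counterpart.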
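The conditional covariance and correlation coefficient can be computed directly from a joint pmf of $(X,Y,Z)$. In the hypothetical pmf below (chosen for illustration), $X$ and $Y$ move together when $Z=0$ and in opposite directions when $Z=1$. The sketch evaluates the defining formula and the alternative formula $\mathbb{E}[XY\mid Z]-\mathbb{E}[X\mid Z]\,\mathbb{E}[Y\mid Z]$ and finds that they agree.

```python
from fractions import Fraction
import math

q = Fraction(1, 4)
# Hypothetical joint pmf of (X, Y, Z), stored as {(x, y, z): probability}.
joint = {(0, 0, 0): q, (1, 1, 0): q, (0, 1, 1): q, (1, 0, 1): q}

def cond_expect(joint, g, z):
    """E[g(X, Y) | Z = z] computed from the joint pmf."""
    pz = sum(p for (_, _, zz), p in joint.items() if zz == z)
    return sum(g(x, y) * p for (x, y, zz), p in joint.items() if zz == z) / pz

def cond_cov(joint, z):
    """Cov(X, Y | Z = z) from the defining formula."""
    ex = cond_expect(joint, lambda x, y: x, z)
    ey = cond_expect(joint, lambda x, y: y, z)
    return cond_expect(joint, lambda x, y: (x - ex) * (y - ey), z)

def cond_cov_alt(joint, z):
    """Cov(X, Y | Z = z) = E[XY | Z = z] - E[X | Z = z] E[Y | Z = z]."""
    ex = cond_expect(joint, lambda x, y: x, z)
    ey = cond_expect(joint, lambda x, y: y, z)
    return cond_expect(joint, lambda x, y: x * y, z) - ex * ey

def cond_corr(joint, z):
    """Conditional correlation coefficient of X and Y given Z = z."""
    ex = cond_expect(joint, lambda x, y: x, z)
    ey = cond_expect(joint, lambda x, y: y, z)
    vx = cond_expect(joint, lambda x, y: (x - ex) ** 2, z)
    vy = cond_expect(joint, lambda x, y: (y - ey) ** 2, z)
    return float(cond_cov(joint, z)) / math.sqrt(vx * vy)

for z in (0, 1):
    print(z, cond_cov(joint, z), cond_cov_alt(joint, z), cond_corr(joint, z))
```

Here the conditional correlation attains both extreme values, $1$ given $Z=0$ and $-1$ given $Z=1$.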
Definition.
(Conditional quantile)
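The conditional $\alpha$th quantile of $X$ given $Y=y$ is
$$F_{X\mid Y}^{-1}(\alpha\mid y)=\inf\{x\in\mathbb{R}:F_{X\mid Y}(x\mid y)\ge\alpha\}.$$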
Remark.
Then, we can have the conditional median, conditional interquartile range, etc., which are defined using conditional quantiles in the same way as the unconditional ones.
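For a discrete $X$ with finitely many values, the conditional quantile, taken here as the smallest $x$ with $F_{X\mid Y}(x\mid y)\ge\alpha$, can be found by scanning the support in increasing order. The joint pmf and the helper names below are hypothetical, and the sketch assumes $0<\alpha\le 1$.

```python
from fractions import Fraction

# Hypothetical joint pmf f(x, y), stored as {(x, y): probability}.
joint = {
    (1, 0): Fraction(1, 10), (2, 0): Fraction(3, 10), (4, 0): Fraction(1, 10),
    (1, 1): Fraction(1, 4), (3, 1): Fraction(1, 4),
}

def conditional_cdf(joint, x, y):
    """F_{X|Y}(x|y): sum of f(u, y) over u <= x, divided by f_Y(y)."""
    fy = sum(p for (_, yy), p in joint.items() if yy == y)
    return sum(p for (u, yy), p in joint.items() if yy == y and u <= x) / fy

def conditional_quantile(joint, alpha, y):
    """Smallest support point x with F_{X|Y}(x|y) >= alpha, for 0 < alpha <= 1."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    for x in sorted(u for (u, yy) in joint if yy == y):
        if conditional_cdf(joint, x, y) >= alpha:
            return x
    raise ValueError("no support point found for the given y")

median_given_0 = conditional_quantile(joint, Fraction(1, 2), 0)
iqr_given_0 = (conditional_quantile(joint, Fraction(3, 4), 0)
               - conditional_quantile(joint, Fraction(1, 4), 0))
print(median_given_0, iqr_given_0)
```

Given $Y=0$, the conditional cdf takes the values $1/5$, $4/5$ and $1$ at $x=1,2,4$, so the conditional median and both quartiles equal 2 and the conditional interquartile range is 0.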