Probability/Transformation of Random Variables

Transformation of random variables

Underlying principle

Let $X_1,\dots,X_n$ be $n$ random variables, $Y_1,\dots,Y_n$ be another $n$ random variables, and $\mathbf{X}=(X_1,\dots,X_n)^T$, $\mathbf{Y}=(Y_1,\dots,Y_n)^T$ be random (column) vectors.

Suppose the vector-valued function[1] $g:\mathbb{R}^n\to\mathbb{R}^n$ is one-to-one. Then, its inverse $g^{-1}$ exists.

After that, we can transform $\mathbf{X}$ to $\mathbf{Y}$ by applying the transformation $g$, i.e. by $\mathbf{Y}=g(\mathbf{X})$, and transform $\mathbf{Y}$ to $\mathbf{X}$ by applying the inverse transformation $g^{-1}$, i.e. by $\mathbf{X}=g^{-1}(\mathbf{Y})$.

We are often interested in deriving the joint probability function $f_{\mathbf{Y}}$ of $\mathbf{Y}$, given the joint probability function $f_{\mathbf{X}}$ of $\mathbf{X}$. We will examine the discrete and continuous cases one by one in the following.

Transformation of discrete random variables

Proposition. (transformation of discrete random variables) For each discrete random vector $\mathbf{X}$ with joint pmf $f_{\mathbf{X}}$, the corresponding joint pmf of the one-to-one transformed random vector $\mathbf{Y}=g(\mathbf{X})$ is

$$f_{\mathbf{Y}}(\mathbf{y})=f_{\mathbf{X}}\bigl(g^{-1}(\mathbf{y})\bigr).$$

Proof.

  • For each $\mathbf{y}$ in the support of $\mathbf{Y}$,

$$f_{\mathbf{Y}}(\mathbf{y})=\mathbb{P}(\mathbf{Y}=\mathbf{y})=\mathbb{P}\bigl(g(\mathbf{X})=\mathbf{y}\bigr)=\mathbb{P}\bigl(\mathbf{X}=g^{-1}(\mathbf{y})\bigr)=f_{\mathbf{X}}\bigl(g^{-1}(\mathbf{y})\bigr).$$
  • In particular, the inverse $g^{-1}$ exists since the transformation is one-to-one.

$\blacksquare$
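For illustration, here is a minimal Python sketch (the distribution and the map are arbitrary choices, not from the original text) applying the proposition with $X\sim\operatorname{Binom}(3,0.5)$ and the one-to-one map $y=g(x)=2x+1$: the pmf of $Y$ at $y$ is simply the pmf of $X$ at $g^{-1}(y)=(y-1)/2$.

```python
from scipy import stats

# pmf of X ~ Binom(3, 0.5)
f_X = lambda x: stats.binom.pmf(x, 3, 0.5)

# one-to-one transformation y = g(x) = 2x + 1, with inverse x = (y - 1) / 2
f_Y = lambda y: f_X((y - 1) / 2)

# support of Y is {1, 3, 5, 7}; the pmf values are simply relabelled
print([(y, f_Y(y)) for y in (1, 3, 5, 7)])  # probabilities 1/8, 3/8, 3/8, 1/8
```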


Transformation of continuous random variables

For continuous random variables, the situation is more complicated. Let us define the Jacobian matrix, and introduce several notations in the definition.

Definition. (Jacobian matrix) Suppose the function $g$ is differentiable with nonvanishing Jacobian determinant (then it follows that $g^{-1}$ is also differentiable). The Jacobian matrix of $g$ is

$$\mathbf{J}_g=\frac{\partial(y_1,\dots,y_n)}{\partial(x_1,\dots,x_n)}=\begin{pmatrix}\dfrac{\partial g_1}{\partial x_1}&\cdots&\dfrac{\partial g_1}{\partial x_n}\\\vdots&\ddots&\vdots\\\dfrac{\partial g_n}{\partial x_1}&\cdots&\dfrac{\partial g_n}{\partial x_n}\end{pmatrix},$$
in which $g_i$ is the $i$th component function of $g$ for each $i\in\{1,\dots,n\}$, i.e. $y_i=g_i(x_1,\dots,x_n)$. The Jacobian matrix $\mathbf{J}_{g^{-1}}=\partial(x_1,\dots,x_n)/\partial(y_1,\dots,y_n)$ of the inverse transformation is defined analogously.

Remark.

  • We have $\mathbf{J}_{g^{-1}}=\bigl(\mathbf{J}_g\bigr)^{-1}$, and hence $\det\bigl(\mathbf{J}_{g^{-1}}\bigr)=\dfrac{1}{\det\bigl(\mathbf{J}_g\bigr)}$.

Example. Suppose $n=2$, $y_1=g_1(x_1,x_2)=x_1+x_2$, and $y_2=g_2(x_1,x_2)=x_1-x_2$. Then, $\frac{\partial y_1}{\partial x_1}=\frac{\partial y_1}{\partial x_2}=\frac{\partial y_2}{\partial x_1}=1$, $\frac{\partial y_2}{\partial x_2}=-1$, and

$$\mathbf{J}_g=\begin{pmatrix}1&1\\1&-1\end{pmatrix},\qquad\det\bigl(\mathbf{J}_g\bigr)=-2.$$

Also, $g^{-1}$ is given by $x_1=(y_1+y_2)/2$ and $x_2=(y_1-y_2)/2$. Then, $\frac{\partial x_1}{\partial y_1}=\frac{\partial x_1}{\partial y_2}=\frac{\partial x_2}{\partial y_1}=\frac{1}{2}$, $\frac{\partial x_2}{\partial y_2}=-\frac{1}{2}$, and

$$\mathbf{J}_{g^{-1}}=\begin{pmatrix}1/2&1/2\\1/2&-1/2\end{pmatrix},\qquad\det\bigl(\mathbf{J}_{g^{-1}}\bigr)=-\frac{1}{2}=\frac{1}{\det\bigl(\mathbf{J}_g\bigr)}.$$
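The determinants in the example can be checked symbolically. The following sketch uses SymPy (an arbitrary choice of computer algebra system) to compute the Jacobian matrices of $g$ and $g^{-1}$ and confirm the reciprocal relationship between their determinants.

```python
import sympy as sp

y1, y2, x1, x2 = sp.symbols('y1 y2 x1 x2')

# forward transformation g: (x1, x2) -> (x1 + x2, x1 - x2)
g = sp.Matrix([x1 + x2, x1 - x2])
J_g = g.jacobian([x1, x2])
print(J_g, sp.det(J_g))            # Matrix([[1, 1], [1, -1]]) and -2

# inverse transformation g^{-1}: (y1, y2) -> ((y1 + y2)/2, (y1 - y2)/2)
g_inv = sp.Matrix([(y1 + y2) / 2, (y1 - y2) / 2])
J_g_inv = g_inv.jacobian([y1, y2])
print(sp.det(J_g_inv))             # -1/2, the reciprocal of det(J_g)
```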


Theorem. (Transformation of continuous random variables) For each continuous random vector $\mathbf{X}$ with joint pdf $f_{\mathbf{X}}$, and assuming differentiability of $g$ (and thus also of $g^{-1}$), the corresponding joint pdf of the one-to-one transformed random vector $\mathbf{Y}=g(\mathbf{X})$ is

$$f_{\mathbf{Y}}(\mathbf{y})=f_{\mathbf{X}}\bigl(g^{-1}(\mathbf{y})\bigr)\,\bigl|\det\mathbf{J}_{g^{-1}}\bigr|.$$

Proof.

  • Let $h(\mathbf{y})=f_{\mathbf{X}}\bigl(g^{-1}(\mathbf{y})\bigr)\bigl|\det\mathbf{J}_{g^{-1}}\bigr|$.
  • Recall that $f_{\mathbf{Y}}(\mathbf{y})=\dfrac{\partial^n F_{\mathbf{Y}}(\mathbf{y})}{\partial y_1\cdots\partial y_n}$ ($F_{\mathbf{Y}}$ is the cdf of $\mathbf{Y}$). So, it suffices to prove that $F_{\mathbf{Y}}(\mathbf{y})=\displaystyle\int_{\{\mathbf{u}:\,\mathbf{u}\le\mathbf{y}\}}h(\mathbf{u})\,d\mathbf{u}$.
  • This is true since $F_{\mathbf{Y}}(\mathbf{y})=\mathbb{P}\bigl(g(\mathbf{X})\le\mathbf{y}\bigr)=\displaystyle\int_{\{\mathbf{x}:\,g(\mathbf{x})\le\mathbf{y}\}}f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x}=\int_{\{\mathbf{u}:\,\mathbf{u}\le\mathbf{y}\}}f_{\mathbf{X}}\bigl(g^{-1}(\mathbf{u})\bigr)\bigl|\det\mathbf{J}_{g^{-1}}\bigr|\,d\mathbf{u}$ (by applying the change of variable formula for multiple integration, with $\mathbf{u}=g(\mathbf{x})$).
  • The result follows.

$\blacksquare$
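As a numerical sanity check of the univariate case ($n=1$), the sketch below (an illustration; the choice of $g$ is arbitrary) transforms a standard normal $X$ by $g(x)=e^x$, so $g^{-1}(y)=\ln y$ and $|\,dg^{-1}/dy\,|=1/y$, and compares the resulting density with SciPy's built-in lognormal pdf.

```python
import numpy as np
from scipy import stats

# transform X ~ N(0, 1) by g(x) = e^x; the theorem gives the pdf of Y = e^X:
# f_Y(y) = f_X(log(y)) * |d/dy log(y)| = f_X(log(y)) / y
def f_Y(y):
    return stats.norm.pdf(np.log(y)) / y

y = np.linspace(0.1, 5.0, 50)
# Y = e^X is lognormal with shape parameter 1, so the two curves agree
assert np.allclose(f_Y(y), stats.lognorm.pdf(y, s=1))
```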


Moment generating function

Definition. (Moment generating function) The moment generating function (mgf) for the distribution of a random variable $X$ is $M_X(t)=\mathbb{E}\bigl[e^{tX}\bigr]$.

Remark.

  • For comparison: the cdf is $F_X(x)=\mathbb{P}(X\le x)=\mathbb{E}\bigl[\mathbf{1}\{X\le x\}\bigr]$.
  • The mgf, similar to the pmf, pdf and cdf, gives a complete description of a distribution, so it can also similarly uniquely identify a distribution, provided that the mgf exists (the expectation may be infinite),
  • i.e., we can recover the probability function from the mgf.
  • The proof of this result is complicated, and thus omitted.

Proposition. (Moment generating property of mgf) Assuming the mgf $M_X(t)$ exists for $|t|<h$, in which $h$ is a positive number, we have

$$M_X^{(n)}(0)=\mathbb{E}\bigl[X^n\bigr]$$
for each nonnegative integer $n$.

Proof.

  • Since

$$M_X(t)=\mathbb{E}\bigl[e^{tX}\bigr]=\mathbb{E}\left[1+tX+\frac{t^2X^2}{2!}+\cdots\right]=1+t\,\mathbb{E}[X]+\frac{t^2}{2!}\,\mathbb{E}\bigl[X^2\bigr]+\cdots,$$
differentiating term by term $n$ times gives
$$M_X^{(n)}(t)=\mathbb{E}\bigl[X^n\bigr]+t\,\mathbb{E}\bigl[X^{n+1}\bigr]+\frac{t^2}{2!}\,\mathbb{E}\bigl[X^{n+2}\bigr]+\cdots.$$
  • The result follows from simplifying the above expression by setting $t=0$.

$\blacksquare$
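A minimal SymPy sketch (illustrative) of the moment generating property for the standard normal: differentiating its mgf $e^{t^2/2}$ (derived later in this chapter) at $t=0$ reproduces the moments $\mathbb{E}[Z]=0$, $\mathbb{E}[Z^2]=1$, $\mathbb{E}[Z^3]=0$, $\mathbb{E}[Z^4]=3$.

```python
import sympy as sp

t = sp.symbols('t')
M = sp.exp(t**2 / 2)                  # mgf of the standard normal Z

# the n-th derivative of the mgf at t = 0 equals E[Z^n]
moments = [sp.diff(M, t, n).subs(t, 0) for n in range(5)]
print(moments)                        # [1, 0, 1, 0, 3]
```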

Proposition. (Relationship between independence and mgf) If $X$ and $Y$ are independent,

$$M_{X+Y}(t)=M_X(t)\,M_Y(t).$$

Proof.

$$M_{X+Y}(t)=\mathbb{E}\bigl[e^{tX}e^{tY}\bigr]\overset{\text{lote}}{=}\mathbb{E}\Bigl[\mathbb{E}\bigl[e^{tX}e^{tY}\mid Y\bigr]\Bigr]=\mathbb{E}\Bigl[e^{tY}\,\mathbb{E}\bigl[e^{tX}\mid Y\bigr]\Bigr].$$
By independence, $\mathbb{E}\bigl[e^{tX}\mid Y\bigr]=\mathbb{E}\bigl[e^{tX}\bigr]=M_X(t)$, so
$$M_{X+Y}(t)=M_X(t)\,\mathbb{E}\bigl[e^{tY}\bigr]=M_X(t)\,M_Y(t).$$
  • lote: law of total expectation

$\blacksquare$
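A small Monte Carlo sketch (illustrative; the distributions, seed and sample size are arbitrary choices) comparing $M_{X+Y}(t)$ with $M_X(t)M_Y(t)$ at a fixed $t$ for independent $X\sim\operatorname{Pois}(2)$ and $Y\sim\operatorname{Exp}(1)$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, t = 10**6, 0.3
x = rng.poisson(2.0, n)               # X ~ Pois(2)
y = rng.exponential(1.0, n)           # Y ~ Exp(1), independent of X

lhs = np.mean(np.exp(t * (x + y)))                       # estimate of M_{X+Y}(t)
rhs = np.mean(np.exp(t * x)) * np.mean(np.exp(t * y))    # estimate of M_X(t) M_Y(t)
print(lhs, rhs)                       # agree up to Monte Carlo error
```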

Remark.

  • This equality need not hold if $X$ and $Y$ are not independent.

Joint moment generating function

Definition. (Joint moment generating function) The joint moment generating function (joint mgf) of a random vector $\mathbf{X}=(X_1,\dots,X_n)^T$ is

$$M_{\mathbf{X}}(\mathbf{t})=\mathbb{E}\bigl[e^{\mathbf{t}\cdot\mathbf{X}}\bigr]=\mathbb{E}\bigl[e^{t_1X_1+\cdots+t_nX_n}\bigr]$$
for each (column) vector $\mathbf{t}=(t_1,\dots,t_n)^T$, if the expectation exists.

Remark.

  • When $n=1$, the dot product of two vectors is the product of two numbers, and the joint mgf reduces to the (univariate) mgf.
  • We also write $M_{\mathbf{X}}(\mathbf{t})=M_{X_1,\dots,X_n}(t_1,\dots,t_n)$.

Proposition. (Relationship between independence and mgf) Random variables $X_1,\dots,X_n$ are independent if and only if

$$M_{X_1,\dots,X_n}(t_1,\dots,t_n)=M_{X_1}(t_1)\cdots M_{X_n}(t_n).$$

Proof.

  • 'only if' part: by independence,

$$M_{X_1,\dots,X_n}(t_1,\dots,t_n)=\mathbb{E}\bigl[e^{t_1X_1}\cdots e^{t_nX_n}\bigr]=\mathbb{E}\bigl[e^{t_1X_1}\bigr]\cdots\mathbb{E}\bigl[e^{t_nX_n}\bigr]=M_{X_1}(t_1)\cdots M_{X_n}(t_n).$$
  • The proof of the 'if' part is quite complicated, and thus is omitted.

$\blacksquare$

Analogously, we have the marginal mgf.

Definition. (Marginal mgf) The marginal mgf of $X_i$, which is a member of the random variables $X_1,\dots,X_n$, is

$$M_{X_i}(t_i)=M_{X_1,\dots,X_n}(0,\dots,0,t_i,0,\dots,0)=\mathbb{E}\bigl[e^{t_iX_i}\bigr].$$

Proposition. (Moment generating function of linear transformation of random variables) For each constant vector $\mathbf{a}=(a_1,\dots,a_n)^T$ and a real constant $b$, the mgf of $Y=\mathbf{a}\cdot\mathbf{X}+b=a_1X_1+\cdots+a_nX_n+b$ is

$$M_Y(t)=e^{bt}\,M_{\mathbf{X}}(t\mathbf{a})=e^{bt}\,M_{X_1,\dots,X_n}(a_1t,\dots,a_nt).$$

Proof.

$$M_Y(t)=\mathbb{E}\bigl[e^{t(\mathbf{a}\cdot\mathbf{X}+b)}\bigr]=e^{bt}\,\mathbb{E}\bigl[e^{(t\mathbf{a})\cdot\mathbf{X}}\bigr]=e^{bt}\,M_{\mathbf{X}}(t\mathbf{a}).$$

$\blacksquare$

Remark.

  • If $X_1,\dots,X_n$ are independent (writing $M_{X_i}$ for the mgf of $X_i$),

$$M_Y(t)=e^{bt}\,M_{X_1}(a_1t)\cdots M_{X_n}(a_nt).$$
  • This provides an alternative, and possibly more convenient, method to derive the distribution of $Y$, compared with deriving it from the probability functions of $X_1,\dots,X_n$.
  • Special case: if $\mathbf{a}=(1,\dots,1)^T$ and $b=0$, then $Y=X_1+\cdots+X_n$, which is a sum of r.v.'s.
  • So, $M_{X_1+\cdots+X_n}(t)=M_{X_1}(t)\cdots M_{X_n}(t)$.
  • In particular, if $X_1,\dots,X_n$ are i.i.d. with common mgf $M_X$, then $M_{X_1+\cdots+X_n}(t)=\bigl(M_X(t)\bigr)^n$.
  • We can use this result to prove the formulas for sums of independent r.v.'s, instead of using the proposition about convolution of r.v.'s.
  • Special case: if $n=1$, then the expression for the linear transformation becomes $Y=aX+b$.
  • So, $M_{aX+b}(t)=e^{bt}M_X(at)$, as illustrated in the sketch below.
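A short SymPy sketch (illustrative) of the last special case: starting from the standard normal mgf $M_Z(t)=e^{t^2/2}$, the linear transformation rule yields the mgf of $aZ+b$, which matches the $\mathcal{N}(b,a^2)$ mgf $e^{bt+a^2t^2/2}$ derived later in this chapter.

```python
import sympy as sp

t, a, b = sp.symbols('t a b', real=True)
M_Z = sp.exp(t**2 / 2)                       # mgf of Z ~ N(0, 1)

# mgf of aZ + b via the linear-transformation rule: e^{bt} M_Z(at)
M_lin = sp.exp(b * t) * M_Z.subs(t, a * t)
M_normal = sp.exp(b * t + a**2 * t**2 / 2)   # mgf of N(b, a^2)
print(sp.simplify(M_lin - M_normal))         # 0
```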


Moment generating function of some important distributions

Proposition. (Moment generating function of binomial distribution) The moment generating function of $X\sim\operatorname{Binom}(n,p)$ is $M_X(t)=\bigl(1-p+pe^t\bigr)^n$.

Proof.

$$M_X(t)=\mathbb{E}\bigl[e^{tX}\bigr]=\sum_{x=0}^{n}e^{tx}\binom{n}{x}p^x(1-p)^{n-x}=\sum_{x=0}^{n}\binom{n}{x}\bigl(pe^t\bigr)^x(1-p)^{n-x}=\bigl(1-p+pe^t\bigr)^n,$$
by the binomial theorem.

$\blacksquare$

Proposition. (Moment generating function of Poisson distribution) The moment generating function of $X\sim\operatorname{Pois}(\lambda)$ is $M_X(t)=e^{\lambda(e^t-1)}$.

Proof.

$$M_X(t)=\mathbb{E}\bigl[e^{tX}\bigr]=\sum_{x=0}^{\infty}e^{tx}\,\frac{e^{-\lambda}\lambda^x}{x!}=e^{-\lambda}\sum_{x=0}^{\infty}\frac{\bigl(\lambda e^t\bigr)^x}{x!}=e^{-\lambda}e^{\lambda e^t}=e^{\lambda(e^t-1)}.$$

$\blacksquare$

Proposition. (Moment generating function of exponential distribution) The moment generating function of $X\sim\operatorname{Exp}(\lambda)$ is $M_X(t)=\dfrac{\lambda}{\lambda-t}$ for $t<\lambda$.

Proof.

  •  
  • The result follows.

 

Proposition. (Moment generating function of gamma distribution) The moment generating function of $X\sim\operatorname{Gamma}(\alpha,\lambda)$ is $M_X(t)=\left(\dfrac{\lambda}{\lambda-t}\right)^{\alpha}$ for $t<\lambda$.

Proof.

  • We use a similar proof technique to that in the proof for the mgf of the exponential distribution.

$$M_X(t)=\int_0^{\infty}e^{tx}\,\frac{\lambda^{\alpha}x^{\alpha-1}e^{-\lambda x}}{\Gamma(\alpha)}\,dx=\frac{\lambda^{\alpha}}{(\lambda-t)^{\alpha}}\underbrace{\int_0^{\infty}\frac{(\lambda-t)^{\alpha}x^{\alpha-1}e^{-(\lambda-t)x}}{\Gamma(\alpha)}\,dx}_{=1\text{ for }t<\lambda}=\left(\frac{\lambda}{\lambda-t}\right)^{\alpha},$$
since the integrand in the braced integral is the pdf of $\operatorname{Gamma}(\alpha,\lambda-t)$.
  • The result follows.

$\blacksquare$

Proposition. (Moment generating function of normal distribution) The moment generating function of $X\sim\mathcal{N}(\mu,\sigma^2)$ is $M_X(t)=e^{\mu t+\sigma^2t^2/2}$.

Proof.

  • Let $Z=\dfrac{X-\mu}{\sigma}$. Then, $Z\sim\mathcal{N}(0,1)$ and $X=\mu+\sigma Z$.
  • First, consider the mgf of $Z$:

$$M_Z(t)=\int_{-\infty}^{\infty}e^{tz}\,\frac{e^{-z^2/2}}{\sqrt{2\pi}}\,dz=e^{t^2/2}\int_{-\infty}^{\infty}\frac{e^{-(z-t)^2/2}}{\sqrt{2\pi}}\,dz=e^{t^2/2},$$
by completing the square; the integrand in the last integral is the pdf of $\mathcal{N}(t,1)$, which integrates to one.
  • It follows that the mgf of $X=\mu+\sigma Z$ is

$$M_X(t)=e^{\mu t}M_Z(\sigma t)=e^{\mu t+\sigma^2t^2/2}.$$
  • The result follows.

$\blacksquare$
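The closed forms above can be cross-checked symbolically. The sketch below (illustrative) computes $\mathbb{E}[e^{tX}]$ directly from the Poisson pmf and the exponential pdf with SymPy and compares against the stated formulas.

```python
import sympy as sp

t = sp.symbols('t', real=True)
lam = sp.symbols('lambda', positive=True)
k = sp.symbols('k', integer=True, nonnegative=True)
x = sp.symbols('x', positive=True)

# Poisson(lambda): M(t) = sum_k e^{tk} e^{-lam} lam^k / k!
M_pois = sp.exp(-lam) * sp.summation((lam * sp.exp(t))**k / sp.factorial(k),
                                     (k, 0, sp.oo))
print(sp.simplify(M_pois - sp.exp(lam * (sp.exp(t) - 1))))   # 0

# Exp(lambda) for t < lambda: M(t) = int_0^oo e^{tx} lam e^{-lam x} dx
s = sp.symbols('s', positive=True)     # s stands for lambda - t > 0
M_exp = sp.integrate(lam * sp.exp(-s * x), (x, 0, sp.oo))
print(M_exp)                           # lambda/s, i.e. lambda/(lambda - t)
```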


Distribution of linear transformation of random variables

We will prove some propositions about the distributions of linear transformations of random variables using mgfs. Some of them are mentioned in previous chapters. As we will see, proving these propositions using mgfs is quite simple.

Proposition. (Distribution of linear transformation of normal r.v.'s) Let $X\sim\mathcal{N}(\mu,\sigma^2)$. Then, $aX+b\sim\mathcal{N}\bigl(a\mu+b,\,a^2\sigma^2\bigr)$ for all real constants $a\ne0$ and $b$.

Proof.

  • The mgf of $aX+b$ is

$$M_{aX+b}(t)=e^{bt}M_X(at)=e^{bt}e^{\mu(at)+\sigma^2(at)^2/2}=e^{(a\mu+b)t+(a^2\sigma^2)t^2/2},$$
which is the mgf of $\mathcal{N}\bigl(a\mu+b,\,a^2\sigma^2\bigr)$, and the result follows since the mgf identifies a distribution uniquely.

$\blacksquare$

Sum of independent random variables

Proposition. (Sum of independent binomial r.v.'s) Let $X_i\sim\operatorname{Binom}(n_i,p)$ for $i=1,\dots,k$, in which $X_1,\dots,X_k$ are independent. Then, $X_1+\cdots+X_k\sim\operatorname{Binom}(n_1+\cdots+n_k,p)$.

Proof.

  • The mgf of $X_1+\cdots+X_k$ is

$$M_{X_1+\cdots+X_k}(t)=\prod_{i=1}^{k}\bigl(1-p+pe^t\bigr)^{n_i}=\bigl(1-p+pe^t\bigr)^{n_1+\cdots+n_k},$$
which is the mgf of $\operatorname{Binom}(n_1+\cdots+n_k,p)$, as desired.

$\blacksquare$

Proposition. (Sum of independent Poisson r.v.'s) Let $X_i\sim\operatorname{Pois}(\lambda_i)$ for $i=1,\dots,k$, in which $X_1,\dots,X_k$ are independent. Then, $X_1+\cdots+X_k\sim\operatorname{Pois}(\lambda_1+\cdots+\lambda_k)$.

Proof.

  • The mgf of $X_1+\cdots+X_k$ is

$$M_{X_1+\cdots+X_k}(t)=\prod_{i=1}^{k}e^{\lambda_i(e^t-1)}=e^{(\lambda_1+\cdots+\lambda_k)(e^t-1)},$$
which is the mgf of $\operatorname{Pois}(\lambda_1+\cdots+\lambda_k)$, as desired.

$\blacksquare$

Proposition. (Sum of independent exponential r.v.'s) Let $X_1,\dots,X_n$ be i.i.d. r.v.'s following $\operatorname{Exp}(\lambda)$. Then, $X_1+\cdots+X_n\sim\operatorname{Gamma}(n,\lambda)$.

Proof.

  • The mgf of $X_1+\cdots+X_n$ is

$$M_{X_1+\cdots+X_n}(t)=\left(\frac{\lambda}{\lambda-t}\right)^{n},$$
which is the mgf of $\operatorname{Gamma}(n,\lambda)$, as desired.

$\blacksquare$

Proposition. (Sum of independent gamma r.v.'s) Let $X_i\sim\operatorname{Gamma}(\alpha_i,\lambda)$ for $i=1,\dots,k$, in which $X_1,\dots,X_k$ are independent. Then, $X_1+\cdots+X_k\sim\operatorname{Gamma}(\alpha_1+\cdots+\alpha_k,\lambda)$.

Proof.

  • The mgf of $X_1+\cdots+X_k$ is

$$M_{X_1+\cdots+X_k}(t)=\prod_{i=1}^{k}\left(\frac{\lambda}{\lambda-t}\right)^{\alpha_i}=\left(\frac{\lambda}{\lambda-t}\right)^{\alpha_1+\cdots+\alpha_k},$$
which is the mgf of $\operatorname{Gamma}(\alpha_1+\cdots+\alpha_k,\lambda)$, as desired.

$\blacksquare$

Proposition. (Sum of independent normal r.v.'s) Let $X_i\sim\mathcal{N}(\mu_i,\sigma_i^2)$ for $i=1,\dots,k$, in which $X_1,\dots,X_k$ are independent. Then, $X_1+\cdots+X_k\sim\mathcal{N}\bigl(\mu_1+\cdots+\mu_k,\,\sigma_1^2+\cdots+\sigma_k^2\bigr)$.

Proof.

  • The mgf of $X_1+\cdots+X_k$ (in which they are independent) is

$$M_{X_1+\cdots+X_k}(t)=\prod_{i=1}^{k}e^{\mu_it+\sigma_i^2t^2/2}=e^{(\mu_1+\cdots+\mu_k)t+(\sigma_1^2+\cdots+\sigma_k^2)t^2/2},$$
which is the mgf of $\mathcal{N}\bigl(\mu_1+\cdots+\mu_k,\,\sigma_1^2+\cdots+\sigma_k^2\bigr)$, as desired.

$\blacksquare$
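A quick simulation sketch (illustrative; parameters, seed and sample size are arbitrary choices) checking the Poisson case: a sum of independent $\operatorname{Pois}(1.5)$ and $\operatorname{Pois}(2.5)$ samples is compared against $\operatorname{Pois}(4)$ via pmf estimates.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
n = 10**6
s = rng.poisson(1.5, n) + rng.poisson(2.5, n)   # sum of independent Poissons

# empirical pmf of the sum vs. the pmf of Pois(1.5 + 2.5) = Pois(4)
for k in range(8):
    print(k, np.mean(s == k), stats.poisson.pmf(k, 4.0))
```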


Central limit theorem

We will provide a proof of the central limit theorem (CLT) using mgfs here.

Theorem. (Central limit theorem) Let $X_1,X_2,\dots$ be a sequence of i.i.d. random variables with finite mean $\mu$ and positive variance $\sigma^2$, and let $\bar{X}_n$ be the sample mean of the first $n$ random variables, i.e. $\bar{X}_n=\frac{1}{n}\sum_{i=1}^{n}X_i$. Then, the standardized sample mean $\dfrac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}$ converges in distribution to a standard normal random variable as $n\to\infty$.

Proof.

  • Define $Z_i=\dfrac{X_i-\mu}{\sigma}$ for each $i$, so that $\mathbb{E}[Z_i]=0$ and $\mathbb{E}\bigl[Z_i^2\bigr]=1$. Then, we have

$$\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}Z_i,$$
  • which is in the form of the linear transformation $a_1Z_1+\cdots+a_nZ_n$ with $a_1=\cdots=a_n=1/\sqrt{n}$.
  • Therefore, writing $M_Z$ for the common mgf of the $Z_i$ (assumed to exist in a neighbourhood of zero),

$$M_{\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}}(t)=\left(M_Z\!\left(\frac{t}{\sqrt{n}}\right)\right)^{\!n}=\left(1+\frac{t^2}{2n}+o\!\left(\frac{1}{n}\right)\right)^{\!n}\to e^{t^2/2}\quad\text{as }n\to\infty,$$
which is the mgf of $\mathcal{N}(0,1)$, and the result follows from the mgf property of identifying a distribution uniquely.

$\blacksquare$

Remark.

  • Since $\bar{X}_n=\mu+\dfrac{\sigma}{\sqrt{n}}\cdot\dfrac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}$,
  • the sample mean is approximately distributed as $\mathcal{N}\bigl(\mu,\sigma^2/n\bigr)$ for large $n$.
  • The same result holds exactly for the sample mean of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
  • since if $X_1,\dots,X_n\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu,\sigma^2)$, then $\bar{X}_n\sim\mathcal{N}\bigl(\mu,\sigma^2/n\bigr)$.
  • It follows from the proposition about the distribution of linear transformations of normal r.v.'s that the sample sum, i.e. $X_1+\cdots+X_n=n\bar{X}_n$, is approximately distributed as $\mathcal{N}\bigl(n\mu,n\sigma^2\bigr)$ for large $n$.
  • The same result holds exactly for the sample sum of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
  • since if $X_1,\dots,X_n\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu,\sigma^2)$, then $X_1+\cdots+X_n\sim\mathcal{N}\bigl(n\mu,n\sigma^2\bigr)$.
  • If a r.v. converges in distribution to some distribution, then we can use that distribution to approximate probabilities involving the r.v., as in the simulation sketch below.
A special case of using the CLT as an approximation is using the normal distribution to approximate a discrete distribution. To improve accuracy, we should ideally apply a continuity correction, as explained in the following.

Proposition. (Continuity correction) A continuity correction is rewriting the probability expression $\mathbb{P}(X=i)$ ($i$ is an integer) as $\mathbb{P}\bigl(i-1/2\le X\le i+1/2\bigr)$ when approximating a discrete distribution by the normal distribution using the CLT.

Remark.

  • The reason for doing this is to make $i$ lie at the 'middle' of the interval $[i-1/2,\,i+1/2]$, so that it is better approximated (see the numeric sketch after the illustration below).

Illustration of continuity correction:

| 
|              /
|             /
|            /
|           /|
|          /#|
|         *##|
|        /|##|
|       /#|##|   
|      /##|##|   
|     /|##|##|   
|    / |##|##|   
|   /  |##|##|
|  /   |##|##|
| /    |##|##|
*------*--*--*---------------------
    i-1/2 i i+1/2

| 
|              /
|             /
|            /
|           / 
|          /  
|         *   
|        /|   
|       /#|      
|      /##|      
|     /###|      
|    /####|      
|   /#####|   
|  /|#####|   
| / |#####|   
*---*-----*------------------------
   i-1    i      

| 
|              /|
|             /#|
|            /##|
|           /###|
|          /####|
|         *#####|
|        /|#####|
|       / |#####|
|      /  |#####|
|     /   |#####|
|    /    |#####|
|   /     |#####| 
|  /      |#####|
| /       |#####|
*---------*-----*------------------
          i     i+1 
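A numeric sketch (illustrative; the parameters are arbitrary choices) of the correction for $X\sim\operatorname{Binom}(40,0.5)$: $\mathbb{P}(X=20)$ is approximated by the normal area over $[19.5,\,20.5]$, which is noticeably better than the areas over $[19,20]$ or $[20,21]$ pictured above.

```python
from scipy import stats

n, p = 40, 0.5
mu, sd = n * p, (n * p * (1 - p)) ** 0.5   # N(20, 10) approximates X by the CLT
Y = stats.norm(mu, sd)

exact = stats.binom.pmf(20, n, p)          # exact P(X = 20)
corrected = Y.cdf(20.5) - Y.cdf(19.5)      # continuity-corrected interval [i-1/2, i+1/2]
left = Y.cdf(20.0) - Y.cdf(19.0)           # uncorrected interval [i-1, i]
right = Y.cdf(21.0) - Y.cdf(20.0)          # uncorrected interval [i, i+1]
print(exact, corrected, left, right)
```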
  1. or equivalently, a transformation between the supports of $\mathbf{X}$ and $\mathbf{Y}$.