Probability/Transformation of Random Variables


Transformation of random variables

Underlying principle

Let $X_1,\dots,X_n$ be $n$ random variables, $Y_1,\dots,Y_n$ be another $n$ random variables, and $\mathbf X=(X_1,\dots,X_n)^T$ and $\mathbf Y=(Y_1,\dots,Y_n)^T$ be random (column) vectors.

Suppose the vector-valued function[1] $g:\mathbb R^n\to\mathbb R^n$ is bijective (it is also called a one-to-one correspondence in this case). Then, its inverse $g^{-1}$ exists.

After that, we can transform $\mathbf X$ to $\mathbf Y$ by applying the transformation $g$, i.e. by $\mathbf Y=g(\mathbf X)$, and transform $\mathbf Y$ to $\mathbf X$ by applying the inverse transformation $g^{-1}$, i.e. by $\mathbf X=g^{-1}(\mathbf Y)$.

We are often interested in deriving the joint probability function $f_{\mathbf Y}$ of $\mathbf Y$, given the joint probability function $f_{\mathbf X}$ of $\mathbf X$. We will examine the discrete and continuous cases one by one in the following.

Transformation of discrete random variables

Proposition. (Transformation of discrete random variables) For each discrete random vector $\mathbf X$ with joint pmf $f_{\mathbf X}$, the corresponding joint pmf of the transformed random vector $\mathbf Y=g(\mathbf X)$, where $g$ is bijective, is
$$f_{\mathbf Y}(\mathbf y)=f_{\mathbf X}\big(g^{-1}(\mathbf y)\big).$$

Proof. Considering the original pmf $f_{\mathbf X}$, we have
$$f_{\mathbf Y}(\mathbf y)=\mathbb P(\mathbf Y=\mathbf y)=\mathbb P\big(g(\mathbf X)=\mathbf y\big)=\mathbb P\big(\mathbf X=g^{-1}(\mathbf y)\big)=f_{\mathbf X}\big(g^{-1}(\mathbf y)\big).$$
In particular, the inverse $g^{-1}$ exists since $g$ is bijective.

$\blacksquare$

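As a quick numerical sanity check of the proposition, the following minimal sketch compares the pmf obtained from the formula with the pmf obtained by pushing each support point through $g$. The choice of $\operatorname{Bin}(5,0.3)$ for $X$, the bijection $g(x)=2x+1$, and the use of scipy are illustrative assumptions, not part of the original text.

    # Sanity check of the discrete transformation formula f_Y(y) = f_X(g^{-1}(y)).
    # Illustrative assumptions: X ~ Bin(5, 0.3) and g(x) = 2x + 1 (bijective between the supports).
    from scipy.stats import binom

    n, p = 5, 0.3

    def f_X(x):
        return binom.pmf(x, n, p)

    def g(x):
        return 2 * x + 1

    def g_inv(y):
        return (y - 1) // 2

    def f_Y(y):
        # pmf of Y = g(X) given by the proposition
        return f_X(g_inv(y))

    # pmf of Y computed directly by pushing each support point of X through g
    direct = {g(x): f_X(x) for x in range(n + 1)}
    for y, prob in direct.items():
        assert abs(prob - f_Y(y)) < 1e-12
    print("formula agrees with the direct computation on the support of Y")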

Transformation of continuous random variables

For continuous random variables, the situation is more complicated.

Let us first investigate the univariate case, which is simpler.

Theorem. (Transformation of continuous random variable (univariate case)) Let $X$ be a continuous random variable with pdf $f_X$. Assume that the function $g:\mathbb R\to\mathbb R$ is differentiable and strictly monotone. Then, the pdf of the transformed random variable $Y=g(X)$ is
$$f_Y(y)=f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}g^{-1}(y)\right|.$$

Proof. Under the assumption that $g$ is differentiable and strictly monotone, the cdf
$$F_Y(y)=\mathbb P\big(g(X)\le y\big)=\begin{cases}\mathbb P\big(X\le g^{-1}(y)\big)=F_X\big(g^{-1}(y)\big)&\text{if }g\text{ is strictly increasing},\\ \mathbb P\big(X\ge g^{-1}(y)\big)=1-F_X\big(g^{-1}(y)\big)&\text{if }g\text{ is strictly decreasing}\end{cases}$$
($g^{-1}$ exists since $g$ is strictly monotone). Differentiating both sides of the above equation (assuming the cdf's involved are differentiable) gives
$$f_Y(y)=\begin{cases}f_X\big(g^{-1}(y)\big)\dfrac{d}{dy}g^{-1}(y)&\text{if }g\text{ is strictly increasing},\\ -f_X\big(g^{-1}(y)\big)\dfrac{d}{dy}g^{-1}(y)&\text{if }g\text{ is strictly decreasing}.\end{cases}$$
Since $\frac{d}{dy}g^{-1}(y)\ge 0$ when $g$ is strictly increasing and $\frac{d}{dy}g^{-1}(y)\le 0$ when $g$ is strictly decreasing, we can write both cases as $f_X\big(g^{-1}(y)\big)\big|\frac{d}{dy}g^{-1}(y)\big|$. That is, we can summarize the above case-defined function into a single expression by applying the absolute value function:
$$f_Y(y)=f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}g^{-1}(y)\right|,$$
where the absolute value sign is only applied to $\frac{d}{dy}g^{-1}(y)$, since the pdf's must be nonnegative, and thus we do not need to apply the sign to them.

$\blacksquare$

Remark.

  • To explain this theorem in a more intuitive manner, we rewrite the equation in the theorem as
$$f_Y(y)\,|dy|=f_X(x)\,|dx|,\qquad x=g^{-1}(y),$$
where both sides of the equation can be regarded as differential areas, which are nonnegative due to the absolute value signs.
  • This equation should intuitively hold since both sides represent areas under the pdf's, which represent probabilities. The quantity $f_X(x)\,|dx|$ is the area of the region $A$ under the pdf of $X$ over an "infinitesimal" interval $[x,x+dx]$, which represents the probability for $X$ to lie in this infinitesimal interval. After the transformation, we get another pdf, that of $Y$, and the original region $A$ is transformed to a region $B$ under the pdf of $Y$ over an infinitesimal interval $[y,y+dy]$ with area $f_Y(y)\,|dy|$. Since $g$ is a bijective function (its strict monotonicity implies this), $[y,y+dy]$ "corresponds" to $[x,x+dx]$ in some sense, and we know that the values in $[y,y+dy]$ are "originated" from the values in $[x,x+dx]$, and so is the randomness. It follows that the probability for $X$ lying in $[x,x+dx]$ and the probability for $Y$ lying in $[y,y+dy]$ should be the same, and hence the two differential areas are the same.

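To see the theorem in action numerically, here is a minimal Monte Carlo sketch; the choices $X\sim\operatorname{Exp}(1)$ and $g(x)=e^{-x}$, and the use of numpy, are illustrative assumptions. With $g^{-1}(y)=-\ln y$ and $|\frac{d}{dy}g^{-1}(y)|=1/y$, the theorem gives $f_Y(y)=y\cdot\frac1y=1$ on $(0,1)$, i.e. $Y$ should be uniform, and a histogram of simulated values of $Y$ confirms this.

    # Monte Carlo check of f_Y(y) = f_X(g^{-1}(y)) |d/dy g^{-1}(y)|.
    # Illustrative assumptions: X ~ Exp(1) and g(x) = exp(-x), so Y = e^{-X} takes values in (0, 1).
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=1_000_000)
    y = np.exp(-x)                       # transformed sample Y = g(X)

    # theorem: f_Y(y) = f_X(-ln y) * (1/y) = y * (1/y) = 1 on (0, 1), i.e. Y ~ Uniform(0, 1)
    hist, _ = np.histogram(y, bins=20, range=(0.0, 1.0), density=True)
    print(np.round(hist, 2))             # each bin height should be close to 1
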
Let us define Jacobian matrix, and introduce several notations in the definition.

Definition. (Jacobian matrix) Suppose the function $g:\mathbb R^n\to\mathbb R^n$, $\mathbf y=g(\mathbf x)$, is differentiable (then it follows that each of its component functions is differentiable). The Jacobian matrix is
$$\frac{\partial(y_1,\dots,y_n)}{\partial(x_1,\dots,x_n)}=\begin{pmatrix}\dfrac{\partial y_1}{\partial x_1}&\cdots&\dfrac{\partial y_1}{\partial x_n}\\ \vdots&\ddots&\vdots\\ \dfrac{\partial y_n}{\partial x_1}&\cdots&\dfrac{\partial y_n}{\partial x_n}\end{pmatrix},$$
in which $y_i=g_i(x_1,\dots,x_n)$ is the $i$th component function of $g$ for each $i\in\{1,\dots,n\}$.

Remark.

  • We have $\det\dfrac{\partial(x_1,\dots,x_n)}{\partial(y_1,\dots,y_n)}=\left(\det\dfrac{\partial(y_1,\dots,y_n)}{\partial(x_1,\dots,x_n)}\right)^{-1}$ whenever the latter determinant is nonzero.

Example. Suppose $n=2$, $y_1=x_1+x_2$, and $y_2=x_1-x_2$. Then, $\dfrac{\partial y_1}{\partial x_1}=\dfrac{\partial y_1}{\partial x_2}=\dfrac{\partial y_2}{\partial x_1}=1$, $\dfrac{\partial y_2}{\partial x_2}=-1$, and
$$\frac{\partial(y_1,y_2)}{\partial(x_1,x_2)}=\begin{pmatrix}1&1\\ 1&-1\end{pmatrix},\qquad\det\frac{\partial(y_1,y_2)}{\partial(x_1,x_2)}=-2.$$

Also, $x_1=\dfrac{y_1+y_2}{2}$ and $x_2=\dfrac{y_1-y_2}{2}$. Then, $\dfrac{\partial x_1}{\partial y_1}=\dfrac{\partial x_1}{\partial y_2}=\dfrac{\partial x_2}{\partial y_1}=\dfrac12$, $\dfrac{\partial x_2}{\partial y_2}=-\dfrac12$, and
$$\frac{\partial(x_1,x_2)}{\partial(y_1,y_2)}=\begin{pmatrix}1/2&1/2\\ 1/2&-1/2\end{pmatrix},\qquad\det\frac{\partial(x_1,x_2)}{\partial(y_1,y_2)}=-\frac12=\left(\det\frac{\partial(y_1,y_2)}{\partial(x_1,x_2)}\right)^{-1},$$
in agreement with the remark above.

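The Jacobian matrices of a transformation and of its inverse can also be computed symbolically; in the sketch below, sympy and the illustrative transformation $y_1=x_1+x_2$, $y_2=x_1-x_2$ are assumptions made here, not part of the original text. It confirms that the two Jacobian determinants are reciprocals of each other.

    # Symbolic Jacobians of y = (x1 + x2, x1 - x2) and of its inverse x = ((y1 + y2)/2, (y1 - y2)/2).
    # (sympy and this particular transformation are illustrative assumptions.)
    import sympy as sp

    x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

    J_fwd = sp.Matrix([x1 + x2, x1 - x2]).jacobian([x1, x2])              # d(y1, y2)/d(x1, x2)
    J_inv = sp.Matrix([(y1 + y2) / 2, (y1 - y2) / 2]).jacobian([y1, y2])  # d(x1, x2)/d(y1, y2)

    print(J_fwd.det(), J_inv.det())                 # -2 and -1/2
    print(sp.simplify(J_fwd.det() * J_inv.det()))   # 1, so the determinants are reciprocals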

Theorem. (Transformation of continuous random variables) Let $\mathbf X$ be a continuous random vector with joint pdf $f_{\mathbf X}$, and assume $g$ is differentiable and bijective. The corresponding joint pdf of the transformed random vector $\mathbf Y=g(\mathbf X)$ is
$$f_{\mathbf Y}(\mathbf y)=f_{\mathbf X}\big(g^{-1}(\mathbf y)\big)\left|\det\frac{\partial(x_1,\dots,x_n)}{\partial(y_1,\dots,y_n)}\right|,\qquad\text{where }\mathbf x=g^{-1}(\mathbf y).$$

Proof. Partial proof: Assume $g$ is differentiable and bijective.

First, for each (sufficiently nice) set $B\subseteq\mathbb R^n$,
$$\mathbb P(\mathbf Y\in B)=\int\cdots\int_B f_{\mathbf Y}(\mathbf y)\,dy_1\cdots dy_n.$$

On the other hand, we have
$$\mathbb P(\mathbf Y\in B)=\mathbb P\big(g(\mathbf X)\in B\big)=\mathbb P\big(\mathbf X\in g^{-1}(B)\big)=\int\cdots\int_{g^{-1}(B)} f_{\mathbf X}(\mathbf x)\,dx_1\cdots dx_n,$$
where $g^{-1}(B)=\{\mathbf x:g(\mathbf x)\in B\}$, which is the preimage of the set $B$ under $g$.

Applying the change of variable formula to this integral (whose proof is advanced and uses our assumptions), we get
$$\int\cdots\int_{g^{-1}(B)} f_{\mathbf X}(\mathbf x)\,dx_1\cdots dx_n=\int\cdots\int_B f_{\mathbf X}\big(g^{-1}(\mathbf y)\big)\left|\det\frac{\partial(x_1,\dots,x_n)}{\partial(y_1,\dots,y_n)}\right|dy_1\cdots dy_n.$$
Comparing the two integral expressions for $\mathbb P(\mathbf Y\in B)$, which agree for every such set $B$, we can observe the desired result.

$\blacksquare$

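A small simulation can be used to check the theorem; in the sketch below, the choice of $X_1,X_2$ i.i.d. $\mathcal N(0,1)$, the transformation $g(x_1,x_2)=(x_1+x_2,\,x_1-x_2)$, and the numpy/scipy tooling are illustrative assumptions. The empirical probability of a small rectangle in the $(y_1,y_2)$-plane is compared with the joint pdf from the theorem multiplied by the rectangle's area.

    # Monte Carlo check of the multivariate transformation theorem.
    # Illustrative assumptions: X1, X2 i.i.d. N(0, 1) and g(x1, x2) = (x1 + x2, x1 - x2), so that
    # g^{-1}(y1, y2) = ((y1 + y2)/2, (y1 - y2)/2) and |det d(x1, x2)/d(y1, y2)| = 1/2.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(1)
    x1, x2 = rng.standard_normal((2, 1_000_000))
    y1, y2 = x1 + x2, x1 - x2

    def f_Y(v1, v2):
        # joint pdf of (Y1, Y2) given by the theorem: f_X(g^{-1}(y)) * |Jacobian determinant|
        u1, u2 = (v1 + v2) / 2, (v1 - v2) / 2
        return norm.pdf(u1) * norm.pdf(u2) * 0.5

    # empirical probability of a 0.1 x 0.1 rectangle vs. pdf value times the rectangle's area
    in_rect = (np.abs(y1 - 0.5) < 0.05) & (np.abs(y2 + 0.3) < 0.05)
    print(in_rect.mean(), f_Y(0.5, -0.3) * 0.01)    # the two numbers should be close
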

Moment generating function

Definition. (Moment generating function) The moment generating function (mgf) for the distribution of a random variable $X$ is $M_X(t)=\mathbb E\big[e^{tX}\big]$.

Remark.

  • For comparison: the cdf is $F_X(x)=\mathbb P(X\le x)$.
  • The mgf, similar to the pmf, pdf and cdf, gives a complete description of a distribution, so it can also uniquely identify a distribution, provided that the mgf exists (the expectation may be infinite);
  • i.e., we can recover the probability function from the mgf.
  • The proof of this result is complicated, and thus omitted.

Proposition. (Moment generating property of mgf) Assuming the mgf $M_X(t)$ exists (is finite) for $t\in(-h,h)$ in which $h$ is a positive number, we have
$$\mathbb E[X^n]=M_X^{(n)}(0)=\frac{d^n}{dt^n}M_X(t)\bigg|_{t=0}$$
for each nonnegative integer $n$.

Proof.

  • Since
$$M_X(t)=\mathbb E\big[e^{tX}\big]=\mathbb E\left[\sum_{k=0}^\infty\frac{(tX)^k}{k!}\right]=\sum_{k=0}^\infty\frac{t^k\,\mathbb E[X^k]}{k!},$$
differentiating $n$ times term by term gives
$$M_X^{(n)}(t)=\mathbb E[X^n]+t\,\mathbb E[X^{n+1}]+\frac{t^2}{2!}\,\mathbb E[X^{n+2}]+\cdots.$$
  • The result follows from simplifying the above expression by setting $t=0$.

$\blacksquare$
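
The moment generating property is easy to check symbolically; in the sketch below, sympy and the choice $X\sim\operatorname{Exp}(2)$, whose mgf is $2/(2-t)$ for $t<2$, are illustrative assumptions. The first three derivatives of the mgf at $t=0$ reproduce the moments $\mathbb E[X^n]=n!/2^n$.

    # Moment generating property: E[X^n] equals the n-th derivative of M_X(t) at t = 0.
    # Illustrative assumption: X ~ Exp(2), with mgf M_X(t) = 2/(2 - t) for t < 2.
    import sympy as sp

    t = sp.symbols('t')
    M = 2 / (2 - t)

    for n in range(1, 4):
        print(n, sp.diff(M, t, n).subs(t, 0))   # 1/2, 1/2, 3/4, i.e. n!/2^n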

Proposition. (Relationship between independence and mgf) If $X$ and $Y$ are independent, then
$$M_{X+Y}(t)=M_X(t)\,M_Y(t).$$

Proof.

$$M_{X+Y}(t)=\mathbb E\big[e^{t(X+Y)}\big]=\mathbb E\big[e^{tX}e^{tY}\big]\overset{\text{lote}}{=}\mathbb E\big[\mathbb E\big[e^{tX}e^{tY}\mid X\big]\big]=\mathbb E\big[e^{tX}\,\mathbb E\big[e^{tY}\mid X\big]\big].$$
Similarly, by independence, $\mathbb E\big[e^{tY}\mid X\big]=\mathbb E\big[e^{tY}\big]$, so
$$M_{X+Y}(t)=\mathbb E\big[e^{tX}\big]\mathbb E\big[e^{tY}\big]=M_X(t)\,M_Y(t).$$
  • lote: law of total expectation

$\blacksquare$

Remark.

  • This equality does not hold in general if $X$ and $Y$ are not independent.

Joint moment generating function

In the following, we will use $\mathbf t\cdot\mathbf X$ to denote the dot product $t_1X_1+\cdots+t_nX_n$.

Definition. (Joint moment generating function) The joint moment generating function (mgf) of a random vector $\mathbf X=(X_1,\dots,X_n)^T$ is
$$M_{\mathbf X}(\mathbf t)=\mathbb E\big[e^{\mathbf t\cdot\mathbf X}\big]=\mathbb E\big[e^{t_1X_1+\cdots+t_nX_n}\big]$$
for each (column) vector $\mathbf t=(t_1,\dots,t_n)^T$, if the expectation exists.

Remark.

  • When $n=1$, the dot product of the two vectors is the product of two numbers, and the joint mgf reduces to the (univariate) mgf.
  • $M_{\mathbf X}(\mathbf t)=\mathbb E\big[e^{t_1X_1}\cdots e^{t_nX_n}\big]$.

Proposition. (Relationship between independence and mgf) Random variables $X_1,\dots,X_n$ are independent if and only if
$$M_{\mathbf X}(\mathbf t)=M_{X_1}(t_1)\cdots M_{X_n}(t_n)\quad\text{for every }\mathbf t=(t_1,\dots,t_n)^T.$$

Proof. 'only if' part: Assume $X_1,\dots,X_n$ are independent. Then,
$$M_{\mathbf X}(\mathbf t)=\mathbb E\big[e^{t_1X_1+\cdots+t_nX_n}\big]=\mathbb E\big[e^{t_1X_1}\cdots e^{t_nX_n}\big]=\mathbb E\big[e^{t_1X_1}\big]\cdots\mathbb E\big[e^{t_nX_n}\big]=M_{X_1}(t_1)\cdots M_{X_n}(t_n).$$
The proof for the 'if' part is quite complicated, and thus is omitted.

$\blacksquare$

Analogously, we have the marginal mgf.

Definition. (Marginal mgf) The marginal mgf of $X_i$, which is a member of the random variables $X_1,\dots,X_n$, is
$$M_{X_i}(t_i)=\mathbb E\big[e^{t_iX_i}\big]=M_{\mathbf X}(0,\dots,0,t_i,0,\dots,0),$$
i.e. the joint mgf evaluated with $t_j=0$ for every $j\ne i$.

Proposition. (Moment generating function of linear transformation of random variables) For each constant vector $\mathbf a=(a_1,\dots,a_n)^T$ and real constant $b$, the mgf of $Y=a_1X_1+\cdots+a_nX_n+b$ is
$$M_Y(t)=e^{bt}\,M_{\mathbf X}(a_1t,\dots,a_nt)=e^{bt}\,M_{\mathbf X}(t\mathbf a).$$

Proof.

$$M_Y(t)=\mathbb E\big[e^{t(a_1X_1+\cdots+a_nX_n+b)}\big]=e^{bt}\,\mathbb E\big[e^{(a_1t)X_1+\cdots+(a_nt)X_n}\big]=e^{bt}\,M_{\mathbf X}(t\mathbf a).$$

$\blacksquare$

Remark.

  • If $X_1,\dots,X_n$ are independent, then
$$M_Y(t)=e^{bt}\,M_{X_1}(a_1t)\cdots M_{X_n}(a_nt).$$
  • This provides an alternative, and possibly more convenient, method to derive the distribution of $a_1X_1+\cdots+a_nX_n+b$, compared with deriving it from the probability functions of $X_1,\dots,X_n$.
  • Special case: if $a_1=\cdots=a_n=1$ and $b=0$, then $Y=X_1+\cdots+X_n$, which is the sum of the r.v.'s.
  • So, $M_Y(t)=M_{\mathbf X}(t,\dots,t)$.
  • In particular, if $X_1,\dots,X_n$ are independent, then $M_Y(t)=M_{X_1}(t)\cdots M_{X_n}(t)$.
  • We can use this result to prove the formulas for sums of independent r.v.'s, instead of using the proposition about convolution of r.v.'s.
  • Special case: if $n=1$, then the expression for the linear transformation becomes $Y=a_1X_1+b$.
  • So, $M_Y(t)=e^{bt}\,M_{X_1}(a_1t)$ (see the sketch after this list).

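The following minimal sketch checks the $n=1$ special case $M_{a_1X_1+b}(t)=e^{bt}M_{X_1}(a_1t)$ numerically; the choices $X_1\sim\operatorname{Exp}(1)$, $a_1=0.5$, $b=2$, $t=0.3$, and the use of numpy, are illustrative assumptions.

    # Monte Carlo check of M_Y(t) = e^{bt} M_X(at) for Y = aX + b.
    # Illustrative assumptions: X ~ Exp(1) with M_X(s) = 1/(1 - s) for s < 1; a = 0.5, b = 2, t = 0.3.
    import numpy as np

    rng = np.random.default_rng(2)
    x = rng.exponential(scale=1.0, size=1_000_000)
    a, b, t = 0.5, 2.0, 0.3

    lhs = np.mean(np.exp(t * (a * x + b)))     # empirical E[e^{tY}]
    rhs = np.exp(b * t) / (1 - a * t)          # e^{bt} M_X(at)
    print(lhs, rhs)                            # the two numbers should be close
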

Moment generating function of some important distributions

Proposition. (Moment generating function of binomial distribution) The moment generating function of $\operatorname{Bin}(n,p)$ is $(1-p+pe^t)^n$.

Proof.

$$M_X(t)=\mathbb E\big[e^{tX}\big]=\sum_{x=0}^n e^{tx}\binom nx p^x(1-p)^{n-x}=\sum_{x=0}^n\binom nx(pe^t)^x(1-p)^{n-x}=(pe^t+1-p)^n,$$
by the binomial theorem.

$\blacksquare$
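
The binomial mgf can be verified symbolically for a fixed $n$; in the sketch below, sympy and the choice $n=4$ are illustrative assumptions. It expands the defining sum and compares it with $(1-p+pe^t)^n$.

    # Symbolic check that sum_x e^{tx} C(n, x) p^x (1 - p)^{n - x} equals (1 - p + p e^t)^n, for n = 4.
    # (sympy and the specific n are illustrative assumptions.)
    import sympy as sp

    t, p = sp.symbols('t p')
    n = 4
    mgf = sum(sp.exp(t * x) * sp.binomial(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))
    print(sp.expand(mgf - (1 - p + p * sp.exp(t))**n))   # 0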

Proposition. (Moment generating function of Poisson distribution) The moment generating function of $\operatorname{Pois}(\lambda)$ is $e^{\lambda(e^t-1)}$.

Proof.

$$M_X(t)=\mathbb E\big[e^{tX}\big]=\sum_{x=0}^\infty e^{tx}\,\frac{e^{-\lambda}\lambda^x}{x!}=e^{-\lambda}\sum_{x=0}^\infty\frac{(\lambda e^t)^x}{x!}=e^{-\lambda}e^{\lambda e^t}=e^{\lambda(e^t-1)}.$$

$\blacksquare$
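
Similarly, the Poisson series can be summed in closed form symbolically; sympy is an assumed tool here, with `lam` standing for $\lambda$.

    # Symbolic check that the mgf of Pois(lam) is exp(lam*(e^t - 1)).
    # (sympy is an assumed tool; note e^{tx} lam^x = (lam e^t)^x.)
    import sympy as sp

    t = sp.symbols('t')
    lam = sp.symbols('lam', positive=True)
    x = sp.symbols('x', integer=True, nonnegative=True)

    mgf = sp.exp(-lam) * sp.summation((lam * sp.exp(t))**x / sp.factorial(x), (x, 0, sp.oo))
    print(sp.simplify(mgf - sp.exp(lam * (sp.exp(t) - 1))))   # 0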

Proposition. (Moment generating function of exponential distribution) The moment generating function of $\operatorname{Exp}(\lambda)$ is $\dfrac{\lambda}{\lambda-t}$ for $t<\lambda$.

Proof.

  • $$M_X(t)=\mathbb E\big[e^{tX}\big]=\int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx=\lambda\int_0^\infty e^{-(\lambda-t)x}\,dx=\frac{\lambda}{\lambda-t},\qquad t<\lambda.$$
  • The result follows.

$\blacksquare$
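
The defining integral can also be evaluated symbolically; in the sketch below (sympy assumed), we write $s=\lambda-t>0$, i.e. restrict to $t<\lambda$, so that the integral converges.

    # Direct computation of the mgf of Exp(lam): E[e^{tX}] = lam * integral of e^{-(lam - t)x} over x >= 0.
    # (sympy is an assumed tool; s stands for lam - t, assumed positive, i.e. t < lam.)
    import sympy as sp

    x = sp.symbols('x', nonnegative=True)
    lam, s = sp.symbols('lam s', positive=True)

    mgf = sp.integrate(lam * sp.exp(-s * x), (x, 0, sp.oo))
    print(mgf)   # lam/s, i.e. lam/(lam - t) for t < lam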

Proposition. (Moment generating function of gamma distribution) The moment generating function of $\operatorname{Gamma}(\alpha,\lambda)$ is $\left(\dfrac{\lambda}{\lambda-t}\right)^{\alpha}$ for $t<\lambda$.

Proof.

  • We use a similar proof technique to the one for the mgf of the exponential distribution.
$$M_X(t)=\int_0^\infty e^{tx}\,\frac{\lambda^\alpha x^{\alpha-1}e^{-\lambda x}}{\Gamma(\alpha)}\,dx=\frac{\lambda^\alpha}{(\lambda-t)^\alpha}\int_0^\infty\frac{(\lambda-t)^\alpha x^{\alpha-1}e^{-(\lambda-t)x}}{\Gamma(\alpha)}\,dx=\left(\frac{\lambda}{\lambda-t}\right)^{\alpha},\qquad t<\lambda,$$
since the last integrand is the pdf of $\operatorname{Gamma}(\alpha,\lambda-t)$, which integrates to one.
  • The result follows.

$\blacksquare$

Proposition. (Moment generating function of normal distribution) The moment generating function of $\mathcal N(\mu,\sigma^2)$ is $e^{\mu t+\sigma^2t^2/2}$.

Proof.

  • Let $Z=\dfrac{X-\mu}{\sigma}\sim\mathcal N(0,1)$. Then, $X=\mu+\sigma Z$.
  • First, consider the mgf of $Z$:
$$M_Z(t)=\int_{-\infty}^\infty e^{tz}\,\frac{1}{\sqrt{2\pi}}e^{-z^2/2}\,dz=e^{t^2/2}\int_{-\infty}^\infty\frac{1}{\sqrt{2\pi}}e^{-(z-t)^2/2}\,dz=e^{t^2/2},$$
since the last integrand is the pdf of $\mathcal N(t,1)$, which integrates to one.
  • It follows that the mgf of $X=\mu+\sigma Z$ is
$$M_X(t)=\mathbb E\big[e^{t(\mu+\sigma Z)}\big]=e^{\mu t}\,M_Z(\sigma t)=e^{\mu t+\sigma^2t^2/2}.$$
  • The result follows.

$\blacksquare$
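
The two steps of the proof can be reproduced symbolically; sympy is an assumed tool in the sketch below, which computes the Gaussian integral for $M_Z(t)$ and then rescales to get $M_X(t)$.

    # Mgf of the standard normal Z via its defining integral, then mgf of X = mu + sigma*Z.
    # (sympy is an assumed tool.)
    import sympy as sp

    z, t, mu = sp.symbols('z t mu', real=True)
    sigma = sp.symbols('sigma', positive=True)

    M_Z = sp.integrate(sp.exp(t * z) * sp.exp(-z**2 / 2) / sp.sqrt(2 * sp.pi), (z, -sp.oo, sp.oo))
    print(sp.simplify(M_Z))                           # simplifies to exp(t**2/2)

    M_X = sp.exp(mu * t) * M_Z.subs(t, sigma * t)     # M_X(t) = e^{mu t} M_Z(sigma t)
    print(sp.simplify(M_X))                           # equivalent to exp(mu*t + sigma**2*t**2/2)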


Distribution of linear transformation of random variables

We will prove some propositions about the distributions of linear transformations of random variables using mgf's. Some of them were mentioned in previous chapters. As we will see, proving these propositions using mgf's is quite simple.

Proposition. (Distribution of linear transformation of normal r.v.'s) Let $X\sim\mathcal N(\mu,\sigma^2)$, and let $a\ne 0$ and $b$ be real constants. Then, $aX+b\sim\mathcal N(a\mu+b,a^2\sigma^2)$.

Proof.

  • The mgf of $aX+b$ is
$$M_{aX+b}(t)=e^{bt}\,M_X(at)=e^{bt}\,e^{\mu(at)+\sigma^2(at)^2/2}=e^{(a\mu+b)t+a^2\sigma^2t^2/2},$$
which is the mgf of $\mathcal N(a\mu+b,a^2\sigma^2)$, and the result follows since the mgf identifies a distribution uniquely.

$\blacksquare$

Sum of independent random variables

Proposition. (Sum of independent binomial r.v.'s) Let $X_i\sim\operatorname{Bin}(n_i,p)$ for $i=1,\dots,k$, in which $X_1,\dots,X_k$ are independent. Then, $X_1+\cdots+X_k\sim\operatorname{Bin}(n_1+\cdots+n_k,p)$.

Proof.

  • The mgf of $X_1+\cdots+X_k$ is
$$M_{X_1+\cdots+X_k}(t)=\prod_{i=1}^k M_{X_i}(t)=\prod_{i=1}^k(1-p+pe^t)^{n_i}=(1-p+pe^t)^{n_1+\cdots+n_k},$$
which is the mgf of $\operatorname{Bin}(n_1+\cdots+n_k,p)$, as desired.

$\blacksquare$

Proposition. (Sum of independent Poisson r.v.'s) Let $X_i\sim\operatorname{Pois}(\lambda_i)$ for $i=1,\dots,k$, in which $X_1,\dots,X_k$ are independent. Then, $X_1+\cdots+X_k\sim\operatorname{Pois}(\lambda_1+\cdots+\lambda_k)$.

Proof.

  • The mgf of $X_1+\cdots+X_k$ is
$$M_{X_1+\cdots+X_k}(t)=\prod_{i=1}^k e^{\lambda_i(e^t-1)}=e^{(\lambda_1+\cdots+\lambda_k)(e^t-1)},$$
which is the mgf of $\operatorname{Pois}(\lambda_1+\cdots+\lambda_k)$, as desired.

$\blacksquare$

Proposition. (Sum of independent exponential r.v.'s) Let $X_1,\dots,X_n$ be i.i.d. r.v.'s following $\operatorname{Exp}(\lambda)$. Then, $X_1+\cdots+X_n\sim\operatorname{Gamma}(n,\lambda)$.

Proof.

  • The mgf of $X_1+\cdots+X_n$ is
$$M_{X_1+\cdots+X_n}(t)=\prod_{i=1}^n\frac{\lambda}{\lambda-t}=\left(\frac{\lambda}{\lambda-t}\right)^{n},\qquad t<\lambda,$$
which is the mgf of $\operatorname{Gamma}(n,\lambda)$, as desired.

$\blacksquare$

Proposition. (Sum of independent gamma r.v.'s) Let $X_i\sim\operatorname{Gamma}(\alpha_i,\lambda)$ for $i=1,\dots,k$, in which $X_1,\dots,X_k$ are independent. Then, $X_1+\cdots+X_k\sim\operatorname{Gamma}(\alpha_1+\cdots+\alpha_k,\lambda)$.

Proof.

  • The mgf of $X_1+\cdots+X_k$ is
$$M_{X_1+\cdots+X_k}(t)=\prod_{i=1}^k\left(\frac{\lambda}{\lambda-t}\right)^{\alpha_i}=\left(\frac{\lambda}{\lambda-t}\right)^{\alpha_1+\cdots+\alpha_k},\qquad t<\lambda,$$
which is the mgf of $\operatorname{Gamma}(\alpha_1+\cdots+\alpha_k,\lambda)$, as desired.

$\blacksquare$

Proposition. (Sum of independent normal r.v.'s) Let $X_i\sim\mathcal N(\mu_i,\sigma_i^2)$ for $i=1,\dots,k$, in which $X_1,\dots,X_k$ are independent. Then, $X_1+\cdots+X_k\sim\mathcal N(\mu_1+\cdots+\mu_k,\sigma_1^2+\cdots+\sigma_k^2)$.

Proof.

  • The mgf of $X_1+\cdots+X_k$ (in which they are independent) is
$$M_{X_1+\cdots+X_k}(t)=\prod_{i=1}^k e^{\mu_it+\sigma_i^2t^2/2}=e^{(\mu_1+\cdots+\mu_k)t+(\sigma_1^2+\cdots+\sigma_k^2)t^2/2},$$
which is the mgf of $\mathcal N(\mu_1+\cdots+\mu_k,\sigma_1^2+\cdots+\sigma_k^2)$, as desired.

$\blacksquare$
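
The mgf computation in this proof can be checked symbolically for two summands; sympy is an assumed tool in the sketch below, with `s1`, `s2` standing for $\sigma_1,\sigma_2$.

    # Product of the mgfs of N(mu1, s1^2) and N(mu2, s2^2) vs. the mgf of N(mu1 + mu2, s1^2 + s2^2).
    # (sympy is an assumed tool.)
    import sympy as sp

    t, mu1, mu2 = sp.symbols('t mu1 mu2', real=True)
    s1, s2 = sp.symbols('s1 s2', positive=True)

    def M(mu, var):
        # mgf of N(mu, var)
        return sp.exp(mu * t + var * t**2 / 2)

    print(sp.simplify(M(mu1, s1**2) * M(mu2, s2**2) - M(mu1 + mu2, s1**2 + s2**2)))   # 0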


Central limit theorem

We will provide a proof of the central limit theorem (CLT) using mgf's here.

Theorem. (Central limit theorem) Let $X_1,X_2,\dots$ be a sequence of i.i.d. random variables with finite mean $\mu$ and positive finite variance $\sigma^2$, and let $\bar X_n$ be the sample mean of the first $n$ random variables, i.e. $\bar X_n=\frac1n\sum_{i=1}^nX_i$. Then, the standardized sample mean $\dfrac{\bar X_n-\mu}{\sigma/\sqrt n}$ converges in distribution to a standard normal random variable as $n\to\infty$.

Proof.

  • Define $Z_i=\dfrac{X_i-\mu}{\sigma}$ for each $i$, so that $Z_1,Z_2,\dots$ are i.i.d. with mean $0$ and variance $1$, and $\dfrac{\bar X_n-\mu}{\sigma/\sqrt n}=\dfrac1{\sqrt n}\sum_{i=1}^nZ_i$. Then, assuming the mgf of $Z_1$ exists in a neighbourhood of zero, we have (using $M_{Z_1}(s)=1+s\,\mathbb E[Z_1]+\frac{s^2}{2}\mathbb E[Z_1^2]+o(s^2)=1+\frac{s^2}{2}+o(s^2)$ as $s\to0$)
$$M_{(\bar X_n-\mu)/(\sigma/\sqrt n)}(t)=\left[M_{Z_1}\!\left(\frac{t}{\sqrt n}\right)\right]^n=\left[1+\frac{t^2/2+o(1)}{n}\right]^n,$$
  • which is in the form of $\left(1+\dfrac{x_n}{n}\right)^n$ with $x_n\to\dfrac{t^2}{2}$ as $n\to\infty$.
  • Therefore,
$$M_{(\bar X_n-\mu)/(\sigma/\sqrt n)}(t)\to e^{t^2/2}\quad\text{as }n\to\infty,$$
which is the mgf of the standard normal distribution, and the result follows from the mgf property of identifying a distribution uniquely.

$\blacksquare$

Remark.

  • Since $\dfrac{\bar X_n-\mu}{\sigma/\sqrt n}$ converges in distribution to $\mathcal N(0,1)$,
  • the sample mean $\bar X_n$ approximately follows $\mathcal N(\mu,\sigma^2/n)$ for large $n$.
  • The same result holds exactly for the sample mean of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
  • since if $X_1,\dots,X_n$ are i.i.d. $\mathcal N(\mu,\sigma^2)$, then $\bar X_n\sim\mathcal N(\mu,\sigma^2/n)$.
  • It follows from the proposition about the distribution of linear transformation of normal r.v.'s that the sample sum, i.e. $X_1+\cdots+X_n$, approximately follows $\mathcal N(n\mu,n\sigma^2)$ for large $n$.
  • The same result holds exactly for the sample sum of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
  • since if $X_1,\dots,X_n$ are i.i.d. $\mathcal N(\mu,\sigma^2)$, then $X_1+\cdots+X_n\sim\mathcal N(n\mu,n\sigma^2)$.
  • If a r.v. converges in distribution to some distribution, then we can use that distribution to approximate probabilities involving the r.v. (a small simulation illustrating this appears after this list).
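
A small simulation illustrates the approximation; the choice of $\operatorname{Exp}(1)$ summands, the sample size $n=50$, and the numpy/scipy tooling are illustrative assumptions. The empirical cdf of the standardized sample mean is compared with $\Phi$ at a few points.

    # Simulation sketch of the CLT with X_i i.i.d. Exp(1) (mean 1, variance 1), n = 50.
    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(3)
    n, reps = 50, 200_000
    mu, sigma = 1.0, 1.0

    samples = rng.exponential(scale=1.0, size=(reps, n))
    z = (samples.mean(axis=1) - mu) / (sigma / np.sqrt(n))   # standardized sample means

    for c in (-1.0, 0.0, 1.5):
        print(c, (z <= c).mean(), norm.cdf(c))   # empirical cdf vs. Phi(c); they should be close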

A special case of using the CLT as an approximation is using the normal distribution to approximate a discrete distribution. To improve accuracy, we should ideally apply a continuity correction, as explained in the following.

Proposition. (Continuity correction) A continuity correction is rewriting the probability expression $\mathbb P(X\le i)$ ($i$ is an integer) as $\mathbb P(X\le i+1/2)$ when approximating a discrete distribution by a normal distribution using the CLT (similarly, $\mathbb P(X=i)$ is rewritten as $\mathbb P(i-1/2\le X\le i+1/2)$).

Remark.

  • The reason for doing this is to make $i$ lie at the 'middle' of the interval, so that the probability is better approximated.

Illustration of continuity correction:

| 
|              /
|             /
|            /
|           /|
|          /#|
|         *##|
|        /|##|
|       /#|##|   
|      /##|##|   
|     /|##|##|   
|    / |##|##|   
|   /  |##|##|
|  /   |##|##|
| /    |##|##|
*------*--*--*---------------------
    i-1/2 i i+1/2

| 
|              /
|             /
|            /
|           / 
|          /  
|         *   
|        /|   
|       /#|      
|      /##|      
|     /###|      
|    /####|      
|   /#####|   
|  /|#####|   
| / |#####|   
*---*-----*------------------------
   i-1    i      

| 
|              /|
|             /#|
|            /##|
|           /###|
|          /####|
|         *#####|
|        /|#####|
|       / |#####|
|      /  |#####|
|     /   |#####|
|    /    |#####|
|   /     |#####| 
|  /      |#####|
| /       |#####|
*---------*-----*------------------
          i     i+1 
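
As a numerical illustration of the continuity correction, the sketch below computes the normal approximation to $\mathbb P(X\le 12)$ with and without the correction and compares it with the exact value; the choice $X\sim\operatorname{Bin}(30,0.4)$ and the numpy/scipy tooling are assumptions made here.

    # Normal approximation to P(X <= 12) for X ~ Bin(30, 0.4), with and without continuity correction.
    import numpy as np
    from scipy.stats import binom, norm

    n, p = 30, 0.4
    mu, sigma = n * p, np.sqrt(n * p * (1 - p))

    exact = binom.cdf(12, n, p)
    plain = norm.cdf((12 - mu) / sigma)             # no correction
    corrected = norm.cdf((12 + 0.5 - mu) / sigma)   # continuity correction: 12 -> 12.5
    print(exact, plain, corrected)                  # the corrected value is much closer to the exact one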
  1. Or equivalently, a transformation between the supports of $\mathbf X$ and $\mathbf Y$.