Let $X_1,\ldots,X_n$ be random variables,
$Y_1,\ldots,Y_n$ be another $n$ random variables, and
$\mathbf{X}=(X_1,\ldots,X_n)^T$ and $\mathbf{Y}=(Y_1,\ldots,Y_n)^T$ be random (column) vectors.
Suppose the vector-valued function[1] $g:\mathbb{R}^n\to\mathbb{R}^n$ is bijective (it is also called a one-to-one correspondence in this case).
Then, its inverse function $g^{-1}$ exists.
After that, we can transform $\mathbf{X}$ to $\mathbf{Y}$ by applying the transformation $g$,
i.e. by $\mathbf{Y}=g(\mathbf{X})$,
and transform $\mathbf{Y}$ to $\mathbf{X}$ by applying the inverse transformation $g^{-1}$,
i.e. by $\mathbf{X}=g^{-1}(\mathbf{Y})$.
We are often interested in deriving the joint probability function
of $\mathbf{Y}$,
given the joint probability function of $\mathbf{X}$.
We will examine the discrete and continuous cases one by one in the following.
Proposition.
(transformation of discrete random variables)
For each discrete random vector $\mathbf{X}$ with joint pmf $f_{\mathbf{X}}$, the corresponding joint pmf of the transformed random vector $\mathbf{Y}=g(\mathbf{X})$, where $g$ is bijective, is
$$f_{\mathbf{Y}}(\mathbf{y})=f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big).$$
Proof.
Considering the original pmf $f_{\mathbf{X}}$, we have
$$f_{\mathbf{Y}}(\mathbf{y})=\mathbb{P}(\mathbf{Y}=\mathbf{y})=\mathbb{P}\big(g(\mathbf{X})=\mathbf{y}\big)=\mathbb{P}\big(\mathbf{X}=g^{-1}(\mathbf{y})\big)=f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big).$$
In particular, the inverse $g^{-1}$ exists since $g$ is bijective.
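For instance (an illustrative example of our own choosing), suppose $X\sim\operatorname{Pois}(\lambda)$ and $Y=g(X)=2X$, so that $g^{-1}(y)=y/2$. The proposition gives
$$f_Y(y)=f_X(y/2)=\frac{e^{-\lambda}\lambda^{y/2}}{(y/2)!}\quad\text{for }y\in\{0,2,4,\ldots\},$$
and $f_Y(y)=0$ otherwise.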
For continuous random variables, the situation is more complicated.
Let us first investigate the univariate case, which is simpler.
Theorem.
(Transformation of continuous random variable (univariate case))
Let $X$ be a continuous random variable with pdf $f_X$.
Assume that the function $g:\mathbb{R}\to\mathbb{R}$ is differentiable and strictly monotone.
Then, the pdf of the transformed random variable $Y=g(X)$ is
$$f_Y(y)=f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}g^{-1}(y)\right|.$$
Proof.
Under the assumption that $g$ is differentiable and strictly monotone,
the cdf of $Y$ is
$$F_Y(y)=\mathbb{P}(Y\le y)=\mathbb{P}\big(g(X)\le y\big)=\begin{cases}\mathbb{P}\big(X\le g^{-1}(y)\big)=F_X\big(g^{-1}(y)\big),&\text{if }g\text{ is strictly increasing};\\\mathbb{P}\big(X\ge g^{-1}(y)\big)=1-F_X\big(g^{-1}(y)\big),&\text{if }g\text{ is strictly decreasing}\end{cases}$$
(the inverse $g^{-1}$ exists since $g$ is strictly monotone).
Differentiating both sides of the above equation (assuming the cdf's involved are differentiable) gives
$$f_Y(y)=\begin{cases}f_X\big(g^{-1}(y)\big)\,\dfrac{d}{dy}g^{-1}(y),&\text{if }g\text{ is strictly increasing};\\[1ex]-f_X\big(g^{-1}(y)\big)\,\dfrac{d}{dy}g^{-1}(y),&\text{if }g\text{ is strictly decreasing}.\end{cases}$$
Since $\dfrac{d}{dy}g^{-1}(y)\le 0$ when $g$ (and hence $g^{-1}$) is strictly decreasing, we can write $-f_X\big(g^{-1}(y)\big)\dfrac{d}{dy}g^{-1}(y)$ as $f_X\big(g^{-1}(y)\big)\left|\dfrac{d}{dy}g^{-1}(y)\right|$.
Also, we can summarize the above case-defined function into a single expression by applying the absolute value function to both sides:
$$f_Y(y)=f_X\big(g^{-1}(y)\big)\left|\frac{d}{dy}g^{-1}(y)\right|,$$
where the absolute value sign is only applied to $\dfrac{d}{dy}g^{-1}(y)$, since the pdf's must be nonnegative, and thus we do not need to apply the sign to them.
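As a quick illustration (our own example), take $X$ with pdf $f_X$ and $g(x)=e^x$, which is differentiable and strictly increasing. Then $g^{-1}(y)=\ln y$ and $\dfrac{d}{dy}g^{-1}(y)=\dfrac{1}{y}$, so the theorem gives
$$f_Y(y)=f_X(\ln y)\cdot\frac{1}{y},\qquad y>0;$$
in particular, if $X\sim\mathcal{N}(0,1)$, this is the pdf of the standard log-normal distribution.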
Remark.
To explain this theorem in a more intuitive manner, we rewrite the equation in the theorem as
$$f_Y(y)\,|dy|=f_X(x)\,|dx|,\qquad x=g^{-1}(y),$$
where both sides of the equation can be regarded as differential areas, which are nonnegative due to the absolute value signs.
This equation should intuitively hold since both sides represent areas under the pdf's, which represent probabilities. For $f_X(x)\,|dx|$, it is the area of the region under the pdf of $X$ over an "infinitesimal" interval $[x,x+dx]$, which represents the probability for $X$ to lie in this infinitesimal interval $[x,x+dx]$. After transformation, we get another pdf for $Y$, and the original region is transformed to a region under the pdf of $Y$ over an infinitesimal interval $[y,y+dy]$ with area $f_Y(y)\,|dy|$. Since $g$ is a bijective function (its strict monotonicity implies this), $[y,y+dy]$ "corresponds" to $[x,x+dx]$ in some sense, and we know that the values in $[y,y+dy]$ "originate" from the values in $[x,x+dx]$, and so does the randomness. It follows that the probability for $X$ to lie in $[x,x+dx]$ and for $Y$ to lie in $[y,y+dy]$ should be the same, and hence the two differential areas are equal.
Let us define the Jacobian matrix, and introduce several notations in the definition.
Definition.
(Jacobian matrix)
Suppose the function $g=(g_1,\ldots,g_n):\mathbb{R}^n\to\mathbb{R}^n$ is differentiable (then it follows that each component function $g_i$ is differentiable).
The Jacobian matrix of $g$ is
$$J_g=\begin{pmatrix}\dfrac{\partial g_1}{\partial x_1}&\cdots&\dfrac{\partial g_1}{\partial x_n}\\\vdots&\ddots&\vdots\\\dfrac{\partial g_n}{\partial x_1}&\cdots&\dfrac{\partial g_n}{\partial x_n}\end{pmatrix},$$
in which $g_i$ is the $i$th component function of $g$
for each $i\in\{1,\ldots,n\}$, i.e.
$g(\mathbf{x})=\big(g_1(\mathbf{x}),\ldots,g_n(\mathbf{x})\big)^T$.
Remark.
We have $J_{g^{-1}}\big(g(\mathbf{x})\big)=\big(J_g(\mathbf{x})\big)^{-1}$ when $g^{-1}$ is also differentiable (differentiate both sides of $g^{-1}\big(g(\mathbf{x})\big)=\mathbf{x}$ using the chain rule), and hence $\det J_{g^{-1}}\big(g(\mathbf{x})\big)=\dfrac{1}{\det J_g(\mathbf{x})}$.
Example.
Suppose $n=2$, $\mathbf{x}=(x_1,x_2)^T$, and
$g(x_1,x_2)=(x_1+x_2,\;x_1-x_2)^T$, so that $g_1(x_1,x_2)=x_1+x_2$ and $g_2(x_1,x_2)=x_1-x_2$.
Then,
$\dfrac{\partial g_1}{\partial x_1}=1$, $\dfrac{\partial g_1}{\partial x_2}=1$, $\dfrac{\partial g_2}{\partial x_1}=1$, and $\dfrac{\partial g_2}{\partial x_2}=-1$, so
$$J_g=\begin{pmatrix}1&1\\1&-1\end{pmatrix}.$$
Also, $\det J_g=-2$.
Then, $g^{-1}(y_1,y_2)=\left(\dfrac{y_1+y_2}{2},\;\dfrac{y_1-y_2}{2}\right)^T$,
$$J_{g^{-1}}=\begin{pmatrix}1/2&1/2\\1/2&-1/2\end{pmatrix},$$
and $\det J_{g^{-1}}=-\dfrac{1}{2}=\dfrac{1}{\det J_g}$, agreeing with the above remark.
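Such computations can also be checked mechanically. Here is a small sketch using SymPy (assuming SymPy is available; the map $g$ is the illustrative one above):

```python
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

# Forward map g(x1, x2) = (x1 + x2, x1 - x2) and its inverse.
g = sp.Matrix([x1 + x2, x1 - x2])
g_inv = sp.Matrix([(y1 + y2) / 2, (y1 - y2) / 2])

J_g = g.jacobian([x1, x2])          # Matrix([[1, 1], [1, -1]])
J_g_inv = g_inv.jacobian([y1, y2])  # Matrix([[1/2, 1/2], [1/2, -1/2]])

# The determinants are reciprocals of each other, as in the remark above.
print(J_g.det(), J_g_inv.det())     # -2, -1/2
```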
Theorem.
(Transformation of continuous random variables)
Let $\mathbf{X}$ be a continuous random vector with joint pdf $f_{\mathbf{X}}$, and assume $g:\mathbb{R}^n\to\mathbb{R}^n$ is differentiable and bijective.
The corresponding joint pdf of the transformed random vector $\mathbf{Y}=g(\mathbf{X})$ is
$$f_{\mathbf{Y}}(\mathbf{y})=f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big)\,\big|\det J_{g^{-1}}(\mathbf{y})\big|.$$
Proof. (Partial proof)
Assume $g$ is differentiable and bijective.
First, for every (measurable) set $B\subseteq\mathbb{R}^n$,
$$\mathbb{P}(\mathbf{Y}\in B)=\int_B f_{\mathbf{Y}}(\mathbf{y})\,d\mathbf{y}.\qquad(*)$$
On the other hand,
we have
$$\mathbb{P}(\mathbf{Y}\in B)=\mathbb{P}\big(g(\mathbf{X})\in B\big)=\mathbb{P}\big(\mathbf{X}\in g^{-1}(B)\big)=\int_{g^{-1}(B)}f_{\mathbf{X}}(\mathbf{x})\,d\mathbf{x},$$
where $g^{-1}(B)=\{\mathbf{x}:g(\mathbf{x})\in B\}$, which is the preimage of the set $B$ under $g$.
Applying the change of variable formula $\mathbf{x}=g^{-1}(\mathbf{y})$ to this integral (whose proof is advanced and uses our assumptions), we get
$$\mathbb{P}(\mathbf{Y}\in B)=\int_B f_{\mathbf{X}}\big(g^{-1}(\mathbf{y})\big)\,\big|\det J_{g^{-1}}(\mathbf{y})\big|\,d\mathbf{y}.\qquad(**)$$
Comparing the integrands in $(*)$ and $(**)$, we can observe the desired result.
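As one standard consequence (a sketch of our own, not an example from the text): let $\mathbf{Y}=A\mathbf{X}$ for an invertible constant matrix $A$, so that $g(\mathbf{x})=A\mathbf{x}$, $g^{-1}(\mathbf{y})=A^{-1}\mathbf{y}$ and $J_{g^{-1}}=A^{-1}$. The theorem then gives
$$f_{\mathbf{Y}}(\mathbf{y})=f_{\mathbf{X}}\big(A^{-1}\mathbf{y}\big)\,\big|\det A^{-1}\big|=\frac{f_{\mathbf{X}}\big(A^{-1}\mathbf{y}\big)}{|\det A|}.$$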
Definition.
(Moment generating function)
The moment generating function (mgf) for the distribution of a
random variable $X$ is
$$M_X(t)=\mathbb{E}\big[e^{tX}\big]$$
for each real number $t$ at which the expectation exists.
Remark.
For comparison: the cdf is $F_X(x)=\mathbb{P}(X\le x)$.
The mgf, similar to the pmf, pdf and cdf, gives a complete description of a distribution, so it can also uniquely identify a distribution, provided that the mgf exists (the expectation may be infinite),
i.e., we can recover the probability function from the mgf.
The proof of this result is complicated, and thus omitted.
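For instance (an illustration of our own): if $X\sim\operatorname{Exp}(\lambda)$ with pdf $f_X(x)=\lambda e^{-\lambda x}$ for $x>0$, then
$$M_X(t)=\int_0^\infty e^{tx}\,\lambda e^{-\lambda x}\,dx=\frac{\lambda}{\lambda-t}\quad\text{for }t<\lambda,$$
while the expectation is infinite for $t\ge\lambda$; existence of the mgf in a neighbourhood of $0$ is what the uniqueness result requires.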
Proposition.
(Moment generating property of mgf)
Assuming the mgf $M_X(t)$ exists for $|t|<h$, in which $h$ is a positive number, we have
$$\mathbb{E}\big[X^n\big]=M_X^{(n)}(0)$$
for each nonnegative integer $n$, where $M_X^{(n)}$ denotes the $n$th derivative of $M_X$ (and $M_X^{(0)}=M_X$).
Proof.
Since
$$M_X(t)=\mathbb{E}\big[e^{tX}\big]=\mathbb{E}\left[\sum_{k=0}^{\infty}\frac{(tX)^k}{k!}\right]=\sum_{k=0}^{\infty}\frac{\mathbb{E}\big[X^k\big]}{k!}\,t^k$$
(the interchange of expectation and summation can be justified using the assumption that the mgf exists for $|t|<h$),
the result follows from differentiating the above series $n$ times term by term and then setting $t=0$: each term with $k<n$ vanishes after the differentiation, each term with $k>n$ still contains a positive power of $t$ and hence vanishes at $t=0$, and the $k=n$ term becomes $\mathbb{E}[X^n]$.
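Continuing the illustrative exponential example above, $M_X(t)=\dfrac{\lambda}{\lambda-t}$ gives
$$M_X'(t)=\frac{\lambda}{(\lambda-t)^2},\qquad M_X''(t)=\frac{2\lambda}{(\lambda-t)^3},$$
so $\mathbb{E}[X]=M_X'(0)=\dfrac{1}{\lambda}$, $\mathbb{E}[X^2]=M_X''(0)=\dfrac{2}{\lambda^2}$, and hence $\operatorname{Var}(X)=\dfrac{2}{\lambda^2}-\dfrac{1}{\lambda^2}=\dfrac{1}{\lambda^2}$.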
Proposition.
(Relationship between independence and mgf)
If $X$ and $Y$ are independent, then
$$M_{X+Y}(t)=M_X(t)\,M_Y(t).$$
Proof.
$$M_{X+Y}(t)=\mathbb{E}\big[e^{t(X+Y)}\big]=\mathbb{E}\big[e^{tX}e^{tY}\big]\overset{\text{lote}}{=}\mathbb{E}\Big[\mathbb{E}\big[e^{tX}e^{tY}\mid Y\big]\Big]=\mathbb{E}\Big[e^{tY}\,\mathbb{E}\big[e^{tX}\mid Y\big]\Big]\overset{\text{independence}}{=}\mathbb{E}\big[e^{tX}\big]\,\mathbb{E}\big[e^{tY}\big]=M_X(t)\,M_Y(t).$$
lote: law of total expectation
Remark.
This equality does not hold in general if $X$ and $Y$ are not independent.
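As one simple illustration (an example of our own): take $Y=X$ with $X\sim\mathcal{N}(0,1)$, which is certainly not independent of itself. Then
$$M_{X+Y}(t)=M_{2X}(t)=\mathbb{E}\big[e^{2tX}\big]=e^{2t^2},\qquad\text{while}\qquad M_X(t)\,M_Y(t)=e^{t^2/2}\cdot e^{t^2/2}=e^{t^2},$$
so the factorization fails.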
Definition.
(Joint moment generating function)
The joint moment generating function (mgf) of a random vector $\mathbf{X}=(X_1,\ldots,X_n)^T$ is
$$M_{\mathbf{X}}(\mathbf{t})=\mathbb{E}\big[e^{\mathbf{t}\cdot\mathbf{X}}\big]=\mathbb{E}\big[e^{t_1X_1+\cdots+t_nX_n}\big]$$
for each (column) vector $\mathbf{t}=(t_1,\ldots,t_n)^T$,
if the expectation exists.
Remark.
When $n=1$, the dot product of two vectors is the product of two numbers, so the joint mgf reduces to the (univariate) mgf:
$M_{\mathbf{X}}(\mathbf{t})=\mathbb{E}\big[e^{t_1X_1}\big]=M_{X_1}(t_1)$.
Proposition.
(Relationship between independence and mgf)
Random variables $X_1,\ldots,X_n$ are independent if and only if
$$M_{\mathbf{X}}(\mathbf{t})=M_{X_1}(t_1)\cdots M_{X_n}(t_n)$$
for every $\mathbf{t}=(t_1,\ldots,t_n)^T$ at which the mgf's exist.
Proof.
'only if' part:
Assume $X_1,\ldots,X_n$ are independent. Then,
$$M_{\mathbf{X}}(\mathbf{t})=\mathbb{E}\big[e^{t_1X_1+\cdots+t_nX_n}\big]=\mathbb{E}\big[e^{t_1X_1}\cdots e^{t_nX_n}\big]=\mathbb{E}\big[e^{t_1X_1}\big]\cdots\mathbb{E}\big[e^{t_nX_n}\big]=M_{X_1}(t_1)\cdots M_{X_n}(t_n),$$
where the third equality follows from the independence of $X_1,\ldots,X_n$.
The proof of the 'if' part is quite complicated, and thus is omitted.
Analogously, we have the marginal mgf.
Definition.
(Marginal mgf)
The marginal mgf of $X_i$, which is a member of the random variables $X_1,\ldots,X_n$, is
$$M_{X_i}(t_i)=\mathbb{E}\big[e^{t_iX_i}\big]=M_{\mathbf{X}}(0,\ldots,0,t_i,0,\ldots,0),$$
i.e. the joint mgf with $t_j=0$ for every $j\ne i$.
Proposition.
(Moment generating function of linear transformation of random variables)
For each constant vector $\mathbf{b}=(b_1,\ldots,b_n)^T$
and a real constant $a$, the mgf of $Y=a+\mathbf{b}\cdot\mathbf{X}=a+b_1X_1+\cdots+b_nX_n$ is
$$M_Y(t)=e^{at}\,M_{\mathbf{X}}(t\mathbf{b})=e^{at}\,M_{\mathbf{X}}(b_1t,\ldots,b_nt).$$
Proof.
$$M_Y(t)=\mathbb{E}\big[e^{t(a+\mathbf{b}\cdot\mathbf{X})}\big]=e^{at}\,\mathbb{E}\big[e^{(t\mathbf{b})\cdot\mathbf{X}}\big]=e^{at}\,M_{\mathbf{X}}(t\mathbf{b}).$$
Remark.
If $X_1,\ldots,X_n$ are independent, then $M_Y(t)=e^{at}\,M_{X_1}(b_1t)\cdots M_{X_n}(b_nt)$.
This provides an alternative, and possibly more convenient, method to derive the distribution of $Y$, compared with deriving it from the probability functions of $X_1,\ldots,X_n$.
Special case: if $a=0$ and $\mathbf{b}=(1,\ldots,1)^T$, then $Y=X_1+\cdots+X_n$, which is the sum of the $n$ r.v.'s.
So, $M_Y(t)=M_{\mathbf{X}}(t,\ldots,t)$.
In particular, if $X_1,\ldots,X_n$ are independent, then $M_Y(t)=M_{X_1}(t)\cdots M_{X_n}(t)$.
We can use this result to prove the formulas for sums of independent r.v.'s (as illustrated below), instead of using the proposition about convolution of r.v.'s.
Special case: if $n=1$, then the expression for the linear transformation becomes $Y=a+bX$.
So, $M_Y(t)=e^{at}\,M_X(bt)$.
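For example (our own sketch of this technique), let $X_1,\ldots,X_n$ be independent with $X_i\sim\operatorname{Pois}(\lambda_i)$, whose mgf is $M_{X_i}(t)=\exp\big(\lambda_i(e^t-1)\big)$. Then
$$M_{X_1+\cdots+X_n}(t)=\prod_{i=1}^{n}\exp\big(\lambda_i(e^t-1)\big)=\exp\Big(\big(\lambda_1+\cdots+\lambda_n\big)(e^t-1)\Big),$$
which is the mgf of $\operatorname{Pois}(\lambda_1+\cdots+\lambda_n)$; since the mgf identifies a distribution uniquely, $X_1+\cdots+X_n\sim\operatorname{Pois}(\lambda_1+\cdots+\lambda_n)$.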
Moment generating function of some important distributions
We will prove some propositions about distributions of linear transformation of random variables using mgf. Some of them are mentioned in previous chapters.
As we will see, proving these propositions using mgf is quite simple.
Proposition.
(Distribution of linear transformation of normal r.v.'s)
Let $X_1,\ldots,X_n$ be independent random variables with $X_i\sim\mathcal{N}(\mu_i,\sigma_i^2)$, and let $a,b_1,\ldots,b_n$ be real constants.
Then, $a+b_1X_1+\cdots+b_nX_n\sim\mathcal{N}\big(a+b_1\mu_1+\cdots+b_n\mu_n,\ b_1^2\sigma_1^2+\cdots+b_n^2\sigma_n^2\big)$.
Proof.
The mgf of $Y=a+b_1X_1+\cdots+b_nX_n$ is, using the normal mgf $M_{X_i}(t)=\exp\!\big(\mu_it+\tfrac{\sigma_i^2t^2}{2}\big)$,
$$M_Y(t)=e^{at}\prod_{i=1}^{n}M_{X_i}(b_it)=e^{at}\prod_{i=1}^{n}\exp\left(\mu_ib_it+\frac{\sigma_i^2b_i^2t^2}{2}\right)=\exp\left(\Big(a+\sum_{i=1}^{n}b_i\mu_i\Big)t+\frac{\big(\sum_{i=1}^{n}b_i^2\sigma_i^2\big)t^2}{2}\right),$$
which is the mgf of $\mathcal{N}\big(a+\sum_i b_i\mu_i,\ \sum_i b_i^2\sigma_i^2\big)$, and the result follows since the mgf identifies a distribution uniquely.
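For instance (our own quick corollary of this proposition): if $X_1\sim\mathcal{N}(\mu_1,\sigma_1^2)$ and $X_2\sim\mathcal{N}(\mu_2,\sigma_2^2)$ are independent, then taking $a=0$, $b_1=1$, $b_2=-1$ gives
$$X_1-X_2\sim\mathcal{N}\big(\mu_1-\mu_2,\ \sigma_1^2+\sigma_2^2\big).$$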
We will provide a proof of the central limit theorem (CLT) using mgf here.
Theorem.
(Central limit theorem)
Let $X_1,X_2,\ldots$ be a sequence of i.i.d. random variables with finite mean $\mu$ and positive, finite variance $\sigma^2$,
and let $\bar{X}_n$ be the sample mean of the first $n$ random variables, i.e.
$\bar{X}_n=\dfrac{1}{n}\sum_{i=1}^{n}X_i$.
Then, the standardized sample mean
$$\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}$$
converges in distribution to a standard normal random variable as $n\to\infty$.
Proof.
(We assume in addition that the mgf of $X_1$ exists for $|t|<h$ for some $h>0$, so that the argument below makes sense.)
Define $Z_i=\dfrac{X_i-\mu}{\sigma}$, so that $\mathbb{E}[Z_i]=0$ and $\mathbb{E}[Z_i^2]=\operatorname{Var}(Z_i)=1$. Then, we have
$$\frac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}=\frac{1}{\sqrt{n}}\sum_{i=1}^{n}Z_i,$$
which is in the form of a linear transformation of $Z_1,\ldots,Z_n$ with $a=0$ and $b_1=\cdots=b_n=\dfrac{1}{\sqrt{n}}$.
Therefore,
$$M_{(\bar{X}_n-\mu)/(\sigma/\sqrt{n})}(t)=\prod_{i=1}^{n}M_{Z_i}\!\left(\frac{t}{\sqrt{n}}\right)=\left[M_{Z_1}\!\left(\frac{t}{\sqrt{n}}\right)\right]^{n}=\left[1+\frac{t^2}{2n}+o\!\left(\frac{1}{n}\right)\right]^{n}\to e^{t^2/2}\quad\text{as }n\to\infty,$$
using the expansion $M_{Z_1}(s)=1+s\,\mathbb{E}[Z_1]+\frac{s^2}{2}\mathbb{E}[Z_1^2]+o(s^2)=1+\frac{s^2}{2}+o(s^2)$ with $s=t/\sqrt{n}$. The limit $e^{t^2/2}$ is the mgf of the standard normal distribution,
and the result follows from the mgf property of identifying a distribution uniquely.
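As an informal numerical check (a sketch of our own, assuming NumPy is available), one can simulate standardized sample means of, say, i.i.d. $\operatorname{Exp}(1)$ random variables (so $\mu=\sigma=1$) and compare them with the standard normal distribution:

```python
import numpy as np

rng = np.random.default_rng(0)

# Standardized sample means of n i.i.d. Exponential(1) draws (mu = 1, sigma = 1).
n, reps = 200, 100_000
samples = rng.exponential(scale=1.0, size=(reps, n))
z = (samples.mean(axis=1) - 1.0) / (1.0 / np.sqrt(n))

print(z.mean(), z.std())    # close to 0 and 1
print(np.mean(z > 1.96))    # close to P(Z > 1.96) ~ 0.025 for standard normal Z
```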
Remark.
Since $\bar{X}_n=\mu+\dfrac{\sigma}{\sqrt{n}}\cdot\dfrac{\bar{X}_n-\mu}{\sigma/\sqrt{n}}$,
the sample mean $\bar{X}_n$ approximately follows the $\mathcal{N}\big(\mu,\sigma^2/n\big)$ distribution when $n$ is large.
The same result holds exactly for the sample mean of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
since if $X_1,\ldots,X_n\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu,\sigma^2)$, then $\bar{X}_n\sim\mathcal{N}\big(\mu,\sigma^2/n\big)$.
It follows from the proposition about the distribution of linear transformations of normal r.v.'s that the sample sum, i.e. $\sum_{i=1}^{n}X_i=n\bar{X}_n$, approximately follows the $\mathcal{N}\big(n\mu,n\sigma^2\big)$ distribution when $n$ is large.
The same result holds exactly for the sample sum of normal r.v.'s with the same mean $\mu$ and the same variance $\sigma^2$,
since if $X_1,\ldots,X_n\overset{\text{i.i.d.}}{\sim}\mathcal{N}(\mu,\sigma^2)$, then $\sum_{i=1}^{n}X_i\sim\mathcal{N}\big(n\mu,n\sigma^2\big)$.
If a r.v. converges in distribution to some distribution, then we can use that distribution to approximate probabilities involving the r.v.
A special case of using the CLT as an approximation is using the normal distribution to approximate a discrete distribution.
To improve accuracy, we should ideally apply a continuity correction, as explained in the following.
Proposition.
(Continuity correction)
A continuity correction is rewriting the probability expression $\mathbb{P}(X=k)$ ($k$ is an integer) as
$$\mathbb{P}(k-0.5\le X\le k+0.5)$$
when approximating a discrete distribution by a normal distribution using the CLT.
Remark.
The reason for doing this is to make $k$ lie at the 'middle' of the interval $[k-0.5,\,k+0.5]$, so that the probability is better approximated (a continuous distribution assigns zero probability to the single point $k$ itself).
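As a small numerical illustration (our own sketch, assuming NumPy and SciPy are available): approximate $\mathbb{P}(X=50)$ for $X\sim\operatorname{Bin}(100,0.5)$ by a normal distribution with the same mean and variance, using the continuity correction.

```python
import numpy as np
from scipy.stats import binom, norm

# X ~ Binomial(100, 0.5): mean 50, standard deviation 5.
n, p, k = 100, 0.5, 50
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

exact = binom.pmf(k, n, p)                   # ~0.0796
# Without the correction, a continuous normal gives P(X = 50) = 0,
# so we approximate P(49.5 <= X <= 50.5) instead.
approx = norm.cdf(k + 0.5, loc=mu, scale=sigma) - norm.cdf(k - 0.5, loc=mu, scale=sigma)

print(exact, approx)                         # both ~0.08
```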