# Probability/Transformation of Random Variables

## Transformation of random variables

### Underlying principle

Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be ${\displaystyle n}$  random variables, ${\displaystyle Y_{1},\dotsc ,Y_{n}}$  be another ${\displaystyle n}$  random variables, and ${\displaystyle \mathbf {X} =(X_{1},\dotsc ,X_{n})^{T},\mathbf {Y} =(Y_{1},\dotsc ,Y_{n})^{T}}$  be random (column) vectors.

Suppose the vector-valued function[1] ${\displaystyle \mathbf {g} :\operatorname {supp} (\mathbf {X} )\to \operatorname {supp} (\mathbf {Y} )}$  is one-to-one. Then, its inverse ${\displaystyle \mathbf {g} ^{-1}:\operatorname {supp} (\mathbf {Y} )\to \operatorname {supp} (\mathbf {X} )}$  exists.

After that, we can transform ${\displaystyle \mathbf {X} }$  to ${\displaystyle \mathbf {Y} }$  by applying the transformation ${\displaystyle \mathbf {g} }$ , i.e. by ${\displaystyle \mathbf {Y} =\mathbf {g} (\mathbf {X} )}$ , and transform ${\displaystyle \mathbf {Y} }$  to ${\displaystyle \mathbf {X} }$  by applying the inverse transformation ${\displaystyle \mathbf {g} ^{-1}}$ , i.e. by ${\displaystyle \mathbf {X} =\mathbf {g} ^{-1}(\mathbf {Y} )}$ .

We are often interested in deriving the joint probability function ${\displaystyle f_{\mathbf {Y} }(\mathbf {y} )}$  of ${\displaystyle \mathbf {Y} }$ , given the joint probability function ${\displaystyle f_{\mathbf {X} }(\mathbf {x} )}$  of ${\displaystyle \mathbf {X} }$ . We will examine the discrete and continuous cases one by one in the following.

### Transformation of discrete random variables

Proposition. (transformation of discrete random variables) For each discrete random vector ${\displaystyle \mathbf {X} }$  with joint pmf ${\displaystyle f_{\mathbf {X} }(\mathbf {x} )}$ , the corresponding joint pmf of the one-to-one transformed random vector ${\displaystyle \mathbf {Y} =\mathbf {g} (\mathbf {X} )}$  is

${\displaystyle f_{\mathbf {Y} }(\mathbf {y} )=f_{\mathbf {X} }\left(\mathbf {g} ^{-1}(\mathbf {y} )\right),\quad \mathbf {y} \in \operatorname {supp} (\mathbf {Y} ).}$

Proof.

• For each ${\displaystyle \mathbf {y} \in \operatorname {supp} (\mathbf {Y} )}$ ,

${\displaystyle f_{\mathbf {Y} }(\mathbf {y} ){\overset {\text{ def }}{=}}\mathbb {P} (\mathbf {Y} =\mathbf {y} )=\mathbb {P} \left(\mathbf {g} ^{-1}(\mathbf {Y} )=\mathbf {g} ^{-1}(\mathbf {y} )\right)=\mathbb {P} \left(\mathbf {X} =\mathbf {g} ^{-1}(\mathbf {y} )\right){\overset {\text{ def }}{=}}f_{\mathbf {X} }\left(\mathbf {g} ^{-1}(\mathbf {y} )\right).}$

• Here, the inverse ${\displaystyle \mathbf {g} ^{-1}}$  exists since the transformation is one-to-one, which justifies the second equality.

${\displaystyle \Box }$
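The proposition can be checked numerically; below is a minimal sketch (the distribution ${\displaystyle \operatorname {Binom} (10,0.3)}$  and the one-to-one map ${\displaystyle g(x)=2x+1}$  are arbitrary choices for illustration).

```python
# Sanity check of f_Y(y) = f_X(g^{-1}(y)) for a discrete r.v.
# Assumed setup: X ~ Binom(10, 0.3) and g(x) = 2x + 1, one-to-one on supp(X).
import numpy as np
from scipy import stats

X = stats.binom(10, 0.3)
g = lambda x: 2 * x + 1
g_inv = lambda y: (y - 1) // 2

rng = np.random.default_rng(0)
samples_y = g(X.rvs(size=100_000, random_state=rng))

for y in g(np.arange(11)):                  # supp(Y) = {1, 3, ..., 21}
    empirical = np.mean(samples_y == y)     # estimate of P(Y = y)
    theoretical = X.pmf(g_inv(y))           # f_X(g^{-1}(y))
    print(f"y={y:2d}  empirical={empirical:.4f}  theoretical={theoretical:.4f}")
```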

### Transformation of continuous random variables

For continuous random variables, the situation is more complicated. Let us define the Jacobian matrix, and introduce several notations in the definition.

Definition. (Jacobian matrix) Suppose the function ${\displaystyle \mathbf {g} }$  is differentiable with invertible Jacobian matrix (it then follows, by the inverse function theorem, that ${\displaystyle \mathbf {g} ^{-1}}$  is also differentiable). The Jacobian matrix is

${\displaystyle {\frac {\partial \mathbf {y} }{\partial \mathbf {x} }}={\begin{pmatrix}{\frac {\partial g_{1}(\mathbf {x} )}{\partial x_{1}}}&\dotsb &{\frac {\partial g_{1}(\mathbf {x} )}{\partial x_{n}}}\\\vdots &\ddots &\vdots \\{\frac {\partial g_{n}(\mathbf {x} )}{\partial x_{1}}}&\dotsb &{\frac {\partial g_{n}(\mathbf {x} )}{\partial x_{n}}}\end{pmatrix}},\quad \mathbf {y} =\mathbf {g} (\mathbf {x} )}$

in which ${\displaystyle g_{j}}$  is the component function of ${\displaystyle \mathbf {g} }$  for each ${\displaystyle j\in \{1,\dotsc ,n\}}$ , i.e. ${\displaystyle \mathbf {g} (\mathbf {x} )=(g_{1}(\mathbf {x} ),\dotsc ,g_{n}(\mathbf {x} ))}$ .

Remark.

• By the chain rule applied to ${\displaystyle \mathbf {g} {\big (}\mathbf {g} ^{-1}(\mathbf {y} ){\big )}=\mathbf {y} }$ , we have ${\displaystyle {\frac {\partial \mathbf {y} }{\partial \mathbf {x} }}{\frac {\partial \mathbf {x} }{\partial \mathbf {y} }}=I_{n\times n}\Leftrightarrow {\frac {\partial \mathbf {y} }{\partial \mathbf {x} }}=\left({\frac {\partial \mathbf {x} }{\partial \mathbf {y} }}\right)^{-1}}$ .

Example. Suppose ${\displaystyle \mathbf {x} =(x_{1},x_{2})}$ , ${\displaystyle \mathbf {y} =(y_{1},y_{2})}$ , and ${\displaystyle \mathbf {y} =\mathbf {g} (\mathbf {x} )=({\color {red}2x_{1}},{\color {blue}3x_{2}})}$ . Then, ${\displaystyle g_{1}(\mathbf {x} )={\color {red}2x_{1}}}$ ,${\displaystyle g_{2}(\mathbf {x} )={\color {blue}3x_{2}}}$ , and

${\displaystyle {\frac {\partial \mathbf {y} }{\partial \mathbf {x} }}={\begin{pmatrix}{\frac {\partial ({\color {red}2x_{1}})}{\partial x_{1}}}&{\frac {\partial ({\color {red}2x_{1}})}{\partial x_{2}}}\\{\frac {\partial ({\color {blue}3x_{2}})}{\partial x_{1}}}&{\frac {\partial ({\color {blue}3x_{2}})}{\partial x_{2}}}\end{pmatrix}}={\begin{pmatrix}2&0\\0&3\end{pmatrix}}.}$

Also, ${\displaystyle \mathbf {x} =\mathbf {g} ^{-1}(\mathbf {y} )=({\color {darkgreen}y_{1}/2},{\color {purple}y_{2}/3})}$ . Then, ${\displaystyle g_{1}^{-1}(\mathbf {y} )={\color {darkgreen}y_{1}/2}}$ , ${\displaystyle g_{2}^{-1}(\mathbf {y} )={\color {purple}y_{2}/3}}$ , and

${\displaystyle {\frac {\partial \mathbf {x} }{\partial \mathbf {y} }}={\begin{pmatrix}{\frac {\partial ({\color {darkgreen}y_{1}/2})}{\partial y_{1}}}&{\frac {\partial ({\color {darkgreen}y_{1}/2})}{\partial y_{2}}}\\{\frac {\partial ({\color {purple}y_{2}/3})}{\partial y_{1}}}&{\frac {\partial ({\color {purple}y_{2}/3})}{\partial y_{2}}}\end{pmatrix}}={\begin{pmatrix}1/2&0\\0&1/3\end{pmatrix}}={\begin{pmatrix}2&0\\0&3\end{pmatrix}}^{-1}={\frac {1}{6}}{\begin{pmatrix}3&0\\0&2\end{pmatrix}}.}$
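This example (and the remark before it) can be reproduced symbolically; a minimal sketch with sympy:

```python
# Symbolic check that dx/dy is the matrix inverse of dy/dx
# for the example g(x) = (2*x1, 3*x2).
import sympy as sp

x1, x2, y1, y2 = sp.symbols('x1 x2 y1 y2')

g = sp.Matrix([2 * x1, 3 * x2])          # y = g(x)
g_inv = sp.Matrix([y1 / 2, y2 / 3])      # x = g^{-1}(y)

J = g.jacobian([x1, x2])                 # dy/dx = [[2, 0], [0, 3]]
J_inv = g_inv.jacobian([y1, y2])         # dx/dy = [[1/2, 0], [0, 1/3]]

print(J, J_inv)
print(J * J_inv == sp.eye(2))            # True: dy/dx is the inverse of dx/dy
```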

Theorem. (Transformation of continuous random variables) For each continuous random vector ${\displaystyle \mathbf {X} }$  with joint pdf ${\displaystyle f_{\mathbf {X} }(\mathbf {x} )}$ , and assuming differentiability of ${\displaystyle \mathbf {g} }$  (and thus also of ${\displaystyle \mathbf {g} ^{-1}}$ ), the corresponding joint pdf of the one-to-one transformed random vector ${\displaystyle \mathbf {Y} =\mathbf {g} (\mathbf {X} )}$  is

${\displaystyle f_{\mathbf {Y} }(\mathbf {y} )=f_{\mathbf {X} }{\big (}\mathbf {g} ^{-1}(\mathbf {y} ){\big )}\left|\det {\frac {\partial \mathbf {x} }{\partial \mathbf {y} }}\right|,\quad \mathbf {y} \in \operatorname {supp} (\mathbf {Y} ).}$

Proof.

• For each region ${\displaystyle B\subseteq \operatorname {supp} (\mathbf {Y} )}$ , since ${\displaystyle \mathbf {g} }$  is one-to-one, ${\displaystyle \mathbb {P} (\mathbf {Y} \in B)=\mathbb {P} {\big (}\mathbf {X} \in \mathbf {g} ^{-1}(B){\big )}=\int \dotsi \int _{\mathbf {g} ^{-1}(B)}f_{\mathbf {X} }(\mathbf {x} )\,dx_{1}\cdots \,dx_{n}}$ .
• By the change of variable formula for multiple integration, with ${\displaystyle \mathbf {x} =\mathbf {g} ^{-1}(\mathbf {y} )}$ , the integral equals ${\displaystyle \int \dotsi \int _{B}f_{\mathbf {X} }{\big (}\mathbf {g} ^{-1}(\mathbf {y} ){\big )}\left|\det {\frac {\partial \mathbf {x} }{\partial \mathbf {y} }}\right|\,dy_{1}\cdots \,dy_{n}}$ .
• Since ${\displaystyle \mathbb {P} (\mathbf {Y} \in B)}$  is obtained by integrating ${\displaystyle f_{\mathbf {X} }{\big (}\mathbf {g} ^{-1}(\mathbf {y} ){\big )}\left|\det {\frac {\partial \mathbf {x} }{\partial \mathbf {y} }}\right|}$  over ${\displaystyle B}$  for every region ${\displaystyle B}$ , this function is the joint pdf of ${\displaystyle \mathbf {Y} }$ .
• The result follows.

${\displaystyle \Box }$
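As a one-dimensional sanity check of the theorem, consider the (arbitrarily chosen) map ${\displaystyle Y=e^{X}}$  with ${\displaystyle X\sim {\mathcal {N}}(0,1)}$ : then ${\displaystyle g^{-1}(y)=\ln y}$  and ${\displaystyle \left|\det {\frac {\partial x}{\partial y}}\right|=1/y}$ , so the theorem gives ${\displaystyle f_{Y}(y)=f_{X}(\ln y)/y}$ , the standard lognormal pdf. A minimal sketch:

```python
# Check f_Y(y) = f_X(g^{-1}(y)) |dx/dy| for Y = exp(X), X ~ N(0, 1),
# against scipy's built-in lognormal density.
import numpy as np
from scipy import stats

y = np.linspace(0.2, 5.0, 50)
f_Y_formula = stats.norm.pdf(np.log(y)) / y    # f_X(ln y) * (1/y)
f_Y_reference = stats.lognorm.pdf(y, s=1.0)    # standard lognormal pdf

print(np.allclose(f_Y_formula, f_Y_reference))  # True
```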

## Moment generating function

Definition. (Moment generating function) The moment generating function (mgf) for the distribution of a random variable ${\displaystyle X}$  is ${\displaystyle M_{X}({\color {darkgreen}t})=\mathbb {E} \left[e^{{\color {darkgreen}t}X}\right]}$ .

Remark.

• For comparison: cdf is ${\displaystyle F_{X}({\color {darkgreen}t})=\mathbb {E} [\mathbf {1} \{X\leq {\color {darkgreen}t}\}]}$ .
• The mgf, similar to the pmf, pdf and cdf, gives a complete description of a distribution: it uniquely identifies a distribution, provided that the mgf exists (i.e., the expectation is finite) for all ${\displaystyle {\color {darkgreen}t}}$  in an open interval containing zero,
• i.e., we can recover the probability function from the mgf.
• The proof of this result is complicated, and thus omitted.

Proposition. (Moment generating property of mgf) Assuming the mgf ${\displaystyle M_{X}({\color {darkgreen}t})}$  exists for ${\displaystyle t\in (-\varepsilon ,\varepsilon )}$  for some positive number ${\displaystyle \varepsilon }$ , we have

${\displaystyle \mathbb {E} [X^{n}]=\left.{\frac {d^{n}}{d{\color {darkgreen}t}^{n}}}M_{X}({\color {darkgreen}t})\right|_{{\color {darkgreen}t}=0}}$

for each nonnegative integer ${\displaystyle n}$ .

Proof.

• Since

${\displaystyle M_{X}({\color {darkgreen}t})=\mathbb {E} \left[e^{{\color {darkgreen}t}X}\right]=\mathbb {E} \left[1+{\color {darkgreen}t}X+{\frac {{\color {darkgreen}t}^{2}X^{2}}{2!}}+\dotsb \right]{\overset {\text{linearity}}{=}}1+{\color {darkgreen}t}\mathbb {E} [X]+{\frac {{\color {darkgreen}t}^{2}}{2!}}\mathbb {E} [X^{2}]+\dotsb ,}$

${\displaystyle {\frac {d^{n}}{d{\color {darkgreen}t}^{n}}}M_{X}({\color {darkgreen}t}){\bigg |}_{{\color {darkgreen}t}=0}={\frac {d^{n}}{d{\color {darkgreen}t}^{n}}}\left(1+{\color {darkgreen}t}\mathbb {E} [X]+{\frac {{\color {darkgreen}t}^{2}}{2!}}\mathbb {E} [X^{2}]+\dotsb \right){\bigg |}_{{\color {darkgreen}t}=0}=\left(\mathbb {E} [X]{\frac {d^{n}}{d{\color {darkgreen}t}^{n}}}{\color {darkgreen}t}+{\frac {\mathbb {E} [X^{2}]}{2!}}{\frac {d^{n}}{d{\color {darkgreen}t}^{n}}}{\color {darkgreen}t}^{2}+\dotsb \right){\bigg |}_{{\color {darkgreen}t}=0},}$

• The result follows since ${\displaystyle {\frac {d^{n}}{d{\color {darkgreen}t}^{n}}}{\color {darkgreen}t}^{m}{\bigg |}_{{\color {darkgreen}t}=0}=\mathbf {1} \{m=n\}\,n!}$ , so only the ${\displaystyle m=n}$  term survives, and the sum simplifies to ${\displaystyle {\frac {\mathbb {E} [X^{n}]}{n!}}\cdot n!=\mathbb {E} [X^{n}]}$ .

${\displaystyle \Box }$
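The proposition can be illustrated symbolically; a minimal sketch using the mgf ${\displaystyle \lambda /(\lambda -t)}$  of ${\displaystyle \operatorname {Exp} (\lambda )}$  (derived later in this chapter), whose moments are ${\displaystyle \mathbb {E} [X^{n}]=n!/\lambda ^{n}}$ :

```python
# Moments from the mgf: E[X^n] = (d^n/dt^n) M_X(t) evaluated at t = 0,
# illustrated with the exponential mgf M_X(t) = lambda / (lambda - t).
import sympy as sp

t, lam = sp.symbols('t lambda', positive=True)
M = lam / (lam - t)

for n in range(1, 5):
    moment = sp.simplify(sp.diff(M, t, n).subs(t, 0))
    print(n, moment)   # 1/lambda, 2/lambda**2, 6/lambda**3, 24/lambda**4
```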

Proposition. (Relationship between independence and mgf) If ${\displaystyle X}$  and ${\displaystyle Y}$  are independent, then the mgf of the product ${\displaystyle XY}$  satisfies

${\displaystyle M_{XY}({\color {darkgreen}t})={\color {blue}\mathbb {E} _{X}[}M_{Y}({\color {darkgreen}t}{\color {blue}X}){\color {blue}]}={\color {red}\mathbb {E} _{Y}[}M_{X}({\color {darkgreen}t}{\color {red}Y}){\color {red}]}.}$

Proof.

${\displaystyle M_{XY}({\color {darkgreen}t})=\mathbb {E} [e^{{\color {darkgreen}t}{\color {blue}X}{\color {red}Y}}]{\overset {\text{lote}}{=}}{\color {blue}\mathbb {E} _{X}{\bigg [}}{\color {red}\mathbb {E} _{Y}[e}^{{\color {darkgreen}t}{\color {blue}X}{\color {red}Y}}|{\color {blue}X}{\color {red}]}{\color {blue}{\bigg ]}}={\color {blue}\mathbb {E} _{X}[}M_{Y}({\color {darkgreen}t}{\color {blue}X}){\color {blue}]}.}$

Similarly,
${\displaystyle M_{XY}({\color {darkgreen}t})=\mathbb {E} [e^{{\color {darkgreen}t}{\color {blue}X}{\color {red}Y}}]{\overset {\text{lote}}{=}}{\color {red}\mathbb {E} _{Y}{\bigg [}}{\color {blue}\mathbb {E} _{X}[e}^{{\color {darkgreen}t}{\color {blue}X}{\color {red}Y}}|{\color {red}Y}{\color {blue}]}{\color {red}{\bigg ]}}={\color {red}\mathbb {E} _{Y}[}M_{X}({\color {darkgreen}t}{\color {red}Y}){\color {red}]}.}$

• lote: law of total expectation. The last equality in each line uses the independence of ${\displaystyle X}$  and ${\displaystyle Y}$ .

${\displaystyle \Box }$

Remark.

• This equality need not hold if ${\displaystyle X}$  and ${\displaystyle Y}$  are not independent.
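A Monte Carlo sketch of the proposition above, assuming ${\displaystyle X}$  and ${\displaystyle Y}$  are independent standard normals (whose mgf ${\displaystyle M(s)=e^{s^{2}/2}}$  is derived later in this chapter); ${\displaystyle t=0.5}$  is an arbitrary choice with ${\displaystyle |t|<1}$  so that ${\displaystyle \mathbb {E} [e^{tXY}]}$  is finite:

```python
# Monte Carlo check of M_{XY}(t) = E_X[M_Y(tX)] for independent X, Y ~ N(0, 1).
# Here M_Y(s) = exp(s^2 / 2), and the exact value is 1/sqrt(1 - t^2) for |t| < 1.
import numpy as np

rng = np.random.default_rng(0)
t = 0.5
x = rng.standard_normal(1_000_000)
y = rng.standard_normal(1_000_000)

lhs = np.mean(np.exp(t * x * y))            # direct estimate of E[e^{tXY}]
rhs = np.mean(np.exp((t * x) ** 2 / 2))     # estimate of E_X[M_Y(tX)]

print(lhs, rhs, 1 / np.sqrt(1 - t ** 2))    # all approximately 1.1547
```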

### Joint moment generating function

Definition. (Joint moment generating function) The joint moment generating function (joint mgf) of the random vector ${\displaystyle \mathbf {X} =(X_{1},\dotsc ,X_{n})^{T}}$  is

${\displaystyle M_{\mathbf {X} }({\color {darkgreen}\mathbf {t} })=\mathbb {E} [e^{{\color {darkgreen}\mathbf {t} }\cdot \mathbf {X} }]=\mathbb {E} [e^{{\color {darkgreen}t_{1}}X_{1}+\dotsb +{\color {darkgreen}t_{n}}X_{n}}]}$

for each (column) vector ${\displaystyle \mathbf {t} =(t_{1},\dotsc ,t_{n})^{T}}$ , if the expectation exists.

Remark.

• When ${\displaystyle n=1}$ , the dot product of two vectors is the product of two numbers, so the joint mgf reduces to the mgf defined earlier.
• ${\displaystyle \mathbf {t} \cdot \mathbf {X} {\overset {\text{ def }}{=}}\mathbf {t} ^{T}\mathbf {X} }$ .

Proposition. (Relationship between independence and mgf) Random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  are independent if and only if

${\displaystyle M_{\mathbf {X} }(\mathbf {t} )=M_{X_{1}}(t_{1})\dotsb M_{X_{n}}(t_{n}).}$

Proof.

• 'only if' part:

${\displaystyle M_{\mathbf {X} }(\mathbf {t} )=\mathbb {E} [e^{\mathbf {t} \cdot \mathbf {X} }]=\mathbb {E} [e^{t_{1}X_{1}}\dotsb e^{t_{n}X_{n}}]{\overset {\text{ independence }}{=}}\mathbb {E} [e^{t_{1}X_{1}}]\dotsb \mathbb {E} [e^{t_{n}X_{n}}]=M_{X_{1}}(t_{1})\dotsb M_{X_{n}}(t_{n}).}$

• The proof of the 'if' part is quite complicated, and thus omitted.

${\displaystyle \Box }$

Analogously, we have the marginal mgf.

Definition. (Marginal mgf) The marginal mgf of ${\displaystyle X_{i}}$ , a member of the random variables ${\displaystyle X_{1},\dotsc ,X_{n}}$  (with joint mgf ${\displaystyle M_{\mathbf {X} }}$ ), is

${\displaystyle M_{X_{i}}(t)=M_{\mathbf {X} }(0,\dotsc ,0,\underbrace {t} _{i{\text{ th position}}},0,\dotsc ,0)}$

Proposition. (Moment generating function of linear transformation of random variables) For each constant vector ${\displaystyle {\color {red}\mathbf {a} }=({\color {red}a_{1}},\dotsc ,{\color {red}a_{n}})}$  and a real constant ${\displaystyle {\color {blue}b}}$ , the mgf of ${\displaystyle {\color {red}\mathbf {a} }\cdot \mathbf {X} +{\color {blue}b}}$  is

${\displaystyle M_{{\color {red}\mathbf {a} }\cdot \mathbf {X} +{\color {blue}b}}(t)=e^{{\color {blue}b}t}M_{\mathbf {X} }(t{\color {red}\mathbf {a} })=e^{{\color {blue}b}t}M_{\mathbf {X} }(t{\color {red}a_{1}},\dotsc ,t{\color {red}a_{n}}).}$

Proof.

${\displaystyle M_{{\color {red}\mathbf {a} }\cdot \mathbf {X} +{\color {blue}b}}(t)=\mathbb {E} [e^{t{\color {red}\mathbf {a} }\cdot \mathbf {X} +{\color {blue}b}t}]=e^{{\color {blue}b}t}\mathbb {E} [e^{t{\color {red}\mathbf {a} }\cdot \mathbf {X} }]=e^{{\color {blue}b}t}M_{\mathbf {X} }(t{\color {red}\mathbf {a} })=e^{{\color {blue}b}t}M_{\mathbf {X} }(t{\color {red}a_{1}},\dotsc ,t{\color {red}a_{n}}).}$

${\displaystyle \Box }$

Remark.

• If ${\displaystyle X_{1},\dotsc ,X_{n}}$  are independent (suppose ${\displaystyle \mathbf {X} =(X_{1},\dotsc ,X_{n})^{T}}$ ),

${\displaystyle M_{{\color {red}\mathbf {a} }\cdot \mathbf {X} +{\color {blue}b}}(t)=e^{{\color {blue}b}t}M_{X_{1}}(t{\color {red}a_{1}})\dotsb M_{X_{n}}(t{\color {red}a_{n}}).}$

• This provides an alternative, and possibly more convenient, method to derive the distribution of ${\displaystyle {\color {red}\mathbf {a} }\cdot \mathbf {X} +{\color {blue}b}}$ , compared with deriving it from the probability functions of ${\displaystyle X_{1},\dotsc ,X_{n}}$ .
• Special case: if ${\displaystyle {\color {red}\mathbf {a} }=(1,\dotsc ,1)^{T}}$  and ${\displaystyle {\color {blue}b}=0}$ , then ${\displaystyle {\color {red}\mathbf {a} }\cdot \mathbf {X} +{\color {blue}b}=X_{1}+\dotsb +X_{n}}$ , which is the sum of the r.v.'s.
• So, ${\displaystyle M_{X_{1}+\dotsb +X_{n}}(t)=M_{\mathbf {X} }(t,\dotsc ,t)}$ .
• In particular, if ${\displaystyle X_{1},\dotsc ,X_{n}}$  are independent, then ${\displaystyle M_{X_{1}+\dotsb +X_{n}}(t)=M_{X_{1}}(t)\dotsb M_{X_{n}}(t)}$ .
• We can use this result to prove the formulas for sums of independent r.v.'s, instead of using the proposition about convolution of r.v.'s (see the sketch after this remark).
• Special case: if ${\displaystyle n=1}$ , then the expression for linear transformation becomes ${\displaystyle {\color {red}a}X+{\color {blue}b}}$ .
• So, ${\displaystyle M_{{\color {red}a}X+{\color {blue}b}}(t)=e^{{\color {blue}b}t}M_{X}({\color {red}a}t)}$ .
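A Monte Carlo sketch of the sum special case mentioned above, using i.i.d. ${\displaystyle \operatorname {Exp} (1)}$  r.v.'s (whose mgf ${\displaystyle 1/(1-t)}$  is derived below) with the arbitrary choices ${\displaystyle n=3}$  and ${\displaystyle t=0.4}$ :

```python
# Monte Carlo check that M_{X_1 + ... + X_n}(t) = M_X(t)^n for i.i.d. r.v.'s,
# using n = 3 i.i.d. Exp(1) r.v.'s with M_X(t) = 1/(1 - t) for t < 1.
import numpy as np

rng = np.random.default_rng(0)
t, n = 0.4, 3
x = rng.exponential(scale=1.0, size=(1_000_000, n))

lhs = np.mean(np.exp(t * x.sum(axis=1)))    # estimate of M_{X_1 + ... + X_n}(t)
rhs = (1 / (1 - t)) ** n                    # M_X(t)^n = (1/0.6)^3

print(lhs, rhs)    # both approximately 4.63
```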

### Moment generating function of some important distributions

Proposition. (Moment generating function of binomial distribution) The moment generating function of ${\displaystyle X\sim \operatorname {Binom} (n,p)}$  is ${\displaystyle M_{X}(t)=(pe^{t}+1-p)^{n}}$ .

Proof.

${\displaystyle M_{X}(t)=\sum _{k=0}^{n}{\color {blue}e^{tk}}\underbrace {{\binom {n}{k}}{\color {blue}p^{k}}(1-p)^{n-k}} _{{\text{for }}\operatorname {Binom} (n,p)}=\sum _{k=0}^{n}{\binom {n}{k}}{\color {blue}(pe^{t})^{k}}(1-p)^{n-k}=(pe^{t}+1-p)^{n}\quad {\text{by the binomial theorem}}.}$

${\displaystyle \Box }$
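A direct numerical check of the closed form against the defining sum (parameters ${\displaystyle n=10}$ , ${\displaystyle p=0.3}$ , ${\displaystyle t=0.7}$  chosen arbitrarily):

```python
# Compare the binomial mgf (p e^t + 1 - p)^n with the defining sum E[e^{tX}].
import numpy as np
from scipy import stats

n, p, t = 10, 0.3, 0.7
k = np.arange(n + 1)

direct = np.sum(np.exp(t * k) * stats.binom.pmf(k, n, p))   # sum of e^{tk} P(X = k)
closed_form = (p * np.exp(t) + 1 - p) ** n

print(direct, closed_form)   # equal up to floating-point error
```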

Proposition. (Moment generating function of Poisson distribution) The moment generating function of ${\displaystyle X\sim \operatorname {Pois} (\lambda )}$  is ${\displaystyle M_{X}(t)=e^{\lambda (e^{t}-1)}}$ .

Proof.

${\displaystyle M_{X}(t){\overset {\text{ def }}{=}}\mathbb {E} [e^{tX}]{\overset {\text{ LOTUS }}{=}}\sum _{{\color {darkgreen}k}=0}^{\infty }e^{{\color {darkorange}t}{\color {darkgreen}k}}\cdot \underbrace {\frac {e^{\color {red}-\lambda }{\color {purple}\lambda }^{\color {darkgreen}k}}{{\color {darkgreen}k}!}} _{{\text{for }}\operatorname {Pois} (\lambda )}=e^{{\color {blue}\lambda }({\color {blue}e^{t}}{\color {red}-1})}\overbrace {\sum _{{\color {darkgreen}k}=0}^{\infty }\underbrace {\frac {e^{\color {blue}-\lambda e^{t}}({\color {purple}\lambda }e^{\color {darkorange}t})^{\color {darkgreen}k}}{{\color {darkgreen}k}!}} _{{\text{for }}\operatorname {Pois} (\lambda e^{t})}} ^{=1}=e^{\lambda (e^{t}-1)}.}$

${\displaystyle \Box }$

Proposition. (Moment generating function of exponential distribution) The moment generating function of ${\displaystyle X\sim \operatorname {Exp} (\lambda )}$  is ${\displaystyle M_{X}(t)={\frac {\lambda }{\lambda -t}},\quad t<\lambda }$ .

Proof.

• ${\displaystyle M_{X}(t)=\mathbb {E} [e^{tX}]=\lambda \int _{0}^{\infty }e^{tx}e^{-\lambda x}\,dx=\lambda \int _{0}^{\infty }e^{-(\lambda -t)x}\,dx={\frac {\lambda }{\color {blue}\lambda -t}}\overbrace {\int _{0}^{\infty }\underbrace {{\color {blue}(\lambda -t)}e^{-(\lambda -t)x}} _{{\text{for }}\operatorname {Exp} (\lambda -t)}\,dx} ^{=1},\quad \underbrace {\lambda -t>0} _{\text{ensuring valid rate parameter}}\Leftrightarrow t<\lambda .}$

• The result follows.

${\displaystyle \Box }$

Proposition. (Moment generating function of gamma distribution) The moment generating function of ${\displaystyle X\sim \operatorname {Gamma} (\alpha ,\lambda )}$  is ${\displaystyle M_{X}(t)=\left({\frac {\lambda }{\lambda -t}}\right)^{\alpha },\quad t<\lambda }$ .

Proof.

• We use a technique similar to that in the proof of the mgf of the exponential distribution.

${\displaystyle M_{X}(t)=\mathbb {E} [e^{tX}]={\frac {\lambda ^{\alpha }}{\Gamma (\alpha )}}\int _{0}^{\infty }e^{tx}x^{\alpha -1}e^{-\lambda x}\,dx={\frac {\lambda ^{\alpha }}{\Gamma (\alpha )}}\int _{0}^{\infty }e^{-(\lambda -t)x}x^{\alpha -1}\,dx={\frac {\lambda ^{\alpha }}{\color {blue}(\lambda -t)^{\alpha }}}\overbrace {\int _{0}^{\infty }\underbrace {{\frac {\color {blue}(\lambda -t)^{\alpha }}{\Gamma (\alpha )}}e^{-(\lambda -t)x}x^{\alpha -1}} _{{\text{for }}\operatorname {Gamma} (\alpha ,\lambda -t)}\,dx} ^{=1},\quad \underbrace {\lambda -t>0} _{\text{ensuring valid rate parameter}}\Leftrightarrow t<\lambda .}$

• The result follows.

${\displaystyle \Box }$
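The exponential and gamma mgfs can likewise be checked by numerical integration; a sketch with the arbitrary choices ${\displaystyle \lambda =2}$ , ${\displaystyle \alpha =3}$ , ${\displaystyle t=0.5<\lambda }$ :

```python
# Numerically integrate E[e^{tX}] for Exp(lambda) and Gamma(alpha, lambda),
# and compare with the closed forms, valid for t < lambda.
# (scipy parameterizes both distributions by scale = 1/lambda.)
import numpy as np
from scipy import stats
from scipy.integrate import quad

lam, alpha, t = 2.0, 3.0, 0.5

exp_mgf, _ = quad(lambda x: np.exp(t * x) * stats.expon.pdf(x, scale=1 / lam),
                  0, np.inf)
gamma_mgf, _ = quad(lambda x: np.exp(t * x) * stats.gamma.pdf(x, alpha, scale=1 / lam),
                    0, np.inf)

print(exp_mgf, lam / (lam - t))                # both 4/3
print(gamma_mgf, (lam / (lam - t)) ** alpha)   # both (4/3)^3
```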

Proposition. (Moment generating function of normal distribution) The moment generating function of ${\displaystyle X\sim {\mathcal {N}}({\color {blue}\mu },{\color {red}\sigma ^{2}})}$  is ${\displaystyle M_{X}(t)=e^{{\color {blue}\mu }t+{\color {red}\sigma ^{2}}t^{2}/2}}$ .

Proof.

• Let ${\displaystyle Z={\frac {X-{\color {blue}\mu }}{\color {red}\sigma }}\sim {\mathcal {N}}(0,1)}$ . Then, ${\displaystyle X={\color {red}\sigma }Z+{\color {blue}\mu }}$ .
• First, consider the mgf of ${\displaystyle Z}$ :

${\displaystyle M_{Z}(t){\overset {\text{ def }}{=}}\mathbb {E} [e^{tZ}]={\frac {1}{\sqrt {2\pi }}}\int _{-\infty }^{\infty }\underbrace {e^{tx}e^{-x^{2}/2}} _{=e^{-(x^{2}-2tx)/2}}\,dx={\frac {1}{\sqrt {2\pi }}}\int _{-\infty }^{\infty }\exp {\big (}\overbrace {-(x^{2}-2tx+{\color {darkgreen}t^{2}})} ^{=-(x-t)^{2}}/2+{\color {darkgreen}t^{2}/2}{\big )}\,dx=e^{t^{2}/2}\overbrace {\int _{-\infty }^{\infty }\underbrace {{\frac {1}{\sqrt {2\pi }}}\cdot e^{-(x-t)^{2}/2}} _{{\text{for }}{\mathcal {N}}(t,1)}\,dx} ^{=1}=e^{t^{2}/2}.}$

• It follows that the mgf of ${\displaystyle X}$  is

${\displaystyle M_{X}(t)=e^{{\color {blue}\mu }t}M_{Z}({\color {red}\sigma }t)=e^{{\color {blue}\mu }t}e^{{\color {red}\sigma }^{2}t^{2}/2}.}$

• The result follows.

${\displaystyle \Box }$

### Distribution of linear transformation of random variables

We will prove some propositions about distributions of linear transformations of random variables using mgfs. Some of them were mentioned in previous chapters. As we will see, proving these propositions using mgfs is quite simple.

Proposition. (Distribution of linear transformation of normal r.v.'s) Let ${\displaystyle X\sim {\mathcal {N}}(\mu ,\sigma ^{2})}$ . Then, ${\displaystyle {\color {red}a}X+{\color {blue}b}\sim {\mathcal {N}}({\color {red}a}\mu +{\color {blue}b},{\color {red}a^{2}}\sigma ^{2})}$ .

Proof.

• The mgf of ${\displaystyle {\color {red}a}X+{\color {blue}b}}$  is

${\displaystyle M_{{\color {red}a}X+{\color {blue}b}}(t)=e^{{\color {blue}b}t}M_{X}({\color {red}a}t)=e^{{\color {blue}b}t}\left(\exp({\color {red}a}\mu t+({\color {red}a}\sigma )^{2}t^{2}/2)\right)=\exp \left(({\color {red}a}\mu +{\color {blue}b})t+{\color {red}a^{2}}\sigma ^{2}t^{2}/2\right),}$

which is the mgf of ${\displaystyle {\mathcal {N}}({\color {red}a}\mu +{\color {blue}b},{\color {red}a^{2}}\sigma ^{2})}$ , and the result follows since the mgf identifies a distribution uniquely.

${\displaystyle \Box }$
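A simulation sketch of this proposition (the values ${\displaystyle \mu =1,\sigma =2,{\color {red}a}=3,{\color {blue}b}=-1}$  are arbitrary):

```python
# Check that a*X + b with X ~ N(mu, sigma^2) matches N(a*mu + b, a^2 sigma^2),
# via a Kolmogorov-Smirnov test on simulated data.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, a, b = 1.0, 2.0, 3.0, -1.0

samples = a * rng.normal(mu, sigma, size=100_000) + b
result = stats.kstest(samples, 'norm', args=(a * mu + b, abs(a) * sigma))

print(result.pvalue)   # typically large: consistent with N(a*mu + b, a^2 sigma^2)
```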

#### Sum of independent random variables

Proposition. (Sum of independent binomial r.v.'s) Let ${\displaystyle X_{1}\sim \operatorname {Binom} (n_{1},p),\dotsc ,X_{m}\sim \operatorname {Binom} (n_{m},p)}$ , in which ${\displaystyle X_{1},\dotsc ,X_{m}}$  are independent. Then, ${\displaystyle X_{1}+\dotsb +X_{m}\sim \operatorname {Binom} (n_{1}+\dotsb +n_{m},p)}$ .

Proof.

• The mgf of ${\displaystyle X_{1}+\dotsb +X_{m}}$  is

${\displaystyle M_{X_{1}+\dotsb +X_{m}}(t)=M_{X_{1}}(t)\dotsb M_{X_{m}}(t)=(pe^{t}+1-p)^{n_{1}}\dotsb (pe^{t}+1-p)^{n_{m}}=(pe^{t}+1-p)^{n_{1}+\dotsb +n_{m}},}$

which is the mgf of ${\displaystyle \operatorname {Binom} (n_{1}+\dotsb +n_{m},p)}$ , as desired.

${\displaystyle \Box }$

Proposition. (Sum of independent Poisson r.v.'s) Let ${\displaystyle X_{1}\sim \operatorname {Pois} (\lambda _{1}),\dotsc ,X_{n}\sim \operatorname {Pois} (\lambda _{n})}$ , in which ${\displaystyle X_{1},\dotsc ,X_{n}}$  are independent. Then, ${\displaystyle X_{1}+\dotsb +X_{n}\sim \operatorname {Pois} (\lambda _{1}+\dotsb +\lambda _{n})}$ .

Proof.

• The mgf of ${\displaystyle X_{1}+\dotsb +X_{n}}$  is

${\displaystyle M_{X_{1}+\dotsb +X_{n}}(t)=M_{X_{1}}(t)\dotsb M_{X_{n}}(t)=e^{\lambda _{1}(e^{t}-1)}\dotsb e^{\lambda _{n}(e^{t}-1)}=e^{(\lambda _{1}+\dotsb +\lambda _{n})(e^{t}-1)},}$

which is the mgf of ${\displaystyle \operatorname {Pois} (\lambda _{1}+\dotsb +\lambda _{n})}$ , as desired.

${\displaystyle \Box }$

Proposition. (Sum of independent exponential r.v.'s) Let ${\displaystyle X_{1},\dotsc ,X_{n}}$  be i.i.d. r.v.'s following ${\displaystyle \operatorname {Exp} (\lambda )}$ . Then, ${\displaystyle X_{1}+\dotsb +X_{n}\sim \operatorname {Gamma} (n,\lambda )}$ .

Proof.

• The mgf of ${\displaystyle X_{1}+\dotsb +X_{n}}$  is

${\displaystyle M_{X_{1}+\dotsb +X_{n}}(t)=M_{X_{1}}(t)\dotsb M_{X_{n}}(t)=\left({\frac {\lambda }{\lambda -t}}\right)^{n},}$

which is the mgf of ${\displaystyle \operatorname {Gamma} (n,\lambda )}$ , as desired.

${\displaystyle \Box }$

Proposition. (Sum of independent gamma r.v.'s) Let ${\displaystyle X_{1}\sim \operatorname {Gamma} (\alpha _{1},\lambda ),\dotsc ,X_{n}\sim \operatorname {Gamma} (\alpha _{n},\lambda )}$ , in which ${\displaystyle X_{1},\dotsc ,X_{n}}$  are independent. Then, ${\displaystyle X_{1}+\dotsb +X_{n}\sim \operatorname {Gamma} (\alpha _{1}+\dotsb +\alpha _{n},\lambda )}$ .

Proof.

• The mgf of ${\displaystyle X_{1}+\dotsb +X_{n}}$  is

${\displaystyle M_{X_{1}+\dotsb +X_{n}}(t)=M_{X_{1}}(t)\dotsb M_{X_{n}}(t)=\left({\frac {\lambda }{\lambda -t}}\right)^{\alpha _{1}}\dotsb \left({\frac {\lambda }{\lambda -t}}\right)^{\alpha _{n}}=\left({\frac {\lambda }{\lambda -t}}\right)^{\alpha _{1}+\dotsb +\alpha _{n}},}$

which is the mgf of ${\displaystyle \operatorname {Gamma} (\alpha _{1}+\dotsb +\alpha _{n},\lambda )}$ , as desired.

${\displaystyle \Box }$

Proposition. (Sum of independent normal r.v.'s) Let ${\displaystyle X_{1}\sim {\mathcal {N}}(\mu _{1},\sigma _{1}^{2}),\dotsc ,X_{n}\sim {\mathcal {N}}(\mu _{n},\sigma _{n}^{2})}$ , in which ${\displaystyle X_{1},\dotsc ,X_{n}}$  are independent. Then ${\displaystyle X_{1}+\dotsb +X_{n}\sim {\mathcal {N}}(\mu _{1}+\dotsb +\mu _{n},\sigma _{1}^{2}+\dotsb +\sigma _{n}^{2})}$ .

Proof.

• The mgf of ${\displaystyle X_{1}+\dotsb +X_{n}}$  is

${\displaystyle M_{X_{1}+\dotsb +X_{n}}(t)=M_{X_{1}}(t)\dotsb M_{X_{n}}(t)=\exp(\mu _{1}t+\sigma _{1}^{2}t^{2}/2)\dotsb \exp(\mu _{n}t+\sigma _{n}^{2}t^{2}/2)=\exp \left((\mu _{1}+\dotsb +\mu _{n})t+(\sigma _{1}^{2}+\dotsb +\sigma _{n}^{2})t^{2}/2\right),}$

which is the mgf of ${\displaystyle {\mathcal {N}}(\mu _{1}+\dotsb +\mu _{n},\sigma _{1}^{2}+\dotsb +\sigma _{n}^{2})}$ , as desired.

${\displaystyle \Box }$
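A simulation sketch of the Poisson case (rates ${\displaystyle \lambda _{1}=1.5}$  and ${\displaystyle \lambda _{2}=2.5}$  chosen arbitrarily): the empirical pmf of the sum matches ${\displaystyle \operatorname {Pois} (4)}$ .

```python
# Check that the sum of independent Pois(1.5) and Pois(2.5) samples behaves
# like Pois(4.0), by comparing the empirical pmf with the theoretical one.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
s = rng.poisson(1.5, 200_000) + rng.poisson(2.5, 200_000)

for k in range(8):
    print(k, round(np.mean(s == k), 4), round(stats.poisson.pmf(k, 4.0), 4))
```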

## Central limit theorem

We will provide a proof of the central limit theorem (CLT) using mgfs here, under the additional assumption that the mgf of each ${\displaystyle X_{i}}$  exists in a neighbourhood of zero (the general proof, without this assumption, uses characteristic functions instead).

Theorem. (Central limit theorem) Let ${\displaystyle X_{1},X_{2},\dotsc }$  be a sequence of i.i.d. random variables with finite mean ${\displaystyle \mu }$  and positive variance ${\displaystyle \sigma ^{2}}$ , and ${\displaystyle {\overline {X}}_{n}}$  be the sample mean of the first ${\displaystyle n}$  random variables, i.e. ${\displaystyle {\overline {X}}_{n}={\frac {X_{1}+\dotsb +X_{n}}{n}}}$ . Then, the standardized sample mean ${\displaystyle {\frac {{\sqrt {n}}({\overline {X}}_{n}-\mu )}{\sigma }}}$  converges in distribution to a standard normal random variable as ${\displaystyle n\to \infty }$ .

Proof.

• Define ${\displaystyle T_{n}={\frac {{\sqrt {n}}({\overline {X}}_{n}-\mu )}{\sigma }}}$ . Then, we have

${\displaystyle T_{n}={\frac {{\sqrt {n}}{\big (}(X_{1}+\dotsb +X_{n})/n-\mu {\big )}}{\sigma }}={\frac {X_{1}+\dotsb +X_{n}}{\color {red}\sigma {\sqrt {n}}}}{\color {blue}-{\frac {{\sqrt {n}}\mu }{\sigma }}},}$

• which is in the form of ${\displaystyle {\color {red}\mathbf {a} }\cdot \mathbf {X} +{\color {blue}b},\quad {\color {red}\mathbf {a} =\left({\frac {1}{\sigma {\sqrt {n}}}},\dotsc ,{\frac {1}{\sigma {\sqrt {n}}}}\right)^{T}}{\text{ and }}{\color {blue}b=-{\frac {{\sqrt {n}}\mu }{\sigma }}}}$ .
• Therefore,

{\displaystyle {\begin{aligned}&&M_{T_{n}}(t)&=e^{\color {blue}-{\sqrt {n}}\mu t/\sigma }\left(M_{X_{1}}\left({\frac {t}{\color {red}\sigma {\sqrt {n}}}}\right)\dotsb M_{X_{n}}\left({\frac {t}{\color {red}\sigma {\sqrt {n}}}}\right)\right)\\&&&=e^{-{\sqrt {n}}\mu t/\sigma }\left(M_{X_{1}}\left({\frac {t}{\sigma {\sqrt {n}}}}\right)\right)^{n}\quad {\text{since }}X_{1},\dotsc ,X_{n}{\text{ are identically distributed, and hence have the same mgf}}\\&\Rightarrow &\ln M_{T_{n}}(t)&=-{\sqrt {n}}\mu t/\sigma +n\ln \mathbb {E} \left[e^{tX_{1}/(\sigma {\sqrt {n}})}\right]\\&&&=-{\sqrt {n}}\mu t/\sigma +n\ln \mathbb {E} \left[1+t/(\sigma {\sqrt {n}})X_{1}+(1/2!)t^{2}X_{1}^{2}/(\sigma ^{2}n)+\dotsb \right]\quad {\text{since }}e^{x}=1+x+{\frac {x^{2}}{2!}}+\dotsb \\&&&=-{\sqrt {n}}\mu t/\sigma +n\ln {\big (}1+t/(\sigma {\sqrt {n}}){\color {blue}\mathbb {E} [X_{1}]}+(1/2!)t^{2}/(\sigma ^{2}n)({\color {blue}\underbrace {\mathbb {E} [X_{1}^{2}]} _{\operatorname {Var} (X_{1})+(\mathbb {E} [X_{1}])^{2}}})+{\text{terms of order smaller than }}n^{-1}{\big )}\\&&&=-{\sqrt {n}}\mu t/\sigma +n\ln \left(1+t/(\sigma {\sqrt {n}}){\color {blue}\mu }+(1/2)t^{2}/(\sigma ^{2}n)({\color {blue}\sigma ^{2}+\mu ^{2}})+{\text{terms of order smaller than }}n^{-1}\right)\\&&&=-{\sqrt {n}}\mu t/\sigma +n[t/(\sigma {\sqrt {n}}){\color {blue}\mu }+(1/2)t^{2}/(\sigma ^{2}n)({\color {blue}\sigma ^{2}+\mu ^{2}})-(1/2)(t/(\sigma {\sqrt {n}}){\color {blue}\mu })^{2}+{\text{terms of order smaller than }}n^{-1}]\quad {\text{since }}\ln(1+x)=x-x^{2}/2+\dotsb \\&&&={\cancel {-{\sqrt {n}}\mu t/\sigma }}{\cancel {+{\sqrt {n}}{\color {blue}\mu }t/\sigma }}+{\color {purple}{\cancel {n}}}(1/2)t^{2}/(\sigma ^{2}{\color {purple}{\cancel {n}}})({\color {blue}\sigma ^{2}}+{\color {blue}\mu ^{2}})-{\color {red}{\cancel {n}}}(1/2)(t^{2}/(\sigma ^{2}{\color {red}{\cancel {n}}}){\color {blue}\mu }^{2})+{\text{terms of order smaller than }}n^{0}\\&&&=(1/2)t^{2}(\sigma ^{2}/\sigma ^{2}){\cancel {+(1/2){\color {blue}\mu ^{2}}t^{2}/\sigma ^{2}-(1/2)t^{2}{\color {blue}\mu }^{2}/\sigma ^{2}}}+{\text{terms of order smaller than }}n^{0}\\&&&=(1/2)t^{2}+\underbrace {{\text{terms of order smaller than }}n^{0}} _{\to 0{\text{ as }}n\to \infty }\\&\Rightarrow &\lim _{n\to \infty }M_{T_{n}}(t)&=\underbrace {e^{t^{2}/2}} _{{\text{mgf of }}{\mathcal {N}}(0,1)},\end{aligned}}}

and the result follows from the mgf property of identifying distribution uniquely.

${\displaystyle \Box }$

Remark.

• Since ${\displaystyle {\frac {{\sqrt {n}}({\overline {X}}-\mu )}{\sigma }}{\overset {\text{approx.}}{\sim }}{\mathcal {N}}(0,1)\Leftrightarrow {\color {blue}{\frac {\sigma }{\sqrt {n}}}}\cdot {\frac {{\sqrt {n}}({\overline {X}}-\mu )}{\sigma }}{\color {red}+\mu }{\overset {\text{approx.}}{\sim }}{\mathcal {N}}({\color {red}\mu },{\color {blue}\sigma ^{2}/n})\Leftrightarrow {\overline {X}}{\overset {\text{approx.}}{\sim }}{\mathcal {N}}(\mu ,\sigma ^{2}/n)}$ ,
• the sample mean approximately follows ${\displaystyle {\mathcal {N}}(\mu ,\sigma ^{2}/n)}$  for large ${\displaystyle n}$ .
• The same result holds for the sample mean of normal r.v.'s with the same mean ${\displaystyle \mu }$  and the same variance ${\displaystyle \sigma ^{2}}$ ,
• since if ${\displaystyle X_{1},\dotsc ,X_{n}\sim {\mathcal {N}}(\mu ,\sigma ^{2})}$ , then ${\displaystyle {\frac {X_{1}+\dotsb +X_{n}}{\color {blue}n}}\sim {\mathcal {N}}\left({\frac {\overbrace {\mu +\dotsb +\mu } ^{n{\text{ times}}}}{\color {blue}n}},{\frac {\overbrace {\sigma ^{2}+\dotsb +\sigma ^{2}} ^{n{\text{ times}}}}{\color {blue}n^{2}}}\right)\equiv {\mathcal {N}}(\mu ,\sigma ^{2}/n)}$ .
• It follows from the proposition about the distribution of linear transformations of normal r.v.'s that the sample sum ${\displaystyle X_{1}+\dotsb +X_{n}={\color {blue}n}{\overline {X}}}$  approximately follows ${\displaystyle {\mathcal {N}}({\color {blue}n}\mu ,{\color {blue}n^{2}}\sigma ^{2}/n)\equiv {\mathcal {N}}(n\mu ,n\sigma ^{2})}$  for large ${\displaystyle n}$ .
• The same result holds for the sample sum of normal r.v.'s with the same mean ${\displaystyle \mu }$  and the same variance ${\displaystyle \sigma ^{2}}$ ,
• since if ${\displaystyle X_{1},\dotsc ,X_{n}\sim {\mathcal {N}}(\mu ,\sigma ^{2})}$ , then ${\displaystyle X_{1}+\dotsb +X_{n}\sim {\mathcal {N}}\left(\overbrace {\mu +\dotsb +\mu } ^{n{\text{ times}}},\overbrace {\sigma ^{2}+\dotsb +\sigma ^{2}} ^{n{\text{ times}}}\right)\equiv {\mathcal {N}}(n\mu ,n\sigma ^{2})}$ .
• If a r.v. converges in distribution, then we can use the limiting distribution to approximate probabilities involving the r.v. when ${\displaystyle n}$  is large.
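A simulation sketch of the theorem, with the (arbitrary) choice of i.i.d. ${\displaystyle \operatorname {Exp} (1)}$  r.v.'s, for which ${\displaystyle \mu =\sigma =1}$ :

```python
# Empirical illustration of the CLT: standardized sample means of i.i.d. Exp(1)
# r.v.'s (mu = sigma = 1) are compared against the standard normal.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, reps = 1_000, 5_000
x = rng.exponential(1.0, size=(reps, n))

t_n = np.sqrt(n) * (x.mean(axis=1) - 1.0) / 1.0     # sqrt(n) (X_bar - mu) / sigma

print(stats.kstest(t_n, 'norm').pvalue)             # typically large for n = 1000
print(np.mean(t_n <= 1.96), stats.norm.cdf(1.96))   # both near 0.975
```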

A special case of using the CLT as an approximation is using the normal distribution to approximate a discrete distribution. To improve accuracy, we should ideally apply a continuity correction, as explained in the following.

Proposition. (Continuity correction) A continuity correction is rewriting the probability expression ${\displaystyle \mathbb {P} (X=i)}$  (${\displaystyle i}$  is an integer) as ${\displaystyle \mathbb {P} (i-1/2<X<i+1/2)}$  when approximating a discrete distribution by a normal distribution using the CLT.

Remark.

• The reason for doing this is to place ${\displaystyle i}$  at the middle of the interval, so that the probability is better approximated.

Illustration of continuity correction (first panel: corrected interval ${\displaystyle (i-1/2,i+1/2)}$ ; second and third panels: the off-centre intervals ${\displaystyle (i-1,i)}$  and ${\displaystyle (i,i+1)}$ ):

|
|              /
|             /
|            /
|           /|
|          /#|
|         *##|
|        /|##|
|       /#|##|
|      /##|##|
|     /|##|##|
|    / |##|##|
|   /  |##|##|
|  /   |##|##|
| /    |##|##|
*------*--*--*---------------------
i-1/2 i i+1/2

|
|              /
|             /
|            /
|           /
|          /
|         *
|        /|
|       /#|
|      /##|
|     /###|
|    /####|
|   /#####|
|  /|#####|
| / |#####|
*---*-----*------------------------
i-1    i

|
|              /|
|             /#|
|            /##|
|           /###|
|          /####|
|         *#####|
|        /|#####|
|       / |#####|
|      /  |#####|
|     /   |#####|
|    /    |#####|
|   /     |#####|
|  /      |#####|
| /       |#####|
*---------*-----*------------------
i     i+1
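Numerically, the effect of the correction can be seen as follows (a sketch with the arbitrary choice ${\displaystyle X\sim \operatorname {Binom} (100,0.5)}$ , approximated by ${\displaystyle {\mathcal {N}}(50,25)}$ ):

```python
# Continuity correction when approximating Binom(100, 0.5) by N(50, 25):
# compare the exact P(X = 50) with normal probabilities of nearby intervals.
from scipy import stats

n, p = 100, 0.5
mu, sigma = n * p, (n * p * (1 - p)) ** 0.5      # 50 and 5
norm = stats.norm(mu, sigma)

exact = stats.binom.pmf(50, n, p)                # ~ 0.0796
corrected = norm.cdf(50.5) - norm.cdf(49.5)      # interval (i - 1/2, i + 1/2)
off_centre = norm.cdf(51.0) - norm.cdf(50.0)     # interval (i, i + 1), less accurate

print(exact, corrected, off_centre)
```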

1. or equivalently, transformation between supports of ${\displaystyle \mathbf {X} }$  and ${\displaystyle \mathbf {Y} }$