Probability/Transformation of Probability Densities

A PDF version of Transformation of Probability Densities is available. 271 kb, 14 pages (info)

This Wikibook shows how to transform the probability density of a continuous random variable in both the one-dimensional and multidimensional case. In other words, it shows how to calculate the distribution of a function of continuous random variables. The first section formulates the general problem and provides its solution. However, this general solution is often quite hard to evaluate and simplifications are possible in special cases, e.g., if the random vector is one-dimensional or if the components of the random vector are independent. Formulas for these special cases are derived in the subsequent sections. This Wikibook also aims to provide an overview of different equations used in this field and show their connection.

General Problem and Solution (n-to-m mapping)

Let ${\vec {X}}=(X_{1},\ldots ,X_{n})$ be a random vector with the probability density function, pdf, $\varrho _{\vec {X}}(x_{1},\ldots ,x_{n})$ and let $f:\mathbb {R} ^{n}\to \mathbb {R} ^{m}$ be a (Borel-measurable) function. We are looking for the probability density function $\varrho _{\vec {Y}}$ of ${\vec {Y}}:={\vec {f}}({\vec {X}})$ .

First, we need to remember the definition of the cumulative distribution function, cdf, $F_{\vec {Y}}({\vec {y}})$ of a random vector: It measures the probability that each component of Y takes a value smaller than the corresponding component of y. We will use a short-hand notation and say that two vectors are "less or equal" (≤) if all of their components are.

F_{\vec {Y}}({\vec {y}})=P\left({\vec {Y}}\leq {\vec {y}}\right)=P\left({\vec {f}}({\vec {X}})\leq {\vec {y}}\right)

(1)

The wanted density $\varrho _{\vec {Y}}({\vec {y}})$ is then obtained by differentiating $F_{\vec {Y}}({\vec {y}})$ :

\varrho _{\vec {Y}}({\vec {y}})={\frac {\partial }{\partial y_{1}}}\cdots {\frac {\partial }{\partial y_{m}}}F_{\vec {Y}}({\vec {y}})

(2)

Thus, the general solution can be expressed as the m‘th derivative of an n-dimensional integral:

ℝⁿ → ℝ^m mapping

$\varrho _{\vec {Y}}({\vec {y}})={\frac {\partial }{\partial y_{1}}}\cdots {\frac {\partial }{\partial y_{m}}}\int _{\left\{{\vec {x}}\in \mathbb {R} ^{n}\mid {\vec {f}}({\vec {x}})\leq {\vec {y}}\right\}}\,\varrho _{\vec {X}}({\vec {x}})\,d^{n}x$

(3)

The following sections will provide simplifications in special cases.

Function of a Random Variable (n=1, m=1)

If n=1 and m=1, X is a continuously distributed random variable with the density $\varrho _{X}$ and $f:\mathbb {R} \rightarrow \mathbb {R}$ is a Borel-measurable function. Then also Y := f(X) is continuously distributed and we are looking for the density $\varrho _{Y}(y)$ .

In the following, f should always be at least differentiable.

Let us first note that there may be values that are never reached by f, e.g. y<0 if f(x) = x². For all those y, necessarily $\varrho _{Y}(y)=0$ .

\varrho _{Y}(y)={\begin{cases}0,&{\text{if }}y\notin f(\mathbb {R} )\\?,&{\text{if }}y\in f(\mathbb {R} )\end{cases}}

Following equations 1 and 2, we obtain

\varrho _{Y}(y)={\frac {d}{dy}}F_{Y}(y)={\frac {d}{dy}}P(Y\leq y)={\frac {d}{dy}}P(f(X)\leq y)

(4)

We will now rearrange this expression in various ways.

Derivation using cumulated distribution function F_x

At first, we limit ourselves to f whose derivative is never 0 (thus, f is a diffeomorphism). Then, the inverse map $f^{-1}$ exists and f is either monotonically increasing or monotonically decreasing.

If f is monotonically increasing, then $x\leq f^{-1}(y)\Leftrightarrow f(x)\leq f(f^{-1}(y))=y$ and $f^{\prime }>0$ . Therefore:

{\begin{array}{rcl}\varrho _{Y}(y)&=&{\frac {d}{dy}}P(f(X)\leq y)={\frac {d}{dy}}P(X\leq f^{-1}(y))\\&=&{\frac {d}{dy}}F_{X}(f^{-1}(y))=\varrho _{X}(f^{-1}(y)){\frac {df^{-1}(y)}{dy}}\end{array}}

If f is monotonically decreasing, then $x\leq f^{-1}(y)\Leftrightarrow f(x)\geq f(f^{-1}(y))=y$ and $f^{\prime }<0$ . Therefore:

{\begin{array}{rcl}\varrho _{Y}(y)&=&{\frac {d}{dy}}P(f(X)\leq y)={\frac {d}{dy}}P(X\geq f^{-1}(y))\\&=&{\frac {d}{dy}}\left(1-F_{X}(f^{-1}(y))\right)=-\varrho _{X}(f^{-1}(y)){\frac {df^{-1}(y)}{dy}}\end{array}}

This can be summarized as:

\varrho _{Y}(y)={\begin{cases}0,&{\text{if }}y\notin f(\mathbb {R} )\\\varrho _{X}(f^{-1}(y))\cdot \left|{\frac {df^{-1}(y)}{dy}}\right|,&{\text{if }}y\in f(\mathbb {R} )\end{cases}}

(5)

If now the derivative $f^{\prime }(x_{i})=0$ does vanish at some positions $x_{i}$ , $i=1,\ldots ,N$ , then we split the definition space of f using those position into $N+1$ disjoint intervals $I_{j}$ . Equation 5 holds for the functions $f_{I_{j}}$ limited in their definition space to those intervals $I_{j}$ . We have

{\begin{array}{rcl}P(f(X)\leq y)&=&\sum \limits _{j=1}^{N+1}P(f_{I_{j}}(X)\leq y)\\\varrho _{Y}(y)={\frac {d}{dy}}P(f(X)\leq y)&=&\sum \limits _{j=1}^{N+1}\varrho _{X}(f_{I_{j}}^{-1}(y))\cdot \left|{\frac {df_{I_{j}}^{-1}(y)}{dy}}\right|\end{array}}

(5a)

With the convention that the sum over 0 addends is 0 and using the inverse function theorem, it is possible to write this in a more compact form (read as: sum over all x, where f(x)=y):

ℝ → ℝ mapping

$\varrho _{Y}(y)=\sum \limits _{x,f(x)=y}{\frac {\varrho _{X}(x)}{\left|f^{\prime }(x)\right|}}$

(6)

Derivation using Integral Substitution

In this section we consider a different derivation.

The probability in 4 is the integral over the probability density. Again in the case of monotonically increasing f, we have:

{\begin{array}{rcl}\int _{-\infty }^{y}\varrho _{Y}(u)\,du&=&P(Y\leq y)=P(f(X)\leq y)=P(X\leq f^{-1}(y))\\&=&\int _{-\infty }^{f^{-1}(y)}\varrho _{X}(x)\,dx\end{array}}

Now we substitute u = f(x) in the integral on the right hand site, i.e. $x=f^{-1}(u)$ and ${\frac {du}{dx}}=f^{\prime }(x)$ . The integral limits are then from -∞ to y and in the rule “ $dx={\frac {dx}{du}}\,du$ ” we have ${\frac {dx}{du}}={\frac {d\,f^{-1}(u)}{du}}$ due to the inverse function theorem. Consequentially:

\int _{-\infty }^{y}\varrho _{Y}(u)\,du=\int _{-\infty }^{y}\varrho _{X}(f^{-1}(u))\,{\frac {d\,f^{-1}(u)}{du}}\,du

Taking the derivative of both sides with respect to y, we get:

\varrho _{Y}(y)=\varrho _{X}(f^{-1}(y))\,{\frac {d\,f^{-1}(y)}{dy}}

Following the same argument as in the last section, we can again derive equation 6.

This rule often misleads physics books to present the following viewpoint, which might be easier to remember, but is not mathematically sound: If you multiply the probability density $\varrho _{X}(x)$ with the “infinitesimal length” $dx$ , then you will get the probability $\varrho _{X}(x)\,dx$ for X to lie in the interval [x, x+dx]. Changing to new coordinates y, you will get by substitution:

\varrho _{X}(x)\,dx=\underbrace {\varrho _{X}(x(y))\,{\frac {dx}{dy}}} _{\varrho _{Y}(y)}\,dy

Derivation using the Delta Distribution

In this section we consider another different derivation, often used in physics.

We start again with equation 4 and write this as an integral:

{\begin{array}{rcl}\varrho _{Y}(y)&=&{\frac {d}{dy}}P(f(X)\leq y)\\&=&{\frac {d}{dy}}\int _{\{x\in \mathbb {R} \mid f(x)\leq y\}}\,\varrho _{X}(x)\,dx\\&=&{\frac {d}{dy}}\int _{\mathbb {R} }\Theta (y-f(x))\,\varrho _{X}(x)\,dx\\&=&\int _{\mathbb {R} }{\frac {d}{dy}}\Theta (y-f(x))\,\varrho _{X}(x)\,dx\\&=&\int _{\mathbb {R} }\delta (y-f(x))\,\varrho _{X}(x)\,dx\end{array}}

The intuitive interpretation of the last expression is: one integrates over all possible x-values and uses the delta “function” to pick all positions where y = f(x). This formula is often found in physics books, possibly written as expectation value, $\langle \ldots \rangle$ :

ℝ → ℝ mapping (using Dirac Delta Distribution)

$\varrho _{Y}(y)=\int _{\mathbb {R} }\delta (y-f(x))\,\varrho _{X}(x)\,dx=\langle \,\delta (y-f(x))\,\rangle \;.$

(7)

We can see this formula is equivalent to equation 6 using the following identity

\int _{\mathbb {R} }\delta (h(x))\,g(x)\,dx=\sum \limits _{x_{0},h(x_{0})=0}{\frac {g(x_{0})}{\left|h^{\prime }(x_{0})\right|}}

Examples

Let us consider the following specific example: let $\varrho _{X}(x)={\frac {\exp[-0.5x^{2}]}{\sqrt {2\pi }}}$ and $f(x)=x^{2}$ . We choose to use equation 6 (equations 5 and 7 lead to the same result). We calculate the derivative $f^{\prime }(x)=2x$ and find all x for which f(x)=y, which are $-{\sqrt {y}}$ and $+{\sqrt {y}}$ if y>0 and none otherwise. For y>0 we have:

\varrho _{Y}(y)=\sum \limits _{x,f(x)=y}{\frac {\varrho _{X}(x)}{\left|f^{\prime }(x)\right|}}={\frac {\varrho _{X}(-{\sqrt {y}})}{\left|f^{\prime }(-{\sqrt {y}})\right|}}+{\frac {\varrho _{X}(+{\sqrt {y}})}{\left|f^{\prime }(+{\sqrt {y}})\right|}}={\frac {\exp[-0.5y]}{{\sqrt {2\pi }}\,2{\sqrt {y}}}}+{\frac {\exp[-0.5y]}{{\sqrt {2\pi }}\,2{\sqrt {y}}}}={\frac {\exp[-0.5y]}{\sqrt {2\pi \,y}}}

Since f never reaches negative values, the sum remains 0 for y<0 and we finally obtain:

\varrho _{Y}(y)={\begin{cases}0,&{\text{if }}y\leq 0\\{\frac {\exp[-0.5y]}{\sqrt {2\pi \,y}}},&{\text{if }}y>0\end{cases}}

This example is illustrated in the following graphics:

Random numbers are generated according to the standard normal distribution X ~ N(0,1). They are shown on the x-axis of figure (a). Many of them are around 0. Each of those numbers is mapped according to y = x², which is shown with grey arrows for two example points. For many generated realisations of X, a histogram on the y-axis will converge towards the wanted probability density function ρ_Y shown in (c). In order to analytically derive this function, we start by observing that in order for Y to be between any v and v+Δv, X must have been between either

-{\sqrt {v+\Delta v}}

and

-{\sqrt {v}}

, or, between

{\sqrt {v}}

and

{\sqrt {v+\Delta v}}

. Those intervals are marked in figures (b) and (c). Because the probability is equal to the area under the probability density function, we can determine ρ_Y from the condition that the grey shaded area in (c) must be equal to the sum of the areas in (b). The areas are calculated using integrals and it is useful to take the limit

\Delta v\to 0

in order to get the formula noted in (c).

Another example is the inverse transformation method. Suppose a computer generates random numbers X with a uniform distribution on [0, 1], i.e.

\varrho _{X}(x)\equiv {\begin{cases}1,&{\text{if }}0\leq x\leq 1\\0,&{\text{else.}}\end{cases}}

If we want to obtain random numbers according to a distribution with the pdf

\varrho _{Z}

, we choose f as the inverse function of the cdf of Z, i.e.

Y=f(X)=F_{Z}^{-1}(X)

. We can now show that Y will have the same distribution as the wanted Z,

\varrho _{Y}=\varrho _{Z}

by using equation 5 and the fact that

f^{-1}=F_{Z}

:

{\begin{array}{rcl}\varrho _{Y}(y)&=&{\begin{cases}0,&{\text{if }}y\notin F_{Z}^{-1}(\mathbb {R} )\\\varrho _{X}(F_{Z}(y))\cdot \left|{\frac {dF_{Z}(y)}{dy}}\right|,&{\text{if }}y\in F_{Z}^{-1}(\mathbb {R} )\end{cases}}\\&=&1\cdot \left|{\frac {dF_{Z}(y)}{dy}}\right|=\varrho _{Z}(y)\end{array}}

.

An example is illustrated in the following plot:

Random numbers y_i are generated from a uniform distribution between 0 and 1, i.e. Y ~ U(0, 1). They are sketched as colored points on the y-axis. Each of the points is mapped according to x=F^-1(y), which is shown with gray arrows for two example points. In this example, we have used an exponential distribution. Hence, for x ≥ 0, the probability density is

\varrho _{X}(x)=\lambda e^{-\lambda \,x}

and the cumulated distribution function is

F(x)=1-e^{-\lambda \,x}

. Therefore,

x=F^{-1}(y)=-{\frac {\ln(1-y)}{\lambda }}

. We can see that using this method, many points end up close to 0 and only few points end up having high x-values - just as it is expected for an exponential distribution.

Mapping of a Random Vector to a Random Variable (n>1, m=1)

We will now investigate the case when a random vector X with known density $\varrho _{\vec {X}}$ is mapped to (scalar) random variable Y and calculate the new density $\varrho _{Y}(y)$ .

According to 3, we find:

\varrho _{Y}(y)={\frac {d}{dy}}\int _{\{{\vec {x}}\in \mathbb {R} ^{n}\mid f({\vec {x}})\leq y\}}\,\varrho _{\vec {X}}({\vec {x}})\,d^{n}x

(8)

The direct evaluation of this equation is sometimes the easiest way, e.g., if there is a known formula for the area or volume presented by the integral. Otherwise one needs to solve a parameter-depended multiple integral.

If the components of the random vector ${\vec {X}}$ are independent, then the probability density factorizes:

\varrho _{\vec {X}}(x_{1},\ldots ,x_{n})=\varrho _{X_{1}}(x_{1})\cdot \ldots \cdot \varrho _{X_{n}}(x_{n})

In this case the delta function may provide a fast tool for the evaluation. Replacing the integration bound with a step function inside the integral, $H(y-f({\vec {x}}))$ and using the fact that the derivative of a step function is the delta:

{\begin{array}{rcl}\varrho _{Y}(y)&=&\int _{\mathbb {R} ^{n}}\varrho _{\vec {X}}(x_{1},\ldots ,x_{n})\,\delta (y-f({\vec {x}}))\,dx_{1}\ldots dx_{n}\\&=&\int _{\mathbb {R} }\varrho _{X_{n}}(x_{n})\cdots \int _{\mathbb {R} }\varrho _{X_{1}}(x_{1})\,\delta (y-f({\vec {x}}))\,dx_{1}\ldots dx_{n}\end{array}}

(9)

If one wants to avoid calculations with the delta function, it is of course possible to evaluate the innermost integral $\int dx_{1}$ , provided that the components are independent:

\varrho _{Y}(y)=\int _{\mathbb {R} ^{n-1}}\,\sum \limits _{x_{1},f({\vec {x}})=y}{\frac {\varrho _{\vec {X}}({\vec {x}})}{\left|{\frac {\partial f({\vec {x}})}{\partial x_{1}}}\right|}}\,dx_{2}\ldots dx_{n}

Examples

Let $Y=f({\vec {X}})=X_{1}+X_{2}$ with the independent, continuous random variable X₁ and X₂. According to equation 9, we have:

{\begin{array}{rcl}\varrho _{Y}(y)&=&\int _{\mathbb {R} }\varrho _{X_{2}}(x_{2})\int _{\mathbb {R} }\varrho _{X_{1}}(x_{1})\,\delta (y-x_{2}-x_{1})\,dx_{1}\,dx_{2}\\&=&\int _{\mathbb {R} }\varrho _{X_{2}}(x_{2})\,\varrho _{X_{1}}(y-x_{2})\,dx_{2}\end{array}}

If one uses the sum formula instead, the sum runs over all x₁, for which

f({\vec {x}})=x_{1}+x_{2}=y

, i.e. where x₁ = y - x₂.

The derivative is

{\frac {\partial (x_{1}+x_{2})}{\partial x_{1}}}=1

, so that one also obtains the equation

\varrho _{Y}(y)=\int _{\mathbb {R} }\varrho _{X_{2}}(x_{2})\,\varrho _{X_{1}}(y-x_{2})\,dx_{2}

.

Integrating over x₂ first leads to the following, equivalent expression:

\varrho _{Y}(y)=\int _{\mathbb {R} }\varrho _{X_{1}}(x_{1})\,\varrho _{X_{2}}(y-x_{1})\,dx_{1}

If $Y=X_{1}-X_{2}$ with independent X₁ and X₂, then $\varrho _{Y}(y)=\int _{\mathbb {R} }\varrho _{X_{1}}(x_{1})\,\varrho _{X_{2}}(x_{1}-y)\,dx_{1}$ .

If $Y=X_{1}\cdot X_{2}$ with independent X₁ and X₂, then $\varrho _{Y}(y)=\int _{\mathbb {R} }\varrho _{X_{1}}(x_{1})\,\varrho _{X_{2}}\left({\frac {y}{x_{1}}}\right){\frac {1}{|x_{1}|}}\,dx_{1}$ .

If $Y={\frac {X_{1}}{X_{2}}}$ with independent X₁ and X₂, then $\varrho _{Y}(y)=\int _{\mathbb {R} }\varrho _{X_{1}}(x_{2}\cdot y)\,\varrho _{X_{2}}(x_{2})\,|x_{2}|\,dx_{2}$ .

Given the independent random variables X₁ and X₂ with the density

\varrho _{\vec {X}}(x_{1},x_{2})={\begin{cases}1/\pi ,&{\text{if }}x_{1}^{2}+x_{2}^{2}\leq 1\\0,&{\text{otherwise}}\end{cases}}

Let

Y:={\sqrt {X_{1}^{2}+X_{2}^{2}}}

. According to equation 8, we need to solve:

\varrho _{Y}(y)={\frac {d}{dy}}\int _{{\bigl \{}{\vec {x}}\in \mathbb {R} ^{n}\mid {\sqrt {x_{1}^{2}+x_{2}^{2}}}\,\leq \,y{\bigr \}}}\,{\frac {1}{\pi }}\,dx_{1}\,dx_{2}

The last integral is over a circle with radius y ≤ 1, hence with the area

\pi y^{2}

. This simplifies the calculation:

0\leq y\leq 1\Rightarrow \varrho _{Y}(y)={\frac {1}{\pi }}\,{\frac {d}{dy}}(\pi y^{2})={\frac {2\pi \,y}{\pi }}=2y

.

If y<0, we integrate over an empty set, which gives 0. If y>1,

\varrho _{\vec {X}}=0

. Therefore, the final result is:

\varrho _{Y}(y)={\begin{cases}2y,&{\text{if }}0\leq y\leq 1\\0,&{\text{otherwise}}\end{cases}}

This example is illustrated in the following graphics:

Figure (a) shows a sketch of random sample points from a uniform distribution in a circle of radius 1, subdivided into rings. In figure (b), we count how many of those points fell into each of the rings with the same width Δv. Since the area of the rings increases linearly with the radius, one can expect more points for larger radii. For Δv → 0, the normalized histogram in (b) will converge to the wanted probability density function ρ_Y. In order to calculate ρ_Y analytically, we first derive the cumulated distribution function F_Y, plotted in figure (d). F_Y(y) is the probability to find a point inside the circle of radius v (shown in grey in figure (c)). For v between 0 and 1, we find

F_{Y}(v)={\frac {A_{v}}{A_{tot}}}={\frac {\pi v^{2}}{\pi 1^{2}}}=v^{2}

. The slope of F_Y is the wanted probability density function

\rho _{Y}(y)={\frac {dF_{Y}(y)}{dy}}=2y

, in agreement with figure (b).

Invertible Transformation of a Random Vector (n=m)

Let ${\vec {X}}=(X_{1},\ldots ,X_{n})$ be a a random vector with the density $\varrho _{\vec {X}}(x_{1},\ldots ,x_{n})$ and let $f:\mathbb {R} ^{n}\to \mathbb {R} ^{n}$ be a diffeomorphism. For the density $\varrho _{\vec {Y}}$ of ${\vec {Y}}:={\vec {f}}({\vec {X}})$ we have:

\int _{G}\varrho _{\vec {X}}({\vec {x}})\,d^{n}x=\int _{f(G)}\varrho _{\vec {X}}(f^{-1}({\vec {y}}))\;\left|{\frac {\partial (x_{1},\ldots ,x_{n})}{\partial (y_{1},\ldots ,y_{n})}}\right|\,d^{n}y

and therefore

ℝⁿ → ℝⁿ mapping

$\varrho _{\vec {Y}}({\vec {y}})=\varrho _{\vec {X}}(f^{-1}({\vec {y}}))\;\left|{\frac {\partial (x_{1},\ldots ,x_{n})}{\partial (y_{1},\ldots ,y_{n})}}\right|\;,$

(10)

where $\Phi _{f^{-1}}=\left|{\frac {\partial (x_{1},\ldots ,x_{n})}{\partial (y_{1},\ldots ,y_{n})}}\right|$ is the Jacobian determinant of $f^{-1}$ . Note that $\Phi _{f^{-1}}=\left(\Phi _{f}\right)^{-1}$ . In the one-dimensional case (n=1), equation 10 coincides with equation 5.

Examples

Given the random vector ${\vec {X}}$ and the invertible matrix A and a vector ${\vec {b}}$ , let ${\vec {Y}}=A{\vec {X}}^{T}+{\vec {b}}$ . Then $\varrho _{\vec {Y}}({\vec {y}})=\varrho _{\vec {X}}\left(A^{-1}\,({\vec {y}}-{\vec {b}})\right)\;\left|\det A^{-1}\right|$ . Also, $\det A^{-1}=1/\det A$ .
Given the independent random variables $X_{1}$ and $X_{2}$ , we introduce polar coordinates $Y_{1}={\sqrt {X_{1}^{2}+X_{2}^{2}}}$ and $Y_{2}=\operatorname {atan2} (X_{2},X_{1})$ . The inverse map is $X_{1}=Y_{1}\,\cos Y_{2}$ and $X_{2}=Y_{1}\,\sin Y_{2}$ . Due to Jacobian determinant $y_{1}$ , the wanted density is $\varrho _{\vec {Y}}(y_{1},y_{2})=y_{1}\,\varrho _{\vec {X}}(y_{1}\,\cos y_{2},\;y_{1}\,\sin y_{2})$ .

Possible simplifications for multidimensional mappings (n>1, m>1)

Even if none of the above special cases apply, simplifications can still be possible. Some of them are listed below:

Independent Target-Components

If one knows beforehand that the components of $Y_{i}$ will be independent, i.e.

\varrho _{\vec {Y}}(y_{1},\ldots ,y_{n})=\varrho _{Y_{1}}(y_{1})\cdot \ldots \cdot \varrho _{Y_{n}}(y_{n})\;,

then the density $\varrho _{Y_{i}}$ of each component $Y_{i}=f_{i}({\vec {X}})$ can be calculated like in the above section Mapping of a Random Vector to a Random Variable.

Example

Given the random vector

{\vec {X}}=(X_{1},X_{2},X_{3},X_{4})

with independent components.

Let

f:\mathbb {R} ^{4}\to \mathbb {R} ^{2}

,

{\vec {Y}}={\vec {f}}({\vec {X}}):=(X_{1}+X_{2},X_{3}+X_{4})

.

Obviously, the components Y₁ = X₁ + X₂ and Y₂ = X₃ + X₄ are independent and therefore:

\varrho _{Y_{1}}(y_{1})=\int _{\mathbb {R} }\varrho _{X_{1}}(y_{1}-x_{2})\,\varrho _{X_{2}}(x_{2})\,dx_{2}

and

\varrho _{Y_{2}}(y_{2})=\int _{\mathbb {R} }\varrho _{X_{3}}(y_{2}-x_{4})\,\varrho _{X_{4}}(x_{4})\,dx_{4}

.

Note that the components of

{\vec {Y}}

can be independent even if the components of

{\vec {X}}

are not.

Split Integral Region

Sometimes it is useful to split the integral region in equation 3 into parts that can be evaluate separately. This can be made explicit by rewriting 3 with delta functions:

\varrho _{\vec {Y}}({\vec {y}})=\int _{\mathbb {R} ^{n}}\varrho _{\vec {X}}({\vec {x}})\,\delta (y_{1}-f({\vec {x}})_{1})\cdot \ldots \cdot \delta (y_{m}-f({\vec {x}})_{m})\,d^{n}x

and then use the identity $\delta (x-x_{0})=\int _{\mathbb {R} }\delta (x-\xi )\,\delta (\xi -x_{0})\,d\xi$ .

Example

To illustrate the idea, we use a simple ℝⁿ → ℝ example: Let Y = X₁² + X₂² + X₃ with

\varrho _{\vec {X}}(x_{1},x_{2},x_{3})={\begin{cases}{\frac {e^{-x_{3}}}{\pi }},&{\text{if }}x_{1}^{2}+x_{2}^{2}\leq 1\wedge x_{3}\geq 0\\0,&{\text{else}}\end{cases}}

The parametrisation of the region where at the same time x₁² + x₂² + x₃ ≤ y and x₁² + x₂² ≤ 1 and x₃ ≥ 0 may not be obvious, so we use the two above formulas:

{\begin{array}{rcl}\varrho _{Y}(y)&=&\int _{\mathbb {R} ^{3}}\varrho _{\vec {X}}({\vec {x}})\,\delta (y-f({\vec {x}}))\,dx_{1}\,dx_{2}\,dx_{3}\\&=&\int _{0}^{\infty }\iint _{x_{1}^{2}+x_{2}^{2}\leq 1}{\frac {e^{-x_{3}}}{\pi }}\,\delta (y-x_{1}^{2}-x_{2}^{2}-x_{3})\,dx_{1}\,dx_{2}\,dx_{3}\\&=&\int _{0}^{\infty }\iint _{x_{1}^{2}+x_{2}^{2}\leq 1}{\frac {e^{-x_{3}}}{\pi }}\,\int _{\mathbb {R} }\delta (\xi -x_{1}^{2}-x_{2}^{2})\,\delta (y-x_{3}-\xi )\,d\xi \,\,dx_{1}\,dx_{2}\,dx_{3}\\&=&\int _{0}^{\infty }\,\int _{\mathbb {R} }\left[\iint _{x_{1}^{2}+x_{2}^{2}\leq 1}{\frac {e^{-x_{3}}}{\pi }}\delta (\xi -x_{1}^{2}-x_{2}^{2})\,dx_{1}\,dx_{2}\right]\,\delta (y-x_{3}-\xi )\,d\xi \,dx_{3}\end{array}}

Now, we have split the integral such that the expression in brackets can be evaluated separately, because the region depends on x₁ and x₂ only and may contain x₃ only as parameter.

\iint _{x_{1}^{2}+x_{2}^{2}\leq 1}{\frac {e^{-x_{3}}}{\pi }}\delta (\xi -x_{1}^{2}-x_{2}^{2})\,dx_{1}\,dx_{2}={\begin{cases}e^{-x_{3}},&{\text{if }}0\leq \xi \leq 1\\0,&{\text{else}}\end{cases}}

Therefore:

\varrho _{Y}(y)=\int _{0}^{\infty }\int _{0}^{1}e^{-x_{3}}\,\delta (y-x_{3}-\xi )\,d\xi \,dx_{3}=\int _{\max(0,y-1)}^{y}e^{-x_{3}}\,dx_{3}={\begin{cases}e^{1-y}-e^{-y},&{\text{if }}y>1\\1-e^{-y},&{\text{if }}0\leq y\leq 1\\0,&{\text{if }}y\leq 0\\\end{cases}}

Helper Coordinates

If f is injective, then it can be easier to introduce additional helper coordinates Y_m+1 to Y_n, then do the $\mathbb {R} ^{n}\to \mathbb {R} ^{n}$ transformation from section Invertible Transformation of a Random Vector and finally integrate out all helper coordinates of the so-obtained density.

Example

Given the random vector

{\vec {X}}=(X_{1},X_{2},X_{3})

with the density

\varrho _{\vec {X}}({\vec {x}})

and the following mapping:

{\begin{pmatrix}Y_{1}\\Y_{2}\end{pmatrix}}={\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}}{\begin{pmatrix}X_{1}\\X_{2}\\X_{3}\end{pmatrix}}

Now we introduce the helper coordinate Y₃ = X₃, which results in the transformation matrix

A={\begin{pmatrix}1&2&3\\4&5&6\\0&0&1\end{pmatrix}}

with the corresponding pdf

\varrho _{\vec {X}}(A^{-1}\,{\vec {y}})\;\left|\det A^{-1}\right|

. Thus, we finally obtain

\varrho _{\vec {Y}}(y_{1},y_{2})=\int _{\mathbb {R} }\varrho _{\vec {X}}\left(A^{-1}\,{\begin{pmatrix}y_{1}\\y_{2}\\y_{3}\end{pmatrix}}\right)\;\left|\det A^{-1}\right|\;dy_{3}\;.

Remark: If the joint pdf

\varrho _{\vec {Y}}(y_{1},y_{2})

, i.e. the conditional distribution, is not of interest and one is only interested in the marginal distribution with

\varrho _{Y_{1}}(y_{1})=\int _{\mathbb {R} }\varrho _{\vec {Y}}(y_{1},y_{2})\,dy_{2}

, then it is possible to calculate the density as described in section Mapping of a Random Vector to a Random Variable for the mapping Y₁ = 1 X₁ + 2 X₂ + 3 X₃ (and likewise for Y₂ = 4 X₁ + 5 X₂ + 6 X₃).

Real-World Applications

In order to show some possible applications, we present the following questions, which can be answered using the techniques outlines in this Wikibook. In principle, the answers could also be approximated using a numerical random number simulation: generate several realizations of ${\vec {X}}$ , calculate ${\vec {Y}}=f({\vec {X}})$ and make a histogram of the results. However, many such random numbers are needed for reasonable results, especially for higher-dimensional random vectors. Gladly, we can always calculate the resulting distribution analytically using the above formulas.

Statistical Physics

Suppose atoms in a laser are moving with normally distributed velocities V_x, $\varrho _{V_{x}}(v_{x})={\frac {\exp[-v_{x}^{2}/2\sigma ^{2}]}{\sqrt {2\pi \sigma ^{2}}}}$ , σ² = k_BT/m. Due to the Doppler effect, light emitted with frequency f₀ by an atom moving with v_x will be detected as f ≈ f₀ ( 1 + v_x / c ). Hence, f is a function of V_x. What does the detected spectrum, $\varrho _{f}$ , look like? (Answer: Gaussian around f₀.)
Suppose the velocity components of an ideal gas (V_x, V_y, V_z) are identically, independently normally distributed as in the last example. What is the probability density $\varrho _{V}$ of $V={\sqrt {V_{x}^{2}+V_{y}^{2}+V_{z}^{2}}}$ ? (The answer is known as Maxwell-Boltzmann distribution.)

Quantifying Uncertainty of Derived Properties

Suppose we do not know the exact value of X and Y, but we can assign a probability distribution to each of them. What is the distrubution of the derived property Z = X² / Y and what is the mean value and standard deviation of Z? (To tackle such problems, linearisation around the mean values are sometimes used and both X and Y are assumed to be normally distributed. However, we are not limited to such restrictions.)
Suppose we consider the value of one gram gold, silver and platinum in one year from now as independent random variables G, S and P, respectively. Box A contains 1 gram gold, 2 gram silver and 3 gram platinum. Box B contains 4, 5 and 6 gram, respectively. Thus, ${\begin{pmatrix}A\\B\end{pmatrix}}={\begin{pmatrix}1&2&3\\4&5&6\end{pmatrix}}{\begin{pmatrix}G\\S\\P\end{pmatrix}}$ . What is the value of the contents in box A (or box B) in one year from now? (The answer is given in an example above.) Note that A and B are correlated.

Note that the above examples assume the distribution of ${\vec {X}}$ to be known. If it is unknown, or if the calculation is based on only a few data points, methods from mathematical statistics are a better choice to quantify uncertainty.

Generation of Correlated Random Numbers

Correlated random numbers can be obtained by first generating a vector of uncorrelated random numbers and then applying a function on them.

In order to obtain random numbers with covariance matrix C_Y, we can use the following know procedure: Calculate the Cholesky decomposition C_Y = A A^T. Generate a vector ${\vec {x}}$ of uncorrelated random numbers with all var(X_i) = 1. Apply the matrix A: ${\vec {Y}}=A{\vec {X}}$ . This will result in correlated random variables with covariance matrix C_Y = A A^T.

With the formulas outlined in this Wikibook, we can additionally study the shape of the resulting distribution and the effect of non-linear transformations. Consider, e.g., that X is uniform distributed in [0, 2π], Y₁ = sin(X) and Y₂ = cos(X). In this case, a 2D plot of random numbers from (Y₁, Y₂) will show a uniform distribution on a circle. Although Y₁ and Y₂ are stochastically dependent, they are uncorrelated. It is therefore important to know the resulting distribution, because $\varrho _{\vec {Y}}(y_{1},y_{1})$ has more information than the covariance matrix C_Y.

Probability/Transformation of Probability Densities

General Problem and Solution (n-to-m mapping)

Function of a Random Variable (n=1, m=1)

Derivation using cumulated distribution function Fx

Derivation using Integral Substitution

Derivation using the Delta Distribution

Examples

Mapping of a Random Vector to a Random Variable (n>1, m=1)

Examples

Invertible Transformation of a Random Vector (n=m)

Examples

Possible simplifications for multidimensional mappings (n>1, m>1)

Independent Target-Components

Example

Split Integral Region

Example

Helper Coordinates

Example

Real-World Applications

Statistical Physics

Quantifying Uncertainty of Derived Properties

Generation of Correlated Random Numbers

Derivation using cumulated distribution function F_x