Last modified on 24 February 2014, at 15:36

Partial Differential Equations/Distributions

Distributions can be very helpful for solving partial differential equations. Sometimes they lead to a straightforward way of finding a solution of a partial differential equation.

This chapter explains what distributions are, which properties they have, and how they can be modified. How to actually apply them is shown not in this chapter but in the next one (about fundamental solutions, Green's functions and Green's kernels).

Important preliminary definitions

The support of a function

Let f: \R^d \to \R be a function. We define the support of f as follows:

\text{supp } f := \overline{\{x \in \R^d | f(x) \neq 0\}}


A multiindex \alpha is a vector with entries of natural numbers and zero, i. e. for some k \in \N: \alpha \in \N_0^k.

The absolute value of a multiindex \alpha = (\alpha_1, \ldots, \alpha_k) is defined by |\alpha| := \sum_{i = 1}^k \alpha_i

For a given multiindex \alpha \in \N_0^d, we define the \alpha-th derivative as follows:

\frac{\partial^\alpha}{\partial x^\alpha} := \frac{\partial^{|\alpha|}}{\partial x_1^{\alpha_1} \cdots \partial x_d^{\alpha_d}}

For a given vector x = (x_1, \ldots, x_d) \in \R^d and a multiindex \alpha \in \N_0^d we define x to the power of \alpha as follows:

x^\alpha := x_1^{\alpha_1} \cdots x_d^{\alpha_d}
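
The multiindex notation above can be tried out on concrete numbers. The following is a minimal sketch; the helper names multiindex_abs and power are hypothetical and chosen only for this illustration.

```python
from math import prod

def multiindex_abs(alpha):
    """Absolute value |alpha| = alpha_1 + ... + alpha_k of a multiindex."""
    return sum(alpha)

def power(x, alpha):
    """x^alpha = x_1^alpha_1 * ... * x_d^alpha_d for a vector x and a multiindex alpha."""
    if len(x) != len(alpha):
        raise ValueError("x and alpha must have the same length")
    return prod(xi ** ai for xi, ai in zip(x, alpha))

alpha = (2, 0, 1)
x = (3.0, 5.0, 2.0)
abs_alpha = multiindex_abs(alpha)   # 2 + 0 + 1
x_to_alpha = power(x, alpha)        # 3^2 * 5^0 * 2^1
```

Here |\alpha| = 3 and x^\alpha = 18.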


Let x \in \R^d, r \in \R_+. Then we define the open ball of radius r around x as

B_r(x) := \{y \in \R^d : \|x - y\| < r\}

Set addition

Let A, B \subseteq \R^d. We define the addition of A and B as follows:

A + B := \{x + y : x \in A, y \in B\}


  • B_R(x) + B_r(0) = B_{R+r}(x)
  • \R^d + A = \R^d, for every nonempty A \subseteq \R^d.
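
For finite sets, the set addition defined above can be computed directly. This is only a sketch on finite subsets of \R^2, with points stored as tuples; the function name set_add is an assumption made for the illustration.

```python
def set_add(A, B):
    """A + B = {x + y : x in A, y in B} for finite sets of equal-length tuples."""
    return {tuple(a_i + b_i for a_i, b_i in zip(a, b)) for a in A for b in B}

A = {(0, 0), (1, 0)}
B = {(0, 0), (0, 1)}
S = set_add(A, B)   # the four corners of the unit square
```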

The bump functions

Definition of a bump function

Let U \subseteq \mathbb R^d be an open subset of \mathbb R^d. We call \phi: U \to \R a bump function if and only if the following two conditions hold:

  1. \phi \in C^\infty (U) (this means \phi is infinitely often differentiable)
  2. and also \text{supp } \phi is compact.
(Figure: the standard mollifier)

Example: The standard mollifier, given by

\eta(x) = \frac{1}{z}\begin{cases} e^{-\frac{1}{1-\|x\|^2}} & \text{ if } \|x\| < 1\\
                 0 & \text{ if } \|x\|\geq 1 \end{cases}

, where z := \int_{B_1(0)} e^{-\frac{1}{1-\|x\|^2}} dx, is a bump function.
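
The standard mollifier can be evaluated numerically. The sketch below treats only the case d = 1 and approximates the normalising constant z with a midpoint rule; the function names and the number of quadrature points are assumptions made for this illustration.

```python
import math

def bump(x):
    """Unnormalised profile e^{-1/(1-x^2)} for |x| < 1 and 0 otherwise (case d = 1)."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

# Approximation of the normalising constant z (integral over B_1(0) = (-1, 1)).
z = midpoint_integral(bump, -1.0, 1.0)

def eta(x):
    """Standard mollifier in one dimension, normalised with the approximate z."""
    return bump(x) / z

total_mass = midpoint_integral(eta, -1.0, 1.0)
```

By construction the approximate integral of eta is 1, and eta vanishes for |x| \geq 1, which is the compact support required of a bump function.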

The space of bump functions

We define the space of bump functions for a domain \Omega \subseteq \R^d (a domain is an open and connected set) as the set of all bump functions on this domain:

\mathcal D(\Omega) := \{ \phi \in C^{\infty}(\Omega) \,|\, \operatorname{supp}\, \phi \text{ is a compact subset of } \Omega \}

This space has a notion of convergence: We say that a sequence of bump functions (\phi_i)_{i \in \N} converges to another bump function \phi iff the following two conditions are satisfied:

  1. There is a compact set K \subset \Omega such that \forall i \in \N: \text{supp } \phi_i \subseteq K.
  2. \lim_{i \rightarrow \infty} \sup_{x\in K} \left| \frac{\partial^\alpha}{\partial x^\alpha} \left( \phi_i (x) - \phi(x) \right) \right| = 0 for every multi-index \alpha \in \N_0^d.

The Schwartz functions

(Figure: the function f(x, y) = e^{-x^2-y^2})

Definition of a Schwartz function

We call \phi: \R^d \to \mathbb R a Schwartz function, if and only if the following two conditions hold:

  1. \phi \in C^\infty (\R^d) (this means again \phi is infinitely often differentiable)
  2. \forall \alpha, \beta \in \mathbb{N}_0^d: \sup_{x\in\mathbf{R}^d} \left |x^\alpha \frac{\partial^\beta}{\partial x^\beta} \phi(x) \right | < \infty

Example: The function

f: \R^2 \to \R, f(x, y) = e^{-x^2-y^2}

is a Schwartz function.
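
The second condition can be explored numerically for this example. The sketch below evaluates |x^\alpha \partial^\beta f| on a finite grid for two choices of multiindices; a grid maximum over a bounded box is only a lower bound for the true supremum, so this is an illustration and not a proof. The grid size and the helper names are assumptions.

```python
import math

def f(x, y):
    """The example f(x, y) = e^{-x^2 - y^2}."""
    return math.exp(-x * x - y * y)

def df_dx(x, y):
    """Partial derivative in x, computed by hand: -2x e^{-x^2 - y^2}."""
    return -2.0 * x * f(x, y)

def grid_sup(g, radius=5.0, n=501):
    """Maximum of |g| over an n-by-n grid on the square [-radius, radius]^2."""
    h = 2.0 * radius / (n - 1)
    pts = [-radius + k * h for k in range(n)]
    return max(abs(g(x, y)) for x in pts for y in pts)

# alpha = (1, 0), beta = (0, 0): the function x * f(x, y)
sup_x_f = grid_sup(lambda x, y: x * f(x, y))
# alpha = (0, 0), beta = (1, 0): the function df/dx
sup_dx_f = grid_sup(df_dx)
```

Both values stay bounded; by elementary calculus the exact suprema are 1/\sqrt{2e} and \sqrt{2/e}, attained at x = \pm 1/\sqrt{2}, y = 0.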

The space of Schwartz functions

Analogously to the space of bump functions, we can also define the space of Schwartz functions:

\mathcal{S}(\R^d) := \left\{ \phi \in C^\infty(\R^d) \,\Big|\, \forall \alpha, \beta \in \mathbb{N}_0^d: \; \sup_{x\in\mathbf{R}^d} \left |x^\alpha \frac{\partial^\beta}{\partial x^\beta} \phi(x) \right | <\infty\; \right\}

The space of Schwartz functions also has a notion of convergence: We say that the sequence of Schwartz functions (\phi_i)_{i \in \N} converges to \phi iff the following condition is satisfied:

\forall \alpha, \beta \in \mathbb{N}_0^d: \sup_{x\in\mathbf{R}^d} \left |x^\alpha \frac{\partial^\beta}{\partial x^\beta} (\phi_i(x) - \phi(x)) \right |  \to 0, i \to \infty

Relations between bump functions and Schwartz functions

Theorem 1.1

Every bump function (extended by zero outside its domain) is also a Schwartz function, i. e. \forall \text{ domain } \Omega \subseteq \R^d : \mathcal D(\Omega) \subset \mathcal S(\R^d).


Proof: Let \phi be a bump function. Outside \text{supp } \phi, the function \phi and all of its derivatives are zero, because \phi is constantly zero there. The support is a compact set, and the derivatives of \phi are continuous due to the properties of a bump function, as is the function x \mapsto |x^\alpha|. By the extreme value theorem (see Wikipedia: Extreme value theorem), the absolute values of all these functions attain their maximum on \text{supp } \phi. Since \phi and all of its derivatives are zero outside the support, we obtain for all x \in \R^d and all multiindices \alpha, \beta \in \N_0^d:

\left|x^\alpha \frac{\partial^\beta}{\partial x^\beta} \phi(x) \right | = |x^\alpha| \left| \frac{\partial^\beta}{\partial x^\beta} \phi(x) \right| \le \left( \max_{x \in \text{supp } \phi} |x^\alpha| \right) \left(\max_{x \in \text{supp } \phi} \left| \frac{\partial^\beta}{\partial x^\beta} \phi(x) \right| \right) < \infty

For x outside \text{supp } \phi the left-hand side is zero, so the bound holds on all of \R^d. Taking the supremum over x gives what we wanted.

Theorem 1.2

Let (\phi_i)_{i \in \N} be an arbitrary sequence of bump functions. If \phi_i \to \phi with respect to the notion of convergence for bump functions, then also \phi_i \to \phi with respect to the notion of convergence for Schwartz functions.


Proof: Let K \subset \R^d be the compact set in which all the \text{supp } \phi_i are contained. In \R^d, ‘compact’ is the same as ‘bounded and closed’. Therefore, K \subseteq B_M(0) for some M > 0. Then we have for all multiindices \alpha, \beta \in \N_0^d that

\sup_{x \in \R^d} \left| x^\alpha \frac{\partial^\beta}{\partial x^\beta} (\phi_i(x) - \phi(x)) \right| = \sup_{x \in K} \left|x^\alpha \frac{\partial^\beta}{\partial x^\beta} (\phi_i(x) - \phi(x)) \right|
\le \sup_{x \in K} |x^\alpha| \left| \frac{\partial^\beta}{\partial x^\beta} (\phi_i(x) - \phi(x)) \right| \le M^{|\alpha|} \sup_{x \in K} \left| \frac{\partial^\beta}{\partial x^\beta} (\phi_i(x) - \phi(x)) \right| \to 0, i \to \infty

due to the definition of convergence for bump functions. Therefore the sequence converges with respect to the notion of convergence for Schwartz functions.


Definition: Distributions

Let \mathcal A be a function space with a notion of convergence. A distribution T is a mapping T: \mathcal A \to \R with two properties:

  1. T is linear
  2. T is continuous; i. e. if \phi_i \to \phi in the notion of convergence of the function space, then it must follow that T \phi_i \to T \phi in the ordinary notion of convergence in the real numbers known from first semester Analysis (i. e. |T \phi_i - T \phi| \to 0, i \to \infty)

If \mathcal A is the space of bump functions \mathcal D, we call T: \mathcal D \to \R simply a distribution (because distributions are usually taken with the bump functions as function space). If, however, \mathcal A is the space of Schwartz functions \mathcal S, then we call T: \mathcal S \to \R a tempered distribution.


An example of a distribution is the Dirac delta distribution for an a \in \R^d, which is defined by

\delta_a(\phi) := \phi(a)

for functions \phi: \R^d \to \R.
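
As a small illustration, the Dirac delta distribution can be modelled as a function that takes a test function and returns a number. The sketch below does this for d = 1 and checks linearity on one example; the name dirac_delta and the two sample Schwartz functions are assumptions made for the illustration.

```python
import math

def dirac_delta(a):
    """Return the functional delta_a, which evaluates a test function at the point a."""
    return lambda phi: phi(a)

delta = dirac_delta(0.5)

# Two sample Schwartz functions on R (illustrative choices).
phi = lambda x: math.exp(-x * x)
psi = lambda x: x * math.exp(-x * x)

# Linearity: delta(2*phi + 3*psi) should equal 2*delta(phi) + 3*delta(psi).
combo = lambda x: 2.0 * phi(x) + 3.0 * psi(x)
lhs = delta(combo)
rhs = 2.0 * delta(phi) + 3.0 * delta(psi)
```

Continuity is not checked here; it follows from the fact that both notions of convergence above imply pointwise convergence at a.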

Regular distributions

Let f be a function and \mathcal A \subseteq L^\infty be a function space, where L^\infty denotes the set of essentially bounded functions (i. e. the functions whose absolute value is bounded by a constant except on a Lebesgue null set). Then we can define a mapping \mathcal A \to \R as follows:

T_f (\varphi) := \int_{\R^d} \varphi(x) f(x) dx

We call a distribution T a regular distribution, if and only if there is a function f such that T = T_f.
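
A regular distribution can be approximated numerically in one dimension. The sketch below takes f to be the Heaviside step function (an illustrative, locally integrable choice that does not appear in the text) and applies T_f to the one-dimensional standard mollifier. All function names and the quadrature settings are assumptions.

```python
import math

def bump(x):
    """Unnormalised profile e^{-1/(1-x^2)} for |x| < 1 and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

z = midpoint_integral(bump, -1.0, 1.0)

def eta(x):
    """One-dimensional standard mollifier (normalised numerically)."""
    return bump(x) / z

def heaviside(x):
    """Step function: 1 for x > 0 and 0 otherwise."""
    return 1.0 if x > 0 else 0.0

def regular_distribution(f, a=-1.0, b=1.0):
    """T_f(phi) = integral of phi * f, for test functions supported in [a, b]."""
    return lambda phi: midpoint_integral(lambda x: phi(x) * f(x), a, b)

T_H = regular_distribution(heaviside)
value = T_H(eta)   # integral of eta over (0, 1)
```

Since eta is even and has total integral 1, the value is 1/2 up to rounding.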

Theorem 1.3

The following three claims are true:

  1. If f is an integrable function and \mathcal A \subseteq L^\infty(\R^d), where the embedding Id: \mathcal A \to L^\infty is continuous, then T_f as defined above is a distribution.
  2. If f is a locally integrable function, \Omega \subseteq \R^d is a domain and \mathcal A = \mathcal D(\Omega), then T_f as defined above is a distribution.
  3. If f \in L^2(\R^d) and \mathcal A = \mathcal S(\R^d), then T_f as defined above is a distribution.


Proof: 1) Linearity follows from the linearity of the integral. Well-definedness follows from the estimate

\int_{\R^d} |\varphi(x) f(x)| dx \le \|\varphi\|_{L^\infty} \|f\|_{L^1}

Since the embedding is continuous, we have

\|\varphi_i - \varphi\|_{\mathcal A} \to 0 \Rightarrow \|\varphi_i - \varphi\|_{L^\infty} \to 0

Therefore, continuity follows from

|T_f \varphi_i - T_f \varphi| = \left| \int_{\R^d} (\varphi_i - \varphi)(x) f(x) dx \right| \le \|\varphi_i - \varphi\|_{L^\infty} \underbrace{\int_{\R^d} |f(x)| dx}_{\text{constant}} \to 0, i \to \infty

2) Well-definedness follows by observing that f \in L^1(\text{supp } \varphi), since f is locally integrable and \text{supp } \varphi is compact. For continuity, the notion of convergence in \mathcal D(\Omega) requires that if \phi_i \to \phi, then there exists a compact set K \subset \Omega such that \forall i \in \N : \text{supp } \phi_i \subseteq K. Since f \in L^1(K), the claim follows by performing almost the same calculations as above, with the integrals taken over K.

3) Due to the triangle inequality for integrals and Hölder's inequality, we have

|T_f(\phi_i) - T_f(\phi)| \le \int_{\R^d} |(\phi_i - \phi)(x)| |f(x)| dx \le \|\phi_i - \phi\|_{L^2} \|f\|_{L^2}

But we furthermore have

\begin{align}
\|\phi_i - \phi\|_{L^2}^2 & \le \|\phi_i - \phi\|_{L^\infty} \int_{\R^d} |(\phi_i - \phi)(x)| dx \\
& = \|\phi_i - \phi\|_{L^\infty} \int_{\R^d} \prod_{j=1}^d (1 + x_j^2) |(\phi_i - \phi)(x)| \frac{1}{\prod_{j=1}^d (1 + x_j^2)} dx \\
& \le \|\phi_i - \phi\|_{L^\infty} \left\|\prod_{j=1}^d (1 + x_j^2) (\phi_i - \phi)\right\|_{L^\infty} \underbrace{\int_{\R^d} \frac{1}{\prod_{j=1}^d (1 + x_j^2)} dx}_{= \pi^d}
\end{align}

If \phi_i \to \phi in the notion of convergence of the Schwartz function space, then this expression goes to zero. Therefore, continuity is verified. Linearity again follows by the properties of the integral. Well-definedness follows from

\int_{\R^d} |\phi(x)| |f(x)| dx \le \|\phi\|_{L^2} \|f\|_{L^2} < \infty

Distribution spaces

If \mathcal A (\Omega) is a function space of functions defined on \Omega with a notion of convergence, then the set of all distributions on this space is usually denoted by \mathcal A ' (\Omega). This set is also called a "distribution space". It is the dual space of \mathcal A (\Omega).

Theorem 1.4

\forall \text{ domain } \Omega \subseteq \R^d : \mathcal D' (\Omega) \supset \mathcal S' (\R^d)

Proof: Let T \in \mathcal S' (\R^d), let \phi_i \to \phi be a convergent sequence of bump functions with their limit, and let \varphi, \psi be two bump functions.

Theorem 1.1 gives us that \phi_i are Schwartz functions.

Theorem 1.2 gives us that \phi_i \to \phi in the sense of Schwartz functions.

From these two statements we can conclude due to T \in \mathcal S' (\R^d), that T \phi_i \to T\phi.

Theorem 1.1 tells us furthermore that \varphi, \psi are Schwartz functions. From this we can conclude due to T \in \mathcal S' (\R^d) that T(\alpha \varphi + \beta \psi) = \alpha T \varphi + \beta T \psi.

This completes the proof.

Operations on Distributions

Lemma 1.5

Let \mathcal A(\Omega_1), \mathcal A(\Omega_2) \subseteq L^\infty(\R^d) be function spaces, and L: \mathcal A (\Omega_1) \to L^1_\text{loc} (\Omega_2) be a linear function.

If there exists a linear operator L^*: \mathcal A (\Omega_2) \to \mathcal A (\Omega_1) which is sequentially continuous[1] and satisfies, for all \varphi \in \mathcal A (\Omega_1) and \psi \in \mathcal A (\Omega_2), the identity

\int\limits_{\Omega_1} \varphi(x) (L^*\psi)(x) dx = \int\limits_{\Omega_2} (L \varphi)(x) \psi(x) dx

Then, under these conditions, we may define the operator

\tilde L: \mathcal A' (\Omega_1) \to \mathcal A' (\Omega_2), (\tilde L T) (\varphi) = T(L^* \varphi)

, which really maps to \mathcal A' (\Omega_2), and for regular distributions and f \in \mathcal A(\Omega_1) it will have the property

\tilde L T_f = T_{Lf}

Proof: Well-definedness follows from the fact that L^* \varphi is a function of \mathcal A (\Omega_1) due to the first requirement on L^*. Linearity follows from the linearity of T and linearity of L^*:

(\tilde L T)(\varphi + \psi) := T(L^*(\varphi+\psi)) = T(L^* \varphi + L^* \psi) = T(L^* \varphi) + T(L^* \psi) =: (\tilde L T)(\varphi) + (\tilde L T)(\psi)

Continuity follows in the same way from the continuity of T and L^*: Let \phi_i \to \phi with respect to the notion of convergence of \mathcal A (\Omega_2). Then

\phi_i \to \phi \Rightarrow L^* \phi_i \to L^* \phi \Rightarrow T (L^* \phi_i) \to T (L^* \phi) \Leftrightarrow: (\tilde L T) (\phi_i) \to (\tilde L T) (\phi)

The property

\tilde L T_f = T_{Lf}

follows directly from the equation

\int\limits_{\Omega_1} \varphi(x) (L^*\psi)(x) dx = \int\limits_{\Omega_2} (L \varphi)(x) \psi(x) dx.

Multiplication by a smooth function

Let \psi be a smooth function ("smooth" means it is infinitely often differentiable). Then, by defining L(\phi(x)) = \psi(x) \cdot \phi(x) and L^*(\varphi(x)) = \psi(x) \cdot \varphi(x), we meet the requirements of the above lemma and may define multiplication of distributions by smooth functions as follows:

Let T \in \mathcal A ', then (\psi \cdot T)(\varphi) := T(\psi \cdot \varphi)
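
The definition can be tried out on the Dirac delta distribution in one dimension, where it gives (\psi \cdot \delta_a)(\varphi) = \psi(a) \varphi(a). The following sketch models distributions as Python callables; the names multiply and dirac_delta are assumptions made for the illustration.

```python
import math

def dirac_delta(a):
    """The functional delta_a, which evaluates a test function at a."""
    return lambda phi: phi(a)

def multiply(psi, T):
    """(psi . T)(phi) := T(psi . phi) for a smooth function psi and a functional T."""
    return lambda phi: T(lambda x: psi(x) * phi(x))

phi = lambda x: math.exp(-x * x)   # sample test function
identity = lambda x: x             # the smooth function psi(x) = x

x_delta_0 = multiply(identity, dirac_delta(0.0))
x_delta_2 = multiply(identity, dirac_delta(2.0))

value_at_0 = x_delta_0(phi)   # 0 * phi(0)
value_at_2 = x_delta_2(phi)   # 2 * phi(2)
```

In particular x \cdot \delta_0 sends every test function to zero.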


For the bump functions and the Schwartz functions, we also may define the differentiation of distributions. Let k \in \N and L = \sum_{|\alpha| \le k} a_\alpha (x) \frac{\partial^\alpha}{\partial x^\alpha}. Let's now define

L^*(\phi) := \sum_{|\alpha| \le k} (-1)^{|\alpha|}\frac{\partial^\alpha}{\partial x^\alpha} (a_\alpha (x) \phi (x)).

Then, for the spaces \mathcal A (\Omega_1) = \mathcal A (\Omega_2) = \mathcal D(\Omega) or \mathcal S(\R^d), the requirements of the above lemma 1.5 are met and we may define the differentiation of distributions in the following way:

(L T)(\varphi) := T(L^* \varphi)

This definition also satisfies LT_f = T_{Lf}.

Proof: By integration by parts, we obtain:

\int_\Omega \phi(x) \alpha(x) \frac{\partial}{\partial x_i} \psi(x) dx = -\int_\Omega \frac{\partial}{\partial x_i} (\phi(x) \alpha(x)) \psi(x) dx + \int_{\partial \Omega} \alpha(x) \phi(x) \psi(x) \nu_i(x) dx

, where \nu_i is the i-th component of the outward normal vector and \partial \Omega is the boundary of \Omega. For bump functions, the boundary integral \int_{\partial \Omega} \alpha(x) \phi(x) \psi(x) \nu_i(x) dx vanishes anyway, because the functions in \mathcal D (\Omega) are zero there. For Schwartz functions, we may use the identity

\int_{\R^d} \phi(x) \alpha(x) \frac{\partial}{\partial x_i} \psi(x) dx = \lim_{r \to \infty} \int_{B_r(0)} \phi(x) \alpha(x) \frac{\partial}{\partial x_i} \psi(x) dx

and the decreasing property of the Schwartz functions to see that the boundary integral goes to zero and therefore

\int_{\R^d} \phi(x) \alpha(x) \frac{\partial}{\partial x_i} \psi(x) dx = -\int_{\R^d} \frac{\partial}{\partial x_i} (\phi(x) \alpha(x)) \psi(x) dx

To derive the equation

\int\limits_{\Omega} \varphi(x) (L^*\psi)(x) dx = \int\limits_{\Omega} (L \varphi)(x) \psi(x) dx

, we may apply the formula from above several times. This finishes the proof, because this equation was the only non-trivial property of L^*, which we need for applying lemma 1.5.
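
A classical example of this definition is the derivative of the Heaviside step function H in one dimension: with L = \frac{d}{dx} we have L^* = -\frac{d}{dx}, so (L T_H)(\varphi) = -\int_0^\infty \varphi'(x) dx = \varphi(0), i. e. L T_H = \delta_0. The sketch below checks this numerically for one bump function; the choice of H, the test function and the quadrature are assumptions made for the illustration.

```python
import math

def phi(x):
    """Test function e^{-1/(1-x^2)} for |x| < 1 and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def dphi(x):
    """Derivative of phi, computed by hand with the chain rule."""
    if abs(x) >= 1.0:
        return 0.0
    return phi(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def midpoint_integral(f, a, b, n=20000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def T_H(psi):
    """Regular distribution of the Heaviside function for psi supported in [-1, 1]."""
    return midpoint_integral(psi, 0.0, 1.0)

# (L T_H)(phi) = T_H(L^* phi) with L^* phi = -phi'
derivative_value = T_H(lambda x: -dphi(x))
delta_value = phi(0.0)   # delta_0(phi)
```

Both values agree up to quadrature error and equal e^{-1}.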


Let \Omega_1 \subseteq \R^d, \Omega_2 \subseteq \R^d be domains, and let \Theta: \Omega_1 \to \Omega_2 be a smooth function from \Omega_1 to \Omega_2, such that for all compact subsets c_2 \subset \Omega_2, \Theta^{-1}(c_2) \subset \Omega_1 is compact. Then we call the function

\Theta^*: \mathcal D(\Omega_2) \to \mathcal D(\Omega_1), \Theta^* (\varphi) = \varphi \circ \Theta

the pull-back of bump functions.

If we choose \Omega_1 = \Omega_2 = \R^d, i. e. \Theta: \R^d \to \R^d is a smooth function from \R^d to \R^d such that for all compact sets c_2 \subset \R^d, \Theta^{-1}(c_2) \subset \R^d is compact, then we also define the pull-back of Schwartz functions in exactly the same way:

\Theta^*: \mathcal S(\R^d) \to \mathcal S(\R^d), \Theta^* (\varphi) = \varphi \circ \Theta

For bump functions and Schwartz functions, we may define the push-forward:

For the bump functions

\Theta_*: \mathcal D'(\Omega_1) \to \mathcal D'(\Omega_2), (\Theta_* T)(\phi) = T(\Theta^*(\phi))

or, for Schwartz functions:

\Theta_*: \mathcal S'(\R^d) \to \mathcal S'(\R^d), (\Theta_* T)(\phi) = T(\Theta^*(\phi))
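
For the Dirac delta distribution the push-forward can be computed by hand: \Theta_* \delta_a applied to \phi gives \delta_a(\phi \circ \Theta) = \phi(\Theta(a)), so \Theta_* \delta_a = \delta_{\Theta(a)}. The sketch below checks this in one dimension for an affine map \Theta, whose preimages of compact sets are compact; all names and the sample functions are assumptions made for the illustration.

```python
import math

def dirac_delta(a):
    """The functional delta_a, which evaluates a test function at a."""
    return lambda phi: phi(a)

def pull_back(theta):
    """Theta^*: phi -> phi o Theta."""
    return lambda phi: (lambda x: phi(theta(x)))

def push_forward(theta, T):
    """Theta_* T: phi -> T(Theta^* phi)."""
    return lambda phi: T(pull_back(theta)(phi))

theta = lambda x: 2.0 * x + 1.0        # sample smooth map on R
phi = lambda x: math.exp(-x * x)       # sample test function

pushed = push_forward(theta, dirac_delta(3.0))
pushed_value = pushed(phi)             # phi(theta(3)) = phi(7)
expected_value = dirac_delta(theta(3.0))(phi)
```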


Let \vartheta \in \mathcal D(B_r(0)), and let \Omega_1 \supseteq \Omega_2 + B_r(0). Let's define

L: \mathcal D(\Omega_1) \to C^\infty(\Omega_2), (L \varphi)(y) = (\varphi * \vartheta)(y) := \int_{\Omega_1} \varphi(x) \vartheta(y - x) dx.

This function (L) is linear, because the integral is linear. It is called the convolution of \vartheta and \varphi.

We can also define: \tilde \vartheta(x) = \vartheta(-x), and:

L^* \varphi := \tilde \vartheta * \varphi

By the theorem of Fubini, we can calculate as follows:

\int_{\Omega_2} (L \varphi)(x) \psi(x) dx = \int_{\Omega_2} \int_{\Omega_1} \vartheta(x - y) \varphi(y) \psi(x) dy dx
= \int_{\Omega_1} \int_{\Omega_2} \vartheta(x - y) \varphi(y) \psi(x) dx dy = \int_{\Omega_1} \varphi(y) (L^*\psi)(y) dy

Therefore, the first assumption for Lemma 1.5 holds.

Due to the Leibniz integral rule, we obtain that for f \in L^1 (i. e. f is integrable) and g \in C^k (\R^d) (i. e. the partial derivatives of g exist up to order k and are also continuous):

\frac{\partial^\alpha}{\partial x^\alpha} (f * g) = f * \left( \frac{\partial^\alpha}{\partial x^\alpha} g \right), |\alpha| \le k

With this formula, we can see (due to the monotonicity of the integral) that

\sup_{x \in \R^d} \left|\frac{\partial^\alpha}{\partial x^\alpha} (f * g)(x)\right| = \sup_{x \in \R^d} \left| \int_{\R^d} f(y) \frac{\partial^\alpha}{\partial x^\alpha} g(x-y)dy \right| \le \overbrace{\int_{\R^d} |f(y)| dy}^{\text{constant}} \cdot \sup_{x \in \R^d} \left| \frac{\partial^\alpha}{\partial x^\alpha} g(x) \right|

From this follows sequential continuity for Schwartz and bump functions by defining f = \vartheta and g = \phi_i - \phi. Thus, with the help of lemma 1.5, we can define the convolution with a distribution of \mathcal D'(\Omega) or \mathcal S'(\R^d) as follows:

(\vartheta * T)(\varphi) := T(\tilde \vartheta * \varphi)
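
For T = \delta_0 this definition gives (\vartheta * \delta_0)(\varphi) = (\tilde \vartheta * \varphi)(0) = \int \vartheta(x) \varphi(x) dx = T_\vartheta(\varphi), so convolving with \delta_0 returns the regular distribution of \vartheta. The sketch below checks this numerically in one dimension. The particular \vartheta (supported in [-1, 1], hence in \mathcal D(B_r(0)) for any r > 1), the test function and the quadrature are assumptions made for the illustration.

```python
import math

def bump(x):
    """Profile e^{-1/(1-x^2)} for |x| < 1 and 0 otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def midpoint_integral(f, a, b, n=4000):
    """Midpoint-rule approximation of the integral of f over [a, b]."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

def theta(x):
    """A non-even bump function supported in [-1, 1]."""
    return (1.0 + 0.5 * x) * bump(x)

def theta_tilde(x):
    """Reflection: theta_tilde(x) = theta(-x)."""
    return theta(-x)

def phi(x):
    """Test function: a shifted and stretched bump."""
    return bump((x - 0.3) / 2.0)

def conv_theta_tilde(test_function):
    """y -> (theta_tilde * test_function)(y) = int theta_tilde(x) test_function(y - x) dx."""
    return lambda y: midpoint_integral(
        lambda x: theta_tilde(x) * test_function(y - x), -1.0, 1.0)

def dirac_delta(a):
    """The functional delta_a."""
    return lambda g: g(a)

lhs = dirac_delta(0.0)(conv_theta_tilde(phi))                      # (theta * delta_0)(phi)
rhs = midpoint_integral(lambda x: theta(x) * phi(x), -1.0, 1.0)    # T_theta(phi)
```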


  1. This means in this case that if \phi_i \to \phi with respect to the notion of convergence of \mathcal A (\Omega_2), then also L^* \phi_i \to L^* \phi with respect to the notion of convergence of \mathcal A (\Omega_1).