# Introduction and first examples


## What is a partial differential equation?

Let ${\displaystyle d\in \mathbb {N} }$ be a natural number, and let ${\displaystyle B\subseteq \mathbb {R} ^{d}}$ be an arbitrary set. A partial differential equation on ${\displaystyle B}$ looks like this:

${\displaystyle \forall (x_{1},\ldots ,x_{d})\in B:h(x_{1},\ldots ,x_{d},u(x_{1},\ldots ,x_{d}),\overbrace {\partial _{x_{1}}u(x_{1},\ldots ,x_{d}),\ldots ,\partial _{x_{d}}u(x_{1},\ldots ,x_{d}),\partial _{x_{1}}^{2}u(x_{1},\ldots ,x_{d}),\ldots } ^{{\text{arbitrary and arbitrarily finitely many partial derivatives, }}n{\text{ inputs of }}h{\text{ in total}}})=0}$

${\displaystyle h}$ is an arbitrary function here, specific to the partial differential equation, which goes from ${\displaystyle \mathbb {R} ^{n}}$ to ${\displaystyle \mathbb {R} }$, where ${\displaystyle n\in \mathbb {N} }$ is a natural number. And a solution to this partial differential equation on ${\displaystyle B}$ is a function ${\displaystyle u:B\to \mathbb {R} }$ satisfying the above logical statement. The solutions of some partial differential equations describe processes in nature; this is one reason why they are so important.

## Multiindices

Multiindices are extremely important throughout the theory of partial differential equations. With their help, we can write down many formulas far more concisely.

Definitions 1.1:

A ${\displaystyle d}$-dimensional multiindex is a vector ${\displaystyle \alpha \in \mathbb {N} _{0}^{d}}$, where ${\displaystyle \mathbb {N} _{0}}$ denotes the natural numbers together with zero.

If ${\displaystyle \alpha =(\alpha _{1},\ldots ,\alpha _{d})}$ is a multiindex, then its absolute value ${\displaystyle |\alpha |}$ is defined by

${\displaystyle |\alpha |:=\sum _{k=1}^{d}\alpha _{k}}$

If ${\displaystyle \alpha }$ is a ${\displaystyle d}$-dimensional multiindex, ${\displaystyle B\subseteq \mathbb {R} ^{d}}$ is an arbitrary set and ${\displaystyle u:B\to \mathbb {R} }$ is sufficiently often differentiable, we define ${\displaystyle \partial _{\alpha }u}$, the ${\displaystyle \alpha }$-th derivative of ${\displaystyle u}$, as follows:

${\displaystyle \partial _{\alpha }u:=\partial _{x_{1}}^{\alpha _{1}}\cdots \partial _{x_{d}}^{\alpha _{d}}u}$
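To make the notation concrete, the following is a minimal Python sketch that applies ${\displaystyle \partial _{\alpha }}$ to a polynomial. The representation (a dictionary from exponent tuples to coefficients) and the helper names `abs_multiindex`, `falling` and `partial_alpha` are assumptions made for this illustration only.

```python
from math import prod

def abs_multiindex(alpha):
    """Absolute value |alpha|: the sum of the entries of the multiindex."""
    return sum(alpha)

def falling(e, a):
    """e * (e-1) * ... * (e-a+1): the factor produced by differentiating x**e a times."""
    return prod(range(e - a + 1, e + 1))

def partial_alpha(poly, alpha):
    """Apply the alpha-th derivative to a polynomial.

    poly maps exponent tuples (e_1, ..., e_d) to coefficients.
    """
    result = {}
    for exps, coeff in poly.items():
        if any(e < a for e, a in zip(exps, alpha)):
            continue  # this monomial is differentiated to zero
        factor = prod(falling(e, a) for e, a in zip(exps, alpha))
        new_exps = tuple(e - a for e, a in zip(exps, alpha))
        result[new_exps] = result.get(new_exps, 0) + coeff * factor
    return result

u = {(3, 2): 1}      # u(x_1, x_2) = x_1^3 * x_2^2
alpha = (2, 1)       # differentiate twice in x_1 and once in x_2
print(abs_multiindex(alpha))     # 3
print(partial_alpha(u, alpha))   # {(1, 1): 12}, i.e. 12 * x_1 * x_2
```

Here ${\displaystyle \alpha =(2,1)}$ has ${\displaystyle |\alpha |=3}$, and ${\displaystyle \partial _{\alpha }(x_{1}^{3}x_{2}^{2})=12x_{1}x_{2}}$.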

## Types of partial differential equations

We classify partial differential equations into several types, because equations of one type require different solution techniques than equations of other types. We classify them into linear and nonlinear equations, and into equations of different orders.

Definitions 1.2:

A linear partial differential equation is an equation of the form

${\displaystyle \forall x\in B:\sum _{\alpha \in \mathbb {N} _{0}^{d}}a_{\alpha }(x)\partial _{\alpha }u(x)=f(x)}$

Here only finitely many of the ${\displaystyle a_{\alpha }}$ are not the constant zero function, ${\displaystyle B\subseteq \mathbb {R} ^{d}}$ for an arbitrary ${\displaystyle d\in \mathbb {N} }$, ${\displaystyle f:B\to \mathbb {R} }$ is an arbitrary function, and the sum in the formula is taken over all ${\displaystyle d}$-dimensional multiindices. A solution is a function ${\displaystyle u:B\to \mathbb {R} }$ satisfying the equation. If ${\displaystyle f=0}$, the equation is called homogeneous.

A partial differential equation is called nonlinear iff it is not a linear partial differential equation.

Definition 1.3:

Let ${\displaystyle n\in \mathbb {N} }$. We say that a partial differential equation has ${\displaystyle n}$-th order iff ${\displaystyle n}$ is the smallest number such that it is of the form

${\displaystyle \forall (x_{1},\ldots ,x_{d})\in B\subseteq \mathbb {R} ^{d}:h(x_{1},\ldots ,x_{d},u(x_{1},\ldots ,x_{d}),\overbrace {\partial _{x_{1}}u(x_{1},\ldots ,x_{d}),\ldots ,\partial _{x_{d}}u(x_{1},\ldots ,x_{d}),\partial _{x_{1}}^{2}u(x_{1},\ldots ,x_{d}),\ldots } ^{{\text{partial derivatives at most up to order }}n})=0}$
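As an illustration of definitions 1.2 and 1.3 (the two equations below are chosen only as examples), take ${\displaystyle d=2}$. The equation

${\displaystyle \forall x\in \mathbb {R} ^{2}:\partial _{(2,0)}u(x)+x_{1}\partial _{(0,1)}u(x)=0}$

is linear with ${\displaystyle a_{(2,0)}(x)=1}$, ${\displaystyle a_{(0,1)}(x)=x_{1}}$, all other ${\displaystyle a_{\alpha }}$ equal to zero and ${\displaystyle f=0}$. It is therefore homogeneous, and it has second order, since the highest derivative that occurs is ${\displaystyle \partial _{x_{1}}^{2}u}$. By contrast, the equation

${\displaystyle \forall x\in \mathbb {R} ^{2}:u(x)\partial _{(1,0)}u(x)=0}$

has first order and is not of the linear form above, because the unknown function is multiplied by its own derivative.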

## First example of a partial differential equation

Let us now look at a first concrete example of a partial differential equation.

Theorem and definition 1.4:

If ${\displaystyle g:\mathbb {R} \to \mathbb {R} }$ is a differentiable function and ${\displaystyle c\in \mathbb {R} }$, then the function

${\displaystyle u:\mathbb {R} ^{2}\to \mathbb {R} ,u(t,x):=g(x+ct)}$

solves the one-dimensional homogeneous transport equation

${\displaystyle \forall (t,x)\in \mathbb {R} ^{2}:\partial _{t}u(t,x)-c\partial _{x}u(t,x)=0}$

Proof: Exercise 2.

We therefore see that the one-dimensional transport equation has many different solutions: one for each differentiable function ${\displaystyle g:\mathbb {R} \to \mathbb {R} }$. However, if we require the solution to have a specific initial state, the solution becomes unique.
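The statement of theorem 1.4 can be checked numerically (this is a plausibility check, not a proof). The sketch below assumes the hypothetical choices ${\displaystyle g=\sin }$ and ${\displaystyle c=2}$ and approximates both partial derivatives by central differences; the function name `residual` is likewise only an illustration.

```python
import math

def g(y):
    return math.sin(y)    # hypothetical differentiable profile

c = 2.0                   # hypothetical constant c

def u(t, x):
    return g(x + c * t)

def residual(t, x, h=1e-5):
    """Central-difference approximation of d_t u - c * d_x u at the point (t, x)."""
    du_dt = (u(t + h, x) - u(t - h, x)) / (2 * h)
    du_dx = (u(t, x + h) - u(t, x - h)) / (2 * h)
    return du_dt - c * du_dx

for point in [(0.0, 0.0), (0.3, -1.2), (1.5, 2.0)]:
    print(point, residual(*point))
```

The printed residuals should be close to zero, up to the discretisation and rounding error of the difference quotients.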

Theorem and definition 1.5:

If ${\displaystyle g:\mathbb {R} \to \mathbb {R} }$ is a differentiable function and ${\displaystyle c\in \mathbb {R} }$, then the function

${\displaystyle u:\mathbb {R} ^{2}\to \mathbb {R} ,u(t,x):=g(x+ct)}$

is the unique solution to the initial value problem for the one-dimensional homogeneous transport equation

${\displaystyle {\begin{cases}\forall (t,x)\in \mathbb {R} ^{2}:&\partial _{t}u(t,x)-c\partial _{x}u(t,x)=0\\\forall x\in \mathbb {R} :&u(0,x)=g(x)\end{cases}}}$

Proof:

Surely ${\displaystyle \forall x\in \mathbb {R} :u(0,x)=g(x+c\cdot 0)=g(x)}$. Further, theorem 1.4 shows that also:

${\displaystyle \forall (t,x)\in \mathbb {R} ^{2}:\partial _{t}u(t,x)-c\partial _{x}u(t,x)=0}$

Now suppose we have an arbitrary other solution to the initial value problem. Let's name it ${\displaystyle v}$. Then for all ${\displaystyle (t,x)\in \mathbb {R} ^{2}}$, the function

${\displaystyle \mu _{(t,x)}(\xi ):=v(t-\xi ,x+c\xi )}$

is constant:

${\displaystyle {\frac {d}{d\xi }}v(t-\xi ,x+c\xi )={\begin{pmatrix}\partial _{t}v(t-\xi ,x+c\xi )&\partial _{x}v(t-\xi ,x+c\xi )\end{pmatrix}}{\begin{pmatrix}-1\\c\end{pmatrix}}=-\partial _{t}v(t-\xi ,x+c\xi )+c\partial _{x}v(t-\xi ,x+c\xi )=0}$

Therefore, in particular

${\displaystyle \forall (t,x)\in \mathbb {R} ^{2}:\mu _{(t,x)}(0)=\mu _{(t,x)}(t)}$

, which means, inserting the definition of ${\displaystyle \mu _{(t,x)}}$, that

${\displaystyle \forall (t,x)\in \mathbb {R} ^{2}:v(t,x)=v(0,x+ct){\overset {\text{initial condition}}{=}}g(x+ct)}$

, which shows that ${\displaystyle u=v}$. Since ${\displaystyle v}$ was an arbitrary solution, this shows uniqueness.${\displaystyle \Box }$
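The key step of the uniqueness proof is that a solution is constant along the line ${\displaystyle \xi \mapsto (t-\xi ,x+c\xi )}$. The following sketch illustrates this for one particular solution; the choices ${\displaystyle c=2}$ and ${\displaystyle v(t,x)=\cos(x+ct)}$, as well as the sample point, are assumptions made for the illustration.

```python
import math

c = 2.0                             # hypothetical constant c

def v(t, x):
    return math.cos(x + c * t)      # a solution by theorem 1.4 with g = cos

def mu(t, x, xi):
    """The function mu_(t,x) from the proof, evaluated at xi."""
    return v(t - xi, x + c * xi)

t, x = 1.5, -0.4
values = [mu(t, x, xi) for xi in (0.0, 0.5, 1.0, t)]
print(values)                       # all entries agree up to rounding
```

In particular the first entry ${\displaystyle \mu _{(t,x)}(0)=v(t,x)}$ and the last entry ${\displaystyle \mu _{(t,x)}(t)=v(0,x+ct)}$ coincide, which is exactly the identity used in the proof.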

In the next chapter, we will consider the inhomogeneous transport equation in arbitrary dimension.

## Exercises

1. Have a look at the definition of an ordinary differential equation (see for example the Wikipedia page on that) and show that every ordinary differential equation is a partial differential equation.
2. Prove Theorem 1.4 using direct calculation.
3. What is the order of the transport equation?
4. Find a function ${\displaystyle u:\mathbb {R} ^{2}\to \mathbb {R} }$ such that ${\displaystyle \partial _{t}u-2\partial _{x}u=0}$ and ${\displaystyle \forall x\in \mathbb {R} :u(0,x)=x^{3}}$.

## Sources

• Martin Brokate (2011/2012), Partielle Differentialgleichungen, lecture notes (PDF, in German)
• Daniel Matthes (2013/2014), Partial Differential Equations, lecture notes

# The transport equation


In the first chapter, we saw the one-dimensional transport equation. In this chapter we will see that the solution method and the uniqueness proof used there generalise quite easily to multiple dimensions. Let ${\displaystyle d\in \mathbb {N} }$. The inhomogeneous ${\displaystyle d}$-dimensional transport equation looks like this:

${\displaystyle \forall (t,x)\in \mathbb {R} \times \mathbb {R} ^{d}:\partial _{t}u(t,x)-\mathbf {v} \cdot \nabla _{x}u(t,x)=f(t,x)}$

, where ${\displaystyle f:\mathbb {R} \times \mathbb {R} ^{d}\to \mathbb {R} }$ is a function and ${\displaystyle \mathbf {v} \in \mathbb {R} ^{d}}$ is a vector.

## Solution

The following definition will serve as a useful shorthand on many occasions. Since we can use it right from the beginning of this chapter, we start with it.

Definition 2.1:

Let ${\displaystyle f:\mathbb {R} ^{d}\to \mathbb {R} }$ be a function and ${\displaystyle n\in \mathbb {N} }$. We say that ${\displaystyle f}$ is ${\displaystyle n}$ times continuously differentiable iff all the partial derivatives

${\displaystyle \partial _{\alpha }f,\alpha \in \mathbb {N} _{0}^{d}{\text{ and }}|\alpha |\leq n}$

exist and are continuous. We write ${\displaystyle f\in {\mathcal {C}}^{n}(\mathbb {R} ^{d})}$.

Before we prove a solution formula for the transport equation, we need a theorem from analysis which will play a crucial role in the proof of the solution formula.

Theorem 2.2: (Leibniz' integral rule)

Let ${\displaystyle O\subseteq \mathbb {R} }$ be open and ${\displaystyle B\subseteq \mathbb {R} ^{d}}$, where ${\displaystyle d\in \mathbb {N} }$ is arbitrary, and let ${\displaystyle f\in {\mathcal {C}}^{1}(O\times B)}$. If the conditions

• for all ${\displaystyle x\in O}$, ${\displaystyle \int _{B}|f(x,y)|dy<\infty }$
• for all ${\displaystyle x\in O}$ and ${\displaystyle y\in B}$, ${\displaystyle {\frac {d}{dx}}f(x,y)}$ exists
• there is a function ${\displaystyle g:B\to \mathbb {R} }$ such that
${\displaystyle \forall (x,y)\in O\times B:|\partial _{x}f(x,y)|\leq |g(y)|{\text{ and }}\int _{B}|g(y)|dy<\infty }$

hold, then

${\displaystyle {\frac {d}{dx}}\int _{B}f(x,y)dy=\int _{B}{\frac {d}{dx}}f(x,y)dy}$

We will omit the proof.
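Although we omit the proof, the rule can be illustrated numerically. The sketch below assumes the hypothetical example ${\displaystyle f(x,y)=\sin(xy)}$ on ${\displaystyle B=[0,1]}$, for which ${\displaystyle |\partial _{x}f(x,y)|\leq |y|}$ provides an integrable bound, and compares both sides of the formula. The helper `simpson` is a plain composite Simpson rule written for this example.

```python
import math

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def f(x, y):
    return math.sin(x * y)        # hypothetical integrand

def df_dx(x, y):
    return y * math.cos(x * y)    # its partial derivative with respect to x

def integral(x):
    return simpson(lambda y: f(x, y), 0.0, 1.0)

x0, h = 0.7, 1e-5
lhs = (integral(x0 + h) - integral(x0 - h)) / (2 * h)   # derivative of the integral
rhs = simpson(lambda y: df_dx(x0, y), 0.0, 1.0)          # integral of the derivative
print(lhs, rhs)
```

Both numbers should agree up to the error of the difference quotient and of the quadrature.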

Theorem 2.3: If ${\displaystyle f\in {\mathcal {C}}^{1}(\mathbb {R} \times \mathbb {R} ^{d})}$, ${\displaystyle g\in {\mathcal {C}}^{1}(\mathbb {R} ^{d})}$ and ${\displaystyle \mathbf {v} \in \mathbb {R} ^{d}}$, then the function

${\displaystyle u:\mathbb {R} \times \mathbb {R} ^{d}\to \mathbb {R} ,u(t,x):=g(x+\mathbf {v} t)+\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$

solves the inhomogeneous ${\displaystyle d}$-dimensional transport equation

${\displaystyle \forall (t,x)\in \mathbb {R} \times \mathbb {R} ^{d}:\partial _{t}u(t,x)-\mathbf {v} \cdot \nabla _{x}u(t,x)=f(t,x)}$

Note that, as in chapter 1, there are many solutions: one for each continuously differentiable ${\displaystyle g}$.

Proof:

1.

We show that ${\displaystyle u}$ is sufficiently often differentiable. It follows from the chain rule that ${\displaystyle g(x+\mathbf {v} t)}$ is continuously differentiable in all the directions ${\displaystyle t,x_{1},\ldots ,x_{d}}$. The existence of

${\displaystyle \partial _{x_{n}}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds,n\in \{1,\ldots ,d\}}$

follows from the Leibniz integral rule (see exercise 1). The expression

${\displaystyle \partial _{t}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$

will be shown later in this proof to be equal to

${\displaystyle f(t,x)+\mathbf {v} \cdot \nabla _{x}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$,

which exists because

${\displaystyle \nabla _{x}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$

just consists of the derivatives

${\displaystyle \partial _{x_{n}}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds,n\in \{1,\ldots ,d\}}$

2.

We show that

${\displaystyle \forall (t,x)\in \mathbb {R} \times \mathbb {R} ^{d}:\partial _{t}u(t,x)-\mathbf {v} \cdot \nabla _{x}u(t,x)=f(t,x)}$

in three substeps.

2.1

We show that

${\displaystyle \partial _{t}g(x+\mathbf {v} t)-\mathbf {v} \cdot \nabla _{x}g(x+\mathbf {v} t)=0~~~~~(*)}$

This is left to the reader as an exercise in the application of the multi-dimensional chain rule (see exercise 2).

2.2

We show that

${\displaystyle \partial _{t}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds-\mathbf {v} \cdot \nabla _{x}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds=f(t,x)~~~~~(**)}$

We choose

${\displaystyle F(t,x):=\int _{0}^{t}f(s,x-\mathbf {v} s)ds}$

so that we have

${\displaystyle F(t,x+\mathbf {v} t)=\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$

By the multi-dimensional chain rule, we obtain

${\displaystyle {\begin{aligned}{\frac {d}{dt}}F(t,x+\mathbf {v} t)&={\begin{pmatrix}\partial _{t}F(t,x+\mathbf {v} t)&\partial _{x_{1}}F(t,x+\mathbf {v} t)&\cdots &\partial _{x_{d}}F(t,x+\mathbf {v} t)\end{pmatrix}}{\begin{pmatrix}1\\\mathbf {v} \end{pmatrix}}\\&=\partial _{t}F(t,x+\mathbf {v} t)+\mathbf {v} \cdot \nabla _{x}F(t,x+\mathbf {v} t)\end{aligned}}}$

On the one hand, the fundamental theorem of calculus gives ${\displaystyle \partial _{t}F(t,x)=f(t,x-\mathbf {v} t)}$, and therefore

${\displaystyle \partial _{t}F(t,x+\mathbf {v} t)=f(t,x)}$

and on the other hand

${\displaystyle \partial _{x_{n}}F(t,x+\mathbf {v} t)=\partial _{x_{n}}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$

, since the difference quotients defining ${\displaystyle \partial _{x_{n}}}$ coincide on both sides. And since, thirdly,

${\displaystyle {\frac {d}{dt}}F(t,x+\mathbf {v} t)=\partial _{t}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$

, step 2.2 of the proof is finished.

2.3

We add ${\displaystyle (*)}$ and ${\displaystyle (**)}$ together, use the linearity of derivatives and see that the equation is satisfied. ${\displaystyle \Box }$
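The solution formula of theorem 2.3 can also be evaluated numerically. The following sketch works in ${\displaystyle d=2}$ and assumes hypothetical data ${\displaystyle \mathbf {v} }$, ${\displaystyle g}$ and ${\displaystyle f}$; it computes the integral with a composite Simpson rule and checks the transport equation with central differences. All helper names are illustrative.

```python
import math

v = (1.0, -0.5)                              # hypothetical vector v

def g(x):
    return math.sin(x[0]) * math.cos(x[1])   # hypothetical function g

def f(t, x):
    return t * x[0] + x[1] ** 2              # hypothetical right-hand side f

def simpson(func, a, b, n=200):
    """Composite Simpson rule on [a, b] with an even number n of subintervals."""
    h = (b - a) / n
    total = func(a) + func(b)
    for k in range(1, n):
        total += (4 if k % 2 else 2) * func(a + k * h)
    return total * h / 3

def shift(x, s):
    """Return the point x + v * s."""
    return tuple(xi + vi * s for xi, vi in zip(x, v))

def u(t, x):
    """u(t, x) = g(x + v t) + integral from 0 to t of f(s, x + v (t - s)) ds."""
    return g(shift(x, t)) + simpson(lambda s: f(s, shift(x, t - s)), 0.0, t)

def residual(t, x, h=1e-4):
    """Approximate d_t u - v . grad_x u - f at (t, x) by central differences."""
    du_dt = (u(t + h, x) - u(t - h, x)) / (2 * h)
    grad = []
    for n in range(len(x)):
        xp = tuple(xi + (h if i == n else 0.0) for i, xi in enumerate(x))
        xm = tuple(xi - (h if i == n else 0.0) for i, xi in enumerate(x))
        grad.append((u(t, xp) - u(t, xm)) / (2 * h))
    v_dot_grad = sum(vi * gi for vi, gi in zip(v, grad))
    return du_dt - v_dot_grad - f(t, x)

x0 = (0.3, -1.1)
print(residual(0.8, x0))     # close to zero
print(u(0.0, x0), g(x0))     # at t = 0 the integral vanishes, so both agree
```

A quadrature rule is used so that the sketch does not depend on finding an antiderivative; for other choices of ${\displaystyle f}$ the number of subintervals may need to be increased.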

## Initial value problem

Theorem and definition 2.4: If ${\displaystyle f\in {\mathcal {C}}^{1}(\mathbb {R} \times \mathbb {R} ^{d})}$, ${\displaystyle g\in {\mathcal {C}}^{1}(\mathbb {R} ^{d})}$ and ${\displaystyle \mathbf {v} \in \mathbb {R} ^{d}}$, then the function

${\displaystyle u:\mathbb {R} \times \mathbb {R} ^{d}\to \mathbb {R} ,u(t,x):=g(x+\mathbf {v} t)+\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$

is the unique solution of the initial value problem of the transport equation

${\displaystyle {\begin{cases}\forall (t,x)\in \mathbb {R} \times \mathbb {R} ^{d}:&\partial _{t}u(t,x)-\mathbf {v} \cdot \nabla _{x}u(t,x)=f(t,x)\\\forall x\in \mathbb {R} ^{d}:&u(0,x)=g(x)\end{cases}}}$

Proof:

Clearly, ${\displaystyle u(0,x)=g(x+\mathbf {v} \cdot 0)+\int _{0}^{0}f(s,x+\mathbf {v} (0-s))ds=g(x)}$. Therefore, and due to theorem 2.3, ${\displaystyle u}$ is a solution to the initial value problem of the transport equation. So we proceed to show uniqueness.

Assume that ${\displaystyle v}$ is an arbitrary other solution. We show that ${\displaystyle v=u}$, thereby excluding the possibility of a different solution.

We define ${\displaystyle w:=u-v}$. Then

${\displaystyle {\begin{array}{llll}\forall (t,x)\in \mathbb {R} \times \mathbb {R} ^{d}:&\partial _{t}w(t,x)-\mathbf {v} \cdot \nabla _{x}w(t,x)&=(\partial _{t}u(t,x)-\mathbf {v} \cdot \nabla _{x}u(t,x))-(\partial _{t}v(t,x)-\mathbf {v} \cdot \nabla _{x}v(t,x))&\\&&=f(t,x)-f(t,x)=0&~~~~~(*)\\\forall x\in \mathbb {R} ^{d}:&w(0,x)=u(0,x)-v(0,x)&=g(x)-g(x)=0&~~~~~(**)\end{array}}}$

Analogously to the proof of uniqueness of solutions for the one-dimensional homogeneous initial value problem of the transport equation in the first chapter, we define for arbitrary ${\displaystyle (t,x)\in \mathbb {R} \times \mathbb {R} ^{d}}$,

${\displaystyle \mu _{(t,x)}(\xi ):=w(t-\xi ,x+\mathbf {v} \xi )}$

Using the multi-dimensional chain rule, we calculate ${\displaystyle \mu _{(t,x)}'(\xi )}$:

${\displaystyle {\begin{aligned}\mu _{(t,x)}'(\xi )&:={\frac {d}{d\xi }}w(t-\xi ,x+\mathbf {v} \xi )&{\text{ by defs. of the }}'{\text{ symbol and }}\mu \\&={\begin{pmatrix}\partial _{t}w(t-\xi ,x+\mathbf {v} \xi )&\partial _{x_{1}}w(t-\xi ,x+\mathbf {v} \xi )&\cdots &\partial _{x_{d}}w(t-\xi ,x+\mathbf {v} \xi )\end{pmatrix}}{\begin{pmatrix}-1\\\mathbf {v} \end{pmatrix}}&{\text{chain rule}}\\&=-\partial _{t}w(t-\xi ,x+\mathbf {v} \xi )+\mathbf {v} \cdot \nabla _{x}w(t-\xi ,x+\mathbf {v} \xi )&\\&=0&(*)\end{aligned}}}$

Therefore, for all ${\displaystyle (t,x)\in \mathbb {R} \times \mathbb {R} ^{d}}$, the function ${\displaystyle \mu _{(t,x)}}$ is constant, and thus

${\displaystyle \forall (t,x)\in \mathbb {R} \times \mathbb {R} ^{d}:w(t,x)=\mu _{(t,x)}(0)=\mu _{(t,x)}(t)=w(0,x+\mathbf {v} t){\overset {(**)}{=}}0}$

, which shows that ${\displaystyle w=u-v=0}$ and thus ${\displaystyle u=v}$.${\displaystyle \Box }$
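As a small worked illustration of theorem 2.4 (the data are chosen only for this example), let ${\displaystyle d=1}$, ${\displaystyle \mathbf {v} =1}$, ${\displaystyle g(x)=x}$ and ${\displaystyle f(t,x)=x}$. The solution formula gives

${\displaystyle u(t,x)=(x+t)+\int _{0}^{t}(x+t-s)ds=x+t+xt+{\frac {t^{2}}{2}}}$

A direct calculation confirms the result: ${\displaystyle \partial _{t}u(t,x)=1+x+t}$ and ${\displaystyle \partial _{x}u(t,x)=1+t}$, so that ${\displaystyle \partial _{t}u(t,x)-\partial _{x}u(t,x)=x=f(t,x)}$, and ${\displaystyle u(0,x)=x=g(x)}$.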

## Exercises

1. Let ${\displaystyle f\in {\mathcal {C}}^{1}(\mathbb {R} \times \mathbb {R} ^{d})}$ and ${\displaystyle \mathbf {v} \in \mathbb {R} ^{d}}$. Using Leibniz' integral rule, show that for all ${\displaystyle n\in \{1,\ldots ,d\}}$ the derivative

${\displaystyle \partial _{x_{n}}\int _{0}^{t}f(s,x+\mathbf {v} (t-s))ds}$

is equal to

${\displaystyle \int _{0}^{t}\partial _{x_{n}}f(s,x+\mathbf {v} (t-s))ds}$

and therefore exists.

2. Let ${\displaystyle g\in {\mathcal {C}}^{1}(\mathbb {R} ^{d})}$ and ${\displaystyle \mathbf {v} \in \mathbb {R} ^{d}}$. Calculate ${\displaystyle \partial _{t}g(x+\mathbf {v} t)}$.
3. Find the unique solution to the initial value problem

${\displaystyle {\begin{cases}\forall (t,x)\in \mathbb {R} \times \mathbb {R} ^{3}:&\partial _{t}u(t,x)-{\begin{pmatrix}2\\3\\4\end{pmatrix}}\cdot \nabla _{x}u(t,x)=t^{5}+x_{1}^{6}+x_{2}^{7}+x_{3}^{8}\\\forall x\in \mathbb {R} ^{3}:&u(0,x)=x_{1}^{9}+x_{2}^{10}+x_{3}^{11}\end{cases}}}$.


# Test functions


## Motivation

Before we dive deeply into the chapter, let's first motivate the notion of a test function. Let's consider two functions which are piecewise constant on the intervals ${\displaystyle [0,1),[1,2),[2,3),[3,4),[4,5)}$ and zero elsewhere; like, for example, these two:

Let's call the left function ${\displaystyle f_{1}}$, and the right function ${\displaystyle f_{2}}$.

Of course we can easily see that the two functions are different; they differ on the interval ${\displaystyle [4,5)}$; however, let's pretend that we are blind and our only way of finding out something about either function is evaluating the integrals

${\displaystyle \int _{\mathbb {R} }\varphi (x)f_{1}(x)dx}$ and ${\displaystyle \int _{\mathbb {R} }\varphi (x)f_{2}(x)dx}$

for functions ${\displaystyle \varphi }$ in a given set of functions ${\displaystyle {\mathcal {X}}}$.

We proceed by choosing ${\displaystyle {\mathcal {X}}}$ cleverly enough that five evaluations of both integrals suffice to show that ${\displaystyle f_{1}\neq f_{2}}$. To do so, we first introduce the characteristic function. Let ${\displaystyle A\subseteq \mathbb {R} }$ be any set. The characteristic function of ${\displaystyle A}$ is defined as

${\displaystyle \chi _{A}(x):={\begin{cases}1&x\in A\\0&x\notin A\end{cases}}}$

With this definition, we choose the set of functions ${\displaystyle {\mathcal {X}}}$ as

${\displaystyle {\mathcal {X}}:=\{\chi _{[0,1)},\chi _{[1,2)},\chi _{[2,3)},\chi _{[3,4)},\chi _{[4,5)}\}}$

It is easy to see (see exercise 1), that for ${\displaystyle n\in \{1,2,3,4,5\}}$, the expression

${\displaystyle \int _{\mathbb {R} }\chi _{[n-1,n)}(x)f_{1}(x)dx}$

equals the value of ${\displaystyle f_{1}}$ on the interval ${\displaystyle [n-1,n)}$, and the same is true for ${\displaystyle f_{2}}$. But as both functions are uniquely determined by their values on the intervals ${\displaystyle [n-1,n),n\in \{1,2,3,4,5\}}$ (since they are zero everywhere else), we can implement the following equality test:

${\displaystyle f_{1}=f_{2}\Leftrightarrow \forall \varphi \in {\mathcal {X}}:\int _{\mathbb {R} }\varphi (x)f_{1}(x)dx=\int _{\mathbb {R} }\varphi (x)f_{2}(x)dx}$

This obviously needs five evaluations of each integral, as ${\displaystyle \#{\mathcal {X}}=5}$.

Since we used the functions in ${\displaystyle {\mathcal {X}}}$ to test ${\displaystyle f_{1}}$ and ${\displaystyle f_{2}}$, we call them test functions. What we ask ourselves now is whether this notion generalises from functions like ${\displaystyle f_{1}}$ and ${\displaystyle f_{2}}$, which are piecewise constant on certain intervals and zero everywhere else, to continuous functions. The remainder of this chapter shows that this is true.

## Bump functions

In order to state the definition of a bump function concisely, we need the following two definitions:

Definition 3.1:

Let ${\displaystyle B\subseteq \mathbb {R} ^{d}}$, and let ${\displaystyle f:B\to \mathbb {R} }$. We say that ${\displaystyle f}$ is smooth if all the partial derivatives

${\displaystyle \partial _{\alpha }f,\alpha \in \mathbb {N} _{0}^{d}}$

exist in all points of ${\displaystyle B}$ and are continuous. We write ${\displaystyle f\in {\mathcal {C}}^{\infty }(B)}$.

Definition 3.2:

Let ${\displaystyle f:\mathbb {R} ^{d}\to \mathbb {R} }$. We define the support of ${\displaystyle f}$, ${\displaystyle {\text{supp }}f}$, as follows:

${\displaystyle {\text{supp }}f:={\overline {\{x\in \mathbb {R} ^{d}|f(x)\neq 0\}}}}$

Now we are ready to define a bump function in a brief way:

Definition 3.3:

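${\displaystyle \varphi :\mathbb {R} ^{d}\to \mathbb {R} }$ is called a bump function iff ${\displaystyle \varphi \in {\mathcal {C}}^{\infty }(\mathbb {R} ^{d})}$ and ${\displaystyle {\text{supp }}\varphi }$ is compact. The set of all bump functions is denoted by ${\displaystyle {\mathcal {D}}(\mathbb {R} ^{d})}$; for an open set ${\displaystyle O\subseteq \mathbb {R} ^{d}}$, we write ${\displaystyle {\mathcal {D}}(O)}$ for the set of bump functions whose support is contained in ${\displaystyle O}$.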

These two properties make the function really look like a bump, as the following example shows:

Example 3.4: The standard mollifier ${\displaystyle \eta }$, given by

${\displaystyle \eta :\mathbb {R} ^{d}\to \mathbb {R} ,\eta (x)={\frac {1}{c}}{\begin{cases}e^{-{\frac {1}{1-\|x\|^{2}}}}&{\text{ if }}\|x\|<1\\0&{\text{ if }}\|x\|\geq 1\end{cases}}}$

, where ${\displaystyle \|\cdot \|}$ denotes the Euclidean norm and ${\displaystyle c:=\int _{B_{1}(0)}e^{-{\frac {1}{1-\|x\|^{2}}}}dx}$, is a bump function (see exercise 2).
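For ${\displaystyle d=1}$ the standard mollifier can be evaluated with a few lines of Python. In this sketch the normalising constant ${\displaystyle c}$ is approximated by a midpoint rule; the helper names `bump_profile` and `midpoint` and the number of steps are assumptions of the illustration.

```python
import math

def bump_profile(x):
    """Unnormalised profile exp(-1 / (1 - x^2)) for |x| < 1, and zero otherwise."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def midpoint(func, a, b, steps=20000):
    """Midpoint rule on [a, b]."""
    h = (b - a) / steps
    return sum(func(a + (k + 0.5) * h) for k in range(steps)) * h

c = midpoint(bump_profile, -1.0, 1.0)     # approximation of the constant c

def eta(x):
    """Standard mollifier in dimension d = 1."""
    return bump_profile(x) / c

print(c)
print(eta(0.0), eta(0.999), eta(1.0), eta(2.0))
print(midpoint(eta, -2.0, 2.0))           # approximately 1
```

The output shows the bump shape: the function is positive inside ${\displaystyle (-1,1)}$, decays extremely fast towards the boundary, vanishes outside, and integrates to approximately one.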

## Schwartz functions

As with bump functions, in order to state the definition of Schwartz functions concisely, we first need two helpful definitions.

Definition 3.5:

Let ${\displaystyle X}$ be an arbitrary set, and let ${\displaystyle f:X\to \mathbb {R} }$ be a function. Then we define the supremum norm of ${\displaystyle f}$ as follows:

${\displaystyle \|f\|_{\infty }:=\sup \limits _{x\in X}|f(x)|}$

Definition 3.6:

For a vector ${\displaystyle x=(x_{1},\ldots ,x_{d})\in \mathbb {R} ^{d}}$ and a ${\displaystyle d}$-dimensional multiindex ${\displaystyle \alpha \in \mathbb {N} _{0}^{d}}$ we define ${\displaystyle x^{\alpha }}$, ${\displaystyle x}$ to the power of ${\displaystyle \alpha }$, as follows:

${\displaystyle x^{\alpha }:=x_{1}^{\alpha _{1}}\cdots x_{d}^{\alpha _{d}}}$
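For example, for ${\displaystyle d=2}$, ${\displaystyle x=(2,3)}$ and ${\displaystyle \alpha =(1,2)}$ we obtain

${\displaystyle x^{\alpha }=2^{1}\cdot 3^{2}=18}$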

Now we are ready to define a Schwartz function.

Definition 3.7:

We call ${\displaystyle \phi :\mathbb {R} ^{d}\to \mathbb {R} }$ a Schwartz function iff the following two conditions are satisfied:

1. ${\displaystyle \phi \in {\mathcal {C}}^{\infty }(\mathbb {R} ^{d})}$
2. ${\displaystyle \forall \alpha ,\beta \in \mathbb {N} _{0}^{d}:\|x^{\alpha }\partial _{\beta }\phi \|_{\infty }<\infty }$

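By ${\displaystyle x^{\alpha }\partial _{\beta }\phi }$ we mean the function ${\displaystyle x\mapsto x^{\alpha }\partial _{\beta }\phi (x)}$. The set of all Schwartz functions is denoted by ${\displaystyle {\mathcal {S}}(\mathbb {R} ^{d})}$.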

Example 3.8: The function

${\displaystyle f:\mathbb {R} ^{2}\to \mathbb {R} ,f(x,y)=e^{-x^{2}-y^{2}}}$

is a Schwartz function.

Theorem 3.9:

Every bump function is also a Schwartz function.

This means for example that the standard mollifier is a Schwartz function.

Proof:

Let ${\displaystyle \varphi }$ be a bump function. Then, by definition of a bump function, ${\displaystyle \varphi \in {\mathcal {C}}^{\infty }(\mathbb {R} ^{d})}$. Since ${\displaystyle {\text{supp }}\varphi }$ is compact, we may choose ${\displaystyle R>0}$ such that

${\displaystyle {\text{supp }}\varphi \subseteq {\overline {B_{R}(0)}}}$

, because in ${\displaystyle \mathbb {R} ^{d}}$ a set is compact iff it is closed and bounded. Further, for ${\displaystyle \alpha ,\beta \in \mathbb {N} _{0}^{d}}$ arbitrary,

${\displaystyle {\begin{aligned}\|x^{\alpha }\partial _{\beta }\varphi (x)\|_{\infty }&:=\sup _{x\in \mathbb {R} ^{d}}|x^{\alpha }\partial _{\beta }\varphi (x)|&\\&=\sup _{x\in {\overline {B_{R}(0)}}}|x^{\alpha }\partial _{\beta }\varphi (x)|&{\text{supp }}\varphi \subseteq {\overline {B_{R}(0)}}\\&=\sup _{x\in {\overline {B_{R}(0)}}}\left(|x^{\alpha }||\partial _{\beta }\varphi (x)|\right)&{\text{rules for absolute value}}\\&\leq \sup _{x\in {\overline {B_{R}(0)}}}\left(R^{|\alpha |}|\partial _{\beta }\varphi (x)|\right)&\forall i\in \{1,\ldots ,d\},(x_{1},\ldots ,x_{d})\in {\overline {B_{R}(0)}}:|x_{i}|\leq R\\&<\infty &{\text{Extreme value theorem}}\end{aligned}}}$

${\displaystyle \Box }$

## Convergence of bump and Schwartz functions

Now we define what convergence of a sequence of bump (Schwartz) functions to a bump (Schwartz) function means.

Definition 3.10:

A sequence of bump functions ${\displaystyle (\varphi _{i})_{i\in \mathbb {N} }}$ is said to converge to another bump function ${\displaystyle \varphi }$ iff the following two conditions are satisfied:

1. There is a compact set ${\displaystyle K\subset \mathbb {R} ^{d}}$ such that ${\displaystyle \forall i\in \mathbb {N} :{\text{supp }}\varphi _{i}\subseteq K}$
2. ${\displaystyle \forall \alpha \in \mathbb {N} _{0}^{d}:\lim _{i\rightarrow \infty }\|\partial _{\alpha }\varphi _{i}-\partial _{\alpha }\varphi \|_{\infty }=0}$

Definition 3.11:

We say that the sequence of Schwartz functions ${\displaystyle (\phi _{i})_{i\in \mathbb {N} }}$ converges to ${\displaystyle \phi }$ iff the following condition is satisfied:

${\displaystyle \forall \alpha ,\beta \in \mathbb {N} _{0}^{d}:\|x^{\alpha }\partial _{\beta }\phi _{i}-x^{\alpha }\partial _{\beta }\phi \|_{\infty }\to 0,i\to \infty }$

Theorem 3.12:

Let ${\displaystyle (\varphi _{i})_{i\in \mathbb {N} }}$ be an arbitrary sequence of bump functions. If ${\displaystyle \varphi _{i}\to \varphi }$ with respect to the notion of convergence for bump functions, then also ${\displaystyle \varphi _{i}\to \varphi }$ with respect to the notion of convergence for Schwartz functions.

Proof:

Let ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ be open, and let ${\displaystyle (\varphi _{l})_{l\in \mathbb {N} }}$ be a sequence in ${\displaystyle {\mathcal {D}}(O)}$ such that ${\displaystyle \varphi _{l}\to \varphi \in {\mathcal {D}}(O)}$ with respect to the notion of convergence of ${\displaystyle {\mathcal {D}}(O)}$. Let ${\displaystyle K\subset \mathbb {R} ^{d}}$ be the compact set in which all the ${\displaystyle {\text{supp }}\varphi _{l}}$ are contained. It follows that ${\displaystyle {\text{supp }}\varphi \subseteq K}$ as well: otherwise ${\displaystyle \varphi }$ would take a nonzero value ${\displaystyle c\in \mathbb {R} }$ at some point outside ${\displaystyle K}$, where every ${\displaystyle \varphi _{l}}$ vanishes, so that ${\displaystyle \|\varphi _{l}-\varphi \|_{\infty }\geq |c|}$ for all ${\displaystyle l}$; this would contradict ${\displaystyle \varphi _{l}\to \varphi }$ with respect to our notion of convergence.

In ${\displaystyle \mathbb {R} ^{d}}$, ‘compact’ is equivalent to ‘bounded and closed’. Therefore, ${\displaystyle K\subset B_{R}(0)}$ for some ${\displaystyle R>0}$, and we have for all multiindices ${\displaystyle \alpha ,\beta \in \mathbb {N} _{0}^{d}}$:

${\displaystyle {\begin{aligned}\|x^{\alpha }\partial _{\beta }\varphi _{l}-x^{\alpha }\partial _{\beta }\varphi \|_{\infty }&=\sup _{x\in \mathbb {R} ^{d}}\left|x^{\alpha }\partial _{\beta }\varphi _{l}(x)-x^{\alpha }\partial _{\beta }\varphi (x)\right|&{\text{ definition of the supremum norm}}\\&=\sup _{x\in B_{R}(0)}\left|x^{\alpha }\partial _{\beta }\varphi _{l}(x)-x^{\alpha }\partial _{\beta }\varphi (x)\right|&{\text{ as }}{\text{supp }}\varphi _{l},{\text{supp }}\varphi \subseteq K\subset B_{R}(0)\\&\leq R^{|\alpha |}\sup _{x\in B_{R}(0)}\left|\partial _{\beta }\varphi _{l}(x)-\partial _{\beta }\varphi (x)\right|&\forall i\in \{1,\ldots ,d\},(x_{1},\ldots ,x_{d})\in {\overline {B_{R}(0)}}:|x_{i}|\leq R\\&=R^{|\alpha |}\sup _{x\in \mathbb {R} ^{d}}\left|\partial _{\beta }\varphi _{l}(x)-\partial _{\beta }\varphi (x)\right|&{\text{ as }}{\text{supp }}\varphi _{l},{\text{supp }}\varphi \subseteq K\subset B_{R}(0)\\&=R^{|\alpha |}\left\|\partial _{\beta }\varphi _{l}-\partial _{\beta }\varphi \right\|_{\infty }&{\text{ definition of the supremum norm}}\\&\to 0,l\to \infty &{\text{ since }}\varphi _{l}\to \varphi {\text{ in }}{\mathcal {D}}(O)\end{aligned}}}$

Therefore the sequence converges with respect to the notion of convergence for Schwartz functions.${\displaystyle \Box }$

## The ‘testing’ property of test functions

In this section, we want to show that we can test equality of continuous functions ${\displaystyle f,g}$ by evaluating the integrals

${\displaystyle \int _{\mathbb {R} ^{d}}f(x)\varphi (x)dx}$ and ${\displaystyle \int _{\mathbb {R} ^{d}}g(x)\varphi (x)dx}$

for all ${\displaystyle \varphi \in {\mathcal {D}}(\mathbb {R} ^{d})}$ (evaluating the integrals for all ${\displaystyle \varphi \in {\mathcal {S}}(\mathbb {R} ^{d})}$ will then also suffice, as ${\displaystyle {\mathcal {D}}(\mathbb {R} ^{d})\subset {\mathcal {S}}(\mathbb {R} ^{d})}$ due to theorem 3.9).

But before we are able to show that, we need a modified mollifier, where the modification depends on a parameter, and two lemmas about that modified mollifier.

Definition 3.13:

For ${\displaystyle R\in \mathbb {R} _{>0}}$, we define

${\displaystyle \eta _{R}:\mathbb {R} ^{d}\to \mathbb {R} ,\eta _{R}(x)=\eta \left({\frac {x}{R}}\right){\big /}R^{d}}$.

Lemma 3.14:

Let ${\displaystyle R\in \mathbb {R} _{>0}}$. Then

${\displaystyle {\text{supp }}\eta _{R}={\overline {B_{R}(0)}}}$.

Proof:

From the definition of ${\displaystyle \eta }$ follows

${\displaystyle {\text{supp }}\eta ={\overline {B_{1}(0)}}}$.

Further, for ${\displaystyle R\in \mathbb {R} _{>0}}$

${\displaystyle {\begin{aligned}{\frac {x}{R}}\in {\overline {B_{1}(0)}}&\Leftrightarrow \left\|{\frac {x}{R}}\right\|\leq 1\\&\Leftrightarrow \|x\|\leq R\\&\Leftrightarrow x\in {\overline {B_{R}(0)}}\end{aligned}}}$

Therefore, and since

${\displaystyle x\in {\text{supp }}\eta _{R}\Leftrightarrow {\frac {x}{R}}\in {\text{supp }}\eta }$

, we have:

${\displaystyle x\in {\text{supp }}\eta _{R}\Leftrightarrow x\in {\overline {B_{R}(0)}}}$${\displaystyle \Box }$

In order to prove the next lemma, we need the following theorem from integration theory:

Theorem 3.15: (Multi-dimensional integration by substitution)

If ${\displaystyle O,U\subseteq \mathbb {R} ^{d}}$ are open, ${\displaystyle \psi :U\to O}$ is a diffeomorphism and ${\displaystyle f:O\to \mathbb {R} }$ is integrable, then

${\displaystyle \int _{O}f(x)dx=\int _{U}f(\psi (x))|\det J_{\psi }(x)|dx}$

We will omit the proof, as understanding it is not very important for understanding this wikibook.

Lemma 3.16:

Let ${\displaystyle R\in \mathbb {R} _{>0}}$. Then

${\displaystyle \int _{\mathbb {R} ^{d}}\eta _{R}(x)dx=1}$.

Proof:

${\displaystyle {\begin{aligned}\int _{\mathbb {R} ^{d}}\eta _{R}(x)dx&=\int _{\mathbb {R} ^{d}}\eta \left({\frac {x}{R}}\right){\big /}R^{d}dx&{\text{Def. of }}\eta _{R}\\&=\int _{\mathbb {R} ^{d}}\eta (x)dx&{\text{integration by substitution using }}x\mapsto Rx\\&=\int _{B_{1}(0)}\eta (x)dx&{\text{Def. of }}\eta \\&={\frac {\int _{B_{1}(0)}e^{-{\frac {1}{1-\|x\|^{2}}}}dx}{\int _{B_{1}(0)}e^{-{\frac {1}{1-\|x\|^{2}}}}dx}}&{\text{Def. of }}\eta \\&=1\end{aligned}}}$ ${\displaystyle \Box }$

Now we are ready to prove the ‘testing’ property of test functions:

Theorem 3.17:

Let ${\displaystyle f,g:\mathbb {R} ^{d}\to \mathbb {R} }$ be continuous. If

${\displaystyle \forall \varphi \in {\mathcal {D}}(\mathbb {R} ^{d}):\int _{\mathbb {R} ^{d}}\varphi (x)f(x)dx=\int _{\mathbb {R} ^{d}}\varphi (x)g(x)dx}$,

then ${\displaystyle f=g}$.

Proof:

Let ${\displaystyle x\in \mathbb {R} ^{d}}$ be arbitrary, and let ${\displaystyle \epsilon \in \mathbb {R} _{>0}}$. Since ${\displaystyle f}$ is continuous, there exists a ${\displaystyle \delta \in \mathbb {R} _{>0}}$ such that

${\displaystyle \forall y\in {\overline {B_{\delta }(x)}}:|f(x)-f(y)|<\epsilon }$

Then we have

${\displaystyle {\begin{aligned}\left|f(x)-\int _{\mathbb {R} ^{d}}f(y)\eta _{\delta }(x-y)dy\right|&=\left|\int _{\mathbb {R} ^{d}}(f(x)-f(y))\eta _{\delta }(x-y)dy\right|&{\text{lemma 3.16}}\\&\leq \int _{\mathbb {R} ^{d}}|f(x)-f(y)|\eta _{\delta }(x-y)dy&{\text{triangle ineq. for the }}\int {\text{ and }}\eta _{\delta }\geq 0\\&=\int _{\overline {B_{\delta }(x)}}|f(x)-f(y)|\eta _{\delta }(x-y)dy&{\text{lemma 3.14}}\\&\leq \int _{\overline {B_{\delta }(x)}}\epsilon \eta _{\delta }(x-y)dy&{\text{monotonicity of the }}\int \\&\leq \epsilon &{\text{lemma 3.16 and }}\eta _{\delta }\geq 0\end{aligned}}}$

Therefore, ${\displaystyle \int _{\mathbb {R} ^{d}}f(y)\eta _{\delta }(x-y)dy\to f(x),\delta \to 0}$. An analogous reasoning also shows that ${\displaystyle \int _{\mathbb {R} ^{d}}g(y)\eta _{\delta }(x-y)dy\to g(x),\delta \to 0}$. But due to the assumption, we have

${\displaystyle \forall \delta \in \mathbb {R} _{>0}:\int _{\mathbb {R} ^{d}}g(y)\eta _{\delta }(x-y)dy=\int _{\mathbb {R} ^{d}}f(y)\eta _{\delta }(x-y)dy}$

As limits in the reals are unique, it follows that ${\displaystyle f(x)=g(x)}$, and since ${\displaystyle x\in \mathbb {R} ^{d}}$ was arbitrary, we obtain ${\displaystyle f=g}$.${\displaystyle \Box }$
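The convergence ${\displaystyle \int _{\mathbb {R} ^{d}}f(y)\eta _{\delta }(x-y)dy\to f(x)}$ used in this proof can be observed numerically. The following is a minimal sketch for ${\displaystyle d=1}$ and ${\displaystyle f=\cos }$, assuming the bump profile ${\displaystyle e^{-1/(1-x^{2})}}$ for the mollifier and a midpoint rule for the integrals; the helper names and the sample point `x0` are hypothetical.

```python
import math

def bump(x: float) -> float:
    """Unnormalised 1-d bump profile supported on (-1, 1) (assumed shape)."""
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1.0 else 0.0

def integrate(f, a: float, b: float, n: int = 4000) -> float:
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(f(a + (k + 0.5) * h) for k in range(n))

C = integrate(bump, -1.0, 1.0)  # normalising constant

def eta_delta(x: float, delta: float) -> float:
    # Scaled mollifier for d = 1.
    return bump(x / delta) / (C * delta)

def mollified_value(f, x: float, delta: float) -> float:
    """Approximate the integral of f(y) * eta_delta(x - y) over y.

    eta_delta(x - y) vanishes unless |x - y| < delta, so it suffices
    to integrate over [x - delta, x + delta].
    """
    return integrate(lambda y: f(y) * eta_delta(x - y, delta),
                     x - delta, x + delta)

f = math.cos
x0 = 0.3
# Distance to f(x0) for a decreasing sequence of radii delta.
errors = [abs(mollified_value(f, x0, d) - f(x0)) for d in (0.5, 0.1, 0.02)]
```

In this sketch the entries of `errors` shrink as ${\displaystyle \delta }$ decreases, in line with the estimate above.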

Remark 3.18: Let ${\displaystyle f,g:\mathbb {R} ^{d}\to \mathbb {R} }$ be continuous. If

${\displaystyle \forall \varphi \in {\mathcal {S}}(\mathbb {R} ^{d}):\int _{\mathbb {R} ^{d}}\varphi (x)f(x)dx=\int _{\mathbb {R} ^{d}}\varphi (x)g(x)dx}$,

then ${\displaystyle f=g}$.

Proof:

This follows from all bump functions being Schwartz functions, which is why the requirements for theorem 3.17 are met.${\displaystyle \Box }$

## Exercises

1. Let ${\displaystyle b\in \mathbb {R} }$ and ${\displaystyle f:\mathbb {R} \to \mathbb {R} }$ be constant on the interval ${\displaystyle [b-1,b)}$. Show that

${\displaystyle \forall y\in [b-1,b):\int _{\mathbb {R} }\chi _{[b-1,b)}(x)f(x)dx=f(y)}$
2. Prove that the standard mollifier as defined in example 3.4 is a bump function by proceeding as follows:

   1. Prove that the function

      ${\displaystyle x\mapsto {\begin{cases}e^{-{\frac {1}{x}}}&x>0\\0&x\leq 0\end{cases}}}$

      is contained in ${\displaystyle {\mathcal {C}}^{\infty }(\mathbb {R} )}$.

   2. Prove that the function

      ${\displaystyle x\mapsto 1-\|x\|}$

      is contained in ${\displaystyle {\mathcal {C}}^{\infty }(\mathbb {R} ^{d})}$.

   3. Conclude that ${\displaystyle \eta \in {\mathcal {C}}^{\infty }(\mathbb {R} ^{d})}$.
   4. Prove that ${\displaystyle {\text{supp }}\eta }$ is compact by calculating ${\displaystyle {\text{supp }}\eta }$ explicitly.
3. Let ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ be open, let ${\displaystyle \varphi \in {\mathcal {D}}(O)}$ and let ${\displaystyle \phi \in {\mathcal {S}}(\mathbb {R} ^{d})}$. Prove that if ${\displaystyle \alpha ,\beta \in \mathbb {N} _{0}^{d}}$, then ${\displaystyle \partial _{\alpha }\varphi \in {\mathcal {D}}(O)}$ and ${\displaystyle x^{\alpha }\partial _{\beta }\phi \in {\mathcal {S}}(\mathbb {R} ^{d})}$.
4. Let ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ be open, let ${\displaystyle \varphi _{1},\ldots ,\varphi _{n}\in {\mathcal {D}}(O)}$ be bump functions and let ${\displaystyle c_{1},\ldots ,c_{n}\in \mathbb {R} }$. Prove that ${\displaystyle \sum _{j=1}^{n}c_{j}\varphi _{j}\in {\mathcal {D}}(O)}$.
5. Let ${\displaystyle \phi _{1},\ldots ,\phi _{n}}$ be Schwartz functions and let ${\displaystyle c_{1},\ldots ,c_{n}\in \mathbb {R} }$. Prove that ${\displaystyle \sum _{j=1}^{n}c_{j}\phi _{j}}$ is a Schwartz function.
6. Let ${\displaystyle \alpha \in \mathbb {N} _{0}^{d}}$, let ${\displaystyle p(x):=\sum _{\varsigma \leq \alpha }c_{\varsigma }x^{\varsigma }}$ be a polynomial, and let ${\displaystyle \phi _{l}\to \phi }$ in the sense of Schwartz functions. Prove that ${\displaystyle p\phi _{l}\to p\phi }$ in the sense of Schwartz functions.

# Distributions

## Distributions and tempered distributions

Definition 4.1:

Let ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ be open, and let ${\displaystyle {\mathcal {T}}:{\mathcal {D}}(O)\to \mathbb {R} }$ be a function. We call ${\displaystyle {\mathcal {T}}}$ a distribution iff

• ${\displaystyle {\mathcal {T}}}$ is linear (${\displaystyle \forall \varphi ,\vartheta \in {\mathcal {D}}(O),b,c\in \mathbb {R} :{\mathcal {T}}(b\varphi +c\vartheta )=b{\mathcal {T}}(\varphi )+c{\mathcal {T}}(\vartheta )}$)
• ${\displaystyle {\mathcal {T}}}$ is sequentially continuous (if ${\displaystyle \varphi _{l}\to \varphi }$ in the notion of convergence of bump functions, then ${\displaystyle {\mathcal {T}}(\varphi _{l})\to {\mathcal {T}}(\varphi )}$ in the reals)

We denote the set of all distributions on ${\displaystyle {\mathcal {D}}(O)}$ by ${\displaystyle {\mathcal {D}}(O)^{*}}$.

Definition 4.2:

Let ${\displaystyle {\mathcal {T}}:{\mathcal {S}}(\mathbb {R} ^{d})\to \mathbb {R} }$ be a function. We call ${\displaystyle {\mathcal {T}}}$ a tempered distribution iff

• ${\displaystyle {\mathcal {T}}}$ is linear (${\displaystyle \forall \varphi ,\vartheta \in {\mathcal {S}}(\mathbb {R} ^{d}),b,c\in \mathbb {R} :{\mathcal {T}}(b\varphi +c\vartheta )=b{\mathcal {T}}(\varphi )+c{\mathcal {T}}(\vartheta )}$)
• ${\displaystyle {\mathcal {T}}}$ is sequentially continuous (if ${\displaystyle \varphi _{l}\to \varphi }$ in the notion of convergence of Schwartz functions, then ${\displaystyle {\mathcal {T}}(\varphi _{l})\to {\mathcal {T}}(\varphi )}$ in the reals)

We denote the set of all tempered distributions by ${\displaystyle {\mathcal {S}}(\mathbb {R} ^{d})^{*}}$.

Theorem 4.3:

Let ${\displaystyle {\mathcal {T}}}$ be a tempered distribution. Then the restriction of ${\displaystyle {\mathcal {T}}}$ to bump functions is a distribution.

Proof:

Let ${\displaystyle {\mathcal {T}}}$ be a tempered distribution, and let ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ be open.

1.

We show that ${\displaystyle {\mathcal {T}}(\varphi )}$ has a well-defined value for ${\displaystyle \varphi \in {\mathcal {D}}(O)}$.

Due to theorem 3.9, every bump function is a Schwartz function, which is why the expression

${\displaystyle {\mathcal {T}}(\varphi )}$

makes sense for every ${\displaystyle \varphi \in {\mathcal {D}}(O)}$.

2.

We show that the restriction is linear.

Let ${\displaystyle a,b\in \mathbb {R} }$ and ${\displaystyle \varphi ,\vartheta \in {\mathcal {D}}(O)}$. Since due to theorem 3.9 ${\displaystyle \varphi }$ and ${\displaystyle \vartheta }$ are Schwartz functions as well, we have

${\displaystyle \forall a,b\in \mathbb {R} ,\varphi ,\vartheta \in {\mathcal {D}}(O):{\mathcal {T}}(a\varphi +b\vartheta )=a{\mathcal {T}}(\varphi )+b{\mathcal {T}}(\vartheta )}$

due to the linearity of ${\displaystyle {\mathcal {T}}}$ for all Schwartz functions. Thus ${\displaystyle {\mathcal {T}}}$ is also linear for bump functions.

3.

We show that the restriction of ${\displaystyle {\mathcal {T}}}$ to ${\displaystyle {\mathcal {D}}(O)}$ is sequentially continuous. Let ${\displaystyle \varphi _{l}\to \varphi }$ in the notion of convergence of bump functions. Due to theorem 3.11, ${\displaystyle \varphi _{l}\to \varphi }$ in the notion of convergence of Schwartz functions. Since ${\displaystyle {\mathcal {T}}}$ as a tempered distribution is sequentially continuous, ${\displaystyle {\mathcal {T}}(\varphi _{l})\to {\mathcal {T}}(\varphi )}$.${\displaystyle \Box }$

## The convolution

Definition 4.4:

Let ${\displaystyle f,g:\mathbb {R} ^{d}\to \mathbb {R} }$. The integral

${\displaystyle f*g:\mathbb {R} ^{d}\to \mathbb {R} ,(f*g)(y):=\int _{\mathbb {R} ^{d}}f(x)g(y-x)dx}$

is called the convolution of ${\displaystyle f}$ and ${\displaystyle g}$ and denoted by ${\displaystyle f*g}$ if it exists.

The convolution of two functions may not always exist, but there are sufficient conditions for it to exist:

Theorem 4.5:

Let ${\displaystyle p,q\in [1,\infty ]}$ such that ${\displaystyle {\frac {1}{p}}+{\frac {1}{q}}=1}$ (where ${\displaystyle {\frac {1}{\infty }}=0}$) and let ${\displaystyle f\in L^{p}(\mathbb {R} ^{d})}$ and ${\displaystyle g\in L^{q}(\mathbb {R} ^{d})}$. Then for all ${\displaystyle y\in \mathbb {R} ^{d}}$, the integral

${\displaystyle \int _{\mathbb {R} ^{d}}f(x)g(y-x)dx}$

has a well-defined real value.

Proof:

Due to Hölder's inequality,

${\displaystyle \int _{\mathbb {R} ^{d}}|f(x)g(y-x)|dx\leq \left(\int _{\mathbb {R} ^{d}}|f(x)|^{p}dx\right)^{1/p}\left(\int _{\mathbb {R} ^{d}}|g(y-x)|^{q}dx\right)^{1/q}<\infty }$.${\displaystyle \Box }$

We shall now prove that the convolution is commutative, i. e. ${\displaystyle f*g=g*f}$.

Theorem 4.6:

Let ${\displaystyle p,q\in [1,\infty ]}$ such that ${\displaystyle {\frac {1}{p}}+{\frac {1}{q}}=1}$ (where ${\displaystyle {\frac {1}{\infty }}=0}$) and let ${\displaystyle f\in L^{p}(\mathbb {R} ^{d})}$ and ${\displaystyle g\in L^{q}(\mathbb {R} ^{d})}$. Then

${\displaystyle \forall y\in \mathbb {R} ^{d}:(f*g)(y)=(g*f)(y)}$

Proof:

We apply multi-dimensional integration by substitution using the diffeomorphism ${\displaystyle x\mapsto y-x}$ to obtain

${\displaystyle (f*g)(y)=\int _{\mathbb {R} ^{d}}f(x)g(y-x)dx=\int _{\mathbb {R} ^{d}}f(y-x)g(x)dx=(g*f)(y)}$.${\displaystyle \Box }$
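Commutativity can be checked numerically for concrete functions. The sketch below uses ${\displaystyle d=1}$, ${\displaystyle f(x)=e^{-x^{2}}}$ and ${\displaystyle g(x)={\frac {1}{1+x^{2}}}}$ (both in ${\displaystyle L^{2}(\mathbb {R} )}$, so ${\displaystyle p=q=2}$). These functions, the truncation of the integral to ${\displaystyle [-L,L]}$ and the midpoint rule are assumptions made for this illustration.

```python
import math

def integrate(func, a: float, b: float, n: int = 6000) -> float:
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def f(x: float) -> float:
    return math.exp(-x * x)

def g(x: float) -> float:
    return 1.0 / (1.0 + x * x)

def conv(u, v, y: float, L: float = 12.0) -> float:
    """Approximate (u*v)(y), the integral of u(x) v(y - x) dx, on [-L, L].

    The truncation is harmless here because exp(-x^2) decays very fast.
    """
    return integrate(lambda x: u(x) * v(y - x), -L, L)

# Largest observed difference between f*g and g*f at a few sample points.
max_diff = max(abs(conv(f, g, y) - conv(g, f, y)) for y in (-1.0, 0.0, 0.7, 2.0))
```

Up to quadrature error, `conv(f, g, y)` and `conv(g, f, y)` coincide, as the substitution ${\displaystyle x\mapsto y-x}$ predicts.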

Lemma 4.7:

Let ${\displaystyle f\in L^{1}(\mathbb {R} ^{d})}$ and let ${\displaystyle \delta \in \mathbb {R} _{>0}}$. Then ${\displaystyle f*\eta _{\delta }\in {\mathcal {C}}^{\infty }(\mathbb {R} ^{d})}$.

Proof:

Let ${\displaystyle \alpha \in \mathbb {N} _{0}^{d}}$ be arbitrary. Then, since for all ${\displaystyle y\in \mathbb {R} ^{d}}$

${\displaystyle \int _{\mathbb {R} ^{d}}|f(x)\partial _{\alpha }\eta _{\delta }(y-x)|dx\leq \|\partial _{\alpha }\eta _{\delta }\|_{\infty }\int _{\mathbb {R} ^{d}}|f(x)|dx}$

and further

${\displaystyle |f(x)\partial _{\alpha }\eta _{\delta }(y-x)|\leq \|\partial _{\alpha }\eta _{\delta }\|_{\infty }|f(x)|}$,

Leibniz' integral rule (theorem 2.2) is applicable, and by repeated application of Leibniz' integral rule we obtain

${\displaystyle \partial _{\alpha }(f*\eta _{\delta })=f*\partial _{\alpha }\eta _{\delta }}$.${\displaystyle \Box }$

## Regular distributions

In this section, we briefly study a class of distributions which we call regular distributions. In particular, we will see that for certain kinds of functions there exist corresponding distributions.

Definition 4.8:

Let ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ be an open set and let ${\displaystyle {\mathcal {T}}\in {\mathcal {D}}(O)^{*}}$. If for all ${\displaystyle \varphi \in {\mathcal {D}}(O)}$ ${\displaystyle {\mathcal {T}}(\varphi )}$ can be written as

${\displaystyle {\mathcal {T}}(\varphi )=\int _{O}f(x)\varphi (x)dx}$

for a function ${\displaystyle f:O\to \mathbb {R} }$ which is independent of ${\displaystyle \varphi }$, then we call ${\displaystyle {\mathcal {T}}}$ a regular distribution.

Definition 4.9:

Let ${\displaystyle {\mathcal {T}}\in {\mathcal {S}}(\mathbb {R} ^{d})^{*}}$. If for all ${\displaystyle \phi \in {\mathcal {S}}(\mathbb {R} ^{d})}$ ${\displaystyle {\mathcal {T}}(\phi )}$ can be written as

${\displaystyle {\mathcal {T}}(\phi )=\int _{\mathbb {R} ^{d}}f(x)\phi (x)dx}$

for a function ${\displaystyle f:\mathbb {R} ^{d}\to \mathbb {R} }$ which is independent of ${\displaystyle \phi }$, then we call ${\displaystyle {\mathcal {T}}}$ a regular tempered distribution.
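To make the idea of a functional given by integration against ${\displaystyle f}$ concrete, here is a minimal sketch for ${\displaystyle d=1}$ and ${\displaystyle O=(-2,2)}$. It approximates the integral with a midpoint rule and checks linearity on two bump functions. The choice ${\displaystyle f(x)=|x|}$, the bump profile and all names (`regular_distribution`, `phi`, `theta`) are assumptions for illustration.

```python
import math

def integrate(func, a: float, b: float, n: int = 4000) -> float:
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

def bump(x: float, centre: float = 0.0, radius: float = 1.0) -> float:
    """Bump function supported on (centre - radius, centre + radius)."""
    t = (x - centre) / radius
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def regular_distribution(f, left: float, right: float):
    """Return the functional phi -> integral of f * phi over (left, right)."""
    return lambda phi: integrate(lambda x: f(x) * phi(x), left, right)

T = regular_distribution(abs, -2.0, 2.0)   # f(x) = |x| on O = (-2, 2)

def phi(x: float) -> float:
    return bump(x, 0.0, 1.0)               # support in (-1, 1)

def theta(x: float) -> float:
    return bump(x, 0.5, 1.0)               # support in (-0.5, 1.5)

beta, gamma = 2.0, -3.0
lhs = T(lambda x: beta * phi(x) + gamma * theta(x))
rhs = beta * T(phi) + gamma * T(theta)
```

In this sketch `lhs` and `rhs` agree up to rounding, which reflects the linearity of the integral. Sequential continuity is not tested here.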

Two questions related to this definition could be asked: Given a function ${\displaystyle f:\mathbb {R} ^{d}\to \mathbb {R} }$, is ${\displaystyle {\mathcal {T}}_{f}:{\mathcal {D}}(O)\to \mathbb {R} }$ for ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ open given by

${\displaystyle {\mathcal {T}}_{f}(\varphi ):=\int _{O}f(x)\varphi (x)dx}$

well-defined and a distribution? Or is ${\displaystyle {\mathcal {T}}_{f}:{\mathcal {S}}(\mathbb {R} ^{d})\to \mathbb {R} }$ given by

${\displaystyle {\mathcal {T}}_{f}(\phi ):=\int _{\mathbb {R} ^{d}}f(x)\phi (x)dx}$

well-defined and a tempered distribution? In general, the answer to both questions is no, but the answer is yes if the function ${\displaystyle f}$ has suitable properties, as the following two theorems show. Before we state the first theorem, we have to define what local integrability means, because in the case of bump functions, local integrability is exactly the property which ${\displaystyle f}$ needs in order to define a corresponding regular distribution:

Definition 4.10:

Let ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ be open, ${\displaystyle f:O\to \mathbb {R} }$ be a function. We say that ${\displaystyle f}$ is locally integrable iff for all compact subsets ${\displaystyle K}$ of ${\displaystyle O}$

${\displaystyle -\infty <\int _{K}f(x)dx<\infty }$

We write ${\displaystyle f\in L_{\text{loc}}^{1}(O)}$.
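As an illustration of this definition (an example of our own choosing), take ${\displaystyle O=(0,\infty )\subset \mathbb {R} }$ and ${\displaystyle f(x)={\frac {1}{x}}}$. Every compact ${\displaystyle K\subset O}$ is contained in an interval ${\displaystyle [a,b]}$ with ${\displaystyle 0<a\leq b<\infty }$, so that

${\displaystyle \int _{K}|f(x)|dx\leq \int _{a}^{b}{\frac {1}{x}}dx=\ln(b)-\ln(a)<\infty }$

Hence ${\displaystyle f\in L_{\text{loc}}^{1}(O)}$, although ${\displaystyle \int _{O}|f(x)|dx=\infty }$, so ${\displaystyle f}$ is not integrable on all of ${\displaystyle O}$.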

Now we are ready to give some sufficient conditions on ${\displaystyle f}$ to define a corresponding regular distribution or regular tempered distribution by way of

${\displaystyle {\mathcal {T}}_{f}:{\mathcal {D}}(O)\to \mathbb {R} ,{\mathcal {T}}_{f}(\varphi ):=\int _{O}f(x)\varphi (x)dx}$

or

${\displaystyle {\mathcal {T}}_{f}:{\mathcal {S}}(\mathbb {R} ^{d})\to \mathbb {R} ,{\mathcal {T}}_{f}(\phi ):=\int _{\mathbb {R} ^{d}}f(x)\phi (x)dx}$:

Theorem 4.11:

Let ${\displaystyle O\subseteq \mathbb {R} ^{d}}$ be open, and let ${\displaystyle f:O\to \mathbb {R} }$ be a function. Then

${\displaystyle {\mathcal {T}}_{f}:{\mathcal {D}}(O)\to \mathbb {R} ,{\mathcal {T}}_{f}(\varphi ):=\int _{O}f(x)\varphi (x)dx}$

is a regular distribution iff ${\displaystyle f\in L_{\text{loc}}^{1}(O)}$.

Proof:

1.

We show that if ${\displaystyle f\in L_{\text{loc}}^{1}(O)}$, then ${\displaystyle {\mathcal {T}}_{f}:{\mathcal {D}}(O)\to \mathbb {R} }$ is a distribution.

Well-definedness follows from the triangle inequality for integrals and the monotonicity of the integral:

${\displaystyle {\begin{aligned}\left|\int _{O}\varphi (x)f(x)dx\right|\leq \int _{O}|\varphi (x)f(x)|dx=\int _{{\text{supp }}\varphi }|\varphi (x)f(x)|dx\\\leq \int _{{\text{supp }}\varphi }\|\varphi \|_{\infty }|f(x)|dx=\|\varphi \|_{\infty }\int _{{\text{supp }}\varphi }|f(x)|dx<\infty \end{aligned}}}$

In order to have an absolute value strictly less than infinity, the first integral must have a well-defined value in the first place. Therefore, ${\displaystyle {\mathcal {T}}_{f}}$ really maps to ${\displaystyle \mathbb {R} }$ and well-definedness is proven.

Continuity follows similarly due to

${\displaystyle |{\mathcal {T}}_{f}(\varphi _{l})-{\mathcal {T}}_{f}(\varphi )|=\left|\int _{K}(\varphi _{l}-\varphi )(x)f(x)dx\right|\leq \|\varphi _{l}-\varphi \|_{\infty }\underbrace {\int _{K}|f(x)|dx} _{{\text{independent of }}l}\to 0,l\to \infty }$,

where ${\displaystyle K}$ is a compact set in which all the supports of ${\displaystyle \varphi _{l},l\in \mathbb {N} }$ and ${\displaystyle \varphi }$ are contained. (Remember: the existence of a compact set containing all the supports of ${\displaystyle \varphi _{l},l\in \mathbb {N} }$ is part of the definition of convergence in ${\displaystyle {\mathcal {D}}(O)}$, see the last chapter. As in the proof of theorem 3.11, we conclude that the support of ${\displaystyle \varphi }$ is also contained in ${\displaystyle K}$.)

Linearity follows due to the linearity of the integral.

2.

We show that if ${\displaystyle {\mathcal {T}}_{f}}$ is a distribution, then ${\displaystyle f\in L_{\text{loc}}^{1}(O)}$. In fact, we even show that if ${\displaystyle {\mathcal {T}}_{f}(\varphi )}$ has a well-defined real value for every ${\displaystyle \varphi \in {\mathcal {D}}(O)}$, then ${\displaystyle f\in L_{\text{loc}}^{1}(O)}$. Therefore, by part 1 of this proof, which showed that ${\displaystyle f\in L_{\text{loc}}^{1}(O)}$ implies that ${\displaystyle {\mathcal {T}}_{f}}$ is a distribution in ${\displaystyle {\mathcal {D}}(O)^{*}}$, we have that if ${\displaystyle {\mathcal {T}}_{f}(\varphi )}$ is a well-defined real number for every ${\displaystyle \varphi \in {\mathcal {D}}(O)}$, then ${\displaystyle {\mathcal {T}}_{f}}$ is a distribution in ${\displaystyle {\mathcal {D}}(O)^{*}}$.

Let ${\displaystyle K\subset O}$ be an arbitrary compact set. We define

${\displaystyle \mu :K\to \mathbb {R} ,\mu (\xi ):=\inf _{x\in \mathbb {R} ^{d}\setminus O}\|\xi -x\|}$

${\displaystyle \mu }$ is continuous, even Lipschitz continuous with Lipschitz constant ${\displaystyle 1}$: Let ${\displaystyle \xi ,\iota \in K}$. By applying the triangle inequality twice, we obtain both

${\displaystyle \forall x,y\in \mathbb {R} ^{d}:\|\xi -x\|\leq \|\xi -\iota \|+\|\iota -y\|+\|y-x\|~~~~~(*)}$

and

${\displaystyle \forall x,y\in \mathbb {R} ^{d}:\|\iota -y\|\leq \|\iota -\xi \|+\|\xi -x\|+\|x-y\|~~~~~(**)}$.

We choose sequences ${\displaystyle (x_{l})_{l\in \mathbb {N} }}$ and ${\displaystyle (y_{m})_{m\in \mathbb {N} }}$ in ${\displaystyle \mathbb {R} ^{d}\setminus O}$ such that ${\displaystyle \lim _{l\to \infty }\|\xi -x_{l}\|=\mu (\xi )}$ and ${\displaystyle \lim _{m\to \infty }\|\iota -y_{m}\|=\mu (\iota )}$ and consider two cases. First, we consider what happens if ${\displaystyle \mu (\xi )\geq \mu (\iota )}$. Then we have

${\displaystyle {\begin{aligned}|\mu (\xi )-\mu (\iota )|&=\mu (\xi )-\mu (\iota )&\\&=\inf _{x\in \mathbb {R} ^{d}\setminus O}\|\xi -x\|-\inf _{y\in \mathbb {R} ^{d}\setminus O}\|\iota -y\|&\\&=\inf _{x\in \mathbb {R} ^{d}\setminus O}\|\xi -x\|-\lim _{m\to \infty }\|\iota -y_{m}\|&\\&=\lim _{m\to \infty }\inf _{x\in \mathbb {R} ^{d}\setminus O}\left(\|\xi -x\|-\|\iota -y_{m}\|\right)&\\&\leq \lim _{m\to \infty }\inf _{x\in \mathbb {R} ^{d}\setminus O}\left(\|\xi -\iota \|+\|x-y_{m}\|\right)&(*){\text{ with }}y=y_{m}\\&=\|\xi -\iota \|&\end{aligned}}}$.

Second, we consider what happens if ${\displaystyle \mu (\xi )\leq \mu (\iota )}$:

${\displaystyle {\begin{aligned}|\mu (\xi )-\mu (\iota )|&=\mu (\iota )-\mu (\xi )&\\&=\inf _{y\in \mathbb {R} ^{d}\setminus O}\|\iota -y\|-\inf _{x\in \mathbb {R} ^{d}\setminus O}\|\xi -x\|&\\&=\inf _{y\in \mathbb {R} ^{d}\setminus O}\|\iota -y\|-\lim _{l\to \infty }\|\xi -x_{l}\|&\\&=\lim _{l\to \infty }\inf _{y\in \mathbb {R} ^{d}\setminus O}\left(\|\iota -y\|-\|\xi -x_{l}\|\right)&\\&\leq \lim _{l\to \infty }\inf _{y\in \mathbb {R} ^{d}\setminus O}\left(\|\xi -\iota \|+\|y-x_{l}\|\right)&(**){\text{ with }}x=x_{l}\\&=\|\xi -\iota \|&\end{aligned}}}$

Since always either ${\displaystyle \mu (\xi )\geq \mu (\iota )}$ or ${\displaystyle \mu (\xi )\leq \mu (\iota )}$, we have proven Lipschitz continuity and thus continuity. By the extreme value theorem, ${\displaystyle \mu }$ therefore attains its minimum at some point ${\displaystyle \kappa \in K}$. If we had ${\displaystyle \mu (\kappa )=0}$, there would be a sequence ${\displaystyle (x_{l})_{l\in \mathbb {N} }}$ in ${\displaystyle \mathbb {R} ^{d}\setminus O}$ with ${\displaystyle \|\kappa -x_{l}\|\to 0,l\to \infty }$. As ${\displaystyle \mathbb {R} ^{d}\setminus O}$ is closed, this would imply ${\displaystyle \kappa \in \mathbb {R} ^{d}\setminus O}$, which contradicts ${\displaystyle \kappa \in K\subset O}$. Hence we have ${\displaystyle \mu (\kappa )>0}$.

Hence, if we define ${\displaystyle \delta :=\mu (\kappa )}$, then ${\displaystyle \delta >0}$. Further, the function

${\displaystyle \vartheta :\mathbb {R} ^{d}\to \mathbb {R} ,\vartheta (x):=(\chi _{K+B_{\delta /4}(0)}*\eta _{\delta /4})(x)=\int _{\mathbb {R} ^{d}}\eta _{\delta /4}(y)\chi _{K+B_{\delta /4}(0)}(x-y)dy=\int _{B_{\delta /4}(0)}\eta _{\delta /4}(y)\chi _{K+B_{\delta /4}(0)}(x-y)dy}$

has support contained in ${\displaystyle O}$, is equal to ${\displaystyle 1}$ within ${\displaystyle K}$ and further is contained in ${\displaystyle {\mathcal {C}}^{\infty }(\mathbb {R} ^{d})}$ due to lemma 4.7. Hence, it is also contained in ${\displaystyle {\mathcal {D}}(O)}$. Therefore, by the monotonicity of the integral,

${\displaystyle \int _{K}|f(x)|dx=\int _{O}|f(x)|\chi _{K}(x)dx\leq \int _{O}|f(x)|\vartheta (x)dx}$,

so ${\displaystyle f}$ is indeed locally integrable.${\displaystyle \Box }$
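The cutoff function ${\displaystyle \vartheta }$ from this proof can be sketched numerically in one dimension. We assume ${\displaystyle O=(-1,2)}$ and ${\displaystyle K=[0,1]}$, so that ${\displaystyle \delta =1}$, and again the bump profile ${\displaystyle e^{-1/(1-x^{2})}}$ for the mollifier. These choices and the names `eta` and `cutoff` are ours.

```python
import math

def bump(t: float) -> float:
    """Unnormalised 1-d bump profile supported on (-1, 1) (assumed shape)."""
    return math.exp(-1.0 / (1.0 - t * t)) if abs(t) < 1.0 else 0.0

def integrate(func, a: float, b: float, n: int = 4000) -> float:
    """Midpoint rule on [a, b] with n subintervals."""
    h = (b - a) / n
    return h * sum(func(a + (k + 0.5) * h) for k in range(n))

C = integrate(bump, -1.0, 1.0)  # normalising constant

def eta(y: float, r: float) -> float:
    """Mollifier of radius r for d = 1."""
    return bump(y / r) / (C * r)

def cutoff(x: float, K=(0.0, 1.0), delta: float = 1.0) -> float:
    """theta(x) = (chi_{K + B_{delta/4}(0)} * eta_{delta/4})(x) for K = [K[0], K[1]]."""
    r = delta / 4.0
    lo, hi = K[0] - r, K[1] + r          # K + B_r(0) is the open interval (lo, hi)

    def chi(z: float) -> float:
        return 1.0 if lo < z < hi else 0.0

    # eta(y, r) vanishes outside (-r, r), so we integrate over [-r, r] only.
    return integrate(lambda y: eta(y, r) * chi(x - y), -r, r)
```

In this sketch, `cutoff(x)` is approximately 1 for ${\displaystyle x\in K}$, vanishes for ${\displaystyle x}$ further than ${\displaystyle \delta /2}$ from ${\displaystyle K}$, and takes intermediate values in between.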

Theorem 4.12:

Let ${\displaystyle f\in L^{2}(\mathbb {R} ^{d})}$, i. e.

${\displaystyle \int _{\mathbb {R} ^{d}}|f(x)|^{2}dx<\infty }$

Then

${\displaystyle {\mathcal {T}}_{f}:{\mathcal {S}}(\mathbb {R} ^{d})\to \mathbb {R} ,{\mathcal {T}}_{f}(\phi ):=\int _{\mathbb {R} ^{d}}f(x)\phi (x)dx}$

is a regular tempered distribution.

Proof:

From Hölder's inequality we obtain

${\displaystyle \int _{\mathbb {R} ^{d}}|\phi (x)||f(x)|dx\leq \|\phi \|_{L^{2}}\|f\|_{L^{2}}<\infty }$.

Hence, ${\displaystyle {\mathcal {T}}_{f}}$ is well-defined.

Due to the triangle inequality for integrals and Hölder's inequality, we have

${\displaystyle |{\mathcal {T}}_{f}(\phi _{l})-{\mathcal {T}}_{f}(\phi )|\leq \int _{\mathbb {R} ^{d}}|(\phi _{l}-\phi )(x)||f(x)|dx\leq \|\phi _{l}-\phi \|_{L^{2}}\|f\|_{L^{2}}}$

Furthermore

${\displaystyle {\begin{aligned}\|\phi _{l}-\phi \|_{L^{2}}^{2}&\leq \|\phi _{l}-\phi \|_{\infty }\int _{\mathbb {R} ^{d}}|(\phi _{l}-\phi )(x)|dx\\&=\|\phi _{l}-\phi \|_{\infty }\int _{\mathbb {R} ^{d}}\prod _{j=1}^{d}(1+x_{j}^{2})|(\phi _{l}-\phi )(x)|{\frac {1}{\prod _{j=1}^{d}(1+x_{j}^{2})}}dx\\&\leq \|\phi _{l}-\phi \|_{\infty }\left\|\prod _{j=1}^{d}(1+x_{j}^{2})(\phi _{l}-\phi )\right\|_{\infty }\underbrace {\int _{\mathbb {R} ^{d}}{\frac {1}{\prod _{j=1}^{d}(1+x_{j}^{2})}}dx} _{=\pi ^{d}}\end{aligned}}}$.

If ${\displaystyle \phi _{l}\to \phi }$ in the notion of convergence of the Schwartz function space, then this expression goes to zero. Therefore, continuity is verified.

Linearity follows from the linearity of the integral.${\displaystyle \Box }$
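The value ${\displaystyle \pi ^{d}}$ of the last integral in this proof can be verified as follows. This is a short sketch using Fubini's theorem and the fact that ${\displaystyle \arctan }$ is an antiderivative of ${\displaystyle t\mapsto {\frac {1}{1+t^{2}}}}$:

${\displaystyle \int _{\mathbb {R} ^{d}}{\frac {1}{\prod _{j=1}^{d}(1+x_{j}^{2})}}dx=\prod _{j=1}^{d}\int _{\mathbb {R} }{\frac {1}{1+x_{j}^{2}}}dx_{j}=\prod _{j=1}^{d}\left(\lim _{t\to \infty }\arctan(t)-\lim _{t\to -\infty }\arctan(t)\right)=\prod _{j=1}^{d}\left({\frac {\pi }{2}}+{\frac {\pi }{2}}\right)=\pi ^{d}}$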

## Equicontinuity

We now introduce the concept of equicontinuity.

Definition 4.13:

Let ${\displaystyle M}$ be a metric space equipped with a metric which we shall denote by ${\displaystyle d}$ here, let ${\displaystyle X\subseteq M}$ be a set in ${\displaystyle M}$, and let ${\displaystyle {\mathcal {Q}}}$ be a set of continuous functions mapping from ${\displaystyle X}$ to the real numbers ${\displaystyle \mathbb {R} }$. We call this set ${\displaystyle {\mathcal {Q}}}$ equicontinuous if and only if

${\displaystyle \forall \epsilon \in \mathbb {R} _{>0}:\forall x\in X:\exists \delta \in \mathbb {R} _{>0}:\forall y\in X:d(x,y)<\delta \Rightarrow \forall f\in {\mathcal {Q}}:|f(x)-f(y)|<\epsilon }$.

So equicontinuity is in fact defined for sets of continuous functions mapping from ${\displaystyle X}$ (a set in a metric space) to the real numbers ${\displaystyle \mathbb {R} }$.

Theorem 4.14:

Let ${\displaystyle M}$ be a metric space equipped with a metric which we shall denote by ${\displaystyle d}$, let ${\displaystyle Q\subseteq M}$ be a sequentially compact set in ${\displaystyle M}$, and let ${\displaystyle {\mathcal {Q}}}$ be an equicontinuous set of continuous functions from ${\displaystyle Q}$ to the real numbers ${\displaystyle \mathbb {R} }$. Then the following holds: If ${\displaystyle (f_{l})_{l\in \mathbb {N} }}$ is a sequence in ${\displaystyle {\mathcal {Q}}}$ such that ${\displaystyle f_{l}(x)}$ has a limit for each ${\displaystyle x\in Q}$, then ${\displaystyle f_{l}\to f}$ uniformly, where the function ${\displaystyle f:Q\to \mathbb {R} }$ is given by ${\displaystyle f(x):=\lim _{l\to \infty }f_{l}(x)}$.

Proof:

In order to prove uniform convergence, by definition we must prove that for all ${\displaystyle \epsilon >0}$, there exists an ${\displaystyle N\in \mathbb {N} }$ such that for all ${\displaystyle l\geq N:\forall x\in Q:|f_{l}(x)-f(x)|<\epsilon }$.

So let us assume the contrary, which, by negating the logical statement, reads

${\displaystyle \exists \epsilon >0:\forall N\in \mathbb {N} :\exists l\geq N:\exists x\in Q:|f_{l}(x)-f(x)|\geq \epsilon }$.

We choose a sequence ${\displaystyle (x_{m})_{m\in \mathbb {N} }}$ in ${\displaystyle Q}$. We take ${\displaystyle x_{1}}$ in ${\displaystyle Q}$ such that ${\displaystyle |f_{l_{1}}(x_{1})-f(x_{1})|\geq \epsilon }$ for an arbitrarily chosen ${\displaystyle l_{1}\in \mathbb {N} }$ and if we have already chosen ${\displaystyle x_{k}}$ and ${\displaystyle l_{k}}$ for all ${\displaystyle k\in \{1,\ldots ,m\}}$, we choose ${\displaystyle x_{m+1}}$ such that ${\displaystyle |f_{l_{m+1}}(x_{m+1})-f(x_{m+1})|\geq \epsilon }$, where ${\displaystyle l_{m+1}}$ is greater than ${\displaystyle l_{m}}$.

As ${\displaystyle Q}$ is sequentially compact, there is a convergent subsequence ${\displaystyle (x_{m_{j}})_{j\in \mathbb {N} }}$ of ${\displaystyle (x_{m})_{m\in \mathbb {N} }}$. Let us call the limit of that subsequence ${\displaystyle x}$.

As ${\displaystyle {\mathcal {Q}}}$ is equicontinuous, we can choose ${\displaystyle \delta \in \mathbb {R} _{>0}}$ such that

${\displaystyle d(x,y)<\delta \Rightarrow \forall f\in {\mathcal {Q}}:|f(x)-f(y)|<{\frac {\epsilon }{4}}}$.

Further, since ${\displaystyle x_{m_{j}}\to x}$ (if ${\displaystyle j\to \infty }$ of course), we may choose ${\displaystyle J\in \mathbb {N} }$ such that

${\displaystyle \forall j\geq J:d(x_{m_{j}},x)<\delta }$.

But then, for ${\displaystyle j\geq J}$, the reverse triangle inequality yields:

${\displaystyle |f_{l_{m_{j}}}(x)-f(x)|\geq \left||f_{l_{m_{j}}}(x)-f(x_{m_{j}})|-|f(x_{m_{j}})-f(x)|\right|}$

Since we had ${\displaystyle |f(x_{m_{j}})-f(x)|<{\frac {\epsilon }{4}}}$ and, by the reverse triangle inequality and the definition of the sequence ${\displaystyle (x_{m})_{m\in \mathbb {N} }}$,

${\displaystyle |f_{l_{m_{j}}}(x)-f(x_{m_{j}})|\geq \left||f_{l_{m_{j}}}(x_{m_{j}})-f(x_{m_{j}})|-|f_{l_{m_{j}}}(x)-f_{l_{m_{j}}}(x_{m_{j}})|\right|\geq \epsilon -{\frac {\epsilon }{4}}}$

, we obtain:

${\displaystyle {\begin{aligned}|f_{l_{m_{j}}}(x)-f(x)|&\geq \left||f_{l_{m_{j}}}(x)-f(x_{m_{j}})|-|f(x_{m_{j}})-f(x)|\right|\\&=|f_{l_{m_{j}}}(x)-f(x_{m_{j}})|-|f(x_{m_{j}})-f(x)|\\&\geq \epsilon -{\frac {\epsilon }{4}}-{\frac {\epsilon }{4}}\\&\geq {\frac {\epsilon }{2}}\end{aligned}}}$

Thus we have a contradiction to ${\displaystyle f_{l}(x)\to f(x)}$.${\displaystyle \Box }$
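The equicontinuity assumption in theorem 4.14 cannot simply be dropped. A standard example (not taken from this text) is ${\displaystyle f_{l}(x)=x^{l}}$ on the sequentially compact set ${\displaystyle Q=[0,1]}$: the sequence converges pointwise, but the family is not equicontinuous at ${\displaystyle x=1}$ and the convergence is not uniform. The sketch below estimates ${\displaystyle \sup _{x\in Q}|f_{l}(x)-f(x)|}$ on a grid; the grid size and function names are arbitrary choices.

```python
def f_l(x: float, l: int) -> float:
    """The l-th function of the sequence, f_l(x) = x^l."""
    return x ** l

def f_limit(x: float) -> float:
    """Pointwise limit on [0, 1]: 0 for x < 1 and 1 for x = 1."""
    return 1.0 if x == 1.0 else 0.0

def sup_gap(l: int, n: int = 10000) -> float:
    """Grid estimate (a lower bound) of sup over [0, 1] of |f_l - f_limit|."""
    grid = [k / n for k in range(n + 1)]
    return max(abs(f_l(x, l) - f_limit(x)) for x in grid)

# The gap does not tend to zero as l grows.
gaps = {l: sup_gap(l) for l in (10, 100, 1000)}
```

The estimated gaps stay bounded away from zero, so ${\displaystyle f_{l}\to f}$ pointwise but not uniformly.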

Theorem 4.15:

Let ${\displaystyle {\mathcal {Q}}}$ be a set of differentiable functions mapping from the convex set ${\displaystyle X\subseteq \mathbb {R} ^{d}}$ to ${\displaystyle \mathbb {R} }$. If there exists a constant ${\displaystyle b\in \mathbb {R} _{>0}}$ such that for all functions ${\displaystyle f\in {\mathcal {Q}}}$, ${\displaystyle \forall x\in X:\|\nabla f(x)\|\leq b}$ (${\displaystyle \nabla f}$ exists for each function in ${\displaystyle {\mathcal {Q}}}$ because all functions there were required to be differentiable), then ${\displaystyle {\mathcal {Q}}}$ is equicontinuous.

Proof: We have to prove equicontinuity, so we have to prove

${\displaystyle \forall \epsilon \in \mathbb {R} _{>0}:\forall x\in X:\exists \delta \in \mathbb {R} _{>0}:\forall y\in X:\|x-y\|<\delta \Rightarrow \forall f\in {\mathcal {Q}}:|f(x)-f(y)|<\epsilon }$.

Let ${\displaystyle \epsilon \in \mathbb {R} _{>0}}$ and ${\displaystyle x\in X}$ be arbitrary.

We choose ${\displaystyle \delta :={\frac {\epsilon }{b}}}$.

Let ${\displaystyle y\in X}$ such that ${\displaystyle \|x-y\|<\delta }$, and let ${\displaystyle f\in {\mathcal {Q}}}$ be arbitrary. By the mean-value theorem in multiple dimensions, we obtain that there exists a ${\displaystyle \lambda \in [0,1]}$ such that:

${\displaystyle f(x)-f(y)=\nabla f(\lambda x+(1-\lambda )y)\cdot (x-y)}$

The element ${\displaystyle \lambda x+(1-\lambda )y}$ is inside ${\displaystyle X}$, because ${\displaystyle X}$ is convex. From the Cauchy-Schwarz inequality then follows:

${\displaystyle |f(x)-f(y)|=|\nabla f(\lambda x+(1-\lambda )y)\cdot (x-y)|\leq \|\nabla f(\lambda x+(1-\lambda )y)\|\|x-y\|\leq b\|x-y\|<b\delta =\epsilon }$.${\displaystyle \Box }$
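As a small illustration of theorem 4.15, consider the family ${\displaystyle f_{c}(x)=\sin(x+c)}$ on the convex set ${\displaystyle X=\mathbb {R} }$, for which ${\displaystyle |f_{c}'(x)|\leq 1}$, so ${\displaystyle b=1}$. The sketch below samples shifts ${\displaystyle c}$ and points ${\displaystyle x,y}$ with ${\displaystyle |x-y|<\delta ={\frac {\epsilon }{b}}}$ and records the largest difference found. The family, the sampling grid and the names are our own assumptions.

```python
import math

b = 1.0  # common bound on |f_c'(x)| = |cos(x + c)|

def family(c: float):
    """Member f_c(x) = sin(x + c) of the family."""
    return lambda x: math.sin(x + c)

def worst_gap(eps: float, shifts, points) -> float:
    """Largest sampled |f(x) - f(y)| with |x - y| < delta = eps / b."""
    delta = eps / b
    worst = 0.0
    for c in shifts:
        f = family(c)
        for x in points:
            # y stays strictly inside the ball of radius delta around x
            for y in (x - 0.999 * delta, x + 0.999 * delta):
                worst = max(worst, abs(f(x) - f(y)))
    return worst

shifts = [0.1 * k for k in range(63)]
points = [0.05 * k for k in range(-100, 101)]
gap = worst_gap(0.1, shifts, points)
```

The same ${\displaystyle \delta }$ works for every sampled member of the family, which is exactly what equicontinuity requires.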

## The generalised product rule

Definition 4.16:

If ${\displaystyle \alpha =(\alpha _{1},\ldots ,\alpha _{d}),\beta =(\beta _{1},\ldots ,\beta _{d})\in \mathbb {N} _{0}^{d}}$ are two ${\displaystyle d}$-dimensional multiindices, we define the binomial coefficient of ${\displaystyle \alpha }$ over ${\displaystyle \beta }$ as

${\displaystyle {\binom {\alpha }{\beta }}:={\binom {\alpha _{1}}{\beta _{1}}}{\binom {\alpha _{2}}{\beta _{2}}}\cdots {\binom {\alpha _{d}}{\beta _{d}}}}$.

We also define a less-or-equal relation on the set of multiindices.

Definition 4.17:

Let ${\displaystyle \alpha =(\alpha _{1},\ldots ,\alpha _{d}),\beta =(\beta _{1},\ldots ,\beta _{d})\in \mathbb {N} _{0}^{d}}$ be two ${\displaystyle d}$-dimensional multiindices. We define ${\displaystyle \beta }$ to be less than or equal to ${\displaystyle \alpha }$ if and only if

${\displaystyle \beta \leq \alpha :\Leftrightarrow \forall n\in \{1,\ldots ,d\}:\beta _{n}\leq \alpha _{n}}$.

For ${\displaystyle d\geq 2}$, there are vectors ${\displaystyle \alpha ,\beta \in \mathbb {N} _{0}^{d}}$ such that neither ${\displaystyle \alpha \leq \beta }$ nor ${\displaystyle \beta \leq \alpha }$. For ${\displaystyle d=2}$, the following two vectors are examples for this:

${\displaystyle \alpha =(1,0),\beta =(0,1)}$

This example can be generalised to higher dimensions (see exercise 6).
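The componentwise order of definition 4.17 is easy to express in code. In the following minimal sketch, multiindices are represented as Python tuples of non-negative integers; the function name `leq` is our own.

```python
def leq(beta, alpha) -> bool:
    """True iff beta <= alpha componentwise, as in definition 4.17."""
    if len(beta) != len(alpha):
        raise ValueError("multiindices must have the same dimension")
    return all(b <= a for b, a in zip(beta, alpha))

alpha, beta = (1, 0), (0, 1)
# Neither alpha <= beta nor beta <= alpha holds for this pair.
comparable = leq(alpha, beta) or leq(beta, alpha)
```

Here `comparable` is `False`, which shows that the relation is only a partial order for ${\displaystyle d\geq 2}$.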

With these multiindex definitions, we are able to write down a more general version of the product rule. But in order to prove it, we need another lemma.

Lemma 4.18:

If ${\displaystyle n\in \{1,\ldots ,d\}}$ and ${\displaystyle e_{n}:=(0,\ldots ,0,1,0,\ldots ,0)}$, where the ${\displaystyle 1}$ is at the ${\displaystyle n}$-th place, we have

${\displaystyle {\binom {\alpha -e_{n}}{\beta -e_{n}}}+{\binom {\alpha -e_{n}}{\beta }}={\binom {\alpha }{\beta }}}$

for arbitrary multiindices ${\displaystyle \alpha ,\beta \in \mathbb {N} _{0}^{d}}$.
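The identity of lemma 4.18 can be checked on a concrete example. The sketch below implements the multiindex binomial coefficient of definition 4.16 with `math.comb` (Python 3.8 or later). It is restricted to multiindices with ${\displaystyle \alpha _{n},\beta _{n}\geq 1}$ so that all entries stay non-negative; the names `multi_binom` and `minus_e` are hypothetical.

```python
from math import comb

def multi_binom(alpha, beta) -> int:
    """Product of the componentwise binomial coefficients (definition 4.16)."""
    result = 1
    for a, b in zip(alpha, beta):
        result *= comb(a, b)   # comb(a, b) is 0 whenever b > a
    return result

def minus_e(alpha, n: int):
    """Return alpha - e_n, where n is counted from 1 as in the lemma."""
    return tuple(a - (1 if i == n - 1 else 0) for i, a in enumerate(alpha))

alpha, beta = (4, 3, 5), (2, 1, 3)

# Check the identity of lemma 4.18 for n = 1, 2, 3.
identity_holds = all(
    multi_binom(minus_e(alpha, n), minus_e(beta, n))
    + multi_binom(minus_e(alpha, n), beta)
    == multi_binom(alpha, beta)
    for n in (1, 2, 3)
)
```

For this choice of ${\displaystyle \alpha }$ and ${\displaystyle \beta }$ the identity holds for each ${\displaystyle n}$, since only the ${\displaystyle n}$-th factor changes and the ordinary binomial identity applies to it.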

Proof:

For the ordinary binomial coefficients for natural numbers, we had the formula

${\displaystyle {\binom {n-1}{k-1}}+{\binom {n-1}{k}}={\binom {n}{k}}}$.

Therefore,