Calculus/Inverse function theorem, implicit function theorem


In this chapter, we want to prove the inverse function theorem (which asserts that if a function has invertible differential at a point, then it is locally invertible itself) and the implicit function theorem (which asserts that certain sets are the graphs of functions).

Banach's fixed point theorem

Theorem:

Let $(M,d)$ be a complete metric space, and let $f:M\to M$ be a strict contraction; that is, there exists a constant $0\leq \lambda <1$ such that

\forall m,n\in M:d(f(m),f(n))\leq \lambda d(m,n)

.

Then $f$ has a unique fixed point, which means that there is a unique $x\in M$ such that $f(x)=x$ . Furthermore, if we start with a completely arbitrary point $y\in M$ , then the sequence

y,f(y),f(f(y)),f(f(f(y))),\ldots

converges to $x$ .

Proof:

First, we prove uniqueness of the fixed point. Assume $x,y$ are both fixed points. Then

d(x,y)=d(f(x),f(y))\leq \lambda d(x,y)\Rightarrow (1-\lambda )d(x,y)=0

.

Since $0\leq \lambda <1$ , this implies $d(x,y)=0\Rightarrow x=y$ .

Now we prove existence and simultaneously the claim about the convergence of the sequence $y,f(y),f(f(y)),f(f(f(y))),\ldots$ . For notation, we thus set $z_{0}:=y$ and if $z_{n}$ is already defined, we set $z_{n+1}=f(z_{n})$ . Then the sequence $(z_{n})_{n\in \mathbb {N} }$ is nothing else but the sequence $y,f(y),f(f(y)),f(f(f(y))),\ldots$ .

Let $n\geq 0$ . We claim that

d(z_{n+1},z_{n})\leq \lambda ^{n}d(z_{1},z_{0})

.

Indeed, this follows by induction on $n$ . The case $n=0$ is trivial, and if the claim is true for $n$ , then $d(z_{n+2},z_{n+1})=d(f(z_{n+1}),f(z_{n}))\leq \lambda d(z_{n+1},z_{n})\leq \lambda \cdot \lambda ^{n}d(z_{1},z_{0})$ .

Hence, by the triangle inequality,

{\begin{aligned}d(z_{n+m},z_{n})&\leq \sum _{j=n+1}^{n+m}d(z_{j},z_{j-1})\\&\leq \sum _{j=n+1}^{n+m}\lambda ^{j-1}d(z_{1},z_{0})\\&\leq \sum _{j=n+1}^{\infty }\lambda ^{j-1}d(z_{1},z_{0})\\&=d(z_{1},z_{0})\lambda ^{n}{\frac {1}{1-\lambda }}\end{aligned}}

.

The latter expression goes to zero as $n\to \infty$ and hence we are dealing with a Cauchy sequence. As we are in a complete metric space, it converges to a limit $x$ . This limit further is a fixed point, as the continuity of $f$ ( $f$ is Lipschitz continuous with constant $\lambda$ ) implies

x=\lim _{n\to \infty }z_{n}=\lim _{n\to \infty }f(z_{n-1})=f(\lim _{n\to \infty }z_{n-1})=f(x)

.

\Box
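
The iteration in the theorem is easy to try numerically. The following is a minimal sketch under an assumed example that is not part of the text above: the cosine function maps $[0,1]$ into itself and is a strict contraction there with $\lambda =\sin(1)<1$ (by the mean-value theorem). The helper name `fixed_point_iterates` is hypothetical. The printout compares the actual distance to the limit with the bound $d(z_{1},z_{0})\lambda ^{n}/(1-\lambda )$, which results from the estimate in the proof by letting $m\to \infty$ .

```python
import math

def fixed_point_iterates(f, y, steps):
    """Return [z_0, ..., z_steps] with z_0 = y and z_{n+1} = f(z_n)."""
    zs = [y]
    for _ in range(steps):
        zs.append(f(zs[-1]))
    return zs

# Assumed example: cos maps [0, 1] into itself, and |cos'(x)| = |sin x| <= sin(1) < 1 there.
lam = math.sin(1.0)
zs = fixed_point_iterates(math.cos, 1.0, 100)
x = zs[-1]  # numerical stand-in for the fixed point

d10 = abs(zs[1] - zs[0])  # d(z_1, z_0)
for n in (0, 5, 10, 20):
    bound = d10 * lam**n / (1.0 - lam)
    print(n, abs(zs[n] - x), bound)
```

In this example the observed error shrinks faster than the bound, which is only a worst-case estimate.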

A corollary to this important result is the following lemma, which shall be the main ingredient for the proof of the inverse function theorem:

Lemma:

Let $g:{\overline {B_{r}(0)}}\to {\overline {B_{r}(0)}}$ ( ${\overline {B_{r}(0)}}\subset \mathbb {R} ^{n}$ denoting the closed ball of radius $r$ ) be a function which is Lipschitz continuous with Lipschitz constant at most $1/2$ and which satisfies $g(0)=0$ . Then the function

f:{\overline {B_{r}(0)}}\to \mathbb {R} ^{n},f(x):=g(x)+x

is injective and $B_{r/2}(0)\subseteq f(B_{r}(0))$ .

Proof:

First, we note that for $y\in B_{r/2}(0)$ the function

h:{\overline {B_{r}(0)}}\to \mathbb {R} ^{n},h(z):=y-g(z)

is a strict contraction; this is due to

\|y-g(z)-(y-g(z'))\|=\|g(z')-g(z)\|\leq {\frac {1}{2}}\|z-z'\|

.

Furthermore, it maps ${\overline {B_{r}(0)}}$ to itself, since for $z\in {\overline {B_{r}(0)}}$

\|y-g(z)\|\leq \|y\|+\|g(z)-g(0)\|\leq {\frac {r}{2}}+{\frac {1}{2}}\|z\|\leq r

.

Hence, the Banach fixed-point theorem is applicable to $h$ . Now $x$ being a fixed point of $h$ is equivalent to

f(x)=y

,

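and thus $B_{r/2}(0)\subseteq f(B_{r}(0))$ follows from the existence of fixed points; the fixed point $x$ indeed lies in the open ball $B_{r}(0)$ , since $\|x\|=\|y-g(x)\|\leq \|y\|+{\frac {1}{2}}\|x\|$ gives $\|x\|\leq 2\|y\|<r$ . Furthermore, if $f(x)=f(x')$ , then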

{\frac {1}{2}}\|x-x'\|\geq \|g(x)-g(x')\|=\|f(x)-x-(f(x')-x')\|=\|x-x'\|

and hence $x=x'$ . Thus injectivity. $\Box$
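
The proof is constructive: to solve $f(x)=y$ , one iterates $h(z)=y-g(z)$ . Below is a minimal sketch in $\mathbb {R} ^{2}$ with $r=1$ . The particular $g$ is an assumption chosen for illustration (it satisfies $g(0)=0$ and has Lipschitz constant $1/4\leq 1/2$ ), and the name `solve` is hypothetical.

```python
import math

def g(x):
    # Assumed sample perturbation: g(0) = 0, Lipschitz constant 1/4 <= 1/2.
    return (0.25 * math.sin(x[1]), 0.25 * math.sin(x[0]))

def f(x):
    # f(x) = g(x) + x, as in the lemma.
    gx = g(x)
    return (gx[0] + x[0], gx[1] + x[1])

def solve(y, steps=60):
    """Iterate h(z) = y - g(z) starting at z = 0; the fixed point solves f(x) = y."""
    z = (0.0, 0.0)
    for _ in range(steps):
        gz = g(z)
        z = (y[0] - gz[0], y[1] - gz[1])
    return z

r = 1.0
y = (0.3, -0.35)  # norm is about 0.461, which is less than r/2
x = solve(y)
fx = f(x)
residual = math.hypot(fx[0] - y[0], fx[1] - y[1])
print(x, math.hypot(*x), residual)
```

The computed $x$ has norm less than $r$ , in line with the inclusion $B_{r/2}(0)\subseteq f(B_{r}(0))$ .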

The inverse function theorem

Theorem:

Let $f:\mathbb {R} ^{n}\to \mathbb {R} ^{n}$ be a function which is continuously differentiable in a neighbourhood of a point $x_{0}\in \mathbb {R} ^{n}$ such that $f'(x_{0})$ is invertible. Then there exists an open set $U\subseteq \mathbb {R} ^{n}$ with $x_{0}\in U$ such that $f|_{U}$ is a bijective function onto its image, with an inverse $f^{-1}:f(U)\to U$ which is differentiable at $f(x_{0})$ and satisfies

(f^{-1})'(f(x_{0}))=(f'(x_{0}))^{-1}

.

Proof:

We first reduce to the case $x_{0}=0$ , $f(x_{0})=0$ and $f'(x_{0})={\text{Id}}$ . Indeed, suppose the theorem holds for all such functions, and let $h$ be an arbitrary function satisfying the requirements of the theorem at a point $x_{0}$ . We set

{\tilde {h}}(x):=h'(x_{0})^{-1}(h(x_{0}+x)-h(x_{0}))

and obtain that ${\tilde {h}}$ is continuously differentiable in a neighbourhood of $0$ with ${\tilde {h}}(0)=0$ and ${\tilde {h}}'(0)={\text{Id}}$ ; the first of these two identities is seen by inserting $x=0$ , and the second follows from the chain rule, since we only shift the argument of $h$ and compose with the linear map $h'(x_{0})^{-1}$ . Hence, we obtain a local inverse ${\tilde {h}}^{-1}$ of ${\tilde {h}}$ which is differentiable at ${\tilde {h}}(0)=0$ with differential ${\text{Id}}$ , and if we now set

h^{-1}(y):={\tilde {h}}^{-1}(h'(x_{0})^{-1}(y-h(x_{0})))+x_{0}

,

it can be seen that $h^{-1}$ is an inverse of $h$ with all the required properties (which is a bit of a tedious exercise, but involves nothing more than the definitions).

Thus let $f$ be a function which is continuously differentiable in a neighbourhood of $0$ such that $f(0)=0$ and $f'(0)={\text{Id}}$ . We define

g(x):=f(x)-x

.

The differential of this function at $0$ is zero (since taking the differential is linear and the differential of the function $x\mapsto x$ is the identity). Since the function $g$ is also continuously differentiable in a small neighbourhood of $0$ , we find $\delta >0$ such that

\left|{\frac {\partial g_{k}}{\partial x_{j}}}(x)\right|<{\frac {1}{2n^{2}}}

for all $j,k\in \{1,\ldots ,n\}$ and $x\in {\overline {B_{\delta }(0)}}$ . The mean-value theorem, applied to each component $g_{k}$ along the line segment between two points $x,x'\in {\overline {B_{\delta }(0)}}$ , and the Cauchy-Schwarz inequality imply that for $k\in \{1,\ldots ,n\}$ ,

|g_{k}(x)-g_{k}(x')|=|\langle x-x',\nabla g_{k}(\xi _{k})\rangle |\leq \|x-x'\|n{\frac {1}{2n^{2}}}

for a suitable point $\xi _{k}$ on that segment. Hence,

\|g(x)-g(x')\|\leq |g_{1}(x)-g_{1}(x')|+\cdots +|g_{n}(x)-g_{n}(x')|\leq {\frac {1}{2}}\|x-x'\|

(triangle inequality),

so that $g$ is Lipschitz continuous with constant $1/2$ on ${\overline {B_{\delta }(0)}}$ . Since further $g(0)=f(0)-0=0$ , the choice $x'=0$ gives in particular $\|g(x)\|\leq {\frac {1}{2}}\|x\|$ . Thus, our preparatory lemma is applicable with $r=\delta$ : $f$ is injective on ${\overline {B_{\delta }(0)}}$ , and its image contains the open set $B_{\delta /2}(0)$ . We may therefore pick $U:=B_{\delta }(0)\cap f^{-1}(B_{\delta /2}(0))$ , which is open due to the continuity of $f$ and contains $0$ ; the function $f$ maps $U$ bijectively onto $B_{\delta /2}(0)$ .

Thus, the most important part of the theorem is already done. All that is left to do is to prove differentiability of $f^{-1}$ at $0$ . In fact, we prove the slightly stronger claim that the differential of $f^{-1}$ at $0$ is given by the identity, although this would also follow from the chain rule once differentiability is proven.

Note now that the estimate $\|g(x)\|\leq {\frac {1}{2}}\|x\|$ implies the following bounds on $f$ :

{\frac {1}{2}}\|x\|\leq \|f(x)\|\leq {\frac {3}{2}}\|x\|

.

The second bound follows from

\|f(x)\|\leq \|f(x)-x\|+\|x\|=\|g(x)\|+\|x\|\leq {\frac {3}{2}}\|x\|

,

and the first bound follows from

\|f(x)\|\geq |\|f(x)-x\|-\|x\||=\left|\|g(x)\|-\|x\|\right|\geq {\frac {1}{2}}\|x\|

.

Now for the differentiability at $0$ . We have, by substitution of limits (as $f$ is continuous and $f(0)=0$ ):

{\begin{aligned}\lim _{\mathbf {h} \to 0}{\frac {\|f^{-1}(\mathbf {h} )-f^{-1}(0)-\operatorname {Id} (\mathbf {h} -0)\|}{\|\mathbf {h} \|}}&=\lim _{\mathbf {h} \to 0}{\frac {\|f^{-1}(f(\mathbf {h} ))-f(\mathbf {h} )\|}{\|f(\mathbf {h} )\|}}\\&=\lim _{\mathbf {h} \to 0}{\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{\|f(\mathbf {h} )\|}},\end{aligned}}

where the last expression converges to zero due to the differentiability of $f$ at $0$ with differential the identity, and the sandwich criterion applied to the expressions

{\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{{\frac {3}{2}}\|\mathbf {h} \|}}

and

{\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{{\frac {1}{2}}\|\mathbf {h} \|}}

.

\Box
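
The formula $(f^{-1})'(f(x_{0}))=(f'(x_{0}))^{-1}$ can be checked numerically on a concrete map. The sketch below rests on an assumed example that does not appear in the text: $f(x,y)=(e^{x}\cos y,e^{x}\sin y)$ at $x_{0}=(0.5,1)$ , whose local inverse near $f(x_{0})$ is known in closed form. The helper names `jacobian_fd` and `inverse_2x2` are hypothetical, and the Jacobian of the inverse is only approximated by central differences.

```python
import math

def f(p):
    x, y = p
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def f_inv(q):
    # Closed-form local inverse of f, valid for second coordinates in (-pi, pi).
    u, v = q
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def jacobian_fd(func, p, eps=1e-6):
    """Central-difference Jacobian of func: R^2 -> R^2 at p, as a 2x2 nested list."""
    cols = []
    for j in range(2):
        plus, minus = list(p), list(p)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        cols.append([(fp[i] - fm[i]) / (2 * eps) for i in range(2)])
    # cols[j][i] holds d func_i / d p_j; transpose into row-major form.
    return [[cols[j][i] for j in range(2)] for i in range(2)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

x0 = (0.5, 1.0)
e, c, s = math.exp(x0[0]), math.cos(x0[1]), math.sin(x0[1])
exact = [[e * c, -e * s], [e * s, e * c]]  # f'(x0), computed by hand
lhs = jacobian_fd(f_inv, f(x0))            # approximates (f^{-1})'(f(x0))
rhs = inverse_2x2(exact)                   # (f'(x0))^{-1}
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```

The printed difference should be small, limited only by the finite-difference error.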

The implicit function theorem

Theorem:

Let $f:\mathbb {R} ^{n}\to \mathbb {R}$ be a continuously differentiable function, and consider the set

S:=\{(x_{1},\ldots ,x_{n})\in \mathbb {R} ^{n}|f(x_{1},\ldots ,x_{n})=0\}

.

If we are given some $y\in S$ such that $\partial _{n}f(y)\neq 0$ , then we find $U\subseteq \mathbb {R} ^{n-1}$ open with $(y_{1},\ldots ,y_{n-1})\in U$ and $g:U\to \mathbb {R}$ such that

y_{n}=g(y_{1},\ldots ,y_{n-1})

and

\{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}\subseteq S

,

where $\{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}$ is open with respect to the subspace topology on $S$ .

Furthermore, $g$ is a differentiable function.

Proof:

We define a new function

F:\mathbb {R} ^{n}\to \mathbb {R} ^{n},F(x_{1},\ldots ,x_{n}):=(x_{1},\ldots ,x_{n-1},f(x_{1},\ldots ,x_{n}))

.

The differential of this function looks like this:

F'(x)={\begin{pmatrix}1&0&\cdots &&0\\0&1&&&\vdots \\\vdots &&\ddots &&\\0&\cdots &0&1&0\\\partial _{1}f(x)&&\cdots &&\partial _{n}f(x)\end{pmatrix}}

This matrix is lower triangular with diagonal entries $1,\ldots ,1,\partial _{n}f(x)$ , so that $\det F'(x)=\partial _{n}f(x)$ . Since we assumed that $\partial _{n}f(y)\neq 0$ , $F'(y)$ is invertible, and hence the inverse function theorem implies the existence of a small open neighbourhood ${\tilde {V}}\subseteq \mathbb {R} ^{n}$ containing $y$ such that restricted to that neighbourhood $F$ is itself invertible, with a differentiable inverse $F^{-1}$ , which is itself defined on an open set ${\tilde {U}}$ containing $F(y)$ . Now set first

U:=\{(x_{1},\ldots ,x_{n-1})|(x_{1},\ldots ,x_{n-1},0)\in {\tilde {U}}\}

,

which is open in $\mathbb {R} ^{n-1}$ (being the preimage of the open set ${\tilde {U}}$ under the continuous map $(x_{1},\ldots ,x_{n-1})\mapsto (x_{1},\ldots ,x_{n-1},0)$ ), and then

g:U\to \mathbb {R} ,g(x_{1},\ldots ,x_{n-1}):=\pi _{n}(F^{-1}(x_{1},\ldots ,x_{n-1},0))

,

the $n$ -th component of $F^{-1}(x_{1},\ldots ,x_{n-1},0)$ . We claim that $g$ has the desired properties.

Indeed, we first note that $F^{-1}(x_{1},\ldots ,x_{n-1},0)=(x_{1},\ldots ,x_{n-1},g(x_{1},\ldots ,x_{n-1}))$ , since applying $F$ leaves the first $n-1$ components unchanged, and thus we get the identity by observing $F(F^{-1}(x))=x$ . Let thus $(z_{1},\ldots ,z_{n-1})\in U$ . Then

{\begin{aligned}f(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))&=(\pi _{n}\circ F)(F^{-1}(z_{1},\ldots ,z_{n-1},0))\\&=\pi _{n}((F\circ F^{-1})(z_{1},\ldots ,z_{n-1},0))=0\end{aligned}}

.

Furthermore, the set

\{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}

is open with respect to the subspace topology on $S$ . Indeed, we show

\{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}=S\cap {\tilde {V}}

.

For $\subseteq$ , we first note that the set on the left hand side is in $S$ , since all points in it are mapped to zero by $f$ . Further,

F(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))=(z_{1},\ldots ,z_{n-1},0)\in {\tilde {U}}

and hence the point $(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))=F^{-1}(z_{1},\ldots ,z_{n-1},0)$ lies in ${\tilde {V}}$ , which completes $\subseteq$ . For the other direction, let a point $(x_{1},\ldots ,x_{n})$ in $S\cap {\tilde {V}}$ be given, apply $F$ to get

F((x_{1},\ldots ,x_{n}))=(x_{1},\ldots ,x_{n-1},0)\in {\tilde {U}}

and hence $(x_{1},\ldots ,x_{n-1})\in U$ ; further

(x_{1},\ldots ,x_{n-1},g(x_{1},\ldots ,x_{n-1}))=(x_{1},\ldots ,x_{n})

since both sides lie in ${\tilde {V}}$ , both are mapped by $F$ to $(x_{1},\ldots ,x_{n-1},0)$ , and $F$ is injective on ${\tilde {V}}$ .

Now $g$ is automatically differentiable as the component of a differentiable function. $\Box$

Informally, the above theorem states that given a set $\{x\in \mathbb {R} ^{n}|f(x)=0\}$ , one can choose the first $n-1$ coordinates as a "base" for a function, whose graph is precisely a local bit of that set.
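
As an illustration, consider the unit circle, an assumed example not taken from the text: $f(x_{1},x_{2})=x_{1}^{2}+x_{2}^{2}-1$ with $y=(0,1)$ , where $\partial _{2}f(y)=2\neq 0$ . Near $y$ the set $S$ is the graph of $g(x_{1})={\sqrt {1-x_{1}^{2}}}$ . The sketch below computes $g$ numerically by Newton's method in the last variable and compares it with this closed form. Newton's method is used here only for convenience; it is not the construction from the proof, which goes through the inverse of $F$ .

```python
import math

def f(x, t):
    # Defining function of the unit circle.
    return x * x + t * t - 1.0

def d2f(x, t):
    # Partial derivative of f with respect to the last variable.
    return 2.0 * t

def g(x, t0=1.0, steps=50):
    """Solve f(x, t) = 0 for t by Newton's method in t, starting at t0."""
    t = t0
    for _ in range(steps):
        t -= f(x, t) / d2f(x, t)
    return t

xs = [-0.5, -0.1, 0.0, 0.3, 0.6]
errors = [abs(g(x) - math.sqrt(1.0 - x * x)) for x in xs]
residuals = [abs(f(x, g(x))) for x in xs]
print(max(errors), max(residuals))
```

At the points $(\pm 1,0)$ the derivative $\partial _{2}f$ vanishes, and the circle is indeed not the graph of a function of $x_{1}$ near those points.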