# Calculus/Inverse function theorem, implicit function theorem


In this chapter, we want to prove the inverse function theorem (which asserts that if a function has invertible differential at a point, then it is locally invertible itself) and the implicit function theorem (which asserts that certain sets are the graphs of functions).

## Banach's fixed point theorem

Theorem:

Let ${\displaystyle (M,d)}$  be a complete metric space, and let ${\displaystyle f:M\to M}$  be a strict contraction; that is, there exists a constant ${\displaystyle 0\leq \lambda <1}$  such that

${\displaystyle \forall m,n\in M:d(f(m),f(n))\leq \lambda d(m,n)}$ .

Then ${\displaystyle f}$  has a unique fixed point, which means that there is a unique ${\displaystyle x\in M}$  such that ${\displaystyle f(x)=x}$ . Furthermore, if we start with a completely arbitrary point ${\displaystyle y\in M}$ , then the sequence

${\displaystyle y,f(y),f(f(y)),f(f(f(y))),\ldots }$

converges to ${\displaystyle x}$ .

Proof:

First, we prove uniqueness of the fixed point. Assume ${\displaystyle x,y}$  are both fixed points. Then

${\displaystyle d(x,y)=d(f(x),f(y))\leq \lambda d(x,y)\Rightarrow (1-\lambda )d(x,y)=0}$ .

Since ${\displaystyle 0\leq \lambda <1}$ , this implies ${\displaystyle d(x,y)=0\Rightarrow x=y}$ .

Now we prove existence and simultaneously the claim about the convergence of the sequence ${\displaystyle y,f(y),f(f(y)),f(f(f(y))),\ldots }$ . For notation, we thus set ${\displaystyle z_{0}:=y}$  and if ${\displaystyle z_{n}}$  is already defined, we set ${\displaystyle z_{n+1}=f(z_{n})}$ . Then the sequence ${\displaystyle (z_{n})_{n\in \mathbb {N} }}$  is nothing else but the sequence ${\displaystyle y,f(y),f(f(y)),f(f(f(y))),\ldots }$ .

Let ${\displaystyle n\geq 0}$ . We claim that

${\displaystyle d(z_{n+1},z_{n})\leq \lambda ^{n}d(z_{1},z_{0})}$ .

Indeed, this follows by induction on ${\displaystyle n}$ . The case ${\displaystyle n=0}$  is trivial, and if the claim is true for ${\displaystyle n}$ , then ${\displaystyle d(z_{n+2},z_{n+1})=d(f(z_{n+1}),f(z_{n}))\leq \lambda d(z_{n+1},z_{n})\leq \lambda \cdot \lambda ^{n}d(z_{1},z_{0})}$ .

Hence, by the triangle inequality,

${\displaystyle {\begin{aligned}d(z_{n+m},z_{n})&\leq \sum _{j=n+1}^{n+m}d(z_{j},z_{j-1})\\&\leq \sum _{j=n+1}^{n+m}\lambda ^{j-1}d(z_{1},z_{0})\\&\leq \sum _{j=n+1}^{\infty }\lambda ^{j-1}d(z_{1},z_{0})\\&=d(z_{1},z_{0})\lambda ^{n}{\frac {1}{1-\lambda }}\end{aligned}}}$ .

The latter expression goes to zero as ${\displaystyle n\to \infty }$  and hence we are dealing with a Cauchy sequence. As we are in a complete metric space, it converges to a limit ${\displaystyle x}$ . This limit further is a fixed point, as the continuity of ${\displaystyle f}$  (${\displaystyle f}$  is Lipschitz continuous with constant ${\displaystyle \lambda }$ ) implies

${\displaystyle x=\lim _{n\to \infty }z_{n}=\lim _{n\to \infty }f(z_{n-1})=f(\lim _{n\to \infty }z_{n-1})=f(x)}$ .${\displaystyle \Box }$
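The iteration in the theorem can be tried out numerically. The following is a minimal sketch, assuming the example map $\cos$ on the complete metric space $[0,1]$ (this choice, and the names `iterate`, `lam` and `bound_holds`, are illustrative and not part of the theorem). It also checks the estimate $d(z_{n},x)\leq {\frac {\lambda ^{n}}{1-\lambda }}d(z_{1},z_{0})$, which results from letting $m\to \infty$ in the chain of inequalities of the proof.

```python
import math

def iterate(f, y, steps):
    """Return the list z_0 = y, z_1 = f(z_0), ..., z_steps."""
    zs = [y]
    for _ in range(steps):
        zs.append(f(zs[-1]))
    return zs

# cos maps [0, 1] into [cos 1, 1], which lies inside [0, 1], and
# |cos'(t)| = |sin t| <= sin 1 < 1 there, so cos is a strict contraction
# on [0, 1] with constant lam = sin 1 (by the mean value theorem).
lam = math.sin(1.0)
zs = iterate(math.cos, 0.0, 100)
fixed_point = zs[-1]  # after 100 steps this agrees with the limit to machine precision

# Estimate from the proof: d(z_n, x) <= lam**n / (1 - lam) * d(z_1, z_0).
d10 = abs(zs[1] - zs[0])
bound_holds = all(
    abs(zs[n] - fixed_point) <= lam**n / (1 - lam) * d10 + 1e-12
    for n in range(len(zs))
)
print(fixed_point, bound_holds)
```

Starting from a different point of $[0,1]$ changes the first few iterates, but not the limit, in line with the uniqueness part of the theorem.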

A corollary to this important result is the following lemma, which shall be the main ingredient for the proof of the inverse function theorem:

Lemma:

Let ${\displaystyle g:{\overline {B_{r}(0)}}\to {\overline {B_{r}(0)}}}$  (${\displaystyle {\overline {B_{r}(0)}}\subset \mathbb {R} ^{n}}$  denoting the closed ball of radius ${\displaystyle r}$  around the origin) be a function which is Lipschitz continuous with Lipschitz constant less than or equal to ${\displaystyle 1/2}$  and which satisfies ${\displaystyle g(0)=0}$ . Then the function

${\displaystyle f:{\overline {B_{r}(0)}}\to \mathbb {R} ^{n},f(x):=g(x)+x}$

is injective and ${\displaystyle B_{r/2}(0)\subseteq f(B_{r}(0))}$ .

Proof:

First, we note that for ${\displaystyle y\in B_{r/2}(0)}$  the function

${\displaystyle h:{\overline {B_{r}(0)}}\to \mathbb {R} ^{n},h(z):=y-g(z)}$

is a strict contraction; this is due to

${\displaystyle \|y-g(z)-(y-g(z'))\|=\|g(z')-g(z)\|\leq {\frac {1}{2}}\|z-z'\|}$ .

Furthermore, it maps ${\displaystyle {\overline {B_{r}(0)}}}$  to itself, since for ${\displaystyle z\in {\overline {B_{r}(0)}}}$

${\displaystyle \|y-g(z)\|\leq \|y\|+\|g(z)-g(0)\|\leq {\frac {r}{2}}+{\frac {1}{2}}\|z\|\leq r}$ .

Hence, the Banach fixed-point theorem is applicable to ${\displaystyle h}$ . Now ${\displaystyle x}$  being a fixed point of ${\displaystyle h}$  is equivalent to

${\displaystyle f(x)=y}$ ,

and thus ${\displaystyle B_{r/2}(0)\subseteq f(B_{r}(0))}$  follows from the existence of fixed points. Furthermore, if ${\displaystyle f(x)=f(x')}$ , then

${\displaystyle {\frac {1}{2}}\|x-x'\|\geq \|g(x)-g(x')\|=\|f(x)-x-(f(x')-x')\|=\|x-x'\|}$

and hence ${\displaystyle x=x'}$ . Thus injectivity.${\displaystyle \Box }$
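The proof is constructive: a preimage of ${\displaystyle y}$  under ${\displaystyle f}$  is obtained by iterating ${\displaystyle h(z)=y-g(z)}$ . Below is a small sketch of this in ${\displaystyle \mathbb {R} ^{2}}$  with ${\displaystyle r=1}$ . The particular map `g` is a hypothetical example chosen only because it has Lipschitz constant ${\displaystyle 1/4}$  and vanishes at the origin; the function name `solve` is likewise an assumption of this illustration.

```python
import math

def g(x):
    # Hypothetical example map on R^2: Lipschitz constant 1/4 <= 1/2, g(0) = 0,
    # and ||g(x)|| <= ||x|| / 4, so it maps the closed unit ball to itself.
    return (0.25 * math.sin(x[1]), 0.25 * math.sin(x[0]))

def f(x):
    gx = g(x)
    return (gx[0] + x[0], gx[1] + x[1])

def solve(y, steps=60):
    """Find x with f(x) = y by iterating h(z) = y - g(z), starting at z = 0."""
    z = (0.0, 0.0)
    for _ in range(steps):
        gz = g(z)
        z = (y[0] - gz[0], y[1] - gz[1])
    return z

r = 1.0
y = (0.3, -0.2)  # lies in B_{r/2}(0), since its norm is about 0.36
x = solve(y)
fx = f(x)
residual = math.hypot(fx[0] - y[0], fx[1] - y[1])
print(x, residual)
```

The computed point lies in the open unit ball and is mapped to ${\displaystyle y}$  up to rounding error, as the lemma predicts for this example.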

## The inverse function theorem

Theorem:

Let ${\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} ^{n}}$  be a function which is continuously differentiable in a neighbourhood of a point ${\displaystyle x_{0}\in \mathbb {R} ^{n}}$  such that ${\displaystyle f'(x_{0})}$  is invertible. Then there exists an open set ${\displaystyle U\subseteq \mathbb {R} ^{n}}$  with ${\displaystyle x_{0}\in U}$  such that ${\displaystyle f|_{U}}$  is a bijection onto its image ${\displaystyle f(U)}$ , with an inverse ${\displaystyle f^{-1}:f(U)\to U}$  which is differentiable at ${\displaystyle f(x_{0})}$  and satisfies

${\displaystyle (f^{-1})'(f(x_{0}))=(f'(x_{0}))^{-1}}$ .

Proof:

We first reduce to the case ${\displaystyle f(x_{0})=0}$ , ${\displaystyle x_{0}=0}$  and ${\displaystyle f'(x_{0})={\text{Id}}}$ . Indeed, suppose for all those functions the theorem holds, and let now ${\displaystyle h}$  be an arbitrary function satisfying the requirements of the theorem (where the differentiability is given at ${\displaystyle x_{0}}$ ). We set

${\displaystyle {\tilde {h}}(x):=h'(x_{0})^{-1}(h(x_{0}+x)-h(x_{0}))}$

and obtain that ${\displaystyle {\tilde {h}}}$  is differentiable at ${\displaystyle 0}$  with differential ${\displaystyle {\text{Id}}}$  and ${\displaystyle {\tilde {h}}(0)=0}$ ; the first property follows since we multiply both the function and its affine-linear approximation by ${\displaystyle h'(x_{0})^{-1}}$  and only shift the argument of the function, and the second one is seen from inserting ${\displaystyle x=0}$ . Hence, we obtain an inverse of ${\displaystyle {\tilde {h}}}$  together with its differential at ${\displaystyle {\tilde {h}}(0)=0}$ , and if we now set

${\displaystyle h^{-1}(y):={\tilde {h}}^{-1}(h'(x_{0})^{-1}(y-h(x_{0})))+x_{0}}$ ,

it can be seen that ${\displaystyle h^{-1}}$  is an inverse of ${\displaystyle h}$  with all the required properties (which is a bit of a tedious exercise, but involves nothing more than the definitions).
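A one-dimensional example may make the reduction more concrete. The sketch below assumes the normalisation ${\displaystyle {\tilde {h}}(x)=h'(x_{0})^{-1}(h(x_{0}+x)-h(x_{0}))}$  and the illustrative choice ${\displaystyle h=\exp }$ , ${\displaystyle x_{0}=1}$  (neither the choice of function nor the helper names come from the text). In this case ${\displaystyle {\tilde {h}}(x)=e^{x}-1}$ , whose inverse is known explicitly, and undoing the normalisation should recover the logarithm.

```python
import math

x0 = 1.0
h = math.exp           # example function; h'(x0) = exp(x0) is nonzero
dh_x0 = math.exp(x0)   # the derivative h'(x0)

def h_tilde(x):
    # Normalised function: shift the argument by x0, subtract h(x0), divide by h'(x0).
    return (h(x0 + x) - h(x0)) / dh_x0

def h_tilde_inv(u):
    # For this example h_tilde(x) = exp(x) - 1, so its inverse is log(1 + u).
    return math.log1p(u)

def h_inv(y):
    # Undo the normalisation: rescale and shift y, invert h_tilde, shift back by x0.
    return x0 + h_tilde_inv((y - h(x0)) / dh_x0)

eps = 1e-6
slope_at_0 = (h_tilde(eps) - h_tilde(-eps)) / (2 * eps)  # central difference
print(h_tilde(0.0), slope_at_0, h_inv(5.0), math.log(5.0))
```

The printed values indicate ${\displaystyle {\tilde {h}}(0)=0}$ , a slope close to ${\displaystyle 1}$  at the origin, and agreement of the reconstructed inverse with ${\displaystyle \log }$ .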

Thus let ${\displaystyle f}$  be a function such that ${\displaystyle f(0)=0}$ , ${\displaystyle f}$  is continuously differentiable in a neighbourhood of ${\displaystyle 0}$  and ${\displaystyle f'(0)={\text{Id}}}$ . We define

${\displaystyle g(x):=f(x)-x}$ .

The differential of this function at ${\displaystyle 0}$  is zero (since taking the differential is linear and the differential of the function ${\displaystyle x\mapsto x}$  is the identity). Since the function ${\displaystyle g}$  is also continuously differentiable in a small neighbourhood of ${\displaystyle 0}$ , we find ${\displaystyle \delta >0}$  such that

${\displaystyle \left|{\frac {\partial g_{k}}{\partial x_{j}}}(x)\right|<{\frac {1}{2n^{2}}}}$

for all ${\displaystyle j,k\in \{1,\ldots ,n\}}$  and ${\displaystyle x\in B_{\delta }(0)}$ . Since further ${\displaystyle g(0)=f(0)-0=0}$ , the general mean-value theorem and the Cauchy-Schwarz inequality imply that for ${\displaystyle k\in \{1,\ldots ,n\}}$  and ${\displaystyle x\in B_{\delta }(0)}$ ,

${\displaystyle |g_{k}(x)|=|\langle x,\nabla g_{k}(t_{k}x)\rangle |\leq \|x\|\|\nabla g_{k}(t_{k}x)\|\leq \|x\|n{\frac {1}{2n^{2}}}}$

for suitable ${\displaystyle t_{k}\in [0,1]}$ . Hence,

${\displaystyle \|g(x)\|\leq |g_{1}(x)|+\cdots +|g_{n}(x)|\leq {\frac {1}{2}}\|x\|}$  (triangle inequality),

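and the same argument, applied to ${\displaystyle g_{k}(x)-g_{k}(x')}$  along the segment between two points ${\displaystyle x,x'\in B_{\delta }(0)}$ , shows that ${\displaystyle g}$  is Lipschitz continuous with constant ${\displaystyle 1/2}$  there. Thus, after replacing ${\displaystyle \delta }$  by a slightly smaller radius if necessary (so that the estimates hold on the closed ball), our preparatory lemma is applicable: ${\displaystyle f}$  is injective on ${\displaystyle {\overline {B_{\delta }(0)}}}$ , and its image contains the open set ${\displaystyle B_{\delta /2}(0)}$ . We may therefore pick ${\displaystyle U:=f^{-1}(B_{\delta /2}(0))\cap B_{\delta }(0)}$ , which is open due to the continuity of ${\displaystyle f}$  and on which ${\displaystyle f}$  is a bijection onto ${\displaystyle B_{\delta /2}(0)}$ .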

Thus, the most important part of the theorem is already done. All that is left to do is to prove differentiability of ${\displaystyle f^{-1}}$  at ${\displaystyle 0}$ . In fact, we prove the slightly stronger claim that the differential of ${\displaystyle f^{-1}}$  at ${\displaystyle 0}$  is given by the identity, although this would also follow from the chain rule once differentiability is proven.

Note now that the bound ${\displaystyle \|g(x)\|\leq {\frac {1}{2}}\|x\|}$  implies the following bounds on ${\displaystyle f}$ :

${\displaystyle {\frac {1}{2}}\|x\|\leq \|f(x)\|\leq {\frac {3}{2}}\|x\|}$ .

The second bound follows from

${\displaystyle \|f(x)\|\leq \|f(x)-x\|+\|x\|=\|g(x)\|+\|x\|\leq {\frac {3}{2}}\|x\|}$ ,

and the first bound follows from

${\displaystyle \|f(x)\|\geq |\|f(x)-x\|-\|x\||=\left|\|g(x)\|-\|x\|\right|\geq {\frac {1}{2}}\|x\|}$ .

Now for the differentiability at ${\displaystyle 0}$ . We have, by substitution of limits (as ${\displaystyle f}$  is continuous and ${\displaystyle f(0)=0}$ ):

${\displaystyle {\begin{aligned}\lim _{\mathbf {h} \to 0}{\frac {\|f^{-1}(\mathbf {h} )-f^{-1}(0)-\operatorname {Id} (\mathbf {h} -0)\|}{\|\mathbf {h} \|}}&=\lim _{\mathbf {h} \to 0}{\frac {\|f^{-1}(f(\mathbf {h} ))-f(\mathbf {h} )\|}{\|f(\mathbf {h} )\|}}\\&=\lim _{\mathbf {h} \to 0}{\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{\|f(\mathbf {h} )\|}},\end{aligned}}}$

where the last expression converges to zero due to the differentiability of ${\displaystyle f}$  at ${\displaystyle 0}$  with differential the identity, and the sandwich criterion applied to the expressions

${\displaystyle {\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{{\frac {3}{2}}\|\mathbf {h} \|}}}$

and

${\displaystyle {\frac {\|\mathbf {h} -f(\mathbf {h} )\|}{{\frac {1}{2}}\|\mathbf {h} \|}}}$ .${\displaystyle \Box }$
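The formula ${\displaystyle (f^{-1})'(f(x_{0}))=(f'(x_{0}))^{-1}}$  can be checked numerically on an example where a local inverse is known in closed form. The sketch below assumes the illustrative map ${\displaystyle f(x,y)=(e^{x}\cos y,e^{x}\sin y)}$  and the base point ${\displaystyle x_{0}=(0.3,0.5)}$ ; these choices and all helper names are assumptions made for this example only. The Jacobian of the inverse is approximated by central differences and compared with the inverted Jacobian of ${\displaystyle f}$ .

```python
import math

def f(p):
    x, y = p
    return (math.exp(x) * math.cos(y), math.exp(x) * math.sin(y))

def f_inv(q):
    # Local inverse of f, valid near image points with first coordinate u > 0
    # (this holds at the base point used below).
    u, v = q
    return (0.5 * math.log(u * u + v * v), math.atan2(v, u))

def jacobian_f(p):
    # Exact Jacobian matrix f'(p).
    x, y = p
    e = math.exp(x)
    return [[e * math.cos(y), -e * math.sin(y)],
            [e * math.sin(y), e * math.cos(y)]]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def numerical_jacobian(func, q, eps=1e-6):
    """Central-difference approximation of the 2x2 Jacobian of func at q."""
    cols = []
    for j in range(2):
        plus = list(q)
        minus = list(q)
        plus[j] += eps
        minus[j] -= eps
        fp, fm = func(plus), func(minus)
        cols.append([(fp[i] - fm[i]) / (2 * eps) for i in range(2)])
    # cols[j][i] approximates d func_i / d q_j; transpose so rows are indexed by i.
    return [[cols[j][i] for j in range(2)] for i in range(2)]

x0 = (0.3, 0.5)
lhs = numerical_jacobian(f_inv, f(x0))  # approximates (f^{-1})'(f(x0))
rhs = inverse_2x2(jacobian_f(x0))       # (f'(x0))^{-1}
max_diff = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print(max_diff)
```

The two matrices should agree up to the discretisation error of the finite differences.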

## The implicit function theorem

Theorem:

Let ${\displaystyle f:\mathbb {R} ^{n}\to \mathbb {R} }$  be a continuously differentiable function, and consider the set

${\displaystyle S:=\{(x_{1},\ldots ,x_{n})\in \mathbb {R} ^{n}|f(x_{1},\ldots ,x_{n})=0\}}$ .

If we are given some ${\displaystyle y\in S}$  such that ${\displaystyle \partial _{n}f(y)\neq 0}$ , then we find ${\displaystyle U\subseteq \mathbb {R} ^{n-1}}$  open with ${\displaystyle (y_{1},\ldots ,y_{n-1})\in U}$  and ${\displaystyle g:U\to \mathbb {R} }$  such that

${\displaystyle y_{n}=g(y_{1},\ldots ,y_{n-1})}$  and ${\displaystyle \{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}\subseteq S}$ ,

where ${\displaystyle \{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}}$  is open with respect to the subspace topology of ${\displaystyle S}$ .

Furthermore, ${\displaystyle g}$  is a differentiable function.

Proof:

We define a new function

${\displaystyle F:\mathbb {R} ^{n}\to \mathbb {R} ^{n},F(x_{1},\ldots ,x_{n}):=(x_{1},\ldots ,x_{n-1},f(x_{1},\ldots ,x_{n}))}$ .

The differential of this function looks like this:

${\displaystyle F'(x)={\begin{pmatrix}1&0&\cdots &&0\\0&1&&&\vdots \\\vdots &&\ddots &&\\0&\cdots &0&1&0\\\partial _{1}f(x)&&\cdots &&\partial _{n}f(x)\end{pmatrix}}}$

Since we assumed that ${\displaystyle \partial _{n}f(y)\neq 0}$ , ${\displaystyle F'(y)}$  is invertible, and hence the inverse function theorem implies the existence of a small open neighbourhood ${\displaystyle {\tilde {V}}\subseteq \mathbb {R} ^{n}}$  containing ${\displaystyle y}$  such that restricted to that neighbourhood ${\displaystyle F}$  is itself invertible, with a differentiable inverse ${\displaystyle F^{-1}}$ , which is itself defined on an open set ${\displaystyle {\tilde {U}}}$  containing ${\displaystyle F(y)}$ . Now set first

${\displaystyle U:=\{(x_{1},\ldots ,x_{n-1})|(x_{1},\ldots ,x_{n-1},0)\in {\tilde {U}}\}}$ ,

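which is open in ${\displaystyle \mathbb {R} ^{n-1}}$  (being the preimage of the open set ${\displaystyle {\tilde {U}}}$  under the continuous map ${\displaystyle (x_{1},\ldots ,x_{n-1})\mapsto (x_{1},\ldots ,x_{n-1},0)}$ ), and then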

${\displaystyle g:U\to \mathbb {R} ,g(x_{1},\ldots ,x_{n-1}):=\pi _{n}(F^{-1}(x_{1},\ldots ,x_{n-1},0))}$ ,

the ${\displaystyle n}$ -th component of ${\displaystyle F^{-1}(x_{1},\ldots ,x_{n-1},0)}$ . We claim that ${\displaystyle g}$  has the desired properties.

Indeed, we first note that ${\displaystyle F^{-1}(x_{1},\ldots ,x_{n-1},0)=(x_{1},\ldots ,x_{n-1},g(x_{1},\ldots ,x_{n-1}))}$ , since applying ${\displaystyle F}$  leaves the first ${\displaystyle n-1}$  components unchanged, and thus we get the identity by observing ${\displaystyle F(F^{-1}(x))=x}$ . Let thus ${\displaystyle (z_{1},\ldots ,z_{n-1})\in U}$ . Then

${\displaystyle {\begin{aligned}f(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))&=(\pi _{n}\circ F)(F^{-1}(z_{1},\ldots ,z_{n-1},0))\\&=\pi _{n}((F\circ F^{-1})(z_{1},\ldots ,z_{n-1},0))=0\end{aligned}}}$ .

Furthermore, the set

${\displaystyle \{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}}$

is open with respect to the subspace topology on ${\displaystyle S}$ . Indeed, we show

${\displaystyle \{(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))|(z_{1},\ldots ,z_{n-1})\in U\}=S\cap {\tilde {V}}}$ .

For ${\displaystyle \subseteq }$ , we first note that the set on the left hand side is in ${\displaystyle S}$ , since all points in it are mapped to zero by ${\displaystyle f}$ . Further,

${\displaystyle F(z_{1},\ldots ,z_{n-1},g(z_{1},\ldots ,z_{n-1}))=(z_{1},\ldots ,z_{n-1},0)\in {\tilde {U}}}$

and hence ${\displaystyle \subseteq }$  is completed when applying ${\displaystyle F^{-1}}$ . For the other direction, let a point ${\displaystyle (x_{1},\ldots ,x_{n})}$  in ${\displaystyle S\cap {\tilde {V}}}$  be given, apply ${\displaystyle F}$  to get

${\displaystyle F((x_{1},\ldots ,x_{n}))=(x_{1},\ldots ,x_{n-1},0)\in {\tilde {U}}}$

and hence ${\displaystyle (x_{1},\ldots ,x_{n-1})\in U}$ ; further

${\displaystyle (x_{1},\ldots ,x_{n-1},g(x_{1},\ldots ,x_{n-1}))=(x_{1},\ldots ,x_{n})}$

by applying ${\displaystyle F}$  to both sides of the equation and using that ${\displaystyle F}$  is injective on ${\displaystyle {\tilde {V}}}$ .

Now ${\displaystyle g}$  is automatically differentiable as the component of a differentiable function.${\displaystyle \Box }$

Informally, the above theorem states that given a set ${\displaystyle \{x\in \mathbb {R} ^{n}|f(x)=0\}}$ , one can choose the first ${\displaystyle n-1}$  coordinates as a "base" for a function, whose graph is precisely a local bit of that set.
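As an illustration, consider the unit circle, that is ${\displaystyle f(x,y)=x^{2}+y^{2}-1}$ , near the point ${\displaystyle (0.6,0.8)}$ , where ${\displaystyle \partial _{2}f=1.6\neq 0}$ . The sketch below computes ${\displaystyle g}$  numerically by Newton's method in the last variable; this is merely one convenient way to evaluate ${\displaystyle g}$  and is not the construction used in the proof. It also compares a finite-difference derivative of ${\displaystyle g}$  with the expression ${\displaystyle -\partial _{1}f/\partial _{2}f}$ , which is not stated in the theorem above but follows from differentiating ${\displaystyle f(x,g(x))=0}$  with the chain rule. All function names are assumptions of this example.

```python
import math

def f(x, y):
    return x * x + y * y - 1.0

def df_dx(x, y):
    return 2.0 * x

def df_dy(x, y):
    return 2.0 * y

def g(x, y_start=0.8, steps=50):
    """Solve f(x, y) = 0 for y near y_start by Newton's method in the last variable."""
    y = y_start
    for _ in range(steps):
        y -= f(x, y) / df_dy(x, y)
    return y

x = 0.55
gx = g(x)
eps = 1e-6
g_prime_numeric = (g(x + eps) - g(x - eps)) / (2 * eps)  # central difference
g_prime_formula = -df_dx(x, gx) / df_dy(x, gx)           # chain-rule expression
print(gx, math.sqrt(1 - x * x), g_prime_numeric, g_prime_formula)
```

Here the computed ${\displaystyle g}$  agrees with the explicit upper-semicircle function ${\displaystyle x\mapsto {\sqrt {1-x^{2}}}}$ , whose graph is a local piece of the circle around ${\displaystyle (0.6,0.8)}$ .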