Theorem
A linear transformation $L:\mathbb{R}^n\to\mathbb{R}^m$ amounts to multiplication by a uniquely defined matrix; that is, there exists a unique matrix $A\in\mathbb{R}^{m\times n}$ such that

$$\forall \vec{v}\in\mathbb{R}^n:\quad L(\vec{v})=A\vec{v}$$
Proof
We define the column vectors

$$\begin{pmatrix}a_{1,j}\\a_{2,j}\\\vdots\\a_{m,j}\end{pmatrix} := L(\vec{e}_j)$$

where $\{\vec{e}_1,\ldots,\vec{e}_n\}$ is the standard basis of $\mathbb{R}^n$; note that $L(\vec{e}_j)\in\mathbb{R}^m$, so each column has $m$ entries. From these we define

$$A := \begin{pmatrix}a_{1,1}&\cdots&a_{1,n}\\\vdots&\ddots&\vdots\\a_{m,1}&\cdots&a_{m,n}\end{pmatrix}$$

and note that for any vector $\vec{v}=(v_1,\ldots,v_n)^t$ of $\mathbb{R}^n$ we obtain

$$A\vec{v}=A\left(\sum_{j=1}^n v_j\vec{e}_j\right)=\sum_{j=1}^n v_j A\vec{e}_j=\sum_{j=1}^n v_j L(\vec{e}_j)=L\left(\sum_{j=1}^n v_j\vec{e}_j\right)=L(\vec{v})$$

where the second equality uses linearity of matrix multiplication, the third uses $A\vec{e}_j=L(\vec{e}_j)$ by construction, and the fourth uses linearity of $L$. Thus, we have shown existence. To prove uniqueness, suppose there were another matrix $B\in\mathbb{R}^{m\times n}$ with the property that

$$\forall \vec{v}\in\mathbb{R}^n:\quad L(\vec{v})=B\vec{v}.$$

Then in particular $B\vec{e}_j=L(\vec{e}_j)=A\vec{e}_j$ for every $j$, which already implies $A=B$, since $A\vec{e}_j$ and $B\vec{e}_j$ are precisely the $j$-th columns of the respective matrices.

$\Box$
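To make the construction in this proof concrete, here is a minimal numeric sketch (Python with NumPy; the example map `L` is an assumption chosen for illustration): the matrix of a linear map is recovered by applying the map to the standard basis vectors and using the images as columns.

```python
import numpy as np

def matrix_of(L, n):
    """Return the matrix A whose j-th column is L(e_j),
    where e_j is the j-th standard basis vector of R^n."""
    columns = [L(np.eye(n)[:, j]) for j in range(n)]
    return np.column_stack(columns)

# Hypothetical linear map L : R^3 -> R^2, chosen for illustration
L = lambda v: np.array([2*v[0] + v[2], v[1] - v[2]])

A = matrix_of(L, 3)                # shape (2, 3), i.e. m x n
v = np.array([1.0, -2.0, 3.0])
assert np.allclose(A @ v, L(v))    # A v equals L(v), as the theorem asserts
```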
How to generalise the derivative
It is not immediately clear how to generalise the derivative to higher dimensions. For if we take the definition of the derivative at a point $x_0$,

$$\lim_{h\to 0}\frac{f(x_0+h)-f(x_0)}{h},$$

and insert vectors for $h$ and $x_0$, we end up dividing by a vector, which is not defined.

Hence, we shall rephrase the definition of the derivative a bit and cast it into a form where it can be generalised to higher dimensions.
Theorem
Let $f:\mathbb{R}\to\mathbb{R}$ be a one-dimensional function and let $x_0\in\mathbb{R}$. Then $f$ is differentiable at $x_0$ if and only if there exists a linear function $l:\mathbb{R}\to\mathbb{R}$ such that

$$\lim_{h\to 0}\frac{\big|f(x_0+h)-\big(f(x_0)+l(h)\big)\big|}{|h|}=0$$

We note that, by the theorem above, a linear function $l:\mathbb{R}\to\mathbb{R}$ is given by multiplication by a $1\times 1$ matrix, that is, by a scalar.
Proof
First assume that $f$ is differentiable at $x_0$. We set $l(h):=f'(x_0)\cdot h$ and obtain

$$\frac{\big|f(x_0+h)-\big(f(x_0)+l(h)\big)\big|}{|h|}=\left|\frac{f(x_0+h)-f(x_0)}{h}-f'(x_0)\right|,$$

which converges to $0$ by the definition of $f'(x_0)$.

Assume now that we are given an $l:\mathbb{R}\to\mathbb{R}$ such that

$$\lim_{h\to 0}\frac{\big|f(x_0+h)-\big(f(x_0)+l(h)\big)\big|}{|h|}=0$$

Let $c$ be the scalar associated to $l$, i.e. $l(h)=c\cdot h$. Then the analogous computation, run in reverse, shows that the difference quotient converges and $f'(x_0)=c$.

$\Box$
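As a quick numeric sanity check of this reformulation (a sketch; the choice $f(x)=x^2$ at $x_0=1$ is an assumption for illustration), the error quotient with $l(h)=f'(x_0)h$ visibly shrinks as $h\to 0$:

```python
f = lambda x: x**2
x0, fprime = 1.0, 2.0          # f'(1) = 2 for f(x) = x^2

for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    quotient = abs(f(x0 + h) - (f(x0) + fprime * h)) / abs(h)
    print(h, quotient)          # quotient tends to 0 as h -> 0
```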
With the latter formulation of differentiability from the above theorem, we may readily generalize to higher dimensions, since division by the Euclidean norm of a vector is defined, and linear mappings are also defined in higher dimensions.
Definition
A function $f:\mathbb{R}^m\to\mathbb{R}^n$ is called differentiable (or totally differentiable) at a point $x_0\in\mathbb{R}^m$ if and only if there exists a linear function $L:\mathbb{R}^m\to\mathbb{R}^n$ such that

$$\lim_{\vec{h}\to 0}\frac{\big\|f(x_0+\vec{h})-\big(f(x_0)+L(\vec{h})\big)\big\|}{\|\vec{h}\|}=0$$

We have already proven that this definition coincides with the usual one in the one-dimensional case, that is, when $m=n=1$.
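The same check works in higher dimensions (a sketch; the map $f(x,y)=(x^2y,\ x+y)$ and its hand-computed candidate differential are assumptions for illustration):

```python
import numpy as np

f = lambda x: np.array([x[0]**2 * x[1], x[0] + x[1]])
x0 = np.array([1.0, 2.0])
# Candidate differential at x0, computed by hand: [[2xy, x^2], [1, 1]] at (1, 2)
L = np.array([[4.0, 1.0],
              [1.0, 1.0]])

rng = np.random.default_rng(0)
for scale in [1e-1, 1e-2, 1e-3]:
    h = scale * rng.standard_normal(2)
    err = np.linalg.norm(f(x0 + h) - (f(x0) + L @ h)) / np.linalg.norm(h)
    print(scale, err)   # the quotient tends to 0 as h -> 0
```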
We have the following theorem:
Theorem
Let $S\subseteq\mathbb{R}^m$ be a set, let $x_0\in \overset{\circ}{S}$ be an interior point of $S$, and let $f:S\to\mathbb{R}^n$ be a function differentiable at $x_0$. Then the linear map $L$ such that

$$\lim_{\vec{h}\to 0}\frac{\big\|f(x_0+\vec{h})-\big(f(x_0)+L(\vec{h})\big)\big\|}{\|\vec{h}\|}=0$$

is unique; that is, there exists only one such map $L$.
Proof
Since $x_0$ is an interior point of $S$, we find $r>0$ such that $B_r(x_0)\subseteq S$. Let now $K:\mathbb{R}^m\to\mathbb{R}^n$ be any other linear mapping with the property that

$$\lim_{\vec{h}\to 0}\frac{\big\|f(x_0+\vec{h})-\big(f(x_0)+K(\vec{h})\big)\big\|}{\|\vec{h}\|}=0$$

We note that for each vector $\vec{e}_j$ of the standard basis $\{\vec{e}_1,\ldots,\vec{e}_m\}$ of $\mathbb{R}^m$, the points $x_0+\lambda\vec{e}_j$ for $0\le\lambda<r$ are contained within $B_r(x_0)\subseteq S$. Hence, using the linearity of $L$ and $K$ and the triangle inequality, we obtain for $0<\lambda<r$

$$\big\|L(\vec{e}_j)-K(\vec{e}_j)\big\|=\frac{\big\|L(\lambda\vec{e}_j)-K(\lambda\vec{e}_j)\big\|}{\|\lambda\vec{e}_j\|}\le\frac{\big\|f(x_0+\lambda\vec{e}_j)-\big(f(x_0)+L(\lambda\vec{e}_j)\big)\big\|}{\|\lambda\vec{e}_j\|}+\frac{\big\|f(x_0+\lambda\vec{e}_j)-\big(f(x_0)+K(\lambda\vec{e}_j)\big)\big\|}{\|\lambda\vec{e}_j\|}$$

Taking $\lambda\to 0$, both summands on the right tend to $0$, and we see that $L(\vec{e}_j)=K(\vec{e}_j)$. Thus, $L$ and $K$ coincide on all basis vectors, and since every other vector can be expressed as a linear combination of those, by linearity of $L$ and $K$ we obtain $L=K$.

$\Box$
Thus, the following definition is justified:
Definition
Let $f:S\to\mathbb{R}^n$ be a function (where $S\subseteq\mathbb{R}^m$ is a subset of $\mathbb{R}^m$), and let $x_0$ be an interior point of $S$ such that $f$ is differentiable at $x_0$. Then the unique linear function $L$ such that

$$\lim_{\vec{h}\to 0}\frac{\big\|f(x_0+\vec{h})-\big(f(x_0)+L(\vec{h})\big)\big\|}{\|\vec{h}\|}=0$$

is called the differential of $f$ at $x_0$ and is denoted by $f'(x_0):=L$.
Directional and partial derivatives
We shall first define directional derivatives.
Definition
Let $f:\mathbb{R}^m\to\mathbb{R}^n$ be a function, let $x_0\in\mathbb{R}^m$, and let $\vec{v}\in\mathbb{R}^m$ be a vector. If the limit

$$\lim_{h\to 0}\frac{f(x_0+h\vec{v})-f(x_0)}{h}$$

exists, it is called the directional derivative of $f$ at $x_0$ in direction $\vec{v}$. We denote it by $D_{\vec{v}}f(x_0)$.
The following theorem relates directional derivatives and the differential of a totally differentiable function:
Theorem
Let $f:\mathbb{R}^m\to\mathbb{R}^n$ be a function that is totally differentiable at $x_0$, and let $\vec{v}\in\mathbb{R}^m\setminus\{0\}$ be a nonzero vector. Then $D_{\vec{v}}f(x_0)$ exists and is equal to $f'(x_0)\vec{v}$.
Proof
According to the very definition of total differentiability, applied with $\vec{h}=h\vec{v}$ (so that $\|\vec{h}\|=|h|\cdot\|\vec{v}\|$),

$$\lim_{h\to 0}\left\|\frac{f(x_0+h\vec{v})-f(x_0)}{|h|\cdot\|\vec{v}\|}-\frac{f'(x_0)(h\vec{v})}{|h|\cdot\|\vec{v}\|}\right\|=0$$

Multiplying the expression inside the limit by the positive constant $\|\vec{v}\|$ and using the linearity of $f'(x_0)$ to write $f'(x_0)(h\vec{v})=h\,f'(x_0)\vec{v}$, we obtain

$$\lim_{h\to 0}\left\|\frac{f(x_0+h\vec{v})-f(x_0)}{|h|}-\frac{h\,f'(x_0)\vec{v}}{|h|}\right\|=0$$

Noting that $|h|/h=\pm 1$, so that multiplying the vector inside the norm by it leaves the norm unchanged,

$$\left\|\frac{f(x_0+h\vec{v})-f(x_0)}{|h|}-\frac{h\,f'(x_0)\vec{v}}{|h|}\right\|=\left\|\frac{f(x_0+h\vec{v})-f(x_0)}{h}-f'(x_0)\vec{v}\right\|$$

the theorem follows.

$\Box$
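Numerically, this theorem can be checked by comparing a difference quotient against the differential applied to $\vec{v}$ (a sketch; the map, point, and direction are assumptions for illustration):

```python
import numpy as np

f = lambda x: np.array([x[0]**2 * x[1], x[0] + x[1]])
x0 = np.array([1.0, 2.0])
J = np.array([[4.0, 1.0],    # differential (Jacobian) of f at x0, by hand
              [1.0, 1.0]])
v = np.array([3.0, -1.0])

h = 1e-6
directional = (f(x0 + h * v) - f(x0)) / h   # difference quotient
print(directional, J @ v)                    # both approx. [11., 2.]
```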
Partial derivatives are a special case of directional derivatives:
Definition
Let $\{\vec{e}_1,\ldots,\vec{e}_m\}$ be the standard basis of $\mathbb{R}^m$, let $x_0\in\mathbb{R}^m$ and let $f:\mathbb{R}^m\to\mathbb{R}^n$ be a function such that the directional derivatives $D_{\vec{e}_j}f(x_0)$ all exist. Then we set

$$\frac{\partial f}{\partial x_j}(x_0):=D_{\vec{e}_j}f(x_0)$$

and call it the partial derivative of $f$ in the direction of $x_j$.
In fact, by writing down the definition of $D_{\vec{e}_j}f(x_0)$, we see that the partial derivative in the direction of $x_j$ is nothing else than the derivative of the one-variable function

$$y\mapsto f(x_{0,1},\ldots,x_{0,j-1},y,x_{0,j+1},\ldots,x_{0,m})$$

at the point $y=x_{0,j}$. For instance, if

$$f(x,y,z)=x^2+4z^3+3xy$$

then

$$\frac{\partial f}{\partial x}=2x+3y\ ,\quad\frac{\partial f}{\partial y}=3x\ ,\quad\frac{\partial f}{\partial z}=12z^2$$

That is, when forming a partial derivative, we regard the other variables as constant and differentiate only with respect to the variable under consideration.
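A quick symbolic check of this example (a sketch assuming SymPy is available):

```python
import sympy as sp

x, y, z = sp.symbols('x y z')
f = x**2 + 4*z**3 + 3*x*y

print(sp.diff(f, x))   # 2*x + 3*y
print(sp.diff(f, y))   # 3*x
print(sp.diff(f, z))   # 12*z**2
```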
From the above, we know that the differential $f'(x_0)$ of a function is a linear map and hence has an associated matrix. Under a suitable condition, we can determine this matrix from the partial derivatives of the component functions.
Theorem
Let $f:\mathbb{R}^m\to\mathbb{R}^n$ be a function such that all partial derivatives of $f$ exist on $B_r(x_0)$ for a possibly very small, but positive $r>0$, and are continuous there. Then $f$ is totally differentiable at $x_0$, and the differential of $f$ is given by left multiplication by the matrix

$$J_f(x_0):=\begin{pmatrix}\dfrac{\partial f_1}{\partial x_1}&\cdots&\dfrac{\partial f_1}{\partial x_m}\\\vdots&\ddots&\vdots\\\dfrac{\partial f_n}{\partial x_1}&\cdots&\dfrac{\partial f_n}{\partial x_m}\end{pmatrix}$$

where $f=(f_1,\ldots,f_n)$ and all partial derivatives are evaluated at $x_0$.

The matrix $J_f(x_0)$ is called the Jacobian matrix of $f$ at $x_0$.
Proof
We compute, writing $f=(f_1,\ldots,f_n)$ and denoting by $\vec{e}_1,\ldots,\vec{e}_n$ the standard basis of $\mathbb{R}^n$,

$$\frac{\big\|f(x_0+\vec{h})-\big(f(x_0)+J_f(x_0)\vec{h}\big)\big\|}{\|\vec{h}\|}=\frac{\left\|\displaystyle\sum_{j=1}^n f_j(x_0+\vec{h})\,\vec{e}_j-\sum_{j=1}^n\left(f_j(x_0)+\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\vec{e}_j\right\|}{\|\vec{h}\|}\le\sum_{j=1}^n\frac{\left|f_j(x_0+\vec{h})-\left(f_j(x_0)+\displaystyle\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\right|}{\|\vec{h}\|}$$

where the last step is the triangle inequality. We shall now prove that every summand of the last sum goes to $0$ as $\vec{h}\to\vec{0}$.

Indeed, let $j\in\{1,\ldots,n\}$. Writing $\vec{h}=(h_1,\ldots,h_m)$ (with $\vec{e}_1,\ldots,\vec{e}_m$ now the standard basis of $\mathbb{R}^m$), we obtain by the one-dimensional mean value theorem, applied first in the first variable, then in the second and so on, the succession of equations

$$f_j(x_0+h_1\vec{e}_1)-f_j(x_0)=h_1\,\frac{\partial f_j}{\partial x_1}(x_0+t_1\vec{e}_1)$$

$$f_j(x_0+h_1\vec{e}_1+h_2\vec{e}_2)-f_j(x_0+h_1\vec{e}_1)=h_2\,\frac{\partial f_j}{\partial x_2}(x_0+h_1\vec{e}_1+t_2\vec{e}_2)$$

$$\vdots$$

$$f_j(x_0+h_1\vec{e}_1+\cdots+h_m\vec{e}_m)-f_j(x_0+h_1\vec{e}_1+\cdots+h_{m-1}\vec{e}_{m-1})=h_m\,\frac{\partial f_j}{\partial x_m}(x_0+h_1\vec{e}_1+\cdots+h_{m-1}\vec{e}_{m-1}+t_m\vec{e}_m)$$

for suitably chosen $t_k$ between $0$ and $h_k$. Summing all these equations, the left-hand sides telescope and we obtain

$$f_j(x_0+\vec{h})-f_j(x_0)=\sum_{k=1}^m h_k\,\frac{\partial f_j}{\partial x_k}\left(x_0+\sum_{l=1}^{k-1}h_l\vec{e}_l+t_k\vec{e}_k\right)$$

Let now $\epsilon>0$. Using the continuity of the $\frac{\partial f_j}{\partial x_k}$ on $B_r(x_0)$, we may choose $\delta>0$ with $\delta\le r$ such that

$$\left|\frac{\partial f_j}{\partial x_k}\left(x_0+\sum_{l=1}^{k-1}h_l\vec{e}_l+t_k\vec{e}_k\right)-\frac{\partial f_j}{\partial x_k}(x_0)\right|<\frac{\epsilon}{m}$$

for all $j,k$ whenever $\|\vec{h}\|<\delta$ (note that all the intermediate points then lie in $B_r(x_0)$). Hence, inserting the telescoped identity and using $|h_k|\le\|\vec{h}\|$, we obtain

$$\frac{\left|f_j(x_0+\vec{h})-\left(f_j(x_0)+\displaystyle\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\right|}{\|\vec{h}\|}\le\frac{\displaystyle\sum_{k=1}^m|h_k|\cdot\frac{\epsilon}{m}}{\|\vec{h}\|}\le\frac{\|\vec{h}\|\cdot m\cdot\frac{\epsilon}{m}}{\|\vec{h}\|}=\epsilon$$

and thus the theorem.

$\Box$
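As a numeric illustration of this theorem (a sketch; the forward-difference step size and example map are assumptions), the Jacobian can be approximated column by column, since its $k$-th column is the partial derivative of $f$ with respect to $x_k$:

```python
import numpy as np

def jacobian_fd(f, x0, eps=1e-7):
    """Forward-difference approximation of the Jacobian of f at x0.
    Column k approximates the partial derivative of f w.r.t. x_k."""
    f0 = f(x0)
    cols = []
    for k in range(len(x0)):
        e_k = np.zeros(len(x0))
        e_k[k] = 1.0
        cols.append((f(x0 + eps * e_k) - f0) / eps)
    return np.column_stack(cols)

f = lambda x: np.array([x[0]**2 * x[1], x[0] + x[1]])
x0 = np.array([1.0, 2.0])
print(jacobian_fd(f, x0))   # approx. [[4., 1.], [1., 1.]]
```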
Corollary
If $f:\mathbb{R}^m\to\mathbb{R}^n$ is continuously differentiable at $x_0\in\mathbb{R}^m$ and $\vec{v}=(v_1,\ldots,v_m)\in\mathbb{R}^m\setminus\{0\}$, then

$$D_{\vec{v}}f(x_0)=\sum_{j=1}^m v_j\frac{\partial f}{\partial x_j}(x_0)$$
Proof
$$D_{\vec{v}}f(x_0)=f'(x_0)(\vec{v})=J_f(x_0)\vec{v}=\sum_{j=1}^m v_j\frac{\partial f}{\partial x_j}(x_0)$$

where the first equality is the theorem relating directional derivatives to the differential, the second is the preceding theorem, and the third writes out the matrix-vector product column by column.

$\Box$
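For the example $f(x,y,z)=x^2+4z^3+3xy$ from above, the corollary can be checked numerically (a sketch; the point and direction are assumptions for illustration):

```python
import numpy as np

f = lambda p: p[0]**2 + 4*p[2]**3 + 3*p[0]*p[1]
# Partial derivatives computed above: (2x + 3y, 3x, 12z^2)
grad = lambda p: np.array([2*p[0] + 3*p[1], 3*p[0], 12*p[2]**2])

p0 = np.array([1.0, 1.0, 1.0])
v = np.array([1.0, 2.0, 3.0])

h = 1e-6
print((f(p0 + h * v) - f(p0)) / h)   # difference quotient, approx. 47
print(grad(p0) @ v)                   # sum_j v_j * partial_j f = 47.0
```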