The matrix of a linear transformation
Theorem
A linear transformation $L:\mathbb{R}^n\to\mathbb{R}^m$ amounts to multiplication by a uniquely defined matrix; that is, there exists a unique matrix $A\in\mathbb{R}^{m\times n}$ such that

$$\forall\vec v\in\mathbb{R}^n:\quad L(\vec v)=A\vec v.$$
Proof
We set the column vectors

$$\begin{pmatrix}a_{1,j}\\a_{2,j}\\\vdots\\a_{m,j}\end{pmatrix}:=L(\vec e_j)$$
where $\{\vec e_1,\ldots,\vec e_n\}$ is the standard basis of $\mathbb{R}^n$. Then we define from this

$$A:=\begin{pmatrix}a_{1,1}&\cdots&a_{1,n}\\\vdots&\ddots&\vdots\\a_{m,1}&\cdots&a_{m,n}\end{pmatrix}\in\mathbb{R}^{m\times n}$$
and note that for any vector $\vec v=(v_1,\ldots,v_n)^t$ of $\mathbb{R}^n$
we obtain

$$A\vec v=A\left(\sum_{j=1}^n v_j\vec e_j\right)=\sum_{j=1}^n v_jA\vec e_j=\sum_{j=1}^n v_jL(\vec e_j)=L\left(\sum_{j=1}^n v_j\vec e_j\right)=L(\vec v),$$

since $A\vec e_j$ is precisely the $j$-th column of $A$, which we defined to be $L(\vec e_j)$.
Thus, we have shown existence. To prove uniqueness, suppose there were another matrix $B\in\mathbb{R}^{m\times n}$ with the property that

$$\forall\vec v\in\mathbb{R}^n:\quad L(\vec v)=B\vec v.$$
Then in particular $B\vec e_j=L(\vec e_j)=A\vec e_j$ for every $j$, which already implies that $A=B$ (since all the columns of both matrices are identical).
$\Box$
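For example (an illustration of the construction in the proof, with an ad-hoc choice of map), consider $L:\mathbb{R}^2\to\mathbb{R}^2$, $L(v_1,v_2):=(v_1+2v_2,\,3v_1)$. Its values on the standard basis are $L(\vec e_1)=(1,3)^t$ and $L(\vec e_2)=(2,0)^t$, and these become the columns of

$$A=\begin{pmatrix}1&2\\3&0\end{pmatrix},\qquad A\begin{pmatrix}v_1\\v_2\end{pmatrix}=\begin{pmatrix}v_1+2v_2\\3v_1\end{pmatrix}=L(\vec v).$$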
How to generalise the derivative
It is not immediately straightforward how one would generalize the derivative to higher dimensions. For, if we take the definition of the derivative at a point $x_0$,

$$\lim_{h\to 0}\frac{f(x_0+h)-f(x_0)}{h},$$

and insert vectors for $h$ and $x_0$, we would divide by a vector. But this is not defined.
Hence, we shall rephrase the definition of the derivative a bit and cast it into a form where it can be generalized to higher dimensions.
Theorem
Let $f:\mathbb{R}\to\mathbb{R}$ be a one-dimensional function and let $x_0\in\mathbb{R}$. Then $f$ is differentiable at $x_0$ if and only if there exists a linear function $l:\mathbb{R}\to\mathbb{R}$ such that

$$\lim_{h\to 0}\frac{\big|f(x_0+h)-\big(f(x_0)+l(h)\big)\big|}{|h|}=0.$$
We note that, according to the above, linear functions $l:\mathbb{R}\to\mathbb{R}$ are given by multiplication by a $1\times 1$ matrix, that is, by a scalar.
Proof
First assume that $f$ is differentiable at $x_0$. We set $l(h):=f'(x_0)\cdot h$ and obtain

$$\frac{\big|f(x_0+h)-\big(f(x_0)+l(h)\big)\big|}{|h|}=\left|\frac{f(x_0+h)-f(x_0)}{h}-f'(x_0)\right|,$$

which converges to $0$ by the definition of $f'(x_0)$.
Assume now that we are given an $l:\mathbb{R}\to\mathbb{R}$ such that

$$\lim_{h\to 0}\frac{\big|f(x_0+h)-\big(f(x_0)+l(h)\big)\big|}{|h|}=0.$$
Let $c$ be the scalar associated to $l$, so that $l(h)=c\cdot h$. Then an analogous computation shows that the difference quotient $\frac{f(x_0+h)-f(x_0)}{h}$ converges to $c$, that is, $f'(x_0)=c$.
$\Box$
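As a quick illustration of the reformulated definition (our own example, not part of the proof), take $f(x)=x^2$ and $l(h):=2x_0\cdot h$. Then

$$\frac{\big|f(x_0+h)-\big(f(x_0)+l(h)\big)\big|}{|h|}=\frac{\big|x_0^2+2x_0h+h^2-x_0^2-2x_0h\big|}{|h|}=\frac{h^2}{|h|}=|h|\longrightarrow 0\quad(h\to 0),$$

confirming that $f'(x_0)=2x_0$.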
With the latter formulation of differentiability from the above theorem, we may readily generalize to higher dimensions, since division by the Euclidean norm of a vector is defined, and linear mappings are also defined in higher dimensions.
Definition
A function $f:\mathbb{R}^m\to\mathbb{R}^n$ is called differentiable or totally differentiable at a point $x_0\in\mathbb{R}^m$ if and only if there exists a linear function $L:\mathbb{R}^m\to\mathbb{R}^n$ such that

$$\lim_{\vec h\to 0}\frac{\big\|f(x_0+\vec h)-\big(f(x_0)+L(\vec h)\big)\big\|}{\|\vec h\|}=0.$$
We have already proven that this definition coincides with the usual one in the one-dimensional case (that is, $m=n=1$).
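This definition can also be checked numerically. The following is a minimal sketch (the function, point and candidate map are ad-hoc choices for illustration, not taken from the text): it watches the quotient from the definition shrink as $\|\vec h\|\to 0$.

```python
import numpy as np

# Ad-hoc example function f : R^2 -> R^2.
def f(x):
    return np.array([x[0] ** 2 + x[1], x[0] * x[1]])

x0 = np.array([1.0, 2.0])

# Candidate linear map L, written as a matrix; here it is the matrix of
# partial derivatives of f at x0, computed by hand for this particular f.
L = np.array([[2 * x0[0], 1.0],
              [x0[1],     x0[0]]])

# The quotient ||f(x0 + h) - (f(x0) + L h)|| / ||h|| should tend to 0.
rng = np.random.default_rng(0)
direction = rng.normal(size=2)
direction /= np.linalg.norm(direction)
for t in (1e-1, 1e-2, 1e-3, 1e-4):
    h = t * direction
    quotient = np.linalg.norm(f(x0 + h) - (f(x0) + L @ h)) / np.linalg.norm(h)
    print(f"||h|| = {t:.0e}:  quotient = {quotient:.2e}")
```

The printed quotients decrease roughly in proportion to $\|\vec h\|$, as one expects for this polynomial example.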
We have the following theorem:
Theorem
Let $S\subseteq\mathbb{R}^m$ be a set, let $x_0\in{\overset{\circ}{S}}$ be an interior point of $S$, and let $f:S\to\mathbb{R}^n$ be a function differentiable at $x_0$. Then the linear map $L$ such that

$$\lim_{\vec h\to 0}\frac{\big\|f(x_0+\vec h)-\big(f(x_0)+L(\vec h)\big)\big\|}{\|\vec h\|}=0$$

is unique; that is, there exists only one such map $L$.
Proof
Since $x_0$ is an interior point of $S$, we find $r>0$ such that $B_r(x_0)\subseteq S$. Let now $K:\mathbb{R}^m\to\mathbb{R}^n$ be any other linear mapping with the property that

$$\lim_{\vec h\to 0}\frac{\big\|f(x_0+\vec h)-\big(f(x_0)+K(\vec h)\big)\big\|}{\|\vec h\|}=0.$$
We note that for all vectors of the standard basis $\{\vec e_1,\ldots,\vec e_m\}$ of $\mathbb{R}^m$, the points $x_0+\lambda\vec e_j$ for $0\leq\lambda<r$ are contained within $S$. Hence, for $0<\lambda<r$ we obtain by the triangle inequality
$$\big\|L(\vec e_j)-K(\vec e_j)\big\|=\frac{\big\|L(\lambda\vec e_j)-K(\lambda\vec e_j)\big\|}{\|\lambda\vec e_j\|}\leq\frac{\big\|f(x_0+\lambda\vec e_j)-\big(f(x_0)+L(\lambda\vec e_j)\big)\big\|}{\|\lambda\vec e_j\|}+\frac{\big\|f(x_0+\lambda\vec e_j)-\big(f(x_0)+K(\lambda\vec e_j)\big)\big\|}{\|\lambda\vec e_j\|}.$$
Taking $\lambda\to 0$, we see that $L(\vec e_j)=K(\vec e_j)$. Thus, $L$ and $K$ coincide on all basis vectors, and since every other vector can be expressed as a linear combination of those, by linearity of $L$ and $K$ we obtain $L=K$.
$\Box$
Thus, the following definition is justified:
Definition
Let $f:S\to\mathbb{R}^n$ be a function (where $S\subseteq\mathbb{R}^m$ is a subset of $\mathbb{R}^m$), and let $x_0$ be an interior point of $S$ such that $f$ is differentiable at $x_0$. Then the unique linear function $L$ such that

$$\lim_{\vec h\to 0}\frac{\big\|f(x_0+\vec h)-\big(f(x_0)+L(\vec h)\big)\big\|}{\|\vec h\|}=0$$
is called the differential of $f$ at $x_0$ and is denoted by $f'(x_0):=L$.
Directional and partial derivatives
We shall first define directional derivatives.
Definition
Let $f:\mathbb{R}^m\to\mathbb{R}^n$ be a function, let $x_0\in\mathbb{R}^m$ be a point and let $\vec v\in\mathbb{R}^m$ be a vector. If the limit

$$\lim_{h\to 0}\frac{f(x_0+h\vec v)-f(x_0)}{h}$$

exists, it is called the directional derivative of $f$ at $x_0$ in direction $\vec v$. We denote it by $D_{\vec v}f(x_0)$.
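For instance (our own computation), take $f:\mathbb{R}^2\to\mathbb{R}$, $f(x,y)=xy$, the point $x_0=(a,b)$ and the direction $\vec v=(1,2)$. Then

$$\frac{f(x_0+h\vec v)-f(x_0)}{h}=\frac{(a+h)(b+2h)-ab}{h}=\frac{(2a+b)h+2h^2}{h}=2a+b+2h\longrightarrow 2a+b\quad(h\to 0),$$

so $D_{\vec v}f(x_0)=2a+b$.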
The following theorem relates directional derivatives and the differential of a totally differentiable function:
Theorem
Let $f:\mathbb{R}^m\to\mathbb{R}^n$ be a function that is totally differentiable at $x_0$, and let $\vec v\in\mathbb{R}^m\setminus\{0\}$ be a nonzero vector. Then $D_{\vec v}f(x_0)$ exists and is equal to $f'(x_0)\vec v$.
Proof
According to the very definition of total differentiability, applied with $\vec h=h\vec v$ (so that $\|\vec h\|=|h|\cdot\|\vec v\|$),

$$\lim_{h\to 0}\left\|\frac{f(x_0+h\vec v)-f(x_0)}{|h|\cdot\|\vec v\|}-\frac{h\,f'(x_0)\vec v}{|h|\cdot\|\vec v\|}\right\|=0.$$
Hence,

$$\lim_{h\to 0}\left\|\frac{f(x_0+h\vec v)-f(x_0)}{|h|}-\frac{h\,f'(x_0)\vec v}{|h|}\right\|=0$$

by multiplying the above equation by $\|\vec v\|$.
Noting that, since $|h|=\pm h$ and the norm is invariant under a change of sign,

$$\left\|\frac{f(x_0+h\vec v)-f(x_0)}{|h|}-\frac{h\,f'(x_0)\vec v}{|h|}\right\|=\left\|\frac{f(x_0+h\vec v)-f(x_0)}{h}-f'(x_0)\vec v\right\|,$$
the theorem follows.
$\Box$
A special case of directional derivatives is given by the partial derivatives:
Definition
Let $\{\vec e_1,\ldots,\vec e_m\}$ be the standard basis of $\mathbb{R}^m$, let $x_0\in\mathbb{R}^m$ and let $f:\mathbb{R}^m\to\mathbb{R}^n$ be a function such that the directional derivatives $D_{{\vec e}_j}f(x_0)$ all exist. Then we set

$$\frac{\partial f}{\partial x_j}(x_0):=D_{{\vec e}_j}f(x_0)$$

and call it the partial derivative in the direction of $x_j$.
In fact, by writing down the definition of $D_{{\vec e}_j}f(x_0)$, we see that the partial derivative in the direction of $x_j$ is nothing else than the derivative of the function

$$y\mapsto f(x_{0,1},\ldots,x_{0,j-1},y,x_{0,j+1},\ldots,x_{0,m})$$

in the variable $y$ at the place $x_{0,j}$.
That is, for instance, if $f(x,y,z)=x^2+4z^3+3xy$, then

$$\frac{\partial f}{\partial x}=2x+3y\,,\qquad\frac{\partial f}{\partial y}=3x\,,\qquad\frac{\partial f}{\partial z}=12z^2;$$
that is, when forming a partial derivative, we regard the other variables as constant and differentiate only with respect to the variable we are considering.
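These partial derivatives can also be verified symbolically; here is a minimal sketch using the sympy library (assuming it is available; the check is ours, not part of the text):

```python
import sympy as sp

# Symbolic check of the partial derivatives computed above.
x, y, z = sp.symbols("x y z")
f = x**2 + 4 * z**3 + 3 * x * y

# Differentiating with respect to one variable treats the others as constants.
print(sp.diff(f, x))  # 2*x + 3*y
print(sp.diff(f, y))  # 3*x
print(sp.diff(f, z))  # 12*z**2
```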
The Jacobian matrix
From the above, we know that the differential $f'(x_0)$ of a function is a linear map and hence has an associated matrix. Under a suitable condition, we can determine this matrix from the partial derivatives of the component functions.
Theorem
Let $f:\mathbb{R}^m\to\mathbb{R}^n$ be a function such that all partial derivatives exist and are continuous in each component on $B_r(x_0)$ for a possibly very small, but positive $r>0$. Then $f$ is totally differentiable at $x_0$, and the differential of $f$ is given by left multiplication by the matrix

$$J_f(x_0):=\begin{pmatrix}\dfrac{\partial f_1}{\partial x_1}(x_0)&\cdots&\dfrac{\partial f_1}{\partial x_m}(x_0)\\\vdots&\ddots&\vdots\\\dfrac{\partial f_n}{\partial x_1}(x_0)&\cdots&\dfrac{\partial f_n}{\partial x_m}(x_0)\end{pmatrix},$$

where $f=(f_1,\ldots,f_n)$.
The matrix $J_f(x_0)$ is called the Jacobian matrix.
Proof
$$\frac{\big\|f(x_0+\vec h)-\big(f(x_0)+J_f(x_0)\vec h\big)\big\|}{\|\vec h\|}=\frac{\left\|\displaystyle\sum_{j=1}^n f_j(x_0+\vec h)\,\vec e_j-\sum_{j=1}^n\left(f_j(x_0)+\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\vec e_j\right\|}{\|\vec h\|}\leq\sum_{j=1}^n\frac{\left|f_j(x_0+\vec h)-\left(f_j(x_0)+\displaystyle\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\right|}{\|\vec h\|}$$
We shall now prove that all summands of the last sum go to 0.
Indeed, let $j\in\{1,\ldots,n\}$. Writing again $\vec h=(h_1,\ldots,h_m)$, we obtain by the one-dimensional mean value theorem, first applied in the first variable, then in the second and so on, the succession of equations
$$f_j(x_0+h_1\vec e_1)-f_j(x_0)=\overbrace{(x_{0,1}+h_1-x_{0,1})}^{=h_1}\,\frac{\partial f_j}{\partial x_1}(x_0+t_1\vec e_1)$$
$$f_j(x_0+h_1\vec e_1+h_2\vec e_2)-f_j(x_0+h_1\vec e_1)=\overbrace{(x_{0,2}+h_2-x_{0,2})}^{=h_2}\,\frac{\partial f_j}{\partial x_2}(x_0+h_1\vec e_1+t_2\vec e_2)$$

$$\vdots$$
$$f_j(x_0+h_1\vec e_1+\cdots+h_m\vec e_m)-f_j(x_0+h_1\vec e_1+\cdots+h_{m-1}\vec e_{m-1})=\overbrace{(x_{0,m}+h_m-x_{0,m})}^{=h_m}\,\frac{\partial f_j}{\partial x_m}(x_0+h_1\vec e_1+\cdots+h_{m-1}\vec e_{m-1}+t_m\vec e_m)$$
for suitably chosen $t_k$ between $0$ and $h_k$.
. We can now sum all these equations together to obtain
$$f_j(x_0+\vec h)-f_j(x_0)=\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}\left(x_0+\sum_{l=1}^{k-1}h_l\vec e_l+t_k\vec e_k\right)$$
Let now $\epsilon>0$. Using the continuity of the $\frac{\partial f_j}{\partial x_k}$ on $B_r(x_0)$, we may choose $\delta_k>0$ such that

$$\left|\frac{\partial f_j}{\partial x_k}\left(x_0+\sum_{l=1}^{k-1}h_l\vec e_l+t_k\vec e_k\right)-\frac{\partial f_j}{\partial x_k}(x_0)\right|<\frac{\epsilon}{m}$$
for $|h_k|<\delta_k$, given that $\vec h\in B_r(0)$ (which we may assume, since $\vec h\to\vec 0$). Hence, we obtain
$$\frac{\left|f_j(x_0+\vec h)-\left(f_j(x_0)+\displaystyle\sum_{k=1}^m h_k\frac{\partial f_j}{\partial x_k}(x_0)\right)\right|}{\|\vec h\|}\leq\frac{\|\vec h\|\cdot m\cdot\frac{\epsilon}{m}}{\|\vec h\|}=\epsilon,$$

and thus the theorem follows.
$\Box$
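To make the theorem concrete, the following sketch (an ad-hoc illustration with a function of our own choosing, not from the text) approximates each column of the Jacobian by the difference quotient defining $D_{{\vec e}_j}f(x_0)$ and compares it with the hand-computed partial derivatives:

```python
import numpy as np

# Ad-hoc example f : R^2 -> R^2 with hand-computed Jacobian [[2x, 1], [y, x]].
def f(v):
    x, y = v
    return np.array([x**2 + y, x * y])

def jacobian_fd(f, x0, eps=1e-6):
    """Approximate the Jacobian: column j is the difference quotient
    (f(x0 + eps*e_j) - f(x0)) / eps, approximating D_{e_j} f(x0)."""
    x0 = np.asarray(x0, dtype=float)
    cols = []
    for j in range(x0.size):
        e_j = np.zeros_like(x0)
        e_j[j] = 1.0
        cols.append((f(x0 + eps * e_j) - f(x0)) / eps)
    return np.column_stack(cols)

x0 = np.array([1.0, 2.0])
exact = np.array([[2 * x0[0], 1.0],
                  [x0[1],     x0[0]]])
print(jacobian_fd(f, x0))                          # approximately `exact`
print(np.max(np.abs(jacobian_fd(f, x0) - exact)))  # small error, O(eps)
```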
Corollary
If $f:\mathbb{R}^m\to\mathbb{R}^n$ is continuously differentiable at $x_0\in\mathbb{R}^m$ and $\vec v\in\mathbb{R}^m\setminus\{0\}$, then

$$D_{\vec v}f(x_0)=\sum_{j=1}^m v_j\frac{\partial f}{\partial x_j}(x_0).$$
Proof
$$D_{\vec v}f(x_0)=f'(x_0)(\vec v)=J_f(x_0)\vec v=\sum_{j=1}^m v_j\frac{\partial f}{\partial x_j}(x_0)$$
$\Box$
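For instance (our own computation), for the function $f(x,y,z)=x^2+4z^3+3xy$ from the previous section and the direction $\vec v=(1,1,1)$, the corollary yields

$$D_{\vec v}f(x,y,z)=1\cdot(2x+3y)+1\cdot 3x+1\cdot 12z^2=5x+3y+12z^2.$$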