Calculus/Multivariate optimisation: The Lagrangian

The problem

In previous sections, we've discussed how, using calculus, we can find the optimal solution of a single-variable function $y=f(x)$ by finding all points where $f'(x)=0$ . But what if we're given a bivariate function - for example, $z=f(x,y)$ ? More importantly, what if we're given constraints to follow? Setting a single derivative to zero is no longer enough: we need a method that handles several variables and the constraints at the same time.

One variable, one constraint

Consider the optimisation problem $\min {f(x)}$ given a constraint $g(x)=h$ .

First write the constraint in such a way that it's equal to 0 - so $g(x)-h=0$ . Then the Lagrangian of the system is defined by $L(x,\lambda )=f(x)+\lambda (g(x)-h)$ . We have two variables to optimise - $x$ and $\lambda$ . Then find the derivatives with respect to the variables:

${\frac {\partial L}{\partial x}}=f'(x)+\lambda g'(x)$

${\frac {\partial L}{\partial \lambda }}=g(x)-h$

Set both to 0. The candidate pairs $\{x,\lambda \}$ are then the solutions of the system $f'(x)+\lambda g'(x)=0$ and $g(x)=h$ .


Editor's note
Some authors only take the partial derivatives with respect to the original variables and then impose the constraints directly, rather than differentiating with respect to $\lambda$ . This is completely fine: setting ${\frac {\partial L}{\partial \lambda }}=0$ just reproduces the constraint, so the two approaches are equivalent.

The distinction matters for constrained optimisation with inequality constraints using the KKT conditions (not covered in this section). There, setting the partial derivative with respect to each multiplier to zero would force every constraint to hold with equality, turning the problem into a Lagrangian equality problem - which isn't the same. It's easy to get the two mixed up.

A simple univariate example

Example. Solve the optimisation problem $\min {5x+3}$ given the constraint $x^{2}=25$ .

Then the Lagrangian system is $L(x,\lambda )=5x+3+\lambda (x^{2}-25)$ . Take the respective derivatives:

${\frac {\partial L}{\partial x}}=5+\lambda (2x)$

${\frac {\partial L}{\partial \lambda }}=x^{2}-25$

Set the second to zero - we get $x=\pm 5$ . Substitute each value into the first equation. With $x=5$ we get $5+10\lambda =0$ , which is $\lambda =-{\frac {1}{2}}$ . With $x=-5$ we get $5-10\lambda =0$ , which is $\lambda ={\frac {1}{2}}$ . Evaluating the objective at both points gives $5(-5)+3=-22$ and $5(5)+3=28$ . In this case, the optimal minimum is the set $\{x,\lambda \}=\{-5,{\frac {1}{2}}\}$ (which is what we're looking for) and the optimal maximum is the set $\{x,\lambda \}=\{5,{\frac {-1}{2}}\}$ .

It is important to realise that the Lagrangian does not guarantee that a particular solution is a minimum - we need to test the solution ourselves - as in one case the solution was actually the maximum.
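The check described above can be sketched in a few lines of Python. This is only an illustration of the worked example: the function names `f`, `dL_dx` and `dL_dlam` are made up here, and the candidate pairs are copied from the solution rather than solved for by the code.

```python
# Candidate (x, lambda) pairs from the worked example above.
def f(x):
    return 5 * x + 3          # objective

def dL_dx(x, lam):
    return 5 + lam * 2 * x    # partial derivative of L with respect to x

def dL_dlam(x):
    return x ** 2 - 25        # partial derivative of L with respect to lambda

candidates = [(5.0, -0.5), (-5.0, 0.5)]

for x, lam in candidates:
    # Both partial derivatives vanish at every candidate.
    assert dL_dx(x, lam) == 0 and dL_dlam(x) == 0

# The Lagrangian does not say which candidate is the minimum,
# so compare the objective values directly.
best_x, best_lam = min(candidates, key=lambda c: f(c[0]))
worst_x, worst_lam = max(candidates, key=lambda c: f(c[0]))
print(best_x, f(best_x))    # -5.0 -22.0
print(worst_x, f(worst_x))  # 5.0 28.0
```

Both candidates pass the derivative test, and only the comparison of objective values tells them apart.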

As you may have noticed, this is not a very useful example - the constraint only allows two values of x, so it would have been perfectly appropriate to simply evaluate the objective at both of them! The Lagrangian gets more useful when we have multiple variables and constraints to consider.

Two variables, one constraint

Consider the optimisation problem $\min {f(x,y)}$ given a constraint $g(x,y)=h$ .

The Lagrangian system is almost identical to the single-variable case discussed above, except that we have a system of three partial derivatives to consider (2 variables + 1 constraint): $L(x,y,\lambda )=f(x,y)+\lambda (g(x,y)-h)$ . Now take the respective partial derivatives:

${\frac {\partial L}{\partial x}}=f'_{x}(x,y)+\lambda g'_{x}(x,y)$ (the first variable x)

${\frac {\partial L}{\partial y}}=f'_{y}(x,y)+\lambda g'_{y}(x,y)$ (the second variable y)

${\frac {\partial L}{\partial \lambda }}=g(x,y)-h$ (the constraint)

Set them to 0 - and the optimal triplet $\{x,y,\lambda \}$ is the solution to that.

A bivariate example

Figure: the problem, graphed. Notice the constraint "encircling" the optimisation problem - the solution must lie on that circle.

Example. Solve the optimisation problem $\max {5x+3y}$ given the constraint $x^{2}+y^{2}=25$ .

Solution.

Set up the Lagrangian:

$L(x,y,\lambda )=(5x+3y)+\lambda (x^{2}+y^{2}-25)$

Take the partial derivatives:

${\frac {\partial L}{\partial x}}=5+\lambda (2x)$

${\frac {\partial L}{\partial y}}=3+\lambda (2y)$

${\frac {\partial L}{\partial \lambda }}=x^{2}+y^{2}-25$

Setting them to 0, we have

$5+2\lambda x=0$

$3+2\lambda y=0$

$x^{2}+y^{2}=25$

Eliminate $\lambda$ from the first two equations to get a relation between x and y:

From the second equation, $2\lambda y=-3$ , or $\lambda ={\frac {-3}{2y}}$ . Similarly, from the first equation, $2\lambda x=-5$ , or $\lambda ={\frac {-5}{2x}}$ . Combining the two results with $\lambda$ , we get ${\frac {-3}{2y}}={\frac {-5}{2x}}$ . Simplifying, this is ${\frac {-3}{y}}={\frac {-5}{x}}$ , or ${\frac {-3x}{y}}=-5$ , which is ${\frac {-3x}{-5}}=y$ . This is the same as $y={\frac {3x}{5}}$ .

Now substitute in the third equation to find x and y:

$x^{2}+y^{2}=25$

$x^{2}+{\frac {9x^{2}}{25}}=25$

${\frac {34x^{2}}{25}}=25$

$x=\pm {\sqrt {\frac {625}{34}}}=\pm {\frac {25}{\sqrt {34}}}$ (this is about ± 4.287)

Similarly, $y={\frac {3x}{5}}=\pm {\frac {3}{5}}{\frac {25}{\sqrt {34}}}$ (this is about ± 2.572).

Remember that this is a maximisation problem, so we take the positive solution: $x={\sqrt {\frac {625}{34}}}$ and $y={\frac {3}{5}}{\sqrt {\frac {625}{34}}}$ (the other solution is the minimum). The maximum value is then $5x+3y=5{\sqrt {34}}$ , which is about 29.155. We could also have found the value of the multiplier $\lambda$ , but that isn't necessary here given that the problem only asks for the maximum. Notice that, as above, the Lagrangian only gives candidate solutions, and it's our job to find out which of them is the actual solution (if there is one, that is).
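As a sanity check on the solution above, here is a minimal Python sketch. It plugs the two stationary points back into the three equations and then compares the result with a brute-force walk around the circle. The brute-force part is an assumption added for illustration and is not part of the Lagrangian method; the names `candidates`, `brute_max` and `lagrange_max` are likewise made up here.

```python
import math

def f(x, y):
    return 5 * x + 3 * y

# Stationary points from the worked solution: x = ±25/sqrt(34), y = 3x/5.
x_pos = 25 / math.sqrt(34)
candidates = [(x_pos, 3 * x_pos / 5), (-x_pos, -3 * x_pos / 5)]

for x, y in candidates:
    lam = -5 / (2 * x)                                       # from 5 + 2*lambda*x = 0
    assert math.isclose(3 + 2 * lam * y, 0, abs_tol=1e-12)   # second equation
    assert math.isclose(x * x + y * y, 25)                   # the constraint

# Independent check: sample the circle x = 5cos(t), y = 5sin(t)
# and keep the largest objective value seen.
steps = 100_000
brute_max = max(
    f(5 * math.cos(2 * math.pi * k / steps), 5 * math.sin(2 * math.pi * k / steps))
    for k in range(steps)
)

# Pick the candidate with the larger objective value.
x_max, y_max = max(candidates, key=lambda p: f(*p))
lagrange_max = f(x_max, y_max)
print(x_max, y_max, lagrange_max)
```

The two maxima should agree to within the resolution of the sampling, and the maximiser should be the candidate with positive x and y.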

Did we have to use the Lagrangian? Actually no. The problem could have been reduced to a univariate form by writing one variable of the constraint in terms of the other: $y=\pm {\sqrt {25-x^{2}}}$ , and substituting it into the optimisation problem:

$\max {(5x+3y)}=\max {\left(5x\pm 3{\sqrt {25-x^{2}}}\right)}$

and then using single-variable calculus techniques to solve the problem! But could you do this when there are three variables instead? Not completely: a single constraint only lets you eliminate one variable, so you will most likely only be able to reduce the problem to two variables.
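The substitution approach for the bivariate example can be sketched as follows. This assumes we only look at the "+" branch $y={\sqrt {25-x^{2}}}$ , which is the one containing the maximum, and it finds the root of the derivative by bisection rather than by hand. The names `h_prime`, `x_star` and `y_star` are hypothetical.

```python
import math

def h_prime(x):
    # Derivative of h(x) = 5x + 3*sqrt(25 - x^2), the "+" branch.
    return 5 - 3 * x / math.sqrt(25 - x * x)

# h'(0) = 5 > 0 and h'(x) is negative close to x = 5,
# so the derivative changes sign somewhere in between.
lo, hi = 0.0, 4.999
for _ in range(100):
    mid = (lo + hi) / 2
    if h_prime(mid) > 0:
        lo = mid      # root lies to the right of mid
    else:
        hi = mid      # root lies to the left of mid

x_star = (lo + hi) / 2
y_star = math.sqrt(25 - x_star * x_star)
print(x_star, y_star)
```

The values found this way should match the point obtained from the Lagrangian.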

The general form

In this section, consider a vector x of size n: ${\textbf {x}}={\begin{pmatrix}x_{1}\\x_{2}\\x_{3}\\\vdots \\x_{n}\end{pmatrix}}$ .

Definition. (The Lagrangian)

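Consider the optimisation problem $\min {f({\textbf {x}})}$ given a vector of m constraints ${\textbf {g}}({\textbf {x}})={\begin{pmatrix}g_{1}({\textbf {x}})\\g_{2}({\textbf {x}})\\g_{3}({\textbf {x}})\\\vdots \\g_{m}({\textbf {x}})\end{pmatrix}}={\textbf {0}}$ .

The Lagrangian of this system is defined as $L({\textbf {x}},\lambda )=f({\textbf {x}})+\lambda ^{T}{\textbf {g}}({\textbf {x}})=f({\textbf {x}})+\sum _{i=1}^{m}\lambda _{i}g_{i}({\textbf {x}})$ , where $\lambda$ is now a vector of m multipliers, one per constraint.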

Then take the partial derivatives with respect to the vectors x and λ: find ${\frac {\partial L}{\partial {\textbf {x}}}}$ and ${\frac {\partial L}{\partial \lambda }}$ .

Notice that this system has m + n variables, and you'll need to take m + n partial derivatives as well. This can get quite messy. A solution is to use matrix calculus.

This may scare you, but you shouldn't be concerned. The average Calculus 3 course will only consider two or three variables.
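To make the count of m + n partial derivatives concrete, here is a rough Python sketch that approximates all of them with central finite differences. It assumes the Lagrangian is the objective plus one multiplier times each constraint. The helper names `lagrangian` and `grad_lagrangian` are made up for this illustration, and finite differences stand in for the symbolic derivatives you would take by hand.

```python
import math

def lagrangian(f, gs, x, lam):
    # L(x, lambda) = f(x) + sum over i of lambda_i * g_i(x)
    return f(x) + sum(l * g(x) for l, g in zip(lam, gs))

def grad_lagrangian(f, gs, x, lam, eps=1e-6):
    # Central finite differences over all n + m variables.
    z = list(x) + list(lam)
    n = len(x)
    grad = []
    for i in range(len(z)):
        up, down = z[:], z[:]
        up[i] += eps
        down[i] -= eps
        l_up = lagrangian(f, gs, up[:n], up[n:])
        l_down = lagrangian(f, gs, down[:n], down[n:])
        grad.append((l_up - l_down) / (2 * eps))
    return grad

# The bivariate example: n = 2 variables, m = 1 constraint.
def objective(v):
    return 5 * v[0] + 3 * v[1]

def circle(v):
    return v[0] ** 2 + v[1] ** 2 - 25

x = [25 / math.sqrt(34), 15 / math.sqrt(34)]   # maximiser found earlier
lam = [-math.sqrt(34) / 10]                    # from 5 + 2*lambda*x = 0
grad = grad_lagrangian(objective, [circle], x, lam)
print(grad)   # three entries, all close to zero
```

With n = 2 and m = 1 the gradient has three entries, and at the maximiser of the bivariate example all of them should vanish up to rounding error.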

The regularity condition

Knowledge of linear algebra is expected for this section; this is unlikely to be covered in an average Calculus 3 course as a result.

The regularity condition applies when considering the Lagrange FONC (first order necessary condition).

Definition. (The Lagrange FONC)

Consider a minimiser ${\textbf {x}}$ of a function $f$ subject to the constraints ${\textbf {g}}({\textbf {x}})=0$ , and suppose that ${\textbf {x}}$ is also regular. Then there exists a $\lambda$ such that $\nabla L({\textbf {x}},\lambda )=0$ .

Remember that it's a necessary condition. This means that

just because a point satisfies the Lagrange FONC does not mean that it is a minimiser or a maximiser.
a regular point that does not satisfy the Lagrange FONC cannot be a minimiser or maximiser.

Definition. (Regularity condition) Given a vector of constraints ${\textbf {g}}({\textbf {x}})=0$ , a point is regular if the gradients of the constraints at that point are linearly independent.

If this condition is not satisfied, the Lagrange FONC does not apply at that point. This rarely matters with one constraint, because a single non-zero vector is always linearly independent; the condition only fails there if the constraint's gradient is the zero vector.

Example.

Let ${\textbf {g}}(x,y)$ be defined as ${\textbf {g}}(x,y)={\binom {(x+1)^{2}+y^{2}-1}{(x-2)^{2}+y^{2}-4}}$ . Determine whether the point (0,0), which satisfies both constraints, is regular.

Solution.

Find the gradients:

$\nabla g_{1}(x,y)={\binom {2(x+1)}{2y}},\nabla g_{2}(x,y)={\binom {2(x-2)}{2y}}$

Substitute at point (0,0):

$\nabla g_{1}(0,0)={\binom {2}{0}},\nabla g_{2}(0,0)={\binom {-4}{0}}$

The problem now reduces to checking whether the vectors ${\binom {2}{0}}$ and ${\binom {-4}{0}}$ are linearly independent. To do that, recall that linear independence requires that, given two constants $\alpha _{1}$ and $\alpha _{2}$ , the equation $\alpha _{1}{\binom {2}{0}}+\alpha _{2}{\binom {-4}{0}}={\binom {0}{0}}$ holds only when $\alpha _{1}=\alpha _{2}=0$ .

This is clearly not the case: an easy example is to set $\alpha _{1}=2$ and $\alpha _{2}=1$ . Hence the point (0,0) is not regular, and the Lagrange FONC does not apply to this problem at that point.
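For two constraints in two variables, the independence check above can also be done with a determinant: two vectors in the plane are linearly independent exactly when the 2x2 matrix built from them has a non-zero determinant. A small Python sketch of that check follows. The function names are hypothetical, and the tolerance `tol` is an arbitrary choice for floating-point comparisons.

```python
def grad_g1(x, y):
    # Gradient of g1(x, y) = (x + 1)^2 + y^2 - 1
    return (2 * (x + 1), 2 * y)

def grad_g2(x, y):
    # Gradient of g2(x, y) = (x - 2)^2 + y^2 - 4
    return (2 * (x - 2), 2 * y)

def is_regular(x, y, tol=1e-12):
    # The gradients are linearly independent exactly when the
    # determinant of the matrix with rows (a, b) and (c, d) is non-zero.
    a, b = grad_g1(x, y)
    c, d = grad_g2(x, y)
    return abs(a * d - b * c) > tol

print(is_regular(0, 0))   # False
```

Note that this only tests the gradients; it does not check that the point actually satisfies the constraints.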