Classical Mechanics/Lagrangian

Mechanics considered using forces

In Newtonian mechanics, a mechanical system is always made up of point masses or rigid bodies, and these are subject to known forces. One must therefore specify the composition of the system and the nature of forces that act on the various bodies. Then one writes the equations of motion for the system. Here are some examples of how one describes mechanical systems in Newtonian mechanics (these examples are surely known to you from school-level physics).

  • Example: a free mass point.

This is the most trivial of all mechanical systems: a mass point that does not interact with any other bodies and is subject to no forces. Introduce the coordinates x,y,z to describe the position of the mass point. Since the force is always equal to zero, the equations of motion are \ddot x=0, \ddot y=0, \ddot z=0. The general solution of these equations describes a linear motion with constant velocity: x=x_0 + v_x t, etc.

  • Example: two point masses with springs attached to a motionless wall.
[Figure: physics spring example]

Two masses can move along a line (the x axis) without friction. The mass m_1 is attached to the wall by a spring, and the mass m_2 is attached to the mass m_1 by a spring. Both springs have spring constant k and the unstretched length L.

To write the equations of motion, we first introduce the two coordinates x_1, x_2 and then consider the forces acting on the two masses. The force on the mass m_1 is the sum of the leftward-pointing force F_1 from the left spring and the rightward-pointing force F_2 from the right spring. The force on m_2 is a leftward-pointing F_2. By definition of a "spring" we have F_1 = k (x_1-L) and F_2 = k (x_2 - x_1 - L). Therefore we write the equations for the accelerations a_1, a_2 of the two masses:

a_1 = \ddot {x_1} = \frac{F_2 - F_1}{m_1} = \frac{k}{m_1} (x_2 -x_1 -L) - \frac{k}{m_1} (x_1 -L),
a_2 = \ddot {x_2} = - \frac{F_2}{m_2} = - \frac{k}{m_2} (x_2 -x_1 -L).

At this point we are finished describing the system; we now need to solve these equations for particular initial conditions and determine the actual motion of this system.
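The last step can be sketched numerically. The following is a minimal illustration, not part of the original text: it integrates the two equations above with the velocity-Verlet scheme, which is one possible choice of integrator. The function names, the parameter values k = m_1 = m_2 = L = 1 and the step size are all assumptions made for the example.

```python
def accelerations(x1, x2, k, m1, m2, L):
    """Right-hand sides of the two equations of motion above."""
    f1 = k * (x1 - L)        # left spring: pulls m_1 leftward when stretched
    f2 = k * (x2 - x1 - L)   # right spring: pulls m_1 rightward and m_2 leftward
    return (f2 - f1) / m1, -f2 / m2


def simulate(x1, x2, v1, v2, k=1.0, m1=1.0, m2=1.0, L=1.0, dt=1e-3, steps=10000):
    """Velocity-Verlet integration; returns the final (x1, x2, v1, v2)."""
    a1, a2 = accelerations(x1, x2, k, m1, m2, L)
    for _ in range(steps):
        x1 += v1 * dt + 0.5 * a1 * dt * dt
        x2 += v2 * dt + 0.5 * a2 * dt * dt
        b1, b2 = accelerations(x1, x2, k, m1, m2, L)
        v1 += 0.5 * (a1 + b1) * dt
        v2 += 0.5 * (a2 + b2) * dt
        a1, a2 = b1, b2
    return x1, x2, v1, v2


def energy(x1, x2, v1, v2, k=1.0, m1=1.0, m2=1.0, L=1.0):
    """Kinetic plus spring potential energy, which the exact motion conserves."""
    kinetic = 0.5 * m1 * v1 ** 2 + 0.5 * m2 * v2 ** 2
    potential = 0.5 * k * (x1 - L) ** 2 + 0.5 * k * (x2 - x1 - L) ** 2
    return kinetic + potential


start = (1.2, 2.0, 0.0, 0.0)   # m_1 displaced from equilibrium, both masses at rest
end = simulate(*start)
print(energy(*start), energy(*end))   # the two values should nearly agree
```

Starting the masses at x_1 = L, x_2 = 2L with zero velocity leaves them at rest, since both spring forces vanish there.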

Introducing the action principle

The Lagrangian description of a mechanical system is rather different. First, we do not ask for the evolution of the system given some initial conditions, but instead assume that the position of the system at two different time moments t_1 and t_2 is known and fixed. For convenience, let us collect all coordinates (such as x,y,z or x_1,x_2 above) into one array of "generalized coordinates" and denote them by q_i. So the "boundary conditions" that we impose on the system are q_i(t_1)=A_i and q_i(t_2)=B_i, where A_i,B_i are fixed numbers. We now ask: how does the system move between the time moments t_1 and t_2? The Lagrangian description answers: during that time, the system must move in such a way as to give the minimum value to the integral \int _{t_1}^{t_2} L(q_i,\dot q_i) dt, where L(q_i, \dot q_i) is a known function called the Lagrange function or Lagrangian. For example, the Lagrangian for a free mass point is

L(x,y,z,\dot x, \dot y, \dot z) = \frac{m}{2} [\dot x^2 +\dot y^2 +\dot z^2].

The Lagrangian for the above example with two masses attached to the wall is

L(x_1,x_2,\dot x_1, \dot x_2) = \frac{m_1}{2} \dot x_1^2 + \frac{m_2}{2} \dot x_2^2 - \frac{k}{2} (x_1-L)^2 - \frac{k}{2}(x_2-x_1-L)^2.

For instance, according to the Lagrangian description, the free point mass moves in such a way that the functions x(t),y(t),z(t) give the minimum value to the integral \int _{t_1}^{t_2} \frac{m}{2}[\dot x^2 +\dot y^2 +\dot z^2]dt, where the values of x(t),y(t),z(t) at times t_{1,2} are fixed.

In principle, to find the minimum value of the integral \int _{t_1}^{t_2} L(q_i,\dot q_i) dt one would have to evaluate that integral for each possible trajectory q_i(t) and then choose the "optimal" trajectory q_i^{*}(t) for which this integral has the smallest value. (Of course, we shall learn and use a much more efficient mathematical approach to determine this "optimal" trajectory instead of trying every possible set of functions q_i(t).) The value of the mentioned integral is called the action corresponding to a particular trajectory q_i(t). Therefore the requirement that the integral should have the smallest value is often called "the principle of least action" or just action principle.
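To make the idea of "evaluating the integral on a trajectory" concrete, here is a small sketch for the free point mass in one dimension. It compares the action of a straight-line trajectory with that of one other trajectory having the same endpoints. The boundary values A = 0, B = 2, the mass m = 1, the time interval [0,1] and the sinusoidal perturbation are assumptions chosen for illustration; trying one alternative trajectory does not prove minimality, it only shows the kind of comparison involved.

```python
import math


def action(x, m=1.0, t1=0.0, t2=1.0, n=1000):
    """Approximate the integral of (m/2) * xdot^2 over [t1, t2] for a trajectory x(t)."""
    dt = (t2 - t1) / n
    total = 0.0
    for j in range(n):
        ta = t1 + j * dt
        v = (x(ta + dt) - x(ta)) / dt   # finite-difference velocity on this step
        total += 0.5 * m * v * v * dt
    return total


A, B = 0.0, 2.0   # hypothetical boundary values x(0) = A, x(1) = B


def straight(t):
    return A + (B - A) * t


def wiggly(t, amp=0.3):
    # Same endpoints as the straight line, because sin(pi t) vanishes at t = 0 and t = 1
    return straight(t) + amp * math.sin(math.pi * t)


s_straight = action(straight)   # should be close to m (B - A)^2 / 2
s_wiggly = action(wiggly)       # should come out larger
print(s_straight, s_wiggly)
```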

At this point, we need to answer the pressing question:

  • How can it be that the correct trajectory q_i^{*}(t) is found not by considering the forces but by requiring that some integral should have the minimum value? How does each point mass "know" that it needs to minimize some integral when it moves around?

The short answer is that the least action requirement is mathematically equivalent to the consideration of forces if the Lagrangian L is chosen correctly. The condition that some integral has the minimum value (when the integral is correctly chosen) is mathematically the same as the Newtonian equations for the acceleration. The point masses perhaps "know" nothing about this integral. It is simply mathematically convenient to formulate the mechanical laws in one sentence rather than in many sentences. (We shall see another, more intuitive explanation below.)

Suppose that we understand how the requirement that an integral has the minimum value can be translated into equations for the acceleration. Obviously the form of the integral needs to be different for each mechanical system since the equations of motion are different. Then the second question presents itself:

  • How can we find the Lagrange function L(q_i, \dot q_i) corresponding to each mechanical system?

This is a more complicated problem and one needs to study many examples to gain a command of this approach. (In brief: the Lagrange function is the kinetic energy minus the potential energy.)

Before considering Lagrange functions, we shall look at how the mathematical requirement of "least action" can be equivalent to equations of motion such as given in the examples above.

Variation of a functional

A function is a map from numbers into numbers; a functional is a map from functions into numbers. An application of a functional to a function is usually denoted by square brackets, e.g. S[f(x)].

Random examples of functionals, just to illustrate the concept:

S[f(x)] = \int _0 ^\infty \sqrt{f(3x)}dx
S[f(x)] = \int _{-1} ^1 \frac{f(x)}{1-x^2}dx
S[f(x)] = f(15) - 8 f'(3) - \int _0 ^1 \sin\left[(f(x-2)+\sqrt{x}e^{-x})^3\right] dx

In principle, a functional can be anything that assigns a number to any function. In practice, only some functionals are interesting and have applications in physics.

Since the action integral maps trajectories into numbers, we can call it the action functional. The action principle is formulated as follows: the trajectory q_i(t) must be such that the action functional evaluated on this trajectory has the minimum value among all trajectories.

This may appear to be similar to the familiar condition for the mechanical equilibrium: the coordinates x,y,z are such that the potential energy has the minimum value. However, there is a crucial difference: when we minimize the potential energy, we vary the three numbers x,y,z until we find the minimum value; but when we minimize a functional, we have to vary the whole function q_i(t) until we find the minimum value of the functional.

The branch of mathematics known as calculus of variations studies the problem of minimizing (maximizing, extremizing) functionals. One needs to learn a little bit of variational calculus at this point. Let us begin by solving some easy minimization problems involving functions of many variables; this will prepare us for dealing with functionals which can be thought of as functions of infinitely many variables. You should try the examples yourself before looking at the solutions.

Example 1: Minimize the function f(x,y) = x^2+xy+y^2 with respect to x,y.

Solution: Compute the partial derivatives of f with respect to x,y. These derivatives must both be equal to zero. This can only happen if x=0, y=0.

Example 2: Minimize the function f(x_1,...,x_n) = x_1^2+x_1 x_2+x_2^2+x_2 x_3+...+x_n^2 with respect to all x_j.

Solution: Compute the partial derivatives of f with respect to all x_j, where j=1,...,n. These derivatives must all be equal to zero. This can only happen if all x_j=0.

Example 3: Minimize the function f(x_0,...,x_n) = (x_1-x_0)^2+(x_2-x_1)^2+...+(x_n-x_{n-1})^2 with respect to all x_j subject to the restrictions x_0=0, x_n=A.

Solution: Compute the partial derivatives of f with respect to x_j, where j=1,...,n-1 (the values x_0 and x_n are fixed and are not varied). These derivatives must all be equal to zero. This can only happen if x_j-x_{j-1}=x_{j+1}-x_j for j=1,2,...,n-1. The values x_0, x_n are known, therefore we find x_j=jA/n.
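Example 3 can also be checked numerically. The sketch below is only one possible approach: it repeatedly replaces each interior x_j by the average of its two neighbours, which is exactly the condition that the partial derivative with respect to x_j vanishes. The function name, the values n = 10 and A = 5, and the number of sweeps are assumptions made for the illustration.

```python
def minimize_chain(n, A, sweeps=5000):
    """Minimize sum_j (x_j - x_{j-1})^2 with x_0 = 0 and x_n = A held fixed."""
    x = [0.0] * (n + 1)
    x[n] = A
    for _ in range(sweeps):
        for j in range(1, n):
            # Setting df/dx_j = 0 gives x_j = (x_{j-1} + x_{j+1}) / 2
            x[j] = 0.5 * (x[j - 1] + x[j + 1])
    return x


x = minimize_chain(n=10, A=5.0)
print(x)   # should approach the evenly spaced values x_j = j A / n
```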

Intuitive calculation

Let us now consider the problem of minimizing the functional S[x]=\int _0 ^1 {\dot x(t)}^2 dt with respect to all functions x(t) subject to the restrictions x(0)=0, x(1)=L. We shall first perform the minimization in a more intuitive but approximate way, and then we shall see how the same task is handled more elegantly by the variational calculus.

Let us imagine that we are trying to minimize the functional S[x] with respect to all functions x(t) using a digital computer. The first problem is that we cannot represent "all functions" x(t) on a computer because we can only store finitely many values x(t_0), x(t_1), ..., x(t_N) in an array within the computer memory. So we split the time interval [0,1] into a large number N of discrete steps [0,t_1], [t_1, t_2], ..., [t_{N-1}, 1], where the step size t_j-t_{j-1}\equiv \Delta t = 1/N is small; in other words, t_j = j/N, j=1, ..., N-1. We can describe the function x(t) by its values x_j at the points t_j, assuming that the function x(t) is a straight line between these points. The time moments t_1, ..., t_{N-1} will be kept fixed, and then the various values x_j will correspond to various possible functions x(t). (In this way we definitely will not describe all possible functions x(t), but the class of functions we do describe is broad enough so that we get the correct results in the limit N\to\infty. Basically, any function x(t) can be sufficiently well approximated by one of these "piecewise-linear" functions when the step size \Delta t is small enough.)

Since we have discretized the time and reduced our attention to piecewise-linear functions, we have

\dot x = \frac{x_j-x_{j-1}}{\Delta t}

within each interval t\in[t_{j-1}, t_j]. So we can express the integral S[x] as the finite sum,

 S[x] =\int _0^1 {\dot x(t)}^2 dt = \sum_{j=1}^N \frac{{(x_j-x_{j-1})}^2}{\Delta t^2} \Delta t,

where we have defined for convenience t_0 =0, t_N = 1.

At this point we can perform the minimization of S[x] quite easily. The functional S[x] is now a function of N-1 variables x_1, ..., x_{N-1}, i.e. S[x]=S(x_1,...,x_{N-1}), so the minimum is achieved at the values x_j where the derivatives of S(x_1,...,x_{N-1}) with respect to each x_j are zero. This problem is now quite similar to Example 3 above, so the solution is x_j = jL/N, j=0,...,N. Now we recall that x_j is the value of the unknown function x(t) at the point t_j=j/N. Therefore the minimum of the functional S[x] is found at the values x_j that correspond to the function x(t)=Lt. As we increase the number N of intervals, we still obtain the same function x(t)=Lt, therefore the same function is obtained in the limit N\to\infty. We conclude that the function x(t)=Lt minimizes the functional S[x] with the restrictions x(0)=0, x(1)=L.
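For readers who want the intermediate step spelled out, the derivative of the discretized sum with respect to an interior value x_j (with x_0=0 and x_N=L held fixed) can be written as follows. This is a worked restatement of the argument above, using only the quantities already defined.

```latex
S(x_1,\dots,x_{N-1}) = \frac{1}{\Delta t}\sum_{j=1}^{N} (x_j - x_{j-1})^2 ,
\qquad
\frac{\partial S}{\partial x_j}
  = \frac{2}{\Delta t}\left[ (x_j - x_{j-1}) - (x_{j+1} - x_j) \right] = 0
\;\Longrightarrow\;
x_{j+1} - x_j = x_j - x_{j-1}, \quad j = 1,\dots,N-1 .

% Equal steps between x_0 = 0 and x_N = L give x_j = jL/N, and then
S_{\min} = \frac{1}{\Delta t}\, N \left(\frac{L}{N}\right)^2 = L^2
\qquad (\Delta t = 1/N).
```

The minimum value L^2 does not depend on N, in agreement with the direct evaluation of \int_0^1 \dot x^2 dt for x(t)=Lt.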

Variational calculation

The above calculation has the advantage of being more intuitive and visual: it makes clear that minimization of a functional S[x(t)] with respect to a function x(t) is quite similar to the minimization of a function S(x_1, ..., x_N) with respect to a large number of variables x_j in the limit of infinitely many such variables. However, the formalism of variational calculus provides a much more efficient computational procedure. Here is how one calculates the function x(t) that minimizes S[x].

Let us consider a very small change \epsilon(t) in the function x(t) and see how the functional S[x] changes:

\delta S[x(t), \epsilon(t)] \equiv S[x(t)+\epsilon(t)]-S[x(t)].

(In many textbooks, the change in x(t) is denoted by \delta x(t), and generally the change of any quantity Q is denoted by \delta Q. We chose to write \epsilon(t) instead of \delta x(t) for clarity.)

The functional \delta S[x, \epsilon] is called the variation of the functional S[x] with respect to the change \epsilon(t) in the function x(t). The variation is itself a functional depending on two functions, x(t) and \epsilon(t). When \epsilon(t) is very small, we expect that the variation will be linear in \epsilon(t), just like the variation in the value of a normal function is linear in the amount of change in the argument, e.g. f(t+\alpha)-f(t)\approx f'(t) \alpha for small \alpha. So we expect that the variation \delta S[x, \epsilon] of the functional S[x] will be a linear functional of \epsilon(t). To understand what a linear functional looks like, consider a linear function f(\epsilon_j) depending on several variables \epsilon_j, j=1,2,.... This function can always be written as

f(\epsilon_j) = \sum_{j}^{} A_j \epsilon_j

where A_j are suitable constants. Since a functional is like a function of infinitely many variables, the index j becomes a continuous variable t, the variables \epsilon_j and the constants A_j become functions \epsilon(t),A(t), while the sum over j becomes an integral over t. Thus, a linear functional of \epsilon(t) can be written as an integral,

\delta S[x, \epsilon]=\int _0^1 A(t) \epsilon(t) dt,

where A(t) is a suitable function. In the case of the usual function f(t), the "suitable constant A" is the derivative A=df(t)/dt. By analogy we call A(t) above the variational derivative of the functional and denote it by \delta S[x]/\delta x(t).

A function has a minimum (or maximum, or extremum) at a point where its derivative vanishes. So a functional S[x(t)] has a minimum (or maximum, or extremum) at the function x(t) where the functional derivative vanishes. We shall justify this statement below; for now, let us compute the functional derivative of the functional S[x(t)]=\int_0^1 \dot x^2 dt.

Substituting x(t)+\epsilon(t) instead of x(t) into the functional, we get

\delta S[x,\epsilon]=\int _0^1 [(\dot x+\dot \epsilon)^2-\dot x^2]dt=2\int_0^1\dot x \dot \epsilon dt +O(\epsilon^2),

where we are going to neglect terms quadratic in \epsilon(t) and so we didn't write them out. We now need to rewrite this integral so that no derivatives of \epsilon(t) appear there; so we integrate by parts and find

\delta S[x(t),\epsilon(t)]= \left. 2\epsilon(t)\dot x(t)\right| _0^1 - 2\int _0^1 \ddot x(t) \epsilon(t) dt.

Since in our case the values x(0),x(1) are fixed, the function \epsilon(t) must be such that \epsilon(0)=\epsilon(1)=0, so the boundary terms vanish. The variational derivative is therefore

\delta S/\delta x(t) = -2\ddot x(t).

The functional S[x] has an extremum when its variation under an arbitrary change \epsilon(t) is second-order in \epsilon(t). However, above we have obtained the variation as a first-order quantity, linear in \epsilon(t); so this first-order quantity must vanish for x(t) where the functional has an extremum. An integral such as \int_0^1 A(t) \epsilon(t) dt can vanish for arbitrary \epsilon(t) only if the function A(t) vanishes for all t. In our case, the "function A(t)," i.e. the variational derivative \delta S/\delta x(t), is equal to -2\ddot x(t). Therefore the function x(t) on which the functional S[x] has an extremum must satisfy -2\ddot x(t)=0 or more simply \ddot x=0. This differential equation has the general solution x(t)=a+bt, and with the additional restrictions x(0)=0,x(1)=L we immediately get the solution x(t)=Lt.
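For this particular functional one can also check directly that the extremum x(t)=Lt is a minimum and not a maximum. The short computation below keeps all orders in \epsilon(t) and uses only the boundary conditions \epsilon(0)=\epsilon(1)=0; it is offered as a supplementary check, not as part of the general method.

```latex
S[Lt+\epsilon] - S[Lt]
  = \int_0^1 \left[ (L+\dot\epsilon)^2 - L^2 \right] dt
  = 2L \int_0^1 \dot\epsilon \, dt + \int_0^1 \dot\epsilon^2 \, dt
  = 2L\,[\epsilon(1)-\epsilon(0)] + \int_0^1 \dot\epsilon^2 \, dt
  = \int_0^1 \dot\epsilon^2 \, dt \;\ge\; 0 .
```

The first-order term vanishes identically, and the remaining second-order term is never negative, so no admissible change \epsilon(t) can lower the value of S below S[Lt]=L^2.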

General formulation

To summarize: the requirement that the functional S[x(t)] must have an extremum at the function x(t) leads to a differential equation on the unknown function x(t). This differential equation is found as

\delta S[x]/\delta x(t) =0.

The procedure is quite similar to finding an extremum of a function f(t), where the point t of the extremum is found from the equation df(t)/dt=0.

Suppose that we are now asked to minimize the functional S[x(t)]=\int _0^1 (x^2+\dot x^2-x^4\sin t)dt subject to the restrictions x(0)=0,x(1)=1; in mechanics we shall mostly be dealing with functionals of this kind. We might try to discretize the function x(t), as we did above, but this is difficult. Moreover, for a different functional S[x] everything will have to be computed anew. Rather than go through the above procedure again and again, let us now derive the formula for the functional derivative for all functionals of this form, namely

S[x_i(t)]=\int_a^b L(x_i,\dot x_i,t)dt,

where L(x_i,v_i,t) is a given function of the coordinates x_i and velocities v_i\equiv \dot x_i (assuming that there are n coordinates, so i=1,...,n). This function L(x_i,v_i,t) is called the Lagrange function or simply the Lagrangian.

We introduce the infinitesimal changes \epsilon_i(t) into the functions x_i(t) and express the variation of the functional first through \epsilon_i(t) and \dot\epsilon_i(t),

\delta S[x_i(t),\epsilon_i(t)]=\int_a^b \sum _{i=1}^n\left[ \frac{\partial L}{\partial x_i}\epsilon_i(t)+\frac{\partial L}{\partial v_i}\dot \epsilon_i(t) \right] dt.

Then we integrate by parts, discard the boundary terms and obtain

\delta S[x_i(t),\epsilon_i(t)]=\int_a^b \sum _{i=1}^n\left[ \frac{\partial L}{\partial x_i}-\frac{d}{dt}\frac{\partial L}{\partial v_i} \right] \epsilon_i(t)dt.

Thus the variational derivatives can be written as

\frac{\delta S[x]}{\delta x_i(t)}=\frac{\partial L}{\partial x_i}-\frac{d}{dt}\frac{\partial L}{\partial v_i}.

Euler-Lagrange equations

Consider again the condition for a functional to have an extremum at x_i(t): the first-order variation must vanish. We have derived the above formula for the variation \delta S[x_i, \epsilon_i]. Since all \epsilon_i(t) are completely arbitrary (subject only to the boundary conditions \epsilon_i(a)=\epsilon_i(b)=0), the first-order variation vanishes only if the functions in square brackets all vanish at all t. Therefore we obtain the Euler-Lagrange equations

\frac{\partial L}{\partial x_i}-\frac{d}{dt}\frac{\partial L}{\partial v_i} = 0.

These are the differential equations that express the mathematical requirement that the functional S[x_i(t)] has an extremum at the set of functions x_i(t). There are as many equations as unknown functions x_i(t), one equation for each i=1,...,n.

Note that the Euler-Lagrange equations involve partial derivatives of the Lagrangian with respect to coordinates and velocities. The derivatives with respect to velocities v=\dot x are sometimes written as \partial L/\partial \dot x which might at first sight appear confusing. However, all that is meant by this notation is the derivative of the function L(x,v,t) with respect to its second argument.

The Euler-Lagrange equations also involve the derivative d/dt with respect to the time. This is not a partial derivative with respect to t but a total derivative. In other words, to compute \frac{d}{dt}\frac{\partial L}{\partial \dot x_i}, we need to substitute the functions x_i(t) and \dot x_i(t) into the expression \frac{\partial L}{\partial \dot x_i}, thus obtain a function of time only, and then take the derivative of this function with respect to time.
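As a worked illustration of these rules, consider the functional S[x(t)]=\int _0^1 (x^2+\dot x^2-x^4\sin t)dt mentioned in the previous section. Treating x, v and t as independent arguments of L, and only afterwards substituting v=\dot x(t), one obtains:

```latex
L(x,v,t) = x^2 + v^2 - x^4 \sin t ,
\qquad
\frac{\partial L}{\partial x} = 2x - 4x^3 \sin t ,
\qquad
\frac{\partial L}{\partial v} = 2v .

% Substitute v = \dot x(t) and take the total time derivative:
\frac{d}{dt}\frac{\partial L}{\partial v} = \frac{d}{dt}\,(2\dot x) = 2\ddot x .

% Euler-Lagrange equation:
2x - 4x^3 \sin t - 2\ddot x = 0
\quad\Longleftrightarrow\quad
\ddot x = x - 2x^3 \sin t .
```

The explicit dependence of L on t through \sin t enters the equation only via \partial L/\partial x; no partial derivative of L with respect to t appears.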

Remark: If the Lagrangian contains higher derivatives (e.g. the second derivative), the Euler-Lagrange formula is different. For example, if the Lagrangian is L=L(x,\dot x, \ddot x), then the Euler-Lagrange equation is

\frac{\partial L}{\partial x}-\frac{d}{dt}\frac{\partial L}{\partial \dot x} + \frac {d^2}{dt^2}\frac{\partial L}{\partial \ddot x}= 0.

Note that this equation may be up to fourth-order in time derivatives! Usually, one does not encounter such Lagrangians in studies of classical mechanics because ordinary systems are described by Lagrangians containing only first-order derivatives.

Summary: In mechanics, one specifies a system by writing a Lagrangian and pointing out the unknown functions in it. From that, one derives the equations of motion using the Euler-Lagrange formula. You need to know that formula really well and to understand how to apply it. This comes only with practice.
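One way to practice is to test a candidate trajectory against the Euler-Lagrange formula numerically. The sketch below estimates \partial L/\partial x - \frac{d}{dt}\frac{\partial L}{\partial v} by central finite differences along a given x(t). The helper name el_residual, the step h, and the test Lagrangian (a mass on a spring with m=1, k=4, so that x=\cos 2t solves the Newtonian equation) are assumptions made for the illustration; finite differences give only an approximate residual.

```python
import math


def el_residual(lagrangian, x, t, h=1e-4):
    """Finite-difference estimate of dL/dx - d/dt(dL/dv) along the trajectory x(t)."""
    def velocity(s):
        return (x(s + h) - x(s - h)) / (2 * h)

    def dL_dx(s):
        xs, vs = x(s), velocity(s)
        return (lagrangian(xs + h, vs, s) - lagrangian(xs - h, vs, s)) / (2 * h)

    def dL_dv(s):
        xs, vs = x(s), velocity(s)
        return (lagrangian(xs, vs + h, s) - lagrangian(xs, vs - h, s)) / (2 * h)

    # Total time derivative: dL/dv is evaluated on the trajectory first,
    # and the resulting function of time alone is then differentiated.
    total_derivative = (dL_dv(t + h) - dL_dv(t - h)) / (2 * h)
    return dL_dx(t) - total_derivative


def oscillator(x, v, t, m=1.0, k=4.0):
    """Kinetic minus potential energy of a mass on a spring (illustrative values)."""
    return 0.5 * m * v * v - 0.5 * k * x * x


def newtonian(t):
    return math.cos(2.0 * t)   # solves m * xddot = -k * x for m = 1, k = 4


def wrong(t):
    return math.cos(3.0 * t)   # does not solve it


print(el_residual(oscillator, newtonian, 0.3))   # should be close to zero
print(el_residual(oscillator, wrong, 0.3))       # should be clearly nonzero
```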

How to choose the Lagrangian

The basic rule is that the Lagrangian is equal to the kinetic energy minus the potential energy. (Both should be measured in an inertial system of reference! In a non-inertial system, this rule may fail.)

It can be shown that this rule works for an arbitrary mechanical system made up of point masses, springs, ropes, frictionless rails, etc., regardless of how one introduces the generalized coordinates. We shall not study the proof of this statement, but instead go directly to the examples of Lagrangians for various systems.

Examples of Lagrangians

  • The Lagrangian for a free point mass moving along a straight line with coordinate x:
L=\frac{1}{2} m {\dot x}^2.
  • A point mass moving along a straight line with coordinate x, in a force field with potential energy V(x):
L=\frac{1}{2} m {\dot x}^2 - V(x).
  • A point mass moving in three-dimensional space with coordinates x_i\equiv (x,y,z), in a force field with potential energy V(x,y,z):
L=\frac{1}{2}\sum_{i=1}^3 \left(  m {\dot x_i}^2\right) - V(x,y,z)=\frac{m}{2}|\dot{\vec x}|^2-V(\vec x).
  • A point mass constrained to move along the circle x^2+z^2=R^2 in the gravitational field near the Earth (the z axis is vertical). It is convenient to introduce the angle \theta as the coordinate, with z=R\cos\theta, x=R\sin\theta. Then the potential energy is U=mgz=mgR\cos\theta, while the kinetic energy is K=mv^2/2=mR^2\omega^2/2=mR^2\dot\theta^2/2. So the Lagrangian is
L=K-U=\frac{m}{2} R^2 \dot\theta^2 - mgR\cos\theta.

Note that we have written the Lagrangian (and therefore we can derive the equations of motion) without knowing the force needed to keep the mass moving along the circle. This shows the great conceptual advantage of the Lagrangian approach; in the traditional Newtonian approach, the first step would be to determine this force, which is initially unknown, from a system of equations involving an unknown acceleration of the point mass.

  • Two (equal) point masses connected by a spring with length l:
L=\frac{m}{2}(\dot x_1^2 + \dot x_2^2)-\frac{k}{2}(x_1-x_2-l)^2.
  • A mathematical pendulum, i.e. a massless rigid stick of length l with a point mass attached at the end, that can move only in the x-z plane in the gravitational field near the Earth (vertical z axis). As the coordinate, we choose the angle \theta between the stick and the downward vertical (the negative z direction), so that the height of the mass is z=-l\cos\theta. The Lagrangian is
L=\frac{m}{2} l^2 \dot \theta^2 +mgl\cos\theta .
  • A point mass m sliding without friction along an inclined plane that makes an angle \alpha with the horizontal, in the gravitational field of the Earth. As the coordinates, we choose x and y, both horizontal: y runs along the incline at constant height, and x is perpendicular to y. The height z is then z=x\tan\alpha, so the potential energy is U=mgz=mgx\tan\alpha. The kinetic energy is computed as
K=\frac{m}{2} (\dot x^2 + \dot y^2 + \dot z^2)=\frac{m}{2} (\dot x^2/\cos^2\alpha + \dot y^2).

Hence, the Lagrangian is

L=K-U=\frac{m}{2} (\dot x^2/\cos^2\alpha + \dot y^2) -mgx\tan\alpha.
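As a model for how the Euler-Lagrange formula is applied to such Lagrangians, here is the calculation for the inclined plane, written out step by step. It uses only the Lagrangian just obtained; the comparison with the Newtonian result at the end is a standard consistency check.

```latex
% Coordinate x:
\frac{\partial L}{\partial x} = -mg\tan\alpha ,
\qquad
\frac{\partial L}{\partial \dot x} = \frac{m\dot x}{\cos^2\alpha} ,
\qquad
-mg\tan\alpha - \frac{m\ddot x}{\cos^2\alpha} = 0
\;\Longrightarrow\;
\ddot x = -g\sin\alpha\cos\alpha .

% Coordinate y:
\frac{\partial L}{\partial y} = 0 ,
\qquad
\frac{\partial L}{\partial \dot y} = m\dot y ,
\qquad
\ddot y = 0 .
```

In the Newtonian picture the mass accelerates down the slope with magnitude g\sin\alpha; the horizontal projection of that acceleration is g\sin\alpha\cos\alpha, directed toward decreasing x, which agrees with the first equation.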

Further work

Exercise: You should now determine the Euler-Lagrange equations that follow from each of the above Lagrangians and verify that these equations are the same as would be obtained from school-level Newtonian considerations for the respective physical systems. This should occupy you for at most an hour or two. Only then will you begin to appreciate the power of the Lagrangian approach.

Some more Lagrangian exercises here.

For more examples of setting up Lagrangians for mechanical systems and for deriving the Euler-Lagrange equations, ask your physics teacher or look in any theoretical mechanics problem book. Much of the time, the Euler-Lagrange equations for some complicated system (say, a pendulum attached to the endpoint of another pendulum) would be too difficult to solve, but the point is to gain experience deriving them. Their derivation would be much less straightforward in the old Newtonian approach using forces.

See here for a very brief primer on differential equations.

If this is your first time looking at Lagrangians, you might be still asking yourself: how could the motion of a system be described by saying that some integral has the minimal value? Is it a purely formal mathematical trick, and if not, how can one get a more visually intuitive understanding? A partial answer is here.