General Relativity/Printable version

General Relativity

The current, editable version of this book is available in Wikibooks, the open-content textbooks collection, at

Permission is granted to copy, distribute, and/or modify this document under the terms of the Creative Commons Attribution-ShareAlike 3.0 License.


High-precision test of general relativity by the Cassini space probe (artist's impression): Radio signals sent between the Earth and the probe (green wave) are delayed by the warping of space and time (blue lines) due to the Sun's mass.

General relativity (GR) is a theory of gravitation that was developed by Albert Einstein between 1907 and 1915. According to general relativity, the observed gravitational attraction between masses results from the warping of space and time by those masses.

Before the advent of general relativity, Newton's law of universal gravitation had been accepted for more than two hundred years as a valid description of the gravitational force between masses. Under Newton's model, gravity was the result of an attractive force between massive objects. Although even Newton was bothered by the unknown nature of that force, the basic framework was extremely successful at describing motion.

However, experiments and observations show that Einstein's description accounts for several effects that are unexplained by Newton's law, such as minute anomalies in the orbits of Mercury and other planets. General relativity also predicts novel effects of gravity, such as gravitational waves, gravitational lensing and an effect of gravity on time known as gravitational time dilation. Many of these predictions have been confirmed by experiment, while others are the subject of ongoing research. For example, in addition to the indirect evidence for gravitational waves, direct evidence of their existence has been obtained by teams of scientists in experiments such as LIGO.

General relativity has developed into an essential tool in modern astrophysics. It provides the foundation for the current understanding of black holes, regions of space where gravitational attraction is so strong that not even light can escape. Their strong gravity is thought to be responsible for the intense radiation emitted by certain types of astronomical objects (such as active galactic nuclei or microquasars). General relativity is also part of the framework of the standard Big Bang model of cosmology.

Although general relativity is not the only relativistic theory of gravity, it is the simplest such theory that is consistent with the experimental data. Nevertheless, a number of open questions remain: the most fundamental is how general relativity can be reconciled with the laws of quantum physics to produce a complete and self-consistent theory of quantum gravity.

In ordinary three-dimensional space the formula for distance in Cartesian coordinates is

$$ds^2 = dx^2 + dy^2 + dz^2$$


Now one can change the coordinate systems if one wants. If one rotates the coordinate system or stretches or shrinks it, the values for x, y, and z may change, but the distances will not. We can even conceive of more radical changes, like going into spherical coordinates where

$$ds^2 = dr^2 + r^2\,d\theta^2 + r^2 \sin^2\theta\,d\phi^2$$


In special relativity we learned that physics is described by another invariant in which

$$ds^2 = dx^2 + dy^2 + dz^2 - c^2\,dt^2$$


Again, we are free to change the coordinates of x, y, z and t to anything we want, but the underlying geometry and distances don't change.
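As a rough numerical illustration of this coordinate freedom, the following Python sketch computes the squared distance between two nearby points twice: once from Cartesian coordinate differences and once from the spherical line element $ds^2 = dr^2 + r^2\,d\theta^2 + r^2\sin^2\theta\,d\phi^2$. The point and the displacement are arbitrary values chosen only for this example, and the function names are illustrative.

```python
import math

def to_cartesian(r, theta, phi):
    """Convert spherical (r, polar angle theta, azimuth phi) to Cartesian."""
    return (r * math.sin(theta) * math.cos(phi),
            r * math.sin(theta) * math.sin(phi),
            r * math.cos(theta))

def ds2_cartesian(p, q):
    """Squared distance from coordinate differences: dx^2 + dy^2 + dz^2."""
    return sum((b - a) ** 2 for a, b in zip(p, q))

def ds2_spherical(r, theta, dr, dtheta, dphi):
    """Squared distance from the spherical line element."""
    return dr ** 2 + r ** 2 * dtheta ** 2 + (r * math.sin(theta)) ** 2 * dphi ** 2

# Arbitrary point and small displacement (assumed values for illustration).
r, theta, phi = 2.0, 0.7, 1.1
dr, dtheta, dphi = 1e-5, 2e-5, -3e-5

cart = ds2_cartesian(to_cartesian(r, theta, phi),
                     to_cartesian(r + dr, theta + dtheta, phi + dphi))
sph = ds2_spherical(r, theta, dr, dtheta, dphi)
print(cart, sph)  # the two values agree to leading order in the displacement
```

The agreement is only approximate because the line element is exact for infinitesimal displacements, while the example uses small but finite ones.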

The next step is to incorporate gravity into this picture. While the mathematical details can be complex, the basic idea is that the effects of gravity are equivalent to the effects of acceleration on an observer. From this equivalence principle, Einstein was able to show that what matter does is to change the rules for distances. The formula we showed above is strictly true only when matter is not present; when matter is present, the rules for determining distances change, and the effect of these changes is to produce the effects of gravity we all know.

This picture of gravity is powerfully simple and elegant. However, there is one problem with it; in order for it to be usable, it is necessary to learn many new mathematical concepts to understand how this picture works.

In our daily lives, we have become very familiar with the properties of three-dimensional Euclidean space because that is the world we live in. In order to do anything such as walking, moving, or catching balls, our brains have to deal with 3-space and so we have a great deal of intuitive knowledge about how this sort of geometry works. Even when we are doing mathematics in three-dimensional space, we are helped by the fact that our minds have this sort of knowledge built in. However when we discuss other types of space, our normal intuition fails us, and we are forced to follow the much more difficult path of trying to figure out what happens by describing the situation through precise mathematical statements, and this involves learning several new mathematical concepts and techniques.

To give an example of the mathematical techniques we will have to learn, imagine you are on the surface of a flat plane. One formula for distance is

$$ds^2 = dx^2 + dy^2$$


Another formula for distance in the plane in polar coordinates is

$$ds^2 = dr^2 + r^2\,d\theta^2$$


Now these two formulas look quite different, but they are really two different descriptions of the same situation.

On the surface of a cylinder of fixed radius $r$,

$$ds^2 = dz^2 + r^2\,d\theta^2$$

we again have a formula which looks very similar to the distance in the plane expressed in polar coordinates: locally, the cylinder cannot be distinguished from the plane. Later on, we will give this a name: we call these flat surfaces.
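One way to make the flatness of the cylinder explicit is a change of coordinates. The short derivation below is a sketch that assumes the cylinder has fixed radius $r$ and is described by the height $z$ and the angle $\theta$; the new coordinate $u$ is introduced only for this illustration.

```latex
% Cylinder of fixed radius r, coordinates (z, \theta):
ds^2 = dz^2 + r^2\,d\theta^2
% Define the arc length around the cylinder, u = r\theta, so du = r\,d\theta:
ds^2 = dz^2 + du^2
% This is exactly the Cartesian formula for the plane in coordinates (z, u).
```

No such substitution turns the distance formula on a sphere into the formula for the plane, which is what makes the sphere genuinely curved.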

However, if we were on the surface of a sphere of radius $r$, then the distance for small changes in φ and θ is

$$ds^2 = r^2\,d\theta^2 + r^2 \sin^2\theta\,d\phi^2$$


Now in this situation, the difference in the distance formula is not merely one in which we are using different coordinates to talk about the same thing; the thing that we are talking about is actually different. So this brings up a lot of questions. How do we know if the differences in distance formulas are real or are just differences in coordinate systems? Can we talk about distance formulas in a way that lets us naturally distinguish real differences from ones that are the result of our descriptions? How can we classify different geometries? All of this may be intuitively obvious when we are talking about three-dimensional spaces or two-dimensional objects such as spheres embedded in three-dimensional space. However, in order to talk about the behavior of four-dimensional space-time geometries, we need to rely on mathematical statements to get us answers.

Fortunately, mathematicians such as Riemann worked this all out in the middle of the 19th century. However, to understand how to deal with weird geometries, we will need to learn a few more concepts, such as the concept of a tensor.

Introduction to Tensors

What is a tensor?

A tensor is a powerful abstract entity. And while its abstractness makes it a somewhat difficult thing to describe, we can begin to get a feel for what a tensor is through a non-abstract example.

Suppose you are sailing. The wind is coming from a certain direction and can be described as a vector, a directional quantity. Now there are many ways to represent this vector. For example, you can represent it as a speed from a certain direction ($v$, $\theta$). Alternatively, you can break it up into components and describe the vector as a combination of a certain amount of wind from the east and another amount from the south ($v_x$, $v_y$). But despite the different ways of describing this vector, there is still this underlying abstract thing: the wind speed from a certain direction.

Now there is another important vector — the force that the wind produces when it hits the sail. If the direction of the force were always the same as the direction of the wind, then we could represent the relationship with a scalar, which just multiplies the wind vector by a constant factor to get the force vector.

However, life isn't that easy because the force is not always in the direction of the wind. In fact it usually isn't. So we can't represent the relationship between the wind and the force by a simple scalar. However, there is one important useful fact that we can use: the relationship between the wind speed and the force on the sail is (approximately at low speeds) linear. That is, if you double the speed of the wind, you double the force. The function that computes the force from the wind is a linear operator. The fact that this operator is linear lets us represent it in terms of a matrix, relative to a given basis:

$$\begin{pmatrix} f_x \\ f_y \end{pmatrix} = \begin{pmatrix} T_{xx} & T_{xy} \\ T_{yx} & T_{yy} \end{pmatrix} \begin{pmatrix} v_x \\ v_y \end{pmatrix}$$


Note that just like you can change the way you represent the speed of the wind ($v_x$ and $v_y$), and the force it produces ($f_x$ and $f_y$), you can change the way you represent the operator that connects the two. In fact, whenever you change the representations of the wind and the forces, you will have to change the matrix in order to talk about the same situation. However, just like a vector is an abstract thing that can represent the wind or the force that the wind produces when it hits the sail, there is another kind of abstract thing that you can use to represent the relation between these things. In symbols

$$\mathbf{f} = \mathbf{T}\,\mathbf{v}$$


That thing in the middle, that T, is an example of a tensor. It is defined by the way in which $\mathbf{f}$ relates to $\mathbf{v}$. Tensors are really abstract things, but we now can begin to see the power of this abstractness. For one thing you can do algebra with tensors. So, say instead of one sail T, you have two sails T and U. We can represent the total force

$$\mathbf{f} = \mathbf{T}\,\mathbf{v} + \mathbf{U}\,\mathbf{v}$$


Now because T and U are linear and can be represented by matrices, we can combine them to form a new tensor V.

$$\mathbf{f} = (\mathbf{T} + \mathbf{U})\,\mathbf{v} = \mathbf{V}\,\mathbf{v}$$


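We can also multiply two tensors together. We have the force that the wind produces on the ship. Now the force $\mathbf{g}$ that the ship produces on the water, when the wind is acting on the ship with force $\mathbf{f}$, can also be represented by a tensor $\mathbf{S}$ (the same way we did with wind, sail and ship):

$$\mathbf{g} = \mathbf{S}\,\mathbf{f} = \mathbf{S}\,\mathbf{V}\,\mathbf{v} = \mathbf{W}\,\mathbf{v}$$

where the tensor $\mathbf{W} = \mathbf{S}\,\mathbf{V}$ characterizes the ship and both sails.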
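The sail example can be sketched numerically. The Python below is a minimal illustration, not a model of real sails: the matrices `T` and `U` and the wind vector `v` are made-up numbers, and the helper functions are written out by hand so the example needs no external library. It checks that adding the two sail tensors gives the same total force as adding the two forces, and that after rotating the basis the transformed matrix applied to the transformed wind gives the transformed force.

```python
import math

def matvec(m, v):
    """Apply a matrix (list of rows) to a vector."""
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def rotation(angle):
    """Matrix that rotates 2D components by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return [[c, -s], [s, c]]

# Hypothetical sail tensors and wind vector (illustrative numbers only).
T = [[0.5, 0.2], [-0.1, 0.8]]
U = [[0.3, 0.0], [0.4, 0.1]]
v = [3.0, 4.0]

# Combining the sails: V = T + U acts on the wind in one step.
V = [[T[i][j] + U[i][j] for j in range(2)] for i in range(2)]
f_total = matvec(V, v)
f_separate = [a + b for a, b in zip(matvec(T, v), matvec(U, v))]

# Changing the basis changes every set of components, but not the relation.
R = rotation(0.3)
v_rot = matvec(R, v)
V_rot = matmul(matmul(R, V), transpose(R))
f_rot = matvec(V_rot, v_rot)
f_expected = matvec(R, f_total)
print(f_total, f_separate)
print(f_rot, f_expected)
```

The rule `V_rot = R V R^T` used here is the transformation law for an orthonormal basis under rotation; for more general coordinate changes the law involves both the change-of-basis matrix and its inverse.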

So what is a tensor?

Now that we have seen an example of a tensor, we can be more explicit about what a tensor is. A tensor is a linear function. In the case that we described, a tensor takes a vector and turns it into another vector. We can be more general and talk about tensors that turn a vector into a scalar or a scalar into a vector.

At this point, you should get some sense as to why tensors are important in general relativity. General relativity is all about matter changing the way that distances work. How do you find a distance? Well you take a vector and put it into a function. If the distance is short enough the function is (approximately) linear, and you can describe it as a tensor.

Special tensors

One special tensor is called the Kronecker delta tensor. It is just the identity matrix.

An identity is like 0 for normal addition (adding 0 to a number keeps it the same) and 1 for multiplication (multiplying a number by 1 gives the number itself).

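The identity matrix is often represented by $\delta_{ij}$, where

$$\delta_{ij} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$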


Actually, the fact that $\delta^i_j$ is a tensor (rather than just a symbol that has two indices) is due to the fact that a mixed tensor that has these components in any one coordinate frame will have the same components in every frame. Technically, we call such entities numerically invariant tensors.
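The invariance claim can be checked in one line. The derivation below is a sketch that assumes the standard transformation law for a mixed tensor with one upper and one lower index, written with partial derivatives between an unprimed coordinate system $x^k$ and a primed one $x'^i$.

```latex
\delta'^{\,i}{}_{j}
  = \frac{\partial x'^i}{\partial x^k}\,
    \frac{\partial x^l}{\partial x'^j}\,\delta^k{}_l
  = \frac{\partial x'^i}{\partial x^k}\,
    \frac{\partial x^k}{\partial x'^j}
  = \frac{\partial x'^i}{\partial x'^j}
  = \delta^i{}_j
```

The second step uses the Kronecker delta to set $l = k$, and the third is the chain rule. The same argument does not work for $\delta_{ij}$ with two lower indices, which keeps its components only under restricted transformations such as rotations.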

Applications of tensors

Tensors are used in many places.

Apart from general relativity, tensors are used extensively in Continuum Mechanics. Tensors can be used to specify the stress at any point in the continuum. Such tensors are called stress tensors. The stress at any small surface (a tiny region around a point) in the continuum can be obtained by the matrix multiplication of this stress tensor and the unit vector (column vector) that is normal to the surface.
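The matrix multiplication described here can be sketched in a few lines of Python. The stress values below are hypothetical numbers chosen for illustration, and `traction` is simply a name used in this example for the resulting stress vector on the surface.

```python
def traction(stress, normal):
    """Stress vector on a surface: the stress tensor applied to the unit normal."""
    return [sum(stress[i][j] * normal[j] for j in range(3)) for i in range(3)]

# Hypothetical symmetric stress tensor, in pascals (illustrative values only).
stress = [[100.0, 20.0, 0.0],
          [20.0, 50.0, 0.0],
          [0.0, 0.0, -30.0]]

# Unit normal of a small surface facing the x direction.
normal_x = [1.0, 0.0, 0.0]

t = traction(stress, normal_x)
print(t)  # the first column of the stress tensor: [100.0, 20.0, 0.0]
```

The result has a component along the normal (a normal stress) and a component in the plane of the surface (a shear stress), which is why a single number such as a pressure is not enough to describe the general case.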

General relativity in one page

There are two things that can be described with tensors.

One is the curvature. A tensor describes how much things are bent. Using this tensor you can do things like calculate distances and angles, and figure out the shortest path through an area of space.

The second is stress-energy which is roughly how much energy and momentum exists in a particular location and what direction it is flowing in.

The basic equation of general relativity relates these two tensors. That's the basic idea. Everything else involves just getting used to doing the math.

Exercises

1) Describe an example of a tensor.

2) Suppose you have a situation in which the response of the sail to the wind is non-linear. How can you describe this in terms of tensors?

3) Can pressure be expressed as a tensor?

Contravariant and Covariant Indices

Rank and Dimension

Now that we have talked about tensors, we need to figure out how to classify them. One important characteristic is the rank of a tensor, which is the number of indices needed to specify the tensor. An ordinary matrix is a rank 2 tensor, a vector is a rank 1 tensor, and a scalar is rank 0. Tensors can, in general, have rank greater than 2, and often do.

Another characteristic of a tensor is the dimension of the tensor, which is the number of values each index can take. For example, if we have a matrix consisting of 3 rows, with 4 elements in each row (columns), then the matrix is a tensor of dimension (3,4), or equivalently, dimension 12.

The important thing about rank and dimension is that they are invariant to changes in the coordinate system. You can change the coordinate system all you want, and the rank and the dimensions don't change. This brings up the important question of how tensors do change when you change the coordinate system. One thing we shall find when we look at the question is that in reality there are two different types of vectors.

Contravariant and Covariant Vectors

Imagine that you are driving a car at 100 kilometers per hour to the east, or along the positive x-axis. We shall call your velocity vector v. For now, we will keep the vectors one-dimensional. Suddenly you realize that you are in a meter-ish mood and so you want to figure out how fast you are going using meters instead of kilometers. Quickly changing your coordinate system, you find that you are traveling 100 * 1000 = 100 000 meters per hour easterly. We will call this vector v'. No problem.

Now you are in the rolling countryside, and you notice the temperature changing. You draw a map of how the temperature changes as you move across the countryside, and then travel along the path of steepest descent, or fastest cooling. At your current position, the temperature falls by 10 Celsius degrees per kilometer toward the east, so the temperature gradient is -10 Celsius degrees per kilometer. Let's call this temperature gradient vector w. Again, you go into a meter-ish mood. Doing a quick calculation, you figure out that the gradient of the temperature is -10/1000 = -0.01 Celsius degrees per meter. We shall call this vector w'.

Did you notice something interesting?

Even though we are talking about two vectors, we are treating them very differently when we change our coordinates. In the first case, the vector reacted to the coordinate change by a multiplication. That is to say, v' = k·v, with k = 1000. In the second case, we did a division: w' = (1/k)·w. In the first case we were changing a vector that was distance per something, while in the second case, the vector was something per distance. These are two very different types of vectors. The graphic below depicts the vectors representing v, v', w, and w'

The first set of vectors, representing velocity, is contravariant; as the scale decreases from kilometers to meters, the length of the vector increases. The second set of vectors, representing temperature gradient, is covariant; as the scale decreases from kilometers to meters, the length of the vector decreases as well.

The mathematical term for the first type of vector is a contravariant vector. The second type of vector is called a covariant vector. Sometimes a covariant vector is called a one-form.

Attempting a fuller explanation
It is easy to see why w is called covariant. Covariant simply means that the characteristic that w measures, change in temperature, increases in magnitude with an increase in displacement along the coordinate system. In other words, the further you travel from a fixed point, the more the temperature changes, or equivalently, change in temperature covaries with change in displacement.
Although it is a bit more difficult to see, v is called contravariant for precisely the opposite reason. Since v represents a velocity, or distance per unit time, we can think of v as the inverse of time per unit distance, meaning the amount of time that passes in traveling a certain fixed amount of distance. Time per unit distance is clearly covariant, because as you travel further and further from a fixed point, more and more time elapses. In other words, time covaries with displacement. Since velocity is the inverse of time per unit distance, then it follows that velocity must be contravariant.
The difference is also evident in the units of measure. The units of measure for v are meters per hour, whereas the units for w are degrees Celsius per meter. The coordinate system is position in space, measured in units of meters. So again, we see that the coordinate system appears in the numerator of v, which suggests that v is contravariant (with inverse time in this case), whereas the coordinate system appears in the denominator of w, which indicates that w is covariant (with change in temperature).
Contravariant vectors describe those quantities where the distance unit appears in the numerator (like velocity), whereas covariant vectors are those where the distance unit appears in the denominator (like temperature gradient).

These are, of course, just fancy mathematical names. As we can see, contravariant vectors and covariant vectors are very different from each other and we want to avoid confusing them with each other. To do this mathematicians have come up with a clever notation. The components of a contravariant vector are represented by superscripts, while the components of a covariant vector are represented by subscripts. So the components of vector v are $v^1$ and $v^2$ while the components of vector w are $w_1$ and $w_2$.

Scale Invariance

Now that we have contravariant vectors and covariant vectors, we can do something very interesting and combine them. We have a contravariant vector that describes the direction and speed at which we are going. We have a covariant vector that describes the rate and direction at which the temperature changes. If we combine them using the dot product

$$f = \mathbf{v} \cdot \mathbf{w}$$

dT/dt = 100 · (-10) = -1000 degrees Celsius per hour

we get the rate at which the temperature changes, f, as we move in a certain direction, with units of degrees Celsius per hour. The interesting thing about the units of f is that they do not include any units of distance, such as meters or kilometers. So now suppose we change the coordinate system from kilometers to meters. How does f change?

$$f' = \mathbf{v}' \cdot \mathbf{w}'$$

dT/dt = 100,000 · (-0.01) = -1000 degrees Celsius per hour

It doesn't. We call this characteristic scale invariance, and we say that f is a scale invariant quantity. The value of f is invariant with changes in the scale of the coordinate system.
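The bookkeeping behind this cancellation can be written out as a short Python sketch. It uses the numbers from the car and temperature example; the function name `rescale` and the variable names are illustrative only.

```python
def rescale(v_old, w_old, k):
    """Convert components to a length unit k times smaller (e.g. km to m, k = 1000)."""
    v_new = k * v_old   # contravariant: the distance unit is in the numerator
    w_new = w_old / k   # covariant: the distance unit is in the denominator
    return v_new, w_new

v_km, w_km = 100.0, -10.0                # km per hour, degrees Celsius per km
v_m, w_m = rescale(v_km, w_km, 1000.0)   # m per hour, degrees Celsius per m

f_km = v_km * w_km   # degrees Celsius per hour, computed in kilometers
f_m = v_m * w_m      # the same quantity, computed in meters
print(f_km, f_m)     # both are -1000 up to floating-point rounding
```

The factor k introduced by the velocity is cancelled by the factor 1/k introduced by the gradient, whatever value k takes.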

Now so far we have been treating w as if it were just an odd type of vector. But there is another, more powerful way of thinking about w. Look at what we just did. We took v, combined it with w and got something that doesn't change when you change the coordinate system. Now one way of thinking about it is to say that w is a function that takes v and converts it into a scale invariant value, f. In plain terms, w would be the function that takes in any velocity of a particle and produces the change in temperature that the particle experiences each hour (for the specific temperature field declared earlier).

Vector Spaces and Basis Vectors

This fact that a covariant vector like w can convert any contravariant vector like v into a scale invariant value like f is summarized by saying that w is a linear functional.

Let us be more precise about the word like. Mathematical operations, such as converting one sort of vector into another sort of vector, are done on vector spaces. See vector space for a careful definition of vector spaces. Here, loosely speaking, let us say that a vector space is a set of vectors which can be added together and multiplied by numbers and that the result is always another vector in the same vector space.

Let us define $V$ to be the vector space of contravariant vectors like v.

Then, the set of all covariant vectors like w, which convert vectors like v from $V$ into scalars like f, which we can also call the set of all linear functionals w on $V$, can be given the name $V^*$, which we call the dual space.

$V^*$ is also a vector space. Remember, we can view w as a vector or as a function, depending on which of its properties we wish to emphasize.

Now we can be more careful about the word like by saying which spaces w and v must be a member of: any vector w in $V^*$ (called a covariant vector, or a 1-form) can convert any vector v in $V$ (called a contravariant vector) into a scale invariant value like f. (We have not said what space or set f is a member of: in practice, we will usually only be interested in f as a member of the set of real numbers.)

Any vector space has a set of basis vectors. That is to say, if $\mathbf{v} \in V$, then $\mathbf{v}$ may be written as $\sum_i v^i \mathbf{e}_i$ where,

  • $i$ is an index ranging from 1 to the dimension of $V$.
  • The set {$\mathbf{e}_i$} are the basis vectors of vector space $V$.
  • $v^i$ is a constant.

Note that although components of contravariant vectors are written with superscript ("upper") indices, the basis vectors are written with subscript ("lower") indices. If the set {$\mathbf{e}_i$} is a basis for $V$, then $\mathbf{v}$ is written as the linear combination $v^i \mathbf{e}_i$. (We are using Einstein summation notation, detailed in the next section; this is shorthand for $\sum_i v^i \mathbf{e}_i$.)

Before moving on to covariant vectors, we must define the notion of a dual basis. Remember that elements of $V^*$ are linear functionals on $V$. So we can "apply" covariant vectors to contravariant vectors to get a scalar. For example, if $\tilde{\sigma} \in V^*$ and $\mathbf{v} \in V$, then $\tilde{\sigma}(\mathbf{v})$ returns a scalar. Now, the dual basis is defined as follows: if {$\mathbf{e}_i$} is a basis for $V$, then the dual basis is a basis {$\tilde{\omega}^i$} for $V^*$ which satisfies $\tilde{\omega}^i(\mathbf{e}_j) = \delta^i_j$ (where $\delta^i_j$ is the Kronecker delta) for every $i$ and $j$.

Now, the components of covariant vectors are written with subscript ("lower") indices. As {$\tilde{\omega}^i$} is a basis for $V^*$, we can write a covariant vector $\tilde{\sigma}$ as $\tilde{\sigma} = \sigma_i \tilde{\omega}^i$.

We can now evaluate any functional (covariant vector) applied to any vector (contravariant vector). If $\tilde{\sigma} = \sigma_i \tilde{\omega}^i$ and $\mathbf{v} = v^j \mathbf{e}_j$, then by linearity $\tilde{\sigma}(\mathbf{v}) = \sigma_i v^j\, \tilde{\omega}^i(\mathbf{e}_j) = \sigma_i v^j \delta^i_j = \sigma_i v^i$. Finally, if we define $\mathbf{v}(\tilde{\sigma}) = \tilde{\sigma}(\mathbf{v})$, we see that $\mathbf{v}(\tilde{\sigma}) = v^i \sigma_i$.
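A dual basis can be computed explicitly for a small example. The Python sketch below works in two dimensions with a deliberately non-orthogonal basis; the basis vectors, the components of the functional and the function names are all assumptions made for this illustration. Covectors are stored as pairs of numbers that act on a vector through an ordinary sum of products.

```python
def dual_basis(e1, e2):
    """Return covectors (w1, w2) with w^i(e_j) = delta^i_j for a 2D basis."""
    det = e1[0] * e2[1] - e1[1] * e2[0]
    if det == 0:
        raise ValueError("e1 and e2 are not linearly independent")
    # The dual covectors are the rows of the inverse of the matrix whose
    # columns are e1 and e2.
    w1 = (e2[1] / det, -e2[0] / det)
    w2 = (-e1[1] / det, e1[0] / det)
    return w1, w2

def apply(covector, vector):
    """Evaluate a covector on a vector."""
    return sum(c * x for c, x in zip(covector, vector))

# A non-orthogonal basis (illustrative choice).
e1, e2 = (1.0, 0.0), (1.0, 2.0)
w1, w2 = dual_basis(e1, e2)

# The vector v = 3 e1 + 4 e2; the dual basis recovers its components.
v = (3 * e1[0] + 4 * e2[0], 3 * e1[1] + 4 * e2[1])
v_components = (apply(w1, v), apply(w2, v))

# A functional with components sigma_1 = 2, sigma_2 = 5 in the dual basis.
sigma = (2 * w1[0] + 5 * w2[0], 2 * w1[1] + 5 * w2[1])
direct = apply(sigma, v)                                    # sigma(v) evaluated directly
by_components = 2 * v_components[0] + 5 * v_components[1]   # sigma_i v^i
print(v_components, direct, by_components)
```

The last two numbers agree, which is the statement that the functional applied to the vector equals the sum of products of their components when dual bases are used.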

Einstein Summation Notation

In the last sections we talked about a number of operations involving tensors. One of them is to take a covariant vector and a contravariant vector and turn them into a scalar. Another is to get a contravariant vector and put it into a tensor and get out a force.

Since we want to do math with these, let us try to see how we can represent them. We take as an example trying to combine a contravariant vector (v), which represents the direction and speed at which we are travelling, and a covariant vector (w), which represents the rate at which the temperature changes with distance in a certain direction. We want to get the scale invariant quantity describing the rate at which the temperature changes with time as we move in direction v.

Now we could do it really abstractly. For example if we want to combine a contravariant tensor and covariant tensor to get a scalar we could write...

$$f = \mathbf{v} \cdot \mathbf{w}$$


This is just our old friend the dot product. This has the advantage that it is short and simple to write. However, the problem with this is that it doesn't let us know what f, v, and w are. f is a scalar. v is a contravariant tensor. w is a covariant tensor. This wasn't a problem in basic vector calculus, where we just had to deal with scalars and vectors. But it is a problem now that our mathematical zoo has more animals.

The next approach would be to write everything as a component. So we have

$$f = v^0 w_0 + v^1 w_1 + v^2 w_2 + v^3 w_3$$


The trouble with this is that it is a lot of typing of the same symbols, over and over again. Let's write it out in summation notation.

$$f = \sum_{i=0}^{3} v^i w_i$$


Better... But that summation sign, do we really want to write it over and over and over and over? What does it give us? We can be really clever and just write

$$f = v^i w_i$$


and just know that when we see the same index on top and on the bottom, we mean to take a sum. This is called Einstein summation notation. Whenever one sees the same letter on both superscript ("upper") indices and subscript ("lower") indices in a product, one automatically sums over the indices. Note that in GR, indices usually range from 0 to 3. (Note: Greek letters typically range from 0 to 3, while Roman letters range from 1 to 3).

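Here are some more examples of the Einstein summation notation being used:

1. $A^\mu B_\mu = A^0 B_0 + A^1 B_1 + A^2 B_2 + A^3 B_3$
2. $T^{\mu\nu} A_\mu B_\nu = T^{00} A_0 B_0 + T^{01} A_0 B_1 + T^{02} A_0 B_2 + \dots$ etc. (16 terms total)
3. $T^\mu{}_\mu = T^0{}_0 + T^1{}_1 + T^2{}_2 + T^3{}_3$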
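The summation convention corresponds directly to loops over the repeated indices. The following Python sketch spells this out for a single contraction and for a double contraction in four dimensions; the component values are arbitrary and the function names are illustrative.

```python
def contract(v_upper, w_lower):
    """f = v^i w_i: sum over the index that appears once up and once down."""
    return sum(v * w for v, w in zip(v_upper, w_lower))

def double_contract(T, A, B):
    """T^{mu nu} A_mu B_nu: returns the sum and the number of terms in it."""
    terms = [T[mu][nu] * A[mu] * B[nu]
             for mu in range(len(A)) for nu in range(len(B))]
    return sum(terms), len(terms)

# Arbitrary four-dimensional components (indices 0 to 3).
v = [1, 2, 3, 4]
w = [5, 6, 7, 8]
f = contract(v, w)   # 1*5 + 2*6 + 3*7 + 4*8 = 70

# With T equal to the identity, the double sum reduces to A_0 B_0 + ... + A_3 B_3.
T = [[1 if mu == nu else 0 for nu in range(4)] for mu in range(4)]
total, n_terms = double_contract(T, v, v)   # 1 + 4 + 9 + 16 = 30, from 16 terms
print(f, total, n_terms)
```

Each repeated index pair adds one nested loop, so two repeated pairs over four values give 4 × 4 = 16 terms.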


Identities

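Several identities arise from indicial notation. A repeated (dummy) index may be renamed freely:

$$v^i w_i = v^j w_j$$

Since $\delta^i_j = 0$ if $i \neq j$,

$$\delta^i_j v^j = v^i$$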


Rigorous Definition of Tensors

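We have seen that a 1-form ("covariant vector") can be thought of as an operator with one slot in which we insert a vector ("contravariant vector") and get the scalar $\tilde{\sigma}(\mathbf{v})$. Similarly, a vector can be thought of as an operator with one slot in which we can insert a 1-form to obtain the scalar $\mathbf{v}(\tilde{\sigma})$. As operators, they are linear, i.e., $\tilde{\sigma}(a\mathbf{u} + b\mathbf{v}) = a\,\tilde{\sigma}(\mathbf{u}) + b\,\tilde{\sigma}(\mathbf{v})$.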

A tensor of rank n is an operator with n slots for inserting vectors or 1-forms, which, when all n slots are filled, returns a scalar. In order for such an operator to be a tensor, it must be linear in each slot and obey certain transformation rules (more on this later). An example of a rank 2 tensor is $\mathbf{T} = T^\alpha{}_\beta\, \mathbf{e}_\alpha \otimes \tilde{\omega}^\beta$. The symbol $\otimes$ (pronounced "tensor") tells you which slot each index acts on. This tensor $\mathbf{T}$ is said to be of type $\binom{1}{1}$ because it has one contravariant slot and one covariant slot. Since $\mathbf{e}_\alpha$ acts on the first slot and $\tilde{\omega}^\beta$ acts on the second slot, we must insert a 1-form in the first slot and a vector in the second slot (remember, 1-forms act on vectors and vice-versa). Filling both of these slots, say with $\tilde{\sigma}$ and $\mathbf{v}$, will return the scalar $\mathbf{T}(\tilde{\sigma}, \mathbf{v})$. We can use linearity (remember, the tensor is linear in each slot) to evaluate this number:

$$\mathbf{T}(\tilde{\sigma}, \mathbf{v}) = T^\alpha{}_\beta\, \mathbf{e}_\alpha(\tilde{\sigma})\, \tilde{\omega}^\beta(\mathbf{v}) = T^\alpha{}_\beta\, \sigma_\alpha v^\beta$$


We don't have to fill all of the slots. This will of course not produce a scalar, but it will lower the rank of the tensor. For example, if we fill the second slot of $\mathbf{T}$, but not the first, we get a rank 1 tensor of type $\binom{1}{0}$ (which is a contravariant vector):

$$\mathbf{T}(\,\cdot\,, \mathbf{v}) = T^\alpha{}_\beta\, v^\beta\, \mathbf{e}_\alpha$$
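In terms of components, filling slots is a matter of summing over the matching indices. The Python sketch below treats a type $\binom{1}{1}$ tensor as a table of components `T[a][b]` (upper index first, lower index second) in two dimensions; the numbers and function names are made up for this illustration.

```python
def fill_both(T, sigma, v):
    """T(sigma, v) = T^a_b sigma_a v^b: both slots filled, returns a scalar."""
    n = len(v)
    return sum(T[a][b] * sigma[a] * v[b] for a in range(n) for b in range(n))

def fill_second(T, v):
    """T(., v): only the second slot filled, returns the components T^a_b v^b."""
    n = len(v)
    return [sum(T[a][b] * v[b] for b in range(n)) for a in range(n)]

# Illustrative components.
T = [[1, 2],
     [3, 4]]
sigma = [1, -1]   # components sigma_a of a 1-form
v = [2, 5]        # components v^b of a vector

vector = fill_second(T, v)        # [1*2 + 2*5, 3*2 + 4*5] = [12, 26]
scalar = fill_both(T, sigma, v)   # 1*12 + (-1)*26 = -14
print(vector, scalar)
```

Feeding the 1-form into the remaining slot of the partially filled tensor gives the same scalar as filling both slots at once, which is what linearity in each slot guarantees.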


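For another example, consider the rank 5 tensor $\mathbf{S} = S^\alpha{}_{\beta\gamma}{}^\delta{}_\epsilon\, \mathbf{e}_\alpha \otimes \tilde{\omega}^\beta \otimes \tilde{\omega}^\gamma \otimes \mathbf{e}_\delta \otimes \tilde{\omega}^\epsilon$. This is a tensor of type $\binom{2}{3}$. We can fill all of its slots to get a scalar:

$$\mathbf{S}(\tilde{\sigma}, \mathbf{u}, \mathbf{v}, \tilde{\rho}, \mathbf{w}) = S^\alpha{}_{\beta\gamma}{}^\delta{}_\epsilon\, \sigma_\alpha u^\beta v^\gamma \rho_\delta w^\epsilon$$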


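Filling only the 3rd and 4th slots, we get a rank 3 tensor of type $\binom{1}{2}$:

$$\mathbf{S}(\,\cdot\,, \,\cdot\,, \mathbf{v}, \tilde{\rho}, \,\cdot\,) = S^\alpha{}_{\beta\gamma}{}^\delta{}_\epsilon\, v^\gamma \rho_\delta\, \mathbf{e}_\alpha \otimes \tilde{\omega}^\beta \otimes \tilde{\omega}^\epsilon$$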


As a final note, it should be mentioned that in General Relativity we will always have a special tensor called the "metric tensor" which will allow us to convert contravariant indices to covariant indices and vice-versa. This way, we can change the tensor type $\binom{n}{m}$ and be able to insert either 1-forms or vectors into any slot of a given tensor.

Coordinate systems and the comma derivative

In General Relativity we write our (4-dimensional) coordinates as $x^\mu$. The flat Minkowski spacetime coordinates ("Local Lorentz frame") are $x^0 = ct$, $x^1 = x$, $x^2 = y$, and $x^3 = z$, where $c$ is the speed of light, $t$ is time, and $x$, $y$, and $z$ are the usual 3-dimensional Cartesian space coordinates.

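A comma derivative is just a convenient notation for a partial derivative with respect to one of the coordinates. Here are some examples:

$$f_{,\mu} = \frac{\partial f}{\partial x^\mu}$$

$$v^\alpha{}_{,\beta} = \frac{\partial v^\alpha}{\partial x^\beta}$$

$$T^{\alpha\beta}{}_{,\gamma} = \frac{\partial T^{\alpha\beta}}{\partial x^\gamma}$$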





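If several indices appear after the comma, they are all taken to be part of the differentiation. Here are some examples:

$$f_{,\alpha\beta} = \frac{\partial^2 f}{\partial x^\beta\, \partial x^\alpha}$$

$$v^\alpha{}_{,\beta\gamma} = \frac{\partial^2 v^\alpha}{\partial x^\gamma\, \partial x^\beta}$$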



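Now, we change coordinate systems via the Jacobian $x^{\mu'}{}_{,\alpha} = \partial x^{\mu'} / \partial x^\alpha$. The transformation rule for the components of a contravariant vector is $v^{\mu'} = x^{\mu'}{}_{,\alpha}\, v^\alpha$.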

Finally, we present the following important theorem:

$$x^\alpha{}_{,\mu'}\, x^{\mu'}{}_{,\beta} = \delta^\alpha_\beta$$


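Proof: $x^\alpha{}_{,\mu'}\, x^{\mu'}{}_{,\beta} = \dfrac{\partial x^\alpha}{\partial x^{\mu'}} \dfrac{\partial x^{\mu'}}{\partial x^\beta}$, which by the chain rule is $\dfrac{\partial x^\alpha}{\partial x^\beta}$, which is of course $\delta^\alpha_\beta$.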
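The statement that the two Jacobians are inverse to each other can be checked numerically. The Python sketch below uses plane polar coordinates as the second coordinate system, which is an assumption made for this example only; the partial derivatives are written out by hand, and the evaluation point is arbitrary.

```python
import math

def jac_cart_wrt_polar(r, theta):
    """Partial derivatives of (x, y) with respect to (r, theta)."""
    return [[math.cos(theta), -r * math.sin(theta)],
            [math.sin(theta), r * math.cos(theta)]]

def jac_polar_wrt_cart(x, y):
    """Partial derivatives of (r, theta) with respect to (x, y)."""
    r2 = x * x + y * y
    r = math.sqrt(r2)
    return [[x / r, y / r],
            [-y / r2, x / r2]]

def matmul2(a, b):
    """Product of two 2x2 matrices: the sum over the repeated middle index."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

# Arbitrary evaluation point away from the origin.
r, theta = 2.0, 0.6
x, y = r * math.cos(theta), r * math.sin(theta)

product = matmul2(jac_cart_wrt_polar(r, theta), jac_polar_wrt_cart(x, y))
print(product)  # close to the identity matrix [[1, 0], [0, 1]]
```

The origin is excluded because polar coordinates are singular there and the second Jacobian is not defined.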

Tensors and geometry

Metric tensor

Recall that a tensor is a linear function which can convert vectors into scalars. Recall also that a distance can be stated as a formula that converts vectors to a scalar. So can we express distance with tensor formulas? Yes, we can.

The first problem is that tensors are linear functions, but we have some squares in our distance formula. We can deal with this with a mathematical trick. Consider the formula for distance in normal three-dimensional Euclidean space using Cartesian coordinates:

$$ds^2 = dx^2 + dy^2 + dz^2$$


We can rewrite this as:

$$ds^2 = \delta_{ij}\, dx^i\, dx^j$$


Now $\delta_{ij}$ is obviously a tensor. What type of tensor is it? Well it takes two contravariant vectors and turns them into a scalar $ds^2$. So it must be a covariant tensor of rank 2. $\delta_{ij}$ is called the Kronecker delta tensor, which is 1 whenever $i = j$ and 0 otherwise. In general, instead of components $\delta_{ij}$, we have $g_{ij}$:

$$ds^2 = g_{ij}\, dx^i\, dx^j$$


This leads us to a general metric tensor $g_{ij}$. As shown earlier, in Euclidean 3-space, $g_{ij}$ is simply the Kronecker delta matrix.

And that is the equation of distances in Euclidean three space in tensor notation.

Now let's do special relativity using this notation:

$$ds^2 = g_{\mu\nu}\, dx^\mu\, dx^\nu$$


where the Greek letters just remind us that we are summing over four-dimensional spacetime. Now in the case of special relativity $g_{\mu\nu}$ is zero where $\mu$ and $\nu$ are different, +1 for the space indices 1, 2, 3 and $-1$ for the time index (with $x^0 = ct$). We can call this special matrix $\eta_{\mu\nu}$, giving us the formulas:

$$ds^2 = \eta_{\mu\nu}\, dx^\mu\, dx^\nu = -c^2\,dt^2 + dx^2 + dy^2 + dz^2$$
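The invariance of this interval can be checked with a short Python sketch. It assumes coordinates (ct, x, y, z) with the metric diag(-1, +1, +1, +1), and it uses a standard Lorentz boost along the x-axis as the change of coordinates; the boost is brought in here purely as an illustration, and the displacement and speed are arbitrary values.

```python
import math

# Diagonal components of the Minkowski metric for coordinates (ct, x, y, z).
ETA = [-1.0, 1.0, 1.0, 1.0]

def interval(dx):
    """ds^2 = eta_{mu nu} dx^mu dx^nu for a diagonal metric."""
    return sum(ETA[mu] * dx[mu] ** 2 for mu in range(4))

def boost_x(dx, beta):
    """Components of the same displacement seen from a frame moving at speed beta*c along x."""
    gamma = 1.0 / math.sqrt(1.0 - beta ** 2)
    ct, x, y, z = dx
    return [gamma * (ct - beta * x), gamma * (x - beta * ct), y, z]

dx = [5.0, 1.0, 2.0, 3.0]                # arbitrary displacement (c dt, dx, dy, dz)
s2 = interval(dx)                        # -25 + 1 + 4 + 9 = -11
s2_boosted = interval(boost_x(dx, 0.6))  # same value up to rounding
print(s2, s2_boosted)
```

The individual components change under the boost (here c dt becomes 5.5 and dx becomes -2.5), but the combination picked out by the metric does not.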


In general, however, $g_{\mu\nu}$ will not be a constant. A simple example where we can see that is spherical coordinates, with the metric

$$ds^2 = dr^2 + r^2\,d\theta^2 + r^2 \sin^2\theta\,d\phi^2$$


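Here, $x^1 = r$, $x^2 = \theta$, $x^3 = \phi$, $g_{11} = 1$, $g_{22} = r^2$, $g_{33} = r^2 \sin^2\theta$, and $g_{ij} = 0$ for $i \neq j$.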

Also, a metric may have off-diagonal terms, as in

$$ds^2 = (dx^1)^2 + dx^1\,dx^2 + (dx^2)^2$$


It is easy to see that $g_{11} = g_{22} = 1$ and $g_{12} = g_{21} = \tfrac{1}{2}$.

Raising and Lowering Indices

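Given a tensor $\mathbf{T}$, the components $T^\alpha{}_\beta$ are given by $T^\alpha{}_\beta = \mathbf{T}(\tilde{\omega}^\alpha, \mathbf{e}_\beta)$ (just insert appropriate basis vectors and basis one-forms into the slots to get the components).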

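So, given a metric tensor $\mathbf{g}$, we get components $g_{\alpha\beta} = \mathbf{g}(\mathbf{e}_\alpha, \mathbf{e}_\beta)$ and $g^{\alpha\beta} = \mathbf{g}(\tilde{\omega}^\alpha, \tilde{\omega}^\beta)$. Note that $g^\alpha{}_\beta = \delta^\alpha_\beta$ since $\mathbf{g}(\tilde{\omega}^\alpha, \mathbf{e}_\beta) = \tilde{\omega}^\alpha(\mathbf{e}_\beta) = \delta^\alpha_\beta$.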

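Now, given a metric, we can convert from contravariant indices to covariant indices. The components of the metric tensor act as "raising and lowering operators" according to the rules $T^\alpha = g^{\alpha\beta}\, T_\beta$ and $T_\alpha = g_{\alpha\beta}\, T^\beta$. Here are some examples:

$$T^{\alpha\beta} = g^{\alpha\mu}\, g^{\beta\nu}\, T_{\mu\nu}, \qquad T^\alpha{}_\beta = g^{\alpha\mu}\, T_{\mu\beta}$$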


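Finally, here is a useful trick: thinking of the components of the metric as a matrix, it is true that $[g^{\alpha\beta}] = [g_{\alpha\beta}]^{-1}$ since $g^{\alpha\mu}\, g_{\mu\beta} = g^\alpha{}_\beta = \delta^\alpha_\beta$.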
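For a diagonal metric the inverse matrix is found by inverting each diagonal entry, so raising and lowering reduce to multiplying or dividing each component. The Python sketch below assumes the spherical-coordinate metric diag(1, r², r² sin²θ) with coordinates (r, θ, φ); the function names and the sample components are illustrative, and the shortcut does not apply to metrics with off-diagonal terms.

```python
import math

def spherical_metric(r, theta):
    """Diagonal components (g_11, g_22, g_33) for coordinates (r, theta, phi)."""
    return [1.0, r ** 2, (r * math.sin(theta)) ** 2]

def lower(v_upper, g_diag):
    """v_a = g_ab v^b for a diagonal metric."""
    return [g * v for g, v in zip(g_diag, v_upper)]

def raise_index(v_lower, g_diag):
    """v^a = g^ab v_b, where g^aa = 1 / g_aa for a diagonal metric."""
    return [v / g for g, v in zip(g_diag, v_lower)]

g = spherical_metric(2.0, math.pi / 2)   # approximately [1, 4, 4] on the equator at r = 2
v_up = [1.0, 2.0, 3.0]                   # arbitrary contravariant components
v_down = lower(v_up, g)                  # approximately [1, 8, 12]
v_back = raise_index(v_down, g)          # recovers the original components
print(v_down, v_back)
```

Lowering and then raising returns the original components, which is the component form of the statement that the two forms of the metric are matrix inverses of each other.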


A geodesic is the generalization of a straight line to curved space. The study of geodesics relies largely on the calculus of variations.

Metric Geodesics

A metric geodesic is defined as a curve along the shortest or longest possible distance between two points. Mathematically, it is defined as a curve whose length does not change, to first order, under small variations that vanish at the endpoints. This stationary length could be a minimum, a maximum, or a point of inflection.

Mathematically, a metric geodesic is defined as a curve satisfying

$$\delta \int ds = 0$$

Affine Geodesics

For instance, on the surface of a sphere the shortest possible path between two points always lies along the great circle that runs through those two points. Unless they are exactly opposite each other, two points on the sphere define exactly one such "line" that runs through them. This line can't be said to be straight in the Euclidean sense of the word. However, for the curved surface of the sphere it represents the shortest possible distance and is therefore a metric geodesic of that space and represents a straight path for that space. Another possible geodesic for those two points is the other, longer part of the same great circle.
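A quick numerical comparison shows why the great circle is the relevant path. The Python sketch below takes two points at the same latitude on a unit sphere and compares the great-circle distance with the length of the path that stays on the circle of constant latitude. The spherical law of cosines used for the great-circle distance is a standard formula that is not derived in this book, and the chosen latitude and longitudes are arbitrary.

```python
import math

def great_circle_distance(lat1, lon1, lat2, lon2, radius=1.0):
    """Shortest distance on a sphere, from the spherical law of cosines (radians)."""
    cos_angle = (math.sin(lat1) * math.sin(lat2)
                 + math.cos(lat1) * math.cos(lat2) * math.cos(lon2 - lon1))
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return radius * math.acos(cos_angle)

def parallel_distance(lat, lon1, lon2, radius=1.0):
    """Length of the path that stays on the circle of constant latitude."""
    return radius * math.cos(lat) * abs(lon2 - lon1)

# Two points at latitude 60 degrees, on opposite sides of the pole.
lat = math.radians(60.0)
lon_a, lon_b = 0.0, math.pi

gc = great_circle_distance(lat, lon_a, lat, lon_b)   # over the pole: pi/3, about 1.047
par = parallel_distance(lat, lon_a, lon_b)           # around the parallel: pi/2, about 1.571
print(gc, par)
```

Following the circle of latitude looks straight on many maps, but it is about 50% longer here than the great-circle route over the pole.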


The geometry taught in schools is Euclidean geometry: the geometry of a flat surface. Here all the familiar results apply, e.g. the angles of a triangle add up to 180°, and the area of a circle increases in proportion to the square of its radius. However, on a curved surface, e.g. the surface of a sphere, these results no longer apply. The angles of a triangle add up to more than 180° (for example, 270° for a triangle with three right angles), and flat-surface geometry no longer works. Such a surface is said to have a positive curvature.

Negatively curved surfaces also exist. They are shaped somewhat like an infinitely extended saddle, and Euclidean geometry does not apply to these surfaces either. For example, the angles of a triangle add up to less than 180°.

If we extend these ideas to three dimensions, (do not be worried if you can't imagine a three-dimensional surface of a sphere, the human mind was never equipped to do so), we have three options to describe the geometry of the universe. Either:

  1. The curvature of space is zero: i.e. Euclidean geometry applies
  2. Space has positive curvature, i.e. it is shaped as a hypersphere (3D spherical surface)
  3. Space has negative curvature, i.e. it is shaped like a so-called hypersaddle

Latest Limits on anisotropy of background radiation from WMAP

WMAP now places 50% tighter limits on the standard model of cosmology (cold dark matter and a cosmological constant in a flat universe), and there is no compelling sign of deviations from this model.

WMAP has detected a key signature of inflation. WMAP data place tight constraints on the hypothesized burst of growth in the first trillionth of a second of the universe, called 'inflation', when ripples in the very fabric of space may have been created. The 7-year data provide compelling evidence that the large-scale fluctuations are slightly more intense than the small-scale ones, a subtle prediction of many inflation models.

NASA's WMAP project showed to within 2% accuracy, by measuring angles between notable features in the Cosmic Microwave Background, that the universe is indeed flat (not in the pancake sense of the word, but meaning that it obeys the laws of Euclidean geometry). This has several intriguing implications (for example, it has been argued to imply that the total mass-energy of the universe is zero), some of which are covered later in this article.

General Relativity and Spacetime Curvature

Einstein's brilliance was to suggest that although gravity manifests itself as a force, it is in fact a result of the geometry of spacetime itself. He suggested that matter causes spacetime to curve positively. The sun, for instance, warps spacetime, and it is to this warping of geometry that the planets react, not directly to the sun itself. This is a central tenet of the general theory of relativity. This local curvature can be described in mathematical terms using tensor calculus, an elegant tool which provides consistent results regardless of the chosen frame of reference.

This predicts that if a giant triangle were to be constructed around the sun, the angles at its vertices would in fact add up to more than 180°. This is easy to imagine if one thinks of the sun as warping geometry, causing the triangle to have "wonky" sides. However, it is important to note that these lines are in fact the straightest lines possible (geodesics) in this warped geometry.

These predictions can be tested, and have been to a very high degree of accuracy.

How come matter doesn't cause the universe to have an overall positive curvature?

How can the universe exhibit Euclidean geometry if stars and planets distort it locally? Einstein's mass-energy equivalence predicts that not only stars and planets contribute to this local distortion, but energy does as well. So mass, the Cosmic Microwave Background and other electromagnetic energy all contribute to the positive curvature of spacetime. How, then, is the universe flat?

The answer lies in a curious fact about gravitation. Imagine if you were to pluck the Earth from its orbit, so that it was no longer affected by the gravitational field of the sun. You would have to expend energy to do so, implying that the potential energy possessed by the Earth by virtue of its position in orbit around the sun is in fact negative, as it requires an input of energy to raise it to a state of zero gravitational potential.

Now if mass and observed "positive" energy cause spacetime to curve one way, then gravitational "negative" energy must curve it the other way, leading to the observed universe with zero curvature.

Consider this for a moment. If net positive mass-energy means positive curvature, and similarly negative mass-energy means negative curvature, then spacetime with zero curvature implies zero mass-energy.


Riemann tensor

In the mathematical field of differential geometry, the Riemann curvature tensor is the most standard way to express curvature of Riemannian manifolds. It is one of many things named after Bernhard Riemann. The curvature tensor is given in terms of a Levi-Civita connection $\nabla$ by the following formula:

$$R(u,v)w = \nabla_u\nabla_v w - \nabla_v\nabla_u w - \nabla_{[u,v]}w$$


NB. Some authors define the curvature tensor with the opposite sign.

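If $u = \partial/\partial x^i$ and $v = \partial/\partial x^j$ are coordinate vector fields then $[u,v] = 0$ and therefore the formula simplifies to

$$R(u,v)w = \nabla_u\nabla_v w - \nabla_v\nabla_u w$$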


i.e. the curvature tensor measures noncommutativity of the covariant derivative.

The Riemann curvature tensor, especially in its coordinate expression (see below), is a central mathematical tool of general relativity, the modern theory of gravity.

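Coordinate expression

In local coordinates $x^\mu$ the Riemann curvature tensor is given by

$$R^\rho{}_{\sigma\mu\nu} = dx^\rho\left(R(\partial_\mu,\partial_\nu)\,\partial_\sigma\right)$$

where $\partial_\mu = \partial/\partial x^\mu$ are the coordinate vector fields. The above expression can be written using Christoffel symbols:

$$R^\rho{}_{\sigma\mu\nu} = \partial_\mu\Gamma^\rho_{\nu\sigma} - \partial_\nu\Gamma^\rho_{\mu\sigma} + \Gamma^\rho_{\mu\lambda}\Gamma^\lambda_{\nu\sigma} - \Gamma^\rho_{\nu\lambda}\Gamma^\lambda_{\mu\sigma}$$

The transformation of a vector $A^\sigma$ after circling an infinitesimal rectangle with sides $\delta a^\mu$ and $\delta b^\nu$ is: $\delta A^\rho = R^\rho{}_{\sigma\mu\nu}\,A^\sigma\,\delta a^\mu\,\delta b^\nu$ (the overall sign depends on the direction in which the loop is traversed).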

Symmetries and identities

The Riemann curvature tensor has the following symmetries:

$$R(u,v) = -R(v,u)$$
$$\langle R(u,v)w, z\rangle = -\langle R(u,v)z, w\rangle$$
$$R(u,v)w + R(v,w)u + R(w,u)v = 0$$

The last identity was discovered by Gregorio Ricci-Curbastro, but is often called the first Bianchi identity or algebraic Bianchi identity, because it looks similar to the Bianchi identity below. These three identities form a complete list of symmetries of the curvature tensor, i.e. given any tensor which satisfies the identities above, one can find a Riemannian manifold with such a curvature tensor at some point. Simple calculations show that such a tensor has $n^2(n^2-1)/12$ independent components, where $n$ is the dimension of the manifold.
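As a quick check of the counting formula, the sketch below evaluates $n^2(n^2-1)/12$ for a few dimensions. The function name is hypothetical.

```python
def riemann_independent_components(n):
    """Number of independent components of the Riemann tensor in n dimensions."""
    # n^2 (n^2 - 1) is always divisible by 12, so integer division is exact.
    return n * n * (n * n - 1) // 12


# Dimension -> number of independent components.
counts = {n: riemann_independent_components(n) for n in (1, 2, 3, 4)}
```

A curve has no intrinsic curvature, a surface has a single independent component (the Gaussian curvature), and four-dimensional spacetime has twenty.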

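Yet another useful identity follows from these three:

$$\langle R(u,v)w, z\rangle = \langle R(w,z)u, v\rangle$$

The Bianchi identity (often called the second Bianchi identity or differential Bianchi identity) involves the covariant derivative:

$$\nabla_u R(v,w) + \nabla_v R(w,u) + \nabla_w R(u,v) = 0$$

Given any coordinate chart about some point on the manifold, the above identities may be written in terms of the components of the Riemann tensor at this point as:

  $R_{a[bcd]} = 0$ (first Bianchi identity)
  $R_{ab[cd;e]} = 0$ (second Bianchi identity)

where the square brackets denote antisymmetrisation over the enclosed indices (which, because the Riemann tensor is antisymmetric in its last two indices, reduces to a cyclic sum) and the semi-colon is a covariant derivative.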

For surfaces

For a two-dimensional surface, the Bianchi identities imply that the Riemann tensor can be expressed as

$$R_{abcd} = K\left(g_{ac}\,g_{db} - g_{ad}\,g_{cb}\right)$$

where $g_{ab}$ is the metric tensor and $K$ is a function called the Gaussian curvature and a, b, c and d take values either 1 or 2. As expected we see that the Riemann curvature tensor only has one independent component.

The Gaussian curvature coincides with the sectional curvature of the surface. It is also exactly half the scalar curvature of the 2-manifold, while the Ricci curvature tensor of the surface is simply given by

$$R_{ab} = K\,g_{ab}$$
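These statements can be verified numerically for a concrete surface. The sketch below assumes the unit sphere in polar coordinates, with metric diag(1, sin²θ) and Gaussian curvature K = 1, and assumes the form $R_{abcd} = K(g_{ac}g_{db} - g_{ad}g_{cb})$ for the Riemann tensor. Indices run over 0 and 1 in the code rather than 1 and 2.

```python
import math

theta = 1.0   # sample polar angle in radians (arbitrary illustrative choice)
K = 1.0       # Gaussian curvature of the unit sphere

# Metric of the unit sphere in (theta, phi) coordinates and its inverse.
g = [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]
g_inv = [[1.0, 0.0], [0.0, 1.0 / math.sin(theta) ** 2]]

idx = range(2)

# R[a][b][c][d] = K (g_ac g_db - g_ad g_cb)
R = [[[[K * (g[a][c] * g[d][b] - g[a][d] * g[c][b]) for d in idx]
       for c in idx] for b in idx] for a in idx]

# Ricci tensor: contract the first and third indices with the inverse metric.
ricci = [[sum(g_inv[a][c] * R[a][b][c][d] for a in idx for c in idx)
          for d in idx] for b in idx]

# Scalar curvature: trace of the Ricci tensor.
scalar = sum(g_inv[b][d] * ricci[b][d] for b in idx for d in idx)
```

The contraction reproduces $R_{ab} = K g_{ab}$, and the scalar curvature comes out as 2K, twice the Gaussian curvature.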


Covariant Differentiation


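Now that we have established some of the basics of tensor algebra and curved space, let's try to do something within that curved space. We start by taking a derivative. In Einstein notation, taking a derivative of a vector field looks like

$$\partial_\mu A^\nu = \frac{\partial A^\nu}{\partial x^\mu}$$

In the spirit of seeing how things work when we transform coordinates, we convert the coordinates from $x^\mu$ to $x'^\mu$ by a tensor transformation $\Lambda^\mu{}_\nu = \partial x'^\mu/\partial x^\nu$, with inverse $(\Lambda^{-1})^\rho{}_\sigma = \partial x^\rho/\partial x'^\sigma$.

So let's figure out what our derivatives look like in our new coordinate system.

$$A'^\mu = \Lambda^\mu{}_\nu A^\nu, \qquad \partial'_\sigma = (\Lambda^{-1})^\rho{}_\sigma\,\partial_\rho$$

$$\partial'_\sigma A'^\mu = (\Lambda^{-1})^\rho{}_\sigma\,\Lambda^\mu{}_\nu\,\partial_\rho A^\nu$$

So if the transform is a constant we get a very nice result: the derivative transforms like a tensor.

The result is much less nice if the transform changes with location, that is to say instead of the constant transform $\Lambda^\mu{}_\nu$ we use the transform $\Lambda^\mu{}_\nu(x)$:

$$\partial'_\sigma A'^\mu = (\Lambda^{-1})^\rho{}_\sigma\,\Lambda^\mu{}_\nu\,\partial_\rho A^\nu + (\Lambda^{-1})^\rho{}_\sigma\left(\partial_\rho\Lambda^\mu{}_\nu\right)A^\nu$$

What we have here is a nice part (the first term) and a not so nice part (the second term, which involves the derivative of the transform). If only there were a way to get rid of the not so nice part. At this point we do an interesting trick, and that is to redefine the notion of derivative. Instead of defining the derivative simply the way we do in Euclidean space, we create a new type of derivative called the covariant derivative. The covariant derivative is like the normal derivative, except that we add a "fudge factor" to get rid of the not nice parts of the equation so that the result transforms nicely.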

Christoffel symbols


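Definition of Christoffel Symbols

Consider an arbitrary contravariant vector field defined all over a Lorentzian manifold, and take $A^i$ at $x^i$; at a neighbouring point $x^i + dx^i$ the vector is $A^i + dA^i$.

Next parallel transport $A^i$ from $x^i$ to $x^i + dx^i$, and suppose the change in the vector is $\delta A^i$. Define:

$$DA^i = dA^i - \delta A^i$$

The components of $\delta A^i$ must have a linear dependence on the components of $A^i$ and on the displacement $dx^l$. Define Christoffel symbols $\Gamma^i_{kl}$:

$$\delta A^i = -\Gamma^i_{kl}\,A^k\,dx^l$$

Note that these Christoffel symbols are:

  • dependent on the coordinate system (hence they are NOT tensors)
  • functions of the coordinates

Now consider arbitrary contravariant and covariant vectors $A^i$ and $B_i$ respectively. Since $A^iB_i$ is a scalar, $\delta(A^iB_i) = 0$, one arrives at:

$$B_i\,\delta A^i + A^i\,\delta B_i = 0$$

$$A^i\,\delta B_i = \Gamma^i_{kl}\,A^k B_i\,dx^l = \Gamma^k_{il}\,A^i B_k\,dx^l$$

$$\delta B_i = \Gamma^k_{il}\,B_k\,dx^l$$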





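Connection Between Covariant And Regular Derivatives

From above, one can obtain the relations between covariant derivatives (defined by $DA^i = A^i{}_{;l}\,dx^l$) and regular derivatives:

$$A^i{}_{;l} = \frac{\partial A^i}{\partial x^l} + \Gamma^i_{kl}\,A^k$$

$$A_{i;l} = \frac{\partial A_i}{\partial x^l} - \Gamma^k_{il}\,A_k$$

Analogously, for tensors:

$$A^{ik}{}_{;l} = \frac{\partial A^{ik}}{\partial x^l} + \Gamma^i_{ml}\,A^{mk} + \Gamma^k_{ml}\,A^{im}$$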


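Calculation of Christoffel Symbols

From $DA_i = g_{ik}\,DA^k$, one can conclude that $Dg_{ik} = 0$, i.e. $g_{ik;l} = 0$.

However, since $g_{ik}$ is a tensor, its covariant derivative can be expressed in terms of regular partial derivatives and Christoffel symbols:

$$g_{ik;l} = \frac{\partial g_{ik}}{\partial x^l} - g_{mk}\,\Gamma^m_{il} - g_{im}\,\Gamma^m_{kl} = 0$$

Rewriting the expression above, and then performing permutation on i, k and l:

$$\frac{\partial g_{ik}}{\partial x^l} = g_{mk}\,\Gamma^m_{il} + g_{im}\,\Gamma^m_{kl}$$

$$\frac{\partial g_{li}}{\partial x^k} = g_{mi}\,\Gamma^m_{lk} + g_{lm}\,\Gamma^m_{ik}$$

$$-\frac{\partial g_{kl}}{\partial x^i} = -g_{ml}\,\Gamma^m_{ki} - g_{km}\,\Gamma^m_{li}$$

Adding up the three expressions above and using the symmetry $\Gamma^m_{kl} = \Gamma^m_{lk}$, one arrives at (using the notation $g_{ik,l} = \partial g_{ik}/\partial x^l$):

$$g_{im}\,\Gamma^m_{kl} = \frac{1}{2}\left(g_{ik,l} + g_{il,k} - g_{kl,i}\right)$$

Multiplying both sides by $g^{ni}$:

$$\Gamma^n_{kl} = \frac{1}{2}\,g^{ni}\left(g_{ik,l} + g_{il,k} - g_{kl,i}\right)$$

Hence if the metric is known, the Christoffel symbols can be calculated.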
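As an illustration of calculating Christoffel symbols from a known metric, the sketch below assumes the standard formula $\Gamma^m_{kl} = \frac{1}{2}g^{mi}(g_{ik,l} + g_{il,k} - g_{kl,i})$ and applies it to the unit 2-sphere, with the partial derivatives estimated by central finite differences. The function names, the step size and the restriction to 2×2 metrics are assumptions of this example.

```python
import math


def sphere_metric(x):
    """Metric g_{ik} of the unit 2-sphere in coordinates x = (theta, phi)."""
    theta = x[0]
    return [[1.0, 0.0], [0.0, math.sin(theta) ** 2]]


def inverse_2x2(m):
    """Inverse of a 2x2 matrix (this sketch only handles two dimensions)."""
    det = m[0][0] * m[1][1] - m[0][1] * m[1][0]
    return [[m[1][1] / det, -m[0][1] / det],
            [-m[1][0] / det, m[0][0] / det]]


def metric_derivative(metric, x, l, h=1e-5):
    """Central-difference estimate of d g_{ik} / d x^l at the point x."""
    x_plus, x_minus = list(x), list(x)
    x_plus[l] += h
    x_minus[l] -= h
    g_plus, g_minus = metric(x_plus), metric(x_minus)
    n = len(x)
    return [[(g_plus[i][k] - g_minus[i][k]) / (2.0 * h) for k in range(n)]
            for i in range(n)]


def christoffel(metric, x):
    """Return gamma[m][k][l] = Gamma^m_{kl} at the point x."""
    n = len(x)
    g_inv = inverse_2x2(metric(x))
    dg = [metric_derivative(metric, x, l) for l in range(n)]  # dg[l][i][k] = g_{ik,l}
    gamma = [[[0.0] * n for _ in range(n)] for _ in range(n)]
    for m in range(n):
        for k in range(n):
            for l in range(n):
                gamma[m][k][l] = 0.5 * sum(
                    g_inv[m][i] * (dg[l][i][k] + dg[k][i][l] - dg[i][k][l])
                    for i in range(n))
    return gamma


theta0 = 1.0
gamma = christoffel(sphere_metric, [theta0, 0.3])
```

For the sphere the only non-zero symbols are $\Gamma^\theta_{\phi\phi} = -\sin\theta\cos\theta$ and $\Gamma^\phi_{\theta\phi} = \Gamma^\phi_{\phi\theta} = \cot\theta$, which the numerical values reproduce to within the finite-difference error.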

Einstein's equation


The Einstein field equation or Einstein equation is a dynamical equation which describes how matter and energy change the geometry of spacetime, this curved geometry being interpreted as the gravitational field of the matter source. The motion of objects (with a mass much smaller than the matter source) in this gravitational field is described very accurately by the geodesic equation.

Mathematical form of Einstein's field equation

Einstein's field equation (EFE) is usually written in the form:

$$R_{\mu\nu} - \frac{1}{2}R\,g_{\mu\nu} + \Lambda g_{\mu\nu} = \frac{8\pi G}{c^4}\,T_{\mu\nu}$$

where $R_{\mu\nu}$ is the Ricci tensor, $R$ the Ricci scalar, $g_{\mu\nu}$ the metric tensor, $\Lambda$ the cosmological constant, $G$ the gravitational constant, $c$ the speed of light and $T_{\mu\nu}$ the stress-energy tensor.

The EFE is a tensor equation relating a set of symmetric 4 x 4 tensors. It is written here in terms of components. Each tensor has 10 independent components. Given the freedom of choice of the four spacetime coordinates, the independent equations reduce to 6 in number.

The EFE is understood to be an equation for the metric tensor $g_{\mu\nu}$ (given a specified distribution of matter and energy in the form of a stress-energy tensor). Despite the simple appearance of the equation it is, in fact, quite complicated. This is because both the Ricci tensor and Ricci scalar depend on the metric in a complicated nonlinear manner.

One can write the EFE in a more compact form by defining the Einstein tensor

$$G_{\mu\nu} = R_{\mu\nu} - \frac{1}{2}R\,g_{\mu\nu}$$

which is a symmetric second-rank tensor that is a function of the metric. Working in geometrized units where G = c = 1, the EFE can then be written as

$$G_{\mu\nu} + \Lambda g_{\mu\nu} = 8\pi\,T_{\mu\nu}$$


The expression on the left represents the curvature of spacetime as determined by the metric and the expression on the right represents the matter/energy content of spacetime. The EFE can then be interpreted as a set of equations dictating how the curvature of spacetime is related to the matter/energy content of the universe.

These equations, together with the geodesic equation, form the core of the mathematical formulation of General Relativity.

Properties of Einstein's equation

Conservation of energy and momentum

An important consequence of the EFE is the local conservation of energy and momentum; this result arises by using the differential Bianchi identity to obtain

$$\nabla_\mu G^{\mu\nu} = 0$$

which, by using the EFE, results in

$$\nabla_\mu T^{\mu\nu} = 0$$


which expresses the local conservation law referred to above.

Nonlinearity of the field equations

The EFE are a set of 10 coupled elliptic-hyperbolic nonlinear partial differential equations for the metric components. This nonlinear feature of the dynamical equations distinguishes general relativity from other physical theories.

For example, Maxwell's equations of electromagnetism are linear in the electric and magnetic fields (i.e. the sum of two solutions is also a solution).

Another example is Schrödinger's equation of quantum mechanics, which is linear in the wavefunction.

The correspondence principle

Einstein's equation reduces to Newton's law of gravity by using both the weak-field approximation and the slow-motion approximation. In fact, the gravitational constant $G$ appearing in the EFE is determined by making these two approximations.

The cosmological constant

The cosmological constant term $\Lambda g_{\mu\nu}$ was originally introduced by Einstein to allow for a static universe (i.e., one that is not expanding or contracting). This effort was unsuccessful for two reasons: the static universe described by this theory was unstable, and observations of distant galaxies by Hubble a decade later confirmed that our universe is in fact not static but expanding. So $\Lambda$ was abandoned (set to 0), with Einstein calling it the "biggest blunder he ever made".

Despite Einstein's misguided motivation for introducing the cosmological constant term, there is nothing wrong (i.e. inconsistent) with the presence of such a term in the equations. Indeed, quite recently, improved astronomical techniques have found that a non-zero value of $\Lambda$ is needed to explain some observations.

Einstein thought of the cosmological constant as an independent parameter, but its term in the field equation can also be moved algebraically to the other side, written as part of the stress-energy tensor:

$$R_{\mu\nu} - \frac{1}{2}R\,g_{\mu\nu} = \frac{8\pi G}{c^4}\left(T_{\mu\nu} - \frac{c^4\Lambda}{8\pi G}\,g_{\mu\nu}\right)$$

The constant

$$\rho_{\mathrm{vac}} = \frac{c^4\Lambda}{8\pi G}$$


is called the vacuum energy. The existence of a cosmological constant is equivalent to the existence of a non-zero vacuum energy. The terms are now used interchangeably in general relativity.

Solutions of the field equations

The solutions of the Einstein field equations are metrics of spacetime. The solutions are hence often called 'metrics'. These metrics describe the structure of the spacetime including the inertial motion of objects in the spacetime. As the field equations are non-linear, they cannot always be completely solved (i.e. without making approximations). For example, there is no known complete solution for a spacetime with two massive bodies in it (which is a theoretical model of a binary star system, for example). However, approximations are usually made in these cases. These are commonly referred to as post-Newtonian approximations. Even so, there are numerous cases where the field equations have been solved completely, and those are called exact solutions.

The study of exact solutions of Einstein's field equations is one of the activities of cosmology. It leads to the prediction of black holes and to different models of evolution of the universe.

Vacuum field equations

If the energy-momentum tensor $T_{\mu\nu}$ is zero in the region under consideration, then the field equations are also referred to as the vacuum field equations, which can be written as:

$$R_{\mu\nu} = 0$$

The solutions to the vacuum field equations are called vacuum solutions. Flat Minkowski space is the simplest example of a vacuum solution. Nontrivial examples include the Schwarzschild solution and the Kerr solution.

The above vacuum equation assumes that the cosmological constant is zero. If it is taken to be nonzero then the vacuum equation becomes:

$$R_{\mu\nu} = \Lambda g_{\mu\nu}$$


Mathematicians usually refer to manifolds with a vanishing Ricci tensor as Ricci-flat manifolds and manifolds with a Ricci tensor proportional to the metric as Einstein manifolds.
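The step from the full field equation to the vacuum equation with a cosmological constant can be sketched by taking a trace. The derivation below assumes four spacetime dimensions (so that $g^{\mu\nu}g_{\mu\nu} = 4$) and the sign convention in which the cosmological term appears as $+\Lambda g_{\mu\nu}$ on the curvature side of the field equation.

```latex
\begin{aligned}
R_{\mu\nu} - \tfrac{1}{2}R\,g_{\mu\nu} + \Lambda g_{\mu\nu} &= 0
  && \text{(field equation with } T_{\mu\nu} = 0\text{)} \\
R - 2R + 4\Lambda &= 0
  && \text{(contract with } g^{\mu\nu}\text{)} \\
R &= 4\Lambda \\
R_{\mu\nu} &= \tfrac{1}{2}(4\Lambda)\,g_{\mu\nu} - \Lambda g_{\mu\nu} = \Lambda g_{\mu\nu}
  && \text{(substitute back)}
\end{aligned}
```

Setting $\Lambda = 0$ in the last line recovers the Ricci-flat condition, so the Ricci-flat manifolds are a special case of Einstein manifolds.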





Relativistic cosmology is based on the following three assumptions:

  1. the cosmological principle
  2. Weyl's postulate
  3. general relativity.

Black holes


Birkhoff's theorem


Any spherically symmetric solution of the vacuum field equations must be static and asymptotically flat.

proof ... Q.E.D.

This means that the exterior solution must be given by the Schwarzschild metric.

Schwarzschild metric


The Schwarzschild metric can be put into the form

$$ds^2 = -\left(1 - \frac{2GM}{c^2 r}\right)c^2\,dt^2 + \left(1 - \frac{2GM}{c^2 r}\right)^{-1}dr^2 + r^2\,d\Omega^2$$

where $G$ is the gravitational constant, $M$ is interpreted as the mass of the gravitating object, and

$$d\Omega^2 = d\theta^2 + \sin^2\theta\,d\phi^2$$

is the standard metric on the 2-sphere. The constant

$$r_s = \frac{2GM}{c^2}$$

is called the Schwarzschild radius.

Note that as $M \to 0$ or $r \to \infty$ one recovers the Minkowski metric:

$$ds^2 = -c^2\,dt^2 + dr^2 + r^2\,d\Omega^2$$

Intuitively, this means that around small masses or far away from any gravitating body we expect space to be nearly flat. Metrics with this property are called asymptotically flat.

Note that there are two singularities in the Schwarzschild metric: at r=0 and $r = r_s$. It can be shown that while the latter singularity can be transformed away with a change of coordinates, the former cannot. In other words, r=0 is a bona fide singularity of the spacetime.
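To get a feel for the scales involved, the sketch below evaluates the Schwarzschild radius $r_s = 2GM/c^2$ and the metric factor $1 - r_s/r$. The numerical constants are approximate values and the function names are hypothetical.

```python
G = 6.674e-11          # gravitational constant in m^3 kg^-1 s^-2 (approximate)
C = 299_792_458.0      # speed of light in m/s
SOLAR_MASS = 1.989e30  # mass of the Sun in kg (approximate)
ASTRONOMICAL_UNIT = 1.496e11  # mean Earth-Sun distance in m (approximate)


def schwarzschild_radius(mass_kg):
    """Schwarzschild radius r_s = 2 G M / c^2 in metres."""
    return 2.0 * G * mass_kg / C ** 2


def metric_factor(r, mass_kg):
    """The factor 1 - r_s / r that multiplies c^2 dt^2 in the metric."""
    return 1.0 - schwarzschild_radius(mass_kg) / r


r_s_sun = schwarzschild_radius(SOLAR_MASS)                      # roughly 3 km
factor_at_earth = metric_factor(ASTRONOMICAL_UNIT, SOLAR_MASS)  # very close to 1
factor_at_horizon = metric_factor(r_s_sun, SOLAR_MASS)          # vanishes at r = r_s
```

At the distance of the Earth the factor differs from 1 by only about two parts in a hundred million, which is the sense in which the metric is nearly flat far from the gravitating body.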

Reissner-Nordström black hole


A Reissner-Nordström black hole is a black hole that carries electric charge $Q$, no angular momentum, and mass $M$. General properties of such a black hole are described in the article charged black hole.

It is described by the electric field of a point-like charged particle, and especially by the Reissner-Nordström metric that generalizes the Schwarzschild metric of an electrically neutral black hole:

$$ds^2 = -\left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)dt^2 + \left(1 - \frac{2M}{r} + \frac{Q^2}{r^2}\right)^{-1}dr^2 + r^2\,d\Omega^2$$

where we have used units with the speed of light and the gravitational constant equal to one ($c = G = 1$) and where the angular part of the metric is

$$d\Omega^2 = d\theta^2 + \sin^2\theta\,d\phi^2$$

The electromagnetic potential is

$$A = -\frac{Q}{r}\,dt$$

While the charged black holes with $|Q| < M$ (especially with $|Q| \ll M$) are similar to the Schwarzschild black hole, they have two horizons: the event horizon and an internal Cauchy horizon. The horizons are located at $r_\pm = M \pm \sqrt{M^2 - Q^2}$. These horizons merge for $|Q| = M$, which is the case of an extremal black hole.
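The horizon structure can be explored numerically. The sketch below assumes geometrized units ($c = G = 1$) and the horizon formula $r_\pm = M \pm \sqrt{M^2 - Q^2}$, which gives the roots of the metric function $f(r) = 1 - 2M/r + Q^2/r^2$; the function names are hypothetical.

```python
import math


def metric_function(r, mass, charge):
    """f(r) = 1 - 2M/r + Q^2/r^2 in geometrized units."""
    return 1.0 - 2.0 * mass / r + charge ** 2 / r ** 2


def horizons(mass, charge):
    """Return (r_plus, r_minus), or None when f(r) has no real root."""
    discriminant = mass ** 2 - charge ** 2
    if discriminant < 0.0:
        # |Q| > M: the square root is imaginary, so there is no horizon.
        return None
    root = math.sqrt(discriminant)
    return mass + root, mass - root


neutral = horizons(1.0, 0.0)    # Schwarzschild limit: outer horizon at r = 2M
charged = horizons(1.0, 0.6)    # two distinct horizons
extremal = horizons(1.0, 1.0)   # the two horizons coincide at r = M
```

For $Q = 0$ the outer horizon sits at the Schwarzschild radius $r = 2M$ and the inner one collapses to $r = 0$.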

What is a singularity?

The notion of a singularity in general relativity refers to a region of space-time where the equations of physics break down and lose their predictive meaning as seen by some potential observer. One kind of a singularity would be where things become infinite. This definition is limited as there are solutions to Einstein's field equations where there are no infinite quantities and yet the physical description through the mathematics becomes undefined. There has been a great deal of work leading to just what a good definition of a singularity should be. One possible definition in general relativity is that it's a region of space-time in which timelike curves cannot be extended to all of the spacetime. In effect singularities are unphysical and indicate the breakdown of the theory.

BKL singularity

A BKL (Belinsky-Khalatnikov-Lifshitz) singularity[1] is a model of the dynamic evolution of the Universe near the initial singularity, described by an anisotropic, homogeneous, chaotic solution to Einstein's field equations of gravitation. According to this model, the Universe is oscillating (expanding and contracting) around a singular point (singularity) in which time and space become equal to zero. This singularity is physically real in the sense that it is a necessary property of the solution, and will appear also in the exact solution of those equations. The singularity is not artificially created by the assumptions and simplifications made by the other well-known special solutions such as the Friedmann-Lemaître-Robertson-Walker, quasi-isotropic, and Kasner solutions.

The Mixmaster universe is a solution to general relativity that exhibits properties similar to those discussed by BKL.

Existence of time singularity

The basis of modern cosmology is the set of special solutions of Einstein's field equations found by Alexander Friedmann in 1922 and 1924 that describe a completely homogeneous and isotropic Universe with either of two possible topologies corresponding to a space with a constant positive curvature ("closed model") or a constant negative curvature ("open model"). The principal property of these solutions is their non-static nature. The concept of an expanding Universe that arises from Friedmann's solutions received a brilliant confirmation with the red-shift phenomenon discovered by E. Hubble, and the present consensus is that the isotropic model, in general, gives an adequate description of the present state of the Universe.

At the same time, it is obvious that in the real world homogeneity is, at best, only an approximation. Even if one can speak about a homogeneous distribution of matter density at distances that are large compared to the intergalactic space, this homogeneity vanishes upon transition to smaller scales. On the other hand, the homogeneity assumption goes very far in a mathematical aspect. The high symmetry of the solution related to homogeneity can bring about specific properties that disappear when considering a more general case.

A related issue is how general is another important property of the isotropic model — the existence of a time singularity in the spacetime metric. In other words, the existence of such time singularity means finiteness of time. In the open model, there is one time singularity so time is limited from one end while in the closed model there are two singularities that limit time in both ends.

The adequacy of the isotropic model in describing the present state of the Universe by itself is not a reason to expect that it is so adequate in describing the early stages of Universe evolution. The problem initially addressed by the BKL paper[1] is whether the existence of such time singularity is a necessary property of relativistic cosmological models. There is the possibility that the singularity is generated by the simplifying assumptions, made when constructing these models. Independence of singularity on assumptions would mean that time singularity exists not only in the particular but also in the general solutions of the Einstein equations. A criterion for generality of solutions is the number of arbitrary space coordinate functions that they contain. These include only the "physically arbitrary" functions whose number cannot be reduced by any choice of reference frame. In the general solution, the number of such functions must be sufficient for arbitrary definition of initial conditions (distribution and movement of matter, distribution of gravitational field) in some moment of time chosen as initial. This number is four for vacuum and eight for a matter and/or radiation filled space.[2][3]

For a system of non-linear differential equations, such as the Einstein equations, the general solution is not unambiguously defined. In principle, there may be multiple general integrals, and each of those may contain only a finite subset of all possible initial conditions. Each of those integrals may contain all required arbitrary functions which, however, may be subject to some conditions (e.g., some inequalities). Existence of a general solution with a singularity, therefore, does not preclude the existence also of other general solutions that do not contain a singularity. For example, there is no reason to doubt the existence of a general solution without singularity that describes an isolated body with a relatively small mass.

It is impossible to find a general integral for all space and for all time. However, this is not necessary for resolving the problem: it is sufficient to study the solution near the singularity. This would also resolve another aspect of the problem: the characteristics of spacetime metric evolution in the general solution when it reaches the physical singularity, understood as a point where matter density and invariants of the Riemann curvature tensor become infinite. The BKL paper[1] concerns only the cosmological aspect. This means, that the subject is a time singularity in the whole spacetime and not in some limited region as in a gravitational collapse of a finite body.

Previous work by the Landau-Lifshitz group[4][5][6] (reviewed in [2]) led to a conclusion that the general solution does not contain a physical singularity. This search for a broader class of solutions with singularity has been done, essentially, by a trial-and-error method, since a systematic approach to the study of the Einstein equations is lacking. A negative result, obtained in this way, is not convincing by itself; a solution with the necessary degree of generality would invalidate it, and at the same time would confirm any positive results related to the specific solution.

It is reasonable to suggest that if a singularity is present in the general solution, there must be some indications that are based only on the most general properties of the Einstein equations, although those indications by themselves might be insufficient for characterizing the singularity. At that time, the only known indication was related to the form of Einstein equations written in a synchronous reference frame, that is, in a frame in which the interval element is

$$ds^2 = dt^2 - dl^2, \qquad dl^2 = \gamma_{\alpha\beta}\,dx^\alpha\,dx^\beta$$ (eq. 1)

where the space distance element $dl$ is separate from the time interval $dt$, and $x^0 = t$ is the proper time synchronized throughout the whole space.[7] The Einstein equation $R_0^0 = T_0^0 - \tfrac{1}{2}T$ written in synchronous frame gives a result in which the metric determinant $g$ inevitably becomes zero in a finite time irrespective of any assumptions about matter distribution.[2][3]

This indication, however, was dropped after it became clear that it is linked with a specific geometric property of the synchronous frame: crossing of time line coordinates. This crossing takes place on some encircling hypersurfaces which are four-dimensional analogs of the caustic surfaces in geometrical optics; g becomes zero exactly at this crossing.[6] Therefore, although this singularity is general, it is fictitious, and not a physical one; it disappears when the reference frame is changed. This, apparently, stopped the incentive for further investigations.

However, the interest in this problem waxed again after Penrose published his theorems[8] that linked the existence of a singularity of unknown character with some very general assumptions that did not have anything in common with a choice of reference frame. Other similar theorems were found later on by Hawking[9][10] and Geroch[11] (see Penrose-Hawking singularity theorems). It became clear that the search for a general solution with singularity must continue.

Generalized Kasner solution

Further generalization of solutions depended on some solution classes found previously. The Friedmann solution, for example, is a special case of a solution class that contains three physically arbitrary coordinate functions.[2] In this class the space is anisotropic; however, its compression when approaching the singularity has "quasi-isotropic" character: the linear distances in all directions diminish as the same power of time. Like the fully homogeneous and isotropic case, this class of solutions exists only for a matter-filled space.

Much more general solutions are obtained by a generalization of an exact particular solution derived by Kasner[12] for a field in vacuum, in which the space is homogeneous and has Euclidean metric that depends on time according to the Kasner metric

$$dl^2 = t^{2p_1}\,dx^2 + t^{2p_2}\,dy^2 + t^{2p_3}\,dz^2$$ (eq. 2)

(see [13]). Here, $p_1$, $p_2$, $p_3$ are any 3 numbers that are related by

$$p_1 + p_2 + p_3 = p_1^2 + p_2^2 + p_3^2 = 1$$ (eq. 3)

Because of these relationships, only 1 of the 3 numbers is independent. All 3 numbers are never the same; 2 numbers are the same only in the sets of values $(-\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{2}{3})$ and (0, 0, 1).[14] In all other cases the numbers are different, one number is negative and the other two are positive. If the numbers are arranged in increasing order, $p_1 < p_2 < p_3$, they change in the ranges

$$-\tfrac{1}{3} \le p_1 \le 0, \qquad 0 \le p_2 \le \tfrac{2}{3}, \qquad \tfrac{2}{3} \le p_3 \le 1$$ (eq. 4)

The numbers $p_1$, $p_2$, $p_3$ can be written parametrically as

$$p_1(u) = \frac{-u}{1+u+u^2}, \qquad p_2(u) = \frac{1+u}{1+u+u^2}, \qquad p_3(u) = \frac{u(1+u)}{1+u+u^2}$$ (eq. 5)

All different values of $p_1$, $p_2$, $p_3$ ordered as above are obtained by changing the value of the parameter $u$ in the range $u \ge 1$. The values $u < 1$ are brought into this range according to

$$p_1\!\left(\tfrac{1}{u}\right) = p_1(u), \qquad p_2\!\left(\tfrac{1}{u}\right) = p_3(u), \qquad p_3\!\left(\tfrac{1}{u}\right) = p_2(u)$$ (eq. 6)

Figure 1 is a plot of $p_1$, $p_2$, $p_3$ against the argument $1/u$. The numbers $p_1(u)$ and $p_3(u)$ are monotonically increasing while $p_2(u)$ is a monotonically decreasing function of the parameter $u$.
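The properties of the Kasner exponents can be checked numerically. The sketch below assumes the parametrization $p_1 = -u/(1+u+u^2)$, $p_2 = (1+u)/(1+u+u^2)$, $p_3 = u(1+u)/(1+u+u^2)$ and verifies the two sum rules, the ordering for $u \ge 1$, and the behaviour under $u \to 1/u$. The function name is hypothetical.

```python
def kasner_exponents(u):
    """Kasner exponents (p1, p2, p3) as functions of the parameter u."""
    denominator = 1.0 + u + u * u
    p1 = -u / denominator
    p2 = (1.0 + u) / denominator
    p3 = u * (1.0 + u) / denominator
    return p1, p2, p3


sample_values = [1.0, 2.0, 5.5, 100.0]

# For each sample u: (sum of exponents, sum of squares), both expected to be 1.
sums = [(sum(kasner_exponents(u)), sum(p * p for p in kasner_exponents(u)))
        for u in sample_values]

# Replacing u by 1/u keeps p1 and swaps p2 with p3.
direct = kasner_exponents(2.0)
inverted = kasner_exponents(0.5)
```

At $u = 1$ the exponents are $(-\tfrac{1}{3}, \tfrac{2}{3}, \tfrac{2}{3})$, and as $u$ grows they approach $(0, 0, 1)$, the two special sets in which two of the numbers coincide.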

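In the generalized solution, the form corresponding to (eq. 2) applies only to the asymptotic metric (the metric close to the singularity t = 0), that is, to the leading terms of its series expansion in powers of t. In the synchronous reference frame it is written in the form of (eq. 1) with a space distance element

$$dl^2 = \left(a^2\,l_\alpha l_\beta + b^2\,m_\alpha m_\beta + c^2\,n_\alpha n_\beta\right)dx^\alpha\,dx^\beta$$ (eq. 7)

where $$a = t^{p_l}, \qquad b = t^{p_m}, \qquad c = t^{p_n}$$ (eq. 8)

The three-dimensional vectors l, m, n define the directions at which space distance changes with time by the power laws (eq. 8). These vectors, as well as the numbers $p_l$, $p_m$, $p_n$ which, as before, are related by (eq. 3), are functions of the space coordinates. The powers $p_l$, $p_m$, $p_n$ are not arranged in increasing order, reserving the symbols $p_1$, $p_2$, $p_3$ for the numbers in (eq. 5) that remain arranged in increasing order. The determinant of the metric of (eq. 7) is

$$-g = a^2 b^2 c^2 v^2 = t^2 v^2$$ (eq. 9)

where $v = \mathbf{l}\cdot[\mathbf{m}\times\mathbf{n}]$. It is convenient to introduce the following quantities [15]

$$\lambda = \frac{\mathbf{l}\cdot\operatorname{rot}\mathbf{l}}{v}, \qquad \mu = \frac{\mathbf{m}\cdot\operatorname{rot}\mathbf{m}}{v}, \qquad \nu = \frac{\mathbf{n}\cdot\operatorname{rot}\mathbf{n}}{v}$$ (eq. 10)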

The space metric in (eq. 7) is anisotropic because the powers of t in (eq. 8) cannot have the same values. On approaching the singularity at t = 0, the linear distances in each space element decrease in two directions and increase in the third direction. The volume of the element decreases in proportion to t.

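The Einstein equations in vacuum in synchronous reference frame are[2][3]

$$R_0^0 = -\frac{1}{2}\dot{\varkappa}_\alpha^\alpha - \frac{1}{4}\varkappa_\alpha^\beta\varkappa_\beta^\alpha = 0$$ (eq. 11)
$$R_\alpha^\beta = -\frac{1}{2\sqrt{\gamma}}\left(\sqrt{\gamma}\,\varkappa_\alpha^\beta\right)^{\cdot} - P_\alpha^\beta = 0$$ (eq. 12)
$$R_\alpha^0 = \frac{1}{2}\left(\varkappa_{\alpha;\beta}^\beta - \varkappa_{\beta;\alpha}^\beta\right) = 0$$ (eq. 13)

where $\varkappa_\alpha^\beta$ is the 3-dimensional tensor $\varkappa_{\alpha\beta} = \partial\gamma_{\alpha\beta}/\partial t$, and $P_{\alpha\beta}$ is the 3-dimensional Ricci tensor, which is expressed by the 3-dimensional metric tensor $\gamma_{\alpha\beta}$ in the same way as $R_{ik}$ is expressed by $g_{ik}$; $P_{\alpha\beta}$ contains only the space (but not the time) derivatives of $\gamma_{\alpha\beta}$.

The Kasner metric is introduced in the Einstein equations by substituting the respective metric tensor $\gamma_{\alpha\beta}$ from (eq. 7) without defining a priori the dependence of a, b, c on t:

$$\varkappa_\alpha^\beta = \frac{2\dot{a}}{a}\,l_\alpha l^\beta + \frac{2\dot{b}}{b}\,m_\alpha m^\beta + \frac{2\dot{c}}{c}\,n_\alpha n^\beta$$

where the dot above a symbol designates differentiation with respect to time. The Einstein equation (eq. 11) takes the form

$$-R_0^0 = \frac{\ddot{a}}{a} + \frac{\ddot{b}}{b} + \frac{\ddot{c}}{c} = 0$$ (eq. 14)

All its terms are to a second order for the large (at t → 0) quantity 1/t. In the Einstein equations (eq. 12), terms of such order appear only from terms that are time-differentiated. If the components of $P_{\alpha\beta}$ do not include terms of order higher than 2, then

$$-R_l^l = \frac{(\dot{a}bc)^{\cdot}}{abc} = 0, \qquad -R_m^m = \frac{(a\dot{b}c)^{\cdot}}{abc} = 0, \qquad -R_n^n = \frac{(ab\dot{c})^{\cdot}}{abc} = 0$$ (eq. 15)

where indices l, m, n designate tensor components in the directions l, m, n.[2] These equations together with (eq. 14) give the expressions (eq. 8) with powers that satisfy (eq. 3).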
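The last statement can be made explicit by substituting power laws into the equations. The sketch below assumes that (eq. 8) has the form $a = t^{p_l}$, $b = t^{p_m}$, $c = t^{p_n}$, that (eq. 14) reads $\ddot{a}/a + \ddot{b}/b + \ddot{c}/c = 0$, and that (eq. 15) reads $(\dot{a}bc)^{\cdot}/(abc) = 0$ together with its two permutations; the shorthand $s = p_l + p_m + p_n$ is introduced here only for convenience.

```latex
\begin{aligned}
abc &= t^{s}, \qquad s \equiv p_l + p_m + p_n \\
\frac{(\dot{a}\,bc)^{\cdot}}{abc}
  &= \frac{\left(p_l\,t^{\,s-1}\right)^{\cdot}}{t^{s}}
   = \frac{p_l\,(s-1)}{t^{2}} = 0
  \quad\Longrightarrow\quad s = 1
  \quad \text{(unless all three powers vanish)} \\
\frac{\ddot{a}}{a} + \frac{\ddot{b}}{b} + \frac{\ddot{c}}{c}
  &= \frac{p_l(p_l-1) + p_m(p_m-1) + p_n(p_n-1)}{t^{2}} = 0
  \quad\Longrightarrow\quad p_l^2 + p_m^2 + p_n^2 = p_l + p_m + p_n = 1
\end{aligned}
```

Both conditions together are exactly the Kasner relations between the three powers.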

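However, the presence of 1 negative power among the 3 powers $p_l$, $p_m$, $p_n$ results in appearance of terms from $P_{\alpha\beta}$ with an order greater than $t^{-2}$. If the negative power is $p_l$ ($p_l = p_1 < 0$), then $P_{\alpha\beta}$ contains the coordinate function λ and (eq. 12) become

$$-R_l^l = \frac{(\dot{a}bc)^{\cdot}}{abc} + \frac{\lambda^2 a^2}{2b^2c^2} = 0, \qquad -R_m^m = \frac{(a\dot{b}c)^{\cdot}}{abc} - \frac{\lambda^2 a^2}{2b^2c^2} = 0, \qquad -R_n^n = \frac{(ab\dot{c})^{\cdot}}{abc} - \frac{\lambda^2 a^2}{2b^2c^2} = 0$$ (eq. 16)

Here, the second terms are of order $t^{-2(p_m + p_n - p_l)}$ whereby $p_m + p_n - p_l = 1 + 2|p_l| > 1$.[16] To remove these terms and restore the metric (eq. 7), it is necessary to impose on the coordinate functions the condition λ = 0.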

The remaining 3 Einstein equations (eq. 13) contain only first order time derivatives of the metric tensor. They give 3 time-independent relations that must be imposed as necessary conditions on the coordinate functions in (eq. 7). This, together with the condition λ = 0, makes 4 conditions. These conditions bind 10 different coordinate functions: 3 components of each of the vectors l, m, n, and one function in the powers of t (any one of the functions pl, pm, pn, which are bound by the conditions (eq. 3)). When calculating the number of physically arbitrary functions, it must be taken into account that the synchronous system used here allows time-independent arbitrary transformations of the 3 space coordinates. Therefore, the final solution contains overall 10 − 4 − 3 = 3 physically arbitrary functions which is 1 less than what is needed for the general solution in vacuum.

The degree of generality reached at this point is not lessened by introducing matter; matter is written into the metric (eq. 7) and contributes 4 new coordinate functions necessary to describe the initial distribution of its density and the 3 components of its velocity. This makes it possible to determine the evolution of matter merely from the laws of its motion in an a priori given gravitational field. These laws of motion are the hydrodynamic equations

  (eq. 17)
  (eq. 18)

where u^i is the 4-dimensional velocity, and ε and σ are the densities of energy and entropy of matter.[17] For the ultrarelativistic equation of state p = ε/3 the entropy density is σ ~ ε^(3/4). The major terms in (eq. 17) and (eq. 18) are those that contain time derivatives. From (eq. 17) and the space components of (eq. 18) one has


resulting in

  (eq. 19)

where 'const' are time-independent quantities. Additionally, from the identity u_i u^i = 1 one has (because all covariant components of u_α are of the same order)


where u_n is the velocity component along the direction of n that is connected with the highest (positive) power of t (supposing that pn = p3). From the above relations, it follows that

  (eq. 20)


  (eq. 21)

The above equations can be used to confirm that the components of the matter stress-energy-momentum tensor standing in the right hand side of the equations


are, indeed, of lower order in 1/t than the major terms on their left hand sides. In the equations   the presence of matter results only in a change of the relations imposed on their constituent coordinate functions.[2]

The fact that ε becomes infinite by the law (eq. 21) confirms that the solution (eq. 7) has a physical singularity for any values of the powers p1, p2, p3 except (0, 0, 1). For these last values, the singularity is non-physical and can be removed by a change of reference frame.

The fictitious singularity corresponding to the powers (0, 0, 1) arises as a result of time-line coordinates crossing on some 2-dimensional "focal surface". As pointed out in [2], a synchronous reference frame can always be chosen in such a way that this inevitable time-line crossing occurs exactly on such a surface (instead of a 3-dimensional caustic surface). Therefore, a solution with such a fictitious singularity, simultaneous for the whole space, must exist with the full set of arbitrary functions needed for the general solution. Close to the point t = 0 it allows a regular expansion in integer powers of t.[18]

Oscillating mode towards the singularity

The four conditions that had to be imposed on the coordinate functions in the solution (eq. 7) are of different types: the three conditions that arise from the equations   = 0 are "natural"; they are a consequence of the structure of the Einstein equations. However, the additional condition λ = 0, which causes the loss of one arbitrary function, is of an entirely different type.

The general solution by definition is completely stable; otherwise the Universe would not exist. Any perturbation is equivalent to a change in the initial conditions at some moment of time; since the general solution allows arbitrary initial conditions, the perturbation is not able to change its character. Conversely, the existence of the restricting condition λ = 0 for the solution (eq. 7) means instability with respect to perturbations that break this condition. The action of such a perturbation must bring the model to another mode, which thereby will be the most general one. Such a perturbation cannot be considered small: a transition to a new mode exceeds the range of very small perturbations.

The analysis of the behavior of the model under perturbative action, performed by BKL, delineates a complex oscillatory mode on approaching the singularity.[1][19][20][21] They could not give all details of this mode in the broad frame of the general case. However, BKL explained the most important properties and character of the solution on specific models that allow far-reaching analytical study.

These models are based on a homogeneous space metric of a particular type. Assuming homogeneity of space without any additional symmetry leaves great freedom in choosing the metric. All possible homogeneous (but anisotropic) spaces are classified, according to Bianchi, into 9 classes.[22] BKL investigate only spaces of Bianchi Types VIII and IX.

If the metric has the form of (eq. 7), then for each type of homogeneous space there exists some functional relation between the reference vectors l, m, n and the space coordinates. The specific form of this relation is not important. The important fact is that for Type VIII and IX spaces, the quantities λ, μ, ν (eq. 10) are constants while all "mixed" products l rot m, l rot n, m rot l, etc. are zero. For Type IX spaces, the quantities λ, μ, ν have the same sign and one can write λ = μ = ν = 1 (the simultaneous sign change of the 3 constants does not change anything). For Type VIII spaces, 2 constants have a sign that is opposite to the sign of the third constant; one can write, for example, λ = −1, μ = ν = 1.[23]

The study of the effect of the perturbation on the "Kasner mode" is thus confined to a study on the effect of the λ-containing terms in the Einstein equations. Type VIII and IX spaces are the most suitable models exactly in this connection. Since all 3 quantities λ, μ, ν differ from zero, the condition λ = 0 does not hold irrespective of which direction l, m, n has negative power law time dependence.

The Einstein equations for the Type VIII and Type IX space models are[24]

  (eq. 22)
  (eq. 23)

(the remaining components are identically zero). These equations contain only functions of time; this is a condition that has to be fulfilled in all homogeneous spaces. Here, (eq. 22) and (eq. 23) are exact and their validity does not depend on how near one is to the singularity at t = 0.[25]

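The time derivatives in (eq. 22) and (eq. 23) take a simpler form if a, b, c are replaced by their logarithms α, β, γ:

  $a = e^{\alpha},\quad b = e^{\beta},\quad c = e^{\gamma}$  (eq. 24)

and the variable t is replaced by τ according to:

  $dt = abc\,d\tau$  (eq. 25).

Then (subscripts denote differentiation with respect to τ):

  $2\alpha_{\tau\tau} = (\mu b^2 - \nu c^2)^2 - \lambda^2 a^4,\quad 2\beta_{\tau\tau} = (\lambda a^2 - \nu c^2)^2 - \mu^2 b^4,\quad 2\gamma_{\tau\tau} = (\lambda a^2 - \mu b^2)^2 - \nu^2 c^4$  (eq. 26)
  $\tfrac{1}{2}(\alpha + \beta + \gamma)_{\tau\tau} = \alpha_\tau\beta_\tau + \alpha_\tau\gamma_\tau + \beta_\tau\gamma_\tau$  (eq. 27)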

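Adding together equations (eq. 26) and substituting on the left hand side the sum (α + β + γ)ττ according to (eq. 27), one obtains an equation containing only first derivatives, which is the first integral of the system (eq. 26):

  $\alpha_\tau\beta_\tau + \alpha_\tau\gamma_\tau + \beta_\tau\gamma_\tau = \tfrac{1}{4}\left(\lambda^2 a^4 + \mu^2 b^4 + \nu^2 c^4 - 2\lambda\mu a^2 b^2 - 2\lambda\nu a^2 c^2 - 2\mu\nu b^2 c^2\right)$  (eq. 28)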

This equation plays the role of a binding condition imposed on the initial state of (eq. 26). The Kasner mode (eq. 8) is a solution of (eq. 26) when all terms on the right hand sides are ignored. But such a situation cannot go on indefinitely (at t → 0) because among those terms there are always some that grow. Thus, if the negative power is in the function a(t) (pl = p1), then the perturbation of the Kasner mode arises from the terms λ²a⁴; the rest of the terms decrease with decreasing t. If only the growing terms are left on the right hand sides of (eq. 26), one obtains the system:

  $\alpha_{\tau\tau} = -\tfrac{1}{2}a^4,\quad \beta_{\tau\tau} = \gamma_{\tau\tau} = \tfrac{1}{2}a^4$  (eq. 29)

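(compare (eq. 16); below, λ² = 1 is substituted). The solution of these equations must describe the metric evolution from the initial state, in which it is described by (eq. 8) with a given set of powers (with pl < 0); let pl = p1, pm = p2, pn = p3 so that

  $a \sim t^{p_1},\quad b \sim t^{p_2},\quad c \sim t^{p_3}$  (eq. 30)

Then

  $abc = \Lambda t,\quad \tau = \Lambda^{-1}\ln t + \mathrm{const}$  (eq. 31)

where Λ is constant. Initial conditions for (eq. 29) are redefined as[26]

  $\alpha_\tau = \Lambda p_1,\quad \beta_\tau = \Lambda p_2,\quad \gamma_\tau = \Lambda p_3 \quad \text{at}\ \tau \to \infty$  (eq. 32)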

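Equations (eq. 29) are easily integrated; the solution that satisfies the condition (eq. 32) is

  $a^2 = \frac{2|p_1|\Lambda}{\operatorname{ch}(2|p_1|\Lambda\tau)},\quad b^2 = b_0^2\, e^{2\Lambda(p_2 - |p_1|)\tau}\operatorname{ch}(2|p_1|\Lambda\tau),\quad c^2 = c_0^2\, e^{2\Lambda(p_3 - |p_1|)\tau}\operatorname{ch}(2|p_1|\Lambda\tau)$  (eq. 33)

where b0 and c0 are two more constants.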
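The solution can be checked numerically. The sketch below assumes that (eq. 29) reads α'' = −a⁴/2, β'' = γ'' = a⁴/2 (with a = e^α and primes denoting d/dτ) and that (eq. 33) has the form a² = 2|p1|Λ / ch(2|p1|Λτ), b² = b0² exp[2Λ(p2 − |p1|)τ] ch(2|p1|Λτ). These explicit forms follow the standard BKL literature and are assumptions here, as are the sample values of Λ, p1, p2 and b0. The code compares finite-difference derivatives with the expected right-hand sides and with the asymptotic slopes.

```python
import math

# Illustrative (assumed) values: Kasner powers for u = 5/2, and Lambda = 1.
LAM = 1.0
P1, P2 = -10.0 / 39.0, 14.0 / 39.0
B0 = 1.0
K = 2.0 * abs(P1) * LAM  # the combination 2|p1|Lambda


def alpha(tau):
    """alpha = ln a for the assumed form a^2 = K / cosh(K tau)."""
    return 0.5 * math.log(K / math.cosh(K * tau))


def beta(tau):
    """beta = ln b for b^2 = b0^2 exp(2 Lam (p2 - |p1|) tau) cosh(K tau)."""
    return (math.log(B0) + LAM * (P2 - abs(P1)) * tau
            + 0.5 * math.log(math.cosh(K * tau)))


def first_derivative(f, tau, h=1e-3):
    """Central finite-difference estimate of f'(tau)."""
    return (f(tau + h) - f(tau - h)) / (2.0 * h)


def second_derivative(f, tau, h=1e-3):
    """Central finite-difference estimate of f''(tau)."""
    return (f(tau + h) - 2.0 * f(tau) + f(tau - h)) / (h * h)


def residuals(tau):
    """Residuals of alpha'' = -a^4/2 and beta'' = +a^4/2 at the given tau."""
    a4 = math.exp(4.0 * alpha(tau))
    return (second_derivative(alpha, tau) + 0.5 * a4,
            second_derivative(beta, tau) - 0.5 * a4)


if __name__ == "__main__":
    for tau in (-3.0, 0.0, 2.0):
        print(tau, residuals(tau))
    # Slopes far from the bounce: Lam*p1 for tau -> +inf, -Lam*p1 for tau -> -inf.
    print(first_derivative(alpha, 40.0), first_derivative(alpha, -40.0))
```

The sign change of the slope of α between the two asymptotic regions is the passage of a(t) through its maximum.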

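It can easily be seen that the asymptotic form of the functions (eq. 33) at τ → ∞ is (eq. 30). The asymptotic expressions of these functions and of the function t(τ) at τ → −∞ (that is, at t → 0) are[27]

  $a \sim e^{-\Lambda p_1\tau},\quad b \sim e^{\Lambda(p_2 + 2p_1)\tau},\quad c \sim e^{\Lambda(p_3 + 2p_1)\tau},\quad t \sim e^{\Lambda(1 + 2p_1)\tau}$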

Expressing a, b, c as functions of t, one has

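  $a \sim t^{p'_l},\quad b \sim t^{p'_m},\quad c \sim t^{p'_n}$  (eq. 34)

where

  $p'_l = \frac{|p_1|}{1 - 2|p_1|},\quad p'_m = -\frac{2|p_1| - p_2}{1 - 2|p_1|},\quad p'_n = \frac{p_3 - 2|p_1|}{1 - 2|p_1|}$  (eq. 35)

Then

  $abc = \Lambda' t,\quad \Lambda' = (1 - 2|p_1|)\Lambda$  (eq. 36)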

The above shows that the perturbation acts in such a way that it replaces one Kasner mode with another, and in this process the negative power of t flips from direction l to direction m: if before it was pl < 0, now it is p'm < 0. During this change the function a(t) passes through a maximum and b(t) passes through a minimum: b, which was decreasing, now increases; a, which was increasing, now decreases; and the decreasing c(t) decreases further. The perturbation itself (λ²a⁴ in (eq. 29)), which was increasing, now begins to decrease and die away. Further evolution similarly causes an increase in the perturbation from the terms with μ² (instead of λ²) in (eq. 26), the next change of the Kasner mode, and so on.
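The flip of the negative power can be illustrated with exact rational arithmetic. The sketch below assumes the substitution rule (eq. 35) in its standard form, p1 → |p1|/(1 − 2|p1|) for the negative power and p → (p − 2|p1|)/(1 − 2|p1|) for the other two, and assumes that the Kasner conditions (eq. 3) are p1 + p2 + p3 = 1 and p1² + p2² + p3² = 1. The function names and the sample triple are illustrative only.

```python
from fractions import Fraction


def bounce(p):
    """Map a Kasner triple with exactly one negative entry to the next epoch.

    Assumed rule: the negative power p1 becomes |p1| / (1 - 2|p1|) and every
    other power p becomes (p - 2|p1|) / (1 - 2|p1|).
    """
    negatives = [x for x in p if x < 0]
    if len(negatives) != 1:
        raise ValueError("expected exactly one negative Kasner power")
    q = -negatives[0]      # q = |p1|
    scale = 1 - 2 * q      # equals Lambda'/Lambda; positive because |p1| <= 1/3
    return tuple(q / scale if x < 0 else (x - 2 * q) / scale for x in p)


def is_kasner(p):
    """Check the two Kasner constraints: sum p = 1 and sum p^2 = 1."""
    return sum(p) == 1 and sum(x * x for x in p) == 1


if __name__ == "__main__":
    # Powers along the directions l, m, n; the negative one is along l.
    p = (Fraction(-10, 39), Fraction(14, 39), Fraction(35, 39))
    q = bounce(p)
    print(is_kasner(p), q, is_kasner(q))
```

After one application the negative power sits on the second direction (m) and the largest power stays on the third (n), as described in the text.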

It is convenient to write the power substitution rule (eq. 35) with the help of the parametrization (eq. 5):

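  $\text{if}\quad p_l = p_1(u),\ p_m = p_2(u),\ p_n = p_3(u) \quad\text{then}\quad p'_l = p_2(u-1),\ p'_m = p_1(u-1),\ p'_n = p_3(u-1)$  (eq. 37)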

The greater of the two positive powers remains positive.

BKL call this flip of negative power between directions a Kasner epoch. The key to understanding the character of metric evolution on approaching the singularity is exactly this process of Kasner epoch alternation with flipping of powers pl, pm, pn by the rule (eq. 37).

The successive alternations (eq. 37), with the negative power p1 flipping between directions l and m (Kasner epochs), continue, depleting the integer part of the initial u, until the moment at which u < 1. The value u < 1 is transformed into u > 1 according to (eq. 6); at this moment the negative power is pl or pm, while pn becomes the lesser of the two positive numbers (pn = p2). The next series of Kasner epochs then flips the negative power between directions n and l or between n and m. For an arbitrary (irrational) initial value of u this process of alternation continues without limit.[28]
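The alternation of epochs and eras can be traced step by step. The following sketch assumes the standard parametrization (eq. 5), p1(u) = −u/(1 + u + u²), p2(u) = (1 + u)/(1 + u + u²), p3(u) = u(1 + u)/(1 + u + u²), and the symmetry (eq. 6) in the form p1(1/u) = p1(u), p2(1/u) = p3(u), p3(1/u) = p2(u). The generator `epochs` and its arguments are hypothetical names introduced for illustration.

```python
from fractions import Fraction


def kasner_powers(u):
    """Assumed parametrization: (p1, p2, p3) with p1 <= 0 <= p2 <= p3 for u >= 1."""
    f = 1 + u + u * u
    return (-u / f, (1 + u) / f, u * (1 + u) / f)


def epochs(u, axes=("l", "m", "n"), max_epochs=20):
    """Yield (era, u, powers-by-axis) for successive Kasner epochs.

    `axes` lists the directions carrying p1, p2, p3 in the first epoch.
    Iteration stops early if u is exhausted (possible only for rational u).
    """
    neg, mid, big = axes
    era = 0
    for _ in range(max_epochs):
        yield era, u, dict(zip((neg, mid, big), kasner_powers(u)))
        u -= 1
        neg, mid = mid, neg        # the negative power flips between two axes
        if u <= 0:
            return
        if u < 1:
            u = 1 / u              # a new era begins
            mid, big = big, mid    # p2 and p3 exchange roles
            era += 1


if __name__ == "__main__":
    for era, u, powers in epochs(Fraction(5, 2)):
        negative_axis = min(powers, key=powers.get)
        print(era, u, negative_axis, powers)
```

For the rational start u = 5/2 the sequence terminates after four epochs, which is the reason the text restricts attention to irrational u; with a floating-point irrational start the generator runs until `max_epochs`.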

In the exact solution of the Einstein equations, the powers pl, pm, pn lose their original, precise, sense. This circumstance introduces some "fuzziness" in the determination of these numbers (and together with them, of the parameter u) which, although small, makes meaningless the analysis of any definite (for example, rational) values of u. Therefore, only those laws that concern arbitrary irrational values of u have any particular meaning.

The larger periods in which the scales of space distances along two axes oscillate while distances along the third axis decrease monotonically are called eras; volumes decrease by a law close to ~ t. On transition from one era to the next, the direction in which distances decrease monotonically flips from one axis to another. The order of these transitions acquires the asymptotic character of a random process. The same random order is also characteristic for the alternation of the lengths of successive eras (by era length, BKL understand the number of Kasner epochs that an era contains, and not a time interval).

The era series become denser on approaching t = 0. However, the natural variable for describing the time course of this evolution is not the world time t but its logarithm, ln t, by which the whole process of reaching the singularity is extended to −∞.

According to (eq. 33), the one of the functions a, b, c that passes through a maximum during a transition between Kasner epochs has at its peak the value

  $a_{\max} = \sqrt{2\Lambda|p_1(u)|}$  (eq. 38)

where it is supposed that amax is large compared to b0 and c0; in (eq. 38) u is the value of the parameter in the Kasner epoch before the transition. It can be seen from here that the peaks of consecutive maxima during each era are gradually lowered. Indeed, in the next Kasner epoch this parameter has the value u' = u − 1, and Λ is replaced according to (eq. 36) by Λ' = Λ(1 − 2|p1(u)|). Therefore, the ratio of two consecutive maxima is

  $\frac{a'_{\max}}{a_{\max}} = \left[\frac{p_1(u-1)}{p_1(u)}\bigl(1 - 2|p_1(u)|\bigr)\right]^{1/2}$

and finally

  $\frac{a'_{\max}}{a_{\max}} = \sqrt{\frac{u-1}{u}} \equiv \sqrt{\frac{u'}{u}}$  (eq. 39)
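The lowering of consecutive maxima can be made explicit. The short derivation below is a sketch that assumes the peak value satisfies a²max = 2Λ|p1(u)| (consistent with αmax = ½ ln (2|p1|Λ) quoted later in the text) and that p1(u) = −u/(1 + u + u²), the standard form of the parametrization (eq. 5):

```latex
\begin{aligned}
\left(\frac{a'_{\max}}{a_{\max}}\right)^{2}
  &= \frac{\Lambda'\,|p_1(u-1)|}{\Lambda\,|p_1(u)|}
   = \bigl(1 - 2|p_1(u)|\bigr)\,\frac{|p_1(u-1)|}{|p_1(u)|} \\
  &= \frac{1 - u + u^{2}}{1 + u + u^{2}}
     \cdot \frac{u - 1}{1 - u + u^{2}}
     \cdot \frac{1 + u + u^{2}}{u}
   = \frac{u - 1}{u} < 1 .
\end{aligned}
```

The three factors are 1 − 2|p1(u)|, |p1(u − 1)| and 1/|p1(u)| respectively; the ratio is smaller than 1 for every u > 1, so each maximum within an era is lower than the preceding one.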

The above are solutions to the Einstein equations in vacuum. As in the case of the pure Kasner mode, matter does not change the qualitative properties of this solution and can be written into it disregarding its back-reaction on the field.

However, if one does this for the model under discussion, understood as an exact solution of the Einstein equations, the resulting picture of matter evolution would not have a general character and would be specific to the high symmetry inherent to the present model. Mathematically, this specificity is related to the fact that for the homogeneous space geometry discussed here, the Ricci tensor components R^0_α are identically zero and therefore the Einstein equations would not allow movement of matter (which gives non-zero stress-energy-momentum tensor components T^0_α).[29]

This difficulty is avoided if one includes in the model only the major terms of the limiting (at t → 0) metric and writes into it a matter with arbitrary initial distribution of densities and velocities. Then the course of evolution of matter is determined by its general laws of movement (eq. 17) and (eq. 18) that result in (eq. 21). During each Kasner epoch, density increases by the law

  $\varepsilon \sim t^{-2(1 - p_3)}$  (eq. 40)

where p3 is, as above, the greatest of the numbers p1, p2, p3. Matter density increases monotonically during the whole evolution towards the singularity.

To each era (the s-th era) corresponds a series of values of the parameter u starting from the greatest, u_max^(s), and running through the values u_max^(s) − 1, u_max^(s) − 2, ..., down to the smallest, u_min^(s) < 1. Then

  $u_{\min}^{(s)} = x^{(s)},\quad u_{\max}^{(s)} = k^{(s)} + x^{(s)}$  (eq. 41)

that is, k(s) = [u_max^(s)], where the brackets denote the integer part of the value. The number k(s) is the era length, measured by the number of Kasner epochs that the era contains. For the next era

  $u_{\max}^{(s+1)} = \frac{1}{x^{(s)}},\quad k^{(s+1)} = \left[\frac{1}{x^{(s)}}\right]$  (eq. 42)

In the limitless series of numbers u composed by these rules, there are arbitrarily small (but never zero) values x(s) and correspondingly arbitrarily large lengths k(s).
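The rule for passing from one era to the next is the same recursion that produces the partial quotients of a continued fraction. A minimal sketch, assuming the rules are k(s) = [u_max^(s)], x(s) = u_max^(s) − k(s) and u_max^(s+1) = 1/x(s); the function name `era_lengths` is hypothetical:

```python
from fractions import Fraction
import math


def era_lengths(u_max, n_eras):
    """Return up to n_eras era lengths k(s) generated from the initial u_max.

    Each era has k = floor(u_max) epochs and leaves the remainder
    x = u_max - k; the next era starts from u_max = 1/x. A zero remainder
    (rational input) ends the sequence early.
    """
    lengths = []
    for _ in range(n_eras):
        k = math.floor(u_max)
        x = u_max - k
        lengths.append(k)
        if x == 0:
            break
        u_max = 1 / x
    return lengths


if __name__ == "__main__":
    print(era_lengths(Fraction(355, 113), 10))  # exact rational arithmetic
    print(era_lengths(math.sqrt(2), 5))         # floating point, first few eras
```

With floating-point input only the first several era lengths are reliable, because each step of the map amplifies rounding errors; exact `Fraction` input avoids this but always terminates.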

Metric evolution

Very large u values correspond to Kasner powers

  $p_1 \approx -\frac{1}{u},\quad p_2 \approx \frac{1}{u},\quad p_3 \approx 1 - \frac{1}{u^2}$  (eq. 43)

which are close to the values (0, 0, 1). Two values that are close to zero are also close to each other, and therefore the changes in two out of the three types of "perturbations" (the terms with λ, μ and ν on the right hand sides of (eq. 26)) are also very similar. If at the beginning of such a long era these terms are very close in absolute value at the moment of transition between two Kasner epochs (or are artificially made so by the choice of initial conditions), then they will remain close during the greatest part of the length of the whole era. In this case (BKL call this the case of small oscillations), analysis based on the action of one type of perturbation becomes incorrect; one must take into account the simultaneous effect of two perturbation types.
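How quickly the Kasner powers approach (0, 0, 1) can be seen numerically. The sketch below assumes the standard parametrization (eq. 5) and compares it with the leading large-u behaviour (−1/u, 1/u, 1 − 1/u²); the function names are illustrative.

```python
def kasner_powers(u):
    """Assumed standard parametrization of the Kasner powers by u >= 1."""
    f = 1.0 + u + u * u
    return (-u / f, (1.0 + u) / f, u * (1.0 + u) / f)


def large_u_approx(u):
    """Leading large-u behaviour: (-1/u, 1/u, 1 - 1/u^2)."""
    return (-1.0 / u, 1.0 / u, 1.0 - 1.0 / (u * u))


if __name__ == "__main__":
    for u in (10.0, 100.0, 1000.0):
        exact = kasner_powers(u)
        approx = large_u_approx(u)
        errors = [abs(e - a) for e, a in zip(exact, approx)]
        print(u, exact, errors)
```

The two small powers differ from ±1/u only at order 1/u², which is why the corresponding two perturbation terms evolve almost identically during a long era.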

Two perturbations

Consider a long era, during which 2 out of the 3 functions a, b, c (let them be a and b) undergo small oscillations while the third function (c) decreases monotonically. The latter function quickly becomes small; consider the solution just in the region where one can ignore c in comparison to a and b. The calculations are first done for the Type IX space model by substituting accordingly λ = μ = ν = 1.[20]

After ignoring function c, the first 2 equations (eq. 26) give

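  $\alpha_{\tau\tau} + \beta_{\tau\tau} = 0$  (eq. 44)
  $\alpha_{\tau\tau} - \beta_{\tau\tau} = e^{4\beta} - e^{4\alpha}$  (eq. 45)

and as a third equation, (eq. 28) can be used, which takes the form

  $\gamma_\tau(\alpha_\tau + \beta_\tau) = -\alpha_\tau\beta_\tau + \tfrac{1}{4}\left(e^{2\alpha} - e^{2\beta}\right)^2$  (eq. 46)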

The solution of (eq. 44) is written in the form


where α0, ξ0 are positive constants, and τ0 is the upper limit of the era for the variable τ. It is convenient to introduce further a new variable (instead of τ)

  (eq. 47)


  (eq. 48)

Equations (eq. 45) and (eq. 46) are transformed by introducing the variable χ = α − β:

  (eq. 49)
  (eq. 50)

Decrease of τ from τ0 to −∞ corresponds to a decrease of ξ from ξ0 to 0. The long era with close a and b (that is, with small χ), considered here, is obtained if ξ0 is a very large quantity. Indeed, at large ξ the solution of (eq. 49) in the first approximation by 1/ξ is

  (eq. 51)

where A is constant; the factor 1/√ξ makes χ a small quantity, so in (eq. 49) one can substitute sh 2χ ≈ 2χ.[30]

From (eq. 50) one obtains


After determining α and β from (eq. 48) and (eq. 51) and expanding eα and eβ in series according to the above approximation, one obtains finally[31]:

  (eq. 52)
  (eq. 53)

The relation between the variable ξ and time t is obtained by integration of the definition dt = abc dτ which gives

  (eq. 54)

The constant c0 (the value of c at ξ = ξ0) should now satisfy c0 ≪ α0.

Let us now consider the domain ξ ≪ 1. Here the major terms in the solution of (eq. 49) are:


where k is a constant in the range −1 < k < 1; this condition ensures that the last term in (eq. 49) is small (sh 2χ contains ξ^(2k) and ξ^(−2k)). Then, after determining α, β, and t, one obtains

  (eq. 55)

This is again a Kasner mode with the negative t power coming into the function c(t).[32]

These results picture an evolution that is qualitatively similar to that described above. During a long period of time that corresponds to a large decreasing ξ value, the two functions a and b oscillate, remaining close in magnitude; at the same time, both functions slowly decrease. The period of oscillations is constant in the variable ξ: Δξ = 2π (or, which is the same, constant in logarithmic time: Δ ln t = 2πA²). The third function, c, decreases monotonically by a law close to c = c0t/t0.

This evolution continues until ξ ~ 1 and formulas (eq. 52) and (eq. 53) are no longer applicable. Its time duration corresponds to change of t from t0 to the value t1, related to ξ0 according to

  (eq. 56)

The relationship between ξ and t during this time can be presented in the form

  (eq. 57)

After that, as seen from (eq. 55), the decreasing function c starts to increase while functions a and b start to decrease. This Kasner epoch continues until the terms c²/a²b² in (eq. 22) become ~ t² and the next series of oscillations begins.

The law for density change during the long era under discussion is obtained by substitution of (eq. 52) in (eq. 20):

  (eq. 58)

When ξ changes from ξ0 to ξ ~ 1, the density increases   times.

It must be stressed that although the function c(t) changes by a law, close to c ~ t, the metric (eq. 52) does not correspond to a Kasner metric with powers (0, 0, 1). The latter corresponds to an exact solution (found by Taub[33]) which is allowed by eqs. 26-27 and in which

  (eq. 59)

where p, δ1, δ2 are constant. In the asymptotic region τ → −∞, one obtains from here a = b = const, c = const · t after the substitution e^(pτ) = t. In this metric, the singularity at t = 0 is non-physical.

Let us now describe the analogous study of the Type VIII model, substituting in eqs. 26-28 λ = −1, μ = ν = 1.[21]

If, during the long era, the monotonically decreasing function is a, nothing changes in the foregoing analysis: ignoring a² on the right hand sides of equations (eq. 26) and (eq. 28), one returns to the same equations (eq. 49) and (eq. 50) (with altered notation). Some changes occur, however, if the monotonically decreasing function is b or c; let it be c.

As before, one has equation (49) with the same symbols, and, therefore, the former expressions (52) for the functions a(ξ) and b(ξ), but equation (50) is replaced by

  (eq. 60)

The major term at large ξ now becomes


so that

  (eq. 61)

The value of c as a function of time t is, as before c = c0t/t0 but the time dependence of ξ changes. The length of a long era depends on ξ0 according to

  (eq. 62)

On the other hand, the value ξ0 determines the number of oscillations of the functions a and b during an era (equal to ξ0/2π). Given the length of an era in logarithmic time (i.e., with given ratio t0/t1) the number of oscillations for Type VIII will be, generally speaking, less than for Type IX. For the period of oscillations one gets now Δ ln t = πξ/2; contrary to Type IX, the period is not constant throughout the long era, and slowly decreases along with ξ.

The small-time domain

As shown above, long eras violate the "regular" course of evolution; this fact makes it difficult to study the evolution over time intervals encompassing several eras. It can be shown, however, that such "abnormal" cases appear in the spontaneous evolution of the model to a singular point in the asymptotically small times t at sufficiently large distances from a start point with arbitrary initial conditions. Even in long eras both oscillatory functions during transitions between Kasner epochs remain so different that the transition occurs under the influence of only one perturbation. All results in this section relate equally to models of the types VIII and IX.[34]

During each Kasner epoch abc = Λt, i.e. α + β + γ = ln Λ + ln t. In transitions between epochs the constant ln Λ changes to the first order (cf. (eq. 36)). However, asymptotically, at very large |ln t| values, one can ignore not only these changes but also the constant ln Λ itself. In other words, this approximation corresponds to ignoring all quantities whose ratio to |ln t| converges to zero at t → 0. Then

  $\alpha + \beta + \gamma = -\Omega$  (eq. 63)

where Ω is the "logarithmic time"

  $\Omega = -\ln t$  (eq. 64)

In this approximation, the process of epoch transitions can be regarded as a series of brief time flashes. The constant on the right hand side of condition (eq. 38), αmax = ½ ln (2|p1|Λ), that defines the periods of transition can also be ignored, i.e. this condition becomes α = 0 (or similar conditions for β or γ if the initial negative power is related to the functions b or c).[35] Thus, αmax, βmax, and γmax become zero, meaning that α, β, and γ run only through negative values, which are related at each moment by the relationship (eq. 63).

Figure 2

Considering such an instantaneous change of epochs, the transition periods are ignored as small in comparison to the epoch length; this condition is actually fulfilled.[36] Replacement of the α, β, and γ maxima with zeros requires that the quantities ln (|p1|Λ) be small in comparison with the amplitudes of oscillations of the respective functions. As mentioned above, during transitions between eras |p1| values can become very small, while their magnitude and probability of occurrence are not related to the oscillation amplitudes at the respective moment. Therefore, in principle, it is possible to reach such small |p1| values that the above condition (zero maxima) is violated. Such a drastic drop of αmax can lead to various special situations in which the transition between Kasner epochs by the rule (eq. 37) becomes incorrect (including the situations described above; see also [37]). These "dangerous" situations could break the laws used for the statistical analysis below. As mentioned, however, the probability of such deviations converges asymptotically to zero; this issue will be discussed below.

Consider an era that contains k Kasner epochs with a parameter u running through the values

  (eq. 65)

and let α and β be the oscillating functions during this era (Fig. 2).[38]

Initial moments of Kasner epochs with parameters un are Ωn. In each initial moment, one of the values α or β is zero, while the other has a minimum. Values α or β in consecutive minima, that is, in moments Ωn are

  (eq. 66)

(not distinguishing minima α and β). Values δn that measure those minima in respective Ωn units can run between 0 and 1. Function γ decreases monotonically during this era; according to (eq. 63) its value at moment Ωn is

  (eq. 67)

During the epoch starting at moment Ωn and ending at moment Ωn+1 one of the functions α or β increases from -δnΩn to zero while the other decreases from 0 to -δn+1Ωn+1 by linear laws, respectively:


resulting in the recurrent relationship

  (eq. 68)

and for the logarithmic epoch length

  (eq. 69)

where, for short, f(u) = 1 + u + u2. The sum of n epoch lengths is obtained by the formula

  (eq. 70)

It can be seen from (eq. 68) that |αn+1| > |αn|, i.e., the oscillation amplitudes of functions α and β increase during the whole era although the factors δn may be small. If the minimum at the beginning of an era is deep, the next minima will not become shallower; in other words, the residue |α − β| at the moment of transition between Kasner epochs remains large. This assertion does not depend upon era length k because transitions between epochs are determined by the common rule (eq. 37) also for long eras.

The last oscillation amplitude of functions α or β in a given era is related to the amplitude of the first oscillation by the relationship |αk-1| = |α0| (k + x) / (1 + x). Even at k 's as small as several units x can be ignored in comparison to k so that the increase of α and β oscillation amplitudes becomes proportional to the era length. For functions a = eα and b = eβ this means that if the amplitude of their oscillations in the beginning of an era was A0, at the end of this era the amplitude will become  .

The length of Kasner epochs (in logarithmic time) also increases inside a given era; it is easy to calculate from (eq. 69) that Δn+1 > Δn.[39] The total era length is

  (eq. 71)

(the term with 1/x arises from the last, k-th, epoch whose length is great at small x; cf. Fig. 2). The moment Ωk when the k-th epoch of a given era ends is at the same time the moment Ω'0 of the beginning of the next era.

In the first Kasner epoch of the new era, function γ is the first to rise from the minimal value γk = − Ωk (1 − δk) that it reached in the previous era; this value plays the role of a starting amplitude δ'0Ω'0 for the new series of oscillations. It is easily obtained that:

  (eq. 72)

It is obvious that δ'0Ω'0 > δ0Ω0. Even at not very great k the amplitude increase is very significant: function c = eγ begins to oscillate from amplitude  . The issue about the abovementioned "dangerous" cases of drastic lowering of the upper oscillation limit is left aside for now.

According to (eq. 40) the increase in matter density during the first (k - 1) epochs is given by the formula


For the last (k-th) epoch of a given era, it should be taken into account that at u = x < 1 the greatest power is p2(x) (not p3(x)). Therefore, for the density increase over the whole era one obtains

  (eq. 73)

Therefore, even at not very great k values,  . During the next era (with a length k ' ) density will increase faster because of the increased starting amplitude A0':  , etc. These formulae illustrate the steep increase in matter density.

Statistical analysis near the singularity

The sequencing order of era lengths k(s), measured by the number of Kasner epochs contained in them, exhibits the character of a random process. The source of this stochasticity is the rule (eq. 41-42) according to which the transition from one era to the next is determined from an infinite numerical sequence of u values.

In the statistical description of this sequence, instead of a fixed initial value umax = k(0) + x(0), BKL consider values of x(0) that are distributed in the interval from 0 to 1 according to some probability distribution. Then the values of x(s) that finish each (s-th) number series will also be distributed according to some laws. It can be shown [1] that with growing s these distributions converge to a definite stationary (s-independent) probability distribution w(x) in which the initial conditions are completely "forgotten":

  $w(x) = \frac{1}{(1 + x)\ln 2}$  (eq. 74)

This makes it possible to find the probability distribution of the era length k:

  $W(k) = \frac{1}{\ln 2}\,\ln\frac{(k + 1)^2}{k(k + 2)}$  (eq. 75)

The above formulae are the basis on which the statistical properties of the model evolution are studied.[34]

This study is complicated by the slow decrease of the distribution function (eq. 75) at large k:

  $W(k) \approx \frac{1}{k^2 \ln 2}$  (eq. 76)

The mean value  , calculated from this distribution, diverges logarithmically. For a sequence, cut off at a very large but still finite number N, one has  . The usefulness of the mean in this case is very limited because of its instability: because of the slow decrease of W(k), fluctuations in k diverge faster than its mean. A more adequate characteristic of this sequence is the probability that a randomly chosen number from it belongs to a series of length K where K is large. This probability is lnK/lnN. It is small if  . In this respect one can say that a randomly chosen number from the given sequence belongs to the long series with a high probability.
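The statistical statements above can be checked numerically. The sketch below assumes that the stationary density (eq. 74) is the Gauss distribution w(x) = 1/((1 + x) ln 2) known from the theory of continued fractions, and that (eq. 75) is its integral over the interval 1/(k + 1) < x < 1/k, on which the next era has length k. The function names are illustrative.

```python
import math


def w(x):
    """Assumed stationary density of x on (0, 1): 1 / ((1 + x) ln 2)."""
    return 1.0 / ((1.0 + x) * math.log(2.0))


def W(k):
    """Probability of era length k, in closed form: log2((k+1)^2 / (k(k+2)))."""
    return math.log1p(1.0 / (k * (k + 2.0))) / math.log(2.0)


def W_numeric(k, steps=1000):
    """Midpoint-rule integral of w(x) over 1/(k+1) < x < 1/k."""
    lo, hi = 1.0 / (k + 1), 1.0 / k
    h = (hi - lo) / steps
    return h * sum(w(lo + (i + 0.5) * h) for i in range(steps))


def truncated_mean(n_max):
    """Sum of k * W(k) for k = 1..n_max (distribution cut off, not renormalized)."""
    return sum(k * W(k) for k in range(1, n_max + 1))


if __name__ == "__main__":
    print(sum(W(k) for k in range(1, 100001)))    # approaches 1
    print(W(2), W_numeric(2))                     # closed form vs quadrature
    print(W(1000) * 1000 ** 2 * math.log(2.0))    # approaches 1: W ~ 1/(k^2 ln 2)
    for n in (10 ** 2, 10 ** 4, 10 ** 6):
        print(n, truncated_mean(n), math.log2(n)) # the mean grows like ln N
```

The truncated mean keeps growing with the cutoff N roughly as ln N / ln 2, which illustrates the logarithmic divergence of the mean era length.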

The recurrent formulae defining transitions between eras are re-written and detailed below. Index s numbers the successive eras (not the Kasner epochs in a given era!), beginning from some era (s = 0) defined as initial. Ω(s) and ε(s) are, respectively, the initial moment and initial matter density in the s-th era; δsΩs is the initial oscillation amplitude of that pair of functions α, β, γ, which oscillates in the given era: k(s) is the length of s-th era, and x(s) determines the length of the next era according to k(s+1) = [1/x(s)]. According to (eq. 71-73)

  (eq. 77)