Mathematical Proof and the Principles of Mathematics/Preliminaries/Proof anatomy

Before going into the details on how to construct a proof, we'll begin with an example. It's a proof of the triangle inequality for real numbers, and the proof is basically the one given in Triangle inequality.

An example theorem and proof

Theorem: If x and y are real numbers, then

|x+y|\leq |x|+|y|.

The proof starts with the definition of absolute value:

Definition: If x is a real number then the absolute value of x, written |x| is

|x|={\begin{cases}x,&{\mbox{if }}x\geq 0\\-x,&{\mbox{if }}x<0.\end{cases}}

In this case, the proof is divided into three parts; there are two lemmas, or helper theorems, followed by a proof of the main theorem. It's not vital that you understand all its details right now since one of the goals of this book is to help with that sort of thing.

Lemma: If x is a real number then

-|x|\leq x\leq |x|

Proof: Let x be a real number. Then either x ≥ 0 or x < 0. If x ≥ 0 then, by the definition, |x| = x. So

-|x|=-x\leq 0\leq x=|x|.

If x < 0 then, also by the definition, |x| = −x. So in this case

-|x|=x<0<-x=|x|.

Either way,

-|x|\leq x\leq |x|.

Lemma: If x is a real number, and m is a real number with m ≥ 0, and if −m ≤ x ≤ m, then |x| ≤ m.

Proof: Let x and m be a real number, and assume m ≥ 0 and −m ≤ x ≤ m. Either x ≥ 0 or x < 0. If x ≥ 0 then, by the definition, |x| = x and so

|x|=x\leq m.

If x < 0 then, by the definition, |x| = −x and so

|x|=-x\leq m.

Either way, |x| ≤ m.

Theorem: If x and y are real numbers, then

|x+y|\leq |x|+|y|.

Proof: Let x and y be a real numbers. Then, by the first lemma,

-|x|\leq x\leq |x|

-|y|\leq y\leq |y|.

Adding these gives

-(|x|+|y|)=-|x|-|y|\leq x+y\leq |x|+|y|

and therefore, by the second lemma,

|x+y|\leq |x|+|y|.

Hierarchy

We've split up the proof into parts for two reasons. First, it saves effort since it avoids having to repeat the same argument twice. In this case, we applied the first lemma to both x and y so we didn't need to prove the same fact for both variables.

The second reason is more psychological. Humans prefer getting information in a hierarchy, and the more information there is the more levels are used. Books are usually divided into chapters, chapters into sections, sections into paragraphs, paragraphs into sentences, sentences into clauses, and clauses into words. So it's not unusual to break up a theorem into smaller pieces rather than have a single long string of statements. In this case we've separated the parts into separate lemmas, but it's not uncommon to have internal divisions in a proof. Logically, a lemma is just a theorem, but it's thought of a part of a larger result rather than something that stands on its own. In this way they are similar to subroutines which support a large program.

Notice also that the various parts are clearly marked: the words Theorem, Lemma, and Proof are marked in bold so the reader can how the proof is organized. This is similar to the way chapter headings mark each chapter in a book.

Progression

There is also a logical progression to the proof. The main theorem depends on the lemmas and not the other way around, and within each proof statements follow from previous statements. There are exceptions to this though, for example the proof uses several facts about inequalities and addition which are supposed to be familiar to the reader. Among these are:

If x ≥ y then −x ≤ −y.
If x≤y and y≤z then x≤z.
If x≤y and u≤v then x+u≤y+v.
Either x ≥ 0 or x < 0
−(x+y)=−x−y

Should these statements be taken as given or should they proved before you can use them? It's tempting to say take them as given, but then where do you stop? Perhaps you could just take the theorem itself as given and avoid the need for a proof altogether. But the point of having a proof is to convince people that the theorem is true, and simply taking the theorem as a given is unlikely to convince anyone. At the other extreme you could require that these statements be proved, then require that the statements used to prove them be proved, and so on. This process could be continued indefinitely and nothing would ever have declared proven.

The point here is that no proof occurs in a vacuum; it always uses more elementary theorems or first principles, and in turn what is being proved may be used to prove more advanced theorems. The progression avoids circular reasoning, but you can't avoid having first principles.

Similarly, definitions have their own progression. The absolute value was defined using the order relations (≤, ≥, <, >). But then you could define these in terms of the absolute value, as in x≥y means |x−y|=x−y, creating circular definitions. Such definitions don't work, so just as with theorems, there need to be certain terms without definitions. These are called primitive notions or undefined terms and all other definitions are based on them.

Inference

The ways statements can be put together to establish the truth of another statement are called the rules of inference. They come from logic, but are applied in mathematical proof. For example, if we rewrite the start of the first lemma, putting each statement on a separate line and giving the reason for each, the result is:

x is a real number. (Assumption)

(1)

If x is a real number then either x ≥ 0 or x < 0. (Total property)

(2)

Either x ≥ 0 or x < 0. (By 1 and 2)

(3)

This is more detail than is usually shown in a proof, but the reader is expected to be able to fill in these details.

If we take P to stand for the statement "x is a real number," and Q to stand for the statement "Either x ≥ 0 or x < 0," then we get a more generic form of the above fragment:

P (Assumption)

(1)

P implies Q (Previous theorem)

(2)

Q (By (1) and (2)

(3)

Here we're using "P implies Q" as a shorter version of "If P then Q." The rule being applied here is:

From

P
P implies Q

Deduce

Q.

The general form of this rule is:

From

(list of statements of a specific form)

Deduce

(another statement).

This form covers around half the rules of inference.

Inference with subproofs

The remaining rules require a subproof, or proof within a proof. We said that one exception is the rule that statements in a proof must follow from previous statements in the proof is for axioms or previously proven theorems. The other exception is for temporary assumptions which appear at the start of a proof or subproof. An example is the phrase "Let x be a real number," which appears at the start of the proof of the first lemma. We're not taking x to represent a real number from now on, but only within the current proof. In fact, the proof has three such temporary assumptions:

Let x be a real number.
If x ≥ 0 ...
If x < 0 ...

In each case a proof or subproof uses the assumptions to derive a conclusion, establishing a relationship between the two.

To see how this works in the example, the proof of the first lemma started

Let x be a real number.

and ended

-|x|\leq x\leq |x|.

This allows us to infer the statement of the theorem:

If x is a real number then

-|x|\leq x\leq |x|.

Letting P stand for the statement "x is a real number," and Q to stand for the statement

-|x|\leq x\leq |x|,

then the overall form of the proof is:

Assume P

(some derivation)

Q

Therefore P implies Q.

The rule being applied in this case is:

If by assuming

P

one can derive

Q

then deduce

P implies Q

The general form of this rule is:

If by assuming

(list of statements of a specific form)

one can derive

(another statement).

This form covers most of the remaining rules of inference; the few that remain are more complex but use the two basic forms as building blocks.

The reason for having rules of inference and using them to test the validity of a proof is to avoid having to rely on intuition as a test. As seen in the previous section, sometimes intuition is not reliable. When enough detail is included, the process of checking that a proof is valid is a very mechanical process. In fact computer programs have appeared which do just this, and they can ensure that human error does not creep into long and complex proofs.

Constituent components

At the very lowest level this proof has the following basic ingredients:

Statements
The objects these statements are about. In this case the objects are real numbers but they could be anything in general.
The properties of these objects, in other words predicates.
Ways of combining simple statements and predicates into more complicated ones, for example, "Either P or Q," is a combination of P and Q
Operations, or ways of combining objects to get other objects, for example x+y from x and y.
Statements which exist outside the proof: axioms, definitions, previously proven theorems.
Rules of inference saying how a proof can be constructed from statements and subproofs.

When the predicates, axioms, etc. which are specific to mathematics have been removed, what remains is in the domain of logic, and this will be the subject of the next chapter. In it, we'll start to explore how to construct proofs, using purely logical statements as examples.

← Mathematical proof