Mathematical Proof and the Principles of Mathematics/Logic/Axioms and equality

We've now covered most of what is known as first order logic, or at least we've covered what we'll need for the rest of this book. There are second and higher order logics which we'll only describe briefly. Second order logic concerns predicates that takes first order predicates as arguments rather than the objects of the universe of discourse. For example a relation is called reflexive if

For all ,

The predicate

is a reflexive relation

is then a second order predicate. Some care must be used in this direction to avoid the possibility of predicates being applied to themselves, otherwise self-referential statements such as the Liar's paradox discussed earlier may appear, contradictions may ensue, and the whole system may collapse. We'll see more on this later.

While we will, from time to time, make definitions in the form of second order predicates, but these should be thought of here as linguistic shortcuts which can be replaced by their definitions.

Axiom systems edit

We've gone about as far as possible without the introduction of axioms, but we'll need them to go forward so it's time to talk about them in detail.

So far, using rules of inference alone, we've only proved tautologies, in other words statements which don't really say anything meaningful. They may work as exercises, but in terms of saying anything interesting about the universe they aren't useful. It was realized even before Euclid that, in order to avoid circular argument, some initial assumptions must be made, logical bootstraps as you might call them.

The process goes something like this:

  1. Make a list of assumptions about the universe of discourse. These assumptions are called axioms.
  2. Use logic to prove new and hopefully interesting statements. These statements are called theorems.

This list of axioms is sometimes called an axiom system. Where axioms come from is an interesting question. They might come philosophy, experience, or by abstracting properties of other systems.

Useful properties of axiom systems edit

To be useful, axioms preferably have the following properties:

  • Consistency (there are some statements which are not theorems): This is a must-have feature, otherwise the system does not distinguish between theorems and non-theorems and is therefore useless. An equivalent way of saying this is that the statement   is not a theorem, in other words it's not possible to derive a contradiction from the axioms.
  • Completeness (every statement has either a proof or disproof): This is a should-have feature in that it's desirable to be able to decide whether a given statement is true or not. In other words there should be no unresolvable conjectures where neither a proof or counterexample is possible. Note that this is not the same as saying that the statement   or not   is a theorem since we may know   or not   is True without knowing which alternative it is. (This is somewhat controversial though; see the section on non-standard logic.) Note also that there many conjectures, some famous, for which we don't currently have a proof of   or a proof of not  . This only means that a proof has not been found though, not that a proof is impossible.
  • Independence (none of the axioms can be proved from the others): This is partly for aesthetic and partly for practical reasons. Aesthetically, the system is cleaner and more elegant if there no extra, unneeded axioms. As for practicality, suppose you have some system which you're trying to model with the axioms. The first thing you would need to do is verify that the axioms do, in fact, hold in your system. Unnecessary axioms increase the work load for this with no added benefit.
Aside: to show that an axiom is independent of the others, one finds a mathematical system in which all the axioms hold except for the given one. This shows that the given axiom cannot be proved from the others because if it could, it would hold in the given system.
  • Uncontroversial (there is general agreement as to which axioms should be included): If this does not hold then the conclusions of the theory may be disputed.
  • Power (they should be simple to describe, yet imply a plethora of rich mathematical results): This ultimately determines how useful the theory is. Sometimes the same set of axioms can be used to model different systems, and what you learn from the theory is applicable to both.

Given an axiomatic system, it would be comforting to be able to prove in some way that it's consistent and complete. Unfortunately, this was shown to impossible in the 1930's by Kurt Gödel, at least for any system usable for mathematics.

Using an axiom edit

We can now introduce another rule of inference:

If   is an axiom then deduce  

In other words you can use any axiom in a proof with no justification other than it's an axiom.

Axioms of equality edit

Our first axioms have to do with equality. While this technically still falls under the subject of logic, we're now close enough to see mathematics on the horizon.

Note that we will never actually define the phrase   equals   since it's impossible to do so without being circular. We can list some axioms which tell us how equal objects behave, and in some sense this serves as a definition. We can also give an informal description of what   equals   means, which is meant to be a guide to intuition and to help make the axioms seem plausible, but this carries no logical weight and can't be used in a proof. We'll assume you already have an intuitive idea of what   equals   means so we'll skip that part in this case.

Axiom E0: There is a relation '=', defined on the universe of discourse, written   and read   equals  .

Relations are often written using infix notation   instead of prefix notation  . This axiom basically just reserves the '=' symbol as having a special meaning.

Axiom E1 (Identity): For all  ,  .

This seems self-evident given the intuitive concept of equality. Ayn Rand fans may recognize this as the line "A is A" from Atlas Shrugged. This is also called the Reflexive property.

Axiom E2 (Substitution): If   is a predicate, then for all   and  ,   and   implies  .

This isn't actually a single axiom but what is called an axiom schema. (The word 'schema' here is a very technical term which basically means recipe or blueprint; just replace 'schema' with 'recipe' every time you see it an you won't go far wrong.) You get a new axiom for every predicate, so it's a compact way of getting any number of axioms. We're starting to relax the logical formalism here in phrasing here as well. It's clear what the statement means so there's little need to adhere strictly to the phrasing and parentheses used in previous pages.

Properties of equality edit

We can now prove two of the most important properties of equality from the axioms.

Theorem E3 (Symmetry): For all   and  ,   implies  .
Proof: Choose   and   and assume  . Let   be the predicate  . Then   by Axiom E1. Also,   so   implies  . So   which is the same as saying  . We have   implies   for arbitrary   and  , therefore for all   and  ,   implies  .

Note that we're going to be phasing out the tabular style of proof at least in the text, leaving it as an exercise for the reader to write out all the details.

Theorem E4 (Transitivity): For all  ,   and  ,   and   imply  .
Proof: This is left as an exercise with the following hint. Choose  ,   and  , and assume   and   be the predicate  . (At this point we leave it to the reader to fill in the rest of the proof.)

Defining objects edit

In addition to axioms, mathematicians also use definitions as first principles. We can use logical connectives and previously defined statements and predicates to define new statements and predicates as we have been doing all along. But we also want to be able to define an object from the properties it has. But what do you need in order for such a definition to make sense? Let's consider a predicate   and try to figure out what is needed for the statement

Define   to be the object for which   is True.

to be a meaningful definition.

First, we need such an   to exist. So we require the statement

(Existence) For some  ,  

to be true. It makes sense to require this, otherwise we would have both   and For all  , not   which would lead to a contradiction.

We need another requirement though. Suppose one person picks an   to be 'the' object for which   is True, and another person picks   to be 'the' object for which   is True. Now suppose there is some property of   which is not shared by  , in other words there is a predicate   for which   but not  . This leads to a contradiction again since both   and   are supposed to be the same object, but   is True for one and false for the other. So we need to guarantee that if   and   then there is no predicate   for which   but not  . But axiom E2 gives us just that condition if  . So we add another condition that the statement

(Uniqueness) For all   and  ,   and   imply  

be true.

These two conditions, existence and uniqueness, seem to be enough. We can then say the expression

The   for which  

is well-defined if   has the existence and uniqueness properties. We can declare  , or whatever other symbol we wish to use, as the name of object in our universe of discourse and take

 

to be an axiom. If   does not have both the existence and uniqueness properties then we are certainly free to write down the expression, but it's doesn't have any meaning, in other words it's undefined. We'll introduce more compact notation in a bit, but first we'll generalize this idea.

Functional relations edit

Instead of a predicate, lets start with a relation  . This time we want to know if

Define   to be the object for which   is True.

to be a meaningful. This is starting to become the idea of a function, but we're reserving the word 'function' for a more complicated structure involving sets in the next chapter. We've used the word 'function' already, but as a guide to intuition rather than as a definition.

Following the same kind of reasoning as in the previous section, we say

The   for which  

is well-defined if

For some  ,  

and

For all  ,   and  ,   and   imply  

We say a relation   is a functional relation if it satisfies the second condition.

Similarly, for a relation   in three variables, we say

The   for which  

is well-defined if

For some  ,  

and

For all  ,  ,   and  ,   and   imply  

We can continue this for as many variables as are needed. The relation   is a functional relation if it satisfies the second condition as before.

In the case where the expression "The   for which  " is well-defined, then write   for this   as a standard notation. We're overloading the parentheses notation here, but this shouldn't be a problem because you can tell what is meant by counting the parameters.

Going back to predicates, we take   to mean the   for which  .

To summarize, if   is a relation in at least one variable, we take   to be the value of   for which   provided

For some  ,  

and

For all  ,  , ...   and  ,   and   imply  .

If the two conditions are not met then the expression   is undefined and does not represent anything meaningful. If   is defined, then

 

is an axiom.

As an exercise, apply this to the equality relation. Namely, if   is defined to be  , then prove that for all  ,   is defined, and  .

Alternate approaches (optional) edit

The approach we've used here isn't the only one available. Another way is to simply declare that a function takes 0 or more objects as parameters and returns another object rather than a truth value as for relations. A function of 0 variables is then the same as a constant. In this scenario, you need an additional axiom for '=':

If   is any function, then for all   and  ,   and   is defined imply   is defined and  .

Note that we don't want to assume   has a value for all   since that would unduly restrict the usefulness of the concept.

Another approach is the one taken by Bourbaki, following Hilbert's epsilon calculus though with different notation. In that scenario, the phrase

Some   for which  

always defines an  . In order to avoid the issue where different values might be picked by different people, it's assumed that the value   is determined by the predicate   in some way, so everyone who picks an   will get the same answer. This requires the additional axiom:

(For all  ,   iff  ) implies (Some   for which  )=(Some   for which  )

The requirement that everyone will get the same   when choosing one for which   seems like a big ask. In the case where   has the existence and uniqueness properties, the phrase "Some   for which  " and "The   for which  " amount to the same thing.

We've chosen the approach we're using here because in future chapters we will come across situations where it's very easy to fooled by a false definition. In other words you may think you've defined a function, say, but really haven't defined anything because the logical requirements aren't met. The approach we're using spells out those requirements explicitly so once they've been checked we can move on with confidence that there are no hidden errors.