# Mathematical Proof and the Principles of Mathematics/Logic/Axioms and equality

We've now covered most of what is known as first order logic, or at least we've covered what we'll need for the rest of this book. There are second and higher order logics which we'll only describe briefly. Second order logic concerns predicates that takes first order predicates as arguments rather than the objects of the universe of discourse. For example a relation ${\displaystyle P(x,y)}$ is called reflexive if

For all ${\displaystyle x}$, ${\displaystyle P(x,x).}$

The predicate

${\displaystyle P}$ is a reflexive relation

is then a second order predicate. Some care must be used in this direction to avoid the possibility of predicates being applied to themselves, otherwise self-referential statements such as the Liar's paradox discussed earlier may appear, contradictions may ensue, and the whole system may collapse. We'll see more on this later.

While we will, from time to time, make definitions in the form of second order predicates, but these should be thought of here as linguistic shortcuts which can be replaced by their definitions.

## Axiom systems

We've gone about as far as possible without the introduction of axioms, but we'll need them to go forward so it's time to talk about them in detail.

So far, using rules of inference alone, we've only proved tautologies, in other words statements which don't really say anything meaningful. They may work as exercises, but in terms of saying anything interesting about the universe they aren't useful. It was realized even before Euclid that, in order to avoid circular argument, some initial assumptions must be made, logical bootstraps as you might call them.

The process goes something like this:

1. Make a list of assumptions about the universe of discourse. These assumptions are called axioms.
2. Use logic to prove new and hopefully interesting statements. These statements are called theorems.

This list of axioms is sometimes called an axiom system. Where axioms come from is an interesting question. They might come philosophy, experience, or by abstracting properties of other systems.

## Useful properties of axiom systems

To be useful, axioms preferably have the following properties:

• Consistency (there are some statements which are not theorems): This is a must-have feature, otherwise the system does not distinguish between theorems and non-theorems and is therefore useless. An equivalent way of saying this is that the statement ${\displaystyle False}$  is not a theorem, in other words it's not possible to derive a contradiction from the axioms.
• Completeness (every statement has either a proof or disproof): This is a should-have feature in that it's desirable to be able to decide whether a given statement is true or not. In other words there should be no unresolvable conjectures where neither a proof or counterexample is possible. Note that this is not the same as saying that the statement ${\displaystyle P}$  or not ${\displaystyle P}$  is a theorem since we may know ${\displaystyle P}$  or not ${\displaystyle P}$  is True without knowing which alternative it is. (This is somewhat controversial though; see the section on non-standard logic.) Note also that there many conjectures, some famous, for which we don't currently have a proof of ${\displaystyle P}$  or a proof of not ${\displaystyle P}$ . This only means that a proof has not been found though, not that a proof is impossible.
• Independence (none of the axioms can be proved from the others): This is partly for aesthetic and partly for practical reasons. Aesthetically, the system is cleaner and more elegant if there no extra, unneeded axioms. As for practicality, suppose you have some system which you're trying to model with the axioms. The first thing you would need to do is verify that the axioms do, in fact, hold in your system. Unnecessary axioms increase the work load for this with no added benefit.
Aside: to show that an axiom is independent of the others, one finds a mathematical system in which all the axioms hold except for the given one. This shows that the given axiom cannot be proved from the others because if it could, it would hold in the given system.
• Uncontroversial (there is general agreement as to which axioms should be included): If this does not hold then the conclusions of the theory may be disputed.
• Power (they should be simple to describe, yet imply a plethora of rich mathematical results): This ultimately determines how useful the theory is. Sometimes the same set of axioms can be used to model different systems, and what you learn from the theory is applicable to both.

Given an axiomatic system, it would be comforting to be able to prove in some way that it's consistent and complete. Unfortunately, this was shown to impossible in the 1930's by Kurt Gödel, at least for any system usable for mathematics.

## Using an axiom

We can now introduce another rule of inference:

If ${\displaystyle P}$  is an axiom then deduce ${\displaystyle P.}$

In other words you can use any axiom in a proof with no justification other than it's an axiom.

## Axioms of equality

Our first axioms have to do with equality. While this technically still falls under the subject of logic, we're now close enough to see mathematics on the horizon.

Note that we will never actually define the phrase ${\displaystyle x}$  equals ${\displaystyle y}$  since it's impossible to do so without being circular. We can list some axioms which tell us how equal objects behave, and in some sense this serves as a definition. We can also give an informal description of what ${\displaystyle x}$  equals ${\displaystyle y}$  means, which is meant to be a guide to intuition and to help make the axioms seem plausible, but this carries no logical weight and can't be used in a proof. We'll assume you already have an intuitive idea of what ${\displaystyle x}$  equals ${\displaystyle y}$  means so we'll skip that part in this case.

Axiom E0: There is a relation '=', defined on the universe of discourse, written ${\displaystyle x=y}$  and read ${\displaystyle x}$  equals ${\displaystyle y}$ .

Relations are often written using infix notation ${\displaystyle (xRy)}$  instead of prefix notation ${\displaystyle (R(x,y))}$ . This axiom basically just reserves the '=' symbol as having a special meaning.

Axiom E1 (Identity): For all ${\displaystyle x}$ , ${\displaystyle x=x}$ .

This seems self-evident given the intuitive concept of equality. Ayn Rand fans may recognize this as the line "A is A" from Atlas Shrugged. This is also called the Reflexive property.

Axiom E2 (Substitution): If ${\displaystyle P(x),}$  is a predicate, then for all ${\displaystyle x}$  and ${\displaystyle y}$ , ${\displaystyle x=y}$  and ${\displaystyle P(x)}$  implies ${\displaystyle P(y)}$ .

This isn't actually a single axiom but what is called an axiom schema. (The word 'schema' here is a very technical term which basically means recipe or blueprint; just replace 'schema' with 'recipe' every time you see it an you won't go far wrong.) You get a new axiom for every predicate, so it's a compact way of getting any number of axioms. We're starting to relax the logical formalism here in phrasing here as well. It's clear what the statement means so there's little need to adhere strictly to the phrasing and parentheses used in previous pages.

## Properties of equality

We can now prove two of the most important properties of equality from the axioms.

Theorem E3 (Symmetry): For all ${\displaystyle x}$  and ${\displaystyle y}$ , ${\displaystyle x=y}$  implies ${\displaystyle y=x}$ .
Proof: Choose ${\displaystyle a}$  and ${\displaystyle b}$  and assume ${\displaystyle a=b}$ . Let ${\displaystyle P(z)}$  be the predicate ${\displaystyle z=a}$ . Then ${\displaystyle P(a)}$  by Axiom E1. Also, ${\displaystyle a=b}$  so ${\displaystyle P(a)}$  implies ${\displaystyle P(b)}$ . So ${\displaystyle P(b)}$  which is the same as saying ${\displaystyle b=a}$ . We have ${\displaystyle a=b}$  implies ${\displaystyle b=a}$  for arbitrary ${\displaystyle a}$  and ${\displaystyle b}$ , therefore for all ${\displaystyle x}$  and ${\displaystyle y}$ , ${\displaystyle x=y}$  implies ${\displaystyle y=x}$ .

Note that we're going to be phasing out the tabular style of proof at least in the text, leaving it as an exercise for the reader to write out all the details.

Theorem E4 (Transitivity): For all ${\displaystyle x}$ , ${\displaystyle y}$  and ${\displaystyle z}$ , ${\displaystyle x=y}$  and ${\displaystyle y=z}$  imply ${\displaystyle x=z}$ .
Proof: This is left as an exercise with the following hint. Choose ${\displaystyle a}$ , ${\displaystyle b}$  and ${\displaystyle c}$ , and assume ${\displaystyle a=b}$  and ${\displaystyle b=c.Let[itex]P(w)}$  be the predicate ${\displaystyle a=w}$ . (At this point we leave it to the reader to fill in the rest of the proof.)

## Defining objects

In addition to axioms, mathematicians also use definitions as first principles. We can use logical connectives and previously defined statements and predicates to define new statements and predicates as we have been doing all along. But we also want to be able to define an object from the properties it has. But what do you need in order for such a definition to make sense? Let's consider a predicate ${\displaystyle P(x)}$  and try to figure out what is needed for the statement

Define ${\displaystyle X}$  to be the object for which ${\displaystyle P(X)}$  is True.

to be a meaningful definition.

First, we need such an ${\displaystyle X}$  to exist. So we require the statement

(Existence) For some ${\displaystyle x}$ , ${\displaystyle P(x)}$

to be true. It makes sense to require this, otherwise we would have both ${\displaystyle P(X)}$  and For all ${\displaystyle x}$ , not ${\displaystyle P(x)}$  which would lead to a contradiction.

We need another requirement though. Suppose one person picks an ${\displaystyle X}$  to be 'the' object for which ${\displaystyle P(X)}$  is True, and another person picks ${\displaystyle Y}$  to be 'the' object for which ${\displaystyle P(Y)}$  is True. Now suppose there is some property of ${\displaystyle X}$  which is not shared by ${\displaystyle Y}$ , in other words there is a predicate ${\displaystyle Q(x)}$  for which ${\displaystyle Q(X)}$  but not ${\displaystyle Q(Y)}$ . This leads to a contradiction again since both ${\displaystyle X}$  and ${\displaystyle Y}$  are supposed to be the same object, but ${\displaystyle Q}$  is True for one and false for the other. So we need to guarantee that if ${\displaystyle P(X)}$  and ${\displaystyle P(Y)}$  then there is no predicate ${\displaystyle Q(x)}$  for which ${\displaystyle Q(X)}$  but not ${\displaystyle Q(Y)}$ . But axiom E2 gives us just that condition if ${\displaystyle X=Y}$ . So we add another condition that the statement

(Uniqueness) For all ${\displaystyle x}$  and ${\displaystyle y}$ , ${\displaystyle P(x)}$  and ${\displaystyle P(y)}$  imply ${\displaystyle x=y}$

be true.

These two conditions, existence and uniqueness, seem to be enough. We can then say the expression

The ${\displaystyle X}$  for which ${\displaystyle P(X)}$

is well-defined if ${\displaystyle P(x)}$  has the existence and uniqueness properties. We can declare ${\displaystyle X}$ , or whatever other symbol we wish to use, as the name of object in our universe of discourse and take

${\displaystyle P(X)}$

to be an axiom. If ${\displaystyle P(x)}$  does not have both the existence and uniqueness properties then we are certainly free to write down the expression, but it's doesn't have any meaning, in other words it's undefined. We'll introduce more compact notation in a bit, but first we'll generalize this idea.

## Functional relations

Instead of a predicate, lets start with a relation ${\displaystyle P(x,y)}$ . This time we want to know if

Define ${\displaystyle Y(x)}$  to be the object for which ${\displaystyle P(x,y)}$  is True.

to be a meaningful. This is starting to become the idea of a function, but we're reserving the word 'function' for a more complicated structure involving sets in the next chapter. We've used the word 'function' already, but as a guide to intuition rather than as a definition.

Following the same kind of reasoning as in the previous section, we say

The ${\displaystyle Y(x)}$  for which ${\displaystyle P(x,y)}$

is well-defined if

For some ${\displaystyle y}$ , ${\displaystyle P(x,y)}$

and

For all ${\displaystyle x}$ , ${\displaystyle y}$  and ${\displaystyle z}$ , ${\displaystyle P(x,y)}$  and ${\displaystyle P(x,z)}$  imply ${\displaystyle y=z.}$

We say a relation ${\displaystyle P(x,y)}$  is a functional relation if it satisfies the second condition.

Similarly, for a relation ${\displaystyle P(x,y,z)}$  in three variables, we say

The ${\displaystyle Z(x,y)}$  for which ${\displaystyle P(x,y,z)}$

is well-defined if

For some ${\displaystyle z}$ , ${\displaystyle P(x,y,z)}$

and

For all ${\displaystyle x}$ , ${\displaystyle y}$ , ${\displaystyle z}$  and ${\displaystyle w}$ , ${\displaystyle P(x,y,z)}$  and ${\displaystyle P(x,y,w)}$  imply ${\displaystyle z=w.}$

We can continue this for as many variables as are needed. The relation ${\displaystyle P(x,y,z)}$  is a functional relation if it satisfies the second condition as before.

In the case where the expression "The ${\displaystyle Z(u,v,\dots )}$  for which ${\displaystyle P(u,v,\dots ,z)}$ " is well-defined, then write ${\displaystyle P(u,v,\dots )}$  for this ${\displaystyle Z}$  as a standard notation. We're overloading the parentheses notation here, but this shouldn't be a problem because you can tell what is meant by counting the parameters.

Going back to predicates, we take ${\displaystyle P()}$  to mean the ${\displaystyle X}$  for which ${\displaystyle P(X)}$ .

To summarize, if ${\displaystyle P(u,v,\dots ,z)}$  is a relation in at least one variable, we take ${\displaystyle P(u,v,\dots )}$  to be the value of ${\displaystyle z}$  for which ${\displaystyle P(u,v,\dots ,z)}$  provided

For some ${\displaystyle z}$ , ${\displaystyle P(u,v,\dots ,z)}$

and

For all ${\displaystyle u}$ , ${\displaystyle v}$ , ... ${\displaystyle z}$  and ${\displaystyle w}$ , ${\displaystyle P(u,v,\dots ,z)}$  and ${\displaystyle P(u,v,\dots ,w)}$  imply ${\displaystyle z=w}$ .

If the two conditions are not met then the expression ${\displaystyle P(u,v,\dots )}$  is undefined and does not represent anything meaningful. If ${\displaystyle P(u,v,\dots )}$  is defined, then

${\displaystyle P(u,v,\dots ,P(u,v,\dots ))}$

is an axiom.

As an exercise, apply this to the equality relation. Namely, if ${\displaystyle E(x,y)}$  is defined to be ${\displaystyle x=y}$ , then prove that for all ${\displaystyle x}$ , ${\displaystyle E(x)}$  is defined, and ${\displaystyle E(x)=x}$ .

## Alternate approaches (optional)

The approach we've used here isn't the only one available. Another way is to simply declare that a function takes 0 or more objects as parameters and returns another object rather than a truth value as for relations. A function of 0 variables is then the same as a constant. In this scenario, you need an additional axiom for '=':

If ${\displaystyle F(x)}$  is any function, then for all ${\displaystyle x}$  and ${\displaystyle y}$ , ${\displaystyle x=y}$  and ${\displaystyle F(x)}$  is defined imply ${\displaystyle F(y)}$  is defined and ${\displaystyle F(x)=F(y)}$ .

Note that we don't want to assume ${\displaystyle F(x)}$  has a value for all ${\displaystyle x}$  since that would unduly restrict the usefulness of the concept.

Another approach is the one taken by Bourbaki, following Hilbert's epsilon calculus though with different notation. In that scenario, the phrase

Some ${\displaystyle X}$  for which ${\displaystyle P(X)}$

always defines an ${\displaystyle X}$ . In order to avoid the issue where different values might be picked by different people, it's assumed that the value ${\displaystyle X}$  is determined by the predicate ${\displaystyle P(x)}$  in some way, so everyone who picks an ${\displaystyle X}$  will get the same answer. This requires the additional axiom:

(For all ${\displaystyle x}$ , ${\displaystyle P(x)}$  iff ${\displaystyle Q(x)}$ ) implies (Some ${\displaystyle X}$  for which ${\displaystyle P(X)}$ )=(Some ${\displaystyle X}$  for which ${\displaystyle Q(X)}$ )

The requirement that everyone will get the same ${\displaystyle X}$  when choosing one for which ${\displaystyle P(X)}$  seems like a big ask. In the case where ${\displaystyle P(X)}$  has the existence and uniqueness properties, the phrase "Some ${\displaystyle X}$  for which ${\displaystyle P(X)}$ " and "The ${\displaystyle X}$  for which ${\displaystyle P(X)}$ " amount to the same thing.

We've chosen the approach we're using here because in future chapters we will come across situations where it's very easy to fooled by a false definition. In other words you may think you've defined a function, say, but really haven't defined anything because the logical requirements aren't met. The approach we're using spells out those requirements explicitly so once they've been checked we can move on with confidence that there are no hidden errors.