Introduction to Programming Languages/Quest for Meaning
Semantics is the field concerned about the definition of meaning of programming constructs. There are several different notations to define such meaning. A few examples include:
The three different techniques mentioned above provide the means to explain, in a formal way, what each syntactic element of a programming language does. By formal, we mean unambiguous and mechanizable. A definition is unambiguous if there is only one way in which it can be interpreted. It is mechanizable if we can write that definition in a machine; thus, obtaining an interpreter for the programming language.
Formalism is good for several reasons. The first is that it helps in understanding a programming language. The meaning of programs is not always obvious. Even experts struggle sometimes to understand the behavior of code. And two programs, implemented in different languages, yet very similar, can have different behaviors. For instance, let us consider the following program, implemented in C:
#include <stdio.h>
int main() {
int x = 1;
x += (x = 2);
printf("x = %d\n", x);
}
This program prints x = 4. Now, lets take a look into another program, this time implemented in Java:
public class T {
public static void main(String args[]) {
int x = 1;
x += (x = 2);
System.out.println("x = " + x);
}
}
This program prints x = 3. Surprising, isn't it? And, which program is wrong? The answer to this question is: none of them. They all do what it is expected they should do, according to their semantics. In C, the assignment x += (x = 2) is translated into something like x = (x = 2) + x. In Java, on the other hand, we get something more like tmp = x; x = 2; x = x + tmp.
We shall see how to use operational semantics to describe the meaning of a programming language. The operational semantics describes meaning via an abstract machine. An abstract machine is an interpreter, i.e., a machine. However, this machine is not made of bolts and wires. It is made of mathematics. That is where the abstract in the name comes from.
To build an abstract machine we need a data-structure to represent programs. We already know such a data-structure: it is called the syntax tree, which we produce during parsing. However, the syntax tree has many elements that do not contribute to the meaning of programs. For instance, the two semi-colon marks, in the C command x = 1; x = x + 1; do not bear much voice in the final value stored in x. In this case, the semi-colon is there to help the parser to identify the end of commands. Similarly, markers such as the parentheses in the expression x * (y + z) are already encoded in the structure of the syntax tree. We do not really need to keep track of these tokens when building an interpreter to the programming language. Therefore, when designing interpreters, we need something that we call the abstract syntax of programs. Differently than the concrete syntax, the abstract syntax contains only the essential elements to let an interpreter navigate throughout the structure of programs.