Introduction to Programming Languages/An Interpreter for ML

In order to introduce the main concepts related to Operational Semantics, we shall define an interpreter for a subset of ML, which is comprehensive enough to include the lambda-calculus. We start with a very simple language, whose concrete syntax is given below:

    <exp> ::= <exp> + <mulexp>
            | <mulexp>
 <mulexp> ::= <mulexp> * <rootexp>
            | <rootexp>
<rootexp> ::= ( <exp> )
            | <constant>

As we had explained before, much of this syntax is boilerplate. The essential structure of the language that we want to interpret is much simpler:

<ast> ::= plus(<ast>, <ast>)
        | times(<ast>, <ast>)
        | const (<const>)

We have three kinds of elements so far: nodes denoting addition, multiplication and numbers. To produce an interpreter for this language, we need to define the actions that shall be taken once each of these nodes is visited. We shall produce such an interpreter in prolog. Because we have three varieties of nodes, we need to define three kinds of actions:

val(plus(X,Y),Value) :- val(X, XValue), val(Y, YValue), Value is XValue + YValue.
val(times(X,Y),Value) :- val(X, XValue), val(Y, YValue), Value is XValue * YValue.
val(const(X),X).

Our language, at this point, can only describe arithmetic expressions. Thus, the meaning of a program is always a number. Our prolog interpreter lets us get back this meaning:

| ?- val(const(10000), N).

N = 10000

(1 ms) yes
| ?- val(plus(const(10000), const(20)), N).

N = 10020

Notice that it is not uncommon for the meaning of a program to be only a value. In the lambda-calculus, every program is a value. In a purely functional programming language we also have this semantics. For instance, as we shall proceed in our developments, we will increase our language with more syntax. The meaning of our programs, which contain variables and functions, are just values, where a value can be a number or a function. Talking about variables, this is the next addition to our programming language. In this case, we need syntax to create let blocks, as we have in ML. For instance:

- let val x = 2 in x * x end ;
val it = 4 : int
- let val x = 2 in let val x = 3 in x * x end + x end ;
val it = 11 : int

To add let blocks to our programming language, we need to expand its concrete syntax. The new grammar can be seen below:

<exp>     ::= <exp> + <mulexp>
            | <mulexp>
<mulexp>  ::= <mulexp> * <rootexp>
            | <rootexp>
<rootexp> ::= let val <variable> = <exp> in <exp> end
            | ( <exp> )
            | <variable>
            | <constant>

In terms of abstract syntax, let blocks force us to create two new kinds of nodes. The first node represents the let construct itself. The second represents names, i.e., surrogate for values. This expanded abstract syntax can be seen below:

<ast> ::= plus (<ast>, <ast>)
        | times (<ast>, <ast>)
        | const (<const>)
        | let (<name>, <ast>, <ast>)
        | var (<name>)

We need to augment our interpreter with two new clauses, to understand variables and let blocks. One question that surfaces now is how to keep track of the values of variables. To accomplish this task, we need the support of an environment. The environment is a table, that binds variable names to the values that these names represent. Notice that we are not talking about mutable variables: once assigned, a variable will have the same value until no longer in scope. Our new version of the interpreter is given below:

val(plus(X, Y), Context, Value) :-
  val(X, Context, XValue),
  val(Y, Context, YValue),
  Value is XValue + YValue.
val(times(X, Y), Context, Value) :-
  val(X, Context, XValue),
  val(Y, Context, YValue),
  Value is XValue * YValue.
val(const(X), _, X).
val(var(X), Context, Value) :-
  lookup(X, Context, Value).
val(let(X, Exp1, Exp2), Context, Value2) :-
  val(Exp1, Context, Value1),
  val(Exp2, [(X, Value1)|Context], Value2).

lookup(Variable, [(Variable, Value)|_], Value).
lookup(VarX, [(VarY, _)|Rest], Value) :-
  VarX \= VarY,
  lookup(VarX, Rest, Value).

This new implementation lets us create variables, which we store and consult via the lookup predicate. The new semantics emulates well the behavior of blocks: a let binding creates a new scope, which is valid only within this block. As an example, below we show an ML program, and its equivalent version, in our language:

let val y = 3 in            val(let(y,const(3),
  let val x = y * y in        let(x, times(var(y), var(y)),
    x * x                       times(var(x), var(x)))),
  end                       nil, X).
end

Notice that we do have the block semantics. For instance, below we see an interpretation of the program let val x = 1 in let val x = 2 in x end + x end. This program produces the value 3, as it would be expected in ML:

| ?- val(let(x, const(1), plus(let(x, const(2), var(x)), var(x))), nil, V).
V = 3 ? ;

We want now to push the interpreter a bit further, to give it the ability to understand functions, including high-order functions and closures. This new addition means that a value can now be a function. As an example, the following program, in ML, returns a function, which denotes its meaning: let val f = fn x => x + 1 in f end. Adding functions to our language is more complicated than adding variables, because this addition requires a number of steps. In particular, our new concrete syntax is given by the grammar below:

<exp>     ::= fn <variable> => <exp>
            | <addexp>
<addexp>  ::= <addexp> + <mulexp>
            | <mulexp>
<mulexp>  ::= <mulexp> * <funexp>
            | <funexp>
<funexp>  ::= <funexp> <rootexp>
            | <rootexp>
<rootexp> ::= let val <variable> = <exp> in <exp> end
            | ( <exp> )
            | <variable>
            | <constant>

First, we need to be able to distinguish two types of values: functions and numbers. Because our programming language does not contain a type system, we shall use a "marker" to separate functions from numbers. Thus, we will append the atom fval to every value that is a function. Such values are formed by a pair: the argument and the body of the function. This is to say that every function in our programming language uses only one parameter. This is similar to what we find in ML, e.g., anonymous functions have exactly these two components: a formal parameter plus a body. Examples of anonymous functions include: fn x => x * x, fn x => (fn y => x + y) and fn x => 1. Our abstract syntax can be seen below:

<ast> ::= plus (<ast>, <ast>)
        | times (<ast>, <ast>)
        | const (<const>)
        | let (<name>, <ast>, <ast>)
        | var (<name>)
        | fn (<name>, <ast>)
        | apply (<ast>, <ast>)

We have a node apply, which denotes function application. Notice that this node receives a whole AST node, instead of a simple name that would denote a function. This is expected, as functions can be the outcome of expressions. For example, this is a valid program: (let val f = fn x => x + 1 in f end) 2. In this case, the left side of the application is the expression let val f = fn x => x + 1 in f end, which, once evaluated, will yield the function fn x => x + 1. The extended version of our interpreter is given below:

val(plus(X, Y), Context, Value) :-
  val(X, Context, XValue),
  val(Y, Context, YValue),
  Value is XValue + YValue.

val(times(X, Y), Context, Value) :-
  val(X, Context, XValue),
  val(Y, Context, YValue),
  Value is XValue * YValue.

val(const(X), _, X).

val(var(X), Context, Value) :-
  lookup(X, Context, Value).

val(let(X, Exp1, Exp2), Context, Value2) :-
  val(Exp1, Context, Value1),
  val(Exp2, [bind(X, Value1) | Context], Value2).

val(fn(Formal, Body), _, fval(Formal, Body)).

val(apply(Function, Actual), Context, Value) :-
  val(Function, Context, fval(Formal, Body)),
  val(Actual, Context, ParamValue),
  val(Body, [bind(Formal, ParamValue)|Context], Value).

lookup(Variable, [bind(Variable, Value)|_], Value).
lookup(VarX, [bind(VarY, _)|Rest], Value) :-
  VarX \= VarY, lookup(VarX, Rest, Value).

This new interpreter is powerful enough to handle programs as complicated as the code below, which is implemented in SML:

let val x = 1 in
  let val f = fn n => n + x in
    let val x = 2 in
      f 0
    end
  end
end

This program, once evaluated, gives us a rather unexpected result:

| ?- val(let(x, const(1), let(f, fn(n, plus(var(n), var(x))), let(x, const(2), apply(var(f), const(0))))), nil, X).

X = 2

We would expect to get 1 in ML. However, our interpreter gives back the value 2. The answer for this difference is simple: our interpreter uses dynamic scoping, whereas ML uses static scoping. This is natural: it is usually easier to implement dynamic scope. That is one of the reasons why some of the earlier programming languages such as Lisp used dynamic scope. But, as we had discussed before, static scope has many advantages of its dynamic counterpart. One of these advantages is the fact that we can use a function as a black-box. In other words, definitions that happen outside the body of a function will not change this function's behavior. In our case, implementing static scope is not difficult: we need to save the context that existed when the function was created. This new version of our interpreter implements static scoping:

val(plus(X, Y), Context, Value) :-
  val(X, Context, XValue),
  val(Y, Context, YValue),
  Value is XValue + YValue.

val(times(X, Y), Context, Value) :-
  val(X, Context, XValue),
  val(Y, Context, YValue),
  Value is XValue * YValue.

val(const(X), _, X).

val(var(X), Context, Value) :-
  lookup(X, Context, Value).

val(let(X, Exp1, Exp2), Context, Value2) :-
  val(Exp1, Context, Value1),
  val(Exp2, [bind(X, Value1) | Context], Value2).

val(fn(Formal, Body), Context, fval(Formal, Body, Context)).

val(apply(Function, Actual), Context, Value) :-
  val(Function, Context, fval(Formal, Body, Nesting)),
  val(Actual, Context, ParamValue),
  val(Body, [bind(Formal, ParamValue)|Nesting], Value).

lookup(Variable, [bind(Variable, Value)|_], Value).
lookup(VarX, [bind(VarY, _)|Rest], Value) :-
  VarX \= VarY, lookup(VarX, Rest, Value).

In this new version of our interpreter, functions are now represented as triples fval(Formal, Body, Context)). Two of the elements in this triple, Formal and Body, are like before: the function's argument and its body. The third element, Context, is the bindings that existed when the function was created. The implementation of apply has also changed. Instead of evaluating a function in the context in which said function is called, we evaluate it in the context in which it was created. With this new interpreter we get the same behavior as in SML:

| ?- val(let(x, const(1), let(f, fn(n, plus(var(n), var(x))), let(x, const(2), apply(var(f), const(0))))), nil, X).

X = 1

Notice that these two last versions of the interpreter are powerful enough to handle closures, just like SML does. For instance, we can write this program below in our interpreter, and run it:

let
  val f = fn x => let val g = fn y => y+x in g end
in
  f 1 2
end

If we translate it into our new language, then we have the following program:

?- val(let(f,fn(x,let(g,fn(y,plus(var(y),var(x))), var(g))), apply(apply(var(f),const(1)),const(2))),nil, X).

X = 3

← Quest for Meaning