Scheme Programming/Macros

Macros are said to be what makes Lisp Lisp. Other programming languages, notably C and C++, have macros, but they're not like Lisp macros. In C, the macro preprocessor allows you to do limited textual substitution on snippets of C code. Sometimes, this produces working C code, while other times it can result in invisible misplaced braces or semicolons that the compiler can't quite pinpoint.

Lisp macros are full-blown Lisp procedures, with the full power of any other Lisp procedure. Instead of text, they get lists that represent the bits of code that you want to change. The return value of a macro is a list representing the new program. Scheme supports three types of macro. When it comes to understanding what's going on, the easiest of these three types of macro is Lisp's original type of macro, known as defmacro in Common Lisp, EMACS Lisp, and SCM. Other Scheme implementations call this form define-macro, while still other implementations, notably Racket, reject it as being too primitive.

To use macros in SCM, you must first evaluate this form:

(require 'macro)

Other Scheme implementations generally do not require this step.

A simple macro can illustrate what's going on. We want it to do something that would be impossible to do with a procedure. So we'll implement a macro that works like a `begin` form, but runs the code within it in reverse order:

(defmacro (backwards . body)
  (cons 'begin
	(reverse body)))

Every time you write this:

(backwards
  (newline)
  (display 3)
  (newline)
  (display 2)
  (newline)
  (display 1))

...the macro code above gets called at compile time, and the code inside the backwards block is passed as the body argument. The backwards macro reverses this code and conses the symbol begin to the beginning of the result, creating a begin form:

(begin
  (display 1)
  (newline)
  (display 2)
  (newline)
  (display 3)
  (newline))

Your program will execute as if that begin form was what you wrote in your program. Macros can do absolutely anything to the code in their arguments. Here is a more useful macro:

(defmacro (while condition . body)
  `(let loop ()
     (cond (,condition
	    (begin . ,body)
	    (loop)))))

Then you can write this program:

(define counter 10)

(while (> counter 0)
    (display counter)
    (newline)
    (set! counter (- counter 1)))

The while loop checks if the conditional expression is true, and if it is, it executes the body over and over until the condition becomes false. In the above code, the condition becomes false when counter reaches a value of 0.

There is just one small problem with defmacro. The loop procedure that our while loop uses for recursion is visible to the code in the body when you use a while loop. You could set it to something else, and then the loop would break. You might do this accidentally:

> (while (< counter 10)
     (display counter)
     (newline)
     (set! loop 'oops))
0

;ERROR: Wrong type to apply:  oops
; in expression: (loop)
; in scope:
;   ()  procedure loop
;   (loop . #@let)
;STACK TRACE
1; ((#@cond ((#@< #@counter 10) (#@display #@counter) (#@newline) ...
2; ((#@let ((loop (#@lambda () (#@cond ((#@< #@counter 10) (#@dis ...

Lisp provides a solution to this, called gensym in Common Lisp and most Scheme implementations. However, in SCM, this procedure is called gentemp. It generates a symbol that is guaranteed not to appear anywhere else in the program. You can use it to generate a name to use instead of loop. Fixing the while macro, it now looks like this:

(defmacro (while condition . body)
  (let ((loop (gentemp))) ; gensym if you're not using SCM.
    `(let ,loop ()
       (cond (,condition
	      (begin . ,body)
	      (,loop))))))

Hygienic Macros: syntax-rules

Scheme introduces hygienic macros to the Lisp world. When writing this kind of macro, it is impossible to accidentally introduce variable names that the code you're altering can change. However, to write them, we must abandon the concept of code being made out of simple lists.

There are two kinds of hygienic macros, but only the less powerful of the two is supported by SCM. Syntax-rules macros rely on pattern-matching to define the valid ways to invoke the macro, and templates to define the output code. There are some macros that cannot be defined with syntax-rules. Here's the while loop again, defined with syntax-rules instead of defmacro:

(define-syntax while
  (syntax-rules ()
    ((_ condition . body)
     (let loop ()
       (cond (condition
	      (begin . body)
	      (loop)))))))

If you refer to loop within the body if this version of while, it refers to a different loop from the one shown in the template. It is impossible to define a variable with syntax-rules that code from outside the macro can see.

The form that says (_ condition . body) is the pattern. _ is a wildcard. Anything can match it, and its value is not bound to anything. In this case, it is standing in the place where the word while goes. condition and body are pattern variables. They're not real variables. They won't have value at runtime, only within the template. Due to dotted-list notation, body is matched in the cdr position, which means it's all the forms that come after the body.

Scheme's cond form can be implemented with this macro system. In most implementations, cond is in fact a macro. A typical cond definition looks like this:

(define-syntax my-cond
  (syntax-rules (else)
    ((_) (if #f 'not-reached))
    ((_ (else . body))
     (begin . body))
    ((_ (bool-expression . body) . rest)
     (if bool-expression
	 (begin . body)
	 (cond . rest)))))

my-cond works like Racket's version of cond in that (cond) is legal and has an unspecified return value, and unlike SCM's cond in which (cond) is illegal. The (else) at the top in parentheses is the list of literals. They have to match exactly in the code in order to match the pattern. In other words, where you see else in the pattern, it only matches if the word else is found there.

Now let's make a REAL macro

What if you could pattern-match regular lists at runtime in the same manner that you can pattern-match syntax at compile-time when using syntax-rules? Then, if you had a list and you wanted to capture the first two elements (or do something else if it wasn't really a list or if it didn't have two elements), you could write something like this:

(match some-value (literally-hitler)
 ((literally-hitler . rest) ; First element is literally Hitler.
  (error "Found the Nazi"))
 (((a b) second . rest) ; First element is a two-element list.
  (display a))
 ((first second . rest) ; It's a list with at least two elements.
    (display (list first second)))
 (else #f))

The macro supports literals just like syntax-rules. The only thing that's not supported is the wildcard operator. Every value matched will be bound to something.

We'll use syntax-rules because in the end, the pattern matching provided by it does more to make this job easier than does the Turing completeness of defmacro. When we're done, we'll be able to use the same kind of pattern matching in defmacro macros as well, by using this macro within its body. The only thing we'll be missing in defmacro then will be hygiene.

How pattern matching works

Implementing something like this using syntax-rules will require two macros. One macro will match a single pattern, and evaluate an expression if it succeeds, or return a failure value if it fails.

The match-pattern macro requires five arguments:

(match-pattern pattern literals match-value fail-value success-expr)

The pattern contains the names of variables in the positions you hope to find them in some candidate structure. So if the pattern is (a . b), for example, then you're trying to bind a to the car of a list, and b to its cdr. If a and b are not literals, then this pattern can be matched against match-value by generating the following code:

   (let ((hd (maybe-car match-value fail-value)))
     (if (eq? hd fail-value)
         fail-value
         (match-pattern tl literals (maybe-cdr match-value fail-value)
		                fail-value success-expr))))

maybe-car is a special version of car that won't raise an error if it's called on a non-pair. Instead of assuming that match-value will be a list, maybe-car makes it possible to check, without us having to write (if (pair? match-value) ...) every time.

If the car of the list matches, the match-pattern macro is used again on both the cdr of the data and the cdr of the pattern (which is tl in the code snippet above). This way, tl can be either another pattern, or a variable.

If maybe-car or maybe-cdr encounter something that isn't a list, they return fail-value, which we then check for and return if it's found. That way, if any call to maybe-car or maybe-cdr evaluates to fail-value, then the entire match-pattern also evaluates to fail-value.

The macro will use exists-in? from the Looping chapter to implement the literals. If an identifier in the pattern is found in the literals, then that identifier is interpreted as a symbol, and that position in the structure must be that identifier or fail-value will be returned. The failure-value will be provided from outside the macro. Ultimately, we'll generate a failure value at runtime using (gentemp) (or (gensym)) on any Scheme implementation except SCM).

(require 'macro)

(define (maybe-car obj fail-value)
  (if (pair? obj)
      (car obj)
      fail-value))

(define (maybe-cdr obj fail-value)
  (if (pair? obj)
      (cdr obj)
      fail-value))

(define exists-in?
  (lambda (ele lis)
    (cond ((null? lis) #f)
          ((equal? ele (car lis)) #t)
          (else (exists-in? ele (cdr lis))))))

(define-syntax match-pattern
  (syntax-rules ()
    ;; No pattern. Matches if the match-value is null.
    ((_ () literals match-value fail-value success-expr)
     (if (null? match-value)
	 success-expr
	 fail-value))
    ;; Notice there are TWO pattern-matches going on: One at compile-time via
    ;; syntax-rules, and another at runtime, being done with cond forms
    ;; and comparison with the 'fail-value to detect failures deeper in the
    ;; pattern. 
    ;;
    ;; This case matches when the first element of the pattern is a list.
    ;; It generates code that matches the match-value only if its first element
    ;; is also a list.
    ((_ ((hhd . htl) . tl) literals match-value fail-value success-expr)
     (cond ((eq? match-value fail-value)
	    fail-value)
           ;; Macros are allowed to expand into instances of themselves.
	   (else (match-pattern (hhd . htl) literals (maybe-car match-value fail-value)
				fail-value
		      (match-pattern tl literals (maybe-cdr match-value fail-value) fail-value
				     success-expr)))))
    ;; Matches if the pattern itself is a list. hd, short for "head", is a
    ;; variable that will be bound to the first element of the match-value if it's
    ;; a list. If it's not a list, (maybe-car) will cause hd to be bound to the fail-value.
    ;;
    ;; Also, the match-value may already be the fail-value due to occurrences at a shallower
    ;; level in the pattern. If this happens, then this code won't bother to delve any deeper.
    ((_ (hd . tl) literals match-value fail-value success-expr)
     (cond ((eq? match-value fail-value)
	    fail-value)
	   ((exists-in? 'hd 'literals)
	    (if (eq? (maybe-car match-value fail-value) 'hd)
		(match-pattern tl literals (maybe-cdr match-value fail-value)
			       fail-value success-expr)
		fail-value))
	   (else
	    (let ((hd (maybe-car match-value fail-value)))
	      (if (eq? hd fail-value)
		  fail-value
		  (match-pattern tl literals (maybe-cdr match-value fail-value)
				 fail-value success-expr))))))
    ;; The pattern doesn't have to be a list. If it's not, it'll be bound to the
    ;; whole match-value. Control can also reach here if the non-list pattern
    ;; is in the cdr position of a larger pattern.
    ((_ non-list literals match-value fail-value success-expr)
     (cond ((eq? match-value fail-value)
	    fail-value)
	   ((exists-in? 'non-list 'literals)
	    (if (eq? 'non-list match-value)
		success-expr
		fail-value))
	   (else (let ((non-list match-value)) success-expr))))))

This example shows what it does:

> (define test '((a 2) (3 4)))
#<unspecified>
> (match-pattern ((a b) (c d)) (a) test 'fail (list b c d))
(2 3 4)
> (define test '((1 2) (3 4)))
#<unspecified>
> (match-pattern ((a b) (c d)) (a) test 'fail (list b c d))
fail

The second case fails because a is defined as a literal, so there must be an actual a symbol in there for the pattern to match.

The failure-value makes it possible to evaluate a chain of these in a cond form, which makes it possible to write a multi-pattern match macro that mirrors syntax-rules.

(define-syntax match
  (syntax-rules (else)
    ((_ value literals) (error "No patterns matched"))
    ((_ value literals (else . body))
     (begin . body))
    ((_ value literals (pattern . body) . rest)
     (let* ((fail (gentemp))
	    (result (match-pattern pattern literals value fail
				   (begin . body))))
       (if (eq? result fail)
	   (match value literals . rest)
	   result)))))

An example:

(match '((1 2) (3 4)) (a)
   (((a b) (c d))
    (list b c d))
   (((b c) (d e))
    (list (+ b c) (+ d e)))
   (else 'whatever))

To demonstrate the use of this macro in conjunction with defmacro, let's define my-cond again using defmacro:

(defmacro (my-cond . clauses)
  (match (cons 'my-cond clauses) (else)
    ((x) '(if #f 'not-reached))
    ((x (else . body))
     `(begin . ,body))
    ((x (bool-expression . body) . rest)
     `(if ,bool-expression
	  (begin . ,body)
	  (my-cond . ,rest)))))

And that is what macros can do.