Write Yourself a Scheme in 48 Hours/Evaluation, Part 2

Write Yourself a Scheme in 48 Hours
← Error Checking and Exceptions	Evaluation, Part 2	Building a REPL →

Additional Primitives: Partial Application edit

Now that we can deal with type errors, bad arguments, and so on, we'll flesh out our primitive list so that it does something more than calculate. We'll add boolean operators, conditionals, and some basic string operations.

Start by adding the following into the list of primitives:

("=", numBoolBinop (==)),
("<", numBoolBinop (<)),
(">", numBoolBinop (>)),
("/=", numBoolBinop (/=)),
(">=", numBoolBinop (>=)),
("<=", numBoolBinop (<=)),
("&&", boolBoolBinop (&&)),
("||", boolBoolBinop (||)),
("string=?", strBoolBinop (==)),
("string<?", strBoolBinop (<)),
("string>?", strBoolBinop (>)),
("string<=?", strBoolBinop (<=)),
("string>=?", strBoolBinop (>=)),

These depend on helper functions that we haven't written yet: numBoolBinop, boolBoolBinop and strBoolBinop. Instead of taking a variable number of arguments and returning an integer, these both take exactly two arguments and return a boolean. They differ from each other only in the type of argument they expect, so let's factor the duplication into a generic boolBinop function that's parametrized by the unpacker function it applies to its arguments:

boolBinop :: (LispVal -> ThrowsError a) -> (a -> a -> Bool) -> [LispVal] -> ThrowsError LispVal
boolBinop unpacker op args = if length args /= 2 
                             then throwError $ NumArgs 2 args
                             else do left <- unpacker $ args !! 0
                                      right <- unpacker $ args !! 1
                                      return $ Bool $ left `op` right

Because each argument may throw a type mismatch, we have to unpack them sequentially, in a do-block (for the Error monad). We then apply the operation to the two arguments and wrap the result in the Bool constructor. Any function can be turned into an infix operator by wrapping it in backticks (`op`).

Also, take a look at the type signature. boolBinop takes two functions as its first two arguments: the first is used to unpack the arguments from LispVals to native Haskell types, and the second is the actual operation to perform. By parameterizing different parts of the behavior, you make the function more reusable.

Now we define three functions that specialize boolBinop with different unpackers:

numBoolBinop  = boolBinop unpackNum
strBoolBinop  = boolBinop unpackStr
boolBoolBinop = boolBinop unpackBool

We haven't told Haskell how to unpack strings from LispVals yet. This works similarly to unpackNum, pattern matching against the value and either returning it or throwing an error. Again, if passed a primitive value that could be interpreted as a string (such as a number or boolean), it will silently convert it to the string representation.

unpackStr :: LispVal -> ThrowsError String
unpackStr (String s) = return s
unpackStr (Number s) = return $ show s
unpackStr (Bool s)   = return $ show s
unpackStr notString  = throwError $ TypeMismatch "string" notString

And we use similar code to unpack booleans:

unpackBool :: LispVal -> ThrowsError Bool
unpackBool (Bool b) = return b
unpackBool notBool  = throwError $ TypeMismatch "boolean" notBool

Let's compile and test this to make sure it's working, before we proceed to the next feature:

$ ghc -package parsec -o simple_parser [../code/listing6.1.hs listing6.1.hs]
$ ./simple_parser "(< 2 3)"
#t
$ ./simple_parser "(> 2 3)"
#f
$ ./simple_parser "(>= 3 3)"
#t
$ ./simple_parser "(string=? \"test\"  \"test\")"
#t
$ ./simple_parser "(string<? \"abc\" \"bba\")"
#t

Conditionals: Pattern Matching 2 edit

Now, we'll proceed to adding an if-clause to our evaluator. As with standard Scheme, our evaluator considers #f to be false and any other value to be true:

eval (List [Atom "if", pred, conseq, alt]) = 
     do result <- eval pred
        case result of
             Bool False -> eval alt
             otherwise  -> eval conseq

As the function definitions are evaluated in order, be sure to place this one above eval (List (Atom func : args)) = mapM eval args >>= apply func or it will throw a Unrecognized primitive function args: "if" error.

This is another example of nested pattern-matching. Here, we're looking for a 4-element list. The first element must be the atom if. The others can be any Scheme forms. We take the first element, evaluate, and if it's false, evaluate the alternative. Otherwise, we evaluate the consequent.

Compile and run this, and you'll be able to play around with conditionals:

$ ghc -package parsec -o simple_parser [../code/listing6.2.hs listing6.2.hs]
$ ./simple_parser "(if (> 2 3) \"no\" \"yes\")"
"yes"
$ ./simple_parser "(if (= 3 3) (+ 2 3 (- 5 1)) \"unequal\")"
9

List Primitives: `car`, `cdr`, and `cons` edit

For good measure, let's also add in the basic list-handling primitives. Because we've chosen to represent our lists as Haskell algebraic data types instead of pairs, these are somewhat more complicated than their definitions in many Lisps. It's easiest to think of them in terms of their effect on printed S-expressions:

(car '(a b c)) = a
(car '(a)) = a
(car '(a b . c)) = a
(car 'a) = error – not a list
(car 'a 'b) = error – car only takes one argument

We can translate these fairly straightforwardly into pattern clauses, recalling that (x : xs) divides a list into the first element and the rest:

car :: [LispVal] -> ThrowsError LispVal
car [List (x : xs)]         = return x
car [DottedList (x : xs) _] = return x
car [badArg]                = throwError $ TypeMismatch "pair" badArg
car badArgList              = throwError $ NumArgs 1 badArgList

Let's do the same with cdr:

(cdr '(a b c)) = (b c)
(cdr '(a b)) = (b)
(cdr '(a)) = NIL
(cdr '(a . b)) = b
(cdr '(a b . c)) = (b . c)
(cdr 'a) = error – not a list
(cdr 'a 'b) = error – too many arguments

We can represent the first three cases with a single clause. Our parser represents '() as List [], and when you pattern-match (x : xs) against [x], xs is bound to []. The other ones translate to separate clauses:

cdr :: [LispVal] -> ThrowsError LispVal
cdr [List (x : xs)]         = return $ List xs
cdr [DottedList [_] x]      = return x
cdr [DottedList (_ : xs) x] = return $ DottedList xs x
cdr [badArg]                = throwError $ TypeMismatch "pair" badArg
cdr badArgList              = throwError $ NumArgs 1 badArgList

cons is a little tricky, enough that we should go through each clause case-by-case. If you cons together anything with Nil, you end up with a one-item list, the Nil serving as a terminator:

cons :: [LispVal] -> ThrowsError LispVal
cons [x1, List []] = return $ List [x1]

If you cons together anything and a list, it's like tacking that anything onto the front of the list:

 
cons [x, List xs] = return $ List $ x : xs

However, if the list is a DottedList, then it should stay a DottedList, taking into account the improper tail:

cons [x, DottedList xs xlast] = return $ DottedList (x : xs) xlast

If you cons together two non-lists, or put a list in front, you get a DottedList. This is because such a cons cell isn't terminated by the normal Nil that most lists are.

cons [x1, x2] = return $ DottedList [x1] x2

Finally, attempting to cons together more or less than two arguments is an error:

cons badArgList = throwError $ NumArgs 2 badArgList

Our last step is to implement eqv?. Scheme offers three levels of equivalence predicates: eq?, eqv?, and equal?. For our purposes, eq? and eqv? are basically the same: they recognize two items as the same if they print the same, and are fairly slow. So we can write one function for both of them and register it under eq? and eqv?.

eqv :: [LispVal] -> ThrowsError LispVal
eqv [(Bool arg1), (Bool arg2)]             = return $ Bool $ arg1 == arg2
eqv [(Number arg1), (Number arg2)]         = return $ Bool $ arg1 == arg2
eqv [(String arg1), (String arg2)]         = return $ Bool $ arg1 == arg2
eqv [(Atom arg1), (Atom arg2)]             = return $ Bool $ arg1 == arg2
eqv [(DottedList xs x), (DottedList ys y)] = eqv [List $ xs ++ [x], List $ ys ++ [y]]
eqv [(List arg1), (List arg2)]             = return $ Bool $ (length arg1 == length arg2) && 
                                                             (all eqvPair $ zip arg1 arg2)
     where eqvPair (x1, x2) = case eqv [x1, x2] of
                                Left err -> False
                                Right (Bool val) -> val
eqv [_, _]                                 = return $ Bool False
eqv badArgList                             = throwError $ NumArgs 2 badArgList

Most of these clauses are self-explanatory, the exception being the one for two Lists. This, after checking to make sure the lists are the same length, zips the two lists of pairs, and then uses the function all to return False if eqvPair returns False on any of the pairs. eqvPair is an example of a local definition: it is defined using the where keyword, just like a normal function, but is available only within that particular clause of eqv. Since we know that eqv only throws an error if the number of arguments is not 2, the line Left err -> False will never be executed at the moment.

`equal?` and Weak Typing: Heterogenous Lists edit

Since we introduced weak typing above, we'd also like to introduce an equal? function that ignores differences in the type tags and only tests if two values can be interpreted the same. For example, (eqv? 2 "2") = #f, yet we'd like (equal? 2 "2") = #t. Basically, we want to try all of our unpack functions, and if any of them result in Haskell values that are equal, return True.

The obvious way to approach this is to store the unpacking functions in a list and use mapM to execute them in turn. Unfortunately, this doesn't work, because standard Haskell only lets you put objects in a list if they're the same type. The various unpacker functions return different types, so you can't store them in the same list.

We'll get around this by using a GHC extension – Existential Types – that lets us create a heterogenous list, subject to typeclass constraints. Extensions are fairly common in the Haskell world: they're basically necessary to create any reasonably large program, and they're often compatible between implementations (existential types work in both Hugs and GHC and are a candidate for standardization). Note you need to use a special compiler flag for this: -fglasgow-exts as mentioned below; the newer -XExistentialQuantification; or add the pragma {-# LANGUAGE ExistentialQuantification #-} to the beginning of your code (In general, the compiler flag -Xfoo can be replaced by the pragma {-# LANGUAGE foo #-} inside the source file).

The first thing we need to do is define a data type that can hold any function from a LispVal -> something, provided that that something supports equality:

data Unpacker = forall a. Eq a => AnyUnpacker (LispVal -> ThrowsError a)

This is like any normal algebraic datatype, except for the type constraint. It says, "For any type that is an instance of Eq, you can define an Unpacker that takes a function from LispVal to that type, and may throw an error". We'll have to wrap our functions with the AnyUnpacker constructor, but then we can create a list of Unpackers that does just what we want it.

Rather than jump straight to the equal? function, let's first define a helper function that takes an Unpacker and then determines if two LispVals are equal when it unpacks them:

unpackEquals :: LispVal -> LispVal -> Unpacker -> ThrowsError Bool
unpackEquals arg1 arg2 (AnyUnpacker unpacker) = 
             do unpacked1 <- unpacker arg1
                unpacked2 <- unpacker arg2
                return $ unpacked1 == unpacked2
        `catchError` (const $ return False)

After pattern-matching to retrieve the actual function, we enter a do-block for the ThrowsError monad. This retrieves the Haskell values of the two LispVals, and then tests whether they're equal. If there is an error anywhere within the two unpackers, it returns False, using the const function because catchError expects a function to apply to the error value.

Finally, we can define equal? in terms of these helpers:

equal :: [LispVal] -> ThrowsError LispVal
equal [arg1, arg2] = do
      primitiveEquals <- liftM or $ mapM (unpackEquals arg1 arg2) 
                         [AnyUnpacker unpackNum, AnyUnpacker unpackStr, AnyUnpacker unpackBool]
      eqvEquals <- eqv [arg1, arg2]
      return $ Bool $ (primitiveEquals || let (Bool x) = eqvEquals in x)
equal badArgList = throwError $ NumArgs 2 badArgList

The first action makes a heterogenous list of [unpackNum, unpackStr, unpackBool], and then maps the partially applied (unpackEquals arg1 arg2) over it. This gives a list of booleans, so we use the Prelude function or to return true if any single one of them is true.

The second action tests the two arguments with eqv?. Since we want equal? to be looser than eqv?, it should return true whenever eqv? does so. This also lets us avoid handling cases like the list or dotted-list (though this introduces a bug; see exercise #2 in this section).

Finally, equal? ors both of these values together and wraps the result in the Bool constructor, returning a LispVal. The let (Bool x) = eqvEquals in x is a quick way of extracting a value from an algebraic type: it pattern matches Bool x against the eqvEquals value, and then returns x. The result of a let-expression is the expression following the keyword in.

To use these functions, insert them into our primitives list:

("car", car),
("cdr", cdr),
("cons", cons),
("eq?", eqv),
("eqv?", eqv),
("equal?", equal)]

To compile this code, you need to enable GHC extensions with -fglasgow-exts:

$ ghc -package parsec -fglasgow-exts -o parser [../code/listing6.4.hs listing6.4.hs]
$ ./parser "(cdr '(a simple test))"
(simple test)
$ ./parser "(car (cdr '(a simple test)))"
simple
$ ./parser "(car '((this is) a test))"
(this is)
$ ./parser "(cons '(this is) 'test)"
((this is) . test)
$ ./parser "(cons '(this is) '())"
((this is))
$ ./parser "(eqv? 1 3)"
#f
$ ./parser "(eqv? 3 3)"
#t
$ ./parser "(eqv? 'atom 'atom)"
#t

Exercises

Instead of treating any non-false value as true, change the definition of if so that the predicate accepts only Bool values and throws an error on any others.
equal? has a bug in that a list of values is compared using eqv? instead of equal?. For example, (equal? '(1 "2") '(1 2)) = #f, while you'd expect it to be #t. Change equal? so that it continues to ignore types as it recurs into list structures. You can either do this explicitly, following the example in eqv?, or factor the list clause into a separate helper function that is parameterized by the equality testing function.
Implement the cond and case expressions.
Add the rest of the string functions. You don't yet know enough to do string-set!; this is difficult to implement in Haskell, but you'll have enough information after the next two sections

Write Yourself a Scheme in 48 Hours
← Error Checking and Exceptions	Evaluation, Part 2	Building a REPL →

Write Yourself a Scheme in 48 Hours/Evaluation, Part 2

Additional Primitives: Partial Application edit

Conditionals: Pattern Matching 2 edit

List Primitives: car, cdr, and cons edit

equal? and Weak Typing: Heterogenous Lists edit

List Primitives: `car`, `cdr`, and `cons` edit

`equal?` and Weak Typing: Heterogenous Lists edit