Last modified on 5 July 2014, at 18:38

Write Yourself a Scheme in 48 Hours/Adding Variables and Assignment


Write Yourself a Scheme in 48 Hours
Adding Variables and Assignment

Finally, we get to the good stuff: variables. A variable lets us save the result of an expression and refer to it later. In Scheme, a variable can also be reset to new values, so that its value changes as the program executes. This presents a complication for Haskell, because the execution model is built upon functions that return values, but never change them.

Nevertheless, there are several ways to simulate state in Haskell, all involving monads. The simplest is probably the state monad, which lets you hide arbitrary state within the monad and pass it around behind the scenes. You specify the state type as a parameter to the monad (e.g. if a function returns an integer but modifies a list of string pairs, it would have type State [(String, String)] Integer), and access it via the get and put functions, usually within a do-block. You'd specify the initial state via the runState myStateAction initialList, which returns a pair containing the return value and the final state.

Unfortunately, the state monad doesn't work well for us, because the type of data we need to store is fairly complex. For a simple top-level environment, we could get away with [(String, LispVal)], storing mappings from variable names to values. However, when we start dealing with function calls, these mappings become a stack of nested environments, arbitrarily deep. And when we add closures, environments might get saved in an arbitrary Function value, and passed around throughout the program. In fact, they might be saved in a variable and passed out of the runState monad entirely, something we're not allowed to do.

Instead, we use a feature called state threads, letting Haskell manage the aggregate state for us. This lets us treat mutable variables as we would in any other programming language, using functions to get or set variables. There are two flavors of state threads: the ST monad creates a stateful computation that can be executed as a unit, without the state escaping to the rest of the program. The IORef module lets you use stateful variables within the IO monad. Since our state has to be interleaved with IO anyway (it persists between lines in the REPL, and we will eventually have IO functions within the language itself), we'll be using IORefs.

We can start out by importing Data.IORef and defining a type for our environments:

import Data.IORef
 
type Env = IORef [(String, IORef LispVal)]

This declares an Env as an IORef holding a list that maps Strings to mutable LispVals. We need IORefs for both the list itself and for individual values because there are two ways that the program can mutate the environment. It might use set! to change the value of an individual variable, a change visible to any function that shares that environment (Scheme allows nested scopes, so a variable in an outer scope is visible to all inner scopes). Or it might use define to add a new variable, which should be visible on all subsequent statements.

Since IORefs can only be used within the IO monad, we'll want a helper action to create an empty environment. We can't just use the empty list [] because all accesses to IORefs must be sequenced, and so the type of our null environment is IO Env instead of just plain Env:

nullEnv :: IO Env
nullEnv = newIORef []

From here, things get a bit more complicated, because we'll be simultaneously dealing with two monads. Remember, we also need an Error monad to handle things like unbound variables. The parts that need IO functionality and the parts that may throw exceptions are interleaved, so we can't just catch all the exceptions and return only normal values to the IO monad.

Haskell provides a mechanism known as monad transformers that lets you combine the functionality of multiple monads. We'll be using one of these, ErrorT, which lets us layer error-handling functionality on top of the IO monad. Our first step is to create a type synonym for our combined monad:

type IOThrowsError = ErrorT LispError IO

Like ThrowsError, IOThrowsError is really a type constructor: we've left off the last argument, the return type of the function. However, ErrorT takes one more argument than plain old Either: we have to specify the type of monad that we're layering our error-handling functionality over. We've created a monad that may contain IO actions that throw a LispError.

We have a mix of ThrowsError and IOThrowsError functions, but actions of different types cannot be contained within the same do-block, even if they provide essentially the same functionality. Haskell already provides a mechanism, lifting, to bring values of the lower type (IO) into the combined monad. Unfortunately, there's no similar support to bring a value of the untransformed upper type into the combined monad, so we need to write it ourselves:

liftThrows :: ThrowsError a -> IOThrowsError a
liftThrows (Left err) = throwError err
liftThrows (Right val) = return val

This destructures the Either type and either re-throws the error type or returns the ordinary value. Methods in typeclasses resolve based on the type of the expression, so throwError and return (members of MonadError and Monad, respectively) take on their IOThrowsError definitions. Incidentally, the type signature provided here is not fully general: if we'd left it off, the compiler would have inferred liftThrows :: (MonadError m a) => Either e a -> m a.

We'll also want a helper function to run the whole top-level IOThrowsError action, returning an IO action. We can't escape from the IO monad, because a function that performs IO has an effect on the outside world, and you don't want that in a lazily-evaluated pure function. But you can run the error computation and catch the errors.

runIOThrows :: IOThrowsError String -> IO String
runIOThrows action = runErrorT (trapError action) >>= return . extractValue

This uses our previously-defined trapError function to take any error values and convert them to their string representations, then runs the whole computation via runErrorT. The result is passed into extractValue and returned as a value in the IO monad.

Now we're ready to return to environment handling. We'll start with a function to determine if a given variable is already bound in the environment, necessary for proper handling of define:

isBound :: Env -> String -> IO Bool
isBound envRef var = readIORef envRef >>= return . maybe False (const True) . lookup var

This first extracts the actual environment value from its IORef via readIORef. Then we pass it to lookup to search for the particular variable we're interested in. lookup returns a Maybe value, so we return False if that value was Nothing and True otherwise (we need to use the const function because maybe expects a function to perform on the result and not just a value). Finally, we use return to lift that value into the IO monad. Since we're just interested in a true/false value, we don't need to deal with the actual IORef that lookup returns.

Next, we'll want to define a function to retrieve the current value of a variable:

getVar :: Env -> String -> IOThrowsError LispVal
getVar envRef var  =  do env <- liftIO $ readIORef envRef
                         maybe (throwError $ UnboundVar "Getting an unbound variable" var)
                               (liftIO . readIORef)
                               (lookup var env)

Like the previous function, this begins by retrieving the actual environment from the IORef. However, getVar uses the IOThrowsError monad, because it also needs to do some error handling. As a result, we need to use the liftIO function to lift the readIORef action into the combined monad. Similarly, when we return the value, we use liftIO . readIORef to generate an IOThrowsError action that reads the returned IORef. We don't need to use liftIO to throw an error, however, because throwError is a defined for the MonadError typeclass, of which ErrorT is an instance.

Now we create a function to set values:

setVar :: Env -> String -> LispVal -> IOThrowsError LispVal
setVar envRef var value = do env <- liftIO $ readIORef envRef
                             maybe (throwError $ UnboundVar "Setting an unbound variable" var)
                                   (liftIO . (flip writeIORef value))
                                   (lookup var env)
                             return value

Again, we first read the environment out of its IORef and run a lookup on it. This time, however, we want to change the variable instead of just reading it. The writeIORef action provides a means for this, but takes its arguments in the wrong order (ref -> value instead of value -> ref). So we use the built-in function flip to switch the arguments of writeIORef around, and then pass it the value. Finally, we return the value we just set, for convenience.

We'll want a function to handle the special behavior of define, which sets a variable if already bound or creates a new one if not. Since we've already defined a function to set values, we can use it in the former case:

defineVar :: Env -> String -> LispVal -> IOThrowsError LispVal
defineVar envRef var value = do
     alreadyDefined <- liftIO $ isBound envRef var
     if alreadyDefined
        then setVar envRef var value >> return value
        else liftIO $ do
             valueRef <- newIORef value
             env <- readIORef envRef
             writeIORef envRef ((var, valueRef) : env)
             return value

It's the latter case that's interesting, where the variable is unbound. We create an IO action (via do-notation) that creates a new IORef to hold the new variable, reads the current value of the environment, then writes a new list back to that variable consisting of the new (key, variable) pair added to the front of the list. Then we lift that whole do-block into the IOThrowsError monad with liftIO.

There's one more useful environment function: being able to bind a whole bunch of variables at once, as happens when a function is invoked. We might as well build that functionality now, though we won't be using it until the next section:

bindVars :: Env -> [(String, LispVal)] -> IO Env
bindVars envRef bindings = readIORef envRef >>= extendEnv bindings >>= newIORef
     where extendEnv bindings env = liftM (++ env) (mapM addBinding bindings)
           addBinding (var, value) = do ref <- newIORef value
                                        return (var, ref)

This is perhaps more complicated than the other functions, since it uses a monadic pipeline (rather than do-notation) and a pair of helper functions to do the work. It's best to start with the helper functions. addBinding takes a variable name and value, creates an IORef to hold the new variable, and then returns the name–value pair. extendEnv calls addBinding on each member of bindings (mapM) to create a list of (String, IORef LispVal) pairs, and then appends the current environment to the end of that (++ env). Finally, the whole function wires these functions up in a pipeline, starting by reading the existing environment out of its IORef, then passing the result to extendEnv, then returning a new IORef with the extended environment.

Now that we have all our environment functions, we need to start using them in the evaluator. Since Haskell has no global variables, we'll have to thread the environment through the evaluator as a parameter. While we're at it, we might as well add the set! and define special forms.

eval :: Env -> LispVal -> IOThrowsError LispVal
eval env val@(String _) = return val
eval env val@(Number _) = return val
eval env val@(Bool _) = return val
eval env (Atom id) = getVar env id
eval env (List [Atom "quote", val]) = return val
eval env (List [Atom "if", pred, conseq, alt]) =
     do result <- eval env pred
     case result of
          Bool False -> eval env alt
          otherwise -> eval env conseq
eval env (List [Atom "set!", Atom var, form]) =
     eval env form >>= setVar env var
eval env (List [Atom "define", Atom var, form]) =
     eval env form >>= defineVar env var
eval env (List (Atom func : args)) = mapM (eval env) args >>= liftThrows . apply func
eval env badForm = throwError $ BadSpecialForm "Unrecognized special form" badForm

Since a single environment gets threaded through a whole interactive session, we need to change a few of our IO functions to take an environment.

evalAndPrint :: Env -> String -> IO ()
evalAndPrint env expr =  evalString env expr >>= putStrLn
 
evalString :: Env -> String -> IO String
evalString env expr = runIOThrows $ liftM show $ (liftThrows $ readExpr expr) >>= eval env

We need the runIOThrows in evalString because the type of the monad has changed from ThrowsError to IOThrowsError. Similarly, we need a liftThrows to bring readExpr into the IOThrowsError monad.

Next, we initialize the environment with a null variable before starting the program:

runOne :: String -> IO ()
runOne expr = nullEnv >>= flip evalAndPrint expr
 
runRepl :: IO ()
runRepl = nullEnv >>= until_ (== "quit") (readPrompt "Lisp>>> ") . evalAndPrint

We've created an additional helper function runOne to handle the single-expression case, since it's now somewhat more involved than just running evalAndPrint. The changes to runRepl are a bit more subtle: notice how we added a function composition operator before evalAndPrint. That's because evalAndPrint now takes an additional Env parameter, fed from nullEnv. The function composition tells until_ that instead of taking plain old evalAndPrint as an action, it ought to apply it first to whatever's coming down the monadic pipeline, in this case the result of nullEnv. Thus, the actual function that gets applied to each line of input is (evalAndPrint env), just as we want it.

Finally, we need to change our main function to call runOne instead of evaluating evalAndPrint directly:

main :: IO ()
main = do args <- getArgs
          case length args of
               0 -> runRepl
               1 -> runOne $ args !! 0
               otherwise -> putStrLn "Program takes only 0 or 1 argument"

And we can compile and test our program:

$ ghc -package parsec -o lisp [../code/listing8.hs listing8.hs]
$ ./lisp
Lisp>>> (define x 3)
3
Lisp>>> (+ x 2)
5
Lisp>>> (+ y 2)
Getting an unbound variable: y
Lisp>>> (define y 5)
5
Lisp>>> (+ x (- y 2))
6
Lisp>>> (define str "A string")
"A string"
Lisp>>> (< str "The string")
Invalid type: expected number, found "A string"
Lisp>>> (string<? str "The string")
#t


Write Yourself a Scheme in 48 Hours
 ← Building a REPL Adding Variables and Assignment Defining Scheme Functions →