Write Yourself a Scheme in 48 Hours/First Steps

First, you'll need to install GHC. On GNU/Linux, it's often pre-installed or available via the package manager (apt or yum or pacman for example, depending on your distribution). It's also downloadable from http://www.haskell.org/ghc/. A binary package is probably easiest, unless you really know what you're doing. It should download and install like any other software package. This tutorial was developed on GNU/Linux, but everything should also work on Windows, as long as you know how to use the command line, or on the Macintosh from within the Terminal.

For UNIX (or Windows Emacs) users, there is a pretty good Emacs mode, including syntax highlighting and automatic indentation. Windows users can use Notepad or any other text editor: Haskell syntax is fairly Notepad-friendly, though you have to be careful with the indentation. Eclipse users might want to try the eclipsefp plug-in.

Now, it's time for your first Haskell program. This program will read a name off the command line and then print a greeting. Create a file ending in ".hs" and type the following text. Be sure to get the indentation right, or else it may not compile.

 module Main where
 import System.Environment
 
 main :: IO ()
 main = do
     args <- getArgs
     putStrLn ("Hello, " ++ args !! 0)

Let's go through this code. The first two lines specify that we'll be creating a module named Main that imports the System.Environment module, which provides getArgs. Every Haskell program begins with an action called main in a module named Main. That module may import others, but it must be present for the compiler to generate an executable file. Haskell is case-sensitive: module names always begin with a capital letter, while the names of functions and other values always begin with a lowercase letter.

The line main :: IO () is a type declaration: it says that main is of type IO (), which is an IO action carrying along values of unit type (). A unit type allows only one value, also denoted (), thus holding no information. Type declarations in Haskell are optional: the compiler figures them out automatically, and only complains if they differ from what you've specified. In this tutorial, I specify the types of all declarations explicitly, for clarity. If you're following along at home, you may want to omit them, because it's less to change as we build our program.
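As a small illustration of optional type declarations, here is a sketch with one definition that carries a signature and one that leaves it to the compiler. The names shout and whisper are invented for this example and are not part of the tutorial's program.

```haskell
module Main where

-- With an explicit signature: the compiler checks the definition against it.
shout :: String -> String
shout s = s ++ "!"

-- Without a signature: the compiler infers String -> String on its own,
-- because ++ is applied to a string literal.
whisper s = s ++ "..."

main :: IO ()
main = do
    putStrLn (shout "Hello")
    putStrLn (whisper "Hello")
```

If you load this file in GHCi, typing :type whisper should show the inferred type.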

The IO type is an instance of the Monad class (a class of types). Monad is a concept: to say that a value has a type belonging to the Monad class is to say:

  1. there is (a certain type of) extra information attached to this value;
  2. most functions do not need to worry about this extra information.

In this example,

  1. the "extra information" is the IO actions to be performed using the values carried along;
  2. the basic value (to which that information is attached) carries no data, and is represented as ().

Both IO [String] and IO () belong to the same IO monad, but they have different base types: they act on (and pass along) values of different types, [String] and ().

A value with such (hidden) information attached is called a "monadic value".

Monadic values are often called "actions", because the easiest way to think about using the IO monad is as a sequence of actions affecting the outside world. The actions in the sequence may pass along basic values, and each action is able to act on those values.

Haskell is a functional language: instead of giving the computer a sequence of instructions to carry out, you give it a collection of definitions that tell it how to perform every function it might need. These definitions use various compositions of actions and functions. The compiler figures out an execution path that puts everything together.

To write one of these definitions, you set it up as an equation. The left-hand side defines a name, and optionally one or more patterns (explained later) that will bind variables. The right-hand side defines some composition of other definitions that tells the computer what to do when it encounters the name. These equations behave just like ordinary equations in algebra: you can always substitute the right-hand side for the left within the text of the program, and it'll evaluate to the same value. Called "referential transparency", this property makes it significantly easier to reason about Haskell programs than about programs in many other languages.
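To see the substitution property at work, consider the following sketch. The names greeting and substituted are made up for illustration; the second is simply the first with its right-hand side written out by hand.

```haskell
module Main where

-- An equation: the name greeting, with one pattern (the variable name)
-- on the left, and the expression it stands for on the right.
greeting :: String -> String
greeting name = "Hello, " ++ name

-- The call greeting "World" with the right-hand side substituted by hand.
substituted :: String
substituted = "Hello, " ++ "World"

main :: IO ()
main = do
    -- Both lines print the same text.
    putStrLn (greeting "World")
    putStrLn substituted
```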

How will we define our main action? We know that it must be an IO () action, and we want it to read the command-line arguments and print some output, eventually producing (), that is, nothing of value.

There are two ways to create an IO action (either directly or by calling a function that does so):

  1. Lift an ordinary value into the IO monad, using the return function.
  2. Combine two existing IO actions.

Since we want to do two things, we'll take the second approach. The built-in action getArgs reads the command-line arguments and passes them along as a list of strings. The built-in function putStrLn takes a string and creates an action that writes this string to the console.

To combine these actions, we use a do-block. A do-block consists of a series of lines, all lined up with the first non-whitespace character after the do. Each line can have one of two forms:

  1. name <- action1
  2. action2

The first form binds the result of action1 to name, making it available to the actions that follow. For example, if the type of action1 is IO [String] (an IO action returning a list of strings, as with getArgs), then name will be bound in all subsequent actions to the list of strings thus passed along, through the use of the "bind" operator >>=. The second form just executes action2, sequencing it with the next line (should there be one) through the >> operator. The bind operator has different semantics for each monad: in the case of the IO monad, it executes the actions sequentially, performing whatever external side effects result from them. Because the semantics of this composition depend upon the particular monad used, you cannot mix actions of different monad types in the same do-block: here, only IO actions can be used (it's all in the same "pipe").
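The correspondence between do-block lines and the two operators can be sketched as follows. This is an illustration rather than the exact translation the compiler performs, and the names message, greetWithDo, greetWithBind, and twoLines are invented for it. The helper uses unwords instead of args !! 0 so that the sketch also runs when no arguments are given.

```haskell
module Main where
import System.Environment (getArgs)

-- Pure helper (hypothetical): builds the text to print from the arguments.
message :: [String] -> String
message args = "Hello, " ++ unwords args

-- First form of do-block line: "name <- action".
greetWithDo :: IO ()
greetWithDo = do
    args <- getArgs
    putStrLn (message args)

-- The same action with the bind operator written out: the list yielded
-- by getArgs is passed to the function on the right of >>=.
greetWithBind :: IO ()
greetWithBind = getArgs >>= \args -> putStrLn (message args)

-- Second form: lines without "name <-" are joined with >>, which
-- discards the result of the action on its left.
twoLines :: IO ()
twoLines = putStrLn "first line" >> putStrLn "second line"

main :: IO ()
main = do
    greetWithDo
    greetWithBind
    twoLines
```

Run with the single argument Jonathan, both greeting actions should print Hello, Jonathan.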

Of course, these actions may themselves call functions or evaluate complicated expressions, passing along their results (either by calling the return function, or some other function that eventually does so). In this example, we first take the first element of the argument list (at index 0, args !! 0), concatenate it onto the end of the string "Hello, " ("Hello, " ++), and finally pass that to putStrLn, which creates a new IO action that takes part in the do-block sequencing.
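As a sketch of an action that computes a value and passes it along with return, consider the variation below. The names firstOr and chosenName are hypothetical, and the fallback "World" is an assumption added so that the program does not fail when no argument is given (args !! 0 raises an error on an empty list).

```haskell
module Main where
import System.Environment (getArgs)

-- Pure function (hypothetical helper): the first element of the list,
-- or the fallback when the list is empty.
firstOr :: String -> [String] -> String
firstOr fallback []      = fallback
firstOr _        (x : _) = x

-- An action that reads the arguments, computes a value from them,
-- and passes that value along with return.
chosenName :: IO String
chosenName = do
    args <- getArgs
    return (firstOr "World" args)

main :: IO ()
main = do
    name <- chosenName
    putStrLn ("Hello, " ++ name)
```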

The new action thus created, which is a combined sequence of actions as described above, is bound to the identifier main of type IO (). The Haskell system notices this definition, and executes the action in it.

Strings are lists of characters in Haskell, so you can use any of the list functions and operators on them. A full table of the standard operators and their precedences follows:

Operator(s)            Precedence  Associativity  Description
.                      9           right          function composition
!!                     9           left           list indexing
^, ^^, **              8           right          exponentiation (integer, fractional, and floating-point)
*, /                   7           left           multiplication, division
+, -                   6           left           addition, subtraction
:                      5           right          cons (list construction)
++                     5           right          list concatenation
`elem`, `notElem`      4           none           list membership
==, /=, <, <=, >=, >   4           none           equality, inequality, and other relation operators
&&                     3           right          logical and
||                     2           right          logical or
>>, >>=                1           left           monadic bind ignoring the return value, monadic bind piping value to the next function
=<<                    1           right          reverse monadic bind (same as above, but arguments reversed)
$                      0           right          infix function application (f $ x is the same as f x, but right-associative instead of left)
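A few of these rules can be checked with a short sketch. The top-level names here are invented for illustration; each comment states how the expression is grouped according to the table.

```haskell
module Main where

-- * (7) binds tighter than + (6), so this is 1 + (2 * 3).
arithmetic :: Int
arithmetic = 1 + 2 * 3

-- ^ is right-associative, so this is 2 ^ (3 ^ 2), that is, 2 ^ 9.
power :: Int
power = 2 ^ 3 ^ 2

-- : is right-associative, so this is 1 : (2 : (3 : [])).
numbers :: [Int]
numbers = 1 : 2 : 3 : []

-- !! (9) binds tighter than ++ (5), so no inner parentheses are needed:
-- this is "Hello, " ++ (names !! 0).
firstGreeting :: [String] -> String
firstGreeting names = "Hello, " ++ names !! 0

main :: IO ()
main = do
    print arithmetic
    print power
    print numbers
    -- $ (0) binds loosest, so it can stand in for the outer parentheses.
    putStrLn $ firstGreeting ["Jonathan"]
```

This is why the tutorial's program can write "Hello, " ++ args !! 0 without parentheses around args !! 0.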

To compile and run the program, try something like this:

debian:/home/jdtang/haskell_tutorial/code# ghc -o hello_you --make listing2.hs
debian:/home/jdtang/haskell_tutorial/code# ./hello_you Jonathan
Hello, Jonathan

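The -o option specifies the name of the executable you want to create, the --make option tells GHC to find and compile any other modules the program depends on, and the final argument is the name of the Haskell source file (here, listing2.hs).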

Exercises
  1. Change the program so it reads two arguments from the command line, and prints out a message using both of them.
  2. Change the program so it performs a simple arithmetic operation on the two arguments and prints out the result. You can use read to convert a string to a number, and show to convert a number back into a string. Play around with different operations.
  3. getLine is an IO action that reads a line from the console and returns it as a string. Change the program so it prompts for a name, reads the name, and then prints that instead of the command-line value.
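As a hint for the second exercise, here is a minimal sketch of read and show on a fixed string; it deliberately does not touch the command line. The name toInt is made up for this example. Note that read has to know which type to produce, which the signature supplies here.

```haskell
module Main where

-- read needs to know the target type; the signature supplies it.
-- It raises a runtime error if the string is not a valid number.
toInt :: String -> Int
toInt = read

main :: IO ()
main = do
    -- show turns the number back into a string for putStrLn.
    putStrLn ("Parsed: " ++ show (toInt "20"))
    putStrLn ("Doubled: " ++ show (toInt "20" * 2))
```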

