Clojure Programming/By Example

This is intended to be a hands on first look at Clojure. If you wish to try the examples as you go, then you may wish to have already set up a working environment as per http://en.wikibooks.org/wiki/Clojure_Programming/Getting_Started so you can see the results of the example code.

Clojure programs are written in forms. Forms enclosed in parenthesis indicate function calls:

(+ 1 2 3)

calls the '+' function with arguments 1 2 3 and returns the value 6, the sum of arguments.

New functions can be defined using defn:

(defn average [x y] (/ (+ x y) 2))

Here x and y are symbols representing the input arguments. Function '/' is called to divide the sum of x and y by 2. Note that forms are always in prefix notation, with function followed by subsequent arguments. Now average can be invoked as

(average 3 5)

and will return 4. In this example, 'average' is a symbol, whose value is a function. (refer to http://clojure.org/reader for a detailed explanation of forms)

Clojure provides easy access to the JVM:

(.show (javax.swing.JFrame.))

This calls the show method on the result of (javax.swing.JFrame.) which constructs a new Jframe. Note the full stop before the method call and the full stop after the construction. (refer to http://clojure.org/java_interop)

Functions can be passed to other functions:

(map + [1 2 3] [4 5 6])
;; Equivalent to (list (+ 1 4) (+ 2 5) (+ 3 6))

returns (5 7 9). Map is a function which takes another function and calls it with arguments taken from following collections. In our case we have provided the function '+' and two vectors of integers. The result is a list of the results of calling '+' with arguments taken from the vectors. Using functions as arguments to other functions is very powerful. We can use our previously defined average function with map like so:

(map average [1 2 3] [4 5 6])
;; Equivalent to (list (average 1 4) (average 2 5) (average 3 6))

returns (5/2 7/2 9/2) We see here that Clojure supports ratios as a data types. (refer to http://clojure.org/data_structures for a full list)

Functions can also return other functions:

(defn addx [x] (fn [y] (+ x y)))

Here addx will return a new function that takes 1 argument and adds x to it.

(addx 5)

returns a function that can be called with 1 argument and will add 5 to it.

(map (addx 5) [1 2 3 4 5])

returns (6 7 8 9 10) We called map with a the result of addx, which was a function that takes an argument and adds 5. That function was called on the list of numbers we supplied.

There is a shorthand way to create an unnamed function:

#(+ %1 %2)

Will create a function that calls '+' with two arguments %1 and %2.

(map #(+ %1 5) [1 2 3 4 5])

Will add 5 to the list of numbers we supplied. The ability to pass and create functions dynamically is referred to as first-class functions.

Functional Programming treats computation as the evaluation of mathematical functions and avoids state and mutable data. In an imperative language you would typically create variables and change their value regularly. In Clojure you return new results without modifying what was there before.

Functions without side-effects

Function side effects can be changing the values of inputs, changing global data, or performing IO.

Imperative: void moveplayer( p, x, y )

updates a player object with a new location

Object Oriented: class player { void move( x, y ) }

again, mutates an existing object

Functional: (moveplayer oldp x y)

a completely new player is returned, the old player is unaffected

In imperative you only know that p has changed because the function name hints it. And it might have changed other things such as some world data as well. In FP oldp is preserved (you don't need to worry about what happened to it or the world - nothing can change) and it is explicit that a new player is returned as a result of moving.

The main advantages here are reasoning, testability, and concurrency. The language enforces that there are no side effects so you can infer behaviour. Inputs directly map to outputs which makes it easier to construct and think about test cases. Two threads can operate simultaneously on the same data with no risk of them corrupting each other, as the data will not be changed.

Immutable data

Consider removing an item from a list. The imperative solution would modify the list in place. A functional solution would return a completely new list, leaving the original in place. This sounds on the surface to be wasteful, but there are many ways that this is optimized by the compiler to be very efficient.

Code without variables for someone used to imperative programming can take a little getting used to. Here is a quick guide to converting variable style code into functional code:

You want to accumulate some changes

// sum odds 
int x = 0; 
for (int i=1; i<100; i+=2) { 
   x+=i;
}

Rearrange these sort of things into a form that requires no variables

(reduce + (range 1 100 2))

(range 1 100 2) creates a lazy sequence of numbers 1 3 5 7 ... 99. 1 is the starting point, 100 is the end point, 2 is the step. reduce calls the + function. First it calls + with two arguments, the first two numbers supplied by range. Then it calls + again with the previous result and the next number, until all the numbers are exhausted. Clojure has lots of support for sequences, collections and high level operations. As you learn them, you will find very expressive ways to write tasks such as this.

You want to iterate, instead use the loop/recur construct

(loop [i 5 acc 1]
  (if (zero? i)
    acc
    (recur (dec i) (* acc i))))

computes the factorial of 5. The loop special form establishes bindings followed by expressions to be evaluated. In this example 5 is bound to i and 1 is bound to acc. The if special form then tests whether i is equal to zero. Since, it is not equal to 0, recur rebinds new values to i and acc before returning control back to the top of the loop in order to reevaluate the body of its expressions. A decremented i (dec i) is rebound to i and the product of acc and i (* acc i) is rebound to acc. This loop is recursively called until i equals 0; acc stores the result of multiplying each value i took. Note that a binding behaves like a variable.

Also, recur can target either a loop or function definition:

(defn factorial 
    ([n] 
        (factorial n 1)) 
    ([n acc] 
        (if  (= n 0)   acc 
             (recur (dec n) (* acc n)))))

In the above example, the factorial function can take either 1 argument [n], which results in the evaluation of:

(factorial n 1)

.

Or supplying 2 arguments results in evaluation of:

(if (= n 0) acc 
    (recur (dec n) (* acc n)))

recur is important because it rebinds the function's inputs instead of adding a recursive call to the stack. Had we instead used (factorial (dec n) (* acc n)) we would have similar behavior, but for large values of n, you may cause a stack overflow. Note also that we introduced two definitions for factorial, one with one argument, and another with two arguments. The user calls the one argument version, which gets translated into the two argument form for evaluation. The arity of a function is the number of arguments that the function takes.

Of course we could have written an even simpler definition similar to the previous sum odd example:

(defn factorial [n] 
  (reduce * (range 2 (inc n))))

You need to save a result and use it multiple times

There is a useful macro let which binds a symbol to a value for local use,

(let [g (+ 0.2 (rand 0.8))] 
  (Color3f.  g g g))

in this let form a random number between 0 and 0.8 is generated, 0.2 is added, and the result is bound to the symbol g. A color is constructed with red green blue values of g, which will be a grey scale of intensity ranging from 0.2 to 1

You want to make multiple method calls on the same object

Using java libraries often puts you into a situation where you want to use a local variable. Keep in mind doto. The great thing about doto is that it returns the object after applying several calls:

(doto (javax.swing.JFrame.)
  (.setLayout (java.awt.GridLayout. 2 2 3 3))
  (.add (javax.swing.JTextField.))
  (.add (javax.swing.JLabel. "Enter some text"))
  (.setSize 300 80)
  (.setVisible true))

Mutating permanent state variables

Clojure supports many mutable types, however it is important to know the difference between them and how they behave. The provided types are refs, agents, atoms and vars.

Refs

Refs are like ref cells in ML, boxes in Scheme, or pointers in other languages. It is a "box", that you can change the contents of. But unlike the other languages, the twist is that you can only make the change inside of a transaction. This ensures that two threads cannot have a conflict when updating or accessing what is stored inside the ref.

(def r (ref nil))

declares r to be a ref with initial value of nil.

(dosync (ref-set r 5))

sets r to 5 in a transaction.

@r

gets the value of r, which is 5. Note that @r is shorthand for (deref r), and works with all of Clojures mutable types. r itself is a ref, not a value.

Agents

Agents are modified by functions asynchronously. You send a function to the agent, which will later apply that function to its current value. It is asynchronous because the call to send returns immediately. The function is queued in a thread pool for execution, providing a convenient access to multi-threading.

(def a (agent 1))
(send a inc)
(await a)
@a

In this example we defined an agent with initial value 1. We sent the agent a function inc, which increments its argument. Now send queues that function for execution by a thread pool. await will block until all functions outstanding on an agent have completed. @a returns the value of our agent, which is now 2, because 1 was incremented.

Atoms

Atoms are modified by functions synchronously. You call swap! and the function you supply is applied to the value of the atom before swap! returns.

(def a (atom 1))
(swap! a inc)

Note that swap! returns the result of the function having been applied to the current atom value. Refs are 'coordinated' while agents and atoms are 'uncoordinated'. This means that in a multi-threaded environment, refs are modified in a transaction which ensures that only one thread can modify the value at a time. Whereas atoms and agents queue up change functions to ensure that the changes occur atomically. All of them are 'safe', they just use different strategies to provide this 'safety'.

Vars

Vars are like global variables in other languages. The "root binding" is an initial default value that is shared by all threads. The "binding" construct acts as if the var has been altered, but it is automatically restored to its previous value upon exiting the scope of the binding construct.

(def something 5)

Establishes a Var something with the value 5. Declaring functions actually establishes them as Vars. You should avoid using def, and especially avoid setting already declared bindings with def. Subsequently calling (def something 6) is not a thread-safe operation.

“Why doesn't Clojure have local variables?” is an often raised question. Mutation locally is just as hard to reason about as mutation globally, independent of concurrency. See for example a typical Java for loop that sets other local vars and contains breaks/returns. If it takes more thought initially to construct solutions that don't need variables, please try to expend the effort - it will repay you many times over.

However to support direct translation of imperative algorithms, there is a useful macro called with-local-vars which declares local vars which can be changed with var-set and read with var-get or @ for shorthand.

(defn factorial [x]
  (with-local-vars [acc 1, cnt x]
    (while (> @cnt 0)
      (var-set acc (* @acc @cnt))
      (var-set cnt (dec @cnt)))
    @acc))

This is a version of factorial using variables. As you can see it is not as nice as the versions described earlier, and is purely to demonstrate a local Var binding. This function is completely safe to call in a multi-threaded environment as the variables are local. However local variables cannot be allowed to leak out of their scope:

(def f (with-local-vars [a 2] #(+ 2 @a))) 
(var user/f) 
(f)

causes java.lang.IllegalStateException: Var null is unbound. The reason is that f returns a new function that adds 2 to a local variable defined in f. So the function returned is trying to hold onto a local variable of f. Now local variables are subject to change, but if change were to occur in a multi-threaded environment, and that variable had leaked outside its original scope, the change would not be local any more.

Closure is a term used when symbols are retained outside their definition:

(let [secret (ref "nothing")] 
  (defn read-secret [] @secret) 
  (defn write-secret [s] (dosync (ref-set secret s))))

Here we created two functions which both access a ref secret. We created them inside a let, so secret is not visible in our current scope anymore:

secret

causes java.lang.Exception: Unable to resolve symbol: secret in this context

However the functions themselves have retained secret and can use it to communicate:

(read-secret)

results in "nothing"

(write-secret "hi")

results in "hi"

(read-secret)

results in "hi". Note that these functions might be passed to different threads but will still be valid because secret is a ref, so access to it is controlled via transaction.

So let's write a multi-threaded program. But first we need some more helper functions:

(defn random-word []
  (nth ["hello" "bye" "foo" "bar" "baz"] (rand 5)))

nth selects a value from the vector of words we supplied, in this case we called rand to get a number between 0 (inclusive) and 5 (exclusive). You can use (doc rand) to look up information about the rand function.

(defn secret-modifier [id]
  (let [after (int (rand 10000))]
  (Thread/sleep after)
  (write-secret
    (str id " whispered " (random-word) " after " after " ms")))
  id)

Note that sleep is a static method of the java class Thread. Static methods and members must be accessed using a slash as shown. This function is going to be "sent" to an agent, so it must take an input argument (which is the current agent value) and returns a new value. In our case we don't want to change the value of the agent so we'll just return the input argument.

Now the multi-threaded part:

(def agents (map agent (range 4)))
(doseq [a agents]
  (send-off a secret-modifier))

We declared four agents with initial values 0 1 2 3, and used send-off to get them to execute secret-modifier. If you are really quick and type (read-secret) you will see that the secret is being updated by the various threads. After 10 seconds all the threads will be finished, because they sleep for a random period between 0 and 10 seconds before modifying the secret. So after 10 seconds the secret is no longer changing, and will be something like this: "1 whispered hello after 9591 ms"

Now we could have done something very similar just using java Threads:

(dotimes [i 5]
  (.start (Thread. (fn []
                     (write-secret
                       (str i " whispered " (random-word)))))))

But agents have some handy advantages. You can check if an agent has raised an exception:

(agent-errors (first agents))

Or wait until all the agents have finished executing:

(apply await agents)

Functions can be passed to agent via send or send-off, which places the function on a queue. The send queue is serviced by a thread-pool which is of the same size as the number of CPUs available. The send-off queue is serviced by a thread-pool which grows in order to provide a new thread immediately. In the above example we just wanted to see things happening on different threads, so we used send-off. However if our goal was computational throughput we would use send to perform calculation tasks as that would make best use of our processors. If a function is likely to block (like ours which sleeps for a random period), then it should not be dispatched with send, as it would tie up the 'computation' queue.

So let's write a multi-threaded dumb shuffler:

(dotimes [i 100000]
  (doseq [a agents]
    (send a #(+ %1 (- 2 (int (rand 5)))))))
(apply await agents)
(map deref agents)

Returns something like (245 -549 -87 -97). In this example we are actually modifying the value of the agent, i.e.: using it to store state. Also we are using send to take advantage of a thread-pool which suits our system. Note however that one agent will only ever execute in one thread at a time, because the calls are queued to ensure that the value of the agent is correctly modified. As you can see, agents can be used to coordinate state changes and provide convenient access to two useful thread-pools.

Clojure provides hash-maps which are very useful in many programming tasks. Maps are written between {} just like vectors are written between [] and lists are written between ().

{:name "Tim", :occupation "Programmer"}

Is a map which associates the keyword :name with "Tim" and :occupation with "Programmer". Note that the comma is treated as whitespace by Clojure but may be optionally supplied to help visually group things that go together. Keywords are preceded by : and provide a convenient way to name fields, however keys don't have to be keywords.

(defn map-count [map key] 
  (assoc map key (inc (get map key 0))))

This function takes a map as input, looks up a key and increments how many times that key has been counted. (get map key 0) just looks up key in map and returns its value if found, else 0. inc adds one to that value. (assoc map key value) will return a new map with key associated to value. So as you can see the map is not modified in any way. A new map is returned with a new value associated to the key supplied.

(reduce map-count {} ["hi" "mum" "hi" "dad" "hi"])

Results in {"dad" 1, "mum" 1, "hi" 3} because reduce starts with an empty map which is used as the input to the first function call, then the resulting map is used for the next and so forth. The strings are used as keys. Now we can write a more useful function which takes a string and counts the words in it:

(defn count-words [s] 
  (reduce map-count {} 
          (re-seq #"\w+" (.toLowerCase s)))) 
(count-words "hi mum hi dad hi")

Gives the same result. Note that re-seq here splits the input string into words based upon the regular expression provided.

One thing you will come across is that "maps are functions of their keys" which means:

user=> ({:a 1, :b 2, :c 3} :a) 
1

Here we created a map {:a 1, :b 2, :c 3}, can then called it like a function with the argument :a and it finds the value associated with that key, which is 1. Maps and keys do this trick by delegating to get, which is a function that looks up stuff:

user=> (get {:a 1, :b 2} :a) 
1

get also accepts an optional third argument, which is returned if key is not found in map:

user=> (get {:a 1, :b 2} :e 0) 
0

But you don't need to call get, you can just call the map:

user=> ({:a 1, :b 2, :c 3} :e 0) 
0

Keys can be called in the same way (reverse of what we did above):

user=> (:e {:a 1, :b 2} 99) 
99 
user=> (:b {:a 1, :b 2} 99)
2

assoc returns a new map with a value associated to a key:

user=> (assoc {:a 1, :b 2} :b 3) 
{:a 1, :b 3}

Using this knowledge you should be able to decipher the following more cryptic version of count-words which does exactly the same thing:

(defn count-words [s] 
  (reduce #(assoc %1 %2 (inc (%1 %2 0))) {} 
          (re-seq #"\w+" (.toLowerCase s))))

Here is an example of using a map as the value of an agent:

(defn action [mdl key val] (assoc mdl key val))
(def ag (agent {:a 1 :b 2}))
@ag
(send ag action :b 3)   ; send automatically supplies ag as the first argument to action
(await ag)
@ag

The function we send updates the map setting the value at 'key' to 'val'. It is obvious in retrospect but you might get confused initially when writing the function, the first parameter is always the value held by the agent and this does not have to be dereferenced in the function.

Lazy evaluation also warrants attention, I suggest reading http://blog.thinkrelevance.com/2008/12/1/living-lazy-without-variables

If you enjoy learning by example, please also keep in mind http://en.wikibooks.org/wiki/Clojure_Programming/Examples/API_Examples which is a useful resource for looking up how the various core functions can be used.