Clojure Programming/Concepts
Concepts
Basics
Numbers
Types of Numbers
Clojure supports the following numeric types:
- Integer
- Floating Point
- Ratio
- Decimal
Numbers in Clojure are based on java.lang.Number. BigInteger and BigDecimal are supported and hence we have arbitrary precision numbers in Clojure.
The Ratio type is described on the Clojure page as:
- Ratio
- Represents a ratio between integers. Division of integers that can't be reduced to an integer yields a ratio, i.e. 22/7 = 22/7, rather than a floating point or truncated value.
Ratios allow a computation to be maintained in exact numeric form. This can help avoid inaccuracies in long computations.
Here is a little experiment. Let's first try a computation of (1/3 * 3/1) as floating point. Later we try the same with Ratio.
(def a (/ 1.0 3.0))
(def b (/ 3.0 1.0))
(* a b)
;; ⇒ 1.0
(def c (* a a a a a a a a a a)) ; ⇒ #'user/c
(def d (* b b b b b b b b b b)) ; ⇒ #'user/d
(* c d)
;; ⇒ 0.9999999999999996
The result we want is 1, but the value of (* c d) above is 0.9999999999999996. This is because 1/3 has no exact floating point representation, so a carries a small rounding error that compounds as we multiply to create c.
You really don't want such calculations happening in your pay cheque :)
The same done with ratios below:
(def a1 (/ 1 3))
(def b1 (/ 3 1))
(def c (* a1 a1 a1 a1 a1 a1 a1 a1 a1 a1))
(def d (* b1 b1 b1 b1 b1 b1 b1 b1 b1 b1))
(* c d)
;; ⇒ 1
The result is 1 as we hoped for.
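As a small sketch of how ratios can be inspected and converted, the following uses the core functions ratio?, numerator and denominator. The values are chosen purely for illustration; the BigDecimal literal (M suffix) is included for comparison with the Decimal type listed above.

```clojure
;; A ratio keeps its numerator and denominator as exact integers.
(def r (/ 22 7))
(ratio? r)        ; ⇒ true
(numerator r)     ; ⇒ 22
(denominator r)   ; ⇒ 7

;; Division that reduces to a whole number yields an integer, not a ratio.
(/ 6 3)           ; ⇒ 2

;; Convert to floating point only when a decimal approximation is needed.
(double (/ 1 4))  ; ⇒ 0.25

;; BigDecimal literals (M suffix) give exact decimal arithmetic.
(+ 0.1M 0.2M)     ; ⇒ 0.3M
```

Keeping values as ratios until the final step means rounding happens at most once, at the point of conversion.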
Number Entry Formats
Clojure supports the usual formats for entry as shown below:
user=> 10 ; decimal
10
user=> 010 ; octal
8
user=> 0xff ; hex
255
user=> 1.0e-2 ; double
0.01
user=> 1.0e2 ; double
100.0
To make things easier, a radix-based entry format is also supported in the form <radix>r<number>, where radix can be any natural number between 2 and 36.
2r1111
;; ⇒ 15
These formats can be freely mixed in a single expression.
(+ 0x1 2r1 01)
;; ⇒ 3
Many bitwise operations are also supported by the Clojure API:
(bit-and 2r1100 2r0100)
;; ⇒ 4
The available bitwise functions include:
- (bit-and x y)
- (bit-and-not x y)
- (bit-clear x n)
- (bit-flip x n)
- (bit-not x)
- (bit-or x y)
- (bit-set x n)
- (bit-shift-left x n)
- (bit-shift-right x n)
- (bit-test x n)
- (bit-xor x y)
Check the Clojure API for the complete documentation.
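The following sketch exercises the functions listed above on small radix-2 literals, so that the effect of each operation can be read off bit by bit. The inputs are arbitrary illustration values.

```clojure
(bit-or 2r1100 2r0011)        ; ⇒ 15  (2r1111)
(bit-xor 2r1100 2r1010)       ; ⇒ 6   (2r0110)
(bit-and-not 2r1100 2r0100)   ; ⇒ 8   (2r1000)
(bit-not 0)                   ; ⇒ -1
(bit-shift-left 1 4)          ; ⇒ 16
(bit-shift-right 16 2)        ; ⇒ 4

;; For the functions taking n, n is a zero-based bit index
;; counted from the least significant bit.
(bit-set 0 3)                 ; ⇒ 8
(bit-clear 2r1111 0)          ; ⇒ 14
(bit-flip 2r1000 3)           ; ⇒ 0
(bit-test 2r0100 2)           ; ⇒ true
```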
Converting Integers to Strings
One general purpose way to format any data for printing is to use a java.util.Formatter.
The predefined convenience function format makes using a Formatter easy (the hash in %#x displays the number as a hexadecimal number prefixed with 0x):
(format "%#x" (bit-and 2r1100 2r0100))
;; ⇒ "0x4"
Converting integers to strings is even easier with java.lang.Integer. Note that since the methods are static, we must use the "/" syntax instead of ".method":
(Integer/toBinaryString 10)
;; ⇒ "1010"
(Integer/toHexString 10)
;; ⇒ "a"
(Integer/toOctalString 10)
;; ⇒ "12"
Here is another way to specify the base of the string representation:
(Integer/toString 10 2)
;; ⇒ "1010"
Where 10 is the number to be converted and 2 is the radix.
Note: In addition to the above syntax, which is used for accessing static fields or methods, . (dot) can be used.
It is a special form used for accessing arbitrary (non-private) fields or methods in Java as explained in the Clojure Reference (Java Interop).
For example:
(. Integer toBinaryString 10)
;; ⇒ "1010"
For static accesses, the / syntax is preferred.
Converting Strings to Integers
For converting strings to integers, we can again use java.lang.Integer. This is shown below.
user=> (Integer/parseInt "A" 16) ; hex
10
user=> (Integer/parseInt "1010" 2) ; bin
10
user=> (Integer/parseInt "10" 8) ; oct
8
user=> (Integer/parseInt "8") ; dec
8
The above sections give an overview of the integer-to-string and string-to-integer formatting. There is a very rich set of well documented functions available in the Java libraries (too rich to document here). These functions can easily be used to meet varied needs.
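As a rough sketch, the two directions can be wrapped in a pair of helpers and checked with a round trip. The names to-radix and from-radix are hypothetical (they are not part of Clojure or Java); they only call the java.lang.Integer methods shown above.

```clojure
;; Hypothetical helpers: thin wrappers around static methods
;; of java.lang.Integer.
(defn to-radix [n radix]
  (Integer/toString (int n) (int radix)))

(defn from-radix [^String s radix]
  (Integer/parseInt s (int radix)))

(to-radix 255 16)                  ; ⇒ "ff"
(from-radix "ff" 16)               ; ⇒ 255
(from-radix (to-radix 1234 7) 7)   ; ⇒ 1234

;; Input that is not a valid number in the given radix throws
;; java.lang.NumberFormatException, which can be caught as usual.
(try
  (from-radix "xyz" 10)
  (catch NumberFormatException e :not-a-number))
;; ⇒ :not-a-number
```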
Structures
Structures in Clojure are a little different from those in languages like Java or C++. They are also different from structures in Common Lisp (even though we have a defstruct in Clojure).
In Clojure, structures are a special case of maps and are explained in the data structures section in the reference.
The idea is that multiple instances of a structure access their field values using the field names, which are basically the map keys. This is fast and convenient, especially because keyword field names can be used directly as accessors on structure instances.
Following are the important functions dealing with structures:
- defstruct
- create-struct
- struct
- struct-map
For the full API refer to data structures section in Clojure reference.
Structures are created using defstruct, which is a macro wrapping the function create-struct, which actually creates the struct. defstruct creates the structure using create-struct and binds it to the structure name supplied to defstruct.
The object returned by create-struct is what is called the structure basis. This is not a structure instance but contains information about what the structure instances should look like. New instances are created using struct or struct-map.
Structure field names that are keywords or symbols are automatically usable as functions to access fields of the structure. This is possible because structures are maps and maps support this feature. It is not possible for other types of field names such as strings or numbers, which is why keywords are the usual choice for structure field names. Also, Clojure optimises structures to share base key information. The following shows sample usage:
(defstruct employee :name :id)
(struct employee "Mr. X" 10) ; ⇒ {:name "Mr. X", :id 10}
(struct-map employee :id 20 :name "Mr. Y") ; ⇒ {:name "Mr. Y", :id 20}
(def a (struct-map employee :id 20 :name "Mr. Y"))
(def b (struct employee "Mr. X" 10))
;; :name and :id are accessors
(:name a) ; ⇒ "Mr. Y"
(:id b) ; ⇒ 10
(b :id) ; ⇒ 10
(b :name) ; ⇒ "Mr. X"
Clojure also supports the accessor function, which can be used to get accessor functions for fields to allow easy access. This is important when field names are of types other than keywords or symbols. This is seen in the interaction below.
(def e-str (struct employee "John" 123))
e-str
;; ⇒ {:name "John", :id 123}
("name" e-str) ; ERROR: string not an accessor
;; ERROR ⇒
;; java.lang.ClassCastException: java.lang.String cannot be cast to clojure.lang.IFn
;; java.lang.ClassCastException: java.lang.String cannot be cast to clojure.lang.IFn
;; at user.eval__2537.invoke(Unknown Source)
;; at clojure.lang.Compiler.eval(Compiler.java:3847)
;; at clojure.lang.Repl.main(Repl.java:75)
(def e-name (accessor employee :name)) ; bind accessor to e-name
(e-name e-str) ; use accessor
;; ⇒ "John"
As structures are maps, new fields can be added to structure instances using assoc. dissoc can be used to remove these instance-specific keys. Note however that struct base keys cannot be removed.
b
;; ⇒ {:name "Mr. X", :id 10}
(def b1 (assoc b :function "engineer"))
b1
;; ⇒ {:name "Mr. X", :id 10, :function "engineer"}
(def b2 (dissoc b1 :function)) ; this works as :function is instance
b2
;; ⇒ {:name "Mr. X", :id 10}
(dissoc b2 :name) ; this fails. base keys cannot be dissociated
;; ERROR ⇒ java.lang.Exception: Can't remove struct key
assoc can also be used to "update" a structure.
a
;; ⇒ {:name "Mr. Y", :id 20}
(assoc a :name "New Name")
;; ⇒ {:name "New Name", :id 20}
a ; note that 'a' is immutable and did not change
;; ⇒ {:name "Mr. Y", :id 20}
(def a1 (assoc a :name "Another New Name")) ; bind to a1
a1
;; ⇒ {:name "Another New Name", :id 20}
Observe that, like other data structures in Clojure, structures are immutable, so simply calling assoc above does not change a. Hence we bind the result to a1. While it is possible to rebind the new value back to a, this is not considered good style.
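For comparison, later Clojure releases also provide defrecord, which covers much of the same ground as defstruct. The sketch below assumes Clojure 1.3 or later (for the generated ->Employee constructor function); the record name Employee is made up for this example and is not part of the listings above.

```clojure
;; Hypothetical record type mirroring the employee struct.
(defrecord Employee [name id])

(def e (->Employee "Mr. X" 10))
(:name e)                           ; ⇒ "Mr. X"

;; assoc returns a new record; the original is unchanged.
(def e1 (assoc e :function "engineer"))
(:function e1)                      ; ⇒ "engineer"
(:function e)                       ; ⇒ nil
(instance? Employee e1)             ; ⇒ true

;; Unlike a struct, a declared field can be dissociated,
;; but the result is then a plain map rather than an Employee.
(def m (dissoc e :name))
(instance? Employee m)              ; ⇒ false
(map? m)                            ; ⇒ true
```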
Exception Handling
Clojure supports Java based Exceptions. This may need some getting used to for Common Lisp users who are used to the Common Lisp Condition System.
Clojure does not support a condition system, and one is not expected to be added anytime soon as per this message. That said, the more common exception system which is adopted by Clojure is well suited for most programming needs.
If you are new to exception handling, the Java Tutorial on Exceptions is a good place to learn about them.
In Clojure, exceptions can be handled using the following special forms:
(try expr* catch-clause* finally-clause?)
- catch-clause -> (catch classname name expr*)
- finally-clause -> (finally expr*)
(throw expr)
Two types of exceptions you may want to handle in Clojure are:
- Clojure Exception: These are exceptions generated by Clojure or the underlying Java engine
- User Defined Exception: These are exceptions which you might create for your applications
Clojure Exceptions
Below is a simple interaction at the REPL that throws an exception:
user=> (/ 1 0)
java.lang.ArithmeticException: Divide by zero
java.lang.ArithmeticException: Divide by zero
at clojure.lang.Numbers.divide(Numbers.java:142)
at user.eval__2127.invoke(Unknown Source)
at clojure.lang.Compiler.eval(Compiler.java:3847)
at clojure.lang.Repl.main(Repl.java:75)
In the above case we see a java.lang.ArithmeticException being thrown. This is a runtime exception which is thrown by the underlying JVM. The long message can sometimes be intimidating for new users but the trick is to simply look at the exception (java.lang.ArithmeticException: Divide by zero) and not bother with the rest of the trace.
Similar exceptions may be thrown by the compiler at the REPL.
user=> (def xx yy)
java.lang.Exception: Unable to resolve symbol: yy in this context
clojure.lang.Compiler$CompilerException: NO_SOURCE_FILE:4: Unable to resolve symbol: yy in this context
at clojure.lang.Compiler.analyze(Compiler.java:3669)
at clojure.lang.Compiler.access$200(Compiler.java:37)
at clojure.lang.Compiler$DefExpr$Parser.parse(Compiler.java:335)
at clojure.lang.Compiler.analyzeSeq(Compiler.java:3814)
at clojure.lang.Compiler.analyze(Compiler.java:3654)
at clojure.lang.Compiler.analyze(Compiler.java:3627)
at clojure.lang.Compiler.eval(Compiler.java:3851)
at clojure.lang.Repl.main(Repl.java:75)
In the above case, the compiler does not find the binding for yy and hence it throws the exception. If your program is correct (i.e. in this case yy is defined, for example with (def yy 10)), you won't see any compile time exceptions.
The following interaction shows how runtime exceptions like ArithmeticException can be handled.
user=> (try (/ 1 0)
(catch Exception e (prn "in catch"))
(finally (prn "in finally")))
"in catch"
"in finally"
nil
The syntax for the try block is (try expr* catch-clause* finally-clause?).
As can be seen, it's quite easy to handle exceptions in Clojure. One thing to note is that (catch Exception e ...) is a catch-all for ordinary exceptions, as Exception is a superclass of all of them (only Error and other direct subclasses of Throwable fall outside it). It is also possible to catch specific exceptions, which is generally a good idea.
In the example below, we specifically catch ArithmeticException.
user=> (try (/ 1 0) (catch ArithmeticException e (prn "in catch")) (finally (prn "in finally")))
"in catch"
"in finally"
nil
When we use some other exception type in the catch block, we find that the ArithmeticException is not caught and is seen by the REPL.
user=> (try (/ 1 0) (catch IllegalArgumentException e (prn "in catch")) (finally (prn "in finally")))
"in finally"
java.lang.ArithmeticException: Divide by zero
java.lang.ArithmeticException: Divide by zero
at clojure.lang.Numbers.divide(Numbers.java:142)
at user.eval__2138.invoke(Unknown Source)
at clojure.lang.Compiler.eval(Compiler.java:3847)
at clojure.lang.Repl.main(Repl.java:75)
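The behaviour above can be sketched in a single function with more than one catch clause: the clauses are tried in order, so the specific class is listed before the general one. The function name safe-div and the returned strings are made up for this illustration.

```clojure
;; Hypothetical helper: divide a by b, turning exceptions into strings.
(defn safe-div [a b]
  (try
    (/ a b)
    (catch ArithmeticException e
      ;; most specific class first
      (str "arithmetic error: " (.getMessage e)))
    (catch Exception e
      ;; fallback for any other ordinary exception
      (str "other error: " (.getMessage e)))))

(safe-div 6 3)    ; ⇒ 2
(safe-div 1 0)    ; ⇒ "arithmetic error: Divide by zero"
(safe-div 1 "a")  ; handled by the second clause (a ClassCastException)
```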
User-Defined Exceptions
As mentioned previously, all exceptions in Clojure need to be a subclass of java.lang.Exception (or, more generally, of java.lang.Throwable, which is the superclass of Exception). This means that even when you want to define your own exceptions in Clojure, you need to derive them from Exception.
Don't worry, that's easier than it sounds :)
The Clojure API provides a function gen-and-load-class which can be used to extend java.lang.Exception for user-defined exceptions. gen-and-load-class generates and immediately loads the bytecode for the specified class.
Now, rather than talking too much, let's quickly look at code.
(gen-and-load-class 'user.UserException :extends Exception)
(defn user-exception-test []
  (try
    (throw (new user.UserException "msg: user exception was here!!"))
    (catch user.UserException e
      (prn "caught exception" e))
    (finally (prn "finally clause invoked!!!"))))
Here we are creating a new class user.UserException that extends java.lang.Exception. We create an instance of user.UserException using the special form (new Classname-symbol args*). This is then thrown.
Sometimes you may come across code like (user.UserException. "msg: user exception was here!!"). This is just another way to say new. Note the . (dot) after user.UserException. This does exactly the same thing.
Here is the interaction:
user=> (load-file "except.clj")
#'user/user-exception-test
user=> (user-exception-test)
"caught exception" user.UserException: msg: user exception was here!!
"finally clause invoked!!!"
nil
user=>
So here we have both the catch and the finally clauses being invoked. That's all there is to it.
With Clojure's support for Java Interop, it is also possible for the user to create exceptions in Java and catch them in Clojure, but creating the exception in Clojure is typically more convenient.
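Recent Clojure releases do not, as far as we know, ship gen-and-load-class, so the listing above may not run as written on them. A minimal sketch of one alternative, assuming Clojure 1.4 or later, uses the core functions ex-info and ex-data to attach a data map to an exception instead of generating a new class. The :type key and its value are arbitrary choices for this example.

```clojure
(defn user-exception-test []
  (try
    ;; ex-info builds a clojure.lang.ExceptionInfo carrying a message and a map
    (throw (ex-info "msg: user exception was here!!"
                    {:type :user-exception}))
    (catch clojure.lang.ExceptionInfo e
      ;; return the message and the attached data instead of printing them
      [(.getMessage e) (ex-data e)])
    (finally
      (prn "finally clause invoked!!!"))))

(user-exception-test)
;; prints "finally clause invoked!!!"
;; ⇒ ["msg: user exception was here!!" {:type :user-exception}]
```

The data map plays the role that a dedicated exception class would otherwise play: handlers can inspect (ex-data e) to decide what kind of failure they are looking at.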
Mutation Facilities
Employee Record Manipulation
Data structures and sequences in Clojure are immutable, as seen in the examples presented in the Structures section above (it is suggested that the reader go through that section first).
While immutable data has its advantages, any project of reasonable size will require the programmer to maintain some sort of state. Managing state in a language with immutable sequences and data structures is a frequent source of confusion for people used to programming languages that allow mutation of data.
A good essay on the Clojure approach is "Values and Change - Clojure's approach to Identity and State" (http://clojure.org/state), written by Rich Hickey.
It may be useful to watch the Clojure Concurrency screencast, as some of its concepts are used in this section, specifically refs and transactions.
In this section we create a simple employee record set and provide functions to:
- Add an employee
- Delete employee by name
- Change employee role by name
The example is purposely kept simple as the intent is to show the state and mutation facilities rather than provide full functionality.
Let's dive into the code.
(require 'clojure.set)    ; make sure clojure.set is loaded before aliasing it
(alias 'set 'clojure.set) ; use set/fn-name rather than clojure.set/fn-name
(defstruct employee
  :name :id :role) ; == (def employee (create-struct :name :id ..))
(def employee-records (ref #{}))
;;;===================================
;;; Private Functions: No Side-effects
;;;===================================
(defn- update-role [n r recs]
  (let [rec (set/select #(= (:name %) n) recs)
        others (set/select #(not (= (:name %) n)) recs)]
    ;; build a set of the updated records and merge it with the rest
    (set/union (set (map #(assoc % :role r) rec)) others)))
(defn- delete-by-name [n recs]
  (set/select #(not (= (:name %) n)) recs))
;;;=============================================
;;; Public Function: Update Ref employee-records
;;;=============================================
;; the docstring goes before the argument vector
(defn update-employee-role
  "update the role for employee named n to the new role r"
  [n r]
  (dosync
    (ref-set employee-records (update-role n r @employee-records))))
(defn delete-employee-by-name
  "delete employee with name n"
  [n]
  (dosync
    (ref-set employee-records
             (delete-by-name n @employee-records))))
(defn add-employee
  "add new employee e to employee-records"
  [e]
  (dosync (commute employee-records conj e)))
;;;=========================
;;; initialize employee data
;;;=========================
(add-employee (struct employee "Jack" 0 :Engineer))
(add-employee (struct employee "Jill" 1 :Finance))
(add-employee (struct-map employee :name "Hill" :id 2 :role :Stand))
In the first few lines we define the employee structure. The interesting definition after that is employee-records.
(def employee-records (ref #{}))
In Clojure, refs allow mutation of a storage location within a transaction.
user=> (def x (ref [1 2 3]))
#'user/x
user=> x
clojure.lang.Ref@128594c
user=> @x
[1 2 3]
user=> (deref x)
[1 2 3]
user=>
Next we define private functions update-role and delete-by-name using defn- (note the minus '-' at the end). Observe that these are pure functions without any side-effects.
update-role takes the employee name n, the new role r and a table of employee records recs. As the data is immutable, this function returns a new table of records with the employee role updated appropriately. delete-by-name behaves in a similar manner by returning a new table of employees after deleting the relevant employee record.
For an explanation of the set API see the Clojure API reference.
We still haven't looked at how state is maintained. This is done by the public functions in the listing: update-employee-role, delete-employee-by-name and add-employee.
These functions delegate the job of record processing to the private functions. The important things to note are the use of the following functions:
- ref-set sets the value of a ref.
- dosync is mandatory as refs can only be updated in a transaction and dosync sets up the transaction.
- commute updates the in-transaction value of a ref.
For a detailed explanation of these functions see the refs section in API reference.
The add-employee function is quite trivial and hence not broken up into private and public functions.
The source listing initializes the records with sample data towards the end.
Below is the interaction for this program.
user=> (load-file "employee.clj")
#{{:name "Jack", :id 0, :role :Engineer} {:name "Hill", :id 2, :role :Stand} {:name "Jill", :id 1, :role :Finance}}
user=> @employee-records
#{{:name "Jack", :id 0, :role :Engineer} {:name "Hill", :id 2, :role :Stand} {:name "Jill", :id 1, :role :Finance}}
user=> (add-employee (struct employee "James" 3 :Bond))
#{{:name "James", :id 3, :role :Bond} {:name "Jack", :id 0, :role :Engineer} {:name "Hill", :id 2, :role :Stand} {:name "Jill", :id 1, :role :Finance}}
user=> @employee-records
#{{:name "James", :id 3, :role :Bond} {:name "Jack", :id 0, :role :Engineer} {:name "Hill", :id 2, :role :Stand} {:name "Jill", :id 1, :role :Finance}}
user=> (update-employee-role "Jill" :Sr.Finance)
#{{:name "James", :id 3, :role :Bond} {:name "Jack", :id 0, :role :Engineer} {:name "Hill", :id 2, :role :Stand} {:name "Jill", :id 1, :role :Sr.Finance}}
user=> @employee-records
#{{:name "James", :id 3, :role :Bond} {:name "Jack", :id 0, :role :Engineer} {:name "Hill", :id 2, :role :Stand} {:name "Jill", :id 1, :role :Sr.Finance}}
user=> (delete-employee-by-name "Hill")
#{{:name "James", :id 3, :role :Bond} {:name "Jack", :id 0, :role :Engineer} {:name "Jill", :id 1, :role :Sr.Finance}}
user=> @employee-records
#{{:name "James", :id 3, :role :Bond} {:name "Jack", :id 0, :role :Engineer} {:name "Jill", :id 1, :role :Sr.Finance}}
Two things to note about the program:
- Using refs and transactions makes the program inherently thread safe. If we want to extend the program for a multi-threaded environment (using Clojure agents) it will scale with minimal change.
- By keeping the pure functionality separate from the public functions that manage state, it is easier to ensure that the functionality is correct, as pure functions are easier to test.
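As a variation on the same idea, the sketch below uses alter, which applies a pure function to the current value of a ref inside a transaction, instead of reading the ref and calling ref-set. The ref records and the functions add-record! and remove-by-name! are hypothetical names introduced only for this example.

```clojure
(def records (ref #{}))  ; hypothetical ref, separate from employee-records

(defn add-record! [r]
  (dosync (alter records conj r)))

(defn remove-by-name! [n]
  (dosync
    ;; alter calls the function with the ref's in-transaction value
    (alter records (fn [recs] (set (remove #(= (:name %) n) recs))))))

(add-record! {:name "Jack" :role :Engineer})
(add-record! {:name "Jill" :role :Finance})
(remove-by-name! "Jack")
@records
;; ⇒ #{{:name "Jill", :role :Finance}}

;; Outside a transaction, changing a ref is rejected.
(try
  (ref-set records #{})
  (catch IllegalStateException e :no-transaction))
;; ⇒ :no-transaction
```

The last form illustrates why dosync is mandatory: without a running transaction, ref-set throws instead of silently mutating the ref.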
Overview
- use require to load Clojure libraries
- use refer to refer to functions in the current namespace
- use use to load and refer all in one step
- use import to refer to Java classes in the current namespace
Require
1. You can load the code for any Clojure library with (require 'libname). Note the quote: require is a function, so the library name must be quoted. Try it with clojure.contrib.math:
(require 'clojure.contrib.math)
2. Then print the directory of names available in the namespace
(dir clojure.contrib.math)
3. Use lcm to calculate the least common multiple:
(clojure.contrib.math/lcm 11 41)
-> 451
4. Writing out the namespace prefix on every function call is a pain, so you can specify a shorter alias using :as:
(require '[clojure.contrib.math :as m])
5. Calling the shorter form is much easier:
(m/lcm 120 1000)
-> 3000
6. You can see all the loaded namespaces with
(all-ns)
Refer and Use
1. It would be even easier to use a function with no namespace prefix at all. You can do this by referring to the name, which makes a reference to the name in the current namespace:
(refer 'clojure.contrib.math)
2. Now you can call lcm directly:
(lcm 16 30)
-> 240
3. If you want to load and refer all in one step, call use:
(use 'clojure.contrib.math)
4. Referring a library refers all of its names. This is often undesirable, because
- it does not clearly document intent to readers
- it brings in more names than you need, which can lead to name collisions
Instead, use the following style to specify only those names you want:
(use '[clojure.contrib.math :only (lcm)])
The :only option is available on all the namespace management forms. (There is also an :exclude which works as you might expect.)
5. The variable *ns* always contains the current namespace, and you can see what names your current namespace refers to by calling
(ns-refers *ns*)
6. The refers map is often pretty big. If you are only interested in one symbol, pass that symbol to the result of calling ns-refers:
((ns-refers *ns*) 'dir)
-> #'clojure.contrib.ns-utils/dir
Import
1. Importing is like referring, but for Java classes instead of Clojure namespaces. Instead of
(java.io.File. "woozle")
you can say
(import java.io.File)
(File. "woozle")
2. You can import multiple classes in a Java package with the form
(import [package Class Class])
For example:
(import [java.util Date Random])
(Date. (long (.nextInt (Random.))))
3. Programmers new to Lisp are often put off by the "inside-out" reading of forms like the date creation above. Starting from the inside, you
- get a new Random
- get the next random integer
- cast it to a long
- pass the long to the Date constructor
You don't have to write inside-out code in Clojure. The -> macro takes its first form, and passes it as the first argument to its next form. The result then becomes the first argument of the next form, and so on. It is easier to read than to describe:
(-> (Random.) (.nextInt) (long) (Date.))
-> #<Date Sun Dec 21 12:47:20 EST 1969>
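To make the rewriting done by -> more concrete, here is a small sketch with deterministic values, so the nested and threaded forms can be compared directly. The closely related ->> macro, which threads the value in as the last argument, is shown for contrast.

```clojure
;; Nested form, read from the inside out:
(* (+ 5 3) 2)                         ; ⇒ 16

;; The same computation with ->, read left to right:
(-> 5 (+ 3) (* 2))                    ; ⇒ 16

;; Java methods work too; a bare symbol such as .trim
;; is treated as (.trim x):
(-> "  hello " .trim .toUpperCase)    ; ⇒ "HELLO"

;; ->> inserts the value as the LAST argument instead,
;; which suits sequence functions:
(->> [1 2 3 4] (filter even?) (map inc))   ; ⇒ (3 5)
```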
Load and Reload
The REPL isn't for everything. For work you plan to keep, you will want to place your source code in a separate file. Here are the rules of thumb to remember when creating your own Clojure namespaces.
1. Clojure namespaces (a.k.a. libraries) are equivalent to Java packages.
2. Clojure respects Java naming conventions for directories and files, but Lisp naming conventions for namespace names. So a Clojure namespace com.my-app.utils would live in a path named com/my_app/utils.clj. Note especially the underscore/hyphen distinction.
3. Clojure files normally begin with a namespace declaration, e.g.
(ns com.my-app.utils)
4. The syntax for import/use/refer/require presented in the previous sections is for REPL use. Namespace declarations allow similar forms—similar enough to aid memory, but also different enough to confuse. The following forms at the REPL:
(use 'foo.bar)
(require 'baz.quux)
(import '[java.util Date Random])
would look like this in a source code file:
(ns
  com.my-app.utils
  (:use foo.bar)
  (:require baz.quux)
  (:import [java.util Date Random]))
Symbols become keywords, and quoting is no longer required.
5. At the time of this writing, the error messages for doing it wrong with namespaces are, well, opaque. Be careful.
Now let's try creating a source code file. We aren't going to bother with explicit compilation for now. Clojure will automatically (and quickly) compile source code files on the classpath. Instead, we can just add Clojure (.clj) files to the src directory.
1. Create a file named student/dialect.clj in the src directory, with the appropriate namespace declaration:
(ns student.dialect)
2. Now, implement a simple canadianize function that takes a string, and appends , eh?
(defn canadianize [sentence] (str sentence ", eh"))
3. From your REPL, use the new namespace:
(use 'student.dialect)
4. Now try it out.
(canadianize "Hello, world.")
-> "Hello, world., eh"
5. Oops! We need to trim the period off the end of the input. Fortunately, clojure.contrib.str-utils2 provides chop. Go back to student/dialect.clj and add require in clojure.contrib.str-utils2:
(ns student.dialect (:require [clojure.contrib.str-utils2 :as s]))
6. Now, update canadianize to use chop:
(defn canadianize [sentence] (str (s/chop sentence) ", eh?"))
7. If you simply retry calling canadianize from the REPL, you will not see your new change, because the code was already loaded. However, you can use namespace forms with :reload (or :reload-all) to reload a namespace (and its dependencies).
(use :reload 'student.dialect)
8. Now you should see the new version of canadianize:
(canadianize "Hello, world.")
-> "Hello, world, eh?"
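The clojure.contrib libraries used in this walkthrough are not bundled with later Clojure releases. As a hedged sketch of the same exercise, assuming Clojure 1.2 or later, the namespace below requires clojure.string (which ships with Clojure) and strips a trailing period with a regular expression instead of chop. This is one possible substitute, not the approach of the original walkthrough.

```clojure
(ns student.dialect
  (:require [clojure.string :as s]))

;; Strip one trailing period, if present, then append ", eh?".
(defn canadianize [sentence]
  (str (s/replace sentence #"\.$" "") ", eh?"))

(canadianize "Hello, world.")   ; ⇒ "Hello, world, eh?"
(canadianize "Nice day")        ; ⇒ "Nice day, eh?"
```

Unlike chop, which removes the last character unconditionally, the pattern only removes a final period, so input without one is left intact.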
Functional Programming
Anonymous Functions
Clojure supports anonymous functions using fn or the shorter reader macro #(..). The #(..) form is convenient due to its conciseness but is somewhat limited, as #(..) forms cannot be nested.
Below are some examples using both forms:
user=> ((fn [x] (* x x)) 3)
9
user=> (map #(list %1 (inc %2)) [1 2 3] [1 2 3])
((1 2) (2 3) (3 4))
user=> (map (fn [x y] (list x (inc y))) [1 2 3] [1 2 3])
((1 2) (2 3) (3 4))
user=> (map #(list % (inc %)) [1 2 3])
((1 2) (2 3) (3 4))
user=> (map (fn [x] (list x (inc x))) [1 2 3])
((1 2) (2 3) (3 4))
user=> (#(apply str %&) "Hello")
"Hello"
user=> (#(apply str %&) "Hello" ", " "World!")
"Hello, World!"
Note that in the #(..) form, %N is used for arguments (1-based) and %& for the rest argument. % is a synonym for %1.
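Since #(..) forms cannot be nested, a function that needs an inner anonymous function is usually written with fn on the outside. The following sketch shows that workaround, along with %& used together with a numbered argument; the data is arbitrary.

```clojure
;; The outer function uses fn, the inner one keeps the short form.
(map (fn [row] (map #(* 2 %) row))
     [[1 2] [3 4]])
;; ⇒ ((2 4) (6 8))

;; %& collects the arguments that follow the numbered ones.
(#(vector %1 %&) :a :b :c)
;; ⇒ [:a (:b :c)]
```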
Lazy Evaluation of Sequences
This section walks through some code to give a better feel for the lazy evaluation of sequences by Clojure and how that might be useful. We also measure memory and time to better understand what is happening.
Consider a scenario where we want to do a big-computation (1 second each) on records in a list with a billion items. Typically we may not need all the billion items processed (e.g. we may need only a filtered subset).
Let's define a little utility function free-mem to help us monitor memory usage and another function big-computation that takes 1 second to do its job.
(defn free-mem [] (.freeMemory (Runtime/getRuntime)))
(defn big-computation [x] (Thread/sleep 1000) (* 10 x))
In the functions above we use java.lang.Runtime and java.lang.Thread for getting free memory and supporting sleep.
We will also be using the built-in function time to measure our performance.
Here is a simple usage at REPL:
user=> (defn free-mem [] (.freeMemory (Runtime/getRuntime)))
#'user/free-mem
user=> (defn big-computation [x] (Thread/sleep 1000) (* 10 x))
#'user/big-computation
user=> (time (big-computation 1))
"Elapsed time: 1000.339953 msecs"
10
Now we define a list of 1 billion numbers called nums.
user=> (time (def nums (range 1000000000)))
"Elapsed time: 0.166994 msecs"
#'user/nums
Note that it takes Clojure only 0.17 ms to create a list of 1 billion numbers. This is because the list is not really created. The user just has a promise from Clojure that the appropriate number from this list will be returned when asked for.
Now, let's say we want to apply big-computation to the five items 10001 to 10005 from this list.
This is the code for it:
;; The comments below should be read in the numbered order
;; to better understand this code.
(time                                  ; [7] time the transaction
  (def v                               ; [6] save vector as v
    (apply vector                      ; [5] turn the list into a vector
      (map big-computation             ; [4] process each item for 1 second
        (take 5                        ; [3] take first 5 from filtered items
          (filter                      ; [2] filter items 10000 to 10010
            (fn [x] (and (> x 10000) (< x 10010)))
            nums))))))                 ; [1] nums = 1 billion items
Putting this code at the REPL, this is what we get:
user=> (free-mem)
2598000
user=> (time (def v (apply vector (map big-computation (take 5 (filter (fn [x] (and (> x 10000) (< x 10010))) nums))))))
"Elapsed time: 5036.234311 msecs"
#'user/v
user=> (free-mem)
2728800
The comments in the code block indicate the working of this code. It took us ~5 seconds to execute this. Here are some points to note:
- It did not take us 10000 seconds to filter out item number 10000 to 10010 from the list
- It did not take us 9 seconds to process all 9 items that pass the filter (10001 to 10009); only the first 5 were taken and processed
- Overall, it took the computation only 5 seconds which is basically the computation time.
- The amount of free memory is virtually the same even though we now have the promise of a billion records for processing. (It actually seems to have gone up a bit due to garbage collection)
Now if we access v it takes negligible time.
user=> (time (seq v))
"Elapsed time: 0.042045 msecs"
(100010 100020 100030 100040 100050)
user=>
Another point to note is that a lazy sequence does not mean that the computation is done every time; once the computation is done, it gets cached.
Try the following:
user=> (time (def comps (map big-computation nums)))
"Elapsed time: 0.113564 msecs"
#'user/comps
user=> (defn t5 [] (take 5 comps))
#'user/t5
user=> (time (doall (t5)))
"Elapsed time: 5010.49418 msecs"
(0 10 20 30 40)
user=> (time (doall (t5)))
"Elapsed time: 0.096104 msecs"
(0 10 20 30 40)
user=>
In the first step we map big-computation to a billion nums. Then we define a function t5 that takes 5 computations from comps. Observe that the first time t5 takes 5 seconds and after that it takes negligible time. This is because once the calculation is done, the results are cached for later use. Since the result of t5 is also lazy, doall is needed to force it to be eagerly evaluated before time returns to the REPL.
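The caching can also be observed without timing, by counting how often the mapped function is actually called. The sketch below uses a hypothetical counter atom calls and a cheap stand-in function counted-computation. It maps over (iterate inc 0) rather than a range on purpose: in later Clojure versions some sequences, ranges included, are processed in chunks (typically of 32 items), so with a range more items than requested may be computed on first access.

```clojure
(def calls (atom 0))   ; hypothetical counter of real computations

(defn counted-computation [x]
  (swap! calls inc)    ; record that the function body actually ran
  (* 10 x))

;; Nothing is computed yet: map only returns a lazy sequence.
(def comps (map counted-computation (iterate inc 0)))

(doall (take 5 comps))  ; ⇒ (0 10 20 30 40)
@calls                  ; ⇒ 5

;; The second traversal reuses the cached results.
(doall (take 5 comps))  ; ⇒ (0 10 20 30 40)
@calls                  ; ⇒ 5
```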
Lazy data structures can offer a significant advantage, provided that the program is designed to leverage them. Designing a program around lazy sequences and infinite data structures is a paradigm shift: instead of eagerly doing the computation, as is usual in languages like C and Java, the program gives a promise of a computation.
This section is based on this mail in the Clojure group.
Infinite Data Source
As Clojure supports lazy evaluation of sequences, it is possible to have infinite data sources in Clojure. The infinite sequence (0 1 2 3 4 5 ....) can be defined using (range) since Clojure 1.2 [2]:
(def nums (range)) ; Old version (def nums (iterate inc 0))
;; ⇒ #'user/nums
(take 5 nums)
;; ⇒ (0 1 2 3 4)
(drop 5 (take 11 nums))
;; ⇒ (5 6 7 8 9 10)
Here we see two functions that are used to create an infinite list of numbers starting from 0. As Clojure supports lazy sequences, only the required items are generated and taken off the head of this list. In the above case, if you were to type out (range) or (iterate inc 0) directly at the prompt, the REPL would keep trying to print the next number forever and you would need to terminate the process.
(iterate f x) returns a lazy sequence of x, (f x), (f (f x)) and so on: each item is the result of applying f to the previous item. (iterate inc 0) first gives 0 as the result, then (inc 0) => 1, then (inc (inc 0)) => 2 and so on.
(take n coll) returns a lazy sequence of the first n items of the collection; the collection itself is not modified. There are many variations on this theme:
- (take n coll)
- (take-nth n coll)
- (take-last n coll)
- (take-while pred coll)
- (drop n coll)
- (drop-while pred coll)
The reader is encouraged to look at the Clojure Sequence API for details.
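A quick sketch of the variations listed above, with small inputs so the results can be checked by eye:

```clojure
(take 3 (iterate #(* 2 %) 1))       ; ⇒ (1 2 4)
(take-nth 3 (range 10))             ; ⇒ (0 3 6 9)   every 3rd item
(take-last 2 [1 2 3 4])             ; ⇒ (3 4)
(take-while #(< % 5) (range))       ; ⇒ (0 1 2 3 4) stops at the first failure
(drop 2 [1 2 3 4])                  ; ⇒ (3 4)
(drop-while #(< % 3) [1 2 3 4 1])   ; ⇒ (3 4 1)     later small items are kept
```

Note that take-while terminates even on the infinite (range), because it stops as soon as the predicate fails.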
List Comprehension
List comprehensions are the constructs offered by a language that make it easy to create new lists from old ones. As simple as it sounds, it is a very powerful concept. Clojure has good support for list comprehensions.
Let's say we want the sequence of all x + 1 for all x divisible by 4, with x starting from 0.
Here is one way to do it in Clojure:
(def nums (iterate inc 0))
;; ⇒ #'user/nums
(def s (for [x nums :when (zero? (rem x 4))] (inc x)))
;; ⇒ #'user/s
(take 5 s)
;; ⇒ (1 5 9 13 17)
nums is the infinite list of numbers that we saw in the previous section. We need to (def s ...) the sequence because it is built on an infinite source of numbers. Evaluating it directly at the prompt would make the REPL pull numbers out of this source indefinitely while trying to print the result.
The key construct here is the for macro. The expression [x nums ... says that x comes out of nums one at a time. The next clause, :when (zero? (rem x 4)), says that x should be pulled out only if it meets this criterion. Once this x is out, inc is applied to it. Binding all this to s gives us an infinite sequence. Hence the (take 5 s) and the expected result that we see.
Another way to achieve the same result is to use map and filter.
(def s (map inc (filter (fn [x] (zero? (rem x 4))) nums)))
;; ⇒ #'user/s
(take 5 s)
;; ⇒ (1 5 9 13 17)
Here we create a predicate (fn [x] (zero? (rem x 4))) and pull out x's from nums only if this predicate is satisfied. This is done by filter. Note that since filter returns a lazy sequence, what it gives is only a promise of supplying the next number that satisfies the predicate. It does not (and cannot in this particular case) evaluate the entire list. Once we have this stream of x's, it is simply a matter of mapping inc over it: (map inc ...).
The choice between a list comprehension, i.e. for, and map/filter is largely a matter of user preference. There is no major advantage of one over the other.
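A quick way to convince yourself that the two formulations agree is to compare a finite prefix of each. The names s-for and s-map below are made up for this sketch. The last form shows a case where for may read more easily, namely when several bindings are combined.

```clojure
(def nums (range))

;; s-for and s-map are hypothetical names for the two variants.
(def s-for (for [x nums :when (zero? (rem x 4))] (inc x)))
(def s-map (map inc (filter (fn [x] (zero? (rem x 4))) nums)))

;; Compare only a finite prefix; the sequences themselves are infinite.
(= (take 100 s-for) (take 100 s-map))
;; ⇒ true

;; Several bindings in one comprehension: all pairs with x < y.
(for [x (range 3) y (range 3) :when (< x y)] [x y])
;; ⇒ ([0 1] [0 2] [1 2])
```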
Lisp
Sequence Functions
(first coll)
Gets the first element of a sequence. Returns nil for an empty sequence or nil.
(first (list 1 2 3 4))
;; ⇒ 1
(first (list))
;; ⇒ nil
(first nil)
;; ⇒ nil
(map first [[1 2 3] "Test" (list 'hi 'bye)])
;; ⇒ (1 \T hi)
(first (drop 3 (list 1 2 3 4)))
;; ⇒ 4
(rest coll)
Gets everything except the first element of a sequence. Returns an empty sequence, (), for an empty sequence or nil; use (next coll) when nil is wanted instead.
(rest (list 1 2 3 4))
;; ⇒ (2 3 4)
(rest (list))
;; ⇒ ()
(rest nil)
;; ⇒ ()
(map rest [[1 2 3] "Test" (list 'hi 'bye)])
;; ⇒ ((2 3) (\e \s \t) (bye))
(rest (take 3 (list 1 2 3 4)))
;; ⇒ (2 3)
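In current Clojure versions, rest always returns a (possibly empty) sequence, while its sibling next returns nil when nothing is left. The sketch below contrasts the two; count-items is a hypothetical helper written only to show why a nil result is convenient as a loop termination test.

```clojure
(rest [1])   ;; ⇒ ()
(next [1])   ;; ⇒ nil
(rest nil)   ;; ⇒ ()
(next nil)   ;; ⇒ nil

;; count-items is a made-up helper, not part of clojure.core.
;; nil is falsey, so (next c) doubles as the "are we done?" check.
(defn count-items [coll]
  (loop [c (seq coll) n 0]
    (if c
      (recur (next c) (inc n))
      n)))

(count-items [1 2 3])
;; ⇒ 3
```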
(map f colls*)
Applies f lazily to each item in the sequences, returning a lazy sequence of the return values of f.
Because the supplied function always returns true, these both return a sequence of true, repeated ten times.
(map (fn [x] true) (range 10))
;; ⇒ (true true true true true true true true true true)
(map (constantly true) (range 10))
;; ⇒ (true true true true true true true true true true)
These two functions both multiply their argument by 2, so (map ...) returns a sequence where every item in the original is doubled.
(map (fn [x] (* 2 x)) (range 10))
;; ⇒ (0 2 4 6 8 10 12 14 16 18)
(map (partial * 2) (range 10))
;; ⇒ (0 2 4 6 8 10 12 14 16 18)
(map ...) may take as many sequences as you supply to it (though it requires at least one sequence), but the function argument must accept as many arguments as there are sequences.
Thus, these two functions give the sequences multiplied together:
(map (fn [a b] (* a b)) (range 10) (range 10))
;; ⇒ (0 1 4 9 16 25 36 49 64 81)
(map * (range 10) (range 10))
;; ⇒ (0 1 4 9 16 25 36 49 64 81)
But the first one will only take two sequences as arguments, whereas the second one will take as many as are supplied.
(map (fn [a b] (* a b)) (range 10) (range 10) (range 10))
;; ⇒ java.lang.IllegalArgumentException: Wrong number of args passed
(map * (range 10) (range 10) (range 10))
;; ⇒ (0 1 8 27 64 125 216 343 512 729)
(map ...) stops as soon as it reaches the end of any supplied sequence. In all three of the following cases it therefore yields 5 items (the length of the shortest sequence), even though the second and third calls are also given a sequence longer than 5 items (in the third example, the longer sequence is infinite).
Each of these takes a sequence made up solely of the number 2 and the sequence (0 1 2 3 4) and multiplies them together.
;; (repeat n x) is used here; the older (replicate n x) is deprecated in its favor
(map * (repeat 5 2) (range 5))
;; ⇒ (0 2 4 6 8)
(map * (repeat 10 2) (range 5))
;; ⇒ (0 2 4 6 8)
(map * (repeat 2) (range 5))
;; ⇒ (0 2 4 6 8)
(every? pred coll)
Returns true if pred is true for every item in a sequence, false otherwise. pred, in this case, is a function taking a single argument and returning true or false.
As this function returns true always, (every? ...) evaluates to true. Note that these two functions say the same thing.
(every? (fn [x] true) (range 10))
;; ⇒ true
(every? (constantly true) (range 10))
;; ⇒ true
(pos? x) returns true when its argument is greater than zero. Since (range 10) gives a sequence of numbers from 0 to 9 and (range 1 10) gives a sequence of numbers from 1 to 9, (pos? x) returns false once for the first sequence and never for the second.
(every? pos? (range 10))
;; ⇒ false
(every? pos? (range 1 10))
;; ⇒ true
This function returns true when its argument is an even number. Since both (range 1 10) and the sequence (1 3 5 7 9) contain odd numbers, (every? ...) returns false for them.
As the sequence (2 4 6 8) contains only even numbers, (every? ...) returns true.
(every? (fn [x] (= 0 (rem x 2))) (range 1 10))
;; ⇒ false
(every? (fn [x] (= 0 (rem x 2))) (range 1 10 2))
;; ⇒ false
(every? (fn [x] (= 0 (rem x 2))) (range 2 10 2))
;; ⇒ true
If I needed to check elsewhere whether a number is even, I might instead define a named function and pass it as an argument to (every? ...). Note that clojure.core already provides even?, so the version below is named my-even? to avoid shadowing it.
(defn my-even? [num] (= 0 (rem num 2)))
;; ⇒ #'user/my-even?
(every? my-even? (range 1 10 2))
;; ⇒ false
(every? my-even? (range 2 10 2))
;; ⇒ true
Complementary function: (not-every? pred coll)
Returns the complement of (every? pred coll): false if pred is true for all items in the sequence, true otherwise.
(not-every? pos? (range 10))
;; ⇒ true
(not-every? pos? (range 1 10))
;; ⇒ false
Looping and Iterating
Three different ways to loop from 1 to 20, incrementing by 2 and printing the loop index each time (from a mailing list discussion):
;; Version 1
(loop [i 1]
  (when (< i 20)
    (println i)
    (recur (+ 2 i))))
;; Version 2
(dorun (for [i (range 1 20 2)]
         (println i)))
;; Version 3
(doseq [i (range 1 20 2)]
  (println i))
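Version 2 needs dorun because for only builds a lazy sequence; without something walking that sequence, the body never runs. The sketch below makes this visible by recording the indices in an atom instead of printing them. The names seen and pending are made up for this illustration.

```clojure
;; seen and pending are hypothetical names for this sketch.
(def seen (atom []))

;; Building the for sequence does no work by itself ...
(def pending (for [i (range 1 20 2)]
               (swap! seen conj i)))
@seen
;; ⇒ []

;; ... until something such as dorun walks it.
(dorun pending)
@seen
;; ⇒ [1 3 5 7 9 11 13 15 17 19]
```

doseq (Version 3) is eager and exists for side effects, so it is usually the more direct choice when no result sequence is wanted.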
Mutual Recursion
Mutual recursion is tricky but possible in Clojure. The body of a (defn ...) can refer only to itself or to names that already exist. However, Clojure resolves function names through vars, which can be declared first and given a definition later, in the following way (the names carry a my- prefix because clojure.core already defines even? and odd?):
;;; Mutual recursion example
;; Forward declaration: creates the var so that my-odd? can refer to it
(declare my-even?)
;; Define my-odd? in terms of 0 or my-even?
(defn my-odd? [n]
  (if (zero? n)
    false
    (my-even? (dec n))))
;; Define my-even? in terms of 0 or my-odd?
(defn my-even? [n]
  (if (zero? n)
    true
    (my-odd? (dec n))))
;; Is 3 even or odd?
(my-even? 3)
;; ⇒ false
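Each call in the example above consumes a stack frame, so a large argument can exhaust the stack. One common way around this is trampoline from clojure.core: each function returns a no-argument function (a thunk) instead of calling its partner directly, and trampoline keeps calling the returned thunks until a non-function value comes back. The names t-even? and t-odd? are made up for this sketch.

```clojure
;; t-even? and t-odd? are hypothetical names used only here.
(declare t-odd?)

(defn t-even? [n]
  (if (zero? n)
    true
    #(t-odd? (dec n))))    ; return a thunk instead of calling directly

(defn t-odd? [n]
  (if (zero? n)
    false
    #(t-even? (dec n))))

;; trampoline calls (t-even? 100000), then each returned thunk in turn,
;; so the stack depth stays constant.
(trampoline t-even? 100000)
;; ⇒ true
```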
Mutual recursion is not possible between local functions bound with let, since each binding can see only the bindings before it; letfn exists for that case. To declare a set of private recursive functions at the top level, you can use the above technique with defn- instead of defn, which will generate private definitions.
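For local functions, letfn (unlike let) makes every function name visible in every function body, so local definitions can refer to each other. A minimal sketch, where parity, ev? and od? are names invented for this example:

```clojure
;; parity, ev? and od? are hypothetical names for this sketch.
(defn parity [n]
  (letfn [(ev? [k] (if (zero? k) true  (od? (dec k))))
          (od? [k] (if (zero? k) false (ev? (dec k))))]
    (if (ev? n) :even :odd)))

(parity 10)
;; ⇒ :even
(parity 7)
;; ⇒ :odd
```

As with the top-level version, each call still uses a stack frame, so this is suited to small inputs.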
However, one can emulate mutually recursive functions with loop and recur.
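;; case is part of clojure.core since Clojure 1.2;
;; older versions needed (use 'clojure.contrib.fcase) for it.
(defmacro multi-loop
  [vars & clauses]
  (let [loop-var  (gensym "multiloop__")
        kickstart (first clauses)          ; the first label is the entry point
        loop-vars (into [loop-var kickstart] vars)]
    `(loop ~loop-vars
       (case ~loop-var
         ~@clauses))))
;; Named my-even? to avoid shadowing clojure.core/even?
(defn my-even?
  [n]
  (multi-loop [n n]
    :even (if (zero? n)
            true
            (recur :odd (dec n)))
    :odd  (if (zero? n)
            false
            (recur :even (dec n)))))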
Collection Abstractions
Concurrency
Macros
A nice walkthrough on how to write a macro can be found at http://blog.n01se.net/?p=33 by Chouser.
Macros are used to transform data structures at compile time. Let's develop a new do1 macro. The do special form of Clojure evaluates all contained forms for their side effects and returns the value of the last one. do1 should act similarly, but return the value of the first sub-form.
In the beginning one should first think about how the macro should be invoked.
(do1
:x
:y
:z)
The return value should be :x. The next step is to think about how we would do this manually.
(let [x :x]
  :y
  :z
  x)
This first evaluates :x, then :y and :z. Finally the let evaluates to the result of evaluating :x. This can be turned into a macro using defmacro and ` (syntax-quote).
(defmacro do1
  [fform & rforms]
  `(let [x# ~fform]
     ~@rforms
     x#))
So what happens here? It is just a simple translation. We use the let to create a temporary place for the result of our first form to stay. Since we cannot simply use some fixed name (it might be used in the user code), we generate a new one with x#. The # suffix is a special notation inside a syntax-quoted form: it generates a new name, which is guaranteed not to be used by the user code. The ~ "unquotes" our first form, that is, ~fform is replaced by the first argument. Then the ~@ is used to inject the remaining forms. Using the @ basically removes one set of () from the following expression, splicing its elements in place. Finally we refer again to the result of the first form with x#.
We can check the expansion of our macro with (macroexpand-1 '(do1 :x :y :z)).
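To check the behaviour as well as the expansion, the sketch below repeats the macro and records the evaluation order in an atom. The name events is made up for this example, and the generated symbol in the expansion will differ from run to run.

```clojure
(defmacro do1
  [fform & rforms]
  `(let [x# ~fform]
     ~@rforms
     x#))

;; The expansion has the shape
;;   (clojure.core/let [x__NNN__auto__ :x] :y :z x__NNN__auto__)
;; where NNN is a number that varies between runs.
(macroexpand-1 '(do1 :x :y :z))

;; events is a hypothetical name; it records the evaluation order.
(def events (atom []))

(do1
  (do (swap! events conj :first) :x)
  (swap! events conj :second)
  (swap! events conj :third))
;; ⇒ :x

@events
;; ⇒ [:first :second :third]
```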
Libraries
The lib package from clojure.contrib is now integrated into Clojure. It is easy to define libraries that can be loaded by other scripts. Suppose we have an awesome add1 function which we want to provide to other developers. So what do we need? First we settle on a namespace, e.g. example.ourlib. Now we have to create a file in the classpath with the filename "example/ourlib.clj". The contents are pretty straightforward.
(ns example.ourlib)
(defn add1
  [x]
  (+ x 1))
All we have to do now is to use the functionality of ns. Suppose we have another file where we want to use our function. ns lets us specify our requirements in a lot of ways. The simplest is :require.
(ns example.otherns
  (:require example.ourlib))
;; too-small is assumed to be defined elsewhere in this namespace
(defn check-size
  [x]
  (if (too-small x)
    (example.ourlib/add1 x)
    x))
But what if we need the add1 function several times? We always have to type the namespace in front. We could add a (refer 'example.ourlib), but there is an easier way. Just use :use instead of :require! :use loads the library as :require does and immediately refers to the namespace.
So now we have already two small libraries which are maybe used in a third program.
(ns example.thirdns
  (:require example.ourlib)
  (:require example.otherns))
Again we can save some typing here. Similar to import, we can factor out the common prefix of our libraries' namespaces.
(ns example.thirdns
  (:require (example ourlib otherns)))
Of course ourlib contains 738 more functions, not only those shown above. We don't really want :use, because bringing in so many names risks conflicts, but we also don't want to type the namespace all the time. So the first thing we do is employ an alias. But wait! You guessed it: ns helps us again.
(ns example.otherns
  (:require (example [ourlib :as ol])))
The :as takes care of the aliasing and now we can refer to our add1 function as ol/add1!
Up to now it is already quite nice. But if we think a bit about our source code organization, we might end up with the insight that 739 functions in one single file is maybe not the best idea to keep around. So we decide to do some refactoring. We create a file "example/ourlib/add1.clj" and put our functions there. We don't want the user to have to load many files instead of one, so we modify the "example/ourlib.clj" file to load any additional files as follows.
(ns example.ourlib
  (:load "ourlib/add1"
         "ourlib/otherfunc"
         "ourlib/morefuncs"))
So the user still loads the "public" example.ourlib lib, which takes care of loading the rest. (The :load implementation includes code to provide the ".clj" suffix for the files being loaded)
For more information see the docstring of require: (doc require).
References
- [1] http://github.com/relevance/labrepl
- [2] range on ClojureDocs