Learning Clojure/Data Types

There are a few notable things to say about all of Clojure's types:

  1. Clojure is implemented in Java: the compiler is written in Java, and Clojure code itself is run as Java VM code. Consequently, data types in Clojure are Java data types: all values in Clojure are regular Java reference objects, i.e. instances of Java classes.
  2. Most Clojure types are immutable, i.e. once created, they never change.
  3. Clojure favors equality comparisons over identity comparisons: instead of, say, comparing two lists to see if they are the very same object in memory, the Clojure way is to compare their actual values, i.e. their content. Most languages (including Java) don't make this the default because inspecting the values of deeply structured objects can be costly and because the answer can change whenever a mutable object changes. Immutability removes the second problem: two Clojure collections that are equal now stay equal forever, so a collection can compute its hash once and cache it, which is what makes collections practical as map keys and set members. An equality check still compares contents, though it can return immediately when both arguments are the very same object. (Watch out for cases of mutable Java objects stored in immutable Clojure collection objects. If the mutable object changes, this won't be reflected in a hash the collection has already cached.)
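The difference between equality and identity can be seen with a small sketch. The names a and b are made up for illustration; = compares values while identical? compares object identity.

```clojure
;; Two vectors built separately but holding the same contents.
(def a [1 2 [3 4]])
(def b (vec (list 1 2 (vector 3 4))))

(= a b)                 ; true: same contents
(identical? a b)        ; false: two distinct objects in memory
(= (hash a) (hash b))   ; true: equal values have equal hashes
```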

Numbers

Java includes wrapper reference types for its primitive number types, e.g. java.lang.Integer "boxes" (wraps) the primitive int type. Because every Clojure function is a JVM method expecting Object arguments, Java primitives are usually boxed in Clojure functions: when Clojure calls a Java method, a returned primitive is automatically wrapped, and any arguments to a Java method are automatically unwrapped as necessary. (However, locals in Clojure functions can hold unboxed primitives, and since Clojure 1.3 parameters can be hinted as ^long or ^double, which can be useful when you're trying to optimize a loop.)
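Boxing can be observed at the REPL by asking for the class of a value. This is only an illustration; the comments assume a current Clojure release, where integer literals are read as longs.

```clojure
(class 42)               ; java.lang.Long
(class 1.5)              ; java.lang.Double
;; String.length returns a primitive int, which is boxed on its way back.
(class (.length "moo"))  ; java.lang.Integer
```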

Clojure uses the classes clojure.lang.BigInt and java.math.BigDecimal for arbitrary precision integer and decimal values, respectively. The ordinary integer operations (+, -, *, inc, and dec) throw an ArithmeticException when a result overflows a 64-bit long, while the special versions (+', -', *', inc', and dec') instead promote to BigInt as necessary so that integer results are always fully precise.
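A short sketch of the promoting operations follows. The names big and overflowed? are hypothetical, and the overflow behaviour assumes the default checked arithmetic (i.e. *unchecked-math* has not been enabled).

```clojure
;; +' promotes to BigInt instead of overflowing.
(def big (+' Long/MAX_VALUE 1))   ; 9223372036854775808N
(class big)                       ; clojure.lang.BigInt

;; Plain + throws when the result does not fit in a long.
(def overflowed?
  (try
    (+ Long/MAX_VALUE 1)
    false
    (catch ArithmeticException _ true)))

;; A trailing M marks a BigDecimal literal.
(class 1.5M)                      ; java.math.BigDecimal
```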

Some rational values simply can't be represented in floating-point, so Clojure adds a Ratio type. A Ratio value is a ratio between two integers. Written as a literal, a Ratio is two integers with a slash between them, e.g. 23/55 (twenty-three fifty-fifths).

Clojure arithmetic operations intelligently return integers or ratios as necessary, e.g. 7/3 plus 2/3 returns 3, and 11 divided by 5 returns 11/5. As long as your calculations involve only integers and ratios, the results will be mathematically fully accurate. As soon as a floating-point or BigDecimal value enters the mix, however, you'll get floating-point or BigDecimal results, which may not be mathematically fully accurate, e.g. 1 divided by 7 returns 1/7, but 1 divided by 7.0 returns 0.14285714285714285.
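The examples from the paragraph above, written out as expressions:

```clojure
(+ 7/3 2/3)        ; equal to 3: the ratio collapses to an integer
(/ 11 5)           ; 11/5
(class (/ 11 5))   ; clojure.lang.Ratio
(/ 1 7)            ; 1/7, exact
(/ 1 7.0)          ; 0.14285714285714285, a double approximation
```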

Strings

A string in Clojure is simply an instance of java.lang.String. As in Java, string literals are written in double quotes, but unlike ordinary Java string literals, they may span multiple lines.
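Because a Clojure string is a java.lang.String, the usual Java methods can be called on it. A minimal sketch, with two-lines as a made-up name:

```clojure
;; A literal may contain a line break directly.
(def two-lines "roses are red
violets are blue")

(class two-lines)             ; java.lang.String
(.contains two-lines "\n")    ; true: the line break is part of the string
(.toUpperCase "moo")          ; "MOO"
```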

Characters

A java.lang.Character literal is written as \ followed by the character:

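\e          ; the letter e
\t          ; the letter t (not a tab)
\tab        ; the tab character
\newline    ; the newline character
\space      ; the space character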

As you can see, whitespace characters are written as words after the \.

Booleans

The literals true and false represent the values java.lang.Boolean.TRUE and java.lang.Boolean.FALSE, respectively.

In most Lisp dialects, there is a value semi-equivalent to Java null called nil. In Clojure, nil is simply Java's null value, end of story.

In Java, only true and false are legitimate values for condition expressions, but in Clojure, condition expressions treat nil as having the truth value false and every value other than false and nil as having the truth value true. So whereas !null ("not null") is invalid Java, the Clojure equivalent (not nil) returns true.
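A few expressions illustrate how conditions treat nil. That 0 and the empty string count as true is not stated above but follows from nil and false being the only false values.

```clojure
(not nil)            ; true
(not false)          ; true
(not 0)              ; false: 0 counts as true
(not "")             ; false: so does the empty string
(if nil :yes :no)    ; :no
```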

Functions

A function in Clojure is a type of object, so a Clojure function can not only be invoked but can also be passed as an argument. As Clojure is a dynamic language, Clojure function parameters are not typed---arguments of any type can be passed to any Clojure function---but a Clojure function has a set arity (or a fixed set of arities), so an exception is thrown if you pass a function a number of arguments it does not accept. However, the last parameter of a function can be declared to accept any extra arguments as a sequence (like "variable arguments" in Java) such that the function accepts n or more arguments.
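These points can be sketched as follows. The names add-two and tally are hypothetical, and apply is used for the failing call only so that the wrong argument count is supplied at run time.

```clojure
;; add-two and tally are made-up names for illustration.
(defn add-two [a b] (+ a b))

;; `& more` collects any arguments after the first into a sequence.
(defn tally [label & more]
  [label (count more) more])

;; A function is a value, so it can be handed to another function.
(def sums (mapv #(add-two % 10) [1 2 3]))   ; [11 12 13]

(def no-extras (tally :x))                  ; [:x 0 nil]
(def two-extras (tally :x 1 2))             ; [:x 2 (1 2)]

;; Passing three arguments to a two-argument function throws.
(def arity-error
  (try
    (apply add-two [1 2 3])
    (catch clojure.lang.ArityException e e)))
```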

Vars

Var is one of the few mutable types in Clojure. A Var is a single storage cell for holding another object---in effect, a collection of one.

A single Var can actually constitute multiple references: a root binding (a binding visible to all threads) and any number of thread-local bindings (bindings each visible to a single thread). When the value of a Var is accessed, the binding accessed may depend upon the thread doing the access: the value of the Var's thread-local binding is returned if the Var has a thread-local binding for that thread; otherwise, the value of the Var's root binding (if any) is returned.

Typically, all global functions and variables in Clojure are each stored in the root binding of a Var. Because a Var is mutable, we can change the Var's value to monkey-patch the system as it runs. For instance, we can substitute a buggy function with a fixed replacement. This works because, in Clojure, a compiled function is bound to the Vars holding the functions it invokes, not the functions themselves, nor the names used to specify the Vars; since a Var is mutable, the function(s) called by a function can change without redefining the function.
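Replacing a root binding at run time might look like the sketch below. The functions double-it and quadruple are invented for the example, and it assumes the calling code was not compiled with direct linking, which would bypass the Var.

```clojure
;; double-it is deliberately buggy: it adds 1 instead of doubling.
(defn double-it [x] (+ x 1))

;; quadruple calls double-it through its Var.
(defn quadruple [x] (double-it (double-it x)))

(def before (quadruple 1))   ; 3, because of the bug

;; Swap the root binding of the Var for a fixed function.
(alter-var-root #'double-it (constantly (fn [x] (* 2 x))))

;; quadruple itself was not redefined, yet it now sees the fix.
(def after (quadruple 1))    ; 4
```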

Local parameters and variables in Clojure are immutable: they are bound at the start of their lifetime and then never bound again. Sometimes, however, we really do want mutable locals, and Vars with thread-local bindings can serve this purpose.
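One way to get a mutable local is with-local-vars, sketched here with a hypothetical function sum-to. Idiomatic Clojure would normally use reduce or loop/recur instead; the point is only to show a Var used as a mutable cell.

```clojure
;; total is a thread-local Var that exists only inside this form.
(defn sum-to [n]
  (with-local-vars [total 0]
    (doseq [i (range 1 (inc n))]
      (var-set total (+ (var-get total) i)))
    (var-get total)))

(sum-to 4)   ; 10
```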

Thread-local bindings also allow us to monkey-patch just for the span of a local context. Say we have a function cat which calls a function stored in a Var; if a function goat is root-bound to the Var, then cat will normally call goat; however, if we call cat in a scope where we have thread-locally bound a function moose to that Var, then cat will invoke moose instead of goat. (Since Clojure 1.3, a Var must be marked ^:dynamic before binding can rebind it thread-locally.)
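The cat, goat, and moose scenario can be sketched with binding. The Var name *animal* is an assumption, and cat is spelled call-cat here because clojure.core already defines a function named cat.

```clojure
(defn goat [] "goat")
(defn moose [] "moose")

;; The Var must be dynamic so that binding can rebind it per thread.
(def ^:dynamic *animal* goat)

;; call-cat invokes whatever function the Var currently holds.
(defn call-cat [] (*animal*))

(def normal (call-cat))                                ; "goat"
(def patched (binding [*animal* moose] (call-cat)))    ; "moose"
(def restored (call-cat))                              ; "goat" again
```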

Namespaces

You should organize your code into namespaces. A Clojure namespace is an object representing a mapping of symbol values to Var and/or java.lang.Class objects.

  • A Var can either be referred or interned in a namespace: the difference is that a Var can only be interned in one namespace but can be referred in any number of namespaces. In other words, the namespace in which a Var is interned is the namespace to which it "really" belongs.
  • A Class can only be referred, not interned, in namespaces. When a namespace is created, it automatically refers the classes of java.lang.

In a sense, namespaces themselves live in one global namespace: a namespace name is unique to one single namespace, e.g. you never have more than one namespace named foo.

When Clojure starts, it creates a namespace called clojure.core in which it maps the symbol *ns* to a Var which is used to hold "the current namespace". Then, Clojure runs a script called core.clj, which interns in clojure.core many standard functions, including functions for manipulating the current namespace, such as:

  • in-ns sets the current namespace to a particular namespace (manipulating clojure.core/*ns* directly is frowned upon).
  • import refers Class objects into the current namespace.
  • refer refers the interned Vars of another namespace into the current namespace.
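The mapping from symbols to Vars and Classes can be inspected directly. In this sketch the namespace demo.zoo and the name goat are invented, and *ns* is rebound with binding only to keep the example from changing the current namespace.

```clojure
;; A symbol in clojure.core resolves to the Var interned there.
(def core-map (ns-resolve 'clojure.core 'map))      ; #'clojure.core/map

;; Create an empty namespace and intern a Var in it.
(def demo (create-ns 'demo.zoo))
(def goat-var (intern demo 'goat "bleat"))          ; #'demo.zoo/goat

;; java.lang classes are referred automatically...
(def string-class (ns-resolve demo 'String))        ; java.lang.String

;; ...but clojure.core Vars are not until they are referred.
(def before-refer (ns-resolve demo 'map))           ; nil in a fresh namespace
(binding [*ns* demo]
  (refer 'clojure.core :only '[map]))
(def after-refer (ns-resolve demo 'map))            ; #'clojure.core/map
```

The Var for map is still interned in clojure.core; demo.zoo merely refers to it.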

Symbols

In Lisp, what are normally called identifiers in other languages are called symbols. A symbol, however, is not just a name seen by the compiler but rather a kind of value, a string-like kind of value---i.e. a sequence of characters. As a symbol is a value, a symbol can be stored in a collection, passed as an argument to a function, etc., just like any other object.

A symbol can only contain alphanumeric characters and * + ! / . : - _ ? but must not begin with a numeral or colon:

rubber-baby-buggy-bumper!      ; valid
j3_!:7                         ; valid
HELICOPTER                     ; valid 
+fiduciary+                    ; valid
3moose                         ; invalid
rubber baby buggy bumper       ; invalid

Symbols containing a / are namespace qualified:

foo/bar    ; a symbol qualified with the namespace name "foo"

Symbols containing . are treated specially at evaluation time, as we'll see.
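Since a symbol is a value, it can be quoted, stored, and taken apart like any other object. A small sketch, with s as a made-up name:

```clojure
;; The quote stops the symbol from being evaluated.
(def s 'foo/bar)

(symbol? s)                    ; true
(namespace s)                  ; "foo"
(name s)                       ; "bar"
(namespace 'moose)             ; nil: not namespace qualified
(= 'moose (symbol "moose"))    ; true: symbols compare by value
(count ['a 'b 'c])             ; 3: symbols stored in a vector
```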

Collections

A key feature of Clojure is that its standard collection types---lists and hashmaps, mainly---are all persistent. A persistent collection is an object which is immutable but from which producing a new collection based on the existing collection is cheap because the existing data needn't be copied. For instance, the operation which adds an element to the front of a persistent list does not actually modify the list but rather returns a new list which is the same as the original but with an extra element; this new list is created cheaply because it requires just creating a new node and linking it to the already existing list nodes, which are now shared between the two lists. Both the original collection and the new collection have the same performance characteristics.
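Structural sharing can be observed on a list. The names original and extended are hypothetical, and the identity check reflects how the reference implementation's list type behaves rather than a documented guarantee.

```clojure
(def original (list 2 3 4))

;; conj on a list adds at the front and returns a new list.
(def extended (conj original 1))        ; (1 2 3 4)

original                                ; still (2 3 4)

;; The tail of the new list is the original list itself, not a copy.
(identical? original (rest extended))   ; true
```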

  • Lists

The Clojure persistent list type is a singly-linked list and is expressed as a literal in parentheses:

(53 "moo" asdf)   ; a list of three elements: a number, a string, and a symbol

  • Vectors

Singly-linked lists are often inappropriate, performance-wise, so Clojure includes a type it calls vector. A Clojure vector is an ordered, one-dimensional sequence like a list, but a vector is implemented as a shallow, wide tree (much like Clojure's hashmaps) such that index lookup times are O(log32 n) instead of O(n). A vector is expressed as a literal in square brackets:

[53 "moo" asdf]    ; a vector of three elements: a number, a string, and a symbol
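Indexed access on a vector can be sketched as follows. The symbol is quoted here so that it is not evaluated, and v is a made-up name.

```clojure
(def v [53 "moo" 'asdf])

(count v)       ; 3
(nth v 1)       ; "moo"
(v 2)           ; asdf: a vector can be called as a function of its index
(conj v :new)   ; [53 "moo" asdf :new], conj on a vector adds at the end
v               ; unchanged: [53 "moo" asdf]
```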

  • Hashmaps

A hashmap is expressed as a literal in curly braces such that each pair of consecutive elements is a key-value pair:

{35 "moo" "quack" 21}   ; a hashmap with the key-value pairs 35 -> "moo" and "quack" -> 21
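Looking up and adding entries might look like this, with m and bigger as hypothetical names:

```clojure
(def m {35 "moo" "quack" 21})

(get m 35)            ; "moo"
(m "quack")           ; 21: a map can be called as a function of its key
(get m :missing)      ; nil: no such key

;; assoc returns a new map; m itself is unchanged.
(def bigger (assoc m :extra true))
(count bigger)        ; 3
(count m)             ; 2
```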

  • Sequences

A sequence is not an actual collection type but an interface to which list, vector, hashmap, and all other Clojure collection types conform. A sequence supports the operations first and rest: first retrieves the first item of the collection while rest retrieves a sequence of all the remaining items. As we'll see, sequences support a large number of operations built upon these two fundamental operations.

(When a sequence is produced from a map, first means retrieving a single pair of the map as a vector; the pair returned is effectively random as far as the programmer is concerned. The rest of a map-based sequence is the sequence of all remaining pairs as vectors.)
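The same two operations work across collection types. The function total is a hypothetical example of an operation built only from first and rest; it is a sketch, not how the standard library sums numbers.

```clojure
(first '(1 2 3))    ; 1
(rest [1 2 3])      ; (2 3)
(first {:a 1})      ; [:a 1]: a key-value pair
(rest {:a 1})       ; (): nothing remains after the only pair

;; total walks any collection of numbers using first and rest.
(defn total [coll]
  (if (empty? coll)
    0
    (+ (first coll) (total (rest coll)))))

(total [1 2 3])     ; 6
(total '(4 5))      ; 9
```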

Keywords

A keyword is a variant of a symbol, distinguished by being preceded by a colon:

:rubber-baby-buggy-bumper!      ; valid
:j3_!:7                         ; valid
:HELICOPTER                     ; valid 
:+fiduciary+                    ; valid

Keywords exist simply because, as you'll see, it's useful to have names in code which are symbol-like but not actually symbols. Keywords are by default not namespace-qualified. However, in some cases it may be useful to generate a keyword that is namespace-qualified so as to avoid name clashes with other code. For that purpose, one can either qualify the namespace explicitly or type a symbol preceded by two colons:

::gina     ; equivalent to :adam/gina (assuming this is in the namespace "adam")

;; in the REPL, after (in-ns 'adam) and (clojure.core/refer 'clojure.core)
adam=> (namespace :gina)        ; no namespace
nil
adam=> (namespace ::gina)
"adam" 
adam=> (namespace :adam/gina) 
"adam"

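Note: There is a caveat with programmatically generated keywords regarding namespaces. A keyword generated from a single string containing a slash prints exactly like a namespace-qualified keyword, but whether (namespace) reports a namespace for it has depended on the Clojure version: early releases returned nil, while more recent releases split the string at the slash. Pass the namespace and the name as separate arguments to be unambiguous:

; use (namespace) to see what the namespace of the returned keywords is
user=> (keyword "test")        ; a keyword with no namespace
:test
user=> (keyword "user" "test") ; a keyword in the user namespace
:user/test
user=> (keyword "user/test")   ; looks qualified; check (namespace) to see whether it is
:user/test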

Metadata

Metadata is data describing other data. A Clojure object that supports metadata (symbols and the collection types, among others) can have a single other object (any object implementing IPersistentMap) attached to it as metadata, e.g. a Vector can have a hashmap attached to it as metadata.

Attaching metadata to an object does not modify the object but rather creates a new object---effectively, an object with different metadata is a different object. However, equality comparisons ignore metadata.
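Attaching and reading metadata can be sketched with with-meta and meta. The names plain and tagged and the :source key are invented for the example.

```clojure
(def plain [1 2 3])

;; with-meta returns a new vector carrying the given map as metadata.
(def tagged (with-meta plain {:source "sensor"}))

(meta tagged)               ; {:source "sensor"}
(meta plain)                ; nil: the original is untouched
(= plain tagged)            ; true: equality ignores metadata
(identical? plain tagged)   ; false: they are different objects
```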


 