New readers: Please report stumbling blocks! While the material on this page is intended to explain clearly, there are always mental traps that innocent readers new to the subject fall in but that the authors are not aware of. Please report any tricky passages to the Talk page or the #haskell IRC channel so that the style of exposition can be improved. 
Contents
IntroductionEdit
This chapter explains how to formalize the meaning of Haskell programs, the denotational semantics. It may seem to be nitpicking to formally specify that the program square x = x*x
means the same as the mathematical square function that maps each number to its square, but what about the meaning of a program like f x = f (x+1)
that loops forever? In the following, we will exemplify the approach first taken by Scott and Strachey to this question and obtain a foundation to reason about the correctness of functional programs in general and recursive definitions in particular. Of course, we will concentrate on those topics needed to understand Haskell programs.^{[1]}
Another aim of this chapter is to illustrate the notions strict and lazy that capture the idea that a function needs or needs not to evaluate its argument. This is a basic ingredient to predict the course of evaluation of Haskell programs and hence of primary interest to the programmer. Interestingly, these notions can be formulated concisely with denotational semantics alone, no reference to an execution model is necessary. They will be put to good use in Graph Reduction, but it is this chapter that will familiarize the reader with the denotational definition and involved notions such as ⊥ ("Bottom"). The reader only interested in strictness may wish to poke around in section Bottom and Partial Functions and quickly head over to Strict and NonStrict Semantics.
What are Denotational Semantics and what are they for?Edit
What does a Haskell program mean? This question is answered by the denotational semantics of Haskell. In general, the denotational semantics of a programming language map each of its programs to a mathematical object (denotation), that represents the meaning of the program in question. As an example, the mathematical object for the Haskell programs 10
, 9+1
, 2*5
and sum [1..4]
can be represented by the integer 10. We say that all those programs denote the integer 10. The collection of such mathematical objects is called the semantic domain.
The mapping from program code to a semantic domain is commonly written down with double square brackets ("Oxford brackets") around program code. For example,
Denotations are compositional, i.e. the meaning of a program like 1+9
only depends on the meaning of its constituents:
The same notation is used for types, i.e.
For simplicity however, we will silently identify expressions with their semantic objects in subsequent chapters and use this notation only when clarification is needed.
It is one of the key properties of purely functional languages like Haskell that a direct mathematical interpretation like "1+9
denotes 10" carries over to functions, too: in essence, the denotation of a program of type Integer > Integer
is a mathematical function between integers. While we will see that this expression needs refinement generally, to include nontermination, the situation for imperative languages is clearly worse: a procedure with that type denotes something that changes the state of a machine in possibly unintended ways. Imperative languages are tightly tied to operational semantics which describes their way of execution on a machine. It is possible to define a denotational semantics for imperative programs and to use it to reason about such programs, but the semantics often has operational nature and sometimes must be extended in comparison to the denotational semantics for functional languages.^{[2]} In contrast, the meaning of purely functional languages is by default completely independent from their way of execution. The Haskell98 standard even goes as far as to specify only Haskell's nonstrict denotational semantics, leaving open how to implement them.
In the end, denotational semantics enables us to develop formal proofs that programs indeed do what we want them to do mathematically. Ironically, for proving program properties in daytoday Haskell, one can use Equational reasoning, which transforms programs into equivalent ones without seeing much of the underlying mathematical objects we are concentrating on in this chapter. But the denotational semantics actually show up whenever we have to reason about nonterminating programs, for instance in Infinite Lists.
Of course, because they only state what a program is, denotational semantics cannot answer questions about how long a program takes or how much memory it eats; this is governed by the evaluation strategy which dictates how the computer calculates the normal form of an expression. On the other hand, the implementation has to respect the semantics, and to a certain extent, it is the semantics that determine how Haskell programs must be evaluated on a machine. We will elaborate on this in Strict and NonStrict Semantics.
What to choose as Semantic Domain?Edit
We are now looking for suitable mathematical objects that we can attribute to every Haskell program. In case of the example 10
, 2*5
and sum [1..4]
, it is clear that all expressions should denote the integer 10. Generalizing, every value x
of type Integer
is likely to denote an element of the set . The same can be done with values of type Bool
. For functions like f :: Integer > Integer
, we can appeal to the mathematical definition of "function" as a set of (argument,value)pairs, its graph.
But interpreting functions as their graph was too quick, because it does not work well with recursive definitions. Consider the definition
shaves :: Integer > Integer > Bool 1 `shaves` 1 = True 2 `shaves` 2 = False 0 `shaves` x = not (x `shaves` x) _ `shaves` _ = False
We can think of 0
,1
and 2
as being male persons with long beards and the question is who shaves whom. Person 1
shaves himself, but 2
gets shaved by the barber 0
because evaluating the third equation yields 0 `shaves` 2 == True
. In general, the third line says that the barber 0
shaves all persons that do not shave themselves.
What about the barber himself, is 0 `shaves` 0
true or not? If it is, then the third equation says that it is not. If it is not, then the third equation says that it is. Puzzled, we see that we just cannot attribute True
or False
to 0 `shaves` 0
, the graph we use as interpretation for the function shaves
must have an empty spot. We realize that our semantic objects must be able to incorporate partial functions, functions that are undefined for some arguments.
It is well known that this famous example gave rise to serious foundational problems in set theory. It's an example of an impredicative definition, a definition which uses itself, a logical circle. Unfortunately for recursive definitions, the circle is not the problem but the feature.
Bottom and Partial FunctionsEdit
⊥ BottomEdit
To define partial functions, we introduce a special value ⊥, named bottom and commonly written __
in typewriter font. We say that ⊥ is the completely "undefined" value or function. Every basic data type like Integer
or ()
contains one ⊥ besides their usual elements. So the possible values of type Integer
are
Adding ⊥ to the set of values is also called lifting. This is often depicted by a subscript like in . While this is the correct notation for the mathematical set "lifted integers", we prefer to talk about "values of type Integer
". We do this because suggests that there are "real" integers , but inside Haskell, the "integers" are Integer
.
As another example, the type ()
with only one element actually has two inhabitants:
For now, we will stick to programming with Integer
s. Arbitrary algebraic data types will be treated in section Algebraic Data Types as strict and nonstrict languages diverge on how these include ⊥.
In Haskell, the expression undefined
denotes ⊥. With its help, one can indeed verify some semantic properties of actual Haskell programs. undefined
has the polymorphic type forall a . a
which of course can be specialized to undefined :: Integer
, undefined :: ()
, undefined :: Integer > Integer
and so on. In the Haskell Prelude, it is defined as
undefined = error "Prelude.undefined"
As a side note, it follows from the CurryHoward isomorphism that any value of the polymorphic type forall a . a
must denote ⊥.
Partial Functions and the Semantic Approximation OrderEdit
Now, (bottom type) gives us the possibility to denote partial functions:
Here, yields well defined values for and but gives for all other . Note that the type is universal, as has no value: the function :: Integer > Integer
is given by
where the on the right hand side denotes a value of type Integer
.
To formalize, partial functions say, of type Integer > Integer
are at least mathematical mappings from the lifted integers to the lifted integers. But this is not enough, since it does not acknowledge the special role of . For example, the definition
looks counterintuitive, and, in fact, is wrong. Why does yield a defined value whereas is undefined? The intuition is that every partial function should yield more defined answers for more defined arguments. To formalize, we can say that every concrete number is more defined than :
Here, denotes that is more defined than . Likewise, will denote that either is more defined than or both are equal (and so have the same definedness). is also called the semantic approximation order because we can approximate defined values by less defined ones thus interpreting "more defined" as "approximating better". Of course, is designed to be the least element of a data type, we always have that for all , except the case when happens to denote itself:
As no number is more defined than another, the mathematical relation is false for any pair of numbers:
This is contrasted to ordinary order predicate , which can compare any two numbers. A quick way to remember this is the sentence: " and are different in terms of information content but are equal in terms of information quantity". That's another reason why we use a different symbol: .
One says that specifies a partial order and that the values of type Integer
form a partially ordered set (poset for short). A partial order is characterized by the following three laws
 Reflexivity, everything is just as defined as itself: for all
 Transitivity: if and , then
 Antisymmetry: if both and hold, then and must be equal: .
Exercises 

Do the integers form a poset with respect to the order ? 
We can depict the order on the values of type Integer
by the following graph
where every link between two nodes specifies that the one above is more defined than the one below. Because there is only one level (excluding ), one says that Integer
is a flat domain. The picture also explains the name of : it's called bottom because it always sits at the bottom.
MonotonicityEdit
Our intuition about partial functions now can be formulated as following: every partial function is a monotone mapping between partially ordered sets. More defined arguments will yield more defined values:
In particular, a function with is constant: for all . Note that here it is crucial that etc. don't hold.
Translated to Haskell, monotonicity means that we cannot use as a condition, i.e. we cannot pattern match on , or its equivalent undefined
. Otherwise, the example from above could be expressed as a Haskell program. As we shall see later, also denotes nonterminating programs, so that the inability to observe inside Haskell is related to the halting problem.
Of course, the notion of more defined than can be extended to partial functions by saying that a function is more defined than another if it is so at every possible argument:
Thus, the partial functions also form a poset, with the undefined function being the least element.
Recursive Definitions as Fixed Point IterationsEdit
Approximations of the Factorial FunctionEdit
Now that we have a means to describe partial functions, we can give an interpretation to recursive definitions. Lets take the prominent example of the factorial function whose recursive definition is
Although we saw that interpreting this recursive function directly as a set description may lead to problems, we intuitively know that in order to calculate for every given we have to iterate the right hand side. This iteration can be formalized as follows: we calculate a sequence of functions with the property that each one consists of the right hand side applied to the previous one, that is
We start with the undefined function , and the resulting sequence of partial functions reads:
and so on. Clearly,
and we expect that the sequence converges to the factorial function.
The iteration follows the well known scheme of a fixed point iteration
In our case, is a function and is a functional, a mapping between functions. We have
If we start with , the iteration will yield increasingly defined approximations to the factorial function
(Proof that the sequence increases: The first inequality follows from the fact that is less defined than anything else. The second inequality follows from the first one by applying to both sides and noting that is monotone. The third follows from the second in the same fashion and so on.)
It is very illustrative to formulate this iteration scheme in Haskell. As functionals are just ordinary higher order functions, we have
g :: (Integer > Integer) > (Integer > Integer) g x = \n > if n == 0 then 1 else n * x (n1) x0 :: Integer > Integer x0 = undefined (f0:f1:f2:f3:f4:fs) = iterate g x0
We can now evaluate the functions f0,f1,...
at sample arguments and see whether they yield undefined
or not:
> f3 0 1 > f3 1 1 > f3 2 2 > f3 5 *** Exception: Prelude.undefined > map f3 [0..] [1,1,2,*** Exception: Prelude.undefined > map f4 [0..] [1,1,2,6,*** Exception: Prelude.undefined > map f1 [0..] [1,*** Exception: Prelude.undefined
Of course, we cannot use this to check whether f4 is really undefined for all arguments.
ConvergenceEdit
To the mathematician, the question whether this sequence of approximations converges is still to be answered. For that, we say that a poset is a directed complete partial order (dcpo) iff every monotone sequence (also called chain) has a least upper bound (supremum)
If that's the case for the semantic approximation order, we clearly can be sure that monotone sequence of functions approximating the factorial function indeed has a limit. For our denotational semantics, we will only meet dcpos which have a least element which are called complete partial orders (cpo).
The Integer
s clearly form a (d)cpo, because the monotone sequences consisting of more than one element must be of the form
where is an ordinary number. Thus, is already the least upper bound.
For functions Integer > Integer
, this argument fails because monotone sequences may be of infinite length. But because Integer
is a (d)cpo, we know that for every point , there is a least upper bound
As the semantic approximation order is defined pointwise, the function is the supremum we looked for.
These have been the last touches for our aim to transform the impredicative definition of the factorial function into a well defined construction. Of course, it remains to be shown that actually yields a defined value for every , but this is not hard and far more reasonable than a completely illformed definition.
Bottom includes NonTerminationEdit
It is instructive to try our newly gained insight into recursive definitions on an example that does not terminate:
The approximating sequence reads
and consists only of . Clearly, the resulting limit is again. From an operational point of view, a machine executing this program will loop indefinitely. We thus see that may also denote a nonterminating function or value. Hence, given the halting problem, pattern matching on in Haskell is impossible.
Interpretation as Least Fixed PointEdit
Earlier, we called the approximating sequence an example of the well known "fixed point iteration" scheme. And of course, the definition of the factorial function can also be thought as the specification of a fixed point of the functional :
However, there might be multiple fixed points. For instance, there are several which fulfill the specification
Of course, when executing such a program, the machine will loop forever on or and thus not produce any valuable information about the value of . This corresponds to choosing the least defined fixed point as semantic object and this is indeed a canonical choice. Thus, we say that
defines the least fixed point of . Clearly, least is with respect to our semantic approximation order .
The existence of a least fixed point is guaranteed by our iterative construction if we add the condition that must be continuous (sometimes also called "chain continuous"). That simply means that respects suprema of monotone sequences:
We can then argue that with
we have
and the iteration limit is indeed a fixed point of . You may also want to convince yourself that the fixed point iteration yields the least fixed point possible.
Exercises 

Prove that the fixed point obtained by fixed point iteration starting with is also the least one, that it is smaller than any other fixed point. (Hint: is the least element of our cpo and is monotone) 
By the way, how do we know that each Haskell function we write down indeed is continuous? Just as with monotonicity, this has to be enforced by the programming language. Admittedly, these properties can somewhat be enforced or broken at will, so the question feels a bit void. But intuitively, monotonicity is guaranteed by not allowing pattern matches on . For continuity, we note that for an arbitrary type a
, every simple function a > Integer
is automatically continuous because the monotone sequences of Integer
s are of finite length. Any infinite chain of values of type a
gets mapped to a finite chain of Integer
s and respect for suprema becomes a consequence of monotonicity. Thus, all functions of the special case Integer > Integer
must be continuous. For functionals like ::(Integer > Integer) > (Integer > Integer)
, the continuity then materializes due to currying, as the type is isomorphic to ::((Integer > Integer), Integer) > Integer
and we can take a=((Integer > Integer), Integer)
.
In Haskell, the fixed interpretation of the factorial function can be coded as
factorial = fix g
with the help of the fixed point combinator
fix :: (a > a) > a
.We can define it by
fix f = let x = f x in x
which leaves us somewhat puzzled because when expanding , the result is not anything different from how we would have defined the factorial function in Haskell in the first place. But of course, the construction this whole section was about is not at all present when running a real Haskell program. It's just a means to put the mathematical interpretation of Haskell programs on a firm ground. Yet it is very nice that we can explore these semantics in Haskell itself with the help of undefined
.
Strict and NonStrict SemanticsEdit
After having elaborated on the denotational semantics of Haskell programs, we will drop the mathematical function notation for semantic objects in favor of their now equivalent Haskell notation f n
.
Strict FunctionsEdit
A function f
with one argument is called strict, if and only if
f ⊥ = ⊥
.Here are some examples of strict functions
id x = x succ x = x + 1 power2 0 = 1 power2 n = 2 * power2 (n1)
and there is nothing unexpected about them. But why are they strict? It is instructive to prove that these functions are indeed strict. For id
, this follows from the definition. For succ
, we have to ponder whether ⊥ + 1
is ⊥
or not. If it was not, then we should for example have ⊥ + 1 = 2
or more general ⊥ + 1 = k
for some concrete number k. We remember that every function is monotone, so we should have for example
2 = ⊥ + 1 ⊑ 4 + 1 = 5
as ⊥ ⊑ 4
. But neither of 2 ⊑ 5
, 2 = 5
nor 2 ⊒ 5
is valid so that k cannot be 2. In general, we obtain the contradiction
k = ⊥ + 1 ⊑ k + 1 = k + 1
.and thus the only possible choice is
succ ⊥ = ⊥ + 1 = ⊥
and succ
is strict.
Exercises 

Prove that power2 is strict. While one can base the proof on the "obvious" fact that power2 n is , the latter is preferably proven using fixed point iteration. 
NonStrict and Strict LanguagesEdit
Searching for nonstrict functions, it happens that there is only one prototype of a nonstrict function of type Integer > Integer
:
one x = 1
Its variants are constk x = k
for every concrete number k
. Why are these the only ones possible? Remember that one n
can be no less defined than one ⊥
. As Integer
is a flat domain, both must be equal.
Why is one
nonstrict? To see that it is, we use a Haskell interpreter and try
> one (undefined :: Integer) 1
which is not ⊥. This is reasonable as one
completely ignores its argument. When interpreting ⊥ in an operational sense as "nontermination", one may say that the nonstrictness of one
means that it does not force its argument to be evaluated and therefore avoids the infinite loop when evaluating the argument ⊥. But one might as well say that every function must evaluate its arguments before computing the result which means that one ⊥
should be ⊥, too. That is, if the program computing the argument does not halt, one
should not halt as well.^{[3]} It shows up that one can choose freely this or the other design for a functional programming language. One says that the language is strict or nonstrict depending on whether functions are strict or nonstrict by default. The choice for Haskell is nonstrict. In contrast, the functional languages ML and Lisp choose strict semantics.
Functions with several ArgumentsEdit
The notion of strictness extends to functions with several variables. For example, a function f
of two arguments is strict in the second argument if and only if
f x ⊥ = ⊥
for every x
. But for multiple arguments, mixed forms where the strictness depends on the given value of the other arguments, are much more common. An example is the conditional
cond b x y = if b then x else y
We see that it is strict in y
depending on whether the test b
is True
or False
:
cond True x ⊥ = x cond False x ⊥ = ⊥
and likewise for x
. Apparently, cond
is certainly ⊥ if both x
and y
are, but not necessarily when at least one of them is defined. This behavior is called joint strictness.
Clearly, cond
behaves like the ifthenelse statement where it is crucial not to evaluate both the then
and the else
branches:
if null xs then 'a' else head xs if n == 0 then 1 else 5 / n
Here, the else part is ⊥ when the condition is met. Thus, in a nonstrict language, we have the possibility to wrap primitive control statements such as ifthenelse into functions like cond
. This way, we can define our own control operators. In a strict language, this is not possible as both branches will be evaluated when calling cond
which makes it rather useless. This is a glimpse of the general observation that nonstrictness offers more flexibility for code reuse than strictness. See the chapter Laziness^{[4]} for more on this subject.
Algebraic Data TypesEdit
After treating the motivation case of partial functions between Integer
s, we now want to extend the scope of denotational semantics to arbitrary algebraic data types in Haskell.
A word about nomenclature: the collection of semantic objects for a particular type is usually called a domain. This term is more a generic name than a particular definition and we decide that our domains are cpos (complete partial orders), that is sets of values together with a relation more defined that obeys some conditions to allow fixed point iteration. Usually, one adds additional conditions to the cpos that ensure that the values of our domains can be represented in some finite way on a computer and thereby avoiding to ponder the twisted ways of uncountable infinite sets. But as we are not going to prove general domain theoretic theorems, the conditions will just happen to hold by construction.
ConstructorsEdit
Let's take the example types
data Bool = True  False data Maybe a = Just a  Nothing
Here, True
, False
and Nothing
are nullary constructors whereas Just
is a unary constructor. The inhabitants of Bool
form the following domain:
Remember that ⊥ is added as least element to the set of values True
and False
, we say that the type is lifted^{[5]}. A domain whose poset diagram consists of only one level is called a flat domain. We already know that is a flat domain as well, it's just that the level above ⊥ has an infinite number of elements.
What are the possible inhabitants of Maybe Bool
? They are
⊥, Nothing, Just ⊥, Just True, Just False
So the general rule is to insert all possible values into the unary (binary, ternary, ...) constructors as usual but without forgetting ⊥. Concerning the partial order, we remember the condition that the constructors should be monotone just as any other functions. Hence, the partial order looks as follows
But there is something to ponder: why isn't Just ⊥ = ⊥
? I mean "Just undefined" is as undefined as "undefined"! The answer is that this depends on whether the language is strict or nonstrict. In a strict language, all constructors are strict by default, i.e. Just ⊥ = ⊥
and the diagram would reduce to
As a consequence, all domains of a strict language are flat.
But in a nonstrict language like Haskell, constructors are nonstrict by default and Just ⊥
is a new element different from ⊥, because we can write a function that reacts differently to them:
f (Just _) = 4 f Nothing = 7
As f
ignores the contents of the Just
constructor, f (Just ⊥)
is 4
but f ⊥
is ⊥
(intuitively, if f
is passed ⊥, it will not be possible to tell whether to take the Just branch or the Nothing branch, and so ⊥ will be returned).
This gives rise to nonflat domains as depicted in the former graph. What should these be of use for? In the context of Graph Reduction, we may also think of ⊥ as an unevaluated expression. Thus, a value x = Just ⊥
may tell us that a computation (say a lookup) succeeded and is not Nothing
, but that the true value has not been evaluated yet. If we are only interested in whether x
succeeded or not, this actually saves us from the unnecessary work to calculate whether x
is Just True
or Just False
as would be the case in a flat domain. The full impact of nonflat domains will be explored in the chapter Laziness, but one prominent example are infinite lists treated in section Recursive Data Types and Infinite Lists.
Pattern MatchingEdit
In the section Strict Functions, we proved that some functions are strict by inspecting their results on different inputs and insisting on monotonicity. However, in the light of algebraic data types, there can only be one source of strictness in real life Haskell: pattern matching, i.e. case
expressions. The general rule is that pattern matching on a constructor of a data
type will force the function to be strict, i.e. matching ⊥ against a constructor always gives ⊥. For illustration, consider
const1 _ = 1
const1' True = 1 const1' False = 1
The first function const1
is nonstrict whereas the const1'
is strict because it decides whether the argument is True
or False
although its result doesn't depend on that. Pattern matching in function arguments is equivalent to case
expressions
const1' x = case x of True > 1 False > 1
which similarly impose strictness on x
: if the argument to the case
expression denotes ⊥ the whole case
will denote ⊥, too. However, the argument for case expressions may be more involved as in
foo k table = case lookup ("Foo." ++ k) table of Nothing > ... Just x > ...
and it can be difficult to track what this means for the strictness of foo
.
An example for multiple pattern matches in the equational style is the logical or
:
or True _ = True or _ True = True or _ _ = False
Note that equations are matched from top to bottom. The first equation for or
matches the first argument against True
, so or
is strict in its first argument. The same equation also tells us that or True x
is nonstrict in x
. If the first argument is False
, then the second will be matched against True
and or False x
is strict in x
. Note that while wildcards are a general sign of nonstrictness, this depends on their position with respect to the pattern matches against constructors.
Exercises 


There is another form of pattern matching, namely irrefutable patterns marked with a tilde ~
. Their use is demonstrated by
f ~(Just x) = 1 f Nothing = 2
An irrefutable pattern always succeeds (hence the name) resulting in f ⊥ = 1
. But when changing the definition of f
to
f ~(Just x) = x + 1 f Nothing = 2  this line may as well be left away
we have
f ⊥ = ⊥ + 1 = ⊥ f (Just 1) = 1 + 1 = 2
If the argument matches the pattern, x
will be bound to the corresponding value. Otherwise, any variable like x
will be bound to ⊥.
By default, let
and where
bindings are nonstrict, too:
foo key map = let Just x = lookup key map in ...
is equivalent to
foo key map = case (lookup key map) of ~(Just x) > ...
Exercises 


Recursive Data Types and Infinite ListsEdit
The case of recursive data structures is not very different from the base case. Consider a list of unit values
data List = []  () : List
Though this seems like a simple type, there is a surprisingly complicated number of ways you can fit in here and there, and therefore the corresponding graph is complicated. The bottom of this graph is shown below. An ellipsis indicates that the graph continues along this direction. A red ellipse behind an element indicates that this is the end of a chain; the element is in normal form.
and so on. But now, there are also chains of infinite length like
⊥ ():⊥ ():():⊥ ...
This causes us some trouble as we noted in section Convergence that every monotone sequence must have a least upper bound. This is only possible if we allow for infinite lists. Infinite lists (sometimes also called streams) turn out to be very useful and their manifold use cases are treated in full detail in chapter Laziness. Here, we will show what their denotational semantics should be and how to reason about them. Note that while the following discussion is restricted to lists only, it easily generalizes to arbitrary recursive data structures like trees.
In the following, we will switch back to the standard list type
data [a] = []  a : [a]
to close the syntactic gap to practical programming with infinite lists in Haskell.
Exercises 


Calculating with infinite lists is best shown by example. For that, we need an infinite list
ones :: [Integer] ones = 1 : ones
When applying the fixed point iteration to this recursive definition, we see that ones
ought to be the supremum of
⊥ 1:⊥ 1:1:⊥ 1:1:1:⊥ ...
,that is an infinite list of 1
. Let's try to understand what take 2 ones
should be. With the definition of take
take 0 _ = [] take n (x:xs) = x : take (n1) xs take n [] = []
we can apply take
to elements of the approximating sequence of ones
:
take 2 ⊥ ==> ⊥ take 2 (1:⊥) ==> 1 : take 1 ⊥ ==> 1 : ⊥ take 2 (1:1:⊥) ==> 1 : take 1 (1:⊥) ==> 1 : 1 : take 0 ⊥ ==> 1 : 1 : []
We see that take 2 (1:1:1:⊥)
and so on must be the same as take 2 (1:1:⊥) = 1:1:[]
because 1:1:[]
is fully defined. Taking the supremum on both the sequence of input lists and the resulting sequence of output lists, we can conclude
take 2 ones = 1:1:[]
Thus, taking the first two elements of ones
behaves exactly as expected.
Generalizing from the example, we see that reasoning about infinite lists involves considering the approximating sequence and passing to the supremum, the truly infinite list. Still, we did not give it a firm ground. The solution is to identify the infinite list with the whole chain itself and to formally add it as a new element to our domain: the infinite list is the sequence of its approximations. Of course, any infinite list like ones
can be compactly depicted as
ones = 1 : 1 : 1 : 1 : ...
what simply means that
ones = (⊥ 1:⊥ 1:1:⊥ ...)
Exercises 


What about the puzzle of how a computer can calculate with infinite lists? It takes an infinite amount of time, after all? Well, this is true. But the trick is that the computer may well finish in a finite amount of time if it only considers a finite part of the infinite list. So, infinite lists should be thought of as potentially infinite lists. In general, intermediate results take the form of infinite lists whereas the final value is finite. It is one of the benefits of denotational semantics that one treat the intermediate infinite data structures as truly infinite when reasoning about program correctness.
Exercises 


As a last note, the construction of a recursive domain can be done by a fixed point iteration similar to recursive definition for functions. Yet, the problem of infinite chains has to be tackled explicitly. See the literature in External Links for a formal construction.
Haskell specialities: Strictness Annotations and NewtypesEdit
Haskell offers a way to change the default nonstrict behavior of data type constructors by strictness annotations. In a data declaration like
data Maybe' a = Just' !a  Nothing'
an exclamation point !
before an argument of the constructor specifies that it should be strict in this argument. Hence we have Just' ⊥ = ⊥
in our example. Further information may be found in chapter Strictness.
In some cases, one wants to rename a data type, like in
data Couldbe a = Couldbe (Maybe a)
However, Couldbe a
contains both the elements ⊥
and Couldbe ⊥
. With the help of a newtype
definition
newtype Couldbe a = Couldbe (Maybe a)
we can arrange that Couldbe a
is semantically equal to Maybe a
, but different during type checking. In particular, the constructor Couldbe
is strict. Yet, this definition is subtly different from
data Couldbe' a = Couldbe' !(Maybe a)
To explain how, consider the functions
f (Couldbe m) = 42 f' (Couldbe' m) = 42
Here, f' ⊥
will cause the pattern match on the constructor Couldbe'
fail with the effect that f' ⊥ = ⊥
. But for the newtype, the match on Couldbe
will never fail, we get f ⊥ = 42
. In a sense, the difference can be stated as:
 for the strict case,
Couldbe' ⊥
is a synonym for ⊥  for the newtype, ⊥ is a synonym for
Couldbe ⊥
with the agreement that a pattern match on ⊥ fails and that a match on Constructor ⊥
does not.
Newtypes may also be used to define recursive types. An example is the alternate definition of the list type [a]
newtype List a = In (Maybe (a, List a))
Again, the point is that the constructor In
does not introduce an additional lifting with ⊥.
Other Selected TopicsEdit
Abstract Interpretation and Strictness AnalysisEdit
As lazy evaluation means a constant computational overhead, a Haskell compiler may want to discover where inherent nonstrictness is not needed at all which allows it to drop the overhead at these particular places. To that extent, the compiler performs strictness analysis just like we proved in some functions to be strict section Strict Functions. Of course, details of strictness depending on the exact values of arguments like in our example cond
are out of scope (this is in general undecidable). But the compiler may try to find approximate strictness information and this works in many common cases like power2
.
Now, abstract interpretation is a formidable idea to reason about strictness: ...
For more about strictness analysis, see the research papers about strictness analysis on the Haskell wiki.
Interpretation as PowersetsEdit
So far, we have introduced ⊥ and the semantic approximation order abstractly by specifying their properties. However, both as well as any inhabitants of a data type like Just ⊥
can be interpreted as ordinary sets. This is called the powerset construction. NOTE: i'm not sure whether this is really true. Someone how knows, please correct this.
The idea is to think of ⊥ as the set of all possible values and that a computation retrieves more information this by choosing a subset. In a sense, the denotation of a value starts its life as the set of all values which will be reduced by computations until there remains a set with a single element only.
As an example, consider Bool
where the domain looks like
{True} {False} \ / \ / ⊥ = {True, False}
The values True
and False
are encoded as the singleton sets {True}
and {False}
and ⊥ is the set of all possible values.
Another example is Maybe Bool
:
{Just True} {Just False} \ / \ / {Nothing} {Just True, Just False} \ / \ / ⊥ = {Nothing, Just True, Just False}
We see that the semantic approximation order is equivalent to set inclusion, but with arguments switched:
This approach can be used to give a semantics to exceptions in Haskell^{[6]}.
Naïve Sets are unsuited for Recursive Data TypesEdit
In the section What to choose as Semantic Domain?, we argued that taking simple sets as denotation for types doesn't work well with partial functions. In the light of recursive data types, things become even worse as John C. Reynolds showed in his paper Polymorphism is not settheoretic^{[7]}.
Reynolds actually considers the recursive type
newtype U = In ((U > Bool) > Bool)
Interpreting Bool
as the set {True,False}
and the function type A > B
as the set of functions from A
to B
, the type U
cannot denote a set. This is because (A > Bool)
is the set of subsets (powerset) of A
which, due to a diagonal argument analogous to Cantor's argument that there are "more" real numbers than natural ones, always has a bigger cardinality than A
. Thus, (U > Bool) > Bool
has an even bigger cardinality than U
and there is no way for it to be isomorphic to U
. Hence, the set U
must not exist, a contradiction.
In our world of partial functions, this argument fails. Here, an element of U
is given by a sequence of approximations taken from the sequence of domains
⊥, (⊥ > Bool) > Bool, (((⊥ > Bool) > Bool) > Bool) > Bool
and so onwhere ⊥ denotes the domain with the single inhabitant ⊥. While the author of this text admittedly has no clue on what such a thing should mean, the constructor gives a perfectly well defined object for U
. We see that the type (U > Bool) > Bool
merely consists of shifted approximating sequences which means that it is isomorphic to U
.
As a last note, Reynolds actually constructs an equivalent of U
in the second order polymorphic lambda calculus. There, it happens that all terms have a normal form, i.e. there are only total functions when we do not include a primitive recursion operator fix :: (a > a) > a
. Thus, there is no true need for partial functions and ⊥, yet a naïve set theoretic semantics fails. We can only speculate that this has to do with the fact that not every mathematical function is computable. In particular, the set of computable functions A > Bool
should not have a bigger cardinality than A
.
Notes
 ↑ In fact, there are no written down and complete denotational semantics of Haskell. This would be a tedious task void of additional insight and we happily embrace the folklore and common sense semantics.
 ↑ Monads are one of the most successful ways to give denotational semantics to imperative programs. See also Haskell/Advanced monads.
 ↑ Strictness as premature evaluation of function arguments is elaborated in the chapter Graph Reduction.
 ↑ The term Laziness comes from the fact that the prevalent implementation technique for nonstrict languages is called lazy evaluation
 ↑ The term lifted is somewhat overloaded, see also Unboxed Types.
 ↑ S. Peyton Jones, A. Reid, T. Hoare, S. Marlow, and F. Henderson. A semantics for imprecise exceptions. In Programming Languages Design and Implementation. ACM press, May 1999.
 ↑ John C. Reynolds. Polymorphism is not settheoretic. INRIA Rapports de Recherche No. 296. May 1984.
External LinksEdit
Online books about Denotational Semantics
 Schmidt, David A. (1986). Denotational Semantics. A Methodology for Language Development. Allyn and Bacon. http://www.cis.ksu.edu/~schmidt/text/densem.html.