Standard ML Programming/Types

Standard ML has a particularly strong static type system. Unlike many languages, it does not have subtypes ("is-a" relationships) or implicit casts between types.

Static typing

edit

Internally, all data in a computer is made up of bits (binary digits, either 0 or 1), which are grouped into bytes (minimally addressable groups of bits — eight bits in all modern machines), and usually in turn into words (groups of bytes that the CPU can treat as a single unit — nowadays usually either four bytes, i.e. 32 bits, or eight bytes, i.e. 64 bits).

However, it is fairly unusual that a program is intended to operate on bits, bytes, and words per se. Usually a program needs to operate on human-intelligible, abstract data-types: integers, say, or real numbers, or strings of characters — or all of these. Programs operate on bits and bytes mainly because these are how computers implement, or represent, these data-types. There are a variety of ways that programming languages address this discrepancy:

  • A language can provide no data-type abstractions, and require programmer code to deal explicitly in bits and bytes. It may provide operations intended for certain abstract data-types (such as addition of 32-bit integers), but leave it entirely up to program code to ensure that it only uses those operations on bytes representing those types. This approach is characteristic of assembly languages and, to some extent, C.
  • A language can provide exactly one data-type abstraction, used by all programmer code. This approach is characteristic of shell-scripting languages, which frequently operate almost exclusively on strings of characters.
  • A language can assign a type to each fragment of data (each value), and store that type assignment together with the value itself. When an operation is attempted on a value of an inappropriate data-type, the language either automatically converts it to the appropriate type (e.g., promoting an integer value to an equivalent value of a real-number type), or emits an error. This approach, where type information exists only at run-time, is called dynamic typing, and is characteristic of languages like Lisp, Python, Ruby and others.
  • A language can assign a type to each fragment of code (each expression). If a bit of code applies an operation to an expression of an inappropriate data-type, the compiler either infers additional code to perform the type conversion, or emits an error. This approach, where type information exists only at compile-time, is called static typing, and is characteristic of languages like Standard ML, OCaml, Haskell and others.

Most languages do not adhere strictly to one of the above approaches, but rather, use elements of more than one of them. The type system of Standard ML, however, uses static typing almost exclusively. This means that an ill-typed program will not even compile. (ML programmers consider this a good thing, as it allows many programming errors to be caught at compile-time that a dynamically-typed language would catch only at run-time.) To the extent that Standard ML does support dynamic typing, it is within the static-typing framework.

Strong typing

edit

The term strong typing is used in a wide variety of different ways (see the Wikipedia article “Strongly typed programming language” for a fairly thorough listing); nonetheless, it is fair to say that the Standard ML type system provides strong typing by almost all definitions. Every expression in an SML program has a specific type at compile-time, and its type at run-time is never in contravention of this. All type conversions are explicit (using functions such as real, which accepts an integer and returns an equivalent real number), and take the form of meaningful translations rather than mere re-interpretations of raw bits.

Basic types

edit

There are a number of basic types that may be thought of as "built in", firstly in that they're predefined by the Standard ML Basis library (so that Standard ML programs do not need to define them), and secondly in that the language provides literal notations for them, such as 34 (which is an integer), or "34" (which is a string of characters). Some of the most commonly used are:

  • int (integer), such as 3 or ~12. (Note that a tilde ~ is used for negative numbers.)
  • real (floating-point number), such as 4.2 or ~6.4.
    • Standard ML does not implicitly promote integers to floating-point numbers; therefore, an expression such as 2 + 5.67 is invalid. It must be written either as 2.0 + 5.67, or as real(2) + 5.67 (using the real function to convert 2 to 2.0).
  • string (string of characters), such as "this is a string" or "". (The latter is the empty string, which contains zero characters.)
  • char (one character), such as #"y" or #"\n". (The latter denotes the newline character, ASCII code 10.)
  • bool (Boolean value), which is either true or false.

The following code snippet declares two variables:

val n : int = 66
val x : real = ~23.0

After this snippet, n has type int and the value 66; x has type real and the value -23. Note that, unlike in some languages, these variable bindings are permanent; this n will always have the value 66 (though it's possible that other, unrelated variables, elsewhere in the program, will also have the name n, and those variables can have completely different types and values).

Type inference

edit

In above examples, we provided explicit type annotations to inform the compiler of the type of a variable. However, such type annotations are optional, and are rarely necessary. In most cases, the compiler simply infers the correct type. Therefore, the following two code snippets are equivalent:

val s : string = "example"
val b : bool = true
val s = "example"
val b = true

In examples below, we occasionally provide type annotations as a form of documentation; this has the nice property that the documentation's correctness is enforced, in that compiler will report an error if any type annotations are incorrect. In other cases we may include ordinary comments, of the form (* this is a comment *); this is a more flexible form of documentation, in that it can include any sort of text (rather than just type information), but of course its accuracy cannot be enforced by the compiler.

Tuples

edit

Types, including the above basic types, can be combined in a number of ways. One way is in a tuple, which is an ordered set of values; for example, the expression (1, 2) is of type int * int, and ("foo", false) is of type string * bool. There is also a 0-tuple, (), whose type is denoted unit. There are no 1-tuples, however; or rather, there is no distinction between (for example) (1) and 1, both having type int.

Tuples may be nested, and (unlike in some mathematical formalisms), (1,2,3) is distinct from both ((1,2),3) and (1,(2,3)). The first is of type int * int * int; the other two are of types (int * int) * int and int * (int * int), respectively.

expression type notes
() unit the 0-tuple
(3, "yes", "yes") int * string * string a 3-tuple (ordered triple)
(3, "yes", true) int * string * bool a 3-tuple (ordered triple)
((1, 2), 3) (int * int) * int a 2-tuple (ordered pair), whose first element is another 2-tuple

The following code snippet declares four variables. On the right side the environment after execution is shown. Notice the use of pattern matching to assign types and values to a and b and the use of projection in the assignment of also_a. This allows for a very convenient notation.

val pair = ("a", "b")
val (a, b) = pair
val also_a = #1 pair
identifier value type
pair ("a", "b") string * string
a "a" string
b "b" string
also_a "a" string

Records

edit

Another way to combine values is in a record. A record is quite like a tuple, except that its components are named rather than ordered; for example, { a = 5.0, b = "five" } is of type { a : real, b : string } (which is the same as type { b : string, a : real }).

In fact, in Standard ML, tuples are simply a special case of records; for example, the type int * string * bool is the same as the type { 1 : int, 2 : string, 3 : bool }.

Functions

edit

A function accepts a value and normally returns a value. For example, the factorial function we defined in the introduction:

fun factorial n =  if n < 1  then 1  else n * factorial (n - 1)

is of type int -> int, meaning that it accepts a value of type int and returns a value of type int.

Even if a function doesn't return a value at run-time — for example, if it raises an exception, or if it enters an infinite loop — it has a static return type at compile-time.

As with other types, we can provide explicit type annotations:

fun factorial (n : int) : int = if n < 1  then 1  else n * factorial (n - 1)

should we choose.

Tuples as arguments

edit

Although a Standard ML function must accept exactly one value (rather than taking a list of arguments), tuples and the above-mentioned pattern matching make this no restriction at all. For example, this code snippet:

fun sum (a, b) = a + b
fun average pair = sum pair div 2

creates two functions of type int * int -> int. This approach may also be used to create infix operators. This code snippet:

infix averaged_with
fun a averaged_with b = average (a, b)
val five = 3 averaged_with 7

establishes averaged_with into an infix operator, then creates it as a function of type int * int -> int.

And since a tuple is an ordinary type, a function can also return one. In this code snippet:

fun pair (n : int) = (n, n)

pair is of type int -> int * int.

Polymorphic data type

edit

In this code snippet:

fun pair x = (x, x)

the compiler has no way to infer a specific type for pair; it could be int -> int * int, real -> real * real, or even (int * real -> string) -> (int * real -> string) * (int * real -> string). Fortunately, it doesn't need to; it can simply assign it the polymorphic type 'a -> 'a * 'a, where 'a (pronounced "alpha") is a type variable, denoting any possible type. After the above definition, pair 3 and pair "x" are both well-defined, producing (3, 3) and ("x", "x") (respectively). A function can even depend on multiple type variables; in this snippet:

fun swap (x, y) = (y, x)

swap is of type 'a * 'b -> 'b * 'a. All or part of this can be indicated explicitly:

fun swap (x : 'a, y : 'b) : 'b * 'a = (y, x)

Functions as arguments, and curried functions

edit

A function can accept another function as an argument. For example, consider this code snippet:

fun pair_map (f, (x, y)) = (f x, f y)

This creates a function pair_map of type ('a -> 'b) * ('a * 'a) -> ('b * 'b), which applies its first argument (a function) to each element of its second argument (a pair), and returns the pair of results.

Conversely, a function can return a function. Above, we saw one way to create a two-argument function: accept a 2-tuple. Another approach, called currying, is to accept just the first argument, then return a function that accepts the second:

fun sum i j = i + j
val add_three = sum 3
val five = add_three 2
val ten = sum 5 5

This creates a function sum of type int -> int -> int (meaning int -> (int -> int)), a function add_three of type int -> int that returns three plus its argument, and integers five and ten.

Type declarations

edit

The type keyword may be used to create synonyms for existing data-types. For example, this code snippet:

type int_pair = int * int

creates the synonym int_pair for the data-type int * int. After that synonym has been created, a declaration like this one:

fun swap_int_pair ((i,j) : int_pair) = (j,i)

is exactly equivalent to one like this:

fun swap_int_pair (i : int, j : int) = (j,i)

As we shall see, this is mainly useful in modular programming, when creating a structure to match a given signature.

Datatype declarations

edit

The datatype keyword may be used to declare new data-types. For example, this code snippet:

datatype int_or_string = INT of int | STRING of string | NEITHER

creates an entirely new data-type int_or_string, along with new constructors (a sort of special function or value) INT, STRING, and NEITHER; each value of this type is either an INT with an integer, or a STRING with a string, or a NEITHER. We can then write:

val i = INT 3
val s = STRING "qq"
val n = NEITHER
val INT j = i

where the last declaration uses the pattern-matching facility to bind j to the integer 3.

Conceptually, these types resemble the enumerations or unions of a language such as C++, but they are completely type-safe, in that the compiler will distinguish the int_or_string type from every other type, and in that a value's constructor will be available at run-time to distinguish between the type's different variants (the different arms/branches/alternatives).

These data-types can be recursive:

datatype int_list = EMPTY | INT_LIST of int * int_list

creates a new type int_list, where each value of this type is either EMPTY (the empty list), or the concatenation of an integer with another int_list.

These data-types, like functions, can be polymorphic:

datatype 'a pair = PAIR of 'a * 'a

creates a new family of types 'a pair, such as int pair, string pair, and so on.

Lists

edit

One complex data-type provided by the Basis is the list. This is a recursive, polymorphic data-type, defined equivalently to this:

datatype 'a list = nil | :: of 'a * 'a list

where :: is an infix operator. So, for example, 3 :: 4 :: 5 :: nil is a list of three integers. Lists being one of the most common data-types in ML programs, the language also offers the special notation [3, 4, 5] for generating them.

The Basis also provides a number of functions for working with lists. One of these is length, which has type 'a list -> int, and which computes the length of a list. It may be defined like this:

fun length nil = 0
|   length (_::t) = 1 + length t

Another is rev, of type 'a list -> 'a list, which computes the reverse of a list — for example, it maps [ "a", "b", "c" ] to [ "c", "b", "a" ] — and may be defined like this:

local
  fun rev_helper (nil, ret) = ret
  |   rev_helper (h::t, ret) = rev_helper (t, h::ret)
in
  fun rev L = rev_helper (L, nil)
end

Exception declarations

edit

The built-in type exn (exception) resembles the types created by datatype declarations: it has variants, each with its own constructor. However, unlike with those types, new variants, with new constructors, can be added to the type, using exception declarations. This code snippet:

exception StringException of string
val e = StringException "example"
val StringException s = e

creates a constructor StringException of type string -> exn, a variable e of type exn, and a variable s of type string (its value being "example").

The exn type is unique in this regard; a type created within a program cannot be "added to" in this way.

References

edit

All of the above describe immutable forms of storage; for example, once a tuple is created, it cannot be changed (mutated). After this statement:

val x = (3, 4)

there is no way to change the value of x to be, say, (3, 5). (It is possible to create an entirely new x, that "shadows" the old one and has a completely different value — or even a completely different type — but that merely hides the old x, and won't affect any other values referring to it.)

The initial basis also provides mutable storage, in the form of references. In some ways, references behave as though they were defined like this:

datatype 'a ref = ref of 'a

For example, the following code snippet:

val reference : int ref = ref 12
val ref (twelve : int) = reference

binds the variable reference to a reference containing the value 12, and binds the variable twelve to the value 12.

However, the above snippet merely specifies the initial contents of the reference; the contents can be changed. This code snippet:

val () = reference := 13
val ref (thirteen : int) = reference

uses the built-in := function to modify the contents of reference. It then binds the new variable thirteen to the new value.

The Standard ML Basis Library defines a convenience function ! that retrieves the contents of a reference. It may be defined like this:

fun ! (ref x) = x

and is used like this:

val () = reference := 14
val fourteen = ! reference

Equality types

edit

Above, the concept of polymorphic types was discussed, and we have seen examples such as 'a * 'b -> 'b * 'a and 'a list. In these examples, the type applies to all possible types 'a and 'b. There also exists a slightly more restrictive type of polymorphism, which is restricted to equality types, denoted ''a, ''b, and so on. This type of polymorphism is generated by the polymorphic equality operator, =, which determines if two values are equal, and which has the type ''a * ''a -> bool. This means that both of its operands must be of the same type, and this type must be an equality type^ .

Of the "basic types" discussed above — int, real, string, char, and bool — all are equality types except for real. This means that 3 = 3, "3" = "3", #"3" = #"3", and true = true are all valid expressions evaluating to true; that 3 = 4, "3" = "4", #"3" = #"4", and true = false are all valid expressions evaluating to false; and that 3.0 = 3.0 and 3.0 = 4.0 are invalid expressions that the compiler will reject. The reason for this is that IEEE floating point equality breaks some of the requirements for equality in ML. In particular, nan is not equal to itself, so the relation is not reflexive.

Tuple and record types are equality types if and only if each component type is an equality type; for example, int * string, { b : bool, c : char }, and unit are equality types, whereas int * real and { x : real } are not.

Function types are never equality types, since in the general case it is impossible to determine whether two functions are equivalent.

A type created by a datatype declaration is an equality type if every variant is either a null constructor (one without an argument) or a constructor with an equality-type argument, and (in the case of polymorphic types) every type argument is an equality type. For example, this code snippet:

datatype suit = HEARTS | CLUBS | DIAMONDS | SPADES
datatype int_pair = INT_PAIR of int * int
datatype real_pair = REAL_PAIR of real * real
datatype 'a option = NONE | SOME of 'a

creates the equality types suit (null constructors only), int_pair (only one constructor, and its argument is the equality type int * int), and int option (one null constructor, and one constructor with the equality-type argument int), not to mention char option and string option and so on. It also creates the non-equality types real_pair (a constructor with argument of non-equality type real * real), real option, (int -> int) option, and so on.

A recursive type is an equality type if possible, and not otherwise. For example, consider the above-mentioned polymorphic type 'a list:

datatype 'a list = nil | :: of 'a * 'a list

Certainly real list is not an equality type, both because its type argument is not an equality type, and because real * real list (the type of ::'s argument) cannot be an equality type. However, there is no reason that int list cannot be an equality type, and so it is one. Note that this equality typing is only within a type; something like (nil : int list) = (nil : char list) would be invalid, because the two expressions are of different types, even though they have the same value. However, nil = nil and (nil : int list) = nil are both valid (and both evaluate to true).

The mutable type 'a ref is an equality type even if its component type is not. This is because two references are said to be equal if they identify the same ref cell (i.e., the same pointer, generated by the same call to the ref constructor). Therefore, for example, (ref 1) = (ref 1) and (ref 1.0) = (ref 1.0) are both valid — and both evaluate to false, because even though both references happen to point to identical values, the references themselves are separate, and each one can be mutated independently of the other.

However, a code snippet such as this one:

datatype 'a myref = myref of 'a ref

does not generate an equality type real myref, because its type argument is not an equality type — even though its sole constructor accepts an argument of an equality-type. As noted above, polymorphic types created with datatype declarations can only be equality types if their type arguments are, and though the built-in references are exempt from this restriction, they cannot be used to circumvent it. If it is desired that myref types always be equality types, one must use this approach:

datatype ''a myref = myref of ''a ref

which forbids real myref entirely (since the equality-type variable ''a cannot represent the non-equality type real).