Introduction to Programming Languages/Constructed Types

Constructed Types

Programming languages usually provide developers with very few primitive types. If programmers want to increase the available types, then they can define new composite types. If a type is a set, then a constructed type is just a new set built from other types.

Enumerations

Enumerations are the simplest kind of primitive types. In some programming languages enumerations are a subset of another type. In others, enumerations are an independent set of elements. In C we have the former, and in SML we have the latter way to define enumerations. The program below illustrates how enumerations are used in C:

#include <stdio.h>
enum coin { penny = 1, nickel = 5, dime = 10, quarter = 25 };
int main() {
  printf("%d\n", penny);
  printf("%d\n", penny + 1);
  printf("%d\n", penny + 4 == nickel);
}

As we can see in this figure, C's enumerations are a subset of the integers. Thus, any operation that can be applied on an integer can also be applied on elements of an enumeration. These enumerations are not even closed sets. In other words, it is possible to sum up two elements of the same enumeration, and to get a result that is not part of that type, e.g., penny + 1 is not in the coin enumeration, in our previous example.

In some programming languages enumerations are not a subset of an existing type. Instead, they are a set on its own. In SML, for example, enumerations are defined in this way:

datatype day = M | Tu | W | Th | F | Sa | Su;
fun isWeekend x = (x = Sa orelse x = Su);

An operation such as 1 + Tu, for instance, will not type check in SML, as enumerations are not integers. The only operation that is possible, in this case, is comparison, as we showed in the implementation of the function isWeekend.

Tuples

Tuples give us one of the most important kinds of constructed types. Conceptually, tuples are cartesian products of other types. An $\scriptstyle n$ -tuple is a sequence of $\scriptstyle n$ ordered elements. If a tuple $\scriptstyle T=T1\times T2$ is the product of two types, $\scriptstyle T1$ and $\scriptstyle T2$ , then $\scriptstyle T$ has as many elements as $\scriptstyle |T1|\times |T2|$ , where $\scriptstyle |T1|$ and $\scriptstyle |T2|$ are the cardinality of the sets $\scriptstyle T1$ and $\scriptstyle T2$ . As an example, below we have the declaration of a type a in SML:

- val a = (1.2, 3, "str");
val a = (1.2,3,"str") : real * int * string

The typical operation that we can apply on tuples is indexing its elements. In SML we index tuples via the # operator:

- val a = (1.2, 3, "str");
val a = (1.2,3,"str") : real * int * string
- #1 a;
val it = 1.2 : real
- val a = (1.2, ("string", true));
val a = (1.2,("string",true)) : real * (string * bool)
- #1 (#2 a);
val it = "string" : string

Tuples are very useful as a quick way to build aggregate data structures. These types are the only alternative that many programming languages provide to build functions that return multiple elements. For instance, the function below, written in Python, outputs, for each actual parameter, a pair with the parameter itself, and its ceil, i.e., its value rounded up:

>>> def logMap(n): return (n, math.ceil(n))
... 
>>> logMap(3.14)
(3.1400000000000001, 4.0)

In SML, as well as in any language that follows the lambda-calculus more closely, every function receives only one parameter. Tuples provide, in this case, an alternative to simulate the invocation of the function with multiple parameters:

- fun sum(a, b) = a + b;
val sum = fn : int * int -> int
- sum(2, 4);
val it = 6 : int
- val a = (2, 4);
val a = (2,4) : int * int
- sum a;
val it = 6 : int

Records

Some programming languages support tuples with named components. Tuples with named components are commonly known as record or structure types:

type complex = {
  rp:real,
  ip:real
};
fun getip (x : complex) = #ip x;

The C language has a construction named struct that can be used to represent tuples. The next example illustrates its use.

#include <stdio.h>

struct complex {
  double rp;
  double ip;
};

int main() {
  struct complex c1;
  c1.rp = 0;
  printf("%d\n", c1.rp);
  printf("%d\n", c1.ip);
}

The complex structure represents complex numbers with its real and imaginary parts. Each element of the struct has a name. Moreover, each element of a struct is associated with a type, which has an internal representation. Some languages, such as SML, hide this representation from the programmer. However, there are languages, such as C, that make this representation visible to the programmer. In this case, the language specification allows the programmer to tell exactly how a tuple is represented. It defines which parts of the representation must be the same in all C systems and which can be different in different implementation. Because the visibility of the internal representation of records in C, it is possible to create a program that iterates over the fields of a tuple. An an example, the program below defines a $\scriptstyle 5$ -tuple (mySt) and iterates over its elements using pointer manipulation.

#include <stdio.h>

struct mySt {
  int i1, i2, i3, i4, sentinel;
};

int main(int argc, char** argv) {
  struct mySt s;
  int *p1, *p4;
  s.i1 = 1;
  s.i2 = 2;
  s.i3 = 3;
  s.i4 = 4;

  p1 = ( (int*) &s);
  p4 = &(s.sentinel);
  do {
    printf("field = %d\n", *p1);
    p1++;
  } while (p1 != p4);
}

Lists

Lists are one of the most ubiquitous constructed types in functional programming languages. Lists have two important differences when compared to tuples:

The type declaration does not predefine a number of elements.
All the elements in a list must have the same type.

Additionally, the only operations that are generally allowed on lists are:

To read the head element.
To read the tail of the list.

The program below illustrates the use of a lists in SML:

- val a = [1,2,3];
val a = [1,2,3] : int list
- val x = hd a;
val x = 1 : int
- val y = tl a;
val y = [2,3] : int list

Arrays

Vectors are containers that store data having the same data type. The elements stored into an array are indexed. Many programming languages use integer numbers as indexes while others are more flexible and permit arrays to be indexed using a variety of types, such as integers, characters, enumerations, and so on. There are programming languages, such as Pascal, that give the developer the chance of choosing the lowest and highest indexes of an array. As an example, the code snippet below, written in Pascal, could be used to count the frequency of letters that occur in a document:

type
  LetterCount = array['a'..'z'] of Integer;

The approach adopted in Pascal makes it unnecessary to define the array size, which is automatically defined by the compiler. Contrary to Pascal, every array in C is indexed starting in zero. The array above, if written in C, would be like:

int LetterCount[26];

In some programming languages arrays have fixed size. Once they are declared, this size cannot be changed. However, there exist programming languages that provide arrays of variable length. Python is an example:

>>> a = [1,2,3,4]
>>> a
[1, 2, 3, 4]
>>> a[1:1] = [5,6,7]
>>> a
[1, 5, 6, 7, 2, 3, 4]

Some programming languages support multi-dimensional arrays. C is an example. The program below declares a 2-dimensional array, fills it up with integer numbers and then finds the sum of every cell:

#include "stdio.h"
int main() {
  const int M = 5000;
  const int N = 1000;
  char m[M][N];
  int i, j;
  int sum = 0;
  // Initializes the array:
  for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
      m[i][j] = (i + j) & 7;
    }
  }
  // Sums up the matrix elements, row major:
  for (i = 0; i < M; i++) {
    for (j = 0; j < N; j++) {
      sum += m[i][j];
    }
  }
  printf("The sum is %d\n", sum);
}

Contrary to lists, arrays provide developers with random accesses. That is, it is possible to read or update the $\scriptstyle n^{th}$ element of an array in $\scriptstyle O(1)$ . Notice that this operation is not usually allowed on lists. In that case, the operation of reading the $\scriptstyle n^{th}$ element is $\scriptstyle O(n)$ . Usually this restriction does not limit the efficiency of the algorithms that can be developed in array-free languages; however, there are some algorithms that cannot be implemented so efficiently in these languages. An example is the problem of determining if two strings are anagrams. In a programming language that provides random-access arrays, it is possible to implement this function in $\scriptstyle O(n)$ , where $\scriptstyle n$ is the size of the largest string. Otherwise, the most efficient implementation will be $\scriptstyle O(ln{\mbox{ }}n)$ . The program below, written in Java, solves the anagram problem in linear time:

public class Anagrams {
  public static void main(String args[]) {
    if (args.length != 2) {
      System.err.println("Syntax: java Anagrams s1 s2");
    } else {
      boolean yes = true;
      String s1 = args[0];
      String s2 = args[1];
      if (s1.length() != s2.length()) {
        yes = false;
      } else {
        int table[] = new int [128];
        for (int i = 0; i < s1.length(); i++) {
          table[s1.charAt(i)]++;
        }
        for (int i = 0; i < s2.length(); i++) {
          if (table[s2.charAt(i)] == 0) {
            yes = false;
            break;
          } else {
            table[s2.charAt(i)]--;
          }
        }
      }
      System.out.println("Are they anagrams? " + yes);
    }
  }
}

Associative Arrays

Some programming languages provide a special type of array, the dictionary, also known as the associative array, that allows one to map arbitrary objects to arbitrary values. Associative arrays are more common in dynamically typed languages. The program below, written in Python, illustrate the use of dictionaries:

>>> d = {"Name":"Persival", "Age":31}
>>> d["Name"]
'Persival'
>>> d["Gender"] = "Male";
>>> d
{'Gender': 'Male', 'Age': 31, 'Name': 'Persival'}
>>> d[0] = "Secret Info"
>>> d
{0: 'Secret Info', 'Gender': 'Male', 'Age': 31, 'Name': 'Persival'}

Classes

Classes are a special type of table that contains a reference to itself. In other words, a class is a type that "sees" itself. In terms of implementation, classes can be implemented as records, or as associative arrays. In Java or C++, for instance, classes are implemented as records, with a fixed number of elements. The program below shows an example of the implementation of a class:

public class ConsCell<E> {
  private E head;
  private ConsCell<E> tail;
  public ConsCell(E h, ConsCell<E> t) {
    head = h;
    tail = t;
  }
  public E getHead() {
    return head;
  }
  public ConsCell<E> getTail() {
    return tail;
  }
  public void print() {
    System.out.println(head);
    if (tail != null) {
      tail.print();
    }
  }
  public int length() {
    if (tail == null) {
      return 1;
    } else {
      return 1 + tail.length();
    }
  }
  public static void main(String argc[]) {
  ConsCell<Integer> c1=new ConsCell<Integer>(1, null);
  ConsCell<Integer> c2=new ConsCell<Integer>(2, c1);
  ConsCell<Integer> c3=new ConsCell<Integer>(3, c2);
  ConsCell<Integer> c4=new ConsCell<Integer>(4, c3);
    System.out.println(c4.length());
    System.out.println("Printing c4");
    c4.print();
    System.out.println("Printing c2");
    c2.print();
  }
}

In Java, the layout of a class cannot be modified. Once it is declared, new fields cannot be added. This restriction makes it easier to implement classes in this language efficiently. Calling a function that is declared within a class, also known as a "method", is an efficient operation. The target of a call can be found in $\scriptstyle O(1)$ time. Generally statically typed languages, such as C++, C# and Ocaml.

The other way to implement class is as associative arrays. Dynamically typed languages, such as Python, JavaScript and Lua implement classes in this way. The main advantage of this approach is flexibility: given that the class is a mutable table, new properties can be added to it at runtime, or removed from it. The program below shows how classes can be declared and used in Python:

INT_BITS = 32

def getIndex(element):
  index = element / INT_BITS
  offset = element % INT_BITS
  bit = 1 << offset
  return (index, bit)

class Set:
  def __init__(self, capacity):
    self.capacity = capacity
    self.vector = range(1 + self.capacity / INT_BITS)
    for i in range(len(self.vector)):
      self.vector[i] = 0

  def add(self, element):
    (index, bit) = getIndex(element)
    self.vector[index] |= bit

  def delete(self, element):
    (index, bit) = getIndex(element)
    self.vector[index] &= ~bit

  def contains(self, element):
    (index, bit) = getIndex(element)
    return (self.vector[index] & bit) > 0

s = Set(60)
s.add(59)
s.add(60)
s.add(61)

Classes are not rigid structures in Python. Thus, it is possible to either insert or remove new properties into these types. The program below illustrates how a new method can be added to our original Python example. The new method handles the possibility than an element not supported by the range of the bit set is added to this bit set:

def errorAdd(self, element):
  if (element > self.capacity):
    raise IndexError(str(element) + " is out of range.")
  else:
    (index, bit) = getIndex(element)
    self.vector[index] |= bit
    print element, "added successfully!"

Set.errorAdd = errorAdd

s = Set(60)
s.errorAdd(59)
s.add(60)
s.errorAdd(61)

Unions

The union of many sets $\scriptstyle A_{1},A_{2},\ldots ,A_{n}$ is the set of all elements that are in at least one of the base sets. The $\scriptstyle \cup$ symbol is used to express this concept $\scriptstyle (A_{1}\cup A_{2}\cup \ldots \cup A_{n})$ . In programming languages, this notion of union is applied to types: the union of several types $\scriptstyle T_{1},T_{2},\ldots ,T_{n}$ is a new type $\scriptstyle T$ that contains every element in each of the base types. Some programming languages force the developer to explicitly distinguish each base type from the other, when using the union. Other languages, such as C, trust the programmer to use correctly each element of the union. The code bellow shows how unions are declared and used in C:

#include <stdio.h>
union element {
  int i;
  float f;
};

int main() {
  union element e;
  e.f = 177689982.02993F;
  printf("Int = %d\n", e.i);
  printf("Float = %f\n", e.f);
  e.f = 0.0F;
}

This example shows that any of several representations or formats can be associated with one single variable. The C code above associates int and float with the same element. So, the union element e consists of a variable which may hold an int or a float. Notice that in the program above we have initialized the field f of the union e; however, we have used its field i. C does not require the programmer to use the base type as it has been declared in the union. A union, in C, is only a chunk of data stored in memory. How this data is interpreted depends on which operation the programmer wants to apply on it. If a union $\scriptstyle U$ is made of several base types $\scriptstyle T_{1},T_{2},\ldots ,T_{n}$ , then the compiler reserves in memory the area of the largest base type to allocate any instance of $\scriptstyle U$ .

There are programming languages that force the programmer to explicitly distinguish each base type of a union. SML fits in this family of languages. In SML unions are labeled, and if programmers want to declare a particular subtype of a union, they need to explicitly mention the label:

datatype element =
  I of int | F of real;
fun getReal (F x) = x
  | getReal (I x) = real x;

The labels allow us to know at runtime the type of the data stored into a variable. Because C does not have this concept of labels associated with unions, runtime type inspection is not possible in that language. Labels also ensure that all the sets in a union are disjoint. Therefore, the cardinality of a union $\scriptstyle U=T_{1},T_{2},\ldots ,T_{n}$ is the sum of the cardinalities of each base set $\scriptstyle T_{i}$ .

Function Types

Functions map a given set, the domain, into another set, the range (or image). Hence, the type of a function specifies the function's domain and range. This idea is analogous to the mathematical notation $\scriptstyle A\rightarrow B$ which refers to the set of all function with inputs from the set A and outputs from the set B. Many programming languages have this kind of construction. Below we show an example of function declaration in C. The domain of the function is a pair of real numbers and the image is a real number, e.g., $\scriptstyle F\times Float\rightarrow Float$ .

float sum(float a, float b) {
  return a + b;
}

← Primitive Types

Polymorphism*

It refers to the ability of a variable, function or object to take on multiple forms