Introduction to Programming Languages/Array Cost Model

We say that a data structure is random access if the cost to read (or update) any element it stores is the same, regardless of the element's position. Functional lists are clearly not random access: reading the n-th element is O(n). Arrays in C or Java, on the other hand, are. Other data structures, such as hash tables, are also random access, at least on average. Henceforth, we shall focus on C-style arrays.

The key to random access lies in the fact that an array is stored contiguously in memory. Thus, an access such as a[i], in an array of type T, can be translated to the expression *(a + i). The same principle works for multidimensional arrays. For example, in C, a[i][j] in a 2-dimensional array of M rows and N columns translates to *(a + i*N + j).

However, even when we restrict ourselves to arrays, it is still difficult to guarantee the same access cost for every element. The culprit is a phenomenon called locality. To explain what locality is, let's consider two loops:

#include <stdio.h>
int main(int argc, char** argv) {
  const int M = 5000;
  const int N = 1000;
  char m[M][N];
  int i, j;

  if (argc % 2) {
    // initializes the array, row major:
    for (i = 0; i < M; i++) {
      for (j = 0; j < N; j++) {
        m[i][j] = 1;
      }
    }
  } else {
    // initializes the array, column major:
    for (j = 0; j < N; j++) {
      for (i = 0; i < M; i++) {
        m[i][j] = 1;
      }
    }
  }

  return 0;
}

This program runs at different speeds, depending on which loop executes (measured on an Intel Core i5 at 1.4GHz). Running it with one argument makes argc even, so the column-major loop executes; running it with no arguments selects the row-major loop:

$> clang -O0 trixSpeed.c -o trixSpeed

$>  time ./trixSpeed a

real	0m0.073s
user	0m0.068s
sys	0m0.003s

$> time ./trixSpeed 

real	0m0.025s
user	0m0.019s
sys	0m0.003s

So, what is going on in this example? The same number of operations is performed, regardless of which loop runs; yet the row-major loop is about 3.5x faster. As mentioned before, the key feature playing a role in these results is locality. Modern general-purpose computer architectures use a cache, which is divided into lines. Data is not brought from main memory into the cache in individual pieces; instead, a whole line is brought at once. If a program can access several data items on the same line, then trips to memory can be avoided. On the other hand, if data is accessed across different lines, then multiple trips to main memory must be performed. These trips are expensive, and they account for most of the running time of this example.

How should a program be implemented, so that it benefits the most from locality? The answer depends on the programming language. C organizes data in a row-major scheme: in a 2-dimensional array, elements of the same row are placed next to each other in memory, whereas elements of the same column can be quite far apart. Cells in the same column, but in adjacent rows, will be N elements apart, where N is the row length (the number of columns) of the 2-dimensional array. On the other hand, there are programming languages that organize data in column-major order. An example is Fortran. In that case, it is more efficient to fix the column and vary the row in the innermost loop while traversing a 2-dimensional array.
