Introduction to Programming Languages/Unification Cost Model

We start this section with a motivating question. What is the complexity of the predicate below, written in Prolog?

&=([], []).
&=([H|Tl], [H|Tr]) :- &=(Tl, Tr).

One could be easily tempted to say that this predicate is linear on the number of arguments of the smallest list. Yet, we could think about calls like:

&=([[1, 2, 3, 4, 5], [a, b, c, d, e]], [[1, 2, 3, 4, 5], [a, b, c, d, e]]).

We cannot solve this call to &= with two operations only. It is necessary to compare, one by one, all the two sublists that make up the larger lists. In this case, the complexity of &= is proportional to the size of the lists, not the number of elements of the lists. Unification-based search algorithms must scan entire data-structures in order to infer properties about them. And this scanning operation is exhaustive: every single possible combination of valid clauses will be explored during the resolution of unification. This brings in an important point: for efficiency reasons, it is better to solve the most restrictive clauses first, leaving those that are more open for later. For instance, let's consider the following predicates:

parent(d, a).
parent(f, e).
parent(a, b).
parent(e, b).
parent(e, c).

Given these terms, what, among these two implementations, is the best?

grandfather_1(X,Y) :- parent(X,Z), parent(Z,Y), male(X).

grandfather_2(X,Y) :- male(X), parent(X,Z), parent(Z,Y).

The second predicate is more efficient. To see why, notice that there are two clauses that unify with male(X) in our database of predicates: male(a) and male(b). Thus, the search tree for grandfather_2 starts with two nodes. Each of these nodes, which unifies with parent(X, Y), can have up to two nodes only, either parent(a, b) or parent(d, a), for a and d are the only atoms that satisfy male(X).

Tree showing inefficient unification

Had we used grandfather_1, then the story would be different. Our search tree, this time, starts with five nodes, for parent(X, Z) unifies with five clauses. Each one of these nodes, in turn, span three new nodes. This search tree is much larger than that one that grandfather_1 produces; consequently, it also takes longer to build. The bottom line of this discussion is: even though the order of terms within a predicate should bear no implication on its correctness, this order is still relevant for efficiency reasons. More restrictive predicates should come first, for they are more effective in pruning the search tree that is explored during the unification process.

Tree showing efficient unification

Tail Call Cost Model · Array Cost Model