Sorting is a fundamental task in many programs. We explain how to write an implementation of the fast and general merge sort algorithm in Prolog.

The basic algorithm

edit

The merge sort algorithm looks as follows:

function merge_sort(m)
    if length(m) ≤ 1
        return m
    else
        left, right = split_in_half(m)
        left = merge_sort(left)
        right = merge_sort(right)
        result = merge(left, right)
        return result

The first thing we have to do to implement any algorithm presented in this style in Prolog is to translate it from the language of functions to the language of predicates. Recall that Prolog has no return construct, but we can use an unbound variable as an output mechanism. So, we will write a predicate mergesort(Xs, S) that, given a list Xs, "returns" a sorted variant by binding S to it.

How about the if-then-else construct in the algorithm? We can translate that directly, but we also have the option of using Prolog's pattern matching. Because the conditional checks for ≤1, we have three structural cases: the empty list, a list of one element and the catch-all case. So, let's write our first merge sort.

% the empty list must be "returned" as-is
mergesort([], []).
% same for single element lists
mergesort([X], [X]).
% and now for the catch-all:
mergesort([X|Xs], S) :-
    length(Xs, Len),
    0 < Len,
    split_in_half([X|Xs], Ys, Zs),
    mergesort(Ys, SY),
    mergesort(Zs, SZ),
    merge(SY, SZ, S).

There's the skeleton of the algorithm. We have left out the definition of the helper predicates split_in_half and merge so far, but for a functioning sorting predicate, we must of course define those as well. Let's start with merge, which takes two sorted lists and produces a new sorted list containing all the elements in them. We use Prolog's recursion and pattern matching to implement merge.

% First two cases: merging any list Xs with an empty list yields Xs
merge([], Xs, Xs).
merge(Xs, [], Xs).
% Other cases: the @=< predicate compares terms by the "standard order"
merge([X|Xs], [Y|Ys], [X|S]) :-
    X @=< Y,
    merge(Xs, [Y|Ys], S).
merge([X|Xs], [Y|Ys], [Y|S]) :-
    Y @=< X,
    merge([X|Xs], Ys, S).

This predicate works, but it's repetitive and not very efficient. We'll revisit it in a minute, but first we must define split_in_half, which splits a list into two lists of roughly equal size (roughly because it may have an odd number of elements). To do so, we need the length of a list, which we unfortunately do not get as input (see merge_sort), so we need to compute that. We use a helper predicate split_at to actually split the list after computing the length.

split_in_half(Xs, Ys, Zs) :-
    length(Xs, Len),
    Half is Len // 2,    % // denotes integer division, rounding down
    split_at(Xs, Half, Ys, Zs).

Exercise. Before continuing to the definition of split_at, try to implement it yourself. Use recursion and case matching on both the input list and the second, length, argument.

Okay, now for split_at:

% split_at(Xs, N, Ys, Zs) divides Xs into a list Ys of length N
% and a list Zs containing the part after the first N.
split_at(Xs, N, Ys, Zs) :-
    length(Ys, N),
    append(Ys, Zs, Xs).

If the definition of split_at seems magical, then try to read it declaratively: "a list Xs, split after its first N elements into Ys and Zs, means that Ys has length N and Ys and Zs can be appended to retrieve Xs." The fact that this works is an example of Prolog's "bidirectional" power: many predicates can be run in "reverse order" to get the inverse of a computation.

Cleaning up

edit

We now have a merge sort program that will sort any list, but some parts of it aren't particularly elegant. Let's first revisit merge_sort. We translated an if-then-else construct to multiple clauses since that's so common in Prolog, but we really had no reason to do so: the algorithm is deterministic, so there should no backtracking going on. Let's try a more direct rewrite of the pseudocode using Prolog's own if-then-else.

mergesort(Xs, S) :-
    length(Xs, Len),
    (Len =< 2 ->
        S = Xs
    ;
        split_in_half(Xs, Ys, Zs),
        mergesort(Ys, SY),
        mergesort(Zs, SZ),
        merge(SY, SZ, S)
    ).

Now, let's tackle merge's efficiency and readability, as promised. The last two cases of this predicate look like a copy-paste, which is always a code smell. Also, there may be unnecessary backtracking going on. Let's rewrite merge using an if-then-else as well:

% First two cases: merging any list Xs with an empty list yields Xs
merge([], Xs, Xs).
merge(Xs, [], Xs).
% Other cases: the @=< predicate compares terms by the "standard order"
merge(Xs, Ys, S) :-
    Xs = [X|Xs0],
    Ys = [Y|Ys0],
    (X @=< Y ->
        S = [X|S0],
        merge(Xs0, Ys, S0)
    ;
        S = [Y|S0],
        merge(Xs, Ys0, S0)
    ).