Optimizing C++/General optimization techniques/Sorting

The C++ Standard Template Library (STL) provides the template function sort that implements a comparison sort algorithm. Because sort is templatized, it can be used for various types of sequences holding any type of key, as long as the keys are comparable (implement the < operator). A good compiler can generate code optimized for the various kinds of sequence/key combinations.

The reference implementation of the STL uses the introsort algorithm (since the 2000 release; the GNU C++ library uses the reference implementation). This algorithm is a very fast combination of quicksort and heapsort with a specially designed selection algorithm.

The sort template function is not guaranteed to be stable. When a stable sort is required, use the stable_sort template function instead.

This section suggests alternatives to the sort and stable_sort template functions that may be faster in specific cases.

Sorting with small ranges of keys

To sort a data set according an integer key having a small range, use the counting sort algorithm.

The counting sort algorithm has O(N+M) complexity, where N is the number of elements to sort and M is the range of the sort keys, that is the difference between the highest key and the lowest key. In case N elements are to be sorted whose key is an integer number belonging to an interval containing no more than two times N values (i.e when M <= 2 * N holds), this algorithm may be quite faster than sort. In some cases it is faster even with larger ranges.

This algorithm may be used also for a partial ordering; for example, if the keys are integers between zero and one billion, you can still sort them using only the most significant byte, so to get an order for which the formula $a_{n}<a_{n+1}+256*256*256$ holds.

Example: sorting 8-bit integers

Say you want to sort an array of arbitrary unsigned char elements. <climits> defines constant limits for integer and char types for a specific implementation. CHAR_BIT is the number of bits in a char object. ISO C++ requires CHAR_BIT to be 8 or greater. An unsigned char may have a value in the range between 0 and UCHAR_MAX. ISO C++ requires UCHAR_MAX to be 255 (2^8-1) or greater.

Note: unsigned char must be used because char can be signed or unsigned depending on the implementation.

#include <climits>

void count_sort(unsigned char *a, unsigned char *const end)
{
	unsigned char freq[UCHAR_MAX+1] = {0};
	unsigned char *p, c;
	
	for (p = a; p < end; ++p) {
		freq[*p] += 1;
	}
	for (c = 0, p = a; c < UCHAR_MAX; ++c) {
		while (freq[c]-- > 0) {
			*p++ = c;
		}
	}
	while (freq[c]-- > 0) {
		*p++ = c;
	}
}

The counting_sort function implements the pigeonhole sort algorithm. It takes a pointer to the first element of the input array and a pointer that points one element beyond the end of the array. Why? Because we don't have to stop here.

We can generalize counting_sort to a template function that also works for string, vector<unsigned char> and other sequence types, without loss of efficiency. When doing so, we need to work with iterators rather than pointers.

#include <iterator>
#include <limits>

template <typename iterator>
void counting_sort(iterator const &begin, iterator const &end)
{
	typedef std::iterator_traits<iterator>::value_type T;
	T max = std::numeric_limits<T>::max();
	T freq[max+1] = {0};
	iterator it;
	T c;

	for (it = begin; it < end; ++it) {
		freq[*it] += 1;
	}
	for (c = 0, it = begin; c < max; ++c)
		while (freq[c]-- > 0) {
			*it++ = c;
		}
	}
	while (freq[c]-- > 0) {
		*it++ = c;
	}
}

Partial sorting

Partitioning

If you have to split a sequence according a criterion, use a partitioning algorithm, instead of a sorting one.

In STL there is the std::partition algorithm, that is faster than the std::sort algorithm, as it has O(N) complexity, instead of O(N log(N)).

Stable partitioning and sorting

If you have to partition or sort a sequence for which equivalent entities may be swapped, don't use a stable algorithm.

In STL there is the std::stable_partition partitioning algorithm, that is slightly slower than the std::partition algorithm; and there is the std::stable_sort sorting algorithm, that is slightly slower than the std::sort algorithm.

Order partitioning

If you have to pick out the first N elements from a sequence, or the N^th element in a sequence, use an order partitioning algorithm, instead of a sorting one.

In STL there is the std::nth_element algorithm, that, although slightly slower than the std::stable_partition algorithm, is quite much faster than the std::sort algorithm, as it has O(N) complexity, instead of O(N log(N)).

Sorting only the first N elements

If you have to sort the first N elements of a much longer sequence, use an order statistic algorithm, instead of a sorting one.

In STL there are the std::partial_sort and std::partial_sort_copy algorithms, that, although slower than the std::nth_element algorithm, are so much faster than the std::sort algorithm as the partial sequence to sort is shorter than the whole sequence.