Introduction to Programming Languages/Ad Hoc Polymorphism

Ad-Hoc Polymorphism

edit

We say that a form of polymorphism is ad-hoc if it allows the same name to denote a finite number of programming entities. There are two main kinds of ad-hoc polymorphism: overloading and coercion.

Overloading

edit

Overloading is the capacity that a programming language has to use the same name to denote different operations. These operations can be invoked either through function names or through special symbols called operators. The most common form of overloading is operator overloading. For instance, C, C++, SML, Java and Python overload the plus symbol (+), to denote either the sum of integers or the sum of floating point numbers. Even though the notion of sum in either type can be, in principle, the same to us, the algorithms that implement these operations are very different. Integers are generally summed as 2's-complements. On the other hand, the algorithm to sum floating point numbers involve separately summing the base exponent and mantissa of the operands. Additionally, in Java and Python the plus symbol also denotes string concatenation, a third meaning for the same operator.

Languages such as C and SML only overload built-in operators. However, some programming languages allow the programmer to overload names. The following example illustrates user defined overloading in C++:

#include <iostream>
int sum(int a, int b) {
  std::cout << "Sum of ints\n";
  return a + b;
}

double sum(double a, double b) {
  std::cout << "Sum of doubles\n";
  return a + b;
}

int main() {
  std::cout << "The sum is " << sum(1, 2) << std::endl;
  std::cout << "The sum is " << sum(1.2, 2.1) << std::endl;
}

In the program above, we have two different implementations for the name sum. When this name is used as a function call, then the right implementation is chosen based on the type signature of the function. The signature of a function is formed by its name, plus the types of the parameters. The order of these types is important. So, in the program above we have two different signatures for the function sum. We have [sum, int, int] and [sum, double, double]. In most programming languages this signature is context insensitive. In other words, the type of the returned value is not part of the signature. The process of selecting the appropriated implementation given a name (or symbol) is called overload resolution. This selection is made by matching the type of the actual arguments in the call against the type of the formal parameters in one of the signatures.

The implementation of overloading is very simple. The compiler simply generates different names for all the implementations that receive the same name by the programmer. If, for instance, we compile the example above to assembly, we would find two different names for the two different implementations of sum:

$> g++ -S over.cpp
$> cat over.s
...
.globl __Z3sumdd
__Z3sumdd:
...
.globl __Z3sumii
__Z3sumii:
...

A few programming languages have support for operator overloading. The most well-known example is C++, yet operator overloading is also present in languages such as Fortress and Fortran 90. In the words of Guy Steele, the possibility of defining new data types and overloading operators give the programming language room to grow. In other words, developers can change the programming language so that it becomes closer to the problems that they must solve. As an example of operator overloading, the program below, written in C++, contains two overloaded operators, the plus symbol (+), and the streaming operators (<<).

#include <string.h>
#include <ostream>
#include <iostream>

class MyString {
  friend std::ostream & operator<<(std::ostream & os, const MyString & a) {
    os << a.member1;
  }
  public:
    static const int CAP = 100;
    MyString (const char* arg) {
      strncpy(member1, arg, CAP);
    }
    void operator +(MyString val) {
      strcat(member1, val.member1);
    }
  private:
    char member1[CAP];
};

int main () {
  MyString s1("Program");
  MyString s2("ming");
  s1 + s2;
  std::cout << s1 << std::endl;
}

Some programming languages allow developers to overwrite names and symbols, but these languages do not provide overloading. Overloading is only present if the programming language allows the two names co-exist in the same scope. For instance, in SML the developer can overwrite an operator. However, the old definition of this operator stops existing, as it has been shadowed by the new definition:

- infix 3 +;
infix 3 +
- fun op + (a, b) = a - b;
val + = fn : int * int -> int
- 3 + 2;
val it = 1 : int

Coercion

edit

Many programming languages support the conversion of a value into another having a different data type. These type conversions can be performed implicitly or explicitly. Implicit conversions happen automatically. Explicit conversion are performed by the programmer. The C code below illustrates implicit and explicit coercion. In line 2 the int constant 3 is automatically, i.e., implicitly, converted to double before the assignment takes place. C provides a special syntax for explicit conversions. In this case, we prefix the value that we want to convert with the name of the target type in parenthesis, as we show in line 3.

double x, y;
x = 3;            // implicitly coercion (coercion)
y = (double) 5;   // explicitly coercion (casting)

We shall use the word coercion to refer to the implicit type conversions. Coercions let the application developer to use the same syntax to swimmingly combine operands from different data types. Languages that support implicit conversion must define the rules that will be automatically applied when compatible values are combined. These rules are part of the semantics of the programming language. As an example, Java define six different ways to convert primitive types to double. Thus, all the calls of the function f below are correct:

public class Coercion {
  public static void f(double x) {
    System.out.println(x);
  }
  public static void main(String args[]) {
    f((byte)1);
    f((short)2);
    f('a');
    f(3);
    f(4L);
    f(5.6F);
    f(5.6);
  }
}

Although implicit type conversions are well-defined, they may lead to the creation of programs that are hard to understand. This difficulty is even more exacerbated in languages that combine coercion with overloading. As an example, even veteran C++ programmers may not be sure of which calls will be made by the program below, in lines 13-15:

#include <iostream>
int square(int a) {
  std::cout << "Square of ints\n";
  return a * a;
}
double square(double a) {
  std::cout << "Square of doubles\n";
  return a * a;
}
int main() {
  double b = 'a';
  int i = 'a';
  std::cout << square(b) << std::endl;
  std::cout << square(i) << std::endl;
  std::cout << square('a') << std::endl;
}

Even though the program above may look confusing, it is well-defined: the semantics of C++ gives from integers over doubles when converting characters. However, sometimes the combination of coercion and overloading may allow the creation of ambiguous programs. To illustrate this point, the program below is ambiguous, and will not compile. The problem, in this case, is that C++ allows not only the conversion of integers to doubles, but also the conversion of doubles to integers. Thus, there are two possible interpretations to the call sum(1, 2.1).

#include <iostream>
int sum(int a, int b) { return a + b; }

double sum(double a, double b) { return a + b; }

int main() {
  std::cout << "Sum = " << sum(1, 2.1) << std::endl;
}

Polymorphis · Universal Polymorphism