Universal Function Call Syntax

March 28, 2012

Over my years of experience programming, I have become more and more convinced that the key to code reusability and scalability is encapsulation — having subsystems communicate through small, well-defined interfaces and letting them hide their own implementation details. I know what you're thinking, “did he fire 5 shots or 6?” Er, scratch that, you're thinking “duh!” and “join the party!” Please indulge me, I hope you'll find the payoff worthwhile.

A corollary to that is that language features that make encapsulation easier or more effective are good things. Designing proper abstractions (and their corresponding encapsulations) turns out in real life to not be as straightforward as various texts and articles would have you believe. The community's understanding of encapsulation has improved over the years, and consequently the related features frozen in programming languages may sometimes seem slightly outdated.

Let's consider just one encapsulation feature we all know and love — the class. A class encapsulates data (fields) and the functions (methods) that manipulate that data:

module mylib;
class C {
    void set(int x) {
      enforce(x >= 1 && x <== 100);
      this.x = x;
    }
    int get() {
      return x;
    }
  private:
    int x;
}

That's about as simple as it gets. But then, we often find it reasonable to add more methods which manipulate the data:

class C {
   ...
   int square() {
     return get() * get();
   }
}

And an interesting thing starts to happen: we don't know when to stop adding methods. The list of required primitives tends to be rather short and stable, but the list of convenient, nice-to-have methods tends to go on forever. At the same time, such methods are quite desirable. Some usage patterns of primitives are frequent, and many consider the syntax

a.method(b,c)

better than

function(a,b,c).

The class tends to get larded up with all kinds of methods. After all, we cannot anticipate every way someone might use a class, so just in case, we throw in the kitchen sink. After all, someone using the class may not be able to modify the class declarations in the import files supplied.

The next stage in our thought process is “derive from class C, and add those extra methods there!”:

class D : C {
   int square() {
     return get() * get();
   }
}

Ok, but now every instance of

new C();

must be found and replaced with:

new D();

and of course this falls over completely when Fred derived D from C to add method square, and Bill derived E from C to add another method cube, and both methods are needed. (Of course, there's the dreaded multiple “diamond” inheritance, but I think there's a better way. Read on.)

Scott Meyers recognized this problem back in 2000, and wrote an insightful article How Non-Member Functions Improve Encapsulation about a solution — use free functions instead when they can be implemented as non-friends. (His arguments are compelling enough that I won't even try to improve on them.) Free functions in the D programming language would look like:

In fred.d:

import mylib: C;
int square(C c) { return c.get() * c.get(); }

In bill.d:

import mylib: C;
int cube(C c) { return c.get() * c.get() * c.get(); }

And it would be used like:

import mylib: C;
import fred: square;
import bill: cube;

...
auto c = new C;
...
auto s = square(c);
auto b = cube(c);

and that works. But there's a problem. We'd really prefer using

c.square();

instead of:

square(c);

After all, there's the matter of consistency. When there's both:

x.toInteger();
toString(i);

it just seems arbitrary. Chaining methods like:

toString(x.toInteger());

looks wonky compared with:

x.toInteger().toString();

But it's more than just cosmetic. A template may rely on structural conformance, meaning it is looking for specific methods. An example of this would be an input range (from std.range) which looks for the methods empty, popFront, and front. There's no way to add that functionality to an existing class without cracking it open and inserting them, and input ranges don't recognize free function versions. For built-in types, most languages do not allow adding methods.

I hope you've borne with me so far, because here's the payoff:

Universal Function Call Syntax

New to version 2.059 of D (and implemented by Kenji Hara) is the notion of universal function call syntax. It's been around in D since the beginning in a nascent form for arrays, but it's now available for all classes and structs. It's nothing more than if the compiler sees:

c.square(args);

and square is not a member of class C, then it looks for a free function of the form:

square(c,args);

That's it! Now it's easier to follow Scott Meyers' advice to minimize the number of methods in a class to just those that need to access its private state. Implement the rest as free functions. Methods can be “added” by third parties without changing the original class definition. Additions from multiple third parties can be used simultaneously. And payment will only be made for the methods that are actually used. Hopefully, this helps spell the end of “fat” and “kitchen sink” classes.

Alternatives

A different design to address this problem is called Extension Methods. Extension methods differ in that they require a special syntax to differentiate them from regular functions, and that they may only be called using the infix notation, i.e. c.foo(args). The D design does not require a special syntax, and the methods may be called with either the infix notation or the prefix notation, i.e. foo(c, args).

C++ STL algorithms avoid this problem by standardizing on the non-member function call syntax. They take iterators instead of containers, so they never call container member functions, and because iterators may actually be pointers, they never invoke member functions on iterators. Function objects are invoked as if they were function pointers, which they may in fact be.

Only time and usage experience will tell which of D's approach that allows both c.foo() and foo(c) to coexist, C#'s approach to only allow c.foo(), or C++ STL's approach to only allow foo(c) is superior.

Acknowledgements

Thanks to Scott Meyers for his helpful suggestions on this, Andrei Alexandrescu for repeatedly explaining it to me and offering helpful suggestions, and to Kenji Hara for implementing it for D, and to Eric Neibler, Bartsoz Milewski and David Held for their helpful suggestions.

Articles

Universal Function Call Syntax

Universal Function Call Syntax

Alternatives

Acknowledgements