digitalmars.D.learn - UFCS in generic libraries, silent hijacking, and compile errors.

aliak <something something.com> writes:

What are the recommended guidelines for using/not using UFCS in 
writing generic libraries?

I ask because if you have an internal generic free function that 
you use on types in a generic algorithm via ufcs, then everything 
works fine until the type being operated on has a member function 
with a similar name.

If the signature matches the free function then the member 
function is called instead of the free function (silently) and if 
the signature does not match then you get a compiler error.

I.e.:

import std.stdio : writeln;

auto identity(T)(T t) { return t; }
auto fun(T)(T t) {
     return t.identity;
}

void main() {
     fun(3).writeln; // ok, prints 3

     struct S1 {}
     fun(S1()).writeln; // ok, prints S1()

     struct S2 { int identity() { return 77; } }
     fun(S2()).writeln; // silent hijack, prints 77

     struct S3 { int identity(int i) { return i + 2; } }
     fun(S3()).writeln; // compile error
}

So the problem is that fun wants to use a utility function that 
it knows about, but it turns out that T can hijack that internal, 
private, utility if you use ufcs.

So basically, it seems like ufcs is fun until it doesn't work, 
and it's not really obvious that you can run into these problems 
with ufcs.

I feel like the last case where it's a compilation error should 
maybe be ok though, since the function being called is actually 
undefined so ufcs should kick in. The problem as I understand it 
is that ufcs is not part of an overload set. Making it part of it 
would not prevent the silent hijack, but would remove the 
compilation problem.

Is there something I'm not seeing as to why UFCS is not part of 
the overload set, and is there a way other than not using UFCS to 
prevent the silent hijacking?

Cheers
- Ali

Mar 10 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Saturday, March 10, 2018 21:50:42 aliak via Digitalmars-d-learn wrote:
 What are the recommended guidelines for using/not using UFCS in
 writing generic libraries?

 I ask because if you have an internal generic free function that
 you use on types in a generic algorithm via ufcs, then everything
 works fine until the type being operated on has a member function
 with a similar name.

The idea is that the type can provide its own version of the function that
is better optimized for it - e.g. it could potentially provide a member
function find that is more efficient for it than
std.algorithm.searching.find. That's actually the only technical reason why
UFCS is superior to the normal function call syntax. Everything else is a
matter of personal preference, though some folks prefer UFCS enough that
they use it everywhere. And as long as the code isn't generic, that's
usually not a problem, but using UFCS in generic code with a function that
isn't well-known can risk problems if it happens to match a member function.
The main reason that this isn't generally a problem is that most generic
code operates on either a fairly specific subset of types or on a specific
API where types implementing that API don't usually implement extra
functions (in particular, ranges normally only define the range API
functions, so it's rare that they have member functions that conflict with
anything). But if you're worried about it or want a specific function to be
called that definitely isn't a member function, then just don't use UFCS.
The situation that you're concerned about is not one that seems to be much
of an issue in practice. That doesn't mean that it's never a problem, but
from what I've seen, it's very rarely a problem, and it's easy to work
around if you run into a particular case where it is a problem.
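
A minimal sketch of that workaround, reusing identity and S2 from the first post. The names funUfcs and funPlain are made up here for illustration; the only difference between them is the call syntax.

```d
import std.stdio : writeln;

auto identity(T)(T t) { return t; }

// UFCS call: a member named identity takes precedence over the free function.
auto funUfcs(T)(T t) { return t.identity; }

// Plain call: member functions are never considered, so this always
// resolves to the free function above.
auto funPlain(T)(T t) { return identity(t); }

struct S2 { int identity() { return 77; } }

void main() {
    assert(funUfcs(S2()) == 77);     // the member was called
    assert(funPlain(S2()) == S2());  // the free function was called
    writeln(funUfcs(S2()));
    writeln(funPlain(S2()));
}
```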

The one case that I am aware of where best practice is to avoid UFCS is with
put for output ranges, but that has nothing to do with your concerns here.
Rather, it has to do with the fact that std.range.primitives.put has a lot
of overloads for handling various arguments (particularly when handling
ranges of characters), and almost no one implements their output ranges with
all of those overloads. So, if you use put with UFCS, you tend to run into
problems if you do anything other than put a single element of the exact
type at a time, whereas the free function handles more cases (even if they
ultimately end up calling that member function with a single argument of the
exact type). We probably shouldn't have had the free function and the member
function share the same name.

 Is there something I'm not seeing as to why UFCS is not part of
 the overload set, and is there a way other than not using UFCS to
 prevent the silent hijacking?

If it were part of the overload set, then you have problems calling member
functions, particularly because there is no way to call a member function
other than with UFCS, whereas you can call free functions without UFCS, and
if you _really_ want to be sure that a very specific function is called,
then you can even give its entire module path when calling it. When UFCS was
introduced, it was decided that having member functions always win out would
cause the fewest problems.
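
As a rough illustration of calling a free function by its full module path, here is a sketch with a hypothetical Bag type (not from this thread) that is an input range and also has its own member named find with different semantics.

```d
static import std.algorithm.searching;

// Hypothetical container: an input range of ints with its own find member.
struct Bag {
    int[] items;
    @property bool empty() const { return items.length == 0; }
    @property int front() const { return items[0]; }
    void popFront() { items = items[1 .. $]; }

    // Member find: returns the index of x, or items.length if absent.
    size_t find(int x) const {
        foreach (i, item; items)
            if (item == x) return i;
        return items.length;
    }
}

void main() {
    auto bag = Bag([10, 20, 30]);

    // Dot syntax picks the member, so the result is an index.
    assert(bag.find(20) == 1);

    // Fully qualified call: always Phobos' find, which returns the
    // remaining range starting at the first match.
    auto rest = std.algorithm.searching.find(bag, 20);
    assert(rest.front == 20);
    assert(rest.items == [20, 30]);
}
```

The static import keeps find out of the local scope entirely, so the only way to reach it is through the module path.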

- Jonathan M Davis

Mar 10 2018

aliak <something something.com> writes:

On Saturday, 10 March 2018 at 23:00:07 UTC, Jonathan M Davis 
wrote:
 The idea is that the type can provide its own version of the 
 function that is better optimized for it - e.g. it could 
 potentially provide a member function find that is more 
 efficient for it than std.algorithm.searching.find. That's 
 actually the only technical reason why UFCS is superior to the 
 normal function call syntax.

I think this may have hit the nail on the head. So basically my 
thinking is that if you're going to use UFCS inside a generic 
function where you can't know what kind of methods the type has, 
do not do it unless you want this particular behavior or your 
intent is to use a type's known API.

 issue in practice. That doesn't mean that it's never a problem, 
 but from what I've seen, it's very rarely a problem, and it's 
 easy to work around if you run into a particular case where it 
 is a problem.

Ya, it's easy to work around, but the caveat there is you need to 
realize it's happening first. Add to that that it's "rarely a 
problem" and, well ... now it seems scary enough for this to be 
mentioned somewhere, I'd say.

 The one case that I am aware of where best practice is to avoid 
 UFCS is with put for output ranges, but that has nothing to do 
 with your concerns here. Rather, it has to do with the fact 
 that std.range.primitives.put has a lot of overloads for 
 handling various arguments (particularly when handling ranges 
 of characters), and almost no one implements their output 
 ranges with all of those overloads. So, if you use put with 
 UFCS, you tend to run into problems if you do anything other 
 than put a single element of the exact type at a time, whereas 
 the free function handles more cases (even if they ultimately 
 end up calling that member function with a single argument of 
 the exact type). We probably shouldn't have had the free 
 function and the member function share the same name.

Oh, can you share a short example here maybe? Not sure I followed 
completely

Is it basically:

// if r is output range

r.put(a, b) // don't do this?

put(r, a, b) // but do this?

(Cause compilation error)


 Is there something I'm not seeing as to why UFCS is not part 
 of the overload set, and is there a way other than not using 
 UFCS to prevent the silent hijacking?

 If it were part of the overload set, then you have problems 
 calling member functions, particularly because there is no way 
 to call a member function other than with UFCS, whereas you can 
 call free functions without UFCS, and if you _really_ want to 
 be sure that a very specific function is called, then you can 
 even give its entire module path when calling it. When UFCS was 
 introduced, it was decided that having member functions always 
 win out would cause the fewest problems.

 - Jonathan M Davis

Ah yes, ok this makes sense.

Follow up:

How about if it's not part of the overload set, but is looked up 
if the function does not exist in the overload set. What would 
the problems there be?

Basically I don't see a reason why we wouldn't want the following 
to work:

struct S { void f() {} }
void f(S s, int i) {}
S().f(3); // error

Thanks!

Mar 11 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, March 11, 2018 08:39:54 aliak via Digitalmars-d-learn wrote:
 On Saturday, 10 March 2018 at 23:00:07 UTC, Jonathan M Davis
 issue in practice. That doesn't mean that it's never a problem,
 but from what I've seen, it's very rarely a problem, and it's
 easy to work around if you run into a particular case where it
 is a problem.

 Ya, it's easy to work around, but the caveat there is you need to
 realize it's happening first. Add to that that it's "rarely a
 problem" and, well ... now it seems scary enough for this to be
 mentioned somewhere, I'd say.

You're talking about a situation where you used a function whose parameters
match those of a member function exactly enough that the member function gets
called instead of the free function. That _can_ happen, but in most cases,
there's going to be a mismatch, and you'll get a compiler error if the type
defines a member function with the same name as the free function. I don't think that
I have ever seen that happen or ever seen anyone complain about it. The only
case I recall along those lines was someone who was trying to use a free
function that they'd decided to call front instead of something else, and it
had parameters beyond just the input range, so that programmer got
compilation errors when they tried to use it in their range-based functions.

I think that this is really a theoretical concern and not a practical one.
Certainly, it's really only going to potentially be an issue in library code
that gets used by a ton of folks with completely unexpected types. If it's
in your own code, you're usually well aware of what types are going to be
used with a generic function, and proper testing would catch the rare case
where there would be a problem. If you're really worried about it, then just
don't use UFCS, but for better or worse, it seems to be the case that the
vast majority of D programmers use UFCS all the time and don't run into
problems like this.

 The one case that I am aware of where best practice is to avoid
 UFCS is with put for output ranges, but that has nothing to do
 with your concerns here. Rather, it has to do with the fact
 that std.range.primitives.put has a lot of overloads for
 handling various arguments (particularly when handling ranges
 of characters), and almost no one implements their output
 ranges with all of those overloads. So, if you use put with
 UFCS, you tend to run into problems if you do anything other
 than put a single element of the exact type at a time, whereas
 the free function handles more cases (even if they ultimately
 end up calling that member function with a single argument of
 the exact type). We probably shouldn't have had the free
 function and the member function share the same name.

 Oh, can you share a short example here maybe? Not sure I followed
 completely

 Is it basically:

 // if r is output range

 r.put(a, b) // don't do this?

 put(r, a, b) // but do this?

 (Cause compilation error)

Essentially yes, though you're passing too many arguments to put. There are
cases where put(output, foo) will compile while output.put(foo) will not. In
particular, std.range.primitives.put will accept both individual elements to
be written to the output range and ranges of elements to be written, whereas
typically, an output range will be written to only accept an element at a
time. It's even more extreme with output ranges of characters, because the
free function put will accept different string types and convert them, and
even if the programmer who designed the output range added various overloads
to put for completeness, it's enough extra work to deal with all of the
various character types that they probably didn't. And put also works with
stuff like delegates (most frequently used with a toString that accepts an
output range), which don't have member functions. So, if you write your
generic code to use the member function put, it's only going to work with
user-defined types that define the particular overload(s) of put that you're
using in your function, whereas if you use the free function, you have more
variety in the types of output ranges that your code works with, and you
have more ways that you can call put (e.g. passing a range of elements
instead of a single element).
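
A small sketch of that difference, assuming a hypothetical IntSink output range that only defines put for a single int. The helper name fill is also made up for this example.

```d
import std.range.primitives : put;

// Hypothetical output range that only accepts one int at a time.
struct IntSink {
    int[] data;
    void put(int x) { data ~= x; }
}

int[] fill() {
    IntSink sink;

    // Member call: only the overload the type itself defines is available.
    sink.put(1);
    static assert(!__traits(compiles, sink.put([2, 3])));

    // Free function: also accepts a range of elements and forwards
    // each element to the member put.
    put(sink, [2, 3]);
    put(sink, 4);

    return sink.data;
}

void main() {
    assert(fill() == [1, 2, 3, 4]);
}
```

Generic code that calls put(sink, x) therefore works with more sinks, and with more kinds of x, than code that calls sink.put(x).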

 How about if it's not part of the overload set, but is looked up
 if the function does not exist in the overload set. What would
 the problems there be?

 Basically I don't see a reason why we wouldn't want the following
 to work:

 struct S { void f() {} }
 void f(S s, int i) {}
 S().f(3); // error

So, are you complaining that it's an error, or you want it to be an error?
As it stands, it's an error, because as far as the compiler is concerned,
you tried to call a member function with an argument that it doesn't accept.
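
The current behavior can be checked at compile time with __traits(compiles, ...). This is only a sketch around the S and f from the question:

```d
struct S { void f() {} }
void f(S s, int i) {}

void main() {
    // The name f is found as a member of S, so the free function is never
    // considered for the dot syntax, and the member does not take an int.
    static assert(!__traits(compiles, S().f(3)));

    // Normal call syntax does not look at members, so this compiles.
    static assert(__traits(compiles, f(S(), 3)));
    f(S(), 3);
}
```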

If you want that code to work, then it would have to add the free function
to the overload set while somehow leaving out the overloads that match the
member function, which isn't how D deals with overloading at this point. But
if it did, then you have problems as soon as the type adds another member
function overload. Also, if you have a free function that matches the name
of a member function but where their parameters don't match, wouldn't they
be unrelated functions? At that point, if you wrote code that accidentally
matched the free function instead of the member function, you end up with
code hijacking. Just because you made a mistake when typing the code, you
called entirely the wrong function, and it's very hard to see, because the
function names match. Hopefully, testing will catch it (and there's a decent
chance that it will), but essentially, the member function has been hijacked
by the free function.

D's overload rules were written with a strong bias towards preventing
function hijacking. To an extent, that's impossible once UFCS comes into
play, and Walter went with the choice that hijacked the least and was the
simplest to deal with. Basically, once UFCS comes into play, you have these
options:

1. Put all of the functions in the overload set.
2. The member function wins.
3. The free function wins.
4. Have a pseudo-overload set where when there's a conflict between a member
   function and a free function, the member function wins, but free
   functions that don't match can be called as well.
5. Have a pseudo-overload set where when there's a conflict between a member
   function and a free function, the free function wins, but member
   functions that don't match can be called as well.

If it's ever the case that the free function wins, then you can't call the
member function if the free function is available, which definitely causes
[problems, so 3 and 5 are] out. If all of the functions are in the overload
set, then you're in basically the same boat, because you can't call the
member function if there's a conflict. It's just that the free function
results in a compilation error as well without using an alias or the full
import path or some other trick to get at [the free function, so 1 is out
as well. Option 4 does not]
fit with how D does overloads in general, you run the risk of the free
function hijacking the member function whenever there's a mistake, and you
have problems whenever the member functions are altered, making it so that
which function gets called can change silently. [That leaves 2, which is]
what we have.

Basically, D's overload rules are designed to favor compilation errors over
the risk of calling the wrong function, and while its import system provides
ways to differentiate between free functions, it really doesn't provide a
way to differentiate between a member function and a free function except
via whether you use UFCS or not. And when those facts are taken into
account, it makes the most sense for member functions to just win whenever
a free function and a member function have the same name. It also has the
bonus that it reduces compilation times, because if a free function could
ever trump a member function or was in any fashion included in its overload
set, then the compiler would have to check all of the available functions
when UFCS is used instead of looking at the member functions and then only
looking at free functions if there was no member function with that name.

- Jonathan M Davis

Mar 11 2018

aliak <something something.com> writes:

On Sunday, 11 March 2018 at 15:24:31 UTC, Jonathan M Davis wrote:
 On Sunday, March 11, 2018 08:39:54 aliak via 
 Digitalmars-d-learn wrote:
 On Saturday, 10 March 2018 at 23:00:07 UTC, Jonathan M Davis
 issue in practice. That doesn't mean that it's never a 
 problem, but from what I've seen, it's very rarely a 
 problem, and it's easy to work around if you run into a 
 particular case where it is a problem.

 Ya, it's easy to work around, but the caveat there is you need 
 to realize it's happening first. Add to that that it's 
 "rarely a problem" and, well ... now it seems scary enough for 
 this to be mentioned somewhere, I'd say.

 You're talking about a situation where you used a function 
 whose parameters match those of a member function exactly enough 
 that the member function gets called instead of the free function. 
 That _can_ happen, but in most cases, there's going to be a 
 mismatch, and you'll get a compiler error if the type defines a 
 member function with the same name as the free function. I don't think 
 that I have ever seen that happen or ever seen anyone complain 
 about it. The only case I recall along those lines was someone 
 who was trying to use a free function that they'd decided to 
 call front instead of something else, and it had parameters 
 beyond just the input range, so that programmer got compilation 
 errors when they tried to use it in their range-based functions.

Not saying it's common, just something to be aware of that is 
non-obvious (well, it was not to me at least when I started 
getting into D). It's _probably_ not going to be a problem, but 
if it ever is, then it's going to be very hard to detect. 
And sure, the solution is to just not use ufcs to be certain, but 
ufcs is pretty damn appealing, which is probably why I didn't 
realize this at the beginning. As generic code bases grow, the 
chance of this happening is certainly not 0 though.

 Essentially yes, though you're passing too many arguments to 
 put. There are cases where put(output, foo) will compile while 
 output.put(foo) will not. In particular, 
 std.range.primitives.put will accept both individual elements 
 to be written to the output range and ranges of elements to be 
 written, whereas typically, an output range will be written to 
 only accept an element at a time. It's even more extreme with 
 output ranges of characters, because the free function put will 
 accept different string types and convert them, and even if the 
 programmer who designed the output range added various 
 overloads to put for completeness, it's enough extra work to 
 deal with all of the various character types that they probably 
 didn't. And put also works with stuff like delegates (most 
 frequently used with a toString that accepts an output range), 
 which don't have member functions. So, if you write your 
 generic code to use the member function put, it's only going to 
 work with user-defined types that define the particular 
 overload(s) of put that you're using in your function, whereas 
 if you use the free function, you have more variety in the 
 types of output ranges that your code works with, and you have 
 more ways that you can call put (e.g. passing a range of 
 elements instead of a single element).

Ooh ouch, well that's certainly good to know about.

 Basically I don't see a reason why we wouldn't want the 
 following to work:

 struct S { void f() {} }
 void f(S s, int i) {}
 S().f(3); // error

 So, are you complaining that it's an error, or you want it to 
 be an error? As it stands, it's an error, because as far as the 
 compiler is concerned, you tried to call a member function with 
 an argument that it doesn't accept.

Complaining that it is an error :) well, not complaining, more 
trying to understand why really. And I appreciate you taking the 
time to explain. There're a lot of points in there so here we 
go...

 If you want that code to work, then it would have to add the 
 free function to the overload set while somehow leaving out the 
 overloads that match the member function, which isn't how D 
 deals with overloading at this point.

Yeah, I'd say that's an implementation detail, but the main idea 
would be to treat an overload set that completely fails as an 
undefined function so that ufcs would kick in. Your problems with 
put would also go away then and implementing an output range 
would be less of a hassle.

 But if it did, then you have problems as soon as the type adds 
 another member function overload.

I'm not sure I see how. The member function would win out. This 
is the situation now anyway, with the added (IMO) disadvantage of 
ufcs being unusable then.

 Also, if you have a free function that matches the name of a 
 member function but where their parameters don't match, 
 wouldn't they be unrelated functions?

Well, maybe. The free function takes T as the first parameter so 
it's certainly related to the type. I suppose they are unrelated 
in the same way that:

struct S { void f() {} }
void g(S s) {}

g and f are unrelated.

 At that point, if you wrote code that accidentally matched the 
 free function instead of the member function, you end up with 
 code hijacking.

I'm not sure if code hijacking is the correct term here. This is 
a programmer error. It's exactly the same as if you have f(int) 
and f(long) and you call f(3) expecting to call f(long). Or if 
you have f(int, int) and f(int) and you accidentally type f(1) 
instead of f(1, 1).
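
Spelled out as code, the two overloading mistakes look roughly like this. The second set is named h here only so that both sets fit in one file:

```d
string f(int)  { return "f(int)"; }
string f(long) { return "f(long)"; }

string h(int)      { return "h(int)"; }
string h(int, int) { return "h(int, int)"; }

void main() {
    // An int literal is an exact match for f(int), so f(long) is not chosen.
    assert(f(3) == "f(int)");
    assert(f(3L) == "f(long)");

    // Dropping an argument silently selects the other overload.
    assert(h(1) == "h(int)");
    assert(h(1, 1) == "h(int, int)");
}
```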

 Just because you made a mistake when typing the code, you 
 called entirely the wrong function, and it's very hard to see, 
 because the function names match. Hopefully, testing will catch 
 it (and there's a decent chance that it will), but essentially, 
 the member function has been hijacked by the free function.

The exact same arguments can be made against function overloading 
here. This is as much a hijack as calling the wrong overload.

 D's overload rules were written with a strong bias towards 
 preventing function hijacking. To an extent, that's impossible 
 once UFCS comes into play, and Walter went with the choice that 
 hijacked the least and was the simplest to deal with.

Ya, I can understand it's a hard problem. So as it stands now, a 
member function can hijack an intended ufcs call of a free 
function. The case you've mentioned above though I'm not sure 
qualifies as hijacking. In the above case where a programmer 
accidentally types a name wrong, or parameters wrong, they've 
made a mistake. They wanted to call function f but they typed it 
wrong so they're calling function g. In this other case where a 
member function hijacks a ufcs call, the programmer intended to 
call f, typed f, but is somehow calling g.

 Basically, once UFCS comes into play, you have these options:

 1. Put all of the functions in the overload set.
 2. The member function wins.
 3. The free function wins.
 4. Have a pseudo-overload set where when there's a conflict 
 between a member
    function and a free function, the member function wins, but 
 free
    functions that don't match can be called as well.
 5. Have a pseudo-overload set where when there's a conflict 
 between a member
    function and a free function, the free function wins, but 
 member
    functions that don't match can be called as well.

 If it's ever the case that the free function wins, then you 
 can't call the member function if the free function is 
 available, which definitely causes [problems, so 3 and 5 are] 
 out.  If all of the functions are in the overload set, then 
 you're in basically the same boat, because you can't call the 
 member function if there's a conflict. It's just that the free 
 function results in a compilation error as well without using 
 an alias or the full import path or some other trick to get at 
 [the free function, so 1 is out as well.]

3, 5 and 1, yes, all out, completely agree here.

 [Option 4 does not] fit with how D does overloads in general, 
 you run the risk of the free function hijacking the member function 
 whenever there's a mistake, and you have problems whenever the 
 member functions are altered, making it so that which function 
 gets called can change silently.

[...] overloads in general. And you make a good point of getting a 
silent ufcs call if you alter a member function after the fact 
though. That would certainly be unwanted.

Hmm... ok touche on that part. I think I may agree with the 
current D implementation just because of that last point of yours 
now. I'm not entirely sure yet, need to think about it.

Now I'm thinking that if you really want to write a utility 
function that acts on generic code, but you also want to allow 
specialization by a type, then this (not sure it works, not 
tested):

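import std.traits : hasMember;

// Helper: true when T has its own util member that accepts a U.
enum hasOwnUtil(T, U) = hasMember!(T, "util") &&
     is(typeof(T.init.util(U.init)));

// Specialization provided by the type: forward to the member.
void util(T, U)(T t, U u) if (hasOwnUtil!(T, U)) {
     t.util(u);
}

// Fallbacks, used only when the type has no matching member.
void util(T)(T t, int a) if (!hasOwnUtil!(T, int)) {} // int case
void util(T)(T t, string a) if (!hasOwnUtil!(T, string)) {} // string case
void util(T, U)(T t, U u) if (!hasOwnUtil!(T, U)) {
   // generic case, probably needs more constraints I can't think of
}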

And then later:

void g(T)(T t) {
   util(t, 3);
}

Now you get all your cases handled and no compilation error if T 
implements one of the cases of util but not the others (I wonder 
if free function put does this?)
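
One possible way to sketch the same idea with a single template and static if instead of several constrained overloads. The types Plain and Custom and the string return values are assumptions made up for this example so that the chosen path is visible:

```d
import std.traits : hasMember;

// Hypothetical customization point: use T's own util(u) when it exists
// and accepts a U, otherwise fall back to a generic implementation.
string util(T, U)(T t, U u) {
    static if (hasMember!(T, "util") && is(typeof(t.util(u)) : string))
        return t.util(u);
    else
        return "generic";
}

struct Plain {}
struct Custom { string util(int) { return "custom int"; } }

string g(T)(T t) { return util(t, 3); }

void main() {
    assert(g(Plain()) == "generic");
    assert(g(Custom()) == "custom int");
    // Custom has no util(string), so the fallback is used instead of
    // producing a compile error.
    assert(util(Custom(), "hi") == "generic");
}
```

Because g calls util with normal call syntax, a member named util can only take over where the free function explicitly forwards to it.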



 Basically, D's overload rules are designed to favor compilation 
 errors over the risk of calling the wrong function, and while 
 its import system provides ways to differentiate between free 
 functions, it really doesn't provide a way to differentiate 
 between a member function and a free function except via 
 whether you use UFCS or not. And when those facts are taken 
 into account, it makes the most sense for member functions to 
 just win whenever a free function and a member function have 
 the same name. It also has the bonus that it reduces 
 compilation times, because if a free function could ever trump 
 a member function or was in any fashion included in its 
 overload set, then the compiler would have to check all of the 
 available functions when UFCS is used instead of looking at the 
 member functions and then only looking at free functions if 
 there was no member function with that name.

 - Jonathan M Davis

I'm not giving you the compilation times bonus point :p Yes I do 
agree it saves time but I doubt this would be an issue that would 
stop implementation if the things above were not an issue.

Cheers, thanks again for taking the time.
- Ali

Mar 13 2018
