digitalmars.D - opCmp / opEquals do not actually support partial orders

H. S. Teoh (55/55) Jul 17 2018 As we know, when opCmp is defined for a user type, then opEquals must

John Colvin (3/47) Jul 17 2018 Just do what std.typecons.Proxy does and return float.nan for the

Dominikus Dittes Scherkl (16/23) Jul 18 2018 Yes, that's the only way. Having this 4th value of opCmp is
Ivan Kazmenko (36/38) Jul 18 2018 Isn't it slow though on current processors? I just threw

Dominikus Dittes Scherkl (154/160) Jul 18 2018 No, floats are a whole lot less slow.

Ivan Kazmenko (27/30) Jul 18 2018 Are they? Locally, I don't see much difference. Code:

rikki cattermole (3/8) Jul 18 2018 Boy do I ever have some bad news for you!

Ivan Kazmenko (10/18) Jul 18 2018 Wow, thanks!

Jonathan M Davis (7/9) Jul 18 2018 Since when is that legal? I thought that it was required for opCmp to re...

Dominikus Dittes Scherkl (11/20) Jul 19 2018 It always worked with float as returntype (at least since I'm

Simen =?UTF-8?B?S2rDpnLDpXM=?= (7/25) Jul 19 2018 The oldest reference to opCmp returning NaN that I've found is

H. S. Teoh (19/28) Jul 18 2018 [...]

Dominikus Dittes Scherkl (5/13) Jul 19 2018 It already works with float, no recursion. A lot of the types I

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

As we know, when opCmp is defined for a user type, then opEquals must
also be defined in order for == to work, even though in theory the
compiler could translate x==y into x.opCmp(y)==0.

In the past, it was argued that this was so that partial orders could be
made to work, i.e., it could be the case that x and y are incomparable,
then x.opCmp(y) would return 0, and x.opEquals(y) would be false.
Supposedly, this would allow us to, say, model the subset relation
(which is a partial order) using the comparison operators.

However, this supposition is in fact false.  The reason is that `<=` and
`>=` *only* check the return value of opCmp(); they do not check
opEquals at all.  So suppose we define a Set type, and we'd like to use
the comparison operators to model the subset relation.  How would we
define opCmp and opEquals?

opEquals is obvious, of course.  It's true if x and y have the same
elements, false otherwise.

But opCmp turns out to be a tarpit.  Here's why:

- If x is a (strict) subset of y, then obviously we should return -1.

- If x is a (strict) superset of y, then obviously we return 1.

- If x and y have the same elements, then obviously we should return 0,
  since we'd like x <= y and x >= y to be true when x == y is true.

- If x and y are not subsets of each other, then what should we return?
  According to the original claim, it should also return 0, for
  "incomparable".  However, this leads to problems:

  - Since `x <= y` in D means `x.opCmp(y) <= 0`, this will return TRUE
    when x and y are incomparable.

  - Similarly, `x >= y` in D means `x.opCmp(y) >= 0`, so it will also
    return TRUE when x and y are incomparable.

  - Therefore, `x <= y` cannot distinguish between x being a non-strict
    subset of y vs. x and y being incomparable.  Similarly for `x >= y`.

So we have the counterintuitive semantics that <= and >= will return
true for non-comparable sets.

There is no return value of opCmp for which both <= and >= will be
false, as we need it to be if we are to map <= to ⊆ and >= to ⊇.

Furthermore, this situation cannot be remedied by redefining < and > to
be ⊆ and ⊇ (as we might try to do in order to work around <= and >= not
checking the return value of opEquals), because when x == y, then there
is no return value that opCmp could return that would make `x < y` and
`x > y` both true simultaneously.

IOW, with the current semantics of opEquals and opCmp, it is impossible
to map the semantics of ⊆ and ⊇ to the comparison operators, in spite of
the fact that we allow opCmp() to return 0 when opEquals returns false.

Conclusion: the claim that opCmp/opEquals could be made to support
partial orders is patently false.

In fact, it cannot even be made to model NaN's in floating-point
arithmetic, which is a mostly-linear ordering, because there is no way
to make <= and >= both false using opCmp() when NaN's are involved
(e.g., in a user-defined floating-point wrapper type).  The best you can
get is that `x <= NAN` and `x >= NAN` is always true.

Corollary: allowing opCmp()==0 to be inconsistent with opEquals()==true
is useless, since we cannot actually use it to support any meaningful
ordering that isn't a linear ordering.  Thus, it only serves as a source
of confusion, and should be made illegal in the language.


T

-- 
EMACS = Extremely Massive And Cumbersome System

Jul 17 2018

John Colvin <john.loughran.colvin gmail.com> writes:

On Tuesday, 17 July 2018 at 18:21:26 UTC, H. S. Teoh wrote:
 As we know, when opCmp is defined for a user type, then 
 opEquals must also be defined in order for == to work, even 
 though in theory the compiler could translate x==y into 
 x.opCmp(y)==0.

 In the past, it was argued that this was so that partial orders 
 could be made to work, i.e., it could be the case that x and y 
 are incomparable, then x.opCmp(y) would return 0, and 
 x.opEquals(y) would be false. Supposedly, this would allow us 
 to, say, model the subset relation (which is a partial order) 
 using the comparison operators.

 However, this supposition is in fact false.  The reason is that 
 `<=` and `>=` *only* check the return value of opCmp(); they do 
 not check opEquals at all.  So suppose we define a Set type, 
 and we'd like to use the comparison operators to model the 
 subset relation.  How would we define opCmp and opEquals?

 opEquals is obvious, of course.  It's true if x and y have the 
 same elements, false otherwise.

 But opCmp turns out to be a tarpit.  Here's why:

 - If x is a (strict) subset of y, then obviously we should 
 return -1.

 - If x is a (strict) superset of y, then obviously we return 1.

 - If x and y have the same elements, then obviously we should 
 return 0,
   since we'd like x <= y and x >= y to be true when x == y is 
 true.

 - If x and y are not subsets of each other, then what should we 
 return?
   According to the original claim, it should also return 0, for
   "incomparable".  However, this leads to problems:

   - Since `x <= y` in D means `x.opCmp(y) <= 0`, this will 
 return TRUE
     when x and y are incomparable.

   - Similarly, `x >= y` in D means `x.opCmp(y) >= 0`, so it 
 will also
     return TRUE when x and y are incomparable.

   - Therefore, `x <= y` cannot distinguish between x being a 
 non-strict
     subset of y vs. x and y being incomparable.  Similarly for 
 `x >= y`.

 So we have the counterintuitive semantics that <= and >= will 
 return true for non-comparable sets.

 There is no return value of opCmp for which both <= and >= will 
 be false, as we need it to be if we are to map <= to ⊆ and >= 
 to ⊇.

Just do what std.typecons.Proxy does and return float.nan for the 
incomparable case.

Jul 17 2018

Dominikus Dittes Scherkl <dominikus.scherkl continental-corporation.com> writes:

On Tuesday, 17 July 2018 at 21:18:12 UTC, John Colvin wrote:
 On Tuesday, 17 July 2018 at 18:21:26 UTC, H. S. Teoh wrote:
 But opCmp turns out to be a tarpit.  Here's why:


   According to the original claim, it should also return 0, for
   "incomparable".  However, this leads to problems:


Indeed. And it doesn't make sense at all.

 Just do what std.typecons.Proxy does and return float.nan for 
 the incomparable case.

Yes, that's the only way. Having this 4th value of opCmp is 
necessary
for may types. In fact opCmp() it practically the only place 
where I
ever use the type "float", just to have the NaN.
If I really need floatingpoint arithmetic, I always use real (or 
at least double).

I would like to have a "signed boolean" type (with the values -1, 
0, 1 and NaN)
simply for all kind of sign operations. But ok, float is 32bit, 
and IEEE
suggests a "half" type (16bit). As a "signed boolean" would need 
a byte anyway
there is no too much to gain.

Jul 18 2018

Ivan Kazmenko <gassa mail.ru> writes:

On Tuesday, 17 July 2018 at 21:18:12 UTC, John Colvin wrote:
 Just do what std.typecons.Proxy does and return float.nan for 
 the incomparable case.

Isn't it slow though on current processors?  I just threw 
together a test program.

-----
import std.datetime.stopwatch, std.math, std.stdio;
immutable double limit = 10 ^^ 7;
void main () {
     int s;
     auto sw = StopWatch (AutoStart.yes);
     auto start = sw.peek ();
     for (double x = NaN (0), i = 0; i < limit; i++)
         s += (i < x);
     auto now = sw.peek ();
     writeln (now - start, ", sum is ", s);
}
-----

Essentially, it compares a double x (initialized as a quiet NaN 
with payload 0) to a numeric double i, ten million times.  
Leaving x uninitialized, or using floats, work about the same.  
And here is the result:

-----
1 sec, 593 ms, 116 μs, and 3 hnsecs, sum is 0
-----

So, for a comparison involving NaN, we can do only a few million 
of them a second.  Granted, my Core i7-2600K is more than 5 years 
old already, but the situation is unlikely to get any better.  
But that's still a couple orders of magnitude slower than normal 
vs. normal floating-point comparison: try assigning some regular 
number to x, then the test takes ~50ms.

As far as I know, rare floating-point operations (such as this, 
or using subnormal numbers) are, or rather should be, treated as 
exceptions.  The hardware manufacturers neglect such rare 
operations to fit a bit more efficiency in the more common ones 
(or they are just lazy).  As a result, the developers don't 
overuse them.

Ivan Kazmenko.

Jul 18 2018

Dominikus Dittes Scherkl <dominikus.scherkl continental-corporation.com> writes:

On Wednesday, 18 July 2018 at 13:12:05 UTC, Ivan Kazmenko wrote:
 On Tuesday, 17 July 2018 at 21:18:12 UTC, John Colvin wrote:
 Just do what std.typecons.Proxy does and return float.nan for 
 the incomparable case.

 Isn't it slow though on current processors?  I just threw 
 together a test program.
 Leaving x uninitialized, or using floats, work about the same.

No, floats are a whole lot less slow.
But I agree, I would still prefer to have a signed bool which 
uses only 1 byte
and provide _much_ faster comparison to NaN. And I have created 
one,
but as it is not supported by the language natively, it 
internally has to use
float, so is not faster (or even slower):

/// signed boolean type (2bit), has the values neg (0b11), zero 
(0b00), pos (0b01) and nan (0b10)
/// this is an extended boolean that split "true" into 
positive/negative and "false" into zero/NaN
struct sbool
{
pure:
 safe:
 nogc:
nothrow:
    enum { neg = 0b11, zero = 0b00, pos = 0b01, nan = 0b10 };

    this(T)(const(T) x) if(isNumeric!T)
    {
       static if(is(Unqual!T == sbool))
          val = x.val; /// pos -> 1, neg -> 3, zero -> 0, nan -> 2
       else static if(is(Unqual!T == bool))
          val = x ? pos : zero;
       else static if(isUnsigned!T)
          val = x.isInvalid ? nan : (x>0) ? pos : zero;
       else // signed or safe signed
          val = x.isInvalid ? nan : (x<0) ? neg : (x>0) ? pos : 
zero;
    }

    T opCast(T)() const if(isNumeric!T)
    {
       static if(is(Unqual!T == sbool))
          return this;
       else static if(is(Unqual!T == bool))
          return val&1; // pos and neg are mapped to true, zero 
and NaN are mapped to false
       else static if(isFloatingPoint!T)
          return tsgn[val];
       else
          return val == nan ? invalid!T : cast(T)tsgn[val];
    }

    sbool opAssign(T)(const(T) x) if(isNumeric!T)
    {
       return this(x);
    }

    sbool opOpAssign(string op)(const sbool x) if(op == "+" || op 
== "-" || op == "*" || op == "/" || op == "%")
    {
       static if(op == "+") // attention! pos+neg = neg+pos = nan 
!!
          val = tadd[val];
       else static if(op == "-") // attention! pos-pos = neg-neg = 
nan !!
          val = tsub[val];
       else static if(op == "*")
          val = tmul[val];
       else static if(op == "/")
          val = tdiv[val];
       else static if(op == "%")
          val = trem[val];
       val >>= x.val<<1;
       val &= 3;
       return this;
    }

    sbool opUnary(string op)() if(op == "++" || op == "--")
    {
       mixin(op~"val");
       val &= 3;
       return this;
    }

    sbool opUnary(string op)() const if(op == "+" || op == "-" || 
op == "~")
    {
       static if(op == "+") return this;
       sbool r = this;
       mixin("r.val = "~op~"r.val");
       r.val &= 3;
       return r;
    }

    sbool opBinary(string op)(const(sbool) x) const if(op == "+" 
|| op == "-" || op == "*" || op == "/" || op == "%")
    {
       sbool r = this;
       return mixin("r "~op~"= x");
    }

    Signed!T opBinary(string op, T)(const(T) x) const 
if(isNumeric!T && op == "*")
    {
       static if(isUnsigned!T)
       {
          alias R = Signed!T;
          if(val == nan || x.isInvalid) return invalid!R;
          if(val == zero) return 0;
          if(x > R.max) return invalid!R;
          return (val == pos) ? R(x) : -R(x);
       }
       else // signed or float: return type is same as T
       {
          if(x.isInvalid) return x;
          final switch(val)
          {
          case pos: return x;
          case neg: return -x;
          case zero: return 0;
          case nan: return invalid!T;
          }
       }
    }

    Signed!T opBinaryRight(string op, T)(const(T) x) const 
if(isNumeric!T && op == "*")
    {
       return opBinary!"*"(x);
    }
private:
    ubyte val = nan;

    static immutable float[4] tsgn = [ 0.0f, 1.0f, float.init, 
-1.0f ];
    // composition tables              0: -1  N  1  0  1: -1  N  1 
  0  N: -1  N  1  0 -1: -1  N  1  0
    static immutable ubyte[4] tadd = [ 0b_11_10_01_00, 
0b_10_10_01_01, 0b_10_10_10_10, 0b_11_10_10_11 ];
    static immutable ubyte[4] tsub = [ 0b_01_10_11_00, 
0b_01_10_10_01, 0b_10_10_10_10, 0b_10_10_11_11 ];
    static immutable ubyte[4] tmul = [ 0b_00_10_00_00, 
0b_11_10_01_00, 0b_10_10_10_10, 0b_01_10_11_00 ];
    static immutable ubyte[4] tdiv = [ 0b_00_10_00_10, 
0b_11_10_01_10, 0b_10_10_10_10, 0b_01_10_11_10 ];
    static immutable ubyte[4] trem = [ 0b_00_10_00_10, 
0b_01_10_01_10, 0b_10_10_10_10, 0b_11_10_11_10 ];
    // remainder table is designed so, that if you define
    // quot = (abs(a)/abs(b)) * (sbool(a)/sbool(b));
    // rem  = (abs(a)%abs(b)) * (sbool(a)%sbool(b));
    // then   assert(a == quot*b + rem)   holds for all (a,b) with 
b != 0
}

unittest
{
    byte quot, rem;
    for(byte a = 127; a >= -127; --a)
    {
       for(byte b = 127; b >= -127; --b)
       {
          quot = cast(byte)( (abs(a)/abs(b)) * (sbool(a)/sbool(b)) 
);
          rem  = cast(byte)( (abs(a)%abs(b)) * (sbool(a)%sbool(b)) 
);
          assert((b == 0 && isInvalid(quot) && isInvalid(rem)) || 
a == quot*b + rem);
       }
    }
}

Jul 18 2018

Ivan Kazmenko <gassa mail.ru> writes:

On Wednesday, 18 July 2018 at 14:02:28 UTC, Dominikus Dittes 
Scherkl wrote:
 On Wednesday, 18 July 2018 at 13:12:05 UTC, Ivan Kazmenko wrote:
 Leaving x uninitialized, or using floats, work about the same.

 No, floats are a whole lot less slow.

Are they?  Locally, I don't see much difference.  Code:

-----
import std.datetime.stopwatch, std.math, std.stdio;
immutable float limit = 10 ^^ 7;
void main () {
     int s;
     auto sw = StopWatch (AutoStart.yes);
     auto start = sw.peek ();
     for (float x = NaN (0), i = 0; i < limit; i++)
         s += (i < x);
     auto now = sw.peek ();
     writeln (now - start, ", sum is ", s);
}
-----

Result:

-----
1 sec, 467 ms, 40 μs, and 6 hnsecs, sum is 0
-----

Fluctuating within a few per cent, but very similar to version 
with doubles.

That's by DMD32 on Windows.  (Sorry, my DMD64 broke after 
upgrading Visual Studio to 2017, and I failed to fix it right 
now.  Anyway, it's not like x86_64 uses a different set of 
general purpose floating-point hardware, right?)

Ivan Kazmenko.

Jul 18 2018

rikki cattermole <rikki cattermole.co.nz> writes:

On 19/07/2018 3:03 AM, Ivan Kazmenko wrote:
 
 That's by DMD32 on Windows.  (Sorry, my DMD64 broke after upgrading 
 Visual Studio to 2017, and I failed to fix it right now.  Anyway, it's 
 not like x86_64 uses a different set of general purpose floating-point 
 hardware, right?)

Boy do I ever have some bad news for you!

SSE for 64bit and x87 for 32bit, as per run.dlang.org.

Jul 18 2018

Ivan Kazmenko <gassa mail.ru> writes:

On Wednesday, 18 July 2018 at 15:13:24 UTC, rikki cattermole 
wrote:
 On 19/07/2018 3:03 AM, Ivan Kazmenko wrote:
 
 That's by DMD32 on Windows.  (Sorry, my DMD64 broke after 
 upgrading Visual Studio to 2017, and I failed to fix it right 
 now.  Anyway, it's not like x86_64 uses a different set of 
 general purpose floating-point hardware, right?)

 Boy do I ever have some bad news for you!

 SSE for 64bit and x87 for 32bit, as per run.dlang.org.

Wow, thanks!

As per https://run.dlang.io/, it's fast for float and double, but 
slow for reals (which are 80 bits and don't fit into the fancy 
instructions you mention).  Unfortunately, it fails to compile 
with -m32, but anyway, point taken.

As an aside, learning something new after virtually every post is 
why I love the D forum/newsgroup.

Ivan Kazmenko.

Jul 18 2018

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Tuesday, July 17, 2018 21:18:12 John Colvin via Digitalmars-d wrote:
 Just do what std.typecons.Proxy does and return float.nan for the
 incomparable case.

Since when is that legal? I thought that it was required for opCmp to return
int. Certainly, the spec implies that it has to be int. The fact that the
compiler allows it seems like a bug, though if Phobos is doing it, it
wouldn't surprise me if Walter would choose to update the spec rather than
fixing the compiler.

- Jonathan M Davis

Jul 18 2018

Dominikus Dittes Scherkl <dominikus.scherkl continental-corporation.com> writes:

On Wednesday, 18 July 2018 at 17:30:21 UTC, Jonathan M Davis 
wrote:
 On Tuesday, July 17, 2018 21:18:12 John Colvin via 
 Digitalmars-d wrote:
 Just do what std.typecons.Proxy does and return float.nan for 
 the incomparable case.

 Since when is that legal? I thought that it was required for 
 opCmp to return int. Certainly, the spec implies that it has to 
 be int. The fact that the compiler allows it seems like a bug, 
 though if Phobos is doing it, it wouldn't surprise me if Walter 
 would choose to update the spec rather than fixing the compiler.

It always worked with float as returntype (at least since I'm 
using D
- about 2.40 or so?), and it was necessary for the (unfortunately
deprecated) special operators <>, !<>, <>=, !<>=, ...

would really pissing me off if someone deemed this excellent 
feature
beeing a bug and removed it. The OP already found out that some
valuable mathematical concepts doesn't work with only 3 values for
opCmp. The 4th value is essencial.

Jul 19 2018

Simen =?UTF-8?B?S2rDpnLDpXM=?= <simen.kjaras gmail.com> writes:

On Thursday, 19 July 2018 at 10:14:34 UTC, Dominikus Dittes 
Scherkl wrote:
 On Wednesday, 18 July 2018 at 17:30:21 UTC, Jonathan M Davis 
 wrote:
 On Tuesday, July 17, 2018 21:18:12 John Colvin via 
 Digitalmars-d wrote:
 Just do what std.typecons.Proxy does and return float.nan for 
 the incomparable case.

 Since when is that legal? I thought that it was required for 
 opCmp to return int. Certainly, the spec implies that it has 
 to be int. The fact that the compiler allows it seems like a 
 bug, though if Phobos is doing it, it wouldn't surprise me if 
 Walter would choose to update the spec rather than fixing the 
 compiler.

 It always worked with float as returntype (at least since I'm 
 using D
 - about 2.40 or so?), and it was necessary for the 
 (unfortunately
 deprecated) special operators <>, !<>, <>=, !<>=, ...

The oldest reference to opCmp returning NaN that I've found is 
from 2005, so it's not an entirely new thing:
https://forum.dlang.org/thread/dd3cea$26u7$1 digitaldaemon.com

--
   Simen

Jul 19 2018

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Jul 18, 2018 at 11:30:21AM -0600, Jonathan M Davis via Digitalmars-d
wrote:
 On Tuesday, July 17, 2018 21:18:12 John Colvin via Digitalmars-d wrote:
 Just do what std.typecons.Proxy does and return float.nan for the
 incomparable case.

 
 Since when is that legal? I thought that it was required for opCmp to
 return int. Certainly, the spec implies that it has to be int. The
 fact that the compiler allows it seems like a bug, though if Phobos is
 doing it, it wouldn't surprise me if Walter would choose to update the
 spec rather than fixing the compiler.

[...]

Yeah, this is also a surprise to me.  Is this a bug?  I was under the
impression that opCmp must return int.

On the other hand, if opCmp is allowed to return a user-defined type, it
would solve the problem in a neat way: just define a quaternary type
that encapsulates the values -1, 0, 1, NaN, and have opCmp return the
equivalent of NaN for non-comparable arguments.  Then we could support
partial orders correctly.

But I have a hard time seeing this actually work in practice, because a
user-defined return type for opCmp leads to recursion: in order to
compare the return type against -1, 0, 1, etc., it would also need to
implement opCmp itself.  I'm almost certain this will cause the compiler
to choke. :-D  Not to mention opening up the possibility of ridiculous
cases like having two user-defined types whose opCmp returns the other
type, leading to infinite recursion.


T

-- 
Being able to learn is a great learning; being able to unlearn is a greater
learning.

Jul 18 2018

Dominikus Dittes Scherkl <dominikus.scherkl continental-corporation.com> writes:

On Wednesday, 18 July 2018 at 17:49:19 UTC, H. S. Teoh wrote:
 On Wed, Jul 18, 2018 at 11:30:21AM -0600, Jonathan M Davis via 
 On the other hand, if opCmp is allowed to return a user-defined 
 type, it would solve the problem in a neat way: just define a 
 quaternary type that encapsulates the values -1, 0, 1, NaN, and 
 have opCmp return the equivalent of NaN for non-comparable 
 arguments.  Then we could support partial orders correctly.

 But I have a hard time seeing this actually work in practice, 
 because a user-defined return type for opCmp leads to recursion:

It already works with float, no recursion. A lot of the types I 
use depend on this.

But having a language supported quarterny type would be good for 
its improved speed.

Jul 19 2018

D Programming

C/C++ Programming

Other

digitalmars.D - opCmp / opEquals do not actually support partial orders