digitalmars.D - [Vote] Should defining opCmp to be inconsistent with opEquals be

Stewart Gordon (24/24) Aug 06 2005 It is possible to define the opCmp and opEquals methods so that unequal

Regan Heath (49/53) Aug 06 2005 I see in your first example/link you're defining a character class for

Stewart Gordon (34/68) Aug 08 2005 Yes. But my example was on the contrived side, but maybe the idea can

Regan Heath (17/65) Aug 08 2005 Sure. But in the case of an AA we need to assign an

Stewart Gordon (11/15) Aug 09 2005

Regan Heath (29/38) Aug 09 2005 opCmp

Stewart Gordon (18/49) Aug 09 2005 Adding or looking up a key in an AA isn't comparing either.

Regan Heath (13/44) Aug 09 2005 In which case it should use opRank or opSort, no? (if it exists)

Stewart Gordon (14/29) Aug 10 2005

Regan Heath (42/64) Aug 10 2005 Sure.

Ben Hinkle (9/15) Aug 06 2005 My first preference would be 1 since I think it would be a maintenance

Burton Radons (16/17) Aug 07 2005 2, I don't care what incorrect code does and I don't want any speed

Ben Hinkle (34/50) Aug 07 2005 True. I was talking about <>= in the context of opCmp for objects. Since...

Manfred Nowak (22/24) Aug 07 2005 I already mentioned, that the AA-implementation is broken:

Stewart Gordon (12/35) Aug 08 2005 That's why it talks about calling .rehash after populating the AA. But

Manfred Nowak (18/21) Aug 08 2005 Optimal number of hash slots under which target function? And if you

Stewart Gordon (9/11) Aug 08 2005
Sean Kelly (12/31) Aug 08 2005 That's because they do match, at least as far as an ordered binary tree ...

Sean Kelly (6/11) Aug 08 2005 My response was poorly worded (one of these days I'll learn not to post ...
Stewart Gordon (8/12) Aug 08 2005

Sean Kelly (7/15) Aug 08 2005 I don't have writefln available in my current setup :p. But it looks li...

Stewart Gordon (10/19) Aug 09 2005 Update your compiler, or change the example to use printf.

Sean Kelly (4/10) Aug 09 2005 I disagree. DMD is a reference implementation, so where the spec is vag...

Manfred Nowak (5/8) Aug 09 2005 Then: what is the bugs group for? Only for deviations of dmd to the

Sean Kelly (11/18) Aug 09 2005 Mostly, yes. Though I've posted things there in the past that were more...
Derek Parnell (12/21) Aug 09 2005 Do you try to be thick?

Stewart Gordon (14/26) Aug 09 2005 This would mean that:

Sean Kelly (8/30) Aug 09 2005 See my other post for clarification. I didn't mean to imply that DMD sh...
Derek Parnell (13/41) Aug 09 2005 Absolutely true!

Sean Kelly (7/12) Aug 08 2005 Oh, I should mention that all my prior experience with hash maps has bee...
Manfred Nowak (25/30) Aug 22 2005 Stewart Gordon wrote in

Stewart Gordon <smjg_1998 yahoo.com> writes:

It is possible to define the opCmp and opEquals methods so that unequal 
objects can rank equally in order.  This is allowed by the Java specs, 
and indeed one or two Java API classes do this.

However, when this is done in D, the current implementation of 
associative arrays falls apart.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4556

The problem is that the AA implementation assumes that if opCmp returns 
zero then the keys match.  As far as I can see, the spec doesn't in any 
way forbid behaviour to the contrary.

But there seems to be room for debate on what the best solution is. 
Possibilities:

1. State in the spec that it is an incorrect practice to define opCmp to 
return 0 if opEquals returns true.

2. Keep it as an acceptable practice, but warn people properly that such 
a type may not work correctly as an AA key.

3. Keep it as an acceptable practice, and fix the AA implementation to 
work on such types.  Before you dismiss this idea as inefficient, please 
read

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4710

Cast your votes now!

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 06 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Sat, 06 Aug 2005 23:10:26 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 It is possible to define the opCmp and opEquals methods so that unequal  
 objects can rank equally in order.

I see in your first example/link you're defining a character class for  
which opEquals does a case-sensitive compare, but opCmp does a case  
insensitive comparrison. The goal, I assume, is to have characters/strings  
that sort/rank in a case insensitive fashion but are only exactly equal if  
the case is the same. Am I right?

A recent thread on the array 'sort' operator has made me think that  
perhaps we can apply the same idea to AA's. That idea is the ability to  
assign a 'sort'/'opCmp' method so the array.

So, we pick:

 1. State in the spec that it is an incorrect practice to define opCmp to  
 return 0 if opEquals returns true.

and instead allow the following...

class IChar
{
   static int opCmpCase(IChar lhs, IChar rhs) {}
   ..etc..
}

int[IChar] list;
list.opCmp = &IChar.opCmpCase;
..add items to 'list'..

and/or allow assignment of a delegate, eg.

class ICharSort
{
   int[] sortOrder; //by name, then by date, then by ..
   int opCmp(IChar lhs, IChar rhs) {}
}

ICharSort s = new ICharSort();
..define sortOrder in 's'..

int[IChar] list;
list.opCmp = &s.opCmp;

..add items to 'list'..

to allow us to sort/rank in a more complex fashion.

For an AA it could either be illegal, or would cause a rehash to assign  
the delegate when it contains items. For an array it wouldn't matter  
(except that it would sort of invalidate the state i.e. if you sort, then  
assign it's not technically 'sorted' by it's current operator any more)

The advantage here is that we have only one array/AA implementation and  
there should be little (no?) performance penalty using types that do or  
don't have this requirement.

Of course one argument against this idea may be that the class ('IChar' in  
this example) should never be sorted/ranked in any other way. In which  
case...

Why can't we have a 3rd operator opRank (or opSort) - definable in IChar,  
and make the the array/AA smart. The array/AA could, upon creation, assign  
a comparrison delegate to this new operator if present and opCmp otherwise.

In this way there is no additional performance penalty (as it only checks  
once, upon creation, not every comparrison)

Thoughts?

Regan

Aug 06 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Regan Heath wrote:

 On Sat, 06 Aug 2005 23:10:26 +0100, Stewart Gordon 
 <smjg_1998 yahoo.com>  wrote:
 
 It is possible to define the opCmp and opEquals methods so that 
 unequal  objects can rank equally in order.

 
 I see in your first example/link you're defining a character class for  
 which opEquals does a case-sensitive compare, but opCmp does a case  
 insensitive comparrison. The goal, I assume, is to have 
 characters/strings  that sort/rank in a case insensitive fashion but are 
 only exactly equal if  the case is the same. Am I right?

Yes.  But my example was on the contrived side, but maybe the idea can 
be put to practical use.

 A recent thread on the array 'sort' operator has made me think that  
 perhaps we can apply the same idea to AA's. That idea is the ability to  
 assign a 'sort'/'opCmp' method so the array.

I personally prefer the idea of having array.sort(cmp) available here. 
Not only is writing

     array.sort(&cmpByDate);

shorter than

     array.opCmp = &cmpByDate;
     array.sort;

but it also doesn't bloat memory requirements.

 So, we pick:
 
 1. State in the spec that it is an incorrect practice to define opCmp 
 to  return 0 if opEquals returns true.

 
 and instead allow the following...
 
 class IChar
 {
   static int opCmpCase(IChar lhs, IChar rhs) {}
   ..etc..
 }
 
 int[IChar] list;
 list.opCmp = &IChar.opCmpCase;
 ..add items to 'list'..

I'm not sure opCmp is the best name - it looks like you're setting the 
function that will be used to compare this AA with another AA.  Maybe 
call it comparator?

That said, what's the point of this?  To allow some AAs to be 
case-sensitive and others to be case-insensitive?  If you're going to do 
this, do we gain much over doing assignments and lookups to 
aa[toupper(key)]?

If we solved the problem by giving every AA a comparator property, then 
every AA that uses such a keytype would need to set the comparator 
anyway.  So why not go the whole hog and let the type define the AA 
comparator?

<snip>
 Why can't we have a 3rd operator opRank (or opSort) - definable in 
 IChar,  and make the the array/AA smart. The array/AA could, upon 
 creation, assign  a comparrison delegate to this new operator if present 
 and opCmp otherwise.

<snip>

But calling it opSort would imply using it for .sort as well.  This is 
pointless IMO - normal opCmp can be used for this.  If you define opCmp 
such that unequal objects can rank equally, then you're stating that 
when sorting, you don't care what order equally-ranking objects end up in.

Maybe call it opStrictCmp.  AAs (and any other applications relying on a 
comparator consistent with opEquals) can use this, and the default .sort 
can use plain opCmp.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 08 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Mon, 08 Aug 2005 12:31:59 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 A recent thread on the array 'sort' operator has made me think that   
 perhaps we can apply the same idea to AA's. That idea is the ability  
 to  assign a 'sort'/'opCmp' method so the array.

 I personally prefer the idea of having array.sort(cmp) available here.  
 Not only is writing

      array.sort(&cmpByDate);

 shorter than

      array.opCmp = &cmpByDate;
      array.sort;

 but it also doesn't bloat memory requirements.

Sure. But in the case of an AA we need to assign an  
operator/function/delegate for it to use in it's internal ordering on  
every insert/lookup/etc.

 So, we pick:

 1. State in the spec that it is an incorrect practice to define opCmp  
 to  return 0 if opEquals returns true.

  and instead allow the following...
  class IChar
 {
   static int opCmpCase(IChar lhs, IChar rhs) {}
   ..etc..
 }
  int[IChar] list;
 list.opCmp = &IChar.opCmpCase;
 ..add items to 'list'..

 I'm not sure opCmp is the best name - it looks like you're setting the  
 function that will be used to compare this AA with another AA.  Maybe  
 call it comparator?

We can call it whatever.

 That said, what's the point of this?  To allow some AAs to be  
 case-sensitive and others to be case-insensitive?  If you're going to do  
 this, do we gain much over doing assignments and lookups to  
 aa[toupper(key)]?

To allow you to store your weird objects for which opEquals and opCmp  
disagree.

 If we solved the problem by giving every AA a comparator property, then  
 every AA that uses such a keytype would need to set the comparator  
 anyway.  So why not go the whole hog and let the type define the AA  
 comparator?

Sure, that's exactly what I suggest below.

 <snip>
 Why can't we have a 3rd operator opRank (or opSort) - definable in  
 IChar,  and make the the array/AA smart. The array/AA could, upon  
 creation, assign  a comparrison delegate to this new operator if  
 present and opCmp otherwise.

 <snip>

 But calling it opSort would imply using it for .sort as well.

But that's _exactly_ what it is used for, ranking/sorting the object.

I'm suggesting that opEquals and opCmp should _always_ agree and opSort  
may disagree with opCmp. AA's would use opCmp, sort would use opSort. This  
lets you use them in AA's and sort them how you want in normal arrays.

You suggest the opposite below, either way a 3rd operator to handle your  
unusual type with AA's and arrays automagically picking up and using the  
new operator when poresent is the solution I like best.

 This is pointless IMO - normal opCmp can be used for this.  If you  
 define opCmp such that unequal objects can rank equally, then you're  
 stating that when sorting, you don't care what order equally-ranking  
 objects end up in.

 Maybe call it opStrictCmp.  AAs (and any other applications relying on a  
 comparator consistent with opEquals) can use this, and the default .sort  
 can use plain opCmp.

Regan

Aug 08 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Regan Heath wrote:
<snip>
 I'm suggesting that opEquals and opCmp should _always_ agree and opSort  
 may disagree with opCmp. AA's would use opCmp, sort would use opSort. 
 This lets you use them in AA's and sort them how you want in normal 
 arrays.

<snip>

Which would the < <= > >= operators use under your proposal?  Since this 
is akin to sorting, it would make sense to use opSort, but then it 
becomes confusing that there is a method called opCmp but it doesn't do 
this.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 09 Aug 2005 12:06:53 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 Regan Heath wrote:
 <snip>
 I'm suggesting that opEquals and opCmp should _always_ agree and  
 opSort  may disagree with opCmp. AA's would use opCmp, sort would use  
 opSort. This lets you use them in AA's and sort them how you want in  
 normal arrays.

 <snip>

 Which would the < <= > >= operators use under your proposal?

opCmp

 Since this is akin to sorting, it would make sense to use opSort

I disagree. You're comparing not sorting thus opCmp makes sense.

More important however is _how_ you want to compare in the general case  
with this special type, eg.

IChar a = 'a';
IChar b = 'B';

if (a < b) { //true or false? }

Assume for the sake of argument:
   'a' = 0x61 or 91 decimal
   'B' = 0x42 or 66 decimal

Is the statement true or false?

Does the answer depend on context (if so, which is more common case or no  
case?) or will this type always be compared in one way?

If you generally want to do a case-insensitive compare then you're  
defining an opCmp that AA's cannot use.
If you want to do case-sensitive compare generally then you're defining an  
opCmp that AA's can use.

I suspect you want the former, right?

In which case the solution might be to have another operator (as you  
suggested) opCmpStrict or something used by AA's. This is what I referred  
to when I said "either way" earlier. This operator can be found and used  
automagically by the AA without requiring a change to the 'balanced binary  
tree' part of the implementation.

Regan

p.s. My intention in starting this part of the thread was to explore  
solutions that don't involve changing the AA implementation. I'm not  
saying that's not the solution, I'm just exploring...

Aug 09 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Regan Heath wrote:

 On Tue, 09 Aug 2005 12:06:53 +0100, Stewart Gordon 
 <smjg_1998 yahoo.com>  wrote:

<snip>
 Which would the < <= > >= operators use under your proposal?

 
 opCmp
 
 Since this is akin to sorting, it would make sense to use opSort

 
 I disagree. You're comparing not sorting thus opCmp makes sense.

Adding or looking up a key in an AA isn't comparing either.

Under the way I had it, one comparator is used externally and one 
internally.  It also remains possible to define < <= > >= to have a 
non-strict comparator.  (OK, so there would remain inconsistency between

     qwert <= yuiop

and

     qwert < yuiop || qwert == yuiop

but is this really an excuse?)

 More important however is _how_ you want to compare in the general case  
 with this special type, eg.
 
 IChar a = 'a';
 IChar b = 'B';
 
 if (a < b) { //true or false? }
 
 Assume for the sake of argument:
   'a' = 0x61 or 91 decimal
   'B' = 0x42 or 66 decimal
 
 Is the statement true or false?
 
 Does the answer depend on context (if so, which is more common case or 
 no case?) or will this type always be compared in one way?

My thought was that it will always be compared in this way.

 If you generally want to do a case-insensitive compare then you're 
 defining an opCmp that AA's cannot use.
 If you want to do case-sensitive compare generally then you're defining 
 an opCmp that AA's can use.
 
 I suspect you want the former, right?

<snip>

The idea of my example is that they rank case-insensitively but match 
case-sensitively.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 09 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Tue, 09 Aug 2005 17:14:42 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 On Tue, 09 Aug 2005 12:06:53 +0100, Stewart Gordon  
 <smjg_1998 yahoo.com>  wrote:

 <snip>
 Which would the < <= > >= operators use under your proposal?

  opCmp

 Since this is akin to sorting, it would make sense to use opSort

  I disagree. You're comparing not sorting thus opCmp makes sense.

 Adding or looking up a key in an AA isn't comparing either.

In which case it should use opRank or opSort, no? (if it exists)

Comparing is required for sorting, but not vice-versa.
What AA's do is akin to sorting, not comparing.
Right?

 Under the way I had it, one comparator is used externally and one  
 internally.

externally: case-insensitive  (opCmp)
internally: case-sensitive    (opRank/opSort/opAA ..)

Right?

 More important however is _how_ you want to compare in the general  
 case  with this special type, eg.
  IChar a = 'a';
 IChar b = 'B';
  if (a < b) { //true or false? }
  Assume for the sake of argument:
   'a' = 0x61 or 91 decimal
   'B' = 0x42 or 66 decimal
  Is the statement true or false?
  Does the answer depend on context (if so, which is more common case or  
 no case?) or will this type always be compared in one way?

 My thought was that it will always be compared in this way.

Always compared in a case insensitive fashion?
Except.. when compared during sorting/ranking.

 If you generally want to do a case-insensitive compare then you're  
 defining an opCmp that AA's cannot use.
 If you want to do case-sensitive compare generally then you're defining  
 an opCmp that AA's can use.
  I suspect you want the former, right?

 <snip>

 The idea of my example is that they rank case-insensitively but match  
 case-sensitively.

By 'match' do you mean only '==' or '<' '<=' etc as well?

Regan

Aug 09 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Regan Heath wrote:
 On Tue, 09 Aug 2005 17:14:42 +0100, Stewart Gordon 
 <smjg_1998 yahoo.com>  wrote:

<snip>
[Regan:]
 I disagree. You're comparing not sorting thus opCmp makes sense.

 Adding or looking up a key in an AA isn't comparing either.

 
 In which case it should use opRank or opSort, no? (if it exists)
 
 Comparing is required for sorting, but not vice-versa.
 What AA's do is akin to sorting, not comparing.
 Right?

Yes, except that an AA is sorted only internally.  OTOH array.sort is 
external sorting, i.e. the application, and possibly even the user, 
concerns him/her/itself with the order generated by array.sort.

<snip>
 The idea of my example is that they rank case-insensitively but match  
 case-sensitively.

 
 By 'match' do you mean only '==' or '<' '<=' etc as well?

By 'rank' I mean what < <= > >= do - maybe 'order' is a better word.  By 
'match' I mean ==, and hence what one would naturally expect of AA key 
matching.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 10 2005

"Regan Heath" <regan netwin.co.nz> writes:

On Wed, 10 Aug 2005 18:12:48 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 Regan Heath wrote:
 On Tue, 09 Aug 2005 17:14:42 +0100, Stewart Gordon  
 <smjg_1998 yahoo.com>  wrote:

 <snip>
 [Regan:]
 I disagree. You're comparing not sorting thus opCmp makes sense.

 Adding or looking up a key in an AA isn't comparing either.

  In which case it should use opRank or opSort, no? (if it exists)
  Comparing is required for sorting, but not vice-versa.
 What AA's do is akin to sorting, not comparing.
 Right?

 Yes, except that an AA is sorted only internally.  OTOH array.sort is  
 external sorting, i.e. the application, and possibly even the user,  
 concerns him/her/itself with the order generated by array.sort.

Sure.

It doesn't matter how the AA's sort works provided it does not call two  
unequal items equal (the problem with your special type and it's opCmp  
operator).

Whereas the array.sort matters, and the user may want to change/define it  
differently in each call of array.sort.

 <snip>
 The idea of my example is that they rank case-insensitively but match   
 case-sensitively.

  By 'match' do you mean only '==' or '<' '<=' etc as well?

 By 'rank' I mean what < <= > >= do - maybe 'order' is a better word.  By  
 'match' I mean ==, and hence what one would naturally expect of AA key  
 matching.

I would have said that < <= etc "compare" not "rank" or "order", and ==  
was "compare" also, but perhaps "match". Basically, in my head < <= etc  
and = are all "compare" operations.

To "rank" or "order" you need some sort of context. i.e. inside an AA or  
when calling array.sort. You can "compare" any 2 items, you can "rank" or  
"order" a set of x (where x is > 1), ranking 2 items is the same as  
comparing, to decide which you need the context.

Anyway..

I think the solution is another operator which you can define for your  
types and is magically found and used by the AA implementation. So you'd  
have:

class IChar
{
   char data;

   int opCmp(IChar rhs)
   {
     return toupper(data) - toupper(rhs.data);
   }

   int opEquals(IChar rhs)
   {
     return data == rhs.data;
   }

   int opAA(IChar rhs)
   {
     return data - rhs.data;
   }
}

Giving you a type that you can compare with < <= etc (opCmp), compare with  
= (opEquals) and store in an AA (opAA - or whatever we want to call it)

The AA implementation needs only one minor change, the ability to find and  
use the opAA operator. Essentially instead of calling opCmp it has a  
delegate which it assigns upon creation to opAA or opCmp then uses for all  
it's internal workings.

Regan

Aug 10 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

 1. State in the spec that it is an incorrect practice to define opCmp to 
 return 0 if opEquals returns true.

 2. Keep it as an acceptable practice, but warn people properly that such a 
 type may not work correctly as an AA key.

 3. Keep it as an acceptable practice, and fix the AA implementation to 
 work on such types.

My first preference would be 1 since I think it would be a maintenance 
nightmare to allow values such that x<=y and y<=x but x!=y. It takes me back 
to my TA days in freshman calculus and the algebra abuse that students are 
capable of. But given that D has the <> and !<> operators presumably to 
allow users to distinguish those concepts from != and == I'll vote for 2. So 
either I say go with 1 and look hard at !<> and <> or go with 2 and say that 
if you design a type to use <> and !<> that you should give big blinking 
warnings.

ps - what does <>= do? It is a fun way of checking if an opCmp throws?

Aug 06 2005

Burton Radons <burton-radons smocky.com> writes:

2, I don't care what incorrect code does and I don't want any speed 
penalty from confirming that I'm writing correct code when I always test 
every line I write.

Object.opCmp should be changed to return float, so that it can use the 
semantics the compiler actually applies to the return value of 
comparison operators.  In that case 0 does mean equality - float.nan 
means unordered.

This change should've been made years ago, and the operator overloading 
page should be fixed up as well.  Someone send Walter a patch.

Ben Hinkle wrote:
 ps - what does <>= do? It is a fun way of checking if an opCmp throws? 

The throw semantics are irreproducible, and anyway either the standard 
or the implementation is wrong - it doesn't throw and I don't think 
there's a Phobos way to enable FP exceptions.

But directly related to your question, it returns whether the values are 
not unordered.  So "float.nan <>= float.nan" is false, while "4 <>= 4" 
is true.

Aug 07 2005

"Ben Hinkle" <ben.hinkle gmail.com> writes:

"Burton Radons" <burton-radons smocky.com> wrote in message 
news:dd6alf$17kj$1 digitaldaemon.com...
 2, I don't care what incorrect code does and I don't want any speed 
 penalty from confirming that I'm writing correct code when I always test 
 every line I write.

 Object.opCmp should be changed to return float, so that it can use the 
 semantics the compiler actually applies to the return value of comparison 
 operators.  In that case 0 does mean equality - float.nan means unordered.

 This change should've been made years ago, and the operator overloading 
 page should be fixed up as well.  Someone send Walter a patch.

 Ben Hinkle wrote:
 ps - what does <>= do? It is a fun way of checking if an opCmp throws?

 The throw semantics are irreproducible, and anyway either the standard or 
 the implementation is wrong - it doesn't throw and I don't think there's a 
 Phobos way to enable FP exceptions.

 But directly related to your question, it returns whether the values are 
 not unordered.  So "float.nan <>= float.nan" is false, while "4 <>= 4" is 
 true.

True. I was talking about <>= in the context of opCmp for objects. Since 
opCmp currently returns an int <>= only tests if the opCmp throws. I agree 
if opCmp returned a float then we can start giving <>= meaning for objects. 
Note also that nan behavior is different from opCmp and opEqual implementing 
different notions of equality. To see what I mean concretely about <>= and 
the semantics of <> and != try running

class X {
  int val;
  int opCmp(Object y) {
    X y2 = cast(X)y;
    int diff = val - y2.val;
    if (diff < 5 && -diff < 5)
      return 0;
    return diff;
  }
  int opEqual(Object y) {
    X y2 = cast(X)y;
    return val == y2.val;
  }
}
int main() {
  X a = new X;
  X b = new X;
  a.val = 10;
  b.val = 12;
  assert( a <= b );
  assert( a >= b );
  assert( a != b );
  assert( a !<> b );
  assert( a <>= b ); // should always be true
  return 0;
}

Aug 07 2005

Manfred Nowak <svv1999 hotmail.com> writes:

Stewart Gordon <smjg_1998 yahoo.com> wrote:

My vote is:

 3. Keep it as an acceptable practice, and fix the AA
 implementation to work on such types.

I already mentioned, that the AA-implementation is broken:
http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/22170

| Hashing needs at least one hash function, an _equal_ operator
| and one of these: an upper bound on the number of elements or a
| rehash-operation.

The opCmp is not needed for hashing. The current Implementation uses 
opCmp for traversing its unbalanced binary trees, which can 
degenerate to linear lists, depending on the quality of toHash and 
the insertion sequence.

Degenerated AA's will be terrible slow, because in the average case a 
lookup will traverse half of the elements in the AA, whereas a pure 
hash with conflict resolution by jumping can be guaranteed to bind 
the lookup-time by a constant, i.e. independent of the number of 
elements stored.

Because only opEqual is needed there is no necessity to restrict or 
warn.

The drawback is, that if removing must be supported the space 
requirements are temporarily increased instead of decreased.

So clearly choice 3 with a clear preference for a pure hash.

-manfred

Aug 07 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Manfred Nowak wrote:

 Stewart Gordon <smjg_1998 yahoo.com> wrote:
 
 My vote is:
 
 3. Keep it as an acceptable practice, and fix the AA
 implementation to work on such types.

 
 I already mentioned, that the AA-implementation is broken:
 http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/22170
 
 | Hashing needs at least one hash function, an _equal_ operator
 | and one of these: an upper bound on the number of elements or a
 | rehash-operation.
 
 The opCmp is not needed for hashing. The current Implementation uses 
 opCmp for traversing its unbalanced binary trees, which can 
 degenerate to linear lists, depending on the quality of toHash and 
 the insertion sequence.

That's why it talks about calling .rehash after populating the AA.  But 
yes, it doesn't specify how thoroughly it should reorganise the array. 
I guess a sensible approach (under the current AA imp) is to recalculate 
all the hash values, allocate the optimum number of hash slots and 
balance the binary trees.

 Degenerated AA's will be terrible slow, because in the average case a 
 lookup will traverse half of the elements in the AA, whereas a pure 
 hash with conflict resolution by jumping can be guaranteed to bind 
 the lookup-time by a constant, i.e. independent of the number of 
 elements stored.

<snip>

You have a point.  But what do you mean by jumping, exactly?

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 08 2005

Manfred Nowak <svv1999 hotmail.com> writes:

Stewart Gordon <smjg_1998 yahoo.com> wrote:

[...]
 allocate the optimum number of hash slots and balance the binary
 trees.

Optimal number of hash slots under which target function? And if you 
have to press any balancing algorithm into code for the sake of 
collision resolution, that would imply, that you do not trust the 
hashing part. Then: why use hashing at all?

[...]
 But what do you mean by jumping, exactly?

If you find that the hash slot provided by toHash is already 
occupied, you choose the next slot to probe for occupation by 
increasing the index of the current hash slot by a given amount, i.e. 
jump to the next hash slot. All you have to guarantee is that if 
there is an empty hash slot you can find it by jumping.

Up to a fill rate of about 0.9 thoretical results show, that the 
length of the jump distance is unimportant and the length of the 
average probing sequnence in search for an empty hash slot can be 
bound by the reciprocal of 1 minus the fill rate, i.e. 10 in the case 
of a fill rate of 0.9.

-manfred

Aug 08 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Stewart Gordon wrote:
<snip>
 1. State in the spec that it is an incorrect practice to define opCmp to 
 return 0 if opEquals returns true.

<snip>

Oops, I meant if opEquals returns false.  But I'm guessing that anyone 
who voted for this option misread it as this anyway....

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 08 2005

Sean Kelly <sean f4.ca> writes:

In article <dd3cea$26u7$1 digitaldaemon.com>, Stewart Gordon says...
It is possible to define the opCmp and opEquals methods so that unequal 
objects can rank equally in order.  This is allowed by the Java specs, 
and indeed one or two Java API classes do this.

Yes.  I do this in C++ all the time and consider it an important feature.

However, when this is done in D, the current implementation of 
associative arrays falls apart.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4556

The problem is that the AA implementation assumes that if opCmp returns 
zero then the keys match.  As far as I can see, the spec doesn't in any 
way forbid behaviour to the contrary.

That's because they do match, at least as far as an ordered binary tree are
concerned.  The C++ standard defines this as equivalence, ie. two objects have
the same sort order.

But there seems to be room for debate on what the best solution is. 
Possibilities:

1. State in the spec that it is an incorrect practice to define opCmp to 
return 0 if opEquals returns true.

I'm not sure I like this.  While I generally make equivalence a looser match
than equality, it's reasonable that an application might violate this condition.

2. Keep it as an acceptable practice, but warn people properly that such 
a type may not work correctly as an AA key.

Why wouldn't it work properly?

3. Keep it as an acceptable practice, and fix the AA implementation to 
work on such types.  Before you dismiss this idea as inefficient, please 
read

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4710

If hash maps are going to change to use equality rather than equivalence then it
has to happen soon.  But frankly, what I really want is a way to supply a
comparison delegate on declaration.


Sean

Aug 08 2005

Sean Kelly <sean f4.ca> writes:

In article <dd7sau$2ktr$1 digitaldaemon.com>, Sean Kelly says...
In article <dd3cea$26u7$1 digitaldaemon.com>, Stewart Gordon says...

1. State in the spec that it is an incorrect practice to define opCmp to 
return 0 if opEquals returns true.

I'm not sure I like this.  While I generally make equivalence a looser match
than equality, it's reasonable that an application might violate this condition.

My response was poorly worded (one of these days I'll learn not to post right
after waking up).  But now this has me confused.  Are you saying that the spec
should forbid two objects from being equivalent if they are equal?  Or did you
mean "if opEquals returns false?"  Either way, I don't like it.


Sean

Aug 08 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sean Kelly wrote:

<snip>
2. Keep it as an acceptable practice, but warn people properly that such 
a type may not work correctly as an AA key.

 
 Why wouldn't it work properly?

<snip>

Try the example in the post I linked to above and you'll see.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 08 2005

Sean Kelly <sean f4.ca> writes:

In article <dd7udk$2n7b$1 digitaldaemon.com>, Stewart Gordon says...
Sean Kelly wrote:

<snip>
2. Keep it as an acceptable practice, but warn people properly that such 
a type may not work correctly as an AA key.

 
 Why wouldn't it work properly?

<snip>

Try the example in the post I linked to above and you'll see.

I don't have writefln available in my current setup :p.  But it looks like the
point you were trying to make is that 'C' and 'c' are considered equivalent
keys, which I would consider correct behavior given the current implementation.
Whether or not hash maps should be changed to use unordered slists for
collisions so opEquals can be used instead is a separate issue IMO.


Sean

Aug 08 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sean Kelly wrote:
 In article <dd7udk$2n7b$1 digitaldaemon.com>, Stewart Gordon says...

<snip>
 Try the example in the post I linked to above and you'll see.

 
 I don't have writefln available in my current setup :p. 

Update your compiler, or change the example to use printf.

 But it looks like the
 point you were trying to make is that 'C' and 'c' are considered equivalent
 keys, which I would consider correct behavior given the current implementation.

Correct behaviour is determined by the spec, not by some implementaion.

The natural assumption is that keys match by equality, not equivalence.

 Whether or not hash maps should be changed to use unordered slists for
 collisions so opEquals can be used instead is a separate issue IMO.

Indeed, separate from whether they should be fixed to use opEquals as well.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 09 2005

Sean Kelly <sean f4.ca> writes:

In article <dda28h$21m5$1 digitaldaemon.com>, Stewart Gordon says...
Sean Kelly wrote:
 But it looks like the
 point you were trying to make is that 'C' and 'c' are considered equivalent
 keys, which I would consider correct behavior given the current implementation.

Correct behaviour is determined by the spec, not by some implementaion.

I disagree.  DMD is a reference implementation, so where the spec is vague DMD
should serve as an authoritative refernce on expected behavior.


Sean

Aug 09 2005

Manfred Nowak <svv1999 hotmail.com> writes:

Sean Kelly <sean f4.ca> wrote:

[...]
 I disagree.  DMD is a reference implementation, so where the
 spec is vague DMD should serve as an authoritative refernce on
 expected behavior. 

Then: what is the bugs group for? Only for deviations of dmd to the 
specs?

-manfred

Aug 09 2005

Sean Kelly <sean f4.ca> writes:

In article <ddajcc$2p86$1 digitaldaemon.com>, Manfred Nowak says...
Sean Kelly <sean f4.ca> wrote:

[...]
 I disagree.  DMD is a reference implementation, so where the
 spec is vague DMD should serve as an authoritative refernce on
 expected behavior. 

Then: what is the bugs group for? Only for deviations of dmd to the 
specs?

Mostly, yes.  Though I've posted things there in the past that were more of a
feature change request than a bug report.  Thing is, Walter is not a tech
writer, and he tends to spend time on compiler changes over documentation.  So
in instances where the spec is unclear or just plain doesn't say anything, you
have to make an educated guess about what's intended, and DMD is a useful
reference for this.  As far as this hash issue is concerned--I consider the use
of opCmp to be a requirement simply because a program wouldn't be portable
between compilers if one were implemented using opEquals and the other were
implemented using opCmp, and DMD uses opCmp.


Sean

Aug 09 2005

Derek Parnell <derek psych.ward> writes:

On Tue, 9 Aug 2005 15:50:36 +0000 (UTC), Manfred Nowak wrote:

 Sean Kelly <sean f4.ca> wrote:
 
 [...]
 I disagree.  DMD is a reference implementation, so where the
 spec is vague DMD should serve as an authoritative refernce on
 expected behavior. 

 
 Then: what is the bugs group for? Only for deviations of dmd to the 
 specs?

Do you try to be thick?

DMD is not perfect - it has mistakes that even we simpletons can see. And
in spite of that, it is still further along the track than the official
documentation. This is not how it should be, I know, but that's the reality
we have to put up with. And complaining about that aspect has got even less
chance of an improvement than some of the weirdo things on our wishlists.
Its simply not Walter's development methodology to document before coding.

-- 
Derek Parnell
Melbourne, Australia
10/08/2005 7:46:42 AM

Aug 09 2005

Stewart Gordon <smjg_1998 yahoo.com> writes:

Sean Kelly wrote:

 In article <dda28h$21m5$1 digitaldaemon.com>, Stewart Gordon says...
 
Sean Kelly wrote:

But it looks like the
point you were trying to make is that 'C' and 'c' are considered equivalent
keys, which I would consider correct behavior given the current implementation.

Correct behaviour is determined by the spec, not by some implementaion.

 
 I disagree.  DMD is a reference implementation, so where the spec is vague DMD
 should serve as an authoritative refernce on expected behavior.

This would mean that:

1. All undefined testcases in DStress should be either pass or xfail, 
defeating the point.

2. If some undefined code crashes the compiler, then crashing the 
compiler is correct behaviour.

3. Third-party compiler writers would have to hop back and forth between 
the spec and the reference implementation.

This strikes me as absurd.  A language should be implementable from the 
spec alone.

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.

Aug 09 2005

Sean Kelly <sean f4.ca> writes:

In article <ddal12$2sda$1 digitaldaemon.com>, Stewart Gordon says...
Sean Kelly wrote:

 In article <dda28h$21m5$1 digitaldaemon.com>, Stewart Gordon says...
 
Sean Kelly wrote:

But it looks like the
point you were trying to make is that 'C' and 'c' are considered equivalent
keys, which I would consider correct behavior given the current implementation.

Correct behaviour is determined by the spec, not by some implementaion.

 
 I disagree.  DMD is a reference implementation, so where the spec is vague DMD
 should serve as an authoritative refernce on expected behavior.

This would mean that:

1. All undefined testcases in DStress should be either pass or xfail, 
defeating the point.

2. If some undefined code crashes the compiler, then crashing the 
compiler is correct behaviour.

See my other post for clarification.  I didn't mean to imply that DMD should be
law so much as a reference for intended behavior when the spec is lacking.  But
some extrapolation is required as well.

3. Third-party compiler writers would have to hop back and forth between 
the spec and the reference implementation.

This strikes me as absurd.  A language should be implementable from the 
spec alone.

I agree completely.  But at the moment, the spec is a bit sparse.  In some
respects, I rather like this, as it makes me feel like some features I don't
like aren't yet set in stone.


Sean

Aug 09 2005

Derek Parnell <derek psych.ward> writes:

On Tue, 09 Aug 2005 17:20:07 +0100, Stewart Gordon wrote:

 Sean Kelly wrote:
 
 In article <dda28h$21m5$1 digitaldaemon.com>, Stewart Gordon says...
 
Sean Kelly wrote:

But it looks like the
point you were trying to make is that 'C' and 'c' are considered equivalent
keys, which I would consider correct behavior given the current implementation.

Correct behaviour is determined by the spec, not by some implementaion.

 
 I disagree.  DMD is a reference implementation, so where the spec is vague DMD
 should serve as an authoritative refernce on expected behavior.

 
 This would mean that:
 
 1. All undefined testcases in DStress should be either pass or xfail, 
 defeating the point.
 
 2. If some undefined code crashes the compiler, then crashing the 
 compiler is correct behaviour.
 
 3. Third-party compiler writers would have to hop back and forth between 
 the spec and the reference implementation.
 
 This strikes me as absurd.  A language should be implementable from the 
 spec alone.

Absolutely true! 

But that's not how it is. Reality supersedes theory.

DMD is both the best reference we have to date and it is not complete. It
contains mistakes and 'prototype' code. The official D Language
specification is not as up-to-date as the implementation of Walter's ideas
- DMD. 

Walter does not follow best practices when it comes to software
development; get over it.

-- 
Derek Parnell
Melbourne, Australia
10/08/2005 7:54:39 AM

Aug 09 2005

Sean Kelly <sean f4.ca> writes:

In article <dd3cea$26u7$1 digitaldaemon.com>, Stewart Gordon says...
It is possible to define the opCmp and opEquals methods so that unequal 
objects can rank equally in order.  This is allowed by the Java specs, 
and indeed one or two Java API classes do this.

However, when this is done in D, the current implementation of 
associative arrays falls apart.

Oh, I should mention that all my prior experience with hash maps has been that
they use equality for key comparisons, not equivalence.  I was a bit surprised
when I learned D's implementation doesn't work this way.  While binary trees
might be a tad faster in some cases, I'm not sure I like that anything I hash
must be sortable.


Sean

Aug 08 2005

Manfred Nowak <svv1999 hotmail.com> writes:

Stewart Gordon <smjg_1998 yahoo.com> wrote in
news:dd3cea$26u7$1 digitaldaemon.com 

[...]
 But there seems to be room for debate on what the best solution
 is. Possibilities:
 
 1. State in the spec that it is an incorrect practice to define
 opCmp to return 0 if opEquals returns true.

[...]

Introducing alternative 4:

Redesign of overloading operators.

1. In case of singletons also an `opEqual' is senseless and should be 
accompanied by an implementation with an `assert( 0);'

2. When an `opEqual' makes sense, then it is so basic, that it should 
be executed prior to the other cases except for performance reasons. 

2.a. Therefore an `opNotEqual' should accompany the `opEqual' and the
`opCmp' should be synthesized from both along the line:

   eq= opEqual();
   if( eq) return 0;
   neq= opNotEqual();
   if( eq != neq)
     return neq;
   else
     throw new Exception( "Aequivalence not allowed");
   
2.b. If the above template is not suitable out of performance reasons
the `opCmp' should not be synthesized and replaced by the definition 
of an `opCmpPerf'. Its then up to the adult programmer to assure the 
consistency of the design.

-manfred

Aug 22 2005

D Programming

C/C++ Programming

Other

digitalmars.D - [Vote] Should defining opCmp to be inconsistent with opEquals be