www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - [Vote] Should defining opCmp to be inconsistent with opEquals be

reply Stewart Gordon <smjg_1998 yahoo.com> writes:
It is possible to define the opCmp and opEquals methods so that unequal 
objects can rank equally in order.  This is allowed by the Java specs, 
and indeed one or two Java API classes do this.

However, when this is done in D, the current implementation of 
associative arrays falls apart.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4556

The problem is that the AA implementation assumes that if opCmp returns 
zero then the keys match.  As far as I can see, the spec doesn't in any 
way forbid behaviour to the contrary.

But there seems to be room for debate on what the best solution is. 
Possibilities:

1. State in the spec that it is an incorrect practice to define opCmp to 
return 0 if opEquals returns true.

2. Keep it as an acceptable practice, but warn people properly that such 
a type may not work correctly as an AA key.

3. Keep it as an acceptable practice, and fix the AA implementation to 
work on such types.  Before you dismiss this idea as inefficient, please 
read

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4710

Cast your votes now!

Stewart.

-- 
My e-mail is valid but not my primary mailbox.  Please keep replies on 
on the 'group where everyone may benefit.
Aug 06 2005
next sibling parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Sat, 06 Aug 2005 23:10:26 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 It is possible to define the opCmp and opEquals methods so that unequal  
 objects can rank equally in order.
I see in your first example/link you're defining a character class for which opEquals does a case-sensitive compare, but opCmp does a case insensitive comparrison. The goal, I assume, is to have characters/strings that sort/rank in a case insensitive fashion but are only exactly equal if the case is the same. Am I right? A recent thread on the array 'sort' operator has made me think that perhaps we can apply the same idea to AA's. That idea is the ability to assign a 'sort'/'opCmp' method so the array. So, we pick:
 1. State in the spec that it is an incorrect practice to define opCmp to  
 return 0 if opEquals returns true.
and instead allow the following... class IChar { static int opCmpCase(IChar lhs, IChar rhs) {} ..etc.. } int[IChar] list; list.opCmp = &IChar.opCmpCase; ..add items to 'list'.. and/or allow assignment of a delegate, eg. class ICharSort { int[] sortOrder; //by name, then by date, then by .. int opCmp(IChar lhs, IChar rhs) {} } ICharSort s = new ICharSort(); ..define sortOrder in 's'.. int[IChar] list; list.opCmp = &s.opCmp; ..add items to 'list'.. to allow us to sort/rank in a more complex fashion. For an AA it could either be illegal, or would cause a rehash to assign the delegate when it contains items. For an array it wouldn't matter (except that it would sort of invalidate the state i.e. if you sort, then assign it's not technically 'sorted' by it's current operator any more) The advantage here is that we have only one array/AA implementation and there should be little (no?) performance penalty using types that do or don't have this requirement. Of course one argument against this idea may be that the class ('IChar' in this example) should never be sorted/ranked in any other way. In which case... Why can't we have a 3rd operator opRank (or opSort) - definable in IChar, and make the the array/AA smart. The array/AA could, upon creation, assign a comparrison delegate to this new operator if present and opCmp otherwise. In this way there is no additional performance penalty (as it only checks once, upon creation, not every comparrison) Thoughts? Regan
Aug 06 2005
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Regan Heath wrote:

 On Sat, 06 Aug 2005 23:10:26 +0100, Stewart Gordon 
 <smjg_1998 yahoo.com>  wrote:
 
 It is possible to define the opCmp and opEquals methods so that 
 unequal  objects can rank equally in order.
I see in your first example/link you're defining a character class for which opEquals does a case-sensitive compare, but opCmp does a case insensitive comparrison. The goal, I assume, is to have characters/strings that sort/rank in a case insensitive fashion but are only exactly equal if the case is the same. Am I right?
Yes. But my example was on the contrived side, but maybe the idea can be put to practical use.
 A recent thread on the array 'sort' operator has made me think that  
 perhaps we can apply the same idea to AA's. That idea is the ability to  
 assign a 'sort'/'opCmp' method so the array.
I personally prefer the idea of having array.sort(cmp) available here. Not only is writing array.sort(&cmpByDate); shorter than array.opCmp = &cmpByDate; array.sort; but it also doesn't bloat memory requirements.
 So, we pick:
 
 1. State in the spec that it is an incorrect practice to define opCmp 
 to  return 0 if opEquals returns true.
and instead allow the following... class IChar { static int opCmpCase(IChar lhs, IChar rhs) {} ..etc.. } int[IChar] list; list.opCmp = &IChar.opCmpCase; ..add items to 'list'..
I'm not sure opCmp is the best name - it looks like you're setting the function that will be used to compare this AA with another AA. Maybe call it comparator? That said, what's the point of this? To allow some AAs to be case-sensitive and others to be case-insensitive? If you're going to do this, do we gain much over doing assignments and lookups to aa[toupper(key)]? If we solved the problem by giving every AA a comparator property, then every AA that uses such a keytype would need to set the comparator anyway. So why not go the whole hog and let the type define the AA comparator? <snip>
 Why can't we have a 3rd operator opRank (or opSort) - definable in 
 IChar,  and make the the array/AA smart. The array/AA could, upon 
 creation, assign  a comparrison delegate to this new operator if present 
 and opCmp otherwise.
<snip> But calling it opSort would imply using it for .sort as well. This is pointless IMO - normal opCmp can be used for this. If you define opCmp such that unequal objects can rank equally, then you're stating that when sorting, you don't care what order equally-ranking objects end up in. Maybe call it opStrictCmp. AAs (and any other applications relying on a comparator consistent with opEquals) can use this, and the default .sort can use plain opCmp. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 08 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Mon, 08 Aug 2005 12:31:59 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 A recent thread on the array 'sort' operator has made me think that   
 perhaps we can apply the same idea to AA's. That idea is the ability  
 to  assign a 'sort'/'opCmp' method so the array.
I personally prefer the idea of having array.sort(cmp) available here. Not only is writing array.sort(&cmpByDate); shorter than array.opCmp = &cmpByDate; array.sort; but it also doesn't bloat memory requirements.
Sure. But in the case of an AA we need to assign an operator/function/delegate for it to use in it's internal ordering on every insert/lookup/etc.
 So, we pick:

 1. State in the spec that it is an incorrect practice to define opCmp  
 to  return 0 if opEquals returns true.
and instead allow the following... class IChar { static int opCmpCase(IChar lhs, IChar rhs) {} ..etc.. } int[IChar] list; list.opCmp = &IChar.opCmpCase; ..add items to 'list'..
I'm not sure opCmp is the best name - it looks like you're setting the function that will be used to compare this AA with another AA. Maybe call it comparator?
We can call it whatever.
 That said, what's the point of this?  To allow some AAs to be  
 case-sensitive and others to be case-insensitive?  If you're going to do  
 this, do we gain much over doing assignments and lookups to  
 aa[toupper(key)]?
To allow you to store your weird objects for which opEquals and opCmp disagree.
 If we solved the problem by giving every AA a comparator property, then  
 every AA that uses such a keytype would need to set the comparator  
 anyway.  So why not go the whole hog and let the type define the AA  
 comparator?
Sure, that's exactly what I suggest below.
 <snip>
 Why can't we have a 3rd operator opRank (or opSort) - definable in  
 IChar,  and make the the array/AA smart. The array/AA could, upon  
 creation, assign  a comparrison delegate to this new operator if  
 present and opCmp otherwise.
<snip> But calling it opSort would imply using it for .sort as well.
But that's _exactly_ what it is used for, ranking/sorting the object. I'm suggesting that opEquals and opCmp should _always_ agree and opSort may disagree with opCmp. AA's would use opCmp, sort would use opSort. This lets you use them in AA's and sort them how you want in normal arrays. You suggest the opposite below, either way a 3rd operator to handle your unusual type with AA's and arrays automagically picking up and using the new operator when poresent is the solution I like best.
 This is pointless IMO - normal opCmp can be used for this.  If you  
 define opCmp such that unequal objects can rank equally, then you're  
 stating that when sorting, you don't care what order equally-ranking  
 objects end up in.

 Maybe call it opStrictCmp.  AAs (and any other applications relying on a  
 comparator consistent with opEquals) can use this, and the default .sort  
 can use plain opCmp.
Regan
Aug 08 2005
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Regan Heath wrote:
<snip>
 I'm suggesting that opEquals and opCmp should _always_ agree and opSort  
 may disagree with opCmp. AA's would use opCmp, sort would use opSort. 
 This lets you use them in AA's and sort them how you want in normal 
 arrays.
<snip> Which would the < <= > >= operators use under your proposal? Since this is akin to sorting, it would make sense to use opSort, but then it becomes confusing that there is a method called opCmp but it doesn't do this. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 09 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Tue, 09 Aug 2005 12:06:53 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 Regan Heath wrote:
 <snip>
 I'm suggesting that opEquals and opCmp should _always_ agree and  
 opSort  may disagree with opCmp. AA's would use opCmp, sort would use  
 opSort. This lets you use them in AA's and sort them how you want in  
 normal arrays.
<snip> Which would the < <= > >= operators use under your proposal?
opCmp
 Since this is akin to sorting, it would make sense to use opSort
I disagree. You're comparing not sorting thus opCmp makes sense. More important however is _how_ you want to compare in the general case with this special type, eg. IChar a = 'a'; IChar b = 'B'; if (a < b) { //true or false? } Assume for the sake of argument: 'a' = 0x61 or 91 decimal 'B' = 0x42 or 66 decimal Is the statement true or false? Does the answer depend on context (if so, which is more common case or no case?) or will this type always be compared in one way? If you generally want to do a case-insensitive compare then you're defining an opCmp that AA's cannot use. If you want to do case-sensitive compare generally then you're defining an opCmp that AA's can use. I suspect you want the former, right? In which case the solution might be to have another operator (as you suggested) opCmpStrict or something used by AA's. This is what I referred to when I said "either way" earlier. This operator can be found and used automagically by the AA without requiring a change to the 'balanced binary tree' part of the implementation. Regan p.s. My intention in starting this part of the thread was to explore solutions that don't involve changing the AA implementation. I'm not saying that's not the solution, I'm just exploring...
Aug 09 2005
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Regan Heath wrote:

 On Tue, 09 Aug 2005 12:06:53 +0100, Stewart Gordon 
 <smjg_1998 yahoo.com>  wrote:
<snip>
 Which would the < <= > >= operators use under your proposal?
opCmp
 Since this is akin to sorting, it would make sense to use opSort
I disagree. You're comparing not sorting thus opCmp makes sense.
Adding or looking up a key in an AA isn't comparing either. Under the way I had it, one comparator is used externally and one internally. It also remains possible to define < <= > >= to have a non-strict comparator. (OK, so there would remain inconsistency between qwert <= yuiop and qwert < yuiop || qwert == yuiop but is this really an excuse?)
 More important however is _how_ you want to compare in the general case  
 with this special type, eg.
 
 IChar a = 'a';
 IChar b = 'B';
 
 if (a < b) { //true or false? }
 
 Assume for the sake of argument:
   'a' = 0x61 or 91 decimal
   'B' = 0x42 or 66 decimal
 
 Is the statement true or false?
 
 Does the answer depend on context (if so, which is more common case or 
 no case?) or will this type always be compared in one way?
My thought was that it will always be compared in this way.
 If you generally want to do a case-insensitive compare then you're 
 defining an opCmp that AA's cannot use.
 If you want to do case-sensitive compare generally then you're defining 
 an opCmp that AA's can use.
 
 I suspect you want the former, right?
<snip> The idea of my example is that they rank case-insensitively but match case-sensitively. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 09 2005
parent reply "Regan Heath" <regan netwin.co.nz> writes:
On Tue, 09 Aug 2005 17:14:42 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 On Tue, 09 Aug 2005 12:06:53 +0100, Stewart Gordon  
 <smjg_1998 yahoo.com>  wrote:
<snip>
 Which would the < <= > >= operators use under your proposal?
opCmp
 Since this is akin to sorting, it would make sense to use opSort
I disagree. You're comparing not sorting thus opCmp makes sense.
Adding or looking up a key in an AA isn't comparing either.
In which case it should use opRank or opSort, no? (if it exists) Comparing is required for sorting, but not vice-versa. What AA's do is akin to sorting, not comparing. Right?
 Under the way I had it, one comparator is used externally and one  
 internally.
externally: case-insensitive (opCmp) internally: case-sensitive (opRank/opSort/opAA ..) Right?
 More important however is _how_ you want to compare in the general  
 case  with this special type, eg.
  IChar a = 'a';
 IChar b = 'B';
  if (a < b) { //true or false? }
  Assume for the sake of argument:
   'a' = 0x61 or 91 decimal
   'B' = 0x42 or 66 decimal
  Is the statement true or false?
  Does the answer depend on context (if so, which is more common case or  
 no case?) or will this type always be compared in one way?
My thought was that it will always be compared in this way.
Always compared in a case insensitive fashion? Except.. when compared during sorting/ranking.
 If you generally want to do a case-insensitive compare then you're  
 defining an opCmp that AA's cannot use.
 If you want to do case-sensitive compare generally then you're defining  
 an opCmp that AA's can use.
  I suspect you want the former, right?
<snip> The idea of my example is that they rank case-insensitively but match case-sensitively.
By 'match' do you mean only '==' or '<' '<=' etc as well? Regan
Aug 09 2005
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Regan Heath wrote:
 On Tue, 09 Aug 2005 17:14:42 +0100, Stewart Gordon 
 <smjg_1998 yahoo.com>  wrote:
<snip> [Regan:]
 I disagree. You're comparing not sorting thus opCmp makes sense.
Adding or looking up a key in an AA isn't comparing either.
In which case it should use opRank or opSort, no? (if it exists) Comparing is required for sorting, but not vice-versa. What AA's do is akin to sorting, not comparing. Right?
Yes, except that an AA is sorted only internally. OTOH array.sort is external sorting, i.e. the application, and possibly even the user, concerns him/her/itself with the order generated by array.sort. <snip>
 The idea of my example is that they rank case-insensitively but match  
 case-sensitively.
By 'match' do you mean only '==' or '<' '<=' etc as well?
By 'rank' I mean what < <= > >= do - maybe 'order' is a better word. By 'match' I mean ==, and hence what one would naturally expect of AA key matching. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 10 2005
parent "Regan Heath" <regan netwin.co.nz> writes:
On Wed, 10 Aug 2005 18:12:48 +0100, Stewart Gordon <smjg_1998 yahoo.com>  
wrote:
 Regan Heath wrote:
 On Tue, 09 Aug 2005 17:14:42 +0100, Stewart Gordon  
 <smjg_1998 yahoo.com>  wrote:
<snip> [Regan:]
 I disagree. You're comparing not sorting thus opCmp makes sense.
Adding or looking up a key in an AA isn't comparing either.
In which case it should use opRank or opSort, no? (if it exists) Comparing is required for sorting, but not vice-versa. What AA's do is akin to sorting, not comparing. Right?
Yes, except that an AA is sorted only internally. OTOH array.sort is external sorting, i.e. the application, and possibly even the user, concerns him/her/itself with the order generated by array.sort.
Sure. It doesn't matter how the AA's sort works provided it does not call two unequal items equal (the problem with your special type and it's opCmp operator). Whereas the array.sort matters, and the user may want to change/define it differently in each call of array.sort.
 <snip>
 The idea of my example is that they rank case-insensitively but match   
 case-sensitively.
By 'match' do you mean only '==' or '<' '<=' etc as well?
By 'rank' I mean what < <= > >= do - maybe 'order' is a better word. By 'match' I mean ==, and hence what one would naturally expect of AA key matching.
I would have said that < <= etc "compare" not "rank" or "order", and == was "compare" also, but perhaps "match". Basically, in my head < <= etc and = are all "compare" operations. To "rank" or "order" you need some sort of context. i.e. inside an AA or when calling array.sort. You can "compare" any 2 items, you can "rank" or "order" a set of x (where x is > 1), ranking 2 items is the same as comparing, to decide which you need the context. Anyway.. I think the solution is another operator which you can define for your types and is magically found and used by the AA implementation. So you'd have: class IChar { char data; int opCmp(IChar rhs) { return toupper(data) - toupper(rhs.data); } int opEquals(IChar rhs) { return data == rhs.data; } int opAA(IChar rhs) { return data - rhs.data; } } Giving you a type that you can compare with < <= etc (opCmp), compare with = (opEquals) and store in an AA (opAA - or whatever we want to call it) The AA implementation needs only one minor change, the ability to find and use the opAA operator. Essentially instead of calling opCmp it has a delegate which it assigns upon creation to opAA or opCmp then uses for all it's internal workings. Regan
Aug 10 2005
prev sibling next sibling parent reply "Ben Hinkle" <ben.hinkle gmail.com> writes:
 1. State in the spec that it is an incorrect practice to define opCmp to 
 return 0 if opEquals returns true.

 2. Keep it as an acceptable practice, but warn people properly that such a 
 type may not work correctly as an AA key.

 3. Keep it as an acceptable practice, and fix the AA implementation to 
 work on such types.
My first preference would be 1 since I think it would be a maintenance nightmare to allow values such that x<=y and y<=x but x!=y. It takes me back to my TA days in freshman calculus and the algebra abuse that students are capable of. But given that D has the <> and !<> operators presumably to allow users to distinguish those concepts from != and == I'll vote for 2. So either I say go with 1 and look hard at !<> and <> or go with 2 and say that if you design a type to use <> and !<> that you should give big blinking warnings. ps - what does <>= do? It is a fun way of checking if an opCmp throws?
Aug 06 2005
parent reply Burton Radons <burton-radons smocky.com> writes:
2, I don't care what incorrect code does and I don't want any speed 
penalty from confirming that I'm writing correct code when I always test 
every line I write.

Object.opCmp should be changed to return float, so that it can use the 
semantics the compiler actually applies to the return value of 
comparison operators.  In that case 0 does mean equality - float.nan 
means unordered.

This change should've been made years ago, and the operator overloading 
page should be fixed up as well.  Someone send Walter a patch.

Ben Hinkle wrote:
 ps - what does <>= do? It is a fun way of checking if an opCmp throws? 
The throw semantics are irreproducible, and anyway either the standard or the implementation is wrong - it doesn't throw and I don't think there's a Phobos way to enable FP exceptions. But directly related to your question, it returns whether the values are not unordered. So "float.nan <>= float.nan" is false, while "4 <>= 4" is true.
Aug 07 2005
parent "Ben Hinkle" <ben.hinkle gmail.com> writes:
"Burton Radons" <burton-radons smocky.com> wrote in message 
news:dd6alf$17kj$1 digitaldaemon.com...
 2, I don't care what incorrect code does and I don't want any speed 
 penalty from confirming that I'm writing correct code when I always test 
 every line I write.

 Object.opCmp should be changed to return float, so that it can use the 
 semantics the compiler actually applies to the return value of comparison 
 operators.  In that case 0 does mean equality - float.nan means unordered.

 This change should've been made years ago, and the operator overloading 
 page should be fixed up as well.  Someone send Walter a patch.

 Ben Hinkle wrote:
 ps - what does <>= do? It is a fun way of checking if an opCmp throws?
The throw semantics are irreproducible, and anyway either the standard or the implementation is wrong - it doesn't throw and I don't think there's a Phobos way to enable FP exceptions. But directly related to your question, it returns whether the values are not unordered. So "float.nan <>= float.nan" is false, while "4 <>= 4" is true.
True. I was talking about <>= in the context of opCmp for objects. Since opCmp currently returns an int <>= only tests if the opCmp throws. I agree if opCmp returned a float then we can start giving <>= meaning for objects. Note also that nan behavior is different from opCmp and opEqual implementing different notions of equality. To see what I mean concretely about <>= and the semantics of <> and != try running class X { int val; int opCmp(Object y) { X y2 = cast(X)y; int diff = val - y2.val; if (diff < 5 && -diff < 5) return 0; return diff; } int opEqual(Object y) { X y2 = cast(X)y; return val == y2.val; } } int main() { X a = new X; X b = new X; a.val = 10; b.val = 12; assert( a <= b ); assert( a >= b ); assert( a != b ); assert( a !<> b ); assert( a <>= b ); // should always be true return 0; }
Aug 07 2005
prev sibling next sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Stewart Gordon <smjg_1998 yahoo.com> wrote:

My vote is:

 3. Keep it as an acceptable practice, and fix the AA
 implementation to work on such types.
I already mentioned, that the AA-implementation is broken: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/22170 | Hashing needs at least one hash function, an _equal_ operator | and one of these: an upper bound on the number of elements or a | rehash-operation. The opCmp is not needed for hashing. The current Implementation uses opCmp for traversing its unbalanced binary trees, which can degenerate to linear lists, depending on the quality of toHash and the insertion sequence. Degenerated AA's will be terrible slow, because in the average case a lookup will traverse half of the elements in the AA, whereas a pure hash with conflict resolution by jumping can be guaranteed to bind the lookup-time by a constant, i.e. independent of the number of elements stored. Because only opEqual is needed there is no necessity to restrict or warn. The drawback is, that if removing must be supported the space requirements are temporarily increased instead of decreased. So clearly choice 3 with a clear preference for a pure hash. -manfred
Aug 07 2005
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Manfred Nowak wrote:

 Stewart Gordon <smjg_1998 yahoo.com> wrote:
 
 My vote is:
 
 3. Keep it as an acceptable practice, and fix the AA
 implementation to work on such types.
I already mentioned, that the AA-implementation is broken: http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D/22170 | Hashing needs at least one hash function, an _equal_ operator | and one of these: an upper bound on the number of elements or a | rehash-operation. The opCmp is not needed for hashing. The current Implementation uses opCmp for traversing its unbalanced binary trees, which can degenerate to linear lists, depending on the quality of toHash and the insertion sequence.
That's why it talks about calling .rehash after populating the AA. But yes, it doesn't specify how thoroughly it should reorganise the array. I guess a sensible approach (under the current AA imp) is to recalculate all the hash values, allocate the optimum number of hash slots and balance the binary trees.
 Degenerated AA's will be terrible slow, because in the average case a 
 lookup will traverse half of the elements in the AA, whereas a pure 
 hash with conflict resolution by jumping can be guaranteed to bind 
 the lookup-time by a constant, i.e. independent of the number of 
 elements stored.
<snip> You have a point. But what do you mean by jumping, exactly? Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 08 2005
parent Manfred Nowak <svv1999 hotmail.com> writes:
Stewart Gordon <smjg_1998 yahoo.com> wrote:

[...]
 allocate the optimum number of hash slots and balance the binary
 trees.
Optimal number of hash slots under which target function? And if you have to press any balancing algorithm into code for the sake of collision resolution, that would imply, that you do not trust the hashing part. Then: why use hashing at all? [...]
 But what do you mean by jumping, exactly?
If you find that the hash slot provided by toHash is already occupied, you choose the next slot to probe for occupation by increasing the index of the current hash slot by a given amount, i.e. jump to the next hash slot. All you have to guarantee is that if there is an empty hash slot you can find it by jumping. Up to a fill rate of about 0.9 thoretical results show, that the length of the jump distance is unimportant and the length of the average probing sequnence in search for an empty hash slot can be bound by the reciprocal of 1 minus the fill rate, i.e. 10 in the case of a fill rate of 0.9. -manfred
Aug 08 2005
prev sibling next sibling parent Stewart Gordon <smjg_1998 yahoo.com> writes:
Stewart Gordon wrote:
<snip>
 1. State in the spec that it is an incorrect practice to define opCmp to 
 return 0 if opEquals returns true.
<snip> Oops, I meant if opEquals returns false. But I'm guessing that anyone who voted for this option misread it as this anyway.... Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 08 2005
prev sibling next sibling parent reply Sean Kelly <sean f4.ca> writes:
In article <dd3cea$26u7$1 digitaldaemon.com>, Stewart Gordon says...
It is possible to define the opCmp and opEquals methods so that unequal 
objects can rank equally in order.  This is allowed by the Java specs, 
and indeed one or two Java API classes do this.
Yes. I do this in C++ all the time and consider it an important feature.
However, when this is done in D, the current implementation of 
associative arrays falls apart.

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4556

The problem is that the AA implementation assumes that if opCmp returns 
zero then the keys match.  As far as I can see, the spec doesn't in any 
way forbid behaviour to the contrary.
That's because they do match, at least as far as an ordered binary tree are concerned. The C++ standard defines this as equivalence, ie. two objects have the same sort order.
But there seems to be room for debate on what the best solution is. 
Possibilities:

1. State in the spec that it is an incorrect practice to define opCmp to 
return 0 if opEquals returns true.
I'm not sure I like this. While I generally make equivalence a looser match than equality, it's reasonable that an application might violate this condition.
2. Keep it as an acceptable practice, but warn people properly that such 
a type may not work correctly as an AA key.
Why wouldn't it work properly?
3. Keep it as an acceptable practice, and fix the AA implementation to 
work on such types.  Before you dismiss this idea as inefficient, please 
read

http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4710
If hash maps are going to change to use equality rather than equivalence then it has to happen soon. But frankly, what I really want is a way to supply a comparison delegate on declaration. Sean
Aug 08 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <dd7sau$2ktr$1 digitaldaemon.com>, Sean Kelly says...
In article <dd3cea$26u7$1 digitaldaemon.com>, Stewart Gordon says...

1. State in the spec that it is an incorrect practice to define opCmp to 
return 0 if opEquals returns true.
I'm not sure I like this. While I generally make equivalence a looser match than equality, it's reasonable that an application might violate this condition.
My response was poorly worded (one of these days I'll learn not to post right after waking up). But now this has me confused. Are you saying that the spec should forbid two objects from being equivalent if they are equal? Or did you mean "if opEquals returns false?" Either way, I don't like it. Sean
Aug 08 2005
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Sean Kelly wrote:

<snip>
2. Keep it as an acceptable practice, but warn people properly that such 
a type may not work correctly as an AA key.
Why wouldn't it work properly?
<snip> Try the example in the post I linked to above and you'll see. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 08 2005
parent reply Sean Kelly <sean f4.ca> writes:
In article <dd7udk$2n7b$1 digitaldaemon.com>, Stewart Gordon says...
Sean Kelly wrote:

<snip>
2. Keep it as an acceptable practice, but warn people properly that such 
a type may not work correctly as an AA key.
Why wouldn't it work properly?
<snip> Try the example in the post I linked to above and you'll see.
I don't have writefln available in my current setup :p. But it looks like the point you were trying to make is that 'C' and 'c' are considered equivalent keys, which I would consider correct behavior given the current implementation. Whether or not hash maps should be changed to use unordered slists for collisions so opEquals can be used instead is a separate issue IMO. Sean
Aug 08 2005
parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Sean Kelly wrote:
 In article <dd7udk$2n7b$1 digitaldaemon.com>, Stewart Gordon says...
<snip>
 Try the example in the post I linked to above and you'll see.
I don't have writefln available in my current setup :p.
Update your compiler, or change the example to use printf.
 But it looks like the
 point you were trying to make is that 'C' and 'c' are considered equivalent
 keys, which I would consider correct behavior given the current implementation.
Correct behaviour is determined by the spec, not by some implementaion. The natural assumption is that keys match by equality, not equivalence.
 Whether or not hash maps should be changed to use unordered slists for
 collisions so opEquals can be used instead is a separate issue IMO.
Indeed, separate from whether they should be fixed to use opEquals as well. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 09 2005
parent reply Sean Kelly <sean f4.ca> writes:
In article <dda28h$21m5$1 digitaldaemon.com>, Stewart Gordon says...
Sean Kelly wrote:
 But it looks like the
 point you were trying to make is that 'C' and 'c' are considered equivalent
 keys, which I would consider correct behavior given the current implementation.
Correct behaviour is determined by the spec, not by some implementaion.
I disagree. DMD is a reference implementation, so where the spec is vague DMD should serve as an authoritative refernce on expected behavior. Sean
Aug 09 2005
next sibling parent reply Manfred Nowak <svv1999 hotmail.com> writes:
Sean Kelly <sean f4.ca> wrote:

[...]
 I disagree.  DMD is a reference implementation, so where the
 spec is vague DMD should serve as an authoritative refernce on
 expected behavior. 
Then: what is the bugs group for? Only for deviations of dmd to the specs? -manfred
Aug 09 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <ddajcc$2p86$1 digitaldaemon.com>, Manfred Nowak says...
Sean Kelly <sean f4.ca> wrote:

[...]
 I disagree.  DMD is a reference implementation, so where the
 spec is vague DMD should serve as an authoritative refernce on
 expected behavior. 
Then: what is the bugs group for? Only for deviations of dmd to the specs?
Mostly, yes. Though I've posted things there in the past that were more of a feature change request than a bug report. Thing is, Walter is not a tech writer, and he tends to spend time on compiler changes over documentation. So in instances where the spec is unclear or just plain doesn't say anything, you have to make an educated guess about what's intended, and DMD is a useful reference for this. As far as this hash issue is concerned--I consider the use of opCmp to be a requirement simply because a program wouldn't be portable between compilers if one were implemented using opEquals and the other were implemented using opCmp, and DMD uses opCmp. Sean
Aug 09 2005
prev sibling parent Derek Parnell <derek psych.ward> writes:
On Tue, 9 Aug 2005 15:50:36 +0000 (UTC), Manfred Nowak wrote:

 Sean Kelly <sean f4.ca> wrote:
 
 [...]
 I disagree.  DMD is a reference implementation, so where the
 spec is vague DMD should serve as an authoritative refernce on
 expected behavior. 
Then: what is the bugs group for? Only for deviations of dmd to the specs?
Do you try to be thick? DMD is not perfect - it has mistakes that even we simpletons can see. And in spite of that, it is still further along the track than the official documentation. This is not how it should be, I know, but that's the reality we have to put up with. And complaining about that aspect has got even less chance of an improvement than some of the weirdo things on our wishlists. Its simply not Walter's development methodology to document before coding. -- Derek Parnell Melbourne, Australia 10/08/2005 7:46:42 AM
Aug 09 2005
prev sibling parent reply Stewart Gordon <smjg_1998 yahoo.com> writes:
Sean Kelly wrote:

 In article <dda28h$21m5$1 digitaldaemon.com>, Stewart Gordon says...
 
Sean Kelly wrote:

But it looks like the
point you were trying to make is that 'C' and 'c' are considered equivalent
keys, which I would consider correct behavior given the current implementation.
Correct behaviour is determined by the spec, not by some implementaion.
I disagree. DMD is a reference implementation, so where the spec is vague DMD should serve as an authoritative refernce on expected behavior.
This would mean that: 1. All undefined testcases in DStress should be either pass or xfail, defeating the point. 2. If some undefined code crashes the compiler, then crashing the compiler is correct behaviour. 3. Third-party compiler writers would have to hop back and forth between the spec and the reference implementation. This strikes me as absurd. A language should be implementable from the spec alone. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 09 2005
next sibling parent Sean Kelly <sean f4.ca> writes:
In article <ddal12$2sda$1 digitaldaemon.com>, Stewart Gordon says...
Sean Kelly wrote:

 In article <dda28h$21m5$1 digitaldaemon.com>, Stewart Gordon says...
 
Sean Kelly wrote:

But it looks like the
point you were trying to make is that 'C' and 'c' are considered equivalent
keys, which I would consider correct behavior given the current implementation.
Correct behaviour is determined by the spec, not by some implementaion.
I disagree. DMD is a reference implementation, so where the spec is vague DMD should serve as an authoritative refernce on expected behavior.
This would mean that: 1. All undefined testcases in DStress should be either pass or xfail, defeating the point. 2. If some undefined code crashes the compiler, then crashing the compiler is correct behaviour.
See my other post for clarification. I didn't mean to imply that DMD should be law so much as a reference for intended behavior when the spec is lacking. But some extrapolation is required as well.
3. Third-party compiler writers would have to hop back and forth between 
the spec and the reference implementation.

This strikes me as absurd.  A language should be implementable from the 
spec alone.
I agree completely. But at the moment, the spec is a bit sparse. In some respects, I rather like this, as it makes me feel like some features I don't like aren't yet set in stone. Sean
Aug 09 2005
prev sibling parent Derek Parnell <derek psych.ward> writes:
On Tue, 09 Aug 2005 17:20:07 +0100, Stewart Gordon wrote:

 Sean Kelly wrote:
 
 In article <dda28h$21m5$1 digitaldaemon.com>, Stewart Gordon says...
 
Sean Kelly wrote:

But it looks like the
point you were trying to make is that 'C' and 'c' are considered equivalent
keys, which I would consider correct behavior given the current implementation.
Correct behaviour is determined by the spec, not by some implementaion.
I disagree. DMD is a reference implementation, so where the spec is vague DMD should serve as an authoritative refernce on expected behavior.
This would mean that: 1. All undefined testcases in DStress should be either pass or xfail, defeating the point. 2. If some undefined code crashes the compiler, then crashing the compiler is correct behaviour. 3. Third-party compiler writers would have to hop back and forth between the spec and the reference implementation. This strikes me as absurd. A language should be implementable from the spec alone.
Absolutely true! But that's not how it is. Reality supersedes theory. DMD is both the best reference we have to date and it is not complete. It contains mistakes and 'prototype' code. The official D Language specification is not as up-to-date as the implementation of Walter's ideas - DMD. Walter does not follow best practices when it comes to software development; get over it. -- Derek Parnell Melbourne, Australia 10/08/2005 7:54:39 AM
Aug 09 2005
prev sibling next sibling parent Sean Kelly <sean f4.ca> writes:
In article <dd3cea$26u7$1 digitaldaemon.com>, Stewart Gordon says...
It is possible to define the opCmp and opEquals methods so that unequal 
objects can rank equally in order.  This is allowed by the Java specs, 
and indeed one or two Java API classes do this.

However, when this is done in D, the current implementation of 
associative arrays falls apart.
Oh, I should mention that all my prior experience with hash maps has been that they use equality for key comparisons, not equivalence. I was a bit surprised when I learned D's implementation doesn't work this way. While binary trees might be a tad faster in some cases, I'm not sure I like that anything I hash must be sortable. Sean
Aug 08 2005
prev sibling parent Manfred Nowak <svv1999 hotmail.com> writes:
Stewart Gordon <smjg_1998 yahoo.com> wrote in
news:dd3cea$26u7$1 digitaldaemon.com 

[...]
 But there seems to be room for debate on what the best solution
 is. Possibilities:
 
 1. State in the spec that it is an incorrect practice to define
 opCmp to return 0 if opEquals returns true.
[...] Introducing alternative 4: Redesign of overloading operators. 1. In case of singletons also an `opEqual' is senseless and should be accompanied by an implementation with an `assert( 0);' 2. When an `opEqual' makes sense, then it is so basic, that it should be executed prior to the other cases except for performance reasons. 2.a. Therefore an `opNotEqual' should accompany the `opEqual' and the `opCmp' should be synthesized from both along the line: eq= opEqual(); if( eq) return 0; neq= opNotEqual(); if( eq != neq) return neq; else throw new Exception( "Aequivalence not allowed"); 2.b. If the above template is not suitable out of performance reasons the `opCmp' should not be synthesized and replaced by the definition of an `opCmpPerf'. Its then up to the adult programmer to assure the consistency of the design. -manfred
Aug 22 2005