digitalmars.D.bugs - False matches in AAs revisited
- Stewart Gordon (136/136) Jul 18 2005 Using DMD 0.128, Windows 98SE.
- Kevin Bealer (15/151) Jul 20 2005 Lets say that opCmp() said that the object it found was equal, BUT, opEq...
- Stewart Gordon (21/34) Jul 21 2005 The spec doesn't refer to opEqual. But I'm guessing you mean opEquals.
- Kevin Bealer (11/45) Jul 22 2005 Let me explain what I mean. It has to use opCmp() to find the answer in...
- Stewart Gordon (17/46) Jul 25 2005 That given in the last paragraph of my original post.
- Kevin Bealer (14/52) Jul 27 2005 I see your point. However, I would prefer the current, faster, bin-tree...
- Stewart Gordon (26/35) Aug 06 2005 There seems to be an assumption polluting some of the discussions about
Using DMD 0.128, Windows 98SE. I'd got the impression that associative array lookup was a three-stage process: (i) toHash - to see which hash slot to look in (ii) opCmp - to do a binary search within the hash slot (iii) opEquals - to check that the key it's found really does match However, my recent experiments have shown that it skips stage (iii). At first this seemed to be a problem only with structs, but at least now it's happening with classes just the same. As this program shows: ---------- import std.stdio; import std.ctype; class IChar { char data; this(char d) { data = d; } int opEquals(Object o) in { assert (cast(IChar) o !is null); } body { writefln("Entered IChar.opEquals(Object)"); return opEquals(cast(IChar) o); } bit opEquals(IChar i) { writefln("Entered IChar.opEquals(IChar)"); return data == i.data; } int opCmp(Object o) in { assert (cast(IChar) o !is null); } body { writefln("Entered IChar.opCmp(Object)"); return opCmp(cast(IChar) o); } int opCmp(IChar i) { writefln("Entered IChar.opCmp(IChar)"); return toupper(data) - toupper(i.data); } uint toHash() { writefln("Entered IChar.toHash()"); return toupper(data); } unittest { IChar c = new IChar('c'); IChar d = new IChar('c'); IChar C = new IChar('C'); IChar e = new IChar('e'); assert (c == d); assert (c == C); assert (c != e); puts("Units tested"); } } void main() { int[IChar] aa; IChar c1 = new IChar('c'); IChar c2 = new IChar('c'); IChar c3 = new IChar('C'); aa[c1] = 4; writefln("Added aa[c1]"); aa.rehash; writefln("Rehashed, looking up aa[c1]"); if (c1 in aa) { writefln("aa[c1]:"); writefln(aa[c1]); } else { writefln("aa[c1] doesn't exist"); } writefln("looking up aa[c2]"); if (c2 in aa) { writefln("aa[c2]:"); writefln(aa[c2]); } else { writefln("aa[c2] doesn't exist"); } writefln("looking up aa[c3]"); if (c3 in aa) { writefln("aa[c3]:"); writefln(aa[c3]); } else { writefln("aa[c3] doesn't exist"); } aa[c3] = 7; writefln("Added aa[c3]"); aa.rehash; writefln("Rehashed, looking up aa[c1]"); if (c1 in aa) { writefln("aa[c1]:"); writefln(aa[c1]); } else { writefln("aa[c1] doesn't exist"); } writefln("looking up aa[c2]"); if (c2 in aa) { writefln("aa[c2]:"); writefln(aa[c2]); } else { writefln("aa[c2] doesn't exist"); } writefln("looking up aa[c3]"); if (c3 in aa) { writefln("aa[c3]:"); writefln(aa[c3]); } else { writefln("aa[c3] doesn't exist"); } writefln("removing aa[c3]"); aa.remove(c3); writefln("removed aa[c3], looking up aa[c1]"); if (c1 in aa) { writefln("aa[c1]:"); writefln(aa[c1]); } else { writefln("aa[c1] doesn't exist"); } } ---------- As you see if you try running it, opEquals is never called. It just relies on opCmp to compare the objects for equality. This doesn't work if, as in this case, unequal objects can rank equally in order. It would be trivial to make it call opEquals after digging down the binary search tree to check for true equality. It also wouldn't be difficult to dig further down by a combination of opCmp and opEquals calls if the equally ranking key it's found doesn't turn out to be equal. But see also http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4555 Stewart. -- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Jul 18 2005
Lets say that opCmp() said that the object it found was equal, BUT, opEqual() thought it was not. What should the AA insertion / search code do? So, I think this is intentional. It is probably better to use opCmp() to test equality, because it can tell you which branch of the binary tree to take if it is not equal. opEqual only says 1 or 0. This problem shows up in Java, and C++ too. The C++ containers use a similar technique to D -- they call "less<T>" to determine order, and to test equality I think they assume that (! (less<T>(a,b) || less<T>(b,a)) means that a==b. I think this is a general issue in object oriented languages though -- opEqual() and opCmp() can be defined to implement inconsistent comparisons. You can fix this in various ways (always use opCmp() for example), but they all seem to impede performance. But Object comparisons really do have to be fast, so it becomes a case of caveat programmor. Kevin In article <dbfu2v$2kiv$1 digitaldaemon.com>, Stewart Gordon says...Using DMD 0.128, Windows 98SE. I'd got the impression that associative array lookup was a three-stage process: (i) toHash - to see which hash slot to look in (ii) opCmp - to do a binary search within the hash slot (iii) opEquals - to check that the key it's found really does match However, my recent experiments have shown that it skips stage (iii). At first this seemed to be a problem only with structs, but at least now it's happening with classes just the same. As this program shows: ---------- import std.stdio; import std.ctype; class IChar { char data; this(char d) { data = d; } int opEquals(Object o) in { assert (cast(IChar) o !is null); } body { writefln("Entered IChar.opEquals(Object)"); return opEquals(cast(IChar) o); } bit opEquals(IChar i) { writefln("Entered IChar.opEquals(IChar)"); return data == i.data; } int opCmp(Object o) in { assert (cast(IChar) o !is null); } body { writefln("Entered IChar.opCmp(Object)"); return opCmp(cast(IChar) o); } int opCmp(IChar i) { writefln("Entered IChar.opCmp(IChar)"); return toupper(data) - toupper(i.data); } uint toHash() { writefln("Entered IChar.toHash()"); return toupper(data); } unittest { IChar c = new IChar('c'); IChar d = new IChar('c'); IChar C = new IChar('C'); IChar e = new IChar('e'); assert (c == d); assert (c == C); assert (c != e); puts("Units tested"); } } void main() { int[IChar] aa; IChar c1 = new IChar('c'); IChar c2 = new IChar('c'); IChar c3 = new IChar('C'); aa[c1] = 4; writefln("Added aa[c1]"); aa.rehash; writefln("Rehashed, looking up aa[c1]"); if (c1 in aa) { writefln("aa[c1]:"); writefln(aa[c1]); } else { writefln("aa[c1] doesn't exist"); } writefln("looking up aa[c2]"); if (c2 in aa) { writefln("aa[c2]:"); writefln(aa[c2]); } else { writefln("aa[c2] doesn't exist"); } writefln("looking up aa[c3]"); if (c3 in aa) { writefln("aa[c3]:"); writefln(aa[c3]); } else { writefln("aa[c3] doesn't exist"); } aa[c3] = 7; writefln("Added aa[c3]"); aa.rehash; writefln("Rehashed, looking up aa[c1]"); if (c1 in aa) { writefln("aa[c1]:"); writefln(aa[c1]); } else { writefln("aa[c1] doesn't exist"); } writefln("looking up aa[c2]"); if (c2 in aa) { writefln("aa[c2]:"); writefln(aa[c2]); } else { writefln("aa[c2] doesn't exist"); } writefln("looking up aa[c3]"); if (c3 in aa) { writefln("aa[c3]:"); writefln(aa[c3]); } else { writefln("aa[c3] doesn't exist"); } writefln("removing aa[c3]"); aa.remove(c3); writefln("removed aa[c3], looking up aa[c1]"); if (c1 in aa) { writefln("aa[c1]:"); writefln(aa[c1]); } else { writefln("aa[c1] doesn't exist"); } } ---------- As you see if you try running it, opEquals is never called. It just relies on opCmp to compare the objects for equality. This doesn't work if, as in this case, unequal objects can rank equally in order. It would be trivial to make it call opEquals after digging down the binary search tree to check for true equality. It also wouldn't be difficult to dig further down by a combination of opCmp and opEquals calls if the equally ranking key it's found doesn't turn out to be equal. But see also http://www.digitalmars.com/drn-bin/wwwnews?digitalmars.D.bugs/4555 Stewart. -- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Jul 20 2005
Kevin Bealer wrote:Lets say that opCmp() said that the object it found was equal, BUT, opEqual() thought it was not. What should the AA insertion / search code do?The spec doesn't refer to opEqual. But I'm guessing you mean opEquals. It should use it.So, I think this is intentional. It is probably better to use opCmp() to test equality, because it can tell you which branch of the binary tree to take if it is not equal. opEqual only says 1 or 0.Up to a point....This problem shows up in Java, and C++ too. The C++ containers use a similar technique to D -- they call "less<T>" to determine order, and to test equality I think they assume that (! (less<T>(a,b) || less<T>(b,a)) means that a==b.How is less different from the < operator? Or is it just something that the STL provides so that you can pretend incomparable classes are comparable?I think this is a general issue in object oriented languages though -- opEqual() and opCmp() can be defined to implement inconsistent comparisons.This can sometimes be by design. Look at BigDecimal in Java for example.You can fix this in various ways (always use opCmp() for example), but they all seem to impede performance. But Object comparisons really do have to be fast, so it becomes a case of caveat programmor.<snip top of upside-down reply> Using only opCmp impedes generality more than it impedes performance. The same goes as for many other long-pending issues surrounding AAs and opCmp: if this is intended behaviour, it would need to be documented. Stewart. -- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Jul 21 2005
In article <dbnq4l$1tfh$1 digitaldaemon.com>, Stewart Gordon says...Kevin Bealer wrote:Let me explain what I mean. It has to use opCmp() to find the answer in the binary tree. Once it finds an answer that way, you would like it to use opEquals()? If so, what should it do with it? If it returns true, obviously we found the answer. If it returns false, what should the action be?Lets say that opCmp() said that the object it found was equal, BUT, opEqual() thought it was not. What should the AA insertion / search code do?The spec doesn't refer to opEqual. But I'm guessing you mean opEquals. It should use it.No difference, really. I think less<T> is a template argument, but if you don't provide it, the sorting in STL uses operator<. In any case, the same argument applies to operator<.So, I think this is intentional. It is probably better to use opCmp() to test equality, because it can tell you which branch of the binary tree to take if it is not equal. opEqual only says 1 or 0.Up to a point....This problem shows up in Java, and C++ too. The C++ containers use a similar technique to D -- they call "less<T>" to determine order, and to test equality I think they assume that (! (less<T>(a,b) || less<T>(b,a)) means that a==b.How is less different from the < operator? Or is it just something that the STL provides so that you can pretend incomparable classes are comparable?Documentation is good, but I don't understand the generality argument you are making. KevinI think this is a general issue in object oriented languages though -- opEqual() and opCmp() can be defined to implement inconsistent comparisons.This can sometimes be by design. Look at BigDecimal in Java for example.You can fix this in various ways (always use opCmp() for example), but they all seem to impede performance. But Object comparisons really do have to be fast, so it becomes a case of caveat programmor.<snip top of upside-down reply> Using only opCmp impedes generality more than it impedes performance. The same goes as for many other long-pending issues surrounding AAs and opCmp: if this is intended behaviour, it would need to be documented. Stewart.-- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.
Jul 22 2005
Kevin Bealer wrote:In article <dbnq4l$1tfh$1 digitaldaemon.com>, Stewart Gordon says...That given in the last paragraph of my original post. <snip>Kevin Bealer wrote:Let me explain what I mean. It has to use opCmp() to find the answer in the binary tree. Once it finds an answer that way, you would like it to use opEquals()? If so, what should it do with it? If it returns true, obviously we found the answer. If it returns false, what should the action be?Lets say that opCmp() said that the object it found was equal, BUT, opEqual() thought it was not. What should the AA insertion / search code do?The spec doesn't refer to opEqual. But I'm guessing you mean opEquals. It should use it.I see.... <snip>No difference, really. I think less<T> is a template argument, but if you don't provide it, the sorting in STL uses operator<. In any case, the same argument applies to operator<.This problem shows up in Java, and C++ too. The C++ containers use a similar technique to D -- they call "less<T>" to determine order, and to test equality I think they assume that (! (less<T>(a,b) || less<T>(b,a)) means that a==b.How is less different from the < operator? Or is it just something that the STL provides so that you can pretend incomparable classes are comparable?Have you not been listening? 1. It doesn't work on types that have no ordering. 2. It doesn't work on types where unequal objects can rank equally in order. Stewart. -- -----BEGIN GEEK CODE BLOCK----- Version: 3.1 GCS/M d- s:- a->--- UB P+ L E W++ N+++ o K- w++ O? M V? PS- PE- Y? PGP- t- 5? X? R b DI? D G e++>++++ h-- r-- !y ------END GEEK CODE BLOCK------ My e-mail is valid but not my primary mailbox. Please keep replies on the 'group where everyone may benefit.Using only opCmp impedes generality more than it impedes performance. The same goes as for many other long-pending issues surrounding AAs and opCmp: if this is intended behaviour, it would need to be documented.Documentation is good, but I don't understand the generality argument you are making.
Jul 25 2005
In article <dc2e89$1h10$1 digitaldaemon.com>, Stewart Gordon says...Kevin Bealer wrote:I see your point. However, I would prefer the current, faster, bin-tree list to changing the behavior for 1. I can code up an arbitrary order if necessary for objects that I need. I usually think its bad form to code a type where opEquals() != (0 == opCmp()). I know people do this, but I think equality should be equality. To me the fact that two methods exist is a performance thing, no more. In other words, having opCmp() follow a looser standard for returning "0" than opEquals() has for returning "1" seems wrong to me. The opCmp() (in my opinion) should check all the fields that opEquals() uses and impose some kind of directionality. BTW, I've heard arguments that some common cases of "inexact equality" are considered poor design now -- ie comparing floating point numbers with an "epsilon". This is tangential to the discussion though. KevinIn article <dbnq4l$1tfh$1 digitaldaemon.com>, Stewart Gordon says...That given in the last paragraph of my original post. <snip>Kevin Bealer wrote:Let me explain what I mean. It has to use opCmp() to find the answer in the binary tree. Once it finds an answer that way, you would like it to use opEquals()? If so, what should it do with it? If it returns true, obviously we found the answer. If it returns false, what should the action be?Lets say that opCmp() said that the object it found was equal, BUT, opEqual() thought it was not. What should the AA insertion / search code do?The spec doesn't refer to opEqual. But I'm guessing you mean opEquals. It should use it.I see.... <snip>No difference, really. I think less<T> is a template argument, but if you don't provide it, the sorting in STL uses operator<. In any case, the same argument applies to operator<.This problem shows up in Java, and C++ too. The C++ containers use a similar technique to D -- they call "less<T>" to determine order, and to test equality I think they assume that (! (less<T>(a,b) || less<T>(b,a)) means that a==b.How is less different from the < operator? Or is it just something that the STL provides so that you can pretend incomparable classes are comparable?Have you not been listening? 1. It doesn't work on types that have no ordering. 2. It doesn't work on types where unequal objects can rank equally in order. Stewart.Using only opCmp impedes generality more than it impedes performance. The same goes as for many other long-pending issues surrounding AAs and opCmp: if this is intended behaviour, it would need to be documented.Documentation is good, but I don't understand the generality argument you are making.
Jul 27 2005
Kevin Bealer wrote: <snip>I see your point. However, I would prefer the current, faster, bin-tree list to changing the behavior for 1.There seems to be an assumption polluting some of the discussions about AAs and opCmp - that there should be only one AA implementation. As I've said a few times, if we reimplemented AAs using templates and got rid of Object.opCmp, it would be simple to choose one implementation for comparable classes and another for incomparable ones. This is already possible with structs and unions without any changes to the language. Determinining whether opEquals(o) and opCmp(o) == 0 are equivalent is of course harder. But should it be a default behaviour to assume this equivalence? If we didn't, then we could have a pragma to persuade the compiler to assume that opEquals(o) == (opCmp(o) == 0) and hence hook up an implementation that skips the opEquals stage of the lookup. Of course, anyone could forget to specify this pragma. But that's a bit like "forgetting" to define an opCmp at all.I can code up an arbitrary order if necessary for objects that I need.True, but this effectively cancels out the statement that "For some objects, testing for less or greater makes no sense". Because the object that you "need" might be part of a third-party library.I usually think its bad form to code a type where opEquals() != (0 == opCmp()). I know people do this, but I think equality should be equality. To me the fact that two methods exist is a performance thing, no more. In other words, having opCmp() follow a looser standard for returning "0" than opEquals() has for returning "1" seems wrong to me. The opCmp() (in my opinion) should check all the fields that opEquals() uses and impose some kind of directionality.<snip> Here's an idea. When I've finished catching up after my week away, I'll start a vote over on d.D. Stewart. -- My e-mail is valid but not my primary mailbox. Please keep replies on on the 'group where everyone may benefit.
Aug 06 2005