digitalmars.D - Re: AA with complex keytype?
- Stewart Gordon <smjg_1998 yahoo.com> Feb 09 2007
- Manfred Nowak <svv1999 hotmail.com> Feb 09 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 09 2007
- Manfred Nowak <svv1999 hotmail.com> Feb 09 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 09 2007
- Manfred Nowak <svv1999 hotmail.com> Feb 09 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 09 2007
- Manfred Nowak <svv1999 hotmail.com> Feb 10 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 10 2007
- Manfred Nowak <svv1999 hotmail.com> Feb 11 2007
- Stewart Gordon <smjg_1998 yahoo.com> Feb 10 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 10 2007
- Bill Baxter <dnewsgroup billbaxter.com> Feb 10 2007
- "Jarrett Billingsley" <kb3ctd2 yahoo.com> Feb 10 2007
- Charles D Hixson <charleshixsn earthlink.net> Feb 13 2007
- "Joel C. Salomon" <JoelCSalomon Gmail.com> Feb 14 2007
- Manfred Nowak <svv1999 hotmail.com> Feb 14 2007
- Joel Salomon <JoelCSalomon Gmail.com> Feb 14 2007
- Manfred Nowak <svv1999 hotmail.com> Feb 14 2007
- Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 14 2007
- Nicolai Waniek <no.spam thank.you> Feb 14 2007
- Manfred Nowak <svv1999 hotmail.com> Feb 14 2007
Manfred Nowak Wrote: <snip>Why are classes required to implement opCmp( Object) for AA's to function properly?
Apparently because Walter doesn't like the idea of providing an alternative implementation of AAs for unordered classes, but hasn't to my knowledge given a truly convincing reason. My library implementation http://pr.stewartsplace.org.uk/d/sutil/ uses only toHash and opEquals, thus works on unordered types.Currently AA's misbehave if opCmp(Object) is not overwritten for a class, but no error is thrown.
Currently? Not under DMD 1.005 as I try it. Please supply a code example that demonstrates this behaviour.For the transitivity of an ordering it suffices for opCmp(Object) to always give "==" if the objects are equal, and give "<" if they are not.
So if x != y, then both x < y and y < x? That wouldn't make sense at all.It should be easy to implement this as the default opCmp(Object).
OUAT, Object.opCmp was set up to compare the memory addresses, but this behaviour was removed to prepare for copying/compacting GC, and probably partly to eliminate the confusing behaviour it created. Stewart.
Feb 09 2007
Stewart Gordon wroteCurrently AA's misbehave if opCmp(Object) is not overwritten for a class, but no error is thrown.
example that demonstrates this behaviour.
Following code shows under 1.005 that no compile time error is given. class C{ hash_t toHash() { return 0; } } void main(){ bool[C] map; map[ new C]= true; map[ new C]= false; } A runtime error shows up only, when a comparison is actually needed. This might be far too late. In this example the runtime error is forced by implementing a worthless toHash.So if x != y, then both x < y and y < x? That wouldn't make sense at all.
Enough sense for an AA: all colliding elements are put into a linear list and searched in sequence on retrieval. Without a good toHash this would natrally lead to a bad runtime behaviour. -manfred
Feb 09 2007
Manfred Nowak wrote:So if x != y, then both x < y and y < x? That wouldn't make sense at all.
Enough sense for an AA: all colliding elements are put into a linear list and searched in sequence on retrieval. Without a good toHash this would natrally lead to a bad runtime behaviour.
If they'd be put into a linear list you wouldn't even *need* opCmp, so no opCmp would be "enough sence" as well ;). But the fact is they are put into a binary tree to keep acceptable behavior (i.e. O(log N) instead of O(N)) when the hash function sucks. Since hash functions can be user-defined, and not all users are experts at hashing, you need to consider that use case.
Feb 09 2007
Frits van Bommel wroteSince hash functions can be user-defined, and not all users are experts at hashing, you need to consider that use case.
least one of 1) or 2) where: (1a) toHash implements a good distribution and (1b) opCmp==!opEquals or better or (2a) toHash worse than described by (1a) and (2b) opCmp implements an ordering suitable for use in unbalanced binary trees There is no evidence, that users that are no experts at hashing will be experts at implementing a suitable ordering, especially if the elements do not have a natural ordering. -manfred
Feb 09 2007
Manfred Nowak wrote:Frits van Bommel wroteSince hash functions can be user-defined, and not all users are experts at hashing, you need to consider that use case.
least one of 1) or 2) where: (1a) toHash implements a good distribution and (1b) opCmp==!opEquals or better
I haven't inspected the current AA implementation in detail. Are you sure that an opCmp that never returns "greater" (or, equivalently, never returns "less") will always work? It may well be that the current implementation's binary trees degenerate into a linked list in that case, but it may also very well be that some elements become irretrievable.or (2a) toHash worse than described by (1a) and (2b) opCmp implements an ordering suitable for use in unbalanced binary trees There is no evidence, that users that are no experts at hashing will be experts at implementing a suitable ordering, especially if the elements do not have a natural ordering.
Orderings are easier to get right than hash functions, I think. Pretty much everybody knows what an ordering looks like, but hash functions need to have a good distribution which can be a rather complicated topic. Of course, since you specified an ordering suitable for an _unbalanced_ binary tree that complicates things a bit. But I don't think good behavior in there depends much on the comparison function. It's more related to in which order you insert the elements. Which orders are good and bad _are_ of course dependent on the comparison... (do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)
Feb 09 2007
Frits van Bommel wroteAre you sure that an opCmp that never returns "greater" (or, equivalently, never returns "less") will always work?
have to go some kudos to Walter for that being true.(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)
lines. -manfred
Feb 09 2007
Manfred Nowak wrote:Frits van Bommel wroteAre you sure that an opCmp that never returns "greater" (or, equivalently, never returns "less") will always work?
have to go some kudos to Walter for that being true.
Doesn't it also require that objects are always passed in the same parameter order to the comparison function? So when comparing two concrete objects a and b, you must always do compare(a, b), never compare(b, a). No matter which is the one you're looking for and which is the element in the tree.
Feb 09 2007
Frits van Bommel wroteyou must always do compare(a, b), never compare(b, a)
No. Because if you are at a node in the tree where a==b, you are done. And if you are at a node in the tree where a!=b, you will go into the direction dictated by opCmp, which is by construction of opCmp the same for all permutations. This holds for all operations on the tree. -manfred
Feb 10 2007
Manfred Nowak wrote:Frits van Bommel wroteyou must always do compare(a, b), never compare(b, a)
No. Because if you are at a node in the tree where a==b, you are done. And if you are at a node in the tree where a!=b, you will go into the direction dictated by opCmp, which is by construction of opCmp the same for all permutations. This holds for all operations on the tree.
Let's say a!=b. If compare(a,b) returns a!=b, as you proposed, that evaluates to true, which converts to integer 1, representing "greater". So it returns that a>b holds. So if b is the node in the tree, you would then continue searching in the *right* branch. If on the other hand compare(b,a) is called, by the same process it returns 1 indicating b>a holds. If b is still the tree node, you would then continue your search in the *left* branch So the implementation needs to be careful about parameter order, and I don't want to go dig into the source to find out if it is. Nor do I want to check every Phobos/Tango release I use to find out if it _still_ is. Since the API doesn't specify this, I think it would be a mistake to rely on it even if it _currently_ holds.
Feb 10 2007
Frits van Bommel wroteSo the implementation needs to be careful about parameter order,
That's right. The value stemming from the collision bucket has to be at a fixed parameter position on all calls of opCmp. But this can be easily verified by inserting and retrieving two elements under a worthless toHash.Since the API doesn't specify this
That's also right. But this also can be fixed easily. -manfred
Feb 11 2007
Manfred Nowak Wrote:Frits van Bommel wrote
(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)
They are unbalanced. Look at phobos/internal/aaA.d:278 and following lines.
Balancing the trees on the fly would be inefficient. But rehash ought to balance them while it's at it. Stewart.
Feb 10 2007
Stewart Gordon wrote:Manfred Nowak Wrote:Frits van Bommel wrote
(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)
following lines.
Balancing the trees on the fly would be inefficient. But rehash ought to balance them while it's at it.
There are several well-known methods of efficiently keeping binary trees (reasonably) balanced. Off the top of my head there are AVL trees and red-black trees, but those definitely aren't the only ones. For implementing a hash-table bucket this is probably not necessary, though. There typically shouldn't be very much elements in a bucket.
Feb 10 2007
Frits van Bommel wrote:Stewart Gordon wrote:Manfred Nowak Wrote:Frits van Bommel wrote
(do current AAs use unbalanced trees? I have no idea, I just know they use _some_ form of trees)
lines.
Balancing the trees on the fly would be inefficient. But rehash ought to balance them while it's at it.
There are several well-known methods of efficiently keeping binary trees (reasonably) balanced. Off the top of my head there are AVL trees and red-black trees, but those definitely aren't the only ones. For implementing a hash-table bucket this is probably not necessary, though. There typically shouldn't be very much elements in a bucket.
So much for O(N lg N) performance in the case of a really bad hash function, though. You'll get O(N) if you happen to insert your values such that their hashes are strictly increasing or strictly decreasing. In theory anyway. ;-) I wouldn't be surprised though if in practice the simplicity of leaving out balancing outweighs the theoretical advantage of balanced trees in 95% of the cases. Someone who has too much time on their hands should take a look at what the code for Python dicts and Lua tables do. Both of those are reputed to be heavily tweaked for super-duper performance. --bb
Feb 10 2007
"Bill Baxter" <dnewsgroup billbaxter.com> wrote in message news:eqkmg1$hnb$1 digitaldaemon.com...Someone who has too much time on their hands should take a look at what the code for Python dicts and Lua tables do. Both of those are reputed to be heavily tweaked for super-duper performance.
Lua tables use (according to the source) "a mix of chained scatter table with Brent's variation." The insert method comment reads: /* ** inserts a new key into a hash table; first, check whether key's main ** position is free. If not, check whether colliding node is in its main ** position or not: if it is not, move colliding node to an empty place and ** put new key in its main position; otherwise (colliding node is in its main ** position), new key goes to an empty position. */ It's basically using a variation on separate chaining. Additionally, Lua 5 tables will attempt to keep integer keys in a true array rather than in the hash, which improves performance when using tables as arrays, but that's not really necessary in a language that has true arrays like D ;)
Feb 10 2007
Stewart Gordon wrote:<snip> So if x != y, then both x < y and y < x? That wouldn't make sense at all.It should be easy to implement this as the default opCmp(Object).
OUAT, Object.opCmp was set up to compare the memory addresses, but this behaviour was removed to prepare for copying/compacting GC, and probably partly to eliminate the confusing behaviour it created. Stewart.
Actually, there are many cases where modular arithmetic to various bases is the desired behavior, and in such a system "both x < y and y < x" makes perfect sense. You can reach ANY number by repeatedly incrementing and also by repeatedly decrementing. OTOH, I'm not certain that these are worth building into a language. Ada thought so, but no other language appears to have followed their lead. (Possibly it was also done in Algol68 ... I never used it, but it contained all sorts of experimental things.)
Feb 13 2007
Charles D Hixson wrote:Stewart Gordon wrote:So if x != y, then both x < y and y < x? That wouldn't make sense at all.
Actually, there are many cases where modular arithmetic to various bases is the desired behavior, and in such a system "both x < y and y < x" makes perfect sense. You can reach ANY number by repeatedly incrementing and also by repeatedly decrementing.
Mathematically speaking, a group like ℤ₃ is unordered, precisely for the reason given. If we need to impose an ordering for array purposes (I don’t understand that bit, just (mis)repeating something I°ve seen said on the subject), then give the ordering lexicographic properties. (x < y) && (y < x) should not ever return true. --Joel
Feb 14 2007
Joel C. Salomon wroteMathematically speaking, a group like ℤ₃ is unordered, precisely for the reason given. If we need to impose an ordering for array purposes (I don’t understand that bit, just (mis)repeating something I°ve seen said on the subject), then give the ordering lexicographic properties.
It is faster to not impose an ordering on an unordered set. -manfred
Feb 14 2007
Mathematically speaking, a group like ℤ₃ is unordered, precisely for the reason given. If we need to impose an ordering for array purposes (I don’t understand that bit, just (mis)repeating something I’ve seen said on the subject), then give the ordering lexicographic properties.
It is faster to not impose an ordering on an unordered set.
Do you know where D insists that “some” ordering be applied to a type, and why?
Feb 14 2007
Joel Salomon wroteDo you know where D insists that “some” ordering be applied to a type, and why?
I have given an example that an ordering _operator_ is required by D in another branch of this thread: class C{ hash_t toHash() { return 0; } } void main(){ bool[C] map; map[ new C]= true; map[ new C]= false; } And this branch of the thread discusses Stewarts remark, that an ordering operator, which doesn't implement an ordering is senseless in his opinion. -manfred
Feb 14 2007
Joel Salomon wrote:Mathematically speaking, a group like ℤ₃ is unordered, precisely for the reason given. If we need to impose an ordering for array purposes (I don’t understand that bit, just (mis)repeating something I’ve seen said on the subject), then give the ordering lexicographic properties.
Do you know where D insists that “some” ordering be applied to a type, and why?
Associative arrays. http://www.digitalmars.com/d/arrays.html#associative They're implemented as a hash table with a binary tree per bucket. Binary trees require some ordering between the elements. This isn't the only implementation option of course, for example it could also be a linked list instead of a binary tree. Or it could not use a hash table at all and just use one big binary tree[1]. (But AFAICT the only reasonably efficient option without requiring ordering would be a hash table with a linked list per bucket) [1]: Presumably some self-balancing tree like red-black or AVL.
Feb 14 2007
Charles D Hixson wrote:Actually, there are many cases where modular arithmetic to various bases is the desired behavior, and in such a system "both x < y and y < x" makes perfect sense. You can reach ANY number by repeatedly incrementing and also by repeatedly decrementing.
What's about this: a = 1+0j b = 0+1j how would you increment a to finnally get to b? how do you want to compare two complex numbers? this is mathematically impossible. Nicolai
Feb 14 2007
Nicolai Waniek wroteWhat's about this: a = 1+0j b = 0+1j how would you increment a to finnally get to b? how do you want to compare two complex numbers? this is mathematically impossible.
It depends on how one defines "increment". Because Charles only mentions one operation, "increment" can be defined to be the normal complex multiplication. Then "increment( a, b)= b" holds in your case above. If one wants to define an ordering for all points on the unit-circle around the origin Charles is also righteous to define: (a != b) => (a < b) & (b < a) Because you can travel on the circumference of the circle in each direction until "a != b" turns out to be false. -manfred
Feb 14 2007