digitalmars.D - Re: AA with complex keytype?

Stewart Gordon <smjg_1998 yahoo.com> Feb 09 2007

Manfred Nowak <svv1999 hotmail.com> Feb 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 09 2007

Manfred Nowak <svv1999 hotmail.com> Feb 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 09 2007

Manfred Nowak <svv1999 hotmail.com> Feb 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 09 2007

Manfred Nowak <svv1999 hotmail.com> Feb 10 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 10 2007

Manfred Nowak <svv1999 hotmail.com> Feb 11 2007

Stewart Gordon <smjg_1998 yahoo.com> Feb 10 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 10 2007

Bill Baxter <dnewsgroup billbaxter.com> Feb 10 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> Feb 10 2007

Charles D Hixson <charleshixsn earthlink.net> Feb 13 2007

"Joel C. Salomon" <JoelCSalomon Gmail.com> Feb 14 2007

Manfred Nowak <svv1999 hotmail.com> Feb 14 2007

Joel Salomon <JoelCSalomon Gmail.com> Feb 14 2007

Manfred Nowak <svv1999 hotmail.com> Feb 14 2007
Frits van Bommel <fvbommel REMwOVExCAPSs.nl> Feb 14 2007

Nicolai Waniek <no.spam thank.you> Feb 14 2007

Manfred Nowak <svv1999 hotmail.com> Feb 14 2007

Stewart Gordon <smjg_1998 yahoo.com> writes:

Manfred Nowak Wrote:

<snip>
 Why are classes required to implement opCmp( Object) for AA's to 
 function properly?


Apparently because Walter doesn't like the idea of providing an alternative
implementation of AAs for unordered classes, but hasn't to my knowledge given a
truly convincing reason.

My library implementation

http://pr.stewartsplace.org.uk/d/sutil/

uses only toHash and opEquals, thus works on unordered types.

 Currently AA's misbehave if opCmp(Object) is not overwritten for a 
 class, but no error is thrown.


Currently?  Not under DMD 1.005 as I try it.  Please supply a code example that
demonstrates this behaviour.

 For the transitivity of an ordering it suffices for opCmp(Object) 
 to always give "==" if the objects are equal, and give "<" if they 
 are not.


So if x != y, then both x < y and y < x?  That wouldn't make sense at all.

 It should be easy to implement this as the default opCmp(Object).


OUAT, Object.opCmp was set up to compare the memory addresses, but this
behaviour was removed to prepare for copying/compacting GC, and probably partly
to eliminate the confusing behaviour it created.

Stewart.

Feb 09 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Stewart Gordon wrote

 Currently AA's misbehave if opCmp(Object) is not overwritten for
 a class, but no error is thrown.


 example that demonstrates this behaviour. 


Following code shows under 1.005 that no compile time error is given.

class C{
  hash_t toHash() { 
    return 0;
  }
}
void main(){
  bool[C] map;
  map[ new C]= true;
  map[ new C]= false;
}

A runtime error shows up only, when a comparison is actually needed.
This might be far too late. In this example the runtime error is forced 
by implementing a worthless toHash. 


 So if x != y, then both x < y and y < x?  That wouldn't make sense
 at all. 


Enough sense for an AA: all colliding elements are put into a linear 
list and searched in sequence on retrieval. Without a good toHash this 
would natrally lead to a bad runtime behaviour.

-manfred

Feb 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Manfred Nowak wrote:
 So if x != y, then both x < y and y < x?  That wouldn't make sense
 at all. 


 Enough sense for an AA: all colliding elements are put into a linear 
 list and searched in sequence on retrieval. Without a good toHash this 
 would natrally lead to a bad runtime behaviour.


If they'd be put into a linear list you wouldn't even *need* opCmp, so 
no opCmp would be "enough sence" as well ;).
But the fact is they are put into a binary tree to keep acceptable 
behavior (i.e. O(log N) instead of O(N)) when the hash function sucks. 
Since hash functions can be user-defined, and not all users are experts 
at hashing, you need to consider that use case.

Feb 09 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Frits van Bommel wrote

 Since hash functions can be user-defined, and not all users
 are experts at hashing, you need to consider that use case.


least one of 1) or 2)
where:
(1a) toHash implements a good distribution and
(1b) opCmp==!opEquals or better
or
(2a) toHash worse than described by (1a) and
(2b) opCmp implements an ordering suitable for use in unbalanced binary 
trees

There is no evidence, that users that are no experts at hashing will be 
experts at implementing a suitable ordering, especially if the elements 
do not have a natural ordering.

-manfred

Feb 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Manfred Nowak wrote:
 Frits van Bommel wrote
 
 Since hash functions can be user-defined, and not all users
 are experts at hashing, you need to consider that use case.


 least one of 1) or 2)
 where:
 (1a) toHash implements a good distribution and
 (1b) opCmp==!opEquals or better


I haven't inspected the current AA implementation in detail. Are you 
sure that an opCmp that never returns "greater" (or, equivalently, never 
returns "less") will always work?
It may well be that the current implementation's binary trees degenerate 
into a linked list in that case, but it may also very well be that some 
elements become irretrievable.

 or
 (2a) toHash worse than described by (1a) and
 (2b) opCmp implements an ordering suitable for use in unbalanced binary 
 trees
 
 There is no evidence, that users that are no experts at hashing will be 
 experts at implementing a suitable ordering, especially if the elements 
 do not have a natural ordering.


Orderings are easier to get right than hash functions, I think. Pretty 
much everybody knows what an ordering looks like, but hash functions 
need to have a good distribution which can be a rather complicated topic.

Of course, since you specified an ordering suitable for an _unbalanced_ 
binary tree that complicates things a bit. But I don't think good 
behavior in there depends much on the comparison function. It's more 
related to in which order you insert the elements. Which orders are good 
and bad _are_ of course dependent on the comparison...

(do current AAs use unbalanced trees? I have no idea, I just know they 
use _some_ form of trees)

Feb 09 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Frits van Bommel wrote

 Are
 you sure that an opCmp that never returns "greater" (or,
 equivalently, never returns "less") will always work?


have to go some kudos to Walter for that being true. 

 (do current AAs use unbalanced trees? I have no idea, I just know
 they use _some_ form of trees)


lines.

-manfred

Feb 09 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Manfred Nowak wrote:
 Frits van Bommel wrote
 
 Are
 you sure that an opCmp that never returns "greater" (or,
 equivalently, never returns "less") will always work?


 have to go some kudos to Walter for that being true. 


Doesn't it also require that objects are always passed in the same 
parameter order to the comparison function?
So when comparing two concrete objects a and b, you must always do 
compare(a, b), never compare(b, a). No matter which is the one you're 
looking for and which is the element in the tree.

Feb 09 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Frits van Bommel wrote

 you must always do compare(a, b), never compare(b, a)


No. Because if you are at a node in the tree where a==b, you are done. 
And if you are at a node in the tree where a!=b, you will go into the 
direction dictated by opCmp, which is by construction of opCmp the same 
for all permutations.
This holds for all operations on the tree. 

-manfred

Feb 10 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Manfred Nowak wrote:
 Frits van Bommel wrote
 
 you must always do compare(a, b), never compare(b, a)


 No. Because if you are at a node in the tree where a==b, you are done. 
 And if you are at a node in the tree where a!=b, you will go into the 
 direction dictated by opCmp, which is by construction of opCmp the same 
 for all permutations.
 This holds for all operations on the tree. 


Let's say a!=b. If compare(a,b) returns a!=b, as you proposed, that 
evaluates to true, which converts to integer 1, representing "greater". 
So it returns that a>b holds. So if b is the node in the tree, you would 
then continue searching in the *right* branch.
If on the other hand compare(b,a) is called, by the same process it 
returns 1 indicating b>a holds. If b is still the tree node, you would 
then continue your search in the *left* branch
So the implementation needs to be careful about parameter order, and I 
don't want to go dig into the source to find out if it is. Nor do I want 
to check every Phobos/Tango release I use to find out if it _still_ is. 
Since the API doesn't specify this, I think it would be a mistake to 
rely on it even if it _currently_ holds.

Feb 10 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Frits van Bommel wrote

 So the implementation needs to be careful about parameter order,


That's right. The value stemming from the collision bucket has to be at 
a fixed parameter position on all calls of opCmp. But this can be 
easily verified by inserting and retrieving two elements under a 
worthless toHash.

 Since the API doesn't specify this


That's also right. But this also can be fixed easily.

-manfred

Feb 11 2007

Stewart Gordon <smjg_1998 yahoo.com> writes:

Manfred Nowak Wrote:

 Frits van Bommel wrote


 (do current AAs use unbalanced trees?  I have no idea, I just 
 know they use _some_ form of trees)


 They are unbalanced.  Look at phobos/internal/aaA.d:278 and 
 following lines.


Balancing the trees on the fly would be inefficient.  But rehash ought to
balance them while it's at it.

Stewart.

Feb 10 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Stewart Gordon wrote:
 Manfred Nowak Wrote:
 
 Frits van Bommel wrote


 (do current AAs use unbalanced trees?  I have no idea, I just 
 know they use _some_ form of trees)


 following lines.


 Balancing the trees on the fly would be inefficient.  But rehash ought to
balance them while it's at it.


There are several well-known methods of efficiently keeping binary trees 
(reasonably) balanced. Off the top of my head there are AVL trees and 
red-black trees, but those definitely aren't the only ones.

For implementing a hash-table bucket this is probably not necessary, 
though. There typically shouldn't be very much elements in a bucket.

Feb 10 2007

Bill Baxter <dnewsgroup billbaxter.com> writes:

Frits van Bommel wrote:
 Stewart Gordon wrote:
 Manfred Nowak Wrote:

 Frits van Bommel wrote


 (do current AAs use unbalanced trees?  I have no idea, I just know 
 they use _some_ form of trees)


 lines.


 Balancing the trees on the fly would be inefficient.  But rehash ought 
 to balance them while it's at it.




 There are several well-known methods of efficiently keeping binary trees 
 (reasonably) balanced. Off the top of my head there are AVL trees and 
 red-black trees, but those definitely aren't the only ones.
 
 For implementing a hash-table bucket this is probably not necessary, 
 though. There typically shouldn't be very much elements in a bucket.


So much for O(N lg N) performance in the case of a really bad hash 
function, though.  You'll get O(N) if you happen to insert your values 
such that their hashes are strictly increasing or strictly decreasing. 
In theory anyway. ;-)  I wouldn't be surprised though if in practice the 
simplicity of leaving out balancing outweighs the theoretical advantage 
of balanced trees in 95% of the cases.

Someone who has too much time on their hands should take a look at what 
the code for Python dicts and Lua tables do.  Both of those are reputed 
to be heavily tweaked for super-duper performance.

--bb

Feb 10 2007

"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:

"Bill Baxter" <dnewsgroup billbaxter.com> wrote in message 
news:eqkmg1$hnb$1 digitaldaemon.com...

 Someone who has too much time on their hands should take a look at what 
 the code for Python dicts and Lua tables do.  Both of those are reputed to 
 be heavily tweaked for super-duper performance.


Lua tables use (according to the source) "a mix of chained scatter table 
with Brent's variation."  The insert method comment reads:

/*
** inserts a new key into a hash table; first, check whether key's main
** position is free. If not, check whether colliding node is in its main
** position or not: if it is not, move colliding node to an empty place and
** put new key in its main position; otherwise (colliding node is in its 
main
** position), new key goes to an empty position.
*/

It's basically using a variation on separate chaining.  Additionally, Lua 5 
tables will attempt to keep integer keys in a true array rather than in the 
hash, which improves performance when using tables as arrays, but that's not 
really necessary in a language that has true arrays like D ;)

Feb 10 2007

Charles D Hixson <charleshixsn earthlink.net> writes:

Stewart Gordon wrote:
  <snip>

 So if x != y, then both x < y and y < x?  That wouldn't make sense at all.
 
 It should be easy to implement this as the default opCmp(Object).


 OUAT, Object.opCmp was set up to compare the memory addresses, but this
behaviour was removed to prepare for copying/compacting GC, and probably partly
to eliminate the confusing behaviour it created.
 
 Stewart.
 


Actually, there are many cases where modular arithmetic to 
various bases is the desired behavior, and in such a system 
"both x < y and y < x" makes perfect sense.  You can reach ANY 
number by repeatedly incrementing and also by repeatedly 
decrementing.

OTOH, I'm not certain that these are worth building into a 
language.  Ada thought so, but no other language appears to 
have followed their lead.  (Possibly it was also done in 
Algol68 ... I never used it, but it contained all sorts of 
experimental things.)

Feb 13 2007

"Joel C. Salomon" <JoelCSalomon Gmail.com> writes:

Charles D Hixson wrote:
 Stewart Gordon wrote:
 So if x != y, then both x < y and y < x?  That wouldn't make sense at 
 all.


 Actually, there are many cases where modular arithmetic to various bases 
 is the desired behavior, and in such a system "both x < y and y < x" 
 makes perfect sense.  You can reach ANY number by repeatedly 
 incrementing and also by repeatedly decrementing.


Mathematically speaking, a group like ℤ₃ is unordered, precisely for the 
reason given.  If we need to impose an ordering for array purposes (I 
don’t understand that bit, just (mis)repeating something I°ve seen said 
on the subject), then give the ordering lexicographic properties.
	(x < y) && (y < x)
should not ever return true.

--Joel

Feb 14 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Joel C. Salomon wrote

 Mathematically speaking, a group like ℤ₃ is unordered,
 precisely for the reason given.  If we need to impose an ordering
 for array purposes (I don’t understand that bit, just
 (mis)repeating something I°ve seen said on the subject), then
 give the ordering lexicographic properties. 


It is faster to not impose an ordering on an unordered set.

-manfred

Feb 14 2007

Joel Salomon <JoelCSalomon Gmail.com> writes:

 Mathematically speaking, a group like ℤ₃ is unordered,
 precisely for the reason given.  If we need to impose an ordering
 for array purposes (I don’t understand that bit, just
 (mis)repeating something I’ve seen said on the subject), then
 give the ordering lexicographic properties. 


 It is faster to not impose an ordering on an unordered set.


Do you know where D insists that “some” ordering be applied to a type,
and why?

Feb 14 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Joel Salomon wrote

 Do you know where D insists that “some” ordering be applied to
 a type, and why?


I have given an example that an ordering _operator_ is required by D in 
another branch of this thread:

class C{
  hash_t toHash() { 
    return 0;
  }
}
void main(){
  bool[C] map;
  map[ new C]= true;
  map[ new C]= false;
}

And this branch of the thread discusses Stewarts remark, that an 
ordering operator, which doesn't implement an ordering is senseless in 
his opinion.

-manfred

Feb 14 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Joel Salomon wrote:
 Mathematically speaking, a group like ℤ₃ is unordered,
 precisely for the reason given.  If we need to impose an ordering
 for array purposes (I don’t understand that bit, just
 (mis)repeating something I’ve seen said on the subject), then
 give the ordering lexicographic properties. 




 Do you know where D insists that “some” ordering be applied to a type,
 and why?


Associative arrays. http://www.digitalmars.com/d/arrays.html#associative

They're implemented as a hash table with a binary tree per bucket. 
Binary trees require some ordering between the elements.
This isn't the only implementation option of course, for example it 
could also be a linked list instead of a binary tree. Or it could not 
use a hash table at all and just use one big binary tree[1]. (But AFAICT 
the only reasonably efficient option without requiring ordering would be 
a hash table with a linked list per bucket)



[1]: Presumably some self-balancing tree like red-black or AVL.

Feb 14 2007

Nicolai Waniek <no.spam thank.you> writes:

Charles D Hixson wrote:
 Actually, there are many cases where modular arithmetic to various bases
 is the desired behavior, and in such a system "both x < y and y < x"
 makes perfect sense.  You can reach ANY number by repeatedly
 incrementing and also by repeatedly decrementing.



What's about this:

a = 1+0j
b = 0+1j

how would you increment a to finnally get to b? how do you want to
compare two complex numbers? this is mathematically impossible.


Nicolai

Feb 14 2007

Manfred Nowak <svv1999 hotmail.com> writes:

Nicolai Waniek wrote

 What's about this:
 
 a = 1+0j
 b = 0+1j
 
 how would you increment a to finnally get to b? how do you want to
 compare two complex numbers? this is mathematically impossible.


It depends on how one defines "increment".
Because Charles only mentions one operation, "increment" can be defined 
to be the normal complex multiplication.

Then "increment( a, b)= b" holds in your case above.

If one wants to define an ordering for all points on the unit-circle 
around the origin Charles is also righteous to define:

  (a != b) => (a < b) & (b < a)

Because you can travel on the circumference of the circle in each 
direction until "a != b" turns out to be false.

-manfred

Feb 14 2007

D Programming

C/C++ Programming

Other

digitalmars.D - Re: AA with complex keytype?