digitalmars.D - AA implementation algo should be mentioned

Davidl 126.com (5/5) Mar 17 2007 without looking into phobos or join irc channel people would

Sean Kelly (9/14) Mar 17 2007 I personally love the C++ spec in this respect. Everything in the

Frits van Bommel (9/23) Mar 17 2007 I agree it would be nice if this were specified.

Frits van Bommel (4/7) Mar 17 2007 Oh sorry, forgot about resizing. What does that make it, average

Sean Kelly (3/11) Mar 17 2007 I think so.

Sean Kelly (7/32) Mar 17 2007 Oops, you're right. They're typically O(N) with a best case (omega?) of...

Frits van Bommel (9/21) Mar 17 2007 I don't think so. AFAIK, it's just a technique that attempts to avoid

Sean Kelly (7/17) Mar 17 2007 Could be, I suppose. Though I'd have to tell a Data Structures

Dejan Lekic (1/1) Mar 17 2007 You can always get the source, David...
Derek Parnell (9/14) Mar 17 2007 Isn't the alogrithm used an implementation detail of the compiler and no...

Tom (5/16) Mar 17 2007 I agree, but I think that time/space complexity IS mandatory in the

Davidl 126.com writes:

without looking into phobos or join irc channel people would
never know what algo is used by AA. and that would make people
feel AA is not reliable, why not mention the algo used by AA
and the complexity on the page? i think that would make the
doc looks better

Mar 17 2007

Sean Kelly <sean f4.ca> writes:

Davidl 126.com wrote:
 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

I personally love the C++ spec in this respect.  Everything in the 
library is treated as an algorithm and complexity information is defined 
for all operations.  However, I'm not sure that specifics about the 
algorithm should be put into the D spec for AAs beyond that they must be 
O(1) for lookups, O(N) for iteration via foreach, and possibly some 
constraints on insert/remove performance (possibly amortized O(1) for 
inserts and maybe O(N) for removals?).


Sean

Mar 17 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Sean Kelly wrote:
 Davidl 126.com wrote:
 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

 
 I personally love the C++ spec in this respect.  Everything in the 
 library is treated as an algorithm and complexity information is defined 
 for all operations.  However, I'm not sure that specifics about the 
 algorithm should be put into the D spec for AAs beyond that they must be 
 O(1) for lookups, O(N) for iteration via foreach, and possibly some 
 constraints on insert/remove performance (possibly amortized O(1) for 
 inserts and maybe O(N) for removals?).

I agree it would be nice if this were specified.

However, I don't think inserts are currently amortized O(1), are they? 
IIRC the Phobos implementation uses a hash table with binary trees for 
collision handling, which would mean /average/ O(1) for "good" toHash() 
implementations, but worst-case O(log N) (for "bad" toHash()s).
Also, that would mean removals are currently O(log N) as well, but 
specifying O(N) would allow other standard library implementations to 
use linked lists instead of binary trees so perhaps that's for the best.

Mar 17 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Frits van Bommel wrote:
[about inserts]
 IIRC the Phobos implementation uses a hash table with binary trees for 
 collision handling, which would mean /average/ O(1) for "good" toHash() 
 implementations, but worst-case O(log N) (for "bad" toHash()s).

Oh sorry, forgot about resizing. What does that make it, average 
amortized O(1)? :P

Mar 17 2007

Sean Kelly <sean f4.ca> writes:

Frits van Bommel wrote:
 Frits van Bommel wrote:
 [about inserts]
 IIRC the Phobos implementation uses a hash table with binary trees for 
 collision handling, which would mean /average/ O(1) for "good" 
 toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).

 
 Oh sorry, forgot about resizing. What does that make it, average 
 amortized O(1)? :P

I think so.


Sean

Mar 17 2007

Sean Kelly <sean f4.ca> writes:

Frits van Bommel wrote:
 Sean Kelly wrote:
 Davidl 126.com wrote:
 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

 I personally love the C++ spec in this respect.  Everything in the 
 library is treated as an algorithm and complexity information is 
 defined for all operations.  However, I'm not sure that specifics 
 about the algorithm should be put into the D spec for AAs beyond that 
 they must be O(1) for lookups, O(N) for iteration via foreach, and 
 possibly some constraints on insert/remove performance (possibly 
 amortized O(1) for inserts and maybe O(N) for removals?).

 
 I agree it would be nice if this were specified.
 
 However, I don't think inserts are currently amortized O(1), are they? 

Oops, you're right.  They're typically O(N) with a best case (omega?) of 
1.  Double hashing is O(1) for inserts I think, but removals tend to be 
somewhat messy.

 IIRC the Phobos implementation uses a hash table with binary trees for 
 collision handling, which would mean /average/ O(1) for "good" toHash() 
 implementations, but worst-case O(log N) (for "bad" toHash()s).
 Also, that would mean removals are currently O(log N) as well, but 
 specifying O(N) would allow other standard library implementations to 
 use linked lists instead of binary trees so perhaps that's for the best.

Yup, which is correct IMO.  There should be no restriction on 
implementation other than that some form of hashing is desired :-)


Sean

Mar 17 2007

Frits van Bommel <fvbommel REMwOVExCAPSs.nl> writes:

Sean Kelly wrote:
 Double hashing is O(1) for inserts I think, but removals tend to be 
 somewhat messy.

I don't think so. AFAIK, it's just a technique that attempts to avoid 
the clustering effect that linear probing suffers from. However, in a 
full table it can still take O(N) probes before an empty slot is found.
Wikipedia seems to back me up on this: "Like all other forms of open 
addressing, double hashing becomes linear as the hash table approaches 
maximum capacity" (http://en.wikipedia.org/wiki/Double_hashing)

 Frits van Bommel wrote:
 IIRC the Phobos implementation uses a hash table with binary trees for 
 collision handling, which would mean /average/ O(1) for "good" 
 toHash() implementations, but worst-case O(log N) (for "bad" toHash()s).
 Also, that would mean removals are currently O(log N) as well, but 
 specifying O(N) would allow other standard library implementations to 
 use linked lists instead of binary trees so perhaps that's for the best.

 
 Yup, which is correct IMO.  There should be no restriction on 
 implementation other than that some form of hashing is desired :-)

Right, I just wanted to mention that. (IIRC that's also the philosophy 
the C++ standard follows on this point)

Mar 17 2007

Sean Kelly <sean f4.ca> writes:

Frits van Bommel wrote:
 Sean Kelly wrote:
 Double hashing is O(1) for inserts I think, but removals tend to be 
 somewhat messy.

 
 I don't think so. AFAIK, it's just a technique that attempts to avoid 
 the clustering effect that linear probing suffers from. However, in a 
 full table it can still take O(N) probes before an empty slot is found.
 Wikipedia seems to back me up on this: "Like all other forms of open 
 addressing, double hashing becomes linear as the hash table approaches 
 maximum capacity" (http://en.wikipedia.org/wiki/Double_hashing)

Could be, I suppose.  Though I'd have to tell a Data Structures 
professor I know that his proof is incorrect.  I suppose I could be 
mis-remembering the conclusion as well though.  Hm... it may have been 
based on particular growth conditions, like never letting the table get 
more than half full before a rehash.



Sean

Mar 17 2007

Dejan Lekic <dejan.lekic gmail.com> writes:

You can always get the source, David...

Mar 17 2007

Derek Parnell <derek psych.ward> writes:

On Sat, 17 Mar 2007 23:29:18 +0800, Davidl 126.com wrote:

 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

Isn't the alogrithm used an implementation detail of the compiler and not
the language? I would see that any description on the AA alogrithm should
be placed in the docs for the compiler and not mandated into the language.

-- 
Derek Parnell
Melbourne, Australia
"Justice for David Hicks!"
skype: derek.j.parnell

Mar 17 2007

Tom <tom nospam.com> writes:

Derek Parnell escribi�:
 On Sat, 17 Mar 2007 23:29:18 +0800, Davidl 126.com wrote:
 
 without looking into phobos or join irc channel people would
 never know what algo is used by AA. and that would make people
 feel AA is not reliable, why not mention the algo used by AA
 and the complexity on the page? i think that would make the
 doc looks better

 
 Isn't the alogrithm used an implementation detail of the compiler and not
 the language? I would see that any description on the AA alogrithm should
 be placed in the docs for the compiler and not mandated into the language.

I agree, but I think that time/space complexity IS mandatory in the 
language specification.

--
Tom;

Mar 17 2007

D Programming

C/C++ Programming

Other

digitalmars.D - AA implementation algo should be mentioned