digitalmars.D - Consistency
- Steve D (26/26) Feb 15 2015 Python (built-in)
- Adam D. Ruppe (4/8) Feb 15 2015 And also more easily wrong: 1 in [1,2,3] is a slow search, so the
- Xinok (8/34) Feb 15 2015 Part of the issue is that "in" means to search by key, not by
- bearophile (6/8) Feb 15 2015 Values as in Python are much more useful for arrays :-)
- Meta (15/29) Feb 15 2015 O(n) lookup
- John Colvin (3/5) Feb 15 2015 tup1.expand.only.countUntil(2).writeln;
- bearophile (5/7) Feb 15 2015 A little shorter:
- Xinok (3/13) Feb 15 2015 A small nitpick, but I'm sure that should be O(log n).
- Meta (3/16) Feb 15 2015 Oh, whoops. I mixed up average-case complexity with worst-case.
- bearophile (7/9) Feb 15 2015 D associative arrays used to be O(1) amortized and O(n ln n) in
- Meta (6/16) Feb 15 2015 Hmm, if that's the case, then the logic for not allowing "in" for
- Baz (6/25) Feb 15 2015 Where is the lang. spec. saying that "in" must only be used if
- Andrei Alexandrescu (2/8) Feb 15 2015 I remember associative arrays were a lot slower and bigger before. -- An...
- Steve D (33/33) Feb 15 2015 Well, it provoked a little discussion, and I remain unconvinced.
- Andrei Alexandrescu (13/40) Feb 15 2015 We're good as we are. D's standard library goes for a proportional
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (8/12) Feb 16 2015 To be really consistent,
- Daniel Murphy (6/12) Feb 16 2015 It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map?
- John Colvin (6/20) Feb 16 2015 I'm quite a fan of python's // operator for integer division,
- bearophile (8/14) Feb 16 2015 The C/D division operator semantics is a bug-prone design
- Andrei Alexandrescu (11/24) Feb 16 2015 Yah, I agree. FWIW many times "consistency" is used, well,
- "Marc =?UTF-8?B?U2Now7x0eiI=?= <schuetzm gmx.net> (5/17) Feb 17 2015 I disagree with that; the difference between keys and values is
- Craig Dillabaugh (4/14) Feb 15 2015 How is it possible to have a search structure that takes O(n ln n)
- Daniel Murphy (3/5) Feb 15 2015 No, they were still O(n) worst case, for a single bucket with a degenera...
- "Casper =?UTF-8?B?RsOmcmdlbWFuZCI=?= <shorttail hotmail.com> (6/12) Feb 16 2015 Java 7 changed its HashMap implementation to use TreeMap (red
- Daniel Murphy (5/9) Feb 16 2015 I don't think that would pay off for most uses of builtin AAs. One day
- bearophile (5/7) Feb 16 2015 I see. I was unable to hit this degenerate case in my testing
- FG (3/5) Feb 16 2015 From what I can see in druntime/rt/aaA.d, the bucket is chosen by divid...
- Daniel Murphy (2/5) Feb 16 2015 There isn't one any more, we're talking about the previous implementatio...
- Steven Schveighoffer (8/13) Feb 16 2015 IIRC, the AA code tried to make balanced trees, at least on rehash, but
Python (built-in)

dict1 = {"a":1,"b":2}
tup1 = (0,1,2,3)
str1 = "abcd"

There is some sort of consistency of use, though they are different sequence/collection types.

Now try the same in D

auto dict1 = ["a":1,"b":2];
auto tup1 = tuple(0,1,2,3);
auto arr1 = [0,1,2,3];
auto str1 = "abcd";

Having some consistency involving sequences/arrays/strings/collections etc, which are the foundations of any language, makes programming much easier, intuitive and pleasant. It shouldn't be too difficult for a super bare-metal language like D. I'm honestly not knocking D (I love it), just some thoughts (maybe provoke some discussion? for D3?)
Feb 15 2015
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
> Having some consistency involving sequences/arrays/strings/collections etc, which are the foundations of any language makes programming much easier, intuitive and pleasant.

And also more easily wrong: 1 in [1,2,3] is a slow search, so the D strategy is to make it look different to draw your attention to the potential problem.
Feb 15 2015
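Adam's point, that uniform sugar can hide a linear scan, is observable from Python itself. The following is an illustrative sketch (the `Counted` wrapper is made up for this demo, not from the thread): it counts equality comparisons performed by `in` on a list versus a dict.

```python
class Counted:
    """Wraps an int and counts __eq__ calls so we can observe scans."""
    calls = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return hash(self.v)

    def __eq__(self, other):
        Counted.calls += 1
        return self.v == getattr(other, "v", other)

items = [Counted(i) for i in range(1000)]

Counted.calls = 0
assert Counted(999) in items       # list `in` scans element by element
list_comparisons = Counted.calls   # one comparison per element scanned

d = {c: True for c in items}
Counted.calls = 0
assert Counted(999) in d           # dict `in` hashes straight to the bucket
dict_comparisons = Counted.calls   # a handful of comparisons at most
```

With the same `x in y` spelling, the list pays ~1000 comparisons and the dict pays about one; D's `canFind` makes the first case spell itself out.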
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
> Python (built-in)
>
> dict1 = {"a":1,"b":2}
> tup1 = (0,1,2,3)
> str1 = "abcd"
>
> There is some sort of consistency of use, though they are different sequence/collection types.
>
> Now try the same in D
>
> auto dict1 = ["a":1,"b":2];
> auto tup1 = tuple(0,1,2,3);
> auto arr1 = [0,1,2,3];
> auto str1 = "abcd";
>
> Having some consistency involving sequences/arrays/strings/collections etc, which are the foundations of any language makes programming much easier, intuitive and pleasant. It shouldn't be too difficult for a super bare-metal language like D. I'm honestly not knocking D (I love it), just some thoughts (maybe provoke some discussion?, for D3?)

Part of the issue is that "in" means to search by key, not by value. In that respect, it only makes sense for dictionaries (and perhaps sets). I guess the idea is that this generic code should be valid for any container which implements the in operator:

if(a in r) writeln(r[a]);

On a different note, I do wish D had tuple literals like Python has.
Feb 15 2015
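Xinok's key-versus-value distinction can be checked against Python directly. This sketch (the helper name `show_if_present` is hypothetical, invented for illustration) shows where the generic `if a in r: use r[a]` idiom holds and where it breaks:

```python
# Dict membership tests keys, not values.
d = {"a": 1, "b": 2}
assert "a" in d            # key present
assert 1 not in d          # 1 is a value, not a key
assert 1 in d.values()     # value search must be asked for explicitly

# Sequence membership tests values.
lst = [0, 10, 20]
assert 10 in lst

def show_if_present(r, a):
    """The generic idiom from the post: only sound when `in` checks keys."""
    if a in r:
        return r[a]
    return None

assert show_if_present(d, "a") == 1   # fine for a dict

# For a list, `a in r` checks values but r[a] indexes, so the idiom breaks:
try:
    show_if_present(lst, 10)   # 10 in lst is True, but lst[10] is out of range
    idiom_holds = True
except IndexError:
    idiom_holds = False
assert not idiom_holds
```

So Python's own `in` is not uniform either: the guarded-lookup pattern is only valid for mappings, which supports the "in means search by key" reading.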
Xinok:
> In that respect, it only makes sense for dictionaries

Values as in Python are much more useful for arrays :-)

Meta:
> D is better off than Python in this case,

I've never subscribed to this point of view :-)

Bye,
bearophile
Feb 15 2015
On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
> Python (built-in)
> dict1 = {"a":1,"b":2}

O(1) lookup

> tup1 = (0,1,2,3)

O(n) lookup

> str1 = "abcd"

O(n) lookup

> arr1 = [0,1,2,3]

O(n) lookup

This has been discussed to death here. D is better off than Python in this case, as it doesn't hide the fact that it's doing an O(n) operation sometimes and an O(1) operation other times.

No way to do this in D
arr1.countUntil(2)
str1.countUntil("c")

> There is some sort of consistency of use, though they are different sequence/collection types.

The problem is mainly with Tuple, which is not a range (arrays and strings are). You can, however, write a small utility function for this. Now that I think about it, it can probably be made O(1) as well.
Feb 15 2015
On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
> No way to do this in D

tup1.expand.only.countUntil(2).writeln;

Admittedly, it's a little longer than expected :)
Feb 15 2015
John Colvin:
> tup1.expand.only.countUntil(2).writeln;
>
> Admittedly, it's a little longer than expected :)

A little shorter:

tup1[].only.countUntil(2).writeln;

Bye,
bearophile
Feb 15 2015
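For comparison, the Python counterpart of these `countUntil` chains is the uniform `.index` method, which is exactly the consistency the thread is debating. A sketch (not taken from the posts):

```python
tup1 = (0, 1, 2, 3)
arr1 = [0, 1, 2, 3]
str1 = "abcd"

# One spelling across tuple, list and string:
assert tup1.index(2) == 2
assert arr1.index(2) == 2
assert str1.index("c") == 2

# D's countUntil returns -1 for "not found"; Python raises instead.
try:
    tup1.index(9)
    found = True
except ValueError:
    found = False
assert not found
```

The uniformity comes at the cost discussed above: each of these `.index` calls is a linear scan regardless of the container.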
On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
> On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
>> Python (built-in)
>> dict1 = {"a":1,"b":2}
>> tup1 = (0,1,2,3)
>> str1 = "abcd"
>
> O(1) lookup

A small nitpick, but I'm sure that should be O(log n). Dictionaries don't have constant lookup time.
Feb 15 2015
On Sunday, 15 February 2015 at 18:20:10 UTC, Xinok wrote:
> On Sunday, 15 February 2015 at 18:15:13 UTC, Meta wrote:
>> On Sunday, 15 February 2015 at 17:18:08 UTC, Steve D wrote:
>>> Python (built-in)
>>> dict1 = {"a":1,"b":2}
>>> tup1 = (0,1,2,3)
>>> str1 = "abcd"
>>
>> O(1) lookup
>
> A small nitpick, but I'm sure that should be O(log n). Dictionaries don't have constant lookup time.

Oh, whoops. I mixed up average-case complexity with worst-case. Although, isn't lookup O(n) in the worst case for hash tables?
Feb 15 2015
Meta:
> Oh, whoops. I mixed up average-case complexity with worst-case. Although, isn't lookup O(n) in the worst case for hash tables?

D associative arrays used to be O(1) amortized and O(n ln n) in the worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too hard to find and hit that O(n) case.

Bye,
bearophile
Feb 15 2015
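bearophile's adversary can be sketched in Python (the `EvilKey` class is illustrative, not from the thread): keys engineered to share one hash value push every lookup into a single collision chain, which is the O(n) worst case under discussion.

```python
class EvilKey:
    """Adversarial key: every instance hashes to the same value."""
    probes = 0

    def __init__(self, v):
        self.v = v

    def __hash__(self):
        return 42                 # every key collides

    def __eq__(self, other):
        EvilKey.probes += 1       # count comparisons during probing
        return self.v == other.v

n = 500
table = {EvilKey(i): i for i in range(n)}

EvilKey.probes = 0
assert table[EvilKey(n - 1)] == n - 1
# The lookup had to compare against (almost) every colliding key:
assert EvilKey.probes >= n - 1
```

With honest hashes the same lookup would cost about one comparison; this is why hash-flooding attacks target dictionary-backed services.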
On Sunday, 15 February 2015 at 18:45:45 UTC, bearophile wrote:
> Meta:
>> Oh, whoops. I mixed up average-case complexity with worst-case. Although, isn't lookup O(n) in the worst case for hash tables?
>
> D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case.

Hmm, if that's the case, then the logic for not allowing "in" for arrays and other collections seems to be quite weak. However, what if the AA implementation is changed again so that lookup is worst-case O(n*logn)? Should we then again disable "in" for arrays, etc.?
Feb 15 2015
On Sunday, 15 February 2015 at 19:04:57 UTC, Meta wrote:
> Hmm, if that's the case, then the logic for not allowing "in" for arrays and other collections seems to be quite weak. However, what if the AA implementation is changed again so that lookup is worst-case O(n*logn)? Should we then again disable "in" for arrays, etc.?

Where is the lang. spec. saying that "in" must only be used if the complexity is O this O that? (seriously, show me the link...) It's understandable that phobos respects this consensus, but please don't deactivate opIn_r for the users... "in" makes sense in many contexts.
Feb 15 2015
On 2/15/15 10:45 AM, bearophile wrote:
> Meta:
>> Oh, whoops. I mixed up average-case complexity with worst-case. Although, isn't lookup O(n) in the worst case for hash tables?
>
> D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case.

I remember associative arrays were a lot slower and bigger before. -- Andrei
Feb 15 2015
Well, it provoked a little discussion, and I remain unconvinced. Why differentiate between 'in' for an associative-array and 'in' for an unordered sequence/array? This implies that the programmer is an idiot who believes that trawling through an unordered sequence is as fast as a dictionary key lookup. Such programmers should not be coding, never went to school or are probably one of those new kids who only fanny-about with javascript frameworks. The point is that python's 'in' or 'index()' .. whatever, give the fastest implementation for the underlying data type. (the same as canFind will probably give the quickest result) The coder can trust this, and then use the common idiom instead of wondering (as a new D coder) - is it an 'in'?, a 'canFind'? an 'indexOf'? a 'countUntil'? Is it builtin? Is it in std.algorithm? What's the typical lookup for this seq? The second point is that common idioms across datatypes, make for consistent, simpler intuitive coding, whilst also trusting that the language implementors have spent time optimizing the different underlying implementations. Think about new D coders, or those coming from other languages or planets. Apparently this has all been done to death so, yeah, like whatever.. Arguing for inconsistency means you are all retarded :)
Feb 15 2015
On 2/15/15 12:29 PM, Steve D wrote:
> Well, it provoked a little discussion, and I remain unconvinced. Why differentiate between 'in' for an associative-array and 'in' for an unordered sequence/array? [...]

We're good as we are. D's standard library goes for a proportional response in syntax space to the cost of search. This is STL's influence, where this approach has worked better than "best effort under uniform syntax" (e.g. std::distance()). std.container follows the same convention, see e.g. APIs with "linear" in their name.

This is not enforced or prescriptive. User code is free to choose different conventions.

The one possible change would be to allow "in" with statically-sized arrays, under the assumption that the amount of data searched is fixed and proportional to the size of the program.

Andrei
Feb 15 2015
On Sunday, 15 February 2015 at 21:23:13 UTC, Andrei Alexandrescu wrote:
> The one possible change would be to allow "in" with statically-sized arrays, under the assumption that the amount of data searched is fixed and proportional to the size of the program.

To be really consistent, x in arr would need to be equivalent to:

(x >= 0) && (x < arr.length)

`in` tests for the presence of a _key_ in AAs, and the equivalent notion of a key for arrays is an index.
Feb 16 2015
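Marc's reading, that the array analogue of an AA key is an index, can be sketched in Python (the `index_in` helper is hypothetical, named here only for the demo):

```python
def index_in(arr, x):
    """`in` interpreted as has-key: is x a valid index of arr?"""
    return 0 <= x < len(arr)

arr = [10, 20, 30]
assert index_in(arr, 0)
assert index_in(arr, 2)
assert not index_in(arr, 3)

# Viewing the list as a mapping from indices to values makes the analogy
# concrete: its keys are exactly the valid indices.
as_map = dict(enumerate(arr))
assert (0 in as_map) == index_in(arr, 0)
assert (3 in as_map) == index_in(arr, 3)
```

Under this interpretation `x in arr` would be an O(1) bounds check, consistent with the AA's key semantics, rather than a value search.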
"Marc Schütz" " wrote in message news:iftpzvhoyxxqhbfsxull forum.dlang.org...To be really consistent, x in arr would need to be equivalent to: (x >= 0) && (x < arr.length) `in` tests for the presence of a _key_ in AAs, and the equivalent notion of a key for arrays is an index.It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense. Next somebody will be arguing that float/float and int/int aren't the same operation and should have different syntax.
Feb 16 2015
On Monday, 16 February 2015 at 13:10:44 UTC, Daniel Murphy wrote:
> "Marc Schütz" wrote in message news:iftpzvhoyxxqhbfsxull forum.dlang.org...
>> To be really consistent, x in arr would need to be equivalent to:
>> (x >= 0) && (x < arr.length)
>> `in` tests for the presence of a _key_ in AAs, and the equivalent notion of a key for arrays is an index.
>
> It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense. Next somebody will be arguing that float/float and int/int aren't the same operation and should have different syntax.

I'm quite a fan of python's // operator for integer division, especially when combined with python 3's choice to make / always mean floating point division (i.e. 3/2 == float(3)/2, 3//2 == 1). It recognises that integer division is a weird thing and separates it from the much less weird floating point division.
Feb 16 2015
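The Python 3 behaviour John describes is easy to verify. Note one subtlety the thread doesn't spell out: `//` floors (rounds toward negative infinity), whereas C-style integer division, which D inherits, truncates toward zero, so they disagree on negative operands. The `int(-3 / 2)` line below stands in for C-style truncation, as an illustration:

```python
# / is always real division in Python 3; // is explicit floor division.
assert 3 / 2 == 1.5
assert 3 // 2 == 1

# // floors (rounds toward -infinity)...
assert -3 // 2 == -2
# ...whereas C/D-style integer division truncates toward zero:
assert int(-3 / 2) == -1

# divmod keeps quotient and remainder consistent: q*b + r == a
q, r = divmod(-3, 2)
assert q == -2 and r == 1
assert q * 2 + r == -3
```

Making the two operations visually distinct is the design point: a reader can no longer mistake a silent truncation for real division.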
John Colvin:
> I'm quite a fan of python's // operator for integer division, especially when combined with python 3's choice to make / always mean floating point division (i.e. 3/2 == float(3)/2, 3//2 == 1). It recognises that integer division is a weird thing and separates it from the much less weird floating point division.

The C/D division operator semantics is a bug-prone design mistake, and this design was fixed in Python3 even though the fix broke a lot of Python2 code. It's a pitfall that a modern language must avoid (I don't know how Rust handles divisions; I hope they have removed this problem).

Bye,
bearophile
Feb 16 2015
On 2/16/15 5:10 AM, Daniel Murphy wrote:
> "Marc Schütz" wrote in message news:iftpzvhoyxxqhbfsxull forum.dlang.org...
>> To be really consistent, x in arr would need to be equivalent to:
>> (x >= 0) && (x < arr.length)
>> `in` tests for the presence of a _key_ in AAs, and the equivalent notion of a key for arrays is an index.
>
> It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense.

Yah, I agree. FWIW many times "consistency" is used, well, inconsistently. In a consistency argument the choice of benchmark (= what's the standard that new stuff must be consistent with?) is crucial. Oftentimes in entities as complex as programming languages, it's easy to make consistency arguments for numerous artifacts depending on the choice of benchmark. This is painfully evident in C++ - consistency-based arguments got diluted quite a bit since it became obvious they're so easy to make in favor of anything.

> Next somebody will be arguing that float/float and int/int aren't the same operation and should have different syntax.

I guess you just got taken downtown over that :o).

Andrei
Feb 16 2015
On Monday, 16 February 2015 at 13:10:44 UTC, Daniel Murphy wrote:
> "Marc Schütz" wrote in message news:iftpzvhoyxxqhbfsxull forum.dlang.org...
>> To be really consistent, x in arr would need to be equivalent to:
>> (x >= 0) && (x < arr.length)
>> `in` tests for the presence of a _key_ in AAs, and the equivalent notion of a key for arrays is an index.
>
> It's called 'in', not 'haskey'. Is 3 in the array? Is 7 in the map? Everybody understands what it means and the whole argument is nonsense.

I disagree with that; the difference between keys and values is pretty obvious to me, as is the analogy with indices (both go inside the []). But the entire discussion is pointless, because `in` will not be changed (and for good reasons!).
Feb 17 2015
On Sunday, 15 February 2015 at 18:45:45 UTC, bearophile wrote:
> Meta:
>> Oh, whoops. I mixed up average-case complexity with worst-case. Although, isn't lookup O(n) in the worst case for hash tables?
>
> D associative arrays used to be O(1) amortized and O(n ln n) in worst case. Now they are O(1) amortized and O(n) worst case. And for an adversary it's not too much hard to find and hit that O(n) case.

How is it possible to have a search structure that takes O(n ln n) time ... you would have to go out of your way to design something particularly bad.
Feb 15 2015
"bearophile" wrote in message news:jufdlgyynxiwbqubbbkx forum.dlang.org...D associative arrays used to be O(1) amortized and O(n ln n) in worst case.No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.
Feb 15 2015
On Monday, 16 February 2015 at 06:23:06 UTC, Daniel Murphy wrote:
> "bearophile" wrote in message news:jufdlgyynxiwbqubbbkx forum.dlang.org...
>> D associative arrays used to be O(1) amortized and O(n ln n) in worst case.
>
> No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.

Java 7 changed its HashMap implementation to use TreeMap (red black search tree) instead of LinkedList for its buckets, if the key can be sorted. That puts the worst case lookup time from O(n) to O(log n) for sortable keys. Maybe that's worth considering for AAs?
Feb 16 2015
"Casper Færgemand" " wrote in message news:aenkruurfmfanarkeicm forum.dlang.org...Java 7 changed its HashMap implementation to use TreeMap (red black search tree) instead of LinkedList for its buckets, if the key can be sorted. That puts the worst case lookup time from O(n) to O(log n) for sortable keys. Maybe that's worth considering for AAs?I don't think that would pay off for most uses of builtin AAs. One day we'll probably get a fully customizable hash table in phobos that can do this kind of stuff.
Feb 16 2015
Daniel Murphy:
> No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.

I see. I was unable to hit this degenerate case in my testing code, but I guess that was possible. Thank you.

Bye,
bearophile
Feb 16 2015
>> No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.
>
> I see. I was unable to hit this degenerate case in my testing code, but I guess that was possible. Thank you.

What binary tree? I don't see any. From what I can see in druntime/rt/aaA.d, the bucket is chosen by dividing the key hash modulo the number of buckets. The number of buckets is a prime number (x in [31, 97, 389, 1543, ...]), so it is very unlikely to write a hash function that would give different results but the same (i % x) value. The most probable cause would therefore be a hash function implementation that gives the exact same hash for a prepared set of keys. Then you'd get the worst-case scenario in which you have to walk a list and compare keys one-by-one in O(n). But that's not a problem with the associative array implementation. It's a bad hash problem.
Feb 16 2015
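FG's bucket-selection argument can be sketched numerically. This is an illustrative model of the modulo scheme described, not the actual druntime code:

```python
# Bucket = key hash modulo a prime bucket count, as described for aaA.d.
nbuckets = 31                     # first prime from the quoted table

def bucket_of(key_hash):
    return key_hash % nbuckets

# Distinct hashes normally spread across buckets...
assert bucket_of(1) != bucket_of(2)
# ...and colliding via the modulo alone requires hashes exactly a
# multiple of the bucket count apart:
assert bucket_of(5) == bucket_of(5 + 31)

# A broken or adversarial hash that returns one constant defeats any modulo:
hashes = [42] * 8
assert len({bucket_of(h) for h in hashes}) == 1   # all keys in one bucket;
                                                  # lookups walk that chain in O(n)
```

This is FG's point: the prime modulo only distributes *distinct* hashes; if the hash function itself collapses keys to one value, no bucket scheme can help.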
"FG" wrote in message news:mbsk0i$1ukb$1 digitalmars.com...There isn't one any more, we're talking about the previous implementation.What binary tree? I don't see any.No, they were still O(n) worst case, for a single bucket with a degenerate binary > tree.
Feb 16 2015
On 2/16/15 1:23 AM, Daniel Murphy wrote:
> "bearophile" wrote in message news:jufdlgyynxiwbqubbbkx forum.dlang.org...
>> D associative arrays used to be O(1) amortized and O(n ln n) in worst case.
>
> No, they were still O(n) worst case, for a single bucket with a degenerate binary tree.

IIRC, the AA code tried to make balanced trees, at least on rehash, but I thought it did so proactively on insertion too.

But in any case, the reason AA's are so slow is because the load factor is 4. As in, the AA will have 4x as many elements as buckets before it increases buckets. I've brought it up before. Typically you want to keep the number of elements per bucket near 1 for O(1) lookup.

-Steve
Feb 16 2015
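Steve's load-factor arithmetic can be sketched with a toy separate-chaining table (an illustrative model, not the AA implementation of the time): with load factor L and a uniform hash, a successful lookup walks (L+1)/2 links on average.

```python
def mean_probes(nelements, nbuckets):
    """Chain integer keys into buckets (bucket = hash % nbuckets) and return
    the average number of key comparisons a successful lookup performs,
    i.e. each key's 1-based position in its chain, averaged over all keys."""
    chains = [[] for _ in range(nbuckets)]
    for k in range(nelements):
        chains[hash(k) % nbuckets].append(k)
    return sum(pos + 1
               for chain in chains
               for pos, _ in enumerate(chain)) / nelements

# Load factor 4 (4000 elements, 1000 buckets): ~2.5 comparisons per lookup.
assert mean_probes(4000, 1000) == 2.5
# Load factor 1: ~1 comparison per lookup, the O(1) regime Steve describes.
assert mean_probes(4000, 4000) == 1.0
```

Resizing earlier (keeping elements/buckets near 1) trades memory for shorter chains, which is exactly the tuning complaint about the load factor of 4.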