digitalmars.D - Suggestion: Operator `in` for slices
- Quirin Schroll (35/35) Dec 18 2021 The operator `in` between `T` and `T[]` is requested a lot of
- Adam Ruppe (8/10) Dec 18 2021 Nah, the general rule against it is due to speed/computational
- Quirin Schroll (6/13) Dec 18 2021 How is that even a criterion? `foreach` on arrays has O(n)
- Dukc (14/23) Dec 19 2021 `foreach` is not supposed to give that guarantee. `in` is. Just
- Dennis (36/38) Dec 18 2021 Who do people keep repeating that?
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (4/6) Dec 18 2021 Also, if the array is sorted it would be O(log N). You can
- Dukc (11/22) Dec 19 2021 These are bugs. The O(log n) assumption still makes sense when
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (21/25) Dec 19 2021 I know this is a common abuse of notation in C++ circles, but
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (14/17) Dec 19 2021 By this I mean that using "O(…)" alone means worst case.
- Steven Schveighoffer (9/14) Dec 19 2021 D AAs do not use linked lists any more. It uses open addressing. But
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (8/10) Dec 19 2021 Yes, if you don't shrink.
- Steven Schveighoffer (3/15) Dec 21 2021 Please see existing research, this is not a new concept.
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (6/21) Dec 21 2021 What do you mean by research, this is a trivial topic. In order
- Paul Backus (8/18) Dec 22 2021 To guarantee amortized O(1) deletion, it is sufficient to ensure
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (62/64) Dec 29 2021 I haven't looked at the implementation of AAs or the allocator
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (8/15) Dec 29 2021 To put numbers on this, when deleting a long series of keys the
- Paul Backus (3/19) Dec 29 2021 Seems like an issue worth filing a bugzilla report about.
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (3/4) Jan 02 2022 Probably, I don't use AAs, but I can file an issue if nobody else
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (3/4) Jan 07 2022 I am not able to register for issues.dlang.org, does not not
- bauss (3/8) Jan 07 2022 Is that real? That gmail accounts don't work? That's so odd.
- Mike Parker (3/8) Jan 07 2022 I registered with a gmail account. What error do you see?
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (7/16) Jan 07 2022 «The e-mail address you entered
- =?UTF-8?Q?Ali_=c3=87ehreli?= (3/6) Dec 19 2021 That.
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (6/9) Dec 19 2021 But your proposal would kinda work like in Python?
- Dennis (5/10) Dec 20 2021 In my proposal `'a' in ['a', 'b']` will return `null` because it
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (5/8) Dec 20 2021 Oh yeah, I'm sorry, you looked at the index as a key. I'm not
- Steven Schveighoffer (27/53) Dec 19 2021 Well, yeah. But that doesn't mean it happens often or even regularly.
- Dennis (11/14) Dec 20 2021 For the record, I'm not arguing in favor of `in` for slices. I'm
- Araq (5/20) Dec 20 2021 And how come it cannot store the result of the "in" operation in
- Biotronic (15/27) Dec 20 2021 bool isSubset(T1, T2)(T1 needles, T2 haystack) {
- Dennis (2/3) Dec 20 2021 Thanks, that's an illuminating example.
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (10/17) Dec 20 2021 Yes, I agree with this. When I use "in" in Python I am very much
- Paul Backus (8/26) Dec 20 2021 ```d
- Stanislav Blinov (9/24) Dec 20 2021 TBH, for that particular case, this looks nicer, and IMO
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (4/10) Dec 20 2021 You can always create a library construct, but then you should
- Paul Backus (8/19) Dec 20 2021 There are a lot of features one could potentially add to D to
- Nick Treleaven (9/12) Dec 21 2021 No, `canFind` and `among` search for a given value in a range.
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (5/8) Dec 22 2021 What it is conceptually, depends on how you conceptualize it. The
- Nick Treleaven (9/12) Jan 01 2022 `in` is defined for RedBlackTree, but it returns bool:
- Elronnd (3/10) Jan 01 2022 A redblacktree is not a sequence, though; its elements are not
- Ola Fosheim =?UTF-8?B?R3LDuHN0YWQ=?= (7/9) Jan 02 2022 Right, "in" is a set operator. You can implement a set as a dict
- Steven Schveighoffer (7/14) Dec 21 2021 One thing not really being discussed is that there is a difference
- Era Scarecrow (10/16) Jan 02 2022 Then maybe we in should be implemented; Have it check if you
- Nick Treleaven (7/15) Jan 04 2022 It would make sense to implement `in` for SortedRange, but there
- Era Scarecrow (14/18) Jan 05 2022 Sorry, i was somehow thinking there was a key/value pair in this
- Nick Treleaven (7/13) Jan 07 2022 Ok, you're right. The search key then wouldn't need to contain
- Biotronic (12/15) Dec 19 2021 It's in
The operator `in` between `T` and `T[]` is requested a lot of times. Most people suggest it should return a `bool` signifying if some value of the array on the right-hand side is equal (`==`) to the `T` value on the left-hand side. This is the obvious thing to do. I guess it has never been implemented because this information (true/false) is so much less than it could be. If D used 1-based indexing, `in` could return the index if found or 0 if not found. Because 0 == false, one could use `if (auto x in array)`. However, D's indexing 0-based, so 0 is a valid index. `in` cannot return a plain `size_t`. My idea is inspired by how AAs `in` operator does not return a (near-useless) `bool`, but a value that can be used like a `bool`, but provides much more valuable information: a `T*`. It would work returning a `T*` for slices' `in` as well, but it is much less useful than the index. (One could get the index of something from the pointer to it using pointer subtraction, but pointer arithmetic not considered ` safe`.) The index practically *is* the pointer. So, can we return an object that is, for almost all intents and purposes, usable as an index, but evaluating it via `if` returns `true` or `false` depending on whether the value was found or not. So, my suggestion is that the `in` operator for slices returns a value of this type: ```D struct InResult { size_t index = size_t.max; alias index this; bool opCast(T : bool)() const { return index != size_t.max; } } ``` In fact, this is very much a simple optional type. It takes a very unlikely valid value of its underlying type and reinterprets it as the invalid value.
Dec 18 2021
On Saturday, 18 December 2021 at 18:29:56 UTC, Quirin Schroll wrote:I guess it has never been implemented because this information (true/false) is so much less than it could be.Nah, the general rule against it is due to speed/computational complexity. The `in` operator on AAs has a speed of O(log n) or O(1). `in` on slices would exceed that being O(n). I'm not sure the speed rule is written or not, but generally `in` is supposed to have the same speed requirement as `[n]` which is sub-linear.
Dec 18 2021
On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:On Saturday, 18 December 2021 at 18:29:56 UTC, Quirin Schroll wrote:How is that even a criterion? `foreach` on arrays has O(n) complexity as well. You cannot say it's okay for `foreach` because it obviously has to be that way. The same can be said for linear search. It doesn't take a genius that searching an unordered structure cannot be better than O(n) in the worst case.I guess it has never been implemented because this information (true/false) is so much less than it could be.Nah, the general rule against it is due to speed/computational complexity. The `in` operator on AAs has a speed of O(log n) or O(1). `in` on slices would exceed that being O(n).
Dec 18 2021
On Saturday, 18 December 2021 at 20:17:51 UTC, Quirin Schroll wrote:`foreach` is not supposed to give that guarantee. `in` is. Just like the indexing operator (`range[index]`). That does not say you can't define a function that checks for an element in O(n) time, but it should be called something else than `in` then. It's the same reason as why we don't define `range[index]` as `range.save.dropExactly(index).front` for all forward ranges that do not explicitly declare `opIndex`. It's not that you can't index a range like that, but that we usually want a compile error rather than silent O(n) indexing if we are expecting O(1) or O(log n) indexing.Nah, the general rule against it is due to speed/computational complexity. The `in` operator on AAs has a speed of O(log n) or O(1). `in` on slices would exceed that being O(n).How is that even a criterion? `foreach` on arrays has O(n) complexity as well. You cannot say it's okay for `foreach` because it obviously has to be that way.The same can be said for linear search. It doesn't take a genius that searching an unordered structure cannot be better than O(n) in the worst case.But it isn't always immediately clear to the reader what the type of the data structure is.
Dec 19 2021
On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:The `in` operator on AAs has a speed of O(log n) or O(1). `in` on slices would exceed that being O(n).Who do people keep repeating that? - Worst case time complexity of the `in` operator in AA's is actually O(n). This can easily happen in practice when your `toHash` has a poor implementation, which I have had before - In custom types you can overload the `in` operator to something with arbitrarily bad time complexity. - Despite scaling with O(n), a linear scan is pretty fast. You can probably scan your entire RAM in a matter of seconds. The whole "it ruins performance guarantees" so far seems completely hypothetical and is not something that came from experience with actual code. The problem with `in` for slices to me is that it would be inconsistent with how it works for AAs, where `in` searches for a *key* and returns a pointer to a *value*. The same for slices would look like this: ```D T* opIn(size_t i, T[] arr) { return i >= arr.length ? null : &arr[i]; } void main() { int[3] a = [10, 20, 30]; assert((10 in a) == null); // index 10 is out of bounds assert((2 in a) == &a[2]); } ``` But people suggest `in` to search a *value* in a slice. I've once overloaded the `in` operator for a custom 2D array type used to represent a maze in which path finding was done. There it was useful for bounds checking since writing comparisons with `width` and `height` all the time is cumbersome. I wouldn't mind giving `in` this meaning so you can do easy bounds checks in slices as well, but I think it will be to confusing for newcomers expecting `in` to work like in Python.
Dec 18 2021
On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:- Despite scaling with O(n), a linear scan is pretty fast. You can probably scan your entire RAM in a matter of seconds.Also, if the array is sorted it would be O(log N). You can achieve this either by adding a high level IR that can deduce that the array is sorted or making it part of the type system.
Dec 18 2021
On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:These are bugs. The O(log n) assumption still makes sense when your code works as it's supposed to.The `in` operator on AAs has a speed of O(log n) or O(1). `in` on slices would exceed that being O(n).Who do people keep repeating that? - Worst case time complexity of the `in` operator in AA's is actually O(n). This can easily happen in practice when your `toHash` has a poor implementation, which I have had before - In custom types you can overload the `in` operator to something with arbitrarily bad time complexity.- Despite scaling with O(n), a linear scan is pretty fast. You can probably scan your entire RAM in a matter of seconds.A few seconds is by far too much. Remember, the data structure might be fed to function designed by introspection. The function sees "Ah, an `in` operator. `in` is on average O(log N) at worst, I can call it hundreds of times without significant slowdown", and compiles such that it takes a quarter of an hour to execute. Had the `in` operator respected the O(log N) assumption, the introspecting function would have failed to compile alerting the programmer, or used an alternative, quicker implementation.
Dec 19 2021
On Sunday, 19 December 2021 at 10:10:21 UTC, Dukc wrote:function sees "Ah, an `in` operator. `in` is on average O(log N) at worst, I can call it hundreds of times without significant slowdown", and compiles such that it takes a quarter of an hour to execute.I know this is a common abuse of notation in C++ circles, but O(…) means worst case. Also if something is O(log N), then it is also O(N) or any other bigger bound. The notations O(N) means that N is sufficient to put an upper bound on the computation growth (in terms of input size) for each and every possible input of any size. Average case is not given with Big-Oh-notation, but done "accurately" by presenting a model for the distribution of input data and then solving it. (usually a very big task) When some C++ libraries talk of amortized complexity, they probably should not use O-notation, but we get what they mean. E.g. if you do a long series of operations then the costly reallocations will be amortized, so they therefore ignore it in the analysis and "pretend" it does not happen. But it isn't on average. To do on-average-calculations you need a model of the input and the operations, and the calculcations can take a scholar many days to figure out. E.g. one strategy is to translate the model and patterns of operations to recurrence relations and solving it as integrals… a very tedious and time consuming job.
Dec 19 2021
On Sunday, 19 December 2021 at 11:07:30 UTC, Ola Fosheim Grøstad wrote:Average case is not given with Big-Oh-notation, but done "accurately" by presenting a model for the distribution of input data and then solving it. (usually a very big task)By this I mean that using "O(…)" alone means worst case. We can say that quicksort is O(N²). Not qualifying the statements implies worst case. If we consider all inputs equally likely then we can also claim that the average case complexity is limited by O(N log N). Now, if AA was implemented as a hashtable with a constant upper-bound on the linked lists (say 100) then we could say that "in" would be O(1). We could also claim that inserts would be O(1) amortized (ignore rehashing, assuming a distribution that is relatively flat). So in general, we should be able to have AAs where all operations (insert, delete and membership tests) are O(1) amortized.
Dec 19 2021
On 12/19/21 6:45 AM, Ola Fosheim Grøstad wrote:Now, if AA was implemented as a hashtable with a constant upper-bound on the linked lists (say 100) then we could say that "in" would be O(1).D AAs do not use linked lists any more. It uses open addressing. But having a bound on how many collisions can happen is ineffective against a bad hash function (say, one that always returns 0). The expectation of the hash table is that the hash function is good. Without a good hash function, the complexity goes to crap.We could also claim that inserts would be O(1) amortized (ignore rehashing, assuming a distribution that is relatively flat).That is what we claim (and what hash algorithms in general claim). amortized O(1) lookups and inserts. -Steve
Dec 19 2021
On Sunday, 19 December 2021 at 18:36:12 UTC, Steven Schveighoffer wrote:That is what we claim (and what hash algorithms in general claim). amortized O(1) lookups and inserts.Yes, if you don't shrink. You need to prove that the claim holds for all possible orderings of inserts, deletions and lookups. So you need to prove that you always do O(N) operations between rehashing, so you get O(N) operations of complexity O(1) + one rehash of complexity O(N) = O(N).
Dec 19 2021
On 12/19/21 2:34 PM, Ola Fosheim Grøstad wrote:On Sunday, 19 December 2021 at 18:36:12 UTC, Steven Schveighoffer wrote:Please see existing research, this is not a new concept. -SteveThat is what we claim (and what hash algorithms in general claim). amortized O(1) lookups and inserts.Yes, if you don't shrink. You need to prove that the claim holds for all possible orderings of inserts, deletions and lookups. So you need to prove that you always do O(N) operations between rehashing, so you get O(N) operations of complexity O(1) + one rehash of complexity O(N) = O(N).
Dec 21 2021
On Tuesday, 21 December 2021 at 17:02:22 UTC, Steven Schveighoffer wrote:On 12/19/21 2:34 PM, Ola Fosheim Grøstad wrote:What do you mean by research, this is a trivial topic. In order to get O(1) amortised you have to delay shrinkage, you must prove that any possible sequence does not reallocate more frequently than N operations.On Sunday, 19 December 2021 at 18:36:12 UTC, Steven Schveighoffer wrote:Please see existing research, this is not a new concept.That is what we claim (and what hash algorithms in general claim). amortized O(1) lookups and inserts.Yes, if you don't shrink. You need to prove that the claim holds for all possible orderings of inserts, deletions and lookups. So you need to prove that you always do O(N) operations between rehashing, so you get O(N) operations of complexity O(1) + one rehash of complexity O(N) = O(N).
Dec 21 2021
On Sunday, 19 December 2021 at 19:34:18 UTC, Ola Fosheim Grøstad wrote:On Sunday, 19 December 2021 at 18:36:12 UTC, Steven Schveighoffer wrote:To guarantee amortized O(1) deletion, it is sufficient to ensure that `shrink_threshold * growth_factor < growth_threshold`. And if you look at the source code for D's built-in AAs, you will see that this is, in fact, ensured: https://github.com/dlang/druntime/blob/v2.098.1/src/rt/aaA.d#L17-L27 So, we can safely put this discussion to rest, I think.That is what we claim (and what hash algorithms in general claim). amortized O(1) lookups and inserts.Yes, if you don't shrink. You need to prove that the claim holds for all possible orderings of inserts, deletions and lookups. So you need to prove that you always do O(N) operations between rehashing, so you get O(N) operations of complexity O(1) + one rehash of complexity O(N) = O(N).
Dec 22 2021
On Wednesday, 22 December 2021 at 14:27:32 UTC, Paul Backus wrote:https://github.com/dlang/druntime/blob/v2.098.1/src/rt/aaA.d#L17-L27 So, we can safely put this discussion to rest, I think.I haven't looked at the implementation of AAs or the allocator used, so I cannot tell whether it is O(1) amortized or not. You can test it by repeatedly adding/removing items near the thresholds. A preliminary test suggests that the observed behaviour is rather wasteful, which should be a warning for anyone who considers using it in production. It grows by a factor of 4, that is a lot of waste. It also appears to grow on deletion: adding changed 0 : 0 changed 6 : 8 changed 25 : 32 changed 102 : 128 changed 409 : 512 changed 1638 : 2048 changed 6553 : 8192 removing changed 4095 : 32768 changed 1023 : 8192 changed 255 : 2048 changed 63 : 512 changed 15 : 128 changed 3 : 32 ``` import std.stdio; struct AA { Impl* impl; }; struct Impl { void*[] buckets; }; void main(){ immutable N = 6554; int[int] a; ulong nn = 0; writeln("adding"); for(int i = 0; i<N; i++){ auto b = a; a[i] = i; auto aa = cast(AA*) &a; auto n = aa.impl.buckets.length; if (n!=nn) { writeln("changed ", i, " : ", nn); nn = n; } } writeln("\nremoving"); for(int i = N-1; i>=0; i--){ auto b = a; a.remove(i); auto aa = cast(AA*) &a; auto n = aa.impl.buckets.length; if (n!=nn) { writeln("changed ", i, " : ", nn); nn = n; } } } ```
Dec 29 2021
On Wednesday, 29 December 2021 at 10:53:13 UTC, Ola Fosheim Grøstad wrote:removing changed 4095 : 32768 changed 1023 : 8192 changed 255 : 2048 changed 63 : 512 changed 15 : 128 changed 3 : 32To put numbers on this, when deleting a long series of keys the hash table has consistently less than 13% filled slots, and right before shrinking it has 3% filled slots (1024/32768). That is a lot of wasted memory, 97%. Hash tables should not be worse than 75% waste as a rule of thumb. With such high constant factors O(1) notation is of little value.
Dec 29 2021
On Wednesday, 29 December 2021 at 11:26:48 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 29 December 2021 at 10:53:13 UTC, Ola Fosheim Grøstad wrote:Seems like an issue worth filing a bugzilla report about.removing changed 4095 : 32768 changed 1023 : 8192 changed 255 : 2048 changed 63 : 512 changed 15 : 128 changed 3 : 32To put numbers on this, when deleting a long series of keys the hash table has consistently less than 13% filled slots, and right before shrinking it has 3% filled slots (1024/32768). That is a lot of wasted memory, 97%. Hash tables should not be worse than 75% waste as a rule of thumb. With such high constant factors O(1) notation is of little value.
Dec 29 2021
On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus wrote:Seems like an issue worth filing a bugzilla report about.Probably, I don't use AAs, but I can file an issue if nobody else has already.
Jan 02 2022
On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus wrote:Seems like an issue worth filing a bugzilla report about.I am not able to register for issues.dlang.org, does not not accept gmail accounts. I guess someone else has to file it.
Jan 07 2022
On Friday, 7 January 2022 at 10:07:02 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus wrote:Is that real? That gmail accounts don't work? That's so odd.Seems like an issue worth filing a bugzilla report about.I am not able to register for issues.dlang.org, does not not accept gmail accounts. I guess someone else has to file it.
Jan 07 2022
On Friday, 7 January 2022 at 10:07:02 UTC, Ola Fosheim Grøstad wrote:On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus wrote:I registered with a gmail account. What error do you see?Seems like an issue worth filing a bugzilla report about.I am not able to register for issues.dlang.org, does not not accept gmail accounts. I guess someone else has to file it.
Jan 07 2022
On Friday, 7 January 2022 at 10:45:39 UTC, Mike Parker wrote:On Friday, 7 January 2022 at 10:07:02 UTC, Ola Fosheim Grøstad wrote:«The e-mail address you entered (firstname.middlename.lastname gmail.com) didn't pass our syntax checking for a legal email address. A legal address must contain exactly one ' ', and at least one '.' after the . Currently, registering using Gmail addresses is not allowed due to spam. It also must not contain any illegal characters.»On Wednesday, 29 December 2021 at 21:18:30 UTC, Paul Backus wrote:I registered with a gmail account. What error do you see?Seems like an issue worth filing a bugzilla report about.I am not able to register for issues.dlang.org, does not not accept gmail accounts. I guess someone else has to file it.
Jan 07 2022
On 12/18/21 12:46 PM, Dennis wrote:The problem with `in` for slices to me is that it would be inconsistent with how it works for AAs, where `in` searches for a *key* and returns a pointer to a *value*.That. Ali
Dec 19 2021
On Saturday, 18 December 2021 at 20:46:06 UTC, Dennis wrote:I wouldn't mind giving `in` this meaning so you can do easy bounds checks in slices as well, but I think it will be to confusing for newcomers expecting `in` to work like in Python.But your proposal would kinda work like in Python? In [3]: 'a' in {'a':1, 'b':2} Out[3]: True In [4]: 'a' in ['a','b'] Out[4]: True
Dec 19 2021
On Sunday, 19 December 2021 at 17:11:23 UTC, Ola Fosheim Grøstad wrote:But your proposal would kinda work like in Python? In [3]: 'a' in {'a':1, 'b':2} Out[3]: True In [4]: 'a' in ['a','b'] Out[4]: TrueIn my proposal `'a' in ['a', 'b']` will return `null` because it looks for the key 'a', and since the ASCII value of 'a' is `0x61` and the array length 2, it will return `null`.
Dec 20 2021
On Monday, 20 December 2021 at 10:34:44 UTC, Dennis wrote:In my proposal `'a' in ['a', 'b']` will return `null` because it looks for the key 'a', and since the ASCII value of 'a' is `0x61` and the array length 2, it will return `null`.Oh yeah, I'm sorry, you looked at the index as a key. I'm not sure how I managed to mix that up… But yeah, just doing what Python does, but with a pointer to a value or null instead of boolean, makes sense in D.
Dec 20 2021
On 12/18/21 3:46 PM, Dennis wrote:On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:Well, yeah. But that doesn't mean it happens often or even regularly. You need to have a bad `toHash` function. We shouldn't lower the bar for `in` just because it's possible to make it perform poorly.The `in` operator on AAs has a speed of O(log n) or O(1). `in` on slices would exceed that being O(n).Who do people keep repeating that? - Worst case time complexity of the `in` operator in AA's is actually O(n). This can easily happen in practice when your `toHash` has a poor implementation, which I have had before- In custom types you can overload the `in` operator to something with arbitrarily bad time complexity.There's your answer -- make a custom type that allows `in` with an array-like structure. Or make a UFCS function.- Despite scaling with O(n), a linear scan is pretty fast. You can probably scan your entire RAM in a matter of seconds.Complexity is not some old-fashioned idea that has no relevance on today's hardware. Linear searching is a real bottleneck, and identifying `in` to be sub-linear is a useful design decision. If you are searching your entire memory by using an `opCmp`, I bet it will be more than a few seconds.The whole "it ruins performance guarantees" so far seems completely hypothetical and is not something that came from experience with actual code.The design is sound. You use `in`, you get sub-linear (i.e. fast) performance. That kind of expectation is useful from a generic point of view. It's similar to opIndex for a linked list -- it's possible, it makes it so you can used linked lists in places that might expect arrays, but generic code that uses it might perform really badly as the code expects opIndex to be fast.The problem with `in` for slices to me is that it would be inconsistent with how it works for AAs, where `in` searches for a *key* and returns a pointer to a *value*. The same for slices would look like this:[snip]But people suggest `in` to search a *value* in a slice.Yes, this is a good point against. But the thing is, we also have `find`, which works fine, I'm not sure why we need `in` support.I've once overloaded the `in` operator for a custom 2D array type used to represent a maze in which path finding was done. There it was useful for bounds checking since writing comparisons with `width` and `height` all the time is cumbersome. I wouldn't mind giving `in` this meaning so you can do easy bounds checks in slices as well, but I think it will be to confusing for newcomers expecting `in` to work like in Python.Yes, that's not a terrible use case for it. But you are right that `in` will be confusing to newcomers, especially when you learn that an array is a collection of data. `if(x in arr)` doesn't seem to suggest that it will be looking at indexes. -Steve
Dec 19 2021
On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer wrote:There's your answer -- make a custom type that allows `in` with an array-like structure.For the record, I'm not arguing in favor of `in` for slices. I'm just baffled that whenever this comes up, the first thing people bring up is the 'time complexity' argument.That kind of expectation is useful from a generic point of view.And this is my biggest problem with that argument: it's about this mystical function that takes a generic data structure and uses the `in` operator multiple times on it, which can now suddenly be called on an array, ruining performance to the surprise of the programmer. What is this function? Does it exist in Phobos currently?
Dec 20 2021
On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer wrote:And how come it cannot store the result of the "in" operation in a local variable for re-use? And how exactly is calling an oh-so-efficient sub-linear O(log N) function repeatedly when you can save the result in a local *not* a problem?There's your answer -- make a custom type that allows `in` with an array-like structure.For the record, I'm not arguing in favor of `in` for slices. I'm just baffled that whenever this comes up, the first thing people bring up is the 'time complexity' argument.That kind of expectation is useful from a generic point of view.And this is my biggest problem with that argument: it's about this mystical function that takes a generic data structure and uses the `in` operator multiple times on it, which can now suddenly be called on an array, ruining performance to the surprise of the programmer. What is this function? Does it exist in Phobos currently?
Dec 20 2021
On Monday, 20 December 2021 at 11:28:52 UTC, Araq wrote:On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:bool isSubset(T1, T2)(T1 needles, T2 haystack) { foreach (needle; needles) { if (!(needle in haystack)) return false; } return true; } If 'in' is O(1), that's a fairly sensible implementation. If it's O(N), it's not. Since the lookup uses different arguments each time, local variables can't help you. Note that, of course, there's no problem if `x in arr` is a simple bounds check - if your bounds check on an array is O(N), you're doing it wrong. -- SimenAnd this is my biggest problem with that argument: it's about this mystical function that takes a generic data structure and uses the `in` operator multiple times on it, which can now suddenly be called on an array, ruining performance to the surprise of the programmer. What is this function? Does it exist in Phobos currently?And how come it cannot store the result of the "in" operation in a local variable for re-use? And how exactly is calling an oh-so-efficient sub-linear O(log N) function repeatedly when you can save the result in a local *not* a problem?
Dec 20 2021
On Monday, 20 December 2021 at 13:07:40 UTC, Biotronic wrote:If 'in' is O(1), that's a fairly sensible implementation.Thanks, that's an illuminating example.
Dec 20 2021
On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer wrote:Yes, I agree with this. When I use "in" in Python I am very much aware that it does a linear scan, but it is a very useful shorthand that can replaces a long and confusing sequence of or-expressions. E.g. ``` if keyword in ['if', 'while', 'for] ``` is a very useful and clear syntactical construct.There's your answer -- make a custom type that allows `in` with an array-like structure.For the record, I'm not arguing in favor of `in` for slices. I'm just baffled that whenever this comes up, the first thing people bring up is the 'time complexity' argument.
Dec 20 2021
On Monday, 20 December 2021 at 13:54:56 UTC, Ola Fosheim Grøstad wrote:On Monday, 20 December 2021 at 11:04:00 UTC, Dennis wrote:```d import std.algorithm.searching; if (["if", "while", "for"].canFind(keyword)) { // ... } ```On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer wrote:Yes, I agree with this. When I use "in" in Python I am very much aware that it does a linear scan, but it is a very useful shorthand that can replaces a long and confusing sequence of or-expressions. E.g. ``` if keyword in ['if', 'while', 'for] ``` is a very useful and clear syntactical construct.There's your answer -- make a custom type that allows `in` with an array-like structure.For the record, I'm not arguing in favor of `in` for slices. I'm just baffled that whenever this comes up, the first thing people bring up is the 'time complexity' argument.
Dec 20 2021
On Monday, 20 December 2021 at 14:21:08 UTC, Paul Backus wrote:On Monday, 20 December 2021 at 13:54:56 UTC, Ola Fosheim Grøstad wrote:TBH, for that particular case, this looks nicer, and IMO documents the intent better than the original Pythonism: ```d import std.algorithm.comparison : among; if (keyword.among("if", "while", "for")) { /* ... */ } ``` It also doesn't let work go to waste, as it returns a 1-based index.E.g. ``` if keyword in ['if', 'while', 'for] ``` is a very useful and clear syntactical construct.```d import std.algorithm.searching; if (["if", "while", "for"].canFind(keyword)) { // ... } ```
Dec 20 2021
On Monday, 20 December 2021 at 14:21:08 UTC, Paul Backus wrote:```d import std.algorithm.searching; if (["if", "while", "for"].canFind(keyword)) { // ... } ```You can always create a library construct, but then you should also argue in favour of removing AAs and "in". You either do it, or you don't. Why the half-measures?
Dec 20 2021
On Monday, 20 December 2021 at 21:11:28 UTC, Ola Fosheim Grøstad wrote:On Monday, 20 December 2021 at 14:21:08 UTC, Paul Backus wrote:There are a lot of features one could potentially add to D to improve it, and limited resources to devote to doing so. I'm not necessarily *against* adding `in` for slices, but since we already have a perfectly good library solution, I don't think it deserves a high priority. Feel free to write the DIP yourself if you disagree. :)```d import std.algorithm.searching; if (["if", "while", "for"].canFind(keyword)) { // ... } ```You can always create a library construct, but then you should also argue in favour of removing AAs and "in". You either do it, or you don't. Why the half-measures?
Dec 20 2021
On Monday, 20 December 2021 at 21:11:28 UTC, Ola Fosheim Grøstad wrote:You can always create a library construct, but then you should also argue in favour of removing AAs and "in". You either do it, or you don't. Why the half-measures?No, `canFind` and `among` search for a given value in a range. `in` checks if a given key exists and returns a pointer to the associated data (not a pointer to the matching key). This is why Phobos SortedRange does not implement opIn even though it would be sub-linear. Dennis is right that conceptually a key for an array is an index. An array element value is the data associated with an index.
Dec 21 2021
On Tuesday, 21 December 2021 at 17:06:04 UTC, Nick Treleaven wrote:Dennis is right that conceptually a key for an array is an index. An array element value is the data associated with an index.What it is conceptually, depends on how you conceptualize it. The most common conceptualization of "in" is as a set operation and you cannot represent a set with array indexes.
Dec 22 2021
On Wednesday, 22 December 2021 at 12:43:26 UTC, Ola Fosheim Grøstad wrote:What it is conceptually, depends on how you conceptualize it. The most common conceptualization of "in" is as a set operation and you cannot represent a set with array indexes.`in` is defined for RedBlackTree, but it returns bool: https://dlang.org/phobos/std_container_rbtree.html#.RedBlackTree.opBinaryRight So aside from the complexity argument, `element in arr` could be defined if it returned bool, not a pointer (in order to be consistent with AAs not returning a pointer to the *key*). But due to time complexity it makes more sense to define `index in arr`.
Jan 01 2022
On Saturday, 1 January 2022 at 16:40:02 UTC, Nick Treleaven wrote:On Wednesday, 22 December 2021 at 12:43:26 UTC, Ola Fosheim Grøstad wrote:A redblacktree is not a sequence, though; its elements are not keyed in any way.What it is conceptually, depends on how you conceptualize it. The most common conceptualization of "in" is as a set operation and you cannot represent a set with array indexes.`in` is defined for RedBlackTree, but it returns bool: https://dlang.org/phobos/std_container_rbtree.html#.RedBlackTree.opBinaryRight
Jan 01 2022
On Sunday, 2 January 2022 at 02:33:22 UTC, Elronnd wrote:A redblacktree is not a sequence, though; its elements are not keyed in any way.Right, "in" is a set operator. You can implement a set as a dict with null-values, then you can think of the keys as values. If you think of the ADT as the set-concept then what is a key and what is a value is a matter of interpretation. If you think of a binary tree (or rbtree) as a set, then the distinction between key and value becomes blurred.
Jan 02 2022
On 12/20/21 6:04 AM, Dennis wrote:On Sunday, 19 December 2021 at 18:31:46 UTC, Steven Schveighoffer wrote:One thing not really being discussed is that there is a difference between "some library" defining slow `in` operators, or slow `opIndex`, and dlang/Phobos doing it. D picked the path of trying to ensure complexity consistency for `in`, but it's more of a stylistic rule, not necessarily a requirement. -SteveThere's your answer -- make a custom type that allows `in` with an array-like structure.For the record, I'm not arguing in favor of `in` for slices. I'm just baffled that whenever this comes up, the first thing people bring up is the 'time complexity' argument.
Dec 21 2021
On Tuesday, 21 December 2021 at 17:00:49 UTC, Steven Schveighoffer wrote:One thing not really being discussed is that there is a difference between "some library" defining slow `in` operators, or slow `opIndex`, and dlang/Phobos doing it. D picked the path of trying to ensure complexity consistency for `in`, but it's more of a stylistic rule, not necessarily a requirement.Then maybe we in should be implemented; Have it check if you have it sorted (*unless you do assumesorted template*). If it is sorted naturally use BinarySearch, and if not it would probably do a linear search **BUT** give a warning message and file/line number so it can be fixed/traced? (*Or just make it have to be Binary Search and asserts out if it isn't sorted*) As for bool vs pointer return... probably return a pointer as that can easily be tested as bool for no extra cost.
Jan 02 2022
On Monday, 3 January 2022 at 00:15:53 UTC, Era Scarecrow wrote:Then maybe we in should be implemented; Have it check if you have it sorted (*unless you do assumesorted template*). If it is sorted naturally use BinarySearch, and if not it would probably do a linear search **BUT** give a warning message and file/line number so it can be fixed/traced? (*Or just make it have to be Binary Search and asserts out if it isn't sorted*) As for bool vs pointer return... probably return a pointer as that can easily be tested as bool for no extra cost.It would make sense to implement `in` for SortedRange, but there is no use case for returning a pointer. You already have the value on the left hand side, the only reason for a pointer would be to mutate the element in the range. Mutating that would potentially stop the range being in order, violating the point of SortedRange.
Jan 04 2022
On Tuesday, 4 January 2022 at 18:23:33 UTC, Nick Treleaven wrote:It would make sense to implement `in` for SortedRange, but there is no use case for returning a pointer. You already have the value on the left hand side, the only reason for a pointer would be to mutate the element in the range.Sorry, i was somehow thinking there was a key/value pair in this and not just a normal sorted block. But yes there may still be use cases. Consider i do opCmp on a struct, and it only does comparisons on say a name portion (*Or UPC or some other immutable property ID*), but no other part of the stored data. The name/ID would be how it's sorted; But any attached data which is non-sorted would be inaccessible otherwise. This could be phone number, address, grades, just about anything (*and those items could be changed without changing the sorting order*) Though you can probably do 2 steps it seems like a bit of extra pointless work to me. Or maybe AA's are a better choice in that case. I'm not sure.
Jan 05 2022
On Wednesday, 5 January 2022 at 23:16:48 UTC, Era Scarecrow wrote:Consider i do opCmp on a struct, and it only does comparisons on say a name portionOk, you're right. The search key then wouldn't need to contain the other data which is not used for comparisons.But any attached data which is non-sorted would be inaccessible otherwise. This could be phone number, address, grades, just about anything (*and those items could be changed without changing the sorting order*)True, but if SortedRange exposed a pointer to the matching element, it should probably be const so the data that *is* compared is not changed. (It could perhaps alternatively return a one-based index instead, which would convert to bool as expected).
Jan 07 2022
On Saturday, 18 December 2021 at 18:47:33 UTC, Adam Ruppe wrote:I'm not sure the speed rule is written or not, but generally `in` is supposed to have the same speed requirement as `[n]` which is sub-linear.It's in [std.container](https://dlang.org/phobos/std_container.html), bottom of the page. I [remember a discussion on complexity](https://forum.dlang.org/post/n3qtgq$2frv$1 digitalmars.com) that prompted Andrei to create a library for marking functions with their complexity. I'm not sure if this was the original source for the complexities given in std.container. My brain is also telling me the values were sourced from Alexander Stepanov's [Elements of Programming](http://elementsofprogramming.com/), but I can't find a source for that, and I'm too lazy to look very deep. -- Biotronic
Dec 19 2021