www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Why does `filterBidirectional` exist, and why isn't it called

reply FeepingCreature <feepingcreature gmail.com> writes:
Yes I know the stated reason, to save time initializing the range 
in `filter`.

You know what I did because we didn't know `filterBidirectional` 
existed?

I literally *walked the entire range* returned by `filter` to 
find the last matching element.

`filter` should expose `back`. By all means, let `back` lazily 
initialize, but I don't understand how `filterBidirectional` was 
ever considered acceptable. Since when do we sacrifice ease of 
use for performance? I mean, since 2011 apparently. - This is bad 
ergonomics, bad discoverability and bad API design. 
`filterBidirectional` delenda est.
Mar 09 2023
next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/9/23 3:06 AM, FeepingCreature wrote:
 Yes I know the stated reason, to save time initializing the range in 
 `filter`.
 
 You know what I did because we didn't know `filterBidirectional` existed?
 
 I literally *walked the entire range* returned by `filter` to find the 
 last matching element.
Why not `rng.retro.filter`?
 `filter` should expose `back`. By all means, let `back` lazily 
 initialize, but I don't understand how `filterBidirectional` was ever 
 considered acceptable. Since when do we sacrifice ease of use for 
 performance? I mean, since 2011 apparently. - This is bad ergonomics, 
 bad discoverability and bad API design. `filterBidirectional` delenda est.
This has been a constant debate -- should ranges be allowed to lazily initialize on the first call to front (or back)? I'd say no. back/front shouldn't be modifying operations. I dislike the idea of storing a "was initialized" flag and having to check that on every call. That being said, there's no formal requirement for it. It's just a convention I think Phobos should stick to for its included ranges. -Steve
Mar 09 2023
next sibling parent reply FeepingCreature <feepingcreature gmail.com> writes:
On Thursday, 9 March 2023 at 13:16:07 UTC, Steven Schveighoffer 
wrote:
 On 3/9/23 3:06 AM, FeepingCreature wrote:
 Yes I know the stated reason, to save time initializing the 
 range in `filter`.
 
 You know what I did because we didn't know 
 `filterBidirectional` existed?
 
 I literally *walked the entire range* returned by `filter` to 
 find the last matching element.
Why not `rng.retro.filter`?
`retro` changes the semantics of the range, which often represents business data. Thinking about the underlying model is complicated enough without throwing in "But consider: what if the elements were in reverse order".
 `filter` should expose `back`. By all means, let `back` lazily 
 initialize, but I don't understand how `filterBidirectional` 
 was ever considered acceptable. Since when do we sacrifice 
 ease of use for performance? I mean, since 2011 apparently. - 
 This is bad ergonomics, bad discoverability and bad API 
 design. `filterBidirectional` delenda est.
This has been a constant debate -- should ranges be allowed to lazily initialize on the first call to front (or back)? I'd say no. back/front shouldn't be modifying operations. I dislike the idea of storing a "was initialized" flag and having to check that on every call. That being said, there's no formal requirement for it. It's just a convention I think Phobos should stick to for its included ranges. -Steve
Sure: so precompute both `back` and `front`, every `filter` call. No caching. This isn't done right now because the language presumes, arbitrarily, that you won't care about `back`. That's what I meant by sacrificing convenience for performance. Just do what `filterBidirectional` does, every time, and have the current `filter` be the odd one out. `filterForward` or whatever. (I think the best way to go, in truth, is to *preinitialize front* and *cache back*. But that solution, much as I think it is optimal for all involved interests, is very hard to love.) That said, I think the range API has a fundamental problem here. `filter` pre-initializes `front` on every access, because it reasonably presumes that we will then want to query it. However, it cannot rely on this. If we were *guaranteed* that every access to `popFront` would alternate with at least one access to `front`, and conversely `back` with `popBack`, we could just reduce `popBack` to `nextRange.popBack` and do the rest of the popping in `back`! Then in the fast case (one access to each, in turn) we would not waste any check. And the actual `filter` call would always be constant-time.
Mar 09 2023
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/9/23 11:44 AM, FeepingCreature wrote:
 On Thursday, 9 March 2023 at 13:16:07 UTC, Steven Schveighoffer wrote:
 On 3/9/23 3:06 AM, FeepingCreature wrote:
 Yes I know the stated reason, to save time initializing the range in 
 `filter`.

 You know what I did because we didn't know `filterBidirectional` 
 existed?

 I literally *walked the entire range* returned by `filter` to find 
 the last matching element.
Why not `rng.retro.filter`?
`retro` changes the semantics of the range, which often represents business data. Thinking about the underlying model is complicated enough without throwing in "But consider: what if the elements were in reverse order".
Well, walking the entire range to find the last one doesn't imply that ordering matters at all. You are just looking for the last one.
 `filter` should expose `back`. By all means, let `back` lazily 
 initialize, but I don't understand how `filterBidirectional` was ever 
 considered acceptable. Since when do we sacrifice ease of use for 
 performance? I mean, since 2011 apparently. - This is bad ergonomics, 
 bad discoverability and bad API design. `filterBidirectional` delenda 
 est.
This has been a constant debate -- should ranges be allowed to lazily initialize on the first call to front (or back)? I'd say no. back/front shouldn't be modifying operations. I dislike the idea of storing a "was initialized" flag and having to check that on every call. That being said, there's no formal requirement for it. It's just a convention I think Phobos should stick to for its included ranges.
Sure: so precompute both `back` and `front`, every `filter` call. No caching. This isn't done right now because the language presumes, arbitrarily, that you won't care about `back`. That's what I meant by sacrificing convenience for performance. Just do what `filterBidirectional` does, every time, and have the current `filter` be the odd one out. `filterForward` or whatever.
I would be fine with that. But in truth, this is simply an acceptance of the status quo, just with different names.
 (I think the best way to go, in truth, is to *preinitialize front* and 
 *cache back*. But that solution, much as I think it is optimal for all 
 involved interests, is very hard to love.)
I prefer to have control over all the capabilities. If there is going to be significant cost for something I won't actually use, I'd rather avoid it. Allowing composability might be the way to go.
 That said, I think the range API has a fundamental problem here. 
 `filter` pre-initializes `front` on every access, because it reasonably 
 presumes that we will then want to query it. However, it cannot rely on 
 this. If we were *guaranteed* that every access to `popFront` would 
 alternate with at least one access to `front`, and conversely `back` 
 with `popBack`, we could just reduce `popBack` to `nextRange.popBack` 
 and do the rest of the popping in `back`! Then in the fast case (one 
 access to each, in turn) we would not waste any check. And the actual 
 `filter` call would always be constant-time.
 
`front` and `empty` are semantically non-mutating properties. Calling `front` and/or `empty` as many times as you want should not alter any values until you call `popFront`. I prefer them to be *actually* non-mutating as well. Delaying construction seems like the better option, and having such a wrapper solves all those problems for those who want lazy, without imposing on users who are intending to just use a range as it's intended. -Steve
Mar 09 2023
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Thursday, 9 March 2023 at 13:16:07 UTC, Steven Schveighoffer 
wrote:
 This has been a constant debate -- should ranges be allowed to 
 lazily initialize on the first call to front (or back)?

 I'd say no. back/front shouldn't be modifying operations. I 
 dislike the idea of storing a "was initialized" flag and having 
 to check that on every call.

 That being said, there's no formal requirement for it. It's 
 just a convention I think Phobos should stick to for its 
 included ranges.
I think probably you have to go on a case-by-case basis. `File.byLine`, for example, does not and should not precompute `front`, because doing so is potentially very expensive. OTOH, `iota` does and should precompute `front`, because doing so is essentially free and makes the implementation much simpler. In general, I think we should err on the side of laziness, because it makes range composition easier. For example, when the user writes something like `chain(a(), b(), c())` or `choose(cond, a(), b())`, they probably do not want to spend unnecessary cycles precomputing `b().front` and `c().front`.
Mar 09 2023
next sibling parent Monkyyy <crazymonkyyy gmail.com> writes:
On Thursday, 9 March 2023 at 17:07:40 UTC, Paul Backus wrote:
 On Thursday, 9 March 2023 at 13:16:07 UTC, Steven Schveighoffer 
 wrote:
 [...]
I think probably you have to go on a case-by-case basis. `File.byLine`, for example, does not and should not precompute `front`, because doing so is potentially very expensive. OTOH, `iota` does and should precompute `front`, because doing so is essentially free and makes the implementation much simpler. In general, I think we should err on the side of laziness, because it makes range composition easier. For example, when the user writes something like `chain(a(), b(), c())` or `choose(cond, a(), b())`, they probably do not want to spend unnecessary cycles precomputing `b().front` and `c().front`.
In general if someone would want to precompute a range shouldnt front just be a varible?
Mar 09 2023
prev sibling next sibling parent jmh530 <john.michael.hall gmail.com> writes:
On Thursday, 9 March 2023 at 17:07:40 UTC, Paul Backus wrote:
 [snip]

 I think probably you have to go on a case-by-case basis. 
 `File.byLine`, for example, does not and should not precompute 
 `front`, because doing so is potentially very expensive. OTOH, 
 `iota` does and should precompute `front`, because doing so is 
 essentially free and makes the implementation much simpler.

 In general, I think we should err on the side of laziness, 
 because it makes range composition easier. For example, when 
 the user writes something like `chain(a(), b(), c())` or 
 `choose(cond, a(), b())`, they probably do not want to spend 
 unnecessary cycles precomputing `b().front` and `c().front`.
Or, provide the option to allow the user to make the choice at compile-time.
Mar 09 2023
prev sibling next sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/9/23 12:07 PM, Paul Backus wrote:
 On Thursday, 9 March 2023 at 13:16:07 UTC, Steven Schveighoffer wrote:
 This has been a constant debate -- should ranges be allowed to lazily 
 initialize on the first call to front (or back)?

 I'd say no. back/front shouldn't be modifying operations. I dislike 
 the idea of storing a "was initialized" flag and having to check that 
 on every call.

 That being said, there's no formal requirement for it. It's just a 
 convention I think Phobos should stick to for its included ranges.
I think probably you have to go on a case-by-case basis. `File.byLine`, for example, does not and should not precompute `front`, because doing so is potentially very expensive. OTOH, `iota` does and should precompute `front`, because doing so is essentially free and makes the implementation much simpler.
I disagree. If you construct a `byLine` you intend to iterate by lines. Fetching the first line is reasonable and expected (by me anyway). The big issue with `byLine`'s iteration mechanism is whether `popFront` should consume the next line or not. But that is a semantic expectation that people can differ on. Again, as I said elsewhere, using wrappers to compose the desired behavior should be what we use, and all ranges should just always have the same expected semantics. -Steve
Mar 09 2023
parent reply Paul Backus <snarwin gmail.com> writes:
On Thursday, 9 March 2023 at 19:48:21 UTC, Steven Schveighoffer 
wrote:
 On 3/9/23 12:07 PM, Paul Backus wrote:
 I think probably you have to go on a case-by-case basis. 
 `File.byLine`, for example, does not and should not precompute 
 `front`, because doing so is potentially very expensive. OTOH, 
 `iota` does and should precompute `front`, because doing so is 
 essentially free and makes the implementation much simpler.
I disagree. If you construct a `byLine` you intend to iterate by lines.
I gave examples in my post of situations where the user may construct a range before deciding whether or not to iterate it. If you find those examples unconvincing, I'd be happy to hear why.
Mar 09 2023
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 3/9/23 4:16 PM, Paul Backus wrote:
 On Thursday, 9 March 2023 at 19:48:21 UTC, Steven Schveighoffer wrote:
 On 3/9/23 12:07 PM, Paul Backus wrote:
 I think probably you have to go on a case-by-case basis. 
 `File.byLine`, for example, does not and should not precompute 
 `front`, because doing so is potentially very expensive. OTOH, `iota` 
 does and should precompute `front`, because doing so is essentially 
 free and makes the implementation much simpler.
I disagree. If you construct a `byLine` you intend to iterate by lines.
I gave examples in my post of situations where the user may construct a range before deciding whether or not to iterate it. If you find those examples unconvincing, I'd be happy to hear why.
`choose(cond, a(), b())` could evaluate the parameters lazily, and then it doesn't matter if a() or b() precomputes. In fact, the implementation ignores one of them anyway. For chain, it's more complex, because you don't know when the initialization will need to happen. But I'm also fine with that consequence in exchange for simplification of expectations. Let's say it's `chain(File("foo.txt").byLine, File("bar.txt").byLine)`. It's going to be expensive to open the file, but not really any more expensive to read the first data into the buffer (the disk cache will have it ready for you). So you don't save anything significant. However, you lose on the congnitive load for *all* uses of byLine, and indeed all ranges now have a semantic parameter that you have to either "just know" or read the docs/implementation. We already have very complex situations that happen because of the interactions between range initialization and iteration. Most of them happen when you expect lazy or eager, and the other is chosen. I think we should just pick one, and say, it's always that way, and if you want something different, you need wrappers. -Steve
Mar 09 2023
prev sibling parent reply Ogi <ogion.art gmail.com> writes:
On Thursday, 9 March 2023 at 17:07:40 UTC, Paul Backus wrote:
 I think probably you have to go on a case-by-case basis. 
 `File.byLine`, for example, does not and should not precompute 
 `front`, because doing so is potentially very expensive.
And then ‘File.byLine’ ends up being const and now you can’t call ‘front’ on it. Ouch.
Mar 10 2023
parent reply Paul Backus <snarwin gmail.com> writes:
On Friday, 10 March 2023 at 13:46:03 UTC, Ogi wrote:
 On Thursday, 9 March 2023 at 17:07:40 UTC, Paul Backus wrote:
 I think probably you have to go on a case-by-case basis. 
 `File.byLine`, for example, does not and should not precompute 
 `front`, because doing so is potentially very expensive.
And then ‘File.byLine’ ends up being const and now you can’t call ‘front’ on it. Ouch.
Ranges already don't work with `const`, so the fact that this one fails to work with `const` a little more than usual is not a huge deal, IMO.
Mar 10 2023
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 3/10/23 23:47, Paul Backus wrote:
 On Friday, 10 March 2023 at 13:46:03 UTC, Ogi wrote:
 On Thursday, 9 March 2023 at 17:07:40 UTC, Paul Backus wrote:
 I think probably you have to go on a case-by-case basis. 
 `File.byLine`, for example, does not and should not precompute 
 `front`, because doing so is potentially very expensive.
And then ‘File.byLine’ ends up being const and now you can’t call ‘front’ on it. Ouch.
Ranges already don't work with `const`, so the fact that this one fails to work with `const` a little more than usual is not a huge deal, IMO.
`const` tends to just not work really well together with abstraction. "`const` correctness" in D is a fool's errand.
Mar 11 2023
parent reply Tejas <notrealemail gmail.com> writes:
On Saturday, 11 March 2023 at 19:52:06 UTC, Timon Gehr wrote:
 On 3/10/23 23:47, Paul Backus wrote:
 On Friday, 10 March 2023 at 13:46:03 UTC, Ogi wrote:
 On Thursday, 9 March 2023 at 17:07:40 UTC, Paul Backus wrote:
 I think probably you have to go on a case-by-case basis. 
 `File.byLine`, for example, does not and should not 
 precompute `front`, because doing so is potentially very 
 expensive.
And then ‘File.byLine’ ends up being const and now you can’t call ‘front’ on it. Ouch.
Ranges already don't work with `const`, so the fact that this one fails to work with `const` a little more than usual is not a huge deal, IMO.
`const` tends to just not work really well together with abstraction. "`const` correctness" in D is a fool's errand.
Should we just remove `const` then and get it over with?
Mar 12 2023
parent Timon Gehr <timon.gehr gmx.ch> writes:
On 3/12/23 11:06, Tejas wrote:
 On Saturday, 11 March 2023 at 19:52:06 UTC, Timon Gehr wrote:
 On 3/10/23 23:47, Paul Backus wrote:
 On Friday, 10 March 2023 at 13:46:03 UTC, Ogi wrote:
 On Thursday, 9 March 2023 at 17:07:40 UTC, Paul Backus wrote:
 I think probably you have to go on a case-by-case basis. 
 `File.byLine`, for example, does not and should not precompute 
 `front`, because doing so is potentially very expensive.
And then ‘File.byLine’ ends up being const and now you can’t call ‘front’ on it. Ouch.
Ranges already don't work with `const`, so the fact that this one fails to work with `const` a little more than usual is not a huge deal, IMO.
`const` tends to just not work really well together with abstraction. "`const` correctness" in D is a fool's errand.
Should we just remove `const` then and get it over with?
`const` is fine for POD. Things like `scope const(ubyte)[]` are useful. It's just not a good match for abstract data types.
Mar 12 2023
prev sibling next sibling parent JG <someone simewhere.com> writes:
On Thursday, 9 March 2023 at 08:06:02 UTC, FeepingCreature wrote:
 Yes I know the stated reason, to save time initializing the 
 range in `filter`.

 You know what I did because we didn't know 
 `filterBidirectional` existed?

 I literally *walked the entire range* returned by `filter` to 
 find the last matching element.

 `filter` should expose `back`. By all means, let `back` lazily 
 initialize, but I don't understand how `filterBidirectional` 
 was ever considered acceptable. Since when do we sacrifice ease 
 of use for performance? I mean, since 2011 apparently. - This 
 is bad ergonomics, bad discoverability and bad API design. 
 `filterBidirectional` delenda est.
I agree with you that it probably should be bidirectional by default. I don't agree with making back lazy (but that is perhaps a separate topic). It probably should have a compile time flag to enable bidirectional iteration (on by default).
Mar 12 2023
prev sibling parent reply Atila Neves <atila.neves gmail.com> writes:
On Thursday, 9 March 2023 at 08:06:02 UTC, FeepingCreature wrote:
 Yes I know the stated reason, to save time initializing the 
 range in `filter`.

 You know what I did because we didn't know 
 `filterBidirectional` existed?

 I literally *walked the entire range* returned by `filter` to 
 find the last matching element.

 `filter` should expose `back`. By all means, let `back` lazily 
 initialize, but I don't understand how `filterBidirectional` 
 was ever considered acceptable. Since when do we sacrifice ease 
 of use for performance? I mean, since 2011 apparently. - This 
 is bad ergonomics, bad discoverability and bad API design. 
 `filterBidirectional` delenda est.
I agree with the sentiment here, but I do wonder about unintended consequences of pre-computing `back` as well as `front`. Atila
Apr 14 2023
parent Lata chaudhary <dkgnews2023 gmail.com> writes:
filterBidirectional exists to provide a way to filter signals in 
both directions, which is necessary for specific applications. It 
is not simply called a filter because the standard filter 
function only filters in one direction, and does not provide this 
bidirectional capability.
Apr 18 2023