www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - chunkBy implementation

reply Steven Schveighoffer <schveiguy gmail.com> writes:
Here is the chunkBy implementation for an array: 
https://github.com/dlang/phobos/blob/4d7e5829cb60946fefad84d066de948e813eb6f5/std/algorithm/iteration.d#L2057

Complete with RefCounted allocations and everything. The whole works! 
Warning, do not pass this into std.array.array, it really isn't what you 
want to do.

Here's a simple range that slices an array according to a predicate:

struct SliceBy(alias pred, R)
{
     R remaining;
     size_t nextIdx;
     auto front() { return remaining[0 .. nextIdx]; }
     void popFront() {
         remaining = remaining[nextIdx .. $];
         if(remaining.length == 0)
             return;
         nextIdx = 1;
         while(nextIdx < remaining.length && pred(remaining[nextIdx - 
1], remaining[nextIdx]))
             ++nextIdx;
     }
     bool empty() { return remaining.empty; }
}

No allocations, no RefCounted. The range elements have the same 
properties as the original (random access, etc.). I can squash this into 
std.array.array and it works as expected (i.e. I get an R[]). I can add 
forward and bidirectional range primitives if I wanted to (and all the 
other machinery that would be required to get it phobos-ready), but I 
don't need all that for my purposes, I just need a foreachable thingy.

But why is this not something that is already in there? Is there a 
reason we want to call the predicate all over again if you process a 
group multiple times? Is there a reason we aren't taking advantage of 
slicing when available?

Just trying to figure out the reasoning behind the complexity.

-Steve
Sep 29 2020
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Sep 29, 2020 at 05:33:35PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 Here is the chunkBy implementation for an array:
https://github.com/dlang/phobos/blob/4d7e5829cb60946fefad84d066de948e813eb6f5/std/algorithm/iteration.d#L2057
 
 Complete with RefCounted allocations and everything. The whole works!
I was the one who wrote the original (non-RefCounted) implementation of chunkBy. During PR review, however, we ran into the case of how the resulting range iterates the original range twice: once in the subranges as you .popFront through them, and once in the parent range's .popFront (which needs to find the end of the subrange). To eliminate this redundancy, Andrei came up with the mothership idea: the subrange would keep a reference to the parent range, so that if the subrange has already been consumed, the parent range can just pick up where it left off instead of starting from the beginning of the subrange and scan until the next subrange. Keeping a reference to the parent range, however, opens a whole can o' worms about lifetime and all that jazz, so in the end, RefCounted was brought into the mix, to solve what's arguably a case of premature optimization. [...]
 Is there a reason we want to call the predicate all over again if you
 process a group multiple times? Is there a reason we aren't taking
 advantage of slicing when available?
I think I was so burned out by the time we arrived at the RefCounted implementation, I just had no more strength or interest left to think about the case when the incoming range has slicing! It's probably a case of an over-engineered solution displacing all other locally more optimal solutions -- since the RefCounted monster already handles all cases, nobody wants to introduce another case into the code that only handles a subset of cases. T -- You are only young once, but you can stay immature indefinitely. -- azephrahel
Sep 29 2020
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 9/29/20 6:08 PM, H. S. Teoh wrote:
 On Tue, Sep 29, 2020 at 05:33:35PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 Here is the chunkBy implementation for an array:
https://github.com/dlang/phobos/blob/4d7e5829cb60946fefad84d066de948e813eb6f5/std/algorithm/iteration.d#L2057

 Complete with RefCounted allocations and everything. The whole works!
I was the one who wrote the original (non-RefCounted) implementation of chunkBy. During PR review, however, we ran into the case of how the resulting range iterates the original range twice: once in the subranges as you .popFront through them, and once in the parent range's .popFront (which needs to find the end of the subrange). To eliminate this redundancy, Andrei came up with the mothership idea: the subrange would keep a reference to the parent range, so that if the subrange has already been consumed, the parent range can just pick up where it left off instead of starting from the beginning of the subrange and scan until the next subrange.
Hm... well, what about eagerly deciding when to stop, and just returning a Take range with the right number of elements? Then you aren't running the predicates more than once per element. In any case, I've solved my local problem, and I wanted to ask about it here, since it seemed so out of left field to have this bizarre "mothership" mechanism (I haven't seen that on any other ranges). If something like my quick-and-dirty version was added, I'm assuming maybe we'd have to name it something else. -Steve
Sep 29 2020