digitalmars.D - chunkBy implementation

Steven Schveighoffer (34/34) Sep 29 2020 Here is the chunkBy implementation for an array:

H. S. Teoh (26/32) Sep 29 2020 I was the one who wrote the original (non-RefCounted) implementation of

Steven Schveighoffer (10/26) Sep 29 2020 Hm... well, what about eagerly deciding when to stop, and just returning...

Steven Schveighoffer <schveiguy gmail.com> writes:

Here is the chunkBy implementation for an array: 
https://github.com/dlang/phobos/blob/4d7e5829cb60946fefad84d066de948e813eb6f5/std/algorithm/iteration.d#L2057

Complete with RefCounted allocations and everything. The whole works! 
Warning, do not pass this into std.array.array, it really isn't what you 
want to do.

Here's a simple range that slices an array according to a predicate:

struct SliceBy(alias pred, R)
{
     R remaining;
     size_t nextIdx;
     auto front() { return remaining[0 .. nextIdx]; }
     void popFront() {
         remaining = remaining[nextIdx .. $];
         if(remaining.length == 0)
             return;
         nextIdx = 1;
         while(nextIdx < remaining.length && pred(remaining[nextIdx - 
1], remaining[nextIdx]))
             ++nextIdx;
     }
     bool empty() { return remaining.empty; }
}

No allocations, no RefCounted. The range elements have the same 
properties as the original (random access, etc.). I can squash this into 
std.array.array and it works as expected (i.e. I get an R[]). I can add 
forward and bidirectional range primitives if I wanted to (and all the 
other machinery that would be required to get it phobos-ready), but I 
don't need all that for my purposes, I just need a foreachable thingy.

But why is this not something that is already in there? Is there a 
reason we want to call the predicate all over again if you process a 
group multiple times? Is there a reason we aren't taking advantage of 
slicing when available?

Just trying to figure out the reasoning behind the complexity.

-Steve

Sep 29 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Sep 29, 2020 at 05:33:35PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
 Here is the chunkBy implementation for an array:
https://github.com/dlang/phobos/blob/4d7e5829cb60946fefad84d066de948e813eb6f5/std/algorithm/iteration.d#L2057
 
 Complete with RefCounted allocations and everything. The whole works!

I was the one who wrote the original (non-RefCounted) implementation of
chunkBy. During PR review, however, we ran into the case of how the
resulting range iterates the original range twice: once in the subranges
as you .popFront through them, and once in the parent range's .popFront
(which needs to find the end of the subrange).

To eliminate this redundancy, Andrei came up with the mothership idea:
the subrange would keep a reference to the parent range, so that if the
subrange has already been consumed, the parent range can just pick up
where it left off instead of starting from the beginning of the subrange
and scan until the next subrange.

Keeping a reference to the parent range, however, opens a whole can o'
worms about lifetime and all that jazz, so in the end, RefCounted was
brought into the mix, to solve what's arguably a case of premature
optimization.


[...]
 Is there a reason we want to call the predicate all over again if you
 process a group multiple times? Is there a reason we aren't taking
 advantage of slicing when available?

I think I was so burned out by the time we arrived at the RefCounted
implementation, I just had no more strength or interest left to think
about the case when the incoming range has slicing!  It's probably a
case of an over-engineered solution displacing all other locally more
optimal solutions -- since the RefCounted monster already handles all
cases, nobody wants to introduce another case into the code that only
handles a subset of cases.


T

-- 
You are only young once, but you can stay immature indefinitely. -- azephrahel

Sep 29 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 9/29/20 6:08 PM, H. S. Teoh wrote:
On Tue, Sep 29, 2020 at 05:33:35PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
Here is the chunkBy implementation for an array:
https://github.com/dlang/phobos/blob/4d7e5829cb60946fefad84d066de948e813eb6f5/std/algorithm/iteration.d#L2057

Complete with RefCounted allocations and everything. The whole works!

I was the one who wrote the original (non-RefCounted) implementation of
chunkBy. During PR review, however, we ran into the case of how the
resulting range iterates the original range twice: once in the subranges
as you .popFront through them, and once in the parent range's .popFront
(which needs to find the end of the subrange).

To eliminate this redundancy, Andrei came up with the mothership idea:
the subrange would keep a reference to the parent range, so that if the
subrange has already been consumed, the parent range can just pick up
where it left off instead of starting from the beginning of the subrange
and scan until the next subrange.

Hm... well, what about eagerly deciding when to stop, and just returning
a Take range with the right number of elements? Then you aren't running
the predicates more than once per element.

In any case, I've solved my local problem, and I wanted to ask about it
here, since it seemed so out of left field to have this bizarre
"mothership" mechanism (I haven't seen that on any other ranges).

If something like my quick-and-dirty version was added, I'm assuming
maybe we'd have to name it something else.

-Steve

Sep 29 2020

D Programming

C/C++ Programming

Other

digitalmars.D - chunkBy implementation