digitalmars.D.learn - Is there something like a consuming take?

berni (11/18) Jul 06 2019 I want to copy the first n items of a range to an array, removing

a11e99z (8/18) Jul 06 2019 sure

berni (8/15) Jul 06 2019 Doesn't look like what I'm looking for, as it is exactly the same

a11e99z (15/22) Jul 06 2019 auto take_consuming( R )( ref R r, int cnt ) {
berni (3/3) Jul 06 2019 Now it's getting weird. Meanwhile I encountered, that take()

berni (9/25) Jul 06 2019 take does not consume.

Adam D. Ruppe (16/22) Jul 06 2019 byChunk is defined to reuse its buffer between calls.

berni (10/15) Jul 06 2019 Thanks for clearifing what happens. In my oppinion the behaviour

Adam D. Ruppe (37/41) Jul 06 2019 It depends on what you're passing.
Jonathan M Davis (35/38) Jul 06 2019 take _always_ consumes the range that it's given. The problem is that so...

Adam D. Ruppe (4/5) Jul 06 2019 not if it hasSlicing. see

berni (38/38) Jul 06 2019 I start to understand: A (lazy) range is something like a

Jonathan M Davis (21/24) Jul 07 2019 Without slicing, that's impossible without iterating over the elements

berni (14/16) Jul 07 2019 That's what I thought too, but meanwhile I think it is possible.

Jonathan M Davis (35/51) Jul 07 2019 Having one range know about the other isn't enough. That just means that...

berni (14/56) Jul 09 2019 Sounds like you didn't understand what I meant. So I tried to

berni (7/7) Jul 09 2019 Hurray, it works! :-)

Jonathan M Davis (5/10) Jul 07 2019 True enough, though if the code is generic, it still pretty much has to

Jonathan M Davis (23/61) Jul 06 2019 Another thing to consider about lazy ranges that take pointers to avoid

berni <someone somewhere.com> writes:

I want to copy the first n items of a range to an array, removing 
these items from the range.

This works:

 foreach (i;0..n)
 {
    data ~= r.front;
    r.popFront();
 }

but looks a little bit arkward.

I came up with this now:

 data = r.take(n).array;

This works partly, because the values of r are not consumed. So I 
have to call afterwards:

 r = r.drop(n);

Now I wonder, if it is possible to do this with one single call, 
something like

data = r.take_consuming(n).array;

Does there something like this exist?

Jul 06 2019

a11e99z <black80 bk.ru> writes:

On Saturday, 6 July 2019 at 11:20:50 UTC, berni wrote:
 I want to copy the first n items of a range to an array, I came 
 up with this now:
 data = r.take(n).array;

 This works partly, because the values of r are not consumed. So 
 I have to call afterwards:
 r = r.drop(n);

 Now I wonder, if it is possible to do this with one single 
 call, something like
 data = r.take_consuming(n).array;
 Does there something like this exist?

sure
auto take_consuming( R )( ref R r, int cnt ) {
     auto tmp = r.take( cnt ).array;
     r = r.drop( cnt );
     return tmp;
}
don't thank

Jul 06 2019

berni <someone somewhere.com> writes:

On Saturday, 6 July 2019 at 11:48:51 UTC, a11e99z wrote:
 sure
 auto take_consuming( R )( ref R r, int cnt ) {
     auto tmp = r.take( cnt ).array;
     r = r.drop( cnt );
     return tmp;
 }
 don't thank

Doesn't look like what I'm looking for, as it is exactly the same 
I allready found.

Maybe I need to explain, what I dislike with this approach: 
take() calls popFront n times and drop() calls popFront another n 
times giving a total of 2n times (depending on the underlying 
range, this might cause lot's of calulcations be done twice. The 
first version with the foreach loop calls popFront only n times.

Jul 06 2019

a11e99z <black80 bk.ru> writes:

On Saturday, 6 July 2019 at 12:10:13 UTC, berni wrote:
 On Saturday, 6 July 2019 at 11:48:51 UTC, a11e99z wrote:

 Maybe I need to explain, what I dislike with this approach: 
 take() calls popFront n times and drop() calls popFront another 
 n times giving a total of 2n times (depending on the underlying 
 range, this might cause lot's of calulcations be done twice. 
 The first version with the foreach loop calls popFront only n 
 times.

auto take_consuming( R )( ref R r, int cnt ) {
     import std.range.primitives : hasSlicing;
     static if (hasSlicing!R) { // without allocations
     	auto tmp = r[0..cnt];
     	r = r[cnt..$]; // or r.popFronN( cnt ); // O(1)
         return tmp;
     } else { // loop range once
         auto tmp = uninitializedArray!( ElementType!R[])( cnt);
         int k = 0;
         for (; !r.empty && k<cnt; ++k, r.popFront) tmp[ k] = 
r.front;
     	return tmp[ 0..k];
     }
}

Jul 06 2019

berni <someone somewhere.com> writes:

Now it's getting weird. Meanwhile I encountered, that take() 
sometimes consumes and sometimes not. Where can I learn, what is 
the reason behind this behavior? And how can I handle this?

Jul 06 2019

berni <someone somewhere.com> writes:

A small example showing this strange behaviour:

import std.stdio;
import std.algorithm.iteration;
import std.range;

enum BUFFER_SIZE = 1024;

void main(string[] args)
{
    auto a = (new File(args[1]))
        .byChunk(BUFFER_SIZE)
        .joiner;

    writeln(a.take(5));
    writeln(a);
}

Using a file, containing the bytes 1 to 10 I get:

[ 1, 2, 3, 4, 5 ]
[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10 ]

take does not consume.

When I now change BUFFER_SIZE to 2 I get:

[ 1, 2, 3, 4, 5 ]
[ 5, 6, 7, 8, 9, 10 ]

Now the first two buffers have been consumend and the third ([5, 
6]) not.

Feels like a bug in Phobos. But maybe I do not understand, what's 
happening and this is correct behaviour. Can anyone explain or 
confirm, that this is a bug?

Jul 06 2019

Adam D. Ruppe <destructionator gmail.com> writes:

On Saturday, 6 July 2019 at 14:40:23 UTC, berni wrote:
        .byChunk(BUFFER_SIZE)


byChunk is defined to reuse its buffer between calls.

http://dpldocs.info/experimental-docs/std.stdio.byChunk.1.html#examples

This means previous contents are overwritten when you advance.


 When I now change BUFFER_SIZE to 2 I get:

[ 1, 2, 3, 4, 5 ]
[ 5, 6, 7, 8, 9, 10 ]

 Now the first two buffers have been consumend and the third 
 ([5, 6]) not.

So here, the take call gave you a view into the first 5 elements. 
It read one and two and printed them, then byChunk.popFront was 
called, overwriting the buffer with 3,4 and take passed that to 
writeln, then popFront again, overwriting with 5,6.

writeln printed out 5, and take, having finished its work, left 
the buffer alone in its state.


Now, you print the other thing, which still has 5,6 in the 
buffer, then popFront, overwrites with 7,8, etc and so on.


So this is a case of input range behavior - always consuming the 
underlying file - combined with buffering of two elements at 
once, leaving 5,6 behind, and the reuse of the buffer meaning you 
see that 5,6 again on the next call.

Jul 06 2019

berni <someone somewhere.com> writes:

On Saturday, 6 July 2019 at 14:48:04 UTC, Adam D. Ruppe wrote:
 [...]
 So this is a case of input range behavior - always consuming 
 the underlying file - combined with buffering of two elements 
 at once, leaving 5,6 behind, and the reuse of the buffer 
 meaning you see that 5,6 again on the next call.

Thanks for clearifing what happens. In my oppinion the behaviour 
of take() should be better defined. It's clear, that take() 
returns a range with the first n elements of the underlaying 
range (and that is done lazily). But it's not specified what 
happens with the underlaying range. As the behaviour is 
unpredictable (or at least hard to predict), one should assume, 
that the underlaying range is completely destroyed by take(). 
This makes take() much less usefull, than it could be, in my 
eyes. :-(

Jul 06 2019

Adam D. Ruppe <destructionator gmail.com> writes:

On Saturday, 6 July 2019 at 14:12:36 UTC, berni wrote:
 Meanwhile I encountered, that take() sometimes consumes and 
 sometimes not.

It depends on what you're passing.

So take is defined as just getting the first N elements from the 
given range. So what happens next depends on what it is "taking" 
from (I don't like the name "take" exactly because that implies, 
well, taking. What the function really does is more like "view 
into first N elements".


With input ranges, iterating over them consumes it implicitly, so 
you could say that take *always* consumes input ranges (at least 
once it gets iterated over).

But, it will frequently consume a copy of the range instead of 
the one you have at the top level, since ranges are passed by 
value.

If you want it to be seen outside, `ref` is the general answer. 
You might find success with refRange in some cases.

http://dpldocs.info/experimental-docs/std.range.refRange.html



But if you are passing something with slicing, like a plain 
array, take will never actually consume it, and instead just 
slice the input, even if it is ref. This gets the view of those 
first elements in cheaper way.

So going back to your original definition:

 I want to copy the first n items of a range to an array, 
 removing these items from the range.

As far as I know, none of the std.range functions are defined to 
do this.

I'd probably write your own that:

1) takes the range by ref so changes are visible outside
2) iterates over it with popFront
3) returns the copy

that should fulfill all the requirements. you could slightly 
optimize arrays (or other hasSlicing things) like

int[] yourFunction(ref int[] arr, int n) {
    auto ret = arr[0 .. n];
    arr = arr[n .. $];
    return ret;
}


that is, just slicing and consuming in one go for each side and 
then you don't even have to actually copy it, just return the 
slice.

Jul 06 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Saturday, July 6, 2019 8:12:36 AM MDT berni via Digitalmars-d-learn 
wrote:
 Now it's getting weird. Meanwhile I encountered, that take()
 sometimes consumes and sometimes not. Where can I learn, what is
 the reason behind this behavior? And how can I handle this?

take _always_ consumes the range that it's given. The problem is that some
types of ranges are implicitly saved when they're copied, whereas others
aren't, and when a range is implicitly saved when it's copied, you end up
with the copy being consumed. Dynamic arrays are implicitly saved when
they're copied, so the range you pass is saved, and the copy is consumed
instead of the original.

In generic code, you have to assume that once a range has been copied, you
can't use it anymore (just the copy) precisely because the semantics of
copying differ depending on the type of the range. You can only use a range
after copying it if you know what type of range you're dealing with and how
it behaves. So, you can rely on a dynamic array implicitly saving when it's
passed to take, but in generic code, you really shouldn't be using a range
again once you pass it to take, because what actually happens is dependent
on the type of the range.

In general what this means is that if you pass a range to a function, and
you then want to use the range again afterwards, you need to call save when
passing it to the function, and otherwise, you just assume that it's
consumed and don't use it again. You certainly don't pass it to a function
with the expectation of some elements being consumed and then continue to
use the rest of the range unless the function takes its argument by ref or
by pointer (which relatively few range-based functions do).

If you want a function that's guaranteed to not implicitly copy a range,
then it needs to accept the argument by ref or take a pointer to it. In the
case where a function doesn't have to do the work lazily, ref would work,
but in a case like take where you're returning a wrapper range, pointers
would be required. So, a version of take that didn't ever copy the range
it's given and thus never risked implicitly saving the range it was passed
would have to either take a pointer to it or take it by ref and then take
the address of the ref. In either case, the code using such a take would
then have to ensure that the original range didn't leave scope and get
destroyed before the take range was consumed, or the take range would then
refer to invalid memory.

- Jonathan M Davis

Jul 06 2019

Adam D. Ruppe <destructionator gmail.com> writes:

On Saturday, 6 July 2019 at 18:17:26 UTC, Jonathan M Davis wrote:
 take _always_ consumes the range that it's given

not if it hasSlicing. see 
http://dpldocs.info/experimental-docs/source/std.range.d.html#L2015

but yeah otherwise i agree with you

Jul 06 2019

berni <someone somewhere.com> writes:

I start to understand: A (lazy) range is something like a 
voucher: byChunk() does not provide the data immediately, but 
only a voucher to hand you a chunk (an array of 2 bytes in my 
example above) a time. This voucher is passed to joiner(), which 
keeps that voucher and hands me out a new voucher for a byte a 
time (called "a" in my example above). This voucher is now passed 
to take(), whick again keeps that voucher and hands me out yet an 
other voucher for a byte a time, but limits itself to 5 items.

Up to now nothing has happend, but the exchange of that vouchers. 
Now when writeln() is called, it shows take() that voucher and 
says: "Give me the first of your numbers". take() uses his 
voucher to do the same with joiner(), which uses its voucher to 
get the first chunk of byChunk(), takes the first byte of that 
chunk and hands it to take() and take() hands it to writeln().

Now writeln() uses it's voucher once again, to get the next byte. 
take() asks joiner() again, but this time joiner() does not ask 
byChunk(), because it still has one number left in the chunk it 
got when asking the last time. Now joiner() uses this second byte 
to hand it back. And so on.

After writeln() finished I'm doing something evil: I use a 
private copy of that voucher, that I allready handed to take() to 
use it myself. That's sort of stealing back. And hence the 
results are more or less unpredictable. Especially, when I use 
this copy, before take() finished it's job, as Jonathan M Davis 
points out.

My misthinking was to assume, that my copy of the voucher is 
still the same as the voucher, that take() owns after having 
handed out the 5 items, so I can use it to get the other items. 
If it where possible to ask take() to hand me back it's voucher, 
I could do it with this voucher, but I don't see a possibility to 
get that voucher back.

What I actually want to do, when using take(), is not to get a 
new voucher, but I want to keep my voucher and just use this 
voucher to get the first 5 items - take() doesn't do that, so 
I've got to write my own function which cannot be lazy.

Or better: I'd like to hand in my voucher and get back two 
vouchers, one for the first 5 bytes and one for the rest. That's 
actually, what I thought, take() is doing...

Jul 06 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Saturday, July 6, 2019 11:02:15 PM MDT berni via Digitalmars-d-learn 
wrote:
 Or better: I'd like to hand in my voucher and get back two
 vouchers, one for the first 5 bytes and one for the rest. That's
 actually, what I thought, take() is doing...

Without slicing, that's impossible without iterating over the elements
multiple times. It would be possible to have a function like take that did
something like return a tuple of two ranges where the first one is the taken
elements, and the second one is the rest, but without slicing, that could
only be done by calling save so that the two ranges are independent copies
of the original range, and then the range that doesn't have the elements
that were taken off has to internally pop off all of the elements that were
taken so that its first element is the first one that wasn't taken. There's
no way to magically skip the taken elements for the second range without
slicing.

The alternative is to manually iterate through all of the elements that you
want to take, doing whatever you're going to do with them, and then doing
whatever you want to do with rest of the range after that. Then iteration
only occurs once. But that means that you don't have a range of the taken
elements and can't do something like pass them to another algorithm as a
range without constructing a new range from them, and if you're going to do
that, you might as well just call save, pass the range to take, and the call
drop on the original range to pop off the elements that were taken.

- Jonathan M Davis

Jul 07 2019

berni <someone somewhere.com> writes:

On Sunday, 7 July 2019 at 09:01:53 UTC, Jonathan M Davis wrote:
 Without slicing, that's impossible without iterating over the 
 elements multiple times.

That's what I thought too, but meanwhile I think it is possible. 
To get it working, the second range needs to know about the first 
one and when the second one is queried the first time it has to 
notify the first one, that it has to remove all remaining items 
from the underlying range (and in case it still needs them, save 
them) and never use it again.

The advantage of this setup over copying the items immediately is 
some sort of laziness. It might happen, that the second one will 
never query or it might happen, that the first one finished 
allready when the second one queries. In this cases no additional 
memory is needed.
Even if the second one is queried before the first one finished, 
the needed buffer might be much smaller.

Jul 07 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Sunday, July 7, 2019 12:36:42 PM MDT berni via Digitalmars-d-learn wrote:
 On Sunday, 7 July 2019 at 09:01:53 UTC, Jonathan M Davis wrote:
 Without slicing, that's impossible without iterating over the
 elements multiple times.

 That's what I thought too, but meanwhile I think it is possible.
 To get it working, the second range needs to know about the first
 one and when the second one is queried the first time it has to
 notify the first one, that it has to remove all remaining items
 from the underlying range (and in case it still needs them, save
 them) and never use it again.

 The advantage of this setup over copying the items immediately is
 some sort of laziness. It might happen, that the second one will
 never query or it might happen, that the first one finished
 allready when the second one queries. In this cases no additional
 memory is needed.
 Even if the second one is queried before the first one finished,
 the needed buffer might be much smaller.

Having one range know about the other isn't enough. That just means that the
take range would tell the other range that it had popped an element off, and
then the other would know that it had to pop an element off. That still
involves popping the same element on different ranges twice. In order to
have popFront only called once, they have to be the same range, so when take
pops an element off, it pops an element off of the other range. If both the
take range and the drop range are wrappers, then it would be possible to do
something with pointers to the original range where the number of elements
that has been popped off is kept track of, and as long as the take range
pops them all first, the drop range can just continue iterating through the
original range, but if any elements from the drop range got iterated first,
then it would have to save the range and start popping from there not to
screw up the take range. And since pointers to the original range would have
to be used to avoid having independent copies, you run the risk of all of
this being screwed up by a piece of code popping an element from the
original range at some point. Also, all of that machinery would likely
quickly become far more expensive than simply saving the range to call take
on it and then calling drop on the original. Yes, that pops the same
elements multiple times, but wrapper ranges already end up calling chains of
popFronts, fronts, and emptys as they iterate through a range. You rarely
really get a single popFront call for a popFront unless the optimizer has
actually managed to optimize out all of the layers.

You can try mucking around with your own implemention of take if you want,
but I'd suggest that you're just better off taking the approach that it's
expected that if you use take, you're going to have to iterate through those
elements twice and that if that's not acceptable, it's better to do whatever
you need to do with the take range by manually operating on each of the
elements rather than having them be a range. Unless random-access ranges
with slicing are involved, trying to do anything that effectively slices a
range without iterating over the elements multiple times is almost certainly
going to be a mess. At some point, if you really want slicing semantics,
then you need to make sure that your range actually has slicing rather than
trying to act like it does when it doesn't.

- Jonathan M Davis

Jul 07 2019

berni <someone somewhere.com> writes:

On Sunday, 7 July 2019 at 21:55:17 UTC, Jonathan M Davis wrote:
 Having one range know about the other isn't enough. That just 
 means that the take range would tell the other range that it 
 had popped an element off, and then the other would know that 
 it had to pop an element off. That still involves popping the 
 same element on different ranges twice.

Sounds like you didn't understand what I meant. So I tried to 
implement what I had in mind and I have to confess, that there is 
something about ranges that I do not understand yet:

import std.stdio;
import std.array;

void main()
{
    auto a = [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15];

    call(a,5);
}

void call(T)(ref T range, int c)
{
    struct Take
    {
        T* take_range;
        int count;

         property bool empty=false;
         property auto front() { return (*take_range).front; }
        void popFront()
        {
            (*take_range).popFront();
            if (--count<=0) empty=true;
        }

        T* get_range()
        {
            assert(count==0);
            while (!empty) popFront();
            return take_range;
        }
    }

    auto take = Take(&range,c);

//    writeln(take);
    while (!take.empty) { write(take.front); take.popFront(); }

    auto original_range = take.get_range();

    writeln(*original_range);
}

Running this program leads to
12345[6, 7, 8, 9, 10, 11, 12, 13, 14, 15]

as expected.

But when I replace that loop at the bottom by writeln(take) the 
assertion in get_range fails (count==5). As all the functions are 
called in the same order by both the loop and writeln, I confess, 
that writeln somehow manages to use a copy of "take" although 
this isn't a ForwardRange and copying should therefore not be 
possible.

 You can try mucking around with your own implemention of take 
 if you want, but I'd suggest that you're just better off taking 
 the approach that it's expected that if you use take,

I't not about improving or doing things different. It's just 
about me understanding, what happens.

Jul 09 2019

berni <someone somewhere.com> writes:

Hurray, it works! :-)

https://run.dlang.io/is/2GMq34

I have to use classes to avoid copying when arguments are passed 
to a function. (And yes, there should of course be much more 
checks, especially when there are to few elements in the original 
range. And it could be speed improved and so on... I wanted to 
keep it simple.)

Jul 09 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Saturday, July 6, 2019 12:58:52 PM MDT Adam D. Ruppe via Digitalmars-d-
learn wrote:
 On Saturday, 6 July 2019 at 18:17:26 UTC, Jonathan M Davis wrote:
 take _always_ consumes the range that it's given

 not if it hasSlicing. see
 http://dpldocs.info/experimental-docs/source/std.range.d.html#L2015

 but yeah otherwise i agree with you

True enough, though if the code is generic, it still pretty much has to
assume that the range was consumed.

- Jonathan M Davis

Jul 07 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Saturday, July 6, 2019 12:17:26 PM MDT Jonathan M Davis via Digitalmars-
d-learn wrote:
 On Saturday, July 6, 2019 8:12:36 AM MDT berni via Digitalmars-d-learn

 wrote:
 Now it's getting weird. Meanwhile I encountered, that take()
 sometimes consumes and sometimes not. Where can I learn, what is
 the reason behind this behavior? And how can I handle this?

 take _always_ consumes the range that it's given. The problem is that some
 types of ranges are implicitly saved when they're copied, whereas others
 aren't, and when a range is implicitly saved when it's copied, you end up
 with the copy being consumed. Dynamic arrays are implicitly saved when
 they're copied, so the range you pass is saved, and the copy is consumed
 instead of the original.

 In generic code, you have to assume that once a range has been copied, you
 can't use it anymore (just the copy) precisely because the semantics of
 copying differ depending on the type of the range. You can only use a
 range after copying it if you know what type of range you're dealing with
 and how it behaves. So, you can rely on a dynamic array implicitly saving
 when it's passed to take, but in generic code, you really shouldn't be
 using a range again once you pass it to take, because what actually
 happens is dependent on the type of the range.

 In general what this means is that if you pass a range to a function, and
 you then want to use the range again afterwards, you need to call save
 when passing it to the function, and otherwise, you just assume that it's
 consumed and don't use it again. You certainly don't pass it to a
 function with the expectation of some elements being consumed and then
 continue to use the rest of the range unless the function takes its
 argument by ref or by pointer (which relatively few range-based functions
 do).

 If you want a function that's guaranteed to not implicitly copy a range,
 then it needs to accept the argument by ref or take a pointer to it. In
 the case where a function doesn't have to do the work lazily, ref would
 work, but in a case like take where you're returning a wrapper range,
 pointers would be required. So, a version of take that didn't ever copy
 the range it's given and thus never risked implicitly saving the range it
 was passed would have to either take a pointer to it or take it by ref
 and then take the address of the ref. In either case, the code using such
 a take would then have to ensure that the original range didn't leave
 scope and get destroyed before the take range was consumed, or the take
 range would then refer to invalid memory.

Another thing to consider about lazy ranges that take pointers to avoid
implicit saving and ensure that they consume the original is that if the
original has anything popped from it before the wrapper range is fully
consumed, then that will screw up the wrapper range, because the elements in
the range it's wrapping will have changed. Wrapper ranges are really
designed with the idea that they're operating on their own copy of the
range. Even with take as it's currently implemented, you risk bugs if you
pass it a range which is a reference type without calling save, because if
you don't fully consume the take range before using the original range
again, you end up screwing up the elements in the take range when you pop
anything off of the original range.

It really doesn't work well to have a wrapper range which doesn't have its
own independent copy of the range to iterate over. The only case where it
works cleanly to have a wrapper range like take gives you and not have to
iterate through the elements again to have a range starting at the first
element after the take range is to have a random-access range that has
slicing, in which case, you can just slice it, and take isn't required.
Without a random-access range with slicing, if you want to only iterate
through the elements once, you really should just be iterating through the
range manually rather than using a wrapper range.

- Jonathan M Davis

Jul 06 2019

D Programming

C/C++ Programming

Other

digitalmars.D.learn - Is there something like a consuming take?