digitalmars.D - opIndex, opSlice, length for joiner?

Maverick Chardet (22/22) Jan 20 2016 Hi everyone,

Jack Stouffer (9/15) Jan 20 2016 I definitely think you should open a PR with your ideas
Ilya (10/32) Jan 20 2016 O(K) looks OK for joiner. However this will cause additional
Jonathan M Davis (14/20) Jan 20 2016 The various range primitives all need to be O(1). We need to be

Maverick Chardet (15/35) Jan 20 2016 Thank you all for your comments! I'm not making a PR for now

Andrei Alexandrescu (20/39) Jan 20 2016 Thanks for askin. Consider chain(), which currently DOES offer random

Maverick Chardet (17/38) Jan 21 2016 Thank you for your remarks!

Andrei Alexandrescu (2/5) Jan 21 2016 Indeed that range would be ill-formed. -- Andrei

Maverick Chardet (43/50) Jan 22 2016 Okay so I'll just use an assert!

Jonathan M Davis (23/73) Jan 22 2016 Arguably, walkLength shouldn't even accept a range that isn't a

Maverick Chardet (9/20) Jan 22 2016 Yes actually this was my point, why not forcing the range to be a

Maverick Chardet (6/6) Jan 22 2016 And coming back to the specialized version of walkLength: in the

Andrei Alexandrescu (3/8) Jan 22 2016 walkLength is what walkLength does: it accepts even an input range and

Maverick Chardet <chardetm gmail.com> writes:

Hi everyone,

After reading the thread "Distributed Memory implementation": 
http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org

And more precisely the suggestion of Dragos Carp on page 2: 
http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org

I looked at the joiner source code 
(std.algorithm.iteration.joiner) and I have several ideas to make 
it have the opIndex, opSlice and length methods under certain 
conditions.

First question: what do you think of making this possible if all 
the considered ranges have opIndex, opSlice and length? I think 
that all the ranges types don't have to all implement the three 
methods in all cases (for instance length would actually only 
require ElementType!RoR and Separator to implement length) but we 
can discuss all that later...

The most important issue that comes to my mind is that the 
operations would not take constant time... A trivial 
implementation would be in O(k) where k is the number of joined 
ranges, and with some precomputation in O(k) time and space we 
can make length O(1) and opIndex and opSlice O(log k)... Would 
that be a problem?

Thank you in advance for your remarks!

Jan 20 2016

Jack Stouffer <jack jackstouffer.com> writes:

On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet 
wrote:
 The most important issue that comes to my mind is that the 
 operations would not take constant time... A trivial 
 implementation would be in O(k) where k is the number of joined 
 ranges, and with some precomputation in O(k) time and space we 
 can make length O(1) and opIndex and opSlice O(log k)... Would 
 that be a problem?

I definitely think you should open a PR with your ideas 
regardless of what people say in this thread. Secondly, giving 
the user more options than he/she had before can't really be a 
bad thing, even if the options aren't perfect. That is, if you 
clearly document the trade-offs in the documentation.

But I am saying this before I see the code, so I can't really say 
one way or the other.

Jan 20 2016

Ilya <ilyayaroshenko gmail.com> writes:

On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet
wrote:
Hi everyone,

After reading the thread "Distributed Memory implementation":
http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org

And more precisely the suggestion of Dragos Carp on page 2:
http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org

I looked at the joiner source code
(std.algorithm.iteration.joiner) and I have several ideas to
make it have the opIndex, opSlice and length methods under
certain conditions.

First question: what do you think of making this possible if
all the considered ranges have opIndex, opSlice and length? I
think that all the ranges types don't have to all implement the
three methods in all cases (for instance length would actually
only require ElementType!RoR and Separator to implement length)
but we can discuss all that later...

The most important issue that comes to my mind is that the
operations would not take constant time... A trivial
implementation would be in O(k) where k is the number of joined
ranges, and with some precomputation in O(k) time and space we
can make length O(1) and opIndex and opSlice O(log k)... Would
that be a problem?

Thank you in advance for your remarks!

O(K) looks OK for joiner. However this will cause additional
performance problem because a couple of Phobos algorithm assumes
that opIndex is fast as front+popFront. Perhaps a special UDA can
be introduced, something like optimized("forward range").

See also `byElement` for multidimensional random access slices.
It has opIndex and slicing.
https://github.com/D-Programming-Language/phobos/blob/v2.070.0-b2/std/experimental/ndslice/selection.d#L893

Ilya

Jan 20 2016

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet 
wrote:
 The most important issue that comes to my mind is that the 
 operations would not take constant time... A trivial 
 implementation would be in O(k) where k is the number of joined 
 ranges, and with some precomputation in O(k) time and space we 
 can make length O(1) and opIndex and opSlice O(log k)... Would 
 that be a problem?

The various range primitives all need to be O(1). We need to be 
able to rely on the big-o complexity of the primitives used by 
ranges, or the performance of various algorithms will tank when 
used with ranges that don't stick to the expected big-o 
complexities. So, if a range type cannot provide a primitive as 
O(1), it should not provide that primitive. If that weren't the 
case, then we'd have stuff like length, opIndex, and opSlice for 
narrow strings, but we don't.

std.container lists the algorithmic complexity for its various 
operations for the same reason, and as I understand it, its 
replacement that Andrei is working on is going to do the same.

- Jonathan M Davis

Jan 20 2016

Maverick Chardet <chardetm gmail.com> writes:

Thank you all for your comments! I'm not making a PR for now 
because however I implement this, there will be complexity issues 
that need to be discussed...

On Wednesday, 20 January 2016 at 20:23:08 UTC, Jonathan M Davis 
wrote:
 On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet 
 wrote:
 The most important issue that comes to my mind is that the 
 operations would not take constant time... A trivial 
 implementation would be in O(k) where k is the number of 
 joined ranges, and with some precomputation in O(k) time and 
 space we can make length O(1) and opIndex and opSlice O(log 
 k)... Would that be a problem?

 The various range primitives all need to be O(1). We need to be 
 able to rely on the big-o complexity of the primitives used by 
 ranges, or the performance of various algorithms will tank when 
 used with ranges that don't stick to the expected big-o 
 complexities. So, if a range type cannot provide a primitive as 
 O(1), it should not provide that primitive. If that weren't the 
 case, then we'd have stuff like length, opIndex, and opSlice 
 for narrow strings, but we don't.

 std.container lists the algorithmic complexity for its various 
 operations for the same reason, and as I understand it, its 
 replacement that Andrei is working on is going to do the same.

 - Jonathan M Davis

That makes sense, so it won't happen unless we can make all of 
them run in O(1) time right? And this seems hopeless to me, even 
if all ranges are random access and implement opIndex, opSlice 
and length... Maybe with some precomputing in O(k) or O(n) (with 
k the number of ranges and n the total number of elements) with 
some fancy data structure?

But the whole thing would lose a lot of interest with a 
precomputing in O(n) time, appart from the syntaxic point of 
view...

Maverick Chardet

Jan 20 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 01/20/2016 12:36 PM, Maverick Chardet wrote:
 Hi everyone,

 After reading the thread "Distributed Memory implementation":
 http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org

 And more precisely the suggestion of Dragos Carp on page 2:
 http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org

 I looked at the joiner source code (std.algorithm.iteration.joiner) and
 I have several ideas to make it have the opIndex, opSlice and length
 methods under certain conditions.

 First question: what do you think of making this possible if all the
 considered ranges have opIndex, opSlice and length? I think that all the
 ranges types don't have to all implement the three methods in all cases
 (for instance length would actually only require ElementType!RoR and
 Separator to implement length) but we can discuss all that later...

 The most important issue that comes to my mind is that the operations
 would not take constant time... A trivial implementation would be in
 O(k) where k is the number of joined ranges, and with some
 precomputation in O(k) time and space we can make length O(1) and
 opIndex and opSlice O(log k)... Would that be a problem?

 Thank you in advance for your remarks!

Thanks for askin. Consider chain(), which currently DOES offer random 
access if its constituents to. In that case, the number of ranges 
chained is a compile-time constant dictated by the source code being 
compiled; no data dependency.

In contrast, the performance of joiner()'s proposed primitives would be 
dependent on the data. For your simplest example, length would be linear 
in the number of ranges. One way to improve on that would be to compute 
length upon the first call and then memoize (and update) it. There is a 
risk it could go out of sync if someone changes the ranges involved 
surreptitiously.

One good thing to do here would be to specialize joiner().walkLength. 
Then people writing r.walkLength would benefit from the quicker 
specialized version.

For indexed access, things get a fair amount more complicated. You can't 
afford linear time, so you need some serious data structure to ensure 
good performance there. That seems to take joiner() in a design space 
that makes it less attractive; currently it's a simple spec with a 
somewhat straightforward implementation.


Andrei

Jan 20 2016

Maverick Chardet <chardetm gmail.com> writes:

On Thursday, 21 January 2016 at 01:22:43 UTC, Andrei Alexandrescu 
wrote:
 Thanks for askin. Consider chain(), which currently DOES offer 
 random access if its constituents to. In that case, the number 
 of ranges chained is a compile-time constant dictated by the 
 source code being compiled; no data dependency.

 In contrast, the performance of joiner()'s proposed primitives 
 would be dependent on the data. For your simplest example, 
 length would be linear in the number of ranges. One way to 
 improve on that would be to compute length upon the first call 
 and then memoize (and update) it. There is a risk it could go 
 out of sync if someone changes the ranges involved 
 surreptitiously.

 One good thing to do here would be to specialize 
 joiner().walkLength. Then people writing r.walkLength would 
 benefit from the quicker specialized version.

 For indexed access, things get a fair amount more complicated. 
 You can't afford linear time, so you need some serious data 
 structure to ensure good performance there. That seems to take 
 joiner() in a design space that makes it less attractive; 
 currently it's a simple spec with a somewhat straightforward 
 implementation.


 Andrei


Thank you for your remarks!

I just finished to implement the specialized version of 
walkLength (this was a great idea), I just have to write a few 
comments and clever unittests before the PR, I may send it later 
today or tomorrow!


Just a question: is it possible for a range to be considered 
infinite and at the same time have a length? I suppose that if it 
is possible, such a range would be ill-formed...

I'm asking because for my specialized version to compile in the 
case where there is no separator, I require 
hasLength!(ElementType!RoR) (otherwise the specialization is 
useless), and I'm wondering wether testing for 
!isInfinite!(ElementType!RoR) for the version with no "upTo" 
parameter is useful or not at all...

Maverick Chardet

Jan 21 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 01/21/2016 05:22 PM, Maverick Chardet wrote:
 Just a question: is it possible for a range to be considered infinite
 and at the same time have a length? I suppose that if it is possible,
 such a range would be ill-formed...

Indeed that range would be ill-formed. -- Andrei

Jan 21 2016

Maverick Chardet <chardetm gmail.com> writes:

On Friday, 22 January 2016 at 01:02:10 UTC, Andrei Alexandrescu 
wrote:
 On 01/21/2016 05:22 PM, Maverick Chardet wrote:
 Just a question: is it possible for a range to be considered 
 infinite
 and at the same time have a length? I suppose that if it is 
 possible,
 such a range would be ill-formed...

 Indeed that range would be ill-formed. -- Andrei

Okay so I'll just use an assert!


Something is bothering me though, I'm wondering why walkLength 
does not save the range when possible (when it is an input 
range)? For now it's just making a copy, couldn't it be 
problematic in some cases? The current code is:


auto walkLength(Range)(Range range)
     if (isInputRange!Range && !isInfinite!Range)
{
     static if (hasLength!Range)
         return range.length;
     else
     {
         size_t result;
         for ( ; !range.empty ; range.popFront() )
             ++result;
         return result;
     }
}


Why not something like this:


auto walkLength(Range)(const ref Range range)
     if (isInputRange!Range && !isInfinite!Range)
{
     static if (hasLength!Range)
         return range.length;
     else
     {
         size_t result;
         static if (isForwardRange!range)
         {
             Range rangeCopy = range.save;
         }
         else
         {
             Range rangeCopy = range;
         }
         for ( ; !rangeCopy.empty ; rangeCopy.popFront() )
             ++result;
         return result;
     }
}


Maverick Chardet

Jan 22 2016

Jonathan M Davis <jmdavisProg gmx.com> writes:

On Friday, 22 January 2016 at 10:18:20 UTC, Maverick Chardet 
wrote:
 On Friday, 22 January 2016 at 01:02:10 UTC, Andrei Alexandrescu 
 wrote:
 On 01/21/2016 05:22 PM, Maverick Chardet wrote:
 Just a question: is it possible for a range to be considered 
 infinite
 and at the same time have a length? I suppose that if it is 
 possible,
 such a range would be ill-formed...

 Indeed that range would be ill-formed. -- Andrei

 Okay so I'll just use an assert!


 Something is bothering me though, I'm wondering why walkLength 
 does not save the range when possible (when it is an input 
 range)? For now it's just making a copy, couldn't it be 
 problematic in some cases? The current code is:


 auto walkLength(Range)(Range range)
     if (isInputRange!Range && !isInfinite!Range)
 {
     static if (hasLength!Range)
         return range.length;
     else
     {
         size_t result;
         for ( ; !range.empty ; range.popFront() )
             ++result;
         return result;
     }
 }


 Why not something like this:


 auto walkLength(Range)(const ref Range range)
     if (isInputRange!Range && !isInfinite!Range)
 {
     static if (hasLength!Range)
         return range.length;
     else
     {
         size_t result;
         static if (isForwardRange!range)
         {
             Range rangeCopy = range.save;
         }
         else
         {
             Range rangeCopy = range;
         }
         for ( ; !rangeCopy.empty ; rangeCopy.popFront() )
             ++result;
         return result;
     }
 }

Arguably, walkLength shouldn't even accept a range that isn't a 
forward range, since if it's not, then walkLength is just going 
to eat the range. As for calling save, there's no point. It's too 
late as soon as you've passed the range into the function, since 
that copied the range, which is undefined behavior due to the 
fact that the exact semantics of copying a range vary wildly 
depending on how a range is implemented. In generic code, if you 
want to use a range after passing it to a function, you have to 
call save and pass that to the function; otherwise, you get 
undefined behavior when you do anything with the range after 
passing it to the function.

Also, I would point out that passing by const ref isn't going to 
work. The range would be const, so you couldn't iterate over it 
(and save returns the exact same type as the range, so it 
wouldn't remove const even if it could - at it probably can't 
anyway). And if the function takes the range by ref, then it will 
consume it, which you really don't want. Calling save would get 
around that problem, but then you'd still have the problem that 
the function wouldn't accept rvalues, which would be highly 
annoying.

- Jonathan M Davis

Jan 22 2016

Maverick Chardet <chardetm gmail.com> writes:

On Friday, 22 January 2016 at 11:23:05 UTC, Jonathan M Davis 
wrote:
 Arguably, walkLength shouldn't even accept a range that isn't a 
 forward range, since if it's not, then walkLength is just going 
 to eat the range. As for calling save, there's no point. It's 
 too late as soon as you've passed the range into the function, 
 since that copied the range, which is undefined behavior due to 
 the fact that the exact semantics of copying a range vary 
 wildly depending on how a range is implemented. In generic 
 code, if you want to use a range after passing it to a 
 function, you have to call save and pass that to the function; 
 otherwise, you get undefined behavior when you do anything with 
 the range after passing it to the function.

Yes actually this was my point, why not forcing the range to be a 
forward range? The only advantage I see is that it allows to 
count the number of elements of an input range, but then I guess 
it would need to be mentioned in the documentation somewhere, 
what do you think?

Yes my code is wrong sorry about that, I was just trying to 
figure out how to force the saving in some way...

Jan 22 2016

Maverick Chardet <chardetm gmail.com> writes:

And coming back to the specialized version of walkLength: in the 
case where joiner does not implement save, should I still allow 
the specialized version? (and in this case do what: consume the 
range, make a copy of "this" and hope for the best, something 
else?)

Maverick Chardet

Jan 22 2016

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 01/22/2016 07:59 AM, Maverick Chardet wrote:
 And coming back to the specialized version of walkLength: in the case
 where joiner does not implement save, should I still allow the
 specialized version? (and in this case do what: consume the range, make
 a copy of "this" and hope for the best, something else?)

 Maverick Chardet

walkLength is what walkLength does: it accepts even an input range and 
has the discretion to consume it. So let them eat cake. -- Andrei

Jan 22 2016

D Programming

C/C++ Programming

Other

digitalmars.D - opIndex, opSlice, length for joiner?