www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - opIndex, opSlice, length for joiner?

reply Maverick Chardet <chardetm gmail.com> writes:
Hi everyone,

After reading the thread "Distributed Memory implementation": 
http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org

And more precisely the suggestion of Dragos Carp on page 2: 
http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org

I looked at the joiner source code 
(std.algorithm.iteration.joiner) and I have several ideas to make 
it have the opIndex, opSlice and length methods under certain 
conditions.

First question: what do you think of making this possible if all 
the considered ranges have opIndex, opSlice and length? I think 
that all the ranges types don't have to all implement the three 
methods in all cases (for instance length would actually only 
require ElementType!RoR and Separator to implement length) but we 
can discuss all that later...

The most important issue that comes to my mind is that the 
operations would not take constant time... A trivial 
implementation would be in O(k) where k is the number of joined 
ranges, and with some precomputation in O(k) time and space we 
can make length O(1) and opIndex and opSlice O(log k)... Would 
that be a problem?

Thank you in advance for your remarks!
Jan 20 2016
next sibling parent Jack Stouffer <jack jackstouffer.com> writes:
On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet 
wrote:
 The most important issue that comes to my mind is that the 
 operations would not take constant time... A trivial 
 implementation would be in O(k) where k is the number of joined 
 ranges, and with some precomputation in O(k) time and space we 
 can make length O(1) and opIndex and opSlice O(log k)... Would 
 that be a problem?
I definitely think you should open a PR with your ideas regardless of what people say in this thread. Secondly, giving the user more options than he/she had before can't really be a bad thing, even if the options aren't perfect. That is, if you clearly document the trade-offs in the documentation. But I am saying this before I see the code, so I can't really say one way or the other.
Jan 20 2016
prev sibling next sibling parent Ilya <ilyayaroshenko gmail.com> writes:
On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet 
wrote:
 Hi everyone,

 After reading the thread "Distributed Memory implementation": 
 http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org

 And more precisely the suggestion of Dragos Carp on page 2: 
 http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org

 I looked at the joiner source code 
 (std.algorithm.iteration.joiner) and I have several ideas to 
 make it have the opIndex, opSlice and length methods under 
 certain conditions.

 First question: what do you think of making this possible if 
 all the considered ranges have opIndex, opSlice and length? I 
 think that all the ranges types don't have to all implement the 
 three methods in all cases (for instance length would actually 
 only require ElementType!RoR and Separator to implement length) 
 but we can discuss all that later...

 The most important issue that comes to my mind is that the 
 operations would not take constant time... A trivial 
 implementation would be in O(k) where k is the number of joined 
 ranges, and with some precomputation in O(k) time and space we 
 can make length O(1) and opIndex and opSlice O(log k)... Would 
 that be a problem?

 Thank you in advance for your remarks!
O(K) looks OK for joiner. However this will cause additional performance problem because a couple of Phobos algorithm assumes that opIndex is fast as front+popFront. Perhaps a special UDA can be introduced, something like optimized("forward range"). See also `byElement` for multidimensional random access slices. It has opIndex and slicing. https://github.com/D-Programming-Language/phobos/blob/v2.070.0-b2/std/experimental/ndslice/selection.d#L893 Ilya
Jan 20 2016
prev sibling next sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet 
wrote:
 The most important issue that comes to my mind is that the 
 operations would not take constant time... A trivial 
 implementation would be in O(k) where k is the number of joined 
 ranges, and with some precomputation in O(k) time and space we 
 can make length O(1) and opIndex and opSlice O(log k)... Would 
 that be a problem?
The various range primitives all need to be O(1). We need to be able to rely on the big-o complexity of the primitives used by ranges, or the performance of various algorithms will tank when used with ranges that don't stick to the expected big-o complexities. So, if a range type cannot provide a primitive as O(1), it should not provide that primitive. If that weren't the case, then we'd have stuff like length, opIndex, and opSlice for narrow strings, but we don't. std.container lists the algorithmic complexity for its various operations for the same reason, and as I understand it, its replacement that Andrei is working on is going to do the same. - Jonathan M Davis
Jan 20 2016
parent Maverick Chardet <chardetm gmail.com> writes:
Thank you all for your comments! I'm not making a PR for now 
because however I implement this, there will be complexity issues 
that need to be discussed...

On Wednesday, 20 January 2016 at 20:23:08 UTC, Jonathan M Davis 
wrote:
 On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet 
 wrote:
 The most important issue that comes to my mind is that the 
 operations would not take constant time... A trivial 
 implementation would be in O(k) where k is the number of 
 joined ranges, and with some precomputation in O(k) time and 
 space we can make length O(1) and opIndex and opSlice O(log 
 k)... Would that be a problem?
The various range primitives all need to be O(1). We need to be able to rely on the big-o complexity of the primitives used by ranges, or the performance of various algorithms will tank when used with ranges that don't stick to the expected big-o complexities. So, if a range type cannot provide a primitive as O(1), it should not provide that primitive. If that weren't the case, then we'd have stuff like length, opIndex, and opSlice for narrow strings, but we don't. std.container lists the algorithmic complexity for its various operations for the same reason, and as I understand it, its replacement that Andrei is working on is going to do the same. - Jonathan M Davis
That makes sense, so it won't happen unless we can make all of them run in O(1) time right? And this seems hopeless to me, even if all ranges are random access and implement opIndex, opSlice and length... Maybe with some precomputing in O(k) or O(n) (with k the number of ranges and n the total number of elements) with some fancy data structure? But the whole thing would lose a lot of interest with a precomputing in O(n) time, appart from the syntaxic point of view... Maverick Chardet
Jan 20 2016
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/20/2016 12:36 PM, Maverick Chardet wrote:
 Hi everyone,

 After reading the thread "Distributed Memory implementation":
 http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org

 And more precisely the suggestion of Dragos Carp on page 2:
 http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org

 I looked at the joiner source code (std.algorithm.iteration.joiner) and
 I have several ideas to make it have the opIndex, opSlice and length
 methods under certain conditions.

 First question: what do you think of making this possible if all the
 considered ranges have opIndex, opSlice and length? I think that all the
 ranges types don't have to all implement the three methods in all cases
 (for instance length would actually only require ElementType!RoR and
 Separator to implement length) but we can discuss all that later...

 The most important issue that comes to my mind is that the operations
 would not take constant time... A trivial implementation would be in
 O(k) where k is the number of joined ranges, and with some
 precomputation in O(k) time and space we can make length O(1) and
 opIndex and opSlice O(log k)... Would that be a problem?

 Thank you in advance for your remarks!
Thanks for askin. Consider chain(), which currently DOES offer random access if its constituents to. In that case, the number of ranges chained is a compile-time constant dictated by the source code being compiled; no data dependency. In contrast, the performance of joiner()'s proposed primitives would be dependent on the data. For your simplest example, length would be linear in the number of ranges. One way to improve on that would be to compute length upon the first call and then memoize (and update) it. There is a risk it could go out of sync if someone changes the ranges involved surreptitiously. One good thing to do here would be to specialize joiner().walkLength. Then people writing r.walkLength would benefit from the quicker specialized version. For indexed access, things get a fair amount more complicated. You can't afford linear time, so you need some serious data structure to ensure good performance there. That seems to take joiner() in a design space that makes it less attractive; currently it's a simple spec with a somewhat straightforward implementation. Andrei
Jan 20 2016
parent reply Maverick Chardet <chardetm gmail.com> writes:
On Thursday, 21 January 2016 at 01:22:43 UTC, Andrei Alexandrescu 
wrote:
 Thanks for askin. Consider chain(), which currently DOES offer 
 random access if its constituents to. In that case, the number 
 of ranges chained is a compile-time constant dictated by the 
 source code being compiled; no data dependency.

 In contrast, the performance of joiner()'s proposed primitives 
 would be dependent on the data. For your simplest example, 
 length would be linear in the number of ranges. One way to 
 improve on that would be to compute length upon the first call 
 and then memoize (and update) it. There is a risk it could go 
 out of sync if someone changes the ranges involved 
 surreptitiously.

 One good thing to do here would be to specialize 
 joiner().walkLength. Then people writing r.walkLength would 
 benefit from the quicker specialized version.

 For indexed access, things get a fair amount more complicated. 
 You can't afford linear time, so you need some serious data 
 structure to ensure good performance there. That seems to take 
 joiner() in a design space that makes it less attractive; 
 currently it's a simple spec with a somewhat straightforward 
 implementation.


 Andrei
Thank you for your remarks! I just finished to implement the specialized version of walkLength (this was a great idea), I just have to write a few comments and clever unittests before the PR, I may send it later today or tomorrow! Just a question: is it possible for a range to be considered infinite and at the same time have a length? I suppose that if it is possible, such a range would be ill-formed... I'm asking because for my specialized version to compile in the case where there is no separator, I require hasLength!(ElementType!RoR) (otherwise the specialization is useless), and I'm wondering wether testing for !isInfinite!(ElementType!RoR) for the version with no "upTo" parameter is useful or not at all... Maverick Chardet
Jan 21 2016
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/21/2016 05:22 PM, Maverick Chardet wrote:
 Just a question: is it possible for a range to be considered infinite
 and at the same time have a length? I suppose that if it is possible,
 such a range would be ill-formed...
Indeed that range would be ill-formed. -- Andrei
Jan 21 2016
parent reply Maverick Chardet <chardetm gmail.com> writes:
On Friday, 22 January 2016 at 01:02:10 UTC, Andrei Alexandrescu 
wrote:
 On 01/21/2016 05:22 PM, Maverick Chardet wrote:
 Just a question: is it possible for a range to be considered 
 infinite
 and at the same time have a length? I suppose that if it is 
 possible,
 such a range would be ill-formed...
Indeed that range would be ill-formed. -- Andrei
Okay so I'll just use an assert! Something is bothering me though, I'm wondering why walkLength does not save the range when possible (when it is an input range)? For now it's just making a copy, couldn't it be problematic in some cases? The current code is: auto walkLength(Range)(Range range) if (isInputRange!Range && !isInfinite!Range) { static if (hasLength!Range) return range.length; else { size_t result; for ( ; !range.empty ; range.popFront() ) ++result; return result; } } Why not something like this: auto walkLength(Range)(const ref Range range) if (isInputRange!Range && !isInfinite!Range) { static if (hasLength!Range) return range.length; else { size_t result; static if (isForwardRange!range) { Range rangeCopy = range.save; } else { Range rangeCopy = range; } for ( ; !rangeCopy.empty ; rangeCopy.popFront() ) ++result; return result; } } Maverick Chardet
Jan 22 2016
parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Friday, 22 January 2016 at 10:18:20 UTC, Maverick Chardet 
wrote:
 On Friday, 22 January 2016 at 01:02:10 UTC, Andrei Alexandrescu 
 wrote:
 On 01/21/2016 05:22 PM, Maverick Chardet wrote:
 Just a question: is it possible for a range to be considered 
 infinite
 and at the same time have a length? I suppose that if it is 
 possible,
 such a range would be ill-formed...
Indeed that range would be ill-formed. -- Andrei
Okay so I'll just use an assert! Something is bothering me though, I'm wondering why walkLength does not save the range when possible (when it is an input range)? For now it's just making a copy, couldn't it be problematic in some cases? The current code is: auto walkLength(Range)(Range range) if (isInputRange!Range && !isInfinite!Range) { static if (hasLength!Range) return range.length; else { size_t result; for ( ; !range.empty ; range.popFront() ) ++result; return result; } } Why not something like this: auto walkLength(Range)(const ref Range range) if (isInputRange!Range && !isInfinite!Range) { static if (hasLength!Range) return range.length; else { size_t result; static if (isForwardRange!range) { Range rangeCopy = range.save; } else { Range rangeCopy = range; } for ( ; !rangeCopy.empty ; rangeCopy.popFront() ) ++result; return result; } }
Arguably, walkLength shouldn't even accept a range that isn't a forward range, since if it's not, then walkLength is just going to eat the range. As for calling save, there's no point. It's too late as soon as you've passed the range into the function, since that copied the range, which is undefined behavior due to the fact that the exact semantics of copying a range vary wildly depending on how a range is implemented. In generic code, if you want to use a range after passing it to a function, you have to call save and pass that to the function; otherwise, you get undefined behavior when you do anything with the range after passing it to the function. Also, I would point out that passing by const ref isn't going to work. The range would be const, so you couldn't iterate over it (and save returns the exact same type as the range, so it wouldn't remove const even if it could - at it probably can't anyway). And if the function takes the range by ref, then it will consume it, which you really don't want. Calling save would get around that problem, but then you'd still have the problem that the function wouldn't accept rvalues, which would be highly annoying. - Jonathan M Davis
Jan 22 2016
parent reply Maverick Chardet <chardetm gmail.com> writes:
On Friday, 22 January 2016 at 11:23:05 UTC, Jonathan M Davis 
wrote:
 Arguably, walkLength shouldn't even accept a range that isn't a 
 forward range, since if it's not, then walkLength is just going 
 to eat the range. As for calling save, there's no point. It's 
 too late as soon as you've passed the range into the function, 
 since that copied the range, which is undefined behavior due to 
 the fact that the exact semantics of copying a range vary 
 wildly depending on how a range is implemented. In generic 
 code, if you want to use a range after passing it to a 
 function, you have to call save and pass that to the function; 
 otherwise, you get undefined behavior when you do anything with 
 the range after passing it to the function.
Yes actually this was my point, why not forcing the range to be a forward range? The only advantage I see is that it allows to count the number of elements of an input range, but then I guess it would need to be mentioned in the documentation somewhere, what do you think? Yes my code is wrong sorry about that, I was just trying to figure out how to force the saving in some way...
Jan 22 2016
parent reply Maverick Chardet <chardetm gmail.com> writes:
And coming back to the specialized version of walkLength: in the 
case where joiner does not implement save, should I still allow 
the specialized version? (and in this case do what: consume the 
range, make a copy of "this" and hope for the best, something 
else?)

Maverick Chardet
Jan 22 2016
parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 01/22/2016 07:59 AM, Maverick Chardet wrote:
 And coming back to the specialized version of walkLength: in the case
 where joiner does not implement save, should I still allow the
 specialized version? (and in this case do what: consume the range, make
 a copy of "this" and hope for the best, something else?)

 Maverick Chardet
walkLength is what walkLength does: it accepts even an input range and has the discretion to consume it. So let them eat cake. -- Andrei
Jan 22 2016