digitalmars.D - opIndex, opSlice, length for joiner?
- Maverick Chardet (22/22) Jan 20 2016 Hi everyone,
- Jack Stouffer (9/15) Jan 20 2016 I definitely think you should open a PR with your ideas
- Ilya (10/32) Jan 20 2016 O(K) looks OK for joiner. However this will cause additional
- Jonathan M Davis (14/20) Jan 20 2016 The various range primitives all need to be O(1). We need to be
- Maverick Chardet (15/35) Jan 20 2016 Thank you all for your comments! I'm not making a PR for now
- Andrei Alexandrescu (20/39) Jan 20 2016 Thanks for askin. Consider chain(), which currently DOES offer random
- Maverick Chardet (17/38) Jan 21 2016 Thank you for your remarks!
- Andrei Alexandrescu (2/5) Jan 21 2016 Indeed that range would be ill-formed. -- Andrei
- Maverick Chardet (43/50) Jan 22 2016 Okay so I'll just use an assert!
- Jonathan M Davis (23/73) Jan 22 2016 Arguably, walkLength shouldn't even accept a range that isn't a
- Maverick Chardet (9/20) Jan 22 2016 Yes actually this was my point, why not forcing the range to be a
- Maverick Chardet (6/6) Jan 22 2016 And coming back to the specialized version of walkLength: in the
- Andrei Alexandrescu (3/8) Jan 22 2016 walkLength is what walkLength does: it accepts even an input range and
Hi everyone, After reading the thread "Distributed Memory implementation": http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org And more precisely the suggestion of Dragos Carp on page 2: http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org I looked at the joiner source code (std.algorithm.iteration.joiner) and I have several ideas to make it have the opIndex, opSlice and length methods under certain conditions. First question: what do you think of making this possible if all the considered ranges have opIndex, opSlice and length? I think that all the ranges types don't have to all implement the three methods in all cases (for instance length would actually only require ElementType!RoR and Separator to implement length) but we can discuss all that later... The most important issue that comes to my mind is that the operations would not take constant time... A trivial implementation would be in O(k) where k is the number of joined ranges, and with some precomputation in O(k) time and space we can make length O(1) and opIndex and opSlice O(log k)... Would that be a problem? Thank you in advance for your remarks!
Jan 20 2016
On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet wrote:The most important issue that comes to my mind is that the operations would not take constant time... A trivial implementation would be in O(k) where k is the number of joined ranges, and with some precomputation in O(k) time and space we can make length O(1) and opIndex and opSlice O(log k)... Would that be a problem?I definitely think you should open a PR with your ideas regardless of what people say in this thread. Secondly, giving the user more options than he/she had before can't really be a bad thing, even if the options aren't perfect. That is, if you clearly document the trade-offs in the documentation. But I am saying this before I see the code, so I can't really say one way or the other.
Jan 20 2016
On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet wrote:Hi everyone, After reading the thread "Distributed Memory implementation": http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org And more precisely the suggestion of Dragos Carp on page 2: http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org I looked at the joiner source code (std.algorithm.iteration.joiner) and I have several ideas to make it have the opIndex, opSlice and length methods under certain conditions. First question: what do you think of making this possible if all the considered ranges have opIndex, opSlice and length? I think that all the ranges types don't have to all implement the three methods in all cases (for instance length would actually only require ElementType!RoR and Separator to implement length) but we can discuss all that later... The most important issue that comes to my mind is that the operations would not take constant time... A trivial implementation would be in O(k) where k is the number of joined ranges, and with some precomputation in O(k) time and space we can make length O(1) and opIndex and opSlice O(log k)... Would that be a problem? Thank you in advance for your remarks!O(K) looks OK for joiner. However this will cause additional performance problem because a couple of Phobos algorithm assumes that opIndex is fast as front+popFront. Perhaps a special UDA can be introduced, something like optimized("forward range"). See also `byElement` for multidimensional random access slices. It has opIndex and slicing. https://github.com/D-Programming-Language/phobos/blob/v2.070.0-b2/std/experimental/ndslice/selection.d#L893 Ilya
Jan 20 2016
On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet wrote:The most important issue that comes to my mind is that the operations would not take constant time... A trivial implementation would be in O(k) where k is the number of joined ranges, and with some precomputation in O(k) time and space we can make length O(1) and opIndex and opSlice O(log k)... Would that be a problem?The various range primitives all need to be O(1). We need to be able to rely on the big-o complexity of the primitives used by ranges, or the performance of various algorithms will tank when used with ranges that don't stick to the expected big-o complexities. So, if a range type cannot provide a primitive as O(1), it should not provide that primitive. If that weren't the case, then we'd have stuff like length, opIndex, and opSlice for narrow strings, but we don't. std.container lists the algorithmic complexity for its various operations for the same reason, and as I understand it, its replacement that Andrei is working on is going to do the same. - Jonathan M Davis
Jan 20 2016
Thank you all for your comments! I'm not making a PR for now because however I implement this, there will be complexity issues that need to be discussed... On Wednesday, 20 January 2016 at 20:23:08 UTC, Jonathan M Davis wrote:On Wednesday, 20 January 2016 at 17:36:29 UTC, Maverick Chardet wrote:That makes sense, so it won't happen unless we can make all of them run in O(1) time right? And this seems hopeless to me, even if all ranges are random access and implement opIndex, opSlice and length... Maybe with some precomputing in O(k) or O(n) (with k the number of ranges and n the total number of elements) with some fancy data structure? But the whole thing would lose a lot of interest with a precomputing in O(n) time, appart from the syntaxic point of view... Maverick ChardetThe most important issue that comes to my mind is that the operations would not take constant time... A trivial implementation would be in O(k) where k is the number of joined ranges, and with some precomputation in O(k) time and space we can make length O(1) and opIndex and opSlice O(log k)... Would that be a problem?The various range primitives all need to be O(1). We need to be able to rely on the big-o complexity of the primitives used by ranges, or the performance of various algorithms will tank when used with ranges that don't stick to the expected big-o complexities. So, if a range type cannot provide a primitive as O(1), it should not provide that primitive. If that weren't the case, then we'd have stuff like length, opIndex, and opSlice for narrow strings, but we don't. std.container lists the algorithmic complexity for its various operations for the same reason, and as I understand it, its replacement that Andrei is working on is going to do the same. - Jonathan M Davis
Jan 20 2016
On 01/20/2016 12:36 PM, Maverick Chardet wrote:Hi everyone, After reading the thread "Distributed Memory implementation": http://forum.dlang.org/post/oqcfifzolmolcvyupsef forum.dlang.org And more precisely the suggestion of Dragos Carp on page 2: http://forum.dlang.org/post/txgabyyoutitketlpvgm forum.dlang.org I looked at the joiner source code (std.algorithm.iteration.joiner) and I have several ideas to make it have the opIndex, opSlice and length methods under certain conditions. First question: what do you think of making this possible if all the considered ranges have opIndex, opSlice and length? I think that all the ranges types don't have to all implement the three methods in all cases (for instance length would actually only require ElementType!RoR and Separator to implement length) but we can discuss all that later... The most important issue that comes to my mind is that the operations would not take constant time... A trivial implementation would be in O(k) where k is the number of joined ranges, and with some precomputation in O(k) time and space we can make length O(1) and opIndex and opSlice O(log k)... Would that be a problem? Thank you in advance for your remarks!Thanks for askin. Consider chain(), which currently DOES offer random access if its constituents to. In that case, the number of ranges chained is a compile-time constant dictated by the source code being compiled; no data dependency. In contrast, the performance of joiner()'s proposed primitives would be dependent on the data. For your simplest example, length would be linear in the number of ranges. One way to improve on that would be to compute length upon the first call and then memoize (and update) it. There is a risk it could go out of sync if someone changes the ranges involved surreptitiously. One good thing to do here would be to specialize joiner().walkLength. Then people writing r.walkLength would benefit from the quicker specialized version. For indexed access, things get a fair amount more complicated. You can't afford linear time, so you need some serious data structure to ensure good performance there. That seems to take joiner() in a design space that makes it less attractive; currently it's a simple spec with a somewhat straightforward implementation. Andrei
Jan 20 2016
On Thursday, 21 January 2016 at 01:22:43 UTC, Andrei Alexandrescu wrote:Thanks for askin. Consider chain(), which currently DOES offer random access if its constituents to. In that case, the number of ranges chained is a compile-time constant dictated by the source code being compiled; no data dependency. In contrast, the performance of joiner()'s proposed primitives would be dependent on the data. For your simplest example, length would be linear in the number of ranges. One way to improve on that would be to compute length upon the first call and then memoize (and update) it. There is a risk it could go out of sync if someone changes the ranges involved surreptitiously. One good thing to do here would be to specialize joiner().walkLength. Then people writing r.walkLength would benefit from the quicker specialized version. For indexed access, things get a fair amount more complicated. You can't afford linear time, so you need some serious data structure to ensure good performance there. That seems to take joiner() in a design space that makes it less attractive; currently it's a simple spec with a somewhat straightforward implementation. AndreiThank you for your remarks! I just finished to implement the specialized version of walkLength (this was a great idea), I just have to write a few comments and clever unittests before the PR, I may send it later today or tomorrow! Just a question: is it possible for a range to be considered infinite and at the same time have a length? I suppose that if it is possible, such a range would be ill-formed... I'm asking because for my specialized version to compile in the case where there is no separator, I require hasLength!(ElementType!RoR) (otherwise the specialization is useless), and I'm wondering wether testing for !isInfinite!(ElementType!RoR) for the version with no "upTo" parameter is useful or not at all... Maverick Chardet
Jan 21 2016
On 01/21/2016 05:22 PM, Maverick Chardet wrote:Just a question: is it possible for a range to be considered infinite and at the same time have a length? I suppose that if it is possible, such a range would be ill-formed...Indeed that range would be ill-formed. -- Andrei
Jan 21 2016
On Friday, 22 January 2016 at 01:02:10 UTC, Andrei Alexandrescu wrote:On 01/21/2016 05:22 PM, Maverick Chardet wrote:Okay so I'll just use an assert! Something is bothering me though, I'm wondering why walkLength does not save the range when possible (when it is an input range)? For now it's just making a copy, couldn't it be problematic in some cases? The current code is: auto walkLength(Range)(Range range) if (isInputRange!Range && !isInfinite!Range) { static if (hasLength!Range) return range.length; else { size_t result; for ( ; !range.empty ; range.popFront() ) ++result; return result; } } Why not something like this: auto walkLength(Range)(const ref Range range) if (isInputRange!Range && !isInfinite!Range) { static if (hasLength!Range) return range.length; else { size_t result; static if (isForwardRange!range) { Range rangeCopy = range.save; } else { Range rangeCopy = range; } for ( ; !rangeCopy.empty ; rangeCopy.popFront() ) ++result; return result; } } Maverick ChardetJust a question: is it possible for a range to be considered infinite and at the same time have a length? I suppose that if it is possible, such a range would be ill-formed...Indeed that range would be ill-formed. -- Andrei
Jan 22 2016
On Friday, 22 January 2016 at 10:18:20 UTC, Maverick Chardet wrote:On Friday, 22 January 2016 at 01:02:10 UTC, Andrei Alexandrescu wrote:Arguably, walkLength shouldn't even accept a range that isn't a forward range, since if it's not, then walkLength is just going to eat the range. As for calling save, there's no point. It's too late as soon as you've passed the range into the function, since that copied the range, which is undefined behavior due to the fact that the exact semantics of copying a range vary wildly depending on how a range is implemented. In generic code, if you want to use a range after passing it to a function, you have to call save and pass that to the function; otherwise, you get undefined behavior when you do anything with the range after passing it to the function. Also, I would point out that passing by const ref isn't going to work. The range would be const, so you couldn't iterate over it (and save returns the exact same type as the range, so it wouldn't remove const even if it could - at it probably can't anyway). And if the function takes the range by ref, then it will consume it, which you really don't want. Calling save would get around that problem, but then you'd still have the problem that the function wouldn't accept rvalues, which would be highly annoying. - Jonathan M DavisOn 01/21/2016 05:22 PM, Maverick Chardet wrote:Okay so I'll just use an assert! Something is bothering me though, I'm wondering why walkLength does not save the range when possible (when it is an input range)? For now it's just making a copy, couldn't it be problematic in some cases? The current code is: auto walkLength(Range)(Range range) if (isInputRange!Range && !isInfinite!Range) { static if (hasLength!Range) return range.length; else { size_t result; for ( ; !range.empty ; range.popFront() ) ++result; return result; } } Why not something like this: auto walkLength(Range)(const ref Range range) if (isInputRange!Range && !isInfinite!Range) { static if (hasLength!Range) return range.length; else { size_t result; static if (isForwardRange!range) { Range rangeCopy = range.save; } else { Range rangeCopy = range; } for ( ; !rangeCopy.empty ; rangeCopy.popFront() ) ++result; return result; } }Just a question: is it possible for a range to be considered infinite and at the same time have a length? I suppose that if it is possible, such a range would be ill-formed...Indeed that range would be ill-formed. -- Andrei
Jan 22 2016
On Friday, 22 January 2016 at 11:23:05 UTC, Jonathan M Davis wrote:Arguably, walkLength shouldn't even accept a range that isn't a forward range, since if it's not, then walkLength is just going to eat the range. As for calling save, there's no point. It's too late as soon as you've passed the range into the function, since that copied the range, which is undefined behavior due to the fact that the exact semantics of copying a range vary wildly depending on how a range is implemented. In generic code, if you want to use a range after passing it to a function, you have to call save and pass that to the function; otherwise, you get undefined behavior when you do anything with the range after passing it to the function.Yes actually this was my point, why not forcing the range to be a forward range? The only advantage I see is that it allows to count the number of elements of an input range, but then I guess it would need to be mentioned in the documentation somewhere, what do you think? Yes my code is wrong sorry about that, I was just trying to figure out how to force the saving in some way...
Jan 22 2016
And coming back to the specialized version of walkLength: in the case where joiner does not implement save, should I still allow the specialized version? (and in this case do what: consume the range, make a copy of "this" and hope for the best, something else?) Maverick Chardet
Jan 22 2016
On 01/22/2016 07:59 AM, Maverick Chardet wrote:And coming back to the specialized version of walkLength: in the case where joiner does not implement save, should I still allow the specialized version? (and in this case do what: consume the range, make a copy of "this" and hope for the best, something else?) Maverick ChardetwalkLength is what walkLength does: it accepts even an input range and has the discretion to consume it. So let them eat cake. -- Andrei
Jan 22 2016