digitalmars.D - Proposal: takeFront and takeBack
- Jonathan M Davis (69/69) Jul 03 2012 This seems like it probably merits a bit of discussion, so I'm bringing ...
- Roman D. Boiko (2/2) Jul 03 2012 The name takeFront is too similar to take, which has very
- Jonathan M Davis (6/8) Jul 03 2012 I have similar concerns, but takeFront and takeBack were the best that I...
- Wouter Verhelst (15/36) Jul 03 2012 Couldn't you just overload popFront?
- monarch_dodra (6/21) Jul 03 2012 You can't overload by return value, so that is not possible.
- bearophile (9/14) Jul 03 2012 I expect a language to have a pop() in its collections, that
- deadalnix (3/24) Jul 04 2012 Many languages does this (it doesn't mean it is the right thing to do).
- Tobias Pankrath (1/3) Jul 04 2012 In C++ it was exception safety, wasn't it?
- Jonathan M Davis (6/10) Jul 04 2012 I believe that it was purely a question of speed. If popFront returns an...
- Tobias Pankrath (4/18) Jul 04 2012 If you pop from a container and return this the copy constructor
- Jonathan M Davis (10/31) Jul 04 2012 Yes. But the cost of copying the value and the cost of the exception are...
- Tobias Pankrath (7/23) Jul 04 2012 How would you implement an returning popFront for a generic
- Jonathan M Davis (16/44) Jul 04 2012 Okay. I see what you mean now. I thought that the idea was that you woul...
- deadalnix (8/11) Jul 04 2012 If you implement popFront as
- deadalnix (9/19) Jul 04 2012 If you return by reference, you get an overhead of 1 MOV instruction.
- Timon Gehr (6/22) Jul 04 2012 You get an 'overhead' of calling front, which is potentially unbounded.
- Jonathan M Davis (20/45) Jul 04 2012 ions in
- travert phare.normalesup.org (Christophe Travert) (6/6) Jul 05 2012 If you really don't need the value, you could devise a "justPop" method
- David Piepgrass (19/19) Jul 05 2012 (grain of salt, I'm new to D.)
- Jonathan M Davis (16/38) Jul 05 2012 That's generally just fine. The only ranges in Phobos which would have a...
- monarch_dodra (18/41) Jul 03 2012 Great job! I think takeFront, is a really good idea. It would
- Tobias Pankrath (1/1) Jul 03 2012 consumeFront?
- Roman D. Boiko (2/3) Jul 03 2012 +1
- Jonathan M Davis (3/4) Jul 03 2012 That seems like a good suggestion.
- travert phare.normalesup.org (Christophe Travert) (15/15) Jul 03 2012 takeFront implementation is dangerous for ranges which invalidates their...
- Jonathan M Davis (13/26) Jul 03 2012 Hmm. I hadn't thought of that. That could be a good reason not to do thi...
- Roman D. Boiko (11/38) Jul 03 2012 Just an idea (feels dirty, but still): such range could redefine
- travert phare.normalesup.org (Christophe Travert) (11/11) Jul 03 2012 Range have been designed with the idea that front is valid until next
- Roman D. Boiko (13/31) Jul 03 2012 takeFront -> consumeFront
- Roman D. Boiko (8/8) Jul 03 2012 Still, ranges which invalidate front on popFront could defer that
- Roman D. Boiko (6/36) Jul 03 2012 An alternative is not to define a free function. So if
- Jonathan M Davis (24/43) Jul 03 2012 An alternative would be to make it so that takeFront (or consumeFront, a...
- Roman D. Boiko (8/84) Jul 03 2012 You bet me by 19 seconds and a better explanation :)
- Dmitry Olshansky (12/52) Jul 03 2012 I was about to propose fetchFront/fetchBack with similar semantics.
This seems like it probably merits a bit of discussion, so I'm bringing it up here rather than simply opening a pull request. At present, for some ranges (variably-lengthed ranges such as strings in particular), calling front incurs a cost which popFront at least partially duplicates. So, the range primitives are inherently inefficient in that they force you to incur that extra cost as you iterate over the range. Ideally, there would be a way to get front and pop it off at the same time so that you incur the cost only once (either that or have front cache its result in a way that lets it avoid the extra cost in popFront when popFront is called - though that wouldn't work with strings, since for them, the range primitives are free functions). So, I'm proposing takeFront and takeBack: https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b For most ranges, takeFront does this: auto takeFront(R)(ref R range) if(isInputRange!R && !isNarrowString!R) { assert(!range.empty); auto retval = range.front; range.popFront(); return retval; } So, it's pretty much the same cost as using front and popFront separately (whether it costs more or less probably depends on the exact code and optimizations, but it should be comparable). But for strings, it looks like this auto takeFront(R)(ref R range) if(isNarrowString!R) { import std.utf; assert(!range.empty); size_t index = 0; auto retval = decode(range, index); range = range[index .. $]; return retval; } So, for strings, it'll be more efficient to use takeFront than calling front and popFront separately. The idea then is that any user-defined range which can implement takeFront more efficiently than the default will define it. Then range- based functions use takeFront - e.g. range.takeFront() - and if the user- defined range implements a more efficient version, that one is used and they gain the extra efficiency, or if they don't, then the free function is used with UFCS, and they incur the same cost that they'd incur calling front and popFront separately. So, it's invisible to range-based functions whether a range actually implements takeFront itself. takeBack does the same thing as takeFront but it does it with back and popBack for bidirectional ranges. I _think_ that this is a fairly clean solution to the problem, but someone else might be able to point out why this is a bad idea, or they might have a better idea. And this will have a definite impact on how ranges are normally used if we add this, so I'm bringing it up here for discussion. Opinions? Thoughts? Insights? Oh, and if we go with this, ideally, the compiler would be updated to use takeFront for foreach instead of front and popFront if a range implements it (but still do it the current way if it doesn't). So, if typeof(range) implements takeFront, foreach(e; range) {...} would then become for(auto _range = range; !_range.empty;) { auto e = _range.takeFront(); ... } instead of for(auto _range = range; !_range.empty; _range.popFront()) { auto e = _range.front(); ... } but that's an optimization which could be added later. - Jonathan M Davis
Jul 03 2012
The name takeFront is too similar to take, which has very different semantics.
Jul 03 2012
On Tuesday, July 03, 2012 19:17:40 Roman D. Boiko wrote:The name takeFront is too similar to take, which has very different semantics.I have similar concerns, but takeFront and takeBack were the best that I could come up with. frontPopFront or retPopFront and the like would be horrible. So, suggestions are welcome. The concept is more important than the name, and if someone can come up with a better name, then I'm all for it. - Jonathan M Davis
Jul 03 2012
Jonathan M Davis <jmdavisProg gmx.com> writes:This seems like it probably merits a bit of discussion, so I'm bringing it up here rather than simply opening a pull request. At present, for some ranges (variably-lengthed ranges such as strings in particular), calling front incurs a cost which popFront at least partially duplicates. So, the range primitives are inherently inefficient in that they force you to incur that extra cost as you iterate over the range. Ideally, there would be a way to get front and pop it off at the same time so that you incur the cost only once (either that or have front cache its result in a way that lets it avoid the extra cost in popFront when popFront is called - though that wouldn't work with strings, since for them, the range primitives are free functions). So, I'm proposing takeFront and takeBack: https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b For most ranges, takeFront does this: auto takeFront(R)(ref R range) if(isInputRange!R && !isNarrowString!R) { assert(!range.empty); auto retval = range.front; range.popFront(); return retval; }Couldn't you just overload popFront? have a void popFront which throws off the first element without returning anything, and an auto popFront which does return data. I'd always been taught that "pop" means "read a bit and chop it off", which means that having to first read the front and then pop it off (i.e., in two separate methods) feels rather counterintuitive to me. That's in addition to the fact that yes, there's a performance issue. But hey, I've only been doing this D thing for a few weeks, so feel free to ignore me if I'm not making any sense :-) -- The volume of a pizza of thickness a and radius z can be described by the following formula: pi zz a
Jul 03 2012
On Tuesday, 3 July 2012 at 17:22:17 UTC, Wouter Verhelst wrote:Jonathan M Davis <jmdavisProg gmx.com> writes: Couldn't you just overload popFront? have a void popFront which throws off the first element without returning anything, and an auto popFront which does return data. I'd always been taught that "pop" means "read a bit and chop it off", which means that having to first read the front and then pop it off (i.e., in two separate methods) feels rather counterintuitive to me. That's in addition to the fact that yes, there's a performance issue. But hey, I've only been doing this D thing for a few weeks, so feel free to ignore me if I'm not making any sense :-)You can't overload by return value, so that is not possible. As far as I can recall, I've always been taught that pop does NOT (should not) return a value. Rationale being it makes you pay for a read/copy you may not have asked for. That's the way C++ does it, and is what I've come to expect from any language.
Jul 03 2012
monarch_dodra:As far as I can recall, I've always been taught that pop does NOT (should not) return a value. Rationale being it makes you pay for a read/copy you may not have asked for.I think it's also a matter of exception safety.That's the way C++ does it, and is what I've come to expect from any language.I expect a language to have a pop() in its collections, that returns the first item and removes it from the collection, as in Python. In D sometimes I put the front and popfront on the same line of code, because I think of them almost as a single operation :-) Bye, bearophile
Jul 03 2012
Le 03/07/2012 19:27, monarch_dodra a écrit :On Tuesday, 3 July 2012 at 17:22:17 UTC, Wouter Verhelst wrote:Many languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?Jonathan M Davis <jmdavisProg gmx.com> writes: Couldn't you just overload popFront? have a void popFront which throws off the first element without returning anything, and an auto popFront which does return data. I'd always been taught that "pop" means "read a bit and chop it off", which means that having to first read the front and then pop it off (i.e., in two separate methods) feels rather counterintuitive to me. That's in addition to the fact that yes, there's a performance issue. But hey, I've only been doing this D thing for a few weeks, so feel free to ignore me if I'm not making any sense :-)You can't overload by return value, so that is not possible. As far as I can recall, I've always been taught that pop does NOT (should not) return a value. Rationale being it makes you pay for a read/copy you may not have asked for. That's the way C++ does it, and is what I've come to expect from any language.
Jul 04 2012
Many languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?In C++ it was exception safety, wasn't it?
Jul 04 2012
On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M DavisMany languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?In C++ it was exception safety, wasn't it?
Jul 04 2012
On Wednesday, 4 July 2012 at 18:40:50 UTC, Jonathan M Davis wrote:On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:If you pop from a container and return this the copy constructor / postblit will run. If in this moment a exception is thrown, than this value is lost.I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M DavisMany languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?In C++ it was exception safety, wasn't it?
Jul 04 2012
On Wednesday, July 04, 2012 22:10:36 Tobias Pankrath wrote:On Wednesday, 4 July 2012 at 18:40:50 UTC, Jonathan M Davis wrote:Yes. But the cost of copying the value and the cost of the exception are separate. Simply popping off the element could throw an exception (in the case of strings in D, you'll get a UTFException if the string is malformed). So, _regardless_ of whether you return an element, you potentially risk an exception being thrown (depending on the implementation of the container or range). And so I think that exceptions are pretty much irrelevant to the discussion of whether returning an element from popFront is a good idea or not. - Jonathan M DavisOn Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:If you pop from a container and return this the copy constructor / postblit will run. If in this moment a exception is thrown, than this value is lost.I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M DavisMany languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?In C++ it was exception safety, wasn't it?
Jul 04 2012
Yes. But the cost of copying the value and the cost of the exception are separate.The argument is not about performance, it's about loosing values.Simply popping off the element could throw an exception (in the case of strings in D, you'll get a UTFException if the string is malformed). So, _regardless_ of whether you return an element, you potentially risk an exception being thrown (depending on the implementation of the container or range).It's not the same.And so I think that exceptions are pretty much irrelevant to the discussion of whether returning an element from popFront is a good idea or not.How would you implement an returning popFront for a generic container that does not leak values? It's not possible (at least in C++, not sure in D), so it is relevant. Maybe it was not the prime reason, but it is a good reason non the less.
Jul 04 2012
On Wednesday, July 04, 2012 22:34:46 Tobias Pankrath wrote:Okay. I see what you mean now. I thought that the idea was that you would somehow avoid exceptions when popping such that the fact returning the element could generate an exception made it then possible to have exceptions be thrown from popFront, which would be a problem. Well, in most cases, I honestly don't think that that would be a problem, since if an exception is thrown from the copy constructor / postblit of the object being returned, it's probably foobared anyway, and having it on the front of the range (or container) would really buy you anything. And if you didn't actually want the element, then it doesn't really matter anyway - beyond the fact that you now have an extra way to have an exception be thrown. But it is true that you avoid that whole issue if you separate returning the element from popping it. Regardless, performance alone makes it worthwhile to not have popFront return anything IMHO, and that's the argument that I've always heard. - Jonathan M DavisYes. But the cost of copying the value and the cost of the exception are separate.The argument is not about performance, it's about loosing values.Simply popping off the element could throw an exception (in the case of strings in D, you'll get a UTFException if the string is malformed). So, _regardless_ of whether you return an element, you potentially risk an exception being thrown (depending on the implementation of the container or range).It's not the same.And so I think that exceptions are pretty much irrelevant to the discussion of whether returning an element from popFront is a good idea or not.How would you implement an returning popFront for a generic container that does not leak values? It's not possible (at least in C++, not sure in D), so it is relevant. Maybe it was not the prime reason, but it is a good reason non the less.
Jul 04 2012
Le 04/07/2012 22:34, Tobias Pankrath a écrit :If you implement popFront as { popFront(); // current popFront return front; } Then no value is lost with Exception. If an Exception is thrown, this is dubious anyway, because the value is likely to be irrelevant anyway.Yes. But the cost of copying the value and the cost of the exception are separate.The argument is not about performance, it's about loosing values.
Jul 04 2012
Le 04/07/2012 20:40, Jonathan M Davis a écrit :On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:If you return by reference, you get an overhead of 1 MOV instruction. This is ridiculous. You win nothing else, because the register things are returned in is a trash register, so you have the returned thing, or garbage in it, you can assume nothing else. The cost is neglectible. And any compiler is able to reduce that cost to 0 when inlining if the value is not read. If the compiler don't inline, an extra MOV instruction is not will slow you down.I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M DavisMany languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?In C++ it was exception safety, wasn't it?
Jul 04 2012
On 07/04/2012 11:53 PM, deadalnix wrote:Le 04/07/2012 20:40, Jonathan M Davis a écrit :You get an 'overhead' of calling front, which is potentially unbounded. struct Map(...){ ... property auto ref front() { return f(otherRange.front); } }On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:If you return by reference, you get an overhead of 1 MOV instruction. This is ridiculous.I believe that it was purely a question of speed. If popFront returns an element, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about exceptions in either case, depending on the what popFront is doing. - Jonathan M DavisMany languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?In C++ it was exception safety, wasn't it?
Jul 04 2012
On Thursday, July 05, 2012 00:03:02 Timon Gehr wrote:On 07/04/2012 11:53 PM, deadalnix wrote:rns anLe 04/07/2012 20:40, Jonathan M Davis a =C3=A9crit :On Wednesday, July 04, 2012 12:55:44 Tobias Pankrath wrote:=20 I believe that it was purely a question of speed. If popFront retu=Many languages does this (it doesn't mean it is the right thing to do). Do you know why this shouldn't be done ?=20 In C++ it was exception safety, wasn't it?ions inelement, then that element gets copied, and if you didn't need to access the element, then that's wasted cycles. You have to worry about except=n.either case, depending on the what popFront is doing. =20 - Jonathan M Davis=20 If you return by reference, you get an overhead of 1 MOV instructio=d.This is ridiculous.=20 You get an 'overhead' of calling front, which is potentially unbounde==20 struct Map(...){ ... property auto ref front() { return f(otherRange.front); } }Not to mention, in many cases, you _can't_ return a ref like deadalnix=20= suggests, because that would require that front be returning an lvalue,= and it=20 often isn't. And in the case of strings - which is the prime reason for= this=20 discussion in the first place - it definitely can't return an lvalue, b= ecause=20 front is calculated (that and returning a value from popFront would mak= e=20 popFront much more expensive in the case where you _don't_ actually nee= d to=20 access front, so making popFront return an element doesn't help the str= ing=20 case at all). - Jonathan M Davis
Jul 04 2012
If you really don't need the value, you could devise a "justPop" method that does not return (by the way, overloading by return type would be an amazing feature here). The idea is not "we should return a value everytime we pop", but "we should pop when we return a value". -- Christophe
Jul 05 2012
(grain of salt, I'm new to D.) I'd vote for consumeFront being always available, because it's distinctly more convenient to call one function instead of two, especially when you expect that making a copy of front is cheap (e.g. a collection of pointers, numbers or slices). Ranges where 'front' returns a pointer to a buffer that popFront destroys (overwrites) are surely in the minority, right? So I hope they would be retrofitted to support consumeFront. But, given that popFront is allowed to be destructive to the value of front, by re-using the same buffer (and that the proposed consumeFront might also be implemented with 'delayed destruction' to front), I am wondering how one is supposed to implement generic code correctly when this is unacceptable, e.g.: void append(Range1,Range2)(Range1 input, ref Range2 output) { // Usually works, unless input.popFront happens to be destructive? foreach(e; input) output ~= e; }
Jul 05 2012
On Thursday, July 05, 2012 23:59:47 David Piepgrass wrote:(grain of salt, I'm new to D.) I'd vote for consumeFront being always available, because it's distinctly more convenient to call one function instead of two, especially when you expect that making a copy of front is cheap (e.g. a collection of pointers, numbers or slices). Ranges where 'front' returns a pointer to a buffer that popFront destroys (overwrites) are surely in the minority, right? So I hope they would be retrofitted to support consumeFront. But, given that popFront is allowed to be destructive to the value of front, by re-using the same buffer (and that the proposed consumeFront might also be implemented with 'delayed destruction' to front), I am wondering how one is supposed to implement generic code correctly when this is unacceptable, e.g.: void append(Range1,Range2)(Range1 input, ref Range2 output) { // Usually works, unless input.popFront happens to be destructive? foreach(e; input) output ~= e; }That's generally just fine. The only ranges in Phobos which would have any problem with that would be the ones in std.stdio which reuse a buffer for front, and I wouldn't really expect other ranges to have that sort of problem unless they were doing the same thing. It mostly becomes an issue of knowing which ranges you have to be careful of that way, and being careful what you pass them to. So, I wouldn't worry about it all that much in general, but something like consumeFront would break them entirely, since if it were introduced, then presumably, it would be used fairly heavily by many range- based functions, and it's guaranteed not to work with them, whereas a situation like your code above would be fairly rare, and it would generally be clear to the user of a range like ByLine that the function that they're passing it to would be doing something like appending each element in it into a new range, which wouldn't work, so they'd take precautions, like wrapping it in a call to map which duped the elements before passing it to append. - Jonathan M Davis
Jul 05 2012
On Tuesday, 3 July 2012 at 16:37:20 UTC, Jonathan M Davis wrote:... Oh, and if we go with this, ideally, the compiler would be updated to use takeFront for foreach instead of front and popFront if a range implements it (but still do it the current way if it doesn't). So, if typeof(range) implements takeFront, foreach(e; range) {...} would then become for(auto _range = range; !_range.empty;) { auto e = _range.takeFront(); ... } instead of for(auto _range = range; !_range.empty; _range.popFront()) { auto e = _range.front(); ... } but that's an optimization which could be added later. - Jonathan M DavisGreat job! I think takeFront, is a really good idea. It would also finally fix the whole "should pop return a value" debate. I'd only be afraid the name might clash (in spirit) with the existing "take", "takeOne" and "takeNone". Not so long ago, I asked for a way to get the last elements of a subrange, and proposed a method called "takeBack"... Maybe "popMoveFront" would be more addequate/Accurate? This would make it more in line with the existing "moveFront" defined inside range, As well as the "popFront" which would be inside the range implementation: It would make the trio "front, popFront, popMoveFront". I think wiring it into foreach is also a good idea, but would require compiler support of course. Personally though, regarding foreach, I'd appreciate it more if we could first get foreach with ref to work with "front assign" ( http://forum.dlang.org/thread/ceftaiklanejfhodbpix forum.dlang.org ) That said, your proposal is probably less costly to implement.
Jul 03 2012
On Tuesday, 3 July 2012 at 17:29:45 UTC, Tobias Pankrath wrote:consumeFront?+1
Jul 03 2012
On Tuesday, July 03, 2012 19:29:44 Tobias Pankrath wrote:consumeFront?That seems like a good suggestion. - Jonathan M Davis
Jul 03 2012
takeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way? Are there so many ranges that duplicate efforts in front and popFront, where there is no easy workarround? Would it be such an improvement in performance to add takeFront? strings being immutable, my guess would be that it is not that hard for the compiler to optimize a foreach loop (it should be easy to profile this...). Even if there is an efficiency issue, you could easily solve it by implementing a range wrapper or an opApply method to make foreach faster. -- Christophe
Jul 03 2012
On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:takeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way?Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it. I don't know. It's certainly something to think about. We may need a different solution.Are there so many ranges that duplicate efforts in front and popFront, where there is no easy workarround? Would it be such an improvement in performance to add takeFront? strings being immutable, my guess would be that it is not that hard for the compiler to optimize a foreach loop (it should be easy to profile this...). Even if there is an efficiency issue, you could easily solve it by implementing a range wrapper or an opApply method to make foreach faster.foreach isn't the problem. strings don't even use the range API for foreach. It's operating on them outside of foreach that's the problem. - Jonathan M Davis
Jul 03 2012
On Tuesday, 3 July 2012 at 17:40:24 UTC, Jonathan M Davis wrote:On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:Just an idea (feels dirty, but still): such range could redefine takeFront, and maintain state. takeFront would defer a call to popFront until the next operation. All of front, popFront, takeFront and empty would check state (whether popFront has been deferred), and execute popFront first. Side effect would be that a call to empty would invalidate result of takeFront, but that could be changed also: empty instead of calling popFront would check whether there is anything else after the front element. This is much more complicated, but would only be necessary for such containers.takeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way?Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it.
Jul 03 2012
Range have been designed with the idea that front is valid until next call to popFront. If popFront was to be called right after front, then it would just be a popFront that returns a value, and maybe a justPop or something if you don't want to copy the value. It's delicate to come now and decide that if a previously returned front value is no longer valid after a call to popFront, it should be documented by an enum. Who is going to review all libraries (written and to come) to make sure the enum is placed when it should be? The reverse should be done instead: if you want you range to be optimized by calling takeFront, define something (for example... takeFront). Then use algorithms that are specialized for takeFront.
Jul 03 2012
On Tuesday, 3 July 2012 at 18:12:54 UTC, travert phare.normalesup.org (Christophe Travert) wrote:Range have been designed with the idea that front is valid until next call to popFront. If popFront was to be called right after front, then it would just be a popFront that returns a value, and maybe a justPop or something if you don't want to copy the value. It's delicate to come now and decide that if a previously returned front value is no longer valid after a call to popFront, it should be documented by an enum. Who is going to review all libraries (written and to come) to make sure the enum is placed when it should be? The reverse should be done instead: if you want you range to be optimized by calling takeFront, define something (for example... takeFront). Then use algorithms that are specialized for takeFront.takeFront -> consumeFront hasConsume has been proposed by Jonathan already. But not having to modify all existing ranges which invalidate value returned by front might be a good argument against my hack also. However I'm not sure there are really many of such ranges. Those must return something with reference semantics in order to be able to invalidate it. Most common case would be when front value is always stored in the same memory (a buffer). The benefit of my hacking idea is that clients would need no special casing.
Jul 03 2012
Still, ranges which invalidate front on popFront could defer that (store a flag that popFront or consumeFront has been called and invalidate value previously returned from front in the next call to front). That would be intrusive with respect to such ranges, and a breaking change. But it would simplify client code significantly. (This doesn't mean that I'm insisting, or strongly prefer some of my two ideas, just wanted to clearly restate the second one.)
Jul 03 2012
On Tuesday, 3 July 2012 at 17:40:24 UTC, Jonathan M Davis wrote:On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:An alternative is not to define a free function. So if consumeFront is defined, it may be used by the algorithm, otherwise simply call to front and popFront in the algorithm. If branching is necessary, then the is no value in a free function (it wouldn't optimize for speed any way).takeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way?Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it. I don't know. It's certainly something to think about. We may need a different solution.
Jul 03 2012
On Tuesday, July 03, 2012 10:40:07 Jonathan M Davis wrote:On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:An alternative would be to make it so that takeFront (or consumeFront, as someone suggested, since that's a better name) would be defined only for ranges which it helps. So, similar to hasSlicing, you get something like hasConsume, and then a range-based function which wants to use consumeFront or consumeBack checks hasConsume!R and uses it on that particular static if branch if it does and uses another branch with front and popFront if it doesn't. std.range.consumeFront then works only with strings (and maybe arrays). However, it seemed to me that a major upside of consumeFront as I proposed it was that you could always just use it, and it would do the most efficient thing, so you _didn't_ have to special case for it, whereas with hasConsume, you would. But if you can't use consumeFront safely on all ranges, then that doesn't really work. It's still worth something but not as much. You can already special case strings (and in many cases need to). What consumeFront does is make it so that functions operating on wrappers around strings (i.e. the results of functions like filter or map) can operate on them more efficiently as well. It also makes it so that user-defined ranges can get that benefit from algorithms that know nothing about those ranges and therefore don't special case them (e.g. std.algorithm functions can then be more efficient for user-defined ranges which can define a more efficient consumeFront). So, even if we had hasConsume and functions had to special case in order to use consumeFront or consumeBack, it would still be beneficial. However, it would be nice to have a solution which _didn't_ require special casing. - Jonathan M DavistakeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way?Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it. I don't know. It's certainly something to think about. We may need a different solution.
Jul 03 2012
On Tuesday, 3 July 2012 at 18:00:55 UTC, Jonathan M Davis wrote:On Tuesday, July 03, 2012 10:40:07 Jonathan M Davis wrote:You bet me by 19 seconds and a better explanation :) So far we have two options: cleaner code because there would be no need to hack containers which invalidate their front value on a call to popFront, or a hack which I proposed above, but localized in such containers. By the way, some of such containers could potentially defer invalidating front till the next call to front.On Tuesday, July 03, 2012 17:31:21 Christophe Travert wrote:An alternative would be to make it so that takeFront (or consumeFront, as someone suggested, since that's a better name) would be defined only for ranges which it helps. So, similar to hasSlicing, you get something like hasConsume, and then a range-based function which wants to use consumeFront or consumeBack checks hasConsume!R and uses it on that particular static if branch if it does and uses another branch with front and popFront if it doesn't. std.range.consumeFront then works only with strings (and maybe arrays). However, it seemed to me that a major upside of consumeFront as I proposed it was that you could always just use it, and it would do the most efficient thing, so you _didn't_ have to special case for it, whereas with hasConsume, you would. But if you can't use consumeFront safely on all ranges, then that doesn't really work. It's still worth something but not as much. You can already special case strings (and in many cases need to). What consumeFront does is make it so that functions operating on wrappers around strings (i.e. the results of functions like filter or map) can operate on them more efficiently as well. It also makes it so that user-defined ranges can get that benefit from algorithms that know nothing about those ranges and therefore don't special case them (e.g. std.algorithm functions can then be more efficient for user-defined ranges which can define a more efficient consumeFront). So, even if we had hasConsume and functions had to special case in order to use consumeFront or consumeBack, it would still be beneficial. However, it would be nice to have a solution which _didn't_ require special casing. - Jonathan M DavistakeFront implementation is dangerous for ranges which invalidates their front value when popFront is called, for instance, File.byLine. Thus takeFront will have to be used with care: any range implement takeFront (because of the template and USFC), but it may not be valid. That makes the range interface more complicated: There is a takeFront property, but you have to check it is safe to use... how do you check that by the way?Hmm. I hadn't thought of that. That could be a good reason not to do this. I'm not quite sure how to get around that. Hmmm... It would arguably be a bit ugly, but a range which couldn't safely be used with takeFront could have an enum on it which indicated that, and takeFront would fail to compile if used with such a range. Of course, that would mean that you'd have to special case such ranges in any range-based function which used takeFront so that there was a branch which didn't use takeFront for ranges which couldn't use it. I don't know. It's certainly something to think about. We may need a different solution.
Jul 03 2012
On 03-Jul-12 20:37, Jonathan M Davis wrote:This seems like it probably merits a bit of discussion, so I'm bringing it up here rather than simply opening a pull request. At present, for some ranges (variably-lengthed ranges such as strings in particular), calling front incurs a cost which popFront at least partially duplicates. So, the range primitives are inherently inefficient in that they force you to incur that extra cost as you iterate over the range. Ideally, there would be a way to get front and pop it off at the same time so that you incur the cost only once (either that or have front cache its result in a way that lets it avoid the extra cost in popFront when popFront is called - though that wouldn't work with strings, since for them, the range primitives are free functions). So, I'm proposing takeFront and takeBack:I was about to propose fetchFront/fetchBack with similar semantics. Thanks for pushing this proposal it looks good. My initial intent however was to add Variable Length Range as a concept. ... Now I think binary predicate hasFetch a-la hasSlicing is better.https://github.com/jmdavis/phobos/commit/5bfa8126fa14a539fee67807821ec0a10503f27b For most ranges, takeFront does this: auto takeFront(R)(ref R range) if(isInputRange!R && !isNarrowString!R) { assert(!range.empty); auto retval = range.front; range.popFront(); return retval; }Aye. Yet there is indeed a problem with pseudo-ranges that reuse 'slot' for front on each popFront. Well they always been brittle.So, for strings, it'll be more efficient to use takeFront than calling front and popFront separately. The idea then is that any user-defined range which can implement takeFront more efficiently than the default will define it. Then range- based functions use takeFront - e.g. range.takeFront() - and if the user- defined range implements a more efficient version, that one is used and they gain the extra efficiency, or if they don't, then the free function is used with UFCS, and they incur the same cost that they'd incur calling front and popFront separately. So, it's invisible to range-based functions whether a range actually implements takeFront itself. takeBack does the same thing as takeFront but it does it with back and popBack for bidirectional ranges. I _think_ that this is a fairly clean solution to the problem, but someone else might be able to point out why this is a bad idea, or they might have a better idea. And this will have a definite impact on how ranges are normally used if we add this, so I'm bringing it up here for discussion. Opinions? Thoughts? Insights? Oh, and if we go with this, ideally, the compiler would be updated to use takeFront for foreach instead of front and popFront if a range implements it (but still do it the current way if it doesn't). So, if typeof(range) implements takeFront,Right. In fact as we've seen above with e.g. stdin.byLine not every range can do fetchFront correctly so it's a strict subset. That was the reason for current trio of basic operations BTW. -- Dmitry Olshansky
Jul 03 2012