digitalmars.D - Transience of .front in input vs. forward ranges
- H. S. Teoh (142/142) Oct 30 2012 Now that Andrei is back (I think?), I want to bring up this discussion
- deadalnix (62/64) Oct 30 2012 Interesting post.
- H. S. Teoh (43/113) Oct 30 2012 Hmm. I like this idea! This is much better than my proposal. So let's
- deadalnix (3/4) Oct 31 2012 The obvious drawback is that this make invalidating ranges harder to
- H. S. Teoh (12/19) Oct 31 2012 Actually, there is another problem: many algorithms' output will be
- deadalnix (4/20) Oct 31 2012 No it isn't always transitive. But that is true that it is in some cases...
- H. S. Teoh (22/41) Oct 31 2012 But then can the alternate behaviour be used at all? For example, joiner
- deadalnix (2/40) Nov 01 2012 Would joiner be able to take advantage of this if it was known or not ?
- H. S. Teoh (18/65) Nov 01 2012 I have a working version of joiner that doesn't assume the persistence
- deadalnix (23/23) Nov 01 2012 This is up to the consumer to choose if .front can be oblitered on
- H. S. Teoh (11/40) Nov 02 2012 Ah, I see. That makes sense. So basically it's not the source (or any
- Jonathan M Davis (7/14) Nov 02 2012 It's no different form propogating slicing or random access or whatnot. ...
- deadalnix (14/28) Nov 04 2012 The whole point of my proposal is to avoid adding isTransient or
- Andrei Alexandrescu (19/24) Nov 04 2012 IMHO this property is deducible from the others:
- Dmitry Olshansky (20/44) Nov 04 2012 Or rather onePass!R == ...
- Andrei Alexandrescu (10/18) Nov 04 2012 I didn't claim that. My strongest claim was:
- Dmitry Olshansky (19/37) Nov 04 2012 Indeed, but it wasn't your claim that I meant. It's fine with me but
- Jonathan M Davis (35/54) Nov 04 2012 The reality is that in the general case, you can't know for certain whet...
- deadalnix (5/29) Nov 04 2012 You can stop here. This shows that you didn't understood the proposal.
- Andrei Alexandrescu (16/35) Nov 04 2012 Indeed I'd misunderstood. So I went back and my current understanding is...
- deadalnix (11/25) Nov 04 2012 To sum it up, a transformation range should implement .transient if it
- Andrei Alexandrescu (9/12) Nov 04 2012 It doesn't fit the (admittedly difficult to fulfill) desideratum that
-
monarch_dodra
(13/19)
Nov 04 2012
Yes, please! (
Although I don't necessarily agree with - H. S. Teoh (61/73) Nov 04 2012 Actually, it does. The proposal was to modify byLine so that it returns
- deadalnix (3/73) Nov 05 2012 I think I'll hire you when I need to promote my ideas :D This is much
- H. S. Teoh (14/29) Nov 03 2012 [...]
- Era Scarecrow (17/27) Nov 03 2012 From watching and gleaming from what I have so far, I can only
- Jonathan M Davis (60/79) Nov 03 2012 That's just trying to be able to tell a range whether it should be trans...
- deadalnix (2/9) Nov 04 2012 Can you elaborate on that. I definitively don't see a problem that big h...
- Jonathan M Davis (22/33) Nov 04 2012 eed
- deadalnix (10/30) Nov 05 2012 HS Teoh explained it nicely.
- Jonathan M Davis (8/20) Nov 05 2012 You still need to wrap it in every wrapper range which could possibly su...
- deadalnix (25/44) Nov 05 2012 As shown before, most wrapper range wouldn't have to do a thing except
- H. S. Teoh (57/78) Nov 05 2012 You *can* wrap it if the wrapper range can support transience. It
- Andrei Alexandrescu (13/27) Nov 05 2012 It's not undefined behavior, it's just surprising behavior. UB would be
- deadalnix (31/47) Nov 05 2012 Unless you know the internal implementation of the range, it is
- Andrei Alexandrescu (8/42) Nov 05 2012 Good solutions are found in the minds of people. Starting from the idea
- deadalnix (12/22) Nov 05 2012 I certainly don't ! I'd be happy to switch to another solution granted
- H. S. Teoh (31/40) Nov 05 2012 I'd like to hear a counterproposal to the .transient idea. Let's not get
- Andrei Alexandrescu (7/10) Nov 05 2012 I'd like to build more of this argument. I pointed out that your range
- deadalnix (11/17) Nov 05 2012 Yes my code can also fall in the .dup stuff.
- H. S. Teoh (12/22) Nov 05 2012 [...]
- Andrei Alexandrescu (8/13) Nov 05 2012 I think the simplification of input range = transient and forward range
- deadalnix (9/21) Nov 05 2012 At this point, if transient is out I'd prefer Jonathan's proposal were
- H. S. Teoh (59/79) Nov 05 2012 Then std.array.array is broken by definition, and cannot be implemented
- Jonathan M Davis (36/55) Nov 05 2012 We can create an hasTransientFront trait for checking whether front is
- H. S. Teoh (70/87) Nov 05 2012 [...]
- Tommi (58/58) Nov 05 2012 I don't think this whole issue has anything to do with ranges. I
- H. S. Teoh (10/21) Nov 05 2012 [...]
- Tommi (5/8) Nov 05 2012 I'm not familiar with that definition of generic code. But I do
- H. S. Teoh (12/21) Nov 05 2012 OK I worded that poorly. All I meant was that currently, there is no
- Andrei Alexandrescu (13/31) Nov 05 2012 Here's where user defined @tributes would help a lot. We'd then define
- deadalnix (2/34) Nov 06 2012 You mentioned me once that AOP was useless in D.
- Andrei Alexandrescu (3/42) Nov 06 2012 What's the connection?
- deadalnix (3/21) Nov 06 2012 This is OT. But this @owned stuff coupled with code that is generated
- Timon Gehr (3/36) Nov 06 2012 That is actually the first thing that I will use the new attribute
- Mehrdad (6/14) Nov 05 2012 C++ as a language doesn't, but if you follow the convention that
- Andrei Alexandrescu (5/13) Nov 05 2012 Languages commonly have trouble defining comparison and copying
- Andrei Alexandrescu (11/16) Nov 05 2012 We could transfer that matter to the type of .front itself, i.e. define
- Jonathan M Davis (42/59) Nov 05 2012 peekFront would work better than copy, because whether front needs to be...
- deadalnix (11/40) Nov 06 2012 Assuming you call front on the source range, you don't need to copy :
- deadalnix (3/18) Nov 06 2012 This have the same issue than .transient have in regard of transformer
- H. S. Teoh (33/76) Nov 06 2012 I like .peekFront. It's easy to understand, and is self-documenting. And
- Mehrdad (6/14) Nov 06 2012 That's not what "peek" means in any of the languages I've seen.
- deadalnix (2/18) Nov 06 2012 Is it possible to have the pro and cons of peekFront vs transient ?
- H. S. Teoh (55/75) Nov 07 2012 I'll give it a shot, since nobody else is replying:
- deadalnix (16/88) Nov 07 2012 .transient seems simpler to me in this case. Adding one line is easier
- Jonathan M Davis (9/17) Nov 07 2012 Why would you need to duplicate anything?. If you can implement it using...
- deadalnix (31/48) Nov 09 2012 OK, seeing your answer and H. S Teoh, I think I badly expressed myself,
- H. S. Teoh (11/52) Nov 09 2012 Hmm, you're right. At first I thought .front would just be a thin
- Tommi (11/11) Nov 12 2012 Now I finally see that deepDup/deepCopy/clone is not a (good)
- Tommi (13/21) Nov 12 2012 ...forgot to add how this relates to this issue:
- Jonathan M Davis (12/15) Nov 12 2012 That wouldn't work. It's the complete opposite of what a generative rang...
- Tommi (24/36) Nov 12 2012 I didn't understand any of those arguments, so perhaps I still
- Jonathan M Davis (22/64) Nov 12 2012 With regards to this discussion, transience means that front is a refere...
- Tommi (11/30) Nov 12 2012 Oh, it was just a matter of miscommunication then. I bet you
- Tommi (16/19) Nov 12 2012 "..so that all types had value semantics". That's a bit too
- Jonathan M Davis (9/12) Nov 09 2012 In most cases, it still won't be a problem. How much in the way of code ...
- Mehrdad (3/11) Nov 09 2012 Quite a ton, if it's calculated lazily.
- H. S. Teoh (10/28) Nov 07 2012 [...]
- H. S. Teoh (13/34) Nov 04 2012 [...]
- Andrei Alexandrescu (14/40) Nov 04 2012 I think at this point we should streamline and simplify ranges while
- H. S. Teoh (29/56) Nov 04 2012 [...]
- Andrei Alexandrescu (25/55) Nov 04 2012 The express purpose here is to simplify instead of offering highly
Now that Andrei is back (I think?), I want to bring up this discussion again, because I think it's important. Recently, in another thread, it was found that std.algorithm.joiner doesn't work properly with input ranges whose .front value is invalidated by popFront(). Andrei stated that for input ranges .front should not be assumed to return a persistent value, whereas for forward ranges, .front can be assumed to be persistent. However, Jonathan believes that .front should never be transient. Obviously, both cannot be the case simultaneously. So we need to decide exactly what it should be, because the current situation is subtly broken, and this subtle brokenness is pervasive. For example, I recently rewrote joiner to eliminate the assumption that .front is persistent, only to discover that in the unittests, I can't use array() or equal() (or, for that matter, writefln()), because they apparently all make this assumption at some point (I didn't bother to find out exactly where). In other words, right now input ranges really only work with arrays and array-like objects. Not the generic ranges that Andrei has in mind in his article "On Iteration". Many input ranges will subtly break, the prime whipping boy example being byLine (which I hate to bring up because it does not represent the full scope of such transient ranges), a range that returns in-place permutations of an array, or anything that reuses a buffer, really. This situation isn't as simple as input ranges being transient and forward ranges not, though. I want to bring another example besides the dead horse byLine() to the spotlight. Let's say you have a range R that spans all permutations of some starting array A. For efficiency reasons, we don't want to allocate a new array every time we return a permutation; so we have an internal buffer in R that holds the current permutation, which is what .front returns. Then popFront() simply permutes this buffer in-place. Something like this: struct AllPermutations(T) { T[] front, first; bool done; this(T[] initialBuf) { first = initialBuf; front = first.dup; } void popFront() { nextPermutation(current); if (equal(front, first)) done = true; } property bool empty() { return !done; } } This is an input range, according to Andrei's definition. The value of .front is transient, since popFront() modifies it in-place. According to Jonathan's definition, however, this isn't a valid range for that very reason. Now consider what happens if we add this member: auto save() { AllPermutations!T copy; copy.front = this.front; copy.first = this.first; copy.done = this.done; return copy; } This returns a separate instance of the same range, starting with the current permutation, and ending with the original permutation, as before. I submit that this makes it a forward range. However, this fails to be a forward range under Andrei's definition, because forward ranges require .front to be persistent. So we'd have to modify the range to be something like this: struct AllPermutations(T) { T[] current, first; bool done; this(T[] initialBuf) { first = initialBuf; current = first.dup; } void popFront() { nextPermutation(current); if (equal(current, first)) done = true; } property bool empty() { return !done; } property T[] front() { return current.dup; // <--- note this line } auto save() { AllPermutations!T copy; copy.front = this.front; copy.first = this.first; copy.done = this.done; return copy; } } Note that now we have to duplicate the output array every time .front is accessed. So whatever gains we may have had by using nextPermutation to modify the array in-place is lost, just so that we can conform to an arbitrary standard of what a forward range is. Under Jonathan's definition, we'd have to incur this cost regardless of whether we had save() or not, since .front is *always* required to be persistent. But I propose that the correct solution is to recognize that whether or not .front is transient is orthogonal to whether a range is an input range or a forward range. Many algorithms actually don't care if .front is persistent or not (at least in theory), including equal(), find(), map(), count(), reduce(), joiner(). Maybe they *currently* assume that .front is persistent, but they are easily implementable *without* this assumption (I have an implementation of joiner that doesn't make this assumption). Not allowing forward ranges to have transient .front values, or not allowing transient .front values at all, will introduce the arbitrary need for lots of implicit copying, which makes D code inefficient. It will also limit the applicability of std.algorithm and std.range (if said the cost of said copying is unacceptable for a particular application). The problem, of course, is how to check of a range has transient .front or not. I proposed adding a .isTransient member to the range, but this depends on the range implementor to remember to mark the range with .isTransient=true, and we know that coding by convention is not scalable. So here's another idea: template isTransient(R) if (isInputRange!R) { static if (is(typeof(R.front) : immutable(typeof(R.front)))) enum isTransient = false; else enum isTransient = true; } The idea is that value types are implicitly convertible to immutable, and value types are non-transient. Reference types, if they can be made immutable, ensures that calling popFront() will never invalidate them (by definition of immutable). To use byLine as an example, if we want to make it non-transient, we simply have .front return string instead of char[]. Then we can safely use it with any of the existing algorithms that assume that .front won't mutate under them when popFront() is called. Algorithms that *don't* need .front to be persistent can just take input/forward/whatever ranges as usual. In any case, whether we decide to do this or not, we NEED to set down a clear, unambiguous definition of exactly what an input range is, and what a forward range is, and what kind of assumptions may be safely made regarding the value of .front. The current ambiguous situation is a hiding place for subtle bugs, and is unacceptable. T -- Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".
Oct 30 2012
Le 30/10/2012 23:29, H. S. Teoh a écrit :Now that Andrei is back (I think?), I want to bring up this discussion again, because I think it's important.Interesting post. It appears to me that invalidating .front is done for performances reasons most of the time. As this is the unsafe face of the coin, I strongly think that this shouldn't be the default behavior. To me the best solution is to provide a function or a property on ranges that provide the unsafe version of the range. It is easy to provide a default implementation as UFCS that is a NOOP so no code is being broken. With that design, it is possible to provide an always safe interface while allowing algorithm that can handle non persistent .front to run at full speed. As it is always safe to provide a range that persist .front when a range that do not persist it is expected, no code needs to be broken. Obviously, the performance boost will not show up in this case :D byLine and alike will have to be modified to follow this policy, but hopefully, no code should be broken in the process. If I use your permuting example, it should be done as follow : struct InvalidatingAllPermutations(T) { T[] current, first; bool done; this(T[] initialBuf) { first = initialBuf; current = first.dup; } void popFront() { nextPermutation(current); if (equal(current, first)) done = true; } property bool empty() { return !done; } property T[] front() { return current; } auto save() { AllPermutations!T copy; copy.front = this.front; copy.first = this.first; copy.done = this.done; return copy; } } struct AllPermutations(T) { InvalidatingAllPermutations!T innerRange; alias innerRange this; T[] current; this(T[] initialBuf) { innerRange = InvalidatingAllPermutations!T(initialBuf); current = innerRange.front.dup; } property T[] front() { return current; } void popFront() { innerRange.popFront(); current = innerRange.front.dup; } } I do think this is the way that conciliate safe as default but still allow to be fast when needed, without adding much on most code.
Oct 30 2012
On Wed, Oct 31, 2012 at 12:00:50AM +0100, deadalnix wrote: [...]It appears to me that invalidating .front is done for performances reasons most of the time. As this is the unsafe face of the coin, I strongly think that this shouldn't be the default behavior. To me the best solution is to provide a function or a property on ranges that provide the unsafe version of the range. It is easy to provide a default implementation as UFCS that is a NOOP so no code is being broken. With that design, it is possible to provide an always safe interface while allowing algorithm that can handle non persistent .front to run at full speed.Hmm. I like this idea! This is much better than my proposal. So let's say we change byLine() to always return a copy of the buffer, so that .front is never invalidated. Then for algorithms that don't care about .front being invalidated, they can do something like this: auto myAlgo(R)(R inputRange) { auto r = inputRange.fastRange; while (!r.empty) { auto x = r.front; // make use of x r.popFront(); // this invalidates x } return result; } void main() { myAlgo(stdin.byLine()); } We can use UFCS so that fastRange is always defined: // Default implementation auto fastRange(R)(R range) { return range; } For byLine(), then, we have: struct ByLine(...) { // safe implementation property auto fastRange() { struct UnsafeByLine(...) { ... } return UnsafeByLine(this); } } I like this. Very clean, and doesn't break backward compatibility, and allows select algorithms to be optimized for speed without affecting everything else. [...]If I use your permuting example, it should be done as follow : struct InvalidatingAllPermutations(T) { T[] current, first; bool done; this(T[] initialBuf) { first = initialBuf; current = first.dup; } void popFront() { nextPermutation(current); if (equal(current, first)) done = true; } property bool empty() { return !done; } property T[] front() { return current; } auto save() { AllPermutations!T copy; copy.front = this.front; copy.first = this.first; copy.done = this.done; return copy; } } struct AllPermutations(T) { InvalidatingAllPermutations!T innerRange; alias innerRange this; T[] current; this(T[] initialBuf) { innerRange = InvalidatingAllPermutations!T(initialBuf); current = innerRange.front.dup; } property T[] front() { return current; } void popFront() { innerRange.popFront(); current = innerRange.front.dup; } } I do think this is the way that conciliate safe as default but still allow to be fast when needed, without adding much on most code.+1. I like this better than my proposal. T -- Doubtless it is a good thing to have an open mind, but a truly open mind should be open at both ends, like the food-pipe, with the capacity for excretion as well as absorption. -- Northrop Frye
Oct 30 2012
Le 31/10/2012 01:56, H. S. Teoh a écrit :+1. I like this better than my proposal.The obvious drawback is that this make invalidating ranges harder to write. But they seems to be more the exception than the rule.
Oct 31 2012
On Wed, Oct 31, 2012 at 02:18:55PM +0100, deadalnix wrote:Le 31/10/2012 01:56, H. S. Teoh a écrit :Actually, there is another problem: many algorithms' output will be transient or not depending on the input range. For example, we could write map to use the transient version of byLine, say, but then the range that map returns will also be transient (no longer safe). IOW, transience of .front is transitive in some cases. This again makes things complicated: as soon as you use a single transient range, it makes downstream ranges transient as well. So we're back to the problem of how to mark the range as transient in a transparent way. :-( T -- VI = Visual Irritation+1. I like this better than my proposal.The obvious drawback is that this make invalidating ranges harder to write. But they seems to be more the exception than the rule.
Oct 31 2012
Le 31/10/2012 18:19, H. S. Teoh a écrit :On Wed, Oct 31, 2012 at 02:18:55PM +0100, deadalnix wrote:No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.Le 31/10/2012 01:56, H. S. Teoh a écrit :Actually, there is another problem: many algorithms' output will be transient or not depending on the input range. For example, we could write map to use the transient version of byLine, say, but then the range that map returns will also be transient (no longer safe). IOW, transience of .front is transitive in some cases. This again makes things complicated: as soon as you use a single transient range, it makes downstream ranges transient as well. So we're back to the problem of how to mark the range as transient in a transparent way. :-(+1. I like this better than my proposal.The obvious drawback is that this make invalidating ranges harder to write. But they seems to be more the exception than the rule.
Oct 31 2012
On Thu, Nov 01, 2012 at 12:05:18AM +0100, deadalnix wrote:Le 31/10/2012 18:19, H. S. Teoh a écrit :[...]But then can the alternate behaviour be used at all? For example, joiner will return a range which will be transient if the original range is transient. But since this is "unsafe" behaviour, joiner's implementation can't take advantage of the faster behaviour at all, since otherwise the user, not knowing that it is switching to, say, byLine.fast instead of the safe version of byLine, may assume that the resulting range is non-transient, but it is actually transient. So this nullifies the usefulness of having .fast, since many Phobos algorithms that would have been able to take advantage of .fast can't, because their result will be unsafe. In which case, is it even worth the bother to implement such a thing? I think we still haven't addressed the fundamental issue, that is, we need a way to differentiate between transient and non-transient .front values. Being able to work with transient values is the best, because it automatically also works for non-transient values, but some algos need to be able to assume non-transience. This core problem is still unsolved. T -- Windows: the ultimate triumph of marketing over technology. -- Adrian von BidderActually, there is another problem: many algorithms' output will be transient or not depending on the input range. For example, we could write map to use the transient version of byLine, say, but then the range that map returns will also be transient (no longer safe). IOW, transience of .front is transitive in some cases. This again makes things complicated: as soon as you use a single transient range, it makes downstream ranges transient as well. So we're back to the problem of how to mark the range as transient in a transparent way. :-(No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.
Oct 31 2012
Le 01/11/2012 05:30, H. S. Teoh a écrit :On Thu, Nov 01, 2012 at 12:05:18AM +0100, deadalnix wrote:Would joiner be able to take advantage of this if it was known or not ?Le 31/10/2012 18:19, H. S. Teoh a écrit :[...]But then can the alternate behaviour be used at all? For example, joiner will return a range which will be transient if the original range is transient. But since this is "unsafe" behaviour, joiner's implementation can't take advantage of the faster behaviour at all, since otherwise the user, not knowing that it is switching to, say, byLine.fast instead of the safe version of byLine, may assume that the resulting range is non-transient, but it is actually transient. So this nullifies the usefulness of having .fast, since many Phobos algorithms that would have been able to take advantage of .fast can't, because their result will be unsafe. In which case, is it even worth the bother to implement such a thing? I think we still haven't addressed the fundamental issue, that is, we need a way to differentiate between transient and non-transient .front values. Being able to work with transient values is the best, because it automatically also works for non-transient values, but some algos need to be able to assume non-transience. This core problem is still unsolved.Actually, there is another problem: many algorithms' output will be transient or not depending on the input range. For example, we could write map to use the transient version of byLine, say, but then the range that map returns will also be transient (no longer safe). IOW, transience of .front is transitive in some cases. This again makes things complicated: as soon as you use a single transient range, it makes downstream ranges transient as well. So we're back to the problem of how to mark the range as transient in a transparent way. :-(No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.
Nov 01 2012
On Thu, Nov 01, 2012 at 03:18:13PM +0100, deadalnix wrote:Le 01/11/2012 05:30, H. S. Teoh a écrit :I have a working version of joiner that doesn't assume the persistence of .front. The thing is, if the range you hand it is transient, then the resulting joined range is also transient. This shows up in one of my unittests: I can't use writeln() because it assumes persistence of .front, and I can't use array() either for the same reason. I have to explicit write a foreach to .dup the elements of the range into an array before it can be safely used in writeln(), etc.. This makes it impossible for joiner to transparently switch to a fast implementation (call .fast on the original range), because in this case the transience is transitive -- the user will have to explicitly call joiner with .fast. I don't like that, because then how would you know which algos are safe to call with .fast? It's only by documentation, i.e., by convention, and is bound to be a hiding place for bugs later on. T -- Those who don't understand Unix are condemned to reinvent it, poorly.On Thu, Nov 01, 2012 at 12:05:18AM +0100, deadalnix wrote:Would joiner be able to take advantage of this if it was known or not ?Le 31/10/2012 18:19, H. S. Teoh a écrit :[...]But then can the alternate behaviour be used at all? For example, joiner will return a range which will be transient if the original range is transient. But since this is "unsafe" behaviour, joiner's implementation can't take advantage of the faster behaviour at all, since otherwise the user, not knowing that it is switching to, say, byLine.fast instead of the safe version of byLine, may assume that the resulting range is non-transient, but it is actually transient. So this nullifies the usefulness of having .fast, since many Phobos algorithms that would have been able to take advantage of .fast can't, because their result will be unsafe. In which case, is it even worth the bother to implement such a thing? I think we still haven't addressed the fundamental issue, that is, we need a way to differentiate between transient and non-transient .front values. Being able to work with transient values is the best, because it automatically also works for non-transient values, but some algos need to be able to assume non-transience. This core problem is still unsolved.Actually, there is another problem: many algorithms' output will be transient or not depending on the input range. For example, we could write map to use the transient version of byLine, say, but then the range that map returns will also be transient (no longer safe). IOW, transience of .front is transitive in some cases. This again makes things complicated: as soon as you use a single transient range, it makes downstream ranges transient as well. So we're back to the problem of how to mark the range as transient in a transparent way. :-(No it isn't always transitive. But that is true that it is in some cases. This is why the code should default to the safe behavior. It is up to the programmer to ensure correctness when opting for an alternate behavior.
Nov 01 2012
This is up to the consumer to choose if .front can be oblitered on popFront, not to an intermediate algorithm. joiner isn't a consumer, this is a « transformer ». transformer have to propagate the .fast (I don't like this name, but this is unimportant for now) to its source. Let'(s consider the following sheme : source -> transformer (possibly several of them) -> consumer. Now see example below : auto transform(R)(R range) { struct Transfomer(R) { // Operations . . . property auto fast() { return transform(source.fast); } } } So the fast is used by the consumer and bubble throw all ranges that support it up to the source. Or is made a NOOP in the process in one of the transformers or the source do not support that optimization. As said before, this is up to the consumer to know it it accept .front to be obliterated. In your case, writeln don't support that, so .fast must not be used with writeln.
Nov 01 2012
On Thu, Nov 01, 2012 at 11:40:52PM +0100, deadalnix wrote:This is up to the consumer to choose if .front can be oblitered on popFront, not to an intermediate algorithm. joiner isn't a consumer, this is a « transformer ». transformer have to propagate the .fast (I don't like this name, but this is unimportant for now) to its source.What would be a better name for it?Let'(s consider the following sheme : source -> transformer (possibly several of them) -> consumer. Now see example below : auto transform(R)(R range) { struct Transfomer(R) { // Operations . . . property auto fast() { return transform(source.fast); } } } So the fast is used by the consumer and bubble throw all ranges that support it up to the source. Or is made a NOOP in the process in one of the transformers or the source do not support that optimization. As said before, this is up to the consumer to know it it accept .front to be obliterated. In your case, writeln don't support that, so .fast must not be used with writeln.Ah, I see. That makes sense. So basically it's not the source (or any intermediate step) that decides whether to use the optimization, but the final consumer. Though now we have the issue that all intermediate ranges must propagate .fast, which is troublesome if every range has to do it manually. Can this be handled automatically by UFCS? T -- What doesn't kill me makes me stranger.
Nov 02 2012
On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:Ah, I see. That makes sense. So basically it's not the source (or any intermediate step) that decides whether to use the optimization, but the final consumer. Though now we have the issue that all intermediate ranges must propagate .fast, which is troublesome if every range has to do it manually. Can this be handled automatically by UFCS?It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it. - Jonathan M Davis
Nov 02 2012
Le 02/11/2012 21:17, Jonathan M Davis a écrit :On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:The whole point of my proposal is to avoid adding isTransient or something similar. Let's put back relevant elements of the solution I propose : 1/ range preserve .front by default . 2/ Ranges that would get performance boost from invalidating .front can propose a property (we called it .fast in the thread) that return a new range that invalidate .front . 3/ Range that don't support that feature can be made compatible with UFCS, .fast being a NOP in such case. 4/ Ranges that work as a transofmration on existing ranges can propose the property too if they don't require .front to be persistant. It is implemented by bubbling the property call to the source range. 5/ It is up to the ultimate consumer to call that property or not.Ah, I see. That makes sense. So basically it's not the source (or any intermediate step) that decides whether to use the optimization, but the final consumer. Though now we have the issue that all intermediate ranges must propagate .fast, which is troublesome if every range has to do it manually. Can this be handled automatically by UFCS?It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it. - Jonathan M Davis
Nov 04 2012
On 11/4/12 9:36 AM, deadalnix wrote:Let's put back relevant elements of the solution I propose : 1/ range preserve .front by default . 2/ Ranges that would get performance boost from invalidating .front can propose a property (we called it .fast in the thread) that return a new range that invalidate .front .IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms). Then we have additional scalability issues because all derived ranges would need to propagate .fast etc. Then we need to cater for odd cases such as random-access ranges being .fast. The simpler way would be to simply state that input ranges can't guarantee the persistence of .front after calling .popFront. This is quite expected for input ranges, and no different for any object that gives a char[]-based API; subsequently changing the object may change the contents of the char[] returned. The algorithms that need to worry about .front's transience are only the ones that work with input ranges (as opposed to forward ranges or stronger). This approach puts the burden in the right place - on the implementers of specific algorithms supporting one-pass streaming. Andrei
Nov 04 2012
11/4/2012 7:16 PM, Andrei Alexandrescu пишет:On 11/4/12 9:36 AM, deadalnix wrote:Or rather onePass!R == ... but I seriously doubt that ability to save iteration state and the fact of reusing single slot/memory chunk for .front are always correlated.Let's put back relevant elements of the solution I propose : 1/ range preserve .front by default . 2/ Ranges that would get performance boost from invalidating .front can propose a property (we called it .fast in the thread) that return a new range that invalidate .front .IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R)I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms). Then we have additional scalability issues because all derived ranges would need to propagate .fast etc. Then we need to cater for odd cases such as random-access ranges being .fast.Neither .fast nor separate transient flag strike me as good solutions as placing a lot of burden on the range implementer (and introducing yet another convention).The simpler way would be to simply state that input ranges can't guarantee the persistence of .front after calling .popFront. This is quite expected for input ranges, and no different for any object that gives a char[]-based API; subsequently changing the object may change the contents of the char[] returned.The algorithms that need to worry about .front's transience are only the ones that work with input ranges (as opposed to forward ranges or stronger).The fact that algorithm doesn't save iteration state != it counts on transient .front. A simple example of std.array.array will do - it iterates range once and pumps elements into appender/builder "sink". So it doesn't require forward range but just that elements are not _aliased_ under the hood. Another example is dirEntries - it returns entries with immutable strings, but it can't save iteration state. It does work with array because of no aliasing of .front values.This approach puts the burden in the right place - on the implementers of specific algorithms supporting one-pass streaming.Aside from mixing separate concepts into one* I like the approach. *InputRange vs ForwardRange has nothing to do with mutable indirections of .front. -- Dmitry Olshansky
Nov 04 2012
On 11/4/12 11:36 AM, Dmitry Olshansky wrote:The fact that algorithm doesn't save iteration state != it counts on transient .front.I didn't claim that. My strongest claim was: input-range-having-front-with-mutable-indirection IMPLIES no-transience-guarantee. Note the IMPLIES (not EQUIVALENT) and no guarantee as opposed to guaranteed change of contents.A simple example of std.array.array will do - it iterates range once and pumps elements into appender/builder "sink". So it doesn't require forward range but just that elements are not _aliased_ under the hood.No surprise here.Another example is dirEntries - it returns entries with immutable strings, but it can't save iteration state. It does work with array because of no aliasing of .front values.This example is actually nice because it shows that indeed an input range with NO mutable indirection on ElementType guarantees non-transience. Andrei
Nov 04 2012
11/4/2012 9:02 PM, Andrei Alexandrescu пишет:On 11/4/12 11:36 AM, Dmitry Olshansky wrote:Indeed, but it wasn't your claim that I meant. It's fine with me but it's hardly moving us forward. I was investigating your simplified proposal and its consequences for e.g. std.array.array. And apparently forgot to make the central point. I was mostly going by:The fact that algorithm doesn't save iteration state != it counts on transient .front.I didn't claim that. My strongest claim was: input-range-having-front-with-mutable-indirection IMPLIES no-transience-guarantee.The simpler way would be to simply state that input ranges can'tguarantee the persistence of .front after calling .popFront. andThe algorithms that need to worry about .front's transience are onlythe ones that work with input ranges (as opposed to forward ranges or stronger). And conclude that applying this simple rule to std.array.array* means it has to assume non-transient elements and somehow copy/dup'em. Problem is we don't have generic function to make 100% guaranteed unaliased copy (it could omit doing any work on non-aliased objects). Also that it misses forward ranges that have aliased .front, assuming they are non-transient. *It has to work with input ranges as I noted below.-- Dmitry OlshanskyA simple example of std.array.array will do - it iterates range once and pumps elements into appender/builder "sink". So it doesn't require forward range but just that elements are not _aliased_ under the hood.No surprise here.Another example is dirEntries - it returns entries with immutable strings, but it can't save iteration state. It does work with array because of no aliasing of .front values.This example is actually nice because it shows that indeed an input range with NO mutable indirection on ElementType guarantees non-transience.
Nov 04 2012
On Sunday, November 04, 2012 21:49:58 Dmitry Olshansky wrote:I was mostly going by: > The simpler way would be to simply state that input ranges can't guarantee the persistence of .front after calling .popFront. and >The algorithms that need to worry about .front's transience are only the ones that work with input ranges (as opposed to forward ranges or stronger). And conclude that applying this simple rule to std.array.array* means it has to assume non-transient elements and somehow copy/dup'em. Problem is we don't have generic function to make 100% guaranteed unaliased copy (it could omit doing any work on non-aliased objects). Also that it misses forward ranges that have aliased .front, assuming they are non-transient.The reality is that in the general case, you can't know for certain whether the result of front is transient or not. In some cases, you know, because the type is clearly immutable or a value type. In others (e.g. char[]), you can't know. A particularly annoying one though is classes, because they're almost never going to be immutable, so in almost all cases, if input ranges can have transient fronts and the range's element types is a class, then you have to assume that its front is transient. And I'm not sure whether it's possible to determine whether a struct is a value type or a reference type or not. The mutable indirections bit probably makes it so that you can determine for sure that some structs are value types, but you can't be certain for some of them whether they're a reference type or value type without examining the postblit constructor. So, what it ends up coming down to if we accept that input ranges can have transient fronts is that algorithms then either have to require forward ranges in the case where they need to keep the result of front around, or we need a template of some kind which can determine in at least some cases that front in not transient (but it's going to have to say that front is transient in many cases where it isn't), and such algorithms will have to use that trait when dealing with input ranges. The big problem though is std.array.array. Based on the functions that an input range has, all it needs is an input range, but it'll never work with a transient front. So, what is likely one of the most heavily used range-based functions can't work with input ranges. This is a big problem, and fixing it _will_ break code. Because either, we need to create a hasTransientFront template or make std.array.array require at least a forward range, which will make dealing with input ranges that much worse. Still, as much as I hate the idea of permitting transient fronts, I'd rather have it be part of input ranges and be able to assume that other range types have non-transient fronts than create isTransient. Ranges are already overly complicated as it is. Regardless, if we decide that input ranges can have transient fronts, then we need to make that very clear, which clearly hasn't been the case, because only Andrei thought that the transience of front had anything to do with the definition of an input range. - Jonathan M Davis
Nov 04 2012
Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :On 11/4/12 9:36 AM, deadalnix wrote:You can stop here. This shows that you didn't understood the proposal. The whole point of the proposal is to avoid the cross checking part. See code sample posted earlier in the discussion for full understanding.Let's put back relevant elements of the solution I propose : 1/ range preserve .front by default . 2/ Ranges that would get performance boost from invalidating .front can propose a property (we called it .fast in the thread) that return a new range that invalidate .front .IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms).Then we have additional scalability issues because all derived ranges would need to propagate .fast etc. Then we need to cater for odd cases such as random-access ranges being .fast.The beauty of the proposal is that nothing NEED to be .fast .The simpler way would be to simply state that input ranges can't guarantee the persistence of .front after calling .popFront. This is quite expected for input ranges, and no different for any object that gives a char[]-based API; subsequently changing the object may change the contents of the char[] returned. The algorithms that need to worry about .front's transience are only the ones that work with input ranges (as opposed to forward ranges or stronger). This approach puts the burden in the right place - on the implementers of specific algorithms supporting one-pass streaming.
Nov 04 2012
On 11/4/12 11:40 AM, deadalnix wrote:Le 04/11/2012 16:16, Andrei Alexandrescu a écrit :Indeed I'd misunderstood. So I went back and my current understanding is that it's all about defining this: auto transient(R r) if (isInputRange!R) { return r; } Then certain ranges would implement a property .transient if there's an opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the end of the day, simply invoking r.transient for any range r would either get the optimized range or would vacuously return that same range. As far as I can tell from the subsequent discussion, there are quite a few issues related to propagating transient by ranges that compose on top of other ranges. By and large, adding an optional attribute of ranges is difficult to make free or minimal in cost. AndreiOn 11/4/12 9:36 AM, deadalnix wrote:You can stop here. This shows that you didn't understood the proposal. The whole point of the proposal is to avoid the cross checking part. See code sample posted earlier in the discussion for full understanding.Let's put back relevant elements of the solution I propose : 1/ range preserve .front by default . 2/ Ranges that would get performance boost from invalidating .front can propose a property (we called it .fast in the thread) that return a new range that invalidate .front .IMHO this property is deducible from the others: fast!R == isInputRange!R && !isForwardRange!R && hasMutableIndirections!(ElementType!R) I think it would be vastly better to avoid defining a new property of ranges (that subsequently would need to be cross-checked by many algorithms).
Nov 04 2012
Le 04/11/2012 17:57, Andrei Alexandrescu a écrit :Indeed I'd misunderstood. So I went back and my current understanding is that it's all about defining this: auto transient(R r) if (isInputRange!R) { return r; } Then certain ranges would implement a property .transient if there's an opportunity for e.g. reusing buffers (a la ByLine and ByChunk). At the end of the day, simply invoking r.transient for any range r would either get the optimized range or would vacuously return that same range.Now we are on the same page. (and I like transient much better than .fast).As far as I can tell from the subsequent discussion, there are quite a few issues related to propagating transient by ranges that compose on top of other ranges. By and large, adding an optional attribute of ranges is difficult to make free or minimal in cost.To sum it up, a transformation range should implement .transient if it can handle the fact that its source can be transient. The good news is that if it doesn't, then the program is still 100% correct. It is just slower. So this approach promote program correctness, but still allow crazy bearded programmer to optimize the code where needed. I think it fit nicely D's phylosophy, in the sense it does provide a safe, easy to use interface while providing a backdoor when this isn't enough.
Nov 04 2012
On 11/4/12 12:26 PM, deadalnix wrote:I think it fit nicely D's phylosophy, in the sense it does provide a safe, easy to use interface while providing a backdoor when this isn't enough.It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient. Back to a simpler solution: what's wrong with adding alternative APIs for certain input ranges? We have byLine, byChunk, byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync and let the documentation explain the differences. Andrei
Nov 04 2012
On Sunday, 4 November 2012 at 17:38:07 UTC, Andrei Alexandrescu wrote:Back to a simpler solution: what's wrong with adding alternative APIs for certain input ranges? We have byLine, byChunk, byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync and let the documentation explain the differences. AndreiYes, please! (<nitpick>Although I don't necessarily agree with the naming</nitpick>). I think we should go with transient ranges the same way we went with "non-size_t index" ranges. Acknowledge that they exist, but not guarantee their support when passed to Phobos algorithms. byLine is a perfect example of this: You can use it all you want in a for loop, but pass it "copy" or "array" at your own risk. At that point, all we do is "tag" byLine as "Warning: Transient.", and call it a day. Not as perfectly safe as we'd want, but good enough, IMO. We then offer "eachLine", which isn't transient, and everyone is happy.
Nov 04 2012
On Sun, Nov 04, 2012 at 12:38:06PM -0500, Andrei Alexandrescu wrote:On 11/4/12 12:26 PM, deadalnix wrote:Actually, it does. The proposal was to modify byLine so that it returns strings instead of a reused char[] buffer. The current version of byLine that has transient front will be moved into a .transient property. This has the following results: - All existing code doesn't break: it just becomes a tad slower from duplicating each line. Correctness is not broken. - Code that wants to be fast can call .transient at the end of the construction, e.g., stdin.byLine().map!myFunc.transient. Assuming that map propagates .transient (which also implies it's safe to use with transient ranges), this will return an optimized range that reuses the transient buffer safely. - Code that calls .transient on non-transient ranges get a no-op: [1,2,3].map!myFunc.transient is equivalent to [1,2,3].map!myFunc, because non-transient ranges just alias .transient to themselves (this can be done via UFCS so no actual change needs to be made to existing non-transient ranges). IOW, code is always 100% safe by default. You can optionally use .transient with certain ranges if you'd like for it to be faster. Using .transient where it isn't supported simply defaults to the usual safe behaviour (no performance increase, but no safety bugs either). Now, this can be taken one step further. User code need not even know what .transient is. As deadalnix said, the insight is that it is only invoked by range *consumers*. For example, one could in theory make canFind() transient-safe, so the user would write this: auto n = stdin.byLine().map!myFunc.canFind(myNeedle); and canFind uses the .transient range to do the finding. Assuming map!myFunc correctly supports .transient, this request for the faster range automatically propagates back to byLine(), and so the code is fast *without the user even asking for .transient*. And if the range being scanned is non-transient, then it behaves normally as it does today. For example, if map!myFunc does *not* support .transient, then when canFind asks for .transient, UFCS takes over and it just gets the (non-transient) mapped range back. Which in turn asks for the *non-transient* version of byLine(), so in no case will there be any safety breakage. I think this solution is most elegant so far, in the sense that: (1) Existing code doesn't break; (2) Existing code doesn't even need to change, or can be slowly optimized to use .transient on a case-by-case basis without any code breakage; (3) The user doesn't even need to know what .transient is to reap the benefits, where it's supported; (4) The algorithm writer decides whether or not .transient is supported, guaranteeing that there will be no hidden bugs caused by assuming .front is persistent and then getting a transient range passed to it. So it's the algorithm that declares whether or not it supports .transient.I think it fit nicely D's phylosophy, in the sense it does provide a safe, easy to use interface while providing a backdoor when this isn't enough.It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient.Back to a simpler solution: what's wrong with adding alternative APIs for certain input ranges? We have byLine, byChunk, byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync and let the documentation explain the differences.[...] This is less elegant than deadalnix's proposal, because there is a certain risk associated with using .transient ranges. Now the onus is on the user to make sure he doesn't pass a transient range to an algorithm that can't handle it. With deadalnix's proposal, it's the *algorithm* that decides whether or not it knows how to handle .transient properly. The user doesn't have to know, and can still reap the benefits. An algorithm that doesn't know how to deal with transient ranges simply does nothing, and UFCS takes over to provide the default .transient, which just returns the original non-transient range. T -- ГоÑударÑтво делает вид, что платит нам зарплату, а мы делаем вид, что работаем.
Nov 04 2012
Le 04/11/2012 23:10, H. S. Teoh a écrit :On Sun, Nov 04, 2012 at 12:38:06PM -0500, Andrei Alexandrescu wrote:I think I'll hire you when I need to promote my ideas :D This is much better explained than what I did myself and is exactly what I had in mind.On 11/4/12 12:26 PM, deadalnix wrote:Actually, it does. The proposal was to modify byLine so that it returns strings instead of a reused char[] buffer. The current version of byLine that has transient front will be moved into a .transient property. This has the following results: - All existing code doesn't break: it just becomes a tad slower from duplicating each line. Correctness is not broken. - Code that wants to be fast can call .transient at the end of the construction, e.g., stdin.byLine().map!myFunc.transient. Assuming that map propagates .transient (which also implies it's safe to use with transient ranges), this will return an optimized range that reuses the transient buffer safely. - Code that calls .transient on non-transient ranges get a no-op: [1,2,3].map!myFunc.transient is equivalent to [1,2,3].map!myFunc, because non-transient ranges just alias .transient to themselves (this can be done via UFCS so no actual change needs to be made to existing non-transient ranges). IOW, code is always 100% safe by default. You can optionally use .transient with certain ranges if you'd like for it to be faster. Using .transient where it isn't supported simply defaults to the usual safe behaviour (no performance increase, but no safety bugs either). Now, this can be taken one step further. User code need not even know what .transient is. As deadalnix said, the insight is that it is only invoked by range *consumers*. For example, one could in theory make canFind() transient-safe, so the user would write this: auto n = stdin.byLine().map!myFunc.canFind(myNeedle); and canFind uses the .transient range to do the finding. Assuming map!myFunc correctly supports .transient, this request for the faster range automatically propagates back to byLine(), and so the code is fast *without the user even asking for .transient*. And if the range being scanned is non-transient, then it behaves normally as it does today. For example, if map!myFunc does *not* support .transient, then when canFind asks for .transient, UFCS takes over and it just gets the (non-transient) mapped range back. Which in turn asks for the *non-transient* version of byLine(), so in no case will there be any safety breakage. I think this solution is most elegant so far, in the sense that: (1) Existing code doesn't break; (2) Existing code doesn't even need to change, or can be slowly optimized to use .transient on a case-by-case basis without any code breakage; (3) The user doesn't even need to know what .transient is to reap the benefits, where it's supported; (4) The algorithm writer decides whether or not .transient is supported, guaranteeing that there will be no hidden bugs caused by assuming .front is persistent and then getting a transient range passed to it. So it's the algorithm that declares whether or not it supports .transient.I think it fit nicely D's phylosophy, in the sense it does provide a safe, easy to use interface while providing a backdoor when this isn't enough.It doesn't fit the (admittedly difficult to fulfill) desideratum that the obvious code is safe and fast. And the obvious code uses byLine, not byLine.transient.Back to a simpler solution: what's wrong with adding alternative APIs for certain input ranges? We have byLine, byChunk, byChunkAsync. We may as well add eachLine, eachChunk, eachChunkAsync and let the documentation explain the differences.[...] This is less elegant than deadalnix's proposal, because there is a certain risk associated with using .transient ranges. Now the onus is on the user to make sure he doesn't pass a transient range to an algorithm that can't handle it. With deadalnix's proposal, it's the *algorithm* that decides whether or not it knows how to handle .transient properly. The user doesn't have to know, and can still reap the benefits. An algorithm that doesn't know how to deal with transient ranges simply does nothing, and UFCS takes over to provide the default .transient, which just returns the original non-transient range.
Nov 05 2012
On Fri, Nov 02, 2012 at 04:17:10PM -0400, Jonathan M Davis wrote:On Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:[...] I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary. T -- Real Programmers use "cat > a.out".Ah, I see. That makes sense. So basically it's not the source (or any intermediate step) that decides whether to use the optimization, but the final consumer. Though now we have the issue that all intermediate ranges must propagate .fast, which is troublesome if every range has to do it manually. Can this be handled automatically by UFCS?It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it.
Nov 03 2012
On Saturday, 3 November 2012 at 23:19:11 UTC, H. S. Teoh wrote:I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.From watching and gleaming from what I have so far, I can only think that transient should NOT be the default way that ranges work (as it causes too many problems); However transient should be allowed (and available) when possible. I can only think that perhaps using a template to determine if it's transient should be present, that way it's a simple flag to enable/disable and should propagate anywhere that that range was used. struct Range(bool isTransient=false) { static if (isTransient) { //front and transient } else { //front and non-transient } } Unless someone thinks this is a bad approach?
Nov 03 2012
On Sunday, November 04, 2012 02:30:49 Era Scarecrow wrote:From watching and gleaming from what I have so far, I can only think that transient should NOT be the default way that ranges work (as it causes too many problems); However transient should be allowed (and available) when possible. I can only think that perhaps using a template to determine if it's transient should be present, that way it's a simple flag to enable/disable and should propagate anywhere that that range was used. struct Range(bool isTransient=false) { static if (isTransient) { //front and transient } else { //front and non-transient } } Unless someone thinks this is a bad approach?That's just trying to be able to tell a range whether it should be transient or not and doesn't really solve the problem. The problem is that algorithms in general assume that front is _not_ transient, and we need to either decide that algorithms are free to continue to assume that front is non-transient (meaning that you make front transient on your range at your own risk and no guarantees that any range-based functions will work with it), or we need to provide a way for range-based functions to know whether a particular range has a transient front or not. As such, I think that it comes down primarily to either 1. Just decide that all input ranges can have transient fronts and that no ranges greater than input ranges can have transient fronts. Then algorithms can infer transience from the type of range. This is how Andrei thinks that it's supposed to work but pretty much no one else does (if nothing else, because that idea was never made clear anywhere, and no one else inferred that from the definitions of input ranges). 2. Make it so that any range can have a transient front but provide a template for checking for it like we do with stuff like hasSlicing or length. Presumably that template would check that for something like like the existence of R.isTransient, and ranges with transient fronts would just declare an enum with that name. Then every range-based function which relied on front not being transient would have to check whether the range that it was given was transient or not or fail to compile if it is, and every wrapper range would have to propogate the isTransient member. Failure to do so in either case (which _would_ happen at least some of the time, at least outside of Phobos), would result in transient ranges silently doing bizarre things. 3. Make it so that ranges which can be transient are non-transient by default but provide a function to get at a transient version for speed (which was the fastRange proposal in this thread). The main problem here is that when the fast range gets wrapped, it's transient, and so anything using the wrapped range will be forced to use the transient version rather than using the non- transient version and only using the transient version when it's asked for. So, I don't think that this is particularly viable. 4. Just decide that range-based algorithms can assume that front isn't ever transient. Any range types which make it transient do so at their own risk. opApply. I think that transience complicates things too much. Even just having input ranges have transient fronts really screws with things. Something as simple as auto app = appender!E(); foreach(e; range) app.put(e); is totally screwed by transience, and it's only operating on input ranges. I seriously question that a function like std.array.array can be implemented with a transient front. Without a way to explicitly copy front, you can't have front be transient and keep it like std.array.array does. And even if there _were_ a way to explictly copy the result of front, that would be horribly inefficient in the cases where front isn't transient, because it would force you to copy when it was completely unnecessary. I honestly don't think that transience really works outside of some very specific use cases, and the cost of trying to support it will not be small. And given the issues that std.array.array has with it and the fact that std.array.array really shouldn't have to operate on forward ranges, I don't think that the idea of input ranges having transient fronts is really going to work, which would mean using something like isTransient, which has the same sort of problems as save tends to and is likely to be forgotten even more than save is. So, I really question that transience and ranges is really going to work at all. If we _do_ support it though, I think that we need to go the isTransient route. - Jonathan M Davis
Nov 03 2012
Le 04/11/2012 03:26, Jonathan M Davis a écrit :3. Make it so that ranges which can be transient are non-transient by default but provide a function to get at a transient version for speed (which was the fastRange proposal in this thread). The main problem here is that when the fast range gets wrapped, it's transient, and so anything using the wrapped range will be forced to use the transient version rather than using the non- transient version and only using the transient version when it's asked for. So, I don't think that this is particularly viable.Can you elaborate on that. I definitively don't see a problem that big here.
Nov 04 2012
On Sunday, November 04, 2012 15:40:18 deadalnix wrote:Le 04/11/2012 03:26, Jonathan M Davis a =C3=A9crit :by3. Make it so that ranges which can be transient are non-transient =eeddefault but provide a function to get at a transient version for sp=here(which was the fastRange proposal in this thread). The main problem=ythingis that when the fast range gets wrapped, it's transient, and so an=using the wrapped range will be forced to use the transient version=nsientrather than using the non- transient version and only using the tra=larlyversion when it's asked for. So, I don't think that this is particu=g here. The problem is that you can't opt out of the fast range once you've got= it. If=20 you use fastRange and then the result ends up getting wrapped by anothe= r=20 range, that range's front is transient and has no way of indicating it.= So,=20 you're right back in the boat that we're in right now. In order to avoi= d that,=20 virtually all range-based functions would have to not use fastRange, be= cause=20 they have no idea whether you're going to use their result to pass to a= nother=20 function or not. - Jonathan M Davisviable.=20 Can you elaborate on that. I definitively don't see a problem that bi=
Nov 04 2012
Le 05/11/2012 04:11, Jonathan M Davis a écrit :On Sunday, November 04, 2012 15:40:18 deadalnix wrote:HS Teoh explained it nicely. The responsibility of using .transient or not belong to the consumer. Only the consumer can know if a transient range is suitable. So you don't wrap a transient range. You wrap a non transient range, and, if the consumer is able to handle the transcient version, it call .transient on its source, that call it on its source, etc . . . up to the real source. If one transformer is unable to handle transient range, the it don't pass .transient to its source.Le 04/11/2012 03:26, Jonathan M Davis a écrit :The problem is that you can't opt out of the fast range once you've got it. If you use fastRange and then the result ends up getting wrapped by another range, that range's front is transient and has no way of indicating it. So, you're right back in the boat that we're in right now. In order to avoid that, virtually all range-based functions would have to not use fastRange, because they have no idea whether you're going to use their result to pass to another function or not. - Jonathan M Davis3. Make it so that ranges which can be transient are non-transient by default but provide a function to get at a transient version for speed (which was the fastRange proposal in this thread). The main problem here is that when the fast range gets wrapped, it's transient, and so anything using the wrapped range will be forced to use the transient version rather than using the non- transient version and only using the transient version when it's asked for. So, I don't think that this is particularly viable.Can you elaborate on that. I definitively don't see a problem that big here.
Nov 05 2012
On Monday, November 05, 2012 15:48:41 deadalnix wrote:HS Teoh explained it nicely. The responsibility of using .transient or not belong to the consumer. Only the consumer can know if a transient range is suitable. So you don't wrap a transient range. You wrap a non transient range, and, if the consumer is able to handle the transcient version, it call .transient on its source, that call it on its source, etc . . . up to the real source. If one transformer is unable to handle transient range, the it don't pass .transient to its source.You still need to wrap it in every wrapper range which could possibly support transience. So, this affects every single range which could possibly support transience. At least with Andrei's proposal, transience is explicitly restricted to input ranges, which seriously reduces the problems caused by them, particularly since so few functions can really function with just an input range. - Jonathan M Davis
Nov 05 2012
Le 05/11/2012 19:42, Jonathan M Davis a écrit :On Monday, November 05, 2012 15:48:41 deadalnix wrote:As shown before, most wrapper range wouldn't have to do a thing except rewraping with .transient . The typical situation is that a wrapper range get its transientness from the wraped range. So it boils down to womething like : struct Wrapper(Wrapped) { Wrapped e; // Range implementation property auto transient() { return Wrapper!(typeof(e.transient))(e.transient); } } Considering that doing a wrapper that is compatible with transient ranges is already some work and require the writer to be aware of such a concept (you can expect most programmer to simply ignore that fact). The extra work for a wrapper range is really small, and add the benefice to ensure that the developer that programmed the wrapper took special care to ensure this will work with transient.HS Teoh explained it nicely. The responsibility of using .transient or not belong to the consumer. Only the consumer can know if a transient range is suitable. So you don't wrap a transient range. You wrap a non transient range, and, if the consumer is able to handle the transcient version, it call .transient on its source, that call it on its source, etc . . . up to the real source. If one transformer is unable to handle transient range, the it don't pass .transient to its source.You still need to wrap it in every wrapper range which could possibly support transience. So, this affects every single range which could possibly support transience.At least with Andrei's proposal, transience is explicitly restricted to input ranges, which seriously reduces the problems caused by them, particularly since so few functions can really function with just an input range.This proposal is simplistic. array() is not even implementable with such an approach. It is also slower for several use cases shown in this thread. And finally, it is likely that many non transient compatible stuff will proliferate, leading to undefined behavior all over the place. Simply telling to people to follow some guideline is not going to work. Many lead devs have experienced that with small teams, and we are talking here about every single D user.
Nov 05 2012
On Mon, Nov 05, 2012 at 01:42:47PM -0500, Jonathan M Davis wrote:On Monday, November 05, 2012 15:48:41 deadalnix wrote:You *can* wrap it if the wrapper range can support transience. It doesn't *have* to be wrapped. That's what I like about deadalnix's proposal -- do nothing, and you get *correct* behaviour. Do something, and you make it faster. Sounds like a good deal to me. Besides, almost *all* of those wrapper ranges are currently _broken_ for transient ranges. You get undefined behaviour when you inadvertently pass a transient range to them. No matter what you do, this has to be fixed, one way or another. By fixing *all* ranges to be non-transient by default (and, as you say, there are only a scant few of them -- byLine is the only Phobos example I can think of), we fix all of this broken behaviour instantly. The rest is optional optimization.HS Teoh explained it nicely. The responsibility of using .transient or not belong to the consumer. Only the consumer can know if a transient range is suitable. So you don't wrap a transient range. You wrap a non transient range, and, if the consumer is able to handle the transcient version, it call .transient on its source, that call it on its source, etc . . . up to the real source. If one transformer is unable to handle transient range, the it don't pass .transient to its source.You still need to wrap it in every wrapper range which could possibly support transience. So, this affects every single range which could possibly support transience.At least with Andrei's proposal, transience is explicitly restricted to input ranges, which seriously reduces the problems caused by them, particularly since so few functions can really function with just an input range.[...] With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition. I've looked over std.algorithm -- there are a *lot* of places with this assumption. Instead of getting correct default behaviour, you get a whole bunch of broken code with buggy behaviour, unless you first rewrite a lot of code in std.algorithm (and probably std.range as well). Furthermore, afterwards there is still no safety net: accidentally create a transient forward range? You're screwed, std.algorithm breaks all over the place. You said that ranges are supposed to be easy for users to create. Well, with Andrei's scheme, all the user has to do is to write a .save method in his input range, and all of a sudden everything breaks. The problem is that input ranges allow transient .front to begin with. Now users have to remember, once I add .save, I have to make .front non-transient. This is very counterintuitive to me. I'm extending a range to make it usable as a forward range, and now suddenly I can't reuse my buffers anymore? In my mind, forward ranges should be a refinement of input ranges. So I should be able to just add more stuff to my input range, and it should become a valid forward range. But now this is not true anymore, and the reason? We arbitrarily decided to make forward ranges non-transient. By making *all* ranges non-transient by default, we avoid this problem. Users don't have to remember the strange interaction between .front and .save. You only get transient ranges when you explicitly ask for it. So code is safe by default, and if you know what you're doing, you can make it faster. And std.array.array actually works correctly 100% of the time with no further changes. The current situation is, code is *not* safe by default. Pass a transient range to std.algorithm, and sometimes it works, sometimes it breaks. Passing it to std.array.array sometimes works, sometimes doesn't. Subtle bugs arise everywhere. With Andrei's proposal, you still have the same problem: forget to update a Phobos algorithm that expects an input range, and it breaks. You have to pretty much rewrite every single algorithm in Phobos that expects an input range, AND complete this rewrite BEFORE code starts working correctly. Then after that, every time you write code that takes an input range, you have to watch out for transient ranges, otherwise things will break. And std.array.array can no longer take an input range because it can't copy .front correctly. IOW, input ranges become almost completely useless. How is this any better than deadalnix's solution? From what I can tell, it's a lot worse, for all of the above reasons. T -- "Hi." "'Lo."
Nov 05 2012
On 11/5/12 9:45 PM, H. S. Teoh wrote:Besides, almost *all* of those wrapper ranges are currently _broken_ for transient ranges. You get undefined behaviour when you inadvertently pass a transient range to them.It's not undefined behavior, it's just surprising behavior. UB would be a fair amount more worrisome.With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition.I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.I've looked over std.algorithm -- there are a *lot* of places with this assumption. Instead of getting correct default behaviour, you get a whole bunch of broken code with buggy behaviour, unless you first rewrite a lot of code in std.algorithm (and probably std.range as well).Could you please put together a list of these algorithms so we have it? Thanks.Furthermore, afterwards there is still no safety net: accidentally create a transient forward range?But that goes for any incorrect range.How is this any better than deadalnix's solution? From what I can tell, it's a lot worse, for all of the above reasons.I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution. Andrei
Nov 05 2012
Le 05/11/2012 22:51, Andrei Alexandrescu a écrit :On 11/5/12 9:45 PM, H. S. Teoh wrote:Unless you know the internal implementation of the range, it is undefined (nothing in the spec specify what the range does in this case, which IS undefined behavior).Besides, almost *all* of those wrapper ranges are currently _broken_ for transient ranges. You get undefined behaviour when you inadvertently pass a transient range to them.It's not undefined behavior, it's just surprising behavior. UB would be a fair amount more worrisome.You never addressed the std.array.array() problem. Neither the fact that forward ranges may require to be transient (I have that in my rubik's cube solver, and H. S. Teoh seems to needs that too).With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition.I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.I'd like to see a proposal discarded in favor of a better proposal. I'm certain I can say that we don't have one. Let me explain what is wrong in your proposal (=> forward range = non transient / input range = transient). First two concepts are packed into one. I can safely say that both are not related, and uses cases has already been presented in all possible combination of input/forward transient or not. This usually seems like a win, but ends up creating a more complex situation at the ends. I've made that mistake quite a lot of time myself, surely enough to be sure that this is a bad idea. See for instance how the private method made final automagicaly (which sounds nice at first) create inconsistent behavior when it come to NVI as explained in TDPL. This is typical of how different concept packed into one tends to explode at some point. Additionally, the proposed solution put a responsibility on the developer : he/she must ensure that all its code with input ranges is compatible with transient range. The fact is that transient stuff are highly unusual in most programming languages (even when possible, it is usually avoided like plague), and that all input ranges will not be transient. This WILL result in a lot of code in the field using input range but not supporting when it is transient. Putting the responsibility on the programmer never worked since programmer exists, and it is not going to improve soon.How is this any better than deadalnix's solution? From what I can tell, it's a lot worse, for all of the above reasons.I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution.
Nov 05 2012
On 11/6/12 1:26 AM, deadalnix wrote:Le 05/11/2012 22:51, Andrei Alexandrescu a écrit :That would be unspecified behavior.On 11/5/12 9:45 PM, H. S. Teoh wrote:Unless you know the internal implementation of the range, it is undefined (nothing in the spec specify what the range does in this case, which IS undefined behavior).Besides, almost *all* of those wrapper ranges are currently _broken_ for transient ranges. You get undefined behaviour when you inadvertently pass a transient range to them.It's not undefined behavior, it's just surprising behavior. UB would be a fair amount more worrisome.Good solutions are found in the minds of people. Starting from the idea that .transient is unacceptably complicated, work from that angle. You can't claim you have reached perfection in design can you?You never addressed the std.array.array() problem. Neither the fact that forward ranges may require to be transient (I have that in my rubik's cube solver, and H. S. Teoh seems to needs that too).With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition.I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.I'd like to see a proposal discarded in favor of a better proposal. I'm certain I can say that we don't have one.How is this any better than deadalnix's solution? From what I can tell, it's a lot worse, for all of the above reasons.I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution.Let me explain what is wrong in your proposal (=> forward range = non transient / input range = transient).I am well aware of the relative tradeoffs. Arguing for .transient is futile at this point. What we need to do is find something better. Andrei
Nov 05 2012
Le 06/11/2012 01:21, Andrei Alexandrescu a écrit :I certainly don't ! I'd be happy to switch to another solution granted it is superior. I just don't see any right now, which obviously don't mean none exist.I'd like to see a proposal discarded in favor of a better proposal. I'm certain I can say that we don't have one.Good solutions are found in the minds of people. Starting from the idea that .transient is unacceptably complicated, work from that angle. You can't claim you have reached perfection in design can you?To be honest, my biggest fear isn't that this proposal is rejected, but that we fallback as default on the input range = transient / forward range = non transient scheme, because we fail to come up with something better, or that the status quo is choosen (as both seems to me worse than the .transient proposal). Now, as previously said, if someone come up with something better, I'd be happy to drop it completely. I'm not pushing this proposal because it is mine.Let me explain what is wrong in your proposal (=> forward range = non transient / input range = transient).I am well aware of the relative tradeoffs. Arguing for .transient is futile at this point. What we need to do is find something better.
Nov 05 2012
On Tue, Nov 06, 2012 at 01:44:28AM +0100, deadalnix wrote:Le 06/11/2012 01:21, Andrei Alexandrescu a écrit :[...]I'd like to hear a counterproposal to the .transient idea. Let's not get into a battle of words with nothing on the table to discuss. [...]Good solutions are found in the minds of people. Starting from the idea that .transient is unacceptably complicated, work from that angle. You can't claim you have reached perfection in design can you?To be honest, my biggest fear isn't that this proposal is rejected, but that we fallback as default on the input range = transient / forward range = non transient scheme, because we fail to come up with something better, or that the status quo is choosen (as both seems to me worse than the .transient proposal).[...] Yeah, that is my fear also. Which is why I'm making noise about this issue. The status quo is definitely out of the question, as it leads to undefined/unspecified behaviour with basically no warning. Conflating transience with input range is not workable IMO, because it leads to leaky abstractions like transient forward ranges being considered non-forward ranges because we arbitrarily decided that forward ranges cannot be transient. I've already given a number of non-trivial examples of transient forward ranges, and apparently deadalnix has also encountered the same issue. Denying these cases just for the sake of holding on to an idealized abstraction is only further proof that the abstraction is leaky. We all know from experience that leaky abstractions will only lead to trouble down the road. The only other remaining proposal so far is Jonathan's, that is to redefine all ranges to be non-transient. Then we fix byLine() to be non-transient, and everything else will work with no further trouble. We also don't get to use std.algorithm with any transient ranges, and will in all likelihood reinvent std.algorithm everywhere a transient range pops up. I don't see this as a viable solution either. So the .transient idea currently seems to be the best we have. I'd love to hear a superior proposal, if there is one. T -- I don't trust computers, I've spent too long programming to think that they can get anything right. -- James Miller
Nov 05 2012
On 11/6/12 3:06 AM, H. S. Teoh wrote:I've already given a number of non-trivial examples of transient forward ranges, and apparently deadalnix has also encountered the same issue.I'd like to build more of this argument. I pointed out that your range is actually defining .dup, not .save. I think an argument can be made that forward ranges with transient .front do not qualify as forward ranges, seeing as a forward range is modeled after a list that actually has data sitting in memory. Andrei
Nov 05 2012
Le 06/11/2012 02:36, Andrei Alexandrescu a écrit :On 11/6/12 3:06 AM, H. S. Teoh wrote:Yes my code can also fall in the .dup stuff. If we actually make that difference, our case is moot, but a hell lot of code is broken as well. For instance, the way map is defined is highly incorrect as each call to .front can pop a newly generated element as well, and I'm pretty that isn't the only one. We are here back to equality vs identity, and it seems D also have some stuff to fix in that regard (notably struct and AA). Discussing that here only make the problem more confuse and is not helping as long as this question is open in D.I've already given a number of non-trivial examples of transient forward ranges, and apparently deadalnix has also encountered the same issue.I'd like to build more of this argument. I pointed out that your range is actually defining .dup, not .save.
Nov 05 2012
On Tue, Nov 06, 2012 at 03:36:54AM +0200, Andrei Alexandrescu wrote:On 11/6/12 3:06 AM, H. S. Teoh wrote:[...] I suppose you could argue that way. But that still doesn't solve the problem of std.array.array with input ranges. Making std.array.array require forward ranges seems to defeat its purpose, to me. So even if you get rid of transience in forward ranges, you still need to make that distinction with input ranges. Which still requires large-scale code changes to existing Phobos algorithms. And which still requires some sort of complication to the existing range code. T -- Music critic: "That's an imitation fugue!"I've already given a number of non-trivial examples of transient forward ranges, and apparently deadalnix has also encountered the same issue.I'd like to build more of this argument. I pointed out that your range is actually defining .dup, not .save. I think an argument can be made that forward ranges with transient .front do not qualify as forward ranges, seeing as a forward range is modeled after a list that actually has data sitting in memory.
Nov 05 2012
On 11/6/12 2:44 AM, deadalnix wrote:To be honest, my biggest fear isn't that this proposal is rejected, but that we fallback as default on the input range = transient / forward range = non transient scheme, because we fail to come up with something better, or that the status quo is choosen (as both seems to me worse than the .transient proposal).I think the simplification of input range = transient and forward range = not transient has a lot going for it. It is simple, easy to explain and understand, and builds on simple real-life examples (buffered input and singly-linked lists). Clearly adding a new notion to the soup makes for more expressiveness, but it also makes for more complexity and subtlety in support for niche ranges. This should not be neglected. Andrei
Nov 05 2012
Le 06/11/2012 02:40, Andrei Alexandrescu a écrit :On 11/6/12 2:44 AM, deadalnix wrote:At this point, if transient is out I'd prefer Jonathan's proposal were everything is non transient. This is clearly simpler to use and break less code. Indeed, the added complexity of .transient exists. The beauty of it is that it is possible to write 100% correct code without even knowing .transient exists. This is why I like this option : the added complexity only exists for the programmer that what to explore the arcane of the language (which include you and me, but not most D users).To be honest, my biggest fear isn't that this proposal is rejected, but that we fallback as default on the input range = transient / forward range = non transient scheme, because we fail to come up with something better, or that the status quo is choosen (as both seems to me worse than the .transient proposal).I think the simplification of input range = transient and forward range = not transient has a lot going for it. It is simple, easy to explain and understand, and builds on simple real-life examples (buffered input and singly-linked lists). Clearly adding a new notion to the soup makes for more expressiveness, but it also makes for more complexity and subtlety in support for niche ranges. This should not be neglected.
Nov 05 2012
On Mon, Nov 05, 2012 at 11:51:35PM +0200, Andrei Alexandrescu wrote: [...]Then std.array.array is broken by definition, and cannot be implemented with anything less than a forward range. This will very likely break a lot of existing code.With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition.I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.This is probably an incomplete list, but it's a start: std.algorithm.reduce - when no seed is given. std.algorithm.joiner - both variants (I have a fix for the second variant) std.algorithm.group std.algorithm.minCount (std.algorithm.minPos - but takes forwardRange, so probably OK) (std.algorithm.Levenshtein - takes forwardRange, probably OK) (std.algorithm.makeIndex - takes forwardRange, probably OK) std.algorithm.splitter - tries to take slices without checking if range is sliceable std.algorithm.uniq std.algorithm.topNCopy - copies .front of input range. std.algorithm.NWayUnion - copies .front of (possibly input) outer range into a heap (std.algorithm.largestPartialIntersection - uses NWayUnion) std.array.array - tries to copy input range std.array.insertInPlace - tries to copy input range std.array.join - tries to copy input range std.stdio.writeln - there's too much code here so I didn't narrow it down, but testing with a transient range shows that it doesn't work correctly. This probably implicates std.format somewhere. I didn't check the other Phobos modules, but basically any code that currently takes an input range needs to be audited for this issue. [...]I've looked over std.algorithm -- there are a *lot* of places with this assumption. Instead of getting correct default behaviour, you get a whole bunch of broken code with buggy behaviour, unless you first rewrite a lot of code in std.algorithm (and probably std.range as well).Could you please put together a list of these algorithms so we have it? Thanks.[...] I still think forcing input ranges to be transient is oversimplifying the issue. Whether a range is transient is orthogonal to whether you can .save the current position or whether you can traverse it in two directions. Trying to conflate the two only leads to leaky abstractions. The only real solution IMO is to address the issue head-on: either recognize transience as an inherent property of ranges, or (re)define ranges to exclude all transience. If we recognize transience as an inherent property of ranges, then no matter what we do, there's going to be complications, because you'll always have to deal with two cases. Either an algorithm can handle transience, or it can't. If it can't, there needs to be a way to make it so that it will reject transient ranges somehow. With deadalnix's proposal, the overhead of doing this is more or less minimized, but it is still there. If we exclude all transience, then there is no additional complication. We just have to fix the current code and make it crystal clear in the documentation that .front must always be persistent no matter what. The disadvantage here is that some algorithms, like find(), don't *need* the persistence of .front, so you lose generality. Any code that works with transient ranges will have to stick with foreach, or reinvent std.algorithm. With .transient proposal, this is the default behaviour, except that we have the *option* of using transient ranges if both the range type and the algorithm can handle it. So the .transient proposal is pretty close to a comfortable balance between the two extremes. If it is still considered too complicated, then I'd like to hear what other proposals will address all the underlying issues in just as nice a way (or an even nicer way), without compromising this balance between the two extremes. T -- In order to understand recursion you must first understand recursion.How is this any better than deadalnix's solution? From what I can tell, it's a lot worse, for all of the above reasons.I think we should start from here: the .transient proposal will not be accepted because it is too complex. Consider it a baseline that other proposals would be evaluated against. Then let's see how to devise a robust, simple solution.
Nov 05 2012
On Monday, November 05, 2012 16:49:42 H. S. Teoh wrote:On Mon, Nov 05, 2012 at 11:51:35PM +0200, Andrei Alexandrescu wrote: [...]We can create an hasTransientFront trait for checking whether front is transient or not. It can't be 100% accurate, but it would fall on the side of declaring something transient when it wasn't, so it wouldn't declare something transient to be non-transient. Then any range for which hasTransientFront was false could work with std.array.array. For the rest, they can do something like auto arr = std.array.array(map!"a.dup"(file.byLine())); But it's certainly true that fixing std.array.array will break code, which is annoying. But as long as front can be transient, there's no way around that. byLine cannot possibly work with std.array.array as it stands. The only solutions that I see at this point are 1. Exclude transience completely. 2. Mark ranges as transient in some way and force algorithms to take this new trait into account. 3. Insist that input ranges have transient fronts (save perhaps when it can be statically verified that they aren't based on front's type) and that anything requiring a non-transient front require forward ranges. 4. Make all ranges non-transient but provide a way to get at a transient version of it and force algorithms to take this new property into account. All 4 solutions have problems. They just have different problems.Then std.array.array is broken by definition, and cannot be implemented with anything less than a forward range. This will very likely break a lot of existing code.With Andrei's proposal, all code that assumes transient .front with input ranges are broken by definition.I think this should be: all code that DOES NOT assume transient .front with input ranges is broken.I still think forcing input ranges to be transient is oversimplifying the issue. Whether a range is transient is orthogonal to whether you can .save the current position or whether you can traverse it in two directions. Trying to conflate the two only leads to leaky abstractions.I completly agree that transience and whether something is an input range are orthogonal, but we need to simplify.The only real solution IMO is to address the issue head-on: either recognize transience as an inherent property of ranges, or (re)define ranges to exclude all transience.Personally, I'm all for excluding transience completely and insisting that anything which would be transient use opApply, but Andrei doesn't like that idea. It would certainly be the simplest solution though, and it probably wouldn't really cause much of a problem for ranges like ByLine, because in most cases what you do is use them with a foreach loop. std.array.array would be the main loss there, but if opApply is used for foreach rather than the range functions (I don't know which currently gets precedence), then ByLine can become a normal range when used with range functions (i.e. no transience) and just reuse its buffer with opApply. Then it would work with range-based functions but still avoid extra allocations when simply iterating over it. That would be my ideal solution at this point, but clearly not everyone agrees. - Jonathan M Davis
Nov 05 2012
On Tue, Nov 06, 2012 at 02:03:45AM +0100, Jonathan M Davis wrote: [...]The only solutions that I see at this point are 1. Exclude transience completely. 2. Mark ranges as transient in some way and force algorithms to take this new trait into account. 3. Insist that input ranges have transient fronts (save perhaps when it can be statically verified that they aren't based on front's type) and that anything requiring a non-transient front require forward ranges. 4. Make all ranges non-transient but provide a way to get at a transient version of it and force algorithms to take this new property into account. All 4 solutions have problems. They just have different problems.[...] Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it). Then an algorithm like std.array.array can be written something like this: auto array(R)(R range) { auto a = appender!(ElementType!R)(); while (!range.empty) { // Self-documenting: we expect to get a // persistent copy of the front value. a.put(range.copyFront); range.popFront(); } return a.data; } OTOH, an algorithm that is agnostic to transience, like find(), can be written something like this: R find(R,S)(R haystack, S needle) { while (!haystack.empty) { // Self-documenting: we don't expect a // persistent front value. if (haystack.refFront == needle) return haystack; haystack.popFront(); } return haystack; } Of course, this immediately breaks many ranges, because they only have a single .front currently. Which is bad. So we make use of UFCS: property auto copyFront(R)(R range) { // Not sure how to do this yet, the idea is that // copyFront should be default. return range.front; } property auto refFront(R)(R range) { // By default, .refFront is the same as copyFront. return range.copyFront; } ByLine can then do this: struct ByLine { char[] buf; property auto copyFront() { return buf.dup; } property auto refFront() { return buf; } } I haven't worked out all the details yet, but this is the gist of it. While it's still not perfect, it is a slight improvement over the .transient proposal in that no new range types are involved. There is still the issue of naming: I chose .copyFront and .refFront just for illustration purposes. I'm thinking probably we can make .front == .copyFront, so that it will automatically work with (almost) all current ranges, which should be non-transient. UFCS saves us from having to implement .refFront for everything except actual transient ranges, which should be only a few rare cases. Of course, this has the disadvantage of needing to update existing algorithms to use .refFront or .copyFront where appropriate -- but then, we have to do that anyway if we're going to support transient ranges at all. Maybe somebody can improve on this idea? T -- Long, long ago, the ancient Chinese invented a device that lets them see through walls. It was called the "window".
Nov 05 2012
I don't think this whole issue has anything to do with ranges. I think this is an issue of assuming that the symbol = means "copy what's on the right to what's on the left". When in reality, = could mean: (if what's on the right has reference semantics) "make what's on the left reference the same thing that the thing on the right references". I think all range types should be allowed to return whatever they want from their front property. It's the responsibility of the user of the range to *actually* copy what front returns (if that's what he intends), instead of assuming that = means copy. In the code below, X marks the bug: module main; import std.stdio; import std.range; class MyData { int _value; } struct MyFwdRange { MyData _myData; this(MyData md) { _myData = md; } property MyData front() { return _myData; } property bool empty() const { return _myData._value > 10; } void popFront() { _myData._value++; } property MyFwdRange save() const { auto copy = MyFwdRange(); copy._myData = new MyData(); copy._myData._value = _myData._value; return copy; } } void printFirstAndSecond(R)(R range) if (isForwardRange!MyFwdRange) { // X auto first = range.front; // That is *not* copying range.popFront(); writeln(first._value, " ", range.front._value); } void main() { auto mfr = MyFwdRange(new MyData()); printFirstAndSecond(mfr); // prints: 1 1 (instead of 0 1) }
Nov 05 2012
On Tue, Nov 06, 2012 at 05:18:18AM +0100, Tommi wrote:I don't think this whole issue has anything to do with ranges. I think this is an issue of assuming that the symbol = means "copy what's on the right to what's on the left". When in reality, = could mean: (if what's on the right has reference semantics) "make what's on the left reference the same thing that the thing on the right references". I think all range types should be allowed to return whatever they want from their front property. It's the responsibility of the user of the range to *actually* copy what front returns (if that's what he intends), instead of assuming that = means copy.[...] The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type. Unless we introduce a standardized deep copy operation to D, like a .clone method that all copyable types implement, this solution isn't really viable. T -- The computer is only a tool. Unfortunately, so is the user. -- Armaphine, K5
Nov 05 2012
On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 05 2012
On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem. T -- Verbing weirds language. -- Calvin (& Hobbes)The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 05 2012
On 11/6/12 7:48 AM, H. S. Teoh wrote:On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written. AndreiOn Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 05 2012
Le 06/11/2012 07:56, Andrei Alexandrescu a écrit :On 11/6/12 7:48 AM, H. S. Teoh wrote:You mentioned me once that AOP was useless in D.On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written.On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 06 2012
On 11/6/12 3:50 PM, deadalnix wrote:Le 06/11/2012 07:56, Andrei Alexandrescu a écrit :What's the connection? AndreiOn 11/6/12 7:48 AM, H. S. Teoh wrote:You mentioned me once that AOP was useless in D.On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written.On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 06 2012
Le 06/11/2012 15:44, Andrei Alexandrescu a écrit :This is OT. But this owned stuff coupled with code that is generated depending on its presence IS AOP.What's the connection?Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written.You mentioned me once that AOP was useless in D.
Nov 06 2012
On 11/06/2012 07:56 AM, Andrei Alexandrescu wrote:On 11/6/12 7:48 AM, H. S. Teoh wrote:That is actually the first thing that I will use the new attribute system for.On Tue, Nov 06, 2012 at 05:49:44AM +0100, Tommi wrote:Here's where user defined tributes would help a lot. We'd then define owned to mention that a class reference field inside an object must be duplicated upon copy: class A { ... } struct B { owned A payload; A another; ... } That way a generic clone() routine could be written. AndreiOn Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:OK I worded that poorly. All I meant was that currently, there is no generic way to make a copy of something. It could be construed to be a bug or a language deficiency, but that's how things are currently. One *could* introduce a new language construct for making a copy of something, of course, but that leads to all sorts of issues about implicit allocation, how to eliminate unnecessary implicit copying, etc.. It's not a simple problem, in spite of the simplicity of stating the problem.The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 06 2012
On Tuesday, 6 November 2012 at 04:49:45 UTC, Tommi wrote:On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:C++ as a language doesn't, but if you follow the convention that C++ establishes in its libraries (where copy assignment & copy construction == deep copy), it always works out correctly. D doesn't have that convention so that's why we're running into trouble.The problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 05 2012
On 11/6/12 6:49 AM, Tommi wrote:On Tuesday, 6 November 2012 at 04:31:56 UTC, H. S. Teoh wrote:Languages commonly have trouble defining comparison and copying generically. More often than not user intervention is needed (e.g. see Java's clone, Lisp's many comparison operators etc). AndreiThe problem is that you can't do this in generic code, because generic code by definition doesn't know how to copy an arbitrary type.I'm not familiar with that definition of generic code. But I do feel that there's a pretty big problem with a language design if the language doesn't provide a generic way to make a copy of a variable. To be fair, e.g. C++ doesn't provide that either.
Nov 05 2012
On 11/6/12 4:36 AM, H. S. Teoh wrote:Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine. Andrei
Nov 05 2012
On Tuesday, November 06, 2012 08:49:26 Andrei Alexandrescu wrote:On 11/6/12 4:36 AM, H. S. Teoh wrote:peekFront would work better than copy, because whether front needs to be copied or not doesn't necessarily have much to do with its type. For instance, byLine/eachLine can return char[] from front just fine and still have it be a new array every time. So, while in some cases, you can tell from the type that no copy is needed (e.g. string), you can't tell in the general case and would be forced to make needless copies in a number of cases. For instance, every range of class objects would end up having to make a copy, because they'd be mutable reference types, and without knowing that the range is doing, you have no way of knowing whether it keeps replacing the objects referred to by front or not (it's not particularly likely that it would be, but you can't tell for sure just the same). If we defined peekFront via UFCS as a wrapper which calls front, then anything wanting to use peekFront could use peekFront regardless of whether the type defined it or not. So, that would reduce the impact caused by its introduction, but it would still impact a lot of range types ultimately, because we'd have to create appropriate wrappers for peekFront in most of them, or we'd end up making unnecessary copies. I don't like how much this impacts, but as H. S. Teoh points out, we don't exactly have very many options with minimal impact beyond banning transient fronts entirely (which I'd honestly like to do just the same). At least this avoids the need to create more traits to test ranges for, since if we create a free function peekFront, all range types can just assume that it's there and create wrappers for it without caring whether the wrapped range defines it itself or uses the free function. And it's less complicated than the .transient suggestion. Though it _does_ introduce the possibility of front and peekFront returning completely different types, which could complicate things a bit. It might be better to require that they be identical to avoid that problem. For better or worse though, this approach would mean that byLine (or eachLine or whatever) wouldn't be reusing the buffer with foreach like they do now, though I suppose that you could make them have opApply which does the same thing as now (meaning that it effectively uses peekFront), and then any range- based functions would use front until they were updated to use peekFront if appropriate. But then again, maybe we want byLine/eachLine to copy by default, since that's safer, much as it's less efficient, since then we have safe by default but still have an explicit means to be more efficient. That fits in well with our general approach. peekFront may be the way to go, but I think that we need to think through the consequences (like the potential problems caused by front and peekFront returning different types) before we decide on this. - Jonathan M DavisHmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.
Nov 05 2012
Le 06/11/2012 08:17, Jonathan M Davis a écrit :If we defined peekFront via UFCS as a wrapper which calls front, then anything wanting to use peekFront could use peekFront regardless of whether the type defined it or not. So, that would reduce the impact caused by its introduction, but it would still impact a lot of range types ultimately, because we'd have to create appropriate wrappers for peekFront in most of them, or we'd end up making unnecessary copies.Assuming you call front on the source range, you don't need to copy : you already work on a clean copy. With that one, you ends up duplicating all code that use front on wrapper range into one that use peekFront as well. I'm not sure this is better than .transient, but maybe.I don't like how much this impacts, but as H. S. Teoh points out, we don't exactly have very many options with minimal impact beyond banning transient fronts entirely (which I'd honestly like to do just the same).If we really want to go simple, this seems like the way to go.At least this avoids the need to create more traits to test ranges for, since if we create a free function peekFront, all range types can just assume that it's there and create wrappers for it without caring whether the wrapped range defines it itself or uses the free function. And it's less complicated than the .transient suggestion. Though it _does_ introduce the possibility of front and peekFront returning completely different types, which could complicate things a bit. It might be better to require that they be identical to avoid that problem.I'm not sure yet this is simpler.For better or worse though, this approach would mean that byLine (or eachLine or whatever) wouldn't be reusing the buffer with foreach like they do now, though I suppose that you could make them have opApply which does the same thing as now (meaning that it effectively uses peekFront), and then any range- based functions would use front until they were updated to use peekFront if appropriate. But then again, maybe we want byLine/eachLine to copy by default, since that's safer, much as it's less efficient, since then we have safe by default but still have an explicit means to be more efficient. That fits in well with our general approach.Safe by default seems like the way to go.peekFront may be the way to go, but I think that we need to think through the consequences (like the potential problems caused by front and peekFront returning different types) before we decide on this.Yes especially string/char[] has already been detected as a potential problem.
Nov 06 2012
Le 06/11/2012 07:49, Andrei Alexandrescu a écrit :On 11/6/12 4:36 AM, H. S. Teoh wrote:This have the same issue than .transient have in regard of transformer ranges (BTW what is the correct terminology for that ?).Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.
Nov 06 2012
On Mon, Nov 05, 2012 at 11:17:08PM -0800, Jonathan M Davis wrote:On Tuesday, November 06, 2012 08:49:26 Andrei Alexandrescu wrote:I like .peekFront. It's easy to understand, and is self-documenting. And this approach is safe by default (all current code uses .front), and can be implemented incrementally (doesn't require updating all of Phobos at once to avoid transience-related bugs in the interim).On 11/6/12 4:36 AM, H. S. Teoh wrote:Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.peekFront would work better than copy, because whether front needs to be copied or not doesn't necessarily have much to do with its type.Yeah, I think trying to guess transience from the return type is risky, and prone to bugs. OTOH, having a generic language-wide way to duplicate a value is something worth considering. I think it will allow us to solve a number of other nagging issues. [...]At least this avoids the need to create more traits to test ranges for, since if we create a free function peekFront, all range types can just assume that it's there and create wrappers for it without caring whether the wrapped range defines it itself or uses the free function. And it's less complicated than the .transient suggestion. Though it _does_ introduce the possibility of front and peekFront returning completely different types, which could complicate things a bit. It might be better to require that they be identical to avoid that problem.The free function can be made to always conform to the same type as .front: ElementType!R peekFront(R)(R range) { return range.front; } Though it still doesn't prevent a member called .peekFront of a different type. We *could* use template constraints to enforce that though: auto myRangeFunc(R)(R range) if (is(typeof(range.front)==typeof(range.peekFront))) ... This could be put in a template that the other range-checking templates can call.For better or worse though, this approach would mean that byLine (or eachLine or whatever) wouldn't be reusing the buffer with foreach like they do now, though I suppose that you could make them have opApply which does the same thing as now (meaning that it effectively uses peekFront), and then any range- based functions would use front until they were updated to use peekFront if appropriate. But then again, maybe we want byLine/eachLine to copy by default, since that's safer, much as it's less efficient, since then we have safe by default but still have an explicit means to be more efficient. That fits in well with our general approach.I think we should make byLine copy by default. Only if you explicitly call peekFront you get the non-copying behaviour. This fits better with D's motto of being safe by default, and fast if you know what you're doing.peekFront may be the way to go, but I think that we need to think through the consequences (like the potential problems caused by front and peekFront returning different types) before we decide on this.[...] For simplicity, I think we should just enforce typeof(.front) == typeof(.peekFront). We will get needless complications otherwise. T -- We've all heard that a million monkeys banging on a million typewriters will eventually reproduce the entire works of Shakespeare. Now, thanks to the Internet, we know this is not true. -- Robert Wilensk
Nov 06 2012
On Tuesday, 6 November 2012 at 16:09:59 UTC, H. S. Teoh wrote:On Mon, Nov 05, 2012 at 11:17:08PM -0800, Jonathan M Davis wrote:That's not what "peek" means in any of the languages I've seen. Peek just means "show me the element, but don't pop it off". It does not have to do anything with how the caller uses the result. It would be very confusing to have both front and peekFront.On Tuesday, November 06, 2012 08:49:26 Andrei Alexandrescu wrote:On 11/6/12 4:36 AM, H. S. Teoh wrote: One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere".
Nov 06 2012
Le 06/11/2012 07:49, Andrei Alexandrescu a écrit :On 11/6/12 4:36 AM, H. S. Teoh wrote:Is it possible to have the pro and cons of peekFront vs transient ?Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine. Andrei
Nov 06 2012
On Tue, Nov 06, 2012 at 10:03:56PM +0100, deadalnix wrote:Le 06/11/2012 07:49, Andrei Alexandrescu a écrit :[...]On 11/6/12 4:36 AM, H. S. Teoh wrote:Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.Is it possible to have the pro and cons of peekFront vs transient ?I'll give it a shot, since nobody else is replying: Both .peekFront and .transient require at least existing transient ranges to be modified, as well as algorithms that can take advantage of them. - For .peekFront, existing transient ranges' .front is just renamed to .peekFront, and we add a new .front which returns the .dup (or otherwise copy) of .peekFront. For .transient, we need to write a .transient method that returns a wrapper struct whose .front is transient. * So .peekFront is slightly simpler in this case. - For .peekFront, algorithms that can handle transience can simply have all occurrences of .front replaced with .peekFront. Whereas for .transient, they will need to modified to explicitly call .transient, save that range, and use that instead of the passed-in range. * So .peekFront is slightly simpler in this case. - For .peekFront, algorithms that *can't* handle transience will have to be rewritten. Same goes for .transient. The redeeming factor in both cases is that algorithms that aren't rewritten yet will continue to work as before, just a bit slower. They will also begin to work correctly with transient ranges even _before_ being rewritten, whereas right now they're broken. * Here, .peekFront and .transient are equal in terms of effort required. - For .peekFront, the .front of *any* range is guaranteed to be non-transient, so we never have to worry about whether a particular range's .front is transient or not. User code has no possibility of passing a transient range into an algorithm that cannot handle it. For .transient, the .front of ranges is non-transient by default, but once you call .transient on it, you get a range whose .front is transient. There is a small possibility of wrong user code that passes a .transient range to an algorithm that can't handle it. * So .peekFront is slightly safer on this point. These are the points that I can think of, off the top of my head. Are there any others? Either way, both .peekFront and .transient requires rewriting currently transient ranges (which AFAIK is only byLine) and any algorithms that should be able to handle transience (there are a few I can think of, like std.algorithm.joiner, std.array.join, and maybe NWayUnion and writeln & co.). We can't do better than this if we're going to support transience at all. I note, though, that the list of algorithms that need to be updated is shorter (perhaps significantly so) than the list of currently-broken algorithms that I posted earlier, because many of the algorithms in that list *cannot* be implemented to handle transience. E.g. there is no way to implement uniq on a transient input range since there's no way to save the previous value seen, so there's no way to compare two adjacent values. Likewise, findAdjacent cannot be implemented for transient ranges for the same reason. Once these are eliminated from the list, there should only be a few algorithms left. (And I already have a version of joiner that works correctly with transient ranges.) T -- Windows: the ultimate triumph of marketing over technology. -- Adrian von Bidder
Nov 07 2012
Le 07/11/2012 19:24, H. S. Teoh a écrit :On Tue, Nov 06, 2012 at 10:03:56PM +0100, deadalnix wrote:Agreed !Le 06/11/2012 07:49, Andrei Alexandrescu a écrit :[...]On 11/6/12 4:36 AM, H. S. Teoh wrote:Hmm. Another idea just occurred to me. The basic problem here is that we are conflating two kinds of values, transient and persistent, under a single name .front. What if we explicitly name them? Say, .copyFront for the non-transient value and .refFront for the transient value (the names are unimportant right now, let's consider the semantics of it).We could transfer that matter to the type of .front itself, i.e. define a function copy(x) that returns e.g. x for for string and x.dup for char[] etc. There would be problems on e.g. defining copy for structs with pointer and class reference fields etc. One quite simple approach would be to define (on the contrary) .peekFront, which means "yeah, I'd like to take a peek at the front but I don't plan to store it anywhere". That would entail we define eachLine etc. to return string from .front and char[] from .peekFront, and deprecate byLine.Is it possible to have the pro and cons of peekFront vs transient ?I'll give it a shot, since nobody else is replying: Both .peekFront and .transient require at least existing transient ranges to be modified, as well as algorithms that can take advantage of them. - For .peekFront, existing transient ranges' .front is just renamed to .peekFront, and we add a new .front which returns the .dup (or otherwise copy) of .peekFront. For .transient, we need to write a .transient method that returns a wrapper struct whose .front is transient. * So .peekFront is slightly simpler in this case.- For .peekFront, algorithms that can handle transience can simply have all occurrences of .front replaced with .peekFront. Whereas for .transient, they will need to modified to explicitly call .transient, save that range, and use that instead of the passed-in range. * So .peekFront is slightly simpler in this case..transient seems simpler to me in this case. Adding one line is easier than replacing all occurrences of something.- For .peekFront, algorithms that *can't* handle transience will have to be rewritten. Same goes for .transient. The redeeming factor in both cases is that algorithms that aren't rewritten yet will continue to work as before, just a bit slower. They will also begin to work correctly with transient ranges even _before_ being rewritten, whereas right now they're broken. * Here, .peekFront and .transient are equal in terms of effort required.Indeed.- For .peekFront, the .front of *any* range is guaranteed to be non-transient, so we never have to worry about whether a particular range's .front is transient or not. User code has no possibility of passing a transient range into an algorithm that cannot handle it. For .transient, the .front of ranges is non-transient by default, but once you call .transient on it, you get a range whose .front is transient. There is a small possibility of wrong user code that passes a .transient range to an algorithm that can't handle it. * So .peekFront is slightly safer on this point. These are the points that I can think of, off the top of my head. Are there any others? Either way, both .peekFront and .transient requires rewriting currently transient ranges (which AFAIK is only byLine) and any algorithms that should be able to handle transience (there are a few I can think of, like std.algorithm.joiner, std.array.join, and maybe NWayUnion and writeln& co.). We can't do better than this if we're going to support transience at all. I note, though, that the list of algorithms that need to be updated is shorter (perhaps significantly so) than the list of currently-broken algorithms that I posted earlier, because many of the algorithms in that list *cannot* be implemented to handle transience. E.g. there is no way to implement uniq on a transient input range since there's no way to save the previous value seen, so there's no way to compare two adjacent values. Likewise, findAdjacent cannot be implemented for transient ranges for the same reason. Once these are eliminated from the list, there should only be a few algorithms left. (And I already have a version of joiner that works correctly with transient ranges.)OK, overall, .peekFront seems like a good idea. Something I'm afraid with .peekFront is code duplication. Let's take the joiner example. Joiner's transientness depend on its source transientness. This seems to me like a very common case for transformer ranges. If we choose the .peekFront, the naive thing to do is to implement the same algorithm twice, using front and peekFront, which is code duplication, and usually a bad idea. Is a solution known here ? Could alias template parameter help ? I'm not sure what is the solution, but if one exist, I'm sold for .peekFront ( .transientFront seems like a better name as peek have other meaning in other languages ).
Nov 07 2012
On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:OK, overall, .peekFront seems like a good idea. Something I'm afraid with .peekFront is code duplication. Let's take the joiner example. Joiner's transientness depend on its source transientness. This seems to me like a very common case for transformer ranges. If we choose the .peekFront, the naive thing to do is to implement the same algorithm twice, using front and peekFront, which is code duplication, and usually a bad idea.Why would you need to duplicate anything?. If you can implement it using peekFront, then you use peekFront. If you can't, you use front. And anything which uses front when you could have used peekFront will still work. Also, if a free function peekFront which forwards to front is defined, then all range- based functions can use peekFront if they need it regardless of whether a range defines it (it's just that it would do the same as front in the case where the range didn't define it). - Jonathan M Davis
Nov 07 2012
Le 07/11/2012 22:40, Jonathan M Davis a écrit :On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:OK, seeing your answer and H. S Teoh, I think I badly expressed myself, because none of you understood. Let's try to explain it better. So back on our joiner range. This range have a source, and will be given to a consumer as follow : source -> joiner -> consumer . Now let's consider that source provide its own, transient peekFront. Now joiner can provide both peekFront (based on source.peekFront), which is transcient and front (based on source.front) which is not transcient. The fact is that both front and peekFront in joiner will have the same implementation, because the transience of joiner depends of the transience of its source (like most tranformer ranges). Or, with sample code : struct Joiner { R source; // Fields and range stuffs. property front() { // Some computation using source.front } property peekFront() { // Exact same computation using source.peekFront } } I hope this make the code duplication more apparent. It is impossible for joiner to provide only a version with peekFront, because joiner.front will be transient, and if it doesn't provide the version with .peekFront, it can't take advantage of the improvement that transience can provide. So my question is, how this duplication can be avoided ? If it can, .peekFront is definitively the way to go. But if it can't, this seems like a really problematic point.OK, overall, .peekFront seems like a good idea. Something I'm afraid with .peekFront is code duplication. Let's take the joiner example. Joiner's transientness depend on its source transientness. This seems to me like a very common case for transformer ranges. If we choose the .peekFront, the naive thing to do is to implement the same algorithm twice, using front and peekFront, which is code duplication, and usually a bad idea.Why would you need to duplicate anything?. If you can implement it using peekFront, then you use peekFront. If you can't, you use front. And anything which uses front when you could have used peekFront will still work. Also, if a free function peekFront which forwards to front is defined, then all range- based functions can use peekFront if they need it regardless of whether a range defines it (it's just that it would do the same as front in the case where the range didn't define it). - Jonathan M Davis
Nov 09 2012
On Fri, Nov 09, 2012 at 02:52:20PM +0100, deadalnix wrote: [...]OK, seeing your answer and H. S Teoh, I think I badly expressed myself, because none of you understood. Let's try to explain it better. So back on our joiner range. This range have a source, and will be given to a consumer as follow : source -> joiner -> consumer . Now let's consider that source provide its own, transient peekFront. Now joiner can provide both peekFront (based on source.peekFront), which is transcient and front (based on source.front) which is not transcient. The fact is that both front and peekFront in joiner will have the same implementation, because the transience of joiner depends of the transience of its source (like most tranformer ranges). Or, with sample code : struct Joiner { R source; // Fields and range stuffs. property front() { // Some computation using source.front } property peekFront() { // Exact same computation using source.peekFront } } I hope this make the code duplication more apparent. It is impossible for joiner to provide only a version with peekFront, because joiner.front will be transient, and if it doesn't provide the version with .peekFront, it can't take advantage of the improvement that transience can provide.Hmm, you're right. At first I thought .front would just be a thin wrapper around .peekFront (e.g., return peekFront().dup), but this only works in the source range; joiner wouldn't know how to do that.So my question is, how this duplication can be avoided ? If it can, .peekFront is definitively the way to go. But if it can't, this seems like a really problematic point.My first thought is to use a mixin to generate the function bodies, but that's really ugly and doesn't really avoid duplication in the generated code regardless. :-/ I'll have to think about it more carefully. T -- It is impossible to make anything foolproof because fools are so ingenious. -- Sammy
Nov 09 2012
Now I finally see that deepDup/deepCopy/clone is not a (good) solution, because it would be inefficient in a lot of situations. This whole mess makes me wish that D was designed so that all types had value semantics (by convention, since it's probably not possible to enforce by the language). That would mean: 1) No classes. Just duck-typing based polymorphism à la go language. 2) Dynamic arrays of mutable types would have had to been implemented as copy-on-write types à la Qt containers. Don't know about the performance implications of COW though.
Nov 12 2012
On Monday, 12 November 2012 at 08:37:20 UTC, Tommi wrote:This whole mess makes me wish that D was designed so that all types had value semantics (by convention, since it's probably not possible to enforce by the language). That would mean: 1) No classes. Just duck-typing based polymorphism à la go language. 2) Dynamic arrays of mutable types would have had to been implemented as copy-on-write types à la Qt containers....forgot to add how this relates to this issue: Then you'd solve this issue by specifying range concept so that front should return by value if it's transient, and front should return by reference or const reference if it's persistent. Then you'd capture front by const reference à la C++. If front returns reference or const reference, then const reference simply references that persistent data. If front returns by value (that's an rvalue from caller's point of view), then this C++ style const reference that we use for capture would extend the lifetime of this temporary rvalue to the const reference variable's scope. And, just to have some code in a post: ref const auto saved = range.front;
Nov 12 2012
On Monday, November 12, 2012 20:51:53 Tommi wrote:Then you'd solve this issue by specifying range concept so that front should return by value if it's transient, and front should return by reference or const reference if it's persistent.That wouldn't work. It's the complete opposite of what a generative range would require if it generates the return value in front. Narrow strings in particularly would be screwed by it, because their front is calculated, and since it's a free function, as is popFront, there's no way to save the return value of front in popFront. It has to be calculated in front, and it's not at all transient. It also screws with the ability to have sealed containers, since in that case, something like front would never return by ref. The refness of front's type really shouldn't have anything to do with its transience. They're completely unrelated. - Jonathan M Davis
Nov 12 2012
On Monday, 12 November 2012 at 20:52:07 UTC, Jonathan M Davis wrote:That wouldn't work. It's the complete opposite of what a generative range would require if it generates the return value in front. Narrow strings in particularly would be screwed by it, because their front is calculated, and since it's a free function, as is popFront, there's no way to save the return value of front in popFront. It has to be calculated in front, and it's not at all transient. It also screws with the ability to have sealed containers, since in that case, something like front would never return by ref. The refness of front's type really shouldn't have anything to do with its transience. They're completely unrelated.I didn't understand any of those arguments, so perhaps I still don't fully comprehend this issue. Persistence ----------- I thought persistent X means that there's a dedicated storage space for X where it will exist after front returns and X's value will not be tampered with by calling popFront. That notion of persistence led to me into thinking that persistent front could (and really should) return either a reference to X or a const reference if the range doesn't want to allow mutation through front. Transience ---------- The other side of the coin in my thinking was that transient means that X doesn't have a dedicated storage space. So transient X could be a variable that's local to function front, or it could be that X is stored in a buffer and will exist there after front returns, but the buffer would be overwritten by a call to popFront. And that notion of transience led me into thinking that transient front could never return a reference. NOTE: All of the above assumes that D were designed so that all types had value semantics.
Nov 12 2012
On Monday, November 12, 2012 23:01:22 Tommi wrote:On Monday, 12 November 2012 at 20:52:07 UTC, Jonathan M Davis wrote:With regards to this discussion, transience means that front is a reference type which gets overwritten when popFront is called. So, when popFront is called any references to the return value of front which the caller has kept will change rather than staying the same as they were before popFront. That has _nothing_ to do with returning by ref. It's perfectly possible for a front which is non-transient to return by ref. front for arrays does that. On the other hand, front on narrow strings doesn't and _can't_ return by ref, because front is calculated on every call. There is no persistent storage for it. But it's _not_ transient like ByLine is. A range like ByLine _could_ return its front by ref. There's nothing stopping it, and it could work just fine. It's just that it doesn't benefit it any, and it would have to make sure that the retun value was still usable on each call to popFront (e.g. if ByLine returned by ref, it would have to worry about whether the buffer was then non-null or sufficiently large or whatever - but it could still work). So, it's unlikely that a range with a transient front would ever return by ref, but it can if it really wants to.That wouldn't work. It's the complete opposite of what a generative range would require if it generates the return value in front. Narrow strings in particularly would be screwed by it, because their front is calculated, and since it's a free function, as is popFront, there's no way to save the return value of front in popFront. It has to be calculated in front, and it's not at all transient. It also screws with the ability to have sealed containers, since in that case, something like front would never return by ref. The refness of front's type really shouldn't have anything to do with its transience. They're completely unrelated.I didn't understand any of those arguments, so perhaps I still don't fully comprehend this issue. Persistence ----------- I thought persistent X means that there's a dedicated storage space for X where it will exist after front returns and X's value will not be tampered with by calling popFront. That notion of persistence led to me into thinking that persistent front could (and really should) return either a reference to X or a const reference if the range doesn't want to allow mutation through front. Transience ---------- The other side of the coin in my thinking was that transient means that X doesn't have a dedicated storage space. So transient X could be a variable that's local to function front, or it could be that X is stored in a buffer and will exist there after front returns, but the buffer would be overwritten by a call to popFront. And that notion of transience led me into thinking that transient front could never return a reference.NOTE: All of the above assumes that D were designed so that all types had value semantics.Which will never happen. So, any discussion based on that premise is pretty pointless. And this discussion would never have happened in the first place if D had no reference types, because you'll never have a transient front with a value type. - Jonathan M Davis
Nov 12 2012
On Monday, 12 November 2012 at 22:44:22 UTC, Jonathan M Davis wrote:On Monday, November 12, 2012 23:01:22 Tommi wrote:Oh, it was just a matter of miscommunication then. I bet you missed the following post of mine, which made it clear that I wasn't suggesting a solution, but simply dreaming of a better language design (like I usually am): On Monday, 12 November 2012 at 08:37:20 UTC, Tommi wrote:NOTE: All of the above assumes that D were designed so that all types had value semantics.Which will never happen. So, any discussion based on that premise is pretty pointless. And this discussion would never have happened in the first place if D had no reference types, because you'll never have a transient front with a value type.Now I finally see that deepDup/deepCopy/clone is not a (good) solution, because it would be inefficient in a lot of situations. This whole mess makes me wish that D was designed so that all types had value semantics (by convention, since it's probably not possible to enforce by the language). That would mean: 1) No classes. Just duck-typing based polymorphism à la go language. 2) Dynamic arrays of mutable types would have had to been implemented as copy-on-write types à la Qt containers. Don't know about the performance implications of COW though.I may throw these wild ideas around, but I don't do it in order to have them implemented by D, but so that some-one could crush those ideas with counter-arguments. But yeah, this would be a wrong thread to have that discussion anyway.
Nov 12 2012
On Monday, 12 November 2012 at 08:37:20 UTC, Tommi wrote:This whole mess makes me wish that D was designed so that all types had value semantics (by convention, since it's probably not possible to enforce by the language)."..so that all types had value semantics". That's a bit too harsh. Rather there would need to two kinds of types: 1) struct: own their data, value semantics 2) referer: don't own their data, reference semantics Dynamic arrays would fall into the first category; owns their data, have value semantics. Slices of dynamic arrays would be a separate type, falling into the second category; don't own the data, reference semantics. Range could be of either kind of type. You'd need two kinds of pointers too: the kind that owns its data, and the kind that references data that someone else owns. And you'd be able to differentiate between these two kinds of types at compile-time. Disclaimer: I promise not to spam this thread with this idea any further.
Nov 12 2012
On Friday, November 09, 2012 14:52:20 deadalnix wrote:So my question is, how this duplication can be avoided ? If it can, .peekFront is definitively the way to go. But if it can't, this seems like a really problematic point.In most cases, it still won't be a problem. How much in the way of code is generally in front? Almost none. It's popFront which generally has the complicated logic. And if you do end up with a more abnormal range which requires a bunch of work in front, depending on what it does, it can just pass the result of front or peekFront to another function which does the work, or worst case, it can mixin the code. Mostly though, I wouldn't expect much code duplication to actually be required. - Jonathan M Davis
Nov 09 2012
On Friday, 9 November 2012 at 20:22:10 UTC, Jonathan M Davis wrote:On Friday, November 09, 2012 14:52:20 deadalnix wrote:Quite a ton, if it's calculated lazily.So my question is, how this duplication can be avoided ? If it can, .peekFront is definitively the way to go. But if it can't, this seems like a really problematic point.In most cases, it still won't be a problem. How much in the way of code is generally in front?
Nov 09 2012
On Wed, Nov 07, 2012 at 10:40:32PM +0100, Jonathan M Davis wrote:On Wednesday, November 07, 2012 21:31:13 deadalnix wrote:[...] Yeah, UFCS will kick in if the range itself doesn't define peekFront, then the free function peekFront will simply call the range's .front instead. So basically: if an algo needs .front to be persistent, they use .front. If they don't care if .front is persistent, they use .peekFront. T -- You are only young once, but you can stay immature indefinitely. -- azephrahelOK, overall, .peekFront seems like a good idea. Something I'm afraid with .peekFront is code duplication. Let's take the joiner example. Joiner's transientness depend on its source transientness. This seems to me like a very common case for transformer ranges. If we choose the .peekFront, the naive thing to do is to implement the same algorithm twice, using front and peekFront, which is code duplication, and usually a bad idea.Why would you need to duplicate anything?. If you can implement it using peekFront, then you use peekFront. If you can't, you use front. And anything which uses front when you could have used peekFront will still work. Also, if a free function peekFront which forwards to front is defined, then all range- based functions can use peekFront if they need it regardless of whether a range defines it (it's just that it would do the same as front in the case where the range didn't define it).
Nov 07 2012
On Sun, Nov 04, 2012 at 07:11:22PM -0800, Jonathan M Davis wrote:On Sunday, November 04, 2012 15:40:18 deadalnix wrote:[...] The crux of what deadalnix proposed is that range transformers simply pass .transient along, whilst range consumers decide whether or not to use .transient. So this isn't even an issue until you are in a range consumer, which can decide whether or not the transient range should be used. IOW, the choice between .transient or not isn't even made until the end. The range consumer is the one in the position to decide whether or not .transient should be used. T -- A bend in the road is not the end of the road unless you fail to make the turn. -- Brian WhiteLe 04/11/2012 03:26, Jonathan M Davis a écrit :The problem is that you can't opt out of the fast range once you've got it. If you use fastRange and then the result ends up getting wrapped by another range, that range's front is transient and has no way of indicating it. So, you're right back in the boat that we're in right now. In order to avoid that, virtually all range-based functions would have to not use fastRange, because they have no idea whether you're going to use their result to pass to another function or not.3. Make it so that ranges which can be transient are non-transient by default but provide a function to get at a transient version for speed (which was the fastRange proposal in this thread). The main problem here is that when the fast range gets wrapped, it's transient, and so anything using the wrapped range will be forced to use the transient version rather than using the non- transient version and only using the transient version when it's asked for. So, I don't think that this is particularly viable.Can you elaborate on that. I definitively don't see a problem that big here.
Nov 04 2012
On 11/3/12 7:21 PM, H. S. Teoh wrote:On Fri, Nov 02, 2012 at 04:17:10PM -0400, Jonathan M Davis wrote:I think at this point we should streamline and simplify ranges while addressing fundamental concepts (such as unbuffered ranges). Delving into defining increasingly niche attributes for ranges is important work, but to the extent we can make things simpler that would be a gain. In my view, transiency of .front has a lot to do with input ranges. An input range means that .front is evanescent upon calling .popBack; it has no lasting presence in memory, and invoking popBack means the next item will replace the current one. Fetching lines should be solved by using types; trafficking in char[] does not offer guarantees about the preservation of the content. In contrast, trafficking in string formalizes a binding agreement between the range and its user that the content is immutable. AndreiOn Friday, November 02, 2012 10:01:55 H. S. Teoh wrote:[...] I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.Ah, I see. That makes sense. So basically it's not the source (or any intermediate step) that decides whether to use the optimization, but the final consumer. Though now we have the issue that all intermediate ranges must propagate .fast, which is troublesome if every range has to do it manually. Can this be handled automatically by UFCS?It's no different form propogating slicing or random access or whatnot. Wrapper ranges have to look at the capabilities of the ranges that they're wrapping and create wrappers for each of the range functions where appropriate. If we added isTransient or fastRange or whatever, wrapper ranges would then have to take it into account, or the wrapper wouldn't have it.
Nov 04 2012
On Sun, Nov 04, 2012 at 08:24:55AM -0500, Andrei Alexandrescu wrote:On 11/3/12 7:21 PM, H. S. Teoh wrote:[...][...] I think that this may be oversimplifying the issue. What about the example of a range of in-place permutations over an array? That range can be made a forward range, or even bidirectional (by reversing the sense of the comparison used by nextPermutation). But it is inherently a transient range. And this is merely one example. I have some non-trivial code that evaluates an expression tree, each node of which may generate multiple elements; one may take a range over all possible values that the tree may have. For efficiency reasons, each output value (which is vector-valued) reuses the same buffer. It is a forward range with a transient .front (it has to be a forward range, because otherwise one can't range over all combinations of multiple values that subtrees may produce). While I'm perfectly fine with the decision that such ranges shouldn't be considered proper ranges at all, it would greatly limit the applicability of std.algorithm in my code. Either I would have to artificially limit the range to an input range (which is a bit silly, as it amounts to renaming .save to something else just so std.algorithm wouldn't recognize it as a forward range), or I wouldn't be able to use std.algorithm to manipulate these ranges, but would have to roll my own. Of course, D makes it so that rolling my own isn't all that hard, but it would be a bit disappointing that the supposedly-generic algorithms in the standard library aren't that generic after all. T -- I see that you JS got Bach.I wish Andrei would give some input as to how we should proceed with this. I do consider this a major issue with ranges, because for efficiency reasons I often write ranges that have transient .front values, and they can lead to subtle bugs with the current implementation of std.algorithm. It would be good to settle on this issue one way or another. I'd be happy even if the decision is to say that transient ranges are invalid and shouldn't be considered "real" ranges. Anything is better than the current nebulous state of things which only leads to subtle bugs for the unwary.I think at this point we should streamline and simplify ranges while addressing fundamental concepts (such as unbuffered ranges). Delving into defining increasingly niche attributes for ranges is important work, but to the extent we can make things simpler that would be a gain. In my view, transiency of .front has a lot to do with input ranges. An input range means that .front is evanescent upon calling .popBack; it has no lasting presence in memory, and invoking popBack means the next item will replace the current one. Fetching lines should be solved by using types; trafficking in char[] does not offer guarantees about the preservation of the content. In contrast, trafficking in string formalizes a binding agreement between the range and its user that the content is immutable.
Nov 04 2012
On 11/4/12 11:16 AM, H. S. Teoh wrote:The express purpose here is to simplify instead of offering highly granular notions with many odd corner cases and combinations. So the risk of oversimplifying is indeed material.Fetching lines should be solved by using types; trafficking in char[] does not offer guarantees about the preservation of the content. In contrast, trafficking in string formalizes a binding agreement between the range and its user that the content is immutable.[...] I think that this may be oversimplifying the issue.What about the example of a range of in-place permutations over an array? That range can be made a forward range, or even bidirectional (by reversing the sense of the comparison used by nextPermutation). But it is inherently a transient range.That's an interesting example indeed. It would keep a T[] as its state and each popFront() would mutate that array to its next permutation. Implementing .save would be achieved by duplicating the array. I would argue this is a tenuous range to characterize as a forward range. Invoking .save on a "usual" forward range means saving the iteration state, with an understanding that the saved range will go through the same iteration steps as the initial one. But nonono, actually the saved range goes through DUPLICATES of the states that the original goes through. Clearly there is an isomorphism of the two sequences of states, but they're not the same states - for example, someone looking at the original array will see the original changing the array, but not the .save. So what you're looking at is a range that can define .dup, not one that can define .save.And this is merely one example. I have some non-trivial code that evaluates an expression tree, each node of which may generate multiple elements; one may take a range over all possible values that the tree may have. For efficiency reasons, each output value (which is vector-valued) reuses the same buffer. It is a forward range with a transient .front (it has to be a forward range, because otherwise one can't range over all combinations of multiple values that subtrees may produce).I'm not sure about this example. Are we looking again at a .dup kind of thing?While I'm perfectly fine with the decision that such ranges shouldn't be considered proper ranges at all, it would greatly limit the applicability of std.algorithm in my code. Either I would have to artificially limit the range to an input range (which is a bit silly, as it amounts to renaming .save to something else just so std.algorithm wouldn't recognize it as a forward range), or I wouldn't be able to use std.algorithm to manipulate these ranges, but would have to roll my own. Of course, D makes it so that rolling my own isn't all that hard, but it would be a bit disappointing that the supposedly-generic algorithms in the standard library aren't that generic after all.I understand. At the same time, there is a point where an abstraction can't comprehend all possible instances. There's already significant aggravation moveFront and hasLvalueElements etc., which I'd want to simplify given a chance. Continuing down that path seems risky to me. Andrei
Nov 04 2012