digitalmars.D - output ranges: by ref or by value?
- Andrei Alexandrescu (23/23) Dec 31 2009 Consider:
- Michel Fortin (21/31) Dec 31 2009 I think modeling built-in arrays is the way to go as it makes less
- Philippe Sigaud (28/48) Dec 31 2009 I agree. And arrays may well be the most used range anyway.
- Andrei Alexandrescu (43/105) Jan 01 2010 Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays
- Michel Fortin (23/27) Jan 01 2010 I agree and disagree. I wasn't proposing that ranges support ~=, and I
- Jason House (9/36) Jan 01 2010 I worry about a growing level of convention with ranges. Another recent...
- Andrei Alexandrescu (11/48) Jan 01 2010 I am implementing right now a change in the range interface mentioned in...
- Jason House (3/58) Jan 01 2010 Or if they completely forgot that bit of convention and omit creating a ...
- Michel Fortin (19/34) Jan 01 2010 I don't think this will be a problem for the optimizer.
- Andrei Alexandrescu (4/42) Jan 01 2010 save() is not only for classes. It also distinguishes input ranges from
- Michel Fortin (16/19) Jan 01 2010 Right. I still maintain that it's a bad approach. I've written a lot of
- Andrei Alexandrescu (4/22) Jan 01 2010 I tried that, it makes input ranges next to unusable. save() is an
- Michel Fortin (20/29) Jan 02 2010 I think I can see why. You can't have ref member and local variables
- Andrei Alexandrescu (7/40) Jan 02 2010 That's an idea, and names are powerful, but I think it's reasonable to
- Michel Fortin (21/26) Jan 02 2010 I'm not expecting a miracle from it either, it'd just be much less confu...
- Andrei Alexandrescu (32/97) Jan 01 2010 It may be best to discuss this on an example:
- Rainer Deyke (10/14) Jan 01 2010 Require that all ranges are structs. If you want to implement a range
- Andrei Alexandrescu (4/14) Jan 01 2010 That's a good idea, but it doesn't work with covariant return types.
- Andrei Alexandrescu (3/17) Jan 01 2010 Oh, besides it doesn't work for struct ranges that iterate one-pass stre...
- Steven Schveighoffer (4/21) Jan 02 2010 What does save do in those cases?
- Andrei Alexandrescu (21/46) Jan 02 2010 It provides a syntactic differentiation between input ranges and forward...
- Steven Schveighoffer (13/59) Jan 02 2010 Would it not be as useful then to just define attributes on the type and...
- Andrei Alexandrescu (10/74) Jan 02 2010 If you have an enum that says a range's state can be saved, then you
- Steven Schveighoffer (43/63) Jan 02 2010 The function already exists -- opAssign. My point is that save would ju...
- Andrei Alexandrescu (38/109) Jan 02 2010 You might mean this(this). Either doesn't help because you'd still be
- Steven Schveighoffer (55/91) Jan 02 2010 Yes, that's what I meant :) I haven't gotten used to struct constructor...
- Michel Fortin (27/43) Jan 02 2010 This is making me think...
- Andrei Alexandrescu (6/50) Jan 02 2010 These are implementation matters. save() could lazily save information,
- Michel Fortin (31/60) Jan 03 2010 The idea is that you would use save and restore like this:
- Andrei Alexandrescu (40/102) Jan 02 2010 Why gratuitously limit the design? You're asking to replace this:
- Steven Schveighoffer (62/160) Jan 03 2010 Well, one thing you could do is:
- Andrei Alexandrescu (52/221) Jan 03 2010 isForwardRange will be defined to yield true if and only if the range
- Steven Schveighoffer (128/281) Jan 03 2010 The answer is, you don't. Ranges via interfaces are slower than primiti...
- Andrei Alexandrescu (93/283) Jan 03 2010 I understand we don't agree on this. To me, making containers work with
- Steven Schveighoffer (34/86) Jan 03 2010 We are arguing in circles, so I will just stop :)
Consider: R2 copy(R1, R2)(R1 src, R2 tgt) { foreach (ref e; src) tgt.put(e); return tgt; } Currently output ranges are supposed to define only the put() method. However, you also want to copy using regular ranges as a target, hence the shim: // pseudo-method void put(R, E)(ref R tgt, E e) { tgt.front = e; tgt.popFront(); } Now copying ranges is possible even when they don't define put(). An example of "pure" output range is ArrayAppender. It doesn't define anything interesting except put. The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference. Any ideas - please share. Andrei
Dec 31 2009
On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference.I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp. Beside, an extra indirection is wasteful when you don't need it. It's easier to add a new layer of indirection when you need one than the reverse, so the primitive shouldn't require any indirection.// pseudo-method void put(R, E)(ref R tgt, E e) { tgt.front = e; tgt.popFront(); }I like that because it works especially well with arrays. Here's what I'm thinking of: char[10] buffer; char[] remainingSpace = buffer[]; while (!remainingSpace.empty) remainingSpace.put(getc()); // now buffer is full writeln(buffer); -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Dec 31 2009
On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com>wrote:On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu < SeeWebsiteForEmail erdani.org> said: The question of this post is the following: should output ranges be passedI agree. And arrays may well be the most used range anyway. Beside, an extra indirection is wasteful when you don't need it. It's easierby value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference.I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp.to add a new layer of indirection when you need one than the reverse, so the primitive shouldn't require any indirection.So (squint through sleep-deprived eyes:) that makes it by ref, right?// pseudo-methodA few random comments, sorry if they are not entirely coherent: - this new put needs hasAssignableElements!R, right? What's in this case the difference between isOutputRange!R and hasAssignableElements!R? - should all higher-order ranges expose a put method if possible? (stride comes to mind, but also take or filter). - does that play nice with the new auto ref / ref template parameter from 2.038? It seems to me that this new feature will go hand in hand with this, but I may be mistaken. - your shim method will be used like this: put(r,e); whereas for an output range you use: r.put(e); and you cannot freely go from one form to the other, except for arrays, which are output ranges anyway [*]. Does that mean that you must disseminate static if ByRef/Assignable/Output/Whatever checks everywhere, to use either put(r,e) or r.put(e)? - what if R is a range of ranges (ie: if E is itself a range). Should put by invoked recursively? What if its a chunked range? - something I wanted to ask for a long time: does put really write to the range as written in the docs or to the underlying container for which the output range is but a 'writable' view? The container/range separation does not exist for arrays, but is important for other cases. Philippe [*] except if this transformation rule is implemented now?void put(R, E)(ref R tgt, E e) { tgt.front = e; tgt.popFront(); }
Dec 31 2009
Philippe Sigaud wrote:On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com <mailto:michel.fortin michelf.com>> wrote: On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> said: The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference. I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp. I agree. And arrays may well be the most used range anyway.Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.Beside, an extra indirection is wasteful when you don't need it. It's easier to add a new layer of indirection when you need one than the reverse, so the primitive shouldn't require any indirection. So (squint through sleep-deprived eyes:) that makes it by ref, right? // pseudo-method void put(R, E)(ref R tgt, E e) { tgt.front = e; tgt.popFront(); }It doesn't. The ref in there is to pass tgt to the pseudo-method put, not to the function that invokes it.A few random comments, sorry if they are not entirely coherent: - this new put needs hasAssignableElements!R, right? What's in this case the difference between isOutputRange!R and hasAssignableElements!R?It's a good question. There are two possible designs: 1. In the current design, the difference is that hasAssignableElements!R does not imply the range may grow. Consider this: auto a = new int[10], b = new int[10]; copy(a, b); This should work. But this shouldn't: auto a = new int[10], b = new int[5]; copy(a, b); because copy does not grow the target. If you want to append to b, you write: copy(a, appender(&b)); 2. In the design sketched in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, iteration is separated from access. In that approach, you'd have a one-pass range for both input and output. I'm not sure which design is better. (Suggestions are welcome.) For a pure output range, it's awkward to define empty() (what should it return?) and it's also awkward to put elements by calling two functions front/popFront instead of one.- should all higher-order ranges expose a put method if possible? (stride comes to mind, but also take or filter).I don't think so. The generic put will take care of that.- does that play nice with the new auto ref / ref template parameter from 2.038? It seems to me that this new feature will go hand in hand with this, but I may be mistaken.There's no obvious connection. The nice thing about auto ref is this: struct SomeAdapterRange(R) if (isWhateverRange!R) { private R src; property auto ref front() { return src.front; } } You don't want to see how that looks today. Actually: http://www.dsource.org/projects/phobos/browser/trunk/phobos/std/range.d Search the page for "mixin" :o}.- your shim method will be used like this: put(r,e); whereas for an output range you use: r.put(e); and you cannot freely go from one form to the other, except for arrays, which are output ranges anyway [*]. Does that mean that you must disseminate static if ByRef/Assignable/Output/Whatever checks everywhere, to use either put(r,e) or r.put(e)?The D language automatically rewrites the latter into the former.- what if R is a range of ranges (ie: if E is itself a range). Should put by invoked recursively? What if its a chunked range?I don't know.- something I wanted to ask for a long time: does put really write to the range as written in the docs or to the underlying container for which the output range is but a 'writable' view? The container/range separation does not exist for arrays, but is important for other cases.Depends on how the range is defined. Appender holds a pointer to an array and appends to it. But appender is a special-purpose range. A usual range cannot change the topology of the container it's under. Andrei
Jan 01 2010
On 2010-01-01 09:47:58 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.I agree and disagree. I wasn't proposing that ranges support ~=, and I don't thing it'd be a good idea, so I agree with you here. But I still believe output ranges should behave like arrays. I was proposing that you model output ranges after buffers. A stream then becomes a buffer of infinite length. Look at this example: char[10] buffer; char[] remainingSpace = buffer[]; while (!remainingSpace.empty) remainingSpace.put(getc()); // now buffer is full writeln(buffer); Now rename "remainingSpace" for "outputStream" and it works fine, except for two things: "empty" sounds strange, and an output stream is never empty of remaining space: it's infinite length. But conceptually, a buffer and a stream are almost the same, one having finite capacity while the other is infinite. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 01 2010
Andrei Alexandrescu wrote:Philippe Sigaud wrote:I worry about a growing level of convention with ranges. Another recent range thread discussed the need to call consume after a successful call to startsWith. If I violated convention and had a range class, things would fail miserably. There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com <mailto:michel.fortin michelf.com>> wrote: On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> said: The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference. I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp. I agree. And arrays may well be the most used range anyway.Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.
Jan 01 2010
Jason House wrote:Andrei Alexandrescu wrote:I am implementing right now a change in the range interface mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: add a function save() that saves the iteration state of a range. With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say auto copy = r.save(); and instead says: auto copy = r; the behavior will indeed be different for class ranges and struct ranges. AndreiPhilippe Sigaud wrote:I worry about a growing level of convention with ranges. Another recent range thread discussed the need to call consume after a successful call to startsWith. If I violated convention and had a range class, things would fail miserably. There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com <mailto:michel.fortin michelf.com>> wrote: On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> said: The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference. I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp. I agree. And arrays may well be the most used range anyway.Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.
Jan 01 2010
Andrei Alexandrescu Wrote:Jason House wrote:Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?Andrei Alexandrescu wrote:I am implementing right now a change in the range interface mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: add a function save() that saves the iteration state of a range. With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say auto copy = r.save(); and instead says: auto copy = r; the behavior will indeed be different for class ranges and struct ranges.Philippe Sigaud wrote:I worry about a growing level of convention with ranges. Another recent range thread discussed the need to call consume after a successful call to startsWith. If I violated convention and had a range class, things would fail miserably. There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com <mailto:michel.fortin michelf.com>> wrote: On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> said: The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference. I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp. I agree. And arrays may well be the most used range anyway.Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.Andrei
Jan 01 2010
On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house gmail.com> said:I don't think this will be a problem for the optimizer. But I say dump save(): it's more trouble than it's worth. I'm sure it will be a problem for most programmers, as most will learn and test algorithms with arrays and struct ranges and not classes. "save()" is a detail too easy to miss. The benefit of supporting classes by asking everyone to remember adding save() seems tiny considering you can easily create a struct wrapper if you really need your range to be a class. I know that the struct wrapper would probably force more copies of a range than necessary. But there are many ways this could be alleviated. For instance, the struct wrapper could use copy-on-write with a reference counter to detect when a copy is necessary. So while it'd be a little harder to design a range from a class, it'd be easier for everyone to use ranges correctly. -- Michel Fortin michel.fortin michelf.com http://michelf.com/With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say auto copy = r.save(); and instead says: auto copy = r; the behavior will indeed be different for class ranges and struct ranges.Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?
Jan 01 2010
Michel Fortin wrote:On 2010-01-01 14:10:40 -0500, Jason House <jason.james.house gmail.com> said:save() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have. AndreiI don't think this will be a problem for the optimizer. But I say dump save(): it's more trouble than it's worth. I'm sure it will be a problem for most programmers, as most will learn and test algorithms with arrays and struct ranges and not classes. "save()" is a detail too easy to miss. The benefit of supporting classes by asking everyone to remember adding save() seems tiny considering you can easily create a struct wrapper if you really need your range to be a class. I know that the struct wrapper would probably force more copies of a range than necessary. But there are many ways this could be alleviated. For instance, the struct wrapper could use copy-on-write with a reference counter to detect when a copy is necessary. So while it'd be a little harder to design a range from a class, it'd be easier for everyone to use ranges correctly.With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say auto copy = r.save(); and instead says: auto copy = r; the behavior will indeed be different for class ranges and struct ranges.Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?
Jan 01 2010
On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:save() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have.Right. I still maintain that it's a bad approach. I've written a lot of algorithms in C++ that worked with iterators, always assuming assignment would copy the state. Fortunately I didn't had to use input iterators with them, most of the time. But I did once or twice, and the thing was working slightly off. What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 01 2010
Michel Fortin wrote:On 2010-01-01 15:53:42 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:I tried that, it makes input ranges next to unusable. save() is an imperfect but workable solution. Andreisave() is not only for classes. It also distinguishes input ranges from forward ranges. It's the primitive that STL didn't define but should have.Right. I still maintain that it's a bad approach. I've written a lot of algorithms in C++ that worked with iterators, always assuming assignment would copy the state. Fortunately I didn't had to use input iterators with them, most of the time. But I did once or twice, and the thing was working slightly off. What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.
Jan 01 2010
On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:Michel Fortin wrote:I think I can see why. You can't have ref member and local variables like in C++, so it's pretty hard to use references.What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.I tried that, it makes input ranges next to unusable.save() is an imperfect but workable solution.save() is an workable but error-prone solution. Perhaps we could mitigate this by making people more aware of the difference instead. Couldn't we rename "input range" for "input stream"? Currently you have ranges that behave one way and ranges that behave the other way, which is confusing. Having a different name for both would emphasize there is a difference. With different names, you're guarantied to get the "what's the difference?" question from newbies. And it's simple to explain: "You can often use ranges and streams interchangeably, but for that to work you must use save() when you need a copy of the current state. Also, not all streams support save(). It's good practice to always use save() so that algorithms work for both for ranges and streams." -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 02 2010
Michel Fortin wrote:On 2010-01-01 17:54:12 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:That's an idea, and names are powerful, but I think it's reasonable to not expect miracles from that name change. It has disadvantages too - "input range" vs. "forward range" clarifies there's a conceptual relationship between the two, whereas "streams" are different from anything else. AndreiMichel Fortin wrote:I think I can see why. You can't have ref member and local variables like in C++, so it's pretty hard to use references.What I'd do instead is somehow make input ranges non-copyable. They could be either passed by ref or moved, never copied. This way they would still behave exactly like array slices, only not copyable, and you get a compile-time error if you try to copy them which is infinitely better than a subtle change in behavior.I tried that, it makes input ranges next to unusable.save() is an imperfect but workable solution.save() is an workable but error-prone solution. Perhaps we could mitigate this by making people more aware of the difference instead. Couldn't we rename "input range" for "input stream"? Currently you have ranges that behave one way and ranges that behave the other way, which is confusing. Having a different name for both would emphasize there is a difference. With different names, you're guarantied to get the "what's the difference?" question from newbies. And it's simple to explain: "You can often use ranges and streams interchangeably, but for that to work you must use save() when you need a copy of the current state. Also, not all streams support save(). It's good practice to always use save() so that algorithms work for both for ranges and streams."
Jan 02 2010
On 2010-01-02 09:59:51 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:That's an idea, and names are powerful, but I think it's reasonable to not expect miracles from that name change. It has disadvantages too - "input range" vs. "forward range" clarifies there's a conceptual relationship between the two, whereas "streams" are different from anything else.I'm not expecting a miracle from it either, it'd just be much less confusing. You could say that assignment of an input stream might or might not save its state (depending on the stream type) so you must call save() to save the state when working with streams, but ranges are guarantied to save their state on assignment, thus behaving more predictably and just like arrays. So if you're working only with ranges, not streams, you never need to worry about save(). A similar option would be to have both input ranges and input streams: * input range: by value semantics, no need for save() * input stream: by reference semantics A pointer to an input range would thus automatically qualify as an input stream, so it's easy to give an input range to a function taking an input stream. Well, except for stack-allocated ranges in SafeD for which you can't create a pointer. This pretty much break the idea, I think. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 02 2010
Jason House wrote:Andrei Alexandrescu Wrote:It may be best to discuss this on an example: /** If $(D startsWith(r1, r2)), consume the corresponding elements off $(D r1) and return $(D true). Otherwise, leave $(D r1) unchanged and return $(D false). */ bool consume(R1, R2)(ref R1 r1, R2 r2) if (isForwardRange!R1 && isInputRange!R2) { auto r = r1.save(); while (!r2.empty && !r.empty && r.front == r2.front) { r.popFront(); r2.popFront(); } if (r2.empty) { r1 = r; return true; } return false; } For most structs, save() is very simple: auto save() { return this; } For classes, save() entails creating a new object: auto save() { return new typeof(this)(this); } If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that. Anyway, it's not entirely a convention. I'll change isForwardRange to require the existence of save(). AndreiJason House wrote:Or if they completely forgot that bit of convention and omit creating a variable called save... Also, doesn't use of save degrade performance for structs? Or does the inliner/optimizer remove the copy variable altogether?Andrei Alexandrescu wrote:I am implementing right now a change in the range interface mentioned in http://www.informit.com/articles/printerfriendly.aspx?p=1407357, namely: add a function save() that saves the iteration state of a range. With save() in tow, class ranges and struct ranges can be used the same way. True, if someone forgets to say auto copy = r.save(); and instead says: auto copy = r; the behavior will indeed be different for class ranges and struct ranges.Philippe Sigaud wrote:I worry about a growing level of convention with ranges. Another recent range thread discussed the need to call consume after a successful call to startsWith. If I violated convention and had a range class, things would fail miserably. There would be no need to consume after a successful call to startsWith and the range would have a random number of elements removed on an unsuccessful call to startsWith. I'm pretty sure that early discussions of ranges claimed that they could be either structs and classes, but in practice that isn't the case.On Thu, Dec 31, 2009 at 16:47, Michel Fortin <michel.fortin michelf.com <mailto:michel.fortin michelf.com>> wrote: On 2009-12-31 09:58:06 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>> said: The question of this post is the following: should output ranges be passed by value or by reference? ArrayAppender uses an extra indirection to work properly when passed by value. But if we want to model built-in arrays' operator ~=, we'd need to request that all output ranges be passed by reference. I think modeling built-in arrays is the way to go as it makes less things to learn. In fact, it makes it easier to learn ranges because you can begin by learning arrays, then transpose this knowledge to ranges which are more abstract and harder to grasp. I agree. And arrays may well be the most used range anyway.Upon more thinking, I'm leaning the other way. ~= is a quirk of arrays motivated by practical necessity. I don't want to propagate that quirk into ranges. The best output range is one that works properly when passed by value.
Jan 01 2010
Andrei Alexandrescu wrote:If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user. -- Rainer Deyke - rainerd eldwood.com
Jan 01 2010
Rainer Deyke wrote:Andrei Alexandrescu wrote:That's a good idea, but it doesn't work with covariant return types. Those are needed for the container hierarchy that I'm working on. AndreiIf the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library.
Jan 01 2010
Rainer Deyke wrote:Andrei Alexandrescu wrote:Oh, besides it doesn't work for struct ranges that iterate one-pass streams. AndreiIf the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.
Jan 01 2010
On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Rainer Deyke wrote:What does save do in those cases? -SteveAndrei Alexandrescu wrote:Oh, besides it doesn't work for struct ranges that iterate one-pass streams.If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.
Jan 02 2010
Steven Schveighoffer wrote:On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:It provides a syntactic differentiation between input ranges and forward ranges. STL's input iterators are syntactically indistinguishable from forward iterators. Because of that, all STL algorithms that expect forward ranges will also compile and run with input ranges, although never with the expected result. This has been a lingering problem with C++98, and a key motivator for concepts. Since syntactic differences cannot be used to distinguish between input and forward iterators, the reasoning went, we need something else as a discriminator - and that would be a concept: someone who defines an input iterator declares it obeys the input iterator concept, whereas someone defining a forward iterator declares it obeys the forward iterator concept. During the meltdown of concepts, a number of people including Bjarne Stroustrup and myself have suggested that a simple workable solution would be to define an extra function a la "save" that is used in algorithms and only supported by forward iterators, but not by input iterators. Then, algorithms use save() and will correctly refuse to compile calls with input iterators. The remaining risk is that someone writes an algorithm and forgets to use save(). AndreiRainer Deyke wrote:What does save do in those cases? -SteveAndrei Alexandrescu wrote:Oh, besides it doesn't work for struct ranges that iterate one-pass streams.If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.
Jan 02 2010
On Sat, 02 Jan 2010 13:06:25 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Steven Schveighoffer wrote:Would it not be as useful then to just define attributes on the type and save a bunch of useless/weird looking code? that is, have an enum value inside the type that defines its state can be saved. It seems to me like save is the equivalent of that anyways, since its possible to forget to use it, and still have your code compile. Basically, it appears to me that save either a) doesn't compile or b) is the equivalent of assignment. It doesn't seem to add much to the functionality. This is all except for classes, which I think have no business being ranges in the first place. -SteveOn Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:It provides a syntactic differentiation between input ranges and forward ranges. STL's input iterators are syntactically indistinguishable from forward iterators. Because of that, all STL algorithms that expect forward ranges will also compile and run with input ranges, although never with the expected result. This has been a lingering problem with C++98, and a key motivator for concepts. Since syntactic differences cannot be used to distinguish between input and forward iterators, the reasoning went, we need something else as a discriminator - and that would be a concept: someone who defines an input iterator declares it obeys the input iterator concept, whereas someone defining a forward iterator declares it obeys the forward iterator concept. During the meltdown of concepts, a number of people including Bjarne Stroustrup and myself have suggested that a simple workable solution would be to define an extra function a la "save" that is used in algorithms and only supported by forward iterators, but not by input iterators. Then, algorithms use save() and will correctly refuse to compile calls with input iterators. The remaining risk is that someone writes an algorithm and forgets to use save().Rainer Deyke wrote:What does save do in those cases? -SteveAndrei Alexandrescu wrote:Oh, besides it doesn't work for struct ranges that iterate one-pass streams.If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.
Jan 02 2010
Steven Schveighoffer wrote:On Sat, 02 Jan 2010 13:06:25 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:If you have an enum that says a range's state can be saved, then you still need a function to effect that save :o). So you added, not removed, bloat.Steven Schveighoffer wrote:Would it not be as useful then to just define attributes on the type and save a bunch of useless/weird looking code? that is, have an enum value inside the type that defines its state can be saved. It seems to me like save is the equivalent of that anyways, since its possible to forget to use it, and still have your code compile.On Fri, 01 Jan 2010 18:45:35 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:It provides a syntactic differentiation between input ranges and forward ranges. STL's input iterators are syntactically indistinguishable from forward iterators. Because of that, all STL algorithms that expect forward ranges will also compile and run with input ranges, although never with the expected result. This has been a lingering problem with C++98, and a key motivator for concepts. Since syntactic differences cannot be used to distinguish between input and forward iterators, the reasoning went, we need something else as a discriminator - and that would be a concept: someone who defines an input iterator declares it obeys the input iterator concept, whereas someone defining a forward iterator declares it obeys the forward iterator concept. During the meltdown of concepts, a number of people including Bjarne Stroustrup and myself have suggested that a simple workable solution would be to define an extra function a la "save" that is used in algorithms and only supported by forward iterators, but not by input iterators. Then, algorithms use save() and will correctly refuse to compile calls with input iterators. The remaining risk is that someone writes an algorithm and forgets to use save().Rainer Deyke wrote:What does save do in those cases? -SteveAndrei Alexandrescu wrote:Oh, besides it doesn't work for struct ranges that iterate one-pass streams.If the implementor of consume() forgets to call save(), the situation is unpleasant albeit not catastrophic: for most struct ranges things will continue to work, but for class ranges the function will fail to perform to spec. I don't know how to improve on that.Require that all ranges are structs. If you want to implement a range as a class, use a wrapper struct that creates a new object in its postblit function. The wrapper struct can be made generic and placed in the standard library. Same performance as the current approach, slightly more effort on the part of the range implementor, much easier and less error-prone on the side of the range user.Basically, it appears to me that save either a) doesn't compile or b) is the equivalent of assignment. It doesn't seem to add much to the functionality.No, for class ranges save() is nontrivial (a sort of clone). I suspect even struct ranges for certain file-oriented stuff would do nontrivial work (e.g. open the file anew and fseek() on the current position etc.)This is all except for classes, which I think have no business being ranges in the first place.Well then how would the container hierarchy work? It does need range interfaces. How does dcollections deal with that? Andrei
Jan 02 2010
On Sat, 02 Jan 2010 13:51:48 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Steven Schveighoffer wrote:The function already exists -- opAssign. My point is that save would just be the same as opAssign in most cases where you'd want to implement save. Cases where you'd want to implement save differently than opAssign are cases where you wouldn't want to use it in a generic fashion.Would it not be as useful then to just define attributes on the type and save a bunch of useless/weird looking code? that is, have an enum value inside the type that defines its state can be saved. It seems to me like save is the equivalent of that anyways, since its possible to forget to use it, and still have your code compile.If you have an enum that says a range's state can be saved, then you still need a function to effect that save :o). So you added, not removed, bloat.Classes should not be ranges. And I hope that any algorithm that uses savable ranges would not be used on a file range. For example, your consume function: bool consume(R1, R2)(ref R1 r1, R2 r2) if (isForwardRange!R1 && isInputRange!R2) { auto r = r1.save(); // open an extra file, and seek the file to the given point *just in case*?! while (!r2.empty && !r.empty && r.front == r2.front) { r.popFront(); r2.popFront(); } if (r2.empty) { r1 = r; // note that this unaliases r1 from any other copies (not save()'d copies) that were made of it. // also note that you now have opened more handles to the same file. Calling this many times could // consume quite a few handles. return true; } return false; } Here is a good exercise that would help clarify what we need: determine all the types of ranges you can think of that would need a special save function.Basically, it appears to me that save either a) doesn't compile or b) is the equivalent of assignment. It doesn't seem to add much to the functionality.No, for class ranges save() is nontrivial (a sort of clone). I suspect even struct ranges for certain file-oriented stuff would do nontrivial work (e.g. open the file anew and fseek() on the current position etc.)A container is not a range. A range of a container should not be a class. For dcollections, I was planning on ranges being a struct with a begin and end cursor defining the part of the container referenced by the range. Such a range can always be copied and modifying the copy through the range functions should not modify the original range. I see a range as being useful for iteration or algorithms, but not for general use. A great example is AAs. Would you say that an AA *is* a range or should *provide* a range? If it is a range, does that mean you remove elements as you call popFront? Does that make any sense? If it doesn't, then what happens if you add elements through another alias to that AA? -SteveThis is all except for classes, which I think have no business being ranges in the first place.Well then how would the container hierarchy work? It does need range interfaces. How does dcollections deal with that?
Jan 02 2010
Steven Schveighoffer wrote:On Sat, 02 Jan 2010 13:51:48 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:You might mean this(this). Either doesn't help because you'd still be unable to differentiate between input ranges and forward ranges. Much of the point of save() is to mark a syntactic difference between input ranges and forward ranges. Input ranges still need this(this) and opAssign - those just have different semantics.Steven Schveighoffer wrote:The function already exists -- opAssign. My point is that save would just be the same as opAssign in most cases where you'd want to implement save. Cases where you'd want to implement save differently than opAssign are cases where you wouldn't want to use it in a generic fashion.Would it not be as useful then to just define attributes on the type and save a bunch of useless/weird looking code? that is, have an enum value inside the type that defines its state can be saved. It seems to me like save is the equivalent of that anyways, since its possible to forget to use it, and still have your code compile.If you have an enum that says a range's state can be saved, then you still need a function to effect that save :o). So you added, not removed, bloat.A range that defines save() is a forward range. save() creates an independent range from its source. The file etc. example was hypothetical but realistic.Classes should not be ranges. And I hope that any algorithm that uses savable ranges would not be used on a file range. For example, your consume function: bool consume(R1, R2)(ref R1 r1, R2 r2) if (isForwardRange!R1 && isInputRange!R2) { auto r = r1.save(); // open an extra file, and seek the file to the given point *just in case*?! while (!r2.empty && !r.empty && r.front == r2.front) { r.popFront(); r2.popFront(); } if (r2.empty) { r1 = r; // note that this unaliases r1 from any other copies (not save()'d copies) that were made of it. // also note that you now have opened more handles to the same file. Calling this many times could // consume quite a few handles. return true; } return false; } Here is a good exercise that would help clarify what we need: determine all the types of ranges you can think of that would need a special save function.Basically, it appears to me that save either a) doesn't compile or b) is the equivalent of assignment. It doesn't seem to add much to the functionality.No, for class ranges save() is nontrivial (a sort of clone). I suspect even struct ranges for certain file-oriented stuff would do nontrivial work (e.g. open the file anew and fseek() on the current position etc.)Struct ranges won't work with a container hierarchy. If you define a container hierarchy (classes etc.) you'll also need a range hierarchy. Otherwise defining the inheritance relationship is impossible. Consider: abstract class Container(E) { // most general container property bool empty(); bool add(E element); E removeAny(); InputRange!E opSlice(); } That's what I'd expect any container worth its salt to implement: (a) test for empty; (b) add an element to the container; (c) remove some element from the container; (d) get a range that spans the entire container. (Probably removeAll() and a range-positioned remove() would make sense, too.) My point is, how do you inherit this guy? Well by taking advantage of covariant return types: abstract class Array(E) : Container!E { property bool empty(); bool add(E element); E removeAny(); RandomAccessRange!E opSlice(); ... more stuff ... } Ergo, RandomAccessRange!E must inherit InputRange!E for covariance to kick in. The resulting setup is quite interesting: you either know you work with an Array and therefore you get a random-access range, or you just use the generic Container and get an input range.A container is not a range. A range of a container should not be a class. For dcollections, I was planning on ranges being a struct with a begin and end cursor defining the part of the container referenced by the range. Such a range can always be copied and modifying the copy through the range functions should not modify the original range.This is all except for classes, which I think have no business being ranges in the first place.Well then how would the container hierarchy work? It does need range interfaces. How does dcollections deal with that?I see a range as being useful for iteration or algorithms, but not for general use. A great example is AAs. Would you say that an AA *is* a range or should *provide* a range? If it is a range, does that mean you remove elements as you call popFront? Does that make any sense? If it doesn't, then what happens if you add elements through another alias to that AA?An AA provides several ranges - among which byKey and byValue. Andrei
Jan 02 2010
On Sat, 02 Jan 2010 19:51:41 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Steven Schveighoffer wrote:Yes, that's what I meant :) I haven't gotten used to struct constructors.The function already exists -- opAssign. My point is that save would just be the same as opAssign in most cases where you'd want to implement save. Cases where you'd want to implement save differently than opAssign are cases where you wouldn't want to use it in a generic fashion.You might mean this(this).Either doesn't help because you'd still be unable to differentiate between input ranges and forward ranges. Much of the point of save() is to mark a syntactic difference between input ranges and forward ranges. Input ranges still need this(this) and opAssign - those just have different semantics.That was the point of the enum. It would be a synthetic difference, but IMO so is save. If it turns out that the only usable times you implement save all look like: auto save() { return *this; } then save gives you nothing. It's kind of a proof by induction (I think). You say that algorithm A requires the ability to save the state of a range. So I define a function save on a range which cannot be saved by a simple assignment (i.e. a file). I use A on that range, and the results are not what I expect or kill performance or consume unneeded resources, I'd rather not use algorithm A on that range, negating the need to define a save function in the first place. I think that's what we're going to end up with.A range that defines save() is a forward range. save() creates an independent range from its source. The file etc. example was hypothetical but realistic.I meant what actual types of ranges, not categories, I don't know how to say this better... i.e. the file range is one type, what are others? What I'm looking for is ranges that would otherwise be input ranges unless you define save on them. For example, a file range without save is an input range, cannot be a forward range. But if you define save, it is a forward range. An array is an example of a range that can be a forward range without the save function. My theory is, given this list of ranges, if you pair them with an algorithm that requires save capability, you wouldn't want to use that algorithm on it anyways (kinda like the consume example).Struct ranges won't work with a container hierarchy. If you define a container hierarchy (classes etc.) you'll also need a range hierarchy. Otherwise defining the inheritance relationship is impossible. Consider: abstract class Container(E) { // most general container property bool empty(); bool add(E element); E removeAny(); InputRange!E opSlice(); } That's what I'd expect any container worth its salt to implement: (a) test for empty; (b) add an element to the container; (c) remove some element from the container; (d) get a range that spans the entire container. (Probably removeAll() and a range-positioned remove() would make sense, too.)The range interface is compile-time only, there is no need to define it in the container interface. opSlice does not need to be part of the interface, even if it is defined on all the container types. opApply makes for a much better interface-friendly iteration anyways. BTW, the primitives in dcollections are: clear(); // clear all elements remove(V v); // remove an element contains(V v); // returns whether an element is contained in the colleciton length(); // the length of the collection dup(); // duplicate the collection opApply(int delegate(ref V v) dg); // iterate the collection opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection That means it covers only empty in your list of must-have functions (via length() == 0). Add is not a primitive because the Map collections shouldn't assign some random key to the new element. removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that. Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range). This is because those primitives return a struct that is different for every container type. It also surpasses opSlice via opApply, since all an input range can do anyways is iterate. In fact, opApply is more powerful because you can change elements and remove elements (via purging). Plus it's more efficient than a range-via-interface.I misunderstood your statement "[a container hierarchy] does need range interfaces." I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad. -SteveI see a range as being useful for iteration or algorithms, but not for general use. A great example is AAs. Would you say that an AA *is* a range or should *provide* a range? If it is a range, does that mean you remove elements as you call popFront? Does that make any sense? If it doesn't, then what happens if you add elements through another alias to that AA?An AA provides several ranges - among which byKey and byValue.
Jan 02 2010
On 2010-01-02 21:46:57 -0500, "Steven Schveighoffer" <schveiguy yahoo.com> said:That was the point of the enum. It would be a synthetic difference, but IMO so is save. If it turns out that the only usable times you implement save all look like: auto save() { return *this; } then save gives you nothing. It's kind of a proof by induction (I think). You say that algorithm A requires the ability to save the state of a range. So I define a function save on a range which cannot be saved by a simple assignment (i.e. a file). I use A on that range, and the results are not what I expect or kill performance or consume unneeded resources, I'd rather not use algorithm A on that range, negating the need to define a save function in the first place. I think that's what we're going to end up with.This is making me think... First, opening files silently whenever an algorithm feels the need to save its state is just madness. What if the operating system decide to pop a security window when opening a restricted file? What if the file has been deleted or replaced in its directory while your program is still hanging on the old inode? One might want to save the position in a file in order to be able to return there later, but when you return there the last thing you actually want to do is to open the file a second time: what you might realistically want is to set the position to what it was before, calling fseek. So perhaps it would be more useful to have both save() to save the current state (the position in a file), and restore() which would restore the saved state (returning to the saved position in a file). For ranges with reference semantics -- please call them streams! -- save() and restore() would work just as fpos and fseek for files, but they might also not be available like in a TCP stream or in a communication channel between threads. For ranges such as array slices, save and restore would just copy the range in one or the other direction. The only reason to have them is so they can be used as streams. -- Michel Fortin michel.fortin michelf.com http://michelf.com/
Jan 02 2010
Michel Fortin wrote:On 2010-01-02 21:46:57 -0500, "Steven Schveighoffer" <schveiguy yahoo.com> said:These are implementation matters. save() could lazily save information, duplicate the file handle (cheap on many OSs), etc.That was the point of the enum. It would be a synthetic difference, but IMO so is save. If it turns out that the only usable times you implement save all look like: auto save() { return *this; } then save gives you nothing. It's kind of a proof by induction (I think). You say that algorithm A requires the ability to save the state of a range. So I define a function save on a range which cannot be saved by a simple assignment (i.e. a file). I use A on that range, and the results are not what I expect or kill performance or consume unneeded resources, I'd rather not use algorithm A on that range, negating the need to define a save function in the first place. I think that's what we're going to end up with.This is making me think... First, opening files silently whenever an algorithm feels the need to save its state is just madness. What if the operating system decide to pop a security window when opening a restricted file? What if the file has been deleted or replaced in its directory while your program is still hanging on the old inode? One might want to save the position in a file in order to be able to return there later, but when you return there the last thing you actually want to do is to open the file a second time: what you might realistically want is to set the position to what it was before, calling fseek. So perhaps it would be more useful to have both save() to save the current state (the position in a file), and restore() which would restore the saved state (returning to the saved position in a file).For ranges with reference semantics -- please call them streams! -- save() and restore() would work just as fpos and fseek for files, but they might also not be available like in a TCP stream or in a communication channel between threads. For ranges such as array slices, save and restore would just copy the range in one or the other direction. The only reason to have them is so they can be used as streams.This I don't understand. Array slices' save is: T[] save(T[] array) { return array; } Andrei
Jan 02 2010
On 2010-01-03 00:51:46 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> said:First, opening files silently whenever an algorithm feels the need to save its state is just madness. What if the operating system decide to pop a security window when opening a restricted file? What if the file has been deleted or replaced in its directory while your program is still hanging on the old inode? One might want to save the position in a file in order to be able to return there later, but when you return there the last thing you actually want to do is to open the file a second time: what you might realistically want is to set the position to what it was before, calling fseek. So perhaps it would be more useful to have both save() to save the current state (the position in a file), and restore() which would restore the saved state (returning to the saved position in a file).These are implementation matters. save() could lazily save information, duplicate the file handle (cheap on many OSs), etc.The idea is that you would use save and restore like this: auto state = range.save(); // ... range.restore(state); For array slices, save doesn't really have to change, just define restore: T[] save(T[] array) { return array; } void restore(ref T[] array, T[] state) { array = state; } But with file handles: size_t save(FILE handle) { return fpos(handle); } void restore(FILE handle, size_t state) { fseek(handle, state); } And for a range interface to be used in a class hierarchy: interface Range { Variant save(); void restore(Variant state); } This way you don't have to create a new range every time you need to start from a previously saved state, you can reuse the existing one which is much more efficient. Also, the saved state can be anything, of any type, and doesn't need to be a range. So when implementing a non-trivial range you don't need to implement lazy initialization logic and all sort of complicated stuff just in case someone might save the state. Note that with save defined like this you can't really duplicate the range in the general case. I'm not sure if this is good or bad. How many algorithms really require you to duplicate a range? -- Michel Fortin michel.fortin michelf.com http://michelf.com/For ranges with reference semantics -- please call them streams! -- save() and restore() would work just as fpos and fseek for files, but they might also not be available like in a TCP stream or in a communication channel between threads. For ranges such as array slices, save and restore would just copy the range in one or the other direction. The only reason to have them is so they can be used as streams.This I don't understand. Array slices' save is: T[] save(T[] array) { return array; }
Jan 03 2010
Steven Schveighoffer wrote:My theory is, given this list of ranges, if you pair them with an algorithm that requires save capability, you wouldn't want to use that algorithm on it anyways (kinda like the consume example).Why gratuitously limit the design? You're asking to replace this: R save() { return this; } with: enum thisIsAForwardRange = true; Is there a reason? The former leaves in flexibility. The latter doesn't, for no good reason.I want container users to be able to save iteration state and also to open up containers to std.algorithm and other goodies, so I'm shying away from opApply.Struct ranges won't work with a container hierarchy. If you define a container hierarchy (classes etc.) you'll also need a range hierarchy. Otherwise defining the inheritance relationship is impossible. Consider: abstract class Container(E) { // most general container property bool empty(); bool add(E element); E removeAny(); InputRange!E opSlice(); } That's what I'd expect any container worth its salt to implement: (a) test for empty; (b) add an element to the container; (c) remove some element from the container; (d) get a range that spans the entire container. (Probably removeAll() and a range-positioned remove() would make sense, too.)The range interface is compile-time only, there is no need to define it in the container interface. opSlice does not need to be part of the interface, even if it is defined on all the container types. opApply makes for a much better interface-friendly iteration anyways.BTW, the primitives in dcollections are: clear(); // clear all elements remove(V v); // remove an elementSearch and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?contains(V v); // returns whether an element is contained in the collecitonI don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.length(); // the length of the collectionThat's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.dup(); // duplicate the collection opApply(int delegate(ref V v) dg); // iterate the collection opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection That means it covers only empty in your list of must-have functions (via length() == 0).How do you implement length() for a singly-linked list? Is empty() going to take O(n)?Add is not a primitive because the Map collections shouldn't assign some random key to the new element. removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that.add is a primitive that takes Tuple!(K, V) as the element type.Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range). This is because those primitives return a struct that is different for every container type.So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.It also surpasses opSlice via opApply, since all an input range can do anyways is iterate. In fact, opApply is more powerful because you can change elements and remove elements (via purging). Plus it's more efficient than a range-via-interface.An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.Yah, they'd offer it as a result of opSlice(). Covariant return types will ensure there's no virtual call when you know what container you operate on. Above all: the primitive set for a container must be a small set of functions that (a) can be implemented by all containers within reasonable efficiency bounds, and (b) are container-specific, not generic. IMHO any container design that defines a search(Element) as a primitive is misguided. Searching is not a container primitive - it's an algorithm. Only more specialized containers (e.g. trees, hashes etc.) can afford to define search(Element) as a primitive. Linear search is a generic algorithm that works the same for everyone. It does not belong as a method of any container. AndreiI misunderstood your statement "[a container hierarchy] does need range interfaces." I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad.I see a range as being useful for iteration or algorithms, but not for general use. A great example is AAs. Would you say that an AA *is* a range or should *provide* a range? If it is a range, does that mean you remove elements as you call popFront? Does that make any sense? If it doesn't, then what happens if you add elements through another alias to that AA?An AA provides several ranges - among which byKey and byValue.
Jan 02 2010
On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Steven Schveighoffer wrote:Well, one thing you could do is: enum thisIsAnInputRange = true; and then no special implementation is needed for normal forward ranges. the other point is there is no special treatment needed inside algorithms -- the risk of forgetting to use save at the right points of the algorithm is higher than forgetting to say isForwardRange!(R) at the beginning of the function.My theory is, given this list of ranges, if you pair them with an algorithm that requires save capability, you wouldn't want to use that algorithm on it anyways (kinda like the consume example).Why gratuitously limit the design? You're asking to replace this: R save() { return this; } with: enum thisIsAForwardRange = true; Is there a reason? The former leaves in flexibility. The latter doesn't, for no good reason.Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm. If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces. opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.I want container users to be able to save iteration state and also to open up containers to std.algorithm and other goodies, so I'm shying away from opApply.Struct ranges won't work with a container hierarchy. If you define a container hierarchy (classes etc.) you'll also need a range hierarchy. Otherwise defining the inheritance relationship is impossible. Consider: abstract class Container(E) { // most general container property bool empty(); bool add(E element); E removeAny(); InputRange!E opSlice(); } That's what I'd expect any container worth its salt to implement: (a) test for empty; (b) add an element to the container; (c) remove some element from the container; (d) get a range that spans the entire container. (Probably removeAll() and a range-positioned remove() would make sense, too.)The range interface is compile-time only, there is no need to define it in the container interface. opSlice does not need to be part of the interface, even if it is defined on all the container types. opApply makes for a much better interface-friendly iteration anyways.Search and positioned removal are also a primitives, but not defined on the interface. remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers. I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.BTW, the primitives in dcollections are: clear(); // clear all elements remove(V v); // remove an elementSearch and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?Again, it's part of the minimal usable interface. It's not a generic algorithm, because some containers can implement this more efficiently. Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.contains(V v); // returns whether an element is contained in the collecitonI don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.All current dcollection containers have O(1) length.length(); // the length of the collectionThat's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.first, dcollections' list implementation is doubly linked because all collections are forward and backward iterable. second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list). Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.dup(); // duplicate the collection opApply(int delegate(ref V v) dg); // iterate the collection opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection That means it covers only empty in your list of must-have functions (via length() == 0).How do you implement length() for a singly-linked list? Is empty() going to take O(n)?How do you define that on Container(V)? on Map(K, V), set(K k, V v) is an interface method. what you can do is define Map(K, V) as inheriting Container(Tuple!(K, V)), but then trying to use the container functions are very cumbersome. In dcollections, Map(K, V) inherits Collection(V).Add is not a primitive because the Map collections shouldn't assign some random key to the new element. removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that.add is a primitive that takes Tuple!(K, V) as the element type.No, you can easily write container-independent iteration as long as you have the implementation. If you use interfaces you can write opApply wrappers to do different things. I'm not sure what you mean by composition.Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range). This is because those primitives return a struct that is different for every container type.So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.How is it more flexible? You can't replace data, and you can't remove data while iterating, both of which are possible with dcollection's primitives. If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition? casting?It also surpasses opSlice via opApply, since all an input range can do anyways is iterate. In fact, opApply is more powerful because you can change elements and remove elements (via purging). Plus it's more efficient than a range-via-interface.An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.Not having opSlice on the interface guarantees you never have a virtual call for iteration :) opApply mitigates the virtual call on the interface.Yah, they'd offer it as a result of opSlice(). Covariant return types will ensure there's no virtual call when you know what container you operate on.I misunderstood your statement "[a container hierarchy] does need range interfaces." I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad.I see a range as being useful for iteration or algorithms, but not for general use. A great example is AAs. Would you say that an AA *is* a range or should *provide* a range? If it is a range, does that mean you remove elements as you call popFront? Does that make any sense? If it doesn't, then what happens if you add elements through another alias to that AA?An AA provides several ranges - among which byKey and byValue.Above all: the primitive set for a container must be a small set of functions that (a) can be implemented by all containers within reasonable efficiency bounds, and (b) are container-specific, not generic. IMHO any container design that defines a search(Element) as a primitive is misguided. Searching is not a container primitive - it's an algorithm. Only more specialized containers (e.g. trees, hashes etc.) can afford to define search(Element) as a primitive. Linear search is a generic algorithm that works the same for everyone. It does not belong as a method of any container.If the minimal container design isn't usable without std.algorithm, then I don't think it's worth having interfaces. If you need std.algorithm, you need the full implementation of the container because it's a compile-time interface. Interface ranges are something that should be avoided, it's like having a programming language where everything has to be a class. What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface. *AND* I want to make each loop in the search algorithm call 3 virtual functions!" How is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls? In fact, my implementation is guaranteed to be faster than std.algorithm because of the lack of virtual calls. -Steve
Jan 03 2010
Steven Schveighoffer wrote:On Sun, 03 Jan 2010 00:49:08 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:isForwardRange will be defined to yield true if and only if the range defines save. But I see you point - user code only asserts isForwardRange and then does not bother to use save(), just copies stuff around in confidence that copying does the right thing. Thanks for this insight. I don't see how to reconcile that with class ranges and covariance.Steven Schveighoffer wrote:Well, one thing you could do is: enum thisIsAnInputRange = true; and then no special implementation is needed for normal forward ranges. the other point is there is no special treatment needed inside algorithms -- the risk of forgetting to use save at the right points of the algorithm is higher than forgetting to say isForwardRange!(R) at the beginning of the function.My theory is, given this list of ranges, if you pair them with an algorithm that requires save capability, you wouldn't want to use that algorithm on it anyways (kinda like the consume example).Why gratuitously limit the design? You're asking to replace this: R save() { return this; } with: enum thisIsAForwardRange = true; Is there a reason? The former leaves in flexibility. The latter doesn't, for no good reason.Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm. If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces. opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.Yes, and I think remove(V v) does not belong to the minimum functionality. It combines two functions (search and remove) and raises questions such as what happens to duplicate elements.Search and positioned removal are also a primitives, but not defined on the interface. remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers. I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.BTW, the primitives in dcollections are: clear(); // clear all elements remove(V v); // remove an elementSearch and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.Again, it's part of the minimal usable interface. It's not a generic algorithm, because some containers can implement this more efficiently.contains(V v); // returns whether an element is contained in the collecitonI don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.Why?Some containers can't define O(1) length conveniently.All current dcollection containers have O(1) length.length(); // the length of the collectionThat's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.Right. The question is how much pressure Container is putting on the implementation. I'd rather leave it to the list implementation to decide to store the length or not.first, dcollections' list implementation is doubly linked because all collections are forward and backward iterable. second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list). Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.dup(); // duplicate the collection opApply(int delegate(ref V v) dg); // iterate the collection opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection That means it covers only empty in your list of must-have functions (via length() == 0).How do you implement length() for a singly-linked list? Is empty() going to take O(n)?Map!(K, V) has Tuple!(K, V) as its element type.How do you define that on Container(V)? on Map(K, V), set(K k, V v) is an interface method.Add is not a primitive because the Map collections shouldn't assign some random key to the new element. removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that.add is a primitive that takes Tuple!(K, V) as the element type.what you can do is define Map(K, V) as inheriting Container(Tuple!(K, V)), but then trying to use the container functions are very cumbersome. In dcollections, Map(K, V) inherits Collection(V).In this context: container-independent = using the Container interface. This is the whole purpose of creating a container hierarchy. If the container design fosters knowing the implementation, maybe a class hierarchy is not the right choice in the first place.No, you can easily write container-independent iteration as long as you have the implementation.Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range). This is because those primitives return a struct that is different for every container type.So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.If you use interfaces you can write opApply wrappers to do different things. I'm not sure what you mean by composition.For example, compose ranges a la retro or take.You can replace data by assigning to range's elements. Removal is done via positioned remove (which I think needs to be a primitive).How is it more flexible? You can't replace data, and you can't remove data while iterating, both of which are possible with dcollection's primitives. If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition? casting?It also surpasses opSlice via opApply, since all an input range can do anyways is iterate. In fact, opApply is more powerful because you can change elements and remove elements (via purging). Plus it's more efficient than a range-via-interface.An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.And takes away the ability to compose ranges and to use algorithms with the container.Not having opSlice on the interface guarantees you never have a virtual call for iteration :) opApply mitigates the virtual call on the interface.Yah, they'd offer it as a result of opSlice(). Covariant return types will ensure there's no virtual call when you know what container you operate on.I misunderstood your statement "[a container hierarchy] does need range interfaces." I thought you meant containers themselves need to implement the range interface, I see now that isn't the case, so my bad.I see a range as being useful for iteration or algorithms, but not for general use. A great example is AAs. Would you say that an AA *is* a range or should *provide* a range? If it is a range, does that mean you remove elements as you call popFront? Does that make any sense? If it doesn't, then what happens if you add elements through another alias to that AA?An AA provides several ranges - among which byKey and byValue.Why? I think the other way: if the minimal container design is unusable with std.algorithm, the design took a wrong turn somewhere.Above all: the primitive set for a container must be a small set of functions that (a) can be implemented by all containers within reasonable efficiency bounds, and (b) are container-specific, not generic. IMHO any container design that defines a search(Element) as a primitive is misguided. Searching is not a container primitive - it's an algorithm. Only more specialized containers (e.g. trees, hashes etc.) can afford to define search(Element) as a primitive. Linear search is a generic algorithm that works the same for everyone. It does not belong as a method of any container.If the minimal container design isn't usable without std.algorithm, then I don't think it's worth having interfaces.If you need std.algorithm, you need the full implementation of the container because it's a compile-time interface.How did you reach that conclusion? std.algorithm uses a syntactic interface that is obeyed by interfaces too. There's no problem.Interface ranges are something that should be avoided, it's like having a programming language where everything has to be a class.I disagree. The negation of an implementation dogma can be just as limiting as the dogma. The way I see it, a design defines some desiderata. Then it does whatever it takes to fulfill them. If one desideratum is to use a class hierachy to write container-independent code, then interface ranges naturally follow. There's no ifs and buts about it.What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface.This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.*AND* I want to make each loop in the search algorithm call 3 virtual functions!"Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.How is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls?It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container. Andrei
Jan 03 2010
On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Steven Schveighoffer wrote:The answer is, you don't. Ranges via interfaces are slower than primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm. If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces. opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.the function removes one value if it is in the container, zero if it is not. Another primitive interface Multi(V) adds a removeAll(V v) function. Again, it combines two functions that are awkward or difficult to implement using ranges via interfaces.Yes, and I think remove(V v) does not belong to the minimum functionality. It combines two functions (search and remove) and raises questions such as what happens to duplicate elements.Search and positioned removal are also a primitives, but not defined on the interface. remove was a primitive on Tango's containers, and dcollections was originally meant to be a replacement for Tango's containers. I think the point is, if you have an interface reference, what would be the minimum functionality needed so that you could use a container without knowing its implementation.BTW, the primitives in dcollections are: clear(); // clear all elements remove(V v); // remove an elementSearch and remove? That's an odd primitive. Why wouldn't you offer an interface for iteration that allows an algorithm for search, and a primitive for positioned removal?Not really, I say right in the docs that contains(V) can take O(n) time. There's no abstraction. I don't say that the algorithmic complexity is defined by the implementation. It's no different than std.algorithm's find.But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.Again, it's part of the minimal usable interface. It's not a generic algorithm, because some containers can implement this more efficiently.contains(V v); // returns whether an element is contained in the collecitonI don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.3 virtual calls per iteration, no possibility to inline, and reference copy semantics. The latter is the biggest problem for me. I want to be able to save ranges (iterators) for later use on containers, and having to call to the heap for each save is a deal killer.Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.Why?length is actually defined to return ~0 if it cannot compute the length quickly :) Forgot to mention that detail. So such containers have an "out" so to say. In fact, length is part of a different primitive I defined: Iterator(V) Iterator(V) defines length and opApply and can be implemented by any class, not just containers. My original goal when I planned dcollections for tango was for things like LineInput file readers and such to implement the Iterator interface. That way you could do cool stuff like: container.add(new LineInput("file.txt")); // add all the lines from the file but it didn't come to fruition, so Iterator is only implemented on dcollections...Some containers can't define O(1) length conveniently.All current dcollection containers have O(1) length.length(); // the length of the collectionThat's not a primitive either. std.algorithm.walkLength is. For me, all sorts of red flags and alarm buzzers go off when primitives are guaranteed that can't be implemented efficiently but by a subset of containers. You can't discount complexity as a implementation detail.That is possible with my design. I realize I didn't fully disclose the nature of length before. But of course, there aren't any containers in dcollections that *don't* define length. However, you make it sound like there are so many containers that would want to not store the length. There is only one -- linked lists, and only with special requirements (O(1) splicing) I guess this all goes back to how empty is to be composed. In dcollections, all containers implement O(1) length, so that is not an issue. If you had a specialized linked list implementation, I'm not sure how you could compose it with my primitives. But I don't consider it to be a worthwhile deficiency to fix. most of the time, you care if something is empty because you are about to take an element, or iterate it. Just call the thing you want to do, and there are ways to check if it didn't return anything. In any case, empty could be defined as a primitive if necessary (in all dcollections it would be written return length == 0).Right. The question is how much pressure Container is putting on the implementation. I'd rather leave it to the list implementation to decide to store the length or not.first, dcollections' list implementation is doubly linked because all collections are forward and backward iterable. second, even for singly linked lists, you can have either O(1) length or O(1) splicing (consuming a link list range into another linked list). Dcollections' default link list implementation uses O(1) length since I think splicing is a specialized requirement.dup(); // duplicate the collection opApply(int delegate(ref V v) dg); // iterate the collection opApply(int delegate(ref bool doPurge, ref V v) dg); // purge the collection That means it covers only empty in your list of must-have functions (via length() == 0).How do you implement length() for a singly-linked list? Is empty() going to take O(n)?That makes things awkward. For example, to remove, you must remove the K-V pair. typically you only want to specify the K or the V. I realize it doesn't make things awkward for your container interface since you don't *define* a remove function, but your container interface isn't very useful as a general container.Map!(K, V) has Tuple!(K, V) as its element type.How do you define that on Container(V)? on Map(K, V), set(K k, V v) is an interface method.Add is not a primitive because the Map collections shouldn't assign some random key to the new element. removeAny is defined only on sets and multisets, but I'm not sure that it couldn't be moved to Collection, in fact, I'll probably do that.add is a primitive that takes Tuple!(K, V) as the element type.how do you compose retro on an input range? that's all you got with your container interface. It's the same for opApply. take is simple: class TakeIterator(V) : Iterator(V) { private Iterator!V src; private uint nelems; public this(Iterator!V src, uint nelems) { this.src = src; this.nelems = nelems;} public int opApply(int delegate(ref V v) dg) { uint elems = 0; int result = 0; foreach(ref v; src) { if(elems++ > nelems || (result = dg(v)) != 0) break; } return result; } public uint length() { uint len = src.length(); return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems ? len : nelems; } }In this context: container-independent = using the Container interface. This is the whole purpose of creating a container hierarchy. If the container design fosters knowing the implementation, maybe a class hierarchy is not the right choice in the first place.No, you can easily write container-independent iteration as long as you have the implementation.Note that it's missing begin and end which are defined on every single container type (i.e. the equivalent of the all-elements range). This is because those primitives return a struct that is different for every container type.So you can't write container-independent iteration code unless you use opApply, in which case composition becomes tenuous.If you use interfaces you can write opApply wrappers to do different things. I'm not sure what you mean by composition.For example, compose ranges a la retro or take.That is available in dcollections, just not part of the interface. Every container implements remove(cursor) function, but since cursors are implementation specific, it can't be part of the interface.You can replace data by assigning to range's elements. Removal is done via positioned remove (which I think needs to be a primitive).How is it more flexible? You can't replace data, and you can't remove data while iterating, both of which are possible with dcollection's primitives. If you have a Container(E) which defines InputRange!E opSlice, how do you get at the more defined range definition? casting?It also surpasses opSlice via opApply, since all an input range can do anyways is iterate. In fact, opApply is more powerful because you can change elements and remove elements (via purging). Plus it's more efficient than a range-via-interface.An input range is a subset of other (more powerful) ranges. It's also much more flexible. I agree that calling range primitives via interfaces can become an efficiency liability.As long as you only have the interface and not the actual container. I have no problem with that.Not having opSlice on the interface guarantees you never have a virtual call for iteration :) opApply mitigates the virtual call on the interface.And takes away the ability to compose ranges and to use algorithms with the container.If I have to pull out std.algorithm to do simple things like remove a specific element from a container, where std.algorithm may not do the most efficient thing, then why the hell have an interface in the first place? If I have an interface, I want to make it do all the things all containers can do, not delegate it to some other part of the library. I trust that the interface knows how to do container functions in the best way possible.Why?Above all: the primitive set for a container must be a small set of functions that (a) can be implemented by all containers within reasonable efficiency bounds, and (b) are container-specific, not generic. IMHO any container design that defines a search(Element) as a primitive is misguided. Searching is not a container primitive - it's an algorithm. Only more specialized containers (e.g. trees, hashes etc.) can afford to define search(Element) as a primitive. Linear search is a generic algorithm that works the same for everyone. It does not belong as a method of any container.If the minimal container design isn't usable without std.algorithm, then I don't think it's worth having interfaces.I think the other way: if the minimal container design is unusable with std.algorithm, the design took a wrong turn somewhere.If the OO *interface* does not support std.algorithm, then that's good by me, seeing as how you have to use non-inlinable virtual functions to do simple tasks on ranges that cannot be copied without allocating on the heap. I think absolutely containers should support std.algorithm via the method std.algorithm is usable by any other range -- compile-time interfaces.It's a bastardized hack to use interfaces with std.algorithm. I think it's as useful as functional qsort. Yeah the big-O is the same, but the implementation sucks.If you need std.algorithm, you need the full implementation of the container because it's a compile-time interface.How did you reach that conclusion? std.algorithm uses a syntactic interface that is obeyed by interfaces too. There's no problem.You're starting to get too smart for me here :) I have no idea what disrderata means.Interface ranges are something that should be avoided, it's like having a programming language where everything has to be a class.I disagree. The negation of an implementation dogma can be just as limiting as the dogma. The way I see it, a design defines some desiderata. Then it does whatever it takes to fulfill them.If one desideratum is to use a class hierachy to write container-independent code, then interface ranges naturally follow. There's no ifs and buts about it.The container hierarchy can support two *different* paradigms. First, the paradigm of runtime interfaces which may be useful for using containers to store pieces of data to go with an object. Such as a socket cache that maps ip addresses to sockets. This cache cares nothing about sorting or doing retro iteration on the socket cache. It cares not about the implementation of the map, just that the map does map-like things (i.e. assign this key to this value, remove this key, etc.). The other paradigm is to have access to std.algorithm in the most efficient way possible. Such access requires full implementation knowledge to make the most efficient implementation of algorithms. In fact, containers can *use* std.algorithm underneath their member functions if so desired.But STL isn't accessed without the full implementation, you are comparing apples to oranges here. You said that find is something that should be relegated to std.algorithm. Isn't std.algorithm's find a linear search? If I have a Container!E, don't I need to use std.algorithm to search for an element with it's returned range? How is this not forcing a linear search when using an interface to a container? What is the answer to that, don't search for an element in a container unless you have the full implementation? That works in dcollections too!What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface.This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.Is that not the case we are discussing?*AND* I want to make each loop in the search algorithm call 3 virtual functions!"Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.That is available via the full implementation. With the interface, it's the "best you can do", because that's all that is available. "linear or better" is a better guarantee than "linear with guaranteed 3 no-inlining virtual function calls per element." -SteveHow is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls?It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container.
Jan 03 2010
Steven Schveighoffer wrote:On Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I understand we don't agree on this. To me, making containers work with algorithms is a drop-dead requirement so I will rest my case. Nevertheless, I think there's one point that got missed. It's a tad subtle, and I find it pretty cool because it's the first time I used covariant returns for something interesting. Consider: interface InputIter(T) { ... } interface ForwardIter(T) : InputIter!T { ... } class Container(T) { InputIter!T opSlice() { ... } } class Array(T) : Container(T) { class Iterator : ForwardIter!T { ... all final functions ... } Iterator opSlice(); } Now there are two use cases: a) The user uses Array specifically. auto a = new Array!int; sort(a[]); In this case, sort gets a range of type Array!(int).Iterator, which defines final primitives. Therefore, the compiler does not emit ONE virtual call throughout. I believe this point was lost. b) The user wants generality and OOP-style so uses Container throughout: Container!int a = new Array!int; auto found = find(a[], 5); This time find gets an InputIter!int, so it will use virtual calls for iteration. The beauty of this design is that without any code duplication it clearly spans the spectrum between static knowledge and dynamic flexibility - within one design and one implementation. This is the design I want to pursue.Steven Schveighoffer wrote:The answer is, you don't. Ranges via interfaces are slower than primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm. If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces. opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.So you need a way to do searching and a way to do positioned remove. Why combine them into one, when you can use both separately to great effect?Yes, and I think remove(V v) does not belong to the minimum functionality. It combines two functions (search and remove) and raises questions such as what happens to duplicate elements.the function removes one value if it is in the container, zero if it is not.Another primitive interface Multi(V) adds a removeAll(V v) function.Why do you need a separate interface for removeAll? Are there containers that can remove one value but not all?Again, it combines two functions that are awkward or difficult to implement using ranges via interfaces.I contend that point.Per my argument above, there may be zero virtual calls per iteration. (I want to make sure my point about covariance was understood even if it is not considered compelling.)Not really, I say right in the docs that contains(V) can take O(n) time. There's no abstraction. I don't say that the algorithmic complexity is defined by the implementation. It's no different than std.algorithm's find.But this is exactly what I believe to be a mistake: you are abstracting away algorithmic complexity.Again, it's part of the minimal usable interface. It's not a generic algorithm, because some containers can implement this more efficiently.contains(V v); // returns whether an element is contained in the collecitonI don't think this belongs to primitives. It's O(n) for many containers and again it's a generic algorithm, not a member.3 virtual calls per iteration, no possibility to inline, and reference copy semantics. The latter is the biggest problem for me. I want to be able to save ranges (iterators) for later use on containers, and having to call to the heap for each save is a deal killer.Plus, to use the generic algorithms, you would need to use interfaces as ranges which I think are completely useless.Why?However, you make it sound like there are so many containers that would want to not store the length. There is only one -- linked lists, and only with special requirements (O(1) splicing)I agree it's tenuous to leave length() out. It's a judgment call.It is. What is true is a converse of your statement - a general container isn't very useful as a map - which I'd agree with, and is sensible OO design. As long as a map is seen as a container, it _does_ store K-V pairs. If I know it's a map, great, it defines a different primitive for removing an element given its key.Map!(K, V) has Tuple!(K, V) as its element type.That makes things awkward. For example, to remove, you must remove the K-V pair. typically you only want to specify the K or the V. I realize it doesn't make things awkward for your container interface since you don't *define* a remove function, but your container interface isn't very useful as a general container.You need to be a bit down in the hierarchy to use retro.how do you compose retro on an input range? that's all you got with your container interface. It's the same for opApply.If you use interfaces you can write opApply wrappers to do different things. I'm not sure what you mean by composition.For example, compose ranges a la retro or take.take is simple: class TakeIterator(V) : Iterator(V) { private Iterator!V src; private uint nelems; public this(Iterator!V src, uint nelems) { this.src = src; this.nelems = nelems;} public int opApply(int delegate(ref V v) dg) { uint elems = 0; int result = 0; foreach(ref v; src) { if(elems++ > nelems || (result = dg(v)) != 0) break; } return result; } public uint length() { uint len = src.length(); return len == NO_LENGTH_SUPPORT ? NO_LENGTH_SUPPORT : len < nelems ? len : nelems; } }And then how do I compose take with something else, or pass it to std.algorithm.find()? This whole thing doesn't hold water.Well that's a problem with the design, no? Why then define a hierarchy? A simple set of unrelated types may be a better design.You can replace data by assigning to range's elements. Removal is done via positioned remove (which I think needs to be a primitive).That is available in dcollections, just not part of the interface. Every container implements remove(cursor) function, but since cursors are implementation specific, it can't be part of the interface.If I have to pull out std.algorithm to do simple things like remove a specific element from a container, where std.algorithm may not do the most efficient thing, then why the hell have an interface in the first place?Things can be arranged such that std.algorithm does do the best thing, and is also the most general and reusable approach. The STL has partly shown that. I plan to take that one step further - to show that container-independent code can be gainfully written with a hierarchy of containers (something that STL failed at).If I have an interface, I want to make it do all the things all containers can do, not delegate it to some other part of the library. I trust that the interface knows how to do container functions in the best way possible.The interface must be expressive enough to let generic algorithms do their job.Did my covariance argument above convince you otherwise?I think the other way: if the minimal container design is unusable with std.algorithm, the design took a wrong turn somewhere.If the OO *interface* does not support std.algorithm, then that's good by me, seeing as how you have to use non-inlinable virtual functions to do simple tasks on ranges that cannot be copied without allocating on the heap.I think absolutely containers should support std.algorithm via the method std.algorithm is usable by any other range -- compile-time interfaces.I agree :o).Come on, the Internet is there. I meant "essential goal".I disagree. The negation of an implementation dogma can be just as limiting as the dogma. The way I see it, a design defines some desiderata. Then it does whatever it takes to fulfill them.You're starting to get too smart for me here :) I have no idea what disrderata means.Yah. Covariance takes care of all of the above. I'd almost agree with you as of a couple of months ago, when the whole covariance thing hadn't occurred to me.If one desideratum is to use a class hierachy to write container-independent code, then interface ranges naturally follow. There's no ifs and buts about it.The container hierarchy can support two *different* paradigms. First, the paradigm of runtime interfaces which may be useful for using containers to store pieces of data to go with an object. Such as a socket cache that maps ip addresses to sockets. This cache cares nothing about sorting or doing retro iteration on the socket cache. It cares not about the implementation of the map, just that the map does map-like things (i.e. assign this key to this value, remove this key, etc.). The other paradigm is to have access to std.algorithm in the most efficient way possible. Such access requires full implementation knowledge to make the most efficient implementation of algorithms. In fact, containers can *use* std.algorithm underneath their member functions if so desired.YES. If a design must express define linear search in more than one place, then that design is suboptimal. This is an important goal I want to follow.But STL isn't accessed without the full implementation, you are comparing apples to oranges here. You said that find is something that should be relegated to std.algorithm.What you are saying seems completely incorrect to me: "since not all containers can implement fast search, I'm going to guarantee that *all* containers use a linear search via their interface.This is a misunderstanding. In the STL linear containers don't define find(), but associative containers do. That is the correct design.Isn't std.algorithm's find a linear search?Yes.If I have a Container!E, don't I need to use std.algorithm to search for an element with it's returned range?No. It's like in the STL - either the container defines its own searching primitives (e.g. set or map), or you can always grab the container's range and use the well-known linear search defined by std.algorithm.How is this not forcing a linear search when using an interface to a container?It isn't forcing a linear search. It is fostering awareness of the capabilities needed by the compiler. A design that defines a best-effort find() as a member is suboptimal. There's no guarantee it can provide. A putative user cannot tell anything about the complexity of their code that uses the find() member. A good design has the user say, hey, linear search is ok here; in this other place, I need a fast find so I can't use Container!T, I need AssociativeContainer!T.What is the answer to that, don't search for an element in a container unless you have the full implementation? That works in dcollections too!The shape and characteristic of the search is defined by the type of the container. A great library that got mentioned here in the past: http://www.gobosoft.com/eiffel/gobo/structure/container.html I strongly recommend studying that design; it is very close to optimality. They define DS_SEARCHABLE as a subclass of DS_CONTAINER - as they should.No, there are two cases as I discussed above in this post.Is that not the case we are discussing?*AND* I want to make each loop in the search algorithm call 3 virtual functions!"Not necessarily. This happens only if you use the Container interface to write container-independent code. It is the cost it takes for the design to fulfill its desiderata.I think one issue might be that you think "interface" and "implementation" with nothing in between. I'm talking "gradually more specialized interface", as the Gobo library above defines. AndreiThat is available via the full implementation. With the interface, it's the "best you can do", because that's all that is available. "linear or better" is a better guarantee than "linear with guaranteed 3 no-inlining virtual function calls per element."How is that better than a search function that guarantees linear performance but gives the option of being as fast as possible with no loop virtual calls?It is better because it doesn't write off search complexity as an implementation detail. "Linear or better" is not a good guarantee. A good guarantee is "From this node of the hierarchy down, this primitive is defined to guarantee O(log n) search". Linear search is a generic algorithm and does not belong to any container.
Jan 03 2010
We are arguing in circles, so I will just stop :) I'll address the one point I think we both feel is most important below On Sun, 03 Jan 2010 17:19:52 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:Steven Schveighoffer wrote:I see why it is attractive to you. But to me, algorithms are not the main driver for containers. One thing we both agree on -- when you know the full implementation, algorithms from std.algorithm should be implemented as fast as possible. Where we disagree is what is desired when the full implementation is not known. I contend that the ability to link with std.algorithm isn't a requirement in that case, and is not worth complicating the whole world of ranges to do so (i.e. build std.algorithm just in case you have reference-type ranges with a "save" requirement). If you don't know the implementation, don't depend on std.algorithm to have all the answers, depend on the implementation which is abstracted correctly by the interface. What this means is there will be some overlap in functions that are defined on the interfaces and functions defined in std.algorithm. Most likely the overlapping functions both point to the same implementation (naturally, this should live in std.algorithm). This is for convenience to people who want to use containers in the interface form, so they do not need to concern themselves with the abilities of std.algorithm, they just want containers that help them get work done. There is still one flaw in your ideas for covariant types: even though final functions can be used throughout and the possibility for inlining exists, you *still* need to use the heap to make copies of ranges. That was and still is my biggest concern. I think the rest of this whole post is based on our disagreement of these design choices, and it really doesn't seem productive to continue all the subtle points. [rest of growing disagreement snipped] BTW, I use covariant return types freely in dcollections and I agree it kicks ass. It seems like black magic especially when you are returning possibly a class or interface which need to be handled differently. -SteveOn Sun, 03 Jan 2010 09:25:25 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I understand we don't agree on this. To me, making containers work with algorithms is a drop-dead requirement so I will rest my case. Nevertheless, I think there's one point that got missed. It's a tad subtle, and I find it pretty cool because it's the first time I used covariant returns for something interesting. Consider: interface InputIter(T) { ... } interface ForwardIter(T) : InputIter!T { ... } class Container(T) { InputIter!T opSlice() { ... } } class Array(T) : Container(T) { class Iterator : ForwardIter!T { ... all final functions ... } Iterator opSlice(); } Now there are two use cases: a) The user uses Array specifically. auto a = new Array!int; sort(a[]); In this case, sort gets a range of type Array!(int).Iterator, which defines final primitives. Therefore, the compiler does not emit ONE virtual call throughout. I believe this point was lost. b) The user wants generality and OOP-style so uses Container throughout: Container!int a = new Array!int; auto found = find(a[], 5); This time find gets an InputIter!int, so it will use virtual calls for iteration. The beauty of this design is that without any code duplication it clearly spans the spectrum between static knowledge and dynamic flexibility - within one design and one implementation. This is the design I want to pursue.Steven Schveighoffer wrote:The answer is, you don't. Ranges via interfaces are slower than primitive functions defined on the interface, so use what's best for interfaces when you have an interface and what's best for compile-time optimization when you have the full implementation.Not having opSlice be part of the interface itself does not preclude it from implementing opSlice, and does not preclude using ranges of it in std.algorithm. If I'm not mistaken, all functions in std.algorithm rely on compile time interfaces. opApply allows for full input range functionality for things like copying and outputting where templates may not be desired.The point is how much container-independent code can someone write by using the Container interface. If all you have is a Container, you can't use it with any range algorithm.
Jan 03 2010