digitalmars.D - Safe Cursors and Ranges
- Pillsy (22/22) Aug 26 2010 These thoughts were inspired by the recent thread, "Retrieving the
- Steven Schveighoffer (43/81) Aug 26 2010 That's exactly how cursors perform in dcollections. An excerpt from
- Pillsy (18/29) Aug 26 2010 [...]
- Steven Schveighoffer (51/78) Aug 26 2010 A cursor that refers to one element is much more efficient to pass
- Pillsy (33/74) Aug 26 2010 Ah, yes. If you just need the one element, I figure that you can use
- Steven Schveighoffer (46/104) Aug 26 2010 I think there is a misunderstanding of how I went about defining
- dsimcha (39/61) Aug 26 2010 operations to get a huge amount of benefit:
- Andrei Alexandrescu (25/36) Aug 26 2010 I'm weary of this philosophy. I've seen a phenomenon happen time and
- Jonathan M Davis (36/59) Aug 26 2010 I think that what it comes down to is that you need to be smart about wh...
- Pillsy (29/52) Aug 27 2010 I really don't disagree. One of the reasons I'm suggesting that we
- Andrei Alexandrescu (13/15) Aug 26 2010 (I wrote my last post before reading yours.) Excellent points. I do
These thoughts were inspired by the recent thread, "Retrieving the traversed range". I am generally pretty sold on the range concept, but I think for some operations some sort of cursor is required. However, there are some unsafe cursor-related operations that are best avoided. However, as far as I can see, the unsafe operations revolve around synthesizing a new range out of two cursors. However, you can get a lot of the desired functionality associated with finding elements and splitting ranges *without* allowing the synthesis of of ranges from unrelated cursors. The key to doing this is to have each cursor be permanently bound to the underlying range which it was derived from. If you have this kind of cursor, you only really need it to support two operations to get a huge amount of benefit: Range before (); Range after (); The before () operation just returns a range of the appropriate type (Forward, Bidirectional, RandomAccess) of all the elements that come before the cursor is pointing to, while the after () operation returns a range of the appropriate type with all the elements after the cursor. Given this concept, a cursor really points at the boundary between elements of a range, if you will. If find and related operations return this kind of cursor, all of the operations involving splitting ranges become trivial. It's completely consistent for either before () or after () to be empty, and since you can't glue two of these cursors together, you are always guaranteed to get well-defined ranges out. While I think the before () and after () operations are the only ones which are strictly necessary for the desiderata I've seen so far, people could write their own generic algorithms if cursors supported more of the iterator operations, for moving forward (and backward, where appropriate) and inspecting the element immediately after the cursor (or before, where appropriate), and if ranges supported operations to return cursors from reasonable places (like from the front or back, or from a specific indexed elemnt for RandomAccess ranges). The key idea is that these cursors aren't a primitive part of a range; instead, they take a range and add a position *inside* the range. They're a perfect fit for all those "three-legged" operations out there because they actually have three legs. Cheers, Pillsy
Aug 26 2010
On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com> wrote:These thoughts were inspired by the recent thread, "Retrieving the traversed range". I am generally pretty sold on the range concept, but I think for some operations some sort of cursor is required. However, there are some unsafe cursor-related operations that are best avoided. However, as far as I can see, the unsafe operations revolve around synthesizing a new range out of two cursors. However, you can get a lot of the desired functionality associated with finding elements and splitting ranges *without* allowing the synthesis of of ranges from unrelated cursors. The key to doing this is to have each cursor be permanently bound to the underlying range which it was derived from. If you have this kind of cursor, you only really need it to support two operations to get a huge amount of benefit: Range before (); Range after (); The before () operation just returns a range of the appropriate type (Forward, Bidirectional, RandomAccess) of all the elements that come before the cursor is pointing to, while the after () operation returns a range of the appropriate type with all the elements after the cursor. Given this concept, a cursor really points at the boundary between elements of a range, if you will. If find and related operations return this kind of cursor, all of the operations involving splitting ranges become trivial. It's completely consistent for either before () or after () to be empty, and since you can't glue two of these cursors together, you are always guaranteed to get well-defined ranges out. While I think the before () and after () operations are the only ones which are strictly necessary for the desiderata I've seen so far, people could write their own generic algorithms if cursors supported more of the iterator operations, for moving forward (and backward, where appropriate) and inspecting the element immediately after the cursor (or before, where appropriate), and if ranges supported operations to return cursors from reasonable places (like from the front or back, or from a specific indexed elemnt for RandomAccess ranges). The key idea is that these cursors aren't a primitive part of a range; instead, they take a range and add a position *inside* the range. They're a perfect fit for all those "three-legged" operations out there because they actually have three legs.That's exactly how cursors perform in dcollections. An excerpt from dcollections' concepts document: Cursors: Cursors are special 1 or 0 element ranges. They implement the forward range interface. The benefit to using cursors is that they always refer to exactly one element. A normal range uses two marker elements, a begin and an end element. Therefore, cursors are less susceptible to invalidation on removal of elements. Cursors support these additional features: - Use a cursor as a reference point when dealing with a container. - Use as an end point when slicing a container. Note that some containers only support slicing with an arbitrary cursor and the beginning or end of the container. Slicing is guaranteed to be a fast operation (lgn or better). Determining cursor/range ownership: All collections can identify whether a cursor/range belongs to them. Each collection class has a 'belongs' method to determine ownership. The ownership check is guaranteed to be a fast operation (O(lgN) or better). ------------------- The 'three-legged' cursor as you propose was an early idea of mine too. Because dcollections containers are classes however, there is an easy point of ownership to establish -- the class reference. So a dcollections cursor doesn't have to refer to its owner, but it can in order to fulfill the requirement. My view is that a cursor should be separate from a range, and a range should either be able to tell if a cursor is part of it ( safe) or be told that a cursor is a part of it ( system or trusted). Inside a three-legged algorithm which accepts a range as input, and generates a cursor from the range for the third leg, it can force the range to assume the cursor is a part of it, knowing that it's true. You can make the second operation safe by wrapping a cursor + range in a struct that generates the cursor from the range (and therefore knows the cursor is a part of the range). For two examples of ranges that can tell if a cursor is a part of it, an array can tell if a cursor is a part of it without any extra mechanisms because it is an O(1) check for an interior pointer. A balanced tree can also perform an O(lgn) check that a cursor is part of it (though it's questionable whether a tree is a range). -Steve
Aug 26 2010
Steven Schveighoffer Wrote:On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com> wrote:[...][...]The key idea is that these cursors aren't a primitive part of a range; instead, they take a range and add a position *inside* the range. They're a perfect fit for all those "three-legged" operations out there because they actually have three legs.The 'three-legged' cursor as you propose was an early idea of mine too.If you don't mind me asking, what made you give up on it?My view is that a cursor should be separate from a range, and a range should either be able to tell if a cursor is part of it ( safe) or be told that a cursor is a part of it ( system or trusted).What's the advantage of not holding on to the range as part of the cursor? I'm more focused on the typical value-type ranges, but I'm not seeing a lot of need for cursors that have lives independent of underlying ranges. Is the concern just one of more frequent invalidation? IME, three-legged operations are fairly common, but four- and more- legged operations which can't be easily decomposed into consecutive three-legged operations are very uncommon. What I'm thinking is that cursors aren't skinny ranges, they're ranges that are even fatter. You simply wouldn't use one where a range would suffice. Allowing questions about whether a range contains a cursor or not strikes me as just buying trouble. Thanks for your thoughts, Pillsy
Aug 26 2010
On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com> wrote:Steven Schveighoffer Wrote:A cursor that refers to one element is much more efficient to pass around/copy. Plus the extra baggage was not needed in dcollections -- any operations you perform on a cursor besides changing the value require you to have the container also (i.e. a cursor is mainly a parameter to the container). I don't know how this relates to three-legged algorithms and straight ranges+cursors, dcollections ranges+cursors would need some extra binding support to use them, since you need the container to create a range based on two cursors.On Thu, 26 Aug 2010 10:18:51 -0400, Pillsy <pillsbury gmail.com> wrote:[...][...]The key idea is that these cursors aren't a primitive part of a range; instead, they take a range and add a position *inside* the range. They're a perfect fit for all those "three-legged" operations out there because they actually have three legs.The 'three-legged' cursor as you propose was an early idea of mine too.If you don't mind me asking, what made you give up on it?Because some cursors do not need the hard reference, it is trivial to tell whether such cursors belong to a range. For instance, it's trivial to tell that a pointer is part of an array. So a cursor for an array would not need to refer the range it belongs to. For instance, if I have an array, and a cursor in that array, it's trivial for an allBefore operation to determine that the cursor is a member of the array. Now, let's say we have two slices that contain the same cursor, in your scheme, in order to use one cursor against the other range, you need to recompose the cursor, which seems unnecessarily awkward. Such a scheme is not possible on other ranges, but that just means those ranges don't support safe calls for allBefore and the like. For those ranges, a 3-legged type would be appropriate, but the 3-legged thing could be a combination of a cursor and a range, with the cursor proven to be part of the range -- an invariant of the 3-legged type. Having a cursor point to exactly one element allows more flexibility, because you could use a cursor with multiple ranges. It also decouples the cursor from a range that might be superfluous information. For example, if I want to store a link list element cursor so I can change it later, why do I also have to carry around the baggage of the range it came from? One of the drawbacks is the cursor cannot move to other elements or alter topology, it's strictly a reference-only. Another reason to have the 3-legged type.My view is that a cursor should be separate from a range, and a range should either be able to tell if a cursor is part of it ( safe) or be told that a cursor is a part of it ( system or trusted).What's the advantage of not holding on to the range as part of the cursor? I'm more focused on the typical value-type ranges, but I'm not seeing a lot of need for cursors that have lives independent of underlying ranges. Is the concern just one of more frequent invalidation?IME, three-legged operations are fairly common, but four- and more- legged operations which can't be easily decomposed into consecutive three-legged operations are very uncommon.I have no experience with three-legged operations. I just looked at cursors from the point of view of reference to a single element. Such an ability is important for a container library. I can see such an ability being important for simple operations as well, not just algorithms.What I'm thinking is that cursors aren't skinny ranges, they're ranges that are even fatter. You simply wouldn't use one where a range would suffice. Allowing questions about whether a range contains a cursor or not strikes me as just buying trouble.But you don't always care if a cursor is a part of a range. For example, to change or refer to a specific element of a range, why do you need to refer to the range also? The composition of a range and cursor together can be guarded by system or trusted indicating the type system is taking your word for it, but in some cases, it can be proven without assumption (i.e. arrays). I don't have a problem with defining such a three-legged type, but can we call it something besides a cursor? Not that I have a trademark or anything, but it would be really confusing to have it mean two different things depending on context :) As I said, I can see utility in having a three-legged beast that combines a cursor and a range, but I see value in having a simple one-element cursor as well. -Steve
Aug 26 2010
Steven Schveighoffer Wrote:On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com> wrote:[...]If you don't mind me asking, what made you give up on it?A cursor that refers to one element is much more efficient to pass around/copy.Ah, yes. If you just need the one element, I figure that you can use a range. In most of the cases I'm familiar with, ranges are not distinctly less lightweight than array slices, and in D one tends to blithely pass slices all over the place. Things could be different for iterating over trees---is that where you ran into trouble with the idea in dcollections?Plus the extra baggage was not needed in dcollections -- any operations you perform on a cursor besides changing the value require you to have the container also (i.e. a cursor is mainly a parameter to the container). I don't know how this relates to three-legged algorithms and straight ranges+cursors, dcollections ranges+cursors would need some extra binding support to use them, since you need the container to create a range based on two cursors.What's the advantage of not holding on to the range as part of the cursor? I'm more focused on the typical value-type ranges, but I'm not seeing a lot of need for cursors that have lives independent of underlying ranges. Is the concern just one of more frequent invalidation?Because some cursors do not need the hard reference, it is trivial to tell whether such cursors belong to a range. For instance, it's trivial to tell that a pointer is part of an array. So a cursor for an array would not need to refer the range it belongs to.Ah, I see. That would make things harder to use for the specific use case I'm thinking of, where one has a range one would like to split at a certain element (which you might have found any number of ways). Basically, you can just pass around the cursor, instead of needing the cursor and the range. This strikes me as likely to be very helpful for three-legged races^W cases.For instance, if I have an array, and a cursor in that array, it's trivial for an allBefore operation to determine that the cursor is a member of the array. Now, let's say we have two slices that contain the same cursor, in your scheme, in order to use one cursor against the other range, you need to recompose the cursor, which seems unnecessarily awkward.The way I see it, two different slices can't contain the same cursor; a cursor into a slice will always be associated with that slice and no other. Maybe there are important use cases I'm missing by defining things this way, but I don't know what they might be. [...]For example, if I want to store a link list element cursor so I can change it later, why do I also have to carry around the baggage of the range it came from?You may not, but why not just hang on to the range it came from in the first place instead of using the cursor? What's in the link-list range besides a pointer to a node anyway? [...]What I'm thinking is that cursors aren't skinny ranges, they're ranges that are even fatter. You simply wouldn't use one where a range would suffice. Allowing questions about whether a range contains a cursor or not strikes me as just buying trouble.But you don't always care if a cursor is a part of a range. For example, to change or refer to a specific element of a range, why do you need to refer to the range also?You don't, but what do you lose by referring to the range also? A few words on the stack (sticking with the idea that ranges are value types)? [...]I don't have a problem with defining such a three-legged type, but can we call it something besides a cursor?Sure. How about Tripod? 8^) I was thinking of calling them either Splitters, actually, 'cause that's what they're there for: splitting ranges. Cheers, Pillsy
Aug 26 2010
On Thu, 26 Aug 2010 13:30:35 -0400, Pillsy <pillsbury gmail.com> wrote:Steven Schveighoffer Wrote:I think there is a misunderstanding of how I went about defining dcollections cursors. I never actually tried the tripod idea, I just posted it to the newsgroup as a possible solution for dcollections cursors. Originally in dcollections 1, cursors were iterators -- allowing movement of the cursor as well as dereferencing. This was way way back, before ranges were a prevalent concept and I was trying to write a better collection library for tango. When negotiating with Andrei on allowing me to have cursors (when I hoped dcollections could be a part of phobos), I went through several different ideas to try and make cursors safe. I ended up trying to define a simple reference type that didn't allow iterating. That is, a reference to a single item that cannot move, thereby removing the unsafe nature of iterators. I ran into the problem that the "end" cursor was pointing to an invalid element, so you still had to rely on the coder not referencing it, just like in C++. I decided I'd add a bool to the cursor to let it know it was invalid, and the idea hit me all of a sudden -- that's just a 0 or 1-element range depending on the bool. So the current incarnation was born. And I have to say, it's quite usable. Once I defined it that way, the code almost wrote itself. The issue with using ranges to refer to single elements is, it's difficult to keep a node-based range intact, for later use. For example, if you want to refer to a single element in a tree, and you use a range to do that (assuming it's a range with a start and end node), things can happen. More elements could be inserted between them, the end marker of your range could get removed, etc. Then when you pass the range to the tree's remove function, you're giving it an invalid range, or removing more elements than you wanted. Cursors are not susceptible to these problems, since they only ever refer to one element. So there *is* value in having a special cursor type that only refers to one element vs. a range, regardless of the copy performance.On Thu, 26 Aug 2010 11:17:32 -0400, Pillsy <pillsbury gmail.com> wrote:[...]If you don't mind me asking, what made you give up on it?A cursor that refers to one element is much more efficient to pass around/copy.Ah, yes. If you just need the one element, I figure that you can use a range. In most of the cases I'm familiar with, ranges are not distinctly less lightweight than array slices, and in D one tends to blithely pass slices all over the place. Things could be different for iterating over trees---is that where you ran into trouble with the idea in dcollections?Sure, but you can build such a type and its corresponding splitter function (which actually could be a member of the tripod) in terms of the low-level function which are more powerful because they accept two different entities -- the range and the cursor.Because some cursors do not need the hard reference, it is trivial to tell whether such cursors belong to a range. For instance, it's trivial to tell that a pointer is part of an array. So a cursor for an array would not need to refer the range it belongs to.Ah, I see. That would make things harder to use for the specific use case I'm thinking of, where one has a range one would like to split at a certain element (which you might have found any number of ways). Basically, you can just pass around the cursor, instead of needing the cursor and the range. This strikes me as likely to be very helpful for three-legged races^W cases.Not sure what the use cases would be, but defining the simplest thing possible and then building more complex concepts on top of them usually allows more possibilities.For instance, if I have an array, and a cursor in that array, it's trivial for an allBefore operation to determine that the cursor is a member of the array. Now, let's say we have two slices that contain the same cursor, in your scheme, in order to use one cursor against the other range, you need to recompose the cursor, which seems unnecessarily awkward.The way I see it, two different slices can't contain the same cursor; a cursor into a slice will always be associated with that slice and no other. Maybe there are important use cases I'm missing by defining things this way, but I don't know what they might be. [...]The pointer to the possibly invalid end node. See my above description.For example, if I want to store a link list element cursor so I can change it later, why do I also have to carry around the baggage of the range it came from?You may not, but why not just hang on to the range it came from in the first place instead of using the cursor? What's in the link-list range besides a pointer to a node anyway?[...]It becomes awkward when you are passing a range around but are saying "well, really, I'm only passing you a reference to the first element, ignore the rest of it". You end up naming things like removeFirstElement(range r) instead of remove(cursor c). This is from my experience with defining dcollections and a std.container redblacktree.What I'm thinking is that cursors aren't skinny ranges, they're ranges that are even fatter. You simply wouldn't use one where a range would suffice. Allowing questions about whether a range contains a cursor or not strikes me as just buying trouble.But you don't always care if a cursor is a part of a range. For example, to change or refer to a specific element of a range, why do you need to refer to the range also?You don't, but what do you lose by referring to the range also? A few words on the stack (sticking with the idea that ranges are value types)?[...]I like it :) -SteveI don't have a problem with defining such a three-legged type, but can we call it something besides a cursor?Sure. How about Tripod? 8^)
Aug 26 2010
== Quote from Pillsy (pillsbury gmail.com)'s articleThese thoughts were inspired by the recent thread, "Retrieving the traversed range". I am generally pretty sold on the range concept, but I think for some operations some sort of cursor is required. However, there are some unsafe cursor-related operations that are best avoided. However, as far as I can see, the unsafe operations revolve around synthesizing a new range out of two cursors. However, you can get a lot of the desired functionality associated with finding elements and splitting ranges *without* allowing the synthesis of of ranges from unrelated cursors. The key to doing this is to have each cursor be permanentlybound to the underlying range which it was derived from.If you have this kind of cursor, you only really need it to support twooperations to get a huge amount of benefit:Range before (); Range after (); The before () operation just returns a range of the appropriate type (Forward,Bidirectional, RandomAccess) of all the elements that come before the cursor is pointing to, while the after () operation returns a range of the appropriate type with all the elements after the cursor. Given this concept, a cursor really points at the boundary between elements of a range, if you will.If find and related operations return this kind of cursor, all of the operationsinvolving splitting ranges become trivial. It's completely consistent for either before () or after () to be empty, and since you can't glue two of these cursors together, you are always guaranteed to get well-defined ranges out.While I think the before () and after () operations are the only ones which are strictly necessary for the desiderata I've seen so far, people could write their own generic algorithms if cursors supported more of the iterator operations, for moving forward (and backward, where appropriate)and inspecting the element immediately after the cursor (or before, where appropriate), and if ranges supported operations to return cursors from reasonable places (like from the front or back, or from a specific indexed elemnt for RandomAccess ranges).The key idea is that these cursors aren't a primitive part of a range; instead,they take a range and add a position *inside* the range. They're a perfect fit for all those "three-legged" operations out there because they actually have three legs.Cheers, PillsyI'm starting to think that the whole concept of ranges is starting to get too complicated. We're already seeing it with this moveFront() stuff that's a special case to deal with structs that have expensive postblits. We need a healthy dose of "worse is better" to throw at ranges. IMHO generic code should solve 90% of the problem in a way that's robust, not terribly hard to implement and easy to use and understand. Trying to solve 100% of the problem with generic code means that your solution will be absurdly complicated in both interface and implementation, and therefore will be a buggy POS that nobody can figure out how to use. Adding yet more complexity to ranges has the following negative consequences, such that I think it's justified to sacrifice completeness in favor of simplicity: 1. Having yet more things to wrap makes writing correct higher order ranges that much more difficult and bug-prone. The composability of ranges is what makes them such an elegant and useful abstraction in the first place. 2. The combinatorial explosion of range types (isBidirectional, hasBefore, hasLength, ...) makes testing a higher order range a nightmare. 3. The more stuff we require someone to define to create a range (for example, if before() were simply required for bidirectional ranges), the higher the odds that we'll run into cases where one of these primitives can't be implemented efficiently. 4. I remember when ranges first came out, some people were pretty confused by them. Adding more stuff to solve the last 10% of cases will only add to this. It's better to have ranges that can solve 90% of the problem for the most competent 80% of programmers than ranges that can solve 100% of the problem, but only for the 1% who are hardcore gurus.
Aug 26 2010
On 8/26/10 9:18 PDT, dsimcha wrote:I'm starting to think that the whole concept of ranges is starting to get too complicated. We're already seeing it with this moveFront() stuff that's a special case to deal with structs that have expensive postblits. We need a healthy dose of "worse is better" to throw at ranges. IMHO generic code should solve 90% of the problem in a way that's robust, not terribly hard to implement and easy to use and understand. Trying to solve 100% of the problem with generic code means that your solution will be absurdly complicated in both interface and implementation, and therefore will be a buggy POS that nobody can figure out how to use.[...]It's better to have ranges that can solve 90% of the problem for the most competent 80% of programmers than ranges that can solve 100% of the problem, but only for the 1% who are hardcore gurus.I'm weary of this philosophy. I've seen a phenomenon happen time and again with myself and others: you hit a 10% case exactly when you (a) know what you want to do, (b) you know that the library could and should help you, and (c) you are working on a difficult enough problem to be wary of implementing it by hand. Two examples from recent history: A poster used std.algorithm for the _first_ (sic!) time, and found a fundamental limitation: std.algorithm doesn't work with immutable arrays. Second, Peter Alexander wanted to implement nextPermutation (a nontrivial algorithm in which you need all the help you can get) as his _first_ (sic again!) artifact of a new library, and hit the bidirectional range composition limitation. I don't think we want to aim std.algorithm and others for a 90% thing. That's what Visual Basic did. We should aim for making it a 100% thing without losing the other costs from sight (something that I believe we have a good track record on). In light of that, it's worth looking at optional functionality of the kind I described (allBefore etc.) would add power without aggravating everyone. Also, regarding moveFront and friends, I think it's a good time to start the following discussion: do we want to support types with arbitrarily expensive copying, or could we simply require reference counting and copy-on-write for such types? Let's start a new thread about this! Andrei
Aug 26 2010
On Thursday, August 26, 2010 10:51:02 Andrei Alexandrescu wrote:On 8/26/10 9:18 PDT, dsimcha wrote:I think that what it comes down to is that you need to be smart about what functionality that you do and don't include. David has an excellent point that when APIs become too complicated, they don't get used. For instance, I know _very_ few people who even _consider_ using C++'s algorithm library. It's just too hard to use. It has too many quirks (like how remove() works) and the lack of lambdas functions makes it incredibly painful to do simple things with it. Any library with much complexity is going to run into such issues, but they need to be minimized for a library to be properly useable. We can't afford to make ranges too complex. They need to be reasonably useable by your average programmer. Any time that we need to add functionality to make them do more, we need to weigh the benefits and costs. If adding ability X makes it possible to do thing Y, but only a few people need to do thing Y, and adding feature X greatly increases the complexity, then it probably shouldn't be added. On the other hand, if adding feature X doesn't add much complexity and/or a lot of people need to do thing Y, then it would likely be a good thing to add feature X. If we find something that a lot of people are trying to do and can't, then we need to really look at a reasonable way to make it possible. However, there are limits to what you can do without making Phobos' APIs unreasonably complex to understand and use. I do _not_ think that we should necessarily be hitting the 100% case. The cost for that is likely too high. However, we probably should be going for more like the 98% or 99% case. (after all 90% indicates that 1 out of 10 things that you try to do won't be feasible, and that's actually a lot, much as 90% sounds like a big number). So, I do think that we should be making ranges (and the rest of Phobos) as powerful we reasonably can, but there _are_ limits to what they can reasonably do, and we should not necessarily try to make it so that they can do absolutely everything. And when we do make ranges more powerful, we need to strive to limit the increase to complexity as much as possible. Ultimately, ranges really should be powerful enough to do virtually everything that you would normally do with iterators (along with what they can do that iterators can't), but they should also be useable by the average programmer. Whatever we do needs to reasonably balance those goals. And I think that there is legitmate concern that ranges are risking becoming too complex. - Jonathan M DavisI'm starting to think that the whole concept of ranges is starting to get too complicated. We're already seeing it with this moveFront() stuff that's a special case to deal with structs that have expensive postblits. We need a healthy dose of "worse is better" to throw at ranges. IMHO generic code should solve 90% of the problem in a way that's robust, not terribly hard to implement and easy to use and understand. Trying to solve 100% of the problem with generic code means that your solution will be absurdly complicated in both interface and implementation, and therefore will be a buggy POS that nobody can figure out how to use.[...]It's better to have ranges that can solve 90% of the problem for the most competent 80% of programmers than ranges that can solve 100% of the problem, but only for the 1% who are hardcore gurus.I'm weary of this philosophy. I've seen a phenomenon happen time and again with myself and others: you hit a 10% case exactly when you (a) know what you want to do, (b) you know that the library could and should help you, and (c) you are working on a difficult enough problem to be wary of implementing it by hand.
Aug 26 2010
Jonathan M Davis Wrote:On Thursday, August 26, 2010 10:51:02 Andrei Alexandrescu wrote:[...]I'm weary of this philosophy. I've seen a phenomenon happen time and again with myself and others: you hit a 10% case exactly when you (a) know what you want to do, (b) you know that the library could and should help you, and (c) you are working on a difficult enough problem to be wary of implementing it by hand.I think that what it comes down to is that you need to be smart about what functionality that you do and don't include.I really don't disagree. One of the reasons I'm suggesting that we have a Tripod[1] concept to cover these three-legged operations is that I think that it's cohesive enough that fitting this functionality into a Range directly is unduly complicating. Having two concepts which each provide a reasonably orthogonal set of six basic operations is much less complicated than having one concept which provides a dozen less orthogonal operations. [1] I actually think "Pivot" is the right name of the range+iterator-in-the-middle abstraction.David has an excellent point that when APIs become too complicated, they don't get used. For instance, I know _very_ few people who even _consider_ using C++'s algorithm library. It's just too hard to use.Huh. I use it with some frequency when I program C++, and I hardly regard myself as a C++ wizard. Sure, it can be pretty damn awkward, but welcome to C++. [...]It has too many quirks (like how remove() works) and the lack of lambdas functions makes it incredibly painful to do simple things with it.I think the lambda functions are a great example: they're a place where C++98/03 doesn't provide *enough*, rather than providing too much. The library runs up against a fundamental limitation of the underlying language, rendering parts of it (like for_each) pretty useless. There can be a severe cost to not providing enough. [...]I do _not_ think that we should necessarily be hitting the 100% case. The cost for that is likely too high. However, we probably should be going for more like the 98% or 99% case. (after all 90% indicates that 1 out of 10 things that you try to do won't be feasible, and that's actually a lot, much as 90% sounds like a big number).This is definitely agree with. Making something that legitimately does everything anyone could ever want is how you end up with nightmares like Scheme's call-with-current-continuation, but letting 1 use case in 10 drop on the floor makes the cost/benefit ratio of learning your library or language really dubious. [...] Cheers, Pillsy
Aug 27 2010
On 8/26/10 7:18 PDT, Pillsy wrote: [snip]Range before (); Range after ();(I wrote my last post before reading yours.) Excellent points. I do think we can get away without defining a new type. In addition to the allBefore(all, tail) primitive (which is easy to implement as a safe primitive by ranges that support it) we could also define allAfter(all, head) that works when head is superimposed over the beginning portion of all (is a prefix of it). It would still be impossible to take a range somewhere in the middle of another and then get in constant time the portion before and after it, but (a) that is difficult to make safe in O(1) anyway, and (b) there is exceedingly rare need for it. Andrei
Aug 26 2010