digitalmars.D.learn - to invalidate a range
- Ellery Newcomer (5/5) Aug 12 2011 in std.container, the stable* container functions advocate that they do
- bearophile (4/7) Aug 12 2011 Generally modifying a collection while you iterate on it causes troubles...
- Ellery Newcomer (14/21) Aug 12 2011 I am not convinced of this. In the following code, there is definitely a...
- Steven Schveighoffer (18/21) Aug 12 2011 Say for example, you are iterating a red black tree, and your current
- Ellery Newcomer (5/18) Aug 12 2011 Then there is no way to implement a stable remove from a node based
- Jonathan M Davis (13/28) Aug 12 2011 Removing elements from a node-based container only invalidates ranges if...
- Steven Schveighoffer (17/37) Aug 15 2011 Not one that guarantees stability. However, you can implement a remove ...
- Jonathan M Davis (40/46) Aug 12 2011 Short answer: The range doesn't point to what it's supposed to point to
- Ellery Newcomer (9/21) Aug 12 2011 "shouldn't" isn't a guarantee. Where there is "shouldn't", there can't
- Jonathan M Davis (22/47) Aug 12 2011 An implementation can guarantee it as long as your range doesn't directl...
- Ellery Newcomer (9/18) Aug 12 2011 Forgive my being dense, but where is this 'as long as' coming from? If
- Jonathan M Davis (48/67) Aug 12 2011 Are you familiar with iterators? This will be a lot easier if you are. A...
- Ellery Newcomer (8/22) Aug 12 2011 Now you're just bludgeoning me into apathy (though my ability to
- Jonathan M Davis (15/43) Aug 12 2011 It means that if you're dealing with a node-based container, and you rem...
- Steven Schveighoffer (11/26) Aug 15 2011 I don't think it's possible to implement stableRemove IMO. I believe
In std.container, the stable* container functions advertise that they do not invalidate the ranges of their containers. What does it mean to invalidate a range?

My assumption is that it means causing e.g. front or popFront to fail when empty says they should succeed, or vice versa.
Aug 12 2011
Ellery Newcomer:
> In std.container, the stable* container functions advertise that they do not invalidate the ranges of their containers. What does it mean to invalidate a range?

Generally, modifying a collection while you iterate over it causes trouble. In Python, if you iterate over a dict or a set and change its size during the iteration, you get a RuntimeError, because the iterator detects that the container changed underneath it. In D this problem is avoided in another way, using those stable functions.

Bye,
bearophile
Aug 12 2011
On 08/12/2011 03:13 PM, bearophile wrote:
> Generally modifying a collection while you iterate on it causes troubles. [...] In D this problem was avoided in another way, using those stable functions.

I am not convinced of this. In the following code, there is definitely a problem; it just remains to be seen whether the range is invalidated, or merely noisy according to the specified semantics.

    import std.container;
    import std.stdio;
    import std.array;

    void main()
    {
        auto arr = make!(Array!int)([1, 2, 3]);
        auto r = arr[];               // range over all three elements
        writeln(array(r.save()));     // contents before the removal
        arr.stableRemoveAny();        // remove one element via the "stable" primitive
        writeln(array(r.save()));     // reuse the range taken before the removal
    }
Aug 12 2011
On Fri, 12 Aug 2011 15:54:53 -0400, Ellery Newcomer <ellery-newcomer utulsa.edu> wrote:
> In std.container, the stable* container functions advertise that they do not invalidate the ranges of their containers. What does it mean to invalidate a range?

Say, for example, you are iterating a red-black tree, and your current "front" points at a certain node. Then that node is removed from the tree. That range is now invalid, because the node it points to is not valid.

What happens when you use an invalidated range? Well, we could implement something that throws an exception, but that's an efficiency problem. I contemplated doing this for debug mode in dcollections, and I probably still will.

Another example of an invalidated range: let's say you have a hash map. The range has a start and a finish, with the finish being iterated after the start. If you add a node, it could cause a rehash, which could potentially put the finish *before* the start!

However, the same hash implementation could potentially define a stable add, which is guaranteed not to rehash the map, even when it exceeds the rehash threshold :)

-Steve
Aug 12 2011
On 08/12/2011 03:29 PM, Steven Schveighoffer wrote:
> Say for example, you are iterating a red black tree, and your current "front" points at a certain node. Then that node is removed from the tree. That range is now invalid, because the node it points to is not valid.

Then there is no way to implement a stable remove from a node-based container?

> Another example of an invalidated range, let's say you have a hash map. The range has a start and a finish, with the finish being iterated after the start. If you add a node, it could cause a rehash, which could potentially put the finish *before* the start!

Then the invalidation is that the range failed to produce an element of the container?
Aug 12 2011
On Friday, August 12, 2011 13:58 Ellery Newcomer wrote:
> Then there is no way to implement a stable remove from a node based container?

Removing elements from a node-based container only invalidates ranges if they specifically point to an element which was removed. Whether an iterator is invalidated is more obvious, because it points to a specific element, and as long as it's not the one which was removed, you're fine.

For a range, it's not as obvious, because you don't really know how it was implemented internally. However, as long as it's effectively holding the begin and end iterators for the range (which is almost certainly what it has to do), then you know that your range is fine as long as the first element pointed to and the last element pointed to weren't removed. You _do_ have the possible concern that your range doesn't contain the same elements that it did before (if an element was removed from its middle), but the range is still valid.

- Jonathan M Davis
Aug 12 2011
On Fri, 12 Aug 2011 16:58:15 -0400, Ellery Newcomer <ellery-newcomer utulsa.edu> wrote:
> Then there is no way to implement a stable remove from a node based container?

Not one that guarantees stability. However, you can implement a remove that can be proven to be stable for certain cases (basically, as long as you don't remove one of the endpoints).

> Then the invalidation is that the range failed to produce an element of the container?

No, it may crash the program, for example if empty does:

    return this.beginNode is this.endNode;

If beginNode is sequentially after endNode, this condition will never be true.

But there are other definitions of "invalid". I'd call any of these cases invalidated ranges:

* it fails to iterate valid nodes that were in the range before the operation, and are still valid after the operation.
* it iterates a node more than once (for example, iterates a node before the operation, then iterates it again after the operation).
* it iterates invalid nodes (nodes that have been removed).

-Steve
Aug 15 2011
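The `empty` problem Steve describes can be sketched roughly as follows. This is only an illustration under stated assumptions: `Node`, `NodeRange` and `terminatesNormally` are hypothetical names, the list is singly linked, and `endNode` is taken to mean "one past the last element"; none of this is the actual dcollections or Phobos layout.

```d
// Hypothetical singly linked node; not a Phobos or dcollections type.
struct Node
{
    int value;
    Node* next;
}

// A range that stores only its two end points, with empty defined
// the way the post above describes it.
struct NodeRange
{
    Node* beginNode;
    Node* endNode; // assumed to be one past the last element

    @property bool empty() const { return beginNode is endNode; }
    @property int front() const { return beginNode.value; }
    void popFront() { beginNode = beginNode.next; }
}

// Walks the range and reports whether empty ever became true.
// The null check stands in for the crash a real range would hit
// when it runs off the end of the list.
bool terminatesNormally(NodeRange r)
{
    while (!r.empty)
    {
        if (r.beginNode is null)
            return false; // ran past the end without meeting endNode
        r.popFront();
    }
    return true;
}

void main()
{
    auto c = new Node(3);
    auto b = new Node(2, c);
    auto a = new Node(1, b);

    // begin before end: iteration stops when beginNode reaches endNode.
    assert(terminatesNormally(NodeRange(a, c)));

    // begin after end (e.g. after a rehash or reordering):
    // beginNode never meets endNode, so empty never becomes true.
    assert(!terminatesNormally(NodeRange(c, a)));
}
```

The guard in `terminatesNormally` exists only so the sketch can report the failure instead of dereferencing null.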
On Friday, August 12, 2011 12:54 Ellery Newcomer wrote:
> In std.container, the stable* container functions advertise that they do not invalidate the ranges of their containers. What does it mean to invalidate a range? My assumption is that it means causing e.g. front or popFront to fail when empty says they should succeed, or vice versa.

Short answer: The range doesn't point to what it's supposed to point to anymore. Don't use it. Its behavior is undefined.

Long answer: This is a classic issue in C++ with the STL, and it applies to D's ranges for the same reason. An iterator or a range is valid only so long as it continues to point to a valid element in the container that it points to.

With a vector or Array, for instance, if you have an iterator or range pointing to that vector/Array and the container is reallocated because you appended to it and it didn't have any capacity left, then you have an iterator/range which points to memory which isn't in the container anymore. Iterating with that iterator/range would be problematic. In C++, you'd likely be iterating over memory which had been deleted, which could cause all kinds of problems and would blow up on you in a variety of ways at least some of the time. In D, the memory is probably still sitting on the heap exactly as it was, so iterating over it would mean iterating over an old version of the container. It probably wouldn't blow up, but it definitely wouldn't be what you wanted.

Adding and removing elements without reallocations causes problems too, because the elements get shifted around. The iterator/range may still technically be valid and usable, but it doesn't necessarily point to the same data anymore.

In the case of a container that uses nodes - such as a linked list - because you can add and remove elements without affecting other elements, iterators and ranges don't tend to get invalidated as easily. As long as you don't remove the element (or elements in the case of a range - assuming that it keeps track of its two end points, as is likely) that it points to, then adding or removing elements from the container shouldn't invalidate the iterator/range.

So, whether a particular operation invalidates an iterator or range depends very much on the container and the operation. std.container provides stableX functions which do whatever is necessary to guarantee that any ranges which point to the container stay valid. However, any other operation which alters a container risks invalidating any existing range for that container. It may or may not invalidate the range, depending on the container and the operation, but it's a risk. The only way to avoid any risk of invalidating ranges is to not keep ranges over a container when you alter that container.

So, basically what it comes down to is the short answer. A range which has been invalidated doesn't point to what it's supposed to point to anymore, and using it results in undefined behavior. It's less likely to blow up in D, because it's generally memory-safe, but you're going to get incorrect behavior.

- Jonathan M Davis
Aug 12 2011
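The "old version of the container" effect can be seen with D's built-in slices, which are used here only as an analogy for a contiguous container; std.container.Array manages its own storage and may behave differently. In this sketch `appendReallocated` is a hypothetical helper, and whether the append actually moves the data depends on the runtime, so both outcomes are handled.

```d
// Appends to arr and reports whether the data moved to new memory.
// (Hypothetical helper for illustration.)
bool appendReallocated(ref int[] arr, int value)
{
    auto oldPtr = arr.ptr;
    arr ~= value;              // may extend in place or reallocate
    return arr.ptr !is oldPtr;
}

void main()
{
    int[] data = [1, 2, 3];
    int[] view = data;         // second slice over the same memory

    immutable moved = appendReallocated(data, 4);
    data[0] = 42;              // write through the possibly relocated slice

    if (moved)
        assert(view[0] == 1);  // view still sees the old copy of the data
    else
        assert(view[0] == 42); // no move: both slices share memory

    assert(view.length == 3);  // either way, view never sees the new element
}
```

In neither branch does the program crash, which matches the point above: in D the stale view is usually memory-safe but no longer reflects the container.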
On 08/12/2011 03:54 PM, Jonathan M Davis wrote:
> In the case of container that uses nodes - such as a linked list - because you can add and remove elements without affecting other elements, iterators and ranges don't tend to get invalidated as easily. As long as you don't remove the element (or elements in the case of a range - assuming that it keeps track of its two end points, as is likely) that it points to, then adding or removing elements from the container shouldn't invalidate the iterator/range.

"shouldn't" isn't a guarantee. Where there is "shouldn't", there can't be stableRemove*, no?

> So, basically what it comes down to is the short answer. A range which has been invalidated doesn't point to what it's supposed to point to anymore, and using it results in undefined behavior. It's less likely to blow up in D, because it's generally memory-safe, but you're going to get incorrect behavior.

Suppose your linked list range points to a node X. The element in X is removed by the linked list, and the range automagically moves to X.next (or X.prev). Is the range invalid by this standard or not? (No way in hell I'm going to implement that, though.)

Heh heh. Most of this business has only convinced me I want immutable containers.
Aug 12 2011
On Friday, August 12, 2011 15:29 Ellery Newcomer wrote:
> "shouldn't" isn't a guarantee. Where there is "shouldn't", there can't be stableRemove*, no?

An implementation can guarantee it as long as your range doesn't directly point to an element being removed (i.e. as long as the element isn't on the ends - or maybe one past the end, depending on the implementation). But _no_ container can guarantee that an iterator or range which directly references an element which is removed is going to stay valid - not without playing some serious games internally which make iterators and ranges too inefficient, and possibly not even then. So, stableRemove is only going to guarantee that a range stays valid as long as the end points of that range aren't what was being removed.

> suppose your linked list range points to a node X. element in X is removed by the linked list, and the range automagically moves to X.next (or X.prev). Is the range invalid by this standard or not?

If the element that you removed was the end point of a range, then the range won't be valid anymore.

> heh heh. most of this business has only convinced me I want immutable containers.

It's only an issue if you keep ranges of a container around and then alter the container. If you just use ranges to do an operation or two and then throw them away, it's not an issue. C++ has been this way for years, and it's generally not a problem. It _can_ be a problem if you try to keep iterators/ranges around while altering a container, but there's not really a good way around that. And as long as you're aware of that, you'll be fine. It's only when you try to alter a container while retaining ranges to it that you're going to have to start worrying about whether a range has been invalidated or not.

- Jonathan M Davis
Aug 12 2011
On 08/12/2011 05:51 PM, Jonathan M Davis wrote:
> An implementation can guarantee it as long as your range doesn't directly point to an element being removed (i.e. as long as the element isn't on the ends - or maybe one past the end, depending on the implementation). But _no_ container can guarantee that an iterator or range which directly references an element which is removed is going to stay valid - not without playing some serious games internally which make iterators and ranges too inefficient, and possibly not even then. So, stableRemove is only going to guarantee that a range stays valid as long as the end points of that range aren't what was being removed.

Forgive my being dense, but where is this 'as long as' coming from? If your range only points to ends in e.g. a linked list, how is it supposed to retrieve elements in the middle? I'm having a hard time visualizing a range over a node-based container that doesn't point to a node in the middle (at some point in time). The range points to the node to retrieve the current front quickly, the node can get removed, the remove function won't know it's invalidating the range without playing yon internal games, ergo stable remove cannot be.
Aug 12 2011
On Friday, August 12, 2011 16:16 Ellery Newcomer wrote:
> Forgive my being dense, but where is this 'as long as' coming from? If your range only points to ends in e.g. a linked list, how is it supposed to retrieve elements in the middle? I'm having a hard time visualizing a range over a node based container that doesn't point to a node in the middle (at some point in time). The range points to the node to retrieve the current front quickly, the node can get removed, the removed function won't know its invalidating the range without playing yon internal games, ergo stable remove cannot be.

Are you familiar with iterators? This will be a lot easier if you are. An iterator points to one element and one element only. In C++, you tend to pass around pairs of iterators - one pointing to the first element in a range of elements and one pointing to one past the end. You then usually iterate by incrementing the first iterator until it equals the second.

Ranges at their most basic level are a pair of iterators - one pointing to the first element in the range and one pointing either to the last element or one past that, depending on the implementation. The range API and concept is actually much more flexible than that, allowing us to implement stuff like the Fibonacci sequence as a range, but when it comes to containers, they're almost certainly doing what C++ does by using two iterators, except that it's wrapped them so that you don't ever have to worry about them pointing to separate containers or the first iterator actually being past the second one. Wrapping them as a range makes it much cleaner, but ultimately, for containers at least, you're still going to have those iterators internally.

So, front returns the element that the first iterator points to and popFront just increments that iterator by one. If the range has back, then back points to the last element of the range (though the internal iterator may point one element past that), and popBack decrements that iterator by one. It doesn't directly refer to _any_ elements in the middle.

So, in a node-based container, adding or removing elements in the middle of the range will have no effect on the internal iterators. It'll have an effect on what elements you ultimately iterate over if you iterate over the range later, but the two iterators are still completely valid and point to the same elements that they always have. However, if you remove either of the elements that the iterators point to, _then_ the range is going to be invalidated, because its iterators are invalid. They don't point to valid elements in the container anymore.

In a contiguous container, such as Array, adding or removing elements is more disruptive, since the elements get copied around inside of the contiguous block of memory, and while the iterators may continue to point at the same indices as before, the exact elements will have shifted. So, they're still valid in the sense that they point to valid elements, but they don't point to the same elements. The two places where they end up no longer pointing to valid elements are when you remove enough elements that one or both iterators point at an index which is greater than the size of the container, and when the container has to be reallocated (typically when you append enough elements that it no longer has the capacity to resize in place).

So, in general, altering contiguous containers, such as Array, risks invalidating all iterators or ranges which point to them, whereas with node-based containers it's only when the end point of a range gets removed that you run into that kind of trouble.

Really, to understand range invalidation, you probably should have a fair understanding of iterators. But again, as long as you don't keep any ranges around when you alter a container by adding or removing elements from it, you don't have anything to worry about.

- Jonathan M Davis
Aug 12 2011
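The two-end-point picture described above can be sketched as follows. Everything here is a hypothetical illustration (`Node`, `NodeRange`, `unlink` and `makeList` are made-up names, and `last` is taken to be the final element rather than one past it); it is not how std.container implements its ranges. The second range in `main` shows the case raised in the question: a range whose front is the removed node still walks it.

```d
import std.algorithm.comparison : equal;

// Hypothetical doubly linked node.
struct Node
{
    int value;
    Node* prev;
    Node* next;
}

// A range that remembers only its first and last nodes.
struct NodeRange
{
    Node* first;
    Node* last;

    @property bool empty() const { return first is null; }
    @property int front() const { return first.value; }

    void popFront()
    {
        if (first is last) { first = null; last = null; }
        else first = first.next;
    }
}

// Unlinks n from its neighbours; n's own pointers are left untouched.
void unlink(Node* n)
{
    if (n.prev) n.prev.next = n.next;
    if (n.next) n.next.prev = n.prev;
}

// Builds a list from values and returns its nodes in order.
Node*[] makeList(int[] values)
{
    Node*[] nodes;
    foreach (v; values)
    {
        auto n = new Node(v);
        if (nodes.length)
        {
            n.prev = nodes[$ - 1];
            nodes[$ - 1].next = n;
        }
        nodes ~= n;
    }
    return nodes;
}

void main()
{
    auto n = makeList([1, 2, 3, 4]);
    auto whole = NodeRange(n[0], n[3]); // end points are 1 and 4
    auto tail = NodeRange(n[1], n[3]);  // end points are 2 and 4

    unlink(n[1]); // remove the element 2

    // 2 was in the middle of whole: its end points are intact,
    // so it simply skips the removed element.
    assert(equal(whole, [1, 3, 4]));

    // 2 was an end point of tail: the range still yields the removed node.
    assert(equal(tail, [2, 3, 4]));
}
```

Nothing in `unlink` can tell that `tail` exists, which is why the guarantee only covers ranges whose end points were not removed.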
On 08/12/2011 06:34 PM, Jonathan M Davis wrote:
> Are you familiar with iterators? This will be a lot easier if you are. An iterator points to one element and one element only. In C++, you tend to pass around pairs of iterators - one pointing to the first element in a range of elements and one pointing to one past the end. You then usually iterate by incrementing the first iterator until it equals the second.

Now you're just bludgeoning me into apathy (though my ability to communicate seems lacking).

The iterator is an abstraction. Beneath it, in a node-based container, [I expect] will be a pointer to a node, which might point to any node in the container. This means that removing any node could potentially invalidate a range somewhere. When such a conflict arises, you cannot both perform the removal and keep a valid range, regardless of whether you even knew of the conflict.
Aug 12 2011
On Friday, August 12, 2011 20:03:59 Ellery Newcomer wrote:
> The iterator is an abstraction. Beneath it, in a node based container, [I expect] will be a pointer to a node, which might point to any node in the container. This means that removing any node could potentially invalidate a range somewhere. When such a conflict arises, you cannot both perform the removal and keep a valid range, regardless of whether you even knew of the conflict.

It means that if you're dealing with a node-based container, and you remove an element from that container, and you have a range which does not have that element at either of its ends, then you know that your range is valid. If you're keeping random ranges around and removing elements from the container, then no, you can't know whether the range is still valid or not.

What it really comes down to is that you don't keep ranges around long term, and that if you're altering a container, and you're using a range over that container at the same time, you need to be sure of what you're doing. If you are, then you can use the range in spite of altering the container. If you're not, then you're in trouble.

The simplest thing to do, of course, is to just not alter a container while you have ranges which refer to it (or to get rid of any ranges that you have over a container when you do alter that container).

- Jonathan M Davis
Aug 12 2011
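The "simplest thing" from the post above might look like this in practice: take a range, use it, drop it, and take a fresh one after each change to the container. This is a minimal sketch using std.container.array's `Array`; `snapshotAfterEdits` is a hypothetical helper name, and the pattern itself does not depend on which container is used.

```d
import std.algorithm.comparison : equal;
import std.array : array;
import std.container.array : Array;

// Alters a container without keeping any range alive across the edits,
// then returns a copy of the final contents.
int[] snapshotAfterEdits()
{
    auto arr = Array!int(1, 2, 3);

    {
        auto r = arr[];               // short-lived range
        assert(equal(r, [1, 2, 3]));
    }                                 // r is gone before the container changes

    arr.insertBack(4);
    arr.insertBack(5);
    arr.removeBack();

    // A fresh range is taken only after all edits are done.
    return arr[].array;
}

void main()
{
    assert(snapshotAfterEdits() == [1, 2, 3, 4]);
}
```

Because no range outlives an edit, the question of whether a given operation is stable never comes up.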
On Fri, 12 Aug 2011 18:29:00 -0400, Ellery Newcomer <ellery-newcomer utulsa.edu> wrote:
> "shouldn't" isn't a guarantee. Where there is "shouldn't", there can't be stableRemove*, no?

I don't think it's possible to implement stableRemove. I believe SList does claim it, but IMO once you remove an element from a container, any range that iterates that element is invalid. Once we get custom allocators, this is going to become a lot dicier, because removing elements may actually deallocate them.

stableAdd is more feasible to implement, as long as adding does not significantly change the topology of the container (for example, adding to a hash may do a rehash, which changes the topology).

-Steve
Aug 15 2011