www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - "Range invalidation" ?

reply Mark <smarksc gmail.com> writes:
C++ has the issue of iterator invalidation, where certain 
operations on a container while iterating on it may invalidate 
the iterator, in which case it is no longer safe to use the 
iterator.

D has ranges, but presumably the same issue can arise in D. For 
instance, if I have a ForwardRange and I use the save primitive 
to keep a reference to some tail of it, then I can manipulate one 
of the ranges in a way that may leave the other in an undefined 
state. For instance, if the range is allocated on the heap and at 
some point I free its memory, then the range's copy is going to 
point to some garbage.

Is there some documentation anywhere on:
1. Primitive operations, e.g. concatenation, that may invalidate 
a range.
2. Funtions in Phobos, operating on a range, e.g. (filter, sort, 
etc.), that may invalidate it?

Thanks. -Mark
Aug 30 2017
parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Wednesday, August 30, 2017 13:28:48 Mark via Digitalmars-d-learn wrote:
 C++ has the issue of iterator invalidation, where certain
 operations on a container while iterating on it may invalidate
 the iterator, in which case it is no longer safe to use the
 iterator.

 D has ranges, but presumably the same issue can arise in D. For
 instance, if I have a ForwardRange and I use the save primitive
 to keep a reference to some tail of it, then I can manipulate one
 of the ranges in a way that may leave the other in an undefined
 state. For instance, if the range is allocated on the heap and at
 some point I free its memory, then the range's copy is going to
 point to some garbage.

 Is there some documentation anywhere on:
 1. Primitive operations, e.g. concatenation, that may invalidate
 a range.
 2. Funtions in Phobos, operating on a range, e.g. (filter, sort,
 etc.), that may invalidate it?
It's entirely dependent on the implementation. _Most_ ranges cannot be invalidated. They're usually either purely generative and aren't backed by much in the way of memory (just the information needed to generate the next value, which is likely held directly in a struct), or they're ultimately backed by dynamic arrays, in which case, the only risk would be something which had direct access to the array mutating its elements. Not even mutating the length of the array would affect the range, because the range itself would have a dynamic array which was a slice of the other, in which case, the only affect that can be transferred from one to the other would be the mutation of the elements that they both refer to. The times when range invalidation comes into play is when it refers to non-GC-allocated memory (e.g. malloce-ed memory or memory on the stack - like if it has a dynamic array which is a slice of a static array) and when the range is for an actual container whose elements may change. And whether a range over a container would be invalidated by something like an element being added or removed is entirely dependent on how the range and container are implmented. Odds are that with something like Array, removing elements will screw with the range, because like vector or a dynamic array, Array is dealing with a contiguous block of memory, and unless Array is allocating a new block of memory when it's altered so that the old is still valid for any ranges (and it's obvously not doing that, because that would go against its charter), then altering an Array is bound to screw up any ranges that refer to it. Something like RedBlackTree may or may not be screwed up (it likely depends on which elements are added or removed and which the range refers to), but if something is removed from the middle of the range (or added to it), then it would be at least insofar as it wouldn't refer to the same elements anymore. Really, unless a container goes to a bunch of extra work to keep track of whether a range exists which refers to a particular element or range of elements and ensure that any ranges continue to refer to the same elements even when the container is mutated, mucking with a container is going to muck with the range. And so in general, with a container, the only options for ranges are either going to be to potentially allocate memory for the range itself (thus copying a section of the container rather than just referring to it) whenever the underlying container is mutated in a way that would invalidate the range, or it's going to have to do something like is done in Java and make it so that the range knows if it's been invalidated and throws an Exception or Error if it's used after that. The invalidation problem is just all around worse with ranges backed by containers than it is for iterators backed by containers, because ranges refer to more elements, and they ususally refer to a contiguous set of elements rather than the elements individually. Dynamic arrays avoid all of this because any memory that they shared is never mucked with by mutating an individual dynamic array except in the case where an element is mutated. Altering the length of one dynamic array does not alter the length of memory of any other dynamic array, even if they're slices of the same memory. But that also means that they end up reallocating pretty easily if you do anything other than simply append to them, whereas removing an element from a typical container does not result in a reallocation. In any case, it's ultimately going to come down to the implementation of the container in question. IIRC, std.container is not at all clear about which operations do and don't invalidate ranges, but in general, it's hard enough to make it so that a range stays valid after a container is mutated that the odds are that any range which continues to be used after its underlying container has been mutated is not going to contain the same elements anymore, and it might blow up in your face. So, unless the documentation is explicit about it, it's probably just best to assume that a range over a container is no longer valid after the container has been mutated. Honestly, containers is the big place where ranges tend to be worse than iterators, and we haven't done all that good a job with containers in general for Phobos. Supposedly, Andrei is working on a new set of containers, but they have yet to materialize. And all of the issues that arise from ranges refering to more than one element and not being able to figure out that a range refers to a container once it's wrapped in a another range makes stuff like remove functions difficult. std.container solved that problem by using take/Take so that remove takes Take!Range and is thus able to know that the underlying range is from that container while also understanding how the wrapper range works so that it only removes the elements refered to by the wrapper range, but that doesn't scale. You can do it for a few set of primitives, but even then it's worse than with iterators. And invalidation is inherently worse for ranges without the container playing games and potentially essentially allocating a new container just for the range, which is unlikely to be cheap. I think that the general sentiment has been that folks want D's containers to not have range invalidation problems, but that's not necessarily a easy nut to crack while being efficient. - Jonathan M Davis
Aug 30 2017
parent Mark <smarksc gmail.com> writes:
On Wednesday, 30 August 2017 at 15:06:12 UTC, Jonathan M Davis 
wrote:
 [...]
Thank you for the detailed response.
Aug 31 2017