digitalmars.D - Orphan ranges
- Andrei Alexandrescu (33/33) Apr 15 2012 I'm making good progress on an allocator design. If things come together...
- Tove (8/44) Apr 15 2012 Wow, cool!
- Jacob Carlborg (5/38) Apr 15 2012 As you say, built-in arrays work like this. Therefore at least an
- Chad J (52/66) Apr 15 2012 How about
- Andrei Alexandrescu (12/59) Apr 15 2012 Yah, the allocator would be a generic parameter. Any allocator that
- Walter Bright (3/9) Apr 15 2012 I suspect that disallowing them would be surprising behavior, as D with ...
- Walter Bright (3/12) Apr 15 2012 I'd add that unless there's a pretty compelling reason not to, the colle...
- Dmitry Olshansky (23/44) Apr 16 2012 Nice overview. I believe there should adapters on top of that.
- Steven Schveighoffer (38/69) Apr 16 2012 * pool/freelist based. Dcollections has such an allocator (Called a chu...
- Jonathan M Davis (20/58) Apr 16 2012 So, the problem that we have is basically
- Andrei Alexandrescu (7/24) Apr 16 2012 Well it's just a different matter. In particular containers offer the
- Jonathan M Davis (11/41) Apr 16 2012 Yeah, but the example is pretty much the same as code where someone call...
I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass. I'm currently looking at four allocation models: * straight GC, safe (no deallocation) * GC + the ability to free memory * malloc/free, unsafe, buyer beware (STL-level safety) * reference counted (based on either malloc or GC+free) * region (scoped) I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges. Consider this motivating example: void main() { int[] x; { auto b = new int[20]; x = b[5 .. $ - 5]; } ... use x ... } The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone. Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different. Thanks, Andrei
Apr 15 2012
On Sunday, 15 April 2012 at 16:34:12 UTC, Andrei Alexandrescu wrote:I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass. I'm currently looking at four allocation models: * straight GC, safe (no deallocation) * GC + the ability to free memory * malloc/free, unsafe, buyer beware (STL-level safety) * reference counted (based on either malloc or GC+free) * region (scoped) I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges. Consider this motivating example: void main() { int[] x; { auto b = new int[20]; x = b[5 .. $ - 5]; } ... use x ... } The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone. Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different. Thanks, AndreiWow, cool! To flat out disallow Orphan ranges is imho too restrictive, especially considering we already have a safe solution, what is missing is an unsafe(no overhead) version. If the design can handle safe/unsafe as a policy for a stack(scoped) based allocator... that would be kick ass, indeed.
Apr 15 2012
On 2012-04-15 18:34, Andrei Alexandrescu wrote:I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass. I'm currently looking at four allocation models: * straight GC, safe (no deallocation) * GC + the ability to free memory * malloc/free, unsafe, buyer beware (STL-level safety) * reference counted (based on either malloc or GC+free) * region (scoped) I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges. Consider this motivating example: void main() { int[] x; { auto b = new int[20]; x = b[5 .. $ - 5]; } ... use x ... } The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone. Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different. Thanks, AndreiAs you say, built-in arrays work like this. Therefore at least an allocator using the GC should allow this. -- /Jacob Carlborg
Apr 15 2012
On 04/15/2012 12:34 PM, Andrei Alexandrescu wrote:I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass. I'm currently looking at four allocation models: * straight GC, safe (no deallocation) * GC + the ability to free memory * malloc/free, unsafe, buyer beware (STL-level safety) * reference counted (based on either malloc or GC+free) * region (scoped) I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges. ... Thanks, AndreiHow about * User-defined allocator ?? At the very least, I think we should be able to choose from the above strategies for a given container. If this isn't done, then there will always be people wanting to re-implement the containers just to accomplish silly stuff like allocating a little differently. As for safety: I would probably want to be able to tweak the ways my containers are safe/unsafe. I'd want safety by default, with the ability to choose my vulnerabilities. "Safety" is a fairly vague notion, since there are a number of things that could cause memory corruption or undefined behavior. Allowing me to decide which risk factors to take allows me to decide which causes of corruption/undefined behavior I want to expose myself to. Once there's a sufficiently large panel of twiddley knobs and toggle switches, then it might make sense to create "policies" like "safe" vs "blazing fast". As for orphan ranges: I've always written under the assumption that an array like 'x' would keep 'b' around. It would be nice if large amounts of memory wasted from orphans could be reused in some way, but not at the expense of a lot of speed. If large waste from orphans sticks around, a warning in the docs would be nice. Other note: For the longest time I've thought that D should do reference counting for types that can be proven to have no reference cycles. I don't understand why this is difficult, but even if it is it might be worth it. I don't actually need it personally, but I constantly hear complaints about "oh, D requires garbage collection because you can't use most of Phobos without it". What they really want, I bet, is deterministic memory management for realtime systems that does not cause thread contention/world stops. Add reference counting and this will suddenly be doable for what I'd image would be a lot of important stuff in Phobos (think CoW string functions--probably reference counter friendly). At this point the "reference counted" option gets subsumed into the "straight GC" / "GC+free" options, because the language/compiler would decide at compile-time which strategy to use for a range. I see D's current coverage of memory management techniques looking like this: - GC garbage collection: allocation for objects with complicated lifespans, with the caveat of complex allocation/deallocation code (check, not %100 featureful) - malloc/free for mediocre-speed allocation of objects whose lifespans can't be easily calculated but are known by the programmer (check) - custom allocators for when everything else fails (check, in most cases) - memory reuse with slices/immutability which gives us an awesome zero-overhead solution in a number of situations (check) - reference counting for almost-deterministic allocation of objects with easily calculated lifespans (**D does not cover this case!**) - stack for fast allocation of short-lived objects (check)
Apr 15 2012
On 4/15/12 12:50 PM, Chad J wrote:How about * User-defined allocator ??Yah, the allocator would be a generic parameter. Any allocator that supports a compatible interface with the archetypes used would work. But making e.g. file-backed allocators or whatnot may be difficult.As for safety: I would probably want to be able to tweak the ways my containers are safe/unsafe. I'd want safety by default, with the ability to choose my vulnerabilities. "Safety" is a fairly vague notion, since there are a number of things that could cause memory corruption or undefined behavior. Allowing me to decide which risk factors to take allows me to decide which causes of corruption/undefined behavior I want to expose myself to. Once there's a sufficiently large panel of twiddley knobs and toggle switches, then it might make sense to create "policies" like "safe" vs "blazing fast".Makes sense.As for orphan ranges: I've always written under the assumption that an array like 'x' would keep 'b' around. It would be nice if large amounts of memory wasted from orphans could be reused in some way, but not at the expense of a lot of speed. If large waste from orphans sticks around, a warning in the docs would be nice.So orphans should be allowed. Wonder if that should be an option or just "it".Other note: For the longest time I've thought that D should do reference counting for types that can be proven to have no reference cycles. I don't understand why this is difficult, but even if it is it might be worth it. I don't actually need it personally, but I constantly hear complaints about "oh, D requires garbage collection because you can't use most of Phobos without it". What they really want, I bet, is deterministic memory management for realtime systems that does not cause thread contention/world stops. Add reference counting and this will suddenly be doable for what I'd image would be a lot of important stuff in Phobos (think CoW string functions--probably reference counter friendly). At this point the "reference counted" option gets subsumed into the "straight GC" / "GC+free" options, because the language/compiler would decide at compile-time which strategy to use for a range.Agreed. RC is not that difficult, and implementing it in terms of containers should be doable pretty nicely.I see D's current coverage of memory management techniques looking like this: - GC garbage collection: allocation for objects with complicated lifespans, with the caveat of complex allocation/deallocation code (check, not %100 featureful) - malloc/free for mediocre-speed allocation of objects whose lifespans can't be easily calculated but are known by the programmer (check) - custom allocators for when everything else fails (check, in most cases) - memory reuse with slices/immutability which gives us an awesome zero-overhead solution in a number of situations (check) - reference counting for almost-deterministic allocation of objects with easily calculated lifespans (**D does not cover this case!**) - stack for fast allocation of short-lived objects (check)Sounds great. There's quite a few difficulties along the road, but I think the roadmap is good. Andrei
Apr 15 2012
On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different.I suspect that disallowing them would be surprising behavior, as D with array slices supports it well.
Apr 15 2012
On 4/15/2012 11:43 AM, Walter Bright wrote:On 4/15/2012 9:34 AM, Andrei Alexandrescu wrote:I'd add that unless there's a pretty compelling reason not to, the collections should support orphan ranges.Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different.I suspect that disallowing them would be surprising behavior, as D with array slices supports it well.
Apr 15 2012
On 15.04.2012 20:34, Andrei Alexandrescu wrote:I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass. I'm currently looking at four allocation models: * straight GC, safe (no deallocation) * GC + the ability to free memory * malloc/free, unsafe, buyer beware (STL-level safety) * reference counted (based on either malloc or GC+free) * region (scoped)Nice overview. I believe there should adapters on top of that. Say a pool allocator on top of any of these. Same goes for free-list which is arguably a limited form of pool allocator.I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy?That was the main Q apparently. Potential answers follow.For now I have a related question about orphan ranges. Consider this motivating example: void main() { int[] x; { auto b = new int[20];Imagine it's an Array!int(20) from here on. And x is a slice type for it.x = b[5 .. $ - 5]; } ... use x ... }Here, obviously the ability to use x not thinking at all of b is largely a property of GC allocator. I'd call it "any reference suffice" property, meaning that no extra effort needed to keep container alive. Otherwise if say used refcounted scheme then Array *itself* has to be aware of the fact its allocator is !any_reference_suffice thus put a an extra reference into it's slice. Now scale-up from this low-level issue. All of the above was written in the usual memory-safety-not-negotiable perspective. Call it a policy and enact it by default. Now kill all of extra safety hooks and call it unsafe-speed-comes-first policy. The answer then in my opinion that it's *both*. Allocator tells what kind of guarantees it has and container then observes if it has to do some extra work to ensure his policy stands (e.g. add extra refs to allow memory safety for orphan ranges). Because by the end of day it's container that provides ranges and thus responsible for this decision. -- Dmitry Olshansky
Apr 16 2012
On Sun, 15 Apr 2012 12:34:11 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass. I'm currently looking at four allocation models: * straight GC, safe (no deallocation) * GC + the ability to free memory * malloc/free, unsafe, buyer beware (STL-level safety) * reference counted (based on either malloc or GC+free) * region (scoped)* pool/freelist based. Dcollections has such an allocator (Called a chunk allocator), which I hope would be applicable. It's in fact the default allocator. I saw you responded later that it will be a generic parameter. While I support that, and it is the same model in dcollections, it leaves a lot to be desired. For instance, if I have a RedBlackTree of SList!int, will each SList have it's own copy of an allocator? Sure you could use a singleton allocator to avoid overhead, but what if I simply want only the SLists in my RedBlackTree to share an allocator, and not with all SList!int? I've been occasionally putting some thought into that for dcollections, and I haven't really come up with a good solution yet, but it feels withink reach with some dedicated effort. I'm interested if you have a solution for this in mind.I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges.It's definitely an allocator property. For example, malloc/free is unsafe, there is no way to make it safe.Consider this motivating example: void main() { int[] x; { auto b = new int[20]; x = b[5 .. $ - 5]; } ... use x ... } The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone. Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different.I'd like to slightly correct your statement ;) b is a 20-element range that originates in a 31-element array (b.capacity is 31). The GC is the allocator and owner of the array type, which has no formal type. But I understand the question. and I think the situation is somewhat different -- most ranges will not be able to be attached spontaneously to different container instances. Nor will they rely on the allocator to generate extra instances at will to appease the requests. Built-in arrays/slices are like that. However, I think absolutely we need orphan range support. I think it's really a property of the allocator, not the container, as to whether orphan ranges exist. For proof, look no further than your list of possible allocators at the malloc/free version. Clearly orphan ranges are not implementable using that back-end, since once you lose your original container reference, the memory is leaked. If we make it container policy, any container would have to disallow that allocator. I think that's too restrictive, and makes the malloc/free allocator somewhat useless. In terms of "allowance", I don't see how you disallow them. All I think you can do is make using orphaned ranges defined behavior or not. -Steve
Apr 16 2012
On Sunday, April 15, 2012 11:34:11 Andrei Alexandrescu wrote:I'm making good progress on an allocator design. If things come together as I hope, it'll kick some serious ass. I'm currently looking at four allocation models: * straight GC, safe (no deallocation) * GC + the ability to free memory * malloc/free, unsafe, buyer beware (STL-level safety) * reference counted (based on either malloc or GC+free) * region (scoped) I need to kink out a few details, most important being - is safety part of the allocator or an extricable property that's a different policy? For now I have a related question about orphan ranges. Consider this motivating example: void main() { int[] x; { auto b = new int[20]; x = b[5 .. $ - 5]; } ... use x ... } The range x is a 10-elements range that originates in a 20-element array. There is no safe way to access the original array again, so in a sense the other 10 elements are "lost". That's why I call x an orphan range - a range of which original container is gone. Built-in arrays work routinely like that, and in fact the originating arrays are not distinguished by type in any way from their ranges (be they orphan or not). The question is, what should std.container do about orphan ranges in general? Should it allow them, disallow them, or leave the decision open (e.g. to be made by the allocator)? Depending on what way we go, the low-level design would be quite different.So, the problem that we have is basically void main() { Range r; { Container c = //allocated using allocator, however that works r = c[]; } //Container may or may not exist in memory, depending on the allocator, //but the range does exist and may or may not be valid. } How is this different from an iterator or range being invalidated after a function is called on the container which alters its state? You could theoretically add plumbing to the range to keep track of whether it's valid or not, but we're not doing that or planning to do that with std.container are we? And if we're not doing that, why would we go out of our way to do so in the case where a custom allocator causes a container to exist for a shorter lifetime than it would normally? - Jonathan M Davis
Apr 16 2012
On 4/16/12 12:57 PM, Jonathan M Davis wrote:So, the problem that we have is basically void main() { Range r; { Container c = //allocated using allocator, however that works r = c[]; } //Container may or may not exist in memory, depending on the allocator, //but the range does exist and may or may not be valid. }Yes.How is this different from an iterator or range being invalidated after a function is called on the container which alters its state?Well it's just a different matter. In particular containers offer the unstable versions of their primitives when they mess iterators up.You could theoretically add plumbing to the range to keep track of whether it's valid or not, but we're not doing that or planning to do that with std.container are we?I guess we have to, at least plant the decision to do so or not in the allocator or in a policy. Andrei
Apr 16 2012
On Monday, April 16, 2012 23:07:34 Andrei Alexandrescu wrote:On 4/16/12 12:57 PM, Jonathan M Davis wrote:Yeah, but the example is pretty much the same as code where someone called remove instead of stableRemove. They did something that invalidates the existing ranges for the container. It's just that rather than calling a function to do it, they did it with scope and allocators.So, the problem that we have is basically void main() { Range r; { Container c = //allocated using allocator, however that works r = c[]; } //Container may or may not exist in memory, depending on the allocator, //but the range does exist and may or may not be valid. }Yes.How is this different from an iterator or range being invalidated after a function is called on the container which alters its state?Well it's just a different matter. In particular containers offer the unstable versions of their primitives when they mess iterators up.I don't really see how else we could handle it. At minimum, if we're going to do it normally, it needs to be via assertions so that they'll go away in release mode - the same goes for checking for ranges which get invalidated by function calls to the container. Otherwise, we get stuck with undesirable overhead. - Jonathan M DavisYou could theoretically add plumbing to the range to keep track of whether it's valid or not, but we're not doing that or planning to do that with std.container are we?I guess we have to, at least plant the decision to do so or not in the allocator or in a policy.
Apr 16 2012