digitalmars.D - std.container and classes
- Jonathan M Davis (15/15) Dec 13 2011 Is the plan for std.container still to have all of its containers be fin...
- Andrei Alexandrescu (50/65) Dec 17 2011 Apologies for being slow on this. It may be a fateful time to discuss
- Jesse Phillips (23/35) Dec 17 2011 This is an interesting reasoning to go with class. Which is similar to
- Peter Alexander (4/10) Dec 17 2011 Allocators are very commonly used in C++, although people rarely use
- Peter Alexander (10/26) Dec 17 2011 My thoughts on this:
- Vladimir Panteleev (4/10) Dec 17 2011 Since containers are templated, why not use asserts for this, and
- Jonathan M Davis (36/114) Dec 17 2011 The only reason that I can think of to use a reference-counted struct in...
- Andrei Alexandrescu (6/13) Dec 17 2011 Being on the heap is not the main issue. The main issue is the data
- Jonathan M Davis (37/54) Dec 17 2011 My initial take on this is that the primary difference between a final c...
- Andrei Alexandrescu (7/45) Dec 18 2011 If this is a new notion, it might take a while for all of its aspects to...
- Jonathan M Davis (36/41) Dec 18 2011 After thinking about this a bit, I don't think that custom allocators ar...
- Jacob Carlborg (6/47) Dec 18 2011 The Tango runtime has added a new method in Object, "dispose". This
- Jonathan M Davis (53/56) Dec 20 2011 That sounds a lot like using clear, though clear doesn't free memory unl...
- foobar (17/86) Dec 20 2011 I disagree with the above conclusion. you conflate two issues
- Jonathan M Davis (9/27) Dec 20 2011 And if they're classes and not managed by the GC nor in a struct which m...
- foobar (9/18) Dec 21 2011 Memory management should be the responsibility of the allocator
- Andrei Alexandrescu (11/19) Dec 21 2011 (Please don't overquote. It took me a million years to scroll all the
- foobar (7/18) Dec 21 2011 Could you give an example to illustrate this design difference?
- Jacob Carlborg (11/67) Dec 20 2011 With "dispose" it would give you the option to control the lifetime of
- Jonathan M Davis (20/26) Dec 20 2011 Well, both delete and scope are going away, so std.typecons.scoped would...
- Jacob Carlborg (7/33) Dec 21 2011 I was thinking if there were good uses for "delete" and "scope" they
- Caligo (8/10) Dec 20 2011 Two questions:
- Andrei Alexandrescu (9/17) Dec 20 2011 Good containers are hard to define. It took the C++ community ten years
- Steven Schveighoffer (11/21) Dec 19 2011 This whole thread of discussion is somewhat more complicated than it has...
- Jerry (16/130) Dec 20 2011 One advantage of a struct container is that when you make it a member of...
- Froglegs (8/8) Dec 20 2011 I don't really think ref counted struct vs class is fair,
- foobar (16/102) Dec 20 2011 My thoughts are:
- Jonathan M Davis (32/46) Dec 20 2011 Containers are fairly straightforward in general. So, if you need to imp...
- foobar (10/24) Dec 20 2011 [snip]
Is the plan for std.container still to have all of its containers be final classes (classes so that they're reference types and final so that their functions are inlinable)? Or has that changed? I believe that Andrei said something recently about discussing reference counting and containers with Walter. The reason that I bring this up is that Array and SList are still structs, and the longer that they're structs, the more code that will break when they get changed to classes. Granted, some level of code breakage may occur when we add custom allocators to them, but since that would probably only affect the constructor (and preferably wouldn't affect anything if you want to simply create a container with the GC heap as you would now were Array and SList classes), the breakage for that should be minimal. Is there any reason for me to not just go and make Array and SList final classes and create a pull request for it? - Jonathan M Davis
Dec 13 2011
On 12/13/11 9:08 PM, Jonathan M Davis wrote:Is the plan for std.container still to have all of its containers be final classes (classes so that they're reference types and final so that their functions are inlinable)? Or has that changed? I believe that Andrei said something recently about discussing reference counting and containers with Walter. The reason that I bring this up is that Array and SList are still structs, and the longer that they're structs, the more code that will break when they get changed to classes. Granted, some level of code breakage may occur when we add custom allocators to them, but since that would probably only affect the constructor (and preferably wouldn't affect anything if you want to simply create a container with the GC heap as you would now were Array and SList classes), the breakage for that should be minimal. Is there any reason for me to not just go and make Array and SList final classes and create a pull request for it? - Jonathan M DavisApologies for being slow on this. It may be a fateful time to discuss that right now, after all the discussion of what's appropriate for stdlib vs. application code etc. As some of you know, Walter and I went back and forth several times on this. First, there was the issue of making containers value types vs. reference types. Making containers value types would be in keep with the STL approach. However, Walter noted that copying entire containers by default is most often NOT desirable and there's significant care and adornments in C++ programs to make sure that that default behavior is avoided (e.g. adding const& to function parameters). So we decided to make containers reference types, and that seemed to be a good choice. The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type. Fast forward a couple of months, a few unslept nights, and a bunch of newsgroup and IRC conversations. Several additional pieces came together. The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs. This cycle of thought has happened twice, and the evidence coming the second time has been stronger. The first time around I went about and started implementing std.container with reference counting in mind. The code is not easy to write, and is not to be recommended for most types, hence my thinking (at the end of the first cycle) that we should switch to class containers. One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic. Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T. Please discuss. Andrei
Dec 17 2011
On Sat, 17 Dec 2011 17:31:46 -0600, Andrei Alexandrescu wrote:The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type.This is an interesting reasoning to go with class. Which is similar to what you end up saying here:One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic.I'm not really sure the answer, but here are some thoughts. There is interest in having reference counting, and making it easy. There was a question on SO about how to make a reference counted object. I gave it my best shot answering having never done it myself. It isn't very straight forward and I think people will expect to use it similar to auto_prt. And considering how similar each wrapped object is, I think D can pull it off. http://stackoverflow.com/questions/4632355/making-a-reference-counted- object-in-d-using-refcountedt So building a reference counted container type probably shouldn't be a challenge in D (or I should say it should be a goal to make it not a challenge). For example, it might be as simple as, a build a value container which you then expose wrapped with RefCounted!T alias RefCounted!ContainerImp Container; But as to whether this should be what is implemented in the standard library, I don't know. You make mention of custom allocators and such. Is this interest going to be of benefit, or is it just something people are use to having from C++? If it makes sense to have such then the containers should support it. Don't classes allow for custom allocators? Is there a reason that can't be used? Do we need to improve on that?
Dec 17 2011
On 18/12/11 12:06 AM, Jesse Phillips wrote:But as to whether this should be what is implemented in the standard library, I don't know. You make mention of custom allocators and such. Is this interest going to be of benefit, or is it just something people are use to having from C++? If it makes sense to have such then the containers should support it. Don't classes allow for custom allocators? Is there a reason that can't be used? Do we need to improve on that?Allocators are very commonly used in C++, although people rarely use them as they are implemented in the standard library because they were poorly designed. We can't repeat the same mistakes.
Dec 17 2011
On 17/12/11 11:31 PM, Andrei Alexandrescu wrote:The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs.My thoughts on this: Allocators are a must, and need to handled better than they were in C++. Classes are simpler than ref-counted structs, and ref-counting is a massive performance drain and code-bloater anyway, so just go with final classes.Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T.IMO they should just be safe and not provide the option of non-safe. It just complicates things to cater to people that won't use them anyway C++ standard library containers are unsafe and relatively lean, yet are rarely used when performance really matters.
Dec 17 2011
On Saturday, 17 December 2011 at 23:31:47 UTC, Andrei Alexandrescu wrote:Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T.Since containers are templated, why not use asserts for this, and let the user choose with -release (just as with built-in arrays)?
Dec 17 2011
On Saturday, December 17, 2011 17:31:46 Andrei Alexandrescu wrote:On 12/13/11 9:08 PM, Jonathan M Davis wrote:The only reason that I can think of to use a reference-counted struct instead of a class is becuse then it's easier to avoid the GC heap entirely. Almost all of a container's memory is going to end up on the heap regardless, because the elements almost never end up in the container itself. They're in a dynamic array or in nodes or something similar. So, whether the container is ref- counted or a class is almost irrelevant. If anything making it a class makes more sense, because then it doesn't have to worry about the extra cost of ref- counting. However, once we bring allocators into the picture, the situation changes slightly. In both cases, the elements themselves end up where the allocator puts them. However, in the case of a struct, the member variables themselves end up on the stack, whereas with the class, they end up on the GC heap unless the programmer puts the object somewhere else with emplace or possibly with a function on the allocator. The other concern is that with a class, it's immediately clear that you're dealing with a reference, but that's much easier to miss with a struct when structs are almost always value types. Also, some containers can't be an a usable state from their init value (e.g. RedBlackTree), so they're more awkward as structs. disable might fix that problem, but I don't know. It would be also be awkward to not be able to just declare a RedBlackTree without immediately initializing it. So, in general, I think that final classes make more sense. The ref-counted struct is just trying to do what a class does already, and the class doesn't have the extra overhead of the ref-counting. The only concern is the fact that even when using a custom allocator, the few member variables that the container has end up on the GC heap unless the programmer goes to extra effort to avoid it, and part of the point of using the custom allocatorc is specifically to avoid the GC heap. The allocator API could make that easy to deal with though by having a function which takes the type and the arguments for the constructor and constructs it for you. e.g. auto arr = custom.alloc!Array(4, 5, 7); So, I favor final classes. I just don't see much benefit in ref-counted structs. They're more confusing and generally harder to deal with, with the one exception being avoiding the GC heap for the member variables of the container. - Jonathan M DavisIs the plan for std.container still to have all of its containers be final classes (classes so that they're reference types and final so that their functions are inlinable)? Or has that changed? I believe that Andrei said something recently about discussing reference counting and containers with Walter. The reason that I bring this up is that Array and SList are still structs, and the longer that they're structs, the more code that will break when they get changed to classes. Granted, some level of code breakage may occur when we add custom allocators to them, but since that would probably only affect the constructor (and preferably wouldn't affect anything if you want to simply create a container with the GC heap as you would now were Array and SList classes), the breakage for that should be minimal. Is there any reason for me to not just go and make Array and SList final classes and create a pull request for it? - Jonathan M DavisApologies for being slow on this. It may be a fateful time to discuss that right now, after all the discussion of what's appropriate for stdlib vs. application code etc. As some of you know, Walter and I went back and forth several times on this. First, there was the issue of making containers value types vs. reference types. Making containers value types would be in keep with the STL approach. However, Walter noted that copying entire containers by default is most often NOT desirable and there's significant care and adornments in C++ programs to make sure that that default behavior is avoided (e.g. adding const& to function parameters). So we decided to make containers reference types, and that seemed to be a good choice. The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type. Fast forward a couple of months, a few unslept nights, and a bunch of newsgroup and IRC conversations. Several additional pieces came together. The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs. This cycle of thought has happened twice, and the evidence coming the second time has been stronger. The first time around I went about and started implementing std.container with reference counting in mind. The code is not easy to write, and is not to be recommended for most types, hence my thinking (at the end of the first cycle) that we should switch to class containers. One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic. Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T. Please discuss.
Dec 17 2011
On 12/17/11 7:52 PM, Jonathan M Davis wrote:The only reason that I can think of to use a reference-counted struct instead of a class is becuse then it's easier to avoid the GC heap entirely. Almost all of a container's memory is going to end up on the heap regardless, because the elements almost never end up in the container itself.Being on the heap is not the main issue. The main issue is the data becoming garbage once all references are gone.They're in a dynamic array or in nodes or something similar. So, whether the container is ref- counted or a class is almost irrelevant.I think this argument is starting off the wrong premise, and is already wrong by this point, so I skipped the rest. Am I right? Andrei
Dec 17 2011
On Sunday, December 18, 2011 00:15:40 Andrei Alexandrescu wrote:On 12/17/11 7:52 PM, Jonathan M Davis wrote:My initial take on this is that the primary difference between a final class and a ref-counted struct is the fact that the class is on the heap and needs no ref-counting, whereas the struct is no the stack and has to do ref-counting, and everything else is the same regardless, since the memory for the container has to go on the heap regardless (be it the GC heap or wherever the allocator puts it). But what you're bringing up is that in the case of the class, everything in the container stays around until it's garbage collected, whereas in the case of the struct, it goes away as soon as the ref-count hits zero? I hadn't thought of that. But if we're talking about the GC heap, since most of the internals are going to be on the heap in either case, with the only difference being the container itself and its value type member variables. However, once you use a custom allocator (say one that uses malloc and free), then in the struct case, everything gets cleaned up as soon as the ref-count hits zero, whereas with the class it sits around until the container itself is collected - assuming that the class itself is on the GC heap. If it was using the malloc-free allocator, then it would be around until it was manually freed by the allocator. I don't know. That is certainly food for thought. My natural inclinition is to just make it a class, since it's a reference type, and then you don't have to worry about the cost of ref-counting or have any confusion over whether the container is a reference type or not. But if there's any real performance advantage in using ref-counted structs due to something to do with memory management, then that would be highly valuable. If no custom allocators are used, then I'd expect the struct to be worse off, since it has the cost of being copied as well as the cost of ref-counting, whereas the only cost in passing the class around is copying it's reference, and other than the few member variables that the container has, there's no difference in how long the memory in the container sticks around. It has to wait for the GC in either case. It's only when a custom allocator which could free the memory as soon as the ref-count hits zero comes into play that it makes a difference, in which case the struct would probably be more efficient. Unless I'm missing something? It certainly doesn't seem like an easy call. - Jonathan M DavisThe only reason that I can think of to use a reference-counted struct instead of a class is becuse then it's easier to avoid the GC heap entirely. Almost all of a container's memory is going to end up on the heap regardless, because the elements almost never end up in the container itself.Being on the heap is not the main issue. The main issue is the data becoming garbage once all references are gone.They're in a dynamic array or in nodes or something similar. So, whether the container is ref- counted or a class is almost irrelevant.I think this argument is starting off the wrong premise, and is already wrong by this point, so I skipped the rest. Am I right?
Dec 17 2011
On 12/18/11 1:42 AM, Jonathan M Davis wrote:On Sunday, December 18, 2011 00:15:40 Andrei Alexandrescu wrote:If this is a new notion, it might take a while for all of its aspects to sink in. I'm not saying that to be smug - reference counting vs. other methods of collecting garbage is definitely a difficult topic on today's architectures, and I can say at least it took me quite a while to build some reasonable mental model of the related tradeoffs. AndreiOn 12/17/11 7:52 PM, Jonathan M Davis wrote:My initial take on this is that the primary difference between a final class and a ref-counted struct is the fact that the class is on the heap and needs no ref-counting, whereas the struct is no the stack and has to do ref-counting, and everything else is the same regardless, since the memory for the container has to go on the heap regardless (be it the GC heap or wherever the allocator puts it). But what you're bringing up is that in the case of the class, everything in the container stays around until it's garbage collected, whereas in the case of the struct, it goes away as soon as the ref-count hits zero? I hadn't thought of that. But if we're talking about the GC heap, since most of the internals are going to be on the heap in either case, with the only difference being the container itself and its value type member variables. However, once you use a custom allocator (say one that uses malloc and free), then in the struct case, everything gets cleaned up as soon as the ref-count hits zero, whereas with the class it sits around until the container itself is collected - assuming that the class itself is on the GC heap. If it was using the malloc-free allocator, then it would be around until it was manually freed by the allocator. I don't know. That is certainly food for thought.The only reason that I can think of to use a reference-counted struct instead of a class is becuse then it's easier to avoid the GC heap entirely. Almost all of a container's memory is going to end up on the heap regardless, because the elements almost never end up in the container itself.Being on the heap is not the main issue. The main issue is the data becoming garbage once all references are gone.They're in a dynamic array or in nodes or something similar. So, whether the container is ref- counted or a class is almost irrelevant.I think this argument is starting off the wrong premise, and is already wrong by this point, so I skipped the rest. Am I right?
Dec 18 2011
On Sunday, December 18, 2011 02:46:01 Andrei Alexandrescu wrote:If this is a new notion, it might take a while for all of its aspects to sink in. I'm not saying that to be smug - reference counting vs. other methods of collecting garbage is definitely a difficult topic on today's architectures, and I can say at least it took me quite a while to build some reasonable mental model of the related tradeoffs.After thinking about this a bit, I don't think that custom allocators are really going to work with classes very well. Unless you allocate the entire class with the custom allocator (as opposed to just telling the container to use the custom allocator), the memory won't be cleaned up until the object is collected by the GC (even though it's using a custom allocator to allocate its internals), in which case, what does the custom allocator really buy us? So, the idea of just passing the allocator to the container doesn't seem to cut it for a class. The object itself needs to be allocated with the custom allocator. On top of that, if you're allocating the container with a custom allocator, doesn't that risk issues like what the to-be-deprecated scope modifier causes? You practically end up having to wrap the class in a struct anyway, in which case, what's the point of using a class? Just in general, it's much harder to control the memory if the container is a class rather than a struct. With a struct, you can do pretty much whatever you want with the memory. Its lifetime is strictly controlled, which gives much greater opportunites to optimize its memory usage. With a class, it's going to be difficult to completely divorce the container from the GC in a clean manner. With a struct, we could easily make the default not use the GC at all, which would make the default use of the containers much more efficient. You also wouldn't have to worry about creating the container itself with an allocator and all of the potential issues that that brings. I don't particularly like the idea of the cost of passing the container around as a struct (since it's few member variables will have to be memcopied and the ref-counting will have to be done), but that's not a huge cost (though it _is_ a pervasive one), and I'm begginning to think that it's a necessary one if we want containers to be able to properly and efficiently manage their own memory. Depending on the use case, the benefits from the containers being able to fully manage their own memory without the programmer having to do anything special could easily outweigh the additional cost of passing a struct around. And if ranges over such containers are passed around more frequently than the containers themselves (as I would expect), then the minor cost of passing the container around is even less of a big deal, since it would primarily be the ranges being passed around. So, I'm beginning to think that we're going to have to go the struct route. - Jonathan M Davis
Dec 18 2011
On 2011-12-19 05:12, Jonathan M Davis wrote:On Sunday, December 18, 2011 02:46:01 Andrei Alexandrescu wrote:The Tango runtime has added a new method in Object, "dispose". This method is called when "scope" is used or "delete" is explicitly called. I don't know if that would help. -- /Jacob CarlborgIf this is a new notion, it might take a while for all of its aspects to sink in. I'm not saying that to be smug - reference counting vs. other methods of collecting garbage is definitely a difficult topic on today's architectures, and I can say at least it took me quite a while to build some reasonable mental model of the related tradeoffs.After thinking about this a bit, I don't think that custom allocators are really going to work with classes very well. Unless you allocate the entire class with the custom allocator (as opposed to just telling the container to use the custom allocator), the memory won't be cleaned up until the object is collected by the GC (even though it's using a custom allocator to allocate its internals), in which case, what does the custom allocator really buy us? So, the idea of just passing the allocator to the container doesn't seem to cut it for a class. The object itself needs to be allocated with the custom allocator. On top of that, if you're allocating the container with a custom allocator, doesn't that risk issues like what the to-be-deprecated scope modifier causes? You practically end up having to wrap the class in a struct anyway, in which case, what's the point of using a class? Just in general, it's much harder to control the memory if the container is a class rather than a struct. With a struct, you can do pretty much whatever you want with the memory. Its lifetime is strictly controlled, which gives much greater opportunites to optimize its memory usage. With a class, it's going to be difficult to completely divorce the container from the GC in a clean manner. With a struct, we could easily make the default not use the GC at all, which would make the default use of the containers much more efficient. You also wouldn't have to worry about creating the container itself with an allocator and all of the potential issues that that brings. I don't particularly like the idea of the cost of passing the container around as a struct (since it's few member variables will have to be memcopied and the ref-counting will have to be done), but that's not a huge cost (though it _is_ a pervasive one), and I'm begginning to think that it's a necessary one if we want containers to be able to properly and efficiently manage their own memory. Depending on the use case, the benefits from the containers being able to fully manage their own memory without the programmer having to do anything special could easily outweigh the additional cost of passing a struct around. And if ranges over such containers are passed around more frequently than the containers themselves (as I would expect), then the minor cost of passing the container around is even less of a big deal, since it would primarily be the ranges being passed around. So, I'm beginning to think that we're going to have to go the struct route. - Jonathan M Davis
Dec 18 2011
On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:The Tango runtime has added a new method in Object, "dispose". This method is called when "scope" is used or "delete" is explicitly called. I don't know if that would help.That sounds a lot like using clear, though clear doesn't free memory unless the finalizer does (and I'm not sure that managing memory with finalizers actually works right now; there were issues with that previously - but it might have only been GC memory, so perhaps malloc and free would still work). The issue with them is escaping references. You can easily end up with references to data which has been destroyed. It also makes them harder to pass around unless you essentially wrap the container in ref-counted struct. The lifetime of the container must be managed if you really want to be able to use custom allocators or have the container be as efficient as possible with regards to its memory usage in the general case. With a struct, it manages itself. It can do whatever it wants to with memory internally, and the combination of its constructor, postblit constructor, and destructor allows it to take care of it itself and clean up its memory when it's destroyed. You also don't have issues of stuff referring to a container which doesn't exist anymore (unless you add pointers to the container into the mix and then move or destroy the container). With classes, if they're owned by the GC, then it's the GC that manages their lifetime. It destroys them when they're no longer referenced and it runs a collection cycle. You have no control over its lifetime, and even if it tries to manage its memory internally on some level, that memory won't be cleaned up until the GC collects the container itself. So, if you want to manage the lifetime of the class, you can't use a naked reference anymore. You need a way to get the GC to collect it, or you need to have it somewhere else other than the GC heap. If you use something like Scoped, then its lifetime will be tied to the scope that it's in, but you risk references to it escaping, and limiting the container to a particular scope could be far too limiting anyway. That being the case, if you want to use an allocator other than the GC for the class itself, then you're probably going to have to stick it in a struct. That struct could then manage the container's lifetime on some level. It would probably have to use ref-counting and then call clear on the container when the ref-count hits zero. That should then allow the container to at least clean up its internal memory, assuming that that's not on the GC heap, but unless you use emplace on non-GC heap memory, the container itself still can't be collected until the GC collects it (since clear destroys it but doesn't free its memory). Not to mention, if you're using a ref-counted struct to hold a class in order to manage that class' lifetime, why on earth did you make it a class in the first place? If what you're suggesting with dispose is that it would act like clear except that it would actually free the container, well that pretty much goes against the direction that we've been trying to go with the GC, which is to not have the programmer explicitly freeing anything on the GC heap (which is why delete is being removed from the language). I really think that if we want deterministic destruction of containers, they need to be structs. And if we want to use custom allocators with containers, I don't see how we could reasonably expect them to be of much use unless we're expecting that programmers will explicitly call clear on them when they're done with them. So, while the fact that containers need to be reference types does imply that classes are the way to go, I think that we're going to have to go with structs if we want their memory management to be more efficient than simply letting the GC handle it. - Jonathan M Davis
Dec 20 2011
On Wednesday, 21 December 2011 at 05:08:27 UTC, Jonathan M Davis wrote:On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:I disagree with the above conclusion. you conflate two issues that are orthogonal: a. value vs. ref semantics which is already seemed to be decided in favor of the latter and hence classes. b. memory and lifetime management The containers should allow for (disregard the specifics of the syntax): Container a = new(SharedMemAllocator) LinkedList(); Container b = new(MallocAllocator) LinkedList(); Container c = new(GC) LinkedList(); When adding an item to the above containers the relevant allocator will enact its policy about intermixing with other allocators - by default the item will be copied if it comes from a separate allocator. I don't see anything here that forces the use of structs instead of classes.The Tango runtime has added a new method in Object, "dispose". This method is called when "scope" is used or "delete" is explicitly called. I don't know if that would help.That sounds a lot like using clear, though clear doesn't free memory unless the finalizer does (and I'm not sure that managing memory with finalizers actually works right now; there were issues with that previously - but it might have only been GC memory, so perhaps malloc and free would still work). The issue with them is escaping references. You can easily end up with references to data which has been destroyed. It also makes them harder to pass around unless you essentially wrap the container in ref-counted struct. The lifetime of the container must be managed if you really want to be able to use custom allocators or have the container be as efficient as possible with regards to its memory usage in the general case. With a struct, it manages itself. It can do whatever it wants to with memory internally, and the combination of its constructor, postblit constructor, and destructor allows it to take care of it itself and clean up its memory when it's destroyed. You also don't have issues of stuff referring to a container which doesn't exist anymore (unless you add pointers to the container into the mix and then move or destroy the container). With classes, if they're owned by the GC, then it's the GC that manages their lifetime. It destroys them when they're no longer referenced and it runs a collection cycle. You have no control over its lifetime, and even if it tries to manage its memory internally on some level, that memory won't be cleaned up until the GC collects the container itself. So, if you want to manage the lifetime of the class, you can't use a naked reference anymore. You need a way to get the GC to collect it, or you need to have it somewhere else other than the GC heap. If you use something like Scoped, then its lifetime will be tied to the scope that it's in, but you risk references to it escaping, and limiting the container to a particular scope could be far too limiting anyway. That being the case, if you want to use an allocator other than the GC for the class itself, then you're probably going to have to stick it in a struct. That struct could then manage the container's lifetime on some level. It would probably have to use ref-counting and then call clear on the container when the ref-count hits zero. That should then allow the container to at least clean up its internal memory, assuming that that's not on the GC heap, but unless you use emplace on non-GC heap memory, the container itself still can't be collected until the GC collects it (since clear destroys it but doesn't free its memory). Not to mention, if you're using a ref-counted struct to hold a class in order to manage that class' lifetime, why on earth did you make it a class in the first place? If what you're suggesting with dispose is that it would act like clear except that it would actually free the container, well that pretty much goes against the direction that we've been trying to go with the GC, which is to not have the programmer explicitly freeing anything on the GC heap (which is why delete is being removed from the language). I really think that if we want deterministic destruction of containers, they need to be structs. And if we want to use custom allocators with containers, I don't see how we could reasonably expect them to be of much use unless we're expecting that programmers will explicitly call clear on them when they're done with them. So, while the fact that containers need to be reference types does imply that classes are the way to go, I think that we're going to have to go with structs if we want their memory management to be more efficient than simply letting the GC handle it. - Jonathan M Davis
Dec 20 2011
On Wednesday, December 21, 2011 08:07:23 foobar wrote:I disagree with the above conclusion. you conflate two issues that are orthogonal: a. value vs. ref semantics which is already seemed to be decided in favor of the latter and hence classes. b. memory and lifetime management The containers should allow for (disregard the specifics of the syntax): Container a = new(SharedMemAllocator) LinkedList(); Container b = new(MallocAllocator) LinkedList(); Container c = new(GC) LinkedList(); When adding an item to the above containers the relevant allocator will enact its policy about intermixing with other allocators - by default the item will be copied if it comes from a separate allocator. I don't see anything here that forces the use of structs instead of classes.And if they're classes and not managed by the GC nor in a struct which manages their lifetime, how are they going to be freed? Does the user have to explicitly free them themselves? How is that better than using a ref-counted struct? And no, using reference semantics does _not_ require classes. A class is just the easiest way to get reference semantics. It doesn't necessarily mean that it's the best way. That depends on the context. - Jonathan M Davis
Dec 20 2011
On Wednesday, 21 December 2011 at 07:30:45 UTC, Jonathan M Davis wrote:And if they're classes and not managed by the GC nor in a struct which manages their lifetime, how are they going to be freed? Does the user have to explicitly free them themselves? How is that better than using a ref-counted struct?Memory management should be the responsibility of the allocator and it shouldn't be limited only to a single policy of ref-counting which has many disadvantages.And no, using reference semantics does _not_ require classes. A class is just the easiest way to get reference semantics. It doesn't necessarily mean that it's the best way. That depends on the context. - Jonathan M DavisI didn't use the word 'require' anywhere, but it is indeed by far the simplest and best way to define a ref-type. All I'm saying is that when travelling from NYC to Washington DC through China, you really should have good reasons to not use the direct route.
Dec 21 2011
On 12/21/11 1:07 AM, foobar wrote:The containers should allow for (disregard the specifics of the syntax): Container a = new(SharedMemAllocator) LinkedList(); Container b = new(MallocAllocator) LinkedList(); Container c = new(GC) LinkedList(); When adding an item to the above containers the relevant allocator will enact its policy about intermixing with other allocators - by default the item will be copied if it comes from a separate allocator. I don't see anything here that forces the use of structs instead of classes.(Please don't overquote. It took me a million years to scroll all the way down on my phone. Thanks.) This is an alluring proposition but it's a lot more difficult than one might think. The problem is the entire container design is different if e.g. using refcounting vs. classic garbage collection. You may want to try designing a simple container and isolate all memory/lifetime-specific issues into an allocator. See what allocator interface you come up with. I was unable to solve this problem, but it's possible it has a solution. Andrei
Dec 21 2011
On Wednesday, 21 December 2011 at 16:39:13 UTC, Andrei Alexandrescu wrote:(Please don't overquote. It took me a million years to scroll all the way down on my phone. Thanks.) This is an alluring proposition but it's a lot more difficult than one might think. The problem is the entire container design is different if e.g. using refcounting vs. classic garbage collection. You may want to try designing a simple container and isolate all memory/lifetime-specific issues into an allocator. See what allocator interface you come up with. I was unable to solve this problem, but it's possible it has a solution. AndreiCould you give an example to illustrate this design difference? Which container would be affected by this? There's already one good allocator API design I'm aware of: http://www.open-std.org/Jtc1/sc22/wg21/docs/papers/2005/n1850.pdf Can we reuse the design/ideas or can you see problems with it?
Dec 21 2011
On 2011-12-21 06:07, Jonathan M Davis wrote:On Monday, December 19, 2011 08:54:00 Jacob Carlborg wrote:With "dispose" it would give you the option to control the lifetime of the object, in some cases.The Tango runtime has added a new method in Object, "dispose". This method is called when "scope" is used or "delete" is explicitly called. I don't know if that would help.That sounds a lot like using clear, though clear doesn't free memory unless the finalizer does (and I'm not sure that managing memory with finalizers actually works right now; there were issues with that previously - but it might have only been GC memory, so perhaps malloc and free would still work). The issue with them is escaping references. You can easily end up with references to data which has been destroyed. It also makes them harder to pass around unless you essentially wrap the container in ref-counted struct. The lifetime of the container must be managed if you really want to be able to use custom allocators or have the container be as efficient as possible with regards to its memory usage in the general case. With a struct, it manages itself. It can do whatever it wants to with memory internally, and the combination of its constructor, postblit constructor, and destructor allows it to take care of it itself and clean up its memory when it's destroyed. You also don't have issues of stuff referring to a container which doesn't exist anymore (unless you add pointers to the container into the mix and then move or destroy the container). With classes, if they're owned by the GC, then it's the GC that manages their lifetime. It destroys them when they're no longer referenced and it runs a collection cycle. You have no control over its lifetime, and even if it tries to manage its memory internally on some level, that memory won't be cleaned up until the GC collects the container itself. So, if you want to manage the lifetime of the class, you can't use a naked reference anymore. You need a way to get the GC to collect it, or you need to have it somewhere else other than the GC heap.If you use something like Scoped, then its lifetime will be tied to the scope that it's in, but you risk references to it escaping, and limiting the container to a particular scope could be far too limiting anyway. That being the case, if you want to use an allocator other than the GC for the class itself, then you're probably going to have to stick it in a struct. That struct could then manage the container's lifetime on some level. It would probably have to use ref-counting and then call clear on the container when the ref-count hits zero. That should then allow the container to at least clean up its internal memory, assuming that that's not on the GC heap, but unless you use emplace on non-GC heap memory, the container itself still can't be collected until the GC collects it (since clear destroys it but doesn't free its memory). Not to mention, if you're using a ref-counted struct to hold a class in order to manage that class' lifetime, why on earth did you make it a class in the first place? If what you're suggesting with dispose is that it would act like clear except that it would actually free the container, well that pretty much goes against the direction that we've been trying to go with the GC, which is to not have the programmer explicitly freeing anything on the GC heap (which is why delete is being removed from the language).I thought that one of the problems with the GC and memory was that you can't rely on when/if the destructors are called. With "dispose" it would give the developer a reliable method to cleanup resources, assuming "scope" or "delete" is used.I really think that if we want deterministic destruction of containers, they need to be structs. And if we want to use custom allocators with containers, I don't see how we could reasonably expect them to be of much use unless we're expecting that programmers will explicitly call clear on them when they're done with them. So, while the fact that containers need to be reference types does imply that classes are the way to go, I think that we're going to have to go with structs if we want their memory management to be more efficient than simply letting the GC handle it. - Jonathan M DavisI was thinking that classes could be the safe and default but and scope/delete could be used as an optimization. -- /Jacob Carlborg
Dec 20 2011
On Wednesday, December 21, 2011 08:44:10 Jacob Carlborg wrote:I thought that one of the problems with the GC and memory was that you can't rely on when/if the destructors are called. With "dispose" it would give the developer a reliable method to cleanup resources, assuming "scope" or "delete" is used.I was thinking that classes could be the safe and default but and scope/delete could be used as an optimization.Well, both delete and scope are going away, so std.typecons.scoped would have to be used instead, but that only really works for local variables. It would probably work for a member variable as well, but in both cases, you have to worry about the class going away when Scoped goes away and how that affects all of the references to it. If, on other hand, you use a ref-counted struct, it takes care of itself without the risks of the container going away while references to it still exist (unless you keep a pointer to it somewhere instead of the ref-counted struct). So, ref-counted structs would be much safer, and they could use advanced memory management internally by default, making them more efficient by default. What I really don't get, however, is how custom allocators would be of much use for classes unless the entire class is allocated that way (in which case, it's probably going to end up in a struct of some kind anyway unless you want the same issues that you get with scoped), since if the class is on the GC heap, its internal memory management couldn't release any of it until the container is collected by the GC. So, it seems to me that the desire for custom allocators would favor the use of ref-counted structs over classes. - Jonathan M Davis
Dec 20 2011
On 2011-12-21 08:53, Jonathan M Davis wrote:On Wednesday, December 21, 2011 08:44:10 Jacob Carlborg wrote:I was thinking if there were good uses for "delete" and "scope" they could stay. I think the "dispose" method in Tango makes them more useful.I thought that one of the problems with the GC and memory was that you can't rely on when/if the destructors are called. With "dispose" it would give the developer a reliable method to cleanup resources, assuming "scope" or "delete" is used.I was thinking that classes could be the safe and default but and scope/delete could be used as an optimization.Well, both delete and scope are going away, so std.typecons.scoped would have to be used instead, but that only really works for local variables. It would probably work for a member variable as well, but in both cases, you have to worry about the class going away when Scoped goes away and how that affects all of the references to it.If, on other hand, you use a ref-counted struct, it takes care of itself without the risks of the container going away while references to it still exist (unless you keep a pointer to it somewhere instead of the ref-counted struct). So, ref-counted structs would be much safer, and they could use advanced memory management internally by default, making them more efficient by default. What I really don't get, however, is how custom allocators would be of much use for classes unless the entire class is allocated that way (in which case, it's probably going to end up in a struct of some kind anyway unless you want the same issues that you get with scoped), since if the class is on the GC heap, its internal memory management couldn't release any of it until the container is collected by the GC. So, it seems to me that the desire for custom allocators would favor the use of ref-counted structs over classes. - Jonathan M DavisI sounds like a struct is what should be used, it will be most flexible as well. -- /Jacob Carlborg
Dec 21 2011
On Sun, Dec 18, 2011 at 10:12 PM, Jonathan M Davis <jmdavisProg gmx.com>wrote:So, I'm beginning to think that we're going to have to go the struct route. - Jonathan M DavisTwo questions: 1. If you guys, the D experts, are having such a difficult time with this, what happens to the rest of us when we need to implement data structures that are not offered by Phobos? Do we just wait and see what happens with Phobos and learn from it? 2. Are the new containers going to be multi-threaded? i.e., will I be able to insert elements into a container from multiple threads?
Dec 20 2011
On 12/20/11 10:15 PM, Caligo wrote:Two questions: 1. If you guys, the D experts, are having such a difficult time with this, what happens to the rest of us when we need to implement data structures that are not offered by Phobos?Good containers are hard to define. It took the C++ community ten years and a great mathematician to define some passable ones. I have no worries - it's a hard task.Do we just wait and see what happens with Phobos and learn from it?No need. The basic design is pretty clear - containers offer ranges, and algorithms use ranges.2. Are the new containers going to be multi-threaded? i.e., will I be able to insert elements into a container from multiple threads?Containers will not be implicitly shared. We'll define specialized shared containers. Andrei
Dec 20 2011
On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:a great mathematicianName please?
Dec 21 2011
On Wednesday, 21 December 2011 at 12:50:44 UTC, so wrote:On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I believe he is referring to http://en.wikipedia.org/wiki/Alexander_Stepanova great mathematicianName please?
Dec 21 2011
On Wed, 21 Dec 2011 14:56:28 +0200, Froglegs <lugtug gmail.com> wrote:On Wednesday, 21 December 2011 at 12:50:44 UTC, so wrote:Thank you Mr Frog. I didn't know the "mathematician" part of him. I am warning you Java (or any die hard OOP) guys, don't read his articles.On Wed, 21 Dec 2011 07:37:33 +0200, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I believe he is referring to http://en.wikipedia.org/wiki/Alexander_Stepanova great mathematicianName please?
Dec 21 2011
On Sun, 18 Dec 2011 01:15:40 -0500, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 12/17/11 7:52 PM, Jonathan M Davis wrote:This whole thread of discussion is somewhat more complicated than it has to be. A ref-counted type is implemented via a reference to allocated data. That data can be a class instance or a struct pointer. Using a reference style struct just introduces problems you don't have to worry about with a class. It gains you nothing except having to crowbar the language into believing your struct is really a reference type. And that's really hard to do. -SteveThe only reason that I can think of to use a reference-counted struct instead of a class is becuse then it's easier to avoid the GC heap entirely. Almost all of a container's memory is going to end up on the heap regardless, because the elements almost never end up in the container itself.Being on the heap is not the main issue. The main issue is the data becoming garbage once all references are gone.
Dec 19 2011
Jonathan M Davis <jmdavisProg gmx.com> writes:On Saturday, December 17, 2011 17:31:46 Andrei Alexandrescu wrote:One advantage of a struct container is that when you make it a member of a struct or class that will have lots of instances and therefore lots of tiny containers. In C++ at my work, we embed tiny vectors in objects all the time. If you have a situation where there's enough of the objects that the memory actually matters, and the vectors need to be resizeable but are often small, the container overhead matters. With a class, you've added a reference pointer, vtable, typeinfo on top of the storage for the container itself. Also, the extra dereference to get from the object to its member container is often going to be a cache miss, slowing things down further. So if you embed an Array class in an object, you pay for 2 pointer dereferences to get to the data rather than one. With a class you'll have to roll your own to avoid the costs. JerryOn 12/13/11 9:08 PM, Jonathan M Davis wrote:The only reason that I can think of to use a reference-counted struct instead of a class is becuse then it's easier to avoid the GC heap entirely. Almost all of a container's memory is going to end up on the heap regardless, because the elements almost never end up in the container itself. They're in a dynamic array or in nodes or something similar. So, whether the container is ref- counted or a class is almost irrelevant. If anything making it a class makes more sense, because then it doesn't have to worry about the extra cost of ref- counting. However, once we bring allocators into the picture, the situation changes slightly. In both cases, the elements themselves end up where the allocator puts them. However, in the case of a struct, the member variables themselves end up on the stack, whereas with the class, they end up on the GC heap unless the programmer puts the object somewhere else with emplace or possibly with a function on the allocator. The other concern is that with a class, it's immediately clear that you're dealing with a reference, but that's much easier to miss with a struct when structs are almost always value types. Also, some containers can't be an a usable state from their init value (e.g. RedBlackTree), so they're more awkward as structs. disable might fix that problem, but I don't know. It would be also be awkward to not be able to just declare a RedBlackTree without immediately initializing it. So, in general, I think that final classes make more sense. The ref-counted struct is just trying to do what a class does already, and the class doesn't have the extra overhead of the ref-counting. The only concern is the fact that even when using a custom allocator, the few member variables that the container has end up on the GC heap unless the programmer goes to extra effort to avoid it, and part of the point of using the custom allocatorc is specifically to avoid the GC heap. The allocator API could make that easy to deal with though by having a function which takes the type and the arguments for the constructor and constructs it for you. e.g. auto arr = custom.alloc!Array(4, 5, 7); So, I favor final classes. I just don't see much benefit in ref-counted structs. They're more confusing and generally harder to deal with, with the one exception being avoiding the GC heap for the member variables of the container. - Jonathan M DavisIs the plan for std.container still to have all of its containers be final classes (classes so that they're reference types and final so that their functions are inlinable)? Or has that changed? I believe that Andrei said something recently about discussing reference counting and containers with Walter. The reason that I bring this up is that Array and SList are still structs, and the longer that they're structs, the more code that will break when they get changed to classes. Granted, some level of code breakage may occur when we add custom allocators to them, but since that would probably only affect the constructor (and preferably wouldn't affect anything if you want to simply create a container with the GC heap as you would now were Array and SList classes), the breakage for that should be minimal. Is there any reason for me to not just go and make Array and SList final classes and create a pull request for it? - Jonathan M DavisApologies for being slow on this. It may be a fateful time to discuss that right now, after all the discussion of what's appropriate for stdlib vs. application code etc. As some of you know, Walter and I went back and forth several times on this. First, there was the issue of making containers value types vs. reference types. Making containers value types would be in keep with the STL approach. However, Walter noted that copying entire containers by default is most often NOT desirable and there's significant care and adornments in C++ programs to make sure that that default behavior is avoided (e.g. adding const& to function parameters). So we decided to make containers reference types, and that seemed to be a good choice. The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type. Fast forward a couple of months, a few unslept nights, and a bunch of newsgroup and IRC conversations. Several additional pieces came together. The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs. This cycle of thought has happened twice, and the evidence coming the second time has been stronger. The first time around I went about and started implementing std.container with reference counting in mind. The code is not easy to write, and is not to be recommended for most types, hence my thinking (at the end of the first cycle) that we should switch to class containers. One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic. Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T. Please discuss.
Dec 20 2011
I don't really think ref counted struct vs class is fair, because in reality most containers don't need ref counting. I can't think of one instance in C++ where I stuck a container directly in a shared_ptr or anything similar. Also as far I as I can tell making it a class would bloat it with unnecessary data(vtable), and being it is common to have many many instances of these containers, that doesn't sound like such a great thing.
Dec 20 2011
On Saturday, 17 December 2011 at 23:31:47 UTC, Andrei Alexandrescu wrote:On 12/13/11 9:08 PM, Jonathan M Davis wrote:My thoughts are: - value vs. reference: definitely reference hence classes - reference type - BAD idea. it adds code bloat, code complexity and hurts performance, not to mention it conflicts with concurrency which is the way of the future. - allocators - EXCELLENT idea. As discussed previously the design must not be the c++ one. - In addition to the general purpose ref. containers we should provide a few special value type containers such as small arrays with primitive types, a FlagSet with 1 bit per flag, etc. - safety - definitely safe by default and this is wat we should concentrate ATM. unsafe could be added later. ref-counted seems redundant to me if we provide allocators. There should be an RC_alloc for this.Is the plan for std.container still to have all of its containers be final classes (classes so that they're reference types and final so that their functions are inlinable)? Or has that changed? I believe that Andrei said something recently about discussing reference counting and containers with Walter. The reason that I bring this up is that Array and SList are still structs, and the longer that they're structs, the more code that will break when they get changed to classes. Granted, some level of code breakage may occur when we add custom allocators to them, but since that would probably only affect the constructor (and preferably wouldn't affect anything if you want to simply create a container with the GC heap as you would now were Array and SList classes), the breakage for that should be minimal. Is there any reason for me to not just go and make Array and SList final classes and create a pull request for it? - Jonathan M DavisApologies for being slow on this. It may be a fateful time to discuss that right now, after all the discussion of what's appropriate for stdlib vs. application code etc. As some of you know, Walter and I went back and forth several times on this. First, there was the issue of making containers value types vs. reference types. Making containers value types would be in keep with the STL approach. However, Walter noted that copying entire containers by default is most often NOT desirable and there's significant care and adornments in C++ programs to make sure that that default behavior is avoided (e.g. adding const& to function parameters). So we decided to make containers reference types, and that seemed to be a good choice. The second decision is classes vs. structs. Walter correctly pointed out that the obvious choice for defining a reference type in D - whether the type is momonorphic or polymorphic - is making it a class. If containers aren't classes, the reasoning went, it means we took a wrong step somewhere; it might mean our flagship abstraction for reference types is not suitable for, well, defining a reference type. Fast forward a couple of months, a few unslept nights, and a bunch of newsgroup and IRC conversations. Several additional pieces came together. The most important thing I noticed is that people expect standard containers to have sophisticated memory management. Many ask not about containers as much as "containers with custom allocators". Second, containers tend to be large memory users by definition. Third, containers are self-contained (heh) and relatively simple in terms of what they model, meaning that they _never_ suffer from circular references, like general entity types might. All of these arguments very strongly suggest that many want containers to be types with deterministic control over memory and accept configurable allocation strategies (regions, heap, malloc, custom). So that would mean containers should be reference counted structs. This cycle of thought has happened twice, and the evidence coming the second time has been stronger. The first time around I went about and started implementing std.container with reference counting in mind. The code is not easy to write, and is not to be recommended for most types, hence my thinking (at the end of the first cycle) that we should switch to class containers. One fear I have is that people would be curious, look at the implementation of std.container, and be like "so am I expected to do all this to define a robust type"? I start to think that the right answer to that is to improve library support for good reference counted types, and define reference counted struct containers that are deterministic. Safety is also an issue. I was hoping I'd provide safety as a policy, e.g. one may choose for a given container whether they want safe or not (and presumably fast). I think it's best to postpone that policy and focus for now on defining safe containers with safe ranges. This precludes e.g. using T[] as a range for Array!T. Please discuss. Andrei
Dec 20 2011
On Tuesday, December 20, 2011 22:15:43 Caligo wrote:On Sun, Dec 18, 2011 at 10:12 PM, Jonathan M Davis<jmdavisProg gmx.com>wrote:Containers are fairly straightforward in general. So, if you need to implement your own, it's not generally all that hard. The problem here is that the standard library needs to be efficient in the general case. We don't know under what circumstances these containers are going to be used, so we're under different constraints than someone who's writing a container for their own use. Clearly, containers can be either structs or classes. Both will work. The issue is which will be more efficient in the general case. And if you're implementing your own containers, you can choose to do what Phobos does or go another route - especially since you're going to know under what circumstances you're using your own containers. For the most part, I think that the issues being discussed with regards to classes vs structs are as big a deal in this case as they are because it's the standard library. Ideally, Phobos will take the most efficient route such that if you want your containers to be as absolutely as efficient as possible, you'd do the same thing that they're doing with your own containers, but you wouldn't necessarily need to. Since, as Andrei has been pointing out, the standard library is the sort of place where you go to extremes to make code efficient, and you don't always do things in the way that would normally be considered the nice way to do it, because it _needs_ to have that efficiency and can afford to be nastier code, since it's generally the experts who are working on it. So yes, learn from Phobos, but you don't have to do everything the way that it does.So, I'm beginning to think that we're going to have to go the struct route. - Jonathan M DavisTwo questions: 1. If you guys, the D experts, are having such a difficult time with this, what happens to the rest of us when we need to implement data structures that are not offered by Phobos? Do we just wait and see what happens with Phobos and learn from it?2. Are the new containers going to be multi-threaded? i.e., will I be able to insert elements into a container from multiple threads?No. That would introduce unnecessary overhead. If you want synchronized containers, then wrap the standard ones in your own. Maybe some sort of synchronized wrapper will be provided by Phobos at some point, but the containers themselves are definitely not going to be. You can add the multi- threading protections and the cost that they bring on top of an implementation that doesn't have them, but if you have such protections built-in, those using the containers can't remove them and would be stuck with the performance cost. - Jonathan M Davis
Dec 20 2011
On Wednesday, 21 December 2011 at 04:45:48 UTC, Jonathan M Davis wrote:On Tuesday, December 20, 2011 22:15:43 Caligo wrote:[snip]actually, a synchronized container doesn't really provide concurrent write access, it serialized the access of multiple threads by using a lock. What about true concurrent containers such that two threads can access concurrently the same container at different places? E.g Array could be divided into chunks and each chunk is owned by a separate thread, Tree that can divide it's sub-trees among several threads, etc..2. Are the new containers going to be multi-threaded? i.e., will I be able to insert elements into a container from multiple threads?No. That would introduce unnecessary overhead. If you want synchronized containers, then wrap the standard ones in your own. Maybe some sort of synchronized wrapper will be provided by Phobos at some point, but the containers themselves are definitely not going to be. You can add the multi- threading protections and the cost that they bring on top of an implementation that doesn't have them, but if you have such protections built-in, those using the containers can't remove them and would be stuck with the performance cost. - Jonathan M Davis
Dec 20 2011