digitalmars.D - std.container update
- Andrei Alexandrescu (13/13) May 26 2010 I just implemented a singly-linked list type to illustrate the container...
- Steven Schveighoffer (11/23) May 27 2010 Well, one thing I don't like about it is that you cannot do O(1) inserti...
- Steven Schveighoffer (7/33) May 27 2010 In fact, I'd say that it would be critical to split the insert and remov...
- Steven Schveighoffer (7/42) May 27 2010 Actualy mergesort could not be implemented without some sort of way to
- Andrei Alexandrescu (46/95) May 27 2010 On 05/27/2010 08:23 AM, Steven Schveighoffer wrote:
- Steven Schveighoffer (26/38) May 27 2010 I have another general question in general about these collections.
- Pillsy (13/18) May 27 2010 [...]
- Andrei Alexandrescu (10/26) May 27 2010 There are indeed tradeoffs associated with allocation of nodes that way....
- Andrei Alexandrescu (13/13) May 27 2010 Another update:
- Jonathan M Davis (11/11) May 27 2010 I take it that Array is basically supposed to be std.container's version...
- Simen kjaeraas (10/12) May 27 2010 standard name for that sort of container and wouldn't be confused with
- Jonathan M Davis (12/18) May 28 2010 Vector is arguably a poor name because of that. However, it is, at this
- Andrei Alexandrescu (3/14) May 27 2010 Good point. Other opinions?
- Leandro Lucarella (16/31) May 27 2010 I always thought vector was a bad name choice in C++, I had that word
- Andrei Alexandrescu (16/40) May 27 2010 This is a good question. The relationship between Array and T[] is simpl...
- Jonathan M Davis (16/20) May 28 2010 The biggest difference is that a vector has capacity separate from its
- bearophile (4/10) May 28 2010 I can think about the opposite too: to remove the built-in dynamic array...
- Leandro Lucarella (10/21) May 28 2010 OK, I forgot that, I guess my brain choose to forget it to stop keeping
- Steven Schveighoffer (7/19) May 28 2010 This isn't exactly true. D's builtin arrays also are amortized constant...
I just implemented a singly-linked list type to illustrate the container abstraction. http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d One interesting aspect is that SList must cooperate with Take. More details are to be found in code, but essentially SList ranges are all sentinel-terminated, which makes them "right-bound". If you want to only remove a few elements from the middle of a list, you need to construct a range that spans a limited portion of the list. I encoded that by using the Take abstraction in std.range. Please let me know of how you find it. There are a few refinements and ancillary functions to define, but those are pretty clear given the rest. Andrei
May 26 2010
On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I just implemented a singly-linked list type to illustrate the container abstraction. http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d One interesting aspect is that SList must cooperate with Take. More details are to be found in code, but essentially SList ranges are all sentinel-terminated, which makes them "right-bound". If you want to only remove a few elements from the middle of a list, you need to construct a range that spans a limited portion of the list. I encoded that by using the Take abstraction in std.range. Please let me know of how you find it. There are a few refinements and ancillary functions to define, but those are pretty clear given the rest.Well, one thing I don't like about it is that you cannot do O(1) insertion or removal. insertBefore requires O(n) complexity and insertAfter requires O(m) complexity. Removal will require O(n) complexity, even though it doesn't exist right now. Fast removal and insertion are key to having a linked list. I'd suggest modifying the range so it uses a Node ** instead, which points to the _next element of the previous node of the one currently being iterated, or _root if it's the first node in the list. -Steve
May 27 2010
On Thu, 27 May 2010 09:23:03 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:In fact, I'd say that it would be critical to split the insert and remove functions to slow (O(n) or greater) versions and fast( O(lg(n) or less)) functions. Some algorithms may depend on fast insertion and removal, such as mergesort on a linked list. -SteveI just implemented a singly-linked list type to illustrate the container abstraction. http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d One interesting aspect is that SList must cooperate with Take. More details are to be found in code, but essentially SList ranges are all sentinel-terminated, which makes them "right-bound". If you want to only remove a few elements from the middle of a list, you need to construct a range that spans a limited portion of the list. I encoded that by using the Take abstraction in std.range. Please let me know of how you find it. There are a few refinements and ancillary functions to define, but those are pretty clear given the rest.Well, one thing I don't like about it is that you cannot do O(1) insertion or removal. insertBefore requires O(n) complexity and insertAfter requires O(m) complexity. Removal will require O(n) complexity, even though it doesn't exist right now. Fast removal and insertion are key to having a linked list. I'd suggest modifying the range so it uses a Node ** instead, which points to the _next element of the previous node of the one currently being iterated, or _root if it's the first node in the list.
May 27 2010
On Thu, 27 May 2010 09:27:39 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:On Thu, 27 May 2010 09:23:03 -0400, Steven Schveighoffer <schveiguy yahoo.com> wrote:Actualy mergesort could not be implemented without some sort of way to restructure the list without allocating new nodes, requiring knowledge of the link structure. I'm not sure it could be generic... I'll think about this a bit. -SteveOn Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:In fact, I'd say that it would be critical to split the insert and remove functions to slow (O(n) or greater) versions and fast( O(lg(n) or less)) functions. Some algorithms may depend on fast insertion and removal, such as mergesort on a linked list.I just implemented a singly-linked list type to illustrate the container abstraction. http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d One interesting aspect is that SList must cooperate with Take. More details are to be found in code, but essentially SList ranges are all sentinel-terminated, which makes them "right-bound". If you want to only remove a few elements from the middle of a list, you need to construct a range that spans a limited portion of the list. I encoded that by using the Take abstraction in std.range. Please let me know of how you find it. There are a few refinements and ancillary functions to define, but those are pretty clear given the rest.Well, one thing I don't like about it is that you cannot do O(1) insertion or removal. insertBefore requires O(n) complexity and insertAfter requires O(m) complexity. Removal will require O(n) complexity, even though it doesn't exist right now. Fast removal and insertion are key to having a linked list. I'd suggest modifying the range so it uses a Node ** instead, which points to the _next element of the previous node of the one currently being iterated, or _root if it's the first node in the list.
May 27 2010
On 05/27/2010 08:23 AM, Steven Schveighoffer wrote: You're making a number of great points in the four posts starting with this. Since they sort of augment one another, let me quote and answer them all here.On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:The performance profile of SList is what one would expect for a straightforward singly-linked list implemented with nodes, pointer to root, and the such. You are making a good point that adding one 1-2 extra indirections would lead to improvements of certain primitives. There have been various experiments with STL-like slist implementations that dwell on such ideas. See e.g. http://manpages.ubuntu.com/manpages/lucid/man3/std::forward_list.3cxx.html with functions like before_begin() in addition to begin(), and also insert_after() instead of insert(). It has become clear to STL implementors that adding hidden indirections puts forward_list at a disadvantage compared to the straightforward lists against which they compete. I want to avoid that by abiding to the straight storage strategy and deal with its consequences. Second post:I just implemented a singly-linked list type to illustrate the container abstraction. http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d Please let me know of how you find it. There are a few refinements and ancillary functions to define, but those are pretty clear given the rest.Well, one thing I don't like about it is that you cannot do O(1) insertion or removal. insertBefore requires O(n) complexity and insertAfter requires O(m) complexity. Removal will require O(n) complexity, even though it doesn't exist right now. Fast removal and insertion are key to having a linked list. I'd suggest modifying the range so it uses a Node ** instead, which points to the _next element of the previous node of the one currently being iterated, or _root if it's the first node in the list. -SteveIn fact, I'd say that it would be critical to split the insert and remove functions to slow (O(n) or greater) versions and fast( O(lg(n) or less)) functions. Some algorithms may depend on fast insertion and removal, such as mergesort on a linked list.This is a good idea, which generalizes linearRemove(). I'd suspected if there's one place to distinguish linear from better, then there ought to be more. From here we have a fork in the road: a) Tighten complexity requirements for e.g. insertBefore(Range, Stuff) and let containers that can't implement them just not have them. b) Define linearInsertBefore(Range, Stuff) etc. I'm leaning towards (a) because a list can insert virtually anywhere by only using insertAfter(Take!Range, Stuff) (which I haven't defined yet but I will, and which also takes care of the complexity issue) and insertFront(Stuff). Third post:Actualy mergesort could not be implemented without some sort of way to restructure the list without allocating new nodes, requiring knowledge of the link structure. I'm not sure it could be generic... I'll think about this a bit.You are entirely right. A mergesort that relies on link shuffling needs to have intimate knowledge of the node structure. It would most likely be a member of the list. Fourth post:I have another general question in general about these collections. Ranges fix a very bad problem with iterators -- non-matching begin and end iterators. For example, passing a begin from one list and and end from another.Right.However, all your range functions such as remove, insertBefore, etc. are called via the container type, using a range as an argument. But there can be no static verification that a range actually is part of a given container. Should there be some requirement that a container validates that a range is part of the container before performing the operation? This is the case for dcollections. In your current implementation of SList, verification seems incidental for insertBefore because you must find the previous node from the root anyways (and that will throw on enforce if it cannot find it). But insertAfter does not, so something like this will compile, run, and result in strange behavior: auto sl1 = make(SList!int, 1, 2, 3, 4, 5); // I may not have this correct, but it's not important auto sl2 = make(SList!int, 6, 7, 8, 9, 10); auto r1 = sl1[]; r1.popFront(); r1.popFront(); auto r2 = take(r1, 2); sl2.insertAfter(r2, 666); // actually inserts into sl1It's a good question. My current view of the matter is to maximize library flexibility and versatility. Following the adage that you can build safe on fast but not the other way around, I'd go with the following philosophy: checking of range membership to their collections should be made only to the extent of ensuring memory safety. Beyond that, if efficiency is in tension with verifiability (as is the case with your example above), leave it to the programmer or the supra-structure to correctly pair collections with their ranges. Sounds good? Andrei
May 27 2010
On Thu, 27 May 2010 00:36:24 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:I just implemented a singly-linked list type to illustrate the container abstraction. http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d One interesting aspect is that SList must cooperate with Take. More details are to be found in code, but essentially SList ranges are all sentinel-terminated, which makes them "right-bound". If you want to only remove a few elements from the middle of a list, you need to construct a range that spans a limited portion of the list. I encoded that by using the Take abstraction in std.range. Please let me know of how you find it. There are a few refinements and ancillary functions to define, but those are pretty clear given the rest.I have another general question in general about these collections. Ranges fix a very bad problem with iterators -- non-matching begin and end iterators. For example, passing a begin from one list and and end from another. However, all your range functions such as remove, insertBefore, etc. are called via the container type, using a range as an argument. But there can be no static verification that a range actually is part of a given container. Should there be some requirement that a container validates that a range is part of the container before performing the operation? This is the case for dcollections. In your current implementation of SList, verification seems incidental for insertBefore because you must find the previous node from the root anyways (and that will throw on enforce if it cannot find it). But insertAfter does not, so something like this will compile, run, and result in strange behavior: auto sl1 = make(SList!int, 1, 2, 3, 4, 5); // I may not have this correct, but it's not important auto sl2 = make(SList!int, 6, 7, 8, 9, 10); auto r1 = sl1[]; r1.popFront(); r1.popFront(); auto r2 = take(r1, 2); sl2.insertAfter(r2, 666); // actually inserts into sl1 -Steve
May 27 2010
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article:I just implemented a singly-linked list type to illustrate the container abstraction. http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d[...]Please let me know of how you find it.One thing worries me a bit, but may be based on a misunderstanding of how D's GC works. In dup, you allocate all the nodes at once as an array and then go through them one by one setting their payload_ and next_ values to make the list. My worry is that the array is allocated as a single block and (presumably) GC'd as a single block. If I copy a 1000 element list and then remove 999 elements from the front using removeFront(), won't I end up leaking memory for the first 999 nodes? Cheers, Pillsy
May 27 2010
On 05/27/2010 11:01 AM, Pillsy wrote:== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article:There are indeed tradeoffs associated with allocation of nodes that way. Technically that's not a leak, but indeed the entire block will stay put for as long as at least one node in it it's used. One improvement would be to put removed nodes in a static freelist to be used by all SLists of the same type in the current thread. In that case, remove becomes unstable. I think I hastened a bit with that optimization, it's debatable and detracts from the main point of the discussion. AndreiI just implemented a singly-linked list type to illustrate the container abstraction. http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d[...]Please let me know of how you find it.One thing worries me a bit, but may be based on a misunderstanding of how D's GC works. In dup, you allocate all the nodes at once as an array and then go through them one by one setting their payload_ and next_ values to make the list. My worry is that the array is allocated as a single block and (presumably) GC'd as a single block. If I copy a 1000 element list and then remove 999 elements from the front using removeFront(), won't I end up leaking memory for the first 999 nodes?
May 27 2010
Another update: http://erdani.com/d/phobos/std_container.html http://erdani.com/d/phobos/container.d I simplified the implementation (no more array allocation etc.), eliminated replace() and insertBefore() after convincing myself they are not a good fit for lists, and added linearRemove(Take!Range). Take!Range seems to be the secret recipe for an enjoyable SList experience. I'll follow up later with an Array sample implementation. Then on the radar are an associative container (probably a straight hash or a binary tree) and then TightSList and TightArray which use deterministic storage management. I figured out a lot of stuff, and it seems to settle together quite nicely. Andrei
May 27 2010
I take it that Array is basically supposed to be std.container's version of C++'s vector or Java's ArrayList? If so, I would suggest that Array is not the best of names in that it would become very easy to confuse it with built-in arrays when discussing them (particularly in verbal communication where you can't see the capital A). Personally, I would have just gone with Vector, since it's a fairly standard name for that sort of container and wouldn't be confused with anything else. If you really want Array, that's fine - it should be clear enough when it's in code - but I would think that talking about it could easily be confusing enough that it wouldn't be the best of names. - Jonathan M Davis
May 27 2010
Jonathan M Davis <jmdavisProg gmail.com> wrote:wouldn't be confused with anything else. If you really want Array, that's Personally, I would have just gone with Vector, since it's a fairlystandard name for that sort of container and wouldn't be confused with anything else. I take it you don't work in simulation or games, then? Don't do much linear algebra? While I understand the reasoning for it, I dislike the name vector for arrays. A vector to me, is a geometric object with a length and magnitude, not a random collection of whatevers. -- Simen
May 27 2010
Simen kjaeraas wrote:I take it you don't work in simulation or games, then? Don't do much linear algebra? While I understand the reasoning for it, I dislike the name vector for arrays. A vector to me, is a geometric object with a length and magnitude, not a random collection of whatevers.Vector is arguably a poor name because of that. However, it is, at this point, a very standard name for such a container. At minimum, both C++ and Java have a vector class, though the main one that you'd use in Java at this point would be ArrayList rather than Vector. So, if a better name can be found, that's great, but Vector is quite a standard name for the concept at this point, and I do think that it works better than Array. I also don't like the name ArrayList because I wouldn't really consider a vector to be a list, but that's another option. I'm not all that familiar with what other languages have used for such a class, but there are probably other pre- existing options out there as well. - Jonathan M Davis
May 28 2010
On 05/27/2010 06:28 PM, Jonathan M Davis wrote:I take it that Array is basically supposed to be std.container's version of C++'s vector or Java's ArrayList? If so, I would suggest that Array is not the best of names in that it would become very easy to confuse it with built-in arrays when discussing them (particularly in verbal communication where you can't see the capital A). Personally, I would have just gone with Vector, since it's a fairly standard name for that sort of container and wouldn't be confused with anything else. If you really want Array, that's fine - it should be clear enough when it's in code - but I would think that talking about it could easily be confusing enough that it wouldn't be the best of names. - Jonathan M DavisGood point. Other opinions? Andrei
May 27 2010
Andrei Alexandrescu, el 27 de mayo a las 20:06 me escribiste:On 05/27/2010 06:28 PM, Jonathan M Davis wrote:I always thought vector was a bad name choice in C++, I had that word associated with what C++ calls a valarray (a physics vector). I agree that Array is too generic for D, though, and unfortunately I don't have a better suggestion, but you asked for other opinions ;) BTW, what would be the point of an array/vector when you have built-in arrays? If built-in arrays would be syntax sugar for a real library type, like AAs, I can see as a good option using Array for that type, since built-in arrays and the library Array would be the same thing. -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- DETIENEN A PADRE, MADRE, TIOS Y ABUELOS: TODOS DEPRAVADOS -- Crónica TVI take it that Array is basically supposed to be std.container's version of C++'s vector or Java's ArrayList? If so, I would suggest that Array is not the best of names in that it would become very easy to confuse it with built-in arrays when discussing them (particularly in verbal communication where you can't see the capital A). Personally, I would have just gone with Vector, since it's a fairly standard name for that sort of container and wouldn't be confused with anything else. If you really want Array, that's fine - it should be clear enough when it's in code - but I would think that talking about it could easily be confusing enough that it wouldn't be the best of names. - Jonathan M DavisGood point. Other opinions?
May 27 2010
On 05/27/2010 08:42 PM, Leandro Lucarella wrote:Andrei Alexandrescu, el 27 de mayo a las 20:06 me escribiste:This is a good question. The relationship between Array and T[] is simple: * Array is a compliant container so it is a reference type * T[] is Array's range With built-in associative arrays the question becomes even more interesting because the associative array _is_ already a reference type. So Walter and I agreed to make built-in associative arrays just compliant std.container objects, with three differences: * AssocArray lives in object.di for various reasons * The compiler translates the type name V[K] as AssocArray!(K, V) * The compiler translates associative array literals into associative array objects Other than that... an associative array will be just one of the growing offering of collections in std.container. And I think that's the way it should be. AndreiOn 05/27/2010 06:28 PM, Jonathan M Davis wrote:I always thought vector was a bad name choice in C++, I had that word associated with what C++ calls a valarray (a physics vector). I agree that Array is too generic for D, though, and unfortunately I don't have a better suggestion, but you asked for other opinions ;) BTW, what would be the point of an array/vector when you have built-in arrays? If built-in arrays would be syntax sugar for a real library type, like AAs, I can see as a good option using Array for that type, since built-in arrays and the library Array would be the same thing.I take it that Array is basically supposed to be std.container's version of C++'s vector or Java's ArrayList? If so, I would suggest that Array is not the best of names in that it would become very easy to confuse it with built-in arrays when discussing them (particularly in verbal communication where you can't see the capital A). Personally, I would have just gone with Vector, since it's a fairly standard name for that sort of container and wouldn't be confused with anything else. If you really want Array, that's fine - it should be clear enough when it's in code - but I would think that talking about it could easily be confusing enough that it wouldn't be the best of names. - Jonathan M DavisGood point. Other opinions?
May 27 2010
BTW, what would be the point of an array/vector when you have built-in arrays? If built-in arrays would be syntax sugar for a real library type, like AAs, I can see as a good option using Array for that type, since built-in arrays and the library Array would be the same thing.The biggest difference is that a vector has capacity separate from its size/length. You can efficiency insert elements at the end with a vector - amortized constant time usually - but you can't do that with a built-in array because it would have to reallocate every time. D's arrays are fantastic, but they're still not quite good enough to outright replace a vector type. On top of that, of course, there's the issue of the API used to interact with it, which is what we're getting with std.container, but that could probably be gotten around in much the same way as the range stuff was with std.array. Really, the big difference is that you can efficiently append to a vector type while you cannot do that with an array. The array is still a tad too low level. And since you do need that low levelness at least some of the time (particularly in a systems language), it's not like you could replace arrays with vectors or make them act more like vectors. Both types have their uses. - Jonathan M Davis
May 28 2010
Jonathan M Davis:The biggest difference is that a vector has capacity separate from its size/length. You can efficiency insert elements at the end with a vector - amortized constant time usually - but you can't do that with a built-in array because it would have to reallocate every time. D's arrays are fantastic, but they're still not quite good enough to outright replace a vector type.I can think about the opposite too: to remove the built-in dynamic arrays and replace them as it's being done with associative arrays, keeping only the syntax sugar and mapping them to the Array/Vector. Is this a bad idea? (this is only about dynamic arrays, at the moment built-in fixed-sized arrays have to be kept). Bye, bearophile
May 28 2010
Jonathan M Davis, el 28 de mayo a las 04:58 me escribiste:OK, I forgot that, I guess my brain choose to forget it to stop keeping the flame alive, so I'll forget all about it again =P -- Leandro Lucarella (AKA luca) http://llucax.com.ar/ ---------------------------------------------------------------------- GPG Key: 5F5A8D05 (F8CD F9A7 BF00 5431 4145 104C 949E BFB6 5F5A 8D05) ---------------------------------------------------------------------- Estamos cantando a la sombra de nuestra parra una canción que dice que uno sólo conserva lo que no amarraBTW, what would be the point of an array/vector when you have built-in arrays? If built-in arrays would be syntax sugar for a real library type, like AAs, I can see as a good option using Array for that type, since built-in arrays and the library Array would be the same thing.The biggest difference is that a vector has capacity separate from its size/length. You can efficiency insert elements at the end with a vector - amortized constant time usually - but you can't do that with a built-in array because it would have to reallocate every time. D's arrays are fantastic, but they're still not quite good enough to outright replace a vector type.
May 28 2010
On Fri, 28 May 2010 07:58:55 -0400, Jonathan M Davis <jmdavisProg gmail.com> wrote:This isn't exactly true. D's builtin arrays also are amortized constant time to append, but the constant is way higher :) I agree that an array-like container would provide features that D's builtin arrays do not, including much better append performance. -SteveBTW, what would be the point of an array/vector when you have built-in arrays? If built-in arrays would be syntax sugar for a real library type, like AAs, I can see as a good option using Array for that type, since built-in arrays and the library Array would be the same thing.The biggest difference is that a vector has capacity separate from its size/length. You can efficiency insert elements at the end with a vector - amortized constant time usually - but you can't do that with a built-in array because it would have to reallocate every time. D's arrays are fantastic, but they're still not quite good enough to outright replace a vector type.
May 28 2010