
digitalmars.D - Copy-On-Write (COW) Managed Containers?

Per Nordlöw <per.nordlow gmail.com> writes:
What are your thoughts on the pros and cons with copy-on-write 
(COW) managed allocations in D, typically for containers such as 
`std.container.Array`?

Has it been considered for use in a standard 
containers/collections library in D?

Why not?

The only example I've found in C++ is std::string, which in some 
versions of the STL seems to make use of COW.

Swift, on the other hand, uses it extensively to minimize 
implicit aliasing. At [1] Chris Lattner outlines the design 
choices behind this decision.

[1] https://youtu.be/nWTvXbQHwWs?t=1026
Oct 20 2020
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 20 October 2020 at 12:42:26 UTC, Per Nordlöw wrote:
 What are your thoughts on the pros and cons with copy-on-write 
 (COW) managed allocations in D, typically for containers such 
 as `std.container.Array`?

 Has it been considered for use in a standard 
 containers/collections library in D?

 Why not?

 The only example I've found in C++ is std::string, which in some 
 versions of the STL seems to make use of COW.
Copy-on-write only makes sense if you intend to make copies without modifying them. That is rather unlikely for arrays. When you create a copy you usually do it with the intent of modifying the copy.

I think this behaviour for std::string came about because C++ didn't get std::string_view until recently. I think it is a flaw.

If you use reference counting throughout like Swift/Objective-C, then I guess you could do it. But it isn't really suitable for low level programming. It is a high level feature.
Oct 20 2020
IGotD- <nise nise.com> writes:
On Tuesday, 20 October 2020 at 14:56:44 UTC, Ola Fosheim Grøstad 
wrote:
 Copy-on-write only makes sense if you intend to make copies 
 without modifying them. That is rather unlikely for arrays. 
 When you create a copy you usually do it with the intent of 
 modifying the copy.

 I think this behaviour for std::string came about because C++ 
 didn't get std::string_view until recently. I think it is a 
 flaw.

 If you use reference counting throughout like 
 Swift/Objective-C, then I guess you could do it. But it isn't 
 really suitable for low level programming. It is a high level 
 feature.
I can see several cases where you want to do operations on slices, regardless of whether it is a string or some other type of elements.

Today std::string uses SSO (short string optimization) in most libraries, which means that a string shorter than a certain size (usually around 16 bytes) is stored inside the object itself; otherwise the array must be allocated. When the string is copied, the data is actually copied and nothing is reused. Previously, which must be several years ago by now, std::string used COW. They claimed it was dropped because it didn't scale in multiprocessor environments, but I have not seen the actual reasoning behind it.

The reason we have string_view is that around C++11 std::string added zero termination by default, which wasn't required before. string_view is now required because of this, and you can really discuss whether that was a sane choice.

Also, reference counting might very well be suitable for low level programming. It is actually a GC method that is used often. The Linux kernel is full of it, in a manual fashion of course.

Aren't both the array in std.container.Array and the regular built-in array COW in D?
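
To make the copy and slice behaviour described above concrete, here is a small C++17 sketch. The helper names `copy_is_deep` and `slice_aliases` are made up for illustration. It assumes a standard library that follows the C++11 rules for std::string; an old COW build could report something different for the first check.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: true if copying `s` produced an independent buffer.
// On a non-COW std::string the copy owns its own characters, whether they
// live in the inline (SSO) storage or on the heap.
bool copy_is_deep(const std::string& s) {
    std::string copy = s;
    return copy.data() != s.data();
}

// Hypothetical helper: true if a string_view slice points into the
// original storage instead of owning a copy of the characters.
bool slice_aliases(const std::string& s, std::size_t pos, std::size_t len) {
    std::string_view whole(s);
    std::string_view part = whole.substr(pos, len);
    return part.data() == s.data() + pos;
}
```

Calling `copy_is_deep` with both a two-character string and a 100-character string covers the SSO and the heap case; `slice_aliases` shows that the view never allocates.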
Oct 20 2020
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Tuesday, 20 October 2020 at 15:28:36 UTC, IGotD- wrote:
 I can see several cases where you want to do operations on 
 slices, regardless of whether it is a string or some other type 
 of elements.
Yes, C++ has std::span for that. Not really sure why they also wanted string_view.
 anything. Previously, which must be several years ago by now, 
 std::string used COW. They claimed it was dropped because it 
 didn't scale in multiprocessor environments, but I have not 
 seen the actual reasoning behind it.
I don't know. You can use COW when designing a high level language for multiprocessor execution (HPC), but that is something different than what we are speaking of here? And it only makes sense if the compiler is able to reason about concurrency.
 The reason we have string_view is because around C++11 
 std::string added the zero termination by default which wasn't 
 required before. Now string_view is required because of this 
 and you can really discuss if that was a sane choice.
I doubt people use std::string for much more than paths and names... It is a very lacklustre design, but then again, no string-representation can fit all use scenarios (in low level programming that is).
 Also, reference counting might very well be suitable for low 
 level programming.
Yes, but not as a homogeneous reference strategy.
 Aren't both the array in std.container.Array and the regular 
 built-in array COW in D?
COW would require all mutable operations to test a flag in the object before mutation. That is a performance killer. Maybe you are talking about optimizations? But that would not be COW...
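
For readers who want to see where that test sits, here is a minimal single-threaded sketch in C++. `CowArray` and its members are hypothetical names, not any library's API, and the shared_ptr use count stands in for the "flag". It is not safe under concurrent use.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical copy-on-write array of ints (illustration only).
class CowArray {
    std::shared_ptr<std::vector<int>> buf_ =
        std::make_shared<std::vector<int>>();

    // The check every mutating operation has to pay for:
    // clone the buffer if another handle still refers to it.
    void detach() {
        if (buf_.use_count() > 1)
            buf_ = std::make_shared<std::vector<int>>(*buf_);
    }

public:
    void push_back(int v) { detach(); buf_->push_back(v); }
    void set(std::size_t i, int v) { detach(); (*buf_)[i] = v; }

    // Reads never detach, so they skip the check.
    int get(std::size_t i) const { return (*buf_)[i]; }
    std::size_t size() const { return buf_->size(); }

    // True while both handles still point at the same buffer.
    bool shares_with(const CowArray& other) const {
        return buf_ == other.buf_;
    }
};
```

Copying a `CowArray` only bumps the reference count; the first `set` or `push_back` on either handle triggers the clone, after which the two no longer share storage. Every later write still runs the `use_count()` comparison, which is the cost being argued about here.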
Oct 20 2020
Max Haughton <maxhaton gmail.com> writes:
On Tuesday, 20 October 2020 at 15:52:53 UTC, Ola Fosheim Grøstad 
wrote:
 COW would require all mutable operations to test a flag in the 
 object before mutation. That is a performance killer. Maybe you 
 are talking about optimizations? But that would not be COW...
I think Facebook's string library still has flags for small string, dynamic, and COW.

The container will have flags anyway, and the performance hit could be mitigated (I am writing a library to help measure this). For example, with some trickery you can turn a branch into a conditional move or bit ops. The amortized performance benefit may make it worth doing too, so keep that in mind (i.e. a smaller container with slower flag checking may be faster than the opposite due to cache performance).

D has an advantage here, because the metaprogramming makes choosing (say) internal buffer sizes easier, and we can choose not to enable COW for shared types if needed.

Phobos could really use some nogc containers using std.exp.allocator.
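
As a rough illustration of the "conditional move or bit ops" idea, here is a C++ sketch of picking between an inline buffer and a heap pointer without an explicit branch. The `SmallBuf` layout and member names are assumptions made for the example, not the layout of any real string library. Whether the mask version beats the plain ternary depends on the compiler and target, so it would have to be measured.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical small-buffer container: a flag says which storage is live.
struct SmallBuf {
    char  inline_buf[16] = {};
    char* heap = nullptr;
    bool  on_heap = false;

    // Straightforward version; a compiler may emit a branch or a
    // conditional move for this.
    const char* data_select() const {
        return on_heap ? heap : inline_buf;
    }

    // Bit-op version: mask is all ones when on_heap is true, zero
    // otherwise, so exactly one of the two addresses survives the OR.
    const char* data_mask() const {
        const std::uintptr_t mask =
            std::uintptr_t{0} - static_cast<std::uintptr_t>(on_heap);
        const auto h = reinterpret_cast<std::uintptr_t>(heap);
        const auto i = reinterpret_cast<std::uintptr_t>(inline_buf);
        return reinterpret_cast<const char*>((h & mask) | (i & ~mask));
    }
};
```

Both accessors return the same pointer for either flag value; only the generated code differs.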
Oct 20 2020
ikod <igor.khasilev gmail.com> writes:
On Tuesday, 20 October 2020 at 12:42:26 UTC, Per Nordlöw wrote:
 What are your thoughts on the pros and cons with copy-on-write 
 (COW) managed allocations in D, typically for containers such 
 as `std.container.Array`?
I use COW to create a copy of the hashmap bucket array if the user decides to mutate the container during byKey/byPair iteration. The only downside I see is temporarily doubled memory usage. The benefits are clear: you can provide stable iterators.
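
A minimal sketch of that pattern, written in C++ and not taken from the actual hashmap implementation: the iteration holds a reference to the current bucket array, and a mutation clones the array only while such a reference is alive. `Table`, `snapshot` and `iterate_and_mutate` are hypothetical names, and a flat vector of ints stands in for the real buckets. Single-threaded only.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Hypothetical container whose storage is cloned on mutation only if
// an iteration snapshot still refers to it.
class Table {
    using Buckets = std::vector<int>;
    std::shared_ptr<Buckets> buckets_ = std::make_shared<Buckets>();

public:
    // An iteration keeps the array it started on alive and unchanged.
    std::shared_ptr<const Buckets> snapshot() const { return buckets_; }

    void insert(int v) {
        if (buckets_.use_count() > 1)   // someone is still iterating
            buckets_ = std::make_shared<Buckets>(*buckets_);
        buckets_->push_back(v);
    }

    std::size_t size() const { return buckets_->size(); }
};

// Inserts into the table while walking a snapshot of it; returns the
// number of elements the walk visited.
std::size_t iterate_and_mutate(Table& t) {
    auto snap = t.snapshot();
    std::size_t visited = 0;
    for (int v : *snap) {
        t.insert(v * 10);   // would invalidate a plain vector iterator
        ++visited;
    }
    return visited;
}
```

Starting from three elements, the walk visits exactly three while the table grows to six. The clone happens once, on the first insert, which is the temporary doubling of memory mentioned above.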
Oct 20 2020
rikki cattermole <rikki cattermole.co.nz> writes:
On 21/10/2020 4:40 AM, ikod wrote:
 I use COW to create a copy of the hashmap bucket array if the user 
 decides to mutate the container during byKey/byPair iteration. The 
 only downside I see is temporarily doubled memory usage. The benefits 
 are clear: you can provide stable iterators.
I am very interested in concurrent data structures. They give the guarantee that they will still work with mutation during iteration and won't lock. When a concurrent data structure is available as an alternative, COW is probably less desirable, given that it requires at least an allocation.
Oct 20 2020