digitalmars.D - Copy-On-Write (COW) Managed Containers?

Per Nordlöw <per.nordlow gmail.com> writes:

What are your thoughts on the pros and cons with copy-on-write 
(COW) managed allocations in D, typically for containers such as 
`std.container.Array`?

Has it been considered for use in a standard 
containers/collections library in D?

Why not?

The only example I've found in C++ is std::string, which in some 
versions of the STL seems to make use of COW.

Swift, on the other hand, uses it extensively to minimize 
implicit aliasing. At [1] Chris Lattner outlines the design 
choices behind this decision.

[1] https://youtu.be/nWTvXbQHwWs?t=1026

Oct 20 2020

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Tuesday, 20 October 2020 at 12:42:26 UTC, Per Nordlöw wrote:
 What are your thoughts on the pros and cons with copy-on-write 
 (COW) managed allocations in D, typically for containers such 
 as `std.container.Array`?

 Has it been considered for use in a standard 
 containers/collections library in D?

 Why not?

 The only example I've found is C++ is std::string which in some 
 versions of STL seems to make use of COW.

Copy-on-write only makes sense if you intend to make copies 
without modifying them. That is rather unlikely for arrays. When 
you create a copy you usually do it with the intent of modifying 
the copy.

I think this behaviour for std::string came about because C++ 
didn't get std::string_view until recently. I think it is a flaw.

If you use reference counting throughout like Swift/Objective-C, 
then I guess you could do it. But it isn't really suitable for 
low level programming. It is a high level feature.

Oct 20 2020

IGotD- <nise nise.com> writes:

On Tuesday, 20 October 2020 at 14:56:44 UTC, Ola Fosheim Grøstad 
wrote:
 Copy-on-write only makes sense if you intend to make copies 
 without modifying them. That is rather unlikely for arrays. 
 When you create a copy you usually do it with the intent of 
 modifying the copy.

 I think this behaviour for std::string came about because C++ 
 didn't get std::string_view until recently. I think it is a 
 flaw.

 If you use reference counting throughout like 
 Swift/Objective-C, then I guess you could do it. But it isn't 
 really suitable for low level programming. It is a high level 
 feature.

I can see several cases where you want to do operations on 
slices, regardless of whether the elements are characters of a 
string or some other type.

Today std::string uses SSO (short string optimization) in most 
libraries, which means that if the string is shorter than a 
certain size (usually around 16 bytes) it can be stored inside 
the object itself; otherwise it must allocate the array. If the 
string is copied, it actually copies the data and does not reuse 
anything. Previously, which must be several years ago now, 
std::string used COW. The reason it was dropped, they claimed, 
was that it didn't scale in multiprocessor environments, but I've 
not seen the actual reasoning behind it.

The reason we have string_view is that around C++11 
std::string added zero termination by default, which wasn't 
required before. Now string_view is required because of this, and 
you can really discuss whether that was a sane choice.

Also, reference counting might very well be suitable for low 
level programming. It's actually a GC method that is used often. 
The Linux kernel is full of it, in a manual fashion of course.

Aren't both the array in std.container.Array and the regular 
built-in array COW in D?
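
A quick way to check this question is the sketch below. It relies only on built-in slices and on `std.container.array.Array` from Phobos; the variable names are arbitrary. It suggests that neither type copies on write: a copy of the handle refers to the same elements, and an explicit `.dup` is needed to get independent storage.

```d
import std.container.array : Array;

unittest
{
    // Built-in slices: assignment copies only the (pointer, length) pair.
    int[] a = [1, 2, 3];
    int[] b = a;
    b[0] = 42;
    assert(a[0] == 42); // the write is visible through `a`, so no copy was made

    // An explicit .dup gives independent storage.
    int[] c = a.dup;
    c[0] = 7;
    assert(a[0] == 42);

    // std.container.Array: copying an initialized Array shares its payload.
    auto x = Array!int(1, 2, 3);
    auto y = x;
    y[0] = 42;
    assert(x[0] == 42);

    // Array.dup gives an independent copy as well.
    auto z = x.dup;
    z[0] = 7;
    assert(x[0] == 42);
}
```

Appending to a slice with `~=` may reallocate and leave the original untouched, which can look like COW, but that is a side effect of growing the array rather than a copy-on-write policy.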

Oct 20 2020

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Tuesday, 20 October 2020 at 15:28:36 UTC, IGotD- wrote:
 I can see several cases where you want to do operations on 
 slices, regardless if it as string or other type of elements.

Yes, C++ has std::span for that. Not really sure why they also 
wanted string_view.

 anything. Previously which must be several years from now 
 std::string used COW and the reason was it didn't scale with 
 multiprocessor environments they claimed but I've not seen the 
 actual reasoning behind it.

I don't know. You can use COW when designing a high level 
language for multiprocessor execution (HPC), but that is 
something different than what we are speaking of here? And it 
only makes sense if the compiler is able to reason about 
concurrency.

 The reason we have string_view is because around C++11 
 std::string added the zero termination by default which wasn't 
 required before. Now string_view is required because of this 
 and you can really discuss if that was a sane choice.

I doubt people use std::string for much more than paths and 
names... It is a very lacklustre design, but then again, no 
string-representation can fit all use scenarios (in low level 
programming that is).

 Also, reference counting might very well be suitable for low 
 level programming.

Yes, but not as a homogeneous reference strategy.

 Isn't both the array in std.container.Array and the regular 
 built-in array COW in D?

COW would require all mutable operations to test a flag in the 
object before mutation. That is a performance killer.

Maybe you are talking about optimizations? But that would not be 
COW...
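
To make the cost being discussed concrete, here is a minimal sketch of a reference-counted COW array in D. It is not taken from Phobos or any existing library: `CowArray`, `Payload`, `makeUnique` and `sharesWith` are hypothetical names, the storage is GC-allocated for brevity, and the count is not thread-safe. The point is only that every mutating operation has to run the uniqueness check first.

```d
/// Hypothetical COW array; all names here are illustrative only.
struct CowArray(T)
{
    private static struct Payload
    {
        T[] data;
        size_t refs; // number of CowArray handles sharing `data`
    }

    private Payload* p;

    this(T[] initial)
    {
        p = new Payload(initial.dup, 1);
    }

    // Copying a handle only bumps the count; no element is copied yet.
    this(this)
    {
        if (p) ++p.refs;
    }

    // Assumes the handle lives on the stack; the GC frees the payload itself.
    ~this()
    {
        if (p) --p.refs;
    }

    // Reads never copy.
    ref const(T) opIndex(size_t i) const { return p.data[i]; }

    bool sharesWith(ref const CowArray other) const { return p is other.p; }

    // The check that every mutating operation has to pay for.
    private void makeUnique()
    {
        if (p.refs > 1)
        {
            --p.refs;
            p = new Payload(p.data.dup, 1);
        }
    }

    void opIndexAssign(T value, size_t i)
    {
        makeUnique();
        p.data[i] = value;
    }
}

unittest
{
    auto a = CowArray!int([1, 2, 3]);
    auto b = a;               // shares the payload
    assert(a.sharesWith(b));
    b[0] = 42;                // first write triggers the copy
    assert(!a.sharesWith(b));
    assert(a[0] == 1);
    assert(b[0] == 42);
}
```

Whether the check in `makeUnique` is a real performance problem depends on how often elements are written compared to how often handles are copied, which is something to measure rather than assume.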

Oct 20 2020

Max Haughton <maxhaton gmail.com> writes:

On Tuesday, 20 October 2020 at 15:52:53 UTC, Ola Fosheim Grøstad 
wrote:
 On Tuesday, 20 October 2020 at 15:28:36 UTC, IGotD- wrote:
 I can see several cases where you want to do operations on 
 slices, regardless if it as string or other type of elements.

 Yes, C++ has std::span for that. Not really sure why they also 
 wanted string_view.

 anything. Previously which must be several years from now 
 std::string used COW and the reason was it didn't scale with 
 multiprocessor environments they claimed but I've not seen the 
 actual reasoning behind it.

 I don't know. You can use COW when designing a high level 
 language for multiprocessor execution (HPC), but that is 
 something different than what we are speaking of here? And it 
 only makes sense if the compiler is able to reason about 
 concurrency.

 The reason we have string_view is because around C++11 
 std::string added the zero termination by default which wasn't 
 required before. Now string_view is required because of this 
 and you can really discuss if that was a sane choice.

 I doubt people use std::string for much more than paths and 
 names... It is a very lacklustre design, but then again, no 
 string-representation can fit all use scenarios (in low level 
 programming that is).

 Also, reference counting might very well be suitable for low 
 level programming.

 Yes, but not as a homogenous reference strategy.

 Isn't both the array in std.container.Array and the regular 
 built-in array COW in D?

 COW would require all mutable operations to test a flag in the 
 object before mutation. That is a performance killer.

 Maybe you are talking about optimizations? But that would not 
 be COW...

I think Facebook's string library still has flag/s for small 
string, dynamic, and COW.

The container will have flags anyway, and the performance hit 
could be mitigated (I am writing a library to help measure this). 
For example, with some trickery you can turn a branch into a 
conditional move or bit operations. The amortized performance 
benefit may make it worth doing too, so keep that in mind: a 
smaller container with slower flag checking may be faster than a 
larger one with faster checks, due to cache performance.
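
As an illustration of the "bitops" idea, the sketch below selects one of two values from a flag without an explicit branch. The function name `select` is made up for this example, and nothing guarantees which instructions a given compiler emits; an optimizer may already turn a plain `cond ? a : b` into a conditional move, so this needs measuring.

```d
/// Returns `a` when `cond` is true and `b` otherwise,
/// using only arithmetic and bit operations (hypothetical helper).
size_t select(bool cond, size_t a, size_t b) pure nothrow @nogc @safe
{
    // 0 - 1 wraps to all ones, 0 - 0 stays all zeros.
    immutable size_t mask = 0 - cast(size_t) cond;
    return (a & mask) | (b & ~mask);
}

unittest
{
    assert(select(true, 3, 9) == 3);
    assert(select(false, 3, 9) == 9);
}
```

In a container, `a` and `b` could be, for instance, the inline length and the heap length, picked according to a "small" flag.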

D has an advantage here, because the metaprogramming makes 
choosing (say) internal buffer sizes easier, and we can choose 
not to enable COW for shared types if needed.
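
A rough sketch of what that could look like, assuming a made-up `SmallArray` template: the inline capacity is computed from a byte budget at compile time, and the COW bookkeeping field only exists when the element type is not `shared`. The names `SmallArray`, `inlineBytes`, `inlineCapacity`, `useCow` and `refCount` are all hypothetical.

```d
/// Hypothetical policy-driven container layout; names are illustrative only.
struct SmallArray(T, size_t inlineBytes = 64)
{
    // Number of elements that fit in the inline buffer, known at compile time.
    enum size_t inlineCapacity = inlineBytes / T.sizeof;
    static assert(inlineCapacity > 0, "inline buffer too small for one element");

    // Opt out of COW when the element type is shared.
    enum bool useCow = !is(T == shared);

    T[inlineCapacity] buffer;

    static if (useCow)
        size_t* refCount; // only present in the COW variant
}

unittest
{
    static assert(SmallArray!int.inlineCapacity == 16);
    static assert(SmallArray!(long, 32).inlineCapacity == 4);
    static assert(SmallArray!int.useCow);
    static assert(!SmallArray!(shared int).useCow);
    static assert(!__traits(hasMember, SmallArray!(shared int), "refCount"));
}
```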

Phobos could really use some @nogc containers using 
std.experimental.allocator.

Oct 20 2020

ikod <igor.khasilev gmail.com> writes:

On Tuesday, 20 October 2020 at 12:42:26 UTC, Per Nordlöw wrote:
 What are your thoughts on the pros and cons with copy-on-write 
 (COW) managed allocations in D, typically for containers such 
 as `std.container.Array`?

I use COW to create a copy of the hashmap bucket array if the 
user decides to mutate the container during byKey/byPair 
iteration. The only downside I see is temporarily doubled memory 
usage. The benefits are clear - you can provide stable iterators.
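
The idea can be sketched roughly as follows. This is not the actual implementation referred to above: `CowSet`, `Bucket` and the `borrowed` flag are hypothetical, the table uses simple linear probing and is assumed never to fill up, and the flag is never cleared when a range is exhausted, so the first mutation after each `byKey` call pays for one copy.

```d
import std.algorithm : equal, filter, map;

struct Bucket
{
    bool used;
    uint key;
}

/// Hypothetical set whose key ranges stay valid across mutation.
final class CowSet
{
    private Bucket[] buckets;
    private bool borrowed; // true once a range over `buckets` was handed out

    this(size_t slots)
    {
        buckets = new Bucket[slots];
    }

    // The returned range keeps its own slice of the current bucket array.
    auto byKey()
    {
        borrowed = true;
        return buckets.filter!(b => b.used).map!(b => b.key);
    }

    void insert(uint key)
    {
        if (borrowed)
        {
            // Copy on write: outstanding ranges keep the old array,
            // so memory use is doubled until they are dropped.
            buckets = buckets.dup;
            borrowed = false;
        }
        // Linear probing; assumes the table is never full.
        size_t i = key % buckets.length;
        while (buckets[i].used && buckets[i].key != key)
            i = (i + 1) % buckets.length;
        buckets[i] = Bucket(true, key);
    }
}

unittest
{
    auto s = new CowSet(8);
    s.insert(1);
    s.insert(2);
    auto keys = s.byKey();
    s.insert(3);                       // mutation while a range is alive
    assert(keys.equal([1u, 2u]));      // the range still sees the old table
    assert(s.byKey().equal([1u, 2u, 3u]));
}
```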

Oct 20 2020

rikki cattermole <rikki cattermole.co.nz> writes:

On 21/10/2020 4:40 AM, ikod wrote:
 I use COW to create copy for hashmap bucket array if user decided to 
 mutate container during byKey/byPair iteration. The only downside I see 
 is temporary doubled memory usage. The benefits are clear - you can 
 provide stable iterators.

I am very interested in concurrent data structures. They give the 
guarantee that they will still work with mutation during iteration and 
won't lock.

When a concurrent data structure is available as an alternative, 
COW is probably less desirable, given that it requires an 
allocation at the minimum.

Oct 20 2020
