
digitalmars.D.learn - copy and array length vs capacity. (Doc suggestion?)

Jon D <jond noreply.com> writes:
Something I found confusing was the relationship between array 
capacity and copy(). A short example:

void main()
{
     import std.algorithm: copy;

     auto a = new int[](3);
     assert(a.length == 3);
     [1, 2, 3].copy(a);     // Okay

     int[] b;
     b.reserve(3);
     assert(b.capacity >= 3);
     assert(b.length == 0);
     [1, 2, 3].copy(b);     // Error at run time: b.length is 0
}

I had expected that copy() would work if the target had 
sufficient capacity, but that's not the case. The target has to have 
sufficient length.
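Here is a minimal sketch of this "filled, not appended" behaviour, assuming a current DMD/Phobos. For an array target, copy() overwrites existing elements from the front and returns the part of the target it did not fill, so the target's length is what has to be large enough. The helper name `fillFront` is hypothetical.

```d
import std.algorithm : copy;
import std.stdio : writeln;

// Hypothetical helper: copies `src` into the front of `target` and returns
// the part of `target` that copy() did not fill.
int[] fillFront(int[] src, int[] target)
{
    return src.copy(target);
}

void main()
{
    auto target = new int[](5);               // length 5, every element is 0
    auto rest = fillFront([1, 2, 3], target);
    writeln(target);                          // [1, 2, 3, 0, 0]
    writeln(rest.length);                     // 2
}
```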

If I've understood this correctly, a small change to the 
documentation for copy() might make this clearer. In particular, 
the "precondition" section:

     Preconditions:
     target shall have enough room to accommodate the entirety of source.

Clarifying that "enough room" means 'length' rather than 
'capacity' might be beneficial.
Nov 21 2015
Ali Çehreli <acehreli yahoo.com> writes:
Hi Jon! :)

On 11/21/2015 03:34 PM, Jon D wrote:

      Preconditions:
      target shall have enough room to accommodate the entirety of source.

 Clarifying that "enough room" means 'length' rather than 'capacity'
 might be beneficial.
May I suggest that you improve that page. ;) If you don't already have a clone of the repo, you can do it easily by clicking the "Improve this page" button on that page.

Regarding why copy() cannot use the capacity of the slice: slices don't know about each other, so copy() could not let other slices know that the capacity has just been used by this particular slice.

However, copy() could first append an element, in which case the capacity would be owned by this slice. copy() could then safely use the capacity, knowing very well that the act of appending that one element has dropped the capacities of all other slices to zero. In pseudo code:

    if there is enough capacity and if copying will spill into capacity then
        append an element
        copy by spilling into capacity
        set .length appropriately

Others, please review, implement, prove that it is efficient, and post a pull request. :)

Ali
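The point that appending "takes ownership" of spare capacity can be observed directly. This is a minimal sketch, assuming the GC-backed array runtime described in the dlang.org array article: a slice reports nonzero capacity only while it ends exactly where the used part of its memory block ends. An in-place append through one slice therefore drops the capacity of every other slice that ended at the same place.

```d
import std.stdio : writeln;

void main()
{
    int[] a;
    a.reserve(10);      // a now refers to a GC block with room for >= 10 ints
    a ~= [1, 2, 3];     // fits, so it is appended in place
    int[] b = a;        // b views the same three elements

    // Both slices end where the block's used part ends, so both see capacity.
    writeln(a.capacity == b.capacity && b.capacity >= 10); // true

    a ~= 4;             // still fits: appended in place, taking the spare room
    writeln(a.ptr == b.ptr);  // true: no reallocation happened
    writeln(b.capacity);      // 0: b no longer ends at the used part
    writeln(b);               // [1, 2, 3]
}
```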
Nov 21 2015
Jon D <jond noreply.com> writes:
On Sunday, 22 November 2015 at 00:10:07 UTC, Ali Çehreli wrote:
 May I suggest that you improve that page. ;) If you don't 
 already have a clone of the repo, you can do it easily by 
 clicking the "Improve this page" button on that page.
Hi Ali, thanks for the quick response. And point taken :) I hadn't noticed those buttons on the doc pages; they look very convenient. There are a couple of formalities I need to look into before making contributions, even small ones, but I'll check into these.
 Regarding why copy() cannot use the capacity of the slice, it 
 is because slices don't know about each other, so, copy could 
 not let other slices know that the capacity has just been used 
 by this particular slice.
Thanks for the explanation; it's very helpful for understanding what's going on.

--Jon
Nov 21 2015
Jonathan M Davis via Digitalmars-d-learn writes:
On Saturday, November 21, 2015 23:34:25 Jon D via Digitalmars-d-learn wrote:
 If I've understood this correctly, a small change to the
 documentation for copy() might make this clearer. In particular,
 the "precondition" section:

      Preconditions:
      target shall have enough room to accommodate the entirety of source.

 Clarifying that "enough room" means 'length' rather than
 'capacity' might be beneficial.
Honestly, arrays suck as output ranges. They don't get appended to; they get filled, and for better or worse, the documentation for copy is probably assuming that you know that. If you want your array to be appended to when using it as an output range, then you need to use std.array.Appender.

- Jonathan M Davis
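For the appending behaviour described here, this is a minimal sketch that uses `std.array.appender` as the target of copy(). The helper `collect` is a hypothetical name. Once initialized (here by `appender` and `reserve`), an Appender shares its internal buffer between copies. So the elements copy() puts into its by-value parameter are visible through `app.data` afterwards.

```d
import std.algorithm : copy;
import std.array : appender;
import std.stdio : writeln;

// Hypothetical helper: collects src into a new array by appending.
int[] collect(int[] src)
{
    auto app = appender!(int[])();
    app.reserve(src.length);   // optional preallocation, like reserve on a slice
    src.copy(app);             // each element is put(), i.e. appended
    return app.data;
}

void main()
{
    writeln(collect([1, 2, 3]));   // [1, 2, 3]
}
```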
Nov 21 2015
Jon D <jond noreply.com> writes:
On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis wrote:
 Honestly, arrays suck as output ranges. They don't get appended 
 to; they get filled, and for better or worse, the documentation 
 for copy is probably assuming that you know that. If you want 
 your array to be appended to when using it as an output range, 
 then you need to use std.array.Appender.
Hi Jonathan, thanks for the reply and the info about std.array.Appender. I was actually using copy to fill an array, not append. However, I also wanted to preallocate the space. And, since I'm mainly trying to understand the language, I was also trying to figure out the difference between these two forms of creating a dynamic array with an initial size:

    auto x = new int[](n);
    int[] y; y.reserve(n);

The obvious difference is that the first initializes n values and the second does not. I'm still unclear whether there are other material differences, or when one might be preferred over the other :) It was in this context that the behavior of copy surprised me: it wouldn't operate on the second form without first filling in the elements. If this seems unclear, I can provide a slightly longer sample showing what I was doing.

--Jon
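The difference between the two forms can be checked directly. This is a minimal sketch, assuming `n = 4` for illustration. The first form has n elements immediately, while the second only has room for n elements and no elements yet.

```d
import std.stdio : writeln;

void main()
{
    enum n = 4;

    auto x = new int[](n);   // n elements, each set to int.init (0)
    int[] y;
    y.reserve(n);            // room for >= n elements, but length is still 0

    writeln(x.length, " ", x);                 // 4 [0, 0, 0, 0]
    writeln(y.length, " ", y.capacity >= n);   // 0 true

    // x can be filled with copy() or indexing right away; y can only grow by
    // appending (y ~= value) until its length is raised.
}
```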
Nov 21 2015
Jonathan M Davis via Digitalmars-d-learn writes:
On Sunday, November 22, 2015 03:19:54 Jon D via Digitalmars-d-learn wrote:
 I was also trying to figure out the difference between these two forms
 of creating a dynamic array with an initial size:

      auto x = new int[](n);
      int[] y; y.reserve(n);

 The obvious difference is that the first initializes n values and the
 second does not. I'm still unclear whether there are other material
 differences, or when one might be preferred over the other :)
If you haven't read this article yet, then you should read it:

http://dlang.org/d-array-article.html

It doesn't use the official terminology. In particular, it talks about T[] as being a slice and the underlying GC buffer as being the dynamic array, whereas per the language spec T[] is the dynamic array (which is also a slice of some sort of memory), and the GC buffer that typically backs a dynamic array is just a GC buffer and essentially an implementation detail. But it should give you good insight into how arrays work in D.

- Jonathan M Davis
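To illustrate the terminology point, here is a minimal sketch in which two `int[]` values are both windows onto one GC-allocated buffer. Writing through one window is visible through the other. The variable names are illustrative only.

```d
import std.stdio : writeln;

void main()
{
    int[] whole = new int[](6);   // a T[] viewing a GC-allocated buffer
    int[] part = whole[2 .. 4];   // another T[] viewing part of the same buffer

    part[0] = 42;                 // writes through to the shared buffer
    writeln(whole);               // [0, 0, 42, 0, 0, 0]
    writeln(part.ptr == whole.ptr + 2);   // true: same memory, different window
}
```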
Nov 21 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/21/15 10:19 PM, Jon D wrote:
 Hi Jonathan, thanks for the reply and the info about std.array.Appender.
 I was actually using copy to fill an array, not append. However, I also
 wanted to preallocate the space. And, since I'm mainly trying to
 understand the language, I was also trying to figure out the difference
 between these two forms of creating a dynamic array with an initial size:

      auto x = new int[](n);
      int[] y; y.reserve(n);
If you want to change the size of the array, use length:

    y.length = n;

This will extend y to the correct length, automatically reserving a block of data that can hold it, and allow you to write to the array. All reserve does is to make sure there is enough space so you can append that much data to it. It is not relevant to your use case.
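Applied to the example from the start of the thread, this is a minimal sketch. Setting `b.length` before calling copy() gives copy() elements to overwrite, so the call that failed before now succeeds.

```d
import std.algorithm : copy;
import std.stdio : writeln;

void main()
{
    int[] b;
    b.length = 3;         // allocates and sets three elements to int.init (0)
    [1, 2, 3].copy(b);    // now there are three elements for copy() to overwrite
    writeln(b);           // [1, 2, 3]
}
```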
 The obvious difference is that the first initializes n values, the second
 form does not. I'm still unclear if there are other material
 differences, or when one might be preferred over the other :) It was
 in this context the behavior of copy surprised me, that it wouldn't
 operate on the second form without first filling in the elements. If
 this seems unclear, I can provide a slightly longer sample showing what
 I was doing.
Extending length affects the given array, extending it if necessary. reserve is ONLY relevant if you are appending (arr ~= x). It doesn't actually affect the "slice" or the variable you are using at all (except to possibly point it at newly allocated space).

copy uses an "output range" as its destination. The output range supports taking elements and putting them somewhere. In the case of a simple array, putting them somewhere means assigning to the first element, and then moving to the next one.

-Steve
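Both halves of this explanation can be observed with a minimal sketch that uses `std.range.put`, the generic output-range primitive. Putting into an array slice assigns its front element and then pops it, which is the fill-and-advance behaviour described above. Appending with `~=` is what consumes reserved capacity. Comparing `ptr` before and after is only a way to show that no reallocation happened.

```d
import std.range : put;
import std.stdio : writeln;

void main()
{
    auto buf = new int[](3);
    int[] window = buf;      // the output range is a slice that shrinks

    window.put(7);           // assigns window.front, then pops it
    window.put(8);
    writeln(buf);            // [7, 8, 0]
    writeln(window.length);  // 1: one slot left to fill

    int[] y;
    y.reserve(3);
    auto p = y.ptr;
    foreach (i; 0 .. 3)
        y ~= i;              // appending uses the reserved space
    writeln(y, " ", y.ptr == p);   // [0, 1, 2] true
}
```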
Nov 23 2015
Jon D <jond noreply.com> writes:
On Monday, 23 November 2015 at 15:19:08 UTC, Steven Schveighoffer wrote:
Thanks for the reply. And for your article (which Jonathan recommended). It clarified a number of things.

In the example I gave, what I was really wondering was if there is a difference between allocating with 'new' or with 'reserve', or with 'length', for that matter. That is, is there a material difference between:

    auto x = new int[](n);
    int[] y; y.length = n;

I can imagine that the first might be faster, but otherwise there appears to be no difference. As the article stresses, the question is the ownership model. If I'm understanding correctly, both cause an allocation into the runtime-managed heap.

--Jon
Nov 23 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/23/15 4:29 PM, Jon D wrote:
 Thanks for the reply. And for your article (which Jonathan recommended).
 It clarified a number of things. In the example I gave, what I was really
 wondering was if there is a difference between allocating with 'new' or
 with 'reserve', or with 'length', for that matter. That is, is there a
 material difference between:

      auto x = new int[](n);
      int[] y; y.length = n;
There is no difference at all, other than the function that is called (the former will call an allocation function, the latter will call a length setting function, which then will determine if more data is needed, and finding it is, call the allocation function).
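Here is a minimal sketch of that equivalence, assuming `n = 3` for illustration. Both forms produce a length-n array of `int.init` values, each in its own freshly allocated block.

```d
import std.stdio : writeln;

void main()
{
    enum n = 3;
    auto x = new int[](n);   // allocation call: n elements set to int.init
    int[] y;
    y.length = n;            // length-setting call; y is empty, so it allocates
    writeln(x == y);         // true: same length, same contents
    writeln(x.ptr != y.ptr); // true: two independent GC blocks
}
```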
 I can imagine that the first might be faster, but otherwise there
 appears no difference. As the article stresses, the question is the
 ownership model. If I'm understanding, both cause an allocation into the
 runtime managed heap.
You are correct. -Steve
Nov 23 2015
Ali Çehreli <acehreli yahoo.com> writes:
On 11/23/2015 04:03 PM, Steven Schveighoffer wrote:
 > On 11/23/15 4:29 PM, Jon D wrote:
 >> In the example I gave, what I was really wondering was if there is a
 >> difference between allocating with 'new' or with 'reserve', or with
 >> 'length', for that matter. That is, is there a material difference
 >> between:
 >>
 >>      auto x = new int[](n);
 >>      int[] y; y.length = n;
 >
 > There is no difference at all, other than the function that is called
 > (the former will call an allocation function, the latter will call a
 > length setting function, which then will determine if more data is
 > needed, and finding it is, call the allocation function).
Although Jon's example above does not compare reserve, I have to ask: How about non-trivial types? Both cases above would set all elements to .init, right? So, I think reserve would be faster if copy() knew how to take advantage of capacity. It could emplace elements instead of copying, no?

Ali
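The premise that both forms fill every element with `.init` can be checked with a type whose `.init` is not all zero bits. This is a minimal sketch; the struct `Point` and its default of -1 are hypothetical.

```d
import std.stdio : writeln;

// Hypothetical type with a non-zero default, so .init is not all zero bits.
struct Point
{
    int x = -1;
    int y = -1;
}

void main()
{
    auto a = new Point[](2);   // both elements set to Point.init
    Point[] b;
    b.length = 2;              // likewise set to Point.init

    writeln(a);                // [Point(-1, -1), Point(-1, -1)]
    writeln(a == b);           // true
}
```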
Nov 23 2015
Steven Schveighoffer <schveiguy yahoo.com> writes:
On 11/23/15 7:29 PM, Ali Çehreli wrote:
 Although Jon's example above does not compare reserve, I have to ask:
 How about non-trivial types? Both cases above would set all elements to
 .init, right? So, I think reserve would be faster if copy() knew how to
 take advantage of capacity. It could emplace elements instead of
 copying, no?
I think the cost of looking up the array metadata is more than the initialization of elements to .init. However, using an Appender would likely fix all these problems.

You could also use https://dlang.org/phobos/std_array.html#uninitializedArray to create the array before copying. There are quite a few options, actually :)

A delegate is also surprisingly considered an output range! Because why not? So you can do this too as a crude substitute for appender (or for testing performance):

    import std.range; // for iota
    import std.algorithm;

    void main()
    {
        int[] arr;
        arr.reserve(100);
        iota(100).copy((int a) { arr ~= a; });
    }

-Steve
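Here is a minimal sketch of the `std.array.uninitializedArray` option. The array gets its length up front without the `.init` pass, and copy() then overwrites every element. The helper `firstN` is hypothetical. Skipping initialization is only safe here because every element is written before it is read.

```d
import std.algorithm : copy;
import std.array : uninitializedArray;
import std.range : iota;
import std.stdio : writeln;

// Hypothetical helper: returns [0, 1, ..., n - 1] without first
// default-initializing the array.
int[] firstN(int n)
{
    auto arr = uninitializedArray!(int[])(n);  // length n, contents unspecified
    iota(n).copy(arr);                          // every element is overwritten
    return arr;
}

void main()
{
    writeln(firstN(5));   // [0, 1, 2, 3, 4]
}
```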
Nov 23 2015
Jon D <jond noreply.com> writes:
On Tuesday, 24 November 2015 at 01:00:40 UTC, Steven Schveighoffer wrote:
Thanks. I was also wondering if that initial allocation could be avoided. Code I was writing involved repeatedly using a buffer in a loop. I was trying out taskPool.amap, which needs a random access range. This meant copying from the input range being read. Something like:

    auto input = anInfiniteRange();
    auto bufsize = workPerThread * taskPool.size();
    auto workbuf = new int[](bufsize);
    auto results = new int[](bufsize);
    while (true)
    {
        input.take(bufsize).copy(workbuf);
        input.popFrontN(bufsize);   // assumes take() read from a copy of input
        taskPool.amap!expensiveCalc(workbuf, workPerThread, results);
        results.doSomething();
    }

I'm just writing a toy example, but it is where these questions came from. For this example, the next step would be to allow the buffer size to change while iterating.

--Jon
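Here is a self-contained variant of that loop that runs as written. It makes four assumptions: `iota(0, int.max)` stands in for the hypothetical `anInfiniteRange()`, and a hypothetical `expensiveCalc` squares its argument. It also runs a fixed number of rounds instead of `while (true)`, and keeps a running total in place of `doSomething`. As in the original, both buffers are allocated once and reused. `taskPool.size` counts worker threads only and can be 0 on a single-core machine, so the sketch adds 1 to keep the buffer non-empty.

```d
import std.algorithm : copy;
import std.parallelism : taskPool;
import std.range : iota, popFrontN, take;
import std.stdio : writeln;

// Hypothetical stand-in for the real per-element work.
int expensiveCalc(int x)
{
    return x * x;
}

// Runs `rounds` iterations of the fill-buffer / amap loop and returns the
// sum of all results, so the output can be checked.
long processRounds(size_t rounds, size_t workPerThread = 4)
{
    auto input = iota(0, int.max);   // stand-in for anInfiniteRange()

    // taskPool.size can be 0 on a single-core machine, hence the + 1.
    immutable bufsize = workPerThread * (taskPool.size + 1);

    auto workbuf = new int[](bufsize);   // allocated once, reused each round
    auto results = new int[](bufsize);
    long total = 0;

    foreach (_; 0 .. rounds)
    {
        input.take(bufsize).copy(workbuf);   // fill workbuf in place
        input.popFrontN(bufsize);            // take() read from a copy of input
        taskPool.amap!expensiveCalc(workbuf, workPerThread, results);
        foreach (r; results)
            total += r;
    }
    return total;
}

void main()
{
    writeln(processRounds(3));
}
```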
Nov 23 2015