digitalmars.D.learn - copy and array length vs capacity. (Doc suggestion?)


Jon D <jond noreply.com> writes:

Something I found confusing was the relationship between array 
capacity and copy(). A short example:

void main()
{
     import std.algorithm: copy;

     auto a = new int[](3);
     assert(a.length == 3);
     [1, 2, 3].copy(a);     // Okay

     int[] b;
     b.reserve(3);
     assert(b.capacity >= 3);
     assert(b.length == 0);
     [1, 2, 3].copy(b);     // Error
}

I had expected that copy() would work if the target had 
sufficient capacity, but that's not the case. Target has to have 
sufficient length.

If I've understood this correctly, a small change to the 
documentation for copy() might make this clearer. In particular, 
the "precondition" section:

     Preconditions:
     target shall have enough room to accomodate the entirety of 
source.

Clarifying that "enough room" means 'length' rather than 
'capacity' might be beneficial.

Nov 21 2015

Ali Çehreli <acehreli yahoo.com> writes:

Hi Jon! :)

On 11/21/2015 03:34 PM, Jon D wrote:

      Preconditions:
      target shall have enough room to accomodate the entirety of source.

 Clarifying that "enough room" means 'length' rather than 'capacity'
 might be beneficial.

May I suggest that you improve that page. ;) If you don't already have a 
clone of the repo, you can do it easily by clicking the "Improve this 
page" button on that page.

Regarding why copy() cannot use the capacity of the slice, it is because 
slices don't know about each other, so, copy could not let other slices 
know that the capacity has just been used by this particular slice.

However, copy() could first append an element, in which case the 
capacity would be owned by this slice. copy() could then safely use the 
capacity, knowing very well that the act of appending that one element 
has dropped the capacities of all other slices to zero.

In pseudo code:

if
   there is enough capacity
   and if copying will spill into capacity
then
    append an element
    copy by spilling into capacity
    set .length appropriately

Others, please review, implement, prove that it is efficient, and post a 
pull request. :)

Ali

Nov 21 2015

Jon D <jond noreply.com> writes:

On Sunday, 22 November 2015 at 00:10:07 UTC, Ali Çehreli wrote:
 May I suggest that you improve that page. ;) If you don't 
 already have a clone of the repo, you can do it easily by 
 clicking the "Improve this page" button on that page.

Hi Ali, thanks for the quick response. And point taken :)  I 
hadn't noticed those buttons on the doc pages, looks very 
convenient. There are a couple formalities I need to look into 
before making contributions, even small ones, but I'll check into 
these.
 Regarding why copy() cannot use the capacity of the slice, it 
 is because slices don't know about each other, so, copy could 
 not let other slices know that the capacity has just been used 
 by this particular slice.

Thanks for the explanation, very helpful understanding what's 
going on.

--Jon

Nov 21 2015

Jonathan M Davis via Digitalmars-d-learn writes:

On Saturday, November 21, 2015 23:34:25 Jon D via Digitalmars-d-learn wrote:
 Something I found confusing was the relationship between array
 capacity and copy(). A short example:

 void main()
 {
      import std.algorithm: copy;

      auto a = new int[](3);
      assert(a.length == 3);
      [1, 2, 3].copy(a);     // Okay

      int[] b;
      b.reserve(3);
      assert(b.capacity >= 3);
      assert(b.length == 0);
      [1, 2, 3].copy(b);     // Error
 }

 I had expected that copy() would work if the target had
 sufficient capacity, but that's not the case. Target has to have
 sufficient length.

 If I've understood this correctly, a small change to the
 documentation for copy() might make this clearer. In particular,
 the "precondition" section:

      Preconditions:
      target shall have enough room to accomodate the entirety of
 source.

 Clarifying that "enough room" means 'length' rather than
 'capacity' might be beneficial.

Honestly, arrays suck as output ranges. They don't get appended to; they get
filled, and for better or worse, the documentation for copy is probably
assuming that you know that. If you want your array to be appended to when
using it as an output range, then you need to use std.array.Appender.

- Jonathan M Davis
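
To make the Appender suggestion concrete, here is a minimal sketch. The function name collect is made up for illustration; appender, reserve, and data are from std.array. The appender grows as elements arrive, so the target does not need a length up front, and reserve only avoids reallocations along the way.

```d
import std.algorithm : copy;
import std.array : appender;

// Hypothetical helper: gather the elements of src through an Appender.
int[] collect(int[] src)
{
    auto app = appender!(int[])();
    app.reserve(src.length);   // optional: preallocate capacity for the appends
    src.copy(app);             // each element is appended, not assigned
    return app.data;           // the array built so far
}

void main()
{
    assert(collect([1, 2, 3]) == [1, 2, 3]);
}
```

Unlike a plain int[] target, the result here has exactly as many elements as were put into it.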

Nov 21 2015

Jon D <jond noreply.com> writes:

On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis 
wrote:
 Honestly, arrays suck as output ranges. They don't get appended 
 to; they get filled, and for better or worse, the documentation 
 for copy is probably assuming that you know that. If you want 
 your array to be appended to when using it as an output range, 
 then you need to use std.array.Appender.

Hi Jonathan, thanks for the reply and the info about 
std.array.Appender. I was actually using copy to fill an array, 
not append. However, I also wanted to preallocate the space. And, 
since I'm mainly trying to understand the language, I was also 
trying to figure out the difference between these two forms of 
creating a dynamic array with an initial size:

    auto x = new int[](n);
    int[] y;  y.reserve(n);

The obvious difference is that the first form initializes n values, while 
the second does not. I'm still unclear whether there are other 
material differences, or when one might be preferred over the 
other :) It was in this context that the behavior of copy surprised 
me: it wouldn't operate on the second form without first 
filling in the elements. If this seems unclear, I can provide a 
slightly longer sample showing what I was doing.

--Jon

Nov 21 2015

Jonathan M Davis via Digitalmars-d-learn writes:

On Sunday, November 22, 2015 03:19:54 Jon D via Digitalmars-d-learn wrote:
 On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis
 wrote:
 Honestly, arrays suck as output ranges. They don't get appended
 to; they get filled, and for better or worse, the documentation
 for copy is probably assuming that you know that. If you want
 your array to be appended to when using it as an output range,
 then you need to use std.array.Appender.

 Hi Jonathan, thanks for the reply and the info about
 std.array.Appender. I was actually using copy to fill an array,
 not append. However, I also wanted to preallocate the space. And,
 since I'm mainly trying to understand the language, I was also
 trying to figure out the difference between these two forms of
 creating a dynamic array with an initial size:

     auto x = new int[](n);
     int[] y;  y.reserve(n);

 The obvious difference is that first initializes n values, the
 second form does not. I'm still unclear if there are other
 material differences, or when one might be preferred over the
 other :) It was in this context the behavior of copy surprised
 me, that it wouldn't operate on the second form without first
 filling in the elements. If this seems unclear, I can provide a
 slightly longer sample showing what I was doing.

If you haven't read this article yet, then you should read it:

http://dlang.org/d-array-article.html

It doesn't use the official terminology (in particular, it talks about T[]
as being a slice and the underlying GC buffer as being the dynamic array,
whereas per the language spec T[] is the dynamic array (which is also a
slice of some sort of memory), and the underlying GC buffer that typically
backs a dynamic array is just a GC buffer and is essentially an
implementation detail), but it should give you good insight into how arrays
work in D.

- Jonathan M Davis

Nov 21 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 11/21/15 10:19 PM, Jon D wrote:
 On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis wrote:
 Honestly, arrays suck as output ranges. They don't get appended to;
 they get filled, and for better or worse, the documentation for copy
 is probably assuming that you know that. If you want your array to be
 appended to when using it as an output range, then you need to use
 std.array.Appender.

 Hi Jonathan, thanks for the reply and the info about std.array.Appender.
 I was actually using copy to fill an array, not append. However, I also
 wanted to preallocate the space. And, since I'm mainly trying to
 understand the language, I was also trying to figure out the difference
 between these two forms of creating a dynamic array with an initial size:

     auto x = new int[](n);
     int[] y;  y.reserve(n);

If you want to change the size of the array, use length:

y.length = n;

This will extend y to the correct length, automatically reserving a 
block of data that can hold it, and allow you to write to the array.

All reserve does is to make sure there is enough space so you can append 
that much data to it. It is not relevant to your use case.

 The obvious difference is that first initializes n values, the second
 form does not. I'm still unclear if there are other material
 differences, or when one might be preferred over the other :) It was
 in this context the behavior of copy surprised me, that it wouldn't
 operate on the second form without first filling in the elements. If
 this seems unclear, I can provide a slightly longer sample showing what
 I was doing.

Setting length affects the given array, extending it if necessary. 
reserve is ONLY relevant if you are appending (arr ~= x). It 
doesn't actually affect the "slice" or the variable you are using at 
all (except to possibly point it at newly allocated space).

copy uses an "output range" as its destination. The output range 
supports taking elements and putting them somewhere. In the case of a 
simple array, putting them somewhere means assigning to the first 
element, and then moving to the next one.

-Steve
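
A small sketch of the distinction, for anyone skimming the thread. The function name fillByLength is made up for illustration. It assumes nothing beyond the built-in array properties and std.algorithm.copy, which returns the unfilled remainder of the target.

```d
import std.algorithm : copy;

// Hypothetical helper: shows that copy needs length, not capacity.
int[] fillByLength(int[] src)
{
    int[] y;
    y.reserve(src.length);
    assert(y.length == 0);             // reserve does not change the length
    assert(y.capacity >= src.length);  // it only guarantees room for appends

    y.length = src.length;             // now there are elements to assign to
    auto rest = src.copy(y);           // fills y from the front
    assert(rest.length == 0);          // nothing left unfilled
    return y;
}

void main()
{
    assert(fillByLength([1, 2, 3]) == [1, 2, 3]);
}
```

Calling copy right after the reserve, while y.length is still 0, is the failing case from the first post.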

Nov 23 2015

Jon D <jond noreply.com> writes:

On Monday, 23 November 2015 at 15:19:08 UTC, Steven Schveighoffer 
wrote:
 On 11/21/15 10:19 PM, Jon D wrote:
 On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis 
 wrote:
 Honestly, arrays suck as output ranges. They don't get 
 appended to;
 they get filled, and for better or worse, the documentation 
 for copy
 is probably assuming that you know that. If you want your 
 array to be
 appended to when using it as an output range, then you need 
 to use
 std.array.Appender.

 Hi Jonathan, thanks for the reply and the info about 
 std.array.Appender.
 I was actually using copy to fill an array, not append. 
 However, I also
 wanted to preallocate the space. And, since I'm mainly trying 
 to
 understand the language, I was also trying to figure out the 
 difference
 between these two forms of creating a dynamic array with an 
 initial size:

     auto x = new int[](n);
     int[] y;  y.reserve(n);

 If you want to change the size of the array, use length:

 y.length = n;

 This will extend y to the correct length, automatically 
 reserving a block of data that can hold it, and allow you to 
 write to the array.

 All reserve does is to make sure there is enough space so you 
 can append that much data to it. It is not relevant to your use 
 case.

 The obvious difference is that first initializes n values, the 
 second
 form does not. I'm still unclear if there are other material
 differences, or when one might be preferred over the other :) 
 It was
 in this context the behavior of copy surprised me, that it 
 wouldn't
 operate on the second form without first filling in the 
 elements. If
 this seems unclear, I can provide a slightly longer sample 
 showing what
 I was doing.

 extending length affects the given array, extending if 
 necessary. reserve is ONLY relevant if you are using appending 
 (arr ~= x). It doesn't actually affect the "slice" or the 
 variable you are using, at all (except to possibly point it at 
 newly allocated space).

 copy uses an "output range" as its destination. The output 
 range supports taking elements and putting them somewhere. In 
 the case of a simple array, putting them somewhere means 
 assigning to the first element, and then moving to the next one.

 -Steve

Thanks for the reply. And for your article (which Jonathan 
recommended). It clarified a number of things.

In the example I gave, what I was really wondering was if there 
is a difference between allocating with 'new' or with 'reserve', 
or with 'length', for that matter. That is, is there a material 
difference between:

     auto x = new int[](n);
     int[] y; y.length = n;

I can imagine that the first might be faster, but otherwise there 
appears no difference. As the article stresses, the question is 
the ownership model. If I'm understanding, both cause an 
allocation into the runtime managed heap.

--Jon

Nov 23 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 11/23/15 4:29 PM, Jon D wrote:
 On Monday, 23 November 2015 at 15:19:08 UTC, Steven Schveighoffer wrote:
 On 11/21/15 10:19 PM, Jon D wrote:
 On Sunday, 22 November 2015 at 00:31:53 UTC, Jonathan M Davis wrote:
 Honestly, arrays suck as output ranges. They don't get appended to;
 they get filled, and for better or worse, the documentation for copy
 is probably assuming that you know that. If you want your array to be
 appended to when using it as an output range, then you need to use
 std.array.Appender.

 Hi Jonathan, thanks for the reply and the info about std.array.Appender.
 I was actually using copy to fill an array, not append. However, I also
 wanted to preallocate the space. And, since I'm mainly trying to
 understand the language, I was also trying to figure out the difference
 between these two forms of creating a dynamic array with an initial
 size:

     auto x = new int[](n);
     int[] y;  y.reserve(n);

 If you want to change the size of the array, use length:

 y.length = n;

 This will extend y to the correct length, automatically reserving a
 block of data that can hold it, and allow you to write to the array.

 All reserve does is to make sure there is enough space so you can
 append that much data to it. It is not relevant to your use case.

 The obvious difference is that first initializes n values, the second
 form does not. I'm still unclear if there are other material
 differences, or when one might be preferred over the other :) It was
 in this context the behavior of copy surprised me, that it wouldn't
 operate on the second form without first filling in the elements. If
 this seems unclear, I can provide a slightly longer sample showing what
 I was doing.

 extending length affects the given array, extending if necessary.
 reserve is ONLY relevant if you are using appending (arr ~= x). It
 doesn't actually affect the "slice" or the variable you are using, at
 all (except to possibly point it at newly allocated space).

 copy uses an "output range" as its destination. The output range
 supports taking elements and putting them somewhere. In the case of a
 simple array, putting them somewhere means assigning to the first
 element, and then moving to the next one.

 Thanks for the reply. And for your article (which Jonathan recommended).
 It clarified a number of things.

 In the example I gave, what I was really wondering was if there is a
 difference between allocating with 'new' or with 'reserve', or with
 'length', for that matter. That is, is there a material difference between:

      auto x = new int[](n);
      int[] y; y.length = n;

There is no difference at all, other than the function that is called 
(the former will call an allocation function, the latter will call a 
length setting function, which then will determine if more data is 
needed, and finding it is, call the allocation function).

 I can imagine that the first might be faster, but otherwise there
 appears no difference. As the article stresses, the question is the
 ownership model. If I'm understanding, both cause an allocation into the
 runtime managed heap.

You are correct.

-Steve

Nov 23 2015

Ali Çehreli <acehreli yahoo.com> writes:

On 11/23/2015 04:03 PM, Steven Schveighoffer wrote:
 On 11/23/15 4:29 PM, Jon D wrote:

 In the example I gave, what I was really wondering was if there is a
 difference between allocating with 'new' or with 'reserve', or with
 'length', for that matter. That is, is there a material difference
 between:

      auto x = new int[](n);
      int[] y; y.length = n;

 There is no difference at all, other than the function that is called
 (the former will call an allocation function, the latter will call a
 length setting function, which then will determine if more data is
 needed, and finding it is, call the allocation function).

Although Jon's example above does not compare reserve, I have to ask: 
How about non-trivial types? Both cases above would set all elements to 
.init, right? So, I think reserve would be faster if copy() knew how to 
take advantage of capacity. It could emplace elements instead of 
copying, no?

Ali

Nov 23 2015

Steven Schveighoffer <schveiguy yahoo.com> writes:

On 11/23/15 7:29 PM, Ali Çehreli wrote:
 On 11/23/2015 04:03 PM, Steven Schveighoffer wrote:
  > On 11/23/15 4:29 PM, Jon D wrote:

  >> In the example I gave, what I was really wondering was if there is a
  >> difference between allocating with 'new' or with 'reserve', or with
  >> 'length', for that matter. That is, is there a material difference
  >> between:
  >>
  >>      auto x = new int[](n);
  >>      int[] y; y.length = n;
  >
  > There is no difference at all, other than the function that is called
  > (the former will call an allocation function, the latter will call a
  > length setting function, which then will determine if more data is
  > needed, and finding it is, call the allocation function).

 Although Jon's example above does not compare reserve, I have to ask:
 How about non-trivial types? Both cases above would set all elements to
 .init, right? So, I think reserve would be faster if copy() knew how to
 take advantage of capacity. It could emplace elements instead of
 copying, no?

I think the cost of looking up the array metadata is more than the 
initialization of elements to .init. However, using an Appender would 
likely fix all these problems.

You could also use 
https://dlang.org/phobos/std_array.html#uninitializedArray to create the 
array before copying. There are quite a few options, actually :)
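
A minimal sketch of the uninitializedArray route, assuming an int element type. The function name copyIntoUninitialized is made up for illustration. The array gets its full length without the elements being set to .init, so every element must be overwritten before it is read.

```d
import std.algorithm : copy;
import std.array : uninitializedArray;

// Hypothetical helper: allocate without initializing, then fill via copy.
int[] copyIntoUninitialized(int[] src)
{
    // Length is src.length, but the contents are garbage until written.
    auto buf = uninitializedArray!(int[])(src.length);
    src.copy(buf);   // overwrites every element, so nothing garbage is read
    return buf;
}

void main()
{
    assert(copyIntoUninitialized([1, 2, 3]) == [1, 2, 3]);
}
```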

A delegate is also surprisingly considered an output range! Because why 
not? So you can do this too as a crude substitute for appender (or for 
testing performance):

import std.range; // for iota
import std.algorithm;

void main()
{
    int[] arr;
    arr.reserve(100);

    iota(100).copy((int a) { arr ~= a;});
}

-Steve

Nov 23 2015

Jon D <jond noreply.com> writes:

On Tuesday, 24 November 2015 at 01:00:40 UTC, Steven 
Schveighoffer wrote:
 On 11/23/15 7:29 PM, Ali Çehreli wrote:
 On 11/23/2015 04:03 PM, Steven Schveighoffer wrote:
  > On 11/23/15 4:29 PM, Jon D wrote:

  >> In the example I gave, what I was really wondering was if 
 there is a
  >> difference between allocating with 'new' or with 
 'reserve', or with
  >> 'length', for that matter. That is, is there a material 
 difference
  >> between:
  >>
  >>      auto x = new int[](n);
  >>      int[] y; y.length = n;
  >
  > There is no difference at all, other than the function that 
 is called
  > (the former will call an allocation function, the latter 
 will call a
  > length setting function, which then will determine if more 
 data is
  > needed, and finding it is, call the allocation function).

 Although Jon's example above does not compare reserve, I have 
 to ask:
 How about non-trivial types? Both cases above would set all 
 elements to
 .init, right? So, I think reserve would be faster if copy() 
 knew how to
 take advantage of capacity. It could emplace elements instead 
 of
 copying, no?

 I think the cost of looking up the array metadata is more than 
 the initialization of elements to .init. However, using an 
 Appender would likely fix all these problems.

 You could also use 
 https://dlang.org/phobos/std_array.html#uninitializedArray to 
 create the array before copying. There are quite a few options, 
 actually :)

 A delegate is also surprisingly considered an output range! 
 Because why not? So you can do this too as a crude substitute 
 for appender (or for testing performance):

 import std.range; // for iota
 import std.algorithm;

 void main()
 {
    int[] arr;
    arr.reserve(100);

    iota(100).copy((int a) { arr ~= a;});
 }

 -Steve

Thanks. I was also wondering if that initial allocation could be 
avoided. Code I was writing involved repeatedly using a buffer in 
a loop. I was trying out taskPool.amap, which needs a random 
access range. This meant copying from the input range being read. 
Something like:

     auto input = anInfiniteRange();
     auto bufsize = workPerThread * taskPool.size();
     auto workbuf = new int[](bufsize);
     auto results = new int[](bufsize);
     while (true) {
         input.take(bufsize).copy(workbuf);
         input.popFrontN(bufsize);  // popFrontN is in std.range; popFront takes no count
         taskPool.amap!expensiveCalc(workbuf, workPerThread, results);
         results.doSomething();
     }

I'm just writing a toy example, but it is where these questions 
came from. For this example, the next step would be to allow the 
buffer size to change while iterating.

--Jon
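
For reference, a runnable sketch of just the buffer-refill step, with the taskPool part left out. The function name fillInChunks is made up for illustration, and a finite iota stands in for the real input range. The same workbuf is reused on every pass, and the value returned by copy (the unfilled tail of the target) tells how much of the buffer was actually written.

```d
import std.algorithm : copy;
import std.range : iota, popFrontN, take;

// Hypothetical helper: refill one preallocated buffer 'rounds' times.
int[][] fillInChunks(size_t bufsize, size_t rounds)
{
    auto input = iota(0, 1_000);        // stand-in for the real input range
    auto workbuf = new int[](bufsize);  // allocated once, reused every pass
    int[][] seen;
    foreach (_; 0 .. rounds)
    {
        // take works on a copy of input, so input itself is not advanced.
        auto rest = input.take(bufsize).copy(workbuf);
        auto filled = workbuf[0 .. $ - rest.length];
        input.popFrontN(bufsize);       // advance past what was just copied
        seen ~= filled.dup;             // dup, because workbuf is overwritten
    }
    return seen;
}

void main()
{
    assert(fillInChunks(4, 2) == [[0, 1, 2, 3], [4, 5, 6, 7]]);
}
```

The explicit popFrontN is needed here because the stand-in range is a value type; with a reference-type input range, take would consume the input directly.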

Nov 23 2015
