digitalmars.D.learn - A slice can lose capacity simply by calling a function

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (67/67) May 02 2015 This is related to a discussion[1] that I had started recently but I

Jonathan M Davis via Digitalmars-d-learn (30/32) May 02 2015 Actually, you can modify the capacity of arr quite easily. All you have ...

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (15/17) May 02 2015 Yes, it is very different from other languages like C and C++ that

Jonathan M Davis via Digitalmars-d-learn (29/45) May 02 2015 It really comes down to how the memory itself is owned and managed, and ...

Andrew Godfrey (13/18) May 03 2015 I agree with everything Jonathan said in both threads EXCEPT that

Meta (13/32) May 03 2015 It is consistent if you look at it the perspective of wanting D's
Jonathan M Davis via Digitalmars-d-learn (22/33) May 03 2015 I really don't see the problem. It's a natural side effect of how dynami...

=?UTF-8?B?QWxpIMOHZWhyZWxp?= (59/98) May 03 2015 The problem is described in those two sentences of yours: The behavior

Jonathan M Davis (66/78) May 04 2015 Well, we'll see how much I'm able to cover about arrays. The

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

This is related to a discussion[1] that I had started recently but I 
will give an even shorter example here:

void main()
{
     // Two slices to all element
     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

     // Initially, they both have capacity (strange! :) )
     assert(a.capacity == 7);
     assert(b.capacity == 7);

     // The first one that gets a new element gets the capacity
     b ~= 42;
     assert(a.capacity == 0);    // <-- a loses
     assert(b.capacity == 7);
}

The interesting thing is that this situation is the same as appending to 
a slice parameter:

void foo(int[] b)
{
     // Since stomping is prevented by the runtime, I am
     // foolishly assuming that I can freely append to my
     // parameter. Unfortunately, this action will cost the
     // original slice its capacity.
     b ~= 42;
}

void main()
{
     auto a = [ 1, 2, 3, 4 ];

     assert(a.capacity == 7);
     foo(a);
     assert(a.capacity == 0);    // <-- Capacity is gone :(
}

Note that the code above is about a function appending to a parameter 
for its own implementation. Otherwise, the appended element cannot be 
seen by the original slice anyway.

Also note that it would be the same if the parameter were const(int)[].

This is a new discovery of mine. Note that this is different from the 
non-determinism of when two slices stop sharing elements.[2] To me, this 
is very a strange consequence of passing a slice /by value/ despite the 
common expectation that by value leaves the original variable untouched. 
After this, I am tempted to come up with the following guideline.

(I am leaving 'immutable' and 'shared' out of this discussion.)

Guideline: Slice parameters should either be passed by reference to 
non-const or passed by value to const:

   1a) void foo(ref       int [] arr);  // can modify everything

   1b) void foo(ref const(int)[] arr);  // cannot modify elements

   2)  void foo(    const(int[]) arr);  // cannot affect anything
                                        // (even capacity)

Only then the caller can be sure that the capacity of the original slice 
will not change. (I am assuming that the function is smart enough not to 
call assumeUnique.)

Since 1a and 1b are by reference, the function would not append to the 
parameter for its own implementation purposes anyway. If it did append, 
it would be with the intention of that particular side-effect.

For 2, thanks to the const parameter, the function must copy the slice 
first before appending to it.

If the parameter is to a mutable slice (even with const elements) then 
the original slice can lose capacity. It is possible to come up with 
solutions that preserve the capacity of the original slice but I think 
the previous guideline is sufficient:

     foo(a.capacityPreserved);

(capacityPreserved can be a function, returning a RAII object, which 
calls assumeUnique in its destructor on the original slice.)

Does the guideline make sense?

Ali

[1] http://forum.dlang.org/thread/mhtu1k$2it8$1 digitalmars.com

[2] http://dlang.org/d-array-article.html

May 02 2015

Jonathan M Davis via Digitalmars-d-learn writes:

On Saturday, May 02, 2015 01:21:14 Ali �ehreli via Digitalmars-d-learn wrote:
    2)  void foo(    const(int[]) arr);  // cannot affect anything
                                         // (even capacity)

Actually, you can modify the capacity of arr quite easily. All you have to
do is slice it and append to the slice, e.g.

auto arr2 = arr;
arr2 ~= 7;

Any code which has access to a dynamic array can affect that array's
capacity unless the array's capacity is already at its length or 0, because
if you have access to the array, you can slice, and that slice will have
access to exactly the same memory as the original. const prevents altering
the elements, of course, but it has no effect whatsoever on the ability of
other slices to expand into the memory beyond the end of that array.

And of course, if you misuse something like assumeSafeAppend (i.e. when it's
_not_ actually safe to append), then you can _really_ bork things.

Really, if you're dealing with a thread-local, dynamic array, and you check
its capacity immediately before doing something, then you can be sure that
it's capacity will be what it was when that something starts, but unless you
follow every line of code after checking the capacity and verify that none
of it could possibly have appended to a slice which referred to the last
point used in the memory block that that array points to or done something
like call assumeSafeAppend or anything else which could have affected the
capacity of that array, then you have to assume that it's possible that the
capacity of the array has changed.

I really don't think that it's reasonable in the general case to expect to
be able to guarantee that the capacity of a dynamic array won't change. If
you know exactly what the code is up to, and the array or any other array
that might refer to that same block of memory is only going to be appended
to under very controlled circumstances that you fully understand, then you
can know that the array's capacity won't change. But in general, if there's
any possibility of an array being appended to, then all bets are off.

- Jonathan M Davis

May 02 2015

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 05/02/2015 01:56 AM, Jonathan M Davis via Digitalmars-d-learn wrote:

 I really don't think that it's reasonable in the general case to 

expect to
 be able to guarantee that the capacity of a dynamic array won't change.

Yes, it is very different from other languages like C and C++ that 
talking about this behavior is worth it.

The elements are const, the slice is passed by value, but the reserved 
capacity disappears:

void foo(const(int)[] b)
{
     b ~= 42;
}

     a.reserve(milllllion);
     assert(a.capacity >= milllllion);

     foo(a);
     assert(a.capacity == 0);    // WAT

Ali

May 02 2015

Jonathan M Davis via Digitalmars-d-learn writes:

On Saturday, May 02, 2015 07:46:27 Ali �ehreli via Digitalmars-d-learn wrote:
 On 05/02/2015 01:56 AM, Jonathan M Davis via Digitalmars-d-learn wrote:

  > I really don't think that it's reasonable in the general case to
 expect to
  > be able to guarantee that the capacity of a dynamic array won't change.

 Yes, it is very different from other languages like C and C++ that
 talking about this behavior is worth it.

It really comes down to how the memory itself is owned and managed, and with
dynamic arrays, it's the runtime that does that, and dynamic arrays simply
don't own or manage their memory, so expecting to them to isn't going to
work.

 The elements are const, the slice is passed by value, but the reserved
 capacity disappears:

 void foo(const(int)[] b)
 {
      b ~= 42;
 }

      a.reserve(milllllion);
      assert(a.capacity >= milllllion);

      foo(a);
      assert(a.capacity == 0);    // WAT

Yeah. If you just code in a way that you're making sure that you're not
slicing an array and then appending to the slices, then you can rely on
reserve working as you'd expect, and situations like this work just fine

int[] foo;
foo.reserve(target);

while(blah)
{
    //...
    foo ~= next;
    //...
}

so long as you're not doing anything else with the array in the process, but
if you start passing a dynamic array around to functions which might append
to it or a to a slice of it, then all bets are off.

I really don't think that it's an issue in general, but if you do want to
guarantee that nothing affects the capacity of your array, then you're going
to need to either wrap all access to it so that nothing else can actually
get at a slice of the array itself, or you need to use another container -
one which is actually a container rather than the weird half-container,
half-iterator/range construct that a D dynamic array is.

Ultimately, I think that the semantics of D arrays make sense and are
extremely useful, but they are definitely weird in comparison to pretty much
anything else that you're going to run into.

- Jonathan M Davis

May 02 2015

"Andrew Godfrey" <X y.com> writes:

 I really don't think that it's an issue in general, but if you 
 do want to
 guarantee that nothing affects the capacity of your array, then 
 you're going
 to need to either wrap all access to it

I agree with everything Jonathan said in both threads EXCEPT that 
this is not an issue.

The syntax "slice.capacity" gives the impression that 'capacity' 
is a property of the slice.

Such lack of consistency is a SERIOUS issue for D, especially 
when it occurs in something as basic as an array. One other place 
I see this problem is the semantics of utf8 strings.

This is DEADLY for D adoption. Personally, I think I'll be trying 
out Rust now, and checking back on these two issues later. This 
makes me sad, D is so promising ... but unfortunately the high 
standards it sets in many areas, are completely undercut by such 
basic problems. I work on large systems, and I can't imagine 
building a large system in this state.

May 03 2015

"Meta" <jared771 gmail.com> writes:

On Sunday, 3 May 2015 at 15:21:17 UTC, Andrew Godfrey wrote:
 I really don't think that it's an issue in general, but if you 
 do want to
 guarantee that nothing affects the capacity of your array, 
 then you're going
 to need to either wrap all access to it

 I agree with everything Jonathan said in both threads EXCEPT 
 that this is not an issue.

 The syntax "slice.capacity" gives the impression that 
 'capacity' is a property of the slice.

 Such lack of consistency is a SERIOUS issue for D, especially 
 when it occurs in something as basic as an array. One other 
 place I see this problem is the semantics of utf8 strings.

 This is DEADLY for D adoption. Personally, I think I'll be 
 trying out Rust now, and checking back on these two issues 
 later. This makes me sad, D is so promising ... but 
 unfortunately the high standards it sets in many areas, are 
 completely undercut by such basic problems. I work on large 
 systems, and I can't imagine building a large system in this 
 state.

It is consistent if you look at it the perspective of wanting D's 
slices to have certain semantics while still being memory safe. 
It is word behavior; however, when you think about it, it is 
necessary if we want D's slices to have the semantics that they 
do while stomping memory. If you look at Go's slices, they are 
NOT memory safe; it's possible (and even easy) in Go to have the 
data in a slice change under you unexpectedly. This behavior is 
impossible in D as far as I know. Yes, Rust does statically 
guarantee this can never happen, but it also needs its complex 
ownership system to do that. So D is really a midway point 
between Go and Rust. Not unsafe, unlike Go, but with some 
slightly confusing behavior.

May 03 2015

Jonathan M Davis via Digitalmars-d-learn writes:

On Sunday, May 03, 2015 15:21:15 Andrew Godfrey via Digitalmars-d-learn wrote:
 I really don't think that it's an issue in general, but if you
 do want to
 guarantee that nothing affects the capacity of your array, then
 you're going
 to need to either wrap all access to it

 I agree with everything Jonathan said in both threads EXCEPT that
 this is not an issue.

 The syntax "slice.capacity" gives the impression that 'capacity'
 is a property of the slice.

 Such lack of consistency is a SERIOUS issue for D, especially
 when it occurs in something as basic as an array.

I really don't see the problem. It's a natural side effect of how dynamic
arrays work in D. The only thing I can see that could be changed without
losing some of the capabilities that we currently have is if capacity were
explicitly a separate function - e.g. calcArrayCapacity - but I really don't
think that that would buy us much. You'd still have to understand how D's
dynamic arrays work to understand how the capacity for a dynamic array
works. And most code really doesn't care about the capacity of an array. If
you profile and find that appending to an array is a hot spot in your
program, then you dig in and use reserve or Appender or use something other
than a naked array, and you figure out what works best for your particular
use case, but for most code, it just works to use ~=. And if you're
concerned about reallocations even before profiling, then it's trivial to
just call reserve first or to use Appender. And unless you're doing
something like constantly slicing and appending to each slice (which I
expect to be a rare thing to do), you're really not going to run into
problems.

Sure, the semantics of D's dynamic arrays take some getting used to -
especially if you want to understand all of their ins and outs. But I really
don't see why it's a big problem. For the most part, D's dynamic arrays just
work.

- Jonathan M Davis

May 03 2015

=?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:

On 05/03/2015 06:06 PM, Jonathan M Davis via Digitalmars-d-learn wrote:

On Sunday, May 03, 2015 15:21:15 Andrew Godfrey via

Digitalmars-d-learn wrote:
I really don't think that it's an issue in general, but if you
do want to
guarantee that nothing affects the capacity of your array, then
you're going
to need to either wrap all access to it

I agree with everything Jonathan said in both threads EXCEPT that
this is not an issue.

The syntax "slice.capacity" gives the impression that 'capacity'
is a property of the slice.

Such lack of consistency is a SERIOUS issue for D, especially
when it occurs in something as basic as an array.

I really don't see the problem. It's a natural side effect of how dynamic
arrays work in D.

The problem is described in those two sentences of yours: The behavior
of a language construct depends on the implementation detail.
Unfortunately, it is backward. Normally, the semantics would be layed
out and then those semantics would be implemented.

It is not the case with D's dynamic arrays.

(Aside: Heck, we (at least you and I) can't even decide whether dynamic
arrays and slices are the same thing. (For the record, yes, they are the
same thing from every vantage point: They have the same semantics and
they are implemented in the same way.))

Saying that there is no problem would be contradicting yourself. These
are your quotes from the following thread:

http://forum.dlang.org/thread/mhtu1k$2it8$1 digitalmars.com#post-mailman.597.1430448607.4581.digitalmars-d-learn:40puremagic.com

- "it seems like so few people understand it"

- "pretty much everyone has trouble with them initially"

- "I _still_ occasionally end up \"Aha!\" moments"

So, yes, this is a problem.

The problem is even worse because we can't come up with a consistent way
of describing the semantics after the fact (after how they are
implemented). The following are some of my attempts:

"sharing may terminate when the length of the slice increases"

"capacity is first-come first served"

However, they are only mostly correct, not always. And that's the part
that is frustrating the most: There is no way of "understanding" the
semantics so that one can "explain" them to at least to one's self.

The only thing I can see that could be changed without

Please note that at least I am not interested in changing anything: I am
interested in a way of explaining the semantics. I could not find that
way yet. (I am eagerly waiting for your DConf talk to see how you make
sense of it all.)

You'd still have to understand how D's
dynamic arrays work to understand how the capacity for a dynamic array
works.

I am happy that I do my part by opening threads like this one to shine
more light on what array capacity is not.

And most code really doesn't care about the capacity of an array. If
you profile and find that appending to an array is a hot spot in your
program, then you dig in and use reserve or Appender or use something

other
than a naked array, and you figure out what works best for your

particular
use case, but for most code, it just works to use ~=. And if you're
concerned about reallocations even before profiling, then it's trivial to
just call reserve first or to use Appender. And unless you're doing
something like constantly slicing and appending to each slice (which I
expect to be a rare thing to do), you're really not going to run into
problems.

I am sorry but that is missing the point. This issue is not only about
performance.

Note that I may not be the only programmer who writes the entire
application. Imagine that I write a function inside a library. Do I have
the *right* to append to my parameter slice? Do I have the right to
disturb my caller's slice? The answer is a resounding NO, in capital
letters. If I lose my caller's potentially precious and carefully
crafter capacity, perhaps after a call to reserve(), then I am a bad
library function.

As a programmer, I need guidelines. Here is what I have at this point:
Never append to a parameter slice.

Sure, the semantics of D's dynamic arrays take some getting used to -

Yes, they do: I started learning D about 6 years ago. How many years
more do you reckon until I get them?

especially if you want to understand all of their ins and outs.

Yes I do.

But I really
don't see why it's a big problem.

I said it before why it's a big problem: "The elements are const, the
slice is passed by value, but the [slice is affected]." The loss of
capacity itself is not a big deal. However, this behavior conflicts with
other parts of the language (as well as at least C and C++).

For the most part, D's dynamic arrays just
work.

I know you are not trolling but I can't take your brushing off this
issue with phares like "for the most part". That's the frigging problem!
"For the most part" is not sufficient. Unless somebody explains the
semantics in a consistent way, I will continue to try to do it myself.
(Remember: Never append to a parameter slice. Good function, good!)

- Jonathan M Davis

Ali

May 03 2015

"Jonathan M Davis" <jmdavisProg gmx.com> writes:

On Monday, 4 May 2015 at 06:23:42 UTC, Ali Çehreli wrote:
 On 05/03/2015 06:06 PM, Jonathan M Davis via 
 Digitalmars-d-learn wrote:

 (I am eagerly waiting for your DConf talk to see how you make 
 sense of it all.)

Well, we'll see how much I'm able to cover about arrays. The 
focus of the talk is on ranges, not arrays, so I don't know if 
talking much about non-range stuff like array capacity is going 
to fit in with everything else that needs to be discussed that 
_is_ specific to ranges. I'd like to discuss it though.

Regardless, I keep meaning to write an article on ranges, and I'm 
increasingly convinced that I should take a crack at writing one 
on arrays, since while Steven's article is quite enlightening, I 
think that it approaches things the wrong way (e.g. it focuses on 
the memory buffers that the runtime manages rather than the 
dynamic arrays themselves) and uses the wrong terminology (e.g. 
talking about the memory buffers that the runtime manages as 
being dynamic arrays, when according to the language spec T[] is 
a dynamic array, and what it refers to really doesn't matter with 
regards to whether it's a dynamic array or not). So, I'll 
probably turn some portion of my talk into an article or two, and 
there won't be the same time pressures there.

At this point, I feel like I have how dynamic arrays work well 
ironed out in my head and that it's actually pretty 
straightforward, but in general, we seem to do a poor job of 
explaining it in a straightforward manner, which results in far 
more confusion on the topic than I think there should be.

 For the most part, D's dynamic arrays just
 work.

 I know you are not trolling but I can't take your brushing off 
 this issue with phares like "for the most part". That's the 
 frigging problem! "For the most part" is not sufficient. Unless 
 somebody explains the semantics in a consistent way, I will 
 continue to try to do it myself. (Remember: Never append to a 
 parameter slice. Good function, good!)

Aside from performance considerations, you can pretty much ignore 
the capacity issue. The only other concern that it raises is 
whether two dynamic arrays still refer to the same memory block, 
and once you append to either of them, you can't assume that they 
do, and you can't assume that they don't (though it's easy enough 
to check via their ptr properties). That can be managed on some 
level by checking the capacity ahead of time, but really, once 
you start appending, you have to treat each slice as possibly 
separate, and if you want to guarantee it, you really need to use 
dup or idup.

But most code just doesn't need to care about capacity. And if 
you really do need to care, odds are that you can either fix the 
problem with a reserve call or by using Appender, or you should 
just not use dynamic arrays directly. In general, I'd consider 
code that was worrying much about the capacity of dynamic arrays 
to be error-prone - or at least that it's not going about things 
in the best way. By its very nature, it's likely to end up being 
inefficient and is too likely to care about whether two dynamic 
arrays refer to the same memory or not.

Dynamic arrays are badly designed for situations where you can 
have random stuff appending to your array. They just are. Because 
there's no ownership, and they're not full reference types, 
making it trivial to end up with something appended to one 
dynamic array but not actually end up on the one you want it on. 
As such, I'd argue that anything that's doing a lot of random 
appending to arrays shouldn't be using dynamic arrays (at least, 
not without wrapping them so that there's clear ownership of the 
memory).

So, ultimately, I see array capacity as being pretty much a 
non-issue, because most code that would care much about is going 
about things in the wrong way. But maybe what we need is a clear 
set of guidelines about how dynamic array slices should be 
managed so that they're generally used efficiently and without 
risking weird behavior due to expecting two dynamic arrays to 
refer to the same array when they don't.

In general though, I'd argue that code should be constructing 
arrays up front and then processing them as ranges and not doing 
a lot of appending to them later. In particular, if you do a lot 
of appending and removals as you go along, it's going to be a 
performance hit, and you seriously risk having trouble due to 
having operated on a slice of the dynamic array you actually 
wanted to operate on.

- Jonathan M Davis

May 04 2015

D Programming

C/C++ Programming

Other

digitalmars.D.learn - A slice can lose capacity simply by calling a function