www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - A slice can lose capacity simply by calling a function

reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
This is related to a discussion[1] that I had started recently but I 
will give an even shorter example here:

void main()
{
     // Two slices to all element
     auto a = [ 1, 2, 3, 4 ];
     auto b = a;

     // Initially, they both have capacity (strange! :) )
     assert(a.capacity == 7);
     assert(b.capacity == 7);

     // The first one that gets a new element gets the capacity
     b ~= 42;
     assert(a.capacity == 0);    // <-- a loses
     assert(b.capacity == 7);
}

The interesting thing is that this situation is the same as appending to 
a slice parameter:

void foo(int[] b)
{
     // Since stomping is prevented by the runtime, I am
     // foolishly assuming that I can freely append to my
     // parameter. Unfortunately, this action will cost the
     // original slice its capacity.
     b ~= 42;
}

void main()
{
     auto a = [ 1, 2, 3, 4 ];

     assert(a.capacity == 7);
     foo(a);
     assert(a.capacity == 0);    // <-- Capacity is gone :(
}

Note that the code above is about a function appending to a parameter 
for its own implementation. Otherwise, the appended element cannot be 
seen by the original slice anyway.

Also note that it would be the same if the parameter were const(int)[].

This is a new discovery of mine. Note that this is different from the 
non-determinism of when two slices stop sharing elements.[2] To me, this 
is very a strange consequence of passing a slice /by value/ despite the 
common expectation that by value leaves the original variable untouched. 
After this, I am tempted to come up with the following guideline.

(I am leaving 'immutable' and 'shared' out of this discussion.)

Guideline: Slice parameters should either be passed by reference to 
non-const or passed by value to const:

   1a) void foo(ref       int [] arr);  // can modify everything

   1b) void foo(ref const(int)[] arr);  // cannot modify elements

   2)  void foo(    const(int[]) arr);  // cannot affect anything
                                        // (even capacity)

Only then the caller can be sure that the capacity of the original slice 
will not change. (I am assuming that the function is smart enough not to 
call assumeUnique.)

Since 1a and 1b are by reference, the function would not append to the 
parameter for its own implementation purposes anyway. If it did append, 
it would be with the intention of that particular side-effect.

For 2, thanks to the const parameter, the function must copy the slice 
first before appending to it.

If the parameter is to a mutable slice (even with const elements) then 
the original slice can lose capacity. It is possible to come up with 
solutions that preserve the capacity of the original slice but I think 
the previous guideline is sufficient:

     foo(a.capacityPreserved);

(capacityPreserved can be a function, returning a RAII object, which 
calls assumeUnique in its destructor on the original slice.)

Does the guideline make sense?

Ali

[1] http://forum.dlang.org/thread/mhtu1k$2it8$1 digitalmars.com

[2] http://dlang.org/d-array-article.html
May 02 2015
parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Saturday, May 02, 2015 01:21:14 Ali Çehreli via Digitalmars-d-learn wrote:
    2)  void foo(    const(int[]) arr);  // cannot affect anything
                                         // (even capacity)
Actually, you can modify the capacity of arr quite easily. All you have to do is slice it and append to the slice, e.g. auto arr2 = arr; arr2 ~= 7; Any code which has access to a dynamic array can affect that array's capacity unless the array's capacity is already at its length or 0, because if you have access to the array, you can slice, and that slice will have access to exactly the same memory as the original. const prevents altering the elements, of course, but it has no effect whatsoever on the ability of other slices to expand into the memory beyond the end of that array. And of course, if you misuse something like assumeSafeAppend (i.e. when it's _not_ actually safe to append), then you can _really_ bork things. Really, if you're dealing with a thread-local, dynamic array, and you check its capacity immediately before doing something, then you can be sure that it's capacity will be what it was when that something starts, but unless you follow every line of code after checking the capacity and verify that none of it could possibly have appended to a slice which referred to the last point used in the memory block that that array points to or done something like call assumeSafeAppend or anything else which could have affected the capacity of that array, then you have to assume that it's possible that the capacity of the array has changed. I really don't think that it's reasonable in the general case to expect to be able to guarantee that the capacity of a dynamic array won't change. If you know exactly what the code is up to, and the array or any other array that might refer to that same block of memory is only going to be appended to under very controlled circumstances that you fully understand, then you can know that the array's capacity won't change. But in general, if there's any possibility of an array being appended to, then all bets are off. - Jonathan M Davis
May 02 2015
parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 05/02/2015 01:56 AM, Jonathan M Davis via Digitalmars-d-learn wrote:

 I really don't think that it's reasonable in the general case to 
expect to
 be able to guarantee that the capacity of a dynamic array won't change.
Yes, it is very different from other languages like C and C++ that talking about this behavior is worth it. The elements are const, the slice is passed by value, but the reserved capacity disappears: void foo(const(int)[] b) { b ~= 42; } a.reserve(milllllion); assert(a.capacity >= milllllion); foo(a); assert(a.capacity == 0); // WAT Ali
May 02 2015
parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Saturday, May 02, 2015 07:46:27 Ali Çehreli via Digitalmars-d-learn wrote:
 On 05/02/2015 01:56 AM, Jonathan M Davis via Digitalmars-d-learn wrote:

  > I really don't think that it's reasonable in the general case to
 expect to
  > be able to guarantee that the capacity of a dynamic array won't change.

 Yes, it is very different from other languages like C and C++ that
 talking about this behavior is worth it.
It really comes down to how the memory itself is owned and managed, and with dynamic arrays, it's the runtime that does that, and dynamic arrays simply don't own or manage their memory, so expecting to them to isn't going to work.
 The elements are const, the slice is passed by value, but the reserved
 capacity disappears:

 void foo(const(int)[] b)
 {
      b ~= 42;
 }

      a.reserve(milllllion);
      assert(a.capacity >= milllllion);

      foo(a);
      assert(a.capacity == 0);    // WAT
Yeah. If you just code in a way that you're making sure that you're not slicing an array and then appending to the slices, then you can rely on reserve working as you'd expect, and situations like this work just fine int[] foo; foo.reserve(target); while(blah) { //... foo ~= next; //... } so long as you're not doing anything else with the array in the process, but if you start passing a dynamic array around to functions which might append to it or a to a slice of it, then all bets are off. I really don't think that it's an issue in general, but if you do want to guarantee that nothing affects the capacity of your array, then you're going to need to either wrap all access to it so that nothing else can actually get at a slice of the array itself, or you need to use another container - one which is actually a container rather than the weird half-container, half-iterator/range construct that a D dynamic array is. Ultimately, I think that the semantics of D arrays make sense and are extremely useful, but they are definitely weird in comparison to pretty much anything else that you're going to run into. - Jonathan M Davis
May 02 2015
parent reply "Andrew Godfrey" <X y.com> writes:
 I really don't think that it's an issue in general, but if you 
 do want to
 guarantee that nothing affects the capacity of your array, then 
 you're going
 to need to either wrap all access to it
I agree with everything Jonathan said in both threads EXCEPT that this is not an issue. The syntax "slice.capacity" gives the impression that 'capacity' is a property of the slice. Such lack of consistency is a SERIOUS issue for D, especially when it occurs in something as basic as an array. One other place I see this problem is the semantics of utf8 strings. This is DEADLY for D adoption. Personally, I think I'll be trying out Rust now, and checking back on these two issues later. This makes me sad, D is so promising ... but unfortunately the high standards it sets in many areas, are completely undercut by such basic problems. I work on large systems, and I can't imagine building a large system in this state.
May 03 2015
next sibling parent "Meta" <jared771 gmail.com> writes:
On Sunday, 3 May 2015 at 15:21:17 UTC, Andrew Godfrey wrote:
 I really don't think that it's an issue in general, but if you 
 do want to
 guarantee that nothing affects the capacity of your array, 
 then you're going
 to need to either wrap all access to it
I agree with everything Jonathan said in both threads EXCEPT that this is not an issue. The syntax "slice.capacity" gives the impression that 'capacity' is a property of the slice. Such lack of consistency is a SERIOUS issue for D, especially when it occurs in something as basic as an array. One other place I see this problem is the semantics of utf8 strings. This is DEADLY for D adoption. Personally, I think I'll be trying out Rust now, and checking back on these two issues later. This makes me sad, D is so promising ... but unfortunately the high standards it sets in many areas, are completely undercut by such basic problems. I work on large systems, and I can't imagine building a large system in this state.
It is consistent if you look at it the perspective of wanting D's slices to have certain semantics while still being memory safe. It is word behavior; however, when you think about it, it is necessary if we want D's slices to have the semantics that they do while stomping memory. If you look at Go's slices, they are NOT memory safe; it's possible (and even easy) in Go to have the data in a slice change under you unexpectedly. This behavior is impossible in D as far as I know. Yes, Rust does statically guarantee this can never happen, but it also needs its complex ownership system to do that. So D is really a midway point between Go and Rust. Not unsafe, unlike Go, but with some slightly confusing behavior.
May 03 2015
prev sibling parent reply Jonathan M Davis via Digitalmars-d-learn writes:
On Sunday, May 03, 2015 15:21:15 Andrew Godfrey via Digitalmars-d-learn wrote:
 I really don't think that it's an issue in general, but if you
 do want to
 guarantee that nothing affects the capacity of your array, then
 you're going
 to need to either wrap all access to it
I agree with everything Jonathan said in both threads EXCEPT that this is not an issue. The syntax "slice.capacity" gives the impression that 'capacity' is a property of the slice. Such lack of consistency is a SERIOUS issue for D, especially when it occurs in something as basic as an array.
I really don't see the problem. It's a natural side effect of how dynamic arrays work in D. The only thing I can see that could be changed without losing some of the capabilities that we currently have is if capacity were explicitly a separate function - e.g. calcArrayCapacity - but I really don't think that that would buy us much. You'd still have to understand how D's dynamic arrays work to understand how the capacity for a dynamic array works. And most code really doesn't care about the capacity of an array. If you profile and find that appending to an array is a hot spot in your program, then you dig in and use reserve or Appender or use something other than a naked array, and you figure out what works best for your particular use case, but for most code, it just works to use ~=. And if you're concerned about reallocations even before profiling, then it's trivial to just call reserve first or to use Appender. And unless you're doing something like constantly slicing and appending to each slice (which I expect to be a rare thing to do), you're really not going to run into problems. Sure, the semantics of D's dynamic arrays take some getting used to - especially if you want to understand all of their ins and outs. But I really don't see why it's a big problem. For the most part, D's dynamic arrays just work. - Jonathan M Davis
May 03 2015
parent reply =?UTF-8?B?QWxpIMOHZWhyZWxp?= <acehreli yahoo.com> writes:
On 05/03/2015 06:06 PM, Jonathan M Davis via Digitalmars-d-learn wrote:

 On Sunday, May 03, 2015 15:21:15 Andrew Godfrey via 
Digitalmars-d-learn wrote:
 I really don't think that it's an issue in general, but if you
 do want to
 guarantee that nothing affects the capacity of your array, then
 you're going
 to need to either wrap all access to it
I agree with everything Jonathan said in both threads EXCEPT that this is not an issue. The syntax "slice.capacity" gives the impression that 'capacity' is a property of the slice. Such lack of consistency is a SERIOUS issue for D, especially when it occurs in something as basic as an array.
I really don't see the problem. It's a natural side effect of how dynamic arrays work in D.
The problem is described in those two sentences of yours: The behavior of a language construct depends on the implementation detail. Unfortunately, it is backward. Normally, the semantics would be layed out and then those semantics would be implemented. It is not the case with D's dynamic arrays. (Aside: Heck, we (at least you and I) can't even decide whether dynamic arrays and slices are the same thing. (For the record, yes, they are the same thing from every vantage point: They have the same semantics and they are implemented in the same way.)) Saying that there is no problem would be contradicting yourself. These are your quotes from the following thread: http://forum.dlang.org/thread/mhtu1k$2it8$1 digitalmars.com#post-mailman.597.1430448607.4581.digitalmars-d-learn:40puremagic.com - "it seems like so few people understand it" - "pretty much everyone has trouble with them initially" - "I _still_ occasionally end up \"Aha!\" moments" So, yes, this is a problem. The problem is even worse because we can't come up with a consistent way of describing the semantics after the fact (after how they are implemented). The following are some of my attempts: "sharing may terminate when the length of the slice increases" "capacity is first-come first served" However, they are only mostly correct, not always. And that's the part that is frustrating the most: There is no way of "understanding" the semantics so that one can "explain" them to at least to one's self.
 The only thing I can see that could be changed without
Please note that at least I am not interested in changing anything: I am interested in a way of explaining the semantics. I could not find that way yet. (I am eagerly waiting for your DConf talk to see how you make sense of it all.)
 You'd still have to understand how D's
 dynamic arrays work to understand how the capacity for a dynamic array
 works.
I am happy that I do my part by opening threads like this one to shine more light on what array capacity is not.
 And most code really doesn't care about the capacity of an array. If
 you profile and find that appending to an array is a hot spot in your
 program, then you dig in and use reserve or Appender or use something 
other
 than a naked array, and you figure out what works best for your 
particular
 use case, but for most code, it just works to use ~=. And if you're
 concerned about reallocations even before profiling, then it's trivial to
 just call reserve first or to use Appender. And unless you're doing
 something like constantly slicing and appending to each slice (which I
 expect to be a rare thing to do), you're really not going to run into
 problems.
I am sorry but that is missing the point. This issue is not only about performance. Note that I may not be the only programmer who writes the entire application. Imagine that I write a function inside a library. Do I have the *right* to append to my parameter slice? Do I have the right to disturb my caller's slice? The answer is a resounding NO, in capital letters. If I lose my caller's potentially precious and carefully crafter capacity, perhaps after a call to reserve(), then I am a bad library function. As a programmer, I need guidelines. Here is what I have at this point: Never append to a parameter slice.
 Sure, the semantics of D's dynamic arrays take some getting used to -
Yes, they do: I started learning D about 6 years ago. How many years more do you reckon until I get them?
 especially if you want to understand all of their ins and outs.
Yes I do.
 But I really
 don't see why it's a big problem.
I said it before why it's a big problem: "The elements are const, the slice is passed by value, but the [slice is affected]." The loss of capacity itself is not a big deal. However, this behavior conflicts with other parts of the language (as well as at least C and C++).
 For the most part, D's dynamic arrays just
 work.
I know you are not trolling but I can't take your brushing off this issue with phares like "for the most part". That's the frigging problem! "For the most part" is not sufficient. Unless somebody explains the semantics in a consistent way, I will continue to try to do it myself. (Remember: Never append to a parameter slice. Good function, good!)
 - Jonathan M Davis
Ali
May 03 2015
parent "Jonathan M Davis" <jmdavisProg gmx.com> writes:
On Monday, 4 May 2015 at 06:23:42 UTC, Ali Çehreli wrote:
 On 05/03/2015 06:06 PM, Jonathan M Davis via 
 Digitalmars-d-learn wrote:
 (I am eagerly waiting for your DConf talk to see how you make 
 sense of it all.)
Well, we'll see how much I'm able to cover about arrays. The focus of the talk is on ranges, not arrays, so I don't know if talking much about non-range stuff like array capacity is going to fit in with everything else that needs to be discussed that _is_ specific to ranges. I'd like to discuss it though. Regardless, I keep meaning to write an article on ranges, and I'm increasingly convinced that I should take a crack at writing one on arrays, since while Steven's article is quite enlightening, I think that it approaches things the wrong way (e.g. it focuses on the memory buffers that the runtime manages rather than the dynamic arrays themselves) and uses the wrong terminology (e.g. talking about the memory buffers that the runtime manages as being dynamic arrays, when according to the language spec T[] is a dynamic array, and what it refers to really doesn't matter with regards to whether it's a dynamic array or not). So, I'll probably turn some portion of my talk into an article or two, and there won't be the same time pressures there. At this point, I feel like I have how dynamic arrays work well ironed out in my head and that it's actually pretty straightforward, but in general, we seem to do a poor job of explaining it in a straightforward manner, which results in far more confusion on the topic than I think there should be.
 For the most part, D's dynamic arrays just
 work.
I know you are not trolling but I can't take your brushing off this issue with phares like "for the most part". That's the frigging problem! "For the most part" is not sufficient. Unless somebody explains the semantics in a consistent way, I will continue to try to do it myself. (Remember: Never append to a parameter slice. Good function, good!)
Aside from performance considerations, you can pretty much ignore the capacity issue. The only other concern that it raises is whether two dynamic arrays still refer to the same memory block, and once you append to either of them, you can't assume that they do, and you can't assume that they don't (though it's easy enough to check via their ptr properties). That can be managed on some level by checking the capacity ahead of time, but really, once you start appending, you have to treat each slice as possibly separate, and if you want to guarantee it, you really need to use dup or idup. But most code just doesn't need to care about capacity. And if you really do need to care, odds are that you can either fix the problem with a reserve call or by using Appender, or you should just not use dynamic arrays directly. In general, I'd consider code that was worrying much about the capacity of dynamic arrays to be error-prone - or at least that it's not going about things in the best way. By its very nature, it's likely to end up being inefficient and is too likely to care about whether two dynamic arrays refer to the same memory or not. Dynamic arrays are badly designed for situations where you can have random stuff appending to your array. They just are. Because there's no ownership, and they're not full reference types, making it trivial to end up with something appended to one dynamic array but not actually end up on the one you want it on. As such, I'd argue that anything that's doing a lot of random appending to arrays shouldn't be using dynamic arrays (at least, not without wrapping them so that there's clear ownership of the memory). So, ultimately, I see array capacity as being pretty much a non-issue, because most code that would care much about is going about things in the wrong way. But maybe what we need is a clear set of guidelines about how dynamic array slices should be managed so that they're generally used efficiently and without risking weird behavior due to expecting two dynamic arrays to refer to the same array when they don't. In general though, I'd argue that code should be constructing arrays up front and then processing them as ranges and not doing a lot of appending to them later. In particular, if you do a lot of appending and removals as you go along, it's going to be a performance hit, and you seriously risk having trouble due to having operated on a slice of the dynamic array you actually wanted to operate on. - Jonathan M Davis
May 04 2015