www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Array concatenation & optimisation

reply IchorDev <zxinsworld gmail.com> writes:
Obviously when writing optimised code it is desirable to reduce 
heap allocation frequency. With that in mind, I'm used to being 
told by the compiler that I can't do this in ` nogc` code:
```d
void assign(ref int[4] a)  nogc{
	a[] = [1,3,6,9]; //Error: array literal in ` nogc` function 
`assign` may cause a GC allocation
}
```
 'may cause' a GC allocation
Does this mean that array literals are *always* separately allocated first, or is this usually optimised out? For instance, will this example *always* allocate a new dynamic array for the array literal, and then append it to the existing one, even in optimised builds? ```d void append(ref int[] a){ a ~= [5, 4, 9]; } ``` Obviously for a long array literal, the benefit of knowing its length upfront (and the readability) would probably outweigh the allocation; but for small array literals, is splitting them into separate concatenations going to yield faster code, or will I waste my time and screen space? P.S. I am mostly addressing LDC2 & GDC's output, since I am aware that DMD's optimisations are usually minimal.
Jul 20
next sibling parent reply Nick Treleaven <nick geany.org> writes:
On Sunday, 21 July 2024 at 05:43:32 UTC, IchorDev wrote:
 Obviously when writing optimised code it is desirable to reduce 
 heap allocation frequency. With that in mind, I'm used to being 
 told by the compiler that I can't do this in ` nogc` code:
 ```d
 void assign(ref int[4] a)  nogc{
 	a[] = [1,3,6,9]; //Error: array literal in ` nogc` function 
 `assign` may cause a GC allocation
 }
 ```
Just to mention that if you assign to the static array it works: `a = [1,3,6,9];`.
 'may cause' a GC allocation
Does this mean that array literals are *always* separately allocated first, or is this usually optimised out?
My understanding is that they do not allocate if used to initialize or assign a static array. That includes passing an array literal as an argument to a static array function parameter. A scope slice can also be initialized from an array literal in nogc code: https://dlang.org/changelog/2.102.0.html#dmd.scope-array-on-stack But assigning a literal to a scope slice is not allowed in nogc code.
 For instance, will this example *always* allocate a new dynamic 
 array for the array literal, and then append it to the existing 
 one, even in optimised builds?
 ```d
 void append(ref int[] a){
 	a ~= [5, 4, 9];
 }
 ```
If there is enough spare capacity in a's allocation, no allocation will occur.
 Obviously for a long array literal, the benefit of knowing its 
 length upfront (and the readability) would probably outweigh 
 the allocation; but for small array literals, is splitting them 
 into separate concatenations going to yield faster code, or 
 will I waste my time and screen space?
Note that concatenation always allocates:
 Concatenation always creates a copy of its operands, even if 
 one of the operands is a 0 length array
https://dlang.org/spec/arrays.html#array-concatenation
 P.S. I am mostly addressing LDC2 & GDC's output, since I am 
 aware that DMD's optimisations are usually minimal.
While people may say that on the forum, dmd's optimizer does actually do data flow analysis: https://forum.dlang.org/post/uqhgoi$31a7$1 digitalmars.com
Jul 21
next sibling parent Nick Treleaven <nick geany.org> writes:
On Sunday, 21 July 2024 at 10:33:38 UTC, Nick Treleaven wrote:
 For instance, will this example *always* allocate a new 
 dynamic array for the array literal, and then append it to the 
 existing one, even in optimised builds?
 ```d
 void append(ref int[] a){
 	a ~= [5, 4, 9];
 }
 ```
If there is enough spare capacity in a's allocation, no allocation will occur.
Sorry, I see what you mean. I've compared the -vasm output (without -O) for that function and this one: ```d void append(ref int[] a){ enum e = [5, 4, 9]; a ~= e; } ``` The ASM output is the same. If you use runtime value elements I'm not sure but I think there's still no heap allocation. I'm not good at reading ASM though.
Jul 21
prev sibling next sibling parent IchorDev <zxinsworld gmail.com> writes:
On Sunday, 21 July 2024 at 10:33:38 UTC, Nick Treleaven wrote:
 Just to mention that if you assign to the static array it 
 works: `a = [1,3,6,9];`.
Bonkers. `array[]` is meant to be 'all of `array` as a slice', so you'd think that's how you copy a slice to a static array, but no!
 My understanding is that they do not allocate if used to 
 initialize or assign a static array. That includes passing an 
 array literal as an argument to a static array function 
 parameter.
D is pretty eager to make array literals into slices; thus have I developed a general distrust for using array literals in the vicinity of static arrays. Case and point:
 A scope slice can also be initialized from an array literal in 
  nogc code:
 https://dlang.org/changelog/2.102.0.html#dmd.scope-array-on-stack
 But assigning a literal to a scope slice is not allowed in 
  nogc code.
 If there is enough spare capacity in a's allocation, no 
 allocation will occur.

 Obviously for a long array literal, the benefit of knowing its 
 length upfront (and the readability) would probably outweigh 
 the allocation; but for small array literals, is splitting 
 them into separate concatenations going to yield faster code, 
 or will I waste my time and screen space?
Note that concatenation always allocates:
 Concatenation always creates a copy of its operands, even if 
 one of the operands is a 0 length array
https://dlang.org/spec/arrays.html#array-concatenation
Thank you for all this info!
 P.S. I am mostly addressing LDC2 & GDC's output, since I am 
 aware that DMD's optimisations are usually minimal.
 While people may say that on the forum, dmd's optimizer does 
 actually do data flow analysis:
 https://forum.dlang.org/post/uqhgoi$31a7$1 digitalmars.com
People frequently come here to complain that 'D is slow' when they're using DMD, and often not even using `-O`. The responses will then usually contain some version of 'don't use DMD to generate optimised code'.
Jul 21
prev sibling parent Nick Treleaven <nick geany.org> writes:
On Sunday, 21 July 2024 at 10:33:38 UTC, Nick Treleaven wrote:
 On Sunday, 21 July 2024 at 05:43:32 UTC, IchorDev wrote:
 Does this mean that array literals are *always* separately 
 allocated first, or is this usually optimised out?
My understanding is that they do not allocate if used to initialize or assign a static array. That includes passing an array literal as an argument to a static array function parameter. A scope slice can also be initialized from an array literal in nogc code: https://dlang.org/changelog/2.102.0.html#dmd.scope-array-on-stack
Another (related) case that doesn't heap allocate is when passing a literal as a scope parameter: ```d void main() nogc { f([1,2]); } void f(scope int[]) nogc; ``` I will look at adding these cases to the spec.
Jul 22
prev sibling next sibling parent reply Johan <j j.nl> writes:
On Sunday, 21 July 2024 at 05:43:32 UTC, IchorDev wrote:
 
 Does this mean that array literals are *always* separately 
 allocated first, or is this usually optimised out?
Not always allocated, see your example below. I don't quite know what the heuristic is for allocation or not...
 For instance, will this example *always* allocate a new dynamic 
 array for the array literal, and then append it to the existing 
 one, even in optimised builds?
 ```d
 void append(ref int[] a){
 	a ~= [5, 4, 9];
 }
 ```
https://d.godbolt.org/z/sG5Kancs4 The short array is not dynamically allocated (it's allocated on the stack, or for larger arrays it will be a hidden symbol in the binary image), even at `-O0` (i.e. `-O` was not passed). -Johan
Jul 21
parent IchorDev <zxinsworld gmail.com> writes:
On Sunday, 21 July 2024 at 15:41:50 UTC, Johan wrote:
 https://d.godbolt.org/z/sG5Kancs4

 The short array is not dynamically allocated (it's allocated on 
 the stack, or for larger arrays it will be a hidden symbol in 
 the binary image), even at `-O0` (i.e. `-O` was not passed).
Wow thanks, that's an impressive find! I didn't know LDC was quite so clever either.
Jul 21
prev sibling parent reply Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Sunday, 21 July 2024 at 05:43:32 UTC, IchorDev wrote:
 Obviously when writing optimised code it is desirable to reduce 
 heap allocation frequency. With that in mind, I'm used to being 
 told by the compiler that I can't do this in ` nogc` code:
 ```d
 void assign(ref int[4] a)  nogc{
 	a[] = [1,3,6,9]; //Error: array literal in ` nogc` function 
 `assign` may cause a GC allocation
 }
 ```
 'may cause' a GC allocation
Does this mean that array literals are *always* separately allocated first, or is this usually optimised out? For instance, will this example *always* allocate a new dynamic array for the array literal, and then append it to the existing one, even in optimised builds?
Optimization is unrelated to language semantics, except for what optimizations the language semantics allow for. Even if an allocation is optimized away, if the language semantics don’t require this optimization (which means it’s no optimization actually), it must pretend it’s not happening as far as diagnostics etc. are concerned. My mental model of array literals is that array literals have their own, internal type. Think of `__arrayLiteral!(T, n)`. If you ask an array literal its type (e.g. using `typeof`), it’ll say `T[]`. But when you use it to initialize a static array, it simply behaves differently than a `T[]` because it just isn’t one. The madness about this is that even casts don’t affect the typing. ```d void main() nogc { int x; enum int[] xs = [1,2,3]; int[4] ys = cast(int[])(xs ~ x); // good int[4] zs = (b ? xs : xs) ~ x; // error } ``` Here, neither the explicit type `int[]` for `xs` or the cast (you can remove any subset of them) make it so that the assignment isn’t ` nogc`. The whole `__arrayLiteral!(T, n)` is after some semi-constant folding that only applies to the length. In no way is `xs ~ x` a compile-time constant as `x` is a run-time value. However, if you use `(b ? xs : xs)` instead of plain `xs` with a run-time boolean `b`, the language doesn’t see it as an array with compile-time-known length, and thus requires allocation. In your example, you’re not assigning an array literal to a static array as far as the type system is concerned. The left-hand side is `a[]`, which has type `int[]`. So, as far as the type system is concerned, you assign an array literal to an `int[]`, and that requires allocating the literal on the GC heap, rendering the function non-` nogc`. If the optimizer figures out that all of this ends up just putting some values in some static array and it removes the allocation, this has no effect on whether the function is ` nogc`.
Jul 22
parent IchorDev <zxinsworld gmail.com> writes:
On Monday, 22 July 2024 at 12:03:33 UTC, Quirin Schroll wrote:
 this has no effect on whether the function is ` nogc`.
That is a highly amusing (but obviously understandable) logical contradiction that I’d never considered before. ‘Sometimes you can’t use non-GC code in ` nogc` code.’
Jul 22