digitalmars.D.ldc - Struct copies
- bearophile (119/119) Jan 26 2014 The following code is compiled with the ldc2 compiler based on
- Kai Nacke (31/36) Jan 26 2014 Hi bearophile!
- Kai Nacke (6/9) Jan 26 2014 The obvious difference between ldc and clang is that clang
- Stanislav Blinov (9/9) Jan 27 2014 It would seem that ldc is performing a memberwise assignment. It
- bearophile (9/11) Jan 27 2014 What's strange on this?
- Stanislav Blinov (5/14) Jan 28 2014 Won't swap references or pointers (due to immutable) or structs
The following code is compiled with the ldc2 compiler based on LLVM 3.3.1. This swaps two values in-place:

    void swap(T)(ref T x, ref T y) pure nothrow {
        immutable aux = x;
        x = y;
        y = aux;
    }

If I swap uint values I get the asm and IR:

    __D5test611__T4swapTkZ4swapFNaNbNfKkKkZv:
        pushl   %esi
        movl    8(%esp), %ecx
        movl    (%ecx), %edx
        movl    (%eax), %esi
        movl    %esi, (%ecx)
        movl    %edx, (%eax)
        popl    %esi
        ret     $4

    ; Function Attrs: nounwind
    define x86_stdcallcc void @"\01__D5test65swap1FNaNbKkKkZv"(i32*
    entry:
      %tmp = load i32* %x_arg, align 4
      %tmp2 = load i32* %y_arg, align 4
      store i32 %tmp2, i32* %x_arg, align 4
      store i32 %tmp, i32* %y_arg, align 4
      ret void
    }

Often I have a simple struct like this, with a sizeof equal to a size_t or two size_t (a size_t is a 32-bit unsigned integer on this system):

    struct Foo {
        ushort a;
        char b, c;
    }

If I instantiate the swap function template on values of type Foo I get the asm and IR:

    __D5test621__T4swapTS5test63FooZ4swapFNaNbNfKS5test63FooKS5test63FooZv:
        pushl   %edi
        pushl   %esi
        movl    12(%esp), %ecx
        movw    (%ecx), %dx
        movw    2(%ecx), %si
        movl    (%eax), %edi
        movl    %edi, (%ecx)
        movw    %dx, (%eax)
        movw    %si, 2(%eax)
        popl    %esi
        popl    %edi
        ret     $4

    ; Function Attrs: nounwind
    define x86_stdcallcc void @"\01__D5test65swap2FNaNbKS5test63FooKS5test63FooZv"(%test6.Foo*
    entry:
      %0 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 0
      %1 = load i16* %0, align 1
      %2 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 1
      %3 = load i8* %2, align 1
      %4 = getelementptr inbounds %test6.Foo* %x_arg, i32 0, i32 2
      %5 = load i8* %4, align 1
      %6 = bitcast %test6.Foo* %y_arg to i32*
      %7 = bitcast %test6.Foo* %x_arg to i32*
      %8 = load i32* %6, align 1
      store i32 %8, i32* %7, align 1
      %9 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 0
      store i16 %1, i16* %9, align 1
      %10 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 1
      store i8 %3, i8* %10, align 1
      %11 = getelementptr inbounds %test6.Foo* %y_arg, i32 0, i32 2
      store i8 %5, i8* %11, align 1
      ret void
    }

If I define a union Bar containing a 32-bit integer that covers all three Foo fields:

    union Bar {
        uint all;
        struct {
            ushort a;
            char b, c;
        }
    }

Now I can define a new swap function that works on values of type Bar:

    void swap2(ref Bar x, ref Bar y) pure nothrow {
        immutable Bar aux = x;
        x.all = y.all;
        y.all = aux.all;
    }

Its asm and IR are shorter:

    __D5test65swap2FNaNbKS5test63BarKS5test63BarZv:
        pushl   %esi
        movl    8(%esp), %ecx
        movl    (%ecx), %edx
        movl    (%eax), %esi
        movl    %esi, (%ecx)
        movl    %edx, (%eax)
        popl    %esi
        ret     $4

    ; Function Attrs: nounwind
    define x86_stdcallcc void @"\01__D5test65swap3FNaNbKS5test63BarKS5test63BarZv"(%test6.Bar*
    entry:
      %0 = getelementptr inbounds %test6.Bar* %x_arg, i32 0, i32 0
      %1 = load i32* %0, align 1
      %tmp4 = getelementptr %test6.Bar* %y_arg, i32 0, i32 0
      %tmp5 = load i32* %tmp4, align 4
      store i32 %tmp5, i32* %0, align 4
      store i32 %1, i32* %tmp4, align 4
      ret void
    }

In the case of swapping Foos, why isn't LLVM optimizing the swap function to asm as short as that of swap2? I have asked this on the LLVM IRC channel, and aKor told me that for similar C code Clang swaps two Foo using a memcpy, so it uses a single 32-bit copy. So perhaps ldc2 can do the same for this common case.

Bye,
bearophile
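For comparison, the memcpy approach attributed to Clang above can be written by hand in D. This is only a sketch under assumptions: `rawSwap` is a hypothetical helper name, not something ldc or Phobos provides, and it is only sensible for plain-data types with no postblit, destructor or custom opAssign. Whether ldc2 lowers it to a single 32-bit load/store pair for Foo has not been checked here.

```d
import core.stdc.string : memcpy;

struct Foo
{
    ushort a;
    char b, c;
}

// Hypothetical helper (not from the thread): swaps two values by copying
// their raw bytes, bypassing member-wise assignment entirely.
void rawSwap(T)(ref T x, ref T y) nothrow
{
    // memcpy must not be given overlapping ranges, so skip self-swaps.
    if (&x is &y)
        return;

    ubyte[T.sizeof] aux = void;      // uninitialised scratch buffer
    memcpy(aux.ptr, &x, T.sizeof);   // aux = x
    memcpy(&x, &y, T.sizeof);        // x = y
    memcpy(&y, aux.ptr, T.sizeof);   // y = aux
}

unittest
{
    auto f = Foo(1, 'a', 'b');
    auto g = Foo(2, 'c', 'd');
    rawSwap(f, g);
    assert(f == Foo(2, 'c', 'd'));
    assert(g == Foo(1, 'a', 'b'));
}
```

The unittest block runs when the module is compiled with -unittest (adding -main if there is no main function).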
Jan 26 2014
On Sunday, 26 January 2014 at 13:02:50 UTC, bearophile wrote:
> In the case of swapping Foos, why isn't LLVM optimizing the swap function to asm as short as that of swap2? I have asked this on the LLVM IRC channel, and aKor told me that for similar C code Clang swaps two Foo using a memcpy, so it uses a single 32-bit copy. So perhaps ldc2 can do the same for this common case.

Hi bearophile!

In fact, ldc uses llvm.memcpy in the swap function. This is what I get with ldc 0.13.0-alpha1 using LLVM 3.4 on mingw32 with no optimization:

    define weak_odr x86_stdcallcc void @"\01__D4swap20__T4swapTS4swap3FooZ4swapFNaNbNfKS4swap3FooKS4swap3FooZv"(%swap.Foo* inreg %y_arg, %swap.Foo* %x_arg) {
    entry:
      %aux = alloca %swap.Foo, align 2
      %tmp = bitcast %swap.Foo* %aux to i8*
      %tmp1 = bitcast %swap.Foo* %x_arg to i8*
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp, i8* %tmp1, i32 4, i32 1, i1 false)
      %tmp2 = load %swap.Foo* %aux
      %tmp3 = bitcast %swap.Foo* %x_arg to i8*
      %tmp4 = bitcast %swap.Foo* %y_arg to i8*
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp3, i8* %tmp4, i32 4, i32 1, i1 false)
      %tmp5 = load %swap.Foo* %x_arg
      %tmp6 = bitcast %swap.Foo* %y_arg to i8*
      %tmp7 = bitcast %swap.Foo* %aux to i8*
      call void @llvm.memcpy.p0i8.p0i8.i32(i8* %tmp6, i8* %tmp7, i32 4, i32 1, i1 false)
      %tmp8 = load %swap.Foo* %y_arg
      ret void
    }

Using -O2 or -O3, I get IR and asm similar to what you posted. I do not understand this. I'll check what clang is doing here.

Regards,
Kai
Jan 26 2014
On Monday, 27 January 2014 at 07:00:18 UTC, Kai Nacke wrote:
> Using -O2 or -O3, I get IR and asm similar to what you posted. I do not understand this. I'll check what clang is doing here.

The obvious difference between ldc and clang is that clang generates better alignment information. Otherwise, the IR is almost identical.

Regards,
Kai
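To make the alignment point concrete, the size and alignment D assigns to Foo can be checked at compile time. The sketch below is only an illustration: `AlignedFoo` is a hypothetical variant, not something proposed in the thread, and whether a larger declared alignment changes the code ldc emits for the swap is not established here.

```d
struct Foo
{
    ushort a;
    char b, c;
}

// Hypothetical variant with an explicitly requested 4-byte alignment.
align(4) struct AlignedFoo
{
    ushort a;
    char b, c;
}

// Foo is 4 bytes wide but only 2-byte aligned (the alignment of its
// largest field, the ushort), so a 32-bit access to it is not
// guaranteed to be naturally aligned.
static assert(Foo.sizeof == 4);
static assert(Foo.alignof == 2);

// The explicit attribute raises the alignment without changing the size.
static assert(AlignedFoo.sizeof == 4);
static assert(AlignedFoo.alignof == 4);
```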
Jan 26 2014
It would seem that ldc is performing a memberwise assignment. It could probably be optimized away, since it's known at compile time whether the fields have their own assignment overloaded or not. With unions it's straightforward: just a memcpy of the largest member's size (sadly dmd doesn't do that yet, but it also does all sorts of nasty things with unions). With structs it's a little more involved. Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :)
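The compile-time knowledge mentioned here is also visible from user code through std.traits. A minimal sketch, assuming a hypothetical trait name `isPlainCopy` and a hypothetical struct `Tracked` introduced only for illustration; it shows what can be queried in the language, not how ldc decides internally:

```d
import std.traits : hasElaborateAssign, hasElaborateCopyConstructor,
                    hasElaborateDestructor;

struct Foo
{
    ushort a;
    char b, c;
}

// Hypothetical struct with a user-defined assignment operator.
struct Tracked
{
    int v;
    void opAssign(Tracked rhs) { v = rhs.v; }
}

// Hypothetical trait: true when copying T needs no user code to run,
// so a plain byte copy would be equivalent to assignment.
enum bool isPlainCopy(T) = !hasElaborateAssign!T
                        && !hasElaborateCopyConstructor!T
                        && !hasElaborateDestructor!T;

static assert(isPlainCopy!uint);
static assert(isPlainCopy!Foo);
static assert(!isPlainCopy!Tracked);
```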
Jan 27 2014
Stanislav Blinov:
> Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :)

What's strange about this?

    void swap(T)(ref T x, ref T y) pure nothrow {
        immutable aux = x;
        x = y;
        y = aux;
    }

Bye,
bearophile
Jan 27 2014
On Tuesday, 28 January 2014 at 01:39:47 UTC, bearophile wrote:
> Stanislav Blinov:
>> Generally though, pure code generation issues aside, that is one very strange swap function, bearophile :)
>
> What's strange about this?
>
>     void swap(T)(ref T x, ref T y) pure nothrow {
>         immutable aux = x;
>         x = y;
>         y = aux;
>     }

It won't swap references or pointers (due to immutable) or structs with disabled postblit (due to assignment). The solution to the first is simple: immutable -> auto. The second would basically require you to perform the memcpy manually anyway.
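Both limitations can be checked at compile time. The sketch below uses hypothetical names (`swapImmutable`, `swapAuto`, `NoCopy`) chosen for illustration: `swapImmutable` is the function from the thread, and `swapAuto` is the same function with the suggested `auto` temporary.

```d
// The swap from the thread, with an immutable temporary.
void swapImmutable(T)(ref T x, ref T y) pure nothrow
{
    immutable aux = x;
    x = y;
    y = aux;
}

// The same function with the temporary declared `auto`.
void swapAuto(T)(ref T x, ref T y) pure nothrow
{
    auto aux = x;
    x = y;
    y = aux;
}

// Hypothetical struct whose postblit (copying) is disabled.
struct NoCopy
{
    int v;
    @disable this(this);
}

unittest
{
    int a = 1, b = 2;

    // Value types work with the immutable temporary.
    swapImmutable(a, b);
    assert(a == 2 && b == 1);

    // A mutable pointer cannot be converted to an immutable one, so the
    // immutable version does not instantiate for int*.
    int* p = &a, q = &b;
    static assert(!__traits(compiles, swapImmutable(p, q)));

    // The auto version swaps the pointers themselves.
    swapAuto(p, q);
    assert(p is &b && q is &a);

    // Neither version instantiates for a non-copyable struct, because the
    // temporary needs a copy; a byte-wise copy would be required instead.
    NoCopy m, n;
    static assert(!__traits(compiles, swapImmutable(m, n)));
    static assert(!__traits(compiles, swapAuto(m, n)));
}
```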
Jan 28 2014