digitalmars.D.ldc - Operator overloading leads to bad code optimization
- claptrap (16/16) Dec 03 2021 Just a simple function to split a bezier in two.
- max haughton (4/20) Dec 05 2021 This is (to me at least) an odd one. Maybe there's a
- kinke (7/22) Dec 05 2021 With gdc v11.1, I count 69 instructions for split and 51 for
- ClapTrap (5/28) Dec 05 2021 gdc v11.1 doesn't inline the operator calls when I try it, if you
- max haughton (7/36) Dec 05 2021 To make GCC inline properly without LTO you can use
- Ola Fosheim Grøstad (7/9) Dec 06 2021 Multiplying with 0.5 only affects the exponent, but the add could
- max haughton (5/14) Dec 07 2021 My sentence was referring to Iains decision to refuse to inline
Just a simple function to split a bezier in two. Using "-O3":

  LDC, the operator version: 84 instructions
  LDC, the hand-expanded math: 49 instructions

It seems something as simple as this should be better optimised? Or am I missing something?

https://godbolt.org/z/4h9vob3Yo

In fact there are quite a few bits where it looks like completely redundant code is left in? E.g.:

  123 movss dword ptr [rsp - 24], xmm1
  124 movss xmm0, dword ptr [rip + .LCPI4_0]
  125 mulss xmm1, xmm0
  126 movss dword ptr [rsp - 24], xmm1

  137 movss dword ptr [rsp - 24], xmm2
  138 mulss xmm2, xmm0
  139 movss dword ptr [rsp - 24], xmm2
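For readers without access to the Compiler Explorer link, the following is a minimal sketch of the kind of code being compared. It is a hypothetical reconstruction, not the poster's source: the cubic `Bezier` layout, the field names, and the exact arithmetic are assumptions; only the names `Point`, `split` and `split2` appear later in the thread. Whether the original hand-expanded version used the same evaluation order as shown here is not known from the thread.

```d
// Hypothetical sketch; the thread's real source is only on Compiler Explorer.
struct Point
{
    float x, y;

    // Component-wise addition of two points.
    Point opBinary(string op)(Point rhs) const if (op == "+")
    {
        return Point(x + rhs.x, y + rhs.y);
    }

    // Scaling by a scalar.
    Point opBinary(string op)(float s) const if (op == "*")
    {
        return Point(x * s, y * s);
    }
}

// Assumed layout: a cubic Bezier with four control points.
struct Bezier
{
    Point p0, p1, p2, p3;
}

// Operator version: de Casteljau subdivision at t = 0.5.
void split(const Bezier b, out Bezier left, out Bezier right)
{
    const q0 = (b.p0 + b.p1) * 0.5f;
    const q1 = (b.p1 + b.p2) * 0.5f;
    const q2 = (b.p2 + b.p3) * 0.5f;
    const r0 = (q0 + q1) * 0.5f;
    const r1 = (q1 + q2) * 0.5f;
    const s  = (r0 + r1) * 0.5f;
    left  = Bezier(b.p0, q0, r0, s);
    right = Bezier(s, r1, q2, b.p3);
}

// Hand-expanded version: the same math written out per component.
void split2(const Bezier b, out Bezier left, out Bezier right)
{
    const q0x = (b.p0.x + b.p1.x) * 0.5f;
    const q0y = (b.p0.y + b.p1.y) * 0.5f;
    const q1x = (b.p1.x + b.p2.x) * 0.5f;
    const q1y = (b.p1.y + b.p2.y) * 0.5f;
    const q2x = (b.p2.x + b.p3.x) * 0.5f;
    const q2y = (b.p2.y + b.p3.y) * 0.5f;
    const r0x = (q0x + q1x) * 0.5f;
    const r0y = (q0y + q1y) * 0.5f;
    const r1x = (q1x + q2x) * 0.5f;
    const r1y = (q1y + q2y) * 0.5f;
    const sx  = (r0x + r1x) * 0.5f;
    const sy  = (r0y + r1y) * 0.5f;
    left  = Bezier(b.p0, Point(q0x, q0y), Point(r0x, r0y), Point(sx, sy));
    right = Bezier(Point(sx, sy), Point(r1x, r1y), Point(q2x, q2y), b.p3);
}
```

In this sketch both functions perform the same floating-point operations in the same order, so they return identical results; any difference in instruction count would come from how the compiler handles the `Point` temporaries produced by the operators.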
Dec 03 2021
On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
> Just a simple function to split a bezier in two. Using "-O3": LDC, the operator version is 84 instructions; LDC, the hand-expanded math is 49 instructions. It seems something as simple as this should be better optimised? Or am I missing something? https://godbolt.org/z/4h9vob3Yo
> In fact there are quite a few bits where it looks like completely redundant code is left in? [...]

This is (to me at least) an odd one. Maybe there's a pass-ordering issue here leading to bad code. Seems like GCC does not have this issue.
Dec 05 2021
On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>> Just a simple function to split a bezier in two. Using "-O3": LDC, the operator version is 84 instructions; LDC, the hand-expanded math is 49 instructions. It seems something as simple as this should be better optimised? Or am I missing something? https://godbolt.org/z/4h9vob3Yo [...]
> [...] Seems like GCC does not have this issue.

With gdc v11.1, I count 69 instructions for split and 51 for split2 (59 with -O3). So I guess there's a semantic difference here with the slightly changed evaluation order (2D addition before scaling).

With `alias Point = __vector(float[2])`, split is reduced to 28 instructions: https://godbolt.org/z/7ffebjaz8
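The vector alias mentioned above can be sketched as follows. This is an illustration only, not the code behind the link: the `midpoint` helper is a hypothetical name, and the example assumes a compiler and target that accept a two-lane `float` vector (the thread used LDC; other D compilers may reject 64-bit vector types).

```d
// Assumption: the compiler supports 64-bit vectors such as float[2] (LDC does per the thread).
alias Point = __vector(float[2]);

// Hypothetical helper: midpoint of two points using built-in vector arithmetic,
// so no user-defined operator overloads are involved.
Point midpoint(Point a, Point b)
{
    Point half = 0.5f;       // a scalar initializer is broadcast to both lanes
    return (a + b) * half;   // one vector add and one vector multiply
}
```

With this alias the `+` and `*` are native vector operations rather than calls to `opBinary`, which may be why the optimizer has an easier time with it.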
Dec 05 2021
On Sunday, 5 December 2021 at 23:36:21 UTC, kinke wrote:
> On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
>> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>>> Just a simple function to split a bezier in two. Using "-O3": LDC, the operator version is 84 instructions; LDC, the hand-expanded math is 49 instructions. It seems something as simple as this should be better optimised? Or am I missing something? https://godbolt.org/z/4h9vob3Yo [...]
>> [...] Seems like GCC does not have this issue.
> With gdc v11.1, I count 69 instructions for split and 51 for split2 (59 with -O3). So I guess there's a semantic difference here with the slightly changed evaluation order (2D addition before scaling).

gdc v11.1 doesn't inline the operator calls when I try it. If you try an earlier version, 10.2, it does, which reduces it to 48 instructions.

> With `alias Point = __vector(float[2])`, split is reduced to 28 instructions: https://godbolt.org/z/7ffebjaz8

Wow, that's awesome!
Dec 05 2021
On Monday, 6 December 2021 at 00:38:18 UTC, ClapTrap wrote:
> On Sunday, 5 December 2021 at 23:36:21 UTC, kinke wrote:
>> On Sunday, 5 December 2021 at 21:38:55 UTC, max haughton wrote:
>>> On Friday, 3 December 2021 at 21:24:07 UTC, claptrap wrote:
>>>> Just a simple function to split a bezier in two. Using "-O3": LDC, the operator version is 84 instructions; LDC, the hand-expanded math is 49 instructions. It seems something as simple as this should be better optimised? Or am I missing something? https://godbolt.org/z/4h9vob3Yo [...]
>>> [...] Seems like GCC does not have this issue.
>> With gdc v11.1, I count 69 instructions for split and 51 for split2 (59 with -O3). So I guess there's a semantic difference here with the slightly changed evaluation order (2D addition before scaling).
> gdc v11.1 doesn't inline the operator calls when I try it. If you try an earlier version, 10.2, it does, which reduces it to 48 instructions.
>> With `alias Point = __vector(float[2])`, split is reduced to 28 instructions: https://godbolt.org/z/7ffebjaz8
> Wow, that's awesome!

To make GCC inline properly without LTO you can use `-fwhole-program`. Maybe Iain also has a flag that restores the old template behaviour.

These kinds of wacky phase-ordering (I assume) issues are why I am slightly distrustful of GDC's post-inlining decision.
Dec 05 2021
On Monday, 6 December 2021 at 00:41:06 UTC, max haughton wrote:
> These kinds of wacky phase-ordering (I assume) issues are why I am slightly distrustful of GDC's post-inlining decision.

Multiplying with 0.5 only affects the exponent, but the add could overflow/underflow.

Maybe that is wacky for D since it is a shortcut for a set of options. If I specify -O or -O3 I would expect the same options as gcc. Otherwise people will claim that C++ is faster?
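The point about the add overflowing is one reason a compiler cannot freely reorder "add, then scale" into "scale, then add" under strict floating-point rules. A minimal sketch of the difference, using hypothetical function names that do not appear in the thread:

```d
// Average by adding first, then scaling: the sum itself can overflow.
float addThenScale(float a, float b)
{
    return (a + b) * 0.5f;
}

// Average by scaling each operand first, then adding:
// halving only changes the exponent, so the sum stays in range.
float scaleThenAdd(float a, float b)
{
    return a * 0.5f + b * 0.5f;
}
```

For ordinary inputs the two agree. With `a = b = float.max`, `scaleThenAdd` returns `float.max`, while `addThenScale` can overflow to infinity when the sum is evaluated in single precision. D permits intermediate results to be kept at higher precision, so the exact outcome of `addThenScale` may depend on the compiler and target.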
Dec 06 2021
On Monday, 6 December 2021 at 11:55:08 UTC, Ola Fosheim Grøstad wrote:
> On Monday, 6 December 2021 at 00:41:06 UTC, max haughton wrote:
>> These kinds of wacky phase-ordering (I assume) issues are why I am slightly distrustful of GDC's post-inlining decision.
> Multiplying with 0.5 only affects the exponent, but the add could overflow/underflow.
> Maybe that is wacky for D since it is a shortcut for a set of options. If I specify -O or -O3 I would expect the same options as gcc. Otherwise people will claim that C++ is faster?

My sentence was referring to Iain's decision to refuse to inline templates (i.e. defer to LTO). It makes it harder to work out what the compiler is going to do / is doing.
Dec 07 2021