digitalmars.D.learn - No CTFE of function
- Cecil Ward (20/20) Aug 26 2017 I have a pure function that has constant inputs, known at
- ag0aep6g (8/10) Aug 26 2017 That's not how CTFE works. CTFE only kicks in when the *result*
- Cecil Ward (17/27) Aug 26 2017 I think I understand, but I'm not sure. I should have explained
- Cecil Ward (10/40) Aug 26 2017 I was expecting this optimisation to 'return literal constant
- Cecil Ward (3/15) Aug 26 2017 I suspect I posted this in the wrong category completely, should
- Jonathan M Davis via Digitalmars-d-learn (26/71) Aug 26 2017 I don't know what you've seen before, but CTFE _only_ happens when the
- Cecil Ward (2/3) Aug 28 2017 Indeed. I used the term CTFE too loosely.
- ag0aep6g (14/40) Aug 26 2017 I don't know what might prevent the optimization.
- Cecil Ward (44/91) Aug 27 2017 Static had already been tried. Failed. Thanks to your tip, I
- Cecil Ward (10/15) Aug 27 2017 I wonder if there is anything written up anywhere about what
- Mike Parker (23/32) Aug 27 2017 The rules for CTFE are outlined in the docs [1]. What is
- Cecil Ward (5/16) Aug 28 2017 Those links are extremely useful. Many thanks. Because I am full
- Cecil Ward (10/32) Aug 28 2017 I will henceforth use the enum trick advice all times.
I have a pure function that has constant inputs, known at compile time, and contains no funny stuff internally. I looked at the generated code, and there are no RTL calls at all. But in a test call with constant literal values (arrays initialised to literals) passed to the pure routine, GDC refuses to CTFE the whole thing. I would expect it (based on previous experience with D and GDC) to simply generate a trivial function that puts out a block of CTFE-evaluated constant data corresponding to the input.

Unfortunately it's a bit too long to post in here. I've tried lots of variations. The function is marked @nogc @safe pure nothrow.

Any ideas as to why GDC might just refuse to do CTFE on compile-time-known inputs in a truly pure situation? Haven't tried DMD yet. Can try LDC. I am using d.godbolt.org to look at the result, as I don't have a machine here to run a D compiler on.

Other things I can think of: it contains function-in-a-function calls, which are all inlined nicely in the generated code, and this is not the first time I've done that with GDC either.

Switches: I am using -Os or -O2 or -O3 (tried all), tuning to presume and enable the latest x86-64 instructions, release build, no bounds checks.
Aug 26 2017
On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
> Any ideas as to why GDC might just refuse to do CTFE on compile-time-known inputs in a truly pure situation?

That's not how CTFE works. CTFE only kicks in when the *result* is required at compile time, for example when you assign it to an enum. The inputs must be known at compile time, and the interpreter will refuse to go on when you try something impure. But those things don't trigger CTFE.

The compiler may choose to precompute any constant expression, but that's an optimization (constant folding), not CTFE.
Aug 26 2017
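To make the distinction above concrete, here is a minimal sketch that uses D's built-in `__ctfe` variable to report which path was taken. The function name `ranInCtfe` and the variable names are made up for illustration and are not from the thread. The three module-level declarations all need their value at compile time, so the frontend has to interpret the function; the ordinary call in `main` does not, whatever the optimizer later does with it.

```d
// Hypothetical helper: true only while the frontend's CTFE interpreter runs it.
bool ranInCtfe() pure nothrow @nogc @safe
{
    return __ctfe;
}

// Contexts that require the value at compile time, so CTFE is forced:
enum viaEnum = ranInCtfe();               // manifest constant
static immutable viaStatic = ranInCtfe(); // static initializer
static assert(ranInCtfe());               // static assert condition

void main()
{
    // Ordinary call with no compile-time requirement: not CTFE.
    // An optimizer may still fold it, but __ctfe is false on this path.
    bool viaCall = ranInCtfe();

    assert(viaEnum);
    assert(viaStatic);
    assert(!viaCall);
}
```

The point of the sketch is only that the calling context, not the purity or the constant arguments, decides whether CTFE happens.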
On Saturday, 26 August 2017 at 18:16:07 UTC, ag0aep6g wrote:
> On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
>> Any ideas as to why GDC might just refuse to do CTFE on compile-time-known inputs in a truly pure situation?
>
> That's not how CTFE works. CTFE only kicks in when the *result* is required at compile time. [...]

I think I understand, but I'm not sure. I should have explained properly. I suspect what I should have said was that I was expecting an _optimisation_ and I didn't see it. I thought that a specific instance of a call to my pure function that has all compile-time-known arguments would just produce generated code that returned an explicit constant worked out by CTFE calculation, replacing the actual code for the general function entirely.

So for example

auto foo() { return bar( 2, 3 ); }

(where bar is strongly pure and completely CTFE-able) should have been replaced by generated x64 code looking exactly like

auto foo() { return 5; }

except that the returned result would be a fixed-length literal array of 32-bit numbers in my case (no dynamic arrays anywhere; these, I believe, potentially involve RTL calls and the allocator internally).
Aug 26 2017
On Saturday, 26 August 2017 at 23:49:30 UTC, Cecil Ward wrote:
> [...]

I was expecting this optimisation to 'return literal constant only' because I have seen it before in other cases with GDC. Obviously generating a call that involves running the algorithm at runtime is a performance disaster when it certainly could have all been thrown away in the particular case in point and been replaced by a return of a precomputed value with zero runtime cost.

So this is actually an issue with specific compilers, but I was wondering if I have missed anything about any D general rules that make CTFE evaluation practically impossible?
Aug 26 2017
On Saturday, 26 August 2017 at 23:53:36 UTC, Cecil Ward wrote:
> [...]

I suspect I posted this in the wrong category completely; it should have been under GDC (possibly applies to LDC too, will test that).
Aug 26 2017
On Saturday, August 26, 2017 23:53:36 Cecil Ward via Digitalmars-d-learn wrote:
> I was expecting this optimisation to 'return literal constant only' because I have seen it before in other cases with GDC. [...]

I don't know what you've seen before, but CTFE _only_ happens when the result must be known at compile time - e.g. it's used to directly initialize an enum or static variable. You will _never_ see CTFE done simply because you called the function with literals.

It's quite possible that GDC's optimizer could inline the function and do constant folding and significantly reduce the code that you actually end up with (maybe even optimize it out entirely in some cases), but it would not be CTFE. It would simply be the compiler backend optimizing the code. CTFE is done by the frontend, and it's the same across dmd, ldc, and gdc so long as they have the same version of the frontend (though the current version of gdc is quite old, so if anything, it's behind on what it can do).

So, if you want CTFE to occur, then you _must_ assign the result to something that must have its value known at compile time, and that will be the same across the various compilers so long as the frontend version is the same. Any optimizations which might optimize out function calls would be highly dependent on the compiler backend and could easily differ across compiler versions.

My guess is that you previously saw your code optimized down such that you thought that the compiler used CTFE when it didn't and that you're not seeing such an optimization now, because your function is too large.

If you want to guarantee that the call is made at compile time and not worry about whether the optimizer will do what you want, just assign the result to an enum and then use the enum rather than hoping that the optimizer will optimize the call out for you.

- Jonathan M Davis
Aug 26 2017
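As a rough illustration of the "static variable" route mentioned above, applied to a fixed-length array of 32-bit values like the one the original question describes: the function `makeTable`, its contents, and the names `precomputed` and `lookup` are invented for this sketch, since the real routine was never posted. Because a `static immutable` initializer must be known at compile time, the frontend evaluates the call, and the same result is expected across compilers sharing a frontend version.

```d
// Hypothetical stand-in for the thread's pure routine: fills a fixed-size
// array from a seed. No dynamic arrays, no allocation.
uint[8] makeTable(uint seed) pure nothrow @nogc @safe
{
    uint[8] table;
    foreach (i, ref entry; table)
        entry = seed * cast(uint) i;
    return table;
}

// The initializer of a static immutable must be a compile-time value,
// so this call goes through CTFE rather than relying on the optimizer.
static immutable uint[8] precomputed = makeTable(3);

// Checked by the compiler itself, before any code is generated.
static assert(precomputed[0] == 0);
static assert(precomputed[7] == 21);

// Runtime code only reads the precomputed data.
uint lookup(size_t index) pure nothrow @nogc @safe
{
    return precomputed[index];
}

void main()
{
    assert(lookup(2) == 6);
    // The same function is still callable at run time and agrees with the table.
    assert(makeTable(3) == precomputed);
}
```

Compared with an enum, a `static immutable` array has a single address in the binary, which may matter for larger tables; that choice is a design preference, not something stated in the thread.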
On Sunday, 27 August 2017 at 00:08:45 UTC, Jonathan M Davis wrote:
> [...]

Indeed. I used the term CTFE too loosely.
Aug 28 2017
On 08/27/2017 01:53 AM, Cecil Ward wrote:
> I was expecting this optimisation to 'return literal constant only' because I have seen it before in other cases with GDC. [...]

I don't know what might prevent the optimization.

You can force (actual) CTFE with an enum or static variable. Then you don't have to rely on the optimizer. And the compiler will reject the code if you try something that can't be done at compile time.

Example:

----
auto foo()
{
    enum r = bar(2, 3);
    return r;
}
----

Please don't use the term "CTFE" for the optimization. The two are related, of course. The optimizer may literally evaluate functions at compile time. But I think we had better reserve the acronym "CTFE" for the guaranteed/forced kind of precomputation, to avoid confusion.
Aug 26 2017
On Sunday, 27 August 2017 at 00:20:47 UTC, ag0aep6g wrote:
> [...]
> You can force (actual) CTFE with an enum or static variable. [...]

Static had already been tried. Failed. Thanks to your tip, I tried enum next. Failed as well; it wouldn't compile with GDC.

I tried LDC, which did the right thing in all cases. It optimised correctly in every use case to not compute in the generated code, and just returned the literal compile-time-calculated result array by writing a load of immediate values straight to the destination. Hurrah for LDC.

Then I tried DMD via the web-based edit/compile feature on the dlang.org website. It refused to compile in the enum case and actually told me why, in a very, very cryptic way. I worked out that it has a problem internally (this is now an assignment into an enum, so I have permission to use the term CTFE now) in that it refuses to do CTFE if any variable is declared using an =void initialiser. I use =void to stop the wasteful huge pre-fill with zeros, which could take half an hour on a large object with slow memory and for all I know play havoc with the cache. So simply deleting the =void fixed the problem with DMD.

So that's it. There are unknown random internal factors that prevent CTFE or CTFE-type optimisation.

I had wondered if pointers might present a problem. The function in question originally was specced something like

pure nothrow @nogc @safe void pure_compute( result_t * p_result, in input_t x )

and just as a test, I tried changing it to

result_t pure_compute( in input_t x )

instead. I don't think it makes any difference though. I discovered the DMD =void thing at that point, so this was not checked out properly.

Your enum tip was very helpful.

PS, GDC errors: another thing that has wasted a load of time is that GDC signals errors on lines where there is a function call that is fine, yet the only problem is in the body of the function that is _being_ called, and fixing the function makes the phantom error at the call site go away. This nasty behaviour has you looking for errors at and before the call site, or thinking you have the spec of the call args wrong or incorrect types.

[Compiler Explorer problem: I am perhaps blaming GDC unfairly, because I have only ever used it through the telescope that is d.godbolt.org, and I am assuming that it reports errors on the correct source lines. It doesn't show error message text though, which is a nightmare, but that is nothing to do with the compiler, obviously.]
Aug 27 2017
On Sunday, 27 August 2017 at 17:36:54 UTC, Cecil Ward wrote:
> Static had already been tried. Failed. Thanks to your tip, I tried enum next. Failed as well, wouldn't compile with GDC. [...]

I wonder if there is anything written up anywhere about what kinds of things are blockers to either CTFE or to successful constant-folding optimisation, in particular compilers or in general?

It would be useful to know what to stay away from if you really need to make sure that horrendously slow code does not get run at runtime. Sometimes it is possible or even relatively easy to reorganise things and do without certain practices in order to win such a massive reward.
Aug 27 2017
On Sunday, 27 August 2017 at 17:47:54 UTC, Cecil Ward wrote:
> I wonder if there is anything written up anywhere about what kinds of things are blockers to either CTFE or to successful constant-folding optimisation in particular compilers or in general? [...]

The rules for CTFE are outlined in the docs [1]. What is described there is all there is to it. If those criteria are not met, the function cannot be executed at compile time. More importantly, as mentioned earlier in the thread, CTFE will only occur if a function *must* be executed at compile time, i.e. it is in a context where the result of the function is required at compile time. An enum declaration is such a situation; a variable initialization is not.

There are also a couple of posts on the D Blog. Stefan has written about the new CTFE engine [2] and I posted something showing a compile-time sort [3]. These illustrate the points laid out in the documentation.

As for compiler optimizations, there are some basic optimizations that will be common across all compilers, and you can google for compiler optimizations to find such generalities. Many of these apply across languages, and those specific to the C-family languages will likely be found in D compilers. Beyond that, I'm unaware of any documentation that outlines optimizations in D compilers.

[1] https://dlang.org/spec/function.html#interpretation
[2] https://dlang.org/blog/2017/04/10/the-new-ctfe-engine/
[3] https://dlang.org/blog/2017/06/05/compile-time-sort-in-d/
Aug 27 2017
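In the spirit of the compile-time sort mentioned above, here is a small hand-written sketch rather than the code from the blog post. The function `sortedCopy` and the sample values are assumptions for illustration; it uses a plain insertion sort on a fixed-size array so that it needs no library imports. Assigning the result to an enum puts the call in a context where the value is required at compile time, and the `static assert` lets the compiler check the outcome.

```d
// Hypothetical example function: insertion sort on a by-value fixed-size array.
int[5] sortedCopy(int[5] values) pure nothrow @nogc @safe
{
    foreach (i; 1 .. values.length)
    {
        immutable key = values[i];
        size_t j = i;
        // Shift larger elements one slot to the right.
        while (j > 0 && values[j - 1] > key)
        {
            values[j] = values[j - 1];
            --j;
        }
        values[j] = key;
    }
    return values;
}

// Enum initializer: the result is required at compile time, so CTFE runs.
enum sorted = sortedCopy([3, 1, 2, 4, 0]);
static assert(sorted == [0, 1, 2, 3, 4]);

void main()
{
    // The same function remains an ordinary runtime function as well.
    int[5] runtimeInput = [9, 7, 8, 6, 5];
    assert(sortedCopy(runtimeInput) == [5, 6, 7, 8, 9]);
}
```

If the function body used something the interpreter cannot handle, the enum line would be rejected with a compile error rather than silently deferred to run time, which is the guarantee discussed in this thread.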
On Monday, 28 August 2017 at 03:16:24 UTC, Mike Parker wrote:
> The rules for CTFE are outlined in the docs [1]. [...]

Those links are extremely useful. Many thanks. Because I am full of NHS pain drugs, I am pretty confused half the time, and so finding documentation is difficult for me through the haze, so much appreciated. RTFM of course applies as always.
Aug 28 2017
On Saturday, 26 August 2017 at 16:52:36 UTC, Cecil Ward wrote:
> [...]

I will henceforth use the enum trick at all times.

I noticed that the problem with the =void initialiser is compiler-dependent. Using an enum for real CTFE, I don't get error messages from the LDC or GDC x64 compilers (i.e. the [old?] versions currently up on d.godbolt.org) even if I do use the =void optimisation. In the particular unit test case I had, =void saved a totally wasteful and pointless zero-fill of 64 bytes using 2 YMM instructions, but of course the cost of an unnecessary fill could easily be dramatically worse, depending on the size of the array.
Aug 28 2017