digitalmars.D - inlining...
- Manu (57/57) Mar 13 2014 So, I'm constantly running into issues with not having control over inli...
- John Colvin (13/106) Mar 14 2014 As much as I like the idea:
- w0rp (9/22) Mar 14 2014 I think it's possible for a programmer to make a better decision
- duh (15/27) Mar 14 2014 No compiler gets this right 100% of the time, so if it is the
- Ethan (16/17) Mar 14 2014 If all methods are virtual by default, how can the compiler
- Manu (35/46) Mar 14 2014 The compiler applies generalised heuristics, which are certainly for the
- John Colvin (13/94) Mar 14 2014 Thanks for the explanations.
- bearophile (7/16) Mar 14 2014 If the function is private in a module, and it's called only from
- Manu (16/32) Mar 14 2014 Yup, this is a classic example. Extremely relevant.
- Nick Sabalausky (26/34) Mar 14 2014 I don't know how this compares to other inliners, but FWIW, DMD's
- Chris Williams (4/46) Mar 14 2014 Probably one easy adjustment that would result in a lot of gain
- Jacob Carlborg (33/86) Mar 14 2014 Haven't we already agreed a pragma for force inline should be
- Michel Fortin (8/11) Mar 14 2014 Maybe, but what does it do? Should it just inline the call to func? Or
- Jacob Carlborg (4/7) Mar 14 2014 I guess Manu needs to answer this one.
- Manu (3/12) Mar 14 2014 I'd say it should inline only func. Any sub-calls are subject to the
- "Ola Fosheim Grøstad" (15/18) Mar 18 2014 I agree with you that explicit inlining is absolutely necessary
- Manu (41/54) Mar 18 2014 Inlining is a basic codegen tool, and it's important that low-level
- "Ola Fosheim Grøstad" (9/28) Mar 18 2014 You could provide it with a recursion level parameter or
- Manu (34/57) Mar 19 2014 Again, I think this is significantly overcomplicating something which se...
- "Ola Fosheim Grøstad" (14/31) Mar 19 2014 Not if you delay optimization until profiling and focus on higher
- Manu (22/53) Mar 19 2014 Okay, do you have use cases for any of this stuff? Are you just making i...
- "Ola Fosheim Grøstad" (10/17) Mar 19 2014 I don't need anything, I hand optimize prematurely. And I don't
- Manu (20/35) Mar 19 2014 The problem is upside down. If you want to inline multiple levels, you
- "Ola Fosheim Grøstad" (45/77) Mar 20 2014 Yes, that is true in cases where leaves are frequently visited.
- "Ola Fosheim Grøstad" (12/12) Mar 20 2014 I just want to add these reasons for having inlining despite
- Manu (44/112) Mar 20 2014 I'm sorry. I really can't support any of these wildly complex ideas. I j...
- "Ola Fosheim Grøstad" (7/16) Mar 20 2014 They aren't actually complex, except tail-call optimization (but
- "Ola Fosheim Grøstad" (3/3) Mar 20 2014 Please note that 1 level mixin is not sufficient in the case of
- Manu (16/19) Mar 20 2014 I don't think I would ever want to inline the whole call tree of a libra...
- ponce (20/28) Mar 20 2014 It looks promising when seen like that, but introducing explicit
- "Ola Fosheim Grøstad" (13/16) Mar 20 2014 The rules aren't random. The inliner conceptually use weighting
- Puming (20/102) Mar 23 2014 Maybe we could have both declare site inlining and call site
- Jacob Carlborg (15/20) Mar 20 2014 Can't you create a tuple with different attributes depending on which
- Manu (16/116) Mar 14 2014 It's been discussed. I never agreed to it (I _really_ don't like it), bu...
- Daniel Murphy (4/19) Mar 14 2014 Huh? The pragma could easily be restricted to apply to exactly one func...
- Manu (7/29) Mar 14 2014 Then why bother with a pragma?
- Manu (6/38) Mar 17 2014 *bump*
- Andrei Alexandrescu (8/44) Mar 17 2014 I'd like to see a solution to inlining along the lines of "pliz pliz
- Manu (7/13) Mar 17 2014 I don't follow, how does that work?
- Andrei Alexandrescu (6/20) Mar 17 2014 The notion that a compiler can ask for any function to be inlined
- Manu (7/33) Mar 18 2014 Exactly, we already have it in CTFE. It doesn't really add any new conce...
- David Gileadi (4/20) Mar 14 2014 Is there already some trait for getting the string value of a function
- Paulo Pinto (4/28) Mar 14 2014 Might be problematic with modules delivered only in .di + binary form.
- David Gileadi (3/34) Mar 14 2014 This would need to behave similarly.
So, I'm constantly running into issues with not having control over inline. I've run into it again doing experiments in preparation for my dconf talk...

I have identified 2 cases which come up regularly:
1. A function that should always be inline unconditionally (std.simd is effectively blocked on this)
2. A particular invocation of a function should be inlined for this call only

The first case is just about having control over code gen. Some functions should effectively be macros or pseudo-intrinsics (ie, intrinsic wrappers in std.simd, beauty wrappers around asm code, etc), and I don't ever want to see a symbol appear in the binary. My suggestion is introduction of __forceinline or something like it. We need this.

The second case is interesting, and I've found it comes up a few times on different occasions. In my current instance, I'm trying to build a generic framework to perform efficient composable data processing, and a basic requirement is that the components are inlined, such that the optimiser can interleave the work properly. Let's imagine I have a template which implements a work loop, which wants to call a bunch of work elements it receives by alias. The issue is, each of those must be inlined, for this call instance only, and there's no way to do this. I'm gonna draw the line at stringified code to use with mixin; I hate that, and I don't want to encourage use of mixin or stringified code in user-facing APIs as a matter of practice. Also, some of these work elements might be useful functions in their own right, which means they can indeed be a function existing somewhere else that shouldn't itself be attributed as __forceinline.

What are the current options to force that some code is inlined?

My feeling is that an ideal solution would be something like an enhancement which would allow the 'mixin' keyword to be used with regular function calls. What this would do is 'mix in' the function call at this location, ie, effectively inline that particular call, and it leverages a keyword and concept that we already have. It would obviously produce a compile error if the code is not available.

I quite like this idea, but there is a potential syntactical problem; how to assign the return value?

    int func(int y) { return y*y+10; }

    int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
    int output = mixin(func(10)); // now i feel paren spammy...
    mixin(int output = func(10)); // this doesn't feel right...

My feeling is the first is the best, but I'm not sure about that grammatically.

The other thing that comes to mind is that it seems like this might make a case for AST macros... but I think that's probably overkill for this situation, and I'm not confident we're ever gonna attempt to crack that nut. I'd like to see something practical and unobjectionable preferably.

This problem is fairly far reaching; phobos receives a lot of lambdas these days, which I've found don't reliably inline and interfere with the optimiser's ability to optimise the code. There was some discussion about a code unrolling API some time back, and this would apply there (the suggested solution used string mixins! >_<).

Debug build performance is a problem which would be improved with this feature.
Mar 13 2014
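For readers who want the second case in concrete form, here is a minimal sketch of a work loop that receives its work elements by alias. It is not Manu's actual framework: `scale`, `offset` and `process` are hypothetical names chosen for illustration, and `static foreach` is used on the assumption of a reasonably recent compiler. The point is only that nothing at the call site can demand that the work elements be inlined into the loop.

```d
// Hypothetical names throughout; a sketch of the scenario, not the real code.

// Two work elements that are also useful functions in their own right.
int scale(int x) { return x * 3; }
int offset(int x) { return x + 10; }

// A work loop that receives its work elements by alias.
// Whether scale/offset are inlined into this loop is left entirely to the
// compiler's heuristics; the caller has no way to require it for this call.
int[] process(work...)(const(int)[] input)
{
    auto result = new int[](input.length);
    foreach (i, x; input)
    {
        int acc = x;
        // Expanded at compile time: one call per work element, in order.
        static foreach (j; 0 .. work.length)
            acc = work[j](acc);
        result[i] = acc;
    }
    return result;
}

unittest
{
    assert(process!(scale, offset)([1, 2, 3]) == [13, 16, 19]);
}
```

With DMD this can be tried with `dmd -unittest -main -run file.d`.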
On Friday, 14 March 2014 at 06:21:27 UTC, Manu wrote:
> ...

As much as I like the idea: Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.

I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand pick your optimisation passes.

In short: why are compilers not good enough at this that the programmer needs to be involved?
Mar 14 2014
On Friday, 14 March 2014 at 08:03:04 UTC, John Colvin wrote:
> ...

I think it's possible for a programmer to make a better decision about what to do than a compiler. Clearly the compiler isn't smart enough to make the right decisions for Manu now, so I think it would be acceptable to at least insert functionality to give him that control now until the compiler can. There is the question of whether or not it's possible for a compiler to make the right decisions in the right places, but I'm not experienced enough to address that.
Mar 14 2014
> Something always tells me this is the compiler's job...
> ...
> In short: why are compilers not good enough at this that the programmer needs to be involved?

No compiler gets this right 100% of the time, so if it is the compiler's job they are failing. Most C++ compilers will sometimes require use of forceinline with SSE intrinsics. Unless it has PGO support the compiler has no idea about the runtime usage of that code. It wouldn't know which code the program spends 90% of its time in, so it just applies general heuristics when deciding to inline.

What I'd like is the ability to set an inline level per function. Something like 0 being always inline, and 10 being never inline. Unless specified otherwise, the default would be 5.

So if you want forceinline behavior:

    inline(0) vec3 dot(vec3 a, vec3 b); //always inlined
    inline(10) vec3 cross(vec3 a, vec3 b); //never inlined

And override it at the call site:

    inline(10) auto v = dot(a,b);
Mar 14 2014
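For comparison with the `inline(N)` proposal above: the D language as later specified has `pragma(inline, true)` and `pragma(inline, false)`, which correspond roughly to the two ends of that scale at the declaration site. The sketch below assumes a compiler that supports that pragma; `Vec3`, `dot` and `cross` are hypothetical stand-ins for the `vec3` example. As far as I know there is no call-site form, so the call-site override remains hypothetical.

```d
// Minimal sketch; Vec3, dot and cross are illustrative names only.
struct Vec3 { float x, y, z; }

pragma(inline, true)   // request that calls to this function always be inlined
float dot(Vec3 a, Vec3 b)
{
    return a.x * b.x + a.y * b.y + a.z * b.z;
}

pragma(inline, false)  // request that this function never be inlined
Vec3 cross(Vec3 a, Vec3 b)
{
    return Vec3(a.y * b.z - a.z * b.y,
                a.z * b.x - a.x * b.z,
                a.x * b.y - a.y * b.x);
}

unittest
{
    auto a = Vec3(1, 0, 0);
    auto b = Vec3(0, 1, 0);
    assert(dot(a, b) == 0);
    auto c = cross(a, b);
    assert(c.x == 0 && c.y == 0 && c.z == 1);
}
```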
On Friday, 14 March 2014 at 08:03:04 UTC, John Colvin wrote:
> Something always tells me this is the compiler's job

If all methods are virtual by default, how can the compiler inline the code? Properties are a great example where I'd want to both final and inline them in quite a few cases. In those cases, the existence of inline would negate the need for final entirely because being a virtual method would never come into the equation. This would also apply to UFCS functions, which I use to wrap D types such as strings into C++ interface vtables without making the programmer jump through a bunch of hoops.

Inline in Microsoft's compiler is always considered a strong hint. There are cases where even __forceinline won't actually inline a function if the compiler decides you're on crack. I assume this would be the case here, and you'd just be helping inform the compiler what you want inlined in case it slips up and gets it wrong.
Mar 14 2014
On 14 March 2014 18:03, John Colvin <john.loughran.colvin gmail.com> wrote:
> ...

The compiler applies generalised heuristics, which are certainly for the 'common' case, whatever that happens to be. The compiler simply doesn't know what you're doing, so it's very hard for the compiler to do anything really intelligent.

Inlining heuristics are fickle, and they also don't know what you're actually trying to do. Is a function 'long'? How long is 'long'? Is the function 'hot'? Do we prefer code size or execution speed? Is the function called only from this location, or is it used in many locations? Etc.

Inlining is one of the most fuzzy pieces of logic in the compiler, and relies on a lot of information that is impossible for the compiler to deduce, so it applies heuristics to try and do a decent job, but it's certainly not perfect. I argue, nothing so fickle can exist in the language without having a manual override. Especially not in a native language.

In my current case, the functions I need to inline are not exactly trivial. They're really pushing the boundaries of the compiler's inliner heuristics, and then I'm calling a series of such functions that operate on parallel data. If they don't inline, the performance equals the sum of the functions plus some overhead. If they all inline, the performance is equal to only the longest one, and no overhead (the others fill in pipeline gaps). Further, some of these functions embed some shared work... if they don't inline, this work is repeated. If they do inline, the redundant repeated work is eliminated.

My experiments with std.algorithm were a failure. I realised quickly that I couldn't rely on the inliner to do a satisfactory job, and the optimiser was unable to do its job properly. std.algorithm could really benefit from the mixin suggestion since things like predicate functions are always trivial, usually supplied as little lambdas, and inlining isn't reliable. Especially in the debug builds. Something like algorithm loop sugar shouldn't run heaps worse than an explicit loop just because it happens to be implemented by a generic function.
Mar 14 2014
On Friday, 14 March 2014 at 11:04:34 UTC, Manu wrote:
> On 14 March 2014 18:03, John Colvin <john.loughran.colvin gmail.com> wrote:
>> ...
>
> ...

Thanks for the explanations.

Another use case is to aid propagation of compile-time information for optimisation. A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.

I don't know how good compilers are at taking this sort of thing into account already.
Mar 14 2014
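To make the loop-length point concrete, here is a small sketch under stated assumptions: `sumFirst` and `sumFirstStatic` are hypothetical names, and the comments describe what an optimiser can in principle see, not what any particular compiler actually does. The first function only learns its trip count if the call is inlined (or otherwise specialised); the second gets it as a template argument, so the constant is available regardless of inlining.

```d
// Hypothetical example; the function names are illustrative only.

// Runtime length: unless this call is inlined, `n` is just an opaque
// parameter inside the function body.
int sumFirst(const(int)[] data, size_t n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += data[i];
    return total;
}

// Compile-time length: the trip count is a template argument, so it is a
// constant inside the function whether or not the call is inlined.
int sumFirstStatic(size_t n)(const(int)[] data)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += data[i];
    return total;
}

unittest
{
    auto data = [1, 2, 3, 4, 5];
    assert(sumFirst(data, 4) == 10);
    assert(sumFirstStatic!4(data) == 10);
}
```

The template version trades one generic function for one instantiation per length, which is the usual cost of pushing the constant into the type system instead of relying on the inliner.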
John Colvin:
> Another use case is to aid propagation of compile-time information for optimisation. A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.

If the function is private in a module, and it's called only from one point (or otherwise the loop count is the same in different calls), I think this optimization can be performed even if the function is not inlined.

Bye,
bearophile
Mar 14 2014
On 14 March 2014 22:02, John Colvin <john.loughran.colvin gmail.com> wrote:
> Thanks for the explanations.
>
> Another use case is to aid propagation of compile-time information for optimisation. A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.

Yup, this is a classic example. Extremely relevant. And it's precisely the sort of thing that an inline heuristic is likely to fail at.

> I don't know how good compilers are at taking this sort of thing into account already.

I don't know if they try or not, but I can say from experience that results are generally unreliable. I would never depend on the inliner to get this right.

On 14 March 2014 22:08, bearophile <bearophileHUGS lycos.com> wrote:
> John Colvin: ...
>
> If the function is private in a module, and it's called only from one point (or otherwise the loop count is the same in different calls), I think this optimization can be performed even if the function is not inlined.

This is probably true, but I would never rely on it. You have some carefully tuned code that works well, and then one day, some random unrelated thing tweaks a balance, and your previously good code is suddenly slow for unknown reasons.

The point is, there are times when you know your code should be inlined; ie, it's not an 'optimisation', it's an expectation/requirement. A programmer needs to be able to express this.
Mar 14 2014
On 3/14/2014 8:37 AM, Manu wrote:
> On 14 March 2014 22:02, John Colvin <john.loughran.colvin gmail.com> wrote:
>> I don't know how good compilers are at taking this sort of thing into account already.
>
> I don't know if they try or not, but I can say from experience that results are generally unreliable. I would never depend on the inliner to get this right.

I don't know how this compares to other inliners, but FWIW, DMD's inliner is pretty simple (By coincidence, I was just digging into it the other day):

Every expression node (ie non-statement, non-declaration) in the function's AST adds 1 to the cost of inlining (so ex: 1+2*3 would have a cost of 2 - one mult, plus one addition). If the total cost is under 250, the function is inlined.

Also, any type of AST node that isn't explicitly handled in inline.c will prevent a function from ever being inlined (since the inliner doesn't know how to inline it). I assume this is probably after lowerings are done, though, so more advanced constructs probably don't need to be explicitly handled.

There is one other minor difficulty worth noting: When DMD wants to inline a function call, and the function's return value is actually used (ex: "auto x = foo();" or "1 + foo()"), the function must get inlined as an expression. Unfortunately, AIUI, a lot of D's statements can't be implemented inside an expression ATM (such as loops), so these would currently prevent such a function call from being inlined.

I don't know how easy or difficult that would be to fix. Conceptually it should be simple: Create an Expression type StatementExp to wrap a Statement as an expression. But other parts of the backend would probably need to know about it, and I'm unfamiliar with the rest of the backend, so have no idea what that would/wouldn't entail.

Not that it can't be done (AFAIK), but since the subject came up I thought I'd give a brief overview of the current DMD inliner, just FWIW.
Mar 14 2014
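The cost rule Nick describes can be sketched as a toy model. This is not DMD's code: `Expr`, `lit`, `op`, `cost` and `wouldInline` are hypothetical names, and the model assumes that only operator nodes count and leaves cost nothing, since that is what makes 1+2*3 come out at 2 as in his example.

```d
// Toy model only; not taken from DMD's inline.c.
final class Expr
{
    bool isOperator;   // true for nodes like + and *, false for leaves
    Expr[] operands;

    this(bool isOperator, Expr[] operands = null)
    {
        this.isOperator = isOperator;
        this.operands = operands;
    }
}

Expr lit() { return new Expr(false); }                            // a literal leaf
Expr op(Expr lhs, Expr rhs) { return new Expr(true, [lhs, rhs]); } // a binary operator

// Each operator node adds 1; the cost of a tree is the sum over its nodes.
int cost(const Expr e)
{
    int total = e.isOperator ? 1 : 0;
    foreach (child; e.operands)
        total += cost(child);
    return total;
}

enum costLimit = 250;  // the limit quoted in the post above

// A body is considered inlinable when its total cost is under the limit.
bool wouldInline(const Expr functionBody)
{
    return cost(functionBody) < costLimit;
}

unittest
{
    auto e = op(lit(), op(lit(), lit()));  // models 1 + 2 * 3
    assert(cost(e) == 2);
    assert(wouldInline(e));
}
```

A fixed count like this knows nothing about hot paths or call frequency, which is the gap the thread is arguing about.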
On Friday, 14 March 2014 at 22:12:38 UTC, Nick Sabalausky wrote:
> ...

Probably one easy adjustment that would result in a lot of gain in optimization would be to bump the cost threshold of 250 if the function is an operator overload.
Mar 14 2014
On 2014-03-14 07:21, Manu wrote:
> 1. A function that should always be inline unconditionally (std.simd is effectively blocked on this)
> ...
> My suggestion is introduction of __forceinline or something like it. We need this.

Haven't we already agreed that a pragma for force inline should be implemented? Or is that something I have dreamed?

> int func(int y) { return y*y+10; }
> int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output

I think this is the best syntax of these three alternatives.

> int output = mixin(func(10)); // now i feel paren spammy...

This syntax can't work. It's already interpreted as calling "func" and using the result as a string mixin.

> mixin(int output = func(10)); // this doesn't feel right...

No.

> My feeling is the first is the best, but I'm not sure about that grammatically.

Yeah, I agree.

> The other thing that comes to mind is that it seems like this might make a case for AST macros... but I think that's probably overkill for this situation, and I'm not confident we're ever gonna attempt to crack that nut. I'd like to see something practical and unobjectionable preferably.

AST macros would solve it. It could solve the first use case as well. I would not implement AST macros just to support force inline but we have many other use cases as well. I would have implemented AST macros a long time ago. Hopefully this would avoid the need to create new language features in some cases.

First use case, just define a macro that returns the AST for the content of the function you would create.

    macro func (Ast!(int) a)
    {
        return <[ $a * $a; ]>;
    }

    int output = func(10); // always inlined

Second use case, define a macro, "inline", that takes the function you want to call as a parameter. The macro will basically inline the body.

    macro inline (T, U...) (Ast!(T function (U)) func)
    {
        // this would probably be more complicated
        return func.body;
    }

    int output = func(10); // not inlined
    int output = inline(func(10)); // always inlined

> This problem is fairly far reaching; phobos receives a lot of lambdas these days, which I've found don't reliably inline and interfere with the optimisers ability to optimise the code.

I thought since lambdas are passed as template parameters they would always be inlined.

-- 
/Jacob Carlborg
Mar 14 2014
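The objection to the `mixin(func(10))` spelling can be seen with D's existing string mixins: the argument is evaluated at compile time and the returned string is then compiled as code. The following is only a minimal sketch of that existing behaviour; `makeDecl` is a hypothetical helper invented for illustration and is not taken from the thread.

```d
import std.conv : to;

// Hypothetical helper: builds D source text. It is called at compile time
// (CTFE) when used as the argument of a string mixin.
string makeDecl(int y)
{
    return "int output = " ~ to!string(y * y + 10) ~ ";";
}

void main()
{
    // Existing meaning of mixin(...): evaluate makeDecl(10) during
    // compilation, then compile the resulting string as a declaration.
    mixin(makeDecl(10));
    assert(output == 110);
}
```

Because this form already means "call the function and compile its string result", reusing it for call-site inlining would collide with current code.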
On 2014-03-14 17:57:59 +0000, Jacob Carlborg <doob me.com> said:
>> int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get
> I think this is the best syntax of these three alternatives.

Maybe, but what does it do? Should it just inline the call to func? Or should it inline recursively every call inside func? Or maybe something in the middle?

-- 
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca
Mar 14 2014
On 2014-03-14 19:02, Michel Fortin wrote:
> Maybe, but what does it do? Should it just inline the call to func? Or should it inline recursively every call inside func? Or maybe something in the middle?

I guess Manu needs to answer this one.

-- 
/Jacob Carlborg
Mar 14 2014
On 15 March 2014 04:02, Michel Fortin <michel.fortin michelf.ca> wrote:
> On 2014-03-14 17:57:59 +0000, Jacob Carlborg <doob me.com> said:
>>> int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get
>> I think this is the best syntax of these three alternatives.
> Maybe, but what does it do? Should it just inline the call to func? Or should it inline recursively every call inside func? Or maybe something in the middle?

I'd say it should inline only func. Any sub-calls are subject to the regular inline heuristics.
Mar 14 2014
On Saturday, 15 March 2014 at 04:17:06 UTC, Manu wrote:
> I'd say it should inline only func. Any sub-calls are subject to the regular inline heuristics.

I agree with you that explicit inlining is absolutely necessary and that call site inlining is highly desirable. However, I think that the call-site inlining should inline as much as possible. Basically this is something you will try when the code is too slow to meet real time deadlines and you hope to avoid going for a hand optimized solution in order to cut down on dev time. That suggests aggressive inlining to me.

If the inlining only goes one level then I don't think this will be used frequently enough to be useful, e.g. you can just create one inline version and then a non-inline version that calls the inline version. E.g.:

  noninline_func(){ inline_func();}

Ola.
Mar 18 2014
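The wrapper pattern described above can be sketched as follows. This is an assumption-laden illustration: it relies on `pragma(inline, ...)`, which D did not have when this thread was written, and the function names are hypothetical. Whether a given compiler honours the hints is up to that compiler.

```d
// Assumption: a compiler that supports pragma(inline); this pragma
// post-dates the thread and was not available to the participants.

// Declaration-site request: inline this function wherever it is called.
pragma(inline, true)
int inlineFunc(int y)
{
    return y * y + 10;
}

// The proxy: asks not to be inlined itself. Its body calls inlineFunc,
// which carries the always-inline request.
pragma(inline, false)
int nonInlineFunc(int y)
{
    return inlineFunc(y);
}

void main()
{
    assert(inlineFunc(10) == 110);
    assert(nonInlineFunc(10) == 110);
}
```

Note that this only expresses the choice per function, not per call site, which is the gap the `mixin func(10)` proposal is aimed at.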
On 18 March 2014 23:11, <7d89a89974b0ff40.invalid internationalized.invalid> wrote:
> On Saturday, 15 March 2014 at 04:17:06 UTC, Manu wrote:
>> I'd say it should inline only func. Any sub-calls are subject to the regular inline heuristics.
> I agree with you that explicit inlining is absolutely necessary and that call site inlining is highly desirable. However, I think that the call-site inlining should inline as much as possible. Basically this is something you will try when the code is too slow to meet real time deadlines and you hope to avoid going for a hand optimized solution in order to cut down on dev time. That suggests aggressive inlining to me.

Inlining is a basic codegen tool, and it's important that low-level programmers have tight control over this aspect of the compiler's codegen. I think it's a mistake to consider it an optimisation, unless you know precisely what you're doing.

I wouldn't want to see it try and forcibly inline the whole tree; there's no reason to believe that the whole tree should be inlined 100% of the time, rather, it's almost certainly not the case. In the case you do want to inline the whole tree, you can just cascade the mixin through the stack. In the case you suggest which flattens the tree by default, we've lost control; how to tell it only to do it for one level without hacks? And I believe this is the common case.

For example, it's very likely that you might require a function to inline that is relatively trivial in its own right; a wrapper or a macro effectively, but conditionally calls an expensive function, or perhaps calls a function that you don't have source for (it would break at that point if it tried to inline the tree).

> If the inlining only goes one level then I don't think this will be used frequently enough to be useful, e.g. you can just create one inline version and then a non-inline version that calls the inline version.

As the one that requested it, I have numerous uses for it to mixin just the one level. I can't imagine any uses where I would ever want to explicitly inline the whole tree, and not be happy to cascade it manually.

> E.g.:
> noninline_func(){ inline_func();}

Why? This is really overcomplicating a simple thing. And I'm not quite sure what you're suggesting this should do either. Are you saying the call tree is flattened behind this proxy non-inline function? I don't think that's useful.

I don't think anything would/should be marked __alwaysinline unless you REALLY mean that it has literally no business being called. Ie, marking something __alwaysinline just for the sake of wrapping it with a non-inline is the wrong thing to do.

Just to reiterate, inline is a tool, not an 'optimisation'. It doesn't necessarily yield faster code, in many situations it is slower, and best left to the compiler to decide. But it's an important tool for any low-level programmer to have. D must provide a sufficient suite of low-level tools that allow proper control over the code generation.

I think as a tool, it should be deliberate and conservative in approach, ie, just one level, and let the programmer cascade it if that's what they mean to do. There should be no surprises with something like this, and if it's inlining a whole call tree, you often don't know what happens further down the tree, and it's more likely to change on you unexpectedly.
Mar 18 2014
On Wednesday, 19 March 2014 at 01:28:48 UTC, Manu wrote:
> In the case you do want to inline the whole tree, you can just cascade the mixin through the stack. In the case you suggest which flattens the tree by default, we've lost control; how to tell it only to do it for one level without hacks? And I believe this is the common case.

You could provide it with a recursion level parameter or parameters for cost level heuristics. It could also be used to flatten tail-call recursion.

> As the one that requested it, I have numerous uses for it to mixin just the one level. I can't imagine any uses where I would ever want to explicitly inline the whole tree, and not be happy to cascade it manually.

In innerloops to factor out common subexpressions that are otherwise recomputed over and over and over. When the function is generated code (not hand written).

>> noninline_func(){ inline_func();}
> Why? This is really overcomplicating a simple thing. And I'm not quite sure what you're suggesting this should do either. Are you saying the call tree is flattened behind this proxy non-inline function?

No, I am saying that the one level mixin doesn't provide you with anything new. You already have that. It is sugar.
Mar 18 2014
On 19 March 2014 16:18, <7d89a89974b0ff40.invalid internationalized.invalid> wrote:
> On Wednesday, 19 March 2014 at 01:28:48 UTC, Manu wrote:
>> In the case you do want to inline the whole tree, you can just cascade the mixin through the stack. In the case you suggest which flattens the tree by default, we've lost control; how to tell it only to do it for one level without hacks? And I believe this is the common case.
> You could provide it with a recursion level parameter or parameters for cost level heuristics.

Again, I think this is significantly overcomplicating something which I see as being extremely simple.

> It could also be used to flatten tail-call recursion.

I don't think it's valid to inline a tail call recursion, because the inlined call also wants to inline another call to itself... You can't know how far it should go, so it needs to be transformed into a loop, and now we're talking about something completely different than inlining.

>> As the one that requested it, I have numerous uses for it to mixin just the one level. I can't imagine any uses where I would ever want to explicitly inline the whole tree, and not be happy to cascade it manually.
> In innerloops to factor out common subexpressions that are otherwise recomputed over and over and over.

This is highly context sensitive. I would trust the compiler heuristics to make the right decision here. The idea of eliminating common sub-expressions suggests that there _are_ common sub-expressions, which aren't affected by the function arguments. This case is highly unusual in my experience. And I personally wouldn't depend on a feature such as this to address that sort of a problem in my codegen. I would just refactor the function a little bit to call the common sub-expression ahead of time.

> When the function is generated code (not hand written).

I'm not sure what you mean here?

>>> noninline_func(){ inline_func();}
>> Why? This is really overcomplicating a simple thing. And I'm not quite sure what you're suggesting this should do either. Are you saying the call tree is flattened behind this proxy non-inline function?
> No, I am saying that the one level mixin doesn't provide you with anything new.

It really does provide something new. It provides, effectively, a type-safe implementation of something that may be used in place of C/C++ macros. I think that would be extremely useful in a variety of applications.

> You already have that. It is sugar.

I don't already have it, otherwise I'd be making use of it. D has no control over the inliner. GDC/LDC offer attributes, but then it's really annoying that D has no mechanism to make use of compiler-specific attributes in a portable way (ie, attribute aliasing), so I can't make use of those without significantly interfering with my code.

I also don't think that suggestion of yours works. I suspect the compiler will see the outer function as a trivial wrapper which will fall within the compiler's normal inline heuristics, and it will all inline anyway.
Mar 19 2014
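The distinction drawn above between inlining and turning a tail call into a loop can be illustrated with a small sketch. The functions are hypothetical and chosen only for illustration; the loop version is written by hand here to show the shape of the transformation, not what any particular compiler emits.

```d
// Tail-recursive form: the recursive call is the last thing the function
// does. Inlining it one level just exposes another call to sumTo.
int sumTo(int n, int acc = 0)
{
    return n == 0 ? acc : sumTo(n - 1, acc + n);
}

// Loop form: the tail call becomes "update the parameters and repeat".
// This is a different transformation from inlining.
int sumToLoop(int n)
{
    int acc = 0;
    while (n != 0)
    {
        acc += n;
        n -= 1;
    }
    return acc;
}

void main()
{
    assert(sumTo(100) == 5050);
    assert(sumToLoop(100) == 5050);
}
```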
On Wednesday, 19 March 2014 at 08:35:53 UTC, Manu wrote:
> The idea of eliminating common sub-expressions suggests that there _are_ common sub-expressions, which aren't affected by the function arguments. This case is highly unusual in my experience.

Not if you delay optimization until profiling and focus on higher level structures during initial implementation. Or use composing (like generic programming). If you hand optimize right from the start then you might be right, but if you never call a function with the same parameters then you are doing premature optimization IMHO.

>> When the function is generated code (not hand written).
> I'm not sue what you mean here?

Code that is generated by a tool (or composable templates or whatever) tends to be repetitive and suboptimal. I.e. boiler plate code that looks like it was written by a monkey…

>> You already have that. It is sugar.
> I don't already have it, otherwise I'd be making use of it. D has no control over the inliner.

I meant that if you have explicit inline hints like C++ then you also have call-site inlining if you want to.

> I also don't think that suggestion of yours works. I suspect the compiler will see the outer function as a trivial wrapper which will fall within the compilers normal inline heuristics, and it will all inline anyway.

That should be considered a bug if it is called from more than one location.
Mar 19 2014
On 19 March 2014 19:16, <7d89a89974b0ff40.invalid internationalized.invalid> wrote:
> On Wednesday, 19 March 2014 at 08:35:53 UTC, Manu wrote:
>> The idea of eliminating common sub-expressions suggests that there _are_ common sub-expressions, which aren't affected by the function arguments. This case is highly unusual in my experience.
> Not if you delay optimization until profiling and focus on higher level structures during initial implementation. Or use composing (like generic programming). If you hand optimize right from the start then you might be right, but if you never call a function with the same parameters then you are doing premature optimization IMHO.

Okay, do you have use cases for any of this stuff? Are you just making it up, or do you have significant experience to say this is what you need? I can say for a fact, that recursive inline would make almost everything I want it for much more annoying. I would find myself doing stupid stuff to fight the recursive inliner in every instance.

>>> When the function is generated code (not hand written).
>> I'm not sue what you mean here?
> Code that is generated by a tool (or composable templates or whatever) tend to be repetitive and suboptimal. I.e. boiler plate code that looks like it was written by a monkey…

I'm not sure where the value is... why would you want to inline this?

>>> You already have that. It is sugar.
>> I don't already have it, otherwise I'd be making use of it. D has no control over the inliner.
> I meant that if you have explicit inline hints like C++ then you also have call-site inlining if you want to.

I still don't follow. C++ doesn't have call-site inlining. C/C++ has macros, and there is no way to achieve the same functionality in D right now, that's a key motivation for the proposal.

>> I also don't think that suggestion of yours works. I suspect the compiler will see the outer function as a trivial wrapper which will fall within the compilers normal inline heuristics, and it will all inline anyway.
> That should be considered a bug if it is called from more than one location.

Seriously, you're making 'inline' about 10 times more complicated than it should ever be. If you ask me, I see no value in recursive inlining, in fact, that would anger me considerably.

By making this hard, you're also making it equally unlikely. Let inline exist first, then if/when it doesn't suit your use cases, argue for the details.
Mar 19 2014
On Wednesday, 19 March 2014 at 12:35:30 UTC, Manu wrote:
> Okay, do you have use cases for any of this stuff? Are you just making it up, or do you have significant experience to say this is what you need?

I don't need anything, I hand optimize prematurely. And I don't want to do that. But yes, inner loops benefit from exhaustive inlining because you get to move common expressions out of the loop or change them to delta increments. It is only when you trash the caches that inlining does not pay off. I do it by hand. I don't want to do it by hand.

> If you ask me, I have no value in recursive inlining, infact, that would anger me considerably.

Why? You could always set the depth to 1, or make 1 the default. And it isn't difficult to implement.
Mar 19 2014
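The inner-loop argument above can be sketched with a small hand-written example of moving a loop-invariant expression out of a loop, which is the kind of rewrite that becomes visible to an optimiser once the callee's body is inlined. The functions `scale`, `sumScaled` and `sumScaledHoisted` are hypothetical names used only for illustration.

```d
// The callee recomputes (a * a + b) on every call, although that part
// does not depend on x.
int scale(int x, int a, int b)
{
    return x * (a * a + b);
}

// As written: one call per element, invariant recomputed each time.
int sumScaled(const int[] xs, int a, int b)
{
    int total = 0;
    foreach (x; xs)
        total += scale(x, a, b);
    return total;
}

// By hand: with the body of scale visible in the loop, the invariant
// sub-expression can be computed once, before the loop.
int sumScaledHoisted(const int[] xs, int a, int b)
{
    const k = a * a + b;
    int total = 0;
    foreach (x; xs)
        total += x * k;
    return total;
}

void main()
{
    assert(sumScaled([1, 2, 3], 2, 1) == 30);
    assert(sumScaledHoisted([1, 2, 3], 2, 1) == 30);
}
```

Whether a compiler performs this rewrite on its own depends on its heuristics, which is the point under dispute in the thread.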
On 20 March 2014 06:23, <7d89a89974b0ff40.invalid internationalized.invalid> wrote:
> On Wednesday, 19 March 2014 at 12:35:30 UTC, Manu wrote:
>> Okay, do you have use cases for any of this stuff? Are you just making it up, or do you have significant experience to say this is what you need?
> I don't need anything, I hand optimize prematurely. And I don't want to do that. But yes, inner loops benefits from exhaustive inlining because you get to move common expressions out of the loop or change them to delta increments. It is only when you trash the caches that inlining does not pay off. I do it by hand. I don't want to do it by hand.
>> If you ask me, I have no value in recursive inlining, infact, that would anger me considerably.
> Why? You could always set the depth to 1, or make 1 the default. And it isn't difficult to implement.

The problem is upside down. If you want to inline multiple levels, you start from the leaves and move downwards, not from the root moving upwards (leaving a bunch of leaves perhaps not inlined), which is what you're really suggesting.

Inlining should be strictly deliberate, there's nothing to say that every function called in a tree should be inlined. There's a high probability there's one/some that shouldn't be among a few that should.

Remember too, that call-site inlining isn't the only method, there would also be always-inline. I think always-inline is what you want for some decidedly trivial functions (although these will probably be heuristically inlined anyway), not call-site inlining. I just don't see how recursive call-site inlining is appropriate, considering that call trees are often complex, subject to change, and may even call functions that you don't have source for. You can cascade the mixin keyword if you want to, that's very simple.

I'd be highly surprised if you ever encountered a call tree where you wanted to inline everything (and the optimiser didn't do it for you). As soon as you encounter a single function in the tree that shouldn't be inlined, then you'll be forced to do it one level at a time anyway.
Mar 19 2014
On Thursday, 20 March 2014 at 02:08:16 UTC, Manu wrote:
> The problem is upside down. If you want to inline multiple levels, you start from the leaves and move downwards, not from the root moving upwards

Yes, that is true in cases where leaves are frequently visited. Good point. I am most interested in full inlining, but the heuristics should probably start with the leaves for people not interested in that. Agree.

Anyway, in the case of ray tracing (or any search structure) I could see the value of having the opposite in combination with CTFE/partial evaluation. Example: Define a static scene (of objects) and let the compiler turn it into "a state machine" of code. Another example: Define an array of data, use partial evaluation to turn it into a binary tree, then turn the binary tree into code.

> Inlining should be strictly deliberate, there's nothing to say that every function called in a tree should be inlined. There's a high probability there's one/some that shouldn't be among a few that should.

In the case of a long running loop it does not really matter. What it does get you is a chance to use generic code (or libraries) and then do a first-resort optimization. I basically see it as a time-saving feature (programmer's time). A tool for cutting development costs.

> Remember too, that call-site inlining isn't the only method, there would also be always-inline...

Yes, that is the first. I have in another thread some time ago suggested a solution that uses weighted inlining to aid compiler heuristics:

http://forum.dlang.org/thread/szjkyfpnachnnyknnfwp forum.dlang.org#post-szjkyfpnachnnyknnfwp:40forum.dlang.org

As you can see I also suggested call-site inlining, so I am fully behind you in this. :-) Lack of inlining and GC are my main objections to D.

> I think always-inline is what you want for some decidedly trivial functions (although these will probably be heuristically inlined anyway), not call-site inlining.

I agree. Compiler heuristics can change. It is desirable to be able to express intent no matter what the current heuristics are.

> I just don't see how recursive call-site inlining is appropriate, considering that call trees are often complex, subject to change, and may even call functions that you don't have source for.

You should not use it blindly.

> You can cascade the mixin keyword if you want to, that's very simple.

Not if you build the innerloop using generic components. I want this:

  inline_everything while(condition){ statement; statement; }

> I'd be highly surprised if you ever encountered a call tree where you wanted to inline everything (and the optimiser didn't do it for you).

Not if you move to high-level programming using prewritten code and only go low level after profiling.

> As soon as you encounter a single function in the tree that shouldn't be inlined, then you'll be forced to do it one level at a time anyway.

But then you have to change the libraries you are using!? Nothing prevents you from introducing exceptions as an extension though. I want inline(0.5) as default, but also be able to write inline(1) for inline always and inline(0) for inline never.

  func1(){} // implies inline(0.5) weighting
  inline func2(){} // same as inline(1) weighting, inline always
  inline(0.75) fun31(){} // increase the heuristics weighting
  inline(0) func4(){} // never-ever inline

Ola.
Mar 20 2014
I just want to add these reasons for having inlining despite having compiler heuristics:

1. If you compile for embedded or PNACL on the web, you want a small executable. That means the heuristics should not inline if it increases the code size unless the programmer specified it in the code. (Or that you specify a target size, and do compiler re-runs until it fits.)

2. If you use profile guided optimization you should inline based on call frequency, but the input set might have missed some scenarios and you should be able to overrule the profile by explicit inlining in code where you know that it matters. (e.g. tight loop in an exception handler)
Mar 20 2014
I'm sorry. I really can't support any of these wildly complex ideas. I just don't feel they're useful, and they're not very well founded.

A numeric weight? What scale is it in? I'm not sure of any 'standard-inline-weight-measure' that any programmer would be able to intuitively gauge the magic number against. That will simply never be agreed by the devs. It also doesn't make much sense... different platforms will assign very different weights and different heuristics at the inliner. It's not a numeric quantity; it's a complex determination whether a function is a good candidate or not. The value you specify is likely highly context sensitive and probably not portable. Heuristic-based inlining should be left to the optimiser to decide.

And I totally object to recursive inlining. It has a kind of absolute nature that removes control all the way down the call tree, and I don't feel it's likely that you would often (ever?) want to explicitly inline an entire call tree. If you want to inline a second level, then write mixin in the second level. Recurse.

You are talking about generic code as if this isn't appropriate, but I specifically intend to use this in generic code very similar to what you suggest; so I don't see the incompatibility. I think you're saying that manually specifying it all the way down the call tree is inconvenient, but I would argue that manually specifying *exclusions* throughout the call tree after specifying a recursive inline is even more inconvenient. It requires more language (a feature to mark an exclusion), has a kind of obtuse double-negative logic about it, and it's equally invasive to your code.

If you can prove that single level call-site inlining doesn't satisfy your needs at some later time, make a proposal then, along with your real-world use cases. But by throwing it in this thread right now, you're kinda just killing the thread, and making it very unlikely that anything will happen at all, which is annoying, because I REALLY need this (I've been trying to motivate inline support for over 3 years), and I get the feeling you're just throwing hypotheticals around.

You're still fairly new here, but be aware that feature requests will become exponentially less likely to be accepted with every degree of complexity added. By making this seem hard, you're also making it almost certain not to happen, which isn't in either of our interests.

My OP suggestion is the simplest solution I can conceive which will definitely satisfy all the real-world use cases that I've ever encountered. It's predictable, portable, simple.
Mar 20 2014
On Thursday, 20 March 2014 at 12:31:33 UTC, Manu wrote:
> I'm sorry. I really can't support any of these wildly complex ideas.

They aren't actually complex, except tail-call optimization (but that is well understood).

> If you want to inline a second level, then write mixin in the second level.

You might as well do copy-paste then. You cannot add inlining to an imported library without modifying it.

> at all, which is annoying, because I REALLY need this (I've been trying to motivate inline support for over 3 years), and I get the feeling you're just throwing hypotheticals around.

You need inlining, agree, but not 1 level mixin. Because you can do that with regular inlining.
Mar 20 2014
Please note that 1 level mixin is not sufficient in the case of libraries. In too many cases you will not inline the function that does the work, only the interface wrapper.
Mar 20 2014
On 21 March 2014 00:10, <7d89a89974b0ff40.invalid internationalized.invalid> wrote:
> Please note that 1 level mixin is not sufficient in the case of libraries. In too many cases you will not inline the function that does the work, only the interface wrapper.

I don't think I would ever want to inline the whole call tree of a library. I've certainly never wanted to do anything like that in 20 years or so, and I've worked on some really performance critical systems, like Amiga, Dreamcast, PS2.

It still sounds really sketchy. If the function that does the work is a few levels deep, then there is probably a good reason for that. What if there's an error check that writes log output or something? Or some branch that leads to other uncommon paths?

I think you're making this problem up. Can you demonstrate where this has been a problem for you in the past? The call tree would have to be so very particular for this to be appropriate, and then you say this is a library, which you have no control over... so the call tree is just perfect by chance? What if the library changes?
Mar 20 2014
On Thursday, 20 March 2014 at 08:35:22 UTC, Ola Fosheim Grøstad wrote:
> Nothing prevents you to introduce exceptions as an extension though. I want inline(0.5) as default, but also be able to write inline(1) for inline always and inline(0) for inline never.
>
>   func1(){} // implies inline(0.5) weighting
>   inline func2(){} // same as inline(1) weighting, inline always
>   inline(0.75) fun31(){} // increase the heuristics weighting
>   inline(0) func4(){} // never-ever inline

It looks promising when seen like that, but introducing explicit inlining/deinlining, to me, corresponds to a precise process:

1. Bottleneck is identified.
2. "we could {inline|deinline} this call at this particular place and see what happens"
3. Apply inline directive for this call. Only "always" or "never" is ever wanted for me, and for 1 level only.
4. Measure and validate like all optimizations.

Now after this, even if the inlining becomes harmful for other reasons, I want this inlining to be maintained, whatever the cost, not subject to random rules I don't know of. When you tweak inlining, you are supposed to know what you are doing, and it's not just an optimization, it's an essential tool that enables other optimizations: it helps disambiguate aliasing, helps the auto-vectorizer, helps constant propagation... In the large majority of cases it can be left to the compiler, and in the 1% of cases that matter I want to do it explicitly, full stop.
Mar 20 2014
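The "measure and validate" step of the process above can be sketched with Phobos's `std.datetime.stopwatch.benchmark`. This is only an illustrative harness under stated assumptions: the functions being compared are hypothetical, the hand-written variant stands in for "the call was inlined", and real results depend heavily on compiler and flags, so no timings are shown.

```d
import std.datetime.stopwatch : benchmark;
import std.stdio : writeln;

int squarePlusTen(int y) { return y * y + 10; }

// Variant 1: goes through a function call (the compiler may or may not
// inline it on its own).
int sumViaCall(int n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += squarePlusTen(i);
    return total;
}

// Variant 2: the body written out by hand at the call site.
int sumByHand(int n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += i * i + 10;
    return total;
}

void main()
{
    int sink; // accumulates results so the work is observably used

    void viaCall() { sink += sumViaCall(1_000); }
    void byHand()  { sink += sumByHand(1_000); }

    // Run each variant 10_000 times and report the elapsed time of each.
    auto results = benchmark!(viaCall, byHand)(10_000);
    writeln("via call: ", results[0]);
    writeln("by hand:  ", results[1]);
    writeln("checksum: ", sink);
}
```

Comparing the two timings before and after changing an inlining decision is one simple way to validate step 4; a profiler would give a more detailed picture.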
On Thursday, 20 March 2014 at 15:26:35 UTC, ponce wrote:
> Now after this, even if the inlining become harmful for other reasons, I want this inlining to be maintained, whatever the cost, not subject to random rules I don't know of. When you

The rules aren't random. The inliner conceptually uses weighting anyway, you just increase the threshold for a specific call-tree. E.g. if a function is on the borderline of being inlined the probability is 50% if you add some noise to the selection with a magnitude that equals the "typical approximation error" of the heuristics. "inline(0.75)" should increase the probability to 75%. Today all functions have an implied "inline(0.5)".

I think you should have this kind of control for all compiler heuristics thresholds that are "arbitrary", not only inlining.

Call site inlining is primarily useful for inlining external code. The alternative is usually to replace libraries with your own version.
Mar 20 2014
Maybe we could have both declare site inlining and call site inlining. with declare site, what we mean is that this function's body is used so commonly that we make it into a function only because we don't want duplicate code, not because it should be a standalone function. with call site inlining, one can inline thirdparty functions which is not declared inline. I think the `inline` Manu suggested should not be viewed as a mere optimization thing, but more like a code generation utility which happens to be faster. In this point of view, this kind of `inline` should be controlled by the coder, not the compiler. To make it clear that we are not talking about optimization, maybe we should call it another name, like 'mixin function'? BTW, the Kotlin language recently get a new released, which added support for declare site force inline, the team argues its necessity here: http://blog.jetbrains.com/kotlin/2014/03/m7-release-available/#more-1439 in the comments:It’s traditional to think about inlining as a mere optimization, but this dates back to the times >when software was shipped as one huge binary file. Why we think inline should be a language feature: 1. Other language features (to be implemented soon) depend on it. Namely, non-local returns >and type-dependent functions. Basically, inline functions are very restricted macros, and thisOn Thursday, 20 March 2014 at 02:08:16 UTC, Manu wrote:is definitely a language feature.2. Due to dynamic linking and binary compatibility issues it can not be up to the compiler >whether to inline something or not on the JVM: if bodies of inline functions change, all >dependent code should be recompiled, i.e. it’s the library author’s liability to preserve >functionality, so such functions must be explicitly marked.On 20 March 2014 06:23, <7d89a89974b0ff40.invalid internationalized.invalid>wrote:On Wednesday, 19 March 2014 at 12:35:30 UTC, Manu wrote:The problem is upside down. 
If you want to inline multiple levels, you start from the leaves and move downwards, not from the root moving upwards (leaving a bunch of leaves perhaps not inlined), which is what you're really suggesting. Inlining should be strictly deliberate, there's nothing to say that every function called in a tree should be inlined. There's a high probability there's one/some that shouldn't be among a few that should.

Remember too, that call-site inlining isn't the only method, there would also be always-inline. I think always-inline is what you want for some decidedly trivial functions (although these will probably be heuristically inlined anyway), not call-site inlining. I just don't see how recursive call-site inlining is appropriate, considering that call trees are often complex, subject to change, and may even call functions that you don't have source for. You can cascade the mixin keyword if you want to, that's very simple. I'd be highly surprised if you ever encountered a call tree where you wanted to inline everything (and the optimiser didn't do it for you). As soon as you encounter a single function in the tree that shouldn't be inlined, then you'll be forced to do it one level at a time anyway.

Okay, do you have use cases for any of this stuff? Are you just making it up, or do you have significant experience to say this is what you need?

I don't need anything, I hand optimize prematurely. And I don't want to do that. But yes, inner loops benefit from exhaustive inlining because you get to move common expressions out of the loop or change them to delta increments. It is only when you trash the caches that inlining does not pay off. I do it by hand. I don't want to do it by hand.

If you ask me, I have no value in recursive inlining, in fact, that would anger me considerably.

Why? You could always set the depth to 1, or make 1 the default. And it isn't difficult to implement.
Mar 23 2014
On 2014-03-19 09:35, Manu wrote:
> I don't already have it, otherwise I'd be making use of it. D has no control over the inliner. GDC/LDC offer attributes, but then it's really annoying that D has no mechanism to make use of compiler-specific attributes in a portable way (ie, attribute aliasing), so I can't make use of those without significantly interfering with my code.

Can't you create a tuple with different attributes depending on which compiler is currently compiling? Something like this:

    version (LDC)
        alias attributes = TypeTuple!(attribute("forceinline"));
    else version (GDC)
        alias attributes = TypeTuple!(attribute("forceinline"));
    else version (DigitalMars)
        alias attributes = TypeTuple!();
    else
        static assert(false);

    @(attributes) void foo () { } // This assumes that "attributes" will be expanded

-- 
/Jacob Carlborg
Mar 20 2014
On 15 March 2014 03:57, Jacob Carlborg <doob me.com> wrote:
> On 2014-03-14 07:21, Manu wrote:
>> So, I'm constantly running into issues with not having control over inline. I've run into it again doing experiments in preparation for my dconf talk...
>> I have identified 2 cases which come up regularly:
>> 1. A function that should always be inline unconditionally (std.simd is effectively blocked on this)
>> 2. A particular invocation of a function should be inlined for this call only
>> The first case is just about having control over code gen. Some functions should effectively be macros or pseudo-intrinsics (ie, intrinsic wrappers in std.simd, beauty wrappers around asm code, etc), and I don't ever want to see a symbol appear in the binary. My suggestion is introduction of __forceinline or something like it. We need this.
> Haven't we already agreed a pragma for force inline should be implemented. Or is that something I have dreamed?

It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get. I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.

>> The second case is interesting, and I've found it comes up a few times on different occasions. In my current instance, I'm trying to build generic framework to perform efficient composable data processing, and a basic requirement is that the components are inlined, such that the optimiser can interleave the work properly.
>> Let's imagine I have a template which implements a work loop, which wants to call a bunch of work elements it receives by alias. The issue is, each of those must be inlined, for this call instance only, and there's no way to do this. I'm gonna draw the line at stringified code to use with mixin; I hate that, and I don't want to encourage use of mixin or stringified code in user-facing API's as a matter of practise. Also, some of these work elements might be useful functions in their own right, which means they can indeed be a function existing somewhere else that shouldn't itself be attributed as __forceinline.
>> What are the current options to force that some code is inlined?
>> My feeling is that an ideal solution would be something like an enhancement which would allow the 'mixin' keyword to be used with regular function calls. What this would do is 'mix in' the function call at this location, ie, effectively inline that particular call, and it leverages a keyword and concept that we already have. It would obviously produce a compile error of the code is not available.
>> I quite like this idea, but there is a potential syntactical problem; how to assign the return value?
>>
>>     int func(int y) { return y*y+10; }
>>     int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
> I think this is the best syntax of these three alternatives.
>>     int output = mixin(func(10)); // now i feel paren spammy...
> This syntax can't work. It's already interpreted as calling "func" and using the result as a string mixin.
>>     mixin(int output = func(10)); // this doesn't feel right...
> No.
>> My feeling is the first is the best, but I'm not sure about that grammatically.
> Yeah, I agree.

So you think it's grammatically okay?

>> The other thing that comes to mind is that it seems like this might make a case for AST macros... but I think that's probably overkill for this situation, and I'm not confident we're ever gonna attempt to crack that nut. I'd like to see something practical and unobjectionable preferably.
> AST macros would solve it. It could solve the first use case as well. I would not implement AST macros just to support force inline but we have many other use cases as well. I would have implemented AST macros a long time ago. Hopefully this would avoid the need to create new language features in some cases.
> First use case, just define a macro that returns the AST for the content of the function you would create.
>
>     macro func (Ast!(int) a) { return <[ $a * $a; ]>; }
>     int output = func(10); // always inlined
>
> Second use case, define a macro, "inline", that takes the function you want to call as a parameter. The macro will basically inline the body.
>
>     macro inline (T, U...) (Ast!(T function (U)) func) { // this would probably be more complicated
>         return func.body;
>     }
>     int output = func(10); // not inlined
>     int output = inline(func(10)); // always inlined
>
>> This problem is fairly far reaching; phobos receives a lot of lambdas these days, which I've found don't reliably inline and interfere with the optimisers ability to optimise the code.
> I thought since lambdas are passed as template parameters they would always be inlined.

Maybe... (and not in debug builds). Without explicit control of the inliner, you just never know.
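The objection to the `mixin(func(10))` spelling can be illustrated with today's string mixins. This is a minimal sketch rather than anything from the thread: `squarePlusTenSrc` is a hypothetical helper that returns D source text, to show that `mixin(...)` already means "evaluate the argument at compile time and compile the resulting text".

```d
import std.conv : to;

// Hypothetical helper: returns D source text, not a number.
string squarePlusTenSrc(int y)
{
    return y.to!string ~ " * " ~ y.to!string ~ " + 10";
}

// The ordinary function from the thread, for comparison.
int func(int y) { return y * y + 10; }

void main()
{
    // The argument is evaluated through CTFE and the resulting string,
    // "10 * 10 + 10", is compiled as an expression at this point.
    int viaMixin = mixin(squarePlusTenSrc(10));
    assert(viaMixin == func(10));
}
```

Because the parenthesised form is already taken by this meaning, a call-site inlining syntax would need a different spelling, which is why the paren-free `mixin func(10)` was preferred in the exchange above.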
Mar 14 2014
"Manu" <turkeyman gmail.com> wrote in message news:mailman.128.1394856947.23258.digitalmars-d puremagic.com...
>> Haven't we already agreed a pragma for force inline should be implemented. Or is that something I have dreamed?
> It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get. I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.

Huh? The pragma could easily be restricted to apply to exactly one function declaration, if that's what's desired.
Mar 14 2014
On 15 March 2014 14:33, Daniel Murphy <yebbliesnospam gmail.com> wrote:
> "Manu" <turkeyman gmail.com> wrote in message news:mailman.128.1394856947.23258.digitalmars-d puremagic.com...
>>> Haven't we already agreed a pragma for force inline should be implemented. Or is that something I have dreamed?
>> It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get. I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.
> Huh? The pragma could easily be restricted to apply to exactly one function declaration, if that's what's desired.

Then why bother with a pragma? It's just a special case for the sake of a special case... I don't see why resist the language conventions. Where's the precedent for that? It just sounds like it's asking to cause edge cases and trouble down the line. Is it gonna get messy when it's involved with templates? What about methods, sub-functions?
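For readers comparing the two positions: D compilers released after this thread gained `pragma(inline, ...)`, and D pragmas in general can be attached to one declaration, to a braced block, or to everything that follows a colon. The sketch below assumes such a compiler; the function names are hypothetical and only illustrate the scoping difference being argued about.

```d
// Attached to exactly one declaration.
pragma(inline, true)
int addOne(int x) { return x + 1; }

// Block form: applies to the declarations inside the braces only.
pragma(inline, true)
{
    int twice(int x) { return x * 2; }
    int thrice(int x) { return x * 3; }
}

struct Ops
{
    // Colon form: applies to every declaration that follows in this scope.
    // This is the "stateful" style objected to above, because a function
    // added later lands under the pragma without any diagnostic.
    pragma(inline, true):
    static int negate(int x) { return -x; }
    static int laterAddition(int x) { return x + 100; } // also affected
}

void main()
{
    assert(addOne(1) == 2);
    assert(twice(2) == 4 && thrice(2) == 6);
    assert(Ops.negate(3) == -3);
    assert(Ops.laterAddition(1) == 101);
}
```

The single-declaration form is what "restricted to apply to exactly one function declaration" would look like; the colon form is the case where a later addition is silently affected.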
Mar 14 2014
On 15 March 2014 14:55, Manu <turkeyman gmail.com> wrote:
> On 15 March 2014 14:33, Daniel Murphy <yebbliesnospam gmail.com> wrote:
>> "Manu" <turkeyman gmail.com> wrote in message news:mailman.128.1394856947.23258.digitalmars-d puremagic.com...
>>>> Haven't we already agreed a pragma for force inline should be implemented. Or is that something I have dreamed?
>>> It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get. I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.
>> Huh? The pragma could easily be restricted to apply to exactly one function declaration, if that's what's desired.
> Then why bother with a pragma? It's just a special case for the sake of a special case... I don't see why resist the language conventions. Where's the precedent for that? It just sounds like it's asking to cause edge cases and trouble down the line. Is it gonna get messy when it involves with templates? What about methods, sub-functions?

*bump*

I actually care about this a whole lot more than final-by-default right now ;)
I'd like to think there's a possible solution to these problems that everyone agrees with.
Mar 17 2014
On 3/17/14, 6:26 AM, Manu wrote:
> On 15 March 2014 14:55, Manu <turkeyman gmail.com> wrote:
>> On 15 March 2014 14:33, Daniel Murphy <yebbliesnospam gmail.com> wrote:
>>> "Manu" <turkeyman gmail.com> wrote in message news:mailman.128.1394856947.23258.digitalmars-d puremagic.com...
>>>>> Haven't we already agreed a pragma for force inline should be implemented. Or is that something I have dreamed?
>>>> It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get. I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.
>>> Huh? The pragma could easily be restricted to apply to exactly one function declaration, if that's what's desired.
>> Then why bother with a pragma? It's just a special case for the sake of a special case... I don't see why resist the language conventions. Where's the precedent for that? It just sounds like it's asking to cause edge cases and trouble down the line. Is it gonna get messy when it involves with templates? What about methods, sub-functions?
> *bump* I actually care about this a whole lot more than final-by-default right now ;)
> I'd like to think there's a possible solution to these problems that everyone agrees with.

I'd like to see a solution to inlining along the lines of "pliz pliz inline" (best effort) and "never inline". Outlining only at a specific call site is seldom needed and when it is it's trivially achievable with a noinline function forwarding to the inline function.

Inlining only at a specific call site is a tall order and essentially impossible if header generation had been used.

Andrei
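The forwarding trick mentioned above can be sketched with the `pragma(inline, ...)` that later D compilers provide. This is an illustration under that assumption, not something available when the thread was written; `funcOutlined` is a hypothetical name.

```d
// Marked for inlining: callers normally get the body expanded in place.
pragma(inline, true)
int func(int y) { return y * y + 10; }

// Hypothetical forwarder that is never inlined itself, so a caller that
// goes through it gets a real call instead of an expanded body.
pragma(inline, false)
int funcOutlined(int y) { return func(y); }

void main()
{
    int hot  = func(10);         // candidate for inlining at this call site
    int cold = funcOutlined(10); // stays an out-of-line call
    assert(hot == 110 && cold == 110);
}
```

This covers outlining at one call site only. The reverse direction, inlining a normally out-of-line function at one call site, has no equivalent wrapper, which is the gap the `mixin func(10)` proposal targets.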
Mar 17 2014
On 18 March 2014 01:36, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> I'd like to see a solution to inlining along the lines of "pliz pliz inline" (best effort) and "never inline". Outlining only at a specific call site is seldom needed and when it is it's trivially achievable with a noinline function forwarding to the inline function.
> Inlining only at a specific call site is a tall order and essentially impossible if header generation had been used.

I don't follow, how does that work?

It's the key innovation here. Since D doesn't have macros, I think it's something that really needs to be supported nicely.

Obviously it's impossible if source is unavailable. It should give the same complaints that CTFE gives when source is unavailable.
Mar 17 2014
On 3/17/14, 9:10 AM, Manu wrote:
> On 18 March 2014 01:36, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>> I'd like to see a solution to inlining along the lines of "pliz pliz inline" (best effort) and "never inline". Outlining only at a specific call site is seldom needed and when it is it's trivially achievable with a noinline function forwarding to the inline function.
>> Inlining only at a specific call site is a tall order and essentially impossible if header generation had been used.
> I don't follow, how does that work? It's the key innovation here. Since D doesn't have macros, I think it's something that really needs to be supported nicely. Obviously it's impossible if source is unavailable. It should give the same complaints that CTFE gives when source is unavailable.

The notion that a compiler can ask for any function to be inlined without the compiler having been "warned" in the function declaration makes me uncomfortable about feasibility. However, upon further thinking the same happens with CTFE.

Andrei
Mar 17 2014
On 18 March 2014 06:37, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> On 3/17/14, 9:10 AM, Manu wrote:
>> On 18 March 2014 01:36, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
>>> I'd like to see a solution to inlining along the lines of "pliz pliz inline" (best effort) and "never inline". Outlining only at a specific call site is seldom needed and when it is it's trivially achievable with a noinline function forwarding to the inline function.
>>> Inlining only at a specific call site is a tall order and essentially impossible if header generation had been used.
>> I don't follow, how does that work? It's the key innovation here. Since D doesn't have macros, I think it's something that really needs to be supported nicely. Obviously it's impossible if source is unavailable. It should give the same complaints that CTFE gives when source is unavailable.
> The notion that a compiler can ask for any function to be inlined without the compiler having been "warned" in the function declaration makes me uncomfortable about feasibility. However, upon further thinking the same happens with CTFE.

Exactly, we already have it in CTFE. It doesn't really add any new concept that D isn't already comfortable with.

It can kinda be seen as sort of a type-safe macro, which is a tool that D is lacking compared to C. I think the mixin keyword and concept makes perfect sense in this context. It feels quite intuitive to me.
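The CTFE precedent referred to above can be shown in a few lines. This is a minimal sketch: `opaque` is a hypothetical prototype standing in for a function whose body was stripped, as in a generated .di header.

```d
int func(int y) { return y * y + 10; }

// Prototype only, no body; hypothetical stand-in for a header-only import.
int opaque(int y);

void main()
{
    // Initialising an enum forces compile-time evaluation (CTFE), and the
    // call site decides this without any marker on func's declaration.
    enum atCompileTime = func(10);
    static assert(atCompileTime == 110);

    // The same request for a function without source is rejected at compile
    // time; __traits(compiles, ...) turns that rejection into false.
    static assert(!__traits(compiles, { enum e = opaque(10); }));
}
```

A call-site inlining request could plausibly follow the same rule: it works when the body is visible to the compiler and is a compile error otherwise.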
Mar 18 2014
On 3/13/14, 11:21 PM, Manu wrote:
> My feeling is that an ideal solution would be something like an enhancement which would allow the 'mixin' keyword to be used with regular function calls. What this would do is 'mix in' the function call at this location, ie, effectively inline that particular call, and it leverages a keyword and concept that we already have. It would obviously produce a compile error of the code is not available.
> I quite like this idea, but there is a potential syntactical problem; how to assign the return value?
>
>     int func(int y) { return y*y+10; }
>     int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
>     int output = mixin(func(10)); // now i feel paren spammy...
>     mixin(int output = func(10)); // this doesn't feel right...
>
> My feeling is the first is the best, but I'm not sure about that grammatically.

Is there already some trait for getting the string value of a function including its code? If so then a mixin plus a small helper function might do the job. If not then is such a trait feasible?
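As a rough illustration of the "mixin plus a small helper" idea, suppose the function body were available as a string. The sketch below writes that string out by hand as the hypothetical `funcBody`, since it does not assume any existing trait returns it.

```d
// Hypothetical stand-in for what such a trait might return: the body of
// func as source text, written out by hand here.
enum funcBody = "y * y + 10";

// The ordinary function, built from the same text so the two stay in sync.
int func(int y) { return mixin(funcBody); }

void main()
{
    int y = 10;
    // The string mixin pastes the expression at the call site. It relies on
    // a local named y being in scope, which a real helper would have to
    // arrange for every parameter.
    int output = mixin(funcBody);
    assert(output == func(10));
    assert(output == 110);
}
```

The dependence on matching local names is one reason this approach is more fragile than a dedicated call-site syntax.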
Mar 14 2014
On 14.03.2014 19:09, David Gileadi wrote:
> On 3/13/14, 11:21 PM, Manu wrote:
>> My feeling is that an ideal solution would be something like an enhancement which would allow the 'mixin' keyword to be used with regular function calls. What this would do is 'mix in' the function call at this location, ie, effectively inline that particular call, and it leverages a keyword and concept that we already have. It would obviously produce a compile error of the code is not available.
>> I quite like this idea, but there is a potential syntactical problem; how to assign the return value?
>>
>>     int func(int y) { return y*y+10; }
>>     int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
>>     int output = mixin(func(10)); // now i feel paren spammy...
>>     mixin(int output = func(10)); // this doesn't feel right...
>>
>> My feeling is the first is the best, but I'm not sure about that grammatically.
> Is there already some trait for getting the string value of a function including its code? If so then a mixin plus a small helper function might do the job. If not then is such a trait feasible?

Might be problematic with modules delivered only in .di + binary form.

-- 
Paulo
Mar 14 2014
On 3/14/14, 1:42 PM, Paulo Pinto wrote:
> On 14.03.2014 19:09, David Gileadi wrote:
>> On 3/13/14, 11:21 PM, Manu wrote:
>>> My feeling is that an ideal solution would be something like an enhancement which would allow the 'mixin' keyword to be used with regular function calls. What this would do is 'mix in' the function call at this location, ie, effectively inline that particular call, and it leverages a keyword and concept that we already have. It would obviously produce a compile error of the code is not available.
>>> I quite like this idea, but there is a potential syntactical problem; how to assign the return value?
>>>
>>>     int func(int y) { return y*y+10; }
>>>     int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
>>>     int output = mixin(func(10)); // now i feel paren spammy...
>>>     mixin(int output = func(10)); // this doesn't feel right...
>>>
>>> My feeling is the first is the best, but I'm not sure about that grammatically.
>> Is there already some trait for getting the string value of a function including its code? If so then a mixin plus a small helper function might do the job. If not then is such a trait feasible?
> Might be problematic with modules delivered only in .di + binary form.
> -- Paulo

Quite, but as Manu says about his proposed solution,

> It would obviously produce a compile error of (sic) the code is not available.

This would need to behave similarly.
Mar 14 2014