digitalmars.D - inlining...

reply Manu <turkeyman gmail.com> writes:
So, I'm constantly running into issues with not having control over inline.
I've run into it again doing experiments in preparation for my dconf talk...

I have identified 2 cases which come up regularly:
 1. A function that should always be inline unconditionally (std.simd is
effectively blocked on this)
 2. A particular invocation of a function should be inlined for this call
only

The first case is just about having control over code gen. Some functions
should effectively be macros or pseudo-intrinsics (ie, intrinsic wrappers
in std.simd, beauty wrappers around asm code, etc), and I don't ever want
to see a symbol appear in the binary.

My suggestion is introduction of __forceinline or something like it. We
need this.


The second case is interesting, and I've found it comes up a few times on
different occasions.
In my current instance, I'm trying to build a generic framework to perform
efficient composable data processing, and a basic requirement is that the
components are inlined, such that the optimiser can interleave the work
properly.

Let's imagine I have a template which implements a work loop, which wants
to call a bunch of work elements it receives by alias. The issue is, each
of those must be inlined, for this call instance only, and there's no way
to do this.
I'm gonna draw the line at stringified code to use with mixin; I hate that,
and I don't want to encourage use of mixin or stringified code in
user-facing APIs as a matter of practice. Also, some of these work
elements might be useful functions in their own right, which means they can
indeed be a function existing somewhere else that shouldn't itself be
attributed as __forceinline.
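
The setup described above can be sketched roughly as follows. This is only an illustration: `process`, `scale` and `offset` are hypothetical names, and nothing here forces inlining. Whether each `work[i](x)` call is inlined is still left to the compiler's heuristics, which is exactly the problem being raised.

```d
// Hypothetical work elements; each is a useful function in its own right.
int scale(int x)  { return x * 3; }
int offset(int x) { return x + 10; }

// Hypothetical work loop: applies every element received by alias, in order.
void process(work...)(int[] data)
{
    foreach (ref x; data)
    {
        // Expanded at compile time, one call per alias parameter.
        static foreach (i; 0 .. work.length)
            x = work[i](x);
    }
}

void main()
{
    auto data = [1, 2, 3];
    process!(scale, offset)(data);
    assert(data == [13, 16, 19]);
}
```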

What are the current options to force that some code is inlined?

My feeling is that an ideal solution would be something like an enhancement
which would allow the 'mixin' keyword to be used with regular function
calls.
What this would do is 'mix in' the function call at this location, ie,
effectively inline that particular call, and it leverages a keyword and
concept that we already have. It would obviously produce a compile error if
the code is not available.

I quite like this idea, but there is a potential syntactical problem; how
to assign the return value?

int func(int y) { return y*y+10; }

int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get in the way' if the output
int output = mixin(func(10)); // now i feel paren spammy...
mixin(int output = func(10)); // this doesn't feel right...

My feeling is the first is the best, but I'm not sure about that
grammatically.


The other thing that comes to mind is that it seems like this might make a
case for AST macros... but I think that's probably overkill for this
situation, and I'm not confident we're ever gonna attempt to crack that
nut. I'd like to see something practical and unobjectionable preferably.


This problem is fairly far reaching; phobos receives a lot of lambdas these
days, which I've found don't reliably inline and interfere with the
optimiser's ability to optimise the code.
There was some discussion about a code unrolling API some time back, and
this would apply there (the suggested solution used string mixins! >_<).
Debug build performance is a problem which would be improved with this
feature.
Mar 13 2014
next sibling parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 14 March 2014 at 06:21:27 UTC, Manu wrote:
 ...
As much as I like the idea:

Something always tells me this is the compiler's job... What clever reasoning are you applying that the compiler's inliner can't? It seems like a different situation to, say, SIMD code, where correctly structuring loops can require a lot of gymnastics that the compiler can't or won't (floating point conformance) do. The inlining decision seems easily automatable in comparison.

I understand that unoptimised builds for debugging are a problem, but a sensible compiler lets you hand-pick your optimisation passes.

In short: why are compilers not good enough at this that the programmer needs to be involved?
Mar 14 2014
next sibling parent "w0rp" <devw0rp gmail.com> writes:
On Friday, 14 March 2014 at 08:03:04 UTC, John Colvin wrote:
 ...

 In short: why are compilers not good enough at this that the 
 programmer needs to be involved?
I think it's possible for a programmer to make a better decision about what to do than a compiler. Clearly the compiler isn't smart enough to make the right decisions for Manu now, so I think it would be acceptable to at least insert functionality to give him that control now until the compiler can. There is the question of whether or not it's possible for a compiler to make the right decisions in the right places, but I'm not experienced enough to address that.
Mar 14 2014
prev sibling next sibling parent "duh" <nothx yahoo.com> writes:
 ...

 In short: why are compilers not good enough at this that the 
 programmer needs to be involved?
No compiler gets this right 100% of the time, so if it is the compiler's job they are failing. Most C++ compilers will sometimes require use of forceinline with SSE intrinsics.

Unless it has PGO support the compiler has no idea about the runtime usage of that code. It wouldn't know which code the program spends 90% of its time in, so it just applies general heuristics when deciding to inline.

What I'd like is the ability to set an inline level per function. Something like 0 being always inline, and 10 being never inline. Unless specified otherwise, the default would be 5.

So if you want forceinline behavior:

inline(0) vec3 dot(vec3 a, vec3 b); //always inlined
inline(10) vec3 cross(vec3 a, vec3 b); //never inlined

And override it at the callsite:

inline(10) auto v = dot(a,b);
Mar 14 2014
prev sibling next sibling parent "Ethan" <gooberman gmail.com> writes:
On Friday, 14 March 2014 at 08:03:04 UTC, John Colvin wrote:
 Something always tells me this is the compiler's job
If all methods are virtual by default, how can the compiler inline the code? Properties are a great example where I'd want to both final and inline them in quite a few cases. In those cases, the existence of inline would negate the need for final entirely because being a virtual method would never come into the equation.

This would also apply to UFCS functions, which I use to wrap D types such as strings into C++ interface vtables without making the programmer jump through a bunch of hoops.

Inline in Microsoft's compiler is always considered a strong hint. There are cases where even __forceinline won't actually inline a function if the compiler decides you're on crack. I assume this would be the case here, and you'd just be helping inform the compiler what you want inlined in case it slips up and gets it wrong.
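
To make the virtual-versus-final point concrete, here is a minimal sketch. The class and property names are hypothetical. In D, class methods are virtual unless marked `final`, so only the second property gives the compiler a statically known call target, which it needs before it can consider inlining a call made through a class reference.

```d
class Entity
{
    private int _health = 100;

    // Virtual by default: a call through a class reference goes via the
    // vtable unless the compiler can prove the dynamic type.
    @property int health() { return _health; }

    // final: cannot be overridden, so the call target is known statically
    // and the call becomes a candidate for inlining.
    final @property int healthFast() { return _health; }
}

void main()
{
    auto e = new Entity;
    assert(e.health == 100);
    assert(e.healthFast == 100);
}
```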
Mar 14 2014
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 14 March 2014 18:03, John Colvin <john.loughran.colvin gmail.com> wrote:

 ...

 In short: why are compilers not good enough at this that the programmer
 needs to be involved?
The compiler applies generalised heuristics, which are certainly for the 'common' case, whatever that happens to be. The compiler simply doesn't know what you're doing, so it's very hard for the compiler to do anything really intelligent.

Inlining heuristics are fickle, and they also don't know what you're actually trying to do. Is a function 'long'? How long is 'long'? Is the function 'hot'? Do we prefer code size or execution speed? Is the function called only from this location, or is it used in many locations? Etc.

Inlining is one of the most fuzzy pieces of logic in the compiler, and relies on a lot of information that is impossible for the compiler to deduce, so it applies heuristics to try and do a decent job, but it's certainly not perfect. I argue, nothing so fickle can exist in the language without having a manual override. Especially not in a native language.

In my current case, the functions I need to inline are not exactly trivial. They're really pushing the boundaries of the compiler's inliner heuristics, and then I'm calling a series of such functions that operate on parallel data. If they don't inline, the performance equals the sum of the functions plus some overhead. If they all inline, the performance is equal to only the longest one, and no overhead (the others fill in pipeline gaps). Further, some of these functions embed some shared work... if they don't inline, this work is repeated. If they do inline, the redundant repeated work is eliminated.

My experiments with std.algorithm were a failure. I realised quickly that I couldn't rely on the inliner to do a satisfactory job, and the optimiser was unable to do its job properly. std.algorithm could really benefit from the mixin suggestion since things like predicate functions are always trivial, usually supplied as little lambdas, and inlining isn't reliable. Especially in the debug builds.

Something like algorithm loop sugar shouldn't run heaps worse than an explicit loop just because it happens to be implemented by a generic function.
Mar 14 2014
parent reply "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 14 March 2014 at 11:04:34 UTC, Manu wrote:
 ...
Thanks for the explanations.

Another use case is to aid propagation of compile-time information for optimisation.

A function might look like a poor candidate for inlining for other reasons, but if there's a statically known (to the caller) integer parameter coming in that will be used to decide a loop length, inlining allows that info to be propagated to the callee. Static loop lengths => well optimised loops, with opportunities for optimal unrolling. Even with quite a large function this can be a good choice to inline.

I don't know how good compilers are at taking this sort of thing into account already.
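
A small sketch of the loop-length case, using hypothetical function names. In `sumRuntime` the trip count only becomes a constant if the call is inlined (or the compiler specialises the function some other way). In `sumStatic` the length is a template parameter, so it is a constant in every instantiation regardless of inlining. This is a workaround available today, not the call-site control being asked for.

```d
// Runtime length: compiled on its own, this function cannot know n.
int sumRuntime(const(int)[] data, size_t n)
{
    int total = 0;
    foreach (i; 0 .. n)
        total += data[i];
    return total;
}

// Compile-time length: N is part of the instantiation, so the trip count
// is known even if this instance is never inlined.
int sumStatic(size_t N)(const(int)[] data)
{
    int total = 0;
    foreach (i; 0 .. N)
        total += data[i];
    return total;
}

void main()
{
    int[] data = [1, 2, 3, 4];
    assert(sumRuntime(data, 4) == 10); // n is constant only if inlined here
    assert(sumStatic!4(data) == 10);   // N is constant regardless
}
```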
Mar 14 2014
next sibling parent "bearophile" <bearophileHUGS lycos.com> writes:
John Colvin:

 Another use case is to aid propagation of compile-time 
 information for optimisation.
 A function might look like a poor candidate for inlining for 
 other reasons, but if there's a statically known (to the 
 caller) integer parameter coming in that will be used to decide 
 a loop length, inlining allows that info to be propagated to 
 the callee. Static loop lengths => well optimised loops, with 
 opportunities for optimal unrolling. Even with quite a large 
 function this can be a good choice to inline.
If the function is private in a module, and it's called only from one point (or otherwise the loop count is the same in different calls), I think this optimization can be performed even if the function is not inlined.

Bye,
bearophile
Mar 14 2014
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 14 March 2014 22:02, John Colvin <john.loughran.colvin gmail.com> wrote:
 Thanks for the explanations.

 Another use case is to aid propagation of compile-time information for
 optimisation.
 A function might look like a poor candidate for inlining for other
 reasons, but if there's a statically known (to the caller) integer
 parameter coming in that will be used to decide a loop length, inlining
 allows that info to be propagated to the callee. Static loop lengths =>
 well optimised loops, with opportunities for optimal unrolling. Even with
 quite a large function this can be a good choice to inline.
Yup, this is a classic example. Extremely relevant. And it's precisely the sort of thing that an inline heuristic is likely to fail at.

 I don't know how good compilers are at taking this sort of thing into
 account already.

I don't know if they try or not, but I can say from experience that results are generally unreliable. I would never depend on the inliner to get this right.

On 14 March 2014 22:08, bearophile <bearophileHUGS lycos.com> wrote:
 John Colvin:


 ...

 If the function is private in a module, and it's called only from one
 point (or otherwise the loop count is the same in different calls), I think
 this optimization can be performed even if the function is not inlined.
This is probably true, but I would never rely on it. You have some carefully tuned code that works well, and then one day, some random unrelated thing tweaks a balance, and your previously good code is suddenly slow for unknown reasons. The point is, there are times when you know your code should be inlined; ie, it's not an 'optimisation', it's an expectation/requirement. A programmer needs to be able to express this.
Mar 14 2014
parent reply Nick Sabalausky <SeeWebsiteToContactMe semitwist.com> writes:
On 3/14/2014 8:37 AM, Manu wrote:
 On 14 March 2014 22:02, John Colvin <john.loughran.colvin gmail.com> wrote:
 I don't know how good compilers are at taking this sort of thing into
 account already.
 I don't know if they try or not, but I can say from experience that results are generally unreliable. I would never depend on the inliner to get this right.
I don't know how this compares to other inliners, but FWIW, DMD's inliner is pretty simple (by coincidence, I was just digging into it the other day):

Every expression node (ie non-statement, non-declaration) in the function's AST adds 1 to the cost of inlining (so ex: 1+2*3 would have a cost of 2 - one mult, plus one addition). If the total cost is under 250, the function is inlined.

Also, any type of AST node that isn't explicitly handled in inline.c will prevent a function from ever being inlined (since the inliner doesn't know how to inline it). I assume this is probably after lowerings are done, though, so more advanced constructs probably don't need to be explicitly handled.

There is one other minor difficulty worth noting: When DMD wants to inline a function call, and the function's return value is actually used (ex: "auto x = foo();" or "1 + foo()"), the function must get inlined as an expression. Unfortunately, AIUI, a lot of D's statements can't be implemented inside an expression ATM (such as loops), so these would currently prevent such a function call from being inlined.

I don't know how easy or difficult that would be to fix. Conceptually it should be simple: Create an Expression type StatementExp to wrap a Statement as an expression. But other parts of the backend would probably need to know about it, and I'm unfamiliar with the rest of the backend, so have no idea what that would/wouldn't entail.

Not that it can't be done (AFAIK), but since the subject came up I thought I'd give a brief overview of the current DMD inliner, just FWIW.
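
The cost rule described above can be illustrated with a toy model. This is not DMD's code: `Expr`, `Num`, `BinOp`, `costThreshold` and `wouldInline` are hypothetical names, and treating literal leaves as free is an assumption made so that the toy reproduces the 1+2*3 example (cost 2).

```d
// Toy expression tree; these are not DMD's AST classes.
abstract class Expr
{
    abstract int cost();
}

class Num : Expr
{
    int value;
    this(int value) { this.value = value; }
    // Assumption: literal leaves are free, matching the 1+2*3 example.
    override int cost() { return 0; }
}

class BinOp : Expr
{
    char op;
    Expr lhs, rhs;
    this(char op, Expr lhs, Expr rhs)
    {
        this.op = op;
        this.lhs = lhs;
        this.rhs = rhs;
    }
    // Each operator node adds 1 on top of the cost of its operands.
    override int cost() { return 1 + lhs.cost() + rhs.cost(); }
}

enum costThreshold = 250; // the limit quoted in the post

bool wouldInline(Expr root) { return root.cost() < costThreshold; }

void main()
{
    // 1 + 2 * 3: one multiplication plus one addition.
    Expr e = new BinOp('+', new Num(1),
                       new BinOp('*', new Num(2), new Num(3)));
    assert(e.cost() == 2);
    assert(wouldInline(e));
}
```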
Mar 14 2014
parent "Chris Williams" <yoreanon-chrisw yahoo.co.jp> writes:
On Friday, 14 March 2014 at 22:12:38 UTC, Nick Sabalausky wrote:
 ...
Probably one easy adjustment that would result in a lot of gain in optimization would be to raise the cost threshold of 250 if the function is an operator overload.
Mar 14 2014
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2014-03-14 07:21, Manu wrote:
 ...

 My suggestion is introduction of __forceinline or something like it. We
 need this.
Haven't we already agreed that a pragma for forced inlining should be implemented? Or is that something I dreamed?
 ...

 int func(int y) { return y*y+10; }

 int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get
 in the way' if the output

I think this is the best syntax of these three alternatives.

 int output = mixin(func(10)); // now i feel paren spammy...

This syntax can't work. It's already interpreted as calling "func" and using the result as a string mixin.

 mixin(int output = func(10)); // this doesn't feel right...

No.

 My feeling is the first is the best, but I'm not sure about that
 grammatically.

Yeah, I agree.
 The other thing that comes to mind is that it seems like this might make
 a case for AST macros... but I think that's probably overkill for this
 situation, and I'm not confident we're ever gonna attempt to crack that
 nut. I'd like to see something practical and unobjectionable preferably.
AST macros would solve it. It could solve the first use case as well. I would not implement AST macros just to support force inline, but we have many other use cases as well. I would have implemented AST macros a long time ago. Hopefully this would avoid the need to create new language features in some cases.

First use case: just define a macro that returns the AST for the content of the function you would create.

macro func (Ast!(int) a)
{
    return <[ $a * $a; ]>;
}

int output = func(10); // always inlined

Second use case: define a macro, "inline", that takes the function you want to call as a parameter. The macro will basically inline the body.

macro inline (T, U...) (Ast!(T function (U)) func)
{
    // this would probably be more complicated
    return func.body;
}

int output = func(10); // not inlined
int output = inline(func(10)); // always inlined
 This problem is fairly far reaching; phobos receives a lot of lambdas
 these days, which I've found don't reliably inline and interfere with
 the optimiser's ability to optimise the code.
I thought since lambdas are passed as template parameters they would always be inlined.

--
/Jacob Carlborg
Mar 14 2014
next sibling parent reply Michel Fortin <michel.fortin michelf.ca> writes:
On 2014-03-14 17:57:59 +0000, Jacob Carlborg <doob me.com> said:

 int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get
 I think this is the best syntax of these three alternatives.
Maybe, but what does it do? Should it just inline the call to func? Or should it inline recursively every call inside func? Or maybe something in the middle?

--
Michel Fortin
michel.fortin michelf.ca
http://michelf.ca
Mar 14 2014
next sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-03-14 19:02, Michel Fortin wrote:

 Maybe, but what does it do? Should it just inline the call to func? Or
 should it inline recursively every call inside func? Or maybe something
 in the middle?
I guess Manu needs to answer this one.

--
/Jacob Carlborg
Mar 14 2014
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 15 March 2014 04:02, Michel Fortin <michel.fortin michelf.ca> wrote:

 On 2014-03-14 17:57:59 +0000, Jacob Carlborg <doob me.com> said:

  int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get

 I think this is the best syntax of these three alternatives.
 Maybe, but what does it do? Should it just inline the call to func? Or should it inline recursively every call inside func? Or maybe something in the middle?
I'd say it should inline only func. Any sub-calls are subject to the regular inline heuristics.
Mar 14 2014
parent reply "Ola Fosheim Grøstad" writes:
On Saturday, 15 March 2014 at 04:17:06 UTC, Manu wrote:
 I'd say it should inline only func. Any sub-calls are subject 
 to the
 regular inline heuristics.
I agree with you that explicit inlining is absolutely necessary and that call site inlining is highly desirable. However, I think that the call-site inlining should inline as much as possible. Basically this is something you will try when the code is too slow to meet real time deadlines and you hope to avoid going for a hand optimized solution in order to cut down on dev time. That suggests aggressive inlining to me.

If the inlining only goes one level then I don't think this will be used frequently enough to be useful, e.g. you can just create one inline version and then a non-inline version that calls the inline version. E.g.:

noninline_func(){ inline_func();}

Ola.
Mar 18 2014
parent reply Manu <turkeyman gmail.com> writes:
On 18 March 2014 23:11, Ola Fosheim Grøstad wrote:

 On Saturday, 15 March 2014 at 04:17:06 UTC, Manu wrote:

 I'd say it should inline only func. Any sub-calls are subject to the
 regular inline heuristics.
 I agree with you that explicit inlining is absolutely necessary and that call site inlining is highly desirable. However, I think that the call-site inlining should inline as much as possible. Basically this is something you will try when the code is too slow to meet real time deadlines and you hope to avoid going for a hand optimized solution in order to cut down on dev time. That suggests aggressive inlining to me.
Inlining is a basic codegen tool, and it's important that low-level programmers have tight control over this aspect of the compiler's codegen. I think it's a mistake to consider it an optimisation, unless you know precisely what you're doing. I wouldn't want to see it try and forcibly inline the whole tree; there's no reason to believe that the whole tree should be inlined 100% of the time, rather, it's almost certainly not the case.

In the case you do want to inline the whole tree, you can just cascade the mixin through the stack. In the case you suggest which flattens the tree by default, we've lost control; how to tell it only to do it for one level without hacks? And I believe this is the common case. For example, it's very likely that you might require a function to inline that is relatively trivial in its own right; a wrapper or a macro effectively, but conditionally calls an expensive function, or perhaps calls a function that you don't have source for (it would break at that point if it tried to inline the tree).

 If the inlining only goes one level then I don't think this will be used
 frequently enough to be useful, e.g. you can just create one inline version
 and then a non-inline version that calls the inline version.
As the one that requested it, I have numerous uses for it to mixin just the one level. I can't imagine any uses where I would ever want to explicitly inline the whole tree, and not be happy to cascade it manually.

 E.g.:
 noninline_func(){ inline_func();}
Why? This is really overcomplicating a simple thing. And I'm not quite sure what you're suggesting this should do either. Are you saying the call tree is flattened behind this proxy non-inline function? I don't think that's useful. I don't think anything would/should be marked __alwaysinline unless you REALLY mean that it has literally no business being called. Ie, marking something __alwaysinline just for the sake of wrapping it with a non-inline is the wrong thing to do. Just to reiterate, inline is a tool, not an 'optimisation'. It doesn't necessarily yield faster code, in many situations it is slower, and best left to the compiler to decide. But it's an important tool for any low-level programmer to have. D must provide a sufficient suite of low-level tools that allow proper control over the code generation. I think as a tool, it should be deliberate and conservative in approach, ie, just one level, and let the programmer cascade it if that's what they mean to do. There should be no surprises with something like this, and if it's inlining a whole call tree, you often don't know what happens further down the tree, and it's more likely to change on you unexpectedly.
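
To make the one-level case concrete, here is a minimal sketch in plain D of what inlining a trivial wrapper at a single call site would amount to, written out by hand. All names (`expensive`, `clampOrCompute`, `callerPlain`, `callerOneLevel`) are hypothetical and do not come from the thread, and the proposed `mixin func(...)` call syntax is deliberately not used since it is only a proposal.

```d
// Hypothetical callee that should stay an ordinary call
// (large, rarely taken, or available only as a binary).
int expensive(int x)
{
    return x * x + 1;
}

// Trivial wrapper: the kind of function one would want inlined.
int clampOrCompute(int x)
{
    return x < 0 ? 0 : expensive(x);
}

// Ordinary call; whether it gets inlined is left to the compiler.
int callerPlain(int x)
{
    return clampOrCompute(x) + 1;
}

// One-level inlining written out by hand: the wrapper body is pasted in,
// while expensive() is still called normally.
int callerOneLevel(int x)
{
    return (x < 0 ? 0 : expensive(x)) + 1;
}
```

Both callers compute the same result; the only difference is that the wrapper's branch is visible at the call site in `callerOneLevel`, while the sub-call remains subject to whatever the compiler's heuristics decide.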
Mar 18 2014
parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 19 March 2014 at 01:28:48 UTC, Manu wrote:
 In the case you do want to inline the whole tree, you can just 
 cascade the
 mixin through the stack. In the case you suggest which flattens 
 the tree by
 default, we've lost control; how to tell it only to do it for 
 one level
 without hacks? And I believe this is the common case.
You could provide it with a recursion level parameter or parameters for cost level heuristics. It could also be used to flatten tail-call recursion.
 As the one that requested it, I have numerous uses for it to 
 mixin just the
 one level. I can't imagine any uses where I would ever want to 
 explicitly
 inline the whole tree, and not be happy to cascade it manually.
In innerloops to factor out common subexpressions that are otherwise recomputed over and over and over. When the function is generated code (not hand written).
 noninline_func(){ inline_func();}
Why? This is really overcomplicating a simple thing. And I'm not quite sure what you're suggesting this should do either. Are you saying the call tree is flattened behind this proxy non-inline function?
No, I am saying that the one level mixin doesn't provide you with anything new. You already have that. It is sugar.
Mar 18 2014
parent reply Manu <turkeyman gmail.com> writes:
On 19 March 2014 16:18,
<7d89a89974b0ff40.invalid internationalized.invalid>wrote:

 On Wednesday, 19 March 2014 at 01:28:48 UTC, Manu wrote:

 In the case you do want to inline the whole tree, you can just cascade the
 mixin through the stack. In the case you suggest which flattens the tree
 by
 default, we've lost control; how to tell it only to do it for one level
 without hacks? And I believe this is the common case.
You could provide it with a recursion level parameter or parameters for cost level heuristics.
Again, I think this is significantly overcomplicating something which I see as being extremely simple.

 It could also be used to flatten tail-call recursion.

I don't think it's valid to inline a tail call recursion, because the inlined call also wants to inline another call to itself... You can't know how far it should go, so it needs to be transformed into a loop, and now we're talking about something completely different than inlining.

 As the one that requested it, I have numerous uses for it to mixin just the
 one level. I can't imagine any uses where I would ever want to explicitly
 inline the whole tree, and not be happy to cascade it manually.
In innerloops to factor out common subexpressions that are otherwise recomputed over and over and over.
This is highly context sensitive. I would trust the compiler heuristics to make the right decision here. The idea of eliminating common sub-expressions suggests that there _are_ common sub-expressions, which aren't affected by the function arguments. This case is highly unusual in my experience. And I personally wouldn't depend on a feature such as this to address that sort of a problem in my codegen. I would just refactor the function a little bit to call the common sub-expression ahead of time.

 When the function is generated code (not hand written).

I'm not sure what you mean here?

 noninline_func(){ inline_func();}

 Why? This is really overcomplicating a simple thing. And I'm not quite
 sure what you're suggesting this should do either. Are you saying the call tree
 is flattened behind this proxy non-inline function?
No, I am saying that the one level mixin doesn't provide you with anything new.
It really does provide something new. It provides effectively, a type-safe implementation of something that may be used in place of C/C++ macros. I think that would be extremely useful in a variety of applications.

 You already have that. It is sugar.

I don't already have it, otherwise I'd be making use of it. D has no
control over the inliner. GDC/LDC offer attributes, but then it's really
annoying that D has no mechanism to make use of compiler-specific
attributes in a portable way (ie, attribute aliasing), so I can't make use
of those without significantly interfering with my code.

I also don't think that suggestion of yours works. I suspect the compiler
will see the outer function as a trivial wrapper which will fall within the
compiler's normal inline heuristics, and it will all inline anyway.
Mar 19 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 19 March 2014 at 08:35:53 UTC, Manu wrote:
 The idea of eliminating common sub-expressions suggests that 
 there _are_
 common sub-expressions, which aren't affected by the function 
 arguments.
 This case is highly unusual in my experience.
Not if you delay optimization until profiling and focus on higher level structures during initial implementation. Or use composing (like generic programming). If you hand optimize right from the start then you might be right, but if you never call a function with the same parameters then you are doing premature optimization IMHO.
 When the function is generated code (not hand written).
I'm not sure what you mean here?
Code that is generated by a tool (or composable templates or whatever) tends to be repetitive and suboptimal. I.e. boiler plate code that looks like it was written by a monkey…
 You already have that. It is sugar.
I don't already have it, otherwise I'd be making use of it. D has no control over the inliner.
I meant that if you have explicit inline hints like C++ then you also have call-site inlining if you want to.
 I also don't think that suggestion of yours works. I suspect 
 the compiler
 will see the outer function as a trivial wrapper which will 
 fall within the
 compilers normal inline heuristics, and it will all inline 
 anyway.
That should be considered a bug if it is called from more than one location.
Mar 19 2014
parent reply Manu <turkeyman gmail.com> writes:
On 19 March 2014 19:16,
<7d89a89974b0ff40.invalid internationalized.invalid>wrote:

 On Wednesday, 19 March 2014 at 08:35:53 UTC, Manu wrote:

 The idea of eliminating common sub-expressions suggests that there _are_
 common sub-expressions, which aren't affected by the function arguments.
 This case is highly unusual in my experience.
Not if you delay optimization until profiling and focus on higher level structures during initial implementation. Or use composing (like generic programming). If you hand optimize right from the start then you might be right, but if you never call a function with the same parameters then you are doing premature optimization IMHO.
Okay, do you have use cases for any of this stuff? Are you just making it up, or do you have significant experience to say this is what you need? I can say for a fact, that recursive inline would make almost everything I want it for much more annoying. I would find myself doing stupid stuff to fight the recursive inliner in every instance.

 When the function is generated code (not hand written).

 I'm not sure what you mean here?
Code that is generated by a tool (or composable templates or whatever) tend to be repetitive and suboptimal. I.e. boiler plate code that looks like it was written by a monkey…
I'm not sure where the value is... why would you want to inline this?

 You already have that. It is sugar.

 I don't already have it, otherwise I'd be making use of it. D has no
 control over the inliner.
I meant that if you have explicit inline hints like C++ then you also have call-site inlining if you want to.
I still don't follow. C++ doesn't have call-site inlining. C/C++ has macros, and there is no way to achieve the same functionality in D right now, that's a key motivation for the proposal.

 I also don't think that suggestion of yours works. I suspect the compiler
 will see the outer function as a trivial wrapper which will fall within
 the compilers normal inline heuristics, and it will all inline anyway.
That should be considered a bug if it is called from more than one location.
Seriously, you're making 'inline' about 10 times more complicated than it should ever be. If you ask me, I see no value in recursive inlining; in fact, that would anger me considerably. By making this hard, you're also making it equally unlikely. Let inline exist first, then if/when it doesn't suit your use cases, argue for the details.
Mar 19 2014
parent reply "Ola Fosheim Grøstad" writes:
On Wednesday, 19 March 2014 at 12:35:30 UTC, Manu wrote:
 Okay, do you have use cases for any of this stuff? Are you just 
 making it
 up, or do you have significant experience to say this is what 
 you need?
I don't need anything, I hand optimize prematurely. And I don't want to do that. But yes, inner loops benefit from exhaustive inlining because you get to move common expressions out of the loop or change them to delta increments. It is only when you trash the caches that inlining does not pay off. I do it by hand. I don't want to do it by hand.
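
As an illustration of the two transformations mentioned here (moving a common expression out of the loop, and replacing a multiplication by a delta increment), a minimal hand-written sketch in D follows. The function names and the `scaleFor` helper are hypothetical; integers are used so that both versions produce exactly the same values.

```d
// Hypothetical helper standing in for a library call made inside a loop.
int scaleFor(int gain, int bias)
{
    return gain * 2 + bias;
}

// Straightforward loop: scaleFor(...) and i * step are recomputed per element.
void fillNaive(int[] dst, int gain, int bias, int step)
{
    foreach (i; 0 .. dst.length)
        dst[i] = scaleFor(gain, bias) + cast(int) i * step;
}

// The same loop with both transformations applied by hand.
void fillHoisted(int[] dst, int gain, int bias, int step)
{
    immutable s = scaleFor(gain, bias); // common expression moved out of the loop
    int offset = 0;                     // takes the place of i * step
    foreach (ref d; dst)
    {
        d = s + offset;
        offset += step;                 // delta increment
    }
}
```

An optimiser can only perform the hoisting itself if it can see into `scaleFor`, which is the connection to inlining being argued in this post.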
 If you ask me, I have no value in recursive inlining, infact, 
 that would
 anger me considerably.
Why? You could always set the depth to 1, or make 1 the default. And it isn't difficult to implement.
Mar 19 2014
parent reply Manu <turkeyman gmail.com> writes:
On 20 March 2014 06:23,
<7d89a89974b0ff40.invalid internationalized.invalid>wrote:

 On Wednesday, 19 March 2014 at 12:35:30 UTC, Manu wrote:

 Okay, do you have use cases for any of this stuff? Are you just making it
 up, or do you have significant experience to say this is what you need?
I don't need anything, I hand optimize prematurely. And I don't want to do that. But yes, inner loops benefits from exhaustive inlining because you get to move common expressions out of the loop or change them to delta increments. It is only when you trash the caches that inlining does not pay off. I do it by hand. I don't want to do it by hand. If you ask me, I have no value in recursive inlining, infact, that would
 anger me considerably.
Why? You could always set the depth to 1, or make 1 the default. And it isn't difficult to implement.
The problem is upside down. If you want to inline multiple levels, you start from the leaves and move downwards, not from the root moving upwards (leaving a bunch of leaves perhaps not inlined), which is what you're really suggesting. Inlining should be strictly deliberate, there's nothing to say that every function called in a tree should be inlined. There's a high probability there's one/some that shouldn't be among a few that should. Remember too, that call-site inlining isn't the only method, there would also be always-inline. I think always-inline is what you want for some decidedly trivial functions (although these will probably be heuristically inlined anyway), not call-site inlining. I just don't see how recursive call-site inlining is appropriate, considering that call trees are often complex, subject to change, and may even call functions that you don't have source for. You can cascade the mixin keyword if you want to, that's very simple. I'd be highly surprised if you ever encountered a call tree where you wanted to inline everything (and the optimiser didn't do it for you). As soon as you encounter a single function in the tree that shouldn't be inlined, then you'll be forced to do it one level at a time anyway.
Mar 19 2014
next sibling parent reply "Ola Fosheim Grøstad" writes:
On Thursday, 20 March 2014 at 02:08:16 UTC, Manu wrote:
 The problem is upside down. If you want to inline multiple 
 levels, you
 start from the leaves and move downwards, not from the root 
 moving upwards
Yes, that is true in cases where leaves are frequently visited. Good point. I am most interested in full inlining, but the heuristics should probably start with the leaves for people not interested in that. Agree. Anyway, in the case of ray tracing (or any search structure) I could see the value of having the opposite in combination with CTFE/partial evaluation. Example: Define a static scene (of objects) and let the compiler turn it into "a state machine" of code. Another example: Define an array of data, use partial evaluation to turn it into a binary tree, then turn the binary tree into code.
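
The second example (data turned into a binary tree, then into code) can be roughly approximated in D today with compile-time template recursion. The following is only a sketch under assumptions: the `keys` array and the `contains` template are hypothetical names, the data is assumed to be sorted already, and whether the nested calls end up inlined is still left to the compiler.

```d
// Hypothetical data set, known at compile time and already sorted.
enum int[] keys = [1, 3, 5, 7, 9, 11, 13];

// Each instantiation handles one node of an implicit binary search tree,
// so the tree exists only as nested comparisons in the generated code.
bool contains(int[] sorted)(int x)
{
    static if (sorted.length == 0)
    {
        return false;
    }
    else
    {
        enum mid = sorted.length / 2;
        if (x == sorted[mid])
            return true;
        if (x < sorted[mid])
            return contains!(sorted[0 .. mid])(x);
        else
            return contains!(sorted[mid + 1 .. sorted.length])(x);
    }
}
```

A lookup is then written as `contains!keys(7)`; no array is searched at run time, only a fixed chain of comparisons.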
 Inlining should be strictly deliberate, there's nothing to say 
 that every
 function called in a tree should be inlined. There's a high 
 probability
 there's one/some that shouldn't be among a few that should.
In the case of a long running loop it does not really matter. What it does get you is a chance to use generic code (or libraries) and then do a first-resort optimization. I basically see it as a time-saving feature (programmers time). A tool for cutting development costs.
 Remember too, that call-site inlining isn't the only method, 
 there would
 also be always-inline...
Yes, that is the first. I have in another thread some time ago suggested a solution that uses weighted inlining to aid compiler heuristics:

http://forum.dlang.org/thread/szjkyfpnachnnyknnfwp@forum.dlang.org#post-szjkyfpnachnnyknnfwp:40forum.dlang.org

As you can see I also suggested call-site inlining, so I am fully behind you in this. :-) Lack of inlining and GC are my main objections to D.
 I think always-inline is what you want for some
 decidedly trivial functions (although these will probably be 
 heuristically
 inlined anyway), not call-site inlining.
I agree. Compiler heuristics can change. It is desirable to be able to express intent no matter what the current heuristics are.
 I just don't see how recursive
 call-site inlining is appropriate, considering that call trees 
 are often
 complex, subject to change, and may even call functions that 
 you don't have
 source for.
You should not use it blindly.
 You can cascade the mixin keyword if you want to, that's very 
 simple.
Not if you build the innerloop using generic components. I want this:

 inline_everything while(condition){
     statement;
     statement;
 }
 I'd be highly surprised if you ever encountered a call tree 
 where
 you wanted to inline everything (and the optimiser didn't do it 
 for you).
Not if you move to high-level programming using prewritten code and only go low level after profiling.
 As soon as you encounter a single function in the tree that 
 shouldn't be
 inlined, then you'll be forced to do it one level at a time 
 anyway.
But then you have to change the libraries you are using!?

Nothing prevents you from introducing exceptions as an extension though. I want inline(0.5) as default, but also be able to write inline(1) for inline always and inline(0) for inline never.

 func1(){} // implies inline(0.5) weighting
 inline func2(){} // same as inline(1) weighting, inline always
 inline(0.75) fun31(){} // increase the heuristics weighting
 inline(0) func4(){} // never-ever inline

Ola.
Mar 20 2014
next sibling parent "Ola Fosheim Grøstad" writes:
I just want to add these reasons for having inlining despite 
having compiler heuristics:

1. If you compile for embedded or PNACL on the web, you want a 
small executable. That means the heuristics should not inline if 
it increases the code size unless the programmer specified it in 
the code. (Or that you specify a target size, and do compiler 
re-runs until it fits.)

2. If you use profile guided optimization you should inline based 
on call frequency, but the input set might have missed some 
scenarios and you should be able to overrule the profile by 
explicit inlining in code where you know that it matters. (e.g. 
tight loop in an exception handler)
Mar 20 2014
prev sibling next sibling parent reply Manu <turkeyman gmail.com> writes:
On 20 March 2014 18:35,
<7d89a89974b0ff40.invalid internationalized.invalid>wrote:

 On Thursday, 20 March 2014 at 02:08:16 UTC, Manu wrote:

 The problem is upside down. If you want to inline multiple levels, you
 start from the leaves and move downwards, not from the root moving upwards
Yes, that is true in cases where leaves are frequently visited. Good point. I am most interested in full inlining, but the heuristics should probably start with the leaves for people not interested in that. Agree. Anyway, in the case of ray tracing (or any search structure) I could see the value of having the opposite in combination with CTFE/partial evaluation. Example: Define a static scene (of objects) and let the compiler turn it into "a state machine" of code. Another example: Define an array of data, use partial evaluation to turn it into a binary tree, then turn the binary tree into code. Inlining should be strictly deliberate, there's nothing to say that every
 function called in a tree should be inlined. There's a high probability
 there's one/some that shouldn't be among a few that should.
In the case of a long running loop it does not really matter. What it does get you is a chance to use generic code (or libraries) and then do a first-resort optimization. I basically see it as a time-saving feature (programmers time). A tool for cutting development costs. Remember too, that call-site inlining isn't the only method, there would
 also be always-inline...
Yes, that is the first. I have in another thread some time ago suggested a solution that use weighted inlining to aid compiler heuristics: http://forum.dlang.org/thread/szjkyfpnachnnyknnfwp forum.dlang.org#post- szjkyfpnachnnyknnfwp:40forum.dlang.org As you can see I also suggested call-site inlining, so I am fully behind you in this. :-) Lack of inlining and GC are my main objections to D. I think always-inline is what you want for some
 decidedly trivial functions (although these will probably be heuristically
 inlined anyway), not call-site inlining.
I agree. Compiler heuristics can change. It is desirable to be able to express intent no matter what the current heuristics are. I just don't see how recursive
 call-site inlining is appropriate, considering that call trees are often
 complex, subject to change, and may even call functions that you don't
 have
 source for.
You should not use it blindly. You can cascade the mixin keyword if you want to, that's very simple.

 Not if you build the innerloop using generic components. I want this

 inline_everything while(conditon){
 statement;
 statement;

 }

  I'd be highly surprised if you ever encountered a call tree where
 you wanted to inline everything (and the optimiser didn't do it for you).
Not if you move to high-level programming using prewritten code and only go low level after profiling. As soon as you encounter a single function in the tree that shouldn't be
 inlined, then you'll be forced to do it one level at a time anyway.
But then you have to change the libraries you are using!? Nothing prevents you to introduce exceptions as an extension though. I want inline(0.5) as default, but also be able to write inline(1) for inline always and inline(0) for inline never. func1(){} // implies inline(0.5) weighting inline func2(){} // same as inline(1) weighting, inline always inline(0.75) fun31(){} // increase the heuristics weighting inline(0) func4(){} // never-ever inline Ola.
I'm sorry. I really can't support any of these wildly complex ideas. I just don't feel they're useful, and they're not very well founded. A numeric weight? What scale is it in? I'm not sure of any 'standard-inline-weight-measure' that any programmer would be able to intuitively gauge the magic number against. That will simply never be agreed by the devs. It also doesn't make much sense... different platforms will assign very different weights and different heuristics at the inliner. It's not a numeric quantity; it's a complex determination whether a function is a good candidate or not. The value you specify is likely highly context sensitive and probably not portable. Heuristic based Inlining should be left to the optimiser to decide. And I totally object to recursive inlining. It has a kind of absolute nature that removes control all the way down the call tree, and I don't feel it's likely that you would often (ever?) want to explicitly inline an entire call tree. If you want to inline a second level, then write mixin in the second level. Recurse. You are talking about generic code as if this isn't appropriate, but I specifically intend to use this in generic code very similar to what you suggest; so I don't see the incompatibility. I think you're saying like manually specifying it all the way down the call tree is inconvenient, but I would argue that manually specifying *exclusions* throughout the call tree after specifying a recursive inline is even more inconvenient. It requires more language (a feature to mark an exclusion), has a kind of obtuse double-negative logic about it, and it's equally invasive to your code. If you can prove that single level call-site inlining doesn't satisfy your needs at some later time, make a proposal then, along with your real-world use cases. 
But by throwing it in this thread right now, you're kinda just killing the thread, and making it very unlikely that anything will happen at all, which is annoying, because I REALLY need this (I've been trying to motivate inline support for over 3 years), and I get the feeling you're just throwing hypotheticals around. You're still fairly new here, but be aware that feature requests will become exponentially less likely to be accepted with every degree of complexity added. By making this seem hard, you're also making it almost certain not to happen, which isn't in either of our interest. My OP suggestion is the simplest solution I can conceive which will definitely satisfy all the real-world use cases that I've ever encountered. Is predictable, portable, simple.
Mar 20 2014
parent reply "Ola Fosheim Grøstad" writes:
On Thursday, 20 March 2014 at 12:31:33 UTC, Manu wrote:
 I'm sorry. I really can't support any of these wildly complex 
 ideas.
They aren't actually complex, except tail-call optimization (but that is well understood).
 If you want to inline a second level, then write mixin in the 
 second level.
You might as well do copy-paste then. You cannot add inlining to an imported library without modifying it.
 at all, which is annoying, because I REALLY need this (I've 
 been trying to
 motivate inline support for over 3 years), and I get the 
 feeling you're
 just throwing hypotheticals around.
You need inlining, agree, but not 1 level mixin. Because you can do that with regular inlining.
Mar 20 2014
parent reply "Ola Fosheim Grøstad" writes:
Please note that 1 level mixin is not sufficient in the case of 
libraries. In too many cases you will not inline the function 
that does the work, only the interface wrapper.
Mar 20 2014
parent Manu <turkeyman gmail.com> writes:
On 21 March 2014 00:10,
<7d89a89974b0ff40.invalid internationalized.invalid>wrote:

 Please note that 1 level mixin is not sufficient in the case of libraries.
 In too many cases you will not inline the function that does the work, only
 the interface wrapper.
I don't think I would ever want to inline the whole call tree of a library. I've certainly never wanted to do anything like that in 20 years or so, and I've worked on some really performance critical systems, like amiga, dreamcast, ps2. It still sounds really sketchy. If the function that does the work is a few levels deep, then there is probably a good reason for that. What if there's an error check that writes log output or something? Or some branch that leads to other uncommon paths? I think you're making this problem up. Can you demonstrate where this has been a problem for you in the past? The call tree would have to be so very particular for this to be appropriate, and then you say this is a library, which you have no control over... so the call tree is just perfect by chance? What if the library changes?
Mar 20 2014
prev sibling parent reply "ponce" <contact gam3sfrommars.fr> writes:
On Thursday, 20 March 2014 at 08:35:22 UTC, Ola Fosheim Grøstad 
wrote:
 Nothing prevents you to introduce exceptions as an extension 
 though. I want inline(0.5) as default, but also be able to 
 write inline(1) for inline always and inline(0) for inline 
 never.

 func1(){} // implies inline(0.5) weighting
 inline func2(){} // same as inline(1) weighting, inline always
 inline(0.75) fun31(){} // increase the heuristics weighting
 inline(0) func4(){} // never-ever inline
It looks promising when seen like that, but introducing explicit inlining/deinlining to me corresponds to a precise process:

1. Bottleneck is identified.
2. "We could {inline|deinline} this call at this particular place and see what happens."
3. Apply inline directive for this call. Only "always" or "never" is ever wanted for me, and for 1 level only.
4. Measure and validate like all optimizations.

Now after this, even if the inlining becomes harmful for other reasons, I want this inlining to be maintained, whatever the cost, not subject to random rules I don't know of. When you tweak inlining, you are supposed to know what you are doing, and it's not just an optimization, it's an essential tool that enables other optimizations, helps disambiguate aliasing, helps the auto-vectorizer, helps constant propagation... In the large majority of cases it can be left to the compiler, and in the 1% of cases that matter I want to do it explicitly, full stop.
Mar 20 2014
parent "Ola Fosheim Grøstad" writes:
On Thursday, 20 March 2014 at 15:26:35 UTC, ponce wrote:
 Now after this, even if the inlining become harmful for other 
 reasons, I want this inlining to be maintained, whatever the 
 cost, not subject to random rules I don't know of. When you
The rules aren't random. The inliner conceptually uses weighting anyway, you just increase the threshold for a specific call-tree. E.g. if a function is on the borderline of being inlined the probability is 50% if you add some noise to the selection with a magnitude that equals the "typical approximation error" of the heuristics. "inline(0.75)" should increase the probability to 75%. Today all functions have an implied "inline(0.5)".

I think you should have this kind of control for all compiler heuristics thresholds that are "arbitrary", not only inlining.

Call site inlining is primarily useful for inlining external code. The alternative is usually to replace libraries with your own version.
Mar 20 2014
prev sibling parent "Puming" <zhaopuming gmail.com> writes:
Maybe we could have both declare site inlining and call site 
inlining.

with declare site, what we mean is that this function's body is 
used so commonly that we make it into a function only because we 
don't want duplicate code, not because it should be a standalone 
function.

with call site inlining, one can inline thirdparty functions 
which is not declared inline.

I think the `inline` Manu suggested should not be viewed as a 
mere optimization thing, but more like a code generation utility 
which happens to be faster. In this point of view, this kind of 
`inline` should be controlled by the coder, not the compiler.

To make it clear that we are not talking about optimization, 
maybe we should call it another name, like 'mixin function'?

BTW, the Kotlin language recently got a new release, which added 
support for declare site force inline; the team argues its 
necessity here:

http://blog.jetbrains.com/kotlin/2014/03/m7-release-available/#more-1439

in the comments:
It’s traditional to think about inlining as a mere optimization, 
but this dates back to the times when software was shipped as 
one huge binary file.

Why we think inline should be a language feature:
1. Other language features (to be implemented soon) depend on 
it. Namely, non-local returns and type-dependent functions. 
Basically, inline functions are very restricted macros, and this 
is definitely a language feature.
2. Due to dynamic linking and binary compatibility issues it can not be up to the compiler whether to inline something or not on the JVM: if bodies of inline functions change, all dependent code should be recompiled, i.e. it’s the library author’s liability to preserve functionality, so such functions must be explicitly marked.
On Thursday, 20 March 2014 at 02:08:16 UTC, Manu wrote:
 On 20 March 2014 06:23,
 <7d89a89974b0ff40.invalid internationalized.invalid>wrote:

 On Wednesday, 19 March 2014 at 12:35:30 UTC, Manu wrote:

 Okay, do you have use cases for any of this stuff? Are you 
 just making it
 up, or do you have significant experience to say this is what 
 you need?
I don't need anything, I hand optimize prematurely. And I don't want to do that. But yes, inner loops benefits from exhaustive inlining because you get to move common expressions out of the loop or change them to delta increments. It is only when you trash the caches that inlining does not pay off. I do it by hand. I don't want to do it by hand. If you ask me, I have no value in recursive inlining, infact, that would
 anger me considerably.
Why? You could always set the depth to 1, or make 1 the default. And it isn't difficult to implement.
The problem is upside down. If you want to inline multiple levels, you start from the leaves and move downwards, not from the root moving upwards (leaving a bunch of leaves perhaps not inlined), which is what you're really suggesting. Inlining should be strictly deliberate, there's nothing to say that every function called in a tree should be inlined. There's a high probability there's one/some that shouldn't be among a few that should. Remember too, that call-site inlining isn't the only method, there would also be always-inline. I think always-inline is what you want for some decidedly trivial functions (although these will probably be heuristically inlined anyway), not call-site inlining. I just don't see how recursive call-site inlining is appropriate, considering that call trees are often complex, subject to change, and may even call functions that you don't have source for. You can cascade the mixin keyword if you want to, that's very simple. I'd be highly surprised if you ever encountered a call tree where you wanted to inline everything (and the optimiser didn't do it for you). As soon as you encounter a single function in the tree that shouldn't be inlined, then you'll be forced to do it one level at a time anyway.
Mar 23 2014
prev sibling parent Jacob Carlborg <doob me.com> writes:
On 2014-03-19 09:35, Manu wrote:

 I don't already have it, otherwise I'd be making use of it. D has no
 control over the inliner. GDC/LDC offer attributes, but then it's really
 annoying that D has no mechanism to make use of compiler-specific
 attributes in a portable way (ie, attribute aliasing), so I can't make
 use of those without significantly interfering with my code.
Can't you create a tuple with different attributes depending on which compiler is currently compiling? Something like this:

version (LDC)
    alias attributes = TypeTuple!( attribute("forceinline"));
else version (GDC)
    alias attributes = TypeTuple!( attribute("forceinline"));
else version (DigitalMars)
    alias attributes = TypeTuple!();
else
    static assert(false);

(attributes) void foo () { } // This assumes that "attributes" will be expanded

-- 
/Jacob Carlborg
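A minimal sketch of the attribute-aliasing idea, for readers who want to try it. Everything named here is an assumption for illustration: `ForceInlineHint` is a hypothetical marker type standing in for whatever real attribute a given compiler offers, and `inlineAttrs` and `square` are made-up names. It uses `AliasSeq` from `std.meta`, the later name for `TypeTuple`. Note also that GDC predefines the version identifier `GNU`, so a `version (GDC)` branch would not be taken.

```d
import std.meta : AliasSeq;

// Hypothetical marker type; a real build would substitute the
// compiler's own inlining attribute here.
struct ForceInlineHint {}

version (LDC)
    alias inlineAttrs = AliasSeq!(ForceInlineHint);
else version (GNU) // GDC predefines GNU, not GDC
    alias inlineAttrs = AliasSeq!(ForceInlineHint);
else
    alias inlineAttrs = AliasSeq!(); // e.g. DMD: attach nothing

// The sequence is expanded into zero or more user-defined attributes.
@(inlineAttrs) int square(int x) { return x * x; }

void main()
{
    // How many attributes end up attached depends on the compiler.
    static assert(__traits(getAttributes, square).length == inlineAttrs.length);
    assert(square(4) == 16);
}
```

This only shows that one alias can carry a different attribute list per compiler; a marker struct by itself has no effect on the inliner.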
Mar 20 2014
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 15 March 2014 03:57, Jacob Carlborg <doob me.com> wrote:

 On 2014-03-14 07:21, Manu wrote:

 So, I'm constantly running into issues with not having control over
 inline.
 I've run into it again doing experiments in preparation for my dconf
 talk...

 I have identified 2 cases which come up regularly:
   1. A function that should always be inline unconditionally (std.simd
 is effectively blocked on this)
   2. A particular invocation of a function should be inlined for this
 call only

 The first case it just about having control over code gen. Some
 functions should effectively be macros or pseudo-intrinsics (ie,
 intrinsic wrappers in std.simd, beauty wrappers around asm code, etc),
 and I don't ever want to see a symbol appear in the binary.

 My suggestion is introduction of __forceinline or something like it. We
 need this.
Haven't we already agreed a pragma for force inline should be implemented. Or is that something I have dreamed?
It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get.

I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.

 The second case is interesting, and I've found it comes up a few times
 on different occasions.
 In my current instance, I'm trying to build generic framework to perform
 efficient composable data processing, and a basic requirement is that
 the components are inlined, such that the optimiser can interleave the
 work properly.

 Let's imagine I have a template which implements a work loop, which
 wants to call a bunch of work elements it receives by alias. The issue
 is, each of those must be inlined, for this call instance only, and
 there's no way to do this.
 I'm gonna draw the line at stringified code to use with mixin; I hate
 that, and I don't want to encourage use of mixin or stringified code in
 user-facing API's as a matter of practise. Also, some of these work
 elements might be useful functions in their own right, which means they
 can indeed be a function existing somewhere else that shouldn't itself
 be attributed as __forceinline.

 What are the current options to force that some code is inlined?

 My feeling is that an ideal solution would be something like an
 enhancement which would allow the 'mixin' keyword to be used with
 regular function calls.
 What this would do is 'mix in' the function call at this location, ie,
 effectively inline that particular call, and it leverages a keyword and
 concept that we already have. It would obviously produce a compile error
 of the code is not available.

 I quite like this idea, but there is a potential syntactical problem;
 how to assign the return value?

 int func(int y) { return y*y+10; }

 int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get
 in the way' if the output
I think this is the best syntax of these three alternatives.
 int output = mixin(func(10)); // now i feel paren spammy...
This syntax can't work. It's already interpreted as calling "func" and using the result as a string mixin.
 mixin(int output = func(10)); // this doesn't feel right...

 No.


  My feeling is the first is the best, but I'm not sure about that
 grammatically.
Yeah, I agree.
So you think it's grammatically okay?

 The other thing that comes to mind is that it seems like this might make
 a case for AST macros... but I think that's probably overkill for this
 situation, and I'm not confident we're ever gonna attempt to crack that
 nut. I'd like to see something practical and unobjectionable preferably.
AST macros would solve it. It could solve the first use case as well. I would not implement AST macros just to support force inline, but we have many other use cases as well. I would have implemented AST macros a long time ago. Hopefully this would avoid the need to create new language features in some cases.

First use case, just define a macro that returns the AST for the content of the function you would create.

macro func (Ast!(int) a)
{
    return <[ $a * $a; ]>;
}

int output = func(10); // always inlined

Second use case, define a macro, "inline", that takes the function you want to call as a parameter. The macro will basically inline the body.

macro inline (T, U...) (Ast!(T function (U)) func)
{
    // this would probably be more complicated
    return func.body;
}

int output = func(10); // not inlined
int output = inline(func(10)); // always inlined

 This problem is fairly far reaching; phobos receives a lot of lambdas
 these days, which I've found don't reliably inline and interfere with
 the optimisers ability to optimise the code.
I thought since lambdas are passed as template parameters they would always be inlined.
Maybe... (and not in debug builds). Without explicit control of the inliner, you just never know.
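To make the second use case concrete, here is a small sketch of a work loop that receives its work element by alias. `runPipeline` and `stage` are hypothetical names; `func` is the example function from the original post. Each instantiation knows the concrete callee, which gives the optimiser the opportunity to inline it, but nothing in the code below forces that to happen, which is exactly the complaint.

```d
// Hypothetical work loop: the stage is a template alias parameter, so
// every instantiation is specialised for one concrete callee.
int runPipeline(alias stage)(const(int)[] data)
{
    int total = 0;
    foreach (x; data)
        total += stage(x); // inlined only if the compiler decides to
    return total;
}

int func(int y) { return y * y + 10; } // example function from the thread

void main()
{
    auto data = [1, 2, 3];
    assert(runPipeline!func(data) == 44);         // 11 + 14 + 19
    assert(runPipeline!(x => x * 2)(data) == 12); // lambda passed by alias
}
```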
Mar 14 2014
parent reply "Daniel Murphy" <yebbliesnospam gmail.com> writes:
"Manu" <turkeyman gmail.com> wrote in message 
news:mailman.128.1394856947.23258.digitalmars-d puremagic.com...

 Haven't we already agreed a pragma for force inline should be 
 implemented. Or is
 that something I have dreamed?
It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get. I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.
Huh? The pragma could easily be restricted to apply to exactly one function declaration, if that's what's desired.
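For illustration, the sketch below uses the `pragma(inline, ...)` that later D compilers provide. It postdates this thread, so treat it as an assumption about how such a pragma could look rather than the design being debated here. Like other pragmas it can be attached to a single declaration, to a braced block, or as a label covering everything that follows in the scope; the last form is the silent, stateful case objected to above. All function names are made up.

```d
// Single-declaration form: applies to exactly one function.
pragma(inline, true)
int square(int x) { return x * x; }

// Not covered by any pragma.
int cube(int x) { return x * x * x; }

// Block form: applies only to the declarations inside the braces.
pragma(inline, true)
{
    int twice(int x) { return x + x; }
}

void main()
{
    assert(square(3) == 9);
    assert(cube(2) == 8);
    assert(twice(5) == 10);
    assert(addOne(1) == 2 && addTwo(1) == 3);
}

// Label form: covers every declaration that follows in this scope.
pragma(inline, true):
int addOne(int x) { return x + 1; }
int addTwo(int x) { return x + 2; } // also affected, perhaps unintentionally
```

`main` is deliberately placed above the label form, since anything written below that line would be swept up by it.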
Mar 14 2014
next sibling parent Manu <turkeyman gmail.com> writes:
On 15 March 2014 14:33, Daniel Murphy <yebbliesnospam gmail.com> wrote:

 "Manu" <turkeyman gmail.com> wrote in message news:mailman.128.1394856947.
 23258.digitalmars-d puremagic.com...

 Haven't we already agreed a pragma for force inline should be
 implemented. Or is
 that something I have dreamed?
It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get. I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.
Huh? The pragma could easily be restricted to apply to exactly one function declaration, if that's what's desired.
Then why bother with a pragma? It's just a special case for the sake of a special case... I don't see why we should resist the language conventions. Where's the precedent for that? It just sounds like it's asking to cause edge cases and trouble down the line. Is it gonna get messy when it's involved with templates? What about methods, sub-functions?
Mar 14 2014
prev sibling parent reply Manu <turkeyman gmail.com> writes:
On 15 March 2014 14:55, Manu <turkeyman gmail.com> wrote:

 On 15 March 2014 14:33, Daniel Murphy <yebbliesnospam gmail.com> wrote:

 "Manu" <turkeyman gmail.com> wrote in message
 news:mailman.128.1394856947.23258.digitalmars-d puremagic.com...

 Haven't we already agreed a pragma for force inline should be
 implemented. Or is
 that something I have dreamed?
It's been discussed. I never agreed to it (I _really_ don't like it), but I'll take it if it's the best I'm gonna get. I don't like stateful attributes like that. I think it's error prone, especially when it's silent. 'private:' for instance will complain if you write a new function in an area influenced by the private state and try and call it from elsewhere; ie, you know you made the mistake. If you write a new function in an area influenced by the forceinline state which wasn't intended to be inlined, you won't know. I think that's dangerous.
Huh? The pragma could easily be restricted to apply to exactly one function declaration, if that's what's desired.
Then why bother with a pragma? It's just a special case for the sake of a special case... I don't see why resist the language conventions. Where's the precedent for that? It just sounds like it's asking to cause edge cases and trouble down the line. Is it gonna get messy when it involves with templates? What about methods, sub-functions?
*bump* I actually care about this a whole lot more than final-by-default right now ;) I'd like to think there's a possible solution to these problems that everyone agrees with.
Mar 17 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/17/14, 6:26 AM, Manu wrote:
 On 15 March 2014 14:55, Manu <turkeyman gmail.com
 <mailto:turkeyman gmail.com>> wrote:

     On 15 March 2014 14:33, Daniel Murphy <yebbliesnospam gmail.com
     <mailto:yebbliesnospam gmail.com>> wrote:

         "Manu" <turkeyman gmail.com <mailto:turkeyman gmail.com>> wrote
         in message
         news:mailman.128.1394856947.23258.digitalmars-d puremagic.com...

              > Haven't we already agreed a pragma for force inline
             should be > implemented. Or is
              > that something I have dreamed?

             It's been discussed. I never agreed to it (I _really_ don't
             like it), but I'll take it if it's the best
             I'm gonna get.

             I don't like stateful attributes like that. I think it's
             error prone, especially when it's silent.
             'private:' for instance will complain if you write a new
             function in an area influenced by the
             private state and try and call it from elsewhere; ie, you
             know you made the mistake.
             If you write a new function in an area influenced by the
             forceinline state which wasn't intended
             to be inlined, you won't know. I think that's dangerous.


         Huh?  The pragma could easily be restricted to apply to exactly
         one function declaration, if that's what's desired.


     Then why bother with a pragma?
     It's just a special case for the sake of a special case... I don't
     see why resist the language conventions. Where's the precedent for
     that? It just sounds like it's asking to cause edge cases and
     trouble down the line.
     Is it gonna get messy when it involves with templates? What about
     methods, sub-functions?


 *bump*
 I actually care about this a whole lot more than final-by-default right
 now ;)

 I'd like to think there's a possible solution to these problems that
 everyone agrees with.
I'd like to see a solution to inlining along the lines of "pliz pliz inline" (best effort) and "never inline".

Outlining only at a specific call site is seldom needed, and when it is, it's trivially achievable with a noinline function forwarding to the inline function. Inlining only at a specific call site is a tall order and essentially impossible if header generation has been used.

Andrei
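The forwarding trick mentioned above can be sketched as follows. This assumes the `pragma(inline, true)` / `pragma(inline, false)` pair found in later D compilers as stand-ins for "pliz pliz inline" and "never inline"; `funcOutlined` is a hypothetical wrapper name, and `func` is the example function from the original post.

```d
// Requested to be inlined wherever it is called.
pragma(inline, true)
int func(int y) { return y * y + 10; }

// Hypothetical wrapper for the rare call site that wants a real call:
// the wrapper itself is never inlined, and func is inlined inside it.
pragma(inline, false)
int funcOutlined(int y) { return func(y); }

void main()
{
    int hot = func(10);          // candidate for inlining at this site
    int cold = funcOutlined(10); // stays an out-of-line call
    assert(hot == 110 && cold == 110);
}
```

The reverse direction, inlining one call of an otherwise ordinary function, has no equivalent wrapper, which is the gap the call-site proposal is aimed at.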
Mar 17 2014
parent reply Manu <turkeyman gmail.com> writes:
On 18 March 2014 01:36, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 I'd like to see a solution to inlining along the lines of "pliz pliz
 inline" (best effort) and "never inline".

 Outlining only at a specific call site is seldom needed and when it is
 it's trivially achievable with a noinline function forwarding to the inline
 function. Inlining only at a specific call site is a tall order and
 essentially impossible if header generation had been used.
I don't follow, how does that work? It's the key innovation here. Since D doesn't have macros, I think it's something that really needs to be supported nicely. Obviously it's impossible if source is unavailable. It should give the same complaints that CTFE gives when source is unavailable.
Mar 17 2014
parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 3/17/14, 9:10 AM, Manu wrote:
 On 18 March 2014 01:36, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>>
 wrote:


     I'd like to see a solution to inlining along the lines of "pliz pliz
     inline" (best effort) and "never inline".

     Outlining only at a specific call site is seldom needed and when it
     is it's trivially achievable with a noinline function forwarding to
     the inline function. Inlining only at a specific call site is a tall
     order and essentially impossible if header generation had been used.


 I don't follow, how does that work?
 It's the key innovation here. Since D doesn't have macros, I think it's
 something that really needs to be supported nicely.
 Obviously it's impossible if source is unavailable. It should give the
 same complaints that CTFE gives when source is unavailable.
The notion that a compiler can ask for any function to be inlined without the compiler having been "warned" in the function declaration makes me uncomfortable about feasibility. However, upon further thinking the same happens with CTFE.

Andrei
Mar 17 2014
parent Manu <turkeyman gmail.com> writes:
On 18 March 2014 06:37, Andrei Alexandrescu
<SeeWebsiteForEmail erdani.org> wrote:

 On 3/17/14, 9:10 AM, Manu wrote:

 On 18 March 2014 01:36, Andrei Alexandrescu
 <SeeWebsiteForEmail erdani.org <mailto:SeeWebsiteForEmail erdani.org>>

 wrote:


     I'd like to see a solution to inlining along the lines of "pliz pliz
     inline" (best effort) and "never inline".

     Outlining only at a specific call site is seldom needed and when it
     is it's trivially achievable with a noinline function forwarding to
     the inline function. Inlining only at a specific call site is a tall
     order and essentially impossible if header generation had been used.


 I don't follow, how does that work?
 It's the key innovation here. Since D doesn't have macros, I think it's
 something that really needs to be supported nicely.
 Obviously it's impossible if source is unavailable. It should give the
 same complaints that CTFE gives when source is unavailable.
The notion that a compiler can ask for any function to be inlined without the compiler having been "warned" in the function declaration makes me uncomfortable about feasibility. However, upon further thinking the same happens with CTFE.
Exactly, we already have it in CTFE. It doesn't really add any new concept that D isn't already comfortable with. It can kinda be seen as sort of a type safe macro, which is a tool that D is lacking compared to C. I think the mixin keyword and concept makes perfect sense in this context. It feels quite intuitive to me.
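The CTFE comparison can be seen in a few lines. The same unannotated function is evaluated at compile time simply because the use site asks for it, and this only works when the function body is visible to the compiler. `atCompileTime` and `atRunTime` are illustrative names.

```d
int func(int y) { return y * y + 10; } // no special annotation on the declaration

// An enum initialiser must be a compile-time constant, so this use site
// forces func to be interpreted by the compiler.
enum atCompileTime = func(10);
static assert(atCompileTime == 110);

void main()
{
    int atRunTime = func(10); // the very same function, called normally
    assert(atRunTime == atCompileTime);
}
```

If only a bodiless declaration such as `int func(int y);` were visible, as with a header-only interface file, the `enum` line would be rejected at compile time, which is the behaviour suggested above for a call-site `mixin` as well.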
Mar 18 2014
prev sibling parent reply David Gileadi <gileadis NSPMgmail.com> writes:
On 3/13/14, 11:21 PM, Manu wrote:
 My feeling is that an ideal solution would be something like an
 enhancement which would allow the 'mixin' keyword to be used with
 regular function calls.
 What this would do is 'mix in' the function call at this location, ie,
 effectively inline that particular call, and it leverages a keyword and
 concept that we already have. It would obviously produce a compile error
 of the code is not available.

 I quite like this idea, but there is a potential syntactical problem;
 how to assign the return value?

 int func(int y) { return y*y+10; }

 int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get
 in the way' if the output
 int output = mixin(func(10)); // now i feel paren spammy...
 mixin(int output = func(10)); // this doesn't feel right...

 My feeling is the first is the best, but I'm not sure about that
 grammatically.
Is there already some trait for getting the string value of a function including its code? If so then a mixin plus a small helper function might do the job. If not then is such a trait feasible?
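For comparison, this is roughly what the string-based route looks like today when the code is written as a string by hand instead of being obtained from a trait. `funcCode` is a hypothetical helper that returns D source text for the expression `y*y+10`. It also shows why `mixin(func(10))` is already taken: the argument of a string mixin is evaluated at compile time and its result is compiled as code.

```d
// Hypothetical helper: builds D source text for (arg)*(arg)+10.
string funcCode(string arg)
{
    return "(" ~ arg ~ ") * (" ~ arg ~ ") + 10";
}

void main()
{
    int x = 10;
    // funcCode("x") runs at compile time; the resulting text is compiled
    // in place, so no call to a squaring function remains in main.
    int output = mixin(funcCode("x"));
    assert(output == 110);
}
```

This is the stringified style the original post wants to keep out of user-facing APIs; it is shown only to make the comparison concrete.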
Mar 14 2014
parent reply Paulo Pinto <pjmlp progtools.org> writes:
Am 14.03.2014 19:09, schrieb David Gileadi:
 On 3/13/14, 11:21 PM, Manu wrote:
 My feeling is that an ideal solution would be something like an
 enhancement which would allow the 'mixin' keyword to be used with
 regular function calls.
 What this would do is 'mix in' the function call at this location, ie,
 effectively inline that particular call, and it leverages a keyword and
 concept that we already have. It would obviously produce a compile error
 of the code is not available.

 I quite like this idea, but there is a potential syntactical problem;
 how to assign the return value?

 int func(int y) { return y*y+10; }

 int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get
 in the way' if the output
 int output = mixin(func(10)); // now i feel paren spammy...
 mixin(int output = func(10)); // this doesn't feel right...

 My feeling is the first is the best, but I'm not sure about that
 grammatically.
Is there already some trait for getting the string value of a function including its code? If so then a mixin plus a small helper function might do the job. If not then is such a trait feasible?
Might be problematic with modules delivered only in .di + binary form.

--
Paulo
Mar 14 2014
parent David Gileadi <gileadis NSPMgmail.com> writes:
On 3/14/14, 1:42 PM, Paulo Pinto wrote:
 Am 14.03.2014 19:09, schrieb David Gileadi:
 On 3/13/14, 11:21 PM, Manu wrote:
 My feeling is that an ideal solution would be something like an
 enhancement which would allow the 'mixin' keyword to be used with
 regular function calls.
 What this would do is 'mix in' the function call at this location, ie,
 effectively inline that particular call, and it leverages a keyword and
 concept that we already have. It would obviously produce a compile error
 of the code is not available.

 I quite like this idea, but there is a potential syntactical problem;
 how to assign the return value?

 int func(int y) { return y*y+10; }

 int output = mixin func(10); // the 'mixin' keyword seems to kinda 'get
 in the way' if the output
 int output = mixin(func(10)); // now i feel paren spammy...
 mixin(int output = func(10)); // this doesn't feel right...

 My feeling is the first is the best, but I'm not sure about that
 grammatically.
Is there already some trait for getting the string value of a function including its code? If so then a mixin plus a small helper function might do the job. If not then is such a trait feasible?
Might be problematic with modules delivered only in .di + binary form. -- Paulo
Quite, but as Manu says about his proposed solution,
 It would obviously produce a compile error
 of (sic) the code is not available.
This would need to behave similarly.
Mar 14 2014