www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - The cost of doing compile time introspection

reply Moritz Maxeiner <moritz ucworks.org> writes:
So, after being asked to support dynamic loading in llvm-d[1] at 
DConf 2017 I essentially had the following two options:
- have two separate declarations of the bindings, one with the 
function signatures (linking) and one with function pointers 
(loading)
- Have one declaration and derive the other at compile time

Since this is D and I was reminded of Andrei's keynote I thought 
to myself "use the Compiler, Dummy" and went about 
introspecting[2] the function signatures[3]. The relevant code 
looks like this:

---
import functions = llvm.functions.link;

template isCFunction(alias scope_, string member)
{
     static if (isFunction!(__traits(getMember, scope_, member)) &&
                (functionLinkage!(__traits(getMember, scope_, 
member)) == "C" ||
                 functionLinkage!(__traits(getMember, scope_, 
member)) == "Windows")) {
         enum isCFunction = true;
     } else {
         enum isCFunction = false;
     }
}

template CFunctions(alias mod)
{
     alias isCFunction(string member) = .isCFunction!(mod, member);
     alias CFunctions = Filter!(isCFunction, __traits(allMembers, 
mod));
}

string declareStubs()
{
     import std.array : appender;
     auto code = appender!string;
     foreach (fn; CFunctions!functions) {
         code.put("typeof(functions."); code.put(fn);
         code.put(")* "); code.put(fn); code.put(";\n");
     }
     return code.data;
}

mixin (declareStubs);
---

Now, the above code essentially searches through all symbols in 
the llvm.functions.link module, filters out anything that's not 
an extern(System) function and then generates code declaring a 
function pointer for each of them, and then finally mixes that 
code in.

This increases compilation time on my Broadwell i5-5200U from 
faster than my terminal emulator can print to 4 seconds. Yikes. 
 From experience I knew that it wouldn't be cheap, but that's a 
hefty cost for a conceptually simple use case (from my PoV).

You might ask why I need to use CTFE here (as the interpreter is 
not cheap): It's because I want the mixed-in function pointers to 
be at module scope and I'm not aware of any other way to do that 
currently; if the new static foreach works the way I suspect, 
then that part could essentially be replaced (at module scope) 
with something like

---
static foreach (fn; CFunctions!functions) {
     typeof(functions.fn)* __traits(identifier, fn);
}
---

Looks much nicer, but will it be fast? I don't know, but I hope 
so, since it shouldn't need to spin up CTFE.

TLDR: Compile time introspection still increases compile time 
from below human response time (i.e. without encouraging context 
switching in your brain) to breaking your concentration even for 
simple use cases.

[1] https://github.com/Calrama/llvm-d
[2] 
https://github.com/Calrama/llvm-d/blob/master/source/llvm/functions/load.d
[3] 
https://github.com/Calrama/llvm-d/blob/master/source/llvm/functions/link.d
May 10 2017
parent reply Biotronic <simen.kjaras gmail.com> writes:
On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner wrote:
[CTFE slow]

First, as you may know, Stefan Koch is working on an improved 
CTFE engine that will hopefully make things a lot better.

As for making the code faster right now, could this be done with 
mixin templates instead?

Something like:

import functions = llvm.functions.link;
import std.meta, std.traits;

template isCFunction(alias member)
{
     static if (isFunction!(member) &&
                (functionLinkage!(member) == "C" ||
                 functionLinkage!(member) == "Windows")) {
         enum isCFunction = true;
     } else {
         enum isCFunction = false;
     }
}

template CFunctions(alias scope_)
{
     alias GetSymbol(string member) = 
AliasSeq!(__traits(getMember, scope_, member));
     alias CFunctions = Filter!(isCFunction, staticMap!(GetSymbol, 
__traits(allMembers, scope_)));
}

mixin template declareStubsImpl(T...)
{
     static if (T.length == 0) {
     } else {
         mixin("extern (C) typeof(T[0])* "~__traits(identifier, 
T[0])~";");
         mixin declareStubsImpl!(T[1..$]);
     }
}

mixin template declareStubs(alias scope_)
{
     mixin declareStubsImpl!(CFunctions!scope_);
}

mixin declareStubs!functions;

(tests indicate this compiles but doesn't run, and I'm helping 
you on company time, so I can't fix everything right now :p)


 if the new static foreach works the way I suspect, then that 
 part could essentially be replaced (at module scope) with 
 something like

 ---
 static foreach (fn; CFunctions!functions) {
     typeof(functions.fn)* __traits(identifier, fn);
 }
A few things here - functions.fn would not do what you want, and neither would __traits(identifier). functions.fn would treat "fn" like a part of name, not a string value, so this will make the poor compiler barf. __traits(identifier, fn) expects fn to be a symbol, while here it's a string. In fact, it's exactly the string you want __traits to return. Lastly, you'll still need a mixin, whether it's for __traits(identifier, fn) or just fn - they're just strings. Something like this: static foreach (fn; CFunctions!functions) { mixin("typeof(__traits(getMember, functions, fn))* "~fn~";"); }
May 10 2017
next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
 On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner 
 wrote:
 [CTFE slow]

 First, as you may know, Stefan Koch is working on an improved 
 CTFE engine that will hopefully make things a lot better.
It will not; This is issue is caused by templates, and not by CTFE.
May 10 2017
parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 10.05.2017 16:28, Stefan Koch wrote:
 On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
 On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner wrote:
 [CTFE slow]

 First, as you may know, Stefan Koch is working on an improved CTFE
 engine that will hopefully make things a lot better.
It will not; This is issue is caused by templates, and not by CTFE.
I think my measurements show that the main bottleneck is actually Appender in CTFE, while templates contribute a smaller, yet significant, amount.
May 11 2017
parent Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 11 May 2017 at 21:57:06 UTC, Timon Gehr wrote:
 On 10.05.2017 16:28, Stefan Koch wrote:
 On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
 On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner 
 wrote:
 [CTFE slow]

 First, as you may know, Stefan Koch is working on an improved 
 CTFE
 engine that will hopefully make things a lot better.
It will not; This is issue is caused by templates, and not by CTFE.
I think my measurements show that the main bottleneck is actually Appender in CTFE, while templates contribute a smaller, yet significant, amount.
You are correct. I should not have made this statement without actually measuring. Still templates produce the enormous amounts of code that ctfe has to wade trough. So while they are not the bottleneck in this case, they are still the cause.
May 11 2017
prev sibling next sibling parent Moritz Maxeiner <moritz ucworks.org> writes:
On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
 On Wednesday, 10 May 2017 at 11:45:05 UTC, Moritz Maxeiner 
 wrote:
 [CTFE slow]

 First, as you may know, Stefan Koch is working on an improved 
 CTFE engine that will hopefully make things a lot better.
As he already pointed out, it won't help with the introspection template costs, though.
 As for making the code faster right now, could this be done 
 with mixin templates instead?

 Something like:

 [...]
This will move from CTFE to templates, sure, but templates aren't that fast, either. I'll experiment with your suggestion regardless to see what it'll result in, but I'm not very optimistic about that solving the issue.
 (tests indicate this compiles but doesn't run, and I'm helping 
 you on company time, so I can't fix everything right now :p)
Thank you, I'll gladly take any advice and try to optimize things, though my primary purpose with this post was not to ask for help (for once :p ), but to publicize my experience with compile time introspection costs as a reference for others (and because a certain someone asked me too - you know who you are).
 [...]
A few things here - functions.fn would not do what you want, and neither would __traits(identifier). functions.fn would treat "fn" like a part of name, not a string value, so this will make the poor compiler barf. __traits(identifier, fn) expects fn to be a symbol, while here it's a string. In fact, it's exactly the string you want __traits to return. Lastly, you'll still need a mixin, whether it's for __traits(identifier, fn) or just fn - they're just strings.
Right, but it'll still be one fairly short, readable line in the end (that should be faster than CTFE).
May 10 2017
prev sibling next sibling parent Moritz Maxeiner <moritz ucworks.org> writes:
On Wednesday, 10 May 2017 at 14:03:58 UTC, Biotronic wrote:
 As for making the code faster right now, could this be done 
 with mixin templates instead?

 Something like:

 import functions = llvm.functions.link;
 import std.meta, std.traits;

 template isCFunction(alias member)
 {
     static if (isFunction!(member) &&
                (functionLinkage!(member) == "C" ||
                 functionLinkage!(member) == "Windows")) {
         enum isCFunction = true;
     } else {
         enum isCFunction = false;
     }
 }

 template CFunctions(alias scope_)
 {
     alias GetSymbol(string member) = 
 AliasSeq!(__traits(getMember, scope_, member));
     alias CFunctions = Filter!(isCFunction, 
 staticMap!(GetSymbol, __traits(allMembers, scope_)));
 }

 mixin template declareStubsImpl(T...)
 {
     static if (T.length == 0) {
     } else {
         mixin("extern (C) typeof(T[0])* "~__traits(identifier, 
 T[0])~";");
         mixin declareStubsImpl!(T[1..$]);
     }
 }

 mixin template declareStubs(alias scope_)
 {
     mixin declareStubsImpl!(CFunctions!scope_);
 }

 mixin declareStubs!functions;
After testing this approach out, I couldn't even time it. Why? Because the compiler pretty much immediately hits the (I think fixed) recursive template expansion limit. The LLVM C API has too many functions for this :/
May 11 2017
prev sibling parent reply Timon Gehr <timon.gehr gmx.ch> writes:
On 10.05.2017 16:03, Biotronic wrote:

 A few things here - functions.fn would not do what you want, and neither
 would __traits(identifier).
 functions.fn would treat "fn" like a part of name, not a string value,
 so this will make the poor compiler barf.
 __traits(identifier, fn) expects fn to be a symbol, while here it's a
 string. In fact, it's exactly the string you want __traits to return.
 Lastly, you'll still need a mixin, whether it's for __traits(identifier,
 fn) or just fn - they're just strings. Something like this:

 static foreach (fn; CFunctions!functions) {
     mixin("typeof(__traits(getMember, functions, fn))* "~fn~";");
 }
Yes, this works and is a few times faster. It's slightly faster when inlining the condition: static foreach(fn;__traits(allMembers, functions)){ static if (isFunction!(__traits(getMember, functions, fn)) && (functionLinkage!(__traits(getMember, functions, fn)) == "C" || functionLinkage!(__traits(getMember, functions, fn)) == "Windows")){ mixin("typeof(functions."~fn~")* "~fn~";"); } } With the DMD debug build, I measured the following times on my machine: Baselines: just imports: 0m0.318s copy-pasted generated code after printing it with pragma(msg, ...): 0m0.341s Compile-time code generation: old version: 0m2.569s static foreach, uninlined: 0m0.704s static foreach inlined: 0m0.610s Still not great, but a notable improvement. isFunction and functionLinkage are slow, so I got rid of them (as well as the dependency on std.traits): static foreach(fn;__traits(allMembers, functions)){ static if(fn != "object" && fn != "llvm" && fn != "orEmpty"): mixin("typeof(functions."~fn~")* "~fn~";"); } timing: 0m0.350s (This is not perfect as you'll need to edit the list in case you are adding more non-c-function members to that module, but I guess it is a good trade-off.) You can achieve essentially the same using a string mixin: mixin({ string r; foreach(fn;__traits(allMembers, functions)) if(fn != "object" && fn != "llvm" && fn != "orEmpty") r~="typeof(functions."~fn~")* "~fn~";"; return r; }()); timing: 0m0.370s In case the original semantics should be preserved, I think this is the best option: mixin({ string r; foreach(fn;CFunctions!functions) r~="typeof(functions."~fn~")* "~fn~";"; return r; }()); timing: 0m0.740s
May 11 2017
parent Moritz Maxeiner <moritz ucworks.org> writes:
On Thursday, 11 May 2017 at 21:09:05 UTC, Timon Gehr wrote:
 [...]

 Yes, this works and is a few times faster.
 It's slightly faster when inlining the condition:

 static foreach(fn;__traits(allMembers, functions)){
     static if (isFunction!(__traits(getMember, functions, fn)) 
 &&
                (functionLinkage!(__traits(getMember, functions, 
 fn)) == "C" ||
                 functionLinkage!(__traits(getMember, functions, 
 fn)) == "Windows")){
         mixin("typeof(functions."~fn~")* "~fn~";");
     }
 }

 With the DMD debug build, I measured the following times on my 
 machine:

 Baselines:

 just imports:
 0m0.318s

 copy-pasted generated code after printing it with pragma(msg, 
 ...):
 0m0.341s

 Compile-time code generation:

 old version:
 0m2.569s

 static foreach, uninlined:
 0m0.704s

 static foreach inlined:
 0m0.610s

 Still not great, but a notable improvement.


 isFunction and functionLinkage are slow, so I got rid of them 
 (as well as the dependency on std.traits):

 static foreach(fn;__traits(allMembers, functions)){
     static if(fn != "object" && fn != "llvm" && fn != 
 "orEmpty"):
     mixin("typeof(functions."~fn~")* "~fn~";");
 }

 timing:
 0m0.350s

 (This is not perfect as you'll need to edit the list in case 
 you are adding more non-c-function members to that module, but 
 I guess it is a good trade-off.)



 You can achieve essentially the same using a string mixin:

 mixin({
     string r;
     foreach(fn;__traits(allMembers, functions))
         if(fn != "object" && fn != "llvm" && fn != "orEmpty")
             r~="typeof(functions."~fn~")* "~fn~";";
     return r;
 }());

 timing:
 0m0.370s



 In case the original semantics should be preserved, I think 
 this is the best option:

 mixin({
     string r;
     foreach(fn;CFunctions!functions)
         r~="typeof(functions."~fn~")* "~fn~";";
     return r;
 }());

 timing:
 0m0.740s
Thank you for the detailed comparison. I have applied your optimizations (with minor refactoring that did not impact compile time for me) and ended up with this (sorry for some name changes, wasn't happy with my original ones): --- import link = llvm.functions.link; bool isSym(string m) { return m != "object" && m != "llvm" && m != "orEmpty"; } string declareSymPtr(string m) { return "typeof(link." ~ m ~ ")* " ~ m ~ ";"; } string getSymPtr(string m) { return m ~ " = library.getSymbol!(typeof(" ~ m ~ "))(\"" ~ m ~ "\");"; } mixin ({ string code; foreach (m; __traits(allMembers, link)) if (m.isSym) { code ~= m.declareSymPtr; } return code; }()); public struct LLVM { static void getSymbols() { foreach (m; __traits(allMembers, link)) static if (m.isSym) { mixin (m.getSymPtr); } } } --- I am not particularly happy about (isSym) having to do a name based blacklist approach instead of a type based whitelist approach, though. With this I'm at least out of the "OMG why is it still compiling" range and thank you to everyone for that. It's still not in the ideal range of < 100 milliseconds, but I'll take what I can get.
May 11 2017