digitalmars.D - An LLVM bug that affects both LDC and SDC. Worth pushing for
- deadalnix (6/6) Jun 15 2014 http://llvm.org/bugs/show_bug.cgi?id=20049
- safety0ff (4/10) Jun 16 2014 This is the corresponding D code:
- deadalnix (3/15) Jun 16 2014 Not exactly, but this is the kind of code that will trigger the
- Iain Buclaw via Digitalmars-d (3/16) Jun 16 2014 That code shouldn't create a GC allocated closure. :o)
- deadalnix (4/26) Jun 16 2014 Change return bar to return &bar and you got one possible
- Iain Buclaw via Digitalmars-d (4/34) Jun 18 2014 Yeah, I did get that bit. I'm not sure of the optimisation though.
- David Nadlinger (5/8) Jun 18 2014 How would that work if your inliner operates on some
- Iain Buclaw via Digitalmars-d (9/17) Jun 18 2014 I don't know LLVM to comment. But the way GCC operates at a higher
- David Nadlinger (12/33) Jun 18 2014 You stated that closure/frame generation should occur after
- deadalnix (3/13) Jun 18 2014 Yes, but the problem is not limited to SDC. LDC exhibits the same
- David Nadlinger (8/10) Jun 18 2014 Yes, certainly. To me, this looks like a limitation in GVN or so.
- deadalnix (6/9) Jun 18 2014 That doesn't really work that way for LLVM. You generate language
- Iain Buclaw via Digitalmars-d (13/23) Jun 18 2014 Likewise here. But unless I'm missing something (I'm not sure what
- deadalnix (14/28) Jun 18 2014 That is the final goal. A first goal should be:
- Iain Buclaw via Digitalmars-d (10/39) Jun 18 2014 I just tried out doing something simple in gdc to see if I could
- dennis luehring (35/46) Jun 18 2014 just to show what clang 3.5 svn and libc++ can currently optimize down
- deadalnix (2/2) Jun 18 2014 If they go for a clang-specific solution, that isn't gonna cut it
- dennis luehring (2/4) Jun 18 2014 only as an orientation what weaker language + optimizer can reach :)
http://llvm.org/bugs/show_bug.cgi?id=20049

Basically, when you have a closure in a closure and the whole thing gets inlined, LLVM messes up, which results in the compiler not being able to optimize the GC allocation away. Probably worth pushing for. It probably affects other functional languages as well, but I haven't checked.
Jun 15 2014
On Monday, 16 June 2014 at 06:09:28 UTC, deadalnix wrote:
> http://llvm.org/bugs/show_bug.cgi?id=20049
> Basically, when you have a closure in a closure and the whole thing gets inlined, LLVM messes up, which results in the compiler not being able to optimize the GC allocation away. Probably worth pushing for. It probably affects other functional languages as well, but I haven't checked.

This is the corresponding D code: https://github.com/deadalnix/SDC/blob/master/tests/test0156.d
Correct?
Jun 16 2014
On Monday, 16 June 2014 at 16:31:20 UTC, safety0ff wrote:
> This is the corresponding D code: https://github.com/deadalnix/SDC/blob/master/tests/test0156.d
> Correct?

Not exactly, but this is the kind of code that will trigger the bug.
Jun 16 2014
On 16 June 2014 17:31, safety0ff via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> This is the corresponding D code: https://github.com/deadalnix/SDC/blob/master/tests/test0156.d
> Correct?

That code shouldn't create a GC allocated closure. :o)
Jun 16 2014
On Monday, 16 June 2014 at 19:18:29 UTC, Iain Buclaw via Digitalmars-d wrote:
> That code shouldn't create a GC allocated closure. :o)

Change "return bar" to "return &bar" and you get one possible candidate to trigger the bug.
Jun 16 2014
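A rough C++ model of what "a closure in a closure" can look like once the frames are lowered and everything is inlined into one function. This is only a sketch, not taken from the bug report or from the SDC test case: the names OuterFrame, InnerFrame and nested_closures_inlined are made up for illustration, and new/delete stand in for the GC allocation a D compiler would emit.

```cpp
#include <cassert>

// Hypothetical heap frame of an outer function whose locals are captured
// by a nested function.
struct OuterFrame {
    int a;
};

// Hypothetical frame of a closure nested inside the first one. It keeps a
// context pointer to the enclosing frame.
struct InnerFrame {
    OuterFrame* parent;
    int b;
};

// Roughly what remains after both closures are inlined into the caller:
// two allocations, stores into them, then loads back through the context
// pointers. new/delete are an assumption standing in for GC allocation.
int nested_closures_inlined() {
    OuterFrame* outer = new OuterFrame{10};
    InnerFrame* inner = new InnerFrame{outer, 5};
    int result = inner->parent->a + inner->b;  // loads an optimizer would ideally forward
    delete inner;
    delete outer;
    return result;
}

// What the function above could ideally be reduced to.
int fully_folded() {
    return 15;
}

// Both versions must compute the same value.
void check_nested_model() {
    assert(nested_closures_inlined() == fully_folded());
}
```

The point of the model is that the second allocation holds a pointer to the first, so removing either allocation depends on the loads through that pointer being resolved first.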
On 16 June 2014 20:37, deadalnix via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> Change "return bar" to "return &bar" and you get one possible candidate to trigger the bug.

Yeah, I did get that bit. I'm not sure of the optimisation though. IMO, the closure/frame generation should occur *after* inlining.
Jun 18 2014
On Wednesday, 18 June 2014 at 09:29:14 UTC, Iain Buclaw via Digitalmars-d wrote:
> Yeah, I did get that bit. I'm not sure of the optimisation though. IMO, the closure/frame generation should occur *after* inlining.

How would that work if your inliner operates on some language-independent IR?

David
Jun 18 2014
On 18 June 2014 14:18, David Nadlinger via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> How would that work if your inliner operates on some language-independent IR?

I don't know LLVM well enough to comment. But GCC operates at a higher level, so all the information is available to use (the inlined function is just duplicated with all its parameters remapped into variables, and the return expression is turned into an assignment to a dedicated return-value variable). Though the fact remains that the same is true of GDC: its IR is generated before the optimisation passes.
Jun 18 2014
On Wednesday, 18 June 2014 at 21:14:48 UTC, Iain Buclaw via Digitalmars-d wrote:
> I don't know LLVM well enough to comment. But GCC operates at a higher level, so all the information is available to use (the inlined function is just duplicated with all its parameters remapped into variables, and the return expression is turned into an assignment to a dedicated return-value variable).

You stated that closure/frame generation should occur after inlining. I doubt that this is feasible to implement in the current LDC architecture, and probably also in GDC (although I don't know its internals well enough to be sure).

What we do in LDC, by the way, is just to optimize the closure GC allocations into a stack allocation if we can prove the context does not escape after inlining. This happens in a custom optimization pass on the IR level. deadalnix is presumably talking about something very similar he is working on for SDC.

David
Jun 18 2014
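The heap-to-stack promotion described above can be pictured at source level roughly as follows. This is a minimal C++ sketch under stated assumptions, not LDC's actual pass: Context and both function names are hypothetical, and new/delete stand in for a GC allocation.

```cpp
#include <cassert>

// Hypothetical closure context holding one captured variable.
struct Context {
    int captured;
};

// Before: the context is heap-allocated (new/delete stand in for the GC).
// The pointer is never stored anywhere or returned, so it does not escape.
int with_heap_context() {
    Context* ctx = new Context{7};
    int result = ctx->captured * 2;
    delete ctx;
    return result;
}

// After: because the context provably does not escape, it can live on the
// stack instead, and ordinary scalar optimizations can take over.
int with_stack_context() {
    Context ctx{7};
    return ctx.captured * 2;
}

// The transformation must not change the result.
void check_promotion_model() {
    assert(with_heap_context() == with_stack_context());
}
```

The hard part in a real compiler is the proof that the pointer does not escape, which is why leftover loads or stores after inlining can block the promotion.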
On Wednesday, 18 June 2014 at 22:33:03 UTC, David Nadlinger wrote:
> You stated that closure/frame generation should occur after inlining. I doubt that this is feasible to implement in the current LDC architecture, and probably also in GDC (although I don't know its internals well enough to be sure).
> What we do in LDC, by the way, is just to optimize the closure GC allocations into a stack allocation if we can prove the context does not escape after inlining. This happens in a custom optimization pass on the IR level. deadalnix is presumably talking about something very similar he is working on for SDC.
> David

Yes, but the problem is not limited to SDC. LDC exhibits the same behavior (because it is an LLVM bug, not an SDC or LDC one).
Jun 18 2014
On Wednesday, 18 June 2014 at 23:08:06 UTC, deadalnix wrote:
> Yes, but the problem is not limited to SDC. LDC exhibits the same behavior (because it is an LLVM bug, not an SDC or LDC one).

Yes, certainly. To me, this looks like a limitation in GVN or so.

But coming back to the D side of things, do you have an actual D test case showing the problem? The remaining load in your example shouldn't be enough to trip up LDC's optimizer pass by itself, but I'm rather certain that there might be more complex code with missed optimization opportunities due to this.

David
Jun 18 2014
On Wednesday, 18 June 2014 at 09:29:14 UTC, Iain Buclaw via Digitalmars-d wrote:
> Yeah, I did get that bit. I'm not sure of the optimisation though. IMO, the closure/frame generation should occur *after* inlining.

It doesn't really work that way for LLVM. You generate language-independent IR and optimization passes run on it. The front end can add passes of its own to the optimization process to do language-dependent optimizations.
Jun 18 2014
On 18 June 2014 19:20, deadalnix via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> It doesn't really work that way for LLVM. You generate language-independent IR and optimization passes run on it. The front end can add passes of its own to the optimization process to do language-dependent optimizations.

Likewise here. But unless I'm missing something (I'm not sure what magic happens with allocate, for instance), I'm not sure how you could expect the optimisation passes to squash closures together.

Am I correct in that it's asking for:
------
int *i = new int;
*i = 42;
return *i;

To be folded into:
------
return 42;
Jun 18 2014
On Wednesday, 18 June 2014 at 21:22:44 UTC, Iain Buclaw via Digitalmars-d wrote:
> Am I correct in that it's asking for:
> ------
> int *i = new int;
> *i = 42;
> return *i;
>
> To be folded into:
> ------
> return 42;

That is the final goal. A first goal should be:

    int *i = new int;
    *i = 42;
    return 42;

That first step is supposed to be done by the LLVM infrastructure itself (and it is for such a simple example, but with multiple new allocations it gets confused). It is necessary because at this point, the language-specific pass will be able to detect that nobody ever reads from the allocated memory and that it doesn't escape, so it can be optimized away.

If the first step does not happen, then the second step won't either, and it cascades down to pretty stupid code generation.
Jun 18 2014
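The two steps can be written out at source level as three equivalent functions. This is only a hedged C++ sketch of the intended pipeline, using two allocations since that is the case said to cause trouble. The function names are made up, the values 40 and 2 are arbitrary, and new/delete stand in for GC allocations.

```cpp
#include <cassert>

// Stage 0: the code as written. new/delete stand in for GC allocations.
int as_written() {
    int* i = new int;
    int* j = new int;
    *i = 40;
    *j = 2;
    int result = *i + *j;  // loads from memory that was just stored to
    delete i;
    delete j;
    return result;
}

// Stage 1: the generic optimizer forwards the stored values to the loads.
// The allocations are now write-only and never escape.
int after_forwarding() {
    int* i = new int;
    int* j = new int;
    *i = 40;
    *j = 2;
    delete i;
    delete j;
    return 42;
}

// Stage 2: a language-specific pass sees that the memory is never read and
// never escapes, so the allocations and stores can be dropped.
int after_allocation_removal() {
    return 42;
}

// All three stages must agree on the result.
void check_stages() {
    assert(as_written() == after_forwarding());
    assert(after_forwarding() == after_allocation_removal());
}
```

If stage 1 leaves even one load behind, the memory is still read from, so the pass responsible for stage 2 has to keep the allocation.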
On 18 June 2014 22:29, deadalnix via Digitalmars-d <digitalmars-d puremagic.com> wrote:
> That is the final goal. A first goal should be:
>
>     int *i = new int;
>     *i = 42;
>     return 42;
>
> That first step is supposed to be done by the LLVM infrastructure itself (and it is for such a simple example, but with multiple new allocations it gets confused). It is necessary because at this point, the language-specific pass will be able to detect that nobody ever reads from the allocated memory and that it doesn't escape, so it can be optimized away.
> If the first step does not happen, then the second step won't either, and it cascades down to pretty stupid code generation.

I just tried out doing something simple in gdc to see if I could trigger this. I got the optimisation passes to compile it down to:

    _d_allocmemory (16);
    _d_allocmemory (16);
    return 36;

Which is more than what I expected... it managed to const-fold all operations into a single return; it just hasn't dropped the (now) useless GC allocations for the closures that were removed as dead code.
Jun 18 2014
On 18.06.2014 23:22, Iain Buclaw via Digitalmars-d wrote:
> Likewise here. But unless I'm missing something (I'm not sure what magic happens with allocate, for instance), I'm not sure how you could expect the optimisation passes to squash closures together.
>
> Am I correct in that it's asking for:
> ------
> int *i = new int;
> *i = 42;
> return *i;
>
> To be folded into:
> ------
> return 42;

just to show what clang 3.5 svn and libc++ can currently optimize down

patches
clang: http://reviews.llvm.org/rL210137
libc++: http://reviews.llvm.org/rL210211

#example 1

    #include <vector>
    #include <numeric>

    int main()
    {
        const std::vector<int> a{1,2};
        const std::vector<int> b{4,5};
        const std::vector<int> ints
        {
            std::accumulate(a.begin(),a.end(),1),
            std::accumulate(b.begin(),b.end(),2),
        };
        return std::accumulate(ints.begin(),ints.end(),100);
    }

asm result:

    movl $115, %eax
    retq

#example 2

    #include <string>

    int main()
    {
        return std::string("hello").size();
    }

asm result:

    movl $5, %eax
    retq

An older clang/libc++, gcc 4.9.x, and VS2013 produce much, much (much) more asm code in these situations.
Jun 18 2014
If they go for a clang-specific solution, that isn't gonna cut it for us :(
Jun 18 2014
On 19.06.2014 07:16, deadalnix wrote:
> If they go for a clang-specific solution, that isn't gonna cut it for us :(

only as an orientation for what a weaker language + optimizer can reach :)
Jun 18 2014