
digitalmars.D.ldc - Android/ARM: fixing exception-handling

reply "Joakim" <dlang joakim.fea.st> writes:
I've gotten pretty far along with ldc for Android/ARM, with the 
big remaining issue appearing to be the unfinished support for 
exception-handling.  Many exceptions seem to work just fine, 
while others cause segfaults.  I've just started looking at 
ldc.eh with one such failing exception from the unit tests for 
core.thread and it seems to error out when trying to find the 
landing pad and action offset, I think in the get_uleb128 helper.
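
For reference, ULEB128 is the variable-length unsigned integer encoding used in DWARF exception tables: each byte carries 7 payload bits, least-significant group first, and the high bit marks that another byte follows. The sketch below is not the get_uleb128 from ldc.eh; the name decode_uleb128, its signature, and the bounds checks are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: decodes one ULEB128 value from data[0..size).
// On success, stores the number in `value`, the bytes consumed in `length`,
// and returns true. Returns false if the input is truncated or would not
// fit in 64 bits; the outputs are left untouched in that case.
bool decode_uleb128(const std::uint8_t* data, std::size_t size,
                    std::uint64_t& value, std::size_t& length) {
    assert(data != nullptr || size == 0);
    std::uint64_t result = 0;
    unsigned shift = 0;
    for (std::size_t i = 0; i < size; ++i) {
        if (shift >= 64) {
            return false;  // more groups than a 64-bit value can hold
        }
        const std::uint8_t byte = data[i];
        result |= static_cast<std::uint64_t>(byte & 0x7f) << shift;
        if ((byte & 0x80) == 0) {  // high bit clear: this was the last byte
            value = result;
            length = i + 1;
            return true;
        }
        shift += 7;
    }
    return false;  // ran off the end before seeing a terminating byte
}
```

For example, the bytes 0xE5 0x8E 0x26 decode to 624485 with a length of 3. A decoder without the bounds check keeps reading as long as the high bit is set, so a bad table pointer can walk into unmapped memory.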

David, what remains to be done for ARM support, if you know 
anything more specific than simply finding and fixing the 
remaining stuff that doesn't work?
Jun 16 2015
parent reply "David Nadlinger" <code klickverbot.at> writes:
On Tuesday, 16 June 2015 at 23:07:45 UTC, Joakim wrote:
 David, what remains to be done for ARM support, if you know 
 anything more specific than simply finding and fixing the 
 remaining stuff that doesn't work?
Unfortunately, I don't know of anything more specific than the couple of EH-related test case failures on Linux/EABI. It has been quite a while since I last worked on LDC/ARM, to be honest; most of my ARM work is getting embedded stuff done with C++14 these days.

Maybe Dan knows of some other codegen/math-related issues still to be solved?

 - David
Jun 16 2015
next sibling parent reply Dan Olson <gorox comcast.net> writes:
"David Nadlinger" <code klickverbot.at> writes:

 On Tuesday, 16 June 2015 at 23:07:45 UTC, Joakim wrote:
 David, what remains to be done for ARM support, if you know anything
 more specific than simply finding and fixing the remaining stuff
 that doesn't work?
Unfortunately, I don't know of anything more specific than the couple of EH-related test case failures on Linux/EABI. It has been quite a while since I last worked on LDC/ARM, to be honest; most of my ARM work is getting embedded stuff done with C++14 these days. Maybe Dan knows of some other codegen/math-related issues still to be solved?
There might be some clues in the iOS branch for ldc/eh.d even though it is dealing with SjLj style exceptions and landing pads are interpreted differently. It has a few version variations but uses much of the same code. It seems to work ok during the unittests. I haven't encountered any weirdness since I spent some late nights with a debugger a year ago.

https://github.com/smolt/druntime/blob/ios/src/ldc/eh.d

Diff with tag v0.15.1 to see where I changed stuff. All my published ios branches are currently based on 0.15.1 and using LLVM 3.5.1.

Joakim, what branch of LDC are you basing your Android stuff on?

I can publish to github ios merges with 0.15.2-beta and 0.16.0 (branch merge-2.067), but I don't think there is any additional help there with regard to EH, even though ldc/eh.d did change for the druntime ldc branch.

As far as codegen problems - there is nothing related to EH that I can think of. The optimizer occasionally gets some alignment wrong with neon instructions in LLVM 3.5.1, but that does not show up as an EH problem. Currently neon is disabled when building optimized libs.

If you haven't created a gen/abi-arm.{h,cpp}, you will need to, as the default has a few problems on ARM, but still not related to EH. If you are on LLVM 3.5.1, try the one on the ios branch named abi-ios.{h,cpp}. There are additional abi-ios changes for 0.15.2 because the handling of D variadic functions changed.

-- 
Dan
Jun 16 2015
parent reply "Joakim" <dlang joakim.fea.st> writes:
On Wednesday, 17 June 2015 at 06:50:52 UTC, Dan Olson wrote:
 There might be some clues in the iOS branch for ldc/eh.d even 
 though it is dealing with SjLj style exceptions and landing 
 pads are interpreted differently.  It has a few version 
 variations but uses much of the same code.  It seems to work ok 
 during the unittests.  I haven't encountered any weirdness 
 since I spent some late nights with a debugger a year ago.

 https://github.com/smolt/druntime/blob/ios/src/ldc/eh.d

 Diff with tag v0.15.1 to see where I changed stuff.  All my 
 published ios branches are currently based on 0.15.1 and using 
 LLVM 3.5.1.

 Joakim, what branch of LDC are you basing your Android stuff on?
I'm currently using the merge-2.067 branch linked against a lightly patched llvm 3.6, the one that's used in the Android NDK, and compiled by clang 3.6.1.
 I can publish to github ios merges with 0.15.2-beta and 0.16.0 
 (branch merge-2.067), but I don't think there is any additional 
 help there with regard to EH, even though ldc/eh.d did change 
 for druntime ldc branch.
I hadn't bothered looking at how your iOS branch dealt with exceptions, since you had said a while back that it uses setjmp/longjmp exceptions, but I'll take a look now and see if there's anything helpful.
 As far as codegen problems - there is nothing related to EH 
 that I can think of.  The optimizer occasionally gets some 
 alignment wrong with neon instructions in LLVM 3.5.1, but that 
 does not show up as an EH problem.  Currently neon is disabled 
 when building optimized libs.

 If you haven't created a gen/abi-arm.{h,cpp}, you will need to 
 as the default has a few problems on ARM, but still not related 
 to EH.  If you are on LLVM 3.5.1, try the one on the ios branch 
 named abi-ios.{h,cpp}. There are additional abi-ios changes for 
 0.15.2 because D variadic functions handling changed.
I'll take a look. Right now, the only change I made to gen/abi.cpp is to use the C calling convention everywhere.
Jun 17 2015
parent reply "Joakim" <dlang joakim.fea.st> writes:
On Wednesday, 17 June 2015 at 07:32:35 UTC, Joakim wrote:
 I hadn't bothered looking at how your iOS branch dealt with 
 exceptions, since you had said a while back that it uses 
 setjmp/longjmp exceptions, but I'll take a look now and see if 
 there's anything helpful.
Took a look, don't think it's relevant to DWARF exceptions.
 I'll take a look.  Right now, the only change I made to 
 gen/abi.cpp is to use the C calling convention everywhere.
It appears that the only change you made is to turn off passing structs by value?

https://github.com/smolt/ldc/blob/ios/gen/abi-ios.cpp#L53

The fast C calling convention works for you? It always caused problems for me on ARM, including causing a segfault in llvm when compiling, the last time I tried it.

I spent some time looking into the ARM EH issues and it appears that disabling inlining fixes a lot of it:

--- a/gen/optimizer.cpp
+++ b/gen/optimizer.cpp
@@ -163,8 +163,8 @@ static unsigned sizeLevel() {
 
 // Determines whether or not to run the normal, full inlining pass.
 bool willInline() {
-    return enableInlining == cl::BOU_TRUE ||
-        (enableInlining == cl::BOU_UNSET && optLevel() > 1);
+    return enableInlining == cl::BOU_TRUE;// ||
+        //(enableInlining == cl::BOU_UNSET && optLevel() > 1);
 }
 
 bool isOptimizationEnabled() {

I also get proper backtraces in gdb much more often after turning off inlining, not to mention actual error output on the command-line as opposed to segfaults. I'm guessing something is screwed up in the generation or handling of DWARF exception data by function inlining. Almost all of druntime now passes tests on Android/ARM, with the exception of some codegen issues in core.time.

For a comparison, running the phobos tests with logging turned on in the ldc/eh.d code showed that only about 67 exceptions were thrown with -O2/-O3 -release and inlining turned on. With inlining turned off, it jumps up to 658 exceptions, an order of magnitude more, because many more tests are run once EH starts working. A couple exceptions might still be uncaught and need to be fixed, but it appears that EH is not the bottleneck anymore, it's codegen and other ARM issues.

David, Kai, or whoever else runs tests on linux/Android/ARM, can you turn inlining off and verify the same results on your ARM hardware?
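
To spell out what the willInline() patch changes, here is a minimal standalone sketch of the tri-state logic. The enum BoolOrDefault is a stand-in for LLVM's cl::boolOrDefault, and the two function names and the explicit optLevel parameter are made up for illustration; this is not the LDC source.

```cpp
// Stand-in for llvm::cl::boolOrDefault (BOU_UNSET / BOU_TRUE / BOU_FALSE).
enum class BoolOrDefault { Unset, True, False };

// Before the patch: inline when explicitly requested, or by default
// whenever the optimization level is above 1 (-O2 and up).
bool willInlineOriginal(BoolOrDefault enableInlining, unsigned optLevel) {
    return enableInlining == BoolOrDefault::True ||
           (enableInlining == BoolOrDefault::Unset && optLevel > 1);
}

// After the patch: inline only when explicitly requested; the
// optimization level no longer matters.
bool willInlinePatched(BoolOrDefault enableInlining, unsigned /*optLevel*/) {
    return enableInlining == BoolOrDefault::True;
}
```

The only case where the two differ is an unset flag at an optimization level above 1, which is the default -O2/-O3 build the test runs above use.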
Jul 08 2015
next sibling parent Dan Olson <gorox comcast.net> writes:
"Joakim" <dlang joakim.fea.st> writes:

 On Wednesday, 17 June 2015 at 07:32:35 UTC, Joakim wrote:
 It appears that the only change you made is to turn off passing
 structs by value?

 https://github.com/smolt/ldc/blob/ios/gen/abi-ios.cpp#L53
Hi Joakim,

Yes, that little change had a big impact.

http://forum.dlang.org/post/m2r3u5ac0c.fsf comcast.net

Structs are still passed by value, just in a different way. The LLVM "byval" attribute counterintuitively passes a pointer to a struct instead of passing its contents in registers and on the stack.

http://llvm.org/docs/LangRef.html#parameter-attributes
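
To make the distinction concrete at source level, here is a rough C++ analogy; it is not LLVM IR and not what LDC emits. The struct Vec3 and all function names are hypothetical. The first function models the struct contents themselves being the argument; the second pair models byval, where the callee receives a pointer to a hidden copy made for the call.

```cpp
struct Vec3 {
    float x, y, z;
};

// Contents passed directly: the callee owns its own Vec3 and may
// scribble on it without the caller noticing.
float sumDirect(Vec3 v) {
    v.x += 1.0f;
    return v.x + v.y + v.z;
}

// byval-like: the callee sees only a pointer, but it points at a
// temporary copy, so value semantics are preserved.
float sumViaHiddenCopy(Vec3* hiddenCopy) {
    hiddenCopy->x += 1.0f;
    return hiddenCopy->x + hiddenCopy->y + hiddenCopy->z;
}

// What the caller side does in the byval-like case: copy, then pass
// the address of the copy.
float callViaHiddenCopy(const Vec3& original) {
    Vec3 copy = original;
    return sumViaHiddenCopy(&copy);
}
```

Both calls compute the same result and leave the caller's struct unchanged; what differs is where the bytes live during the call, which is exactly the part an ABI has to get right.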
 The fast C calling convention works for you?  It always caused
 problems for me on ARM, including causing a segfault in llvm when
 compiling, the last time I tried it.
fastcc has worked quite well, and an attempt to change to the C calling convention (ccc) led to funny codegen for some aggregate function return values (e.g. complex reals) when optimization was enabled. But that problem seemed to go away with LLVM 3.6.

In the end I abandoned fastcc for ccc in my 0.15.2 and 2.067 merge branches because LDC adopted a different variadic approach and fastcc doesn't support it.

-- 
Dan
Jul 09 2015
prev sibling parent reply "Joakim" <dlang joakim.fea.st> writes:
On Wednesday, 8 July 2015 at 16:14:43 UTC, Joakim wrote:
 I spent some time looking into the ARM EH issues and it appears 
 that disabling inlining fixes a lot of it:

 --- a/gen/optimizer.cpp
 +++ b/gen/optimizer.cpp
 @@ -163,8 +163,8 @@ static unsigned sizeLevel() {

  // Determines whether or not to run the normal, full inlining pass.
  bool willInline() {
 -    return enableInlining == cl::BOU_TRUE ||
 -        (enableInlining == cl::BOU_UNSET && optLevel() > 1);
 +    return enableInlining == cl::BOU_TRUE;// ||
 +        //(enableInlining == cl::BOU_UNSET && optLevel() > 1);
  }

  bool isOptimizationEnabled() {

 I also get proper backtraces in gdb much more often after 
 turning off inlining, not to mention actual error output on the 
 command-line as opposed to segfaults.  I'm guessing something 
 is screwed up in the generation or handling of DWARF exception 
 data by function inlining.  Almost all of druntime now passes 
 tests on Android/ARM, with the exception of some codegen issues 
 in core.time.

 For a comparison, running the phobos tests with logging turned 
 on in the ldc/eh.d code showed that only about 67 exceptions 
 were thrown with -O2/-O3 -release and inlining turned on.  With 
 inlining turned off, it jumps up to 658 exceptions, an order of 
 magnitude more, because many more tests are run once EH starts 
 working.  A couple exceptions might still be uncaught and need 
 to be fixed, but it appears that EH is not the bottleneck 
 anymore, it's codegen and other ARM issues.

 David, Kai, or whoever else runs tests on linux/Android/ARM, 
 can you turn inlining off and verify the same results on your 
 ARM hardware?
I spent some more time looking into this and it appears that an ARM optimization pass in llvm is the real issue, not inlining. It turns out that enabling the EH_personality debug output in ldc.eh and turning off inlining happened to generate ARM code that worked earlier, but I can get it to work without those two hacks by turning off one call to an ARM optimization pass in llvm instead.

Specifically, if I disable this second call to createARMLoadStoreOptimizationPass() and then compile only ldc/eh.d with the resulting ldc2, ARM EH will work, because the second "while" loop in eh_personality_common doesn't segfault anymore:

https://github.com/llvm-mirror/llvm/blob/release_36/lib/Target/ARM/ARMTargetMachine.cpp#L312

Otherwise, it will often, though not always, fail at a ldmib instruction, similar to the other codegen issue I brought up in another thread, which Dan provided a workaround for. With this second pass turned off, that ldmib instruction isn't there and EH starts working.

I haven't looked further into exactly what that ARM optimization pass is screwing up, but this is probably an llvm codegen issue.
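
For readers unfamiliar with the instruction: ldmib is ARM's "load multiple, increment before" form, which bumps the address by one word before each load, whereas the more common ldmia ("increment after") loads first and then bumps. The sketch below only emulates that addressing difference on a word-indexed array; the function loadMultiple and its parameters are made up for illustration, and it says nothing about why the generated ldmib faults here.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Emulates the addressing of ARM LDMIA / LDMIB on a word-indexed memory.
// `baseWord` plays the role of the base register (as a word index) and
// `count` is the number of registers in the register list.
std::vector<std::uint32_t> loadMultiple(const std::vector<std::uint32_t>& memory,
                                        std::size_t baseWord,
                                        std::size_t count,
                                        bool incrementBefore) {
    std::vector<std::uint32_t> registers;
    std::size_t index = baseWord;
    for (std::size_t i = 0; i < count; ++i) {
        if (incrementBefore) {
            ++index;  // LDMIB: advance first, then load
        }
        registers.push_back(memory.at(index));  // at() throws if out of range
        if (!incrementBefore) {
            ++index;  // LDMIA: load first, then advance
        }
    }
    return registers;
}
```

With memory {10, 20, 30, 40} and a base of 0, loading two registers yields {10, 20} in increment-after mode and {20, 30} in increment-before mode, so the word at the base address itself is never read by ldmib.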
Jul 25 2015
parent Dan Olson <gorox comcast.net> writes:
"Joakim" <dlang joakim.fea.st> writes:
 I spent some more time looking into this and it appears that an ARM
 optimization pass in llvm is the real issue, not inlining.  It turns
 out that enabling the EH_personality debug output in ldc.eh and
 turning off inlining happened to generate ARM code that worked
 earlier, but I can get it to work without those two hacks by turning
 off one call to an ARM optimization pass in llvm instead.
Good puzzle solving. There might be clues in the clang source code on how to set everything up to make that optimization pass work. Clang does a lot of interesting stuff, like coercing args and changing alignment, that I don't think is done in LDC.
Jul 28 2015
prev sibling parent "Joakim" <dlang joakim.fea.st> writes:
On Wednesday, 17 June 2015 at 01:03:19 UTC, David Nadlinger wrote:
 On Tuesday, 16 June 2015 at 23:07:45 UTC, Joakim wrote:
 David, what remains to be done for ARM support, if you know 
 anything more specific than simply finding and fixing the 
 remaining stuff that doesn't work?
Unfortunately, I don't know of anything more specific than the couple of EH-related test case failures on Linux/EABI.
OK, I'll look into those. It does seem that there are a lot more unit tests that throw exceptions in 2.067 though, so a lot more than a couple fail. I've also found one or two tests unrelated to exceptions that may have ARM codegen issues. I'll look into those further and file the appropriate issues, if necessary.
Jun 17 2015