digitalmars.D.ldc - Android/ARM codegen
- Joakim (72/72) Jul 16 2015 Alright, I've been stepping through some failing tests: one that
- Dan Olson (9/20) Jul 16 2015 That doesn't look right.
- Joakim (13/35) Jul 16 2015 llvm triple and other flags used:
- Dan Olson (4/18) Jul 16 2015 And the nice thing about LLVM / LDC is that I can specify your
- Dan Olson (35/64) Jul 17 2015 Joakim, I have a hunch. Can you try changing your abi-android.cpp so
Alright, I've been stepping through some failing tests: one that seems to be bad codegen is that many calls to std.algorithm.iteration.map seem to fail, for example, when running the tests for std.zip. One of the std.zip tests calls std.random.uniform, which then gets its second parameter stomped by map from rndGen(). I've tracked it down to this function in the llvm IR: define weak_odr void _D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult(%"std.random.rndGen.M pResult!(__lambda4, Repeat!int).MapResult"* noalias nocapture sret %.sret_arg, i8* %.nest_arg, %"std.range.Repeat!int.Repeat"* byval nocapture readonly { %.structliteral = alloca %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult", align 4 ; [#uses = 3 type = %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"*] %1 = getelementptr inbounds %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* %.structliteral, i32 0, i32 0, i32 0 ; [#uses = 1 type = i32*] store i32 0, i32* %1, align 4 %2 = getelementptr %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* %.structliteral, i32 0, i32 1 ; [#uses = 1 type = i8**] store i8* %.nest_arg, i8** %2, align 4 %tmp = call %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* _D3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult6__ctorMFNaNbNcNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult(%"std.random.rndGen.M pResult!(__lambda4, Repeat!int).MapResult"* returned %.structliteral, %"std.range.Repeat!int.Repeat"* byval %r_arg) ; [#uses = 1 type = %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"*] %3 = bitcast %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* %tmp to i64* ; [#uses = 1 type = i64*] %4 = bitcast %"std.random.rndGen.MapResult!(__lambda4, Repeat!int).MapResult"* %.sret_arg to i64* ; [#uses = 1 type = i64*] %5 = load i64* %3, align 1 ; [#uses = 1 type = i64] store i64 %5, i64* %4, align 4 ret void } which gets translated to the following ARM assembly: _D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult: .fnstart .Leh_func_begin94: .save {r4, lr} push {r4, lr} mov r4, r0 stmib sp, {r0, r1} bl _D3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult6__ctorMFNaNbNcNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult(PLT) ldmib sp, {r0, r1} stm r4, {r0, r1} pop {r4, lr} bx lr The problem appears to be that even though r4 is saved at the beginning of the function, it is overwritten by r1 in the stmib instruction afterwards. Obviously there's no point in pushing r4 and popping it at the end of the function, if you've lost it by the stack pointer by 8, then stmib increments it 4 _before_ storing r0 then r1, overwriting the contents of r4 saved at the beginning of the function. Other calls to map also seem to fail in other instances of the same template function, but interestingly in different ways, ie no uses of stmib there. I haven't tracked down exactly why those other instances fail. I'm not sure how to reduce this further and if I need to file a bug for llvm: any pointers?
Jul 16 2015
"Joakim" <dlang joakim.fea.st> writes:The problem appears to be that even though r4 is saved at the beginning of the function, it is overwritten by r1 in the stmib instruction afterwards. Obviously there's no point in pushing r4 and popping it at the end of the function, if you've lost it by stack pointer by 8, then stmib increments it 4 _before_ storing r0 then r1, overwriting the contents of r4 saved at the beginning of the function. Other calls to map also seem to fail in other instances of the same template function, but interestingly in different ways, ie no uses of stmib there. I haven't tracked down exactly why those other instances fail.That doesn't look right. What is your triple and other options (optimization), llvm version your patched src is based on, and ldc branch (merge-2.067 branch I think)? I'd like to compare the asm snippet with what I get get for thumbv7-apple-ios, because it should be similar, but it passes std.zip. It could be I just haven't stumbled into the combination you are using. -- Dan
Jul 16 2015
On Thursday, 16 July 2015 at 16:40:31 UTC, Dan Olson wrote:"Joakim" <dlang joakim.fea.st> writes:llvm triple and other flags used: --output-o -w -d -mtriple=armv7-none-linux-androideabi -relocation-model=pic -O3 -release -unittest ldc was compiled against a locally-compiled llvm 3.6 from the Android NDK repo, which has some modifications: https://android.googlesource.com/toolchain/llvm/+log/release_36 Yes, I'm using the merge-2.067 branch of ldc as of commit 122ea372d from a couple weeks ago, with inlining turned off, as noted in the earlier EH thread. I would be curious to see what llvm IR and ARM assembly is generated for you on iOS and what other flags you're using. The actual function shown is from the std.random module.The problem appears to be that even though r4 is saved at the beginning of the function, it is overwritten by r1 in the stmib instruction afterwards. Obviously there's no point in pushing r4 and popping it at the end of the function, if you've lost it by overwriting in between. Specifically, "sub increments it 4 _before_ storing r0 then r1, overwriting the contents of r4 saved at the beginning of the function. Other calls to map also seem to fail in other instances of the same template function, but interestingly in different ways, ie no uses of stmib there. I haven't tracked down exactly why those other instances fail.That doesn't look right. What is your triple and other options (optimization), llvm version your patched src is based on, and ldc branch (merge-2.067 branch I think)? I'd like to compare the asm snippet with what I get get for thumbv7-apple-ios, because it should be similar, but it passes std.zip. It could be I just haven't stumbled into the combination you are using.
Jul 16 2015
On Thursday, 16 July 2015 at 17:07:33 UTC, Joakim wrote:On Thursday, 16 July 2015 at 16:40:31 UTC, Dan Olson wrote:And the nice thing about LLVM / LDC is that I can specify your triple and other options with one compiler. I love that about LLVM: all cross compilers in one.[...]llvm triple and other flags used: --output-o -w -d -mtriple=armv7-none-linux-androideabi -relocation-model=pic -O3 -release -unittest ldc was compiled against a locally-compiled llvm 3.6 from the Android NDK repo, which has some modifications: https://android.googlesource.com/toolchain/llvm/+log/release_36 Yes, I'm using the merge-2.067 branch of ldc as of commit 122ea372d from a couple weeks ago, with inlining turned off, as noted in the earlier EH thread. I would be curious to see what llvm IR and ARM assembly is generated for you on iOS and what other flags you're using. The actual function shown is from the std.random module.
Jul 16 2015
"Joakim" <dlang joakim.fea.st> writes:Alright, I've been stepping through some failing tests: one that seems to be bad codegen is that many calls to std.algorithm.iteration.map seem to fail, for example, when running the tests for std.zip. One of the std.zip tests calls std.random.uniform, which then gets its second parameter stomped by map from rndGen().Joakim, I have a hunch. Can you try changing your abi-android.cpp so passByVal() returns false always? That is the only difference in our generated IR and in my experience byval causes incorrect code on ARM. As an experiment, I changed my passByVal() back to return true for Tstruct (the LDC default), and then my IR is identical to yours and my ARM code, though slightly different, also is clobbering a saved reg on the stack. Anyway, I pasted my GOOD assembly below yours for comparison.which gets translated to the following ARM assembly: _D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult: .fnstart .Leh_func_begin94: .save {r4, lr} push {r4, lr} mov r4, r0 stmib sp, {r0, r1} bl _D3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult6__ctorMFNaNbNcNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult(PLT) ldmib sp, {r0, r1} stm r4, {r0, r1} pop {r4, lr} bx lrBetter assembly produced with -output-s -w -d -mtriple=armv7-apple-ios -relocation-model=pic -O3 -release -unittest -disable-inlining (needed to match up with your change). It is similar but uses stm (e.g stmia) instead of stmib. __D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult: push {r4, r7, lr} mov r4, r0 stm sp, {r0, r1} mov r0, sp mov r1, r2 bl __D3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult6__ctorMFNaNbNcNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult ldm sp, {r0, r1} strd r0, r1, [r4] pop {r4, r7, pc} I wanted to use your triple with OS of android, but std/random.d needs Android specific D code from elsewhere, so compile failed. As aside, with inlining enabled, that function shrinks to 3 instructions: __D3std9algorithm9iteration47__T3mapS363std6random6rndGenFNcNdNfZ9__lambda4Z42__T3mapTS3std5range13__T6RepeatTiZ6RepeatZ3mapMFNaNbNiNfS3std5range13__T6RepeatTiZ6RepeatZS3std9algorithm9iteration87__T9MapResultS363std6random6rndGenFNcNdNfZ9__lambda4TS3std5range13__T6RepeatTiZ6RepeatZ9MapResult: mov r3, r1 strd r2, r3, [r0] bx lr Hope it helps
Jul 17 2015
On Friday, 17 July 2015 at 08:58:30 UTC, Dan Olson wrote:"Joakim" <dlang joakim.fea.st> writes:Yep, that fixed it, std.zip passes its tests now. :) I'll run the rest of the tests and report back, thanks for your help.Alright, I've been stepping through some failing tests: one that seems to be bad codegen is that many calls to std.algorithm.iteration.map seem to fail, for example, when running the tests for std.zip. One of the std.zip tests calls std.random.uniform, which then gets its second parameter stomped by map from rndGen().Joakim, I have a hunch. Can you try changing your abi-android.cpp so passByVal() returns false always? That is the only difference in our generated IR and in my experience byval causes incorrect code on ARM. As an experiment, I changed my passByVal() back to return true for Tstruct (the LDC default), and then my IR is identical to yours and my ARM code, though slightly different, also is clobbering a saved reg on the stack.
Jul 17 2015
"Joakim" <dlang joakim.fea.st> writes:Yep, that fixed it, std.zip passes its tests now. :) I'll run the rest of the tests and report back, thanks for your help.:-)
Jul 17 2015
On Friday, 17 July 2015 at 16:33:07 UTC, Dan Olson wrote:"Joakim" <dlang joakim.fea.st> writes:Around 30 more modules from std.phobos now pass their tests with that change, most of which used to segfault somewhere in a test before. core.time from druntime also passes all its tests now. That one change made a big difference, thanks for pointing it out. Now to fix the rest. Many are related to the real type, which I haven't patched for 64-bit yet.Yep, that fixed it, std.zip passes its tests now. :) I'll run the rest of the tests and report back, thanks for your help.:-)
Jul 17 2015
On Friday, 17 July 2015 at 17:57:01 UTC, Joakim wrote:On Friday, 17 July 2015 at 16:33:07 UTC, Dan Olson wrote:Dan, as I said in the main forum, most of the druntime/phobos modules' tests pass on Android/ARM now. However, I had to turn off optimizations for a handful of modules, have you had to do the same? As noted before, one optimization pass was screwing up ldc.eh. I also had to turn off all optimizations, ie -O0, for std.random and std.stream to get their unit tests to pass. For one phobos module, std.regex, turning off all optimizations for druntime's core.memory got the regex tests to pass. Other than those four modules, everything is compiled with -O3 and seems to work, except for the two modules that still segfault, std.net.isemail and std.regex.internal.tests, where compiling those modules with -O0 doesn't make a difference. I haven't spent any time tracking down if other optimized modules might be causing those two to segfault, as was the case with std.regex and core.memory, or exactly which llvm optimizations are causing problems with core.memory, std.random, and std.stream. Are you seeing similar results with your 2.067 branch of ldc with iOS? Since ARM codegen should be similar for the two, I wonder if I'm the only one seeing this."Joakim" <dlang joakim.fea.st> writes:Around 30 more modules from std.phobos now pass their tests with that change, most of which used to segfault somewhere in a test before. core.time from druntime also passes all its tests now. That one change made a big difference, thanks for pointing it out. Now to fix the rest. Many are related to the real type, which I haven't patched for 64-bit yet.Yep, that fixed it, std.zip passes its tests now. :) I'll run the rest of the tests and report back, thanks for your help.:-)
Aug 03 2015
"Joakim" <dlang joakim.fea.st> writes:On Friday, 17 July 2015 at 17:57:01 UTC, Joakim wrote:Hi Joakim - I have -O3 optimization on for all modules in the release build and I think I have tested with most of the other -O levels. I did run into an alignment error with neon instructions in std.random unittest with LLVM 3.5.1 and 0.15.1 (2.066) and eventually disabled neon (-mattr=-neon) during optimization as a workaround. I have not tried reenabling neon for merge-2.067 and LLVM 3.6. The problem was a vst1.64 instruction requesting 128-bit alignment when the data was only 64-bit aligned: 0x52197a: vst1.64 {d16, d17}, [r5:128] // r5 addr not properly aligned If it can happen in std.random it could happen elsewhere. You might try -mattr=-neon and see what happens. One other difference to think about is that Android is AAPCS and iOS is a variant of the older APCS. LLVM has some different paths for these. I studied it some a few weeks ago as I made the extern(C) ABI compatible with clang. -- DanOn Friday, 17 July 2015 at 16:33:07 UTC, Dan Olson wrote:Dan, as I said in the main forum, most of the druntime/phobos modules' tests pass on Android/ARM now. However, I had to turn off optimizations for a handful of modules, have you had to do the same? As noted before, one optimization pass was screwing up ldc.eh. I also had to turn off all optimizations, ie -O0, for std.random and std.stream to get their unit tests to pass. For one phobos module, std.regex, turning off all optimizations for druntime's core.memory got the regex tests to pass. Other than those four modules, everything is compiled with -O3 and seems to work, except for the two modules that still segfault, std.net.isemail and std.regex.internal.tests, where compiling those modules with -O0 doesn't make a difference. I haven't spent any time tracking down if other optimized modules might be causing those two to segfault, as was the case with std.regex and core.memory, or exactly which llvm optimizations are causing problems with core.memory, std.random, and std.stream. Are you seeing similar results with your 2.067 branch of ldc with iOS? Since ARM codegen should be similar for the two, I wonder if I'm the only one seeing this."Joakim" <dlang joakim.fea.st> writes:
Aug 03 2015