digitalmars.D.ldc - Openwrt Linux Uclibc ARM GC issue
- Radu (71/71) Dec 15 2017 Trying to run some D code on Openwrt with Uclibc and got stuck by
- David Nadlinger (17/25) Dec 15 2017 The assert is inside an invariant which checks that the TLS information
- Radu (62/90) Dec 17 2017 My various attempts at getting it to run behaved very erratically.
- Joakim (31/135) Dec 17 2017 I believe that triple defaults to ARMv5, are you sure your
- Suliman (1/1) Dec 18 2017 Off-topic: there is another interesting lib: https://uclibc-ng.org/
- Radu (58/203) Jan 09 2018 Got some time to work on this - just to clarify I'm developing
- David Nadlinger (7/10) Jan 10 2018 You mean thread_suspendHandler? Perhaps single-stepping through the code...
- Radu (69/82) Jan 10 2018 David, indeed sem_post works correctly, I guess gdb interpreted
- Joakim (11/17) Jan 10 2018 Have you ported much of druntime to Uclibc? It currently assumes
- Radu (59/78) Jan 14 2018 I missed a bunch of details that were killing the signal
- Joakim (11/18) Jan 15 2018 Figured that was it, that's why I asked you a couple times how
- David Nadlinger (6/12) Jan 15 2018 We inherited that from Martin's code – presumably, it's just never the...
- Joakim (22/56) Dec 16 2017 First thing I'd do is build and run the test runners, then make
- Radu (14/75) Dec 17 2017 Test runners were out of the question as no program started. See
Trying to run some D code on Openwrt with Uclibc and got stuck by broken GC.

Using LDC 1.6
====================================
LDC - the LLVM D compiler (1.6.0):
  based on DMD v2.076.1 and LLVM 5.0.0
  built with LDC - the LLVM D compiler (1.6.0)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: broadwell
  http://dlang.org - http://wiki.dlang.org/LDC

  Registered Targets:
    aarch64    - AArch64 (little endian)
    aarch64_be - AArch64 (big endian)
    arm        - ARM
    arm64      - ARM64 (little endian)
    armeb      - ARM (big endian)
    nvptx      - NVIDIA PTX 32-bit
    nvptx64    - NVIDIA PTX 64-bit
    ppc32      - PowerPC 32
    ppc64      - PowerPC 64
    ppc64le    - PowerPC 64 LE
    thumb      - Thumb
    thumbeb    - Thumb (big endian)
    x86        - 32-bit X86: Pentium-Pro and above
    x86-64     - 64-bit X86: EM64T and AMD64
====================================

Run time libs were compiled with:
====================================
ldc-build-runtime --dFlags="-w;-mtriple=armv7-linux-gnueabihf -mcpu=cortex-a7 -L-lstdc++" --cFlags="-mcpu=cortex-a7 -mfloat-abi=hard -D__UCLIBC_HAS_BACKTRACE__ -D__UCLIBC_HAS_TLS__" --targetSystem="Linux;UNIX" BUILD_SHARED_LIBS=OFF
====================================

The minimal program is:
++++++++++++++++++++
import core.memory;

void main()
{
    GC.collect();
}
++++++++++++++++++++

Compiled with `ldc2 -mtriple=armv7-linux-gnueabihf -mcpu=cortex-a7 -gcc=arm-openwrt-linux-gcc`

When run, I get this error spuriously:
====================================
core.exception.AssertError rt/sections_elf_shared.d(116): Assertion failure
Fatal error in EH code: _Unwind_RaiseException failed with reason code: 9
Aborted (core dumped)
====================================

GDB on the coredump:
====================================
(gdb) bt
ldso/ldso/ldso.c:418
flags=<optimized out>, piclib=-1225360472, ppnt=0x21, infile=<optimized out>) at ldso/ldso/dl-elf.c:442
rpnt=0xbeea5d9c, libname=0x0) at ldso/ldso/dl-elf.c:703
out>, execute=<optimized out>) at libpthread/nptl/forward.c:152
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
====================================

Any idea what might be wrong?
Dec 15 2017
On 15 Dec 2017, at 14:06, Radu via digitalmars-d-ldc wrote:
> When run, I get this error spuriously:
> ====================================
> core.exception.AssertError rt/sections_elf_shared.d(116): Assertion failure
> Fatal error in EH code: _Unwind_RaiseException failed with reason code: 9
> Aborted (core dumped)
> ====================================

The assert is inside an invariant which checks that the TLS information has been extracted successfully. Perhaps uclibc uses a TLS implementation that is not ABI-compatible with glibc? (druntime needs to determine the TLS ranges to register them with the GC, for the main thread as well as newly spawned ones.)

Where in the program lifecycle does the error occur? From the backtrace, it looks like during C runtime startup, in which case I am not quite seeing the connection to the GC.

Why unwinding fails is another question, but not one I would be terribly worried about – it is possible that the error e.g. just occurs too early for the EH machinery to be properly set up yet. Other low-level parts of druntime have been converted to directly abort (e.g. using assert(0)) instead. In fact, I am about to overhaul sections_elf_shared in that respect anyway to improve error reporting when mixing shared and non-shared builds.

— David
Dec 15 2017
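David's point about TLS ranges matters more in D than in C, because D module-level variables are thread-local by default, so every thread owns a block of memory that may hold GC pointers. A minimal sketch in plain D (not druntime code; the names `perThread`, `processWide` and `tlsDemo` are made up for illustration) showing that each thread gets its own copy:

```d
import core.thread : Thread;
import std.stdio : writeln;

int perThread;             // module-level variables are thread-local (TLS) by default in D
__gshared int processWide; // __gshared opts out of TLS: one copy for the whole process

/// Returns [main thread's copy afterwards, value the worker saw on entry].
int[2] tlsDemo()
{
    perThread = 1;
    int workerSaw = -1;
    auto t = new Thread({
        workerSaw = perThread; // the worker starts with its own, freshly initialized copy
        perThread = 42;        // changes only the worker's copy
        processWide = 7;       // shared by every thread
    });
    t.start();
    t.join(); // join() also rethrows anything the worker threw
    return [perThread, workerSaw];
}

void main()
{
    const r = tlsDemo();
    writeln("main copy: ", r[0], ", worker saw on entry: ", r[1],
            ", processWide: ", processWide);
}
```

Each copy of `perThread` lives in its thread's TLS block; if druntime cannot locate those blocks (on ELF targets like this one, roughly the job of `rt/sections_elf_shared.d`), the GC has no way to scan them for roots.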
On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger wrote:
> [...]

My various attempts at getting it to run behaved very erratically. So I changed the parameters for the cross compile: basically I removed all architecture specifics, leaving only `-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on the C side. My testing hardware is an ARM Cortex-A7, http://linux-sunxi.org/A33

With the compiler switches changed I could run my test program and try the druntime test runner (albeit with some changes to math and stdio to get it linking):

./druntime-test-runner
0.000s PASS release32 core.atomic
0.000s PASS release32 core.bitop
0.000s PASS release32 core.checkedint
0.005s PASS release32 core.demangle
0.000s PASS release32 core.exception
0.002s PASS release32 core.internal.arrayop
0.000s PASS release32 core.internal.convert
0.000s PASS release32 core.internal.hash
0.000s PASS release32 core.internal.string
0.000s PASS release32 core.math
0.000s PASS release32 core.memory
0.002s PASS release32 core.sync.barrier
0.015s PASS release32 core.sync.condition
0.000s PASS release32 core.sync.config
0.016s PASS release32 core.sync.mutex
0.016s PASS release32 core.sync.rwmutex
0.002s PASS release32 core.sync.semaphore
Segmentation fault (core dumped)

The segfault is from core.thread:1351

unittest
{
    auto t1 = new Thread({
        foreach (_; 0 .. 20)
            Thread.getAll;
    }).start;
    auto t2 = new Thread({
        foreach (_; 0 .. 20)
            GC.collect; // this segfaults
    }).start;
    t1.join();
    t2.join();
}

Calling GC.collect from the main thread doesn't segfault.

The core dump is not very helpful - the stack is garbage, but running a minimal program with the unit test under gdbserver I can see this:

Thread 1 "test" received signal SIGUSR1, User defined signal 1.
pthread_getattr_np (thread_id=0, attr=0xb6b302bc) at libpthread/nptl/pthread_getattr_np.c:47
47        iattr->schedpolicy = thread->schedpolicy;
(gdb) step

Thread 1 "test" received signal SIGUSR2, User defined signal 2.
0xb6e50d80 in epoll_wait (epfd=-1090521272, events=0x8, maxevents=2, timeout=-1224756080) at libc/sysdeps/linux/common/epoll.c:58
58        CANCELLABLE_SYSCALL(int, epoll_wait, (int epfd, struct epoll_event *events, int maxevents, int timeout),
(gdb) step

Thread 1 "test" received signal SIGSEGV, Segmentation fault.
0xfffffffc in ?? ()
(gdb)
Dec 17 2017
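To debug the crash outside the test runner (as Radu did with gdbserver), the failing unittest can be turned into a stand-alone program. A sketch, assuming only the public `core.thread` and `core.memory` APIs; the helper name `collectFromWorker` is hypothetical:

```d
import core.memory : GC;
import core.thread : Thread;

/// Stand-alone version of the core.thread unittest that segfaulted:
/// one thread repeatedly enumerates all threads while another forces
/// collections, so the GC repeatedly suspends and resumes the other threads.
size_t collectFromWorker(size_t rounds = 20)
{
    size_t collections; // written only by t2, read after join()
    auto t1 = new Thread({ foreach (_; 0 .. rounds) Thread.getAll(); }).start;
    auto t2 = new Thread({
        foreach (_; 0 .. rounds)
        {
            GC.collect(); // collecting from a non-main thread is what crashed
            ++collections;
        }
    }).start;
    t1.join();
    t2.join();
    return collections;
}

void main()
{
    import core.stdc.stdio : printf;
    printf("completed %zu collections\n", collectFromWorker());
}
```

Per the report, the same `GC.collect()` called from the main thread did not crash, so the interesting difference is which thread initiates the stop-the-world phase.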
On Sunday, 17 December 2017 at 17:12:41 UTC, Radu wrote:
> On Friday, 15 December 2017 at 14:24:08 UTC, David Nadlinger wrote:
>> [...]
>
> My various attempts at getting it to run behaved very erratically. So I changed the parameters for the cross compile: basically I removed all architecture specifics, leaving only `-mtriple=arm-linux-gnueabihf`, and `-mfloat-abi=hard` on the C side. My testing hardware is an ARM Cortex-A7, http://linux-sunxi.org/A33

I believe that triple defaults to ARMv5; are you sure your Openwrt kernel is built for ARMv7? Try running uname -m on the device to check. For example, most low- to mid-level smartphones these days ship with ARMv8 chips but the kernel is only built for 32-bit ARMv7, so they can only run 32-bit apps.

> With the compiler switches changed I could run my test program and try the druntime test runner (albeit with some changes to math and stdio to get it linking): [...]
> Segmentation fault (core dumped)
>
> The segfault is from core.thread:1351 [...]
> Calling GC.collect from the main thread doesn't segfault.

Try running core.thread alone and see if it makes a difference, with ./druntime-test-runner core.thread, as I've sometimes seen tested modules interfere with each other.

I see that there are a few places where Glibc is assumed in core.thread; make sure those are right on Uclibc too:

https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3301
https://github.com/ldc-developers/druntime/blob/ldc-v1.6.0/src/core/thread.d#L3410

You can also try skipping the tests that segfault for now and make sure everything else works, by adding something like version(skip) before that failing unittest block, so you know the extent of the test problems.

> The core dump is not very helpful - the stack is garbage, but running a minimal program with the unit test under gdbserver I can see this: [...]
> Thread 1 "test" received signal SIGSEGV, Segmentation fault.
> 0xfffffffc in ?? ()

The SIGUSR1/SIGUSR2 signals mean the GC ran fine. You'd need to delve more into the code and the implementation details mentioned above to track this down.

On Sunday, 17 December 2017 at 17:20:32 UTC, Radu wrote:
> Yes - latest LDC versions make cross compiling a breeze, so kudos to you guys for making this happen. I'm using Windows Subsystem for Linux, by the way, so for me this is even more fun as I can work in both environments natively :)

Yeah, you could just use the Windows ldc too, assuming you have a cross-compiler from that OS, as shown on the wiki for Windows with the Android NDK.

> The modifications needed are surface deep and very few - some math and memory stream functions are missing.

I don't know how much it differs from Glibc, but we'd always be interested in a port, assuming you have the time to submit a pull like this recent one for Musl:

https://github.com/dlang/druntime/pull/1997

> The road block looks to be somewhere in the GC and TLS, or the interaction of them (at least this is my feeling ATM)

Not being able to do an explicit collect there isn't that big a deal: I'd skip that test for now and run everything else, then come back to that one once you have an idea of the bigger picture.
Dec 17 2017
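Joakim's `version(skip)` trick works because a `version` condition on an identifier that nobody defines is simply false, so the guarded unittest is parsed but never compiled. A minimal sketch (the identifier `skip` carries no special meaning; any undefined name would do):

```d
import core.memory : GC;
import core.thread : Thread;

// `skip` is never defined, so this unittest must still parse but is not
// compiled or run, unless the build passes -d-version=skip (LDC) / -version=skip (DMD).
version (skip) unittest
{
    auto t2 = new Thread({ foreach (_; 0 .. 20) GC.collect(); }).start;
    t2.join();
}

unittest
{
    // The remaining tests still build and run.
    GC.collect();
}

void main() {}
```

Passing `-d-version=skip` to LDC re-enables the test once the underlying problem is fixed, without editing the source again.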
Off-topic: there is another interesting lib: https://uclibc-ng.org/
Dec 18 2017
On Sunday, 17 December 2017 at 19:05:04 UTC, Joakim wrote:
> On Sunday, 17 December 2017 at 17:12:41 UTC, Radu wrote:
>> [...]
>
> I believe that triple defaults to ARMv5; are you sure your Openwrt kernel is built for ARMv7? Try running uname -m on the device to check. [...]

Got some time to work on this. Just to clarify, I'm developing against uClibc-ng 1.0.9; I noticed others suggesting it and wanted to make that clear.

Re. the architecture: it is an armv7a, as `uname -a` says: armv7l GNU/Linux. I could not produce any working binary by specifying the armv7a architecture to ldc, so I used the generic arm architecture for gnueabihf, as previously stated.

I managed to get the druntime tester running (minus some math functions and memstream) except for one specific blocking issue: Thread.suspend does not work, it produces a segfault. To test this I commented out all suspendAll/resumeAll unittests from core.thread and stubbed out GC.collect().

This issue is not linked to the GC: the segfault happens even with GC.collect disabled and the suspendAll/resumeAll unittests enabled. The GC just happens to use the suspend mechanics and exposes the core issue.

From what I can see in gdb 'thread_resumeHandler' is to blame; it looks like 'sem_post( &suspendCount )' will immediately trigger the resumeSignal and the call to 'sigsuspend( &sigres )' is never made. Like:

464             status = sem_post( &suspendCount );
(gdb) n

Thread 2 "druntime-test-r" received signal SIGUSR2, User defined signal 2.
0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
464             status = sem_post( &suspendCount );
(gdb) info threads
  Id   Target Id                                  Frame
  1    Thread 16005.16005 "druntime-test-r" 0x001ba7a0 in _D4core6thread5Fiber5stateMxFNaNbNdNiNfZEQBnQBlQBh5State (this=0xb6d34980) at thread.d:4533
* 2    Thread 16005.16273 "druntime-test-r" 0x001b46d0 in core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
(gdb) bt
core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:464
void(void*) nothrow delegate) (fn=...) at thread.d:2600
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
(gdb) n

Thread 2 "druntime-test-r" received signal SIGSEGV, Segmentation fault.
0xfffffffc in ?? ()
(gdb) bt
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
Jan 09 2018
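Since the crash reproduces without any collection, the stop-the-world handshake can be exercised on its own through the `extern (C)` functions `thread_suspendAll` and `thread_resumeAll` that `core.thread` exposes for the GC. A sketch under that assumption (`suspendResumeRounds` is a hypothetical helper); on Linux, druntime uses SIGUSR1 and SIGUSR2 for this by default, the same signals seen in the gdb sessions in this thread:

```d
import core.atomic : atomicLoad, atomicStore;
import core.thread : Thread, thread_resumeAll, thread_suspendAll;

shared bool stop;

/// Suspends and resumes all other registered threads `rounds` times,
/// without involving the GC at all.
size_t suspendResumeRounds(size_t rounds = 100)
{
    atomicStore(stop, false);
    auto worker = new Thread({
        while (!atomicLoad(stop))
            Thread.yield();
    }).start;

    foreach (_; 0 .. rounds)
    {
        // Do not allocate GC memory between these two calls.
        thread_suspendAll(); // signals every other thread to park itself and waits for each to acknowledge
        thread_resumeAll();  // signals the parked threads to continue
    }

    atomicStore(stop, true);
    worker.join();
    return rounds;
}

void main()
{
    import core.stdc.stdio : printf;
    printf("%zu suspend/resume rounds completed\n", suspendResumeRounds());
}
```

If this program alone crashes on the target, the problem is in the signal handshake (handlers, masks, or the structs they use) rather than in the collector.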
On 10 Jan 2018, at 0:27, Radu via digitalmars-d-ldc wrote:
> From what I can see in gdb 'thread_resumeHandler' is to blame, it looks like 'sem_post( &suspendCount )' will immediately trigger the resumeSignal and the call for 'sigsuspend( &sigres )' is never made.

You mean thread_suspendHandler? Perhaps single-stepping through the code and having a look where the stack is corrupted would yield some insight? Is there possibly some ABI incompatibility caused by callWithStackShell?

sem_post shouldn't cause anything to happen on the calling thread itself; and it is explicitly documented to be re-entrant w.r.t. signals.

—David
Jan 10 2018
On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger wrote:
> You mean thread_suspendHandler? Perhaps single-stepping through the code and having a look where the stack is corrupted would yield some insight? Is there possibly some ABI incompatibility caused by callWithStackShell? [...]

David, indeed sem_post works correctly; I guess gdb interpreted the sequence in the wrong order. Moving the breakpoint to thread_resumeHandler, I can see that the handler gets called, but I think you are right about the ABI. Observe:

Thread 2 "druntime-test-r" received signal SIGUSR2, User defined signal 2.
0xb6e88648 in ?? () from target:/lib/libc.so.1
(gdb) bt
core.thread.thread_suspendHandler(int).op(void*) (sp=0xb572f900 "$F\033") at thread.d:467
void(void*) nothrow delegate) (fn=...) at thread.d:2600
(gdb) c

Thread 2 "druntime-test-r" hit Breakpoint 1, thread_resumeHandler (sig=12) at thread.d:494
warning: Source file is more recent than executable.
494         assert( sig == resumeSignalNumber );
(gdb) i f
Stack level 0, frame at 0xb572f4d8:
 pc = 0x1b487c in thread_resumeHandler (thread.d:494); saved pc = 0xfffffffe
 called by frame at 0xb572f4d8
 source language d.
 Arglist at 0xb572f4c8, args: sig=12
 Locals at 0xb572f4c8, Previous frame's sp is 0xb572f4d8
 Saved registers:
  r11 at 0xb572f4d0, lr at 0xb572f4d4
.......
(gdb) disas
Dump of assembler code for function thread_resumeHandler:
   0x001b4864 <+0>:     push    {r11, lr}
   0x001b4868 <+4>:     mov     r11, sp
   <thread_resumeHandler+72>
   0x001b4874 <+16>:    ldr     r1, [pc, r1]
   0x001b4880 <+28>:    ldr     r1, [r1]
   0x001b4884 <+32>:    cmp     r0, r1
   0x001b4888 <+36>:    bne     0x1b4894 <thread_resumeHandler+48>
   0x001b488c <+40>:    mov     sp, r11
=> 0x001b4890 <+44>:    pop     {r11, pc}
   <thread_resumeHandler+76>
   0x001b4898 <+52>:    add     r1, pc, r0
   0x001b48a8 <+68>:    bl      0xf00c8 <_d_assert>
   0x001b48ac <+72>:    mulseq  r4, r8, r5
   0x001b48b0 <+76>:    ; <UNDEFINED> instruction: 0x00117bd1
(gdb) ni
0x001b4890 in thread_resumeHandler (sig=-2) at thread.d:499
499     }
Warning:
Cannot insert breakpoint 0.
Cannot access memory at address 0xfffffffe

It looks like the PC is invalid, causing the segmentation fault.
Jan 10 2018
On Wednesday, 10 January 2018 at 14:17:53 UTC, Radu wrote:
> On Wednesday, 10 January 2018 at 11:13:17 UTC, David Nadlinger wrote:
>> [...]
>
> David, indeed sem_post works correctly, I guess gdb interpreted the sequence in the wrong order. [...]

Have you ported much of druntime to Uclibc? It currently assumes Glibc on linux by default, so if there are differences between the way the two handle such signals, it can cause problems. For example, the Android Java Runtime intercepts SIGUSR1/SIGUSR2 and doesn't run their signal handlers, so I had to work around that issue:

https://github.com/dlang/druntime/pull/1851#discussion_r123886260

You may be running into a similar incompatibility, so I suggest you port all the version-dependent blocks of that module and its dependent modules first.
Jan 10 2018
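Joakim's advice to port the version-dependent blocks refers to druntime selecting declarations per C runtime. A sketch of that pattern; `CRuntime_Glibc`, `CRuntime_Musl` and `CRuntime_Bionic` are predefined D version identifiers, while `CRuntime_UClibc` is only predefined by more recent compilers (the LDC 1.6 used in this thread may not set it), and `libcName` is a made-up constant:

```d
// One branch per C runtime, and a hard error for anything unknown,
// so a new libc cannot silently inherit glibc's declarations.
version (CRuntime_Glibc)
    enum libcName = "glibc";
else version (CRuntime_Musl)
    enum libcName = "musl";
else version (CRuntime_Bionic)
    enum libcName = "Bionic";
else version (CRuntime_UClibc) // predefined only by newer compilers
    enum libcName = "uClibc";
else
    static assert(false, "Unsupported C runtime: add its declarations here");

void main()
{
    import core.stdc.stdio : printf;
    printf("building against %.*s\n", cast(int) libcName.length, libcName.ptr);
}
```

Blocks guarded only by a plain `version (linux)` are the ones most likely to hide glibc-only assumptions, so they are a reasonable place to start looking.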
On Wednesday, 10 January 2018 at 15:56:52 UTC, Joakim wrote:
> Have you ported much of druntime to Uclibc? It currently assumes Glibc on linux by default, so if there are differences between the way the two handle such signals, it can cause problems. [...]

I missed a bunch of details that were killing the signal handling, such as various size differences in structs; thanks for the guidance! Fixed now.

druntime tests are passing in release mode now. The debug build fails with:

core.exception.AssertError rt/sections_elf_shared.d(116): Assertion failure

The code looks like:

invariant()
{
    assert(_moduleGroup.modules.length);
    static if (SharedELF)
    {
        assert(_tlsMod || !_tlsSize); // <-- fails
    }
}

Stack trace:

sections_elf_shared.d:116
(this=<error reading variable: Cannot access memory at address 0xe9>) at sections_elf_shared.d:67
(this=...) at sections_elf_shared.d:104
_D2rt6memory16initStaticDataGCFZ14__foreachbody1MFKSQBy19sections_elf_shared3DSOZi (sg=...) at memory.d:23
_D2rt19sections_elf_shared3DSO7opApplyFMDFKSQBqQBqQyZiZi (dg=...) at sections_elf_shared.d:73
int(char[][]) function).runAll() () at dmain2.d:478
int(char[][]) function).tryExec(scope void() delegate) (dg=...) at dmain2.d:454
mainFunc=0xc5210 <D main>) at dmain2.d:487
__entrypoint.d:8

I don't really understand that invariant. I see that those vars are initialized well before, in the init part, and have values, for example _tlsMod = 0 and _tlsSize = 388.

Stack trace:

_D2rt19sections_elf_shared12scanSegmentsFNbNiKxS4core3sys5linux4link12dl_phdr_infoPSQDeQDe3DSOZv (info=..., pdso=0x307150) at sections_elf_shared.d:871
sections_elf_shared.d:455
target:/lib/ld-uClibc.so.0

Any idea why this fails and how to fix it?
Jan 14 2018
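Since mismatched struct sizes were what broke the signal handling here, one cheap check (a suggestion, not something described in the thread) is to print the sizes druntime's bindings assume on the target and compare them with `sizeof` in a small C program built with the same cross gcc (`arm-openwrt-linux-gcc`). A druntime-only sketch; the list of types is an assumption about what matters for thread suspension, not an exhaustive one:

```d
import core.stdc.stdio : printf;
import core.sys.posix.semaphore : sem_t;
import core.sys.posix.signal : sigaction_t, sigset_t;
import core.sys.posix.sys.types : pthread_attr_t, pthread_mutex_t;

struct Entry { string name; size_t size; }

// Types touched by druntime's thread suspension code (assumed, not exhaustive).
enum Entry[] boundSizes = [
    Entry("sigset_t", sigset_t.sizeof),
    Entry("sigaction_t", sigaction_t.sizeof), // D's name for C's `struct sigaction`
    Entry("sem_t", sem_t.sizeof),
    Entry("pthread_attr_t", pthread_attr_t.sizeof),
    Entry("pthread_mutex_t", pthread_mutex_t.sizeof),
];

void main()
{
    // printf instead of std.stdio keeps the check independent of Phobos.
    foreach (e; boundSizes)
        printf("%.*s = %zu\n", cast(int) e.name.length, e.name.ptr, e.size);
}
```

Any line whose value differs from the C side points at a declaration that still follows glibc's layout.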
On Sunday, 14 January 2018 at 21:33:28 UTC, Radu wrote:
> On Wednesday, 10 January 2018 at 15:56:52 UTC, Joakim wrote:
>> [...]
>
> I missed a bunch of details that were killing the signal handling, thanks for the guidance!, various size differences on structs. Fixed now.

Figured that was it; that's why I asked you a couple of times how much of druntime you had ported.

> druntime tests are passing in release mode now. [...]

_tlsMod and _tlsSize are extracted from shared libraries and then passed to __tls_get_addr to initialize thread-local storage for each library. That invariant makes sure the TLS index _tlsMod isn't 0 along with a non-zero size; I'm not sure why David checks for that. It could be that he doesn't expect the index 0 for a shared library, whereas uClibc is okay with that? I don't use this module or arbitrary shared libraries on Android/ARM, so I haven't had to mess with it.
Jan 15 2018
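The trace shows `_tlsMod`/`_tlsSize` being filled in `scanSegments`, which receives a `dl_phdr_info` for a loaded object. A stand-alone check of what the C library itself reports, assuming druntime's `core.sys.linux.link` and `core.sys.linux.elf` bindings (`checkTls` and `suspicious` are hypothetical names), counts objects that would trip `assert(_tlsMod || !_tlsSize)`. Note that `-release` compiles out asserts and invariants, which is consistent with only the debug build failing.

```d
import core.stdc.stdio : printf;
import core.sys.linux.elf : PT_TLS;
import core.sys.linux.link : dl_iterate_phdr, dl_phdr_info;

/// Loaded objects that have a PT_TLS segment but report TLS module ID 0.
__gshared size_t suspicious;

extern (C) int checkTls(dl_phdr_info* info, size_t size, void*) nothrow @nogc
{
    // `size` tells how much of dl_phdr_info the C library filled in;
    // only read dlpi_tls_modid if it is actually present.
    if (size < dl_phdr_info.dlpi_tls_modid.offsetof + size_t.sizeof)
        return 0;

    const(char)* name = info.dlpi_name;
    if (name is null || name[0] == '\0')
        name = "(main program)";

    foreach (i; 0 .. info.dlpi_phnum)
    {
        const ph = &info.dlpi_phdr[i];
        if (ph.p_type != PT_TLS)
            continue;
        printf("%s: tls_modid=%zu tls_size=%zu\n",
               name, info.dlpi_tls_modid, cast(size_t) ph.p_memsz);
        if (info.dlpi_tls_modid == 0 && ph.p_memsz != 0)
            ++suspicious;
    }
    return 0; // 0 = keep iterating over the remaining objects
}

void main()
{
    dl_iterate_phdr(&checkTls, null);
    printf("objects with TLS but module ID 0: %zu\n", suspicious);
}
```

On a glibc host this is expected to report no suspicious objects; running it on the uClibc-ng target shows directly whether the C library hands out module ID 0 for an object with TLS.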
On 15 Jan 2018, at 10:05, Joakim via digitalmars-d-ldc wrote:
> _tlsMod and _tlsSize are extracted from shared libraries and then passed to __tls_get_addr to initialize thread-local storage for each library. That invariant makes sure the TLS index _tlsMod isn't 0 along with a non-zero size, not sure why David checks for that. It could be he doesn't expect the index 0 for a shared library whereas uClibc is okay with that?

We inherited that from Martin's code – presumably, it's just never the case on glibc. If all the tests work with shared libraries (DMD test suite and runtime unit tests, plus druntime/test, as run by ctest), there is nothing to worry about.

— David
Jan 15 2018
On Friday, 15 December 2017 at 14:06:37 UTC, Radu wrote:
> Trying to run some D code on Openwrt with Uclibc and got stuck by broken GC. [...]
> Run time libs were compiled with:
> ====================================
> ldc-build-runtime --dFlags="-w;-mtriple=armv7-linux-gnueabihf -mcpu=cortex-a7 -L-lstdc++" --cFlags="-mcpu=cortex-a7 -mfloat-abi=hard -D__UCLIBC_HAS_BACKTRACE__ -D__UCLIBC_HAS_TLS__" --targetSystem="Linux;UNIX" BUILD_SHARED_LIBS=OFF
> ====================================

First thing I'd do is build and run the test runners, then make sure no tests are failing, particularly in druntime.

Another thing I notice is that you don't separate many of those C and D flags with semicolons: not sure how that worked for you, as I get errors if I try something similar. Also, you need to specify the C cross-compiler with CC=arm-openwrt-linux-gcc before running ldc-build-runtime: maybe you did that but forgot to mention it.

It is fairly easy to cross-compile the test runners too if you pass the --testrunners flag; see the instructions for the RPi and Android for examples:

https://wiki.dlang.org/Building_LDC_runtime_libraries
https://wiki.dlang.org/Build_D_for_Android

You may need to make some modifications to druntime or Phobos to get everything to compile, and you may have to specify some linker flags too, to get the test runners to link. Let us know how it works out.

While you could reuse most of the glibc declarations for now, you may eventually need to patch druntime for Uclibc, as was done before for Bionic and the NetBSD libc, for example:

https://github.com/dlang/druntime/pull/734
https://github.com/dlang/druntime/pull/1494
Dec 16 2017
On Saturday, 16 December 2017 at 14:14:40 UTC, Joakim wrote:
> First thing I'd do is build and run the test runners, then make sure no tests are failing, particularly in druntime. [...]

Test runners were out of the question, as no program started. See my reply to David.

Yeah, I set up the CC correctly, but curiously, specifying a more fitting platform triple and -march on GCC produced non-working binaries, so I had to revert to the defaults.

Yes - latest LDC versions make cross compiling a breeze, so kudos to you guys for making this happen. I'm using Windows Subsystem for Linux, by the way, so for me this is even more fun as I can work in both environments natively :)

The modifications needed are surface deep and very few - some math and memory stream functions are missing.

The road block looks to be somewhere in the GC and TLS, or the interaction of them (at least this is my feeling ATM)
Dec 17 2017