digitalmars.D.ldc - Remaining Travis merge-2.064 failure
- David Nadlinger via digitalmars-d-ldc (13/13) May 17 2014 Hi all,
- Kai Nacke (10/29) May 18 2014 Hi David,
- Kai Nacke (4/4) May 19 2014 It is also not reproducible on Ubuntu 13.10 with multilib
- David Nadlinger via digitalmars-d-ldc (2/3) May 20 2014
- David Nadlinger via digitalmars-d-ldc (18/22) May 20 2014 Hi Kai,
- Kai Nacke (8/12) May 23 2014 Hi David!
- Kai Nacke (33/43) May 25 2014 Hi David!
- David Nadlinger via digitalmars-d-ldc (12/14) May 26 2014 As I mentioned, I suspect that the issue is dependent on the order the
- Kai Nacke (31/49) May 26 2014 This reveals:
- Christian Kamm (9/13) Jun 09 2014 You are saying std_stdio.o uses that symbol - and the bug is that the
- Christian Kamm (16/19) Jun 09 2014 The symbol isn't emitted because its instantiatingModule is std.bitmanip
- David Nadlinger via digitalmars-d-ldc (29/42) Jun 09 2014 This logic was adapted from what I gathered from discussions when 2.064
- Christian Kamm (14/36) Jun 09 2014 Yes, I saw. I just think the comment is misleading: it says
- David Nadlinger via digitalmars-d-ldc (13/17) Jun 10 2014 Yes, that would be great.
- Christian Kamm (4/14) Jun 10 2014 Oh, right! Thanks for clearing that up for me.
- Kai Nacke (5/13) Jun 10 2014 That's a nice summary what the code does. A pull request would be
- Christian Kamm (10/12) Jun 14 2014 For what it's worth, compiling stdio.d with dmd 2.064 also does not emit
- Christian Kamm (3/16) Jun 14 2014 Maybe removing the call to ranlib or using ranlib -D could help? The
- David Nadlinger via digitalmars-d-ldc (9/10) Jun 14 2014 At this point, I think _any_ fix for the issue would be fine. We
- Christian Kamm (10/13) Jun 14 2014 Is it possible to log into the Travis instances? If it is, looking at
- David Nadlinger via digitalmars-d-ldc (6/8) Jun 14 2014 You can get at the output by simply submitting a pull request that
- Kai Nacke (10/18) Jun 15 2014 Yes, but the std.array.Appender() ctor which calls
Hi all,

so there currently seems to be one remaining Travis build failure on the merge-2.064 branch, namely a linking issue in the release tests, and only on LLVM 3.4: https://travis-ci.org/ldc-developers/ldc/builds/25057239

The issue looks like a symbol emission problem where std.net.curl is getting pulled in randomly because it contained a template instance that is missing from whatever other module by chance. However, I can neither replicate the issue on x86_64 Linux locally nor on OS X.

Any ideas? We should really try to get the merge-* branches integrated into master ASAP.

Best,
David
May 17 2014
On Sunday, 18 May 2014 at 02:04:52 UTC, David Nadlinger via digitalmars-d-ldc wrote:
> Hi all,
>
> so there currently seems to be one remaining Travis build failure on the merge-2.064 branch, namely a linking issue in the release tests, and only on LLVM 3.4: https://travis-ci.org/ldc-developers/ldc/builds/25057239
>
> The issue looks like a symbol emission problem where std.net.curl is getting pulled in randomly because it contained a template instance that is missing from whatever other module by chance. However, I can neither replicate the issue on x86_64 Linux locally nor on OS X.
>
> Any ideas? We should really try to get the merge-* branches integrated into master ASAP.
>
> Best,
> David

Hi David,

I think this is a problem with the multilib setup. This Travis build passes: https://travis-ci.org/ldc-developers/ldc/builds/25474830 - the only difference is that the 32bit libraries are not built (and some applications are not installed, e.g. gcc-multilib).

Regards,
Kai
May 18 2014
It is also not reproducible on Ubuntu 13.10 with multilib creation. Regards, Kai
May 19 2014
On Mon 19 May 2014 05:46:44 PM CEST, Kai Nacke via digitalmars-d-ldc wrote:
> It is also not reproducible on Ubuntu 13.10 with multilib creation.
May 20 2014
Hi Kai,

On Mon, May 19, 2014 at 7:36 AM, Kai Nacke via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> wrote:
> I think this is a problem with the multilib setup. This Travis build passes: https://travis-ci.org/ldc-developers/ldc/builds/25474830 - the only difference is that the 32bit libraries are not built (and some applications are not installed, e.g. gcc-multilib).

There is still the possibility that this is a bug in LDC. If it's a case of template symbols not being emitted correctly, then the order in which the other object files in libphobos.a are tried might differ between toolchain versions, and we might just be unlucky enough to hit curl.o (std.net) in the multilib case.

If you have a setup where you can actually reproduce this, it might be worth having a short look at what actually causes curl.o to be pulled in. Maybe GNU ld has an option similar to the OS X one to do this, but if, like me, you can't find it, you could adapt that hacky script I sent you once to track down which unresolved symbol causes the issue.

I agree that it is not worth postponing the release due to this, though, if we don't have concrete evidence that this also affects user code.

Best,
David
May 20 2014
Hi David!

On Tuesday, 20 May 2014 at 10:16:46 UTC, David Nadlinger via digitalmars-d-ldc wrote:
> I agree that it is not worth postponing the release due to this, though, if we don't have concrete evidence that this also affects user code.

I changed my mind here. The problem is reproducible with the beta1 binaries if used with a different Linux distro. I have some time this weekend and will analyze this further.

Regards,
Kai
May 23 2014
Hi David!

On Tuesday, 20 May 2014 at 10:16:46 UTC, David Nadlinger via digitalmars-d-ldc wrote:
> Hi Kai,
>
> On Mon, May 19, 2014 at 7:36 AM, Kai Nacke via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> wrote:
>> I think this is a problem with the multilib setup. This Travis build passes: https://travis-ci.org/ldc-developers/ldc/builds/25474830 - the only difference is that the 32bit libraries are not built (and some applications are not installed, e.g. gcc-multilib).

I am a bit stuck here. I compiled stdiobase.d (one of the failing tests) with -unittest -main. Then I extracted libphobos-ldc.a and resolved all dependencies by hand. This results in:

    gcc -o stdiobase ../stdiobase.o ../__main.o src_rt_dmain2.o std_stdio.o src_object_.o src_ldc_eh.o src_rt_monitor_.o src_rt_critical_.o src_rt_lifetime.o src_rt_tlsgc.o src_rt_aaA.o src_rt_cast_.o src_core_memory.o src_gc_gc.o src_gc_bits.o src_core_sync_mutex.o src_core_sync_exception.o src_core_exception.o src_core_thread.o src_gc_proxy.o src_core_time.o src_rt_adi.o src_rt_typeinfo_ti_*.o src_rt_util_console.o src_rt_sections_ldc.o src_rt_sections_linux.o std_string.o std_exception.o std_format.o src_core_sys_posix_netdb.o std_utf.o std_array.o std_conv.o std_typecons.o std_algorithm.o std_range.o std_typetuple.o std_traits.o std_ascii.o std_functional.o std_uni.o src_rt_memory.o src_core_stdc_errno.o src_rt_util_hash.o src_gc_os.o src_rt_minfo.o src_core_bitop.o src_rt_util_string.o src_rt_util_utf.o src_rt_util_container.o src_rt_aApply.o src_core_runtime.o src_ldc_arrayinit.o src_rt_qsort.o std_math.o std_random.o std_bitmanip.o std_container.o std_internal_unicode_comp.o std_internal_unicode_tables.o src_core_demangle.o std_numeric.o std_complex.o src_rt_switch_.o errno.c.o -lc -lpthread -lm -ldl -lrt

This creates the executable stdiobase without a link error and without using -lcurl. I don't understand this. Any ideas? (All on Ubuntu 12 64bit.)

Regards,
Kai
May 25 2014
Hi Kai,

On 05/26/2014 07:56 AM, Kai Nacke via digitalmars-d-ldc wrote:
> This creates the executable stdiobase without a link error and without using -lcurl. I don't understand this. Any ideas? (All on Ubuntu 12 64bit.)

As I mentioned, I suspect that the issue depends on the order in which the linker searches the object files. Several object files might have the missing template symbol, curl.o just being one of them.

If you run the failing compile with -L-M, you should see a mention of why curl.o is pulled in near the top of the output. IIRC the output also includes information about which specific module requested the symbol. Then, you'd need to debug into LDC to see why the symbol in question is not emitted to the module that needs it.

Best,
David
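The search-order dependence described above can be illustrated with a toy model. This is only a sketch under a simplifying assumption: for each undefined symbol the linker takes the first archive member, in some lookup order, that defines it. Real linkers consult the archive's symbol index, so the effective order may differ. The `link` function, the `ARCHIVE` contents and the `stdio_symbol` name are made up for illustration; only the mangled `capacity` symbol and the object file names come from this thread.

```python
# Toy model of on-demand member selection from a static archive.
# ASSUMPTION: the first member in `lookup_order` that defines a symbol wins.

CAPACITY = "_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm"

# member name -> (symbols it defines, symbols it leaves undefined)
ARCHIVE = {
    "std_stdio.o": ({"stdio_symbol"}, {CAPACITY}),        # hypothetical contents
    "std_string.o": ({CAPACITY}, set()),
    "std_net_curl.o": ({CAPACITY}, {"curl_easy_init"}),   # needs libcurl
}


def link(root_undefined, archive, lookup_order):
    """Return the archive members pulled in, in the order they were added."""
    pulled = []
    defined = set()
    pending = list(root_undefined)
    while pending:
        sym = pending.pop(0)
        if sym in defined:
            continue  # already satisfied by a member pulled in earlier
        for member in lookup_order:
            defs, undefs = archive[member]
            if sym in defs:
                pulled.append(member)
                defined |= defs
                pending.extend(sorted(undefs))
                break
        else:
            # No member defines it; a real link would need e.g. -lcurl here.
            raise LookupError(f"undefined reference to {sym}")
    return pulled
```

With `std_string.o` ahead of `std_net_curl.o` in the lookup order, the link succeeds without touching curl. With `std_net_curl.o` first, the same inputs fail on `curl_easy_init`, which mirrors the symptom seen on Travis.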
May 26 2014
On Monday, 26 May 2014 at 09:03:29 UTC, David Nadlinger via digitalmars-d-ldc wrote:
> Hi Kai,
>
> On 05/26/2014 07:56 AM, Kai Nacke via digitalmars-d-ldc wrote:
>> This creates the executable stdiobase without a link error and without using -lcurl. I don't understand this. Any ideas? (All on Ubuntu 12 64bit.)
>
> As I mentioned, I suspect that the issue is dependent on the order the linker searches the object files. Several object files might have the missing template symbol, curl.o just being one of them.
>
> If you run the failing compile with -L-M, you should see a mention of why curl.o is pulled in near the top of the output. IIRC the output also includes information about for which specific module the symbol was requested. Then, you'd need to debug into LDC to see why the symbol in question is not emitted to the module that needs it.
>
> Best,
> David

This reveals:

    /build/work/ldc/runtime/../lib/libphobos-ldc.a(std_net_curl.o)
        /build/work/ldc/runtime/../lib/libphobos-ldc.a(std_stdio.o) (_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm)

The mentioned weak symbol is defined in several files:

    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_uni.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_math.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_exception.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_conv.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_functional.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_string.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_typetuple.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_container.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_numeric.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_complex.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_array.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_algorithm.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_bitmanip.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_range.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_format.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_utf.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_traits.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_typecons.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_random.o
    _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm in std_net_curl.o

but std.stdio is missing. This at least explains it... Thanks.

Regards,
Kai
May 26 2014
> /build/work/ldc/runtime/../lib/libphobos-ldc.a(std_net_curl.o)
>     /build/work/ldc/runtime/../lib/libphobos-ldc.a(std_stdio.o) (_D6object15__T8capacityTaZ8capacityFNaNbNdAaZm)

You are saying std_stdio.o uses that symbol - and the bug is that the instantiation was not emitted into that object file? The symptom is that it then uses the instantiation from std_net_curl.o, leading to a linker error because -lcurl wasn't passed?

My runtime/std/stdio.o also only has a

    U _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm

so I think I can reproduce it locally.

Regards,
Christian
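The check done by hand here (which members define the symbol, and which only carry it as `U`) can be scripted. The following is a rough sketch, assuming the default GNU `nm` layout for archives: a `member.o:` header line followed by `address type name` lines, with the address left blank for undefined symbols. The `scan_nm` helper and the `SAMPLE` text are hypothetical and not real output from libphobos-ldc.a.

```python
def scan_nm(nm_output, symbol):
    """Split archive members into definers and mere users of `symbol`.

    ASSUMPTION: `nm_output` is the text of plain `nm libfoo.a`.
    """
    defines, references = [], []
    member = None
    for line in nm_output.splitlines():
        line = line.strip()
        if not line:
            continue
        # Member headers look like "std_stdio.o:" and contain no spaces.
        if line.endswith(":") and " " not in line:
            member = line[:-1]
            continue
        parts = line.split()
        if len(parts) < 2 or parts[-1] != symbol:
            continue
        kind = parts[-2]  # "U" = undefined; "T", "W", ... = defined here
        (references if kind == "U" else defines).append(member)
    return defines, references


# Made-up sample in the assumed layout.
SAMPLE = """
std_stdio.o:
                 U _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm
0000000000000000 T some_stdio_function

std_string.o:
0000000000000000 W _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm
"""
```

Feeding it the real `nm` output of the archive would list every member that a linker could pick to satisfy the reference from std_stdio.o.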
Jun 09 2014
On 09.06.2014 09:29, Christian Kamm wrote:
> My runtime/std/stdio.o also only has a U _D6object15__T8capacityTaZ8capacityFNaNbNdAaZm so I think I can reproduce it locally.

The symbol isn't emitted because its instantiatingModule is std.bitmanip - which is not a root module - and thus the function is ignored by DtoDefineFunction.

I think the comment in there (functions.cpp:922) is wrong. The frontend seems to try hard to make sure instantiatingModule is a non-root module if possible. That should mean LDC is not emitting templates that have a non-root module instantiating them somewhere.

The idea probably is that you shouldn't need to emit functions again if they were already emitted into a library you import and link. (if that's desired, the correct fix is probably to require -lcurl when linking phobos...)

I wonder if you can break that behavior with a cycle of imports-that-instantiate. But I couldn't make a failing test case.

Cheers,
Christian
Jun 09 2014
On 9 Jun 2014, at 10:44, Christian Kamm via digitalmars-d-ldc wrote:
> I think the comment in there (functions.cpp:922) is wrong. The frontend seems to try hard to make sure instantiatingModule is a non-root module if possible. That should mean LDC is not emitting templates that have a non-root module instantiating them somewhere.

This logic was adapted from what I gathered from discussions when 2.064 (I think) came out. If you look at FuncDeclaration::toObjFile in DMD 2.064.2, you'll see that it uses the same logic to determine whether to emit a certain symbol. In more recent versions, Kenji has moved the check into the frontend at our (GDC/LDC) request, but it is still fundamentally the same (FuncDeclaration::needsCodegen, https://github.com/D-Programming-Language/dmd/pull/3107).

> The idea probably is that you shouldn't need to emit functions again if they were already emitted into a library you import and link. (if that's desired, the correct fix is probably to require -lcurl when linking phobos...)

The idea is instead that functions that are already part of an *object file* you need to link anyway should not be emitted again. This is a sound design, as long as you only omit template instances that you know are already required by somebody else in your dependency graph (ignoring cycles for the moment).

Now, obviously std.net.curl isn't in the import graph of std.stdio. What seems to happen here is that std.net.curl only contains the symbol by accident, even though we thought it was going to be provided by somebody else. And as we don't build with symbol-per-section and --gc-sections yet, this of course causes us to also pull in the libcurl dependencies.

Thinking about this a bit, it seems very plausible that the compiler actually works as intended here. This suggests that a possible fix would be to split off everything that depends on curl into a separate static library, as this would guarantee that the linker looks up the object files from the non-curl modules first (but then, of course, we'd either have to specify libphobos-ldc twice, or use the GNU ld grouping options, to get std.net.curl to link with its Phobos dependencies).

Best,
David
Jun 09 2014
On 09.06.2014 13:09, David Nadlinger via digitalmars-d-ldc wrote:
> On 9 Jun 2014, at 10:44, Christian Kamm via digitalmars-d-ldc wrote:
>> I think the comment in there (functions.cpp:922) is wrong. The frontend seems to try hard to make sure instantiatingModule is a non-root module if possible. That should mean LDC is not emitting templates that have a non-root module instantiating them somewhere.
>
> This logic was adapted from what I gathered from discussions when 2.064 (I think) came out. If you look at FuncDeclaration::toObjFile in DMD 2.064.2, you'll see that it uses the same logic to determine whether to emit a certain symbol.

Yes, I saw. I just think the comment is misleading: it says "Skip generating code if this part of a TemplateInstance that is instantiated only by non-root modules" but actually it seems to skip instances that have any non-root module instantiating them. I'll make a pull request to fix it.

>> The idea probably is that you shouldn't need to emit functions again if they were already emitted into a library you import and link. (if that's desired, the correct fix is probably to require -lcurl when linking phobos...)
>
> The idea is instead that functions that are already part of an *object file* you need to link anyway should not be emitted again. This is a sound design, as long as you only omit template instances that you know are already required by somebody else in your dependency graph (ignoring cycles for the moment).

Okay. Aside: how does it deal with cycles? Wouldn't no instance be emitted if two modules both instantiate the same function and include each other? (in practice both were emitted for me)

> Thinking about this a bit, it seems very plausible that the compiler actually works as intended here.

Agreed. In dmd's libphobos2, array_10f_5e7.o and array_187_86f.o use the symbol while only object_5_50d.o defines it. Why doesn't dmd's stdio use it?

Regards,
Christian
Jun 09 2014
Hi Christian,

On 9 Jun 2014, at 18:28, Christian Kamm via digitalmars-d-ldc wrote:
> I'll make a pull request to fix it.

Yes, that would be great.

> Okay. Aside: how does it deal with cycles? Wouldn't no instance be emitted if two modules both instantiate the same function and include each other? (in practice both were emitted for me)

If ti->instantiatingModule (the module you would like to pull the symbol in from) itself also imports at least one of the root modules, then importsRoot will be true, and the symbol will still be defined. Exactly the same situation will occur (with m/mi swapped) when building what currently is ti->instantiatingModule, so you end up emitting that template into both modules, as you observed.

On a somewhat unrelated note, the use of "insearch" in that piece of code is a beautiful example of DMD's … uhm … pasta-inspired design.

Best,
David
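The emission rule as described in this exchange can be sketched roughly as follows. This is not the DMD or LDC source; it is a paraphrase in Python under two assumptions: that "imports" means transitive imports, and that the `seen` set stands in for the role the `insearch` flag plays in the real code. The function names and the import graphs in the usage note are hypothetical.

```python
def imports_module(start, target, imports, seen=None):
    """True if `start` imports `target`, directly or transitively.

    `imports` maps a module name to the modules it imports. `seen` guards
    against import cycles (ASSUMPTION: comparable to DMD's "insearch").
    """
    seen = set() if seen is None else seen
    if start in seen:
        return False
    seen.add(start)
    for dep in imports.get(start, ()):
        if dep == target or imports_module(dep, target, imports, seen):
            return True
    return False


def needs_codegen(instantiating_module, root_modules, imports):
    """Decide whether a template instance is emitted into the current build.

    Emitted if the instantiating module is a root module, or if it is a
    non-root module that itself imports one of the roots (the cycle case).
    """
    if instantiating_module in root_modules:
        return True
    return any(
        imports_module(instantiating_module, root, imports)
        for root in root_modules
    )
```

For example, with a made-up graph `{"std.stdio": ["std.bitmanip"], "std.bitmanip": []}` and `std.stdio` as the only root, an instance whose instantiating module is `std.bitmanip` is skipped. With a cycle `{"a": ["b"], "b": ["a"]}`, the instance is emitted both when building `a` and when building `b`, matching what was observed in practice.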
Jun 10 2014
On 10.06.2014 14:09, David Nadlinger via digitalmars-d-ldc wrote:
>> Okay. Aside: how does it deal with cycles? Wouldn't no instance be emitted if two modules both instantiate the same function and include each other? (in practice both were emitted for me)
>
> If ti->instantiatingModule (the module you would like to pull the symbol in from) itself also imports at least one of the root modules, then importsRoot will be true, and the symbol will still be defined. Exactly the same situation will occur (with m/mi swapped) when building what currently is ti->instantiatingModule, so you end up emitting that template into both modules, as you observed.

Oh, right! Thanks for clearing that up for me.

Cheers,
Christian
Jun 10 2014
On Monday, 9 June 2014 at 16:28:11 UTC, Christian Kamm wrote:
> On 09.06.2014 13:09, David Nadlinger via digitalmars-d-ldc wrote:
>
> Yes, I saw. I just think the comment is misleading: it says "Skip generating code if this part of a TemplateInstance that is instantiated only by non-root modules" but actually it seems to skip instances that have any non-root module instantiating them. I'll make a pull request to fix it.

That's a nice summary of what the code does. A pull request would be great!

Regards,
Kai
Jun 10 2014
> Thinking about this a bit, it seems very plausible that the compiler actually works as intended here.

For what it's worth, compiling stdio.d with dmd 2.064 also does not emit object.capacity(). So if dmd's phobos was built the same way as ldc's, I expect it'd have the same issue.

It seems like we could imitate what dmd -lib does or use --gc-sections like David suggested to fix this for real.

Could we, as a workaround, reorder the object files in the phobos library to make std.net.curl come last? Does the static linker work that way? It's annoying that this blocks a new ldc version from being released.

Regards,
Christian
Jun 14 2014
On 14.06.2014 14:18, Christian Kamm wrote:
>> Thinking about this a bit, it seems very plausible that the compiler actually works as intended here.
>
> For what it's worth, compiling stdio.d with dmd 2.064 also does not emit object.capacity(). So if dmd's phobos was built the same way as ldc's, I expect it'd have the same issue.
>
> It seems like we could imitate what dmd -lib does or use --gc-sections like David suggested to fix this for real.
>
> Could we, as a workaround, reorder the object files in the phobos library to make std.net.curl come last? Does the static linker work that way? It's annoying that this blocks a new ldc version from being released.

Maybe removing the call to ranlib or using ranlib -D could help? The order in the archive looks fine (string.o before curl.o).
Jun 14 2014
On Sat, Jun 14, 2014 at 2:18 PM, Christian Kamm via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> wrote:
> It's annoying that this blocks a new ldc version from being released.

At this point, I think _any_ fix for the issue would be fine. We really need to get merge-2.064 and merge-2.065 out there.

Unfortunately, I still can't reproduce the issue. The Travis CI docs say that they run Ubuntu 12.04 LTS, but I couldn't get the linker error to appear on an EC2 instance I set up from the Canonical AMI.

Best,
David
Jun 14 2014
On 14.06.2014 20:28, David Nadlinger via digitalmars-d-ldc wrote:
> Unfortunately, I still can't reproduce the issue. The Travis CI docs say that they run Ubuntu 12.04 LTS, but I couldn't get the linker error to appear on a EC2 instance I set up from the Canonical AMI.

Is it possible to log into the Travis instances? If it is, looking at the output and playing around with the static linker flags could help find a workaround.

I'd be interested in nm -s on that libphobos2.a as well as the order of files in the archive and the symbols in the object files.

I don't think ranlib -D would change anything - it seems to only force some meta information to 0, not change the lookup order.

Cheers,
Christian
Jun 14 2014
On Sat, Jun 14, 2014 at 8:45 PM, Christian Kamm via digitalmars-d-ldc <digitalmars-d-ldc puremagic.com> wrote:
> I'd be interested in nm -s on that libphobos2.a as well as the order of files in the archive and the symbols in the object files.

You can get at the output by simply submitting a pull request that adds the "nm -s" command to .travis.yml. Directly logging in is not possible, as far as I'm aware.

David
Jun 14 2014
On Saturday, 14 June 2014 at 12:18:54 UTC, Christian Kamm wrote:
>> Thinking about this a bit, it seems very plausible that the compiler actually works as intended here.
>
> For what it's worth, compiling stdio.d with dmd 2.064 also does not emit object.capacity(). So if dmd's phobos was built the same way as ldc's, I expect it'd have the same issue.

Yes, but the std.array.Appender() ctor which calls object.capacity() is also not emitted by dmd 2.064. This ctor is called nowhere, so it looks like we are emitting too much code. The ctor is:

    pure nothrow ref @safe std.array.Appender!(immutable(char)[]).Appender std.array.Appender!(immutable(char)[]).Appender.__ctor(char[])

Regards,
Kai
Jun 15 2014