
digitalmars.D.learn - dmd memory usage/static lib/algorithm bug?

reply Marek Janukowicz <marek janukowicz.net> writes:
This is really a cross-domain issue, but I didn't feel like splitting it 
into separate posts would make sense.

I use DMD 2.063.2 on Linux 64-bit.

I have some code in my (non-trivial) application that basically corresponds 
to this:

import std.stdio, std.algorithm, std.array;

void main () {
  struct Val {
    int i;
  }
  Val[] arr;
  arr ~= Val( 3 );
  arr ~= Val( 1 );
  arr ~= Val( 2 );
  auto sorter = (Val a, Val b) { return a.i < b.i; };
  writefln( "sorted: %s", arr.sort!(sorter));
}

While this simple example works, I'm getting segfaults with corresponding 
code in this bigger project. Those segfaults can be traced down to 
algorithm.d line 8315 or another line (8358?) that use this "sorter" lambda 
I passed to "sort!" - suggesting it is a bad memory reference.

I tried to create a simple test case that would fail similarly, but to no 
avail. I can't make the code for my whole project available, so let's just 
say it's either some bug in DMD or something caused by my limited knowledge 
of D.

Now the funny things begin: I copied algorithm.d to my project in an attempt 
to make some modifications to it (hopefully to fix the problem or at least 
get some insight into its nature), but things miraculously started working! 
This leads me to the suspicion there is something wrong with libphobos2.a 
file provided with DMD tarball.

Next problem in the line is that compilation of my project with algorithm.d 
included takes almost 4GB of RAM. While I'm aware of the fact DMD 
deliberately doesn't free the memory for performance purposes, this makes 
the compilation fail due to insufficient memory on machines with 4GB RAM 
(and some taken).

So my questions are:
- how can I limit DMD memory usage?
- how can I include a static library with my project? I can compile 
algorithm.d to a static lib, but how do I include this one explicitly with 
my project while the rest of Phobos should be taken from the stock 
libphobos2.a ?
- any other ideas how to solve my problems on any level?

-- 
Marek Janukowicz
Aug 28 2013
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Aug 28, 2013 at 05:29:40PM +0200, Marek Janukowicz wrote:
 This is really a cross-domain issue, but I didn't feel like splitting
 it into separate posts would make sense.
 
 I use DMD 2.063.2 on Linux 64-bit.
 
 I have some code in my (non-trivial) application that basically
 corresponds to this:
 
 import std.stdio, std.algorithm, std.array;
 
 void main () {
   struct Val {
     int i;
   }
   Val[] arr;
   arr ~= Val( 3 );
   arr ~= Val( 1 );
   arr ~= Val( 2 );
   auto sorter = (Val a, Val b) { return a.i < b.i; };
   writefln( "sorted: %s", arr.sort!(sorter));
 }
 
 While this simple example works, I'm getting segfaults with
 corresponding code in this bigger project. Those segfaults can be
 traced down to algorithm.d line 8315 or another line (8358?) that use
 this "sorter" lambda I passed to "sort!" - suggesting it is a bad
 memory reference.
Possible causes that I can think of are:

1) You have a struct somewhere and your lambda closes over it (or one of its members), but later on you return this struct to another scope and invoke the lambda. But since structs can get moved around when passed between different functions, the lambda's context pointer is invalid and so it crashes. On simple programs this problem may be hidden because you don't use the stack as much, so the old copy of the struct may still be intact even though that part of the stack is technically no longer valid, so the lambda may still *appear* to work.

2) There is a compiler bug that generates wrong code for a lambda. There have been some such bugs before where it fails to recognize a lambda, or fails to notice that a local variable is closed over, so it doesn't move the local variable to the heap but leaves it on the stack, where it gets invalidated afterwards, causing the lambda to crash.
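
To make cause 1) concrete, here is a minimal sketch of a delegate whose context does not follow the struct it was created from. It is not taken from the project in question; `Counter`, `setup` and `readThroughCopy` are hypothetical names made up for illustration. It uses a plain copy rather than a return from a function, so the original instance is still alive and the behaviour is well defined; in the scenario described above the original would be gone and the same read would go through a dangling pointer.

```d
import std.stdio;

struct Counter
{
    int value;
    int delegate() getter;

    void setup()
    {
        // The lambda reads `value` through `this`, so the delegate's context
        // refers to the instance that setup() was called on.
        getter = () => value;
    }
}

// Hypothetical helper: shows which instance a copied delegate reads from.
int readThroughCopy()
{
    Counter a;
    a.value = 1;
    a.setup();

    Counter b = a;  // copies the delegate, context pointer included
    b.value = 2;

    return b.getter();  // still reads a.value, not b.value
}

void main()
{
    writeln(readThroughCopy());
}
```

Here `b.getter()` yields the value stored in `a`, because copying the struct does not rebind the delegate to the new instance.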
 I tried to create a simple test case that would fail similarly, but to
 no avail. I can't make the code for my whole project available, so
 let's just say it's either some bug in DMD or something caused by my
 limited knowledge of D.
If you have a reliable way of reproducing the problem and can encapsulate it into a shell script / batch file, you can use Vladimir Panteleev's DustMite to automatically reduce your code to the minimum for reproducing the bug. See: https://github.com/D-Programming-Language/tools/tree/master/DustMite
 Now the funny things begin: I copied algorithm.d to my project in an
 attempt to make some modifications to it (hopefully to fix the problem
 or at least get some insight into its nature), but things miraculously
 started working!  This leads me to the suspicion there is something
 wrong with libphobos2.a file provided with DMD tarball.
This is another possibility. :) Did you check whether DMD is linking the correct version of libphobos2.a into your program? Sometimes strange things can happen when you have stale copies of older versions of libphobos2.a lying around your system, and DMD accidentally picks those up instead of the correct version.

But it could also be that this is merely masking the problem. Invalid pointers are notorious for causing heisenbugs that appear/disappear when you move unrelated code around.
 Next problem in the line is that compilation of my project with
 algorithm.d included takes almost 4GB of RAM. While I'm aware of the
 fact DMD deliberately doesn't free the memory for performance
 purposes, this makes the compilation fail due to insufficient memory
 on machines with 4GB RAM (and some taken).
You could try splitting it up. :) Well, we're planning to split it up at some point, now that DMD supports package.d. Here's roughly how you might do it:

- Temporarily rename std/algorithm.d into another file.
- Create a directory called std/algorithm/
- Create the file std/algorithm/package.d containing something like this:

    module std.algorithm;
    public import std.algorithm.search;
    public import std.algorithm.sort;
    public import std.algorithm.set;
    public import std.algorithm.mutation;
    ...

- Split up algorithm.d into the above parts (std/algorithm/search.d, std/algorithm/sort.d, ... etc.).

You probably don't really need to follow exactly the above division; any partitioning of std.algorithm into mutually-independent parts will do. Probably splitting into just two parts will cut down memory usage enough to make it compilable on your system.

Or, if this is too much work for your purposes, you could make a copy of std.algorithm, rename it to my.algorithm, say, update all your imports accordingly, and then just edit the file and delete the parts you don't use. For example, if you don't use any of the set functions (merge, cartesianProduct, etc.), just delete them from the file along with all their unittests. This should reduce the amount of memory needed to compile it.
 So my questions are:
 - how can I limit DMD memory usage?
I'll let others answer, since I'm not that familiar with DMD source code myself.
 - how can I include a static library with my project? I can compile
 algorithm.d to a static lib, but how do I include this one explicitly
 with my project while the rest of Phobos should be taken from the
 stock libphobos2.a ?
This could be tricky. One way to do it is to rename std.algorithm to my.algorithm (as mentioned above), compile that into libmyalgo.a, and then do something like:

    dmd program.d ... -ofprogram -L-lmyalgo -L-L.

Hope this helps.

T

-- 
Just because you survived after you did it, doesn't mean it wasn't stupid!
Aug 28 2013
prev sibling parent reply Marek Janukowicz <marek janukowicz.net> writes:
I was finally able to create a simple test case that probably reproduces the 
bug (probably, because the stack trace is completely different, but the code 
that is there is similar). This requires 2 source code files:

main.d:

module main;

// This line must be there: importing any module from std is necessary to
// reproduce the bug
import std.stdio;

import sorter;

void main () {
} 

-------------------

sorter.d:

module sorter;

import std.algorithm;

  struct Val {
    int i;
  }

unittest {
  Val [] arr;
  arr ~= Val( 2 );
  arr ~= Val( 1 );

  // This works
  arr.sort!((Val a, Val b) { return a.i < b.i; });

  // This segfaults when sorting
  auto dg = (Val a, Val b) { return a.i < b.i; };
  arr.sort!(dg);

}

------------------

Run with: 

dmd -unittest main.d sorter.d && ./main

For me this results in a segfault. Changing one of many seemingly unrelated 
details (eg. moving offending code directly to main, commenting out 
std.stdio import in main.d) makes the problem disappear.

Can anyone try to reproduce that? Again, I'm on DMD 2.063.2.

H. S. Teoh - thanks for your detailed description, but this test case 
probably sheds some more light and invalidates some of your hypotheses. As 
for building static library - I thought it would be easier, so if the bug 
remains unresolved I'll probably just rebuild the whole phobos. The problem 
is definitely not some old version of libphobos2.a stuck around, because the 
problem could be reproduced exactly the same way on 3 machines I tried.

-- 
Marek Janukowicz
Aug 28 2013
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Aug 28, 2013 at 11:02:17PM +0200, Marek Janukowicz wrote:
 I was finally able to create simple test case that probably reproduces
 the bug (probably, because the stack trace is completely different,
 but the code that is there is similar). This requires 2 source code
 files:
[...]
 Run with: 
 
 dmd -unittest main.d sorter.d && ./main
 
 For me this results in a segfault. Changing one of many seemingly
 unrelated details (eg. moving offending code directly to main,
 commenting out std.stdio import in main.d) makes the problem
 disappear.
 
 Can anyone try to reproduce that? Again, I'm on DMD 2.063.2.
It doesn't seem to happen on git HEAD. I'm going to try 2.063.2 and see what happens. Oh, and BTW, are you on Linux 32-bit or 64-bit? Don't know if that makes a difference, but just in case.
 H. S. Teoh - thanks for your detailed description, but this test case
 probably sheds some more light and invalidates some of your
 hypotheses. As for building static library - I thought it would be
 easier, so if the bug remains unresolved I'll probably just rebuild
 the whole phobos. The problem is definitely not some old version of
 libphobos2.a stuck around, because the problem could be reproduced
 exactly the same way on 3 machines I tried.
[...]
I'll give it a try on 2.063.2 to see if I can reproduce the problem.

T

-- 
Ruby is essentially Perl minus Wall.
Aug 28 2013
parent reply Marek Janukowicz <marek janukowicz.net> writes:
H. S. Teoh wrote:

 On Wed, Aug 28, 2013 at 11:02:17PM +0200, Marek Janukowicz wrote:
 I was finally able to create simple test case that probably reproduces
 the bug (probably, because the stack trace is completely different,
 but the code that is there is similar). This requires 2 source code
 files:
[...]
 Run with:
 
 dmd -unittest main.d sorter.d && ./main
 
 For me this results in a segfault. Changing one of many seemingly
 unrelated details (eg. moving offending code directly to main,
 commenting out std.stdio import in main.d) makes the problem
 disappear.
 
 Can anyone try to reproduce that? Again, I'm on DMD 2.063.2.
It doesn't seem to happen on git HEAD. I'm going to try 2.063.2 and see what happens. Oh, and BTW, are you on Linux 32-bit or 64-bit? Don't know if that makes a difference, but just in case.
64-bit

-- 
Marek Janukowicz
Aug 28 2013
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Aug 29, 2013 at 12:45:05AM +0200, Marek Janukowicz wrote:
 H. S. Teoh wrote:
[...]
 Oh, and BTW, are you on Linux 32-bit or 64-bit? Don't know if that
 makes a difference, but just in case.
64-bit
[...]
Maybe try compiling with -m32 and see if it makes a difference? If so, it may be a 64-bit related dmd bug. I'm also having trouble building a working compiler toolchain with a purely 64-bit environment.

T

-- 
Fact is stranger than fiction.
Aug 28 2013
parent reply Marek Janukowicz <marek janukowicz.net> writes:
H. S. Teoh wrote:

 On Thu, Aug 29, 2013 at 12:45:05AM +0200, Marek Janukowicz wrote:
 H. S. Teoh wrote:
[...]
 Oh, and BTW, are you on Linux 32-bit or 64-bit? Don't know if that
 makes a difference, but just in case.
64-bit
[...] Maybe try compiling with -m32 and see if it makes a difference? If so, it may be a 64-bit related dmd bug. I'm also having trouble building a working compiler toolchain with a purely 64-bit environment.
Yeah, it makes a difference :)

./main(_D4core7runtime18runModuleUnitTestsUZb19unittestSegvHandlerUiPS4core3sys5posix6signal9siginfo_tPvZv+0x2c) [0x80a8c64]
linux-gate.so.1(__kernel_rt_sigreturn+0x0)[0xffffe410]
./main(_D6sorter14__unittestL9_3FZv101__T13quickSortImplS65_D6sorter14__unittestL9_3FZv2dgPFNaNbNfS6sorter3ValS6sorter3ValZbTAS6sorter3ValZ13quickSortImplMFAS6sorter3ValZv+0x1a7) [0x80a1b53]
./main(_D6sorter14__unittestL9_3FZv122__T4sortS65_D6sorter14__unittestL9_3FZv2dgPFNaNbNfS6sorter3ValS6sorter3ValZbVE3std9algorithm12SwapStrategy0TAS6sorter3ValZ4sortMFAS6sorter3ValZS6sorter14__unittestL9_3FZv99__T11SortedRangeTAS6sorter3ValS65_D6sorter14__unittestL9_3FZv2dgPFNaNbNfS6sorter3ValS6sorter3ValZbZ11SortedRange+0x17) [0x80a20cb]
./main(_D6sorter14__unittestL9_3FZv+0x6d)[0x80a2079]
./main(_D6sorter9__modtestFZv+0x8)[0x80a21ac]
./main(_D4core7runtime18runModuleUnitTestsUZb16__foreachbody352MFKPS6object10ModuleInfoZi+0x24) [0x80a8ccc]
./main(_D2rt5minfo17moduleinfos_applyFMDFKPS6object10ModuleInfoZiZi16__foreachbody541MFKS2rt14sections_linux3DSOZi+0x37) [0x80a5f47]
./main(_D2rt14sections_linux3DSO7opApplyFMDFKS2rt14sections_linux3DSOZiZi+0x2c) [0x80a619c]
./main(_D2rt5minfo17moduleinfos_applyFMDFKPS6object10ModuleInfoZiZi+0x14) [0x80a5ef4]
./main(runModuleUnitTests+0x87)[0x80a8bd7]
./main(_D2rt6dmain211_d_run_mainUiPPaPUAAaZiZi6runAllMFZv+0x25)[0x80a4a55]
./main(_D2rt6dmain211_d_run_mainUiPPaPUAAaZiZi7tryExecMFMDFZvZv+0x18) [0x80a46c0]
./main(_d_run_main+0x121)[0x80a4691]
./main(main+0x14)[0x80a4564]
/lib32/libc.so.6(__libc_start_main+0xf3)[0xf74d4943]
Segmentation fault (core dumped)

This stacktrace did not show in 64-bit version, but the problem persists.

-- 
Marek Janukowicz
Aug 28 2013
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Aug 29, 2013 at 01:46:03AM +0200, Marek Janukowicz wrote:
 H. S. Teoh wrote:
 
 On Thu, Aug 29, 2013 at 12:45:05AM +0200, Marek Janukowicz wrote:
 H. S. Teoh wrote:
[...]
 Oh, and BTW, are you on Linux 32-bit or 64-bit? Don't know if
 that makes a difference, but just in case.
64-bit
[...] Maybe try compiling with -m32 and see if it makes a difference? If so, it may be a 64-bit related dmd bug. I'm also having trouble building a working compiler toolchain with a purely 64-bit environment.
Yeah, it makes a difference :)
[...]
 This stacktrace did not show in 64-bit version, but the problem
 persists.
[...]
OK, my trouble with compiling 2.063.2 in 64-bit was actually my own fault -- I had a faulty dmd.conf -- so actually that has nothing to do with your problem.

Anyway, I don't think it's a problem with libphobos2.a, because tracing through your failing test case, I see that it's all template functions instantiated from std.algorithm, and no static functions from the library are called at the point of failure. The template code in Phobos also looks kosher, so right now I'm suspecting a DMD codegen bug. I'm going to investigate the disassembly now to figure out what's going on.

The good news is that the upcoming dmd 2.064 doesn't appear to have this problem: the part of std.algorithm that concerns your code doesn't appear to have been touched since 2.063.2, so the problem is unlikely to be there. So whatever dmd bug is causing the problem has been fixed in git HEAD.

The bad news is that dmd 2.064 has some changes that make it incompatible with 2.063.2 druntime/phobos, so you won't be able to use it unless you build the entire dmd/druntime/phobos toolchain from git.

T

-- 
"No, John. I want formats that are actually useful, rather than over-featured megaliths that address all questions by piling on ridiculous internal links in forms which are hideously over-complex." -- Simon St. Laurent on xml-dev
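
Until a fixed compiler is available, one thing that might be worth trying is to avoid passing a delegate variable as the predicate. The test case earlier in the thread reports that the literal form works on 2.063.2; the sketch below shows that form alongside a module-level comparator, which has no context pointer at all. `byI` and `sortedCopy` are hypothetical names, and whether the named-function form actually sidesteps the bug on 2.063.2 is an assumption that has not been verified here.

```d
import std.algorithm : sort;
import std.stdio;

struct Val
{
    int i;
}

// Hypothetical module-level comparator: a plain function, no closure context.
bool byI(Val a, Val b)
{
    return a.i < b.i;
}

// Hypothetical helper: sorts a copy of the input using the named comparator.
Val[] sortedCopy(Val[] input)
{
    auto arr = input.dup;
    arr.sort!byI();
    return arr;
}

void main()
{
    auto arr = [Val(3), Val(1), Val(2)];
    writeln(sortedCopy(arr));

    // The literal form from the test case, reported to work on 2.063.2.
    arr.sort!((Val a, Val b) { return a.i < b.i; })();
    writeln(arr);
}
```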
Aug 29 2013
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Aug 28, 2013 at 02:10:34PM -0700, H. S. Teoh wrote:
 On Wed, Aug 28, 2013 at 11:02:17PM +0200, Marek Janukowicz wrote:
 I was finally able to create simple test case that probably
 reproduces the bug (probably, because the stack trace is completely
 different, but the code that is there is similar). This requires 2
 source code files:
[...]
 Run with: 
 
 dmd -unittest main.d sorter.d && ./main
 
 For me this results in a segfault. Changing one of many seemingly
 unrelated details (eg. moving offending code directly to main,
 commenting out std.stdio import in main.d) makes the problem
 disappear.
 
 Can anyone try to reproduce that? Again, I'm on DMD 2.063.2.
It doesn't seem to happen on git HEAD. I'm going to try 2.063.2 and see what happens.
[...]
Update: I've reproduced this problem on 2.063.2, but while trying to track down the problem, I discovered that building 2.063.2 from source doesn't produce a working toolchain. :-( I've spent an hour trying to figure out what's wrong with 2.063.2, but with no success.

So, you *could* be right that something may be wrong with libphobos2.a. I'll try to download the tarball from dlang.org again and see if that helps, or if I've messed up my system somehow. :-P

T

-- 
There are two ways to write error-free programs; only the third one works.
Aug 28 2013