www.digitalmars.com

digitalmars.D.learn - Tricky DMD bug, but I have no idea how to report

JN <666total wp.pl> writes:
Hey guys,

while working on my game engine project, I encountered a DMD 
codegen bug. It occurs only when compiling in release mode, debug 
works. Unfortunately I am unable to minimize the code, since it's 
quite a bit of code, and changing the code changes the bug 
occurrence. Basically, my faulty piece of code looks like this:

import std.stdio;

class Texture2D {}

void main()
{
    auto a = new Texture2D();
    auto b = new Texture2D();
    auto c = new Texture2D();
    Texture2D[int] textureBindings;
    writeln(a, b, c);
    textureBindings[0] = a;
    textureBindings[1] = b;
    textureBindings[2] = c;
    writeln(textureBindings);
}

and the output is:

Texture2DTexture2DTexture2D
[0:null, 2:null, 1:null]

I'd expect it to output:

Texture2DTexture2DTexture2D
[0:Texture2D, 2:Texture2D, 1:Texture2D]

depending on what I change around this code, for example changing 
it to

writeln(a, " ", b, " ", c);

results in output of:

Texture2D Texture2D Texture2D
[0:Texture2D, 2:null, 1:null]

It feels completely random. Removing or adding calls completely 
unrelated to these changes the result. My guess is that the 
compiler somehow reorders the calls incorrectly, changing the 
semantics. Trick is, LDC works correctly and produces the 
expected result, both when compiling in debug and release mode.

I tried to play around with assoc arrays on run.dlang.io but 
could never reproduce it. It must have something to do with the way my 
code works and possibly interacts with other C libraries. Does 
anyone have an idea what it could be and how to reproduce it so 
that it can be reported and fixed? For now, I'll just switch to 
LDC, but I feel bad leaving a possible bug intact and unreported.

This is with DMD32 D Compiler v2.083.1, on Windows, x86_64 
compilation target.
Dec 17 2018
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Dec 17, 2018 at 09:59:59PM +0000, JN via Digitalmars-d-learn wrote:
[...]
 class Texture2D {}
 
 auto a = new Texture2D();
 auto b = new Texture2D();
 auto c = new Texture2D();
 Texture2D[int] textureBindings;
 writeln(a, b, c);
 textureBindings[0] = a;
 textureBindings[1] = b;
 textureBindings[2] = c;
 writeln(textureBindings);
 
 and the output is:
 
 Texture2DTexture2DTexture2D
 [0:null, 2:null, 1:null]
 
 I'd expect it to output:
 
 Texture2DTexture2DTexture2D
 [0:Texture2D, 2:Texture2D, 1:Texture2D]
 
 depending on what I change around this code, for example changing it to
 
 writeln(a, " ", b, " ", c);
 
 results in output of:
 
 Texture2D Texture2D Texture2D
 [0:Texture2D, 2:null, 1:null]
Ah, a pointer bug. Lovely. :-/ My first guess is that you have a bunch of references to local variables that have gone out of scope.
 It feels completely random. Removing, adding calls completely
 unrelated to these changes the result.
Typical symptoms of a pointer bug of some kind. Could be an uninitialized pointer, if you have used `T* p = void;` anywhere.
 My guess is that the compiler somehow reorders the calls incorrectly,
 changing the semantics.
Possible, but unlikely. My bet is that you have dangling pointers, most likely to local variables that have gone out of scope. Perhaps somewhere in the code you ran into the evil implicit conversion of static arrays into slices, which results in dangling pointers if said slice persists beyond the lifetime of the static array. Another likely candidate is that if you're calling C/C++ libraries somewhere in your code, you may have passed in a wrong size, perhaps a byte count where an array length ought to be used, or vice versa, and as a result you got a buffer overrun. I ran into similar bugs when writing OpenGL code.
 Trick is, LDC works correctly and produces the expected result, both
 when compiling in debug and release mode.
[...]

I bet the bug is still there, just latent because of the slightly different memory layout when compiling with LDC. You probably want to be absolutely sure it's a compiler bug before moving on, as it could very well be a bug in your code.

A less likely possibility might be an optimizer bug -- do you get different results if you add / remove '-O' (and/or '-inline') from your dmd command-line? If some combination of -O and -inline (or their removal thereof) "fixes" the problem, it could be an optimizer bug. But those are rare, and usually only show up when you use an obscure D feature combined with another obscure corner case, in a way that people haven't thought of. My bet is still on a pointer bug somewhere in your code.


T

-- 
If the comments and the code disagree, it's likely that *both* are wrong. -- Christopher
Dec 17 2018
JN <666total wp.pl> writes:
On Monday, 17 December 2018 at 22:22:05 UTC, H. S. Teoh wrote:
 A less likely possibility might be an optimizer bug -- do you 
 get different results if you add / remove '-O' (and/or 
 '-inline') from your dmd command-line?  If some combination of 
 -O and -inline (or their removal thereof) "fixes" the problem, 
 it could be an optimizer bug. But those are rare, and usually 
 only show up when you use an obscure D feature combined with 
 another obscure corner case, in a way that people haven't 
 thought of.  My bet is still on a pointer bug somewhere in your 
 code.
I played around with the dmd command line. It works with -O. It works with -O -inline. As soon as I add -boundscheck=off, it breaks. As I understand it, out-of-bounds access is UB, which would fit my problems because they look like UB. But if I run without -boundscheck=off, shouldn't I get a RangeError somewhere?
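For reference, here is a minimal sketch of which accesses D's bounds checks actually guard (the function names are hypothetical, for illustration only). Indexing a slice throws a RangeError while checks are enabled, but indexing through `.ptr`, or handing a wrong length to a C function, is never checked, so a run without RangeError does not by itself rule out an overrun:

```d
import core.exception : RangeError;
import std.exception : assertThrown;

// @safe code keeps its bounds checks under -release (which implies
// -boundscheck=safeonly); only -boundscheck=off removes them here too.
@safe int checkedRead(const(int)[] data, size_t i)
{
    return data[i];
}

// Indexing through the raw pointer is never bounds-checked, in any build mode.
// An out-of-range i here is undefined behaviour, not a RangeError.
int uncheckedRead(const(int)[] data, size_t i)
{
    return data.ptr[i];
}

void main()
{
    int[] data = [10, 20, 30];
    assert(checkedRead(data, 2) == 30);
    assert(uncheckedRead(data, 2) == 30);

    // Throws only while bounds checks are enabled (not with -boundscheck=off).
    assertThrown!RangeError(checkedRead(data, 3));
}
```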
Dec 18 2018
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Dec 18, 2018 at 10:29:07PM +0000, JN via Digitalmars-d-learn wrote:
 On Monday, 17 December 2018 at 22:22:05 UTC, H. S. Teoh wrote:
 A less likely possibility might be an optimizer bug -- do you get
 different results if you add / remove '-O' (and/or '-inline') from
 your dmd command-line?  If some combination of -O and -inline (or
 their removal thereof) "fixes" the problem, it could be an optimizer
 bug. But those are rare, and usually only show up when you use an
 obscure D feature combined with another obscure corner case, in a
 way that people haven't thought of.  My bet is still on a pointer
 bug somewhere in your code.
 
 I played around with dmd commandline. It works with -O. Works with -O -inline. As soon as I add -boundscheck=off it breaks. As I understand it, out of bounds access is UB. Which would fit my problems because they look like UB. But if I run without boundscheck=off, shouldn't I get a RangeError somewhere?
In theory, yes. But I wonder if there's some corner case where some combination of -O or -inline may cause a bounds check to be elided, but still hit UB. Perhaps the optimizer skipped a bounds check even though it shouldn't have. What about compiling with -boundscheck=off but without -O -inline? Does that make a difference?

Barring that, it might be one of those really evil pointer bugs where the problem has already happened far away from the site where the symptoms first appear, usually an undetected memory corruption that only shows up as invalid data long after the actual corruption happened. Very hard to trace.

Are you sure you didn't accidentally do something like escape a pointer to a local variable, or a slice of a local static array that has since gone out of scope? Because that's what your symptoms most closely resemble. The last time I ran into this in my own D code, it was caused by D's really evil implicit conversion of static arrays to slices, where passing a local static array implicitly passes a slice instead, e.g.:

    SomeObject persistentStorage;
    auto someFunc(int[] data) {
        ... // stuff
        persistentStorage.insert(data); // retains reference to data
        ...
    }
    void buggyCode() {
        int[16] arr = ...;
        ...
        someFunc(arr); // <--- implicit conversion happens here
        ...
        // uh oh, arr is going out of scope, but
        // persistentStorage holds a reference to it
    }
    void main() {
        ...
        buggyCode(); // escaped reference to local variable
        ...
        // Crash when it tries to access the slice to
        // out-of-scope data:
        doSomething(persistentStorage);
        ...
    }

Since no explicit slicing was done, there was no compiler error / warning of any sort, and it wasn't obvious from the code what had happened. By the time doSomething() was called, it was already long past the source of the problem in buggyCode(), and it was almost impossible to trace the problem back to its source.
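A self-contained sketch of the same trap, assuming a module-level slice stands in for `persistentStorage` (all names here are hypothetical). The `int[4]` argument silently becomes an `int[]` pointing into the caller's stack frame; copying with `.dup` before storing is one way to keep the data valid after the frame is gone:

```d
import std.stdio;

// Hypothetical stand-in for storage that outlives any single stack frame.
int[] persistentStorage;

// Keeps a reference to whatever memory `data` points into.
void keepReference(int[] data)
{
    persistentStorage = data;
}

// Copies the elements onto the GC heap before keeping them.
void keepCopy(int[] data)
{
    persistentStorage = data.dup;
}

void buggyCode()
{
    int[4] arr = [1, 2, 3, 4]; // lives in buggyCode's stack frame
    keepReference(arr);        // implicit int[4] -> int[] conversion, no warning
}   // persistentStorage now dangles; reading it later is undefined behaviour

void fixedCode()
{
    int[4] arr = [1, 2, 3, 4];
    keepCopy(arr[]);           // explicit slice makes the conversion visible
}

void main()
{
    fixedCode();
    writeln(persistentStorage); // [1, 2, 3, 4]
    // Calling buggyCode() instead would leave persistentStorage pointing
    // at a dead stack frame.
}
```

Copying trades an allocation for safety; whether that is acceptable depends on how hot the code path is.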
Theoretically, -dip25 and -dip1000 are supposed to prevent this sort of problem, but I don't know how fully-implemented they are, whether they would catch the specific instance in your code, or whether your code even compiles with these options.


T

-- 
There's light at the end of the tunnel. It's the oncoming train.
Dec 18 2018
JN <666total wp.pl> writes:
On Tuesday, 18 December 2018 at 22:56:19 UTC, H. S. Teoh wrote:
 Since no explicit slicing was done, there was no compiler error 
 / warning of any sort, and it wasn't obvious from the code what 
 had happened. By the time doSomething() was called, it was 
 already long past the source of the problem in buggyCode(), and 
 it was almost impossible to trace the problem back to its 
 source.

 Theoretically, -dip25 and -dip1000 are supposed to prevent this 
 sort of problem, but I don't know how fully-implemented they 
 are, whether they would catch the specific instance in your 
 code, or whether your code even compiles with these options.


 T
No luck. Actually, I avoid pointers in my code in general; I write my code very "Java-like", with objects everywhere etc. I gave up on the issue, actually. Perhaps I am encountering this bug https://issues.dlang.org/show_bug.cgi?id=16511 in my own code. Anyway, 32-bit and 64-bit debug work, and so does LDC. That's good enough for me.
Feb 06 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Feb 06, 2019 at 09:50:44PM +0000, JN via Digitalmars-d-learn wrote:
 On Tuesday, 18 December 2018 at 22:56:19 UTC, H. S. Teoh wrote:
 Since no explicit slicing was done, there was no compiler error /
 warning of any sort, and it wasn't obvious from the code what had
 happened. By the time doSomething() was called, it was already long
 past the source of the problem in buggyCode(), and it was almost
 impossible to trace the problem back to its source.
 
 Theoretically, -dip25 and -dip1000 are supposed to prevent this sort
 of problem, but I don't know how fully-implemented they are, whether
 they would catch the specific instance in your code, or whether your
 code even compiles with these options.
[...]
 No luck. Actually, I avoid in my code pointers in general, I write my
 code very "Java-like" with objects everywhere etc.
[...]

The nasty thing about the implicit static array -> slice conversion is that your code can have no bare pointers in sight, yet you still end up with an invalid reference to an out-of-scope local variable. Some of us have argued that this conversion ought to be prohibited. But we haven't actually tried going in that direction yet, because it *will* break existing code (though IMO such code is suspect to begin with, and besides, all you have to do is to explicitly slice the static array to get around the newly-introduced compile error).

Of course, I've no clue whether this is the cause of your problems -- it's just one of many possibilities. Pointer bugs are nasty things to debug, regardless of whether or not they've been abstracted away in nicer clothing. I still remember pointer bugs that took literally months just to get a clue on, because it was nigh impossible to track down where they happened -- the symptoms are too far removed from the cause. You pretty much have to take a wild guess and get lucky.

They are just as bad as race condition bugs. (Once, a race condition bug took me almost half a year to fix, because it only showed up in the customer's live environment and we could never reproduce it locally. We knew there was a race somewhere, but it was impossible to locate it. Eventually, by pure accident, an unrelated code change subtly altered the timings of certain things that made the bug more likely to manifest under certain conditions -- and only then were we finally able to reliably reproduce the problem and track down its root cause.)


T

-- 
"I suspect the best way to deal with procrastination is to put off the procrastination itself until later. I've been meaning to try this, but haven't gotten around to it yet." -- swr
Feb 06 2019
JN <666total wp.pl> writes:
On Wednesday, 6 February 2019 at 22:22:26 UTC, H. S. Teoh wrote:
 Of course, I've no clue whether this is the cause of your 
 problems -- it's just one of many possibilities.  Pointer bugs 
 are nasty things to debug, regardless of whether or not they've 
 been abstracted away in nicer clothing.  I still remember 
 pointer bugs that took literally months just to get a clue on, 
 because it was nigh impossible to track down where they 
 happened -- the symptoms are too far removed from the cause.  
 You pretty much have to take a wild guess and get lucky.

 They are just as bad as race condition bugs. (Once, a race 
 condition bug took me almost half a year to fix, because it 
 only showed up in the customer's live environment and we could 
 never reproduce it locally. We knew there was a race somewhere, 
 but it was impossible to locate it. Eventually, by pure 
 accident, an unrelated code change subtly altered the timings 
 of certain things that made the bug more likely to manifest 
 under certain conditions -- and only then were we finally able 
 to reliably reproduce the problem and track down its root 
 cause.)


 T
I am not sure if it's a pointer bug. What worries me is that it breaks at the start of the program, but uncommenting code at the end of the program influences it. Unless there's some crazy reordering going on, this shouldn't normally have an effect. I still believe the bug is on the compiler side, but there's quite a bit of code in my case, and if I try to minimize it, the issue disappears. Oh well.
Feb 06 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Feb 06, 2019 at 10:37:27PM +0000, JN via Digitalmars-d-learn wrote:
[...]
 I am not sure if it's a pointer bug. What worries me is that it breaks
 at the start of the program, but uncommenting code at the end of the
 program influences it. Unless there's some crazy reordering going on,
 this shouldn't normally have an effect.
As I've said before, this kind of "spooky" action-at-a-distance symptom is exactly the kind of behaviour you'd expect from a pointer bug. Of course, it doesn't mean that it *must* be a pointer bug, but it does look awfully similar to one.
 I still believe the bug is on the compiler side, but it's a bit of
 code in my case, and if I try to minimize the case, the issue
 disappears. Oh well.
That's another typical symptom of a pointer bug. It seems less likely to be a codegen bug, because I'd expect a codegen bug to exhibit more consistent symptoms: if a particular piece of code is triggering a compiler codegen bug, then it shouldn't matter what other code is being compiled; the bug should show up in all cases. This kind of sensitivity to minute, unrelated changes is closer to how pointer bugs tend to behave.

Of course, it's possible that there's a pointer bug in the *compiler*, so there's that. It's hard to tell either way at this point. Given how heavily so many people use the compiler every day, that's also less likely, though not impossible: your code might just happen to contain a particularly rare combination of language features that causes the compiler to go down a rarely-tested code path that contains the bug.

Anyway, given what you said about how moving (or minimizing) seemingly-unrelated code around seems to affect the symptoms, we could do a little educated guesswork to try to narrow it down a little more. You said commenting out code at the end of the program affects whether it crashes at the beginning. Is this in the same function (presumably main()), or is it in different functions?

If it's in the same function, one possibility is that you have some local variables that are being overrun by a buffer overflow or some bad pointer. Commenting out code at the end of the function changes the layout of variables on the stack, so it would change what gets overwritten. Possibly, the bug gets hidden by the bad pointer being redirected to some innocuous variable whose value is no longer used, or some such, so the presence of the bug is masked.

If the commented-out code is in a different function from the location of the crash, and you're sure that the commented-out code is not being run before the crash, then it would appear to be something related to the layout of global variables.
Perhaps there's some module static ctor that's being triggered / not triggered, that changes the global state in some way that affects the code at the beginning of the program? If there's a bad pointer that points to some heap location, the action of module ctors running vs. not running could alter the heap state enough to mask the bug in some cases.

Another possibility is if you're interfacing with C code and have a non null-terminated D string that's being cast to char*, and the presence of more code in the executable may perturb the data/code segment layout just enough to push the string somewhere that happens to contain a null shortly afterwards.

Just some guesses based on my experience with pointer bugs.


T

-- 
Written on the window of a clothing store: No shirt, no shoes, no service.
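A minimal D illustration of the non-null-terminated string pitfall described in the post above (the helper `cLength` is hypothetical). A D slice carries its own length, but a C function only stops at a zero byte; `std.string.toStringz` returns a pointer to zero-terminated data, copying when necessary:

```d
import core.stdc.stdio : printf;
import core.stdc.string : strlen;
import std.string : toStringz;

// Hypothetical helper: how many bytes a C function would see for `s`.
size_t cLength(string s)
{
    return strlen(toStringz(s));
}

void main()
{
    string full = "hello world";
    string word = full[0 .. 5]; // "hello"; the next byte in memory is ' ', not 0

    // Buggy: C reads past word.length, here up to the end of "hello world".
    // printf("%s\n", cast(const(char)*) word.ptr);

    // Fixed: toStringz guarantees a terminator right after the five bytes.
    printf("%s\n", toStringz(word)); // hello
    assert(cLength(word) == word.length);
}
```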
Feb 06 2019
Aliak <something something.com> writes:
On Monday, 17 December 2018 at 21:59:59 UTC, JN wrote:
 Hey guys,

 while working on my game engine project, I encountered a DMD 
 codegen bug. It occurs only when compiling in release mode, 
 debug works. Unfortunately I am unable to minimize the code, 
 since it's quite a bit of code, and changing the code changes 
 the bug occurrence. Basically my faulty piece of code looks 
 like this

 [...]
I remember a couple of months ago someone complaining about similar issues when switching to a newer dmd. I tried looking for the thread but can’t find it. Think it was on the general list. Have you tried previous compiler versions yet?
Dec 17 2018
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Monday, 17 December 2018 at 21:59:59 UTC, JN wrote:
 while working on my game engine project, I encountered a DMD 
 codegen bug. It occurs only when compiling in release mode, 
 debug works.
Old thread, but FWIW, such bugs can be easily and precisely reduced with DustMite. In your test script, just compile with and without the compiler option which causes the bug to manifest, and check that one works and the other doesn't. I put together a short article on the DustMite wiki describing how to do this: https://github.com/CyberShadow/DustMite/wiki/Reducing-a-bug-with-a-specific-compiler-option
Feb 06 2019
JN <666total wp.pl> writes:
On Thursday, 7 February 2019 at 03:50:32 UTC, Vladimir Panteleev 
wrote:
 On Monday, 17 December 2018 at 21:59:59 UTC, JN wrote:
 while working on my game engine project, I encountered a DMD 
 codegen bug. It occurs only when compiling in release mode, 
 debug works.
 Old thread, but FWIW, such bugs can be easily and precisely reduced with DustMite. In your test script, just compile with and without the compiler option which causes the bug to manifest, and check that one works and the other doesn't. I put together a short article on the DustMite wiki describing how to do this: https://github.com/CyberShadow/DustMite/wiki/Reducing-a-bug-with-a-specific-compiler-option
Does it also work for dub projects?

Anyway, I managed to reduce the source code greatly manually: https://github.com/helikopterodaktyl/repro_d_release/

Unfortunately, I can't get rid of the dlib dependency. When built with debug, test outputs [0: Object]; with release it outputs [0: null]. Commenting this line out:

    f.rotation = Quaternionf.fromEulerAngles(Vector3f(0.0f, 0.0f, 0.0f));

or changing it to:

    f.rotation = Quaternionf.identity();

is enough to make release output [0: Object] as well. I guess dlib is doing something dodgy with memory layout, but I can't see anything suspicious :(
Feb 07 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Feb 07, 2019 at 10:16:19PM +0000, JN via Digitalmars-d-learn wrote:
[...]
 Anyway, I managed to reduce the source code greatly manually:
 
 https://github.com/helikopterodaktyl/repro_d_release/
 
 unfortunately I can't get rid of the dlib dependency. When built with
 debug, test outputs [0: Object], with release it outputs [0: null].
 
 commenting this line out:
 f.rotation = Quaternionf.fromEulerAngles(Vector3f(0.0f, 0.0f, 0.0f));
 or changing it to:
 f.rotation = Quaternionf.identity();
 
 is enough to make release output [0: Object] as well. I guess dlib is
 doing something dodgy with memory layout, but I can't see anything
 suspicious :(
Hmm. I can't seem to reproduce this in my environment (Linux/x86_64). Tried it with various combinations of `dub -b release|debug|etc.`, manually compiling with `dmd -I~/.dub/packages/dlib-0.15.0/dlib` with various combinations of -release, -debug, etc.

I wonder if you somehow have an ABI mismatch caused by stale cached objects in dub? Perhaps try `dub --force` to force a rebuild of everything? Or, if you're daring, delete the entire dub cache and rebuild, just to be sure there are no stray stale files lying around somewhere.

Barring that, one way to narrow this down further is to copy the relevant dlib sources into your own source tree, remove the dub dependency, and then reduce the dlib sources as well. I did a quick and crude test, and discovered that you only need the following files:

    dlib/math/matrix.d
    dlib/math/linsolve.d
    dlib/math/quaternion.d
    dlib/math/decomposition.d
    dlib/math/package.d
    dlib/math/vector.d
    dlib/math/utils.d
    dlib/core/package.d
    dlib/core/tuple.d

Replace dlib/core/package.d with an empty file, and edit dlib/math/package.d to import only dlib.math.quaternion and dlib.math.vector. Since you're only using a very small number of functions, you can probably quickly eliminate most of the above files too. Just edit the files directly (since they're your own copy) and delete everything that isn't directly needed by your code. Of course, at the same time also check that deleting doesn't change the bug behaviour. If it does, then whatever you just deleted may possibly be (part of) the cause of the problem.

Sorry I can't help you with reproducing the problem, as the bug doesn't seem to show up in my environment. (I suspect it's still there, just that subtle differences in my environment may be masking it somehow.)


T

-- 
Political correctness: socially-sanctioned hypocrisy.
Feb 07 2019
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Thursday, 7 February 2019 at 22:16:19 UTC, JN wrote:
 Does it also work for dub projects?
It will work if you can put all the relevant D code in one directory, which is harder for Dub, as it likes to pull dependencies from all over the place. When "dub dustmite" is insufficient (as in this case), the safest way to proceed would be to build with dub in verbose mode, take note of the compiler command lines it's using, then put them in a shell script and all mentioned D files in one directory, then pass that to Dustmite.
Feb 07 2019
JN <666total wp.pl> writes:
On Friday, 8 February 2019 at 07:30:41 UTC, Vladimir Panteleev 
wrote:
 On Thursday, 7 February 2019 at 22:16:19 UTC, JN wrote:
 Does it also work for dub projects?
 It will work if you can put all the relevant D code in one directory, which is harder for Dub, as it likes to pull dependencies from all over the place. When "dub dustmite" is insufficient (as in this case), the safest way to proceed would be to build with dub in verbose mode, take note of the compiler command lines it's using, then put them in a shell script and all mentioned D files in one directory, then pass that to Dustmite.
I will try. However, one last thing - in the example test scripts, it runs first with one compiler setting (or D version) and the second time with the other compiler setting (or D version). But it looks like the exit code of the first run is ignored anyway, so why run it?
Feb 08 2019
Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
On Friday, 8 February 2019 at 09:28:48 UTC, JN wrote:
 I will try. However, one last thing - in the example test 
 scripts, it runs first with one compiler setting (or D version) 
 and the second time with the other compiler setting (or D 
 version). But it looks like the exit code of the first run is 
 ignored anyway, so why run it?
With "set -e", the shell interpreter will exit the script with any command that fails (returns with non-zero status), unless it's in an "if" condition or such. I'll update the article to clarify it.
Feb 08 2019
JN <666total wp.pl> writes:
On Friday, 8 February 2019 at 09:30:12 UTC, Vladimir Panteleev 
wrote:
 On Friday, 8 February 2019 at 09:28:48 UTC, JN wrote:
 I will try. However, one last thing - in the example test 
 scripts, it runs first with one compiler setting (or D 
 version) and the second time with the other compiler setting 
 (or D version). But it looks like the exit code of the first 
 run is ignored anyway, so why run it?
 With "set -e", the shell interpreter will exit the script with any command that fails (returns with non-zero status), unless it's in an "if" condition or such. I'll update the article to clarify it.
I see. Dustmite helped. I had to convert it to Windows batch, so my test script ended up being:

    dmd -O -inline -release -boundscheck=on -i app.d -m64
    IF %ERRORLEVEL% EQU 0 (ECHO No error found) ELSE (EXIT /B 1)
    app | FINDSTR /C:"Object"
    IF %ERRORLEVEL% EQU 0 (ECHO No error found) ELSE (EXIT /B 1)
    dmd -O -inline -release -boundscheck=off -i app.d -m64
    IF %ERRORLEVEL% EQU 0 (ECHO No error found) ELSE (EXIT /B 1)
    app | FINDSTR /C:"null"
    IF %ERRORLEVEL% EQU 0 (EXIT /B 0) ELSE (EXIT /B 1)

I managed to greatly reduce the source code. I have filed a bug with the reduced testcase: https://issues.dlang.org/show_bug.cgi?id=19662
Feb 08 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 08, 2019 at 09:23:40PM +0000, JN via Digitalmars-d-learn wrote:
[...]
 I managed to greatly reduce the source code. I have filed a bug with
 the reduced testcase https://issues.dlang.org/show_bug.cgi?id=19662 .
Haha, you were right! It's a compiler bug, another one of those nasty -O -inline bugs. Probably a backend codegen bug. Ran into one of those before; was pretty nasty. Fortunately it got fixed soon(ish) after I made noise about it in the forum. :-P


T

-- 
Don't drink and derive. Alcohol and algebra don't mix.
Feb 08 2019
JN <666total wp.pl> writes:
On Friday, 8 February 2019 at 21:35:34 UTC, H. S. Teoh wrote:
 On Fri, Feb 08, 2019 at 09:23:40PM +0000, JN via 
 Digitalmars-d-learn wrote: [...]
 I managed to greatly reduce the source code. I have filed a 
 bug with the reduced testcase 
 https://issues.dlang.org/show_bug.cgi?id=19662 .
 Haha, you were right! It's a compiler bug, another one of those nasty -O -inline bugs. Probably a backend codegen bug. Ran into one of those before; was pretty nasty. Fortunately it got fixed soon(ish) after I made noise about it in the forum. :-P T
Luckily it's not a blocker for me, because it doesn't trigger on debug builds, and for release builds I can always use LDC, but still it's bugging me (pun intended).
Feb 08 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 08, 2019 at 09:42:11PM +0000, JN via Digitalmars-d-learn wrote:
 On Friday, 8 February 2019 at 21:35:34 UTC, H. S. Teoh wrote:
 On Fri, Feb 08, 2019 at 09:23:40PM +0000, JN via Digitalmars-d-learn
 wrote: [...]
 I managed to greatly reduce the source code. I have filed a bug
 with the reduced testcase
 https://issues.dlang.org/show_bug.cgi?id=19662 .
 Haha, you were right! It's a compiler bug, another one of those nasty -O -inline bugs. Probably a backend codegen bug. Ran into one of those before; was pretty nasty. Fortunately it got fixed soon(ish) after I made noise about it in the forum. :-P
[...]
 Luckily it's not a blocker for me, because it doesn't trigger on debug
 builds, and for release builds I can always use LDC, but still it's
 bugging me (pun intended).
Pity I still can't reproduce the problem locally. Otherwise I would reduce it even more -- e.g., eliminate the std.stdio dependency and have the program fail on assert(obj !is null), and a bunch of other things to make it easier for compiler devs to analyze -- and perhaps look at the generated assembly to see what went wrong. If you have the time (and patience) to do that, it would greatly increase the chances of this being fixed in a timely way, since it would narrow down the bug even more so that it's easier to find in the dmd source code.


T

-- 
I see that you JS got Bach.
Feb 08 2019
JN <666total wp.pl> writes:
On Friday, 8 February 2019 at 22:11:31 UTC, H. S. Teoh wrote:
 Pity I still can't reproduce the problem locally. Otherwise I 
 would reduce it even more -- e.g., eliminate std.stdio 
 dependency and have the program fail on assert(obj !is null), 
 and a bunch of other things to make it easier for compiler devs 
 to analyze -- and perhaps look at the generated assembly to see 
 what went wrong.  If you have the time (and patience) to do 
 that, it would greatly increase the chances of this being fixed 
 in a timely way, since it would narrow down the bug even more 
 so that it's easier to find in the dmd source code.


 T
It seems to be a Windows 64-bit only thing. Anyway, I reduced the code further manually. It's very hard to reduce it any further. For example, removing the assignments in fromEulerAngles static method hides the bug. Likewise, replacing writeln with assert makes it work properly too.
Feb 08 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 08, 2019 at 10:45:39PM +0000, JN via Digitalmars-d-learn wrote:
[...]
 Anyway, I reduced the code further manually. It's very hard to reduce
 it any further. For example, removing the assignments in
 fromEulerAngles static method hides the bug.  Likewise, replacing
 writeln with assert makes it work properly too.
Pity we couldn't get rid of std.stdio. It's a pretty big piece of code, and there are plenty of places where it may go wrong inside, even though we generally expect that the bug lies elsewhere. Oh well. Hopefully somebody else can dig into this and figure out what's going on.

Hmm. I just glanced over the std.stdio code... it appears that somebody has added @trusted all over the place, probably just to get it to compile with @safe. That's kinda scary... somebody needs to vet this code carefully to make sure nothing fishy's going on in there!


T

-- 
For every argument for something, there is always an equal and opposite argument against it. Debates don't give answers, only wounded or inflated egos.
Feb 08 2019
JN <666total wp.pl> writes:
On Friday, 8 February 2019 at 23:30:44 UTC, H. S. Teoh wrote:
 On Fri, Feb 08, 2019 at 10:45:39PM +0000, JN via 
 Digitalmars-d-learn wrote: [...]
 Anyway, I reduced the code further manually. It's very hard to 
 reduce it any further. For example, removing the assignments 
 in fromEulerAngles static method hides the bug.  Likewise, 
 replacing writeln with assert makes it work properly too.
 Pity we couldn't get rid of std.stdio. It's a pretty big piece of code, and there are plenty of places where it may go wrong inside, even though we generally expect that the bug lies elsewhere. Oh well. Hopefully somebody else can dig into this and figure out what's going on. Hmm. I just glanced over the std.stdio code... it appears that somebody has added @trusted all over the place, probably just to get it to compile with @safe. That's kinda scary... somebody needs to vet this code carefully to make sure nothing fishy's going on in there! T
I can replace it with core.stdc.stdio if it's any better. Looks like any attempt to do a check for "x is null" hides the bug. I tried assert(), also tried if (x is null) throw new Exception(...)
Feb 08 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Feb 08, 2019 at 11:36:03PM +0000, JN via Digitalmars-d-learn wrote:
 On Friday, 8 February 2019 at 23:30:44 UTC, H. S. Teoh wrote:
[...]
 Pity we couldn't get rid of std.stdio.
[...]
 I can replace it with core.stdc.stdio if it's any better. Looks like
 any attempt to do a check for "x is null" hides the bug. I tried
 assert(), also tried if (x is null) throw new Exception(...)
Aha! That's an important insight. It's almost certain that it's caused by a backend bug now. So testing the value perturbs the codegen code path enough to mask the bug / avoid the bug. I think from this point somebody who's familiar with the dmd backend ought to be able to track it down reasonably easily. (Unfortunately I'm completely unfamiliar with that part of the dmd code.)


T

-- 
In order to understand recursion you must first understand recursion.
Feb 08 2019