www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.bugs - [Issue 9438] New: Strange RefCounted stack overflow

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438

           Summary: Strange RefCounted stack overflow
           Product: D
           Version: D2
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: major
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: monarchdodra gmail.com



I've been chasing this on and off for a couple months now. Basically, trying to
access the RefCounted.refCountedStore.isInitialized of a non-initialized
RefCounted in a field of a temporary will create a stack overflow. I know
that's not clear, but here is the reduced usecase:

//----
import std.container, std.stdio, std.typecons, std.exception;

struct S
{
  RefCounted!int _data;

  this(int)
  {_data.refCountedStore.ensureInitialized();}

  int get()  property
  {
      writeln("here");
      enforce(_data.refCountedStore.isInitialized); //OH NOES!!!
      writeln("there");
      return _data.refCountedPayload;
  }
}

void main()
{
    // 1)
    writeln(S(1).get);

    // 2)
    S s;
    writeln(s.get).collectException();

    // 3)
    writeln(S().get);
}
//----

1) This will create a temporary S, and intialize the ref counted. the writeln
works.

2) The creates a non-temporary S. Trying to access the ref counted will
(correctly) throw an exception.

3) This will stack overflow at the "//OH NOES!!!" line: It will first call:

ref inout(RefCountedStore) refCountedStore() inout

To get the store, and then will recursively call "isInitialized" until the
program stack overflows. I have no idea why:

//----
         property nothrow  safe
        bool isInitialized() const
        {
            return _store !is null;
        }
//----

This seems to me like the tip of a more serious bug somewhere. I would be very
pleased if someone with more knowledge than me could try to look into it?

I think it might also create problems with things such as arrays of arrays:
Every time I've tried to fix http://d.puremagic.com/issues/show_bug.cgi?id=6153
I've had crashes (NOT asserts/enforeces), and I think this might be the reason.

Originally found with this code:
//----
void main()
{
    writeln(Array!int()[0]);
}
//----

Yes, the code is wrong, but it should *assert*. Currently, it just dies.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2013
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438


Maxim Fomin <maxim maxim-fomin.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |maxim maxim-fomin.ru



---
The situation is more complicated.

import std.container, std.stdio, std.typecons, std.exception;

struct S
{
  RefCounted!int _data;

  this(int)
  {_data.refCountedStore.ensureInitialized();}

  int get()  property
  {
      writeln("here");
      enforce(_data.refCountedStore.isInitialized); //OH NOES!!! //13
      writeln("there");
      return _data.refCountedPayload;
  }
}

void main()
{
    version (A) {
      writeln(S(1).get);
     }

    version (B) {
      S s;
      writeln(s.get).collectException();
    }

    version (C) {
        writeln(S().get);
    }
}

When compiling with version A or B, everything is fine. Version C fails
enforcement on line 13. Both A and C throws failed enforcement. Both B and C
segfault in Refcounted dtor. Tested on linux64 git head.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




---
Oh, it depends on compiler switches. Reduced from previous by combining B and
C:

import std.container, std.stdio, std.typecons, std.exception;

struct S
{
  RefCounted!int _data;

  int get()  property
  {
      writeln("here");
      enforce(_data.refCountedStore.isInitialized); //OH NOES!!!
      writeln("there");
      return _data.refCountedPayload;
  }
}

void main()
{
      S s;
      writeln(s.get).collectException();
        writeln(S().get);    
}

When compiling with -g, it prints "here", "here" and enforcement failure,
without -g it segfaults as like above. Since valgrind does not detect for -g
version memory errors, the issue may be dmd codegen bug.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438


monarchdodra gmail.com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |wrong-code
           Severity|major                       |critical



I wanted to try to get RefCounted out of the way, and I was able to get a
massively reduced test case. That said, the result is mind bogglingly wtf...

FYI: win32 on win7.64

//----
import std.stdio, std.exception;

//S is merely a struct with a pointer. And a destructor.
struct S
{
    int* _payload = null;

    ~this()
    {
        writeln("~this: ", _payload);
        if (!_payload) return;
        writeln("Should not be here...");
    }
}

void foo(int* p)
{
  throw new Exception("bla");
}

void main()
{
    int* p = S()._payload;
    writeln("wait for it...");
    foo(S()._payload);
}
//----
~this: null
wait for it...
~this: FFFF05E8
Should not be here...
object.Exception main.d(17): bla
//----

Apparently, we create a temporary, an exception is thrown, the temporary gets
corrupted, and then things start getting awry in the destructor.

I'm getting this also for as far back as 2.55 (didn't try anything earlier), so
it doesn't seem to be a regression.

In any case, exceptions silently corrupting stack temporaries? That's a
critical in my book.

Who was it again that said our destructors weren't very well tested?

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438





 I wanted to try to get RefCounted out of the way, and I was able to get a
 massively reduced test case. That said, the result is mind bogglingly wtf...
Just to add, I just tried compiling with different flags, including -O -release -debug -g: The scenario occurs in all cases, except when the flag "-g" is set, in which case things work correctly. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




---

 I wanted to try to get RefCounted out of the way, and I was able to get a
 massively reduced test case. That said, the result is mind bogglingly wtf...
 
 FYI: win32 on win7.64
 
 //----
Cannot reproduce on linux 64 githead. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




---

 I wanted to try to get RefCounted out of the way, and I was able to get a
 massively reduced test case. That said, the result is mind bogglingly wtf...
 
 FYI: win32 on win7.64
 
Can reproduce. And -g fixes the program. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




---
Reduced for linux

struct RefCounted
{
    void *p;
    ~this()
    {
        p = null;
    }
}

struct S
{
  RefCounted _data;

  int get()  property
  {
      throw new Exception("");
  }
}

void main()
{
      S s;
      S().get;    
}

With -g it throws, without -g segfaults.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




---
It really seems to be codegen bug. The problem is that presence of code like in
main function (struct temporary + simple stack struct) makes dmd generate wrong
exception handler table.

If you compile main.d one version with -release -O -noboundcheck and other
version with the same switches and additionally with -g, you will have
absolutely identical asm (obj2asm output) except the single difference is in
data segment.

In segfaulting version you have 

.data    segment
_HandlerTable0:
    db    050h,000h,000h,000h,063h,000h,000h,000h    ;P...c...
    db    002h,000h,000h,000h,000h,000h,000h,000h    ;........
    db    019h,000h,000h,000h,048h,000h,000h,000h    ;....H...
    db    0ffffffffh,0ffffffffh,0ffffffffh,0ffffffffh,000h,000h,000h,000h   
;........
    db    057h,000h,000h,000h,000h,000h,000h,000h    ;W.......
    db    02bh,000h,000h,000h,037h,000h,000h,000h    ;+...7...
    db    000h,000h,000h,000h,000h,000h,000h,000h    ;........
    db    042h,000h,000h,000h,000h,000h,000h,000h    ;B....... // 42h

and in throwing version you will have

_HandlerTable0:
    db    050h,000h,000h,000h,063h,000h,000h,000h    ;P...c...
    db    002h,000h,000h,000h,000h,000h,000h,000h    ;........
    db    019h,000h,000h,000h,048h,000h,000h,000h    ;....H...
    db    0ffffffffh,0ffffffffh,0ffffffffh,0ffffffffh,000h,000h,000h,000h   
;........
    db    057h,000h,000h,000h,000h,000h,000h,000h    ;W.......
    db    02bh,000h,000h,000h,037h,000h,000h,000h    ;+...7...
    db    000h,000h,000h,000h,000h,000h,000h,000h    ;........
    db    03eh,000h,000h,000h,000h,000h,000h,000h    ;>....... //3eh

If you patch incorrect binary, the bug goes away.

Corrupted handler table leads to following problem (asm snippet from main):

0x0000000000418888 <+60>:    jmp    <_Dmain+72>
0x000000000041888a <+62>:    lea    -0x10(%rbp),%rdi //3Eh
0x000000000041888e <+66>:    callq  <_D4main1S11__fieldDtorMFZv> //42h
0x0000000000418893 <+71>:    retq   
0x0000000000418894 <+72>:    sub    $0x8,%rsp
0x0000000000418898 <+76>:    callq  0x4188a3 <_Dmain+87>
0x000000000041889d <+81>:    add    $0x8,%rsp
0x00000000004188a1 <+85>:    jmp    0x4188ad <_Dmain+97>
0x00000000004188a3 <+87>:    lea    -0x18(%rbp),%rdi
0x00000000004188a7 <+91>:    callq  0x418810 <_D4main1S11__fieldDtorMFZv>
0x00000000004188ac <+96>:    retq   
0x00000000004188ad <+97>:    xor    %eax,%eax
0x00000000004188af <+99>:    pop    %r15

In segfaulting version druntime unwinds up to _Dmain+66, after instruction
which sets into %rdi this reference, hence dtor receives corrupted pointer. In
correct version druntime unwinds up to _Dmain+62, so the this pointer is
correct.

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 01 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438





 It really seems to be codegen bug.
Thankyou for investigating this. As I said, it really is out of my league. Do you know what the next step is for fixing this? Who we should forward it to? We should really get this fixed. Stack corruption when an exception is thrown? Nothing good can come out of this. -- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 03 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




---


 It really seems to be codegen bug.
Thankyou for investigating this. As I said, it really is out of my league. Do you know what the next step is for fixing this? Who we should forward it to?
This does not work. Until somebody who knows dmd source faces the issue, the bug will not be fixed simply because there are too much problems and too few people.
 We should really get this fixed. Stack corruption when an exception is thrown?
 Nothing good can come out of this.
-- Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email ------- You are receiving this mail because: -------
Feb 04 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438


Maxim Fomin <maxim maxim-fomin.ru> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Keywords|                            |pull



---
https://github.com/D-Programming-Language/dmd/pull/1645

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 09 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




Commits pushed to master at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/7d42ebfc235a0621cb88b6d3438218c376f8908f
Fix issue 9438 - Strange RefCounted stack overflow

https://github.com/D-Programming-Language/dmd/commit/b9478394fb9e1599dce0f5727ddfd4acdc858163


Fix issue 9438 - Strange RefCounted stack overflow

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 10 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




Commit pushed to staging at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/204abc2462ca4fa90b51eaea56a5d0604d9d9438


Fix issue 9438 - Strange RefCounted stack overflow

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 10 2013
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438




Commit pushed to dmd-1.x at https://github.com/D-Programming-Language/dmd

https://github.com/D-Programming-Language/dmd/commit/258d96826ca250b553ce5a3f54ead42273bcf821
merge fix issue 9438

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 10 2013
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=9438


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
                 CC|                            |bugzilla digitalmars.com
            Version|D2                          |D1 & D2
         Resolution|                            |FIXED


-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Feb 10 2013