
digitalmars.D.bugs - [Issue 6660] New: Problem with SSE registers in array ops

d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660

           Summary: Problem with SSE registers in array ops
           Product: D
           Version: D1 & D2
          Platform: Other
        OS/Version: Windows
            Status: NEW
          Severity: normal
          Priority: P2
         Component: DMD
        AssignedTo: nobody puremagic.com
        ReportedBy: clugdbug yahoo.com.au



This program, arrayop.d,

void main()
{
    double[4] a;
    double[4] b;
    a[] = b[] + b[];
}
compiled and run repeatedly in a batch file 

dmd arrayop
arrayop
dmd arrayop
arrayop
dmd arrayop
arrayop
... (I put it in about 20 times)
eventually generates this error on a SandyBridge processor, Windows 7.

C:\sandbox\bugs>dmd arrayop
DMD v2.055 DEBUG
OPTLINK (R) for Win32  Release 8.00.12
Copyright (C) Digital Mars 1989-2010  All rights reserved.
http://www.digitalmars.com/ctg/optlink.html
OPTLINK : Error 3: Cannot Create File arrayop.exe
--- errorlevel 1
Also happens in release version of DMD 2.055.

I think it is an SSE issue, since it only happens with arrays of floats and
doubles (not reals). But I'm just guessing. Maybe it is corrupting the stack.
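
For anyone who would rather not maintain a batch file, the compile-and-run loop above could be sketched as a small D driver. This is only an illustration: `firstFailingRound` is a hypothetical helper name, the round count of 20 mirrors the batch file, and it assumes `dmd` is on the PATH and that `arrayop.d` is in the current directory on Windows.

```d
import std.process : execute;
import std.stdio : writeln;

/// Hypothetical helper: runs "dmd arrayop.d" followed by "arrayop" up to
/// `rounds` times. Returns the 1-based round in which a step first returned
/// a non-zero exit status, or 0 if every step succeeded.
int firstFailingRound(int rounds, int delegate(string[]) run)
{
    foreach (i; 1 .. rounds + 1)
    {
        if (run(["dmd", "arrayop.d"]) != 0)
            return i; // compile or link step failed (e.g. OPTLINK Error 3)
        if (run(["arrayop"]) != 0)
            return i; // the test program itself failed
    }
    return 0;
}

void main()
{
    // execute() waits for the child process and reports its exit status.
    int failed = firstFailingRound(20, (string[] cmd) => execute(cmd).status);
    if (failed)
        writeln("first failure in round ", failed);
    else
        writeln("no failure in 20 rounds");
}
```

Passing the runner as a delegate keeps the loop itself independent of how the commands are launched, so it can be exercised without a compiler present.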

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Sep 13 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660


Brad Roberts <braddr puremagic.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |braddr puremagic.com



---
Another data point...

In the auto tester where it's building each test with the sequence of different
parameter combinations, it used to fail every once in a while due to the same
error below.  Changing it to write to a different executable every time (I just
added a counter so it's testfoo_0.exe, testfoo_1.exe, etc..) completely fixed
that problem.  I have no recollection which tests were failing.. I thought it
was pretty random, but it might not have been.
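
A minimal sketch of the counter-based naming described above, assuming the build is driven from D. The helper names `outputName` and `buildCommand` are hypothetical and are not taken from the auto tester; `-of` is dmd's output-file switch.

```d
import std.format : format;
import std.stdio : writeln;

/// Hypothetical helper: name of the executable for the n-th build of `base`.
string outputName(string base, size_t n)
{
    return format("%s_%d.exe", base, n);
}

/// Builds a dmd command line that writes to a distinct executable each time,
/// so no build has to overwrite a file that may still be locked.
string[] buildCommand(string source, string base, size_t n)
{
    return ["dmd", source, "-of" ~ outputName(base, n)];
}

void main()
{
    foreach (n; 0 .. 3)
        writeln(buildCommand("testfoo.d", "testfoo", n));
}
```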

My assumption is/was that Windows isn't releasing the exclusive write lock on
the executable file synchronously with the exiting of the application.

Have you tried the same loop with an empty main?

Sep 26 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660





> Have you tried the same loop with an empty main?

Yes, I have, and it never fails. It also never fails when 'double' is replaced
by 'real'. This makes it very hard for me to blame Windows for this.

I found three tests from the test suite which failed: test15, arrayop, and
hospital. I reduced arrayop down to that minimum size. Might be worth trying to
reduce the others as well.

It's also possible that it could be an issue with core.cpuid.

Sep 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
            Summary|Problem with SSE registers  |Problem with core.cpuid on
                   |in array ops                |Windows7



Yup, it's core.cpuid. This one fails (intermittently):
-----
import core.cpuid;

void main()
{
    bool b = sse();
}

Sep 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660




Reduced test case is very, very strange:

void main()
{ 
    __gshared uint a;
    asm {
        mov EAX, 2;
        cpuid;
        mov a, EAX;
    }
    uint numinfos = a & 0xFF;
    do {
    } while (--numinfos);
}

It only happens with CPUID function 2 (EAX = 2).

Sep 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660




This is really incredible. I've removed all of the D code, and I can still
reproduce the behaviour. If you uncomment the jz line, it won't happen.
The 'int 3' line is just a breakpoint, to prove that the branch is never taken.

void main()
{ 
    int ctr; // also works with __gshared int ctr;
    asm {
        mov EAX, 2;
        cpuid;
        and EAX, 0xFF;
        mov ctr, EAX;
//        jz was_zero;
Lxx:
        dec int ptr ctr;
        jnz Lxx;
        jmp done;
was_zero: 
        int 3;
done:   ;        
    }
}

Wild speculation: there's a bug in CPUID 2: it's not clearing the loopback
buffer. The loop is executed as if 'ctr' were still zero. This means that it
loops 2^^32 times. This is long enough that Windows does a task switch.
In core2, the loopback buffer was between the predecoders and the decoders, but
on core i7, they moved it after the decoders.
I tried to confirm this by extending the size of the loop, by padding with
nops.
When the loop is 63 bytes of code (56 nops), it fails. Once I add a 57th nop,
it stops failing.
These aren't the numbers I expected -- the loopback buffer is 256 bytes on the
core i7. However I have a core i3, perhaps it's different, or it may be a
decoding bug. Regardless, this looks very much like a CPU erratum.


My guess is that it is affecting the loop predictor, which is not the same
thing as the branch predictor.
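
To make the 2^^32 figure concrete: a `do { } while (--ctr);` loop that starts with a zero counter wraps around and runs for the full range of the counter type. The sketch below uses an 8-bit counter so that it finishes instantly; the function name `iterations` is only illustrative, and with a 32-bit counter the same wraparound gives 2^^32 passes.

```d
import std.stdio : writeln;

/// Counts the passes of `do { } while (--ctr);` for an 8-bit counter.
uint iterations(ubyte start)
{
    ubyte ctr = start;
    uint n = 0;
    do
    {
        ++n;
    } while (--ctr); // decrementing 0 wraps to 255, so the loop keeps going
    return n;
}

void main()
{
    writeln(iterations(3)); // prints 3
    writeln(iterations(0)); // prints 256, i.e. 2^^8
}
```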

Sep 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660




My theory is not correct. I figured that I could check if the number of
iterations was wrong by using rdtsc to see how many cycles the loop takes.
But it shows nothing unusual.
I'm no longer convinced that this is a loopback issue.
I also found that if I include a writefln after the relevant code, the critical
length of the loop drops from 64 (0x40) to 40 (0x28). It doesn't seem to be
affected by code alignment, so it's not a cache line issue.
This whole thing is very, very strange.

Sep 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660




The reduced test case from test15.d looks _completely_ different:

void main()
{
    char[] a = new char[0];
    uint c = 20000;
    while (c--)
        a ~= 'x';
}

This looks as though the GC is still running after the app has exited.

Sep 27 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660




This is interesting.

http://msdn.microsoft.com/en-us/library/windows/hardware/ff538528%28v=vs.85%29.aspx

"A CPUID intercept message is delivered by the hypervisor when a virtual
processor executes a CPUID instruction and the parent partition previously
called the HvInstallIntercept hypercall function to install an intercept on
such instructions."

Wow. There is a hypervisor running on my laptop. And it's buggy. Could it be a
rootkit?

Dec 22 2011
d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=6660


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



Turns out to be caused by Windows Defender.
Disabling it in the development directory solves the problem.

Looks like a bug in Windows Defender to me.

Mar 28 2012