digitalmars.D.ldc - Win64, merge-2.067, LLVM master, VS 2015 - current status

"kinke" <noone nowhere.com> writes:
Hey guys,

as a teaser for the curious, here's the current status with a 
bleeding edge Win64 environment:

* Visual Studio 2015 CTP
* LLVM master (5de9960)
* LDC: merge-2.067
** *.conf.in files manually modified to include 
"-L/LARGEADDRESSAWARE:NO" as default option
** druntime: + https://github.com/kinke/druntime/commit/1add4f0d401717acc42d2600f9e85ca7d0efe11
+ https://github.com/ldc-developers/druntime/pull/17
** phobos: + https://github.com/ldc-developers/phobos/pull/17

druntime + phobos unittests, debug and release:
92% tests passed, 46 tests failed out of 555

failures:
core.thread (segfaults in release only)
std.csv
std.datetime
std.encoding
std.math
std.parallelism
std.path
std.process
std.socket
std.stream
std.string (fails to compile in debug, fails in release)
std.traits
std.uni
std.uri
std.zip
std.zlib
std.algorithm.sorting (fails in debug only)
std.digest.crc
std.digest.md (fails in debug only)
std.digest.ripemd (fails in debug only)
std.digest.sha
std.net.isemail
std.regex.internal.parser
std.regex.internal.tests
std.regex
std.internal.math.gammafunction
May 10 2015
Dan Olson <zans.is.for.cans yahoo.com> writes:
"kinke" <noone nowhere.com> writes:
 failures:
 core.thread (segfaults in release only)
Where does core.thread fail its test?
May 10 2015
"kinke" <noone nowhere.com> writes:
On Sunday, 10 May 2015 at 22:25:14 UTC, Dan Olson wrote:
 Where does core.thread fail its test?
Apparently an access violation in fiber_switchContext().

asm:
push rbx
xor  rax,rax
push qword ptr gs:[rax]
push qword ptr gs:[rax+8]
push qword ptr gs:[rax+10h]
mov  qword ptr [rcx],rsp
mov  rsp,rdx
pop  qword ptr gs:[rax+10h] --> access violation with rax=0
pop  qword ptr gs:[rax+8]
pop  qword ptr gs:[rax]
pop  rbx
May 12 2015
"Dan Olson" <zans4cans yahoo.com> writes:
On Tuesday, 12 May 2015 at 22:46:26 UTC, kinke wrote:
 On Sunday, 10 May 2015 at 22:25:14 UTC, Dan Olson wrote:
 Where does core.thread fail its test?
 Apparently an access violation in fiber_switchContext().

 asm:
 push rbx
 xor  rax,rax
 push qword ptr gs:[rax]
 push qword ptr gs:[rax+8]
 push qword ptr gs:[rax+10h]
 mov  qword ptr [rcx],rsp
 mov  rsp,rdx
 pop  qword ptr gs:[rax+10h] --> access violation with rax=0
 pop  qword ptr gs:[rax+8]
 pop  qword ptr gs:[rax]
 pop  rbx
see on OS X/iOS and only in release builds. Can you see if it happens in the runShared test? If so, it may pass if you enable the version(Posix) code for sm_this that uses pthread_get/setspecific [2].

[1] https://github.com/ldc-developers/ldc/issues/666
[2] https://github.com/ldc-developers/druntime/blob/ldc/src/core/thread.d#L1135
May 12 2015
"kinke" <noone nowhere.com> writes:
On Wednesday, 13 May 2015 at 05:14:57 UTC, Dan Olson wrote:
 Apparently an access violation in fiber_switchContext().

 asm:
 push rbx
 xor  rax,rax
 push qword ptr gs:[rax]
 push qword ptr gs:[rax+8]
 push qword ptr gs:[rax+10h]
 mov  qword ptr [rcx],rsp
 mov  rsp,rdx
 pop  qword ptr gs:[rax+10h] --> access violation with rax=0
 pop  qword ptr gs:[rax+8]
 pop  qword ptr gs:[rax]
 pop  rbx
I see on OS X/iOS and only in release builds. Can you see if it happens in the runShared test? If so, it may pass if you enable the version(Posix) code for sm_this that uses pthread_get/setspecific [2]. [1] https://github.com/ldc-developers/ldc/issues/666 [2] https://github.com/ldc-developers/druntime/blob/ldc/src/core/thread.d#L1135
Thx - it crashes during the runShared test. pthread isn't supported on Windows. The access violation occurs in https://github.com/ldc-developers/druntime/blob/ldc/src/core/thread.d#L3597.
May 13 2015
Dan Olson <zans.is.for.cans yahoo.com> writes:
"kinke" <noone nowhere.com> writes:

 On Wednesday, 13 May 2015 at 05:14:57 UTC, Dan Olson wrote:
 Apparently an access violation in fiber_switchContext().

 asm:
 push rbx
 xor  rax,rax
 push qword ptr gs:[rax]
 push qword ptr gs:[rax+8]
 push qword ptr gs:[rax+10h]
 mov  qword ptr [rcx],rsp
 mov  rsp,rdx
 pop  qword ptr gs:[rax+10h] --> access violation with rax=0
 pop  qword ptr gs:[rax+8]
 pop  qword ptr gs:[rax]
 pop  rbx
on OS X/iOS and only in release builds. Can you see if it happens in the runShared test? If so, it may pass if you enable the version(Posix) code for sm_this that uses pthread_get/setspecific [2]. [1] https://github.com/ldc-developers/ldc/issues/666 [2] https://github.com/ldc-developers/druntime/blob/ldc/src/core/thread.d#L1135
Thx - it crashes during the runShared test. pthread isn't supported on Windows. The access violation occurs in https://github.com/ldc-developers/druntime/blob/ldc/src/core/thread.d#L3597.
It does seem to be the same problem because the stack to resume on is wrong. That is what I see on OS X and iOS.

You could try the Windows TlsGetValue API for sm_this and mimic the pthread_getspecific code. If it works, I don't think it is a real fix, but it does allow the rest of the thread unittest to run. Maybe just disabling the runShared test and documenting it is best.

I found this older page where boost decided coroutine migration between threads was unsafe because of TLS:
http://www.crystalclearsoftware.com/soc/coroutine/coroutine/coroutine_thread.html

Thinking out loud: If LDC could provide a switch to disable TLS address caching, would folks think to use it? Should it be on by default to support the ability to migrate Fibers across threads? Is it worth the performance loss in other TLS cases? Maybe it is better to have some targets w/ expensive TLS lookup just disable Fiber migration. Is Fiber migration that common?
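
The TLS concern can be made concrete with a small C sketch. It is an illustration only, not druntime or LDC code; `tls_slot`, `worker` and `cached_tls_address_is_stale` are made-up names, and it assumes POSIX threads plus C11 `_Thread_local`. Each thread gets its own copy of a thread-local variable, so an address computed on one thread and kept across a context switch refers to the wrong copy once execution continues on another thread, which is the position a migrated Fiber would be in.

```c
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a thread-local such as Thread.sm_this. */
static _Thread_local int tls_slot;

/* Address of tls_slot as seen by the worker thread, stored as an integer
   so it can still be compared after the worker has exited. */
static uintptr_t worker_address;

static void *worker(void *arg) {
    (void)arg;
    worker_address = (uintptr_t)&tls_slot;  /* fresh lookup on this thread */
    return NULL;
}

/* Returns 1 if the address looked up on the calling thread differs from
   the one a second thread obtains for the same variable, 0 if they are
   equal, and -1 if the second thread could not be created. */
int cached_tls_address_is_stale(void) {
    /* What a compiler may keep in a register across a fiber switch. */
    uintptr_t cached = (uintptr_t)&tls_slot;

    pthread_t t;
    if (pthread_create(&t, NULL, worker, NULL) != 0)
        return -1;
    pthread_join(t, NULL);

    return cached != worker_address;
}
```

Because the calling thread is still alive while the worker runs, the two copies of `tls_slot` occupy different storage and the function returns 1. Going through an explicit lookup (pthread_getspecific, or TlsGetValue on Windows) on every access avoids reusing the stale address, at the cost of a call per access.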
May 14 2015
"kinke" <noone nowhere.com> writes:
On Thursday, 14 May 2015 at 07:51:16 UTC, Dan Olson wrote:
 You could try Windows TlsGetValue API for sm_this and mimic
 the pthread_getspecific code.
Yep, that makes the runShared test pass. core.thread still doesn't pass all tests in release though (e.g., non-volatile GP registers are apparently not restored correctly).
May 14 2015
"Kai Nacke" <kai redstar.de> writes:
On Sunday, 10 May 2015 at 14:13:35 UTC, kinke wrote:
 Hey guys,

 as a teaser for the curious, here's the current status with a 
 bleeding edge Win64 environment:
Really cool!
 failures:
 std.digest.sha
The failure may be caused by the assembler code because of non-matching ABI conventions.

Regards,
Kai
May 10 2015
"kinke" <noone nowhere.com> writes:
Most interesting: many of the failing unittests are caused by a 
single issue, namely, std.concurrency.unregisterMe() invoked by 
the static destructor of the std.concurrency module.
The following unittests all pass when isolated, i.e., by 
compiling via 'ldc2 -g -main -unittest <foo>.d' (debug, 
'-release' added for release) and then running the resulting 
<foo>.exe:

std.datetime
std.parallelism
std.path
std.process
std.string
std.uni
std.uri
std.zip
std.zlib
std.algorithm.sorting
std.digest.crc
std.digest.md
std.digest.ripemd
std.digest.sha
std.net.isemail
std.regex.internal.parser
std.regex.internal.tests
std.regex

The following failures are NOT caused by std.concurrency.~this():

core.thread
std.csv *
std.encoding
std.math
std.socket
std.stream
std.traits *
std.internal.math.gammafunction *

[*] fixed or worked around on my system, patches being prepared

So we're not far from LDC on Win64 passing all druntime + phobos 
unit tests! :)
May 14 2015
Dan Olson <zans.is.for.cans yahoo.com> writes:
"kinke" <noone nowhere.com> writes:

 The following failures are NOT caused by std.concurrency.~this():

 core.thread
 std.csv *
 std.encoding
 std.math
 std.socket
 std.stream
 std.traits *
 std.internal.math.gammafunction *

 [*] fixed or worked around on my system, patches being prepared

 So we're not far from LDC on Win64 passing all druntime + phobos unit
 tests! :)
Very cool. The math unittest failures could just be the ones that aren't written for 64-bit real. There was some work on it by Kevin and Johan in this pull [1], but I have not been following lately.

[1] https://github.com/ldc-developers/phobos/pull/7
May 14 2015
"kinke" <noone nowhere.com> writes:
After intensive debugging, the strange issue responsible for most 
failures seems to be reducible to some unit tests allocating GC 
memory, this in turn leading to a GC collection pass which then 
destroys the `__gshared core.thread.Thread 
core.thread.Thread.sm_tbeg` object representing the main thread 
(it's actually the start of a linked list of threads). The object 
should actually be kept alive by the sm_tbeg reference (and the 
main Thread additionally by sm_main). After initializing the 
reference with the main Thread at program startup, it isn't 
touched (i.e., not reset to null - verified via data breakpoint), 
but the object is finalized anyway.

So it looks as if the GC doesn't know about these __gshared 
references. When accessing the linked list later, when 
terminating all threads right before exiting the program, funny 
things happen due to garbage in the .next and .prev references.

I've just tried storing another reference to the main Thread as 
`static Thread core.thread.Thread.sm_mainDummy`, and the object 
isn't destroyed anymore. So `__gshared` seems to be the problem. 
And most likely all other targets are affected too by this bug.
May 16 2015
Dan Olson <zans.is.for.cans yahoo.com> writes:
"kinke" <noone nowhere.com> writes:

 I've just tried storing another reference to the main Thread as
 `static Thread core.thread.Thread.sm_mainDummy`, and the object isn't
 destroyed anymore. So `gshared` seems to be the problem. And most
 likely all other targets are affected too by this bug.
Could sections_ldc.initSections() be neglecting the BSS section? I noticed the version(Win64) code has:

    if (_bss_start__ != null)
    {
        pushRange(&_bss_start__, &_bss_end__);
    }

-- 
Dan
May 17 2015
Dan Olson <zans.is.for.cans yahoo.com> writes:
Dan Olson <zans.is.for.cans yahoo.com> writes:

 "kinke" <noone nowhere.com> writes:

 I've just tried storing another reference to the main Thread as
 `static Thread core.thread.Thread.sm_mainDummy`, and the object isn't
 destroyed anymore. So `gshared` seems to be the problem. And most
 likely all other targets are affected too by this bug.
 Could sections_ldc.initSections() be neglecting the BSS section? I noticed the version(Win64) code has:

     if (_bss_start__ != null)
     {
         pushRange(&_bss_start__, &_bss_end__);
     }
I don't have a Windows host available right now, but am assuming that _bss_start__ is a symbol created by linker that overlays first variable in BSS, which very likely is 0 because it is in BSS. But again, just guessing.
May 17 2015
"kinke" <noone nowhere.com> writes:
Thx for the hint, Dan!!!

99% tests passed, 4 tests failed out of 556

The following tests FAILED:
         175 - std.math (Failed)
         191 - std.stream (Failed)
         460 - std.socket-debug (Failed)
         465 - std.stream-debug (Failed)

!!! :))

The problem was that instead of using the range [_data_start__, 
_data_end__) (computed in rt/msvc.c), we've used [&_data_start__, 
&_data_end__). We also did so for the BSS section, which isn't 
present in my druntime-test-runner-debug.exe though.
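
The mix-up between the two ranges can be sketched in a few lines of C. This is a hypothetical illustration, not the druntime or rt/msvc.c code: `fake_section` stands in for the data section and `section_bounds` for the pointer variables holding its bounds (kept in one array here so the address arithmetic stays well defined). The range built from the stored pointer values covers the whole section, while the range built from the addresses of the pointer variables covers only the variables themselves, so any reference stored elsewhere in the section would never be scanned.

```c
#include <stddef.h>

/* Hypothetical stand-in for the image's data section. */
static char fake_section[256];

/* Hypothetical stand-ins for the pointer variables holding the bounds. */
static void *section_bounds[2] = {
    fake_section,                       /* plays the role of _data_start__ */
    fake_section + sizeof fake_section  /* plays the role of _data_end__   */
};

/* Number of bytes in the half-open range [lo, hi). */
static size_t range_bytes(const void *lo, const void *hi) {
    return (size_t)((const char *)hi - (const char *)lo);
}

/* Intended range: the values stored in the pointer variables. */
size_t intended_range_bytes(void) {
    return range_bytes(section_bounds[0], section_bounds[1]);
}

/* Mistaken range: the addresses of the pointer variables themselves. */
size_t mistaken_range_bytes(void) {
    return range_bytes(&section_bounds[0], &section_bounds[1]);
}
```

In this sketch the intended range spans all 256 bytes of `fake_section`, whereas the mistaken one spans a single pointer-sized slot.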
May 17 2015
"Temtaime" <temtaime gmail.com> writes:
Hi all!
http://goo.gl/0JI4qJ
Why can't ldc optimize it?
Should I create an issue?
For example, gdc uses only one mulps.
May 20 2015
"Temtaime" <temtaime gmail.com> writes:
For example, this code is OK: http://goo.gl/QMfpzg

Can we rewrite vector operations with foreach rather than calling 
vector functions?
May 20 2015
"John Colvin" <john.loughran.colvin gmail.com> writes:
On Wednesday, 20 May 2015 at 13:34:50 UTC, Temtaime wrote:
 Hi all !
 http://goo.gl/0JI4qJ
 Why ldc cannot optimize it ?
 Should i create an issue ?
 For example gdc uses only one mulfps.
Rule of thumb: don't use array ops for short arrays. They are quite well optimised for large arrays, but aren't great in cases like your example.
May 21 2015
"David Nadlinger" <code klickverbot.at> writes:
On Thursday, 21 May 2015 at 19:05:33 UTC, John Colvin wrote:
 Rule of thumb: don't use array ops for short arrays. They are 
 quite well optimised for large arrays, but aren't great in 
 cases like your example.
We should be able to do better, though, especially for cases like this where the length is known statically.

David
May 21 2015
"kinke" <noone nowhere.com> writes:
On Thursday, 21 May 2015 at 19:07:20 UTC, David Nadlinger wrote:
 We should be able to do better, though, especially for cases 
 like this where the length is known statically.
Definitely, if GDC manages to do so, we should too.

Some years ago, I ported some math-intensive C++ code to D and rewrote all simple loops as array ops, for better readability but primarily because I assumed that'd help with SSE vectorization. No wonder the runtime was about an order of magnitude higher compared to the corresponding C++ version. The disappointment was rather high and lowered my interest in D, so yes, please create a GitHub issue about this.
May 21 2015
"Temtaime" <temtaime gmail.com> writes:
Hi guys!
I recently found that ldc doesn't compile with llvm master 
anymore.
They removed the CreateCall[2-3] functions and some of the overloads 
of CreateCall too.

Now all the parameters should be passed to that function by 
vector<llvm::Value *>.
Can anyone fix it?
May 22 2015
"kinke" <noone nowhere.com> writes:
Kai has just fixed it. Please don't expect the few of us to 
always react so quickly to every LLVM API change (and there's a 
whole lot of them) on its master branch though. ;)
May 22 2015
"Kai Nacke" <kai redstar.de> writes:
On Thursday, 21 May 2015 at 19:05:33 UTC, John Colvin wrote:
 On Wednesday, 20 May 2015 at 13:34:50 UTC, Temtaime wrote:
 Hi all !
 http://goo.gl/0JI4qJ
 Why ldc cannot optimize it ?
 Should i create an issue ?
 For example gdc uses only one mulfps.
Rule of thumb: don't use array ops for short arrays. They are quite well optimised for large arrays, but aren't great in cases like your example.
Now ldc should inline the arrayops. If you have some benchmarks you could re-run them.

Regards,
Kai
May 23 2015
"Kai Nacke" <kai redstar.de> writes:
On Wednesday, 20 May 2015 at 13:34:50 UTC, Temtaime wrote:
 Hi all !
 http://goo.gl/0JI4qJ
 Why ldc cannot optimize it ?
 Should i create an issue ?
 For example gdc uses only one mulfps.
Should be fixed. Now ldc generates:

    movups (%r8), %xmm0
    shufps $0, %xmm1, %xmm1
    mulps  %xmm0, %xmm1
    movups %xmm1, (%rcx)
    movq   %rcx, %rax
    retq

Regards,
Kai
May 23 2015
"kinke" <noone nowhere.com> writes:
Update:

* LLVM master (1fd101c)
* LDC: branch merge-2.067
** *.conf.in files hacked to include
    "-L/LARGEADDRESSAWARE:NO" as default option
** druntime: branch ldc-merge-2.067 +
https://github.com/ldc-developers/druntime/pull/29 (VS 2015 only)
** phobos: branch ldc-merge-2.067 +
https://github.com/JohanEngelen/phobos/commit/2ac2581fe49da475bf6f687cfb7bcb9c9ddf8b71
https://github.com/kinke/phobos/commit/86511b3ca9f4a6b5358b7983ee64d0a688c63216

Due to https://github.com/ldc-developers/ldc/issues/930,
you'll need to hack your <buildDir>\build.ninja file and
exclude the -g switch when building
runtime\std\string-unittest-debug.obj.

Except for a Win64-specific core.thread unittest
(testNonvolatileRegister), which fails in the release build,
all druntime & phobos unittests pass, at least with VS 2015. :)
May 23 2015
"kinke" <noone nowhere.com> writes:
With VS 2013, 2 std.conv unittests fail, due to 
strtod()/strtold() not being able to parse hex strings, otherwise 
same as for VS 2015.
May 24 2015
"Elie Morisse" <syniurge gmail.com> writes:
On Sunday, 24 May 2015 at 01:49:14 UTC, kinke wrote:
 Update:

 * LLVM master (1fd101c)
 * LDC: branch merge-2.067
 ** *.conf.in files hacked to include
    "-L/LARGEADDRESSAWARE:NO" as default option
 ** druntime: branch ldc-merge-2.067 +
 https://github.com/ldc-developers/druntime/pull/29 (VS 2015 
 only)
 ** phobos: branch ldc-merge-2.067 +
 https://github.com/JohanEngelen/phobos/commit/2ac2581fe49da475bf6f687cfb7bcb9c9ddf8b71
 https://github.com/kinke/phobos/commit/86511b3ca9f4a6b5358b7983ee64d0a688c63216

 Due to https://github.com/ldc-developers/ldc/issues/930,
 you'll need to hack your <buildDir>\build.ninja file and
 exclude the -g switch when building
 runtime\std\string-unittest-debug.obj.

 Except for a Win64-specific core.thread unittest
 (testNonvolatileRegister), which fails in the release build,
 all druntime & phobos unittests pass, at least with VS 2015. :)
Wonderful!
May 24 2015