digitalmars.D - Floating Point + Threads?
- dsimcha (28/28) Apr 15 2011 I'm trying to debug an extremely strange bug whose symptoms appear in a
- Andrei Alexandrescu (3/7) Apr 15 2011 Does the scheduling affect the summation order?
- Walter Bright (3/9) Apr 16 2011 That's a good thought. FP addition results can differ dramatically depen...
- Iain Buclaw (2/12) Apr 16 2011 And not to forget optimisations too. ;)
- Walter Bright (3/7) Apr 16 2011 The dmd optimizer is careful not to reorder evaluation in such a way as ...
- Iain Buclaw (17/25) Apr 16 2011 And so it rightly shouldn't!
- Walter Bright (2/5) Apr 16 2011 You're right on that one.
- dsimcha (5/13) Apr 16 2011 No. I realize floating point addition isn't associative, but unless
- Robert Jacques (12/42) Apr 15 2011 Well, on one hand floating point math is not cumulative, and running sum...
- dsimcha (17/67) Apr 16 2011 Right. For this example, though, assuming floating point math behaves
- dsimcha (37/46) Apr 16 2011 Ok, it's definitely **NOT** a bug in std.parallelism. Here's a reduced
- Andrei Alexandrescu (4/9) Apr 16 2011 I think at this precision the difference may be in random bits. Probably...
- dsimcha (6/15) Apr 16 2011 "random bits"? I am fully aware that these low order bits are numerical...
- Andrei Alexandrescu (4/21) Apr 16 2011 I seem to remember that essentially some of the last bits printed in
- Walter Bright (3/5) Apr 16 2011 To see what the exact bits are, print using %A format.
- Iain Buclaw (2/8) Apr 16 2011 This is not something I can replicate on my workstation.
- dsimcha (4/12) Apr 16 2011 Interesting. Since I know you're a Linux user, I fired up my Ubuntu VM
- JimBob (5/10) Apr 16 2011 Could be something somewhere is getting truncated from real to double, w...
- Timon Gehr (14/17) Apr 16 2011 Yes indeed, this is a _Windows_ "bug".
- dsimcha (9/26) Apr 16 2011 Close: If I add this instruction to the function for the new thread,
- dsimcha (3/43) Apr 16 2011 Read up a little on what fninit does, etc. This is IMHO a druntime bug....
- Sean Kelly (6/19) Apr 16 2011 report.
- Walter Bright (4/18) Apr 16 2011 The dmd startup code (actually the C startup code) does an fninit. I nev...
- bearophile (5/8) Apr 16 2011 My congratulations to all the (mostly two) people involved in finding th...
- Robert Jacques (5/28) Apr 16 2011 The documentation I've found on fninit seems to indicate it defaults to ...
- Sean Kelly (17/23) Apr 19 2011 never thought about new thread starts. So, yeah, druntime should do an
- Robert Jacques (3/26) Apr 19 2011 Yes, that sounds right. Thanks for clarifying.
- Don (29/37) Apr 20 2011 ??? Yes there is.
- Sean Kelly (12/27) Apr 20 2011 I never thought about new thread starts. So, yeah, druntime should do an...
- JimBob (8/33) Apr 20 2011 "Sean Kelly" wrote in message
- Sean Kelly (13/53) Apr 20 2011 do an
- Don (2/41) Apr 20 2011 Yes.
I'm trying to debug an extremely strange bug whose symptoms appear in a std.parallelism example, though I'm not at all sure the root cause is in std.parallelism. The bug report is at https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 .

Basically, the example in question sums up all the elements of a lazy range (actually, std.algorithm.map) in parallel. It uses taskPool.reduce, which divides the summation into work units to be executed in parallel. When executed in parallel, the results of the summation are non-deterministic after about the 12th decimal place, even though all of the following properties are true:

1. The work is divided into work units in a deterministic fashion.
2. Within each work unit, the summation happens in a deterministic order.
3. The final summation of the results of all the work units is done in a deterministic order.
4. The smallest term in the summation is about 5e-10. This means the difference across runs is about two orders of magnitude smaller than the smallest term. It can't be a concurrency bug where some terms sometimes get skipped.
5. The results for the individual tasks, not just the final summation, differ in the low-order bits. Each task is executed in a single thread.
6. The rounding mode is apparently the same in all of the threads.
7. The bug appears even on machines with only one core, as long as the number of task pool threads is manually set to >0. Since it's a single core machine, it can't be a low level memory model issue.

What could possibly cause such small, non-deterministic differences in floating point results, given everything above? I'm just looking for suggestions here, as I don't even know where to start hunting for a bug like this.
Apr 15 2011
On 4/15/11 10:22 PM, dsimcha wrote:I'm trying to debug an extremely strange bug whose symptoms appear in a std.parallelism example, though I'm not at all sure the root cause is in std.parallelism. The bug report is at https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 .Does the scheduling affect the summation order? Andrei
Apr 15 2011
On 4/15/2011 8:40 PM, Andrei Alexandrescu wrote:On 4/15/11 10:22 PM, dsimcha wrote:That's a good thought. FP addition results can differ dramatically depending on associativity.I'm trying to debug an extremely strange bug whose symptoms appear in a std.parallelism example, though I'm not at all sure the root cause is in std.parallelism. The bug report is at https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 .Does the scheduling affect the summation order?
Apr 16 2011
== Quote from Walter Bright (newshound2 digitalmars.com)'s articleOn 4/15/2011 8:40 PM, Andrei Alexandrescu wrote:And not to forget optimisations too. ;)On 4/15/11 10:22 PM, dsimcha wrote:That's a good thought. FP addition results can differ dramatically depending on associativity.I'm trying to debug an extremely strange bug whose symptoms appear in a std.parallelism example, though I'm not at all sure the root cause is in std.parallelism. The bug report is at https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 .Does the scheduling affect the summation order?
Apr 16 2011
On 4/16/2011 6:46 AM, Iain Buclaw wrote:== Quote from Walter Bright (newshound2 digitalmars.com)'s articleThe dmd optimizer is careful not to reorder evaluation in such a way as to change the results.That's a good thought. FP addition results can differ dramatically depending on associativity.And not to forget optimisations too. ;)
Apr 16 2011
== Quote from Walter Bright (newshound2 digitalmars.com)'s articleOn 4/16/2011 6:46 AM, Iain Buclaw wrote:And so it rightly shouldn't!

I was thinking more of a case of FPU precision rather than ordering: as in you get a different result computing on SSE in double precision mode on the one hand, and by computing on x87 in double precision then writing to a double variable in memory. Classic example (which could either be a bug or non-bug depending on your POV):

void test(double x, double y)
{
    double y2 = x + 1.0;
    assert(y == y2);    // triggers with -O
}

void main()
{
    double x = .012;
    double y = x + 1.0;
    test(x, y);
}

== Quote from Walter Bright (newshound2 digitalmars.com)'s articleThe dmd optimizer is careful not to reorder evaluation in such a way as to change the results.That's a good thought. FP addition results can differ dramatically depending on associativity.And not to forget optimisations too. ;)
Apr 16 2011
On 4/16/2011 11:43 AM, Iain Buclaw wrote:I was thinking more of a case of FPU precision rather than ordering: as in you get a different result computing on SSE in double precision mode on the one hand, and by computing on x87 in double precision then writing to a double variable in memory.You're right on that one.
Apr 16 2011
On 4/15/2011 11:40 PM, Andrei Alexandrescu wrote:On 4/15/11 10:22 PM, dsimcha wrote:No. I realize floating point addition isn't associative, but unless there's some detail I'm forgetting about, the ordering is deterministic within each work unit and the ordering of the final summation is deterministic.I'm trying to debug an extremely strange bug whose symptoms appear in a std.parallelism example, though I'm not at all sure the root cause is in std.parallelism. The bug report is at https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 .Does the scheduling affect the summation order? Andrei
Apr 16 2011
On Fri, 15 Apr 2011 23:22:04 -0400, dsimcha <dsimcha yahoo.com> wrote:I'm trying to debug an extremely strange bug whose symptoms appear in a std.parallelism example, though I'm not at all sure the root cause is in std.parallelism. The bug report is at https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 . Basically, the example in question sums up all the elements of a lazy range (actually, std.algorithm.map) in parallel. It uses taskPool.reduce, which divides the summation into work units to be executed in parallel. When executed in parallel, the results of the summation are non-deterministic after about the 12th decimal place, even though all of the following properties are true: 1. The work is divided into work units in a deterministic fashion. 2. Within each work unit, the summation happens in a deterministic order. 3. The final summation of the results of all the work units is done in a deterministic order. 4. The smallest term in the summation is about 5e-10. This means the difference across runs is about two orders of magnitude smaller than the smallest term. It can't be a concurrency bug where some terms sometimes get skipped. 5. The results for the individual tasks, not just the final summation, differ in the low-order bits. Each task is executed in a single thread. 6. The rounding mode is apparently the same in all of the threads. 7. The bug appears even on machines with only one core, as long as the number of task pool threads is manually set to >0. Since it's a single core machine, it can't be a low level memory model issue. What could possibly cause such small, non-deterministic differences in floating point results, given everything above? I'm just looking for suggestions here, as I don't even know where to start hunting for a bug like this.Well, on one hand floating point math is not commutative, and running sums have many known issues (I'd recommend looking up Kahan summation). On the other hand, it should be repeatably different. As for suggestions? First and foremost, you should always add small to large, so try using iota(n-1,-1,-1) instead of iota(n). Not only should the answer be better, but if your error rate goes down, you have a good idea of where the problem is. I'd also try isolating your implementation's numerics from the underlying concurrency, i.e. use a task pool of 1 and don't let the host thread join it, so the entire job is done by one worker. The other thing to try is isolating/removing map and iota from the equation.
Apr 15 2011
On 4/16/2011 2:09 AM, Robert Jacques wrote:On Fri, 15 Apr 2011 23:22:04 -0400, dsimcha <dsimcha yahoo.com> wrote:Right. For this example, though, assuming floating point math behaves like regular math is a good enough approximation. The issue isn't that the results aren't reasonably accurate. Furthermore, the results will always change slightly depending on how many work units you have. (I even warn in the documentation that floating point addition is not associative, though it is approximately associative in the well-behaved cases.) My only concern is whether this non-determinism represents some deep underlying bug. For a given work unit allocation (work unit allocations are deterministic and only change when the number of threads changes or they're changed explicitly), I can't figure out how scheduling could change the results at all. If I could be sure that it wasn't a symptom of an underlying bug in std.parallelism, I wouldn't care about this small amount of numerical fuzz. Floating point math is always inexact and parallel summation by its nature can't be made to give the exact same results as serial summation.I'm trying to debug an extremely strange bug whose symptoms appear in a std.parallelism example, though I'm not at all sure the root cause is in std.parallelism. The bug report is at https://github.com/dsimcha/std.parallelism/issues/1#issuecomment-1011717 . Basically, the example in question sums up all the elements of a lazy range (actually, std.algorithm.map) in parallel. It uses taskPool.reduce, which divides the summation into work units to be executed in parallel. When executed in parallel, the results of the summation are non-deterministic after about the 12th decimal place, even though all of the following properties are true: 1. The work is divided into work units in a deterministic fashion. 2. Within each work unit, the summation happens in a deterministic order. 3. The final summation of the results of all the work units is done in a deterministic order. 4. The smallest term in the summation is about 5e-10. This means the difference across runs is about two orders of magnitude smaller than the smallest term. It can't be a concurrency bug where some terms sometimes get skipped. 5. The results for the individual tasks, not just the final summation, differ in the low-order bits. Each task is executed in a single thread. 6. The rounding mode is apparently the same in all of the threads. 7. The bug appears even on machines with only one core, as long as the number of task pool threads is manually set to >0. Since it's a single core machine, it can't be a low level memory model issue. What could possibly cause such small, non-deterministic differences in floating point results, given everything above? I'm just looking for suggestions here, as I don't even know where to start hunting for a bug like this.Well, on one hand floating point math is not cumulative, and running sums have many known issues (I'd recommend looking up Khan summation). On the hand, it should be repeatably different. As for suggestions? First and foremost, you should always add small to large, so try using iota(n-1,-1,-1) instead of iota(n). Not only should the answer be better, but if your error rate goes down, you have a good idea of where the problem is. I'd also try isolating your implementation's numerics, from the underlying concurrency. i.e. use a task pool of 1 and don't let the host thread join it, so the entire job is done by one worker. 
The other thing to try is isolating/removing map and iota from the equation.
Apr 16 2011
On 4/16/2011 10:11 AM, dsimcha wrote:My only concern is whether this non-determinism represents some deep underlying bug. For a given work unit allocation (work unit allocations are deterministic and only change when the number of threads changes or they're changed explicitly), I can't figure out how scheduling could change the results at all. If I could be sure that it wasn't a symptom of an underlying bug in std.parallelism, I wouldn't care about this small amount of numerical fuzz. Floating point math is always inexact and parallel summation by its nature can't be made to give the exact same results as serial summation.Ok, it's definitely **NOT** a bug in std.parallelism. Here's a reduced test case that only uses core.thread, not std.parallelism. All it does is sum an array using std.algorithm.reduce from the main thread, then start a new thread to do the same thing and compare answers. At the beginning of the summation function the rounding mode is printed to verify that it's the same for both threads. The two threads get slightly different answers. Just to thoroughly rule out a concurrency bug, I didn't even let the two threads execute concurrently. The main thread produces its result, then starts and immediately joins the second thread.

import std.algorithm, core.thread, std.stdio, core.stdc.fenv;

real sumRange(const(real)[] range)
{
    writeln("Rounding mode: ", fegetround);  // 0 from both threads.
    return reduce!"a + b"(range);
}

void main()
{
    immutable n = 1_000_000;
    immutable delta = 1.0 / n;
    auto terms = new real[1_000_000];

    foreach(i, ref term; terms)
    {
        immutable x = ( i - 0.5 ) * delta;
        term = delta / ( 1.0 + x * x ) * 1;
    }

    immutable res1 = sumRange(terms);
    writefln("%.19f", res1);

    real res2;
    auto t = new Thread( { res2 = sumRange(terms); } );
    t.start();
    t.join();
    writefln("%.19f", res2);
}

Output:

Rounding mode: 0
0.7853986633972191094
Rounding mode: 0
0.7853986633972437348
Apr 16 2011
On 4/16/11 9:52 AM, dsimcha wrote:Output: Rounding mode: 0 0.7853986633972191094 Rounding mode: 0 0.7853986633972437348I think at this precision the difference may be in random bits. Probably you don't need to worry about it. Andrei
Apr 16 2011
On 4/16/2011 10:55 AM, Andrei Alexandrescu wrote:On 4/16/11 9:52 AM, dsimcha wrote:"random bits"? I am fully aware that these low order bits are numerical fuzz and are meaningless from a practical perspective. I am only concerned because I thought these bits are supposed to be deterministic even if they're meaningless. Now that I've ruled out a bug in std.parallelism, I'm wondering if it's a bug in druntime or DMD.Output: Rounding mode: 0 0.7853986633972191094 Rounding mode: 0 0.7853986633972437348I think at this precision the difference may be in random bits. Probably you don't need to worry about it. Andrei
Apr 16 2011
On 4/16/11 9:59 AM, dsimcha wrote:On 4/16/2011 10:55 AM, Andrei Alexandrescu wrote:I seem to remember that essentially some of the last bits printed in such a result are essentially arbitrary. I forgot what could cause this. AndreiOn 4/16/11 9:52 AM, dsimcha wrote:"random bits"? I am fully aware that these low order bits are numerical fuzz and are meaningless from a practical perspective. I am only concerned because I thought these bits are supposed to be deterministic even if they're meaningless. Now that I've ruled out a bug in std.parallelism, I'm wondering if it's a bug in druntime or DMD.Output: Rounding mode: 0 0.7853986633972191094 Rounding mode: 0 0.7853986633972437348I think at this precision the difference may be in random bits. Probably you don't need to worry about it. Andrei
Apr 16 2011
On 4/16/2011 9:52 AM, Andrei Alexandrescu wrote:I seem to remember that essentially some of the last bits printed in such a result are essentially arbitrary. I forgot what could cause this.To see what the exact bits are, print using %A format. In any case, floating point bits are not random. They are completely deterministic.
Apr 16 2011
== Quote from dsimcha (dsimcha yahoo.com)'s articleOn 4/16/2011 10:11 AM, dsimcha wrote: Output: Rounding mode: 0 0.7853986633972191094 Rounding mode: 0 0.7853986633972437348This is not something I can replicate on my workstation.
Apr 16 2011
On 4/16/2011 11:16 AM, Iain Buclaw wrote:== Quote from dsimcha (dsimcha yahoo.com)'s articleInteresting. Since I know you're a Linux user, I fired up my Ubuntu VM and tried out my test case. I can't reproduce it either on Linux, only on Windows.On 4/16/2011 10:11 AM, dsimcha wrote: Output: Rounding mode: 0 0.7853986633972191094 Rounding mode: 0 0.7853986633972437348This is not something I can replicate on my workstation.
Apr 16 2011
"dsimcha" <dsimcha yahoo.com> wrote in message news:iocalv$2h58$1 digitalmars.com...Output: Rounding mode: 0 0.7853986633972191094 Rounding mode: 0 0.7853986633972437348Could be something somewhere is getting truncated from real to double, which would mean 12 fewer bits of mantisa. Maybe the FPU is set to lower precision in one of the threads?
Apr 16 2011
Could be something somewhere is getting truncated from real to double, which would mean 12 fewer bits of mantissa. Maybe the FPU is set to lower precision in one of the threads?Yes indeed, this is a _Windows_ "bug". I have experienced this in Windows before: the main thread's FPU control word is initialized to lower precision (64-bit doubles) by default by the OS, presumably to make FP calculations faster. However, when you start a new thread, the FPU will use the whole 80 bits for computation because, curiously, the FPU is not reconfigured for those. Suggested fix: add asm{fninit;} to the beginning of your main function, and the difference between the two will be gone. This would be a compatibility issue between DMD and Windows which effectively disables the "real" data type. You might want to file a bug report for druntime if my suggested fix works. (This would imply that the real type was basically identical to the double type in Windows all along!)
Apr 16 2011
On 4/16/2011 2:15 PM, Timon Gehr wrote:Close: If I add this instruction to the function for the new thread, the difference goes away. The relevant statement is: auto t = new Thread( { asm { fninit; } res2 = sumRange(terms); } ); At any rate, this is a **huge** WTF that should probably be fixed in druntime. Once I understand it a little better, I'll file a bug report.Could be something somewhere is getting truncated from real to double, which would mean 12 fewer bits of mantisa. Maybe the FPU is set to lower precision in one of the threads?Yes indeed, this is a _Windows_ "bug". I have experienced this in Windows before, the main thread's FPU state register is initialized to lower FPU-Precision (64bits) by default by the OS, presumably to make FP calculations faster. However, when you start a new thread, the FPU will use the whole 80 bits for computation because, curiously, the FPU is not reconfigured for those. Suggested fix: Add asm{fninit}; to the beginning of your main function, and the difference between the two will be gone. This would be a compatibility issue DMD/windows which disables the "real" data type. You might want to file a bug report for druntime if my suggested fix works. (This would imply that the real type was basically identical to the double type in Windows all along!)
Apr 16 2011
On 4/16/2011 2:24 PM, dsimcha wrote:On 4/16/2011 2:15 PM, Timon Gehr wrote:Read up a little on what fninit does, etc. This is IMHO a druntime bug. Filed as http://d.puremagic.com/issues/show_bug.cgi?id=5847 .Close: If I add this instruction to the function for the new thread, the difference goes away. The relevant statement is: auto t = new Thread( { asm { fninit; } res2 = sumRange(terms); } ); At any rate, this is a **huge** WTF that should probably be fixed in druntime. Once I understand it a little better, I'll file a bug report.Could be something somewhere is getting truncated from real to double, which would mean 12 fewer bits of mantisa. Maybe the FPU is set to lower precision in one of the threads?Yes indeed, this is a _Windows_ "bug". I have experienced this in Windows before, the main thread's FPU state register is initialized to lower FPU-Precision (64bits) by default by the OS, presumably to make FP calculations faster. However, when you start a new thread, the FPU will use the whole 80 bits for computation because, curiously, the FPU is not reconfigured for those. Suggested fix: Add asm{fninit}; to the beginning of your main function, and the difference between the two will be gone. This would be a compatibility issue DMD/windows which disables the "real" data type. You might want to file a bug report for druntime if my suggested fix works. (This would imply that the real type was basically identical to the double type in Windows all along!)
Apr 16 2011
On Apr 16, 2011, at 11:43 AM, dsimcha wrote:Close: If I add this instruction to the function for the new thread, the difference goes away. The relevant statement is: auto t = new Thread( { asm { fninit; } res2 = sumRange(terms); } ); At any rate, this is a **huge** WTF that should probably be fixed in druntime. Once I understand it a little better, I'll file a bug report.Read up a little on what fninit does, etc. This is IMHO a druntime bug. Filed as http://d.puremagic.com/issues/show_bug.cgi?id=5847 .Really a Windows bug that should be fixed in druntime :-) I know I'm splitting hairs. This will be fixed for the next release.
Apr 16 2011
On 4/16/2011 11:51 AM, Sean Kelly wrote:On Apr 16, 2011, at 11:43 AM, dsimcha wrote:The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.Really a Windows bug that should be fixed in druntime :-) I know I'm splitting hairs. This will be fixed for the next release.Close: If I add this instruction to the function for the new thread, the difference goes away. The relevant statement is: auto t = new Thread( { asm { fninit; } res2 = sumRange(terms); } ); At any rate, this is a **huge** WTF that should probably be fixed in druntime. Once I understand it a little better, I'll file a bug report.Read up a little on what fninit does, etc. This is IMHO a druntime bug. Filed as http://d.puremagic.com/issues/show_bug.cgi?id=5847 .
Apr 16 2011
Walter:The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.My congratulations to all the (mostly two) people involved in finding this bug and its causes :-) I'd like to see this module in Phobos. Bye, bearophile
Apr 16 2011
On Sat, 16 Apr 2011 15:32:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 4/16/2011 11:51 AM, Sean Kelly wrote:The documentation I've found on fninit seems to indicate it defaults to 64-bit precision, which means that by default we aren't seeing the benefit of D's reals. I'd much prefer 80-bit precision by default.On Apr 16, 2011, at 11:43 AM, dsimcha wrote:The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.Really a Windows bug that should be fixed in druntime :-) I know I'm splitting hairs. This will be fixed for the next release.Close: If I add this instruction to the function for the new thread, the difference goes away. The relevant statement is: auto t = new Thread( { asm { fninit; } res2 = sumRange(terms); } ); At any rate, this is a **huge** WTF that should probably be fixed in druntime. Once I understand it a little better, I'll file a bug report.Read up a little on what fninit does, etc. This is IMHO a druntime bug. Filed as http://d.puremagic.com/issues/show_bug.cgi?id=5847 .
Apr 16 2011
On Apr 16, 2011, at 1:02 PM, Robert Jacques wrote:On Sat, 16 Apr 2011 15:32:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.The documentation I've found on fninit seems to indicate it defaults to 64-bit precision, which means that by default we aren't seeing the benefit of D's reals. I'd much prefer 80-bit precision by default.There is no option to set "80-bit precision" via the FPU control word. Section 8.1.5.2 of the Intel 64 SDM says the following: "The precision-control (PC) field (bits 8 and 9 of the x87 FPU control word) determines the precision (64, 53, or 24 bits) of floating-point calculations made by the x87 FPU (see Table 8-2). The default precision is double extended precision, which uses the full 64-bit significand available with the double extended-precision floating-point format of the x87 FPU data registers. This setting is best suited for most applications, because it allows applications to take full advantage of the maximum precision available with the x87 FPU data registers." So it sounds like finit/fninit does what we want.
Apr 19 2011
On Tue, 19 Apr 2011 14:18:46 -0400, Sean Kelly <sean invisibleduck.org> wrote:On Apr 16, 2011, at 1:02 PM, Robert Jacques wrote:Yes, that sounds right. Thanks for clarifying.On Sat, 16 Apr 2011 15:32:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:There is no option to set "80-bit precision" via the FPU control word. Section 8.1.5.2 of the Intel 64 SDM says the following: "The precision-control (PC) field (bits 8 and 9 of the x87 FPU control word) determines the precision (64, 53, or 24 bits) of floating-point calculations made by the x87 FPU (see Table 8-2). The default precision is double extended precision, which uses the full 64-bit significand available with the double extended-precision floating-point format of the x87 FPU data registers. This setting is best suited for most applications, because it allows applications to take full advantage of the maximum precision available with the x87 FPU data registers." So it sounds like finit/fninit does what we want.The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.The documentation I've found on fninit seems to indicate it defaults to 64-bit precision, which means that by default we aren't seeing the benefit of D's reals. I'd much prefer 80-bit precision by default.
Apr 19 2011
Sean Kelly wrote:On Apr 16, 2011, at 1:02 PM, Robert Jacques wrote:On Sat, 16 Apr 2011 15:32:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:There is no option to set "80-bit precision" via the FPU control word.The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.??? Yes there is.

enum PrecisionControl : short {
    PRECISION80 = 0x300,
    PRECISION64 = 0x200,
    PRECISION32 = 0x000
};

/** Set the number of bits of precision used by 'real'.
 *
 * Returns: the old precision.
 * This is not supported on all platforms.
 */
PrecisionControl reduceRealPrecision(PrecisionControl prec)
{
    version(D_InlineAsm_X86)
    {
        short cont;
        asm
        {
            fstcw cont;
            mov CX, cont;
            mov AX, cont;
            and EAX, 0x0300; // Form the return value
            and CX, 0xFCFF;
            or CX, prec;
            mov cont, CX;
            fldcw cont;
        }
    }
    else
    {
        assert(0, "Not yet supported");
    }
}

The documentation I've found on fninit seems to indicate it defaults to 64-bit precision, which means that by default we aren't seeing the benefit of D's reals. I'd much prefer 80-bit precision by default.
Apr 20 2011
On Apr 20, 2011, at 5:06 AM, Don wrote:Sean Kelly wrote:On Apr 16, 2011, at 1:02 PM, Robert Jacques wrote:On Sat, 16 Apr 2011 15:32:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.The documentation I've found on fninit seems to indicate it defaults to 64-bit precision, which means that by default we aren't seeing the benefit of D's reals. I'd much prefer 80-bit precision by default.There is no option to set "80-bit precision" via the FPU control word.??? Yes there is. enum PrecisionControl : short { PRECISION80 = 0x300, PRECISION64 = 0x200, PRECISION32 = 0x000 };So has Intel deprecated 80-bit FPU support? Why do the docs for this say that 64-bit is the highest precision? And more importantly, does this mean that we should be setting the PC field explicitly instead of relying on fninit? The docs say that fninit initializes to 64-bit precision. Or is that inaccurate as well?
Apr 20 2011
"Sean Kelly" <sean invisibleduck.org> wrote in message news:mailman.3597.1303316625.4748.digitalmars-d puremagic.com... On Apr 20, 2011, at 5:06 AM, Don wrote:Sean Kelly wrote:You misread the docs, it's talking about precision which is just the size of the mantisa, not the actual full size of the floating point data. IE... 80 float = 64 bit precision 64 float = 53 bit precision 32 float = 24 bit precisionOn Apr 16, 2011, at 1:02 PM, Robert Jacques wrote:??? Yes there is. enum PrecisionControl : short { PRECISION80 = 0x300, PRECISION64 = 0x200, PRECISION32 = 0x000 }; So has Intel deprecated 80-bit FPU support? Why do the docs for this say that 64-bit is the highest precision? And more importantly, does this mean that we should be setting the PC field explicitly instead of relying on fninit? The docs say that fninit initializes to 64-bit precision. Or is that inaccurate as well?=On Sat, 16 Apr 2011 15:32:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:There is no option to set "80-bit precision" via the FPU control word.The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.The documentation I've found on fninit seems to indicate it defaults to 64-bit precision, which means that by default we aren't seeing the benefit of D's reals. I'd much prefer 80-bit precision by default.
Apr 20 2011
On Apr 20, 2011, at 10:46 AM, JimBob wrote:"Sean Kelly" <sean invisibleduck.org> wrote in message news:mailman.3597.1303316625.4748.digitalmars-d puremagic.com... On Apr 20, 2011, at 5:06 AM, Don wrote:Sean Kelly wrote:On Apr 16, 2011, at 1:02 PM, Robert Jacques wrote:On Sat, 16 Apr 2011 15:32:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.The documentation I've found on fninit seems to indicate it defaults to 64-bit precision, which means that by default we aren't seeing the benefit of D's reals. I'd much prefer 80-bit precision by default.There is no option to set "80-bit precision" via the FPU control word.??? Yes there is. enum PrecisionControl : short { PRECISION80 = 0x300, PRECISION64 = 0x200, PRECISION32 = 0x000 };So has Intel deprecated 80-bit FPU support? Why do the docs for this say that 64-bit is the highest precision? And more importantly, does this mean that we should be setting the PC field explicitly instead of relying on fninit? The docs say that fninit initializes to 64-bit precision. Or is that inaccurate as well?You misread the docs, it's talking about precision, which is just the size of the mantissa, not the actual full size of the floating point data. I.e.: 80-bit float = 64 bit precision, 64-bit float = 53 bit precision, 32-bit float = 24 bit precision.Oops, you're right. So to summarize: fninit does what we want because it sets 64-bit precision, which is effectively 80-bit mode. Is this correct?
Apr 20 2011
Sean Kelly wrote:On Apr 20, 2011, at 10:46 AM, JimBob wrote:"Sean Kelly" <sean invisibleduck.org> wrote in message news:mailman.3597.1303316625.4748.digitalmars-d puremagic.com... On Apr 20, 2011, at 5:06 AM, Don wrote:Sean Kelly wrote:On Apr 16, 2011, at 1:02 PM, Robert Jacques wrote:On Sat, 16 Apr 2011 15:32:12 -0400, Walter Bright <newshound2 digitalmars.com> wrote:The dmd startup code (actually the C startup code) does an fninit. I never thought about new thread starts. So, yeah, druntime should do an fninit on thread creation.The documentation I've found on fninit seems to indicate it defaults to 64-bit precision, which means that by default we aren't seeing the benefit of D's reals. I'd much prefer 80-bit precision by default.There is no option to set "80-bit precision" via the FPU control word.??? Yes there is. enum PrecisionControl : short { PRECISION80 = 0x300, PRECISION64 = 0x200, PRECISION32 = 0x000 };So has Intel deprecated 80-bit FPU support? Why do the docs for this say that 64-bit is the highest precision? And more importantly, does this mean that we should be setting the PC field explicitly instead of relying on fninit? The docs say that fninit initializes to 64-bit precision. Or is that inaccurate as well?You misread the docs, it's talking about precision, which is just the size of the mantissa, not the actual full size of the floating point data. I.e.: 80-bit float = 64 bit precision, 64-bit float = 53 bit precision, 32-bit float = 24 bit precision.Oops, you're right. So to summarize: fninit does what we want because it sets 64-bit precision, which is effectively 80-bit mode. Is this correct?Yes.
Apr 20 2011