D - D floating point maths
- John Fletcher (4/4) Feb 07 2002 At the moment functions like sqrt() use the underlying C functions in
- Pavel Minayev (5/8) Feb 07 2002 Yes - write them =)
- Sean L. Palmer (9/13) Feb 08 2002 Actually can we have some functions like sin, cos, tan, and sqrt that de...
- John Fletcher (7/14) Feb 08 2002 I was pondering implementing the full precision versions I talked about. ...
- Pavel Minayev (7/12) Feb 08 2002 instructions
- Walter (4/7) Feb 08 2002 It usually just subtracts 12 from ESP and does an FST. With scheduling a...
- Russell Borogove (5/18) Feb 08 2002 Indeed on all counts, both extended and single-precision versions
- Pavel Minayev (3/6) Feb 08 2002 Not till we get inline asm working =)
- Sean L. Palmer (12/31) Feb 08 2002 I believe the common form of this stuff is to add "f" to the end of the ...
- Pavel Minayev (4/8) Feb 08 2002 name
- Juan Carlos Arevalo Baeza (6/15) Feb 08 2002 The suffix specifies the precision of the return type, which cannot b...
- Pavel Minayev (6/8) Feb 08 2002 Return type can be determined by the argument:
- Sean L. Palmer (4/13) Feb 08 2002 True true.
- Walter (4/8) Feb 08 2002 name
- Walter (7/21) Feb 08 2002 On Intel processors, the float and double math computations are not one ...
- Sean L. Palmer (12/16) Feb 08 2002 That's not true... but you have to set the CPU into low precision mode t...
- Walter (4/21) Feb 09 2002 Hmm. I didn't know that. -Walter
- Sean L. Palmer (46/74) Feb 09 2002 Here is a sample from the MSDN docs for VC++ 6.0 which illustrates this:
- Walter (9/86) Feb 09 2002 I know that you can reset the internal calculation precision. I did not ...
- Sean L. Palmer (8/11) Feb 09 2002 You may have read more recent docs than I have... last I thoroughly chec...
- Russell Borogove (8/14) Feb 09 2002 As an extension of item 2, note that in the FPU, they're not
- Walter (3/9) Feb 09 2002 Ok, I hadn't thought of that.
At the moment functions like sqrt() use the underlying C functions in double precision. Is there any way to have versions which work to extended precision? John
Feb 07 2002
"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message news:3C6268D1.C1E23080 aston.ac.uk...
> At the moment functions like sqrt() use the underlying C functions in
> double precision. Is there any way to have versions which work to
> extended precision?

Yes - write them =) The comment there says it's just a temporary solution. I'd expect all the math functions to be rewritten by the final release.
Feb 07 2002
Actually can we have some functions like sin, cos, tan, and sqrt that deal with float instead of double? In the world of games, speed is usually more important than accuracy, and I hate having to explicitly typecast back to float to avoid warnings. Another nice thing to have is reciprocal square root (most processors have this nowadays...); usually it's cheaper (and less accurate) than 1/sqrt(x). Sean

"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message news:3C6268D1.C1E23080 aston.ac.uk...
> At the moment functions like sqrt() use the underlying C functions in
> double precision. Is there any way to have versions which work to
> extended precision? John
Feb 08 2002
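The reciprocal square root Sean mentions is often approximated in game code rather than computed as 1/sqrt(x). The sketch below is not from this thread: it is a minimal illustration of the well-known bit-level initial guess refined by one Newton-Raphson step, and the function name `approx_rsqrt` is hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Approximate 1/sqrt(x) for positive finite x: a bit-level initial guess
// (the classic 0x5f3759df constant) refined by one Newton-Raphson step.
// Roughly 0.2% relative error -- cheaper, and less accurate, than 1.0f/sqrtf(x).
float approx_rsqrt(float x)
{
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);       // reinterpret the float's bit pattern
    i = 0x5f3759df - (i >> 1);           // crude estimate of the exponent/mantissa of x^(-1/2)
    float y;
    std::memcpy(&y, &i, sizeof y);
    return y * (1.5f - 0.5f * x * y * y); // one Newton iteration sharpens the guess
}
```

A second Newton step roughly squares the accuracy; SSE hardware later exposed the same idea directly as RSQRTSS.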
"Sean L. Palmer" wrote:
> Actually can we have some functions like sin, cos, tan, and sqrt that deal
> with float instead of double? In the world of games, speed is usually more
> important than accuracy and I hate having to explicitly typecast back to
> float to avoid warnings. Another nice thing to have is reciprocal square
> root (most processors have this nowadays...) usually it's cheaper (and
> less accurate) than 1/sqrt(x) Sean

I was pondering implementing the full precision versions I talked about. I think that the modern Intel and compatible chips have coprocessor instructions which actually do the work to full precision. Is it just a question of wrapping the correct input and output around that to do the different cases? There may be some error trapping as well. John
Feb 08 2002
"John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message news:3C63C2A2.CC837516 aston.ac.uk...
> I was pondering implementing the full precision versions I talked about.
> I think that the modern Intel and compatible chips have coprocessor
> instructions which actually do the work to full precision. Is it just a
> question of wrapping the correct input and output around that to do the
> different cases? There may be some error trapping as well.

Yes, AFAIK Intel FPUs do calculations in full precision anyhow. However, extended arguments have to be passed on the stack, and since they're 10 bytes long, you get three PUSHes (while a float would only take one).
Feb 08 2002
"Pavel Minayev" <evilone omen.ru> wrote in message news:a40jcl$1oso$1 digitaldaemon.com...
> Yes, AFAIK Intel FPUs do calculations in full precision anyhow. However,
> extended arguments have to be passed on the stack, and since they're
> 10 bytes long, you get three PUSHes (while a float would only take one).

It usually just subtracts 12 from ESP and does an FST. With scheduling and pipelining, the extra instruction frequently takes no extra time.
Feb 08 2002
Sean L. Palmer wrote:
> Actually can we have some functions like sin, cos, tan, and sqrt that deal
> with float instead of double? In the world of games, speed is usually more
> important than accuracy and I hate having to explicitly typecast back to
> float to avoid warnings. Another nice thing to have is reciprocal square
> root (most processors have this nowadays...) usually it's cheaper (and
> less accurate) than 1/sqrt(x)
>
> "John Fletcher" <J.P.Fletcher aston.ac.uk> wrote in message news:3C6268D1.C1E23080 aston.ac.uk...
>> At the moment functions like sqrt() use the underlying C functions in
>> double precision. Is there any way to have versions which work to
>> extended precision?

Indeed on all counts; both extended and single-precision versions of at least the more common math functions would be valuable. Who wants to get to work on that library? :) -Russell B
Feb 08 2002
"Russell Borogove" <kaleja estarcion.com> wrote in message news:3C63FBED.1080500 estarcion.com...
> Indeed on all counts, both extended and single-precision versions
> of at least the more common math functions would be valuable.
> Who wants to get to work on that library? :)

Not till we get inline asm working =)
Feb 08 2002
I believe the common form of this stuff is to add "f" to the end of the name: sqrtf, fabsf, fmodf, etc. Sean

"Russell Borogove" <kaleja estarcion.com> wrote in message news:3C63FBED.1080500 estarcion.com...
> Indeed on all counts, both extended and single-precision versions
> of at least the more common math functions would be valuable.
> Who wants to get to work on that library? :) -Russell B
Feb 08 2002
"Sean L. Palmer" <spalmer iname.com> wrote in message news:a4192g$2ihc$1 digitaldaemon.com...
> I believe the common form of this stuff is to add "f" to the end of the
> name: sqrtf, fabsf, fmodf

Why, if we have function overloading?
Feb 08 2002
"Pavel Minayev" <evilone omen.ru> wrote in message news:a419ev$2kdg$1 digitaldaemon.com...
> "Sean L. Palmer" <spalmer iname.com> wrote in message news:a4192g$2ihc$1 digitaldaemon.com...
>> I believe the common form of this stuff is to add "f" to the end of the
>> name: sqrtf, fabsf, fmodf
>
> Why, if we have function overloading?

The suffix specifies the precision of the return type, which cannot be overloaded on. Or am I wrong? Salutaciones, JCAB
Feb 08 2002
"Juan Carlos Arevalo Baeza" <jcab roningames.com> wrote in message news:a419vp$2kol$1 digitaldaemon.com...
> The suffix specifies the precision of the return type, which cannot be
> overloaded on. Or am I wrong?

The return type can be determined by the argument:

    float sqrt(float);
    double sqrt(double);
    extended sqrt(extended);
Feb 08 2002
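Pavel's overload set can be sketched in C++, whose argument-based overload resolution works the same way as D's here. These wrapper names are illustrative only, not part of any actual D library:

```cpp
#include <cassert>
#include <cmath>

// Overloading on the argument type lets each call return the matching
// precision, with no "f"/"l" suffix needed in the function name.
namespace m {
    float       my_sqrt(float x)       { return std::sqrt(x); } // single precision
    double      my_sqrt(double x)      { return std::sqrt(x); } // double precision
    long double my_sqrt(long double x) { return std::sqrt(x); } // extended precision
}
```

Calling `m::my_sqrt(4.0f)` picks the float overload and returns a float, so no typecast back is needed to avoid warnings.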
True true. Sean

"Pavel Minayev" <evilone omen.ru> wrote in message news:a419ev$2kdg$1 digitaldaemon.com...
> Why, if we have function overloading?
Feb 08 2002
"Sean L. Palmer" <spalmer iname.com> wrote in message news:a4192g$2ihc$1 digitaldaemon.com...
> I believe the common form of this stuff is to add "f" to the end of the
> name: sqrtf, fabsf, fmodf

Since D supports overloading by argument type, that is not necessary.
Feb 08 2002
On Intel processors, the float and double math computations are not one iota faster than the extended ones. The ONLY reasons to use float and double are:
1) compatibility with C
2) large arrays will use less space

"Sean L. Palmer" <spalmer iname.com> wrote in message news:a40c1s$1lfi$1 digitaldaemon.com...
> Actually can we have some functions like sin, cos, tan, and sqrt that deal
> with float instead of double? In the world of games, speed is usually more
> important than accuracy and I hate having to explicitly typecast back to
> float to avoid warnings. Another nice thing to have is reciprocal square
> root (most processors have this nowadays...) usually it's cheaper (and
> less accurate) than 1/sqrt(x) Sean
Feb 08 2002
That's not true... but you have to set the CPU into low precision mode to see the speed advantages. Otherwise it internally works with double precision by default. In game scenarios, we can't just go around wasting 8 bytes per number when 4 bytes will do. And it depends on the processor, as well. Floats are still definitely faster. For instance, the P4 can handle 2 doubles per instruction, but can do 4 floats in the same amount of time. Sean

"Walter" <walter digitalmars.com> wrote in message news:a41oen$2se5$4 digitaldaemon.com...
> On Intel processors, the float and double math computations are not one
> iota faster than the extended ones. The ONLY reasons to use float and
> double are:
> 1) compatibility with C
> 2) large arrays will use less space
Feb 08 2002
Hmm. I didn't know that. -Walter

"Sean L. Palmer" <spalmer iname.com> wrote in message news:a42k1f$6m1$1 digitaldaemon.com...
> That's not true... but you have to set the CPU into low precision mode to
> see the speed advantages. Otherwise it internally works with double
> precision by default. In game scenarios, we can't just go around wasting
> 8 bytes per number when 4 bytes will do. And it depends on the processor,
> as well. Floats are still definitely faster. For instance, the P4 can
> handle 2 doubles per instruction, but can do 4 floats in the same amount
> of time.
Feb 09 2002
Here is a sample from the MSDN docs for VC++ 6.0 which illustrates this. (You can do timings yourself if you wish... it only affects the x87 FPU coprocessor; I haven't tried this in a few years, so newer Pentium 4 processors may not see much advantage from it.) However, using SSE2 it is still true that with one instruction you can process either 2 doubles or 4 floats. I believe the main advantage this provides is keeping the FPU from having to do so much work with complex calculations like division, square root, trig, etc. Fewer bits of precision need be computed. They can get away with fewer iterations, cheaper approximations, fewer terms in the Taylor series, etc. In a lot of cases 5 or 6 digits of precision is all we need. So don't get rid of the float type yet. ;) Sean

    /* CNTRL87.C: This program uses _control87 to output the control
     * word, set the precision to 24 bits, and reset the status to
     * the default.
     */
    #include <stdio.h>
    #include <float.h>

    void main( void )
    {
        double a = 0.1;

        /* Show original control word and do calculation. */
        printf( "Original: 0x%.4x\n", _control87( 0, 0 ) );
        printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

        /* Set precision to 24 bits and recalculate. */
        printf( "24-bit:   0x%.4x\n", _control87( _PC_24, MCW_PC ) );
        printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );

        /* Restore to default and recalculate. */
        printf( "Default:  0x%.4x\n", _control87( _CW_DEFAULT, 0xfffff ) );
        printf( "%1.1f * %1.1f = %.15e\n", a, a, a * a );
    }

Output:

    Original: 0x9001f
    0.1 * 0.1 = 1.000000000000000e-002
    24-bit:   0xa001f
    0.1 * 0.1 = 9.999999776482582e-003
    Default:  0x001f
    0.1 * 0.1 = 1.000000000000000e-002

"Walter" <walter digitalmars.com> wrote in message news:a42tca$hrc$2 digitaldaemon.com...
> Hmm. I didn't know that. -Walter
Feb 09 2002
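The 24-bit row in the MSDN sample can also be reproduced portably, without touching the x87 control word, by rounding the product to float explicitly. This sketch is not from the thread; the function names are hypothetical:

```cpp
#include <cassert>

// Square 0.1 keeping full double precision, vs. rounding the product to a
// 24-bit (float) significand -- the same rounding _PC_24 imposes on the x87.
double square_double(double a) { return a * a; }
float  square_single(double a) { return (float)(a * a); }
```

Printed with %.15e, `square_double(0.1)` gives 1.000000000000000e-02 and `square_single(0.1)` gives 9.999999776482582e-03, matching the sample's default and 24-bit rows: the double nearest 0.01 is not representable in 24 significand bits, so it rounds down.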
I know that you can reset the internal calculation precision. I did not know this affected execution time; I've not seen any hint of that in the Intel CPU documentation, though I could have just missed it.

"Sean L. Palmer" <spalmer iname.com> wrote in message news:a444db$14h4$1 digitaldaemon.com...
> Here is a sample from the MSDN docs for VC++ 6.0 which illustrates this:
> [sample code snipped]
> In a lot of cases 5 or 6 digits of precision is all we need. So don't get
> rid of the float type yet. ;)
Feb 09 2002
You may have read more recent docs than I have... the last time I thoroughly checked this out was on the Pentium 1. (In fact, Intel seems not to want to disclose instruction cycle counts anymore... it's hard to find this info in the latest P3 specs I've read, and I haven't read up on the P4 at all aside from the SSE stuff.) Sean

"Walter" <walter digitalmars.com> wrote in message news:a446n5$15hm$3 digitaldaemon.com...
> I know that you can reset the internal calculation precision. I did not
> know this affected execution time; I've not seen any hint of that in the
> Intel CPU documentation, though I could have just missed it.
Feb 09 2002
I suppose the definitive way is to write a benchmark.

"Sean L. Palmer" <spalmer iname.com> wrote in message news:a44i10$19tn$1 digitaldaemon.com...
> You may have read more recent docs than I have... the last time I
> thoroughly checked this out was on the Pentium 1. (In fact, Intel seems
> not to want to disclose instruction cycle counts anymore... it's hard to
> find this info in the latest P3 specs I've read, and I haven't read up on
> the P4 at all aside from the SSE stuff.)
Feb 09 2002
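A rough sketch of the benchmark Walter suggests (hypothetical, not from the thread): time a long chain of dependent divides at each precision, so the divide latency dominates rather than pipelining. Real x87 behavior also requires setting the precision control word as in the MSDN sample above; this sketch only does the portable timing part.

```cpp
#include <chrono>

// Time n dependent divisions at precision T. The dependency chain keeps
// successive divides from overlapping in the pipeline; the volatile sink
// keeps the compiler from deleting the loop as dead code.
template <typename T>
double time_divides(int n)
{
    T x = static_cast<T>(1.0);
    auto t0 = std::chrono::steady_clock::now();
    for (int i = 0; i < n; ++i)
        x = static_cast<T>(1.0) / (x + static_cast<T>(1.0)); // stays in (0,1], no overflow
    auto t1 = std::chrono::steady_clock::now();
    volatile T sink = x;
    (void)sink;
    return std::chrono::duration<double>(t1 - t0).count();
}
```

Comparing `time_divides<float>(10000000)` against `time_divides<double>(10000000)` would settle the dispute for a given CPU; per the latency tables posted later in the thread, the gap depends heavily on the processor generation.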
> You may have read more recent docs than I have... the last time I
> thoroughly checked this out was on the Pentium 1. (In fact, Intel seems
> not to want to disclose instruction cycle counts anymore... it's hard to
> find this info in the latest P3 specs I've read, and I haven't read up on
> the P4 at all aside from the SSE stuff.)

That's all I found in the Intel optimization manuals:

FDIV latency in cycles (single, double, extended):
    Pentium Pro : 17, 36, 56
    Pentium 2,3 : 18, 32, 38
    Pentium 4   : 23, 38, 43

FSQRT latency in cycles (single, double, extended):
    Pentium 4   : 23, 38, 43

Btw, it is highly not recommended to do any performance-sensitive calculations on x86 in extended precision. There are no reg-mem floating point instructions for 80-bit floats; everything has to be compiled into ( FLD / stack operations / FST ) form. Besides, extended precision FLD & FST are much slower than single/double precision.
Feb 10 2002
The same info with additions and corrections:

FDIV latency in cycles (single, double, extended):
    Pentium Pro : 17, 36, 56
    Pentium 2,3 : 18, 32, 38
    Pentium 4   : 23, 38, 43
    Athlon (K7) : 16, 20, 24

FSQRT latency in cycles (single, double, extended):
    Pentium 4   : 23, 38, 43
    Athlon (K7) : 19, 27, 35

FLD latency in cycles (single, double, extended):
    Athlon (K7) : 2, 2, 10

FSTP latency in cycles (single, double, extended):
    Athlon (K7) : 4, 4, 8

I have no info about FLD/FSTP latency on the Pentium Pro..4, only the number of micro-ops:

Number of micro-ops (single, double, extended):
    FLD  : 1, 1, 4
    FSTP : 2, 2, complex instruction

> Btw, it is highly not recommended to do any performance-sensitive
> calculations on x86 in extended precision. There are no reg-mem floating
> point instructions for 80-bit floats; everything has to be compiled into
> ( FLD / stack operations / FST ) form. Besides, extended precision FLD &
> FST are much slower than single/double precision.

It should be ( FLD / stack operations / FSTP ), since there is no FST for extended precision. That means there is no way to store a result in memory without throwing it out of the FPU stack.
Feb 10 2002
Walter wrote:
> On Intel processors, the float and double math computations are not one
> iota faster than the extended ones. The ONLY reasons to use float and
> double are:
> 1) compatibility with C
> 2) large arrays will use less space

As an extension of item 2, note that in the FPU they're not one iota faster, but getting thousands of floats into and out of level-1 cache is much faster than doubles or extendeds. That's the main reason that 3D graphics and high-end audio applications today use floats instead of the fatter formats. -RB
Feb 09 2002
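Russell's working-set arithmetic is easy to make concrete. A hypothetical sketch for a vertex buffer of 3-component vectors, assuming the usual 4-byte float and 8-byte double of mainstream platforms:

```cpp
#include <cassert>
#include <cstddef>

// Bytes occupied by n_vec3 three-component vectors at each precision.
// Assumes sizeof(float) == 4 and sizeof(double) == 8, which holds on
// mainstream platforms but is not guaranteed by the C++ standard.
std::size_t buffer_bytes_float(std::size_t n_vec3)  { return n_vec3 * 3 * sizeof(float); }
std::size_t buffer_bytes_double(std::size_t n_vec3) { return n_vec3 * 3 * sizeof(double); }
```

100,000 vec3s cost about 1.2 MB as floats versus 2.4 MB as doubles; against the 8-64 KB L1 data caches of the era, halving the working set dwarfs any per-operation FPU difference.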
"Russell Borogove" <kaleja estarcion.com> wrote in message news:3C6590B5.1000602 estarcion.com...
> As an extension of item 2, note that in the FPU, they're not
> one iota faster, but getting thousands of floats into and out
> of level-1 cache is much faster than doubles or extendeds.
> That's the main reason that 3D graphics and high-end audio
> applications, today, use floats instead of the fatter formats.

Ok, I hadn't thought of that.
Feb 09 2002