digitalmars.D - Exotic floor() function - D is different
- Bob W (43/43) Mar 28 2005 The floor() function in D does not produce equivalent
- Walter (20/50) Mar 30 2005 What you're seeing is the result of using 80 bit precision, which is wha...
- Derek Parnell (42/104) Mar 31 2005 I can follow what you say, but can you explain the output of the program
- Bob W (13/124) Mar 31 2005 Great job! I could not believe it first:
- Walter (23/29) Apr 01 2005 I suggest in general viewing how these things work (floating, chopping,
- Bob W (29/90) Mar 31 2005 Thank you for your information, Walter.
- Walter (28/46) Apr 01 2005 Not true, it fully supports 80 bits.
- Derek Parnell (45/109) Apr 01 2005 I repeat, (I think) I understand what you are saying but can you explain
- Walter (4/5) Apr 01 2005 No, there isn't. The reason for the difference is when you assign the
- Derek Parnell (16/22) Apr 01 2005 Ok, I did that. And I still can't explain the output.
- Derek Parnell (43/43) Apr 01 2005 On Sat, 2 Apr 2005 15:39:01 +1000, Derek Parnell wrote:
- Walter (12/13) Apr 02 2005 Recall that, at runtime, the intermediate values are allowed to be carri...
- Bob W (9/22) Apr 02 2005 It's C legacy hidden in the way the compiler parses
- Derek Parnell (11/33) Apr 02 2005 Got it.
- Walter (3/8) Apr 02 2005 It's the way C works.
- Derek Parnell (7/17) Apr 02 2005 I understand. And here I was thinking that D was meant to be better than...
- Bob W (42/76) Apr 02 2005 Some further info:
- Walter (24/28) Apr 03 2005 Actually, many languages, mathematical programs, and even C compilers ha...
- Anders F Björklund (13/16) Apr 03 2005 The thing is that the D "real" type does *not* guarantee 80 bits ?
- Walter (12/28) Apr 03 2005 as I
- Anders F Björklund (24/38) Apr 03 2005 Me neither. Emulating 64-bit integers with two 32-bit registers is OK,
- Walter (11/33) Apr 03 2005 Yes, I believe that is better. Every once in a while, an app *does* care...
- Anders F Björklund (20/32) Apr 04 2005 I just fail to see how real -> double/extended, is any different from
- Georg Wrede (7/12) Apr 04 2005 Size can be anything divisible by 8 bits, i.e. any number of bytes.
- Anders F Björklund (9/24) Apr 04 2005 OK, seems like my sloppy syntax is hurting me once again... :-P
- Ben Hinkle (7/22) Apr 04 2005 What happens when someone declares a variable as quadruple on a platform...
- Anders F Björklund (4/9) Apr 04 2005 Choke... Splutter... Die.
- Anders F Björklund (7/10) Apr 04 2005 Just to be perfectly clear:
- Ben Hinkle (6/16) Apr 04 2005 yup, I read it that way - though I did notice I spluttered a bit this
- Anders F Björklund (13/15) Apr 04 2005 That is actually *not* needless to say,
- Bob W (6/21) Apr 04 2005 The IEEE 754r suggests that there won't be
- Anders F Björklund (6/14) Apr 04 2005 According to Sun, Microsoft, IBM and Apple
- Walter (4/10) Apr 04 2005 I fear it will be constant struggle to keep the chipmakers from dropping...
- Charles Hixson (8/40) Apr 05 2005 Perhaps Ada has the right idea here. Have a system default that
- Walter (21/43) Apr 04 2005 care,
- Anders F Björklund (12/29) Apr 04 2005 Interesting view of it, but I think that int fixed-point math degrades
- Anders F Björklund (3/7) Apr 04 2005 Make that "Linux on X86 aligns to 4 bytes, by making the size 12".
- Charles Hixson (8/34) Apr 05 2005 Would implementing fixed point arithmetic improve that? Even
- Walter (5/12) Apr 05 2005 If using a 128 bit fixed point would work, then one can use integer
- Bob W (37/75) Apr 03 2005 I am probably looking like an extended precision
- Bob W (57/106) Apr 01 2005 I still don't buy that.
The floor() function in D does not produce results equivalent to those of a number of other languages tested. The other languages were: dmc, djgpp, dmdscript, jscript, assembler ('87 code). The biggest surprise was that neither dmc nor dmdscript was able to match the D results. The sample program below gets an input from the command line, converts it, multiplies it by 1e6 and adds 0.5 before calling the floor() function. The expected result, based on an input of 0.0000195, would be 20.0, but D thinks it should be 19.0. Since 0.0000195 cannot be represented accurately in any of the usual floating point formats, the somewhat unique D result is probably not even a bug. But it is a major inconvenience when comparing numerical outputs produced by different programs. So far I have been unable to reproduce the rounding issue in D with any other language tested. (I have even tried OpenOffice to check.) Before someone tells me that D uses a different floating point format, I'd like to mention that I have used float, double and long double in the equivalent C programs without any changes.

//------------------------------
import std.stdio, std.string, std.math;

int main(char[][] av)
{
    if (av.length != 2) { printf("\nEnter Val! (e.g. 0.0000195)\n"); return(0); }
    double x = atof(av[1]);   // expecting 0.0000195
    writef("         x*1e6:%12.6f\n", x*1e6);
    writef("    floor(x..):%12.6f\n", floor(1e6*x));
    writef(" floor(.5+x..):%12.6f\n", floor(.5 + 1e6*x));
    writef(" floor(.5+co.):%12.6f\n", floor(.5 + 1e6*0.0000195));
    return(0);
}
Mar 28 2005
"Bob W" <nospam aol.com> wrote in message news:d2aash$a4s$1 digitaldaemon.com...The floor() function in D does not produce equivalent results compared to a bunch of other languages tested. The other languages were: dmc djgpp dmdscript jscript assembler ('87 code) The biggest surprise was that neither dmc nor dmdscript were able to match the D results. The sample program below gets an input from the command line, converts it, multiplies it with 1e6 and adds 0.5 before calling the floor() function. The expected result, based on an input of 0.0000195, would be 20.0, but D thinks it should be 19.0. Since 0.0000195 cannot be represented accurately in any of the usual floating point formats, the somewhat unique D result is probably not even a bug. But it is a major inconvenience when comparing numerical outputs produced by different programs. So far I was unable to reproduce the rounding issue in D with any other language tested. (I have even tried OpenOffice to check.) Before someone tells me that D uses a different floating point format, I'd like to mention that I have used float, double and long double in the equivalent C programs without any changes.What you're seeing is the result of using 80 bit precision, which is what D uses in internal calculations. .0000195 is not represented exactly, to print the number it is rounded. So, depending on how many bits of precision there are in the representation, it might be one bit, 63 bits to the right, under "5", so floor() will chop it down. Few C compilers support 80 bit long doubles, they implement them as 64 bit ones. Very few programs use 80 bit reals. The std.math.floor function uses 80 bit precision. 
If you want to use the C 64 bit one instead, add this declaration: extern (C) double floor(double); Then the results are: x*1e6: 19.500000 floor(x..): 19.000000 floor(.5+x..): 20.000000 floor(.5+co.): 20.000000 I suggest that while it's a reasonable thing to require a minimum number of floating point bits for a computation, it's probably not a good idea to require a maximum.
Mar 30 2005
On Wed, 30 Mar 2005 21:43:07 -0800, Walter wrote:"Bob W" <nospam aol.com> wrote in message news:d2aash$a4s$1 digitaldaemon.com...I can follow what you say, but can you explain the output of the program below? There appears to be a difference in the way variables and literals are treated. import std.stdio; import std.math; import std.string; void main() { float x; double y; real z; x = 0.0000195; y = 0.0000195; z = 0.0000195; writefln(" Raw Floor"); writefln("Using float variable: %12.6f %12.6f", (.5 + 1e6*x), floor(.5 + 1e6*x)); writefln("Using double variable: %12.6f %12.6f", (.5 + 1e6*y), floor(.5 + 1e6*y)); writefln("Using real variable: %12.6f %12.6f", (.5 + 1e6*z), floor(.5 + 1e6*z)); writefln("Using float literal: %12.6f %12.6f", (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f)); writefln("Using double literal: %12.6f %12.6f", (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195)); writefln("Using real literal: %12.6f %12.6f", (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l)); } ---------- I get the following output... ---------- Raw Floor Using float variable: 19.999999 19.000000 Using double variable: 20.000000 19.000000 Using real variable: 20.000000 19.000000 Using float literal: 19.999999 20.000000 Using double literal: 20.000000 20.000000 Using real literal: 20.000000 20.000000 -- Derek Melbourne, Australia 31/03/2005 6:43:48 PMThe floor() function in D does not produce equivalent results compared to a bunch of other languages tested. The other languages were: dmc djgpp dmdscript jscript assembler ('87 code) The biggest surprise was that neither dmc nor dmdscript were able to match the D results. The sample program below gets an input from the command line, converts it, multiplies it with 1e6 and adds 0.5 before calling the floor() function. The expected result, based on an input of 0.0000195, would be 20.0, but D thinks it should be 19.0. 
Since 0.0000195 cannot be represented accurately in any of the usual floating point formats, the somewhat unique D result is probably not even a bug. But it is a major inconvenience when comparing numerical outputs produced by different programs. So far I was unable to reproduce the rounding issue in D with any other language tested. (I have even tried OpenOffice to check.) Before someone tells me that D uses a different floating point format, I'd like to mention that I have used float, double and long double in the equivalent C programs without any changes.What you're seeing is the result of using 80 bit precision, which is what D uses in internal calculations. .0000195 is not represented exactly, to print the number it is rounded. So, depending on how many bits of precision there are in the representation, it might be one bit, 63 bits to the right, under "5", so floor() will chop it down. Few C compilers support 80 bit long doubles, they implement them as 64 bit ones. Very few programs use 80 bit reals. The std.math.floor function uses 80 bit precision. If you want to use the C 64 bit one instead, add this declaration: extern (C) double floor(double); Then the results are: x*1e6: 19.500000 floor(x..): 19.000000 floor(.5+x..): 20.000000 floor(.5+co.): 20.000000 I suggest that while it's a reasonable thing to require a minimum number of floating point bits for a computation, it's probably not a good idea to require a maximum.
Mar 31 2005
"Derek Parnell" <derek psych.ward> wrote in message news:7di6xztjokyz.6vnxzcx1d7l8.dlg 40tude.net...On Wed, 30 Mar 2005 21:43:07 -0800, Walter wrote:Great job! I could not believe it first: writefln("Using float literal: %12.6f %12.6f", (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f)); producing the following output: Using float variable: 19.999999 20.000000 Looks like floor() mutates to ceil() at times. To ensure that this is not "down under" specific (Melbourne), I have repeated your test in the northern hemisphere, and, not surprisingly, it did the same thing. Now I am pretty curious to know why this is happening. We'll see if Walter comes up with an answer ....."Bob W" <nospam aol.com> wrote in message news:d2aash$a4s$1 digitaldaemon.com...I can follow what you say, but can you explain the output of the program below? There appears to be a difference in the way variables and literals are treated. import std.stdio; import std.math; import std.string; void main() { float x; double y; real z; x = 0.0000195; y = 0.0000195; z = 0.0000195; writefln(" Raw Floor"); writefln("Using float variable: %12.6f %12.6f", (.5 + 1e6*x), floor(.5 + 1e6*x)); writefln("Using double variable: %12.6f %12.6f", (.5 + 1e6*y), floor(.5 + 1e6*y)); writefln("Using real variable: %12.6f %12.6f", (.5 + 1e6*z), floor(.5 + 1e6*z)); writefln("Using float literal: %12.6f %12.6f", (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f)); writefln("Using double literal: %12.6f %12.6f", (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195)); writefln("Using real literal: %12.6f %12.6f", (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l)); } ---------- I get the following output... 
---------- Raw Floor Using float variable: 19.999999 19.000000 Using double variable: 20.000000 19.000000 Using real variable: 20.000000 19.000000 Using float literal: 19.999999 20.000000 Using double literal: 20.000000 20.000000 Using real literal: 20.000000 20.000000 -- Derek Melbourne, Australia 31/03/2005 6:43:48 PMThe floor() function in D does not produce equivalent results compared to a bunch of other languages tested. The other languages were: dmc djgpp dmdscript jscript assembler ('87 code) The biggest surprise was that neither dmc nor dmdscript were able to match the D results. The sample program below gets an input from the command line, converts it, multiplies it with 1e6 and adds 0.5 before calling the floor() function. The expected result, based on an input of 0.0000195, would be 20.0, but D thinks it should be 19.0. Since 0.0000195 cannot be represented accurately in any of the usual floating point formats, the somewhat unique D result is probably not even a bug. But it is a major inconvenience when comparing numerical outputs produced by different programs. So far I was unable to reproduce the rounding issue in D with any other language tested. (I have even tried OpenOffice to check.) Before someone tells me that D uses a different floating point format, I'd like to mention that I have used float, double and long double in the equivalent C programs without any changes.What you're seeing is the result of using 80 bit precision, which is what D uses in internal calculations. .0000195 is not represented exactly, to print the number it is rounded. So, depending on how many bits of precision there are in the representation, it might be one bit, 63 bits to the right, under "5", so floor() will chop it down. Few C compilers support 80 bit long doubles, they implement them as 64 bit ones. Very few programs use 80 bit reals. The std.math.floor function uses 80 bit precision. 
If you want to use the C 64 bit one instead, add this declaration: extern (C) double floor(double); Then the results are: x*1e6: 19.500000 floor(x..): 19.000000 floor(.5+x..): 20.000000 floor(.5+co.): 20.000000 I suggest that while it's a reasonable thing to require a minimum number of floating point bits for a computation, it's probably not a good idea to require a maximum.
Mar 31 2005
"Bob W" <nospam aol.com> wrote in message news:d2i3et$27dg$1 digitaldaemon.com...Great job! I could not believe it first: writefln("Using float literal: %12.6f %12.6f", (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f)); producing the following output: Using float variable: 19.999999 20.000000 We'll see if Walter comes up with an answer .....I suggest that in general the way to view how these things work (floating, chopping, rounding, precision, etc.) is to print things using the %a format (which prints out ALL the bits in hexadecimal format). As to the specific case above, let's break down each (using suffix 'd' to represent double): (.5 + 1e6*0.0000195f) => (.5d + 1e6d * cast(double)0.0000195f), result is double floor(.5 + 1e6*0.0000195f) => floor(cast(real)(.5d + 1e6d * cast(double)0.0000195f)), result is real When writef prints a real, it adds ".5" to the last significant decimal digit and chops. This will give DIFFERENT results for a double and for a real. It's also DIFFERENT from the binary rounding that goes on in intermediate floating point calculations, which adds "half a bit" (not .5) and chops. Also, realize that internally to the FPU, a "guard bit" and a "sticky bit" are maintained for a floating point value; these influence rounding, and are discarded when a value leaves the FPU and is written to memory. What is happening here is that you start with a value that is not exactly representable, put it through a series of precision changes and roundings, and compare it with the result of a different series of precision changes and roundings, expecting the results to match bit for bit. There's no way to make that happen.
Apr 01 2005
"Walter" <newshound digitalmars.com> wrote in message news:d2g9jj$8om$1 digitaldaemon.com..."Bob W" <nospam aol.com> wrote in message news:d2aash$a4s$1 digitaldaemon.com...Thank you for your information, Walter. However, I am not convinced that the culprit is the 80-bit floating point format. This is due to some tests I have made programming the FPU directly. Based on my above stated example, the 80 bit format is perfectly capable of generating the 'mainstream result' of 20 as opposed to the lone 19 which D is producing. Some more info, which might lead to the real problem: - D is not entirely 80-bit based as claimed. - Literals are converted to 64 bit first (and from there to 80 bits) at compile time if no suffix is used, even if the target is of type 'real'. - atof() for example is returning a 'real' value which is obviously derived from a 'double', thus missing some essential bits at the end. Example: The hex value for 0.0000195 in 'real' can be expressed as 3fef a393ee5e edcc20d5 or 3fef a393ee5e edcc20d6 (due to the non-decimal fraction). The same value converted from a 'double' would be 3fef a393ee5e edcc2000 and therefore misses several trailing bits. This could cause the floor() function to misbehave. I hope this info was somewhat useful. Cheers.The floor() function in D does not produce equivalent results compared to a bunch of other languages tested. The other languages were: dmc djgpp dmdscript jscript assembler ('87 code) The biggest surprise was that neither dmc nor dmdscript were able to match the D results. The sample program below gets an input from the command line, converts it, multiplies it with 1e6 and adds 0.5 before calling the floor() function. The expected result, based on an input of 0.0000195, would be 20.0, but D thinks it should be 19.0. 
But it is a major inconvenience when comparing numerical outputs produced by different programs. So far I was unable to reproduce the rounding issue in D with any other language tested. (I have even tried OpenOffice to check.) Before someone tells me that D uses a different floating point format, I'd like to mention that I have used float, double and long double in the equivalent C programs without any changes.What you're seeing is the result of using 80 bit precision, which is what D uses in internal calculations. .0000195 is not represented exactly, to print the number it is rounded. So, depending on how many bits of precision there are in the representation, it might be one bit, 63 bits to the right, under "5", so floor() will chop it down. Few C compilers support 80 bit long doubles, they implement them as 64 bit ones. Very few programs use 80 bit reals. The std.math.floor function uses 80 bit precision. If you want to use the C 64 bit one instead, add this declaration: extern (C) double floor(double); Then the results are: x*1e6: 19.500000 floor(x..): 19.000000 floor(.5+x..): 20.000000 floor(.5+co.): 20.000000 I suggest that while it's a reasonable thing to require a minimum number of floating point bits for a computation, it's probably not a good idea to require a maximum.
Mar 31 2005
"Bob W" <nospam aol.com> wrote in message news:d2ieh5$2ksl$1 digitaldaemon.com...- D is not entirely 80-bit based as claimed.Not true, it fully supports 80 bits.- Literals are converted to 64 bit first (and from there to 80 bits) at compile time if no suffix is used, even if the target is of type 'real'.Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".- atof() for example is returning a 'real' value which is obviously derived from a 'double', thus missing some essential bits at the end.Check out std.math2.atof(). It's fully 80 bit.Example: The hex value for 0.0000195 in 'real' can be expressed as 3fef a393ee5e edcc20d5 or 3fef a393ee5e edcc20d6 (due to the non-decimal fraction). The same value converted from a 'double' would be 3fef a393ee5e edcc2000 and therefore misses several trailing bits. This could cause the floor() function to misbehave. I hope this info was somewhat useful.Perhaps the following program will help: import std.stdio; void main() { writefln("float %a", 0.0000195F); writefln("double %a", 0.0000195); writefln("real %a", 0.0000195L); writefln("cast(real)float %a", cast(real)0.0000195F); writefln("cast(real)double %a", cast(real)0.0000195); writefln("cast(real)real %a", cast(real)0.0000195L); writefln("float %a", 0.0000195F * 7 - 195); writefln("double %a", 0.0000195 * 7 - 195); writefln("real %a", 0.0000195L * 7 - 195); } float 0x1.4727dcp-16 double 0x1.4727dcbddb984p-16 real 0x1.4727dcbddb9841acp-16 cast(real)float 0x1.4727dcp-16 cast(real)double 0x1.4727dcbddb984p-16 cast(real)real 0x1.4727dcbddb9841acp-16 float -0x1.85ffeep+7 double -0x1.85ffee1bd1edap+7 real -0x1.85ffee1bd1ed9dfep+7
Apr 01 2005
On Fri, 1 Apr 2005 15:03:02 -0800, Walter wrote:"Bob W" <nospam aol.com> wrote in message news:d2ieh5$2ksl$1 digitaldaemon.com...I repeat, (I think) I understand what you are saying but can you explain the output of this ... <code> import std.stdio; import std.math; import std.string; void main() { float x; double y; real z; x = 0.0000195; y = 0.0000195; z = 0.0000195; writefln(" %24s %24s","Raw","Floor"); writefln("Using float variable: %24a %24a", (.5 + 1e6*x), floor(.5 + 1e6*x)); writefln("Using double variable: %24a %24a", (.5 + 1e6*y), floor(.5 + 1e6*y)); writefln("Using real variable: %24a %24a", (.5 + 1e6*z), floor(.5 + 1e6*z)); writefln("Using float literal: %24a %24a", (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f)); writefln("Using double literal: %24a %24a", (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195)); writefln("Using real literal: %24a %24a", (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l)); } </code> ______________ Output is ... Raw Floor Using float variable: 0x1.3fffff4afp+4 0x1.3p+4 Using double variable: 0x1.4p+4 0x1.3p+4 Using real variable: 0x1.3ffffffffffffe68p+4 0x1.3p+4 Using float literal: 0x1.3fffff4afp+4 0x1.4p+4 Using double literal: 0x1.4p+4 0x1.4p+4 Using real literal: 0x1.4000000000000002p+4 0x1.4p+4 There seems to be different treatment of literals and variables. Even apart from that, given the values above, I can understand the floor behaviour except for lines 2(double variable) and 6 (real literal). -- Derek Parnell Melbourne, Australia 2/04/2005 10:19:43 AM- D is not entirely 80-bit based as claimed.Not true, it fully supports 80 bits.- Literals are converted to 64 bit first (and from there to 80 bits) at compile time if no suffix is used, even if the target is of type 'real'.Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".- atof() for example is returning a 'real' value which is obviously derived from a 'double', thus missing some essential bits at the end.Check out std.math2.atof(). 
It's fully 80 bit.Example: The hex value for 0.0000195 in 'real' can be expressed as 3fef a393ee5e edcc20d5 or 3fef a393ee5e edcc20d6 (due to the non-decimal fraction). The same value converted from a 'double' would be 3fef a393ee5e edcc2000 and therefore misses several trailing bits. This could cause the floor() function to misbehave. I hope this info was somewhat useful.Perhaps the following program will help: import std.stdio; void main() { writefln("float %a", 0.0000195F); writefln("double %a", 0.0000195); writefln("real %a", 0.0000195L); writefln("cast(real)float %a", cast(real)0.0000195F); writefln("cast(real)double %a", cast(real)0.0000195); writefln("cast(real)real %a", cast(real)0.0000195L); writefln("float %a", 0.0000195F * 7 - 195); writefln("double %a", 0.0000195 * 7 - 195); writefln("real %a", 0.0000195L * 7 - 195); } float 0x1.4727dcp-16 double 0x1.4727dcbddb984p-16 real 0x1.4727dcbddb9841acp-16 cast(real)float 0x1.4727dcp-16 cast(real)double 0x1.4727dcbddb984p-16 cast(real)real 0x1.4727dcbddb9841acp-16 float -0x1.85ffeep+7 double -0x1.85ffee1bd1edap+7 real -0x1.85ffee1bd1ed9dfep+7
Apr 01 2005
"Derek Parnell" <derek psych.ward> wrote in message news:eouhnxxkjb80$.clvse1356mlr.dlg 40tude.net...There seems to be different treatment of literals and variables.No, there isn't. The reason for the difference is when you assign the literal to z. Use the 'L' suffix for a real literal.
Apr 01 2005
On Fri, 1 Apr 2005 18:50:40 -0800, Walter wrote:"Derek Parnell" <derek psych.ward> wrote in message news:eouhnxxkjb80$.clvse1356mlr.dlg 40tude.net...Ok, I did that. And I still can't explain the output. Raw Floor Using float variable: 0x1.3fffff4afp+4 0x1.3p+4 Using double variable: 0x1.4p+4 0x1.3p+4 Using real variable: 0x1.4000000000000002p+4 0x1.4p+4 Using float literal: 0x1.3fffff4afp+4 0x1.4p+4 Using double literal: 0x1.4p+4 0x1.4p+4 Using real literal: 0x1.4000000000000002p+4 0x1.4p+4 Look at the results for doubles. How does floor(0x1.4p+4) give 0x1.3p+4 when the expression is a variable and give 0x1.4p+4 when the expression is a literal? -- Derek Parnell Melbourne, Australia 2/04/2005 3:34:22 PMThere seems to be different treatment of literals and variables.No, there isn't. The reason for the difference is when you assign the literal to z. Use the 'L' suffix for a real literal.
Apr 01 2005
On Sat, 2 Apr 2005 15:39:01 +1000, Derek Parnell wrote:

I've reformatted the display to make it easier to spot the anomaly.

                                           Raw                      Floor
 Using float variable:     0x1.3fffff4afp+4           0x1.3p+4
 Using float literal:      0x1.3fffff4afp+4           0x1.4p+4

 Using double variable:    0x1.4p+4                   0x1.3p+4
 Using double literal:     0x1.4p+4                   0x1.4p+4

 Using real variable:      0x1.4000000000000002p+4    0x1.4p+4
 Using real literal:       0x1.4000000000000002p+4    0x1.4p+4

And here is the program that created the above ...

<code>
import std.stdio;
import std.math;
import std.string;

void main()
{
    float x;
    double y;
    real z;
    x = 0.0000195F;
    y = 0.0000195;
    z = 0.0000195L;
    writefln(" %24s %24s", "Raw", "Floor");
    writefln("Using float variable:  %24a %24a", (.5 + 1e6*x), floor(.5 + 1e6*x));
    writefln("Using float literal:   %24a %24a", (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
    writefln("");
    writefln("Using double variable: %24a %24a", (.5 + 1e6*y), floor(.5 + 1e6*y));
    writefln("Using double literal:  %24a %24a", (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
    writefln("");
    writefln("Using real variable:   %24a %24a", (.5 + 1e6*z), floor(.5 + 1e6*z));
    writefln("Using real literal:    %24a %24a", (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
}
</code>

-- Derek Parnell Melbourne, Australia 2/04/2005 4:48:12 PM
Apr 01 2005
"Derek Parnell" <derek psych.ward> wrote in message news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...Ok, I did that. And I still can't explain the output.Recall that, at runtime, the intermediate values are allowed to be carried out to 80 bits. So, floor(.5 + 1e6*y) is evaluated as: floor(cast(real).5 + cast(real)(1e6) * cast(real)y); whereas: floor(.5 + 1e6*0.0000195) is evaluated as: floor(cast(real)(.5 + 1e6*0.0000195)) hence the difference in result.
Apr 02 2005
"Walter" <newshound digitalmars.com> wrote in message news:d2lodh$2qhf$1 digitaldaemon.com..."Derek Parnell" <derek psych.ward> wrote in message news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...It's C legacy hidden in the way the compiler parses this code. You'll be facing these kinds of questions over and over again, unless you move a step further away from C and let the compiler treat unsuffixed literals as the "internal compiler floating point precision format". See my thread: "80 Bit Challenge"Ok, I did that. And I still can't explain the output.Recall that, at runtime, the intermediate values are allowed to be carried out to 80 bits. So, floor(.5 + 1e6*y) is evaluated as: floor(cast(real).5 + cast(real)(1e6) * cast(real)y); whereas: floor(.5 + 1e6*0.0000195) is evaluated as: floor(cast(real)(.5 + 1e6*0.0000195)) hence the difference in result.
Apr 02 2005
On Sat, 2 Apr 2005 01:23:46 -0800, Walter wrote:"Derek Parnell" <derek psych.ward> wrote in message news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...Got it. So to summarize, in expressions that contain at least one double variable, each term is promoted to real before expression evaluation, but if the expression only contains double literals, then the terms are not promoted to real. Why did you decide to have this anomaly? -- Derek Parnell Melbourne, Australia 2/04/2005 11:46:44 PMOk, I did that. And I still can't explain the output.Recall that, at runtime, the intermediate values are allowed to be carried out to 80 bits. So, floor(.5 + 1e6*y) is evaluated as: floor(cast(real).5 + cast(real)(1e6) * cast(real)y); whereas: floor(.5 + 1e6*0.0000195) is evaluated as: float(cast(real)(.5 + 1e6*0.0000195)) hence the difference in result.
Apr 02 2005
"Derek Parnell" <derek psych.ward> wrote in message news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...So to summarize, in expressions that contain at least one double variable, each term is promoted to real before expression evaluation, but if the expression only contains double literals, then the terms are not promoted to real. Why did you decide to have this anomaly?It's the way C works.
Apr 02 2005
On Sat, 2 Apr 2005 10:04:32 -0800, Walter wrote:"Derek Parnell" <derek psych.ward> wrote in message news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...I understand. And here I was thinking that D was meant to be better than C. My bad. -- Derek Parnell Melbourne, Australia 3/04/2005 8:13:09 AMSo to summarize, in expressions that contain at least one double variable, each term is promoted to real before expression evaluation, but if the expression only contains double literals, then the terms are not promoted to real. Why did you decide to have this anomaly?It's the way C works.
Apr 02 2005
"Derek Parnell" <derek psych.ward> wrote in message news:lsphadeuh4s3.gjqbum65kx87$.dlg 40tude.net...On Sat, 2 Apr 2005 01:23:46 -0800, Walter wrote:Some further info: Currently it seems that in the D language no literal is ever promoted to real directly if it was not suffixed with an "L". You can cast(real) it, and it will still be a double which is converted to a crippled real in the FPU, because some of its mantissa bits went missing. There are many exceptions though: All floating point integers (1.0 2.0 10.0 etc.) and fractions like 0.5 0.25 0.125 etc. are converted to proper real values, because they are accurately represented in binary floating point formats. But even they are initially doubles, which are just unharmed by the conversion because most of their trailing mantissa bits are zero. Any other fractional number (e.g. 1.2) cannot be represented accurately in the binary system, so its double representation is not equivalent to its real representation (nor is it to the decimal literal). If such a double is converted to real, it is missing several bits of precision, so it will not correspond accurately to its properly converted counterpart (e.g. 1.2L). As a summary: If you feel the need to use extended double (real) precision in D, never ever forget the "L" for literals unless you want "special effects". Examples:

real r=1.2L;          // proper 80 bit real assigned to r
real r=1.2;           // inaccurate truncated 80 bit real
real r=2.4/2.0;       // inaccurate (2.4 loses precision)
real r=2.4/2.0L;      // inaccurate for the same reason
real r=2.4L/2.0;      // this one will work (2.0 == 2.0L)
real r=2.4L/2.0L;     // that's the safe way to do it
real r=cast(real)1.2; // inaccurate, converted from 1.2 as a double

By the way, C does it the same way for historic reasons. Other languages are more user friendly and I am still hoping that D might evolve in this direction."Derek Parnell" <derek psych.ward> wrote in message news:124cwpdauczht$.1wqi8sqkdi4ec.dlg 40tude.net...Got it. 
So to summarize, in expressions that contain at least one double variable, each term is promoted to real before expression evaluation, but if the expression only contains double literals, then the terms are not promoted to real. Why did you decide to have this anomaly? -- Derek Parnell Melbourne, Australia 2/04/2005 11:46:44 PMOk, I did that. And I still can't explain the output.Recall that, at runtime, the intermediate values are allowed to be carried out to 80 bits. So, floor(.5 + 1e6*y) is evaluated as: floor(cast(real).5 + cast(real)(1e6) * cast(real)y); whereas: floor(.5 + 1e6*0.0000195) is evaluated as: floor(cast(real)(.5 + 1e6*0.0000195)) hence the difference in result.
Apr 02 2005
"Bob W" <nospam aol.com> wrote in message news:d2nd96$1aos$1 digitaldaemon.com...By the way, C does it the same way for historic reasons. Other languages are more user friendly and I am still hoping that D might evolve in this direction.Actually, many languages, mathematical programs, and even C compilers have *dropped* support for 80 bit long doubles. At one point, Microsoft had even made it impossible to execute 80 bit floating instructions on their upcoming Win64 (I made some frantic phone calls to them and apparently was the only one who ever made a case to them in favor of 80 bit long doubles, they said they'd put the support back in). Intel doesn't support 80 bit reals on any of their new vector floating point instructions. The 64 bit chips only bit reals. I haven't done a comprehensive survey of computer languages, but as far as I can tell D stands pretty much alone in its support for 80 bits, along with a handful of C/C++ compilers (including DMC). Because of this shaky operating system and chip support for 80 bits, it would be a mistake to center D's floating point around 80 bits. Some systems may force a reversion to 64 bits. On the other hand, ongoing system support for 64 bit doubles is virtually guaranteed, and D generally follows C's rules with these. (BTW, this thread is a classic example of "build it, and they will come". D is almost single handedly rescuing 80 bit floating point from oblivion, since it makes such a big deal about it and has wound up interesting a lot of people in it. Before D, as far as I could tell, nobody cared a whit about it. I think it's great that this has struck such a responsive chord.)
Apr 03 2005
Walter wrote:I haven't done a comprehensive survey of computer languages, but as far as I can tell D stands pretty much alone in its support for 80 bits, along with a handful of C/C++ compilers (including DMC).The thing is that the D "real" type does *not* guarantee 80 bits ? It doesn't even say the minimum size, so one can only assume 64... I think it would be more clear to say "80 bits minimum", and then future CPUs/code is still free to use 128-bit extended doubles too ? (since D allows all FP calculations to be done at a higher precision) This would be simplified by padding the 80-bit floating point to a full 16 bytes, by adding zeros (as suggested by performance anyway) And then, with both 128-bit integers and 128-bit floating point, D would truly be equipped to face both today (64) and tomorrow... (and with a "real" alias, it's still the "largest hardware implemented") Just my 2 öre, --anders
Apr 03 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:d2og5l$27nh$3 digitaldaemon.com...Walter wrote:as II haven't done a comprehensive survey of computer languages, but as farwith acan tell D stands pretty much alone in its support for 80 bits, alongYes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines.handful of C/C++ compilers (including DMC).The thing is that the D "real" type does *not* guarantee 80 bits ? It doesn't even say the minimum size, so one can only assume 64...I think it would be more clear to say "80 bits minimum", and then future CPUs/code is still free to use 128-bit extended doubles too ? (since D allows all FP calculations to be done at a higher precision)What it's supposed to be is the max precision supported by the hardware the D program is running on.This would be simplified by padding the 80-bit floating point to a full 16 bytes, by adding zeros (as suggested by performance anyway)C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.And then, with both 128-bit integers and 128-bit floating point, D would truly be equipped to face both today (64) and tomorrow... (and with a "real" alias, it's still the "largest hardware implemented") Just my 2 öre, --anders
Apr 03 2005
Walter wrote:Me neither. Emulating 64-bit integers with two 32-bit registers is OK, since that is a whole lot easier. (could even be done for 128-bit ints?) But emulating 80-bit floating point ? Eww. Emulating a 128-bit double is better, but the current method is cheating a lot on the IEEE 754 spec... No, I meant that extended precision should be *unavailable* on some CPU. But maybe it's better to have it work in D, like long double does in C ? (i.e. it falls back to using regular doubles, possibly with warnings) If so, just tell me it's better to have a flexible width language type, than to have some types be unavailable on certain FPU computer hardware? Since that was the whole idea... (have "extended" map to 80-bit FP type)The thing is that the D "real" type does *not* guarantee 80 bits ? It doesn't even say the minimum size, so one can only assume 64...Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines.What it's supposed to be is the max precision supported by the hardware the D program is running on.OK, for PPC and PPC64 that is definitely 64 bits. Not sure about SPARC ? Think I saw that Cray (or so) has 128-bit FP, but haven't got one... :-) It seems like likely real-life values would be: 64, 80, 96 and 128 bits (PPC/PPC64, X86/X86_64, 68K, and whatever super-computer it was above) It's possible that a future 128-bit CPU would have a 128-bit FPU too... But who knows ? (I haven't even seen the slightest hint of such a beast)I thought that was an ABI option, how to align "long double" types ? It was my understanding that it was aligned to 96 bits on X86, and to 128 bits on X86_64. But I might very well be wrong there... (it's just the impression that I got from reading the GCC manual) i.e.
it still uses the regular 80 bit floating point registers, but pads the values out with zeroes when storing them in memory. --andersThis would be simplified by padding the 80-bit floating point to a full 16 bytes, by adding zeros (as suggested by performance anyway)C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.
Apr 03 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:d2pdbk$30dj$1 digitaldaemon.com...If so, just tell me it's better to have a flexible width language type, than to have some types be unavailable on certain FPU computer hardware?Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.theWhat it's supposed to be is the max precision supported by the hardwareWhen I first looked at the AMD64 documentation, I was thrilled to see "m128" for a floating point type. I was crushed when I found it meant "two 64 bit doubles". I'd love to see a big honker 128 bit floating point type in hardware.D program is running on.OK, for PPC and PPC64 that is definitely 64 bits. Not sure about SPARC ? Think I saw that Cray (or so) has 128-bit FP, but haven't got one... :-) It seems like likely real-life values would be: 64, 80, 96 and 128 bits (PPC/PPC64, X86/X86_64, 68K, and whatever super-computer it was above) It's possible that a future 128-bit CPU would have a 128-bit FPU too... But who knows ? (I haven't even seen the slightest hint of such a beast)The only option is to align it to what the corresponding C compiler does.I thought that was an ABI option, how to align "long double" types ?This would be simplified by padding the 80-bit floating point to a full 16 bytes, by adding zeros (as suggested by performance anyway)C compilers that support 80 bit long doubles will align them on 2 byte boundaries. To conform to the C ABI, D must follow suit.It was my understanding that it was aligned to 96 bits on X86,That's not a power of 2, so won't work as alignment.and to 128 bits on X86_64. But I might very well be wrong there... (it's just the impression that I got from reading the GCC manual) i.e. it still uses the regular 80 bit floating point registers, but pads the values out with zeroes when storing them in memory. --anders
Apr 03 2005
Walter wrote:I just fail to see how real -> double/extended, is any different from the int -> short/long that C has gotten so much beating for already ? The suggestion was to have fixed precision types: - float => IEEE 754 Single precision (32-bit) - double => IEEE 754 Double precision (64-bit) - extended => IEEE 754 Double Extended precision (80-bit) - quadruple => "IEEE 754" Quadruple precision (128-bit) And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?If so, just tell me it's better to have a flexible width language type, than to have some types be unavailable on certain FPU computer hardware?Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.When I first looked at the AMD64 documentation, I was thrilled to see "m128" for a floating point type. I was crushed when I found it meant "two 64 bit doubles". I'd love to see a big honker 128 bit floating point type in hardware.I had a similar experience, with PPC64 and GCC, a while back... (-mlong-double-128, referring to the IBM AIX style DoubledDouble) Anyway, double-double has no chance of being full IEEE 754 spec.You lost me ? (anyway, I suggested 128 - which *is* a power of two) But it was my understanding that on the X86/X86_64 family of processors that Windows used to use 10-byte doubles (and then removed extended?), and that Linux i386(-i686) uses 12-byte doubles and Linux X86_64 now uses 16-byte doubles (using the GCC option of -m128bit-long-double) And that was *not* a suggestion, but how it actually worked... Now ? --andersIt was my understanding that it was aligned to 96 bits on X86,That's not a power of 2, so won't work as alignment.
Apr 04 2005
Anders F Björklund wrote:Size can be anything divisible by 8 bits, i.e. any number of bytes. Alignment has to be a power of two, and is about _where_ in memory the thing can or cannot be stored. Align 4 for example, means that the variable cannot be stored in a memory address which, taken as a number, is not divisible by 4. Only something aligned 1 can be stored in any address.You lost me ? (anyway, I suggested 128 - which *is* a power of two)It was my understanding that it was aligned to 96 bits on X86,That's not a power of 2, so won't work as alignment.
Apr 04 2005
Georg Wrede wrote:OK, seems like my sloppy syntax is hurting me once again... :-P I meant that the *size* of "long double" on GCC X86 is 96 bits, so that it can be *aligned* to 32 bits always (unlike 80 bits?) Anyway, aligning to 128 bits gives better Pentium performance ? (or at least, that's what I heard... Only have doubles on PPC) Thanks for clearing it up, in my head 96 bits was "a power of two". (since anything aligned to a multiple of a power of two is fine too) --andersSize can be anything divisible by 8 bits, i.e. any number of bytes. Alignment has to be a power of two, and is about _where_ in memory the thing can or cannot be stored. Align 4 for example, means that the variable cannot be stored in a memory address which, taken as a number, is not divisible by 4. Only something aligned 1 can be stored in any address.You lost me ? (anyway, I suggested 128 - which *is* a power of two)It was my understanding that it was aligned to 96 bits on X86,That's not a power of 2, so won't work as alignment.
Apr 04 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:d2qq5u$1aau$1 digitaldaemon.com...Walter wrote:What happens when someone declares a variable as quadruple on a platform without hardware support? Does D plug in a software quadruple implementation? That isn't the right thing to do. That's been my whole point of bringing up Java's experience. They tried to foist too much rigor on their floating point model in the name of portability and had to redo it.I just fail to see how real -> double/extended, is any different from the int -> short/long that C has gotten so much beating for already ? The suggestion was to have fixed precision types: - float => IEEE 754 Single precision (32-bit) - double => IEEE 754 Double precision (64-bit) - extended => IEEE 754 Double Extended precision (80-bit) - quadruple => "IEEE 754" Quadruple precision (128-bit) And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?If so, just tell me it's better to have a flexible width language type, than to have some types be unavailable on certain FPU computer hardware?Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.
Apr 04 2005
Ben Hinkle wrote:What happens when someone declares a variable as quadruple on a platform without hardware support? Does D plug in a software quadruple implementation? That isn't the right thing to do. That's been my whole point of bringing up Java's experience. They tried to foist too much rigor on their floating point model in the name of portability and had to redo it.Choke... Splutter... Die. Java did not re-implement extended in software. They just ignored it... --anders
Apr 04 2005
I wrote, in response to Ben Hinkle:Just to be perfectly clear: Those are the sounds the *compiler* would make, not Ben :-) Seriously, trying to use the extended or quadruple types on platforms where they are not implemented in hardware would be a compile time error. "real" would silently fall back. --andersWhat happens when someone declares a variable as quadruple on a platform without hardware support?Choke... Splutter... Die.
Apr 04 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:d2rcfd$1ueq$2 digitaldaemon.com...I wrote, in response to Ben Hinkle:yup, I read it that way - though I did notice I spluttered a bit this morning...Just to be perfectly clear: Those are the sounds the *compiler* would make, not Ben :-)What happens when someone declares a variable as quadruple on a platform without hardware support?Choke... Splutter... Die.Seriously, trying to use the extended or quadruple types on platforms where they are not implemented in hardware would be a compile time error. "real" would silently fall back.OK, needless to say I think a builtin type that is illegal on many platforms is a mistake.
Apr 04 2005
Ben Hinkle wrote:OK, needless to say I think a builtin type that is illegal on many platforms is a mistake.That is actually *not* needless to say, but Walter agrees with you on the topic. Just as we can talk about "real" as the 64/80/96/128 bit floating point type, and not somehow assume that it will be 80 bits - then I'm perfectly fine with it. "long double" in C/C++ works just the same. But if you *do* want to talk about the "X87" 80-bit type, then please do by all means use "extended" instead. Less confusion, all around ? (let's save "quadruple" for later, with "cent") --anders
Apr 04 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:d2rdjp$1vfl$1 digitaldaemon.com...Ben Hinkle wrote:The IEEE 754r suggests that there won't be a 80bit nor a 96bit format in future (whenever this may be). Ref.: My today's post about IEEE 754rOK, needless to say I think a builtin type that is illegal on many platforms is a mistake.That is actually *not* needless to say, but Walter agrees with you on the topic. Just as we can talk about "real" as the 64/80/96/128 bit floating point type, and not somehow assume that it will be 80 bits - then I'm perfectly fine with it. "long double" in C/C++ works just the same. But if you *do* want to talk about the "X87" 80-bit type, then please do by all means use "extended" instead. Less confusion, all around ? (let's save "quadruple" for later, with "cent") --anders
Apr 04 2005
Bob W wrote:According to Sun, Microsoft, IBM and Apple there isn't such a 80-bit type today even... ;-) BTW; the 96-bit floating point was the type preferred by the 68K families FPU processor --andersBut if you *do* want to talk about the "X87" 80-bit type, then please do by all means use "extended" instead. Less confusion, all around ? (let's save "quadruple" for later, with "cent")The IEEE 754r suggests that there won't be a 80bit nor a 96bit format in future (whenever this may be).
Apr 04 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:d2rhtd$258a$1 digitaldaemon.com...Bob W wrote:I fear it will be constant struggle to keep the chipmakers from dropping it and the OS vendors from abandoning support.The IEEE 754r suggests that there won't be a 80bit nor a 96bit format in future (whenever this may be).According to Sun, Microsoft, IBM and Apple there isn't such a 80-bit type today even... ;-)
Apr 04 2005
Ben Hinkle wrote:"Anders F Björklund" <afb algonet.se> wrote in message news:d2qq5u$1aau$1 digitaldaemon.com...Perhaps Ada has the right idea here. Have a system default that depends on the available hardware, but also allow the user to define what size/precision is needed in any particular case. It may slow things down a lot if you demand 17 places of accuracy, but if you really need exactly 17, you should be able to specify it. (OTOH, Ada had the govt. paying for it's development, and it still ended up as a language people didn't want to use.)Walter wrote:What happens when someone declares a variable as quadruple on a platform without hardware support? Does D plug in a software quadruple implementation? That isn't the right thing to do. That's been my whole point of bringing up Java's experience. They tried to foist too much rigor on their floating point model in the name of portability and had to redo it.I just fail to see how real -> double/extended, is any different from the int -> short/long that C has gotten so much beating for already ? The suggestion was to have fixed precision types: - float => IEEE 754 Single precision (32-bit) - double => IEEE 754 Double precision (64-bit) - extended => IEEE 754 Double Extended precision (80-bit) - quadruple => "IEEE 754" Quadruple precision (128-bit) And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?If so, just tell me it's better to have a flexible width language type, than to have some types be unavailable on certain FPU computer hardware?Yes, I believe that is better. Every once in a while, an app *does* care, but they're screwed anyway if the hardware won't support it.
Apr 05 2005
"Anders F Björklund" <afb algonet.se> wrote in message news:d2qq5u$1aau$1 digitaldaemon.com...Walter wrote:care,If so, just tell me it's better to have a flexible width language type, than to have some types be unavailable on certain FPU computer hardware?Yes, I believe that is better. Every once in a while, an app *does*Philosophically, they are the same. Practically, however, they are very different. Increasing integer sizes gives more range, and integer calculations tend to be *right* or *wrong*. Floating point increased size, however, gives more precision. So an answer is *better* or *worse*, insted of right or wrong. (Increased bits also gives fp more range, but if the range is not enough, it fails cleanly with an overflow indication, not just wrapping around and giving garbage.) In other words, decreasing the bits in an fp value tends to gracefully degrade the results, which is very different from the effect on integer values.but they're screwed anyway if the hardware won't support it.I just fail to see how real -> double/extended, is any different from the int -> short/long that C has gotten so much beating for already ?The suggestion was to have fixed precision types: - float => IEEE 754 Single precision (32-bit) - double => IEEE 754 Double precision (64-bit) - extended => IEEE 754 Double Extended precision (80-bit) - quadruple => "IEEE 754" Quadruple precision (128-bit) And then have "real" be an alias to the largest hardware-supported type. It wouldn't break code more than if it was a variadic size type format ?I just don't see the advantage. If you use "extended" and your hardware doesn't support it, you're out of luck. If you use "real", your program will still compile and run. If certain characteristics of the "real" type are required, one can use static asserts on the properties of real.There is nothing set up in the operating system or linker to handle alignment to 96 bits or other values not a power of 2. 
Note that there is a big difference between the size of an object and what its alignment is.You lost me ? (anyway, I suggested 128 - which *is* a power of two)It was my understanding that it was aligned to 96 bits on X86,That's not a power of 2, so won't work as alignment.But it was my understanding that on the X86/X86_64 family of processors that Windows used to use 10-byte doubles (and then removed extended?), and that Linux i386(-i686) uses 12-byte doubles and Linux X86_64 now uses 16-byte doubles (using the GCC option of -m128bit-long-double) And that was *not* a suggestion, but how it actually worked... Now ?Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if gcc on linux does it that way or not.
Apr 04 2005
Walter wrote:Philosophically, they are the same. Practically, however, they are very different. Increasing integer sizes gives more range, and integer calculations tend to be *right* or *wrong*. Floating point increased size, however, gives more precision. So an answer is *better* or *worse*, instead of right or wrong. (Increased bits also gives fp more range, but if the range is not enough, it fails cleanly with an overflow indication, not just wrapping around and giving garbage.) In other words, decreasing the bits in an fp value tends to gracefully degrade the results, which is very different from the effect on integer values.Interesting view of it, but I think that int fixed-point math degrades gracefully in the same way (using integers). Still with wrapping, though. Not that I've used fixed-point in quite some time, and it doesn't seem like I will be either - with the current CPUs and the new APIs.I just don't see the advantage. If you use "extended" and your hardware doesn't support it, you're out of luck. If you use "real", your program will still compile and run. If certain characteristics of the "real" type are required, one can use static asserts on the properties of real.To be honest, I was just tired of the "real is 80 bits" all over D ? And more than a little annoyed at the ireal and creal, of course ;-) I always thought that "long double" was confusing, so now I've started to use "extended" for 80-bit and "real" for the biggest-available-type. And it's working out good so far.Linux on X86 aligns to 12 bytes, and Linux on X86_64 aligns to 16 bytes. --andersAnd that was *not* a suggestion, but how it actually worked... Now ?Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if gcc on linux does it that way or not.
Apr 04 2005
Make that "Linux on X86 aligns to 4 bytes, by making the size 12". You know what I mean :-) --andersWindows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if gcc on linux does it that way or not.Linux on X86 aligns to 12 bytes, and Linux on X86_64 aligns to 16 bytes.
Apr 04 2005
Walter wrote:"Anders F Björklund" <afb algonet.se> wrote in message news:d2og5l$27nh$3 digitaldaemon.com...Would implementing fixed point arithmetic improve that? Even with a 128-bit integer as the underlying type, I think it would have operational limitations, but it should be a lot faster then "100 times as slow as hardware". (OTOH, there's lots of reasons why it isn't a normal feature of languages. Apple on the 68000 series is the only computer I know of using it, and then only for specialized applications.)Walter wrote:as II haven't done a comprehensive survey of computer languages, but as farwith acan tell D stands pretty much alone in its support for 80 bits, alongYes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software emulator. I've used such emulators before, and they are really, really slow. I don't think it's practical for D floating point to be 100x slower on some machines. ...handful of C/C++ compilers (including DMC).The thing is that the D "real" type does *not* guarantee 80 bits ? It doesn't even say the minimum size, so one can only assume 64...
Apr 05 2005
"Charles Hixson" <charleshixsn earthlink.net> wrote in message news:d2unfm$2n6s$1 digitaldaemon.com...Would implementing fixed point arithmetic improve that? Even with a 128-bit integer as the underlying type, I think it would have operational limitations, but it should be a lot faster then "100 times as slow as hardware". (OTOH, there's lots of reasons why it isn't a normal feature of languages. Apple on the 68000 series is the only computer I know of using it, and then only for specialized applications.)If using a 128 bit fixed point would work, then one can use integer arithmetic on it. But that isn't floating point, which is a fundamentally different animal.
Apr 05 2005
"Walter" <newshound digitalmars.com> wrote in message news:d2od1o$25vd$1 digitaldaemon.com..."Bob W" <nospam aol.com> wrote in message news:d2nd96$1aos$1 digitaldaemon.com...I am probably looking like an extended precison advocate, but I am actually not. The double format was good enough for me even for statistical evaluation in almost 100% of cases. There are admittedly cases which would benefit from having 80 bit precision available, however. Therefore, although it would not be devastating for me should you ever decide to drop support for the reals, I'd still like having them available just in case they are needed. However, if you do offer 80 bit types you'll have to assign real variables with proper real values if evaluation can be completed at compile time. Otherwise I suggest that you issue a warning where accuracy might be impaired. It is hard to believe that a new millennium programming language would actually require people to write real r=1.2L instead of real r=1.2 in order not to produce an incorrect assignment. Yes, I know what C programmers would want to say here, I am one of them. : ) For someone not familiar with C, the number 1.2 is not a real and is not a double either, especially if he is purely mathematically oriented. It is a decimal floating point value. He takes it for granted that 1.2 is fine whether assigned to a float or to a double. But he will refuse to understand why he has to suffix the literal to become an accurate real value. Of course you could try to explain him that the usual +/- 1/2 LSB error for most fractional (decimal) values converted to binary would increase to about 11 LSBs if he ever forgot to use that important "L" suffix. But would he really want to know?By the way, C does it the same way for historic reasons. Other languages are more user friendly and I am still hoping that D might evolve in this direction.Actually, many languages, mathematical programs, and even C compilers have *dropped* support for 80 bit long doubles. 
At one point, Microsoft had even made it impossible to execute 80 bit floating instructions on their upcoming Win64 (I made some frantic phone calls to them and apparently was the only one who ever made a case to them in favor of 80 bit long doubles, they said they'd put the support back in). Intel doesn't support 80 bit reals on any of their new vector floating point instructions. The 64 bit chips support 80 bit reals only through the legacy x87 instructions. I haven't done a comprehensive survey of computer languages, but as far as I can tell D stands pretty much alone in its support for 80 bits, along with a handful of C/C++ compilers (including DMC). Because of this shaky operating system and chip support for 80 bits, it would be a mistake to center D's floating point around 80 bits. Some systems may force a reversion to 64 bits. On the other hand, ongoing system support for 64 bit doubles is virtually guaranteed, and D generally follows C's rules with these. (BTW, this thread is a classic example of "build it, and they will come". D is almost single handedly rescuing 80 bit floating point from oblivion, since it makes such a big deal about it and has wound up interesting a lot of people in it. Before D, as far as I could tell, nobody cared a whit about it. I think it's great that this has struck such a responsive chord.)
Apr 03 2005
"Walter" <newshound digitalmars.com> wrote in message news:d2kk71$1pnl$1 digitaldaemon.com..."Bob W" <nospam aol.com> wrote in message news:d2ieh5$2ksl$1 digitaldaemon.com...I still don't buy that. Example: std.string.atof() as mentioned below.- D is not entirely 80-bit based as claimed.Maybe there is a misunderstanding: I just wanted to mention that although it is claimed that the default internal FP format is 80 bits, the default floating point format for literals is double. The lexer, (at least to my understanding) seems to confirm this. Therefore, if someone does not want to experience a loss in precision, he ALWAYS needs to use the L suffix for literals, otherwise he gets a real which was converted from a double. e.g.: real r1=1.2L; // this one is ok thanks to the suffix real r2=1.2; // loss in precision, double convt'd to real- Literals are converted to 64 bit first (and from there to 80 bits) at compile time if no suffix is used, even if the target is of type 'real'.Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".This one yes, but not the official Phobos version std.string.atof() which I have used. Phobos docs suggest that atof() can be found in 1) std.math (n/a) 2) std.string Since I have not found any atof() function in std.math and std.math2 is not even mentioned in the Phobos docs, I've got it from std.string AND THIS ONE IS 64 BIT! --------- quote from "c.stdlib.d" --------- double atof(char *); --------------- unquote ------------------- --------- quote from "string.d" ---------- real atof(char[] s) { // BUG: should implement atold() return std.c.stdlib.atof(toStringz(s)); } --------------- unquote ------------------- Due to heavy workload this issue might have been overlooked. Luckily I do not even have to mention the word "BUG", this was aparently already done in the author's comment line. 
: ) After searching the archives it looks like someone was already troubled by the multiple appearance of atof() in Nov 2004: http://www.digitalmars.com/d/archives/digitalmars/D/bugs/2196.html- atof() for example is returning a 'real' value which is obviously derived from a 'double', thus missing some essential bits at the end.Check out std.math2.atof(). It's fully 80 bit.In accordance with what I have mentioned before, the following program demonstrates the existence of "truncated" reals: void main() { real r1=1.2L; // converted directly to 80 bit value real r2=1.2; // parsed to 64b, then convt'd to 80b writefln("Genuine : %a",r1); writefln("Truncated: %a",r2); } Output (using %a): Genuine : 0x1.3333333333333334p+0 Truncated: 0x1.3333333333333p+0 Alternative Output: Genuine: 1.20000000000000000 [3fff 99999999 9999999a] Truncated: 1.19999999999999996 [3fff 99999999 99999800]Example: The hex value for 0.0000195 in 'real' can be expressed as 3fef a393ee5e edcc20d5 or 3fef a393ee5e edcc20d6 (due to the non-decimal fraction). The same value converted from a 'double' would be 3fef a393ee5e edcc2000 and therefore misses several trailing bits. This could cause the floor() function to misbehave. I hope this info was somewhat useful.Perhaps the following program will help: import std.stdio; void main() { writefln("float %a", 0.0000195F); writefln("double %a", 0.0000195); writefln("real %a", 0.0000195L); writefln("cast(real)float %a", cast(real)0.0000195F); writefln("cast(real)double %a", cast(real)0.0000195); writefln("cast(real)real %a", cast(real)0.0000195L); writefln("float %a", 0.0000195F * 7 - 195); writefln("double %a", 0.0000195 * 7 - 195); writefln("real %a", 0.0000195L * 7 - 195); } float 0x1.4727dcp-16 double 0x1.4727dcbddb984p-16 real 0x1.4727dcbddb9841acp-16 cast(real)float 0x1.4727dcp-16 cast(real)double 0x1.4727dcbddb984p-16 cast(real)real 0x1.4727dcbddb9841acp-16 float -0x1.85ffeep+7 double -0x1.85ffee1bd1edap+7 real -0x1.85ffee1bd1ed9dfep+7
Apr 01 2005
"Bob W" <nospam aol.com> wrote in message news:d2kvcc$22qa$1 digitaldaemon.com...I just wanted to mention that although it is claimed that the default internal FP format is 80 bits,Actually, what is happening is that if you write the expression: double a, b, c, d; a = b + c + d; then the intermediate values generated by b+c+d are allowed (but not required) to be evaluated to the largest precision available. This means that it's allowed to evaluate it as: a = cast(double)(cast(real)b + cast(real)c + cast(real)d)); but it is not required to evaluate it in that way. This produces a slightly different result than: double t; t = b + c; a = t + d; The latter is the way Java is specified to work, which turns out to be both numerically inferior and *slower* on the x86 FPU. The x86 FPU *wants* to evaluate things to 80 bits. The D compiler's internal paths fully support 80 bit arithmetic, that means there are no surprising "choke points" where it gets truncated to 64 bits. If the type of a literal is specified to be 'double', which is the case for no suffix, then you get 64 bits of precision. I hope you'll agree that that is the least surprising thing to do.True, that's a bug, and I'll fix itCheck out std.math2.atof(). It's fully 80 bit.I've got it from std.string AND THIS ONE IS 64 BIT!
Apr 01 2005
I have started a new thread: "80 Bit Challenge", which should serve as a reply to your post ....
Apr 02 2005