## digitalmars.D - Exotic floor() function - D is different

• Bob W (43/43) Mar 28 2005 The floor() function in D does not produce equivalent
• Walter (20/50) Mar 30 2005 What you're seeing is the result of using 80 bit precision, which is wha...
• Derek Parnell (42/104) Mar 31 2005 I can follow what you say, but can you explain the output of the program
• Bob W (13/124) Mar 31 2005 Great job! I could not believe it first:
• Walter (23/29) Apr 01 2005 I suggest in general viewing how these things work (floating, chopping,
• Bob W (29/90) Mar 31 2005 Thank you for your information, Walter.
• Walter (28/46) Apr 01 2005 Not true, it fully supports 80 bits.
• Derek Parnell (45/109) Apr 01 2005 I repeat, (I think) I understand what you are saying but can you explain
• Bob W (57/106) Apr 01 2005 I still don't buy that.
• Walter (23/27) Apr 01 2005 Actually, what is happening is that if you write the expression:
• Bob W (2/2) Apr 02 2005 I have started a new thread: "80 Bit Challenge",
"Bob W" <nospam aol.com> writes:
```The floor() function in D does not produce equivalent
results compared to a bunch of other languages
tested. The other languages were:

dmc
djgpp
dmdscript
jscript
assembler ('87 code)

The biggest surprise was that neither dmc nor
dmdscript were able to match the D results.

The sample program below gets an input
from the command line, converts it, multiplies
it with 1e6 and adds 0.5 before calling the
floor() function. The expected result, based on
an input of 0.0000195, would be 20.0, but
D thinks it should be 19.0.

Since 0.0000195 cannot be represented
accurately in any of the usual floating point
formats, the somewhat unique D result is
probably not even a bug. But it is a major
inconvenience when comparing numerical
outputs produced by different programs.

So far I was unable to reproduce the rounding
issue in D with any other language tested.
(I have even tried OpenOffice to check.)
Before someone tells me that D uses a
different floating point format, I'd like to
mention that I have used float, double and
long double in the equivalent C programs
without any changes.

//------------------------------

import std.stdio, std.string, std.math;

int main(char[][] av) {
    if (av.length != 2) {
        printf("\nEnter Val! (e.g. 0.0000195)\n");
        return 0;
    }

    double x = atof(av[1]);                  // expecting 0.0000195
    writef("          x*1e6:%12.6f\n", x*1e6);
    writef("     floor(x..):%12.6f\n", floor(1e6*x));
    writef("  floor(.5+x..):%12.6f\n", floor(.5 + 1e6*x));
    writef("  floor(.5+co.):%12.6f\n", floor(.5 + 1e6*0.0000195));

    return 0;
}
```
Mar 28 2005
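For readers following along without a D compiler: the "mainstream" result Bob describes can be reproduced in any language whose floats are plain IEEE-754 64-bit doubles. A minimal sketch in Python (whose floats are exactly that, like C's `double`):

```python
# Reproduce the thread's example in plain 64-bit double arithmetic.
import math

x = 0.0000195              # not exactly representable in binary
product = 1e6 * x          # in double, this rounds back to exactly 19.5
result = math.floor(0.5 + product)

print(f"{product:.6f}")    # 19.500000
print(result)              # 20 -- the result the other languages give
```

In pure 64-bit arithmetic the product rounds back to exactly 19.5, so adding 0.5 lands exactly on 20.0; D's 80-bit intermediates keep the product a hair below 19.5, which is the whole dispute.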
"Walter" <newshound digitalmars.com> writes:
```"Bob W" <nospam aol.com> wrote in message
news:d2aash\$a4s\$1 digitaldaemon.com...
[...]

What you're seeing is the result of using 80 bit precision, which is what D
uses in internal calculations. .0000195 is not represented exactly, to print
the number it is rounded. So, depending on how many bits of precision there
are in the representation, it might be one bit, 63 bits to the right, under
"5", so floor() will chop it down.

Few C compilers support 80 bit long doubles, they implement them as 64 bit
ones. Very few programs use 80 bit reals.

The std.math.floor function uses 80 bit precision. If you want to use the C
runtime's 64 bit version instead, declare it as:

extern (C) double floor(double);

Then the results are:

x*1e6:   19.500000
floor(x..):   19.000000
floor(.5+x..):   20.000000
floor(.5+co.):   20.000000

I suggest that while it's a reasonable thing to require a minimum number of
floating point bits for a computation, it's probably not a good idea to
require a maximum.
```
Mar 30 2005
Derek Parnell <derek psych.ward> writes:
```On Wed, 30 Mar 2005 21:43:07 -0800, Walter wrote:

"Bob W" <nospam aol.com> wrote in message
news:d2aash\$a4s\$1 digitaldaemon.com...
The floor() function in D does not produce equivalent
results compared to a bunch of other languages
tested. The other languages were:

dmc
djgpp
dmdscript
jscript
assembler ('87 code)

The biggest surprise was that neither dmc nor
dmdscript were able to match the D results.

The sample program below gets an input
from the command line, converts it, multiplies
it with 1e6 and adds 0.5 before calling the
floor() function. The expected result, based on
an input of 0.0000195, would be 20.0, but
D thinks it should be 19.0.

Since 0.0000195 cannot be represented
accurately in any of the usual floating point
formats, the somewhat unique D result is
probably not even a bug. But it is a major
inconvenience when comparing numerical
outputs produced by different programs.

So far I was unable to reproduce the rounding
issue in D with any other language tested.
(I have even tried OpenOffice to check.)
Before someone tells me that D uses a
different floating point format, I'd like to
mention that I have used float, double and
long double in the equivalent C programs
without any changes.

What you're seeing is the result of using 80 bit precision, which is what D
uses in internal calculations. .0000195 is not represented exactly, to print
the number it is rounded. So, depending on how many bits of precision there
are in the representation, it might be one bit, 63 bits to the right, under
"5", so floor() will chop it down.

Few C compilers support 80 bit long doubles, they implement them as 64 bit
ones. Very few programs use 80 bit reals.

The std.math.floor function uses 80 bit precision. If you want to use the C

extern (C) double floor(double);

Then the results are:

x*1e6:   19.500000
floor(x..):   19.000000
floor(.5+x..):   20.000000
floor(.5+co.):   20.000000

I suggest that while it's a reasonable thing to require a minimum number of
floating point bits for a computation, it's probably not a good idea to
require a maximum.

I can follow what you say, but can you explain the output of the program
below? There appears to be a difference in the way variables and literals
are treated.

import std.stdio;
import std.math;
import std.string;

void main() {
    float  x;
    double y;
    real   z;

    x = 0.0000195;
    y = 0.0000195;
    z = 0.0000195;

    writefln("                          Raw            Floor");
    writefln("Using float  variable: %12.6f %12.6f",
             (.5 + 1e6*x), floor(.5 + 1e6*x));
    writefln("Using double variable: %12.6f %12.6f",
             (.5 + 1e6*y), floor(.5 + 1e6*y));
    writefln("Using real   variable: %12.6f %12.6f",
             (.5 + 1e6*z), floor(.5 + 1e6*z));

    writefln("Using float   literal: %12.6f %12.6f",
             (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
    writefln("Using double  literal: %12.6f %12.6f",
             (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
    writefln("Using real    literal: %12.6f %12.6f",
             (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
}

----------
I get the following output...
----------
Raw          Floor
Using float  variable:    19.999999    19.000000
Using double variable:    20.000000    19.000000
Using real   variable:    20.000000    19.000000
Using float   literal:    19.999999    20.000000
Using double  literal:    20.000000    20.000000
Using real    literal:    20.000000    20.000000

--
Derek
Melbourne, Australia
31/03/2005 6:43:48 PM
```
Mar 31 2005
"Bob W" <nospam aol.com> writes:
```"Derek Parnell" <derek psych.ward> wrote in message
news:7di6xztjokyz.6vnxzcx1d7l8.dlg 40tude.net...
[...]

Great job! I could not believe it first:

writefln("Using float   literal: %12.6f %12.6f",
(.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));

producing the following output:

Using float   literal:    19.999999    20.000000

Looks like floor() mutates to ceil() at times. To ensure
that this is not "down under" specific (Melbourne),
I have repeated your test in the northern hemisphere,
and, not surprisingly, it did the same thing. Now
I am pretty curious to know why this is happening.

We'll see if Walter comes up with an answer .....
```
Mar 31 2005
"Walter" <newshound digitalmars.com> writes:
```"Bob W" <nospam aol.com> wrote in message
news:d2i3et\$27dg\$1 digitaldaemon.com...
Great job! I could not believe it first:

writefln("Using float   literal: %12.6f %12.6f",
(.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));

producing the following output:

Using float  variable:    19.999999    20.000000

We'll see if Walter comes up with an answer .....

I suggest that in general the way to see how these things work (floating,
chopping, rounding, precision, etc.) is to print things using the %a format
(which prints out ALL the bits in hexadecimal format).

As to the specific case above, let's break down each (using suffix 'd' to
represent double):

(.5 + 1e6*0.0000195f) => (.5d + 1e6d * cast(double)0.0000195f), result is
double
floor(.5 + 1e6*0.0000195f)) => floor(cast(real)(.5d + 1e6d *
cast(double)0.0000195f)), result is real

When writef prints a real, it adds ".5" to the last significant decimal digit
and chops. This will give DIFFERENT results for a double and for a real.
It's also DIFFERENT from the binary rounding that goes on in intermediate
floating point calculations, which adds "half a bit" (not .5) and chops.
Also, realize that internally to the FPU, a "guard bit" and a "sticky bit"
are maintained for a floating point value, these influence rounding, and are
discarded when a value leaves the FPU and is written to memory.

What is happening here is that you start with a value that is not exactly
representable, then putting it through a series of precision changes and
roundings, and comparing it with the result of a different series of
precision changes and roundings, and expecting the results to match bit for
bit. There's no way to make that happen.
```
Apr 01 2005
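Walter's %a advice carries over to other environments too. As a sketch, Python's `float.hex()` plays the role of C's %a for 64-bit doubles:

```python
# float.hex() is Python's analogue of C's %a conversion: it prints every
# mantissa bit of a 64-bit double, which settles rounding questions exactly.
print(float.hex(0.0000195))              # 0x1.4727dcbddb984p-16
print(float.hex(1e6 * 0.0000195))        # 0x1.3800000000000p+4  (exactly 19.5)
print(float.hex(0.5 + 1e6 * 0.0000195))  # 0x1.4000000000000p+4  (exactly 20.0)
```

The first line matches the `double %a` row of Walter's program below; the other two show why the pure-double computation floors to 20.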
"Bob W" <nospam aol.com> writes:
```"Walter" <newshound digitalmars.com> wrote in message
news:d2g9jj\$8om\$1 digitaldaemon.com...
"Bob W" <nospam aol.com> wrote in message
news:d2aash\$a4s\$1 digitaldaemon.com...
The floor() function in D does not produce equivalent
results compared to a bunch of other languages
tested. The other languages were:

dmc
djgpp
dmdscript
jscript
assembler ('87 code)

The biggest surprise was that neither dmc nor
dmdscript were able to match the D results.

The sample program below gets an input
from the command line, converts it, multiplies
it with 1e6 and adds 0.5 before calling the
floor() function. The expected result, based on
an input of 0.0000195, would be 20.0, but
D thinks it should be 19.0.

Since 0.0000195 cannot be represented
accurately in any of the usual floating point
formats, the somewhat unique D result is
probably not even a bug. But it is a major
inconvenience when comparing numerical
outputs produced by different programs.

So far I was unable to reproduce the rounding
issue in D with any other language tested.
(I have even tried OpenOffice to check.)
Before someone tells me that D uses a
different floating point format, I'd like to
mention that I have used float, double and
long double in the equivalent C programs
without any changes.

What you're seeing is the result of using 80 bit precision, which is what
D
uses in internal calculations. .0000195 is not represented exactly, to
print
the number it is rounded. So, depending on how many bits of precision
there
are in the representation, it might be one bit, 63 bits to the right,
under
"5", so floor() will chop it down.

Few C compilers support 80 bit long doubles, they implement them as 64 bit
ones. Very few programs use 80 bit reals.

The std.math.floor function uses 80 bit precision. If you want to use the
C

extern (C) double floor(double);

Then the results are:

x*1e6:   19.500000
floor(x..):   19.000000
floor(.5+x..):   20.000000
floor(.5+co.):   20.000000

I suggest that while it's a reasonable thing to require a minimum number
of
floating point bits for a computation, it's probably not a good idea to
require a maximum.

Thank you for your information, Walter.

However, I am not convinced that the culprit is the
80-bit floating point format. This is due to some
tests I have made programming the FPU directly.

Based on my example above, the 80 bit format is
perfectly capable of generating the 'mainstream
result' of 20, as opposed to the lone 19 which D
is producing.

- D is not entirely 80-bit based as claimed.

- Literals are converted to 64 bit first (and from there
to 80 bits) at compile time if no suffix is used, even
if the target is of type 'real'.

- atof() for example is returning a 'real' value which is
obviously derived from a 'double', thus missing some
essential bits at the end.

Example:

The hex value for 0.0000195 in 'real' can be expressed as
3fef a393ee5e edcc20d5
or
3fef a393ee5e edcc20d6
(due to the non-decimal fraction).

The same value converted from a 'double' would be
3fef a393ee5e edcc2000
and therefore misses several trailing bits. This could
cause the floor() function to misbehave.

I hope this info was somewhat useful.

Cheers.
```
Mar 31 2005
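Bob's hand-derived bit patterns can be checked with exact rational arithmetic. The following Python sketch rounds the exact value of 0.0000195 to 53 and 64 mantissa bits (Python's `round()` stands in for the FPU's round-to-nearest; neither case is a tie, so tie-breaking rules do not matter):

```python
from fractions import Fraction

exact = Fraction(195, 10**7)   # 0.0000195 as an exact rational
scaled = exact * 2**16         # scale into [1, 2): the mantissa value
assert 1 <= scaled < 2

m53 = round(scaled * 2**52)    # 53-bit mantissa (IEEE double)
m64 = round(scaled * 2**63)    # 64-bit mantissa (x87 extended / D real)

print(hex(m53))                # 0x14727dcbddb984
print(hex(m64))                # 0xa393ee5eedcc20d5
print(hex(m53 << 11))          # 0xa393ee5eedcc2000
```

The correctly rounded 64-bit mantissa ends in ...20d5, matching the first of Bob's two candidates, while padding the double's mantissa with zero bits reproduces his ...cc2000 value.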
"Walter" <newshound digitalmars.com> writes:
```"Bob W" <nospam aol.com> wrote in message
news:d2ieh5\$2ksl\$1 digitaldaemon.com...
- D is not entirely 80-bit based as claimed.

Not true, it fully supports 80 bits.

- Literals are converted to 64 bit first (and from there
to 80 bits) at compile time if no suffix is used, even
if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".

- atof() for example is returning a 'real' value which is
obviously derived from a 'double', thus missing some
essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.

Example:

The hex value for 0.0000195 in 'real' can be expressed as
3fef a393ee5e edcc20d5
or
3fef a393ee5e edcc20d6
(due to the non-decimal fraction).

The same value converted from a 'double' would be
3fef a393ee5e edcc2000
and therefore misses several trailing bits. This could
cause the floor() function to misbehave.

I hope this info was somewhat useful.

Perhaps the following program will help:

import std.stdio;

void main()
{
    writefln("float  %a", 0.0000195F);
    writefln("double %a", 0.0000195);
    writefln("real   %a", 0.0000195L);

    writefln("cast(real)float  %a", cast(real)0.0000195F);
    writefln("cast(real)double %a", cast(real)0.0000195);
    writefln("cast(real)real   %a", cast(real)0.0000195L);

    writefln("float  %a", 0.0000195F * 7 - 195);
    writefln("double %a", 0.0000195  * 7 - 195);
    writefln("real   %a", 0.0000195L * 7 - 195);
}

float  0x1.4727dcp-16
double 0x1.4727dcbddb984p-16
real   0x1.4727dcbddb9841acp-16
cast(real)float  0x1.4727dcp-16
cast(real)double 0x1.4727dcbddb984p-16
cast(real)real   0x1.4727dcbddb9841acp-16
float  -0x1.85ffeep+7
double -0x1.85ffee1bd1edap+7
real   -0x1.85ffee1bd1ed9dfep+7
```
Apr 01 2005
Derek Parnell <derek psych.ward> writes:
```On Fri, 1 Apr 2005 15:03:02 -0800, Walter wrote:

"Bob W" <nospam aol.com> wrote in message
news:d2ieh5\$2ksl\$1 digitaldaemon.com...
- D is not entirely 80-bit based as claimed.

Not true, it fully supports 80 bits.

- Literals are converted to 64 bit first (and from there
to 80 bits) at compile time if no suffix is used, even
if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".

- atof() for example is returning a 'real' value which is
obviously derived from a 'double', thus missing some
essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.

Example:

The hex value for 0.0000195 in 'real' can be expressed as
3fef a393ee5e edcc20d5
or
3fef a393ee5e edcc20d6
(due to the non-decimal fraction).

The same value converted from a 'double' would be
3fef a393ee5e edcc2000
and therefore misses several trailing bits. This could
cause the floor() function to misbehave.

I hope this info was somewhat useful.

Perhaps the following program will help:

import std.stdio;

void main()
{
writefln("float  %a", 0.0000195F);
writefln("double %a", 0.0000195);
writefln("real   %a", 0.0000195L);

writefln("cast(real)float  %a", cast(real)0.0000195F);
writefln("cast(real)double %a", cast(real)0.0000195);
writefln("cast(real)real   %a", cast(real)0.0000195L);

writefln("float  %a", 0.0000195F * 7 - 195);
writefln("double %a", 0.0000195  * 7 - 195);
writefln("real   %a", 0.0000195L * 7 - 195);
}

float  0x1.4727dcp-16
double 0x1.4727dcbddb984p-16
real   0x1.4727dcbddb9841acp-16
cast(real)float  0x1.4727dcp-16
cast(real)double 0x1.4727dcbddb984p-16
cast(real)real   0x1.4727dcbddb9841acp-16
float  -0x1.85ffeep+7
double -0x1.85ffee1bd1edap+7
real   -0x1.85ffee1bd1ed9dfep+7

I repeat, (I think) I understand what you are saying but can you explain
the output of this ...
<code>
import std.stdio;
import std.math;
import std.string;

void main() {
    float  x;
    double y;
    real   z;

    x = 0.0000195;
    y = 0.0000195;
    z = 0.0000195;

    writefln("                       %24s %24s", "Raw", "Floor");
    writefln("Using float  variable: %24a %24a",
             (.5 + 1e6*x), floor(.5 + 1e6*x));
    writefln("Using double variable: %24a %24a",
             (.5 + 1e6*y), floor(.5 + 1e6*y));
    writefln("Using real   variable: %24a %24a",
             (.5 + 1e6*z), floor(.5 + 1e6*z));

    writefln("Using float   literal: %24a %24a",
             (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));
    writefln("Using double  literal: %24a %24a",
             (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));
    writefln("Using real    literal: %24a %24a",
             (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
}
</code>

______________
Output is ...

Raw                    Floor
Using float  variable:         0x1.3fffff4afp+4                 0x1.3p+4
Using double variable:                 0x1.4p+4                 0x1.3p+4
Using real   variable:  0x1.3ffffffffffffe68p+4                 0x1.3p+4
Using float   literal:         0x1.3fffff4afp+4                 0x1.4p+4
Using double  literal:                 0x1.4p+4                 0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4                 0x1.4p+4

There seems to be different treatment of literals and variables.

Even apart from that, given the values above, I can understand the floor
behaviour except for rows 2 (double variable) and 6 (real literal).

--
Derek Parnell
Melbourne, Australia
2/04/2005 10:19:43 AM
```
Apr 01 2005
"Walter" <newshound digitalmars.com> writes:
```"Derek Parnell" <derek psych.ward> wrote in message
news:eouhnxxkjb80\$.clvse1356mlr.dlg 40tude.net...
There seems to be different treatment of literals and variables.

No, there isn't. The reason for the difference is when you assign the
literal to z. Use the 'L' suffix for a real literal.
```
Apr 01 2005
Derek Parnell <derek psych.ward> writes:
```On Fri, 1 Apr 2005 18:50:40 -0800, Walter wrote:

"Derek Parnell" <derek psych.ward> wrote in message
news:eouhnxxkjb80\$.clvse1356mlr.dlg 40tude.net...
There seems to be different treatment of literals and variables.

No, there isn't. The reason for the difference is when you assign the
literal to z. Use the 'L' suffix for a real literal.

Ok, I did that. And I still can't explain the output.

Raw                    Floor
Using float  variable:         0x1.3fffff4afp+4                 0x1.3p+4
Using double variable:                 0x1.4p+4                 0x1.3p+4
Using real   variable:  0x1.4000000000000002p+4                 0x1.4p+4
Using float   literal:         0x1.3fffff4afp+4                 0x1.4p+4
Using double  literal:                 0x1.4p+4                 0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4                 0x1.4p+4

Look at the results for doubles. How does floor(0x1.4p+4) give 0x1.3p+4
when the expression is a variable and give 0x1.4p+4 when the expression is
a literal?

--
Derek Parnell
Melbourne, Australia
2/04/2005 3:34:22 PM
```
Apr 01 2005
Derek Parnell <derek psych.ward> writes:
```On Sat, 2 Apr 2005 15:39:01 +1000, Derek Parnell wrote:

I've reformatted the display to make it easier to spot the anomaly.

Raw                    Floor
Using float  variable:         0x1.3fffff4afp+4                 0x1.3p+4
Using float   literal:         0x1.3fffff4afp+4                 0x1.4p+4

Using double variable:                 0x1.4p+4                 0x1.3p+4
Using double  literal:                 0x1.4p+4                 0x1.4p+4

Using real   variable:  0x1.4000000000000002p+4                 0x1.4p+4
Using real    literal:  0x1.4000000000000002p+4                 0x1.4p+4

And here is the program that created the above ...
<code>
import std.stdio;
import std.math;
import std.string;

void main() {
    float  x;
    double y;
    real   z;

    x = 0.0000195F;
    y = 0.0000195;
    z = 0.0000195L;

    writefln("                       %24s %24s", "Raw", "Floor");
    writefln("Using float  variable: %24a %24a",
             (.5 + 1e6*x), floor(.5 + 1e6*x));
    writefln("Using float   literal: %24a %24a",
             (.5 + 1e6*0.0000195f), floor(.5 + 1e6*0.0000195f));

    writefln("");
    writefln("Using double variable: %24a %24a",
             (.5 + 1e6*y), floor(.5 + 1e6*y));
    writefln("Using double  literal: %24a %24a",
             (.5 + 1e6*0.0000195), floor(.5 + 1e6*0.0000195));

    writefln("");
    writefln("Using real   variable: %24a %24a",
             (.5 + 1e6*z), floor(.5 + 1e6*z));
    writefln("Using real    literal: %24a %24a",
             (.5 + 1e6*0.0000195l), floor(.5 + 1e6*0.0000195l));
}
</code>
--
Derek Parnell
Melbourne, Australia
2/04/2005 4:48:12 PM
```
Apr 01 2005
"Walter" <newshound digitalmars.com> writes:
```"Derek Parnell" <derek psych.ward> wrote in message
news:124cwpdauczht\$.1wqi8sqkdi4ec.dlg 40tude.net...
Ok, I did that. And I still can't explain the output.

Recall that, at runtime, the intermediate values are allowed to be carried
out to 80 bits. So,

floor(.5 + 1e6*y)

is evaluated as:

floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

whereas:

floor(.5 + 1e6*0.0000195)

is evaluated as:

floor(cast(real)(.5 + 1e6*0.0000195))

hence the difference in result.
```
Apr 02 2005
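Walter's variable-versus-literal asymmetry can be simulated in any language by substituting a float64-to-float32 precision drop for D's real-to-double one. A sketch under that analogy (`narrow()` is a made-up helper for this illustration, not a D or Phobos API):

```python
import math
import struct

def narrow(x):
    """Round a 64-bit double to 32-bit float precision and widen it back --
    a stand-in for storing a literal in a narrower variable before the
    expression is evaluated at a wider precision."""
    return struct.unpack('f', struct.pack('f', x))[0]

# "Variable" case: the constant is narrowed FIRST, then the arithmetic
# runs at the wider precision. The lost mantissa bits never come back.
x = narrow(0.0000195)
print(math.floor(0.5 + 1e6 * x))          # 19

# "Literal" case: the whole expression is evaluated at one precision,
# where the roundings happen to land exactly on 20.0.
print(math.floor(0.5 + 1e6 * 0.0000195))  # 20
```

The mechanism (round early at a narrower precision, then compute wide) is the same one Walter describes, just demonstrated with different widths.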
"Bob W" <nospam aol.com> writes:
```"Walter" <newshound digitalmars.com> wrote in message
news:d2lodh\$2qhf\$1 digitaldaemon.com...
"Derek Parnell" <derek psych.ward> wrote in message
news:124cwpdauczht\$.1wqi8sqkdi4ec.dlg 40tude.net...
Ok, I did that. And I still can't explain the output.

Recall that, at runtime, the intermediate values are allowed to be carried
out to 80 bits. So,

floor(.5 + 1e6*y)

is evaluated as:

floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

whereas:

floor(.5 + 1e6*0.0000195)

is evaluated as:

floor(cast(real)(.5 + 1e6*0.0000195))

hence the difference in result.

It's C legacy hidden in the way the compiler parses
this code. You'll be facing these kind of questions
over and over again, unless you move a step further
away from C and let the compiler treat unsuffixed
literals as the "internal compiler floating point
precision format".

See my thread: "80 Bit Challenge"
```
Apr 02 2005
Derek Parnell <derek psych.ward> writes:
```On Sat, 2 Apr 2005 01:23:46 -0800, Walter wrote:

"Derek Parnell" <derek psych.ward> wrote in message
news:124cwpdauczht\$.1wqi8sqkdi4ec.dlg 40tude.net...
Ok, I did that. And I still can't explain the output.

Recall that, at runtime, the intermediate values are allowed to be carried
out to 80 bits. So,

floor(.5 + 1e6*y)

is evaluated as:

floor(cast(real).5 + cast(real)(1e6) * cast(real)y);

whereas:

floor(.5 + 1e6*0.0000195)

is evaluated as:

float(cast(real)(.5 + 1e6*0.0000195))

hence the difference in result.

Got it.

So to summarize, in expressions that contain at least one double variable,
each term is promoted to real before expression evaluation, but if the
expression only contains double literals, then the terms are not promoted
to real.

Why did you decide to have this anomaly?

--
Derek Parnell
Melbourne, Australia
2/04/2005 11:46:44 PM
```
Apr 02 2005
"Walter" <newshound digitalmars.com> writes:
```"Derek Parnell" <derek psych.ward> wrote in message
So to summarize, in expressions that contain at least one double variable,
each term is promoted to real before expression evaluation, but if the
expression only contains double literals, then the terms are not promoted
to real.

Why did you decide to have this anomaly?

It's the way C works.
```
Apr 02 2005
Derek Parnell <derek psych.ward> writes:
```On Sat, 2 Apr 2005 10:04:32 -0800, Walter wrote:

"Derek Parnell" <derek psych.ward> wrote in message
So to summarize, in expressions that contain at least one double variable,
each term is promoted to real before expression evaluation, but if the
expression only contains double literals, then the terms are not promoted
to real.

Why did you decide to have this anomaly?

It's the way C works.

I understand. And here I was thinking that D was meant to be better than C.

--
Derek Parnell
Melbourne, Australia
3/04/2005 8:13:09 AM
```
Apr 02 2005
"Bob W" <nospam aol.com> writes:
```"Derek Parnell" <derek psych.ward> wrote in message
[...]

Some further info:

Currently it seems that in the D language no
literal is ever promoted to real directly unless
it is suffixed with an "L". You can cast(real)
it, and it will still be a double which is
converted to a crippled real in the FPU, because
some of its mantissa bits went missing.

There are many exceptions though: all floating
point integers (1.0, 2.0, 10.0, etc.) and fractions
like 0.5, 0.25, 0.125, etc. are converted to proper
real values, because they are accurately
represented in binary floating point formats.
But even they are initially doubles, which are
just unharmed by the conversion because their
trailing mantissa bits are all zero.

Any other fractional number (e.g. 1.2) cannot be
represented accurately in the binary system, so
its double representation is not equivalent to
its real representation (nor is it to the decimal
literal). If such a double is converted to real,
it is missing several bits of precision, so it
will not correspond accurately to its properly
converted counterpart (e.g. 1.2L).

As a summary: If you feel the need using
extended double (real) precision in D,
never ever forget the "L" for literals unless
you want "special effects".

Examples:

real r = 1.2L;          // proper 80 bit real assigned to r
real r = 1.2;           // inaccurate truncated 80 bit real
real r = 2.4/2.0;       // inaccurate (2.4 loses precision)
real r = 2.4/2.0L;      // inaccurate for the same reason
real r = 2.4L/2.0;      // this one will work (2.0 == 2.0L)
real r = 2.4L/2.0L;     // that's the safe way to do it
real r = cast(real)1.2; // inaccurate, converted from 1.2 as a double

By the way, C does it the same way for historic
reasons. Other languages are more user friendly
and I am still hoping that D might evolve in this
direction.
```
Apr 02 2005
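Bob's closing rule (exact binary fractions survive a precision change; most decimal fractions do not) can be checked mechanically. Another sketch, again using a float64-to-float32 round trip as a stand-in for D's double-to-real conversion (`narrow()` is a hypothetical helper, not a real API):

```python
import struct

def narrow(x):
    """Round a 64-bit double to 32-bit float precision and widen it back."""
    return struct.unpack('f', struct.pack('f', x))[0]

# Exact binary fractions round-trip unchanged through the narrower format:
for v in (1.0, 2.0, 10.0, 0.5, 0.25, 0.125):
    assert narrow(v) == v

# A value like 1.2 has an infinite binary expansion, so narrowing discards
# mantissa bits that widening back cannot restore:
print(narrow(1.2) == 1.2)   # False
```

This is exactly why `real r = 2.4L/2.0;` works in Bob's table (2.0 is exact in every width) while `real r = 2.4/2.0L;` does not.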
"Walter" <newshound digitalmars.com> writes:
```"Bob W" <nospam aol.com> wrote in message
news:d2nd96\$1aos\$1 digitaldaemon.com...
By the way, C does it the same way for historic
reasons. Other languages are more user friendly
and I am still hoping that D might evolve in this
direction.

Actually, many languages, mathematical programs, and even C compilers have
*dropped* support for 80 bit long doubles. At one point, Microsoft had even
made it impossible to execute 80 bit floating instructions on their upcoming
Win64 (I made some frantic phone calls to them and apparently was the only
one who ever made a case to them in favor of 80 bit long doubles, they said
they'd put the support back in). Intel doesn't support 80 bit reals on any
of their new vector floating point instructions. The 64 bit chips only
support it in a 'legacy' manner. Java, C#, VC, Javascript do not support 80
bit reals.

I haven't done a comprehensive survey of computer languages, but as far as I
can tell D stands pretty much alone in its support for 80 bits, along with a
handful of C/C++ compilers (including DMC).

Because of this shaky operating system and chip support for 80 bits, it
would be a mistake to center D's floating point around 80 bits. Some systems
may force a reversion to 64 bits. On the other hand, ongoing system support
for 64 bit doubles is virtually guaranteed, and D generally follows C's
rules with these.

(BTW, this thread is a classic example of "build it, and they will come". D
is almost single handedly rescuing 80 bit floating point from oblivion,
since it makes such a big deal about it and has wound up interesting a lot
of people in it. Before D, as far as I could tell, nobody cared a whit about
it. I think it's great that this has struck such a responsive chord.)
```
Apr 03 2005
Anders F Björklund <afb algonet.se> writes:
```Walter wrote:

I haven't done a comprehensive survey of computer languages, but as far as I
can tell D stands pretty much alone in its support for 80 bits, along with a
handful of C/C++ compilers (including DMC).

The thing is that the D "real" type does *not* guarantee 80 bits ?
It doesn't even say the minimum size, so one can only assume 64...

I think it would be more clear to say "80 bits minimum", and then
future CPUs/code is still free to use 128-bit extended doubles too ?

(since D allows all FP calculations to be done at a higher precision)

This would be simplified by padding the 80-bit floating point to
a full 16 bytes, by adding zeros (as suggested by performance anyway)

And then, with both 128-bit integers and 128-bit floating point,
D would truly be equipped to face both today (64) and tomorrow...

(and with a "real" alias, it's still the "largest hardware implemented")

Just my 2 öre,
--anders
```
Apr 03 2005
"Walter" <newshound digitalmars.com> writes:
```"Anders F Björklund" <afb algonet.se> wrote in message
news:d2og5l\$27nh\$3 digitaldaemon.com...
Walter wrote:

I haven't done a comprehensive survey of computer languages, but as far
as I can tell D stands pretty much alone in its support for 80 bits,
along with a handful of C/C++ compilers (including DMC).

The thing is that the D "real" type does *not* guarantee 80 bits ?
It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software
emulator. I've used such emulators before, and they are really, really slow.
I don't think it's practical for D floating point to be 100x slower on some
machines.

I think it would be more clear to say "80 bits minimum", and then
future CPUs/code is still free to use 128-bit extended doubles too ?
(since D allows all FP calculations to be done at a higher precision)

What it's supposed to be is the max precision supported by the hardware the
D program is running on.

This would be simplified by padding the 80-bit floating point to
a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte
boundaries. To conform to the C ABI, D must follow suit.

And then, with both 128-bit integers and 128-bit floating point,
D would truly be equipped to face both today (64) and tomorrow...

(and with a "real" alias, it's still the "largest hardware implemented")

Just my 2 öre,
--anders

```
Apr 03 2005
Anders F Björklund <afb algonet.se> writes:
```Walter wrote:

The thing is that the D "real" type does *not* guarantee 80 bits ?
It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software
emulator. I've used such emulators before, and they are really, really slow.
I don't think it's practical for D floating point to be 100x slower on some
machines.

Me neither. Emulating 64-bit integers with two 32-bit registers is OK,
since that is a whole lot easier. (could even be done for 128-bit ints?)

But emulating 80-bit floating point ? Eww. Emulating a 128-bit double
is better, but the current method is cheating a lot on the IEEE 754 spec...

No, I meant that extended precision should be *unavailable* on some CPU.
But maybe it's better to have it work in D, like long double does in C ?

(i.e. it falls back to using regular doubles, possibly with warnings)

If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Since that was the whole idea... (have "extended" map to 80-bit FP type)

What it's supposed to be is the max precision supported by the hardware the
D program is running on.

OK, for PPC and PPC64 that is definitely 64 bits. Not sure about SPARC ?
Think I saw that Cray (or so) has 128-bit FP, but haven't got one... :-)

Likely real-life values would be: 64, 80, 96 and 128 bits
(PPC/PPC64, X86/X86_64, 68K, and whatever super-computer it was above)

It's possible that a future 128-bit CPU would have a 128-bit FPU too...
But who knows ? (I haven't even seen the slightest hint of such a beast)

This would be simplified by padding the 80-bit floating point to
a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte
boundaries. To conform to the C ABI, D must follow suit.

I thought that was an ABI option, how to align "long double" types ?

It was my understanding that it was aligned to 96 bits on X86,
and to 128 bits on X86_64. But I might very well be wrong there...
(it's just the impression that I got from reading the GCC manual)

i.e. it still uses the regular 80 bit floating point registers,
but pads the values out with zeroes when storing them in memory.

--anders
```
Apr 03 2005
"Walter" <newshound digitalmars.com> writes:
```"Anders F Björklund" <afb algonet.se> wrote in message
news:d2pdbk\$30dj\$1 digitaldaemon.com...
If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care,
but they're screwed anyway if the hardware won't support it.

What it's supposed to be is the max precision supported by the hardware
the D program is running on.

OK, for PPC and PPC64 that is definitely 64 bits. Not sure about SPARC ?
Think I saw that Cray (or so) has 128-bit FP, but haven't got one... :-)

It seems like likely real-life values would be: 64, 80, 96 and 128 bits
(PPC/PPC64, X86/X86_64, 68K, and whatever super-computer it was above)

It's possible that a future 128-bit CPU would have a 128-bit FPU too...
But who knows ? (I haven't even seen the slightest hint of such a beast)

When I first looked at the AMD64 documentation, I was thrilled to see "m128"
for a floating point type. I was crushed when I found it meant "two 64 bit
doubles". I'd love to see a big honker 128 bit floating point type in
hardware.

This would be simplified by padding the 80-bit floating point to
a full 16 bytes, by adding zeros (as suggested by performance anyway)

C compilers that support 80 bit long doubles will align them on 2 byte
boundaries. To conform to the C ABI, D must follow suit.

I thought that was an ABI option, how to align "long double" types ?

The only option is to align it to what the corresponding C compiler does.

It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

and to 128 bits on X86_64. But I might very well be wrong there...
(it's just the impression that I got from reading the GCC manual)

i.e. it still uses the regular 80 bit floating point registers,
but pads the values out with zeroes when storing them in memory.

--anders

```
Apr 03 2005
Anders F Björklund <afb algonet.se> writes:
```Walter wrote:

If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care,
but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended, is any different from
the int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float => IEEE 754 Single precision (32-bit)
- double => IEEE 754 Double precision (64-bit)
- extended => IEEE 754 Double Extended precision (80-bit)

And then have "real" be an alias to the largest hardware-supported type.
It wouldn't break code more than if it was a variadic size type format ?

When I first looked at the AMD64 documentation, I was thrilled to see "m128"
for a floating point type. I was crushed when I found it meant "two 64 bit
doubles". I'd love to see a big honker 128 bit floating point type in
hardware.

I had a similar experience, with PPC64 and GCC, a while back...
(-mlong-double-128, referring to the IBM AIX style DoubledDouble)

Anyway, double-double has no chance of being full IEEE 754 spec.

It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

But it was my understanding that on the X86/X86_64 family of processors
that Windows used to use 10-byte doubles (and then removed extended?),
and that Linux i386(-i686) uses 12-byte doubles and Linux X86_64 now
uses 16-byte doubles (using the GCC option of -m128bit-long-double)

And that was *not* a suggestion, but how it actually worked... Now ?

--anders
```
Apr 04 2005
Georg Wrede <georg.wrede nospam.org> writes:
```Anders F Björklund wrote:
It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

Size can be anything divisible by 8 bits, i.e. any number of bytes.

Alignment has to be a power of two, and is about _where_ in memory the
thing can or cannot be stored.

Align 4 for example, means that the variable cannot be stored in a
memory address which, taken as a number, is not divisible by 4.

Only something aligned 1 can be stored in any address.
```
Apr 04 2005
Anders F Björklund <afb algonet.se> writes:
```Georg Wrede wrote:

It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

Size can be anything divisible by 8 bits, i.e. any number of bytes.

Alignment has to be a power of two, and is about _where_ in memory the
thing can or cannot be stored.

Align 4 for example, means that the variable cannot be stored in a
memory address which, taken as a number, is not divisible by 4.

Only something aligned 1 can be stored in any address.

OK, seems like my sloppy syntax is hurting me once again... :-P

I meant that the *size* of "long double" on GCC X86 is 96 bits,
so that it can be *aligned* to 32 bits always (unlike 80 bits?)

Anyway, aligning to 128 bits gives better Pentium performance ?
(or at least, that's what I heard... Only have doubles on PPC)

Thanks for clearing it up; in my head 96 bits was "a power of two".
(since anything aligned to a multiple of a power of two is fine too)

--anders
```
Apr 04 2005
"Ben Hinkle" <ben.hinkle gmail.com> writes:
```"Anders F Björklund" <afb algonet.se> wrote in message
news:d2qq5u\$1aau\$1 digitaldaemon.com...
Walter wrote:

If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care,
but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended, is any different from the
int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float => IEEE 754 Single precision (32-bit)
- double => IEEE 754 Double precision (64-bit)
- extended => IEEE 754 Double Extended precision (80-bit)

And then have "real" be an alias to the largest hardware-supported type.
It wouldn't break code more than if it was a variadic size type format ?

What happens when someone declares a variable as quadruple on a platform
without hardware support? Does D plug in a software quadruple
implementation? That isn't the right thing to do. That's been my whole point
of bringing up Java's experience. They tried to foist too much rigor on
their floating point model in the name of portability and had to redo it.
```
Apr 04 2005
Anders F Björklund <afb algonet.se> writes:
```Ben Hinkle wrote:

What happens when someone declares a variable as quadruple on a platform
without hardware support? Does D plug in a software quadruple
implementation? That isn't the right thing to do. That's been my whole point
of bringing up Java's experience. They tried to foist too much rigor on
their floating point model in the name of portability and had to redo it.

Choke... Splutter... Die.

Java did not re-implement extended in software. They just ignored it...

--anders
```
Apr 04 2005
Anders F Björklund <afb algonet.se> writes:
```I wrote, in response to Ben Hinkle:

What happens when someone declares a variable as quadruple on a
platform without hardware support?

Choke... Splutter... Die.

Just to be perfectly clear:
Those are the sounds the *compiler* would make, not Ben :-)

Seriously, trying to use the extended or quadruple types on
platforms where they are not implemented in hardware would
be a compile time error. "real" would silently fall back.

--anders
```
Apr 04 2005
"Ben Hinkle" <ben.hinkle gmail.com> writes:
```"Anders F Björklund" <afb algonet.se> wrote in message
news:d2rcfd\$1ueq\$2 digitaldaemon.com...
I wrote, in response to Ben Hinkle:

What happens when someone declares a variable as quadruple on a platform
without hardware support?

Choke... Splutter... Die.

Just to be perfectly clear:
Those are the sounds the *compiler* would make, not Ben :-)

yup, I read it that way - though I did notice I spluttered a bit this
morning...

Seriously, trying to use the extended or quadruple types on
platforms where they are not implemented in hardware would
be a compile time error. "real" would silently fall back.

OK, needless to say I think a builtin type that is illegal on many platforms
is a mistake.
```
Apr 04 2005
Anders F Björklund <afb algonet.se> writes:
```Ben Hinkle wrote:

OK, needless to say I think a builtin type that
is illegal on many platforms is a mistake.

That is actually *not* needless to say,
but Walter agrees with you on the topic.

Just as we can talk about "real" as the
64/80/96/128 bit floating point type,
and not somehow assume that it will be
80 bits - then I'm perfectly fine with it.

"long double" in C/C++ works just the same.

But if you *do* want to talk about the "X87"
80-bit type, then please do by all means use
"extended" instead. Less confusion, all around ?
(let's save "quadruple" for later, with "cent")

--anders
```
Apr 04 2005
"Bob W" <nospam aol.com> writes:
```"Anders F Björklund" <afb algonet.se> wrote in message
news:d2rdjp\$1vfl\$1 digitaldaemon.com...
Ben Hinkle wrote:

OK, needless to say I think a builtin type that
is illegal on many platforms is a mistake.

That is actually *not* needless to say,
but Walter agrees with you on the topic.

Just as we can talk about "real" as the
64/80/96/128 bit floating point type,
and not somehow assume that it will be
80 bits - then I'm perfectly fine with it.

"long double" in C/C++ works just the same.

But if you *do* want to talk about the "X87"
80-bit type, then please do by all means use
"extended" instead. Less confusion, all around ?
(let's save "quadruple" for later, with "cent")

--anders

The IEEE 754r suggests that there won't be
an 80-bit nor a 96-bit format in future (whenever
this may be).

Ref.: My today's post about IEEE 754r
```
Apr 04 2005
Anders F Björklund <afb algonet.se> writes:
```Bob W wrote:

But if you *do* want to talk about the "X87"
80-bit type, then please do by all means use
"extended" instead. Less confusion, all around ?
(let's save "quadruple" for later, with "cent")

The IEEE 754r suggests that there won't be
an 80-bit nor a 96-bit format in future (whenever
this may be).

According to Sun, Microsoft, IBM and Apple
there isn't such an 80-bit type today even... ;-)

BTW, the 96-bit floating point was the type
preferred by the 68K family's FPU

--anders
```
Apr 04 2005
"Walter" <newshound digitalmars.com> writes:
```"Anders F Björklund" <afb algonet.se> wrote in message
news:d2rhtd\$258a\$1 digitaldaemon.com...
Bob W wrote:
The IEEE 754r suggests that there won't be
a 80bit nor a 96bit format in future (whenever
this may be).

According to Sun, Microsoft, IBM and Apple
there isn't such an 80-bit type today even... ;-)

I fear it will be constant struggle to keep the chipmakers from dropping it
and the OS vendors from abandoning support.
```
Apr 04 2005
```Ben Hinkle wrote:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2qq5u\$1aau\$1 digitaldaemon.com...

Walter wrote:

If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care,
but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended, is any different from the
int -> short/long that C has gotten so much beating for already ?

The suggestion was to have fixed precision types:
- float => IEEE 754 Single precision (32-bit)
- double => IEEE 754 Double precision (64-bit)
- extended => IEEE 754 Double Extended precision (80-bit)

And then have "real" be an alias to the largest hardware-supported type.
It wouldn't break code more than if it was a variadic size type format ?

What happens when someone declares a variable as quadruple on a platform
without hardware support? Does D plug in a software quadruple
implementation? That isn't the right thing to do. That's been my whole point
of bringing up Java's experience. They tried to foist too much rigor on
their floating point model in the name of portability and had to redo it.

Perhaps Ada has the right idea here.  Have a system default that
depends on the available hardware, but also allow the user to
define what size/precision is needed in any particular case.  It
may slow things down a lot if you demand 17 places of accuracy,
but if you really need exactly 17, you should be able to specify
it.  (OTOH, Ada had the govt. paying for its development, and it
still ended up as a language people didn't want to use.)
```
Apr 05 2005
"Walter" <newshound digitalmars.com> writes:
```"Anders F Björklund" <afb algonet.se> wrote in message
news:d2qq5u\$1aau\$1 digitaldaemon.com...
Walter wrote:
If so, just tell me it's better to have a flexible width language type,
than to have some types be unavailable on certain FPU computer hardware?

Yes, I believe that is better. Every once in a while, an app *does* care,
but they're screwed anyway if the hardware won't support it.

I just fail to see how real -> double/extended, is any different from
the int -> short/long that C has gotten so much beating for already ?

Philosophically, they are the same. Practically, however, they are very
different. Increasing integer sizes gives more range, and integer
calculations tend to be *right* or *wrong*. Floating point increased size,
however, gives more precision. So an answer is *better* or *worse*, instead
of right or wrong. (Increased bits also gives fp more range, but if the
range is not enough, it fails cleanly with an overflow indication, not just
wrapping around and giving garbage.) In other words, decreasing the bits in
an fp value tends to gracefully degrade the results, which is very different
from the effect on integer values.

The suggestion was to have fixed precision types:
- float => IEEE 754 Single precision (32-bit)
- double => IEEE 754 Double precision (64-bit)
- extended => IEEE 754 Double Extended precision (80-bit)

And then have "real" be an alias to the largest hardware-supported type.
It wouldn't break code more than if it was a variadic size type format ?

I just don't see the advantage. If you use "extended" and your hardware
doesn't support it, you're out of luck. If you use "real", your program will
still compile and run. If certain characteristics of the "real" type are
required, one can use static asserts on the properties of real.

It was my understanding that it was aligned to 96 bits on X86,

That's not a power of 2, so won't work as alignment.

You lost me ? (anyway, I suggested 128 - which *is* a power of two)

There is nothing set up in the operating system or linker to handle
alignment to 96 bits or other values not a power of 2. Note that there is a
big difference between the size of an object and what its alignment is.

But it was my understanding that on the X86/X86_64 family of processors
that Windows used to use 10-byte doubles (and then removed extended?),
and that Linux i386(-i686) uses 12-byte doubles and Linux X86_64 now
uses 16-byte doubles (using the GCC option of -m128bit-long-double)

And that was *not* a suggestion, but how it actually worked... Now ?

Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if
gcc on linux does it that way or not.
```
Apr 04 2005
Anders F Björklund <afb algonet.se> writes:
```Walter wrote:

Philosophically, they are the same. Practically, however, they are very
different. Increasing integer sizes gives more range, and integer
calculations tend to be *right* or *wrong*. Floating point increased size,
however, gives more precision. So an answer is *better* or *worse*, insted
of right or wrong. (Increased bits also gives fp more range, but if the
range is not enough, it fails cleanly with an overflow indication, not just
wrapping around and giving garbage.) In other words, decreasing the bits in
an fp value tends to gracefully degrade the results, which is very different
from the effect on integer values.

Interesting view of it, but I think that int fixed-point math degrades
gracefully in the same way (using integers). Still with wrapping, though.

Not that I've used fixed-point in quite some time, and it doesn't
seem like I will be either - with the current CPUs and the new APIs.

I just don't see the advantage. If you use "extended" and your hardware
doesn't support it, you're out of luck. If you use "real", your program will
still compile and run. If certain characteristics of the "real" type are
required, one can use static asserts on the properties of real.

To be honest, I was just tired of the "real is 80 bits" all over D ?
And more than a little annoyed at the ireal and creal, of course ;-)

I always thought that "long double" was confusing, so now I've started
to use "extended" for 80-bit and "real" for the biggest-available-type.

And it's working out good so far.

And that was *not* a suggestion, but how it actually worked... Now ?

Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not sure if
gcc on linux does it that way or not.

Linux on X86 aligns to 12 bytes, and Linux on X86_64 aligns to 16 bytes.

--anders
```
Apr 04 2005
Anders F Björklund <afb algonet.se> writes:
``` Windows uses 10 byte doubles aligned on 2 byte boundaries. I'm not
sure if gcc on linux does it that way or not.

Linux on X86 aligns to 12 bytes, and Linux on X86_64 aligns to 16 bytes.

Make that "Linux on X86 aligns to 4 bytes, by making the size 12".

You know what I mean :-)

--anders
```
Apr 04 2005
Charles Hixson <charleshixsn earthlink.net> writes:
```Walter wrote:
"Anders F Björklund" <afb algonet.se> wrote in message
news:d2og5l\$27nh\$3 digitaldaemon.com...

Walter wrote:

I haven't done a comprehensive survey of computer languages, but as far
as I can tell D stands pretty much alone in its support for 80 bits,
along with a handful of C/C++ compilers (including DMC).

The thing is that the D "real" type does *not* guarantee 80 bits ?
It doesn't even say the minimum size, so one can only assume 64...

Yes, it's 64. Guaranteeing 80 bits would require writing an 80 bit software
emulator. I've used such emulators before, and they are really, really slow.
I don't think it's practical for D floating point to be 100x slower on some
machines.
...

Would implementing fixed point arithmetic improve that?  Even
with a 128-bit integer as the underlying type, I think it would
have operational limitations, but it should be a lot faster then
"100 times as slow as hardware".  (OTOH, there's lots of reasons
why it isn't a normal feature of languages.  Apple on the 68000
series is the only computer I know of using it, and then only for
specialized applications.)
```
Apr 05 2005
"Walter" <newshound digitalmars.com> writes:
```"Charles Hixson" <charleshixsn earthlink.net> wrote in message
news:d2unfm\$2n6s\$1 digitaldaemon.com...
Would implementing fixed point arithmetic improve that?  Even
with a 128-bit integer as the underlying type, I think it would
have operational limitations, but it should be a lot faster then
"100 times as slow as hardware".  (OTOH, there's lots of reasons
why it isn't a normal feature of languages.  Apple on the 68000
series is the only computer I know of using it, and then only for
specialized applications.)

If using a 128 bit fixed point would work, then one can use integer
arithmetic on it. But that isn't floating point, which is a fundamentally
different animal.
```
Apr 05 2005
"Bob W" <nospam aol.com> writes:
```"Walter" <newshound digitalmars.com> wrote in message
news:d2od1o\$25vd\$1 digitaldaemon.com...
"Bob W" <nospam aol.com> wrote in message
news:d2nd96\$1aos\$1 digitaldaemon.com...
By the way, C does it the same way for historic
reasons. Other languages are more user friendly
and I am still hoping that D might evolve in this
direction.

Actually, many languages, mathematical programs, and even C compilers have
*dropped* support for 80 bit long doubles. At one point, Microsoft had even
made it impossible to execute 80 bit floating instructions on their upcoming
Win64 (I made some frantic phone calls to them and apparently was the only
one who ever made a case to them in favor of 80 bit long doubles, they said
they'd put the support back in). Intel doesn't support 80 bit reals on any
of their new vector floating point instructions. The 64 bit chips only
support it in a 'legacy' manner. Java, C#, VC, Javascript do not support 80
bit reals.

I haven't done a comprehensive survey of computer languages, but as far as I
can tell D stands pretty much alone in its support for 80 bits, along with a
handful of C/C++ compilers (including DMC).

Because of this shaky operating system and chip support for 80 bits, it
would be a mistake to center D's floating point around 80 bits. Some systems
may force a reversion to 64 bits. On the other hand, ongoing system support
for 64 bit doubles is virtually guaranteed, and D generally follows C's
rules with these.

(BTW, this thread is a classic example of "build it, and they will come". D
is almost single handedly rescuing 80 bit floating point from oblivion,
since it makes such a big deal about it and has wound up interesting a lot
of people in it. Before D, as far as I could tell, nobody cared a whit about
it. I think it's great that this has struck such a responsive chord.)

I am probably looking like an extended precision
advocate, but I am actually not. The double
format was good enough for me even for
statistical evaluation in almost 100% of cases.
There are admittedly cases which would benefit
from having 80 bit precision available, however.

Therefore, although it would not be devastating for
me should you ever decide to drop support for the
reals, I'd still like having them available just in
case they are needed. However, if you do offer
80 bit types you'll have to assign real variables
with proper real values if evaluation can be
completed at compile time. Otherwise I suggest
that you issue a warning where accuracy might
be impaired. It is hard to believe that a new
millennium programming language would actually
require people to write

real r=1.2L   instead of   real r=1.2

in order not to produce an incorrect assignment.
Yes, I know what C programmers would want
to say here, I am one of them.    : )

For someone not familiar with C, the number
1.2 is not a real and is not a double either,
especially if he is purely mathematically
oriented. It is a decimal floating point value.
He takes it for granted that 1.2 is fine whether
assigned to a float or to a double. But he will
refuse to understand why he has to suffix the
literal to become an accurate real value.

Of course you could try to explain to him that
the usual +/- 1/2 LSB error for most fractional
(decimal) values converted to binary would
increase to about 11 LSBs if he ever forgot
to use that important "L" suffix. But would
he really want to know?
```
Apr 03 2005
"Bob W" <nospam aol.com> writes:
```"Walter" <newshound digitalmars.com> wrote in message
news:d2kk71\$1pnl\$1 digitaldaemon.com...
"Bob W" <nospam aol.com> wrote in message
news:d2ieh5\$2ksl\$1 digitaldaemon.com...
- D is not entirely 80-bit based as claimed.

Example: std.string.atof() as mentioned below.

- Literals are converted to 64 bit first (and from there
to 80 bits) at compile time if no suffix is used, even
if the target is of type 'real'.

Incorrect. You can see for yourself in lexer.c. Do a grep for "strtold".

Maybe there is a misunderstanding:

I just wanted to mention that although it is claimed that
the default internal FP format is 80 bits, the default
floating point format for literals is double. The lexer
(at least to my understanding) seems to confirm this.
Therefore, if someone does not want to experience a
loss in precision, he ALWAYS needs to use the L suffix
for literals, otherwise he gets a real which was converted
from a double.

e.g.:

real r1=1.2L;  // this one is ok thanks to the suffix
real r2=1.2;   // loss in precision, double convt'd to real

- atof() for example is returning a 'real' value which is
obviously derived from a 'double', thus missing some
essential bits at the end.

Check out std.math2.atof(). It's fully 80 bit.

This one yes, but not the official Phobos
version std.string.atof() which I have used.
Phobos docs suggest that atof() can be found in

1) std.math (n/a)
2) std.string

and std.math2 is not even mentioned in the Phobos docs,
I've got it from std.string AND THIS ONE IS 64 BIT!

--------- quote from "c.stdlib.d" ---------
double atof(char *);
--------------- unquote -------------------

--------- quote from "string.d" ----------
real atof(char[] s)
{
    // BUG: should implement atold()
    return std.c.stdlib.atof(toStringz(s));
}
--------------- unquote -------------------

Due to heavy workload this issue might have
been overlooked. Luckily I do not even have
to mention the word "BUG"; this was apparently
already done in the author's comment line.  : )

After searching the archives it looks like
someone was already troubled by the multiple
appearance of atof() in Nov 2004:

http://www.digitalmars.com/d/archives/digitalmars/D/bugs/2196.html

Example:

The hex value for 0.0000195 in 'real' can be expressed as
3fef a393ee5e edcc20d5
or
3fef a393ee5e edcc20d6
(the decimal fraction has no exact binary representation,
so the last bit depends on the rounding direction).

The same value converted from a 'double' would be
3fef a393ee5e edcc2000
and therefore misses several trailing bits. This could
cause the floor() function to misbehave.

I hope this info was somewhat useful.

Perhaps the following program will help:

import std.stdio;

void main()
{
    writefln("float  %a", 0.0000195F);
    writefln("double %a", 0.0000195);
    writefln("real   %a", 0.0000195L);

    writefln("cast(real)float  %a", cast(real)0.0000195F);
    writefln("cast(real)double %a", cast(real)0.0000195);
    writefln("cast(real)real   %a", cast(real)0.0000195L);

    writefln("float  %a", 0.0000195F * 7 - 195);
    writefln("double %a", 0.0000195  * 7 - 195);
    writefln("real   %a", 0.0000195L * 7 - 195);
}

float  0x1.4727dcp-16
double 0x1.4727dcbddb984p-16
real   0x1.4727dcbddb9841acp-16
cast(real)float  0x1.4727dcp-16
cast(real)double 0x1.4727dcbddb984p-16
cast(real)real   0x1.4727dcbddb9841acp-16
float  -0x1.85ffeep+7
double -0x1.85ffee1bd1edap+7
real   -0x1.85ffee1bd1ed9dfep+7

In accordance with what I have mentioned before,
the following program demonstrates the
existence of "truncated" reals:

import std.stdio;

void main()
{
    real r1 = 1.2L;  // converted directly to 80 bit value
    real r2 = 1.2;   // parsed to 64b, then convt'd to 80b

    writefln("Genuine  : %a", r1);
    writefln("Truncated: %a", r2);
}

Output (using %a):

Genuine  : 0x1.3333333333333334p+0
Truncated: 0x1.3333333333333p+0

Alternative Output:

Genuine:    1.20000000000000000 [3fff 99999999 9999999a]
Truncated:  1.19999999999999996 [3fff 99999999 99999800]
```
Apr 01 2005
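[The 64-bit values quoted in the post above can be cross-checked in any language whose floats are IEEE 754 doubles. A minimal Python sketch (Python has no 80-bit type, so only the double side of the comparison is visible here):]

```python
import math

# 0.0000195 parsed as a 64-bit double: this matches the "converted from
# a 'double'" %a value quoted above -- the trailing bits of the 80-bit
# real (...41ac) are lost.
assert float.hex(0.0000195) == '0x1.4727dcbddb984p-16'

# 1.2 as a double is the "Truncated" value from the %a output.
assert float.hex(1.2) == '0x1.3333333333333p+0'

# The original floor() test from the thread: in pure double arithmetic
# 0.0000195 * 1e6 rounds to exactly 19.5, so adding 0.5 and flooring
# gives 20 -- the dmc/djgpp result, not D's 80-bit result of 19.
print(math.floor(0.0000195 * 1e6 + 0.5))  # 20
```

[The double representation of 0.0000195 sits about 3.5e-22 below the exact decimal value; multiplied by 1e6 that error is well under half an ulp of 19.5, so the product rounds to exactly 19.5. The more accurate 80-bit intermediate lands just below 19.5, which is why D floors to 19.]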
"Walter" <newshound digitalmars.com> writes:
```"Bob W" <nospam aol.com> wrote in message
news:d2kvcc$22qa$1 digitaldaemon.com...
I just wanted to mention that although it is claimed that
the default internal FP format is 80 bits,

Actually, what is happening is that if you write the expression:

double a, b, c, d;
a = b + c + d;

then the intermediate values generated by b+c+d are allowed (but not
required) to be evaluated to the largest precision available. This means
that it's allowed to evaluate it as:

a = cast(double)(cast(real)b + cast(real)c + cast(real)d);

but it is not required to evaluate it in that way. This produces a slightly
different result than:

double t;
t = b + c;
a = t + d;

The latter is the way Java is specified to work, which turns out to be both
numerically inferior and *slower* on the x86 FPU. The x86 FPU *wants* to
evaluate things to 80 bits.

The D compiler's internal paths fully support 80 bit arithmetic, that means
there are no surprising "choke points" where it gets truncated to 64 bits.
If the type of a literal is specified to be 'double', which is the case for
no suffix, then you get 64 bits of precision. I hope you'll agree that that
is the least surprising thing to do.

Check out std.math2.atof(). It's fully 80 bit.

I've got it from std.string AND THIS ONE IS 64 BIT!

True, that's a bug, and I'll fix it.
```
Apr 01 2005
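[Walter's point about intermediate rounding can be illustrated without an x87. The Python sketch below (the values b, c, d are hypothetical, chosen to make the effect visible at double precision) contrasts Java-style rounding after every operation with a single rounding of the exact sum, which is what keeping intermediates in 80-bit registers approximates:]

```python
from fractions import Fraction

# Hypothetical operands: 1e16 is exactly representable, but its ulp is 2,
# so adding 1.0 is below half an ulp and rounds away.
b, c, d = 1e16, 1.0, 1.0

# Java-style: round to double after every operation.
# 1e16 + 1 is a tie that rounds back to 1e16 (round-to-even), twice.
stepwise = (b + c) + d

# Extended-precision style: keep the intermediate sum exact and round
# once at the end. The exact sum 1e16 + 2 IS representable as a double.
extended = float(Fraction(b) + Fraction(c) + Fraction(d))

print(stepwise)   # 1e+16
print(extended)   # 1.0000000000000002e+16
assert stepwise != extended
```

[With per-step rounding the two 1.0 additions vanish entirely; with one final rounding they survive. This is the sense in which Walter calls the Java scheme numerically inferior to letting the x86 FPU carry 80-bit intermediates.]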
"Bob W" <nospam aol.com> writes:
```I have started a new thread: "80 Bit Challenge",