
digitalmars.D - Loss of precision errors in FP conversions

bearophile <bearophileHUGS lycos.com> writes:
In Bugzilla I have just added an enhancement request that asks for a small
change in D; I don't know if it was already discussed, or if it's already
present in Bugzilla:
http://d.puremagic.com/issues/show_bug.cgi?id=5864

In a program like this:

void main() {
    uint x = 10_000;
    ubyte b = x;
}


DMD 2.052 raises a compilation error like this, because the b = x assignment
may lose some information (some bits of x):

test.d(3): Error: cannot implicitly convert expression (x) of type uint to ubyte

I think that a safe and good systems language has to help avoid unwanted
(implicit) loss of information during data conversions.

This is a case of loss of precision where D generates no compile errors:


import std.stdio;
void main() {
    real f1 = 1.0000111222222222333;
    writefln("%.19f", f1);
    double f2 = f1; // loss of FP precision
    writefln("%.19f", f2);
    float f3 = f2; // loss of FP precision
    writefln("%.19f", f3);
}

Even so, some information is lost, as the output shows:
1.0000111222222222332
1.0000111222222223261
1.0000110864639282226

So one possible way to address this situation is to statically disallow
double=>float, real=>float, and real=>double conversions (on some computers
real=>double conversions don't cause loss of information, but I suggest
ignoring this, to increase code portability), and to introduce compile-time
errors like:

test.d(5): Error: cannot implicitly convert expression (f1) of type real to
double
test.d(7): Error: cannot implicitly convert expression (f2) of type double to
float
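
With such a rule the narrowing conversions would still be available, they
would just have to be written explicitly, for example with a cast (just a
sketch of how the earlier program may look under the proposed rule):

import std.stdio;
void main() {
    real f1 = 1.0000111222222222333;
    double f2 = cast(double)f1; // explicit: the loss of precision is visible
    writefln("%.19f", f2);
    float f3 = cast(float)f2;   // explicit: the loss of precision is visible
    writefln("%.19f", f3);
}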


Today float values seem less useful, because with serial CPU instructions the
performance difference between operations on float and double is often not
important, and often you want the precision of doubles. But modern CPUs (and
current GPUs) have vector operations too. They are currently able to perform
operations on 4 float values or 2 double values (or 8 floats or 4 doubles) at
the same time with each instruction. Such vector instructions are sometimes
used directly in C code compiled by GCC through SSE intrinsics, or they come
out of GCC's auto-vectorization of loops in normal serial C code. In this
situation the use of float instead of double gives almost a twofold performance
increase. There are programs (like certain ray-tracing code) where the
precision of a float is enough. So a compile-time error that catches currently
implicit double->float conversions may help the programmer avoid unwanted
usages of doubles, allowing the compiler to pack 4/8 floats in a vector
register during loop vectorization.
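
As a small illustration of that risk (a hypothetical sketch, not code from the
enhancement request): in the first function below the unsuffixed 1.5 literal is
a double, so every element is widened to double and then narrowed back to
float, which may prevent the compiler from packing 4/8 floats per vector
register; the second function keeps the whole loop in single precision:

void scale(float[] data) {
    foreach (ref x; data)
        x = x * 1.5;   // 1.5 is a double literal: float -> double -> float
}

void scaleOnlyFloats(float[] data) {
    foreach (ref x; data)
        x = x * 1.5f;  // float literal: the loop stays in single precision
}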


A partially related note: currently std.math doesn't seem to use the cosf and
sinf C functions, but it does use sqrtf:

import std.math: sqrt, sin, cos;
void main() {
    float x = 1.0f;
    static assert(is(typeof(  sqrt(x)  ) == float)); // OK
    static assert(is(typeof(  sin(x)   ) == float)); // ERR
    static assert(is(typeof(  cos(x)   ) == float)); // ERR
}
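
(A possible workaround for code that needs single precision sin/cos today is
to call the C functions directly; just a sketch, assuming core.stdc.math
declares sinf and cosf on your platform:)

import core.stdc.math: sinf, cosf;
void main() {
    float x = 1.0f;
    float s = sinf(x); // single precision C sin
    float c = cosf(x); // single precision C cos
}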

Bye,
bearophile
Apr 19 2011
dsimcha <dsimcha yahoo.com> writes:
On 4/19/2011 7:49 PM, bearophile wrote:
 So one possible way to address this situation is to statically disallow
 double=>float, real=>float, and real=>double conversions [...] and to
 introduce compile-time errors like:

 test.d(5): Error: cannot implicitly convert expression (f1) of type real to
 double
 test.d(7): Error: cannot implicitly convert expression (f2) of type double to
 float
Please, _NOOOOOOO!!_ The integer conversion errors are already arguably too pedantic, make generic code harder to write and get in the way about as often as they help.

Floating point tends to degrade much more gracefully than integer. Where integer narrowing can just be silently, non-obviously and completely wrong, floating point narrowing will at least be approximately right, or become infinity and be wrong in an obvious way. I know what you suggest could prevent bugs in a lot of cases, but it also has the potential to get in the way in a lot of cases.

Generally I worry about D's type system becoming like the Boy Who Cried Wolf, where it flags so many potential errors (as opposed to things that are definitely errors) that people become conditioned to just put in whatever casts they need to shut it up. I definitely fell into that when porting some 32-bit code that was sloppy with size_t vs. int to 64-bit. I knew there was no way it was going to be a problem, because there was no way any of my arrays were going to be even within a few orders of magnitude of int.max, but the compiler insisted on nagging me about it and I reflexively just put in casts everywhere.

A warning _may_ be appropriate, but definitely not an error.
Apr 19 2011
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 I know what you suggest could prevent bugs in a lot of 
 cases, but it also has the potential to get in the way in a lot of cases.
You are right, and I am ready to close that enhancement request at once if the consensus is against it. double->float and real->float cases are not so common. How often do you use floats in your code? In my code it's uncommon to use floats, generally I use doubles.

A problem may be in real->double conversions, because I think D feels free to use intermediate real values in some FP computations.

Another possible problem: generic code like this is going to produce an error, because the 2.0 literal is a double, so x * 2.0 is a double even if x is a float:

T foo(T)(T x) {
    return x * 2.0;
}

But when I use floats, I have found useful a C lint that spots double->float conversions, because it has actually allowed me to speed up some code that was doing float->double->float conversions without me being aware of it.
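
(A possible way to keep such generic code in the precision of T is to make the conversion of the literal explicit; just a sketch:)

T foo(T)(T x) {
    return x * cast(T)2.0;  // the multiplication stays in the precision of T
}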
 A warning _may_ be appropriate, but definitely not an error.
Another option is a -warn_fp_precision_loss compiler switch that produces warnings only when you use it. For my purposes this is enough.

Regarding the actual amount of trouble these error messages are going to cause, I have recently shown a link that argues for quantitative analysis of language changes:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=135049

The idea here is to introduce those three FP conversion errors, compile Phobos and some other D2 code, and count how many problems they cause.

Bye,
bearophile
Apr 19 2011
dsimcha <dsimcha yahoo.com> writes:
On 4/19/2011 8:37 PM, bearophile wrote:
 dsimcha:

 I know what you suggest could prevent bugs in a lot of
 cases, but it also has the potential to get in the way in a lot of cases.
You are right, and I am ready to close that enhancement request at once if the consensus is against it. double->float and real->float cases are not so common. How often do you use floats in your code? In my code it's uncommon to use floats, generally I use doubles.
Very often, actually. Basically, any time I have a lot of floating point numbers that aren't going to be extremely big or small in magnitude and I'm interested in storing them and maybe performing a few _simple_ computations with them (sorting, statistical tests, most machine learning algorithms, etc.). Good examples are gene expression levels or transformations thereof and probabilities. Single precision is plenty unless your numbers are extremely big or small, you need a ridiculous number of significant figures, or you're performing intense computations (for example matrix factorizations) where rounding error may accumulate and turn a small loss of precision into a catastrophic one.
Apr 19 2011
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/19/11 7:37 PM, bearophile wrote:
 dsimcha:

 I know what you suggest could prevent bugs in a lot of
 cases, but it also has the potential to get in the way in a lot of cases.
You are right, and I am ready to close that enhancement request at once if the consensus is against it.
Yes please. I once felt the same way, but learned better since. Andrei
Apr 19 2011
bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 Yes please. I once felt the same way, but learned better since.
OK :-) But I will probably write an article about this, because I have found a performance problem that DMD currently doesn't help me avoid, but that a C lint helped me avoid.

Bye,
bearophile
Apr 19 2011
dsimcha <dsimcha yahoo.com> writes:
On 4/19/2011 10:42 PM, bearophile wrote:
 Andrei:

 Yes please. I once felt the same way, but learned better since.
OK :-) But I will probably write an article about this, because I have found a performance problem that DMD currently doesn't help me avoid, but that a C lint helped me avoid. Bye, bearophile
...or write a Lint tool for D.
Apr 19 2011
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 19 Apr 2011 20:37:47 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:
 dsimcha:

 I know what you suggest could prevent bugs in a lot of
 cases, but it also has the potential to get in the way in a lot of  
 cases.
You are right, and I am ready to close that enhancement request at once if the consensus is against it. double->float and real->float cases are not so common. How often do you use floats in your code? In my code it's uncommon to use floats, generally I use doubles.
I do GPGPU work, so I use floats all the time. They're also useful for data storage purposes.
 A problem may be in real->double conversions, because I think D feels  
 free to use intermediate real values in some FP computations.
For your information, the x87 can only perform computations at 80 bits, so all intermediate values are computed as reals. It's just how the hardware works. Now, I know some compilers (e.g. VS) let you set a flag which basically causes the system to avoid intermediate values altogether, or to use SIMD instructions instead, in order to be properly compliant.
 Another possible problem: generic code like this is going to produce an  
 error because 2.0 literal is double, so x*2.0 is a double even if x is  
 float:

 T foo(T)(T x) {
     return x * 2.0;
 }

 But when I use floats, I have found it good a C lint that spots  
 double->float conversions, because it has actually allowed me to speed  
 up some code that was doing float->double->float conversions without me  
 being aware of it.
Yes, this auto-promotion of literals is very annoying, and it would be nice if constants could smartly match the expression type. By the way, C/C++ also behave this way, which has gotten me into the habit of adding f after all my floating point constants.
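
For example, a tiny check (just a sketch) of how the literals are typed today:

void main() {
    float x = 1.0f;
    static assert(is(typeof(2.0)      == double)); // unsuffixed literal is double
    static assert(is(typeof(x * 2.0)  == double)); // x gets promoted to double
    static assert(is(typeof(x * 2.0f) == float));  // the f suffix keeps it float
}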
Apr 19 2011
bearophile <bearophileHUGS lycos.com> writes:
Robert Jacques:

 I do GP GPU work, so I use floats all the time. They're also useful for  
 data storage purposes.
Today GPUs are just starting to manage doubles efficiently (Tesla?).
 For your information, the x87 can only perform computations at 80-bits.
If you compile D1 code that doesn't contain "real" types with 32-bit LDC, it uses SSE instructions by default (just 8 registers), which means most computations are done with 64-bit doubles. And in real programs, which use trigonometry etc., this is not the whole story.
 Yes, this auto-promotion of literals is very annoying, and it would be  
 nice if constants could smartly match the expression type.
Polysemous literals in general (here just the floating point ones) have been discussed several times in the past, but I don't know if floating point polysemous literals can be implemented well, or what consequences they would have on D code. Maybe Don is able to give a good comment on this.
 By the way, C/C++ also behave this way, which has gotten me into the
 habit of adding f after all my floating point constants.
I presume that if you take a good amount of care, in C (and probably in D too) you are able to avoid the performance problems I was talking about. But given a perfect programmer, most warnings become useless :-) Warnings are usually meant for programmers that make mistakes, don't know enough yet, miss things, etc.

Bye,
bearophile
Apr 20 2011
parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 20 Apr 2011 06:23:01 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Robert Jacques:

 I do GP GPU work, so I use floats all the time. They're also useful for
 data storage purposes.
Today GPUs are just starting to manage doubles efficiently (Tesla?).
IIRC, the Fermi Tesla cards do doubles at about 1/2 float speed, so relative to other GPUs they are both the most efficient and have the best performance. So if there were something that really needed them, I'd use them. But doubles require double the registers and double the memory bandwidth, both of which make a dramatic performance difference (depending on the code). So I'll be sticking to floats and halves for the foreseeable future.
Apr 20 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 4/19/2011 5:02 PM, dsimcha wrote:
 Generally I worry about D's type system becoming like the Boy Who Cried Wolf,
 where it flags so many potential errors (as opposed to things that are
 definitely errors) that people become conditioned to just put in whatever casts
 they need to shut it up. I definitely fell into that when porting some 32-bit
 code that was sloppy with size_t vs. int to 64-bit. I knew there was no way it
 was going to be a problem, because there was no way any of my arrays were going
 to be even within a few orders of magnitude of int.max, but the compiler
 insisted on nagging me about it and I reflexively just put in casts everywhere.
 A warning _may_ be appropriate, but definitely not an error.
That's definitely a worry. Having a Nagging Nellie giving false alarms on errors too often will:

1. cause programmers to hate D

2. lead to MORE bugs, and harder to find ones, as Bruce Eckel pointed out, because people will put in things "just to shut up the compiler"

Hence my reluctance to add in a lot of these suggestions.

As to the specific suggestion of erroring on reduced precision, my talks with people who actually write a lot of FP code for a living say NO. They don't want it. Losing precision in FP calculations is a fact of life, and FP programmers simply must understand it and deal with it. Having the compiler annoy you about it would be less than helpful.
Apr 19 2011
bearophile <bearophileHUGS lycos.com> writes:
Walter:

 Hence my reluctance to add in a lot of these suggestions.
In an answer I've suggested the alternative solution of a -warn_fp_precision_loss compiler switch that produces warnings only when you use it. In theory this avoids most of the Nagging Nellie problem, because you use this switch only in special situations. But I am aware that you generally don't like warnings.
 As to the specific about erroring on reducing precision, my talks with people
 who actually do write a lot of FP code for a living is NO. They don't want it.
 Losing precision in FP calculations is a fact of life, and FP programmers
simply
 must understand it and deal with it. Having the compiler annoy you about it
 would be less than helpful.
Losing some bits of precision in normal FP operations is a fact of life, but a double->float conversion usually loses a much more significant amount of precision, and it's not a fact of life: it's the code that in some way asks for this irreversible conversion.

A related problem your answer doesn't take into account is unwanted float->double conversions (which get spotted by those error messages just because the code actually performs float->double->float conversions). Such unwanted conversions have caused performance loss in some of my C code on a CPU in 32-bit mode (maybe this problem is not present in 64-bit code), because the code was actually using doubles. A C lint allowed me to spot such problems and fix them.

Thank you for your answers,
bye,
bearophile
Apr 19 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 4/19/2011 6:46 PM, bearophile wrote:
 Walter:

 Hence my reluctance to add in a lot of these suggestions.
In an answer I've suggested the alternative solution of a -warn_fp_precision_loss compiler switch, that produces warnings only when you use it. In theory this avoids most of the Nagging Nellie problem, because you use this switch only in special situations. But I am aware you generally don't like warnings.
Yes, I've argued strongly against warnings.
Apr 19 2011
Brad Roberts <braddr slice-2.puremagic.com> writes:
On Tue, 19 Apr 2011, Walter Bright wrote:

 On 4/19/2011 6:46 PM, bearophile wrote:
 Walter:
 
 Hence my reluctance to add in a lot of these suggestions.
In an answer I've suggested the alternative solution of a -warn_fp_precision_loss compiler switch, that produces warnings only when you use it. In theory this avoids most of the Nagging Nellie problem, because you use this switch only in special situations. But I am aware you generally don't like warnings.
Yes, I've argued strongly against warnings.
The stronger argument, the one I agree with, is for not having flag-based, sometimes-on warnings. The more flags you have, the more complex the matrix of landmines there is. I hate micro-management, in all its forms.
Apr 19 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 4/19/2011 7:11 PM, Brad Roberts wrote:
 The stronger argument, the one I agree with, is for not having flag-based,
 sometimes-on warnings.  The more flags you have, the more complex the matrix
 of landmines there is.  I hate micro-management, in all its forms.
True, if you have N compiler switches, you have 2^N different compilers to test! Every switch added doubles the time it takes to validate the compiler. If you have N warnings that can be independently toggled, you have 2^N different languages.
Apr 19 2011
Sean Kelly <sean invisibleduck.org> writes:
On Apr 19, 2011, at 11:04 PM, Walter Bright wrote:

 On 4/19/2011 7:11 PM, Brad Roberts wrote:
 The stronger argument, the one I agree with, is for not having flag-based,
 sometimes-on warnings.  The more flags you have, the more complex the matrix
 of landmines there is.  I hate micro-management, in all its forms.
 True, if you have N compiler switches, you have 2^N different compilers to
 test! Every switch added doubles the time it takes to validate the compiler.
Software testing theory has suggestions for how to reduce the number of test cases here with only a small sacrifice in general error detection. Still, the fewer switches the better :-)
Apr 20 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 4/20/2011 9:28 AM, Sean Kelly wrote:
 Software testing theory has suggestions for how to reduce the number of test
 cases here with only a small sacrifice in general error detection.  Still,
 the fewer switches the better :-)
Currently I test with all combinations of switches that affect code gen. Sometimes, it will unexpectedly catch an odd interaction.
Apr 20 2011
Jesse Phillips <jessekphillips+D gmail.com> writes:
bearophile Wrote:

 In Bugzilla I have just added an enhancement request that asks for a little
change in D, I don't know if it was already discussed or if it's already
present in Bugzilla:
 http://d.puremagic.com/issues/show_bug.cgi?id=5864
Losing precision on a fractional number seems like it would be a very common and desirable case. For one thing, even dealing with real doesn't mean you won't have errors; you really end up in very tricky areas when using float over double would actually cause an issue in the program.

I'm not sure this would be the best avenue to take in identifying possible performance issues with FP conversions.

And last, don't forget about significant figures. That suggests it would be better to produce an error when widening, since you really don't have the precision of a double or real.
Apr 19 2011