
digitalmars.D - Loss of precision errors in FP conversions

bearophile <bearophileHUGS lycos.com> writes:
In Bugzilla I have just added an enhancement request that asks for a small
change in D; I don't know if it was already discussed, or if it's already
present in Bugzilla:
http://d.puremagic.com/issues/show_bug.cgi?id=5864

In a program like this:

void main() {
    uint x = 10_000;
    ubyte b = x;
}


DMD 2.052 raises a compilation error like this, because the b = x assignment
may lose some information (some bits of x):

test.d(3): Error: cannot implicitly convert expression (x) of type uint to ubyte

I think that a safe and good systems language has to help avoid unwanted
(implicit) loss of information during data conversions.

This is a case of loss of precision where D generates no compile errors:


import std.stdio;
void main() {
    real f1 = 1.0000111222222222333;
    writefln("%.19f", f1);
    double f2 = f1; // loss of FP precision
    writefln("%.19f", f2);
    float f3 = f2; // loss of FP precision
    writefln("%.19f", f3);
}

Even so, some information is lost, as the output shows:
1.0000111222222222332
1.0000111222222223261
1.0000110864639282226

So one possible way to address this situation is to statically disallow
double=>float, real=>float, and real=>double conversions (on some computers
real=>double conversions don't cause loss of information, but I suggest
ignoring this, to increase code portability), and to introduce compile-time
errors like:

test.d(5): Error: cannot implicitly convert expression (f1) of type real to
double
test.d(7): Error: cannot implicitly convert expression (f2) of type double to
float
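
With such a rule the narrowing conversions would still be available, they
would just have to be written explicitly, for example with a cast (just a
sketch of how the earlier program may look under the proposed rule):

import std.stdio;
void main() {
    real f1 = 1.0000111222222222333;
    double f2 = cast(double)f1; // explicit: the loss of precision is visible
    writefln("%.19f", f2);
    float f3 = cast(float)f2;   // explicit: the loss of precision is visible
    writefln("%.19f", f3);
}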


Today float values seem less useful, because with serial CPU instructions the
performance difference between operations on float and double is often not
important, and often you want the precision of doubles. But modern CPUs (and
current GPUs) have vector operations too. They are currently able to perform
operations on 4 float values or 2 double values (or 8 floats or 4 doubles) at
the same time with each instruction. Such vector instructions are sometimes
used directly in C code compiled by GCC through SSE intrinsics, or they come
out of GCC's auto-vectorization of loops in normal serial C code. In this
situation the use of float instead of double gives almost a twofold performance
increase. There are programs (like certain ray-tracing code) where the
precision of a float is enough. So a compile-time error that catches currently
implicit double->float conversions may help the programmer avoid unwanted
usages of doubles, allowing the compiler to pack 4/8 floats in a vector
register during loop vectorization.
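
As a small illustration of that risk (a hypothetical sketch, not code from the
enhancement request): in the first function below the unsuffixed 1.5 literal is
a double, so every element is widened to double and then narrowed back to
float, which may prevent the compiler from packing 4/8 floats per vector
register; the second function keeps the whole loop in single precision:

void scale(float[] data) {
    foreach (ref x; data)
        x = x * 1.5;   // 1.5 is a double literal: float -> double -> float
}

void scaleOnlyFloats(float[] data) {
    foreach (ref x; data)
        x = x * 1.5f;  // float literal: the loop stays in single precision
}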


A partially related note: currently std.math doesn't seem to use the cosf and
sinf C functions, but it does use sqrtf:

import std.math: sqrt, sin, cos;
void main() {
    float x = 1.0f;
    static assert(is(typeof(  sqrt(x)  ) == float)); // OK
    static assert(is(typeof(  sin(x)   ) == float)); // ERR
    static assert(is(typeof(  cos(x)   ) == float)); // ERR
}
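
(A possible workaround for code that needs single precision sin/cos today is
to call the C functions directly; just a sketch, assuming core.stdc.math
declares sinf and cosf on your platform:)

import core.stdc.math: sinf, cosf;
void main() {
    float x = 1.0f;
    float s = sinf(x); // single precision C sin
    float c = cosf(x); // single precision C cos
}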

Bye,
bearophile
Apr 19 2011
dsimcha <dsimcha yahoo.com> writes:
On 4/19/2011 7:49 PM, bearophile wrote:
 So one possible way to address this situation is to statically disallow
 double=>float, real=>float, and real=>double conversions [...] and to
 introduce compile-time errors like:

 test.d(5): Error: cannot implicitly convert expression (f1) of type real to
 double
 test.d(7): Error: cannot implicitly convert expression (f2) of type double to
 float
Please, _NOOOOOOO!!_ The integer conversion errors are already arguably too pedantic, make generic code harder to write and get in the way about as often as they help.

Floating point tends to degrade much more gracefully than integer. Where integer narrowing can just be silently, non-obviously and completely wrong, floating point narrowing will at least be approximately right, or become infinity and be wrong in an obvious way. I know what you suggest could prevent bugs in a lot of cases, but it also has the potential to get in the way in a lot of cases.

Generally I worry about D's type system becoming like the Boy Who Cried Wolf, where it flags so many potential errors (as opposed to things that are definitely errors) that people become conditioned to just put in whatever casts they need to shut it up. I definitely fell into that when porting some 32-bit code that was sloppy with size_t vs. int to 64-bit. I knew there was no way it was going to be a problem, because there was no way any of my arrays were going to be even within a few orders of magnitude of int.max, but the compiler insisted on nagging me about it and I reflexively just put in casts everywhere.

A warning _may_ be appropriate, but definitely not an error.
Apr 19 2011
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 I know what you suggest could prevent bugs in a lot of 
 cases, but it also has the potential to get in the way in a lot of cases.
You are right, and I am ready to close that enhancement request at once if the consensus is against it. double->float and real->float cases are not so common. How often do you use floats in your code? In my code it's uncommon to use floats, generally I use doubles.

A problem may be in real->double conversions, because I think D feels free to use intermediate real values in some FP computations.

Another possible problem: generic code like this is going to produce an error, because the 2.0 literal is a double, so x * 2.0 is a double even if x is a float:

T foo(T)(T x) {
    return x * 2.0;
}

But when I use floats, I have found useful a C lint that spots double->float conversions, because it has actually allowed me to speed up some code that was doing float->double->float conversions without me being aware of it.
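
(A possible way to keep such generic code in the precision of T is to make the conversion of the literal explicit; just a sketch:)

T foo(T)(T x) {
    return x * cast(T)2.0;  // the multiplication stays in the precision of T
}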
 A warning _may_ be appropriate, but definitely not an error.
Another option is a -warn_fp_precision_loss compiler switch that produces warnings only when you use it. For my purposes this is enough.

Regarding the actual amount of trouble these error messages are going to cause, I have recently shown a link that argues for quantitative analysis of language changes:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=135049

The idea here is to introduce those three FP conversion errors, compile Phobos and some other D2 code, and count how many problems they cause.

Bye,
bearophile
Apr 19 2011
dsimcha <dsimcha yahoo.com> writes:
On 4/19/2011 8:37 PM, bearophile wrote:
 dsimcha:

 I know what you suggest could prevent bugs in a lot of
 cases, but it also has the potential to get in the way in a lot of cases.
You are right, and I am ready to close that enhancement request at once if the consensus is against it. double->float and real->float cases are not so common. How often do you use floats in your code? In my code it's uncommon to use floats, generally I use doubles.
Very often, actually. Basically, any time I have a lot of floating point numbers that aren't going to be extremely big or small in magnitude and I'm interested in storing them and maybe performing a few _simple_ computations with them (sorting, statistical tests, most machine learning algorithms, etc.). Good examples are gene expression levels or transformations thereof and probabilities. Single precision is plenty unless your numbers are extremely big or small, you need a ridiculous number of significant figures, or you're performing intense computations (for example matrix factorizations) where rounding error may accumulate and turn a small loss of precision into a catastrophic one.
Apr 19 2011
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 4/19/11 7:37 PM, bearophile wrote:
 dsimcha:

 I know what you suggest could prevent bugs in a lot of
 cases, but it also has the potential to get in the way in a lot of cases.
You are right, and I am ready to close that enhancement request at once if the consensus is against it.
Yes please. I once felt the same way, but learned better since. Andrei
Apr 19 2011
bearophile <bearophileHUGS lycos.com> writes:
Andrei:

 Yes please. I once felt the same way, but learned better since.
OK :-) But I will probably write an article about this, because I have found a performance problem that DMD currently doesn't help me avoid, but that a C lint helped me avoid.

Bye,
bearophile
Apr 19 2011
dsimcha <dsimcha yahoo.com> writes:
On 4/19/2011 10:42 PM, bearophile wrote:
 Andrei:

 Yes please. I once felt the same way, but learned better since.
OK :-) But I will probably write an article about this, because I have found a performance problem that DMD currently doesn't help me avoid, but that a C lint helped me avoid. Bye, bearophile
...or write a Lint tool for D.
Apr 19 2011
"Robert Jacques" <sandford jhu.edu> writes:
On Tue, 19 Apr 2011 20:37:47 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:
 dsimcha:

 I know what you suggest could prevent bugs in a lot of
 cases, but it also has the potential to get in the way in a lot of  
 cases.
You are right, and I am ready to close that enhancement request at once if the consensus is against it. double->float and real->float cases are not so common. How often do you use floats in your code? In my code it's uncommon to use floats, generally I use doubles.
I do GPGPU work, so I use floats all the time. They're also useful for data storage purposes.
 A problem may be in real->double conversions, because I think D feels  
 free to use intermediate real values in some FP computations.
For your information, the x87 can only perform computations at 80 bits, so all intermediate values are computed as reals. It's just how the hardware works. Now, I know some compilers (e.g. VS) let you set a flag which basically causes the system to avoid intermediate values altogether, or to use SIMD instructions instead, in order to be properly compliant.
 Another possible problem: generic code like this is going to produce an  
 error because 2.0 literal is double, so x*2.0 is a double even if x is  
 float:

 T foo(T)(T x) {
     return x * 2.0;
 }

 But when I use floats, I have found it good a C lint that spots  
 double->float conversions, because it has actually allowed me to speed  
 up some code that was doing float->double->float conversions without me  
 being aware of it.
Yes, this auto-promotion of literals is very annoying, and it would be nice if constants could smartly match the expression type. By the way, C/C++ also behave this way, which has gotten me into the habit of adding f after all my floating point constants.
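
For example, a tiny check (just a sketch) of how the literals are typed today:

void main() {
    float x = 1.0f;
    static assert(is(typeof(2.0)      == double)); // unsuffixed literal is double
    static assert(is(typeof(x * 2.0)  == double)); // x gets promoted to double
    static assert(is(typeof(x * 2.0f) == float));  // the f suffix keeps it float
}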
Apr 19 2011
bearophile <bearophileHUGS lycos.com> writes:
Robert Jacques:

 I do GP GPU work, so I use floats all the time. They're also useful for  
 data storage purposes.
Today GPUs are just starting to manage doubles efficiently (Tesla?).
 For your information, the x87 can only perform computations at 80-bits.
If you compile D1 code that doesn't contain "real" types with 32-bit LDC, it uses SSE instructions by default (just 8 registers), which means most computations are done with 64-bit doubles. And in real programs, which use trigonometry etc., this is not the whole story.
 Yes, this auto-promotion of literals is very annoying, and it would be  
 nice if constants could smartly match the expression type.
Polysemous literals in general (here just the floating point ones) have been discussed several times in the past, but I don't know if floating point polysemous literals can be implemented well, or what consequences they would have on D code. Maybe Don is able to give a good comment on this.
 By the way, C/C++ also behave this way, which has gotten me into the
 habit of adding f after all my floating point constants.
I presume that if you take a good amount of care, in C (and probably in D too) you are able to avoid the performance problems I was talking about. But given a perfect programmer, most warnings become useless :-) Warnings are usually meant for programmers that make mistakes, don't know enough yet, miss things, etc.

Bye,
bearophile
Apr 20 2011
parent "Robert Jacques" <sandford jhu.edu> writes:
On Wed, 20 Apr 2011 06:23:01 -0400, bearophile <bearophileHUGS lycos.com>  
wrote:

 Robert Jacques:

 I do GP GPU work, so I use floats all the time. They're also useful for
 data storage purposes.
Today GPUs are just starting to manage doubles efficiently (Tesla?).
IIRC, the Fermi Tesla cards do doubles at about 1/2 float speed, so relative to other GPUs they are both the most efficient and have the best performance. So if there were something that really needed them, I'd use them. But doubles require double the registers and double the memory bandwidth, both of which make a dramatic performance difference (depending on the code). So I'll be sticking to floats and halves for the foreseeable future.
Apr 20 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 4/19/2011 5:02 PM, dsimcha wrote:
 Generally I worry about D's type system becoming like the Boy Who Cried Wolf,
 where it flags so many potential errors (as opposed to things that are
 definitely errors) that people become conditioned to just put in whatever casts
 they need to shut it up. I definitely fell into that when porting some 32-bit
 code that was sloppy with size_t vs. int to 64-bit. I knew there was no way it
 was going to be a problem, because there was no way any of my arrays were going
 to be even within a few orders of magnitude of int.max, but the compiler
 insisted on nagging me about it and I reflexively just put in casts everywhere.
 A warning _may_ be appropriate, but definitely not an error.
That's definitely a worry. Having a Nagging Nellie giving false alarms on errors too often will:

1. cause programmers to hate D

2. lead to MORE bugs, and harder to find ones, as Bruce Eckel pointed out, because people will put in things "just to shut up the compiler"

Hence my reluctance to add in a lot of these suggestions.

As to the specific suggestion of erroring on reduced precision, my talks with people who actually write a lot of FP code for a living say NO. They don't want it. Losing precision in FP calculations is a fact of life, and FP programmers simply must understand it and deal with it. Having the compiler annoy you about it would be less than helpful.
Apr 19 2011
bearophile <bearophileHUGS lycos.com> writes:
Walter:

 Hence my reluctance to add in a lot of these suggestions.
In an answer I've suggested the alternative solution of a -warn_fp_precision_loss compiler switch that produces warnings only when you use it. In theory this avoids most of the Nagging Nellie problem, because you use this switch only in special situations. But I am aware that you generally don't like warnings.
 As to the specific about erroring on reducing precision, my talks with people
 who actually do write a lot of FP code for a living is NO. They don't want it.
 Losing precision in FP calculations is a fact of life, and FP programmers
simply
 must understand it and deal with it. Having the compiler annoy you about it
 would be less than helpful.
Losing some bits of precision in normal FP operations is a fact of life, but a double->float conversion usually loses a much more significant amount of precision, and it's not a fact of life: it's the code that in some way asks for this irreversible conversion.

A related problem your answer doesn't take into account is unwanted float->double conversions (which get spotted by those error messages just because the code actually performs float->double->float conversions). Such unwanted conversions have caused performance loss in some of my C code on a CPU in 32-bit mode (maybe this problem is not present in 64-bit code), because the code was actually using doubles. A C lint allowed me to spot such problems and fix them.

Thank you for your answers,
bye,
bearophile
Apr 19 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 4/19/2011 6:46 PM, bearophile wrote:
 Walter:

 Hence my reluctance to add in a lot of these suggestions.
In an answer I've suggested the alternative solution of a -warn_fp_precision_loss compiler switch, that produces warnings only when you use it. In theory this avoids most of the Nagging Nellie problem, because you use this switch only in special situations. But I am aware you generally don't like warnings.
Yes, I've argued strongly against warnings.
Apr 19 2011
Brad Roberts <braddr slice-2.puremagic.com> writes:
On Tue, 19 Apr 2011, Walter Bright wrote:

 On 4/19/2011 6:46 PM, bearophile wrote:
 Walter:
 
 Hence my reluctance to add in a lot of these suggestions.
In an answer I've suggested the alternative solution of a -warn_fp_precision_loss compiler switch, that produces warnings only when you use it. In theory this avoids most of the Nagging Nellie problem, because you use this switch only in special situations. But I am aware you generally don't like warnings.
Yes, I've argued strongly against warnings.
The stronger argument, the one I agree with, is for not having flag-based, sometimes-on warnings. The more flags you have, the more complex the matrix of landmines there is. I hate micro-management, in all its forms.
Apr 19 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 4/19/2011 7:11 PM, Brad Roberts wrote:
 The stronger argument, the one I agree with, is for not having flag-based,
 sometimes-on warnings.  The more flags you have, the more complex the matrix
 of landmines there is.  I hate micro-management, in all its forms.
True, if you have N compiler switches, you have 2^N different compilers to test! Every switch added doubles the time it takes to validate the compiler. If you have N warnings that can be independently toggled, you have 2^N different languages.
Apr 19 2011
Sean Kelly <sean invisibleduck.org> writes:
On Apr 19, 2011, at 11:04 PM, Walter Bright wrote:

 On 4/19/2011 7:11 PM, Brad Roberts wrote:
 The stronger argument, the one I agree with, is for not having flag-based,
 sometimes-on warnings.  The more flags you have, the more complex the matrix
 of landmines there is.  I hate micro-management, in all its forms.
 True, if you have N compiler switches, you have 2^N different compilers to
 test! Every switch added doubles the time it takes to validate the compiler.
Software testing theory has suggestions for how to reduce the number of test cases here with only a small sacrifice in general error detection. Still, the fewer switches the better :-)
Apr 20 2011
Walter Bright <newshound2 digitalmars.com> writes:
On 4/20/2011 9:28 AM, Sean Kelly wrote:
 Software testing theory has suggestions for how to reduce the number of test
 cases here with only a small sacrifice in general error detection.  Still,
 the fewer switches the better :-)
Currently I test with all combinations of switches that affect code gen. Sometimes, it will unexpectedly catch an odd interaction.
Apr 20 2011
Jesse Phillips <jessekphillips+D gmail.com> writes:
bearophile Wrote:

 In Bugzilla I have just added an enhancement request that asks for a little
change in D, I don't know if it was already discussed or if it's already
present in Bugzilla:
 http://d.puremagic.com/issues/show_bug.cgi?id=5864
Losing precision on a fractional number seems like it would be a very common and desirable case. For one thing, even dealing with real doesn't mean you won't have errors; you really end up in very tricky areas when using float over double would actually cause an issue in the program.

I'm not sure this would be the best avenue to take in identifying possible performance issues with FP conversions.

And last, don't forget about significant figures. That suggests it would be better to produce an error when widening, since you really don't have the precision of a double or real.
Apr 19 2011