
digitalmars.D.learn - Why use float and double instead of real?

reply "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Is there ever any reason to use float or double in calculations? I mean, 
when does one *not* want maximum precision? Will code using float or 
double run faster than code using real?

I understand they are needed for I/O and compatibility purposes, so I am 
by no means suggesting they be removed from the language. I am merely 
asking out of curiosity.

-Lars
Jun 23 2009
Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:
On Tue, 2009-06-23 at 14:44 +0200, Lars T. Kyllingstad wrote:
 Is there ever any reason to use float or double in calculations? I mean, 
 when does one *not* want maximum precision? Will code using float or 
 double run faster than code using real?
Yes, they are faster and smaller, and accurate enough. They can also be used with SSE. Reals can be very nondeterministic; code like

    if (f(x) != f(y)) { assert(x != y, "Boom!"); }   // it will explode

can fire the assert even though x and y are equal, i.e. the same input produced two different results.
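The same mechanism shows up even with plain doubles when the compiler keeps an intermediate at real precision. A minimal sketch (whether it actually triggers depends on the compiler, optimisation flags and target; on x87 the right-hand side may be re-evaluated at 80-bit precision while the stored variable holds only 64 bits):

    import std.stdio;

    void main()
    {
        double a = 0.1, b = 3.0;
        double stored = a * b;   // rounded to 64 bits when stored to the variable
        if (stored != a * b)     // may be re-evaluated at 80-bit precision
            writeln("the 'same' expression compared unequal");
    }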
 I understand they are needed for I/O and compatibility purposes, so I am 
 by no means suggesting they be removed from the language. I am merely 
 asking out of curiosity.
The float and double types conform to the IEEE 754 standard; the real type does not. Many applications (scientific computation, simulations, interval arithmetic) absolutely need IEEE 754 semantics (correct rounding, known error behaviour, and so on). Additionally, real has varying precision and size across platforms, or is simply not supported at all. If you need very high precision (and still want a known bound on the maximal error), you can use double-double or quad-double (a struct of 2 or 4 doubles). I have implemented these in D, but they are quite slow.
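For the curious, the core of a double-double add is only a few lines (a rough sketch, not my actual implementation; note that it relies on every operation really being rounded to double, so extended-precision intermediates would break the error term):

    // a value is represented as the unevaluated sum hi + lo of two doubles
    struct DoubleDouble
    {
        double hi;
        double lo;
    }

    // Knuth's two-sum: s = rounded a + b, err = the exact rounding error
    void twoSum(double a, double b, out double s, out double err)
    {
        s = a + b;
        double v = s - a;
        err = (a - (s - v)) + (b - v);
    }

    DoubleDouble add(DoubleDouble x, DoubleDouble y)
    {
        double s, e;
        twoSum(x.hi, y.hi, s, e);
        e += x.lo + y.lo;          // fold in the low-order parts
        DoubleDouble r;
        twoSum(s, e, r.hi, r.lo);  // renormalise so |lo| is tiny relative to hi
        return r;
    }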
 -Lars
Jun 23 2009
BCS <none anon.com> writes:
Hello Witold,

 On Tue, 2009-06-23 at 14:44 +0200, Lars T. Kyllingstad wrote:
 
 Is there ever any reason to use float or double in calculations? I
 mean, when does one *not* want maximum precision? Will code using
 float or double run faster than code using real?
 
 Yes, they are faster and smaller, and accurate enough.
IIRC, on most systems real will only be slower as a result of memory I/O costs. For example, on x86 the FPU computes at 80-bit precision internally regardless.
 The float and double types conform to the IEEE 754 standard; the real type does not.
I think you are in error here. IIRC, IEEE 754 has some stuff about "extended precision" formats that work like the normal types but with more bits. That is what 80-bit reals are. If you force rounding to 64 bits after each op, I think things will come out exactly the same as on a 64-bit FPU.
 Many applications (scientific computation, simulations, interval
 arithmetic) absolutely need IEEE 754 semantics (correct rounding, known
 error behaviour, and so on). Additionally, real has varying precision
 and size across platforms, or is simply not supported at all.
Reals are /always/ supported if the platform supports FP at all, even if that means real is only a 16-bit FP type.
Jun 23 2009
Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:
On Tue, 2009-06-23 at 16:01 +0000, BCS wrote:

 I think you are in error here. IIRC IEEE-754 has some stuff about "extended 
 precision" values that work like the normal types but with more bits. That 
 is what 80 bit reals are. If you force rounding to 64-bits after each op, 
 I think things will come out exactly the same as for a 64-bit FPU. 
 
This is exactly the same thing the CPU already does when dealing with doubles and floats: internal computations are performed in extended precision and then rounded to 64 bits when written out.
 Many applications (scientific computation, simulations, interval
 arithmetic) absolutely need IEEE 754 semantics (correct rounding, known
 error behaviour, and so on). Additionally, real has varying precision
 and size across platforms, or is simply not supported at all.
 Reals are /always/ supported if the platform supports FP at all, even
 if that means real is only a 16-bit FP type.
Yes, you are absolutely right. I was thinking of reals that map to something bigger than double precision. I sometimes use reals for intermediate values, for example when summing a large number of values. One can also use Kahan summation for that.
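Kahan summation, for reference, is just this (a minimal sketch; like the double-double trick it assumes each statement is really rounded to double, so aggressive optimisation or extended-precision intermediates can defeat the compensation):

    double kahanSum(double[] values)
    {
        double sum = 0.0;
        double c   = 0.0;          // running compensation for lost low-order bits
        foreach (x; values)
        {
            double y = x - c;
            double t = sum + y;
            c = (t - sum) - y;     // recovers what was just lost in sum + y
            sum = t;
        }
        return sum;
    }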
Jun 23 2009
BCS <ao pathlink.com> writes:
Reply to Witold,

 On Tue, 2009-06-23 at 16:01 +0000, BCS wrote:
 
 I think you are in error here. IIRC IEEE-754 has some stuff about
 "extended precision" values that work like the normal types but with
 more bits. That is what 80 bit reals are. If you force rounding to
 64-bits after each op, I think things will come out exactly the same
 as for a 64-bit FPU.
 
 This is exactly the same thing the CPU already does when dealing with doubles and floats: internal computations are performed in extended precision and then rounded to 64 bits when written out.
You misread me. If you need computation to exactly match 32- or 64-bit math, you will need to round after every single operation (+, -, *, /, etc.). What most systems do is use full internal precision for intermediate values and round only when the value is stored to a variable. If you don't need bit-for-bit matches, then 80-bit math matches IEEE 754 semantics, just with more bits of precision.
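To make the distinction concrete, here is a sketch of the two styles (only illustrative; D explicitly allows intermediates to be kept at higher precision, so the second version merely requests the per-op rounding):

    double oneExpression(double a, double b, double c)
    {
        // the product may stay at 80-bit precision until the result is stored
        return a * b + c;
    }

    double roundedPerOp(double a, double b, double c)
    {
        double t = a * b;   // the store to a double variable is where the
        double r = t + c;   // round to 64 bits happens, if it happens at all
        return r;
    }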
Jun 23 2009
Witold Baryluk <baryluk smp.if.uj.edu.pl> writes:
On Tue, 2009-06-23 at 17:14 +0000, BCS wrote:
 Reply to Witold,
 This is exactly the same thing the CPU already does when dealing with
 doubles and floats: internal computations are performed in extended
 precision and then rounded to 64 bits when written out.
 
 You misread me. If you need computation to exactly match 32- or 64-bit math, you will need to round after every single operation (+, -, *, /, etc.). What most systems do is use full internal precision for intermediate values and round only when the value is stored to a variable. If you don't need bit-for-bit matches, then 80-bit math matches IEEE 754 semantics, just with more bits of precision.
We both know this, so EOT. :)
Jun 23 2009
Jarrett Billingsley <jarrett.billingsley gmail.com> writes:
On Tue, Jun 23, 2009 at 8:44 AM, Lars T.
Kyllingstad<public kyllingen.nospamnet> wrote:
 Is there ever any reason to use float or double in calculations? I mean,
 when does one *not* want maximum precision? Will code using float or double
 run faster than code using real?
As Witold mentioned, float and double are the only types SSE (and similar SIMD instruction sets on other architectures) can deal with. Furthermore, most 3D graphics hardware only uses single- or even half-precision (16-bit) floats, so it makes no sense to use 64- or 80-bit floats in those cases. Also keep in mind that 'real' is simply defined as the largest supported floating-point type. On x86, that's an 80-bit real, but on most other architectures it's the same as double anyway.
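If you are curious what real maps to on your machine, something like this will tell you (the output is entirely platform- and compiler-dependent):

    import std.stdio;

    void main()
    {
        writefln("float:  %s bytes, %s mantissa bits", float.sizeof,  float.mant_dig);
        writefln("double: %s bytes, %s mantissa bits", double.sizeof, double.mant_dig);
        writefln("real:   %s bytes, %s mantissa bits", real.sizeof,   real.mant_dig);
    }

On x86 you should typically see 64 mantissa bits for real (the 80-bit format); elsewhere real will often report exactly the same numbers as double.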
Jun 23 2009
Don <nospam nospam.com> writes:
Lars T. Kyllingstad wrote:
 Is there ever any reason to use float or double in calculations? I mean, 
 when does one *not* want maximum precision? Will code using float or 
 double run faster than code using real?
 
 I understand they are needed for I/O and compatibility purposes, so I am 
 by no means suggesting they be removed from the language. I am merely 
 asking out of curiosity.
 
 -Lars
Size. Since modern CPUs are memory-bandwidth limited, it's always going to be MUCH faster to use float[] instead of real[] once the array size gets too big to fit in the cache. Maybe around 2000 elements or so. Rule of thumb: use real for temporary values, use float or double for arrays.
Jul 01 2009
next sibling parent "Lars T. Kyllingstad" <public kyllingen.NOSPAMnet> writes:
Don wrote:
 Lars T. Kyllingstad wrote:
 Is there ever any reason to use float or double in calculations? I 
 mean, when does one *not* want maximum precision? Will code using 
 float or double run faster than code using real?

 I understand they are needed for I/O and compatibility purposes, so I 
 am by no means suggesting they be removed from the language. I am 
 merely asking out of curiosity.

 -Lars
Size. Since modern CPUs are memory-bandwidth limited, it's always going to be MUCH faster to use float[] instead of real[] once the array size gets too big to fit in the cache. Maybe around 2000 elements or so. Rule of thumb: use real for temporary values, use float or double for arrays.
The reason I'm asking is that I've templated the numerical routines I've written, so that the user can choose which floating-point type to use. Then I started wondering whether I should in fact always use real for temporary values inside the routines, for precision's sake, or whether this would reduce performance significantly. From the answers I've gotten to my question (thanks, everyone, BTW!), it's not immediately clear to me what the best choice is in general. (Perhaps it would be best to have two template parameters, one for input/output precision and one for working precision? See the PS below.)

Functions in std.math are defined in a lot of different ways:

 - separate overloaded functions for float, double and real
 - like the above, only the float and double versions cast to real and call the real version
 - only a real version
 - templated

Is there some rationale behind these choices?

-Lars
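PS: The two-parameter idea would look roughly like this (just a sketch; the names Storage and Work are made up):

    // Storage: the element type the caller sees; Work: the precision used
    // for intermediate results inside the routine.
    Work mean(Storage, Work = real)(Storage[] data)
    {
        Work sum = 0;
        foreach (x; data)
            sum += x;              // accumulate at working precision
        return sum / data.length;
    }

    // e.g. float storage with the default real working precision:
    // auto m = mean!(float)(someFloatArray);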
Jul 01 2009
BCS <none anon.com> writes:
Hello Don,

 Size. Since modern CPUs are memory-bandwidth limited, it's always
 going to be MUCH faster to use float[] instead of real[] once the
 array size gets too big to fit in the cache. Maybe around 2000
 elements or so.
I was under the impression that the memory bus could feed the CPU at least as fast as the CPU could process data, just with huge latency. Based on that, it's not how much data is loaded (bandwidth) that matters, but how many places it's loaded from. Is my initial assumption wrong, or am I just nitpicking?
Jul 01 2009
Don <nospam nospam.com> writes:
BCS wrote:
 Hello Don,
 
 Size. Since modern CPUs are memory-bandwidth limited, it's always
 going to be MUCH faster to use float[] instead of real[] once the
 array size gets too big to fit in the cache. Maybe around 2000
 elements or so.
 I was under the impression that the memory bus could feed the CPU at least as fast as the CPU could process data, just with huge latency. Based on that, it's not how much data is loaded (bandwidth) that matters, but how many places it's loaded from. Is my initial assumption wrong, or am I just nitpicking?
Intel Core2 can only perform one load per cycle, though it can do one floating-point add per cycle. So in something like a[] += b[], which needs two loads and a store for every add, you're limited by memory bandwidth even when everything is in the L1 cache. But in practice, performance is usually dominated by cache misses.
Jul 02 2009