## digitalmars.D.learn - Why use float and double instead of real?

- Lars T. Kyllingstad (7/7) Jun 23 2009 Is there ever any reason to use float or double in calculations? I mean,...
- Witold Baryluk (15/22) Jun 23 2009 yes they are faster and are smaller, and accurate enough.
- BCS (9/24) Jun 23 2009 IIRC on most systems real will only be slower as a result of I/O costs. ...
- Witold Baryluk (8/23) Jun 23 2009 this is exactly the same thing which the cpu already does when dealing with
- BCS (6/18) Jun 23 2009 You misread me; if you need computation to exactly match 32 or 64bit mat...
- Witold Baryluk (2/15) Jun 23 2009 We both know this, so EOT. :)
- Jarrett Billingsley (10/13) Jun 23 2009 As Witold mentioned, float and double are the only types SSE (and
- Don (6/15) Jul 01 2009 Size. Since modern CPUs are memory-bandwidth limited, it's always going
- Lars T. Kyllingstad (18/35) Jul 01 2009 The reason I'm asking is that I've templated the numerical routines I've...
- BCS (5/9) Jul 01 2009 I was under the impression that the memory bus could feed the CPU at le...
- Don (6/19) Jul 02 2009 Intel Core2 can only perform one load per cycle, but can do one floating...

Is there ever any reason to use float or double in calculations? I mean, when does one *not* want maximum precision? Will code using float or double run faster than code using real? I understand they are needed for I/O and compatibility purposes, so I am by no means suggesting they be removed from the language. I am merely asking out of curiosity. -Lars

Jun 23 2009

On Tue, 2009-06-23 at 14:44 +0200, Lars T. Kyllingstad wrote:

> Is there ever any reason to use float or double in calculations? I mean, when does one *not* want maximum precision? Will code using float or double run faster than code using real?

Yes: they are faster, they are smaller, and they are accurate enough. They can also be used in SSE. reals can be very nondeterministic, e.g. `if (f(x) != f(y)) { assert(x != y, "Boom!"); }` -- it will explode.

> I understand they are needed for I/O and compatibility purposes, so I am by no means suggesting they be removed from the language. I am merely asking out of curiosity.
>
> -Lars

The float and double types conform to the IEEE 754 standard; the real type does not, and many applications (scientific computation, simulation, interval arithmetic) absolutely need IEEE 754 semantics (correct rounding, known error behaviour, and so on). Additionally, real has varying precision and varying size across platforms, or is simply not supported. If you need very high precision (and still want some knowledge of the maximal error), you can use double-double or quad-double (a struct of 2 or 4 doubles). I have implemented them in D, but they are quite slow.

Jun 23 2009

Hello Witold,

> > Is there ever any reason to use float or double in calculations? I mean, when does one *not* want maximum precision? Will code using float or double run faster than code using real?
>
> Yes: they are faster, they are smaller, and they are accurate enough.

IIRC, on most systems real will only be slower as a result of I/O costs. For example, on x86 the FPU only computes using 80 bits.

> The float and double types conform to the IEEE 754 standard; the real type does not.

I think you are in error here. IIRC, IEEE 754 has some stuff about "extended precision" values that work like the normal types but with more bits; that is what 80-bit reals are. If you force rounding to 64 bits after each op, I think things will come out exactly the same as for a 64-bit FPU.

> and many applications (scientific computation, simulation, interval arithmetic) absolutely need IEEE 754 semantics (correct rounding, known error behaviour, and so on).
>
> Additionally, real has varying precision and varying size across platforms, or is simply not supported.

reals are /always/ supported if the platform supports FP, even if only with 16-bit FP types.

Jun 23 2009

On Tue, 2009-06-23 at 16:01 +0000, BCS wrote:

> I think you are in error here. IIRC, IEEE 754 has some stuff about "extended precision" values that work like the normal types but with more bits; that is what 80-bit reals are. If you force rounding to 64 bits after each op, I think things will come out exactly the same as for a 64-bit FPU.

This is exactly the same thing the CPU already does when dealing with doubles and floats: internal computations are performed in extended precision and then written somewhere, truncating to 64 bits.

> > Additionally, real has varying precision and varying size across platforms, or is simply not supported.
>
> reals are /always/ supported if the platform supports FP, even if only with 16-bit FP types.

Yes, you are absolutely right; I was thinking about reals which are mapped to something bigger than double precision. I sometimes use reals for intermediate values, for example when summing a large number of values. One can also use Kahan's algorithm.

Jun 23 2009

Reply to Witold,

> > I think you are in error here. IIRC, IEEE 754 has some stuff about "extended precision" values that work like the normal types but with more bits; that is what 80-bit reals are. If you force rounding to 64 bits after each op, I think things will come out exactly the same as for a 64-bit FPU.
>
> This is exactly the same thing the CPU already does when dealing with doubles and floats: internal computations are performed in extended precision and then written somewhere, truncating to 64 bits.

You misread me: if you need computation to exactly match 32- or 64-bit math, you will need to round after every single operation (+, -, *, /, etc.), whereas what most systems do is use full internal precision for intermediate values and round only when the value is stored to a variable. If you don't need bit-for-bit matches, then 80-bit math matches IEEE 754 semantics, just with more bits of precision.

Jun 23 2009

On Tue, 2009-06-23 at 17:14 +0000, BCS wrote:

> You misread me: if you need computation to exactly match 32- or 64-bit math, you will need to round after every single operation (+, -, *, /, etc.), whereas what most systems do is use full internal precision for intermediate values and round only when the value is stored to a variable. If you don't need bit-for-bit matches, then 80-bit math matches IEEE 754 semantics, just with more bits of precision.

We both know this, so EOT. :)

Jun 23 2009

On Tue, Jun 23, 2009 at 8:44 AM, Lars T. Kyllingstad <public kyllingen.nospamnet> wrote:

> Is there ever any reason to use float or double in calculations? I mean, when does one *not* want maximum precision? Will code using float or double run faster than code using real?

As Witold mentioned, float and double are the only types SSE (and similar SIMD instruction sets on other architectures) can deal with. Furthermore, most 3D graphics hardware only uses single- or even half-precision (16-bit) floats, so it makes no sense to use 64- or 80-bit floats in those cases. Also keep in mind that 'real' is simply defined as the largest supported floating-point type. On x86, that's an 80-bit real, but on most other architectures it's the same as double anyway.

Jun 23 2009

Lars T. Kyllingstad wrote:

> Is there ever any reason to use float or double in calculations? I mean, when does one *not* want maximum precision? Will code using float or double run faster than code using real? I understand they are needed for I/O and compatibility purposes, so I am by no means suggesting they be removed from the language. I am merely asking out of curiosity.
>
> -Lars

Size. Since modern CPUs are memory-bandwidth limited, it's always going to be MUCH faster to use float[] instead of real[] once the array size gets too big to fit in the cache, maybe around 2000 elements or so. Rule of thumb: use real for temporary values, use float or double for arrays.

Jul 01 2009

Don wrote:

> > Is there ever any reason to use float or double in calculations? [...]
>
> Size. Since modern CPUs are memory-bandwidth limited, it's always going to be MUCH faster to use float[] instead of real[] once the array size gets too big to fit in the cache, maybe around 2000 elements or so. Rule of thumb: use real for temporary values, use float or double for arrays.

The reason I'm asking is that I've templated the numerical routines I've written, so that the user can choose which floating-point type to use. Then I started wondering whether I should in fact always use real for temporary values inside the routines, for precision's sake, or whether this would reduce performance significantly. From the answers I've gotten to my question (thanks, everyone, BTW!), it's not immediately clear to me what the best choice is in general. (Perhaps it would be best to have two template parameters, one for input/output precision and one for working precision?)

Functions in std.math are defined in a lot of different ways:

- separate overloaded functions for float, double and real
- like the above, but the float and double versions cast to real and call the real version
- only a real version
- templated

Is there some rationale behind these choices?

-Lars

Jul 01 2009

Hello Don,

> Size. Since modern CPUs are memory-bandwidth limited, it's always going to be MUCH faster to use float[] instead of real[] once the array size gets too big to fit in the cache, maybe around 2000 elements or so.

I was under the impression that the memory bus could feed the CPU at least as fast as the CPU could process data, just with huge latency. Based on that, it's not how much data is loaded (bandwidth) but how many places it's loaded from. Is my initial assumption wrong, or am I just nitpicking?

Jul 01 2009

BCS wrote:

> I was under the impression that the memory bus could feed the CPU at least as fast as the CPU could process data, just with huge latency. Based on that, it's not how much data is loaded (bandwidth) but how many places it's loaded from. Is my initial assumption wrong, or am I just nitpicking?

Intel Core2 can only perform one load per cycle, but can do one floating-point add per cycle. So in something like a[] += b[], you're limited by memory bandwidth even when everything is in the L1 cache. But in practice, performance is usually dominated by cache misses.

Jul 02 2009