digitalmars.D - Submission: Floating point near-equality
- Don Clugston (165/165) Aug 14 2005 Submission: probably for std.math
- BCS (3/3) Aug 15 2005 #$%@ you!!! ;-)
- Dave (3/3) Aug 15 2005 In article ..., Don Clugston says...
- Ben Hinkle (34/107) Aug 15 2005 Comments inline
- Ben Hinkle (28/46) Aug 15 2005 To be more concrete here's what I was thinking of:
- Don Clugston (35/42) Aug 15 2005 Yes, it's just like the isfinite(), isnan() functions in std.math.
- Ben Hinkle (14/66) Aug 15 2005 Do you mean "near one"? Near zero life gets very interesting. If I
- Don Clugston (29/45) Aug 15 2005 I did mean "near zero". You just need to go much smaller with 80-bit
- Ben Hinkle (25/51) Aug 16 2005 Ah ok. I didn't think you meant the underflow numbers as the the only th...
- Don Clugston (42/67) Aug 16 2005 Yes. There is so much precision available near zero, it's insane. It's
- Don Clugston (32/34) Aug 16 2005 IDEA: Do you think the same applies for very large numbers?
- Ben Hinkle (21/55) Aug 17 2005 Actually I like your original relative equality for large numbers. It's ...
- Shammah Chancellor (7/11) Aug 17 2005 What is the formula that defines this coordinate system? Why not use th...
- Don Clugston (15/34) Aug 17 2005 Yes, and that's exactly what my code does, but as Ben has pointed out,
- Ben Hinkle (16/48) Aug 17 2005 Another way to describe your metric is like the usual Euclidean metric o...
- Walter (7/7) Aug 15 2005 I think this is pretty cool. I've thought about building such a thing, b...
- Don Clugston (9/16) Aug 15 2005 I need to add a few more tests for denormals, too. There are so many cor...
- Ben Hinkle (4/8) Aug 15 2005 Well Walter seems gung-ho with it but please add in a difference test wi...
- Don Clugston (11/20) Aug 15 2005 It's not a pure relative test. Subnormals will compare equal to zero. So...
Submission: probably for std.math

int precisionEquality(real x, real y)
"How equal" are two reals x and y?

Rationale:
The use of == to test floating point numbers for equality is somewhat dangerous, because == is completely intolerant of round-off error. For example, almost every unit test involving floating point numbers has an assertion of floating-point equality; but operator == is usually far too strict. Instead, some form of "close enough" test must be used. If not provided by the language, programmers must write this test themselves; but unfortunately, it's easy to write this test incorrectly.

In std.math and std.math2 there are two floating point closeness tests: feq() and mfeq(). Ultimately these are:
    return fabs(x - y) <= precision;
where precision is (say) 0.000001

This is incorrect: for very small numbers, this is a very weak assertion (eg mfeq(1e-30, 7.8e-300) returns true!), but for large numbers, eg x=1e60, it is an *extremely* strong assertion.

The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above).

An interesting idea by Bruce Dawson (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm) is to calculate the number of representable IEEE reals which lie between x and y. Unfortunately..
* the answer depends on whether the numbers are float, double, or real.
* the resulting value gets _very_ large if x and y are far apart...
* ... and with an IEEE extended float, it won't even fit in a ulong!
* it's not very intuitive.

IMHO, what you actually want is a function which tells you: "to how many significant figures (bits) is x==y?"

The function below works on 80-bit reals. It is very fast (with possibilities for further optimisation), and it provides sensible results for any input, including subnormals, infinities, and NaNs. It could easily be converted for reals with different precision (you'd just need to know where the exponent lies).

For example, this function ensures that there are at most 4 bits of round-off error:

int feq(real x, real y)
{
    return precisionEquality(x, y, real.mant_dig-4);
}

As well as providing a 'closeness' test, the function allows you to evaluate the accuracy of a calculation. More on this in a future post.

writefln("It is accurate to %d bits.", precisionEquality(pifinder(), PI));

I believe that some equivalent to this function belongs in std.math. It would be used in almost every unit test involving floating point numbers!

Some design choices which are open to debate:
* the name. precisionEquality() is the best I've come up with so far.
* behaviour with NaNs. Like ==, the version below states that two NaNs are completely unequal. But it would also be reasonable to state that they are equal to 64 bits!
* My expansion of the notion of "significant figures" to negative values is unconventional.

This is my first D contribution, my style may be inadequate. I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no language has ever got this right.
-Don.
================================================
import std.math2; // for abs
import std.math;  // for isnan

/* int precisionEquality(real x, real y)
   "How equal" are two reals x and y?

   Returns the number of mantissa bits which are equal in x and y; this is
   similar to the concept of "significant figures". For example,
   0x1.F8p+60 and 0x1.F1p+60 are equal to 5 bits of precision.
   If x==y, then precisionEquality(x, y) == real.mant_dig.

   If x and y differ by a factor of two or more, then they have no bits in
   common, so this function returns a value which is <=0. Returns 0 if they
   differ by a factor of >=2 but < 4, returns -1 for factors >=4 but <8, etc.

   If the numbers have opposite sign, the difference in orders of magnitude
   is based on the IEEE binary encoding. For example, -1 and +1 differ by
   half the number line.

   Pretend the numbers were fixed point. In the IEEE extended format, there
   are about 32768 digits in front of the decimal point, and the same number
   behind. But only the first 64 bits after the first '1' bit are recorded.

   If the exponents are different, then the return value is negative, and
   gives the number of (binary) orders of magnitude of difference between
   a and b.
*/
int precisionEquality(real a, real b)
{
    real diff = a-b;
    ushort *pa = cast(ushort *)(&a);
    ushort *pb = cast(ushort *)(&b);
    ushort *pd = cast(ushort *)(&diff);

    // The difference in exponent between 'a' and 'a-b' is equal to the
    // number of mantissa bits of a which are equal to b. If the difference
    // is negative, then a and b have different exponents. If it is
    // positive, then a and b are equal to that number of decimal places.
    // AND with 0x7FFF to form the absolute value.
    // Subtract 1 so that we round downwards.
    int bitsdiff = ( ((pa[4]&0x7FFF) + (pb[4]&0x7FFF)-1)>>1) - (pd[4]&0x7FFF);

    if ((pd[4]&0x7FFF) == 0) { // Difference is zero or denormal
        if (a==b) return real.mant_dig;
        // 'diff' is denormal, so the number of equal bits is higher;
        // we need to add the number of zeros that lie at the start of
        // its mantissa. We do this by multiplying by 2^real.mant_dig
        diff*=0x1p+63;
        return bitsdiff + real.mant_dig-(pd[4]&0x7FFF);
    }

    if (bitsdiff>0) return bitsdiff+1;

    // Check for NAN or Infinity.
    if ((pd[4]&0x7FFF)==0x7FFF) { // Result is NAN or INF
        if (a==b) { // both were +INF or both were -INF.
            return real.mant_dig;
        }
        if (isnan(diff)) {
            return -65535; // at least one was a NAN.
        }
        // fall through for comparisons of INF with large reals.
    }

    // Return the negative of the absolute value of
    // the difference in exponents.
    if ((pa[4]^pb[4])&0x8000) { // do they have opposite signs?
        // Convert the 'offset' exponent format to twos-complement,
        // then do a normal subtraction.
        return 1-abs(pa[4]-(0x8000-pb[4]));
    }
    return 1-abs(pa[4]-pb[4]);
}

unittest {
    // Exact equality
    assert(precisionEquality(real.max,real.max)==real.mant_dig);
    assert(precisionEquality(0,0)==real.mant_dig);
    assert(precisionEquality(7.1824,7.1824)==real.mant_dig);
    assert(precisionEquality(real.infinity,real.infinity)==real.mant_dig);

    // one bit off exact equality
    assert(precisionEquality(1+real.epsilon,1)==real.mant_dig-1);
    assert(precisionEquality(1,1+real.epsilon)==real.mant_dig-1);
    // one bit below. Must round off correctly when exponents are different.
    assert(precisionEquality(1,1-real.epsilon)==real.mant_dig-1);
    assert(precisionEquality(1-real.epsilon,1)==real.mant_dig-1);
    assert(precisionEquality(-1-real.epsilon,-1)==real.mant_dig-1);
    assert(precisionEquality(-1+real.epsilon,-1)==real.mant_dig-1);

    // Numbers that are close
    assert(precisionEquality(0x1.Bp+5, 0x1.B8p+5)==5);
    assert(precisionEquality(0x1.8p+10, 0x1.Cp+10)==2);
    assert(precisionEquality(1, 1.1249)==4);
    assert(precisionEquality(1.5, 1)==1);

    // Extreme inequality
    assert(precisionEquality(real.nan,0)==-65535);
    assert(precisionEquality(0,-real.nan)==-65535);
    assert(precisionEquality(real.nan,real.infinity)==-65535);
    assert(precisionEquality(real.infinity,-real.infinity)==-65533);
    assert(precisionEquality(-real.max,real.infinity)==-65532);
    assert(precisionEquality(-real.max,real.max)==-65531);

    // Numbers that are half the number line apart
    // Note that (real.max_exp-real.min_exp) = 32765
    assert(precisionEquality(-real.max,0)==-32765);
    assert(precisionEquality(-1,1)==-32765);
    assert(precisionEquality(real.min,1)==-16381);

    // Numbers that differ by one order of magnitude
    assert(precisionEquality(real.max,real.infinity)==0);
    assert(precisionEquality(1, 2)==0);
    assert(precisionEquality(2, 1)==0);
    assert(precisionEquality(1.5*(1-real.epsilon), 1)==2);
    assert(precisionEquality(4*(1-real.epsilon), 1)==0);
    assert(precisionEquality(4, 1)==-1);
}
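As a rough sketch of the "evaluate the accuracy of a calculation" use mentioned above (this example is only illustrative and not part of the submission; it assumes the precisionEquality() above is in scope and uses the exp/log round trip as an arbitrary identity):

import std.math;  // exp, log
import std.stdio; // writefln

void reportRoundTripAccuracy()
{
    real x = 7.25;
    // How many mantissa bits does exp(log(x)) have in common with x?
    int bits = precisionEquality(exp(log(x)), x);
    writefln("exp(log(x)) is accurate to %d bits.", bits);
}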
Aug 14 2005
I was going to post on this EXACT subject today!! :) I was thinking of making it an operator, maybe "~~".
Aug 15 2005
In article <ddp3eh$b3$1 digitaldaemon.com>, Don Clugston says...I'm no mathematician, but your proposal seems very reasonable and would be nice to incorporate into an operator of some kind too.
Aug 15 2005
Comments inline "Don Clugston" <Don_member pathlink.com> wrote in message news:ddp3eh$b3$1 digitaldaemon.com...Submission: probably for std.math int precisionEquality(real x, real y) "How equal" are two reals x and y? Rationale: The use of == to test floating point numbers for equality is somewhat dangerous, because == is completely intolerant of round-off error. For example, almost every unit test involving floating point numbers has an assertion of floating-point equality; but operator == is usually far too strict. Instead, some form of "close enough" test must be used. If not provided by the language, programmers must write this test themselves; but unfortunately, it's easy to write this test incorrectly.Agreed - though the notion of 'close enough' depends on the situation. I think there was an earlier thread about this kind of thing, too. There's a really short thread in the archives with Walter's opinion: http://www.digitalmars.com/d/archives/digitalmars/D/6397.htmlIn std.math and std.math2 there are two floating point closeness tests: feq() and mfeq(). Ultimately these are: return fabs(x - y) <= precision; where precision is (say) 0.000001 This is incorrect: for very small numbers, this is a very weak assertion eg mfeq(1e-30, 7.8e-300) returns true! but for large numbers, eg x=1e60, it is an *extremely* strong assertion. The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above).It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those. I'm personally not worried about the performance of a division or multiplication when doing a floating pt test - if that's the performance hit you're worried about.An interesting idea by Bruce Dawson (http://www.cygnus-software.com/papers/comparingfloats/comparingfloats.htm) is to calculate the number of representable IEEE reals which lie between x and y. Unfortunately.. * the answer depends on whether the numbers are float, double, or real. * the resulting value gets _very_ large if x and y are far apart... * ... and with an IEEE extended float, it won't even fit in a ulong! * it's not very intuitive. IMHO, what you actually want is a function which tells you: "to how many significant figures (bits) is x==y?"With special treatment for 0? or around 0?The function below works on 80-bit reals. It is very fast (with possibilities for further optimisation), and it provides sensible results for any input, including subnormals, infinities, and NaNs. It could easily be converted for reals with different precision, (you'd just need to know where the exponent lies).It's actually very useful to have equality tests that know about float and double. That way the promotion to real doesn't introduce extra "precision" in a number that it never had to start with and it's easier to just type feq(x,y) when x and y are doubles instead of something like feq(x,y,double.mant_dig-4)For example, this function ensures that there are at most 4 bits of round-off error: int feq(real x, real y) { return precisionEquality(x, y, real.mant_dig-4); }The precisionEquality function below doesn't take 3 inputs. 
Did you mean something like return precisionEquality(x,y) >= real.mant_dig-4 Also glancing over the code does it assume little-endian architecture? I see the ushort* used to grab different parts of the real.As well as providing a 'closeness' test, the function allows you to evaluate the accuracy of a calculation. More on this in a future post. writefln("It is accurate to %d bits.", precisionEquality(pifinder(), PI)); I believe that some equivalent to this function belongs in std.math. It would be used in almost every unit test involving floating point numbers! Some design choices which are open to debate: * the name. precisionEquality() is the best I've come up with so far.yeah - that name is pretty long. 'feq' would be the best IMHO.* behaviour with NaNs. Like ==, the version below states that two NaNs are completely unequal. But it would also be reasonable to state that they are equal to 64 bits!NaN's should always be unequal. A function 'fis' could be defined that returned true if both are NaN.* My expansion of the notion of "significant figures" to negative values is unconventional.I'm not sure what you are referring to.This is my first D contribution, my style may be inadequate. I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no language has ever got this right. -Don.It would be nice to combine that with an abstol parameter near 0. That way a test like feq(0,double.epsilon) will be true. I could see something like feq checking if one of the two values is exactly 0 and doing absolute checking there - or just doing a straightforward abs/rel check.
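For what it's worth, here is a sketch of the kind of float/double convenience overloads mentioned earlier in this post (the names and the mant_dig-4 margin are only illustrative; note that this naive version still promotes to real before comparing, which matters for subnormals):

int feq(double x, double y)
{
    // require agreement in all but 4 of double's 53 mantissa bits
    return precisionEquality(x, y) >= double.mant_dig - 4;
}

int feq(float x, float y)
{
    // require agreement in all but 4 of float's 24 mantissa bits
    return precisionEquality(x, y) >= float.mant_dig - 4;
}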
Aug 15 2005
To be more concrete here's what I was thinking of:

import std.math; // for fabs

const float TinyFloat = 1e-4f;
const double TinyDouble = 1e-8;
const real TinyReal = 1e-10L; // depends on real.epsilon

int feq(real x, real y, real reltol = TinyReal, real abstol = TinyReal)
{
    real diff = fabs(x-y);
    if (diff <= abstol)
        return 1;
    real fx = fabs(x);
    real fy = fabs(y);
    real scale = fx>fy ? fx : fy;
    return diff/scale <= reltol;
}

int feq(double x, double y, double reltol = TinyDouble, double abstol = TinyDouble)
{
    return feq(cast(real)x, cast(real)y, cast(real)reltol, cast(real)abstol);
}

int feq(float x, float y, float reltol = TinyFloat, float abstol = TinyFloat)
{
    return feq(cast(real)x, cast(real)y, cast(real)reltol, cast(real)abstol);
}

That way if you know you'll be dealing with very small numbers (less than TinyReal, for example) then one can pass 0 for the abstol (and let the optimizer remove any dead code) and only use the relative equality tests. If one wants a pure absolute test (suppose one knows the numbers will always be near 1) then pass 0 for reltol and let the optimizer remove that code. For "normal usage", though, numbers that are "close" either in absolute or relative terms will test as equal.

In std.math and std.math2 there are two floating point closeness tests: feq() and mfeq(). Ultimately these are: return fabs(x - y) <= precision; where precision is (say) 0.000001 This is incorrect: for very small numbers, this is a very weak assertion eg mfeq(1e-30, 7.8e-300) returns true! but for large numbers, eg x=1e60, it is an *extremely* strong assertion. The "correct" way to do it, according to Knuth, is to combine a ratio test with a difference test. But, this is a bit slow, and it still has the arbitrary magic precision number ("0.000001" in the code above).

It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality, absolute and relative, and have the test take two parameters (the relative and absolute tolerances) and go with those. I'm personally not worried about the performance of a division or multiplication when doing a floating pt test - if that's the performance hit you're worried about.
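A few illustrative calls against the feq sketch above (the values are only to show what the two tolerances buy you, not proposed test cases):

unittest
{
    assert(feq(0.0L, 1e-12L));                 // absolutely close to zero
    assert(feq(1e60L, 1e60L * (1 + 1e-12L)));  // relatively close, though far apart in absolute terms
    assert(!feq(1.0L, 2.0L));                  // neither
}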
Aug 15 2005
From your first post:Also glancing over the code does it assume little-endian architecture? I see the ushort* used to grab different parts of the real.Yes, it's just like the isfinite(), isnan() functions in std.math. It's also only valid for 80-bit reals. I'm just happy that I managed to do it without reverting to asm!The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance. And there's a region in between where it's really difficult to think in terms of absolute or relative tolerance, they kind of blur together. Note that there are many uses beyond simple "equal-enough" tests. This function allows you to say: At what value of x are f(x) and g(x) most different? eg when is exp(ln(x)) most different to x? You can just try a range of different x values, and find the one where precisionEquality(exp(ln(x)), x) is smallest. You don't need to worry about whether you should be using absolute or relative tolerance, that's already taken care of. So, if writing a function, you can find where the precision is least adequate. Once you've done the best you can, you can put that into a unit test. This gives you a simple way of reporting the accuracy of library functions and identities, and for performing more powerful "second-order" unit tests. (eg, acos(cos(x)) must be close to x for all -PI/2 <= x <= PI/2). Then, if you know that cos(x) is accurate, you've proved that acos(x) is also accurate. My goal is to provide one-line unit tests for most of the math functions. You're right: for equality tests, it probably is necessary to have seperate implementations for double and float, because subnormals become normal when moved to higher precision. It's only a problem if your intermediate calculations were done in double or float precision, instead of reals. What is true, though, is that if precisionEquality(real x, real y) >= double.mant.dig then cast(double)x == cast(double)y. The converse does not apply because of the aforementioned subnormals. (But in this case you kind of "got lucky"). I did the real case first because it is the most difficult. It's trivial to extend it to other precisions. -Don.It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those.
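As a rough sketch of the "scan for the worst x" idea described here (my code, not Don's; note that D's std.math spells the natural log as log, and the scan assumes lo > 0 so that log is defined):

import std.math; // exp, log

int worstCaseBits(real lo, real hi, int steps)
{
    // smallest number of bits to which exp(log(x)) agrees with x over [lo, hi]
    int worst = real.mant_dig;
    real step = (hi - lo) / steps;
    for (real x = lo; x <= hi; x += step)
    {
        int bits = precisionEquality(exp(log(x)), x);
        if (bits < worst)
            worst = bits;
    }
    return worst;
}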
Aug 15 2005
Do you mean "near one"? Near zero life gets very interesting. If I understand your algorithm correctly the number 1e-200 is still very far away from 0 or even 1e-201 or -1e-200. Then again I might not understand the algorithm.The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance.It'll be hard to avoid magic numbers. Even a statement of "4 bits of precision" is a magic number - just in binary. I actually wouldn't mind being up-front about the fact that there are two notions of equality absolute and relative and have the test take two parameters (the relative and absute tolerances) and go with those.And there's a region in between where it's really difficult to think in terms of absolute or relative tolerance, they kind of blur together.Agreed (I think). Over number that aren't "too small or too large" the absolute and relative measure pretty much the same thing. The distinction only gets important at the extremes.Note that there are many uses beyond simple "equal-enough" tests. This function allows you to say: At what value of x are f(x) and g(x) most different? eg when is exp(ln(x)) most different to x? You can just try a range of different x values, and find the one where precisionEquality(exp(ln(x)), x) is smallest. You don't need to worry about whether you should be using absolute or relative tolerance, that's already taken care of. So, if writing a function, you can find where the precision is least adequate. Once you've done the best you can, you can put that into a unit test. This gives you a simple way of reporting the accuracy of library functions and identities, and for performing more powerful "second-order" unit tests. (eg, acos(cos(x)) must be close to x for all -PI/2 <= x <= PI/2). Then, if you know that cos(x) is accurate, you've proved that acos(x) is also accurate. My goal is to provide one-line unit tests for most of the math functions.In another post I pointed out that sin(PI) (which I just actually tried and got something around -1e-20) tests and having no digits of precision in common with 0. So I think in general an absolute component to the notion of equality will be important if you want trig functions to behave "normally".You're right: for equality tests, it probably is necessary to have seperate implementations for double and float, because subnormals become normal when moved to higher precision. It's only a problem if your intermediate calculations were done in double or float precision, instead of reals. What is true, though, is that if precisionEquality(real x, real y) >= double.mant.dig then cast(double)x == cast(double)y. The converse does not apply because of the aforementioned subnormals. (But in this case you kind of "got lucky").Yup. I have no problem with doing things with real - I think that's the way to go. I'm just suggesting adding some overloads to make the common case of double and float easier to use.I did the real case first because it is the most difficult. It's trivial to extend it to other precisions. -Don.
Aug 15 2005
Ben Hinkle wrote:I did mean "near zero". You just need to go much smaller with 80-bit reals! real.min = 3.3e-4932 !! And the smallest representable number is real.min * real.epsilon = 3.3 e-4951. It's just amazing how much range you have with long doubles. So 1e-200 is a very poor estimation to 0; there are *billions* of better IEEE reals! My algorithm only gives 4e-4932 as close to 0 (and it's also close to -4e-4932). It measures closeness in "representability space". You could reasonably argue that 1e-4900 is close to 1e-4930. But in representability space they are no closer than 1 vs 1e30. The question then is, is representability space what we want? (I suspect the answer is: sometimes, but not always. Maybe about half the time?)The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance.Do you mean "near one"? Near zero life gets very interesting. If I understand your algorithm correctly the number 1e-200 is still very far away from 0 or even 1e-201 or -1e-200. Then again I might not understand the algorithm.In another post I pointed out that sin(PI) (which I just actually tried and got something around -1e-20) tests and having no digits of precision in common with 0. So I think in general an absolute component to the notion of equality will be important if you want trig functions to behave "normally".Ah. I'm learning a lot here. 1e-20 is normally an exceedingly poor approximation to 0. But, since sin(x) maps "linearly" to the interval (-1, 1), 1e-20 is close to 0 as sin(x) will get. Or alternatively, sin(PI) is small but not close to 0, because PI is not close enough to the mathematical pi. sin(n*PI) ==0 is not an identity which is preserved! But sin(x)*sin(x) + cos(x)*cos(x) == 1 seems to be preserved for all x. Yet there are clearly many contexts when you want to treat sin(PI) as zero. The relevant quantity seems to be the difference divided by the range. For trig functions (sin + cos, at least), it seems you only want an absolute difference, regardless of how close to 0 you are. A few weeks with D, and I've already learned more about numerical analysis than years with C++ :-) -Don.
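A sketch of a unit test for the identity that does seem to be preserved, sin(x)*sin(x) + cos(x)*cos(x) == 1 (the mant_dig-8 margin and the range of x are arbitrary choices of mine, and it assumes precisionEquality() from the original post is in scope):

import std.math; // sin, cos

unittest
{
    for (real x = -10.0; x <= 10.0; x += 0.125)
    {
        real lhs = sin(x)*sin(x) + cos(x)*cos(x);
        assert(precisionEquality(lhs, 1.0) >= real.mant_dig - 8);
    }
}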
Aug 15 2005
"Don Clugston" <dac nospam.com.au> wrote in message news:ddrvuo$2e0g$1 digitaldaemon.com...Ben Hinkle wrote:Ah ok. I didn't think you meant the underflow numbers as the the only things near 0. To me that's waaaay too near (typically). I'm talking just about normalized (ie "proper") numbers. In regular everyday use the concept of how close to zero is "close" is context dependent. When you have a simple number like x = 1+eps (which is very close to 1 in anyone's book) and then you ask "is x-1 near zero?" the answer depends on if you consider eps to be near zero. By your measurement just about all the time nothing will be near zero except zero (in particular eps will be extremely far away). Or to put it another way a given application might consider 3 and 3.001 to be close and 2 and 2.001 close and 1 and 1.001 close but as soon as they try to say 0 and .001 are close your test would say they are infinitely far apart. In my experience a rough measurement of near zero is abs(x)<sqrt(eps). That's why people need to be able to control both the relative tolerance (ie - bits of precision in agreement) and the absolute tolerance (ie - distance to zero).I did mean "near zero". You just need to go much smaller with 80-bit reals! real.min = 3.3e-4932 !! And the smallest representable number is real.min * real.epsilon = 3.3 e-4951. It's just amazing how much range you have with long doubles. So 1e-200 is a very poor estimation to 0; there are *billions* of better IEEE reals! My algorithm only gives 4e-4932 as close to 0 (and it's also close to -4e-4932). It measures closeness in "representability space".The point is that in IEEE arithmetic, the relative and absolute tolerances are related via the number of bits of precision. Near zero, "bits of precision" looks like an absolute tolerance. Away from zero, it looks like relative tolerance.Do you mean "near one"? Near zero life gets very interesting. If I understand your algorithm correctly the number 1e-200 is still very far away from 0 or even 1e-201 or -1e-200. Then again I might not understand the algorithm.You could reasonably argue that 1e-4900 is close to 1e-4930. But in representability space they are no closer than 1 vs 1e30. The question then is, is representability space what we want? (I suspect the answer is: sometimes, but not always. Maybe about half the time?)You mean in your measurement they are not close. In typical numerical apps that manipulate numbers near 1 a value of 1e-4900 is definetly near zero. It all depends on the scale of the problem being solved. The numerical applications I can think of in fact force the user to choose what they mean when they say "are these roughly equal". There isn't just one notion of equal or approximate so maybe the thing to do is make feqabs and feqrel. I'd say your precision test would make a great feqrel.
Aug 16 2005
Ben Hinkle wrote:
you consider eps to be near zero. By your measurement just about all the time nothing will be near zero except zero (in particular eps will be extremely far away). Or to put it another way a given application might consider 3 and 3.001 to be close and 2 and 2.001 close and 1 and 1.001 close but as soon as they try to say 0 and .001 are close your test would say they are infinitely far apart.

Yes. There is so much precision available near zero, it's insane. It's clear that with IEEE extendeds, you'll almost never underflow to zero.

In my experience a rough measurement of near zero is abs(x)<sqrt(eps). That's why people need to be able to control both the relative tolerance (ie - bits of precision in agreement) and the absolute tolerance (ie - distance to zero).

Agreed.

Well, it is a genuine feature of the representation. I think that almost all of the time, having 64 bits available at 1e-4900 is not much use. They are definitely less useful than having 64 bits near 1!

It all depends on the scale of the problem being solved. The numerical applications I can think of in fact force the user to choose what they mean when they say "are these roughly equal". There isn't just one notion of equal or approximate so maybe the thing to do is make feqabs and feqrel. I'd say your precision test would make a great feqrel.

I concede -- you're right. There are actually *three* 'close enough' tests, not two:
(1) relative error
(2) absolute error near zero, to cope with subnormals/limited precision of FP numbers
(3) absolute error near zero, which is application-specific.

My code deals with (1) and (2) which are intimately related via the machine representation. But test (3) is important. As you say, it is going to be very rare that the difference between 1e-4900 and 1e-4930 is significant.

It seems like a difficult problem in general, though. Consider the case you mentioned earlier, sin(PI). Consider writing a unit test that sin(n*PI)==0 for all n.

sin(0)        = 0
sin(PI)       = -5e-20
sin(PI*1e5)   = 3.7e-15
sin(PI*1e10)  = 1.4e-9
sin(PI*1e15)  = 6.7e-5
sin(PI*1e18)  = 0.041
sin(PI*1e20)  = 3.14159e+20  ?????? Hmmm. also cos(PI*1e20) = 3.14159e+20
sin(infinity) = nan.

It appears to be impossible to come up with an absolute tolerance for what "close enough" means for all these cases; clearly it depends on the tolerance of the input. Can this be codified somehow? I think you'd hope that the following should be true for all x, n:

-1 <= sin(x) <= 1
sin(x)*sin(x) + cos(x)*cos(x) = 1
sin(n*PI) is closer to zero than sin(n*PI+x.epsilon) and sin(n*PI-x.epsilon), if n*PI + PI >= n*PI * (1+ 4*epsilon)
(ie, sin(x) crosses zero between sin(n*PI) and sin(n*PI +- epsilon) ).

Anyway, sin(PI*1e20) fails any test I can think of. I think it should return nan.
-Don.
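As a rough attempt to codify the "tolerance of the input" point for the table above (my sketch: the rounding error in the argument n*PI grows like n*PI*real.epsilon, the factor of 4 is an arbitrary safety margin, and the range of n is limited to where the values listed above stay within such a bound):

import std.math; // sin, fabs, PI

unittest
{
    // scale the "close enough to zero" tolerance with the size of the argument
    for (real n = 1.0; n <= 1e15L; n *= 10.0)
    {
        assert(fabs(sin(n * PI)) <= 4 * n * PI * real.epsilon);
    }
}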
Aug 16 2005
Ben Hinkle wrote:
In typical numerical apps that manipulate numbers near 1 a value of 1e-4900 is definitely near zero. It all depends on the scale of the problem being solved.

IDEA: Do you think the same applies for very large numbers? For example, I think 1e4900 would often be acceptably close to 1e4930, and both are reasonable approximations to infinity (and in fact they'll be equal to infinity if you later truncate it to double or float precision).

The IEEE number line is basically

-inf         -1           0           1          +inf
  |-----------|-----------|-----------|-----------|

with all four sections of almost equal length.

If this is the case, then actually you don't want an absolute tolerance: you want a comparison of the exponent bits. So, the appropriate algorithm is something like:
if it is near 1, then compare the mantissa bits. Result: 0 to mant_dig.
if it is near 0 or infinity, compare the exponent bits (4930 vs 4900).

This would mean that 0 and real.min would be equal to 16 bits, even before taking the mantissa into account. But you'd only start counting from the first set bit in the exponent. eg. 0x1.0p+1 and 0x1.1111p+1 would have only one bit in common from the exponent + 1 bit from the mantissa = 2 bits in total. But 0x1.00p+0x3FFF (6e4931) and 0x1.1111p+0x3F00 (1e4850) would have 6 bits in common from the exponent + 0 from the mantissa.

Q. Is there any way to combine these two? You might hope that precisionEquality(x, y) >= double.something if and only if cast(double)x == cast(double)y; and likewise for float. This means that double.min would have to have as much closeness to 0 as 1+double.epsilon does to 1. I think that when you add the requirements for float, it becomes mathematically impossible. But it might lead in the right direction.
-Don.
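One possible reading of the "start counting from the first set bit in the exponent" rule (entirely a guess at the intent, written for non-negative unbiased exponents):

int expBitsInCommon(int e1, int e2)
{
    int m = e1 > e2 ? e1 : e2;  // the larger exponent
    int x = e1 ^ e2;            // positions where the exponents disagree
    int bits = 0;
    for (int i = 14; i >= 0; i--)  // 15-bit exponent field of an 80-bit real
    {
        if (bits == 0 && !((m >> i) & 1))
            continue;              // still in the leading zeros of the larger exponent
        if ((x >> i) & 1)
            break;                 // first disagreement ends the count
        bits++;
    }
    return bits;
}

With this reading, expBitsInCommon(1, 1) gives 1 and expBitsInCommon(0x3FFF, 0x3F00) gives 6, matching the two examples above.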
Aug 16 2005
"Don Clugston" <dac nospam.com.au> wrote in message news:ddudr8$1nqo$1 digitaldaemon.com...Ben Hinkle wrote:Actually I like your original relative equality for large numbers. It's only at zero that relative inequality diverges from people's expectations of the real line (which is how one thinks of floating point numbers casually). Now if the real line were a real circle instead (connect -inf to +inf) then I'd say numbers near +-inf should be treated like near zero.In typical numerical apps that manipulate numbers near 1 a value of1e-4900 is definetly near zero.It all depends on the scale of the problem being solved.IDEA: Do you think the same applies for very large numbers? For example, I think 1e4900 would often be acceptably close to 1e4930, and both are reasonable approximations to infinity (and in fact they'll be equal to infinity if you later truncate it to double or float precision). The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length.If this is the case, then actually you don't want an absolute tolerance: you want a comparison of the exponent bits. So, the appropriate algorithm is something like: if it is near 1, then compare the mantissa bits. Result: 0 to mant_dig. if it is near 0 or infinity, compare the exponent bits (4930 vs 4900). This would mean that 0 and real.min would be equal to 16 bits, even before taking the mantissa into account. But you'd only start counting from the first set bit in the exponent. eg. 0x1.0p+1 and 1.1111p+1 would have only one bit in common from the exponent + 1 bit from the mantissa = 2 bits in total. But 0x1.00p+0x3FFF (6e4931) and ( 1.1111p+0x3F00 (1e4850) would have 6bits in common from the exponent + 0 from the mantissa. Q. Is there any way to combine these two? You might hope that precisionEquality(x, y) >= double.something if and only if cast(double)x == cast(double)y; and likewise for float. This means that double.min would have to have as much closeness to 0 as 1+double.epsilon does to 1. I think that when you add the requirements for float, it becomes mathematically impossible. But it might lead in the right direction. -Don.I still think the way to go is either 1) have one function with two knobs - one for abs tol and one for rel tol or 2) have two functions each with one knob - one for abs tol and one for rel tol The function I posted originally was feq with two knobs but now I'm thinking its probably a better idea to have two functions. With one it's easy to forget which is the abstol and which is the reltol since D doesn't have named optional parameters. So I'd prefer two functions feqrel and feqabs. The name feqrel tells the user that essentially nothing is close to zero since a relative tolerance is used (I don't mind that feqrel doesn't actually use a relative tolerance for denormalized numbers - it might just be easier to say underflow values all are approx 0 but who knows). The name feqabs tells users not to use it for large numbers and that it matches regular closeness notions for numbers near zero.
Aug 17 2005
In article <ddudr8$1nqo$1 digitaldaemon.com>, Don Clugston says...The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length.What is the formula that defines this coordinate system? Why not use this to redefine the numbers in terms of this system and then take their distance appart and compare that to an absolute tolerance for approximately equal. That would scale correctly. -Sha
Aug 17 2005
Shammah Chancellor wrote:In article <ddudr8$1nqo$1 digitaldaemon.com>, Don Clugston says...Yes, and that's exactly what my code does, but as Ben has pointed out, the internal representation frequently doesn't match your application. Specifically, IEEE reals are fantastic at distinguishing really small numbers, but you often don't care whether a number is 1e-2000 or 1e-4000: it's just "really really small". I think the problem only really comes up when comparing numbers to zero, or possibly to infinity. Even 0.000000000000000000001 is very close to 1, and very very far from 0 on the IEEE number line. Actually comparison with 0 is complicated, because sometimes you demand exact equality, and sometimes 'tiny' is acceptable. Also sometimes sign matters (a tiny positive number might be completely different to a positive negative number). -Don.The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length.What is the formula that defines this coordinate system? Why not use this to redefine the numbers in terms of this system and then take their distance appart and compare that to an absolute tolerance for approximately equal. That would scale correctly. -Sha
Aug 17 2005
"Don Clugston" <dac nospam.com.au> wrote in message news:de0ms5$o94$1 digitaldaemon.com...Shammah Chancellor wrote:Another way to describe your metric is like the usual Euclidean metric on the following nubmer lines: 4 2 1 1/2 1/4 ---|-----------|-----------|-----------|-----------|---- subnormal 0 -subnormal --|--|--|--|--|--|--|--|--|-- -1/4 -1/2 -1 -2 -4 ---|-----------|-----------|-----------|-----------|---- For example if two numbers like 2.001 and 2.002 are d distance apart then 4.002 and 4.004 are also d distance apart because they have the same mantissas but just differ in the exponents. That's also why essentially nothing is close to 0 and why the positive and negative axes are "not comparable". It's also why 0 is so special. That is, the formula for the coordinate system is log2(x).In article <ddudr8$1nqo$1 digitaldaemon.com>, Don Clugston says...Yes, and that's exactly what my code does, but as Ben has pointed out, the internal representation frequently doesn't match your application. Specifically, IEEE reals are fantastic at distinguishing really small numbers, but you often don't care whether a number is 1e-2000 or 1e-4000: it's just "really really small". I think the problem only really comes up when comparing numbers to zero, or possibly to infinity. Even 0.000000000000000000001 is very close to 1, and very very far from 0 on the IEEE number line. Actually comparison with 0 is complicated, because sometimes you demand exact equality, and sometimes 'tiny' is acceptable. Also sometimes sign matters (a tiny positive number might be completely different to a positive negative number). -Don.The IEEE number line is basically -inf -1 0 1 +inf |-----------|-----------|-----------|-----------| with all four sections of almost equal length.What is the formula that defines this coordinate system? Why not use this to redefine the numbers in terms of this system and then take their distance appart and compare that to an absolute tolerance for approximately equal. That would scale orrectly. -Sha
Aug 17 2005
I think this is pretty cool. I've thought about building such a thing, but never got around to it. You've got the right idea, which is the number of significant bits that match. I agree that if one or both of the operands are NaN, the result should be 0. And you've got unit tests, too, which I love to see! Need to add a test for (real.nan, real.nan). I need to know what license you pick for it - public domain?
Aug 15 2005
In article <ddr5og$1phb$1 digitaldaemon.com>, Walter says...I think this is pretty cool. I've thought about building such a thing, but never got around to it. You've got the right idea, which is the number of significant bits that match. I agree that if one or both of the operands are NaN, the result should be 0. And you've got unit tests, too, which I love to see! Need to add a test for (real.nan, real.nan).I need to add a few more tests for denormals, too. There are so many corner cases!I need to know what license you pick for it - public domain?Public domain. Please note that I haven't finished with it. It works, but some of the comments are wrong! And some unit tests are missing, and there's some further optimisation/ cleanup to do. I just wanted to check whether people thought this was the correct approach before I spent any more time on it. I will post an update soon.
Aug 15 2005
I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no language has ever got this right. -Don.Well Walter seems gung-ho with it but please add in a difference test with feq (at least near 0 where it matters) so that, for example, sin(PI) and sin(-PI) and 0 all feq with each other. I think using a pure relative test is too narrow.
Aug 15 2005
In article <ddre2a$1vck$1 digitaldaemon.com>, Ben Hinkle says...It's not a pure relative test. Subnormals will compare equal to zero. So it should work. But... I just checked, and was surprised to find that sin(PI) = 1p-64 ~ 1e-20. If you define real reducedsin(real x) { return sin(x-PI); } then reducedsin(PI) ==0. I would have thought that sin(n*PI) ==0 was an important identity to preserve. Could there be an out-by-1 bug in the argument reduction function used by the trig functions? This is exactly the sort of thing I'm hoping to uncover. -Don.I'd really appreciate feedback, since it's surprisingly tricky, and AFAIK, no language has ever got this right. -Don.Well Walter seems gung-ho with it but please add in a difference test with feq (at least near 0 where it matters) so that, for example, sin(PI) and sin(-PI) and 0 all feq with each other. I think using a pure relative test is too narrow.
Aug 15 2005