
digitalmars.D.bugs - [Issue 3248] New: lossless floating point formatting

reply d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248

           Summary: lossless floating point formatting
           Product: D
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: Phobos
        AssignedTo: nobody puremagic.com
        ReportedBy: moi667 hotmail.com


Could an option be added to the formatting to elide trailing zeros for %f?
That way it would be possible to create an optimal lossless formatting for which
the following holds:

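float f = 0.1f;              // any finite value (a default-initialized float is NaN)
string s = format("%s", f);  // stands in for the proposed lossless formatting
float f2 = to!(float)(s);
assert(f == f2);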

The formatting I'm trying to get can be seen here (decimal):
http://www.h-schmidt.net/FloatApplet/IEEE754.html

%g fails to format like this because it uses %f for values as small as 10^-5,
thus losing precision for floats with leading zeros, like 0.00001234567.

Fixing this by using %f for 10^-5..10^-1 fails because %f doesn't elide
trailing zeros, making it suboptimal space-wise.
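
A minimal sketch of the round-trip check behind this request, assuming Phobos's std.format and std.conv. The helper name roundTrips is made up for illustration. It only shows that the default %g (6 significant digits) does not survive a round trip for the example value, while 9 significant digits do; 9 is the usual bound for IEEE single precision when both conversions round correctly.

```d
import std.conv : to;
import std.format : format;

/// Hypothetical helper: true if formatting `f` with `fmt` and parsing the
/// resulting text back as a float yields exactly `f` again.
bool roundTrips(string fmt, float f)
{
    return to!float(format(fmt, f)) == f;
}

unittest
{
    float f = 0.00001234567f;

    // Default %g keeps only 6 significant digits, so information is lost.
    assert(!roundTrips("%g", f));

    // %.8e prints 9 significant digits (1 before the point, 8 after).
    assert(roundTrips("%.8e", f));
}
```

Compile with -unittest to run the checks.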

It would be even nicer to have this lossless formatting added to std.format!
I would even suggest making this the default formatting for floating point;
floating point isn't as straightforward as integral types, and it is easy to
think the current formatting holds all the information.

Compared to the hex %a format, this new lossless format would be more readable
(less bug-prone) and generally shorter (0.1 would be 0.1).

-- 
Configure issuemail: http://d.puremagic.com/issues/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
Aug 12 2009
next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248


Don <clugdbug yahoo.com.au> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |clugdbug yahoo.com.au





It's not that easy, actually. When should it print 0.09999999999999999, and
when should it print 0.1 ? The code to do it correctly is amazingly
complicated.
Just be aware that what you're asking for is much more difficult than you
probably imagine.

Aug 12 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248







 It's not that easy, actually. When should it print 0.09999999999999999, and
 when should it print 0.1 ? The code to do it correctly is amazingly
 complicated.
 Just be aware that what you're asking for is much more difficult than you
 probably imagine.
It is less difficult than you imagine :)

Let's take floats. A float has at most 24 bits of precision:

2^-24 = 0.000000059604644775390625
2^-23 = 0.00000011920928955078125

To distinguish between these two you only need a precision of 8. Thus %.8e
will always be lossless, but it isn't always the nicest representation.

%g fixes this by using %f if the exponent for an e format is greater than -5
and less than the precision. The "less than the precision" part is correct, but
the "greater than 10^-5" part is bad, as the precision specifies the number of
digits generated after the decimal point, not excluding leading zeros.

If %g were changed to use %f only between 10^-1 and the precision, that would
solve the problem, provided %f elided trailing zeros.

Back to the 0.1 question: 0.1 is actually stored as 0.1000000015... Eliding
trailing zeros from %.8f would be sufficient to get 0.1.
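
The trailing-zero elision asked for here can be sketched as a post-processing step on %f output. This assumes Phobos's std.format and std.string; the function name elideTrailingZeros is hypothetical and not an existing Phobos option. The sketch only trims zeros. It makes no claim that eight fractional digits are lossless for every value.

```d
import std.format : format;
import std.string : indexOf;

/// Hypothetical helper: removes trailing zeros after the decimal point of
/// %f-style output, and the point itself if nothing is left after it.
/// Not meant for %e-style text, where the exponent comes last.
string elideTrailingZeros(string s)
{
    if (s.indexOf('.') < 0)
        return s;                 // no fractional part, nothing to trim

    size_t end = s.length;
    while (end > 0 && s[end - 1] == '0')
        --end;                    // drop zeros from the right
    if (end > 0 && s[end - 1] == '.')
        --end;                    // drop a dangling decimal point
    return s[0 .. end];
}

unittest
{
    // 0.1f is stored as 0.1000000015..., which %.8f prints as 0.10000000.
    assert(elideTrailingZeros(format("%.8f", 0.1f)) == "0.1");
    assert(elideTrailingZeros("1.00000000") == "1");
    assert(elideTrailingZeros("100") == "100");
}
```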
Aug 12 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248


Stewart Gordon <smjg iname.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |smjg iname.com





I can see a few possible approaches to lossless floating point formatting:

(a) decimal with infinite precision, minus trailing zeros
(b) minimum number of significant figures guaranteed to be unique, minus
trailing zeros
(c) the shortest possible string that, when parsed as a floating point, is
exactly this number

(a) clearly isn't what the reporter is asking for.

(b) seems straightforward.  (Is the number of s.f. in question just the .dig
property?)

(c) is optimal, and could probably be implemented quite simply (not sure
whether it would be most efficient though) with the aid of the nextUp and
nextDown functions.  This would also address the question in comment 1, though
I'm not sure how easy it would be to implement this efficiently.

But (b) and (c) are ambiguous: do we go by uniqueness/exactitude in the real
type or in the actual floating point type being used?  I can see that sometimes
the app'll know what type it will later be read into, and sometimes it won't.
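
Approach (c) can be approximated by brute force: try increasing precisions until the text parses back to the same value. The sketch below assumes Phobos's std.format (with a `*` precision) and std.conv, and goes by uniqueness in the float type. The name shortestRoundTrip is made up for illustration. It only considers the rounded %g output at each precision, so it may occasionally miss a shorter string, and it is not an efficient algorithm.

```d
import std.conv : to;
import std.format : format;

/// Hypothetical helper: returns the %g text with the fewest significant
/// digits (1 to 9) that parses back to exactly `f`.
string shortestRoundTrip(float f)
{
    foreach (precision; 1 .. 10)
    {
        string s = format("%.*g", precision, f);
        if (to!float(s) == f)
            return s;
    }
    // Only reached for values that never compare equal, such as NaN.
    return format("%.9g", f);
}

unittest
{
    assert(shortestRoundTrip(0.1f) == "0.1");
    assert(shortestRoundTrip(0.5f) == "0.5");

    float third = 1.0f / 3;
    assert(to!float(shortestRoundTrip(third)) == third);
}
```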

Aug 12 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248






As far as I understand it, removing trailing zeros from .8 precision and (c)
are the same.
This is because the first (right to left) non-zero you encounter is there
because of 2^x.

I actually used nextUp to test a few ranges of floats :) (I have a not so fast
computer) 

I remember .dig being 6 for all floats (could be wrong here, not close to any
dmd.exe)

Aug 12 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248


Andrei Alexandrescu <andrei metalanguage.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |andrei metalanguage.com





22:43:37 PDT ---
I recommend anyone interested in the subject to peruse the papers:

"How to Read Floating Point Numbers Accurately"
ftp://ftp.ccs.neu.edu/pub/people/will/howtoread.ps

and

"Printing Floating-Point Numbers Quickly and Accurately"
www.cs.indiana.edu/~burger/FP-Printing-PLDI96.pdf

Aug 12 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248


Walter Bright <bugzilla digitalmars.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |bugzilla digitalmars.com





22:47:28 PDT ---
Right, this problem is an old one, and there's no reason to reinvent the wheel. 
Also, the formatting for them works by simply forwarding the job to the
underlying C library. Some C implementations of this are better than others.

Aug 14 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248






Does this mean I can forget about getting this in Phobos?
Could then at least an option be added to remove those trailing zeros for %f?
I don't see why %g should be that privileged ;)

Aug 15 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248





 As far as I understand it, removing trailing zeros from .8 precision and (c)
 are the same.
I doubt it ... I think the optimal number of decimal s.f. would depend on the
binary exponent. But I'll experiment when I have time.

 I remember .dig being 6 for all floats (could be wrong here, not close to any
 dmd.exe)
The spec describes .dig as "number of decimal digits of precision", which seems
ambiguous. Is it a property of the type or the value? If it's a type property,
is it the maximum number of s.f. that may be required to express a number of
the type unambiguously, or the number of s.f. to which numbers are guaranteed
to be storeable unambiguously? If it's a value property, is it the number of
s.f. according to one of the approaches I listed, or something else?
Sep 07 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248






 As far as I understand it, removing trailing zeros from .8 precision and (c)
 are the same.
 I doubt it ... I think the optimal number of decimal s.f. would depend on the
 binary exponent. But I'll experiment when I have time.
You are correct. Some numbers need an extra digit.

 I remember .dig being 6 for all floats (could be wrong here, not close to any
 dmd.exe)
 The spec describes .dig as "number of decimal digits of precision", which
 seems ambiguous. Is it a property of the type or the value?
It's a property of the type.

 If it's a type property, is it the maximum number of s.f. that may be required
 to express a number of the type unambiguously, or the number of s.f. to which
 numbers are guaranteed to be storeable unambiguously?
Neither. It's the number of sig figs which are accurate in the worst case. So
it's the _minimum_ number of digits which are stored. To unambiguously define
the number, more digits are almost always required.
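
A small sketch of what .dig means for float, using only the built-in type properties and literals. The specific sample values are chosen for illustration. Near 2^33 adjacent floats are 1024 apart, so two decimals that differ by 1000 in the seventh significant digit can land on the same float, while decimals with six significant digits stay distinct.

```d
// .dig is a property of the type, not of a particular value.
static assert(float.dig == 6);
static assert(double.dig == 15);

unittest
{
    // Two different 7-digit decimals that round to the same float.
    float a = 8.589973e9f;
    float b = 8.589974e9f;
    assert(a == b);

    // With 6 significant digits the neighbours remain distinguishable.
    float c = 8.58997e9f;
    float d = 8.58998e9f;
    assert(c != d);
}
```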
Sep 07 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248




 Neither. It's the number of sig figs which are accurate in the worst case. So
 it's the _minimum_ number of digits which are stored. To unambiguously define
 the number, more digits are almost always required.
So, if you try to put a decimal number into a float, it's how many s.f. you can
get out again and be sure they'll be the same. I don't see in what cases this
differs from "the number of s.f. to which numbers are guaranteed to be storeable
unambiguously"....
Sep 07 2009
prev sibling next sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248





 Neither. It's the number of sig figs which are accurate in the worst case. So
 it's the _minimum_ number of digits which are stored. To unambiguously define
 the number, more digits are almost always required.
 So, if you try to put a decimal number into a float, it's how many s.f. you
 can get out again and be sure they'll be the same. I don't see in what cases
 this differs from "the number of s.f. to which numbers are guaranteed to be
 storeable unambiguously"....
It may be the same. I wasn't quite sure what you meant by "unambiguously". In
both directions binary<->decimal there is nearly always more than one choice.
Sep 07 2009
prev sibling parent d-bugmail puremagic.com writes:
http://d.puremagic.com/issues/show_bug.cgi?id=3248







 As far as I understand it, removing trailing zeros from .8 precision and (c)
 are the same.
 I doubt it ... I think the optimal number of decimal s.f. would depend on the
 binary exponent. But I'll experiment when I have time.
You are correct, removing trailing zeros from %.8e isn't optimal, but I thought
it was at least lossless.

 You are correct. Some numbers need an extra digit.
Could you maybe provide one? I tried some ranges with nextUp and didn't find
any. A near-optimal lossless formatting is fine too :)
Sep 07 2009