digitalmars.D - Replacement for snprintf

reply berni44 <dlang d-ecke.de> writes:
In PR 7222 [1] Robert Schadek suggested replacing the call to 
snprintf in std.format with an implementation written in D. Over the 
last few days I took a deeper look into this, and I now have a 
function that works for floats (and probably also doubles, but I 
haven't tested that yet; it should also work with reals if 
ucent were available; without ucent I need a workaround for 
real or a fallback to BigInt).

So far I have only implemented the f qualifier, but it shouldn't be difficult 
to add the e and g qualifiers and the uppercase versions. Also some 
work needs to be done, to implement th
but again, I think, this will not be very difficult. 
Unfortunately I'll be busy with some other (non-D) stuff for some 
time. I'll probably continue work on this sometime in November.

I checked correctness for floats by comparing to the result of 
snprintf for about 1% of all numbers (I will do that for all of them 
before filing a PR, though). The only differences are rounding 
issues when the number is exactly halfway between two adjacent 
representations. In that case the implementation of snprintf on my 
computer always rounds towards zero, while mine rounds in the opposite 
direction. (E.g. 0.125 rounded to two digits is 0.13 in my implementation 
while it's 0.12 in snprintf's implementation.) I doubt that 
different implementations of printf variants are all identical in 
this regard.
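
To make the tie case concrete: 0.125 is 1/8 and therefore exactly representable in binary, so it really does sit halfway between 0.12 and 0.13, and the output depends only on the tie-breaking rule. The following is a minimal sketch using only std.format; the helper name twoDigits is made up for illustration. Which of the two strings comes out depends on the Phobos version and, where std.format forwards to snprintf, on the C runtime, so the check accepts both.

```d
import std.format : format;

/// Formats `x` with two digits after the decimal point.
string twoDigits(double x)
{
    return format("%.2f", x);
}

unittest
{
    // 0.125 is stored exactly, so this is a genuine tie.
    assert(0.125 * 8 == 1.0);

    // Ties-to-even gives "0.12", ties-away-from-zero gives "0.13".
    immutable s = twoDigits(0.125);
    assert(s == "0.12" || s == "0.13");
}
```

The block can be run with something like rdmd -main -unittest.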

I also compared the speed of both implementations. They are 
generally in the same order of magnitude (600-2800ns per number, 
depending on precision and number). On average my implementation 
is slightly faster. For numbers close to 0 the snprintf 
implementation is faster (I wasn't able to follow the algorithm 
they use), especially if the desired precision is large (I'll try 
to improve this, because it might become a real problem for reals). 
For all other numbers my current implementation wins by a fairly 
small margin.

[1] 
https://github.com/dlang/phobos/pull/7222#issuecomment-544909188
Oct 30
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 10/30/2019 06:44 AM, berni44 wrote:
 The only differences are rounding issues, when the number is
 exactly between two adjacent ways of displaying. The implementation of
 snprintf on my computer always rounds towards zero while mine rounds in
 the opposite direction. (E.g. 0.125 rounded to two digits is 0.13 in my
 implementation while it's 0.12 in snprintf's implementation)
The tie-breaker is to always round towards the even digit. So it should always produce 1.12, 1.14, etc. Ali
Oct 30
parent reply berni44 <dlang d-ecke.de> writes:
On Wednesday, 30 October 2019 at 15:48:44 UTC, Ali Çehreli wrote:
 The tie-breaker is to always round towards the even digit. So 
 it should always produce 1.12, 1.14, etc.
As far as I know that's for avoiding error propagation when intermediate results need to be rounded. If I'm not completely mistaken, Donald Knuth proved that rounding toward even avoids errors that might build up over several such steps. But here there is little chance that the result will be used for new calculations. It's most often used for printing a result that humans have to read. This is different.
Oct 30
parent reply Jon Degenhardt <jond noreply.com> writes:
On Wednesday, 30 October 2019 at 16:04:10 UTC, berni44 wrote:
 On Wednesday, 30 October 2019 at 15:48:44 UTC, Ali Çehreli 
 wrote:
 The tie-breaker is to always round towards the even digit. So 
 it should always produce 1.12, 1.14, etc.
As far as I know that's for avoiding error propagation when intermediate results need to be rounded. If I'm not completely mistaken, Donald Knuth proved that rounding toward even avoids errors that might build up over several such steps. But here there is little chance that the result will be used for new calculations. It's most often used for printing a result that humans have to read. This is different.
It's reasonably common to have numeric values written out in text format and read back in and used in subsequent computations. Not always a great idea, especially when done without much consideration for round-off errors. But it's not uncommon. --Jon
Oct 30
parent reply berni44 <dlang d-ecke.de> writes:
On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt 
wrote:
 It's reasonably common to have numeric values written out in 
 text format and read back in and used in subsequent 
 computations. Not always a great idea, especially when done 
 without much consideration for round-off errors. But it's not 
 uncommon.
But IMHO this is the fault of people who do this and not the fault of a printing routine. But: while pondering how to fix the results of format for ranges of strings (it currently places quotes around each string, which is somewhat inconsistent because single strings are printed without quotes, and causes confusion), I came up with the idea of having a new format qualifier, maybe S like source, in addition to s, which prints the value in a way that can be used directly in D code (which is, as far as I know, the reason why the quotes are printed). That could also be used to produce a representation of a float that, when read back in, is still the same float as before; this could be done by the ryu or grisu algorithm, because these algorithms have exactly this goal.
Oct 30
next sibling parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 10/30/2019 12:19 PM, berni44 wrote:

 But: When pondering about how to fix the results of format for ranges of
 strings (it places currently quotes around each string
Just to make sure, you are aware of the optional '-' before '(', right? "%-(%s%)" does not print the quotes. Ali
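
For readers who have not met the compound specifier, here is a small sketch of the behaviour under discussion, using only std.format.format. The array contents are arbitrary sample data.

```d
import std.format : format;

unittest
{
    auto words = ["foo", "bar"];

    // A single string is printed bare ...
    assert(format("%s", "foo") == "foo");
    // ... but the elements of a range of strings are quoted.
    assert(format("%s", words) == `["foo", "bar"]`);
    // %( ... %) iterates over the range; the elements are still quoted.
    assert(format("%(%s, %)", words) == `"foo", "bar"`);
    // The '-' flag on the compound specifier suppresses the quoting.
    assert(format("%-(%s, %)", words) == "foo, bar");
}
```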
Oct 30
parent berni44 <dlang d-ecke.de> writes:
On Wednesday, 30 October 2019 at 19:28:27 UTC, Ali Çehreli wrote:
 Just to make sure, you are aware of the optional '-' before 
 '(', right? "%-(%s%)" does not print the quotes.
I know this. I personally think, it is somewhat ugly, but I understand how it came to have it like this. My rationale is more like this: Currently it probably won't be possible to change the behavior of %s, because that would be a code breaking change. But there might be a time in the future, where it's possible to do some code breaking changes, maybe when D2 -> D2.1 or something like this. It will be much easier to do these changes at that time, when there is a well tested, simple and working alternative that can be pointed out to the users. Therefore it's a good idea to implement this alternative right now. Isn't it?
Oct 30
prev sibling parent reply Sebastiaan Koppe <mail skoppe.eu> writes:
On Wednesday, 30 October 2019 at 19:19:06 UTC, berni44 wrote:
 On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt 
 wrote:
 It's reasonably common to have numeric values written out in 
 text format and read back in and used in subsequent 
 computations. Not always a great idea, especially when done 
 without much consideration for round-off errors. But it's not 
 uncommon.
But IMHO this is the fault of people who do this and not the fault of a printing routine.
You are correct, but people will still blame the printing routine.
Oct 30
parent reply drug <drug2004 bk.ru> writes:
On 10/30/19 10:54 PM, Sebastiaan Koppe wrote:
 On Wednesday, 30 October 2019 at 19:19:06 UTC, berni44 wrote:
 On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt wrote:
 It's reasonably common to have numeric values written out in text 
 format and read back in and used in subsequent computations. Not 
 always a great idea, especially when done without much consideration 
 for round-off errors. But it's not uncommon.
But IMHO this is the fault of people who do this and not the fault of a printing routine.
You are correct, but people will still blame the printing routine.
I wouldn't call it a fault at all. In some cases it is much more productive to have a text representation of data than a binary one. Initially I too believed that a binary representation is more suitable, but afterwards I was forced to use a text format and that gave me a good result.
Oct 31
parent reply Walter Bright <newshound2 digitalmars.com> writes:
On 10/31/2019 1:27 AM, drug wrote:
 In some cases it is much more productive to have text representation
 of data than binary ones. Initially I believed too that binary
 representation is the more suitable but afterwards I  was forced to
 use text format and that gives me a good result.
To get round-trip 100% accuracy, print the floats in hex using the %A format.
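
A minimal sketch of that round trip, assuming std.conv.to accepts the hexadecimal form that std.format produces (the lowercase %a is used here; viaHex is a made-up helper name). Because the hex form writes out the binary significand and a power-of-two exponent, no decimal rounding happens in either direction.

```d
import std.conv : to;
import std.format : format;

/// Round-trips `x` through its hexadecimal text form.
double viaHex(double x)
{
    // "%a" prints the significand in hex and the exponent as a power of two.
    return format("%a", x).to!double;
}

unittest
{
    // Sample values only; none of them is exactly representable in decimal
    // with a short digit string.
    foreach (x; [0.1, 1.0 / 3.0, 6.02214076e23])
        assert(viaHex(x) == x);
}
```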
Oct 31
parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Thursday, 31 October 2019 at 20:20:24 UTC, Walter Bright wrote:
 On 10/31/2019 1:27 AM, drug wrote:
 In some cases it is much more productive to have text 
 representation of data than binary ones. Initially I believed 
 too that binary representation is the more suitable but 
 afterwards I  was forced to use text format and that gives me 
 a good result.
To get round-trip 100% accuracy, print the floats in hex using the %A format.
DtoA is also supposed to have 100% accuracy when it comes to the value, though not necessarily to the binary representation. I'd still prefer grisu2 over ryu, since it is easier to understand and I already have a ctfeable version of it. (It can't be safe though, since it casts double* to ulong*.)
Oct 31
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Oct 31, 2019 at 09:04:49PM +0000, Stefan Koch via Digitalmars-d wrote:
[...]
 I'd still prefer grisu2 over ryu, since it is easier to understand and I
 already have a ctfeable version of it.
Maybe we should be using your implementation then? No need to duplicate work if it's already been done.
 (it can't be safe though since it casts double* to ulong*)
But surely it can be trusted? T -- Your inconsistency is the only consistent thing about you! -- KD
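
As an illustration of the @safe versus @trusted point (this is not Stefan's code, just a sketch with a made-up function name): the pointer cast itself is rejected in @safe code, but a small wrapper can be marked @trusted because reading the eight bytes of a double as a ulong cannot corrupt memory.

```d
/// Returns the raw IEEE 754 bit pattern of `d`.
/// The pointer cast is not allowed in @safe code, but it is memory-safe here
/// because double and ulong have the same size, hence @trusted.
ulong bitsOf(double d) @trusted pure nothrow @nogc
{
    return *cast(ulong*) &d;
}

@safe unittest
{
    // @safe callers can use the wrapper without any casts of their own.
    assert(bitsOf(1.0) == 0x3FF0_0000_0000_0000UL);
    assert(bitsOf(-0.0) == 0x8000_0000_0000_0000UL);
}
```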
Oct 31
prev sibling next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Oct 30, 2019 at 01:44:52PM +0000, berni44 via Digitalmars-d wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to snprintf
 in std.format with an own method written in D. During the last days I
 took a deeper look into this and meanwhile I've got a function that
 works for floats (and probably also doubles, but I haven't tested that
 yet and it should also work with reals if ucent would be available;
 without ucent I need a workaround for real or fall back to BigInt).
 
 I only implemented f qualifier yet, but it shouldn't be difficult to
 add e and g qualifiers and the uppercase versions. Also some work
 needs to be done, to implement th
 I think, this will not be very difficult. Unfortunately I'll be busy
 with some other (non-D) stuff for some time. I'll probably continue
 work on this someday in november.
If you haven't already, please read: https://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html especially the papers linked in the first paragraph. Formatting floating-point numbers is not a trivial task. It's easy to write up something that works for common cases, but it's not so easy to get something that gives the best results in *all* cases. You probably should use the algorithms referenced above for your implementation, instead of coming up with your own that may have unexpected corner cases that don't produce the right output. T -- Valentine's Day: an occasion for florists to reach into the wallets of nominal lovers in dire need of being reminded to profess their hypothetical love for their long-forgotten.
Oct 30
parent reply berni44 <dlang d-ecke.de> writes:
On Wednesday, 30 October 2019 at 17:41:26 UTC, H. S. Teoh wrote:
 If you haven't already, please read:

 	https://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html

 especially the papers linked in the first paragraph.
Thanks for that link. I haven't had a look at the grisu algorithms yet. But I'll definitely do that.
 Formatting floating-point numbers is not a trivial task. It's 
 easy to write up something that works for common cases, but 
 it's not so easy to get something to gives the best results in 
 *all* cases.
I know that this is something we all wish. Anyway, my goal is set somewhat lower: I'd like to replace the existing call to snprintf with something that is programmed in D and which should be pure, safe and ctfeable. And ideally it should not be slower than snprintf.
 You probably should use the algorithms referenced above for 
 your implementation,
I read through the paper for the ryu algorithm and rejected it (at least for me; if someone else is going to implement it and file a PR that's fine). My reason for rejecting it is that the algorithm does not have exactly the same goal as printf, which IMHO means that it cannot be used here; and that it needs a lookup table that is too large (300K for 128bit reals). From what I read in the ryu paper about the grisu algorithms, I fear a little bit that they have the first of the above-mentioned problems too. But I can't tell for sure yet.
 instead of coming up with your own that may have
 unexpected corner cases that don't produce the right output.
Obviously I need to prove somehow that the algorithm is correct. While this can be done for floats by running it on all numbers and comparing these results with the result of snprintf (or the result calculated by bc), for doubles and reals this isn't possible anymore (a random sample can be tested anyway, but that's no proof). Anyway, I think that the proof isn't hard to give. The current algorithm is short and straightforward. (And: when I implement one of the mentioned algorithms, it can still contain bugs, because I made a mistake somewhere.)
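
A sketch of what such an exhaustive (or sampled) comparison for floats could look like. The names floatFromBits and countMismatches are hypothetical, and the two formatter arguments stand in for "snprintf" and "the new routine"; in the unittest both sides are std.format, so it only demonstrates the harness, not a real comparison.

```d
import std.format : format;
import std.math : isFinite;

/// Reinterprets a 32-bit pattern as a float.
float floatFromBits(uint bits) @trusted pure nothrow @nogc
{
    return *cast(float*) &bits;
}

/// Counts the visited finite floats for which the two formatters disagree.
/// `stride` = 1 visits every bit pattern; larger values take a sample.
size_t countMismatches(string function(float) reference,
                       string function(float) candidate,
                       uint stride)
{
    assert(stride > 0);
    size_t mismatches;
    // A 64-bit counter avoids wrapping around at uint.max.
    for (ulong bits = 0; bits <= uint.max; bits += stride)
    {
        immutable f = floatFromBits(cast(uint) bits);
        if (!isFinite(f))
            continue; // skip infinities and NaNs
        if (reference(f) != candidate(f))
            ++mismatches;
    }
    return mismatches;
}

unittest
{
    // Placeholder formatters: identical on both sides, so no mismatches.
    static string viaFormat(float f) { return format("%.2f", f); }
    assert(countMismatches(&viaFormat, &viaFormat, 10_000_019) == 0);
}
```

With stride 1 this walks all 2^32 patterns, which is feasible for float but, as said above, not for double or real.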
Oct 30
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Oct 30, 2019 at 07:11:14PM +0000, berni44 via Digitalmars-d wrote:
 On Wednesday, 30 October 2019 at 17:41:26 UTC, H. S. Teoh wrote:
[...]
 Formatting floating-point numbers is not a trivial task. It's easy
 to write up something that works for common cases, but it's not so
 easy to get something that gives the best results in *all* cases.
I know that this is something we all wish. Anyway, my goal is set somewhat lower: I'd like to replace the existing call to snprintf with something that is programmed in D and which should be pure, safe and ctfeable. And ideally it should not be slower than snprintf.
Yeah, I've been waiting for a long time for a pure, safe, and CTFE-able floating point formatter in D. What Rumbu said about rounding mode, though, makes me fear that pure may not be attainable if we're going to be IEEE-compliant (since accessing the current rounding mode would be technically impure). Then again, the CTFE-able version can probably be made pure, since CTFE cannot change rounding mode in the compiler's runtime environment, so if we detect CTFE then we can just assume the default rounding mode.

	import std.math : FloatingPointControl;

	auto formatFloat(F)(F f)
	{
		FloatingPointControl fc;
		if (__ctfe)
			return formatFloatImpl(f, fc.roundToNearest); // pure
		else
			return formatFloatImpl(f, fc.rounding); // impure
	}

should do it.
 You probably should use the algorithms referenced above for your
 implementation,
I read through the paper for the ryu algorithm and rejected it (at least for me; if someone else is going to implement it and file a PR that's fine). My reason for rejecting it is that the algorithm does not have exactly the same goal as printf, which IMHO means that it cannot be used here; and that it needs a lookup table that is too large (300K for 128bit reals).
Why is it too large? Couldn't you generate the table with CTFE? :-D Or statically generate it and then import it, like std.uni does with the various Unicode tables (see std.internal.unicode_*). [...]
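
To illustrate the CTFE suggestion with something much smaller than the ryu tables (which hold wide multipliers, not plain powers of ten), here is a sketch that builds a lookup table at compile time. The function and table names are made up for this example.

```d
/// Builds the powers of ten that fit into a ulong: 10^0 .. 10^19.
ulong[20] makePowersOfTen() pure nothrow @nogc @safe
{
    ulong[20] table;
    ulong p = 1;
    foreach (i, ref entry; table)
    {
        entry = p;
        if (i + 1 < table.length)
            p *= 10; // skip the last multiply, which would overflow
    }
    return table;
}

// Initialising a static immutable forces evaluation at compile time (CTFE),
// so the table ends up in the binary without any hand-written literals.
static immutable ulong[20] powersOfTen = makePowersOfTen();

static assert(powersOfTen[0] == 1);
static assert(powersOfTen[19] == 10_000_000_000_000_000_000UL);
```

Whether generating a table of several hundred kilobytes this way is acceptable for compile times and binary size is a separate question that this sketch does not answer.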
 Obviously I need to prove, that the algorithm is correct somehow.
 While this can be done for floats by running it on all numbers and
 comparing these results with the result of snprintf (or the result
 calculated by bc), for doubles and reals, this isn't possible anymore
 (a random sample can be tested anyway, but that's no proof).
Are we just copying whatever snprintf does? Is snprintf really a reliable standard to go by?
 Anyway, I think, that the proof isn't hard to give. The current
 algorithm is short and straightforward. (And: When I implement one of
 the mentioned algorithms, it can still contain bugs, because I made a
 mistake somewhere.)
You don't necessarily have to implement grisu, et al, verbatim, but your algorithm should at least gracefully handle the special cases and potentially problematic cases cited in the papers. T -- Too many people have open minds but closed eyes.
Oct 30
prev sibling next sibling parent reply Rumbu <rumbu rumbu.ro> writes:
On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to 
 snprintf in std.format with an own method written in D. During 
 the last days I took a deeper look into this and meanwhile I've 
 got a function that works for floats (and probably also 
 doubles, but I haven't tested that yet and it should also work 
 with reals if ucent would be available; without ucent I need a 
 workaround for real or fall back to BigInt).

 [...]
According to IEEE 754-2008: "5.12.2 External decimal character sequences representing finite numbers [...] For binary formats, all conversions of H significant digits or fewer round correctly according to the applicable rounding direction;" where H is 9 for single and 17 for double. IEEE 754 doesn't specify an H for reals. That means that snprintf must use the current rounding mode that can be read using FloatingPointControl.rounding from std.math.
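
The figures 9 and 17 coincide with the number of significant decimal digits that is sufficient to round-trip a binary format with a p-bit significand, 1 + ceil(p * log10(2)). A small sketch of that arithmetic follows; roundTripDigits is a made-up name, and the formula is quoted from memory of the same section of the standard, so treat it as an assumption. For a 64-bit significand the same formula gives 21.

```d
import std.math : ceil, log10;

/// Significant decimal digits sufficient to round-trip a binary format
/// with a `p`-bit significand: 1 + ceil(p * log10(2)).
int roundTripDigits(int p)
{
    return 1 + cast(int) ceil(p * log10(2.0));
}

unittest
{
    assert(roundTripDigits(float.mant_dig) == 9);   // p = 24
    assert(roundTripDigits(double.mant_dig) == 17); // p = 53
}
```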
Oct 30
parent reply berni44 <dlang d-ecke.de> writes:
On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
 That means that snprintf must use the current rounding mode 
 that can be read using FloatingPointControl.rounding from 
 std.math.
Is it really a "must"? We are not completely bound by the IEEE standard and, if good reasons are available, might reject it. For example, comparing two floats with <= produces either "false" or "true" in D. According to IEEE there should be a third result possible, namely "not comparable". Having said this, it would be possible to implement it the way you claim, but probably at some cost (=slower, more and less easily readable lines of code). I'll think about it.
Oct 30
next sibling parent reply Rumbu <rumbu rumbu.ro> writes:
On Wednesday, 30 October 2019 at 19:28:44 UTC, berni44 wrote:
 On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
 That means that snprintf must use the current rounding mode 
 that can be read using FloatingPointControl.rounding from 
 std.math.
Is it really a "must"? We are not completely bound by the IEEE standard and, if good reasons are available, might reject it. For example, comparing two floats with <= produces either "false" or "true" in D. According to IEEE there should be a third result possible, namely "not comparable". Having said this, it would be possible to implement it the way you claim, but probably at some cost (=slower, more and less easily readable lines of code). I'll think about it.
I don't know the internals of your code, but I suppose that before "printing" you end up with an integer value and a base-ten exponent. In this case rounding becomes a question of how you interpret the remainder of a division by a power of ten. Because I spent a lot of time figuring out how to format decimal numbers correctly, here is a piece of code I use to format decimal numbers depending on the rounding mode, fully compliant with the standard. https://github.com/rumbu13/decimal/blob/a6bae32d75d56be16e82d37af0c8e4a7c08e318a/src/decimal/decimal.d#L7296 Hope this helps.
Oct 30
parent berni44 <dlang d-ecke.de> writes:
On Wednesday, 30 October 2019 at 20:29:34 UTC, Rumbu wrote:
 In this case rounding becomes a question of how do you 
 interpret the remainder of a division by a power of ten.
Unfortunately not. Think of 0.1500000000000001 rounded to one digit. It's clear that a remainder of 0-4 is rounded down and one of 6-9 is rounded up. But to decide in the case of a 5 you might need to look at the next digits if the rounding mode tells you to round down in the case of 0.5...
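
The point can be sketched as a decision function that needs, besides the first dropped digit, a flag saying whether anything non-zero follows it. This is only an illustration working on magnitudes with a hypothetical enum of three modes; it is neither the code in my branch nor Rumbu's.

```d
/// Tie-breaking rules considered in this sketch (illustrative subset).
enum Rounding { toNearestTiesEven, toNearestTiesAway, towardZero }

/// Decides whether the kept digits must be incremented.
/// firstDropped: first discarded decimal digit (0 .. 9)
/// restNonZero:  true if any later discarded digit is non-zero
/// lastKeptOdd:  true if the last kept digit is odd
bool roundsUp(Rounding mode, int firstDropped, bool restNonZero, bool lastKeptOdd)
    pure nothrow @nogc @safe
{
    final switch (mode)
    {
        case Rounding.towardZero:
            return false;
        case Rounding.toNearestTiesAway:
            return firstDropped >= 5;
        case Rounding.toNearestTiesEven:
            if (firstDropped != 5)
                return firstDropped > 5;
            // A 5 is only a tie if nothing non-zero follows it.
            return restNonZero || lastKeptOdd;
    }
}

@safe unittest
{
    // 0.125 to two digits: exact tie, last kept digit 2 is even, stays 0.12.
    assert(!roundsUp(Rounding.toNearestTiesEven, 5, false, false));
    // 0.1500000000000001 to one digit: not a tie, goes up to 0.2.
    assert(roundsUp(Rounding.toNearestTiesEven, 5, true, true));
    // Truncation never increments the magnitude.
    assert(!roundsUp(Rounding.towardZero, 9, true, true));
}
```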
Oct 31
prev sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Oct 30, 2019 at 07:28:44PM +0000, berni44 via Digitalmars-d wrote:
 On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
 That means that snprintf must use the current rounding mode that can
 be read using FloatingPointControl.rounding from std.math.
Is it really a "must"? We are not completely bound by the IEEE standard and, if good reasons are available, might reject it. For example, comparing two floats with <= produces either "false" or "true" in D. According to IEEE there should be a third result possible, namely "not comparable".
For non-comparable floats x and y (i.e., at least one is a NaN), D has the semantics:

	x < y	false
	x <= y	false
	x >= y	false
	x > y	false
	x == y	false
	x != y	true

D used to have other comparison operators that handle various NaN-related subtleties (the so-called "spaceship operators" because of their alien appearance), but they were deprecated because nobody understood them so nobody used them. Having said that, though, I think we should try to conform to IEEE as much as possible, and there better be very good reasons when we don't.
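
The comparison semantics listed above can be checked directly; this is a minimal sketch using only built-in operators and std.math.isNaN.

```d
import std.math : isNaN;

@safe unittest
{
    double x = double.nan;
    double y = 1.0;

    // Every ordered comparison involving a NaN is false ...
    assert(!(x < y));
    assert(!(x <= y));
    assert(!(x >= y));
    assert(!(x > y));
    assert(!(x == y));
    // ... and only != is true.
    assert(x != y);

    // A NaN is unordered even with respect to itself, so use isNaN to test for it.
    assert(x != x);
    assert(isNaN(x));
}
```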
 Having said this, it would be possible to implement it the way you
 claim, but probably at some cost (=slower, more and less easily readable
 lines of code). I'll think about it.
As I have said, floating-point formatting is far from the trivial affair that it appears to be on the surface. It's not something to be undertaken lightly, because it's full of complicated corner cases that must be handled correctly. T -- Never wrestle a pig. You both get covered in mud, and the pig likes it.
Oct 30
prev sibling next sibling parent reply Stefan Koch <uplink.coder googlemail.com> writes:
On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to 
 snprintf in std.format with an own method written in D. During 
 the last days I took a deeper look into this and meanwhile I've 
 got a function that works for floats
If you could post that so I can have a look over the WIP that'd be nice. Grisu2 also uses lookup tables, though for 52bit mantissa floats it's completely fine.
Oct 30
parent berni44 <dlang d-ecke.de> writes:
On Wednesday, 30 October 2019 at 20:46:07 UTC, Stefan Koch wrote:
 If you could post that so I can have a look over the WIP that'd 
 be nice.
See https://github.com/berni44/phobos/tree/printf The function can be found at the end of std/format.d. I had to comment out some unittests, because the e and g qualifiers are not yet supported. I put several comments in the code, so I hope it's clear what happens at each step. If not, feel free to ask. (I'll be offline during the weekend.)

I also added a diagram for speed comparison. See https://github.com/berni44/phobos/blob/printf/diagram.png Blue and green use "%.10f" while black and red use "%.100f". Blue and red are my function, while green and black are snprintf. The x-axis gives the value of the exponent from 0 to 255, the y-axis gives the average time in nanoseconds. The green bottom line at the left is at approx. 600ns. For each exponent approx. 217886 numbers have been checked (the same set for both functions). As you can see, at the left side snprintf is faster, having an almost constant time, while the time of mine slightly increases as exponents get smaller. I scanned the snprintf implementation to find out what they do; see my comment in the implementation for details.
Oct 31
prev sibling next sibling parent reply Walter Bright <newshound2 digitalmars.com> writes:
Replacing snprintf for floating point is very challenging, because:

1. people have been improving snprintf for decades
2. people expect precision and performance
3. the standard is snprintf, any credible implementation must be the same or better

To that end, you'll need to be familiar with the following:

754-2019 IEEE Standard for Floating Point Arithmetic
https://ieeexplore.ieee.org/document/8766229

Printing Floating-Point Numbers Quickly and Accurately with Integers
https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf

Printing Floating-Point Numbers
https://ranjitjhala.github.io/static/fp-printing-popl16.pdf

Ryu Fast Float To String Conversion
https://dl.acm.org/citation.cfm?id=3192369

https://github.com/ulfjack/ryu
http://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html
https://news.ycombinator.com/item?id=20181832

Jonathan Marler's D implementation of ryu:
https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

For historical interest, here's DMC's version, which was state of the art in
the 1980's:

https://github.com/DigitalMars/dmc/blob/master/src/core/floatcvt.c
Oct 30
next sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
 Replacing snprintf for floating point is very challenging, 
 because:

 1. people have been improving snprintf for decades
 2. people expect precision and performance
 3. the standard is snprintf, any credible implementation must 
 be the same or better
Moreover, actual printf implementations seem to depend upon the locale. This creates bugs (say "1,4" instead of "1.4"), so whether to replicate this behaviour depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.
Oct 31
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via Digitalmars-d
wrote:
 On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
 Replacing snprintf for floating point is very challenging, because:
 
 1. people have been improving snprintf for decades
 2. people expect precision and performance
 3. the standard is snprintf, any credible implementation must be the
 same or better
Moreover, actual printf implementations seem to depend upon the locale. This creates bugs (say "1,4" instead of "1.4"), so whether to replicate this behaviour depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.
*Is* it a bug, though? Arguably, the reason snprintf was done that way was precisely to support properly-formatted output in the current locale. I.e., when outputting Russian text, the convention is to write the decimal point with "," rather than ".". It would be considered wrong or strange to write "1.4" instead of "1,4". This is important if you want to support i18n in your program. But if you're outputting to, say, JSON, then you *don't* ever want "1,4", you only want "1.4". Which leads me to think that these two should be separate format specifiers. Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure. T -- I'm still trying to find a pun for "punishment"...
Oct 31
parent reply Jacob Carlborg <doob me.com> writes:
On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:

 Which leads me to think that these two should be separate 
 format specifiers.
I would put the localization in a completely different function.
 Unfortunately, I can see how this would force format() to be 
 impure, because to support checking the current locale implies 
 accessing global state, which is impure.
You could pass in the locale to the function, then it can be pure. Even more reason to have it as a separate function. I would say that should be best practice because you might want to run a program in a different locale than the globally configured one. I'm not sure if it's enough to look at the locale. On my computer (a Mac) I have configured it to have the language in English but the date, time, number and currency format to Swedish. -- /Jacob Carlborg
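
A minimal sketch of the "pass the locale in" idea. NumberStyle and localize are hypothetical names, not Phobos APIs, and a real locale description would need more than a decimal separator (grouping, digits, and so on). The point is only that an explicit parameter keeps the function pure, since no global state is read.

```d
/// Hypothetical description of a number style; not a Phobos type.
struct NumberStyle
{
    char decimalSeparator = '.';
}

/// Rewrites the decimal point of an already formatted, locale-neutral number.
/// The style is an explicit argument, so no global state is consulted.
string localize(string formatted, NumberStyle style) pure @safe
{
    char[] result = formatted.dup;
    foreach (ref c; result)
        if (c == '.')
            c = style.decimalSeparator;
    return result.idup;
}

@safe unittest
{
    assert(localize("1.4", NumberStyle(',')) == "1,4"); // e.g. Swedish or Russian style
    assert(localize("1.4", NumberStyle.init) == "1.4"); // neutral output, e.g. for JSON
}
```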
Nov 01
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Fri, Nov 01, 2019 at 01:01:21PM +0000, Jacob Carlborg via Digitalmars-d
wrote:
 On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
 
 Which leads me to think that these two should be separate format
 specifiers.
I would put the localization in a completely different function.
That would be a better solution. It would be different from snprintf, though, and we'd have to document it well so that people can find it.
 Unfortunately, I can see how this would force format() to be impure,
 because to support checking the current locale implies accessing
 global state, which is impure.
You could pass in the locale to the function, then it can be pure. Even more reason to have it as a separate function. I would say that should be best practice because you might want to run a program in a different locale than the globally configured one.
+1.
 I'm not sure if it's enough to look at the locale. On my computer (a
 Mac) I have configured it to have the language in English but the
 date, time, number and currency format to Swedish.
[...] I think it has to do with the LC_* environment variables, at least on a *nix system. You can set LC_ALL to get the same settings across all categories, or you can separately set one or more of the LC_* to get different settings in each category. (Caveat: I've never actually done this myself before, so I could be misunderstanding how it works.) T -- Famous last words: I wonder what will happen if I do *this*...
Nov 01
next sibling parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 1 November 2019 at 17:02:53 UTC, H. S. Teoh wrote:
 I think it has to do with the LC_* environment variables, at 
 least on a *nix system. You can set LC_ALL to get the same 
 settings across all categories, or you can separately set one 
 or more of the LC_* to get different settings in each category. 
 (Caveat: I've never actually done this myself before, so I 
 could be misunderstanding how it works.)
Yeah, POSIX, so POSIX-compliant C compilers should support it... https://docs.oracle.com/cd/E19253-01/817-2521/overview-39/index.html Other languages do not have to follow it, of course.
Nov 01
prev sibling next sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2019-11-01 18:02, H. S. Teoh wrote:

 That would be a better solution. It would be different from snprintf,
 though, and we'd have to document it well so that people can find it.
It depends on what the goal is. If it is to have a 100% compatible drop-in replacement for snprintf then we need to include the localization. But if the goal is just to have a function that converts values to a string, which is implemented in D, then we have the opportunity to make a better interface. -- /Jacob Carlborg
Nov 02
next sibling parent berni44 <dlang d-ecke.de> writes:
On Saturday, 2 November 2019 at 16:59:15 UTC, Jacob Carlborg 
wrote:
 On 2019-11-01 18:02, H. S. Teoh wrote:

 That would be a better solution. It would be different from 
 snprintf,
 though, and we'd have to document it well so that people can 
 find it.
It depends on what the goal is. If it is to have a 100% compatible drop-in replacement for snprintf then we need to include the localization. But if the goal is just to have a function that converts values to a string, which is implemented in D, then we have the opportunity to make a better interface.
+1 That's actually what I ask myself all the time. I personally prefer the second approach. And a similar question arises with the rounding problem, which is even a little bit more difficult, because the IEEE standard interferes here too.
Nov 04
prev sibling parent Guillaume Piolat <first.last gmail.com> writes:
On Saturday, 2 November 2019 at 16:59:15 UTC, Jacob Carlborg 
wrote:
 It depends on what the goal is. If it is to have 100% 
 compatible drop-in replacement to snprintf then we need to 
 include the localization.

 But if the goal is just to have a function that converts values 
 to a string, which is implemented in D, then have the 
 opportunity to make a better interface.
+1, this is important since we've had a localization bug, and I suspect it's very easy to have such bugs. Warning: `format` is affected too! (perhaps only when using the %f format specifier?) https://github.com/AuburnSounds/printed/issues/22 ugly fix: https://github.com/AuburnSounds/printed/commit/797343c0fc213ea34aa5b79b61cdc1164ae189df There is a non-zero chance that people _are_ relying on `format` and `snprintf` being localization-aware. So a "drop-in" replacement needs to fix this mess by either being bug-compatible or not being drop-in.
Nov 04
prev sibling parent Uknown <sireeshkodali1 gmail.com> writes:
On Friday, 1 November 2019 at 17:02:53 UTC, H. S. Teoh wrote:
 On Fri, Nov 01, 2019 at 01:01:21PM +0000, Jacob Carlborg via 
 Digitalmars-d wrote:
 On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
 [...]
https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe This is kind of relevant to the whole issue with locales, and how they simply don't work as they should. Probably best to not replicate at all, and instead just pass in the necessary localisation to the format function, if necessary.
Nov 02
prev sibling next sibling parent Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:
On Thursday, 31 October 2019 09:58:08 MDT H. S. Teoh via Digitalmars-d 
wrote:
 On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via 
Digitalmars-d wrote:
 On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
 Replacing snprintf for floating point is very challenging, because:

 1. people have been improving snprintf for decades
 2. people expect precision and performance
 3. the standard is snprintf, any credible implementation must be the
 same or better
Moreover, actual printf implementations seems to depend upon the locale. This creates bugs (say "1,4" instead of "1.4") so this behaviour depends if you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.
*Is* it a bug, though? Arguably, the reason snprintf was done that way was precisely to support properly-formatted output in the current locale. I.e., when outputting Russian text, the convention is to write the decimal point with "," rather than ".". It would be considered wrong or strange to write "1.4" instead of "1,4". This is important if you want to support i18n in your program. But if you're outputting to, say, JSON, then you *don't* ever want "1,4", you only want "1.4". Which leads me to think that these two should be separate format specifiers. Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.
The version of format that takes the format specifier as a compile-time argument shouldn't have that problem, but the one that took it as a runtime argument certainly would. - Jonathan M Davis
Oct 31
prev sibling parent reply berni44 <dlang d-ecke.de> writes:
On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat 
wrote:
 Moreover, actual printf implementations seems to depend upon 
 the locale. This creates bugs (say "1,4" instead of "1.4") so 
 this behaviour depends if you want to be bug-compatible. We've 
 been hit by that in `printed` when used with a Russian locale.
Meanwhile, my implementation for the f (and F) qualifier is (almost) finished. The locale stuff is still missing, though, and I have not managed to implement it. Maybe someone can help me:

a) I need to create some tests. As far as I know, I have to execute "export LANG=de_DE.UTF-8" (in bash, Debian) to make it use the German locale, which should replace the dot with a comma. Unfortunately writefln!"%.10f"(0.1) still writes a dot instead of the expected ",". Instead of "LANG" I tried several other variables, like LC_ALL or LC_NUMERIC. Any idea what I'm doing wrong here?

b) How do I query the current locale from D? Actually I only need the number separator in the current locale as a dchar. I found core.stdc.locale but do not know how to use it.
Nov 06
next sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
 On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat 
 wrote:
 [...]
Meanwhile, my implementation for the f (and F) qualifier is (almost) finished. Yet, the locale-stuff is missing and I do not manage to implement it. Maybe someone can help me: a) I need to create some test. As far as I know, I've to execute "export LANG=de_DE.UTF-8" (in bash, debian) to make it use the german locale, which should replace the dot by a comma. Unfortunately writefln!"%.10f"(0.1) still writes a dot instead of the expected ",". Instead of "LANG" I tried several other stuff, like LC_ALL or LC_NUMERIC. Any idea what I do wrong here? b) How to query the current locale from D? Actually I only need the number-separator in the current locale as a dchar. I found core.stdc.locale but do not know how to use it.
I think the best way to go is to make it locale-independent and simply provide a way for user to specify the decimal separator (and other related locale details, if any).
Nov 06
next sibling parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Nov 06, 2019 at 04:17:32PM +0000, Petar via Digitalmars-d wrote:
 On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
[...]
 b) How to query the current locale from D? Actually I only need the
 number-separator in the current locale as a dchar. I found
 core.stdc.locale but do not know how to use it.
I think the best way to go is to make it locale-independent and simply provide a way for user to specify the decimal separator (and other related locale details, if any).
Yes, I think in the long run this will be the more viable approach. Depending on locale as a global state is problematic because it forces formatting to be impure, and also forces users to implement hacks when they need to temporarily change the locale. E.g., in a system like snprintf, if you need to format German text with snippets of English quotations, you will have to temporarily override LC_* somehow in order to print a number with two different separators, or hack it with string postprocessing, etc..

It's better to let the user pass in the desired separator as a parameter -- the ',' flag in std.format already does this via the optional '?' modifier, for example:

	writefln("%,?d", '_', 12345678); // 12_345_678
	writefln("%,?d", '|', 12345678); // 12|345|678

Conceivably one could extend the '.' flag with a '?' modifier as well, so something like this:

	writefln("%.2?d", ',', 3.141592); // 3,14
	writefln("%.2?d", '_', 3.141592); // 3_14
	writefln("%.2?d", ':', 3.141592); // 3:14

Then programs that want to support locales can just do this:

	writefln("%.2?d", curLocale.separator, 3.141592);

T

-- 
I don't trust computers, I've spent too long programming to think that they can get anything right. -- James Miller
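As a small check of the existing behaviour mentioned above, the following sketch exercises only the ',' flag with the '?' modifier, which std.format already supports. The proposed '?' extension of the '.' flag is hypothetical and is deliberately not used here.

```d
import std.format : format;

void main()
{
    // ',' groups digits in threes; with '?' the separator character
    // is taken from the argument list instead of defaulting to ','.
    assert(format("%,?d", '_', 12345678) == "12_345_678");
    assert(format("%,?d", '|', 12345678) == "12|345|678");

    // Without '?', the default separator ',' is used.
    assert(format("%,d", 12345678) == "12,345,678");
}
```

Because the separator is an ordinary argument, no global state is consulted, which is what keeps this style of formatting compatible with pure code.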
Nov 06
next sibling parent lithium iodate <whatdoiknow doesntexist.net> writes:
On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
 Yes, I think in the long run this will be the more viable 
 approach. Depending on locale as a global state is problematic 
 because it forces formatting to be impure, and also forces 
 users to implement hacks when they need to temporarily change 
 the locale. E.g., in a system like snprintf, if you need to 
 format German text with snippets of English quotations, you 
 will have to temporarily override LC_* somehow in order to 
 print a number with two different separators, or hack it with 
 string postprocessing, etc..
All while setlocale doesn't even provide any sort of thread-safety!
Nov 06
prev sibling next sibling parent reply Rumbu <rumbu rumbu.ro> writes:
On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:

 Then programs that want to support locales can just do this:

 	writefln("%.2?d", curLocale.separator, 3.141592);
For %f, the decimal separator is not the only locale specific info. Full list: -decimal separator -negative pattern -positive pattern -infinity symbol -nan symbol -digit shapes, especially for Arabic and Thai For %d and %g there are more like digit grouping/group separator.
Nov 06
next sibling parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Nov 06, 2019 at 06:21:43PM +0000, Rumbu via Digitalmars-d wrote:
 On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
 
 Then programs that want to support locales can just do this:
 
 	writefln("%.2?d", curLocale.separator, 3.141592);
[...]
 For %f, the decimal separator is not the only locale specific info.
 Full list:
 
 -decimal separator
 -negative pattern
 -positive pattern
 -infinity symbol
 -nan symbol
 -digit shapes, especially for Arabic and Thai
 
 For %d and %g there are more like digit grouping/group separator.
[...]

Haha, wonderful. Don't you just love it when i18n consistently throws a monkey wrench into any simplistic scheme? Almost makes me want to suggest that we need std.i18n before we can implement anything sane i18n-wise.

But since that's not gonna happen in the foreseeable future, and I'm sick and tired of the trend around these parts of letting the perfect be the enemy of the good, I'm going to propose that we just forget about i18n and just implement formatting for an English-specific locale. If users *really* want to support locales, just use %s with a wrapper struct with a toString method that does whatever it takes to get the right output. I've used this pattern for various problems with formatting complex objects, and it works fairly well:

	struct i18nFmt
	{
		float f; // or double, real, whatever
		int precision;
		... // any other params here, like decimal point format, etc.

		void toString(S)(S sink)
			if (isOutputRange!(S, char))
		{
			... // do whatever you need to do here to
			    // produce the right output
		}
	}

	...
	float myData = ...;

	// just use %s instead of some incomprehensible over-engineered
	// crap like %1:3,$13&.*^_7?f
	output = format("%s", myData.i18nFmt);

	// or:
	output2 = format("%s", myData.i18nFmt(curLocale.precision, ... /* whatever else */));

This way you lift the complexity out of std.format where it really doesn't belong, and make it possible to plug in different locale handling modules in its place. This even opens the door for a future std.i18n that simply exports a bunch of these locale-dependent proxy formatters that you could just append to your data items. Much more extensible and flexible than trying to shoehorn everything into std.format, which will inevitably turn it into a nasty hairball of intractable dependencies that's impossible to make pure, nothrow, etc.. (Oh wait, it's already such a hairball. :-D Let's not make it worse!)

And it makes std.format more pay-as-you-go; if you never need to use std.i18n it won't pull it in as a dependency just because it needs to support an obscure format specifier that you don't actually use.

T

-- 
Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it. -- G. Steele
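The wrapper-struct pattern described above could look roughly like the following. This is only a minimal sketch: the names `LocaleFmt`, `value`, `precision` and `decimalPoint` are made up for illustration, only the decimal separator is handled, and it assumes that `%f` itself emits '.' (true in the default "C" locale).

```d
import std.format : format;

/// Hypothetical proxy: carries a value plus the formatting choices
/// that std.format itself should not need to know about.
struct LocaleFmt
{
    double value;
    int precision = 2;
    char decimalPoint = '.';

    /// Called by std.format when the struct is formatted with %s.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        // Format with a fixed number of digits, then swap the separator.
        auto digits = format("%.*f", precision, value).dup;
        foreach (ref c; digits)
            if (c == '.')
                c = decimalPoint;
        sink(digits);
    }
}

void main()
{
    double myData = 3.141592;
    assert(format("%s", LocaleFmt(myData, 2, ',')) == "3,14");
    assert(format("%s", LocaleFmt(1234.5, 1, ',')) == "1234,5");
}
```

A real locale module would replace the single `decimalPoint` field with whatever it needs (digit grouping, sign patterns, digit shapes); the call site stays a plain `%s`.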
Nov 06
prev sibling parent berni44 <dlang d-ecke.de> writes:
On Wednesday, 6 November 2019 at 18:21:43 UTC, Rumbu wrote:
 For %f, the decimal separator is not the only locale specific 
 info. Full list:

 -decimal separator
 -negative pattern
 -positive pattern
 -infinity symbol
 -nan symbol
 -digit shapes, especially for Arabic and Thai


 For %d and %g there are more like digit grouping/group 
 separator.
snprintf only uses the decimal separator (and grouping but that's not used inside format, the grouping is done separately there). All else is ignored by snprintf.
Nov 07
prev sibling parent berni44 <dlang d-ecke.de> writes:
On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
 Yes, I think in the long run this will be the more viable 
 approach. Depending on locale as a global state is problematic 
 because it forces formatting to be impure, and also forces 
 users to implement hacks when they need to temporarily change 
 the locale. E.g., in a system like snprintf, if you need to 
 format German text with snippets of English quotations, you 
 will have to temporarily override LC_* somehow in order to 
 print a number with two different separators, or hack it with 
 string postprocessing, etc..
My current approach is a pure and safe function that does the formatting but ignores the locale completely. This function is called from formatValueImpl and could be modified there, if desired. Currently (I want to make small steps), the function can only be used for the f (and F) specifier (and only for float and double). For all other specifiers/types snprintf is still called. That might result in different behaviour depending on the specifier and the type. I'd prefer to make it behave identically. Having said this, I completely agree that it would be better if format ignores the locale and lets the user handle it in a wrapper, if desired.
Nov 07
prev sibling parent reply Jacob Carlborg <doob me.com> writes:
On 2019-11-06 17:17, Petar Kirov [ZombineDev] wrote:

 I think the best way to go is to make it locale-independent and simply 
 provide a way for user to specify the decimal separator (and other 
 related locale details, if any).
In my experience, I think it's best to leave the locale support to a separate API. The "snprintf" API is never going to be flexible enough. No one is using "snprintf" for serious localization. It's not just the decimal point that needs to be localized. There are various other number related things that need localization. Just have a look at the number formatter in Apple's API [1]. It's pretty big. Then they have separate formatters for currency, length, mass, interval and more. [1] https://developer.apple.com/documentation/foundation/nsnumberformatter?language=objc -- /Jacob Carlborg
Nov 06
parent "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Wed, Nov 06, 2019 at 07:43:06PM +0100, Jacob Carlborg via Digitalmars-d
wrote:
[...]
 In my experience, I think it's best to leave the locale support to a
 separate API. The "snprintf" API is never going to be flexible enough.
 No one is using "snprintf" for serious localization.
 
 It's not just the decimal point that needs to be localized. There are
 various other number related things that need localization.
 
 Just have a look at the number formatter in Apple's API [1]. It's
 pretty big. Then they have separate formatters for currency, length,
 mass, interval and more.
[...]

Yeah, after thinking about this more, I've come to the same conclusion. Just use %s for anything that depends on complex locale-dependent configuration, and wrap your data item in a proxy object that does whatever it takes to make it work.

	float myQuantity = ...;
	auto output = format("%s", myQuantity.localeFmt(...));

where localeFmt is some function or wrapper struct overloading toString that does whatever it takes to format the data in a locale-specific way.

T

-- 
"Hi." "'Lo."
Nov 06
prev sibling next sibling parent reply Andre Pany <andre s-e-a-p.de> writes:
On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
 On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat 
 wrote:
 Moreover, actual printf implementations seems to depend upon 
 the locale. This creates bugs (say "1,4" instead of "1.4") so 
 this behaviour depends if you want to be bug-compatible. We've 
 been hit by that in `printed` when used with a Russian locale.
Meanwhile, my implementation for the f (and F) qualifier is (almost) finished. Yet, the locale-stuff is missing and I do not manage to implement it. Maybe someone can help me: a) I need to create some test. As far as I know, I've to execute "export LANG=de_DE.UTF-8" (in bash, debian) to make it use the german locale, which should replace the dot by a comma. Unfortunately writefln!"%.10f"(0.1) still writes a dot instead of the expected ",". Instead of "LANG" I tried several other stuff, like LC_ALL or LC_NUMERIC. Any idea what I do wrong here? b) How to query the current locale from D? Actually I only need the number-separator in the current locale as a dchar. I found core.stdc.locale but do not know how to use it.
This question comes late, but did you consider just doing a 1-to-1 translation of snprintf from C to D? Of course the second step would be to provide an idiomatic D version with the mentioned suggestions. But having a translation would already be fantastic. Kind regards Andre
Nov 06
parent berni44 <dlang d-ecke.de> writes:
On Wednesday, 6 November 2019 at 16:54:25 UTC, Andre Pany wrote:
 This question comes late, but did you considered to just do an 
 1 to 1 translation of snprintf from C to D?
I scanned through the implementation of snprintf several times while I wrote the replacement. I think the main algorithm is quite similar, apart from some speed improvement for numbers close to zero, which turned out to be quite nasty in detail (and which I have therefore skipped for now). By the way: a 1-to-1 translation is not something I could do, because my knowledge of C is very limited and the algorithm contains lots of calls to functions that I would not know where to look up or how to replace with D functions.
Nov 07
prev sibling parent lithium iodate <whatdoiknow doesntexist.net> writes:
On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
 a) I need to create some test. As far as I know, I've to 
 execute "export LANG=de_DE.UTF-8" (in bash, debian) to make it 
 use the german locale, which should replace the dot by a comma. 
 Unfortunately writefln!"%.10f"(0.1) still writes a dot instead 
 of the expected ",". Instead of "LANG" I tried several other 
 stuff, like LC_ALL or LC_NUMERIC. Any idea what I do wrong here?
If D wishes to behave the same as C, this is correct behavior. C requires the locale "C" to be activated at program startup. The C-way to use the environment's locale is to call setlocale for the relevant category with an empty string for the locale value, e.g.

	setlocale(LC_ALL, "")
 b) How to query the current locale from D? Actually I only need 
 the number-separator in the current locale as a dchar. I found 
 core.stdc.locale but do not know how to use it.
You can query the current locale of a given category by calling setlocale with a null-pointer for the locale, it will return the currently set locale as a C-string. The formatting-information is returned by localeconv(). Not sure why the docs don't show the members of lconv, but it contains decimal_point, which is a C-string of the decimal separator.

	setlocale(LC_ALL, "de_DE.UTF-8");
	localeconv.decimal_point.fromStringz.writeln;

prints ","
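Put together, the two answers above can be sketched as a small self-contained program. The helper name `currentDecimalPoint` is made up for illustration; the sketch assumes that nothing has changed the locale before `main` runs, so the "C" locale is still active at startup, and what the second call prints depends on the environment (LANG, LC_ALL, LC_NUMERIC) and on which locales are installed.

```d
import core.stdc.locale : LC_NUMERIC, localeconv, setlocale;
import std.stdio : writeln;
import std.string : fromStringz;

/// Decimal separator of the currently active C locale as a D string.
string currentDecimalPoint()
{
    // localeconv() points at static storage, so copy the data out.
    return localeconv().decimal_point.fromStringz.idup;
}

void main()
{
    // A null locale argument only queries the active locale name.
    writeln(setlocale(LC_NUMERIC, null).fromStringz);

    // In the startup "C" locale the separator is a dot.
    assert(currentDecimalPoint() == ".");

    // An empty string adopts the environment's locale; setlocale
    // returns null if that locale is not available on the system.
    if (setlocale(LC_NUMERIC, "") !is null)
        writeln(currentDecimalPoint());
}
```

Note that setlocale changes process-wide state, so this is suitable for a test program but not for code that needs to stay pure or thread-safe.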
Nov 06
prev sibling parent berni44 <dlang d-ecke.de> writes:
On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
 To that end, you'll need to be familiar with the following:
Thanks for that list. I'll have a look, when I find the time to do so.
 754-2019 IEEE Standard for Floating Point Arithmetic
 https://ieeexplore.ieee.org/document/8766229
Unfortunately I cannot download this file. I've got no company listed there and I'm not willing to pay for it...
 Ryu Fast Float To String Conversion
 https://dl.acm.org/citation.cfm?id=3192369

 https://github.com/ulfjack/ryu
 [...]

 Jonathan Marler's D implementation of ryu:
 https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d
I have already read the paper about Ryu. IMHO it's of no use here, because the speed advantage comes from being more "inaccurate" than snprintf. Ryu is designed for a round trip, while snprintf prints as many digits as the user wants to get (even when they contain no more information). The same holds for the Grisu variants.
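The difference between "as many digits as requested" and a short representation can be seen with std.format alone. This is only an illustration of the two output styles, not of Ryu itself: `%s` here stands in for a short form, it is not a shortest-round-trip algorithm.

```d
import std.format : format;

void main()
{
    float x = 0.1f; // stored exactly as 0.100000001490116119384765625

    // Fixed precision: prints the requested 20 digits of the exact
    // binary value, even though a float carries far less information.
    auto fixed = format("%.20f", x);
    assert(fixed == "0.10000000149011611938");

    // Default formatting: a short form of the same value.
    assert(format("%s", x) == "0.1");
}
```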
Oct 31
prev sibling next sibling parent Robert Schadek <rschadek symmetryinvestments.com> writes:
On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to 
 snprinf in std.format with an own method written in D.
I suggested it because formatImpl!float is not pure, which in turn makes, for instance, std.json not pure, among a few others.
Oct 31
prev sibling parent berni44 <dlang d-ecke.de> writes:
On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to 
 snprinf in std.format with an own method written in D. [...]
Meanwhile I filed a first PR: https://github.com/dlang/phobos/pull/7264 - it achieves only part of a complete replacement: only the 'f' qualifier is replaced, and only for float and double. But it's a start and I want to make small steps. Many thanks to all of you who replied in this thread or gave hints elsewhere. This helped a lot. :-)
Nov 08