digitalmars.D - Replacement for snprintf
- berni44 (34/34) Oct 30 2019 In PR 7222 [1] Robert Schadek suggested replacing the call to
- Ali Çehreli (4/9) Oct 30 2019 The tie-breaker is to always round towards the even digit. So it should
- berni44 (8/10) Oct 30 2019 As far as I know that's for avoiding error propagation, when
- Jon Degenhardt (6/18) Oct 30 2019 It's reasonably common to have numeric values written out in text
- berni44 (15/20) Oct 30 2019 But IMHO this is the fault of people who do this and not the
- Ali Çehreli (4/6) Oct 30 2019 Just to make sure, you are aware of the optional '-' before '(', right?
- berni44 (11/13) Oct 30 2019 I know this. I personally think, it is somewhat ugly, but I
- Sebastiaan Koppe (2/11) Oct 30 2019 You are correct, but people will still blame the printing routine.
- drug (6/17) Oct 31 2019 I wouldn't state it is any fault. In some cases it is much more
- Walter Bright (2/6) Oct 31 2019 To get round-trip 100% accuracy, print the floats in hex using the %A fo...
- Stefan Koch (6/14) Oct 31 2019 DtoA is also supposed to have 100% accuracy, when it comes to
- H. S. Teoh (8/11) Oct 31 2019 Maybe we should be using your implementation then? No need to duplicate
- H. S. Teoh (15/28) Oct 30 2019 If you haven't already, please read:
- berni44 (26/37) Oct 30 2019 Thanks for that link. I haven't had a look into the grisu
- H. S. Teoh (30/57) Oct 30 2019 Yeah, I've been waiting for a long time for a pure, @safe, and CTFE-able
- Rumbu (12/20) Oct 30 2019 According to ieee754-2008:
- berni44 (9/12) Oct 30 2019 Is it really a "must"? We are not completely bound by the IEEE
- Rumbu (12/24) Oct 30 2019 I don't know the inners of your code, but I suppose that before
- berni44 (6/8) Oct 31 2019 Unfortunately not. Think of 0.1500000000000001 rounded to one
- H. S. Teoh (22/34) Oct 30 2019 For non-comparable floats x and y (i.e., at least one is a NaN), D has
- Stefan Koch (5/9) Oct 30 2019 If you could post that so I can have a look over the WIP that'd
- berni44 (20/22) Oct 31 2019 See https://github.com/berni44/phobos/tree/printf
- Walter Bright (21/21) Oct 30 2019 Replacing snprintf for floating point is very challenging, because:
- Guillaume Piolat (5/11) Oct 31 2019 Moreover, actual printf implementations seems to depend upon the
- H. S. Teoh (16/28) Oct 31 2019 *Is* it a bug, though? Arguably, the reason snprintf was done that way
- Jacob Carlborg (11/16) Nov 01 2019 You could pass in the locale to the function, then it can be
- H. S. Teoh (13/30) Nov 01 2019 That would be a better solution. It would be different from snprintf,
- Ola Fosheim Grøstad (4/10) Nov 01 2019 Yeah, POSIX, so POSIX-compliant C compilers should support it...
- Jacob Carlborg (8/10) Nov 02 2019 It depends on what the goal is. If it is to have 100% compatible drop-in...
- berni44 (8/19) Nov 04 2019 +1
- Guillaume Piolat (13/19) Nov 04 2019 +1 this is important since we've had localization bug and I
- Uknown (6/10) Nov 02 2019 https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5...
- Jonathan M Davis (7/32) Oct 31 2019 Digitalmars-d wrote:
- berni44 (14/18) Nov 06 2019 Meanwhile, my implementation for the f (and F) qualifier is
- Petar Kirov [ZombineDev] (4/19) Nov 06 2019 I think the best way to go is to make it locale-independent and
- H. S. Teoh (25/33) Nov 06 2019 Yes, I think in the long run this will be the more viable approach.
- lithium iodate (3/12) Nov 06 2019 All while setlocale doesn't even provide any sort of
- Rumbu (10/12) Nov 06 2019 For %f, the decimal separator is not the only locale specific
- H. S. Teoh (49/65) Nov 06 2019 [...]
- berni44 (4/14) Nov 07 2019 snprintf only uses the decimal separator (and grouping but that's
- berni44 (13/22) Nov 07 2019 My current approach is a pure and @safe function that's doing the
- Jacob Carlborg (13/16) Nov 06 2019 In my experience, I think it's best to leave the locale support to a
- H. S. Teoh (14/24) Nov 06 2019 [...]
- Andre Pany (8/26) Nov 06 2019 This question comes late, but did you consider to just do an 1
- berni44 (10/12) Nov 07 2019 I scanned through the implementation of snprintf several times
- lithium iodate (16/25) Nov 06 2019 If D wishes to behave the same as C, this is correct behavior. C
- berni44 (10/19) Oct 31 2019 Thanks for that list. I'll have a look, when I find the time to
- Robert Schadek (4/6) Oct 31 2019 I suggested it because formatImpl!float is not pure which makes
- berni44 (8/10) Nov 08 2019 Meanwhile I filed a first PR:
In PR 7222 [1] Robert Schadek suggested replacing the call to snprintf in std.format with our own method written in D. Over the last few days I took a deeper look into this, and I now have a function that works for floats (and probably also for doubles, but I haven't tested that yet; it should also work with reals if ucent were available; without ucent I need a workaround for real or have to fall back to BigInt).

So far I have only implemented the f qualifier, but it shouldn't be difficult to add the e and g qualifiers and the uppercase versions. Some other work remains as well, but again I think this will not be very difficult. Unfortunately I'll be busy with some other (non-D) stuff for some time. I'll probably continue work on this sometime in November.

I checked correctness for floats by comparing to the result of snprintf for about 1% of all numbers (I will do that for all of them before filing a PR, though). The only differences are rounding issues, when the number is exactly between two adjacent ways of displaying. The implementation of snprintf on my computer always rounds towards zero while mine rounds in the opposite direction. (E.g. 0.125 rounded to two digits is 0.13 in my implementation while it's 0.12 in snprintf's implementation.) I doubt that different implementations of printf variants are all identical in this regard.

I also compared the speed of both implementations. They are generally in the same order of magnitude (600-2800ns per number, depending on precision and number). On average my implementation is slightly faster. For numbers close to 0 the snprintf implementation is faster (I wasn't able to follow the algorithm they use), especially if the desired precision is large (I'll try to improve this, because it might become a real problem for reals). For all other numbers my current implementation wins by a more or less small margin.

[1] https://github.com/dlang/phobos/pull/7222#issuecomment-544909188
Oct 30 2019
On 10/30/2019 06:44 AM, berni44 wrote:
> The only difference are rounding issues, when the number is exactly between two adjacent ways of displaying. The implementation of snprintf on my computer always rounds towards zero while mine rounds in the opposite direction. (E.g. 0.125 rounded to two digits is 0.13 in my implementation while it's 0.12 in snprintfs implementation)

The tie-breaker is to always round towards the even digit. So it should always produce 1.12, 1.14, etc.

Ali
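The two tie-breaking rules discussed here can be compared on plain integers. The following is only a minimal sketch: `Tie` and `divRound` are hypothetical names made up for illustration, not part of Phobos or of berni44's implementation. The value 0.125 is used because it is exactly representable in binary, so it really is a tie when rounded to two decimals.

```d
/// How an exact tie (remainder is precisely half the divisor) is resolved.
enum Tie { awayFromZero, toEven }

/// Divides `value` by `divisor` (which must be non-zero) and rounds to the
/// nearest integer. Hypothetical helper for illustration only.
ulong divRound(ulong value, ulong divisor, Tie tie) pure nothrow @safe @nogc
{
    immutable q = value / divisor;
    immutable r = value % divisor;
    immutable rest = divisor - r;   // distance up to the next multiple

    if (r > rest) return q + 1;     // more than half: round up
    if (r < rest) return q;         // less than half: round down

    // Exact tie: only here do the two strategies differ.
    if (tie == Tie.toEven && q % 2 == 0)
        return q;
    return q + 1;
}

unittest
{
    // 0.125 scaled by 1000 is 125; dividing by 10 keeps two decimals.
    assert(divRound(125, 10, Tie.toEven) == 12);        // prints as 0.12
    assert(divRound(125, 10, Tie.awayFromZero) == 13);  // prints as 0.13
    // With an odd quotient both rules round up.
    assert(divRound(135, 10, Tie.toEven) == 14);
    assert(divRound(135, 10, Tie.awayFromZero) == 14);
}
```

Away from exact ties the two rules always agree, which fits the observation that the outputs differ only when the number lies exactly between two representations.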
Oct 30 2019
On Wednesday, 30 October 2019 at 15:48:44 UTC, Ali Çehreli wrote:
> The tie-breaker is to always round towards the even digit. So it should always produce 1.12, 1.14, etc.

As far as I know, that's for avoiding error propagation when intermediate results need to be rounded. If I'm not completely mistaken, Donald Knuth proved that rounding toward even avoids errors that might build up over several such steps. But here there is little chance that the result will be used for new calculations. It's most often used for printing a result that humans have to read. This is different.
Oct 30 2019
On Wednesday, 30 October 2019 at 16:04:10 UTC, berni44 wrote:
> On Wednesday, 30 October 2019 at 15:48:44 UTC, Ali Çehreli wrote:
>> The tie-breaker is to always round towards the even digit. So it should always produce 1.12, 1.14, etc.
> As far as I know that's for avoiding error propagation, when intermediate results need to be rounded. When I'm not completely mistaken, Donald Knuth prooved that rounding toward even avoids errors that might building up using several such steps. But here there is little chance, that the result will be used for new calculations. It's most often used for printing a result that humans have to read. This is different.

It's reasonably common to have numeric values written out in text format and read back in and used in subsequent computations. Not always a great idea, especially when done without much consideration for round-off errors. But it's not uncommon.

--Jon
Oct 30 2019
On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt wrote:
> It's reasonably common to have numeric values written out in text format and read back in and used in subsequent computations. Not always a great idea, especially when done without much consideration for round-off errors. But it's not uncommon.

But IMHO this is the fault of the people who do this and not the fault of a printing routine.

But: While pondering how to fix the output of format for ranges of strings (it currently places quotes around each string, which is somewhat inconsistent, because single strings are printed without quotes, and causes confusion), I came up with the idea of a new format qualifier, maybe S as in source, in addition to s, which prints the value in a way that can be used directly in D code (which is, as far as I know, the reason why the quotes are printed). That could also be used to produce a representation of a float that, when read back in, is still the same float as before; this could be done with the ryu or grisu algorithm, because these algorithms have exactly this goal.
Oct 30 2019
On 10/30/2019 12:19 PM, berni44 wrote:
> But: When pondering about how to fix the results of format for ranges of strings (it places currently quotes arround each string

Just to make sure, you are aware of the optional '-' before '(', right? "%-(%s%)" does not print the quotes.

Ali
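For readers who have not met the compound specifier, here is a small sketch of the behaviour being discussed, using only `std.format.format`. The sample data is made up; a ", " separator is added inside the specifier so the elements stay readable.

```d
import std.format : format;

unittest
{
    auto words = ["foo", "bar"];

    // A single string is printed without quotes ...
    assert(format("%s", "foo") == `foo`);
    // ... but the elements of a range of strings are quoted.
    assert(format("%s", words) == `["foo", "bar"]`);
    // The compound specifier %( ... %) quotes the elements as well,
    assert(format("%(%s, %)", words) == `"foo", "bar"`);
    // while the '-' flag switches the quoting off.
    assert(format("%-(%s, %)", words) == `foo, bar`);
}
```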
Oct 30 2019
On Wednesday, 30 October 2019 at 19:28:27 UTC, Ali Çehreli wrote:
> Just to make sure, you are aware of the optional '-' before '(', right? "%-(%s%)" does not print the quotes.

I know this. I personally think it is somewhat ugly, but I understand how it came to be like this. My rationale is more like this: Currently it probably won't be possible to change the behavior of %s, because that would be a code-breaking change. But there might be a time in the future when it's possible to make some code-breaking changes, maybe when D2 -> D2.1 or something like this. It will be much easier to make these changes at that time if there is a well-tested, simple and working alternative that users can be pointed to. Therefore it's a good idea to implement this alternative right now. Isn't it?
Oct 30 2019
On Wednesday, 30 October 2019 at 19:19:06 UTC, berni44 wrote:
> On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt wrote:
>> It's reasonably common to have numeric values written out in text format and read back in and used in subsequent computations. Not always a great idea, especially when done without much consideration for round-off errors. But it's not uncommon.
> But IMHO this is the fault of people who do this and not the fault of a printing routine.

You are correct, but people will still blame the printing routine.
Oct 30 2019
On 10/30/19 10:54 PM, Sebastiaan Koppe wrote:
> On Wednesday, 30 October 2019 at 19:19:06 UTC, berni44 wrote:
>> On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt wrote:
>>> It's reasonably common to have numeric values written out in text format and read back in and used in subsequent computations. Not always a great idea, especially when done without much consideration for round-off errors. But it's not uncommon.
>> But IMHO this is the fault of people who do this and not the fault of a printing routine.
> You are correct, but people will still blame the printing routine.

I wouldn't call it a fault at all. In some cases it is much more productive to have a text representation of data than a binary one. Initially I too believed that a binary representation was more suitable, but afterwards I was forced to use a text format, and that gave me a good result.
Oct 31 2019
On 10/31/2019 1:27 AM, drug wrote:
> In some cases it is much more productive to have text representation of data than binary ones. Initially I believed too that binary representation is the more suitable but afterwards I was forced to use text format and that gives me a good result.

To get round-trip 100% accuracy, print the floats in hex using the %A format.
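A short sketch of that round trip, under some assumptions: `throughText` is a hypothetical helper name, the text is parsed back with C's `strtod` (which accepts hexadecimal floats since C99), and the program runs in the default "C" locale. The lowercase `%a` is used here; `%A` differs only in letter case.

```d
import core.stdc.stdlib : strtod;
import std.format : format;
import std.string : toStringz;

/// Formats `x` with the given format string and parses the text back.
/// Hypothetical helper for illustration only.
double throughText(double x, string spec)
{
    string text = format(spec, x);
    return strtod(text.toStringz, null);
}

unittest
{
    double third = 1.0 / 3;

    // Hex output carries every bit of the mantissa, so nothing is lost.
    assert(throughText(third, "%a") == third);
    assert(throughText(0.1, "%a") == 0.1);

    // Three decimal digits are not enough to get the same double back.
    assert(throughText(third, "%.3f") != third);
}
```

The price is readability: hex floats are exact but not something most people want to see in a report, which is why the shortest-decimal algorithms mentioned elsewhere in the thread exist.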
Oct 31 2019
On Thursday, 31 October 2019 at 20:20:24 UTC, Walter Bright wrote:
> On 10/31/2019 1:27 AM, drug wrote:
>> In some cases it is much more productive to have text representation of data than binary ones. Initially I believed too that binary representation is the more suitable but afterwards I was forced to use text format and that gives me a good result.
> To get round-trip 100% accuracy, print the floats in hex using the %A format.

DtoA is also supposed to have 100% accuracy when it comes to value, though not necessarily to binary representation. I'd still prefer grisu2 over ryu, since it is easier to understand and I already have a ctfeable version of it. (It can't be @safe though, since it casts double* to ulong*.)
Oct 31 2019
On Thu, Oct 31, 2019 at 09:04:49PM +0000, Stefan Koch via Digitalmars-d wrote:
[...]
> I'd still prefer grisu2 over ryu, since it easier to understand and I already have a ctfeable version of it.

Maybe we should be using your implementation then? No need to duplicate work if it's already been done.

> (it can't be safe though since it casts double* to ulong*)

But surely it can be @trusted?

T

--
Your inconsistency is the only consistent thing about you! -- KD
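What the @trusted suggestion could look like in practice, as a rough sketch only: `bitsOf`, `Parts` and `decompose` are hypothetical names and are not taken from Stefan Koch's grisu2 code. The idea is to confine the pointer cast to one tiny function that is marked @trusted, so everything built on top of it can stay @safe.

```d
/// Reinterprets the bits of a double as an integer. The pointer cast is what
/// rules out @safe; the wrapper is small enough to verify by hand.
ulong bitsOf(double d) @trusted pure nothrow @nogc
{
    return *cast(ulong*) &d;
}

/// The three fields of an IEEE 754 binary64 value.
struct Parts
{
    bool negative;
    uint biasedExponent;   // 11 bits, bias 1023
    ulong fraction;        // 52 explicitly stored mantissa bits
}

/// @safe code can now take a double apart without any cast of its own.
Parts decompose(double d) @safe pure nothrow @nogc
{
    immutable bits = bitsOf(d);
    return Parts((bits >> 63) != 0,
                 cast(uint) ((bits >> 52) & 0x7FF),
                 bits & ((1UL << 52) - 1));
}

unittest
{
    assert(bitsOf(1.0) == 0x3FF0_0000_0000_0000);
    assert(decompose(1.0) == Parts(false, 1023, 0));
    assert(decompose(-0.125) == Parts(true, 1020, 0));  // -1 * 2^-3
    assert(decompose(1.5).fraction == 1UL << 51);
}
```

Whether such a cast also works during CTFE depends on the compiler, so this sketch makes no claim about that.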
Oct 31 2019
On Wed, Oct 30, 2019 at 01:44:52PM +0000, berni44 via Digitalmars-d wrote:
> In PR 7222 [1] Robert Schadek suggested replacing the call to snprinf in std.format with an own method written in D. During the last days I took a deeper look into this and meanwhile I've got a function that works for floats (and probably also doubles, but I havn't tested that yet and it should also work with reals if ucent would be available; without ucent I need a workaround for real or fall back to BigInt). I only implemented f qualifier yet, but it shouldn't be difficult to add e and g qualifiers and the uppercase versions. Also some work I think, this will not be very difficult. Unfortunately I'll be busy with some other (non-D) stuff for some time. I'll probably continue work on this someday in november.

If you haven't already, please read:

https://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html

especially the papers linked in the first paragraph.

Formatting floating-point numbers is not a trivial task. It's easy to write up something that works for common cases, but it's not so easy to get something that gives the best results in *all* cases. You probably should use the algorithms referenced above for your implementation, instead of coming up with your own that may have unexpected corner cases that don't produce the right output.

T

--
Valentine's Day: an occasion for florists to reach into the wallets of nominal lovers in dire need of being reminded to profess their hypothetical love for their long-forgotten.
Oct 30 2019
On Wednesday, 30 October 2019 at 17:41:26 UTC, H. S. Teoh wrote:
> If you haven't already, please read: https://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html especially the papers linked in the first paragraph.

Thanks for that link. I haven't had a look into the grisu algorithms yet. But I'll definitely do that.

> Formatting floating-point numbers is not a trivial task. It's easy to write up something that works for common cases, but it's not so easy to get something to gives the best results in *all* cases.

I know that this is something we all wish for. Anyway, my goal is set somewhat lower: I'd like to replace the existing call to snprintf with something that is programmed in D and which should be pure, @safe and ctfeable. And ideally it should not be slower than snprintf.

> You probably should use the algorithms referenced above for your implementation,

I read through the paper for the ryu algorithm and rejected it (at least for me; if someone else is going to implement it and file a PR, that's fine). My reasons for rejecting it are that the algorithm does not have exactly the same goal as printf, which IMHO means that it cannot be used here, and that it needs a lookup table that is too large (300K for 128-bit reals). From what I read in the ryu paper about the grisu algorithms, I fear a little that they have the first of these problems too. But I can't tell for sure yet.

> instead of coming up with your own that may have unexpected corner cases that don't produce the right output.

Obviously I need to prove somehow that the algorithm is correct. While this can be done for floats by running it on all numbers and comparing these results with the result of snprintf (or the result calculated by bc), this isn't possible anymore for doubles and reals (a random sample can be tested anyway, but that's no proof). Anyway, I think that the proof isn't hard to give. The current algorithm is short and straightforward. (And: If I implement one of the mentioned algorithms, it can still contain bugs, because I might make a mistake somewhere.)
Oct 30 2019
On Wed, Oct 30, 2019 at 07:11:14PM +0000, berni44 via Digitalmars-d wrote:
> On Wednesday, 30 October 2019 at 17:41:26 UTC, H. S. Teoh wrote:
[...]
>> Formatting floating-point numbers is not a trivial task. It's easy to write up something that works for common cases, but it's not so easy to get something to gives the best results in *all* cases.
> I know, that this is something we all wish. Anyway, my goal is set somewhat lower: I'd like to replace the existing call to snprintf with something that is programmed in D and which should be pure, @safe and ctfeable. And ideally it should not be slower then snprintf.

Yeah, I've been waiting for a long time for a pure, @safe, and CTFE-able floating point formatter in D.

What rumbu said about rounding mode, though, makes me fear that pure may not be attainable if we're going to be IEEE-compliant (since accessing the current rounding mode would be technically impure).

Then again, the CTFE-able version can probably be made pure, since CTFE cannot change rounding mode in the compiler's runtime environment, so if we detect CTFE then we can just assume the default rounding mode.

    auto formatFloat(F)(F f)
    {
        FloatingPointControl fc;
        if (__ctfe)
            return formatFloatImpl(fc.roundToNearest); // pure
        else
            return formatFloatImpl(fc.rounding); // impure
    }

should do it.

>> You probably should use the algorithms referenced above for your implementation,
> I read through the paper for the ryu algorithm and rejected it (at least for me; if someone else is goint to implement it and file a PR that's fine). My reason for rejecting is, that the algorithm has not exactly the same goal as printf, which IMHO means, that it cannot be used here; and that it needs a lookuptable, that is too large (300K for 128bit reals).

Why is it too large? Couldn't you generate the table with CTFE? :-D Or statically generate it and then import it, like std.uni does with the various Unicode tables (see std.internal.unicode_*).

[...]
> Obviously I need to prove, that the algorithm is correct somehow. While this can be done for floats by running it on all numbers and comparing these results with the result of snprintf (or the result calculated by bc), for doubles and reals, this isn't possible anymore (a random sample can be tested anyway, but that's no proof).

Are we just copying whatever snprintf does? Is snprintf really a reliable standard to go by?

> Anyway, I think, that the proof isn't hard to give. The current algorithm is short and straight forward. (And: When I implement one of the mentioned algorithms, it can still contain bugs, because I made a mistake somewhere.)

You don't necessarily have to implement grisu, et al, verbatim, but your algorithm should at least gracefully handle the special cases and potentially problematic cases cited in the papers.

T

--
Too many people have open minds but closed eyes.
Oct 30 2019
On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
> In PR 7222 [1] Robert Schadek suggested replacing the call to snprinf in std.format with an own method written in D. During the last days I took a deeper look into this and meanwhile I've got a function that works for floats (and probably also doubles, but I havn't tested that yet and it should also work with reals if ucent would be available; without ucent I need a workaround for real or fall back to BigInt). [...]

According to IEEE 754-2008:

"5.12.2 External decimal character sequences representing finite numbers [...] For binary formats, all conversions of H significant digits or fewer round correctly according to the applicable rounding direction;"

where H is 9 for single and 17 for double. IEEE 754 doesn't specify an H for reals.

That means that snprintf must use the current rounding mode, which can be read using FloatingPointControl.rounding from std.math.
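To show what "the current rounding mode" refers to, here is a hedged sketch built on `FloatingPointControl` and `lrint` from std.math. The helper name `roundUnder` is hypothetical, and the sketch assumes a target on which Phobos supports changing the hardware rounding mode (x86 and x86_64 do). It does not format anything; it only demonstrates that the same operation gives different results depending on the mode, which is the state an IEEE-conforming formatter would have to consult.

```d
import std.math : FloatingPointControl, lrint;

/// Rounds `x` to an integer under the given hardware rounding mode.
/// Hypothetical helper for illustration only.
long roundUnder(double x, FloatingPointControl.RoundingMode mode)
{
    FloatingPointControl fpctrl;   // its destructor restores the previous mode
    fpctrl.rounding = mode;
    return lrint(x);               // lrint honours the current rounding mode
}

unittest
{
    assert(roundUnder(1.5, FloatingPointControl.roundDown) == 1);
    assert(roundUnder(1.5, FloatingPointControl.roundUp) == 2);
    // The default mode rounds ties to even.
    assert(roundUnder(2.5, FloatingPointControl.roundToNearest) == 2);
}
```

Because the mode is global hardware state, any function that reads it cannot honestly be marked pure, which is the tension with purity raised elsewhere in the thread.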
Oct 30 2019
On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
> That means that snprintf must use the current rounding mode that can be read using FloatingPointControl.rounding from std.math.

Is it really a "must"? We are not completely bound by the IEEE standard and, if there are good reasons, might deviate from it. For example, comparing two floats with <= produces either "false" or "true" in D. According to IEEE there should be a third possible result, namely "not comparable".

Having said this, it would be possible to implement it the way you describe, but probably at some cost (slower, and more and less readable lines of code). I'll think about it.
Oct 30 2019
On Wednesday, 30 October 2019 at 19:28:44 UTC, berni44 wrote:
> On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
>> That means that snprintf must use the current rounding mode that can be read using FloatingPointControl.rounding from std.math.
> Is it really a "must"? We are not completely bound by the IEEE standard and, if good reasons are available, might reject it. For example, comparing two floats with <= produces either "false" or "true" in D. According to IEEE there should be a third result possible, namly "not comparable". Having said this, it would be possible to implement it the way you claim, but probably at some cost (=slower, more and less easy readable lines of code). I'll think about it.

I don't know the internals of your code, but I suppose that before "printing" you end up with an integer value and a base-ten exponent. In this case rounding becomes a question of how you interpret the remainder of a division by a power of ten.

Because I spent a lot of time figuring out how to format decimal numbers correctly, here is a piece of code I use to format decimal numbers depending on the rounding mode, fully compliant with the standard.

https://github.com/rumbu13/decimal/blob/a6bae32d75d56be16e82d37af0c8e4a7c08e318a/src/decimal/decimal.d#L7296

Hope this helps.
Oct 30 2019
On Wednesday, 30 October 2019 at 20:29:34 UTC, Rumbu wrote:
> In this case rounding becomes a question of how do you interpret the remainder of a division by a power of ten.

Unfortunately not. Think of 0.1500000000000001 rounded to one digit. It's clear that a remainder of 0-4 is rounded down and one of 6-9 is rounded up. But to decide in the case of a 5, you might need to look at the following digits if the rounding mode tells you to round down in the case of 0.5...
Oct 31 2019
On Wed, Oct 30, 2019 at 07:28:44PM +0000, berni44 via Digitalmars-d wrote:
> On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
>> That means that snprintf must use the current rounding mode that can be read using FloatingPointControl.rounding from std.math.
> Is it really a "must"? We are not completely bound by the IEEE standard and, if good reasons are available, might reject it. For example, comparing two floats with <= produces either "false" or "true" in D. According to IEEE there should be a third result possible, namly "not comparable".

For non-comparable floats x and y (i.e., at least one is a NaN), D has the semantics:

    x < y     false
    x <= y    false
    x >= y    false
    x > y     false
    x == y    false
    x != y    true

D used to have other comparison operators that handle various NaN-related subtleties (the so-called "spaceship operators" because of their alien appearance), but they were deprecated because nobody understood them so nobody used them.

Having said that, though, I think we should try to conform to IEEE as much as possible, and there better be very good reasons when we don't.

> Having said this, it would be possible to implement it the way you claim, but probably at some cost (=slower, more and less easy readable lines of code). I'll think about it.

As I have said, floating-point formatting is far from the trivial affair that it appears to be on the surface. It's not something to be undertaken lightly, because it's full of complicated corner cases that must be handled correctly.

T

--
Never wrestle a pig. You both get covered in mud, and the pig likes it.
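The comparison table above can be checked directly. A minimal sketch; `compareAll` is a hypothetical helper name used only to collect the six results in the same order as the table.

```d
/// Results of the six comparison operators, in the order
/// <, <=, >=, >, ==, != (the same order as in the table).
bool[6] compareAll(double x, double y) pure nothrow @safe
{
    return [x < y, x <= y, x >= y, x > y, x == y, x != y];
}

unittest
{
    // With a NaN operand only != is true.
    assert(compareAll(double.nan, 1.0) == [false, false, false, false, false, true]);
    assert(compareAll(double.nan, double.nan) == [false, false, false, false, false, true]);

    // Ordinary numbers for contrast.
    assert(compareAll(1.0, 2.0) == [true, true, false, false, false, true]);
}
```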
Oct 30 2019
On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
> In PR 7222 [1] Robert Schadek suggested replacing the call to snprinf in std.format with an own method written in D. During the last days I took a deeper look into this and meanwhile I've got a function that works for floats

If you could post that so I can have a look over the WIP, that'd be nice.

Grisu2 also uses lookup tables, though for 52-bit mantissa floats it's completely fine.
Oct 30 2019
On Wednesday, 30 October 2019 at 20:46:07 UTC, Stefan Koch wrote:
> If you could post that so I can have a look over the WIP that'd be nice.

See https://github.com/berni44/phobos/tree/printf

The function can be found at the end of std/format.d. I had to comment out some unittests, because the e and g qualifiers are not yet supported. I put several comments in the code, so I hope it's always clear what happens. If not, feel free to ask. (I'll be offline during the weekend.)

I also added a diagram for speed comparison. See https://github.com/berni44/phobos/blob/printf/diagram.png

Blue and green use "%.10f" while black and red use "%.100f". Blue and red are my function, while green and black are snprintf. The x-axis gives the value of the exponent from 0 to 255, the y-axis gives the average time in nanoseconds. The green bottom line at the left is at approx. 600ns. For each exponent approx. 217886 numbers have been checked (the same set for both functions).

As you can see, at the left side snprintf is faster, having an almost constant time, while the time of mine slightly increases as exponents get smaller. I scanned the snprintf implementation to find out what they do; see my comment in the implementation for details.
Oct 31 2019
Replacing snprintf for floating point is very challenging, because:

1. people have been improving snprintf for decades
2. people expect precision and performance
3. the standard is snprintf, any credible implementation must be the same or better

To that end, you'll need to be familiar with the following:

754-2019 IEEE Standard for Floating Point Arithmetic
https://ieeexplore.ieee.org/document/8766229

Printing Floating-Point Numbers Quickly and Accurately with Integers
https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf

Printing Floating-Point Numbers
https://ranjitjhala.github.io/static/fp-printing-popl16.pdf

Ryu Fast Float To String Conversion
https://dl.acm.org/citation.cfm?id=3192369
https://github.com/ulfjack/ryu

http://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html
https://news.ycombinator.com/item?id=20181832

Jonathan Marler's D implementation of ryu:
https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

For historical interest, here's DMC's version, which was state of the art in the 1980's:
https://github.com/DigitalMars/dmc/blob/master/src/core/floatcvt.c
Oct 30 2019
On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> Replacing snprintf for floating point is very challenging, because: 1. people have been improving snprintf for decades 2. people expect precision and performance 3. the standard is snprintf, any credible implementation must be the same or better

Moreover, actual printf implementations seem to depend on the locale. This creates bugs (say "1,4" instead of "1.4"), so whether to keep this behaviour depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.
Oct 31 2019
On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via Digitalmars-d wrote:
> On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
>> Replacing snprintf for floating point is very challenging, because: 1. people have been improving snprintf for decades 2. people expect precision and performance 3. the standard is snprintf, any credible implementation must be the same or better
> Moreover, actual printf implementations seems to depend upon the locale. This creates bugs (say "1,4" instead of "1.4") so this behaviour depends if you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.

*Is* it a bug, though? Arguably, the reason snprintf was done that way was precisely to support properly-formatted output in the current locale. I.e., when outputting Russian text, the convention is to write the decimal point with "," rather than ".". It would be considered wrong or strange to write "1.4" instead of "1,4". This is important if you want to support i18n in your program.

But if you're outputting to, say, JSON, then you *don't* ever want "1,4", you only want "1.4". Which leads me to think that these two should be separate format specifiers.

Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.

T

--
I'm still trying to find a pun for "punishment"...
Oct 31 2019
On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
> Which leads me to think that these two should be separate format specifiers.

I would put the localization in a completely different function.

> Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.

You could pass in the locale to the function, then it can be pure. Even more reason to have it as a separate function. I would say that should be best practice because you might want to run a program in a different locale than the global configured one.

I'm not sure if it's enough to look at the locale. On my computer (a Mac) I have configured it to have the language in English but the date, time, number and currency format to Swedish.

--
/Jacob Carlborg
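A rough sketch of the "pass the locale in" idea, under loud assumptions: `NumberLocale` and `localize` are hypothetical names, the locale is reduced to a single decimal separator, and the input is assumed to be the locale-independent text (with '.') produced by a separate formatting step. Real locale data also covers grouping, digits and more.

```d
/// Hypothetical locale description; only the one field this sketch needs.
struct NumberLocale
{
    char decimalSeparator = '.';
}

/// Applies the locale's separator to locale-independent number text.
/// The locale is an explicit parameter and no global state is read,
/// so the function can be pure.
string localize(const(char)[] plain, NumberLocale locale) pure nothrow @safe
{
    auto result = plain.dup;
    foreach (ref c; result)
        if (c == '.')
            c = locale.decimalSeparator;
    return result.idup;
}

unittest
{
    assert(localize("1.4", NumberLocale(',')) == "1,4");  // e.g. Russian or Swedish
    assert(localize("1.4", NumberLocale.init) == "1.4");  // e.g. for JSON output
}
```

Keeping the two steps separate means the core formatter never has to look at the environment, while callers who want localized output opt in explicitly.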
Nov 01 2019
On Fri, Nov 01, 2019 at 01:01:21PM +0000, Jacob Carlborg via Digitalmars-d wrote:
> On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
>> Which leads me to think that these two should be separate format specifiers.
> I would put the localization in a completely different function.

That would be a better solution. It would be different from snprintf, though, and we'd have to document it well so that people can find it.

>> Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.
> You could pass in the locale to the function, then it can be pure. Even more reason to have it as a separate function. I would say that should be best practice because you might want to run a program in a different locale than the global configured one.

+1.

> I'm not sure if it's enough to look at the locale. On my computer (a Mac) I have configured it to have the language in English but the date, time, number and currency format to Swedish.
[...]

I think it has to do with the LC_* environment variables, at least on a *nix system. You can set LC_ALL to get the same settings across all categories, or you can separately set one or more of the LC_* to get different settings in each category. (Caveat: I've never actually done this myself before, so I could be misunderstanding how it works.)

T

--
Famous last words: I wonder what will happen if I do *this*...
Nov 01 2019
On Friday, 1 November 2019 at 17:02:53 UTC, H. S. Teoh wrote:
> I think it has to do with the LC_* environment variables, at least on a *nix system. You can set LC_ALL to get the same settings across all categories, or you can separately set one or more of the LC_* to get different settings in each category. (Caveat: I've never actually done this myself before, so I could be misunderstanding how it works.)

Yeah, that's POSIX, so POSIX-compliant C compilers should support it...
https://docs.oracle.com/cd/E19253-01/817-2521/overview-39/index.html

Other languages do not have to follow it, of course.
Nov 01 2019
On 2019-11-01 18:02, H. S. Teoh wrote:
> That would be a better solution. It would be different from snprintf, though, and we'd have to document it well so that people can find it.

It depends on what the goal is. If it is to have a 100% compatible drop-in replacement for snprintf, then we need to include the localization. But if the goal is just to have a function, implemented in D, that converts values to a string, then we have the opportunity to make a better interface.

--
/Jacob Carlborg
Nov 02 2019
On Saturday, 2 November 2019 at 16:59:15 UTC, Jacob Carlborg wrote:
> On 2019-11-01 18:02, H. S. Teoh wrote:
> > That would be a better solution. It would be different from snprintf, though, and we'd have to document it well so that people can find it.
> It depends on what the goal is. If it is to have 100% compatible drop-in replacement to snprintf then we need to include the localization. But if the goal is just to have a function that converts values to a string, which is implemented in D, then have the opportunity to make a better interface.

+1

That's actually what I ask myself all the time. I personally prefer the second approach. A similar question arises with the rounding problem, which is even a little more difficult, because the IEEE standard interferes here too.
Nov 04 2019
On Saturday, 2 November 2019 at 16:59:15 UTC, Jacob Carlborg wrote:
> It depends on what the goal is. If it is to have 100% compatible drop-in replacement to snprintf then we need to include the localization. But if the goal is just to have a function that converts values to a string, which is implemented in D, then have the opportunity to make a better interface.

+1, this is important: we've had a localization bug, and I suspect it's very easy to have such bugs.

Warning: `format` is affected too! (perhaps only when using the %f format specifier?)
https://github.com/AuburnSounds/printed/issues/22
ugly fix: https://github.com/AuburnSounds/printed/commit/797343c0fc213ea34aa5b79b61cdc1164ae189df

There is a non-zero chance that people _are_ relying on `format` and `snprintf` being localization-aware. So a "drop-in" replacement needs to deal with this mess, either by being bug-compatible or by not being drop-in.
Nov 04 2019
On Friday, 1 November 2019 at 17:02:53 UTC, H. S. Teoh wrote:
> On Fri, Nov 01, 2019 at 01:01:21PM +0000, Jacob Carlborg via Digitalmars-d wrote:
> > On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
> > [...]

https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe

This is kind of relevant to the whole issue with locales, and how they simply don't work as they should. Probably best not to replicate them at all, and instead just pass the necessary localisation in to the format function, if necessary.
Nov 02 2019
On Thursday, 31 October 2019 09:58:08 MDT H. S. Teoh via Digitalmars-d wrote:
> On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via Digitalmars-d wrote:
> > On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> > > Replacing snprintf for floating point is very challenging, because:
> > > 1. people have been improving snprintf for decades
> > > 2. people expect precision and performance
> > > 3. the standard is snprintf, any credible implementation must be the same or better
> > Moreover, actual printf implementations seem to depend upon the locale. This creates bugs (say "1,4" instead of "1.4"), so this behaviour depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.
> *Is* it a bug, though? Arguably, the reason snprintf was done that way was precisely to support properly-formatted output in the current locale. I.e., when outputting Russian text, the convention is to write the decimal point with "," rather than ".". It would be considered wrong or strange to write "1.4" instead of "1,4". This is important if you want to support i18n in your program. But if you're outputting to, say, JSON, then you *don't* ever want "1,4", you only want "1.4". Which leads me to think that these two should be separate format specifiers. Unfortunately, I can see how this would force format() to be impure, because to support checking the current locale implies accessing global state, which is impure.

The version of format that takes the format specifier as a compile-time argument shouldn't have that problem, but the one that takes it as a runtime argument certainly would.

- Jonathan M Davis
Oct 31 2019
On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat wrote:
> Moreover, actual printf implementations seem to depend upon the locale. This creates bugs (say "1,4" instead of "1.4"), so this behaviour depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.

Meanwhile, my implementation for the f (and F) qualifier is (almost) finished. Yet the locale stuff is missing and I have not managed to implement it. Maybe someone can help me:

a) I need to create some tests. As far as I know, I have to execute "export LANG=de_DE.UTF-8" (in bash, Debian) to make it use the German locale, which should replace the dot by a comma. Unfortunately writefln!"%.10f"(0.1) still writes a dot instead of the expected ",". Instead of "LANG" I tried several other variables, like LC_ALL or LC_NUMERIC. Any idea what I am doing wrong here?

b) How to query the current locale from D? Actually I only need the number separator of the current locale as a dchar. I found core.stdc.locale but do not know how to use it.
Nov 06 2019
On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
> On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat wrote:
> > [...]
> Meanwhile, my implementation for the f (and F) qualifier is (almost) finished. Yet the locale stuff is missing and I have not managed to implement it. Maybe someone can help me:
> a) I need to create some tests. As far as I know, I have to execute "export LANG=de_DE.UTF-8" (in bash, Debian) to make it use the German locale, which should replace the dot by a comma. Unfortunately writefln!"%.10f"(0.1) still writes a dot instead of the expected ",". Instead of "LANG" I tried several other variables, like LC_ALL or LC_NUMERIC. Any idea what I am doing wrong here?
> b) How to query the current locale from D? Actually I only need the number separator of the current locale as a dchar. I found core.stdc.locale but do not know how to use it.

I think the best way to go is to make it locale-independent and simply provide a way for the user to specify the decimal separator (and other related locale details, if any).
Nov 06 2019
On Wed, Nov 06, 2019 at 04:17:32PM +0000, Petar via Digitalmars-d wrote:
> On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
[...]
> > b) How to query the current locale from D? Actually I only need the number-separator in the current locale as a dchar. I found core.stdc.locale but do not know how to use it.
> I think the best way to go is to make it locale-independent and simply provide a way for user to specify the decimal separator (and other related locale details, if any).
[...]

Yes, I think in the long run this will be the more viable approach. Depending on locale as a global state is problematic because it forces formatting to be impure, and also forces users to implement hacks when they need to temporarily change the locale. E.g., in a system like snprintf, if you need to format German text with snippets of English quotations, you will have to temporarily override LC_* somehow in order to print a number with two different separators, or hack it with string postprocessing, etc.

It's better to let the user pass in the desired separator as a parameter -- the ',' flag in std.format already does this via the optional '?' modifier, for example:

    writefln("%,?d", '_', 12345678); // 12_345_678
    writefln("%,?d", '|', 12345678); // 12|345|678

Conceivably one could extend the '.' flag with a '?' modifier as well, so something like this:

    writefln("%.2?f", ',', 3.141592); // 3,14
    writefln("%.2?f", '_', 3.141592); // 3_14
    writefln("%.2?f", ':', 3.141592); // 3:14

Then programs that want to support locales can just do this:

    writefln("%.2?f", curLocale.separator, 3.141592);

T

--
I don't trust computers, I've spent too long programming to think that they can get anything right. -- James Miller
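For reference, the existing grouping-separator feature mentioned above can be tried with a few lines. This is only a small sketch of the ',' flag with and without the '?' modifier; the '?' extension of the precision flag proposed in the post is hypothetical and is not shown, since std.format does not provide it.

```d
import std.format : format;
import std.stdio : writeln;

void main()
{
    // With '?', the grouping separator is taken from the argument list,
    // immediately before the value it applies to.
    writeln(format("%,?d", '_', 12345678));
    writeln(format("%,?d", '|', 12345678));

    // Without '?', the ',' flag groups digits with a comma.
    writeln(format("%,d", 12345678));
}
```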
Nov 06 2019
On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
> Yes, I think in the long run this will be the more viable approach. Depending on locale as a global state is problematic because it forces formatting to be impure, and also forces users to implement hacks when they need to temporarily change the locale. E.g., in a system like snprintf, if you need to format German text with snippets of English quotations, you will have to temporarily override LC_* somehow in order to print a number with two different separators, or hack it with string postprocessing, etc.

All while setlocale doesn't even provide any sort of thread-safety!
Nov 06 2019
On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
> Then programs that want to support locales can just do this:
>
>     writefln("%.2?f", curLocale.separator, 3.141592);

For %f, the decimal separator is not the only locale-specific info. Full list:

- decimal separator
- negative pattern
- positive pattern
- infinity symbol
- nan symbol
- digit shapes, especially for Arabic and Thai

For %d and %g there are more, like digit grouping/group separator.
Nov 06 2019
On Wed, Nov 06, 2019 at 06:21:43PM +0000, Rumbu via Digitalmars-d wrote:
> On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
> > Then programs that want to support locales can just do this:
> >
> >     writefln("%.2?f", curLocale.separator, 3.141592);
[...]
> For %f, the decimal separator is not the only locale specific info. Full list:
> - decimal separator
> - negative pattern
> - positive pattern
> - infinity symbol
> - nan symbol
> - digit shapes, especially for Arabic and Thai
> For %d and %g there are more like digit grouping/group separator.
[...]

Haha, wonderful. Don't you just love it when i18n consistently throws a monkey wrench into any simplistic scheme?

Almost makes me want to suggest that we need std.i18n before we can implement anything sane i18n-wise. But since that's not gonna happen in the foreseeable future, and I'm sick and tired of the trend around these parts of letting the perfect be the enemy of the good, I'm going to propose that we just forget about i18n and just implement formatting for an English-specific locale. If users *really* want to support locales, just use %s with a wrapper struct with a toString method that does whatever it takes to get the right output.

I've used this pattern for various problems with formatting complex objects, and it works fairly well:

    struct i18nFmt
    {
        float f; // or double, real, whatever
        int precision;
        ... // any other params here, like decimal point format, etc.

        void toString(S)(S sink)
            if (isOutputRange!(S, char))
        {
            ... // do whatever you need to do here to
                // produce the right output
        }
    }
    ...
    float myData = ...;

    // just use %s instead of some incomprehensible over-engineered
    // crap like %1:3,$13&.*^_7?f
    output = format("%s", myData.i18nFmt);

    // or:
    output2 = format("%s", myData.i18nFmt(curLocale.precision, ... /* whatever else */));

This way you lift the complexity out of std.format where it really doesn't belong, and make it possible to plug in different locale handling modules in its place.

This even opens the door for a future std.i18n that simply exports a bunch of these locale-dependent proxy formatters that you could just append to your data items. Much more extensible and flexible than trying to shoehorn everything into std.format, which will inevitably turn it into a nasty hairball of intractable dependencies that's impossible to make pure, nothrow, etc. (Oh wait, it's already such a hairball. :-D Let's not make it worse!)

And it makes std.format more pay-as-you-go; if you never need to use std.i18n it won't pull it in as a dependency just because it needs to support an obscure format specifier that you don't actually use.

T

--
Being forced to write comments actually improves code, because it is easier to fix a crock than to explain it. -- G. Steele
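The wrapper-struct pattern described above can be made concrete roughly as follows. This is a minimal sketch under stated assumptions, not the poster's actual code: the struct name `LocaleFmt` and its fields are hypothetical, only the decimal separator is handled, and the separator is assumed to be ASCII. It uses the delegate-based toString overload, which format("%s", ...) picks up for user-defined types.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical proxy type: carries the value plus the formatting
// details that std.format itself does not know about.
struct LocaleFmt
{
    double value;
    int precision = 2;
    char decimalSeparator = '.';

    // format("%s", ...) calls this overload for user-defined types.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        // Format locale-independently first, then patch the separator.
        char[] text = format("%.*f", precision, value).dup;
        foreach (ref c; text)
        {
            if (c == '.')
                c = decimalSeparator;
        }
        sink(text);
    }
}

void main()
{
    double myData = 3.14159;
    writeln(format("%s", LocaleFmt(myData, 2, ',')));
    writeln(format("%s and %s", LocaleFmt(1.5, 1, ','), LocaleFmt(1.5, 1, '.')));
}
```

Everything locale-related lives in the proxy, so std.format stays unaware of it, which is the point made in the post.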
Nov 06 2019
On Wednesday, 6 November 2019 at 18:21:43 UTC, Rumbu wrote:
> For %f, the decimal separator is not the only locale specific info. Full list:
> - decimal separator
> - negative pattern
> - positive pattern
> - infinity symbol
> - nan symbol
> - digit shapes, especially for Arabic and Thai
> For %d and %g there are more like digit grouping/group separator.

snprintf only uses the decimal separator (and grouping, but that's not used inside format; the grouping is done separately there). Everything else is ignored by snprintf.
Nov 07 2019
On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
> Yes, I think in the long run this will be the more viable approach. Depending on locale as a global state is problematic because it forces formatting to be impure, and also forces users to implement hacks when they need to temporarily change the locale. E.g., in a system like snprintf, if you need to format German text with snippets of English quotations, you will have to temporarily override LC_* somehow in order to print a number with two different separators, or hack it with string postprocessing, etc.

My current approach is a pure and safe function that does the formatting but ignores the locale completely. This function is called from formatValueImpl, and the result could be modified there, if desired.

Currently (I want to make small steps), the function can only be used for the f (and F) specifier (and only for float and double). For all other specifiers/types snprintf is still called. That might result in different behaviour depending on the specifier and the type. I'd prefer to make it behave identically.

Having said this, I completely agree that it would be better if format ignored the locale and let the user handle this in a wrapper, if desired.
Nov 07 2019
On 2019-11-06 17:17, Petar Kirov [ZombineDev] wrote:
> I think the best way to go is to make it locale-independent and simply provide a way for user to specify the decimal separator (and other related locale details, if any).

In my experience, I think it's best to leave the locale support to a separate API. The "snprintf" API is never going to be flexible enough. No one is using "snprintf" for serious localization. It's not just the decimal point that needs to be localized. There are various other number-related things that need localization. Just have a look at the number formatter in Apple's API [1]. It's pretty big. Then they have separate formatters for currency, length, mass, interval and more.

[1] https://developer.apple.com/documentation/foundation/nsnumberformatter?language=objc

--
/Jacob Carlborg
Nov 06 2019
On Wed, Nov 06, 2019 at 07:43:06PM +0100, Jacob Carlborg via Digitalmars-d wrote:
[...]
> In my experience, I think it's best to leave the locale support to a separate API. The "snprintf" API is never going to be flexible enough. No one is using "snprintf" for serious localization. It's not just the decimal point that needs to be localized. There are various other number related things that need localization. Just have a look at the number formatter in Apple's API [1]. It's pretty big. Then they have separate formatters for currency, length, mass, interval and more.
[...]

Yeah, after thinking about this more, I've come to the same conclusion. Just use %s for anything that depends on complex locale-dependent configuration, and wrap your data item in a proxy object that does whatever it takes to make it work.

    float myQuantity = ...;
    auto output = format("%s", myQuantity.localeFmt(...));

where localeFmt is some function or wrapper struct overloading toString that does whatever it takes to format the data in a locale-specific way.

T

--
"Hi." "'Lo."
Nov 06 2019
On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
> On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat wrote:
> > Moreover, actual printf implementations seem to depend upon the locale. This creates bugs (say "1,4" instead of "1.4"), so this behaviour depends on whether you want to be bug-compatible. We've been hit by that in `printed` when used with a Russian locale.
> Meanwhile, my implementation for the f (and F) qualifier is (almost) finished. Yet the locale stuff is missing and I have not managed to implement it. Maybe someone can help me:
> a) I need to create some tests. As far as I know, I have to execute "export LANG=de_DE.UTF-8" (in bash, Debian) to make it use the German locale, which should replace the dot by a comma. Unfortunately writefln!"%.10f"(0.1) still writes a dot instead of the expected ",". Instead of "LANG" I tried several other variables, like LC_ALL or LC_NUMERIC. Any idea what I am doing wrong here?
> b) How to query the current locale from D? Actually I only need the number separator of the current locale as a dchar. I found core.stdc.locale but do not know how to use it.

This question comes late, but did you consider just doing a 1 to 1 translation of snprintf from C to D? Of course the second step would be to provide an idiomatic D version with the mentioned suggestions. But having a translation would already be fantastic.

Kind regards
Andre
Nov 06 2019
On Wednesday, 6 November 2019 at 16:54:25 UTC, Andre Pany wrote:
> This question comes late, but did you considered to just do an 1 to 1 translation of snprintf from C to D?

I scanned through the implementation of snprintf several times while I wrote the replacement. I think the main algorithm is quite similar, apart from some speed improvements for numbers close to zero, which turned out to be quite nasty in detail (and which I have therefore skipped for now).

By the way: a 1 to 1 translation is not something I could do, because my knowledge of C is very limited and the algorithm contains lots of calls to functions for which I do not know where to look them up or how to replace them with D functions.
Nov 07 2019
On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
> a) I need to create some test. As far as I know, I've to execute "export LANG=de_DE.UTF-8" (in bash, debian) to make it use the german locale, which should replace the dot by a comma. Unfortunately writefln!"%.10f"(0.1) still writes a dot instead of the expected ",". Instead of "LANG" I tried several other stuff, like LC_ALL or LC_NUMERIC. Any idea what I do wrong here?

If D wishes to behave the same as C, this is correct behavior. C requires the locale "C" to be active at program startup. The C way to use the environment's locale is to call setlocale for the relevant category with an empty string for the locale value, e.g.

    setlocale(LC_ALL, "")

> b) How to query the current locale from D? Actually I only need the number-separator in the current locale as a dchar. I found core.stdc.locale but do not know how to use it.

You can query the current locale of a given category by calling setlocale with a null pointer for the locale; it will return the currently set locale as a C string. The formatting information is returned by localeconv(). Not sure why the docs don't show the members of lconv, but it contains decimal_point, which is a C string holding the decimal separator.

    setlocale(LC_ALL, "de_DE.UTF-8");
    localeconv.decimal_point.fromStringz.writeln;

prints ","
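Put together as a complete program, the calls described above might look like this. It is a sketch only: the helper name `currentDecimalPoint` is made up for illustration, and what the second line prints depends on which locales are installed and on the LANG/LC_* settings of the environment it runs in.

```d
import core.stdc.locale : LC_NUMERIC, localeconv, setlocale;
import std.stdio : writeln;
import std.string : fromStringz;

// Hypothetical helper: copies the decimal separator of the C locale
// that is active at the moment of the call.
string currentDecimalPoint()
{
    return localeconv().decimal_point.fromStringz.idup;
}

void main()
{
    // At program startup the "C" locale is active, so this is ".".
    writeln(currentDecimalPoint());

    // An empty string asks the C runtime to adopt the environment's
    // locale; setlocale returns null if that locale is unavailable.
    if (setlocale(LC_NUMERIC, "") !is null)
        writeln(currentDecimalPoint());
    else
        writeln("environment locale not available");
}
```

The result is copied with idup because the buffer returned by localeconv may be overwritten by later calls to setlocale or localeconv.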
Nov 06 2019
On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
> To that end, you'll need to be familiar with the following:

Thanks for that list. I'll have a look when I find the time to do so.

> 754-2019 IEEE Standard for Floating Point Arithmetic
> https://ieeexplore.ieee.org/document/8766229

Unfortunately I cannot download this file. I've got no company listed there and I'm not willing to pay for it...

> Ryu Fast Float To String Conversion
> https://dl.acm.org/citation.cfm?id=3192369
> https://github.com/ulfjack/ryu
> [...]
> Jonathan Marler's D implementation of ryu:
> https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

I already read the paper about ryu. IMHO it's of no use here, because the speed advantage comes from being more "inaccurate" than snprintf. Ryu is designed for a round trip, while snprintf prints as many digits as the user wants to get (even when they contain no more information). The same holds for the grisu variants.
Oct 31 2019
On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
> In PR 7222 [1] Robert Schadek suggested replacing the call to snprintf in std.format with an own method written in D.

I suggested it because formatImpl!float is not pure, which makes, for instance, std.json not pure, among a few others.
Oct 31 2019
On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
> In PR 7222 [1] Robert Schadek suggested replacing the call to snprintf in std.format with an own method written in D. [...]

Meanwhile I filed a first PR: https://github.com/dlang/phobos/pull/7264 - it achieves only part of a complete replacement: only the 'f' qualifier is replaced, and only for float and double. But it's a start and I want to make small steps.

Many thanks to all of you who answered in this thread or gave hints in other places. This helped a lot. :-)
Nov 08 2019
On Friday, 8 November 2019 at 14:42:29 UTC, berni44 wrote:
> Meanwhile I filed a first PR: https://github.com/dlang/phobos/pull/7264 - only part of a complete replacement is achieved with that: Only the 'f' qualifier is replaced and that only for float and double. But it's a start and I want to make small steps.

Update: While this first PR is still waiting to be reviewed, a second PR (the same for the '%a' qualifier) was merged last week. Today I filed a third PR (for the '%e' qualifier). The '%g' qualifier has to wait until these two PRs are merged, because it depends strongly on them.

With the help of Petar Kirov [ZombineDev] I meanwhile also managed to make the whole thing CTFEable. But this also has to wait for the two PRs mentioned above.

Next steps will be some speed optimizations for small exponents (this already works on paper, but I haven't implemented and tested it yet) and for large exponents (only a vague idea so far).
Dec 14 2019
On Saturday, 14 December 2019 at 08:44:21 UTC, berni44 wrote:
> Update: While this first PR is still waiting for being revied, a second PR (the same for '%a' qualifier) has been merged last week. Today I filed a third PR (for '%e' qualifier). The '%g' qualifier has to wait until these two PRs are merged, because it depends strongly on those two. With the help of Petar Kirov [ZombineDev] I meanwhile also managed to make the whole CTFEable. But this also has to wait for the two PRs mentioned above.

I'm still waiting for a review... The algorithm isn't really complicated. It's essentially successive division/multiplication by 10. Unfortunately, it has to work with numbers larger than ulong, and there is all the stuff with the flags, precision and width... But it should be doable anyway, IMHO.

* https://github.com/dlang/phobos/pull/7264
* https://github.com/dlang/phobos/pull/7318
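To give reviewers a feeling for "successive division/multiplication by 10", here is a much simplified sketch. It is not the code from the PRs: the function name `fixedDigits` is hypothetical, the value is given as numerator / 2^shift and must fit in a ulong (the real implementation needs larger numbers), digits are truncated rather than rounded, and flags and width are ignored.

```d
import std.stdio : writeln;

// Hypothetical, simplified illustration: prints numerator / 2^shift
// with `precision` decimals, truncating instead of rounding.
string fixedDigits(ulong numerator, uint shift, uint precision) pure @safe
{
    // Keeps the fractional part times 10 inside a ulong.
    assert(shift <= 60, "sketch only supports shift <= 60");

    immutable ulong mask = (1UL << shift) - 1;
    ulong intPart = numerator >> shift;
    ulong frac = numerator & mask;

    // Integer part: successive division by 10, digits come out last first.
    char[] result;
    do
    {
        result = cast(char)('0' + intPart % 10) ~ result;
        intPart /= 10;
    }
    while (intPart != 0);

    if (precision > 0)
        result ~= '.';

    // Fractional part: successive multiplication by 10, the digit is
    // whatever overflows into the integer position.
    foreach (_; 0 .. precision)
    {
        frac *= 10;
        result ~= cast(char)('0' + (frac >> shift));
        frac &= mask;
    }
    return result.idup;
}

void main()
{
    writeln(fixedDigits(3, 3, 3));  // 3 / 2^3 = 0.375
    writeln(fixedDigits(25, 1, 2)); // 25 / 2^1 = 12.5
}
```

A double can be fed to such a routine by splitting it into mantissa and binary exponent; the hard parts mentioned in the post (big numbers, rounding, flags) are exactly what this sketch leaves out.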
Jan 24 2020