digitalmars.D - Replacement for snprintf

berni44 (34/34) Oct 30 2019 In PR 7222 [1] Robert Schadek suggested replacing the call to

Ali Çehreli (4/9) Oct 30 2019 The tie-breaker is to always round towards the even digit. So it should

berni44 (8/10) Oct 30 2019 As far as I know that's for avoiding error propagation, when

Jon Degenhardt (6/18) Oct 30 2019 It's reasonably common to have numeric values written out in text

berni44 (15/20) Oct 30 2019 But IMHO this is the fault of people who do this and not the

Ali Çehreli (4/6) Oct 30 2019 Just to make sure, you are aware of the optional '-' before '(', right?

berni44 (11/13) Oct 30 2019 I know this. I personally think, it is somewhat ugly, but I

Sebastiaan Koppe (2/11) Oct 30 2019 You are correct, but people will still blame the printing routine.

drug (6/17) Oct 31 2019 I wouldn't state it is any fault. In some cases it is much more

Walter Bright (2/6) Oct 31 2019 To get round-trip 100% accuracy, print the floats in hex using the %A fo...

Stefan Koch (6/14) Oct 31 2019 DtoA is also supposed to have 100% accuracy, when it comes to

H. S. Teoh (8/11) Oct 31 2019 Maybe we should be using your implementation then? No need to duplicate

H. S. Teoh (15/28) Oct 30 2019 If you haven't already, please read:

berni44 (26/37) Oct 30 2019 Thanks for that link. I haven't had a look into the grisu

H. S. Teoh (30/57) Oct 30 2019 Yeah, I've been waiting for a long time for a pure, @safe, and CTFE-able

Rumbu (12/20) Oct 30 2019 According to ieee754-2008:

berni44 (9/12) Oct 30 2019 Is it really a "must"? We are not completely bound by the IEEE

Rumbu (12/24) Oct 30 2019 I don't know the inners of your code, but I suppose that before

berni44 (6/8) Oct 31 2019 Unfortunately not. Think of 0.1500000000000001 rounded to one

H. S. Teoh (22/34) Oct 30 2019 For non-comparable floats x and y (i.e., at least one is a NaN), D has

Stefan Koch (5/9) Oct 30 2019 If you could post that so I can have a look over the WIP that'd

berni44 (20/22) Oct 31 2019 See https://github.com/berni44/phobos/tree/printf

Walter Bright (21/21) Oct 30 2019 Replacing snprintf for floating point is very challenging, because:

Guillaume Piolat (5/11) Oct 31 2019 Moreover, actual printf implementations seems to depend upon the

H. S. Teoh (16/28) Oct 31 2019 *Is* it a bug, though? Arguably, the reason snprintf was done that way

Jacob Carlborg (11/16) Nov 01 2019 You could pass in the locale to the function, then it can be

H. S. Teoh (13/30) Nov 01 2019 That would be a better solution. It would be different from snprintf,

Ola Fosheim Grøstad (4/10) Nov 01 2019 Yeah, POSIX, so POSIX-compliant C compilers should support it...
Jacob Carlborg (8/10) Nov 02 2019 It depends on what the goal is. If it is to have 100% compatible drop-in...

berni44 (8/19) Nov 04 2019 +1
Guillaume Piolat (13/19) Nov 04 2019 +1 this is important since we've had localization bug and I

Uknown (6/10) Nov 02 2019 https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5...

Jonathan M Davis (7/32) Oct 31 2019 Digitalmars-d wrote:
berni44 (14/18) Nov 06 2019 Meanwhile, my implementation for the f (and F) qualifier is

Petar Kirov [ZombineDev] (4/19) Nov 06 2019 I think the best way to go is to make it locale-independent and

H. S. Teoh (25/33) Nov 06 2019 Yes, I think in the long run this will be the more viable approach.

lithium iodate (3/12) Nov 06 2019 All while setlocale doesn't even provide any sort of
Rumbu (10/12) Nov 06 2019 For %f, the decimal separator is not the only locale specific

H. S. Teoh (49/65) Nov 06 2019 [...]
berni44 (4/14) Nov 07 2019 snprintf only uses the decimal separator (and grouping but that's

berni44 (13/22) Nov 07 2019 My current approach is a pure and @safe function that's doing the

Jacob Carlborg (13/16) Nov 06 2019 In my experience, I think it's best to leave the locale support to a

H. S. Teoh (14/24) Nov 06 2019 [...]

Andre Pany (8/26) Nov 06 2019 This question comes late, but did you considered to just do an 1

berni44 (10/12) Nov 07 2019 I scanned through the implementation of snprintf several times

lithium iodate (16/25) Nov 06 2019 If D wishes to behave the same as C, this is correct behavior. C

berni44 (10/19) Oct 31 2019 Thanks for that list. I'll have a look, when I find the time to

Robert Schadek (4/6) Oct 31 2019 I suggested it because formatImpl!float is not pure which makes
berni44 (8/10) Nov 08 2019 Meanwhile I filed a first PR:

berni44 (12/17) Dec 14 2019 Update: While this first PR is still waiting to be reviewed, a

berni44 (8/16) Jan 24 2020 I'm still waiting for a review... The algorithm isn't really

berni44 <dlang d-ecke.de> writes:

In PR 7222 [1] Robert Schadek suggested replacing the call to 
snprintf in std.format with a custom implementation written in D. Over 
the last few days I took a deeper look at this, and I now have a 
function that works for floats (and probably also for doubles, but I 
haven't tested that yet). It should also work with reals if 
ucent were available; without ucent I need a workaround for 
real or have to fall back to BigInt.

So far I have only implemented the f qualifier, but it shouldn't be difficult 
to add the e and g qualifiers and the uppercase versions. Also some 

but again, I think this will not be very difficult. 
Unfortunately I'll be busy with some other (non-D) stuff for some 
time. I'll probably continue work on this sometime in November.

I checked correctness for floats by comparing with the result of 
snprintf for about 1% of all numbers (I will do that for all of them 
before filing a PR, though). The only differences are rounding 
issues when the number is exactly halfway between two adjacent 
representations. The implementation of snprintf on my computer always 
rounds towards zero in that case, while mine rounds away from zero. 
(E.g. 0.125 rounded to two digits is 0.13 in my implementation, 
while it's 0.12 in snprintf's implementation.) I doubt that 
different implementations of printf variants are all identical in 
this regard.

I also compared the speed of both implementations. They are 
generally in the same order of magnitude (600-2800ns per number, 
depending on precision and number). On average my implementation 
is slightly faster. For numbers close to 0 the snprintf 
implementation is faster (I wasn't able to follow the algorithm 
it uses), especially if the desired precision is large (I'll try 
to improve this, because it might become a real problem for reals). 
For all other numbers my current implementation wins by a fairly 
small margin.

[1] 
https://github.com/dlang/phobos/pull/7222#issuecomment-544909188

Oct 30 2019

Ali Çehreli <acehreli yahoo.com> writes:

On 10/30/2019 06:44 AM, berni44 wrote:
 The only differences are rounding issues, when the number is
 exactly between two adjacent ways of displaying. The implementation of
 snprintf on my computer always rounds towards zero while mine rounds in
 the opposite direction. (E.g. 0.125 rounded to two digits is 0.13 in my
 implementation while it's 0.12 in snprintf's implementation)

The tie-breaker is to always round towards the even digit. So it should 
always produce 1.12, 1.14, etc.
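
To make the two tie-breaking rules concrete, here is a minimal sketch in D that works on scaled integers (125 stands for 0.125). The helper names roundHalfEven and roundHalfAway are made up for this illustration; they are not part of Phobos or of any snprintf implementation.

```d
/// Divides `scaled` by `divisor` and rounds to nearest, ties to even.
/// Hypothetical helper, for illustration only.
ulong roundHalfEven(ulong scaled, ulong divisor) pure nothrow @safe @nogc
{
    immutable q = scaled / divisor;
    immutable r = scaled % divisor;
    // Compare 2*r with divisor to avoid fractions (assumes 2*r does not overflow).
    if (2 * r > divisor || (2 * r == divisor && (q & 1) != 0))
        return q + 1;
    return q;
}

/// Same division, but ties are rounded away from zero.
ulong roundHalfAway(ulong scaled, ulong divisor) pure nothrow @safe @nogc
{
    immutable q = scaled / divisor;
    immutable r = scaled % divisor;
    return 2 * r >= divisor ? q + 1 : q;
}

unittest
{
    assert(roundHalfEven(125, 10) == 12); // tie, 12 is even: 0.125 -> 0.12
    assert(roundHalfAway(125, 10) == 13); // tie, away from zero: 0.125 -> 0.13
    assert(roundHalfEven(135, 10) == 14); // tie, 13 is odd, so go up to 14
    assert(roundHalfEven(126, 10) == 13); // not a tie, plain round to nearest
}
```

The two rules only differ when the discarded part is exactly half of the last kept digit, which is the case described for 0.125 above.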

Ali

Oct 30 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 30 October 2019 at 15:48:44 UTC, Ali Çehreli wrote:
 The tie-breaker is to always round towards the even digit. So 
 it should always produce 1.12, 1.14, etc.

As far as I know, that's for avoiding error propagation when 
intermediate results need to be rounded. If I'm not completely 
mistaken, Donald Knuth proved that rounding towards even avoids 
errors that might build up over several such steps.

But here there is little chance, that the result will be used for 
new calculations. It's most often used for printing a result that 
humans have to read. This is different.

Oct 30 2019

Jon Degenhardt <jond noreply.com> writes:

On Wednesday, 30 October 2019 at 16:04:10 UTC, berni44 wrote:
 On Wednesday, 30 October 2019 at 15:48:44 UTC, Ali Çehreli 
 wrote:
 The tie-breaker is to always round towards the even digit. So 
 it should always produce 1.12, 1.14, etc.

 As far as I know that's for avoiding error propagation, when 
 intermediate results need to be rounded. When I'm not 
 completely mistaken, Donald Knuth proved that rounding toward 
 even avoids errors that might build up over several such 
 steps.

 But here there is little chance, that the result will be used 
 for new calculations. It's most often used for printing a 
 result that humans have to read. This is different.

It's reasonably common to have numeric values written out in text 
format and read back in and used in subsequent computations. Not 
always a great idea, especially when done without much 
consideration for round-off errors. But it's not uncommon.

--Jon

Oct 30 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt 
wrote:
 It's reasonably common to have numeric values written out in 
 text format and read back in and used in subsequent 
 computations. Not always a great idea, especially when done 
 without much consideration for round-off errors. But it's not 
 uncommon.

But IMHO this is the fault of people who do this and not the 
fault of a printing routine.

But: I have been pondering how to fix the output of format for 
ranges of strings (it currently places quotes around each 
string, which is somewhat inconsistent, because single strings are 
printed without quotes, and it causes confusion).

I came up with the idea of a new format qualifier, maybe S 
as in "source", in addition to s, which prints the value in a way 
that it can be used directly in D code (which is, as far as I 
know, the reason why the quotes are printed). That could also be 
used to produce a representation of a float that, when read back in, 
is still the same float as before. This could be done with the ryu or 
grisu algorithms, because they have exactly this goal.

Oct 30 2019

Ali Çehreli <acehreli yahoo.com> writes:

On 10/30/2019 12:19 PM, berni44 wrote:

 But: When pondering about how to fix the results of format for ranges of
 strings (it places currently quotes around each string

Just to make sure, you are aware of the optional '-' before '(', right? 
"%-(%s%)" does not print the quotes.

Ali

Oct 30 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 30 October 2019 at 19:28:27 UTC, Ali Çehreli wrote:
 Just to make sure, you are aware of the optional '-' before 
 '(', right? "%-(%s%)" does not print the quotes.

I know this. I personally think, it is somewhat ugly, but I 
understand how it came to have it like this. My rationale is more 
like this: Currently it probably won't be possible to change the 
behavior of %s, because that would be a code breaking change. But 
there might be a time in the future, where it's possible to do 
some code breaking changes, maybe when D2 -> D2.1 or something 
like this. It will be much easier to do these changes at that 
time, when there is a well tested, simple and working alternative 
that can be pointed out to the users. Therefore it's a good idea 
to implement this alternative right now. Isn't it?

Oct 30 2019

Sebastiaan Koppe <mail skoppe.eu> writes:

On Wednesday, 30 October 2019 at 19:19:06 UTC, berni44 wrote:
 On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt 
 wrote:
 It's reasonably common to have numeric values written out in 
 text format and read back in and used in subsequent 
 computations. Not always a great idea, especially when done 
 without much consideration for round-off errors. But it's not 
 uncommon.

 But IMHO this is the fault of people who do this and not the 
 fault of a printing routine.

You are correct, but people will still blame the printing routine.

Oct 30 2019

drug <drug2004 bk.ru> writes:

On 10/30/19 10:54 PM, Sebastiaan Koppe wrote:
 On Wednesday, 30 October 2019 at 19:19:06 UTC, berni44 wrote:
 On Wednesday, 30 October 2019 at 17:50:03 UTC, Jon Degenhardt wrote:
 It's reasonably common to have numeric values written out in text 
 format and read back in and used in subsequent computations. Not 
 always a great idea, especially when done without much consideration 
 for round-off errors. But it's not uncommon.

 But IMHO this is the fault of people who do this and not the fault of 
 a printing routine.

 
 You are correct, but people will still blame the printing routine.

I wouldn't call it a fault. In some cases a text representation of 
data is much more productive than a binary one. Initially I too 
believed that a binary representation was more suitable, but 
later I was forced to use a text format, and that gave me good 
results.

Oct 31 2019

Walter Bright <newshound2 digitalmars.com> writes:

On 10/31/2019 1:27 AM, drug wrote:
 In some cases it is much more productive to 
 have text representation of data than binary ones. Initially I believed too that 
 binary representation is the more suitable but afterwards I was forced to use 
 text format and that gives me a good result.

To get round-trip 100% accuracy, print the floats in hex using the %A format.
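
A minimal sketch of such a round trip, assuming that std.conv's to!double accepts the hexadecimal notation that format's %a produces (%A is the uppercase variant). The function name roundTripHex is made up for this example.

```d
import std.conv : to;
import std.format : format;

/// Writes `x` as a hexadecimal float string and parses that string back.
/// Hypothetical helper, for illustration only.
double roundTripHex(double x)
{
    immutable s = format("%a", x); // hex mantissa, binary exponent
    return s.to!double;
}

unittest
{
    // Hex digits map directly onto mantissa bits, so nothing is rounded.
    assert(roundTripHex(0.1) == 0.1);
    assert(roundTripHex(1.5) == 1.5);
    assert(roundTripHex(-2.75) == -2.75);
}
```

The price is readability: the text is exact, but hardly meant for human readers.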

Oct 31 2019

Stefan Koch <uplink.coder googlemail.com> writes:

On Thursday, 31 October 2019 at 20:20:24 UTC, Walter Bright wrote:
 On 10/31/2019 1:27 AM, drug wrote:
 In some cases it is much more productive to have text 
 representation of data than binary ones. Initially I believed 
 too that binary representation is the more suitable but 
 afterwards I  was forced to use text format and that gives me 
 a good result.

 To get round-trip 100% accuracy, print the floats in hex using 
 the %A format.

DtoA is also supposed to have 100% accuracy, when it comes to 
value, not necessarily to binary representation though.

I'd still prefer grisu2 over ryu, since it is easier to understand 
and I already have a ctfeable version of it. (It can't be safe 
though, since it casts double* to ulong*.)

Oct 31 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Oct 31, 2019 at 09:04:49PM +0000, Stefan Koch via Digitalmars-d wrote:
[...]
 I'd still prefer grisu2 over ryu, since it is easier to understand and I
 already have a ctfeable version of it.

Maybe we should be using your implementation then?  No need to duplicate
work if it's already been done.


 (it can't be safe though since it casts double* to ulong*)

But surely it can be @trusted?
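
A minimal sketch of that idea: the unchecked pointer cast is confined to one small @trusted function, and @safe code calls it. The names bitsOf and unbiasedExponent are hypothetical and are not taken from Stefan's implementation.

```d
/// Returns the raw IEEE 754 bit pattern of `d`.
/// The pointer cast is rejected in @safe code, so the function is marked
/// @trusted: its interface is memory-safe although the body is unchecked.
ulong bitsOf(double d) @trusted pure nothrow @nogc
{
    return *cast(ulong*) &d;
}

/// @safe code is allowed to call the @trusted helper.
int unbiasedExponent(double d) @safe pure nothrow @nogc
{
    // Bits 52..62 hold the exponent, stored with a bias of 1023.
    return cast(int)((bitsOf(d) >> 52) & 0x7FF) - 1023;
}

unittest
{
    assert(bitsOf(1.0) == 0x3FF0_0000_0000_0000UL);
    assert(unbiasedExponent(8.0) == 3); // 8.0 = 1.0 * 2^3
}
```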


T

-- 
Your inconsistency is the only consistent thing about you! -- KD

Oct 31 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Oct 30, 2019 at 01:44:52PM +0000, berni44 via Digitalmars-d wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to snprintf
 in std.format with an own method written in D. During the last days I
 took a deeper look into this and meanwhile I've got a function that
 works for floats (and probably also doubles, but I haven't tested that
 yet and it should also work with reals if ucent would be available;
 without ucent I need a workaround for real or fall back to BigInt).
 
 I only implemented f qualifier yet, but it shouldn't be difficult to
 add e and g qualifiers and the uppercase versions. Also some work

 I think, this will not be very difficult. Unfortunately I'll be busy
 with some other (non-D) stuff for some time. I'll probably continue
 work on this someday in november.

If you haven't already, please read:

	https://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html

especially the papers linked in the first paragraph.

Formatting floating-point numbers is not a trivial task. It's easy to
write up something that works for common cases, but it's not so easy to
get something that gives the best results in *all* cases. You probably
should use the algorithms referenced above for your implementation,
instead of coming up with your own that may have unexpected corner cases
that don't produce the right output.


T

-- 
Valentine's Day: an occasion for florists to reach into the wallets of
nominal lovers in dire need of being reminded to profess their
hypothetical love for their long-forgotten.

Oct 30 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 30 October 2019 at 17:41:26 UTC, H. S. Teoh wrote:
 If you haven't already, please read:

 	https://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html

 especially the papers linked in the first paragraph.

Thanks for that link. I haven't had a look at the grisu 
algorithms yet, but I'll definitely do that.

 Formatting floating-point numbers is not a trivial task. It's 
 easy to write up something that works for common cases, but 
 it's not so easy to get something that gives the best results in 
 *all* cases.

I know that this is something we all wish for. Anyway, my goal is 
set somewhat lower: I'd like to replace the existing call to 
snprintf with something that is written in D and which should 
be pure, @safe and ctfeable. And ideally it should not be slower 
than snprintf.

 You probably should use the algorithms referenced above for 
 your implementation,

I read through the paper for the ryu algorithm and rejected it 
(at least for me; if someone else is going to implement it and 
file a PR, that's fine). My reasons for rejecting it are that the 
algorithm does not have exactly the same goal as printf, which IMHO 
means that it cannot be used here, and that it needs a 
lookup table that is too large (300K for 128-bit reals).

I fear a little bit, from what I read in the ryu paper about the 
grisu algorithms, that it has the first of the above mentioned 
problems too. But yet I can't tell for sure.

 instead of coming up with your own that may have
 unexpected corner cases that don't produce the right output.

Obviously I need to prove somehow that the algorithm is correct. 
While this can be done for floats by running it on all numbers 
and comparing the results with the result of snprintf (or the 
result calculated by bc), this isn't possible anymore for doubles 
and reals (a random sample can be tested anyway, but 
that's no proof). Anyway, I think that the proof isn't hard to 
give. The current algorithm is short and straightforward. (And: 
if I implement one of the mentioned algorithms, it can still 
contain bugs, because I might make a mistake somewhere.)
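
A rough sketch of what such a comparison run could look like, assuming that std.format's format is the function under test and that snprintf from core.stdc.stdio is the reference. The names floatFromBits and countMismatches are made up for this illustration; this is not the test code actually used.

```d
import core.stdc.stdio : snprintf;
import std.format : format;
import std.math : isFinite;

/// Reinterprets a 32-bit pattern as a float.
float floatFromBits(uint bits) @trusted pure nothrow @nogc
{
    return *cast(float*) &bits;
}

/// Compares format("%.2f") with snprintf for `count` bit patterns, starting
/// at `start` and advancing by `stride`, and returns how many results differ.
/// With start = 0, stride = 1 and count = 2^32 it would visit every float.
size_t countMismatches(uint start, uint stride, size_t count)
{
    size_t mismatches;
    uint bits = start;
    foreach (i; 0 .. count)
    {
        immutable f = floatFromBits(bits);
        bits += stride; // wraps around on overflow
        if (!isFinite(f))
            continue; // spelling of NaN and infinity may differ between libraries

        char[128] buf; // large enough for float.max with two decimals
        immutable len = snprintf(buf.ptr, buf.length, "%.2f", cast(double) f);
        if (len < 0 || len >= buf.length || format("%.2f", f) != buf[0 .. len])
            ++mismatches;
    }
    return mismatches;
}
```

A stride larger than 1 gives a sample such as the 1% mentioned at the start of the thread. For doubles the same loop over all 2^64 patterns is out of reach, which is why a proof is needed there.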

Oct 30 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Oct 30, 2019 at 07:11:14PM +0000, berni44 via Digitalmars-d wrote:
 On Wednesday, 30 October 2019 at 17:41:26 UTC, H. S. Teoh wrote:

[...]
 Formatting floating-point numbers is not a trivial task. It's easy
 to write up something that works for common cases, but it's not so
 easy to get something that gives the best results in *all* cases.

 
 I know, that this is something we all wish. Anyway, my goal is set
 somewhat lower: I'd like to replace the existing call to snprintf with
 something that is programmed in D and which should be pure, @safe and
 ctfeable. And ideally it should not be slower than snprintf.

Yeah, I've been waiting for a long time for a pure, @safe, and CTFE-able
floating point formatter in D.  What rumbu said about rounding mode,
though, makes me fear that pure may not be attainable if we're going to
be IEEE-compliant (since accessing the current rounding mode would be
technically impure).

Then again, the CTFE-able version can probably be made pure, since CTFE
cannot change rounding mode in the compiler's runtime environment, so if
we detect CTFE then we can just assume the default rounding mode.

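	auto formatFloat(F)(F f) {
		// formatFloatImpl stands for the actual worker; it needs the
		// value as well as the rounding mode to apply.
		if (__ctfe)
			return formatFloatImpl(f, FloatingPointControl.roundToNearest); // pure
		else
			return formatFloatImpl(f, FloatingPointControl.rounding); // impure
	}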

should do it.


 You probably should use the algorithms referenced above for your
 implementation,

 
 I read through the paper for the ryu algorithm and rejected it (at
 least for me; if someone else is going to implement it and file a PR
 that's fine). My reason for rejecting is, that the algorithm has not
 exactly the same goal as printf, which IMHO means, that it cannot be
 used here; and that it needs a lookup table, that is too large (300K
 for 128bit reals).

Why is it too large?  Couldn't you generate the table with CTFE? :-D

Or statically generate it and then import it, like std.uni does with the
various Unicode tables (see std.internal.unicode_*).


[...]
 Obviously I need to prove, that the algorithm is correct somehow.
 While this can be done for floats by running it on all numbers and
 comparing these results with the result of snprintf (or the result
 calculated by bc), for doubles and reals, this isn't possible anymore
 (a random sample can be tested anyway, but that's no proof).

Are we just copying whatever snprintf does? Is snprintf really a
reliable standard to go by?


 Anyway, I think, that the proof isn't hard to give. The current
 algorithm is short and straightforward. (And: When I implement one of
 the mentioned algorithms, it can still contain bugs, because I made a
 mistake somewhere.)

You don't necessarily have to implement grisu, et al, verbatim, but your
algorithm should at least gracefully handle the special cases and
potentially problematic cases cited in the papers.


T

-- 
Too many people have open minds but closed eyes.

Oct 30 2019

Rumbu <rumbu rumbu.ro> writes:

On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to 
 snprintf in std.format with an own method written in D. During 
 the last days I took a deeper look into this and meanwhile I've 
 got a function that works for floats (and probably also 
 doubles, but I haven't tested that yet and it should also work 
 with reals if ucent would be available; without ucent I need a 
 workaround for real or fall back to BigInt).

 [...]

According to ieee754-2008:

"5.12.2 External decimal character sequences representing finite 
numbers

[...]

For binary formats, all conversions of H significant digits or 
fewer round correctly according to the applicable rounding 
direction;"

Where H is 9 for single, 17 for double. IEEE 754 doesn't specify an 
H for reals.


That means that snprintf must use the current rounding mode that 
can be read using FloatingPointControl.rounding from std.math.
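
For reference, a small sketch of reading and temporarily changing the rounding mode with FloatingPointControl from std.math. The function currentRoundingName is a made-up helper; how a formatting routine would actually consume the mode is left open here.

```d
import std.math : FloatingPointControl;

/// Names the rounding mode that is currently active.
/// Hypothetical helper, for illustration only.
string currentRoundingName()
{
    immutable mode = FloatingPointControl.rounding;
    if (mode == FloatingPointControl.roundToNearest) return "nearest";
    if (mode == FloatingPointControl.roundDown)      return "down";
    if (mode == FloatingPointControl.roundUp)        return "up";
    return "toward zero";
}

unittest
{
    assert(currentRoundingName() == "nearest"); // the default mode
    {
        FloatingPointControl ctrl;
        ctrl.rounding = FloatingPointControl.roundDown;
        assert(currentRoundingName() == "down");
    } // the destructor of ctrl restores the previous mode
    assert(currentRoundingName() == "nearest");
}
```

Because the result depends on state that can change between two calls, a formatter that consults it behaves differently for the same arguments.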

Oct 30 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
 That means that snprintf must use the current rounding mode 
 that can be read using FloatingPointControl.rounding from 
 std.math.

Is it really a "must"? We are not completely bound by the IEEE 
standard and, if there are good reasons, might deviate from it. For 
example, comparing two floats with <= produces either "false" or 
"true" in D. According to IEEE there should be a third possible 
result, namely "not comparable". Having said this, it would be 
possible to implement it the way you describe, but probably at some 
cost (slower, and more lines of less readable code). I'll 
think about it.

Oct 30 2019

Rumbu <rumbu rumbu.ro> writes:

On Wednesday, 30 October 2019 at 19:28:44 UTC, berni44 wrote:
 On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
 That means that snprintf must use the current rounding mode 
 that can be read using FloatingPointControl.rounding from 
 std.math.

 Is it really a "must"? We are not completely bound by the IEEE 
 standard and, if good reasons are available, might reject it. 
 For example, comparing two floats with <= produces either 
 "false" or "true" in D. According to IEEE there should be a 
 third result possible, namely "not comparable". Having said 
 this, it would be possible to implement it the way you claim, 
 but probably at some cost (=slower, more and less easy readable 
 lines of code). I'll think about it.

I don't know the internals of your code, but I suppose that before 
"printing" you end up with an integer value and a base-ten 
exponent.

In this case rounding becomes a question of how do you interpret 
the remainder of a division by a power of ten.

Because I spent a lot of time figuring out how to format 
decimal numbers correctly, here is a piece of code I use 
to format decimal numbers depending on the rounding mode, 
fully compliant with the standard.

https://github.com/rumbu13/decimal/blob/a6bae32d75d56be16e82d37af0c8e4a7c08e318a/src/decimal/decimal.d#L7296

Hope this helps.

Oct 30 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 30 October 2019 at 20:29:34 UTC, Rumbu wrote:
 In this case rounding becomes a question of how do you 
 interpret the remainder of a division by a power of ten.

Unfortunately not. Think of 0.1500000000000001 rounded to one 
digit. It's clear that a remainder of 0-4 is rounded down and one of 
6-9 is rounded up. But to decide in the case of a 5, you might 
need to look at the following digits if the rounding mode tells you to 
round down in the case of 0.5...
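
To illustrate with scaled integers: the decision needs the complete discarded part, not only its first digit. The enum Half and the function classify below are hypothetical names for this sketch.

```d
enum Half { below, exactly, above }

/// Classifies the discarded part `rem` (0 <= rem < divisor) relative to
/// half of `divisor`. Hypothetical helper, for illustration only.
Half classify(ulong rem, ulong divisor) pure nothrow @safe @nogc
{
    if (2 * rem < divisor) return Half.below;
    if (2 * rem > divisor) return Half.above;
    return Half.exactly;
}

unittest
{
    // 0.1500000000000001 written as an integer scaled by 10^16.
    // Keeping one decimal digit discards the last 15 digits.
    enum ulong divisor = 1_000_000_000_000_000; // 10^15
    enum ulong digits  = 1_500_000_000_000_001;

    assert(digits / divisor == 1); // the digit that is kept
    // The first discarded digit is 5, but the trailing 1 puts it above half.
    assert(classify(digits % divisor, divisor) == Half.above);
    // Exactly 0.15 is a true tie; only then does the tie rule matter.
    assert(classify(1_500_000_000_000_000 % divisor, divisor) == Half.exactly);
}
```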

Oct 31 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Oct 30, 2019 at 07:28:44PM +0000, berni44 via Digitalmars-d wrote:
 On Wednesday, 30 October 2019 at 18:16:56 UTC, Rumbu wrote:
 That means that snprintf must use the current rounding mode that can
 be read using FloatingPointControl.rounding from std.math.

 
 Is it really a "must"? We are not completely bound by the IEEE
 standard and, if good reasons are available, might reject it. For
 example, comparing two floats with <= produces either "false" or
 "true" in D. According to IEEE there should be a third result
 possible, namely "not comparable".

For non-comparable floats x and y (i.e., at least one is a NaN), D has
the semantics:

	x < y		false
	x <= y		false
	x >= y		false
	x > y		false
	x == y		false
	x != y		true
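
The table can be checked directly. The function compareAll is a made-up helper that evaluates the six operators in the order listed above.

```d
/// Evaluates the six comparison operators in the order of the table.
/// Hypothetical helper, for illustration only.
bool[6] compareAll(double x, double y) pure nothrow @safe
{
    return [x < y, x <= y, x >= y, x > y, x == y, x != y];
}

unittest
{
    // At least one operand is NaN: everything is false except !=.
    assert(compareAll(double.nan, 1.0) == [false, false, false, false, false, true]);
    assert(compareAll(double.nan, double.nan) == [false, false, false, false, false, true]);
    // Ordinary operands for contrast.
    assert(compareAll(1.0, 2.0) == [true, true, false, false, false, true]);
}
```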

D used to have other comparison operators that handle various
NaN-related subtleties (the so-called "spaceship operators" because of
their alien appearance), but they were deprecated because nobody
understood them so nobody used them.

Having said that, though, I think we should try to conform to IEEE as
much as possible, and there better be very good reasons when we don't.


 Having said this, it would be possible to implement it the way you
 claim, but probably at some cost (=slower, more and less easy readable
 lines of code). I'll think about it.

As I have said, floating-point formatting is far from the trivial affair
that it appears to be on the surface.  It's not something to be
undertaken lightly, because it's full of complicated corner cases that
must be handled correctly.


T

-- 
Never wrestle a pig. You both get covered in mud, and the pig likes it.

Oct 30 2019

Stefan Koch <uplink.coder googlemail.com> writes:

On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to 
 snprintf in std.format with an own method written in D. During 
 the last days I took a deeper look into this and meanwhile I've 
 got a function that works for floats

If you could post that so I can have a look over the WIP that'd 
be nice.

Grisu2 also uses lookup tables, though for 52bit mantissa floats 
it's completely fine.

Oct 30 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 30 October 2019 at 20:46:07 UTC, Stefan Koch wrote:
 If you could post that so I can have a look over the WIP that'd 
 be nice.

See https://github.com/berni44/phobos/tree/printf

The function can be found at the end of std/format.d. I had to 
comment out some unittests, because the e and g qualifiers are not 
yet supported. I put several comments in the code, so I hope it's 
clear what happens at each step. If not, feel free to ask. (I'll be 
offline during the weekend.)

I also added a diagram for speed comparison. See 
https://github.com/berni44/phobos/blob/printf/diagram.png

Blue and green use "%.10f" while black and red use "%.100f". Blue 
and red is my function, while green and black is snprintf. The 
X-axis gives the value in the exponent from 0 to 255, the y-axis 
gives the average time in nanoseconds. The green bottom line at 
the left is approx at 600ns. For each exponent there have been 
approx 217886 numbers checked (the same set for both functions).

As you can see, at the left side, snprintf is faster, having an 
almost constant time, while the time of mine is slightly 
increasing when exponents get smaller. I scanned the snprintf 
implementation to find out, what they do - see my comment in the 
implementation for details.

Oct 31 2019

Walter Bright <newshound2 digitalmars.com> writes:

Replacing snprintf for floating point is very challenging, because:

1. people have been improving snprintf for decades
2. people expect precision and performance
3. the standard is snprintf, any credible implementation must be the same or better

To that end, you'll need to be familiar with the following:

754-2019 IEEE Standard for Floating Point Arithmetic
https://ieeexplore.ieee.org/document/8766229

Printing Floating-Point Numbers Quickly and Accurately with Integers
https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf

Printing Floating-Point Numbers
https://ranjitjhala.github.io/static/fp-printing-popl16.pdf

Ryu Fast Float To String Conversion
https://dl.acm.org/citation.cfm?id=3192369

https://github.com/ulfjack/ryu
http://www.zverovich.net/2019/02/11/formatting-floating-point-numbers.html
https://news.ycombinator.com/item?id=20181832

Jonathan Marler's D implementation of ryu:
https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

For historical interest, here's DMC's version, which was state of the art in the
1980's:

https://github.com/DigitalMars/dmc/blob/master/src/core/floatcvt.c

Oct 30 2019

Guillaume Piolat <first.last gmail.com> writes:

On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
 Replacing snprintf for floating point is very challenging, 
 because:

 1. people have been improving snprintf for decades
 2. people expect precision and performance
 3. the standard is snprintf, any credible implementation must 
 be the same or better


Moreover, actual printf implementations seem to depend upon the 
locale. This creates bugs (say "1,4" instead of "1.4"), so whether to 
replicate this behaviour depends on whether you want to be 
bug-compatible. We've been hit by that in `printed` when used with a 
Russian locale.

Oct 31 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via Digitalmars-d wrote:
 On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
 Replacing snprintf for floating point is very challenging, because:
 
 1. people have been improving snprintf for decades
 2. people expect precision and performance
 3. the standard is snprintf, any credible implementation must be the
 same or better

 
 Moreover, actual printf implementations seem to depend upon the
 locale.  This creates bugs (say "1,4" instead of "1.4") so this
 behaviour depends if you want to be bug-compatible. We've been hit by
 that in `printed` when used with a Russian locale.

*Is* it a bug, though?  Arguably, the reason snprintf was done that way
was precisely to support properly-formatted output in the current
locale. I.e., when outputting Russian text, the convention is to write
the decimal point with "," rather than ".". It would be considered wrong
or strange to write "1.4" instead of "1,4". This is important if you
want to support i18n in your program.

But if you're outputting to, say, JSON, then you *don't* ever want
"1,4", you only want "1.4".

Which leads me to think that these two should be separate format
specifiers. Unfortunately, I can see how this would force format() to be
impure, because to support checking the current locale implies accessing
global state, which is impure.


T

-- 
I'm still trying to find a pun for "punishment"...

Oct 31 2019

Jacob Carlborg <doob me.com> writes:

On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:

 Which leads me to think that these two should be separate 
 format specifiers.

I would put the localization in a completely different function.

 Unfortunately, I can see how this would force format() to be 
 impure, because to support checking the current locale implies 
 accessing global state, which is impure.

You could pass in the locale to the function, then it can be 
pure. Even more reason to have it as a separate function. I would 
say that should be best practice because you might want to run a 
program in a different locale than the global configured one.

I'm not sure if it's enough to look at the locale. On my computer 
(a Mac) I have configured it to have the language in English but 
the date, time, number and currency format to Swedish.

--
/Jacob Carlborg

Nov 01 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Fri, Nov 01, 2019 at 01:01:21PM +0000, Jacob Carlborg via Digitalmars-d
wrote:
 On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
 
 Which leads me to think that these two should be separate format
 specifiers.

 
 I would put the localization in a completely different function.

That would be a better solution. It would be different from snprintf,
though, and we'd have to document it well so that people can find it.


 Unfortunately, I can see how this would force format() to be impure,
 because to support checking the current locale implies accessing
 global state, which is impure.

 
 You could pass in the locale to the function, then it can be pure.
 Even more reason to have it as a separate function. I would say that
 should be best practice because you might want to run a program in a
 different locale than the global configured one.

+1.


 I'm not sure if it's enough to look at the locale. On my computer (a
 Mac) I have configured it to have the language in English but the
 date, time, number and currency format to Swedish.

[...]

I think it has to do with the LC_* environment variables, at least on a
*nix system. You can set LC_ALL to get the same settings across all
categories, or you can separately set one or more of the LC_* to get
different settings in each category. (Caveat: I've never actually done
this myself before, so I could be misunderstanding how it works.)


T

-- 
Famous last words: I wonder what will happen if I do *this*...

Nov 01 2019

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Friday, 1 November 2019 at 17:02:53 UTC, H. S. Teoh wrote:
 I think it has to do with the LC_* environment variables, at 
 least on a *nix system. You can set LC_ALL to get the same 
 settings across all categories, or you can separately set one 
 or more of the LC_* to get different settings in each category. 
 (Caveat: I've never actually done this myself before, so I 
 could be misunderstanding how it works.)

Yeah, POSIX, so POSIX-compliant C compilers should support it...

https://docs.oracle.com/cd/E19253-01/817-2521/overview-39/index.html

Other languages do not have to follow it, of course.

Nov 01 2019

Jacob Carlborg <doob me.com> writes:

On 2019-11-01 18:02, H. S. Teoh wrote:

 That would be a better solution. It would be different from snprintf,
 though, and we'd have to document it well so that people can find it.

It depends on what the goal is. If it is to have a 100% compatible drop-in 
replacement for snprintf, then we need to include the localization.

But if the goal is just to have a function, implemented in D, that 
converts values to a string, then we have the opportunity to make a 
better interface.

-- 
/Jacob Carlborg

Nov 02 2019

berni44 <dlang d-ecke.de> writes:

On Saturday, 2 November 2019 at 16:59:15 UTC, Jacob Carlborg 
wrote:
 On 2019-11-01 18:02, H. S. Teoh wrote:

 That would be a better solution. It would be different from 
 snprintf,
 though, and we'd have to document it well so that people can 
 find it.

 It depends on what the goal is. If it is to have a 100% 
 compatible drop-in replacement for snprintf, then we need to 
 include the localization.

 But if the goal is just to have a function, implemented in D, 
 that converts values to a string, then we have the 
 opportunity to make a better interface.

+1

That's actually what I ask myself all the time. I personally 
prefer the second approach.

And a similar question arises with the rounding problem, which 
is even a little more difficult, because the IEEE standard 
interferes here too.

Nov 04 2019

Guillaume Piolat <first.last gmail.com> writes:

On Saturday, 2 November 2019 at 16:59:15 UTC, Jacob Carlborg 
wrote:
 It depends on what the goal is. If it is to have a 100% 
 compatible drop-in replacement for snprintf, then we need to 
 include the localization.

 But if the goal is just to have a function, implemented in D, 
 that converts values to a string, then we have the 
 opportunity to make a better interface.

+1 this is important since we've had a localization bug and I 
suspect it's very easy to have such bugs.

Warning: `format` is affected too! (perhaps only when using the 
%f format specifier?)

https://github.com/AuburnSounds/printed/issues/22
ugly fix: 
https://github.com/AuburnSounds/printed/commit/797343c0fc213ea34aa5b79b61cdc1164ae189df


There is a non-zero chance that people _are_ relying on `format` 
and `snprintf` being localization-aware. So a "drop-in" 
replacement needs to deal with this mess either by being 
bug-compatible or by not being drop-in.

Nov 04 2019

Uknown <sireeshkodali1 gmail.com> writes:

On Friday, 1 November 2019 at 17:02:53 UTC, H. S. Teoh wrote:
 On Fri, Nov 01, 2019 at 01:01:21PM +0000, Jacob Carlborg via 
 Digitalmars-d wrote:
 On Thursday, 31 October 2019 at 15:58:08 UTC, H. S. Teoh wrote:
 [...]


https://github.com/mpv-player/mpv/commit/1e70e82baa9193f6f027338b0fab0f5078971fbe

This is kind of relevant to the whole issue with locales, and how 
they simply don't work as they should. Probably best not to 
replicate them at all, and instead just pass the necessary 
localisation in to the format function, if needed.

Nov 02 2019

Jonathan M Davis <newsgroup.d jmdavisprog.com> writes:

On Thursday, 31 October 2019 09:58:08 MDT H. S. Teoh via Digitalmars-d 
wrote:
 On Thu, Oct 31, 2019 at 10:14:59AM +0000, Guillaume Piolat via
 Digitalmars-d wrote:
 On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
 Replacing snprintf for floating point is very challenging, because:

 1. people have been improving snprintf for decades
 2. people expect precision and performance
 3. the standard is snprintf, any credible implementation must be the
 same or better

 Moreover, actual printf implementations seem to depend on the
 locale.  This creates bugs (say "1,4" instead of "1.4"), so whether to
 keep this behaviour depends on whether you want to be bug-compatible.
 We've been hit by that in `printed` when used with a Russian locale.

 *Is* it a bug, though?  Arguably, the reason snprintf was done that way
 was precisely to support properly-formatted output in the current
 locale. I.e., when outputting Russian text, the convention is to write
 the decimal point with "," rather than ".". It would be considered wrong
 or strange to write "1.4" instead of "1,4". This is important if you
 want to support i18n in your program.

 But if you're outputting to, say, JSON, then you *don't* ever want
 "1,4", you only want "1.4".

 Which leads me to think that these two should be separate format
 specifiers. Unfortunately, I can see how this would force format() to be
 impure, because to support checking the current locale implies accessing
 global state, which is impure.

The version of format that takes the format specifier as a compile-time
argument shouldn't have that problem, but the one that took it as a runtime
argument certainly would.

- Jonathan M Davis

Oct 31 2019

berni44 <dlang d-ecke.de> writes:

On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat 
wrote:
 Moreover, actual printf implementations seem to depend on 
 the locale. This creates bugs (say "1,4" instead of "1.4"), so 
 whether to keep this behaviour depends on whether you want to be 
 bug-compatible. We've been hit by that in `printed` when used with 
 a Russian locale.

Meanwhile, my implementation for the f (and F) qualifier is 
(almost) finished. The locale stuff is still missing, though, and I 
haven't managed to implement it. Maybe someone can help me:

a) I need to create some tests. As far as I know, I have to run 
"export LANG=de_DE.UTF-8" (in bash, Debian) to make it use the 
German locale, which should replace the dot with a comma. 
Unfortunately writefln!"%.10f"(0.1) still writes a dot instead of 
the expected ",". Instead of "LANG" I tried several other variables, 
like LC_ALL or LC_NUMERIC. Any idea what I'm doing wrong here?

b) How to query the current locale from D? Actually I only need 
the number-separator in the current locale as a dchar. I found 
core.stdc.locale but do not know how to use it.

Nov 06 2019

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
 On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat 
 wrote:
 [...]

 Meanwhile, my implementation for the f (and F) qualifier is 
 (almost) finished. The locale stuff is still missing, though, and 
 I haven't managed to implement it. Maybe someone can help me:

 a) I need to create some tests. As far as I know, I have to 
 run "export LANG=de_DE.UTF-8" (in bash, Debian) to make it 
 use the German locale, which should replace the dot with a comma. 
 Unfortunately writefln!"%.10f"(0.1) still writes a dot instead 
 of the expected ",". Instead of "LANG" I tried several other 
 variables, like LC_ALL or LC_NUMERIC. Any idea what I'm doing 
 wrong here?

 b) How to query the current locale from D? Actually I only need 
 the number-separator in the current locale as a dchar. I found 
 core.stdc.locale but do not know how to use it.

I think the best way to go is to make it locale-independent and 
simply provide a way for the user to specify the decimal separator 
(and other related locale details, if any).
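
A minimal sketch of what such a locale-independent interface could look like. The name `formatFixed` and its parameters are invented for illustration and are not part of Phobos. The sketch also assumes that `format` itself writes "." as the decimal point (which holds as long as nothing in the program has called setlocale), and simply swaps in the separator the caller asks for:

```d
import std.array : replace;
import std.format : format;

/// Hypothetical helper (not a Phobos function): formats `value` with
/// `precision` digits after the decimal point and uses the separator
/// chosen by the caller instead of consulting any global locale.
string formatFixed(double value, int precision, string decimalSeparator = ".")
{
    // "%.*f" takes the precision from the argument list.
    // Assumption: format writes "." as the decimal point here.
    string formatted = format("%.*f", precision, value);
    return formatted.replace(".", decimalSeparator);
}
```

With this, `formatFixed(3.141592, 2, ",")` gives "3,14" and `formatFixed(3.141592, 2)` gives "3.14". The separator is an ordinary parameter rather than global state, so a caller can mix both conventions in the same output.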

Nov 06 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Nov 06, 2019 at 04:17:32PM +0000, Petar via Digitalmars-d wrote:
 On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:

[...]
 b) How to query the current locale from D? Actually I only need the
 number-separator in the current locale as a dchar. I found
 core.stdc.locale but do not know how to use it.

 
 I think the best way to go is to make it locale-independent and simply
 provide a way for the user to specify the decimal separator (and other
 related locale details, if any).

Yes, I think in the long run this will be the more viable approach.
Depending on locale as a global state is problematic because it forces
formatting to be impure, and also forces users to implement hacks when
they need to temporarily change the locale. E.g., in a system like
snprintf, if you need to format German text with snippets of English
quotations, you will have to temporarily override LC_* somehow in order
to print a number with two different separators, or hack it with string
postprocessing, etc..

It's better to let the user pass in the desired separator as a parameter
-- the ',' flag in std.format already does this via the optional '?'
modifier, for example:

	writefln("%,?d", '_', 12345678); // 12_345_678
	writefln("%,?d", '|', 12345678); // 12|345|678

Conceivably one could extend the '.' flag with a '?' modifier as well,
so something like this:

	writefln("%.2?f", ',', 3.141592); // 3,14
	writefln("%.2?f", '_', 3.141592); // 3_14
	writefln("%.2?f", ':', 3.141592); // 3:14

Then programs that want to support locales can just do this:

	writefln("%.2?f", curLocale.separator, 3.141592);


T

-- 
I don't trust computers, I've spent too long programming to think that they can
get anything right. -- James Miller

Nov 06 2019

lithium iodate <whatdoiknow doesntexist.net> writes:

On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
 Yes, I think in the long run this will be the more viable 
 approach. Depending on locale as a global state is problematic 
 because it forces formatting to be impure, and also forces 
 users to implement hacks when they need to temporarily change 
 the locale. E.g., in a system like snprintf, if you need to 
 format German text with snippets of English quotations, you 
 will have to temporarily override LC_* somehow in order to 
 print a number with two different separators, or hack it with 
 string postprocessing, etc..

All while setlocale doesn't even provide any sort of 
thread-safety!

Nov 06 2019

Rumbu <rumbu rumbu.ro> writes:

On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:

 Then programs that want to support locales can just do this:

 	writefln("%.2?f", curLocale.separator, 3.141592);

For %f, the decimal separator is not the only locale specific 
info. Full list:

-decimal separator
-negative pattern
-positive pattern
-infinity symbol
-nan symbol
-digit shapes, especially for Arabic and Thai


For %d and %g there are more, such as digit grouping and the group separator.

Nov 06 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Nov 06, 2019 at 06:21:43PM +0000, Rumbu via Digitalmars-d wrote:
 On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
 
 Then programs that want to support locales can just do this:
 
 	writefln("%.2?f", curLocale.separator, 3.141592);


[...]
 For %f, the decimal separator is not the only locale specific info.
 Full list:
 
 -decimal separator
 -negative pattern
 -positive pattern
 -infinity symbol
 -nan symbol
 -digit shapes, especially for Arabic and Thai
 
 For %d and %g there are more, such as digit grouping and the group separator.

[...]

Haha, wonderful. Don't you just love it when i18n consistently throws a
monkey wrench into any simplistic scheme?  Almost makes me want to
suggest that we need std.i18n before we can implement anything sane
i18n-wise.

But since that's not gonna happen in the foreseeable future, and I'm
sick and tired of the trend around these parts of letting the perfect be
the enemy of the good, I'm going to propose that we just forget about
i18n and just implement formatting for an English-specific locale. If
users *really* want to support locales, just use %s with a wrapper
struct with a toString method that does whatever it takes to get the
right output. I've used this pattern for various problems with
formatting complex objects, and it works fairly well:

	struct i18nFmt {
		float f; // or double, real, whatever
		int precision;
		... // any other params here, like decimal point format, etc.

		void toString(S)(S sink)
			if (isOutputRange!(S, char))
		{
			... // do whatever you need to do here to
			    // produce the right output
		}
	}

	...
	float myData = ...;

	// just use %s instead of some incomprehensible over-engineered
	// crap like %1:3,$13&.*^_7?f
	output = format("%s", myData.i18nFmt);

	// or:
	output2 = format("%s", myData.i18nFmt(curLocale.precision, ...
				/* whatever else */));

This way you lift the complexity out of std.format where it really
doesn't belong, and make it possible to plug in different locale
handling modules in its place. This even opens the door for a future
std.i18n that simply exports a bunch of these locale-dependent proxy
formatters that you could just append to your data items. Much more
extensible and flexible than trying to shoehorn everything into
std.format, which will inevitably turn it into a nasty hairball of
intractable dependencies that's impossible to make pure, nothrow, etc..
(Oh wait, it's already such a hairball. :-D  Let's not make it worse!)
And it makes std.format more pay-as-you-go; if you never need to use
std.i18n it won't pull it in as a dependency just because it needs to
support an obscure format specifier that you don't actually use.
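
For reference, a compilable sketch of the wrapper pattern outlined above. Everything here is hypothetical: `LocalizedFloat`, its fields and the `localized` helper are made-up names, and the only locale detail handled is the decimal separator. It also assumes that `format` writes "." as the decimal point before the replacement:

```d
import std.array : replace;
import std.format : format;

/// Hypothetical proxy type; name and fields are invented for this sketch.
struct LocalizedFloat
{
    double value;
    int precision;
    string decimalSeparator;

    /// std.format picks up this sink-based toString when formatting with %s.
    void toString(scope void delegate(const(char)[]) sink) const
    {
        // Assumption: format writes "." as the decimal point here.
        sink(format("%.*f", precision, value).replace(".", decimalSeparator));
    }
}

/// Hypothetical convenience function so call sites can write
/// myData.localized(2, ",") via UFCS.
LocalizedFloat localized(double value, int precision, string decimalSeparator)
{
    return LocalizedFloat(value, precision, decimalSeparator);
}
```

A call such as `format("%s", myData.localized(2, ","))` then keeps all locale handling outside of std.format; a richer proxy could carry digit grouping or NaN/infinity symbols in the same way.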


T

-- 
Being forced to write comments actually improves code, because it is easier to
fix a crock than to explain it. -- G. Steele

Nov 06 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 6 November 2019 at 18:21:43 UTC, Rumbu wrote:
 For %f, the decimal separator is not the only locale specific 
 info. Full list:

 -decimal separator
 -negative pattern
 -positive pattern
 -infinity symbol
 -nan symbol
 -digit shapes, especially for Arabic and Thai


 For %d and %g there are more, such as digit grouping and the 
 group separator.

snprintf only uses the decimal separator (and grouping but that's 
not used inside format, the grouping is done separately there). 
All else is ignored by snprintf.

Nov 07 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 6 November 2019 at 17:28:58 UTC, H. S. Teoh wrote:
 Yes, I think in the long run this will be the more viable 
 approach. Depending on locale as a global state is problematic 
 because it forces formatting to be impure, and also forces 
 users to implement hacks when they need to temporarily change 
 the locale. E.g., in a system like snprintf, if you need to 
 format German text with snippets of English quotations, you 
 will have to temporarily override LC_* somehow in order to 
 print a number with two different separators, or hack it with 
 string postprocessing, etc..

My current approach is a pure and @safe function that does the 
formatting but ignores the locale completely. This function is 
called from formatValueImpl and could be modified there, if 
desired.

Currently (I want to take small steps), the function can only be 
used for the f (and F) specifier (and only for float and double). 
For all other specifiers/types snprintf is still called. That 
might result in different behaviour depending on the specifier 
and the type. I'd prefer to make it behave identically.

Having said this, I completely agree that it would be better if 
format ignored the locale and let the user handle this in a 
wrapper, if desired.

Nov 07 2019

Jacob Carlborg <doob me.com> writes:

On 2019-11-06 17:17, Petar Kirov [ZombineDev] wrote:

 I think the best way to go is to make it locale-independent and simply 
 provide a way for the user to specify the decimal separator (and other 
 related locale details, if any).

In my experience, I think it's best to leave the locale support to a 
separate API. The "snprintf" API is never going to be flexible enough. 
No one is using "snprintf" for serious localization.

It's not just the decimal point that needs to be localized. There are 
various other number related things that need localization.

Just have a look at the number formatter in Apple's API [1]. It's pretty 
big. Then they have separate formatters for currency, length, mass, 
interval and more.

[1] 
https://developer.apple.com/documentation/foundation/nsnumberformatter?language=objc

-- 
/Jacob Carlborg

Nov 06 2019

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Wed, Nov 06, 2019 at 07:43:06PM +0100, Jacob Carlborg via Digitalmars-d
wrote:
[...]
 In my experience, I think it's best to leave the locale support to a
 separate API. The "snprintf" API is never going to be flexible enough.
 No one is using "snprintf" for serious localization.
 
 It's not just the decimal point that needs to be localized. There are
 various other number related things that need localization.
 
 Just have a look at the number formatter in Apple's API [1]. It's
 pretty big. Then they have separate formatters for currency, length,
 mass, interval and more.

[...]

Yeah, after thinking about this more, I've come to the same conclusion.
Just use %s for anything that depends on complex locale-dependent
configuration, and wrap your data item in a proxy object that does
whatever it takes to make it work.

	float myQuantity = ...;
	auto output = format("%s", myQuantity.localeFmt(...));

where localeFmt is some function or wrapper struct overloading toString
that does whatever it takes to format the data in a locale-specific way.


T

-- 
"Hi." "'Lo."

Nov 06 2019

Andre Pany <andre s-e-a-p.de> writes:

On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
 On Thursday, 31 October 2019 at 10:14:59 UTC, Guillaume Piolat 
 wrote:
 Moreover, actual printf implementations seems to depend upon 
 the locale. This creates bugs (say "1,4" instead of "1.4") so 
 this behaviour depends if you want to be bug-compatible. We've 
 been hit by that in `printed` when used with a Russian locale.

 Meanwhile, my implementation for the f (and F) qualifier is 
 (almost) finished. The locale stuff is still missing, though, and 
 I haven't managed to implement it. Maybe someone can help me:

 a) I need to create some tests. As far as I know, I have to 
 run "export LANG=de_DE.UTF-8" (in bash, Debian) to make it 
 use the German locale, which should replace the dot with a comma. 
 Unfortunately writefln!"%.10f"(0.1) still writes a dot instead 
 of the expected ",". Instead of "LANG" I tried several other 
 variables, like LC_ALL or LC_NUMERIC. Any idea what I'm doing 
 wrong here?

 b) How to query the current locale from D? Actually I only need 
 the number-separator in the current locale as a dchar. I found 
 core.stdc.locale but do not know how to use it.

This question comes late, but did you consider just doing a 
1-to-1 translation of snprintf from C to D? Of course the second 
step would be to provide an idiomatic D version with the 
mentioned suggestions.
But having a translation would already be fantastic.

Kind regards
Andre

Nov 06 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 6 November 2019 at 16:54:25 UTC, Andre Pany wrote:
 This question comes late, but did you consider just doing a 
 1-to-1 translation of snprintf from C to D?

I scanned through the implementation of snprintf several times 
while I wrote the replacement. I think the main algorithm is 
quite similar, apart from some speed improvements for numbers 
close to zero, which turned out to be quite nasty in detail (and 
which I have therefore skipped for now).

By the way: a 1-to-1 translation is not something I could 
do, because my knowledge of C is very limited and the algorithm 
contains lots of calls to functions that I would not know where 
to look up or how to replace with D functions.

Nov 07 2019

lithium iodate <whatdoiknow doesntexist.net> writes:

On Wednesday, 6 November 2019 at 13:25:38 UTC, berni44 wrote:
 a) I need to create some tests. As far as I know, I have to 
 run "export LANG=de_DE.UTF-8" (in bash, Debian) to make it 
 use the German locale, which should replace the dot with a comma. 
 Unfortunately writefln!"%.10f"(0.1) still writes a dot instead 
 of the expected ",". Instead of "LANG" I tried several other 
 variables, like LC_ALL or LC_NUMERIC. Any idea what I'm doing 
 wrong here?

If D wishes to behave the same as C, this is correct behavior. C 
requires the locale "C" to be activated at program startup.
The C-way to use the environment's locale is to call setlocale 
for the relevant category with an empty string for the locale 
value.
e. g. setlocale(LC_ALL, "")

 b) How to query the current locale from D? Actually I only need 
 the number-separator in the current locale as a dchar. I found 
 core.stdc.locale but do not know how to use it.

You can query the current locale of a given category by calling 
setlocale with a null-pointer for the locale, it will return the 
currently set locale as a C-string.

The formatting-information is returned by localeconv(). Not sure 
why the docs don't show the members of lconv, but it contains 
decimal_point, which is a C-string of the decimal separator.

	setlocale(LC_ALL, "de_DE.UTF-8");
	localeconv.decimal_point.fromStringz.writeln;

prints ","
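
Put together as a slightly fuller sketch, using only `setlocale` and `localeconv` from core.stdc.locale. The three helper names are made up for illustration, and whether a locale such as de_DE.UTF-8 is actually installed depends on the system:

```d
import core.stdc.locale : LC_NUMERIC, localeconv, setlocale;
import std.string : fromStringz;

/// Decimal separator of the currently active C locale, as a D string.
string currentDecimalPoint()
{
    // localeconv returns a pointer to a static lconv struct; copy the
    // C string because later setlocale/localeconv calls may overwrite it.
    return localeconv().decimal_point.fromStringz.idup;
}

/// Name of the active LC_NUMERIC locale; a null locale argument only queries.
string currentNumericLocale()
{
    return setlocale(LC_NUMERIC, null).fromStringz.idup;
}

/// Switches LC_NUMERIC to whatever the environment (LANG, LC_*) asks for.
/// Returns false if the requested locale is not available.
bool useEnvironmentLocale()
{
    return setlocale(LC_NUMERIC, "") !is null;
}
```

At program start `currentNumericLocale()` reports "C" and `currentDecimalPoint()` reports ".", which is why exporting LANG alone changes nothing. Only after `useEnvironmentLocale()` succeeds does the environment's separator show up.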

Nov 06 2019

berni44 <dlang d-ecke.de> writes:

On Thursday, 31 October 2019 at 01:09:14 UTC, Walter Bright wrote:
 To that end, you'll need to be familiar with the following:

Thanks for that list. I'll have a look, when I find the time to 
do so.

 754-2019 IEEE Standard for Floating Point Arithmetic
 https://ieeexplore.ieee.org/document/8766229

Unfortunately I cannot download this file. I've got no company 
listed there and I'm not willing to pay for it...

 Ryu Fast Float To String Conversion
 https://dl.acm.org/citation.cfm?id=3192369

 https://github.com/ulfjack/ryu
 [...]

 Jonathan Marler's D implementation of ryu:
 https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

I already read the paper about ryu. IMHO it's of no use here, 
because the speed advantage comes from being more "inaccurate" 
than snprintf. Ryu is designed for round-tripping, while snprintf 
prints as many digits as the user asks for (even when they 
contain no more information). The same holds for the grisu variants.

Oct 31 2019

Robert Schadek <rschadek symmetryinvestments.com> writes:

On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to 
 snprintf in std.format with a custom method written in D.

I suggested it because formatImpl!float is not pure, which makes, 
for instance, std.json impure, along with a few others.

Oct 31 2019

berni44 <dlang d-ecke.de> writes:

On Wednesday, 30 October 2019 at 13:44:52 UTC, berni44 wrote:
 In PR 7222 [1] Robert Schadek suggested replacing the call to 
 snprintf in std.format with a custom method written in D. [...]

Meanwhile I filed a first PR: 
https://github.com/dlang/phobos/pull/7264 - only part of a 
complete replacement is achieved with that: Only the 'f' 
qualifier is replaced and that only for float and double. But 
it's a start and I want to make small steps.

Many thanks to all of you who replied in this thread or gave 
hints elsewhere. This helped a lot. :-)

Nov 08 2019

berni44 <dlang d-ecke.de> writes:

On Friday, 8 November 2019 at 14:42:29 UTC, berni44 wrote:
 Meanwhile I filed a first PR: 
 https://github.com/dlang/phobos/pull/7264 - only part of a 
 complete replacement is achieved with that: Only the 'f' 
 qualifier is replaced and that only for float and double. But 
 it's a start and I want to make small steps.

Update: While this first PR is still waiting to be reviewed, a 
second PR (the same for the '%a' qualifier) was merged last 
week. Today I filed a third PR (for the '%e' qualifier).

The '%g' qualifier has to wait until these two PRs are merged, 
because it depends strongly on them. With the help of Petar 
Kirov [ZombineDev] I meanwhile also managed to make the whole 
thing CTFE-able. But this also has to wait for the two PRs 
mentioned above.

Next steps will be some speed optimizations for small exponents 
(this already works on paper, but I haven't implemented and tested 
it yet) and for large exponents (only a vague idea so far).

Dec 14 2019

berni44 <dlang d-ecke.de> writes:

On Saturday, 14 December 2019 at 08:44:21 UTC, berni44 wrote:
 Update: While this first PR is still waiting to be reviewed, 
 a second PR (the same for the '%a' qualifier) was merged last 
 week. Today I filed a third PR (for the '%e' qualifier).

 The '%g' qualifier has to wait until these two PRs are merged, 
 because it depends strongly on them. With the help of 
 Petar Kirov [ZombineDev] I meanwhile also managed to make the 
 whole thing CTFE-able. But this also has to wait for the two PRs 
 mentioned above.

I'm still waiting for a review... The algorithm isn't really 
complicated. It's essentially successive division/multiplication 
by 10. Unfortunately, it has to work with numbers larger than 
ulong, and there is all the stuff with the flags, precision and 
width... But it should be doable anyway, IMHO.

* https://github.com/dlang/phobos/pull/7264
* https://github.com/dlang/phobos/pull/7318
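
To illustrate the "successive multiplication/division by 10 on numbers larger than ulong" idea, here is a rough sketch. It is not the code from the PRs above: it leans on std.bigint as a stand-in for the wide arithmetic, the name `exactDecimal` is made up, and it prints every digit of the exact value without any rounding, flags, precision or width handling:

```d
import std.bigint : BigInt, toDecimalString;

/// Hypothetical helper: exact decimal expansion of a finite double.
string exactDecimal(double x)
{
    // Split the IEEE 754 binary64 bit pattern into sign, exponent, fraction.
    immutable ulong bits = *cast(const(ulong)*) &x;
    immutable bool negative = (bits >> 63) != 0;
    immutable int biasedExp = cast(int) ((bits >> 52) & 0x7FF);
    immutable ulong fraction = bits & ((1UL << 52) - 1);
    assert(biasedExp != 0x7FF, "NaN and infinity are not handled in this sketch");

    // |x| = mantissa * 2^exp2; subnormals have no implicit leading 1 bit.
    immutable ulong mantissa = biasedExp == 0 ? fraction : (fraction | (1UL << 52));
    immutable int exp2 = (biasedExp == 0 ? 1 : biasedExp) - 1075;

    BigInt integerPart = BigInt(mantissa);
    BigInt remainder = BigInt(0);
    BigInt denominator = BigInt(1);
    if (exp2 >= 0)
        integerPart <<= exp2;               // value is a (possibly huge) integer
    else
    {
        denominator <<= -exp2;              // value = mantissa / 2^(-exp2)
        remainder = integerPart % denominator;
        integerPart /= denominator;
    }

    string result = (negative ? "-" : "") ~ toDecimalString(integerPart);
    if (remainder != 0)
        result ~= ".";
    // Each multiplication by 10 shifts the next decimal digit in front
    // of the denominator; the quotient is that digit (0 to 9).
    while (remainder != 0)
    {
        remainder *= 10;
        result ~= cast(char) ('0' + (remainder / denominator).toLong());
        remainder %= denominator;
    }
    return result;
}
```

For example, `exactDecimal(0.5)` gives "0.5", while `exactDecimal(0.1)` gives the full 55-digit fraction of the double nearest to 0.1. A real %f implementation would stop at the requested precision and round there.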

Jan 24 2020
