digitalmars.D - Internationalization support and format strings

reply Bruno Haible <bruno clisp.org> writes:
Hi,

The GNU gettext package contains tools for internationalization, 
enabling a programmer to make their package "speak" to the users 
in their specific language.

GNU gettext so far supports a number of programming languages, see
https://www.gnu.org/software/gettext/manual/html_node/List-of-Programming-Languages.html

I thought it would be a good idea to make GNU gettext support 
also the D programming language. This is a registered wish list 
item since 2017: https://savannah.gnu.org/bugs/?51291 . On the D 
side, a rudimentary interface to the gettext() function in the 
GNU C library exists as well: 
https://code.dlang.org/packages/libintl

I am now trying to implement this support. I am already done with 
the xgettext support (parsing D source code and extracting 
messages). But from the programming language, this support also 
needs format strings with positions (so that translators can 
reorder arguments in their translations of format strings).
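
To illustrate why positions matter, here is a minimal sketch using the `%N$s` syntax of std.format. The helper name `render`, the English message, and its German translation are made up for this example and are not taken from any real message catalog.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: formats a (possibly translated) message with two arguments.
string render(string fmt, string oldName, string newName)
{
    return format(fmt, oldName, newName);
}

void main()
{
    // Message as the programmer might write it (illustrative only).
    writeln(render("%1$s was replaced with %2$s", "foo", "bar"));
    // A translation that needs the two arguments in the opposite order.
    writeln(render("%2$s ersetzt %1$s", "foo", "bar"));
}
```

The call site stays the same in both cases; only the format string, which would come from the translation catalog, changes the order in which the arguments appear.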

D has format strings in its standard library (phobos): 
https://dlang.org/library/std/format.html
But this format string facility has 4 major bugs:
https://github.com/dlang/phobos/issues/10699
https://github.com/dlang/phobos/issues/10711
https://github.com/dlang/phobos/issues/10712
https://github.com/dlang/phobos/issues/10713

Two questions:

1) How can this be, that a programming language that is more than 
20 years old and that is integrated into GCC for 6 years, has a 
format string facility that is riddled with bugs? Is D only a 
playground for compiler hackers and not used for real 
applications, and thus the standard library is "uninteresting"?

2) How should I continue? What advice would you give me? Should I 
wait until the format string bugs are fixed (and if so, in which 
time frame)? Or should I cancel the GNU gettext support for D ?

Best regards. I don't want to offend anyone. If you feel an 
offense, please excuse it with frustration on my side.
Mar 24
next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 25/03/2025 11:39 AM, Bruno Haible wrote:
 Hi,
 
 The GNU gettext package contains tools for internationalization, 
 enabling a programmer to make their package "speak" to the users in 
 their specific language.
 
 GNU gettext so far supports a number of programming languages, see
 https://www.gnu.org/software/gettext/manual/html_node/List-of-Programming-Languages.html
 
 I thought it would be a good idea to make GNU gettext support also the D 
 programming language. This is a registered wish list item since 2017: 
 https://savannah.gnu.org/bugs/?51291 . On the D side, a rudimentary 
 interface to the gettext() function in the GNU C library exists as well: 
 https://code.dlang.org/packages/libintl
 
 I am now trying to implement this support. I am already done with the 
 xgettext support (parsing D source code and extracting messages). But 
 from the programming language, this support also needs format strings 
 with positions (so that translators can reorder arguments in their 
 translations of format strings).
 
 D has format strings in its standard library (phobos):
 https://dlang.org/library/std/format.html
 But this format string facility has 4 major bugs:
 https://github.com/dlang/phobos/issues/10699
Needs to be discussed. Requires a sentinel value, and right now it's typed as a ubyte.
 https://github.com/dlang/phobos/issues/10711
I think it's more that the docs are wrong here, rather than the implementation.
 https://github.com/dlang/phobos/issues/10712
I'm not sure about this one. It looks funky, but at least it's easy enough to work around. I'll leave it to someone else.
 https://github.com/dlang/phobos/issues/10713
https://github.com/dlang/phobos/pull/10714
Mar 24
next sibling parent Bruno Haible <bruno clisp.org> writes:
Richard (Rikki) Andrew Cattermole wrote:
 https://github.com/dlang/phobos/issues/10713
https://github.com/dlang/phobos/pull/10714
Thanks for handling this one.
 https://github.com/dlang/phobos/issues/10711
I think it's more that the docs are wrong here, rather than the implementation.
Hmm, you mean, instead of specifying a fixed order:

    Parameters:
        Position Flags Width Precision Separator

the spec should specify arbitrary order?

    Parameters:
        empty
        Parameter Parameters
    Parameter:
        Position
        Flags
        Width
        Precision
        Separator

If that is intended, then

1) Is it valid to specify two positions, two widths, two precisions, or two separators? For example
       %1$2$d
       %.3.5d
       %,3,5d
   And if it is valid, does the first position/width/precision/separator matter, or the last one?

2) Since Flags can start with a digit 0, how do you disambiguate Flags after Width, Precision, or Separator? For example
       %40d
       %.50d
       %,50d

The current spec, with the fixed order, is better at avoiding these ambiguities.
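
As a small illustration of the digit 0 question, the sketch below shows how std.format treats a 0 directly after the % (a flag) compared with a 0 later in a digit sequence (part of the width). The helper name `fmtInt` is hypothetical and exists only to keep the calls short.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper, only here to keep the calls short.
string fmtInt(string spec, int value)
{
    return format(spec, value);
}

void main()
{
    writeln("[", fmtInt("%05d", 42), "]"); // leading 0 is a flag: zero-padded, width 5
    writeln("[", fmtInt("%5d", 42), "]");  // no flag: space-padded, width 5
    writeln(fmtInt("%40d", 42).length);    // here 40 is read as a single width
}
```

With arbitrary ordering, a directive such as %40d could in principle also be read as width 4 followed by flag 0, which is the ambiguity described above.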
 https://github.com/dlang/phobos/issues/10712
Richard (Rikki) Andrew Cattermole wrote:
 I'm not sure about this one. It looks funky, but at least it's 
 easy enough to work around.
Such a workaround does not help me with the internationalization. The situation with the internationalization is:

1) The programmer specifies a format string as a gettext() argument. For instance,
       gettext("%s is replaced with %*s")
   or
       "%s is replaced with %*s".gettext

2) The translator decides whether they need reordering, and thus translates
       "%s is replaced with %*s"
   with
       "%3$*2$d ersetzt %1$s"

3) The GNU msgfmt program verifies that the translator's translation "matches", based on the specification of format strings.

There is no programmer that could add a workaround, since the programmer is not involved after step 1. And the translator usually does not try their translations "live".

So, what is really needed here is not a possible workaround but an implementation of std.format that is in sync with its specification.
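
To make step 3 more concrete, here is a toy sketch of what a "matches" check could look like. It is not msgfmt's actual algorithm: the functions `directives` and `sameDirectives` are hypothetical, and the scanner only understands `%s` and `%d` with an optional `N$` position (no flags, widths, precisions, or `*`).

```d
import std.algorithm.sorting : sort;
import std.ascii : isDigit;
import std.conv : text;
import std.stdio : writeln;

// Toy scanner: returns a sorted list of "argument:conversion" entries,
// e.g. ["1:s", "2:s"]. Unnumbered directives are numbered in order.
string[] directives(string fmt)
{
    string[] result;
    size_t next = 1; // implicit argument counter
    for (size_t i = 0; i < fmt.length; i++)
    {
        if (fmt[i] != '%') continue;
        i++;
        if (i >= fmt.length) break;
        if (fmt[i] == '%') continue; // "%%" is a literal percent sign

        // Try to read an explicit position of the form "N$".
        immutable start = i;
        size_t n = 0;
        while (i < fmt.length && isDigit(fmt[i]))
        {
            n = n * 10 + (fmt[i] - '0');
            i++;
        }
        size_t pos;
        if (i > start && i < fmt.length && fmt[i] == '$')
        {
            pos = n;
            i++;
        }
        else
        {
            i = start;    // no position given: fall back to implicit numbering
            pos = next++;
        }
        if (i >= fmt.length) break;
        result ~= text(pos, ":", fmt[i]);
    }
    sort(result);
    return result;
}

// A translation "matches" (in this toy sense) if it uses the same arguments
// with the same conversions, in any order.
bool sameDirectives(string msgid, string msgstr)
{
    return directives(msgid) == directives(msgstr);
}

void main()
{
    writeln(sameDirectives("%s is replaced with %s", "%2$s ersetzt %1$s"));
    writeln(sameDirectives("%s is replaced with %s", "%2$s ersetzt %3$s"));
}
```

A real checker has to follow the full format string grammar, which is why it depends on the implementation and the specification agreeing with each other.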
Mar 25
prev sibling parent Bruno Haible <bruno clisp.org> writes:
Richard (Rikki) Andrew Cattermole wrote:
 https://github.com/dlang/phobos/issues/10711
I think it's more that the docs are wrong here, rather than the implementation.
Another reason why it's better to keep a fixed order of

    Position Flags Width Precision Separator

is the runtime execution (implemented in std/format/write.d, function formattedWrite). This function currently processes width, precision, separators in that order.

Now, think of a format string such as

    "%,*.**d"

The programmer would expect that argument 1 is the separator digits, argument 2 is the precision, argument 3 is the width, and argument 4 is the value to be formatted.

If you don't change formattedWrite, it will actually use argument 1 for the width, argument 2 for the precision, and argument 3 for the separator digits, which doesn't match programmer expectations.

Whereas if you change formattedWrite to use the arguments in the order in which they were referenced in the format string, you are slowing down the formatting at runtime.
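
The order in which `*` arguments are consumed can be seen in a simpler case that uses only width and precision. This is a minimal sketch; the helper name `fixedPoint` is hypothetical, and the separator case from the example above is deliberately left out.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: width and precision are taken from the argument list.
string fixedPoint(int width, int precision, double value)
{
    // "%*.*f" consumes the width first, then the precision, then the value.
    return format("%*.*f", width, precision, value);
}

void main()
{
    writeln("[", fixedPoint(8, 2, 3.14159), "]");
}
```

Here the written order in the format string and the consumption order agree, so there is no surprise. The concern above is about format strings in which the two orders would differ.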
Mar 25
prev sibling parent reply Paul Backus <snarwin gmail.com> writes:
On Monday, 24 March 2025 at 22:39:11 UTC, Bruno Haible wrote:
 I thought it would be a good idea to make GNU gettext support 
 also the D programming language. This is a registered wish list 
 item since 2017: https://savannah.gnu.org/bugs/?51291 . On the 
 D side, a rudimentary interface to the gettext() function in 
 the GNU C library exists as well: 
 https://code.dlang.org/packages/libintl
Have you seen this package? https://code.dlang.org/packages/gettext
Mar 24
parent reply Bruno Haible <bruno clisp.org> writes:
Paul Backus wrote:
 On the D side, a rudimentary interface to the gettext() 
 function in the GNU C library exists as well: 
 https://code.dlang.org/packages/libintl
Have you seen this package? https://code.dlang.org/packages/gettext
Thanks for the hint. Yes, I have looked at all of these:

    https://code.dlang.org/packages/libintl
    https://code.dlang.org/packages/gettext
    https://code.dlang.org/packages/i18nd
    https://code.dlang.org/packages/mofile
    https://code.dlang.org/packages/djtext

They all have very small "Download Stats", indicating that their actual use in applications is nonexistent or irrelevant.

From these five packages, the one that comes closest to having a usable API, on par with the gettext APIs for other programming languages, is https://code.dlang.org/packages/libintl .
Mar 25
parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On Tuesday, 25 March 2025 at 09:23:56 UTC, Bruno Haible wrote:
 Paul Backus wrote:
 Have you seen this package?

 https://code.dlang.org/packages/gettext
Thanks for the hint. Yes, I have looked at all of these:

    https://code.dlang.org/packages/libintl
    https://code.dlang.org/packages/gettext
    https://code.dlang.org/packages/i18nd
    https://code.dlang.org/packages/mofile
    https://code.dlang.org/packages/djtext

They all have very small "Download Stats", indicating that their actual use in applications is nonexistent or irrelevant.
https://code.dlang.org/packages/gettext is, as far as I know, being used in a real application, and was developed specifically for that application, for one of D's major users in industry (see Bastiaan's talk in 2023: https://dconf.org/2023/#veelob).

I would focus on that one for de-facto standardization. Likely it has a small number of users, but would be actively fixed if issues are found.

And also, the formatting bugs should all be fixed as well, regardless of gettext support.

-Steve
Mar 25
parent reply Bruno Haible <bruno clisp.org> writes:
On Tuesday, 25 March 2025 at 16:00:09 UTC, Steven Schveighoffer 
wrote:
 https://code.dlang.org/packages/gettext
 I would focus on that one for de-facto standardization. Likely 
 it has a small number of users, but would be actively fixed if 
 issues are found.
No, this approach from https://code.dlang.org/packages/gettext is not suitable for wide-scale use, due to the following misfeatures:

1) https://dconf.org/2023/slides/veelo.pdf page 77 explains how to generate repetitive msgids through the use of 'foreach' and the 'tr!' template. Such generation of msgids is a bad idea for several reasons:
   - Care should be taken to not burden translators with unnecessary work. If a programmer does not want to write N strings, the translators should not have to translate N strings.
   - It is impossible for the programmer to review the msgids according to https://www.gnu.org/software/gettext/manual/html_node/Preparing-Strings.html in such scenarios.
   - It is impossible for the programmer to attach translator comments in such scenarios.

2) https://github.com/veelo/gettext/blob/main/README.md says that they "scan any generated code that may be mixed in". But translators are humans and occasionally want to consult the actual source code for understanding the context of messages. They can't consult code produced by the compiler on-the-fly, nor can they consult the generated binary code.

3) It mixes library code and build-time add-ons into one file. This does not work
   - for internationalized libraries (because they are not an executable that can be subject to "dub run --config=xgettext"),
   - when cross-compiling (because you can't execute the just-compiled code during the build process).

4) It generates non-standard POT files (with function names after the line numbers).
Oct 03
next sibling parent reply monkyyy <crazymonkyyy gmail.com> writes:
On Friday, 3 October 2025 at 22:53:09 UTC, Bruno Haible wrote:
 They can't consult code produced by the compiler on-the-fly,
https://dlang.org/dmd-linux.html#switch-mixin
Oct 03
parent Bruno Haible <bruno clisp.org> writes:
On Friday, 3 October 2025 at 23:10:25 UTC, monkyyy wrote:
 They can't consult code produced by the compiler on-the-fly,
https://dlang.org/dmd-linux.html#switch-mixin
OK, then let me say instead: They (the translators) can't consult code produced by the compiler on-the-fly, merely by looking at the source code in the release tarball, without installing a development environment for D.
Oct 03
prev sibling parent Bastiaan Veelo <Bastiaan Veelo.net> writes:
On Friday, 3 October 2025 at 22:53:09 UTC, Bruno Haible wrote:
 On Tuesday, 25 March 2025 at 16:00:09 UTC, Steven Schveighoffer 
 wrote:
 https://code.dlang.org/packages/gettext
 I would focus on that one for de-facto standardization. Likely 
 it has a small number of users, but would be actively fixed if 
 issues are found.
No, this approach from https://code.dlang.org/packages/gettext is not suitable for wide-scale use, due to the following misfeatures: 1) https://dconf.org/2023/slides/veelo.pdf page 77 explains how to generate repetitive msgids through the use of 'foreach' and the 'tr!' template.
```d
foreach (i, where; [tr!"hand", tr!"bush"])
    writefln(tr!("One bird in the %1$s", "%2$d birds in the %1$s")(i + 1), where);
```
It appears to me that you misread that code. This is a (contrived) example of explicit ordering of placeholders (https://youtu.be/FYKrXsnzrIM?si=F-MaAh0rWMoa0cj-&t=1534). I fail to see the generation of repetitive msgids; there are just three in this example. If you think that looping over translatable strings is a bad idea then that is fine by me, but it is not a flaw in the package to support such a feature.
    Such generation of msgids is a bad idea for several reasons:
    - Care should be taken to not burden translators with 
 unnecessary work. If a programmer does not want to write N 
 strings, the translators should not have to translate N strings.
They don't.
    - It is impossible for the programmer to review the msgids 
 according to 
 https://www.gnu.org/software/gettext/manual/html_node/Preparing-Strings.html
in such scenarios.
There are no generated strings. The strings in the code are the strings to be translated. There is full compliance with the manual that you reference here.
    - It is impossible for the programmer to attach translator 
 comments in such scenarios.
Yes, comments are fully supported, adding them here is trivial. https://youtu.be/FYKrXsnzrIM?si=YcsCqnohRnzG61ZH&t=1711
 2) https://github.com/veelo/gettext/blob/main/README.md says 
 that they "scan any generated code that may be mixed in". But 
 translators are humans and occasionally want to consult the 
 actual source code for understanding the context of messages. 
 They can't consult code produced by the compiler on-the-fly, 
 nor can they consult the generated binary code.
Again, a feature of the package is not a misfeature just because using it is a bad idea. Code generation is an integral feature of the language, and this package supports it (and it didn't need to do anything special for it, it just works due to the approach taken).
 3) It mixes library code and build-time add-ons into one file. 
 This does not work
      - for internationalized libraries (because they are not an 
 executable, that can be subject to "dub run --config=xgettext"),
You can, using conditional compilation.
 - when cross-compiling (because you can't execute the 
 just-compiled code during the build process).
It is a separate and different build. You would configure the xgettext config to be native, then cross-compile to any other targets. No traces of extraction code appear in non-xgettext builds.
 4) It generates non-standard POT files (with function names 
 after the line numbers).
Well observed. There is nothing in the standard that disallows this: https://www.gnu.org/software/gettext/manual/gettext.html#PO-Files . They are fully supported by the GNU gettext utilities. I think it is a useful feature.

I applaud adding support for D in official GNU software, and I thank you for the work that you put into it. I am sorry that I didn't see your post earlier.

There are different tradeoffs to using GNU xgettext and using --config=xgettext. The biggest downside of the latter is that string extraction takes about as long as a full build; GNU xgettext is likely much quicker. What this gets you is platform independence and complete language coverage without maintenance (https://youtu.be/FYKrXsnzrIM?si=yFTIrhN5KWSa5uc8&t=2915). Which is why you haven't seen any updates to https://code.dlang.org/packages/gettext.

-- Bastiaan.
Oct 07