digitalmars.D - Proof of concept: automatic extraction of gettext-style translation

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
This morning a neat idea occurred to me for a gettext-like system in D
that allows automatic and reliable extraction of all translation strings
from a program, that doesn't need an external parser to run over the
program source code.

Traditionally, gettext requires an external tool to parse the source
code and extract translatable strings.  In D, however, we can take
advantage of (1) passing the format string at compile-time to gettext(),
which then allows (2) using static this() to register all format strings
at runtime to a central dictionary of format strings, regardless of
whether the corresponding gettext() call actually got called at runtime.
(3) Wrap that in a version() condition, and you can have the compiler do
the string extraction for you without needing an external source code
parser.

Here's a proof of concept:

	// ------------------------------------------------------------------
	// File: lang.d
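	version(extractStr) {
		// Every format string passed to gettext, keyed by the string itself.
		int[string] allStrings;
		// In extraction mode this main() replaces the program's own and
		// prints a dictionary template with an empty entry per string.
		void main() {
			import std.algorithm;
			import std.stdio;
			auto s = allStrings.keys;
			s.sort();
			writefln("string[string] dict = [\n%(\t%s: \"\",\n%|%)];", s);
		}
	}
	
	template gettext(string fmt, Args...)
	{
		// One module constructor per instantiation: it runs at startup
		// whether or not the gettext() call itself is ever reached.
		version(extractStr)
		static this() {
			allStrings[fmt]++;
		}
		string gettext(Args args) {
			import std.format;
			return format(fmt, args);
		}
	}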

	// ------------------------------------------------------------------
	// File: main.d
	import mod1, mod2;
	
	version(extractStr) {} else
	void main() {
		auto names = [ "Joe", "Schmoe", "Jane", "Doe" ];
		foreach (i; 0 .. names.length) {
			fun1(names[i]);
			fun2(5 + cast(int)i*10);
		}
	}

	// ------------------------------------------------------------------
	// File: mod1.d
	import std.stdio;
	import lang;
	
	void fun1(string name) {
		writeln(gettext!"Hello! My name is %s."(name));
	}

	// ------------------------------------------------------------------
	// File: mod2.d
	import std.stdio;
	import lang;
	
	void fun2(int num) {
		writeln(gettext!"I'm counting %d apples."(num));
	}
	
	void fun3() {
		writeln(gettext!"Never called, but nevertheless registered!");
	}


Running the program normally with `dmd -i -run main.d` produces the
output:

	Hello! My name is Joe.
	I'm counting 5 apples.
	Hello! My name is Schmoe.
	I'm counting 15 apples.
	Hello! My name is Jane.
	I'm counting 25 apples.
	Hello! My name is Doe.
	I'm counting 35 apples.


Format strings can be extracted by compiling with -version=extractStr:

	dmd -i -version=extractStr -run main.d

which produces a template for translating the format strings into
another language:

	string[string] dict = [
		"Hello! My name is %s.": "",
		"I'm counting %d apples.": "",
		"Never called, but nevertheless registered!": "",
	];


The idea is that in a real implementation, gettext() would look up the
format string in the l10n file containing a filled-out instance of the
above dictionary and map it to the target language. A fancier
extractStr could also merge new format strings into an existing
translated file, so that l10n files can be continually updated as
development proceeds.
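
A minimal sketch of those two steps, under stated assumptions: `dict` is a hand-filled copy of the template above (the German strings are only illustrative), an empty or missing entry falls back to the original string, and `mergeStrings` is a hypothetical helper name. The extractStr registration from the proof of concept is left out to keep it short.

```d
import std.format : format;
import std.stdio : writeln;

// Assumed: a filled-out instance of the extracted dictionary.
string[string] dict;

static this() {
    dict = [
        "Hello! My name is %s.": "Hallo! Mein Name ist %s.",
        "I'm counting %d apples.": "Ich zähle %d Äpfel.",
        "Never called, but nevertheless registered!": "",
    ];
}

// Same call syntax as the proof of concept, plus a run-time lookup.
string gettext(string fmt, Args...)(Args args) {
    auto p = fmt in dict;
    // Fall back to the original string if there is no usable translation.
    string target = (p !is null && (*p).length > 0) ? *p : fmt;
    return format(target, args);
}

// Hypothetical merge step: keep existing translations, add empty entries
// for newly extracted strings, and drop strings no longer in the program.
string[string] mergeStrings(string[string] existing, string[] extracted) {
    string[string] merged;
    foreach (s; extracted)
        merged[s] = existing.get(s, "");
    return merged;
}

void main() {
    writeln(gettext!"Hello! My name is %s."("Joe"));
    writeln(gettext!"I'm counting %d apples."(5));
    writeln(gettext!"Never called, but nevertheless registered!"());
}
```

Whether stale entries should be dropped or kept for later reuse is a policy choice; the sketch drops them.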

The best thing about this is that no additional tooling is required:
string extraction is done entirely within D, by the compiler itself, so
it is 100% reliable and not prone to bugs in an external parser.


T

-- 
Computerese Irregular Verb Conjugation: I have preferences.  You have biases. 
He/She has prejudices. -- Gene Wirchenko
Apr 02 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 2 April 2020 at 13:01:09 UTC, H. S. Teoh wrote:
 This morning a neat idea occurred to me for a gettext-like 
 system in D that allows automatic and reliable extraction of 
 all translation strings from a program, that doesn't need an 
 external parser to run over the program source code.
Indeed, I have played with this before, it is really cool. I almost wrote it as an example of my string interpolation proposal, since with my proposal, it would be possible to run this over i"" strings passed to a particular function too. D rox.
Apr 02 2020
Sönke Ludwig <sludwig+d outerproduct.org> writes:
On 02.04.2020 at 15:04, Adam D. Ruppe wrote:
 On Thursday, 2 April 2020 at 13:01:09 UTC, H. S. Teoh wrote:
 This morning a neat idea occurred to me for a gettext-like system in D 
 that allows automatic and reliable extraction of all translation 
 strings from a program, that doesn't need an external parser to run 
 over the program source code.
 Indeed, I have played with this before, it is really cool. I almost
 wrote it as an example of my string interpolation proposal, since
 with my proposal, it would be possible to run this over i"" strings
 passed to a particular function too. D rox.
I'm doing the same in my UI framework. In addition to being able to collect all strings at compile-time, it is also possible to translate and verify the existence of strings at compile-time by loading and parsing the PO-files with CTFE.

BTW, I never got around to commenting on the string interpolation topic, but the inability to translate i"" strings was my biggest practical concern. Never really understood the reluctance to lower to a template instantiation, though.

(PS: never mind the e-mail, I can't handle my e-mail client properly)
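
A rough sketch of the compile-time check, not the UI framework's actual code: `catalog`, `findMsg` and `tr` are hypothetical names, and the catalog is inlined here where a real setup might build it by parsing a PO file in CTFE.

```d
import std.format : format;
import std.stdio : writeln;

// Assumed catalog of (original, translation) pairs; illustrative content.
immutable string[2][] catalog = [
    ["Hello! My name is %s.", "Hallo! Mein Name ist %s."],
    ["I'm counting %d apples.", "Ich zähle %d Äpfel."],
];

// Plain function, usable at run time and in CTFE.
// Returns the index of msgid in the catalog, or -1 if it is absent.
ptrdiff_t findMsg(string msgid) {
    foreach (i, entry; catalog)
        if (entry[0] == msgid)
            return cast(ptrdiff_t) i;
    return -1;
}

// Resolves to the translated format string at compile time and rejects
// any string that has no catalog entry.
template tr(string fmt) {
    static assert(findMsg(fmt) >= 0, "no translation for: " ~ fmt);
    enum string tr = catalog[findMsg(fmt)][1];
}

void main() {
    writeln(format(tr!"Hello! My name is %s.", "Joe"));
    // tr!"Some untranslated text" would fail to compile.
}
```

Because the lookup happens during compilation, a missing translation shows up as a build error instead of a run-time fallback.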
Apr 02 2020
Adam D. Ruppe <destructionator gmail.com> writes:
On Thursday, 2 April 2020 at 15:53:58 UTC, Sönke Ludwig wrote:
 BTW, I never got around to commenting on the string 
 interpolation topic, but the inability to translate i"" strings 
 was my biggest practical concern. Never really understood the 
 reluctance against lowering to a template instantiation, though.
Yeah, I don't wanna derail too much but my version here: https://github.com/dlang/DIPs/pull/186 could be used. Here, I kept the part I cut out of the file - I just wasn't happy with the details all being right, but the thrust of it is there:

-----

The string must be processed in whole, with as much context as possible for the translator to do a good job. If the string was broken up into a tuple, it would be very difficult for a translator to make sense of it. With `_d_interpolated_string`, however, the static components are clearly separated from, while still being clearly associated with, their companion runtime arguments, and are indeed available at compile time.

Moreover, it may be necessary to reorder words and act on factors like plurality. With the templated version together with a helper function (e.g. `translate(i"I have $count apples")`) you can get a compile-time list of strings needing translation and write runtime functions to handle localization details as required for individual strings.

```
string translate(d_interpolated_string!("I have ", spec(null), " apples") spec, int count) {
    if(count == 1)
        return "I have 1 apple";
    else
        return format(spec.ToFormatString!"%d", count);
}
```

------

I never finished that section, I just wasn't happy with my examples and arguments, but it is one of the things on my mind. You could do complicated logic in D itself all verified at compile time. Or just pass to a runtime thing like gnu gettext possibly with helper templates. D has a LOT of potential in this area that we have barely even scratched the surface of.
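
For comparison, a sketch of how the plurality and reordering points could look in today's D, without the proposed i"" strings. `ngettext` is a hypothetical helper named after the GNU gettext function, not something from the DIP; both forms are compile-time arguments, so an extraction pass like the one at the top of the thread could register them.

```d
import std.format : format;
import std.stdio : writeln;

// Hypothetical helper: picks a form by count. Real plural rules vary by
// language; this handles only the English one/other split.
string ngettext(string singular, string plural)(long count) {
    return format(count == 1 ? singular : plural, count);
}

void main() {
    writeln(ngettext!("I have %d apple", "I have %d apples")(1));
    writeln(ngettext!("I have %d apple", "I have %d apples")(3));

    // std.format's positional specifiers let a translated string
    // reorder its arguments without touching the call site.
    writeln(format("%2$s, %1$s", "Joe", "Schmoe"));
}
```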
Apr 02 2020