www.digitalmars.com

digitalmars.D - printf() metaprogramming challenge

Walter Bright <newshound2 digitalmars.com> writes:
While up at night with jetlag at DConf, I started toying with solving a small
problem. In order to use printf(), the format specifiers in the printf format
string have to match the types of the rest of the parameters. This is well known
to be brittle and error-prone, especially when refactoring the types of the
arguments.

(Of course, this is not a problem with writefln() and friends, but that isn't 
available in the dmd front end, nor when using betterC. Making printf better 
would mesh nicely with betterC. Note that many C compilers have extensions to 
tell you if there's a mismatch, but they won't fix it for you.)

I thought why not use D's metaprogramming to fix it. Some ground rules:

1. No extra overhead
2. Completely self-contained
3. Only %s specifiers are rewritten
4. %% is handled
5. Diagnose mismatch between number of specifiers and number of arguments

Here's my solution:

     int i;
     dprintf!"hello %s %s %s %s betty\n"(3, 4.0, &i, "abc".ptr);

gets rewritten to:

     printf("hello %d %g %p %s betty\n", 3, 4.0, &i, "abc".ptr);

The code at the end accomplishes this. Yay!

But what I'd like is to extend it to convert a `string s` argument into a
`cast(int)s.length, s.ptr` tuple and use the "%.*s" specifier for it.

I completely failed at that. I suspect the language has a deficiency in 
manipulating expression tuples.

Does anyone see a way to make this work?
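For reference, the call shape the requested rewrite would have to produce for a `string` argument can be written by hand. This is only an illustration: `bracketed` is a hypothetical helper, and it formats into a stack buffer with `snprintf` from `core.stdc.stdio` instead of calling `printf`, so that the result can be checked.

```d
import core.stdc.stdio : snprintf;

// Hypothetical helper: prints a D string slice through "%.*s", i.e. the
// call shape dprintf would need to generate for a `string` argument.
string bracketed(string s)
{
    char[64] buf;
    // The '*' precision is read as an int, so the size_t length is cast;
    // s.ptr does not need to point at a null-terminated string.
    int n = snprintf(buf.ptr, buf.length, "[%.*s]", cast(int) s.length, s.ptr);
    assert(n >= 0 && cast(size_t) n < buf.length, "output truncated");
    return buf[0 .. n].idup;
}

unittest
{
    string s = "abcdef";
    assert(bracketed(s[1 .. 4]) == "[bcd]"); // only the slice is printed
    assert(bracketed("") == "[]");
}
```

The `.idup` makes the returned string GC-allocated, so this particular helper is not betterC-compatible; the `snprintf` call itself is.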

Note: In order to minimize template bloat, I refactored most of the work into a 
regular function, minimizing the size of the template expansions.

------ Das Code ------------
import core.stdc.stdio : printf;

template Seq(A ...) { alias Seq = A; }

int dprintf(string f, A ...)(A args)
{
     enum Fmts = Formats!(A);
     enum string s = formatString(f, Fmts);
     __gshared const(char)* s2 = s.ptr;
     return printf(Seq!(s2, args[0..2], args[2..4]));
}

template Formats(T ...)
{
     static if (T.length == 0)
	enum Formats = [ ];
     else static if (T.length == 1)
	enum Formats = [Spec!(T[0])];
     else
	enum Formats = [Spec!(T[0])] ~ Formats!(T[1 .. T.length]);
}

template Spec(T : byte)    { enum Spec = "%d"; }
template Spec(T : short)   { enum Spec = "%d"; }
template Spec(T : int)     { enum Spec = "%d"; }
template Spec(T : long)    { enum Spec = "%lld"; }

template Spec(T : ubyte)   { enum Spec = "%u"; }
template Spec(T : ushort)  { enum Spec = "%u"; }
template Spec(T : uint)    { enum Spec = "%u"; }
template Spec(T : ulong)   { enum Spec = "%llu"; }

template Spec(T : float)   { enum Spec = "%g"; }
template Spec(T : double)  { enum Spec = "%g"; }
template Spec(T : real)    { enum Spec = "%Lg"; }

template Spec(T : char)    { enum Spec = "%c"; }
template Spec(T : wchar)   { enum Spec = "%c"; }
template Spec(T : dchar)   { enum Spec = "%c"; }

template Spec(T : immutable(char)*)   { enum Spec = "%s"; }
template Spec(T : const(char)*)       { enum Spec = "%s"; }
template Spec(T : T*)                 { enum Spec = "%p"; }

/******************************************
  * Replace %s format specifiers in f with corresponding specifiers in A[].
  * Other format specifiers are left as is.
  * Number of format specifiers must match A.length.
  * Params:
  *	f = printf format string
  *	A = replacement format specifiers
  * Returns:
  *	replacement printf format string
  */
string formatString(string f, string[] A ...)
{
     string r;
     size_t i;
     size_t ai;
     while (i < f.length)
     {
	if (f[i] != '%' || i + 1 == f.length)
	{
	    r ~= f[i];
	    ++i;
	    continue;
	}
	char c = f[i + 1];
	if (c == '%')
	{
	    r ~= "%%";
	    i += 2;
	    continue;
	}
	assert(ai < A.length, "not enough arguments");
	string fmt = A[ai];
	++ai;
	if (c == 's')
	{
	    r ~= fmt;
	    i += 2;
	    continue;
	}
	r ~= '%';
	++i;
	continue;
     }
     assert(ai == A.length, "not enough formats");
     return r;
}
----- End Of Das Code ----------
May 23
Jonathan Marler <johnnymarler gmail.com> writes:
On Thursday, 23 May 2019 at 19:33:15 UTC, Walter Bright wrote:
 [snip]
 Does anyone see a way to make this work?
 [snip]
It uses mixin, so not pretty, but it works...

    void main()
    {
        int i = 0;
        dprintf!"hello %s %s %s %s betty\n"(3, 4.0, &i, "abc".ptr);
        const msg = "AAA!";
        dprintf!"A dstring '%s'\n"(msg[0 .. 3]);
    }

    template Seq(A ...) { alias Seq = A; }

    int dprintf(string f, A ...)(A args)
    {
        import core.stdc.stdio : printf;
        enum Fmts = Formats!(A);
        enum string s = formatString(f, Fmts);
        __gshared const(char)* s2 = s.ptr;
        enum call = function() {
            import std.conv : to;
            string printfCall = "printf(s2";
            foreach (i, T; A)
            {
                static if (is(T : string))
                {
                    // "%.*s" expects an int precision, so the length is cast to int
                    printfCall ~= ", cast(int)args[" ~ i.to!string ~ "].length, args[" ~ i.to!string ~ "].ptr";
                }
                else
                {
                    printfCall ~= ", args[" ~ i.to!string ~ "]";
                }
            }
            return printfCall ~ ")";
        }();
        //pragma(msg, call); // uncomment to see the final call
        return mixin(call);
    }

    template Formats(T ...)
    {
        static if (T.length == 0)
            enum Formats = [];
        else static if (T.length == 1)
            enum Formats = [Spec!(T[0])];
        else
            enum Formats = [Spec!(T[0])] ~ Formats!(T[1 .. T.length]);
    }

    template Spec(T : byte)    { enum Spec = "%d"; }
    template Spec(T : short)   { enum Spec = "%d"; }
    template Spec(T : int)     { enum Spec = "%d"; }
    template Spec(T : long)    { enum Spec = "%lld"; }

    template Spec(T : ubyte)   { enum Spec = "%u"; }
    template Spec(T : ushort)  { enum Spec = "%u"; }
    template Spec(T : uint)    { enum Spec = "%u"; }
    template Spec(T : ulong)   { enum Spec = "%llu"; }

    template Spec(T : float)   { enum Spec = "%g"; }
    template Spec(T : double)  { enum Spec = "%g"; }
    template Spec(T : real)    { enum Spec = "%Lg"; }

    template Spec(T : char)    { enum Spec = "%c"; }
    template Spec(T : wchar)   { enum Spec = "%c"; }
    template Spec(T : dchar)   { enum Spec = "%c"; }
    template Spec(T : string)  { enum Spec = "%.*s"; }

    template Spec(T : immutable(char)*)   { enum Spec = "%s"; }
    template Spec(T : const(char)*)       { enum Spec = "%s"; }
    template Spec(T : T*)                 { enum Spec = "%p"; }

    /******************************************
     * Replace %s format specifiers in f with corresponding specifiers in A[].
     * Other format specifiers are left as is.
     * Number of format specifiers must match A.length.
     * Params:
     *  f = printf format string
     *  A = replacement format specifiers
     * Returns:
     *  replacement printf format string
     */
    string formatString(string f, string[] A ...)
    {
        string r;
        size_t i;
        size_t ai;
        while (i < f.length)
        {
            if (f[i] != '%' || i + 1 == f.length)
            {
                r ~= f[i];
                ++i;
                continue;
            }
            char c = f[i + 1];
            if (c == '%')
            {
                r ~= "%%";
                i += 2;
                continue;
            }
            assert(ai < A.length, "not enough arguments");
            string fmt = A[ai];
            ++ai;
            if (c == 's')
            {
                r ~= fmt;
                i += 2;
                continue;
            }
            r ~= '%';
            ++i;
            continue;
        }
        assert(ai == A.length, "not enough formats");
        return r;
    }
May 23
Les De Ridder <les lesderid.net> writes:
On Thursday, 23 May 2019 at 22:48:33 UTC, Jonathan Marler wrote:
 It uses mixin, so not pretty, but it works...
Similar solution:

--- printf.d	2019-05-24 00:48:44.840543714 +0200
+++ printf_s.d	2019-05-24 00:52:47.829178613 +0200
@@ -1,13 +1,12 @@
 import core.stdc.stdio : printf;
 
-template Seq(A ...) { alias Seq = A; }
-
 int dprintf(string f, A ...)(A args)
 {
     enum Fmts = Formats!(A);
     enum string s = formatString(f, Fmts);
+    alias args_ = Args!(args);
     __gshared const(char)* s2 = s.ptr;
-    return printf(Seq!(s2, args[0..2], args[2..4]));
+    mixin( q{return printf(s2, } ~ args_ ~ q{);} );
 }
 
 template Formats(T ...)
@@ -42,6 +41,22 @@
 template Spec(T : const(char)*)       { enum Spec = "%s"; }
 template Spec(T : T*)                 { enum Spec = "%p"; }
 
+template Spec(T : string) { enum Spec = "%.*s"; }
+
+template Args(A ...)
+{
+    static if (A.length == 0)
+        enum Args = "";
+    else static if (A.length == 1)
+        enum Args = Arg!(A[0]);
+    else
+        enum Args = Arg!(A[0]) ~ ", " ~ Args!(A[1 .. A.length]);
+}
+
+template Arg(alias string arg) { enum Arg = "cast(int)"~arg.stringof~".length,"~arg.stringof~".ptr"; }
+
+template Arg(alias arg) { enum Arg = arg.stringof; }
+
 /******************************************
  * Replace %s format specifiers in f with corresponding specifiers in A[].
  * Other format specifiers are left as is.
May 23
Alex <AJ gmail.com> writes:
 template Spec(T : byte)    { enum Spec = "%d"; }
 template Spec(T : short)   { enum Spec = "%d"; }
 template Spec(T : int)     { enum Spec = "%d"; }
 template Spec(T : long)    { enum Spec = "%lld"; }

 template Spec(T : ubyte)   { enum Spec = "%u"; }
 template Spec(T : ushort)  { enum Spec = "%u"; }
 template Spec(T : uint)    { enum Spec = "%u"; }
 template Spec(T : ulong)   { enum Spec = "%llu"; }

 template Spec(T : float)   { enum Spec = "%g"; }
 template Spec(T : double)  { enum Spec = "%g"; }
 template Spec(T : real)    { enum Spec = "%Lg"; }

 template Spec(T : char)    { enum Spec = "%c"; }
 template Spec(T : wchar)   { enum Spec = "%c"; }
 template Spec(T : dchar)   { enum Spec = "%c"; }
 template Spec(T : string)  { enum Spec = "%.*s"; }

 template Spec(T : immutable(char)*)   { enum Spec = "%s"; }
 template Spec(T : const(char)*)       { enum Spec = "%s"; }
 template Spec(T : T*)                 { enum Spec = "%p"; }
This can all be simplified:

    static foreach (k, v; ["byte":"%d", "short":"%d", ...])
        mixin(`template Spec(T : `~k~`) { enum Spec = "`~v~`"; }`);

The string mixin is not necessary, but it is easier than an AliasSeq.
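A sketch of that table-driven idea, under one assumption: the rows are a plain array of `[type name, specifier]` pairs rather than an associative array literal, which keeps the iteration order fixed. Each row is turned into one `Spec` specialization by a string mixin, so overload resolution should behave as if the templates had been written by hand; the `T*` fallback stays an ordinary declaration.

```d
// Table of (type name, printf specifier) rows; the names are the same
// specializations as in the hand-written version.
enum string[2][] specTable = [
    ["byte", "%d"], ["short", "%d"], ["int", "%d"], ["long", "%lld"],
    ["ubyte", "%u"], ["ushort", "%u"], ["uint", "%u"], ["ulong", "%llu"],
    ["float", "%g"], ["double", "%g"], ["real", "%Lg"],
    ["char", "%c"], ["wchar", "%c"], ["dchar", "%c"],
    ["immutable(char)*", "%s"], ["const(char)*", "%s"],
];

// static foreach introduces no scope, so every generated template lands
// in the module scope and joins one overload set named Spec.
static foreach (row; specTable)
    mixin("template Spec(T : " ~ row[0] ~ ") { enum Spec = \"" ~ row[1] ~ "\"; }");

// Any other pointer type falls back to %p.
template Spec(T : T*) { enum Spec = "%p"; }

unittest
{
    static assert(Spec!int == "%d");
    static assert(Spec!ulong == "%llu");
    static assert(Spec!double == "%g");
    static assert(Spec!(int*) == "%p");
}
```

Adding a type then means adding one row, at the cost of the specializations no longer being greppable as ordinary declarations.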
May 23
Yuxuan Shui <yshuiv7 gmail.com> writes:
On Thursday, 23 May 2019 at 19:33:15 UTC, Walter Bright wrote:
 [snip]

 I completely failed at that. I suspect the language has a 
 deficiency in manipulating expression tuples.

 
What a coincidence, I had this exact problem today as well. It seems the only way to do this currently is either with mixins or with a tuple. Assuming you already have all of the arguments in a tuple:

    auto args = tuple(...);

and args[x] is a string, you can do this:

    auto args_prime = tuple(args[0..x], args[x].length, args[x].ptr, args[x+1..$]);

You then need to do some template magic to expand all such arguments... Using mixin is probably better.
May 23
Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 5/23/19 6:58 PM, Yuxuan Shui wrote:
 On Thursday, 23 May 2019 at 19:33:15 UTC, Walter Bright wrote:
 [snip]

 I completely failed at that. I suspect the language has a deficiency 
 in manipulating expression tuples.
 [...]
 Assuming you already have all of the arguments in a tuple:

     auto args = tuple(...);
 [...]
Did you try .expand with the tuple?
May 23
Yuxuan Shui <yshuiv7 gmail.com> writes:
On Friday, 24 May 2019 at 00:41:31 UTC, Andrei Alexandrescu wrote:
 [snip]
 Did you try .expand with the tuple?
It's 1 character shorter to just write someTuple[0..$] :)
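A small sketch of the tuple approach combined with `.expand`, assuming the string sits at a known position (index 1 here). `spliceDemo` is a hypothetical name, and `snprintf` stands in for `printf` so the output can be checked.

```d
import core.stdc.stdio : snprintf;
import std.typecons : tuple;

// Hypothetical demo: args[1] is a string that is replaced by
// (cast(int) length, ptr); everything is then forwarded with .expand.
string spliceDemo()
{
    auto args = tuple(42, "foobar", 3.5);
    auto spliced = tuple(args.expand[0 .. 1],               // elements before the string
                         cast(int) args[1].length, args[1].ptr,
                         args.expand[2 .. $]);              // elements after it
    char[64] buf;
    int n = snprintf(buf.ptr, buf.length, "%d %.*s %g", spliced.expand);
    assert(n >= 0 && cast(size_t) n < buf.length, "output truncated");
    return buf[0 .. n].idup;
}

unittest
{
    assert(spliceDemo() == "42 foobar 3.5");
}
```

Slicing `args.expand` yields an expression sequence, so each element is passed to `tuple` separately; generalizing this to every string argument still needs the recursive "template magic" mentioned above.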
May 23
ag0aep6g <anonymous example.com> writes:
On 23.05.19 21:33, Walter Bright wrote:
 But what I'd like it to do is to extend it to convert a `string s` 
 argument into `cast(int)s.length, s.ptr` tuple and use the "%.*s" 
 specifier for it.
[...]
 Does anyone see a way to make this work?
I don't know if this satisfies the "no extra overhead" rule. Maybe when `arrlen` and `arrptr` are inlined?

    int dprintf(string f, A ...)(A args)
    {
        enum Fmts = Formats!(A);
        enum string s = formatString(f, Fmts);
        __gshared const(char)* s2 = s.ptr;
        import std.meta: staticMap;
        return printf(Seq!(s2, staticMap!(arg, args)));
    }

    template arg(alias a)
    {
        static if (is(typeof(a) == string))
            alias arg = Seq!(arrlen!a, arrptr!a);
        else
            alias arg = a;
    }

    // "%.*s" takes an int precision, so the length is cast here
    int arrlen(alias a)() { return cast(int) a.length; }
    auto arrptr(alias a)() { return a.ptr; }

    template Spec(T : string) { enum Spec = "%.*s"; }

    void main()
    {
        int i;
        dprintf!"hello %s %s %s %s betty %s\n"(3, 4.0, &i, "abc".ptr, "foobar");
    }

    // ... rest of the code unchanged ...
May 23
bpr <brogoff gmail.com> writes:
On Thursday, 23 May 2019 at 19:33:15 UTC, Walter Bright wrote:
 string formatString(string f, string[] A ...)
 {
     string r;
     size_t i;
     size_t ai;
     while (i < f.length)
     {
 	if (f[i] != '%' || i + 1 == f.length)
 	{
 	    r ~= f[i];
Are you sure this works for betterC? It's been a while for me, but I think it won't; the string appends will stop it. Good job at getting unit tests and final switch in!
 ----- End Of Das Code ----------
May 23
Radu <void null.pt> writes:
On Friday, 24 May 2019 at 04:49:22 UTC, bpr wrote:
 [...]
 Are you sure this works for betterC? It's been a while for me, but I think it won't; the string appends will stop it. Good job at getting unit tests and final switch in!
Indeed, it doesn't work with the -betterC flag. It's easily testable on run.dlang.io. This would probably work if CTFE were supported when compiling with betterC.
May 24
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Friday, 24 May 2019 at 07:09:56 UTC, Radu wrote:
 On Friday, 24 May 2019 at 04:49:22 UTC, bpr wrote:
 [...]
Indeed, it doesn't work with the -betterC flag. It's easily testable on run.dlang.io. This would probably work if CTFE were supported when compiling with betterC.
In cases like this, one needs to use the enum lambda trick:

    // Before:
    string foo(string arg1) { /* .. */ }

    // After:
    enum foo(string arg1) = () { /* .. */ };

(Replace `string arg1` with all compile-time and run-time parameters that `foo` may take.)

That way, `foo` won't reach the code generator, and hence you won't get errors with `-betterC`.
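A concrete instance of the trick, using a hypothetical helper `countSpecs` that counts the specifiers a format string consumes (`%%` excluded), the kind of check ground rule 5 needs. Unlike the snippet above, this sketch invokes the lambda immediately (trailing `()`), so the manifest constant holds the computed count rather than the function literal; whether any code is emitted for the literal is up to the compiler, so it is worth checking the generated output.

```d
// Hypothetical CTFE-only helper written with the enum lambda trick: it counts
// the format specifiers in f that consume an argument ("%%" does not).
enum countSpecs(string f) = () {
    size_t n;
    for (size_t i = 0; i + 1 < f.length; ++i)
    {
        if (f[i] != '%')
            continue;
        if (f[i + 1] != '%')
            ++n;   // a real specifier such as %s or %d
        ++i;       // skip the character after '%', including the 2nd '%' of "%%"
    }
    return n;
}();

// Evaluated entirely during compilation:
static assert(countSpecs!"hello %s %s %% %d\n" == 3);
static assert(countSpecs!"no specifiers" == 0);
```

Because `countSpecs!"..."` is a constant, it can be compared against `A.length` in a `static assert` inside `dprintf` without `formatString`-style string appends being compiled into the binary.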
May 24
Sebastiaan Koppe <mail skoppe.eu> writes:
On Friday, 24 May 2019 at 08:00:31 UTC, Petar Kirov [ZombineDev] wrote:
 In cases like this, one needs to use the enum lambda trick:

 // Before:
 string foo(string arg1) { /* .. */ }

 // After:
 enum foo(string arg1) = () { /* .. */ };


 (Replace `string arg1` with all compile-time and run-time 
 parameters that `foo` may take.)

 That way, `foo` won't reach the code-generator and hence you 
 won't get errors with `-betterC`.
Ohh, that is a nice one. Thanks!
May 24
Radu <void null.pt> writes:
On Friday, 24 May 2019 at 08:00:31 UTC, Petar Kirov [ZombineDev] wrote:
In cases like this, one needs to use the enum lambda trick: [...]
Yes, good point! I forgot about this trick.
May 24
Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Friday, 24 May 2019 at 09:52:58 UTC, Radu wrote:
 Yes, good point! I forgot about this trick.
Best verified on d.godbolt.org. Compare:

* https://d.godbolt.org/z/E8aoBg - compiles without -betterC, generates a ton of bloat
* https://d.godbolt.org/z/GGh9c1 - same, but doesn't compile with -betterC
* https://d.godbolt.org/z/mPQMcc - compiles with -betterC
* https://run.dlang.io/gist/run-dlang/1caf15c8c7dded16ba812353361feda9 - what I would like to write, but currently produces too much bloat and doesn't work with -betterC

The examples above were inspired by https://twitter.com/Cor3ntin/status/1127210941718962177. I wanted to check how D with -betterC would compare to C++23+ w.r.t. code gen (bloat).
May 24
Radu <void null.pt> writes:
On Friday, 24 May 2019 at 12:14:09 UTC, Petar Kirov [ZombineDev] wrote:
Best verified on d.godbolt.org. Compare: [...]
I used the same method to generate C header files for a betterC library; I know it works and doesn't produce runtime bloat. Too bad it's something you forget, i.e. it is not obvious :)
May 24
Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 5/24/19 8:14 AM, Petar Kirov [ZombineDev] wrote:
[...] I wanted to check how D with -betterC would compare to C++23+ w.r.t. code gen (bloat).
Interesting. These problems seem to be implementation-specific, not fundamental.
May 24
Jacob Carlborg <doob me.com> writes:
On 2019-05-23 21:33, Walter Bright wrote:
 [snip]
 Here's my solution:
 
      int i;
      dprintf!"hello %s %s %s %s betty\n"(3, 4.0, &i, "abc".ptr);
 
 gets rewritten to:
 
      printf("hello %d %g %p %s betty\n", 3, 4.0, &i, "abc".ptr);
 
This is kind of nice, but I would prefer to have a complete implementation (of sprintf) written in D that is @nogc @safe nothrow and pure, to avoid having to add various hacks to apply these attributes. It would be nice if it recognized objects and called `toString` or `toChars` as well.

I can also add that there was this guy at DConf who said that if a D string should be passed to a C library, one should manually pass the pointer and length separately without any magic ;)

-- 
/Jacob Carlborg
May 24
Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2019 8:35 AM, Jacob Carlborg wrote:
 This is kind of nice, but I would prefer to have a complete implementation 
 written in D (of sprintf) that is @nogc @safe nothrow and pure. To avoid having 
 to add various hacks to apply these attributes.
C's sprintf is already @nogc nothrow and pure. Doing our own is not that easy; in particular, the floating point formatting is a fair amount of tricky work. Besides, this is a few lines of code, and would fit in fine with betterC.
 I can also add that there was this guy at DConf that said that if a D string 
 should be passed to a C library it should manually pass the pointer and length 
 separately without any magic ;)
That wouldn't work with %.*s because the .length argument must be cast to int.
May 24
Jacob Carlborg <doob me.com> writes:
On 2019-05-24 20:39, Walter Bright wrote:

 C's sprintf is already @nogc nothrow and pure.
Technically it's not pure because it accesses `errno`; that's what I meant by "various hacks".
 Doing our own is not that 
 easy, in particular, the floating point formatting is a fair amount of 
 tricky work.
Stefan Koch has an implementation for that [3], which even works at CTFE. Not sure if it's compatible with the C implementation, though.
 That wouldn't work with %.*s because the .length argument must be cast 
 to int.
Of course it works. The DMD code base is littered with calls to printf with D strings passed the manual way, with the pointer and length passed separately, including the casting.

[3] https://github.com/UplinkCoder/fpconv/blob/master/src/fpconv_ctfe.d

-- 
/Jacob Carlborg
May 24
Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2019 12:15 PM, Jacob Carlborg wrote:
 On 2019-05-24 20:39, Walter Bright wrote:
 
 C's sprintf is already @nogc nothrow and pure.
Technically it's not pure because it accesses `errno`; that's what I meant by "various hacks".
The C standard doesn't say printf can set errno. Be that as it may, I did find one printf that did:

"If a multibyte character encoding error occurs while writing wide characters, errno is set to EILSEQ and a negative number is returned."

http://www.cplusplus.com/reference/cstdio/printf/

It's pure as long as you don't send it malformed UTF.
 Doing our own is not that easy, in particular, the floating point formatting 
 is a fair amount of tricky work.
Stefan Koch has an implementation for that [3], even works at CTFE. Not sure if it's compatible with the C implementation though.
I have one, too, the DMC++ one, though it doesn't do the fp formatting exactly right. I infer Stefan's doesn't, either, simply because his test suite spans lines 574-583 and is completely inadequate. You can get an idea of what is required by reading: https://www.cs.tufts.edu/~nr/cs257/archive/florian-loitsch/printf.pdf
 That wouldn't work with %.*s because the .length argument must be cast to int.
Of course it works. The DMD code base is littered with calls to printf with D strings passed the manual way, with the pointer and length passed separately, including the casting.
The compiler doesn't know to do the cast when passing `string` arguments by .ptr/.length.
May 24
prev sibling next sibling parent Andrei Alexandrescu <SeeWebsiteForEmail erdani.com> writes:
On 5/24/19 2:39 PM, Walter Bright wrote:
 On 5/24/2019 8:35 AM, Jacob Carlborg wrote:
 This is kind of nice, but I would prefer to have a complete 
 implementation written in D (of sprintf) that is @nogc @safe nothrow 
 and pure. To avoid having to add various hacks to apply these attributes.
C's sprintf is already @nogc nothrow and pure. Doing our own is not that easy; in particular, the floating point formatting is a fair amount of tricky work.
This 100x. Once C++ variadic templates were out, everybody and their cat had an article about "safely replacing the printf family". Invariably, the implementations ditched printf and, as a consequence, were bulky and did awfully with floating point numbers.

The right approach, in 10% of the code, is to check the arguments during compilation and then forward to the C function. The only remaining slightly tricky part (for printing to a string) is figuring out the maximum buffer size needed.
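On the buffer-size question, one common alternative to computing a bound up front is a measuring pass: C99 `snprintf` with a size of 0 writes nothing and returns the number of characters the output would need. A minimal sketch, assuming a fixed `"%d %g"` format; `formatIntDouble` is a hypothetical helper that returns a `malloc`'ed string the caller must `free`.

```d
import core.stdc.stdio : snprintf;
import core.stdc.stdlib : free, malloc;

// Hypothetical helper: size the output buffer with a first "measuring" call.
// With n == 0, snprintf writes nothing and returns the number of characters
// the full output needs (excluding the terminating '\0').
char* formatIntDouble(int value, double x)
{
    int needed = snprintf(null, 0, "%d %g", value, x);
    if (needed < 0)
        return null;                                        // encoding error
    auto buf = cast(char*) malloc(cast(size_t) needed + 1); // +1 for '\0'
    if (buf is null)
        return null;
    snprintf(buf, cast(size_t) needed + 1, "%d %g", value, x);
    return buf;                                             // caller must free()
}

unittest
{
    char* p = formatIntDouble(42, 0.5);
    assert(p !is null);
    assert(p[0 .. 7] == "42 0.5\0");
    free(p);
}
```

The cost is formatting twice. A compile-time upper bound per specifier would avoid the second pass, but the bound has to be worked out for every specifier, and it is large for `%f` on big doubles.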
May 24
Jonathan Marler <johnnymarler gmail.com> writes:
On Friday, 24 May 2019 at 18:39:41 UTC, Walter Bright wrote:
 On 5/24/2019 8:35 AM, Jacob Carlborg wrote:
 This is kind of nice, but I would prefer to have a complete 
 implementation written in D (of sprintf) that is @nogc @safe 
 nothrow and pure. To avoid having to add various hacks to 
 apply these attributes.
C's sprintf is already @nogc nothrow and pure. Doing our own is not that easy; in particular, the floating point formatting is a fair amount of tricky work.
It took me about an hour to port this "float to string" implementation to D:

https://github.com/ulfjack/ryu
https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d

You can use `floatToString` to print a default-formatted float, or you can add your own formats by calling `f2d`, which gives you the exponent and mantissa. I only added support for 32-bit floats, though. I will add support for more when I need it.
 Besides, this is a few lines of code, and would fit in fine 
 with betterC.
True
 I can also add that there was this guy at DConf that said that 
 if a D string should be passed to a C library it should 
 manually pass the pointer and length separately without any 
 magic ;)
That wouldn't work with %.*s because the .length argument must be cast to int.
Not sure if you'll find it helpful, but I wrote my own "print" framework in my library that's meant to be usable in -betterC and with or without druntime/phobos.

https://github.com/dragon-lang/mar/blob/master/Print.md
https://github.com/dragon-lang/mar/tree/master/src/mar/print

It doesn't use format strings; instead, it allows you to return a struct with a "print" function, e.g.:

    import mar.print;

    int a = 42;
    sprint("a is: ", a);
    sprint("a in hex is: 0x", a.formatHex);

    struct Point
    {
        int x;
        int y;
        auto print(P)(P printer) const
        {
            return printArgs(printer, x, ',', y);
        }
    }
    sprint("point is ", Point(1, 2));
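To illustrate that design (no format strings; types opt in by providing a `print` member), here is a minimal self-contained sketch. It is not mar's API: `BufPrinter` and this `printArgs` are hypothetical stand-ins that write into a fixed buffer, and integers are rendered through `snprintf`.

```d
import core.stdc.stdio : snprintf;

// Hypothetical fixed-size sink (not mar's actual API): appends characters
// and silently drops whatever does not fit.
struct BufPrinter
{
    char[128] buf;
    size_t len;

    void put(const(char)[] s)
    {
        foreach (c; s)
            if (len < buf.length)
                buf[len++] = c;
    }

    const(char)[] data() const { return buf[0 .. len]; }
}

// Prints each argument in order: strings and chars as-is, integers through
// snprintf (as signed 64-bit in this sketch), anything else via a.print(p).
void printArgs(P, Args...)(ref P p, Args args)
{
    foreach (a; args)
    {
        alias A = typeof(a);
        static if (is(A : const(char)[]))
            p.put(a);
        else static if (is(immutable A == immutable char))
        {
            char c = a;
            p.put((&c)[0 .. 1]);
        }
        else static if (is(A : long))
        {
            char[24] tmp;                           // enough for any long
            int n = snprintf(tmp.ptr, tmp.length, "%lld", cast(long) a);
            p.put(tmp[0 .. n]);
        }
        else
            a.print(p);                             // user-defined print member
    }
}

struct Point
{
    int x, y;

    // Opting in to printing: forward the fields to printArgs.
    void print(P)(ref P p) const { printArgs(p, x, ',', y); }
}

unittest
{
    BufPrinter p;
    printArgs(p, "point is ", Point(1, 2));
    assert(p.data == "point is 1,2");
}
```

Dispatch happens at compile time through `static if`, so a type with no supported representation and no `print` member fails to compile at the `a.print(p)` call instead of producing wrong output at run time.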
May 24
Walter Bright <newshound2 digitalmars.com> writes:
On 5/24/2019 2:07 PM, Jonathan Marler wrote:
 It took me about an hour to port this "float to string" implementation to D:
 
 https://github.com/ulfjack/ryu
 https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d
https://github.com/ulfjack/ryu says:

"The Java implementation differs from the output of Double.toString in some cases: sometimes the output is shorter (which is arguably more accurate) and sometimes the output may differ in the precise digits output"

which I find fairly concerning.

Please review the paper I linked to in my reply to Jacob. Floating point formatting is not something that can be knocked out in an hour. You can get a "mostly working" implementation that way, but not a serious, robust, correct implementation with the expected flexibility. (And the test cases to prove it correct.) The fact that people write academic papers about it should be good evidence.

C's printf has been hammered on by literally generations of programmers over 3 decades. While the interface to it is old-fashioned and unsafe, the guts of it are rock solid, fast, and correct.
May 24
Jonathan Marler <johnnymarler gmail.com> writes:
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
 On 5/24/2019 2:07 PM, Jonathan Marler wrote:
 It took me about an hour to port this "float to string" 
 implementation to D:
 
 https://github.com/ulfjack/ryu
 https://github.com/dragon-lang/mar/blob/master/src/mar/ryu.d
 [...] Floating point formatting is not something that can be knocked
 out in an hour. You can get a "mostly working" implementation that
 way, but not a serious, robust, correct implementation with the
 expected flexibility. [...]
I didn't design an implementation in an hour, I just ported one :) Ulf's algorithm can be implemented in only a few hundred lines and apparently is the fastest implementation to date that maintains a 100% robust algorithm. At least that's what I remember from watching his video. https://pldi18.sigplan.org/details/pldi-2018-papers/20/Ry-Fast-Float-to-String-Conversion He explains in the video why this is a hard problem and tries to explain his paper/algorithm. But it's very new, only a year old I think. Cool innovation.
May 24
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 24 May 2019 at 23:55:13 UTC, Jonathan Marler wrote:
 Ulf's algorithm can be implemented in only a few hundred lines 
 and apparently is the fastest implementation to-date that 
 maintains a 100% robust algorithm.
It is quite interesting that you get that performance without bloat. I wonder if it is faster than the special-cased float implementations (using an estimator that chooses a faster floating-point version where it works).
 But's it's very new, only a year old I think.  Cool innovation.
Yes: ACM SIGPLAN Notices - PLDI '18 Volume 53 Issue 4, April 2018 Pages 270-282
May 25
Patrick Schluter <Patrick.Schluter bbox.fr> writes:
On Saturday, 25 May 2019 at 07:26:47 UTC, Ola Fosheim Grøstad 
wrote:
 On Friday, 24 May 2019 at 23:55:13 UTC, Jonathan Marler wrote:
 Ulf's algorithm can be implemented in only a few hundred lines 
 and apparently is the fastest implementation to-date that 
 maintains a 100% robust algorithm.
 It is quite interesting that you get that performance without 
 bloat.
L1 instruction caches are small, and the cost of code bloat is rarely accounted for. Benchmarks are overwhelmingly well-behaved with respect to instruction caches, so optimisation for the instruction cache gets neglected. On our project I once had a heavily optimised function with a lot of subcases, loop unrolling, etc. In the test benchmark it was the fastest of all the alternatives. In the final application, however, the simple 2-line loop in pure C outran it. With valgrind's cachegrind I discovered that the instruction-cache misses made a big, big difference.
 I wonder if it is faster than the special cased float 
 implementations. (using an estimator that chooses a faster 
 floating point version where it works).

 But's it's very new, only a year old I think.  Cool innovation.
May 25
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 25 May 2019 at 11:27:43 UTC, Patrick Schluter wrote:
 L1 instruction cache are small and the cost of code bloat is 
 only rarely counted. Benchmarks are overwhelmingly good 
 mannered concerning instruction caches.
Yes, in benchmarking one should only test full applications… I guess invalidating the cache every N iterations is a possibility in a synthetic benchmark.
 This makes that optimisation for instruction cache are 
 neglected.
Right, I'm interested in seeing what Mike Franklin does for embedded. A minimalistic framework would be interesting to see.
 concrete application. With valgrind cachegrind I discovered 
 that the misses in instruction cache made a big, big, 
 difference.
Interesting, I really need to try that cachegrind some time. Sounds very useful.
May 25
Mike Franklin <slavo5150 yahoo.com> writes:
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:

 C's printf has been hammered on by literally generations of 
 programmers over 3 decades. While the interface to it is 
 old-fashioned and unsafe, the guts of it are rock solid, fast, 
 and correct.
That may be true, but one problem with `printf` is it is much too large and inefficient for some problem domains [1]. Rust has a more efficient `printf` alternative which is not dependent on a runtime or libc [2]. D could offer a *much* more efficient, pay-for-what-you-use implementation that doesn't require libc, a runtime, etc., like Rust's implementation. It wouldn't be easy (especially wrt floating point types), but it would be a great benefit to D and its users. Maybe I'll add it to dlang/projects [3].

There seems to be a perception about C that because it's old and proven, it's magical. There's nothing `printf` is doing that D can't do better, if someone would just be willing to do the hard work.

Mike

[1] Minimizing memory use in embedded systems Tip #3 – Don’t use printf(): https://embeddedgurus.com/stack-overflow/tag/printf/
[2] std::fmt: https://doc.rust-lang.org/std/fmt/
[3] dlang/projects: https://github.com/dlang/projects
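To make the "doesn't require libc" point concrete for the easy case, here is a sketch of decimal integer formatting in plain D with no libc, GC, or exceptions. The name `formatLong` is hypothetical, not an existing Phobos or druntime function, and floating point (the hard part mentioned above) is deliberately not attempted.

```d
// Hypothetical sketch: decimal formatting of a long without libc, the GC,
// or exceptions, so it should also compile under -betterC.
// Digits are written right-to-left into the tail of `buf`, and the
// returned slice points at the formatted text inside `buf`.
char[] formatLong(long value, char[] buf) @safe @nogc nothrow pure
{
    // 20 chars covers long.min ("-9223372036854775808").
    assert(buf.length >= 20, "buffer too small");

    // Take the magnitude as ulong so that negating long.min cannot overflow.
    ulong mag = value < 0 ? 0UL - cast(ulong) value : cast(ulong) value;

    size_t i = buf.length;
    do
    {
        buf[--i] = cast(char)('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);

    if (value < 0)
        buf[--i] = '-';
    return buf[i .. $];
}
```

The design choice behind "pay for what you use": a general printf has to carry every conversion it supports, because the format string is only known at run time, whereas separate per-type routines like this one are only linked in when a program actually formats that type.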
May 24
Jonathan Marler <johnnymarler gmail.com> writes:
On Friday, 24 May 2019 at 23:58:46 UTC, Mike Franklin wrote:
 On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:

 [...]
 That may be true, but one problem with `printf` is it is much too
 large and inefficient for some problem domains [1]. [...]
My implementation is "pay for what you use". A pure D implementation that's also extensible.
May 24
Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 5/24/19 7:58 PM, Mike Franklin wrote:
 On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
 
 C's printf has been hammered on by literally generations of 
 programmers over 3 decades. While the interface to it is old-fashioned 
 and unsafe, the guts of it are rock solid, fast, and correct.
 That may be true, but one problem with `printf` is it is much too
 large and inefficient for some problem domains [1]. [...] D could
 offer a *much* more efficient, pay-for-what-you-use implementation
 that doesn't require libc, a runtime, etc., like Rust's
 implementation. [...] There's nothing `printf` is doing that D
 can't do better, if someone would just be willing to do the hard
 work.
The high impact part is the metaprogramming and introspection machinery. This is where D can contribute something innovative to the larger programming community. Yet another implementation of formatting primitives is low impact.
May 25
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Saturday, 25 May 2019 at 14:40:09 UTC, Andrei Alexandrescu 
wrote:
 The high impact part is the metaprogramming and introspection 
 machinery. This is where D can contribute something innovative 
 to the larger programming community. Yet another implementation 
 of formatting primitives is low impact.
Programmers are looking for solutions, not machinery… Many people might want to use D for Arduino if it were plug-and-play.
May 25
bpr <brogoff gmail.com> writes:
On Saturday, 25 May 2019 at 14:40:09 UTC, Andrei Alexandrescu 
wrote:
 The high impact part is the metaprogramming and introspection 
 machinery. This is where D can contribute something innovative 
 to the larger programming community.
I'd think you'd be commenting on the "Issue 5710" thread then, as that very much does affect the metaprogramming capabilities of D. I do agree that this is an area where D shines and where D can show off the most compared to its competitors.
 Yet another implementation of formatting primitives is low 
 impact.
You're probably right, but the people implementing this in D are already doing or have done it, so that's a good thing. Also, as someone who will likely only use the betterC subset of D, having some better primitives than printf in BetterC is a small boon, though I'd rather have something more like the new C++ fmt library than like printf.
May 25
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
 https://github.com/ulfjack/ryu says: "The Java implementation 
 differs from the output of Double.toString in some cases: 
 sometimes the output is shorter (which is arguably more 
 accurate) and sometimes the output may differ in the precise 
 digits output" which I find fairly concerning. Please review 
 the paper I linked to in my reply to Jacob.
AFAIK, Ulf Adams is stating that the Java implementation is sloppy (my word). He states that other implementations provide more digits than necessary to get an accurate representation. https://dl.acm.org/citation.cfm?id=3192369
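The "more digits than necessary" remark refers to the shortest-round-trip criterion: print the fewest significant digits that still parse back to the identical double. The following is a brute-force D sketch of that criterion, probing `%.*g` precisions with C's `snprintf` and `strtod`. It is not Ryu (which computes the shortest digits directly and much faster), `shortestRoundTripDigits` is a hypothetical name, and it assumes the C library itself rounds correctly, which is exactly the point under debate here.

```d
import core.stdc.stdio : snprintf;
import core.stdc.stdlib : strtod;

// Hypothetical brute-force check of the shortest-round-trip criterion.
// Returns the smallest %g precision whose output parses back to exactly x.
// Assumes a finite x and a correctly rounding snprintf/strtod; 17
// significant digits always suffice for an IEEE 754 double.
int shortestRoundTripDigits(double x)
{
    char[32] buf; // "%.17g" of any double fits comfortably
    foreach (int p; 1 .. 18)
    {
        snprintf(buf.ptr, buf.length, "%.*g", p, x);
        if (strtod(buf.ptr, null) == x)
            return p;
    }
    return 17;
}
```

For 0.1 this returns 1, whereas a fixed `%.17g` prints `0.10000000000000001`: both parse to the same double, but the longer form suggests precision that isn't there.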
 C's printf has been hammered on by literally generations of 
 programmers over 3 decades. While the interface to it is 
 old-fashioned and unsafe, the guts of it are rock solid, fast, 
 and correct.
Not really. He argues that the C spec isn't clear, so he follows more stringent criteria than C printf. Burger and Dybvig found errors in implementations from DEC, HP and SGI: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf Others claim to find roundoff errors in common printf implementations, e.g.: «The implementation that ships with Microsoft Visual C++ 2010 Express sometimes has an off-by-one error when rounding to the closest value.» http://www.ryanjuckett.com/programming/printing-floating-point-numbers/ (I haven't checked the claim, but it would not surprise me). It is clear that not using the C standard lib will bring more consistent and portable results across platforms, even if the C version is correct, as the C standard leaves wiggle room. This can be important in scientific computing when comparing results from various platforms.
May 25
Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
 https://github.com/ulfjack/ryu says: "The Java implementation 
 differs from the output of Double.toString in some cases: 
 sometimes the output is shorter (which is arguably more 
 accurate) and sometimes the output may differ in the precise 
 digits output" which I find fairly concerning. Please review 
 the paper I linked to in my reply to Jacob.
AFAIK, Ulf Adams is stating that the Java specification is unclear, so it is up for debate as to whether the Java implementation is wrong or whether the spec should be reviewed. He also states that other implementations provide more digits than necessary to get an accurate representation. https://dl.acm.org/citation.cfm?id=3192369
 C's printf has been hammered on by literally generations of 
 programmers over 3 decades. While the interface to it is 
 old-fashioned and unsafe, the guts of it are rock solid, fast, 
 and correct.
Not really. He argues that the C spec isn't clear, so he follows more stringent criteria than C printf. Burger and Dybvig found errors in implementations from DEC, HP and SGI: https://www.cs.indiana.edu/~dyb/pubs/FP-Printing-PLDI96.pdf Others claim to find roundoff errors in common printf implementations, e.g.: «The implementation that ships with Microsoft Visual C++ 2010 Express sometimes has an off-by-one error when rounding to the closest value.» http://www.ryanjuckett.com/programming/printing-floating-point-numbers/ (I haven't checked the claim, but it would not surprise me). It is clear that not using the C standard lib will bring more consistent and portable results across platforms, even if the C version is correct, as the C standard leaves wiggle room. This can be important in scientific computing when comparing results from various platforms.
May 25
Joseph Rushton Wakeling <joseph.wakeling webdrake.net> writes:
On Friday, 24 May 2019 at 22:59:10 UTC, Walter Bright wrote:
 https://github.com/ulfjack/ryu says: "The Java implementation 
 differs from the output of Double.toString in some cases: 
 sometimes the output is shorter (which is arguably more 
 accurate) and sometimes the output may differ in the precise 
 digits output" which I find fairly concerning. Please review 
 the paper I linked to in my reply to Jacob.
FWIW the Ryu algorithm looks like a serious piece of work. See this paper, which cites the paper you linked to and compares against it in detail: https://dl.acm.org/citation.cfm?id=3192369 It covers in some detail the rationale for the differences you note.
 Floating point formatting is not something that can be knocked 
 out in an hour. You can get a "mostly working" implementation 
 that way, but not a serious, robust, correct implementation 
 with the expected flexibility. (And the test cases to prove it 
 correct.)
One interesting remark in the paper on the Ryu algorithm: "We did not compare our implementation against the C standard library function printf, as its specification does not include the correctness criteria set forth by Steele and White [15], and, accordingly, neither the glibc nor the MacOS implementation does."
May 25
Walter Bright <newshound2 digitalmars.com> writes:
On 5/25/2019 1:42 AM, Joseph Rushton Wakeling wrote:
 FWIW the Ryu algorithm looks like a serious piece of work — see this paper, 
 which references (and compares in detail) to the paper you linked to:
 https://dl.acm.org/citation.cfm?id=3192369
 
 It covers in some detail the rationale for the differences you note.
Thank you. I've saved a copy of the paper. If it is indeed:

1. faster
2. more accurate
3. supports all the options (precision, etc.) with %e %f %g

then it is indeed a candidate for inclusion in Phobos' std.format. But not for this exercise.
 Floating point formatting is not something that can be knocked out in an hour. 
 You can get a "mostly working" implementation that way, but not a serious, 
 robust, correct implementation with the expected flexibility. (And the test 
 cases to prove it correct.)
 One interesting remark in the paper on the Ryu algorithm: "We did
 not compare our implementation against the C standard library
 function printf, as its specification does not include the
 correctness criteria set forth by Steele and White [15], and,
 accordingly, neither the glibc nor the MacOS implementation does."
The C standard library does indeed not have correctness criteria for floating point formatting nor for the trig functions, and that does lead to some sloppy implementations, but the mainstream compilers do it well and are expected to. I once attended a presentation on numerics by a math professor, who said the C trig functions on FreeBSD were extremely reliable. He was rather upset (and didn't believe me) when I said I'd tested them and found errors in the last digit. It should be a point of pride for the D language to have the floating point "good to the last bit" and I'm always interested in contributions that get us there.
May 25