digitalmars.D.announce - std.format with wstring and dstring

reply Robert Schadek <rburners gmail.com> writes:
I spent one day at DConf this year removing wstring and dstring
support from std.format to improve compile speed.
std.format.FormatSpec is no longer a template on the character type,
and the bitfield template was removed as well.

The dub package can be found here:
https://code.dlang.org/packages/std2_format
https://github.com/burner/std2.format

I compiled the format call below with ldc and -ftime-trace:
```
import std2.format;
//import std.format;

void main(){
     string s = format("Hello %s %s %.2f", "World", 1337, 13.37);
     assert(s == "Hello World 1337 13.37", s);
}

```

The overall compile time decreases from 290ms to 223ms, and the
frontend time for the format call goes from 71ms to 23ms.

Currently, the alias this and toString tests fail, and I can't really
figure out why. Some float tests fail as well.

PRs are always welcome.

Meta: Removing wstring and dstring support from std.format for
Phobos 3 should be looked at, IMHO.
Sep 07
next sibling parent reply monkyyy <crazymonkyyy gmail.com> writes:
Wouldn't it be far faster to just rip out all the C API complexity and do a simple, sane API?
Sep 07
parent reply Robert Schadek <rburners gmail.com> writes:
On Sunday, 7 September 2025 at 13:15:21 UTC, monkyyy wrote:
 Wouldn't it be far faster to just rip out all the C API complexity
 and do a simple, sane API?
No, not at all. The compile time goes into all the duck typing of toString on classes and structs, and into nested formats like %(%s,%).
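
For readers who have not used these two features, here is a minimal sketch against today's std.format. The `Point` struct is a made-up example type, not something from std2.format. It shows a `toString` that std.format finds by compile-time introspection, and the `%( ... %)` nested spec applied to a range.

```d
import std.format : format, formattedWrite;

// Hypothetical example type. std.format detects this toString by
// compile-time introspection (duck typing), not through an interface.
struct Point
{
    int x, y;

    void toString(scope void delegate(const(char)[]) sink) const
    {
        sink.formattedWrite("(%s, %s)", x, y);
    }
}

void main()
{
    // %( ... %) applies the inner spec to each element of a range; the
    // text after the inner spec is used as the separator between elements.
    assert(format("%(%s, %)", [1, 2, 3]) == "1, 2, 3");

    // The user-defined toString is picked up for %s.
    assert(format("%s", Point(1, 2)) == "(1, 2)");

    // Both features compose, which is part of what the library has to
    // work out at compile time for every argument type.
    assert(format("[%(%s; %)]", [Point(1, 2), Point(3, 4)]) == "[(1, 2); (3, 4)]");
}
```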
Sep 07
parent monkyyy <crazymonkyyy gmail.com> writes:
On Sunday, 7 September 2025 at 19:09:01 UTC, Robert Schadek wrote:
 On Sunday, 7 September 2025 at 13:15:21 UTC, monkyyy wrote:
 Wouldn't it be far faster to just rip out all the C API complexity
 and do a simple, sane API?
No, not at all. The compile time goes into all the duck typing of toString on classes and structs, and into nested formats like %(%s,%).
Based on what? This API is crazy. In what world are "nested" formats a simple or sane API?
Sep 08
prev sibling next sibling parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
std.format is on my list of modules that I want replaced.

It's written with a lot of error conditions, rather than just adapting to
the inputs. Hence lots of potential for exceptions being thrown that
don't need to be.

Multiple multipliers in ``formattedWrite``: format string, output range.

What I'd like to do is to force IES for one or more values; if you want
finer-grained control, you would do one value at a time (``formatValue``).

```d
writeln(i"$i ${i:X}: $(j + 1)/${(j + 1):X} $1 ${0:X}", k, obj, "atend");
```

Require the use of a string builder, rather than any old output range.
This is a required change for pretty printing. It requires the use of 
arbitrary inserts and removals that ranges can't do.

Every template parameter like these that you have is a multiplier of 
instances, and that isn't good for compile times. Simplifying them down 
may seem like a pain, but it helps quite significantly. Given that there 
are some clear requirements and use cases we can in fact simplify it 
without hurting anyone enough to care.
Sep 07
parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Mon, Sep 08, 2025 at 01:09:02PM +1200, Richard (Rikki) Andrew Cattermole via
Digitalmars-d-announce wrote:
 std.format is on my list of modules that I want replaced.
 
 It's written with a lot of error conditions, rather than just adapting
 to the inputs. Hence lots of potential for exceptions being thrown
 that don't need to be.
 
 Multiple multipliers in ``formattedWrite``: format string, output
 range.
IMO, a lot of template bloat could be removed by internally converting output ranges to a delegate of static type that receives a const(char)[] and writes to whatever output range was passed in from user code. 90% of the std.format code does not actually care for the concrete type of the output range; we do not need a copy of the entire formatting code for every output range type passed in. Just erase the type at the entry function and make most of the formatting code non-templated.
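
A minimal sketch of that type-erasure idea, assuming nothing about how Phobos is actually laid out. `Sink`, `writeUnsigned` and `writeUnsignedTo` are hypothetical names invented for illustration. Only the thin entry point is a template; the worker is compiled once no matter which output range the caller passes.

```d
import std.array : appender;
import std.range.primitives : put;

// Hypothetical static sink type: every output range is reduced to this
// delegate at the entry point.
alias Sink = void delegate(const(char)[]);

// Non-templated worker: there is a single copy of this code, regardless
// of how many different output range types user code passes in.
void writeUnsigned(scope Sink sink, ulong value)
{
    char[20] buf; // ulong.max has 20 decimal digits
    size_t pos = buf.length;
    do
    {
        buf[--pos] = cast(char)('0' + value % 10);
        value /= 10;
    }
    while (value != 0);
    sink(buf[pos .. $]);
}

// Thin templated entry point: the only part instantiated per Writer type.
// It erases the concrete range type by wrapping it in a delegate.
void writeUnsignedTo(Writer)(ref Writer w, ulong value)
{
    writeUnsigned((const(char)[] s) { put(w, s); }, value);
}

void main()
{
    auto app = appender!string();
    writeUnsignedTo(app, 1337);
    assert(app.data == "1337");
}
```

The cost is an indirect call per chunk written, which is why the sketch hands the sink whole slices rather than single characters.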
 What I'd like to do is to force IES for one or more values; if you
 want finer-grained control, you would do one value at a time
 (``formatValue``).
 
 ```d
 writeln(i"$i ${i:X}: $(j + 1)/${(j + 1):X} $1 ${0:X}", k, obj, "atend");
 ```
I'm still a fan of old-school format strings, I've to admit. Having to manually type `formatValue(x), formatValue(y), ...` is just way too much boilerplate.
 Require the use of a string builder, rather than any old output range.
Too much boilerplate to use a string builder.
 This is a required change for pretty printing. It requires the use of
 arbitrary inserts and removals that ranges can't do.
Format strings should be just strings. It should not accept arbitrary ranges (does it do that currently?). Arguments may be ranges.
 Every template parameter like these that you have is a multiplier of
 instances, and that isn't good for compile times. Simplifying them
 down may seem like a pain, but it helps quite significantly. Given
 that there are some clear requirements and use cases we can in fact
 simplify it without hurting anyone enough to care.
See, the thing is that the current implementation of std.format goes about things the wrong way. It really should be just a thin wrapper template, the sole purpose of which is to unpack the incoming arguments and forward them to non-templated formatting functions. Or, at least, formatting functions templated only on a *single* argument type (like formatValue!int, formatValue!string, formatValue!float, ...), or perhaps just overload on various basic types, maybe plus a couple of templates for handling structs and classes, not on the entire `Args...` tuple of types. The latter causes combinatorial explosion of template instances, which is both bloating and needless. Only the top-level std.format.format needs to be templated on `Args...`. This should be split up so that instead of O(n*m) template instantiations we have only O(n) template instantiations (or preferably, O(1) template instantiations if all the type-dependent stuff is handled at the top level, and all lower-level functions are isolated formatting functions that only do one job each).

Also, I dream of the day when we can pass compile-time format strings to std.format and it will *only* instantiate the formatting functions that you actually use. Float-formatting functions are particularly complex and bloaty; why should your program pay for that extra baggage if you never actually format a float? The various formatting functions should be pulled in only when you actually use them. Just because you call std.format with "%d" should not also pull in the whole shebang for formatting floats, structs, classes, BigInts, and who knows what else.

T

-- 
Economics: (n.) The science of explaining why yesterday's predictions didn't come true today.
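
The thin-wrapper layout described in this post could look roughly like the sketch below. It is an illustration under stated assumptions, not a proposal for the actual Phobos API: `Sink`, `formatOne` and `render` are hypothetical names, and format-string parsing is left out entirely so that only the dispatch structure is visible. Only `render` is templated on `Args...`; the per-type workers are plain overloads.

```d
import std.array : appender;

// Hypothetical type-erased sink shared by all workers.
alias Sink = void delegate(const(char)[]);

// Non-templated worker for integers (int, short, ... convert to long).
void formatOne(scope Sink sink, long value)
{
    char[20] buf; // enough for "-9223372036854775808"
    size_t pos = buf.length;
    immutable negative = value < 0;
    // Work on the unsigned magnitude so that long.min does not overflow.
    ulong mag = negative ? 0 - cast(ulong) value : cast(ulong) value;
    do
    {
        buf[--pos] = cast(char)('0' + mag % 10);
        mag /= 10;
    }
    while (mag != 0);
    if (negative)
        buf[--pos] = '-';
    sink(buf[pos .. $]);
}

// Non-templated worker for strings.
void formatOne(scope Sink sink, const(char)[] value)
{
    sink(value);
}

// The only template: it unpacks Args... and forwards each argument to a
// non-templated worker, writing a space between arguments.
string render(Args...)(Args args)
{
    auto app = appender!string();
    scope Sink sink = (const(char)[] s) { app.put(s); };
    foreach (i, arg; args) // unrolled at compile time over the tuple
    {
        static if (i != 0)
            sink(" ");
        formatOne(sink, arg);
    }
    return app.data;
}

void main()
{
    assert(render("Hello", -42, "World") == "Hello -42 World");
}
```

Each distinct argument list still instantiates `render` once, but that instance is only a handful of calls; the formatting logic itself is not duplicated.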
Sep 08
parent reply "Richard (Rikki) Andrew Cattermole" <richard cattermole.co.nz> writes:
On 09/09/2025 3:39 AM, H. S. Teoh wrote:
 On Mon, Sep 08, 2025 at 01:09:02PM +1200, Richard (Rikki) Andrew Cattermole
via Digitalmars-d-announce wrote:
 std.format is on my list of modules that I want replaced.

 It's written with a lot of error conditions, rather than just adapting
 to the inputs. Hence lots of potential for exceptions being thrown
 that don't need to be.

 Multiple multipliers in ``formattedWrite``: format string, output
 range.
IMO, a lot of template bloat could be removed by internally converting output ranges to a delegate of static type that receives a const(char)[] and writes to whatever output range was passed in from user code. 90% of the std.format code does not actually care for the concrete type of the output range; we do not need a copy of the entire formatting code for every output range type passed in. Just erase the type at the entry function and make most of the formatting code non-templated.
 What I'd like to do is to force IES for one or more values; if you
 want finer-grained control, you would do one value at a time
 (``formatValue``).

 ```d
 writeln(i"$i ${i:X}: $(j + 1)/${(j + 1):X} $1 ${0:X}", k, obj, "atend");
 ```
I'm still a fan of old-school format strings, I've to admit. Having to manually type `formatValue(x), formatValue(y), ...` is just way too much boilerplate.
Agreed, it is too much boilerplate. If we have to add a runtime string option then we can; the machinery will all be there. However, it shouldn't be the option people reach for. How long have we been recommending the template parameter for formatting over the runtime one? A good 10+ years now, right?
 Require the use of a string builder, rather than any old output range.
Too much boilerplate to use a string builder.
Nah. We use appenders in place of the string builder today. It's a direct 1:1 swap.
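
For context, this is what the appender usage looks like with the current Phobos API. It is a small sketch of existing behaviour only; the string builder proposed above is not shown, since the thread does not specify its interface.

```d
import std.array : appender;
import std.format : formattedWrite;

void main()
{
    // Appender plays the role of the string builder today.
    auto builder = appender!string();
    builder.formattedWrite("%s + %s", 1, 2);
    builder.put(" = ");
    builder.formattedWrite("%d", 3);
    assert(builder.data == "1 + 2 = 3");
}
```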
 This is a required change for pretty printing. It requires the use of
 arbitrary inserts and removals that ranges can't do.
Format strings should be just strings. It should not accept arbitrary ranges (does it do that currently?). Arguments may be ranges.
The reply isn't matching what I said?
 Every template parameter like these that you have is a multiplier of
 instances, and that isn't good for compile times. Simplifying them
 down may seem like a pain, but it helps quite significantly. Given
 that there are some clear requirements and use cases we can in fact
 simplify it without hurting anyone enough to care.
See, the thing is that the current implementation of std.format goes about things the wrong way. It really should be just a thin wrapper template, the sole purpose of which is to unpack the incoming arguments and forward them to non-templated formatting functions. Or, at least, formatting functions templated only on a *single* argument type (like formatValue!int, formatValue!string, formatValue!float, ...), or perhaps just overload on various basic types, maybe plus a couple of templates for handling structs and classes, not on the entire `Args...` tuple of types. The latter causes combinatorial explosion of template instances, which is both bloating and needless. Only the top-level std.format.format needs to be templated on `Args...`. This should be split up so that instead of O(n*m) template instantiations we have only O(n) template instantiations (or preferably, O(1) template instantiations if all the type-dependent stuff is handled at the top level, and all lower-level functions are isolated formatting functions that only do one job each).
Looks like it's three template parameters: writer, format spec char and value type. Two of which I want gone. Doesn't appear to have a central dispatcher; it's leaving it to overloading, which is not a good design IMO. We'd like it to be closer to mine (although for whatever reason I've still got the builder templated): https://github.com/Project-Sidero/basic_memory/blob/main/source/sidero/base/text/format/rawwrite.d#L13
 Also, I dream of the day when we can pass compile-time format strings to
 std.format and it will *only* instantiate the formatting functions that
 you actually use. Float-formatting functions are particularly complex
 and bloaty; why should your program pay for that extra baggage if you
 never actually format a float?  The various formatting functions should
 be pulled in only when you actually use them.  Just because you call
 std.format with "%d" should not also pull in the whole shebang for
 formatting floats, structs, classes, BigInts, and who knows what else.
It already is. https://github.com/dlang/phobos/blob/master/std/format/internal/write.d#L575
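
For reference, the template-parameter form recommended earlier in this post looks like this in today's Phobos. The sketch only demonstrates the compile-time checking of the format string; it does not measure which formatting functions get instantiated.

```d
import std.exception : assertThrown;
import std.format : format, FormatException;

void main()
{
    // Format string passed as a template argument: it is validated
    // against the argument types at compile time.
    string s = format!"%s is %d years old"("Alice", 30);
    assert(s == "Alice is 30 years old");

    // An incompatible argument is rejected by the compiler ...
    static assert(!__traits(compiles, format!"%d"(4.03)));

    // ... whereas the runtime form only fails when it is executed.
    assertThrown!FormatException(format("%d", 4.03));
}
```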
Sep 08
parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Tue, Sep 09, 2025 at 02:26:20PM +1200, Richard (Rikki) Andrew Cattermole via
Digitalmars-d-announce wrote:
 On 09/09/2025 3:39 AM, H. S. Teoh wrote:
[...]
 I'm still a fan of old-school format strings, I've to admit. Having
 to manually type `formatValue(x), formatValue(y), ...` is just way
 too much boilerplate.
Agreed, it is too much boilerplate. If we have to add a runtime string option then we can; the machinery will all be there. However, it shouldn't be the option people reach for. How long have we been recommending the template parameter for formatting over the runtime one? A good 10+ years now, right?
 Require the use of a string builder, rather than any old output range.
Too much boilerplate to use a string builder.
Nah. We use appenders in place of the string builder today.
Oh, you mean use a string builder internally? That makes sense. I misunderstood; I thought you meant for user code to use a string builder.
[...]
 See, the thing is that the current implementation of std.format goes
 about things the wrong way.  It really should be just a thin wrapper
 template, the sole purpose of which is to unpack the incoming
 arguments and forward them to non-templated formatting functions.
 Or, at least, formatting functions templated only on a *single*
 argument type (like formatValue!int, formatValue!string,
 formatValue!float, ...), or perhaps just overload on various basic
 types, maybe plus a couple of templates for handling structs and
 classes, not on the entire `Args...` tuple of types.  The latter
 causes combinatorial explosion of template instances, which is both
 bloating and needless.  Only the top-level std.format.format needs
 to be templated on `Args...`.  This should be split up so that
 instead of O(n*m) template instantiations we have only O(n) template
 instantiations (or preferably, O(1) template instantiations if all
 the type-dependent stuff is handled at the top level, and all
 lower-level functions are isolated formatting functions that only do
 one job each).
Looks like it's three template parameters: writer, format spec char and value type. Two of which I want gone.
Yeah, the writer should be type-erased to a delegate that receives string data, the format spec should not be templated on char type. Value type should pretty much be the only template parameter.
 Doesn't appear to have a central dispatcher; it's leaving it to
 overloading, which is not a good design IMO.
Yeah it's a mess. :-/ I did look at this code some years ago, hoping to find low-hanging fruit to improve it, but gave up after struggling with the tangled mess that it was in.
 We'd like it to be closer to mine (although for whatever reason I've
 still got the builder templated):
 
 https://github.com/Project-Sidero/basic_memory/blob/main/source/sidero/base/text/format/rawwrite.d#L13
Standardizing to a delegate for the builder allows us to swap out different builders without incurring any template bloat. Not sure if that's necessary, but could be a nice escape hatch just in case an unusual case arises. Templates are powerful but sometimes type erasure is called for.
 Also, I dream of the day when we can pass compile-time format strings to
 std.format and it will *only* instantiate the formatting functions that
 you actually use. Float-formatting functions are particularly complex
 and bloaty; why should your program pay for that extra baggage if you
 never actually format a float?  The various formatting functions should
 be pulled in only when you actually use them.  Just because you call
 std.format with "%d" should not also pull in the whole shebang for
 formatting floats, structs, classes, BigInts, and who knows what else.
It already is. https://github.com/dlang/phobos/blob/master/std/format/internal/write.d#L575
+1.

T

-- 
Talk is cheap. Whining is actually free. -- Lars Wirzenius
Sep 08
prev sibling next sibling parent reply IchorDev <zxinsworld gmail.com> writes:
On Sunday, 7 September 2025 at 12:29:08 UTC, Robert Schadek wrote:
 Meta: Removing wstring and dstring support from std.format for 
 phobos 3 should be looked at IMHO.
As I understand it, Phobos 3 will basically not support UTF-16 or UTF-32 anymore except for encoding conversion. As someone who appreciates UTF-32's elegant simplicity, I find this saddening.
Sep 08
parent Dom DiSc <dominikus scherkl.de> writes:
On Monday, 8 September 2025 at 08:49:53 UTC, IchorDev wrote:
 On Sunday, 7 September 2025 at 12:29:08 UTC, Robert Schadek 
 wrote:
 Meta: Removing wstring and dstring support from std.format for 
 phobos 3 should be looked at IMHO.
As I understand it, Phobos 3 will basically not support UTF-16 or UTF-32 anymore except for encoding conversion. As someone who appreciates UTF-32's elegant simplicity, I find this saddening.
This "simplicity" is gone the moment you start working with graphemes. A "unit" on the screen may consist of multiple characters no matter which encoding you use, so why not stay with a single one (UTF-8)?
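
A small illustration of the point, using std.uni from current Phobos. The sample text is my own choice: the letter e followed by U+0301 COMBINING ACUTE ACCENT, which renders as a single "é" on screen.

```d
import std.range : walkLength;
import std.uni : byGrapheme;

void main()
{
    // 'e' followed by U+0301 COMBINING ACUTE ACCENT: one visible unit.
    string utf8 = "e\u0301";
    dstring utf32 = "e\u0301"d;

    assert(utf8.length == 3);  // UTF-8 code units: 1 for 'e', 2 for U+0301
    assert(utf32.length == 2); // UTF-32 code units, i.e. code points

    // Neither length matches what the user sees; both encodings need
    // grapheme segmentation to arrive at 1.
    assert(utf8.byGrapheme.walkLength == 1);
    assert(utf32.byGrapheme.walkLength == 1);
}
```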
Sep 08
prev sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Sun, Sep 07, 2025 at 12:29:08PM +0000, Robert Schadek via
Digitalmars-d-announce wrote:
 I spent one day at DConf this year removing wstring and dstring
 support from std.format to improve compile speed.
 std.format.FormatSpec is no longer a template on the character type,
 and the bitfield template was removed as well.
[...]

+1, time to get rid of excess baggage.

T

-- 
PENTIUM = Produces Erroneous Numbers Thru Incorrect Understanding of Mathematics
Sep 08