digitalmars.D - Why is there no lazy `format`?

reply burt <invalid_email_address cab.abc> writes:
Hello,

I noticed that there is the function `formattedWrite`, which 
outputs its resulting strings to an output range, as follows:
```
unittest
{
     auto output = appender!string();
     output.formattedWrite!"%s %s"(1, 2);
     assert(output.data == "1 2");
}
```

But why is there no formatting function that returns a lazy input 
range? That way, string formatting with (barely) any allocation 
would be possible, in the following way:
```
@nogc unittest
{
     auto range = formatRange!"%s %s"(42, 43);
     assert(range.front == "42");
     range.popFront();
     assert(range.front == " ");
     range.popFront();
     assert(range.front == "43");
     range.popFront();
     assert(range.empty);
}
```

The range returned by `formatRange` could have an internal buffer 
of maybe 16 characters that stores small strings, e.g. for small 
integers. It would also allow chaining with other range 
algorithms: you would call `.joiner()` on it to get an input 
range of chars.

Is this something worth including in the standard library 
(presumably in std.format)?

(The same may also be possible for `std.conv.text` but I did not 
look into this.)
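
For the integer-only case, something close to this can be put together from pieces that are already in Phobos. The following is only a minimal sketch, not the proposed `formatRange`: `lazyPair` is a hypothetical helper name, and it handles two `int` arguments with a fixed separator rather than an arbitrary format string. It relies on `std.conv.toChars`, which returns a lazy range of characters for an integer, and on `std.range.chain` to splice the pieces together.

```d
import std.algorithm.comparison : equal;
import std.conv : toChars;
import std.range : chain;
import std.utf : byCodeUnit;

// Hypothetical helper (not a Phobos function): lazily yields the characters
// of "<a> <b>" without ever building the string.
auto lazyPair(int a, int b) @nogc nothrow
{
    // toChars produces the decimal digits lazily; byCodeUnit keeps the
    // separator as a range of char instead of auto-decoding it to dchar.
    return chain(toChars(a), " ".byCodeUnit, toChars(b));
}

unittest
{
    assert(lazyPair(42, 43).equal("42 43"));
}
```

This works because the size of a formatted integer is bounded, so no growable buffer is needed. It does not extend to arbitrary `%s` arguments.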
Oct 20 2020
next sibling parent reply rikki cattermole <rikki cattermole.co.nz> writes:
You are describing the purpose of an output range.

I.e.

void test() {
     InPlaceAppender appender;

     appender.formattedWrite!"%d: %d"(123, 456);

     stdout.rawWrite(appender.get);
}

struct InPlaceAppender {
     private {
         char[ushort.max] buffer;
         size_t used;
     }

     @disable this(this);

     void put(char c) {
         assert(used < buffer.length);
         buffer[used++] = c;
     }

     scope char[] get() {
         return buffer[0 .. used];
     }

     void reset() {
        used = 0;
        buffer[] = '\0';
     }
}
Oct 20 2020
parent burt <invalid_email_address cab.abc> writes:
On Tuesday, 20 October 2020 at 13:45:20 UTC, rikki cattermole 
wrote:
 You are describing the purpose of an output range.

 I.e.

 [...]
I see. However, this still feels wrong; after all, we also do not 
use an output range for algorithms like map:
```
OutputRange output;
[1, 2, 3].map!((x) => x + 1)(output);
```
We avoid that mostly because it would not allow chaining like 
`lazyFormat("%d plus %d is %d", 1, 2, 3).joiner().map!toUpper()`. 
It still feels inconsistent to me. An input range could achieve 
the same goals, but it would be much more flexible and pleasing 
to use.
Oct 20 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 10/20/20 9:28 AM, burt wrote:
 
 The range returned by `formatRange` could have an internal buffer of 
 maybe 16 characters that stores small strings, e.g. for small integers. 
 It would also allow chaining with other range algorithms: you would call 
 `.joiner()` on it to get an input range of chars.
 
 Is this something worth including in the standard library (presumably in 
 std.format)?
 
 (The same may also be possible for `std.conv.text` but I did not look 
 into this.)
I think it's possible, but also it needs a buffer. Which means it needs to allocate. Even a 16 character buffer might not be enough.

std.format is not designed around tracking an in-progress conversion, so you would have to convert whole things at once. It might not be that desirable.

For example:

formatRange("%s", someLargeArrayOrStruct);

this is going to have to buffer the *whole thing*, and then give you lazy access to the buffer.

In order for this to work, I think you would have to redesign how format works. It's not an easy thing, but could be an interesting way of looking at it.

Note that you can probably mimic this with fibers, but that's really heavy for this task. And you still need to allocate a buffer.

-Steve
Oct 20 2020
parent reply "H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Tue, Oct 20, 2020 at 01:10:12PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
 std.format is not designed around tracking an in-progress conversion,
 so you would have to convert whole things at once. It might not be
 that desirable.
 
 For example:
 
 formatRange("%s", someLargeArrayOrStruct);
 
 this is going to have to buffer the *whole thing*, and then give you
 lazy access to the buffer.
Yeah, I think std.format's design isn't really conducive to lazy access. Also, the way the OP wrote the example code isn't really consistent, because it appears to be returning segments of the formatted string rather than characters in the string, i.e., it behaves like `string[]` rather than `string`, which isn't how std.format is designed to work.

If anything, perhaps what's closer to what the OP wants is a lazy version of text(), because there you can actually format the arguments individually and lazily. But nonetheless, as Steven said, you still need a buffer of arbitrary size, because the .toString of an arbitrary user-defined type can return an arbitrary amount of formatted data. You also cannot impose @nogc, because .toString methods can potentially be allocating (complex ones almost certainly will).

In such scenarios, output ranges are a much better way to control allocations -- the caller specifies the allocation scheme (by passing in an output range that implements the desired allocation scheme).

What *would* be nice is a standard library construct for inverting an output range into an input range. Fibers are one way of doing this. Basically, the pipeline up to the output range runs in its own fiber, which starts out backgrounded. As data is requested from the input range end of the interface, it context-switches to the output range fiber, which generates data that gets saved into a buffer. At some point that fiber calls Fiber.yield(), and the input range end starts spooling the generated data to the caller. Once the buffered data is exhausted, it context-switches to the output range fiber again, and so on.

Note that this does not alleviate the need for buffering, and it's not 100% lazy; what it primarily does is to give a nice input range interface for stuff written into an output range. I don't expect it will do very well performance-wise either, unless the data generators are designed to cooperate with the inverter -- but in that case, they would have been written to return an input range instead of requiring an output range in the first place. So this construct is really more for convenience than anything.

T

-- 
Любишь кататься - люби и саночки возить.
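
The fiber-based inversion described above can be approximated with `std.concurrency.Generator`, a fiber-backed input range that already exists in Phobos. What follows is only a rough sketch under stated assumptions: `YieldingSink` and `formatViaFiber` are hypothetical names, the format string is hard-coded, and the sink hands over one character per context switch instead of spooling from a buffer, so it is likely to be slow.

```d
import std.algorithm.comparison : equal;
import std.concurrency : Generator, yield;
import std.format : formattedWrite;

// Hypothetical sink (not a Phobos type): every character written to it is
// handed to the consumer by suspending the surrounding Generator fiber.
struct YieldingSink
{
    void put(const(char)[] s)
    {
        foreach (char c; s)
            yield(c); // switches back to whoever is reading the range
    }
}

// Hypothetical helper: exposes what formattedWrite produces as an input
// range of char. The closure and the fiber stack are both allocated.
Generator!char formatViaFiber(int a, int b, int c)
{
    return new Generator!char({
        YieldingSink sink;
        sink.formattedWrite!"%d plus %d is %d"(a, b, c);
    });
}

unittest
{
    assert(formatViaFiber(1, 2, 3).equal("1 plus 2 is 3"));
}
```

The result can be chained like any other input range, but neither the fiber nor the closure is `@nogc`, which matches the caveats above: this buys convenience, not performance.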
Oct 20 2020
parent burt <invalid_email_address cab.abc> writes:
On Tuesday, 20 October 2020 at 18:03:32 UTC, H. S. Teoh wrote:
 [...]

 Yeah, I think std.format's design isn't really conducive to 
 lazy access. Also, the way the OP wrote the example code isn't 
 really consistent, because it appears to be returning segments 
 of the formatted string rather than characters in the string, 
 i.e., it behaves like `string[]` rather than `string`, which 
 isn't how std.format is designed to work.
Well, the idea was that you could call `join()` or `flatten()` or whatever it is called to turn it into an input range of chars. But it could also do that directly. I understand now why returning an input range could be problematic though.
 [...]
 What *would* be nice, is a standard library construct for 
 inverting an output range into an input range. Fibers is one 
 way of doing this. Basically, the pipeline up to the output 
 range will run in its own fiber, and initially it's 
 backgrounded. As data is requested from the input range end of 
 the interface, it will context-switch to the output range fiber 
 and generate data which gets saved into a buffer. At some point 
 calling Fiber.yield(); then the input range end will start 
 spooling the generated data to the caller.  Once the buffered 
 data is exhausted, it context-switches to the output range 
 fiber again, etc..

 Note that this does not alleviate the need for buffering, and 
 it's not 100% lazy; what it primarily does is to give a nice 
 input range interface for stuff written into an output range.  
 I don't expect it will do very well performance-wise either, 
 unless the data generators are designed to cooperate with the 
 inverter -- but in that case, they would have been written to 
 return an input range instead of requiring an output range in 
 the first place. So this construct is really more for 
 convenience than anything.
Interesting idea. Although maybe it doesn't even have to use fibers to work, if you're willing to give up the laziness part:
```
/*ref*/ O pipeRange(alias fn, O, T...)(/*ref*/ O output, T args)
     if (isInputRange!O && isOutputRange!O)
{
     fn(output, args);
     return output;
}

auto thing = appender!string()
     .pipeRange!formattedWrite("%d plus %d is %d", 1, 2, 3)
     .map!toUpperCase()
     .array();
```
Or something like that.
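
One way the `pipeRange` idea might be made to compile is sketched below. It departs from the snippet as posted in three places, all of them assumptions on my part: `isOutputRange` is given an element type (`char`), the appender's buffer is read back through `.data` because `Appender` is not itself an input range, and `std.ascii.toUpper` stands in for `toUpperCase`.

```d
import std.algorithm : equal, map;
import std.array : appender;
import std.ascii : toUpper;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;

// Runs fn against the sink, then returns the sink so the call can be chained.
O pipeRange(alias fn, O, T...)(O output, T args)
    if (isOutputRange!(O, char))
{
    fn(output, args);
    return output;
}

unittest
{
    auto shouted = appender!string()
        .pipeRange!formattedWrite("%d plus %d is %d", 1, 2, 3)
        .data                     // read the finished buffer back out
        .map!(c => toUpper(c));   // lazy only from this point on
    assert(shouted.equal("1 PLUS 2 IS 3"));
}
```

The formatting step still runs eagerly and allocates through the appender; only the stages after `.data` are lazy.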
Oct 20 2020