digitalmars.D - Why is there no lazy `format`?

burt (36/36) Oct 20 2020 Hello,

rikki cattermole (25/25) Oct 20 2020 You are describing the purpose of an output range.

burt (13/16) Oct 20 2020 I see. However, this still feels wrong; after all, we also do not

Steven Schveighoffer (16/27) Oct 20 2020 I think it's possible, but also it needs a buffer. Which means it needs

H. S. Teoh (37/47) Oct 20 2020 Yeah, I think std.format's design isn't really conducive to lazy access.

burt (21/49) Oct 20 2020 Well, the idea was that you could call `join()` or `flatten()` or

burt <invalid_email_address cab.abc> writes:

Hello,

I noticed that there is the function `formattedWrite`, which 
outputs its resulting strings to an output range, as follows:
```
import std.array : appender;
import std.format : formattedWrite;

unittest
{
     auto output = appender!string();
     output.formattedWrite!"%s %s"(1, 2);
     assert(output.data == "1 2");
}
```

But why is there no formatting function that returns a lazy input 
range? That way, string formatting with (barely) any allocation 
would be possible, in the following way:
```
@nogc unittest
{
     auto range = formatRange!"%s %s"(42, 43);
     assert(range.front == "42");
     range.popFront();
     assert(range.front == " ");
     range.popFront();
     assert(range.front == "43");
     range.popFront();
     assert(range.empty);
}
```

The range returned by `formatRange` could have an internal buffer 
of maybe 16 characters that stores small strings, e.g. for small 
integers. It would also allow chaining with other range 
algorithms: you would call `.joiner()` on it to get an input 
range of chars.

Is this something worth including in the standard library 
(presumably in std.format)?

(The same may also be possible for `std.conv.text` but I did not 
look into this.)

Oct 20 2020

rikki cattermole <rikki cattermole.co.nz> writes:

You are describing the purpose of an output range.

I.e.

import std.format : formattedWrite;
import std.stdio : stdout;

void test() {
     InPlaceAppender appender;

     appender.formattedWrite!"%d: %d"(123, 456);

     stdout.rawWrite(appender.get);
}

struct InPlaceAppender {
     private {
         char[ushort.max] buffer;
         size_t used;
     }

     @disable this(this);

     void put(char c) {
         assert(used < buffer.length);
         buffer[used++] = c;
     }

     scope char[] get() {
         return buffer[0 .. used];
     }

     void reset() {
        used = 0;
        buffer[] = '\0';
     }
}

Oct 20 2020

burt <invalid_email_address cab.abc> writes:

On Tuesday, 20 October 2020 at 13:45:20 UTC, rikki cattermole 
wrote:
> You are describing the purpose of an output range.
>
> I.e.
>
> [...]

I see. However, this still feels wrong; after all, we also do not 
use an output range for algorithms like map:
```
OutputRange output;
[1, 2, 3].map!((x) => x + 1)(output);
```

Mostly because it does not allow chaining like `lazyFormat("%d 
plus %d is %d", 1, 2, 3).joiner().map!toUpper()`.

It still feels inconsistent to me. An input range could achieve 
the same goals, but it would be much more flexible and pleasing 
to use.

Oct 20 2020

Steven Schveighoffer <schveiguy gmail.com> writes:

On 10/20/20 9:28 AM, burt wrote:
 
> The range returned by `formatRange` could have an internal buffer of
> maybe 16 characters that stores small strings, e.g. for small integers.
> It would also allow chaining with other range algorithms: you would call
> `.joiner()` on it to get an input range of chars.
>
> Is this something worth including in the standard library (presumably in
> std.format)?
>
> (The same may also be possible for `std.conv.text` but I did not look
> into this.)

I think it's possible, but also it needs a buffer. Which means it needs 
to allocate. Even a 16 character buffer might not be enough.

std.format is not designed around tracking an in-progress conversion, so 
you would have to convert whole things at once. It might not be that 
desirable.

For example:

formatRange("%s", someLargeArrayOrStruct);

this is going to have to buffer the *whole thing*, and then give you 
lazy access to the buffer.

In order for this to work, I think you would have to redesign how format 
works. It's not an easy thing, but could be an interesting way of 
looking at it.

Note that you can probably mimic this with fibers, but that's really 
heavy for this task. And you still need to allocate a buffer.

-Steve

Oct 20 2020

"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:

On Tue, Oct 20, 2020 at 01:10:12PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
> std.format is not designed around tracking an in-progress conversion,
> so you would have to convert whole things at once. It might not be
> that desirable.
>
> For example:
>
> formatRange("%s", someLargeArrayOrStruct);
>
> this is going to have to buffer the *whole thing*, and then give you
> lazy access to the buffer.

Yeah, I think std.format's design isn't really conducive to lazy access.
Also, the way the OP wrote the example code isn't really consistent,
because it appears to be returning segments of the formatted string
rather than characters in the string, i.e., it behaves like `string[]`
rather than `string`, which isn't how std.format is designed to work.

If anything, perhaps what's closer to what the OP wants is a lazy
version of text(), because there you can actually individually format
arguments lazily.  But nonetheless, as Steven said, you still need a
buffer of arbitrary size because the .toString of an arbitrary
user-defined type can return an arbitrary amount of formatted data.  You
also cannot impose @nogc, because .toString methods can potentially be
allocating (complex ones almost certainly will).

In such scenarios, output ranges are a much better way to control
allocations -- the caller specifies the allocation scheme (by passing in
an output range that implements the desired allocation scheme).
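
A minimal sketch of the "caller picks the allocation scheme" point, using Phobos' `std.format.sformat`, which writes into a buffer the caller supplies. The helper name `formatPair` is made up for illustration; only `sformat` is a library function.

```d
import std.format : sformat;

// Hypothetical helper: formats two integers into caller-owned storage.
// The returned slice points into `buf`; no GC array is built for the result.
char[] formatPair(char[] buf, int a, int b)
{
    return sformat(buf, "%d: %d", a, b);
}

unittest
{
    char[32] storage;                       // stack buffer chosen by the caller
    auto s = formatPair(storage[], 123, 456);
    assert(s == "123: 456");
    assert(s.ptr is storage.ptr);           // result is a slice of the buffer
}
```

If the buffer is too small, `sformat` reports an error rather than growing it, so the size limit stays the caller's decision.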

What *would* be nice, is a standard library construct for inverting an
output range into an input range. Fibers is one way of doing this.
Basically, the pipeline up to the output range will run in its own
fiber, and initially it's backgrounded. As data is requested from the
input range end of the interface, it will context-switch to the output
range fiber and generate data which gets saved into a buffer. At some
point calling Fiber.yield(); then the input range end will start
spooling the generated data to the caller.  Once the buffered data is
exhausted, it context-switches to the output range fiber again, etc..

Note that this does not alleviate the need for buffering, and it's not
100% lazy; what it primarily does is to give a nice input range
interface for stuff written into an output range.  I don't expect it
will do very well performance-wise either, unless the data generators
are designed to cooperate with the inverter -- but in that case, they
would have been written to return an input range instead of requiring an
output range in the first place. So this construct is really more for
convenience than anything.
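
A rough sketch of such an inverter, assuming `std.concurrency.Generator` (a fiber-backed input range already in Phobos) as the fiber machinery. `YieldSink` and `lazyFormat` are hypothetical names, not library API. Unlike the buffered scheme described above, this version hands over one character per context switch, so it trades the buffer for even more switching.

```d
import std.algorithm.comparison : equal;
import std.concurrency : Generator, yield;
import std.format : formattedWrite;

// Hypothetical output range: each character put into it is yielded to
// whoever is consuming the enclosing Generator.
struct YieldSink
{
    void put(char c) { yield(c); }
}

// Hypothetical helper: runs formattedWrite inside a fiber and exposes the
// formatted text as an input range of char.
auto lazyFormat(Args...)(string fmt, Args args)
{
    return new Generator!char({
        YieldSink sink;
        sink.formattedWrite(fmt, args);
    });
}

unittest
{
    auto r = lazyFormat("%d plus %d is %d", 1, 2, 3);
    assert(r.equal("1 plus 2 is 3"));
}
```

The fiber and the closure are still allocated, so this is a convenience wrapper and not a route to `@nogc` formatting.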


T

-- 
Любишь кататься - люби и саночки возить.

Oct 20 2020

burt <invalid_email_address cab.abc> writes:

On Tuesday, 20 October 2020 at 18:03:32 UTC, H. S. Teoh wrote:
> [...]
>
> Yeah, I think std.format's design isn't really conducive to
> lazy access. Also, the way the OP wrote the example code isn't
> really consistent, because it appears to be returning segments
> of the formatted string rather than characters in the string,
> i.e., it behaves like `string[]` rather than `string`, which
> isn't how std.format is designed to work.

Well, the idea was that you could call `join()` or `flatten()` or 
whatever it is called to turn it into an input range of chars. 
But it could also do that directly.

I understand now why returning an input range could be 
problematic though.

> [...]
> What *would* be nice, is a standard library construct for
> inverting an output range into an input range. Fibers is one
> way of doing this. Basically, the pipeline up to the output
> range will run in its own fiber, and initially it's
> backgrounded. As data is requested from the input range end of
> the interface, it will context-switch to the output range fiber
> and generate data which gets saved into a buffer. At some point
> calling Fiber.yield(); then the input range end will start
> spooling the generated data to the caller.  Once the buffered
> data is exhausted, it context-switches to the output range
> fiber again, etc..
>
> Note that this does not alleviate the need for buffering, and
> it's not 100% lazy; what it primarily does is to give a nice
> input range interface for stuff written into an output range.
> I don't expect it will do very well performance-wise either,
> unless the data generators are designed to cooperate with the
> inverter -- but in that case, they would have been written to
> return an input range instead of requiring an output range in
> the first place. So this construct is really more for
> convenience than anything.

Interesting idea. Although maybe it doesn't even have to use 
fibers to work, if you're willing to give up the laziness part:

```
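import std.algorithm : map;
import std.array : appender, array;
import std.format : formattedWrite;
import std.range.primitives : isOutputRange;
import std.uni : toUpper;

// Runs `fn` eagerly against `output`, then returns `output` for chaining.
O pipeRange(alias fn, O, T...)(O output, T args)
if (isOutputRange!(O, char))
{
     fn(output, args);
     return output;
}

unittest
{
     auto thing = appender!string()
          .pipeRange!formattedWrite("%d plus %d is %d", 1, 2, 3)
          .data          // Appender is not an input range; use its contents
          .map!toUpper
          .array();
     assert(thing == "1 PLUS 2 IS 3"d);
}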
```

Or something like that.

Oct 20 2020
