digitalmars.D - Why is there no lazy `format`?
- burt (36/36) Oct 20 2020 Hello,
- rikki cattermole (25/25) Oct 20 2020 You are describing the purpose of an output range.
- burt (13/16) Oct 20 2020 I see. However, this still feels wrong; after all, we also do not
- Steven Schveighoffer (16/27) Oct 20 2020 I think it's possible, but also it needs a buffer. Which means it needs
- H. S. Teoh (37/47) Oct 20 2020 Yeah, I think std.format's design isn't really conducive to lazy access.
- burt (21/49) Oct 20 2020 Well, the idea was that you could call `join()` or `flatten()` or
Hello,
I noticed that there is the function `formattedWrite`, which
outputs its resulting strings to an output range, as follows:
```
import std.array : appender;
import std.format : formattedWrite;

unittest
{
    auto output = appender!string();
    output.formattedWrite!"%s %s"(1, 2);
    assert(output.data == "1 2");
}
```
But why is there no formatting function that returns a lazy input
range? That way, string formatting with (barely) any allocation
would be possible, in the following way:
```
@nogc unittest
{
    // `formatRange` is the proposed function; it does not exist in Phobos.
    auto range = formatRange!"%s %s"(42, 43);
    assert(range.front == "42");
    range.popFront();
    assert(range.front == " ");
    range.popFront();
    assert(range.front == "43");
    range.popFront();
    assert(range.empty);
}
```
The range returned by `formatRange` could have an internal buffer
of maybe 16 characters that stores small strings, e.g. for small
integers. It would also allow chaining with other range
algorithms: you would call `.joiner()` on it to get an input
range of chars.
Is this something worth including in the standard library
(presumably in std.format)?
(The same may also be possible for `std.conv.text` but I did not
look into this.)
Oct 20 2020
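The "small internal buffer" idea from the post above can be illustrated for the simplest case, a single unsigned integer. This is only a minimal sketch under stated assumptions: `DecimalChars` is a hypothetical name, not part of Phobos, and it yields individual characters rather than the string segments shown in the `formatRange` example. The digits are written into a fixed 20-character buffer up front and then handed out one at a time, so nothing is allocated on the heap.

```d
/// Hypothetical input range over the decimal digits of an unsigned integer.
struct DecimalChars
{
    private enum capacity = 20;        // ulong.max has 20 decimal digits
    private char[capacity] buf;
    private size_t start = capacity;   // index of the first unread digit

    this(ulong value) @nogc nothrow pure @safe
    {
        // Fill the buffer from the back, least significant digit first.
        do
        {
            buf[--start] = cast(char)('0' + value % 10);
            value /= 10;
        } while (value != 0);
    }

    bool empty() const @nogc nothrow pure @safe { return start == capacity; }
    char front() const @nogc nothrow pure @safe { return buf[start]; }
    void popFront() @nogc nothrow pure @safe { ++start; }
}

@nogc nothrow pure @safe unittest
{
    auto r = DecimalChars(42);
    assert(r.front == '4');
    r.popFront();
    assert(r.front == '2');
    r.popFront();
    assert(r.empty);
}
```

This works only because the maximum output size of an integer is known in advance; it says nothing about arbitrary arguments, whose formatted size is unbounded.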
You are describing the purpose of an output range.
I.e.
import std.format : formattedWrite;
import std.stdio : stdout;
void test() {
    InPlaceAppender appender;
    appender.formattedWrite!"%d: %d"(123, 456);
    stdout.rawWrite(appender.get);
}
struct InPlaceAppender {
    private {
        char[ushort.max] buffer;
        size_t used;
    }
    @disable this(this);
    void put(char c) {
        assert(used < buffer.length);
        buffer[used++] = c;
    }
    scope char[] get() {
        return buffer[0 .. used];
    }
    void reset() {
        used = 0;
        buffer[] = '\0';
    }
}
Oct 20 2020
On Tuesday, 20 October 2020 at 13:45:20 UTC, rikki cattermole wrote:
You are describing the purpose of an output range.
I.e. [...]
I see. However, this still feels wrong; after all, we also do not
use an output range for algorithms like map:
```
OutputRange output;
[1, 2, 3].map!((x) => x + 1)(output);
```
Mostly because it does not allow chaining like
`lazyFormat("%d plus %d is %d", 1, 2, 3).joiner().map!toUpper()`.
It still feels inconsistent to me. An input range could achieve the
same goals, but it would be much more flexible and pleasing to use.
Oct 20 2020
On 10/20/20 9:28 AM, burt wrote:
The range returned by `formatRange` could have an internal buffer
of maybe 16 characters that stores small strings, e.g. for small
integers. It would also allow chaining with other range
algorithms: you would call `.joiner()` on it to get an input
range of chars.
Is this something worth including in the standard library
(presumably in std.format)?
(The same may also be possible for `std.conv.text` but I did not
look into this.)
I think it's possible, but also it needs a buffer. Which means it needs
to allocate. Even a 16 character buffer might not be enough.
std.format is not designed around tracking an in-progress conversion,
so you would have to convert whole things at once. It might not be
that desirable.
For example:
formatRange("%s", someLargeArrayOrStruct);
this is going to have to buffer the *whole thing*, and then give you
lazy access to the buffer.
In order for this to work, I think you would have to redesign how format
works. It's not an easy thing, but could be an interesting way of
looking at it.
Note that you can probably mimic this with fibers, but that's really
heavy for this task. And you still need to allocate a buffer.
-Steve
Oct 20 2020
On Tue, Oct 20, 2020 at 01:10:12PM -0400, Steven Schveighoffer via
Digitalmars-d wrote:
[...]
std.format is not designed around tracking an in-progress conversion,
so you would have to convert whole things at once. It might not be
that desirable.
For example:
formatRange("%s", someLargeArrayOrStruct);
this is going to have to buffer the *whole thing*, and then give you
lazy access to the buffer.
Yeah, I think std.format's design isn't really conducive to lazy access.
Also, the way the OP wrote the example code isn't really consistent,
because it appears to be returning segments of the formatted string
rather than characters in the string, i.e., it behaves like `string[]`
rather than `string`, which isn't how std.format is designed to work.
If anything, perhaps what's closer to what the OP wants is a lazy
version of text(), because there you can actually individually format
arguments lazily. But nonetheless, as Steven said, you still need a
buffer of arbitrary size because the .toString of an arbitrary
user-defined type can return an arbitrary amount of formatted data. You
also cannot impose nogc, because .toString methods can potentially be
allocating (complex ones almost certainly will).
In such scenarios, output ranges are a much better way to control
allocations -- the caller specifies the allocation scheme (by passing in
an output range that implements the desired allocation scheme).
What *would* be nice, is a standard library construct for inverting an
output range into an input range. Fibers is one way of doing this.
Basically, the pipeline up to the output range will run in its own
fiber, and initially it's backgrounded. As data is requested from the
input range end of the interface, it will context-switch to the output
range fiber and generate data which gets saved into a buffer. At some
point calling Fiber.yield(); then the input range end will start
spooling the generated data to the caller. Once the buffered data is
exhausted, it context-switches to the output range fiber again, etc..
Note that this does not alleviate the need for buffering, and it's not
100% lazy; what it primarily does is to give a nice input range
interface for stuff written into an output range. I don't expect it
will do very well performance-wise either, unless the data generators
are designed to cooperate with the inverter -- but in that case, they
would have been written to return an input range instead of requiring an
output range in the first place. So this construct is really more for
convenience than anything.
T
--
If you love to ride the sled, love to haul it too.
Oct 20 2020
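For reference, Phobos already ships a fiber-backed construct along the lines described above: `std.concurrency.Generator`. The following is a minimal sketch, not a proposed design; `lazyFormat` and `YieldSink` are hypothetical names introduced here for illustration. The formatting code runs in the generator's fiber and writes into a sink whose `put` yields each character to the consumer.

```d
import std.algorithm : equal;
import std.concurrency : Generator, yield;
import std.format : formattedWrite;

/// Hypothetical output range: every character written to it is
/// yielded to the enclosing Generator instead of being stored.
struct YieldSink
{
    void put(char c) { yield(c); }
    void put(const(char)[] s) { foreach (char c; s) yield(c); }
}

/// Hypothetical helper: wraps formattedWrite in a fiber-backed
/// generator and returns it as an input range of char.
auto lazyFormat(Args...)(string fmt, Args args)
{
    return new Generator!char({
        YieldSink sink;
        sink.formattedWrite(fmt, args);
    });
}

unittest
{
    auto r = lazyFormat("%d plus %d is %d", 1, 2, 3);
    assert(r.equal("1 plus 2 is 3"));
}
```

As noted in the post, this is a convenience rather than an optimization: the generator and its fiber stack are heap-allocated, so the result cannot be `@nogc`, and each character costs a context switch.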
On Tuesday, 20 October 2020 at 18:03:32 UTC, H. S. Teoh wrote:
[...]
Yeah, I think std.format's design isn't really conducive to lazy access.
Also, the way the OP wrote the example code isn't really consistent,
because it appears to be returning segments of the formatted string
rather than characters in the string, i.e., it behaves like `string[]`
rather than `string`, which isn't how std.format is designed to work.
Well, the idea was that you could call `join()` or `flatten()` or
whatever it is called to turn it into an input range of chars. But
it could also do that directly. I understand now why returning an
input range could be problematic though.
[...]
What *would* be nice, is a standard library construct for inverting an
output range into an input range. Fibers is one way of doing this.
Basically, the pipeline up to the output range will run in its own
fiber, and initially it's backgrounded. As data is requested from the
input range end of the interface, it will context-switch to the output
range fiber and generate data which gets saved into a buffer. At some
point calling Fiber.yield(); then the input range end will start
spooling the generated data to the caller. Once the buffered data is
exhausted, it context-switches to the output range fiber again, etc..
Note that this does not alleviate the need for buffering, and it's not
100% lazy; what it primarily does is to give a nice input range
interface for stuff written into an output range. I don't expect it
will do very well performance-wise either, unless the data generators
are designed to cooperate with the inverter -- but in that case, they
would have been written to return an input range instead of requiring an
output range in the first place. So this construct is really more for
convenience than anything.
Interesting idea.
Although maybe it doesn't even have to use fibers to work, if
you're willing to give up the laziness part:
```
/*ref*/ O pipeRange(alias fn, O, T...)(/*ref*/ O output, T args)
if (isInputRange!O && isOutputRange!O)
{
    fn(output, args);
    return output;
}
auto thing = appender!string()
    .pipeRange!formattedWrite("%d plus %d is %d", 1, 2, 3)
    .map!toUpperCase()
    .array();
```
Or something like that.
Oct 20 2020
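One way the `pipeRange` idea above might be made to compile is sketched below. It rests on assumptions that are not in the original post: the sink is taken to be an `Appender`, whose accumulated `data` is returned as the input range (an `Appender` itself is not an input range, so the original constraint is dropped), and `std.uni.toUpper` stands in for `toUpperCase`.

```d
import std.algorithm : equal, map;
import std.array : appender;
import std.format : formattedWrite;
import std.uni : toUpper;

/// Eagerly runs `fn` against the sink, then returns what was written.
/// Assumption: `O` exposes its contents through `.data`, as Appender does.
auto pipeRange(alias fn, O, T...)(O output, T args)
{
    fn(output, args);
    return output.data;
}

unittest
{
    auto thing = appender!string()
        .pipeRange!formattedWrite("%d plus %d is %d", 1, 2, 3)
        .map!(c => toUpper(c));
    assert(thing.equal("1 PLUS 2 IS 3"));
}
```

Nothing here is lazy: the whole string is formatted and buffered before `map` sees the first character, which is the trade-off the post describes.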