digitalmars.dip.ideas - Parameter storage classes on foreach variables


Quirin Schroll <qs.il.paperinik gmail.com> writes:

As of now, `foreach` admits `ref` variables as in `foreach (ref 
x; xs)`. There, `ref` can be used for two conceptually different 
things:
* Avoiding copies
* Mutating the values in place

If mutating in place is desired, `ref` is an excellent choice.
However, if mere copy avoiding is desired, another great option 
would be `in`.
On parameters, `in` avoids expensive copies, but still makes trivial ones.
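
The two uses of `ref` named above can be sketched side by side. This is only an illustration: `Big`, `doubleAll`, `sumTags`, and `tagOf` are made-up names, not anything from the language or the proposal.

```d
import std.stdio : writeln;

// Hypothetical element type, large enough that copying it is not free.
struct Big
{
    int[64] payload;
    int tag;
}

// Use 1: mutate the elements in place. This genuinely needs `ref`.
void doubleAll(int[] xs)
{
    foreach (ref x; xs)
        x *= 2;
}

// Use 2: `ref` only to avoid copying each `Big`; nothing is mutated.
int sumTags(Big[] items)
{
    int total = 0;
    foreach (ref item; items)
        total += item.tag;
    return total;
}

// On a function parameter, the read-only intent can already be spelled `in`.
int tagOf(in Big b)
{
    return b.tag;
}

void main()
{
    auto xs = [1, 2, 3];
    doubleAll(xs);
    writeln(xs); // prints [2, 4, 6]

    auto items = new Big[](3);
    foreach (i, ref item; items)
        item.tag = cast(int) i + 1;
    writeln(sumTags(items)); // prints 6
    writeln(tagOf(items[0])); // prints 1
}
```

In `sumTags`, nothing in the loop header tells the reader that `item` is not modified; that is the gap an `in` loop variable would fill.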

A type supplying `opApply` can, in principle, easily provide an 
implementation where the callback takes an argument by `in` or 
`out`:
```d
struct Range
{
     int opApply(scope int delegate(size_t, in X) callback)
     {
         X x;
         if (auto result = callback(0, x)) return result;
         return 0;
     }
}
```
For `out`, it’s not really different.
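
A slightly fuller sketch of such an `opApply`, under the assumption that `X` is a small placeholder struct and `Range` stores its elements in a slice. Since `foreach (in x; …)` is what is being proposed, the sketch calls `opApply` directly with a lambda whose parameter is `in`; `sumValues` is a hypothetical helper.

```d
// Placeholder element type for illustration.
struct X
{
    int value;
}

struct Range
{
    X[] items;

    // The callback receives each element by `in`.
    int opApply(scope int delegate(size_t, in X) callback)
    {
        foreach (i, ref x; items)
        {
            // A non-zero result means the loop body asked to stop.
            if (auto result = callback(i, x))
                return result;
        }
        return 0;
    }
}

// Calls opApply by hand, roughly what a `foreach` body would turn into.
int sumValues(Range r)
{
    int total = 0;
    r.opApply((size_t i, in X x) {
        total += x.value;
        return 0; // keep iterating
    });
    return total;
}

void main()
{
    auto r = Range([X(1), X(2), X(3)]);
    assert(sumValues(r) == 6);
}
```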

However, how do classical ranges (`empty`, `front`, `popFront`) 
fare with these?
First `in`.
```d
foreach (in x; xs) { … }
// lowers to
{
     auto __xs = xs;
     for (; !__xs.empty; __xs.popFront)
     {
         static if (/* should be ref */)
             const scope ref x = __xs.front;
         else
             const scope x = __xs.front;
         …
     }
}
```
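
Written by hand in today's D, the copying branch of that lowering looks roughly like the following. It is a sketch over `int[]` only; `sumAll` is a made-up name, and the by-reference branch is left out.

```d
import std.range.primitives : empty, front, popFront;

// Hand-written equivalent of the copying branch of the lowering above.
int sumAll(int[] xs)
{
    auto r = xs; // iterate over a copy of the slice, like `__xs`
    int total = 0;
    for (; !r.empty; r.popFront())
    {
        const x = r.front; // trivial copy, read-only inside the body
        total += x;
    }
    return total;
}

void main()
{
    assert(sumAll([1, 2, 3, 4]) == 10);
}
```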

The first notable observation is that `out` makes no sense for 
input ranges. Rather, it would make sense for, well, output 
ranges: Every time the loop reaches the end, a `put` is issued, 
whereas `continue` means “this loop iteration did not produce a 
value, but continue” and `break` means “end the loop”:
```d
foreach (out T x; xs) { … }
// lowers to
{
     auto __xs = xs; // or xs[]
     for (; !__xs.empty /* or __xs.length > 0 or nothing */;)
     {
         auto x = T.init;
         …
         __xs.put(x); /* or similar */
     }
}
```
The program should assign `x` in its body. If control reaches the 
end of the loop, the value is `put` in the output range.
As an output range, in general, need not be finite, the loop is 
endless by design. If the range has an `empty` member, however, it 
is used as the loop condition, and for types with `length` but no 
`empty`, the condition is `__xs.length > 0`. For arrays and slices, 
the `put` operation is `__xs[0] = x; __xs = __xs[1 .. $];`.

If `T` is not explicitly given, and `xs` is not an array or 
slice, an attempt should be made to extract it from the single 
parameter of a non-overloaded `xs.put`. Otherwise, it’s an error.
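
To make the intended control flow concrete, here is the output-range lowering written out by hand against `std.array.appender`, which has `put` but no `empty`, so the loop only ends via `break`. The function `collectEvens` and its loop body are assumptions made up for the illustration.

```d
import std.array : appender;
import std.range.primitives : put;

// Hand-written version of what `foreach (out int x; sink) { … }` would
// lower to under the proposal, for a sink without `empty` or `length`.
int[] collectEvens(int limit)
{
    auto sink = appender!(int[])();
    for (int n = 0;; ++n)
    {
        int x = int.init;

        // --- loop body ---
        if (n >= limit)
            break;      // "end the loop"
        if (n % 2 != 0)
            continue;   // "no value produced this iteration"
        x = n;
        // --- end of loop body ---

        put(sink, x);   // reached only when control falls off the end
    }
    return sink.data;
}

void main()
{
    assert(collectEvens(7) == [0, 2, 4, 6]);
}
```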

Dynamic arrays and slices should support `size_t` keys as well:
```d
foreach (i, out x; xs) { … }
// lowers to
{
     auto __xs = xs[];
     for (size_t __i = 0; __xs.length > 0; ++__i)
     {
         size_t i = __i;
         auto x = typeof(xs[0]).init;
         …
         __xs[0] = x;
         __xs = __xs[1 .. $];
     }
}
```
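
The same lowering can be tried out today by writing it by hand. A minimal sketch, with `fillSquares` as a hypothetical example whose loop body assigns `i * i`:

```d
// Hand-written version of the proposed `foreach (i, out x; xs)` lowering.
void fillSquares(int[] xs)
{
    auto rest = xs[];
    for (size_t i = 0; rest.length > 0; ++i)
    {
        int x = int.init;

        x = cast(int)(i * i); // loop body: assign the out variable

        rest[0] = x;          // the slice `put`
        rest = rest[1 .. $];
    }
}

void main()
{
    auto buf = new int[](4);
    fillSquares(buf);
    assert(buf == [0, 1, 4, 9]);
}
```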

Associative arrays specifically can be filled using `out` key and 
values:
```d
int[string] aa;
foreach (out key, out value; aa) { … }
// lowers to
{
     auto __aa = aa;
     for (;;)
     {
         KeyType key = KeyType.init;
         ValueType value = ValueType.init;
         …
         __aa[key] = value;
     }
}
```
At some point, a `break` is needed, otherwise the loop is 
infinite.
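
A hand-written sketch of that loop, with `wordLengths` and its body invented for the example. One deliberate difference from the lowering above: the sketch inserts into the associative array directly instead of into a copy, because a copy taken while the array is still `null` does not share storage with the original once the first element is inserted.

```d
// Hand-written version of the proposed `foreach (out key, out value; aa)`.
int[string] wordLengths(string[] words)
{
    int[string] aa;
    size_t next = 0;
    for (;;)
    {
        string key = string.init;
        int value = int.init;

        // --- loop body ---
        if (next == words.length)
            break; // without this, the loop never ends
        key = words[next];
        value = cast(int) words[next].length;
        ++next;
        // --- end of loop body ---

        aa[key] = value;
    }
    return aa;
}

void main()
{
    auto aa = wordLengths(["a", "bcd"]);
    assert(aa.length == 2);
    assert(aa["a"] == 1 && aa["bcd"] == 3);
}
```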

May 17 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 5/17/24 20:59, Quirin Schroll wrote:
 As of now, `foreach` admits `ref` variables as in `foreach (ref x; xs)`. 
 There, `ref` can be used for two conceptually different things:
 * Avoiding copies
 * Mutating the values in place
 
 If mutating in place is desired, `ref` is an excellent choice.
 However, if mere copy avoiding is desired, another great option would be 
 `in`.

I contest that `in` is a great option every time mere avoiding of copies 
is desired (because it implies transitive `const`).

In general, extending `foreach` to `in` and `out` makes some sense, but 
`out` is likely to be quite controversial, especially the output range 
lowering. When I think of `foreach`, I think of consuming a range, not 
producing one.

May 18 2024

Nick Treleaven <nick geany.org> writes:

On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
 If mutating in place is desired, `ref` is an excellent choice.
 However, if mere copy avoiding is desired, another great 
 option would be `in`.

 I contest that `in` is a great option every time mere avoiding 
 of copies is desired (because it implies transitive `const`).

Did you mean "isn't a great option"?
And if so, presumably we still need `auto ref`:
https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1022.md

 In general, extending `foreach` to `in` and `out` makes some 
 sense, but `out` is likely to be quite controversial, 
 especially the output range lowering. When I think of 
 `foreach`, I think of consuming a range, not producing one.

+1

May 20 2024

Timon Gehr <timon.gehr gmx.ch> writes:

On 5/20/24 16:29, Nick Treleaven wrote:
 On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
 If mutating in place is desired, `ref` is an excellent choice.
 However, if mere copy avoiding is desired, another great option would 
 be `in`.

 I contest that `in` is a great option every time mere avoiding of 
 copies is desired (because it implies transitive `const`).

 ...

The negation is in the word "contest".
(Stated more clearly: Sometimes `in` cannot be used because `const` is 
transitive.)
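
A small sketch of what transitivity means here, using a made-up `Counter` type: binding by `in` makes everything reachable through the value `const` as well, so even a caller who only wanted to skip the copy loses write access through the pointer.

```d
// Hypothetical type holding a pointer to mutable data.
struct Counter
{
    int* hits;
}

// By `ref`: no copy, and the pointee stays mutable.
void bump(ref Counter c)
{
    ++*c.hits;
}

// By `in`: the parameter is const, and so is the data behind the pointer.
int readHits(in Counter c)
{
    static assert(is(typeof(*c.hits) == const(int)));
    return *c.hits;
}

void main()
{
    int n = 0;
    auto c = Counter(&n);
    bump(c);
    assert(readHits(c) == 1);
}
```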

 Did you mean "isn't a great option"?
 And if so, presumably we still need `auto ref`:
 https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1022.md
 ...

Would be good, also for local variables outside of `foreach`.

May 20 2024

Quirin Schroll <qs.il.paperinik gmail.com> writes:

On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
 On 5/17/24 20:59, Quirin Schroll wrote:
 As of now, `foreach` admits `ref` variables as in `foreach 
 (ref x; xs)`. There, `ref` can be used for two conceptually 
 different things:
 * Avoiding copies
 * Mutating the values in place
 
 If mutating in place is desired, `ref` is an excellent choice.
 However, if mere copy avoiding is desired, another great 
 option would be `in`.

 I contest that `in` is a great option every time mere avoiding 
 of copies is desired (because it implies transitive `const`).

True. In generic code, one basically can’t use `const`, and 
therefore `in`, as a type can become simply unusable. (Prime 
example would be delegate types once they’re fixed.)

In case you know the type and it’s a type that works well being 
`const`, *then* `in` might be a great option.

 In general, extending `foreach` to `in` and `out` makes some 
 sense, but `out` is likely to be quite controversial, 
 especially the output range lowering. When I think of 
 `foreach`, I think of consuming a range, not producing one.

I thought the same, but on the other hand, there’s a keyword, so 
it absolutely won’t happen accidentally. It may just surprise 
people to read it in someone else’s code.

My sense is that the part of a `foreach` header before the 
semicolon should support exactly the same things a lambda 
parameter list does, simply because it may become a lambda 
passed to `opApply`. Where it doesn’t become one, well, it’s up 
for discussion what to do with it. Making it invalid is always an option.

May 21 2024

Paul Backus <snarwin gmail.com> writes:

On Friday, 17 May 2024 at 18:59:13 UTC, Quirin Schroll wrote:
 ```d
 foreach (out T x; xs) { … }
 // lowers to
 {
     auto __xs = xs; // or xs[]
     for (; !__xs.empty /* or __xs.length > 0 or nothing */;)
     {
         auto x = T.init;
         …
         __xs.put(x); /* or similar */
     }
 }
 ```

[...]
 ```d
 int[string] aa;
 foreach (out key, out value; aa) { … }
 // lowers to
 {
     auto __aa = aa;
     for (;;)
     {
         KeyType key = KeyType.init;
         ValueType value = ValueType.init;
         …
         __aa[key] = value;
     }
 }
 ```

I don't like these special-case rewrites. Binding an 
array/AA/range element to an `out` loop variable should have 
*exactly* the same semantics as binding a function argument to an 
`out` parameter. That is,

* The element must be an lvalue.
* The element is bound by reference.
* Upon being bound, the element is set to its `.init` value.

So, no implicit calls to `put`, no implicit insertion of AA 
elements, etc.
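
For reference, the three rules can be observed with an ordinary `out` parameter today. `produce` and `ignore` are hypothetical helpers made up for this sketch.

```d
// The parameter is bound by reference and reset to `int.init` on entry.
void produce(out int x)
{
    assert(x == 0); // already reset, whatever the element held before
    x = 42;
}

// Even a function that never assigns still resets the argument.
void ignore(out int x)
{
}

void main()
{
    int[] xs = [7, 8];

    produce(xs[0]); // array element is an lvalue, bound by reference
    ignore(xs[1]);  // element becomes int.init

    assert(xs == [42, 0]);

    // Rvalues cannot be bound to `out`.
    static assert(!__traits(compiles, produce(5)));
}
```

Under this reading, `foreach (out x; xs)` over a slice would reset each element in turn and let the body overwrite it, with no `put` involved.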

Aside from that, this seems like a good idea to me. 👍

May 21 2024
