digitalmars.dip.ideas - Parameter storage classes on foreach variables

Quirin Schroll <qs.il.paperinik gmail.com> writes:
As of now, `foreach` admits `ref` variables as in `foreach (ref 
x; xs)`. There, `ref` can be used for two conceptually different 
things:
* Avoiding copies
* Mutating the values in place

If mutating in place is desired, `ref` is an excellent choice.
However, if the goal is merely to avoid copies, another great option 
would be `in`.
On parameters, `in` avoids expensive copies but still makes trivial ones.
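
To make the distinction concrete, here is a minimal sketch in today's D. The names `Big`, `doubleAll`, and `total` are hypothetical and only serve the illustration; whether an `in` argument is really passed by reference is up to the compiler and, as far as I know, depends on the `-preview=in` switch.

```d
struct Big { int[64] data; } // hypothetical type that is costly to copy

// `ref` on the loop variable: each `x` aliases an element, so writes stick.
void doubleAll(int[] xs)
{
    foreach (ref x; xs)
        x *= 2;
}

// `in` on a parameter: read-only access; the compiler may avoid the copy.
int total(in Big b)
{
    int sum = 0;
    foreach (v; b.data)
        sum += v;
    return sum;
}

void main()
{
    auto xs = [1, 2, 3];
    doubleAll(xs);
    assert(xs == [2, 4, 6]);

    Big b;
    b.data[] = 1;
    assert(total(b) == 64);
}
```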

A type supplying `opApply` can, in principle, easily provide an 
implementation where the callback takes an argument by `in` or 
`out`:
```d
struct Range
{
     int opApply(scope int delegate(size_t, in X) callback)
     {
         X x;
         if (auto result = callback(0, x)) return result;
         return 0;
     }
}
```
For `out`, it’s not really different.
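
A runnable variant of the snippet above might look as follows. It is a sketch under assumptions: `X`, the `items` field, and `sumValues` are made up for the example, and `opApply` is called directly with a lambda because `foreach (i, in x; r)` is exactly what is being proposed, not something to rely on today.

```d
struct X { int value; } // hypothetical element type

struct Range
{
    X[] items; // hypothetical storage

    // The callback receives the index by value and the element by `in`.
    int opApply(scope int delegate(size_t, in X) callback)
    {
        foreach (i, ref item; items)
        {
            if (auto result = callback(i, item))
                return result; // propagate an early exit from the body
        }
        return 0;
    }
}

int sumValues(Range r)
{
    int sum = 0;
    // The lambda's parameter list mirrors what the foreach header
    // would look like under the proposal.
    r.opApply((size_t i, in X x) { sum += x.value; return 0; });
    return sum;
}

void main()
{
    auto r = Range([X(1), X(2), X(3)]);
    assert(sumValues(r) == 6);
}
```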

However, how do classical ranges (`empty`, `front`, `popFront`) 
fare with these?
First `in`.
```d
foreach (in x; xs) { … }
// lowers to
{
     auto __xs = xs;
     for (; !__xs.empty; __xs.popFront)
     {
         static if (/* should be ref */)
             const scope ref x = __xs.front;
         else
             const scope x = __xs.front;
         …
     }
}
```
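
Written out by hand for a slice, the non-`ref` branch of that lowering could look like the sketch below. `sumOf` is a hypothetical helper, and plain `const` stands in for `const scope`, since `scope` makes no difference for an `int`.

```d
import std.range.primitives : empty, front, popFront;

// Hand-written equivalent of the "trivial copy" branch of the lowering.
int sumOf(int[] xs)
{
    int total = 0;
    auto r = xs; // corresponds to __xs
    for (; !r.empty; r.popFront())
    {
        const x = r.front; // read-only per-iteration binding
        total += x;
        // The binding cannot be reassigned inside the body.
        static assert(!__traits(compiles, x = 0));
    }
    return total;
}

void main()
{
    assert(sumOf([1, 2, 3]) == 6);
}
```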

The first notable observation is that `out` makes no sense for 
input ranges. Rather, it would make sense for, well, output 
ranges: Every time the loop reaches the end, a `put` is issued, 
whereas `continue` means “this loop iteration did not produce a 
value, but continue” and `break` means “end the loop”:
```d
foreach (out T x; xs) { … }
// lowers to
{
     auto __xs = xs; // or xs[]
     for (; !__xs.empty /* or __xs.length > 0 or nothing */;)
     {
         auto x = T.init;
         …
         __xs.put(x); /* or similar */
     }
}
```
The program should assign `x` in its body. If control reaches the 
end of the loop, the value is `put` in the output range.
Because an output range, in general, need not be finite, the loop is 
endless by design. If the range has an `empty` member, though, it is 
used as the loop condition, and for types with `length` but no `empty`, 
the condition is `__xs.length > 0`. For arrays and slices, the `put` 
operation is `__xs[0] = x; __xs = __xs[1 .. $];`.
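
As a sketch of how the slice case would behave, here is the lowering written out by hand, including a `continue` that skips the implicit `put`. The function `fillWithEvens` and its `candidate` counter are hypothetical and stand in for whatever the loop body would do.

```d
// Fills `buffer` with even numbers, mimicking `foreach (out int x; buffer)`.
int[] fillWithEvens(int[] buffer)
{
    auto r = buffer[];  // corresponds to __xs
    int candidate = 0;  // state owned by the hypothetical loop body
    for (; r.length > 0;)
    {
        auto x = int.init;
        // Loop body starts here.
        ++candidate;
        if (candidate % 2 != 0)
            continue;   // no value produced in this iteration
        x = candidate;
        // Loop body ends here; the implicit `put` follows.
        r[0] = x;
        r = r[1 .. $];
    }
    return buffer;
}

void main()
{
    auto buf = new int[3];
    assert(fillWithEvens(buf) == [2, 4, 6]);
}
```

The loop runs six times for a buffer of three elements: every odd `candidate` hits `continue` and leaves the slice untouched.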

If `T` is not explicitly given, and `xs` is not an array or 
slice, an attempt should be made to extract it from the single 
parameter of a non-overloaded `xs.put`. Otherwise, it’s an error.
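
In library terms, that extraction could be approximated as sketched below. `IntSink`, `hasSinglePut`, `PutType`, and `firstSquares` are hypothetical names; the compiler would of course do this internally, not through `std.traits`.

```d
import std.traits : Parameters;

struct IntSink // hypothetical output range with a single `put`
{
    int[] items;
    void put(int value) { items ~= value; }
}

// True if `put` is not overloaded.
enum hasSinglePut(R) = __traits(getOverloads, R, "put").length == 1;
// Type of the single parameter of `put`.
alias PutType(R) = Parameters!(R.put)[0];

static assert(hasSinglePut!IntSink);
static assert(is(PutType!IntSink == int));

// Hand-written lowering of `foreach (out x; sink)` with the inferred type.
int[] firstSquares(size_t n)
{
    IntSink sink;
    for (;;) // IntSink has neither `empty` nor `length`, so no condition
    {
        PutType!IntSink x; // starts out as int.init
        if (sink.items.length == n)
            break;
        x = cast(int)(sink.items.length * sink.items.length);
        sink.put(x);
    }
    return sink.items;
}

void main()
{
    assert(firstSquares(4) == [0, 1, 4, 9]);
}
```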

Dynamic arrays and slices should support `size_t` keys as well:
```d
foreach (i, out x; xs) { … }
// lowers to
{
     auto __xs = xs[];
     for (size_t __i = 0; __xs.length > 0; ++__i)
     {
         size_t i = __i;
         auto x = typeof(xs[0]).init;
         …
         __xs[0] = x;
         __xs = __xs[1 .. $];
     }
}
```

Associative arrays specifically can be filled using `out` keys and 
values:
```d
int[string] aa;
foreach (out key, out value; aa) { … }
// lowers to
{
     auto __aa = aa;
     for (;;)
     {
         KeyType key = KeyType.init;
         ValueType value = ValueType.init;
         …
         __aa[key] = value;
     }
}
```
At some point, a `break` is needed, otherwise the loop is 
infinite.
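
Spelled out by hand, with a `break` to terminate, the AA case might look like this sketch. `lengthsOf` and its `next` counter are hypothetical. The sketch inserts into `aa` directly instead of into a copy: an AA that has never been assigned is null, and inserting through a copy of a null AA is not visible through the original variable.

```d
int[string] lengthsOf(string[] words)
{
    int[string] aa;
    size_t next = 0; // state owned by the hypothetical loop body
    for (;;)
    {
        string key = string.init;
        int value = int.init;
        // Loop body starts here.
        if (next == words.length)
            break; // without this, the loop would never end
        key = words[next++];
        value = cast(int) key.length;
        // Loop body ends here; the implicit insertion follows.
        aa[key] = value;
    }
    return aa;
}

void main()
{
    auto aa = lengthsOf(["in", "out", "ref"]);
    assert(aa == ["in": 2, "out": 3, "ref": 3]);
}
```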
May 17 2024
Timon Gehr <timon.gehr gmx.ch> writes:
On 5/17/24 20:59, Quirin Schroll wrote:
 As of now, `foreach` admits `ref` variables as in `foreach (ref x; xs)`. 
 There, `ref` can be used for two conceptually different things:
 * Avoiding copies
 * Mutating the values in place
 
 If mutating in place is desired, `ref` is an excellent choice.
 However, if mere copy avoiding is desired, another great option would be 
 `in`.
I contest that `in` is a great option every time mere avoiding of copies is desired (because it implies transitive `const`). In general, extending `foreach` to `in` and `out` makes some sense, but `out` is likely to be quite controversial, especially the output range lowering. When I think of `foreach`, I think of consuming a range, not producing one.
May 18 2024
Nick Treleaven <nick geany.org> writes:
On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
 If mutating in place is desired, `ref` is an excellent choice.
 However, if mere copy avoiding is desired, another great 
 option would be `in`.
I contest that `in` is a great option every time mere avoiding of copies is desired (because it implies transitive `const`).
Did you mean "isn't a great option"? And if so, presumably we still need `auto ref`: https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1022.md
 In general, extending `foreach` to `in` and `out` makes some 
 sense, but `out` is likely to be quite controversial, 
 especially the output range lowering. When I think of 
 `foreach`, I think of consuming a range, not producing one.
+1
May 20 2024
Timon Gehr <timon.gehr gmx.ch> writes:
On 5/20/24 16:29, Nick Treleaven wrote:
 On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
 If mutating in place is desired, `ref` is an excellent choice.
 However, if mere copy avoiding is desired, another great option would 
 be `in`.
I contest that `in` is a great option every time mere avoiding of copies is desired (because it implies transitive `const`).
...
The negation is in the word "contest". (Stated more clearly: Sometimes `in` cannot be used because `const` is transitive.)
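
A small sketch of that situation, with a hypothetical `Node` type: once the element holds a mutable indirection, `in` rules out writes through it, whereas `ref` does not.

```d
struct Node
{
    int[] payload; // mutable indirection
}

// With `ref`, data reachable through the parameter can be modified.
void zeroFirst(ref Node n) { n.payload[0] = 0; }

// With `in`, `const` applies transitively, so the same write is rejected.
static assert(!__traits(compiles, (in Node n) { n.payload[0] = 0; }));
static assert( __traits(compiles, (ref Node n) { n.payload[0] = 0; }));

void main()
{
    auto n = Node([1, 2]);
    zeroFirst(n);
    assert(n.payload == [0, 2]);
}
```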
 Did you mean "isn't a great option"?
 And if so, presumably we still need `auto ref`:
 https://github.com/dlang/DIPs/blob/master/DIPs/other/DIP1022.md
 ...
Would be good, also for local variables outside of `foreach`.
May 20 2024
Quirin Schroll <qs.il.paperinik gmail.com> writes:
On Saturday, 18 May 2024 at 22:43:48 UTC, Timon Gehr wrote:
 On 5/17/24 20:59, Quirin Schroll wrote:
 As of now, `foreach` admits `ref` variables as in `foreach 
 (ref x; xs)`. There, `ref` can be used for two conceptually 
 different things:
 * Avoiding copies
 * Mutating the values in place
 
 If mutating in place is desired, `ref` is an excellent choice.
 However, if mere copy avoiding is desired, another great 
 option would be `in`.
I contest that `in` is a great option every time mere avoiding of copies is desired (because it implies transitive `const`).
True. In generic code, one basically can’t use `const`, and therefore can’t use `in` either, because a type can simply become unusable when `const`. (A prime example would be delegate types, once they’re fixed.) In case you know the type and it’s a type that works well being `const`, *then* `in` might be a great option.
 In general, extending `foreach` to `in` and `out` makes some 
 sense, but `out` is likely to be quite controversial, 
 especially the output range lowering. When I think of 
 `foreach`, I think of consuming a range, not producing one.
I thought the same, but on the other hand, there’s a keyword, so it absolutely won’t happen accidentally. It may just surprise people who read it in someone else’s code. My sense is that the part of a `foreach` header before the semicolon should support exactly the same things a lambda parameter list would, simply because it may become a lambda passed to `opApply`. If it doesn’t become one, well, it’s up for discussion what to do with it. Making it invalid is always an option.
May 21 2024
Paul Backus <snarwin gmail.com> writes:
On Friday, 17 May 2024 at 18:59:13 UTC, Quirin Schroll wrote:
 ```d
 foreach (out T x; xs) { … }
 // lowers to
 {
     auto __xs = xs; // or xs[]
     for (; !__xs.empty /* or __xs.length > 0 or nothing */;)
     {
         auto x = T.init;
         …
         __xs.put(x); /* or similar */
     }
 }
 ```
[...]
 ```d
 int[string] aa;
 foreach (out key, out value; aa) { … }
 // lowers to
 {
     auto __aa = aa;
     for (;;)
     {
         KeyType key = KeyType.init;
         ValueType value = ValueType.init;
         …
         __aa[key] = value;
     }
 }
 ```
I don't like these special-case rewrites. Binding an array/AA/range element to an `out` loop variable should have *exactly* the same semantics as binding a function argument to an `out` parameter. That is,
* The element must be an lvalue.
* The element is bound by reference.
* Upon being bound, the element is set to its `.init` value.

So, no implicit calls to `put`, no implicit insertion of AA elements, etc. Aside from that, this seems like a good idea to me. 👍
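
For reference, this is how `out` parameters behave today, followed by a hand-written approximation of what the same rules would mean for a loop variable. `produce` and `fillIota` are hypothetical names, and the `ref` loop is only a stand-in for the proposed `foreach (out x; xs)`.

```d
void produce(out int slot)
{
    assert(slot == int.init); // reset on entry, whatever the caller had
    slot = 42;
}

// Approximation of `foreach (out x; xs)` under these rules, using `ref`.
void fillIota(int[] xs)
{
    int n = 0;
    foreach (ref x; xs)
    {
        x = typeof(x).init; // the reset that `out` would perform
        x = n++;
    }
}

void main()
{
    int[] xs = [7, 8];
    produce(xs[0]); // the element is an lvalue and is bound by reference
    assert(xs == [42, 8]);

    // Rvalues cannot bind to `out`.
    static assert(!__traits(compiles, produce(5)));

    fillIota(xs);
    assert(xs == [0, 1]);
}
```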
May 21 2024