
digitalmars.D.learn - foreach iterator with closure

reply Denis <noreply noserver.lan> writes:
Is it possible to write an iterator that does the following, 
using a struct and some functions?

  - Operates in a foreach loop
  - Has BEGIN-like and END-like blocks or functions that are 
executed automatically, before and after the iterations
  - Initializes variables in the BEGIN block that are used in the 
other two. These variables are for internal use only, i.e. must 
not be accessible to the user of the foreach loop

I'd like to use the simplest solution while keeping the code 
clean. As a starting point, here's a contrived example using a 
struct with a range-style iterator:

   import std.stdio;

   struct letters {
     string str;
     int pos = 0;
     char front() { return str[pos]; }
     void popFront() { pos ++; }
     bool empty() {
       if (pos == 0) writeln(`BEGIN`);
       else if (pos == str.length) writeln("\nEND");
       return pos == str.length; }}

   void main() {
     foreach (letter; letters(`hello`)) {
       write(letter, ' '); }
     writeln(); }

The obvious problems with this code include:

(1) The user can pass a second argument, which will set the 
initial value of pos. This must not be allowed. (The real code 
will need to initialize a half dozen internal-only variables, and 
do some additional work, before the looping starts.)

(2) Sticking the code for the BEGIN and END blocks into the 
empty() function is ugly.

Can this iterator be written using a range-style struct? Or is 
something more complicated needed, like an OO solution?

I should add that the final version of this will be put in a 
separate module, possibly in a library, so I can call it from 
many programs. Not sure if that might help simplify things.

Thanks for your guidance.
Jun 27 2020
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 6/27/20 8:19 PM, Denis wrote:

 Is it possible to write an iterator
It is arguable whether D's ranges are iterators but if nouns are useful, we call them ranges. :) (Iterators can be written in D as well and then it would really be confusing.)
    struct letters {
      string str;
      int pos = 0;
      char front() { return str[pos]; }
      void popFront() { pos ++; }
      bool empty() {
        if (pos == 0) writeln(`BEGIN`);
        else if (pos == str.length) writeln("\nEND");
        return pos == str.length; }}

    void main() {
      foreach (letter; letters(`hello`)) {
        write(letter, ' '); }
      writeln(); }

 The obvious problems with this code include:

 (1) The user can pass a second argument, which will set the initial
 value of pos.
That problem can be solved by a constructor that takes a single string. Your BEGIN code would normally go there as well. And END goes into the destructor:

    struct letters {
      this(string str) {
        this.str = str;
        this.pos = 0;  // Redundant
        writeln(`BEGIN`);
      }

      ~this() {
        writeln("\nEND");
      }

      // [...]
    }

Note: You may want to either disallow copying of your type or write a copy constructor that does the right thing:

  https://dlang.org/spec/struct.html#struct-copy-constructor

However, it's common to construct a range object by a function. The actual range type can be kept as an implementation detail:

    struct Letters {  // Note capital L
      // ...
    }

    auto letters(string str) {
      // ...
      return Letters(str);
    }

struct Letters can be a private type of its module or even a nested struct inside letters(), in which case it's called a "Voldemort type".

Ali
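To make the pieces above concrete, here is a minimal sketch that combines the single-argument constructor, the destructor, and a factory function returning a nested ("Voldemort") struct. The layout is only one possible arrangement; the use of size_t for pos and the const qualifiers are assumptions of this sketch, not something the original code requires.

```d
import std.stdio;

// Factory function: the only way callers obtain the range.
auto letters(string str) {
    // Nested struct: code outside letters() cannot name this type.
    static struct Letters {
        private string str;
        private size_t pos;        // assumption: size_t instead of int

        this(string str) {
            this.str = str;
            writeln("BEGIN");      // BEGIN-like block
        }

        ~this() {
            writeln("\nEND");      // END-like block
        }

        bool empty() const { return pos == str.length; }
        char front() const { return str[pos]; }
        void popFront() { ++pos; }
    }

    return Letters(str);
}

void main() {
    foreach (letter; letters("hello")) {
        write(letter, ' ');
    }
}
```

Because the destructor has a visible side effect, any copy of the range that gets destroyed would print END again, which is why the note about disallowing or controlling copies matters for a design like this.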
Jun 27 2020
parent reply Denis <noreply noserver.lan> writes:
Many thanks: your post has helped me get past the initial 
stumbling blocks I was struggling with. I do have a followup 
question.

First, here are my conclusions up to this point, based on your 
post above, some additional experimentation, and further research 
(for future reference, and for any other readers).

* foreach is the actual iterator, the instantiation of a struct 
is the range.
* When a constructor is not used, the arguments in the call to 
instantiate the range (in this case, `hello` in letters(`hello`)) 
are mapped sequentially to the member variables in the struct 
definition (i.e. to letters.str).
* When a constructor is used, the member variables in the struct 
definition are in essence private. The arguments in the call to 
instantiate the range are now mapped directly to the parameters 
in the definition of the "this" function.
* The syntax and conventions for constructors are difficult and 
non-intuitive for anyone who hasn't learned Java (or a 
derivative). The linked document provides a simplified 
explanation for the "this" keyword, which is helpful for the 
first read: 
https://docs.oracle.com/javase/tutorial/java/javaOO/thiskey.html.
* In some respects, the Java syntax is not very D-like. (For 
example, it breaks the well-established convention of "Do not use 
the same name to mean two different things".) However, it does 
need to be learned, because it is common in D source code.

Here is the complete revised code for the example (in condensed 
form):

   import std.stdio;

   struct letters {

     string str;
     int pos = 0;		// Assign here or in this()

     this(string param1) {	// cf. shadow str
       str = param1;		// cf. this.str = param1 / this.str = str
       writeln(`BEGIN`); }

     char front() { return str[pos]; }
     void popFront() { pos ++; }
     bool empty() { return pos == str.length; }

     ~this() { writeln("\nEND"); }}

   void main() {
     foreach (letter; letters(`hello`)) {
       write(letter, ' '); }}

At this point, I do have one followup question:

Why is the shadow str + "this.str = str" the more widely used 
syntax in D, when the syntax in the code above is unambiguous?

One possible reason that occurred to me is that "str = param1" 
might require additional GC, because they are different names. 
But I wouldn't think it'd make any difference to the compiler.

Denis
Jun 28 2020
parent reply Ali Çehreli <acehreli yahoo.com> writes:
On 6/28/20 9:07 AM, Denis wrote:

 * foreach is the actual iterator,
Yes. foreach is "lowered" to the following equivalent:

    for ( ; !range.empty; range.popFront()) {
      // Use range.front here
    }

A struct can support foreach iteration through its opApply() member function as well. opApply() takes the body of the foreach as a delegate. Because it's a function call, it can take full advantage of the function call stack. This may help with e.g. writing recursive iteration algorithms.

  http://ddili.org/ders/d.en/foreach_opapply.html#ix_foreach_opapply.opApply
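As a rough sketch of the delegate-based alternative (opApply in D), the BEGIN and END steps can simply surround the loop inside the member function. The struct name Letters and the use of scope (exit) are choices made for this illustration only.

```d
import std.stdio;

struct Letters {
    private string str;

    // foreach hands its body to this function as the delegate `dg`.
    int opApply(scope int delegate(char) dg) {
        writeln("BEGIN");                  // runs before the first iteration
        scope (exit) writeln("\nEND");     // runs when opApply returns, even after a break

        foreach (c; str) {
            // A non-zero result means the loop body executed break or return.
            if (auto result = dg(c))
                return result;
        }
        return 0;
    }
}

void main() {
    foreach (letter; Letters("hello"))
        write(letter, ' ');
}
```

With this approach no constructor or destructor is needed for the BEGIN and END steps, at the cost of the type no longer being a range usable with range algorithms.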
 the instantiation of a struct is the
 range.
Yes.
 * When a constructor is not used, the arguments in the call to
 instantiate the range (in this case, `hello` in letters(`hello`)) are
 mapped sequentially to the member variables in the struct definition
 (i.e. to letters.str).
Yes, that is a very practical struct feature. I write my structs with as little as needed and provide a constructor only when it is necessary as in your case.
 * When a constructor is used, the member variables in the struct
 definition are in essence private.
Not entirely true. You can still make them public if you want. http://ddili.org/ders/d.en/encapsulation.html
 The arguments in the call to
 instantiate the range are now mapped directly to the parameters in the
 definition of the "this" function.
Yes.
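The difference between the two construction forms can be sketched as follows. NoCtor and WithCtor are hypothetical names used only for this illustration.

```d
struct NoCtor {
    string str;
    int pos = 0;
}

struct WithCtor {
    string str;
    int pos = 0;

    this(string str) {
        this.str = str;
    }
}

void main() {
    auto a = NoCtor("hello");      // arguments map to members in order; pos keeps its default
    auto b = NoCtor("hello", 3);   // a second argument sets pos
    auto c = WithCtor("hello");    // arguments map to the parameters of this(string)

    // Once a constructor is defined, the member-wise form is no longer accepted.
    static assert(!__traits(compiles, WithCtor("hello", 3)));

    // Members are still accessible from outside unless they are marked private.
    c.pos = 2;

    assert(a.pos == 0 && b.pos == 3 && c.pos == 2);
}
```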
 * The syntax and conventions for constructors is difficult and
 non-intuitive for anyone who hasn't learned Java (or a derivative).
C++ uses the name of the class as the constructor:

    // C++ code
    struct S {
      S();     // <-- Constructor
      S(int);  // <-- Another one
    };

The problem with that syntax is having to rename more than one thing when the name of the struct changes, e.g. to Q:

    struct Q {
      Q();
      Q(int);
    };

And usually in the implementation:

    Q::Q() {}
    Q::Q(int) {}

D's choice of 'this' is productive.
 The
 linked document provides a simplified explanation for the "this"
 keyword, which is helpful for the first read:
 https://docs.oracle.com/javase/tutorial/java/javaOO/thiskey.html.
I like searching for keywords in my index. The "this, constructor" here links to the constructor syntax: http://ddili.org/ders/d.en/ix.html
 * In some respects, the Java syntax is not very D-like. (For example, it
 breaks the well-established convention of "Do not use the same name to
 mean two different things".)
Yes but it competes with another goal: Change as little code as possible when one thing needs to be changed. This is not only practical but helps with correctness.
 However, it does need to be learned,
 because it is common in D source code.
I like D. :p
 Here is the complete revised code for the example (in condensed form):

    import std.stdio;

    struct letters {

      string str;
      int pos = 0;        // Assign here or in this()

      this(string param1) {    // cf. shadow str
        str = param1;        // cf. this.str = param1 / this.str = str
        writeln(`BEGIN`); }

      char front() { return str[pos]; }
      void popFront() { pos ++; }
      bool empty() { return pos == str.length; }

      ~this() { writeln("\nEND"); }}

    void main() {
      foreach (letter; letters(`hello`)) {
        write(letter, ' '); }}

 At this point, I do have one followup question:

 Why is the shadow str + "this.str = str" the more widely used syntax in
 D, when the syntax in the code above is unambiguous?
Because one needs to come up with names like "param7", "str_", "_str", "s", etc. I like and follow D's standard here.
 One possible reason that occurred to me is that "str = param1" might
 require additional GC, because they are different names.
Not at all, because there is no memory allocation. strings are implemented as the equivalent of the following struct:

    struct __D_native_string {
      size_t length_;
      char * ptr;
      // ...
    }

So, the "str = param1" assignment is nothing but two 64-bit data transfers (on a 64-bit target), which can easily be optimized away by the compiler in many cases.
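A small check along these lines shows that assigning one string to another copies only the length and the pointer, not the characters. The variable names follow the example from the question.

```d
void main() {
    string param1 = "hello";
    string str = param1;                 // copies only the length and the pointer

    assert(str.ptr is param1.ptr);       // both refer to the same characters
    assert(str.length == param1.length);

    // A string is two machine words: a length and a pointer.
    static assert(string.sizeof == size_t.sizeof + (void*).sizeof);
}
```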
 But I wouldn't
 think it'd make any difference to the compiler.
Yes. :)
 Denis
Ali
Jun 28 2020
parent Denis <noreply noserver.lan> writes:
To keep this reply brief, I'll just summarize:

Lots of great takeaways from both of your posts, and a handful of 
topics you mentioned that I need to dig into further now. This is 
great (I too like D :)

I very much appreciate the extra insight into how things work and 
why certain design decisions were made: for me, this is essential 
for gaining fluency in a language.

Thanks again for all your help!
Denis
Jun 28 2020