digitalmars.D - Proposal: Hide the int in opApply from the user
- Bill Baxter (73/73) Jan 07 2008 I proposed this iniitally over in D.learn, but I'm cleaning it up and
- Jason House (2/2) Jan 07 2008 I think this proposal has one fatal flaw...
- Sean Kelly (3/6) Jan 07 2008 scope(exit) would work, but it's not ideal.
- Bill Baxter (34/42) Jan 07 2008 Hmm, you're right. That's an issue that had not occurred to me because
- Bruce Adams (6/19) Jan 07 2008 When you say magic value what do you mean? From the context it sounds li...
- Bill Baxter (15/38) Jan 07 2008 I mean this:
- Paul Findlay (5/5) Jan 08 2008 Sorry, to use the web interface to the newsgroup..
- bearophile (6/10) Jan 09 2008 Thank you for the link, it's quite interesting.
- Bill Baxter (4/15) Jan 09 2008 In other words, the author seems to have overlooked the other place
- Dan (9/26) Jan 09 2008 Or the other place; in registers.
- bearophile (18/18) Jan 10 2008 Bill Baxter>In other words, the author seems to have overlooked the othe...
- bearophile (4/6) Jan 10 2008 D 2.x has full closures too, so to solve those problems it may be used o...
- BC (8/31) Jan 11 2008 It's difficult to iterate over two collections without buffering.
I proposed this iniitally over in D.learn, but I'm cleaning it up and reposting here in hopes of getting some response from Walter who was probably too busy finishing const and eating holiday Turkey at the time to notice. And rightly so. I don't believe it's appropriate in a high-level supposedly clean language like D that one of the main facilities for iterating over user types (foreach) requires writing code that passes around magic values generated by the compiler (opApply). It seems wrong to me that these magic values - come from code generated by the compiler, - must be handled exactly the proper way by the user's opApply (or else you get undefined behavior, but no compiler errors) - and then are handed back to code also generated by the compiler. Furthermore, the compiler-generated code in the first and last steps share the same scope! So the solution seems obvious -- they should pass the information back and forth using a local variable in their shared local scope. In this proposal we need to add two things: 1) a new template struct and 2) a new macro [yes, this proposal relies on macros which don't exist yet!] The template just bundles an int* (pointer to _ret) together with the loop body delegate: struct Apply(Args...) { alias void delegate(Args) LoopBody; LoopBody _loop_body; int* _ret = null; } the macro is this (just guessing what syntax will be, and hoping macros will support tuple-like varargs): macro yield(dg, args...) { dg._call(args); if (dg._ret && *dg._ret) { return; } } With these two library additions, opApply functions can become this: void opApply( Apply!(ref T) dg ) { for( /*T x in elements*/ ) { yield(dg,x); } } Now the trickiness is *all* shifted to how you call such a beast properly, which is all handled by the compiler. For a foreach in a void function, the compiler will have to generate code like so: int _ret = 0; void _loop_body(/*ref*/ T x) { writefln("x is ", x); if (x=="two") { _ret = BREAK; return; } if (x=="three") { _ret = RETURN; return; } do_something; } obj.opApply( Apply!(T)(&_loop_body, &_ret) ); if (_ret==RETURN) return; The language can ALMOST do this today except for three small things: 1) No macros - but they're on the way! 2) Inability to preserve ref-ness of template arguments -- but I think this really needs to be solved one way or another regardless. 3) The necessary but changes to the foreach code gen -- this is straightforward. Attached is a proof of concept demo. I've manually inlined the yield() code to work around 1), and made the loop body use a non-ref type to work around 2). I manually generated the foreach code too to deal with 3). The great thing about this proposal is that it is backwards compatible. foreach already generates different code depending on what the argument is, this can just be another case detected by the use of the Apply argument. Code using old-style opApplys can continue to work. The main thing fuzzy in my mind is the vague status of yield and Apply. They don't need to be keywords per-se, but the compiler at least needs to know about Apply so that it can recognize the signature of this "new-style" opApply. I think it can maybe satisfy all that by going into object.d? If there were anonymous struct literals it wouldn't even need to be a real struct, just an alias like we have for 'string' now. --bb
Jan 07 2008
I think this proposal has one fatal flaw... There is no way for the opApply function to do something after iteration stops prematurely. Some data structures could change internal state as they iterate. Those state changes may require clean-up. I have no good examples at the moment, but know they exist.
Jan 07 2008
Jason House wrote:I think this proposal has one fatal flaw... There is no way for the opApply function to do something after iteration stops prematurely. Some data structures could change internal state as they iterate. Those state changes may require clean-up. I have no good examples at the moment, but know they exist.scope(exit) would work, but it's not ideal. Sean
Jan 07 2008
Sean Kelly wrote:Jason House wrote:Hmm, you're right. That's an issue that had not occurred to me because I've never seen it in code. But it's not a fatal flaw I do not think. The yield thing is a pretty trivial macro. I see several possible ways to handle such rare cases. 1) Add public opCall and a public 'finished' methods to Apply to allow users to do yield's work on their own: struct Apply(Args...) { // .. same as before plus: void opCall(Args args) { _loop_body(args); } bool finished() { return *_ret!=0; } } Then this is possible: void opApply( Apply!(ref T) dg ) { for( /*T x in elements*/ ) { dg(x); if (dg.finished) { // do clean up; return; } } } 2) Provide an alternate macro that takes a cleanup parameter: void opApply( Apply!(ref T) dg ) { for( /*T x in elements*/ ) { yield_with_cleanup(dg, { /*do cleanup*/ }, x); } } 3)I think this proposal has one fatal flaw... There is no way for the opApply function to do something after iteration stops prematurely. Some data structures could change internal state as they iterate. Those state changes may require clean-up. I have no good examples at the moment, but know they exist.scope(exit) would work, but it's not ideal.But it's probably the solution that I would use, and one of the other solutions can be used for what I expect are the very rare situations in which you both have to do clean up in your opApply and for some reason can't use scope(exit). --bb
Jan 07 2008
On Mon, 07 Jan 2008 09:06:52 -0000, Bill Baxter <dnewsgroup billbaxter.com> wrote:I proposed this iniitally over in D.learn, but I'm cleaning it up and reposting here in hopes of getting some response from Walter who was probably too busy finishing const and eating holiday Turkey at the time to notice. And rightly so. I don't believe it's appropriate in a high-level supposedly clean language like D that one of the main facilities for iterating over user types (foreach) requires writing code that passes around magic values generated by the compiler (opApply). It seems wrong to me that these magic values - come from code generated by the compiler, - must be handled exactly the proper way by the user's opApply (or else you get undefined behavior, but no compiler errors) - and then are handed back to code also generated by the compiler.When you say magic value what do you mean? From the context it sounds like you are describing an iterator without iterators being properly part of the D world (yet).
Jan 07 2008
Bruce Adams wrote:On Mon, 07 Jan 2008 09:06:52 -0000, Bill Baxter <dnewsgroup billbaxter.com> wrote:I mean this: alias int MagicValue; MagicValue opApply(MagicValue delegate(ref T) dg) { MagicValue magic_value=0; foreach(x; stuff) { magic_value = dg(x); if (magic_value != 0) return magic_value; } return magic_value; } "Magic value" is that int that we're forced to have scattered all over our opApply functions. --bbI proposed this iniitally over in D.learn, but I'm cleaning it up and reposting here in hopes of getting some response from Walter who was probably too busy finishing const and eating holiday Turkey at the time to notice. And rightly so. I don't believe it's appropriate in a high-level supposedly clean language like D that one of the main facilities for iterating over user types (foreach) requires writing code that passes around magic values generated by the compiler (opApply). It seems wrong to me that these magic values - come from code generated by the compiler, - must be handled exactly the proper way by the user's opApply (or else you get undefined behavior, but no compiler errors) - and then are handed back to code also generated by the compiler.When you say magic value what do you mean? From the context it sounds like you are describing an iterator without iterators being properly part of the D world (yet).
Jan 07 2008
Sorry, to use the web interface to the newsgroup.. This is an old (1993), but I hope related, discussion: Iterators: Signs of Weakness in Object-Oriented Languages http://home.pipeline.com/~hbaker1/Iterator.html - Paul
Jan 08 2008
Paul Findlay Wrote:This is an old (1993), but I hope related, discussion: Iterators: Signs of Weakness in Object-Oriented Languages http://home.pipeline.com/~hbaker1/Iterator.htmlThank you for the link, it's quite interesting. From that article:Iterator functions for a collection can be "member functions" of the C++ collection class itself, but this policy creates a problem if one wishes to enumerate the same collection for different purposes at the same time--e.g., if one wishes to enumerate the Cartesian product of a collection with itself. The problem is that the "state" of the iterator must be stored somewhere, and the only two places are in the instance of the collection or the class of the collection. If the state is stored in the class, then only one iterator can be active for the entire class, while if the state is stored in the instance, then only one iterator can be active for the collection. Neither of these policies provides the flexibility required for our Cartesian product problem. The solution usually proposed for C++ is to provide every collection class with a "friend" class called its "iterator class".<If you put a D opApply as a metod of a collection class, it allows you to enumerate the Cartesian product of a collection with itself too. Bye, bearophile
Jan 09 2008
bearophile wrote:Paul Findlay Wrote:In other words, the author seems to have overlooked the other place where state can be stored: on the stack. --bbThis is an old (1993), but I hope related, discussion: Iterators: Signs of Weakness in Object-Oriented Languages http://home.pipeline.com/~hbaker1/Iterator.htmlThank you for the link, it's quite interesting. From that article:The problem is that the "state" of the iterator must be stored somewhere, and the only two places are in the instance of the collection or the class of the collection.If you put a D opApply as a metod of a collection class, it allows you to enumerate the Cartesian product of a collection with itself too.
Jan 09 2008
Bill Baxter Wrote:bearophile wrote:Or the other place; in registers. Typically when you go through a loop, you generate this (on x86): mov ecx, length; top: bla bla bla dec ecx; jnz top; This is considerably more efficient for storing iterator state than any other means.Paul Findlay Wrote:In other words, the author seems to have overlooked the other place where state can be stored: on the stack. --bbThis is an old (1993), but I hope related, discussion: Iterators: Signs of Weakness in Object-Oriented Languages http://home.pipeline.com/~hbaker1/Iterator.htmlThank you for the link, it's quite interesting. From that article:The problem is that the "state" of the iterator must be stored somewhere, and the only two places are in the instance of the collection or the class of the collection.If you put a D opApply as a metod of a collection class, it allows you to enumerate the Cartesian product of a collection with itself too.
Jan 09 2008
Bill Baxter>In other words, the author seems to have overlooked the other place where state can be stored: on the stack.< In other parts of that article the author is aware of the stack too, he talks about the alloca() function too. What I don't like is the formatting of those snippets of C code :-) void maptimes(fn,limit) void (*fn)(int); int limit; {int i; for (i=0; i<limit; i++) (*fn)(i);} The article presents some problems of iterating on a collection, and how to solve them in Lisp, C and C++. The main problems are: 1) Iterating on a collection 2) Given a collection (like a tree), enumerate the Cartesian product of it with itself. 3) Given two collections, like two trees, find if they have the same leaves, ignoring the other structure of the tree. The purpose is to do this computation efficiently, keeping the collection data encapsulation, etc. 4) I think the article tries to talk about a problem where you need to do both 2) and 3). Putting an opApply method into the collection class is enough to solve the 1) and 2) problem too. The problem 3) can be solved in many ways, the simpler way is to scan both trees to accumulate their leaves into an array, and then to see if the two (or more) arrays are equal. But that ways usually wastes too much memory and time if the trees aren't small. So a lazy approach is better. In Haskell and in recent versions of Python it's easy to perform lazy computations. (In Python you can write a function/method generator, that yields the leaves, then you just have to write a little function that calls the two generators in parallel. If the two generators give equal sequences, and they stop at the same time, then the two trees have the same fringe). The D opApply doesn't allow you to do that, I presume (fibers/threads may help, but so far I haven't used them to solve this problem of parallel lazy scan). But to solve the samefringe problem in D (if you have a Tree data structure class) you can add three methods to that class, like "next", "isfinished" and "reset" (or something similar) that use a little part of the state of the tree object to iterate on the nodes, where the "next" method gives the currently seen node. Then you can write a little function that calls "next" on two different trees, to solve the samefringe problem (so you need "next", "isfinished", "reset" to solve parallel scan problems, and you can add "opApply" method too, for a faster single scan, because I presume opApply is faster than using the next/isfinished pair). To solve the problem 4) (that's not stated in the article, I think) the reset/next/isfinished methods don't suffice, because the object stores just one iteration state. So you may need to give the tree class yet another method that returns a newly created object with reset/next/isfinished methods and a new iteration state. This is probably closer to the solution used by C++ (but it's a friend class, not a nested class). You may want to avoid putting the reset/next/isfinished methods in the tree class, and just use this smaller iteration object for the problem 3) too. I don't know if D may use some standard way (like opApply) to allow parallel lazy scans (that is to manage the yield like Python does). I am learning D still, so if you see something wrong, silly or non D-thonic among the things I have written here, please tell me :-) Bye, bearophile
Jan 10 2008
bearophile:I don't know if D may use some standard way (like opApply) to allow parallel lazy scans (that is to manage the yield like Python does).D 2.x has full closures too, so to solve those problems it may be used one of the full functional approaches discussed in the article too (I presume they aren't much used yet among D programmers). Bye, bearophile
Jan 10 2008
On Wed, 09 Jan 2008 21:15:48 -0000, bearophile <bearophileHUGS lycos.com> wrote:Paul Findlay Wrote:It's difficult to iterate over two collections without buffering. A tree iterator needs a tree with parent links on all the nodes or it needs to hold its own stack. Unfortunately if you try to assign those iterators around, the stack is copied by reference and your program is broken. So you have to explicitly dup those iterators in all your generic code.This is an old (1993), but I hope related, discussion: Iterators: Signs of Weakness in Object-Oriented Languages http://home.pipeline.com/~hbaker1/Iterator.htmlThank you for the link, it's quite interesting. From that article:Iterator functions for a collection can be "member functions" of the C++ collection class itself, but this policy creates a problem if one wishes to enumerate the same collection for different purposes at the same time--e.g., if one wishes to enumerate the Cartesian product of a collection with itself. The problem is that the "state" of the iterator must be stored somewhere, and the only two places are in the instance of the collection or the class of the collection. If the state is stored in the class, then only one iterator can be active for the entire class, while if the state is stored in the instance, then only one iterator can be active for the collection. Neither of these policies provides the flexibility required for our Cartesian product problem. The solution usually proposed for C++ is to provide every collection class with a "friend" class called its "iterator class".<If you put a D opApply as a metod of a collection class, it allows you to enumerate the Cartesian product of a collection with itself too. Bye, bearophile
Jan 11 2008