digitalmars.D - Iterator straw-man
- Sean Kelly (145/145) Nov 07 2006 C++ supports pointers as iterators largely for compatibility with
- Craig Black (10/10) Nov 07 2006 I do agree that using classes and interfaces to represent iterators is a...
- Sean Kelly (6/19) Nov 07 2006 Yup. The reason for a proposal is twofold: to establish a common idiom
- Bill Baxter (14/27) Nov 07 2006 I haven't read Seans big long proposal yet, but I think the method that
- Sean Kelly (3/22) Nov 08 2006 I agree. In my proposal I used fwdIt and revIt for lack of a better ide...
- Bill Baxter (44/81) Nov 07 2006 I don't think opApply should be the mechanism by which foreach accepts
- Benji Smith (9/12) Nov 08 2006 Why the continued assumption that there are only two valid kinds of
- Bill Baxter (14/28) Nov 08 2006 Because those two will be recognized specially by foreach and
- Sean Kelly (20/117) Nov 08 2006 True enough. I mostly added it as an alternative for situations where
- Bill Baxter (5/14) Nov 08 2006 That's true. With all current uses you could see the 'op' as synonymous...
C++ supports pointers as iterators largely for compatibility with
existing/old code and because of the prevalence of pointer use in the
language. D however, is not burdened with the need for backwards
compatibility, nor is pointer use at all common. Languages without
pointers typically implement iterators as a single object that is itself
aware of the end of the referenced sequence. But while this works quite
well for sequential operations, it falls apart somewhat for random
access operations, particularly in languages lacking operator overloading.
Unlike C++, D contains a fairly robust built-in dynamic array type.
These arrays may be passed by value efficiently and contain both
built-in property methods and user extension of property semantics using
the following convention:
void mutate( char[] c, int i );
char[] c;
c.mutate( 0 );
D also supports a slice syntax which is roughly comparable to a
self-contained random access iterator. Thus "abcdef"[0 .. 2] returns a
dynamic array type supporting all the typical array operations,
including the length property for determining an end point. Slicing is
supported for user-defined types as well, so this method may be applied
to any container supporting random access.
Given the presence of D's slice syntax, it would be a shame to ignore it
in favor of pointer-style random access iterators for algorithm use.
And while the idea of a single iterator design for all access methods
(sequential and random access) seems vaguely appealing, it relies
heavily on traits templates to allow proper function overloading for
different iterator categories. In fact, overloading template functions
for different iterator categories is so annoying that the C++ committee
is adding concept checking to C++ 0x to simplify the implementation of
such overloaded template functions.
I believe D's slice syntax offers a perfect opportunity to provide a
reasonable, intuitive syntax for robust "smart" random-access iterators.
The built-in array type will work as-is, and a user-defined random
access container could support the same syntax by providing opIndex,
opIndexAssign, opSlice, and opSliceAssign for both the container and
iterator types. Obtaining an iterator on a random access container,
then, would be done via a slice operation:
auto i = myCont[0 .. $];
foreach( e; i ) {} // sequential access
auto e = i[0]; // random access
The remaining iterator categories in C++ are essentially variations of
forward iterators and bidirectional iterators, and could roughly be
described as requiring a means of advancing, testing whether the
iterator references a valid element, and of accessing the referenced
element. Testing validity in C++ is done by comparing the data iterator
with an "end" iterator for equality, as again, this syntax works with
pointers as well. But using two separate iterators for defining the
bounds of a range:
* Is cumbersome. It means passing two parameters to every function
requiring an iterator instead of one.
* Is dangerous. Comparing the "end" iterator of one range with the
data iterator of another range is legal if the data (and possibly
container) types match, and can result in an attempt to read/write past
the end of the referenced sequence.
* Is unnecessary. It is more natural, and more common, to perform
random access operations via array index operations, which leaves
typical iterator use largely restricted to sequential access. This
being the case, why support pointers as valid iterators?
What follows is a rough description of a "smart" read-only iterator for D:
interface Iterator<T>
{
bool atEnd();
T next();
T curr();
int opApply( int delegate( T ) dg );
}
The nuances are unimportant, but the basic idea is that any such
iterator should provide a means of advancing, testing whether the
iterator references a valid element, and of referencing that element.
Also, opApply is supported to allow iterators to be used in foreach
operations for simple uses.
Assuming such a design, the iterator type determines progress through
the container (forward, reverse, stepped, etc) rather than the algorithm
being applied. Here is a straightforward implementation of some
algorithms using such iterators, 'C' is a container type for demonstration:
bool contains( C cont, E match ) {
foreach( e; cont ) {
if( e == match )
return true;
}
return false;
}
I find_first( C cont, E match ) {
auto i = cont.fwdIt;
foreach( e; i ) {
if( e == match )
return i;
}
return i;
}
I find_last( C cont, E match ) {
auto i = cont.revIt;
foreach( e; i ) {
if( e == match )
return i;
}
return i;
}
Given the above, it seems awkward that the iterator must be declared
outside of foreach so a copy may be made to return the position of the
found element. Therefore, foreach should allow an arbitrary key type to
be returned by opApply. For array operations, the key could continue to
be an integer representing the array postion., but in sequence
operations the key could be an iterator representing the current
position. Thus, the following operations would be equivalent:
1) auto i = cont.fwdIt;
foreach( e; i ) {
if( e == match )
return i;
}
2) foreach( i, e; cont.fwdIt ) {
if( e == match )
return i;
}
3) foreach( i, e; cont ) {
if( e == match )
return i;
}
This being the case, I believe iterators must be copyable in a generic
manner, and ideally, copies must be relatively efficient to perform.
Therefore, I suggest the addition of a .dup() property to the standard
iterator interface to flatten the differences between class and
struct-based iterators, and it may be advisable to use structs as
iterators in the standard case and forget about classes and interfaces
altogether. That said, if a reference to the current iterator could be
passed around within foreach instead of a copy being made for each
iteration, then much of the cost could be eliminated. This should
easily be possible using the same auto-dereferencing semantics used by
"inout" parameters in D already.
So in summation, D adequately represents "smart" random access iterators
via array slice semantics and sequential iteration is best represented
using a single "smart" iterator object as opposed to a pair of "dumb"
iterators as per C++. Finally, the example code above suggests that the
need for a foreach_reverse statement may not be necessary if fwdIt and
revIt properties are provided for built-in arrays. Though the only
functional difference between:
foreach( i, e; array )
and
foreach( i, e; array.fwdIt )
would be the "key" type returned in 'i'. In the first case it would be
an integer index and in the second it would be an iterator.
As an aside, foreach_reverse may remain useful in instances where we
want the index of the current element instead of an iterator. Or array
iterators may provide a means to obtain the relevant index. Perhaps
someone could make an argument for or against based on these ideas.
Nov 07 2006
I do agree that using classes and interfaces to represent iterators is a bad idea for performance reasons. I also agree that I don't think it's necessary to support raw pointers. IMO iterators should be structs such that iteration can be performed as follows: for(auto i = container.begin(); i.isValid(); i.next()) write(i.value); If I am not mistaken, this can already be done in D. With foreach support this would simplify to. foreach(i; container) write(i); -Craig
Nov 07 2006
Craig Black wrote:I do agree that using classes and interfaces to represent iterators is a bad idea for performance reasons. I also agree that I don't think it's necessary to support raw pointers. IMO iterators should be structs such that iteration can be performed as follows: for(auto i = container.begin(); i.isValid(); i.next()) write(i.value); If I am not mistaken, this can already be done in D. With foreach support this would simplify to. foreach(i; container) write(i);Yup. The reason for a proposal is twofold: to establish a common idiom for people to follow so code can interact reliably, and to determine whether any built-in behavior needs to be tweaked to support iterators in a way this is both generic (for UDTs, AAs, and arrays) and elegant. Sean
Nov 07 2006
Craig Black wrote:I do agree that using classes and interfaces to represent iterators is a bad idea for performance reasons. I also agree that I don't think it's necessary to support raw pointers. IMO iterators should be structs such that iteration can be performed as follows: for(auto i = container.begin(); i.isValid(); i.next()) write(i.value); If I am not mistaken, this can already be done in D. With foreach support this would simplify to. foreach(i; container) write(i);I haven't read Seans big long proposal yet, but I think the method that returns an iterator should not be called 'begin'. I should be called 'forward' or 'iterator' or 'iter' or 'forward_iterator' or something like that. Then i would hope that all of the following would be possible: foreach(i; container.iter) write(i); foreach(i; container.reverse_iter) write(i); foreach(i; container.my_skip_every_prime_number_iter) write(i); foreach(i; container) can look for the .iter (or opIterator, or whatever standard name is decided upon). foreach_reverse(i; container) [shudder] can look for the .riter (or opIteratorReverse, or whatever). --bb
Nov 07 2006
Bill Baxter wrote:Craig Black wrote:I agree. In my proposal I used fwdIt and revIt for lack of a better idea. SeanI do agree that using classes and interfaces to represent iterators is a bad idea for performance reasons. I also agree that I don't think it's necessary to support raw pointers. IMO iterators should be structs such that iteration can be performed as follows: for(auto i = container.begin(); i.isValid(); i.next()) write(i.value); If I am not mistaken, this can already be done in D. With foreach support this would simplify to. foreach(i; container) write(i);I haven't read Seans big long proposal yet, but I think the method that returns an iterator should not be called 'begin'. I should be called 'forward' or 'iterator' or 'iter' or 'forward_iterator' or something like that.
Nov 08 2006
Sean Kelly wrote: I mostly agree with you. But have some comments on the specifics.What follows is a rough description of a "smart" read-only iterator for D: interface Iterator<T> { bool atEnd(); T next(); T curr(); int opApply( int delegate( T ) dg ); } The nuances are unimportant, but the basic idea is that any such iterator should provide a means of advancing, testing whether the iterator references a valid element, and of referencing that element. Also, opApply is supported to allow iterators to be used in foreach operations for simple uses.I don't think opApply should be the mechanism by which foreach accepts iterators. It's just too convoluted and too much a burden on iterator implementors. And it's not going to be as efficient as a raw for-loop, either.Assuming such a design, the iterator type determines progress through the container (forward, reverse, stepped, etc) rather than the algorithm being applied. Here is a straightforward implementation of some algorithms using such iterators, 'C' is a container type for demonstration: bool contains( C cont, E match ) { foreach( e; cont ) { if( e == match ) return true; } return false; }It should be possible to make a template out of that, too. bool contains(C,E)( C cont, E match ) { foreach( e; cont ) { if( e == match ) return true; } return false; } Current compiler limitations aside, that *should* be reducible to the same efficiency as a for loop with simple pointer (or index) manipulation.[...] Given the above, it seems awkward that the iterator must be declared outside of foreach so a copy may be made to return the position of the found element. Therefore, foreach should allow an arbitrary key type to be returned by opApply.I like that. It makes sense. The iterator itself is analogous to the int index when foreach-ing arrays. That seems very logical. Given that similarity it seems like it would make sense for indexing a container by an iterator to work as well: foreach( i, e; cont.fwdIt ) { if( e == match ) { writefln("Matched item ", cont[i]); return i; } } I.e. cont[i] becomes synonymous with i.curr(). That would be easy to arrange if opIndex was allowed to take an iterator as the index. (I think it's a good idea to remove the restriction on opIndex arguments in any event, but this is another justification.) An opIndex for an iterator would take the simple form: static T opIndex(ContIterator i) { return i.curr(); } If you had write-able iterator with say a .set(T v) method, then you could also have opIndexAssign: static T opIndexAssign(ContIterator i, T x) { return i.set(x); } If the Iterator type has a pointer to its container, you could make the methods non-static and assert(this==i.container).As an aside, foreach_reverse may remain useful in instances where we want the index of the current element instead of an iterator. Or array iterators may provide a means to obtain the relevant index. Perhaps someone could make an argument for or against based on these ideas.foreach_reverse(i,x; cont) would remain (marginally) useful as a shortcut for the special reverse iterator name: I.e. a synonym for foreach(i,x; cont.revIter) I'm guessing the specially-ordained iterator method names will probably end up being opIter and opIterReverse (or maybe same with 'Iterator' spelled out) just following the pattern established so far. --bb
Nov 07 2006
Bill Baxter wrote:I'm guessing the specially-ordained iterator method names will probably end up being opIter and opIterReverse (or maybe same with 'Iterator' spelled out) just following the pattern established so far.Why the continued assumption that there are only two valid kinds of iterator? A hashtable might have forward and reverse iterators for both its keys and its values, thereby requiring four distinct iterators. A tree might have depth-first, breadth-first, in-order, and reverse-order iterators. Any collection class might have a random-order iterator. I think it's awfully presumptuous to bake the notion of forward and reverse iteration directly into the language. --benji
Nov 08 2006
Benji Smith wrote:Bill Baxter wrote:Because those two will be recognized specially by foreach and foreach_reverse without having to specify a method name. But you'll still be able to pass in a specific iterator of whatever kind you wish. A hashtable might have forward and reverse iterators for bothI'm guessing the specially-ordained iterator method names will probably end up being opIter and opIterReverse (or maybe same with 'Iterator' spelled out) just following the pattern established so far.Why the continued assumption that there are only two valid kinds of iterator?its keys and its values, thereby requiring four distinct iterators.Yep, for hashtable you may have to name them all explicitly: foreach(k; ht.keys) foreach(v; ht.values) foreach(k; ht.keysReverse) foreach(v; ht.valuesReverse) Atree might have depth-first, breadth-first, in-order, and reverse-order iterators. Any collection class might have a random-order iterator. I think it's awfully presumptuous to bake the notion of forward and reverse iteration directly into the language.Can't say I disagree with you, but it's already there with foreach and foreach_reverse and opApply/opApplyReverse. --bb
Nov 08 2006
Bill Baxter wrote:Sean Kelly wrote: I mostly agree with you. But have some comments on the specifics.True enough. I mostly added it as an alternative for situations where users didn't need the power of a manual loop. And I think implementation would be pretty simple for most/all iterators: for( ; !atEnd(); next() ) { ret = dg( curr() ); // handle ret }What follows is a rough description of a "smart" read-only iterator for D: interface Iterator<T> { bool atEnd(); T next(); T curr(); int opApply( int delegate( T ) dg ); } The nuances are unimportant, but the basic idea is that any such iterator should provide a means of advancing, testing whether the iterator references a valid element, and of referencing that element. Also, opApply is supported to allow iterators to be used in foreach operations for simple uses.I don't think opApply should be the mechanism by which foreach accepts iterators. It's just too convoluted and too much a burden on iterator implementors. And it's not going to be as efficient as a raw for-loop, either.Yup. The function above was meant for illustrating an idea more than it was a proposal for how to implement a contains function :-)Assuming such a design, the iterator type determines progress through the container (forward, reverse, stepped, etc) rather than the algorithm being applied. Here is a straightforward implementation of some algorithms using such iterators, 'C' is a container type for demonstration: bool contains( C cont, E match ) { foreach( e; cont ) { if( e == match ) return true; } return false; }It should be possible to make a template out of that, too. bool contains(C,E)( C cont, E match ) { foreach( e; cont ) { if( e == match ) return true; } return false; } Current compiler limitations aside, that *should* be reducible to the same efficiency as a for loop with simple pointer (or index) manipulation.Hrm, good point.[...] Given the above, it seems awkward that the iterator must be declared outside of foreach so a copy may be made to return the position of the found element. Therefore, foreach should allow an arbitrary key type to be returned by opApply.I like that. It makes sense. The iterator itself is analogous to the int index when foreach-ing arrays. That seems very logical. Given that similarity it seems like it would make sense for indexing a container by an iterator to work as well: foreach( i, e; cont.fwdIt ) { if( e == match ) { writefln("Matched item ", cont[i]); return i; } } I.e. cont[i] becomes synonymous with i.curr().That would be easy to arrange if opIndex was allowed to take an iterator as the index. (I think it's a good idea to remove the restriction on opIndex arguments in any event, but this is another justification.)Is it currently required to be an integer? I had no idea.An opIndex for an iterator would take the simple form: static T opIndex(ContIterator i) { return i.curr(); } If you had write-able iterator with say a .set(T v) method, then you could also have opIndexAssign: static T opIndexAssign(ContIterator i, T x) { return i.set(x); } If the Iterator type has a pointer to its container, you could make the methods non-static and assert(this==i.container).All good ideas.True enough.As an aside, foreach_reverse may remain useful in instances where we want the index of the current element instead of an iterator. Or array iterators may provide a means to obtain the relevant index. Perhaps someone could make an argument for or against based on these ideas.foreach_reverse(i,x; cont) would remain (marginally) useful as a shortcut for the special reverse iterator name: I.e. a synonym for foreach(i,x; cont.revIter)I'm guessing the specially-ordained iterator method names will probably end up being opIter and opIterReverse (or maybe same with 'Iterator' spelled out) just following the pattern established so far.Following the D convention, the 'op' prefix suggests a use form different from that of a normal function call. Personally, I'd be fine with this done as plain old property methods: fwdIt/revIt, or something along those lines. Sean
Nov 08 2006
Sean Kelly wrote:Bill Baxter wrote:Sean Kelly wrote:Following the D convention, the 'op' prefix suggests a use form different from that of a normal function call. Personally, I'd be fine with this done as plain old property methods: fwdIt/revIt, or something along those lines.That's true. With all current uses you could see the 'op' as synonymous with "you ain't never gonna call this directly". So something intended to be called directly doesn't necessarily have to follow that convention. --bb
Nov 08 2006









Sean Kelly <sean f4.ca> 