www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Algorithms & opApply

reply bearophile <bearophileHUGS lycos.com> writes:
The gentle and a-lot-working David Simcha has converted one of my bug reports
into an enhancement request:
http://d.puremagic.com/issues/show_bug.cgi?id=4264

Currently (SVN) only array() is able to digest iterables that define opApply
only, but I think a few more spread support will be very good.

In the thread of that 4264 there are plenty of comments and views of the
situation. Opinions welcome, both here and there.

Bye,
bearophile
Aug 22 2010
next sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 The gentle and a-lot-working David Simcha has converted one of my bug reports
into an enhancement request:
 http://d.puremagic.com/issues/show_bug.cgi?id=4264
 Currently (SVN) only array() is able to digest iterables that define opApply
only, but I think a few more spread support will be very good.
 In the thread of that 4264 there are plenty of comments and views of the
situation. Opinions welcome, both here and there.
 Bye,
 bearophile
This one didn't get much attention, possibly because the bug report is too long, too complicated and too indirect. Let me reiterate the basic points inline. I've been working heavily on debugging and polishing std.range and std.algorithm. (I've fixed 14 Bugzilla bugs plus several unlisted bugs and enhancements in these modules since last DMD release.) One issue that's been nagging me for a while, and has apparently been nagging Bearophile, too is that opApply-based objects are not supported at all, even where supporting them would be completely feasible. Examples are std.algorithm.map, std.algorithm.filter and std.algorithm.chain. I'm thinking of generalizing std.range and std.algorithm to work with any foreach-able object where possible, including opApply-based iteration. My concerns are: 1. Bug 2443 makes it impossible to propagate ref to higher order objects correctly. 2. Building higher order iterables with opApply might get so slow that it's virtually unusable, due to multiple layers of delegate calls. Then again, this might be premature optimization for a dumb compiler, as LDC can inline through at least one level of opApply delegate calls. I haven't tested to see if it can do more than this. 3. Ranges are the flagship way of iterating through an object in D, as they should be. opApply has its uses, but they're relatively few and far between. I'm wondering if anyone besides bearophile and I think supporting opApply is worth the effort and potential bloat, if Walter cares enough to fix Bug 2443, and if Andrei considers it to be a reasonable part of std.range/std.algorithm's design.
Aug 23 2010
next sibling parent reply Fawzi Mohamed <fawzi gmx.ch> writes:
On 24-ago-10, at 04:26, dsimcha wrote:

 [...]
 3.  Ranges are the flagship way of iterating through an object in D,  
 as they
 should be.  opApply has its uses, but they're relatively few and far  
 between.  I'm
 wondering if anyone besides bearophile and I think supporting  
 opApply is worth the
 effort and potential bloat, if Walter cares enough to fix Bug 2443,  
 and if Andrei
 considers it to be a reasonable part of std.range/std.algorithm's  
 design.
I already defended opApply in the past let me put this here again: ranges abstract away one iteration step. This is useful because it means that combining several of them, adancing in locksteo,... is possible and easy to do opApply abstracts away a whole loop. This makes complex loops simpler to realize without having to resort to fibers, and it allows *parallel* looping in a natural way (which cannot be done directly using only single iteration operations. In blip/dchem I use this quite a bit (for example for parallel loop helpers, often my objects have obj.sLopp obj.pLoop(blocksize=dafaultVal)). I like very much to be able to do foreach (x;obj.pLoop){...} Now they obviously are different kinds of abstractions, so looping in lockstep is not possible with opApply, at most you can lockstep *one* opApply with N iterators. About the operations in algorithm well I am not too sure how much support there should be, if they require just looping then yes, but (for example) I think that almost all of them would fail if feeded with a parallel opApply. For sequential loops fibers can transform opApply in an iterator (just as fibers can be used to easily make an iterator out of a complex loop). But fiber have a relatively high cost. So I suppose I would give support just to the basic stuff + conversion for sequential loops (via fibers), keeping most of std.algorithm opApply free Fawzi
Aug 23 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 24 Aug 2010 01:59:51 -0400, Fawzi Mohamed <fawzi gmx.ch> wrote:

 On 24-ago-10, at 04:26, dsimcha wrote:

 [...]
 3.  Ranges are the flagship way of iterating through an object in D, as  
 they
 should be.  opApply has its uses, but they're relatively few and far  
 between.  I'm
 wondering if anyone besides bearophile and I think supporting opApply  
 is worth the
 effort and potential bloat, if Walter cares enough to fix Bug 2443, and  
 if Andrei
 considers it to be a reasonable part of std.range/std.algorithm's  
 design.
I already defended opApply in the past let me put this here again: ranges abstract away one iteration step. This is useful because it means that combining several of them, adancing in locksteo,... is possible and easy to do opApply abstracts away a whole loop. This makes complex loops simpler to realize without having to resort to fibers, and it allows *parallel* looping in a natural way (which cannot be done directly using only single iteration operations. In blip/dchem I use this quite a bit (for example for parallel loop helpers, often my objects have obj.sLopp obj.pLoop(blocksize=dafaultVal)). I like very much to be able to do foreach (x;obj.pLoop){...} Now they obviously are different kinds of abstractions, so looping in lockstep is not possible with opApply, at most you can lockstep *one* opApply with N iterators. About the operations in algorithm well I am not too sure how much support there should be, if they require just looping then yes, but (for example) I think that almost all of them would fail if feeded with a parallel opApply. For sequential loops fibers can transform opApply in an iterator (just as fibers can be used to easily make an iterator out of a complex loop). But fiber have a relatively high cost. So I suppose I would give support just to the basic stuff + conversion for sequential loops (via fibers), keeping most of std.algorithm opApply free
Building on this, (I once asked "do we need opApply anymore?"), I know of several reasons ranges do not replace opApply: 1. opApply is a more natural fit for virtual functions. 2. opApply can use the stack, ranges cannot. This is useful for say iterating a binary tree without parent pointers. 3. opApply supports the full syntax of foreach, which can include multiple parameters in the loop. I have proposed an enhancement in the past to make all callable symbols foreachable, including delegates, normal functions, and functors. This would obviate the need for a special "opApply" function (just use opCall with the proper signature). The bug is here: http://d.puremagic.com/issues/show_bug.cgi?id=2498 -Steve
Aug 24 2010
prev sibling parent reply Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:
On 8/23/10 19:26 PDT, dsimcha wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 The gentle and a-lot-working David Simcha has converted one of my bug reports
into an enhancement request:
 http://d.puremagic.com/issues/show_bug.cgi?id=4264
 Currently (SVN) only array() is able to digest iterables that define opApply
only, but I think a few more spread support will be very good.
 In the thread of that 4264 there are plenty of comments and views of the
situation. Opinions welcome, both here and there.
 Bye,
 bearophile
This one didn't get much attention, possibly because the bug report is too long, too complicated and too indirect. Let me reiterate the basic points inline. I've been working heavily on debugging and polishing std.range and std.algorithm. (I've fixed 14 Bugzilla bugs plus several unlisted bugs and enhancements in these modules since last DMD release.) One issue that's been nagging me for a while, and has apparently been nagging Bearophile, too is that opApply-based objects are not supported at all, even where supporting them would be completely feasible. Examples are std.algorithm.map, std.algorithm.filter and std.algorithm.chain. I'm thinking of generalizing std.range and std.algorithm to work with any foreach-able object where possible, including opApply-based iteration. My concerns are: 1. Bug 2443 makes it impossible to propagate ref to higher order objects correctly. 2. Building higher order iterables with opApply might get so slow that it's virtually unusable, due to multiple layers of delegate calls. Then again, this might be premature optimization for a dumb compiler, as LDC can inline through at least one level of opApply delegate calls. I haven't tested to see if it can do more than this. 3. Ranges are the flagship way of iterating through an object in D, as they should be. opApply has its uses, but they're relatively few and far between. I'm wondering if anyone besides bearophile and I think supporting opApply is worth the effort and potential bloat, if Walter cares enough to fix Bug 2443, and if Andrei considers it to be a reasonable part of std.range/std.algorithm's design.
Thanks for this note. First, I think those interested in the relative merits and demerits of ranges vs. opApply should read this: http://okmij.org/ftp/papers/LL3-collections-enumerators.txt. The author, Oleg Kiselyov - an excellent PL theorist - makes a lot of excellent points (but also misses a few; I plan to write a rebuttal to that article a some point). Clearly opApply (and internal iteration in general) has many merits that are difficult to rival with ranges. The converse is also true. So perhaps it's compelling to have both. One concern regarding opApply is speed of manipulation through an indirect (delegate) call. It would be great to have some benchmarks that give us a good baseline for discussion. David, would you want to put a few together? Andrei
Aug 26 2010
parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 8/23/10 19:26 PDT, dsimcha wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 The gentle and a-lot-working David Simcha has converted one of my bug reports
into an enhancement request:
 http://d.puremagic.com/issues/show_bug.cgi?id=4264
 Currently (SVN) only array() is able to digest iterables that define opApply
only, but I think a few more spread support will be very good.
 In the thread of that 4264 there are plenty of comments and views of the
situation. Opinions welcome, both here and there.
 Bye,
 bearophile
This one didn't get much attention, possibly because the bug report is too long, too complicated and too indirect. Let me reiterate the basic points inline. I've been working heavily on debugging and polishing std.range and std.algorithm. (I've fixed 14 Bugzilla bugs plus several unlisted bugs and enhancements in
these
 modules since last DMD release.)  One issue that's been nagging me for a while,
 and has apparently been nagging Bearophile, too is that opApply-based objects
are
 not supported at all, even where supporting them would be completely feasible.
 Examples are std.algorithm.map, std.algorithm.filter and std.algorithm.chain. 
I'm
 thinking of generalizing std.range and std.algorithm to work with any
foreach-able
 object where possible, including opApply-based iteration.  My concerns are:

 1.  Bug 2443 makes it impossible to propagate ref to higher order objects
correctly.
 2.  Building higher order iterables with opApply might get so slow that it's
 virtually unusable, due to multiple layers of delegate calls.  Then again, this
 might be premature optimization for a dumb compiler, as LDC can inline through
at
 least one level of opApply delegate calls.  I haven't tested to see if it can
do
 more than this.

 3.  Ranges are the flagship way of iterating through an object in D, as they
 should be.  opApply has its uses, but they're relatively few and far between. 
I'm
 wondering if anyone besides bearophile and I think supporting opApply is worth
the
 effort and potential bloat, if Walter cares enough to fix Bug 2443, and if
Andrei
 considers it to be a reasonable part of std.range/std.algorithm's design.
Thanks for this note. First, I think those interested in the relative merits and demerits of ranges vs. opApply should read this: http://okmij.org/ftp/papers/LL3-collections-enumerators.txt. The author, Oleg Kiselyov - an excellent PL theorist - makes a lot of excellent points (but also misses a few; I plan to write a rebuttal to that article a some point). Clearly opApply (and internal iteration in general) has many merits that are difficult to rival with ranges. The converse is also true. So perhaps it's compelling to have both. One concern regarding opApply is speed of manipulation through an indirect (delegate) call. It would be great to have some benchmarks that give us a good baseline for discussion. David, would you want to put a few together? Andrei
Here's one benchmark and the results, using SHOO's awesome new StopWatch module that just landed in SVN a few days ago: import std.stdio, std.functional, std.stopwatch, std.traits, std.range, std.algorithm; struct OpApplyFilter(alias pred, Range) { alias unaryFun!pred fun; Range range; int opApply(int delegate(ref ForeachType!Range) dg) { int res; foreach(elem; range) if(fun(elem)) { res = dg(elem); if(res) break; } return res; } } auto opApplyFilter(alias pred, Range)(Range range) { return OpApplyFilter!(pred, Range)(range); } struct OpApplyMap(alias pred, Range) { alias unaryFun!pred fun; Range range; int opApply(int delegate(ref ForeachType!Range) dg) { int res; foreach(elem; range) { auto mapped = fun(elem); res = dg(mapped); if(res) break; } return res; } } auto opApplyMap(alias pred, Range)(Range range) { return OpApplyMap!(pred, Range)(range); } void main() { auto myRange = iota(100_000_000); // Keep the compiler from excessively optimizing. int sum; auto sw = new StopWatch(StopWatch.AutoStart.yes); foreach(elem; map!"a * a"(filter!"a & 1"(myRange))) { sum += elem; } sw.stop; writeln("Range-based: ", sw.peek.milliseconds); sw.reset(); sw.start(); foreach(elem; opApplyMap!"a * a"(opApplyFilter!"a & 1"(myRange))) { sum += elem; } sw.stop; writeln("opApply-based: ", sw.peek.milliseconds); sw.reset(); sw.start(); foreach(elem; opApplyMap!"a * a"(myRange)) { sum += elem; } sw.stop; writeln("opApply-based, Map only: ", sw.peek.milliseconds); sw.reset(); sw.start(); foreach(elem; opApplyFilter!"a & 1"( opApplyMap!"a + 1"(opApplyMap!"a * a"(myRange)))) { sum += elem; } sw.stop; writeln("opApply-based, two maps: ", sw.peek.milliseconds); // This is to force the value of sum to be used to prevent // overly aggressive optimizations by DMD. writeln("Ignore this: ", sum); } Results: Range-based: 404.445 opApply-based: 718.914 opApply-based, Map only: 524.051 opApply-based, two maps: 1501.92 Ignore this: 1158518272 So it seems like adding more and more layers of stacking causes each layer of stacking to have progressively more overhead. This makes sense because each layer of stacking is likely to have a progressively worse effect on the CPU pipeline. The CPU can probably speculatively execute around one layer of indirect calls, but not a whole bunch. BTW, from looking at disassemblies it seems like for some strange reason the predicate functions are never getting inlined.
Aug 27 2010
parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 So it seems like adding more and more layers of stacking causes each layer of
 stacking to have progressively more overhead.  This makes sense because each
layer
 of stacking is likely to have a progressively worse effect on the CPU pipeline.
 The CPU can probably speculatively execute around one layer of indirect calls,
but
 not a whole bunch.
See also: http://tinyurl.com/2w5jlh6 Bye, bearophile
Aug 27 2010
prev sibling parent reply dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 The gentle and a-lot-working David Simcha has converted one of my bug reports
into an enhancement request:
 http://d.puremagic.com/issues/show_bug.cgi?id=4264
 Currently (SVN) only array() is able to digest iterables that define opApply
only, but I think a few more spread support will be very good.
 In the thread of that 4264 there are plenty of comments and views of the
situation. Opinions welcome, both here and there.
 Bye,
 bearophile
I've started looking into the performance aspects of supporting opApply in std.algorithm and std.range where possible. I created quick and dirty Map, Filter and Until opApply-based iterables for D1. I then instantiated them with int[] arrays and some trivial (as in, compile to a single instruction) predicates, compiled them on LDC with full optimizations and started reading some disassemblies. It seems even LDC is too stupid to fully inline cases where Map is stacked with Filter. Things seem to get partially inlined in the most arbitrary ways, even though Map gets fully inlined when used alone. Even stranger, the delegate calls get converted to direct function calls, even though they don't get inlined. I wonder if this is a weakness of compiler inlining heuristics in general, though. When you stack a whole bunch of trivial function calls together, such these function calls would compile down to a handful of instructions if every single one of them was inlined, is this just an ugly corner case in the imperfect heuristics compilers use to inline functions? If so, now I understand why old-school programmers don't trust compilers.
Aug 25 2010
parent bearophile <bearophileHUGS lycos.com> writes:
dsimcha:
 even though Map gets fully inlined when used alone.
This is what I too have found in past.
 I wonder if this is a weakness of compiler inlining heuristics in general,
though.
I have a bug report open for LLVM about this :-)
 is this just an ugly corner case in the imperfect heuristics
 compilers use to inline functions?
Those heuristics are often fragile.
 if so, now I understand why old-school
 programmers don't trust compilers.
Good :-) On the other hand I have heard that the Scala compiler is better in inlining callables. Bye, bearophile
Aug 25 2010