digitalmars.D - Algorithms & opApply

bearophile (6/6) Aug 22 2010 The gentle and a-lot-working David Simcha has converted one of my bug re...

dsimcha (25/31) Aug 23 2010 only, but I think a few more spread support will be very good.

Fawzi Mohamed (27/38) Aug 23 2010 I already defended opApply in the past let me put this here again:

Steven Schveighoffer (14/51) Aug 24 2010 Building on this, (I once asked "do we need opApply anymore?"), I know o...

Andrei Alexandrescu (16/47) Aug 26 2010 Thanks for this note.

dsimcha (87/139) Aug 27 2010 correctly.

bearophile (5/10) Aug 27 2010 See also:

dsimcha (20/26) Aug 25 2010 only, but I think a few more spread support will be very good.

bearophile (8/14) Aug 25 2010 I have a bug report open for LLVM about this :-)

bearophile <bearophileHUGS lycos.com> writes:

The gentle and a-lot-working David Simcha has converted one of my bug reports
into an enhancement request:
http://d.puremagic.com/issues/show_bug.cgi?id=4264

Currently (SVN) only array() is able to digest iterables that define opApply
only, but I think a few more spread support will be very good.

In the thread of that 4264 there are plenty of comments and views of the
situation. Opinions welcome, both here and there.

Bye,
bearophile

Aug 22 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 The gentle and a-lot-working David Simcha has converted one of my bug reports

into an enhancement request:
 http://d.puremagic.com/issues/show_bug.cgi?id=4264
 Currently (SVN) only array() is able to digest iterables that define opApply

only, but I think a few more spread support will be very good.
 In the thread of that 4264 there are plenty of comments and views of the

situation. Opinions welcome, both here and there.
 Bye,
 bearophile

This one didn't get much attention, possibly because the bug report is too long,
too complicated and too indirect.  Let me reiterate the basic points inline.

I've been working heavily on debugging and polishing std.range and
std.algorithm.
 (I've fixed 14 Bugzilla bugs plus several unlisted bugs and enhancements in
these
modules since last DMD release.)  One issue that's been nagging me for a while,
and has apparently been nagging Bearophile, too is that opApply-based objects
are
not supported at all, even where supporting them would be completely feasible.
Examples are std.algorithm.map, std.algorithm.filter and std.algorithm.chain. 
I'm
thinking of generalizing std.range and std.algorithm to work with any
foreach-able
object where possible, including opApply-based iteration.  My concerns are:

1.  Bug 2443 makes it impossible to propagate ref to higher order objects
correctly.

2.  Building higher order iterables with opApply might get so slow that it's
virtually unusable, due to multiple layers of delegate calls.  Then again, this
might be premature optimization for a dumb compiler, as LDC can inline through
at
least one level of opApply delegate calls.  I haven't tested to see if it can do
more than this.

3.  Ranges are the flagship way of iterating through an object in D, as they
should be.  opApply has its uses, but they're relatively few and far between. 
I'm
wondering if anyone besides bearophile and I think supporting opApply is worth
the
effort and potential bloat, if Walter cares enough to fix Bug 2443, and if
Andrei
considers it to be a reasonable part of std.range/std.algorithm's design.

Aug 23 2010

Fawzi Mohamed <fawzi gmx.ch> writes:

On 24-ago-10, at 04:26, dsimcha wrote:

 [...]
 3.  Ranges are the flagship way of iterating through an object in D,  
 as they
 should be.  opApply has its uses, but they're relatively few and far  
 between.  I'm
 wondering if anyone besides bearophile and I think supporting  
 opApply is worth the
 effort and potential bloat, if Walter cares enough to fix Bug 2443,  
 and if Andrei
 considers it to be a reasonable part of std.range/std.algorithm's  
 design.

I already defended opApply in the past let me put this here again:

ranges abstract away one iteration step. This is useful because it  
means that combining several of them, adancing in locksteo,... is  
possible and easy to do

opApply abstracts away a whole loop. This makes complex loops simpler  
to realize without having to resort to fibers, and it allows  
*parallel* looping in a natural way (which cannot be done directly  
using only single iteration operations.
In blip/dchem I use this quite a bit (for example for parallel loop  
helpers, often my objects have obj.sLopp  
obj.pLoop(blocksize=dafaultVal)).
I like very much to be able to do foreach (x;obj.pLoop){...}

Now they obviously are different kinds of abstractions, so looping in  
lockstep is not possible with opApply, at most you can lockstep *one*  
opApply with N iterators.

About the operations in algorithm well I am not too sure how much  
support there should be, if they require just looping then yes, but  
(for example) I think that almost all of them would fail if feeded  
with a parallel opApply.
For sequential loops fibers can transform opApply in an iterator (just  
as fibers can be used to easily make an iterator out of a complex loop).
But fiber have a relatively high cost.
So I suppose I would give support just to the basic stuff + conversion  
for sequential loops (via fibers), keeping most of std.algorithm  
opApply free

Fawzi

Aug 23 2010

"Steven Schveighoffer" <schveiguy yahoo.com> writes:

On Tue, 24 Aug 2010 01:59:51 -0400, Fawzi Mohamed <fawzi gmx.ch> wrote:

 On 24-ago-10, at 04:26, dsimcha wrote:

 [...]
 3.  Ranges are the flagship way of iterating through an object in D, as  
 they
 should be.  opApply has its uses, but they're relatively few and far  
 between.  I'm
 wondering if anyone besides bearophile and I think supporting opApply  
 is worth the
 effort and potential bloat, if Walter cares enough to fix Bug 2443, and  
 if Andrei
 considers it to be a reasonable part of std.range/std.algorithm's  
 design.

 I already defended opApply in the past let me put this here again:

 ranges abstract away one iteration step. This is useful because it means  
 that combining several of them, adancing in locksteo,... is possible and  
 easy to do

 opApply abstracts away a whole loop. This makes complex loops simpler to  
 realize without having to resort to fibers, and it allows *parallel*  
 looping in a natural way (which cannot be done directly using only  
 single iteration operations.
 In blip/dchem I use this quite a bit (for example for parallel loop  
 helpers, often my objects have obj.sLopp  
 obj.pLoop(blocksize=dafaultVal)).
 I like very much to be able to do foreach (x;obj.pLoop){...}

 Now they obviously are different kinds of abstractions, so looping in  
 lockstep is not possible with opApply, at most you can lockstep *one*  
 opApply with N iterators.

 About the operations in algorithm well I am not too sure how much  
 support there should be, if they require just looping then yes, but (for  
 example) I think that almost all of them would fail if feeded with a  
 parallel opApply.
 For sequential loops fibers can transform opApply in an iterator (just  
 as fibers can be used to easily make an iterator out of a complex loop).
 But fiber have a relatively high cost.
 So I suppose I would give support just to the basic stuff + conversion  
 for sequential loops (via fibers), keeping most of std.algorithm opApply  
 free


Building on this, (I once asked "do we need opApply anymore?"), I know of  
several reasons ranges do not replace opApply:

1. opApply is a more natural fit for virtual functions.
2. opApply can use the stack, ranges cannot.  This is useful for say  
iterating a binary tree without parent pointers.
3. opApply supports the full syntax of foreach, which can include multiple  
parameters in the loop.

I have proposed an enhancement in the past to make all callable symbols  
foreachable, including delegates, normal functions, and functors.  This  
would obviate the need for a special "opApply" function (just use opCall  
with the proper signature).

The bug is here: http://d.puremagic.com/issues/show_bug.cgi?id=2498

-Steve

Aug 24 2010

Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> writes:

On 8/23/10 19:26 PDT, dsimcha wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 The gentle and a-lot-working David Simcha has converted one of my bug reports

 into an enhancement request:
 http://d.puremagic.com/issues/show_bug.cgi?id=4264
 Currently (SVN) only array() is able to digest iterables that define opApply

 only, but I think a few more spread support will be very good.
 In the thread of that 4264 there are plenty of comments and views of the

 situation. Opinions welcome, both here and there.
 Bye,
 bearophile

 This one didn't get much attention, possibly because the bug report is too
long,
 too complicated and too indirect.  Let me reiterate the basic points inline.

 I've been working heavily on debugging and polishing std.range and
std.algorithm.
   (I've fixed 14 Bugzilla bugs plus several unlisted bugs and enhancements in
these
 modules since last DMD release.)  One issue that's been nagging me for a while,
 and has apparently been nagging Bearophile, too is that opApply-based objects
are
 not supported at all, even where supporting them would be completely feasible.
 Examples are std.algorithm.map, std.algorithm.filter and std.algorithm.chain. 
I'm
 thinking of generalizing std.range and std.algorithm to work with any
foreach-able
 object where possible, including opApply-based iteration.  My concerns are:

 1.  Bug 2443 makes it impossible to propagate ref to higher order objects
correctly.

 2.  Building higher order iterables with opApply might get so slow that it's
 virtually unusable, due to multiple layers of delegate calls.  Then again, this
 might be premature optimization for a dumb compiler, as LDC can inline through
at
 least one level of opApply delegate calls.  I haven't tested to see if it can
do
 more than this.

 3.  Ranges are the flagship way of iterating through an object in D, as they
 should be.  opApply has its uses, but they're relatively few and far between. 
I'm
 wondering if anyone besides bearophile and I think supporting opApply is worth
the
 effort and potential bloat, if Walter cares enough to fix Bug 2443, and if
Andrei
 considers it to be a reasonable part of std.range/std.algorithm's design.

Thanks for this note.

First, I think those interested in the relative merits and demerits of 
ranges vs. opApply should read this: 
http://okmij.org/ftp/papers/LL3-collections-enumerators.txt. The author, 
Oleg Kiselyov - an excellent PL theorist - makes a lot of excellent 
points (but also misses a few; I plan to write a rebuttal to that 
article a some point).

Clearly opApply (and internal iteration in general) has many merits that 
are difficult to rival with ranges. The converse is also true. So 
perhaps it's compelling to have both.

One concern regarding opApply is speed of manipulation through an 
indirect (delegate) call. It would be great to have some benchmarks that 
give us a good baseline for discussion. David, would you want to put a 
few together?


Andrei

Aug 26 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from Andrei Alexandrescu (SeeWebsiteForEmail erdani.org)'s article
 On 8/23/10 19:26 PDT, dsimcha wrote:
 == Quote from bearophile (bearophileHUGS lycos.com)'s article
 The gentle and a-lot-working David Simcha has converted one of my bug reports

 into an enhancement request:
 http://d.puremagic.com/issues/show_bug.cgi?id=4264
 Currently (SVN) only array() is able to digest iterables that define opApply

 only, but I think a few more spread support will be very good.
 In the thread of that 4264 there are plenty of comments and views of the

 situation. Opinions welcome, both here and there.
 Bye,
 bearophile

 This one didn't get much attention, possibly because the bug report is too
long,
 too complicated and too indirect.  Let me reiterate the basic points inline.

 I've been working heavily on debugging and polishing std.range and
std.algorithm.
   (I've fixed 14 Bugzilla bugs plus several unlisted bugs and enhancements in


these
 modules since last DMD release.)  One issue that's been nagging me for a while,
 and has apparently been nagging Bearophile, too is that opApply-based objects
are
 not supported at all, even where supporting them would be completely feasible.
 Examples are std.algorithm.map, std.algorithm.filter and std.algorithm.chain. 
I'm
 thinking of generalizing std.range and std.algorithm to work with any
foreach-able
 object where possible, including opApply-based iteration.  My concerns are:

 1.  Bug 2443 makes it impossible to propagate ref to higher order objects


correctly.
 2.  Building higher order iterables with opApply might get so slow that it's
 virtually unusable, due to multiple layers of delegate calls.  Then again, this
 might be premature optimization for a dumb compiler, as LDC can inline through
at
 least one level of opApply delegate calls.  I haven't tested to see if it can
do
 more than this.

 3.  Ranges are the flagship way of iterating through an object in D, as they
 should be.  opApply has its uses, but they're relatively few and far between. 
I'm
 wondering if anyone besides bearophile and I think supporting opApply is worth
the
 effort and potential bloat, if Walter cares enough to fix Bug 2443, and if
Andrei
 considers it to be a reasonable part of std.range/std.algorithm's design.

 Thanks for this note.
 First, I think those interested in the relative merits and demerits of
 ranges vs. opApply should read this:
 http://okmij.org/ftp/papers/LL3-collections-enumerators.txt. The author,
 Oleg Kiselyov - an excellent PL theorist - makes a lot of excellent
 points (but also misses a few; I plan to write a rebuttal to that
 article a some point).
 Clearly opApply (and internal iteration in general) has many merits that
 are difficult to rival with ranges. The converse is also true. So
 perhaps it's compelling to have both.
 One concern regarding opApply is speed of manipulation through an
 indirect (delegate) call. It would be great to have some benchmarks that
 give us a good baseline for discussion. David, would you want to put a
 few together?
 Andrei

Here's one benchmark and the results, using SHOO's awesome new StopWatch module
that just landed in SVN a few days ago:

import std.stdio, std.functional, std.stopwatch, std.traits, std.range,
    std.algorithm;

struct OpApplyFilter(alias pred, Range) {
    alias unaryFun!pred fun;
    Range range;

    int opApply(int delegate(ref ForeachType!Range) dg) {
        int res;

        foreach(elem; range) if(fun(elem)) {
            res = dg(elem);
            if(res) break;
        }

        return res;
    }
}

auto opApplyFilter(alias pred, Range)(Range range) {
    return OpApplyFilter!(pred, Range)(range);
}

struct OpApplyMap(alias pred, Range) {
    alias unaryFun!pred fun;
    Range range;

    int opApply(int delegate(ref ForeachType!Range) dg) {
        int res;

        foreach(elem; range) {
            auto mapped = fun(elem);
            res = dg(mapped);
            if(res) break;
        }

        return res;
    }
}

auto opApplyMap(alias pred, Range)(Range range) {
    return OpApplyMap!(pred, Range)(range);
}


void main() {
    auto myRange = iota(100_000_000);

    // Keep the compiler from excessively optimizing.
    int sum;

    auto sw = new StopWatch(StopWatch.AutoStart.yes);

    foreach(elem; map!"a * a"(filter!"a & 1"(myRange))) {
        sum += elem;
    }

    sw.stop;
    writeln("Range-based:  ", sw.peek.milliseconds);

    sw.reset();
    sw.start();

    foreach(elem; opApplyMap!"a * a"(opApplyFilter!"a & 1"(myRange))) {
        sum += elem;
    }

    sw.stop;
    writeln("opApply-based:  ", sw.peek.milliseconds);

    sw.reset();
    sw.start();

    foreach(elem; opApplyMap!"a * a"(myRange)) {
        sum += elem;
    }

    sw.stop;
    writeln("opApply-based, Map only:  ", sw.peek.milliseconds);

    sw.reset();
    sw.start();

    foreach(elem; opApplyFilter!"a & 1"(
    opApplyMap!"a + 1"(opApplyMap!"a * a"(myRange)))) {
        sum += elem;
    }

    sw.stop;
    writeln("opApply-based, two maps:  ", sw.peek.milliseconds);


    // This is to force the value of sum to be used to prevent
    // overly aggressive optimizations by DMD.
    writeln("Ignore this:  ", sum);
}

Results:

Range-based:  404.445
opApply-based:  718.914
opApply-based, Map only:  524.051
opApply-based, two maps:  1501.92
Ignore this:  1158518272

So it seems like adding more and more layers of stacking causes each layer of
stacking to have progressively more overhead.  This makes sense because each
layer
of stacking is likely to have a progressively worse effect on the CPU pipeline.
The CPU can probably speculatively execute around one layer of indirect calls,
but
not a whole bunch.

BTW, from looking at disassemblies it seems like for some strange reason the
predicate functions are never getting inlined.

Aug 27 2010

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:
 So it seems like adding more and more layers of stacking causes each layer of
 stacking to have progressively more overhead.  This makes sense because each
layer
 of stacking is likely to have a progressively worse effect on the CPU pipeline.
 The CPU can probably speculatively execute around one layer of indirect calls,
but
 not a whole bunch.

See also:
http://tinyurl.com/2w5jlh6

Bye,
bearophile

Aug 27 2010

dsimcha <dsimcha yahoo.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 The gentle and a-lot-working David Simcha has converted one of my bug reports

into an enhancement request:
 http://d.puremagic.com/issues/show_bug.cgi?id=4264
 Currently (SVN) only array() is able to digest iterables that define opApply

only, but I think a few more spread support will be very good.
 In the thread of that 4264 there are plenty of comments and views of the

situation. Opinions welcome, both here and there.
 Bye,
 bearophile

I've started looking into the performance aspects of supporting opApply in
std.algorithm and std.range where possible.  I created quick and dirty Map,
Filter
and Until opApply-based iterables for D1.  I then instantiated them with int[]
arrays and some trivial (as in, compile to a single instruction) predicates,
compiled them on LDC with full optimizations and started reading some
disassemblies.  It seems even LDC is too stupid to fully inline cases where Map
is
stacked with Filter.  Things seem to get partially inlined in the most arbitrary
ways, even though Map gets fully inlined when used alone.  Even stranger, the
delegate calls get converted to direct function calls, even though they don't
get
inlined.

I wonder if this is a weakness of compiler inlining heuristics in general,
though.
 When you stack a whole bunch of trivial function calls together, such these
function calls would compile down to a handful of instructions if every single
one
of them was inlined, is this just an ugly corner case in the imperfect
heuristics
compilers use to inline functions?  If so, now I understand why old-school
programmers don't trust compilers.

Aug 25 2010

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:
 even though Map gets fully inlined when used alone.

This is what I too have found in past.


 I wonder if this is a weakness of compiler inlining heuristics in general,
though.

I have a bug report open for LLVM about this :-)


 is this just an ugly corner case in the imperfect heuristics
 compilers use to inline functions?

Those heuristics are often fragile.


 if so, now I understand why old-school
 programmers don't trust compilers.

Good :-)
On the other hand I have heard that the Scala compiler is better in inlining
callables.

Bye,
bearophile

Aug 25 2010

D Programming

C/C++ Programming

Other

digitalmars.D - Algorithms & opApply