
digitalmars.D.learn - Efficient idiom for fastest code

IntegratedDimensions <IntegratedDimensions gmail.com> writes:
Many times in expensive loops one must make decisions. Each 
decision must be evaluated, and the evaluation costs cycles.

for(int i = 0; i < N; i++)
{
     if(decision(i)) A; else B;
}

The if statement pays the cost of evaluating the test N times.

In some cases the decision holds over contiguous ranges: for some 
0 <= n <= N the decision is constant, but n is arbitrary 
(determined by factors unknown at compile time).

One can speed up the routine by using something akin to a 
simplified strategy pattern where one uses 
functions/delegates/lambdas to code a faster version without the 
test:


for(int i = 0; i < N; i++)
{
     d = (){ if(decision(i)) A; else d = (){ B; }; };
     d();
}

this code basically reduces to

for(int i = 0; i < N; i++)
{
     B;
}

We assume that in this particular case, once the decision fails 
it always fails.

Therefore, once the decision fails, the "faster" code kicks in. 
Suppose decision is very slow.

The decision cost is then paid n times rather than N times.
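
For example, a corrected, runnable sketch of the idea (decision, 
A and B are stand-in names; the delegate is declared before the 
literal so that the literal can reassign it):

import std.stdio : writeln;

void main()
{
    enum N = 20;
    bool decision(int i) { return i < 7; }  // true only for an initial range
    void A(int i) { writeln("A ", i); }
    void B(int i) { writeln("B ", i); }

    void delegate(int) d;
    d = (int i) {
        if (decision(i))
            A(i);
        else
        {
            d = (int i) { B(i); };  // swap in the test-free version
            B(i);
        }
    };
    for (int i = 0; i < N; i++)
        d(i);  // after the first failure only B runs, with no test
}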

In general, such techniques can be used to speed up code, 
sometimes significantly, but they are difficult to implement 
using a standard pattern, especially for more complex decision 
functions.


Does anyone see any way to use some of D's power to help 
simplify such things without using something like a strategy 
pattern or complicating the overall design? (Using advanced 
techniques such as templates and mixins does not count as making 
the code more complex.)
May 22 2018
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions 
wrote:
 [...]
 for(int i = 0; i < N; i++)
 {
      d = (){ if(decision(i)) A; else d = (){ B; }; };
     d();
 }
assuming you meant
 auto d = (int i){ if(decision(i)) A; else d = (int i) { B; };};
 for(int i = 0; i < N; i++)
 {
     d(i);
 }
 [...]
If decision is pure, consider using static foreach 
(iota(N).map!decision) { ... } to unroll it at compile time (see 
the sketch below). Even if it isn't pure, the compiler may still 
be able to factor out parts of repeated calls to decision; look 
at the assembly/IR to confirm this.

Otherwise PROFILE!!!!! (and use profile-guided optimisation! 
This implies using a compiler other than DMD, which, if you care 
about performance, you should be doing anyway.) Blindly trying 
to optimise is just as likely to hurt performance. In 
particular, don't underestimate the branch predictor: even if 
decision is complex, if there is a pattern in evaluating 
iota(N).map!decision, the branch predictor will pick up on it.

In terms of exploiting a priori knowledge of decision that the 
compiler somehow lacks, you really don't have much option but to 
do it yourself.
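
A rough sketch of that unrolling (this requires decision to be 
evaluatable in CTFE and N to be a compile-time constant; the 
names here are stand-ins):

import std.stdio : writeln;

enum N = 8;

bool decision(int i) pure { return i < 3; }  // stand-in, CTFE-able

void main()
{
    // Unrolled at compile time: each iteration's branch is resolved
    // during compilation, so no run-time test remains.
    static foreach (i; 0 .. N)
    {
        static if (decision(i))
            writeln("A ", i);
        else
            writeln("B ", i);
    }
}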
May 22 2018
IntegratedDimensions <IntegratedDimensions gmail.com> writes:
On Wednesday, 23 May 2018 at 03:00:17 UTC, Nicholas Wilson wrote:
 [...]
I knew someone was going to say that, and I forgot to say DON'T!

Saying to profile when I clearly said these ARE cases where they 
are slow is just moronic. Please don't use default answers to 
arguments.

This was a general question about how to attack a problem WHEN 
profiling says I need to optimize. Your SO 101 answer sucks! 
Sorry!

To prove to you that your answer is invalid: I profile my code, 
it says that it is very slow, and it shows that it is due to the 
decision checking. I then have to come here and write up a post 
trying to explain how to solve the problem. I then get a post 
telling me I should profile. I then respond that I did profile 
and that this is my problem. A lot of wasted energy, when it is 
better to know a general attack strategy. Yes, some of us can 
judge whether code needs to be optimized before profiling. It is 
not difficult. Giving a generic answer that does not always 
apply, and that is obvious to anyone trying to do optimization, 
is not helpful. Pretty much no one today even optimizes code 
anymore... this isn't 1979. It's not ok to keep repeating the 
same mantra. I guess we should turn this into a meme?

The reason I'm getting on to you is that "profile before 
optimization" sounds a bit grade school, especially since I 
wasn't talking about profiling at all, but about a general 
programming pattern for speeding up code, which is always valid 
but not always useful (and that is where profiling comes in).
May 22 2018
Nicholas Wilson <iamthewilsonator hotmail.com> writes:
On Wednesday, 23 May 2018 at 03:12:52 UTC, IntegratedDimensions 
wrote:
 [...]
I'm going to ignore the tone of your response, but I am going to 
say that responding like that isn't going to get you very far. 
Don't expect others to do likewise.

Assuming that your decision function is indeed the bottleneck, 
you'll see I did actually provide some hints as to how to 
optimise the case where decision is pure. Even if you can't 
convince the compiler to inline and expression-combine, as in 
the case for the other answer, you can memoize it (look in 
std.functional).

One of the great things about D is that you can force lots of 
computation to happen at compile time. So in the case where 
decision is impure, factoring it into pure and impure parts and 
`enum x = pureFunc(args);`ing the part that can be evaluated at 
compile time can make a large difference if you can't convince 
the optimiser to do it for you (see the sketch below).

If you still can't do that, consider JITting it 
(https://github.com/ldc-developers/druntime/blob/e3bfc5fb780967f1b6807039ff00b2ccaf4b03d9/src/ldc/attributes.d#L78) 
with `-enable-dynamic-compile`, or running the loop in parallel 
with std.parallelism.
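
For example, a minimal sketch of that pure/impure split, forcing 
the pure part to compile time (all names here are illustrative, 
and the pi loop is just a stand-in for some expensive pure 
computation):

// Expensive pure part, hoisted into its own function so it can run in CTFE.
double expensiveConstant() pure
{
    enum n = 1_000_000;
    immutable delta = 1.0 / n;
    double sum = 0.0;
    foreach (i; 0 .. n)
    {
        immutable x = (i - 0.5) * delta;
        sum += delta / (1.0 + x * x);
    }
    return 4.0 * sum;  // approximately pi
}

bool decision(int a)
{
    // enum forces compile-time evaluation; at run time only the compare remains.
    enum threshold = 3 * expensiveConstant();
    return a < threshold;
}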
May 23 2018
biocyberman <biocyberman gmail.com> writes:
On Wednesday, 23 May 2018 at 03:12:52 UTC, IntegratedDimensions 
wrote:
 [...]
Very challenging. Wish I could help you out with the tough work. 
People don't share the same context, especially online, so it is 
necessary to clarify the problem so others can understand and 
help. I've been beaten on stackoverflow many times for not 
providing sufficient information in my questions. It seems one 
can do the reverse here at forum.dlang.org. With that said, I 
think you know what you are doing, and you can do it. Just relax 
and give it more time and experimentation.
May 24 2018
Malte <no valid.mail> writes:
On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions 
wrote:
 [...]
I would just do
int i=0;
for(; i < N && decision(i); i++)
{
    A;
}
for(;i < N; i++)
{
    B;
}
This could be turned into a mixin template, something like this:
mixin template forSplit(alias condition, alias A, alias B)
{
    void execute()
    {
        int i = 0;
        for (; i < N && condition(i); i++)
        {
            A();
        }
        for (; i < N; i++)
        {
            B();
        }
    }
}
and to use it in code (assuming N is defined in the scope)
mixin forSplit!((int i)=>(decision(i)), {A;}, {B;}) loop;
loop.execute();
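Putting it together, a minimal self-contained sketch (decision, 
A and B are stand-ins):

bool decision(int i) { return i < 7; }  // stand-in

void main()
{
    import std.stdio : writeln;
    enum N = 20;  // visible to the mixed-in execute()
    void A() { writeln("A"); }
    void B() { writeln("B"); }
    mixin forSplit!((int i) => decision(i), { A(); }, { B(); }) loop;
    loop.execute();
}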
I haven't measured anything, but I would assume that delegates come with an overhead that you just don't need here. In fact, when trying to use
 auto d = (int i) {};
 d = (int i){ if(decision(i)) A; else d = (int i) { B; };};
 for(int i = 0; i < N; i++)
 {
     d(i);
 }
All I got was "cannot access frame of function D main", which sums up my experiences with lambdas in D so far.

While PGO and the branch predictor are good, they don't help much here: not executing an expression is out of scope for them; all they do is prevent pipeline flushes. Also, I think you overestimate what the compiler can do. The decision function I used for testing was this:
bool decision(int a) pure
out (result) {
    assert(result == (a < 10));
} do {
    import std.algorithm, std.range;

    // stolen from https://dlang.org/library/std/parallelism.html
    enum n = 1_000_000;
    enum delta = 1.0 / n;
    alias getTerm = (int i)
    {
        immutable x = (i - 0.5) * delta;
        return delta / (1.0 + x * x);
    };
    immutable pi = 4.0 * reduce!"a + b"(n.iota.map!getTerm);

    return a < 3 * pi;
}
With N=100 I got a speedup of ~10 (ldc -O3 -release), even though this function is pure and could be optimized a lot: it recalculated pi on every single call. And optimizing the decision function isn't even the point of the question.
May 23 2018
IntegratedDimensions <IntegratedDimensions gmail.com> writes:
On Wednesday, 23 May 2018 at 10:55:02 UTC, Malte wrote:
 On Wednesday, 23 May 2018 at 02:24:08 UTC, IntegratedDimensions 
 wrote:
 [...]
I would just do
[...]
[...]
Thanks, I didn't think about using a for loop like that. While 
it is not the most general solution, it does solve the specific 
case of a simple step/toggle decision.
May 23 2018