digitalmars.D - Compile time loop unrolling

Bill Baxter <dnewsgroup billbaxter.com> writes:
Has anyone done this before?
It's pretty similar to what Don's stuff does, and maybe Don is even 
doing this in part of Blade somewhere, but anyway it's a little 
different from the type of thing he's got on his web page.

Here the basic idea is to optimize templated small vector classes.

Say you've got a struct Vector(N) type.  A lot of the operations look like
     values_[0] op other.values_[0];
     values_[1] op other.values_[1];
     ...
     values_[N-1] op other.values_[N-1];

//----------------------------------------------------------------------------
import std.metastrings;

// Create a string that repeats the given expression N times, replacing
// every occurrence of idx in the expression with the iteration number
// (0 .. N-1). The result is meant to be fed to mixin().
string unroll(int N,int i=0)(string expr, char idx='z') {
     static if(i<N) {
         char[] subs_expr;
         foreach (c; expr) {
             if (c==idx) {
                 subs_expr ~= ToString!(i);
             } else {
                 subs_expr ~= c;
             }
         }
         return subs_expr ~ "\n" ~ unroll!(N,i+1)(expr,idx);
     } else {
         return "";
     }
}

Then to use it to implement opAddAssign you write code like:

     alias unroll!(N) unroll_;
     void opAddAssign(ref vector_type _rhs) {
         const string expr = "values_[z] += _rhs[z];";
         //pragma(msg,unroll_(expr)); // handy for debug
         mixin( unroll_(expr) );
     }
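
For readers on a current compiler, here is a minimal sketch of the same idea in D2. It is not the code from this post: the helper name unrollText, the Vector layout and the opIndex member are assumptions made for illustration, and opOpAssign is the D2 spelling of opAddAssign. The static assert shows the text the helper generates.

```d
import std.conv : to;

// Hypothetical D2 port of the unroll helper: returns `expr` repeated n times,
// with every occurrence of `idx` replaced by the iteration number.
string unrollText(size_t n, string expr, char idx = 'z')
{
    string result;
    foreach (i; 0 .. n)
    {
        foreach (char c; expr)
        {
            if (c == idx)
                result ~= to!string(i);
            else
                result ~= c;
        }
        result ~= "\n";
    }
    return result;
}

// Evaluated at compile time, so this is the source text handed to mixin().
static assert(unrollText(2, "values_[z] += _rhs[z];") ==
    "values_[0] += _rhs[0];\nvalues_[1] += _rhs[1];\n");

// Hypothetical small vector type; name and layout are assumptions.
struct Vector(size_t N)
{
    float[N] values_ = 0;

    float opIndex(size_t i) const { return values_[i]; }

    // D2 equivalent of opAddAssign.
    void opOpAssign(string op)(ref const Vector _rhs) if (op == "+")
    {
        // Expands to N straight-line statements, no runtime loop.
        mixin(unrollText(N, "values_[z] += _rhs[z];"));
    }
}

void main()
{
    auto a = Vector!3([1.0f, 2.0f, 3.0f]);
    auto b = Vector!3([10.0f, 20.0f, 30.0f]);
    a += b;
    assert(a.values_ == [11.0f, 22.0f, 33.0f]);
}
```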

Seems to work pretty well despite the braindead strategy of "replace 
every 'z' with the loop number".
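
The weak spot of single-letter substitution is any identifier that contains the letter: with idx='z', an expression such as "sizes_[z] = 0;" would come out as "si0es_[0] = 0;". One possible workaround, sketched here in D2, is to substitute a multi-character token instead. The helper name unrollToken and the "$i" token are arbitrary choices for this sketch, not something from the post.

```d
import std.array : replace;
import std.conv : to;

// Hypothetical variant of the helper: replaces a token rather than a letter.
// "$i" is an arbitrary placeholder that cannot appear in a D identifier.
string unrollToken(size_t n, string expr, string token = "$i")
{
    string result;
    foreach (i; 0 .. n)
        result ~= expr.replace(token, to!string(i)) ~ "\n";
    return result;
}

// The 'z' inside "sizes_" is left alone; only the token is substituted.
static assert(unrollToken(2, "sizes_[$i] = 0;") ==
    "sizes_[0] = 0;\nsizes_[1] = 0;\n");

void main()
{
    int[2] sizes_ = [5, 7];
    mixin(unrollToken(2, "sizes_[$i] = 0;"));
    assert(sizes_ == [0, 0]);
}
```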

I suspect this would improve performance significantly when using DMD 
since it can't inline anything with loops.

With D 2.0 and a "static foreach(i;N)" type of construct you could 
probably do this by just saying:
     static foreach(i;N) {
        values_[i] += _rhs.values_[i];
     }

I wish that were coming to D1.0.
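
For comparison, later D2 compilers did gain a static foreach statement (DMD 2.076 onward, as far as I know), written with a range rather than a bare count. A minimal sketch, assuming such a compiler and a hypothetical Vector layout:

```d
// Assumes a D2 compiler that supports `static foreach`.
// The Vector type here is a hypothetical stand-in, not code from the post.
struct Vector(size_t N)
{
    float[N] values_ = 0;

    // D2 equivalent of opAddAssign.
    void opOpAssign(string op)(ref const Vector _rhs) if (op == "+")
    {
        // Expanded at compile time into N separate statements.
        static foreach (i; 0 .. N)
        {
            values_[i] += _rhs.values_[i];
        }
    }
}

void main()
{
    auto a = Vector!2([1.0f, 2.0f]);
    auto b = Vector!2([3.0f, 4.0f]);
    a += b;
    assert(a.values_ == [4.0f, 6.0f]);
}
```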

--bb
Aug 29 2007
Daniel Keep <daniel.keep.lists gmail.com> writes:
I've done loop unrolling in a few places using Tuples and foreach.

template Tuple(T...) { alias T Tuple; }

template Range(uint n)
{
    static if( n == 0 )
        alias Tuple!() Range;
    else
        alias Tuple!(Range!(n-1), n-1) Range;
}

void copy_four(int[] src, int[] dst)
{
    foreach( i,_ ; Range!(4) )
        dst[i] = src[i];
}


Which *should* unroll the loop.  Note that I haven't checked the
assembly to make sure of this, but since it works when you have tuples
inside the loop, I'd assume that it would have to :)
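
A rough D2 rendering of the same trick, for anyone trying it on a current compiler. It assumes std.meta.AliasSeq in place of the hand-written Tuple template. The static assert does not prove anything about the generated assembly, but it does show that the index is a compile-time constant in each copy of the loop body.

```d
import std.meta : AliasSeq;

// D2 spelling of the Range template: Range!4 is the sequence (0, 1, 2, 3).
template Range(uint n)
{
    static if (n == 0)
        alias Range = AliasSeq!();
    else
        alias Range = AliasSeq!(Range!(n - 1), n - 1);
}

void copyFour(const int[] src, int[] dst)
{
    // foreach over a compile-time sequence is expanded once per element.
    foreach (i; Range!4)
    {
        static assert(i < 4); // only compiles because i is known statically
        dst[i] = src[i];
    }
}

void main()
{
    int[] src = [1, 2, 3, 4];
    auto dst = new int[4];
    copyFour(src, dst);
    assert(dst == [1, 2, 3, 4]);
}
```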

	-- Daniel
Aug 29 2007