digitalmars.D - A simple way to do compile time loop unrolling
- finalpatch (15/15) May 31 2013 Just want to share a new way I just discovered to do loop
- finalpatch (13/28) May 31 2013 Minor improvement:
- Piotr Szturmaj (5/19) May 31 2013 The advantage of foreach unrolling is that compiler can optimally choose...
- Marco Leise (14/16) May 31 2013 GDC once vectorized something for me, where I used a struct of
- Andrei Alexandrescu (4/18) May 31 2013 Hehe, first shot is always a trip isn't it. Welcome aboard.
- bearophile (5/6) May 31 2013 Better (some part of static foreach):
- Peter Alexander (5/20) May 31 2013 Remember that in D, most side-effect free functions can be run at
- Nick Sabalausky (5/8) May 31 2013 Dayamn! I knew CTFE had improved considerably over the last year or
- finalpatch (10/14) May 31 2013 Wow! That's so very cool! We can make it even nicer with
Just want to share a new way I just discovered to do loop unrolling.

    template Unroll(alias CODE, alias N)
    {
        static if (N == 1)
            enum Unroll = format(CODE, 0);
        else
            enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
    }

after that you can write stuff like

    mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));

and it gets expanded to

    v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];

I find this method simpler than with foreach() and a tuple range, and also faster because it's identical to hand unrolling.
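For anyone who wants to try the snippet above, here is a minimal self-contained sketch. The post does not show the surrounding type or any imports, so the `Vec3` struct, its `int[3] v` field and the `std.format` import are assumptions added for illustration only.

```d
import std.format : format;

// Recursive template from the post: concatenates CODE formatted with 0 .. N-1.
template Unroll(alias CODE, alias N)
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N - 1) ~ format(CODE, N - 1);
}

// Hypothetical 3-component vector, not part of the original post.
struct Vec3
{
    int[3] v;

    // Handles +=, -=, *= and so on; `op` is spliced into the generated code.
    ref Vec3 opOpAssign(string op)(Vec3 rhs)
    {
        mixin(Unroll!("v[%1$d]" ~ op ~ "=rhs.v[%1$d];", 3));
        return this;
    }
}

void main()
{
    // The generated string is known at compile time.
    static assert(Unroll!("v[%1$d]+=rhs.v[%1$d];", 3)
        == "v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];");

    auto a = Vec3([1, 2, 3]);
    a += Vec3([10, 20, 30]);
    assert(a.v == [11, 22, 33]);
}
```

Note that the recursion bottoms out at N == 1, so instantiating the template with N == 0 would not terminate cleanly.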
May 31 2013
Minor improvement:

    template Unroll(alias CODE, alias N, alias SEP="")
    {
        static if (N == 1)
            enum Unroll = format(CODE, 0);
        else
            enum Unroll = Unroll!(CODE, N-1, SEP)~SEP~format(CODE, N-1);
    }

So vector dot product can be unrolled like this:

    mixin(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+"));

which becomes:

    v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]

On Friday, 31 May 2013 at 14:06:19 UTC, finalpatch wrote:
> Just want to share a new way I just discovered to do loop unrolling.
>
>     template Unroll(alias CODE, alias N)
>     {
>         static if (N == 1)
>             enum Unroll = format(CODE, 0);
>         else
>             enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
>     }
>
> after that you can write stuff like
>
>     mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));
>
> and it gets expanded to
>
>     v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
>
> I find this method simpler than with foreach() and a tuple range, and also faster because it's identical to hand unrolling.
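A small runnable sketch of the separator variant used as an expression mixin. The `dot3` helper and the `int[3]` parameter types are hypothetical names chosen for this example; only the template itself comes from the post.

```d
import std.format : format;

// Separator-aware version from the post: SEP is placed between the pieces.
template Unroll(alias CODE, alias N, alias SEP = "")
{
    static if (N == 1)
        enum Unroll = format(CODE, 0);
    else
        enum Unroll = Unroll!(CODE, N - 1, SEP) ~ SEP ~ format(CODE, N - 1);
}

// Hypothetical helper: dot product of two 3-element arrays.
int dot3(const int[3] v1, const int[3] v2)
{
    // Expands to: v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]
    return mixin(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+"));
}

void main()
{
    static assert(Unroll!("v1[%1$d]*v2[%1$d]", 3, "+")
        == "v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]");
    assert(dot3([1, 2, 3], [4, 5, 6]) == 32); // 4 + 10 + 18
}
```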
May 31 2013
On 31.05.2013 16:06, finalpatch wrote:
> Just want to share a new way I just discovered to do loop unrolling.
>
>     template Unroll(alias CODE, alias N)
>     {
>         static if (N == 1)
>             enum Unroll = format(CODE, 0);
>         else
>             enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
>     }
>
> after that you can write stuff like
>
>     mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));
>
> and it gets expanded to
>
>     v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
>
> I find this method simpler than with foreach() and a tuple range, and also faster because it's identical to hand unrolling.

The advantage of foreach unrolling is that the compiler can optimally choose the unrolling depth, as different depths may be faster or slower on different CPU targets. It is also an opportunity to do loop vectorization. But I doubt that either is available in DMD; not sure about GDC and LDC.
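For comparison, the "foreach() and a tuple range" approach mentioned in the original post might look roughly like the sketch below. It uses `std.meta.AliasSeq` (the current Phobos name; the 2013 equivalent was `TypeTuple` in `std.typetuple`), and the `Vec3` struct is again a made-up example type. This only illustrates the compile-time foreach form and makes no claim about what any particular backend does with it.

```d
import std.meta : AliasSeq;

// Hypothetical 3-component vector used only for this illustration.
struct Vec3
{
    int[3] v;

    ref Vec3 opOpAssign(string op)(Vec3 rhs)
    {
        // foreach over a compile-time sequence: the body is instantiated
        // once per element, with i known at compile time in each copy.
        foreach (i; AliasSeq!(0, 1, 2))
        {
            mixin("v[i]" ~ op ~ "=rhs.v[i];");
        }
        return this;
    }
}

void main()
{
    auto a = Vec3([1, 2, 3]);
    a += Vec3([10, 20, 30]);
    assert(a.v == [11, 22, 33]);
}
```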
May 31 2013
On Fri, 31 May 2013 16:33:19 +0200, Piotr Szturmaj <bncrbme jadamspam.pl> wrote:
> It is also an opportunity to do loop vectorization. But I doubt that either is available in DMD, not sure about GDC and LDC.

GDC once vectorized something for me, where I used a struct of 4 ubyte fields. I don't remember if it was a loop at all. I think all I did was operate on 3 of the fields in sequence, applying the same operations, and the compiler loaded the whole struct into an SSE register, and it really paid off speed-wise!

But when you think about it, working with RGB or XYZW vectors is a common task in programming, so I can see why they put so much work into vectorization. The caveat is just that you have to remember to add a fourth dummy field to XYZ or RGB.

--
Marco
May 31 2013
On 5/31/13 10:06 AM, finalpatch wrote:
> Just want to share a new way I just discovered to do loop unrolling.
>
>     template Unroll(alias CODE, alias N)
>     {
>         static if (N == 1)
>             enum Unroll = format(CODE, 0);
>         else
>             enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
>     }
>
> after that you can write stuff like
>
>     mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));
>
> and it gets expanded to
>
>     v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
>
> I find this method simpler than with foreach() and a tuple range, and also faster because it's identical to hand unrolling.

Hehe, first shot is always a trip, isn't it? Welcome aboard.

We should have something like that in Phobos.

Andrei
May 31 2013
Andrei Alexandrescu:
> We should have something like that in phobos.

Better (some part of static foreach):
http://d.puremagic.com/issues/show_bug.cgi?id=4085

Bye,
bearophile
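As a point of reference, D did later gain a `static foreach` statement, which postdates this thread. A hedged sketch of the same unrolling written with it follows; the `Vec3` struct is a hypothetical example type, and nothing here is taken from the linked issue.

```d
// Hypothetical 3-component vector used only for this illustration.
struct Vec3
{
    int[3] v;

    ref Vec3 opOpAssign(string op)(Vec3 rhs)
    {
        // static foreach expands its body once per value of i at compile time.
        static foreach (i; 0 .. 3)
        {
            mixin("v[i]" ~ op ~ "=rhs.v[i];");
        }
        return this;
    }
}

void main()
{
    auto a = Vec3([5, 7, 9]);
    a -= Vec3([1, 2, 3]);
    assert(a.v == [4, 5, 6]);
}
```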
May 31 2013
On Friday, 31 May 2013 at 14:06:19 UTC, finalpatch wrote:
> Just want to share a new way I just discovered to do loop unrolling.
>
>     template Unroll(alias CODE, alias N)
>     {
>         static if (N == 1)
>             enum Unroll = format(CODE, 0);
>         else
>             enum Unroll = Unroll!(CODE, N-1)~format(CODE, N-1);
>     }
>
> after that you can write stuff like
>
>     mixin(Unroll!("v[%1$d]"~op~"=rhs.v[%1$d];", 3));
>
> and it gets expanded to
>
>     v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];
>
> I find this method simpler than with foreach() and a tuple range, and also faster because it's identical to hand unrolling.

Remember that in D, most side-effect free functions can be run at compile time. No need for recursive template trickery:

    mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());
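The one-liner above needs a few Phobos imports to compile on its own. A minimal sketch, assuming a hypothetical `Vec3` struct with an `add` method (neither appears in the post), and storing the generated string in an `enum` so it can be checked before being mixed in:

```d
import std.algorithm : map;
import std.array : join;
import std.format : format;
import std.range : iota;

// Evaluated entirely at compile time (CTFE) because it initialises an enum.
enum code = iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join();

static assert(code == "v[0]+=rhs.v[0];v[1]+=rhs.v[1];v[2]+=rhs.v[2];");

// Hypothetical 3-component vector used only for this illustration.
struct Vec3
{
    int[3] v;

    void add(Vec3 rhs)
    {
        mixin(code); // pastes the three generated statements here
    }
}

void main()
{
    auto a = Vec3([1, 2, 3]);
    a.add(Vec3([10, 20, 30]));
    assert(a.v == [11, 22, 33]);
}
```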
May 31 2013
On Fri, 31 May 2013 19:30:10 +0200, "Peter Alexander" <peter.alexander.au gmail.com> wrote:
>     mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());

Dayamn! I knew CTFE had improved considerably over the last year or so, but even I didn't expect something like that to be working already. That's crazy! :)
May 31 2013
Wow! That's so very cool! We can make it even nicer with

    template Unroll(alias CODE, alias N, alias SEP="")
    {
        enum t = replace(CODE, "%", "%1$d");
        enum Unroll = iota(N).map!(i => format(t, i)).join(SEP);
    }

And use % as the placeholder instead of the ugly %1$d:

    mixin(Unroll!("v1[%]*v2[%]", 3, "+"));

It actually gets quite readable now.

On Friday, 31 May 2013 at 17:30:13 UTC, Peter Alexander wrote:
> Remember that in D, most side-effect free functions can be run at compile time. No need for recursive template trickery:
>
>     mixin(iota(3).map!(i => format("v[%1$d]+=rhs.v[%1$d];", i)).join());
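Here is a self-contained sketch of this last version with the imports it appears to need. The `dot3` function and its `int[3]` parameters are hypothetical and only serve to exercise the template.

```d
import std.algorithm : map;
import std.array : join, replace;
import std.format : format;
import std.range : iota;

// Range-based version from the post: '%' in CODE stands for the index.
template Unroll(alias CODE, alias N, alias SEP = "")
{
    // Turn every '%' into a positional format spec so one argument fills all slots.
    enum t = replace(CODE, "%", "%1$d");
    enum Unroll = iota(N).map!(i => format(t, i)).join(SEP);
}

// Hypothetical helper: dot product of two 3-element arrays.
int dot3(const int[3] v1, const int[3] v2)
{
    return mixin(Unroll!("v1[%]*v2[%]", 3, "+"));
}

void main()
{
    static assert(Unroll!("v1[%]*v2[%]", 3, "+")
        == "v1[0]*v2[0]+v1[1]*v2[1]+v1[2]*v2[2]");
    assert(dot3([1, 2, 3], [4, 5, 6]) == 32); // 4 + 10 + 18
}
```

One thing to watch for: since every `%` is replaced, code that uses `%` as the modulo operator cannot be passed through this version unchanged.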
May 31 2013