www.digitalmars.com         C & C++   DMDScript  

digitalmars.D - Template performance

reply bearophile <bearophileHUGS lycos.com> writes:
Time ago I have found a little C++ program that computes the number of
solutions to the N Queens problem at compile-time using just templates. I have
translated it into CTFE code and I have shown in this newsgroup that if your
purpose is to compute a result at compile-time, then D CTFE allows you to write
much simpler (and faster) code compared to the C++ template mataprogramming.

This time I have translated that C++ code to D code that uses just templates.
This is not idiomatic D code, because for this purpose CTFE is better, but
templates are used in D too, so it may be a performance benchmark for templates
in general.

I am not very good with C++ templates yet, so if you spot an error in my D
translation please tell me that I will redo the timings.

Compilation time: G++ 0.96 seconds, dmd 12.4 seconds.

With N=7 G++ uses about 34 MB RAM, DMD about 130+ MB RAM.

I have used MinGW 4.5.1 and DMD 2.050.

Compilation:
g++ nqueens_cpp.cpp -o nqueens_cpp
dmd nqueens_d.d

The D code is nicer and shorter.

GCC 4.5 has optimized the template compilation, and from the benchmark it seems
faster:
http://cpptruths.blogspot.com/2010/03/faster-meta-programs-using-gcc-45-and.html

Is a similar optimization (using a hash) already present in DMD and is it
useful?

-----------------------------

// G++ 4.5.1 code
#include "stdio.h"

typedef unsigned long long uint64_t;

template <uint64_t Cols = 0, uint64_t Diag135 = 0, uint64_t Diag45 = 0,
uint64_t Solution = 0>
struct state {
    static uint64_t const cols = Cols;
    static uint64_t const diag135 = Diag135;
    static uint64_t const diag45 = Diag45;
    static uint64_t const solution = Solution;
};

template <int K, int J, typename State>
struct test {
    static bool const result = ((State::cols    & (1ull << J)) +
                                (State::diag135 & (1ull << (J + K))) +
                                (State::diag45  & (1ull << (32 + J - K)))) == 0;
};

template <int K, int J, typename State>
struct mark {
    typedef state <
                State::cols    ^ (1ull << J),
                State::diag135 ^ (1ull << (J + K)),
                State::diag45  ^ (1ull << (32 + J - K)),
                State::solution
            > state_type;
};

template <int StartValue, int Times, typename State>
struct accumulate_result {
    typedef typename
        accumulate_result <
            StartValue + 1,
            Times - 1,
            state <
                State::cols,
                State::diag135,
                State::diag45,
                State::solution + test<0, StartValue, State>::result
            >
        >::state_type state_type;
};

template <int StartValue, typename State>
struct accumulate_result<StartValue, 0, State> {
    typedef State state_type;
};

template <int Niv, int Dx, typename State>
struct solve;

template <bool Condition, typename State, int Current, int Niv, int Dx>
struct result_from_test {
    typedef typename
        mark <
            Niv,
            Current,
            typename solve <
                Niv - 1,
                Dx,
                typename mark<Niv, Current, State>::state_type
            >::state_type
        >::state_type state_type;
};

template <typename State, int Current, int Niv, int Dx>
struct result_from_test<false, State, Current, Niv, Dx> {
    typedef State state_type;
};

template <int Begin, int Times, typename State, int Niv, int Dx>
struct process_queens {
    typedef typename
        process_queens <
            Begin + 1,
            Times - 1,
            typename result_from_test <
                test<Niv, Begin, State>::result,
                State,
                Begin,
                Niv,
                Dx
            >::state_type,
            Niv,
            Dx
        >::state_type state_type;
};

template <int Begin, typename State, int Niv, int Dx>
struct process_queens<Begin, 0, State, Niv, Dx> {
    typedef State state_type;
};

template <int Niv, int Dx, typename State = state<> >
struct solve {
    typedef typename
        process_queens<0, Dx, State, Niv, Dx>::state_type state_type;
    static uint64_t const result = state_type::solution;
};

template <int Dx, typename State>
struct solve<0, Dx, State> {
    typedef typename
        accumulate_result<0, Dx, State>::state_type state_type;
    static uint64_t const result = state_type::solution;
};

template <int Dx>
struct meta_queens:
    solve<Dx - 1, Dx> {};

int main() {
    printf("%llu", meta_queens<7>::result);
}

--------------------------------

// D2 code
import core.stdc.stdio: printf;

template State(ulong Cols=0, ulong Diag135=0, ulong Diag45=0, ulong Solution=0)
{
    enum ulong cols = Cols;
    enum ulong diag135 = Diag135;
    enum ulong diag45 = Diag45;
    enum ulong solution = Solution;
}

template Test(int k, int j, alias state) {
    enum bool Test = ((state.cols    & (1UL << j)) +
                      (state.diag135 & (1UL << (j + k))) +
                      (state.diag45  & (1UL << (32 + j - k)))) == 0;
}

template Mark(int k, int j, alias state) {
    alias State!(state.cols    ^ (1UL << j),
                 state.diag135 ^ (1UL << (j + k)),
                 state.diag45  ^ (1UL << (32 + j - k)),
                 state.solution
                ) Mark;
}

template AccumulateResult(int startValue, int times, alias state) {
    static if (times == 0)
        alias state AccumulateResult;
    else
        alias AccumulateResult!(startValue + 1,
                                times - 1,
                                State!(state.cols,
                                       state.diag135,
                                       state.diag45,
                                       state.solution + Test!(0, startValue,
state))
                               ) AccumulateResult;
}

template ResultFromTest(bool condition, alias state, int current, int niv, int
dx) {
    static if (condition == 0)
        alias state ResultFromTest;
    else
        alias Mark!(niv,
                    current,
                    Solve!(niv - 1,
                           dx,
                           Mark!(niv, current, state))
                   ) ResultFromTest;
}

template ProcessQueens(int begin, int times, alias state, int niv, int dx) {
    static if (times == 0)
        alias state ProcessQueens;
    else
        alias ProcessQueens!(begin + 1,
                             times - 1,
                             ResultFromTest!(Test!(niv, begin, state), state,
begin, niv, dx),
                             niv,
                             dx
                            ) ProcessQueens;
}

template Solve(int niv, int dx, alias state=State!()) {
    static if (niv == 0)
        alias AccumulateResult!(0, dx, state) Solve;
    else
        alias ProcessQueens!(0, dx, state, niv, dx) Solve;
}

template MetaNQueens(int dx) {
    enum ulong MetaNQueens = Solve!(dx - 1, dx).solution;
}

void main() {
    printf("%llu", MetaNQueens!(7));
}

-----------------------------

Bye,
bearophile
Nov 22 2010
next sibling parent reply "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Mon, 22 Nov 2010 18:23:37 -0500, bearophile <bearophileHUGS lycos.com>  
wrote:

 Time ago I have found a little C++ program that computes the number of  
 solutions to the N Queens problem at compile-time using just templates.  
 I have translated it into CTFE code and I have shown in this newsgroup  
 that if your purpose is to compute a result at compile-time, then D CTFE  
 allows you to write much simpler (and faster) code compared to the C++  
 template mataprogramming.

 This time I have translated that C++ code to D code that uses just  
 templates. This is not idiomatic D code, because for this purpose CTFE  
 is better, but templates are used in D too, so it may be a performance  
 benchmark for templates in general.

 I am not very good with C++ templates yet, so if you spot an error in my  
 D translation please tell me that I will redo the timings.

 Compilation time: G++ 0.96 seconds, dmd 12.4 seconds.
This might be due to a bug I reported how DMD is very slow when dealing with lots of templates. See http://d.puremagic.com/issues/show_bug.cgi?id=4900
Nov 23 2010
next sibling parent reply bearophile <bearophileHUGS lycos.com> writes:
Steven Schveighoffer:

 This might be due to a bug I reported how DMD is very slow when dealing  
 with lots of templates.
 
 See http://d.puremagic.com/issues/show_bug.cgi?id=4900
Right, it may be the same cause. Are you able & willing to perform a similar profiling of DMD while it compiles that little template-nqueen D bench? (How do you profile the DMD compiler?) Bye, bearophile
Nov 23 2010
parent "Steven Schveighoffer" <schveiguy yahoo.com> writes:
On Tue, 23 Nov 2010 13:27:22 -0500, bearophile <bearophileHUGS lycos.com>  
wrote:

 Steven Schveighoffer:

 This might be due to a bug I reported how DMD is very slow when dealing
 with lots of templates.

 See http://d.puremagic.com/issues/show_bug.cgi?id=4900
Right, it may be the same cause. Are you able & willing to perform a similar profiling of DMD while it compiles that little template-nqueen D bench? (How do you profile the DMD compiler?)
basically, I downloaded the svn source for dmd, then edited the makefile to add the profile flags (I think they are commented out), and finally run the profiler. I've run into cases where I thought I did it right and I really didn't, so it's not an easy thing. I can't remember off the top of my head what they are, but something like -gp? Remember however that the default build of dmd from svn is debug mode, so if you profile that, you will likely get the wrong results. -Steve
Nov 23 2010
prev sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Tuesday, November 23, 2010 08:43:20 Steven Schveighoffer wrote:
 On Mon, 22 Nov 2010 18:23:37 -0500, bearophile <bearophileHUGS lycos.com>
 
 wrote:
 Time ago I have found a little C++ program that computes the number of
 solutions to the N Queens problem at compile-time using just templates.
 I have translated it into CTFE code and I have shown in this newsgroup
 that if your purpose is to compute a result at compile-time, then D CTFE
 allows you to write much simpler (and faster) code compared to the C++
 template mataprogramming.
 
 This time I have translated that C++ code to D code that uses just
 templates. This is not idiomatic D code, because for this purpose CTFE
 is better, but templates are used in D too, so it may be a performance
 benchmark for templates in general.
 
 I am not very good with C++ templates yet, so if you spot an error in my
 D translation please tell me that I will redo the timings.
 
 Compilation time: G++ 0.96 seconds, dmd 12.4 seconds.
This might be due to a bug I reported how DMD is very slow when dealing with lots of templates. See http://d.puremagic.com/issues/show_bug.cgi?id=4900
There are also definitely memory issues with compiling templates: http://d.puremagic.com/issues/show_bug.cgi?id=4984 It's enough, for instance, that you can't use very many arguments to a single call to find(), or the compiler will run out of memory. I expect that it plays a definite part in the slow template compilation. - Jonathan M Davis
Nov 23 2010
prev sibling next sibling parent Stanislav Blinov <blinov loniir.ru> writes:
23.11.2010 2:23, bearophile wrote:
 Time ago I have found a little C++ program that computes the number of
solutions to the N Queens problem at compile-time using just templates. I have
translated it into CTFE code and I have shown in this newsgroup that if your
purpose is to compute a result at compile-time, then D CTFE allows you to write
much simpler (and faster) code compared to the C++ template mataprogramming.

 This time I have translated that C++ code to D code that uses just templates.
This is not idiomatic D code, because for this purpose CTFE is better, but
templates are used in D too, so it may be a performance benchmark for templates
in general.

 I am not very good with C++ templates yet, so if you spot an error in my D
translation please tell me that I will redo the timings.

 Compilation time: G++ 0.96 seconds, dmd 12.4 seconds.

 With N=7 G++ uses about 34 MB RAM, DMD about 130+ MB RAM.

 I have used MinGW 4.5.1 and DMD 2.050.
It'd be good if you also posted the machine specs. I ran the benchmark on Intel i5 750 with 8Gb RAM under Windows7 Pro x64. I don't have G++ right now, but I tried cl from Visual Studio 2008 and dmc 8.42n instead Here are the timings (compiler-version-seconds): dmd 2.050 2.97958 cl 15.00.30729.01 3.65038 dmc 8.42n 11.6375 So at least on this system dmd doesn't look that bad at all.
Nov 23 2010
prev sibling parent Emil Madsen <sovende gmail.com> writes:
2010/11/23 Stanislav Blinov <blinov loniir.ru>

 23.11.2010 2:23, bearophile wrote:

 Time ago I have found a little C++ program that computes the number of
 solutions to the N Queens problem at compile-time using just templates. I
 have translated it into CTFE code and I have shown in this newsgroup that if
 your purpose is to compute a result at compile-time, then D CTFE allows you
 to write much simpler (and faster) code compared to the C++ template
 mataprogramming.

 This time I have translated that C++ code to D code that uses just
 templates. This is not idiomatic D code, because for this purpose CTFE is
 better, but templates are used in D too, so it may be a performance
 benchmark for templates in general.

 I am not very good with C++ templates yet, so if you spot an error in my D
 translation please tell me that I will redo the timings.

 Compilation time: G++ 0.96 seconds, dmd 12.4 seconds.

 With N=7 G++ uses about 34 MB RAM, DMD about 130+ MB RAM.

 I have used MinGW 4.5.1 and DMD 2.050.
It'd be good if you also posted the machine specs. I ran the benchmark on Intel i5 750 with 8Gb RAM under Windows7 Pro x64. I don't have G++ right now, but I tried cl from Visual Studio 2008 and dmc 8.42n instead Here are the timings (compiler-version-seconds): dmd 2.050 2.97958 cl 15.00.30729.01 3.65038 dmc 8.42n 11.6375 So at least on this system dmd doesn't look that bad at all.
I'm getting: MinGW G++ (source code modified): 744 miliseconds MinGW G++ (usual version): 924 miliseconds Dmd: 4275 miliseconds (i7 720QM, 4GB ram) -- // Yours sincerely // Emil 'Skeen' Madsen
Nov 23 2010