digitalmars.D - Code runtime
- Ausprobierer (71/71) Jun 05 2016 I've written this simple piece of code:
- ag0aep6g (3/4) Jun 05 2016 Misses -O for actual optimizations. You can also try ldc or gdc for
- Ausprobierer (2/6) Jun 05 2016 Still a runtime of approx. 36 seconds.
- ag0aep6g (6/13) Jun 05 2016 When I compare `ldc2 -release -O` against `g++ -O`, I get 13 seconds and...
- Donald Drumpf (6/18) Jun 05 2016 How many times did you run each? A single run isn't ever really
- Ausprobierer (22/42) Jun 05 2016 With
- Donald Drumpf (2/25) Jun 05 2016 Well that's embarrassing.
- extrawurst (9/80) Jun 05 2016 I never used those methods neither in D nor in C++ but I have a
- Ausprobierer (14/17) Jun 05 2016 I think you're right, it probably is the nextPermutation()
- H. S. Teoh via Digitalmars-d (7/10) Jun 05 2016 I wrote nextPermutation. It does not allocate.
- Andrei Alexandrescu (14/21) Jun 05 2016 FWIW:
- Vladimir Panteleev (3/4) Jun 06 2016 FWIW, there's also this:
- Seb (6/11) Jun 06 2016 and mir.combinatorics
- John Colvin (8/79) Jun 05 2016 On OS X using ldc and clang, I get C++: ~6s and D: ~7s. The
- Ausprobierer (3/10) Jun 05 2016 I've compiled and run the code on Win7 x64. I think it's a
- Johan Engelen (6/10) Jun 06 2016 +1 !
- Max Samukha (2/3) Jun 06 2016 Note that 'long' in D is 64 bits, whereas in MSVC it is 32 bits.
I've written this simple piece of code: [CODE] import std.algorithm; import std.array; import std.conv; import std.datetime; import std.parallelism; import std.range; import std.stdio; import core.atomic; import core.thread; void main() { immutable long n = 11; long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0]; long j = 0; long l = z.length; StopWatch w; w.reset(); w.start; do { if (z[0] == 0) break; else { long m = 0; for (long i = 0; i < z.length; i += 2) { m += (z[i]*10 + z[i+1]); } if (m % n == 0) j++; } } while (nextPermutation(z)); w.stop(); j.writeln; w.peek().msecs.writeln; } [/CODE] I've compiled it with "dmd -m64 -release -inline <filename>.d". Output is 58679513, and it takes around 52 seconds to run. Then I've replicated this code in MSVS2015 C++: [CODE] #include <algorithm> #include <iostream> #include <ctime> using namespace std; const long N = 11; long Z[] = { 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 0, 0 }; long L = sizeof(Z) / sizeof(Z[0]); int main(void) { long j = 0; clock_t t = clock(); do { if (Z[0] == 0) break; long m = 0; for (long i = 0; i < L; i += 2) { m += (Z[i] * 10 + Z[i + 1]); } if (m % N == 0) { j++; } } while (next_permutation(Z, Z+L)); t = clock() - t; cout << j << endl; cout << (float(t)) / CLOCKS_PER_SEC << endl; return 0; } [/CODE] Compile mode is also Release/x64. I've run the program and was eagerly surprised, not to say stunned. The output is also 58679513, but it's done in under 6 seconds! Care to explain? I haven't found a solution to this :(
Jun 05 2016
On 06/05/2016 11:52 PM, Ausprobierer wrote:I've compiled it with "dmd -m64 -release -inline <filename>.d".Misses -O for actual optimizations. You can also try ldc or gdc for better performance.
Jun 05 2016
On Sunday, 5 June 2016 at 21:56:30 UTC, ag0aep6g wrote:On 06/05/2016 11:52 PM, Ausprobierer wrote:Still a runtime of approx. 36 seconds.I've compiled it with "dmd -m64 -release -inline <filename>.d".Misses -O for actual optimizations. You can also try ldc or gdc for better performance.
Jun 05 2016
On 06/06/2016 12:00 AM, Ausprobierer wrote:On Sunday, 5 June 2016 at 21:56:30 UTC, ag0aep6g wrote:When I compare `ldc2 -release -O` against `g++ -O`, I get 13 seconds and 9 seconds. Still slower, but it's in the same ballpark. `dmd -release -O -inline` does significantly worse: 40 seconds. dmd's strength are quick compilations, not fast executables (not that the two are mutually exclusive).On 06/05/2016 11:52 PM, Ausprobierer wrote:Still a runtime of approx. 36 seconds.I've compiled it with "dmd -m64 -release -inline <filename>.d".Misses -O for actual optimizations. You can also try ldc or gdc for better performance.
Jun 05 2016
On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:I've written this simple piece of code: [CODE] import std.algorithm; import std.array; import std.conv; import std.datetime; import std.parallelism; import std.range; import std.stdio; import core.atomic; import core.thread; [...]How many times did you run each? A single run isn't ever really useful. Also, not sure if this would account for the drastic difference, but declare long Z[] in global scope over in local scope might have some impact. If I wrote this, it would be faster, I guarantee it.
Jun 05 2016
On Sunday, 5 June 2016 at 21:58:46 UTC, Donald Drumpf wrote:On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:With [CODE] import std.algorithm; import std.array; import std.conv; import std.datetime; import std.parallelism; import std.range; import std.stdio; import core.atomic; import core.thread; long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0]; void main() { immutable long n = 11; long l = z.length; long j = 0; [...] [/CODE] and "dmd -m64 -O -inline -release <filename>.d" I still get 40 seconds runtime.I've written this simple piece of code: [CODE] import std.algorithm; import std.array; import std.conv; import std.datetime; import std.parallelism; import std.range; import std.stdio; import core.atomic; import core.thread; [...]How many times did you run each? A single run isn't ever really useful. Also, not sure if this would account for the drastic difference, but declare long Z[] in global scope over in local scope might have some impact. If I wrote this, it would be faster, I guarantee it.
Jun 05 2016
On Sunday, 5 June 2016 at 22:08:21 UTC, Ausprobierer wrote:On Sunday, 5 June 2016 at 21:58:46 UTC, Donald Drumpf wrote:Well that's embarrassing.[...]With [CODE] import std.algorithm; import std.array; import std.conv; import std.datetime; import std.parallelism; import std.range; import std.stdio; import core.atomic; import core.thread; long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0]; void main() { immutable long n = 11; long l = z.length; long j = 0; [...] [/CODE] and "dmd -m64 -O -inline -release <filename>.d" I still get 40 seconds runtime.
Jun 05 2016
On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:I've written this simple piece of code: [CODE] import std.algorithm; import std.array; import std.conv; import std.datetime; import std.parallelism; import std.range; import std.stdio; import core.atomic; import core.thread; void main() { immutable long n = 11; long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0]; long j = 0; long l = z.length; StopWatch w; w.reset(); w.start; do { if (z[0] == 0) break; else { long m = 0; for (long i = 0; i < z.length; i += 2) { m += (z[i]*10 + z[i+1]); } if (m % n == 0) j++; } } while (nextPermutation(z)); w.stop(); j.writeln; w.peek().msecs.writeln; } [/CODE] I've compiled it with "dmd -m64 -release -inline <filename>.d". Output is 58679513, and it takes around 52 seconds to run. Then I've replicated this code in MSVS2015 C++: [CODE] #include <algorithm> #include <iostream> #include <ctime> using namespace std; const long N = 11; long Z[] = { 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 0, 0 }; long L = sizeof(Z) / sizeof(Z[0]); int main(void) { long j = 0; clock_t t = clock(); do { if (Z[0] == 0) break; long m = 0; for (long i = 0; i < L; i += 2) { m += (Z[i] * 10 + Z[i + 1]); } if (m % N == 0) { j++; } } while (next_permutation(Z, Z+L)); t = clock() - t; cout << j << endl; cout << (float(t)) / CLOCKS_PER_SEC << endl; return 0; } [/CODE] Compile mode is also Release/x64. I've run the program and was eagerly surprised, not to say stunned. The output is also 58679513, but it's done in under 6 seconds! Care to explain? I haven't found a solution to this :(I never used those methods neither in D nor in C++ but I have a gut feeling it is the nextPermutation method in phobos, allocating internally (it is not nogc at least) but you can compare it easily to c++: https://github.com/dlang/phobos/blob/cdc8218e9840353f6ed79f00a63280c3002b198f/std/algorithm/sorting.d#L2786 could be less noise if you benchmark just calling this method vs. in C++ and even dig into the disassembly difference there. --stephan
Jun 05 2016
I never used those methods neither in D nor in C++ but I have a gut feeling it is the nextPermutation method in phobos, [...] --stephanI think you're right, it probably is the nextPermutation() function: <imports> long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0]; void main() { do { if (z[0] == 0) break; } while (nextPermutation(z)); } ...takes approx. 31 seconds, compiled with "-m64 -O -release -inline".
Jun 05 2016
On Sun, Jun 05, 2016 at 10:29:09PM +0000, extrawurst via Digitalmars-d wrote: [...]I never used those methods neither in D nor in C++ but I have a gut feeling it is the nextPermutation method in phobos, allocating internallyI wrote nextPermutation. It does not allocate. For performance, I recommend using gdc or ldc. It is not dmd's strength. T -- If blunt statements had a point, they wouldn't be blunt...
Jun 05 2016
On 6/6/16 1:12 AM, H. S. Teoh via Digitalmars-d wrote:On Sun, Jun 05, 2016 at 10:29:09PM +0000, extrawurst via Digitalmars-d wrote: [...]FWIW: * gdc should be used when benchmarking to ensure comparable backends. * Just looked at std.permutation (https://github.com/dlang/phobos/blob/cdc8218e9840353f6ed79f00a63280c3002b198f/std/algorit m/sorting.d#L2786), looks good but operates on complex compound ranges so it puts more pressure on the optimizer. Compare to C++'s std::next_permutation (http://stackoverflow.com/questions/11483060/stdnext-permutation-impleme tation-explanation) which directly bumps iterators etc. We also have trouble because of ranges' much discussed difficulties with iterators pointing somewhere inside them (e.g. we need an extra counter n). A specialization of nextPermutation for random-access ranges should pay off here. AndreiI never used those methods neither in D nor in C++ but I have a gut feeling it is the nextPermutation method in phobos, allocating internallyI wrote nextPermutation. It does not allocate. For performance, I recommend using gdc or ldc. It is not dmd's strength.
Jun 05 2016
On Monday, 6 June 2016 at 06:25:19 UTC, Andrei Alexandrescu wrote:* Just looked at std.permutationFWIW, there's also this: https://dlang.org/library/std/algorithm/iteration/permutations.html
Jun 06 2016
On Monday, 6 June 2016 at 10:44:10 UTC, Vladimir Panteleev wrote:On Monday, 6 June 2016 at 06:25:19 UTC, Andrei Alexandrescu wrote:and mir.combinatorics http://docs.mir.dlang.io/latest/mir_combinatorics.html if you are searching for more algorithms with range support. For all of these you can use the new Allocator API to allocate the initial state.* Just looked at std.permutationFWIW, there's also this: https://dlang.org/library/std/algorithm/iteration/permutations.html
Jun 06 2016
On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:I've written this simple piece of code: [CODE] import std.algorithm; import std.array; import std.conv; import std.datetime; import std.parallelism; import std.range; import std.stdio; import core.atomic; import core.thread; void main() { immutable long n = 11; long[] z = [1,1,2,2,3,3,4,4,5,5,6,6,0,0]; long j = 0; long l = z.length; StopWatch w; w.reset(); w.start; do { if (z[0] == 0) break; else { long m = 0; for (long i = 0; i < z.length; i += 2) { m += (z[i]*10 + z[i+1]); } if (m % n == 0) j++; } } while (nextPermutation(z)); w.stop(); j.writeln; w.peek().msecs.writeln; } [/CODE] I've compiled it with "dmd -m64 -release -inline <filename>.d". Output is 58679513, and it takes around 52 seconds to run. Then I've replicated this code in MSVS2015 C++: [CODE] #include <algorithm> #include <iostream> #include <ctime> using namespace std; const long N = 11; long Z[] = { 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 0, 0 }; long L = sizeof(Z) / sizeof(Z[0]); int main(void) { long j = 0; clock_t t = clock(); do { if (Z[0] == 0) break; long m = 0; for (long i = 0; i < L; i += 2) { m += (Z[i] * 10 + Z[i + 1]); } if (m % N == 0) { j++; } } while (next_permutation(Z, Z+L)); t = clock() - t; cout << j << endl; cout << (float(t)) / CLOCKS_PER_SEC << endl; return 0; } [/CODE] Compile mode is also Release/x64. I've run the program and was eagerly surprised, not to say stunned. The output is also 58679513, but it's done in under 6 seconds! Care to explain? I haven't found a solution to this :(On OS X using ldc and clang, I get C++: ~6s and D: ~7s. The slowdown in D seems to be due to parts of nextPermutation not ending up inlined. Be careful with benchmarks like this, you are giving the compiler a lot more information than it usually has in any real world case (here it knows the exact values of all the input data/parameters!).
Jun 05 2016
On OS X using ldc and clang, I get C++: ~6s and D: ~7s. The slowdown in D seems to be due to parts of nextPermutation not ending up inlined. Be careful with benchmarks like this, you are giving the compiler a lot more information than it usually has in any real world case (here it knows the exact values of all the input data/parameters!).I've compiled and run the code on Win7 x64. I think it's a compiler issue then, with DMD causing the same slow behaviour on Linux x64. If so, I should possibly resort to e. g. LDC.
Jun 05 2016
On Sunday, 5 June 2016 at 22:34:47 UTC, John Colvin wrote:Be careful with benchmarks like this, you are giving the compiler a lot more information than it usually has in any real world case (here it knows the exact values of all the input data/parameters!).+1 ! For the input data, I think you could have a function return the data and mark that function with weak linkage so the compiler cannot reason about it nor inline it. -Johan
Jun 06 2016
On Sunday, 5 June 2016 at 21:52:20 UTC, Ausprobierer wrote:Then I've replicated this code in MSVS2015 C++:Note that 'long' in D is 64 bits, whereas in MSVC it is 32 bits.
Jun 06 2016