digitalmars.D - Reduce has dreadful performance?
- Russel Winder via Digitalmars-d (33/33) Jun 18 2015 On a given machine, the code:
- Laeeth Isharc (2/30) Jun 18 2015 which compiler ?
- Laeeth Isharc (10/46) Jun 18 2015 fwiw n=1bn, delta=0.1
- weaselcat (3/31) Jun 18 2015 first two run in the same time on LDC, third one runs about
- Ilya Yaroshenko (43/71) Jun 18 2015 import std.stdio, std.datetime, std.algorithm, std.range,
- Andrea Fontana (7/18) Jun 18 2015 What about this:
- Ilya Yaroshenko (3/22) Jun 18 2015 This is not equivalent, because sum uses another (more precise)
- Russel Winder via Digitalmars-d (23/26) Jun 18 2015 [=E2=80=A6]
- Laeeth Isharc (2/18) Jun 18 2015 python setup.py pydexe --compiler=ldc
- Laeeth Isharc (5/31) Jun 18 2015 GDC doesn't work anyway, I think, because of the shared library
- John Colvin (3/29) Jun 19 2015 Just a note: if you're using PydMagic you can pass --compiler=ldc
- Russel Winder via Digitalmars-d (14/17) Jun 19 2015 I now have this working and reduce is no longer an embarrassment. Also w...
- Walter Bright (8/12) Jun 18 2015 I expect that at the moment, range+algorithms code will likely be somewh...
- weaselcat (3/17) Jun 18 2015 ldc and gdc have no issue optimizing the range code(range code is
- Andrei Alexandrescu (2/22) Jun 18 2015 Russel, do you have numbers for ldc by any chance? Thx! -- Andrei
- Russel Winder via Digitalmars-d (20/23) Jun 19 2015 Single data point only, and I need to check this, but:
- weaselcat (3/17) Jun 20 2015 Interesting, I saw a bigger speedup from the ranges.
- Russel Winder via Digitalmars-d (17/22) Jun 19 2015 reduce already has achieved parity using LDC =E2=80=93 once you install ...
- Martin Nowak (3/7) Jun 20 2015 Don't compare performance numbers on dmd, particularly not when
- John Colvin (6/34) Jun 18 2015 What you've forgotten is to turn on your optimisation flags. Even
- weaselcat (2/9) Jun 18 2015 third one was faster for me on ldc, fwiw
On a given machine, the code:

double sequential_loop(const int n, const double delta) {
  auto sum = 0.0;
  foreach (immutable i; 1 .. n + 1) {
    immutable x = (i - 0.5) * delta;
    sum += 1.0 / (1.0 + x * x);
  }
  return 4.0 * delta * sum;
}

runs in about 6.70s. However the code:

double sequential_reduce(const int n, const double delta) {
  return 4.0 * delta * reduce!((double t, int i){
    immutable x = (i - 0.5) * delta;
    return t + 1.0 / (1.0 + x * x);
  })(0.0, iota(1, n + 1));
}

runs in about 17.03s, whilst:

double sequential_reduce_alt(const int n, const double delta) {
  return 4.0 * delta * reduce!"a + b"(
    map!((int i){
      immutable x = (i - 0.5) * delta;
      return 1.0 / (1.0 + x * x);
    })(iota(1, n + 1)));
}

takes about 28.02s. Unless I am missing something (very possible), this is not going to create a good advert for D as an imperative language with declarative (internal iteration) expression.

--
Russel.
===========================================================================
Dr Russel Winder      t: +44 20 7585 2200   voip: sip:russel.winder ekiga.net
41 Buckmaster Road    m: +44 7770 465 077   xmpp: russel winder.org.uk
London SW11 1EN, UK   w: www.russel.org.uk  skype: russel_winder
Jun 18 2015
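[Editorial note: with delta = 1/n these functions compute the midpoint-rule approximation of the integral of 4/(1+x^2) over [0, 1], which is pi. A quick Python sketch of the same computation, illustrative only, mirroring the D code above:]

```python
from functools import reduce


def sequential_loop(n, delta):
    # Midpoint-rule quadrature of 4/(1 + x*x); approximates pi when delta == 1/n.
    total = 0.0
    for i in range(1, n + 1):
        x = (i - 0.5) * delta
        total += 1.0 / (1.0 + x * x)
    return 4.0 * delta * total


def sequential_reduce(n, delta):
    # The same fold written with functools.reduce, mirroring the
    # std.algorithm.reduce version in the post.
    def step(t, i):
        x = (i - 0.5) * delta
        return t + 1.0 / (1.0 + x * x)

    return 4.0 * delta * reduce(step, range(1, n + 1), 0.0)
```

Both versions fold in the same order with the same operations, so they agree bit-for-bit; only the surface syntax differs.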
On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> [...]

which compiler ?
Jun 18 2015
On Thursday, 18 June 2015 at 10:34:46 UTC, Laeeth Isharc wrote:
> On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
>> [...]
> which compiler ?

fwiw n=1bn, delta=0.1

ldc no params: 8.749s/17.466s
ldc -O2:       4.541s/4.562s
gdc no params: 4.690/22.163 (!)
gdc -O2:       4.552/4.551
Jun 18 2015
On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> [...]

first two run in the same time on LDC, third one runs about 30-40% faster.
Jun 18 2015
On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> [...]

import std.stdio, std.datetime, std.algorithm, std.range, std.conv;

double sequential_loop(const int n, const double delta) {
  auto sum = 0.0;
  foreach (immutable i; 1 .. n + 1) {
    immutable x = (i - 0.5) * delta;
    sum += 1.0 / (1.0 + x * x);
  }
  return 4.0 * delta * sum;
}
// runs in about 6.70s. However the code:

double sequential_reduce(const int n, const double delta) {
  return 4.0 * delta * reduce!((double t, int i){
    immutable x = (i - 0.5) * delta;
    return t + 1.0 / (1.0 + x * x);
  })(0.0, iota(1, n + 1));
}
// runs in about 17.03s, whilst:

double sequential_reduce_alt(const int n, const double delta) {
  return 4.0 * delta * reduce!"a + b"(
    map!((int i){
      immutable x = (i - 0.5) * delta;
      return 1.0 / (1.0 + x * x);
    })(iota(1, n + 1)));
}

void main() {
  auto res = benchmark!(
    {sequential_loop(1000, 10);},
    {sequential_reduce(1000, 10);},
    {sequential_reduce_alt(1000, 10);},
  )(1000);
  writeln(res[].map!(to!Duration));
}

$ dmd -run test.d
[9 ms, 305 μs, and 9 hnsecs, 16 ms, 625 μs, and 3 hnsecs, 27 ms, 417 μs, and 3 hnsecs]

$ dmd -O -inline -release -run test.d
[4 ms, 567 μs, and 6 hnsecs, 4 ms, 853 μs, and 8 hnsecs, 5 ms, 52 μs, and 5 hnsecs]

OS X, DMD64 D Compiler v2.067.1

Looks like you forgot the optimisation troika "-O -inline -release".

Regards,
Ilya
Jun 18 2015
On Thursday, 18 June 2015 at 10:46:18 UTC, Ilya Yaroshenko wrote:
> [...]

What about this:

double sequential_alternative(const int n, const double delta) {
  return 4.0 * delta * iota(1, n + 1)
    .map!(x => (x - 0.5) * delta)
    .map!(x => 1.0 / (1.0 + x * x))
    .sum;
}

?
Jun 18 2015
On Thursday, 18 June 2015 at 10:50:09 UTC, Andrea Fontana wrote:
> [...]
> What about this:
>
> double sequential_alternative(const int n, const double delta) {
>   return 4.0 * delta * iota(1, n+1).map!(x => (x-0.5)*delta)
>     .map!(x => 1.0/(1.0 + x*x)).sum;
> }

This is not equivalent, because sum uses a different (more precise)
summation algorithm. It should work a little bit slower.
Jun 18 2015
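[Editorial note: the precision point is easy to see in miniature. A hedged Python sketch follows; Phobos's sum actually uses pairwise summation for random-access ranges, so math.fsum below merely stands in for "a more careful summation algorithm" and is not the Phobos implementation:]

```python
import math


def naive_sum(xs):
    # Left-to-right fold, like reduce!"a + b": one rounding error per addition.
    total = 0.0
    for x in xs:
        total += x
    return total


values = [0.1] * 10
print(naive_sum(values))   # 0.9999999999999999 -- accumulated rounding error
print(math.fsum(values))   # 1.0 -- error-compensated summation
```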
On Thu, 2015-06-18 at 10:46 +0000, Ilya Yaroshenko via Digitalmars-d wrote:
> […]
> Looks like you forgot optimisation troika "-O -inline -release"

Don't I feel stupid :-) Thanks to all for quick answers, most helpful.

Now I get

Loop:     3.14s
Reduce 1: 4.76s
Reduce 2: 5.12s

This is DMD 2.067

Anyone know how to get PyD to use LDC rather than DMD in the setuptools
build spec? (I can't use GDC just now as it is not packaged for Fedora.)

--
Russel.
Jun 18 2015
On Thursday, 18 June 2015 at 18:55:25 UTC, Russel Winder wrote:
> [...]
> Anyone know how to get PyD to use LDC rather than DMD in the
> setuptools build spec?

python setup.py pydexe --compiler=ldc
Jun 18 2015
On Thursday, 18 June 2015 at 20:18:31 UTC, Laeeth Isharc wrote:
> [...]
> python setup.py pydexe --compiler=ldc

GDC doesn't work anyway, I think, because of the shared library
problem. It would be great to have a dub setup so one can build in a
standard way without using Python.
Jun 18 2015
On Thursday, 18 June 2015 at 20:18:31 UTC, Laeeth Isharc wrote:
> [...]
> python setup.py pydexe --compiler=ldc

Just a note: if you're using PydMagic you can pass --compiler=ldc to %%pyd
Jun 19 2015
On Thu, 2015-06-18 at 20:18 +0000, Laeeth Isharc via Digitalmars-d wrote:
> […]
> python setup.py pydexe --compiler=ldc

I now have this working and reduce is no longer an embarrassment. Also
we have reaffirmed that LDC is the tool of choice and DMD unusable.
But this surprises no-one involved with CPU-bound activity I suspect.

--
Russel.
Jun 19 2015
On 6/18/2015 7:04 AM, Russel Winder via Digitalmars-d wrote:
> Now I get
> Loop:     3.14s
> Reduce 1: 4.76s
> Reduce 2: 5.12s

I expect that at the moment, range+algorithms code will likely be
somewhat slower than old-fashioned loops. This is because code
generators have been tuned for decades to do a great job with loops.

There's no intrinsic reason why ranges must do worse, so I expect
they'll achieve parity. Ranges can move ahead because they can reduce
the algorithmic complexity, whereas user-written loops tend to be
suboptimal.
Jun 18 2015
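[Editorial note: the internal-iteration style under discussion can be sketched with Python generator pipelines, which, like D's lazy ranges, fuse into a single pass with no intermediate arrays. This is an illustrative analogy, not D semantics:]

```python
def pi_pipeline(n, delta):
    # map -> map -> sum pipeline; each generator is consumed lazily,
    # so the whole chain runs as one pass over the index range,
    # analogous to a fused iota/map/reduce range pipeline in D.
    xs = ((i - 0.5) * delta for i in range(1, n + 1))
    terms = (1.0 / (1.0 + x * x) for x in xs)
    return 4.0 * delta * sum(terms)
```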
On Thursday, 18 June 2015 at 20:53:20 UTC, Walter Bright wrote:
> [...]
> There's no intrinsic reason why ranges must do worse, so I expect
> they'll achieve parity.

ldc and gdc have no issue optimizing the range code (range code is
actually faster with ldc)
Jun 18 2015
On 6/18/15 2:04 PM, weaselcat wrote:
> ldc and gdc have no issue optimizing the range code (range code is
> actually faster with ldc)

Russel, do you have numbers for ldc by any chance? Thx! -- Andrei
Jun 18 2015
On Thu, 2015-06-18 at 15:54 -0700, Andrei Alexandrescu via Digitalmars-d wrote:
> […]
> Russel, do you have numbers for ldc by any chance? Thx! -- Andrei

Single data point only, and I need to check this, but:

Loop:     1.44s
Reduce 1: 1.43s
Reduce 2: 1.42s

LDC seems to beat DMD into the ground. But then I think we all know
this ;-)

The PyD setuptools support appears to not properly set RPATH, which is
a bit annoying. Actually it is a lot annoying.

--
Russel.
Jun 19 2015
On Saturday, 20 June 2015 at 05:55:07 UTC, Russel Winder wrote:
> [...]
> Loop:     1.44s
> Reduce 1: 1.43s
> Reduce 2: 1.42s

Interesting, I saw a bigger speedup from the ranges.
Do you use windows? What LLVM/LDC version? flags?
Jun 20 2015
On Thu, 2015-06-18 at 13:53 -0700, Walter Bright via Digitalmars-d wrote:
> […]
> There's no intrinsic reason why ranges must do worse, so I expect
> they'll achieve parity.

reduce already has achieved parity using LDC – once you install it
properly, unlike what I had been doing.

Explicit loops are just so last millennium. :-)

PyData London 2015 here we come.

--
Russel.
Jun 19 2015
On Thursday, 18 June 2015 at 18:55:25 UTC, Russel Winder wrote:
> Loop:     3.14s
> Reduce 1: 4.76s
> Reduce 2: 5.12s
>
> This is DMD 2.067

Don't compare performance numbers on dmd, particularly not when
assessing abstraction overhead.
Jun 20 2015
On Thursday, 18 June 2015 at 10:27:58 UTC, Russel Winder wrote:
> [...]

What you've forgotten is to turn on your optimisation flags. Even DMD
gets the first 2 vaguely right with -O -inline -release

gdc is able to vectorize all 3 examples if you give it -Ofast

The 3rd one does appear to be a bit slower in all cases, not sure why
right now.
Jun 18 2015
On Thursday, 18 June 2015 at 13:20:04 UTC, John Colvin wrote:
> [...]
> The 3rd one does appear to be a bit slower in all cases, not sure
> why right now.

third one was faster for me on ldc, fwiw
Jun 18 2015