
digitalmars.D.learn - D outperformed by C++, what am I doing wrong?

reply amfvcg <hid d.en> writes:
Hi all,
I'm solving below task:

Given a container T and a value R, return the sums of consecutive 
R-element subranges of T. An 
example:
input : T=[1,1,1] R=2
output : [2, 1]

input : T=[1,2,3] R=1
output : [1,2,3]
(see dlang unittests for more examples)


The C++ code below, compiled with g++-5.4.0 -O2 -std=c++14, runs on my 
machine in 656 836 us.
The D code below, compiled with dmd v2.067.1 -O, runs on my machine in 
~14.5 sec.

Each language has its own "way of programming", and as I'm a 
beginner in D, I'm probably running through the bushes instead of 
taking the highway. Therefore I'd like to ask you, experienced dlang 
devs, to shed some light on "how to do it the dlang way".


C++ code:

#include <algorithm>
#include <chrono>
#include <iostream>
#include <iterator>
#include <list>
#include <map>
#include <string>
#include <utility>
#include <numeric>
#include <vector>


template<typename T, typename K>
std::vector<K> sum_elements(const T& beg, const T& end, 
std::size_t k, K def)
{
  if (k == 0) {
      return std::vector<K>{};
  }
  return sum_elements(beg, end, k, def, [](auto &l, auto &r){ 
return r+l;});
}

template<typename T, typename K, class BinaryOp>
std::vector<K>
     sum_elements(
             const T& beg,
             const T& end,
             std::size_t k,
             K def,
             BinaryOp op)
{
     std::vector<K> out;
     out.reserve((std::distance(beg, end) - 1)/k + 1);
     for (auto it = beg; it!=end; std::advance(it, 
std::min(static_cast<std::size_t>(std::distance(it, end)), k)))
     {
         out.push_back(std::accumulate(it, std::next(it, 
std::min(static_cast<std::size_t>(std::distance(it, end)), k)), 
def, op));
     }
     return out;
}

int main()
{
     std::vector<int> vec;
     auto size = 1000000;
     vec.reserve(size);
     for (int i=0; i < size; ++i)
         vec.push_back(i);
     auto beg = std::chrono::system_clock::now();
     auto sum = 0;
     for (int i=0; i < 100; i++)
         sum += sum_elements(vec.begin(), vec.end(), 2, 0).size();
     auto end = std::chrono::system_clock::now();
     std::cout << 
std::chrono::duration_cast<std::chrono::microseconds>(end-beg).count() <<
std::endl;
     std::cout << sum << std::endl;

     return sum;
}


D code:

import std.stdio : writeln;
import std.algorithm.comparison: min;
import std.algorithm.iteration: sum;
import core.time: MonoTime, Duration;


T[] sum_subranges(T)(T[] input, uint range)
{
     T[] result;
     if (range == 0)
     {
         return result;
     }
     for (uint i; i < input.length; i=min(i+range, input.length))
     {
         result ~= sum(input[i..min(i+range, input.length)]);
     }
     return result;
}

unittest
{
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
}


int main()
{
     int[1000000] v;
     for (int i=0; i < 1000000; ++i)
         v[i] = i;
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
}
Aug 12 2017
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 13/08/2017 7:09 AM, amfvcg wrote:
 Hi all,
 I'm solving below task:
 
 given container T and value R return sum of R-ranges over T. An example:
 input : T=[1,1,1] R=2
 output : [2, 1]
 
 input : T=[1,2,3] R=1
 output : [1,2,3]
 (see dlang unittests for more examples)
 
 
 Below c++ code compiled with g++-5.4.0 -O2 -std=c++14 runs on my machine 
 in 656 836 us.
 Below D code compiled with dmd v2.067.1 -O runs on my machine in ~ 14.5 
 sec.
 
 Each language has it's own "way of programming", and as I'm a beginner 
 in D - probably I'm running through bushes instead of highway. Therefore 
 I'd like to ask you, experienced dlang devs, to shed some light on "how 
 to do it dlang-way".
 
 
 C++ code:
 
 #include <algorithm>
 #include <chrono>
 #include <iostream>
 #include <iterator>
 #include <list>
 #include <map>
 #include <string>
 #include <utility>
 #include <numeric>
 #include <vector>
 
 
 template<typename T, typename K>
 std::vector<K> sum_elements(const T& beg, const T& end, std::size_t k, K 
 def)
 {
   if (k == 0) {
       return std::vector<K>{};
   }
   return sum_elements(beg, end, k, def, [](auto &l, auto &r){ return 
 r+l;});
 }
 
 template<typename T, typename K, class BinaryOp>
 std::vector<K>
      sum_elements(
              const T& beg,
              const T& end,
              std::size_t k,
              K def,
              BinaryOp op)
 {
      std::vector<K> out;
      out.reserve((std::distance(beg, end) - 1)/k + 1);
      for (auto it = beg; it!=end; std::advance(it, 
 std::min(static_cast<std::size_t>(std::distance(it, end)), k)))
      {
          out.push_back(std::accumulate(it, std::next(it, 
 std::min(static_cast<std::size_t>(std::distance(it, end)), k)), def, op));
      }
      return out;
 }
 
 int main()
 {
      std::vector<int> vec;
      auto size = 1000000;
      vec.reserve(size);
      for (int i=0; i < size; ++i)
          vec.push_back(i);
      auto beg = std::chrono::system_clock::now();
      auto sum = 0;
      for (int i=0; i < 100; i++)
          sum += sum_elements(vec.begin(), vec.end(), 2, 0).size();
      auto end = std::chrono::system_clock::now();
      std::cout << 
 std::chrono::duration_cast<std::chrono::microseconds>(end-beg).count() 
 << std::endl;
      std::cout << sum << std::endl;
 
      return sum;
 }
 
 
 D code:
 
 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;
 
 
 T[] sum_subranges(T)(T[] input, uint range)
 {
      T[] result;
      if (range == 0)
      {
          return result;
      }
      for (uint i; i < input.length; i=min(i+range, input.length))
      {
          result ~= sum(input[i..min(i+range, input.length)]);
      }
      return result;
 }
 
 unittest
 {
      assert(sum_subranges([1,1,1], 2) == [2, 1]);
      assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
      assert(sum_subranges([], 2) == []);
      assert(sum_subranges([1], 2) == [1]);
      assert(sum_subranges([1], 0) == []);
 }
 
 
 int main()
 {
      int[1000000] v;
      for (int i=0; i < 1000000; ++i)
          v[i] = i;
      int sum;
      MonoTime beg = MonoTime.currTime;
      for (int i=0; i < 100; i++)
          sum += cast(int)sum_subranges(v,2).length;
      MonoTime end = MonoTime.currTime;
      writeln(end-beg);
      writeln(sum);
      return sum;
 }
DMD compiles quickly, but doesn't produce very optimized code. Try ldc or gdc and get back to us about it ;)
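For example, invocations along these lines (flag spellings from memory - check your compiler version's docs; bench.d stands in for your source file):

$ dmd  -O -release -inline bench.d
$ ldc2 -O3 -release bench.d
$ gdc  -O3 -frelease -o bench bench.d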
Aug 12 2017
prev sibling next sibling parent Roman Hargrave <roman hargrave.info> writes:
On Sunday, 13 August 2017 at 06:09:39 UTC, amfvcg wrote:
 Hi all,
 I'm solving below task:

 given container T and value R return sum of R-ranges over T. An 
 example:
 input : T=[1,1,1] R=2
 output : [2, 1]

 input : T=[1,2,3] R=1
 output : [1,2,3]
 (see dlang unittests for more examples)


 Below c++ code compiled with g++-5.4.0 -O2 -std=c++14 runs on 
 my machine in 656 836 us.
 Below D code compiled with dmd v2.067.1 -O runs on my machine 
 in ~ 14.5 sec.
If I had to guess, this could be due to the backend and optimizer. I don't want to go into detail on my thoughts because I am not an expert on codegen optimization, but I might suggest running your test compiled with GDC (using the same optimization settings as G++) and with ldc2 using similar settings.
Aug 12 2017
prev sibling next sibling parent reply Neia Neutuladh <neia ikeran.org> writes:
On Sunday, 13 August 2017 at 06:09:39 UTC, amfvcg wrote:
 Hi all,
 I'm solving below task:
Well, for one thing, you are preallocating in C++ code but not in D. On my machine, your version of the code completes in 3.175 seconds. Changing it a little reduces it to 0.420s:

    T[] result = new T[input.length];
    size_t o = 0;
    for (uint i; i < input.length; i=min(i+range, input.length))
    {
        result[o] = sum(input[i..min(i+range, input.length)]);
        o++;
    }
    return result[0..o];

You can also use Appender from std.array.
Aug 13 2017
next sibling parent Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
this works OK for me with the ldc compiler; gdc does not work on my Arch
machine, so I cannot do a comparison to your c++ version (clang does not work
with your c++ code)

import std.stdio : writeln;
import std.algorithm.comparison: min;
import std.algorithm.iteration: sum;
import core.time: MonoTime, Duration;


T[] sum_subranges(T)(T[] input, uint range)
{
    import std.array : appender;
    auto app = appender!(T[])();
    if (range == 0)
    {
        return app.data;
    }
    for (uint i; i < input.length; i=min(i+range, input.length))
    {
        app.put(sum(input[i..min(i+range, input.length)]));
    }
    return app.data;
}

unittest
{
    assert(sum_subranges([1,1,1], 2) == [2, 1]);
    assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
    assert(sum_subranges([], 2) == []);
    assert(sum_subranges([1], 2) == [1]);
    assert(sum_subranges([1], 0) == []);
}


int main()
{
    import std.range : iota, array;
    auto v = iota(0,1000000).array;
    int sum;
    MonoTime beg = MonoTime.currTime;
    for (int i=0; i < 100; i++)
        sum += cast(int)sum_subranges(v,2).length;
    MonoTime end = MonoTime.currTime;
    writeln(end-beg);
    writeln(sum);
    return sum;
}

On Sun, Aug 13, 2017 at 9:03 AM, Neia Neutuladh via Digitalmars-d-learn <
digitalmars-d-learn puremagic.com> wrote:

 On Sunday, 13 August 2017 at 06:09:39 UTC, amfvcg wrote:

 Hi all,
 I'm solving below task:
Well, for one thing, you are preallocating in C++ code but not in D. On my machine, your version of the code completes in 3.175 seconds. Changing it a little reduces it to 0.420s: T[] result = new T[input.length]; size_t o = 0; for (uint i; i < input.length; i=min(i+range, input.length)) { result[o] = sum(input[i..min(i+range, input.length)]); o++; } return result[0..o]; You can also use Appender from std.array.
Aug 13 2017
prev sibling parent reply Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
Here is more D idiomatic way:

import std.stdio : writeln;
import std.algorithm.comparison: min;
import std.algorithm.iteration: sum;
import core.time: MonoTime, Duration;


auto sum_subranges(T)(T input, uint range)
{
    import std.array : array;
    import std.range : chunks, ElementType;
    import std.algorithm : map;

    if (range == 0)
    {
        return ElementType!(T)[].init;
    }
    return input.chunks(range).map!(sum).array;
}

unittest
{
    assert(sum_subranges([1,1,1], 2) == [2, 1]);
    assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
    assert(sum_subranges([], 2) == []);
    assert(sum_subranges([1], 2) == [1]);
    assert(sum_subranges([1], 0) == []);
}


int main()
{
    import std.range : iota, array;
    auto v = iota(0,1000000);
    int sum;
    MonoTime beg = MonoTime.currTime;
    for (int i=0; i < 100; i++)
        sum += cast(int)sum_subranges(v,2).length;
    MonoTime end = MonoTime.currTime;
    writeln(end-beg);
    writeln(sum);
    return sum;
}

On Sun, Aug 13, 2017 at 9:13 AM, Daniel Kozak <kozzi11 gmail.com> wrote:

 this works ok for me with ldc compiler, gdc does not work on my arch
 machine so I can not do comparsion to your c++ versin (clang does not work
 with your c++ code)

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 T[] sum_subranges(T)(T[] input, uint range)
 {
     import std.array : appender;
     auto app = appender!(T[])();
     if (range == 0)
     {
         return app.data;
     }
     for (uint i; i < input.length; i=min(i+range, input.length))
     {
         app.put(sum(input[i..min(i+range, input.length)]));
     }
     return app.data;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000).array;
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:03 AM, Neia Neutuladh via Digitalmars-d-learn <
 digitalmars-d-learn puremagic.com> wrote:

 On Sunday, 13 August 2017 at 06:09:39 UTC, amfvcg wrote:

 Hi all,
 I'm solving below task:
Well, for one thing, you are preallocating in C++ code but not in D. On my machine, your version of the code completes in 3.175 seconds. Changing it a little reduces it to 0.420s: T[] result = new T[input.length]; size_t o = 0; for (uint i; i < input.length; i=min(i+range, input.length)) { result[o] = sum(input[i..min(i+range, input.length)]); o++; } return result[0..o]; You can also use Appender from std.array.
Aug 13 2017
parent reply amfvcg <hid d.en> writes:
On Sunday, 13 August 2017 at 07:30:32 UTC, Daniel Kozak wrote:
 Here is more D idiomatic way:

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 auto sum_subranges(T)(T input, uint range)
 {
     import std.array : array;
     import std.range : chunks, ElementType;
     import std.algorithm : map;

     if (range == 0)
     {
         return ElementType!(T)[].init;
     }
     return input.chunks(range).map!(sum).array;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000);
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:13 AM, Daniel Kozak 
 <kozzi11 gmail.com> wrote:

 this works ok for me with ldc compiler, gdc does not work on 
 my arch machine so I can not do comparsion to your c++ versin 
 (clang does not work with your c++ code)

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 T[] sum_subranges(T)(T[] input, uint range)
 {
     import std.array : appender;
     auto app = appender!(T[])();
     if (range == 0)
     {
         return app.data;
     }
     for (uint i; i < input.length; i=min(i+range, 
 input.length))
     {
         app.put(sum(input[i..min(i+range, input.length)]));
     }
     return app.data;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000).array;
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:03 AM, Neia Neutuladh via 
 Digitalmars-d-learn < digitalmars-d-learn puremagic.com> wrote:

 [...]
Thank you all for the replies. Good to know the community is alive in D :)

Let's settle the playground:
D  : http://ideone.com/h4fnsD
C++: http://ideone.com/X1pyXG
Both using GCC under the hood.

C++ in 112 ms; D in:
- 2.5 sec with the original source;
- 2.5 sec with Daniel's 1st version;
- 5 sec timeout exceeded with Daniel's 2nd version;
- 1.8 sec with Neia-like preallocation;

So it's still not that neat. (What's interesting is that the C++ code generates 2 KLOC of assembly, while the D code with ldc generates 12 KLOC - checked at godbolt.)

P.S. For the C++ version to work under clang, the function which takes (BinaryOp) must go before the other one (my bad).
Aug 13 2017
next sibling parent reply Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
my second version on ldc takes 380ms and c++ version on same compiler
(clang), takes 350ms, so it seems to be almost same

On Sun, Aug 13, 2017 at 9:51 AM, amfvcg via Digitalmars-d-learn <
digitalmars-d-learn puremagic.com> wrote:

 On Sunday, 13 August 2017 at 07:30:32 UTC, Daniel Kozak wrote:

 Here is more D idiomatic way:

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 auto sum_subranges(T)(T input, uint range)
 {
     import std.array : array;
     import std.range : chunks, ElementType;
     import std.algorithm : map;

     if (range == 0)
     {
         return ElementType!(T)[].init;
     }
     return input.chunks(range).map!(sum).array;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000);
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:13 AM, Daniel Kozak <kozzi11 gmail.com> wrote:

 this works ok for me with ldc compiler, gdc does not work on my arch
 machine so I can not do comparsion to your c++ versin (clang does not work
 with your c++ code)

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 T[] sum_subranges(T)(T[] input, uint range)
 {
     import std.array : appender;
     auto app = appender!(T[])();
     if (range == 0)
     {
         return app.data;
     }
     for (uint i; i < input.length; i=min(i+range, input.length))
     {
         app.put(sum(input[i..min(i+range, input.length)]));
     }
     return app.data;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000).array;
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:03 AM, Neia Neutuladh via Digitalmars-d-learn
 < digitalmars-d-learn puremagic.com> wrote:

 [...]

Thank you all for the replies. Good to know the community is alive in d :) Let's settle the playground: D : http://ideone.com/h4fnsD C++: http://ideone.com/X1pyXG Both using GCC under the hood. C++ in 112 ms; D in : - 2.5 sec with original source; - 2.5 sec with Daniel's 1st version; - 5 sec timeout exceeded with Daniel's 2nd version; - 1.8 sec with Neia-like preallocation; So still it's not that neaty. (What's interesting C++ code generates 2KLOC of assembly, and Dlang ldc 12KLOC - checked at godbolt). P.S. For C++ version to work under clang, the function which takes (BinaryOp) must go before the other one (my bad).
Aug 13 2017
parent reply amfvcg <hid d.en> writes:
On Sunday, 13 August 2017 at 08:00:53 UTC, Daniel Kozak wrote:
 my second version on ldc takes 380ms and c++ version on same 
 compiler (clang), takes 350ms, so it seems to be almost same
Ok, on ideone (ldc 1.1.0) it times out, and on dpaste (ldc 0.12.0) it gets killed. Which version are you using? Either way, if that's the case - that's slick (and ldc would be the compiler of choice for real use cases).
Aug 13 2017
next sibling parent reply ikod <geller.garry gmail.com> writes:
On Sunday, 13 August 2017 at 08:13:56 UTC, amfvcg wrote:
 On Sunday, 13 August 2017 at 08:00:53 UTC, Daniel Kozak wrote:
 my second version on ldc takes 380ms and c++ version on same 
 compiler (clang), takes 350ms, so it seems to be almost same
Ok, on ideone (ldc 1.1.0) it timeouts, on dpaste (ldc 0.12.0) it gets killed. What version are you using? Either way, if that'd be the case - that's slick. (and ldc would be the compiler of choice for real use cases).
import std.stdio : writeln;
import std.algorithm.comparison: min;
import std.algorithm.iteration: sum;
import core.time: MonoTime, Duration;
import std.range;
import std.algorithm;

auto s1(T)(T input, uint r)
{
    return input.chunks(r).map!sum;
}

T[] sum_subranges(T)(T[] input, uint range)
{
    T[] result;
    if (range == 0)
    {
        return result;
    }
    for (uint i; i < input.length; i=min(i+range, input.length))
    {
        result ~= sum(input[i..min(i+range, input.length)]);
    }
    return result;
}

unittest
{
    assert(sum_subranges([1,1,1], 2) == [2, 1]);
    assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
    assert(sum_subranges([], 2) == []);
    assert(sum_subranges([1], 2) == [1]);
    assert(sum_subranges([1], 0) == []);
    assert(s1([1,1,1], 2).array == [2, 1]);
    assert(s1([1,1,1,2,3,3], 2).array == [2, 3, 6]);
}

int main()
{
    int sum;
    MonoTime beg0 = MonoTime.currTime;
    for (int i=0; i < 100; i++)
        sum += s1(iota(1000000),2).length;
    MonoTime end0 = MonoTime.currTime;
    writeln(end0-beg0);
    writeln(sum);

    sum = 0;
    int[1000000] v;
    for (int i=0; i < 1000000; ++i)
        v[i] = i;
    MonoTime beg = MonoTime.currTime;
    for (int i=0; i < 100; i++)
        sum += cast(int)sum_subranges(v,2).length;
    MonoTime end = MonoTime.currTime;
    writeln(end-beg);
    writeln(sum);
    return sum;
}

Gives me

5 μs and 2 hnsecs
50000000
3 secs, 228 ms, 837 μs, and 4 hnsecs
50000000
Aug 13 2017
parent reply amfvcg <hid d.en> writes:
 Gives me

 5 μs and 2 hnsecs
 50000000
 3 secs, 228 ms, 837 μs, and 4 hnsecs
 50000000
And what did you compile it with? Btw. for the c++ version clang does worse than gcc (for this case: 112 ms vs 180 ms).
Aug 13 2017
parent ikod <geller.garry gmail.com> writes:
On Sunday, 13 August 2017 at 08:32:50 UTC, amfvcg wrote:
 Gives me

 5 μs and 2 hnsecs
 50000000
 3 secs, 228 ms, 837 μs, and 4 hnsecs
 50000000
And you've compiled it with? Btw. clang for c++ version works worse than gcc (for this case [112ms vs 180ms]).
DMD64 D Compiler v2.074.1
Aug 13 2017
prev sibling parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Sunday, 13 August 2017 at 08:13:56 UTC, amfvcg wrote:
 On Sunday, 13 August 2017 at 08:00:53 UTC, Daniel Kozak wrote:
 my second version on ldc takes 380ms and c++ version on same 
 compiler (clang), takes 350ms, so it seems to be almost same
Ok, on ideone (ldc 1.1.0) it timeouts, on dpaste (ldc 0.12.0) it gets killed. What version are you using? Either way, if that'd be the case - that's slick. (and ldc would be the compiler of choice for real use cases).
Here are my results:

$ uname -sri
Linux 4.10.0-28-generic x86_64

$ lscpu | grep 'Model name'
Model name: Intel(R) Core(TM) i7-3770K CPU 3.50GHz

$ ldc2 --version | head -n5
LDC - the LLVM D compiler (1.3.0):
  based on DMD v2.073.2 and LLVM 4.0.0
  built with LDC - the LLVM D compiler (1.3.0)
  Default target: x86_64-unknown-linux-gnu
  Host CPU: ivybridge

$ g++ --version | head -n1
g++ (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406

$ ldc2 -O3 --release sum_subranges.d
$ ./sum_subranges
378 ms, 556 μs, and 9 hnsecs
50000000

$ g++ -O5 sum_subranges.cpp -o sum_subranges
$ ./sum_subranges
237135
50000000
Aug 13 2017
parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Sunday, 13 August 2017 at 08:29:30 UTC, Petar Kirov 
[ZombineDev] wrote:
 On Sunday, 13 August 2017 at 08:13:56 UTC, amfvcg wrote:
 On Sunday, 13 August 2017 at 08:00:53 UTC, Daniel Kozak wrote:
 my second version on ldc takes 380ms and c++ version on same 
 compiler (clang), takes 350ms, so it seems to be almost same
Ok, on ideone (ldc 1.1.0) it timeouts, on dpaste (ldc 0.12.0) it gets killed. What version are you using? Either way, if that'd be the case - that's slick. (and ldc would be the compiler of choice for real use cases).
Here are my results: $ uname -sri Linux 4.10.0-28-generic x86_64 $ lscpu | grep 'Model name' Model name: Intel(R) Core(TM) i7-3770K CPU 3.50GHz $ ldc2 --version | head -n5 LDC - the LLVM D compiler (1.3.0): based on DMD v2.073.2 and LLVM 4.0.0 built with LDC - the LLVM D compiler (1.3.0) Default target: x86_64-unknown-linux-gnu Host CPU: ivybridge $ g++ --version | head -n1 g++ (Ubuntu 6.3.0-12ubuntu2) 6.3.0 20170406 $ ldc2 -O3 --release sum_subranges.d $ ./sum_subranges 378 ms, 556 μs, and 9 hnsecs 50000000 $ g++ -O5 sum_subranges.cpp -o sum_subranges $ ./sum_subranges 237135 50000000
With Daniel's latest version
( http://forum.dlang.org/post/mailman.5963.1502612885.31550.digitalmars-d-learn puremagic.com ):

$ ldc2 -O3 --release sum_subranges2.d
$ ./sum_subranges2
210 ms, 838 μs, and 8 hnsecs
50000000
Aug 13 2017
parent reply amfvcg <hid d.en> writes:
On Sunday, 13 August 2017 at 08:33:53 UTC, Petar Kirov 
[ZombineDev] wrote:
 With Daniel's latest version (
 http://forum.dlang.org/post/mailman.5963.1502612885.31550.digitalmars-d
learn puremagic.com )

 $ ldc2 -O3 --release sum_subranges2.d
 $ ./sum_subranges2
 210 ms, 838 μs, and 8 hnsecs
 50000000
Great!!! And that's what I was hoping for. So the conclusion is: use the latest ldc that's out there.

Thank you Petar, thank you Daniel. (I cannot change the subject to SOLVED, can I?)

Btw. the idiomatic version of this D sample looks how I imagined it should!
Aug 13 2017
parent reply Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Sunday, 13 August 2017 at 08:43:29 UTC, amfvcg wrote:
 On Sunday, 13 August 2017 at 08:33:53 UTC, Petar Kirov 
 [ZombineDev] wrote:
 With Daniel's latest version (
 http://forum.dlang.org/post/mailman.5963.1502612885.31550.digitalmars-d
learn puremagic.com )

 $ ldc2 -O3 --release sum_subranges2.d
 $ ./sum_subranges2
 210 ms, 838 μs, and 8 hnsecs
 50000000
Great!!! And that's what I was hoping for. So the conclusion is: use the latest ldc that's out there. Thank you Petar, thank you Daniel. (I cannot change the subject to SOLVED, can I?) Btw. the idiomatic version of this d sample looks how I imagined it should!
There's one especially interesting result.

This instantiation:

sum_subranges(std.range.iota!(int, int).iota(int, int).Result, uint)

of the following function:

auto sum_subranges(T)(T input, uint range)
{
    import std.range : chunks, ElementType, array;
    import std.algorithm : map;
    return input.chunks(range).map!(sum);
}

gets optimized with LDC to:

  push rax
  test edi, edi
  je .LBB2_2
  mov edx, edi
  mov rax, rsi
  pop rcx
  ret
.LBB2_2:
  lea rsi, [rip + .L.str.3]
  lea rcx, [rip + .L.str]
  mov edi, 45
  mov edx, 89
  mov r8d, 6779
  call _d_assert_msg PLT

I.e. the compiler turned an O(n) algorithm into O(1), which is quite neat. It is also quite surprising to me that it looks like even dmd managed to do a similar optimization:

sum_subranges(std.range.iota!(int, int).iota(int, int).Result, uint):
    push   rbp
    mov    rbp,rsp
    sub    rsp,0x30
    mov    DWORD PTR [rbp-0x8],edi
    mov    r9d,DWORD PTR [rbp-0x8]
    test   r9,r9
    jne    41
    mov    r8d,0x1b67
    mov    ecx,0x0
    mov    eax,0x61
    mov    rdx,rax
    mov    QWORD PTR [rbp-0x28],rdx
    mov    edx,0x0
    mov    edi,0x2d
    mov    rsi,rdx
    mov    rdx,QWORD PTR [rbp-0x28]
    call   41
41: mov    QWORD PTR [rbp-0x20],rsi
    mov    QWORD PTR [rbp-0x18],r9
    mov    rdx,QWORD PTR [rbp-0x18]
    mov    rax,QWORD PTR [rbp-0x20]
    mov    rsp,rbp
    pop    rbp
    ret

Moral of the story: templates + ranges are an awesome combination.
Aug 13 2017
next sibling parent reply amfvcg <hid d.en> writes:
On Sunday, 13 August 2017 at 09:08:14 UTC, Petar Kirov 
[ZombineDev] wrote:
 There's one especially interesting result:

 This instantiation:

 sum_subranges(std.range.iota!(int, int).iota(int, int).Result, 
 uint)

 of the following function:

 auto sum_subranges(T)(T input, uint range)
 {
     import std.range : chunks, ElementType, array;
     import std.algorithm : map;
     return input.chunks(range).map!(sum);
 }

 gets optimized with LDC to:
   push rax
   test edi, edi
   je .LBB2_2
   mov edx, edi
   mov rax, rsi
   pop rcx
   ret
 .LBB2_2:
   lea rsi, [rip + .L.str.3]
   lea rcx, [rip + .L.str]
   mov edi, 45
   mov edx, 89
   mov r8d, 6779
   call _d_assert_msg PLT

 I.e. the compiler turned a O(n) algorithm to O(1), which is 
 quite neat. It is also quite surprising to me that it looks 
 like even dmd managed to do a similar optimization:

 sum_subranges(std.range.iota!(int, int).iota(int, int).Result, 
 uint):
     push   rbp
     mov    rbp,rsp
     sub    rsp,0x30
     mov    DWORD PTR [rbp-0x8],edi
     mov    r9d,DWORD PTR [rbp-0x8]
     test   r9,r9
     jne    41
     mov    r8d,0x1b67
     mov    ecx,0x0
     mov    eax,0x61
     mov    rdx,rax
     mov    QWORD PTR [rbp-0x28],rdx
     mov    edx,0x0
     mov    edi,0x2d
     mov    rsi,rdx
     mov    rdx,QWORD PTR [rbp-0x28]
     call   41
 41: mov    QWORD PTR [rbp-0x20],rsi
     mov    QWORD PTR [rbp-0x18],r9
     mov    rdx,QWORD PTR [rbp-0x18]
     mov    rax,QWORD PTR [rbp-0x20]
     mov    rsp,rbp algorithms a
     pop    rbp
     ret

 Moral of the story: templates + ranges are an awesome 
 combination.
Change the parameter for this array size to be taken from stdin and I assume that these optimizations will go away.
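Something like this rough sketch (untested; the names are made up) is what I have in mind - the size comes from stdin, so the compiler cannot see it and fold the result away:

import std.algorithm.iteration : map, sum;
import std.array : array;
import std.conv : to;
import std.range : chunks, iota;
import std.stdio : readln, writeln;
import std.string : strip;

void main()
{
    // e.g. echo 1000000 | ./bench - the size is unknown at compile time
    immutable n = readln.strip.to!int;
    auto v = iota(0, n).array;
    // eager: every chunk sum really has to be computed at run time
    auto result = v.chunks(2).map!sum.array;
    writeln(result.length);
}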
Aug 13 2017
parent reply Johan Engelen <j j.nl> writes:
On Sunday, 13 August 2017 at 09:15:48 UTC, amfvcg wrote:
 Change the parameter for this array size to be taken from stdin 
 and I assume that these optimizations will go away.
This is paramount for all of the testing, examining, and comparisons that are discussed in this thread. Full information is given to the compiler, and you are basically testing the constant-folding power of the compilers (not unimportant). No runtime calculation is needed for the sum. Your program could be optimized to the following code:
```
void main()
{
    MonoTime beg = MonoTime.currTime;
    MonoTime end = MonoTime.currTime;
    writeln(end-beg);
    writeln(50000000);
}
```
So actually you should be more surprised that the reported time is not near-zero (just the time between two `MonoTime.currTime` calls)!

Instead of `iota(1,1000000)`, you should initialize the array with random numbers, with a randomization seed given by the user (e.g. command-line argument or stdin). Then the program will actually have to do the runtime calculations that I assume you are expecting it to perform.

- Johan
Aug 13 2017
parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Sunday, 13 August 2017 at 09:56:44 UTC, Johan Engelen wrote:
 On Sunday, 13 August 2017 at 09:15:48 UTC, amfvcg wrote:
 Change the parameter for this array size to be taken from 
 stdin and I assume that these optimizations will go away.
This is paramount for all of the testing, examining, and comparisons that are discussed in this thread. Full information is given to the compiler, and you are basically testing the constant folding power of the compilers (not unimportant).
I agree that in general this is not the right way to benchmark. I however am interested specifically in the pattern matching / constant folding abilities of the compiler. I would have expected `sum(iota(1, N + 1))` to be replaced with `(N*(N+1))/2`. LDC already does this optimization in some cases. I have opened an issue for some of the rest: https://github.com/ldc-developers/ldc/issues/2271
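Just to spell out the identity I mean, here is a quick (untested) check - the 0L seed keeps the running sum in long so it doesn't overflow int:

import std.algorithm.iteration : sum;
import std.range : iota;

void main()
{
    enum N = 1_000_000;
    // sum(iota(1, N + 1)) should equal N*(N+1)/2
    assert(iota(1, N + 1).sum(0L) == (cast(long) N) * (N + 1) / 2);
}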
 No runtime calculation is needed for the sum. Your program 
 could be optimized to the following code:
 ```
 void main()
 {
     MonoTime beg = MonoTime.currTime;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(50000000);
 }
 ```
 So actually you should be more surprised that the reported time 
 is not equal to near-zero (just the time between two 
 `MonoTime.currTime` calls)!
On Posix, `MonoTime.currTime`'s implementation uses clock_gettime(CLOCK_MONOTONIC, ...), which is quite a bit more involved than simply using the rdtsc instruction on x86. See: http://linuxmogeb.blogspot.bg/2013/10/how-does-clockgettime-work.html

On Windows, `MonoTime.currTime` uses QueryPerformanceCounter, which on Win 7 and later uses the rdtsc instruction, which makes it quite streamlined. In some testing I did several months ago, QueryPerformanceCounter had really good latency and precision (though I forgot the exact numbers I got).
 Instead of `iota(1,1000000)`, you should initialize the array 
 with random numbers with a randomization seed given by the user 
 (e.g. commandline argument or stdin). Then, the program will 
 actually have to do the runtime calculations that I assume you 
 are expecting it to perform.
Agreed, though I think Phobos's unpredictableSeed does an ok job w.r.t. seeding, so unless you want to repeat the benchmark on the exact same dataset, something like this does a good job:

T[] generate(T)(size_t size)
{
    import std.algorithm.iteration : map;
    import std.range : array, iota;
    import std.random : uniform;
    return size.iota.map!(_ => uniform!T()).array;
}
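E.g. (hypothetical usage, not measured - it assumes the generate helper above and one of the sum_subranges versions from earlier in the thread):

void main()
{
    import std.algorithm.iteration : sum;
    import std.stdio : writeln;
    auto v = generate!int(1_000_000);  // fresh, unpredictable data each run
    writeln(sum_subranges(v, 2).sum);  // forces every chunk sum to be computed
}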
Aug 13 2017
prev sibling parent reply Johan Engelen <j j.nl> writes:
On Sunday, 13 August 2017 at 09:08:14 UTC, Petar Kirov 
[ZombineDev] wrote:
 This instantiation:

 sum_subranges(std.range.iota!(int, int).iota(int, int).Result, 
 uint)

 of the following function:

 auto sum_subranges(T)(T input, uint range)
 {
     import std.range : chunks, ElementType, array;
     import std.algorithm : map;
     return input.chunks(range).map!(sum);
 }

 gets optimized with LDC to:
  [snip]
 I.e. the compiler turned a O(n) algorithm to O(1), which is 
 quite neat. It is also quite surprising to me that it looks 
 like even dmd managed to do a similar optimization:
  [snip]
Execution of sum_subranges is already O(1), because the calculation of the sum is delayed: the return type of the function is not `uint`, it is `MapResult!(sum, <range>)`, which does a lazy evaluation of the sum.

- Johan
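To illustrate (a quick sketch, assuming the chunks+map variant of sum_subranges from earlier in the thread):

import std.algorithm.iteration : map, sum;
import std.range : chunks, iota;

void main()
{
    auto r = iota(0, 1_000_000).chunks(2).map!sum;
    // length falls out of the range arithmetic - no sums are computed here
    assert(r.length == 500_000);
    // a sum is only evaluated when an element is actually read
    assert(r.front == 0 + 1);
}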
Aug 13 2017
parent Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:
On Sunday, 13 August 2017 at 09:41:39 UTC, Johan Engelen wrote:
 On Sunday, 13 August 2017 at 09:08:14 UTC, Petar Kirov 
 [ZombineDev] wrote:
  [...]
  [...]
Execution of sum_subranges is already O(1), because the calculation of the sum is delayed: the return type of the function is not `uint`, it is `MapResult!(sum, <range>)` which does a lazy evaluation of the sum. - Johan
Heh, yeah you're absolutely right. I was just about to correct myself, when I saw your reply. Don't know how I missed such an obvious thing :D
Aug 13 2017
prev sibling next sibling parent Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
this one is even faster than c++:
http://ideone.com/TRDsOo

On Sun, Aug 13, 2017 at 10:00 AM, Daniel Kozak <kozzi11 gmail.com> wrote:

 my second version on ldc takes 380ms and c++ version on same compiler
 (clang), takes 350ms, so it seems to be almost same

 On Sun, Aug 13, 2017 at 9:51 AM, amfvcg via Digitalmars-d-learn <
 digitalmars-d-learn puremagic.com> wrote:

 On Sunday, 13 August 2017 at 07:30:32 UTC, Daniel Kozak wrote:

 Here is more D idiomatic way:

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 auto sum_subranges(T)(T input, uint range)
 {
     import std.array : array;
     import std.range : chunks, ElementType;
     import std.algorithm : map;

     if (range == 0)
     {
         return ElementType!(T)[].init;
     }
     return input.chunks(range).map!(sum).array;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000);
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:13 AM, Daniel Kozak <kozzi11 gmail.com> wrote:

 this works ok for me with ldc compiler, gdc does not work on my arch
 machine so I can not do comparsion to your c++ versin (clang does not work
 with your c++ code)

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 T[] sum_subranges(T)(T[] input, uint range)
 {
     import std.array : appender;
     auto app = appender!(T[])();
     if (range == 0)
     {
         return app.data;
     }
     for (uint i; i < input.length; i=min(i+range, input.length))
     {
         app.put(sum(input[i..min(i+range, input.length)]));
     }
     return app.data;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000).array;
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:03 AM, Neia Neutuladh via Digitalmars-d-learn
 < digitalmars-d-learn puremagic.com> wrote:

 [...]

Thank you all for the replies. Good to know the community is alive in d :) Let's settle the playground: D : http://ideone.com/h4fnsD C++: http://ideone.com/X1pyXG Both using GCC under the hood. C++ in 112 ms; D in : - 2.5 sec with original source; - 2.5 sec with Daniel's 1st version; - 5 sec timeout exceeded with Daniel's 2nd version; - 1.8 sec with Neia-like preallocation; So still it's not that neaty. (What's interesting C++ code generates 2KLOC of assembly, and Dlang ldc 12KLOC - checked at godbolt). P.S. For C++ version to work under clang, the function which takes (BinaryOp) must go before the other one (my bad).
Aug 13 2017
prev sibling parent Daniel Kozak via Digitalmars-d-learn <digitalmars-d-learn puremagic.com> writes:
And this one is awesome :P
http://ideone.com/muehUw

On Sun, Aug 13, 2017 at 10:27 AM, Daniel Kozak <kozzi11 gmail.com> wrote:

 this one is even faster than c++:
 http://ideone.com/TRDsOo

 On Sun, Aug 13, 2017 at 10:00 AM, Daniel Kozak <kozzi11 gmail.com> wrote:

 my second version on ldc takes 380ms and c++ version on same compiler
 (clang), takes 350ms, so it seems to be almost same

 On Sun, Aug 13, 2017 at 9:51 AM, amfvcg via Digitalmars-d-learn <
 digitalmars-d-learn puremagic.com> wrote:

 On Sunday, 13 August 2017 at 07:30:32 UTC, Daniel Kozak wrote:

 Here is more D idiomatic way:

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 auto sum_subranges(T)(T input, uint range)
 {
     import std.array : array;
     import std.range : chunks, ElementType;
     import std.algorithm : map;

     if (range == 0)
     {
         return ElementType!(T)[].init;
     }
     return input.chunks(range).map!(sum).array;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000);
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:13 AM, Daniel Kozak <kozzi11 gmail.com>
 wrote:

 this works ok for me with ldc compiler, gdc does not work on my arch
 machine so I can not do comparsion to your c++ versin (clang does not work
 with your c++ code)

 import std.stdio : writeln;
 import std.algorithm.comparison: min;
 import std.algorithm.iteration: sum;
 import core.time: MonoTime, Duration;


 T[] sum_subranges(T)(T[] input, uint range)
 {
     import std.array : appender;
     auto app = appender!(T[])();
     if (range == 0)
     {
         return app.data;
     }
     for (uint i; i < input.length; i=min(i+range, input.length))
     {
         app.put(sum(input[i..min(i+range, input.length)]));
     }
     return app.data;
 }

 unittest
 {
     assert(sum_subranges([1,1,1], 2) == [2, 1]);
     assert(sum_subranges([1,1,1,2,3,3], 2) == [2, 3, 6]);
     assert(sum_subranges([], 2) == []);
     assert(sum_subranges([1], 2) == [1]);
     assert(sum_subranges([1], 0) == []);
 }


 int main()
 {
     import std.range : iota, array;
     auto v = iota(0,1000000).array;
     int sum;
     MonoTime beg = MonoTime.currTime;
     for (int i=0; i < 100; i++)
         sum += cast(int)sum_subranges(v,2).length;
     MonoTime end = MonoTime.currTime;
     writeln(end-beg);
     writeln(sum);
     return sum;
 }

 On Sun, Aug 13, 2017 at 9:03 AM, Neia Neutuladh via
 Digitalmars-d-learn < digitalmars-d-learn puremagic.com> wrote:

 [...]

Thank you all for the replies. Good to know the community is alive in d :) Let's settle the playground: D : http://ideone.com/h4fnsD C++: http://ideone.com/X1pyXG Both using GCC under the hood. C++ in 112 ms; D in : - 2.5 sec with original source; - 2.5 sec with Daniel's 1st version; - 5 sec timeout exceeded with Daniel's 2nd version; - 1.8 sec with Neia-like preallocation; So still it's not that neaty. (What's interesting C++ code generates 2KLOC of assembly, and Dlang ldc 12KLOC - checked at godbolt). P.S. For C++ version to work under clang, the function which takes (BinaryOp) must go before the other one (my bad).
Aug 13 2017
prev sibling parent data pulverizer <data.pulverizer gmail.com> writes:
On Sunday, 13 August 2017 at 06:09:39 UTC, amfvcg wrote:
 Hi all,
 I'm solving below task:

 given container T and value R return sum of R-ranges over T. An 
 example:
 input : T=[1,1,1] R=2
 output : [2, 1]

 input : T=[1,2,3] R=1
 output : [1,2,3]
 (see dlang unittests for more examples)


 Below c++ code compiled with g++-5.4.0 -O2 -std=c++14 runs on 
 my machine in 656 836 us.
 Below D code compiled with dmd v2.067.1 -O runs on my machine 
 in ~ 14.5 sec.

 Each language has it's own "way of programming", and as I'm a 
 beginner in D - probably I'm running through bushes instead of 
 highway. Therefore I'd like to ask you, experienced dlang devs, 
 to shed some light on "how to do it dlang-way".

 ...
From time to time the forum gets questions like these. It would be nice if we had blog articles on high-performance programming in D - a kind of best-practices approach covering everything from programming idioms to compilers and compiler flags.
Aug 17 2017