digitalmars.D.learn - Why is this code slow?
- Csaba (24/24) Mar 24 I know that benchmarks are always controversial and depend on a
- matheus (11/18) Mar 24 I think a few things can be going on, but one way to go is trying
- Sergey (20/22) Mar 24 Not really..
- kdevel (13/30) Mar 24 Usually you do not translate mathematical expressions directly
- rkompass (16/24) Mar 24 I used the loop:
- Sergey (8/10) Mar 24 1) If possible you can use "betterC" - to disable runtime
- rkompass (32/42) Mar 25 Thank you. I succeeded with `gdc -Wall -O2 -frelease
- Salih Dincer (42/62) Mar 26 It's obvious that you are a good mathematician. You used sequence
- Salih Dincer (74/90) Mar 24 I also used this code:
- Csaba (3/16) Mar 26 I know that the code can be simplified/optimized, I just wanted
- Lance Bachmeier (19/44) Mar 26 As others suggested, pow is the problem. I noticed that the C
- Lance Bachmeier (16/70) Mar 26 And then the other thing is changing
- rkompass (45/45) Mar 27 I apologize for digressing a little bit further - just to share
- Salih Dincer (19/21) Mar 27 Good thing you're digressing; I am 45 years old and I still
- rkompass (41/45) Mar 28 So we go with another digression. I discovered parallel, also
- Salih Dincer (40/44) Mar 28 You can achieve parallelism in C using libraries such as OpenMP,
- rkompass (14/38) Mar 28 Nice, thank you.
- Sergey (15/16) Mar 28 It's hard to compare actually.
- Salih Dincer (5/9) Mar 28 There is no such thing as parallel programming in D anyway. At
- Serg Gini (3/6) Mar 28 I think it just works :)
- Salih Dincer (54/61) Mar 28 A year has passed and I have tried almost everything! Either it
I know that benchmarks are always controversial and depend on a lot of factors. So far, I have read that D performs very well in benchmarks, as well as, if not better than, C.

I wrote a little program that approximates PI using the Leibniz formula. I implemented the same thing in C, D and Python; all of them execute 1,000,000 iterations 20 times and display the average time elapsed. Here are the results:

C: 0.04s
Python: 0.33s
D: 0.73s

What the hell? D slower than Python? This cannot be real. I am sure I am making a mistake here. I'm sharing all 3 programs here:

C: https://pastebin.com/s7e2HFyL
D: https://pastebin.com/fuURdupc
Python: https://pastebin.com/zcXAkSEf

As you can see, the function that does the job is exactly the same in C and D. Here are the compile/run commands used:

C: `gcc leibniz.c -lm -oleibc`
D: `gdc leibniz.d -frelease -oleibd`
Python: `python3 leibniz.py`

PS. My CPU is an AMD A8-5500B and my OS is Ubuntu Linux, if that matters.
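Since pastebin links can rot: the D version presumably boils down to something like the following minimal sketch, reconstructed from the snippets quoted later in the thread (the constants, names and exact timing loop are assumptions, not the actual pastebin):

```d
// Reconstructed sketch of the benchmarked D program. Details such as the
// timing loop and constant names are assumptions based on later quotes.
import std.stdio : writefln;
import std.math : pow;
import std.datetime.stopwatch : AutoStart, StopWatch;

const int ITERATIONS = 1_000_000;
const int BENCHMARKS = 20;

double leibniz(int iter) {
    double n = 1.0;
    for (int i = 2; i <= iter; i++)
        n += pow(-1.0, i - 1.0) / (i * 2.0 - 1.0); // the line discussed below
    return n * 4.0;
}

void main() {
    double result;
    long total_time = 0;
    for (int i = 0; i < BENCHMARKS; i++) {
        auto sw = StopWatch(AutoStart.yes);
        result = leibniz(ITERATIONS);
        sw.stop();
        total_time += sw.peek.total!"nsecs";
    }
    writefln("%.16f", result);
    writefln("Avg execution time: %f", total_time / BENCHMARKS / 1e9);
}
```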
Mar 24
On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:
> ... Here are the results: C: 0.04s Python: 0.33s D: 0.73s ...

I think a few things can be going on, but one way to go is to try optimization flags like "-O2" and run it again.

Anyway, looking through the generated assembly:

C: https://godbolt.org/z/45Kn1W93b
D: https://godbolt.org/z/Ghr3fqaTW

The Leibniz functions are very close to each other, except for one thing: the "pow" function on the D side. It's a template, so maybe you should start from there; in fact, I'd try the pow from C to see what happens.

Matheus.
Mar 24
On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:
> As you can see the function that does the job is exactly the same in C and D.

Not really..

The speed of the Leibniz algo is mostly the same. You can check the code in this benchmark for example: https://github.com/niklas-heer/speed-comparison

What you could fix in your code:

* you can use enum for BENCHMARKS and ITERATIONS
* use pow from core.stdc.math
* use sw.reset() in the loop

So the main part could look like this:

```d
auto sw = StopWatch(AutoStart.no);
sw.start();
foreach (i; 0..BENCHMARKS) {
    result += leibniz(ITERATIONS);
    total_time += sw.peek.total!"nsecs";
    sw.reset();
}
sw.stop();
```
Mar 24
On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:
> [...] What the hell? D slower than Python? This cannot be real. I am sure I am making a mistake here. [...]

Usually you do not translate mathematical expressions directly into code:

```d
n += pow(-1.0, i - 1.0) / (i * 2.0 - 1.0);
```

The term containing the `pow` invocation computes the alternating sequence -1, 1, -1, ..., which can be replaced by e.g.

```d
immutable int [2] sign = [-1, 1];
n += sign [i & 1] / (i * 2.0 - 1.0);
```

This saves the expensive call to the pow function.
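Dropped into a complete program, the sign-table variant looks like this (a sketch for readers following along; the loop bounds and names are assumptions carried over from the reconstruction above):

```d
// Self-contained sketch of the sign-table variant described above.
import std.stdio : writefln;

double leibniz(int iter) {
    immutable int[2] sign = [-1, 1];
    double n = 1.0;
    for (int i = 2; i <= iter; i++)
        n += sign[i & 1] / (i * 2.0 - 1.0); // table lookup instead of pow
    return n * 4.0;
}

void main() {
    writefln("%.16f", leibniz(1_000_000));
}
```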
Mar 24
On Sunday, 24 March 2024 at 21:21:13 UTC, kdevel wrote:
> The term containing the `pow` invocation computes the alternating sequence -1, 1, -1, ..., which can be replaced [...] This saves the expensive call to the pow function.

I used the loop:

```d
for (int i = 1; i < iter; i++)
    n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
```

in both C and D, with gcc and gdc, and got average execution times:

|   | original | loop replacement | -O2      |
|---|----------|------------------|----------|
| C | 0.009989 | 0.003198         | 0.001335 |
| D | 0.230346 | 0.003083         | 0.001309 |

Almost no difference. But the D binary is much larger on my Linux: 4600920 bytes instead of 15504 bytes for the C version. Are there some simple switches / settings to get a smaller binary?
Mar 24
On Sunday, 24 March 2024 at 22:16:06 UTC, rkompass wrote:
> Are there some simple switches / settings to get a smaller binary?

1) If possible you can use "betterC" - to disable the runtime
2) otherwise

```bash
--release --O3 --flto=full -fvisibility=hidden -defaultlib=phobos2-ldc-lto,druntime-ldc-lto -L=-dead_strip -L=-x -L=-S -L=-lz
```
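For illustration, a minimal sketch of option 1 (the compile command is an assumption; with betterC there is no druntime, so the entry point is extern(C) and I/O goes through C's stdio):

```d
// Sketch of a betterC-compatible variant. Build with e.g.:
//   ldc2 -betterC -O3 leibniz_bc.d   (command is an assumption)
import core.stdc.stdio : printf;

extern (C) int main() {
    double n = 1.0;
    for (int i = 2; i <= 1_000_000; i++)
        n += ((i % 2) ? 1.0 : -1.0) / (i * 2.0 - 1.0); // no pow, no GC
    printf("%.16f\n", n * 4.0);
    return 0;
}
```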
Mar 24
On Sunday, 24 March 2024 at 23:02:19 UTC, Sergey wrote:
> On Sunday, 24 March 2024 at 22:16:06 UTC, rkompass wrote:
>> Are there some simple switches / settings to get a smaller binary?
> 1) If possible you can use "betterC" - to disable the runtime
> 2) otherwise [...]

Thank you. I succeeded with `gdc -Wall -O2 -frelease -shared-libphobos`.

A little remark: the approximation to pi converges slowly, but it oscillates up and down much more than its average. So taking the average of 2 successive steps gives many more precise digits. We can simulate this by doing a last step with half the size:

```d
double leibniz(int it) {
    double n = 1.0;
    for (int i = 1; i < it; i++)
        n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    n += 0.5*((it%2) ? -1.0 : 1.0) / (it * 2.0 + 1.0);
    return n * 4.0;
}
```

Of course you may also combine the up(+) and down(-) steps into one: 1/i - 1/(i+2) = 2/(i*(i+2))

```d
double leibniz(int iter) {
    double n = 0.0;
    for (int i = 1; i < iter; i+=4)
        n += 2.0 / (i * (i+2.0));
    return n * 4.0;
}
```

or even combine both approaches. But of course, mathematically much more is possible. This was not about approximating pi as fast as possible... The above first approach still works at the original speed, it only makes the result a little bit nicer.
Mar 25
On Monday, 25 March 2024 at 14:02:08 UTC, rkompass wrote:
> Of course you may also combine the up(+) and down(-) steps into one: 1/i - 1/(i+2) = 2/(i*(i+2)) [...] The above first approach still works at the original speed, it only makes the result a little bit nicer.

It's obvious that you are a good mathematician. You used sequence A005563. First of all, I must apologize to the questioner for digressing from the topic. But I saw that there is a calculation difference between real and double. My goal was to see if there would be a change in speed. For example, with 250 million cycles (iter/4) I got the following result:

> 3.14159265158976691 (250 million with real)
> 3.14159264457621568 (250 million with double)
> 3.14159265358979324 (std.math.constants.PI)

First of all, my question is: why do we see this calculation error with double? Could the changes I made to the algorithm have caused this? Here's an executable code snippet:

```d
enum step = 4;
enum loop = 250_000_000;

auto leibniz(T)(int iter) {
    T n = 2/3.0;
    for(int i = 5; i < iter; i += step) {
        T a = (2.0 + i) * i; // https://oeis.org/A005563
        n += 2/a;
    }
    return n * step;
}

import std.stdio : writefln;

void main() {
    enum iter = loop * step - 10;

    65358979323.writefln!"Compare.%s";
    iter.leibniz!double.writefln!"%.17f (double)";
    iter.leibniz!real.writefln!"%.17f (real)";
    imported!"std.math".PI.writefln!"%.17f (enum)";
}
/* Prints:
Compare.65358979323
3.14159264457621568 (double)
3.14159265158976689 (real)
3.14159265358979324 (enum)
*/
```

In fact, there are algorithms that calculate accurately up to 12 decimal places with far fewer cycles (e.g. 9999).

SDB@79
Mar 26
On Sunday, 24 March 2024 at 22:16:06 UTC, kdevel wrote:
> The term containing the `pow` invocation computes the alternating sequence -1, 1, -1, ..., which can be replaced [...] This saves the expensive call to the pow function.

I also used this code:

```d
import std.stdio : writefln;
import std.datetime.stopwatch;

enum ITERATIONS = 1_000_000;
enum BENCHMARKS = 20;

auto leibniz(bool speed = true)(int iter) {
    double n = 1.0;

    static if(speed) const sign = [-1, 1];
    for(int i = 2; i < iter; i++) {
        static if(speed) {
            const m = i << 1;
            n += sign[i & 1] / (m - 1.0);
        } else {
            n += pow(-1, i - 1) / (i * 2.0 - 1.0);
        }
    }
    return n * 4.0;
}

auto pow(F, G)(F x, G n) @nogc @trusted pure nothrow {
    import std.traits : Unsigned, Unqual;
    real p = 1.0, v = void;
    Unsigned!(Unqual!G) m = n;

    if(n < 0) {
        if(n == -1) return 1 / x;
        m = cast(typeof(m))(0 - n);
        v = p / x;
    } else {
        switch(n) {
            case 0: return 1.0;
            case 1: return x;
            case 2: return x * x;
            default:
        }
        v = x;
    }

    while(true) {
        if(m & 1) p *= v;
        m >>= 1;
        if(!m) break;
        v *= v;
    }
    return p;
}

void main() {
    double result;
    long total_time = 0;

    for(int i = 0; i < BENCHMARKS; i++) {
        auto sw = StopWatch(AutoStart.no);
        sw.start();
        result = ITERATIONS.leibniz; //!false;
        sw.stop();
        total_time += sw.peek.total!"nsecs";
    }
    result.writefln!"%0.21f";
    writefln("Avg execution time: %f\n", total_time / BENCHMARKS / 1e9);
}
```

and results:

> dmd -run "leibnizTest.d"
> 3.141594653593692054727
> Avg execution time: 0.002005

If I compile with leibniz!false(ITERATIONS), the average execution time increases considerably:

> Avg execution time: 0.044435

Note, however, that no external library is involved here: a power function that works with integers is used instead. Normally the following library function would be called:

> Unqual!(Largest!(F, G)) pow(F, G)(F x, G y) @nogc @trusted pure nothrow if (isFloatingPoint!(F) && isFloatingPoint!(G)) ...

Now, the person asking the question will ask why it is slow even though we use exactly the same code in C; rightly so. You may think that the more watermelons you carry in your arms, the slower you naturally become. I think the important thing is not to drop the watermelons :)

SDB@79
Mar 24
On Sunday, 24 March 2024 at 21:21:13 UTC, kdevel wrote:
> Usually you do not translate mathematical expressions directly into code: [...] This saves the expensive call to the pow function.

I know that the code can be simplified/optimized, I just wanted to compare the same expression in C and D.
Mar 26
On Sunday, 24 March 2024 at 19:31:19 UTC, Csaba wrote:
> [...] What the hell? D slower than Python? This cannot be real. I am sure I am making a mistake here. [...]

As others suggested, pow is the problem. I noticed that the C versions are often much faster than their D counterparts. (And I don't view that as a problem, since both are built into the language - my only thought is that the D version should call the C version.) Changing

```d
import std.math : pow;
```

to

```d
import core.stdc.math : pow;
```

and leaving everything else unchanged, I get:

C: Avg execution time: 0.007918
D (original): Avg execution time: 0.102612
D (using core.stdc.math): Avg execution time: 0.008134

So more or less the exact same numbers if you use core.stdc.math.
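In context, the only change is which pow gets imported; a sketch (the loop mirrors the reconstruction near the top of the thread, the surrounding benchmark code stays unchanged):

```d
// Same function as before, but calling C's pow from the C runtime.
import core.stdc.math : pow; // double pow(double, double)

double leibniz(int iter) {
    double n = 1.0;
    for (int i = 2; i <= iter; i++)
        n += pow(-1.0, i - 1.0) / (i * 2.0 - 1.0);
    return n * 4.0;
}
```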
Mar 26
On Tuesday, 26 March 2024 at 14:25:53 UTC, Lance Bachmeier wrote:
> As others suggested, pow is the problem. [...] So more or less the exact same numbers if you use core.stdc.math.

And then the other thing is changing

```d
const int BENCHMARKS = 20;
```

to

```d
enum BENCHMARKS = 20;
```

which should allow substitution of the constant directly into the rest of the program. That gives

> Avg execution time: 0.007564

On my Ubuntu 22.04 machine, therefore, the LDC binary with no flags is slightly faster than the C code compiled with your flags.
Mar 26
I apologize for digressing a little bit further - just to share insights with other learners.

I had the question why my binary was so big (> 4M), and discovered the `gdc -Wall -O2 -frelease -shared-libphobos` options (now > 200K).

Then I tried to avoid the GC, having just learnt about this: the GC in the Leibniz code is there only for the writeln. With a change to (again standard C) printf, the `@nogc` modifier can be applied, and the binary then gets down to ~17K, a size comparable to the C counterpart.

Another observation regarding precision: the iteration proceeds in the wrong order. Adding the small contributions first and the bigger ones last leads to less loss when summing up the small parts below the final real/double LSB limit.

So I'm now at this code (abolishing the average of 20 iterations as unnecessary):

```d
// import std.stdio;  // writeln would pull in the garbage collector
import core.stdc.stdio : printf;
import std.datetime.stopwatch;

const int ITERATIONS = 1_000_000_000;

@nogc pure double leibniz(int it) { // sum up the small values first
    double n = 0.5*((it%2) ? -1.0 : 1.0) / (it * 2.0 + 1.0);
    for (int i = it-1; i >= 0; i--)
        n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return n * 4.0;
}

@nogc void main() {
    double result;
    double total_time = 0;

    auto sw = StopWatch(AutoStart.yes);
    result = leibniz(ITERATIONS);
    sw.stop();
    total_time = sw.peek.total!"nsecs";

    printf("%.16f\n", result);
    printf("Execution time: %f\n", total_time / 1e9);
}
```

result:

```
3.1415926535897931
Execution time: 1.068111
```
Mar 27
On Wednesday, 27 March 2024 at 08:22:42 UTC, rkompass wrote:
> I apologize for digressing a little bit further - just to share insights with other learners.

Good thing you're digressing; I am 45 years old and I still cannot say that I am finished as a student! For me this is version 4, and it looks like we don't need a 3rd variable other than the function parameter and return value:

```d
auto leibniz_v4(int i) @nogc pure {
    double n = 0.5*((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    while(--i >= 0)
        n += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return n * 4.0;
}
/*
3.1415926535892931
3.141592653589 793238462643383279502884197169399375105
3.141593653590774200000 (v1)

Avg execution time: 0.000033
*/
```

SDB@79
Mar 27
On Thursday, 28 March 2024 at 01:09:34 UTC, Salih Dincer wrote:
> Good thing you're digressing; I am 45 years old and I still cannot say that I am finished as a student! For me this is version 4, and it looks like we don't need a 3rd variable other than the function parameter and return value: [...]

So we go with another digression. I discovered parallel, and also avoided the extra variable, as suggested by Salih:

```d
import std.range;
import std.parallelism;
import core.stdc.stdio : printf;
import std.datetime.stopwatch;

enum ITERS = 1_000_000_000;
enum STEPS = 31; // 5 is fine, even numbers (e.g. 10) may give bad precision (for math reason ???)

pure double leibniz(int i) { // sum up the small values first
    double r = (i == ITERS) ? 0.5 * ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0) : 0.0;
    for (--i; i >= 0; i -= STEPS)
        r += ((i%2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return r * 4.0;
}

void main() {
    auto start = iota(ITERS, ITERS-STEPS, -1).array;
    auto sw = StopWatch(AutoStart.yes);

    double result = 0.0;
    foreach(s; start.parallel)
        result += leibniz(s);

    double total_time = sw.peek.total!"nsecs";
    printf("%.16f\n", result);
    printf("Execution time: %f\n", total_time / 1e9);
}
```

gives:

```
3.1415926535897931
Execution time: 0.211667
```

My laptop has 6 cores and obviously 5 are used in parallel by this. The original question related to a comparison between C, D and Python. Turning back to this: are there similarly simple libraries for C that allow for parallel computation?
Mar 28
On Thursday, 28 March 2024 at 11:50:38 UTC, rkompass wrote:
> Turning back to this: are there similarly simple libraries for C that allow for parallel computation?

You can achieve parallelism in C using libraries such as OpenMP, which provides a set of compiler directives and runtime library routines for parallel programming. Here's an example of how you might modify the code to use OpenMP for parallel processing:

```c
#include <stdio.h>
#include <time.h>
#include <omp.h>

#define ITERS 1000000000
#define STEPS 31

double leibniz(int i) {
    double r = (i == ITERS) ? 0.5 * ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0) : 0.0;
    for (--i; i >= 0; i -= STEPS)
        r += ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0);
    return r * 4.0;
}

int main() {
    double start_time = omp_get_wtime();
    double result = 0.0;

    #pragma omp parallel for reduction(+:result)
    for (int s = ITERS; s >= 0; s -= STEPS) {
        result += leibniz(s);
    }

    // Calculate the time taken
    double time_taken = omp_get_wtime() - start_time;

    printf("%.16f\n", result);
    printf("%f (seconds)\n", time_taken);
    return 0;
}
```

To compile this code with OpenMP support, you would use a command like `gcc -fopenmp your_program.c`. This tells the GCC compiler to enable OpenMP directives. The `#pragma omp parallel for` directive tells the compiler to parallelize the loop, and the `reduction` clause is used to safely accumulate the result variable across multiple threads.

SDB@79
Mar 28
On Thursday, 28 March 2024 at 14:07:43 UTC, Salih Dincer wrote:
> You can achieve parallelism in C using libraries such as OpenMP, which provides a set of compiler directives and runtime library routines for parallel programming. [...]

```c
. . .
#pragma omp parallel for reduction(+:result)
for (int s = ITERS; s >= 0; s -= STEPS) {
    result += leibniz(s);
}
. . .
```

> To compile this code with OpenMP support, you would use a command like gcc -fopenmp your_program.c. [...]

Nice, thank you.

It ran endlessly until I saw that I had to correct the `for` to `for (int s = ITERS; s > ITERS-STEPS; s--)`.

Now the result is:

```
3.1415926535897936
Execution time: 0.212483 (seconds).
```

This result is sooo similar! I didn't know that OpenMP programming could be that easy. The binary size is 16K, the same order of magnitude, although somewhat less. The D advantage is gone here, I would say.
Mar 28
On Thursday, 28 March 2024 at 20:18:10 UTC, rkompass wrote:
> The D advantage is gone here, I would say.

It's hard to compare, actually. std.parallelism has somewhat different mechanics, and I think it is easier to use. The syntax is nicer.

OpenMP is a well-known and highly adopted tool, which is also quite flexible, but it is usually applied to initially sequential code. And the syntax is not very intuitive.

Interesting point from Dr Russel here: https://forum.dlang.org/thread/qvksmhwkaxbrnggsvtxe@forum.dlang.org

However, since 2012 OpenMP has also seen development and improvement, and the HPC world is pretty conservative, so it is still one of the most popular tools in the area: https://www.openmp.org/wp-content/uploads/sc23-openmp-popularity-mattson.pdf

Together with MPI.. But probably with the AI and GPU revolution the balance will shift a bit towards CUDA-like technologies.
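For a taste of that syntax, here is a sketch of the same series summed with taskPool.reduce, modeled on the pi example in the std.parallelism documentation (untimed, just to contrast with the OpenMP pragma; the term function is ad hoc):

```d
// Parallel sum of the Leibniz series via taskPool.reduce.
import std.algorithm.iteration : map;
import std.parallelism : taskPool;
import std.range : iota;
import std.stdio : writefln;

void main() {
    enum n = 1_000_000_000;
    // Term i of the series; the pool splits the index range across cores.
    static double term(int i) { return ((i % 2) ? -1.0 : 1.0) / (i * 2.0 + 1.0); }
    immutable pi = 4.0 * taskPool.reduce!"a + b"(map!term(iota(n)));
    writefln("%.16f", pi);
}
```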
Mar 28
On Thursday, 28 March 2024 at 20:18:10 UTC, rkompass wrote:
> I didn't know that OpenMP programming could be that easy. The binary size is 16K, the same order of magnitude, although somewhat less. The D advantage is gone here, I would say.

There is no such thing as parallel programming in D anyway. At least it has the modules, but I haven't seen them work. Whenever I use the toys built into foreach() it always ends in disappointment :)

SDB@79
Mar 28
On Thursday, 28 March 2024 at 23:15:26 UTC, Salih Dincer wrote:
> There is no such thing as parallel programming in D anyway. At least it has the modules, but I haven't seen them work. Whenever I use the toys built into foreach() it always ends in disappointment

I think it just works :)
Which issues did you have with it?
Mar 28
On Friday, 29 March 2024 at 00:04:14 UTC, Serg Gini wrote:
> I think it just works :)
> Which issues did you have with it?

A year has passed and I have tried almost everything! Either it went into an infinite loop or nothing changed in the speed. At least things are not as simple as OpenMP on the D side! First I tried this code snippet: a futile attempt!

```d
struct RowlandSequence {
    import std.numeric : gcd;
    import std.format : format;
    import std.conv : text;

    long b, r, a = 3;
    enum empty = false;

    string[] front() {
        string result = format("%s, %s", b, r);
        return [text(a), result];
    }

    void popFront() {
        long result = 1;
        while(result == 1) {
            result = gcd(r++, b);
            b += result;
        }
        a = result;
    }
}

enum BP {
    f = 1, b = 7, r = 2, a = 1,
    /* f = 109, b = 186837516, r = 62279173, //*/
    s = 5
}

void main() {
    RowlandSequence rs;
    long start, skip;

    with(BP) {
        rs = RowlandSequence(b, r);
        start = f;
        skip = s;
    }
    rs.popFront();

    import std.stdio, std.parallelism;
    import std.range : take;

    auto rsFirst128 = rs.take(128);
    foreach(r; rsFirst128.parallel) {
        if(r[0].length > skip) {
            start.writeln(": ", r);
        }
        start++;
    }
}
```

SDB@79
Mar 28