
digitalmars.D.ldc - LDC optimized builds causing unexpected behaviour when sharing

reply Keivan Shah <keivan.shah silverleafcaps.com> writes:
Hello,
I am encountering a strange bug when making an optimized build with 
LDC. Can someone please help me debug it?

```
void main()
{
     import std.stdio : writeln;
     import core.thread;

     // Running this code on LDC with `-O` leads to an infinite loop

     writeln("Starting...");
     bool done = false; // Making this `shared` also doesn't work
     new Thread({ done = true; }).start();
     while (!done)
     {
     } // Wait for done
     writeln("Done!");
}
```
The program ends up in an infinite loop if `-O` is passed to LDC 
and works normally without it. It also works fine in both cases with 
dmd. I thought this was due to the variable `done` being shared 
between threads, so I tried marking the variable as `shared`, but 
even that does not solve the problem. Making the variable global 
with `__gshared` doesn't work either. I was finally able to solve 
the problem by using 
[core.volatile](https://dlang.org/phobos/core_volatile.html), but 
that doesn't look like a great solution. So I wanted to understand 
the main problem here and verify that this is not an LDC bug.
Jun 01 2022
parent reply Johan <j j.nl> writes:
On Wednesday, 1 June 2022 at 07:02:08 UTC, Keivan Shah wrote:
 Hello,
 I am encountering a strange bug when making an optimized build 
 with LDC. Can someone please help me debug it?

 ```
 void main()
 {
     import std.stdio : writeln;
     import core.thread;

     // Running this code on LDC with `-O` leads to an infinite loop

     writeln("Starting...");
     bool done = false; // Making this `shared` also doesn't work
     new Thread({ done = true; }).start();
     while (!done)
     {
     } // Wait for done
     writeln("Done!");
 }
 ```
 The program ends up in an infinite loop if `-O` is passed to 
 LDC and works normally without it. It also works fine in both 
 cases with dmd. I thought this was due to the variable `done` 
 being shared between threads, so I tried marking the variable 
 as `shared`, but even that does not solve the problem. Making 
 the variable global with `__gshared` doesn't work either. I was 
 finally able to solve the problem by using 
 [core.volatile](https://dlang.org/phobos/core_volatile.html), 
 but that doesn't look like a great solution. So I wanted to 
 understand the main problem here and verify that this is not an 
 LDC bug.
This is not a bug.

The solution with `volatileLoad` works, but I think it technically isn't quite correct; the documentation says: "They should not be used for communication between threads."

There are many articles online about this kind of conditional loop on a variable that is modified by another thread. For example: 
https://stackoverflow.com/questions/10091112/using-a-global-variable-to-control-a-while-loop-in-a-separate-thread 
and 
https://devblogs.microsoft.com/oldnewthing/20180924-00/?p=99805

-Johan
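As a hypothetical sketch (mine, not from the thread) of what those articles describe: because `done` is a plain variable and nothing inside the loop body writes it, single-threaded semantics allow the optimizer to load it once and cache the result.

```
// Hypothetical sketch: what `-O` is allowed to turn the loop into.
// `done` is a plain bool, so the compiler may assume no other thread
// modifies it and hoist the load out of the loop:
void wait(ref bool done)
{
    bool cached = done;   // a single load, hoisted before the loop
    while (!cached)
    {
    } // never re-reads `done`: spins forever if `cached` was false
}
```

That is why the hang only appears in optimized builds: the unoptimized code merely happens to re-load `done` on every iteration.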
Jun 01 2022
parent reply Keivan Shah <keivan.shah silverleafcaps.com> writes:
On Wednesday, 1 June 2022 at 08:23:18 UTC, Johan wrote:
 This is not a bug.
 The solution with `volatileLoad` works, but I think it 
 technically isn't quite correct; documentation says: "They 
 should not be used for communication between threads.".

 There are many articles online about this kind of conditional 
 loop on a variable that is modified by another thread. For 
 example: 
 https://stackoverflow.com/questions/10091112/using-a-global-variable-to-control-a-while-loop-in-a-separate-thread and 
https://devblogs.microsoft.com/oldnewthing/20180924-00/?p=99805

 -Johan
Hey Johan,

Thanks for the quick response. As you mentioned, I also don't feel the solution with `volatileLoad` is correct, so I was looking for something else. The [documentation](https://dlang.org/articles/migrate-to-shared.html#shared) stated that
 Global data that is meant to be shared among multiple threads 
 should be marked with the shared keyword
So to me it seemed like marking the variable as `shared` should have made it accessible and also taken care of the memory ordering. That, plus the fact that the snippet works in `dmd` but not `ldc`, made me think it might be a bug. Anyway, I have a better understanding of the issue now; I will probably use [core.atomic.atomicFence](https://dlang.org/phobos/core_atomi

Thanks and Regards, 
Keivan Shah.
Jun 01 2022
next sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 So to me it seemed like marking the variable as `shared` should 
 have made it accessible as well as it takes care of the memory 
 ordering.
No, `shared` is a low-level primitive. If you want one-size-fits-all concurrency, use a mutex, a synchronized statement, or std.concurrency.
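For instance, here is a minimal sketch (my own illustration, not from this thread) of the same "wait for done" handshake using std.concurrency message passing instead of a shared flag:

```
import std.stdio : writeln;
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;

void worker(Tid owner)
{
    owner.send(true); // signal completion instead of flipping a flag
}

void main()
{
    writeln("Starting...");
    spawn(&worker, thisTid);
    receiveOnly!bool(); // blocks until the worker's message arrives
    writeln("Done!");
}
```

`receiveOnly` blocks, so there is no spin loop and no memory-ordering subtlety to reason about.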
Jun 01 2022
parent Keivan Shah <keivan.shah silverleafcaps.com> writes:
On Wednesday, 1 June 2022 at 15:31:40 UTC, Kagamin wrote:
 On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 So to me it seemed like marking the variable as `shared` 
 should have made it accessible as well as it takes care of the 
 memory ordering.
 No, `shared` is a low-level primitive. If you want one-size-fits-all concurrency, use a mutex, a synchronized statement, or std.concurrency.
Hey Kagamin,

Thanks, I am starting to understand that a little now. It seems I was relying on some incorrect assumptions and will need to make some overall logic changes. I will look into using synchronized, but my use-case might be fine with `core.atomic` for now, since I need a fast, lightweight way to share a few variables between threads.

Regards and Thanks, 
Keivan Shah.
Jun 01 2022
prev sibling parent reply kinke <noone nowhere.com> writes:
On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 probably will use 
 [core.atomic.atomicFence](https://dlang.org/phobos/core_atomi

`core.atomic.atomic{Load,Store}` handle this:

```
void main()
{
    import std.stdio : writeln;
    import core.thread;
    import core.atomic;

    writeln("Starting...");
    shared bool done = false;
    new Thread({ atomicStore(done, true); }).start();
    while (!atomicLoad(done))
    {
    } // Wait for done
    writeln("Done!");
}
```
Jun 01 2022
parent reply Keivan Shah <keivan.shah silverleafcaps.com> writes:
On Wednesday, 1 June 2022 at 16:46:51 UTC, kinke wrote:
 On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 probably will use 
 [core.atomic.atomicFence](https://dlang.org/phobos/core_atomi

 `core.atomic.atomic{Load,Store}` handle this:

 ```
 void main()
 {
     import std.stdio : writeln;
     import core.thread;
     import core.atomic;

     writeln("Starting...");
     shared bool done = false;
     new Thread({ atomicStore(done, true); }).start();
     while (!atomicLoad(done))
     {
     } // Wait for done
     writeln("Done!");
 }
 ```
Hey kinke,

Thanks for the snippet. I have some existing code that follows a pattern similar to the example I gave. There are a lot of variables shared between the 2 threads, so instead of changing the reads and writes of every variable to `atomic{Store,Load}`, I was wondering whether `atomicFence` might achieve the same effect with far fewer code changes. Would that be recommended, or should I use `atomic{Store,Load}` instead?

Regards and Thanks, 
Keivan Shah
Jun 01 2022
parent reply Johan <j j.nl> writes:
On Thursday, 2 June 2022 at 04:33:39 UTC, Keivan Shah wrote:
 On Wednesday, 1 June 2022 at 16:46:51 UTC, kinke wrote:
 On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 probably will use 
 [core.atomic.atomicFence](https://dlang.org/phobos/core_atomi

 `core.atomic.atomic{Load,Store}` handle this:

 ```
 void main()
 {
     import std.stdio : writeln;
     import core.thread;
     import core.atomic;

     writeln("Starting...");
     shared bool done = false;
     new Thread({ atomicStore(done, true); }).start();
     while (!atomicLoad(done))
     {
     } // Wait for done
     writeln("Done!");
 }
 ```
 Hey kinke,

 Thanks for the snippet. I have some existing code that follows a pattern similar to the example I gave. There are a lot of variables shared between the 2 threads, so instead of changing the reads and writes of every variable to `atomic{Store,Load}`, I was wondering whether `atomicFence` might achieve the same effect with far fewer code changes. Would that be recommended, or should I use `atomic{Store,Load}` instead?
`atomicFence` only works for ordering memory operations on a single thread, not ordering between threads.

You can either use a mutex (to coordinate between threads), or you need all stores/loads to be atomic with `atomic{Store,Load}`. I am surprised that there is no `core.atomic!int` to simplify your life. Perhaps you should make a feature request :)

-Johan
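A rough sketch of what such a wrapper could look like (my own illustration; `Atomic` is a hypothetical name, not an existing druntime type):

```
import core.atomic : atomicLoad, atomicStore;

struct Atomic(T)
{
    private shared T _value;

    // Every read and write is forced through core.atomic.
    T load() shared { return atomicLoad(_value); }
    void store(T v) shared { atomicStore(_value, v); }
}

// Usage sketch:
//     shared Atomic!bool done;
//     done.store(true);          // writer thread
//     while (!done.load()) { }   // reader thread
```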
Jun 02 2022
next sibling parent reply Keivan Shah <keivan.shah silverleafcaps.com> writes:
On Thursday, 2 June 2022 at 11:56:07 UTC, Johan wrote:
 `atomicFence` only works for ordering memory operations on a 
 single thread, not ordering between threads.

 You can either use a mutex (to coordinate between threads), or 
 need all store/loads to be atomic with `atomic{Store,Load}`.
 I am surprised that there is no `core.atomic!int` to simplify 
 your life. Perhaps you should make a feature request :)

 -Johan
Hey Johan,

Thanks for the reply. I finally ended up using `atomic{Store,Load}` with some memory ordering after reading about them a bit. Atomic variables (`core.atomic!int`) would have been a great feature. I have no further use-cases for them right now, and they are probably a bit too complex for my understanding to make an informed request, but I will definitely make one when my understanding improves in the future.

Thanks and regards, 
Keivan Shah.
Jun 02 2022
parent reply rm <rdm e.email> writes:
On 02/06/2022 20:41, Keivan Shah wrote:
 On Thursday, 2 June 2022 at 11:56:07 UTC, Johan wrote:
 `atomicFence` only works for ordering memory operations on a single 
 thread, not ordering between threads.

 You can either use a mutex (to coordinate between threads), or need 
 all store/loads to be atomic with `atomic{Store,Load}`.
 I am surprised that there is no `core.atomic!int` to simplify your 
 life. Perhaps you should make a feature request :)

 -Johan
 Hey Johan,

 Thanks for the reply. I finally ended up using `atomic{Store,Load}` with some memory ordering after reading about them a bit. Atomic variables (`core.atomic!int`) would have been a great feature. I have no further use-cases for them right now, and they are probably a bit too complex for my understanding to make an informed request, but I will definitely make one when my understanding improves in the future.

 Thanks and regards, 
 Keivan Shah.
You can check this: https://github.com/rymrg/drm/blob/main/atomic.d

It works well enough for my needs at the moment, but it requires more extensive testing to see if it plays nicely with `@safe`. For `__gshared` it should work just fine.
Jun 06 2022
parent Kagamin <spam here.lot> writes:
On Monday, 6 June 2022 at 08:08:45 UTC, rm wrote:
 You can check this:
 https://github.com/rymrg/drm/blob/main/atomic.d
You should disable postblit, or it will be copyable without an atomic operation.
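In D that is a one-line guard, sketched here against a hypothetical atomic wrapper struct (mine, not from the linked repository):

```
struct Atomic(T)
{
    private shared T _value;

    @disable this(this); // forbid postblit: no non-atomic copies
}
```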
Jun 15 2022
prev sibling parent reply David Nadlinger <code klickverbot.at> writes:
On 2 Jun 2022, at 7:56, Johan via digitalmars-d-ldc wrote:
 `atomicFence` only works for ordering memory operations on a single 
 thread, not ordering between threads.
`atomicFence` should work across threads as well; it inserts actual processor fences as necessary, in addition to the requisite effect on compiler optimisations.

However, in practice, code using standalone fences is often harder to make sense of than code using atomic loads/stores, unless the relevant accesses are explicitly documented in enough detail. Just sprinkling `atomicFence` over an existing code base to make it thread-safe is pretty much guaranteed to fail. Thus, preferring atomic loads/stores is very sound advice in practice.

(There are also fences that are effectively for compiler optimisations only and usually go by volatile fences or signal fences; perhaps that’s what Johan was thinking of.)

— David
Jun 04 2022
parent Johan <j j.nl> writes:
On Saturday, 4 June 2022 at 21:33:30 UTC, David Nadlinger wrote:
 On 2 Jun 2022, at 7:56, Johan via digitalmars-d-ldc wrote:
 `atomicFence` only works for ordering memory operations on a 
 single thread, not ordering between threads.
`atomicFence` should work across threads as well; it inserts actual processor fences as necessary, in addition to the requisite effect on compiler optimisations.
What I meant to say is that the fence only works for ordering the memory operations of the thread that contains the fence, as observable by other threads. Indeed you use it to coordinate between threads.

If thread A has a fence somewhere, then that fence orders the memory operations of thread A that thread B will observe. So you can use variables from thread A in thread B, for example: `if (value_is_ok_flag) foo(value);`. The fence in thread A (between writing `value` and the flag) prevents `ok_flag == true` from being observed in thread B _before_ the new `value` is observed.

Because the OP's issue is not a thread-synchronization issue, but a re-read optimization (the compiler only needs to read the variable once, instead of upon every loop iteration), the atomic fence is likely not to work. I think atomicLoad works because it is _both_ `volatile` and `atomic`.

cheers, 
Johan
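A sketch of that writer-side pattern (my own illustration; names follow the `value`/`ok_flag` example above):

```
import core.atomic : atomicFence;

__gshared int value;
__gshared bool value_is_ok_flag;

void threadA()
{
    value = 42;
    atomicFence();            // orders the write to `value` before the flag
    value_is_ok_flag = true;  // a reader that sees this also sees `value`
}
```

Note this only orders thread A's writes; it does nothing to stop thread B's compiler from caching its own reads, which is exactly why a fence alone cannot fix the OP's spin loop.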
Jun 05 2022