
digitalmars.D.ldc - LDC optimized builds causing unexpected behaviour when sharing

reply Keivan Shah <keivan.shah silverleafcaps.com> writes:
Hello,
I am encountering a strange bug when making an optimized build with 
LDC. Can someone please help me debug it?

```
void main()
{
     import std.stdio : writeln;
     import core.thread;

     // Running this code on LDC with `-O` leads to an infinite loop

     writeln("Starting...");
     bool done = false; // Making this `shared` also doesn't work
     new Thread({ done = true; }).start();
     while (!done)
     {
     } // Wait for done
     writeln("Done!");
}
```
The program ends up in an infinite loop if `-O` is passed to LDC 
and works normally without it. It also works fine in both cases with 
dmd. I thought this was due to the variable `done` being shared 
between threads, so I tried marking the variable as `shared`, but 
even that does not solve the problem. Making the variable global 
with `__gshared` doesn't work either. I was finally able to solve 
the problem by using 
[core.volatile](https://dlang.org/phobos/core_volatile.html), but 
that doesn't look like a great solution. So I wanted to understand 
the main problem here and verify that this is not an LDC bug.
Jun 01 2022
parent reply Johan <j j.nl> writes:
On Wednesday, 1 June 2022 at 07:02:08 UTC, Keivan Shah wrote:
 Hello,
 I am encountering a strange bug when making an optimized build 
 with LDC. Can someone please help me debug it?

 ```
 void main()
 {
     import std.stdio : writeln;
     import core.thread;

     // Running this code on LDC with `-O` leads to an infinite loop

     writeln("Starting...");
     bool done = false; // Making this `shared` also doesn't work
     new Thread({ done = true; }).start();
     while (!done)
     {
     } // Wait for done
     writeln("Done!");
 }
 ```
 The program ends up in an infinite loop if `-O` is passed to 
 LDC and works normally without it. It also works fine in both 
 cases with dmd. I thought this was due to the variable `done` 
 being shared between threads, so I tried marking the variable 
 as `shared`, but even that does not solve the problem. Making 
 the variable global with `__gshared` doesn't work either. I was 
 finally able to solve the problem by using 
 [core.volatile](https://dlang.org/phobos/core_volatile.html), 
 but that doesn't look like a great solution. So I wanted to 
 understand the main problem here and verify that this is not an 
 LDC bug.
This is not a bug.

The solution with `volatileLoad` works, but I think it technically isn't quite correct; the documentation says: "They should not be used for communication between threads."

There are many articles online about this kind of conditional loop on a variable that is modified by another thread. For example: 
https://stackoverflow.com/questions/10091112/using-a-global-variable-to-control-a-while-loop-in-a-separate-thread 
and 
https://devblogs.microsoft.com/oldnewthing/20180924-00/?p=99805

-Johan
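As a hypothetical sketch (mine, not from the thread) of what those articles describe: because `done` is a plain variable and nothing inside the loop body writes it, single-threaded semantics allow the optimizer to load it once and cache the result.

```
// Hypothetical sketch: what `-O` is allowed to turn the loop into.
// `done` is a plain bool, so the compiler may assume no other thread
// modifies it and hoist the load out of the loop:
void wait(ref bool done)
{
    bool cached = done;   // a single load, hoisted before the loop
    while (!cached)
    {
    } // never re-reads `done`: spins forever if `cached` was false
}
```

That is why the hang only appears in optimized builds: the unoptimized code merely happens to re-load `done` on every iteration.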
Jun 01 2022
parent reply Keivan Shah <keivan.shah silverleafcaps.com> writes:
On Wednesday, 1 June 2022 at 08:23:18 UTC, Johan wrote:
 This is not a bug.
 The solution with `volatileLoad` works, but I think it 
 technically isn't quite correct; documentation says: "They 
 should not be used for communication between threads.".

 There are many articles online about this kind of conditional 
 loop on a variable that is modified by another thread. For 
 example: 
 https://stackoverflow.com/questions/10091112/using-a-global-variable-to-control-a-while-loop-in-a-separate-thread and 
https://devblogs.microsoft.com/oldnewthing/20180924-00/?p=99805

 -Johan
Hey Johan,

Thanks for the quick response. As you mentioned, I also don't feel the solution with `volatileLoad` is correct, so I was looking for something else. The [documentation](https://dlang.org/articles/migrate-to-shared.html#shared) stated that
 Global data that is meant to be shared among multiple threads 
 should be marked with the shared keyword
So to me it seemed like marking the variable as `shared` should have made it accessible and also taken care of the memory ordering. That, plus the fact that the snippet works in `dmd` but not `ldc`, made me think it might be a bug. Anyway, I have a better understanding of the issue now; I will probably use [core.atomic.atomicFence](https://dlang.org/phobos/core_atomi

Thanks and Regards, 
Keivan Shah.
Jun 01 2022
next sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 So to me it seemed like marking the variable as `shared` should 
 have made it accessible as well as it takes care of the memory 
 ordering.
No, `shared` is a low-level primitive. If you want one-size-fits-all concurrency, use a mutex, a synchronized statement, or std.concurrency.
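For instance, here is a minimal sketch (my own illustration, not from this thread) of the same "wait for done" handshake using std.concurrency message passing instead of a shared flag:

```
import std.stdio : writeln;
import std.concurrency : spawn, send, receiveOnly, thisTid, Tid;

void worker(Tid owner)
{
    owner.send(true); // signal completion instead of flipping a flag
}

void main()
{
    writeln("Starting...");
    spawn(&worker, thisTid);
    receiveOnly!bool(); // blocks until the worker's message arrives
    writeln("Done!");
}
```

`receiveOnly` blocks, so there is no spin loop and no memory-ordering subtlety to reason about.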
Jun 01 2022
parent Keivan Shah <keivan.shah silverleafcaps.com> writes:
On Wednesday, 1 June 2022 at 15:31:40 UTC, Kagamin wrote:
 On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 So to me it seemed like marking the variable as `shared` 
 should have made it accessible as well as it takes care of the 
 memory ordering.
 No, `shared` is a low-level primitive. If you want one-size-fits-all concurrency, use a mutex, a synchronized statement, or std.concurrency.
Hey Kagamin,

Thanks, I am starting to understand that a little now. It seems I was relying on some incorrect assumptions and will need to make some overall logic changes. I will look into using synchronized, but my use-case might be fine with `core.atomic` for now, since I need a fast, lightweight way to share a few variables between threads.

Regards and Thanks, 
Keivan Shah.
Jun 01 2022
prev sibling parent reply kinke <noone nowhere.com> writes:
On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 probably will use 
 [core.atomic.atomicFence](https://dlang.org/phobos/core_atomi

`core.atomic.atomic{Load,Store}` handle this:

```
void main()
{
    import std.stdio : writeln;
    import core.thread;
    import core.atomic;

    writeln("Starting...");
    shared bool done = false;
    new Thread({ atomicStore(done, true); }).start();
    while (!atomicLoad(done))
    {
    } // Wait for done
    writeln("Done!");
}
```
Jun 01 2022
parent reply Keivan Shah <keivan.shah silverleafcaps.com> writes:
On Wednesday, 1 June 2022 at 16:46:51 UTC, kinke wrote:
 On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 probably will use 
 [core.atomic.atomicFence](https://dlang.org/phobos/core_atomi

 `core.atomic.atomic{Load,Store}` handle this:

 ```
 void main()
 {
     import std.stdio : writeln;
     import core.thread;
     import core.atomic;

     writeln("Starting...");
     shared bool done = false;
     new Thread({ atomicStore(done, true); }).start();
     while (!atomicLoad(done))
     {
     } // Wait for done
     writeln("Done!");
 }
 ```
Hey kinke,

Thanks for the snippet. I have some existing code that follows a pattern similar to the example I gave. There are a lot of variables shared between the 2 threads, so instead of changing the reads and writes of every variable to `atomic{Store,Load}`, I was wondering whether `atomicFence` might achieve the same effect with far fewer code changes. Would that be recommended, or should I use `atomic{Store,Load}` instead?

Regards and Thanks, 
Keivan Shah
Jun 01 2022
parent reply Johan <j j.nl> writes:
On Thursday, 2 June 2022 at 04:33:39 UTC, Keivan Shah wrote:
 On Wednesday, 1 June 2022 at 16:46:51 UTC, kinke wrote:
 On Wednesday, 1 June 2022 at 13:35:45 UTC, Keivan Shah wrote:
 probably will use 
 [core.atomic.atomicFence](https://dlang.org/phobos/core_atomi

 `core.atomic.atomic{Load,Store}` handle this:

 ```
 void main()
 {
     import std.stdio : writeln;
     import core.thread;
     import core.atomic;

     writeln("Starting...");
     shared bool done = false;
     new Thread({ atomicStore(done, true); }).start();
     while (!atomicLoad(done))
     {
     } // Wait for done
     writeln("Done!");
 }
 ```
 Hey kinke,

 Thanks for the snippet. I have some existing code that follows a pattern similar to the example I gave. There are a lot of variables shared between the 2 threads, so instead of changing the reads and writes of every variable to `atomic{Store,Load}`, I was wondering whether `atomicFence` might achieve the same effect with far fewer code changes. Would that be recommended, or should I use `atomic{Store,Load}` instead?
`atomicFence` only works for ordering memory operations on a single thread, not ordering between threads.

You can either use a mutex (to coordinate between threads), or you need all stores/loads to be atomic with `atomic{Store,Load}`. I am surprised that there is no `core.atomic!int` to simplify your life. Perhaps you should make a feature request :)

-Johan
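A rough sketch of what such a wrapper could look like (my own illustration; `Atomic` is a hypothetical name, not an existing druntime type):

```
import core.atomic : atomicLoad, atomicStore;

struct Atomic(T)
{
    private shared T _value;

    // Every read and write is forced through core.atomic.
    T load() shared { return atomicLoad(_value); }
    void store(T v) shared { atomicStore(_value, v); }
}

// Usage sketch:
//     shared Atomic!bool done;
//     done.store(true);          // writer thread
//     while (!done.load()) { }   // reader thread
```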
Jun 02 2022
next sibling parent reply Keivan Shah <keivan.shah silverleafcaps.com> writes:
On Thursday, 2 June 2022 at 11:56:07 UTC, Johan wrote:
 `atomicFence` only works for ordering memory operations on a 
 single thread, not ordering between threads.

 You can either use a mutex (to coordinate between threads), or 
 need all store/loads to be atomic with `atomic{Store,Load}`.
 I am surprised that there is no `core.atomic!int` to simplify 
 your life. Perhaps you should make a feature request :)

 -Johan
Hey Johan,

Thanks for the reply. I finally ended up using `atomic{Store,Load}` with some memory ordering after reading about them a bit. Atomic variables (`core.atomic!int`) would have been a great feature. I have no further use-cases for them right now, and they are probably a bit too complex for my understanding to make an informed request, but I will definitely make one when my understanding improves in the future.

Thanks and regards, 
Keivan Shah.
Jun 02 2022
parent reply rm <rdm e.email> writes:
On 02/06/2022 20:41, Keivan Shah wrote:
 On Thursday, 2 June 2022 at 11:56:07 UTC, Johan wrote:
 `atomicFence` only works for ordering memory operations on a single 
 thread, not ordering between threads.

 You can either use a mutex (to coordinate between threads), or need 
 all store/loads to be atomic with `atomic{Store,Load}`.
 I am surprised that there is no `core.atomic!int` to simplify your 
 life. Perhaps you should make a feature request :)

 -Johan
 Hey Johan,

 Thanks for the reply. I finally ended up using `atomic{Store,Load}` with some memory ordering after reading about them a bit. Atomic variables (`core.atomic!int`) would have been a great feature. I have no further use-cases for them right now, and they are probably a bit too complex for my understanding to make an informed request, but I will definitely make one when my understanding improves in the future.

 Thanks and regards, 
 Keivan Shah.
You can check this: https://github.com/rymrg/drm/blob/main/atomic.d

It works well enough for my needs at the moment, but it requires more extensive testing to see if it plays nicely with `@safe`. For `__gshared` it should work just fine.
Jun 06 2022
parent Kagamin <spam here.lot> writes:
On Monday, 6 June 2022 at 08:08:45 UTC, rm wrote:
 You can check this:
 https://github.com/rymrg/drm/blob/main/atomic.d
You should disable postblit, or it will be copyable without an atomic operation.
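In D that is a one-line guard, sketched here against a hypothetical atomic wrapper struct (mine, not from the linked repository):

```
struct Atomic(T)
{
    private shared T _value;

    @disable this(this); // forbid postblit: no non-atomic copies
}
```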
Jun 15 2022
prev sibling parent reply David Nadlinger <code klickverbot.at> writes:
On 2 Jun 2022, at 7:56, Johan via digitalmars-d-ldc wrote:
 `atomicFence` only works for ordering memory operations on a single 
 thread, not ordering between threads.
`atomicFence` should work across threads as well; it inserts actual processor fences as necessary, in addition to the requisite effect on compiler optimisations.

However, in practice, code using standalone fences is often harder to make sense of than code using atomic loads/stores, unless the relevant accesses are explicitly documented in enough detail. Just sprinkling `atomicFence` over an existing code base to make it thread-safe is pretty much guaranteed to fail. Thus, preferring atomic loads/stores is very sound advice in practice.

(There are also fences that are effectively for compiler optimisations only and usually go by volatile fences or signal fences; perhaps that’s what Johan was thinking of.)

— David
Jun 04 2022
parent Johan <j j.nl> writes:
On Saturday, 4 June 2022 at 21:33:30 UTC, David Nadlinger wrote:
 On 2 Jun 2022, at 7:56, Johan via digitalmars-d-ldc wrote:
 `atomicFence` only works for ordering memory operations on a 
 single thread, not ordering between threads.
`atomicFence` should work across threads as well; it inserts actual processor fences as necessary, in addition to the requisite effect on compiler optimisations.
What I meant to say is that the fence only works for ordering the memory operations of the thread that contains the fence, as observable by other threads. Indeed you use it to coordinate between threads.

If thread A has a fence somewhere, then that fence orders the memory operations of thread A that thread B will observe. So you can use variables from thread A in thread B, for example: `if (value_is_ok_flag) foo(value);`. The fence in thread A (between writing `value` and the flag) prevents `ok_flag == true` from being observed in thread B _before_ the new `value` is observed.

Because the OP's issue is not a thread-synchronization issue, but a re-read optimization (the compiler only needs to read the variable once, instead of upon every loop iteration), the atomic fence is likely not to work. I think atomicLoad works because it is _both_ `volatile` and `atomic`.

cheers, 
Johan
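A sketch of that writer-side pattern (my own illustration; names follow the `value`/`ok_flag` example above):

```
import core.atomic : atomicFence;

__gshared int value;
__gshared bool value_is_ok_flag;

void threadA()
{
    value = 42;
    atomicFence();            // orders the write to `value` before the flag
    value_is_ok_flag = true;  // a reader that sees this also sees `value`
}
```

Note this only orders thread A's writes; it does nothing to stop thread B's compiler from caching its own reads, which is exactly why a fence alone cannot fix the OP's spin loop.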
Jun 05 2022