digitalmars.D.learn - While loop on global variable optimised away?

ichneumwn <idonotenjoyemail idonotenjoyemail.org> writes:

Hi Forum,

I have a snippet of code as follows:
```
extern(C) extern __gshared uint g_count;

// inside a class member function:
   while(g_count <= count) {}
```

This is from a first draft of the code without proper thread
synchronisation. The global variable g_count is updated from a
bit of C++ code. As soon as I turn the optimiser on, the code
never gets past this point, leading me to suspect it gets
turned into
```
   while(true) {}
```

If I modify the code in the following way:

```
   import core.volatile : volatileLoad;

   while(volatileLoad(&g_count) <= count) {}
```

it works again.

My question is: have I hit a compiler bug (ldc 1.28.1, aarch64
[Raspberry Pi]), or is this part of the language design? I would
have thought that, since D uses thread-local storage by default,
it would be understood that a __gshared variable can get
modified by another thread. Access through an atomic function would
prevent the compiler from optimising this away as well, but if I
were to use a Mutex inside the loop, there is no way for the
compiler to tell *what* that Mutex is protecting, and it might
still decide to optimise the test away (assuming that is what is
happening; I did not attempt to look at the assembler code).

Cheers

May 11 2022

rikki cattermole <rikki cattermole.co.nz> writes:

Compiler optimizations should not be defined by a programming language 
specification.

This will be on LLVM.

May 11 2022

Johan <j j.nl> writes:

On Wednesday, 11 May 2022 at 09:34:20 UTC, ichneumwn wrote:
 Hi Forum,

 I have a snippet of code as follows:
 ```
 extern(C) extern __gshared uint g_count;

 // inside a class member function:
   while(g_count <= count) {}
 ```

 This is from a first draft of the code without proper thread
 synchronisation. The global variable g_count is updated from a
 bit of C++ code. As soon as I turn the optimiser on, the code
 never gets past this point, leading me to suspect it gets
 turned into
 ```
   while(true) {}
 ```

 If I modify the code in the following way:

 ```
   import core.volatile : volatileLoad;

   while(volatileLoad(&g_count) <= count) {}
 ```

 it works again.

 My question is: have I hit a compiler bug (ldc 1.28.1, aarch64
 [Raspberry Pi]), or is this part of the language design? I would
 have thought that, since D uses thread-local storage by default,
 it would be understood that a __gshared variable can get
 modified by another thread.

This is part of the language spec. The language assumes that 
there is a single thread running, and any thread synchronization 
must be done by the user. This is well known from C and C++, from 
which D (implicitly afaik) borrows the memory model.

Example: imagine loading a struct with 2 ulongs from shared 
memory: `auto s = global_struct_variable;`. Loading the data into 
local storage `s` - e.g. CPU registers - would happen in two 
steps, first member1, then member2 (simplified, let's assume it
spans across a cache boundary, etc.). During that load sequence,
another thread might write to the struct. If the language must 
have defined behavior in that situation (other thread write), 
then a global mutex lock/unlock must be done before/after _every_ 
read and write of shared data. That'd be a big performance impact 
on multithreading. Instead, single-thread execution is assumed, 
and thus the optimization is valid.

Your solution with `volatileLoad` is correct.
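
For comparison, here is a minimal sketch of the same wait loop written with `core.atomic`, the route the original post mentions as an alternative. It is an illustration under assumptions, not the poster's code: the counter is declared `shared` in D instead of being an `extern(C)` `__gshared` symbol, the writer is a D thread standing in for the C++ side, and `waitUntilAbove` and `runDemo` are hypothetical names. As far as I know, the druntime documentation describes `core.volatile` as aimed at memory-mapped I/O without atomicity or ordering guarantees, which is why the atomic variant may be the safer choice for cross-thread polling.

```d
import core.atomic : atomicLoad, atomicOp;
import core.thread : Thread;
import core.time : msecs;

// Hypothetical stand-in for the counter that the C++ side updates.
shared uint g_count;

// Spin until the counter exceeds `count`. atomicLoad performs a real
// memory read on every iteration, so the test cannot be hoisted out
// of the loop by the optimiser.
void waitUntilAbove(uint count)
{
    while (atomicLoad(g_count) <= count) {}
}

// Start a writer thread, wait for its increment, and return how far
// the counter moved while we were waiting.
uint runDemo()
{
    immutable start = atomicLoad(g_count);

    auto writer = new Thread({
        Thread.sleep(10.msecs);       // let the reader start spinning first
        atomicOp!"+="(g_count, 1);    // atomic increment by the "other side"
    });
    writer.start();

    waitUntilAbove(start);
    writer.join();
    return atomicLoad(g_count) - start;
}

void main()
{
    assert(runDemo() == 1);
}
```

Building this with optimisations on (for example `ldc2 -O`) should still terminate, because each loop iteration reloads the counter from memory.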

 Access through an atomic function would prevent the compiler from
 optimising this away as well, but if I were to use a Mutex
 inside the loop, there is no way for the compiler to tell
 *what* that Mutex is protecting, and it might still decide to
 optimise the test away (assuming that is what is happening; I did
 not attempt to look at the assembler code).

Any function call (inside the loop) for which it cannot be proven 
that it never modifies your memory variable will work. That's why 
I'm pretty sure that mutex lock/unlock will work.
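
A rough sketch of that mutex variant is below. The names `g_lock`, `readCount`, `bump` and `runDemo` are hypothetical, and the writer is a D thread standing in for the C++ code, which in a real program would have to take the same lock. The calls to `lock()` and `unlock()` are opaque to the optimiser, so it has to assume they may change `g_count` and reload it on every iteration.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;
import core.time : msecs;

__gshared uint g_count;   // hypothetical stand-in for the C++-updated counter
__gshared Mutex g_lock;   // hypothetical lock guarding g_count

shared static this()
{
    g_lock = new Mutex;
}

// Read the counter while holding the lock.
uint readCount()
{
    g_lock.lock();
    scope (exit) g_lock.unlock();
    return g_count;
}

// Increment the counter while holding the lock.
void bump()
{
    g_lock.lock();
    scope (exit) g_lock.unlock();
    ++g_count;
}

// Start a writer thread, poll under the lock until the counter moves,
// and return how far it moved.
uint runDemo()
{
    immutable start = readCount();

    auto writer = new Thread({
        Thread.sleep(10.msecs);
        bump();
    });
    writer.start();

    while (readCount() <= start) {}   // lock/unlock on every iteration
    writer.join();
    return readCount() - start;
}

void main()
{
    assert(runDemo() == 1);
}
```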


On Wednesday, 11 May 2022 at 09:37:26 UTC, rikki cattermole wrote:
 Compiler optimizations should not be defined by a programming 
 language specification.

This is not true. Compiler optimizations are valid if and only if 
they can be proven by the programming language specification. A 
compiler optimization can never change valid program behavior. If 
an optimization does change behavior, then either the program is 
invalid per the language spec, or the optimization is bugged (or 
the observed behavior change is outside the language spec, such 
as how long a program takes to execute).


-Johan

May 11 2022

ichneumwn <idonotenjoyemail idonotenjoyemail.org> writes:

On Wednesday, 11 May 2022 at 10:01:18 UTC, Johan wrote:

 Any function call (inside the loop) for which it cannot be 
 proven that it never modifies your memory variable will work. 
 That's why I'm pretty sure that mutex lock/unlock will work.

Thank you, in C I would not have been surprised. It was the 
default thread local storage that made me question it. Your 
remark about the mutex lock/unlock is very helpful, I was 
starting to get worried I would need to keep a very careful watch 
on those __gshareds [well, more than usual :) ]

May 11 2022

Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:

On Wednesday, 11 May 2022 at 10:01:18 UTC, Johan wrote:
 Any function call (inside the loop) for which it cannot be 
 proven that it never modifies your memory variable will work. 
 That's why I'm pretty sure that mutex lock/unlock will work.

I think the common semantics ought to be that everything written
by thread A before it releases the mutex will be visible to
thread B when it acquires the same mutex, and any assumptions
beyond this are non-portable?
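
To make that release/acquire reading concrete, here is a small sketch that reuses the two-`ulong` struct from Johan's example. All names (`Pair`, `g_pair`, `g_ready`, `g_lock`, `publish`, `tryTake`, `runDemo`) are hypothetical. Thread A writes both fields and a flag before releasing the mutex. Thread B only inspects them after acquiring the same mutex, so under these semantics it should never see a half-written struct.

```d
import core.sync.mutex : Mutex;
import core.thread : Thread;

struct Pair
{
    ulong a;
    ulong b;
}

__gshared Pair g_pair;    // hypothetical shared data, two machine words
__gshared bool g_ready;   // set once g_pair holds a complete value
__gshared Mutex g_lock;   // the one mutex both threads agree on

shared static this()
{
    g_lock = new Mutex;
}

// Thread A: write everything, then release the mutex.
void publish(ulong value)
{
    g_lock.lock();
    scope (exit) g_lock.unlock();
    g_pair = Pair(value, value);
    g_ready = true;
}

// Thread B: acquire the same mutex before looking at the data.
// Returns false if nothing has been published yet.
bool tryTake(out Pair result)
{
    g_lock.lock();
    scope (exit) g_lock.unlock();
    if (!g_ready)
        return false;
    result = g_pair;
    g_ready = false;
    return true;
}

// Publish from a second thread and poll until the value arrives.
Pair runDemo(ulong value)
{
    auto writer = new Thread({ publish(value); });
    writer.start();

    Pair seen;
    while (!tryTake(seen)) {}
    writer.join();
    return seen;
}

void main()
{
    auto p = runDemo(42);
    assert(p.a == 42 && p.b == 42);
}
```

Anything read outside the lock, or guarded by a different mutex, falls outside this guarantee.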

May 12 2022
