
digitalmars.D - synchronized class bugs?

reply Gregor Mückl <gregormueckl gmx.de> writes:
I've been playing around with synchronized classes. My example is 
the following dummy class:

     synchronized class Shared {
     public:
         void increment(int value) {
             //(cast(Shared) this).x += value;
             x += value;
         }

         void decrement(int value) {
             //(cast(Shared) this).x -= value;
             x -= value;
         }

         shared int get() { return x; }

     private:
         int x;
     }

Then I'm calling increment and decrement many times in parallel 
threads so that they *should* cancel out, leaving x back at 0 at 
the end. With the implicit synchronization using D monitors, this 
should work.

Bug number 1: dmd doesn't translate this without casting away 
shared:

sharedtest.d(14): Error: read-modify-write operations are not 
allowed for shared variables. Use 
core.atomic.atomicOp!"+="(this.x, value) instead.

The += and -= operators are safe as they are inside locked 
monitors. The emitted code contains calls to _d_monitorenter and 
_d_monitorexit. The compiler should understand this.
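
For reference, a minimal sketch of what the error message asks for 
(Shared2 is a hypothetical name, not from my real code); it compiles, 
but the atomic read-modify-write is redundant while the monitor is held:

    import core.atomic : atomicOp;

    synchronized class Shared2 {
    public:
        void increment(int value) {
            // Atomic RMW as the compiler demands -- redundant here,
            // because the monitor is already locked for this method.
            atomicOp!"+="(x, value);
        }

    private:
        int x;
    }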

Bug number 2: even when casting away shared, the end result is 
wrong when using multiple threads. Thus, the monitor locking code 
in druntime cannot be correct. It does have an effect, though. 
Omitting synchronized on Shared results in values that are wide of 
the mark. Including it results in small differences. This 
suggests that a memory fence might be missing somewhere.

The observed reality contradicts the documentation in many ways 
and is inconsistent with itself. What is going on here?
Apr 07 2020
next sibling parent reply IGotD- <nise nise.com> writes:
On Tuesday, 7 April 2020 at 15:18:41 UTC, Gregor Mückl wrote:
 I've been playing around with synchronized classes. My example is 
 the following dummy class:

     synchronized class Shared {
     public:
         void increment(int value) {
             //(cast(Shared) this).x += value;
             x += value;
         }

         void decrement(int value) {
             //(cast(Shared) this).x -= value;
             x -= value;
         }

         shared int get() { return x; }

     private:
         int x;
     }

 Then I'm calling increment and decrement many times in parallel 
 threads so that they *should* cancel out, leaving x back at 0 
 at the end. With the implicit synchronization using D monitors, 
 this should work.

 Bug number 1: dmd doesn't translate this without casting away 
 shared:

 sharedtest.d(14): Error: read-modify-write operations are not 
 allowed for shared variables. Use 
 core.atomic.atomicOp!"+="(this.x, value) instead.

 The += and -= operators are safe as they are inside locked 
 monitors. The emitted code contains calls to _d_monitorenter and 
 _d_monitorexit. The compiler should understand this.

 Bug number 2: even when casting away shared, the end result is 
 wrong when using multiple threads. Thus, the monitor locking 
 code in druntime cannot be correct. It does have an effect, 
 though. Omitting synchronized on Shared results in values that 
 are wide of the mark. Including it results in small 
 differences. This suggests that a memory fence might be missing 
 somewhere.

 The observed reality contradicts the documentation in many ways 
 and is inconsistent with itself. What is going on here?
Correct me if I'm wrong, but aren't all synchronized classes protected by a mutex? In that case atomic operations are pointless, as all methods are protected by the mutex anyway.
Apr 07 2020
parent reply Gregor Mückl <gregormueckl gmx.de> writes:
On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
 Correct me if I'm wrong, but aren't all synchronized classes 
 protected by a mutex? In that case atomic operations are 
 pointless, as all methods are protected by the mutex anyway.
That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But:

- the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
- the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong
Apr 07 2020
parent reply IGotD- <nise nise.com> writes:
On Tuesday, 7 April 2020 at 16:18:53 UTC, Gregor Mückl wrote:
 On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
 Correct me if I'm wrong, but aren't all synchronized classes 
 protected by a mutex? In that case atomic operations are 
 pointless, as all methods are protected by the mutex anyway.
That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But:

- the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
- the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong
I think this is correct: you should be allowed to cast, or do whatever operation you want within the mutex, atomic or not. The wrong result sounds more like a bug in the locking itself. One way to check is to try this on another operating system and see if you get a similar result.
Apr 07 2020
parent reply Gregor Mückl <gregormueckl gmx.de> writes:
On Tuesday, 7 April 2020 at 16:33:13 UTC, IGotD- wrote:
 On Tuesday, 7 April 2020 at 16:18:53 UTC, Gregor Mückl wrote:
 On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
 Correct me if I'm wrong, but aren't all synchronized classes 
 protected by a mutex? In that case atomic operations are 
 pointless, as all methods are protected by the mutex anyway.
That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But:

- the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
- the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong
 I think this is correct: you should be allowed to cast, or do whatever operation you want within the mutex, atomic or not.
I don't think you understand: the compiler actively prohibits the use of non-atomic operations in synchronized functions. That makes no sense.
 The wrong result sounds more like a bug in the locking itself. 
 One way to check is to try this on another operating system and 
 see if you get a similar result.
The results are wrong with dmd 2.088 on Windows, and with dmd 2.091, gdc 9.3.0 and ldc 1.20.1 on Linux.
Apr 07 2020
parent IGotD- <nise nise.com> writes:
On Tuesday, 7 April 2020 at 23:34:13 UTC, Gregor Mückl wrote:
 On Tuesday, 7 April 2020 at 16:33:13 UTC, IGotD- wrote:

 I don't think you understand: the compiler actively prohibits 
 the use of non-atomic operations in synchronized functions. 
 That makes no sense.
Sorry, I misunderstood. I agree, that doesn't make any sense at all.
Apr 07 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/7/20 11:18 AM, Gregor Mückl wrote:
 I've been playing around with synchronized classes. My example is the 
 following dummy class:
 
      synchronized class Shared {
      public:
          void increment(int value) {
              //(cast(Shared) this).x += value;
              x += value;
          }
 
          void decrement(int value) {
              //(cast(Shared) this).x -= value;
              x -= value;
          }
 
          shared int get() { return x; }
 
      private:
          int x;
      }
 
 Then I'm calling increment and decrement many times in parallel threads 
 so that they *should* cancel out, leaving x back at 0 at the end. With 
 the implicit synchronization using D monitors, this should work.
 
 Bug number 1: dmd doesn't translate this without casting away shared:
 
 sharedtest.d(14): Error: read-modify-write operations are not allowed 
 for shared variables. Use core.atomic.atomicOp!"+="(this.x, value) instead.
 
 The += and -= operators are safe as they are inside locked monitors. The 
 emitted code contains calls to _d_monitorenter and _d_monitorexit. The 
 compiler should understand this.
The requirement for atomicOp is separate from the synchronization. The synchronization is only sound because you made it sound. In fact, if you did:

    auto getPtr() { return &x; }

then the synchronization all of a sudden does not apply to that variable. This is why we are moving to REQUIRING casting away shared to read/write data -- you have to do it anyway.
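
To make the hazard concrete, a minimal sketch (expanding the getPtr example above into a compilable unit; the names are illustrative):

    synchronized class Shared {
    public:
        // The pointer to x escapes the class; the inferred type is shared(int)*.
        shared(int)* getPtr() { return &x; }

        void increment(int value) { (cast(Shared) this).x += value; }

    private:
        int x;
    }

    void main() {
        shared Shared s = new shared Shared();
        shared(int)* p = s.getPtr(); // the monitor is held only during this call
        // Any later access through p happens with no lock held, so the
        // monitor no longer guards every access to x. The compiler cannot
        // prove that x is touched only under the lock.
    }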
 Bug number 2: even when casting away shared, the end result is wrong 
 when using multiple threads. Thus, the monitor locking code in druntime 
 cannot be correct. It does have an effect, though. Omitting synchronized 
 on Shared results in values that are wide of the mark. Including it 
 results in small differences. This suggests that a memory fence might be 
 missing somewhere.
 
 The observed reality contradicts the documentation in many ways and is 
 inconsistent with itself. What is going on here?
synchronized classes call _d_monitorenter and _d_monitorexit, which lock OS primitives (and initialize the monitor if necessary). These have not been touched in years, and depend on the OS mutex, which should be rock-solid. Looking at the history, it looks like the last significant change was https://github.com/dlang/druntime/commit/70b3536ddd06e8e1afb269b88ed3298ec03b8798#diff-d609308f6ce106db16e214b30db6f74 in 2015.

I've definitely depended on synchronized functions in the past, and haven't had issues, but it was in the far past.

The only place I can reasonably assume that there might be an issue is the initialization.

To test this, try initializing the lock FIRST before spawning multiple threads (just call one of the functions), and see if it helps. If so, it might be a bug in the initialization routines. (See the sketch below.)

Also if you find an issue, it may be specific to your OS.
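
A minimal sketch of what I mean, assuming the Shared class from the original post: touch the object once on the main thread so the monitor exists before any worker can race to create it:

    void main()
    {
        shared Shared s = new shared Shared();

        // First call: _d_monitorenter lazily creates the monitor here,
        // on the main thread, before any worker thread exists.
        s.increment(0);
        s.decrement(0);

        // ... spawn the worker threads only after this point ...
    }

-Steve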
Apr 08 2020
parent reply Gregor Mückl <gregormueckl gmx.de> writes:
On Wednesday, 8 April 2020 at 13:43:08 UTC, Steven Schveighoffer 
wrote:
 This is why we are moving to REQUIRING casting away shared to 
 read/write data -- you have to do it anyway.
OK, I understand the rationale, even though I don't like the solution very much. You shouldn't really use atomic ops when locks on the affected variables are held. That's bad practice.
 Bug number 2: even when casting away shared, the end result is 
 wrong when using multiple threads. Thus, the monitor locking 
 code in druntime cannot be correct. It does have an effect, 
 though. Omitting synchronized on Shared results in values that 
 are wide of the mark. Including it results in small 
 differences. This suggests that a memory fence might be 
 missing somewhere.
 
 The observed reality contradicts the documentation in many 
 ways and is inconsistent with itself. What is going on here?
synchronized classes call _d_monitorenter and _d_monitorexit, which lock OS primitives (and initialize the monitor if necessary). These have not been touched in years, and depend on the OS mutex, which should be rock-solid. Looking at the history, it looks like the last significant change was https://github.com/dlang/druntime/commit/70b3536ddd06e8e1afb269b88ed3298ec03b8798#diff-d609308f6ce106db16e214b30db6f74 in 2015.
The calls to _d_monitorenter and _d_monitorexit are in the generated code, and I looked up their implementation before posting to make sure I understand them correctly. I would also assume mutexes to be rock-solid. This is why I am in utter disbelief about the actual behavior of the toy program (see code at the end).
 I've definitely depended on synchronized functions in the past, 
 and haven't had issues, but it was in the far past.

 The only place I can reasonably assume that there might be an 
 issue is for the initialization.

 To test this, try initializing the lock FIRST before spawning 
 multiple threads (just call one of the functions), and see if 
 it helps. If so, it might be a bug in the initialization 
 routines.
I added such function calls and that didn't change anything.
 Also if you find an issue, it may be specific to your OS.
Well, as I've mentioned before, I've reproduced this on Linux and Windows with dmd, ldc and gdc. So it's not OS specific, or at least not entirely.
 -Steve
The full program code is below the line. You can run it on run.dlang.org to reproduce the problem.

----

import core.thread;
import std.stdio;

version = Sync;

version(Sync) {
    pragma(msg, "synchronized");

    synchronized class Shared {
    public:
        void increment(int value) {
            (cast(Shared) this).x += value;
        }

        void decrement(int value) {
            (cast(Shared) this).x -= value;
        }

        shared int get() { return x; }

    private:
        int x;
    }
} else {
    pragma(msg, "non-synchronized");

    class Shared {
    public:
        shared void increment(int value) {
            (cast(Shared) this).x += value;
        }

        shared void decrement(int value) {
            (cast(Shared) this).x -= value;
        }

        shared int get() { return x; }

    private:
        int x;
    }
}

void main()
{
    shared Shared s = new shared Shared();

    s.increment(1);
    s.decrement(1);
    writeln(s.get());

    Thread[] threads = new Thread[10];
    for(int i = 0; i < threads.length; i++) {
        threads[i] = new Thread({
            for(int j = 0; j < 1_000_000; j++) {
                s.increment(i);
            }
            for(int j = 0; j < 1_000_000; j++) {
                s.decrement(i);
            }
        });
        threads[i].start();
    }

    for(int i = 0; i < threads.length; i++) {
        threads[i].join();
    }

    writeln(s.get());
}
Apr 08 2020
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/8/20 6:50 PM, Gregor Mückl wrote:
 On Wednesday, 8 April 2020 at 13:43:08 UTC, Steven Schveighoffer wrote:
 This is why we are moving to REQUIRING casting away shared to 
 read/write data -- you have to do it anyway.
OK, I understand the rationale, even though I don't like the solution very much. You shouldn't really use atomic ops when locks on the affected variables are held. That's bad practice.
No, you misunderstand. The correct path is to cast away shared for those variables that you manually ensure cannot be touched outside your class, not to use atomic ops. But the compiler has no way to prove you did it right, which is why the casting is required. The expectation is that frameworks will emerge that build on this requirement and make it much more convenient.
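
To make the intended pattern concrete, a minimal sketch (the soundness argument is yours to supply: x must never escape the class):

    synchronized class Shared {
    public:
        void increment(int value) {
            // The monitor is held for the whole method body, so it is
            // safe to view the data as unshared here -- but only because
            // x never escapes the class (no getPtr-style leaks).
            auto self = cast(Shared) this;
            self.x += value;
        }

    private:
        int x;
    }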
 
 The full program code is below the line. You can run it on 
 run.dlang.org to reproduce the problem.
OK, will look to see what I can find.

-Steve
Apr 08 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/8/20 6:50 PM, Gregor Mückl wrote:

 void main()
 {
      shared Shared s = new shared Shared();
 
      s.increment(1);
      s.decrement(1);
      writeln(s.get());
 
      Thread[] threads = new Thread[10];
      for(int i = 0; i < threads.length; i++) {
          threads[i] = new Thread({
              for(int j = 0; j < 1_000_000; j++) {
                  s.increment(i);
              }
              for(int j = 0; j < 1_000_000; j++) {
                  s.decrement(i);
              }
          });
          threads[i].start();
      }
 
      for(int i = 0; i < threads.length; i++) {
          threads[i].join();
      }
 
      writeln(s.get());
 }
 
The issue is not synchronization of the class data, but your promiscuous usage of the stack frame ;)

Try this instead for your loop:

for(int i = 0; i < threads.length; i++) {
    threads[i] = (int val) {
        return new Thread({
            for(int j = 0; j < 1_000_000; j++) {
                s.increment(val);
            }
            for(int j = 0; j < 1_000_000; j++) {
                s.decrement(val);
            }
        });
    }(i);
    threads[i].start();
}

Why doesn't yours work? Because you are starting your threads and then changing the value of i as you iterate the loop. This works because it captures the i value as a new variable in a closure, so `val` remains constant while `i` increments.

-Steve
Apr 08 2020
parent reply Gregor Mückl <gregormueckl gmx.de> writes:
On Wednesday, 8 April 2020 at 23:25:09 UTC, Steven Schveighoffer 
wrote:
 This works because it captures the i value as a new variable 
 in a closure, so `val` remains constant while `i` increments.

 -Steve
Whooops... that was a major brain fart :(. Sorry for stealing your time.

Still, this makes me think about the rules for accessing variables in the outer frame in lambdas. I guess I'm too used to the C++ rules that require explicit capture of everything and default to copying. At least @safe code should require explicit capture. There is no guarantee that a lambda is executed in the same thread.
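
A minimal demonstration of the D rule that bit me here: delegates close over the enclosing stack frame by reference, not over copies:

    import std.stdio;

    void main() {
        int delegate()[] dgs;
        for (int i = 0; i < 3; i++) {
            dgs ~= () => i;   // every delegate refers to the SAME frame variable i
        }
        foreach (dg; dgs) {
            writeln(dg());    // prints 3 three times: i's value after the loop
        }
    }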
Apr 08 2020
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/8/20 8:20 PM, Gregor Mückl wrote:
 On Wednesday, 8 April 2020 at 23:25:09 UTC, Steven Schveighoffer wrote:
 This works because it captures the i value as a new variable in a 
 closure, so `val` remains constant while `i` increments.
Whooops... that was a major brain fart :(. Sorry for stealing your time.
No worries, I didn't spend too long on it.
 Still, this makes me think about the rules for accessing variables in 
 the outer frame in lambdas. I guess I'm too used to the C++ rules 
 that require explicit capture of everything and default to copying. At 
 least @safe code should require explicit capture. There is no guarantee 
 that a lambda is executed in the same thread.
Yeah, there are a lot of things we could possibly do to prevent these issues. I don't know if they are doable without major breakage elsewhere. I had thought of a capture utility function; maybe that can help in cases like this.
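
As a sketch of what such a utility might look like (purely hypothetical, not an existing druntime/Phobos API; it just wraps the immediately-invoked-literal trick so the copy is explicit):

    import std.stdio;

    // Hypothetical helper: evaluates fn with a private copy of value, so
    // anything fn builds (e.g. a Thread body) closes over the copy rather
    // than over the caller's loop variable.
    auto capture(T, F)(T value, F fn) {
        return fn(value);
    }

    void main() {
        int delegate()[] dgs;
        for (int i = 0; i < 3; i++) {
            dgs ~= capture(i, (int val) => () => val); // val is a stable copy
        }
        foreach (dg; dgs) {
            writeln(dg()); // prints 0, 1, 2
        }
    }

-Steve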
Apr 08 2020