
digitalmars.D - synchronized class bugs?

reply Gregor Mückl <gregormueckl gmx.de> writes:
I've been playing around with synchronized classes. My example is 
the following dummy class:

     synchronized class Shared {
     public:
         void increment(int value) {
             //(cast(Shared) this).x += value;
             x += value;
         }

         void decrement(int value) {
             //(cast(Shared) this).x -= value;
             x -= value;
         }

         shared int get() { return x; }

     private:
         int x;
     }

Then I'm calling increment and decrement many times in parallel 
threads so that they *should* cancel out, leaving x back at 0 at 
the end. With the implicit synchronization using D monitors, this 
should work.

Bug number 1: dmd doesn't translate this without casting away 
shared:

sharedtest.d(14): Error: read-modify-write operations are not 
allowed for shared variables. Use 
core.atomic.atomicOp!"+="(this.x, value) instead.

The += and -= operators are safe as they are inside locked 
monitors. The emitted code contains calls to _d_monitorenter and 
_d_monitorexit. The compiler should understand this.
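
For reference, a minimal sketch of what the error message asks for 
(Shared2 is a hypothetical name, not from my real code); it compiles, 
but the atomic read-modify-write is redundant while the monitor is held:

    import core.atomic : atomicOp;

    synchronized class Shared2 {
    public:
        void increment(int value) {
            // Atomic RMW as the compiler demands -- redundant here,
            // because the monitor is already locked for this method.
            atomicOp!"+="(x, value);
        }

    private:
        int x;
    }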

Bug number 2: even when casting away shared, the end result is 
wrong when using multiple threads. Thus, the monitor locking code 
in druntime cannot be correct. It does have an effect, though. 
Omitting synchronized on Shared results in values that are wide of 
the mark. Including it results in small differences. This 
suggests that a memory fence might be missing somewhere.

The observed reality contradicts the documentation in many ways 
and is inconsistent with itself. What is going on here?
Apr 07 2020
next sibling parent reply IGotD- <nise nise.com> writes:
On Tuesday, 7 April 2020 at 15:18:41 UTC, Gregor Mückl wrote:
 I've been playing around with synchronized classes. My example is 
 the following dummy class:

     synchronized class Shared {
     public:
         void increment(int value) {
             //(cast(Shared) this).x += value;
             x += value;
         }

         void decrement(int value) {
             //(cast(Shared) this).x -= value;
             x -= value;
         }

         shared int get() { return x; }

     private:
         int x;
     }

 Then I'm calling increment and decrement many times in parallel 
 threads so that they *should* cancel out, leaving x back at 0 
 at the end. With the implicit synchronization using D monitors, 
 this should work.

 Bug number 1: dmd doesn't translate this without casting away 
 shared:

 sharedtest.d(14): Error: read-modify-write operations are not 
 allowed for shared variables. Use 
 core.atomic.atomicOp!"+="(this.x, value) instead.

 The += and -= operators are safe as they are inside locked 
 monitors. The emitted code contains calls to _d_monitorenter and 
 _d_monitorexit. The compiler should understand this.

 Bug number 2: even when casting away shared, the end result is 
 wrong when using multiple threads. Thus, the monitor locking 
 code in druntime cannot be correct. It does have an effect, 
 though. Omitting synchronized on Shared results in values that 
 are wide of the mark. Including it results in small 
 differences. This suggests that a memory fence might be missing 
 somewhere.

 The observed reality contradicts the documentation in many ways 
 and is inconsistent with itself. What is going on here?
Correct me if I'm wrong, but aren't all synchronized classes protected by a mutex? In that case atomic operations are pointless, as all methods are protected by the mutex anyway.
Apr 07 2020
parent reply Gregor Mückl <gregormueckl gmx.de> writes:
On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
 Correct me if I'm wrong, but aren't all synchronized classes 
 protected by a mutex? In that case atomic operations are 
 pointless, as all methods are protected by the mutex anyway.
That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But:

- the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
- the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong
Apr 07 2020
parent reply IGotD- <nise nise.com> writes:
On Tuesday, 7 April 2020 at 16:18:53 UTC, Gregor Mückl wrote:
 On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
 Correct me if I'm wrong, but aren't all synchronized classes 
 protected by a mutex? In that case atomic operations are 
 pointless, as all methods are protected by the mutex anyway.
That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But:

- the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
- the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong
I think this is correct: you should be allowed to cast, or do whatever operation you want within the mutex, atomic or not. The wrong result sounds more like a bug in the locking itself. One way to check is to try this on another operating system and see if you get a similar result.
Apr 07 2020
parent reply Gregor Mückl <gregormueckl gmx.de> writes:
On Tuesday, 7 April 2020 at 16:33:13 UTC, IGotD- wrote:
 On Tuesday, 7 April 2020 at 16:18:53 UTC, Gregor Mückl wrote:
 On Tuesday, 7 April 2020 at 16:11:12 UTC, IGotD- wrote:
 Correct me if I'm wrong, but aren't all synchronized classes 
 protected by a mutex? In that case atomic operations are 
 pointless, as all methods are protected by the mutex anyway.
That's what I'm trying to say. They should be protected, and the code for locking and unlocking is generated (tested on dmd, gdc and ldc). But:

- the compiler tries to enforce atomics in this context anyway, even though they are pointless due to the mutex
- the mutex doesn't lead to proper locking behavior at runtime; when calling increment and decrement in many threads in parallel, the result is wrong
 I think this is correct: you should be allowed to cast, or do whatever operation you want within the mutex, atomic or not.
I don't think you understand: the compiler actively prohibits the use of non-atomic operations in synchronized functions. That makes no sense.
 The wrong result sounds more like a bug in the locking itself. 
 One way to check is to try this on another operating system and 
 see if you get a similar result.
The results are wrong with dmd 2.088 on Windows, and with dmd 2.091, gdc 9.3.0 and ldc 1.20.1 on Linux.
Apr 07 2020
parent IGotD- <nise nise.com> writes:
On Tuesday, 7 April 2020 at 23:34:13 UTC, Gregor Mückl wrote:
 On Tuesday, 7 April 2020 at 16:33:13 UTC, IGotD- wrote:

 I don't think you understand: the compiler actively prohibits 
 the use of non-atomic operations in synchronized functions. 
 That makes no sense.
Sorry, I misunderstood. I agree, that doesn't make any sense at all.
Apr 07 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/7/20 11:18 AM, Gregor Mückl wrote:
 I've been playing around with synchronized classes. My example is the 
 following dummy class:
 
      synchronized class Shared {
      public:
          void increment(int value) {
              //(cast(Shared) this).x += value;
              x += value;
          }
 
          void decrement(int value) {
              //(cast(Shared) this).x -= value;
              x -= value;
          }
 
          shared int get() { return x; }
 
      private:
          int x;
      }
 
 Then I'm calling increment and decrement many times in parallel threads 
 so that they *should* cancel out, leaving x back at 0 at the end. With 
 the implicit synchronization using D monitors, this should work.
 
 Bug number 1: dmd doesn't translate this without casting away shared:
 
 sharedtest.d(14): Error: read-modify-write operations are not allowed 
 for shared variables. Use core.atomic.atomicOp!"+="(this.x, value) instead.
 
 The += and -= operators are safe as they are inside locked monitors. The 
 emitted code contains calls to _d_monitorenter and _d_monitorexit. The 
 compiler should understand this.
The requirement for atomicOp is separate from the synchronization. The synchronization is only sound because you made it sound. In fact, if you did:

    auto getPtr() { return &x; }

then the synchronization all of a sudden does not apply to that variable. This is why we are moving to REQUIRING casting away shared to read/write data -- you have to do it anyway.
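
To make the hazard concrete, a minimal sketch (expanding the getPtr example above into a compilable unit; the names are illustrative):

    synchronized class Shared {
    public:
        // The pointer to x escapes the class; the inferred type is shared(int)*.
        shared(int)* getPtr() { return &x; }

        void increment(int value) { (cast(Shared) this).x += value; }

    private:
        int x;
    }

    void main() {
        shared Shared s = new shared Shared();
        shared(int)* p = s.getPtr(); // the monitor is held only during this call
        // Any later access through p happens with no lock held, so the
        // monitor no longer guards every access to x. The compiler cannot
        // prove that x is touched only under the lock.
    }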
 Bug number 2: even when casting away shared, the end result is wrong 
 when using multiple threads. Thus, the monitor locking code in druntime 
 cannot be correct. It does have an effect, though. Omitting synchronized 
 on Shared results in values that are wide of the mark. Including it 
 results in small differences. This suggests that a memory fence might be 
 missing somewhere.
 
 The observed reality contradicts the documentation in many ways and is 
 inconsistent with itself. What is going on here?
synchronized classes call _d_monitorenter and _d_monitorexit, which lock OS primitives (and initialize the monitor if necessary). These have not been touched in years, and depend on the OS mutex, which should be rock-solid. Looking at the history, it looks like the last significant change was https://github.com/dlang/druntime/commit/70b3536ddd06e8e1afb269b88ed3298ec03b8798#diff-d609308f6ce106db16e214b30db6f74 in 2015.

I've definitely depended on synchronized functions in the past, and haven't had issues, but it was in the far past.

The only place I can reasonably assume that there might be an issue is the initialization.

To test this, try initializing the lock FIRST before spawning multiple threads (just call one of the functions), and see if it helps. If so, it might be a bug in the initialization routines. (See the sketch below.)

Also if you find an issue, it may be specific to your OS.
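
A minimal sketch of what I mean, assuming the Shared class from the original post: touch the object once on the main thread so the monitor exists before any worker can race to create it:

    void main()
    {
        shared Shared s = new shared Shared();

        // First call: _d_monitorenter lazily creates the monitor here,
        // on the main thread, before any worker thread exists.
        s.increment(0);
        s.decrement(0);

        // ... spawn the worker threads only after this point ...
    }

-Steve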
Apr 08 2020
parent reply Gregor Mückl <gregormueckl gmx.de> writes:
On Wednesday, 8 April 2020 at 13:43:08 UTC, Steven Schveighoffer 
wrote:
 This is why we are moving to REQUIRING casting away shared to 
 read/write data -- you have to do it anyway.
OK, I understand the rationale, even though I don't like the solution very much. You shouldn't really use atomic ops when locks on the affected variables are held. That's bad practice.
 Bug number 2: even when casting away shared, the end result is 
 wrong when using multiple threads. Thus, the monitor locking 
 code in druntime cannot be correct. It does have an effect, 
 though. Omitting synchronized on Shared results in values that 
 are wide of the mark. Including it results in small 
 differences. This suggests that a memory fence might be 
 missing somewhere.
 
 The observed reality contradicts the documentation in many 
 ways and is inconsistent with itself. What is going on here?
synchronized classes call _d_monitorenter and _d_monitorexit, which lock OS primitives (and initialize the monitor if necessary). These have not been touched in years, and depend on the OS mutex, which should be rock-solid. Looking at the history, it looks like the last significant change was https://github.com/dlang/druntime/commit/70b3536ddd06e8e1afb269b88ed3298ec03b8798#diff-d609308f6ce106db16e214b30db6f74 in 2015.
The calls to _d_monitorenter and _d_monitorexit are in the generated code, and I looked up their implementation before posting to make sure I understand them correctly. I would also assume mutexes to be rock-solid. This is why I am in utter disbelief about the actual behavior of the toy program (see code at the end).
 I've definitely depended on synchronized functions in the past, 
 and haven't had issues, but it was in the far past.

 The only place I can reasonably assume that there might be an 
 issue is for the initialization.

 To test this, try initializing the lock FIRST before spawning 
 multiple threads (just call one of the functions), and see if 
 it helps. If so, it might be a bug in the initialization 
 routines.
I added such function calls and that didn't change anything.
 Also if you find an issue, it may be specific to your OS.
Well, as I've mentioned before, I've reproduced this on Linux and Windows with dmd, ldc and gdc. So it's not OS specific, or at least not entirely.
 -Steve
The full program code is below the line. You can run it on run.dlang.org to reproduce the problem.

----

import core.thread;
import std.stdio;

version = Sync;

version(Sync) {
    pragma(msg, "synchronized");

    synchronized class Shared {
    public:
        void increment(int value) {
            (cast(Shared) this).x += value;
        }

        void decrement(int value) {
            (cast(Shared) this).x -= value;
        }

        shared int get() { return x; }

    private:
        int x;
    }
} else {
    pragma(msg, "non-synchronized");

    class Shared {
    public:
        shared void increment(int value) {
            (cast(Shared) this).x += value;
        }

        shared void decrement(int value) {
            (cast(Shared) this).x -= value;
        }

        shared int get() { return x; }

    private:
        int x;
    }
}

void main()
{
    shared Shared s = new shared Shared();

    s.increment(1);
    s.decrement(1);
    writeln(s.get());

    Thread[] threads = new Thread[10];
    for(int i = 0; i < threads.length; i++) {
        threads[i] = new Thread({
            for(int j = 0; j < 1_000_000; j++) {
                s.increment(i);
            }
            for(int j = 0; j < 1_000_000; j++) {
                s.decrement(i);
            }
        });
        threads[i].start();
    }

    for(int i = 0; i < threads.length; i++) {
        threads[i].join();
    }

    writeln(s.get());
}
Apr 08 2020
next sibling parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/8/20 6:50 PM, Gregor Mückl wrote:
 On Wednesday, 8 April 2020 at 13:43:08 UTC, Steven Schveighoffer wrote:
 This is why we are moving to REQUIRING casting away shared to 
 read/write data -- you have to do it anyway.
OK, I understand the rationale, even though I don't like the solution very much. You shouldn't really use atomic ops when locks on the affected variables are held. That's bad practice.
No, you misunderstand. The correct path is to cast away shared for those variables that you manually ensure cannot be touched outside your class, not to use atomic ops. But the compiler has no way to prove you did it right, which is why the casting is required. The expectation is that frameworks will emerge that build on this requirement and make it much more convenient.
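
To make the intended pattern concrete, a minimal sketch (the soundness argument is yours to supply: x must never escape the class):

    synchronized class Shared {
    public:
        void increment(int value) {
            // The monitor is held for the whole method body, so it is
            // safe to view the data as unshared here -- but only because
            // x never escapes the class (no getPtr-style leaks).
            auto self = cast(Shared) this;
            self.x += value;
        }

    private:
        int x;
    }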
 
 The full program code is below the line. You can run it on 
 run.dlang.org to reproduce the problem.
OK, will look to see what I can find.

-Steve
Apr 08 2020
prev sibling parent reply Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/8/20 6:50 PM, Gregor Mückl wrote:

 void main()
 {
      shared Shared s = new shared Shared();
 
      s.increment(1);
      s.decrement(1);
      writeln(s.get());
 
      Thread[] threads = new Thread[10];
      for(int i = 0; i < threads.length; i++) {
          threads[i] = new Thread({
              for(int j = 0; j < 1_000_000; j++) {
                  s.increment(i);
              }
              for(int j = 0; j < 1_000_000; j++) {
                  s.decrement(i);
              }
          });
          threads[i].start();
      }
 
      for(int i = 0; i < threads.length; i++) {
          threads[i].join();
      }
 
      writeln(s.get());
 }
 
The issue is not synchronization of the class data, but your promiscuous usage of the stack frame ;)

Try this instead for your loop:

for(int i = 0; i < threads.length; i++) {
    threads[i] = (int val) {
        return new Thread({
            for(int j = 0; j < 1_000_000; j++) {
                s.increment(val);
            }
            for(int j = 0; j < 1_000_000; j++) {
                s.decrement(val);
            }
        });
    }(i);
    threads[i].start();
}

Why doesn't yours work? Because you are starting your threads and then changing the value of i as you iterate the loop. This works because it captures the i value as a new variable in a closure, so `val` remains constant while `i` increments.

-Steve
Apr 08 2020
parent reply Gregor Mückl <gregormueckl gmx.de> writes:
On Wednesday, 8 April 2020 at 23:25:09 UTC, Steven Schveighoffer 
wrote:
 This works because it captures the i value as a new variable 
 in a closure, so `val` remains constant while `i` increments.

 -Steve
Whooops... that was a major brain fart :(. Sorry for stealing your time.

Still, this makes me think about the rules for accessing variables in the outer frame in lambdas. I guess I'm too used to the C++ rules that require explicit capture of everything and default to copying. At least @safe code should require explicit capture. There is no guarantee that a lambda is executed in the same thread.
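
A minimal demonstration of the D rule that bit me here: delegates close over the enclosing stack frame by reference, not over copies:

    import std.stdio;

    void main() {
        int delegate()[] dgs;
        for (int i = 0; i < 3; i++) {
            dgs ~= () => i;   // every delegate refers to the SAME frame variable i
        }
        foreach (dg; dgs) {
            writeln(dg());    // prints 3 three times: i's value after the loop
        }
    }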
Apr 08 2020
parent Steven Schveighoffer <schveiguy gmail.com> writes:
On 4/8/20 8:20 PM, Gregor Mückl wrote:
 On Wednesday, 8 April 2020 at 23:25:09 UTC, Steven Schveighoffer wrote:
 This works because it captures the i value as a new variable in a 
 closure, so `val` remains constant while `i` increments.
Whooops... that was a major brain fart :(. Sorry for stealing your time.
No worries, I didn't spend too long on it.
 Still, this makes me think about the rules for accessing variables in 
 the outer frame in lambdas. I guess I'm too used to the C++ rules 
 that require explicit capture of everything and default to copying. At 
 least @safe code should require explicit capture. There is no guarantee 
 that a lambda is executed in the same thread.
Yeah, there are a lot of things we could possibly do to prevent these issues. I don't know if they are doable without major breakage elsewhere. I had thought of a capture utility function; maybe that can help in cases like this.
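
As a sketch of what such a utility might look like (purely hypothetical, not an existing druntime/Phobos API; it just wraps the immediately-invoked-literal trick so the copy is explicit):

    import std.stdio;

    // Hypothetical helper: evaluates fn with a private copy of value, so
    // anything fn builds (e.g. a Thread body) closes over the copy rather
    // than over the caller's loop variable.
    auto capture(T, F)(T value, F fn) {
        return fn(value);
    }

    void main() {
        int delegate()[] dgs;
        for (int i = 0; i < 3; i++) {
            dgs ~= capture(i, (int val) => () => val); // val is a stable copy
        }
        foreach (dg; dgs) {
            writeln(dg()); // prints 0, 1, 2
        }
    }

-Steve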
Apr 08 2020