
digitalmars.D - Is core.internal.atomic.atomicFetchAdd implementation really lock-free?

reply mw <mingwu gmail.com> writes:
According to the doc:

https://dlang.org/library/core/atomic/atomic_fetch_add.html

Atomically adds mod to the value referenced by val and returns 
the value val held previously. This operation is both lock-free 
and atomic.


However, when I check its final implementation, I saw


https://github.com/dlang/dmd/blob/master/druntime/src/core/internal/atomic.d#L275

```
lock; xadd[%0], %1;
```

Is this really lock free?

If not, can we use `cas` to implement it? E.g. pseudocode:


```
int atomicFetchAdd(int* pAddr, int nIncr) {
    while (true) {
        int ncur = atomicLoad(pAddr);
        if (cas(pAddr, ncur, ncur + nIncr))
            return ncur;
    }
}
```
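For comparison, the same CAS loop can be sketched with C++'s `std::atomic` (the free function `atomicFetchAdd` here just mirrors the pseudocode above; it is not the druntime implementation):

```cpp
#include <atomic>

// CAS-loop fetch-add, mirroring the pseudocode above.
// compare_exchange_weak writes the freshly observed value back into `cur`
// on failure, so the loop re-reads automatically (no explicit atomicLoad
// inside the loop is needed).
int atomicFetchAdd(std::atomic<int>& val, int incr) {
    int cur = val.load(std::memory_order_relaxed);
    while (!val.compare_exchange_weak(cur, cur + incr)) {
        // retry: cur now holds the value another thread wrote
    }
    return cur;  // the value held previously, per the documented contract
}
```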
Nov 29 2022
next sibling parent Vladimir Panteleev <thecybershadow.lists gmail.com> writes:
(Disclaimer: I have not studied lock-free programming so I might 
be wrong.)

On Wednesday, 30 November 2022 at 00:04:22 UTC, mw wrote:
 ```
 lock; xadd[%0], %1;
 ```

 Is this really lock free?
`lock` here refers not to some mutex (which is what "lock free" generally means), but to the processor cache line: https://stackoverflow.com/questions/8891067/what-does-the-lock-instruction-mean-in-x86-assembly
 If not, can we use `cas` to implement it? E.g. pseudocode:


 ```
 int atomicFetchAdd(int* pAddr, int nIncr) {
     while (true) {
         int ncur = atomicLoad(pAddr);
         if (cas(pAddr, ncur, ncur + nIncr))
             return ncur;
     }
 }
 ```
`CMPXCHG`, used to implement CAS on x86, needs LOCK as well: https://stackoverflow.com/questions/27837731/is-x86-cmpxchg-atomic-if-so-why-does-it-need-lock
Nov 29 2022
prev sibling parent reply "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Nov 30, 2022 at 12:04:22AM +0000, mw via Digitalmars-d wrote:
 According to the doc:
 
 https://dlang.org/library/core/atomic/atomic_fetch_add.html
 
 Atomically adds mod to the value referenced by val and returns the
 value val held previously. This operation is both lock-free and
 atomic.
[...] Hmm, that's weird that the docs would say that. I've always been under the impression that core.atomic ops use locks to achieve atomicity. T -- Which is worse: ignorance or apathy? Who knows? Who cares? -- Erich Schubert
Nov 29 2022
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 30/11/2022 1:12 PM, H. S. Teoh wrote:
 Hmm, that's weird that the docs would say that.  I've always been under
 the impression that core.atomic ops use locks to achieve atomicity.
No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions. It does have a fallback to use a mutex if need be, though, which might be where you got that idea from.
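For what it's worth, C++'s `std::atomic` exposes exactly this distinction at runtime via `is_lock_free()`; a minimal sketch (the `hardware_atomic` helper name is hypothetical):

```cpp
#include <atomic>

// True when atomic<T> on this target is implemented with real atomic
// instructions; false when the library would fall back to locking.
template <typename T>
bool hardware_atomic() {
    std::atomic<T> probe{};
    return probe.is_lock_free();
}
```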
Nov 29 2022
next sibling parent "H. S. Teoh" <hsteoh qfbox.info> writes:
On Wed, Nov 30, 2022 at 01:16:00PM +1300, rikki cattermole via Digitalmars-d
wrote:
 On 30/11/2022 1:12 PM, H. S. Teoh wrote:
 Hmm, that's weird that the docs would say that.  I've always been
 under the impression that core.atomic ops use locks to achieve
 atomicity.
No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions. It does have a fallback to use a mutex if need be, though, which might be where you got that idea from.
Ah, I see. Thanks! T -- It is not the employer who pays the wages. Employers only handle the money. It is the customer who pays the wages. -- Henry Ford
Nov 29 2022
prev sibling parent reply claptrap <clap trap.com> writes:
On Wednesday, 30 November 2022 at 00:16:00 UTC, rikki cattermole 
wrote:
 On 30/11/2022 1:12 PM, H. S. Teoh wrote:
 Hmm, that's weird that the docs would say that.  I've always 
 been under
 the impression that core.atomic ops use locks to achieve 
 atomicity.
No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions. It does have a fallback to use a mutex if need be, though, which might be where you got that idea from.
It really shouldn't do that IMO. People expect atomic ops to be lock-free; it should be a compile error if it can't be so.
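That compile error is actually available in C++17 via `is_always_lock_free`; a sketch (`bump` is a hypothetical helper):

```cpp
#include <atomic>

// Fail the build instead of silently taking a mutex on targets where
// atomic<int> cannot be lock-free (C++17's is_always_lock_free is a
// compile-time constant, unlike the runtime is_lock_free()).
static_assert(std::atomic<int>::is_always_lock_free,
              "atomic<int> would be implemented with a lock on this target");

int bump(std::atomic<int>& c) {
    return c.fetch_add(1);  // returns the value held previously
}
```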
Nov 29 2022
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 30/11/2022 1:35 PM, claptrap wrote:
 On Wednesday, 30 November 2022 at 00:16:00 UTC, rikki cattermole wrote:
 On 30/11/2022 1:12 PM, H. S. Teoh wrote:
 Hmm, that's weird that the docs would say that.  I've always been under
 the impression that core.atomic ops use locks to achieve atomicity.
No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions. It does have a fallback to use a mutex if need be, though, which might be where you got that idea from.
It really shouldn't do that IMO. People expect atomic ops to be lock-free; it should be a compile error if it can't be so.
It does. It's really only GDC that supports it, with its wide range of esoteric targets. But generally speaking, if druntime works on a CPU, it probably has atomics.
Nov 29 2022
prev sibling next sibling parent reply Iain Buclaw <ibuclaw gdcproject.org> writes:
On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap wrote:
 On Wednesday, 30 November 2022 at 00:16:00 UTC, rikki 
 cattermole wrote:
 On 30/11/2022 1:12 PM, H. S. Teoh wrote:
 Hmm, that's weird that the docs would say that.  I've always 
 been under
 the impression that core.atomic ops use locks to achieve 
 atomicity.
No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions. It does have a fallback to use a mutex if need be, though, which might be where you got that idea from.
It really shouldn't do that IMO. People expect atomic ops to be lock-free; it should be a compile error if it can't be so.
C++'s [std::atomic](https://github.com/gcc-mirror/gcc/blob/a1b5cdf381d6b02f5048d886a8377d0042bda3af/libstdc%2B%2B-v3/config/cpu/generic/atomicity_mutex/atomicity.h#L46) has followed this precedent too of using a mutex *if all else fails*. Really, you'd have to be pretty explicit in forcing GDC built-in atomics to be disabled at configure-time - there's an atomic library [right there in GCC's tree](https://github.com/gcc-mirror/gcc/tree/master/libatomic) if a given target does not have native support.
Nov 30 2022
parent claptrap <clap trap.com> writes:
On Wednesday, 30 November 2022 at 09:25:57 UTC, Iain Buclaw wrote:
 On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap wrote:
 On Wednesday, 30 November 2022 at 00:16:00 UTC, rikki 
 cattermole wrote:
C++'s [std::atomic](https://github.com/gcc-mirror/gcc/blob/a1b5cdf381d6b02f5048d886a8377d0042bda3af/libstdc%2B%2B-v3/config/cpu/generic/atomicity_mutex/atomicity.h#L46) has followed this precedent too of using a mutex *if all else fails*. Really, you'd have to be pretty explicit in forcing GDC built-in atomics to be disabled at configure-time - there's an atomic library [right there in GCC's tree](https://github.com/gcc-mirror/gcc/tree/master/libatomic) if a given target does not have native support.
I used to make this joke about my brother, if you ever needed to know how to do something you would ask him what to do... and then do the exact opposite. I feel the same way about C++
Dec 03 2022
prev sibling parent reply max haughton <maxhaton gmail.com> writes:
On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap wrote:
 On Wednesday, 30 November 2022 at 00:16:00 UTC, rikki 
 cattermole wrote:
 On 30/11/2022 1:12 PM, H. S. Teoh wrote:
 Hmm, that's weird that the docs would say that.  I've always 
 been under
 the impression that core.atomic ops use locks to achieve 
 atomicity.
No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions. It does have a fallback to use a mutex if need be, though, which might be where you got that idea from.
It really shouldn't do that IMO. People expect atomic ops to be lock-free; it should be a compile error if it can't be so.
I expect atomic ops to provide correct memory consistency, and preferably be atomic where the platform allows it.
Dec 02 2022
parent reply claptrap <clap trap.com> writes:
On Saturday, 3 December 2022 at 03:42:01 UTC, max haughton wrote:
 On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap wrote:
 On Wednesday, 30 November 2022 at 00:16:00 UTC, rikki 
 cattermole wrote:
 On 30/11/2022 1:12 PM, H. S. Teoh wrote:
 Hmm, that's weird that the docs would say that.  I've always 
 been under
 the impression that core.atomic ops use locks to achieve 
 atomicity.
No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions. It does have a fallback to use a mutex if need be, though, which might be where you got that idea from.
It really shouldn't do that IMO. People expect atomic ops to be lock-free; it should be a compile error if it can't be so.
I expect atomic ops to provide correct memory consistency, and preferably be atomic where the platform allows it.
I am really hoping that "preferably atomic" was just a poor choice of words. I guess regarding the other issues, I still think of atomic ops in terms of what the CPU does, so a high-level wrapper around them that takes a lock is the complete antithesis of what is actually desired. I mean, the whole point of atomic ops is to be able to avoid using locks. And if you look at the phobos docs... "Atomically adds mod to the value referenced by val and returns the value val held previously. This operation is both lock-free and atomic." https://dlang.org/library/core/atomic/atomic_fetch_add.html
Dec 03 2022
parent reply max haughton <maxhaton gmail.com> writes:
On Saturday, 3 December 2022 at 13:05:44 UTC, claptrap wrote:
 On Saturday, 3 December 2022 at 03:42:01 UTC, max haughton 
 wrote:
 On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap wrote:
 On Wednesday, 30 November 2022 at 00:16:00 UTC, rikki 
 cattermole wrote:
 On 30/11/2022 1:12 PM, H. S. Teoh wrote:
 Hmm, that's weird that the docs would say that.  I've 
 always been under
 the impression that core.atomic ops use locks to achieve 
 atomicity.
No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions. It does have a fallback to use a mutex if need be, though, which might be where you got that idea from.
It really shouldn't do that IMO. People expect atomic ops to be lock-free; it should be a compile error if it can't be so.
I expect atomic ops to provide correct memory consistency, and preferably be atomic where the platform allows it.
I am really hoping that "preferably atomic" was just a poor choice or words. I guess regarding the other issues I still think of atomic ops in terms of what the cpu does, so a high level wrapper around them that takes a lock is the complete antithesis to what is actually desired. I mean the whole point of atomic ops is to be able to avoid using locks. And if you look at the phobos docs... "Atomically adds mod to the value referenced by val and returns the value val held previously. This operation is both lock-free and atomic." https://dlang.org/library/core/atomic/atomic_fetch_add.html
If it provides the same memory ordering guarantees, does it matter (if we ignore performance for a second)? There are situations where you do (for reasons beyond performance) actually need an efficient (no-overhead) atomic operation in lock-free coding, but these are really on the edge of what can be considered guaranteed by any specification. Obviously they *should* be atomic, but not all platforms have atomics and not all platforms support the same atomic primitives - but they all need to support the same code/patterns. Our atomics prefer to error, whereas the C++ ones have a fallback for large types and so on.
Dec 03 2022
parent reply claptrap <clap trap.com> writes:
On Saturday, 3 December 2022 at 20:18:07 UTC, max haughton wrote:
 On Saturday, 3 December 2022 at 13:05:44 UTC, claptrap wrote:
 On Saturday, 3 December 2022 at 03:42:01 UTC, max haughton 
 wrote:
 On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap
"Atomically adds mod to the value referenced by val and returns the value val held previously. This operation is both lock-free and atomic." https://dlang.org/library/core/atomic/atomic_fetch_add.html
If it provides the same memory ordering guarantees, does it matter (if we ignore performance for a second)? There are situations where you do (for reasons beyond performance) actually need an efficient (no-overhead) atomic operation in lock-free coding, but these are really on the edge of what can be considered guaranteed by any specification.
It matters because the whole point of atomic operations is lock-free coding. There's no "you might need atomics for lock-free coding"; you literally have to have them. If they fall back on a mutex, it's not lock-free anymore. Memory ordering is a somewhat orthogonal issue from atomic ops.
Dec 03 2022
parent reply max haughton <maxhaton gmail.com> writes:
On Saturday, 3 December 2022 at 21:05:51 UTC, claptrap wrote:
 On Saturday, 3 December 2022 at 20:18:07 UTC, max haughton 
 wrote:
 On Saturday, 3 December 2022 at 13:05:44 UTC, claptrap wrote:
 On Saturday, 3 December 2022 at 03:42:01 UTC, max haughton 
 wrote:
 On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap
"Atomically adds mod to the value referenced by val and returns the value val held previously. This operation is both lock-free and atomic." https://dlang.org/library/core/atomic/atomic_fetch_add.html
If it provides the same memory ordering guarantees, does it matter (if we ignore performance for a second)? There are situations where you do (for reasons beyond performance) actually need an efficient (no-overhead) atomic operation in lock-free coding, but these are really on the edge of what can be considered guaranteed by any specification.
It matters because the whole point of atomic operations is lock-free coding. There's no "you might need atomics for lock-free coding"; you literally have to have them. If they fall back on a mutex, it's not lock-free anymore. Memory ordering is a somewhat orthogonal issue from atomic ops.
Memory ordering is literally why modern atomic operations exist. That's why there's a lock prefix on the instruction in x86 - it doesn't just say "do this in one go"; it says "do this in one go *and* maintain this memory ordering for the other threads". You'd never see the mutex or similar; it's just a detail of the atomics library. Again, I'm not saying it would be good, just that it wouldn't make any difference to the code.
Dec 03 2022
next sibling parent rikki cattermole <rikki cattermole.co.nz> writes:
On 04/12/2022 2:51 PM, max haughton wrote:
 Memory ordering is literally why modern atomic operations exist. That's 
 why there's a lock prefix on the instruction in X86 - it doesn't just 
 say "do this in one go" it says "do this in one go *and* maintain this 
 memory ordering for the other threads".
Also worth remembering that at some point somebody has to do synchronization to make atomics work. It doesn't matter if it's in user code or in microcode; it has to exist some place. If it doesn't, you continue to have different cores with different values for a chunk of memory. This is something you do not want to encounter, and it makes you lose hair (from experience).
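A quick C++ sketch of why that synchronization matters: with `fetch_add` the counter below is exact; swap it for a plain `int` and the threads lose updates (`count_to` is a hypothetical helper):

```cpp
#include <atomic>
#include <thread>
#include <vector>

// Four threads hammering one counter. The synchronization that makes this
// exact lives in the hardware's cache-coherence protocol / microcode, not
// in this source code - but it has to exist somewhere.
int count_to(int threads, int per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> pool;
    for (int t = 0; t < threads; ++t)
        pool.emplace_back([&] {
            for (int i = 0; i < per_thread; ++i)
                counter.fetch_add(1, std::memory_order_relaxed);
        });
    for (auto& th : pool) th.join();
    return counter.load();
}
```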
Dec 03 2022
prev sibling parent reply claptrap <clap trap.com> writes:
On Sunday, 4 December 2022 at 01:51:56 UTC, max haughton wrote:
 On Saturday, 3 December 2022 at 21:05:51 UTC, claptrap wrote:
 On Saturday, 3 December 2022 at 20:18:07 UTC, max haughton 
 wrote:
 On Saturday, 3 December 2022 at 13:05:44 UTC, claptrap wrote:
 On Saturday, 3 December 2022 at 03:42:01 UTC, max haughton 
 wrote:
 On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap
"Atomically adds mod to the value referenced by val and returns the value val held previously. This operation is both lock-free and atomic." https://dlang.org/library/core/atomic/atomic_fetch_add.html
If it provides the same memory ordering guarantees, does it matter (if we ignore performance for a second)? There are situations where you do (for reasons beyond performance) actually need an efficient (no-overhead) atomic operation in lock-free coding, but these are really on the edge of what can be considered guaranteed by any specification.
It matters because the whole point of atomic operations is lock-free coding. There's no "you might need atomics for lock-free coding"; you literally have to have them. If they fall back on a mutex, it's not lock-free anymore. Memory ordering is a somewhat orthogonal issue from atomic ops.
Memory ordering is literally why modern atomic operations exist. That's why there's a lock prefix on the instruction in X86 - it doesn't just say "do this in one go" it says "do this in one go *and* maintain this memory ordering for the other threads".
No it's not; literally go read the Intel SDM. The lock prefix is for atomic operations, that's its intent. That it also has extra memory ordering guarantees between cores is simply because it would be useless if it didn't also do that. The m/s/l/fence instructions are for memory ordering. If you still don't believe me, consider this: x86 has always had strong memory ordering; there's no need to worry about reads to a given location being moved ahead of writes to the same location if you are on a single core. That only goes out of the window when you have a multicore x86. The lock prefix existed before x86 CPUs had multiple cores. I.e. locked instructions are for atomics; the memory ordering guarantees already existed for all reads/writes on single cores. The extra guarantees for memory ordering between cores were added when x86 went multicore, because lock-free algorithms would not work if the atomic reads/writes were not also ordered between cores. To say atomics are about memory ordering is like saying cars are for parking. Yes, you have to park them somewhere, but that's not the problem they are meant to solve.
Dec 04 2022
parent reply max haughton <maxhaton gmail.com> writes:
On Sunday, 4 December 2022 at 10:15:48 UTC, claptrap wrote:
 On Sunday, 4 December 2022 at 01:51:56 UTC, max haughton wrote:
 On Saturday, 3 December 2022 at 21:05:51 UTC, claptrap wrote:
 On Saturday, 3 December 2022 at 20:18:07 UTC, max haughton 
 wrote:
 On Saturday, 3 December 2022 at 13:05:44 UTC, claptrap wrote:
 [...]
If it provides the same memory ordering guarantees, does it matter (if we ignore performance for a second)? There are situations where you do (for reasons beyond performance) actually need an efficient (no-overhead) atomic operation in lock-free coding, but these are really on the edge of what can be considered guaranteed by any specification.
It matters because the whole point of atomic operations is lock-free coding. There's no "you might need atomics for lock-free coding"; you literally have to have them. If they fall back on a mutex, it's not lock-free anymore. Memory ordering is a somewhat orthogonal issue from atomic ops.
Memory ordering is literally why modern atomic operations exist. That's why there's a lock prefix on the instruction in X86 - it doesn't just say "do this in one go" it says "do this in one go *and* maintain this memory ordering for the other threads".
No it's not; literally go read the Intel SDM. The lock prefix is for atomic operations, that's its intent. That it also has extra memory ordering guarantees between cores is simply because it would be useless if it didn't also do that. The m/s/l/fence instructions are for memory ordering. If you still don't believe me, consider this: x86 has always had strong memory ordering; there's no need to worry about reads to a given location being moved ahead of writes to the same location if you are on a single core. That only goes out of the window when you have a multicore x86. The lock prefix existed before x86 CPUs had multiple cores. I.e. locked instructions are for atomics; the memory ordering guarantees already existed for all reads/writes on single cores. The extra guarantees for memory ordering between cores were added when x86 went multicore, because lock-free algorithms would not work if the atomic reads/writes were not also ordered between cores. To say atomics are about memory ordering is like saying cars are for parking. Yes, you have to park them somewhere, but that's not the problem they are meant to solve.
x86 isn't the only processor; by atomics I am referring to the ones we use in programming languages (note that I said modern). The ones in D are basically inherited from C++11 (and LLVM), and were drafted because working with memory ordering prior to them was wild-west. x86 is also still ordered with or without a LOCK prefix. It's a weaker kind of order, but it's still defined (it actually wasn't defined to an academically satisfactory standard until shockingly recently). For example, a store with release ordering on x86 will yield a regular mov instruction.
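A sketch of the release/acquire pairing being described, using C++'s `std::atomic` (`publish_and_read` is a hypothetical helper; on x86 the release store compiles to an ordinary `mov`):

```cpp
#include <atomic>
#include <thread>

int publish_and_read() {
    int data = 0;
    std::atomic<bool> ready{false};
    std::thread producer([&] {
        data = 42;                                    // plain store
        ready.store(true, std::memory_order_release); // on x86: a regular mov
    });
    // The acquire load pairs with the release store: once ready reads true,
    // the write to data is guaranteed to be visible here as well.
    while (!ready.load(std::memory_order_acquire)) {}
    producer.join();
    return data;
}
```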
Dec 04 2022
parent reply claptrap <clap trap.com> writes:
On Sunday, 4 December 2022 at 12:39:44 UTC, max haughton wrote:
 On Sunday, 4 December 2022 at 10:15:48 UTC, claptrap wrote:
 On Sunday, 4 December 2022 at 01:51:56 UTC, max haughton wrote:
 On Saturday, 3 December 2022 at 21:05:51 UTC, claptrap wrote:
X86 isn't the only processor
I think you'll struggle to find any processor that defines atomics in terms of memory ordering and not in terms of a read/modify/write sequence that is executed atomically.
 by atomics I am referring to the ones we use in programming 
 languages (note that I said modern). The ones in D are 
 basically inherited from C++11 (and LLVM), and were drafted 
 because working with memory ordering prior to them was 
 wild-west.
I was working on lock-free algorithms around 2008; there was an Intel PDF at the time specifying memory ordering. Tbh it probably didn't need to be specified until multicore came about, since the default memory ordering on x86 single core meant you literally didn't need to worry about it. C++... wild west? Sounds about right :)
 X86 is also still ordered with or without a LOCK prefix. It's a 
 weaker kind of order but it's still defined
Yes, the only difference a lock prefix makes is that it adds extra ordering guarantees between cores. Basically, the ordering guarantees you get on a single core, you get the same with locked instructions between cores.
Dec 04 2022
next sibling parent max haughton <maxhaton gmail.com> writes:
On Monday, 5 December 2022 at 01:37:02 UTC, claptrap wrote:
 On Sunday, 4 December 2022 at 12:39:44 UTC, max haughton wrote:
 [...]
I think you'll struggle to find any processor that defines atomics in terms of memory ordering and not in terms of a read/modify/write sequence that is executed atomically.
 [...]
I was working on lock-free algorithms around 2008; there was an Intel PDF at the time specifying memory ordering. Tbh it probably didn't need to be specified until multicore came about, since the default memory ordering on x86 single core meant you literally didn't need to worry about it. C++... wild west? Sounds about right :)
 [...]
Yes, the only difference a lock prefix makes is that it adds extra ordering guarantees between cores. Basically, the ordering guarantees you get on a single core, you get the same with locked instructions between cores.
https://www.cl.cam.ac.uk/~pes20/weakmemory/x86tso-paper.tphols.pdf Intel's initial efforts to specify the behaviour of the processors were not up to snuff, at least in the eyes of academic researchers.
Dec 04 2022
prev sibling parent Johan <j j.nl> writes:
claptrap, Max,

Keep in mind that D is not "portable assembly", and neither are C 
or C++. The druntime library follows (afaict) that logic everywhere.
The documentation of atomicFetchAdd could be clearer and explain 
"lock free", because that term does not have a clear meaning in a D 
language context.

If you want the guarantee that the machine code contains specific 
instructions, you _have_ to write assembly instructions yourself 
(which you can do in D as in C and C++).

regards,
   Johan
Dec 05 2022