digitalmars.D - Is core.internal.atomic.atomicFetchAdd implementation really lock free?
- mw (21/21) Nov 29 2022 According to the doc:
- Vladimir Panteleev (8/22) Nov 29 2022 (Disclaimer: I have not studied lock-free programming so I might
- H. S. Teoh (8/15) Nov 29 2022 [...]
- rikki cattermole (5/7) Nov 29 2022 No its correct.
- H. S. Teoh (5/15) Nov 29 2022 Ah, I see. Thanks!
- claptrap (4/13) Nov 29 2022 It really shouldn't do that IMO. People expect atomic ops to be
- rikki cattermole (5/18) Nov 29 2022 It does.
- Iain Buclaw (8/23) Nov 30 2022 C++'s
- claptrap (5/16) Dec 03 2022 I used to make this joke about my brother, if you ever needed to
- max haughton (3/18) Dec 02 2022 I expect atomic ops to provide correct memory consistency, and
- claptrap (14/33) Dec 03 2022 I am really hoping that "preferably atomic" was just a poor
- max haughton (12/48) Dec 03 2022 If it provides the same memory ordering guarantees does it matter
- claptrap (6/22) Dec 03 2022 It matters because the whole point of atomic operations is lock
- max haughton (8/32) Dec 03 2022 Memory ordering is literally why modern atomic operations exist.
- rikki cattermole (8/12) Dec 03 2022 Also worth remembering that at some point somebody has to do
- claptrap (20/52) Dec 04 2022 No it's not, literally go read the Intel SDM, the lock prefix is
- max haughton (11/57) Dec 04 2022 X86 isn't the only processor, by atomics I am referring to the
- claptrap (14/26) Dec 04 2022 I think you'll struggle to find any processor that defines
- max haughton (5/22) Dec 04 2022 https://www.cl.cam.ac.uk/~pes20/weakmemory/x86tso-paper.tphols.pdf
- Johan (11/11) Dec 05 2022 claptrap, Max,
According to the doc: https://dlang.org/library/core/atomic/atomic_fetch_add.html

> Atomically adds mod to the value referenced by val and returns the value val held previously. This operation is both lock-free and atomic.

However, when I check its final implementation, I see:

https://github.com/dlang/dmd/blob/master/druntime/src/core/internal/atomic.d#L275

```
lock; xadd[%0], %1;
```

Is this really lock free? If not, can we use `cas` to implement it, e.g. pseudocode:

```
int atomicFetchAdd(int* pAddr, int nIncr)
{
    while (true)
    {
        int nCur = atomicLoad(pAddr);
        if (cas(pAddr, nCur, nCur + nIncr))
            return nCur;
    }
}
```
Nov 29 2022
(Disclaimer: I have not studied lock-free programming, so I might be wrong.)

On Wednesday, 30 November 2022 at 00:04:22 UTC, mw wrote:
> ```
> lock; xadd[%0], %1;
> ```
> Is this really lock free?

`lock` here refers not to some mutex (which is what "lock free" generally means), but to the processor cache line:

https://stackoverflow.com/questions/8891067/what-does-the-lock-instruction-mean-in-x86-assembly

> If not, can we use `cas` to implement it, e.g. pseudocode: [...]

`CMPXCHG`, used to implement CAS on x86, needs `LOCK` as well:

https://stackoverflow.com/questions/27837731/is-x86-cmpxchg-atomic-if-so-why-does-it-need-lock
Nov 29 2022
On Wed, Nov 30, 2022 at 12:04:22AM +0000, mw via Digitalmars-d wrote:
> According to the doc: https://dlang.org/library/core/atomic/atomic_fetch_add.html
>
> Atomically adds mod to the value referenced by val and returns the value val held previously. This operation is both lock-free and atomic.
[...]

Hmm, that's weird that the docs would say that. I've always been under the impression that core.atomic ops use locks to achieve atomicity.

T

--
Which is worse: ignorance or apathy? Who knows? Who cares? -- Erich Schubert
Nov 29 2022
On 30/11/2022 1:12 PM, H. S. Teoh wrote:
> Hmm, that's weird that the docs would say that. I've always been under the impression that core.atomic ops use locks to achieve atomicity.

No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions.

It does have a fallback to use a mutex if need be though, which might be where you got that idea from.
Nov 29 2022
On Wed, Nov 30, 2022 at 01:16:00PM +1300, rikki cattermole via Digitalmars-d wrote:
> No, it's correct. As long as the hardware supports atomic operations, it'll use those instructions.
>
> It does have a fallback to use a mutex if need be though, which might be where you got that idea from.

Ah, I see. Thanks!

T

--
It is not the employer who pays the wages. Employers only handle the money. It is the customer who pays the wages. -- Henry Ford
Nov 29 2022
On Wednesday, 30 November 2022 at 00:16:00 UTC, rikki cattermole wrote:
> It does have a fallback to use a mutex if need be though, which might be where you got that idea from.

It really shouldn't do that, IMO. People expect atomic ops to be lock-free; it should be a compile error if it can't be so.
Nov 29 2022
On 30/11/2022 1:35 PM, claptrap wrote:
> It really shouldn't do that IMO. People expect atomic ops to be lock-free, it should be a compile error if it can't be so.

It does. It's really only GDC that supports it, with its wide range of esoteric targets. But generally speaking, if druntime works on a CPU, it probably has atomics.
Nov 29 2022
On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap wrote:
> It really shouldn't do that IMO. People expect atomic ops to be lock-free, it should be a compile error if it can't be so.

C++'s [std::atomic](https://github.com/gcc-mirror/gcc/blob/a1b5cdf381d6b02f5048d886a8377d0042bda3af/libstdc%2B%2B-v3/config/cpu/generic/atomicity_mutex/atomicity.h#L46) has followed this precedent too of using a mutex *if all else fails*.

Really, you'd have to be pretty explicit in forcing GDC built-in atomics to be disabled at configure-time - there's an atomic library [right there in GCC's tree](https://github.com/gcc-mirror/gcc/tree/master/libatomic) if a given target does not have native support.
Nov 30 2022
On Wednesday, 30 November 2022 at 09:25:57 UTC, Iain Buclaw wrote:
> C++'s std::atomic has followed this precedent too of using a mutex *if all else fails*. [...]

I used to make this joke about my brother: if you ever needed to know how to do something, you would ask him what to do... and then do the exact opposite. I feel the same way about C++.
Dec 03 2022
On Wednesday, 30 November 2022 at 00:35:55 UTC, claptrap wrote:
> It really shouldn't do that IMO. People expect atomic ops to be lock-free, it should be a compile error if it can't be so.

I expect atomic ops to provide correct memory consistency, and preferably be atomic where the platform allows it.
Dec 02 2022
On Saturday, 3 December 2022 at 03:42:01 UTC, max haughton wrote:
> I expect atomic ops to provide correct memory consistency, and preferably be atomic where the platform allows it.

I am really hoping that "preferably atomic" was just a poor choice of words.

I guess regarding the other issues, I still think of atomic ops in terms of what the CPU does, so a high-level wrapper around them that takes a lock is the complete antithesis of what is actually desired. I mean, the whole point of atomic ops is to be able to avoid using locks.

And if you look at the phobos docs...

> Atomically adds mod to the value referenced by val and returns the value val held previously. This operation is both lock-free and atomic.

https://dlang.org/library/core/atomic/atomic_fetch_add.html
Dec 03 2022
On Saturday, 3 December 2022 at 13:05:44 UTC, claptrap wrote:
> I am really hoping that "preferably atomic" was just a poor choice of words.
>
> I mean, the whole point of atomic ops is to be able to avoid using locks. [...]

If it provides the same memory ordering guarantees, does it matter (if we ignore performance for a second)?

There are situations where you do (for reasons beyond performance) actually need an efficient (no-overhead) atomic operation in lock-free coding, but these are really on the edge of what can be considered guaranteed by any specification.

Obviously they *should* be atomic, but not all platforms have atomics, and not all platforms support the same atomic privileges - but they all need to support the same code/patterns. Our atomics prefer to error, whereas the C++ ones have a fallback thing for large types and so on.
Dec 03 2022
On Saturday, 3 December 2022 at 20:18:07 UTC, max haughton wrote:
> If it provides the same memory ordering guarantees, does it matter (if we ignore performance for a second)?

It matters because the whole point of atomic operations is lock-free coding. It's not that you *might* need atomics for lock-free coding; you literally have to have them. If they fall back on a mutex, it's not lock-free anymore.

Memory ordering is a somewhat orthogonal issue from atomic ops.
Dec 03 2022
On Saturday, 3 December 2022 at 21:05:51 UTC, claptrap wrote:
> It matters because the whole point of atomic operations is lock-free coding. [...] Memory ordering is a somewhat orthogonal issue from atomic ops.

Memory ordering is literally why modern atomic operations exist. That's why there's a lock prefix on the instruction in X86 - it doesn't just say "do this in one go", it says "do this in one go *and* maintain this memory ordering for the other threads".

You'd never see the mutex or similar; it's just a detail of the atomics library. Again, I'm not saying it would be good, just that it wouldn't make any difference to code.
Dec 03 2022
On 04/12/2022 2:51 PM, max haughton wrote:
> Memory ordering is literally why modern atomic operations exist. That's why there's a lock prefix on the instruction in X86 - it doesn't just say "do this in one go", it says "do this in one go *and* maintain this memory ordering for the other threads".

Also worth remembering that at some point, somebody has to do synchronization to make atomics work. It doesn't matter if it's in user code or in microcode; it has to exist some place.

If it doesn't, you continue to have different cores with different values for a chunk of memory. This is something you do not want to encounter, and it makes you lose hair (from experience).
Dec 03 2022
On Sunday, 4 December 2022 at 01:51:56 UTC, max haughton wrote:
> Memory ordering is literally why modern atomic operations exist. That's why there's a lock prefix on the instruction in X86 [...]

No it's not; literally go read the Intel SDM. The lock prefix is for atomic operations; that's its intent. That it also has extra memory ordering guarantees between cores is simply because it would be useless if it didn't also do that. The mfence/sfence/lfence instructions are for memory ordering.

If you still don't believe me, consider this: x86 has always had strong memory ordering; there's no need to worry about reads to a given location being moved ahead of writes to the same location if you are on a single core. That only goes out of the window when you have a multicore x86. The lock prefix existed before x86 CPUs had multiple cores. I.e. locked instructions are for atomics; the memory ordering guarantees already existed for all reads/writes on single cores. The extra guarantees for memory ordering between cores were added when x86 went multicore, because lock-free algorithms would not work if the atomic reads/writes were not also ordered between cores.

To say atomics are about memory ordering is like saying cars are for parking. Yes, you have to park them somewhere, but that's not the problem they are meant to solve.
Dec 04 2022
On Sunday, 4 December 2022 at 10:15:48 UTC, claptrap wrote:
> No it's not; literally go read the Intel SDM. The lock prefix is for atomic operations; that's its intent. [...] To say atomics are about memory ordering is like saying cars are for parking. Yes, you have to park them somewhere, but that's not the problem they are meant to solve.

X86 isn't the only processor; by atomics I am referring to the ones we use in programming languages (note that I said modern). The ones in D are basically inherited from C++11 (and LLVM), and were drafted because working with memory ordering prior to them was wild-west.

X86 is also still ordered with or without a LOCK prefix. It's a weaker kind of order, but it's still defined (it actually wasn't defined to an academically satisfactory standard until shockingly recently). For example, a store with release ordering on X86 will yield a regular mov instruction.
Dec 04 2022
On Sunday, 4 December 2022 at 12:39:44 UTC, max haughton wrote:
> X86 isn't the only processor

I think you'll struggle to find any processor that defines atomics in terms of memory ordering and not in terms of a read/modify/write sequence that is executed atomically.

> by atomics I am referring to the ones we use in programming languages (note that I said modern). The ones in D are basically inherited from C++11 (and LLVM), and were drafted because working with memory ordering prior to them was wild-west.

I was working on lock-free algorithms around 2008; there was an Intel PDF at the time specifying memory ordering. Tbh, it probably didn't need to be specified until multicore came about, since the default memory ordering on a single-core x86 meant you literally didn't need to worry about it.

C++.. wild west? Sounds about right :)

> X86 is also still ordered with or without a LOCK prefix. It's a weaker kind of order, but it's still defined.

Yes, the only difference a lock prefix makes is that it adds extra ordering guarantees between cores. Basically, the ordering guarantees you get on a single core, you get the same with locked instructions between cores.
Dec 04 2022
On Monday, 5 December 2022 at 01:37:02 UTC, claptrap wrote:
> I was working on lock-free algorithms around 2008; there was an Intel PDF at the time specifying memory ordering. [...]

https://www.cl.cam.ac.uk/~pes20/weakmemory/x86tso-paper.tphols.pdf

Intel's initial efforts to specify the behaviour of the processors were not up to snuff, at least in the eyes of academic researchers.
Dec 04 2022
claptrap, Max,

Keep in mind that D is not "portable assembly", and neither is C or C++. The druntime library follows (afaict) that logic everywhere. The documentation of atomicFetchAdd could be more clear and explain "lock free", because that does not have a clear meaning in D language context.

If you want the guarantee that the machine code contains specific instructions, you _have_ to write the assembly instructions yourself (which you can do in D, as in C and C++).

regards,
Johan
Dec 05 2022