
digitalmars.D - Fixing core.atomic

reply rm <rymrg memail.com> writes:
I plan on making core.atomic more consistent and easier to use. Please 
provide me with your feedback.


https://github.com/rymrg/drm/blob/main/atomic.d
https://github.com/rymrg/drm/blob/main/atomic_rationale.md
May 30 2021
next sibling parent Johan Engelen <j j.nl> writes:
On Sunday, 30 May 2021 at 20:41:29 UTC, rm wrote:
 I plan on making core.atomic more consistent and easier to use. 
 Please provide me with your feedback.


 https://github.com/rymrg/drm/blob/main/atomic.d
 https://github.com/rymrg/drm/blob/main/atomic_rationale.md
Pretty nice initiative.

`fadd` --> `fetchadd` or `increment`. `fadd` and `fsub` look like floating-point add/sub to me...

cheers,
  Johan
May 30 2021
prev sibling next sibling parent reply Johan Engelen <j j.nl> writes:
On Sunday, 30 May 2021 at 20:41:29 UTC, rm wrote:
 I plan on making core.atomic more consistent and easier to use. 
 Please provide me with your feedback.


 https://github.com/rymrg/drm/blob/main/atomic.d
 https://github.com/rymrg/drm/blob/main/atomic_rationale.md
I would also make the template accept more than just integral types, and only add the increment/binop functions for integral types.
May 30 2021
parent rm <rymrg memail.com> writes:
On 30/05/2021 23:49, Johan Engelen wrote:
 On Sunday, 30 May 2021 at 20:41:29 UTC, rm wrote:
 I plan on making core.atomic more consistent and easier to use. Please 
 provide me with your feedback.


 https://github.com/rymrg/drm/blob/main/atomic.d
 https://github.com/rymrg/drm/blob/main/atomic_rationale.md
I would also make the template accept more than just integral types, and only add the increment/binop functions for integral types.
The plan is to support pointers, as that will be really useful for lock-free data structures. Currently `isPointer` is defined in `core.internal.traits`, so I haven't written that usage yet.
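(A minimal sketch of that shape, using the public `std.traits` predicates instead of the internal trait mentioned above; the `Atomic` name and its members are illustrative, not the actual proposal:)

```D
import core.atomic : atomicFetchAdd, atomicLoad, atomicStore;
import std.traits : isIntegral, isPointer;

// Accept integral and pointer types; only integral types get the
// arithmetic helpers, as suggested above.
struct Atomic(T) if (isIntegral!T || isPointer!T)
{
    private shared T value;

    static if (isIntegral!T)
    {
        T load() { return atomicLoad(value); }
        void store(T v) { atomicStore(value, v); }
        T fetchAdd(T operand) { return atomicFetchAdd(value, operand); }
    }
    else
    {
        // shared is transitive, so this sketch accesses the pointer
        // through a word-sized view to keep the example simple.
        T load() { return cast(T) atomicLoad(*cast(shared size_t*) &value); }
        void store(T v) { atomicStore(*cast(shared size_t*) &value, cast(size_t) v); }
    }
}
```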
May 30 2021
prev sibling next sibling parent reply IGotD- <nise nise.com> writes:
On Sunday, 30 May 2021 at 20:41:29 UTC, rm wrote:
 I plan on making core.atomic more consistent and easier to use. 
 Please provide me with your feedback.


 https://github.com/rymrg/drm/blob/main/atomic.d
 https://github.com/rymrg/drm/blob/main/atomic_rationale.md
Definitely, the D atomic library is cumbersome to use. C++ std::atomic supports operator overloading, for example:

atomicVar += 1;

will create an atomic add, because atomicVar is of the atomic type. D doesn't have this, and I think D should add atomic types like std::atomic<T>. I like this because I can then easily switch between atomic operations and normal operations by just changing the type, with very few other changes.
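(For reference, the increment above is spelled like this with today's core.atomic; the wrapper type being asked for would let the compiler rewrite `+=` into the same call:)

```D
import core.atomic : atomicOp;

shared int atomicVar;

void bump()
{
    // Today's core.atomic spelling of C++'s `atomicVar += 1;`
    atomicOp!"+="(atomicVar, 1);
}
```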
May 30 2021
next sibling parent Zardoz <luis.panadero gmail.com> writes:
On Sunday, 30 May 2021 at 20:58:56 UTC, IGotD- wrote:
 On Sunday, 30 May 2021 at 20:41:29 UTC, rm wrote:
 I plan on making core.atomic more consistent and easier to 
 use. Please provide me with your feedback.


 https://github.com/rymrg/drm/blob/main/atomic.d
 https://github.com/rymrg/drm/blob/main/atomic_rationale.md
Definitely, the D atomic library is cumbersome to use. C++ std::atomic supports operator overloading for example. atomicVar += 1; will create an atomic add as atomicVar is of the atomic type. D doesn't have this and I think D should add atomic types like std::atomic<T>. I like this because then I can easily switch between atomic operations and normal operations by just changing the type and very few changes.
Yes, please! This should be merged ASAP.
May 30 2021
prev sibling parent reply sarn <sarn theartofmachinery.com> writes:
On Sunday, 30 May 2021 at 20:58:56 UTC, IGotD- wrote:
 Definitely, the D atomic library is cumbersome to use. C++ 
 std::atomic supports operator overloading for example.

 atomicVar += 1;

 will create an atomic add as atomicVar is of the atomic type. D 
 doesn't have this and I think D should add atomic types like 
 std::atomic<T>.
That was a design choice. It's because of this:
 I like this because then I can easily switch between atomic 
 operations and normal operations by just changing the type and 
 very few changes.
The trouble is that only works in a handful of simple cases (e.g., you just want a simple event counter that doesn't affect flow of control). For anything else, you need to think carefully about exactly where the atomic operations are, so there's no point making them implicit.
May 31 2021
parent reply rm <rymrg memail.com> writes:
On 01/06/2021 5:50, sarn wrote:
 On Sunday, 30 May 2021 at 20:58:56 UTC, IGotD- wrote:
 Definitely, the D atomic library is cumbersome to use. C++ std::atomic 
 supports operator overloading for example.

 atomicVar += 1;

 will create an atomic add as atomicVar is of the atomic type. D 
 doesn't have this and I think D should add atomic types like 
 std::atomic<T>.
That was a design choice.  It's because of this:
 I like this because then I can easily switch between atomic operations 
 and normal operations by just changing the type and very few changes.
The trouble is that only works in a handful of simple cases (e.g., you just want a simple event counter that doesn't affect flow of control). For anything else, you need to think carefully about exactly where the atomic operations are, so there's no point making them implicit.
I agree with that. One shouldn't access the same memory location atomically and non-atomically interchangeably; that is a source of many bugs, especially considering the kind of synchronization you will or won't get as a result. Still, there are cases where you *know* that your thread is the *only one* that can access a variable. In such a case, only after you have made sure to synchronize can you also allow non-atomic access to the variable (though I'd still avoid this). The other direction, going from non-atomic to atomic, also comes up: after initializing the location non-atomically with an allocator, you move to using it atomically to synchronize between threads.

Regarding the design choice: if your intention is to prevent casting the atomic to non-atomic, you can simply wrap it in a struct and not allow access to the raw value. That should be sufficient.

Anyway, I disagree about the simple cases. Specifically for a simple event counter that isn't required for synchronization, you should be using relaxed ordering. There is no need for sequential consistency in that case.
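(For the relaxed counter case, a minimal sketch with today's core.atomic; `recordEvent`/`currentCount` are illustrative names:)

```D
import core.atomic : atomicFetchAdd, atomicLoad, MemoryOrder;

shared size_t eventCount;

// A pure statistics counter needs no ordering with other memory,
// so relaxed ("raw") ordering is enough; no sequential consistency.
void recordEvent()
{
    atomicFetchAdd!(MemoryOrder.raw)(eventCount, 1);
}

size_t currentCount()
{
    return atomicLoad!(MemoryOrder.raw)(eventCount);
}
```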
Jun 02 2021
parent reply sarn <sarn theartofmachinery.com> writes:
On Wednesday, 2 June 2021 at 14:50:44 UTC, rm wrote:
 *snip*
Sorry, but I don't feel like anything you wrote relates to anything I actually said. For example:
 Anyway, I disagree about the simple cases. Because specifically 
 the case of simple event counter that isn't require for 
 synchronization, you should be using relaxed. There is no need 
 for sequential consistency in this case.
The "simple cases" comment was about how the "thread-safe value" abstraction only works at all in a few simple cases (such as an event counter that doesn't affect flow of control). No one has implied anything about what memory order you need for a counter. But if you do consider memory order, that's more reason to treat atomic operations as explicit atomic *operations*, and not wrap them in a "thread-safe value" abstraction.
Jun 02 2021
parent rm <rymrg memail.com> writes:
On 03/06/2021 1:04, sarn wrote:
 On Wednesday, 2 June 2021 at 14:50:44 UTC, rm wrote:
 *snip*
Sorry, but I don't feel like anything you wrote relates to anything I actually said.  For example:
 Anyway, I disagree about the simple cases. Because specifically the 
 case of simple event counter that isn't require for synchronization, 
 you should be using relaxed. There is no need for sequential 
 consistency in this case.
The "simple cases" comment was about how the "thread-safe value" abstraction only works at all in a few simple cases (such as an event counter that doesn't affect flow of control).  No one has implied anything about what memory order you need for a counter. But if you do consider memory order, that's more reason to treat atomic operations as explicit atomic *operations*, and not wrap them in a "thread-safe value" abstraction.
I think I was conflating two replies into one and misread your response. Either way, it does allow for easier porting from C/C++ if such code is used.
Jun 06 2021
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Sunday, 30 May 2021 at 20:41:29 UTC, rm wrote:
 I plan on making core.atomic more consistent and easier to use. 
 Please provide me with your feedback.


 https://github.com/rymrg/drm/blob/main/atomic.d
 https://github.com/rymrg/drm/blob/main/atomic_rationale.md
I once implemented an atomic struct like this, and the first thing that happened is that you would write:

s = s + 1;

breaking atomicity.
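(Concretely, with only load and store overloaded, the line above decomposes into two separate atomic accesses; a minimal illustration using core.atomic directly:)

```D
import core.atomic : atomicLoad, atomicStore;

shared int s;

// What `s = s + 1;` amounts to when only load and store are atomic:
void brokenIncrement()
{
    int tmp = atomicLoad(s);  // atomic load
    tmp += 1;                 // plain addition in a local
    atomicStore(s, tmp);      // atomic store
    // Another thread's increment landing between the load and the store
    // is overwritten: each access is atomic, the whole update is not.
}
```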
May 31 2021
next sibling parent reply IGotD- <nise nise.com> writes:
On Monday, 31 May 2021 at 08:18:35 UTC, Guillaume Piolat wrote:
 I have once implemented an atomic struct like this and the 
 first thing that happened is that you would write:


 s = s + 1;

 Breaking atomicity.
Yes, so programmers must be aware of this. In C++, only the compound assignment operators -= and +=, plus -- and ++, are supported. Is it possible to overload the binary operators so that they cause a compiler error?

To achieve the same as s = s + 1; you need to write:

s.store(s.load() + 1)

However, writing s = 1 with the assignment operator is nicer than s.store(1).
May 31 2021
next sibling parent Paul Backus <snarwin gmail.com> writes:
On Monday, 31 May 2021 at 08:52:33 UTC, IGotD- wrote:
 Is it possible to overload binary operators so that they cause 
 an compiler error?
`@disable opBinary` should do it.
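(A minimal sketch of how that could look, assuming a hypothetical `Atomic` wrapper: `opOpAssign` keeps the atomic `+=` while the disabled `opBinary` rejects plain `+` at compile time:)

```D
import core.atomic : atomicOp;

struct Atomic(T)
{
    private shared T value;

    // Plain binary operators (s + 1, s - 1, ...) are a compile-time error.
    @disable T opBinary(string op)(T rhs);

    // Compound assignment stays atomic: s += 1 is a single atomic RMW.
    void opOpAssign(string op)(T operand)
    {
        atomicOp!(op ~ "=")(value, operand);
    }
}

void main()
{
    Atomic!int s;
    s += 1;                                     // OK, atomic add
    static assert(!__traits(compiles, s + 1));  // rejected at compile time
}
```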
May 31 2021
prev sibling parent reply Guillaume Piolat <first.last gmail.com> writes:
On Monday, 31 May 2021 at 08:52:33 UTC, IGotD- wrote:
 However, the assignment operator writing s = 1 is nice instead 
 of s.store(1).
I don't think so, it's a bit implicit meaning to be atomic, so it falls under "nice short syntax for something pretty much important": a terrible idea.
May 31 2021
parent Guillaume Piolat <first.last gmail.com> writes:
On Monday, 31 May 2021 at 16:33:05 UTC, Guillaume Piolat wrote:
 On Monday, 31 May 2021 at 08:52:33 UTC, IGotD- wrote:
 bit
big*
May 31 2021
prev sibling parent reply rm <rymrg memail.com> writes:
On 31/05/2021 11:18, Guillaume Piolat wrote:
 
 I have once implemented an atomic struct like this and the first thing 
 that happened is that you would write:
 
 
 s = s + 1;
 
 Breaking atomicity.
I don't consider this a problem. In this case you have a load and a store. This is a non-atomic RMW. On the other hand, you do get sequential consistency synchronization from this process.
May 31 2021
parent reply Guillaume Piolat <first.last gmail.com> writes:
On Monday, 31 May 2021 at 09:26:36 UTC, rm wrote:
 I don't consider this a problem. In this case you have a load 
 and a store. This is a non-atomic RMW. On the other hand, you 
 do get sequential consistency synchronization from this process.
I prefer atomicLoad and atomicStore then, because it's explicit and it's useless to hide the fact it's atomic behind nice syntax.
May 31 2021
next sibling parent reply IGotD- <nise nise.com> writes:
On Monday, 31 May 2021 at 16:34:35 UTC, Guillaume Piolat wrote:
 I prefer atomicLoad and atomicStore then, because it's explicit 
 and it's useless to hide the fact it's atomic behind nice 
 syntax.
Yes, you can use it if you want to. We will not remove the regular D atomic functions.
May 31 2021
parent reply Max Haughton <maxhaton gmail.com> writes:
On Monday, 31 May 2021 at 17:51:26 UTC, IGotD- wrote:
 On Monday, 31 May 2021 at 16:34:35 UTC, Guillaume Piolat wrote:
 I prefer atomicLoad and atomicStore then, because it's 
 explicit and it's useless to hide the fact it's atomic behind 
 nice syntax.
Yes, you can use it if you want to. We will not remove the regular D atomic functions.
That, and C++'s `std::atomic` will provide the same semantics for wide types (LDC and GDC seem to differ in behaviour here when you use the atomic primitive functions).
May 31 2021
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 May 2021 at 20:01:57 UTC, Max Haughton wrote:
 On Monday, 31 May 2021 at 17:51:26 UTC, IGotD- wrote:
 On Monday, 31 May 2021 at 16:34:35 UTC, Guillaume Piolat wrote:
 I prefer atomicLoad and atomicStore then, because it's 
 explicit and it's useless to hide the fact it's atomic behind 
 nice syntax.
Yes, you can use it if you want to. We will not remove the regular D atomic functions.
That and the C++ `std::atomic` will provide the same semantics on types of wide size (LDC and GDC seem to differ in behaviour here when you use the atomic primitive functions.)
Are you sure?

«All atomic types except for std::atomic_flag may be implemented using mutexes or other locking operations, rather than using the lock-free atomic CPU instructions.»

https://en.cppreference.com/w/cpp/atomic/atomic_is_lock_free

C/C++ is trying to be hardware-independent to a much larger extent than D.
May 31 2021
parent reply Max Haughton <maxhaton gmail.com> writes:
On Monday, 31 May 2021 at 20:43:37 UTC, Ola Fosheim Grøstad wrote:
 On Monday, 31 May 2021 at 20:01:57 UTC, Max Haughton wrote:
 On Monday, 31 May 2021 at 17:51:26 UTC, IGotD- wrote:
 [...]
That and the C++ `std::atomic` will provide the same semantics on types of wide size (LDC and GDC seem to differ in behaviour here when you use the atomic primitive functions.)
Are you sure? «All atomic types except for std::atomic_flag may be implemented using mutexes or other locking operations, rather than using the lock-free atomic CPU instructions.» https://en.cppreference.com/w/cpp/atomic/atomic_is_lock_free C/C++ is trying to be hardware-independent to a much larger extent than D.
"Atomic types are also allowed to be sometimes lock-free" https://gcc.godbolt.org/z/Ph981GvY8 Note the use of library calls.
May 31 2021
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 May 2021 at 21:01:35 UTC, Max Haughton wrote:
 https://en.cppreference.com/w/cpp/atomic/atomic_is_lock_free

 C/C++ is trying to be hardware-independent to a much larger 
 extent than D.
"Atomic types are also allowed to be sometimes lock-free"
Yes, that is what the trait is for? But with the limited hardware scope D has it surely can provide more convenient guarantees than C++ can?
May 31 2021
parent reply Max Haughton <maxhaton gmail.com> writes:
On Monday, 31 May 2021 at 21:08:37 UTC, Ola Fosheim Grøstad wrote:
 On Monday, 31 May 2021 at 21:01:35 UTC, Max Haughton wrote:
 https://en.cppreference.com/w/cpp/atomic/atomic_is_lock_free

 C/C++ is trying to be hardware-independent to a much larger 
 extent than D.
"Atomic types are also allowed to be sometimes lock-free"
Yes, that is what the trait is for? But with the limited hardware scope D has it surely can provide more convenient guarantees than C++ can?
This is orthogonal to the example I posted: if the hardware can't perform the operation using simple atomic instructions, you might as well provide the fallback case anyway, both for easier correctness and to kill two birds with one API. Guaranteeing that the type uses the instructions where possible is up to the implementation, but the guarantee can be made nonetheless.
May 31 2021
parent Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 May 2021 at 21:23:17 UTC, Max Haughton wrote:
 This is orthogonal to the example I posted, what if the 
 hardware can't perform the operation using simple atomic 
 instructions, you might as well provide the fallback case 
 anyway - both for easier correctness and to kill two birds with 
 one API. Guaranteeing that the type uses the instructions 
 anyway is up to the implementation, but the guarantee can be 
 made nonetheless.
I am not sure I understand what you mean now. Locking operations may imply completely different algorithms. In C++ you can either do a static compile-time check using `is_always_lock_free`, or a dynamic runtime check (and take an alternative path if it isn't lock-free). The dynamic check allows higher performance when it can be used, but that might require a completely different algorithm? Or, with C++20, you have the optional `atomic_signed_lock_free` and `atomic_unsigned_lock_free`, which I will probably use when I get them.
May 31 2021
prev sibling parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Monday, 31 May 2021 at 16:34:35 UTC, Guillaume Piolat wrote:
 On Monday, 31 May 2021 at 09:26:36 UTC, rm wrote:
 I don't consider this a problem. In this case you have a load 
 and a store. This is a non-atomic RMW. On the other hand, you 
 do get sequential consistency synchronization from this 
 process.
I prefer atomicLoad and atomicStore then, because it's explicit and it's useless to hide the fact it's atomic behind nice syntax.
Yes, how often do people use this anyway? I try to avoid concurrency issues and have found that I tend to end up using compare-exchange when I have to.
May 31 2021
parent reply rm <rymrg memail.com> writes:
On 31/05/2021 23:33, Ola Fosheim Grøstad wrote:
 On Monday, 31 May 2021 at 16:34:35 UTC, Guillaume Piolat wrote:
 On Monday, 31 May 2021 at 09:26:36 UTC, rm wrote:
 I don't consider this a problem. In this case you have a load and a 
 store. This is a non-atomic RMW. On the other hand, you do get 
 sequential consistency synchronization from this process.
I prefer atomicLoad and atomicStore then, because it's explicit and it's useless to hide the fact it's atomic behind nice syntax.
Yes, how often do people use this anyway? I try to avoid concurrency issues and have found that I tend to end up using compare-exchange when I have to.
It's useful if you want to implement known concurrency algorithms with SC semantics, such as Lamport's lock (which requires SC).

http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-2-of-2

It's there to nudge people away from the weaker semantics and allow easy synchronization.
Jun 02 2021
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 2 June 2021 at 14:08:32 UTC, rm wrote:
 It's useful if you want to implement known concurrency 
 algorithms with SC semantics. Such as lamports lock (which 
 requires SC).
Have you ever used Lamport's Bakery, though? Atomic inc/dec are obviously useful, but usually you want to know what the value was before/after the operation, so fetch_add/compare_exchange are easier to deal with IMO.
Jun 02 2021
parent reply rm <rymrg memail.com> writes:
On 02/06/2021 17:59, Ola Fosheim Grøstad wrote:
 On Wednesday, 2 June 2021 at 14:08:32 UTC, rm wrote:
 It's useful if you want to implement known concurrency algorithms with 
 SC semantics. Such as lamports lock (which requires SC).
Have you ever used Lamport's Bakery, though?
Not Lamport's Bakery, but I did implement some primitives; betterC does limit the options for working with Phobos. For the other cases, I do start with explicit syntax, since I begin with strong accesses and try to relax them as I progress. But that's mostly because I want to try to use the weaker memory semantics.
 Atomic inc/dec are obviously useful, but usually you want to know what 
 the value was before/after the operation, so fetch_add/compare_exchange 
 are easier to deal with IMO.
What's wrong with this?

```D
Atomic!int x = 5;
int a = x++; // a = 5
```

https://github.com/rymrg/drm/blob/9db88fb468e2b8babdf9bde488d28d733aea638f/atomic.d#L95

inc/dec are implemented in terms of fetch_add.
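(The same semantics expressed with today's core.atomic primitives, showing that fetch_add hands back the value held before the addition:)

```D
import core.atomic : atomicFetchAdd, atomicLoad;

shared int x = 5;

void main()
{
    int a = atomicFetchAdd(x, 1);          // a == 5: the old value
    assert(a == 5 && atomicLoad(x) == 6);  // x now holds 6
}
```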
Jun 02 2021
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 2 June 2021 at 15:09:54 UTC, rm wrote:
 inc/dec are implemented in terms of fetch_add.
IIRC some architectures provide more efficient inc/dec atomics without fetch? I haven't looked at that in years, so I have no idea what the contemporary situation is.
Jun 02 2021
parent reply Ola Fosheim Grøstad <ola.fosheim.grostad gmail.com> writes:
On Wednesday, 2 June 2021 at 15:19:59 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 2 June 2021 at 15:09:54 UTC, rm wrote:
 inc/dec are implemented in terms of fetch_add.
IIRC some architectures provide more efficient inc/dec atomics without fetch? I haven't looked at that in years, so I have no idea what the contemporary situation is.
No, I think that was wrong; I think they usually return the original value (or set a flag or whatever). But it doesn't matter. We should just look at what the common contemporary processors provide and look at the instruction throughput per clock cycle. I guess the last generation of ARM/Intel/AMD is sufficient?
Jun 02 2021
parent reply Max Haughton <maxhaton gmail.com> writes:
On Wednesday, 2 June 2021 at 15:30:46 UTC, Ola Fosheim Grøstad 
wrote:
 On Wednesday, 2 June 2021 at 15:19:59 UTC, Ola Fosheim Grøstad 
 wrote:
 On Wednesday, 2 June 2021 at 15:09:54 UTC, rm wrote:
 inc/dec are implemented in terms of fetch_add.
IIRC some architectures provide more efficient inc/dec atomics without fetch? I haven't looked at that in years, so I have no idea what the contemporary situation is.
No, I think that was wrong, I think they usually return the original value (or set a flag or whatever). But it doesn't matter. We should just look at what the common contemporary processors provide and look at instructions per clock cycles throughput. I guess last generation ARM/Intel/AMD is sufficient?
Are they always fixed latency? No dependence on the load store queue state (etc.) for example?
Jun 02 2021
parent rm <rymrg memail.com> writes:
On 02/06/2021 20:33, Max Haughton wrote:
 On Wednesday, 2 June 2021 at 15:30:46 UTC, Ola Fosheim Grøstad wrote:
 On Wednesday, 2 June 2021 at 15:19:59 UTC, Ola Fosheim Grøstad wrote:
 On Wednesday, 2 June 2021 at 15:09:54 UTC, rm wrote:
 inc/dec are implemented in terms of fetch_add.
IIRC some architectures provide more efficient inc/dec atomics without fetch? I haven't looked at that in years, so I have no idea what the contemporary situation is.
No, I think that was wrong, I think they usually return the original value (or set a flag or whatever). But it doesn't matter. We should just look at what the common contemporary processors provide and look at instructions per clock cycles throughput. I guess last generation ARM/Intel/AMD is sufficient?
Are they always fixed latency? No dependence on the load store queue state (etc.)  for example?
At least on x86-TSO, an atomic (locked) operation forces the store buffer to be drained before the operation completes.
Jun 02 2021