digitalmars.D - Low-Lock Singletons In D
- dsimcha (12/12) May 05 2013 On the advice of Walter and Andrei, I've written a blog article
- Joshua Niehus (2/6) May 05 2013 Excellent talk at the conf, solid blog: +1 and 1
- Andrei Alexandrescu (8/19) May 05 2013 The reddit post is in need for upvotes to make it on the /r/programming
- Mehrdad (44/48) May 06 2013 Hmm...
- Mehrdad (2/6) May 06 2013 Oops I forgot the null check inside, don't forget that.
- Sergei Nosov (4/10) May 06 2013 All that double-checking stuff is trying to avoid calling "lock"
- Mehrdad (5/16) May 06 2013 There are _two_ Get() functions. Only one of them calls lock();
- Dmitry Olshansky (13/61) May 06 2013 And that field is still shared... it doesn't matter if the null was
- Mehrdad (2/6) May 06 2013
- Mehrdad (5/7) May 06 2013 That never happens, though. _static is only written to inside a
- deadalnix (4/11) May 06 2013 ON x86 you rarely see this kind of thing as the memory model is
- Mehrdad (6/21) May 06 2013 Ah yeah, this entire thing was geared toward x86.
- Dmitry Olshansky (26/33) May 06 2013 Yes, but the 1st processor may read _static exactly when the 2nd is
- Mehrdad (5/14) May 06 2013 It's a single pointer, there is no partial state -- it's either
- David Nadlinger (3/6) May 06 2013 No. But *aligned* word-sized writes *on x86* are.
- Mehrdad (3/9) May 06 2013 Right, but this was specifically meant for x86 (as I mentioned in
- Dmitry Olshansky (55/66) May 06 2013 True... e.g on x86 if that word is aligned.
- Mehrdad (14/27) May 06 2013 That's exactly the same as the classic double-checked lock bug,
- Dmitry Olshansky (11/34) May 07 2013 Yeah, and that was my point to begin with - your method doesn't bring
- Walter Bright (2/4) May 06 2013 I did that too. It's trickier than it appears.
- Andrei Alexandrescu (3/12) May 06 2013 It's well known. Needs a memory barrier on each access, so it's slower.
- Mehrdad (6/8) May 06 2013 Hmm, are you saying I'm missing a memory barrier I should have
- Andrei Alexandrescu (8/14) May 06 2013 Any concurrent operation (in this case read from one thread and write
- Mehrdad (9/17) May 06 2013 Hmm, are you referring to the same lack of a barrier that the
- Andrei Alexandrescu (4/21) May 07 2013 No. A tutorial on memory consistency models would be too long to insert
- Steven Schveighoffer (16/18) May 07 2013 In essence, a read requires an acquire memory barrier, a write requires ...
- Andrei Alexandrescu (4/14) May 07 2013 There's always a need to acquire so as to figure whether the steady
- Steven Schveighoffer (26/42) May 07 2013 Not really. Whether it is entered or not is dictated by the vtable. Ev...
- deadalnix (4/19) May 07 2013 That is incorrect as the thread not going into the lock can see a
- Steven Schveighoffer (4/23) May 07 2013 The memory barrier prevents that. You don't store the variable until th...
- Andrei Alexandrescu (6/29) May 07 2013 A memory barrier is not a one-way thing, i.e. not only the writer must
- Dmitry Olshansky (11/25) May 07 2013 Exactly.
- Steven Schveighoffer (31/57) May 07 2013 =
- Andrei Alexandrescu (11/33) May 07 2013 Yes. (One detail is there are several kinds of barriers, such as acquire...
- Steven Schveighoffer (6/18) May 07 2013 I will have to watch those some time, when I have 3 hours to kill :)
- deadalnix (7/43) May 07 2013 Short answer : because read can be done out of order and/or from
- Mehrdad (10/16) May 07 2013 Andrew, I still don't understand:
- Andrei Alexandrescu (6/21) May 07 2013 The writer is only half of the equation. The reader has its own cache to...
- Mehrdad (9/14) May 07 2013 Oooh! so basically this is the scenario you're referring to?
- Brad Roberts (7/32) May 07 2013 (this might be a repeat, I've only skimmed this thread)
- Andrei Alexandrescu (3/22) May 07 2013 Correct.
- Walter Bright (4/6) May 07 2013 Here's one written by this Andrei Alexandrescu fellow, I hear he knows w...
- Andrei Alexandrescu (3/10) May 07 2013 That doesn't get into memory models.
- Dmitry Olshansky (9/36) May 07 2013 Sutter's Mill is a good starting point even though it's C++ biased.
- QAston (3/7) May 07 2013 This was very helpful for me - focuses much more on the memory
- QAston (2/10) May 07 2013 Nah, look @ Dmitry's post, he was first to post that :P
- Mehrdad (2/10) May 25 2013 This was awesome/amazing.
- Sean Kelly (9/12) May 07 2013 The best explanation I've found of memory models is this paper:
- David Nadlinger (6/12) May 07 2013 The issue is that the write to _static might never appear on the
- Steven Schveighoffer (8/21) May 07 2013 I don't think that is an issue. The write to _static is protected by a ...
- Max Samukha (22/35) May 06 2013 FWIW, I played with a generalized form of this pattern long ago,
- Walter Bright (2/4) May 06 2013 And, that's the classic double checked locking bug!
- Steven Schveighoffer (8/13) May 06 2013 I think that's exactly what David had in his slide (well, except for an ...
- Max Samukha (15/20) May 24 2013 D man is the bug! I simply failed to insert a line that flags the
- bearophile (7/10) May 24 2013 In Bugzilla I have suggested some improvements for Nullable, but
- Max Samukha (6/16) May 24 2013 The question is what should be the result of:
- bearophile (9/14) May 24 2013 In D Nullable!(int*) needs to be a different type, because the
- Max Samukha (7/23) May 24 2013 It could be normalized by free functions isNull(T)(T t),
- bearophile (11/17) May 24 2013 In Haskell/Scala if you program idiomatically you usually use
- deadalnix (3/21) May 24 2013 New type, with an extra boolean.
- Idan Arye (7/20) May 06 2013 Thanks!
- Andrej Mitrovic (14/16) May 06 2013 Personally this covers it for me:
- Diggory (2/18) May 06 2013 Although that's only a single-per-thread-ton :P
- Andrej Mitrovic (2/9) May 06 2013
- Steven Schveighoffer (8/17) May 06 2013 The point that is missed is the lazy creation. Imagine if it's not just...
- Steven Schveighoffer (22/33) May 07 2013 Pulling this out from the depths of this discussion:
On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D. This is a previously obscure pattern that uses thread-local storage to make Singletons both thread-safe and efficient, and was independently invented by at least me and Alexander Terekhov, an IBM researcher. However, D's first-class treatment of thread-local storage means the time has come to move it out of obscurity and possibly make it the standard way to do Singletons.

Article: http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/
Reddit: http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/
May 05 2013
On Monday, 6 May 2013 at 02:35:33 UTC, dsimcha wrote:
> Article: http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/
> Reddit: http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/

Excellent talk at the conf, solid blog: +1 and 1
May 05 2013
On 5/5/13 10:35 PM, dsimcha wrote:
> On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D. [...]
> Article: http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/
> Reddit: http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/

The reddit post is in need of upvotes to make it onto the /r/programming front page. I think the posting time wasn't the best: Saturday is the worst day to post anything (http://www.slideshare.net/babasave/the-hidden-secrets-of-successful-reddit-posts).

Anyhow, I've also posted to https://news.ycombinator.com/item?id=5660897

Vote up!

Andrei
May 05 2013
On Monday, 6 May 2013 at 02:35:33 UTC, dsimcha wrote:
> that uses thread-local storage
> On DMD the overhead of TLS vs. unsafe is noticeable but small. In both cases it pales in comparison to the overhead of synchronizing on every call to get().

Hmm...

So I just invented another method right now in like 10 minutes, which can completely avoid TLS altogether, and which I think might end up being faster.

Basically, my idea is that since modern CPUs (x86) are great at predicting predictable virtual calls, you can use that to your advantage as shown below. I don't have the time to do it at the moment, but if you're interested, give it a try and see how it compares to TLS:

class Program
{
    private interface IValue<T> { T Get(); }

    private class ActualValue<T> : IValue<T>
    {
        private T value;
        public T Get() { return this.value; }
    }

    private class NullValue<T> : IValue<T>
    {
        // This field is initialized on startup
        public static IValue<T> _static = new NullValue<T>();

        public T Get()
        {
            lock (this) { _static = new ActualValue<T>(); }
            // Insert memory barrier if you'd like
            return _static.Get();
        }
    }

    public static object Static
    {
        get { return NullValue<object>._static.Get(); }
    }

    static void Main(string[] args)
    {
        var a = Static; // initializes on first use
        var b = Static; // doesn't initialize anymore
    }
}
May 06 2013
On Monday, 6 May 2013 at 09:06:55 UTC, Mehrdad wrote:
> lock (this) { _static = new ActualValue<T>(); }

Oops, I forgot the null check inside, don't forget that.
May 06 2013
On Monday, 6 May 2013 at 09:11:00 UTC, Mehrdad wrote:
> On Monday, 6 May 2013 at 09:06:55 UTC, Mehrdad wrote:
>> lock (this) { _static = new ActualValue<T>(); }
> Oops I forgot the null check inside, don't forget that.

All that double-checking stuff is trying to avoid calling "lock" for the entire Get function body. Your solution does exactly that. So it's no good.
May 06 2013
On Monday, 6 May 2013 at 09:30:24 UTC, Sergei Nosov wrote:
> All that double-checking stuff is trying to avoid calling "lock" for the entire Get function body. Your solution does exactly that. So it's no good.

There are _two_ Get() functions. Only one of them calls lock(); once the field is initialized, that Get() is no longer called, and the other one is called instead. Unless I'm missing something?
May 06 2013
06-May-2013 13:06, Mehrdad wrote:
> So I just invented another method right now in like 10 minutes, which can completely avoid TLS altogether, and which I think might end up being faster.

And that field is still shared... it doesn't matter if the null was replaced by NullValue and an if-branch with indirect call. You have to read a field to know what to do next, and the other processor may as well write to it.

> public T Get() { lock (this) {

if (_static != null)

> _static = new ActualValue<T>();

Who told you that the processor will immediately see the fully constructed value of ActualValue upon assignment? Barriers and other minor forms of black magic are still required on each access (e.g. atomic reads and atomic writes), which isn't the case with the TLS flag as discussed.

-- Dmitry Olshansky
May 06 2013
On Monday, 6 May 2013 at 09:35:59 UTC, Dmitry Olshansky wrote:
> Who told you that the processor will immediately see the fully constructed value of ActualValue upon assignment? Barriers and other minor forms of black magic are still required

Isn't that why I wrote this?

> // Insert memory barrier if you'd like
May 06 2013
On Monday, 6 May 2013 at 09:35:59 UTC, Dmitry Olshansky wrote:
> You have to read a field to know what to do next, and the other processor may as well write to it.

That never happens, though. _static is only written to inside a lock, and the check is inside a lock, hence the other processor can't be writing to it when it's being read... Maybe I misunderstood what you're saying?
May 06 2013
On Monday, 6 May 2013 at 10:06:50 UTC, Mehrdad wrote:
> That never happens, though. _static is only written to inside a lock, and the check is inside a lock, hence the other processor can't be writing to it when it's being read... Maybe I misunderstood what you're saying?

On x86 you rarely see this kind of thing, as the memory model is strong. You can see some weird things in rare cases, but as soon as you leave the x86 world, it isn't even worth trying.
May 06 2013
On Monday, 6 May 2013 at 10:12:53 UTC, deadalnix wrote:
> On x86 you rarely see this kind of thing as the memory model is strong. You can see some weird thing in rare cases, but as soon as you leave x86 world, it isn't even worth trying.

Ah yeah, this entire thing was geared toward x86.

On second thought, maybe it doesn't improve anything compared to the double-checked lock, since the branch prediction for 'if' is most likely faster than the branch prediction for a virtual call anyway. It was worth trying, I guess...
May 06 2013
06-May-2013 14:06, Mehrdad wrote:
> That never happens, though. _static is only written to inside a lock,

Yes, but the 1st processor may read _static exactly when the 2nd is inside the lock and writing to that field. Then chances are it will read whatever partial state has been written.

> and the check is inside a lock, hence the other processor can't be writing to it when it's being read...

Surely it can. The 1st processor takes the lock and writes while the second one reads _static to call Get.

> Maybe I misunderstood what you're saying?

I bet you just mystified the pattern a bit by relying on an indirect call. In pseudo-code, what you do is this:

read static_              // 1st processor may be reading here
if static_ is NullValue   // that is an indirect call
    lock(static_){
        check again
        write static_     // while 2nd one writes here
    }
else
    call static_ -~~-> get

I claim that 'read static_' had better be "protected" against that 'write static_' inside the lock. One way is to ensure the write is atomic w.r.t. that particular read operation, e.g. by hardware acquire-release, or via both reads/writes being atomic. On x86 word-sized reads/writes are atomic anyway (iff aligned, apparently).

The neat stuff about that TLS pattern was that you need not rely on hardware-specific support.

-- Dmitry Olshansky
May 06 2013
On Monday, 6 May 2013 at 11:21:37 UTC, Dmitry Olshansky wrote:
> Yes, but the 1st processor may read _static exactly when the 2nd is inside the lock and writing to that field. Then chances are it will read whatever partial state there is written. Surely it can. 1st processor takes the lock and writes while the second one reads static_ to call Get.

It's a single pointer, there is no partial state -- it's either written or it isn't.

> One way is to ensure write is atomic w.r.t. that particular read operation

Aren't pointer writes always atomic?

> The neat stuff about that TLS pattern was that you need not rely on hardware-specific support.

Ah okay, I see, didn't realize that.
May 06 2013
On Monday, 6 May 2013 at 18:23:51 UTC, Mehrdad wrote:
> Aren't pointer writes always atomic?

No. But *aligned* word-sized writes *on x86* are.

David
May 06 2013
On Monday, 6 May 2013 at 18:51:12 UTC, David Nadlinger wrote:
> No. But *aligned* word-sized writes *on x86* are.

Right, but this was specifically meant for x86 (as I mentioned in the OP), and pointers are aligned by default. :)
May 06 2013
06-May-2013 22:23, Mehrdad wrote:
> It's a single pointer, there is no partial state -- it's either written or it isn't.

True... e.g. on x86, if that word is aligned.

But the core of the problem is not only that word, it's rather the fact that the processor (and compiler, and whatnot) is free to re-order read/write ops inside that locked region. Potentially it could lead to epically nasty things.

struct Static {
    int value;
    Static(int v) { value = v; }
}

Now suppose:

lock(_static){
    _static = new Static(42); // or smth like it
}

is compiled down to this "C--" code (quite realistically):

lock _static_mutex;
x = alloc int;
x[0] = 42;
static_ = x;
unlock _static_mutex;

And now the compiler/CPU decides to optimize/execute it out of order (again, it's an illustration) as:

lock _static_mutex;
x = alloc int;       // even if that's atomic
static_ = x;         // BOOM! somebody not locking the mutex may already
                     // see static_ in a "half-baked" state
x[0] = 42;
unlock _static_mutex;

Observe that:
a) It can happen, and nothing prevents such a scenario. In fact the compiler should feel free to optimize inside the locked section, but not across it.
b) The chance that it happens is non-zero and rises with the complexity of construction.
c) It depends on the kind of hardware used, as in OoO vs non-OoO.

So it would work more reliably on old Atoms, if that is of any comfort ;). That is, if your compiler isn't equally smart and does nasty things behind your back ("Sorry, I just scheduled instructions optimally...").

> Aren't pointer writes always atomic?

In short - no. Even not counting the world of legal re-ordering, unaligned writes too have this nasty habit of partially written stuff being visible at the wrong time.

Truth be told, relying on this kind of "not ever likely to go wrong" is the very reason explicit atomics are encouraged, even if they are NOPs on your current arch. That, or faster/harder stuff - barriers.

Speaking of barriers - that is exactly the kind of thing that would disallow moving the write of x[0] = 42 past the "line" of static_ = x; Simply put, barriers act as a line of "no reordering across this point" for specific operations (depending on type, or all of them). Needless to say, this hurts performance.

-- Dmitry Olshansky
May 06 2013
On Monday, 6 May 2013 at 18:56:08 UTC, Dmitry Olshansky wrote:

Thanks for the detailed explanation!

> And now the compiler/CPU decides to optimize/execute it out of order (again, it's an illustration) as:
> lock _static_mutex;
> x = alloc int;  // even if that's atomic
> static_ = x;    // BOOM! somebody not locking the mutex may already
>                 // see static_ in a "half-baked" state
> x[0] = 42;
> unlock _static_mutex;

That's exactly the same as the classic double-checked lock bug, right?

As I wrote in my original code -- and as you also mentioned yourself -- isn't it trivially fixed with a memory barrier? Like maybe replacing

_static = new ActualValue<T>();

with

var value = new ActualValue<T>();
_ReadWriteBarrier();
_static = value;

Wouldn't this make it correct?

> In short - no. Even not counting the world of legal re-ordering, unaligned writes

But my example was completely aligned...
May 06 2013
07-May-2013 10:47, Mehrdad wrote:
> That's exactly the same as the classic double-checked lock bug, right?

Yeah, and that was my point to begin with - your method doesn't bring anything new. It's the same as the one with null and the if-null-check, with the same issues, and requires atomics or barriers.

> Isn't it trivially fixed with a memory barrier? [...] Wouldn't this make it correct?

It would, but then it's the same as the old fixed double-checked locking. Barriers hurt the performance that we were after to begin with.

Now it would be interesting to measure the speed of this TLS low-lock vs. atomic-load/memory-barrier + mutex. This measurement is absent from the blog post, but Andrei claims a memory barrier on each access is too slow.

-- Dmitry Olshansky
May 07 2013
On 5/6/2013 2:06 AM, Mehrdad wrote:
> So I just invented another method right now in like 10 minutes, which can completely avoid TLS altogether, and which I think might end up being faster.

I did that too. It's trickier than it appears.
May 06 2013
On 5/6/13 5:06 AM, Mehrdad wrote:
> Hmm... So I just invented another method right now in like 10 minutes, which can completely avoid TLS altogether, and which I think might end up being faster.

It's well known. Needs a memory barrier on each access, so it's slower.

Andrei
May 06 2013
On Monday, 6 May 2013 at 13:33:54 UTC, Andrei Alexandrescu wrote:
> It's well known. Needs a memory barrier on each access, so it's slower.

Hmm, are you saying I'm missing a memory barrier I should have written, or are you saying I already have a memory barrier which I'm not seeing?

The only memory barrier I have is during initialization, and after that only read operations occur.
May 06 2013
On 5/6/13 2:25 PM, Mehrdad wrote:
> Hmm, are you saying I'm missing a memory barrier I should have written, or are you saying I already have a memory barrier which I'm not seeing? The only memory barrier I have is during initialization, and after that only read operations occur.

Any concurrent operation (in this case a read from one thread and a write from another) requires a handshake between threads, most often in the form of a release write coupled with an acquire read. Whenever the handshake is absent but concurrent operations on shared memory do occur, the code is broken.

The beauty of the TLS-based pattern is that in the steady state there's no need for a shared read and handshake.

Andrei
May 06 2013
On Monday, 6 May 2013 at 18:46:56 UTC, Andrei Alexandrescu wrote:
> Any concurrent operation (in this case a read from one thread and a write from another) requires a handshake between threads, most often in the form of a release write coupled with an acquire read. Whenever the handshake is absent but concurrent operations on shared memory do occur, the code is broken. The beauty of the TLS-based pattern is that in the steady state there's no need for a shared read and handshake.

Hmm, are you referring to the same lack of a barrier that the others are also referring to?

As far as I can see, there shouldn't be a need for any other handshake in this example. As long as the object is fully initialized before _static is written to (easy enough with just a memory barrier), there is no penalty for subsequent reads whatsoever. Right?
May 06 2013
On 5/7/13 2:50 AM, Mehrdad wrote:
> As far as I can see, there shouldn't be a need for any other handshake in this example. As long as the object is fully initialized before _static is written to (easy enough with just a memory barrier), there is no penalty for subsequent reads whatsoever. Right?

No. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone?

Andrei
May 07 2013
On Tue, 07 May 2013 09:25:36 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> No. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone?

In essence, a read requires an acquire memory barrier, and a write requires a release memory barrier; but in this case, we only need to be concerned if the value we get back is not valid (i.e. NullValue). Once in steady state, there is no need to acquire (as long as the write is atomic, the read value will either be NullValue or ActualValue, not something else).

The code in the case of NullValue must be handled very carefully with the correct memory barriers (given that it should only execute once, just insert a full memory barrier). But that is not the steady state.

It's not a revolutionary design, it's basic double-checked locking (implemented in an unnecessarily complex way). It can be done right, but it's still really difficult to get right. The benefit of David's method is that it's REALLY easy to get right, and is MM independent.

-Steve
May 07 2013
On 5/7/13 10:31 AM, Steven Schveighoffer wrote:
> In essence, a read requires an acquire memory barrier, a write requires a release memory barrier, but in this case, we only need to be concerned if the value we get back is not valid (i.e. NullValue). Once in steady state, there is no need to acquire (as long as the write is atomic, the read value will either be NullValue or ActualValue, not something else).

There's always a need to acquire so as to figure out whether the steady state has been entered.

Andrei
May 07 2013
On Tue, 07 May 2013 11:30:12 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:
> There's always a need to acquire so as to figure out whether the steady state has been entered.

Not really. Whether it is entered or not is dictated by the vtable. Even classic double-checked locking doesn't need an acquire outside the lock. Even if your CPU's view of the variable is outdated, the check after the memory barrier inside the lock only occurs once. After that, steady state is achieved. All subsequent reads need no memory barriers, because the singleton object will never change after that.

The only thing we need to guard against is non-atomic writes, and out-of-order writes of the static variable (fixed with a memory barrier). Instruction ordering OUTSIDE the lock is irrelevant, because if we don't get the "steady state" value (not null), then we go into the lock to perform the careful initialization with barriers. I think aligned native word writes are atomic, so we don't have to worry about that.

But I think we've spent enough time on this solution. Yes, double-checked locking can be done, but David's pattern is far easier to implement, understand, and explain. It comes at a small cost of checking a boolean before each access of the initialized data. His benchmarks show a very small performance penalty. And another LARGE benefit is you don't have to pull out your obscure (possibly challenged) memory model book/blog post or the CPU spec to prove it :)

Hmm... you might be able to mitigate the penalty by storing the actual object reference instead of a bool in the _instantiated variable. Then a separate load is not required. David?

-Steve
May 07 2013
On Tuesday, 7 May 2013 at 16:14:50 UTC, Steven Schveighoffer wrote:
> [...] Instruction ordering OUTSIDE the lock is irrelevant, because if we don't get the "steady state" value (not null), then we go into the lock to perform the careful initialization with barriers. I think aligned native word writes are atomic, so we don't have to worry about that.

That is incorrect, as the thread not going into the lock can see a partially initialized object.
May 07 2013
On Tue, 07 May 2013 12:30:05 -0400, deadalnix <deadalnix gmail.com> wrote:
> That is incorrect, as the thread not going into the lock can see a partially initialized object.

The memory barrier prevents that. You don't store the variable until the object is initialized. That is the whole point.

-Steve
May 07 2013
On 5/7/13 12:46 PM, Steven Schveighoffer wrote:On Tue, 07 May 2013 12:30:05 -0400, deadalnix <deadalnix gmail.com> wrote:A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does. AndreiOn Tuesday, 7 May 2013 at 16:14:50 UTC, Steven Schveighoffer wrote:The memory barrier prevents that. You don't store the variable until the object is initialized. That is the whole point.Not really. Whether it is entered or not is dictated by the vtable. Even classic double-check locking doesn't need an acquire outside the lock. Even if your CPU's view of the variable is outdated, the check after the memory barrier inside the lock only occurs once. After that, steady state is achieved. All subsequent reads need no memory barriers, because the singleton object will never change after that. The only thing we need to guard against is non-atomic writes, and out of order writes of the static variable (fixed with a memory barrier). Instruction ordering OUTSIDE the lock is irrelevant, because if we don't get the "steady state" value (not null), then we go into the lock to perform the careful initialization with barriers. I think aligned native word writes are atomic, so we don't have to worry about that.That is incorrect as the thread not going into the lock can see a partially initialized object.
May 07 2013
07-May-2013 23:49, Andrei Alexandrescu wrote:On 5/7/13 12:46 PM, Steven Schveighoffer wrote:[snip]On Tue, 07 May 2013 12:30:05 -0400, deadalnix <deadalnix gmail.com> wrote:Exactly. Returning to the confusing point. On x86 things are actually muddied by stronger than required hardware guarantees. And only because of this there is no need for an explicit read barrier (nor does x86 have one) in this case. Still the read operation has to be marked specifically (volatile, asm block, whatever) to ensure the _compiler_ does the right thing (no reordering across it). -- Dmitry OlshanskyA memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does.That is incorrect as the thread not going into the lock can see a partially initialized object.The memory barrier prevents that. You don't store the variable until the object is initialized. That is the whole point.
May 07 2013
On Tue, 07 May 2013 16:02:08 -0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:07-May-2013 23:49, Andrei Alexandrescu wrote:On 5/7/13 12:46 PM, Steven Schveighoffer wrote:[snip]On Tue, 07 May 2013 12:30:05 -0400, deadalnix <deadalnix gmail.com> wrote:That is incorrect as the thread not going into the lock can see a partially initialized object.The memory barrier prevents that. You don't store the variable until the object is initialized. That is the whole point.A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does.Exactly.So the memory barrier ensures that neither the compiler nor the processor can re-order the stores to memory. But you are saying that the *reader* must also put in a memory barrier, otherwise it might see the stores out of order. It does not make sense to me, how does the reader see a half-initialized object, when the only way it sees anything is when it's stored to main memory *and* the stores are done in order? So that must not be what it is doing. What it must be doing is storing out of order, BUT placing a prevention mechanism from reading the memory until the "release" is done? Kind of like a minuscule mutex lock. So while it is out-of-order writing the object data, it holds a lock on the reference pointer to that data, so anyone using acquire cannot read it yet? That actually makes sense to me. Is that the case?Returning to the confusing point. On x86 things are actually muddied by stronger than required hardware guarantees. And only because of this there is no need for an explicit read barrier (nor does x86 have one) in this case.
Still the read operation has to be marked specifically (volatile, asm block, whatever) to ensure the _compiler_ does the right thing (no reordering across it).I would think the model Mehrdad used would be sufficient to prevent caching of the data, since it is a virtual, no? -Steve
May 07 2013
On 5/7/13 6:12 PM, Steven Schveighoffer wrote:So the memory barrier ensures that neither the compiler nor the processor can re-order the stores to memory. But you are saying that the *reader* must also put in a memory barrier, otherwise it might see the stores out of order.Yes. (One detail is there are several kinds of barriers, such as acquire and release.)It does not make sense to me, how does the reader see a half-initialized object, when the only way it sees anything is when it's stored to main memory *and* the stores are done in order?There are several actors involved: two processors, each with its own cache, and the main memory. There are several cache coherency protocols, which reflect in different programming models.So that must not be what it is doing. What it must be doing is storing out of order, BUT placing a prevention mechanism from reading the memory until the "release" is done? Kind of like a minuscule mutex lock. So while it is out-of-order writing the object data, it holds a lock on the reference pointer to that data, so anyone using acquire cannot read it yet? That actually makes sense to me. Is that the case?Not at all. A memory barrier dictates operation ordering. It doesn't do interlocking. One of Herb's slides shows very nicely how memory operations can't be moved ahead of an acquire and past a release.No. AndreiReturning to the confusing point. On x86 things are actually muddied by stronger then required hardware guarantees. And only because of this there no need for explicit read barrier (nor x86 have one) in this case. Still the read operation has to be marked specifically (volatile, asm block, whatever) to ensure the _compiler_ does the right thing (no reordering of across it).I would think the model Mehrdad used would be sufficient to prevent caching of the data, since it is a virtual, no?
May 07 2013
On Tue, 07 May 2013 18:49:19 -0400, Andrei Alexandrescu <SeeWebsiteForEmail erdani.org> wrote:On 5/7/13 6:12 PM, Steven Schveighoffer wrote:I will have to watch those some time, when I have 3 hours to kill :) Thanks for sticking with me and not dismissing out of hand. This is actually the first time I felt close to grasping this concept. -SteveSo that must not be what it is doing. What it must be doing is storing out of order, BUT placing a prevention mechanism from reading the memory until the "release" is done? Kind of like a minuscule mutex lock. So while it is out-of-order writing the object data, it holds a lock on the reference pointer to that data, so anyone using acquire cannot read it yet? That actually makes sense to me. Is that the case?Not at all. A memory barrier dictates operation ordering. It doesn't do interlocking. One of Herb's slides shows very nicely how memory operations can't be moved ahead of an acquire and past a release.
May 07 2013
On Tuesday, 7 May 2013 at 22:12:17 UTC, Steven Schveighoffer wrote:On Tue, 07 May 2013 16:02:08 -0400, Dmitry Olshansky <dmitry.olsh gmail.com> wrote:Short answer: because reads can be done out of order and/or from cached values. The reader must do its part of the job as well. You may want to read this series of articles, they are very interesting. http://preshing.com/20120913/acquire-and-release-semantics 07-May-2013 23:49, Andrei Alexandrescu wrote:So the memory barrier ensures that neither the compiler nor the processor can re-order the stores to memory. But you are saying that the *reader* must also put in a memory barrier, otherwise it might see the stores out of order. It does not make sense to me, how does the reader see a half-initialized object, when the only way it sees anything is when it's stored to main memory *and* the stores are done in order?On 5/7/13 12:46 PM, Steven Schveighoffer wrote:[snip]On Tue, 07 May 2013 12:30:05 -0400, deadalnix <deadalnix gmail.com> wrote:Exactly.A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does.That is incorrect as the thread not going into the lock can see a partially initialized object.The memory barrier prevents that. You don't store the variable until the object is initialized. That is the whole point.
May 07 2013
On Tuesday, 7 May 2013 at 19:49:30 UTC, Andrei Alexandrescu wrote:A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does. AndreiAndrei, I still don't understand: The writer is ensuring that writes to memory are happening _after_ the object is initialized and _before_ the reference to the old object is modified, via a memory barrier. Unless you're claiming that a memory barrier _doesn't_ do what it's supposed to (i.e., the memory module is executing writes out-of-order even though the processor is issuing them in the correct order), there is no way for _anyone_ to see a partially initialized object anywhere...
May 07 2013
On 5/7/13 5:57 PM, Mehrdad wrote:On Tuesday, 7 May 2013 at 19:49:30 UTC, Andrei Alexandrescu wrote:The writer is only half of the equation. The reader has its own cache to worry about and its own loading order.A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does. AndreiAndrew, I still don't understand: The writer is ensuring that writes to memory are happening _after_ the object is initialized and _before_ the reference to the old object is modified, via a memory barrier.Unless you're claiming that a memory barrier _doesn't_ do what it's supposed to (i.e., the memory module is executing writes out-of-order even though the processor is issuing them in the correct order), there is no way for _anyone_ to see a partially initialized object anywhere...I'm not claiming, I'm destroying :o). There is. I know it's confusing. You may want to peruse the reading materials linked by others. Andrei
May 07 2013
On Tuesday, 7 May 2013 at 22:44:28 UTC, Andrei Alexandrescu wrote:The writer is only half of the equation. The reader has its own cache to worry about and its own loading order.Oooh! so basically this is the scenario you're referring to? 1. The reader has the uninitialized data in its cache 2. The writer writes the new data and a pointer to the new data 3. The reader sees a new pointer and attempts to load the new data 4. The reader receives stale data from its cache In that case....I'm not claiming, I'm destroying :o)... well done!!There is. I know it's confusing. You may want to peruse the reading materials linked by others.Ok thanks!
May 07 2013
On 5/7/13 3:44 PM, Andrei Alexandrescu wrote:On 5/7/13 5:57 PM, Mehrdad wrote:(this might be a repeat, I've only skimmed this thread) Section 8.2 of this doc is a good read: http://download.intel.com/products/processor/manual/325384.pdf There's a couple allowed re-orderings that are not what most people expect. Later, BradOn Tuesday, 7 May 2013 at 19:49:30 UTC, Andrei Alexandrescu wrote:The writer is only half of the equation. The reader has its own cache to worry about and its own loading order.A memory barrier is not a one-way thing, i.e. not only the writer must do it. Any operation on shared memory is a handshake between the writer and the reader. If the reader doesn't do its bit, it can see the writes out of order no matter what the writer does. AndreiAndrew, I still don't understand: The writer is ensuring that writes to memory are happening _after_ the object is initialized and _before_ the reference to the old object is modified, via a memory barrier.Unless you're claiming that a memory barrier _doesn't_ do what it's supposed to (i.e., the memory module is executing writes out-of-order even though the processor is issuing them in the correct order), there is no way for _anyone_ to see a partially initialized object anywhere...I'm not claiming, I'm destroying :o). There is. I know it's confusing. You may want to peruse the reading materials linked by others. Andrei
May 07 2013
On 5/7/13 12:30 PM, deadalnix wrote:On Tuesday, 7 May 2013 at 16:14:50 UTC, Steven Schveighoffer wrote:Correct. AndreiNot really. Whether it is entered or not is dictated by the vtable. Even classic double-check locking doesn't need an acquire outside the lock. Even if your CPU's view of the variable is outdated, the check after the memory barrier inside the lock only occurs once. After that, steady state is achieved. All subsequent reads need no memory barriers, because the singleton object will never change after that. The only thing we need to guard against is non-atomic writes, and out of order writes of the static variable (fixed with a memory barrier). Instruction ordering OUTSIDE the lock is irrelevant, because if we don't get the "steady state" value (not null), then we go into the lock to perform the careful initialization with barriers. I think aligned native word writes are atomic, so we don't have to worry about that.That is incorrect as the thread not going into the lock can see a partially initialized object.
May 07 2013
On 5/7/2013 6:25 AM, Andrei Alexandrescu wrote:No. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone?Here's one written by this Andrei Alexandrescu fellow, I hear he knows what he's talking about: http://erdani.com/publications/DDJ_Jul_Aug_2004_revised.pdf
May 07 2013
On 5/7/13 1:31 PM, Walter Bright wrote:On 5/7/2013 6:25 AM, Andrei Alexandrescu wrote:That doesn't get into memory models. AndreiNo. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone?Here's one written by this Andrei Alexandrescu fellow, I hear he knows what he's talking about: http://erdani.com/publications/DDJ_Jul_Aug_2004_revised.pdf
May 07 2013
07-May-2013 17:25, Andrei Alexandrescu wrote:On 5/7/13 2:50 AM, Mehrdad wrote:Sutter's Mill is a good starting point even though it's C++-biased. Two recent insightful talks on the C++11 memory model with down and dirty details: http://herbsutter.com/2013/02/11/atomic-weapons-the-c-memory-model-and-modern-hardware/ -- Dmitry OlshanskyOn Monday, 6 May 2013 at 18:46:56 UTC, Andrei Alexandrescu wrote:No. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone?Any concurrent operation (in this case read from one thread and write from another) requires a handshake between threads, most often in the form of a release write coupled with an acquire read. Whenever the handshake is absent but concurrent operations on shared memory do occur, the code is broken. The beauty of the TLS-based pattern is that in the steady state there's no need for a shared read and handshake. AndreiHmm, are you referring to the same lack of a barrier that the others are also referring to? As far as I can see, there shouldn't be a need for any other handshake in this example. As long as the object is fully initialized before _static is written to (easy enough with just a memory barrier), there is no penalty for subsequent reads whatsoever. Right?
May 07 2013
No. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone? AndreiThis was very helpful for me - focuses much more on the memory model itself than the c++11 part. http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
May 07 2013
On Tuesday, 7 May 2013 at 20:17:43 UTC, QAston wrote:Nah, look Dmitry's post, he was first to post that :PNo. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone? AndreiThis was very helpful for me - focuses much more on the memory model itself than the c++11 part. http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
May 07 2013
On Tuesday, 7 May 2013 at 20:17:43 UTC, QAston wrote:This was awesome/amazing.No. A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone? AndreiThis was very helpful for me - focuses much more on the memory model itself than the c++11 part. http://channel9.msdn.com/Shows/Going+Deep/Cpp-and-Beyond-2012-Herb-Sutter-atomic-Weapons-1-of-2
May 25 2013
On Tuesday, 7 May 2013 at 13:25:37 UTC, Andrei Alexandrescu wrote:A tutorial on memory consistency models would be too long to insert here. I don't know of a good online resource, does anyone?The best explanation I've found of memory models is this paper: http://dl.acm.org/citation.cfm?id=325102 Which can be had for free (albeit in terrible print quality) here: http://eecs.vanderbilt.edu/courses/eece343/papers/p15-gharachorloo.pdf Kourosh Gharachorloo actually has a number of really good papers about this stuff. Here's another, if you're interested in the details of how things work: http://www.hpl.hp.com/techreports/Compaq-DEC/WRL-95-9.pdf
May 07 2013
On Tuesday, 7 May 2013 at 06:50:16 UTC, Mehrdad wrote:As far as I can see, there shouldn't be a need for any other handshake in this example. As long as the object is fully initialized before _static is written to (easy enough with just a memory barrier), there is no penalty for subsequent reads whatsoever. Right?The issue is that the write to _static might never appear on the other threads, thus leading to multiple instances being created - even though it is atomic in the sense that you never end up reading a pointer with e.g. only half of the bytes updated. David
May 07 2013
On Tue, 07 May 2013 10:33:13 -0400, David Nadlinger <see klickverbot.at> wrote:On Tuesday, 7 May 2013 at 06:50:16 UTC, Mehrdad wrote:I don't think that is an issue. The write to _static is protected by a lock, which should present a consistent view of it inside the lock. Mehrdad corrected his version right away that you need to check the value inside the lock again. This essentially is classic double-checked locking, with one check being a virtual table lookup. -SteveAs far as I can see, there shouldn't be a need for any other handshake in this example. As long as the object is fully initialized before _static is written to (easy enough with just a memory barrier), there is no penalty for subsequent reads whatsoever. Right?The issue is that the write to _static might never appear on the other threads, thus leading to multiple instances being created - even though it is atomic in the sense that you never end up reading a pointer with e.g. only half of the bytes updated.
May 07 2013
On Monday, 6 May 2013 at 02:35:33 UTC, dsimcha wrote:On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D. This is a previously obscure pattern that uses thread-local storage to make Singletons both thread-safe and efficient and was independently invented by at least me and Alexander Terekhov, an IBM researcher. However, D's first-class treatment of thread-local storage means the time has come to move it out of obscurity and possibly make it the standard way to do Singletons. Article: http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/ Reddit: http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/FWIW, I played with a generalized form of this pattern long ago, something like (typing from memory): template sharedGlobal(alias ctor, alias lock = globalLock) { @property sharedGlobal() { alias ReturnType!ctor T; __gshared static Nullable!T t; static bool instantiated; if (!instantiated) { synchronized(lock) { if (t is null) t = ctor(); } } return cast(T)t; } } alias sharedGlobal!({ return new Blah(); }) blah; It should have been part of QtD but that has never happened.
May 06 2013
On 5/6/2013 6:14 AM, Max Samukha wrote:FWIW, I played with a generalized form of this pattern long ago, something like (typing from memory):And, that's the classic double checked locking bug!
May 06 2013
On Mon, 06 May 2013 13:58:19 -0400, Walter Bright <newshound2 digitalmars.com> wrote:On 5/6/2013 6:14 AM, Max Samukha wrote:I think that's exactly what David had in his slide (well, except for an obvious typo). He's only checking t once, and protecting the lock/check with a TLS boolean. Although the set of the boolean is missing (probably a memory error, since he typed it from memory :) -SteveFWIW, I played with a generalized form of this pattern long ago, something like (typing from memory):And, that's the classic double checked locking bug!
May 06 2013
On Monday, 6 May 2013 at 17:58:19 UTC, Walter Bright wrote:On 5/6/2013 6:14 AM, Max Samukha wrote:D man is the bug! I simply failed to insert a line that flags the thread-local guard. Sorry for that. The Nullable thing was an impromptu to avoid ugly specializations I used for nullable and non-nullable types in my original implementation. Note that the Nullable is not phobos Nullable - the latter incurs unnecessary overhead for types that are already nullable. Maybe, the use of Nullable is an overkill and a __gshared boolean (redundant in case of a nullable type) would suffice. David mentioned the trick on this NG years ago. It is well known, understood and would be rarely needed if D could properly do eager initialization of global state. :P Anyway, I popped up mainly to point out that the pattern should not be limited to classes.FWIW, I played with a generalized form of this pattern long ago, something like (typing from memory):And, that's the classic double checked locking bug!
May 24 2013
Max Samukha:Note that the Nullable is not phobos Nullable - the latter incurs unnecessary overhead for types that are already nullable.In Bugzilla I have suggested some improvements for Nullable, but in Phobos there is already an alternative Nullable that avoids that overhead: struct Nullable(T, T nullValue); Bye, bearophile
May 24 2013
On Friday, 24 May 2013 at 13:05:36 UTC, bearophile wrote:Max Samukha:The question is what should the result be: Nullable!(int*)? Nullable!(Nullable!T)? Which type?Note that the Nullable is not phobos Nullable - the latter incurs unnecessary overhead for types that are already nullable.In Bugzilla I have suggested some improvements for Nullable, but in Phobos there is already an alternative Nullable that avoids that overhead: struct Nullable(T, T nullValue); Bye, bearophile
May 24 2013
Max Samukha:The question is what should be the result of: Nullable(int*)? Nullable!(Nullable!T)? type?In D Nullable!(int*) needs to be a different type, because the interface is different between a pointer and a Nullable. But often you want to use something like Nullable(int*, null) or Nullable(int*, cast(int*)null) instead. In the second case maybe it can be collapsed into a single Nullable, but such behavior needs to be documented. Bye, bearophile
May 24 2013
On Friday, 24 May 2013 at 14:03:24 UTC, bearophile wrote:Max Samukha:It could be normalized by free functions isNull(T)(T t), nullableValue(T)(..) etc. Isn't that what was done to arrays so they could support the range interface?The question is what should be the result of: Nullable(int*)? Nullable!(Nullable!T)? type?In D Nullable!(int*) needs to be a different type, because the interface is different between a pointer and a Nullable.But often you want to use something like Nullable(int*, null) or Nullable(int*, cast(int*)null) instead.Do you have an example? In what cases is the distinction between a null null int* and non-null null int* necessary?In the second case maybe it can be collapsed into a single Nullable, but such behavior needs to be documented.Ok.Bye, bearophile
May 24 2013
Max Samukha:It could be normalized by free functions isNull(T)(T t), nullableValue(T)(..) etc.In Haskell/Scala if you program idiomatically you usually use HOFs such as map, filter, etc. that work "transparently" on nullables or collections of nullables, because Maybe is a monad. Example: you have a range of nullables, and you map a function over it, using a special map that ignores the null items. (This is one of the improvements I have suggested in Bugzilla.)If you use a Nullable!(int*, null) type that distinction vanishes, there is only one null. Bye, bearophileBut often you want to use something like Nullable(int*, null) or Nullable(int*, cast(int*)null) instead.Do you have an example? In what cases is the distinction between a null null int* and non-null null int* necessary?
May 24 2013
On Friday, 24 May 2013 at 13:49:14 UTC, Max Samukha wrote:On Friday, 24 May 2013 at 13:05:36 UTC, bearophile wrote:New type, but no overhead.Max Samukha:The question is what should be the result of: Nullable(int*)?Note that the Nullable is not phobos Nullable - the latter incurs unnecessary overhead for types that are already nullable.In Bugzilla I have suggested some improvements for Nullable, but in Phobos there is already an alternative Nullable that avoids that overhead: struct Nullable(T, T nullValue); Bye, bearophileNullable!(Nullable!T)?New type, with an extra boolean.
May 24 2013
On Monday, 6 May 2013 at 02:35:33 UTC, dsimcha wrote:On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D. This is a previously obscure pattern that uses thread-local storage to make Singletons both thread-safe and efficient and was independently invented by at least me and Alexander Terekhov, an IBM researcher. However, D's first-class treatment of thread-local storage means the time has come to move it out of obscurity and possibly make it the standard way to do Singletons. Article: http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/ Reddit: http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/Thanks! I want to make a module with template mixins that implement some common idioms - singleton being one of them(http://forum.dlang.org/thread/fofbrlqruxbevnxchxdp forum.dlang.org). I'm going to use your version for implementing the singleton idiom.
May 06 2013
On 5/6/13, dsimcha <dsimcha yahoo.com> wrote:On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D.Personally this covers it for me: final abstract class singleton { static: void foo() { } int f; } void main() { singleton.foo(); singleton.f = 1; } Never needed anything more complex than that. :p
May 06 2013
On Monday, 6 May 2013 at 21:13:24 UTC, Andrej Mitrovic wrote:On 5/6/13, dsimcha <dsimcha yahoo.com> wrote:Although that's only a single-per-thread-ton :POn the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D.Personally this covers it for me: final abstract class singleton { static: void foo() { } int f; } void main() { singleton.foo(); singleton.f = 1; } Never needed anything more complex than that. :p
May 06 2013
On 5/6/13, Diggory <diggsey googlemail.com> wrote:Although that's only a single-per-thread-ton :PMy bad.final abstract class singleton { __gshared: void foo() { } int f; }
May 06 2013
On Mon, 06 May 2013 17:36:00 -0400, Andrej Mitrovic <andrej.mitrovich gmail.com> wrote:On 5/6/13, Diggory <diggsey googlemail.com> wrote:The point that is missed is the lazy creation. Imagine if it's not just an int but a whole large resource, which takes a long time to create/open. If all we cared about is access, and you *know* you are going to need it, you can just create it eagerly via shared static this(). That is not what the singleton pattern is for. -SteveAlthough that's only a single-per-thread-ton :PMy bad.final abstract class singleton { __gshared: void foo() { } int f; }
May 06 2013
On Sun, 05 May 2013 22:35:27 -0400, dsimcha <dsimcha yahoo.com> wrote:On the advice of Walter and Andrei, I've written a blog article about the low-lock Singleton pattern in D. This is a previously obscure pattern that uses thread-local storage to make Singletons both thread-safe and efficient and was independently invented by at least me and Alexander Terekhov, an IBM researcher. However, D's first-class treatment of thread-local storage means the time has come to move it out of obscurity and possibly make it the standard way to do Singletons. Article: http://davesdprogramming.wordpress.com/2013/05/06/low-lock-singletons/ Reddit: http://www.reddit.com/r/programming/comments/1droaa/lowlock_singletons_in_d_the_singleton_pattern/Pulling this out from the depths of this discussion: David, the current pattern protects the read of the __gshared singleton with a thread local boolean. This means to check to see if the value is valid, we do: if(!instantiated_) { ... // thread safe initialization of instance_ instantiated_ = true; } return instance_; This requires a load of the boolean, and then a load of the instance. I wonder, if it might be more efficient to store a copy of the instance instead of the bool? This would only require one load for the check: if(!instantiated_instance_) // a TLS copy of the instance { ... // thread safe initialization of instance_ instantiated_instance_ = instance_; } return instantiated_instance_; I think in the steady state this devolves to the "unsafe" case, which of course is safe by that time. Might this account for at least some of dmd's poorer performance? -Steve
May 07 2013