digitalmars.D - Word Tearing: Still a practical problem?
- dsimcha (16/16) Mar 21 2011 A few posts deep in the discussion on std.parallelism have prompted me t...
- bearophile (4/7) Mar 21 2011 Aren't some problems caused by writing on the same cache line?
- dsimcha (5/12) Mar 21 2011 I think you're referring to false sharing. If so, this is only a perfor...
- nedbrek (8/24) Mar 21 2011 Hello all,
- dsimcha (11/36) Mar 21 2011 Excellent. I highly doubt we care about std.parallelism working on
- Nick Sabalausky (6/17) Mar 21 2011 Parallax's Propeller microcontroller has 8 cores. But it's so low-memory...
- %u (4/5) Mar 21 2011 embedded platforms. (Who the heck has a multicore embedded CPU
A few posts deep in the discussion on std.parallelism have prompted me to double-check an assumption that I made previously. Is writing to adjacent but non-overlapping memory addresses concurrently from different threads safe on all hardware we care about supporting?

I know this isn't safe on some DS9K-like architectures that we don't care about, like old DEC Alphas. This is because the hardware doesn't allow addressing of single bytes.

I'm also aware of the performance implications of false sharing, but this is not of concern because, for the cases where adjacent memory addresses are written to concurrently in std.parallelism or its examples, these are only a tiny fraction of writes and would not have a significant impact on performance.

I'm also aware that the compiler could in theory generate instructions to perform writes at a higher granularity than what's specified by the source code, but I imagine this is a purely theoretical concern, as I can't see any reason why it would in practice. IMHO if this is already the way it works in practice, it should be formally specified by D's memory model.
Mar 21 2011
dsimcha:
> Is writing to adjacent but non-overlapping memory addresses concurrently
> from different threads safe on all hardware we care about supporting?

Aren't some problems caused by writing on the same cache line?

Bye,
bearophile
Mar 21 2011
== Quote from bearophile (bearophileHUGS lycos.com)'s article
> dsimcha:
>> Is writing to adjacent but non-overlapping memory addresses concurrently
>> from different threads safe on all hardware we care about supporting?
>
> Aren't some problems caused by writing on the same cache line?

I think you're referring to false sharing. If so, this is only a performance problem, not a correctness problem. If not, please elaborate. Also, on x86, cache coherency circuitry makes the cache much more transparent than on some architectures. I'm not so sure about others.
Mar 21 2011
Hello all,

"dsimcha" <dsimcha yahoo.com> wrote in message news:im8d3b$j78$1 digitalmars.com...
> A few posts deep in the discussion on std.parallelism have prompted me to
> double-check an assumption that I made previously. Is writing to adjacent
> but non-overlapping memory addresses concurrently from different threads
> safe on all hardware we care about supporting? I know this isn't safe on
> some DS9K-like architectures that we don't care about, like old DEC
> Alphas. This is because the hardware doesn't allow addressing of single
> bytes. I'm also aware of the performance implications of false sharing,
> but this is not of concern because, for the cases where adjacent memory
> addresses are written to concurrently in std.parallelism or its examples,
> these are only a tiny fraction of writes and would not have a significant
> impact on performance.

The main architectures (x86 and ARM) are both byte granular. Most embedded platforms are also byte granular. Alpha is the only architecture I am aware of that had this problem. Possibly other old/high-performance ones... (Cray, 360, etc.)

Ned
Mar 21 2011
On 3/21/2011 7:55 PM, nedbrek wrote:
> Hello all,
> "dsimcha"<dsimcha yahoo.com> wrote in message
> news:im8d3b$j78$1 digitalmars.com...
>> A few posts deep in the discussion on std.parallelism have prompted me to
>> double-check an assumption that I made previously. Is writing to adjacent
>> but non-overlapping memory addresses concurrently from different threads
>> safe on all hardware we care about supporting? I know this isn't safe on
>> some DS9K-like architectures that we don't care about, like old DEC
>> Alphas. This is because the hardware doesn't allow addressing of single
>> bytes. I'm also aware of the performance implications of false sharing,
>> but this is not of concern because, for the cases where adjacent memory
>> addresses are written to concurrently in std.parallelism or its examples,
>> these are only a tiny fraction of writes and would not have a significant
>> impact on performance.
>
> The main architectures (x86 and ARM) are both byte granular. Most
> embedded platforms are also byte granular. Alpha is the only architecture
> I am aware of that had this problem. Possibly other old/high performance
> ones... (Cray, 360, etc.)

Excellent. I highly doubt we care about std.parallelism working on embedded platforms. (Who the heck has a multicore embedded CPU anyway?) My only other concern is that the compiler could in theory do strange things that effectively increase granularity in some cases. I doubt any would in practice. I'd feel much better if I had some official-looking documentation, or at least assurance from Walter that DMD doesn't. Better yet would be assurance from a compiler expert (i.e. Walter) that all sanely implemented compilers for byte-granular hardware don't increase memory granularity in practice, even if they don't officially guarantee it.
Mar 21 2011
"dsimcha" <dsimcha yahoo.com> wrote in message news:im8pu5$1921$1 digitalmars.com...
> On 3/21/2011 7:55 PM, nedbrek wrote:
>> The main architectures (x86 and ARM) are both byte granular. Most
>> embedded platforms are also byte granular. Alpha is the only architecture
>> I am aware of that had this problem. Possibly other old/high performance
>> ones... (Cray, 360, etc.)
>
> Excellent. I highly doubt we care about std.parallelism working on
> embedded platforms. (Who the heck has a multicore embedded CPU anyway?)

Parallax's Propeller microcontroller has 8 cores. But it's so low-memory that I doubt D would be appropriate for it. Someone did manage to make a C compiler for it, but even that involved some compromises (although not as many as the Propeller's built-in SPIN language).
Mar 21 2011
> Excellent. I highly doubt we care about std.parallelism working on
> embedded platforms. (Who the heck has a multicore embedded CPU anyway?)

I KNOW!! 64k ought to be enough for anybody, right?
Mar 21 2011