digitalmars.D - Word Tearing: Still a practical problem?

dsimcha <dsimcha yahoo.com> writes:
A few posts deep in the discussion on std.parallelism have prompted me to
double-check an assumption that I made previously.  Is writing to adjacent but
non-overlapping memory addresses concurrently from different threads safe on
all hardware we care about supporting?

I know this isn't safe on some DS9K-like architectures that we don't care
about, like old DEC Alphas.  This is because the hardware doesn't allow
addressing of single bytes.  I'm also aware of the performance implications of
false sharing, but this is not of concern because, for the cases where
adjacent memory addresses are written to concurrently in std.parallelism or
its examples, these are only a tiny fraction of writes and would not have a
significant impact on performance.
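
For concreteness, the access pattern in question can be sketched as follows. This is an illustrative sketch in C++ rather than D, chosen because C++11 explicitly treats separate array elements as distinct memory locations that different threads may write without a data race. The helper name `count_torn_bytes`, the 64-byte buffer, and the 0xAA/0x55 patterns are assumptions made for the example; none of them come from std.parallelism.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Hypothetical helper: two threads write interleaved, adjacent bytes of one
// small buffer; afterwards every byte is checked against the value its
// owning thread wrote. Returns the number of bytes with an unexpected value.
std::size_t count_torn_bytes(std::size_t rounds) {
    assert(rounds > 0);
    std::array<std::uint8_t, 64> buf{};
    std::size_t bad = 0;
    for (std::size_t r = 0; r < rounds; ++r) {
        buf.fill(0);
        // Each writer touches only its own (even or odd) indices.
        auto writer = [&buf](std::size_t first, std::uint8_t value) {
            for (std::size_t i = first; i < buf.size(); i += 2) {
                buf[i] = value;
            }
        };
        std::thread even(writer, std::size_t{0}, std::uint8_t{0xAA});
        std::thread odd(writer, std::size_t{1}, std::uint8_t{0x55});
        even.join();
        odd.join();
        for (std::size_t i = 0; i < buf.size(); ++i) {
            if (buf[i] != (i % 2 == 0 ? 0xAA : 0x55)) {
                ++bad;
            }
        }
    }
    return bad;
}
```

On byte-granular hardware with a conforming compiler, `count_torn_bytes` should return 0. A clean run is only evidence, not proof: a stress test like this cannot rule out tearing on every platform.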

I'm also aware that the compiler could in theory generate instructions to
perform writes at a higher granularity than what's specified by the source
code, but I imagine this is a purely theoretical concern, as I can't see any
reason why it would in practice.  IMHO if this is already the way it works in
practice, it should be formally specified by D's memory model.
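
To make the concern concrete, here is a minimal sketch (in C++, single-threaded on purpose) of what a widened store would do. It assumes a hypothetical target that can only store whole 32-bit words, so each byte write becomes load word, patch byte, store word. The names `patch_byte` and `replay_lost_update` are made up for the illustration, and the interleaving is replayed by hand rather than produced by real threads.

```cpp
#include <cassert>
#include <cstdint>

// Emulates a byte store implemented as a word-sized read-modify-write:
// returns `word` with byte number `index` (0 = lowest) replaced by `value`.
std::uint32_t patch_byte(std::uint32_t word, unsigned index, std::uint8_t value) {
    assert(index < 4);
    const unsigned shift = index * 8;
    return (word & ~(std::uint32_t{0xFF} << shift)) |
           (std::uint32_t{value} << shift);
}

// Replays one unlucky interleaving: both "threads" load the shared word
// before either one stores its patched copy back.
std::uint32_t replay_lost_update() {
    std::uint32_t shared = 0;
    std::uint32_t a = shared;    // thread A loads the word
    std::uint32_t b = shared;    // thread B loads the same word
    a = patch_byte(a, 0, 0x11);  // A patches byte 0 in its private copy
    b = patch_byte(b, 1, 0x22);  // B patches byte 1 in its private copy
    shared = a;                  // A stores: shared == 0x00000011
    shared = b;                  // B stores: byte 0 reverts to the stale 0
    return shared;
}
```

With this interleaving the result is 0x00002200 rather than 0x00002211: thread A's write to byte 0 is lost even though the two writes never overlapped in the source code.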
Mar 21 2011
bearophile <bearophileHUGS lycos.com> writes:
dsimcha:

 Is writing to adjacent but
 non-overlapping memory addresses concurrently from different threads safe on
 all hardware we care about supporting?

Aren't some problems caused by writing on the same cache line?

Bye,
bearophile
Mar 21 2011
dsimcha <dsimcha yahoo.com> writes:
== Quote from bearophile (bearophileHUGS lycos.com)'s article
 dsimcha:
 Is writing to adjacent but
 non-overlapping memory addresses concurrently from different threads safe on
 all hardware we care about supporting?
 Aren't some problems caused by writing on the same cache line? Bye, bearophile

I think you're referring to false sharing. If so, this is only a performance problem, not a correctness problem. If not, please elaborate.

Also, on x86, the cache coherency circuitry makes the cache much more transparent than on some other architectures. I'm not so sure about the others.
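
For reference, the usual way to avoid false sharing where it does matter is to pad per-thread data out to a cache line. A minimal sketch in C++, assuming a 64-byte line (common on x86, but not guaranteed); `kAssumedCacheLine`, `PaddedCounter` and `count_in_parallel` are names invented for this example:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>

// Assumed cache-line size; 64 bytes is typical on x86 but not universal.
constexpr std::size_t kAssumedCacheLine = 64;

// The alignment rounds each counter up to a full (assumed) cache line,
// so two adjacent counters never share a line.
struct alignas(kAssumedCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Two threads each increment their own counter; returns the combined total.
std::uint64_t count_in_parallel(std::uint64_t iterations) {
    PaddedCounter counters[2];
    auto work = [&counters, iterations](std::size_t slot) {
        assert(slot < 2);
        for (std::uint64_t i = 0; i < iterations; ++i) {
            ++counters[slot].value;
        }
    };
    std::thread first(work, std::size_t{0});
    std::thread second(work, std::size_t{1});
    first.join();
    second.join();
    return counters[0].value + counters[1].value;
}
```

The total is the same with or without the `alignas`; the padding can only change how fast the loop runs, which is the sense in which false sharing is a performance issue and not a correctness issue.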
Mar 21 2011
"nedbrek" <nedbrek yahoo.com> writes:
Hello all,

"dsimcha" <dsimcha yahoo.com> wrote in message 
news:im8d3b$j78$1 digitalmars.com...
 A few posts deep in the discussion on std.parallelism have prompted me to
 double-check an assumption that I made previously.  Is writing to adjacent
 but non-overlapping memory addresses concurrently from different threads
 safe on all hardware we care about supporting?

 I know this isn't safe on some DS9K-like architectures that we don't care
 about, like old DEC Alphas.  This is because the hardware doesn't allow
 addressing of single bytes.  I'm also aware of the performance
 implications of false sharing, but this is not of concern because, for the
 cases where adjacent memory addresses are written to concurrently in
 std.parallelism or its examples, these are only a tiny fraction of writes
 and would not have a significant impact on performance.

The main architectures (x86 and ARM) are both byte granular. Most embedded platforms are also byte granular. Alpha is the only architecture I am aware of that had this problem. Possibly other old/high performance ones... (Cray, 360, etc.)

Ned
Mar 21 2011
dsimcha <dsimcha yahoo.com> writes:
On 3/21/2011 7:55 PM, nedbrek wrote:
 The main architectures (x86 and ARM) are both byte granular.  Most embedded
 platforms are also byte granular.  Alpha is the only architecture I am
 aware of that had this problem.  Possibly other old/high performance
 ones... (Cray, 360, etc.)

 Ned

Excellent. I highly doubt we care about std.parallelism working on embedded platforms. (Who the heck has a multicore embedded CPU anyway?)

My only other concern is that the compiler could in theory do strange things that effectively increase granularity in some cases. I doubt any would in practice. I'd feel much better if I had some official-looking documentation, or at least assurance from Walter that DMD doesn't. Better yet would be assurance from a compiler expert (i.e. Walter) that all sanely implemented compilers for byte-granular hardware don't increase memory granularity in practice, even if they don't officially guarantee it.
Mar 21 2011
"Nick Sabalausky" <a a.a> writes:
"dsimcha" <dsimcha yahoo.com> wrote in message 
news:im8pu5$1921$1 digitalmars.com...
 On 3/21/2011 7:55 PM, nedbrek wrote:
 The main architectures (x86 and ARM) are both byte granular.  Most
 embedded platforms are also byte granular.  Alpha is the only architecture
 I am aware of that had this problem.  Possibly other old/high performance
 ones... (Cray, 360, etc.)

 Excellent. I highly doubt we care about std.parallelism working on
 embedded platforms. (Who the heck has a multicore embedded CPU anyway?)

Parallax's Propeller microcontroller has 8 cores. But it's so low-memory that I doubt D would be appropriate for it. Someone did manage to make a C compiler for it, but even that involved some compromises (although not as many as the Propeller's built-in SPIN language).
Mar 21 2011
%u <wfunction hotmail.com> writes:
 Excellent.  I highly doubt we care about std.parallelism working on
 embedded platforms. (Who the heck has a multicore embedded CPU anyway?)

I KNOW!! 64k ought to be enough for anybody, right?
Mar 21 2011