digitalmars.D - Word Tearing: Still a practical problem?

dsimcha <dsimcha yahoo.com> writes:

A few posts deep in the discussion on std.parallelism have prompted me to
double-check an assumption that I made previously.  Is writing to adjacent but
non-overlapping memory addresses concurrently from different threads safe on
all hardware we care about supporting?
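A minimal sketch of the access pattern in question, written in C++11 rather than D and not taken from std.parallelism. The helper name `write_adjacent_bytes` and the thread count are made up for illustration. Each thread reads and writes only its own byte of a shared array, so the bytes are adjacent but never overlap:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <thread>
#include <vector>

// Hypothetical thread count, chosen so the bytes fit in one machine word.
constexpr std::size_t kThreads = 8;

// Each thread increments only its own byte `rounds` times.
// The bytes are adjacent in memory but no two threads touch the same one.
std::array<std::uint8_t, kThreads> write_adjacent_bytes(int rounds) {
    assert(rounds >= 0 && rounds <= 255);  // keep the expected value in one byte
    std::array<std::uint8_t, kThreads> cells{};  // zero-initialized
    std::vector<std::thread> workers;
    for (std::size_t i = 0; i < kThreads; ++i) {
        workers.emplace_back([&cells, i, rounds] {
            for (int r = 0; r < rounds; ++r) {
                cells[i] = static_cast<std::uint8_t>(cells[i] + 1);
            }
        });
    }
    for (auto& t : workers) t.join();  // join before reading the results
    return cells;
}
```

If stores are byte granular, every element ends up equal to `rounds`. On a target that emulated byte stores with word-sized read-modify-write, some increments could be lost. The C++11 memory model treats each array element as a separate memory location, which is the kind of guarantee being asked about for D.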

I know this isn't safe on some DS9K-like architectures that we don't care
about, like old DEC Alphas.  This is because the hardware doesn't allow
addressing of single bytes.  I'm also aware of the performance implications of
false sharing, but this is not of concern because, for the cases where
adjacent memory addresses are written to concurrently in std.parallelism or
its examples, these are only a tiny fraction of writes and would not have a
significant impact on performance.
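To make the failure mode concrete, here is a single-threaded C++ simulation of what can go wrong when byte stores have to be built from whole-word loads and stores. The `WordOnlyMemory` type and the function names are hypothetical, and the interleaving is replayed by hand rather than produced by real threads:

```cpp
#include <cstdint>

// Models a machine that can only load and store whole 32-bit words.
struct WordOnlyMemory {
    std::uint32_t word = 0;
};

std::uint32_t load_word(const WordOnlyMemory& m) { return m.word; }
void store_word(WordOnlyMemory& m, std::uint32_t w) { m.word = w; }

// Returns `w` with byte number `index` (0 = least significant) replaced.
std::uint32_t with_byte(std::uint32_t w, unsigned index, std::uint8_t value) {
    const unsigned shift = index * 8;
    return (w & ~(0xFFu << shift)) | (static_cast<std::uint32_t>(value) << shift);
}

// Bad interleaving: both "threads" load before either stores.
std::uint32_t torn_interleaving() {
    WordOnlyMemory mem;
    std::uint32_t a = load_word(mem);         // thread A reads the word
    std::uint32_t b = load_word(mem);         // thread B reads the same word
    store_word(mem, with_byte(a, 0, 0xAA));   // A writes byte 0
    store_word(mem, with_byte(b, 1, 0xBB));   // B writes byte 1 from a stale copy
    return mem.word;                          // A's byte has been erased
}

// Same two writes, but each load-modify-store completes before the next.
std::uint32_t serial_interleaving() {
    WordOnlyMemory mem;
    store_word(mem, with_byte(load_word(mem), 0, 0xAA));
    store_word(mem, with_byte(load_word(mem), 1, 0xBB));
    return mem.word;
}
```

The torn version returns `0x0000BB00` and the serial version returns `0x0000BBAA`. Neither thread ever asked to write the other's byte. The lost update comes entirely from the wider store.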

I'm also aware that the compiler could in theory generate instructions to
perform writes at a higher granularity than what's specified by the source
code, but I imagine this is a purely theoretical concern, as I can't see any
reason why it would in practice.  IMHO if this is already the way it works in
practice, it should be formally specified by D's memory model.

Mar 21 2011

bearophile <bearophileHUGS lycos.com> writes:

dsimcha:

 Is writing to adjacent but
 non-overlapping memory addresses concurrently from different threads safe on
 all hardware we care about supporting?

Aren't some problems caused by writing on the same cache line?

Bye,
bearophile

Mar 21 2011

dsimcha <dsimcha yahoo.com> writes:

== Quote from bearophile (bearophileHUGS lycos.com)'s article
 dsimcha:
 Is writing to adjacent but
 non-overlapping memory addresses concurrently from different threads safe on
 all hardware we care about supporting?

 Aren't some problems caused by writing on the same cache line?
 Bye,
 bearophile

I think you're referring to false sharing.  If so, this is only a performance
problem, not a correctness problem.  If not, please elaborate.  Also, on x86,
cache coherency circuitry makes the cache much more transparent than on some
architectures.  I'm not so sure about others.
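For reference, the usual way to avoid false sharing when it does matter is to pad each thread's data out to its own cache line. A small C++ sketch, assuming a 64-byte line (common on x86, but an assumption here, as are all the names):

```cpp
#include <cstddef>
#include <cstdint>

// Assumed cache line size; the real value is hardware specific.
constexpr std::size_t kAssumedCacheLine = 64;

// Two of these in an array sit 8 bytes apart, so they can share a cache line.
struct PackedCounter {
    std::uint64_t value = 0;
};

// alignas pads and aligns each counter to its own assumed cache line.
struct alignas(kAssumedCacheLine) PaddedCounter {
    std::uint64_t value = 0;
};

// Distance in bytes between two neighbouring array elements of type T.
template <typename T>
std::size_t stride_bytes() {
    static T pair[2];
    return static_cast<std::size_t>(
        reinterpret_cast<const char*>(&pair[1]) -
        reinterpret_cast<const char*>(&pair[0]));
}
```

Both layouts give the same results. The padded one only trades memory for less cache line ping-pong between cores, which is why this is a performance issue and not a correctness issue.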

Mar 21 2011

"nedbrek" <nedbrek yahoo.com> writes:

Hello all,

"dsimcha" <dsimcha yahoo.com> wrote in message 
news:im8d3b$j78$1 digitalmars.com...
 A few posts deep in the discussion on std.parallelism have prompted me to
 double-check an assumption that I made previously.  Is writing to adjacent
 but non-overlapping memory addresses concurrently from different threads
 safe on all hardware we care about supporting?

 I know this isn't safe on some DS9K-like architectures that we don't care
 about, like old DEC Alphas.  This is because the hardware doesn't allow
 addressing of single bytes.  I'm also aware of the performance
 implications of false sharing, but this is not of concern because, for the
 cases where adjacent memory addresses are written to concurrently in
 std.parallelism or its examples, these are only a tiny fraction of writes
 and would not have a significant impact on performance.

The main architectures (x86 and ARM) are both byte granular.  Most embedded 
platforms are also byte granular.  Alpha is the only architecture I am aware 
of that had this problem.  Possibly other old/high performance ones... 
(Cray, 360, etc.)

Ned

Mar 21 2011

dsimcha <dsimcha yahoo.com> writes:

On 3/21/2011 7:55 PM, nedbrek wrote:
 Hello all,

 "dsimcha"<dsimcha yahoo.com>  wrote in message
 news:im8d3b$j78$1 digitalmars.com...
 A few posts deep in the discussion on std.parallelism have prompted me to
 double-check an assumption that I made previously.  Is writing to adjacent
 but non-overlapping memory addresses concurrently from different threads
 safe on all hardware we care about supporting?

 I know this isn't safe on some DS9K-like architectures that we don't care
 about, like old DEC Alphas.  This is because the hardware doesn't allow
 addressing of single bytes.  I'm also aware of the performance
 implications of false sharing, but this is not of concern because, for the
 cases where adjacent memory addresses are written to concurrently in
 std.parallelism or its examples, these are only a tiny fraction of writes
 and would not have a significant impact on performance.

 The main architectures (x86 and ARM) are both byte granular.  Most embedded
 platforms are also byte granular.  Alpha is the only architecture I am aware
 of that had this problem.  Possibly other old/high performance ones...
 (Cray, 360, etc.)

 Ned

Excellent.  I highly doubt we care about std.parallelism working on 
embedded platforms.  (Who the heck has a multicore embedded CPU anyway?)

My only other concern is that the compiler could in theory do strange 
things that effectively increase granularity in some cases.  I doubt any 
would in practice.  I'd feel much better if I had some official-looking 
documentation, or at least assurance from Walter that DMD doesn't. 
Better yet would be assurance from a compiler expert (i.e. Walter) that 
all sanely implemented compilers for byte-granular hardware don't 
increase memory granularity in practice, even if they don't officially 
guarantee it.
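One case where a compiler does widen stores by design is bit-fields, shown here as a C++ sketch (the struct and function names are made up, and this says nothing about what DMD does). In C++11 and later, adjacent bit-fields form a single memory location, so updating one is a read-modify-write of the containing unit, while separate byte-sized members are distinct memory locations:

```cpp
#include <cstdint>

// Two 4-bit fields packed into one storage unit. Writing `low` has to
// read and rewrite the unit that also holds `high`.
struct PackedFlags {
    std::uint8_t low : 4;
    std::uint8_t high : 4;
};

// Two ordinary byte members. Each is its own memory location, so a
// conforming C++11 compiler may not touch `high` when storing to `low`.
struct SeparateFlags {
    std::uint8_t low;
    std::uint8_t high;
};

// Sets both fields in sequence. This is safe in one thread, but doing the
// two assignments from two threads without synchronization is a data race.
PackedFlags set_low_then_high(unsigned low, unsigned high) {
    PackedFlags f{};
    f.low = low & 0xF;
    f.high = high & 0xF;
    return f;
}
```

On mainstream compilers `sizeof(PackedFlags)` is 1 and `sizeof(SeparateFlags)` is 2. A memory model for D could draw the same line: plain byte-sized variables are independent, while packed sub-byte fields are not.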

Mar 21 2011

"Nick Sabalausky" <a a.a> writes:

"dsimcha" <dsimcha yahoo.com> wrote in message 
news:im8pu5$1921$1 digitalmars.com...
 On 3/21/2011 7:55 PM, nedbrek wrote:
 The main architectures (x86 and ARM) are both byte granular.  Most embedded
 platforms are also byte granular.  Alpha is the only architecture I am aware
 of that had this problem.  Possibly other old/high performance ones...
 (Cray, 360, etc.)

 Excellent.  I highly doubt we care about std.parallelism working on 
 embedded platforms.  (Who the heck has a multicore embedded CPU anyway?)

Parallax's Propeller microcontroller has 8 cores. But it's so low-memory 
that I doubt D would be appropriate for it. Someone did manage to make a C 
compiler for it, but even that involved some compromises (although not as 
many as the Propeller's built-in SPIN language).

Mar 21 2011

%u <wfunction hotmail.com> writes:

 Excellent.  I highly doubt we care about std.parallelism working on
 embedded platforms.  (Who the heck has a multicore embedded CPU anyway?)

I KNOW!!

64k ought to be enough for anybody, right?

Mar 21 2011
