
digitalmars.D - Re: Unofficial wish list status.(Jul 2008)

reply superdan <super dan.org> writes:
Me Here Wrote:

 Walter Bright wrote:
 
 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.   

In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.
You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

google for "relaxed memory consistency model" or "memory barriers". geez.
Jul 03 2008
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:

 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.   

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.
 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches. So the above scenario should never occur. If thread A writes something prior to thread B reading it, B should never get the old value.

"Memory barriers" have nothing to do with cache consistency. A memory barrier only prevents a single CPU thread from reordering load/store instructions across that specific barrier.

-- Oskar
Jul 04 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Oskar Linde (oskar.lindeREM OVEgmail.com)'s article
 superdan wrote:
 Me Here Wrote:

 Walter Bright wrote:

 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.
 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

google for "relaxed memory consistency model" or "memory barriers". geez.

Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches. So the above scenario should never occur. If thread A writes something prior to thread B reading it, B should never get the old value. "Memory barriers" have nothing to do with cache consistency. A memory barrier only prevents a single CPU thread from reordering load/store instructions across that specific barrier.

Things get a bit weird once pipelining and out-of-order execution come into the picture. Most modern CPUs are still quite good at making things work as you'd expect, but some, like the Alpha, have an amazingly weak memory model in terms of what they are allowed to do if you don't rein them in. Most amazing about the Alpha is that it will even reorder dependent loads by default, so some really crazy things can happen with SMP if you aren't extremely careful. Lock-free programming on the x86 is dead simple compared to some other architectures.

Sean
Jul 04 2008
prev sibling parent reply superdan <super dan.org> writes:
Oskar Linde Wrote:

 superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:

 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.   

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.
 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programmers to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something 
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory 
 barrier only prevents a single CPU thread from reordering load/store 
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.
Jul 04 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
superdan wrote:

 Oskar Linde Wrote:
 
 superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:
 
 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.

 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

 google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programs to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something 
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory 
 barrier only prevents a single CPU thread from reordering load/store 
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.

Since, in the scenario I describe, each thread or cpu is dealing with a single section of memory, and each section of memory is being dealt with by a single thread or cpu, there is effectively no shared state whilst the threads run. Hence no possibility of cache inconsistency due to pipeline reordering. Ie.

main thread populates a[ 0 .. 1000 ];

for thread 1 .. 10
    spawn( thread, \a[ (thread-1) * 100 .. thread * 100 ] );

main thread waits for all threads to terminate;

main thread does something with a[];

In any case, cache consistency issues due to pipeline reordering do not survive context switches, so the issue is a non-issue for the purposes of the discussion at hand. Ie. threading. Pipelines cover single digit or low double digit runs of non-branching instructions at most. A context switch consists of hundreds if not thousands of instructions on all but the most highly tuned of real-time kernels. This is a very localised issue for the compiler writer, not the application programmer, to worry about. I know Walter *is* a compiler writer, but this is a complete red-herring in the context of this discussion.

b.
--
Jul 04 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
 superdan wrote:
 Oskar Linde Wrote:

 superdan wrote:
 Me Here Wrote:

 Walter Bright wrote:

 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.

 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.
 google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programs to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory
 barrier only prevents a single CPU thread from reordering load/store
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.

 Since, in the scenario I describe, each thread or cpu is dealing with a single section of memory, and each section of memory is being dealt with by a single thread or cpu, there is effectively no shared state whilst the threads run. Hence no possibility of cache inconsistency due to pipeline reordering. Ie.

  main thread populates a[ 0 .. 1000 ];

  for thread 1 .. 10
      spawn( thread, \a[ (thread-1) * 100 .. thread * 100 ] );

  main thread waits for all threads to terminate;

  main thread does something with a[];

 In any case, cache consistency issues due to pipeline reordering do not survive context switches, so the issue is a non-issue for the purposes of the discussion at hand. Ie. threading

Multithreading with a single-CPU machine is always fairly safe and predictable because all threads share the same cache, etc. Even most popular multicore machines today are relatively safe because in most instances the cores share at least the L2+ caches, sidestepping many typical SMP issues. But multiple CPUs in a machine introduce an entirely new set of issues, and it's these that concurrent programmers must consider. For example, here's one fun issue that can occur with PC (processor consistency), which is what the IA-32 (ie. x86) was thought to follow:

  x = y = 0;

  // thread A
  x = 1;

  // thread B
  if( x == 1 ) y = 1;

  // thread C
  if( y == 1 ) assert( x == 1 ); // may fail

The issue with PC described above is that while each CPU observes the actions of another CPU in a specific order, all CPUs are not guaranteed to observe the actions of other CPUs simultaneously. So it's possible that thread B may observe thread A's store of 1 to x before thread C sees the same store. Fortunately, Intel has recently gotten a lot more proactive about facilitating SMP, and during the C++0x memory model discussions it was verified that the above behavior will in fact not occur on current Intel architectures. But there are a lot of weird little issues like this that can lead to surprising behavior, even on an architecture with a fairly strong memory model.
 Pipelines cover single digit or low double digit runs of non-branching
 instructions at most. A context switch consists of hundreds if not thousands of
 instructions on all but the most highly tuned of real-time kernels. This is a
 very localised issue, for the compiler writer, not the application programmer
 to worry about.
 I know Walter *is* a compiler writer, but this is a complete red-herring in the
 context of this discussion.

As above, once there is more than one CPU in a box then one may no longer rely on context switching to provide a convenient "quiescent state," so I think that you're providing false assurances here. Sean
Jul 04 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Sean Kelly wrote:

 == Quote from Me Here (p9e883002 sneakemail.com)'s article
 superdan wrote:
 Oskar Linde Wrote:
 
 superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:
 
 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.

 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

 google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programs to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory
 barrier only prevents a single CPU thread from reordering load/store
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.

 Since, in the scenario I describe, each thread or cpu is dealing with a single section of memory, and each section of memory is being dealt with by a single thread or cpu, there is effectively no shared state whilst the threads run. Hence no possibility of cache inconsistency due to pipeline reordering. Ie.

  main thread populates a[ 0 .. 1000 ];

  for thread 1 .. 10
      spawn( thread, \a[ (thread-1) * 100 .. thread * 100 ] );

  main thread waits for all threads to terminate;

  main thread does something with a[];

 In any case, cache consistency issues due to pipeline reordering do not survive context switches, so the issue is a non-issue for the purposes of the discussion at hand. Ie. threading

 Multithreading with a single-CPU machine is always fairly safe and predictable because all threads share the same cache, etc. Even most popular multicore machines today are relatively safe because in most instances the cores share at least the L2+ caches, sidestepping many typical SMP issues. But multiple CPUs in a machine introduce an entirely new set of issues, and it's these that concurrent programmers must consider. For example, here's one fun issue that can occur with PC (processor consistency), which is what the IA-32 (ie. x86) was thought to follow:

  x = y = 0;

  // thread A
  x = 1;

  // thread B
  if( x == 1 ) y = 1;

  // thread C
  if( y == 1 ) assert( x == 1 ); // may fail

 The issue with PC described above is that while each CPU observes the actions of another CPU in a specific order, all CPUs are not guaranteed to observe the actions of other CPUs simultaneously. So it's possible that thread B may observe thread A's store of 1 to x before thread C sees the same store. Fortunately, Intel has recently gotten a lot more proactive about facilitating SMP, and during the C++0x memory model discussions it was verified that the above behavior will in fact not occur on current Intel architectures. But there are a lot of weird little issues like this that can lead to surprising behavior, even on an architecture with a fairly strong memory model.
 Pipelines cover single digit or low double digit runs of non-branching
 instructions at most. A context switch consists of hundreds if not
 thousands of instructions on all but the most highly tuned of real-time
 kernels. This is a very localised issue, for the compiler writer, not the
 application programmer to worry about.
 I know Walter is a compiler writer, but this is a complete red-herring in
 the context of this discussion.

As above, once there is more than one CPU in a box then one may no longer rely on context switching to provide a convenient "quiescent state," so I think that you're providing false assurances here. Sean

Sean, I'm sorry, but *please* re-read everything I've posted on this subject.

Your x is (can be) accessed by two threads/cores/cpus concurrently. *In the scenario I described, this is not possible.*

Please do not feed more red herrings into this already complicated discussion.

,oO( Does anyone around here know how to stick to a single subject at a time? Or maybe I'm typing German or Japanese and don't realise it? )

b.
--
Jul 04 2008
parent "Manfred_Nowak" <svv1999 hotmail.com> writes:
Me Here wrote:

[...]
 ,oO( Does anyone around here know how to stick to a single subject
 at a time?

It is a general human communicational habit not to stay on any focus after some time has expired. The capability to stay on focus can be changed by (mis-)education and (mis-)presentation(!). It's a matter of luck to find someone who has similar capabilities for diving into a deep concentration into the matters of the subject _and_ to communicate his thoughts without disrupting yours. -manfred
Jul 05 2008
prev sibling parent reply superdan <super dan.org> writes:
Me Here Wrote:

 superdan wrote:
 
 Oskar Linde Wrote:
 
 superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:
 
 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.

 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

 google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programs to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something 
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory 
 barrier only prevents a single CPU thread from reordering load/store 
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.

 Since, in the scenario I describe, each thread or cpu is dealing with a single section of memory, and each section of memory is being dealt with by a single thread or cpu, there is effectively no shared state whilst the threads run. Hence no possibility of cache inconsistency due to pipeline reordering. Ie.

goodness this is so wrong i don't know where to start from. like trying to figure out what's bad about a movie that's real real bad. you have no idea what you're talking about do you. just throwing terms here and there and making unstated assumptions that worked in 1980 on an atari. first off there is word tearing. u can't change one character in a string willy-nilly. the rest will need to be masked and you got a race condition right there. but of course you had no idea.
 main thread populates a[ 0 .. 1000 ];
 
 for thread 1 .. 10
     spawn( thread, \a[ (thread-1) * 100 .. thread * 100 ] );
 
 main thread waits for all threads to terminate;
 
 main thread does something with a[];
 
 In any case, cache consistency issues due to pipeline reordering do not survive
 context switches, so the issue is a non-issue for the purposes of the
 discussion at hand. Ie. threading
 
 Pipelines cover single digit or low double digit runs of non-branching
 instructions at most. A context switch consists of hundreds if not thousands of
 instructions on all but the most highly tuned of real-time kernels. This is a
 very localised issue, for the compiler writer, not the application programmer
 to worry about.

this is just babble. you bring pipelines and thread switching because you have no idea what the discussion is about and you try to relate it to the little things 1985 vintage you have a vague idea about. in the name of brian: we got more than one processor today. wake up and smell the shit.
 I know Walter *is* a compiler writer, but this is a complete red-herring in the
 context of this discussion.

i'll tell you what's a red herring: everything you say. your knowledge is obsolete by decades. you have no idea what you are talking about yet you try to defraud us by using cockiness. you even have the nerve to criticize walter and andre. tell you what. walter's brown underwear and andreis dirty socks with holes in'em know more shit than you. i'll leave it to sean to serve you your arrogant ass on a silver plate.
Jul 04 2008
next sibling parent BLS <nanali nospam-wanadoo.fr> writes:
superdan schrieb:

 i'll tell you what's a red herring: everything you say. your knowledge is
obsolete by decades. you have no idea what you are talking about yet you try to
defraud us by using cockiness. you even have the nerve to criticize walter and
andre. tell you what. walter's brown underwear and andreis dirty socks with
holes in'em know more shit than you. i'll leave it to sean to serve you your
arrogant ass on a silver plate.

atm it seems to me that Gregor R. is really a nice, gentle guy. :)
Jul 04 2008
prev sibling parent "Me Here" <p9e883002 sneakemail.com> writes:
superdan wrote:

 i'll tell you what's a red herring: everything you say. your knowledge is
 obsolete by decades. you have no idea what you are talking about yet you try
 to defraud us by using cockiness. you even have the nerve to criticize walter
 and andre. tell you what. walter's brown underwear and andreis dirty socks
 with holes in'em know more shit than you. i'll leave it to sean to serve you
 your arrogant ass on a silver plate.

Hey tweety-pie. How about you get back to sucking your mother's nipple. It'll give you something useful to do with that foul mouth and perhaps calm that overexcited brain of yours.

There is nothing, I repeat *NOTHING*, being implemented in today's commodity cpus that wasn't pioneered (and perfected) in CDC processors (and others) *TWO DECADES AGO*.

So, when you've suckled well and aren't so grouchy through hunger, and have been burped and well rested, and had your diaper changed, perhaps then you can go away and do a little research on what was being done on big iron 20 or more years ago. You almost certainly won't, because kiddies like you don't have the attention span for it. You'll probably just come back here and spout another unjustified and unsupported load of twaddle. C'est la vie, Bozo, cos I won't be reading it.

b.
--
Jul 04 2008