
digitalmars.D - Re: Unofficial wish list status.(Jul 2008)

reply superdan <super dan.org> writes:
Me Here Wrote:

 Walter Bright wrote:
 
 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.   

In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.
You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

google for "relaxed memory consistency model" or "memory barriers". geez.
Jul 03 2008
parent reply Oskar Linde <oskar.lindeREM OVEgmail.com> writes:
superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:

 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.   

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.
 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches. So the above scenario should never occur. If thread A writes something prior to thread B reading it, B should never get the old value.

"Memory barriers" have nothing to do with cache consistency. A memory barrier only prevents a single CPU thread from reordering load/store instructions across that specific barrier.

-- Oskar
Jul 04 2008
next sibling parent Sean Kelly <sean invisibleduck.org> writes:
== Quote from Oskar Linde (oskar.lindeREM OVEgmail.com)'s article
 superdan wrote:
 Me Here Wrote:

 Walter Bright wrote:

 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.
 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

google for "relaxed memory consistency model" or "memory barriers". geez.

Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches. So the above scenario should never occur. If thread A writes something prior to thread B reading it, B should never get the old value. "Memory barriers" have nothing to do with cache consistency. A memory barrier only prevents a single CPU thread from reordering load/store instructions across that specific barrier.

Things get a bit weird once pipelining and out-of-order execution come into the picture. Most modern CPUs are still quite good at making things work as you'd expect, but some, like the Alpha, have an amazingly weak memory model in terms of what they are allowed to do if you don't rein them in. Most amazing about the Alpha is that it will even reorder dependent loads by default, so some really crazy things can happen with SMP if you aren't extremely careful. Lock-free programming on the x86 is dead simple compared to some other architectures.

Sean
Jul 04 2008
prev sibling parent reply superdan <super dan.org> writes:
Oskar Linde Wrote:

 superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:

 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.   

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.
 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programmers to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something 
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory 
 barrier only prevents a single CPU thread from reordering load/store 
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.
Jul 04 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
superdan wrote:

 Oskar Linde Wrote:
 
 superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:
 
 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.

 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

 google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programs to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something 
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory 
 barrier only prevents a single CPU thread from reordering load/store 
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.

Since, in the scenario I describe, each thread or cpu is dealing with a single section of memory, and each section of memory is being dealt with by a single thread or cpu, there is effectively no shared state whilst the threads run. Hence no possibility of cache inconsistency due to pipeline reordering. Ie.

main thread populates a[ 0 .. 1000 ];

for thread 1 .. 10
    spawn( thread, \a[ (thread-1) * 100 .. thread * 100 ] );

main thread waits for all threads to terminate;

main thread does something with a[];

In any case, cache consistency issues due to pipeline reordering do not survive context switches, so the issue is a non-issue for the purposes of the discussion at hand. Ie. threading. Pipelines cover single digit or low double digit runs of non-branching instructions at most. A context switch consists of hundreds if not thousands of instructions on all but the most highly tuned of real-time kernels. This is a very localised issue for the compiler writer, not the application programmer, to worry about. I know Walter *is* a compiler writer, but this is a complete red-herring in the context of this discussion.

b.
--
Jul 04 2008
next sibling parent reply Sean Kelly <sean invisibleduck.org> writes:
== Quote from Me Here (p9e883002 sneakemail.com)'s article
 superdan wrote:
 Oskar Linde Wrote:

 superdan wrote:
 Me Here Wrote:

 Walter Bright wrote:

 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.

 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.
 google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programs to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory
 barrier only prevents a single CPU thread from reordering load/store
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.

 Since, in the scenario I describe, each thread or cpu is dealing with a single section of memory, and each section of memory is being dealt with by a single thread or cpu, there is effectively no shared state whilst the threads run. Hence no possibility of cache inconsistency due to pipeline reordering. Ie.

  main thread populates a[ 0 .. 1000 ];

  for thread 1 .. 10
      spawn( thread, \a[ (thread-1) * 100 .. thread * 100 ] );

  main thread waits for all threads to terminate;

  main thread does something with a[];

 In any case, cache consistency issues due to pipeline reordering do not survive context switches, so the issue is a non-issue for the purposes of the discussion at hand. Ie. threading

Multithreading with a single-CPU machine is always fairly safe and predictable because all threads share the same cache, etc. Even most popular multicore machines today are relatively safe because in most instances the cores share at least the L2+ caches, sidestepping many typical SMP issues. But multiple CPUs in a machine introduce an entirely new set of issues, and it's these that concurrent programmers must consider. For example, here's one fun issue that can occur with PC (processor consistency), which is what the IA-32 (ie. x86) was thought to follow:

  x = y = 0;

  // thread A
  x = 1;

  // thread B
  if( x == 1 ) y = 1;

  // thread C
  if( y == 1 ) assert( x == 1 ); // may fail

The issue with PC described above is that while each CPU observes the actions of another CPU in a specific order, all CPUs are not guaranteed to observe the actions of other CPUs simultaneously. So it's possible that thread B may observe thread A's store of 1 to x before thread C sees the same store. Fortunately, Intel has recently gotten a lot more proactive about facilitating SMP, and during the C++0x memory model discussions it was verified that the above behavior will in fact not occur on current Intel architectures. But there are a lot of weird little issues like this that can lead to surprising behavior, even on an architecture with a fairly strong memory model.
 Pipelines cover single digit or low double digit runs of non-branching
 instructions at most. A context switch consists of hundreds if not thousands of
 instructions on all but the most highly tuned of real-time kernels. This is a
 very localised issue, for the compiler writer, not the application programmer
 to worry about.
 I know Walter *is* a compiler writer, but this is a complete red-herring in the
 context of this discussion.

As above, once there is more than one CPU in a box then one may no longer rely on context switching to provide a convenient "quiescent state," so I think that you're providing false assurances here. Sean
Jul 04 2008
parent reply "Me Here" <p9e883002 sneakemail.com> writes:
Sean Kelly wrote:

 == Quote from Me Here (p9e883002 sneakemail.com)'s article
 superdan wrote:
 Oskar Linde Wrote:
 
 superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:
 
 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.

 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

 google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programs to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory
 barrier only prevents a single CPU thread from reordering load/store
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.

 Since, in the scenario I describe, each thread or cpu is dealing with a single section of memory, and each section of memory is being dealt with by a single thread or cpu, there is effectively no shared state whilst the threads run. Hence no possibility of cache inconsistency due to pipeline reordering. Ie.

  main thread populates a[ 0 .. 1000 ];

  for thread 1 .. 10
      spawn( thread, \a[ (thread-1) * 100 .. thread * 100 ] );

  main thread waits for all threads to terminate;

  main thread does something with a[];

 In any case, cache consistency issues due to pipeline reordering do not survive context switches, so the issue is a non-issue for the purposes of the discussion at hand. Ie. threading

 Multithreading with a single-CPU machine is always fairly safe and predictable because all threads share the same cache, etc. Even most popular multicore machines today are relatively safe because in most instances the cores share at least the L2+ caches, sidestepping many typical SMP issues. But multiple CPUs in a machine introduce an entirely new set of issues, and it's these that concurrent programmers must consider. For example, here's one fun issue that can occur with PC (processor consistency), which is what the IA-32 (ie. x86) was thought to follow:

  x = y = 0;

  // thread A
  x = 1;

  // thread B
  if( x == 1 ) y = 1;

  // thread C
  if( y == 1 ) assert( x == 1 ); // may fail

 The issue with PC described above is that while each CPU observes the actions of another CPU in a specific order, all CPUs are not guaranteed to observe the actions of other CPUs simultaneously. So it's possible that thread B may observe thread A's store of 1 to x before thread C sees the same store. Fortunately, Intel has recently gotten a lot more proactive about facilitating SMP, and during the C++0x memory model discussions it was verified that the above behavior will in fact not occur on current Intel architectures. But there are a lot of weird little issues like this that can lead to surprising behavior, even on an architecture with a fairly strong memory model.
 Pipelines cover single digit or low double digit runs of non-branching
 instructions at most. A context switch consists of hundreds if not
 thousands of instructions on all but the most highly tuned of real-time
 kernels. This is a very localised issue, for the compiler writer, not the
 application programmer to worry about.
 I know Walter is a compiler writer, but this is a complete red-herring in
 the context of this discussion.

As above, once there is more than one CPU in a box then one may no longer rely on context switching to provide a convenient "quiescent state," so I think that you're providing false assurances here. Sean

Sean, I'm sorry, but *please* re-read everything I've posted on this subject.

Your x is (can be) accessed by two threads/cores/cpus concurrently. *In the scenario I described, this is not possible.*

Please do not feed more red herrings into this already complicated discussion.

,oO( Does anyone around here know how to stick to a single subject at a time? Or maybe I'm typing German or Japanese and don't realise it? )

b.
--
Jul 04 2008
parent "Manfred_Nowak" <svv1999 hotmail.com> writes:
Me Here wrote:

[...]
 ,oO( Does anyone around here know how to stick to a single subject
 at a time?

It is a general human communicational habit not to stay on any focus after some time has expired. The capability to stay on focus can be changed by (mis-)education and (mis-)presentation(!). It's a matter of luck to find someone who has similar capabilities for diving into a deep concentration into the matters of the subject _and_ to communicate his thoughts without disrupting yours. -manfred
Jul 05 2008
prev sibling parent reply superdan <super dan.org> writes:
Me Here Wrote:

 superdan wrote:
 
 Oskar Linde Wrote:
 
 superdan wrote:
 Me Here Wrote:
 
 Walter Bright wrote:
 
 Yes, but the onus will be on you (the programmer) to prevent data races and
 do proper synchronization.

 In the scenario described, the main thread initialises the array of data. Then, non-overlapping slices of that are portioned out to N worker threads. Only one thread ever modifies any given segment. When the worker threads are complete, the 'results' are left in the original array, available in its entirety only to the main thread.

 You have to be very wary of cache effects when
 writing data in one thread and expecting to see it in another.

 Are you saying that there is some combination of OS and/or hardware L1/L2 caching that would allow one thread to read a memory location (previously) modified by another thread, and see 'old data'? Cos if you are, it's a deeply serious bug, and if it's not already very well documented by the OS writer or hardware manufacturers, then here's your chance to get slashdotted (and diggited and reddited etc., all concurrently) as the discoverer of a fatal processor flaw.

 google for "relaxed memory consistency model" or "memory barriers". geez.

I presume the discussion regards symmetric multiprocessing (SMP). Cache coherency is a very important element of any SMP design. It basically means that caches should be fully transparent, i.e. the behavior should not change by the addition or removal of caches.

you are perfectly correct... as of ten years ago. you are right in that cache coherency protocols ensure the memory model is respected regardless of adding or eliminating caches. (i should know coz i implemented a couple for a simulator.) the problem is that the memory model has been aggressively changed recently towards providing less and less implied ordering and requiring programs to write explicit synchronization directives.
 So the above scenario should never occur. If thread A writes something 
 prior to thread B reading it, B should never get the old value.

yeah the problem is it's hard to define what "prior" means.
 "Memory barriers" have nothing to do with cache consistency. A memory 
 barrier only prevents a single CPU thread from reordering load/store 
 instructions across that specific barrier.

memory barriers strengthen the relaxed memory model that was pushed aggressively by the need for faster caches.

 Since, in the scenario I describe, each thread or cpu is dealing with a single section of memory, and each section of memory is being dealt with by a single thread or cpu, there is effectively no shared state whilst the threads run. Hence no possibility of cache inconsistency due to pipeline reordering. Ie.

goodness this is so wrong i don't know where to start from. like trying to figure out what's bad about a movie that's real real bad. you have no idea what you're talking about do you. just throwing terms here and there and making unstated assumptions that worked in 1980 on an atari. first off there is word tearing. u can't change one character in a string willy-nilly. the rest will need to be masked and you got a race condition right there. but of course you had no idea.
 main thread populates a[ 0 .. 1000 ];
 
 for thread 1 .. 10
     spawn( thread, \a[ (thread-1) * 100 .. thread * 100 ] );
 
 main thread waits for all threads to terminate;
 
 main thread does something with a[];
 
 In any case, cache consistency issues due to pipeline reordering do not survive
 context switches, so the issue is a non-issue for the purposes of the
 discussion at hand. Ie. threading
 
 Pipelines cover single digit or low double digit runs of non-branching
 instructions at most. A context switch consists of hundreds if not thousands of
 instructions on all but the most highly tuned of real-time kernels. This is a
 very localised issue, for the compiler writer, not the application programmer
 to worry about.

this is just babble. you bring pipelines and thread switching because you have no idea what the discussion is about and you try to relate it to the little things 1985 vintage you have a vague idea about. in the name of brian: we got more than one processor today. wake up and smell the shit.
 I know Walter *is* a compiler writer, but this is a complete red-herring in the
 context of this discussion.

i'll tell you what's a red herring: everything you say. your knowledge is obsolete by decades. you have no idea what you are talking about yet you try to defraud us by using cockiness. you even have the nerve to criticize walter and andre. tell you what. walter's brown underwear and andreis dirty socks with holes in'em know more shit than you. i'll leave it to sean to serve you your arrogant ass on a silver plate.
Jul 04 2008
next sibling parent BLS <nanali nospam-wanadoo.fr> writes:
superdan schrieb:

 i'll tell you what's a red herring: everything you say. your knowledge is
obsolete by decades. you have no idea what you are talking about yet you try to
defraud us by using cockiness. you even have the nerve to criticize walter and
andre. tell you what. walter's brown underwear and andreis dirty socks with
holes in'em know more shit than you. i'll leave it to sean to serve you your
arrogant ass on a silver plate.

atm it seems to me that Gregor R. is really a nice, gentle guy. :)
Jul 04 2008
prev sibling parent "Me Here" <p9e883002 sneakemail.com> writes:
superdan wrote:

 i'll tell you what's a red herring: everything you say. your knowledge is
 obsolete by decades. you have no idea what you are talking about yet you try
 to defraud us by using cockiness. you even have the nerve to criticize walter
 and andre. tell you what. walter's brown underwear and andreis dirty socks
 with holes in'em know more shit than you. i'll leave it to sean to serve you
 your arrogant ass on a silver plate.

Hey tweety-pie. How about you get back to sucking your mother's nipple. It'll give you something useful to do with that foul mouth and perhaps calm that overexcited brain of yours.

There is nothing, I repeat *NOTHING*, being implemented in today's commodity cpus that wasn't pioneered (and perfected) in CDC processors (and others) *TWO DECADES AGO*.

So, when you've suckled well and aren't so grouchy through hunger, and have been burped and well rested, and had your diaper changed, perhaps then you can go away and do a little research on what was being done on big iron 20 or more years ago. You almost certainly won't, because kiddies like you don't have the attention span for it. You'll probably just come back here and spout another unjustified and unsupported load of twaddle. C'est la vie, Bozo, cos I won't be reading it.

b.
--
Jul 04 2008