
digitalmars.D - Re: Good demo for showing benefits of parallelism

renoX <renosky free.fr> writes:
What I find funny is that you're talking about GPUs and, at the same time,
about STM; AFAIK no one has yet suggested using STM for programs running on
GPUs.

That said, STM and futures aside, what about 'normal' data-parallel (array) operations?
IMHO, this is the kind of parallelism that is easiest for programmers to use, even
if its scope is limited.
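As an illustration of the data-parallel style described here, a minimal sketch in Python (the thread is about D but contains no code, so Python stands in; `brighten` and `parallel_map` are hypothetical names). Each element is handled by a function that reads only its own input, so the work can be split across workers with no locking at all. Note that in CPython the GIL keeps pure-Python threads from executing bytecode simultaneously, so this shows the structure of the technique rather than a real speedup; a process pool or a native array library would be needed for that.

```python
from concurrent.futures import ThreadPoolExecutor


def brighten(pixel, amount=40):
    """Per-element kernel: depends only on its own input, so no coordination is needed."""
    return min(pixel + amount, 255)


def parallel_map(kernel, data, workers=4):
    """Apply `kernel` to every element of `data` using a pool of worker threads.

    Elements are processed independently; pool.map returns results in input order.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(kernel, data))


image = [0, 100, 200, 250]  # a tiny grayscale "image", one int per pixel
result = parallel_map(brighten, image)
print(result)  # [40, 140, 240, 255]
```

The "foreach pixel do ___" loop from the quoted post has exactly this shape: the loop body is the kernel, and the loop itself becomes the map.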

renoX

David B. Held Wrote:
 Bill Baxter wrote:
 [...]
 Along the same lines, almost any image processing algorithm is in the 
 same category where the top loop looks like "foreach pixel do ___". This 
 is a big part of why GPUs just keep getting faster and faster, because 
 the basic problem is just so inherently parallelizable.

Yeah, it's funny that people are so impressed with 2, 4, even 32 cores when the nVidia 8800 has 48 pixel pipelines and 128 shaders. That's also why some people use GPUs as math coprocessors (and why some video cards require a dedicated power supply cord!).

Speaking of threading and parallelization, when is D going to get Transactional Memory? Could it steal it from here: http://en.wikipedia.org/wiki/Transactional_memory#Implementations? There's http://libcmt.sourceforge.net/ and https://sourceforge.net/projects/libltx. Surely something can be kiped?

Dave

Jan 26 2007
Mikola Lysenko <mclysenk mtu.edu> writes:
There seems to be a great deal of confusion between concurrency and 
parallelism.  Parallelism is a natural part of many problems, and it is 
relatively easy to exploit in order to enhance performance.  Parallel 
algorithms naturally scale to arbitrary numbers of processors, and are 
not particularly difficult to develop.

Concurrency, on the other hand, is very difficult.  When multiple 
processes must communicate, the programming complexity quickly spirals 
out of control, resulting in chaotic, unmanageable programs.  Locks, 
channels and STM are all useful concurrency primitives, but no single 
one can be considered a complete solution.  The difficulty of concurrent 
programming was recognized early on by programmers like Dijkstra who 
worked on the first operating systems.  To this day, it is still an 
unsolved problem and must be approached very carefully on a per-case basis.

In this light, GPU programming should not be considered concurrent 
programming, since threads on the GPU cannot communicate: all shader 
memory is read-only.  GPU programs are parallel, however, and they are 
typically not very difficult to write (beyond some annoyances in the 
API/shader language).  Similarly, futures do not help with concurrent 
programs, since they only exploit the parallelism already inherent in 
a program.

Shaders, futures and array operations are all helpful, since they 
provide convenient mechanisms for utilizing parallelism.  However, they 
utterly fail to address the most difficult aspects of multi-threading, 
which means they are not a complete solution.
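To make the parallelism/concurrency distinction concrete, a small Python sketch (the `SharedCounter` class and `run_workers` helper are hypothetical). A function that reads only its own input needs no synchronization; once threads share mutable state, even a single counter, every read-modify-write has to be serialized, here with `threading.Lock`. Each such lock is simple on its own; the difficulty described above appears when many of them interact.

```python
import threading


class SharedCounter:
    """State touched by several threads; every update must hold the lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self.value = 0

    def add(self, n):
        with self._lock:  # serializes the read-modify-write of self.value
            self.value += n


def run_workers(counter, workers=4, increments=1000):
    """Start `workers` threads that each add 1 to the counter `increments` times."""

    def work():
        for _ in range(increments):
            counter.add(1)

    threads = [threading.Thread(target=work) for _ in range(workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


print(run_workers(SharedCounter()))  # 4000
```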
Jan 27 2007
Kevin Bealer <kevinbealer gmail.com> writes:
Mikola Lysenko wrote:
 There seems to be a great deal of confusion between concurrency and 
 parallelism.  [...]

I like your description -- I've probably been switching these terms myself. I also agree that futures are not a complete solution; however, I do think they are a useful tool. I think it depends on the reason for the concurrency in the first place. Why not write an algorithm sequentially instead of using multiple threads?

1. There is a lot of work to do and multiple CPUs to do it.
2. The process alternates between periods of heavy CPU / low I/O usage and low CPU / heavy I/O usage. Often one thread can use the CPU while the other waits for I/O.
3. Each thread represents interaction with another 'active entity', such as a socket connection to another computer, a GUI connection to an X server, an input device or a piece of hardware.
4. Other cases, such as (if I understand this correctly) O/S threads where the O/S thread corresponds to a user thread and runs when the user thread is in a syscall.

Items 1 and 2 can be done easily via futures -- they are strictly concerned with parallelism. Item 4 probably needs a thread for architectural reasons (except in a message passing O/S design? I'm not sure about this one...)

Item 3: I think there is more flexibility here than is normally exploited; a design which currently uses the "one-thread-per-entity" philosophy can often be redesigned as a message passing algorithm or something like it. The select call helps for file-like datasets.

Then the question comes: why (and if) message passing / futures are better than Thread and Mutex. Herb Sutter argues that it is hard to design correct code using locks and primitives like sleep/pause/mutex, and that it gets a lot harder with larger systems.

(As I understand it...) Herb's argument is that if I have several modules that use locks correctly, they often won't when combined. Deadlock (and livelock) avoidance requires knowledge of the locking rules for the entire system. Without such knowledge, it is difficult to do things like lock ordering that prevent deadlocks. Other techniques are available (deadlock detection and rollback), but these can have their own thorny failure states.

In a design based on futures, I can reason about correctness much more easily because the design can be made sequential trivially -- just don't compute the result until the point where the value is accessed. If the sequential program is correct, the concurrent version is probably correct. There are fewer opportunities for breaking it, because the code does synchronization in a few places instead of everywhere. (Of course there is synchronization and concurrency management in the Future implementation, but it's pretty well constrained.)

I agree completely with your premise that concurrency is fundamentally hard. So the goal (as I see it today) is to take as much of the concurrency as possible *out* of the algorithm, and still leverage multiple CPUs and solve the I/O vs. CPU problem I label as #2 above.

Kevin
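The idea that a futures-based design "can be made sequential trivially" can be sketched in Python with a small, hypothetical `Future` class (not a library API; Python's own `concurrent.futures.Future` works differently). In eager mode the computation starts on a background thread; in lazy mode nothing runs until `.value()` is called, which turns the program back into its sequential form. All the synchronization lives inside the class, in one place.

```python
import threading


class Future:
    """Minimal future (hypothetical helper, not a library API).

    eager=True:  start fn(*args) on a background thread immediately.
    eager=False: run nothing until value() is called, then compute in the
                 caller's thread, i.e. the plain sequential program.
    """

    def __init__(self, fn, *args, eager=True):
        self._fn = fn
        self._args = args
        self._result = None
        self._error = None
        self._done = threading.Event()
        self._started = False
        self._start_lock = threading.Lock()
        if eager:
            self._start(background=True)

    def _run(self):
        try:
            self._result = self._fn(*self._args)
        except Exception as exc:  # hand the error to whoever calls value()
            self._error = exc
        finally:
            self._done.set()

    def _start(self, background):
        # Make sure the computation is started at most once.
        with self._start_lock:
            if self._started:
                return
            self._started = True
        if background:
            threading.Thread(target=self._run, daemon=True).start()
        else:
            self._run()

    def value(self):
        self._start(background=False)  # no-op if a background run already began
        self._done.wait()              # block until the result is ready
        if self._error is not None:
            raise self._error
        return self._result


def sum_of_squares(xs):
    return sum(x * x for x in xs)


data = list(range(1000))
left = Future(sum_of_squares, data[:500])          # runs concurrently
right = Future(sum_of_squares, data[500:])         # runs concurrently
whole = Future(sum_of_squares, data, eager=False)  # runs only when asked
print(left.value() + right.value() == whole.value())  # True
```

Comparing the eager and lazy runs is one cheap way to get the confidence described above: if the lazy (sequential) version is correct and the tasks share no mutable state, the eager version computes the same values.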
Jan 28 2007
Sean Kelly <sean f4.ca> writes:
Kevin Bealer wrote:
 
 Then the question comes: why (and if) message passing / futures are 
 better than Thread and Mutex.  Herb Sutter argues that it is hard to 
 design correct code using locks and primitives like sleep/pause/mutex, 
 and that it gets a lot harder with larger systems.

I don't think anyone is disagreeing with you here. CSP is built around message passing and was invented in the late 70s. And IIRC the agent model was designed in the early 60s.
 (As I understand it...) Herb's argument is that if I have several 
 modules that use Locks correctly, they often won't when combined. 
 Deadlock (and livelock) avoidance require knowledge of the locking rules 
 for the entire system.  Without such knowledge, it is difficult to do 
 things like lock ordering that prevent deadlocks.  Other techniques are 
 available (deadlock detection and rollback), but these can have their 
 own thorny failure states.

Yup. His basic argument is that object-oriented programming is incompatible with lock-based programming because object composition can result in unpredictable lock interaction. In essence, if you call into unknown code while a lock is held, there is no way to prove your code will not deadlock.
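The lock-ordering technique mentioned above can be sketched in Python as follows (the `Account` class and `transfer` function are hypothetical). Both locks are always acquired in ascending `id` order, so two opposite transfers cannot each hold one lock while waiting for the other. The catch is the one Sutter raises: the rule only works if every piece of code that takes these locks knows and follows it.

```python
import threading


class Account:
    def __init__(self, acct_id, balance):
        self.id = acct_id  # doubles as the global lock-ordering key
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` from src to dst while holding both accounts' locks."""
    if src is dst:
        return  # a plain Lock is not reentrant; taking it twice would hang
    # Always lock the lower id first, whatever the direction of the transfer.
    first, second = sorted((src, dst), key=lambda acct: acct.id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def repeat_transfers(src, dst, times):
    for _ in range(times):
        transfer(src, dst, 1)


a, b = Account(1, 100), Account(2, 100)
t1 = threading.Thread(target=repeat_transfers, args=(a, b, 1000))
t2 = threading.Thread(target=repeat_transfers, args=(b, a, 1000))
t1.start()
t2.start()
t1.join()
t2.join()
print(a.balance, b.balance)  # 100 100
```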
 In a design based on futures, I can reason about correctness much more 
 easily because the design can be made sequential trivially -- just don't 
 compute the result until the point where the value is accessed.

I like futures, but they are structured in such a way that they still lend themselves to data sharing. They're definitely better than traditional lock-based programming and they're a good, efficient middle ground for parallel/concurrent programming, but there's something to be said for more structured models like CSP as well.
 I agree completely with your premise that concurrency is fundamentally 
 hard.  So the goal (as I see it today) is to take as much of the 
 concurrency as possible *out* of the algorithm, and still leverage 
 multiple CPUs and solve the I/O vs. CPU problem I label as #2 above.

One thing I like about Concur is that it forces the user to think in terms of which tasks may be run in parallel without much affecting the structure of the application--it's a good introduction to parallel programming and it can be implemented fairly cleanly entirely in library code. But I think it will be a slow transition, because it's not a natural way for people to think about things.

A while back I read that most congestion on the highways exists because people all accelerate and decelerate at different rates, so accretion points naturally form just from this interaction of 'particles'. But when people encounter a traffic slowdown their first thought is that there is a specific, localized cause: an accident occurred, etc. In essence, people tend to be reasonably good at delegation, but they are worse at understanding the interaction between atomic tasks. Eventually, both will be important, and the more machines can figure out the details for us the better off we'll be :-)

Sean
Jan 28 2007
"Joel C. Salomon" <JoelCSalomon Gmail.com> writes:
Sean Kelly wrote:
 Kevin Bealer wrote:
 Then the question comes: why (and if) message passing / futures are 
 better than Thread and Mutex.  Herb Sutter argues that it is hard to 
 design correct code using locks and primitives like sleep/pause/mutex, 
 and that it gets a lot harder with larger systems.

 I don't think anyone is disagreeing with you here. CSP is built around message passing and was invented in the late 70s. And IIRC the agent model was designed in the early 60s.

The Plan 9 threads library (<http://plan9.bell-labs.com/magic/man2html/2/thread>, ported to UNIX and available from <http://swtch.com/plan9port/>), the (defunct) language Alef, and the Limbo language on the Inferno system are all based on the CSP model. Beginning C programmers can write deadlock-free programs that usefully exploit concurrency. For God’s sake don’t copy the Java model… --Joel
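For readers unfamiliar with the CSP style, a rough Python sketch of a two-stage pipeline. It is not the Plan 9 thread library API: `queue.Queue(maxsize=1)` stands in for a channel (Plan 9 channels can also be unbuffered, which a queue only approximates), and `DONE` is a hypothetical end-of-stream sentinel. No stage touches another stage's data; the only interaction is sending and receiving values, so user code never handles a lock directly.

```python
import queue
import threading

DONE = object()  # hypothetical end-of-stream marker, standing in for closing a channel


def producer(out, n):
    """Send 0..n-1 down the channel, then signal the end of the stream."""
    for i in range(n):
        out.put(i)
    out.put(DONE)


def squarer(inp, out):
    """Receive numbers, send their squares onward; forward the end marker."""
    while True:
        item = inp.get()
        if item is DONE:
            break
        out.put(item * item)
    out.put(DONE)


def run_pipeline(n):
    # maxsize=1: a sender blocks until the receiver has taken the previous value
    numbers = queue.Queue(maxsize=1)
    squares = queue.Queue(maxsize=1)
    stages = [
        threading.Thread(target=producer, args=(numbers, n)),
        threading.Thread(target=squarer, args=(numbers, squares)),
    ]
    for t in stages:
        t.start()
    results = []
    while True:
        item = squares.get()
        if item is DONE:
            break
        results.append(item)
    for t in stages:
        t.join()
    return results


print(run_pipeline(5))  # [0, 1, 4, 9, 16]
```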
Jan 28 2007
kris <foo bar.com> writes:
Joel C. Salomon wrote:

 The Plan 9 threads library (<http://plan9.bell-labs.com/magic/man2html/2/thread>, ported to UNIX and available from <http://swtch.com/plan9port/>), the (defunct) language Alef, and the Limbo language on the Inferno system are all based on the CSP model. Beginning C programmers can write deadlock-free programs that usefully exploit concurrency. For God’s sake don’t copy the Java model… --Joel

Amen. And now that Mik has released his concise and syntactically slick CSP for D, well, perhaps even Tony Hoare might crack a smile :)
Jan 28 2007