
digitalmars.D.announce - Re: DMD 1.034 and 2.018 releases

bearophile <bearophileHUGS lycos.com> writes:
Walter Bright:
 Can you make it faster?

Lots of people today have 2 (or even 4) cores, and the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code. I've taken a look at my code, and so far I don't see many spots where the array operations (once they actually give some speedup) can be useful (there are many other things I'd find much more useful than such ops, see my wish lists). But if the array ops are useful for enough people, then it may be worth burning some programming time to make those array ops use all 2-4+ cores.

Bye,
bearophile
Aug 09 2008
Christopher Wright <dhasenan gmail.com> writes:
bearophile wrote:
 Walter Bright:
 Can you make it faster?

Lots of people today have 2 (or even 4) cores, and the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.

The overhead of creating a new thread for this would be significant. You'd probably be better off using a regular loop for arrays that are not huge.
Aug 09 2008
"Jarrett Billingsley" <kb3ctd2 yahoo.com> writes:
"Christopher Wright" <dhasenan gmail.com> wrote in message 
news:g7ljal$2i84$1 digitalmars.com...
 bearophile wrote:
 Walter Bright:
 Can you make it faster?

Lots of people today have 2 (or even 4) cores, and the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.

The overhead of creating a new thread for this would be significant. You'd probably be better off using a regular loop for arrays that are not huge.

I think we could see a lot more improvement from using vector ops to perform SIMD operations. They are just begging for it.
Aug 09 2008
JAnderson <ask me.com> writes:
Christopher Wright wrote:
 bearophile wrote:
 Walter Bright:
 Can you make it faster?

Lots of people today have 2 (or even 4) cores, and the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.

The overhead of creating a new thread for this would be significant. You'd probably be better off using a regular loop for arrays that are not huge.

I agree. I think a lot of profiling would be in order to see when each approach becomes an advantage. Then use a branch to jump to the best algorithm for the particular case (platform + length of array). Hopefully the compiler could inline the chosen algorithm so that constant-sized arrays don't pay the additional overhead. There would be a small cost for the extra branch on small dynamic arrays; one could argue that if that branch ever becomes a performance bottleneck, the program is doing lots of operations on lots of small arrays, and the user could change the design to group their small arrays into one larger array to get the performance they desire.

-Joel
Aug 09 2008
renoX <renosky free.fr> writes:
Christopher Wright wrote:
 bearophile wrote:
 Walter Bright:
 Can you make it faster?

Lots of people today have 2 (or even 4) cores, and the order of computation of those ops is arbitrary, so a major (nearly linear, hopefully) speedup will probably come as soon as all the cores are used. This job splitting is probably an advantage even when the ops aren't computed by asm code.

The overhead of creating a new thread for this would be significant.

Well, for this kind of scheme you wouldn't start a new set of threads each time! Just start a set of worker threads (one per CPU, pinned to its CPU), created at program startup, which do nothing until they are woken up when there is an operation that can be accelerated through parallelism.
 You'd probably be better off using a regular loop for arrays that are 
 not huge.

Sure, even with pre-created threads, using several CPUs induces additional startup and teardown costs, so this would be worthwhile only for loops that are 'big enough'. Another pitfall is to ensure that two CPUs don't write to the same cache line, otherwise this 'false sharing' will reduce the performance.

renoX
Sep 07 2008