
digitalmars.D - Re: RFC: naming for FrontTransversal and Transversal ranges

reply bearophile <bearophileHUGS lycos.com> writes:
Bill Baxter:
 Much more often the discussion on the numpy list takes the form of
 "how do I make this loop faster" becuase loops are slow in Python so
 you have to come up with clever transformations to turn your loop into
 array ops.  This is thankfully a problem that D array libs do not
 have.  If you think of it as a loop, go ahead and implement it as a
 loop.

Sigh! Already today, and even more tomorrow, this is often false for D too. On my computer I have a cheap GPU that is sleeping while my D code runs. Even my other core sleeps. And I am using one core at 32 bits only. You will need ways to data-parallelize and other forms of parallel processing. So maybe normal loops will not cut it. I think the best way to go is to follow the ideas of the Chapel language, which I have shown a bit here: http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=87311 Bye, bearophile
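
As a minimal sketch of the data-parallel direction described above (assuming D2 with the std.parallelism module; the array and the per-element work are just placeholders), the same loop can be written serially or handed to a task pool:

import std.math : sqrt;
import std.parallelism : taskPool;

void main()
{
    auto data = new double[1_000_000];

    // Plain loop: a single core does all the work.
    foreach (i, ref x; data)
        x = sqrt(cast(double) i);

    // Same loop, data-parallel: taskPool.parallel splits the iterations
    // across the worker threads of the default task pool.
    foreach (i, ref x; taskPool.parallel(data))
        x = sqrt(cast(double) i);
}

Whether this pays off depends on the workload; for per-iteration work this cheap, the scheduling overhead can easily dominate.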
May 01 2009
next sibling parent reply Bill Baxter <wbaxter gmail.com> writes:
On Fri, May 1, 2009 at 5:36 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:
 Much more often the discussion on the numpy list takes the form of
 "how do I make this loop faster" becuase loops are slow in Python so
 you have to come up with clever transformations to turn your loop into
 array ops. =A0This is thankfully a problem that D array libs do not
 have. =A0If you think of it as a loop, go ahead and implement it as a
 loop.

Sigh! Already today, and even more tomorrow, this is often false for D too. On my computer I have a cheap GPU that is sleeping while my D code runs. Even my other core sleeps. And I am using one core at 32 bits only. You will need ways to data-parallelize and other forms of parallel processing. So maybe normal loops will not cut it.

Yeh. If you want to use multiple cores you've got a whole 'nother can o' worms. But at least I find that today most apps seem to get by just fine using a single core. Strange though, aren't you the guy always telling us how being able to express your algorithm clearly is often more important than raw performance? --bb
May 01 2009
parent reply Don <nospam nospam.com> writes:
Bill Baxter wrote:
 On Fri, May 1, 2009 at 5:36 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:
 Much more often the discussion on the numpy list takes the form of
 "how do I make this loop faster" becuase loops are slow in Python so
 you have to come up with clever transformations to turn your loop into
 array ops.  This is thankfully a problem that D array libs do not
 have.  If you think of it as a loop, go ahead and implement it as a
 loop.

You will need ways to data-parallelize and other forms of parallel processing. So maybe normal loops will not cut it.

Yeh. If you want to use multiple cores you've got a whole 'nother can o' worms. But at least I find that today most apps seem to get by just fine using a single core. Strange though, aren't you the guy always telling us how being able to express your algorithm clearly is often more important than raw performance? --bb

I confess to being mighty skeptical about the whole multi-threaded, multi-core thing. I think we're going to find that there are only two practical uses of multi-core: (1) embarrassingly-parallel operations; and (2) process-level concurrency. I just don't believe that apps have as much opportunity for parallelism as people seem to think. There are just too many dependencies. Sure, you can (say) with a game split your AI onto a separate core from your graphics stuff, but that's only applicable for 2-4 cores. It doesn't work for 100+ cores. (Which is why I think that broadening the opportunity for case (1) is the most promising avenue for actually using a host of cores).
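
To make case (1) concrete, here is a minimal D sketch (again assuming the std.parallelism module; the squaring function is only a stand-in for any per-element computation with no shared state):

import std.parallelism : taskPool;

void main()
{
    auto input = new double[1_000_000];

    // Embarrassingly parallel: every output element depends only on the
    // corresponding input element, so there are no dependencies between
    // iterations and nothing to synchronize. taskPool.amap evaluates the
    // function across the pool and returns a newly allocated result array.
    auto output = taskPool.amap!((double x) => x * x + 1.0)(input);
}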
May 02 2009
parent reply Georg Wrede <georg.wrede iki.fi> writes:
Don wrote:
 Bill Baxter wrote:
 On Fri, May 1, 2009 at 5:36 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:
 Much more often the discussion on the numpy list takes the form of
 "how do I make this loop faster" becuase loops are slow in Python so
 you have to come up with clever transformations to turn your loop into
 array ops.  This is thankfully a problem that D array libs do not
 have.  If you think of it as a loop, go ahead and implement it as a
 loop.

Sigh! Already today, and even more tomorrow, this is often false for D too. On my computer I have a cheap GPU that is sleeping while my D code runs. Even my other core sleeps. And I am using one core at 32 bits only. You will need ways to data-parallelize and other forms of parallel processing. So maybe normal loops will not cut it.

Yeh. If you want to use multiple cores you've got a whole 'nother can o' worms. But at least I find that today most apps seem to get by just fine using a single core. Strange though, aren't you the guy always telling us how being able to express your algorithm clearly is often more important than raw performance? --bb

I confess to being mighty skeptical about the whole multi-threaded, multi-core thing. I think we're going to find that there are only two practical uses of multi-core: (1) embarrassingly-parallel operations; and (2) process-level concurrency. I just don't believe that apps have as much opportunity for parallelism as people seem to think. There are just too many dependencies. Sure, you can (say) with a game split your AI onto a separate core from your graphics stuff, but that's only applicable for 2-4 cores. It doesn't work for 100+ cores.

I had this bad dream where there's a language in which it's trivial to use multiple CPUs. And I could see every Joe and John executing their trivial apps, each of which used all available CPUs. They had their programs and programlets run two or four times as fast, but most of them ran in less than a couple of seconds anyway, and the longer ones spent most of their time waiting for external resources. All it ended up with was a lot of work for the OS, the total throughput of the computer decreasing because now every CPU had to deal with every process, not to mention the increase in electricity consumption and heat because none of the CPUs could rest. And still nobody was using the GPU, MMX, SSE, etc. Most of these programs consisted of sequences, with the odd selection or short iteration few and far between. And none of them used parallelizable data.
 (Which is why I think that broadening the opportunity for case (1) is 
 the most promising avenue for actually using a host of cores).

The more I think about it, the more I'm starting to believe that the average desktop or laptop won't see two dozen cores in the immediate future. And definitely, by the time there are more cores than processes on the average Windows PC, we're talking about gross wastage. OTOH, Serious Computing is different, of course. Corporate machine rooms would benefit from many cores. Virtual host servers, heavy-duty web servers, and of course scientific and statistical computing come to mind. It's interesting to note that in the old days, machine-room computers were totally different from PCs. Then they sort of converged, with machine rooms all of a sudden filling up with regular PCs running Linux. And now I see the trend separating the PC from the machine-room computer again. Software for the latter might be the target for language features that utilize multiple CPUs.
May 02 2009
parent reply bearophile <bearophileHUGS lycos.com> writes:
Georg Wrede:
 The more I think about it, the more I'm starting to believe that the 
 average desktop or laptop won't see two dozen cores in the immediate 
 future.

Too late. My personal computer already has about 100 small cores in its GPU, and with CUDA (and soon OpenCL) you can even use them for almost general-purpose computation... And then comes Nehalem: http://en.wikipedia.org/wiki/Intel_Nehalem_(microarchitecture) Bye, bearophile
May 02 2009
parent Georg Wrede <georg.wrede iki.fi> writes:
bearophile wrote:
 Georg Wrede:
 The more I think about it, the more I'm starting to believe that the 
 average desktop or laptop won't see two dozen cores in the immediate 
 future.

Too late. My personal computer already has about 100 small cores in its GPU, and with CUDA (and soon OpenCL) you can even use them for almost general-purpose computation... And then comes Nehalem: http://en.wikipedia.org/wiki/Intel_Nehalem_(microarchitecture)

From http://en.wikipedia.org/wiki/Simultaneous_multithreading: The latest MIPS architecture designs include an SMT system known as "MIPS MT". MIPS MT provides for both heavyweight virtual processing elements and lighter-weight hardware microthreads. RMI, a Cupertino-based startup, is the first MIPS vendor to provide a processor SoC based on 8 cores, each of which runs 4 threads. The threads can be run in fine-grained mode, where a different thread can be executed each cycle. The threads can also be assigned priorities. I hope this stuff is for the machine room. :-) And the recession helps...
May 02 2009
prev sibling next sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Fri, 01 May 2009 21:14:54 -0400, Bill Baxter <wbaxter gmail.com> wrote:

 On Fri, May 1, 2009 at 5:36 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:
 Much more often the discussion on the numpy list takes the form of
 "how do I make this loop faster" becuase loops are slow in Python so
 you have to come up with clever transformations to turn your loop into
 array ops.  This is thankfully a problem that D array libs do not
 have.  If you think of it as a loop, go ahead and implement it as a
 loop.

Sigh! Already today, and even more tomorrow, this is often false for D too. On my computer I have a cheap GPU that is sleeping while my D code runs. Even my other core sleeps. And I am using one core at 32 bits only. You will need ways to data-parallelize and other forms of parallel processing. So maybe normal loops will not cut it.

Yeh. If you want to use multiple cores you've got a whole 'nother can o' worms. But at least I find that today most apps seem to get by just fine using a single core. Strange though, aren't you the guy always telling us how being able to express your algorithm clearly is often more important than raw performance? --bb

Well, since I do GP-GPU work, the GPU algorithm is much cleaner algorithmically than the CPU algorithm. :) But I do know that this is very algorithm-dependent.
May 02 2009
prev sibling parent "Robert Jacques" <sandford jhu.edu> writes:
On Sat, 02 May 2009 04:17:29 -0400, Don <nospam nospam.com> wrote:

 Bill Baxter wrote:
 On Fri, May 1, 2009 at 5:36 PM, bearophile <bearophileHUGS lycos.com> wrote:
 Bill Baxter:
 Much more often the discussion on the numpy list takes the form of
 "how do I make this loop faster" becuase loops are slow in Python so
 you have to come up with clever transformations to turn your loop into
 array ops.  This is thankfully a problem that D array libs do not
 have.  If you think of it as a loop, go ahead and implement it as a
 loop.

Sigh! Already today, and even more tomorrow, this is often false for D too. On my computer I have a cheap GPU that is sleeping while my D code runs. Even my other core sleeps. And I am using one core at 32 bits only. You will need ways to data-parallelize and other forms of parallel processing. So maybe normal loops will not cut it.

Yeh. If you want to use multiple cores you've got a whole 'nother can o' worms. But at least I find that today most apps seem to get by just fine using a single core. Strange though, aren't you the guy always telling us how being able to express your algorithm clearly is often more important than raw performance? --bb

I confess to being mighty skeptical about the whole multi-threaded, multi-core thing. I think we're going to find that there are only two practical uses of multi-core: (1) embarrassingly-parallel operations; and (2) process-level concurrency. I just don't believe that apps have as much opportunity for parallelism as people seem to think. There are just too many dependencies. Sure, you can (say) with a game split your AI onto a separate core from your graphics stuff, but that's only applicable for 2-4 cores. It doesn't work for 100+ cores. (Which is why I think that broadening the opportunity for case (1) is the most promising avenue for actually using a host of cores).

Actually, AI is mostly embarrassingly parallel. The issue is the "mostly" part of that statement, which is why optimized reader/writer locks and STM are showing up in game engines. And really, >90% of the CPU time has been physics, which is both embarrassingly parallel and already being off-loaded onto the GPU and multiple cores.
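
A minimal sketch of the reader/writer-lock pattern mentioned above, using druntime's core.sync.rwmutex (the SharedWorld and WorldSnapshot names are hypothetical placeholders, not taken from any real engine, and shared/__gshared qualifiers are glossed over):

import core.sync.rwmutex : ReadWriteMutex;

// Hypothetical snapshot of game state that many AI agents read.
struct WorldSnapshot
{
    double[3][] positions;
}

class SharedWorld
{
    private ReadWriteMutex lock;
    private WorldSnapshot current;

    this()
    {
        lock = new ReadWriteMutex;
    }

    // Many AI threads can hold the reader side at the same time.
    WorldSnapshot read()
    {
        WorldSnapshot copy;
        synchronized (lock.reader)
        {
            copy = current;
        }
        return copy;
    }

    // The physics step takes the writer side exclusively while it
    // publishes a new snapshot.
    void publish(WorldSnapshot snap)
    {
        synchronized (lock.writer)
        {
            current = snap;
        }
    }
}

How much this buys obviously depends on the read/write ratio of the workload.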
May 02 2009