www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - foreach - premature optimization vs cultivating good habits

reply "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
Hi.

The standard advice is not to worry about memory usage and 
execution speed until profiling shows you where the problem is, 
and I respect Knuth greatly as a thinker.

Still, one may learn from others' experience and cultivate good 
habits early.  To say that one should not prematurely optimize is 
not to say that one should not try to avoid cases that tend to be 
really bad, and I would rather learn from others what these are 
then learn only the hard way.

For the time being I am still at early development stage and have 
not yet tested things with the larger data sets I anticipate 
eventually using.  It's cheap to make slightly different design 
decisions early, but much more painful further down the line, 
particularly given my context.

As I understand it, foreach allocates when a simple C-style for 
using an array index would not.  I would like to learn more about 
when this turns particularly expensive, and perhaps I could put 
this up on the wiki if people think it is a good idea.

What exactly does it allocate, and how often, and how large is 
this in relation to the size of the underlying data 
(structs/classes/ranges)?  Are there any cache effects to 
consider?  Happy to go to the source code if you can give me some 
pointers.

Thanks in advice for any thoughts.



Laeeth.
Jan 30 2015
next sibling parent "BBaz" <bb.temp gmx.com> writes:
On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
 Hi.

 The standard advice is not to worry about memory usage and 
 execution speed until profiling shows you where the problem is, 
 and I respect Knuth greatly as a thinker.

 Still, one may learn from others' experience and cultivate good 
 habits early.  To say that one should not prematurely optimize 
 is not to say that one should not try to avoid cases that tend 
 to be really bad, and I would rather learn from others what 
 these are then learn only the hard way.

 For the time being I am still at early development stage and 
 have not yet tested things with the larger data sets I 
 anticipate eventually using.  It's cheap to make slightly 
 different design decisions early, but much more painful further 
 down the line, particularly given my context.

 As I understand it, foreach allocates when a simple C-style for 
 using an array index would not.  I would like to learn more 
 about when this turns particularly expensive, and perhaps I 
 could put this up on the wiki if people think it is a good idea.

 What exactly does it allocate, and how often, and how large is 
 this in relation to the size of the underlying data 
 (structs/classes/ranges)?  Are there any cache effects to 
 consider?  Happy to go to the source code if you can give me 
 some pointers.

 Thanks in advice for any thoughts.



 Laeeth.
foreach() is good for linked list because it uses a delegate so it avoids to cross the items from 0 to i at each iteration. for() is good for arrays because of the pointer arithmetic. But the D way for foreach'es is to use ranges: popFront etc, although i often feel too lazy to use them... You'll probably get more technical answers...
Jan 30 2015
prev sibling next sibling parent reply "Adam D. Ruppe" <destructionator gmail.com> writes:
On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
 As I understand it, foreach allocates when a simple C-style for 
 using an array index would not.
foreach is just syntax sugar over a for loop. If there's any allocations, it is because your code had some, it isn't inherit to the loop. The doc definition even lists the translation of foreach to for in the case of ranges explicitly: http://dlang.org/statement.html#ForeachStatement The most likely allocation would be to a user-defined opApply delegate, and you can prevent that by making it opApply(scope your_delegate) - the scope word prevents any closure allocation.
Jan 30 2015
parent reply "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
On Friday, 30 January 2015 at 12:55:20 UTC, Adam D. Ruppe wrote:
 On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
 As I understand it, foreach allocates when a simple C-style 
 for using an array index would not.
foreach is just syntax sugar over a for loop. If there's any allocations, it is because your code had some, it isn't inherit to the loop. The doc definition even lists the translation of foreach to for in the case of ranges explicitly: http://dlang.org/statement.html#ForeachStatement The most likely allocation would be to a user-defined opApply delegate, and you can prevent that by making it opApply(scope your_delegate) - the scope word prevents any closure allocation.
Thanks, Adam. That's what I had thought (your first paragraph), but something Ola on a different thread confused me and made me think I didn't understand it, and I wanted to pin it down. The second paragraph is very helpful - appreciate it. Laeeth.
Jan 30 2015
parent reply "Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:
On Friday, 30 January 2015 at 14:41:11 UTC, Laeeth Isharc wrote:
 Thanks, Adam.  That's what I had thought (your first 
 paragraph), but something Ola on a different thread confused me 
 and made me think I didn't understand it, and I wanted to pin 
 it down.
There is always significant optimization effects in long running loops: - SIMD - cache locality / prefetching For the former (SIMD) you need to make sure that good code is generated either by hand, by using vectorized libraries or by auto vectorization. For the latter (cache) you need to make sure that the prefetcher is able to predict or is being told to prefetch explicitly and also that the working set is small enough to stay at the faster cache levels. If you want good performance you cannot ignore any of these, and you have to design the data structures and algorithms for it. Prefetching has to happen maybe 100 instructions before the actual load from memory and AVX requires byte alignment and a layout that fits the algorithm. On next gen Xeon Skylake I think the alignment might go up to 64 byte and you have 512 bits wide registers (so you can do 8 64 bit floating point operations in parallel per core). The difference between issuing 1-4 ops and issuing 8-16 per time unit is noticable... An of course, the closer your code is to theoretical throughput in the CPU, the more critical it becomes to not wait for memory loads. This is also a moving target...
Jan 31 2015
parent "Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:
Thank you Adam, Bbaz and Ola for the helpful thoughts.  I dumped 
them in a wiki page off the sandbox but needs editing and 
refining.
Feb 01 2015
prev sibling parent "John Colvin" <john.loughran.colvin gmail.com> writes:
On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
 Hi.

 The standard advice is not to worry about memory usage and 
 execution speed until profiling shows you where the problem is, 
 and I respect Knuth greatly as a thinker.

 Still, one may learn from others' experience and cultivate good 
 habits early.  To say that one should not prematurely optimize 
 is not to say that one should not try to avoid cases that tend 
 to be really bad, and I would rather learn from others what 
 these are then learn only the hard way.

 For the time being I am still at early development stage and 
 have not yet tested things with the larger data sets I 
 anticipate eventually using.  It's cheap to make slightly 
 different design decisions early, but much more painful further 
 down the line, particularly given my context.

 As I understand it, foreach allocates when a simple C-style for 
 using an array index would not.  I would like to learn more 
 about when this turns particularly expensive, and perhaps I 
 could put this up on the wiki if people think it is a good idea.

 What exactly does it allocate, and how often, and how large is 
 this in relation to the size of the underlying data 
 (structs/classes/ranges)?  Are there any cache effects to 
 consider?  Happy to go to the source code if you can give me 
 some pointers.

 Thanks in advice for any thoughts.



 Laeeth.
I think you're thinking of java's foreach syntax, which IIRC does cause allocations.
Jan 30 2015