digitalmars.D.learn - foreach - premature optimization vs cultivating good habits

Laeeth Isharc (25/25) Jan 30 2015 Hi.

BBaz (7/32) Jan 30 2015 foreach() is good for linked list because it uses a delegate so
Adam D. Ruppe (9/11) Jan 30 2015 foreach is just syntax sugar over a for loop. If there's any

Laeeth Isharc (6/17) Jan 30 2015 Thanks, Adam. That's what I had thought (your first paragraph),

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= (25/29) Jan 31 2015 There is always significant optimization effects in long running

Laeeth Isharc (3/3) Feb 01 2015 Thank you Adam, Bbaz and Ola for the helpful thoughts. I dumped

John Colvin (3/28) Jan 30 2015 I think you're thinking of java's foreach syntax, which IIRC does

"Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:

Hi.

The standard advice is not to worry about memory usage and 
execution speed until profiling shows you where the problem is, 
and I respect Knuth greatly as a thinker.

Still, one may learn from others' experience and cultivate good 
habits early.  To say that one should not prematurely optimize is 
not to say that one should not try to avoid cases that tend to be 
really bad, and I would rather learn from others what these are 
then learn only the hard way.

For the time being I am still at early development stage and have 
not yet tested things with the larger data sets I anticipate 
eventually using.  It's cheap to make slightly different design 
decisions early, but much more painful further down the line, 
particularly given my context.

As I understand it, foreach allocates when a simple C-style for 
using an array index would not.  I would like to learn more about 
when this turns particularly expensive, and perhaps I could put 
this up on the wiki if people think it is a good idea.

What exactly does it allocate, and how often, and how large is 
this in relation to the size of the underlying data 
(structs/classes/ranges)?  Are there any cache effects to 
consider?  Happy to go to the source code if you can give me some 
pointers.

Thanks in advice for any thoughts.



Laeeth.

Jan 30 2015

"BBaz" <bb.temp gmx.com> writes:

On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
 Hi.

 The standard advice is not to worry about memory usage and 
 execution speed until profiling shows you where the problem is, 
 and I respect Knuth greatly as a thinker.

 Still, one may learn from others' experience and cultivate good 
 habits early.  To say that one should not prematurely optimize 
 is not to say that one should not try to avoid cases that tend 
 to be really bad, and I would rather learn from others what 
 these are then learn only the hard way.

 For the time being I am still at early development stage and 
 have not yet tested things with the larger data sets I 
 anticipate eventually using.  It's cheap to make slightly 
 different design decisions early, but much more painful further 
 down the line, particularly given my context.

 As I understand it, foreach allocates when a simple C-style for 
 using an array index would not.  I would like to learn more 
 about when this turns particularly expensive, and perhaps I 
 could put this up on the wiki if people think it is a good idea.

 What exactly does it allocate, and how often, and how large is 
 this in relation to the size of the underlying data 
 (structs/classes/ranges)?  Are there any cache effects to 
 consider?  Happy to go to the source code if you can give me 
 some pointers.

 Thanks in advice for any thoughts.



 Laeeth.

foreach() is good for linked list because it uses a delegate so 
it avoids to cross the items from 0 to i at each iteration.

for() is good for arrays because of the pointer arithmetic.

But the D way for foreach'es is to use ranges: popFront etc, 
although i often feel too lazy to use them...

You'll probably get more technical answers...

Jan 30 2015

"Adam D. Ruppe" <destructionator gmail.com> writes:

On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
 As I understand it, foreach allocates when a simple C-style for 
 using an array index would not.

foreach is just syntax sugar over a for loop. If there's any 
allocations, it is because your code had some, it isn't inherit 
to the loop. The doc definition even lists the translation of 
foreach to for in the case of ranges explicitly:

http://dlang.org/statement.html#ForeachStatement


The most likely allocation would be to a user-defined opApply 
delegate, and you can prevent that by making it opApply(scope 
your_delegate) - the scope word prevents any closure allocation.

Jan 30 2015

"Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:

On Friday, 30 January 2015 at 12:55:20 UTC, Adam D. Ruppe wrote:
 On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
 As I understand it, foreach allocates when a simple C-style 
 for using an array index would not.

 foreach is just syntax sugar over a for loop. If there's any 
 allocations, it is because your code had some, it isn't inherit 
 to the loop. The doc definition even lists the translation of 
 foreach to for in the case of ranges explicitly:

 http://dlang.org/statement.html#ForeachStatement


 The most likely allocation would be to a user-defined opApply 
 delegate, and you can prevent that by making it opApply(scope 
 your_delegate) - the scope word prevents any closure allocation.

Thanks, Adam.  That's what I had thought (your first paragraph), 
but something Ola on a different thread confused me and made me 
think I didn't understand it, and I wanted to pin it down.

The second paragraph is very helpful - appreciate it.


Laeeth.

Jan 30 2015

"Ola Fosheim =?UTF-8?B?R3LDuHN0YWQi?= writes:

On Friday, 30 January 2015 at 14:41:11 UTC, Laeeth Isharc wrote:
 Thanks, Adam.  That's what I had thought (your first 
 paragraph), but something Ola on a different thread confused me 
 and made me think I didn't understand it, and I wanted to pin 
 it down.

There is always significant optimization effects in long running 
loops:
- SIMD
- cache locality / prefetching

For the former (SIMD) you need to make sure that good code is 
generated either by hand, by using vectorized libraries or by 
auto vectorization.

For the latter (cache) you need to make sure that the prefetcher 
is able to predict or is being told to prefetch explicitly and 
also that the working set is small enough to stay at the faster 
cache levels.

If you want good performance you cannot ignore any of these, and 
you have to design the data structures and algorithms for it. 
Prefetching has to happen maybe 100 instructions before the 
actual load from memory and AVX requires byte alignment and a 
layout that fits the algorithm. On next gen Xeon Skylake I think 
the alignment might go up to 64 byte and you have 512 bits wide 
registers (so you can do 8 64 bit floating point operations in 
parallel per core). The difference between issuing 1-4 ops and 
issuing 8-16 per time unit is noticable...

An of course, the closer your code is to theoretical throughput 
in the CPU, the more critical it becomes to not wait for memory 
loads.

This is also a moving target...

Jan 31 2015

"Laeeth Isharc" <laeethnospam nospamlaeeth.com> writes:

Thank you Adam, Bbaz and Ola for the helpful thoughts.  I dumped 
them in a wiki page off the sandbox but needs editing and 
refining.

Feb 01 2015

"John Colvin" <john.loughran.colvin gmail.com> writes:

On Friday, 30 January 2015 at 11:55:16 UTC, Laeeth Isharc wrote:
 Hi.

 The standard advice is not to worry about memory usage and 
 execution speed until profiling shows you where the problem is, 
 and I respect Knuth greatly as a thinker.

 Still, one may learn from others' experience and cultivate good 
 habits early.  To say that one should not prematurely optimize 
 is not to say that one should not try to avoid cases that tend 
 to be really bad, and I would rather learn from others what 
 these are then learn only the hard way.

 For the time being I am still at early development stage and 
 have not yet tested things with the larger data sets I 
 anticipate eventually using.  It's cheap to make slightly 
 different design decisions early, but much more painful further 
 down the line, particularly given my context.

 As I understand it, foreach allocates when a simple C-style for 
 using an array index would not.  I would like to learn more 
 about when this turns particularly expensive, and perhaps I 
 could put this up on the wiki if people think it is a good idea.

 What exactly does it allocate, and how often, and how large is 
 this in relation to the size of the underlying data 
 (structs/classes/ranges)?  Are there any cache effects to 
 consider?  Happy to go to the source code if you can give me 
 some pointers.

 Thanks in advice for any thoughts.



 Laeeth.

I think you're thinking of java's foreach syntax, which IIRC does 
cause allocations.

Jan 30 2015

D Programming

C/C++ Programming

Other

digitalmars.D.learn - foreach - premature optimization vs cultivating good habits