
digitalmars.D.learn - Just another question about memory management in d from a newbie

Thomas <earthwormjimmy gmx.at> writes:
Hello!

First, my background: C++ and Java ages ago, and since then only 
PL/SQL. Now I am learning D just for fun and personal education from 
time to time, and I am very pleased with it :-)

Now I have to ask a question here, because I could not find a 
corresponding answer for it. Or maybe I was just unable to find it :-)

I was wondering about the memory system in D (and other C-like 
languages), specifically how the overhead of memory allocation is handled.

Just an example that I have seen many times:


int main()
{
     foreach(i;0 .. 10000)
     {
         int x;
         // do something with x
     }
     return 0;
}


Do I understand it right that the variable x will be created 
10000 times and destroyed at the end of the scope in each loop? 
Or will it be overwritten 10000 times by the creation?
I mean, does it not cost the CPU some time to allocate (I know 
we're talking about nanoseconds here), but work is work. As far as 
I know from my school days in assembler, allocation of memory is 
one of the most expensive operations for the CPU. Activate a memory 
block, find some free place, reserve it, return a pointer to the 
caller, and so on..

In my opinion this version should perform a little better:

int main()
{
     int x;
     foreach(i;0 .. 10000)
     {
         x = 0; // reinitialize
         // do something with x
     }
     return 0;
}

Or am I missing something, and is there an optimization by the 
compiler that avoids recreating/destroying the variable x 10000 
times?

I know that version 1 is safer because there will be no stale 
value left over after each loop iteration that we could stumble 
onto, but that is not my question here.
And I know that we're talking about really small CPU usage 
compared to what an app should do. But when performance matters, 
as in games or GUIs (and there are a lot of examples like 
version 1 out there), then I have to ask myself if it is not 
just a waste of CPU time?

Or is it just a code style thing?

Thank you for your time!


Greetings from Austria
Thomas
Jun 17 2019
Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 17 June 2019 at 19:53:52 UTC, Thomas wrote:
 Do I understand it right that the variable x will be created 
 10000 times and destroyed at the end of the scope in each loop 
 ? Or will it be 10000 overwritten by creation ?
No, the compiler will generate code to reuse the same thing each loop.
 In my opinion this version should perform a little better:
That's actually slightly *less* efficient (in some situations) because it doesn't allow the compiler to reuse the memory for `x` after the loop has exited. But in both cases, the compiler will just set aside a little bit of local space (or just a cpu scratch register) and use it repeatedly. It is totally "free".
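One hedged way to watch the reuse Adam describes from inside the program: the loop-local variable lands at the same address every iteration, so no per-iteration allocation is happening. (This is typical compiler behaviour, not something the language spec guarantees, so treat it as an illustration.)

```d
import std.stdio;

void main()
{
    void* firstAddr = null;
    foreach (i; 0 .. 3)
    {
        int x = i;                    // "created" each iteration, logically
        if (firstAddr is null)
            firstAddr = cast(void*) &x;
        // same stack slot every time around the loop
        assert(cast(void*) &x == firstAddr);
    }
    writeln("loop-local x reused one stack slot");
}
```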
Jun 17 2019
Thomas <earthwormjimmy gmx.at> writes:
First, thank you for your fast reply!

On Monday, 17 June 2019 at 20:00:34 UTC, Adam D. Ruppe wrote:
 No, the compiler will generate code to reuse the same thing 
 each loop.
Does this also work for complex types like structs?
Jun 17 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jun 17, 2019 at 07:53:52PM +0000, Thomas via Digitalmars-d-learn wrote:
[...]
 int main()
 {
     foreach(i;0 .. 10000)
     {
         int x;
         // do something with x
     }
     return 0;
 }
 
 Do I understand it right that the variable x will be created 10000
 times and destroyed at the end of the scope in each loop ? Or will it
 be 10000 overwritten by creation ?
If x were a heap-allocated object, then your concerns would be justified: it would be allocated once every iteration (and also add to the garbage that the GC will have to collect later). However, in this case x is an int, which is a value type. This means it will be allocated on the stack, where allocation is as trivial as bumping the stack pointer (practically zero cost), and deallocation is as simple as bumping the stack pointer the other way.

In fact, it doesn't even have to bump the stack pointer between iterations, since it's obvious from code analysis that x will always be allocated at the same position on the stack relative to the function's call frame. So the generated machine code can simply use that memory location directly, reusing the same location between iterations.
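The value-type/heap-type distinction Teoh draws can be made visible with the GC's own statistics. A sketch, assuming a reasonably recent druntime (the `allocatedInCurrentThread` field of `GC.Stats` appeared around dmd 2.085; `Box` is just an illustrative type):

```d
import core.memory : GC;
import std.stdio : writeln;

class Box { int value; }   // classes are heap-allocated in D

void main()
{
    // int in the loop body: one stack slot, zero heap traffic.
    int sum = 0;
    foreach (i; 0 .. 10_000)
    {
        int x = i;
        sum += x;
    }

    // A class object, by contrast, really is allocated each iteration.
    auto before = GC.stats.allocatedInCurrentThread;
    foreach (i; 0 .. 10_000)
    {
        auto b = new Box;      // fresh heap allocation every time
        b.value = i;
    }
    auto garbage = GC.stats.allocatedInCurrentThread - before;
    writeln("heap bytes the class version asked for: ", garbage);
    assert(garbage > 0);       // the int loop caused none of this
}
```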
 I mean does it not cost the CPU some time to allocate (I know were
 talking here of nsec) but work is work. As far I know from my school
 days in assembler, allocation of memory is one of the most expensive
 instructions on the cpu. Activate memory block, find some free place,
 reserve, return pointer to caller, and so on..
That's not entirely accurate. It depends on how the allocation is done. Allocation on the stack is extremely cheap, and consists literally of adding some number (the size of the allocation) to a register (the stack pointer) -- you cannot get much simpler than that.

Also, memory allocation on the heap is not "one of the most expensive instructions". The reason it's expensive is that it's not a single instruction at all, but an entire subroutine of instructions that manages the heap. There is no CPU I know of that has a built-in instruction for heap allocation.
 In my opinion this version should perform a little better:
 
 int main()
 {
     int x;
     foreach(i;0 .. 10000)
     {
         x = 0; // reinitialize
         // do something with x
     }
     return 0;
 }
 
 Or do I miss something and there is an optimization by the compiler to
 avoid recreating/destroying the variable x 10000 times ?
[...]

What you wrote here is exactly how the compiler will emit the machine code given your first code example above. Basically all modern optimizing compilers will implement it this way, and you never have to worry about stack allocation being slow.

The only time you will have a performance problem is if x is a heap-allocated object. In *that* case you will want to look into reusing previous instances of the object between iterations. But if x is a value type (int, struct, etc.), you don't have to worry about this at all. The first version of the code you have above is the preferred one, because it makes the code clearer.

On a related note, in modern programming languages you should generally be more concerned about the higher-level meaning of the code than about low-level details like how instructions are generated. The machine code emitted by the compiler is highly transformed from the original source code and generally will not have a 1-to-1 mapping to the higher-level logical meaning of the code. That is, the first version of the code *logically* allocates x at the beginning of each iteration of the loop and deallocates it at the end, but the actual generated machine code elides all of that, because the compiler's optimizer can easily see that it's a stack allocation that always ends up in the same place, so none of the allocation/deallocation needs to be represented as-is in the machine code.

On another note, for performance-related concerns the general advice these days is to write something in the most straightforward, logical way first, and then, if the performance is not good enough, **use a profiler** to identify where the hotspots are, and optimize those.
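The "reuse previous instances between iterations" advice for heap objects can be sketched like this (the `Buffer` type here is purely illustrative, not from the original post):

```d
class Buffer
{
    int[] data;
    this(size_t n) { data = new int[](n); }
}

void main()
{
    // One allocation up front, instead of `new Buffer` inside the loop.
    auto buf = new Buffer(1024);
    foreach (i; 0 .. 10_000)
    {
        buf.data[] = i;   // overwrite in place; no new garbage per iteration
        // ... do something with buf ...
    }
    assert(buf.data[0] == 9_999);   // last iteration's value is still there
}
```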
Trying to optimize before you have real-world profiler data showing that your code is a hotspot is premature optimization, widely regarded as evil: it usually leads to overly convoluted code that's hard to understand and maintain, and is often actually *slower*, because the meaning of the code has become so obfuscated that the optimizer can't figure out what you're actually trying to do, gives up, and doesn't even try to apply optimizations that might have benefitted the code.

(Of course, the above comes with the caveat that writing "straightforward" code only holds up to a certain extent; when it comes to algorithms, for example, no compiler is going to change an O(n^2) algorithm into an O(log n) one; you have to select the appropriate algorithm in the first place in order to avoid the inevitable performance bottleneck. And there are certain coding practices that can result in overall better performance, but these come with experience and should not be blindly applied in the hope that they will magically make your program faster. If you have a hotspot somewhere in the program, you can optimize everything else to hell and back and it will still be slow, whereas if you use a profiler to identify the hotspot in the first place, a 1-line change could win you an 80% performance boost where long hours of blood and sweat won you only 5%. You need to identify and eliminate the big hotspots first, and *then* come back and trim off the smaller, lower-level stuff for that extra 2-5% win. And experience shows that most predictions of where the hotspots will be are wrong. Always use a profiler before jumping to conclusions.)

T

-- 
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N
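As a minimal, hand-rolled stand-in for the profiling advice above, a candidate hotspot can be timed with `std.datetime.stopwatch` from Phobos (a real profiler, such as dmd's built-in `-profile` switch, is the proper tool; this is just a sketch of measuring before optimizing):

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

void main()
{
    auto sw = StopWatch(AutoStart.yes);

    // Suspected hotspot: measure it before trying to "optimize" it.
    long sum = 0;
    foreach (i; 0 .. 1_000_000)
        sum += i;

    sw.stop();
    writeln("hot loop took ", sw.peek.total!"usecs", " usecs");
    assert(sum == 499_999_500_000);   // sum of 0 .. 999_999
}
```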
Jun 17 2019
parent Thomas <earthwormjimmy gmx.at> writes:
On Monday, 17 June 2019 at 20:26:28 UTC, H. S. Teoh wrote:
 On Mon, Jun 17, 2019 at 07:53:52PM +0000, Thomas via 
 Digitalmars-d-learn wrote: [...]
 [...]
If x were a heap-allocated object, then your concerns would be true: it would be allocated once every iteration (and also add to the garbage that the GC will have to collect later). [...]
Thank you for your exact explanation of how the compiler works inside. That clears up some questions that have bothered me. Thomas
Jun 17 2019