
digitalmars.D.learn - Just another question about memory management in d from a newbie

Thomas <earthwormjimmy gmx.at> writes:
Hello!

First, my background: C++ and Java ages ago, and since then only 
PL/SQL. Now I am learning D just for fun and personal education from 
time to time, and I am very pleased with it :-)

Now I have to ask a question here, because I could not find a 
corresponding answer for it. Or maybe I was just unable to find it :-)

I was wondering about the memory system in D (and other C-like 
languages), specifically how the overhead of memory allocation is handled.

Just an example that I have seen many times:


int main()
{
     foreach(i;0 .. 10000)
     {
         int x;
         // do something with x
     }
     return 0;
}


Do I understand it right that the variable x will be created 
10000 times and destroyed at the end of the scope in each loop? 
Or will it be overwritten 10000 times by the creation?
I mean, does it not cost the CPU some time to allocate (I know 
we're talking about nanoseconds here), but work is work. As far as 
I know from my school days in assembler, allocation of memory is 
one of the most expensive operations for the CPU. Activate a memory 
block, find some free place, reserve it, return a pointer to the 
caller, and so on..

In my opinion this version should perform a little better:

int main()
{
     int x;
     foreach(i;0 .. 10000)
     {
         x = 0; // reinitialize
         // do something with x
     }
     return 0;
}

Or am I missing something, and is there an optimization by the 
compiler that avoids recreating/destroying the variable x 10000 
times?

I know that version 1 is safer because there will be no stale 
value left over after each loop iteration that we could stumble 
onto, but that is not my question here.
And I know that we're talking about really small CPU usage 
compared to what an app should do. But when performance matters, 
as in games or GUIs (and there are a lot of examples like 
version 1 out there), then I have to ask myself if it is not 
just a waste of CPU time?

Or is it just a code style thing?

Thank you for your time!


Greetings from Austria
Thomas
Jun 17 2019
Adam D. Ruppe <destructionator gmail.com> writes:
On Monday, 17 June 2019 at 19:53:52 UTC, Thomas wrote:
 Do I understand it right that the variable x will be created 
 10000 times and destroyed at the end of the scope in each loop 
 ? Or will it be 10000 overwritten by creation ?
No, the compiler will generate code to reuse the same thing each loop.
 In my opinion this version should perform a little better:
That's actually slightly *less* efficient (in some situations) because it doesn't allow the compiler to reuse the memory for `x` after the loop has exited. But in both cases, the compiler will just set aside a little bit of local space (or just a cpu scratch register) and use it repeatedly. It is totally "free".
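One hedged way to watch the reuse Adam describes from inside the program: the loop-local variable lands at the same address every iteration, so no per-iteration allocation is happening. (This is typical compiler behaviour, not something the language spec guarantees, so treat it as an illustration.)

```d
import std.stdio;

void main()
{
    void* firstAddr = null;
    foreach (i; 0 .. 3)
    {
        int x = i;                    // "created" each iteration, logically
        if (firstAddr is null)
            firstAddr = cast(void*) &x;
        // same stack slot every time around the loop
        assert(cast(void*) &x == firstAddr);
    }
    writeln("loop-local x reused one stack slot");
}
```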
Jun 17 2019
Thomas <earthwormjimmy gmx.at> writes:
First, thank you for your fast reply!

On Monday, 17 June 2019 at 20:00:34 UTC, Adam D. Ruppe wrote:
 No, the compiler will generate code to reuse the same thing 
 each loop.
Does this also work for complex types like structs?
Jun 17 2019
"H. S. Teoh" <hsteoh quickfur.ath.cx> writes:
On Mon, Jun 17, 2019 at 07:53:52PM +0000, Thomas via Digitalmars-d-learn wrote:
[...]
 int main()
 {
     foreach(i;0 .. 10000)
     {
         int x;
         // do something with x
     }
     return 0;
 }
 
 Do I understand it right that the variable x will be created 10000
 times and destroyed at the end of the scope in each loop ? Or will it
 be 10000 overwritten by creation ?
If x were a heap-allocated object, then your concerns would be justified: it would be allocated once every iteration (and also add to the garbage that the GC will have to collect later). However, in this case x is an int, which is a value type. This means it will be allocated on the stack, where allocation is as trivial as bumping the stack pointer (practically zero cost), and deallocation is as simple as bumping the stack pointer the other way.

In fact, it doesn't even have to bump the stack pointer between iterations, since it's obvious from code analysis that x will always be allocated at the same position on the stack relative to the function's call frame. So the generated machine code can simply use that memory location directly, reusing the same location between iterations.
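The value-type/heap-type distinction Teoh draws can be made visible with the GC's own statistics. A sketch, assuming a reasonably recent druntime (the `allocatedInCurrentThread` field of `GC.Stats` appeared around dmd 2.085; `Box` is just an illustrative type):

```d
import core.memory : GC;
import std.stdio : writeln;

class Box { int value; }   // classes are heap-allocated in D

void main()
{
    // int in the loop body: one stack slot, zero heap traffic.
    int sum = 0;
    foreach (i; 0 .. 10_000)
    {
        int x = i;
        sum += x;
    }

    // A class object, by contrast, really is allocated each iteration.
    auto before = GC.stats.allocatedInCurrentThread;
    foreach (i; 0 .. 10_000)
    {
        auto b = new Box;      // fresh heap allocation every time
        b.value = i;
    }
    auto garbage = GC.stats.allocatedInCurrentThread - before;
    writeln("heap bytes the class version asked for: ", garbage);
    assert(garbage > 0);       // the int loop caused none of this
}
```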
 I mean does it not cost the CPU some time to allocate (I know were
 talking here of nsec) but work is work. As far I know from my school
 days in assembler, allocation of memory is one of the most expensive
 instructions on the cpu. Activate memory block, find some free place,
 reserve, return pointer to caller, and so on..
That's not entirely accurate. It depends on how the allocation is done. Allocation on the stack is extremely cheap, and consists literally of adding some number (the size of the allocation) to a register (the stack pointer) -- you cannot get much simpler than that.

Also, memory allocation on the heap is not "one of the most expensive instructions". The reason it's expensive is that it's not a single instruction at all, but an entire subroutine of instructions that manages the heap. There is no CPU I know of that has a built-in instruction for heap allocation.
 In my opinion this version should perform a little better:
 
 int main()
 {
     int x;
     foreach(i;0 .. 10000)
     {
         x = 0; // reinitialize
         // do something with x
     }
     return 0;
 }
 
 Or do I miss something and there is an optimization by the compiler to
 avoid recreating/destroying the variable x 10000 times ?
[...]

What you wrote here is exactly how the compiler will emit the machine code given your first code example above. Basically all modern optimizing compilers will implement it this way, and you never have to worry about stack allocation being slow.

The only time you will have a performance problem is if x is a heap-allocated object. In *that* case you will want to look into reusing previous instances of the object between iterations. But if x is a value type (int, struct, etc.), you don't have to worry about this at all. The first version of the code you have above is the preferred one, because it makes the code clearer.

On a related note, in modern programming languages you should generally be more concerned about the higher-level meaning of the code than about low-level details like how instructions are generated. The machine code emitted by the compiler is highly transformed from the original source code and generally will not have a 1-to-1 mapping to the higher-level logical meaning of the code. That is, the first version of the code *logically* allocates x at the beginning of each iteration of the loop and deallocates it at the end, but the actual generated machine code elides all of that, because the compiler's optimizer can easily see that it's a stack allocation that always ends up in the same place, so none of the allocation/deallocation needs to be represented as-is in the machine code.

On another note, for performance-related concerns the general advice these days is to write something in the most straightforward, logical way first, and then, if the performance is not good enough, **use a profiler** to identify where the hotspots are, and optimize those.
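The "reuse previous instances between iterations" advice for heap objects can be sketched like this (the `Buffer` type here is purely illustrative, not from the original post):

```d
class Buffer
{
    int[] data;
    this(size_t n) { data = new int[](n); }
}

void main()
{
    // One allocation up front, instead of `new Buffer` inside the loop.
    auto buf = new Buffer(1024);
    foreach (i; 0 .. 10_000)
    {
        buf.data[] = i;   // overwrite in place; no new garbage per iteration
        // ... do something with buf ...
    }
    assert(buf.data[0] == 9_999);   // last iteration's value is still there
}
```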
Trying to optimize before you have real-world profiler data showing that your code is a hotspot is premature optimization, widely regarded as evil: it usually leads to overly convoluted code that's hard to understand and maintain, and is often actually *slower*, because the meaning of the code has become so obfuscated that the optimizer can't figure out what you're actually trying to do, gives up, and doesn't even try to apply optimizations that might have benefitted the code.

(Of course, the above comes with the caveat that writing "straightforward" code only holds up to a certain extent; when it comes to algorithms, for example, no compiler is going to change an O(n^2) algorithm into an O(log n) one; you have to select the appropriate algorithm in the first place in order to avoid the inevitable performance bottleneck. And there are certain coding practices that can result in overall better performance, but these come with experience and should not be blindly applied in the hope that they will magically make your program faster. If you have a hotspot somewhere in the program, you can optimize everything else to hell and back and it will still be slow, whereas if you use a profiler to identify the hotspot in the first place, a 1-line change could win you an 80% performance boost where long hours of blood and sweat won you only 5%. You need to identify and eliminate the big hotspots first, and *then* come back and trim off the smaller, lower-level stuff for that extra 2-5% win. And experience shows that most predictions of where the hotspots will be are wrong. Always use a profiler before jumping to conclusions.)

T

-- 
Those who don't understand D are condemned to reinvent it, poorly. -- Daniel N
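As a minimal, hand-rolled stand-in for the profiling advice above, a candidate hotspot can be timed with `std.datetime.stopwatch` from Phobos (a real profiler, such as dmd's built-in `-profile` switch, is the proper tool; this is just a sketch of measuring before optimizing):

```d
import std.datetime.stopwatch : AutoStart, StopWatch;
import std.stdio : writeln;

void main()
{
    auto sw = StopWatch(AutoStart.yes);

    // Suspected hotspot: measure it before trying to "optimize" it.
    long sum = 0;
    foreach (i; 0 .. 1_000_000)
        sum += i;

    sw.stop();
    writeln("hot loop took ", sw.peek.total!"usecs", " usecs");
    assert(sum == 499_999_500_000);   // sum of 0 .. 999_999
}
```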
Jun 17 2019
parent Thomas <earthwormjimmy gmx.at> writes:
On Monday, 17 June 2019 at 20:26:28 UTC, H. S. Teoh wrote:
 On Mon, Jun 17, 2019 at 07:53:52PM +0000, Thomas via 
 Digitalmars-d-learn wrote: [...]
 [...]
If x were a heap-allocated object, then your concerns would be true: it would be allocated once every iteration (and also add to the garbage that the GC will have to collect later). [...]
Thank you for your exact explanation of how the compiler works inside. That clears up some questions that have bothered me. Thomas
Jun 17 2019