www.digitalmars.com         C & C++   DMDScript  

digitalmars.D.learn - Running out of memory

reply FG <home fgda.pl> writes:
Hello,
I'm having memory problems with a string concatenation program.
I'm on Windows 7 64-bit with DMD32 v2.060 and cygwin gcc-3.4.4 32-bit.
Consider this example in C++:

     #include <iostream>
     int main(int argc, char **args) {
         std::string s;
         for (int i=0, j=0; i < 500000000; i++) {
             s += "....";
             if (i % 10000000 == 0) {
                 std::cout << "loop " << ++j << std::endl;
                 s = "";
             }
         }
         return 0;
     }

After the first "loop" (10 million iterations) the memory footprint is stable
at 
40 MB, which is expected. Now, rewriting that to D:

     import std.stdio;
     class C { string x; }
     void main(string[] args) {
         char[] s;
         // s.reserve(40_000_000);
         for (int i=0, j=0; i < 500_000_000; i++) {
             s ~= "....";
             if (i % 10_000_000 == 0) {
                 writeln("loop ", ++j); stdout.flush();
                 s = "".dup;
             }
         }
     }

It runs 6 times faster thanks to a better string implementation than C++ (hats 
off), but memory keeps growing long after the first "loop" message, reaching 
around 350-430 MB, then sometimes near the end it starts growing again,
reaching 
even 700 MB. Why doesn't the GC take care of it? Interestingly, when I
uncomment 
the s.reserve(...) line, it's much worse - memory keeps growing until the 
program dies with a core.exception.OutOfMemoryError.

Then I have rewritten this code to use an Appender:

     import std.stdio, std.array;
     class C { string x; }
     void main(string[] args) {
         char[] s;
         s.reserve(40_000_000);
         auto app = appender(s);
         for (int i=0, j=0; i < 500_000_000; i++) {
             app.put("....");
             if (i % 10_000_000 == 0) {
                 writeln("loop ", ++j); stdout.flush();
                 app.clear();
             }
         }
     }

This is what I was after - it runs 4 times faster than the first D code and
uses 
only 40 MB of memory. It's close to the efficiency of operating on the array 
using indexing. I get, that it's by design much faster than s ~= "....", but
why 
is so much memory wasted in the first D example?
Am I doing something wrong? I thought s ~= a is rewritten as an equivalent of 
s.push_back(a) (from C++'s vector) and not s = s ~ a.

best regards,
FG
Dec 27 2012
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
FG:

 but why is so much memory wasted in the first D example?
There is also assumeSafeAppend, for your experiments :-) Beside the experiments, there is also this to read: http://dlang.org/d-array-article.html Bye, bearophile
Dec 27 2012
parent reply FG <home fgda.pl> writes:
On 2012-12-27 13:52, bearophile wrote:
 There is also assumeSafeAppend, for your experiments :-)
This is it! Makes appends in-place. The following code uses only 40 MB of RAM (or 75 MB without prior call to reserve): import std.stdio; class C { string x; } void main(string[] args) { char[] s; s.reserve(40_000_000); for (int i=0, j=0; i < 500_000_000; i++) { if (i % 10_000_000 == 0) { writeln("loop ", ++j); stdout.flush(); s.length = 0; s.assumeSafeAppend(); } s ~= "...."; } } But I also had to write s.length = 0 instead of either s = [] or s = "".dup. I wonder why GC didn't reclaim the regions previously used by the array before it an empty one was assigned to s. GC has started freeing those regions only after the 21'st loop when used up RAM has gone up to 780 MB. I have 8 GB of RAM, 4 GB free but I doubt 780 MB is the right moment to start clearing space. :) Oddly enough I can keep it at 40 MB (or 75 MB when not reserving space) without assumeSafeAppend(), when using manual delete: import std.stdio; class C { string x; } void main(string[] args) { char[] s; s.reserve(40_000_000); for (int i=0, j=0; i < 500_000_000; i++) { if (i % 10_000_000 == 0) { writeln("loop ", ++j); stdout.flush(); delete s; s = []; s.reserve(40_000_000); } s ~= "...."; } } Now, this delete looks like Python's kind of GC cheating. :) Personally, I don't have anything against it, but is it a valid practice?
Dec 27 2012
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
FG:

 But I also had to write s.length = 0 instead of either s = [] 
 or s = "".dup.
Also try: s = null;
 I wonder why GC didn't reclaim the regions previously used by 
 the array before it an empty one was assigned to s.
Some explanations are in the article I've linked previously. Most of the other part of the answer comes from the fact that currently the D GC is not a precise GC. So randomly inbound pointers keep large memory chunks alive.
 Now, this delete looks like Python's kind of GC cheating. :)
 Personally, I don't have anything against it, but is it a valid 
 practice?
delete is deprecated in D. There is destroy(), and a replacement for delete in the memory/GC module. Bye, bearophile
Dec 27 2012
next sibling parent Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, December 27, 2012 15:31:45 bearophile wrote:
 delete is deprecated in D. There is destroy(), and a replacement
 for delete in the memory/GC module.
Indeed. But in this particular case, destroy would be of no help, as it specifically destroys objects _without_ freeing their memory. And at the moment, for strings, I don't believe that it's any different from setting them to null. - Jonathan M Davis
Dec 27 2012
prev sibling parent reply FG <home fgda.pl> writes:
On 2012-12-27 15:31, bearophile wrote:
 delete is deprecated in D. There is destroy(), and a replacement for delete in
 the memory/GC module.
destroy() had no effect on program's behavior, even the docs state that "it does not initiate a GC cycle or free any GC memory". So I hope delete will still remain an option. Very useful for freeing such large chunks like in this example. What do you mean as the delete replacement?
Dec 27 2012
next sibling parent reply "bearophile" <bearophileHUGS lycos.com> writes:
FG:

 What do you mean as the delete replacement?
Take a look here (at free()): http://dlang.org/phobos/core_memory.html Bye, bearophile
Dec 27 2012
parent reply FG <home fgda.pl> writes:
On 2012-12-27 16:14, bearophile wrote:
 FG:

 What do you mean as the delete replacement?
Take a look here (at free()): http://dlang.org/phobos/core_memory.html
Oh, I didn't expect this to work on arrays but it does as good as delete: GC.free(cast(void*)s);
Dec 27 2012
parent reply "bearophile" <bearophileHUGS lycos.com> writes:
FG:

 GC.free(cast(void*)s);
Don't cast arrays arrays to pointers, it's unclean, because dynamic arrays are two words long. Use the .ptr property: GC.free(s.ptr); Or if the precedent doesn't work: GC.free(cast(void*)s.ptr); Bye, bearophile
Dec 27 2012
parent FG <home fgda.pl> writes:
On 2012-12-27 16:31, bearophile wrote:
 Don't cast arrays arrays to pointers, it's unclean, because dynamic arrays are
 two words long. Use the .ptr property:

 GC.free(s.ptr);
Things appeared sorted out, but it's not over yet. delete still wins. Performance-wise free() worked like delete, but there's a nasty glitch. The following code crashes after a few hundred loops with core.exception.InvalidMemoryOperationError. import std.stdio, core.memory; char[1000] filler = '.'; int fill_times = 10000; void main(string[] args) { char[] s; for (int i=0; i < 100000; i++) { for (int j=0; j < fill_times; j++) s ~= filler; writeln("loop ", i + 1, ", length: ", s.length); stdout.flush(); // delete s; GC.free(s.ptr); s = []; } } Surprisingly, when fill_times >= 3736, it always crashes after the 341th loop (relies on the filler having 1000 bytes. Some other numbers also trigger it). What could be responsible for this odd behavior of free()?
Dec 27 2012
prev sibling parent reply Jonathan M Davis <jmdavisProg gmx.com> writes:
On Thursday, December 27, 2012 16:00:58 FG wrote:
 On 2012-12-27 15:31, bearophile wrote:
 delete is deprecated in D. There is destroy(), and a replacement for
 delete in the memory/GC module.
destroy() had no effect on program's behavior, even the docs state that "it does not initiate a GC cycle or free any GC memory". So I hope delete will still remain an option. Very useful for freeing such large chunks like in this example. What do you mean as the delete replacement?
The idea is that if you want to be manually freeing memory, you shouldn't be using GC memory in the first place. It's unsafe to be freeing it. Let the GC do that. If you want to manually manage memory, then manually manage it with malloc and free. If you really need to, then core.memory has the functions for managing the GC's memory, but you really shouldn't be doing that normally. Manual memory management should become easier once custom allocators are sorted out, since right now, if you want objects, you need to use emplace to turn the malloced memory into a proper object, and it's definitely more of a pain than using new with the GC. But it's still not a good idea in general to try and do manual memory management on GC memory. - Jonathan M Davis
Dec 27 2012
parent reply FG <home fgda.pl> writes:
On 2012-12-28 05:41, Jonathan M Davis wrote:
 The idea is that if you want to be manually freeing memory, you shouldn't be
 using GC memory in the first place. It's unsafe to be freeing it. Let the GC do
 that. If you want to manually manage memory, then manually manage it with
 malloc and free. If you really need to, then core.memory has the functions for
 managing the GC's memory, but you really shouldn't be doing that normally.
I don't require manual malloc. I wanted to make a hint to the GC, that this block can be freed, because I know there are no other pointers into it (that would be used later), while the imprecise GC finds false pointers and prevents the block from being released. Of course delete is not safe in general, but if the program is designed to process data in completely isolated batches, then it shouldn't pose a threat.
Dec 28 2012
parent "evilrat" <evilrat666 gmail.com> writes:
On Friday, 28 December 2012 at 10:40:32 UTC, FG wrote:
 I don't require manual malloc. I wanted to make a hint to the 
 GC, that this block can be freed, because I know there are no 
 other pointers into it (that would be used later), while the 
 imprecise GC finds false pointers and prevents the block from 
 being released.
set to null + GC.collect()? but of course it shouldn't be used careless too
Dec 28 2012
prev sibling parent "evilrat" <evilrat666 gmail.com> writes:
On Thursday, 27 December 2012 at 14:15:51 UTC, FG wrote:
 ...

 Now, this delete looks like Python's kind of GC cheating. :)
 Personally, I don't have anything against it, but is it a valid 
 practice?
delete itself is considered deprecated in D2, though it was used widely, instead there is destroy function but i still use delete too and never tested that func. also u can call GC.collect() from core.memory module to tell GC to manually free mem but don't accept this as absolutely truth because i'm a bit noob
Dec 27 2012