
digitalmars.D.learn - Garbage collection

reply James Gray <james.gray.public gmail.com> writes:
I am writing a web application using vibe.d (not sure whether 
that is relevant or not), and have run into a memory leak. I 
wrote the following code to try and replicate the problem.

import std.algorithm;
import std.range;
import std.format;
import std.stdio;
import core.thread;
import core.memory;

auto f(R)(R r) {
  return format("%s", r);
}

void main()
{
  while(true)
  {
   writeln("Starting");
   {
    auto str = f(iota(100000000).map!(x=>x+1));
   }
   writeln("Done");
   GC.collect();
   Thread.sleep( dur!("msecs")( 30000 ) );
  }
}

It doesn't replicate the problem, but it still doesn't behave as
I would expect. I would expect the memory usage of this code to 
grow and shrink. However, I find that the memory usage grows to 
about 1.5GB and never decreases. Is there something I am not 
understanding?
Jun 27 2020
next sibling parent James Gray <test test.com> writes:
On Saturday, 27 June 2020 at 10:08:15 UTC, James Gray wrote:
 I am writing a web application using vibe.d (not sure whether 
 that is relevant or not), and have run into a memory leak. I 
 wrote the following code to try and replicate the problem.

 [...]
I have now compiled the same code above with ldc2, and it is leaking. Any suggestions?
Jun 27 2020
prev sibling next sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 27 June 2020 at 10:08:15 UTC, James Gray wrote:

 I find that the memory usage grows to about 1.5GB and never 
 decreases. Is there something I am not understanding?
How are you measuring that? GC.collect() does not necessarily release the pages to the OS. For that, there's GC.minimize().
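In code, roughly (a minimal sketch; the helper name is made up, and whether pages actually go back to the OS is still up to the GC implementation):

-----
import core.memory : GC;

// hypothetical helper, only to show the call order
void releaseWhatWeCan()
{
    GC.collect();   // mark unreferenced pools as free
    GC.minimize();  // return free pools to the OS, where the GC can
}
-----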
Jun 27 2020
parent reply James Gray <test test.com> writes:
On Saturday, 27 June 2020 at 11:00:58 UTC, Stanislav Blinov wrote:
 On Saturday, 27 June 2020 at 10:08:15 UTC, James Gray wrote:

 I find that the memory usage grows to about 1.5GB and never 
 decreases. Is there something I am not understanding?
How are you measuring that? GC.collect() does not necessarily release the pages to the OS. For that, there's GC.minimize().
I am measuring the memory usage using top from the command line. GC.minimize() does seem to stop the leak. But it doesn't explain why the program isn't releasing essentially all the memory between calls to f (it is using around 2GB of RAM the whole time). Is there a way of achieving that?
Jun 27 2020
next sibling parent Mike Parker <aldacron gmail.com> writes:
On Saturday, 27 June 2020 at 11:11:38 UTC, James Gray wrote:

 I am measuring the memory usage using top from the command line.
 GC.minimize() does seem to stop the leak. But it doesn't 
 explain why
 the program isn't releasing essentially all the memory between 
 calls
 to f (it is using around 2GB of RAM the whole time). Is there a way of 
 achieving that?
It's not a leak. The GC allocates memory as it needs it and holds on to it. When something is collected, the GC can reuse the released memory when it needs it.
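You can watch the reuse with GC.stats (a rough sketch; the helper is only there so no stale stack slot keeps the array alive, and the exact numbers will vary):

-----
import core.memory : GC;
import std.stdio : writeln;

// hypothetical helper: the array never escapes this frame
void allocate()
{
    auto buf = new ubyte[](100 * 1024 * 1024);
}

void main()
{
    allocate();
    GC.collect();
    writeln(GC.stats.freeSize); // the pool should now show up as free, but is still owned by the GC
    allocate();                 // typically served from that free pool rather than fresh OS memory
    writeln(GC.stats.freeSize);
}
-----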
Jun 27 2020
prev sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 27 June 2020 at 11:11:38 UTC, James Gray wrote:

 I am measuring the memory usage using top from the command line.
 GC.minimize() does seem to stop the leak.
That is not a memory leak. That's the allocator keeping pages for itself to not have to go to the kernel every time you allocate.
 But it doesn't explain why the program isn't releasing 
 essentially all the memory between calls to f (it is using around 
 2GB of RAM the whole time).
Allocators usually don't do that. They keep (at least some) memory mapped to make allocations more efficient.
 Is there a way of achieving that?
I would think collect + minimize should do the trick. Just keep in mind that that's grossly inefficient.
Jun 27 2020
parent reply Arafel <er.krali gmail.com> writes:
On 27/6/20 13:21, Stanislav Blinov wrote:
 
 I would think collect + minimize should do the trick. Just keep in mind 
 that that's grossly inefficient.
If you are using Linux, keep in mind that the memory is often not returned to the OS even after a (libc) free. If you check with tools like `top`, it'll still show as assigned to the process. What I had to do (both in D and in C/C++) was to call malloc_trim [1] manually to have the memory actually sent back to the OS.

[1]: https://man7.org/linux/man-pages/man3/malloc_trim.3.html
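Calling it from D looks roughly like this (a sketch assuming a glibc-based Linux system; malloc_trim is not declared in druntime, so the prototype is written by hand, and the helper name is just illustrative):

-----
// glibc only; pad is how much free space to leave untrimmed at the top of the heap
extern (C) int malloc_trim(size_t pad) nothrow @nogc;

// hypothetical helper
void giveMemoryBackToOS()
{
    import core.memory : GC;
    GC.collect();
    GC.minimize();
    malloc_trim(0);
}
-----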
Jun 27 2020
parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 27 June 2020 at 11:35:12 UTC, Arafel wrote:

 If you are using linux, have in mind that the memory is often 
 not returned to the OS even after a (libc) free.
That's a good observation. Although a GC implementation is not required to actually use malloc, so depending on that falls into "know what you're doing" territory :)
Jun 27 2020
parent reply James Gray <test test.com> writes:
On Saturday, 27 June 2020 at 12:07:19 UTC, Stanislav Blinov wrote:
 On Saturday, 27 June 2020 at 11:35:12 UTC, Arafel wrote:

 If you are using linux, have in mind that the memory is often 
 not returned to the OS even after a (libc) free.
That's a good observation. Although a GC implementation is not required to actually use malloc, so depending on that falls into "know what you're doing" territory :)
Thanks for the help, but unfortunately it isn't stopping memory usage from growing in the original app. I will try to build a minimal example. In the meantime, perhaps someone can suggest how I might figure out what is going on. Repeating the same action gives memory usage growth as follows: 1.7GB the first time (which now drops to about 1GB), then 2.7GB dropping to about 2GB, and so on.
Jun 27 2020
parent James Gray <test test.com> writes:
On Saturday, 27 June 2020 at 12:53:01 UTC, James Gray wrote:
 On Saturday, 27 June 2020 at 12:07:19 UTC, Stanislav Blinov 
 wrote:
 On Saturday, 27 June 2020 at 11:35:12 UTC, Arafel wrote:

 [...]
That's a good observation. Although a GC implementation is not required to actually use malloc, so depending on that falls into "know what you're doing" territory :)
Thanks for the help, but unfortunately it isn't stopping memory usage from growing in the original app. I will try to build a minimal example. In the meantime, perhaps someone can suggest how I might figure out what is going on. Repeating the same action gives memory usage growth as follows: 1.7GB the first time (which now drops to about 1GB), then 2.7GB dropping to about 2GB, and so on.
Which eventually results in macOS running out of memory.
Jun 27 2020
prev sibling parent reply kinke <noone nowhere.com> writes:
On Saturday, 27 June 2020 at 10:08:15 UTC, James Gray wrote:
 have run into a memory leak
Something seems really off indeed. I've run this on Win64 with DMD (2.092) and LDC (1.22), without any extra cmdline options:

-----
import core.memory;
import core.stdc.stdio;
import std.range;
import std.format;

auto f(R)(R r) { return format("%s", r); }

int toMB(ulong size) { return cast(int) (size / 1048576.0 + 0.5); }

void printGCStats()
{
    const stats = GC.stats;
    const used = toMB(stats.usedSize);
    const free = toMB(stats.freeSize);
    const total = toMB(stats.usedSize + stats.freeSize);
    printf(" GC stats: %dM used, %dM free, %dM total\n", used, free, total);
}

void main()
{
    printGCStats();
    while (true)
    {
        puts("Starting");
        string str = f(iota(100_000_000));
        printf(" string size: %dM\n", toMB(str.length));
        str = null;
        GC.collect();
        printGCStats();
    }
}
-----

Output with DMD (no change with the precise GC via `--DRT-gcopt=gc:precise`):

-----
 GC stats: 0M used, 1M free, 1M total
Starting
 string size: 943M
 GC stats: 1168M used, 1139M free, 2306M total
Starting
 string size: 943M
 GC stats: 1168M used, 2456M free, 3623M total
Starting
 string size: 943M
 GC stats: 1168M used, 2456M free, 3623M total
Starting
 string size: 943M
 GC stats: 1168M used, 2456M free, 3623M total
Starting
 string size: 943M
 GC stats: 1168M used, 2456M free, 3623M total
-----

With LDC:

-----
 GC stats: 0M used, 1M free, 1M total
Starting
 string size: 943M
 GC stats: 1168M used, 1139M free, 2306M total
Starting
 string size: 943M
 GC stats: 2335M used, 1288M free, 3623M total
Starting
 string size: 943M
 GC stats: 2335M used, 2605M free, 4940M total
Starting
 string size: 943M
 GC stats: 2335M used, 2605M free, 4940M total
Starting
 string size: 943M
 GC stats: 2335M used, 2605M free, 4940M total
-----

Note that I explicitly clear the `str` slice before GC.collect(), so that the stack shouldn't contain any refs to the fat string anymore.
Jun 27 2020
next sibling parent kinke <noone nowhere.com> writes:
=> https://issues.dlang.org/show_bug.cgi?id=20983
Jun 27 2020
prev sibling next sibling parent reply James Gray <test test.com> writes:
On Saturday, 27 June 2020 at 14:12:09 UTC, kinke wrote:
 On Saturday, 27 June 2020 at 10:08:15 UTC, James Gray wrote:
 have run into a memory leak
 [...]
Thank you for doing this. I hope my example doesn't obscure what you show here. (I borrowed some of your code). I have produced something which essentially reproduces my problem.

-----
import std.range;
import std.algorithm;
import std.format;
import std.stdio;
import core.thread;
import core.memory;

struct Node
{
    Node* next;
    Node* prev;
    ulong val;
}

Node* insertAfter(Node* cur, ulong val)
{
    Node* node = new Node;
    if (cur != null)
    {
        node.next = cur.next;
        node.prev = cur;
        cur.next = node;
        node.next.prev = node;
    }
    else
    {
        node.next = node;
        node.prev = node;
    }
    node.val = val;
    return node;
}

int toMB(ulong size) { return cast(int) (size / 1048576.0 + 0.5); }

void printGCStats()
{
    const stats = GC.stats;
    const used = toMB(stats.usedSize);
    const free = toMB(stats.freeSize);
    const total = toMB(stats.usedSize + stats.freeSize);
    writef(" GC stats: %dM used, %dM free, %dM total\n", used, free, total);
}

void main()
{
    while(true)
    {
        printGCStats();
        writeln("Starting");
        Node* dll;
        dll = iota(200000000).fold!((c,x)=>insertAfter(c,x))(dll);
        writef("Last element %s\n", dll.val);
        dll = null;
        writeln("Done");
        GC.collect();
        GC.minimize();
        Thread.sleep( dur!("msecs")( 10000 ) );
    }
}
-----

With DMD this produces:

 GC stats: 0M used, 0M free, 0M total
Starting
Last element 199999999
Done
 GC stats: 6104M used, 51M free, 6155M total
Starting
Last element 199999999
Done
 GC stats: 12207M used, 28M free, 12235M total

With LDC2 this produces:

 GC stats: 0M used, 0M free, 0M total
Starting
Last element 199999999
Done
 GC stats: 6104M used, 51M free, 6155M total
Starting
Last element 199999999
Done
 GC stats: 12207M used, 28M free, 12235M total
Jun 27 2020
next sibling parent James Gray <test test.com> writes:
On Saturday, 27 June 2020 at 14:49:34 UTC, James Gray wrote:
 On Saturday, 27 June 2020 at 14:12:09 UTC, kinke wrote:
 [...]
Thank you for doing this. I hope my example doesn't obscure what you show here. (I borrowed some of your code). [...]
In case it helps: setting all the next and previous pointers in the linked list to null allows the garbage collector to collect it in the above code.
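Roughly like this, in case someone wants to do the same (a sketch only, reusing the Node type from the code above; `unlink` is just an illustrative name):

-----
// hypothetical helper: walk the circular list once and break every link,
// so a stray reference to one node can no longer keep the whole list alive
void unlink(Node* head)
{
    Node* cur = head;
    while (cur !is null)
    {
        Node* next = cur.next;
        cur.next = null;
        cur.prev = null;
        cur = (next is head) ? null : next; // stop once we are back at the start
    }
}

// in main, before dropping the last reference:
//   unlink(dll);
//   dll = null;
//   GC.collect();
-----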
Jun 27 2020
prev sibling parent reply Kagamin <spam here.lot> writes:
On Saturday, 27 June 2020 at 14:49:34 UTC, James Gray wrote:
 I have produced something which essentially reproduces my 
 problem.
What is the problem? Do you have a leak, or do you want to know how the GC works?
Jun 29 2020
parent James Gray <test test.com> writes:
On Tuesday, 30 June 2020 at 06:16:26 UTC, Kagamin wrote:
 On Saturday, 27 June 2020 at 14:49:34 UTC, James Gray wrote:
 I have produced something which essentially reproduces my 
 problem.
What is the problem? Do you have a leak, or do you want to know how the GC works?
I have managed to resolve my problem (which was a memory leak). My code uses a large data structure similar to a linked list, and the garbage collector was not collecting it. However, if I set all the "links" between the nodes in the data structure to null, it is then collected.
Jul 20 2020
prev sibling parent reply Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 27 June 2020 at 14:12:09 UTC, kinke wrote:

 Note that I explicitly clear the `str` slice before 
 GC.collect(), so that the stack shouldn't contain any refs to 
 the fat string anymore.
Hrm... What happens if you call collect() twice?
Jun 27 2020
parent reply kinke <noone nowhere.com> writes:
On Saturday, 27 June 2020 at 15:27:34 UTC, Stanislav Blinov wrote:
 On Saturday, 27 June 2020 at 14:12:09 UTC, kinke wrote:

 Note that I explicitly clear the `str` slice before 
 GC.collect(), so that the stack shouldn't contain any refs to 
 the fat string anymore.
Hrm... What happens if you call collect() twice?
Nothing changes, even when collecting 5 times at the end of each iteration. In the filed testcase, I've extracted the stack ref to a dedicated function, so that there really shouldn't be any refs on the stack (this is unoptimized code after all...).
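The pattern is roughly the following (a sketch, reusing f, toMB, printGCStats and the imports from the code above; the point is that the slice only ever lives in the callee's frame):

-----
// hypothetical name for the extracted function
void oneIteration()
{
    string str = f(iota(100_000_000));
    printf(" string size: %dM\n", toMB(str.length));
    // str goes out of scope here, before main ever sees it
}

void main()
{
    printGCStats();
    while (true)
    {
        puts("Starting");
        oneIteration();
        GC.collect(); // no slot in main's frame should still reference the string
        printGCStats();
    }
}
-----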
Jun 27 2020
parent Stanislav Blinov <stanislav.blinov gmail.com> writes:
On Saturday, 27 June 2020 at 16:03:12 UTC, kinke wrote:
 On Saturday, 27 June 2020 at 15:27:34 UTC, Stanislav Blinov 
 wrote:
 Hrm... What happens if you call collect() twice?
Nothing changes, even when collecting 5 times at the end of each iteration. In the filed testcase, I've extracted the stack ref to a dedicated function, so that there really shouldn't be any refs on the stack (this is unoptimized code after all...).
Here on Linux, the double collection results in this output:

 GC stats: 0M used, 0M free, 0M total
Starting
 string size: 943M
 GC stats: 0M used, 2306M free, 2306M total
Starting
 string size: 943M
 GC stats: 0M used, 2306M free, 2306M total
Starting
 string size: 943M
 GC stats: 0M used, 2306M free, 2306M total
Starting
 string size: 943M
 GC stats: 0M used, 2306M free, 2306M total
Jun 27 2020