
digitalmars.D.learn - GC question

reply osa1 <omeragacan gmail.com> writes:
Hi all,

I was looking at D as the next language to use in my hobby 
projects, but the "conservative GC" part in the language spec 
(http://dlang.org/spec/garbage.html) looks a bit concerning. I'm 
wondering what the implications are of the fact that the current 
GC is a Boehm-style conservative GC rather than a precise one; 
I've never worked with a conservative GC before.

Are there any disallowed memory operations? Can I break things by 
not following some unchecked rules, etc.? How often does it leak? 
Do I need to be careful with some operations to avoid leaks? Is a 
precise GC on the roadmap? Any comments on the GC would be really 
appreciated.

Thanks
Jan 31 2017
next sibling parent reply Ola Fosheim Grøstad writes:
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 I'm wondering what
 are the implications of the fact that current GC is a 
 Boehm-style conservative
 GC rather than a precise one, I've never worked with a 
 conservative GC before.
The GC isn't competitive with the ones you find in GC languages 
(Java, Go, etc.); e.g. Go is now aiming for GC pauses in the 
microseconds range. Resource management in D is rather 
lacklustre; even C++ does better, imho.

D seems to be moving towards using thread-local ref-counting and 
making the GC optional. I guess that would be OK on CPUs with few 
cores, but not really adequate on many-core CPUs.
Feb 01 2017
parent reply osa1 <omeragacan gmail.com> writes:
On Wednesday, 1 February 2017 at 09:40:17 UTC, Ola Fosheim 
Grøstad wrote:
 On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 I'm wondering what
 are the implications of the fact that current GC is a 
 Boehm-style conservative
 GC rather than a precise one, I've never worked with a 
 conservative GC before.
 The GC isn't competitive with the ones you find in GC languages 
 (Java, Go etc). [snip]
Thanks for the answer. Could you elaborate on the lacklustre part? It's fine if I have to do manual memory management, but I don't want any leaks. Ideally I'd have a precise GC + RAII style resource management when needed.
Feb 01 2017
next sibling parent Ola Fosheim Grøstad writes:
On Wednesday, 1 February 2017 at 09:50:42 UTC, osa1 wrote:
 Thanks for the answer. Could you elaborate on the lacklustre 
 part? It's fine if I have to do manual memory management, but I 
 don't want any leaks. Ideally I'd have a precise GC + RAII 
 style resource management when needed.
Rust, Go, Java, Swift etc. have a single memory management scheme 
which is used by libraries and mostly enforced by the compiler.

In C++ you tend to go with unique ownership and occasional shared 
ownership, with the ability to have weak pointers and swap out 
objects without updating pointers, and there is an easy 
transition from old ad-hoc ownership to shared ownership (the 
reference counter is in a separate object). It is not enforced by 
the compiler, but C++ is moving towards having dedicated tools 
for checking correctness.

In D the goal is to have safety enforced by the compiler, but it 
isn't quite there yet, and what is on the map for leak-free 
resource management seems to be a simple reference counting 
mechanism (simpler than Swift's?) with the refcount embedded in 
objects (like intrusive_ptr in Boost, except the compiler is 
intended to be better at optimizing away unnecessary updates of 
the reference count).
Feb 01 2017
prev sibling parent reply Kagamin <spam here.lot> writes:
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 Are there any disallowed memory operations?
Currently you can't touch the GC from a destructor during a 
collection. Another concern is interoperability with C-allocated 
memory: the GC knows nothing about the C heap.
 How often does it leak?
Leaks are likely in 32-bit processes and unlikely in 64-bit processes. See e.g. https://issues.dlang.org/show_bug.cgi?id=15723
 Do I need to be careful with some operations to avoid leaks?
Leaks happen only due to false pointers. But data allocated on 
the GC heap with the new operator and known to contain no 
pointers (e.g. strings) is not scanned. Premature collection 
happens when the GC doesn't see a pointer to the allocated data, 
e.g. when the only pointer to it is stored in memory the GC 
doesn't scan, like the C heap.
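To make the premature-collection point concrete, here is a 
minimal sketch of pinning a GC allocation whose only reference 
will live in memory the GC doesn't scan; c_register and 
c_unregister are made-up names standing in for some C API:

```d
import core.memory : GC;

extern(C) void c_register(void* p);   // hypothetical C API
extern(C) void c_unregister(void* p); // hypothetical C API

void handOff()
{
    auto data = new int[](1024);
    GC.addRoot(data.ptr);    // pin: the GC now treats this block as reachable
    c_register(data.ptr);    // pointer goes where the GC can't see it (C heap)
    // ... later, once the C side is done with it:
    c_unregister(data.ptr);
    GC.removeRoot(data.ptr); // unpin, allowing normal collection again
}
```

GC.addRoot/removeRoot are the standard core.memory calls for 
exactly this situation.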
 Is a precise GC in the roadmap?
There's an effort on it: https://forum.dlang.org/post/hdwwkzqswwtffjehenjt forum.dlang.org
 It's fine if I have to do manual memory management, but I don't 
 want any leaks.
If you manually deallocate memory, it gets deallocated for sure; 
it shouldn't leak. Compared to Java, the D GC trades GC 
performance for code execution performance, which can result in 
better overall performance when you don't allocate much, and 
worse performance for the allocation-heavy code that the Java GC 
is optimized for.
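A minimal sketch of the "deallocated for sure" point, using the 
explicit allocation/deallocation calls from core.memory:

```d
import core.memory : GC;

void manualFree()
{
    // Allocate 100 ints from the GC heap, then return the block
    // explicitly instead of waiting for a collection cycle.
    int[] a = (cast(int*) GC.malloc(100 * int.sizeof))[0 .. 100];
    a[] = 1;
    GC.free(a.ptr); // released immediately; `a` is dangling after this
}
```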
Feb 03 2017
parent reply osa1 <omeragacan gmail.com> writes:
On Friday, 3 February 2017 at 10:49:00 UTC, Kagamin wrote:
 Leaks are likely in 32-bit processes and unlikely in 64-bit 
 processes. See e.g. 
 https://issues.dlang.org/show_bug.cgi?id=15723
This looks pretty bad. I think I'll consider something else until D's memory management story gets better. This is sad because the language otherwise looks quite good, and I'd love to try assertions, contracts, scope guards, macros etc.
Feb 03 2017
parent Dsby <dushibaiyu yahoo.com> writes:
On Friday, 3 February 2017 at 11:36:26 UTC, osa1 wrote:
 On Friday, 3 February 2017 at 10:49:00 UTC, Kagamin wrote:
 Leaks are likely in 32-bit processes and unlikely in 64-bit 
 processes. See e.g. 
 https://issues.dlang.org/show_bug.cgi?id=15723
 This looks pretty bad. I think I'll consider something else 
 until D's memory management story gets better. This is sad 
 because the language otherwise looks quite good, and I'd love to 
 try assertions, contracts, scope guards, macros etc.
You can rely less on the automatic GC and use RC (reference 
counting) to replace it. https://github.com/huntlabs/SmartRef
Feb 03 2017
prev sibling parent reply thedeemon <dlang thedeemon.com> writes:
On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 I'm wondering what
 are the implications of the fact that current GC is a 
 Boehm-style conservative
 GC rather than a precise one, I've never worked with a 
 conservative GC before.
 Are there any disallowed memory operations? Can I break things 
 by not following
 some unchecked rules etc. ? How often does it leak? Do I need 
 to be careful
 with some operations to avoid leaks?
Here's some practical perspective from someone who released a 
32-bit video processing app in D with thousands of users. When 
developing with the GC in D you need to keep in mind 3 key things:

1) The GC will treat some random stack data as possible pointers, 
and some of those false pointers will accidentally point to some 
places in the heap, so for any object in the GC heap there is a 
probability that the GC will think it's alive (used) even when 
it's not, and this probability is directly proportional to the 
size of your object.

2) Each GC iteration scans the whole GC heap, so the larger your 
managed heap, the slower it gets.

Main consequence of 1 and 2: don't store large objects (images, 
big file chunks etc.) in the GC heap; use other allocators for 
them. Leave the GC heap just for the small litter. This way you 
practically don't leak and keep GC pauses short.

3) The GC will call destructors (aka finalizers) for the objects 
that have them, and during the GC phase no allocations are 
allowed. Also, since you don't know in which order objects are 
collected, accessing other objects from a destructor is a bad 
idea; those objects might be collected already.

Main consequence of 3: don't do silly things in a destructor 
(like throwing exceptions or doing other operations that might 
allocate), and try to avoid destructors entirely if possible. 
They may be used to ensure you release your resources, but don't 
make them the primary and only way to release them, since some 
objects might leak and their destructors won't be called at all.

If you follow these principles, your app will be fine; it's not 
hard really.
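A minimal sketch of the "keep big buffers off the GC heap" 
advice; processFrame and frameSize are made-up names for 
illustration:

```d
import core.stdc.stdlib : malloc, free;

void processFrame(size_t frameSize)
{
    // Allocate the large buffer from the C heap so the GC never
    // scans it and false pointers can't keep it alive.
    ubyte* buf = cast(ubyte*) malloc(frameSize);
    if (buf is null) return;
    scope(exit) free(buf); // deterministic release on function exit

    ubyte[] frame = buf[0 .. frameSize]; // slice view over non-GC memory
    frame[] = 0; // ... decode/process the frame here ...
}
```

Only the small, short-lived litter then stays on the GC heap, 
which keeps both the false-pointer risk and the scan time down.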
Feb 04 2017
parent reply osa1 <omeragacan gmail.com> writes:
On Saturday, 4 February 2017 at 11:09:21 UTC, thedeemon wrote:
 On Wednesday, 1 February 2017 at 06:58:43 UTC, osa1 wrote:
 I'm wondering what
 are the implications of the fact that current GC is a 
 Boehm-style conservative
 GC rather than a precise one, I've never worked with a 
 conservative GC before.
 Are there any disallowed memory operations? Can I break things 
 by not following
 some unchecked rules etc. ? How often does it leak? Do I need 
 to be careful
 with some operations to avoid leaks?
 Here's some practical perspective from someone who released a 
 32-bit video processing app in D with thousands of users. [snip]
Honestly this still sounds horrible. I'd be OK with either of 
these two:

- Precise GC, no manual management, no RAII or destructors etc.

- Manual memory management, RAII and destructors, smart pointers.

but this:

- Automatic but conservative. Can leak at any time. You have to 
implement manual management (managed heaps) to avoid leaks. Leaks 
are hard to find, as any heap value may be causing them.

is the worst of both worlds. I'm surprised that D was able to 
come this far with this.
Feb 04 2017
next sibling parent Kagamin <spam here.lot> writes:
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
 I'm surprised that D was able to come this far with this.
It's used mostly for server software. Things are moving to 64 bit, so this will be less of an issue.
Feb 04 2017
prev sibling next sibling parent reply Adam D. Ruppe <destructionator gmail.com> writes:
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
 - Automatic but conservative. Can leak at any time.
All GCs are prone to leak, including precise ones. The point of 
garbage collection is not to prevent leaks, but rather to prevent 
use-after-free bugs.

Granted, the D 32-bit GC is more prone to leak than most others 
(including D 64-bit), but this isn't as horrible as you're 
believing: it does its *main* job pretty well, just at the cost 
of higher memory consumption, which we can often afford. And if 
you can't, manual management of large arrays tends to be 
relatively simple anyway.

For example, my png.d used to leak something nasty in 32 bit 
because it used GC-allocated large temporary buffers while 
decompressing images. But, since they were temporary buffers, it 
was really easy to just scope(exit) free(buffer); after 
allocating to let them be freed at the end of the function. Then 
the memory consumption was cut in half.
Feb 04 2017
next sibling parent osa1 <omeragacan gmail.com> writes:
 All GCs are prone to leak, including precise ones. The point of 
 garbage collection is not to prevent leaks, but rather to 
 prevent use-after-free bugs.
Of course I can have leaks in a GC environment, but having non-deterministic leaks is another thing, and I'd rather make sure to delete my references to let GC do its thing than to pray and hope some random number on my stack won't be in the range of my heap. I don't agree that the point is just preventing use-after-free, which can be guaranteed statically even in a non-GC language (see e.g. Rust).
Feb 04 2017
prev sibling parent Ola Fosheim Grøstad writes:
On Saturday, 4 February 2017 at 15:23:53 UTC, Adam D. Ruppe wrote:
 On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
 - Automatic but conservative. Can leak at any time.
 All GCs are prone to leak, including precise ones. The point of 
 garbage collection is not to prevent leaks, but rather to 
 prevent use-after-free bugs.
No, the main point of GC is to prevent leaks in the case where 
you have circular references. Precise GCs don't leak, by 
definition: if an object is reachable, then it isn't a leak.

Now, you might claim that objects that provably won't be touched 
again should be classified as dead and freed, and that failing to 
do so is a bug that exhibits the same behaviour as a leak 
(running out of memory). But it's really nothing like the leaks 
you experience with manual memory management (e.g. circular 
references preventing memory from being released in a 
reference-counting management scheme).
Feb 08 2017
prev sibling parent reply thedeemon <dlang thedeemon.com> writes:
On Saturday, 4 February 2017 at 12:56:55 UTC, osa1 wrote:
 - Automatic but conservative. Can leak at any time. You have to 
 implement manual management (managed heaps) to avoid leaks. 
 Leaks are hard to find as any heap value may be causing it.
By "managed heap" I just meant the GC heap, the one used by the 
"new" operator. Besides it, there are already other allocators 
and container libraries available; you don't need to implement 
this stuff manually.
 the worst of both worlds.
It may look so from a distance, but in my experience it's not 
that bad. In most software I did in D it did not really matter 
(it's either 64-bit or short-lived programs), and the control D 
gives you to choose how to deal with everything makes it all 
quite manageable: I can decide what to take from both worlds and 
hence pick the best, not the worst.
Feb 04 2017
parent reply rikki cattermole <rikki cattermole.co.nz> writes:
On 05/02/2017 5:02 PM, thedeemon wrote:

snip

 It may look so from a distance. But in my experience it's not that bad.
 In most software I did in D it did not matter really (it's either 64-bit
 or short lived programs) and the control D gives to choose how to deal
 with everything makes it all quite manageable, I can decide what to take
 from both worlds and hence pick the best, not the worst.
The best of both worlds can be done quite simply. Instead of a 
chain of input ranges like:

int[] data = input.filter!"a != 7".map!"a * 2".array;

Use:

int[] data;
data.length = input.length;
size_t i;

foreach(v; input.filter!"a != 7".map!"a * 2") {
    data[i] = v;
    i++;
}

data.length = i;

Of course this is a dirt-simple example, but instead look at it 
for e.g. a CSV parser with some complex data structure creation + 
manipulation. I have some real world code here[0] that uses it. 
Not only are there fewer allocations and less GC use, it also 
ends up being significantly faster!

[0] https://gist.github.com/rikkimax/42c3dfa6500155c5e441cbb1437142ea#file-reports-d-L124
Feb 04 2017
parent Cym13 <cpicard openmailbox.org> writes:
On Sunday, 5 February 2017 at 04:22:30 UTC, rikki cattermole 
wrote:
 On 05/02/2017 5:02 PM, thedeemon wrote:

 snip

 It may look so from a distance. But in my experience it's not 
 that bad.
 In most software I did in D it did not matter really (it's 
 either 64-bit
 or short lived programs) and the control D gives to choose how 
 to deal
 with everything makes it all quite manageable, I can decide 
 what to take
 from both worlds and hence pick the best, not the worst.
 The best of both worlds can be done quite simply. [snip]
Some data to weigh in, comparing different memory management 
strategies on that simple case:

import std.conv;
import std.stdio;
import std.array;
import std.range;
import std.algorithm;

auto input = [1, 2, 7, 3, 7, 8, 8, 9, 7, 1, 0];

void naive() {
    int[] data = input.filter!(a => a != 7).map!(a => a*2).array;
    assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void maxReallocs() {
    int[] data;
    size_t i;
    foreach(v; input.filter!(a => a != 7).map!(a => a*2)) {
        data ~= v;
    }
    assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void betterOfTwoWorlds() {
    int[] data;
    data.length = input.length;
    size_t i;
    foreach(v; input.filter!(a => a != 7).map!(a => a*2)) {
        data[i] = v;
        i++;
    }
    data.length = i;
    assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void explicitNew() {
    int[] data = new int[input.length];
    scope(exit) delete data;
    size_t i;
    foreach(v; input.filter!(a => a != 7).map!(a => a*2)) {
        data[i] = v;
        i++;
    }
    data.length = i;
    assert(data == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void cStyle() @nogc {
    import std.c.stdlib;
    int* data = cast(int*) malloc(input.length * int.sizeof);
    scope(exit) free(data);
    size_t i;
    foreach(v; input.filter!(a => a != 7).map!(a => a*2)) {
        data[i++] = v;
    }
    debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void onTheStack() @nogc {
    int[100] data;
    size_t i;
    foreach(v; input.filter!(a => a != 7).map!(a => a*2)) {
        data[i++] = v;
    }
    debug assert(data[0..i] == [2, 4, 6, 16, 16, 18, 2, 0], data.to!string);
}

void main(string[] args) {
    import std.datetime;
    benchmark!(
        naive,
        maxReallocs,
        betterOfTwoWorlds,
        explicitNew,
        cStyle,
        onTheStack
    )(100000).each!writeln;
}

/* Results:

Compiled with dmd -profile=gc test.d
====================================
TickDuration(385731143)  // naive
TickDuration(575673615)  // maxReallocs
TickDuration(255928562)  // betterOfTwoWorlds
TickDuration(270497154)  // explicitNew
TickDuration(97596363)   // cStyle
TickDuration(96467459)   // onTheStack

GC usage:
bytes allocated | allocations | type | function | file:line
17600000 | 100000 | int[] | test.explicitNew | test.d:43
 4400000 | 100000 | int[] | test.betterOfTwoWorlds | test.d:30
 3200000 | 800000 | int[] | test.maxReallocs | test.d:22
 3200000 | 100000 | int[] | test.maxReallocs | test.d:25
 3200000 | 100000 | int[] | test.explicitNew | test.d:51
 3200000 | 100000 | int[] | test.explicitNew | test.d:53
 3200000 | 100000 | int[] | test.betterOfTwoWorlds | test.d:37
 3200000 | 100000 | int[] | test.betterOfTwoWorlds | test.d:39
 3200000 | 100000 | std.array.Appender!(int[]).Appender.Data | std.array.Appender!(int[]).Appender.this | /usr/include/dlang/dmd/std/array.d:2675
 3200000 | 100000 | int[] | test.naive | test.d:14

Compiled with dmd -O -inline test.d
===================================
TickDuration(159383005)  // naive
TickDuration(187192137)  // maxReallocs
TickDuration(94094585)   // betterOfTwoWorlds
TickDuration(102374657)  // explicitNew
TickDuration(41801695)   // cStyle
TickDuration(45613954)   // onTheStack

Compiled with dmd -O -inline -release -boundscheck=off test.d
=============================================================
TickDuration(152151439)  // naive
TickDuration(140870515)  // maxReallocs
TickDuration(46740440)   // betterOfTwoWorlds
TickDuration(59089016)   // explicitNew
TickDuration(26038060)   // cStyle
TickDuration(25984371)   // onTheStack
*/
Feb 05 2017