digitalmars.D.learn - GC interpreting integer values as pointers
- Ivo Kasiuk (58/58) Oct 09 2010 Hi!
- %u (37/56) Oct 11 2010 In D1:
- Ivo Kasiuk (17/39) Oct 11 2010 ...
- %u (64/103) Oct 11 2010 Isn't p a pointer data type?
- Ivo Kasiuk (45/106) Oct 11 2010 What I mean is that p is pointing to data which has a simple data type
- %u (4/58) Oct 12 2010 Actually, those were global variables: I simply commented out the encaps...
- Steven Schveighoffer (17/29) Oct 14 2010 Yes, D's garbage collector is a conservative garbage collector. One whi...
- bearophile (4/6) Oct 14 2010 D has unions, and sometimes normal C-style unions are useful. But in man...
- Steven Schveighoffer (5/17) Oct 14 2010 Unions are rare enough that I think this may not be worth doing. But ye...
- Ivo Kasiuk (33/73) Oct 14 2010 ...
- Steven Schveighoffer (9/14) Oct 14 2010 This is a common problem. I am not intimately familiar with AAs, but it...
Hi!

In my D programs I am having problems with objects not getting finalised although there is no reference anymore. It turned out that this is caused by integers which happen to have values corresponding to pointers into the heap. So I wrote a test program to check the GC behaviour concerning integer values:

----------------------------------------
import std.stdio;
import core.memory;

class C {
  string s;
  this(string s) { this.s = s; }
  ~this() { writeln(s); }
}

struct S {
  uint r;
  this(uint x) { r = x; }
}

class X {
  C c;
  uint r;
  S s;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    s = S(cast(uint) cast(void*) new C("struct"));
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}

void main(string[] args) {
  X x = new X;
  GC.collect();
  writefln("========== %s, %x, %x, %x, %x", x.c.s, x.r, x.s.r, x.a[0], *x.p);
}
----------------------------------------

This writes:
new uint
no reference
========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
AA
struct
uint
reference

So in most but not all situations the integer value keeps the object from getting finalised. This observation corresponds to the effects I saw in my programs.

I find this rather unfortunate. Is this known, documented behaviour? In a typical program there are such integer values all over the place. How should such values be stored to avoid unwanted interaction with the GC?

Thanks,
Ivo
Oct 09 2010
== Quote from Ivo Kasiuk (i.kasiuk gmx.de)'s article
> Hi!
> ~snip
> ----------------------------------------
> This writes:
> new uint
> no reference
> ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
> AA
> struct
> uint
> reference
> So in most but not all situations the integer value keeps the object from getting finalised. This observation corresponds to the effects I saw in my programs.
> I find this rather unfortunate. Is this known, documented behaviour? In a typical program there are such integer values all over the place. How should such values be stored to avoid unwanted interaction with the GC?
> Thanks,
> Ivo

In D1:

import std.stdio;
import std.gc;

class C {
  string s;
  this(string s) { this.s = s; }
  ~this() { writefln(s); }
}

class X {
  C c;
  uint r;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}

void main(string[] args) {
  X x = new X;
  std.gc.fullCollect();
  writefln("========== %s, %x, %x, %x", x.c.s, x.r, x.a[0], *x.p);
}

Writes:
no reference
========== reference, ad3fd0, ad3fb0, ad3f90
new uint << ;)
AA
uint
reference
Oct 11 2010
> > ~snip
> > ----------------------------------------
> > This writes:
> > new uint
> > no reference
> > ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
> > AA
> > struct
> > uint
> > reference
> > ...
> > Thanks,
> > Ivo
>
> In D1:
> ...
> Writes:
> no reference
> ========== reference, ad3fd0, ad3fb0, ad3f90
> new uint << ;)
> AA
> uint
> reference

Thanks for trying it out in D1. So, summing up, this means that:
- In most cases, memory is by default scanned for pointers regardless of the actual data types.
- In D2, newly allocated memory for a non-pointer data type (like "new uint" or "new uint[10]") is not scanned by default.
- In D1, you have to use hasNoPointers if you want some memory not to be scanned.

Is this observation correct?
And what about structs/classes that have integer fields as well as pointer/reference fields? And what about associative arrays - apparently these are scanned even if the type is uint?

Ivo
Oct 11 2010
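The D2 part of the summary above can be checked by asking the collector for the attributes of each block. The following is a minimal sketch, assuming a D2 runtime whose `core.memory` module provides `GC.addrOf`, `GC.getAttr` and `GC.BlkAttr.NO_SCAN`; the helper name `isNoScan` is made up for this illustration and is not part of the runtime.

```d
import core.memory : GC;
import std.stdio : writefln;

/// Hypothetical helper: true if the GC block containing `p` has NO_SCAN set.
/// GC.addrOf maps a possibly interior pointer to the base of its block,
/// because GC.getAttr reports 0 for interior pointers.
bool isNoScan(const void* p)
{
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

void main()
{
    uint*   single = new uint;       // element type contains no pointers
    uint[]  ints   = new uint[10];   // element type contains no pointers
    void*[] ptrs   = new void*[10];  // element type is itself a pointer

    writefln("new uint      NO_SCAN: %s", isNoScan(single));
    writefln("new uint[10]  NO_SCAN: %s", isNoScan(ints.ptr));
    writefln("new void*[10] NO_SCAN: %s", isNoScan(ptrs.ptr));
}
```

If the runtime behaves as described in this thread, the first two blocks should report NO_SCAN and the third should not, which would explain why only the "new uint" object was finalised in the D2 run.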
== Quote from Ivo Kasiuk (i.kasiuk gmx.de)'s article
> > > ~snip
> > > ----------------------------------------
> > > This writes:
> > > new uint
> > > no reference
> > > ========== reference, f7490e20, f7490e10, f7490df0, f7490dd0
> > > AA
> > > struct
> > > uint
> > > reference
> > > ...
> > > Thanks,
> > > Ivo
> >
> > In D1:
> > ...
> > Writes:
> > no reference
> > ========== reference, ad3fd0, ad3fb0, ad3f90
> > new uint << ;)
> > AA
> > uint
> > reference
>
> Thanks for trying it out in D1. So, summing up this means that:
> - In most cases, memory is by default scanned for pointers regardless of the actual data types.
> - In D2, newly allocated memory for a non-pointer data type (like "new uint" or "new uint[10]") is not scanned by default.

Isn't p a pointer data type?
I didn't even know I could do "i = new int;" :D

> - In D1, you have to use hasNoPointers if you want some memory not to be scanned.
> Is this observation correct?
> And what about structs/classes that have integer fields as well as pointer/reference fields? And what about associative arrays - apparently these are scanned even if the type is uint?
> Ivo

I added the struct again and also ran without the enclosing X class.

With X :
no reference
========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
new uint
AA
struct
uint
reference

Without X :
no reference
========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
new uint

--
import std.stdio;
import std.gc;

class C {
  string s;
  this(string s) { this.s = s; }
  ~this() { writefln(s); }
}

struct S {
  uint r;
  static S opCall(uint x) {
    S s;
    s.r = x;
    return s;
  }
}

class X {
  C c;
  uint r;
  S s;
  uint[int] a;
  uint* p;
  this() {
    c = new C("reference");
    new C("no reference");
    r = cast(uint) cast(void*) new C("uint");
    s = S(cast(uint) cast(void*) new C("struct"));
    a[0] = cast(uint) cast(void*) new C("AA");
    p = new uint;
    *p = (cast(uint) cast(void*) new C("new uint"));
  }
}

void main(string[] args) {
  /+
  c = new C("reference");
  new C("no reference");
  r = cast(uint) cast(void*) new C("uint");
  s = S(cast(uint) cast(void*) new C("struct"));
  a[0] = cast(uint) cast(void*) new C("AA");
  p = new uint;
  *p = (cast(uint) cast(void*) new C("new uint"));
  +/
  X x = new X;
  std.gc.fullCollect();
  writefln("========== %s, %x, %x, %x, %x", x.c.s, x.r, x.s.r, x.a[0], *x.p);
  //writefln("========== %s, %x, %x, %x, %x", c.s, r, s.r, a[0], *p);
}
Oct 11 2010
> > - In D2, newly allocated memory for a non-pointer data type (like "new uint" or "new uint[10]") is not scanned by default.
>
> Isn't p a pointer data type?
> I didn't even know I could do "i = new int;" :D

What I mean is that p is pointing to data which has a simple data type (not a struct/class/union) that is not a pointer/reference type. For instance, with "p = new uint[10]" the compiler knows that the newly allocated memory that p points to does not contain any pointers. With D2, that seems to cause the memory not to be scanned.

> > - In D1, you have to use hasNoPointers if you want some memory not to be scanned.
> > Is this observation correct?
> > And what about structs/classes that have integer fields as well as pointer/reference fields? And what about associative arrays - apparently these are scanned even if the type is uint?
> > Ivo
>
> I added the struct again and also ran without the enclosing X class.
>
> With X :
> no reference
> ========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
> new uint
> AA
> struct
> uint
> reference
>
> Without X :
> no reference
> ========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
> new uint

No surprises with the struct. And the "Without X" example... I am not sure; with the variables all in the current stack frame, that might be a special case. What about global variables instead:

...
C c;
uint r;
S s;
uint[int] a;
uint* p;
uint[] arr;

void f() {
  c = new C("reference");
  new C("no reference");
  r = cast(uint) cast(void*) new C("uint");
  s = S(cast(uint) cast(void*) new C("struct"));
  a[0] = cast(uint) cast(void*) new C("AA");
  p = new uint;
  *p = (cast(uint) cast(void*) new C("new uint"));
  arr = new uint[1];
  arr[0] = (cast(uint) cast(void*) new C("array"));
}

void main(string[] args) {
  f();
  GC.collect();
  writefln("========== %s, %x, %x, %x, %x, %x", c.s, r, s.r, a[0], *p, arr[0]);
}

That gives me (with D2):
array
new uint
no reference
========== reference, f74c3e20, f74c3e10, f74c3df0, f74c3dd0, f74c3db0
AA
struct
uint
reference
Oct 11 2010
== Quote from Ivo Kasiuk (i.kasiuk gmx.de)'s article
> > I added the struct again and also ran without the enclosing X class.
> >
> > With X :
> > no reference
> > ========== reference, ad3fd0, ad3fc0, ad3fa0, ad3f80
> > new uint
> > AA
> > struct
> > uint
> > reference
> >
> > Without X :
> > no reference
> > ========== reference, ad2fd0, ad2fc0, ad2fa0, ad2f80
> > new uint
>
> ...
> No surprises with the struct. And the "Without X" example... I am not sure, with the variables all in the current stack frame that might be a special case.

Actually, those were global variables: I simply commented out the encapsulating class and constructor. But I left all the allocation in the main. Would that matter?

> What about global variables instead:
> ...
> C c;
> uint r;
> S s;
> uint[int] a;
> uint* p;
> uint[] arr;
>
> void f() {
>   c = new C("reference");
>   new C("no reference");
>   r = cast(uint) cast(void*) new C("uint");
>   s = S(cast(uint) cast(void*) new C("struct"));
>   a[0] = cast(uint) cast(void*) new C("AA");
>   p = new uint;
>   *p = (cast(uint) cast(void*) new C("new uint"));
>   arr = new uint[1];
>   arr[0] = (cast(uint) cast(void*) new C("array"));
> }
>
> void main(string[] args) {
>   f();
>   GC.collect();
>   writefln("========== %s, %x, %x, %x, %x, %x", c.s, r, s.r, a[0], *p, arr[0]);
> }
>
> That gives me (with D2):
> array
> new uint
> no reference
> ========== reference, f74c3e20, f74c3e10, f74c3df0, f74c3dd0, f74c3db0
> AA
> struct
> uint
> reference
Oct 12 2010
On Sat, 09 Oct 2010 15:51:37 -0400, Ivo Kasiuk <i.kasiuk gmx.de> wrote:
> Hi!
> In my D programs I am having problems with objects not getting finalised although there is no reference anymore. It turned out that this is caused by integers which happen to have values corresponding to pointers into the heap. So I wrote a test program to check the GC behaviour concerning integer values:
> [snip]
> So in most but not all situations the integer value keeps the object from getting finalised. This observation corresponds to the effects I saw in my programs.
> I find this rather unfortunate. Is this known, documented behaviour? In a typical program there are such integer values all over the place. How should such values be stored to avoid unwanted interaction with the GC?

Yes, D's garbage collector is a conservative garbage collector. One which doesn't have this problem is called a precise garbage collector.

There are two problems here. First, D has unions, so it is impossible for the GC to determine if a union contains an integer or a pointer.

Second problem is the granularity of scanning. A memory block is scanned as if every n bits (n being your architecture) is a pointer, or there are no pointers. This is determined by a bit associated with the block (the NO_SCAN bit).

If you allocate a memory block that contains at least one pointer, then all the words in the memory block are considered to be pointers by the GC. There is a (continually updated) patch which allows the GC to be semi-precise. That is, the type information of the memory block will be linked to it. This will allow precise scanning except for unions. Once this is integrated, the false pointer problem will be much less prevalent.

-Steve
Oct 14 2010
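Since scanning is decided per block, one way to keep pointer-like integers away from the collector is to store them in a block of their own that is marked NO_SCAN. The sketch below is only an illustration under that assumption: `Node`, `Mixed`, `isNoScan` and `allocUnscanned` are hypothetical names, while `GC.malloc`, `GC.addrOf`, `GC.getAttr` and `GC.BlkAttr.NO_SCAN` are taken from D2's `core.memory`.

```d
import core.memory : GC;
import std.stdio : writefln;

class Node {}   // hypothetical payload class

struct Mixed
{
    Node   link;    // a real reference, so the whole block must be scanned
    size_t cookie;  // an integer, yet scanned as a potential pointer too
}

/// Hypothetical helper: true if the GC block containing `p` has NO_SCAN set.
bool isNoScan(const void* p)
{
    return (GC.getAttr(GC.addrOf(p)) & GC.BlkAttr.NO_SCAN) != 0;
}

/// Hypothetical helper: allocates `n` zeroed words in a block that the GC
/// is told not to scan for pointers.
size_t[] allocUnscanned(size_t n)
{
    auto raw = cast(size_t*) GC.malloc(n * size_t.sizeof, GC.BlkAttr.NO_SCAN);
    auto slice = raw[0 .. n];
    slice[] = 0;
    return slice;
}

void main()
{
    auto m = new Mixed;               // contains a reference: block is scanned
    auto cookies = allocUnscanned(16);

    // The address is stored as a plain integer in the unscanned block, so this
    // block by itself should not keep the Node alive (a copy left on the stack
    // or in a register may still do so).
    cookies[0] = cast(size_t) cast(void*) new Node;

    writefln("Mixed   NO_SCAN: %s", isNoScan(m));
    writefln("cookies NO_SCAN: %s", isNoScan(cookies.ptr));
}
```

The design choice is simply to separate data by kind: references stay in blocks the GC scans, and integers that may resemble addresses go into blocks it skips.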
Steven Schveighoffer:
> There are two problems here. First, D has unions, so it is impossible for the GC to determine if a union contains an integer or a pointer.

D has unions, and sometimes normal C-style unions are useful. But in many situations where you have a union you also keep a tag that represents the type, so in many of those situations you may use the tagged union of Phobos, std.variant.Algebraic (if the Phobos implementation is good enough; currently it is unfinished and not good enough yet). The D GC could then be made aware of the tag of an Algebraic union and read it at runtime to know which type is stored. This improves the GC precision a little.

Bye,
bearophile
Oct 14 2010
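To make the tagged-union idea concrete, here is a small sketch using `std.variant.Algebraic`. The alias `IntOrPtr` and the function `describe` are hypothetical names for this illustration. The sketch only shows that the tag is available at runtime; nothing in this thread says the GC actually reads it.

```d
import std.stdio : writeln;
import std.variant : Algebraic;

// Hypothetical alias: a value that is either a plain integer or a pointer.
alias IntOrPtr = Algebraic!(size_t, int*);

/// Reports which alternative is currently stored, using the runtime tag.
string describe(ref IntOrPtr v)
{
    if (v.peek!size_t !is null) return "integer";
    if (v.peek!(int*) !is null) return "pointer";
    return "empty";
}

void main()
{
    size_t n = 42;
    IntOrPtr v = n;          // stores the integer alternative
    writeln(describe(v));

    int x = 7;
    v = &x;                  // now stores the pointer alternative
    writeln(describe(v));
}
```

A collector that knew about this tag could, in principle, skip the payload whenever the integer alternative is active, which is the precision gain suggested above.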
On Thu, 14 Oct 2010 12:39:33 -0400, bearophile <bearophileHUGS lycos.com> wrote:
> Steven Schveighoffer:
> > There are two problems here. First, D has unions, so it is impossible for the GC to determine if a union contains an integer or a pointer.
>
> D has unions, and sometimes normal C-style unions are useful. But in many situations when you have a union you also keep a tag that represents the type, so in many of those situations you may use the tagged union of Phobos, std.variant.Algebraic (if the Phobos implementation is good enough, currently unfinished and not good enough yet) and the D GC may be aware and read and use the tag of an Algebraic union to know at runtime what's the type. This improves the GC precision a little.

Unions are rare enough that I think this may not be worth doing. But yes, it could be had.

-Steve
Oct 14 2010
> On Sat, 09 Oct 2010 15:51:37 -0400, Ivo Kasiuk <i.kasiuk gmx.de> wrote:
> > Hi!
> > In my D programs I am having problems with objects not getting finalised although there is no reference anymore. It turned out that this is caused by integers which happen to have values corresponding to pointers into the heap. So I wrote a test program to check the GC behaviour concerning integer values:
> > [snip]
> > So in most but not all situations the integer value keeps the object from getting finalised. This observation corresponds to the effects I saw in my programs.
> > I find this rather unfortunate. Is this known, documented behaviour? In a typical program there are such integer values all over the place. How should such values be stored to avoid unwanted interaction with the GC?
>
> Yes, D's garbage collector is a conservative garbage collector. One which doesn't have this problem is called a precise garbage collector.
>
> There are two problems here. First, D has unions, so it is impossible for the GC to determine if a union contains an integer or a pointer.
>
> Second problem is the granularity of scanning. A memory block is scanned as if every n bits (n being your architecture) is a pointer, or there are no pointers. This is determined by a bit associated with the block (the NO_SCAN bit).
>
> If you allocate a memory block that contains at least one pointer, then all the words in the memory block are considered to be pointers by the GC. There is a (continually updated) patch which allows the GC to be semi-precise. That is, the type information of the memory block will be linked to it. This will allow precise scanning except for unions. Once this is integrated, the false pointer problem will be much less prevalent.
>
> -Steve

Thanks! This absolutely makes sense. It is basically a trade-off between precision and efficiency of the GC.

Slowly, I am learning all the little details of D's garbage collection. It is more complicated than it seems at first, but understanding it better greatly helps to write better programs in terms of memory management.

There is one case though that I am still not sure about: associative arrays. It seems that keys as well as values in AAs are scanned for pointers even if both are integer types. How can I tell the GC that I do not want them to be scanned? I know about the NO_SCAN flag but what memory region should it be applied to in this case?

BTW: considering the "conservative" scanning, the implementation of Object.toHash() is somewhat interesting:

hash_t toHash()
{
  // BUG: this prevents a compacting GC from working, needs to be fixed
  return cast(hash_t)cast(void*)this;
}

So an object's hash value will keep the GC from freeing the object, if that value is scanned. But as the comment indicates, this implementation needs to be changed anyway (I am eager to see the result). A compacting GC probably gives rise to some whole new problems.

Ivo
Oct 14 2010
On Thu, 14 Oct 2010 13:35:13 -0400, Ivo Kasiuk <i.kasiuk gmx.de> wrote:
> There is one case though that I am still not sure about: associative arrays. It seems that keys as well as values in AAs are scanned for pointers even if both are integer types. How can I tell the GC that I do not want them to be scanned? I know about the NO_SCAN flag but what memory region should it be applied to in this case?

This is a common problem. I am not intimately familiar with AAs, but it may have something to do with the fact that it's not a templated type. That means the runtime is responsible for allocating AA nodes. I think at the moment there is no way to do this. I also think there is likely a bug report to this effect, and that others may have implemented better AAs to fix the issue. Try searching the bug database for AA and NO_SCAN.

-Steve
Oct 14 2010
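As an illustration of what a replacement container could look like, the sketch below keeps an integer-to-integer mapping entirely in plain arrays, which, per the observations earlier in this thread, D2 allocates in blocks that are not scanned. `UintMap` is a hypothetical type written for this example (fixed capacity, linear probing, no removal), not an existing library container.

```d
import std.stdio : writeln;

/// Hypothetical fixed-capacity map from uint keys to uint values.
/// All storage lives in uint[] and bool[] arrays, whose element types
/// contain no pointers.
struct UintMap
{
    private uint[] keys;
    private uint[] vals;
    private bool[] used;

    this(size_t capacity)
    {
        keys = new uint[capacity];
        vals = new uint[capacity];
        used = new bool[capacity];
    }

    /// Inserts or overwrites an entry; returns false if the table is full.
    bool put(uint key, uint value)
    {
        foreach (i; 0 .. keys.length)
        {
            immutable slot = (key + i) % keys.length;   // linear probing
            if (!used[slot] || keys[slot] == key)
            {
                used[slot] = true;
                keys[slot] = key;
                vals[slot] = value;
                return true;
            }
        }
        return false;
    }

    /// Returns a pointer to the stored value, or null if the key is absent.
    uint* get(uint key)
    {
        foreach (i; 0 .. keys.length)
        {
            immutable slot = (key + i) % keys.length;
            if (!used[slot]) return null;               // end of probe chain
            if (keys[slot] == key) return &vals[slot];
        }
        return null;
    }
}

void main()
{
    auto m = UintMap(8);
    m.put(0, 0xf7490df0);   // a value that merely looks like a heap address
    if (auto p = m.get(0))
        writeln(*p);
}
```

A real replacement would also need resizing and removal; the point here is only that the storage can be arranged so that no block mixes such integers with genuine references.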