digitalmars.D - D1 garbage collector + threads + malloc = garbage?
- Bane (46/46) Dec 11 2009 Bug with GC fullCollect() in multithreaded environment. Grauzone explain...
- Leandro Lucarella (42/99) Dec 11 2009 The patch is very simple (as pointed by Grauzone), I guess it should mak...
- Sean Kelly (2/4) Dec 11 2009 To be perfectly safe, it's a tiny bit trickier than that. You have to a...
- Walter Bright (4/22) Dec 11 2009 Yes, the lock on the gc must still be held until the collection work is
- Sean Kelly (2/25) Dec 12 2009 Hm... so if a dtor creates a new thread then that thread could enter and...
- Walter Bright (4/32) Dec 12 2009 No, because the thread doing the GC collection has locked the GC. Any
- Sean Kelly (2/35) Dec 12 2009 Yeah, but it releases the lock when the collection completes, and the it...
- Walter Bright (4/13) Dec 12 2009 I don't understand why there is a problem with this. When the collection...
- Sean Kelly (2/16) Dec 12 2009 Ah, I didn't realize the allocation call backed up that far after the co...
- Walter Bright (2/4) Dec 11 2009 It's a good fix, I checked in a patch for it.
- Leandro Lucarella (10/15) Dec 12 2009 Great, thanks!
- Leandro Lucarella (14/22) Dec 12 2009 If you could add a small description of what the commit is about besides
- Walter Bright (3/6) Dec 12 2009 Just click on the link! (At least, that's how the changelog works.)
- Leandro Lucarella (18/25) Dec 13 2009 There are no links in the commit messages, and my RSS feed gets only the
- Bane (3/8) Dec 12 2009 Thanx :)
- Bane (4/109) Dec 12 2009 Home computer I tested it in is winxp with phenom quad core, but this bu...
- Bane (1/10) Dec 12 2009 I forgot: for linux just remove msleep lines completely. It will deadloc...
Bug with GC fullCollect() in multithreaded environment. Grauzone explained it here:
http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610

Test case that freezes the app (tested on 1.053):

import std.c.stdlib;
import std.stdio;
import std.gc;
import std.thread;
import std.c.time;

class Foo {
    void* p;
    this() {
        synchronized(this.classinfo) {
            writefln("malloc begin");
            p = std.c.stdlib.malloc(1024*1024);
            writefln("malloc finished");
        }
    }
    ~this() {
        synchronized(this.classinfo) {
            writefln("free begin");
            std.c.stdlib.free(p);
            writefln("free finished");
        }
    }
}

class MyThread : Thread {
    int run() {
        while(true) {
            new Foo;
            msleep(1);
        }
        return 0;
    }
}

void main() {
    for(int i=0; i<10; i++) {
        (new MyThread).start;
    }
    while(true) {
        writefln("collect begin");
        std.gc.fullCollect;
        writefln("collect finished");
        msleep(1000);
    }
}

Can it be fixed, or at least mentioned somewhere in the docs as a known limitation of the D1 GC? It would save a lot of time for some people (too late for me :).
Dec 11 2009
Bane, on December 11 at 06:00 you wrote:

> Bug with GC fullCollect() in multithreaded environment. Grauzone explained
> it here:
> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610
> [...]
> Can it be fixed or at least mentioned somewhere in docs as a known
> limitation of the D1 GC? It would save a lot of time for some people (too
> late for me :).

The patch is very simple (as pointed out by Grauzone); I guess it should make it into D1 (I hope):

------------------------------------------------------------
diff --git a/phobos/internal/gc/gcx.d b/phobos/internal/gc/gcx.d
index b0d2821..7b1a7a3 100644
--- a/phobos/internal/gc/gcx.d
+++ b/phobos/internal/gc/gcx.d
@@ -2033,8 +2033,6 @@ struct Gcx
             }
         }

-        Thread.resumeAll();
-
         // Free up everything not marked
         debug(COLLECT_PRINTF) printf("\tfree'ing\n");
         size_t freedpages = 0;
@@ -2194,6 +2192,8 @@ struct Gcx
         debug(COLLECT_PRINTF) printf("recovered pages = %d\n", recoveredpages);
         debug(COLLECT_PRINTF) printf("\tfree'd %u bytes, %u pages from %u pools\

+        Thread.resumeAll();
+
         return freedpages + recoveredpages;
     }
------------------------------------------------------------

I wanted to reproduce it to make a bug report to attach the patch to, but I couldn't. I replaced msleep() with usleep() (and multiplied the values by 1000) because I'm on Linux, but that's all I changed. I have an old AMD Athlon 1.8GHz. How do you reproduce it? I want to be sure the bug is present before the patch and fixed after the patch before submitting the patch via Bugzilla.

Thanks!

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
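For reference, the ordering the patch produces boils down to the sketch below. The stub functions stand in for the real mark and sweep code in gcx.d, and Thread.pauseAll() is assumed from the D1 std.thread API (the diff itself only shows resumeAll), so treat this as an illustration rather than the actual Phobos source:

import std.thread;

void markReachable()   { /* scan roots, stacks and registers; set mark bits */ }
size_t sweepUnmarked() { /* run finalizers, free unmarked pages */ return 0; }

size_t fullCollectSketch()
{
    Thread.pauseAll();              // stop the world
    markReachable();                // mark phase
    // Before the patch, Thread.resumeAll() was called here, so the
    // finalizers run by the sweep below executed while user threads were
    // already running again.
    size_t freedpages = sweepUnmarked();
    Thread.resumeAll();             // patched: resume only after the sweep
    return freedpages;
}

void main()
{
    fullCollectSketch();
}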
Dec 11 2009
Leandro Lucarella Wrote:

> Bane, on December 11 at 06:00 you wrote:
>> Bug with GC fullCollect() in multithreaded environment. Grauzone
>> explained it here:
>> http://www.digitalmars.com/webnews/newsgroups.php?art_group=digitalmars.D&article_id=99610

To be perfectly safe, it's a tiny bit trickier than that. You have to always lock on allocations instead of just locking the collection cycle in single-threaded apps. I think the reason for this is that once the lock is acquired it must be held until the current thread exits the GC code, and the "synchronized" statement only allows you to lock for the length of the collect() call (because the start and end must be within the same scope). However, this /may/ actually be a separate bug with the D 1.0 GC. I really can't recall for sure if this was an issue with collecting when all the other threads were suspended.
Dec 11 2009
Sean Kelly wrote:

> To be perfectly safe, it's a tiny bit trickier than that. You have to
> always lock on allocations instead of just locking the collection cycle in
> single-threaded apps. I think the reason for this is that once the lock is
> acquired it must be held until the current thread exits the GC code, and
> the "synchronized" statement only allows you to lock for the length of the
> collect() call (because the start and end must be within the same scope).
> However, this /may/ actually be a separate bug with the D 1.0 GC. I really
> can't recall for sure if this was an issue with collecting when all the
> other threads were suspended.

Yes, the lock on the gc must still be held until the collection work is completed, but the other threads can be unfrozen once the mark phase is complete.
Dec 11 2009
Walter Bright Wrote:

> Yes, the lock on the gc must still be held until the collection work is
> completed, but the other threads can be unfrozen once the mark phase is
> complete.

Hm... so if a dtor creates a new thread then that thread could enter and "lock" the GC while the first thread is still completing its allocation. Couldn't this cause problems?
Dec 12 2009
Sean Kelly wrote:

> Hm... so if a dtor creates a new thread then that thread could enter and
> "lock" the GC while the first thread is still completing its allocation.
> Couldn't this cause problems?

No, because the thread doing the GC collection has locked the GC. Any other thread attempting to alloc will wait on that lock. There is only one GC lock.
Dec 12 2009
Walter Bright Wrote:

> No, because the thread doing the GC collection has locked the GC. Any
> other thread attempting to alloc will wait on that lock. There is only one
> GC lock.

Yeah, but it releases the lock when the collection completes, and then it retries the allocation to actually obtain memory. If the lock were held until the thread exited the GC code completely there wouldn't be a problem. I've thought about switching to using a Mutex in D2 so the lock could be acquired at the collection and released at the end of GC.malloc(), but this isn't an option in D1.
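In D2 that idea would look roughly like the sketch below, assuming druntime's core.sync.mutex.Mutex; gcLock, beginCollection and endAllocation are illustrative names, not the real GC code:

import core.sync.mutex;

__gshared Mutex gcLock;

shared static this()
{
    gcLock = new Mutex;
}

// Hypothetical: the lock is taken when a collection starts ...
void beginCollection()
{
    gcLock.lock();
    // suspend threads, mark, sweep ...
}

// ... and released only once the triggering allocation has finished --
// something a scoped `synchronized (obj) { ... }` statement cannot express.
void endAllocation()
{
    gcLock.unlock();
}

void main()
{
    beginCollection();
    // the allocation retry would happen here, still under the lock
    endAllocation();
}

The only point is that lock() and unlock() can live in different scopes, which a synchronized statement does not allow.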
Dec 12 2009
Sean Kelly wrote:

> Yeah, but it releases the lock when the collection completes, and then it
> retries the allocation to actually obtain memory. If the lock were held
> until the thread exited the GC code completely there wouldn't be a
> problem.

I don't understand why there is a problem with this. When the collection is complete, release the lock, then get the lock again for the allocation; if the allocation fails, go for another collection.
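Written out, that allocate/collect/retry loop looks roughly like this (a sketch only; gcLock, tryAllocate and runCollection are hypothetical stand-ins for the corresponding pieces of gcx.d, not the real code):

Object gcLock;                      // stand-in for the single GC lock
bool heapHasRoom;                   // toy state so the sketch terminates

static this() { gcLock = new Object; }

void* tryAllocate(size_t size)
{
    // toy stand-in: succeed only after a collection has "made room"
    return heapHasRoom ? cast(void*) (new byte[size]).ptr : null;
}

void runCollection()
{
    // suspend, mark, sweep, resume ...
    heapHasRoom = true;
}

void* allocate(size_t size)
{
    for (;;)
    {
        synchronized (gcLock)
        {
            void* p = tryAllocate(size);
            if (p)
                return p;
            runCollection();        // completes entirely under the lock
        }
        // The lock is released here; the next iteration re-acquires it and
        // retries the allocation.
    }
}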
Dec 12 2009
Walter Bright Wrote:

> I don't understand why there is a problem with this. When the collection
> is complete, release the lock, then get the lock again for the allocation;
> if the allocation fails, go for another collection.

Ah, I didn't realize the allocation call backed up that far after the collection. It's been a while since I looked at the code.
Dec 12 2009
Leandro Lucarella wrote:

> I want to be sure the bug is present before the patch and fixed after the
> patch before submitting the patch via Bugzilla.

It's a good fix, I checked in a patch for it.
Dec 11 2009
Walter Bright, on December 11 at 20:51 you wrote:

> It's a good fix, I checked in a patch for it.

Great, thanks!

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
Dec 12 2009
Leandro Lucarella, on December 12 at 11:18 wrote:

>> It's a good fix, I checked in a patch for it.
>
> Great, thanks!

If you could add a small description of what the commit is about, besides pointing to the NG discussion/Bugzilla number, as you are doing with DMD now, it would be really great! =)

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
Dec 12 2009
Leandro Lucarella wrote:

> If you could add a small description of what the commit is about, besides
> pointing to the NG discussion/Bugzilla number, as you are doing with DMD
> now, it would be really great! =)

Just click on the link! (At least, that's how the changelog works.)

Also, I added a comment in the source code that explains the why.
Dec 12 2009
Walter Bright, on December 12 at 10:29 you wrote:

> Just click on the link! (At least, that's how the changelog works.)
>
> Also, I added a comment in the source code that explains the why.

There are no links in the commit messages, and my RSS feed gets only the commit message, not the diff, so to actually see the diff I have to open the RSS "link" to go to the full changeset information. So, if there are 10 changes, I have to open 10 new web pages to see what the changes are about.

It would be nice to know what the changes are about *before* opening any new web pages, so I only click on the changes I'm interested in seeing in full.

Thanks.

-- 
Leandro Lucarella (AKA luca)                     http://llucax.com.ar/
Dec 13 2009
Walter Bright Wrote:

> It's a good fix, I checked in a patch for it.

Thanx :) I tested it with the patched Phobos on winxp/centos and it works.
Dec 12 2009
Leandro Lucarella Wrote:

> I wanted to reproduce it to make a bug report to attach the patch to, but
> I couldn't. I replaced msleep() with usleep() (and multiplied the values
> by 1000) because I'm on Linux, but that's all I changed. I have an old AMD
> Athlon 1.8GHz. How do you reproduce it? I want to be sure the bug is
> present before the patch and fixed after the patch before submitting the
> patch via Bugzilla.

The home computer I tested it on is winxp with a Phenom quad core, but this bug is confirmed with centos 5.2 + intel dual core and centos 5.2 + intel xeon. The bug has been present for a long time, that's for sure. It took me a while to track it down because I thought it was just me abusing the keyboard, not a bug in Phobos. The worst thing about this bug is that there is no output or exception when it happens - just that dreaded deadlock :)

One thing I noted - if you reduce the memory required in that malloc call from 1 MB to 1 KB or less (or increase the sleep time between calls), the chances of the bug occurring drop drastically, so in most applications this will probably never happen.
Dec 12 2009
> I wanted to reproduce it to make a bug report to attach the patch to, but
> I couldn't. I replaced msleep() with usleep() (and multiplied the values
> by 1000) because I'm on Linux, but that's all I changed. I have an old AMD
> Athlon 1.8GHz. How do you reproduce it? I want to be sure the bug is
> present before the patch and fixed after the patch before submitting the
> patch via Bugzilla. Thanks!

I forgot: for Linux, just remove the msleep lines completely. It will deadlock soon after program start. It seems that Linux is more resistant to this bug than Windows.
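Alternatively, a small version-gated helper keeps the delays in the test case and builds on both platforms. This is only a sketch; sleepMs is a made-up name, and the extern declarations below are the standard Win32/POSIX prototypes, declared directly so the snippet does not depend on any particular Phobos module:

version (Windows)
    extern (Windows) void Sleep(uint milliseconds);   // Win32 kernel32 Sleep
else
    extern (C) int usleep(uint microseconds);         // POSIX usleep

// Drop-in replacement for the msleep(x) calls in the test case.
void sleepMs(uint ms)
{
    version (Windows)
        Sleep(ms);
    else
        usleep(ms * 1000);
}

Replacing each msleep(x) in the test case with sleepMs(x) should then reproduce the same behaviour on both Windows and Linux.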
Dec 12 2009