digitalmars.D - Analysis of D GC
- Dmitry Olshansky (5/5) Jun 19 2017 My take on D's GC problem, also spoiler - I'm going to build a
- Adam D. Ruppe (3/3) Jun 19 2017 What is it about Windows that makes you call it a distant
- Dmitry Olshansky (7/10) Jun 20 2017 This is mostly because I wanted to abuse lazy commit of POSIX.
- Petar Kirov [ZombineDev] (4/15) Jun 20 2017 BTW, Rainer Schuetze has studied this in detail and has written
- Ali Çehreli (7/12) Jun 19 2017 Very informative, thanks.
- Dmitry Olshansky (6/21) Jun 20 2017 I could call it a problem :) Still one reason I didn't go to D
- H. S. Teoh via Digitalmars-d (38/45) Jun 19 2017 [...]
- safety0ff (5/11) Jun 19 2017 I've read that there is such a function on Windows but you need
- ketmar (5/18) Jun 19 2017 it is higly depends of undocumented windows internals, and not portable
- Jacob Carlborg (6/10) Jun 20 2017 I'm wondering what Windows 10 is using to implement "fork" for Windows
- rikki cattermole (3/14) Jun 20 2017 It wouldn't surprise me to learn that it was a posix layer specific
- Petar Kirov [ZombineDev] (18/33) Jun 20 2017 The Windows Subsystem for Linux is build on a new form processes
- Jacob Carlborg (4/8) Jun 20 2017 Looks interesting.
- ketmar (16/21) Jun 19 2017 and it was even ported to D2, and worked. sadly, using `fork()` has it's...
- Dmitry Olshansky (3/20) Jun 20 2017 Since we are in control of what child does I see this as no
- Dmitry Olshansky (10/43) Jun 20 2017 Yeah if said 32-bit application makes use of no interior pointer
- Vladimir Panteleev (25/27) Jun 19 2017 Looks like I'm not the only one itching to have a go at D's GC :)
- Dmitry Olshansky (16/43) Jun 20 2017 Nice. A pool could have many different structures, the collector
- H. S. Teoh via Digitalmars-d (25/35) Jun 20 2017 [...]
- Dmitry Olshansky (6/35) Jun 20 2017 Interestingly the moment you "reallocate" to expand the AA it
- H. S. Teoh via Digitalmars-d (11/14) Jun 20 2017 [...]
- Jacob Carlborg (4/6) Jun 20 2017 Don't for get the Clang sanitizers, assuming they work using LDC.
- Nicholas Wilson (4/11) Jun 19 2017 should probably be
- safety0ff (6/11) Jun 19 2017 Good overview, however:
- Dmitry Olshansky (6/19) Jun 20 2017 Pools are granular to 256kb irc, so the trick is to keep them
- ketmar (11/14) Jun 19 2017 "...the dubious optimization of no interior pointers..."
- Jacob Carlborg (6/20) Jun 20 2017 You need to move to 64bit. Apple is already deprecating support for
- Petar Kirov [ZombineDev] (5/10) Jun 20 2017 I highly doubt that ketmar would have any intention of touching
- Jacob Carlborg (5/7) Jun 20 2017 I somehow mixed up ketmar and Guillaume Piolat (which used to go by the
- Adrian Matoga (3/6) Jun 25 2017 There are other 32-bit platforms that are going to stay on the
- Jacob Carlborg (5/7) Jun 26 2017 Sure, but as I mentioned I mixed up ketmar and Guillaume Piolat and
- Nicholas Wilson (3/8) Jun 20 2017 This was posted on reddit:
- Walter Bright (2/6) Jun 20 2017 Also on hacker news.
- Ecstatic Coder (23/28) Jun 20 2017 Many thanks for your efforts Dmitry :)
- Dmitry Olshansky (3/18) Jun 20 2017 No incremental GC, sorry. It may grow thread-local collection one
- Kagamin (4/8) Jun 22 2017 https://github.com/3dicc/Urhonimo/blob/master/Urho3D-1.32/Source/Engine/...
- safety0ff (16/18) Jun 22 2017 It's likely to pave over the many pitfalls of D finalizers.
- Martin Nowak (16/21) Jun 24 2017 FYI, we've tried to improve the binary pool search, but there
- Dmitry Olshansky (5/20) Jun 24 2017 Doesn't have to be for pages. Pool granularity is 256k, aligning
- Martin Nowak (7/9) Jun 25 2017 Right now this leads to some inflation of RSS cause previously
My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html --- Dmitry Olshansky
Jun 19 2017
What is it about Windows that makes you call it a distant possibility? Is it just that you are unfamiliar with it or is there some specific OS level feature you plan on needing?
Jun 19 2017
On Monday, 19 June 2017 at 22:50:05 UTC, Adam D. Ruppe wrote:
> What is it about Windows that makes you call it a distant possibility? Is it just that you are unfamiliar with it or is there some specific OS level feature you plan on needing?

This is mostly because I wanted to abuse lazy commit of POSIX. Now that I think of it Windows is mostly ok, except for the fork trick used in concurrent GC. As Vladimir pointed out, on Windows there are other ways to do it but they are more involved.

--- Dmitry Olshansky
Jun 20 2017
On Tuesday, 20 June 2017 at 07:11:10 UTC, Dmitry Olshansky wrote:
> On Monday, 19 June 2017 at 22:50:05 UTC, Adam D. Ruppe wrote:
>> What is it about Windows that makes you call it a distant possibility? Is it just that you are unfamiliar with it or is there some specific OS level feature you plan on needing?
>
> This is mostly because I wanted to abuse lazy commit of POSIX. Now that I think of it Windows is mostly ok, except for the fork trick used in concurrent GC. As Vladimir pointed out, on Windows there are other ways to do it but they are more involved.
>
> --- Dmitry Olshansky

BTW, Rainer Schuetze has studied this in detail and has written down some of it here: http://rainers.github.io/visuald/druntime/concurrentgc.html
Jun 20 2017
On 06/19/2017 03:35 PM, Dmitry Olshansky wrote:
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>
> --- Dmitry Olshansky

Very informative, thanks. However, I can think of many reasons, like appreciation of the efforts of the original authors, to tone it down a little bit, e.g. by changing "mistake" to "optimization opportunity", "criticism" to "observation", etc. :)

Ali
Jun 19 2017
On Monday, 19 June 2017 at 23:10:43 UTC, Ali Çehreli wrote:
> On 06/19/2017 03:35 PM, Dmitry Olshansky wrote:
>> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>>
>> --- Dmitry Olshansky
>
> Very informative, thanks. However, I can think of many reasons, like appreciation of the efforts of the original authors, to tone it down a little bit, e.g. by changing "mistake" to "optimization opportunity", "criticism" to "observation", etc. :)
>
> Ali

I could call it a problem :) Still, one reason I didn't go to the D blog to post this is that it's a critique followed by a promise of action though.

--- Dmitry Olshansky
Jun 20 2017
On Mon, Jun 19, 2017 at 10:35:42PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
[...]

Very interesting indeed! One question about killing the no interior pointer attribute: would this be problematic for 32-bit platforms? And if so, what do you plan to do about it? Keep the current GC as version(32bit) and your new version as version(64bit)?

One (potentially crazy) idea that occurred to me while reading your post is TLS allocations. I haven't thought through the details of how this would interact with the existing language yet, but would it make sense for some allocations that you know will never be shared across threads to be allocated in a thread-local pool instead of the global pool? I.e., in addition to the global set of memory pools you also have thread-local memory pools. Then you could potentially run collections per-thread rather than stop-the-world.

For example, if you have a bunch of threads that call a function that does a bunch of short-lived allocations that are not shared across threads, it seems too wasteful to have these allocations add to the global GC load. Why not have them go into a local pool that can be collected per-thread?

Of course, whether the current language can take advantage of this is another matter. Perhaps if the function is pure and returns scope, then you know any allocation it makes can't possibly be shared with other threads, or something like that...

On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
> What is it about Windows that makes you call it a distant possibility? Is it just that you are unfamiliar with it or is there some specific OS level feature you plan on needing?

He mentioned the "fork trick", which I assume refers to how Linux's implementation of fork() uses copy-on-write rather than immediately duplicating the parent process' memory structures. There was a D1 GC some time ago that depended on this behaviour to speed up the collection cycle.

AFAIK, Windows does not have equivalent functionality to this. (Well, for that matter, I'm not sure Posix in general has this feature either, since AFAIK it's Linux-specific. But I surmise that modern-day *nix flavors probably have adopted this in one way or another, since otherwise the very common pattern of fork-and-exec would be inordinately expensive -- copying all the parent's pages only to replace them all pretty much immediately.)

T

-- Give me some fresh salted fish, please.
Jun 19 2017
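The copy-on-write behaviour that the "fork trick" relies on can be seen in a few lines of D. This is only a minimal sketch for a POSIX system, not the D1 collector mentioned above: `childSeesSnapshot` is a hypothetical helper, and the child merely reads one heap cell where a real concurrent collector would run its mark phase.

```d
import core.sys.posix.sys.types : pid_t;
import core.sys.posix.sys.wait : waitpid, WEXITSTATUS, WIFEXITED;
import core.sys.posix.unistd : _exit, fork;

// Hypothetical helper: returns true if the forked child still sees the value
// the heap held at the moment of fork(), even though the parent overwrites it.
bool childSeesSnapshot()
{
    int* cell = new int;
    *cell = 1;                        // heap state at snapshot time

    pid_t pid = fork();
    if (pid < 0)
        return false;                 // fork failed
    if (pid == 0)
    {
        // Child: works on a copy-on-write view of the parent's pages.
        // A concurrent collector would do its marking here.
        _exit(*cell == 1 ? 0 : 1);
    }

    *cell = 2;                        // parent keeps mutating its own copy
    int status;
    waitpid(pid, &status, 0);
    return WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```

The child leaves through `_exit` rather than returning, so it never runs the parent's cleanup code, and the result does not depend on whether the parent's write happens before or after the child's read.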
On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
> On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
>> What is it about Windows that makes you call it a distant possibility? Is it just that you are unfamiliar with it or is there some specific OS level feature you plan on needing?
>
> AFAIK, Windows does not have equivalent functionality to this.

I've read that there is such a function on Windows but you need to use undocumented/unofficial API to access it: e.g. https://github.com/opencollab/scilab/blob/master/scilab/modules/parallel/src/c/forkWindows.c
Jun 19 2017
safety0ff wrote:
> On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
>> On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
>>> What is it about Windows that makes you call it a distant possibility? Is it just that you are unfamiliar with it or is there some specific OS level feature you plan on needing?
>>
>> AFAIK, Windows does not have equivalent functionality to this.
>
> I've read that there is such a function on Windows but you need to use undocumented/unofficial API to access it: e.g. https://github.com/opencollab/scilab/blob/master/scilab/modules/parallel/src/c/forkWindows.c

it highly depends on undocumented windows internals, and is not portable between windows versions. more-or-less working implementations of `fork()` have existed at least since the NT3 era, but nobody considered 'em as more than a PoC, and even the next service pack can break everything.
Jun 19 2017
On 2017-06-20 06:37, ketmar wrote:
> it highly depends on undocumented windows internals, and is not portable between windows versions. more-or-less working implementations of `fork()` have existed at least since the NT3 era, but nobody considered 'em as more than a PoC, and even the next service pack can break everything.

I'm wondering what Windows 10 is using to implement "fork" for Windows Subsystem for Linux. If it's using these internal functions or something else.

-- /Jacob Carlborg
Jun 20 2017
On 20/06/2017 12:41 PM, Jacob Carlborg wrote:
> On 2017-06-20 06:37, ketmar wrote:
>> it highly depends on undocumented windows internals, and is not portable between windows versions. more-or-less working implementations of `fork()` have existed at least since the NT3 era, but nobody considered 'em as more than a PoC, and even the next service pack can break everything.
>
> I'm wondering what Windows 10 is using to implement "fork" for Windows Subsystem for Linux. If it's using these internal functions or something else.

It wouldn't surprise me to learn that it was a posix layer specific syscall, meaning we can't use it from a native Windows process.
Jun 20 2017
On Tuesday, 20 June 2017 at 11:44:41 UTC, rikki cattermole wrote:
> On 20/06/2017 12:41 PM, Jacob Carlborg wrote:
>> On 2017-06-20 06:37, ketmar wrote:
>>> it highly depends on undocumented windows internals, and is not portable between windows versions. more-or-less working implementations of `fork()` have existed at least since the NT3 era, but nobody considered 'em as more than a PoC, and even the next service pack can break everything.
>>
>> I'm wondering what Windows 10 is using to implement "fork" for Windows Subsystem for Linux. If it's using these internal functions or something else.
>
> It wouldn't surprise me to learn that it was a posix layer specific syscall, meaning we can't use it from a native Windows process.

The Windows Subsystem for Linux is built on a new form of processes called picoprocesses. There's a whole API built specifically to service WSL, which (AFAIR) is not otherwise available to normal processes for security reasons.

I highly recommend watching this talk: https://www.youtube.com/watch?v=36Ykla27FIo and browsing through this repo: https://github.com/ionescu007/lxss which reveals many interesting details about that part of Windows.

I watched that talk a while ago and maybe I have misremembered something, but my understanding is that using the WSL infrastructure is off limits for normal Win32 processes and as such is not suitable for implementing CoW pages for D's GC. (I watched that talk specifically because I was interested in whether some of that could be used in druntime.)
Jun 20 2017
On 2017-06-20 16:16, Petar Kirov [ZombineDev] wrote:
> I highly recommend watching this talk: https://www.youtube.com/watch?v=36Ykla27FIo and browsing through this repo: https://github.com/ionescu007/lxss which reveals many interesting details about that part of Windows.

Looks interesting.

-- /Jacob Carlborg
Jun 20 2017
H. S. Teoh wrote:
> He mentioned the "fork trick", which I assume refers to how Linux's implementation of fork() uses copy-on-write rather than immediately duplicating the parent process' memory structures. There was a D1 GC some time ago that depended on this behaviour to speed up the collection cycle.

and it was even ported to D2, and worked. sadly, using `fork()` has its own set of problems -- `fork()` itself is in no way a flawless experience. like you can fork while another thread is inside glibc's `malloc()`, and BOOM! a lot of glibc is locked forever, as the `malloc()` lock is never released in the child process. some other libraries may try to intercept `fork()` to do unnecessary "cleanup", and so on. so using a "forking GC" requires a lot of discipline in coding and library use, or it will be an endless source of heisenbugs.

new linux kernels got the userfaultfd API (so code can simply `select()` on an fd, and process protection violations from `mprotect()` without tricks with signals), but... to much of my joy and happiness, the proposed API was just fine to create a GC with mprotect barriers, and the final API that was included gladly omitted exactly the API call that is necessary to make it happen. great work, yeah. it may have changed since then, tho, i haven't rechecked.
Jun 19 2017
On Tuesday, 20 June 2017 at 04:35:27 UTC, ketmar wrote:
> H. S. Teoh wrote:
>> He mentioned the "fork trick", which I assume refers to how Linux's implementation of fork() uses copy-on-write rather than immediately duplicating the parent process' memory structures. There was a D1 GC some time ago that depended on this behaviour to speed up the collection cycle.
>
> and it was even ported to D2, and worked. sadly, using `fork()` has its own set of problems -- `fork()` itself is in no way a flawless experience. like you can fork while another thread is inside glibc's `malloc()`, and BOOM! a lot of glibc is locked forever, as the `malloc()` lock is never released in the child process. some other libraries may try to intercept `fork()` to do unnecessary "cleanup", and so on.

Since we are in control of what the child does I see this as no issue. Just call mmap and do bump-the-pointer allocation.
Jun 20 2017
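The "call mmap and bump a pointer" suggestion in the reply above can be sketched as follows. This is an illustration under stated assumptions rather than anything taken from an actual collector: `BumpRegion` and its members are hypothetical names, and the 16-byte rounding is an arbitrary choice.

```d
import core.sys.posix.sys.mman : MAP_ANON, MAP_FAILED, MAP_PRIVATE,
    PROT_READ, PROT_WRITE, mmap, munmap;

// Hypothetical bump allocator for use inside a forked child: it never calls
// malloc(), so a malloc lock inherited in the locked state cannot block it.
struct BumpRegion
{
    ubyte* base;
    size_t capacity;
    size_t used;

    static BumpRegion create(size_t capacity)
    {
        void* p = mmap(null, capacity, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANON, -1, 0);
        if (p == MAP_FAILED)
            return BumpRegion(null, 0, 0);
        return BumpRegion(cast(ubyte*) p, capacity, 0);
    }

    // Round the request up to 16 bytes and advance the cursor.
    void* allocate(size_t size)
    {
        size_t rounded = (size + 15) & ~cast(size_t) 15;
        if (rounded < size || rounded > capacity - used)
            return null;              // arithmetic overflow or out of space
        void* result = base + used;
        used += rounded;
        return result;
    }

    // Nothing is freed individually; the whole region goes away at once.
    void release()
    {
        if (base !is null)
            munmap(base, capacity);
        base = null;
        capacity = used = 0;
    }
}
```

A child process that only needs scratch memory for the duration of one mark phase can create one region, allocate from it, and simply exit; `release` is there for callers that live longer.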
On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
> On Mon, Jun 19, 2017 at 10:35:42PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
>> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>
> [...] Very interesting indeed! One question about killing the no interior pointer attribute: would this be problematic for 32-bit platforms? And if so, what do you plan to do about it? Keep the current GC as version(32bit) and your new version as version(64bit)?

Yeah, if said 32-bit application makes use of the no interior pointer attribute then using the old GC is an option. I have no plans for this broken attribute.

> One (potentially crazy) idea that occurred to me while reading your post is TLS allocations. I haven't thought through the details of how this would interact with the existing language yet, but would it make sense for some allocations that you know will never be shared across threads to be allocated in a thread-local pool instead of the global pool? I.e., in addition to the global set of memory pools you also have thread-local memory pools. Then you could potentially run collections per-thread rather than stop-the-world.

This needs a spec update on the interaction between TLS and shared; in particular the current trend of lock + cast away shared is problematic. Also the implicit cast to immutable of the result of a unique expression.

> On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
>> What is it about Windows that makes you call it a distant possibility? Is it just that you are unfamiliar with it or is there some specific OS level feature you plan on needing?
>
> He mentioned the "fork trick", which I assume refers to how Linux's implementation of fork() uses copy-on-write rather than immediately duplicating the parent process' memory structures. There was a D1 GC some time ago that depended on this behaviour to speed up the collection cycle. AFAIK, Windows does not have equivalent functionality to this.
>
> T

To the best of my knowledge all of D's current target OSes support this save for Windows.
Jun 20 2017
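One of the obstacles to thread-local pools named in the reply above, the implicit conversion of a unique expression to immutable, can be illustrated in a few lines. This is only a sketch; `makeLocal` and `freeze` are hypothetical names, and no thread-local allocator is involved.

```d
// Hypothetical producer: strongly pure, so its freshly allocated result is
// known to be unique (nothing else can hold a reference to it).
int[] makeLocal(size_t n) pure
{
    auto a = new int[](n);
    foreach (i, ref x; a)
        x = cast(int) i;
    return a;
}

// The unique result converts to immutable with no cast and no copy.
// Immutable data is implicitly shareable, so memory allocated by this thread
// may legally end up referenced from any other thread.
immutable(int)[] freeze(size_t n)
{
    immutable(int)[] frozen = makeLocal(n);
    return frozen;
}
```

Nothing in `makeLocal` tells the allocator that the array will later be frozen and handed to another thread, which is one reason a per-thread collector cannot simply assume that unshared-looking allocations stay local.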
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish.

Looks like I'm not the only one itching to have a go at D's GC :) This will very likely be my DConf 2018 project. However, I have slightly different plans:

- The GC should be usable as a library (mainly to facilitate testing).
- Support for all platforms D already supports from the start.
- Use design-by-introspection when applicable and design-by-contract elsewhere to split the design into modular components.
- Make the GC configurable (using policies) and swappable at runtime. (No need to get clever, just treat previous implementation's pools as opaque void[]).
- Support concurrency on Windows via anonymous memory-mapped files.
- Support generational collection using write barriers implemented through memory protection.
- Integrate existing GC work - don't reinvent the wheel.
- More, much more debugging facilities! Integrate Diamond and Valgrind interoperability.
- Gray-marking and compacting.
- Still need to look at immix.

I have some past work that I'd like to integrate (an experimental generational GC I wrote like 9 years ago for D1, Diamond, and Valgrind integration I have in a fork somewhere.)
Jun 19 2017
On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev wrote:
> On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
>> My take on D's GC problem, also spoiler - I'm going to build a new one soonish.
>
> Looks like I'm not the only one itching to have a go at D's GC :) This will very likely be my DConf 2018 project. However, I have slightly different plans:

I see no problem in eventually uniting our efforts.

> - The GC should be usable as a library (mainly to facilitate testing).
> - Support for all platforms D already supports from the start.
> - Use design-by-introspection when applicable and design-by-contract elsewhere to split the design into modular components.

Nice. A pool could have many different structures, the collector could then introspect on that. Sadly this almost doubles the effort so I will not go there.

> - Make the GC configurable (using policies) and swappable at runtime. (No need to get clever, just treat previous implementation's pools as opaque void[]).
> - Support concurrency on Windows via anonymous memory-mapped files.

Yeah, I recall Rainer and myself discussing this approach; it had some downsides, such as the need to remap each pool individually. Still doable.

> - Support generational collection using write barriers implemented through memory protection.

Super slow sadly. That being said I believe D is just fine without generational GC. The generational hypothesis just doesn't hold to the extent it holds in say Java. My hypothesis is that most performance minded applications already allocate temporaries using a region allocator of sorts (or using the C heap).

> - Integrate existing GC work - don't reinvent the wheel.
> - More, much more debugging facilities! Integrate Diamond and Valgrind interoperability.

I could use help on this one.

> - Gray-marking and compacting.
> - Still need to look at immix.
>
> I have some past work that I'd like to integrate (an experimental generational GC I wrote like 9 years ago for D1, Diamond, and Valgrind integration I have in a fork somewhere.)

--- Dmitry Olshansky
Jun 20 2017
On Tue, Jun 20, 2017 at 07:47:13AM +0000, Dmitry Olshansky via Digitalmars-d wrote:
> On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev wrote:
[...]
>> - Support generational collection using write barriers implemented through memory protection.
>
> Super slow sadly. That being said I believe D is just fine without generational GC. The generational hypothesis just doesn't hold to the extent it holds in say Java. My hypothesis is that most performance minded applications already allocate temporaries using a region allocator of sorts (or using the C heap).
[...]

FWIW, here's a data point to the contrary: One of my projects involves constructing a (very large) AA that grows over time, and entries are never deleted. The AA itself is persistent and lasts until the end of the program. Besides the AA, there are a couple of arrays that also grow (more slowly) but eventually become unreferenced.

Because of the sheer size of the AA, I've observed that GC collection cycles become slower and slower, yet most of this extra work is completely needless, because the only thing that might need collecting is the arrays, yet the GC has to mark the entire AA each time, only to discover it's still live. After some experimentation I discovered that I could get up to 40-50% performance improvement just by calling GC.disable and scheduling my own GC collection cycles via GC.collect at a slower rate than the current default setting.

From this, it would seem to me that a generational collector would have helped, since most of the AA will eventually migrate to older generations and most of the time the GC won't bother marking/scanning those parts.

Of course, this is only for this particular program, and I can't say that this is typical usage for D programs in general. But I think D would still benefit from a generational collector.

T

-- What did the alien say to Schubert? "Take me to your lieder."
Jun 20 2017
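The workaround described in the post above, disabling automatic collections and running them on the application's own schedule, might look roughly like this. It is a sketch, not the poster's code: `buildTable`, the 64-byte scratch buffers and the `collectEvery` parameter (which must be non-zero) are all made up for illustration; only `GC.disable`, `GC.enable` and `GC.collect` come from `core.memory`.

```d
import core.memory : GC;

// Hypothetical workload: grows a long-lived AA and collects only every
// `collectEvery` insertions instead of letting the GC decide when to run.
// Returns the number of explicitly scheduled collections.
size_t buildTable(size_t entries, size_t collectEvery)
{
    int[size_t] table;                // long-lived, entries are never removed
    size_t manualCollections;

    GC.disable();                     // suppress automatic collections
    scope (exit) GC.enable();         // restore the default on the way out

    foreach (i; 0 .. entries)
    {
        table[i] = cast(int) i;
        auto scratch = new ubyte[](64);   // short-lived garbage
        scratch[0] = 1;

        if ((i + 1) % collectEvery == 0)
        {
            GC.collect();             // application-scheduled cycle
            ++manualCollections;
        }
    }
    return manualCollections;
}
```

Per the `core.memory` documentation, `GC.disable` is not absolute: the runtime may still collect when it would otherwise run out of memory. The right value for `collectEvery` depends entirely on the program and has to be found by measurement.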
On Tuesday, 20 June 2017 at 16:49:44 UTC, H. S. Teoh wrote:
> On Tue, Jun 20, 2017 at 07:47:13AM +0000, Dmitry Olshansky via Digitalmars-d wrote:
>> On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev wrote:
> [...]
> FWIW, here's a data point to the contrary: One of my projects involves constructing a (very large) AA that grows over time, and entries are never deleted. The AA itself is persistent and lasts until the end of the program. Besides the AA, there are a couple of arrays that also grow (more slowly) but eventually become unreferenced. Because of the sheer size of the AA, I've observed that GC collection cycles become slower and slower, yet most of this extra work is completely needless, because the only thing that might need collecting is the arrays, yet the GC has to mark the entire AA each time, only to discover it's still live. After some experimentation I discovered that I could get up to 40-50% performance improvement just by calling GC.disable and scheduling my own GC collection cycles via GC.collect at a slower rate than the current default setting.
>
> From this, it would seem to me that a generational collector would have helped, since most of the AA will eventually migrate to older generations and most of the time the GC won't bother marking/scanning those parts. Of course, this is only for this particular program, and I can't say that this is typical usage for D programs in general. But I think D would still benefit from a generational collector.
>
> T

Interestingly, the moment you "reallocate" to expand the AA it will be considered a new object. Overall I think your case is more about faulty collection heuristics, that is, collecting when there is only a slim chance of getting enough free space after the collection.
Jun 20 2017
On Tue, Jun 20, 2017 at 07:14:11PM +0000, Dmitry Olshansky via Digitalmars-d wrote:
> On Tuesday, 20 June 2017 at 16:49:44 UTC, H. S. Teoh wrote:
[...]
> Interestingly the moment you "reallocate" to expand the AA it will be considered a new object.
[...]

This is not entirely true. The *table* itself will of course get moved to a new object, but most of the size of the AA comes from its entries, and those are nodes that stay in-place. You'll still have to scan references to the table, of course, but that's a lot better than scanning all the entries as well.

T

-- The diminished 7th chord is the most flexible and fear-instilling chord. Use it often, use it unsparingly, to subdue your listeners into submission!
Jun 20 2017
On 2017-06-20 01:52, Vladimir Panteleev wrote:
> - More, much more debugging facilities! Integrate Diamond and Valgrind interoperability.

Don't forget the Clang sanitizers, assuming they work using LDC.

-- /Jacob Carlborg
Jun 20 2017
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>
> --- Dmitry Olshansky

> if not a single pool is capable to service an allocation a new pool is allocated

should probably be "if a single pool is not capable of servicing ..."

Looove the figures! Looking forward to seeing the results.
Jun 19 2017
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>
> --- Dmitry Olshansky

Good overview, however: the binary search pool lookup is used because it naturally supports variable sized pools. IMHO, simply concluding "A hash table could have saved quite a few cycles." glosses over the issue of handling variable sizes.
Jun 19 2017
On Tuesday, 20 June 2017 at 02:23:48 UTC, safety0ff wrote:
> On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
>> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>>
>> --- Dmitry Olshansky
>
> Good overview, however: the binary search pool lookup is used because it naturally supports variable sized pools. IMHO, simply concluding "A hash table could have saved quite a few cycles." glosses over the issue of handling variable sizes.

Pools are granular to 256kb IIRC, so the trick is to keep them 256kb aligned in memory. Then a map from 256kb chunks to pools is easily created.

--- Dmitry Olshansky
Jun 20 2017
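The chunk map proposed in the reply above can be sketched like this, assuming, as the post does, that every pool starts on a 256 KiB boundary and spans a whole number of 256 KiB chunks. `Pool`, `PoolMap` and the use of a built-in associative array are illustrative choices only, not druntime's actual data structures.

```d
enum size_t chunkSize = 256 * 1024;   // granularity quoted in the post

struct Pool
{
    size_t base;    // start address, assumed to be chunkSize-aligned
    size_t size;    // length in bytes, assumed to be a multiple of chunkSize
}

// Hypothetical index with one entry per 256 KiB chunk of address space.
struct PoolMap
{
    Pool*[size_t] byChunk;

    // A variable-sized pool simply registers every chunk it covers.
    void add(Pool* pool)
    {
        for (size_t offset = 0; offset < pool.size; offset += chunkSize)
            byChunk[(pool.base + offset) / chunkSize] = pool;
    }

    // Average constant-time lookup, however large the individual pools are.
    Pool* find(size_t address)
    {
        if (auto hit = (address / chunkSize) in byChunk)
            return *hit;
        return null;
    }
}
```

Because a pool of N chunks occupies N map entries, variable pool sizes cost memory in the index rather than lookup time, which is the trade-off against the binary search raised earlier in the thread.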
Dmitry Olshansky wrote:
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

"...the dubious optimization of no interior pointers..."

this is the ONLY (i emphasise it!) way i was able to make my e-mail and irc clients not leak memory, and keep using GC. on 32-bit systems false pointers *are* a problem, and NO_INTERIOR really helps. turning NO_INTERIOR into something dog-slow (or a noop) will make D unusable on 32-bit systems for anything more complex than helloworld and throwaway scripts. particularly, any app that should work for weeks or months without restart (yep, i want my mail client to Just Work, and i'm not rebooting my PC that often) will be *forced* to ditch GC. while NO_INTERIOR requires some coding discipline, it is invaluable in IRL apps.
Jun 19 2017
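For readers unfamiliar with the attribute defended in the post above, here is roughly how a block is requested with `NO_INTERIOR` through `core.memory`. The helper name `allocateNoInterior` is hypothetical, and adding `NO_SCAN` assumes the buffer holds raw bytes with no pointers of its own.

```d
import core.memory : GC;

// Hypothetical helper: allocate a buffer and tell the GC that only pointers
// to its first byte keep it alive. A random integer that happens to point
// into the middle of the block (a "false pointer") will not pin it.
ubyte[] allocateNoInterior(size_t length)
{
    auto p = cast(ubyte*) GC.malloc(length,
        GC.BlkAttr.NO_SCAN | GC.BlkAttr.NO_INTERIOR);
    return p[0 .. length];
}
```

The "coding discipline" mentioned in the post follows directly from this: the program must keep a reference to the start of the block for as long as it is in use, so holding only a slice such as `buf[16 .. $]` is not enough to keep the buffer alive.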
On 2017-06-20 06:54, ketmar wrote:
> "...the dubious optimization of no interior pointers..."
>
> this is the ONLY (i emphasise it!) way i was able to make my e-mail and irc clients not leak memory, and keep using GC. on 32-bit systems false pointers *are* a problem, and NO_INTERIOR really helps. turning NO_INTERIOR into something dog-slow (or a noop) will make D unusable on 32-bit systems for anything more complex than helloworld and throwaway scripts. particularly, any app that should work for weeks or months without restart (yep, i want my mail client to Just Work, and i'm not rebooting my PC that often) will be *forced* to ditch GC. while NO_INTERIOR requires some coding discipline, it is invaluable in IRL apps.

You need to move to 64bit. Apple is already deprecating support for 32bit apps and after the next version of macOS (High Sierra) they're going to remove the support for 32bit apps.

-- /Jacob Carlborg
Jun 20 2017
On Tuesday, 20 June 2017 at 11:49:49 UTC, Jacob Carlborg wrote:
> On 2017-06-20 06:54, ketmar wrote:
>> [...]
>
> You need to move to 64bit. Apple is already deprecating support for 32bit apps and after the next version of macOS (High Sierra) they're going to remove the support for 32bit apps.

I highly doubt that ketmar would have any intention of touching macOS regardless ;) Besides, there are many domains where the x32 ABI is a more worthwhile upgrade from i686 than x86_64.
Jun 20 2017
On 2017-06-20 16:03, Petar Kirov [ZombineDev] wrote:
> I highly doubt that ketmar would have any intention of touching macOS regardless ;)

I somehow mixed up ketmar and Guillaume Piolat (who used to go by the alias p0nce). My mistake.

-- /Jacob Carlborg
Jun 20 2017
On Tuesday, 20 June 2017 at 11:49:49 UTC, Jacob Carlborg wrote:
> You need to move to 64bit. Apple is already deprecating support for 32bit apps and after the next version of macOS (High Sierra) they're going to remove the support for 32bit apps.

There are other 32-bit platforms that are going to stay on the market for a while. 32-bit ARMs won't disappear anytime soon.
Jun 25 2017
On 2017-06-25 17:47, Adrian Matoga wrote:
> There are other 32-bit platforms that are going to stay on the market for a while. 32-bit ARMs won't disappear anytime soon.

Sure, but as I mentioned I mixed up ketmar and Guillaume Piolat, and Guillaume Piolat is using Apple platforms, as far as I understand.

-- /Jacob Carlborg
Jun 26 2017
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>
> --- Dmitry Olshansky

This was posted on reddit: https://www.reddit.com/r/programming/comments/6ic52d/inside_ds_gc/
Jun 20 2017
On 6/20/2017 12:04 AM, Nicholas Wilson wrote:
> On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
>> http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>
> This was posted on reddit: https://www.reddit.com/r/programming/comments/6ic52d/inside_ds_gc/

Also on hacker news.
Jun 20 2017
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish. http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>
> --- Dmitry Olshansky

Many thanks for your efforts Dmitry :)

May I ask you if you plan to make a soft real-time GC similar to the one implemented in the Nim language?

https://nim-lang.org/docs/gc.html
https://nim-lang.org/docs/intern.html#debugging-nim-s-memory-management

What is great about it is that we can call it regularly to collect memory a bit at a time, giving it a maximum delay for this operation.

Being able to manually specify the maximum GC delay is what makes Nim compatible with game development, as collections can be made iteratively, and on a per-thread basis. In the worst case, we know that just one of the application threads will be delayed for a few milliseconds between two frame renderings, which is generally acceptable for games and other similar applications.

Moreover this opens the opportunity to call the GC only in the main menu or the pause menu for instance, but not during actual gameplay, so that even these few lost milliseconds will always remain unnoticed.

This is probably why Nim's author was once paid to wrap an open source game engine (Urho3D), and improve the language's native compatibility with C++ libraries.

https://forum.nim-lang.org/t/870
Jun 20 2017
On Tuesday, 20 June 2017 at 15:16:01 UTC, Ecstatic Coder wrote:
>> My take on D's GC problem, also spoiler - I'm going to build a new one soonish.
>> http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>> ---
>> Dmitry Olshansky
> Many thanks for your efforts Dmitry :) May I ask you if you plan to make a soft real-time GC similar to the one implemented in the Nim language ?
> https://nim-lang.org/docs/gc.html
> https://nim-lang.org/docs/intern.html#debugging-nim-s-memory-management
> What is great about it is that we can call it regularly to collect memory a bit at a time, giving it a maximum delay for this operation.

No incremental GC, sorry. It may grow thread-local collection one day, once the spec is precise about what is allowed and what is not.
Jun 20 2017
On Tuesday, 20 June 2017 at 15:16:01 UTC, Ecstatic Coder wrote:
> This is probably why Nim's author was once paid to wrap an open source game engine (Urho3D), and improve the language's native compatibility with C++ libraries.
> https://forum.nim-lang.org/t/870

https://github.com/3dicc/Urhonimo/blob/master/Urho3D-1.32/Source/Engine/Container/Str.h
http://dbartolini.github.io/crown/doxygen/structcrown_1_1_dynamic_string.html
Is it always like this?
Jun 22 2017
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
> http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

"But the main unanswered question is why? Why an extra pass?"

It's likely to pave over the many pitfalls of D finalizers.

E.g. finalizers corrupting data:

    class A { size_t i; }
    class B
    {
        A a;
        this() { a = new A; }
        ~this() { a.i = 1; }
    }
    // modifying B.a.i is undefined behavior (e.g. it could corrupt the GC's freelist)

E.g. finalizers reading undefined data:

    // class methods are virtual by default in D, so no `virtual` keyword
    class A { bool check() { return true; } }
    class B
    {
        A a;
        this() { a = new A; }
        ~this() { a.check(); }
    }
    // B.a's object header is undefined (e.g. replaced with GC freelist pointer)

There are also invariants, which are prepended to the finalizers, so their code is subject to the same issues.

The best thing about the current implementation is that object resurrection has never been supported.
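A finalizer that avoids both pitfalls above touches only resources the object owns outside the GC heap and never dereferences GC-managed members. A minimal sketch, assuming a made-up `Buffer` class that owns a C-heap allocation:

```d
import core.stdc.stdlib : free, malloc;

/// Hypothetical class for illustration: owns one C-heap block and
/// also holds a reference to a GC-managed object.
class Buffer
{
    private void* raw;    // C heap: not managed by the GC
    private Object other; // GC-managed: off limits inside ~this

    this(size_t n)
    {
        raw = malloc(n);
        other = new Object;
    }

    ~this()
    {
        // Fine: `raw` lives outside the GC heap, so the collector
        // cannot have recycled it before this finalizer runs.
        free(raw);
        raw = null;
        // Deliberately no access to `other`: when this runs as part
        // of a collection, its memory may already have been reclaimed.
    }

    bool hasStorage() const { return raw !is null; }
}
```

The same restriction would apply to any invariant on such a class, since invariants run as part of finalization.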
Jun 22 2017
On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
> My take on D's GC problem, also spoiler - I'm going to build a new one soonish.
> http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
> ---
> Dmitry Olshansky

FYI, we've tried to improve the binary pool search, but there aren't many pools and it's quite hard to beat. A hashtable for pages in the address range is too big.

I'd like to replace all of those separate pool types with a single page heap, similar to what TCMalloc is using.
http://goog-perftools.sourceforge.net/doc/tcmalloc.html
http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html

There was also https://github.com/dlang/druntime/pull/801 which got reverted.

One problem that you'll run into with a thread cache is synchronizing GC attributes. In the stalled work on a thread cache for the current GC, using single-reader single-writer queues would've been an option to reduce contention.
https://github.com/MartinNowak
https://github.com/dlang/druntime/compare/master...MartinNowak:gcCache#commitcomment-16202536
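To make the "binary pool search" concrete, here is a simplified sketch of the idea: pools are kept sorted by base address and an address is resolved to its pool by bisection. `PoolRange` and `findPool` are illustrative names, not druntime's actual types.

```d
/// Illustrative stand-in for a GC pool: a half-open address range.
struct PoolRange
{
    size_t base; // first address in the pool (inclusive)
    size_t top;  // one past the last address (exclusive)
}

/// Binary search over pools sorted by `base` and non-overlapping.
/// Returns the index of the pool containing `addr`, or -1 if none does.
ptrdiff_t findPool(const(PoolRange)[] pools, size_t addr)
    pure nothrow @nogc @safe
{
    size_t lo = 0, hi = pools.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (addr < pools[mid].base)
            hi = mid;          // addr lies below this pool
        else if (addr >= pools[mid].top)
            lo = mid + 1;      // addr lies above this pool
        else
            return cast(ptrdiff_t) mid;
    }
    return -1;
}
```

With only a handful of pools the loop runs a few iterations over a small, cache-friendly array, which fits the observation that it is hard to beat.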
Jun 24 2017
On Saturday, 24 June 2017 at 15:31:21 UTC, Martin Nowak wrote:
> On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
>> My take on D's GC problem, also spoiler - I'm going to build a new one soonish.
>> http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html
>> ---
>> Dmitry Olshansky
> FYI, we've tried to improve the binary pool search, but there aren't many pools and it's quite hard to beat. A hashtable for pages in the address range is too big.

Doesn't have to be for pages. Pool granularity is 256k; aligning the pools at this boundary is enough. On x64 pool granularity could be enlarged.

> I'd like to replace all of those separate pool types with a single page heap, similar to what TCMalloc is using.
> http://goog-perftools.sourceforge.net/doc/tcmalloc.html
> http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html

I still think that separate pool types are better, see e.g. jemalloc.
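A sketch of the aligned-pool lookup suggested above, assuming every pool starts on a 256 KiB boundary and spans a whole number of such granules. The table is then keyed by `addr >> 18` rather than by page, so a 32-bit address space needs at most 2^32 / 2^18 = 16384 entries. `PoolTable` and its members are hypothetical names, and the built-in associative array merely stands in for whatever table a real collector would use.

```d
enum size_t poolShift = 18; // 256 KiB == 1 << 18
enum size_t poolGranularity = (cast(size_t) 1) << poolShift;

/// Hypothetical lookup table: granule index -> pool id.
struct PoolTable
{
    private int[size_t] granuleToPool;

    /// Registers a pool; `base` and `size` must be granule-aligned.
    void addPool(size_t base, size_t size, int poolId)
    {
        assert(base % poolGranularity == 0);
        assert(size % poolGranularity == 0);
        for (size_t a = base; a < base + size; a += poolGranularity)
            granuleToPool[a >> poolShift] = poolId;
    }

    /// Returns the id of the pool containing `addr`, or -1 if none.
    int lookup(size_t addr) const
    {
        if (auto p = (addr >> poolShift) in granuleToPool)
            return *p;
        return -1;
    }
}
```

Enlarging the granularity on x64, as proposed, would amount to raising `poolShift`, which shrinks the table further for the same heap size.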
Jun 24 2017
On Saturday, 24 June 2017 at 18:12:43 UTC, Dmitry Olshansky wrote:
> I still think that separate pool types are better, see e.g. jemalloc.

Right now this leads to some inflation of RSS, because previously used and now freed pages can only be reused when the whole pool (e.g. 4MB or 16MB) is free again. It doesn't seem sensible to reserve 16MB only for big (>PAGESIZE) allocations. In particular, once the pages are dirty and mapped, you'd rather want to make use of them.
Jun 25 2017