digitalmars.D - Analysis of D GC

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

My take on D's GC problem, also spoiler - I'm going to build a 
new one soonish.

http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

---
Dmitry Olshansky

Jun 19 2017

Adam D. Ruppe <destructionator gmail.com> writes:

What is it about Windows that makes you call it a distant 
possibility? Is it just that you are unfamiliar with it or is 
there some specific OS level feature you plan on needing?

Jun 19 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Monday, 19 June 2017 at 22:50:05 UTC, Adam D. Ruppe wrote:
 What is it about Windows that makes you call it a distant 
 possibility? Is it just that you are unfamiliar with it or is 
 there some specific OS level feature you plan on needing?

This is mostly because I wanted to abuse the lazy commit behaviour of 
POSIX systems. Now that I think of it, Windows is mostly OK, except for 
the fork trick used in the concurrent GC. As Vladimir pointed out, on 
Windows there are other ways to do it, but they are more involved.

---
Dmitry Olshansky

Jun 20 2017

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Tuesday, 20 June 2017 at 07:11:10 UTC, Dmitry Olshansky wrote:
 On Monday, 19 June 2017 at 22:50:05 UTC, Adam D. Ruppe wrote:
 What is it about Windows that makes you call it a distant 
 possibility? Is it just that you are unfamiliar with it or is 
 there some specific OS level feature you plan on needing?

 This is mostly because I wanted to abuse lazy commit of POSIX. 
 Now that I think of it Windows is mostly ok, except for the 
 fork trick used in concurrent GC. As Vladimir pointed out on 
 Windows there are other ways to do it but they are more 
 involved.

 ---
 Dmitry Olshansky

BTW, Rainer Schuetze has studied this in detail and has written 
down some of it here: 
http://rainers.github.io/visuald/druntime/concurrentgc.html

Jun 20 2017

Ali Çehreli <acehreli yahoo.com> writes:

On 06/19/2017 03:35 PM, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a new one
 soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

Very informative, thanks.

However, I can think of many reasons, such as appreciation of the efforts 
of the original authors, to tone it down a little bit, like changing 
"mistake" to "optimization opportunity", "criticism" to "observation", 
etc. :)

Ali

Jun 19 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Monday, 19 June 2017 at 23:10:43 UTC, Ali Çehreli wrote:
 On 06/19/2017 03:35 PM, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one
 soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

 Very informative, thanks.

 However, I can think of many reasons like appreciation the 
 efforts of the original authors to tone it down a little bit 
 like changing "mistake" to "optimization opportunity", 
 "criticism" to "observation", etc. :)

I could call it a problem :) Still, one reason I didn't post this on the 
D blog is that it's a critique followed by a promise of action.

 Ali

---
Dmitry Olshansky

Jun 20 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Mon, Jun 19, 2017 at 10:35:42PM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
 My take on D's GC problem, also spoiler - I'm going to build a new one
 soonish.
 
 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

[...]

Very interesting indeed!

One question about killing the no interior pointer attribute: would this
be problematic for 32-bit platforms? And if so, what do you plan to do
about it?  Keep the current GC as version(32bit) and your new version as
version(64bit)?

One (potentially crazy) idea that occurred to me while reading your post
is TLS allocations. I haven't thought through the details of how this
would interact with the existing language yet, but would it make sense
for some allocations that you know will never be shared across threads
to be allocated in a thread-local pool instead of the global pool? I.e.,
in addition to the global set of memory pools you also have thread-local
memory pools. Then you could potentially run collections per-thread
rather than stop-the-world.

For example, if you have a bunch of threads that call a function that
does a bunch of short-lived allocations that are not shared across
threads, it seems too wasteful to have these allocations add to the
global GC load. Why not have them go into a local pool that can be
collected per-thread?  Of course, whether the current language can take
advantage of this is another matter.  Perhaps if the function is pure
and returns scope, then you know any allocation it makes can't possibly
be shared with other threads, or something like that...


On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via Digitalmars-d wrote:
 What is it about Windows that makes you call it a distant possibility?
 Is it just that you are unfamiliar with it or is there some specific
 OS level feature you plan on needing?

He mentioned the "fork trick", which I assume refers to how Linux's
implementation of fork() uses copy-on-write rather than immediately
duplicating the parent process' memory structures.  There was a D1 GC
some time ago that depended on this behaviour to speed up the collection
cycle.  AFAIK, Windows does not have equivalent functionality to this.

(Well, for that matter, I'm not sure Posix in general has this feature
either, since AFAIK it's Linux-specific. But I surmise that modern-day
*nix flavors probably have adopted this in one way or another, since
otherwise the very common pattern of fork-and-exec would be inordinately
expensive -- copying all the parent's pages only to replace them all
pretty much immediately.)
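
The snapshot effect described above can be sketched in C. This is only a minimal illustration, not the code of the D1 collector mentioned in the thread: the helper name `snapshot_first_byte` and the two-pipe handshake are assumptions made for the example. The child reads its copy of the buffer only after the parent has overwritten its own, which shows that the child still sees memory as it was at the time of `fork()`.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child, overwrite heap[0] in the parent,
 * then ask the child which value it still sees at heap[0].
 * Returns the byte seen by the child, or -1 on error. */
int snapshot_first_byte(unsigned char *heap, unsigned char new_value)
{
    int go[2], result[2];
    assert(heap != NULL);

    if (pipe(go) != 0)
        return -1;
    if (pipe(result) != 0) {
        close(go[0]); close(go[1]);
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(go[0]); close(go[1]);
        close(result[0]); close(result[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: wait until the parent has modified its own copy,
         * then report what the snapshot contains. Only read, write
         * and _exit are used here. */
        char token;
        close(go[1]); close(result[0]);
        if (read(go[0], &token, 1) != 1)
            _exit(1);
        if (write(result[1], &heap[0], 1) != 1)
            _exit(1);
        _exit(0);
    }

    /* Parent: keep mutating, as a running program would. */
    close(go[0]); close(result[1]);
    heap[0] = new_value;

    char token = 'g';
    unsigned char seen = 0;
    int ok = write(go[1], &token, 1) == 1 &&
             read(result[0], &seen, 1) == 1;

    close(go[1]); close(result[0]);
    int status;
    waitpid(pid, &status, 0);
    return ok ? (int)seen : -1;
}
```

A forking collector would presumably run its mark phase in the child against that frozen view while the parent keeps running; the kernel copies a page only when one side writes to it.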


T

-- 
Give me some fresh salted fish, please.

Jun 19 2017

safety0ff <safety0ff.dev gmail.com> writes:

On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
 On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via 
 Digitalmars-d wrote:
 What is it about Windows that makes you call it a distant 
 possibility? Is it just that you are unfamiliar with it or is 
 there some specific OS level feature you plan on needing?

 AFAIK, Windows does not have equivalent functionality to this.

I've read that there is such a function on Windows but you need 
to use undocumented/unofficial API to access it:

e.g. 
https://github.com/opencollab/scilab/blob/master/scilab/modules/parallel/src/c/forkWindows.c

Jun 19 2017

ketmar <ketmar ketmar.no-ip.org> writes:

safety0ff wrote:

 On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
 On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via 
 Digitalmars-d wrote:
 What is it about Windows that makes you call it a distant possibility? 
 Is it just that you are unfamiliar with it or is there some specific OS 
 level feature you plan on needing?

 AFAIK, Windows does not have equivalent functionality to this.

 I've read that there is such a function on Windows but you need to use 
 undocumented/unofficial API to access it:

 e.g. 
 https://github.com/opencollab/scilab/blob/master/scilab/modules/parallel/src/c/forkWindows.c

it highly depends on undocumented windows internals, and is not portable 
between windows versions. more-or-less working implementations of `fork()` 
have existed at least since the NT3 era, but nobody considered 'em more than 
a PoC, and even the next service pack can break everything.

Jun 19 2017

Jacob Carlborg <doob me.com> writes:

On 2017-06-20 06:37, ketmar wrote:

 it is higly depends of undocumented windows internals, and not portable 
 between windows versions. more-or-less working implementations of 
 `fork()` were existed at least since NT3 era, but nobody considered 'em 
 as more than a PoC, and even next service pack can break everything.

I'm wondering what Windows 10 is using to implement "fork" for the Windows 
Subsystem for Linux: whether it's using these internal functions or 
something else.

-- 
/Jacob Carlborg

Jun 20 2017

rikki cattermole <rikki cattermole.co.nz> writes:

On 20/06/2017 12:41 PM, Jacob Carlborg wrote:
 On 2017-06-20 06:37, ketmar wrote:
 
 it is higly depends of undocumented windows internals, and not 
 portable between windows versions. more-or-less working 
 implementations of `fork()` were existed at least since NT3 era, but 
 nobody considered 'em as more than a PoC, and even next service pack 
 can break everything.

 
 I'm wondering what Windows 10 is using to implement "fork" for Windows 
 Subsystem for Linux. If it's using these internal functions or something 
 else.

It wouldn't surprise me to learn that it is a POSIX-layer-specific 
syscall, meaning we can't use it from a native Windows process.

Jun 20 2017

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Tuesday, 20 June 2017 at 11:44:41 UTC, rikki cattermole wrote:
 On 20/06/2017 12:41 PM, Jacob Carlborg wrote:
 On 2017-06-20 06:37, ketmar wrote:
 
 it is higly depends of undocumented windows internals, and 
 not portable between windows versions. more-or-less working 
 implementations of `fork()` were existed at least since NT3 
 era, but nobody considered 'em as more than a PoC, and even 
 next service pack can break everything.

 
 I'm wondering what Windows 10 is using to implement "fork" for 
 Windows Subsystem for Linux. If it's using these internal 
 functions or something else.

 It wouldn't surprise me to learn that it was a posix layer 
 specific syscall, meaning we can't from a native Windows 
 process.

The Windows Subsystem for Linux is built on a new form of process 
called picoprocesses. There's a whole API built specifically to service 
WSL that (AFAIR) is not otherwise available to normal processes, for 
security reasons.

I highly recommend watching this talk: 
https://www.youtube.com/watch?v=36Ykla27FIo and browsing through 
this repo: https://github.com/ionescu007/lxss which reveals many 
interesting details about that part of Windows.

I watched that talk a while ago and may have misremembered 
something, but my understanding is that the WSL infrastructure is 
off limits for normal Win32 processes and as such is not suitable 
for implementing CoW pages for D's GC.
(I watched that talk specifically because I was interested in 
whether some of that could be used in druntime.)

Jun 20 2017

Jacob Carlborg <doob me.com> writes:

On 2017-06-20 16:16, Petar Kirov [ZombineDev] wrote:

 I highly recommend watching this talk: 
 https://www.youtube.com/watch?v=36Ykla27FIo and browsing through this 
 repo: https://github.com/ionescu007/lxss which reveals many interesting 
 details about that part of Windows.

Looks interesting.

-- 
/Jacob Carlborg

Jun 20 2017

ketmar <ketmar ketmar.no-ip.org> writes:

H. S. Teoh wrote:

 He mentioned the "fork trick", which I assume refers to how Linux's
 implementation of fork() uses copy-on-write rather than immediately
 duplicating the parent process' memory structures.  There was a D1 GC
 some time ago that depended on this behaviour to speed up the collection
 cycle.

and it was even ported to D2, and worked. sadly, using `fork()` has its 
own set of problems -- `fork()` itself is in no way a flawless experience. 
e.g. you can fork while another thread is inside glibc's `malloc()`, and 
BOOM! a lot of glibc is locked forever, as the `malloc()` lock is never 
released in the child process. some other libraries may try to intercept 
`fork()` to do unnecessary "cleanup", and so on.

so using a "forking GC" requires a lot of discipline in coding and library 
use, or it will be an endless source of heisenbugs.

new linux kernels got the userfaultfd API (so code can simply `select()` on 
an fd, and process protection violations from `mprotect()` without tricks 
with signals), but... to much of my joy and happiness, the proposed API was 
just fine for creating a GC with mprotect barriers, and the final API that 
was included gladly omitted exactly the API call needed to make it happen. 
great work, yeah. it may have changed since then, tho, i didn't recheck.

Jun 19 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Tuesday, 20 June 2017 at 04:35:27 UTC, ketmar wrote:
 H. S. Teoh wrote:

 He mentioned the "fork trick", which I assume refers to how 
 Linux's
 implementation of fork() uses copy-on-write rather than 
 immediately
 duplicating the parent process' memory structures.  There was 
 a D1 GC
 some time ago that depended on this behaviour to speed up the 
 collection
 cycle.

 and it was even ported to D2, and worked. sadly, using `fork()` 
 has it's own set of problems -- `fork()` itself is in no way  a 
 flawless expirience. like you can fork while other thread is 
 inside glibc's `malloc()`, and BOOM! alot of glibc is locked 
 forever, as `malloc()` lock is never released in child process. 
 some other libraries may try to intercept `fork()` to do 
 unnecessary "cleanup", and so on.

Since we are in control of what the child does, I see this as no 
issue. Just call mmap and do bump-the-pointer allocation.
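
A rough C sketch of that idea follows, assuming the child only needs scratch memory that is thrown away when it exits. The `BumpArena` type and the `bump_*` names are made up for illustration. The point is that nothing here goes through `malloc()`, so a lock held by another thread at fork time is never touched.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Hypothetical arena: one anonymous mapping plus a bump offset. */
typedef struct {
    unsigned char *base;
    size_t capacity;
    size_t used;
} BumpArena;

/* Map `capacity` bytes of private anonymous memory. Returns 0 on success. */
int bump_init(BumpArena *a, size_t capacity)
{
    assert(a != NULL);
    void *p = mmap(NULL, capacity, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    a->base = p;
    a->capacity = capacity;
    a->used = 0;
    return 0;
}

/* Hand out the next 16-byte-aligned slice, or NULL when the arena is full.
 * There is no free(): the whole arena goes away with the process. */
void *bump_alloc(BumpArena *a, size_t size)
{
    const size_t align = 16;
    assert(a != NULL);
    if (size > SIZE_MAX - (align - 1))
        return NULL;
    size_t rounded = (size + align - 1) & ~(align - 1);
    if (rounded > a->capacity - a->used)
        return NULL;
    void *p = a->base + a->used;
    a->used += rounded;
    return p;
}

/* Unmap the arena; only needed if the process keeps running. */
void bump_destroy(BumpArena *a)
{
    assert(a != NULL);
    munmap(a->base, a->capacity);
    a->base = NULL;
    a->capacity = 0;
    a->used = 0;
}
```

In a forked collector process, `bump_destroy` would normally never be called; the child just exits and the kernel reclaims the mapping.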

Jun 20 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Monday, 19 June 2017 at 23:39:54 UTC, H. S. Teoh wrote:
 On Mon, Jun 19, 2017 at 10:35:42PM +0000, Dmitry Olshansky via 
 Digitalmars-d wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.
 
 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 [...]

 Very interesting indeed!

 One question about killing the no interior pointer attribute: 
 would this be problematic for 32-bit platforms? And if so, what 
 do you plan to do about it?  Keep the current GC as 
 version(32bit) and your new version as version(64bit)?

Yeah, if said 32-bit application makes use of the no-interior-pointer 
attribute, then using the old GC is an option. I have no plans for 
this broken attribute.

 One (potentially crazy) idea that occurred to me while reading 
 your post is TLS allocations. I haven't thought through the 
 details of how this would interact with the existing language 
 yet, but would it make sense for some allocations that you know 
 will never be shared across threads to be allocated in a 
 thread-local pool instead of the global pool? I.e., in addition 
 to the global set of memory pools you also have thread-local 
 memory pools. Then you could potentially run collections 
 per-thread rather than stop-the-world.

This needs a spec update on the interaction between TLS and shared; in 
particular, the current trend of lock + cast away shared is 
problematic. So is the implicit cast to immutable of the result of 
a unique expression.

 On Mon, Jun 19, 2017 at 10:50:05PM +0000, Adam D. Ruppe via 
 Digitalmars-d wrote:
 What is it about Windows that makes you call it a distant 
 possibility? Is it just that you are unfamiliar with it or is 
 there some specific OS level feature you plan on needing?

 He mentioned the "fork trick", which I assume refers to how 
 Linux's implementation of fork() uses copy-on-write rather than 
 immediately duplicating the parent process' memory structures.  
 There was a D1 GC some time ago that depended on this behaviour 
 to speed up the collection cycle.  AFAIK, Windows does not have 
 equivalent functionality to this.

To the best of my knowledge all of D's current target OSes 
support this save for Windows.

 T

Jun 20 2017

Vladimir Panteleev <thecybershadow.lists gmail.com> writes:

On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

Looks like I'm not the only one itching to have a go at D's GC :) 
This will very likely be my DConf 2018 project. However, I have 
slightly different plans:

- The GC should be usable as a library (mainly to facilitate 
testing).
- Support for all platforms D already supports from the start.
- Use design-by-introspection when applicable and 
design-by-contract elsewhere to split the design into modular 
components.
- Make the GC configurable (using policies) and swappable at 
runtime. (No need to get clever, just treat previous 
implementation's pools as opaque void[]).
- Support concurrency on Windows via anonymous memory-mapped 
files.
- Support generational collection using write barriers 
implemented through memory protection.
- Integrate existing GC work - don't reinvent the wheel.
- More, much more debugging facilities! Integrate Diamond and 
Valgrind interoperability.
- Gray-marking and compacting.
- Still need to look at immix.

I have some past work that I'd like to integrate (an experimental 
generational GC I wrote like 9 years ago for D1, Diamond, and 
Valgrind integration I have in a fork somewhere.)
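
For readers unfamiliar with the "write barriers implemented through memory protection" item, here is a minimal POSIX sketch in C. It is not taken from any existing D collector; the `barrier_*` names are invented for the example, and it assumes that calling `mprotect()` from a signal handler works, which is common practice on Linux but not guaranteed by POSIX. A page is made read-only, the first write faults, and the handler records the page as dirty before unprotecting it so the write can proceed.

```c
#define _DEFAULT_SOURCE   /* for MAP_ANONYMOUS and sigaction on glibc */
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

static unsigned char *g_page;          /* the single tracked page */
static size_t g_page_size;
static volatile sig_atomic_t g_dirty;  /* set by the fault handler */

/* Fault handler: a write to the protected page marks it dirty and
 * re-enables writing, so the faulting store is retried and succeeds. */
static void on_fault(int sig, siginfo_t *info, void *ctx)
{
    unsigned char *addr = (unsigned char *)info->si_addr;
    (void)ctx;
    if (g_page != NULL && addr >= g_page && addr < g_page + g_page_size) {
        g_dirty = 1;
        /* Assumption: mprotect is usable from a signal handler here. */
        mprotect(g_page, g_page_size, PROT_READ | PROT_WRITE);
        return;
    }
    /* Not our page: restore the default action so the retry crashes. */
    signal(sig, SIG_DFL);
}

/* Allocate one page, install the handler and write-protect the page. */
int barrier_arm(void)
{
    struct sigaction sa;
    long ps = sysconf(_SC_PAGESIZE);
    if (ps <= 0)
        return -1;
    g_page_size = (size_t)ps;

    void *p = mmap(NULL, g_page_size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;
    g_page = p;

    memset(&sa, 0, sizeof sa);
    sa.sa_sigaction = on_fault;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGSEGV, &sa, NULL) != 0 ||
        sigaction(SIGBUS, &sa, NULL) != 0)
        return -1;

    g_dirty = 0;
    return mprotect(g_page, g_page_size, PROT_READ);
}

/* Volatile so that test writes are not reordered around the dirty check. */
volatile unsigned char *barrier_page(void) { return g_page; }

int barrier_is_dirty(void) { return g_dirty != 0; }
```

Reads never fault, so only the first write to each protected page pays for a signal delivery. That per-fault cost is one plausible reason such barriers get described as slow later in the thread.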

Jun 19 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev wrote:
 On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 Looks like I'm not the only one itching to have a go at D's GC 
 :) This will very likely be my DConf 2018 project. However, I 
 have slightly different plans:

I see no problem in eventually uniting our efforts.

 - The GC should be usable as a library (mainly to facilitate 
 testing).
 - Support for all platforms D already supports from the start.
 - Use design-by-introspection when applicable and 
 design-by-contract elsewhere to split the design into modular 
 components.

Nice. A pool could have many different structures, the collector 
could then introspect on that. Sadly this almost doubles the 
effort so I will not go there.

 - Make the GC configurable (using policies) and swappable at 
 runtime. (No need to get clever, just treat previous 
 implementation's pools as opaque void[]).
 - Support concurrency on Windows via anonymous memory-mapped 
 files.

Yeah, I recall Rainer and myself discussing this approach. It had 
some downsides, such as needing to remap each pool individually. 
Still doable.

 - Support generational collection using write barriers 
 implemented through memory protection.

Super slow, sadly. That being said, I believe D is just fine without 
a generational GC. The generational hypothesis just doesn't hold to 
the extent it holds in, say, Java. My hypothesis is that most 
performance-minded applications already allocate temporaries 
using a region allocator of sorts (or the C heap).

 - Integrate existing GC work - don't reinvent the wheel.
 - More, much more debugging facilities! Integrate Diamond and 
 Valgrind interoperability.

I could use help on this one.

 - Gray-marking and compacting.
 - Still need to look at immix.

 I have some past work that I'd like to integrate (an 
 experimental generational GC I wrote like 9 years ago for D1, 
 Diamond, and Valgrind integration I have in a fork somewhere.)

---
Dmitry Olshansky

Jun 20 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 20, 2017 at 07:47:13AM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
 On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev wrote:

[...]
 - Support generational collection using write barriers implemented
 through memory protection.

 
 Super slow sadly. That being said I belive D is just fine without
 generational GC. The generational hypothesis just doesn't hold to the
 extent it holds in say Java. My hypothesis is that most performance
 minded applications already allocate temporaries using region
 allocator of sorts (or using C heap).

[...]

FWIW, here's a data point to the contrary:

One of my projects involves constructing a (very large) AA that grows
over time, and entries are never deleted.  The AA itself is persistent
and lasts until the end of the program.  Besides the AA, there are a
couple of arrays that also grow (more slowly) but eventually become
unreferenced.  Because of the sheer size of the AA, I've observed that
GC collection cycles become slower and slower, yet most of this extra
work is completely needless, because the only thing that might need
collecting is the arrays, yet the GC has to mark the entire AA each
time, only to discover it's still live.

After some experimentation I discovered that I could get up to 40-50%
performance improvement just by calling GC.disable and scheduling my own
GC collection cycles via GC.collect at a slower rate than the current
default setting.

From this, it would seem to me that a generational collector would have
helped, since most of the AA will eventually migrate to older
generations and most of the time the GC won't bother marking/scanning
those parts.  Of course, this is only for this particular program, and I
can't say that this is typical usage for D programs in general.  But I
think D would still benefit from a generational collector.


T

-- 
What did the alien say to Schubert? "Take me to your lieder."

Jun 20 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Tuesday, 20 June 2017 at 16:49:44 UTC, H. S. Teoh wrote:
 On Tue, Jun 20, 2017 at 07:47:13AM +0000, Dmitry Olshansky via 
 Digitalmars-d wrote:
 On Monday, 19 June 2017 at 23:52:16 UTC, Vladimir Panteleev 
 wrote:

 [...]


 FWIW, here's a data point to the contrary:

 One of my projects involves constructing a (very large) AA that 
 grows over time, and entries are never deleted.  The AA itself 
 is persistent and lasts until the end of the program.  Besides 
 the AA, there are a couple of arrays that also grow (more 
 slowly) but eventually become unreferenced.  Because of the 
 sheer size of the AA, I've observed that GC collection cycles 
 become slower and slower, yet most of this extra work is 
 completely needless, because the only thing that might need 
 collecting is the arrays, yet the GC has to mark the entire AA 
 each time, only to discover it's still live.

 After some experimentation I discovered that I could get up to 
 40-50% performance improvement just by calling GC.disable and 
 scheduling my own GC collection cycles via GC.collect at a 
 slower rate than the current default setting.

 From this, it would seem to me that a generational collector 
 would have
 helped, since most of the AA will eventually migrate to older 
 generations and most of the time the GC won't bother 
 marking/scanning those parts.  Of course, this is only for this 
 particular program, and I can't say that this is typical usage 
 for D programs in general.  But I think D would still benefit 
 from a generational collector.

Interestingly, the moment you "reallocate" to expand the AA it 
will be considered a new object. Overall I think your case is 
more about faulty collection heuristics, that is, collecting when 
there is only a slim chance of getting enough free space after 
collection.

 T

Jun 20 2017

"H. S. Teoh via Digitalmars-d" <digitalmars-d puremagic.com> writes:

On Tue, Jun 20, 2017 at 07:14:11PM +0000, Dmitry Olshansky via Digitalmars-d
wrote:
 On Tuesday, 20 June 2017 at 16:49:44 UTC, H. S. Teoh wrote:

[...]
 Interestingly the moment you "reallocate" to expand the AA it will be
 considered a new object.

[...]

This is not entirely true.  The *table* itself will of course get moved
to a new object, but most of the size of the AA comes from its entries,
and those are nodes that stay in-place. You'll still have to scan
references to the table, of course, but that's a lot better than
scanning all the entries as well.


T

-- 
The diminished 7th chord is the most flexible and fear-instilling chord. Use it
often, use it unsparingly, to subdue your listeners into submission!

Jun 20 2017

Jacob Carlborg <doob me.com> writes:

On 2017-06-20 01:52, Vladimir Panteleev wrote:

 - More, much more debugging facilities! Integrate Diamond and Valgrind 
 interoperability.

Don't forget the Clang sanitizers, assuming they work using LDC.

-- 
/Jacob Carlborg

Jun 20 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

 if not a single pool is capable to service an allocation a new 
 pool is allocated

should probably be
"if a single pool is not capable of servicing ..."

Looove the figures! Looking forward to seeing the results.

Jun 19 2017

safety0ff <safety0ff.dev gmail.com> writes:

On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

Good overview, however:
the binary search pool lookup is used because it naturally 
supports variable sized pools.
IMHO, simply concluding "A hash table could have saved quite a 
few cycles." glosses over the issue of handling variable sizes.

Jun 19 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Tuesday, 20 June 2017 at 02:23:48 UTC, safety0ff wrote:
 On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

 Good overview, however:
 the binary search pool lookup is used because it naturally 
 supports variable sized pools.
 IMHO, simply concluding "A hash table could have saved quite a 
 few cycles." glosses over the issue of handling variable sizes.

Pools are granular to 256 KB, IIRC, so the trick is to keep them 
256 KB aligned in memory. Then a map from 256 KB chunks to pools is 
easily created.
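
One way to picture such a map, sketched in C for a 32-bit address space, where a flat table of 2^14 entries is enough. The `Pool` layout and function names are hypothetical and not druntime's. A pool that spans several 256 KB chunks registers itself under each of them, which is how variable-sized pools are still found in constant time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK_SHIFT 18u                                 /* 256 KB = 2^18 */
#define CHUNK_SIZE  ((uint32_t)1 << CHUNK_SHIFT)
#define CHUNK_COUNT ((size_t)1 << (32 - CHUNK_SHIFT))   /* 16384 entries */

/* Hypothetical pool descriptor for a 32-bit address space. */
typedef struct {
    uint32_t base;    /* must be 256 KB aligned */
    uint32_t chunks;  /* pool size, in 256 KB chunks */
} Pool;

/* One slot per 256 KB chunk of the address space; NULL means "no pool". */
static const Pool *chunk_map[CHUNK_COUNT];

/* Record the pool under every chunk it covers. Returns 0 on success. */
int pool_register(const Pool *pool)
{
    assert(pool != NULL);
    if (pool->base % CHUNK_SIZE != 0 || pool->chunks == 0)
        return -1;
    size_t first = pool->base >> CHUNK_SHIFT;
    if (pool->chunks > CHUNK_COUNT - first)
        return -1;
    for (size_t i = 0; i < pool->chunks; i++)
        chunk_map[first + i] = pool;
    return 0;
}

/* Constant-time lookup: shift the address and index the table. */
const Pool *pool_find(uint32_t addr)
{
    return chunk_map[addr >> CHUNK_SHIFT];
}
```

On 64-bit targets a flat table over the whole address space is not practical, so the same chunk index would have to go into a hash table or a multi-level table instead.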


---
Dmitry Olshansky

Jun 20 2017

ketmar <ketmar ketmar.no-ip.org> writes:

Dmitry Olshansky wrote:

 My take on D's GC problem, also spoiler - I'm going to build a new one 
 soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

"...the dubious optimization of no interior pointers..."

this is the ONLY (i emphasise it!) way i was able to make my e-mail and 
irc clients not leak memory while still using the GC. on 32-bit systems 
false pointers *are* a problem, and NO_INTERIOR really helps.

turning NO_INTERIOR into something dog-slow (or a noop) will make D unusable 
on 32-bit systems for anything more complex than helloworld and throwaway 
scripts. in particular, any app that should work for weeks or months 
without restart (yep, i want my mail client to Just Work, and i'm not 
rebooting my PC that often) will be *forced* to ditch the GC.

while NO_INTERIOR requires some coding discipline, it is invaluable in IRL apps.
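
To make the false-pointer argument concrete, here is a small C model of the check a conservative scanner performs. It is a sketch only: the `Block` type and function names are invented, and the real collector's logic is more involved. With interior pointers allowed, any word that falls anywhere inside a block keeps it alive; with the no-interior flag set, only a word equal to the block's base address does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one GC-allocated block. */
typedef struct {
    uintptr_t base;
    size_t size;
    bool no_interior;  /* models the NO_INTERIOR attribute */
} Block;

/* Does this scanned word keep the block alive? */
bool word_pins_block(const Block *b, uintptr_t word)
{
    assert(b != NULL);
    if (b->no_interior)
        return word == b->base;                      /* exact match only */
    return word >= b->base && word - b->base < b->size;  /* anywhere inside */
}

/* Count how many of the scanned words would pin the block. */
size_t count_pinning_words(const Block *b, const uintptr_t *words, size_t n)
{
    size_t hits = 0;
    for (size_t i = 0; i < n; i++)
        if (word_pins_block(b, words[i]))
            hits++;
    return hits;
}
```

A 64 MB block covers 1/64 of a 32-bit address space, so a random integer on the stack has a noticeable chance of landing inside it; with the no-interior rule the target shrinks to a single address.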

Jun 19 2017

Jacob Carlborg <doob me.com> writes:

On 2017-06-20 06:54, ketmar wrote:

 "...the dubious optimization of no interior pointers..."
 
 this is the ONLY (i emphasise it!) way i were able to make my e-mail and 
 irc clients to not leak memory, and keep using GC. on 32-bit systems 
 false pointers *is* a problem, and NO_INTERIOR really helps.
 
 turning NO_INTERIOR into something dog-slow (or noop) will make D 
 unusable on 32-bit systems for anything more complex than helloworld and 
 throwaway scripts. particularly, any app that should work for weeks or 
 monthes without restart (yep, i want my mail client to Just Work, and 
 i'm not rebooting my PC that often) will be *forced* to ditch GC.
 
 while NO_INTERIOR requires some coding discipline, it is invaluable in 
 IRL apps.

You need to move to 64bit. Apple is already deprecating support for 
32bit apps and after the next version of macOS (High Sierra) they're 
going to remove the support for 32bit apps.

-- 
/Jacob Carlborg

Jun 20 2017

Petar Kirov [ZombineDev] <petar.p.kirov gmail.com> writes:

On Tuesday, 20 June 2017 at 11:49:49 UTC, Jacob Carlborg wrote:
 On 2017-06-20 06:54, ketmar wrote:

 [...]

 You need to move to 64bit. Apple is already deprecating support 
 for 32bit apps and after the next version of macOS (High 
 Sierra) they're going to remove the support for 32bit apps.

I highly doubt that ketmar would have any intention of touching 
macOS regardless ;)
Besides, there are many domains where the x32 ABI is a more 
worthwhile upgrade from i686 than x86_64.

Jun 20 2017

Jacob Carlborg <doob me.com> writes:

On 2017-06-20 16:03, Petar Kirov [ZombineDev] wrote:

 I highly doubt that ketmar would have any intention of touching macOS
 regardless ;)

I somehow mixed up ketmar and Guillaume Piolat (who used to go by the 
alias p0nce). My mistake.

-- 
/Jacob Carlborg

Jun 20 2017

Adrian Matoga <dlang.spam matoga.info> writes:

On Tuesday, 20 June 2017 at 11:49:49 UTC, Jacob Carlborg wrote:
 You need to move to 64bit. Apple is already deprecating support 
 for 32bit apps and after the next version of macOS (High 
 Sierra) they're going to remove the support for 32bit apps.

There are other 32-bit platforms that are going to stay on the 
market for a while. 32-bit ARMs won't disappear anytime soon.

Jun 25 2017

Jacob Carlborg <doob me.com> writes:

On 2017-06-25 17:47, Adrian Matoga wrote:

 There are other 32-bit platforms that are going to stay on the market
 for a while. 32-bit ARMs won't disappear anytime soon.

Sure, but as I mentioned I mixed up ketmar and Guillaume Piolat and 
Guillaume Piolat is using Apple platforms, as far as I understand.

-- 
/Jacob Carlborg

Jun 26 2017

Nicholas Wilson <iamthewilsonator hotmail.com> writes:

On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

This was posted on reddit: 
https://www.reddit.com/r/programming/comments/6ic52d/inside_ds_gc/

Jun 20 2017

Walter Bright <newshound2 digitalmars.com> writes:

On 6/20/2017 12:04 AM, Nicholas Wilson wrote:
 On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 This was posted on reddit: 
 https://www.reddit.com/r/programming/comments/6ic52d/inside_ds_gc/

Also on hacker news.

Jun 20 2017

Ecstatic Coder <ecstatic.coder gmail.com> writes:

 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

Many thanks for your efforts Dmitry :)

May I ask you if you plan to make a soft real-time GC similar to 
the one implemented in the Nim language ?

https://nim-lang.org/docs/gc.html
https://nim-lang.org/docs/intern.html#debugging-nim-s-memory-management

What is great about it is that we can call it regularly to 
collect memory a bit at a time, giving it a maximum delay for 
this operation.

Being able to manually specify the maximum GC delay is what makes 
Nim compatible with game development, as collections can be made 
iteratively, and on a per-thread basis.

In the worst case, we know that just one of the application 
threads will be delayed for a few milliseconds between two frame 
renderings, which is generally acceptable for games and other 
similar applications.

Moreover, this opens up the opportunity to call the GC only in the 
main menu or the pause menu for instance, but not during actual 
gameplay, so that even these few lost milliseconds will always 
remain unnoticed.
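
A rough D counterpart of the "collect only in the pause menu" idea, as a sketch: `core.memory` offers `GC.disable`, `GC.enable` and `GC.collect`, but, as far as I know, no call that takes a time budget the way the Nim one described above does. The `State` enum, the `frame` function and the `collections` counter are hypothetical names used only for this illustration.

```d
import core.memory : GC;

// Hypothetical game states, for illustration only.
enum State { gameplay, paused }

int collections; // counts the explicit collections we trigger

void frame(State s)
{
    if (s == State.paused)
    {
        // A pause is not noticeable here, so run a full collection now.
        GC.collect();
        ++collections;
    }
    else
    {
        // Per-frame garbage; with automatic collections suppressed it simply
        // piles up until the next explicit GC.collect().
        auto scratch = new int[](64);
        scratch[0] = 1;
    }
}

void main()
{
    GC.disable();            // suppress automatic collections during gameplay
    scope (exit) GC.enable();

    foreach (i; 0 .. 100)
        frame(State.gameplay);
    frame(State.paused);
    assert(collections == 1);
}
```

Note that `GC.disable` is documented as a strong hint rather than a guarantee: the runtime may still collect if it would otherwise run out of memory.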

This is probably why Nim's author was once paid to wrap an open 
source game engine (Urho3D), and improve the language's native 
compatibility with C++ libraries.

https://forum.nim-lang.org/t/870

Jun 20 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Tuesday, 20 June 2017 at 15:16:01 UTC, Ecstatic Coder wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

 Many thanks for your efforts Dmitry :)

 May I ask you if you plan to make a soft real-time GC similar 
 to the one implemented in the Nim language ?

 https://nim-lang.org/docs/gc.html
 https://nim-lang.org/docs/intern.html#debugging-nim-s-memory-management

 What is great about it is that we can call it regularly to 
 collect memory a bit at a time, giving it a maximum delay for 
 this operation.

No incremental GC, sorry. It may grow thread-local collection one 
day, once spec is precise about what is allowed and what is not.

Jun 20 2017

Kagamin <spam here.lot> writes:

On Tuesday, 20 June 2017 at 15:16:01 UTC, Ecstatic Coder wrote:
 This is probably why Nim's author was once paid to wrap an open 
 source game engine (Urho3D), and improve the language's native 
 compatibility with C++ libraries.

 https://forum.nim-lang.org/t/870

https://github.com/3dicc/Urhonimo/blob/master/Urho3D-1.32/Source/Engine/Container/Str.h
http://dbartolini.github.io/crown/doxygen/structcrown_1_1_dynamic_string.html

Is it always like this?

Jun 22 2017

safety0ff <safety0ff.dev gmail.com> writes:

On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 "But the main unanswered question is why? Why an extra pass?"

It's likely to pave over the many pitfalls of D finalizers.

E.g. finalizers corrupting data:
class A { size_t i; }
class B { A a; this(){ a = new A; } ~this() { a.i = 1; } }
// modifying B.a.i is undefined behavior (e.g. it could corrupt
// the GC's freelist)

E.g. finalizers reading undefined data:
class A { bool check() { return true; } } // methods are virtual by default in D
class B { A a; this(){ a = new A; } ~this() { a.check(); } }
// B.a's object header is undefined (e.g. replaced with GC
// freelist pointer)

There are also invariants, which are prepended to the finalizers, 
so their code is subject to the same issues.

The best thing about the current implementation is that object 
resurrection has never been supported.
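
By contrast, a finalizer that stays clear of the pitfalls above touches only memory the GC does not manage. The following is a minimal sketch of that pattern, not something from druntime; the `Buffer` class and its `mem` field are hypothetical.

```d
import core.stdc.stdlib : malloc, free;

// Hypothetical wrapper around a C-heap allocation.
class Buffer
{
    private void* mem; // C heap memory, never owned or freed by the GC

    this(size_t n) { mem = malloc(n); }

    ~this()
    {
        // Safe in a finalizer: no GC-managed object is dereferenced here,
        // so it does not matter what the collector has already reclaimed.
        free(mem);
        mem = null;
    }
}

void main()
{
    auto b = new Buffer(128);
    assert(b.mem !is null);
    destroy(b); // runs the destructor deterministically
}
```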

Jun 22 2017

Martin Nowak <code dawg.eu> writes:

On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

FYI, we've tried to improve the binary pool search, but there 
aren't many pools and it's quite hard to beat.
A hashtable for the pages in the address range is too big.
I'd like to replace all of those separate pools types with a 
single page heap, similar to what TCMalloc is using.
http://goog-perftools.sourceforge.net/doc/tcmalloc.html
http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html
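
For context, the binary pool search mentioned at the top of this post can be sketched roughly as follows. This is a simplified illustration, not the druntime code: `PoolRange` and `findPool` are hypothetical names, and the pools are assumed to be sorted by address and non-overlapping.

```d
// Hypothetical description of one pool: the half-open range [base, top).
struct PoolRange { size_t base; size_t top; }

// Returns the index of the pool containing `addr`, or -1 if none does.
ptrdiff_t findPool(const PoolRange[] pools, size_t addr)
{
    size_t lo = 0, hi = pools.length;
    while (lo < hi)
    {
        immutable mid = lo + (hi - lo) / 2;
        if (addr < pools[mid].base)
            hi = mid;            // address lies below this pool
        else if (addr >= pools[mid].top)
            lo = mid + 1;        // address lies above this pool
        else
            return mid;          // base <= addr < top
    }
    return -1;
}

void main()
{
    auto pools = [PoolRange(0x1000, 0x2000), PoolRange(0x4000, 0x8000)];
    assert(findPool(pools, 0x1800) == 0);
    assert(findPool(pools, 0x3000) == -1); // gap between the pools
    assert(findPool(pools, 0x7fff) == 1);
}
```

With only a handful of pools the loop runs a few iterations over a small contiguous array, which is part of why it is hard to beat.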

There was also https://github.com/dlang/druntime/pull/801 which 
got reverted.

One problem that you'll run into with a Thread cache is 
synchronizing GC attributes.
In the stalled work on a thread cache for the current GC, using 
single-reader single-writer queues would've been an option 
to reduce contention.

https://github.com/dlang/druntime/compare/master...MartinNowak:gcCache#commitcomment-16202536

Jun 24 2017

Dmitry Olshansky <dmitry.olsh gmail.com> writes:

On Saturday, 24 June 2017 at 15:31:21 UTC, Martin Nowak wrote:
 On Monday, 19 June 2017 at 22:35:42 UTC, Dmitry Olshansky wrote:
 My take on D's GC problem, also spoiler - I'm going to build a 
 new one soonish.

 http://olshansky.me/gc/runtime/dlang/2017/06/14/inside-d-gc.html

 ---
 Dmitry Olshansky

 FYI, we've tried to improve the binary pool search, but there 
 aren't many pools and it's quite hard to beat.
 A hashtable for the pages in the address range is too big.

Doesn't have to be for pages. Pool granularity is 256k, aligning 
the pools at this boundary is enough. On x64 pool granularity 
could be enlarged.
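
To make that concrete, here is a rough sketch of a lookup keyed by 256k-aligned chunks instead of pages. All names (`Pool`, `PoolTable`, `poolShift`) are hypothetical, and a built-in associative array stands in for whatever table a real collector would use; an actual GC could not rely on a GC-allocated AA for this.

```d
enum size_t poolShift = 18;                          // 256 KiB == 1 << 18
enum size_t poolGranularity = size_t(1) << poolShift;

// Hypothetical pool descriptor: base and size are multiples of 256 KiB.
struct Pool { size_t base; size_t size; }

struct PoolTable
{
    Pool*[size_t] slots; // key: address >> poolShift

    void add(Pool* p)
    {
        assert(p.base % poolGranularity == 0);
        assert(p.size % poolGranularity == 0);
        // One entry per 256 KiB chunk, not one per 4 KiB page.
        for (size_t a = p.base; a < p.base + p.size; a += poolGranularity)
            slots[a >> poolShift] = p;
    }

    Pool* find(size_t addr)
    {
        if (auto pp = (addr >> poolShift) in slots)
            return *pp;
        return null;
    }
}

void main()
{
    PoolTable table;
    auto pool = new Pool(4 * poolGranularity, 2 * poolGranularity);
    table.add(pool);
    assert(table.find(4 * poolGranularity + 123) is pool);
    assert(table.find(6 * poolGranularity) is null);
}
```

A 4 MB pool then needs 16 entries rather than 1024, which is the point of keying on pool granularity.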


 I'd like to replace all of those separate pools types with a 
 single page heap, similar to what TCMalloc is using.
 http://goog-perftools.sourceforge.net/doc/tcmalloc.html
 http://jamesgolick.com/2013/5/19/how-tcmalloc-works.html

I still think that separate pool types are better, see e.g. jemalloc.

Jun 24 2017

Martin Nowak <code dawg.eu> writes:

On Saturday, 24 June 2017 at 18:12:43 UTC, Dmitry Olshansky wrote:
 I still think that separate pool types are better, see e.g. 
 jemalloc.

Right now this leads to some inflation of RSS because previously 
used and now freed pages can only be reused when the whole pool 
(e.g. 4MB or 16MB) is free again.
It doesn't seem sensible to reserve 16MB only for big (>PAGESIZE) 
allocations. In particular once the pages are dirty and mapped, 
you'd rather want to make use of them.

Jun 25 2017
